Overview
We believe it is extremely important for the database community to have access to high-quality implementations of existing algorithms to facilitate detailed and repeatable experiments. This website documents our initial project -- a spatial join benchmark. After completing this study, we hope to perform similar studies in several other areas, including knn queries and selectivity estimation algorithms. It is our hope that the community will become involved in creating a repository of algorithms for development and evaluation.Advisory Board
In order to encourage community participation, we have formed an advisory board to provide feedback and guide our work. We are fortunate to have the guidance of the following distinguished researchers:- Prof. Dr. Michael Gertz, Ruprecht-Karls-University of Heidelberg.
- Prof. Christian S. Jensen, Aarhus University.
- Prof. Beng Chin Ooi, National University of Singapore.
- Prof. Jignesh Patel, University of Wisconsin, Madison.
- Prof. Dr. Bernhard Seeger, Philipps-Universität, Marburg
Research
Many modern applications, including location-based services, games, and scientific simulations, rely on high performance spatial processing. In these applications, large numbers of moving objects sense their environment using spatial queries and then update their positions or velocities. For performance reasons, or to adhere to existing scientific models, such applications frequently divide time into discrete timesteps, and queries and updates occurring in the same timestep are logically simultaneous. This data access pattern can be abstracted as a spatial join that is executed repeatedly with batched updates between each execution.
While there have been many spatial join algorithms proposed in the literature, it is not clear which to choose for these iterative applications. There is a tension between repeatedly re-executing the join from scratch and using an algorithm designed for moving objects to incrementally maintain the join result. The former methods may recompute results that have not changed, while the latter are more complex and may have higher overhead. The best option may change depending on the properties of the application, and we would like to understand the full spectrum of possibilities.
To address this question, we have undertaken an extensive study of iterated spatial join algorithms for distance (range) queries. Unlike many existing benchmarks, we implement all algorithms in main-memory, as we have found that the working set of many spatial applications fits comfortably in the memory of a single machine or a small cluster. We tune and test ten distinct spatial join algorithms on four different workloads, making this one of the most extensive studies of its kind. The following table shows the methods we evaluated:
Algorithms
Join Approach | Index Approach | |
---|---|---|
Static | Moving | |
Index Nested Loops | R-Tree, CR-Tree, Linearized KD-Trie, Static Grid | TPR-Tree, STRIPES |
Specialized | Plane Sweep, PBSM, Synchronized Traversal | AE |
Workloads
Our workload consists of four workloads, the first three of which are based on a study by Chen et al. The workloads are described in more detail below.
![]() |
![]() |
![]() |
Uniform Workload | Gaussian Workload | Network Workload |
-
Uniform Workload
In the uniform workload, points are instantiated uniformly at random positions in a fixed size square space and move in random directions. The speed of each point is chosen uniformly at random from a fixed set of possible speeds. At each timestep, a fixed uniformly-chosen fraction of objects issue queries, and a fixed fraction of objects change velocity. To handle boundaries, we make the space a torus, so that objects that exit the space in the upper right corner reappear in the lower left corner with the same velocity.
-
Gaussian Workload
The gaussian workload models a skewed workload in which objects cluster around a fixed set of hotspots. The hotspots are initialized at uniform random locations, and points are distributed into rings around each hotspot according to a Gaussian distribution. The rings affect the speed and update frequency of the points -- those points closer to a hotspot move faster and update more frequently.
-
Network Workload
To model constrained motion in space, we use a simple network model that models random motion on three real road networks from Oldenburg, Singapore, and San Francisco. Objects are placed randomly on network edges and move randomly over time. When an object reaches a network node, it chooses a random edge incident to that node and proceeds with a new speed. If an object overshoots the next node during a tick, its position is corrected using a special position update in the trace. Each object queries with a fixed probability at every tick.
-
Fish Simulation
To evaluate our methods on a realistic application, we implemented a model of schooling fish proposed in Nature by Couzin et al. In this model, each fish is attracted to other fish within a distance α, but repelled by fish within distance between ρ if there are no fish within α. We implement this using two spatial joins -- the first finds all fish within distance α, and the second finds fish within distance ρ for those fish for which the first join produced no results.
Experimental Results
More coming soon!Tuning Results
Static Nested Loops | Static Specialized Methods | Moving Nested Loops | Moving Specialized Methods |
---|---|---|---|
|
|
|
Source Code
1/13 -- We are working hard to make sure that our code is easy to use and maintain. While we are finishing some modifications, we are making a prerelease version of the project available for those who are interested. We will replace this with an open source repository in the near future.
- Prerelease Code: prerelease_1_13.tgz