Overview

We believe it is extremely important for the database community to have access to high-quality implementations of existing algorithms to facilitate detailed and repeatable experiments. This website documents our initial project -- a spatial join benchmark. After completing this study, we hope to perform similar studies in several other areas, including knn queries and selectivity estimation algorithms. It is our hope that the community will become involved in creating a repository of algorithms for development and evaluation.

Advisory Board

In order to encourage community participation, we have formed an advisory board to provide feedback and guide our work. We are fortunate to have the guidance of the following distinguished researchers:

Research

Many modern applications, including location-based services, games, and scientific simulations, rely on high performance spatial processing. In these applications, large numbers of moving objects sense their environment using spatial queries and then update their positions or velocities. For performance reasons, or to adhere to existing scientific models, such applications frequently divide time into discrete timesteps, and queries and updates occurring in the same timestep are logically simultaneous. This data access pattern can be abstracted as a spatial join that is executed repeatedly with batched updates between each execution.

While there have been many spatial join algorithms proposed in the literature, it is not clear which to choose for these iterative applications. There is a tension between repeatedly re-executing the join from scratch and using an algorithm designed for moving objects to incrementally maintain the join result. The former methods may recompute results that have not changed, while the latter are more complex and may have higher overhead. The best option may change depending on the properties of the application, and we would like to understand the full spectrum of possibilities.

To address this question, we have undertaken an extensive study of iterated spatial join algorithms for distance (range) queries. Unlike many existing benchmarks, we implement all algorithms in main-memory, as we have found that the working set of many spatial applications fits comfortably in the memory of a single machine or a small cluster. We tune and test ten distinct spatial join algorithms on four different workloads, making this one of the most extensive studies of its kind. The following table shows the methods we evaluated:

Algorithms

Join Approach Index Approach
Static Moving
Index Nested Loops R-Tree, CR-Tree, Linearized KD-Trie, Static Grid TPR-Tree, STRIPES
Specialized Plane Sweep, PBSM, Synchronized Traversal AE

Workloads

Our workload consists of four workloads, the first three of which are based on a study by Chen et al. The workloads are described in more detail below.

Uniform Workload Gaussian Workload Network Workload

Experimental Results

More coming soon!

Tuning Results

Static Nested Loops Static Specialized Methods Moving Nested Loops Moving Specialized Methods
  • Plane Sweep/PBSM
  • Synchronous Traversal
  • TPR-Tree
  • STRIPES
  • AE

Source Code

1/13 -- We are working hard to make sure that our code is easy to use and maintain. While we are finishing some modifications, we are making a prerelease version of the project available for those who are interested. We will replace this with an open source repository in the near future.

Funding

This research has been supported by the National Science Foundation under Grant IIS-0725260, by the Air Force Office of Scientific Research, by the iAd Project funded by the Research Council of Norway and by a grant from Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.