Menu:

Spatial Indexing at Cornell

Many modern applications, including location-based services, games, and scientific simulations, rely on high performance spatial processing. In these applications, large numbers of moving objects sense their environment using spatial queries and then update their positions or velocities. For performance reasons, or to adhere to existing scientific models, such applications frequently divide time into discrete timesteps, and queries and updates occurring in the same timestep are logically simultaneous. This data access pattern can be abstracted as a spatial join that is executed repeatedly with batched updates between each execution. At Cornell, we have developed a benchmark to better understand the performance of spatial joins on modern hardware.

Overview

While there have been many spatial join algorithms proposed in the literature, it is not clear which to choose for these iterative applications. There is a tension between repeatedly re-executing the join from scratch and using an algorithm designed for moving objects to incrementally maintain the join result. The former methods may recompute results that have not changed, while the latter are more complex and may have higher overhead. The best option may change depending on the properties of the application, and we would like to understand the full spectrum of possibilities.

To address this question, we have undertaken an extensive study of iterated spatial join algorithms for distance (range) queries. Unlike many existing benchmarks, we implement all algorithms in main-memory, as we have found that the working set of many spatial applications fits comfortably in the memory of a single machine or a small cluster. We tune and test ten distinct spatial join algorithms on four different workloads, making this one of the most extensive studies of its kind. The following table shows the methods we evaluated:

Methods

Join Approach Index Approach
Static Moving
Index Nested Loops R-Tree, CR-Tree, Linearized KD-Trie, Static Grid TPR-Tree, STRIPES
Specialized Plane Sweep, PBSM, Synchronized Traversal AE

Workloads

Uniform Workload Gaussian Workload Network Workload
  • Uniform Workload

    In the uniform workload, points are instantiated uniformly at random positions in a fixed size square space and move in random directions. The speed of each point is chosen uniformly at random from a fixed set of possible speeds. At each timestep, a fixed uniformly-chosen fraction of objects issue queries, and a fixed fraction of objects change velocity. To handle boundaries, we make the space a torus, so that objects that exit the space in the upper right corner reappear in the lower left corner with the same velocity.

  • Gaussian Workload

    The gaussian workload models a skewed workload in which objects cluster around a fixed set of hotspots. The hotspots are initialized at uniform random locations, and points are distributed into rings around each hotspot according to a Gaussian distribution. The rings affect the speed and update frequency of the points -- those points closer to a hotspot move faster and update more frequently.

  • Network Workload

    To model constrained motion in space, we use a simple network model that models random motion on three real road networks from Oldenburg, Singapore, and San Francisco. Objects are placed randomly on network edges and move randomly over time. When an object reaches a network node, it chooses a random edge incident to that node and proceeds with a new speed. If an object overshoots the next node during a tick, its position is corrected using a special position update in the trace. Each object queries with a fixed probability at every tick.

  • Fish Simulation

    To evaluate our methods on a realistic application, we implemented a model of schooling fish proposed in Nature by Couzin et al. In this model, each fish is attracted to other fish within a distance α, but repelled by fish within distance between ρ if there are no fish within α. We implement this using two spatial joins -- the first finds all fish within distance α, and the second finds fish within distance ρ for those fish for which the first join produced no results.

Publications

Software

New version is coming soon!

Advisory Board

In order to encourage community participation, we have formed an advisory board to provide feedback and guide our work. We are fortunate to have the guidance of the following distinguished researchers:

Funding

This research has been supported by the National Science Foundation under Grants IIS-0911036 and IIS-1012593, by the Air Force Office of Scientific Research under Award FA9550-10-1-0202, the iAd Project funded by the Research Council of Norway, and by gifts from NEC and Google. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.