CAREER: Towards Sensor Database Systems

NSF Award IIS-0133481

 

Principal Investigator

Johannes E. Gehrke

Department of Computer Science

Cornell University

4105B Upson Hall

Ithaca, NY 14850

Phone: 607-255-1045

Fax: 607-255-4428

johannes@cs.cornell.edu

http://www.cs.cornell.edu/johannes

 

Keywords

Sensor network, wireless network, ad-hoc network, query processing, ubiquitous computing

 

Project Summary

The widespread distribution and availability of small-scale sensors, actuators, and embedded processors are transforming the physical world into a computing platform. One example is a sensor network: a large number of sensor nodes that combine physical sensing capabilities, such as temperature, light, or seismic sensors, with networking and computation capabilities.  Applications range from environmental control and warehouse inventory to health care and military environments. Existing sensor networks assume that the sensors are preprogrammed and send data to a central frontend, where the data is aggregated and stored for offline querying and analysis.  This approach has two major drawbacks.  First, the user cannot change the behavior of the system on the fly.  Second, it ships all raw data to the frontend, even though communication in today's networks is orders of magnitude more expensive than local computation; in-network processing can vastly reduce resource usage and thereby extend the lifetime of a sensor network.

 

Our work investigates a database approach that reconciles the seemingly conflicting requirements of scalability and flexibility in monitoring the physical world. The objective of our research is to build a new distributed data management infrastructure that scales with the growth of sensor interconnectivity and of computational power on the sensors over the next decades. Our system, called COUGAR, resides directly on the sensor nodes and creates the abstraction of a single processing node without centralizing data or computation.  COUGAR provides scalable, fault-tolerant, flexible data access and intelligent data reduction, and its design involves a confluence of novel research in database query processing, data mining, networking, and distributed systems.

 

Publications and Products

  • Zhiyuan Chen, J. E. Gehrke, and Flip Korn. Query Optimization In Compressed Database Systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), Santa Barbara, California, May 2001.
  • J. E. Gehrke, Flip Korn, and Divesh Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), Santa Barbara, California, May 2001.
  • Anton Faradjian, J. E. Gehrke, and Philippe Bonnet. GADT: A Probability Space ADT For Representing and Querying the Physical World. In Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), San Jose, California, February 2002.
  • Francis Chu, Joseph Halpern, and J. E. Gehrke. Least Expected Cost Query Optimization: What Can We Expect? In Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2002), Madison, Wisconsin, June 2002.
  • Wai Fu Fung, David Sun, and J. E. Gehrke. COUGAR: The Network is the Database. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), Madison, Wisconsin, June 2002. Demo description.
  • Yong Yao and J. E. Gehrke. The Cougar Approach to In-Network Query Processing in Sensor Networks. SIGMOD Record, Volume 31, Number 3, September 2002.
  • Yong Yao and J. E. Gehrke. Query Processing in Sensor Networks. In Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, California, January 2003.
  • Rohit Ananthakrishna, Abhinandan Das, J. E. Gehrke, Flip Korn, S. Muthukrishnan, and Divesh Srivastava. Efficient Approximation of Correlated Sums on Data Streams. IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 3, May/June 2003, pages 569-572.
  • Tobias Mayr, Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. Leveraging Non-Uniform Resources for Parallel Query Processing. In Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), Tokyo, Japan, May 2003.
  • Abhinandan Das, J. E. Gehrke, and Mirek Riedewald. Approximate Join Processing Over Data Streams. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), San Diego, California, June 2003.

The project has a website with an online demo that can be accessed at http://cougar.cs.cornell.edu.

 

Project Impact

  • The project develops a novel way of thinking about sensor networks and new methods for programming sensor networks. This new approach could have significant impact on industry.
  • The project funds a graduate student with an expectation that his work will lead to a Ph.D. thesis.
  • The project involves three undergraduate students, two of them participants in Cornell’s Presidential Research Scholars Program.

 

Goals, Objectives and Targeted Activities

The objective of this work is to design and implement a distributed database system for sensor networks, called Cougar, that permits programming of the sensor network through declarative queries. We believe that declarative queries are especially suitable for sensor network interaction: clients issue queries without knowing how the results are generated, processed, and returned to the client. Sophisticated catalog management, query optimization, and query processing techniques will shield the user from the physical details of contacting the relevant sensors, processing the sensor data, and sending the results to the user.

 

One of the main goals of the Cougar system is to perform in-network query processing, because limited resources such as energy and bandwidth must be preserved in battery-powered wireless sensor networks. Transmitting data back to a central node for offline storage, querying, and data analysis is very expensive for sensor networks of non-trivial size, since radio communication consumes a lot of energy.  Since sensor nodes can perform local computation, part of the computation can be moved from the clients into the sensor network, where nodes aggregate records or eliminate irrelevant ones.  Compared to traditional centralized data extraction and analysis, in-network processing reduces energy consumption and bandwidth usage by replacing expensive communication with relatively cheap computation, which significantly extends the lifetime of the sensor network.
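
As a rough illustration of this trade-off (a minimal sketch, not the Cougar implementation), the following Python code computes an average over a hypothetical routing tree in two ways: by forwarding every raw reading to the root, and by merging partial (sum, count) records at each node. The names Node, centralized_avg, and in_network_avg are assumptions made for this example, and the message count assumes that a raw reading and a partial record each fit into one radio message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical sensor node in a routing tree rooted at the gateway."""
    reading: float
    children: List["Node"] = field(default_factory=list)


def centralized_avg(root):
    """Every node forwards its raw reading hop by hop to the root.

    Returns (average, messages); a reading that travels h hops
    counts as h single-hop radio messages.
    """
    readings, messages = [], 0
    stack = [(root, 0)]  # (node, number of hops from the root)
    while stack:
        node, depth = stack.pop()
        readings.append(node.reading)
        messages += depth
        stack.extend((child, depth + 1) for child in node.children)
    return sum(readings) / len(readings), messages


def in_network_avg(root):
    """Each node merges its children's partial (sum, count) records with
    its own reading and sends a single partial record to its parent."""
    messages = 0

    def partial(node):
        nonlocal messages
        total, count = node.reading, 1
        for child in node.children:
            child_total, child_count = partial(child)
            total += child_total
            count += child_count
            messages += 1  # one message per tree edge
        return total, count

    total, count = partial(root)
    return total / count, messages


# A four-node chain: root <- a <- b <- leaf.
leaf = Node(26.0)
b = Node(24.0, [leaf])
a = Node(22.0, [b])
root = Node(20.0, [a])
print(centralized_avg(root))
print(in_network_avg(root))
```

Both functions return the same average. In this sketch the centralized variant needs a number of messages equal to the sum of all node depths, while the in-network variant needs one message per tree edge, so the gap widens as the network gets deeper.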

 

These goals encompass the following research activities:

  • Research on scalable and adaptive distributed query processing techniques for sensor data queries that allow us to trade off resource usage versus the quality of the query answer.
  • Algorithms for computation of aggregate queries over data streams in a single pass with limited memory.
  • New data types that model the uncertainty inherent in sensor data.
  • A prototype that integrates our research into a working system for use in research and education.
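
To make the second activity concrete, here is a minimal sketch of a single-pass aggregate that uses constant memory. It uses Welford's well-known online algorithm for mean and variance; this is a standard textbook technique chosen for illustration, not necessarily one of the algorithms developed in this project, and the class name RunningStats is an assumption.

```python
class RunningStats:
    """Single-pass count, mean, variance, min, and max in O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, x):
        """Fold one stream element into the aggregate state."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)  # each value is seen once and never stored
print(stats.count, stats.mean, stats.variance)
```

The state kept per aggregate is five numbers regardless of stream length, which is the property that matters on a memory-constrained node.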

 

Area Background

Recent developments in hardware have enabled the widespread deployment of sensor networks consisting of small sensor nodes with sensing, computation, and communication capabilities. Already today, networked sensors measuring only a few cubic inches can be purchased commercially, and Moore's law suggests that cheap, small, and powerful components will be ubiquitous in the near future. Our research deals with sensor nodes that are powered by small batteries and communicate via wireless multi-hop radio. Such sensor nodes have the following resource constraints:

  • Communication. The wireless network connecting the sensor nodes usually provides only a very limited quality of service: it has high-variance latency and limited bandwidth, and it frequently drops packets.
  • Power consumption. Sensor nodes have a limited supply of energy, and thus energy conservation needs to be one of the main system design considerations of any sensor network application.
  • Computation. Sensor nodes have limited computing power and memory sizes. This restricts the types of data processing algorithms on a sensor node, and it restricts the sizes of intermediate results that can be stored on the sensor nodes.
  • Uncertainty in sensor readings. Signals detected at physical sensors have inherent uncertainty, and they may contain noise from the environment.  Sensor malfunction might generate inaccurate data, and unfortunate sensor placement (such as a temperature sensor directly next to the air conditioner) might bias individual readings.
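
One simple way to make the last constraint explicit in a data model is to store each reading as a probability distribution rather than a single number. The sketch below assumes Gaussian measurement noise and independent sensors; it is an illustration only and not the probability space ADT described in the project's publications. The class GaussianReading and its methods are hypothetical names, and stddev is assumed to be positive.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianReading:
    """A sensor reading modeled as a normal distribution (assumption)."""
    mean: float
    stddev: float  # must be > 0

    def prob_above(self, threshold):
        """Probability that the true value exceeds the threshold."""
        z = (threshold - self.mean) / (self.stddev * math.sqrt(2.0))
        return 0.5 * (1.0 - math.erf(z))

    def fuse(self, other):
        """Combine two independent readings of the same quantity by
        inverse-variance weighting; the more precise one counts more."""
        w_self = 1.0 / self.stddev ** 2
        w_other = 1.0 / other.stddev ** 2
        mean = (w_self * self.mean + w_other * other.mean) / (w_self + w_other)
        return GaussianReading(mean, math.sqrt(1.0 / (w_self + w_other)))


# A query such as "is the temperature above 21 degrees?" now returns a
# probability instead of a yes/no answer.
reading = GaussianReading(mean=25.0, stddev=2.0)
print(reading.prob_above(21.0))
print(GaussianReading(20.0, 1.0).fuse(GaussianReading(22.0, 1.0)))
```

Note that this model covers random noise only. A systematic bias, such as the sensor placed next to the air conditioner, would need additional information to detect and correct.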

 

Area References

  • Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. Querying the Physical World. IEEE Personal Communications, Vol. 7, No. 5, October 2000, pages 10-15. Special Issue on Smart Spaces and Environments.
  • Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks. OSDI, December, 2002.
  • Yong Yao and J. E. Gehrke. Query Processing in Sensor Networks. In Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, California, January 2003.
  • Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The Design of an Acquisitional Query Processor for Sensor Networks. SIGMOD, June 2003, San Diego, CA.

 

Related Projects

 

 

Project Websites

http://www.cs.cornell.edu/database/cougar

This is the project webpage for the Cornell Cougar Sensor Data Management Project.