CAREER: Towards Sensor Database Systems
NSF Award IIS-0133481
Principal Investigator
Johannes E. Gehrke
Department of Computer Science
Cornell University
4105B Upson Hall
Ithaca, NY 14850
Phone: 607-255-1045
Fax: 607-255-4428
johannes@cs.cornell.edu
http://www.cs.cornell.edu/johannes
Keywords
Sensor network, wireless network, ad-hoc network, query processing, ubiquitous computing
Project Summary
The widespread distribution and availability of
small-scale sensors, actuators, and embedded processors are transforming the
physical world into a computing platform. One example is a sensor network
consisting of a large number of sensor nodes that combine physical sensing
capabilities, such as temperature, light, or seismic sensors, with networking and
computation capabilities. Applications
range from environmental control, warehouse inventory, and health
care to military environments. Existing sensor networks assume that the sensors
are preprogrammed and send data to a central frontend, where the data is
aggregated and stored for offline querying and analysis. This approach has two major drawbacks. First, the user cannot change the behavior of
the system on the fly. Second, it wastes resources:
communication in today's networks is orders of magnitude more expensive than
local computation, so in-network processing can vastly reduce resource usage
and thereby extend the lifetime of a sensor network.
Our work investigates a database approach to unite
the seemingly conflicting requirements of scalability and flexibility in
monitoring the physical world. The objective of our research is to build a new
distributed data management infrastructure that scales with the growth of
sensor interconnectivity and computational power on
the sensors over the next decades. Our system, called COUGAR, resides directly on
the sensor nodes and creates the abstraction of a single processing node
without centralizing data or computation.
COUGAR provides scalable, fault-tolerant, flexible data access and
intelligent data reduction, and its design involves a confluence of novel
research in database query processing, data mining, networking, and distributed
systems.
Publications and Products
- Zhiyuan Chen, J. E. Gehrke, and Flip Korn. Query Optimization In Compressed
Database Systems. In Proceedings of the 2001 ACM SIGMOD International
Conference on Management of Data (SIGMOD 2001), Santa Barbara, California, May 2001.
- J. E. Gehrke, Flip Korn, and Divesh Srivastava. On Computing
Correlated Aggregates Over Continual Data Streams. In Proceedings of the
2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001),
Santa Barbara, California, May 2001.
- Anton Faradjian, J. E. Gehrke, and Philippe Bonnet. GADT: A Probability
Space ADT For Representing and Querying the Physical World. In Proceedings
of the 18th International Conference on Data Engineering (ICDE 2002),
San Jose, California, February 2002.
- Francis Chu, Joseph Halpern, and J. E. Gehrke. Least Expected Cost
Query Optimization: What Can We Expect? In Proceedings of the 21st ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS
2002), Madison, Wisconsin, June 2002.
- Wai Fu Fung, David Sun, and J. E. Gehrke. COUGAR: The Network is the
Database. In Proceedings of the 2002 ACM SIGMOD International
Conference on Management of Data (SIGMOD 2002), Madison, Wisconsin, June 2002.
Demo description.
- Yong Yao and J. E. Gehrke. The Cougar Approach to In-Network Query
Processing in Sensor Networks. SIGMOD Record, Volume 31, Number 3,
September 2002.
- Yong Yao and J. E. Gehrke. Query Processing in Sensor Networks. In
Proceedings of the First Biennial Conference on Innovative Data Systems
Research (CIDR 2003), Asilomar, California, January 2003.
- Rohit Ananthakrishna, Abhinandan Das, J. E. Gehrke, Flip Korn,
S. Muthukrishnan, and Divesh Srivastava. Efficient Approximation of
Correlated Sums on Data Streams. IEEE Transactions on Knowledge and Data
Engineering, Vol. 15, No. 3, May/June 2003, pages 569-572.
- Tobias Mayr, Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri.
Leveraging Non-Uniform Resources for Parallel Query Processing. In Proceedings
of the 3rd IEEE/ACM International Symposium on Cluster Computing and the
Grid (CCGrid 2003), Tokyo, Japan, May 2003.
- Abhinandan Das, J. E. Gehrke, and Mirek Riedewald. Approximate Join
Processing Over Data Streams. In Proceedings of the 2003 ACM SIGMOD
International Conference on Management of Data (SIGMOD 2003), San Diego,
California, June 2003.
The project has a website with an online demo that
can be accessed at http://cougar.cs.cornell.edu.
Project Impact
- The project develops a novel way of thinking about sensor networks
and new methods for programming sensor networks. This new approach could
have a significant impact on industry.
- The project funds a graduate student with an expectation that his
work will lead to a Ph.D. thesis.
- The project involves three undergraduate students, two of them
participants in Cornell’s Presidential Research Scholars Program.
Goals, Objectives and Targeted Activities
The objective of this work is to design and implement a
distributed database system called Cougar for sensor networks that permits
programming of the sensor network through declarative queries. We believe that
declarative queries are especially suitable for sensor network interaction:
Clients issue queries without knowing how the results are generated, processed,
and returned to the client. Sophisticated catalog management, query
optimization, and query processing techniques will shield the user from the
physical details of contacting the relevant sensors, processing the sensor
data, and sending the results to the user.
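To illustrate what "declarative" means here, the following is a minimal Python sketch, not Cougar's actual query language or API. The Query class, its field names, and the in-memory list of readings are hypothetical stand-ins; in a real deployment the system, not the client, would decide how to contact sensors and where to compute the result.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical reading format: attribute name -> value.
Reading = dict[str, float]

# Aggregates the sketch understands (an assumption, not Cougar's catalog).
AGGREGATES: dict[str, Callable[[list[float]], float]] = {
    "avg": mean,
    "min": min,
    "max": max,
}


@dataclass(frozen=True)
class Query:
    """States WHAT is wanted; says nothing about HOW it is computed."""
    aggregate: str                      # e.g. "avg"
    attribute: str                      # e.g. "temp"
    where: Callable[[Reading], bool]    # selection predicate


def execute(query: Query, readings: list[Reading]) -> float:
    """Stand-in for the query processor: filter, then aggregate."""
    values = [r[query.attribute] for r in readings if query.where(r)]
    return AGGREGATES[query.aggregate](values)


readings = [
    {"floor": 1.0, "temp": 19.0},
    {"floor": 2.0, "temp": 21.0},
    {"floor": 2.0, "temp": 23.0},
]
# "Average temperature on floor 2", with no mention of nodes or routes.
avg_floor2 = execute(Query("avg", "temp", lambda r: r["floor"] == 2), readings)
print(avg_floor2)
```

The client code only names the aggregate, the attribute, and a predicate; everything inside execute could be replaced by a distributed plan without changing the query.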
One of the main goals of the Cougar system is to perform
in-network query processing, because battery-powered wireless sensor
networks must preserve limited resources such as energy and bandwidth.
Transmitting data back to a central node for offline storage,
querying, and analysis is very expensive for sensor networks of
non-trivial size, since radio communication consumes a lot of
energy. Because sensor nodes can perform local computation, part of the
computation can be moved from the clients into the sensor network, for
example by aggregating records or eliminating irrelevant records. Compared
to traditional centralized data extraction and analysis, in-network processing
reduces energy consumption and bandwidth usage by replacing
expensive communication with relatively cheap computation, which can extend the
lifetime of the sensor network significantly.
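The saving from in-network aggregation can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not Cougar's implementation: the Node class and the routing tree are hypothetical, every radio transmission over one link counts as one message, and message sizes are ignored. The sketch computes an average by passing partial aggregates (sum, count) up the tree and compares the message count with forwarding every raw reading to the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A sensor node in a hypothetical routing tree rooted at the gateway."""
    reading: float
    children: list["Node"] = field(default_factory=list)


def centralized_cost(node: Node, depth: int = 0) -> int:
    """Messages needed to forward every raw reading hop by hop to the root.

    A reading produced at depth d crosses d links, so it costs d messages.
    """
    return depth + sum(centralized_cost(c, depth + 1) for c in node.children)


def in_network_avg(node: Node) -> tuple[float, int, int]:
    """Return (sum, count, messages) for the subtree rooted at node.

    Each non-root node sends one partial aggregate (sum, count) to its
    parent, so the message count equals the number of links in the tree.
    """
    total, count, messages = node.reading, 1, 0
    for child in node.children:
        s, c, m = in_network_avg(child)
        total += s
        count += c
        messages += m + 1  # one message carries the child's partial aggregate
    return total, count, messages


# Example tree: root -> a -> b -> {c, d}
leaf_c, leaf_d = Node(21.0), Node(23.0)
b = Node(22.0, [leaf_c, leaf_d])
a = Node(20.0, [b])
root = Node(19.0, [a])

total, count, msgs = in_network_avg(root)
print("average:", total / count)
print("in-network messages:", msgs)
print("centralized messages:", centralized_cost(root))
```

In this five-node tree the in-network plan sends one message per link (4), while forwarding raw readings costs one message per hop of every reading (9); the gap widens as the tree gets deeper.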
These goals encompass the following research
activities:
- Research on scalable and adaptive distributed query processing
techniques for sensor data queries that allow us to trade off resource
usage versus the quality of the query answer.
- Algorithms for computation of aggregate queries over data streams
in a single pass with limited memory.
- New data types that model the uncertainty inherent in sensor data.
- A prototype that integrates our research into a working system for
use in research and education.
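As an illustration of single-pass aggregation with limited memory, here is a minimal Python sketch that maintains a running mean and variance in constant space. It uses Welford's method, a standard textbook technique chosen for the example; it is not one of the algorithms developed in this project, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass mean and variance in O(1) memory (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one stream element into the summary; x is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self.m2 / self.n if self.n else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
print(stats.n, stats.mean, stats.variance)
```

The summary holds three numbers regardless of stream length, which is the property that matters on a memory-constrained node. The example stream has mean 5 and population variance 4, which the summary reproduces up to floating-point rounding.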
Area Background
Recent developments in hardware have enabled the
widespread deployment of sensor networks consisting of small sensor nodes with
sensing, computation, and communication capabilities. Networked
sensors measuring only a few cubic inches can already be purchased commercially,
and Moore's law suggests that cheap,
small, and powerful components will be ubiquitous in the near future. Our
research deals with battery-powered sensor nodes that communicate via wireless
multi-hop radio. Such sensor nodes are subject to the following
constraints:
- Communication. The wireless network connecting the sensor nodes
usually provides only a very limited quality of service:
latency has high variance, bandwidth is limited, and packets are
frequently dropped.
- Power consumption. Sensor nodes have a limited supply of energy, and
thus energy conservation needs to be one of the main system design considerations
of any sensor network application.
- Computation. Sensor nodes have limited computing power and memory
sizes. This restricts the types of data processing algorithms that can run
on a sensor node, and it restricts the sizes of intermediate results that can be
stored on the sensor nodes.
- Uncertainty in sensor readings. Signals detected at physical
sensors have inherent uncertainty, and they may contain noise from the
environment. Sensor malfunction
might generate inaccurate data, and unfortunate sensor placement (such as
a temperature sensor directly next to the air conditioner) might bias
individual readings.
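One way to make the uncertainty in a reading explicit is to represent it as a probability distribution rather than a single number. The Python sketch below assumes a Gaussian error model with a known standard deviation; the GaussianReading class and its methods are hypothetical and are not the design of the GADT data type listed under Publications.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianReading:
    """A sensor reading modeled as a normal distribution (an assumption)."""
    mean: float
    stddev: float

    def cdf(self, x: float) -> float:
        """Probability that the true value is at most x."""
        z = (x - self.mean) / (self.stddev * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))

    def prob_between(self, low: float, high: float) -> float:
        """Probability that the true value lies in [low, high]."""
        return self.cdf(high) - self.cdf(low)

    def prob_above(self, threshold: float) -> float:
        """Probability that the true value exceeds threshold."""
        return 1.0 - self.cdf(threshold)


# A temperature sensor reporting 25 degrees with 1 degree standard deviation.
temp = GaussianReading(mean=25.0, stddev=1.0)
print(temp.prob_between(24.0, 26.0))
print(temp.prob_above(27.0))
```

With such a type, a query can ask for readings that exceed a threshold with at least a given probability instead of comparing raw values, so a single noisy sample near the threshold is treated differently from a confident one.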
Area References
- Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. Querying the
Physical World. IEEE Personal Communications, Vol. 7, No. 5, October 2000,
pages 10-15. Special Issue on Smart Spaces and Environments.
- Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and
Wei Hong. TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks.
OSDI, December 2002.
- Yong Yao and J. E. Gehrke. Query Processing in Sensor Networks. In
Proceedings of the First Biennial Conference on Innovative Data Systems
Research (CIDR 2003), Asilomar, California, January 2003.
- Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and
Wei Hong. The Design of an Acquisitional Query Processor for Sensor
Networks. SIGMOD, June 2003, San Diego, CA.
Related Projects
Project Websites
http://www.cs.cornell.edu/database/cougar.
This is the project webpage for the Cornell Cougar
Sensor Data Management Project.