Mirek Riedewald

Research Associate
Department of Computer Science
4105-C Upson Hall

Cornell University
Ithaca, NY 14853

phone: +1-607-255-0110, fax (dept): +1-607-255-4428
 

Research Interests

My general areas of interest are databases and information systems. Currently I am focusing on the following areas:

Cayuga: Managing Data Streams

We have developed novel techniques for addressing resource constraints in data stream management systems [DGR03, DGR05], including an optimal offline algorithm, hardness results, and efficient heuristics. We also developed an approach for fast approximation of aggregate queries over spatial data [DGR04]. This technique can be used for highly accurate selectivity estimation in spatial databases, e.g., for spatial joins and range queries. Its main advantage over previous work in the area is that approximation quality guarantees (confidence intervals) are returned to the user.

Currently we are focusing on building a highly scalable data stream processing system called Cayuga. Cayuga's main feature is that it can support very high throughput, up to thousands of events per second depending on the application, even if it has to process tens of thousands of active stream monitoring queries. To achieve this kind of scalability, we designed a novel query language with formal semantics. Cayuga query expressions can be compiled into extended non-deterministic finite state machines with support for parameterization and aggregates [DGHRW06]. Scalability is achieved by aggressive indexing and optimized memory management (see [DGPR+07]). Cayuga can support a variety of applications, ranging from monitoring of large distributed computing systems and networks, automated stock trading, Business Activity Monitoring (BAM), and Business Process Management (BPM), all the way to expressive publish-subscribe for intelligent filtering and dissemination of RSS feeds and blogs [BDGH+07].
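As a rough illustration of how an event pattern can be evaluated by a non-deterministic state machine whose instances carry bound parameters and a running aggregate, consider the minimal Python sketch below. It is not Cayuga's query language or implementation: the names Event, Instance, and rising_runs, and the "rising price run" pattern itself, are assumptions made up for this example, and the indexing and memory management mentioned above are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    symbol: str
    price: float


@dataclass
class Instance:
    """One active automaton instance: a state plus its bound parameters."""
    state: int         # number of events matched so far in this run
    symbol: str        # parameter bound by the first matched event
    last_price: float  # price of the most recently matched event
    total: float       # running aggregate: sum of prices in the run


def rising_runs(events, length):
    """Yield (symbol, average price) for each run of `length` strictly rising
    prices of one symbol. Hypothetical pattern; assumes length >= 2."""
    active = []
    for ev in events:
        next_active = []
        for inst in active:
            if ev.symbol != inst.symbol:
                # Events for other symbols leave the instance unchanged.
                next_active.append(inst)
            elif ev.price > inst.last_price:
                # Advance the state and update the aggregate.
                adv = Instance(inst.state + 1, inst.symbol,
                               ev.price, inst.total + ev.price)
                if adv.state == length:
                    yield adv.symbol, adv.total / length
                else:
                    next_active.append(adv)
            # Otherwise the run is broken and the instance is dropped.
        # Non-determinism: every event may also start a new instance.
        next_active.append(Instance(1, ev.symbol, ev.price, ev.price))
        active = next_active


if __name__ == "__main__":
    stream = [Event("IBM", 10), Event("MSFT", 5), Event("IBM", 11),
              Event("IBM", 12), Event("IBM", 9), Event("IBM", 13)]
    print(list(rising_runs(stream, 3)))
```

In this sketch every incoming event is checked against every active instance; a system that serves tens of thousands of queries would need indexes to avoid that linear scan.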

Our experience with Cayuga led to several related results. We developed novel techniques for efficiently processing a large number of concurrently active join queries, which correlate the contents of multiple streams of XML documents [HDGK+07]. We also developed an axiomatic framework for temporal models for event processing [WRGD07]. Using this framework, we show that requirements for "reasonable" semantics of event pattern queries dramatically limit the choice of an appropriate temporal model.

[HDGK+07] M. Hong, A. Demers, J. Gehrke, C. Koch, M. Riedewald, and W. White. Massively Multi-Query Join Processing in Publish/Subscribe Systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 761-772, 2007
[BDGH+07] L. Brenna, A. Demers, J. Gehrke, M. Hong, J. Ossher, B. Panda, M. Riedewald, M. Thatte, and W. White. Cayuga: A High-Performance Event Processing Engine (Demo Paper). In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1100-1102, 2007
[WRGD07] W. White, M. Riedewald, J. Gehrke and A. Demers. What is "Next" in Event Processing? In Proc. ACM Symp. on Principles of Database Systems, pages 263-272, 2007
[DGPR+07] A. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White. Cayuga: A General Purpose Event Monitoring System. In Proc. Biennial Conf. on Innovative Data Systems Research (CIDR), pages 411-422, 2007
[DGHRW06] A. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. White. Towards Expressive Publish/Subscribe Systems. In Proc. Int. Conf. on Extending Database Technology (EDBT), pages 627-644, 2006
[DGR05] A. Das, J. Gehrke, M. Riedewald. Semantic Approximation of Data Stream Joins. In IEEE Transactions on Knowledge and Data Engineering 17(1):44-59 (2005)
[DGR04] A. Das, J. Gehrke, and M. Riedewald. Approximation Techniques for Spatial Data. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2004
[DGR03] A. Das, J. Gehrke, M. Riedewald. Approximate Join Processing over Data Streams. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 40-51, 2003

eScience: Data Management and Analysis Services for the Sciences

Since September 2004 I have been co-PI on an NSF ITR award whose goal is to develop novel approaches for tracking environmental change based on bird abundance data. Currently we are mining a wealth of observational data hosted by Cornell's Lab of Ornithology in order to determine the relationship between environmental features and the abundance of wild bird species in North America [CEMR+06]. A major direction of our research is to develop highly accurate prediction models; this work has already resulted in a novel regression technique that produces better predictions than state-of-the-art methods [SCR07]. We also recently started to explore new approaches that enable scientists to discover interesting patterns in the complex prediction models trained from the collected data.

I have collaborated with scientists from different areas since 1999. As a graduate student I designed new summarization techniques for digital libraries, e.g., [RAE01, RAE01b]. In recent collaborations with physicists at SLAC and Cornell's Wilson Lab, the emphasis was on mining high-energy physics data and on managing metadata and provenance for elementary particle physics. Our ongoing collaboration with the Cornell Astronomy department is surveyed in [CCD+04]. Data flow challenges for managing and analyzing astronomy data, elementary particle physics data, and snapshots of the WWW are discussed in [AAC+06]. We recently started working with researchers in Cornell's Sibley School of Mechanical and Aerospace Engineering. The goal of this collaboration is to improve the performance of long-running, complex combustion simulations [PRPG+06, PRGP07].

[SCR07] D. Sorokina, R. Caruana, and M. Riedewald. Additive Groves of Regression Trees. In Proc. European Conf. on Machine Learning (ECML), pages 323-334, 2007 (Best Student Paper)
[PRGP07] B. Panda, M. Riedewald, J. Gehrke, and S. B. Pope. High-Speed Function Approximation. In Proc. IEEE Int. Conf. on Data Mining (ICDM), 2007
[PRPG+06] B. Panda, M. Riedewald, S. B. Pope, J. Gehrke, L. P. Chew. Indexing for Function Approximation. In Proc. Int. Conf. on Very Large Databases (VLDB), pages 523-534, 2006
[CEMR+06] R. Caruana, M. Elhawary, A. Munson, M. Riedewald, D. Sorokina, D. Fink, W. M. Hochachka, S. Kelling. Mining Citizen Science Data to Predict Prevalence of Wild Bird Species. In Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 909-915, 2006
[AAC+06] W. Y. Arms, S. Aya, M. Calimlim, J. Cordes, J. Deneva, P. Dmitriev, J. Gehrke, L. Gibbons, C. D. Jones, V. Kuznetsov, D. Lifka, M. Riedewald, D. Riley, A. Ryd, and G. J. Sharp. Three Case Studies of Large-Scale Data Flows. In Proc. IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow). 2006
[CCD+04] M. Calimlim, J. Cordes, A. Demers, J. Deneva, J. Gehrke, D. Kifer, M. Riedewald, and J. Shanmugasundaram. A Vision for PetaByte Data Management and Analysis Services for the Arecibo Telescope. Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, 27(4), 2004
[RAE01] M. Riedewald, D. Agrawal, A. El Abbadi. Managing and Analyzing Massive Data Sets with Data Cubes. In J. Abello, P. M. Pardalos, and M. G. C. Resende, editors, Handbook of Massive Data Sets. Kluwer Academic Publishers, 2001
[RAE01b] M. Riedewald, D. Agrawal, A. El Abbadi. Flexible Data Cubes for Online Aggregation. In Proc. Int. Conf. on Database Theory (ICDT), pages 159-173, 2001 (Copyright held by Springer-Verlag)

Selected Professional Activities

Invited Talks

Indexing for Function Approximation (Northwest Database Society seminar at University of Washington, Seattle, December 2006)
Indexing for Function Approximation (database and data mining seminar at Microsoft Research, Redmond, November 2006)
Towards Expressive and Scalable Publish/Subscribe (invited talk at Microsoft Research, Redmond, October 2005)
Cayuga: Internet-Scale Monitoring of Data Streams (CS colloquium at the University of Florida, Gainesville, April 2005)
Data Warehouse Meets Data Stream (Dagstuhl Perspectives Workshop: Data Warehousing at the Crossroads, August 2004)
Efficient Processing of Data Streams for Mining and Monitoring (35th Symp. on the Interface, Salt Lake City, Utah, March 2003)
Efficient Analysis of Massive Data in Data Warehouses and Data Stream Processing Systems (CS colloquium at the University of Rostock, Germany, December 2002)

Professional Service (most recent)

2008 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2008 Int. Symp. on Temporal Representation and Reasoning (TIME)
2008 Int. Workshop on Mining Multimedia Streams in Large-Scale Distributed Environments (MMSDE)
2008 Int. Workshop on Scalable Stream Processing Systems (SSPS)
2008 IEEE Int. Conf. on Computational Science and Engineering
2008 IEEE Int. Conf. on Intelligence and Security Informatics (ISI)
2007 Int. Conf. on Very Large Databases (VLDB), Program Committee
2007 AAAI Nectar (New sCientific and Technical Advances in Research), Program Committee
2007 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2007 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2007 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2007 Int. Workshop on Scalable Stream Processing Systems (SSPS), Program Committee
2006 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2006 Int. Conf. on Geosensor Networks (GSN)
2006 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2006 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2006 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2006 IEEE Int. Conf. on Data Engineering (ICDE), Program Committee
2005 ACM Conf. on Information and Knowledge Management (CIKM), Program Committee
2005 Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK), Program Committee
2005 ACM Int. Workshop on Data Warehousing and OLAP (DOLAP), Program Committee
2005 IEEE Int. Conf. on Intelligence and Security Informatics (ISI), Program Committee
2004 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Program Committee
2004 ACM SIGMOD Int. Conf. on Management of Data, Program Committee
2004 NSF/NIJ Symp. on Intelligence and Security Informatics (ISI), Program Committee
2003 Int. Conf. on Machine Learning (ICML), Program Committee
2003 Int. Conf. on Asian Digital Libraries (ICADL), Program Committee
2003 NSF/NIJ Symp. on Intelligence and Security Informatics (ISI), Program Committee

Reviewer for leading research journals: ACM Transactions on Database Systems (TODS), ACM Transactions on Information Systems (TOIS), VLDB Journal, IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Multimedia, IEEE Computer, Information Systems, Information Processing Letters (IPL), and others

11 Dec 2007