Johannes Gehrke

Assistant Professor
johannes@cs.cornell.edu
http://www.cs.cornell.edu/johannes/
Ph.D. University of Wisconsin-Madison, 1999

My primary research interest is in the development of new data mining and database technology.

In the Himalaya Data Mining Project, we have been building a scalable, high-performance mining engine for interactive, spreadsheet-style data analysis. We are now looking at the online mining of high-speed data streams and its application to electronic commerce and intrusion detection. We are working on a distributed mining and monitoring system with which we can deploy, manage, and query a large number of lightweight stream mining components.

In the Cougar Device Database System, we develop database technology for querying the physical world. The widespread deployment of sensors and mobile devices is transforming our environment into a computing platform. There is now computing power on every device, and emerging networking techniques ensure that devices are interconnected and accessible from local- or wide-area networks. The result is a distributed database system of unprecedented scale. We have implemented the first generation of the Cougar Device Database System, in which we leverage the processing power on the devices to push query processing directly to the data sources. Different query processing strategies allow us to balance resource usage, accuracy, and speed of query answers. Our current research focuses on distributed and fault-tolerant query processing and metadata management.

Awards

IBM Faculty Development Award.

University Activities

Member: Graduate Admissions Committee.

Professional Activities

Program Committee Member: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, May 2000; ACM SIGMOD 1st International Conference on Web-Age Information Management (WAIM 2000), Shanghai, China, June 2000; 12th International Conference on Software Engineering and Knowledge Engineering (SEKE 2000), Chicago, IL, June 2000; 6th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Boston, MA, August 2000.

Editorial Board: Knowledge and Information Systems.

Lectures

Classification and Regression: Money *can* grow on trees. Tutorial. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 1999 (with W.-Y. Loh and R. Ramakrishnan).

Optimistic Decision Tree Construction. Invited talk. Fall 1999 Meeting of the Institute for Operations Research and the Management Sciences, Philadelphia, PA, November 1999.

Decision Trees and Predictive Rules. Invited tutorial. 2000 International Conference on Data Engineering, San Diego, CA, March 2000.

Decision Tree Construction. Tutorial. 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, April 2000.

An Overview of Modern Data Mining Technology. Short course. Financial Industry Solutions Center, New York, NY, May 2000.

Mining Large Databases: Present and Future. Invited talk. First Annual Meeting of the Advanced Cluster Computing Consortium, Ithaca, NY, June 2000.

Publications

“Database Management Systems.” Second Edition. McGraw Hill (1999) (with R. Ramakrishnan).

“Mining Very Large Databases.” IEEE Computer, Vol. 32, No. 9 (August 1999), 38–45 (with V. Ganti and R. Ramakrishnan).

“CACTUS – Clustering Categorical Data Using Summaries.” In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 1999 (with V. Ganti and R. Ramakrishnan).

“Online Scheduling to Minimize Average Stretch.” Proceedings of the 40th Annual Symposium on Foundations of Computer Science, New York, NY, October 1999 (with S. Muthukrishnan, R. Rajaraman, and A. Shaheen).

“DEMON: Mining and Monitoring Evolving Data.” Proceedings of the 16th International Conference on Data Engineering, San Diego, CA, March 2000 (with V. Ganti and R. Ramakrishnan).

“RAINFOREST – A Framework for Fast Decision Tree Construction of Large Datasets.” Data Mining and Knowledge Discovery, Volume 4, Issue 2/3 (July 2000), 127–162 (with R. Ramakrishnan and V. Ganti).