Johannes Gehrke
Assistant Professor
johannes@cs.cornell.edu
http://www.cs.cornell.edu/johannes/
Ph.D. University of Wisconsin-Madison, 1999
My primary research interest is in the development of new data mining and database technology.
In the Himalaya Data Mining Project, we have been building a scalable, high-performance mining engine for interactive, spreadsheet-style data analysis. We are now looking at the online mining of high-speed data streams and its application to electronic commerce and intrusion detection. We are working on a distributed mining and monitoring system with which we can deploy, manage, and query a large number of lightweight stream mining components.
In the Cougar Device Database System, we develop database technology for querying the physical world. The widespread deployment of sensors and mobile devices is transforming our environment into a computing platform. There is now computing power on every device, and emerging networking techniques ensure that devices are interconnected and accessible from local- or wide-area networks. The result is a distributed database system of unprecedented scale. We have implemented the first generation of the Cougar Device Database System, in which we leverage the processing power on the devices to push query processing directly to the data sources. Different query processing strategies allow us to balance resource usage, accuracy, and speed of query answers. Our current research focuses on distributed and fault-tolerant query processing and metadata management.
Awards
- IBM Faculty Development Award.
University Activities
- Member: Graduate Admissions Committee.
Professional Activities
- Program Committee Member: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, May 2000; ACM SIGMOD 1st International Conference on Web-Age Information Management (WAIM '2000), Shanghai, China, June 2000; 12th International Conference on Software Engineering and Knowledge Engineering (SEKE 2000), Chicago, IL, June 2000; 6th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Boston, MA, August 2000.
- Editorial Board: Knowledge and Information Systems.
Lectures
- Classification and Regression: Money *can* grow on trees. Tutorial. 5th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 1999 (with W.-Y. Loh and R.
Ramakrishnan).
- Optimistic Decision Tree Construction. Invited talk. Fall 1999 Meeting of the Institute for
Operations Research and the Management Sciences, Philadelphia, PA, November
1999.
- Decision Trees and Predictive Rules. Invited tutorial. 2000 International Conference on
Data Engineering, San Diego, CA, March 2000.
- Decision Tree Construction. Tutorial. 4th Pacific-Asia
Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, April 2000.
- An Overview of Modern Data Mining Technology. Short course. Financial Industry Solutions Center, New York, NY, May 2000.
- Mining Large Databases: Present and Future. Invited talk. First Annual Meeting of the Advanced Cluster Computing Consortium, Ithaca, NY, June 2000.
Publications
- "Database Management Systems." Second Edition. McGraw Hill (1999) (with R.
Ramakrishnan).
- "Mining Very Large Databases." IEEE Computer, Vol. 32, No. 9 (August 1999),
38-45 (with V. Ganti and R. Ramakrishnan).
- "CACTUS - Clustering Categorical Data Using Summaries." In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 1999 (with V. Ganti and R. Ramakrishnan).
- "Online Scheduling to Minimize Average Stretch." Proceedings
of the 40th Annual Symposium on Foundations of Computer Science, New York, NY,
October 1999 (with S.
Muthukrishnan, R. Rajaraman, and A. Shaheen).
- "DEMON: Mining and Monitoring Evolving Data." Proceedings of the 16th International Conference on
Data Engineering, San Diego, CA, March 2000 (with V. Ganti and R.
Ramakrishnan).
- "RAINFOREST - A Framework for Fast Decision Tree Construction of Large
Datasets." Data Mining and Knowledge Discovery, Volume 4, Issue 2/3 (July 2000),
127-162 (with R. Ramakrishnan and V. Ganti).