Publications

Books

  1. Raghu Ramakrishnan and Johannes E. Gehrke. Database Management Systems, Third Edition, 2002. McGraw Hill.
  2. Raghu Ramakrishnan and Johannes E. Gehrke. Database Management Systems, Second Edition, 1999. McGraw Hill.

Journal Papers

  1. Abhinandan Das, J. E. Gehrke, and Mirek Riedewald. Approximate Join Processing Over Data Streams. To appear in IEEE Transactions on Knowledge and Data Engineering.

  2. Cristian Bucila, J. E. Gehrke, Daniel Kifer, and Walker White. DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. Data Mining and Knowledge Discovery, Vol. 7, Issue 4, July 2003, pages 241-272.

  3. Rohit Ananthakrishna, Abhinandan Das, J. E. Gehrke, Flip Korn, S. Muthukrishnan, and Divesh Srivastava. Efficient Approximation of Correlated Sums on Data Streams. IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 3, May/June 2003, pages 569-572.

  4. Paul S. Bradley, J. E. Gehrke, Raghu Ramakrishnan and Ramakrishnan Srikant. Philosophies and Advances in Scaling Mining Algorithms to Large Databases. Communications of the ACM, August 2002.

  5. Venkatesh Ganti, J. E. Gehrke, Raghu Ramakrishnan, and W.-Y. Loh. A Framework for Measuring Changes in Data Characteristics. Journal of Computer and System Sciences, Vol. 64, No. 3, May 2002, pages 542-578.
  6. Venkatesh Ganti, J. E. Gehrke, and Raghu Ramakrishnan. DEMON: Mining and Monitoring Evolving Data. IEEE Transactions on Knowledge and Data Engineering, Vol. 13, No. 1, January/February 2001, pages 50-63.
  7. Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. Querying the Physical World. IEEE Personal Communications, Vol. 7, No. 5, October 2000, pages 10-15. Special Issue on Smart Spaces and Environments.
  8. J. E. Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti. RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets. In Data Mining and Knowledge Discovery, Volume 4, Issue 2/3, July 2000, pages 127-162. Preliminary version: J. E. Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti. RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets. In Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, New York, 1998.
  9. Venkatesh Ganti, J. E. Gehrke, and Raghu Ramakrishnan. Mining very large databases. IEEE Computer, Vol. 32, No. 9, August 1999, pages 38-45.
  10. J. E. Gehrke, C. G. Plaxton, and R. Rajaraman. Rapid convergence of a local load balancing algorithm for asynchronous rings. Theoretical Computer Science, Vol. 220, No. 1, June 1999. (Preliminary version in Proceedings of the 11th International Workshop on Distributed Algorithms, Saarbrücken, Germany, Lecture Notes in Computer Science, no. 1320, M. Mavronicolas and P. Tsigas (Eds.), pages 81-95, September 1997.)
  11. S. K. Baruah, J. E. Gehrke, C. G. Plaxton, I. Stoica, H. Abdel-Wahab, and K. Jeffay. Fair on-line scheduling of a dynamic set of tasks on a single resource. Information Processing Letters, 64:43-51, 1997. (Preliminary version: S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. Fair on-line scheduling of a dynamic set of tasks on a single resource. Department of Computer Science, University of Texas at Austin, Technical Report TR-96-03, 12 pages, February 1996.)

Refereed Conference Papers

  1. Dan Kifer, Shai Ben-David, and Johannes Gehrke. Detecting Change in Data Streams. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004). Toronto, Canada. August 2004.

  2. Abhinandan Das, Mirek Riedewald, and Johannes Gehrke. Approximation Techniques for Spatial Data. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004). Paris, France, June 2004.

  3. Adina Crainiceanu, Prakash Linga, Johannes Gehrke, and Jayavel Shanmugasundaram. P-Tree: A P2P Index for Resource Discovery Applications. In Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). New York, NY, May 2004. Poster paper.

  4. Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, Johannes Gehrke, and Jayavel Shanmugasundaram. A Storage and Indexing Framework for P2P Systems. In Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). New York, NY, May 2004. Poster paper.

  5. Alin Dobra, Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Sketch-Based Multi-Query Processing over Data Streams. To appear in Proceedings of the 9th International Conference on Extending Database Technology (EDBT 2004). Heraklion-Crete, Greece, March 2004.

  6. David Kempe, Alin Dobra, and J. E. Gehrke. Computing Aggregate Information using Gossip. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science. Cambridge, MA, October 2003.

  7. Abhinandan Das, J. E. Gehrke, and Mirek Riedewald. Approximate Join Processing Over Data Streams. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003). San Diego, CA, June 2003.

  8. Daniel Kifer, J. E. Gehrke, Cristian Bucila, and Walker White. How to Quickly Find a Witness. In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2003). San Diego, CA, June 2003.

  9. Alexandre Evfimievski, J. E. Gehrke, and Ramakrishnan Srikant. Limiting Privacy Breaches in Privacy Preserving Data Mining. In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2003). San Diego, CA, June 2003.

  10. Tobias Mayr, Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. Leveraging Non-Uniform Resources for Parallel Query Processing. In Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003). Tokyo, Japan, May 2003.

  11. Yong Yao and J. E. Gehrke. Query Processing in Sensor Networks. In Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, California, January 2003.

  12. Cristian Bucila, J. E. Gehrke, Daniel Kifer, and Walker White. DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada, July 2002.

  13. Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and J. E. Gehrke. Privacy Preserving Mining of Association Rules. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada, July 2002.

  14. Shai Ben-David, J. E. Gehrke, and Reba Schuller. A Theoretical Framework for Learning from a Pool of Disparate Data Sources. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada, July 2002.

  15. Jay Ayres, J. E. Gehrke, Tomi Yiu, and Jason Flannick. Sequential PAttern Mining Using Bitmaps. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada, July 2002.

  16. Alin Dobra and Johannes Gehrke. SECRET: A Scalable Linear Regression Tree Algorithm. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada, July 2002.

  17. Francis Chu, Joseph Halpern, and J. E. Gehrke. Least Expected Cost Query Optimization: What Can We Expect? In Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2002). Madison, Wisconsin, June 2002.

  18. Alin Dobra, Minos Garofalakis, J. E. Gehrke, and Rajeev Rastogi. Processing Complex Aggregate Queries over Data Streams. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, June 2002.
  19. Wai Fu Fung, David Sun, and J. E. Gehrke. COUGAR: The Network is the Database. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), Madison, Wisconsin, June 2002. Demo description.

  20. Anton Faradjian, J. E. Gehrke, and Philippe Bonnet. GADT: A Probability Space ADT For Representing and Querying the Physical World. In Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), San Jose, California, February 2002.
  21. Alin Dobra and J. E. Gehrke. Bias Correction in Classification Tree Construction. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2001), Williams College, Massachusetts, June 2001.
  22. Zhiyuan Chen, J. E. Gehrke, and Flip Korn. Query Optimization In Compressed Database Systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, California, May 2001.
  23. J. E. Gehrke, Flip Korn, and Divesh Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, California, May 2001.
  24. Doug Burdick, Manuel Calimlim, and J. E. Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, April 2001.
  25. Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. Towards Sensor Database Systems. In Proceedings of the Second International Conference on Mobile Data Management. Hong Kong, January 2001. 
  26. S. Muthukrishnan, R. Rajaraman, A. Shaheen, and J. E. Gehrke. Scheduling to Minimize Average Stretch. In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, October 1999. A preliminary version appeared as DIMACS Technical Report 99-02, January 1999.
  27. Venkatesh Ganti, J. E. Gehrke, and Raghu Ramakrishnan. DEMON: Mining and Monitoring Evolving Data. In Proceedings of the 16th International Conference on Data Engineering, San Diego, California, 2000. Best student paper award.
  28. Venkatesh Ganti, J. E. Gehrke, and Raghu Ramakrishnan. CACTUS--Clustering Categorical Data Using Summaries. In Proceedings of the 1999 SIGKDD Conference, San Diego, California, 1999.
  29. J. E. Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, and Wei-Yin Loh. BOAT -- Optimistic Decision Tree Construction. In Proceedings of the 1999 SIGMOD Conference, Philadelphia, Pennsylvania, 1999.
  30. Venkatesh Ganti, J. E. Gehrke, Raghu Ramakrishnan, and Wei-Yin Loh. A Framework for Measuring Changes in Data Characteristics. In Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Philadelphia, Pennsylvania, 1999. (Invited to Journal of Computer and System Sciences (JCSS).)
  31. Venkatesh Ganti, Raghu Ramakrishnan, J. E. Gehrke, Allison L. Powell, and James French. Clustering Large Datasets in Arbitrary Metric Spaces. In Proceedings of the Fifteenth International Conference on Data Engineering, Sydney, Australia, 1999.
  32. J. E. Gehrke, Raghu Ramakrishnan and Venkatesh Ganti. RainForest – A Framework for Fast Decision Tree Construction of Large Datasets. Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, New York, 1998.
  33. Rakesh Agrawal, J. E. Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. In Proceedings of the 1998 SIGMOD Conference, Seattle, Washington, 1998.
  34. Michael J. Carey, David J. DeWitt, Jeffrey F. Naughton, Mohammad Asgarian, J. E. Gehrke, and Dhaval N. Shah. The BUCKY Object-Relational Benchmark. In Proceedings of the 1997 SIGMOD Conference, Tucson, Arizona, May 1997. More material, including the data generator used in the benchmark.
  35. I. Stoica, H. Abdel-Wahab, K. Jeffay, S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. A proportional share resource allocation algorithm for real-time, time-shared systems. In Proceedings of the 17th Annual IEEE Real-Time Systems Symposium, Washington, DC, pages 288-299, December 1996.
  36. S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. Fast scheduling of periodic tasks on multiple resources. In Proceedings of the 9th International Parallel Processing Symposium, Santa Barbara, California, pages 280-288, April 1995. (Expanded version as Department of Computer Science, University of Texas at Austin, Technical Report TR-95-02, 21 pages, February 1995.)

Workshop Publications

  1. Adina Crainiceanu, Prakash Linga, Johannes Gehrke, and Jayavel Shanmugasundaram. Querying Peer-to-Peer Networks Using P-Trees. In Proceedings of the Seventh International Workshop on the Web and Databases (WebDB 2004). Paris, France, June 2004. An expanded version of this paper is available as Cornell University Computing and Information Science Technical Report TR2004-1926.

  2. Doug Burdick, Manuel Calimlim, Jason Flannick, Johannes Gehrke, and Tomi Yiu. MAFIA: A Performance Study of Mining Maximal Frequent Itemsets. Workshop on Frequent Itemset Mining Implementations (FIMI'03). Melbourne, Florida, November 2003.
  3. Alan Demers, Johannes Gehrke, Rajmohan Rajaraman, Niki Trigoni, and Yong Yao. Energy-Efficient Data Management for Sensor Networks: A Work-In-Progress Report. 2nd IEEE Upstate New York Workshop on Sensor Networks. Syracuse, NY, October 2003.

Other Publications

  1. Adina Crainiceanu, Prakash Linga, Ashwin Machanavajjhala, Johannes Gehrke, and Jayavel Shanmugasundaram. P-Ring: An Index Structure for Peer-to-Peer Systems. Cornell University Computing and Information Science Technical Report TR2004-1946. July 2004.

  2. Adina Crainiceanu, Prakash Linga, Johannes Gehrke, and Jayavel Shanmugasundaram. Querying Peer-to-Peer Networks Using P-Trees. Cornell University Computing and Information Science Technical Report TR2004-1926.

  3. Alan Demers, Johannes Gehrke, Rajmohan Rajaraman, Niki Trigoni, and Yong Yao. The Cougar Project: A Work-In-Progress Report. In SIGMOD Record, Volume 32, Number 4, December 2003.

  4. Anastassia Ailamaki and J. E. Gehrke. Time Management for New Faculty. SIGMOD Record, Volume 32, Number 2, June 2003.

  5. Alan Demers, J. E. Gehrke, and Mirek Riedewald. Research Issues in Distributed Mining and Monitoring. In the informal proceedings of the National Science Foundation Workshop on Next Generation Data Mining (NGDM 2002). Baltimore, Maryland, November 2002.

  6. Yong Yao and J. E. Gehrke. The Cougar Approach to In-Network Query Processing in Sensor Networks. SIGMOD Record, Volume 31, Number 3, September 2002.

  7. Venkatesh Ganti, J. E. Gehrke, and Raghu Ramakrishnan. Mining Data Streams under Block Evolution. Invited paper to SIGKDD Explorations, Volume 3, Issue 2, January 2002.

  8. J. E. Gehrke. Report on the SIGKDD 2001 Conference Panel “New Research Directions in KDD”. SIGKDD Explorations, Volume 3, Issue 2, January 2002.

  9. Roberto Bayardo and J. E. Gehrke. Report on the Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2001). SIGKDD Explorations, Volume 3, Issue 1, July 2001.

  10. Scalable Decision Tree Construction. In Newsletter of the Technical Committee on Distributed Processing, Spring 2001, pages 16-23.

Tutorials and Short Courses

  1. Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Querying and Mining Data Streams: You Only Get One Look. Tutorial at the following conferences:
  2. Internet Infrastructure and Applications. Lectures on Database Technology and Data Mining. Johnson Graduate School of Management, Cornell University. February/March 2002.

  3. Johannes Gehrke and Wei-Yin Loh. Advances in Decision Tree Construction. Tutorial at the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2001. Part I of the tutorial slides.

  4. Johannes Gehrke. The Infrastructure of Electronic Commerce. Lectures on Database Systems and Data Mining. Johnson Graduate School of Management, Cornell University. January 2001.
  5. Johannes Gehrke. An Overview of Modern Data Mining Technology. Workshop at the Financial Industry Solutions Center (FISC) New York. November 8, 2000.
  6. Johannes Gehrke. An Overview of Modern Data Mining Technology. Workshop at the Financial Industry Solutions Center (FISC) New York. May 31, 2000.
  7. Johannes Gehrke. The Infrastructure of Electronic Commerce. Lectures on Database Systems and Data Mining. Johnson Graduate School of Management, Cornell University. April/May 2000.
  8. Johannes Gehrke. Data Mining with Decision Trees. Tutorial at the Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, April 2000.
  9. Johannes Gehrke. Decision Trees and Predictive Rules. Invited tutorial at the Sixteenth International Conference on Data Engineering, San Diego, California, February 2000.
  10. Johannes Gehrke, Wei-Yin Loh, and Raghu Ramakrishnan. Data Mining with Decision Trees. (Slides and References.) Tutorial at the Fifth SIGKDD Conference, San Diego, California, August 1999.

Patents

  1. Rakesh Agrawal, J. E. Gehrke, Dimitrios Gunopulos and Prabhakar Raghavan. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. United States Patent No. 6003029.
  2. J. E. Gehrke, Venkatesh Ganti and Raghu Ramakrishnan. Method of Constructing Binary Decision Trees with Reduced Memory Access. United States Patent No. 6442561.