The Cornell Database Group is interested in all aspects of data analysis and database management. This includes projects at the intersection of database systems and other areas, such as machine learning or natural language processing. For recent news, visit the Cornell Database Group homepage or follow us on Twitter.
Recent Publications
2023
- MEAP Book. AI-Assisted Data Science: Large Language Models for Multimodal Data Analysis. Immanuel Trummer. [Book]
- VLDBJ 2023. DB-BERT: a Database Tuning Tool that “Reads” the Manual. Immanuel Trummer.
- PVLDB 2023. Quantum-Inspired Digital Annealing for Join Ordering. Manuel Schönberger, Immanuel Trummer, Wolfgang Mauerer. [Paper]
- PVLDB 2023. Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. Simran Arora, Brandon Yang, Sabri Eyuboglu, Avanika Narayan, Andrew Hojel, Immanuel Trummer, Christopher Re. [Paper Code]
- PVLDB 2023. Can Large Language Models Predict Data Correlations from Column Names? Immanuel Trummer. [Paper Code]
- PVLDB 2023. ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning. Junxiong Wang, Immanuel Trummer, Ahmet Kara, Dan Olteanu. [Paper Code]
- PVLDB 2023. Demonstrating ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Joins via Reinforcement Learning. Junxiong Wang, Immanuel Trummer, Ahmet Kara, Dan Olteanu. [Code]
- PVLDB 2023. Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4. Immanuel Trummer. [Code]
- SIGMOD 2023 (Best Demo Runner Up). Demonstrating NaturalMiner: Searching Large Data Sets for Abstract Patterns Described in Natural Language. Immanuel Trummer. [Paper Code Talk]
- SIGMOD 2023. Demonstration of ThalamusDB: Answering Complex SQL Queries with Natural Language Predicates on Multi-Modal Data. Saehan Jo, Immanuel Trummer. [Paper Talk]
- VLDB 2023 QDSM. Quantum Optimisation of General Join Trees. Manuel Schönberger, Immanuel Trummer, Wolfgang Mauerer. [Paper]
2022
- PVLDB 2022. CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions Using GPT-3 Codex. Immanuel Trummer. [Paper Code Talk]
- PVLDB 2022. BABOONS: Black-Box Optimization of Data Summaries in Natural Language. Immanuel Trummer. [Paper Code]
- PVLDB 2022. From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management. Immanuel Trummer. [Paper Talk]
- PVLDB 2022. UDO: Universal Database Optimization Using Reinforcement Learning. Junxiong Wang, Immanuel Trummer, Debabrota Basu. [Paper Code Talk]
- PVLDB 2022. SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms. Ziyun Wei, Immanuel Trummer. [Paper]
- VLDB 2022, PhD Workshop. Building Learned Federated Query Optimizers. Victor Giannakouris, Immanuel Trummer. [Paper]
- SIGMOD 2022. Demonstrating DB-BERT: a Database Tuning Tool that “Reads the Manual”. Immanuel Trummer. [Paper Code Talk]
- AAAI 2022. Procrastinated Tree Search: Black-Box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. Junxiong Wang, Debabrota Basu, Immanuel Trummer. [Paper Code]
- SIGMOD 2022. DB-BERT: a Database Tuning Tool that “Reads the Manual”. Immanuel Trummer. [Paper Code Talk]
- CIDR 2022. Towards NLP-Enhanced Data Profiling Tools (Abstract). Immanuel Trummer. [Paper Code Talk]
2021
- TODS 2021 (“Best of SIGMOD” Edition). SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. Immanuel Trummer, Junxiong Wang, Ziyun Wei, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis. [Paper Code Talk]
- PVLDB 2021. The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that Read the Manual. Immanuel Trummer. [Paper Talk]
- PVLDB 2021. Robust Voice Querying with MUVE: Optimally Visualizing Results of Phonetically Similar Queries. Ziyun Wei, Immanuel Trummer, Connor Anderson. [Paper]
- IEEE Data Engineering Bulletin. WebChecker: Towards an Infrastructure for Efficient Misinformation Detection at Web Scale. Immanuel Trummer. [Paper Code]
- SIGMOD Record 2021. Database Tuning Using Natural Language Processing. Immanuel Trummer. [Paper]
- SIGMOD 2021. Demonstrating UDO: a Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning. Junxiong Wang, Immanuel Trummer, Debabrota Basu. [Paper Code]
- SIGMOD 2021. Demonstrating Robust Voice Querying with MUVE: Optimally Visualizing Results of Phonetically Similar Queries. Ziyun Wei, Immanuel Trummer, Connor Anderson. [Paper]
- ICDE 2021. Optimally Summarizing Data by Small Fact Sets for Concise Answers to Voice Queries. Immanuel Trummer, Connor Anderson. [Paper Code Talk]
2020
- BDA 2020 (Best Demonstration Award). Scrutinizer: a System for Checking Statistical Claims. Georgios Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer. [Paper]
- PVLDB 2020. Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification. Georgios Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer. [Paper Code]
- PVLDB 2020. Mining an “Anti-Knowledge Base” from Wikipedia Updates with Applications to Fact Checking and Beyond. Georgios Karagiannis, Immanuel Trummer, Saehan Jo, Shubham Khandelwal, Xuezhi Wang, Cong Yu. [Paper Data]
- PVLDB 2020. Demonstration of ScroogeDB: Getting More Bang for the Buck with Deterministic Approximation in the Cloud. Saehan Jo, Jialing Pei, Immanuel Trummer. [Paper]
- PVLDB 2020. Demonstrating the Voice-Based Exploration of Large Data Sets with CiceroDB-Zero. Immanuel Trummer. [Paper Code]
- PVLDB 2020. Scrutinizer: Fact Checking Statistical Claims. Georgios Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer. [Paper Code]
- SIGMOD 2020. Demonstration of BitGourmet: Data Analysis via Deterministic Approximation. Saehan Jo, Immanuel Trummer. [Paper]
- CIDR 2020. BitGourmet: Deterministic Approximation via Optimized Bit Selections. Saehan Jo, Immanuel Trummer. [Paper Talk]
2019
- SIGMOD 2019 (Selected for “Best of SIGMOD”). SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning. Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis. [Paper Talk]
- SIGMOD 2019. A Holistic Approach for Query Evaluation and Result Vocalization in Voice-Based OLAP. Immanuel Trummer, Yicheng Wang, Saketh Mahankali. [Paper Talk]
- SIGMOD 2019. Exact Cardinality Query Optimization with Bounded Execution Cost. Immanuel Trummer. [Paper Code Talk]
- SIGMOD 2019. Verifying Text Summaries of Relational Data Sets. Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta. [Paper Talk]
- PVLDB 2019. AggChecker: a Fact-Checking System for Text Summaries of Relational Data Sets. Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta. [Paper Demo]
- CIDR 2019. Data Vocalization with CiceroDB. Immanuel Trummer. [Paper Talk]
2018
- PVLDB 2018. SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. Immanuel Trummer, Samuel Moseley, Joseph Antonakakis, Saehan Jo. [Paper Code]
- PVLDB 2018. Vocalizing Large Time Series Efficiently. Immanuel Trummer, Mark Bryan, Ramya Narasimha. [Paper Talk]
2017
- CACM 2017. Multi-Objective Parametric Query Optimization. Immanuel Trummer, Christoph Koch. [Paper]
- PVLDB 2017. Data Vocalization: Optimizing Voice Output of Relational Data. Immanuel Trummer, Jiancheng Zhu, Mark Bryan. [Paper Talk]
- SIGMOD 2017. Solving the Join Ordering Problem via Mixed Integer Linear Programming. Immanuel Trummer, Christoph Koch. [Paper Code]