Database Seminar (CS 7390)

The database seminar discusses recent research from the areas of data analysis and database management systems.

Logistics

Instructor: Immanuel Trummer (itrummer@cornell.edu)
Class: Tuesdays 4:55pm - 6:10pm
Mode: online presentations (Zoom)
Zoom Link

Details

We discuss recent papers from the area of database management systems (primarily from the VLDB, SIGMOD, and CIDR conferences). One to two papers on related topics are presented in each session. Alternatively, participants may choose to present their own ongoing research if it connects to database systems. In addition to Cornell students, several external speakers will present their work.

Presentations take up to 45 minutes (pure presentation time), allowing for at least 15 minutes of questions throughout the talk. After the talk, all participants summarize their impressions about the paper(s). All participants taking the seminar for credit are expected to read papers before the session to enable interesting discussions.

Schedule

Date	Speaker	Topic
9/22	(Internal)
9/29	Edward Gan (Stanford/Databricks)	CoopStore: optimizing pre-computed summaries for aggregation and Moment-based quantile sketches for efficient high cardinality aggregation queries.
10/6	Jialin Ding (MIT)	Learning multi-dimensional indexes and Tsunami: a learned multi-dimensional index for correlated data and skewed workloads.
10/13	Ji Sun, Xuanhe Zhou (Tsinghua)	An end-to-end learning-based cost estimator, QTune: A query-aware database tuning system with deep reinforcement learning, and Query Performance Prediction for Concurrent Queries using Graph Embedding.
10/20		TBD
10/27	Abdul H. Quamar (IBM), Chuan Lei (IBM), Jaydeep Sen (IBM)	Athena++: Natural language querying for complex nested SQL queries and Conversational BI: an ontology-driven conversation system for business intelligence applications.
11/3	Christina Christodoulakis (University of Toronto)	Pytheas: Pattern-based Table Discovery in CSV Files
11/10	Internal	Automated fact checking.
11/24	(Break)
12/1	Internal	Exact cardinality query optimization with bounded execution cost, SIGMOD 2019.
12/8	Internal	Data vocalization with CiceroDB, CIDR 2019.
12/15	Internal	Building an "Anti-Knowledge Base" from Wikipedia updates with applications to fact checking and beyond, VLDB 2020.

Topic Propositions

Visual data management systems (e.g., D. Kang et al.: Challenges and opportunities in DNN-based video analytics: a demonstration of the BlazeIt video query engine, CIDR 2019)
Finding optimal query processing plans via machine learning (e.g., R. Marcus et al.: Neo: a learned query optimizer, VLDB 2019)
Worst-case optimal join algorithms (e.g., T. Veldhuizen: Leapfrog Triejoin: a simple, worst-case optimal join algorithm, arXiv 2012)
NewSQL systems (e.g., J. Corbett et al.: Spanner: Google's globally distributed database, TOCS 2013)
Natural language query interfaces (e.g., J. Sen et al.: ATHENA++: Natural Language Querying for Complex Nested SQL Queries, VLDB 2020)
Novel data storage hardware (e.g., R. Appuswamy et al.: OligoArchive: using DNA in the DBMS storage hierarchy, CIDR 2019)
Learned database system components (e.g., T. Kraska et al.: The case for learned index structures, SIGMOD 2018)
Cloud database management systems (e.g., A. Verbitski et al.: Amazon Aurora: design considerations for high-throughput cloud-native relational databases, SIGMOD 2017)
Data processing with fast networks (e.g., E. Zamanian et al.: The end of a myth: distributed transactions can scale, VLDB 2017)
...