The past years have been marked by several breakthrough results in the domain of generative AI, culminating in the rise of tools like ChatGPT, able to solve a variety of language-related tasks without specialized training. In this talk, I outline novel opportunities in the context of data management, enabled by these advances. I discuss several recent research projects at Cornell, aimed at exploiting advanced language processing for tasks such as parsing a database manual to support automated tuning, mining data for patterns, described in natural language, or synthesizing scenario-specific code for data processing via specialized coding assistants. Finally, I discuss our recent and ongoing research, aimed at scaling up analysis of multimodal data via large language models to large repositories.

Immanuel Trummer is an assistant professor at Cornell University and a member of the Cornell Database Group. His papers were selected for “Best of VLDB”, “Best of SIGMOD”, for the ACM SIGMOD Research Highlight Award, and for publication in CACM as CACM Research Highlight. His online lecture introducing students to database topics collected over a million views. He received the NSF CAREER Award and multiple Google Faculty Research Awards.