Data Vocalization and Voice Interfaces (CiceroDB)

Demo: Trends in the 2019 Stack Overflow Developer Survey

On Android phones, press and hold the Home button for a few seconds, release it, and say "OK Google, talk to Developer Facts". Ask about a topic (e.g., say "job satisfaction in the US" or "Python usage"), and the system will speak the two or three facts that best summarize the corresponding data subset. The demo also works on Google Home smart speakers and on any other device running the Google Assistant.

SIGMOD 2019 Talk on Voice-Based OLAP

Overview

Communication between users and computers is increasingly shifting towards voice interfaces, as evidenced by devices and services such as Google Home, Amazon Echo, and Apple's Siri. We study how to exploit voice interfaces for data analysis.

Enabling voice-based access to structured data entails two research challenges. First, we need to translate speech input into queries. Second, we need to summarize potentially large query results via voice output ("data vocalization"). Our research on voice interfaces covers a range of scenarios that differ in query and data types. Beyond user-centric research questions (e.g., "how to resolve ambiguities?", "how to describe data?"), we also study how to specialize backends and query processing methods for voice interfaces to increase efficiency. Research results are integrated into CiceroDB, a database system designed from the ground up for voice-based analysis of large data sets.
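To make the second challenge more concrete, the following Python sketch selects a small set of facts to speak for a query result. It is not CiceroDB's algorithm: the candidate facts (one average per dimension value plus an overall average), the listener model in `estimate`, the greedy selection, and all function names and sample data are assumptions made purely for illustration.

```python
from statistics import mean


def candidate_facts(rows, dims, target):
    """Build candidate facts as (dimension, value, average) tuples.

    The fact (None, None, avg) describes the overall average.
    """
    facts = [(None, None, mean(r[target] for r in rows))]
    for dim in dims:
        for value in sorted({r[dim] for r in rows}):
            subset = [r[target] for r in rows if r[dim] == value]
            facts.append((dim, value, mean(subset)))
    return facts


def estimate(row, facts):
    """Hypothetical listener model: what a listener expects for one row.

    Specific facts that match the row take priority (averaged if several
    match); otherwise the overall fact is used; otherwise the guess is 0.
    """
    specific = [avg for dim, value, avg in facts
                if dim is not None and row[dim] == value]
    if specific:
        return mean(specific)
    general = [avg for dim, _, avg in facts if dim is None]
    return general[0] if general else 0.0


def error(rows, facts, target):
    """Total absolute gap between actual values and listener expectations."""
    return sum(abs(r[target] - estimate(r, facts)) for r in rows)


def greedy_summary(rows, dims, target, k=2):
    """Greedily pick up to k facts, each time adding the one that lowers error most."""
    candidates = candidate_facts(rows, dims, target)
    chosen = []
    for _ in range(k):
        remaining = [f for f in candidates if f not in chosen]
        if not remaining:
            break
        best = min(remaining, key=lambda f: error(rows, chosen + [f], target))
        # Stop early if no remaining fact improves the summary.
        if error(rows, chosen + [best], target) >= error(rows, chosen, target):
            break
        chosen.append(best)
    return chosen


def to_speech(fact, target):
    """Render one fact as a short sentence suitable for voice output."""
    dim, value, avg = fact
    scope = "overall" if dim is None else f"for {dim} {value}"
    return f"Average {target} {scope} is about {avg:.1f}."


if __name__ == "__main__":
    # Made-up sample data, not taken from the Stack Overflow survey.
    sample = [
        {"country": "US", "language": "Python", "satisfaction": 8},
        {"country": "US", "language": "Java", "satisfaction": 6},
        {"country": "DE", "language": "Python", "satisfaction": 7},
        {"country": "DE", "language": "Java", "satisfaction": 3},
    ]
    for fact in greedy_summary(sample, ["country", "language"], "satisfaction"):
        print(to_speech(fact, "satisfaction"))
```

Greedy selection keeps the sketch short but does not guarantee the best possible fact set. The limit `k` reflects the constraint that voice output must stay brief, since a listener cannot skim spoken text the way a reader skims a table.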

Publications

Ziyun Wei, Immanuel Trummer, and Connor Anderson. "Robust Voice Querying with MUVE: Optimally Visualizing Results of Phonetically Similar Queries." SIGMOD 2020.

Immanuel Trummer and Connor Anderson. "Optimally Summarizing Data by Small Fact Sets for Concise Answers to Voice Queries." ICDE 2020.

Immanuel Trummer. "Demonstrating the Voice-Based Exploration of Large Data Sets with CiceroDB-Zero." VLDB 2020.

Immanuel Trummer. "Data Vocalization with CiceroDB." CIDR 2019.

Immanuel Trummer, Yicheng Wang, and Saketh Mahankali. "A Holistic Approach for Query Evaluation and Result Vocalization in Voice-Based OLAP." SIGMOD 2019.

Immanuel Trummer, Mark Bryan, and Ramya Narasimha. "Vocalizing Large Time Series Efficiently." VLDB 2018.

Immanuel Trummer, Jiancheng Zhu, and Mark Bryan. "Optimizing Voice-Based Output of Relational Data." VLDB 2017.

Demonstrations

Mark Bryan, Immanuel Trummer, and Ramya Narasimha. "Voice-Based Analysis of Time Series Data." BOOM 2018. Winner of the JP Morgan Award!

Mark Bryan, Jiancheng Zhu, and Immanuel Trummer. "Optimizing Voice Output of Relational Data." BOOM 2017. Winner of the Lockheed Martin Award!

Funding

Google Faculty Research Award 2017 for "Optimizing Voice-Based Output of Relational Data".

Student Awards

Mark Bryan received an honorable mention for the 2018 CRA Outstanding Undergraduate Researcher Award.