Data Vocalization

Driven by advances in speech recognition and synthesis, interaction between users and computers is shifting towards voice-based interfaces. Several major IT companies have recently presented devices and tools that use voice as the primary communication medium. Examples include Google Home, Amazon Echo, and Apple's Siri. These and other devices often need to communicate relational data to users (e.g., Google Home returns structured search results via voice output). There is a large body of research on how to optimally present relational data to users. However, prior work focuses almost exclusively on visual data representations, which have long been dominant.

In this project, we study the question of how to vocalize data, i.e., how to translate data into optimal voice output. Voice output has to be extremely efficient. While users can themselves quickly identify relevant parts in a plot or written text (via skimming), they have to trust the computer to select only the most relevant information for voice output. Voice output transmits information rather slowly (compared to reading text). Still, it has to be short (in order not to exceed the user's attention span) and, at the same time, simply structured (in order to limit the cognitive load on the listener). We treat voice output generation as a global optimization problem in which various metrics (e.g., speaking time, precision of transmitted information, cognitive load on the listener) come into play.
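To make the trade-off between these metrics concrete, the following is a minimal sketch that scores a handful of candidate speeches with a weighted sum and picks the cheapest one. It is an illustration only and not the method used in CiceroDB: the `Candidate` fields, the assumed speaking rate, the weights, and the example sentences are all hypothetical, and a real system would search a far larger space of speeches than this exhaustive comparison.

```python
from dataclasses import dataclass

# Assumed average speaking rate; purely illustrative.
WORDS_PER_SECOND = 2.5


@dataclass(frozen=True)
class Candidate:
    """One possible speech describing the same query result (hypothetical)."""
    text: str
    error: float   # assumed imprecision score in [0, 1]; 0 means exact
    clauses: int   # assumed proxy for cognitive load on the listener


def speaking_time(text: str) -> float:
    """Estimate speaking time in seconds from the word count."""
    return len(text.split()) / WORDS_PER_SECOND


def cost(c: Candidate, w_time: float = 1.0, w_error: float = 10.0,
         w_load: float = 0.5) -> float:
    """Weighted sum of the three metrics; the weights are assumptions."""
    return (w_time * speaking_time(c.text)
            + w_error * c.error
            + w_load * c.clauses)


def best_speech(candidates, **weights) -> Candidate:
    """Return the candidate with the lowest combined cost."""
    return min(candidates, key=lambda c: cost(c, **weights))


# Made-up candidates for a made-up result; the scores are hand-assigned.
CANDIDATES = [
    Candidate("Average delay is 12.37 minutes in Boston, 18.92 minutes "
              "in Chicago, and 15.04 minutes in Denver.", 0.0, 3),
    Candidate("Average delay is about 15 minutes, ranging from 12 in "
              "Boston to 19 in Chicago.", 0.1, 2),
    Candidate("Delays are around 15 minutes.", 0.4, 1),
]

if __name__ == "__main__":
    print(best_speech(CANDIDATES).text)
    print(best_speech(CANDIDATES, w_error=20.0).text)
```

Raising the weight on imprecision shifts the choice towards longer, more exact speeches, while the default weights favor the shortest summary. A weighted sum is only one way to combine the metrics; treating some of them as hard constraints (e.g., a maximum speaking time) would be an equally plausible formulation.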

We are currently studying data vocalization in different scenarios, characterized by data type, data size, and user context, among other criteria. We are also studying how to optimally support voice output via data processing, i.e., how to avoid generating results at a level of detail that cannot be transmitted via speech. The results of this research project are integrated into CiceroDB, a prototypical database system designed from the ground up for efficient and effective voice output.


Immanuel Trummer, Jiancheng Zhu, and Mark Bryan. "Optimizing Voice-Based Output of Relational Data". VLDB 2017.


Jiancheng Zhu, Mark Bryan, and Immanuel Trummer. "CiceroDB: Optimizing Voice Output of Relational Data". BOOM 2017. Winner of the Lockheed Martin Award!


Google Faculty Research Award 2017 for "Optimizing Voice-Based Output of Relational Data".