The term “Big Data” is a catchphrase currently in vogue for the computing tools and technologies that deal with massive amounts of data. In this talk, I highlight alternative Big Data problems that matter instead to regular, non-technical web users who have to deal with large amounts of data and information, but lack the technical expertise, resources, or desire to deploy large-scale computing systems.


The big-data problem for non-technical data experts, e.g., journalists, social scientists, and NGOs, is that they have access to interesting datasets, but lack the tools to easily explore, analyze, visualize, share, or publish them. In Google Fusion Tables, we are beginning to address the data management needs of these data experts. I will describe, in particular, how our support for interactive visualizations forms the basis for data-driven storytelling and how our support for database views as an abstraction for data sharing enables new data collaboration workflows.


The big-data problem for web-search users is that search engines often report the presence of millions of search results, but do not provide a meaningful summary that helps users make sense of the vast amounts of relevant information. We hypothesize that clustering search queries into groups representing distinct user information needs is a first step towards such summarization. I will describe a new clustering approach, based on modeling user behavior within search sessions, that obtains clusters corresponding to meaningful high-level concepts.



Jayant Madhavan is a member of the Structured Data Research group at Google Inc. His research interests include exploring the structure implicit in data on the Web, building interactive visualizations over large datasets, and enabling non-experts to deal effortlessly with large amounts of data. He currently leads the engineering team for Google Fusion Tables, a cloud data management solution. He was the Chief Architect at Transformic Inc., a portal that built search engines for the Deep Web, which was acquired by Google in 2005. He is a recipient of the Ten Year Best Paper Award at VLDB 2011. He received a Ph.D. from the University of Washington in 2005 and a B.Tech. from IIT Bombay in 1999.


Faculty Host: Johannes Gehrke


B17 Upson Hall

Tuesday, March 13, 2012

Refreshments at 3:45pm in the Upson 4th Floor Atrium


Computer Science


Big Data for Regular People

Jayant Madhavan