Speaker: Jun Yang
Time & Location: 4:15 PM, B14 Hollister Hall
Host: Johannes Gehrke
Title: Temporal Data Warehousing
The amount of information available to any large-scale enterprise is growing rapidly. New information is being generated continuously by various operational sources such as order-processing and inventory-control systems. To support efficient analysis and mining of such diverse, distributed information, a "data warehousing" system collects data from multiple sources and stores the integrated information in a central repository. The data warehouse needs to be updated continuously to reflect source data updates.
This talk focuses on how to support temporal information efficiently in data warehousing systems. Users of a data warehouse are often interested not only in monitoring current information, but also in analyzing its history in order to predict future trends. I will present a temporal data warehousing framework in which we can create and incrementally update temporal "views" over the history of the source data, even when sources do not support temporal operations. Keeping temporal data warehouses up to date is a challenging problem, because temporal views may need to be updated not only when source data is updated but also as time advances, and these two dimensions of change interact in subtle ways. I will present efficient techniques for maintaining temporal data warehouses without disturbing source operations. A related challenge is supporting large-scale temporal aggregation operations in data warehouses. I will introduce new data structures that facilitate incremental computation and update of various types of temporal aggregates.
My work has been conducted in the context of the WHIPS project at Stanford, where we have built a complete infrastructure for incrementally maintaining data warehouses. In addition, I will briefly describe TIP, an extension I developed within a commercial database system to support a full range of temporal query capabilities.