Department of Computer Science Colloquium
Thursday, April 25th, 2002 at 4:15pm
Upson Hall B17

Where did my data come from? Annotating and Archiving Databases

Wang-Chiew Tan
University of Pennsylvania


Publishing data on the Web has revolutionized the way much scientific research is conducted. However, it also brings new problems and new opportunities. Among the problems is that it is often difficult to trace a piece of data to its source, since it may have moved through several databases, being transformed and edited on its journey from the source. Worse, the source may no longer exist! Knowing the source and provenance of a piece of data is essential to its scientific credibility. Among the opportunities is that scientists now want to annotate a data element and to have their annotations spread to other people who look at the same element. This is related to provenance because the annotation should "spread" back to the source and forward to other databases and users.

This talk deals with two issues concerning provenance. I will first examine the problem of propagating annotations through queries and show a complexity dichotomy for this problem. In the second part of the talk, I will describe a new technique for archiving data that allows all versions of an evolving scientific database to be stored and retrieved with very small overhead.