
The Youtopia Project

Community Data Integration

Communities everywhere on the Web want to share, store and query data. Their motivations for sharing data are diverse, ranging from entertainment or commercial activity to the desire to collaborate on scientific or artistic projects. The data involved is equally varied, running the gamut from unstructured through semistructured to relational. The solutions used for data sharing are frequently custom-built for a concrete scenario and therefore vary widely themselves. To name only a few prominent examples: Wiki software has proved very successful for community management of unstructured data; scientists use custom portals to pool their datasets; and a growing number of vertical social networking sites include a topic-specific database that is maintained by the site's members.

While the scenarios mentioned above vary widely in their parameters, they have in common many high-level properties that translate into concrete design desiderata for Collaborative Data Integration (CDI) systems. In the Youtopia project, we are building a system to address these desiderata and enable community data sharing in arbitrary settings. Our initial focus is on relational data; however, the ultimate goal is to include arbitrary data formats and manage the data in its full heterogeneity.

CDI has three fundamental aspects that distinguish it from other paradigms such as classical data integration. First, a CDI system must enable best-effort cooperation among community members in maintaining the data and metadata. That is, no worthwhile contribution to the repository should be rejected because it is incomplete, as another community member may be able to supply the knowledge required to complete it. A CDI system must therefore be equipped to handle incomplete data and metadata, and must give users a way to complete them at a later time. Second, a CDI solution must manage disagreement regarding the data and the schema or other metadata. Finally, it must maximize data utility.

These three aspects have clear tradeoffs in the extent to which they can be addressed; as such, they define a design space within which we can situate existing solutions and Youtopia. The structure of this design space also clarifies the relationship of CDI to classical data integration; the latter is fundamentally an effort to maintain utility while permitting as much disagreement as possible. CDI builds on this by introducing the added element of best-effort cooperation, familiar from the Web 2.0 model of enabling all users to create their own content on the internet.

Youtopia

[Figure: Youtopia architecture]

Youtopia is a system that allows users to add, register, update and maintain relational data in a collaborative fashion. The architecture of Youtopia is shown above. The storage manager provides a logical abstraction of the repository. In this abstraction, the repository consists of a set of logical tables or views containing the data; these are tied together by a set of mappings (or tuple-generating dependencies). The mappings are supplied by the users as the repository grows and serve to propagate changes to the data in a variant of the chase process. Thus, at the logical level Youtopia is an update exchange system.
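To make the role of mappings concrete, the following is a minimal sketch of a standard (restricted) chase over tuple-generating dependencies. It is not Youtopia's own chase variant, which this page does not specify; the relation names `Attraction` and `City`, the `Null` class, and the functions `matches` and `chase` are all hypothetical and chosen only for illustration. The sketch shows how a mapping propagates a change and how a labeled null can stand in for a value that another community member may supply later.

```python
from itertools import count

# An atom is (relation_name, terms); terms starting with "?" are variables.
# A mapping (tgd) is a pair (body_atoms, head_atoms). All names are illustrative.

class Null:
    """A labeled null: a placeholder for a value nobody has supplied yet."""
    _ids = count(1)

    def __init__(self):
        self.id = next(Null._ids)

    def __repr__(self):
        return f"_N{self.id}"

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def matches(atoms, repo, binding):
    """Yield every extension of `binding` that maps all `atoms` into `repo`."""
    if not atoms:
        yield binding
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for row in list(repo.get(rel, ())):
        if len(row) != len(terms):
            continue
        new = dict(binding)
        ok = True
        for term, value in zip(terms, row):
            if is_var(term):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, repo, new)

def chase(repo, mappings, max_rounds=10):
    """Apply mappings until nothing changes or max_rounds is reached.

    The bound is an assumption of this sketch: the chase need not
    terminate for arbitrary sets of mappings.
    """
    for _ in range(max_rounds):
        changed = False
        for body, head in mappings:
            for binding in list(matches(body, repo, {})):
                # Skip if the head already holds for some choice of the
                # existentially quantified variables.
                if next(matches(head, repo, binding), None) is not None:
                    continue
                full = dict(binding)
                for _, terms in head:
                    for t in terms:
                        if is_var(t) and t not in full:
                            full[t] = Null()  # fresh labeled null
                for rel, terms in head:
                    row = tuple(full.get(t, t) for t in terms)
                    repo.setdefault(rel, set()).add(row)
                changed = True
        if not changed:
            break
    return repo

# Hypothetical mapping: every attraction's city must appear in City
# with some (possibly unknown) state.
mapping = ([("Attraction", ("?name", "?city"))],
           [("City", ("?city", "?state"))])

repo = {
    "Attraction": {("Taughannock Falls", "Ithaca"), ("Space Needle", "Seattle")},
    "City": {("Ithaca", "NY")},
}
chase(repo, [mapping])
```

After the chase, `City` keeps its existing Ithaca tuple and gains one tuple for Seattle whose state is a labeled null. In a best-effort setting, such a null marks exactly the spot where a later contribution can complete the data.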

Below we outline our vision for Youtopia and explain how the system addresses each of the three CDI goals.

Enabling best-effort cooperation

Youtopia is designed from the ground up to allow users to cooperate on all data management tasks.

Maximizing utility

Ensuring high utility of data in a Youtopia repository requires both maintaining good data quality and providing flexible and appropriate mechanisms for data querying and browsing.

Handling disagreement

As data and mappings are added to the repository, disagreement is inevitable.

Finally, privacy is always a consideration. Youtopia therefore includes social network-like functionality that allows users to establish a network of trusted acquaintances or friends, so that data, mappings, rankings and user-defined views can be shared to varying extents.

People

Publications