Causal Databases - Youtopia
Database users increasingly expect to work with, create, and query not just data, but rich and expressive metadata as well. One particular flavor of metadata that is increasingly important in real-world applications is causal metadata. This describes the causal structure, provenance and other causal constraints associated with the data.
In the Youtopia project, we are laying the foundations for a new breed of databases -- causal databases. Causal databases can model causal information explicitly, and allow for queries regarding causality and explanations, as well as supporting emerging applications that make use of causality information in novel ways. This is a joint project with the University of Washington Database group.
We are working in the following areas:
Modeling and understanding causality
We are pursuing two foundational projects on causality. The first involves examining more carefully how to model causality in real-world situations. As we now understand, this involves understanding the interplay between the effects of interventions in the world (modeled by the structural equations) and the expectations of agents (what they consider to be "normal" or "typical"). These issues arise in particular when causality is applied in the law. Much of this work is joint with Christopher Hitchcock. Preliminary work on this topic is discussed in the paper "Actual causation and the art of modeling"; work on this topic is ongoing.
The other major line of work explores the connection between the standard approach to modeling causality in the philosophy literature, which involves possible worlds models called counterfactual structures, and structural models approach that is the basis of work being done under this grant. In "From causal models to counterfactual structures" it is shown that, despite well-known claim to the contrary in the literature, these two approaches are, in general, inequivalent in expressive power, although in the special case most often considered in the literature, where the causal structure is acyclic, the structural equations approach can be viewed as a special case of the counterfactual structures approach.
Causality and coordination on the Social Web
Database users frequently want to explore causal information about data and queries. However, they also increasingly want to accomplish new tasks where they themselves create causality-related metadata. Nowhere is this more apparent than on the Social Web, where new data-driven applications are constantly emerging.
We focus on supporting one such emerging application - online coordination. We have developed entangled queries, a declarative, high-level data-centric abstraction that allows users to specify causal coordination constraints. When the system evaluates these queries, it ensures that all causality constraints are satisfied; thus, all users are guaranteed that the coordination has occurred as desired.
Online coordination
In the last decade, the amount of data on the Internet has grown at a staggering rate. The rise of Web 2.0 has fueled that growth by providing platforms that enable people to create, upload, and share content very easily. However, people do much more than just produce and consume data; they are beginning to center their life around the Web and use social applications for very complex data-driven tasks, many of which involve coordination.
As a concrete example, consider the use case of online travel planning. All of us have faced the scenario where we wished to coordinate travel plans with family, friends or colleagues. How does this coordination happen today? Typically, it starts with a phone or email conversation to decide on the itinerary. Then, one person books tickets for everyone, and there is another round of emails to sort out the finances. Alternately, everybody arranges to book at about the same time, hoping that the airline seats do not fill up in the meantime. Clearly, there has to be a better way.
Entangled queries
Entangled queries are a declarative abstraction that allows users to specify causal constraints on coordination. For example, suppose Donald wants to travel to Los Angeles on the same flight as his friend Minnie. Rather than communicating out-of-band, he can use a simple visual interface to specify his coordination request:
The system translates his request into an SQL-like query specification of the form:
SELECT 'Donald', fno INTO ANSWER Booking
WHERE
fno IN (SELECT flightno FROM Flight WHERE dest="LA")
AND ('Minnie', fno) IN ANSWER Booking
CHOOSE 1
This entangled query is a request to learn the number of a single flight to LA on which Minnie also plans to travel. Assuming Minnie also wants to coordinate with Donald, she issues a symmetric query with the strings 'Donald' and 'Minnie' exchanged. The system receives these queries for evaluation, recognizes that Donald and Minnie want to coordinate, chooses a flight and returns the same flight number to both friends. Donald and Minnie's queries may be embedded within entangled transactions - larger unites of code whose execution remains isolated except for the single information exchange about the flight number.
Our existing publications on this topic outline our vision for D3C, introduce entangled queries in detail, explain how these queries can be used in real-world applications, and present the entangled transaction abstraction in depth.
Community Data Integration (CDI)
Communities everywhere on the Web want to share, store and query data. Their motivations for data sharing are very diverse, as is the format of the data involved. However, community data sharing scenarios have in common many high-level properties that translate into design desiderata for Community Data Integration (CDI) systems. First, a CDI system must enable best-effort cooperation among community members with respect to maintenance of the data and metadata. Next, it must manage disagreement regarding the data and schema or other metadata. Finally, it must maximize data utility. We are building a system to address these desiderata and enable community data sharing in arbitrary settings.
People
- Gabriel Bender
- Johannes Gehrke
- Nitin Gupta
- Joseph Halpern
- Christopher Hitchcock
- Christoph Koch
- Lucja Kot
- Milos Nikolic
- Sudip Roy
Publications
- Nitin Gupta, Milos Nikolic, Sudip Roy, Gabriel Bender, Lucja Kot, Johannes Gehrke, Christoph Koch. Entangled Transactions. VLDB 2011. Slides.
- Nitin Gupta, Lucja Kot, Sudip Roy, Gabriel Bender, Johannes Gehrke, and Christoph Koch. Entangled queries: enabling declarative data-driven coordination. SIGMOD 2011. (Best Paper Award Winner) Slides.
- Nitin Gupta, Lucja Kot, Gabriel Bender, Sudip Roy, Johannes Gehrke, Christoph Koch. Coordination Through Querying in the Youtopia System (Demonstration Paper). SIGMOD 2011.
- J. Y. Halpern, From causal models to counterfactual structures, Proceedings of the Twelfth International Conference on Principles of Knowledge Representation and Reasoning (KR 2010), 2010.
- R. Dechter, H. Geffner, J.Y. Halpern, Heuristics, Probability, and Causality: A Tribute to Judea Pearl, College Publications, 2010.
- J.Y. Halpern and C. Hitchcock, Actual causation and the art of modeling, in Heuristics, Probability and Causality: A Tribute to Judea Pearl, (editors, R. Dechter, H. Geffner, and J. Y. Halpern), College Publications, 2010, pp. 383-406
- A. Meliou, W. Gatterbauer, J.Y. Halpern, C. Koch, K. F. Moore, and D. Suciu, Causality in databases, IEEE Data Engineering Bulletin 33:3, 2010, pp. 59-67.
- Lucja Kot, Nitin Gupta, Sudip Roy, Johannes Gehrke, and Christoph Koch. Beyond Isolation: Research Opportunities in Declarative Data-Driven Coordination. SIGMOD Record, 2010.
- Lucja Kot, Christoph Koch. Cooperative Update Exchange in the Youtopia System. VLDB 2009.
Funding
This research has been supported by the National Science Foundation under Grants IIS-0534404 and IIS-0911036, by a Google Research Award, by New York State Science Technology and Academic Research under agreement number C050061, and by the iAd Project funded by the Research Council of Norway. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.