MayBMS - A Database Management System for Uncertain and Probabilistic Data
Overview
Uncertain data often arises in practice. Examples include scientific databases, data integration, sensor data management, and scenarios where information is entered manually and is therefore prone to errors and omissions. MayBMS is a probabilistic database management system. Its main features include:
- A powerful query language for processing and transforming uncertain data
- Space-efficient representation and storage
- Support for data cleaning
- Efficient query evaluation
- Support for updates
Implementation
MayBMS has been implemented on top of PostgreSQL. Currently, the MayBMS code is not available to the public, but we plan to release it in the future. Various code snippets that people have requested, such as data generators and implementations of our confidence computation algorithms, are available for download from this page (see below, in the Papers section).
We are currently working on the second prototype of MayBMS, which is based on U-relations as the representation system (see our ICDE 2008 paper). The first prototype was based on world-set decompositions (WSDs). U-relations allow for more efficient query processing than WSDs and are more succinct.
People
- Lyublena Antova
- Michaela Götz
- Jiewen Huang (Oxford University)
- Christoph Koch
- Dan Olteanu (Oxford University)
Talk Slides and Overview Material
- Slides on MayBMS2.
- This is currently the best document giving an overview of the project.
- Slides on MayBMS1.
Papers on MayBMS2
- Using OBDDs for Efficient Query Evaluation on Probabilistic Databases.
Dan Olteanu and Jiewen Huang.
To appear in Proc. SUM 2008 (pdf).
- This paper uses ordered binary decision diagrams (OBDDs) to efficiently evaluate extensions of hierarchical conjunctive queries on tuple-independent probabilistic databases.
- On APIs for Probabilistic Databases.
Lyublena Antova and Christoph Koch.
To appear in Proc. MUD 2008 (pdf).
- This paper studies the challenge of defining an application programming interface for probabilistic databases. This is difficult because the goal of keeping the API independent of database internals (specifically, the representation system) clashes with the desire for efficiency.
- Conditioning Probabilistic Databases.
Christoph Koch and Dan Olteanu.
CoRR Technical Report arXiv:0803.2212.
To appear in Proc. VLDB 2008.
- This paper is the first to consider the problem of conditioning a probabilistic database outside the context of graphical models. Its core contribution is an exact confidence computation algorithm that seems to perform well in practice.
- Additional material: Code used in the experiments. This includes implementations of the optimal Karp-Luby approximation algorithm for confidence computation as well as an algorithm for exact confidence computation.
- Approximating Predicates and Expressive Queries on Probabilistic Databases.
Christoph Koch.
In Proc. PODS 2008 (pdf).
- This paper shows that queries in our expressive compositional query language can be approximated efficiently to arbitrary precision.
- Fast and Simple Relational Processing of Uncertain Data.
Lyublena Antova, Thomas Jansen, Christoph Koch, Dan Olteanu.
Extended version in technical report INFOSYS-TR-2007-2 (pdf).
In Proc. ICDE 2008. Best paper runner-up.
- This paper presents the representation system of MayBMS2 and the efficient SQL-only evaluation of a large fragment of our query language.
- Additional material: a TPC-like generator of attribute-level U-relations, queries, a translator from attribute-level to tuple-level U-relations, and a translator from tuple-level U-relations to ULDBs.
- Query language support for incomplete information in the MayBMS system (Demonstration).
Lyublena Antova, Christoph Koch, Dan Olteanu.
In Proc. VLDB 2007 (pdf).
- From Complete to Incomplete Information and Back.
Lyublena Antova, Christoph Koch, Dan Olteanu.
Technical Report INFOSYS-TR-2006-15.
In Proc. SIGMOD 2007 (pdf).
- This paper presents the nonprobabilistic version of the MayBMS query language and studies its properties.
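The VLDB 2008 entry above mentions a Karp-Luby approximation algorithm for confidence computation. Confidence computation is commonly framed as computing the probability of a DNF formula (the lineage of an answer) over independent random variables. As a rough illustration only, the Python sketch below implements the basic Karp-Luby Monte Carlo estimator for a DNF over independent Boolean variables. It is not the downloadable MayBMS code: the clause encoding, the function names `clause_prob`, `satisfies`, and `karp_luby`, and the fixed sample count (used in place of whatever stopping rule the "optimal" variant employs) are all assumptions made for this example.

```python
import random
from math import prod

# Assumed encoding (hypothetical, for illustration): a clause is a conjunction
# of literals given as a dict {variable: required truth value}; a DNF formula
# is a list of clauses; p maps each variable to P(variable = True).
# All variables are assumed to be independent.


def clause_prob(clause, p):
    """Exact probability that every literal of one clause holds."""
    return prod(p[v] if val else 1.0 - p[v] for v, val in clause.items())


def satisfies(clause, world):
    """True if the truth assignment `world` satisfies the clause."""
    return all(world[v] == val for v, val in clause.items())


def karp_luby(clauses, p, n_samples, rng=None):
    """Unbiased Monte Carlo estimate of P(C_1 or ... or C_m) (basic Karp-Luby)."""
    if rng is None:
        rng = random.Random()
    weights = [clause_prob(c, p) for c in clauses]
    total = sum(weights)  # sum of clause probabilities, an upper bound on the answer
    if total == 0.0:
        return 0.0
    hits = 0
    for _ in range(n_samples):
        # 1. Pick clause i with probability P(C_i) / total.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # 2. Sample a world conditioned on C_i: draw every variable from its
        #    marginal, then overwrite C_i's variables with C_i's literals.
        world = {v: rng.random() < pv for v, pv in p.items()}
        world.update(clauses[i])
        # 3. Count the sample only if i is the first clause the world satisfies,
        #    so each satisfying world is counted exactly once overall.
        first = next(j for j, c in enumerate(clauses) if satisfies(c, world))
        if first == i:
            hits += 1
    return total * hits / n_samples


# Usage: (x AND y) OR z with P(x)=0.3, P(y)=0.6, P(z)=0.2.
# The exact value is 1 - (1 - 0.18) * (1 - 0.2) = 0.344.
p = {"x": 0.3, "y": 0.6, "z": 0.2}
phi = [{"x": True, "y": True}, {"z": True}]
est = karp_luby(phi, p, 50_000, random.Random(42))
```

The estimator samples only from worlds that satisfy at least one clause, which is why it stays accurate even when the formula's probability is small, a case where naive sampling of whole worlds needs many more samples.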
Papers on MayBMS1
- World-set Decompositions: Expressiveness and Efficient Algorithms.
Lyublena Antova, Christoph Koch, Dan Olteanu.
Technical Report INFOSYS-TR-2006-12.
To appear in Theoretical Computer Science (pdf). Preliminary version in Proc. ICDT 2007.
- MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions (Demonstration).
Lyublena Antova, Christoph Koch, Dan Olteanu.
In Proc. ICDE 2007, demo paper (pdf).
- 10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information.
Lyublena Antova, Christoph Koch, Dan Olteanu.
Technical Report INFOSYS-TR-2005-4.
In Proc. ICDE 2007 (pdf).
Poster
- MayBMS: A System for Managing Large Uncertain and Probabilistic Databases.
Lyublena Antova, Christoph Koch, Dan Olteanu.
Best Poster Award at Spring'08 North East DB/IR Day, Columbia University, April 18, 2008 (pdf).
