MayBMS - A Database Management System for Uncertain and Probabilistic Data
Cite: L. Antova, T. Jansen, C. Koch, and D. Olteanu. "Fast and Simple Relational Processing of Uncertain Data". Proc. 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancun, Mexico, pp. 983-992.
MayBMS is a probabilistic database management system. Its main features include:
- A powerful query language for processing and transforming uncertain data
- Space-efficient representation and storage
- Efficient query evaluation based on mature relational technology
- Support for conditioning and data cleaning
MayBMS has been implemented on top of PostgreSQL and will soon be officially released. A current snapshot of the source code can be obtained at maybms.sourceforge.net. This is a complete reimplementation effort. The system currently lacks some of the features of the demos (but contains others in addition).
Various code snippets that people have requested, such as data generators and implementations of our confidence computation algorithms, are available for download below on this page.
MayBMS: A System for Managing Large Uncertain and
C. Koch. Unpublished draft, Sept. 26, 2008.
- This is currently the best document giving an overview of the project. It is a full paper version of the slide set below.
- Slides on MayBMS2.
- Slides on MayBMS1.
A Compositional Framework for Complex Queries over Uncertain Data
M. Goetz and C. Koch. Under submission, 2008.
Lazy versus Eager Query Plans for Tuple-Independent Probabilistic Databases
D. Olteanu, J. Huang, C. Koch. To appear in Proc. ICDE, 2008. Long paper.
- This paper shows how to find query plans that are more efficient than safe plans for hierarchical queries on tuple-independent databases. The paper also introduces a special operator for efficiently processing such plans.
Using OBDDs for Efficient Query Evaluation on Probabilistic Databases
D. Olteanu and J. Huang. To appear in Proc. SUM 2008.
- This paper uses binary decision diagrams for efficiently processing extensions of hierarchical conjunctive queries on tuple-independent probabilistic databases.
Conditioning Probabilistic Databases
C. Koch and D. Olteanu. Proc. VLDB 2008.
- This paper is the first to consider the problem of conditioning a probabilistic database outside of the context of graphical models. The core contribution is an exact confidence computation algorithm that seems to perform well in practice.
Approximating Predicates and Expressive Queries on Probabilistic
C. Koch. Proc. PODS 2008.
- This paper shows that queries in our expressive compositional query language can be efficiently arbitrarily closely approximated.
Fast and Simple Relational Processing of Uncertain Data
L. Antova, T. Jansen, C. Koch, D. Olteanu. Proc. ICDE 2008. Best paper runner-up.
- This paper presents the representation system of MayBMS2 and the efficient SQL-only evaluation of a large fragment of our query language.
A Compositional Query Algebra for Second-Order Logic and Uncertain
C. Koch. Technical Report arXiv:0807.4620.
- This paper proves that world-set algebra, (the nonprobabilistic version of) the core of the MayBMS query language, has exactly the same expressive power as second-order logic. It also provides some useful insights into query languages for uncertain databases in general.
On APIs for Probabilistic Databases
L. Antova and C. Koch. Proc. MUD 2008.
- This paper studies the challenge of defining an application programming interface for probabilistic databases. This is difficult because the goal of keeping the API independent from database internals (specifically, the representation system) clashes with the desire for efficiency.
Query language support for incomplete information in the MayBMS system (Demonstration)
L. Antova, C. Koch, D. Olteanu. Proc. VLDB 2007.
- This was a MayBMS2 demo, but the paper focuses on the query language of MayBMS. The PODS 2008 paper is a much better reference for (the algebraic version of) the language.
From Complete to Incomplete Information and Back
L. Antova, C. Koch, D. Olteanu. Proc. SIGMOD 2007.
- This paper presents the nonprobabilistic version of the MayBMS query language and studies its properties.
Note: We are currently working on the second prototype of MayBMS -- MayBMS2 -- which is based on U-relations as the representation system (see our ICDE 2008 paper). The first prototype, MayBMS1, was based on world-set decompositions (WSDs). U-relations allow for more efficient query processing than WSDs and are more succinct.
World-set Decompositions: Expressiveness and Efficient Algorithms
L. Antova, C. Koch, D. Olteanu. Theoretical Computer Science 403 (2-3):265-284 (2008) Preliminary version in Proc. ICDT 2007.
- This paper studies the theory of world-set decompositions. Of particular interest is the factorization algorithm, which does a form of minimization of representations.
MayBMS: Managing Incomplete Information with Probabilistic
World-Set Decompositions (Demonstration)
L. Antova, C. Koch, D. Olteanu. Proc. ICDE 2007. Demo Paper.
- This was a demo of MayBMS1. The paper is the first to discuss world-set decompositions for representing probabilistic databases.
10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information.
L. Antova, C. Koch, D. Olteanu. Proc. ICDE 2007. Technical Report INFOSYS-TR-2005-4.
- This paper introduces world-set decompositions, the representation formalism of MayBMS1, and studies query evaluation on these representations. World-set decompositions are based on factorizations to exploit independence and, at least in their probabilistic form, can be thought of as shallow Bayesian Networks.
MayBMS: A System for Managing Large Uncertain and Probabilistic Databases.
L. Antova, C. Koch, D. Olteanu. Best Poster Award at Spring'08 North East DB/IR Day, Columbia University, April 18, 2008.
- Lyublena Antova
- Michaela Götz
- Jiewen Huang (Oxford University)
- Christoph Koch
- Dan Olteanu (Oxford University)
Code used in the experiments
of our VLDB 2008 paper.
- This includes implementations of the optimal Karp-Luby approximation algorithm for confidence computation as well as an algorithm for exact confidence computation (both in C++).
Resources used in our experiments for the ICDE 2008 paper on U-relations:
- TPC-like generator of attribute-level U-relations (C code),
- translator from attribute-level to tuple-level U-relations (PLSQL script),
- translator from tuple-level U-relations to ULDBs (PLSQL script).