Deterministic Approximate Query Processing

CIDR 2020 Talk on BitGourmet

Overview

Approximate query processing typically uses sampling to generate confidence bounds on query aggregates. Those bounds do not always contain the correct value. Also, sampling is problematic for aggregation functions such as maxima or minima that are sensitive to outliers.

BitGourmet aims at Deterministic Approximation, i.e. it generates bounds on query aggregates that guarantee to contain the true value. For that, we cannot use sampling. When sampling rows, we may always miss a row that would change the aggregate significantly. Instead of reading all data about a few rows, we read a few bits from every row. Thereby, we reduce the amount of data processed and therefore execution time. However, by carefully selecting which bits we read, we are able to give tight bounds on aggregation values.

Publications

Saehan Jo, Immanuel Trummer. "Demonstration of ScroogeDB: getting more bang for the buck with deterministic approximation in the Cloud." VLDB 2020.

Saehan Jo, Immanuel Trummer. "Demonstration of BitGourmet: data analysis via deterministic approximation." SIGMOD 2020.

Saehan Jo, Immanuel Trummer. "BitGourmet: deterministic approximation via optimized bit selections." CIDR 2020.