In collaboration with Google NYC, we have recently mined an Anti-Knowledge Base containing common factual mistakes from Wikipedia. This data set is currently undergoing a pre-publication review at Google.
Data sets are often summarized via natural language text documents. Examples include newspaper articles by data journalists, scientific papers summarizing experimental results, or business reports summarizing quarterly sales. A majority of the population never accesses raw relational data but relies on text summaries alone. In that context, the following question arises: how can we trust such summaries to be consistent with the data?
We are developing approaches for automated and semi-automated fact checking of data summaries to answer that question. A text document, together with an associated data set, form the input for fact checking. Our goal is to identify erroneous claims about the data in the input text. More precisely, we focus on text passages that can be translated into a pair of an SQL query and a claimed query result. A claim is erroneous if evaluating the query yields a result that cannot be rounded to the one claimed in text.
In our first project in this space, we have developed a "fact checker tool" that supports authors in producing accurate data summaries. The tool is similar in spirit to a spell checker: where a spell checker supports users in avoiding erroneous spelling and grammatical mistakes, the fact checker supports users in avoiding erroneous claims. We focus on a restricted class of claims that are at the same time common and error-prone. The fact checker translates text passages into equivalent SQL queries, evaluates them on a database, and marks up potentially erroneous claims. Users obtain a natural language explanation, summarizing the system's interpretation of specific text passages, and can easily take corrective actions if necessary. We have recently used this tool to identify erroneous claims in articles from several major newspapers, some of which had gone by unnoticed for years.
Try our Fact Checker Online Demo!
Collaborators: Cong Yu, Xuezhi Wang (Google Research, NY).
Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liy Niyati Mehta. "AggChecker: A Fact-Checking System for Text Summaries of Relational Data Sets". VLDB 2019.
Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta. "Verifying Text Summaries of Relational Data Sets". SIGMOD 2019. Preprint on arXiv.
Fact checking benchmark data set (claims and ground truth queries) available here.
Results for several fact checking baselines on the benchmark data set are also available here (see point 3, bottom).