Automated Fact Checking

Data sets are often summarized in natural language text documents. Examples include newspaper articles by data journalists, scientific papers summarizing experimental results, or business reports summarizing quarterly sales. Most readers never access the raw relational data but rely on text summaries alone. In that context, the following question arises: how can we trust such summaries to be consistent with the data?

We are developing approaches for automated and semi-automated fact checking of data summaries to answer that question. A text document, together with an associated data set, forms the input for fact checking. Our goal is to identify erroneous claims about the data in the input text. More precisely, we focus on text passages that can be translated into a pair consisting of an SQL query and a claimed query result. A claim is erroneous if evaluating the query yields a result that cannot be rounded to the one claimed in text.
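The core check described above can be sketched as follows. This is a minimal illustration, not our actual implementation: the `check_claim` helper, the toy `sales` table, and the rounding rule (round the actual result to the claimed value's decimal precision) are all assumptions made for this example.

```python
import sqlite3
from decimal import Decimal

def check_claim(conn, sql, claimed):
    """Hypothetical helper: evaluate the query and test whether its
    result can be rounded to the value claimed in the text."""
    actual = conn.execute(sql).fetchone()[0]
    # Infer precision from the claimed value (e.g. 82.5 -> 1 decimal).
    exponent = Decimal(str(claimed)).as_tuple().exponent
    decimals = max(-exponent, 0)
    # The claim holds if the actual result rounds to the claimed one.
    return round(actual, decimals) == round(claimed, decimals)

# Toy data set standing in for the article's associated database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("west", 47.5)])

# Claim: "the average sale was about 82.5" -> query plus claimed result.
check_claim(conn, "SELECT AVG(amount) FROM sales", 82.5)   # consistent
# Claim: "the data covers 4 regions" -> flagged as erroneous.
check_claim(conn, "SELECT COUNT(*) FROM sales", 4)
```

In the full pipeline, the (query, claimed result) pair would come from translating a text passage rather than being written by hand.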

In our first project in this space, we have developed a "fact checker tool" that supports authors in producing accurate data summaries. The tool is similar in spirit to a spell checker: where a spell checker helps users avoid spelling and grammatical mistakes, the fact checker helps them avoid erroneous claims. We focus on a restricted class of claims that are both common and error-prone. The fact checker translates text passages into equivalent SQL queries, evaluates them on a database, and marks up potentially erroneous claims. Users obtain a natural language explanation summarizing the system's interpretation of specific text passages, and can easily take corrective actions if necessary. We have recently used this tool to identify erroneous claims in articles from several major newspapers, some of which had gone unnoticed for years.