Data-Driven Fact Checking

Manual fact checks cannot scale to the rate at which misinformation is produced and spread. In multiple projects on data-driven fact checking, we aim to automate fact checks by translating claims into queries on raw (relational) data. Immanuel's talk at the Global Fact conference gives an overview of our recent work in this space (slides can be downloaded here):

Verifying Claims on the Coronavirus

Scrutinizer is our newest system in the space of data-driven fact checking. Together with our collaborators at EURECOM, we recently published an online version of our system that verifies claims on coronavirus spread based on data sources from the WHO and the CDC. Scrutinizer translates claims into SQL queries on a large database, exploiting techniques such as natural language analysis as well as structural analysis of the source data. Scrutinizer learns to recognize new claim types over time, based on feedback from users. It uses planning to optimize the questions asked of users for maximal impact with minimal overhead.
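As a rough illustration of this claim-to-query idea (not the actual Scrutinizer pipeline), the sketch below checks a single coronavirus claim against a hypothetical covid_stats table; the schema, the example claim, and the rounding rule are all assumptions made for the example.

# Illustrative sketch only: a hypothetical covid_stats table stands in for the
# WHO/CDC-derived database, and the hand-written SQL query stands in for the
# query that Scrutinizer would generate automatically from the claim text.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid_stats (country TEXT, report_date TEXT, confirmed_cases INTEGER)")
conn.executemany(
    "INSERT INTO covid_stats VALUES (?, ?, ?)",
    [("Italy", "2020-05-01", 207428), ("Spain", "2020-05-01", 213435)],
)

# Claim: "Italy had reported roughly 207,000 confirmed cases by May 1, 2020."
claim_query = """
    SELECT confirmed_cases FROM covid_stats
    WHERE country = 'Italy' AND report_date = '2020-05-01'
"""
claimed_value = 207_000

(actual_value,) = conn.execute(claim_query).fetchone()
# The claim is accepted if the database value rounds to the claimed figure.
supported = round(actual_value, -3) == claimed_value
print(f"database value: {actual_value}, claim {'supported' if supported else 'refuted'}")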

Our web interface has attracted thousands of users at this point and received wide coverage in the press (see below for a full list). Click here for a preprint outlining technical details of the Scrutinizer system and here for our online interface!

In another project, we link claims on the coronavirus back to a large database of existing fact checks (checks collected in June 2019). See here for a demo!
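As a minimal, hedged sketch of this claim-matching idea: the hypothetical fact checks, verdicts, and the simple word-overlap similarity below are illustrative stand-ins for the matching used in the deployed demo.

# Minimal sketch: match a new claim to the closest previously checked claim using
# Jaccard word overlap (the deployed demo relies on more sophisticated matching).
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Hypothetical collection of previously verified claims with their verdicts.
fact_checks = [
    ("Drinking hot water kills the coronavirus", "False"),
    ("Masks reduce the spread of respiratory droplets", "True"),
]

new_claim = "Hot water can kill the coronavirus"
best_claim, verdict = max(fact_checks, key=lambda fc: jaccard(new_claim, fc[0]))
print(f"Closest existing check: '{best_claim}' -> {verdict}")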

Verifying Text Documents

Data sets are often summarized via natural language text documents. Examples include newspaper articles by data journalists, scientific papers summarizing experimental results, or business reports summarizing quarterly sales. A majority of the population never accesses raw relational data but relies on text summaries alone. In that context, the following question arises: how can we trust such summaries to be consistent with the data?

We are developing approaches for automated and semi-automated fact checking of data summaries to answer that question. A text document, together with an associated data set, forms the input for fact checking. Our goal is to identify erroneous claims about the data in the input text. More precisely, we focus on text passages that can be translated into a pair of an SQL query and a claimed query result. A claim is erroneous if evaluating the query yields a result that cannot be rounded to the one claimed in the text.
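This consistency test can be made concrete with a small sketch; the granularity argument below stands in for the claimed value's precision, which we assume has already been extracted from the text, and the function is illustrative rather than the exact logic of our systems.

import math

# Minimal sketch of the rounding-based consistency test described above:
# a claim passes if rounding the actual query result to the claim's
# granularity reproduces the claimed value.
def claim_is_consistent(actual: float, claimed: float, granularity: float) -> bool:
    rounded = round(actual / granularity) * granularity
    return math.isclose(rounded, claimed)

# "Revenue grew by 4.5 percent" vs. an actual value of 4.47 percent (granularity 0.1): consistent.
print(claim_is_consistent(4.47, 4.5, 0.1))   # True
# "Revenue grew by 6 percent" vs. the same actual value (granularity 1): erroneous.
print(claim_is_consistent(4.47, 6.0, 1.0))   # False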

In our first project in this space, we have developed a "fact checker" tool that supports authors in producing accurate data summaries. The tool is similar in spirit to a spell checker: where a spell checker helps users avoid spelling and grammatical mistakes, the fact checker helps them avoid erroneous claims. We focus on a restricted class of claims that are both common and error-prone. The fact checker translates text passages into equivalent SQL queries, evaluates them on a database, and marks up potentially erroneous claims. Users obtain a natural language explanation summarizing the system's interpretation of specific text passages and can easily take corrective actions if necessary. We have recently used this tool to identify erroneous claims in articles from several major newspapers, some of which had gone unnoticed for years.
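The sketch below illustrates this spell-checker-style workflow on a toy example; the sales table, the extracted claims, and their SQL translations are all hypothetical, since the real tool infers the queries from the text automatically.

# Illustrative sketch of the spell-checker-style workflow: evaluate each extracted
# claim on a toy database and flag passages whose query result does not round to
# the claimed value.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("EMEA", 2018, 41.2), ("EMEA", 2019, 44.9), ("APAC", 2019, 38.3)])

# Each extracted claim: (text passage, SQL translation, claimed value).
claims = [
    ("EMEA revenue reached 45 million in 2019",
     "SELECT revenue FROM sales WHERE region='EMEA' AND year=2019", 45.0),
    ("APAC revenue reached 40 million in 2019",
     "SELECT revenue FROM sales WHERE region='APAC' AND year=2019", 40.0),
]

for passage, query, claimed in claims:
    (actual,) = conn.execute(query).fetchone()
    if round(actual) != claimed:
        # Mark up the passage and explain the system's interpretation.
        print(f"FLAGGED: '{passage}' -- query returned {actual}, which does not round to {claimed}")
    else:
        print(f"ok:      '{passage}'")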

Try our Fact Checker Online Demo!

Collaborators: Cong Yu, Xuezhi Wang (Google Research, NY); Mohammed Saeed, Paolo Papotti (EURECOM, France).

Mining an Anti-Knowledge Base

In collaboration with Google NYC, we have recently mined an Anti-Knowledge Base containing common factual mistakes from Wikipedia. This data set is currently undergoing a pre-publication review at Google.

Publications

Georgios Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer. "Scrutinizer: Fact Checking Statistical Claims." VLDB 2020.

Georgios Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer. "Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification." VLDB 2020.

Georgios Karagiannis, Immanuel Trummer, Saehan Jo, Shubham Khandelwal, Xuezhi Wang, Cong Yu. "Mining an 'Anti-Knowledge Base' from Wikipedia Updates with Applications to Fact Checking and Beyond." VLDB 2020.

Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta. "AggChecker: A Fact-Checking System for Text Summaries of Relational Data Sets." VLDB 2019.

Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta. "Verifying Text Summaries of Relational Data Sets." SIGMOD 2019. Preprint on arXiv.

Press Coverage

By Corriere della Sera (major Italian newspaper).

By Nuova Società Magazine.

By Zazoom News.

By TPI.

By GG Giovani Genitori.

By Virgilio.

By Geos News.

By Trieste All News.

By INPGI Notice.

By Lombard Report.

By Vicenza Più.

By Glonna Bot.

Funding

Funding by Google for our research on "Data-driven Fact Checking of Coronavirus Claims".

Google Faculty Research Award for mining misinformation from Wikipedia and other sources.

Resources

Fact checking benchmark data set (claims and ground truth queries) available here.

Results for several fact checking baselines on the benchmark data set are also available here (see point 3, bottom).