Linguistic Models for Analyzing and Detecting Biased Language

Marta Recasens, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky

Proceedings of ACL, 2013.



       Collection of bias-driven edits (includes this Readme)

       Bias lexicon (includes this Readme)

       Other bias-related lexicons (includes this Readme)

       NPOV corpus (100GB unpacked; includes this Readme and this sample)



Unbiased language is a requirement for reference sources like encyclopedias and scientific text.  Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically.  To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles.  The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true.  We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers.  These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word.  Our linguistically-informed model performs as well as humans tested on the same task.
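As a rough illustration of the prediction task described above, the following minimal Python sketch flags words in a sentence that match one of the cue classes named in the abstract. It is not the model from the paper: the tiny word lists, the names CUE_LEXICONS, cue_features, and candidate_bias_words, and the exact-match lookup on surface forms are all assumptions made for this example, and the released lexicons should be substituted for the placeholder sets.

```python
import re

# Placeholder lexicons: a few surface forms per cue class, chosen only for
# illustration. They are assumptions, NOT the lexicons released with the paper.
CUE_LEXICONS = {
    "factive": {"realize", "realized", "reveal", "revealed"},
    "implicative": {"manage", "managed", "fail", "failed"},
    "hedge": {"may", "possibly", "apparently"},
    "intensifier": {"extremely", "fantastic", "truly"},
}


def cue_features(sentence):
    """Return a list of (token, cue_classes) pairs for the sentence.

    Tokens are lowercased alphabetic strings; cue_classes is the sorted list
    of lexicon names whose word set contains the token (exact match only,
    no lemmatization).
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [
        (tok, sorted(name for name, words in CUE_LEXICONS.items() if tok in words))
        for tok in tokens
    ]


def candidate_bias_words(sentence):
    """Return the tokens that match at least one cue lexicon, in order."""
    return [tok for tok, cues in cue_features(sentence) if cues]


if __name__ == "__main__":
    # Hypothetical input sentence, not taken from the corpus.
    for tok, cues in cue_features("The report revealed that the plan may fail."):
        if cues:
            print(tok, cues)
```

A lexicon lookup like this only produces candidate words. Choosing the single bias-inducing word in a sentence would require a trained model over such features, which this sketch does not attempt.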




  author={Marta Recasens and Cristian Danescu-Niculescu-Mizil and Dan Jurafsky},
  title={Linguistic Models for Analyzing and Detecting Biased Language},
  booktitle={Proceedings of ACL},