Publications

As of July 2019, this list is no longer being updated.



Errata

* Uncited but relevant to this paper is Raphael Cohen et al.'s 2014 PLoS One paper, Redundancy-Aware Topic Modeling for Patient Record Notes, which develops a topic model, Red-LDA, that aims to combat the effects of text duplication.

** This work focuses solely on English, a fact which the title does not specify. While the analysis methodologies should generalize to other languages, the results discouraging stemming may not. See Chandler May et al.'s 2016 arXiv paper Analysis of Morphology in Topic Modeling for an example of a somewhat different result in Russian.

*** This paper uses a version of gender analysis (assuming binary genders and classifying gender using common baby name lists) that I would not recommend due to its inaccuracy and lack of gender inclusiveness. For a better approach, check out Brian Larson's 2017 EthNLP paper, Gender as a Variable in Natural-Language Processing: Ethical Considerations.