Here are 100 papers using our Movie Review Data, listed in roughly chronological order (will eventually be alphabetized within year). We stopped maintaining this list in April 2012 for lack of time, and it is probably not complete even up to that date. Search terms that are useful for finding papers using our datasets include “movie-review-data”.
  1. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of EMNLP, 2002. Introduced polarity dataset v0.9.
  2. Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. Proceedings of WWW, 2003.
  3. Stephen D. Durbin, J. Neal Richter, and Doug Warner. A System for Affective Rating of Texts. Proceedings of the KDD Workshop on Operational Text Classification Systems (OTC-3), 2003.
  4. Fuchun Peng. Language Independent Text Learning with Statistical n-gram Language Models. PhD thesis, University of Waterloo, 2003.
  5. Franco Salvetti, Stephen Lewis, and Christoph Reichenbach. Impact of Lexical Filtering on Overall Opinion Polarity Identification. Proceedings of the AAAI Symposium on Exploring Attitude and Affect in Text: Theories and Applications, 2004.
  6. Sreenivasa P. Sista and S. H. Srinivasan. Polarized Lexicon for Review Classification. Proceedings of the International Conference on Machine Learning; Models, Technologies & Applications, 2004.
  7. Philip Beineke, Trevor Hastie, and Shivakumar Vaithyanathan. The Sentimental Factor: Improving Review Classification via Human-Provided Information. Proceedings of the ACL, 2004.
  8. Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of the ACL, 2004. Introduced polarity dataset v2.0, subjectivity dataset v1.0.
  9. Tony Mullen and Nigel Collier. Sentiment Analysis using Support Vector Machines with Diverse Information Sources. Proceedings of EMNLP, 2004.
  10. Evgeniy Gabrilovich and Shaul Markovitch. Text Categorization with Many Redundant Features: Using Aggressive Feature Selection to Make SVMs Competitive with C4.5. ICML 2004.
  11. Edoardo M. Airoldi, William W. Cohen, Stephen E. Fienberg. Bayesian methods for frequent terms in text: Models of contagion and the Delta square statistic. Proceedings of the CSNA & INTERFACE Annual Meetings, 2005.
  12. Alekh Agarwal and Pushpak Bhattacharyya. Sentiment analysis: a new approach for effective use of linguistic knowledge and exploiting similarities in a set of documents to be classified. ICON 2005.
  13. Anthony Aue and Michael Gamon. Customizing Sentiment Classifiers to New Domains: a Case Study. Proceedings of RANLP, 2005.
  14. Pimwadee Chaovalit and Lina Zhou. Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches. Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS), 2005.
  15. Jeremy Fletcher and Jon Patrick. Evaluating the Utility of Appraisal Hierarchies as a Method for Sentiment Classification. Proceedings of the Australasian Language Technology Workshop, 2005.
  16. Alistair Kennedy and Diana Inkpen. Sentiment Classification of Movie and Product Reviews Using Contextual Valence Shifters. Workshop on the Analysis of Informal and Formal Information Exchange during Negotiations (FINEXIN 2005).
  17. Shotaro Matsumoto, Hiroya Takamura, and Manabu Okumura. Sentiment Classification using Word Sub-Sequences and Dependency Sub-Tree. Proceedings of PAKDD, 2005.
  18. Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of the ACL, 2005. Introduced scale dataset v1.0 and a positive/negative sentence collection.
  19. Jonathon Read. Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification. ACL Student Research Workshop, 2005.
  20. Jun Suzuki. Kernels for structured data in natural language processing. PhD Thesis, Nara Institute of Science and Technology, 2005.
  21. Casey Whitelaw, Navendu Garg, and Shlomo Argamon. Using Appraisal Taxonomies for Sentiment Analysis. Second Midwest Computational Linguistic Colloquium (MCLC 2005).
  22. Alekh Agarwal and Pushpak Bhattacharyya. Augmenting Wordnet with Polarity Information on Adjectives. 3rd Global Wordnet Conference, 2006.
  23. Edoardo M. Airoldi, Xue Bai, Rema Padman. Markov blankets and meta-heuristic search: Sentiment extraction from unstructured text. Lecture Notes in Computer Science, vol. 3932, 2006.
  24. Andrew B. Goldberg and Xiaojin Zhu. Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization. HLT-NAACL 2006 Workshop on Textgraphs: Graph-based Algorithms for Natural Language Processing.
  25. Yi Mao and Guy Lebanon. Sequential models for sentiment prediction. ICML Workshop on Learning in Structured Output Spaces, 2006.
  26. Vincent Ng, Sajib Dasgupta, and S. M. Niaz Arifin. Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. Proceedings of the COLING/ACL Poster Sessions, 2006.
  27. Ellen Riloff, Siddharth Patwardhan, and Janyce Wiebe. Feature subsumption for opinion analysis. Proceedings of EMNLP, 2006.
  28. Arnd Christian König and Eric Brill. Reducing the Human Overhead in Text Categorization. Proceedings of KDD, 2006.
  29. Hui Yang, Luo Si, and Jamie Callan. Knowledge Transfer and Opinion Detection in the TREC2006 Blog Track. Text REtrieval Conference 2006 (TREC2006), Gaithersburg, MD, Nov 14-17 2006.
  30. Mostafa Al Masum Shaikh, Helmut Prendinger and Mitsuru Ishizuka. An Analytical Approach to Assess Sentiment of Text. In Proc. (CD-ROM) Int'l Conf. on Computer and Information Technology (ICCIT 2007).
  31. Mostafa Al Masum Shaikh, Helmut Prendinger and Mitsuru Ishizuka. Assessing Sentiment of Text by Semantic Dependency and Contextual Valence Analysis. ACII 2007 (LNCS 4738).
  32. Shlomo Argamon, Casey Whitelaw, Paul Chase, Shushant Dhawle, Sobhan Raj Hota, Navendu Garg, and Shlomo Levitan. Stylistic text classification using functional lexical features. JASIST 58, 2007.
  33. David Blei and Jon McAuliffe. Supervised topic models. NIPS 2007.
  34. Kenneth Bloom, Navendu Garg, and Shlomo Argamon. Extracting Appraisal Expressions. NAACL HLT 2007, pp. 308--315.
  35. Erik Boiy, Pieter Hens, Koen Deschacht, Marie-Francine Moens. Automatic Sentiment Analysis in On-line Text. Openness in Digital Publishing: Awareness, Discovery and Access - Proceedings of the 11th International Conference on Electronic Publishing (ELPUB), 2007.
  36. Surajit Chaudhuri, Kenneth Church, Arnd Christian König, Liying Sui. Heavy-Tailed Distributions and Multi-Keyword Queries. SIGIR 2007.
  37. Shoushan Li, Chengqing Zong, and Xia Wang. Sentiment Classification through Combining Classifiers with Multiple Feature Sets. Natural Language Processing and Knowledge Engineering, 2007.
  38. Yi Mao and Guy Lebanon. Isotonic Conditional Random Fields and Local Sentiment Flow. NIPS 2007.
  39. Deanna Osman, John Yearwood, and Peter Vamplew. Using corpus analysis to inform research into opinion detection in blogs. Australasian data mining conference, 2007.
  40. Stephan Raaijmakers. Sentiment Classification with Interpolated Information Diffusion Kernels. Data Mining and Audience Intelligence for Advertising (ADKDD) 2007.
  41. Kimitaka Tsutsumi, Kazutaka Shimada, and Tsutomu Endo. Movie Review Classification Based on a Multiple Classifier. PACLIC, pp. 481-488, 2007.
  42. Kiduk Yang, Ning Yu, and Hui Zhang. WIDIT in TREC 2007 Blog Track: combining lexicon-based methods to detect opinionated blogs. TREC 2007.
  43. Omar F. Zaidan, Jason Eisner, and Christine Piatko. Using “Annotator Rationales” to Improve Machine Learning for Text Categorization. NAACL HLT 2007.
  44. Qi Zhang, Bingqing Wang, Lide Wu, Xuanjing Huang. FDU at TREC 2007: opinion retrieval of Blog Track. TREC 2007.
  45. Xiaojin Zhu and Andrew B. Goldberg. Kernel regression with order preferences. AAAI 2007.
  46. Ritesh Agarwal, T. V. Prabhakar, Sugato Chakrabarty. "I know what you feel": analyzing the role of conjunctions in automatic sentiment analysis. Proceedings of the 6th international conference on advances in natural language processing, 2008.
  47. Ben Allison. Sentiment Detection Using Lexically-Based Classifiers. Proceedings of TSD 2008.
  48. Alina Andreevskaia and Sabine Bergler. When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. ACL 2008.
  49. Jake Bartlett and Russ Albright. Coming to a Theater Near You! Sentiment Classification Techniques Using SAS Text Miner. SAS Global Forum 2008.
  50. Erik Boiy and Marie-Francine Moens. A machine learning approach to sentiment analysis in multilingual Web texts. Information Retrieval, 2008.
  51. Brant Chee, Karrie G. Karahalios, Bruce Schatz. Social visualization of health messages. HICSS 2008.
  52. Bo Chen, Hui He, Jun Guo. Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis. Journal of Computer Science and Technology 23(2): 231--239, 2008.
  53. Harb et al. Web Opinion Mining: How to extract opinions from blogs? International Conference on Soft Computing as Transdisciplinary Science and Technology, 2008.
  54. Daisuke Ikeda, Hiroya Takamura, Lev-Arie Ratinov, Manabu Okumura. Learning to Shift the Polarity of Words for Sentiment Classification. IJCNLP 2008.
  55. D. Li, A. Laurent, M. Roche and P. Poncelet. Extraction of Opposite Sentiments in Classified Free Format Text Reviews. In Proceedings of the 19th International Conference on Database and Expert Systems Applications (DEXA 08).
  56. Milos Radovanovic and Mirjana Ivanovic. Text mining: approaches and applications. Novi Sad J. Math. 38(3), 2008.
  57. Vikas Sindhwani and Prem Melville. Document-Word Co-Regularization for Semi-supervised Sentiment Analysis. Extended version of a paper at ICDM 2008.
  58. Fangzhong Su and Katja Markert. From Words to Senses: a Case Study in Subjectivity Recognition. COLING 2008.
  59. Pu Wang and Carlotta Domeniconi. Building semantic kernels for text classification using Wikipedia. KDD 2008.
  60. Omar F. Zaidan and Jason Eisner. Modeling Annotators: A Generative Approach to Learning from Annotator Rationales. EMNLP 2008.
  61. Yi Zhang, Arun Surendran, John Platt, and Mukund Narasimhan. Learning from multi-topic web documents for contextual advertisement. KDD 2008.
  62. Sajib Dasgupta and Vincent Ng. Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification. EMNLP 2009.
  63. Yuan Yuan Hao, Yi Jun Li, Peng Zou. Why some online product reviews have no usefulness rating? PACIS 2009.
  64. Xuanjing Huang and W. Bruce Croft. A unified relevance model for opinion retrieval. CIKM 2009.
  65. Jungi Kim, Jin-Ji Li and Jong-Hyeok Lee. Discovering the discriminative views: Measuring term weights for sentiment analysis. ACL/IJCNLP, 2009.
  66. Shoushan Li, Rui Xia, Chengqing Zong, Chu-Ren Huang. A Framework of Feature Selection Methods for Text Categorization. ACL/IJCNLP 2009.
  67. Yi Mao and Guy Lebanon. Domain Knowledge Uncertainty and Probabilistic Parameter Constraints. Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), 2009.
  68. Yi Mao and Guy Lebanon. Generalized isotonic conditional random fields. Machine Learning, 2009.
  69. Justin Martineau and Tim Finin. Delta TFIDF: An improved feature space for sentiment analysis. ICWSM 2009.
  70. Tim O'Keefe and Irena Koprinska. Feature selection and weighting methods in sentiment analysis. Australasian Document Computing Symposium, 2009.
  71. Rudy Prabowo and Mike Thelwall. Sentiment analysis: A combined approach. Journal of Informetrics 2009. [Preprint]
  72. Veselin Raychev and Preslav Nakov. Language-Independent Sentiment analysis using subjectivity and positional information. RANLP 2009.
  73. Rudy Prabowo and Mike Thelwall. Sentiment analysis: a combined approach. Journal of Informetrics 2009.
  74. Muhammad Abulaish, Tanvir Ahmad, Jahiruddin, and Mohammad Najmud Doja. Opinion-based imprecise query answering. PAKDD 2010.
  75. Shilpa Arora, Elijah Mayfield, Carolyn Penstein-Rose and Eric Nyberg. Sentiment classification using automatically extracted subgraph features. NAACL workshop on Computational approaches to analysis and generation of emotion in text, 2010.
  76. Adrian Bickerstaff and Ingrid Zukerman. A hierarchical classifier applied to multi-way sentiment detection. Coling 2010.
  77. Kenneth Bloom and Shlomo Argamon. Automated learning of appraisal extraction patterns. in Corpus-linguistic applications: Current studies, new directions, 2010.
  78. Jorge Carrillo de Albornoz, Laura Plaza, Pablo Gervás. A hybrid approach to emotional sentence polarity and intensity classification. CoNLL 2010.
  79. Remi Lavalley, Chloé Clavel, Marc El-Beze, Patrice Bellot. Finding topic-specific strings in text categorization and opinion mining contexts. International Conference on Data Mining (DMIN 2010).
  80. Chenghua Lin, Yulan He, and Richard Everson. A comparative study of Bayesian models for unsupervised sentiment detection. CoNLL 2010.
  81. Fernanda S. Pimenta, Darko Obradović, Rafael Schirru, Stephan Baumann and Andreas Dengel. Automatic sentiment monitoring of specific topics in the blogosphere. ECML PKDD Workshop on Dynamic Networks and Knowledge Discovery, 2010.
  82. Tetsuji Nakagawa, Kentaro Inui, and Sadao Kurohashi. Dependency tree-based sentiment classification using CRFs with hidden variables. NAACL 2010.
  83. Georgios Paltoglou and Mike Thelwall. A study of Information Retrieval weighting schemes for sentiment analysis. ACL 2010.
  84. Vassiliki Rentoumi, Stefanos Petrakis, Vangelis Karkaletsis, Manfred Klenner and George A. Vouros. A collaborative system for sentiment analysis. SETN 2010.
  85. Ainur Yessenalina, Yejin Choi, and Claire Cardie. Automatically generating annotator rationales to improve sentiment classification. ACL 2010.
  86. Ainur Yessenalina, Yisong Yue, and Claire Cardie. Multi-level structured models for document-level sentiment classification. EMNLP 2010.
  87. Adnan Duric and Fei Song. Feature Selection for Sentiment Analysis Based on Content and Syntax Models. ACL Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, 2011.
  88. Yulan He, Chenghua Lin, and Harith Alani. Automatically extracting polarity-bearing topics for cross-domain sentiment classification. ACL 2011.
  89. Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. ACL 2011.
  90. Vivek Singh, Mousumi Mukherjee and Ghanshyam Kumar Mehta. Combining Collaborative Filtering and Sentiment Classification for Improved Movie Recommendations. Multi-disciplinary trends in Artificial Intelligence, 2011.
  91. Richard Socher, Jeffrey Pennington, Eric Huang, Andrew Y. Ng, and Christopher D. Manning. Semi-supervised recursive autoencoders for predicting sentiment distribution. EMNLP 2011.
  92. Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-based methods for sentiment analysis. Computational Linguistics 2011.
  93. Suresh Venkatasubramanian, Ashok Veilumuthu, Avanthi Krishnamurthy, C. E. Veni Madhavan, Kaushik Nath, and Sunil Arvindam. A non-syntactic approach for text sentiment classification with stopwords. Proceedings of the 20th international conference companion on World wide web, 2011.
  94. Marco Bonzanini, Miguel Martinez-Alvarez and Thomas Roelleke. Investigating the Use of Extractive Summarisation in Sentiment Classification. 3rd Italian Information Retrieval Workshop (IIR 2012).
  95. Liviu P. Dinu and Iulia Iuga. The Naive Bayes Classifier in Opinion Mining: In Search of the Best Feature Set. CICLing 2012.
  96. Yulan He. Incorporating Sentiment Prior Knowledge for Weakly-Supervised Sentiment Analysis. ACM TALIP, 2012.
  97. Amanda Hutton, Alexander Liu, Cheryl Martin. Crowdsourcing Evaluations of Classifier Interpretability. AAAI Spring Symposium on Wisdom of the Crowds, 2012.
  98. Seungyeon Kim, Fuxin Li, Guy Lebanon, and Irfan Essa. Beyond sentiment: The manifold of human emotions. arXiv:1202.1568v1, 2012.
  99. G. Li and F. Liu. Application of a clustering method on sentiment analysis. Journal of Information Science 38(2):127--139, 2012.
  100. Hassan Saif, Yulan He, and Harith Alani. Alleviating data sparsity for Twitter sentiment analysis. Workshop on Making Sense of Microposts, 2012.
Data used in demos or distributed software:
  1. Bob Carpenter. Sentiment classification tutorial, 2005.
  2. Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing in Python. Forthcoming book (as of summer 2008).