Opinion mining and sentiment analysis

Bo Pang and Lillian Lee
Foundations and Trends in Information Retrieval 2(1-2), pp. 1–135, 2008.
Also available as a book or e-book.

An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.

This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, vulnerability to manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Mentions (roughly chronological order):
“Congratulations”, Matthew Hurst | “THE survey to read”, “tremendous resource”, Jeffrey Carr | “Excellent”, George Tziralis | “excellent points”, Jessica Hullman | “more than a must”, José María Gómez Hidalgo | “excellent and very comprehensive”, Philip Resnik | “excellent and comprehensive survey”, Nikolay Archak, Anindya Ghose, and Panagiotis G. Ipeirotis | “a gold mine”, Jaylan Turkkan | “definitive monograph”, Seth Grimes, who also wrote a practitioners'-perspective mini-review | “entertaining ... excellent and timely”, Shlomo Argamon, Computational Linguistics brief review | linked to under anchor text “science” of sentiment by Discover Magazine's blog and named by an article on sentiment analysis in the New York Times.

Textbook for the following courses: Social Media Analysis, William Cohen, CMU Spring 2010; Computational linguistics II: opinion mining and sentiment analysis, Hyopil Shin, Seoul National University, Spring 2009

Table of Contents:

  1. Introduction
    1. The demand for information on opinions and sentiment
    2. What might be involved? An example examination of the construction of an opinion/review search engine
    3. Our charge and approach
    4. Early history
    5. A note on terminology: Opinion mining, sentiment analysis, subjectivity, and all that
  2. Applications
    1. Applications to review-related websites
    2. Applications as a sub-component technology
    3. Applications in business and government intelligence
    4. Applications across different domains
  3. General Challenges
    1. Contrasts with standard fact-based textual analysis
    2. Factors that make opinion mining difficult
  4. Classification and Extraction
    Part One: Fundamentals
    1. Problem formulations and key concepts
      1. Sentiment polarity and degrees of positivity
      2. Subjectivity detection and opinion identification
      3. Joint topic-sentiment analysis
      4. Viewpoints and perspectives
      5. Other non-factual information in text
    2. Features
      1. Term presence vs. frequency
      2. Term-based features beyond term unigrams
      3. Parts of speech
      4. Syntax
      5. Negation
      6. Topic-oriented features
    Part Two: Approaches
    1. The impact of labeled data
    2. Domain adaptation and topic-sentiment interaction
      1. Domain considerations
      2. Topic (and sub-topic or feature) considerations
    3. Unsupervised approaches
      1. Unsupervised lexicon induction
      2. Other unsupervised approaches
    4. Classification based on relationship information
      1. Relationships between sentences and between documents
      2. Relationships between discourse participants
      3. Relationships between product features
      4. Relationships between classes
    5. Incorporating discourse structure
    6. Language models
    7. Special considerations for extraction
      1. Identifying product features and opinions in reviews
      2. Problems involving opinion holders
  5. Summarization
    1. Single-document opinion-oriented summarization
    2. Multi-document opinion-oriented summarization
      1. Some problem considerations
      2. Textual summaries
      3. Non-textual summaries
      4. Review(er) quality
  6. Broader Implications
    1. Economic impact of reviews
      1. Surveys summarizing relevant economic literature
      2. Economic-impact studies employing automated text analysis
      3. Interactions with word of mouth (WOM)
    2. Implications for manipulation
  7. Publicly Available Resources
    1. Datasets
      1. Acquiring labels for data
      2. An annotated list of datasets
    2. Evaluation campaigns
      1. TREC opinion-related competitions
      2. NTCIR opinion-related competitions
    3. Lexical resources
    4. Tutorials, bibliographies, and other references
  8. Concluding Remarks
  9. References
