CS 6742, Fall 2013: Natural Language Processing and Social Interaction

Time and place: Tuesdays and Thursdays, 10:10-11:25, Upson 315
Instructor: Professor Lillian Lee. For contact info, see http://www.cs.cornell.edu/home/llee
Course homepage: http://www.cs.cornell.edu/courses/cs6742/2013fa. Main site for course info, assignments, readings, lecture references, etc.; updated frequently.
Course CMS page: http://cms.csuglab.cornell.edu. Site for submitting assignments.
Course Piazza page: http://piazza.com/cornell/fall2013/cs6742. Course announcements and Q&A/discussion site. Social interaction and all that, you know.
 
This page last modified Fri September 27, 2013 3:30 PM.

Brief course description

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. The intended audience is strongly research-oriented students.

Prerequisites: CS 2110 or equivalent programming experience, and at least one course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning), and graduate standing;
  or,
permission of instructor.

For more information

Lectures

access restricted = access restricted to students in the course

Quick links: overview | (start of) review sites | discussion threads | (start of) discourse analysis

Lecture | Date | Agenda and references | Assignments and other handouts
#1 Th Aug 29

Course overview

Scan of lecture notes

Related talk:

In the afternoon, I'm giving the CS colloquium on some recent research I've been involved in:
Language as influence(d): Power and Memorability
4:15, Upson B17

References:

Bennett, Shea. Twitter Now Seeing 400 Million Tweets Per Day, Increased Mobile Ad Revenue, Says CEO. June 7, 2012

Brake, David R. 2009. ‘As if nobody’s reading’?: The imagined audience and socio-technical biases in personal blogging practice in the UK. Ph.D. Thesis, the London School of Economics and Political Science.

Hancock, Jeffrey T., Jennifer Thom-Santelli, and Thompson Ritchie. 2004. Deception and design: The impact of communication technology on lying behavior. Proceedings of the SIGCHI conference on Human factors in computing systems (CHI): 129-134. doi:10.1145/985692.985709.

Lejeune, Philippe. 2009. On Diary. Ed. Jeremy D. Popkin and Julie Rak. University of Hawaii Press. [Google books link]. The editors note that Lejeune dates the practice of the "Dear diary" entry heading to the end of the nineteenth century; "The diffusion of the practice of writing to one's 'dear diary' is significant: even though each diarist wrote in private, the spread of the formula indicates that diarists were increasingly aware that they were following a widely diffused model" (pg. 7).

Marwick, Alice E., and danah boyd. 2010. I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience. New Media & Society (July 7). doi:10.1177/1461444810365313.

Pang, Bo, and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. Describes the positive/negative opinion examples from lecture in more detail (and gives their sources) on pages 19 and 21.

The Radicati group. Email Statistics Report, 2011-2015. May 2011.

Assignment 1 (A1)
#2 Tu Sep 3

To what extent is there social interaction on review sites?

Scan of lecture notes

Amazon's explanation of customer review quotes

Amazon comment thread regarding personal point of view in reviews

Remark on the effectiveness of Amazon review quotes. From LinkedIn page of someone stating that they worked on the product.

Review with 42/42 helpfulness score

The Harriet Klausner appreciation society. A site regarding Harriet Klausner, one-time number-one top reviewer on Amazon.com.

References:

Gilbert, Eric and Karrie Karahalios. 2010. Understanding deja reviewers. Proceedings of CSCW, pp. 225–228. [ACM link] [alternative link]

Otterbacher, Jahna. 2012. Gender, writing and ranking in review forums: A case study of the IMDB. Knowledge and Information Systems 35 (3): 645-664.

Pang, Bo, and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. Section 5.2.4 reviews work on predicting helpfulness of reviews; later work can be found by forward-searching in a citation database for work that cites the references from our chapter.

Pinch, Trevor and Filip Kesler. 2011. How Aunt Ammy gets her free lunch: A study of the top-thousand customer reviewers at Amazon.com. http://www.freelunch.me/filecabinet. Possibly related version (I haven't looked carefully): Trevor Pinch, "Book Reviewing for Amazon.com: How Socio-technical Systems Struggle to Make Less From More," in Managing Overflow in Affluent Societies, Barbara Czarniawska and Orvar Löfgren (eds.). New York and London: Routledge, 2012.

Wu, Fang, and Bernardo A. Huberman. 2010. Opinion formation under costly expression. ACM Transactions on Intelligent Systems and Technology 1, no. 1: 1-13.
 
#3 Th Sep 5

Correlates of review helpfulness; the making of our WWW 2009 paper

Scan of lecture notes

Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW: 141–150. The slides we looked at in class can be found in this pdf.

Ghose, Anindya and Panagiotis Ipeirotis. 2011. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering 23(10): 1498–1512. Official link can be found through Worldcat, e.g., here.

Kim, Soo-Min, Patrick Pantel, Tim Chklovski, and Marco Pennacchiotti. 2006. Automatically assessing review helpfulness. Proceedings of EMNLP, 423-430.

Liu, Jingjing, Yunbao Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. 2007. Low-quality product review detection in opinion summarization. Proceedings of EMNLP-CoNLL, pp. 334–342.

Lu, Yue, Panayiotis Tsaparas, Alexandros Ntoulas, and Livia Polanyi. 2010. Exploiting social context for review quality prediction. Proceedings of WWW, 691-700.

Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. Proceedings of CHI, 955-964.

Zhang, Zhu and Balaji Varadarajan. 2006. Utility scoring of product reviews. Proceedings of CIKM, pp. 51–57.

 
#4 Tu Sep 10

Asynchronous online discussions

Scan of lecture notes

Wikipedia pages we examined: An article's talk page. An article's revision history page. A structured user talk page. An unstructured user talk page. Statistics on Wikipedians. Requests for adminship. Article deletion discussions.

Slashdot pages we examined: Slashdot home. Slashdot scoring system. Wikipedia's explanation of the Slashdot scoring system. A Slashdot corpus (may be downloadable if you register for an account): README and legal notices; training set. Another corpus containing Slashdot material is the BC3 Blog Corpus.

References:

Backstrom, Lars, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: Expansion, focus, volume, re-entry. Proceedings of WSDM, pp. 13–22.

Bakshy, Eitan, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. Everyone's an influencer: Quantifying influence on Twitter. Proceedings of WSDM.

Chen, Zoey and Jonah Berger. 2013. When, why, and how controversy causes conversation. Journal of Consumer Research, 40:580–593.

De Choudhury, Munmun, Hari Sundaram, Ajita John, and Dorée Duncan Seligmann. 2009. What makes conversations interesting?: Themes, participants and consequences of conversations in online social media. Proceedings of WWW, pp. 331–340.

Elsner, Micha and Eugene Charniak. September 2010. Disentangling chat. Computational Linguistics 36 (3): 389-409.

Ferschke, Oliver, Johannes Daxenberger, and Iryna Gurevych. 2013. A survey of NLP methods and resources for analyzing the collaborative writing process in Wikipedia. Chapter 5 in The People's Web Meets NLP: Collaboratively Constructed Language Resources.

Gómez, Vicenç, Andreas Kaltenbrunner, and Vicente López. 2008. Statistical analysis of the social network and discussion threads in Slashdot. Proceedings of WWW, pp. 645–654.

Mishne, Gilad and Natalie Glance. 2006. Leave a reply: An analysis of weblog comments. Third Annual Workshop on the Weblogging Ecosystem.

Nov, Oded. 2007. What motivates Wikipedians? CACM 50(11):60–64.

Shmueli, Erez, Amit Kagian, Yehuda Koren, and Ronny Lempel. 2012. Care to comment?: Recommendations for commenting on news stories. Proceedings of WWW, pp. 429–438.

Wang, Yi-Chia, Mahesh Joshi, and Carolyn Penstein Rosé. 2008. Investigating the effect of discussion forum interface affordances on patterns of conversational interactions. Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (CSCW), pp. 555–558.

 
#5 Th Sep 12 Pilot-study presentations  
#6 Tu Sep 17

Discourse phenomena: clues regarding structure

Scan of lecture notes

Some conversation/discourse corpora:

AMI Meeting Corpus

British Columbia Conversation Corpus (40 email threads)

Enron email dataset

Penn Discourse Treebank

Saarbrücken Corpus of Spoken English

Santa Barbara Corpus of Spoken American English

Supreme Court dialogs corpus

IRC chat data and disentanglement code from Micha Elsner [.tgz]

Wikipedian conversations corpus: [zipfile] [README alone]

References not already in the lecture handout:

Grice, H.P. 1975. Logic and Conversation. In Syntax and semantics 3: Speech Acts, pp. 41-58.

Jurafsky, Dan, and Martin, James H. 2009. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Second edition, illustrated. Upper Saddle River, N.J.: Prentice Hall. Chapter 21 covers discourse.

Rogers, Todd and Michael I Norton. June 2011. The artful dodger: Answering the wrong question the right way. Journal of Experimental Psychology: Applied 17 (2).

Lecture handout
#7 Th Sep 19

Global discourse structure: the Grosz and Sidner theory

Scan of lecture notes

Clark, Herbert H. 1996. Using language. Second edition. Cambridge University Press.

Clark, Herbert H., and Fox Tree, Jean E. 2002. Using uh and um in spontaneous speaking. Cognition 84, no. 1: 73 - 111. doi:10.1016/S0010-0277(02)00017-3.

Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204.

Mann, William C., and Thompson, Sandra A. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text: Interdisciplinary Journal for the Study of Discourse 8, no. 3: 243-281.

Marcu, Daniel. 2000. Extending a formal and computational model of Rhetorical Structure Theory with intentional structures à la Grosz and Sidner. Proceedings of COLING: 523-529

Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature

 

A2
#8 Tu Sep 24 No class  
#9 Th Sep 26

Discussion of A2 (discourse annotation exercise)

Scan of lecture notes

Egg, Markus and Gisela Redeker. 2010. How complex is discourse structure? Proceedings of LREC, 1619-1623.

Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264.

Wolf, Florian and Edward Gibson. 2005. Representing discourse coherence: A corpus-based study. Computational Linguistics 31(2):249-288.

 
  Tu Oct 1

Class really meets on Wednesday Oct 2, 4-5pm in the 301 College Ave seminar room: Cristian Danescu-Niculescu-Mizil (MPI-SWS) is giving the IS colloquium on Language and social dynamics in online communities.

References:

Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. Proceedings of ACL.

Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. Proceedings of the 22nd International Conference on World Wide Web, pp.307–318.

 

 
  Tu Oct 15 Fall Break - no class  
  Th Nov 28 Thanksgiving Break - no class  
  W Dec 11 Final project due  

The code for generating the calendar above and the CSS were (barely) adapted from the original versions created by Andrew Myers.