CS 6742: Natural Language Processing and Social Interaction

Time and place Tuesdays and Thursdays, 10:10-11:00, Upson 315
Instructor Professor Lillian Lee. For contact info and updates, see http://www.cs.cornell.edu/home/llee
Co-pilot Cristian Danescu-Niculescu-Mizil, http://www.cs.cornell.edu/~cristian
Course homepage http://www.cs.cornell.edu/courses/cs6742/2011sp. Site for course info; updated frequently.
Course CMS page http://cms.csuglab.cornell.edu. Site for submitting assignments.

Prerequisites CS 2110 or equivalent programming experience, course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning), and graduate standing; or, permission of instructor.

Brief course description

More and more of life now takes place online, and a growing share of the digital traces left by human activity is recorded in natural-language form. This course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes.

CS 6742 is geared towards research-oriented students, though it is not tailored specifically to those with extensive research experience or with research interests directly in natural language processing or social interaction (network science, social media, traditional modes of communication, etc.). See the course description and policies first-day handout for a list of related courses with different foci.



Lecture Date

Agenda and references

"access restricted" = access restricted to students in the course

Assignments and other handouts


#1 Jan 25

Course overview

Lecture references:

Hancock, Jeffrey T., Thom-Santelli, Jennifer, and Ritchie, Thompson. 2004. Deception and design: The impact of communication technology on lying behavior. Proceedings of the SIGCHI conference on Human factors in computing systems. CHI '04: 129-134. doi:10.1145/985692.985709.

Pang, Bo, and Lee, Lillian. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. Describes the positive/negative opinion examples in more detail (and gives their sources) on pages 19 and 21.

ToneCheck and ToneADay

References on audience:

Brake, David R. 2009. ‘As if nobody’s reading’?: The imagined audience and socio-technical biases in personal blogging practice in the UK. Ph.D. Thesis, the London School of Economics and Political Science.

Marwick, Alice E., and boyd, danah. 2010. I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience. New Media & Society (July 7). doi:10.1177/1461444810365313.

  • course description and policies first-day handout

  • A1 You may need to download the file and open it outside your browser to see the additional comments in the paper.

    Answers due 6am Thursday Jan 27th on the course CMS (get added to course CMS by 3pm Wed, as per instructions).

#2 27

Introduction: the question of review helpfulness

Happy families are all alike; every unhappy family is unhappy in its own way. —Tolstoy, Anna Karenina

Lecture references:

Gilbert, Eric, and Karahalios, Karrie. 2010. Understanding deja reviewers. Proceedings of the 2010 ACM conference on Computer supported cooperative work. CSCW '10: 225-228. doi:10.1145/1718918.1718961.

Liu, J, Cao, Y, Lin, C Y, Huang, Y, and Zhou, M. 2007. Low-quality product review detection in opinion summarization. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL): 334-342. The review (excerpt) we evaluated in class was annotated as a "Best" review in this paper. The paper also addresses issues of review bias.

Other references on automatic helpfulness systems (see also next lecture):

Pang, Bo, and Lee, Lillian. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. A survey of issues and prior work (including in the economics literature) regarding review(er) quality is presented on pages 80-88.

Ghose, Anindya, and Ipeirotis, Panagiotis. 2010. Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics. IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2010.188. Note in particular the "economic impact" part.

Kim, Soo-Min, Pantel, Patrick, Chklovski, Tim, and Pennacchiotti, Marco. 2006. Automatically assessing review helpfulness. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing: 423-430. Early and representative paper on the classification problem. Note that the most useful features included review length and product rating.

Tsur, Oren, and Rappoport, Ari. 2009. Revrank: A fully unsupervised algorithm for selecting the most helpful book reviews. International AAAI Conference on Weblogs and Social Media. Uses distance from a sort of "review centroid" to rank reviews, which is reminiscent of the Gilbert/Karahalios paper.

More on reviewer motivations:

David, Shay, and Pinch, Trevor J. 2006. Six degrees of reputation: The use and abuse of online review and recommendation systems. First Monday, Special Issue on Commercial Applications of the Internet.

Wu, Fang, and Huberman, Bernardo A. 2010. Opinion formation under costly expression. ACM Transactions on Intelligent Systems and Technology 1, no. 1: 1-13.


A2 reading

#3 Feb 1

Case study: helpfulness as a proxy for influence, and the making of our "helpfulness" paper (WWW 2009)

I have come to the conclusion that the making of laws is like the making of sausages—the less you know about the process the more you respect the result. —Frank W. Tracy (1898) [apparently often misattributed to Bismarck]

Lecture references (see also previous lecture):

Danescu-Niculescu-Mizil, Cristian, Kossinets, Gueorgi, Kleinberg, Jon, and Lee, Lillian. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW: 141-150.

Sites of possible interest mentioned during lecture (including by students):

Other references (see also previous lecture):

Lu, Yue, Tsaparas, Panayiotis, Ntoulas, Alexandros, and Polanyi, Livia. 2010. Exploiting social context for review quality prediction. Proceedings of the 19th international conference on World wide web. WWW '10: 691-700. doi:10.1145/1772690.1772761.

Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. Proceedings of the 27th international conference on Human factors in computing systems: 955-964.

#4 3

Project-proposal session: Mishne, Gilad, and Glance, Natalie. 2006. Leave a reply: An analysis of weblog comments. Third annual workshop on the weblogging ecosystem.



Submitted suggestions, anonymized and randomized (access restricted)

Project proposal, including references to related work and links mentioned in class (access restricted)


A3 out



#5 8

Case study: Accommodation (linguistic style adaptation)

Lecture references:

Danescu-Niculescu-Mizil, Cristian, Michael Gamon, and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW (2011).

Ireland, Molly E., Richard B. Slatcher, Paul W. Eastwick, Lauren E. Scissors, Eli J. Finkel, and James W. Pennebaker. 2011. Language style matching predicts relationship initiation and stability. Psychological Science 22 (1): 39-44.

Levelt, Willem J. M., and Stephanie Kelter. Surface form and memory in question answering. Cognitive Psychology 14, no. 1 (1982).

Taylor, Paul J and Sally Thomas. 2008. Linguistic style matching and negotiation outcome. Negotiation and Conflict Management Research 1:263-281.

van Baaren, Rick B., Rob W. Holland, Bregje Steenaert, and Ad van Knippenberg. 2003. Mimicry for money: Behavioral consequences of imitation. Journal of Experimental Social Psychology 39 (4): 393-398.

More on linguistic style adaptation/coordination (also related to or called by other terms):

Bilous, Frances R and Robert M. Krauss. 1988. Dominance and accommodation in the conversational behaviours of same- and mixed-gender dyads. Language & Communication 8 (3-4): 183 - 194.

Niederhoffer, Kate G. and James W. Pennebaker. 2002. Linguistic style matching in social interaction. Journal of Language and Social Psychology 21 (4): 337-360.

Giles, Howard, Justine Coupland, and Nikolas Coupland. 1991. Accommodation theory: Communication, context, and consequence. In Contexts of Accommodation: Developments in Applied Sociolinguistics. Cambridge Univ Pr.

More on Twitter:

Bakshy, Eitan, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. Everyone's an influencer: Quantifying influence on Twitter. Proceedings of WSDM.

Cha, Meeyoung, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. 2010. Measuring user influence in Twitter: The million follower fallacy. In Proceedings of ICWSM.

Chen, Jilin, Nairn, Rowan, and Chi, Ed H. 2011. Speak little and well: Recommending conversations in online social streams. Proceedings of CHI.

Eisenstein, Jacob, O'Connor, Brendan, Smith, Noah A., and Xing, Eric P. 2010. A latent variable model for geographic lexical variation. Proceedings of EMNLP: 1277-1287.

Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. What Is Twitter, a Social Network or a News Media? In Proceedings of WWW, 2010.

#6 10

Introduction to discourse analysis


Lecture references (see also handout):

Clark, Herbert H., and Fox Tree, Jean E. 2002. Using uh and um in spontaneous speaking. Cognition 84, no. 1: 73 - 111. doi:10.1016/S0010-0277(02)00017-3.

Jurafsky, Dan, and Martin, James H. 2009. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Second edition, illustrated. Upper Saddle River, N.J.: Prentice Hall. Chapter 21 covers discourse.

For more on imagined audiences, see the lecture 1 references.

References on local focus:

Brennan, Susan E., Friedman, Marilyn W., and Pollard, Carl J. 1987. A centering approach to pronouns. Proceedings of the ACL: 155-162.

Grosz, Barbara J, Weinstein, Scott, and Joshi, Aravind K. 1995. Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21 (June): 203-225.

Other references:

Lejeune, Philippe. 2009. On diary. Ed. by Jeremy D. Popkin and Julie Rak. University of Hawaii Press. The editors note that Lejeune dates the practice of the "Dear diary" entry heading to the end of the nineteenth century; "The diffusion of the practice of writing to one's "dear diary" is significant: even though each diarist wrote in private, the spread of the formula indicates that diarists were increasingly aware that they were following a widely diffused model" (pg. 7).

Some conversation/discourse corpora:

AMI Meeting Corpus

British Columbia Conversation Corpus (40 email threads)

Enron email dataset

Penn Discourse Treebank

Saarbrücken Corpus of Spoken English

Santa Barbara Corpus of Spoken American English

IRC chat data and disentanglement code from Micha Elsner [.tgz]


Discourse examples handout (with citations added)

#7 15

Theories of discourse structure

Lecture references (see also previous lecture):

Clark, Herbert H. 1996. Using language. Second edition. Cambridge University Press.

Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204.

Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature

References on alternate theories:


Mann, William C., and Thompson, Sandra A. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text: Interdisciplinary Journal for the Study of Discourse 8, no. 3: 243-281.

Marcu, Daniel. 2000. Extending a formal and computational model of Rhetorical Structure Theory with intentional structures à la Grosz and Sidner. Proceedings of COLING: 523-529. doi:10.3115/990820.990896.

Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264.

A4 out

lecture handout (bring to next class, too)

#8 17

(a) Extended discourse example

(b) Case study: from discourse to influence via politics, and the "Get out the vote" paper (EMNLP 2006)

Lecture references (see also previous lecture):

Orwell, George. 1946. Politics and the English language. Horizon.

Monroe, Burt L. and Philip A Schrodt. 2008. Introduction to the special issue: The statistical analysis of political text. Political Analysis 16 (4): 351-355.


Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of EMNLP.

handout: excerpt from a U.S. Congress floor debate

(also last lecture's handout)

#9 22

Group discussion of A4 preliminary solutions



#10 24

Group discussion of A4 final solutions

Subsequent assignments "out" on Thursdays are due the next Wednesday morning for the whole class, with the discussion leader running the project-discussion the following day and turning in a finalized project proposal the day after that (Friday). So, the process takes about a week.


#11 Mar 1

Influence and diffusion


Lecture references:

For more on economic impact, see lecture 2.

Androutsopoulos, Ion and Prodromos Malakasiotis. A survey of paraphrasing and textual entailment methods. J. Artif. Int. Res. 38:135-187.

Backstrom, Lars, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of KDD, 44-54. "Taken together, these results support the notion that a burst of authors moving into a conference C from some other conference B are drawn to topics that are currently hot at C; but there is also evidence that this burst of authors produces papers that are comparably impoverished in their usage of terms that will be hot in the future".

Bordia, Prashant and Nicholas Difonzo. Problem solving in social interactions on the internet: Rumor as social cognition. Social Psychology Quarterly 67 (1): 33-49.

Broder, John M. 2007. Familiar Fallback for Officials: ‘Mistakes Were Made’. The New York Times, March 14. C'mon, guys! "Past exonerative"! How can you not laugh at William Schneider's coinage?!

Gruhl, Daniel, R Guha, David Liben-Nowell, and Andrew Tomkins. 2004. Information diffusion through blogspace. In Proceedings of WWW, 491-501.

Guerin, Bernard and Yoshihiko Miyazaki. Analyzing rumors, gossip, and urban legends through their conversational properties. Psychological Record 56 (1): 23-34.

Heath, Chip, Chris Bell, and Emily Sternberg. Emotional selection in memes: The case of urban legends. Journal of Personality and Social Psychology 81 (6): 1028-1041.

Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? Proceedings of WWW, 591-600.

Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-Tracking and the dynamics of the news cycle. In Proceedings of KDD, 497-506. Data and visualization website: http://www.memetracker.org/index.html

Madnani, Nitin and Bonnie J Dorr. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Comput. Linguist. 36 (3): 341-387.

Taraborelli, Dario and Giovanni Luca Ciampaglia. Beyond notability. [sic] Collective deliberation on content inclusion in Wikipedia. Second International Workshop on Quality in Techno-Social Systems. Here is the visualization and data site: notabilia.net

Wu, Fang, Bernardo A. Huberman, Lada A. Adamic, and Joshua R. Tyler. Information flow in social groups. Physica A: Statistical and Theoretical Physics 337 (1-2): 327-335.



A6 out, reading for student-led project-proposal session, due the morning of the Monday before lecture 13.

Subsequent assignments "out" on Tuesdays are due the next Monday morning for the whole class, with the discussion leader running the project-discussion the following day and turning in a finalized project proposal the day after that (Wednesday). So, the process takes about a week.

#12 3

Project proposal session for A5, led by Bishan.

Ranganath, Rajesh, Dan Jurafsky, and Dan McFarland. 2009. It's not you, it's me: Detecting flirting and its misperception in speed-dates. Proceedings of EMNLP.

A7 reading released




#13 8

Project proposal session for A6, led by Amit

Shaparenko, Benyah and Thorsten Joachims. 2007. Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 619-628.

A8 reading released (requires Cornell IP address; or students in the class can access via netID login here)

#14 10

Project proposal session for A7, led by Nikunj

Harper, F Maxwell, Daniel Moy, and Joseph A Konstan. 2009. Facts or friends?: Distinguishing informational and conversational questions in social Q&A sites. In Proceedings of CHI, 759-768.

A9 reading released

#15 15

Project proposal session for A8, led by Karthik

Guerini, Marco, Carlo Strapparava, and Oliviero Stock. 2008. Trusting politicians' words (for persuasive NLP). In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing (Cicling).

#16 17

Project proposal session for A9, led by Jon

Altmann, Eduardo G, Janet B Pierrehumbert, and Adilson E Motter. 2011. Niche as a determinant of word fate in online groups. PLoS ONE 6(5): e19009. doi:10.1371/journal.pone.0019009.

A10 reading (access restricted) released

A11 reading released

Mar 22 Spring Break
Mar 24 Spring Break

No class, to avoid requiring work over Spring Break (per recent Faculty Senate resolution)

#17 31

Project proposal session for A10, led by Chenhao

Bramsen, Philip, Martha Escobar-Molano, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT.

#18 Apr 05

Project proposal session for A11, led by Bin

Girju, Roxana. 2010. Toward social causality: An analysis of interpersonal relationships in online blogs and forums. Proceedings of ICWSM, pp. 66-73.

#19 7

Organization session for picking projects/groups

A12 (project "contracts") opened


Small-group presentations to Lillian and Cristian of project contracts: ~15-minute presentations of the final draft of your project contracts, for additional feedback/comments.




No class, so as to allow you to concentrate on your projects. You may want to use the time to meet with your group, since you know you all have the time available (we assume you are also meeting with your group at other times, of course!)

A13 (checkpoint 1) opened

"checkup" meetings: each group meets individually with Cristian for a timeslot during the lecture time.


no class: you may want to use the time to meet with your group (somewhere).


All-hands presentation of progress reports. Each group will have 15 minutes to talk to the whole class about where you are and solicit suggestions from your classmates.


Optional consulting hours, Upson 315 during the usual lecture time. Schedule a slot on CMS (preferred to prevent people waiting around for their turn) or take your chances and just drop by. Lillian and Cristian will be there even if nobody signs up beforehand.

  May 03

Optional consulting hours, Upson 315 during the usual lecture time. Schedule a slot on CMS (preferred to prevent people waiting around for their turn) or take your chances and just drop by. Cristian and Lillian will be there even if nobody signs up beforehand.


All-hands presentations. Each group will have 15 minutes to talk to the whole class about where you are and solicit suggestions from your classmates.

  May 12

Final project due


