CS 6742: Natural Language Processing and Social Interaction

Time and place: Tuesdays and Thursdays, 10:10-11:00, Upson 315
Instructor: Professor Lillian Lee. For contact info and updates, see http://www.cs.cornell.edu/home/llee
Co-pilot: Cristian Danescu-Niculescu-Mizil, http://www.cs.cornell.edu/~cristian
Course homepage: http://www.cs.cornell.edu/courses/cs6742/2011sp. Site for course info; updated frequently.
Course CMS page: http://cms.csuglab.cornell.edu. Site for submitting assignments.

Prerequisites: CS 2110 or equivalent programming experience, a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning), and graduate standing; or permission of the instructor.

Brief course description

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes.

CS 6742 is geared towards the research-oriented, although it is not specifically tailored towards students with much research experience or with research interests directly in natural language processing or social interaction (network science, social media, traditional modes of communication, etc.). See the course description and policies first-day handout for a list of related courses with different foci.

For more information

Here is the course description and policies first-day handout, which includes an outline of the class schedule and course workload.
Here is a tentative list of the papers that will be addressed in lectures and class project-proposal discussions.

Lectures

Quick links: #2, helpfulness (influence); #4, comments as interaction; #5, accommodation/linguistic style adaptation; #6, discourse; #8, Congressional debates; #9, discussion of Twitter-reaction assignment; #11, influence/diffusion; #12, misperception; #13, information genealogy; #14, informational vs. conversational; #15, persuasion; #16, individual/group influence on the lexicon; #17, social power relationships; #18, causal and reciprocal relationships

Lecture	Date	Agenda and references	Assignments and other handouts
#1	Jan 25	Course overview. Lecture references: Hancock, Jeffrey T., Thom-Santelli, Jennifer, and Ritchie, Thompson. 2004. Deception and design: The impact of communication technology on lying behavior. Proceedings of the SIGCHI conference on Human factors in computing systems. CHI '04: 129-134. doi:10.1145/985692.985709. Pang, Bo, and Lee, Lillian. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. Describes the positive/negative opinion examples in more detail (and gives their sources) on pages 19 and 21. ToneCheck and ToneADay. References on audience: Brake, David R. 2009. ‘As if nobody’s reading’?: The imagined audience and socio-technical biases in personal blogging practice in the UK. Ph.D. Thesis, the London School of Economics and Political Science. Marwick, Alice E., and boyd, danah. 2010. I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience. New Media & Society (July 7). doi:10.1177/1461444810365313.	Course description and policies first-day handout; A1: You may need to download the file and open it outside your browser to see the additional comments in the paper. Answers due 6am Thursday Jan 27th on the course CMS (get added to course CMS by 3pm Wed, as per instructions).
#2	27	Introduction: the question of review helpfulness. Happy families are all alike; every unhappy family is unhappy in its own way. —Tolstoy, Anna Karenina. Lecture references: Gilbert, Eric, and Karahalios, Karrie. 2010. Understanding deja reviewers. Proceedings of the 2010 ACM conference on Computer supported cooperative work. CSCW '10: 225-228. doi:10.1145/1718918.1718961. Liu, J, Cao, Y, Lin, C Y, Huang, Y, and Zhou, M. 2007. Low-quality product review detection in opinion summarization. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL): 334-342. The review (excerpt) we evaluated in class was annotated as a "Best" review in this paper. The paper also addresses issues of review bias. Other references on automatic helpfulness systems (see also next lecture): Pang, Bo, and Lee, Lillian. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. A survey of issues and prior work (including in the economics literature) regarding review(er) quality is presented on pages 80-88. Ghose, Anindya, and Ipeirotis, Panagiotis. 2010. Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics. IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2010.188. Note in particular the "economic impact" part. Kim, Soo-Min, Pantel, Patrick, Chklovski, Tim, and Pennacchiotti, Marco. 2006. Automatically assessing review helpfulness. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing: 423-430. An early and representative paper on the classification problem. Note that the most useful features included review length and product rating. Tsur, Oren, and Rappoport, Ari. 2009. RevRank: A fully unsupervised algorithm for selecting the most helpful book reviews. International AAAI Conference on Weblogs and Social Media. Uses distance from a sort of "review centroid" to rank reviews, which is reminiscent of the Gilbert/Karahalios paper. More on reviewer motivations: David, Shay, and Pinch, Trevor J. 2006. Six degrees of reputation: The use and abuse of online review and recommendation systems. First Monday, Special Issue on Commercial Applications of the Internet. Wu, Fang, and Huberman, Bernardo A. 2010. Opinion formation under costly expression. ACM Transactions on Intelligent Systems and Technology 1, no. 1: 1-13.	A2 reading
#3	Feb 1	Case study: helpfulness as a proxy for influence, and the making of our "helpfulness" paper (WWW 2009). I have come to the conclusion that the making of laws is like the making of sausages—the less you know about the process the more you respect the result. —Frank W. Tracy (1898) [apparently often misattributed to Bismarck]. Lecture references (see also previous lecture): Danescu-Niculescu-Mizil, Cristian, Kossinets, Gueorgi, Kleinberg, Jon, and Lee, Lillian. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW: 141-150. Sites of possible interest mentioned during lecture (including by students): Ciao includes a Circle of Trust feature. Epinions includes Web of Trust (viewable) and Block List (not obviously viewable) features. Social network dataset available courtesy of Jure Leskovec, Dan Huttenlocher, and Jon Kleinberg, Signed networks in social media. 28th ACM Conference on Human Factors in Computing Systems (CHI), 2010. Gamespot and IGN have expert and player reviews with like/dislike annotations. It was asserted in class that visitors to the site tend to read all the reviews. Slashdot has many annotations on comments, based in part on a (somewhat complex) scoring system; also, users can be friends or foes, and other users' friends and foes are viewable. Friend/foe network 2008 dataset and 2009 dataset available courtesy of Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney, Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. arXiv.org:0810.1355, 2008. (Expanded version of a WWW 2008 paper.) Wikipedia includes votes (yea and nay) on candidates for administration positions. Adminship votes dataset (does not include text of the nominations or subsequent comments on the candidates) available courtesy of Jure Leskovec, Dan Huttenlocher, and Jon Kleinberg, Signed networks in social media. 28th ACM Conference on Human Factors in Computing Systems (CHI), 2010, and Predicting positive and negative links in online social networks, WWW 2010. Yelp review annotations are "Useful", "Funny", "Cool"; an automatic "fake/shill/malicious" review filter is in place. Other references (see also previous lecture): Lu, Yue, Tsaparas, Panayiotis, Ntoulas, Alexandros, and Polanyi, Livia. 2010. Exploiting social context for review quality prediction. Proceedings of the 19th international conference on World wide web. WWW '10: 691-700. doi:10.1145/1772690.1772761. Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. Proceedings of the 27th international conference on Human factors in computing systems: 955-964.
#4	3	Project-proposal session: Mishne, Gilad, and Glance, Natalie. 2006. Leave a reply: An analysis of weblog comments. Third annual workshop on the weblogging ecosystem. Submitted suggestions, anonymized and randomized; project proposal (includes references to related work and links mentioned in class)	A3 out
#5	8	Case study: Accommodation (linguistic style adaptation). Lecture references: Danescu-Niculescu-Mizil, Cristian, Michael Gamon, and Susan Dumais. 2011. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW. Ireland, Molly E., Richard B. Slatcher, Paul W. Eastwick, Lauren E. Scissors, Eli J. Finkel, and James W. Pennebaker. 2011. Language style matching predicts relationship initiation and stability. Psychological Science 22 (1): 39-44. Levelt, Willem J. M., and Stephanie Kelter. 1982. Surface form and memory in question answering. Cognitive Psychology 14, no. 1. Taylor, Paul J and Sally Thomas. 2008. Linguistic style matching and negotiation outcome. Negotiation and Conflict Management Research 1:263-281. van Baaren, Rick B., Rob W. Holland, Bregje Steenaert, and Ad van Knippenberg. 2003. Mimicry for money: Behavioral consequences of imitation. Journal of Experimental Social Psychology 39 (4): 393-398. More on linguistic style adaptation/coordination (also related to or called by other terms): Bilous, Frances R and Robert M. Krauss. 1988. Dominance and accommodation in the conversational behaviours of same- and mixed-gender dyads. Language & Communication 8 (3-4): 183-194. Niederhoffer, Kate G. and James W. Pennebaker. 2002. Linguistic style matching in social interaction. Journal of Language and Social Psychology 21 (4): 337-360. Giles, Howard, Justine Coupland, and Nikolas Coupland. 1991. Accommodation theory: Communication, context, and consequence. In Contexts of Accommodation: Developments in Applied Sociolinguistics. Cambridge University Press. More on Twitter: Bakshy, Eitan, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. Everyone's an influencer: Quantifying influence on Twitter. Proceedings of WSDM. Cha, Meeyoung, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. 2010. Measuring user influence in Twitter: The million follower fallacy. In Proceedings of ICWSM. Chen, Jilin, Nairn, Rowan, and Chi, Ed H. 2011. Speak little and well: Recommending conversations in online social streams. Proceedings of CHI. Eisenstein, Jacob, O'Connor, Brendan, Smith, Noah A., and Xing, Eric P. 2010. A latent variable model for geographic lexical variation. Proceedings of EMNLP: 1277-1287. Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What Is Twitter, a Social Network or a News Media? In Proceedings of WWW.
#6	10	Introduction to discourse analysis. Lecture references (see also handout): Clark, Herbert H., and Fox Tree, Jean E. 2002. Using uh and um in spontaneous speaking. Cognition 84, no. 1: 73-111. doi:10.1016/S0010-0277(02)00017-3. Jurafsky, Dan, and Martin, James H. 2009. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Second edition. Upper Saddle River, N.J.: Prentice Hall. Chapter 21 covers discourse. For more on imagined audiences, see the lecture 1 references. References on local focus: Brennan, Susan E., Friedman, Marilyn W., and Pollard, Carl J. 1987. A centering approach to pronouns. Proceedings of the ACL: 155-162. Grosz, Barbara J, Weinstein, Scott, and Joshi, Aravind K. 1995. Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21 (June): 203-225. Other references: Lejeune, Philippe. 2009. On diary. Ed. by Jeremy D. Popkin and Julie Rak. University of Hawaii Press. The editors note that Lejeune dates the practice of the "Dear diary" entry heading to the end of the nineteenth century: "The diffusion of the practice of writing to one's 'dear diary' is significant: even though each diarist wrote in private, the spread of the formula indicates that diarists were increasingly aware that they were following a widely diffused model" (p. 7). Some conversation/discourse corpora: AMI Meeting Corpus; British Columbia Conversation Corpus (40 email threads); Enron email dataset; Penn Discourse Treebank; Saarbrücken Corpus of Spoken English; Santa Barbara Corpus of Spoken American English; IRC chat data and disentanglement code from Micha Elsner [.tgz]	Discourse examples handout (with citations added)
#7	15	Theories of discourse structure. Lecture references (see also previous lecture): Clark, Herbert H. 1996. Using language. Second edition. Cambridge University Press. Grosz, Barbara J., and Sidner, Candace L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204. Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate. Language as a Window into Human Nature. Posted to YouTube on Feb 10, 2011. References on alternate theories: Mann, William C., and Thompson, Sandra A. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text: Interdisciplinary Journal for the Study of Discourse 8, no. 3: 243-281. Marcu, Daniel. 2000. Extending a formal and computational model of Rhetorical Structure Theory with intentional structures à la Grosz and Sidner. Proceedings of COLING: 523-529. doi:10.3115/990820.990896. Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264.	A4 out; lecture handout (bring to next class, too)
#8	17	(a) Extended discourse example; (b) Case study: from discourse to influence via politics, the "Get out the vote" paper (EMNLP 2006). Lecture references (see also previous lecture): Orwell, George. 1946. Politics and the English language. Horizon. Monroe, Burt L. and Philip A Schrodt. 2008. Introduction to the special issue: The statistical analysis of political text. Political Analysis 16 (4): 351-355. Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of EMNLP.	Handout: excerpt from a U.S. Congress floor debate (also last lecture's handout)
#9	22	Group discussion of A4 preliminary solutions
#10	24	Group discussion of A4 final solutions. Final writeups (access restricted to students in the class): Amit and Chenhao; Bin and Bishan; Hussam and Jon; Karthik and Nikunj	A5 out: reading for the student-led project-proposal session, due the morning of the Wednesday before lecture 12, so that the discussion leader can integrate others' suggestions with their own for Thursday. Subsequent assignments "out" on Thursdays are due the next Wednesday morning for the whole class, with the discussion leader running the project discussion the following day and turning in a finalized project proposal the day after that (Friday). So, the process takes about a week.
#11	Mar 1	Influence and diffusion. Lecture references (for more on economic impact, see lecture 2): Androutsopoulos, Ion and Prodromos Malakasiotis. A survey of paraphrasing and textual entailment methods. J. Artif. Int. Res. 38:135-187. Backstrom, Lars, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of KDD, 44-54. "Taken together, these results support the notion that a burst of authors moving into a conference C from some other conference B are drawn to topics that are currently hot at C; but there is also evidence that this burst of authors produces papers that are comparably impoverished in their usage of terms that will be hot in the future". Bordia, Prashant and Nicholas DiFonzo. Problem solving in social interactions on the internet: Rumor as social cognition. Social Psychology Quarterly 67 (1): 33-49. Broder, John M. 2007. Familiar Fallback for Officials: ‘Mistakes Were Made’. The New York Times, March 14. C'mon, guys! "Past exonerative"! How can you not laugh at William Schneider's coinage?! Gruhl, Daniel, R Guha, David Liben-Nowell, and Andrew Tomkins. 2004. Information diffusion through blogspace. In Proceedings of WWW, 491-501. Guerin, Bernard and Yoshihiko Miyazaki. Analyzing rumors, gossip, and urban legends through their conversational properties. Psychological Record 56 (1): 23-34. Heath, Chip, Chris Bell, and Emily Sternberg. Emotional selection in memes: The case of urban legends. Journal of Personality and Social Psychology 81 (6): 1028-1041. Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? Proceedings of WWW, 591-600. Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-Tracking and the dynamics of the news cycle. In Proceedings of KDD, 497-506. Data and visualization website: http://www.memetracker.org/index.html Madnani, Nitin and Bonnie J Dorr. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Comput. Linguist. 36 (3): 341-387. Taraborelli, Dario and Giovanni Luca Ciampaglia. Beyond notability. [sic] Collective deliberation on content inclusion in Wikipedia. Second International Workshop on Quality in Techno-Social Systems. Visualization and data site: notabilia.net. Wu, Fang, Bernardo A. Huberman, Lada A. Adamic, and Joshua R. Tyler. Information flow in social groups. Physica A: Statistical and Theoretical Physics 337 (1-2): 327-335.	A6 out: reading for the student-led project-proposal session, due the morning of the Monday before lecture 13. Subsequent assignments "out" on Tuesdays are due the next Monday morning for the whole class, with the discussion leader running the project discussion the following day and turning in a finalized project proposal the day after that (Wednesday). So, the process takes about a week.
#12	3	Project proposal session for A5, led by Bishan. Ranganath, Rajesh, Dan Jurafsky, and Dan McFarland. 2009. It's not you, it's me: Detecting flirting and its misperception in speed-dates. Proceedings of EMNLP. A5 discussion transcript; A5 proposal	A7 reading released
#13	8	Project proposal session for A6, led by Amit. Shaparenko, Benyah, and Thorsten Joachims. 2007. Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 619-628. Discussion transcript; A6 proposal	A8 reading released (requires a Cornell IP address; alternatively, students in the class can access it via netID login)
#14	10	Project proposal session for A7, led by Nikunj. Harper, F. Maxwell, Daniel Moy, and Joseph A. Konstan. 2009. Facts or friends?: Distinguishing informational and conversational questions in social Q&A sites. In Proceedings of CHI, 759-768. Discussion transcript; A7 proposal	A9 reading released
#15	15	Project proposal session for A8, led by Karthik. Guerini, Marco, Carlo Strapparava, and Oliviero Stock. 2008. Trusting politicians' words (for persuasive NLP). In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing). Discussion transcript; A8 proposal
#16	17	Project proposal session for A9, led by Jon. Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter. 2011. Niche as a determinant of word fate in online groups. PLoS ONE 6(5): e19009. doi:10.1371/journal.pone.0019009. Discussion transcript; A9 proposal	A10 reading released; A11 reading released
Mar 22		Spring Break
Mar 24		Spring Break
	29	No class, to avoid requiring work over Spring Break (per recent Faculty Senate resolution)
#17	31	Project proposal session for A10, led by Chenhao. Bramsen, Philip, Martha Escobar-Molano, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT. Discussion transcript; A10 proposal
#18	Apr 05	Project proposal session for A11, led by Bin. Girju, Roxana. 2010. Toward social causality: An analysis of interpersonal relationships in online blogs and forums. Proceedings of ICWSM, pp. 66-73. Discussion transcript; A11 proposal
#19	7	Organization session for picking projects/groups	A12 (project "contracts") opened
	12	Small-group presentations of project contracts to Lillian and Cristian: ~15-minute presentations of the final draft of your project contract, for additional feedback and comments.
	14	No class, so as to allow you to concentrate on your projects. You may want to use the time to meet with your group, since you know you all have the time available (we assume you are also meeting with your group at other times, of course!)	A13 (checkpoint 1) opened
	19	"Checkup" meetings: each group meets individually with Cristian for a timeslot during the lecture time.
	21	No class: you may want to use the time to meet with your group (somewhere).
	26	All-hands presentation of progress reports. Each group will have 15 minutes to talk to the whole class about where you are and solicit suggestions from your classmates.
	28	Optional consulting hours, Upson 315 during the usual lecture time. Schedule a slot on CMS (preferred to prevent people waiting around for their turn) or take your chances and just drop by. Lillian and Cristian will be there even if nobody signs up beforehand.
	May 03	Optional consulting hours, Upson 315 during the usual lecture time. Schedule a slot on CMS (preferred to prevent people waiting around for their turn) or take your chances and just drop by. Cristian and Lillian will be there even if nobody signs up beforehand.
	05	All-hands presentations. Each group will have 15 minutes to talk to the whole class about where you are and solicit suggestions from your classmates.
	May 12	Final project due

The code for generating the calendar above, and the CSS, were (barely) adapted from the original versions created by Andrew Myers.

Page last modified Mon July 18, 2011 11:35 PM