Course homepage http://www.cs.cornell.edu/courses/cs6742/2014fa. Main site for course info, assignments, readings, lecture references, etc.; updated frequently.
Course CMS page http://cms.csuglab.cornell.edu. Site for submitting assignments, unless otherwise noted.
Course Piazza page http://piazza.com/cornell/Fall2014/cs6742 Course announcements and Q&A/discussion site. Social interaction and all that, you know.
Instructor Professor Lillian Lee. For contact info, see http://www.cs.cornell.edu/home/llee
Time and place Tuesdays and Thursdays, 10:10-11:25, Hollister 401 (since this room has reconfigurable seating) Gates Hall 344 breakout room (quietly enter through 344, since students are working there, and go to the room on the right).
 
This page last modified Tue September 30, 2014 8:54 PM.

Brief course description More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include sentiment analysis, learning social-network structure, analysis of text in political or legal domains, review aggregation systems, analysis of online conversations, and text categorization with respect to psychological categories.

Prerequisites As previously announced in the 2014-2015 Courses of Study, enrollment is limited to PhD students except by permission of the instructor. August 14 addition: given the number of PhD students who have registered for credit, permission will not be granted to non-PhD students, and auditing will not be allowed. Required background: CS 2110 or equivalent programming experience, and at least one course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning).

Related courses In Fall 2014, there's CS 4744 Computational linguistics, CS 6783 Machine learning theory, CS 6788/INFO 6150 Advanced topic modeling, ECE 5960 Graphical models, and IS 6320 Games, economic behavior, and the Internet. In Spring 2015, there's CS 4740 Natural language processing, and new IS professor Cristian Danescu-Niculescu-Mizil may be offering a course quite similar to CS 6742.

Informative links

Lectures

Quick links: overview | reviews, helpfulness, social interaction | what do conversations "look" like? | discourse | adaptation | proposals based on Unspeakable/Kickstarter

Lecture Date Agenda and references Assignments and other handouts
#1 Aug 26

Course overview: scope, course goals, course design

The school of Athens - people talking and reading

Image source: http://en.wikipedia.org/wiki/The_School_of_Athens. Some people are speaking to each other; some are reading and perhaps being influenced by that text; some are writing text, perhaps hoping to have an effect on others; some texts are being read by several people simultaneously.

Scan of lecture notes

Images and webpages displayed in class:

References

Bryan, Christopher J, Gregory M Walton, Todd Rogers, and Carol S Dweck. 2 August 2011. Motivating voter turnout by invoking the self. Proceedings of the National Academy of Sciences 108 (31): 12653-12656.

Chong, Dennis and James N. Druckman. 2007. Framing theory. Annual Review of Political Science 10:103--126.

Assignment 1 (A1) officially released
#2 28

To what extent is there social interaction on review sites?

Image source: Dorothy Gambrel, Cat and Girl: http://catandgirl.com/archive/2001-05-21-cg0043drive.gif. Permission policy here.

Scan of lecture notes

Images and webpages displayed in class:

References

Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. Proceedings of WWW, pp. 307--318.

Gilbert, Eric and Karrie Karahalios. 2010. Understanding deja reviewers. Proceedings of CSCW, pp. 225--228. [ACM link]

Jurafsky, Dan, Victor Chahuneau, Bryan R. Routledge, and Noah A. Smith. 2014. Narrative framing of consumer sentiment in online restaurant reviews. First Monday 19(4).

Michael, Loizos and Jahna Otterbacher. 2014. Write like I write: Herding in the language of online reviews. Proceedings of ICWSM.

Mimno, David. 2014. Data carpentry.

Pinch, Trevor and Filip Kesler. 2011. How Aunt Ammy gets her free lunch: A study of the top-thousand customer reviewers at Amazon.com.

 
#3 Sep 2

Review "quality" and "helpfulness": a lens for studying social influence

Image source: Randall Munroe, xkcd (click on image for original link). Expletive obscured in this presentation.

Scan of lecture notes

Images and handouts from class

References on lecture topics

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2014. How community feedback shapes user behavior. Proceedings of ICWSM.

Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW, pp. 141--150. [alt link]

Ghose, Anindya and Panagiotis Ipeirotis. 2011. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering 23(10): 1498--1512. Official link can be found through Worldcat, e.g., here.

Muchnik, Lev, Sinan Aral, and Sean Taylor. 2013. Social influence bias: A randomized experiment. Science 341.

Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. Proceedings of CHI, 955-964.

Sipos, Ruben, Arpita Ghosh, and Thorsten Joachims. 2014. Was this review helpful to you? It depends! Context and voting patterns in online content. Proceedings of WWW.

Wang, R.Y. and D.M. Strong. 1996. Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems 12(4): 5--34.

Representative additional references on "unconventional" text classification, by popular demand

Davidov, Dmitry, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 107--116. http://aclweb.org/anthology/W10-2914

Kiddon, Chloé and Yuriy Brun. 2011. That's what she said: Double entendre classification. Proceedings of the ACL (short papers), pp. 89--94.

Li, Jiwei, Myle Ott, Claire Cardie, and Eduard Hovy. 2014. Towards a general rule for identifying deceptive opinion spam. Proceedings of the ACL. The paper showing a learned classifier outperforming humans on Tripadvisor-style reviews is Ott, M, Y Choi, C Cardie, and J T Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. Proceedings of the ACL, pp. 309--319.

Mihalcea, Rada and Carlo Strapparava. 2006. Learning to laugh (automatically): Computational models for humor recognition. Computational Intelligence 22(2).

 
#4 4

What do conversations "look" like?

Scan of lecture notes

Aside: email corpora

References

Backstrom, Lars, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: Expansion, focus, volume, re-entry. Proceedings of WSDM, pp. 13--22. [alt link]

Elsner, Micha and Eugene Charniak. September 2010. Disentangling chat. Computational Linguistics 36(3): 389-409. [data and code]

Gonzalez-Bailon, Sandra, Andreas Kaltenbrunner, and Rafael E Banchs. 2010. The structure of political discussion networks: A model for the analysis of online deliberation. Journal of Information Technology 25(2): 230-243.

Kumar, Ravi, Mohammad Mahdian, and Mary McGlohon. 2010. Dynamics of conversations. Proceedings of KDD, pp. 553--562.

Nguyen, Viet-An, Jordan Boyd-Graber, Philip Resnik, Deborah A Cai, Jennifer E Midberry, and Yuanxin Wang. 2014. Modeling topic control to detect influence in conversations using nonparametric topic models. Machine Learning 95:381--421. [alt link]. [The talk slides we looked at in class]

Prabhakaran, Vinodkumar, Ashima Arora, and Owen Rambow. 2014. Power of confidence: How poll scores impact topic dynamics in political debates. ACL joint workshop on social dynamics and personal attributes.

Prabhakaran, Vinodkumar and Owen Rambow. 2014. Predicting power relations between participants in written dialog from a single thread. Proceedings of the ACL (short papers).

Seo, Jangwon, W. Bruce Croft, and David A. Smith. 2009. Online community search using thread structure. Proceedings of CIKM, pp. 1907--1910.

Siersdorfer, Stefan, Sergiu Chelaru, Jose San Pedro, Ismail Sengor Altingovde, and Wolfgang Nejdl. July 2014. Analyzing and mining comments and comment ratings on the social web. ACM Trans. Web 8 (3): 17:1-17:39. [alt link]

Wang, Yi-Chia, Mahesh Joshi, and Carolyn Penstein Rosé. 2008. Investigating the effect of discussion forum interface affordances on patterns of conversational interactions. Proceedings of CSCW, pp. 555--558.

 
#5 9

Checkpoints of A1 projects; Discourse phenomena: clues regarding structure

Image source: http://www.metmuseum.org/toah/works-of-art/49.70.33. "The image is one for which Picasso did a number of variations in Paris during the autumn–winter of 1912; in each version, a tall bottle and goblet are set out on a small round table."

Scan of lecture notes and the handout

References related to the A1 project discussions

 

References from discourse lecture

Grice, H.P. 1975. Logic and Conversation. In Syntax and semantics 3: Speech Acts, pp. 41-58.

Jurafsky, Dan and James H. Martin. 2009. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Second edition. Chapter 21 covers discourse.

Moser, Megan and Johanna Moore. 1996. Toward a synthesis of two accounts of discourse structure. Computational Linguistics 22(3): 409--419.

Rogers, Todd and Michael I Norton. June 2011. The artful dodger: Answering the wrong question the right way. Journal of Experimental Psychology: Applied 17 (2).

References for the examples on the handout:

Jordan Boyd-Graber Google+ post

Allen, James. 1995. Natural Language Understanding. Benjamin/Cummings Pub Co. Second ed.

Hirst, Graeme. 1981. Anaphora in Natural Language Understanding: A Survey. Lecture Notes in Computer Science. Springer, Berlin.

Sidner, Candace Lee. 1979. Towards a computational theory of definite anaphora comprehension in English discourse. MIT AITR-537.

Wilks, Yorick. 1975. An intelligent analyzer and understander of English. Communications of the ACM 18 (5): 264-274.

 

 
#6 11

Attention, intentions, and discourse structure: the Grosz and Sidner theory

Scan of lecture notes

References:

Grosz, Barbara J. and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175--204.

Mann, William C. and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text: Interdisciplinary Journal for the Study of Discourse 8(3): 243--281.

Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature

A2 out (deadline subsequently extended to Sept. 22)
#7 16

A1 presentations, part one

 
#8 18

A1 presentations, part two

 
#9 23

Discussion of application of Grosz/Sidner theory in A2

"Stacking", by Alastair Hesletine. Image source: http://thumbpress.com/the-art-of-stacking-wood/

Scan of discussion notes

References (see also the previous discourse lectures)

Wikipedia entry on Deep Blue vs. Garry Kasparov (pronunciation)

Stolcke, Andreas, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics 26(3): 339--373.

Taboada, Maite and William C. Mann. 2006. Rhetorical structure theory: Looking back and moving ahead. Discourse Studies 8(3): 423-459. Gives an overview of many issues in analyzing discourse structure.

Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264.

Read one — your choice — of the readings for Tu Sep 30 (lecture 11) and post a project proposal inspired by it to Piazza by 3pm Mon the 29th; include the general idea, and a suggestion for a dataset. A paragraph suffices (and more is great, if you feel inspired!). Thoughtfulness and creativity are what I'm most interested in, but take feasibility into account.

And, read each other's proposals, commenting as you see fit, before class on the 30th.

#10 25

Language adaptation, power and within-group lifespan

Scan of lecture notes

Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, pp. 699--708. Link includes access to datasets, talk slides, etc. ACM link is here.

Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. Proceedings of WWW, pp. 307--318. Link includes access to datasets, talk slides, etc. ACM link is here.

References

http://minimalmovieposters.tumblr.com/archive

Beňuš, Štefan, Rivka Levitan, and Julia Hirschberg. 2012. Entrainment in spontaneous speech: The case of filled pauses in supreme court hearings. Proceedings of the 3rd IEEE Conference on Cognitive Infocommunications.

Bramsen, Philip, Martha Escobar-Molano, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT.

Choudhury, Tanzeem and Alex Pentland. 2004. Characterizing social networks using the sociometer. Proceedings of the North American Association of Computational Social and Organizational Science (NAACSOS).

Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. Proceedings of the ACL.

Diehl, Christopher P., Galileo Namata, and Lise Getoor. 2007. Relationship identification for social network discovery. Proceedings of the AAAI Workshop on Enhanced Messaging, pp. 546--552.

Gilbert, Eric. 2012. Phrases that signal workplace hierarchy. Proceedings of CSCW.

Leber, Jessica. 2013. The immortal life of the Enron e-mails. Business News.

Ng, Sik Hung and James J Bradac. 1993. Power in Language: Verbal Communication and Social Influence. Sage Publications, Inc.

Vinod Prabhakaran and Owen Rambow's work on inferring power relationships

 
#11 30

Project-possibilities discussion

The assigned reading: one of:

  1. Glasgow, Kimberly, Clayton Fink, and Jordan Boyd-Graber. 2014. Our grief is unspeakable: Automatically measuring the community impact of a tragedy. Proceedings of ICWSM.
  2. Mitra, Tanushree and Eric Gilbert. 2014. The language that gets people to give: Phrases that predict success on Kickstarter. Proceedings of CSCW.

Sites examined or mentioned during class

References (including some that came up during class)

Althoff, Tim, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. 2014. How to ask for a favor: A case study on the success of altruistic requests. Proceedings of ICWSM.

Bailey, Michael, Daniel J Hopkins, and Todd Rogers. 2013. Unresponsive and unpersuaded: The unintended consequences of voter persuasion efforts. Working paper on SSRN.

Bell, Brad E and Elizabeth F Loftus. May 1989. Trivial persuasion in the courtroom: The power of (a few) minor details. Journal of Personality and Social Psychology 56(5): 669--679.

Gayo-Avello, Daniel. December 2013. A meta-analysis of state-of-the-art electoral prediction from Twitter data. Social Science Computer Review 31(6): 649-679. Hat tip to Brendan O'Connor; I saw this on his 2013 blog post Some analysis of tweet shares and “predicting” election outcomes. Also of interest, for the title alone: Gayo-Avello's "I wanted to predict elections with twitter and all I got was this lousy paper" -- A balanced survey on election prediction using twitter data, Eprint ArXiv:1204.6441 and On Twitter and Elections, catchy paper titles, press releases and telling scientist's opinions from facts: A brief comment to DiGrazia et al. 2013 and to Fabio Rojas Op-Ed in Washington Post.

Greenberg, Michael D, Bryan Pardo, Karthic Hariharan, and Elizabeth Gerber. 2013. Crowdfunding support tools: Predicting success & failure. Proceedings of CHI: Extended Abstracts, pp. 1815--1820.

Guerini, Marco, Carlo Strapparava, and Oliverio Stock. 2010. Evaluation metrics for persuasive NLP with Google adwords. Proceedings of LREC.

Hannak, Aniko, Drew Margolin, Brian Keegan, and Ingmar Weber. 2014. Get back! You don't know me like that: The social mediation of fact checking interventions in Twitter conversations. Proceedings of ICWSM.

Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of KDD, pp. 497--506.

Petrovic, Sasa, Miles Osborne, and Victor Lavrenko. 2013. I wish I didn't say that! Analyzing and predicting deleted messages in Twitter. eprint arXiv:1305.3107.

Qazvinian, Vahed, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. 2011. Rumor has it: Identifying misinformation in microblogs. Proceedings of EMNLP, 1589--1599.

Thelwall, Mike, Kevan Buckley, and Georgios Paltoglou. 2011. Sentiment in Twitter events. Journal of the American Society for Information Science and Technology 62(2): 406-418.

 

 

Read one — your choice — of the readings for Tu Oct 7 (lecture 13) and post a project proposal inspired by it to Piazza by 3pm Mon the 6th; include the general idea, and a suggestion for a dataset. A paragraph suffices (and more is great, if you feel inspired!). Thoughtfulness and creativity are what I'm most interested in, but take feasibility into account.

And, read each other's proposals, commenting as you see fit, before the in-class discussion.

#12 Oct 2

Project-possibilities discussion

Class is at 3:30 - let's say the Theory Lab.

Papers to be presented:

  1. Simmons, Matthew P., Lada A. Adamic, and Eytan Adar. 2011. Memes online: Extracted, subtracted, injected, and recollected. Proceedings of ICWSM, pp. 353--360.
  2. Acton, Eric K. 2011. On gender differences in the distribution of um and uh. University of Pennsylvania Working Papers in Linguistics 17(2): Selected papers from NWAV 39.

Bordia, Prashant and Nicholas DiFonzo. 2004. Problem solving in social interactions on the internet: Rumor as social cognition. Social Psychology Quarterly 67(1): 33--49.

 

Kwon, Sejeong, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. 2013. Prominent features of rumor propagation in online social media. Proceedings of ICDM.

 
#13 7

Project-possibilities discussion

  1. Garley, Matt and Julia Hockenmaier. 2012. Beefmoves: Dissemination, diversity, and dynamics of English borrowings in a German hip hop forum. Proceedings of ACL.
  2. Vasilescu, Bogdan, Alexander Serebrenik, Prem Devanbu, and Vladimir Filkov. 2014. How social Q&A sites are changing knowledge sharing in open source software communities. Proceedings of CSCW, pp. 342--354.

Kooti, Farshad, Haeryun Yang, Meeyoung Cha, Krishna Gummadi, and Winter Mason. 2012. The emergence of conventions in online social networks. Proceedings of ICWSM. Best paper award.

 
#14 9

 

 
Oct 14 Fall Break
#15 16    
#16 21    
#17 23    
#18 28    
#19 30    
#20 Nov 4    
#21 6    
#22 11    
#23 13    
#24 18    
#25 20    
#26 25    
Nov 27 Thanksgiving Break
#27 Dec 2    
#28 4    
Final-project due-date, as determined by the registrar: December 11 at 4:30 pm

Code for generating the calendar above and css was (barely) adapted from the original versions created by Andrew Myers.