CS 6742, Fall 2013: Natural Language Processing and Social Interaction

Time and place Tuesdays and Thursdays, 10:10-11:25, Upson 315
Instructor Professor Lillian Lee. For contact info, see http://www.cs.cornell.edu/home/llee
Course homepage http://www.cs.cornell.edu/courses/cs6742/2013fa. Main site for course info, assignments, readings, lecture references, etc.; updated frequently.
Course CMS page http://cms.csuglab.cornell.edu. Site for submitting assignments, unless otherwise noted.
Course Piazza page http://piazza.com/cornell/fall2013/cs6742 Course announcements and Q&A/discussion site. Social interaction and all that, you know.
This page last modified Tue November 26, 2013 8:32 AM.

Brief course description

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. The intended audience is strongly research-oriented students.

Prerequisites CS 2110 or equivalent programming experience; at least one course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning); and graduate standing or permission of instructor.




Lecture | Date | Agenda and references | Assignments and other handouts
#1 Th Aug 29

Course overview

Scan of lecture notes

Related talk:

In the afternoon, I'm giving the CS colloquium on some recent research I've been involved in:
Language as influence(d): Power and Memorability
4:15, Upson B17


Bennett, Shea. June 7, 2012. Twitter Now Seeing 400 Million Tweets Per Day, Increased Mobile Ad Revenue, Says CEO.

Brake, David R. 2009. ‘As if nobody’s reading’?: The imagined audience and socio-technical biases in personal blogging practice in the UK. Ph.D. Thesis, the London School of Economics and Political Science.

Hancock, Jeffrey T., Jennifer Thom-Santelli, and Thompson Ritchie. 2004. Deception and design: The impact of communication technology on lying behavior. Proceedings of the SIGCHI conference on Human factors in computing systems (CHI): 129-134. doi:10.1145/985692.985709.

Lejeune, Philippe. 2009. On Diary. Ed. Jeremy D. Popkin and Julie Rak. University of Hawaii Press. The editors note that Lejeune dates the practice of the "Dear diary" entry heading to the end of the nineteenth century: "The diffusion of the practice of writing to one's 'dear diary' is significant: even though each diarist wrote in private, the spread of the formula indicates that diarists were increasingly aware that they were following a widely diffused model" (p. 7).

Marwick, Alice E., and danah boyd. 2010. I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience. New Media & Society (July 7). doi:10.1177/1461444810365313.

Pang, Bo, and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. Describes the positive/negative opinion examples from lecture in more detail (and gives their sources) on pages 19 and 21.

The Radicati Group. May 2011. Email Statistics Report, 2011-2015.

Assignment 1 (A1)
#2 Tu Sep 3

To what extent is there social interaction on review sites?

Image source: http://catandgirl.com/?p=1488

Scan of lecture notes

Amazon's explanation of customer review quotes

Amazon comment thread regarding personal point of view in reviews

Remark on the effectiveness of Amazon review quotes, from the LinkedIn page of someone who states that they worked on the product.

Review with 42/42 helpfulness score

The Harriet Klausner appreciation society: a site about Harriet Klausner, at one time Amazon.com's number-one top reviewer.


Gilbert, Eric and Karrie Karahalios. 2010. Understanding deja reviewers. Proceedings of CSCW, pp. 225-228.

Otterbacher, Jahna. 2012. Gender, writing and ranking in review forums: A case study of the IMDB. Knowledge and Information Systems 35 (3): 645-664.

Pang, Bo, and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011. Section 5.2.4 reviews work on predicting helpfulness of reviews; later work can be found by forward-searching in a citation database for work that cites the references from our chapter.

Pinch, Trevor and Filip Kesler. 2011. How Aunt Ammy gets her free lunch: A study of the top-thousand customer reviewers at Amazon.com. http://www.freelunch.me/filecabinet. Possibly related version (I haven't looked carefully): Trevor Pinch, "Book Reviewing for Amazon.com: How Socio-technical Systems Struggle to Make Less From More," in Managing Overflow in Affluent Societies, Barbara Czarniawska and Orvar Löfgren (eds.). New York and London: Routledge, 2012.

Wu, Fang, and Bernardo A. Huberman. 2010. Opinion formation under costly expression. ACM Transactions on Intelligent Systems and Technology 1, no. 1: 1-13.
#3 Th Sep 5

Correlates of review helpfulness; the making of our WWW 2009 paper

Relevant image here [warning: bad language]

Scan of lecture notes

Danescu-Niculescu-Mizil, Cristian, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. 2009. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. Proceedings of WWW: 141-150. The slides we looked at in class can be found in this PDF.

Ghose, Anindya and Panagiotis Ipeirotis. 2011. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE Transactions on Knowledge and Data Engineering 23(10): 1498-1512. The official link can be found through WorldCat.

Kim, Soo-Min, Patrick Pantel, Tim Chklovski, and Marco Pennacchiotti. 2006. Automatically assessing review helpfulness. Proceedings of EMNLP, 423-430.

Liu, Jingjing, Yunbao Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou. 2007. Low-quality product review detection in opinion summarization. Proceedings of EMNLP-CoNLL, pp. 334--342.

Lu, Yue, Panayiotis Tsaparas, Alexandros Ntoulas, and Livia Polanyi. 2010. Exploiting social context for review quality prediction. Proceedings of WWW, 691-700.

Otterbacher, Jahna. 2009. 'Helpfulness' in online communities: a measure of message quality. Proceedings of CHI, 955-964.

Zhang, Zhu and Balaji Varadarajan. 2006. Utility scoring of product reviews. Proceedings of CIKM, pp. 51--57.

#4 Tu Sep 10

Asynchronous online discussions

Scan of lecture notes

Wikipedia pages we examined: An article's talk page. An article's revision history page. A structured user talk page. An unstructured user talk page. Statistics on Wikipedians. Requests for adminship. Article deletion discussions.

Slashdot pages we examined: Slashdot home. Slashdot scoring system. Wikipedia's explanation of the Slashdot scoring system. A Slashdot corpus (may be downloadable if you register for an account): README and legal notices; training set. Another corpus containing Slashdot material is the BC3 Blog Corpus.


Backstrom, Lars, Jon Kleinberg, Lillian Lee, and Cristian Danescu-Niculescu-Mizil. 2013. Characterizing and curating conversation threads: Expansion, focus, volume, re-entry. Proceedings of WSDM, pp. 13–22.

Bakshy, Eitan, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. Everyone's an influencer: Quantifying influence on Twitter. Proceedings of WSDM.

Chen, Zoey and Jonah Berger. 2013. When, why, and how controversy causes conversation. Journal of Consumer Research, 40:580–593.

De Choudhury, Munmun, Hari Sundaram, Ajita John, and Dorée Duncan Seligmann. 2009. What makes conversations interesting?: Themes, participants and consequences of conversations in online social media. Proceedings of WWW, pp. 331–340.

Elsner, Micha and Eugene Charniak. September 2010. Disentangling chat. Computational Linguistics 36 (3): 389-409.

Ferschke, Oliver, Johannes Daxenberger, and Iryna Gurevych. 2013. A survey of NLP methods and resources for analyzing the collaborative writing process in Wikipedia. Chapter 5 in The People's Web Meets NLP: Collaboratively Constructed Language Resources.

Gómez, Vicenç, Andreas Kaltenbrunner, and Vicente López. 2008. Statistical analysis of the social network and discussion threads in Slashdot. Proceedings of WWW, pp. 645–654.

Mishne, Gilad and Natalie Glance. 2006. Leave a reply: An analysis of weblog comments. Third Annual Workshop on the Weblogging Ecosystem.

Nov, Oded. 2007. What motivates Wikipedians? CACM 50(11):60–64.

Shmueli, Erez, Amit Kagian, Yehuda Koren, and Ronny Lempel. 2012. Care to comment?: Recommendations for commenting on news stories. Proceedings of WWW, pp. 429–438.

Wang, Yi-Chia, Mahesh Joshi, and Carolyn Penstein Rosé. 2008. Investigating the effect of discussion forum interface affordances on patterns of conversational interactions. Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work (CSCW), pp. 555–558.

#5 Th Sep 12 Pilot-study presentations  
#6 Tu Sep 17

Discourse phenomena: clues regarding structure

Scan of lecture notes

Some conversation/discourse corpora:

AMI Meeting Corpus

British Columbia Conversation Corpus (40 email threads)

Enron email dataset

Penn Discourse Treebank

Saarbrücken Corpus of Spoken English

Santa Barbara Corpus of Spoken American English

Supreme Court dialogs corpus

IRC chat data and disentanglement code from Micha Elsner [.tgz]

Wikipedian conversations corpus: [zipfile] [README alone]

References not already in the lecture handout:

Grice, H.P. 1975. Logic and Conversation. In Syntax and semantics 3: Speech Acts, pp. 41-58.

Jurafsky, Dan, and James H. Martin. 2009. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Second edition. Upper Saddle River, N.J.: Prentice Hall. Chapter 21 covers discourse.

Rogers, Todd and Michael I Norton. June 2011. The artful dodger: Answering the wrong question the right way. Journal of Experimental Psychology: Applied 17 (2).

Lecture handout
#7 Th Sep 19

Global discourse structure: the Grosz and Sidner theory

Scan of lecture notes

Bracewell, David B., Marc T. Tomlinson, and Hui Wang. Identification of social acts in dialogue. Proceedings of COLING, pp. 375--390.

Clark, Herbert H. 1996. Using language. Second edition. Cambridge University Press.

Clark, Herbert H., and Jean E. Fox Tree. 2002. Using uh and um in spontaneous speaking. Cognition 84, no. 1: 73-111. doi:10.1016/S0010-0277(02)00017-3.

Grosz, Barbara J., and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175-204.

Mann, William C., and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text: Interdisciplinary Journal for the Study of Discourse 8, no. 3: 243-281.

Marcu, Daniel. 2000. Extending a formal and computational model of Rhetorical Structure Theory with intentional structures à la Grosz and Sidner. Proceedings of COLING: 523-529

Pinker, Steven, and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature.


#8 Tu Sep 24 No class (LL out of town)  
#9 Th Sep 26

Discussion of A2 (discourse annotation exercise)

Scan of lecture notes

Egg, Markus and Gisela Redeker. 2010. How complex is discourse structure? Proceedings of LREC, 1619-1623.

Walker, Marilyn A. 1996. Limited attention and discourse structure. Computational Linguistics 22(2): 255-264.

Wolf, Florian and Edward Gibson. 2005. Representing discourse coherence: A corpus-based study. Computational Linguistics 31(2):249-288.

#10 Tu Oct 1

Class actually meets on Wednesday, Oct 2, 4-5pm, in the 301 College Ave seminar room: Cristian Danescu-Niculescu-Mizil (MPI-SWS) is giving the IS colloquium on Language and social dynamics in online communities.


Danescu-Niculescu-Mizil, Cristian, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. Proceedings of ACL.

Danescu-Niculescu-Mizil, Cristian, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: User lifecycle and linguistic change in online communities. Proceedings of the 22nd International Conference on World Wide Web, pp. 307–318.

#11 Th Oct 3

Linguistic coordination: the case of Twitter

Image source: Matt Groening, Life in Hell, 1982

Scan of lecture notes, which were authored by Cristian Danescu-Niculescu-Mizil

Danescu-Niculescu-Mizil, Cristian, Michael Gamon, and Susan Dumais. 2011. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW.

Danescu-Niculescu-Mizil, Cristian and Lillian Lee. 2011. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. Proceedings of the Cognitive Modeling and Computational Linguistics Workshop.

Ireland, Molly E., Richard B. Slatcher, Paul W. Eastwick, Lauren E. Scissors, Eli J. Finkel, and James W. Pennebaker. 2011. Language style matching predicts relationship initiation and stability. Psychological Science 22 (1): 39-44.

Levelt, Willem J. M., and Stephanie Kelter. 1982. Surface form and memory in question answering. Cognitive Psychology 14(1):76–106.

Maddux, William W, Elizabeth Mullen, and Adam D Galinsky. 2008. Chameleons bake bigger pies and take bigger pieces: Strategic behavioral mimicry facilitates negotiation outcomes. Journal of Experimental Social Psychology 44 (2): 461-468.

Taylor, Paul J and Sally Thomas. 2008. Linguistic style matching and negotiation outcome. Negotiation and Conflict Management Research 1:263–281.

van Baaren, Rick B., Rob W. Holland, Bregje Steenaert, and Ad van Knippenberg. 2003. Mimicry for money: Behavioral consequences of imitation. Journal of Experimental Social Psychology 39 (4): 393-398.

#12 Tu Oct 8

Sentiment analysis

Image source: http://xkcd.com/937/


Most of the references can be found in Pang, Bo, and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2, no. 1-2 (January): 1-135. doi:10.1561/1500000011:

Blitzer, John, Ryan McDonald and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. Proceedings of EMNLP, 120--128.

Mao, Yi and Guy Lebanon. 2009. Generalized isotonic conditional random fields. Machine Learning 77 (2): 225-248.

Ott, Myle, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. Proceedings of the ACL.

Smith, Aaron. Pew study on Digital Politics (technology and campaign 2012)

Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. Proceedings of EMNLP, 327--335.

Read at least one (and shoot for both, if you have time) of the readings for Th Oct 17th (lecture 14). Post a project proposal inspired by one or both of these readings to Piazza by 5pm Wed the 16th; include the general idea, and a suggestion for a dataset. A paragraph suffices (and more is great, if you feel inspired!). Thoughtfulness and creativity are what I'm most interested in, but take feasibility into account.

Do glance at each other's proposals, and comment as you see fit, before class on the 17th.

Here is a real-life example of a proposal and subsequent discussion, posted with permission.

#13 Th Oct 10

Influence and diffusion


Image source: notabilia.net

Scan of lecture notes


Androutsopoulos, Ion and Prodromos Malakasiotis. May 2010. A survey of paraphrasing and textual entailment methods. JAIR 38:135-187.

Backstrom, Lars, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of KDD, 44-54. "Taken together, these results support the notion that a burst of authors moving into a conference C from some other conference B are drawn to topics that are currently hot at C; but there is also evidence that this burst of authors produces papers that are comparably impoverished in their usage of terms that will be hot in the future".

Bordia, Prashant and Nicholas Difonzo. Problem solving in social interactions on the internet: Rumor as social cognition. Social Psychology Quarterly 67 (1): 33--49.

Broder, John M. 2007. Familiar Fallback for Officials: ‘Mistakes Were Made’. The New York Times, March 14. The "past exonerative".

Choi, Eunsol, Chenhao Tan, Lillian Lee, Cristian Danescu-Niculescu-Mizil, and Jennifer Spindel. 2012. Hedge detection as a lens on framing in the GMO debates: A position paper. Proceedings of the ACL Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pp. 70--79.

Greene, Stephan and Philip Resnik. 2009. More than words: Syntactic packaging and implicit sentiment. Proceedings of NAACL, pp. 503--511.

Gruhl, Daniel, R Guha, David Liben-Nowell, and Andrew Tomkins. 2004. Information diffusion through blogspace. In Proceedings of WWW, 491-501.

Guerin, Bernard and Yoshihiko Miyazaki. Analyzing rumors, gossip, and urban legends through their conversational properties. Psychological Record 56 (1): 23-34.

Heath, Chip, Chris Bell, and Emily Steinberg. Emotional selection in memes: The case of urban legends. Journal of Personality 81 (6): 1028-1041.

Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? Proceedings of WWW, 591-600.

Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-Tracking and the dynamics of the news cycle. In Proceedings of KDD, 497-506. Data and visualization website: http://www.memetracker.org/index.html

Madnani, Nitin and Bonnie J Dorr. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics 36 (3): 341-387.

Simmons, Matthew P., Lada A. Adamic, and Eytan Adar. 2011. Memes online: Extracted, subtracted, injected, and recollected. Proceedings of ICWSM, pp. 353--360.

Taraborelli, Dario and Giovanni Luca Ciampaglia. Beyond notability. [sic] Collective deliberation on content inclusion in Wikipedia. Second International Workshop on Quality in Techno-Social Systems. Here is the visualization and data site: notabilia.net

Wu, Fang, Bernardo A. Huberman, Lada A. Adamic, and Joshua R. Tyler. Information flow in social groups. Physica A: Statistical and Theoretical Physics 337 (1-2): 327-335.

  Tu Oct 15 Fall Break - no class  
#14 Th Oct 17

Discussion/project brainstorming on phrasing and virality + meme mutation:

Image source: http://imgs.xkcd.com/comics/headlines.png

Guerini, Marco, Alberto Pepe, and Bruno Lepri. 2012. Do linguistic style and readability of scientific abstracts affect their virality? Proceedings of ICWSM.

Simmons, Matthew P., Lada A. Adamic, and Eytan Adar. 2011. Memes online: Extracted, subtracted, injected, and recollected. Proceedings of ICWSM, pp. 353--360.

Scan of lecture notes (includes administrative info that explains rationale behind the activities of the next few weeks)

Related work

Ashok, Vikas Ganjigunte, Song Feng, and Yejin Choi. 2013. Success with style: Using writing style to predict the success of novels. Proceedings of EMNLP.

Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. Proceedings of KDD, pp. 497--506.

Omodei, Elisa, Thierry Poibeau, and Jean-Philippe Cointet. 2012. Multi-Level modeling of quotation families morphogenesis. Proceedings of ASE/IEEE SocialCom.

Schneider, Nathan, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederick L. Crabbe, and Noah A. Smith. 2010. Visualizing Topical Quotations Over Time to Understand News Discourse. CMU-LTI-01-103, CMU.

Shaparenko, Benyah and Thorsten Joachims. 2007. Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 619-628.

References that came up during discussion

The original Debatepedia (I learned about this from the Gottipati et al. paper listed under lecture 16); Debatabase.

Adamic, Lada A and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. Election: Divided they blog. Proceedings of the 3rd International Workshop on Link Discovery, pp. 36--43. This is in reference to whether referencing implies agreement.

Agrawal, Rakesh, Sridhar Rajagopalan, Ramakrishnan Srikant, and Yirong Xu. 2003. Mining newsgroups using networks arising from social behavior. In Proceedings of the 12th International Conference on World Wide Web, 529-535. This is in reference to whether referencing implies agreement.

Biran, Or and Owen Rambow. 2011. Identifying justifications in written dialogs. Proceedings of the IEEE International Conference on Semantic Computing (ICSC), pp. 162--168. The PDF is accessible through the Cornell library.

Guerini, Marco, Carlo Strapparava, and Oliverio Stock. 2010. Evaluation metrics for persuasive NLP with Google adwords. Proceedings of LREC.

Lakkaraju, Himabindu, Julian McAuley, and Jure Leskovec. 2013. What's in a name? Understanding the interplay between titles, content, and communities in social media. Proceedings of ICWSM.

Murakami, Akiko and Rudy Raymond. 2010. Support or oppose?: Classifying positions in online debates from reply activities and opinion expressions. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 869--875. This is in reference to whether referencing implies agreement.

Tausczik, Y R and J W Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29 (1): 24. LIWC site

From now on, Piazza posts are due at 5pm the day before the corresponding in-class discussion, to allow more time for feedback.
#15 Tu Oct 22

Discussion/project proposals on authority claims and alignment moves + normalization of “non-standard” language in social media:

Bender, Emily M, Jonathan T Morgan, Meghan Oxley, Mark Zachry, Brian Hutchinson, Alex Marin, Bin Zhang, and Mari Ostendorf. 2011. Annotating social acts: Authority claims and alignment moves in Wikipedia talk pages. Proceedings of the ACL-HLT Workshop on Language in Social Media, pp. 48--57. The URL for the AAWD corpus has changed since the paper was published; it's now here.

Eisenstein, Jacob. 2013. What to do about bad language on the internet. Proceedings of NAACL-HLT, pp. 359--369.

Introductory notes

Brought up in class: "Other approaches don't have to be bad in order for your approach to be good", from a blog post by Hal Daumé.

Related work

The SRI Language Modeling Toolkit FAQ: see questions D4 and D5 about OOV (out-of-vocabulary) words.

Abbott, Rob, Marilyn Walker, Pranav Anand, Jean E Fox Tree, Robeson Bowmani, and Joseph King. 2011. How can you say such things?!?: Recognizing disagreement in informal political argument. In Proceedings of the Workshop on Languages in Social Media, 2-11.

Galley, Michel, Kathleen McKeown, Julia Hirschberg, and Elizabeth Shriberg. 2004. Identifying agreement and disagreement in conversational speech: Use of bayesian networks to model pragmatic dependencies. In Proceedings of the ACL.

Garley, Matt and Julia Hockenmaier. 2012. Beefmoves: Dissemination, diversity, and dynamics of English borrowings in a German hip hop forum. Proceedings of ACL.

Hassan, Ahmed, Amjad Abu-Jbara and Dragomir Radev. 2012. Detecting subgroups in online discussions by modeling positive and negative relations among participants. Proceedings of EMNLP-CoNLL.

Hillard, Dustin, Mari Ostendorf, and Elizabeth Shriberg. 2003. Detection of agreement vs. disagreement in meetings: Training with unlabeled data. Companion Volume of the Proceedings of HLT-NAACL 2003--short Papers - Volume 2, pp. 34--36.

Marin, Alex, Bin Zhang and Mari Ostendorf. 2011. Detecting forum authority claims in online discussions. Proceedings of the ACL-HLT Workshop on Language in Social Media, pp. 39--47.

Mayfield, Elijah and Carolyn P. Rosé. 2011. Recognizing authority in dialogue with an integer linear programming constrained model. Proceedings of the ACL.

Murakami, Akiko and Rudy Raymond. 2010. Support or oppose?: Classifying positions in online debates from reply activities and opinion expressions. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 869--875.

Neils, Jean, David P. Roeltgen, and Anita Greer. June 1995. Spelling and attention in early Alzheimer's disease: Evidence for impairment of the graphemic buffer. Brain and Language 49 (3): 241-62.

Pierrehumbert, Janet B. 2012. The dynamic lexicon. In Handbook of Laboratory Phonology, pp. 173--183.

Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. Proceedings of EMNLP, 327--335.

Thurlow, Crispin. 2006. From statistical panic to moral panic: The metadiscursive construction and popular exaggeration of new media language in the print media. Journal of Computer-Mediated Communication 11 (3): 667-701.

Other recommendations

Petrovic, Sasa, Miles Osborne, and Victor Lavrenko. 2013. I wish I didn't say that! Analyzing and predicting deleted messages in Twitter. eprint arXiv:1305.3107. Not required reading since analyzing tweet deletion would probably be out of scope for this semester, resource-wise, and not quite in scope with respect to social interaction.

(piazza post)
#16 Th Oct 24

Discussion/project proposals for power + tolerance:

Bramsen, Philip, Martha Escobar-Molana, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT. The Enron email dataset can be found here.

Mukherjee, Arjun, Vivek Venkataraman, Bing Liu, and Sharon Meraz. 2013. Public dialogue: Analysis of tolerance in online discussions. Proceedings of the ACL.

Introductory notes

Some pages shown in class

Other sites I've seen referenced in other papers or webpages but haven't looked at much yet: www.forandagainst.com, www.convinceme.net, idebate.org.

Related work

Beňuš, Štefan, Rivka Levitan, and Julia Hirschberg. 2012. Entrainment in spontaneous speech: The case of filled pauses in supreme court hearings. Proceedings of the 3rd IEEE Conference on Cognitive Infocommunications.

Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, pp. 699--708.

Diehl, Christopher P., Galileo Namata, and Lise Getoor. 2007. Relationship identification for social network discovery. Proceedings of the AAAI Workshop on Enhanced Messaging, pp. 546--552.

Gilbert, Eric. 2012. Phrases that signal workplace hierarchy. Proceedings of CSCW.

Leber, Jessica. 2013. The immortal life of the Enron e-mails. Business News.

Ng, Sik Hung and James J Bradac. 1993. Power in Language: Verbal Communication and Social Influence. Sage Publications, Inc.

Prabhakaran, Vinodkumar, Owen Rambow, and Mona T Diab. 2012. Who's (really) the boss? Perception of situational power in written interactions. Proceedings of COLING, pp. 2259--2274.

Other recommendations

Gottipati, Swapna, Minghui Qiu, Yanchuan Sim, Jing Jiang, and Noah A Smith. 2013. Learning topics and positions from Debatepedia. Proceedings of EMNLP. Appendix. Not required reading because the Mukherjee et al. paper gives an overview of more areas, which is useful for generating project ideas.

Mayfield, Elijah, David Adamson, and Carolyn Penstein Rosé. 2012. Hierarchical conversation structure prediction in multi-party chat. Proceedings of ACL SigDIAL, pp. 60--69. Not required reading, due to time constraints: the data annotation required to reproduce the work is probably out of scope for this semester, resource-wise.


(piazza post)
#17 Tu Oct 29

Discussion/project proposals for polarizing topics + Q&A sites

Image source: www.catandgirl.com/?p=2105

Balasubramanyan, Ramnath, William W Cohen, Douglas Pierce, and David P Redlawsk. 2012. Modeling polarizing topics: When do different political communities respond differently to the same news? Proceedings of ICWSM.

Treude, Christoph, Ohad Barzilay, and Margaret-Anne Storey. May 2011. How do programmers ask and answer questions on the web? (NIER track). Proceedings of the International Conference on Software Engineering (ICSE), 804-807.

Introductory notes

Some pages shown in class

Related work

Abu-Jbara, Amjad, Pradeep Dasigi, Mona Diab, and Dragomir Radev. 2012. Subgroup detection in ideological discussions. Proceedings of ACL.

Anderson, A, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2012. Discovering value from community activity on focused question answering sites: A case study of Stack Overflow. In Proceedings of KDD.

Dalip, Daniel Hasan, Marcos André Gonçalves, Marco Cristo, and Pavel Calado. 2013. Exploiting user feedback to learn to rank answers in Q&A forums: A case study with Stack Overflow. Proceedings of SIGIR, pp. 543--552.

Harper, F. Maxwell, Daniel Moy, and Joseph A. Konstan. 2009. Facts or friends?: Distinguishing informational and conversational questions in social Q&A sites. In Proceedings of CHI, 759-768.

Oktay, Hüseyin, Brian J Taylor, and David D Jensen. 2010. Causal discovery in social media using quasi-experimental designs. Proceedings of the First Workshop on Social Media Analytics, pp. 1--9.

Tausczik, Yla R and James W Pennebaker. 2011. Predicting the perceived quality of online mathematics contributions from users' reputations. Proceedings of CHI, pp. 1885--1888.

Wang, Gang, Konark Gill, Manish Mohanlal, Haitao Zheng, and Ben Y. Zhao. 2013. Wisdom in the social crowd: an analysis of Quora. Proceedings of WWW, pp. 1341--1352.

(piazza post)
#18 Th Oct 31

Discussion/project proposals for spoiler detection + legislative credit claiming

Boyd-Graber, Jordan, Kimberly Glasgow, and Jackie Sauter Zajac. 2013. Spoiler alert: Machine learning approaches to detect social media posts with revelatory information. Proceedings of ASIST.

Grimmer, Justin, Solomon Messing, and Sean J Westwood. October 2012. How words and money cultivate a personal vote: The effect of legislator credit claiming on constituent credit allocation. American Political Science Review 106 (4): 703-719. Supplementary material by authors. LL notes from a talk given by Grimmer (search for "Grimmer" in the file).

Introductory notes

Pages visited in class

Ebert, Roger. 2005. Critics have no right to play spoiler.

Freeman, Nate. 2010. The history and use of "Spoiler Alert".

Klosowski, Thorin. 2012. How to block annoying tech rumors and movie spoilers on your browser. Note the tag-filtering capabilities that some sites have.

Singer, Matt. 2012. Does Rotten Tomatoes Need a Spoiler Warning System?

TV Tropes guidelines about handling spoilers.

http://www.doesthedogdie.com. Those who only want to know about a particular movie should shrink their browser window and use just the search box at the top, or use http://www.doesthedogdie.com/search instead.

Related references.

Druck, Gregory and Bo Pang. 2012. Spice it up? Mining refinements to online instructions from user generated content. Proceedings of ACL. "Actionable improvements" (or changes to avoid) are kind of like a spoiler one would want to see and search for.

(piazza post)
#19 Tu Nov 5 No class (LL out of town). Informal term-project proposal draft (two or more paragraphs) due on Piazza by 5pm; posting significantly beforehand will improve the quality of feedback and help with decisions about teaming up.
#20 Th Nov 7 Individual/team meetings: midterm evaluation and project ideas  
#21 Tu Nov 12

Proof-of-concept presentations: each of you shows the coolest example you can find of the phenomena you've proposed to study, or, alternatively, an example showing that what you've been proposing to study may be problematic. Handouts or slides OK. Plan for a 5-7 minute time frame, and assume everyone has read your Piazza project proposal.

The idea is to get you looking at your data and acquiring a feel for what might turn out to be interesting as soon as possible.

#22 Th Nov 14 No class meeting per se, but LL will be in Upson 315 for consultation, so you can drop by. By 5pm: post a follow-up to your Piazza proposal that gives your planned schedule (by such-and-such date, have this-or-that experiment done).
#23 Tu Nov 19 Individual progress-report/problem-solving meetings

By 5pm the night before (i.e., Monday), post a brief description of your progress, especially any cool findings and/or any problems you've run into, as a follow-up to your Piazza project proposal.

Doing so will help us have a more productive individual discussion, and in some cases allow us to cancel the individual meeting on Tuesday as appropriate.

#24 Th Nov 21 No class (LL out of town)  
#26 Tu Nov 26 Individual progress-report/problem-solving meetings

By 5pm the night before (i.e., Monday), post a brief description of your progress, especially any cool findings and/or any problems you've run into, as a follow-up to your Piazza project proposal.

Doing so will help us have a more productive individual discussion, and in some cases allow us to cancel the individual meeting on Tuesday as appropriate.

  Th Nov 28 Thanksgiving Break - no class  
#27 Tu Dec 3 Individual progress-report/problem-solving meetings

By 5pm the night before (i.e., Monday), post a brief description of your progress, especially any cool findings and/or any problems you've run into, as a follow-up to your Piazza project proposal.

Doing so will help us have a more productive individual discussion, and in some cases allow us to cancel the individual meeting on Tuesday as appropriate.

#28 Th Dec 5 Project presentations  
  W Dec 11 Final project due on CMS by 11:59pm. I'm expecting something like an ICWSM/ACL paper in terms of style, discussion of related work, etc. I have no particular page length in mind, but please highlight the most interesting findings (positive or negative).  

The code for generating the calendar above, and the CSS, were (barely) adapted from the original versions created by Andrew Myers.