CS/INFO 6742, Fall 2021: Natural Language Processing and Social Interaction. Prof. Lillian Lee. Tu/Th 1:00-2:15pm, Phillips 403.
Image source: http://en.wikipedia.org/wiki/The_School_of_Athens

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, persuasion and other causal effects of language.


Enrollment, prerequisites, related classes

Enrollment Limited to [[PhD and [CS MS] students] who meet the prerequisites]; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are to keep class meetings heavily discussion- and group-research-focused.

Prerequisites All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency with using machine learning tools (e.g., fluency at training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation).

Related classes: see Cornell's NLP course list. Also GOVT 3294 Post-Truth Politics, COMM 6750 Research Methods for Social Networks and Social Media, and COMM 6770 Attitudes and Social Judgment.

All prior runnings of CS/INFO 6742: 2019 fall :: 2018 fall :: 2017 fall :: 2016 fall :: 2015 fall :: 2014 fall :: 2013 fall :: 2011 spring

Administrative info

CMS https://cmsx.cs.cornell.edu. Site for submitting assignments, unless otherwise noted. Log in with NetID credentials and select CS 6742. You may find this graphically oriented guide to common operations useful: see how to replace a prior submission; how to tell if CMS successfully received your files; how to form a group.

Course discussion site https://edstem.org/us/courses/8208/discussion (access restricted to enrolled students). Course announcements and Q&A/discussion site. Social interaction and all that, you know.

Office hours and contact info See Prof. Lee's homepage and scroll to the section on Contact and availability info.

Grading Of most interest to me are productive research-oriented discussion participation (in class and/or on the course discussion site), interesting research proposals and pilot studies, and a good-faith final research project.

Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

Certain points deserve emphasis here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use the code from other sources. The easiest rule of thumb is, acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own — e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names. However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment “only” risks grade penalties.

Overall course structure

#1
  • Agenda: Course overview
  • Assignments: A1 released: pilot empirical study for a research idea based on the given readings.

#2 - #6
  • Agenda: Lectures on topics related to the A1 readings
  • Pedagogical purpose: Case studies to explore some topics and research styles I find interesting. Get-to-know-you exercises to get everyone familiar and comfortable with each other.

Next block of meetings
  • Agenda: Discussion of proposed projects based on the readings
  • Pedagogical purpose: Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc.
  • Assignments: Discussion of student project proposals, based on the readings for that class meeting. Each class meeting involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to the course discussion site. Thoughtfulness and creativity are most important to me, but take feasibility into account.

Next block of meetings
  • Agenda: Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, and advanced language modeling.
  • Pedagogical purpose: Foundational material
  • Assignments: Potentially some assignments based on the lectures.

Remainder of the course
  • Agenda: Activities related to course projects
  • Pedagogical purpose: Development of a "full-blown" research project (although time restrictions may limit ambitions). For my purposes, "interesting" and "well-thought-out" are more important than "successful".

Resources

 

Lectures

#1 Aug 26: Introduction

Assignments/announcements

  • Assignment A1: Pilot empirical research study. Note the first deadline (of several) on Wed Sep 1, 11:59pm.

Class images, links and handouts

References

#2 Aug 31: A1 inspiration: Overview of conversations

Assignments/announcements

  • Assignment A1 finalized. Note the first deadline (of several) on Wed Sep 1, 11:59pm.

Class images, links and handouts

visualization of keep/delete comments in temporal order
Image source: notabilia.net

References

#3 Sep 2: Two A1 datasets, alike in dignity

Assignments/announcements

  • Reminder: try to post a preliminary pilot-study idea/sketch/possibilities/questions on Monday.

Class images, links and handouts


References

#4 Sep 7: Language coordination: a "direct linguistic" interaction

Assignments/announcements

  • Reminder: check Ed Discussions for announcements. And provide thoughts/encouragement to your classmates!
  • Use Passkey to get access to paywalled content via Cornell.
  • Toolkits possibly useful for A1: see the "Resources" tab at the top of this page. Note that Cornell's ConvoKit comes with the CMV data.

Class images, links and handouts

New Yorker cartoon showing business people at a meeting, all but one of them in ridiculous outfits. Caption: "Damn it, Hopkins, didn't you get yesterday's memo?"
Image source: Jack Ziegler, The New Yorker, 06/09/2015. License obtained through The Cartoon Bank
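Language coordination refers to speakers echoing aspects of their conversational partners' style. As a minimal sketch (an illustration, not necessarily the exact measure covered in this lecture), one marker-based formulation, used in work by Danescu-Niculescu-Mizil and colleagues, asks how much more likely a reply is to exhibit a stylistic marker m when the utterance it responds to exhibits m, compared with replies in general: P(reply has m | utterance has m) - P(reply has m). The function names, the whitespace tokenization, and the use of English articles as the marker are hypothetical choices for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical marker class for illustration: English articles.
ARTICLES = {"a", "an", "the"}


def has_article(utterance: str) -> bool:
    """True if the lowercased, whitespace-tokenized utterance contains an article."""
    return any(tok in ARTICLES for tok in utterance.lower().split())


def coordination(
    pairs: Sequence[Tuple[str, str]],
    has_marker: Callable[[str], bool],
) -> Optional[float]:
    """Coordination of repliers toward the speakers they reply to, for one marker.

    pairs: (utterance, reply) pairs.
    Returns P(reply exhibits m | utterance exhibits m) - P(reply exhibits m),
    or None if no utterance exhibits the marker (conditional term undefined).
    """
    if not pairs:
        raise ValueError("need at least one (utterance, reply) pair")
    # Baseline: how often replies exhibit the marker at all.
    base = sum(has_marker(reply) for _, reply in pairs) / len(pairs)
    # Condition on the utterance being replied to exhibiting the marker.
    triggered = [reply for utt, reply in pairs if has_marker(utt)]
    if not triggered:
        return None
    conditional = sum(has_marker(reply) for reply in triggered) / len(triggered)
    return conditional - base


if __name__ == "__main__":
    # Toy, made-up exchanges (not course data).
    toy_pairs = [
        ("the meeting is at noon", "a good time"),
        ("bring the slides", "the slides are ready"),
        ("hello", "hi"),
        ("thanks", "sure"),
    ]
    print(coordination(toy_pairs, has_article))  # 0.5
```

A positive value means replies use the marker more often right after it appears than they do overall; real analyses would aggregate over many markers and speakers and deal with sparse counts, which this sketch ignores.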

References

#5 Sep 9: (lecture cancelled: out sick)

Assignments/announcements

  • Reminder: A1 milestone: post pilot-study idea(s) by tonight, and if grouping, do so on CMS by tomorrow night.
#6 Sep 14: Quick look at settings mentioned last time; some nuts and bolts

Assignments/announcements

  • Next assignment, "A1 Reflection", released
  • Reminder: A1 milestone: post project update by tomorrow night

Class images, links and handouts

Why is Mrs. Thatcher Interrupted So Often?  Nature title

References

#7 Sep 16: A1 group/individual appointments

Assignments/announcements

  • Reminder: A1 milestone: submit project report on CMS by Monday night, in-class presentations on Tuesday
#8 Sep 21: A1 class presentations

Assignments/announcements

  • Reminder: A1R milestone: post slides to Ed Discussions (as a new post) by tonight; post self-reflection part by Thursday night.

Class images, links and handouts

  • Recording (only available to enrolled students)
#9 Sep 23: Exploring differences between two language samples: "Fightin' Words"

Assignments/announcements

  • Reminder: A1R self-reflection (main task 1) due tonight; feedback to at least one other group (main task 2) due Monday night.

Class images, links and handouts

The annual death rate is one in six among people who know that the chances of getting killed by lightning are 1 in 7 million.

Image source: https://xkcd.com/795/.

References

Implementations

  • ConvoKit implementation, based on prior code from Jack Hessel's implementation and Xanda Schofield's visualizer
  • Hessel, Jack (who took this class!). FightingWords. In Python.
  • Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
  • Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
  • Silge, Julia, Alex Hayes, Tyler Schnoebelen. tidylo: Weighted Tidy Log Odds Ratio. In R.
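For orientation, here is a minimal pure-Python sketch of the core computation described by Monroe, Colaresi, and Quinn (2008): the log-odds ratio of each word between two corpora, smoothed with an informative Dirichlet prior and divided by its approximate standard deviation to give a z-score. The prior used here (pooled counts from both corpora, scaled to a total mass `alpha0`) and the default `alpha0=1000` are assumptions chosen for illustration; the implementations listed above may make different choices and should be preferred for real analyses.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def fightin_words(
    tokens_i: Iterable[str], tokens_j: Iterable[str], alpha0: float = 1000.0
) -> Dict[str, float]:
    """Z-scored log-odds ratio with an informative Dirichlet prior.

    Positive scores lean toward corpus i, negative toward corpus j.
    alpha0 (total prior mass) and the pooled-count prior are illustrative assumptions.
    """
    y_i, y_j = Counter(tokens_i), Counter(tokens_j)
    n_i, n_j = sum(y_i.values()), sum(y_j.values())
    pooled = y_i + y_j
    if len(pooled) < 2:
        raise ValueError("need at least two distinct word types")
    n_pooled = n_i + n_j
    # Prior: pooled word frequencies, scaled so the alpha_w sum to alpha0.
    alpha = {w: alpha0 * c / n_pooled for w, c in pooled.items()}

    z = {}
    for w, a_w in alpha.items():
        yi, yj = y_i[w], y_j[w]  # Counter returns 0 for unseen words
        # Smoothed log-odds of w in each corpus.
        lo_i = math.log((yi + a_w) / (n_i + alpha0 - yi - a_w))
        lo_j = math.log((yj + a_w) / (n_j + alpha0 - yj - a_w))
        delta = lo_i - lo_j
        # Approximate variance of the log-odds ratio.
        var = 1.0 / (yi + a_w) + 1.0 / (yj + a_w)
        z[w] = delta / math.sqrt(var)
    return z


if __name__ == "__main__":
    corpus_i = "the cat sat on the mat the cat".split()
    corpus_j = "the dog sat on the log the dog".split()
    scores = fightin_words(corpus_i, corpus_j)
    # Most corpus-i-leaning words first: 'cat' is positive, 'dog' negative,
    # and words used equally often in both (e.g., 'the') score 0.
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word:>5} {score:+.3f}")
```

Dividing by the standard deviation is what keeps rare words from dominating the ranking, which raw log-odds ratios tend to let them do.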
#10 Sep 28: "Snippet" propagation and competition (which get at influence)

Assignments/announcements

Class images, links and handouts


Image source: David Malki ! Wondermark 1209: Talk and Awe

  • Recording (only available to enrolled students)

References

#11 Sep 30: Language and communities (I)

Assignments/announcements

  • A2 (proposals for final project) is due Wed Oct 20 11:59pm. Details forthcoming, but:
    • What to submit and what is allowed will be similar to the instructions for Fall 2017. For example, a concrete feasibility test will be required.
    • Posting preliminary ideas on Ed Discussions for earlier feedback is encouraged. This also facilitates grouping.
    • Lecture 14 (Oct 14) will be (mandatory) group/individual appointments with me to discuss possibilities. Exact schedule TBD. OK if you haven't posted any preliminary ideas at that point, but better to have done so.

Class images, links and handouts

This is the Handmaid's Tale conversation. THAT'S the Westworld conversation
Image by Peter Sipress. Licensed from the Cartoon Bank

References

#12 Oct 5: Language and Communities (II): "Norms"

Assignments/announcements

Class images, links and handouts

example of Singlish with lots of code switching

Image credit: Renae Cheng, 10 Bizarre Things Singaporeans Do That The Rest Of The World Won't Understand, 2021.

  • Recording (only available to enrolled students)

References

#13 Oct 7: Conversation trajectories

Assignments/announcements

Class images, links and handouts

Some ways in which a conversation can go wrong
Images: (left) photo of a description of "Some ways in which a conversation can go wrong" from Ben Schott, Schottenfreude: German Words for the Human Condition (2013). (right) photo of a page from Allie Brosh, Solutions and Other Problems (2020).

  • Recording (only available to enrolled students)

References

Oct 12: No class — Fall Break
#14 Oct 14: Mandatory A2 (initial proposal) appointments

Assignments/announcements

#15 Oct 19: Intention inference

Assignments/announcements

Class images, links and handouts

T-Rex: What if people can't tell when I'm being sarcastic?
T-Rex: This is a serious question! What if in the past, when I assume somebody has picked up on what I took to be obvious sarcasm, they took me at face value? Oh my God! The misunderstandings would be legion! This is a huge concern!
T-Rex: I may have unintentionally lied or alienated every one of my friends!
Utahraptor: Again?
T-Rex: Utahraptor! Can you tell when I'm being sarcastic?
Utahraptor: Well, I think so, but say something sarcastic now and I'll tell you what it sounds like.
T-Rex: Ok- just give me a second to think of something!
T-Rex: *ahem*
T-Rex: Oh no! I'm so worried! What if people can't tell when I'm being sarcastic?
Image source: Dinosaur Comics 168, by Ryan North.

References

#16 Oct 21: Joint project-proposal discussion: organization/grouping, recommended directions, etc.

Assignments/announcements

  • recording (only available to enrolled students)

Class images, links and handouts

References

Oct 26: No class
Oct 28: No class

Assignments/announcements

#17 Nov 2: Feasibility-check appointments

Assignments/announcements

#18 Nov 4: How different are two language models for different sources? (Part one: language models)

Assignments/announcements

Class images, links and handouts


Figure by Chenhao Tan. See handout for explanation.

#19 Nov 9: How different are two language models for different sources? (Part two: an example language-model derivation)

Assignments/announcements

  • Reminder: results of your commitment-for-the-week are due Mon Nov 15 (originally Thu Nov 11), following the A2, A3, A4 instructions.

Class images, links and handouts


Image source: Dorothy Gambrel, Silent Spring (Cat and Girl)

  • Handout; recording (only available to enrolled students); scan of what was displayed on the document camera

References

#20 Nov 11: Continued example of language-model development (latent information); beginning of functions for measuring the difference between language models

Assignments/announcements

  • Reminder: post the results of your commitment-for-the-week by Mon Nov 15 (originally Thu Nov 11), 11:59pm, following the A2, A3, A4 instructions.

Class images, links and handouts

  • Handout; recording (only available to enrolled students); scan of what was displayed on the document camera
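As a rough illustration of measuring how different two sources' language models are, the sketch below builds add-k-smoothed unigram models over a shared vocabulary and compares them with KL divergence (asymmetric) and Jensen-Shannon divergence (symmetric, at most 1 bit). This page does not list which models or distance functions the lectures actually cover; unigram models, add-k smoothing, and these two divergences are assumptions chosen because they are standard and short.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


def unigram_lm(tokens: Sequence[str], vocab: Iterable[str], k: float = 1.0) -> Dict[str, float]:
    """Add-k smoothed unigram distribution over a shared vocabulary."""
    vocab = list(vocab)
    counts = Counter(tokens)
    total = len(tokens) + k * len(vocab)
    return {w: (counts[w] + k) / total for w in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in bits; assumes q[w] > 0 wherever p[w] > 0 (smoothing ensures this)."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the dog sat on the log".split()
    vocab = sorted(set(a) | set(b))  # shared support for both models
    p, q = unigram_lm(a, vocab), unigram_lm(b, vocab)
    print("KL(p||q):", kl_divergence(p, q))
    print("KL(q||p):", kl_divergence(q, p))
    print("JS(p,q): ", js_divergence(p, q))
```

Smoothing over a shared vocabulary matters here: without it, a word seen in only one source gives q[w] = 0 and an infinite KL divergence, whereas the JS divergence stays finite because it compares each model to their average.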
#21 Nov 16: Distances between distributions (conclusion)

Assignments/announcements

Class images, links and handouts

References

#22 Nov 18: Introduction to discourse

Assignments/announcements

  • See the recording for an explanation of, and the due dates for, the (new) plan for the "class presentation".

Class images, links and handouts


Image source: my personal collection.

References

#23 Nov 23: Latent discourse structure

Assignments/announcements

  • Reminder: progress-report/current-results "presentation" due on Ed Discussions Thursday noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.

Class images, links and handouts

One robot is looking at a flowerpot in its hands.  The other robot says, 'can't we go five minutes without you checking your flower?'
Cartoon by Tom Chitty. Licensed from CartoonStock.

References

Nov 25: No class — Thanksgiving Break
#24 Nov 30: Intentions, attention, discourse structure

Assignments/announcements

  • Reminder: progress-report/current-results "presentation" due on Ed Discussions Thursday noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.

Class images, links and handouts


Left: Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks at the 1996 match against Deep Blue. Photo by Kenneth Thompson, provided at computerhistory.org
Right: Maurice Ashley and Yasser Seirawan commentating on the 1997 re-match. Photo by Monroe Newborn, provided at computerhistory.org

References

#25 Dec 2: (Mandatory) give-in-class-feedback-on-Ed-Discussions session

Assignments/announcements

  • Reminder: progress-report "presentation" due on Ed Discussions today at noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.
  • Instructions posted for the final writeup, due on CMS Thu Dec. 16, 7pm (date determined by the registrar).
  • Course grade factors have now been set as shown on CMS: A1 = 30%; A1R = 4%; A2 = 30%; A3 = 5%; A4 = 5%; A5 = 5%; final writeup = 21%.
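To make the weighting concrete: the course grade factors sum to 100%, so a course score is a weighted average of component scores. The sketch below assumes, hypothetically, that every component is scored on a 0-100 scale; the page does not say how component scores are scaled or how the weighted total maps to a letter grade.

```python
from typing import Dict

# Weights (percent of the course grade) as announced on CMS for Fall 2021.
WEIGHTS: Dict[str, int] = {
    "A1": 30, "A1R": 4, "A2": 30, "A3": 5, "A4": 5, "A5": 5, "Final writeup": 21,
}
assert sum(WEIGHTS.values()) == 100  # sanity check: weights cover the whole grade


def course_score(scores: Dict[str, float]) -> float:
    """Weighted average of component scores (hypothetical 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Hypothetical example: 90 on everything except a 70 on A2.
    example = {name: 90.0 for name in WEIGHTS}
    example["A2"] = 70.0
    print(course_score(example))  # 84.0
```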
#26 Dec 7: (Mandatory) appointments with me (each group makes one)

Assignments/announcements

Thu Dec. 16, 7pm: final project writeup due (date determined by the registrar)

Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.