CS/IS 6742, Spring 2024: Natural Language Processing and Social Interaction. Prof. Lillian Lee. Tu/Th 1:25-2:40pm, Phillips 213
Image source: http://en.wikipedia.org/wiki/The_School_of_Athens

Main

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, and persuasion and other causal effects of language.

Click on the tabs just above to see information about enrollment/prerequisite policies, administrative info, overall course structure, resources, and so on.

Enrollment, prerequisites, related classes

Enrollment Limited to [[PhD and [CS MS] students] who meet the prerequisites]; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are to enable class meetings to be heavily discussion-focused.

Prerequisites All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency with using machine learning tools (e.g., fluency in training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation).

Please take a look at the contents of some of the papers on this quick list of sample papers (URLs should be clickable) before deciding on enrollment; if most of them seem completely impenetrable (or uninteresting), this class may not be the right fit for you.

Related classes: see Cornell's NLP course list.

In particular, Spring 2024 courses CS 6741 Topics in natural language processing and machine learning, CS 5740 Natural language processing (Cornell Tech students only), INFO 4940-LEC 006 Advanced NLP for Humanities Research, CS 4744 (and other crosslists) Computational linguistics I, or CS/IS 4300 Language and information may be a better choice for you; they are excellent courses for sure!

Other classes I am less knowledgeable about: SOC 6520 Culture wars in the age of tribal politics, GOVT 3282 Data science applications in political and social research.

The webpage from the last time I (Prof. Lee) taught this class may be useful, as might the webpage from the last time I taught a graduate NLP course.

Administrative info

Websites

Office hours and contact info See Prof. Lee's homepage and scroll to the section on Contact and availability info.

Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

Certain points deserve emphasis here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use code from other sources. The easiest rule of thumb is: acknowledge the work, contributions, ideas, words, and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard in a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own — e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names. However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment “only” risks grade penalties.

 

Lectures

Note that assignments will remain visible even when details are hidden.
#1 Jan 23: Introduction

Lecture

Lecture references and further reading

#2 Jan 25: Getting to know each other; easing into paper readings

Assignments/announcements

  • Annotation of the "No country" paper due on Perusall by midnight Wed Jan 31. See slides for details.

Lecture

Lecture references and further reading

#3 Jan 30: Exploring differences between two language samples: "Fightin' Words"

Lecture

Lecture references and further reading

Implementations

  • ConvoKit implementation, based on Jack Hessel's earlier implementation and Xanda Schofield's visualizer
  • Denny, Matt. SpeedReader. In R.
  • Hessel, Jack (who took this class!). FightingWords. In Python.
  • Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
  • Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
  • Silge, Julia, Alex Hayes, Tyler Schnoebelen. tidylo: Weighted Tidy Log Odds Ratio. In R.
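
For orientation before opening any of the implementations above, here is a minimal sketch of the kind of statistic they compute: a log-odds ratio between two samples, smoothed by a Dirichlet prior and divided by its approximate standard deviation to give a z-score per word. It is not the code of any listed package. The function name `fightin_words_z`, the `prior_scale` parameter, and the choice to use the pooled counts of the two samples as the prior are all assumptions made for illustration; real implementations typically let you supply a separate background corpus.

```python
import math
from collections import Counter


def fightin_words_z(tokens_a, tokens_b, prior_scale=1.0):
    """Per-word z-scores of the prior-smoothed log-odds ratio (A vs. B).

    Assumption for this sketch: the prior pseudo-counts are the pooled
    counts of both samples, multiplied by `prior_scale`. The vocabulary
    must contain at least two distinct words.
    """
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    pooled = counts_a + counts_b
    prior = {w: prior_scale * c for w, c in pooled.items()}

    n_a = sum(counts_a.values())   # total tokens in sample A
    n_b = sum(counts_b.values())   # total tokens in sample B
    a0 = sum(prior.values())       # total prior mass

    z_scores = {}
    for w, a_w in prior.items():
        y_a = counts_a.get(w, 0)
        y_b = counts_b.get(w, 0)
        # Smoothed log-odds of w within each sample.
        log_odds_a = math.log((y_a + a_w) / (n_a + a0 - y_a - a_w))
        log_odds_b = math.log((y_b + a_w) / (n_b + a0 - y_b - a_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        z_scores[w] = delta / math.sqrt(variance)
    return z_scores


if __name__ == "__main__":
    sample_a = "tax tax tax cut the the".split()
    sample_b = "health health care the the the".split()
    scores = fightin_words_z(sample_a, sample_b)
    # Positive scores lean toward sample A, negative toward sample B.
    for word, z in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{word:8s} {z:+.3f}")
```

Words shared by both samples at similar rates (here, "the") end up near zero, while words concentrated in one sample get large positive or negative scores. On toy data this small the z-scores are not meaningful as significance tests; the point is only the shape of the computation.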
#4 Feb 1: Distances between language sources
plot of the behavior of different distributional difference functions
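
As a concrete reference point for "distributional difference functions", the sketch below computes two widely used ones over word distributions: Kullback-Leibler divergence and Jensen-Shannon divergence. Which functions the plot actually compares is not stated here, so treat this choice as an assumption; the helper names (`word_distribution`, `kl_divergence`, `js_divergence`) and the additive-smoothing parameter `alpha` are likewise illustrative.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, alpha=0.0):
    """Relative frequencies over `vocab`, with optional add-alpha smoothing."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits. Infinite if q gives zero mass where p does not."""
    total = 0.0
    for w, p_w in p.items():
        if p_w == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        q_w = q.get(w, 0.0)
        if q_w == 0:
            return math.inf
        total += p_w * math.log2(p_w / q_w)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the dog sat on the log".split()
    vocab = sorted(set(a) | set(b))
    p = word_distribution(a, vocab)
    q = word_distribution(b, vocab)
    print("KL(p||q):", kl_divergence(p, q))
    print("KL(q||p):", kl_divergence(q, p))
    print("JS(p, q):", js_divergence(p, q))
```

The contrast worth noticing is that unsmoothed KL divergence blows up as soon as one source uses a word the other never does, which is the normal situation with real text, whereas Jensen-Shannon stays finite because each distribution is compared to their average.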

Lecture

Lecture references and further reading

#5 Feb 6: "No country for old members"

Lecture

Lecture references and further reading

#6 Feb 8: Breezy intro to semantic shift

Assignments/announcements

  • Assignment 2: presentation/annotation of semantic shift papers: schedule and instructions posted.

Lecture

Lecture references and further reading

#7 Feb 13: Semantic shift II

Assignments/announcements

  • Assignment 3 "Fightin' words" announced: Ed post due and presentations on Th Feb 22. Details in slides.

Lecture

Lecture references and further reading

#8 Feb 15: Semantic shift: Presentations by PH and BW

Lecture

Lecture references and further reading

#9 Feb 20: Semantic shift: Presentations by DK, HK, and TW

Lecture

Lecture references and further reading

#10 Feb 22: Fightin' words presentations

Lecture

  • Slides are on Ed Discussion. Recording (only accessible to enrolled students)
#11 Feb 27: No class: Feb break. Keeping the lecture number so that even lecture numbers remain Thursdays.
#12 Feb 29: Conversation I

Assignments/announcements

Lecture

Lecture references and further reading

#13 Mar 5: Lecture title

Lecture

Lecture references and further reading

#14 Mar 7: Lecture title

Lecture

Lecture references and further reading

#15 Mar 12: Lecture title

Lecture

Lecture references and further reading

#16 Mar 14: Lecture title

Lecture

Lecture references and further reading

#17 Mar 19: Lecture title

Lecture

Lecture references and further reading

#18 Mar 21: Lecture title

Lecture

Lecture references and further reading

#19 Mar 26: Lecture title

Lecture

Lecture references and further reading

#20 Mar 28: Lecture title

Lecture

Lecture references and further reading

Apr 2: No class — Spring break
Apr 4: No class — Spring break
#21 Apr 9: Lecture title

Lecture

Lecture references and further reading

#22 Apr 11: Lecture title

Lecture

Lecture references and further reading

#23 Apr 16: Lecture title

Lecture

Lecture references and further reading

#24 Apr 18: Lecture title

Lecture

Lecture references and further reading

#25 Apr 23: Lecture title

Lecture

Lecture references and further reading

#26 Apr 25: Lecture title

Lecture

Lecture references and further reading

#27 Apr 30: Lecture title

Lecture

Lecture references and further reading

#28 May 2: Lecture title

Lecture

Lecture references and further reading

#29 May 7: Lecture title

Lecture

Lecture references and further reading

May TBD, TBD: Final writeup due

Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.