More and more of life is now manifested online, and many of the digital traces that are
left by human activity are increasingly recorded in natural-language format.
This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, and persuasion and other causal effects of language.
Enrollment, prerequisites, related classes
Enrollment: Limited to students who are in a PhD program or the CS MS program and who meet the prerequisites; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies exist so that class meetings can be heavily discussion-focused.
Prerequisites: All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency with machine learning tools (e.g., fluency in training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation).
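As a rough self-check for prerequisite (3), the sketch below shows the kind of workflow that "assessing a classifier's performance using cross-validation" refers to. It is illustrative only and not part of the course materials. To stay within the Python standard library it substitutes a simple nearest-centroid classifier for an SVM and runs on synthetic two-class data. The function names (nearest_centroid_fit, nearest_centroid_predict, k_fold_accuracy) are hypothetical, chosen for this example.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return {label: centroid} computed from feature vectors X and labels y."""
    rows_by_label = {}
    for features, label in zip(X, y):
        rows_by_label.setdefault(label, []).append(features)
    # Each centroid is the per-dimension mean of that label's vectors.
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in rows_by_label.items()}


def nearest_centroid_predict(centroids, features):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def k_fold_accuracy(X, y, k=5, seed=0):
    """Return a list of k held-out accuracies, one per fold."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit only on the training portion; evaluate only on the held-out fold.
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Synthetic data (an assumption for illustration): two well-separated Gaussian blobs.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)])
y = ["a"] * 50 + ["b"] * 50

fold_scores = k_fold_accuracy(X, y, k=5)
print("per-fold accuracy:", fold_scores)
print("mean accuracy:", mean(fold_scores))
```

If the separation between fitting on the training folds and scoring on the held-out fold looks familiar, and you could swap in an SVM or another classifier from a library of your choice, you are likely comfortable with this part of the prerequisites.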
Please take a look at the contents of some of the papers on this quick list of sample papers (URLs should be clickable) before deciding on enrollment; if most of them seem completely impenetrable (or uninteresting), this class may not be the right fit for you.
Zoom. Only accessible to enrolled students, and only meant for cases of illness, travel, and emergency. Notify the instructor ahead of time for each lecture you need to attend via Zoom.
CMS. Site for submitting assignments, unless otherwise noted. Log in with NetID credentials and select course CS 6742.
You may find this graphically-oriented guide to common operations useful: see how to replace a prior submission; how to tell if CMS successfully received your files; how to form a group.
Office hours and contact info
See Prof. Lee's homepage and scroll to the section on Contact and availability info.
In-class presentations (exact number depends on number of students enrolled and the difficulty of the papers we tackle). These may involve meeting with the instructor beforehand.
On days when another student is presenting: all non-presenting students are expected to prepare for class by at least skimming the abstract and intro of the paper(s) to be presented.
Participation in discussion, either during class meetings or offline
Occasional small exercises on lecture material
Midterm paper that reviews and critically analyzes the class material. Due Tu Mar 26 11:59pm on CMS; see the full instructions.
Final paper that reviews and critically analyzes the class material. Due Thu May 16 4:30pm.
Policies
Use of AI generation/editing systems: For each component of the workload, the vast majority of the intellectual work must be originated by you, not by text generation systems. It is OK to use aids for writing fluency --- but note that writing fluency is not part of the assessment rubrics above anyway.
(1) Example of something that is allowed: You write the initial draft(s), review their contents, and double-check them against the original paper. You then use some form of text generation system to proofread and improve the flow. You do not use the system’s output to add extra content.
(2) Example of something that is definitely not allowed: You essentially use a text generation system to generate an early draft, even if you later post-edit and correct the output.
(3) Example of something that is OK but requires special treatment: You start with the procedure in point 1, but the system output includes good points that you hadn’t thought of before, or makes you realize that a point you had made isn’t quite right. You may include the new material and/or make appropriate edits, but you should mention what specific system(s) you used and what changes you made based on their output.
Attendance: Please attend all class meetings that you are reasonably able to.
If attendance isn’t a reasonable option for a given class meeting, please contact the instructor ahead of time, if possible, for planning purposes.
Illness is always a valid reason to not attend and is not held against participation accounting.
Deadlines: We do not have slip days, and there is no "you can submit late for a small penalty": you need to hit the deadlines. But if there are extenuating circumstances, please email the instructor and we can talk. (Still submit what you have before the deadline, so we have an indication of your progress at that point.)
SDS accommodations: The instructor(s) have online access to SDS letters regarding accommodations for exams and other course matters, and will honor these accommodations. As recommended by the SDS office, we do ask that for each deadline, you let the instructor know beforehand in a timely fashion whether you wish to apply your accommodations.
Academic integrity
Claiming the work of others as your own is intellectual fraud and a violation of academic integrity. To avoid this, always track and credit your sources appropriately.
Liberman, Mark. 2023. Debate words (Fox News Republican presidential debate). Liberman's Language Log blog post also links to his previous analyses of other data using Monroe et al.'s technique.
Hessel, Jack (who took this class!). FightingWords. In Python.
Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
Fitch, W. Tecumseh. 2007. An Invisible Hand. Nature 449(7163):665–667. https://doi.org/10.1038/449665a.
Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/P16-1141.
Noble, Bill, Asad Sayeed, Raquel Fernández, and Staffan Larsson. 2021. Semantic Shift in Social Networks. *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, 26–37.
#13 Mar 5: Conversation II: the Grosz and Sidner '86 theory of discourse
Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks during the period of the 1996 match against Deep Blue. Photo by Kenneth Thompson, provided at computerhistory.org.
Bao, Jiajun, Junjie Wu, Yiming Zhang, Eshwar Chandrasekharan, and David Jurgens. 2021. Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations. In Proceedings of the Web Conference 2021 (WWW '21). Association for Computing Machinery, New York, NY, USA, 1134–1145. https://doi.org/10.1145/3442381.3450122
Yuan, Jiaqing, and Munindar P. Singh. 2023. Conversation Modeling to Predict Derailment. International AAAI Conference on Web and Social Media (ICWSM) 17: 926–35. https://doi.org/10.1609/icwsm.v17i1.22200.
#18 Mar 21: Lecture title
Lecture
Lecture references and further reading
#19 Mar 26: Lecture title
Assignments/announcements
Midterm paper due, 11:59pm
Lecture
Lecture references and further reading
#20 Mar 28: Lecture title
Lecture
Lecture references and further reading
Apr 2: No class — Spring break
Apr 4: No class — Spring break
#21 Apr 9: Lecture title
Lecture
Lecture references and further reading
#22 Apr 11: Lecture title
Lecture
Lecture references and further reading
#23 Apr 16: Lecture title
Lecture
Lecture references and further reading
#24 Apr 18: Lecture title
Lecture
Lecture references and further reading
#25 Apr 23: Lecture title
Lecture
Lecture references and further reading
#26 Apr 25: Lecture title
Lecture
Lecture references and further reading
#27 Apr 30: Lecture title
Lecture
Lecture references and further reading
#28 May 2: Lecture title
Lecture
Lecture references and further reading
#29 May 7: Lecture title
Lecture
Lecture references and further reading
May 16 (Th), 4:30pm, as determined by the registrar: Final paper due
Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.