More and more of life is now manifested online, and many of the digital traces that are
left by human activity are increasingly recorded in natural-language format.
This research-oriented course examines the opportunities for natural language
processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, persuasion and other causal effects of language.
Enrollment, prerequisites, related classes
Enrollment: Limited to PhD students and CS MS students, in both cases only those who meet the prerequisites. PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are meant to keep class meetings heavily focused on discussion and group research.
Prerequisites: All of the following:
(1) CS 2110 or equivalent programming experience;
(2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx);
(3) proficiency with machine learning tools (e.g., fluency at training an SVM or other classifier, and comfort with assessing a classifier’s performance using cross-validation).
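For a sense of what "assessing a classifier’s performance using cross-validation" involves, here is a minimal, standard-library-only sketch. It rests on assumptions: the data are two synthetic Gaussian clusters, the helper names are hypothetical, and a simple nearest-centroid classifier stands in for the SVM to keep the example dependency-free. In practice one would typically use scikit-learn (e.g., `cross_val_score` with an SVM) rather than hand-rolled folds.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle example indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def cross_validate(X, y, k=5):
    """Per-fold accuracy: train on k-1 folds, evaluate on the held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical data: two 2-D Gaussian clusters centered at (0, 0) and (3, 3).
rng = random.Random(1)
y = [label for label in (0, 1) for _ in range(50)]
X = [[rng.gauss(3 * label, 1.0), rng.gauss(3 * label, 1.0)] for label in y]

fold_scores = cross_validate(X, y, k=5)
print(f"mean accuracy {mean(fold_scores):.3f} over {len(fold_scores)} folds")
```

The key discipline is that each fold's test examples never influence the model evaluated on them. Reporting the spread across folds alongside the mean, and stratifying folds by class when labels are imbalanced, are common refinements.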
CMS: https://cmsx.cs.cornell.edu
Site for submitting assignments, unless otherwise noted. Log in with your NetID credentials and select CS 6742.
You may find this graphically oriented guide to common operations useful: see how to replace a prior submission, how to tell whether CMS successfully received your files, and how to form a group.
Course discussion site: https://edstem.org/us/courses/8208/discussion (access restricted to enrolled students).
Course announcements and Q&A/discussion site.
Social interaction and all that, you know.
Office hours and contact info
See Prof. Lee's homepage and scroll to the section on Contact and availability info.
Grading: Of most interest to us are productive, research-oriented discussion participation (in class and/or on the course discussion site), interesting research proposals and pilot studies, and a good-faith final research project.
Academic Integrity: Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.
Certain points deserve emphasis here.
In this class, talking to and helping others is strongly encouraged.
You may also, with attribution, use the code from other sources.
The easiest rule of thumb is: acknowledge the work and contributions and ideas and words and wordings of others.
Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers,
something you heard from a talk or a conversation or saw on the Internet,
or anything else, really, without acknowledging your sources.
See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.
This is not to say that you can receive course credit for work that is not your own, e.g., by taking someone else's report and putting your name at the top next to the other person(s)' names.
However, violations of academic integrity (e.g., fraud) are subject to the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of an assignment “only” risks grade penalties.
Overall course structure

Lecture #1
Agenda: Course overview.
Assignments: A1 released: a pilot empirical study for a research idea based on the given readings.

Lectures #2 - #6
Agenda: Lectures on topics related to the A1 readings.
Pedagogical purpose: Case studies to explore some topics and research styles we find interesting.
Assignments: Get-to-know-you exercises so that everyone becomes familiar and comfortable with each other.

Next block of meetings
Agenda: Discussion of proposed projects based on the readings.
Pedagogical purpose: Practice with fast research-idea generation; feedback as to which proposals are most interesting, most feasible, etc.
Assignments: Discussion of student project proposals, based on the readings for that class meeting. For each class meeting, everyone reads at least one of the two assigned papers and posts a new research proposal based on that reading to the course discussion site. Thoughtfulness and creativity are most important to us, but take feasibility into account.

Next block of meetings
Agenda: Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, and advanced language modeling.
Pedagogical purpose: Foundational material.
Assignments: Potentially some assignments based on the lectures.

Remainder of the course
Agenda: Activities related to course projects.
Pedagogical purpose: Development of a "full-blown" research project (although time restrictions may limit ambitions).
Assignments: For our purposes, "interesting" and "well-thought-out" matter more than "successful".
Resources
Cornell's Passkey for your web browser: "When you’re off-campus, connect to databases, journals and e-books that would otherwise be restricted or hidden behind paywalls through Passkey."
Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011. Motivating voter turnout by invoking the self. Proceedings of the National Academy of Sciences 108 (31): 12653-12656.
Response to followup: "What is an authentic replication attempt and what is not? Gerber et al.’s paper ... gives us the opportunity to reflect on this issue of longstanding concern to us." Bryan, Christopher J., Gregory M. Walton, and Carol S. Dweck, Oct 18, 2016. Psychologically authentic versus inauthentic replication attempts. Proceedings of the National Academy of Sciences 113(43): E6548.
Response: "Although we find Bryan et al.’s ... explanation unconvincing, this exchange is well-timed. The original findings have (to our knowledge) never been successfully replicated, and this November provides ample opportunity to test noun vs. verb in the political environment Bryan et al. ... suggest is ideal for producing 11–14 percentage-point effects." Gerber, Alan S., Gregory A. Huber, Daniel R. Biggers, and David J. Hendry, Oct 25, 2016. Reply to Bryan et al.: Variation in context unlikely explanation of nonrobustness of noun versus verb results. Proceedings of the National Academy of Sciences 113(43): E6549--E6550.
Kolbert, Elizabeth. 2017. Why Facts Don’t Change Our Minds: New discoveries about the human mind show the limitations of reason. The New Yorker, Books section. [publisher link] [highlighted link, viewable with Cornell NetID login]
Krohn, Rachel, and Tim Weninger. 2019. “Modelling Online Comment Threads from Their Start.” 2019 IEEE International Conference on Big Data (Big Data).
Sample oral argument transcript from the US Supreme Court. Quote: "I am attributing rationality to someone who was obviously not doing his job very well".
Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. WWW, pp. 699--708. [ACM link] [paper "homepage" (paper, slides, data, etc.)]
Sample oral argument transcript from the US Supreme Court. Quote: "I am attributing rationality to someone who was obviously not doing his job very well".
Beattie, Geoffrey W., Anne Cutler, and Mark Pearson. 1982. Why Is Mrs Thatcher Interrupted so Often? Nature 300 (December): 744--747. See also Bull and Mayer (1988).
Bunt, Harry, Volha Petukhova, David Traum, and Jan Alexandersson. 2017. Dialogue Act Annotation with the ISO 24617-2 Standard. In Multimodal Interaction with W3C Standards: Toward Natural User Interfaces to Everything, edited by Deborah A. Dahl, 109–35. Cham: Springer International Publishing.
ConvoKit implementation, based on prior code from Jack Hessel's implementation and Xanda Schofield's visualizer.
Hessel, Jack (who took this class!). FightingWords. In Python.
Lim, Kenneth (who took this class!). fightin-words. Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment) from Jack's version.
Marzagão, Thiago. mcq.py. "Because this script processes one file at a time, it can handle corpora that are too large to fit in memory".
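The Fighting Words method behind the implementations above compares word usage between two corpora via a z-scored log-odds ratio with a Dirichlet prior (Monroe, Colaresi, and Quinn 2008). Below is a from-scratch sketch, not the API of ConvoKit, FightingWords, fightin-words, or mcq.py; the function names, the whitespace tokenization, and the uniform pseudo-count `prior` (the original method can use an informative prior estimated from a background corpus) are all assumptions. In the spirit of the mcq.py note, counts are accumulated one file and one line at a time, so memory grows with vocabulary size rather than corpus size.

```python
import math
import tempfile
from collections import Counter
from pathlib import Path


def count_words(paths):
    """Accumulate lowercase whitespace-token counts, one file and one line at a time."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.lower().split())
    return counts


def fighting_words(counts_i, counts_j, prior=0.01):
    """z-scored log-odds ratio with a Dirichlet prior (after Monroe et al. 2008).

    Simplification (assumption): the same pseudo-count `prior` for every word.
    Positive z means the word is more associated with corpus i than corpus j.
    """
    vocab = set(counts_i) | set(counts_j)
    n_i, n_j = sum(counts_i.values()), sum(counts_j.values())
    alpha_0 = prior * len(vocab)
    z = {}
    for w in vocab:
        a = counts_i.get(w, 0) + prior  # smoothed count of w in corpus i
        b = counts_j.get(w, 0) + prior  # smoothed count of w in corpus j
        delta = math.log(a / (n_i + alpha_0 - a)) - math.log(b / (n_j + alpha_0 - b))
        variance = 1.0 / a + 1.0 / b  # approximate variance of delta
        z[w] = delta / math.sqrt(variance)
    return z


# Usage on two tiny hypothetical corpora, each stored as a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path_a, path_b = Path(tmp) / "a.txt", Path(tmp) / "b.txt"
    path_a.write_text("the tax cut helps families\nthe tax\n", encoding="utf-8")
    path_b.write_text("the war on terror\nthe war\n", encoding="utf-8")
    counts_a, counts_b = count_words([path_a]), count_words([path_b])

scores = fighting_words(counts_a, counts_b)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:10s} {scores[word]:+.3f}")
```

Dividing by the estimated standard deviation is what keeps rare words from dominating the ranking the way raw log-odds differences would.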
Prabhumoye, Shrimai, Samridhi Choudhary, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Penstein Rosé, and Alan W. Black. 2017. Linguistic markers of influence in informal interactions. In the Workshop on Natural Language Processing and Computational Social Science, 53--62.
A2 (proposals for final project) is due Wed Oct 20 11:59pm. Details forthcoming, but:
What to submit and what is allowed will be similar to the instructions for Fall 2017. For example, a concrete feasibility test will be required.
Posting preliminary ideas on Ed Discussions for earlier feedback is encouraged. This also facilitates grouping.
Lecture 14 (Oct 14) will be (mandatory) group/individual appointments with me to discuss possibilities. Exact schedule TBD. OK if you haven't posted any preliminary ideas at that point, but better to have done so.
Class images, links and handouts
Image by Peter Sipress. Licensed from the Cartoon Bank.
Studies applying some of the issues we've seen (such as multiple communities, unhealthy interactions, and information propagation) specifically to software engineering: work by Prem Devanbu, Vladimir Filkov, Bogdan Vasilescu, and colleagues, inter alia.
Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. “Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game.” In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1650–59. Beijing, China: Association for Computational Linguistics. https://doi.org/10.3115/v1/P15-1159.
Reminder: submit a request to the A3 appointment-booking site (for next Tuesday the 2nd) by Friday the 29th 11:59pm, following the A2, A3, A4 instructions.
Reminder: progress-report/current-results "presentation" due on Ed Discussions Thursday noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.
Class images, links and handouts
Cartoon by Tom Chitty. Licensed from CartoonStock.
Benotti, Luciana, and Patrick Blackburn. 2014. Conversational Implicatures. In Context in Computing: A Crossdisciplinary Approach for Modelling the Real World. Springer.
Grice, H. P. 1982. Logic and Conversation. In Speech Acts, edited by Peter Cole, 5. ed. Syntax and Semantics 3. New York: Academic Press.
Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21 (June): 203-225.
A theory said to account for the "wine on the table" example: structural preferences are subject > direct object > indirect object > other entities.
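A role ranking like the one just stated is the kind of ordering centering theory uses for an utterance's forward-looking centers, Cf. The sketch below assumes that entity mentions have already been extracted, resolved for coreference, and labeled with grammatical roles by hand; the role labels, function names, and two-sentence discourse are hypothetical (this is not the "wine on the table" example). It ranks Cf, takes the preferred center Cp as the top of Cf, and computes the backward-looking center Cb of an utterance as the highest-ranked element of the previous utterance's Cf that the current utterance realizes.

```python
# Ranking as stated above: subject > direct object > indirect object > other.
ROLE_RANK = {"subject": 0, "direct_object": 1, "indirect_object": 2, "other": 3}


def forward_centers(mentions):
    """Cf(U): entities of one utterance, ordered by their best-ranked grammatical role.

    `mentions` is a list of (entity, role) pairs; pronouns are assumed to be
    already replaced by the entity they refer to.
    """
    best = {}
    for entity, role in mentions:
        rank = ROLE_RANK.get(role, ROLE_RANK["other"])
        best[entity] = min(rank, best.get(entity, rank))
    return sorted(best, key=best.get)


def backward_center(prev_cf, mentions):
    """Cb(U_n): highest-ranked element of Cf(U_{n-1}) that is realized in U_n."""
    realized = {entity for entity, _ in mentions}
    return next((e for e in prev_cf if e in realized), None)


# Hypothetical discourse with hand-labeled roles:
# U1: "John gave Mary a book."   U2: "She read it on the train."
u1 = [("John", "subject"), ("Mary", "indirect_object"), ("book", "direct_object")]
u2 = [("Mary", "subject"), ("book", "direct_object"), ("train", "other")]

cf1 = forward_centers(u1)       # ['John', 'book', 'Mary']
cf2 = forward_centers(u2)       # ['Mary', 'book', 'train']
cb2 = backward_center(cf1, u2)  # 'book'
cp2 = cf2[0]                    # preferred center Cp(U2): 'Mary'
print(cf1, cf2, cb2, cp2)
```

Here Cb(U2) and Cp(U2) differ, which is the kind of configuration centering-based coherence measures treat as less smooth than one where the backward-looking center is also the top-ranked entity.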
#24 Nov 30: Intentions, attention, discourse structure
Assignments/announcements
Reminder: progress-report/current-results "presentation" due on Ed Discussions Thursday noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.
Class images, links and handouts
Left: Garry Kasparov, Maurice Ashley, Yasser Seirawan, and a bunch of soft drinks at the 1996 match against Deep Blue. Photo by Kenneth Thompson, provided at computerhistory.org.
Right: Maurice Ashley and Yasser Seirawan commentating on the 1997 rematch. Photo by Monroe Newborn, provided at computerhistory.org.
#25 Dec 2: (Mandatory) give-in-class-feedback-on-Ed-Discussions session
Assignments/announcements
Reminder: progress-report "presentation" due on Ed Discussions today at noon. And schedule an appointment time for your group with me here: https://6742-2021-a5appts.youcanbook.me/. See lecture 22 recording for instructions.
Instructions posted for the final writeup, due on CMS Thu Dec. 16, 7pm (date determined by the registrar).
Course grade factors have now been set as shown on CMS: A1 = 30%; A1R = 4%; A2 = 30%; A3 = 5%; A4 = 5%; A5 = 5%; final writeup = 21%.
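As a quick arithmetic check, the weights listed above sum to 100%. The sketch below additionally assumes, hypothetically, that each component is scored on a 0-100 scale and combined as a plain weighted average; the page does not say how component scores are combined or mapped to letter grades, and the student scores are invented.

```python
# Weights (percent of the course grade) as listed above.
WEIGHTS = {"A1": 30, "A1R": 4, "A2": 30, "A3": 5, "A4": 5, "A5": 5, "Final writeup": 21}
assert sum(WEIGHTS.values()) == 100


def weighted_total(scores):
    """Plain weighted average of hypothetical 0-100 component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: 90 on every component except 80 on the final writeup.
example = {name: 90 for name in WEIGHTS}
example["Final writeup"] = 80
print(weighted_total(example))  # 87.9
```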
#26 Dec 7: (Mandatory) appointments with me (each group makes one)
Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created through a collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.