More and more of life is now manifested online, and many of the digital traces left by human activity are recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include analysis of online conversations, learning social-network structure, analysis of text in political or legal domains, and review aggregation systems. CDNM's web page

Prerequisites, enrollment, related classes

Prerequisites All of the following: CS 2110 or equivalent programming experience (Python encouraged); a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); proficiency with using machine learning tools (e.g., fluency at training an SVM, comfort with assessing a classifier’s performance using cross-validation)

Enrollment Limited to [[PhD and [CS MS] students] who meet the prerequisites]. If you are interested in taking the class but do not belong to these categories, come to the first day of class, when enrollment will be discussed. Auditing (either officially or unofficially) is not permitted.

Masking Masks are encouraged but not required in classrooms for Fall 22, according to university policy. I would like to request that the class be masked and I will provide masks for you. I understand that I cannot require you to wear a mask, but I would be grateful if you did.

Related classes: see Cornell's NLP course list

The homepage for the previous running of CS6742 may also be useful. Here is the list of all prior runnings: 2021 fall :: 2019 fall :: 2018 fall :: 2017 fall :: 2016 fall :: 2015 fall :: 2014 fall :: 2013 fall :: 2011 spring

Administrative info and overall course structure

Course homepage Main site for course info, assignments, readings, lecture references, etc.; updated frequently.

CMS page Site for submitting assignments, unless otherwise noted.

Piazza page Course announcements and Q&A/discussion site. Social interaction and all that, you know. (Access code provided on first day of classes.)

Contacting the instructor

Overview of course schedule. Details subject to change. Full schedule is maintained on the main course webpage.

Lecture Agenda Pedagogical purpose Assignments

Course overview


A1: Pilot empirical study for a research idea based on provided datasets and readings.

#2 - #3

Get-to-know-you exercises to get everyone familiar and comfortable with each other. A1 related discussions.

How to form research questions and quickly test their feasibility.  
# 4 - #7

Lecture topics related to the A1 startup projects: Conversational Structure, Linguistic Cues, Conversation-specific Phenomena.

Case studies to explore some topics and research styles I find interesting.

Next block of meetings

Discussion of proposed projects based on starter projects and on topical readings

Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc.

Discussion of student project proposals, based on the readings for that class meeting. Each class meeting thus involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to Piazza.

Thoughtfulness and creativity are most important to me, but take feasibility into account.

Next block of meetings

Lectures on, potentially, linguistic socialization, conversational failure, moderation, influence, persuasion, diffusion, discourse structure, advanced language modeling

Familiarity with foundational material: concepts and methodology.

Potentially some assignments based on the lectures.

Remainder of the course

Activities related to course projects

Development of a "full-blown" research project (although time restrictions may limit ambitions). For our purposes, "interesting" is more important than "thorough".


Some time in December (to be determined by the registrar): final project writeup due

Grading Of most interest to me are productive research-oriented discussion participation (in class and on Piazza), interesting research proposals and pilot studies, and a good-faith final research project.

Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

We emphasize certain points here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use the code from other sources. The easiest rule of thumb is, acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See and for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own, e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names. However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment only risks grade penalties.




#1 Aug 23: Course overview: scope, course goals, course design
  • Details will appear here before each lecture.
  • Assignment A1 released (updates on Piazza)
  • Student-information assignment released on Piazza

Class images, links and handouts



#2 Aug 25: ConvoKit tutorial for A1 (Jonathan P. Chang)
#3 Aug 30: Discussion of A1
#4 Sep 1: From monologues to conversations

Class images, links and handouts

#5 Sep 6: Conversational structure

Class images, links and handouts

#6 Sep 8: Conversational language. Case study: from hypothesis to research (Coordination)
Class images, links and handouts


#7 Sep 13: Social aspects of linguistic coordination


  • Upcoming deadlines: A1 Part D due today, Part E and presentations next week

Class images, links and handouts

Lecture references

#8 Sept 15: From hypothesis to research: Second case study (Socialization)

Class images, links and handouts

Lecture references

  • Nguyen, Dong, A. Seza Doğruöz, Carolyn P. Rosé, and Franciska de Jong. 2016. Computational Sociolinguistics: A Survey.
#9 Sept 20: (Breaking) conversation rules


    A1 presentations due on Thursday

    Class images, links and handouts


    #10-11 Sep 22, Sept 27: Discussions of starter projects (A1) based on explorations of conversational datasets.
    #12-14 Sept 29, Oct 4, Oct 6: Project-inspiring discussions based on readings (A2): Moderation, conversational trajectories, persuasion, debates, narratives, polarization, framing, education, conversational dynamics, user roles.


    • Lambert et al. 2022. Conversational Resilience
    • Luu et al. 2019. Measuring Online Debaters’ Persuasive Skill from Text over Time
    • Antoniak et al. 2019. Narrative Paths and Negotiation of Power in Birth Stories.
    • Demszky et al. 2019. Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings.
    • Alic et al. 2022. Computationally Identifying Funneling and Focusing Questions in Classroom Discourse
    • Yang et al. 2019. Seekers, Providers, Welcomers, and Storytellers: Modeling Social Roles in Online Health Communities
    • Links and additional related references on Piazza
    #15 Oct 14: Project proposal discussion and team-making.
    #16 Oct 18: What makes two sub-languages different?

    Class images, links and handouts


    Lecture references

    #17 Oct 20: N-Gram Language Models
    Class images, links and handouts

    Lecture references

    #18,19 Oct 25,27: Mandatory feasibility check-ins

    Assignments/announcements

    #20 Nov 1: Entropy, Cross-entropy and Divergence
    Class images, links and handouts

    Lecture references

    #21 Nov 3: Language models in practice, Confounds and Controls
    #22,23 Nov 8,10: Project check-ins


    #24,25,26 Nov 15,17,22: Causality and Quasi-experimental Designs in the Conversational Domain

    Code for generating the calendar formatting adapted from the original versions created by Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.