More and more of life is now manifested online, and many of the digital traces left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include analysis of online conversations, learning social-network structure, analysis of text in political or legal domains, and review aggregation systems.


Prerequisites, enrollment, related classes

Prerequisites All of the following: CS 2110 or equivalent programming experience; a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); proficiency with using machine learning tools (e.g., fluency at training an SVM, comfort with assessing a classifier’s performance using cross-validation)

Enrollment Limited to [[PhD and [CS MS] students] who meet the prerequisites]. Auditing (either officially or unofficially) is not permitted.

Related classes: see Cornell's NLP course list, plus GOVT 6461, Public Opinion [the 2012 syllabus; time, location, some material, and paper coverage differ in fall 2017] and COMM 6750, Research methods for social networks and social media.

The homepage for the previous running of CS6742 may also be useful. Here is the list of all prior runnings: 2016 fall :: 2015 fall :: 2014 fall :: 2013 fall :: 2011 spring

Administrative info

CMS page http://cmsx.csuglab.cornell.edu. Site for submitting assignments, unless otherwise noted. You may find this graphically-oriented guide to common operations useful: see how to replace a prior submission (point 1), how to tell if CMS successfully received your files (point 2), how to form a group (point 4).

Course discussion site https://blogs.cornell.edu/nlpsoc2017fa (access restricted to enrolled students). Course announcements and Q&A/discussion site. Social interaction and all that, you know.

Office hours and contact info See Prof. Lee's homepage and scroll to the section on Contact and availability info.

Grading Of most interest is productive research-oriented discussion participation (in class and/or on the course discussion site), interesting research proposals and pilot studies, and a good-faith final research project.

Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

Certain points deserve emphasis here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use the code from other sources. The easiest rule of thumb is, acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own — e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names. However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment “only” risks grade penalties.

Overall course structure

| Lecture | Agenda | Pedagogical purpose | Assignments |
|---|---|---|---|
| #1 | Course overview | | A1 released: pilot empirical study for a research idea based on the given readings. |
| #2 - #4 | Lectures on topics related to the A1 readings | Case studies to explore some interesting topics and research styles. Get-to-know-you exercises to get everyone familiar and comfortable with each other. | |
| Next block of meetings | Discussion of proposed projects based on the readings | Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc. | Discussion of student project proposals, based on the readings for that class meeting. For each class meeting, everyone reads at least one of the two assigned papers and posts a new research proposal based on the reading to the course discussion site. Thoughtfulness and creativity are most important, but take feasibility into account. |
| Next block of meetings | Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling. | Foundational material | Potentially some assignments based on the lectures. |
| Remainder of the course | Activities related to course projects | Development of a "full-blown" research project (although time restrictions may limit ambitions). For the purposes of this course, "interesting" is more important than "thorough". | |


Lectures

#1 Aug 22: Introduction

Assignments/announcements

  • Assignment A1: Pilot empirical research study. Note the first deadline (of several) on Friday Aug. 25.

Class images, links and handouts

Lecture references

#2 Aug 24: A1 inspiration: Overview of conversations

Class images, links and handouts

Gespraechsgemetzel
Image: photo of entry 106 of Ben Schott, Schottenfreude: German Words for the Human Condition (2013)

Lecture references

Other references

#3 Aug 29: More A1 inspiration: discussion and persuasion

Assignments/announcements

  • First time in the new room (Gates 344 breakout room)

Class images, links and handouts

Wondermark cartoon
Image credit: David Malki !, In which Debate is debated, Feb 21st, 2014.

Lecture references

#4 Aug 31: Linguistic coordination

Assignments/announcements

  • Upcoming deadlines (default - 5pm unless otherwise noted): Friday Sept. 1, 2:30pm; Monday Sept 4

Class images, links and handouts

Lecture references

#5 Sep 5: Real-time measurement of coordination; A1 check-ins

Assignments/announcements

  • Life can be easier:
    • View discussion-site comments in reverse-chronological order by clicking on the speech balloon in the top bar
    • Cornell's Passkey for accessing restricted content in your browser. (So I will stop posting Cornell-access-specific URLs.)
  • Remember what we talked about sharing on the course discussion site!

References

#6 Sep 7: Appointments (see email for signup link)
#7 Sep 12: A1 presentations
#8 Sep 14: News, influence and information propagation, part 1

Assignments/announcements

Class images, links and handouts


Image source: David Malki ! Wondermark 1209: Talk and Awe

Lecture references

Other references

#9 Sep 19: News, influence and information propagation, part 2

Class images, links and handouts

  • ICWSM 2011 Spinn3r dataset: "386 million blog posts, news articles, classifieds, forum posts and social media content between January 13th and February 14th"

Lecture references

#10 Sep 21: Proposals discussion (A2)

Assignments/announcements

The readings

Class images, links and handouts


Image source: Dorothy Gambrell, Cat and Girl: Steal This Cat and Girl

Lecture references

Other references

#11 Sep 26: Words across space, community, and time

Class images, links and handouts

Lecture references

Other references

#12 Sep 28: Proposals discussion (A3)

Assignments/announcements

A5, the final-project proposal assignment, has been released. Note the multiple phases and due-dates.

The readings

Class images, links and handouts

Image source: English Language & Usage Stack Exchange. Click through for some interesting answers!

Lecture references (thanks to everyone for these pointers!)

Other references

#13 Oct 3: (Misc.) topics and power

Assignments/announcements

In-class reminder: A5, the final-project proposal assignment, has been released. Note the multiple phases and due-dates.

Class images, links and handouts

Lecture references

#14 Oct 5: Proposals discussion (A4)

Assignments/announcements

The readings

Lecture references

Other references

Oct 10: No class — Fall Break
#15 Oct 12: Optional project-proposal appointments

Assignments/announcements

#16 Oct 17: What makes two sub-languages different?

Class images, links and handouts

Image source: http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-never-tell-me-the-odds-6/.

Lecture references

#17 Oct 19: How different are two language models?

Assignments/announcements

  • Reminder: Phase 3 of A5 is due on Monday; sign up beforehand for, and attend, a mandatory feasibility-check appointment on Tuesday.

Class images, links and handouts

Lecture references

Other references

  • An example reference for deriving the maximum-likelihood estimate (and a little about Dirichlet priors) for the multinomial is the following slide set: Ronald Williams, CSG 200, Spring 2007 Maximum Likelihood vs. Bayesian Parameter Estimation. Some of the slides from there were taken with attribution from "apparently ... Nir Friedman", PGM: Tirgul 10, Parameter Learning and Priors, which goes into more depth and covers more topics.
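
As a small companion to the reference above, here is a minimal sketch of the two estimates it mentions: the maximum-likelihood estimate for a multinomial (relative frequencies) and the posterior mean under a Dirichlet prior (add-alpha smoothing). The function names, the symmetric prior, and the toy data are assumptions made for illustration; they are not taken from the slides or from the lecture.

```python
from collections import Counter


def multinomial_mle(tokens):
    """Maximum-likelihood estimate: p(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_posterior_mean(tokens, vocab, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior (an assumption):
    p(w) = (count(w) + alpha) / (N + alpha * |V|), with N counted over vocab."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


# Toy data, purely illustrative.
tokens = ["a", "a", "b", "c"]
vocab = ["a", "b", "c", "d"]
print(multinomial_mle(tokens))
print(dirichlet_posterior_mean(tokens, vocab, alpha=1.0))
```

Note that the MLE assigns zero probability to the unseen word "d", while the Dirichlet-smoothed estimate reserves some mass for it; this is one reason priors matter when comparing language models estimated from small samples.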
#18 Oct 24: Feasibility-check appointments

Assignments/announcements

  • Only come to class during your scheduled appointment; see Phase 3 of A5.
#19 Oct 26: Language modeling and differences between language models, cont.

Assignments/announcements

  • A5 "our week" commitment statements due tonight

Class images, links and handouts

Inspirational, thought-provoking image by Chenhao Tan (see handout for explanation):

Lecture references

Other references

#20 Oct 31: Models of local language structure: vocabulary space

Assignments/announcements

Class images, links and handouts

Lecture references

Other references

#21 Nov 2: Foreshadowing: some connections between information theory and psycholinguistics; the Brown clustering algorithm for deriving structure of vocabulary space

Lecture references

Other references

#22 Nov 7: Local structure: phrase and sentence space

Assignments/announcements

  • A6 (due Fri Nov. 10, 11:59pm): post your planned project schedule, from now until Dec 11th (the project due date), as a comment on your final-project posting
  • No lecture on November 14

Class images, links and handouts

This parrot is no more. It has ceased to be. It's expired and gone to meet its maker. This is a late parrot. It's a stiff. Bereft of life, it rests in peace. If you hadn't nailed it to the perch, it would be pushing up the daisies. It's rung down the curtain and joined the choir invisible. This is an ex-parrot. - Graham Chapman
Image source: AZ Quotes.

Lecture references

Other references

#23 Nov 9: Latent discourse/dialog structure

Class images, links and handouts


Left: Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks at the 1996 match against Deep Blue. Photo by Kenneth Thompson, provided at computerhistory.org
Right: Maurice Ashley and Yasser Seirawan commentating on the 1997 re-match. Photo by Monroe Newborn, provided at computerhistory.org

Lecture references

Other references

#24 Nov 14: No class
#25 Nov 16: Latent discourse/dialog structure, part two

Assignments/announcements

  • Project presentations after Thanksgiving Break

Class images, links and handouts


Clip source: hill35billy's YouTube channel; the movie is The Pink Panther Strikes Again. Start at 50s.

Lecture references

Other references

  • Section 24.1.5 of Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition (there does not seem to be an electronic version of the third edition's relevant chapter available) [chapter link at UCSC]
  • Grosz, Barbara J., Weinstein, Scott, and Joshi, Aravind K. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21 (June): 203-225. A theory said to account for the "wine on the table" example.
#26 Nov 21: Latent discourse/dialog structure, part three

Assignments/announcements

Class images, links and handouts


Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature

Lecture references

Other references

Nov 23: No class — Thanksgiving Break
#27 Nov 28: Project presentations (attendance by all is mandatory)

Assignments/announcements

  • A7 posted on CMS, due Mon Dec. 11, 4:30pm (date determined by the registrar). Submit both your presentation materials and your final project writeup; but don't spend time post-editing your presentation materials after the fact, as I will only be using them as a reference while evaluating your writeup.

    The main evaluation criteria will be the reasonableness (in approach and amount of effort), thoughtfulness, and creativity of what you tried, as documented in your writeup. Individual effort within team projects will be taken into account; see item 3 below.

    1. For the author heading, list only the names of your teammates that are enrolled in the class, even if you had external collaborators. (Reason: only students in the class are submitting the paper for a grade.) But see item 2bi below.
    2. Include the following sections:
      a. "Content" sections: abstract, introduction/motivation, data description (how you gathered, cleaned, and processed it), methods, experiments/results, related work, conclusions (what you learned), directions for future work, references
        • Make sure that your introduction section explicitly sets out your hypotheses/research questions.
        • Throughout, highlight your most interesting findings (positive or negative).
        • For the purposes of CS/IS 6742 submission, your related-work section does not need to be exhaustive; you may cover just a few of the most closely related papers.
      b. An "acknowledgments" section: give the name and state the contribution of each person from whom you received significant help. (This may or may not include your advisor(s), one or both of your instructors, or fellow students in the class.)
        i. Authorship statement: if you intend to ask, or have already arranged, to have people other than your CS6742-enrolled teammates as co-authors, also name each such person.
    3. Projects done collaboratively must also include a section describing who did what. External collaborators should be included in this enumeration.

Class images, links and handouts

References

#28 Nov 30: Project presentations (attendance by all is mandatory)

Assignments/announcements

Class images, links and handouts

Lecture references

Mon Dec. 11, 4:30pm: Final project writeup due

Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.