More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include analysis of online conversations, learning social-network structure, analysis of text in political or legal domains, and review aggregation systems. CDNM's web page

Prerequisites, enrollment, related classes

Prerequisites All of the following: CS 2110 or equivalent programming experience; a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); proficiency with using machine learning tools (e.g., fluency in training an SVM, comfort with assessing a classifier’s performance using cross-validation).

Enrollment Limited to [[PhD and [CS MS] students] who meet the prerequisites]. Auditing (either officially or unofficially) is not permitted.

Related classes: see Cornell's NLP course list, plus INFO 6750 Causal Inference and Design of Experiments, and INFO 6310 Behavior and Information Technology.

The homepage for the previous running of CS6742 may also be useful. Here is the list of all prior runnings: 2017 fall :: 2016 fall :: 2015 fall :: 2014 fall :: 2013 fall :: 2011 spring

Administrative info and overall course structure

Course homepage http://www.cs.cornell.edu/courses/cs6742/2018fa. Main site for course info, assignments, readings, lecture references, etc.; updated frequently.

CMS page http://cms.csuglab.cornell.edu. Site for submitting assignments, unless otherwise noted.

Piazza page http://piazza.com/cornell/Fall2018/cs6742 Course announcements and Q&A/discussion site. Social interaction and all that, you know. (Access code provided on first day of classes.)

Contacting the instructor

Overview of course schedule. Details subject to change. Full schedule is maintained on the main course webpage.

Lecture Agenda Pedagogical purpose Assignments
#1

Course overview

 

Pilot empirical study for a research idea based on readings provided.

#2 - #3

A1 Brainstorming (Prof. DNM out)

   
#4 - #7

Lecture topics related to the A1 readings: Online reviews: individual expression, community dynamics; Online asynchronous conversations.

Case studies to explore some interesting topics and research styles. Get-to-know-you exercises to get everyone familiar and comfortable with each other.

 
Next block of meetings

Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling

Foundational material

Potentially some assignments based on the lectures.

Next block of meetings

Discussion of proposed projects based on the readings

Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc.

Discussion of student project proposals, based on the readings for that class meeting. Each class meeting thus involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to Piazza.

Thoughtfulness and creativity are most important, but take feasibility into account.

Remainder of the course

Activities related to course projects

Development of a "full-blown" research project (although time restrictions may limit ambitions). For our purposes, "interesting" is more important than "thorough".

 

Some time in December (to be determined by the registrar): final project writeup due

Grading Of most interest are productive research-oriented discussion participation (in class and on Piazza), interesting research proposals and pilot studies, and a good-faith final research project.

Academic Integrity Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

We emphasize certain points here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use code from other sources. The easiest rule of thumb is: acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard from a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See http://www.cs.cornell.edu/courses/cs6742/2011sp/handouts/ack-others.pdf and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own (e.g., by taking someone else's report and putting your name at the top, next to the other person(s)' names). However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment only risks grade penalties.

Resources

 

Lectures

#1 Aug 23: Course overview: scope, course goals, course design
  • Details will appear here before each lecture.
Assignments/announcements:
  • Assignment A1 released
  • Student-information assignment released: see handout

Class images, links and handouts

Datasets

References

#2 Aug 28: No Lecture: Prof. DNM out
#3 Aug 30: A1 Brainstorming with Jonathan P. Chang
#4 Sep 4: Types and properties of conversations

Class images, links and handouts

#5 Sep 6: A1 check-ins, Instrumentation, Conversational Structure
#6 Sep 11: From monologues to conversations; Case study: from hypothesis to research (Coordination)
Class images, links and handouts

References

#7 Sep 13: Social aspects of coordination; Second case study (Socialization)

Assignments/announcements

  • Upcoming deadlines: A1 writeup and presentations due next week

Class images, links and handouts

Lecture references

#8 Sep 18: No Lecture: Prof. DNM out
#9 Sep 20: A1 presentations (fun! fun! fun!)
#10 Sep 25: From hypothesis to research: Second case study (Socialization)

Assignments/announcements

Class images, links and handouts

Lecture references

#11 Sep 27: News, influence and information propagation, part 1

Assignments/announcements

Class images, links and handouts

Lecture references

Other references

#12 Oct 2: Proposals discussion (A2)

Assignments/announcements

The readings

#13 Oct 4: News, influence and information propagation, part 2

Assignments/announcements

Lecture references

Other references

#14 Oct 11: Proposals discussion (A3)

Assignments/announcements

A5, the final-project proposal assignment, has been released. Note the multiple phases and due dates.

The readings

#15 Oct 16: Proposals discussion (A4)

Assignments/announcements

The readings

#16 Oct 18: (Breaking) conversation rules

Assignments/announcements

Class images, links and handouts

References

#17 Oct 25: Advanced yet “off-the-shelf” features roundup
Assignments/announcements

Class images, links and handouts

References

#18 Oct 30: What makes two sub-languages different?

Class images, links and handouts

Image source: http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-never-tell-me-the-odds-6/.

Lecture references

Nov 6, Nov 8: No class — CSCW
#21 Nov 13: N-Gram Language Models

Final project writeup due Thursday Dec. 13, 4:30pm (date determined by the registrar). Submit both your presentation materials and your final project writeup; but don't spend time post-editing your presentation materials after the fact, as I will only be using them as a reference while evaluating your writeup.

The main evaluation criteria will be the reasonableness (in approach and amount of effort), thoughtfulness, and creativity of what you tried, as documented in your writeup. Individual effort within team projects will be taken into account; see item 3 below.

  1. Use the ICWSM style files provided by AAAI (LaTeX style and bib files, Word template).
    1. We make this requirement to facilitate submission to ICWSM 2019. However, note that your final-project submission should have your names and acknowledgments included, in a particular format (see items 1c and 2b below); in contrast, you will want to strip any identifying information for ICWSM submissions.
    2. AAAI prefers non-numbered section headings. You may change the style files to include section numbers in your headings for the purposes of CS6742 submission.
    3. For the author heading, list only the names of your teammates that are enrolled in the class, even if you had external collaborators. (Reason: only students in the class are submitting the paper for a grade.) But see item 2b1 below.
  2. Include the following sections:
    1. "content" sections: abstract, introduction/motivation (broad question), data description (how you gathered, cleaned, and processed it), methods (discuss operationalization of the high level question), experiments (highlight which controls you've done and why they are needed), related work, references, conclusions (what you learned), directions for future work.
      • Make sure that your introduction section explicitly sets out your hypothesis or hypotheses.
      • Throughout, highlight your most interesting findings (positive or negative).
      • For the purposes of CS6742 submission, your related-work section does not need to be exhaustive; you may cover just a few most-related papers.
    2. An "acknowledgments" section: give the name and state the contribution of each person from whom you received significant help. (This may or may not include your advisor(s), your instructor, or fellow students in the class.)
      1. Authorship statement: if you intend to ask or have already arranged to have people other than your CS6742-enrolled teammates as co-authors, also name each such person.
  3. Projects done collaboratively must also include a section describing who did what. External collaborators should be included in this enumeration.
  4. Use the number of pages you feel is appropriate.
Class images, links and handouts

Lecture references

#22 Nov 15: Mandatory project progress-and-problems appointments
By 2pm on Wednesday afternoon, post a Piazza followup to your proposal that summarizes your progress and the discussion points or problems you'd like to bring up with me. Ideally, this followup post will be the agenda for your team's appointment, and will make the meeting efficient and useful for you.
#23 Nov 20: Entropy
Assignments/announcements
Class images, links and handouts

Lecture references

Nov 22: No class — Thanksgiving Break
#24 Nov 27: Cross-entropy and divergence, Language models in practice
#25 Nov 28: Project presentations (attendance by all is mandatory)

Assignments/announcements

Aim for 10-15 minute presentations (include results, challenges and questions). Class participation is important.
#26 Dec 4: Final lecture

Code for generating the calendar formatting adapted from the original versions created by Andrew Myers