Natural Language Processing and Social Interaction, Fall 2021

More and more of life is now manifested online, and many of the digital traces that are left by human activity are increasingly recorded in natural-language format. This research-oriented course examines the opportunities for natural language processing to contribute to the analysis and facilitation of socially embedded processes. Possible topics include conversation modeling, analysis of group and sub-group language, language and social relations, persuasion and other causal effects of language.

See the sections below for information about enrollment/prerequisite policies, administrative info, overall course structure, resources, and so on.

Enrollment, prerequisites, related classes

Enrollment: Limited to PhD students and CS MS students who meet the prerequisites; PhD students not in CS/INFO will receive manual instructor permission to enroll (details to be arranged at lecture). Auditing (either officially or unofficially) is not permitted. These policies are intended to keep class meetings heavily focused on discussion and group research.

Prerequisites: All of the following: (1) CS 2110 or equivalent programming experience; (2) a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning, Cornell CS courses numbered 47xx or 67xx); (3) proficiency in using machine learning tools (e.g., fluency at training an SVM or other classifier, comfort with assessing a classifier’s performance using cross-validation).
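As a rough self-check for prerequisite (3), here is a minimal, stdlib-only Python sketch of k-fold cross-validation. It is an illustration, not course material: a simple nearest-centroid classifier stands in for an SVM so the example runs without third-party libraries, and the function names, the toy two-cluster data, and the choice of k=5 are all hypothetical.

```python
import random
from statistics import mean

def fit_centroids(X, y):
    """Hypothetical stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def k_fold_accuracy(X, y, k=5, seed=0):
    """Return one accuracy score per fold; each fold is held out exactly once."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)       # shuffle before splitting into folds
    folds = [indices[i::k] for i in range(k)]  # k disjoint, roughly equal-sized folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]  # train on the other k-1 folds
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores

# Toy data (hypothetical): two well-separated 2-D Gaussian clusters, 50 points each.
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

scores = k_fold_accuracy(X, y, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean(scores))
```

In practice one would typically use a library such as scikit-learn for both the classifier and the cross-validation loop; the point here is only the structure: every example is held out exactly once, and the model that scores it never saw it during training.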

Related classes: see Cornell's NLP course list. Also GOVT 3294 Post-Truth Politics, COMM 6750 Research Methods for Social Networks and Social Media, and COMM 6770 Attitudes and Social Judgment.

All prior runnings of CS/INFO 6742: 2019 fall :: 2018 fall :: 2017 fall :: 2016 fall :: 2015 fall :: 2014 fall :: 2013 fall :: 2011 spring

Administrative info

CMS: https://cmsx.cs.cornell.edu. Site for submitting assignments, unless otherwise noted. Log in with your NetID credentials and select CS 6742. You may find this graphically oriented guide to common operations useful: see how to replace a prior submission, how to tell whether CMS successfully received your files, and how to form a group.

Course discussion site: https://edstem.org/us/courses/8208/discussion (access restricted to enrolled students). Course announcements and Q&A/discussion site. Social interaction and all that, you know.

Office hours and contact info: See Prof. Lee's homepage and scroll to the section on Contact and availability info.

Grading: What matters most is productive, research-oriented participation in discussion (in class and/or on the course discussion site), interesting research proposals and pilot studies, and a good-faith final research project.

Academic Integrity: Academic and scientific integrity compels one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.

Certain points deserve emphasis here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use code from other sources. The easiest rule of thumb is: acknowledge the work and contributions and ideas and words and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard in a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See "Acknowledging the Work of Others" in The Essential Guide to Academic Integrity at Cornell and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.

This is not to say that you can receive course credit for work that is not your own (e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names). However, violations of academic integrity (e.g., fraud) are subject to the academic-integrity hearing process in addition to any grade penalties imposed, whereas not following the rules of an assignment “only” risks grade penalties.

Overall course structure

Lecture	Agenda	Pedagogical purpose	Assignments
#1	Course overview		A1 released: pilot empirical study for a research idea based on the given readings.
#2 - #6	Lectures on topics related to the A1 readings	Case studies exploring some topics and research styles of particular interest. Get-to-know-you exercises to help everyone become familiar and comfortable with each other.
Next block of meetings	Discussion of proposed projects based on the readings	Practice with fast research-idea generation. Feedback on which proposals are most interesting, most feasible, etc.	Discussion of student project proposals, based on the readings for that class meeting. For each class meeting, everyone reads at least one of the two assigned papers and posts a new research proposal based on that reading to the course discussion site. Thoughtfulness and creativity matter most, but take feasibility into account.
Next block of meetings	Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling.	Foundational material	Potentially some assignments based on the lectures.
Remainder of the course	Activities related to course projects	Development of a "full-blown" research project (although time constraints may limit ambitions). For the purposes of this course, "interesting" and "well-thought-out" are more important than "successful".

Resources

Cornell's Passkey for your web browser: "When you’re off-campus, connect to databases, journals and e-books that would otherwise be restricted or hidden behind paywalls through Passkey."
Upcoming conference deadlines: ICWSM 2022: Sep 15 2021 or Jan 15 2022 :: The Web Conference (formerly WWW): Oct 14 2021 (abstract), Oct 21 2021 (full paper) :: ACL 2022: Nov 15 2021 :: NAACL 2022: Jan 15 to ARR :: CSCW 2022: Jan 15 2022 :: SIGDIAL 2022: not yet announced
Paper repositories: Papers With Code :: All ACL conferences, journals, workshops proceedings :: All WWW proceedings :: All CSCW proceedings :: All ICWSM proceedings
ACL wiki of resources: corpora, datasets, tools, software, lexicons, organized by language
ConvoKit: Cornell Conversational Analysis Toolkit. Includes both tools and conversational datasets.
Books, surveys, and tutorials: Dan Jurafsky and James Martin, 2009: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd edition draft chapters and slides) :: Jacob Eisenstein, 2017: A Technical Introduction to Natural Language Processing (book and slides) :: Dirk Hovy, 2020: Text Analysis in Python for Social Scientists (Cornell access) :: Yoav Goldberg, 2017: Neural Network Methods for Natural Language Processing (access via Cornell, JAIR version) :: Cristian Danescu-Niculescu-Mizil and Lillian Lee, 2016. Natural Language Processing for Computational Social Science. Invited Tutorial at NeurIPS. :: Atefeh Farzindar and Diana Inkpen, 2015: NLP for Social Media (access via Cornell, review by Annie Louis) :: Dong Nguyen, A. Seza Doğruöz, Carolyn P. Rosé and Franciska de Jong, 2016: Computational Sociolinguistics: A Survey. Computational Linguistics 42(3):537--593. :: Dirk Hovy and Diyi Yang, 2021: The Importance of Modeling Social Factors of Language: Theory and Practice. NAACL 588--602.
Toolkits, alphabetically: CMU twitter tools (Java) :: ConvoKit (Python) :: CRAN NLP tools (R) :: GATE (Java) :: Gensim (Python) :: Illinois tools (Java?) :: LingPipe (Java) :: Mallet (Java) :: NLTK (Python) :: OpenNLP (Java) :: spaCy (Cython) :: Stanford tools (Java) :: VADER (Valence Aware Dictionary and sEntiment Reasoner) (Python)
Pretrained word/sentence embeddings: a list by Sepehr Sameni
NLP at Cornell

Code for generating the calendar formatting adapted from Andrew Myers. Portions of the content of this website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple runnings of this course.

Lectures