Prerequisites, enrollment, related classes
Prerequisites: All of the following: CS 2110 or equivalent programming experience;
a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning,
Cornell CS courses numbered 47xx or 67xx);
proficiency with using machine learning tools
(e.g., fluency at training an SVM, comfort with assessing a classifier’s performance using cross-validation)
Enrollment: Limited to [[PhD and [CS MS] students] who meet the prerequisites]. Auditing (either officially or unofficially) is not permitted.
CMS page: http://cmsx.csuglab.cornell.edu
Site for submitting assignments, unless otherwise noted.
You may find this graphically-oriented guide to common operations useful: see how to replace a prior submission (point 1), how to tell if CMS successfully received your files (point 2), how to form a group (point 4).
Course discussion site: https://blogs.cornell.edu/nlpsoc2017fa
(access restricted to enrolled students).
Course announcements and Q&A/discussion site.
Social interaction and all that, you know.
Office hours and contact info
See Prof. Lee's homepage and scroll to the section on Contact and availability info.
Grading: Of most interest is productive research-oriented discussion
participation (in class and/or on the course discussion site), interesting research proposals and pilot studies,
and a good-faith final research project.
Academic Integrity: Academic and scientific integrity compels one to properly attribute to
others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.
Certain points deserve emphasis here.
In this class, talking to and helping others is strongly encouraged.
You may also, with attribution, use the code from other sources.
The easiest rule of thumb is, acknowledge the work and contributions and ideas and words and wordings of others.
Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers,
something you heard from a talk or a conversation or saw on the Internet,
or anything else, really, without acknowledging your sources.
See "Acknowledging the Work of Others" in
The Essential Guide to Academic Integrity at Cornell
and
http://www.theuniversityfaculty.cornell.edu/AcadInteg/
for more information and useful examples.
This is not to say that you can receive course credit for work that is not your own —
e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names.
However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on
top of any grade penalties imposed,
whereas not following the rules of the assignment “only” risks grade penalties.
Overall course structure
Lecture
Agenda
Pedagogical purpose
Assignments
#1
Course overview
A1 released: pilot empirical study for a research idea based on the given readings.
#2 - #4
Lectures on topics related to the A1 readings
Case studies to explore some topics and research styles that may be of interest.
Get-to-know-you exercises to get everyone familiar and comfortable with each other.
Next block of meetings
Discussion of proposed projects based on the readings
Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc.
Discussion of student project proposals, based on the readings for that class meeting.
Each class meeting involves everyone reading at least one of the two assigned papers
and posting a new research proposal based on the reading to the course discussion site.
Thoughtfulness and creativity are most important, but take feasibility into account.
Next block of meetings
Lectures on, potentially, linguistic coordination, linguistic adaptation, influence,
persuasion, diffusion, discourse structure, advanced language modeling.
Foundational material
Potentially some assignments based on the lectures.
Remainder of the course
Activities related to course projects
Development of a "full-blown" research project (although time restrictions may limit ambitions).
For course purposes, "interesting" is more important than "thorough".
Resources
Cornell's Passkey
for your web browser: "If you find yourself on a web page that has access
restrictions, click on the bookmarklet icon and you will be redirected to
the Cornell Web log-in screen to check for your valid Cornell affiliation.
You will be automatically led to the page you were trying to read, this
time recognized for your right to gain access to the library's licensed
resources."
Upcoming conference deadlines:
NAACL 2018, long paper deadline Dec 15th, short paper deadline Jan 10 ::
ICWSM 2018: deadline not yet announced, expected early
Jan ::
ACL 2018: Feb 22 ::
CSCW 2018: second deadline during spring 2018 ::
SIGDIAL 2018: not yet announced ::
ACL wiki of resources —
corpora, datasets, tools, software, lexicons, organized by language
Books, surveys, and tutorials:
Dan Jurafsky and James Martin, 2009:
Speech and Language Processing:
An Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition
(3rd edition draft chapters and slides) ::
Jacob Eisenstein, 2017:
A Technical Introduction to Natural Language Processing
(book and slides) ::
Cristian Danescu-Niculescu-Mizil and Lillian Lee, 2016.
Natural
Language Processing for Computational Social Science. Invited Tutorial at NIPS. ::
Atefeh Farzindar and Diana Inkpen, 2015:
NLP for Social Media
(access
via Cornell, review
by Annie Louis) ::
Yoav Goldberg, 2017:
Neural Network Methods for Natural Language Processing
(access
via Cornell,
JAIR version) ::
Dong Nguyen, A. Seza Doğruöz, Carolyn P. Rosé and Franciska de Jong, 2016:
Computational Sociolinguistics: A Survey.
Computational Linguistics 42(3):537--593.
Bryan, Christopher J., Gregory M. Walton, Todd Rogers, and Carol S. Dweck. 2011.
Motivating voter turnout by invoking the self.
Proceedings of the National Academy of Sciences
108 (31): 12653-12656.
Chong, Dennis and James N. Druckman. 2007.
Framing theory.
Annual Review of Political Science
10:103–26.
Hopkins, Daniel J. 2017. The exaggerated life of death panels?
The limited but real influence of elite rhetoric in the 2009–2010
health care debate. Political Behavior.
[official link]
["ungated" version]
Related quote: "There is no such thing as conversation. There are intersecting monologues, that's all".
Rebecca West's short story, "There is no conversation".
Danescu-Niculescu-Mizil, Cristian, Lillian Lee, Bo Pang, and Jon Kleinberg.
2012.
Echoes of power: Language effects and power differences in social interaction.
WWW, pp. 699--708.
[ACM link]
[paper "homepage" (paper, slides, data, etc.)]
#5 Sep 5:
Real-time measurement of coordination; A1 check-ins
Assignments/announcements
Life can be easier:
View discussion-site comments in reverse-chronological order by
clicking on the speech balloon in the top bar
Cornell's
Passkey for accessing restricted content in your browser.
(So I will stop posting Cornell-access-specific URLs.)
Remember what we talked about sharing on the course discussion site!
References
Boyd-Graber, Jordan, David Mimno, and David Newman. 2014.
Care and
feeding of topic models: Problems, diagnostics, and improvements.
In Handbook of Mixed Membership Models and Their Applications
[author-posted version]
On the cosine measure being still vulnerable to length effects:
Notes by Lakshmi Ganesh and Navin Sivakumar from a lecture by
Lillian Lee on pivoted document length normalization, Spring 2010.
Original paper, stating that "cosine normalization tends to favor short
documents in retrieval": Singhal, Amit, Chris Buckley, and
Mandar Mitra. 1996. Pivoted document
length normalization. SIGIR, 21--29 [author-posted version]
See also Singhal, Amit, Gerard Salton,
Mandar Mitra and Chris
Buckley. 1996. Document length
normalization, Information Processing & Management
32(5):619–633. Special issue on the history of information science.
Further reading can be found in the reference list of Lecture 3 of
CS6740, spring 2010.
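To make the length-normalization discussion above concrete, here is a minimal sketch of the pivoted normalization factor from Singhal, Buckley, and Mitra (1996), which replaces a document's old normalization factor with (1 - slope) * pivot + slope * old_norm. The toy documents, the slope value, and the choice of the average norm as the pivot are illustrative assumptions, not settings taken from the lecture or the paper's experiments.

```python
import math


def cosine_norm(weights):
    """Euclidean length of a term-weight vector (the cosine normalization factor)."""
    return math.sqrt(sum(w * w for w in weights))


def pivoted_norm(old_norm, pivot, slope):
    """Pivoted normalization factor: (1 - slope) * pivot + slope * old_norm."""
    return (1.0 - slope) * pivot + slope * old_norm


# Hypothetical toy data: one short and one long document, all term weights 1.0.
docs = {
    "short": [1.0] * 2,
    "long": [1.0] * 50,
}

norms = {name: cosine_norm(weights) for name, weights in docs.items()}

# Assumption: use the average cosine norm over the collection as the pivot.
pivot = sum(norms.values()) / len(norms)
slope = 0.2  # illustrative value only

pivoted = {name: pivoted_norm(norm, pivot, slope) for name, norm in norms.items()}

for name in docs:
    print(f"{name}: cosine norm = {norms[name]:.3f}, pivoted norm = {pivoted[name]:.3f}")
```

With a slope below 1, documents whose norm is under the pivot are divided by a larger factor than before, and documents over the pivot by a smaller one, which counteracts the tendency of plain cosine normalization to favor short documents.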
#6 Sep 7: Appointments (see email for signup link)
#7 Sep 12: A1 presentations
#8 Sep 14: News, influence and information propagation, part 1
Assignments/announcements
Heads-up: final-project proposals due Fri Oct. 6 11:59pm
"Special Report With Brit Hume", September 10, 2008:
panel-discussion transcript
regarding Obama's "lipstick on a pig" utterance
#9 Sep 19: News, influence and information
propagation, part 2
Class images, links and handouts
ICWSM 2011 Spinn3r dataset:
"386 million blog posts, news articles,
classifieds, forum posts and social media content between January
13th and February 14th"
Prabhumoye, Shrimai, Samridhi Choudhary, Evangelia Spiliopoulou,
Christopher Bogart, Carolyn Penstein Rosé, and Alan W. Black. 2017.
Linguistic
markers of influence in informal interactions. In the Workshop on
Natural Language Processing and Computational Social Science,
53--62.
Past CS/IS 6742 projects on related topics that became publications:
Fu, Liye, Cristian Danescu-Niculescu-Mizil, and Lillian Lee.
2016. Tie-breaker: Using language models to quantify gender bias
in sports journalism. In IJCAI Workshop on NLP
Meets Journalism. Best paper award.
[paper homepage]
[writeup in the New York Times' Upshot section]
Yoder, Michael, Shruti Rijhwani, Carolyn Rosé, and Lori Levin.
2017. Code-Switching as a social act: The case of Arabic
Wikipedia talk pages. In the Second
Workshop on NLP and
Computational Social Science, 73-82. In the interests of
time, you can skip section 4.2 ("Language Identification") and
all of section 6.1 ("CS Type") except for the
paragraph about the "challenge" type.
Weber, Max. 1922 (translation date: 1978). Class, Status, Party. In Economy
and Society. Berkeley: University of California Press.
[an abridged version] "In general, we understand by
'power' the chance of a man or of a number of men to realize their own will in a
communal action even against the resistance of others who are participating in the action."
Lim, Kenneth (who took this class!).
fightin-words 1.0.4.
Compliant with scikit-learn and distributed via PyPI; borrows (with acknowledgment)
from Jack's version.
An example reference for deriving the maximum-likelihood estimate (and a little about Dirichlet priors) for the multinomial is the following slide set: Ronald Williams, CSG 200, Spring 2007 Maximum Likelihood vs. Bayesian Parameter Estimation. Some of the slides from there were taken with attribution from "apparently ... Nir Friedman", PGM: Tirgul 10, Parameter Learning and Priors, which goes into more depth and covers more topics.
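As a small companion to the slide sets above, the following sketch computes two standard estimates of multinomial parameters from counts: the maximum-likelihood estimate n_i / N, and the posterior mean under a symmetric Dirichlet(alpha) prior, (n_i + alpha) / (N + alpha * V). The token sequence, vocabulary, and alpha value are made up for illustration, and the function names are hypothetical, not taken from any of the referenced materials.

```python
from collections import Counter


def multinomial_mle(counts):
    """Maximum-likelihood estimate: theta_i = n_i / N."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def dirichlet_posterior_mean(counts, vocab, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior:
    theta_i = (n_i + alpha) / (N + alpha * V), where V = len(vocab)."""
    total = sum(counts.values())
    denom = total + alpha * len(vocab)
    return {word: (counts.get(word, 0) + alpha) / denom for word in vocab}


# Hypothetical toy data: six tokens over a four-word vocabulary ("d" is unseen).
tokens = "a a b a c b".split()
vocab = ["a", "b", "c", "d"]
counts = Counter(tokens)

mle = multinomial_mle(counts)
smoothed = dirichlet_posterior_mean(counts, vocab, alpha=1.0)

print("MLE:", mle)
print("Dirichlet posterior mean:", smoothed)
```

The MLE assigns no probability mass to the unseen word "d", whereas the Dirichlet prior acts like alpha pseudo-counts per vocabulary item, so every word receives nonzero probability.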
#18 Oct 24: Feasibility-check appointments
Assignments/announcements
Only come to class during your scheduled appointment; see
Phase 3 of A5.
#19 Oct 26: Language modeling and differences between language models, cont.
#21 Nov 2: Foreshadowing: some connections between information theory and psycholinguistics; the Brown clustering algorithm for deriving structure of vocabulary space
Coincidence: today's news (10:53am) about Robert Mercer:
Bloomberg
Lecture references
Brown, Peter F., Vincent J. Della Pietra, Peter V. DeSouza, Jennifer C. Lai, and Robert L. Mercer. 1992.
Class-based n-gram models of natural language.
Computational Linguistics 18(4): 467-479.
#22 Nov 7: Local structure: phrase and sentence space
Assignments/announcements
A6: (due Fri Nov. 10, 11:59pm): post as a comment to your final project posting your planned project schedule from now until Dec 11th (the project due date)
Knight, Kevin. 1999.
A statistical MT Tutorial Workbook.
"The basic text that this tutorial relies on is Brown et al., “The Mathematics of Statistical Machine Translation”, Computational Linguistics, 1993.
On top of this excellent presentation, I can only add some perspective and perhaps some sympathy for the poor reader, who has (after all) done nothing wrong."
Left: Garry Kasparov, Maurice Ashley, Yasser Seirawan and a bunch of soft drinks at the 1996 match against Deep Blue.
Photo by Kenneth Thompson,
provided at computerhistory.org
Right: Maurice Ashley and Yasser Seirawan commentating on the 1997 re-match. Photo by Monroe Newborn, provided at
computerhistory.org
Section 24.1.5 of Jurafsky, Daniel and James H. Martin. 2009.
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition
(there does not seem to be an electronic version of the third edition's relevant chapter available)
[chapter link at UCSC]
#26 Nov 21: Latent discourse/dialog structure, part three
Class images, links and handouts
Pinker, Steven and the Royal Society for the Encouragement of Arts, Manufactures and Commerce (RSA) Animate, posted to YouTube on Feb 10, 2011. Language as a Window into Human Nature
Section 27.4.2 of Jurafsky, Daniel and James H. Martin. 2009.
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition
(there does not seem to be an electronic version of the third edition's relevant chapter available)
[chapter link at UCSC]
#27 Nov 28: Project presentations (attendance by all is mandatory)
Assignments/announcements
A7 posted on CMS, due Mon Dec. 11, 4:30pm (date determined by the registrar).
Submit both your presentation materials and your final project writeup; but don't
spend time post-editing your presentation materials after the fact,
as I will only be using them as a reference while evaluating your
writeup.
The main evaluation criteria will be the reasonableness (in approach and amount of effort), thoughtfulness, and creativity of what you tried, as documented in your writeup. Individual effort within team projects will be taken into account; see item 3 below.
For the author heading, list only the names of your teammates that are
enrolled in the class, even if you had external collaborators.
(Reason: only students in the class are submitting the paper for a grade.)
But see item 2bi below.
Include the following sections:
"content" sections: abstract, introduction/motivation, data description (how you gathered, cleaned, and processed it), methods, experiments/results, related work, conclusions (what you learned), directions for future work, references
Make sure that your introduction section explicitly sets out your hypotheses/research questions.
Throughout, highlight your most interesting findings (positive or negative).
For the purposes of CS/IS 6742 submission, your related-work section does not need to be exhaustive; you may cover just a few most-related papers.
An "acknowledgments" section: name those from whom you received significant help, and state what each contributed. (This may or may not include your advisor(s), one or both of your instructors, or fellow students in the class.)
Authorship statement: if you intend to ask, or have already arranged, to have people other than your CS6742-enrolled teammates as co-authors, also name each such person.
Projects done collaboratively must also include a section describing who did what. External collaborators should be included in this enumeration.
#28 Nov 30: Project presentations (attendance by all is mandatory)
Mon Dec. 11, 4:30pm: Final project writeup due
Code for generating the calendar formatting
adapted from Andrew Myers. Portions of the content of this
website and course were created by collaboration between Cristian Danescu-Niculescu-Mizil and Lillian Lee over multiple
runnings of this course.