Prerequisites, course selection, enrollment
Prerequisites (all of the following): CS 2110 or equivalent programming experience; a course in artificial intelligence or any relevant subfield (e.g., NLP, information retrieval, machine learning); proficiency with machine learning tools (e.g., fluency in training an SVM, knowledge of how to assess a classifier’s performance using cross-validation).
Enrollment: CS/IS PhD students may enroll online. Other students interested in adding the course are welcome to come to the first day of class. Enrollment questions will be addressed then, when we have a better sense of what the demand is and how many CS/IS PhD students are interested in taking the class.
Choosing among NLP courses: How do I know which one is right for me?
In 2016-2017, we are blessed with a plethora of NLP-related offerings!
Overview of course schedule. Details subject to change. Full schedule is maintained on the main course webpage.
Pilot empirical study for a research idea based on readings provided.
#2 - #4
Lecture topics related to the A1 readings: Online reviews: individual expression, community dynamics; Online asynchronous conversations.
Case studies to explore some topics and research styles I find interesting. Get-to-know-you exercises to get everyone familiar and comfortable with each other.
Next 6 meetings, not counting presentations or discussions
Lectures on, potentially, linguistic coordination, linguistic adaptation, influence, persuasion, diffusion, discourse structure, advanced language modeling
Potentially some assignments based on the lectures.
Next large block of meetings
Discussion of proposed projects based on the readings
Practice with fast research-idea generation. Feedback as to what proposals are most interesting, most feasible, etc.
Discussion of student project proposals, based on the readings for that class meeting. Each class meeting thus involves everyone reading at least one of the two assigned papers and posting a new research proposal based on the reading to Piazza.
Thoughtfulness and creativity are most important to me, but take feasibility into account.
Remainder of the course
Activities related to course projects
Development of a "full-blown" research project (although time restrictions may limit ambitions). For our purposes, "interesting" is more important than "thorough".
Some time in December (to be determined by the registrar): final project writeup due
Grading: Of most interest to me are productive research-oriented discussion participation (in class and on Piazza), interesting research proposals and pilot studies, and a good-faith final research project.
Academic Integrity: Academic and scientific integrity compel one to properly attribute to others any work, ideas, or phrasing that one did not create oneself. To do otherwise is fraud.
We emphasize certain points here. In this class, talking to and helping others is strongly encouraged. You may also, with attribution, use code from other sources. The easiest rule of thumb is: acknowledge the work, contributions, ideas, words, and wordings of others. Do not copy or slightly reword portions of papers, Wikipedia articles, textbooks, other students' work, Stack Overflow answers, something you heard in a talk or a conversation or saw on the Internet, or anything else, really, without acknowledging your sources. See http://www.cs.cornell.edu/courses/cs6742/2011sp/handouts/ack-others.pdf and http://www.theuniversityfaculty.cornell.edu/AcadInteg/ for more information and useful examples.
This is not to say that you can receive course credit for work that is not your own (e.g., taking someone else's report and putting your name at the top, next to the other person(s)' names). However, violations of academic integrity (e.g., fraud) undergo the academic-integrity hearing process on top of any grade penalties imposed, whereas not following the rules of the assignment only risks grade penalties.
ACL anthology of all conferences, journals and workshops published under the aegis of the Association for Computational Linguistics; ACM digital library proceedings publication archive for WWW; AAAI proceedings archive for ICWSM
Taraborelli, Dario and Giovanni Luca Ciampaglia. Beyond notability. Collective deliberation on content inclusion in Wikipedia. Second international workshop on quality in techno-social systems, pp. 122-125. [alt link]
Section 24.7.2 of Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. [chapter link at UCSC]
Sep 22: Case study: from hypothesis to research
Flesch, Rudolf. June 1948. A new readability yardstick. Journal of Applied Psychology 32(3): 221-33. [Alternative link: the paper is bundled in the collection The Classic Readability Studies, ed. William H. DuBay. Published as Unlocking Language: The Classic Studies in Readability, BookSurge Publishing, 2007.]
MRC Psycholinguistic database. Wilson, M.D. (1988) The MRC Psycholinguistic Database: Machine Readable Dictionary, Version 2. Behavioural Research Methods, Instruments and Computers, 20(1), 6-11. A search interface makes it clear what kind of features (annotations) are identified for the lexicon items.
F. Jelinek, R.L. Mercer and S. Roukos. Principles of Lexical Language Modeling for Speech Recognition. Advances in Speech Signal Processing, S. Furui and J. Sondhi, Eds. M. Dekker Publishers, New York, NY, 1991, pp. 651-700.
Gale, William A. and Kenneth W. Church. 1994. What's wrong with adding one. Corpus-based Research Into Language: In Honour of Jan Aarts, pp. 189-200.
Nov 10: Entropy and Divergence
Next week we'll have mandatory project progress-and-problems appointments. By 2pm the afternoon before your progress-and-problems appointment day, post a Piazza followup to your proposal that summarizes your progress and what discussion points or problems you'd like to bring up with me. Ideally, this followup post will be the agenda for your team's appointment, and will make the meeting efficient and useful for you.
Nov 29: Project presentations (mandatory attendance by all students for the whole session)
Schedule on Piazza. Starting at 1:15pm
Dec 1: Project presentations (mandatory attendance by all students for the whole session)
Schedule on Piazza. Starting at 1:15pm
Final project description due: 12/09/16 4:30 PM (date determined by the registrar)
The main evaluation criteria will be the reasonableness (in approach and amount of effort), thoughtfulness, and creativity of what you tried, as documented in your writeup. Individual effort within team projects will be taken into account; see item 3 below.
We make this requirement to facilitate submission to ICWSM 2017. However, note that your final-project submission should have your names and acknowledgments included, in a particular format (see items 1c and 2b below); in contrast, you will want to strip any identifying information for ICWSM submissions.
AAAI prefers non-numbered section headings. You may change the style files to include section numbers in your headings for the purposes of CS6742 submission.
For the author heading, list only the names of your teammates that are enrolled in the class, even if you had external collaborators. (Reason: only students in the class are submitting the paper for a grade.) But see item 2b1 below.
Include the following sections:
"content" sections: abstract, introduction/motivation, data description (how you gathered, cleaned, and processed it), methods, experiments, related work, references, conclusions (what you learned), directions for future work.
Make sure that your introduction section explicitly sets out your hypothesis or hypotheses.
Throughout, highlight your most interesting findings (positive or negative).
For the purposes of CS6742 submission, your related-work section does not need to be exhaustive; you may cover just a few most-related papers.
An "acknowledgments" section: give the names and state the contributions of those from whom you received significant help. (This may or may not include your advisor(s), your instructor, or fellow students in the class.)
Authorship statement: if you intend to ask, or have already arranged, to have people other than your CS6742-enrolled teammates as co-authors, also name each such person.
Projects done collaboratively must also include a section describing who did what. External collaborators should be included in this enumeration.
Use the number of pages you feel is appropriate.