CS474 Introduction to Natural Language Processing


Fall 2006


Time: Tuesdays and Thursdays, 1:25-2:40
Place: 206 Hollister
Instructor: Professor Claire Cardie, 5161 Upson Hall
Final Exam: Weds, Dec 6, 7-9:30pm, Olin Hall 245
Office hours:
Claire: see top of her home page
Ves (for QA assignment): Mondays 2:30-3:30 and Wednesdays 3:00-4:00 (Upson 334)

Course Materials:

Course Management System (CMS): We'll be using the CS department course management system for submission of assignments, grading, etc.  You can get to CMS via the above link.  You'll need your Cornell netid and password.


  • Lillian Lee's list of general NLP resources

  • NLP resources available locally are listed under the local resources link of the Cornell NLP home page.

Course Description
A computationally oriented introduction to natural language processing, a field whose goal is to enable computers to use human languages as input, output, or both.  Possible topics include parsing, grammar induction, information retrieval, and machine translation.

Topics to be Covered (tentative)
Introduction to NLP
History and state-of-the-art
Lexical semantics and word-sense disambiguation
Information retrieval models
Text categorization
Part-of-speech tagging and HMMs
Noisy channel model
Language modeling
Question answering systems
Summarization systems
Discourse processing
Dialogue systems
Inference and world knowledge
Semantic analysis 
Information extraction 
Machine translation

Reference Material
The textbook for the course is:

Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2000.

Other useful references:

  • Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing, MIT Press, 1999.
  • James Allen. Natural Language Understanding, 2nd edition. 
  • Eugene Charniak. Statistical Language Learning, MIT Press, 1996.
  • Robert Dale, Hermann Moisl and Harold Somers, eds. Handbook of Natural Language Processing, 2000.
  • Lucja M. Iwanska and Stuart C. Shapiro, eds. Natural Language Processing and Knowledge Representation, MIT Press, 2000.
  • Frederick Jelinek. Statistical Methods for Speech Recognition, MIT Press, 1998.
  • Roland R. Hausser. Foundations of Computational Linguistics: Human-Computer Communication in Natural Language, Springer Verlag, 2001.

Prerequisites
Elementary computer science background.

Grading

  • 15%: critiques of selected readings and research papers
    Guidelines for writing critiques
  • 40%: programming assignments
  • 10%: midterm
  • 25%: final examination
  • 10%: participation 
    You'll be expected to participate in class discussion and class exercises or otherwise demonstrate an interest in the material studied in the course.

Academic Integrity
You are responsible for knowing and following Cornell's academic integrity policy. Absolute integrity is expected of every Cornell student in all academic undertakings; he/she must in no way misrepresent his/her work fraudulently or unfairly advance his/her academic status, or be a party to another student's failure to maintain academic integrity. The maintenance of an atmosphere of academic honor and the fulfillment of the provisions of this Code are the responsibilities of the students and faculty of Cornell University. Therefore, all students and faculty members shall refrain from any action that would violate the basic principles of this Code. Violation of the academic integrity policy will not be tolerated, and will result in an F in the course.

See the University Code of Academic Integrity and the Department Policy on Academic Integrity.