CS674 Natural Language Processing

 

Spring 2003

 

Time: Mondays and Wednesdays, 11:15-12:05
Place: Hollister 110
Instructor: Claire Cardie, 5161 Upson Hall, office hours: Tuesday 3-4, Thursday 1-2


Course Materials:

Resources:

  • Lillian Lee's list of general NLP resources

  • NLP resources available locally are listed under the local resources link of the Cornell NLP home page.

Course Description
This course presents a graduate-level introduction to natural language processing, the primary concern of which is the study of human language use from a computational perspective. The course covers syntactic analysis, semantic interpretation, and discourse processing, examining both symbolic and statistical approaches. Possible topics include information extraction, natural language generation, memory models, ambiguity resolution, finite-state methods, mildly context-sensitive formalisms, deductive approaches to interpretation, machine translation, and machine learning of natural language.

Syllabus (tentative)
Introduction (1 lecture)
History and state-of-the-art (1 lecture)
Morphology (3 lectures)
Noisy channel model (1 lecture)
Context-sensitive spelling correction (1 lecture)
Pronunciation variation in speech recognition (1 lecture)
Language modeling (4 lectures)
Lexical semantics and WSD (4 lectures)
EM (1 lecture)
Part-of-speech tagging and HMMs (1 lecture)
Parsing (2 lectures)
Discourse processing (2 lectures)
Generation (1 lecture)
Inference and World Knowledge (1 lecture)
Semantic analysis 
Information extraction 
Machine Translation 

Reference Material
The recommended textbook for the course is: Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2000.

Other useful references:

  • Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing, MIT Press, 1999.
  • James Allen. Natural Language Understanding, 2nd edition. 
  • Eugene Charniak. Statistical Language Learning, MIT Press, 1996.
  • Robert Dale, Hermann Moisl and Harold Somers, eds. Handbook of Natural Language Processing, 2000.
  • Lucja M. Iwanska and Stuart C. Shapiro, eds. Natural Language Processing and Knowledge Representation, MIT Press, 2000.
  • Frederick Jelinek. Statistical Methods for Speech Recognition, MIT Press, 1998.
  • Roland R. Hausser. Foundations of Computational Linguistics: Human-Computer Communication in Natural Language, Springer Verlag, 2001.
Prerequisites
Elementary computer science background, elementary knowledge of probability, familiarity with context-free grammars.

Grading
  • 30%: critiques of selected readings and research papers
    Guidelines for writing critiques
  • 60%: final project. The grade is based on (1) a preliminary project proposal, (2) a project literature survey, (3) a project presentation, and (4) a final write-up. You are to complete an independent project on a topic in natural language processing. Both programming and non-programming projects are fine; all must include a careful write-up.
  • 10%: participation 
    You'll be expected to participate in class discussion and class exercises or otherwise demonstrate an interest in the material studied in the course.

Academic Integrity
You are responsible for knowing and following Cornell's academic integrity policy. Absolute integrity is expected of every Cornell student in all academic undertakings; he/she must in no way misrepresent his/her work fraudulently or unfairly advance his/her academic status, or be a party to another student's failure to maintain academic integrity. The maintenance of an atmosphere of academic honor and the fulfillment of the provisions of this Code are the responsibilities of the students and faculty of Cornell University. Therefore, all students and faculty members shall refrain from any action that would violate the basic principles of this Code. Violation of the academic integrity policy will not be tolerated and will result in an F in the course.

See the University Code of Academic Integrity and the Department Policy on Academic Integrity.