CS 6740/INFO 6300, Spring 2010:
Advanced Language Technologies (or, IR, NLP, and special guests)

Instructor: Prof. Lillian Lee. Office hours: usually Fridays 10:45-11:45 or by appointment, but please check my webpage, http://www.cs.cornell.edu/home/llee, for updates.
Lectures: TR 10:10–11:25, Upson 315
Midterm: March 18th, in class; Final: Thursday May 13, 2-4:30pm, Upson 315

The content below renders in a tab-organized format if JavaScript is enabled. This page sets a cookie so that the next time you return to this page (if ever ...), the tab from which you left will be open.

Administration

Brief overview

Philosophy, Spring 2010: This class is a graduate-level introduction to research fundamentals for information retrieval and natural language processing. The course focuses on the development and derivation of major ideas, and aims to promote research skills for students working both in and outside of language technologies. While this course is thus not primarily a survey of the field, pointers to related and current work will be provided. Because of the wealth of machine-learning courses at Cornell, machine learning is not an emphasis of this class (despite its immense importance in the field), so as to avoid overlap.

Please see the Course description and policies handout for more information.

Tentative syllabus: See “Lectures” section/tab.

Prerequisites: (Firm) knowledge of elementary computer science, probability, and linear algebra. Neither CS/INFO 4300 nor CS/COGST/LING 4740 is a prerequisite.

Administrative handouts

General resources

Homepage for my previous offering of this course: click here for Fall 07

Reference texts

Pointers to papers

    Alistair Moffat, Justin Zobel, and David Hawking, Recommended reading for IR research students, SIGIR Forum 39(2):3–14, 2005. [pdf,pdf2]
Cornell IR/NLP courses

Datasets and software:

    0. (Jan 26) A prefatory lecture

    • Lecture slides
    • Handouts: course description and policies
    • The projects described in lecture correspond to these papers:
    • We'll be covering latent semantic indexing (LSI) itself in more depth later in the course. Here are some additional materials and references on some of the other topics we covered.

    1. (Jan 28) Information-retrieval basics (setting, evaluation); intro to the vector-space model

    2. (Feb 2) length normalization (who'da thunk?)

    3. (Feb 4) pivoted document-length normalization
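
    A rough illustrative sketch of the pivoted-normalization idea behind lectures 2–3 (not course material): cosine-style tf-idf scoring in which the document's norm is replaced by an interpolation between that norm and an average "pivot" norm. The tf-idf weighting details and the slope value below are assumptions, not necessarily the variant derived in lecture.

      # Illustrative sketch (cf. lectures 2-3), not from the course materials:
      # pivoted document-length normalization replaces a document's cosine norm
      # with (1 - slope) * pivot + slope * norm, so that long documents are
      # penalized less than under pure cosine normalization.
      import math
      from collections import Counter

      def tf_idf_vector(doc_tokens, idf):
          """Raw tf * idf weight for each term in one document."""
          return {t: c * idf.get(t, 0.0) for t, c in Counter(doc_tokens).items()}

      def pivoted_score(query_tokens, doc_tokens, idf, pivot, slope=0.2):
          """Query-document inner product, divided by the pivoted norm."""
          dvec = tf_idf_vector(doc_tokens, idf)
          norm = math.sqrt(sum(w * w for w in dvec.values()))
          pivoted_norm = (1.0 - slope) * pivot + slope * norm
          qtf = Counter(query_tokens)
          dot = sum(qtf[t] * dvec.get(t, 0.0) for t in qtf)
          return dot / pivoted_norm if pivoted_norm > 0 else 0.0

      # Toy collection: the pivot is the average (unpivoted) document norm.
      docs = [["cat", "sat", "mat"], ["cat", "cat", "hat"], ["dogs", "bark"]]
      df = Counter(t for d in docs for t in set(d))
      idf = {t: math.log(len(docs) / df[t]) for t in df}
      norms = [math.sqrt(sum(w * w for w in tf_idf_vector(d, idf).values())) for d in docs]
      pivot = sum(norms) / len(norms)
      print([round(pivoted_score(["cat", "mat"], d, idf, pivot), 3) for d in docs])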

    4. (Feb 9) Evaluation: annotation and experimental design
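
    Lecture 4 concerns relevance annotation and experimental design; as a rough, hedged illustration of one standard way to quantify inter-annotator agreement, here is Cohen's kappa computed on toy relevance judgments. Whether this particular statistic was used in lecture is an assumption.

      # Illustrative sketch: Cohen's kappa, a chance-corrected measure of agreement
      # between two annotators; its role in lecture 4 is an assumption.
      from collections import Counter

      def cohens_kappa(labels_a, labels_b):
          """kappa = (observed agreement - expected agreement) / (1 - expected)."""
          n = len(labels_a)
          observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
          pa, pb = Counter(labels_a), Counter(labels_b)
          expected = sum((pa[c] / n) * (pb[c] / n) for c in set(pa) | set(pb))
          return (observed - expected) / (1.0 - expected) if expected < 1.0 else 1.0

      # Two annotators judging five documents relevant (R) or nonrelevant (N):
      print(cohens_kappa(["R", "R", "N", "N", "R"], ["R", "N", "N", "N", "R"]))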

    5. (Feb 11) Introduction to (Robertson/Spärck Jones) probabilistic retrieval

    • Lecture guide by Ellis Weng and Andrew Owens
    • Lecture references:
      • Stephen Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129–46 (1976). The probabilistic argument is presented in the appendix.
    • Additional references:
      • Probabilistic information retrieval chapter of the Manning/Raghavan/Schütze book.
      • Michael D. Gordon and Peter Lenk. When is the probability ranking principle suboptimal? Journal of the American Society for Information Science 43:1–14 (1992).
      • C. J. “Keith” van Rijsbergen. The emergence of probabilistic accounts of information retrieval. Pp. 23–38 in John I. Tait, ed., Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones (2005).
      • Stephen Robertson and Hugo Zaragoza. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3(4):333–389, 2009. Covers the “classic” development of the material in the next few lectures.
      • Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, pp. 779–808, 809–840 (2000). html of preface, pdf of Part 1, pdf of Part 2

    6. (Feb 16) RSJ probabilistic retrieval: binary models and the IDF

    7. (Feb 18) Two-Poisson models and BM weighting
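
    As a rough sketch tying lectures 5–7 to the Robertson & Zaragoza survey cited above: the BM25 scoring function combines an RSJ-style term weight with a saturating, length-normalized term-frequency component. The idf variant and the k1/b defaults below are common choices, not necessarily those derived in class.

      # Illustrative BM25 sketch (see the Robertson & Zaragoza survey above).
      # The "+1 inside the log" idf variant keeps weights non-negative on tiny
      # collections; k1 and b are the usual default values, not lecture-specific.
      import math
      from collections import Counter

      def bm25_score(query_tokens, doc_tokens, doc_freqs, num_docs, avg_doc_len,
                     k1=1.2, b=0.75):
          """Sum over query terms of an RSJ-style idf times a saturating tf factor."""
          tf = Counter(doc_tokens)
          dl = len(doc_tokens)
          score = 0.0
          for term in query_tokens:
              df = doc_freqs.get(term, 0)
              if df == 0 or tf[term] == 0:
                  continue
              idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
              denom = tf[term] + k1 * (1.0 - b + b * dl / avg_doc_len)
              score += idf * tf[term] * (k1 + 1.0) / denom
          return score

      docs = [["cat", "sat", "mat"], ["cat", "cat", "hat"], ["dogs", "bark", "loudly"]]
      doc_freqs = Counter(t for d in docs for t in set(d))
      avg_len = sum(len(d) for d in docs) / len(docs)
      print([round(bm25_score(["cat", "mat"], d, doc_freqs, len(docs), avg_len), 3)
             for d in docs])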

    8. (Feb 23) Intro to the language-modeling approach to IR

    9. (Feb 25) About query likelihood; relevance LMs

    • Lecture guide by Navin Sivakumar, Lakshmi Ganesh, and Taiyang Chen
    • Lecture references:

    10. (Mar 2) More on language models
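
    A minimal sketch of the query-likelihood idea from lectures 8–10 (not course material): rank a document by the probability its smoothed unigram language model assigns to the query. Linear interpolation with the collection model, and the lambda value, are illustrative assumptions rather than the specific smoothing scheme covered in lecture.

      # Illustrative query-likelihood sketch (cf. lectures 8-10): the score is the
      # log probability of the query under the document's unigram language model,
      # smoothed here by linear interpolation with the collection model.
      import math
      from collections import Counter

      def query_log_likelihood(query_tokens, doc_tokens, collection_tokens, lam=0.5):
          doc_tf, coll_tf = Counter(doc_tokens), Counter(collection_tokens)
          dl, cl = len(doc_tokens), len(collection_tokens)
          logp = 0.0
          for term in query_tokens:
              p_doc = doc_tf[term] / dl if dl else 0.0
              p_coll = coll_tf[term] / cl
              p = lam * p_doc + (1.0 - lam) * p_coll
              if p == 0.0:                 # term unseen even in the collection
                  return float("-inf")
              logp += math.log(p)
          return logp

      docs = [["cat", "sat", "mat"], ["cat", "cat", "hat"], ["dogs", "bark"]]
      collection = [t for d in docs for t in d]
      print([round(query_log_likelihood(["cat", "mat"], d, collection), 3) for d in docs])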

    11. (Mar 4) The Good-Turing estimate

    • Lecture guide by Ellis Weng and Andrew Owens
    • Lecture references:
      • I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika 40: 237–264 (1953). The first page contains the following passage: “Our results are applicable, for example, to studies of literary vocabulary, of accident proneness and of chess openings, but for definiteness we formulate the theory in terms of species of animals”.
      • Frederick Jelinek and Robert Mercer. Probability distribution estimation from sparse data. IBM Technical Disclosure Bulletin 28: 2591–2594 (1985).
      • Arthur Nádas. On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-33(6):1414–1416, 1985.
    • Additional references:
      • Frederick Jelinek, Chapter 15 “Estimation of probabilities from counts and the back-off method” of Statistical methods for speech recognition, MIT Press (1997). Google books preview
      • Alon Orlitsky, Narayana P. Santhanam, and Junan Zhang cite Codebreakers: The Inside Story of Bletchley Park, edited by Hinsley and Stripp, as a source for the assertion that Good and Turing were working on deciphering the Enigma code.
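
    A toy illustration of the Good-Turing estimate discussed above: the adjusted count for an item observed r times is r* = (r + 1) N_{r+1} / N_r, and the leftover mass N_1 / N is reserved for unseen items. Practical versions smooth the N_r values first (see the Jelinek chapter above); this sketch does not.

      # Toy Good-Turing sketch: adjusted count r* = (r + 1) * N_{r+1} / N_r, with
      # N_1 / N reserved for unseen items. No smoothing of the N_r values is done,
      # so the fallback to the raw count when N_{r+1} = 0 is a simplification.
      from collections import Counter

      def good_turing(sample):
          counts = Counter(sample)                    # r for each observed item
          count_of_counts = Counter(counts.values())  # N_r
          total = sum(counts.values())                # N
          adjusted = {}
          for item, r in counts.items():
              n_r, n_r1 = count_of_counts[r], count_of_counts.get(r + 1, 0)
              adjusted[item] = (r + 1) * n_r1 / n_r if n_r1 else r
          p_unseen = count_of_counts.get(1, 0) / total
          return adjusted, p_unseen

      print(good_turing("a a a b b c d e f f".split()))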

    12. (Mar 9) Smoothing; LM evaluation

    • Handouts: Sample midterm (from 2006). Also available are brief solution notes (see section 2 of that document).
    • Lecture references:
      • Frederick Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. Workshop on Pattern Recognition in Practice, 1980.
      • Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing 35(3):400–401 (1987).
    • Additional references:

    13. (Mar 16) Zipf's law and Miller's monkeys

    • handout
    • Lecture references:
      • J.B. Estoup, 1916. Gammes sténographiques. Institut Sténographique de France.
      • W. Nelson Francis and Henry Kučera, 1982. Frequency Analysis of English Usage. Houghton Mifflin. Also of potential interest: the Brown corpus manual, 1979; and a search interface.
      • Benoît Mandelbrot, 1957. Théorie mathématique de la loi d'Estoup-Zipf. Institut de Statistique de l'Université de Paris.
      • George A. Miller, 1957. Some effects of intermittent silence. American Journal of Psychology 70, pp. 311–313.
      • George Zipf, 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley Press.
      • See also section 1.4 of the Manning/Schütze text or chapter 4 of Timothy C. Bell, John G. Cleary, and Ian H. Witten, Text Compression, 1990.
    • Additional references:
      • Zipf's Law (webpage by Wentian Li). Many Zipf's Law refs in many fields.
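
    A quick, hedged illustration of the Miller-style "monkey typing" argument from lecture 13: uniformly random typing over a small alphabet plus a space key already yields a roughly Zipfian rank-frequency curve. The alphabet size, space probability, and text length below are arbitrary choices, not values from lecture.

      # Toy demonstration of Miller's monkey-typing argument: words produced by
      # uniform random typing over a few letters plus a space key have a roughly
      # power-law rank-frequency curve; on log-log axes the points printed below
      # fall near a straight line.
      import math
      import random
      from collections import Counter

      def monkey_text(num_chars, alphabet="abcdefgh", space_prob=0.18, seed=0):
          rng = random.Random(seed)
          chars = [" " if rng.random() < space_prob else rng.choice(alphabet)
                   for _ in range(num_chars)]
          return "".join(chars).split()

      freqs = sorted(Counter(monkey_text(300_000)).values(), reverse=True)
      for rank in (1, 2, 4, 8, 16, 32, 64, 128):
          if rank <= len(freqs):
              print(f"rank {rank:4d}  freq {freqs[rank - 1]:6d}  "
                    f"(log rank, log freq) = "
                    f"({math.log(rank):.2f}, {math.log(freqs[rank - 1]):.2f})")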

    14. (Mar 30) Relevance feedback

    • Lecture references:
    • Additional references:
      • Y.K. Chang, C. Cirillo, and J. Razon. Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton, ed., The SMART retrieval system — experiments in automatic document processing, pp. 355–370 (1971).
      • Ian Ruthven. Re-examining the potential effectiveness of interactive query expansion. SIGIR, pp. 213–220 (2003). Best paper award.
      • Gerard Salton and Chris Buckley. Improving retrieval performance by relevance feedback. JASIS 41(4): 288–297 (1990).
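
    As a rough sketch of one classic relevance-feedback method from the literature above (Rocchio-style reweighting; not necessarily the exact formulation covered in lecture): move the query vector toward the centroid of judged-relevant documents and away from the centroid of judged-nonrelevant ones. The alpha/beta/gamma values are illustrative.

      # Illustrative Rocchio-style feedback sketch: new query = alpha * old query
      # + beta * centroid(relevant docs) - gamma * centroid(nonrelevant docs),
      # with negative term weights clipped to zero. Parameter values are arbitrary.
      from collections import defaultdict

      def rocchio(query_vec, relevant_vecs, nonrelevant_vecs,
                  alpha=1.0, beta=0.75, gamma=0.15):
          new_query = defaultdict(float)
          for term, w in query_vec.items():
              new_query[term] += alpha * w
          for vecs, weight in ((relevant_vecs, beta), (nonrelevant_vecs, -gamma)):
              for vec in vecs:
                  for term, w in vec.items():
                      new_query[term] += weight * w / len(vecs)
          return {t: w for t, w in new_query.items() if w > 0}

      q = {"jaguar": 1.0}
      rel = [{"jaguar": 0.8, "car": 0.6}, {"jaguar": 0.7, "engine": 0.5}]
      nonrel = [{"jaguar": 0.6, "cat": 0.9}]
      print(rocchio(q, rel, nonrel))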

    15. (Apr 1) Clickthrough data as implicit relevance feedback

    16. (Apr 13) End relevance feedback; begin syntactic structure

    17. (Apr 15) Feature-based CFGs with unification constraints

    18. (Apr 20) Feature-based CFGs; TAGs

    • Handouts: feature-based CFG example rules
    • Lecture references:
      • Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct grammar. Journal of Computer and System Sciences 10(1):136–163 (1975).
    • Additional references:

    19. (Apr 22) More on TAGs

    20. (Apr 27) Feature-based TAGs

    21. (Apr 29) PCFGs and EM
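
    A toy sketch related to lecture 21 (not course material): under a PCFG, the probability of a parse tree is the product of the probabilities of the rules it uses; EM (inside-outside) re-estimates those rule probabilities from unannotated text, a step not shown here. The grammar and tree below are invented examples.

      # Toy PCFG sketch: the probability of a parse tree is the product of the
      # probabilities of the rules used (grammar and tree invented for illustration;
      # the EM / inside-outside re-estimation step from lecture is not shown).
      rule_probs = {
          ("S", ("NP", "VP")): 1.0,
          ("NP", ("she",)): 0.4,
          ("NP", ("fish",)): 0.6,
          ("VP", ("V", "NP")): 0.7,
          ("VP", ("V",)): 0.3,
          ("V", ("eats",)): 1.0,
      }

      def tree_probability(tree):
          """tree = (label, children); children are either all terminal strings
          (a preterminal rule) or a list of subtrees."""
          label, children = tree
          if all(isinstance(c, str) for c in children):
              return rule_probs[(label, tuple(children))]
          p = rule_probs[(label, tuple(child[0] for child in children))]
          for child in children:
              p *= tree_probability(child)
          return p

      parse = ("S", [("NP", ["she"]), ("VP", [("V", ["eats"]), ("NP", ["fish"])])])
      print(tree_probability(parse))   # 1.0 * 0.4 * 0.7 * 1.0 * 0.6 = 0.168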

    22. Finish EM, start discourse

    23. (May 6) Local and global theories of discourse
