CS 6740/INFO 6300, Spring 2010:
Advanced Language Technologies (or, IR, NLP, and special guests)

Instructor: Prof. Lillian Lee.


Brief Overview

Philosophy, Spring 2010: This class is a graduate-level introduction to research fundamentals for information retrieval and natural language processing. The course focuses on the development and derivation of major ideas, and aims to promote research skills for students working both in and outside of language technologies. While this course is thus not primarily a survey of the field, pointers to related and current work will be provided. To avoid overlap with Cornell's wealth of machine-learning courses, learning is not an emphasis of this class, despite its immense importance in the field.

Prerequisites: (Firm) knowledge of elementary computer science, probability, and linear algebra. Neither CS/INFO 4300 nor CS/COGST/LING 4740 is a prerequisite.

Administrative handouts




Quick links to the first lecture of each section

  • The vector space model
  • Evaluation, annotation, and experimental design
  • RSJ probabilistic retrieval
  • Language modeling, in IR and elsewhere
  • Relevance feedback
  • Sentential structure (CFGs and TAGs)
  • Grammar induction and EM
  • Discourse


0. (Jan 26) A prefatory lecture

  • Lecture slides
  • Handouts: course description and policies
  • The projects described in lecture correspond to these papers:
  • We'll be covering latent semantic indexing (LSI) itself in more depth later in the course. Here is some additional material and references on some of the other topics we covered.

    The Vector-Space Model

    1. (Jan 28) Information-retrieval basics (setting, evaluation); intro to the vector-space model

    2. (Feb 2) Length normalization (who'da thunk?)

    3. (Feb 4) Pivoted document-length normalization

    →back to quick links

    Evaluation, Annotation, and Experimental Design

    4. (Feb 9) Evaluation: annotation and experimental design

    →back to quick links

    RSJ probabilistic retrieval

    5. (Feb 11) Introduction to (Robertson/Spärck Jones) probabilistic retrieval

    6. (Feb 16) RSJ probabilistic retrieval: binary models and the IDF

    7. (Feb 18) Two-Poisson models and BM weighting

    →back to quick links

    Language modeling, in IR and elsewhere

    8. (Feb 23) Intro to the language-modeling approach to IR

    9. (Feb 25) About query likelihood; relevance LMs

    10. (Mar 2) More on language models

    11. (Mar 4) The Good-Turing estimate

    12. (Mar 9) Smoothing; LM evaluation

    13. (Mar 16) Zipf's law and Miller's monkeys

    →back to quick links

    Relevance feedback

    14. (Mar 30) Relevance feedback

    15. (Apr 1) Clickthrough data as implicit relevance feedback

    →back to quick links

    Sentential structure (CFGs and TAGs)

    16. (Apr 13) End relevance feedback; begin syntactic structure

    17. (Apr 15) Feature-based CFGs with unification constraints

    18. (Apr 20) Feature-based CFGs; TAGs

    19. (Apr 22) More on TAGs

    20. (Apr 27) Feature-based TAGs

    →back to quick links

    Grammar induction and EM

    21. (Apr 29) PCFGs and EM

    →back to quick links


    22. Finish EM, start discourse

    23. (May 6) Local and global theories of discourse

    →back to quick links