CS 674/INFO 630, Fall 2007: Advanced Language Technologies (or, IR, NLP, and special guests;)

Prof. Lillian Lee (follow link for contact information and office hours)
TR 1:25-2:40, Hollister 312.
Final exam: Thursday, December 13, 2-4:30pm, Thurston 202.

Short description: This course is a graduate-level introduction to research fundamentals for information retrieval and natural language processing. Please see the Course description and policies handout for more information.

Tentative syllabus: Three fundamental paradigms in information retrieval: the vector-space model; the (Robertson-Spärck Jones) probabilistic retrieval paradigm; the language-modeling approach. Relevance feedback, explicit and implicit. Latent Semantic Indexing (LSI). Feature-based context-free grammars (CFGs). Tree adjoining grammars (TAGs). Parsing. The Expectation-Maximization (EM) algorithm. Maximum-entropy modeling.

Administrative info:

Resources: (reference texts, other lecture notes and slides, etc.)

The homepage for the previous running of this course may also be useful.

Lectures:


Quick links: starts of the units on: the vector space model (8/28/07); RSJ probabilistic retrieval (9/11/07); language-modeling approaches to IR (9/20/07); relevance feedback (10/2/07); implicit relevance feedback (10/11/07); the SVD and LSI (10/30/07); CFGs (11/8/07); TAGs (11/15/07); EM (11/27/07)
  1. 8/23/07: "Sense and Sensibility: Automatically Analyzing Subject and Sentiment in Human-Authored Texts"
    (a prefatory lecture; students are not responsible for the material that was covered)


  2. 8/28/07: Basics of information retrieval; the vector-space model


  3. 8/30/07: Length normalization (who'da thunk?)


  4. 9/4/07: Pivoted document-length normalization


  5. 9/6/07: Remarks on evaluation (of SBM SIGIR '96, and beyond)


  6. 9/11/07: Introduction to probabilistic retrieval (Robertson-Spärck Jones version)


  7. 9/13/07: Probabilistic retrieval with binary attribute variables: derivations of the IDF


  8. 9/18/07: Probabilistic retrieval with attribute-count variables: Poisson-based models


  9. 9/20/07: Derivation of BM/Okapi term weighting. Intro to the language-modeling paradigm for IR


  10. 9/25/07: More on the LM approach


  11. 9/27/07: Completion of the LM approach


  12. 10/2/07: Introduction to relevance feedback


  13. 10/4/07: Automatic query expansion (AQE) and interactive query expansion (IQE)
  14. 10/11/07: Implicit feedback. Further notes on language models (in preparation for combining them with implicit feedback).
  15. 10/16/07: More on language models, in preparation for combining them with relevance feedback
  16. 10/23/07: An LM-based approach to implicit feedback
  17. 10/25/07: Clickthrough data as implicit feedback
  18. 10/30/07: The term-document matrix, revisited (preparation for the SVD)
  19. 11/1/07: The singular value decomposition and “approximating ± convex hulls”
  20. 11/6/07: "Few-factor" representations (LSI, pLSI, and others)
  21. 11/8/07: Modeling syntactic structure: context-free grammars
  22. 11/13/07: Augmented context-free grammars
  23. 11/15/07: Introduction to tree-adjoining grammars
  24. 11/20/07: Adjunction constraints and feature-based TAGs
  25. 11/27/07: The EM algorithm, part one
  26. 11/29/07: Conclusion of EM, and the course