CS/INFO 630, Spring 2006: Human Language Technology
(or, IR, NLP, and special guests; or, Representing and accessing digital information)

Note: After the Spring 2006 semester, this course was renumbered; the current version is CS 674/INFO 630.

Prof. Lillian Lee (follow link for contact information and office hours)
TR 2:55-4:10, Bard 140

Official description (from the course catalog):

Information retrieval has evolved from the problem of locating books in a library to a multitude of tasks ubiquitous in business, science, and personal life. Modern information systems automatically compose newspapers, extract facts from the web, and analyze usage patterns. This course covers the necessary techniques for representing, organizing, and accessing digital information that is in textual or semistructured form. Topics combine information retrieval, natural language processing, and machine learning, with links to work in databases and data mining.

Administrative info:

Resources: (reference texts, other lecture notes and slides, etc.)

Lectures:

  1. 1/24/06: "Sense and Sensibility: Automatically Analyzing Subject and Sentiment in Human-Authored Texts" (preface)
  2. 1/26/06: Basics of information retrieval; the vector-space model
  3. 1/31/06: An example of empirical IR research: length normalization
  4. 2/2/06: Completion of pivoted document-length normalization
  5. 2/7/06: Introduction to probabilistic retrieval (classic case)
  6. 2/9/06: Completion of a derivation of (a version of) the RSJ model; the case of binary attributes; an isolated-Poisson model for term frequencies
  7. 2/14/06: The two-Poisson model and approximations (complete classic probabilistic retrieval)
  8. 2/16/06: Introduction to the language modeling approach
  9. 2/21/06: More on the LM approach
  10. 2/23/06: Completion of alternate LM formulation; introduction to relevance feedback
  11. 2/28/06: Relevance feedback methods
  12. 3/2/06: Relevance feedback: further explorations and evaluation issues
  13. 3/7/06: Completion of relevance feedback; implicit feedback sources
  14. 3/9/06: An LM-based approach to implicit feedback
  15. 3/14/06: Clickthrough data as implicit feedback: human validation
    3/16/06: midterm exam (see also mid(-)term notes (3/28/06) and the Lecture 15 Krafft/Rabkin guide (includes brief solution sketches for some midterm questions))
  16. 3/28/06: Clickthrough data as relative implicit feedback
  17. 3/30/06: Matrix-theoretic characterizations of corpus structure (introduction to the singular value decomposition (SVD))
  18. 4/4/06: The SVD
  19. 4/6/06: "Few-factor" representations (LSI, pLSI, and others)
  20. 4/11/06: Modeling syntactic structure: context-free grammars
  21. 4/13/06: Feature-based context-free grammars; introduction to tree-adjoining grammars
  22. 4/18/06: TAGs, continued
  23. 4/20/06: TAGs for idiom analysis; adjunction constraints
  24. 4/25/06: Algorithms for grammar parsing and learning
  25. 4/27/06: The EM algorithm
  26. 5/2/06: Conclusion of EM; introduction to maximum-entropy models
  27. 5/4/06: Conclusion of maximum entropy and of the class