CS/INFO 630, Spring 2006: Human Language Technology
(or, IR,
NLP, and special guests;
or, Representing and accessing digital information)
Note: After the Spring 2006 semester, this course was
renumbered; the current version is CS674/INFO 630.
Prof. Lillian
Lee (follow link for contact information and office hours)
TR 2:55–4:10, Bard 140
Official description (from the course catalog):
Information retrieval has evolved from the problem of locating books in a library to a multitude of tasks ubiquitous in business, science, and personal life. Modern information systems automatically compose newspapers, extract facts from the web, and analyze usage patterns. This course covers the necessary techniques for representing, organizing, and accessing digital information that is in textual or semistructured form. Topics combine information retrieval, natural language processing, and machine learning, with links to work in databases and data mining.
Administrative info:
Resources: (reference texts, other lecture
notes and slides, etc.)
Lectures:
 1/24/06: "Sense and Sensibility:
Automatically Analyzing
Subject and Sentiment in HumanAuthored Texts" (preface)
 Handouts: course description and
policies [pdf]
 1/26/06: Basics of information retrieval; the vector-space model (illustrative scoring sketch at the end of this entry)
 Lecture guides: cover sheet,
Haque/Shaparenko, Rabkin/Krafft
 Handouts: Examples for lecture-guide preparation [pdf] and an example (for a different class)
of the scribe-notes portion [ps]
 Lecture references:
 Amit Singhal. Modern
information retrieval: A brief overview. IEEE Data Engineering
Bulletin 24(4), pp. 35–43, 2001. [ps]
 TREC conference
proceedings. [website].
See especially the overviews.
 Additional references:
 See the course resources page for
other overviews.
 David Dubin. The most influential paper Gerard Salton never
wrote. Library Trends, Spring 2004. [html]
An interesting sidelight on the origins of the vector space model, the
assumptions underlying the VSM, and
phantom citations.
 Karen Spärck Jones. IDF term weighting and IR research lessons.
Journal of Documentation 60(5):521–523 (2004).
[pdf]
Reviews the history of the introduction of the three
"consensus" term-weighting quantities.
Notes that Spärck Jones' data
consisted of presence/absence word lists, thus explaining the
non-incorporation of
term-frequency information.
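 Illustrative sketch (Python): a minimal rendering of vector-space scoring with the three
"consensus" quantities (term frequency, inverse document frequency, and length
normalization via the cosine); the particular tf and idf variants and the toy documents
below are placeholders, not necessarily those used in lecture.
    import math
    from collections import Counter

    def tfidf_vector(tokens, df, N):
        """One common tf.idf weighting: (1 + log tf) * log(N / df); other variants exist."""
        counts = Counter(tokens)
        return {t: (1 + math.log(c)) * math.log(N / df[t])
                for t, c in counts.items() if t in df}

    def cosine(u, v):
        """Cosine similarity: dot product after L2 (length) normalization."""
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"], ["information", "retrieval"]]
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency of each term
    doc_vecs = [tfidf_vector(d, df, N) for d in docs]
    query_vec = tfidf_vector(["cat", "sat"], df, N)
    print(sorted(((cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)), reverse=True))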
 1/31/06: An example of empirical IR research: length normalization (illustrative pivoted-normalization sketch at the end of this entry)
 Lecture guides: cover sheet for
Danis/Rogan (statement of problem corrected 2/21/06), Danis/Rogan
scribe notes, Danis/Rogan
problems, Leshed/Shami (cover
sheet for Leshed/Shami superseded by updated cover sheet for Danis/Rogan)
 Lecture references:
 Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document
length normalization. SIGIR 1996. [ps, pdf]
 Justin Zobel. How reliable are large-scale information retrieval
experiments? SIGIR 1998. [pdf, pdf2]
 Additional references:
 Mark Sanderson and Hideo Joho. Forming test collections with no
system pooling. SIGIR 2004. [pdf]
 Ellen Voorhees and Donna Harman. Overview of the Eighth Text
REtrieval Conference (TREC-8) [good ps, poor
pdf]. Section 3 discusses the effects of pooling.
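 Illustrative sketch (Python): the pivoted normalization idea from the Singhal/Buckley/Mitra
paper above, in which the normalization factor is "rotated" around a pivot so that long
documents are penalized a bit more and short ones a bit less; the slope value and document
lengths below are placeholders, not the paper's tuned settings.
    def pivoted_normalizer(old_norm, pivot, slope):
        """Pivoted normalization: rotate the old normalization factor around the pivot."""
        return (1.0 - slope) * pivot + slope * old_norm

    doc_lengths = [40, 100, 400]                    # toy "old" normalizers (here, document lengths)
    pivot = sum(doc_lengths) / len(doc_lengths)     # a common choice: the average value
    slope = 0.75                                    # placeholder; set empirically in practice
    for length in doc_lengths:
        print(length, pivoted_normalizer(length, pivot, slope))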
 2/2/06: Completion of pivoted document-length normalization
 Lecture guides: cover sheet, Dejgosha/Hu scribe notes, Hamatake/Park
 Additional references:
 Abdur Chowdhury, M. Catherine McCabe, David Grossman, and Ophir
Frieder. Document normalization revisited. SIGIR poster (2002). [pdf,
pdf2]

Jaap Kamps,
Maarten de Rijke, and
Börkur Sigurbjörnsson. Length normalization in XML retrieval.
SIGIR (2004) [abstract,
pdf]
 2/7/06: Introduction to probabilistic retrieval (classic case; illustrative term-weight sketch at the end of this entry)
 Lecture guides: cover sheet, Au/Gerner/Kot, Babinski/Lin
 Lecture references:
 Karen Spärck Jones, Steve Walker, and Stephen Robertson. A
probabilistic model of information retrieval:
Development and comparative experiments. Information Processing
and Management, pp. 779–808, 809–840 (2000). [html of preface, pdf
of Part 1, pdf
of Part 2]
 William S. Cooper. Some inconsistencies and misidentified
modeling assumptions in probabilistic information retrieval. ACM
Transactions on Information Systems (TOIS), pp. 100–111, 1995.
[pdf]
 Additional references:
 Fabio Crestani, Mounia Lalmas, C. J. Van Rijsbergen, and Iain
Campbell. "Is this document relevant? ...probably": A survey
of probabilistic models in information retrieval. ACM Computing
Surveys 30(4) (1998). [pdf,
pdf2]
 M. E. Maron. Probabilistic approaches to the document retrieval
problem. SIGIR, pp. 98–107 (1982). [pdf] Discusses models for
interpreting the probability of relevance, and also gives a personal
historical perspective on information science (as Maron construes it).
 Stephen Robertson. The probability ranking principle in IR.
Journal of Documentation 33 (1977). Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
Appendix gives an example where ranking by probability of relevance
is suboptimal.
 Stephen Robertson and Karen Spärck Jones. Relevance
weighting of search terms. Journal of the American Society for
Information Science 27, 129–146 (1976). [pdf]
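 Illustrative sketch (Python): the Robertson/Spärck Jones relevance weight with the usual 0.5
smoothing; with no relevance information (R = r = 0) it reduces to an IDF-like quantity,
which connects to the Croft/Harper reference in the next lecture's entry. The numbers below
are placeholders.
    import math

    def rsj_weight(N, n, R, r):
        """Relevance weight for a term: N = collection size, n = documents containing
        the term, R = known relevant documents, r = relevant documents containing the term."""
        return math.log(((r + 0.5) / (R - r + 0.5)) /
                        ((n - r + 0.5) / (N - n - R + r + 0.5)))

    print(rsj_weight(N=10000, n=100, R=20, r=15))   # term strongly associated with relevance
    print(rsj_weight(N=10000, n=100, R=0, r=0))     # no relevance info: roughly an IDF weight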
 2/9/06: Completion of a derivation of (a version of) the RSJ model;
the case of binary attributes; an isolated-Poisson model for term frequencies
 Lecture guides: cover sheet, Connolly/Mujeeb,
Haque/Shaparenko
 Lecture references:
 W. Bruce Croft and D. Harper. Using probabilistic models of
document retrieval without relevance information. Journal of
Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
 Additional references, including samples from the continuing examination of
explanations for/derivations of IDF weighting.
 Kishore Papineni. Why inverse document frequency? NAACL (2001).
[pdf]
 Stephen Robertson. Understanding inverse document frequency: On
theoretical arguments for IDF. Journal of Documentation
60(5):503–520 (2004). [pdf]
 Arjen P. de Vries and Thomas Roelleke. Relevance Information:
A Loss of Entropy but a Gain for IDF? SIGIR (2005). [pdf,
pdf2]
 2/14/06: The two-Poisson model and approximations (completing
classic probabilistic retrieval; illustrative BM25 sketch at the end of this entry)
 Lecture guides: cover
sheet (3/9/06: minor update to note that a corrected version of the lecture
handout has been posted), Krafft/Rabkin, Leshed/Shami
 Handout: pdf (3/7/06: corrected
lecture number and noted typo in Singhal's Okapi equation)
 Lecture references:
 Abraham Bookstein and D. R. Swanson. Probabilistic models for
automatic indexing. Journal of the American Society for Information
Science, 25:312–318 (1974).
 Stephen P. Harter. A probabilistic approach to automatic keyword
indexing, part I: On the distribution of specialty words in a
technical literature. Journal of the American Society for Information
Science 26(4) (1975). [pdf]
 Stephen E. Robertson and Steve Walker. Some simple effective
approximations to the 2-Poisson model for probabilistic weighted
retrieval. SIGIR, pp. 232–241 (1994) [pdf, pdf2]. Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
 Additional references:

Eugene L. Margulis. N-Poisson document modelling. SIGIR,
pp. 177–189 (1992). [pdf]
 S. E. Robertson, S. Walker, S. Jones,
M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3 (1994) [pdf,
ps.gz]
Introduces BM25.
 Origin of the name "Okapi". [html]
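 Illustrative sketch (Python): the BM25 scoring function (introduced in the Okapi at TREC-3
reference above); the k1 and b values and the IDF variant below are common placeholder
choices, not the TREC-3 settings.
    import math

    def bm25(query_terms, doc_counts, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
        """BM25: a saturating term-frequency component times an IDF-style weight."""
        score = 0.0
        for t in query_terms:
            if t in doc_counts and t in df:
                idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
                tf = doc_counts[t]
                score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        return score

    doc = {"okapi": 3, "retrieval": 1}
    print(bm25(["okapi", "model"], doc, doc_len=4, avg_doc_len=100,
               df={"okapi": 5, "model": 50}, N=1000))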
 2/16/06: Introduction to the language modeling approach
 Lecture guide: cover sheet, Danis/Rogan
scribe notes, Danis/Rogan problems
 Lecture references:
 Jay M. Ponte and W. Bruce Croft. A language modeling approach to
information retrieval. SIGIR (1998). [pdf]
 John Lafferty and Chengxiang Zhai. Probabilistic relevance models
based on document and query generation. In Croft and Lafferty, eds.,
Language Modeling and
Information Retrieval (2003). [pdf,
ps]
 ChengXiang Zhai and John Lafferty. A study of smoothing methods
for language models applied to ad hoc information retrieval. SIGIR
(2001). [pdf,
ps]
 Additional references:
 Language modeling and information retrieval. The Lemur project. [html]
 Xiaoyong Liu and W. Bruce Croft. Statistical language modeling
for information retrieval. Annual Review of Information Science and
Technology 39 (2003). [pdf]
 2/21/06: More on the LM approach
 Lecture guides: cover
sheet for Babinski/Lin, Babinski/Lin
(3/10/06: lecturer-induced typo in exponent for HMM LM incorporated
into guide and hence notice removed from
cover sheet), Hamatake/Park scribe
notes (3/16: a few typos corrected), Hamatake/Park problems
 Additional references:
 David MacKay and Linda Peto. A hierarchical Dirichlet language model.
Natural Language Engineering 1(3):289–307 (1995). [pdf]
 ChengXiang Zhai. Notes on the KL-divergence retrieval formula
and Dirichlet prior smoothing (2003). [pdf]
Our discussion of the effects of using corpus-dependent Dirichlet
smoothing is modeled on the derivation given in this paper,
although the initial scoring function is a bit different. (An illustrative
Dirichlet-smoothed query-likelihood sketch appears at the end of this entry.)
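 Illustrative sketch (Python): query-likelihood scoring under a unigram language model with
Dirichlet-prior smoothing, as analyzed in the Zhai/Lafferty and Zhai references above; the
mu value and the toy collection model below are placeholders, and the scoring function from
lecture may differ slightly, as noted above.
    import math
    from collections import Counter

    def dirichlet_lm_score(query, doc_tokens, collection_prob, mu=2000.0):
        """log p(query | doc) with p(w | doc) = (count(w, doc) + mu * p(w | collection)) / (|doc| + mu)."""
        counts = Counter(doc_tokens)
        dlen = len(doc_tokens)
        return sum(math.log((counts[w] + mu * collection_prob.get(w, 1e-9)) / (dlen + mu))
                   for w in query)

    collection_prob = {"the": 0.05, "okapi": 0.0001, "model": 0.001}   # toy background model
    print(dirichlet_lm_score(["okapi", "model"], ["the", "okapi", "model", "the"], collection_prob))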
 2/23/06: Completion of alternate LM formulation; introduction to
relevance feedback
 Lecture guides (cover sheet
announces the altered derivation presented in both sets of scribe notes): Au/Gerner/Kot (3/29: a few
typos corrected),
Connolly/Mujeeb
 Lecture references:
 Ian Ruthven and Mounia Lalmas. A survey on the use of relevance
feedback for information access systems. Knowledge Engineering Review,
18(1) (2003) [pdf]
 Additional references:
 ChengXiang Zhai and John Lafferty. Document language models,
query models, and risk minimization for information retrieval. SIGIR
(2001) [ps,
pdf;
long
version]
Considers the query and the relevant documents to have been
generated by the same source LM.
 Karen Spärck Jones. Language modelling's generative model:
is it rational? (2004) [pdf]
 2/28/06: Relevance feedback methods (illustrative Rocchio sketch at the end of this entry)
 Lecture guides: cover
sheet (updated 6/12/06),
Haque/Shaparenko, Krafft/Rabkin
 Lecture references:
 Victor Lavrenko and W. Bruce Croft. Relevance-based language
models. SIGIR (2001) [pdf,
pdf2]
 Stephen Robertson. On term selection for query expansion.
Journal of Documentation 46(4):359–364 (1990) [pdf]
 J. J. Rocchio. Relevance feedback in information retrieval. In
Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic
Document Processing (1971)
 E. Ide and G. Salton. Interactive search strategies and dynamic file organization in information retrieval. Ibid., pp. 373–393 (1971)
 Additional references:
 Gerard Salton and Chris Buckley. Improving retrieval performance
by relevance feedback. JASIS 41(4): 288–297 (1990). [pdf]
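 Illustrative sketch (Python): Rocchio-style feedback in the vector-space setting, moving the
query vector toward the centroid of judged-relevant documents and away from the centroid of
judged-non-relevant ones; the alpha/beta/gamma values are placeholders, and variants such as
Ide's differ in which documents enter each term.
    def centroid(vecs):
        """Average of term-weight dictionaries."""
        c = {}
        for v in vecs:
            for t, w in v.items():
                c[t] = c.get(t, 0.0) + w / len(vecs)
        return c

    def rocchio(query_vec, rel_vecs, nonrel_vecs, alpha=1.0, beta=0.75, gamma=0.15):
        new_q = {t: alpha * w for t, w in query_vec.items()}
        for t, w in centroid(rel_vecs).items():
            new_q[t] = new_q.get(t, 0.0) + beta * w
        for t, w in centroid(nonrel_vecs).items():
            new_q[t] = new_q.get(t, 0.0) - gamma * w
        return {t: w for t, w in new_q.items() if w > 0}   # negative weights are commonly dropped

    print(rocchio({"jaguar": 1.0},
                  rel_vecs=[{"jaguar": 0.5, "cat": 0.8}],
                  nonrel_vecs=[{"jaguar": 0.4, "car": 0.9}]))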
 3/2/06: Relevance feedback: further explorations and evaluation issues
 Lecture guides: Danis/Rogan, Leshed/Shami
 Lecture references:
 Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback retrieval
using modified freezing, residual collection, and test and control
groups. In G. Salton, ed., The SMART retrieval system — experiments
in automatic document processing, pp. 355–370 (1971).
 Helene Fowkes and Micheline Beaulieu. Interactive searching
behavior: Okapi experiments for TREC-8. Proceedings of the 22nd BCS-IRSG
European Colloquium on IR Research (2000).
 Ian Ruthven. Re-examining the potential effectiveness of
interactive query expansion. SIGIR, pp. 213–220 (2003). [pdf, pdf2]
 Xuehua Shen and ChengXiang Zhai. Active feedback in ad hoc
information retrieval. SIGIR, pp. 59–66 (2005). [pdf, pdf2]
 Additional references:
 Micheline Beaulieu, Helene Fowkes, Nega Alemayehu, and
Mark Sanderson. Interactive
Okapi at Sheffield — TREC-8 (1999). [pdf]
 3/7/06: Completion of relevance feedback; implicit feedback sources
 Lecture guides: Hamatake/Park
 Lecture references:
 Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing
search via automated analysis of interests and activities. SIGIR,
pp. 449–456
(2005). [pdf,
pdf2]
 Masahiro Morita and Yoichi Shinoda. Information filtering
based on user behavior analysis and best match text
retrieval. SIGIR, pp. 272–281 (1994). [pdf]
 Diane Kelly and Nicholas J. Belkin. Display time as implicit
feedback: Understanding task effects. SIGIR (2004). [pdf,
pdf2]
 Additional references:
 NIPS 2005 workshop on machine learning for implicit feedback and
user modeling [homepage]
and associated PASCAL "inferring relevance from eye movements
challenge" 2005 [homepage]
 Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference:
A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003). [pdf, pdf2]
 Kai Puolamäki, Jarkko Salojärvi, Eerika Savia, Jaana Simola, and
Samuel Kaski. Combining eye movements and collaborative filtering for
proactive information retrieval. SIGIR (2005). [pdf, pdf2]
 Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Beyond the
commons: Investigating the value of personalizing web search.
Workshop on New Technologies for Personalized Information Access (PIA)
(2005). [pdf]
A study showing that different individuals can give the same results
different relevance judgments even when they have the same information
need, that the same query can represent different information needs,
and related findings.
 3/9/06: An LM-based approach to implicit feedback (illustrative KL-divergence sketch at the end of this entry)
 Lecture guides: cover
sheet (updated 5/15/06 to correct a mistake in notation), Au/Gerner/Kot, Babinski/Lin
 Lecture references:

Xuehua Shen, Bin Tan, and
ChengXiang Zhai. Contextsensitive information retrieval using
implicit feedback. SIGIR, pp. 43–50 (2005).
[pdf,
pdf2
]
 Additional references:
 Thomas Cover and Joy A. Thomas. Elements of information theory.
Wiley (1991) [book
homepage]. Well-known introduction to the field.
 Lillian Lee. Similarity-based approaches to natural language
processing (1997). [pdf]
Section 2.3 contains some introductory material describing motivations for the use of the Kullback-Leibler (KL) divergence.
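 Illustrative sketch (Python): the Kullback-Leibler divergence whose motivations are discussed
in the Lee reference above; in the KL-divergence retrieval formula (see the Zhai notes under
the 2/21 entry), documents are ranked by how little their smoothed language models diverge
from a query model. The toy distributions below are placeholders.
    import math

    def kl_divergence(p, q):
        """D(p || q) = sum_w p(w) log(p(w) / q(w)); assumes q(w) > 0 wherever p(w) > 0."""
        return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

    query_model = {"okapi": 0.5, "model": 0.5}
    doc_model_a = {"okapi": 0.30, "model": 0.20, "the": 0.50}
    doc_model_b = {"okapi": 0.01, "model": 0.01, "the": 0.98}
    # Rank documents by increasing divergence from the query model (smaller is better).
    print(kl_divergence(query_model, doc_model_a), kl_divergence(query_model, doc_model_b))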
 3/14/06: Clickthrough data as implicit feedback: human validation
 Lecture guides: Connolly/Mujeeb, Haque/Shaparenko
 Lecture references:
 Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and
Geri Gay. Accurately interpreting clickthrough data as implicit
feedback. SIGIR, pp. 154–161 (2005). [pdf,
pdf2]
3/16/06: midterm exam (see also mid()term notes (3/28/06) and the Lecture 15 Krafft/Rabkin guide (includes brief solution
sketches for some midterm questions))
 3/28/06: Clickthrough data as relative implicit feedback (illustrative preference-pair sketch at the end of this entry)
 Lecture guides: Au/Gerner, Krafft/Rabkin (includes brief solution
sketches for some midterm questions)
 Handouts: mid()term notes
 Lecture references:
 Thorsten Joachims. Optimizing search engines using clickthrough
data. KDD, 2002 [pdf, pdf2]
 Filip Radlinski and Thorsten Joachims.
Query chains: learning to rank from implicit feedback. KDD,
pp. 239–248 (2005)
[pdf,
pdf2]
Co-winner, best student paper award.
 Additional references:
 Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin
rank boundaries for ordinal regression. Advances in Large Margin
Classifiers, pp. 115–132 (2000) [ps.gz]
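 Illustrative sketch (Python): extracting relative preference pairs from clickthrough data, of
the kind used as training constraints in the Joachims (KDD 2002) reference; the rule below
(a clicked result is preferred to unclicked results ranked above it) is one simple extraction
strategy, and the ranking and clicks are toy placeholders.
    def preference_pairs(ranking, clicked):
        """Emit (preferred, less_preferred) pairs: each clicked result over every
        unclicked result that was ranked above it."""
        pairs = []
        for i, doc in enumerate(ranking):
            if doc in clicked:
                pairs.extend((doc, other) for other in ranking[:i] if other not in clicked)
        return pairs

    print(preference_pairs(["d1", "d2", "d3", "d4"], clicked={"d3"}))
    # -> [('d3', 'd1'), ('d3', 'd2')]: constraints a learned ranking function should try to satisfy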
 3/30/06: Matrix-theoretic characterizations of corpus structure
(introduction to the singular value decomposition (SVD); illustrative SVD sketch at the end of this entry)
 Lecture guides: Danis/Rogan, Kot/Leshed

Additional references:
 Lloyd N. (Nick) Trefethen and David
Bau III. Numerical Linear Algebra (1997). [book
homepage] This book, and especially lectures one,
four
and five
(all from part I, "Fundamentals"),
should have been referenced in class: our presentation turns out to
resemble theirs, but is, not surprisingly, not nearly as nice.
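 Illustrative sketch (Python): computing the SVD of a small term-document matrix and forming
the best rank-k approximation (in the least-squares/Frobenius sense), the operation behind
the "few-factor" representations of the 4/6 lecture; the matrix and k are toy placeholders.
    import numpy as np

    # Toy term-document matrix: rows are terms, columns are documents.
    A = np.array([[2., 0., 1.],
                  [1., 1., 0.],
                  [0., 3., 1.],
                  [0., 1., 2.]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vt
    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation
    print(np.round(s, 3))
    print(np.round(A_k, 3))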
 4/4/06: The SVD
 Lecture guides: Connolly/Lin/Mujeeb, Hamatake/Park (Update, 6/12/06: corrigenda)
 Additional references: (see also previous lecture)
 Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition
(1996). [book
homepage]. The "Bible" on, well, matrix
computations. Extensive coverage of the singular value
decomposition (SVD), although certainly not as gentle an introduction
as in class.
 4/6/06: "Fewfactor" representations (LSI, pLSI, and others)
 Lecture guides: Haque/Shaparenko.
Note: In class, I incorrectly stated that the GMAT automated scoring system,
e-rater,
uses LSI; I confused two different systems. This lecture guide
corrects the error.
 Handouts: Lecture aid
 Lecture references:
 David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B.
Tenenbaum. Hierarchical topic models and the nested Chinese restaurant
process. NIPS 16 (2003). [pdf]
 Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas
K. Landauer, and Richard Harshman. Indexing by latent semantic
analysis. Journal of the American Society for Information Science
41(6):391–407 (1990). [pdf,pdf2]
 Peter W. Foltz, Darrell Laham, and Thomas
K. Landauer. Automated essay scoring: Applications to education
technology. Proceedings of ED-MEDIA (1999). [html]
 Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR,
pp. 50–57 (1999). [pdf,pdf2]
 Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem:
The Latent Semantic Analysis theory of acquisition, induction and
representation of knowledge. Psychological Review 104(2):211–240
(1997). [pdf,html,html2]
 Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by
non-negative matrix factorization. Nature 401:788–791
(1999) [pdf]

Christos H. Papadimitriou,
Hisao Tamaki,
Prabhakar Raghavan, and
Santosh Vempala. Latent semantic indexing: a probabilistic analysis.
Symposium on Principles of Database Systems, pp. 159–168
(1998)
[pdf].
Journal version: Journal of Computer and System Sciences
61(2):217–235 (2000) [ps]
 Additional references:
 Rie Kubota Ando and Lillian Lee. Iterative Residual Rescaling:
An analysis and generalization of LSI. SIGIR, pp. 154–162
(2001) [pdf,pdf2]
 Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared
Saia. Spectral analysis of data. Symposium on Theory of
Computing (STOC), pp. 619–626 (2001) [pdf,(longer?)
ps]
 Thomas Hofmann and Jan Puzicha. Statistical models for
cooccurrence data. AI Memo 1625, MIT (1998). [pdf]
 Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional
clustering of English words. Proceedings of the ACL (1993). [pdf]
 Donald E. Powers, Jill C. Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich. Stumping e-rater:
Challenging the validity of automated essay scoring (2001). [pdf]
An
interesting study in which knowledgeable humans were asked to produce
essays that would be incorrectly scored by automatic methods.
 Michael Steele and David Aldous. Introduction to the interface of
probability and algorithms. Probability and Algorithms: A Panel
Report. National Academy of Sciences (1993). [pdf]
One subsection gives a quick calculation for the expected number of
occupied tables in the Chinese restaurant process.
 Demo: score your own essay (on psycholinguistics) with LSI. [home page]
 4/11/06: Modeling syntactic structure: context-free grammars
 Lecture guides: Au, Krafft/Rabkin
 Handouts: Lecture aid
 Lecture references:
 Gerald Gazdar, Ewan Klein, Geoffrey K. Pullum, and Ivan A. Sag.
Generalized phrase structure grammar. Harvard University Press
(1985).
 Maurice Gross. Méthodes en syntaxe: régime des constructions
complétives. Hermann (1975).
 Additional references:
 James Allen. Section 5.2: Movement phenomena in language. Natural language
understanding, second edition. Benjamin/Cummings (1995).
 Christopher Culy. The complexity of the vocabulary of
Bambara. Linguistics and Philosophy 8:345–351 (1985). [Springer
link]
 Daniel Jurafsky and James H. Martin. Chapter 9: Context-free grammars for
English. Speech and language processing: An introduction to natural
language processing, computational linguistics, and speech
recognition. Prentice Hall (2000).
 Fernando C. N. Pereira and Stuart M. Shieber. Prolog and
natural-language analysis. CSLI (1987) [pdf].
Sections 4.2.3–4.2.5 and 4.3.3 (pages 102ff. of the pdf file) describe
syntactic aspects of filler-gap dependencies.
 Geoff Pullum. Footloose and context-free. Natural Language and
Linguistic Theory 4, pp. 283–289 (1986). Reprinted in The great Eskimo
vocabulary hoax, U. of Chicago Press 1991. A very fun read.
 Stuart Shieber. Evidence against the context-freeness of natural
language. Linguistics and Philosophy 8:333–343 (1985). [pdf,Springer
link]
 4/13/06: Feature-based context-free grammars; introduction to
tree-adjoining grammars
 Lecture guides: Kot/Leshed, Danis/Rogan
 Lecture references:
 Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct
grammars. Journal of Computer and System Sciences 10(1):136–163 (1975).
 Additional references:
 James Allen. Section 5.3: Handling questions in context-free grammars.
Natural language
understanding, second edition. Benjamin/Cummings (1995). Motivated by gap feature propagation in GPSG, as described in
Gazdar et al. (1985) (full citation in references for the previous
lecture).
 Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars.
In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages,
volume 3 (Beyond words), pp. 69–123
(1997). [pdf]
 Daniel Jurafsky and James H. Martin. Chapter 11: Features and unification. Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics, and Speech
Recognition. Prentice Hall (2000).
 K. Vijay-Shanker and Aravind K. Joshi. Some computational
properties of tree adjoining grammars. Proceedings of the 23rd ACL,
pp. 82–93 (1985). [pdf,pdf2]
 The XTAG project. [home
page]. Site includes a tutorial, grammars for English and Korean, a synchronous-TAG
implementation, and a supertagging demo.
 4/18/06: TAGs, continued
 Lecture guides: Connolly/Lin/Mujeeb
 Additional references: (see also above)
 Lillian Lee. "I'm sorry Dave, I'm afraid I can't do
that": Linguistics, Statistics, and Natural Language Processing
circa 2001. In Computer science: Reflections on the Field, Reflections
from the Field, National Academies Press, pp. 111–118
(2004). [pdf]
 4/20/06: TAGs for idiom analysis; adjunction constraints
 Lecture guides: Hamatake/Park
 Lecture references:
 K. Vijay-Shanker and Aravind K. Joshi. Feature structures based
tree adjoining grammars. COLING, pp. 714–719 (1988) [pdf]
 Additional references: (see also above)
 Anne Abeillé and Yves Schabes. Parsing idioms in
lexicalized TAGs. Proc. of the 4th EACL, pp. 1–9 (1989). [pdf]
 4/25/06: Algorithms for grammar parsing and learning (illustrative CKY sketch at the end of this entry)
 Lecture references:
 Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian,
Kimmen Sjölander, Rebecca C. Underwood, and David
Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic
Acids Research 22(23):5112–20 (1994). [pdf]
 David B. Searls. The linguistics of DNA. American Scientist
80(6):579–591. (1992)
 Andreas Stolcke. An Efficient Probabilistic Context-Free Parsing
Algorithm that Computes Prefix Probabilities. Computational
Linguistics 21(2):165–201 (1995). [pdf,html,techreport
ps]
 Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, Takashi Yokomori.
Tree adjoining grammars for RNA structure prediction. Theoretical
Computer Science 210(2): 277–303, special issue on genome
informatics (1999) [pdf]
 Additional references:
 Daniel Jurafsky and James H. Martin. Chapter 10: Parsing with context-free grammars. Speech and language processing: An introduction to natural
language processing, computational linguistics, and speech
recognition. Prentice Hall (2000).
 Lillian Lee. Fast context-free grammar parsing requires fast
Boolean matrix multiplication. Journal of the ACM 49(1):1–15
(2002). [pdf,pdf2]
 Giorgio Satta. Tree adjoining grammar parsing and Boolean matrix
multiplication. Computational Linguistics 20(2):173–192
(1994). [pdf]
 Giorgio Satta. Tutorial on tree adjoining grammar parsing.
Seventh international workshop on tree adjoining grammars and related
formalisms (TAG+7) (2004). [ps]
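 Illustrative sketch (Python): a bare-bones CKY recognizer for a context-free grammar in
Chomsky normal form, in the spirit of the parsing algorithms discussed here (see the
Jurafsky/Martin chapter under Additional references); the toy grammar is a placeholder.
    from itertools import product

    def cky_recognize(tokens, lexical, binary, start="S"):
        """CKY recognition for a CNF grammar. lexical maps word -> set of nonterminals;
        binary maps (B, C) -> set of parents A for rules A -> B C."""
        n = len(tokens)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(tokens):
            chart[i][i + 1] = set(lexical.get(w, set()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for B, C in product(chart[i][k], chart[k][j]):
                        chart[i][j] |= binary.get((B, C), set())
        return start in chart[0][n]

    lexical = {"the": {"Det"}, "cat": {"N"}, "saw": {"V"}}
    binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
    print(cky_recognize(["the", "cat", "saw", "the", "cat"], lexical, binary))   # True
    print(cky_recognize(["cat", "the", "saw"], lexical, binary))                 # False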
 4/27/06: The EM algorithm (illustrative toy example at the end of this entry)
 Lecture references:
 James K. Baker. Trainable grammars for speech recognition.
Speech communication papers for the 97th meeting of the Acoustical
Society of America, pp. 547–550 (1979)
 Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars
using the inside-outside algorithm. Computer Speech
and Language 4:35–56 (1990)
 Detlef Prescher. A tutorial on the Expectation-Maximization
algorithm including maximum-likelihood estimation and EM training of
probabilistic context-free grammars. [pdf]
The handout for lecture consists of pages 44–47.
 Additional references: Note that notation varies
across different presentations, with the same variable often used to
refer to different things, unfortunately.
 Adam Berger. Convexity, maximum likelihood and all that. [ps]
 Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free
grammars. Computational Linguistics 24(2):298–305 (1998). [pdf]
 Michael Collins. The EM algorithm (1997) [ps]
 Dempster, Laird, and Rubin. Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society,
B, 39(1):1–38 (1977) [ocr]
 Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm
and extensions. Second edition. Wiley (2006). [errata for first printing]
 Ted Pedersen. The EM algorithm: selected readings (2001) [pdf]
 Stefan Riezler. Getting EM to work (talk at EMNLP 2001) [ps.gz]
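 Illustrative sketch (Python): EM for about the simplest interesting case, a two-component
mixture of unigram models over a toy corpus; the E-step computes per-document posteriors over
components and the M-step re-estimates mixing weights and word distributions from expected
counts. The inside-outside algorithm of the Baker and Lari/Young references plays the
analogous role for stochastic CFGs, with chart computations supplying the expected counts.
The data, initialization, and smoothing below are placeholders.
    import math

    def normalize(counts):
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def em_mixture(docs, vocab, iters=20):
        # Slightly asymmetric initial word distributions so the two components can separate.
        comp = [normalize({w: i + j + 1 for j, w in enumerate(vocab)}) for i in range(2)]
        mix = [0.5, 0.5]
        for _ in range(iters):
            # E-step: posterior probability of each component for each document.
            post = []
            for d in docs:
                logp = [math.log(mix[z]) + sum(math.log(comp[z][w]) for w in d) for z in range(2)]
                m = max(logp)
                p = [math.exp(x - m) for x in logp]
                post.append([x / sum(p) for x in p])
            # M-step: re-estimate mixing weights and word distributions from expected counts.
            mix = [sum(p[z] for p in post) / len(docs) for z in range(2)]
            for z in range(2):
                counts = {w: 1e-6 for w in vocab}   # tiny smoothing to avoid zero probabilities
                for d, p in zip(docs, post):
                    for w in d:
                        counts[w] += p[z]
                comp[z] = normalize(counts)
        return mix, comp

    docs = [["ball", "goal", "goal"], ["goal", "ball"], ["vote", "party"], ["party", "vote", "vote"]]
    print(em_mixture(docs, ["ball", "goal", "vote", "party"]))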
 5/2/06: Conclusion of EM; introduction to maximum-entropy models (illustrative model-form sketch at the end of this entry).
 Handouts: lecture aid
 Lecture references:
 Adam Berger, Stephen Della Pietra, and Vincent Della Pietra. A
maximum entropy approach to natural language processing. Computational
Linguistics 22(1):39–71 (1996). [pdf,ps]
 Additional references:
 Adam Berger's page on "MaxEnt and exponential models"
[html]
Contains many tutorials on relevant topics.
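 Illustrative sketch (Python): the conditional exponential (maximum-entropy) model form of the
Berger/Della Pietra/Della Pietra reference, p(y | x) = exp(sum_i lambda_i f_i(x, y)) / Z(x);
the binary feature functions and weights below are toy placeholders, and parameter estimation
(e.g., iterative scaling) is not shown.
    import math

    def maxent_conditional(x, labels, features, weights):
        """p(y | x) proportional to exp(sum_i lambda_i * f_i(x, y)), normalized over the candidate labels."""
        scores = {y: math.exp(sum(lam * f(x, y) for f, lam in zip(features, weights)))
                  for y in labels}
        Z = sum(scores.values())
        return {y: s / Z for y, s in scores.items()}

    # Toy features for choosing a label given a context dictionary x.
    features = [
        lambda x, y: 1.0 if y == "dans" and x.get("next") == "le" else 0.0,
        lambda x, y: 1.0 if y == "pendant" and x.get("next") == "les" else 0.0,
    ]
    weights = [1.5, 0.8]   # placeholder lambdas; in practice fit by (generalized) iterative scaling
    print(maxent_conditional({"next": "le"}, ["dans", "en", "pendant"], features, weights))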
 5/4/06: Conclusion of maximum entropy and of the class
 Handouts: Information regarding
the final exam
 Lecture references:
 John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional
random fields: Probabilistic models for segmenting and labeling
sequence data. ICML, pp. 282–289 (2001) [pdf]