1998 - 1999 CS Annual Report                                                                  Faculty
Lillian Lee

Assistant Professor

PhD Harvard, 1997

My primary research focus is on statistical methods for natural language processing, with particular interest in
problems arising from sparse data.  

Recent work has investigated the power of similarity-based techniques to improve probability estimation, the fundamental technology underlying any statistical approach. I am currently interested in analyzing similarity functions, both theoretically and empirically; preliminary results include the development of a novel family of information-theoretic functions and a new analysis framework for similarity functions in general. In addition, in continuing work on both nearest-neighbor techniques and clustering methods, Fernando Pereira and I have been moving towards an understanding of the relationships between these two complementary paradigms.

In other work, Rie Ando and I have been developing empirical methods for segmenting Japanese, which lacks space delimiters between words. Our algorithms rely neither on a dictionary nor on pre-segmented training data, but only on simple statistics drawn from unannotated text. Preliminary results are very promising: we are achieving error rates far below those of morphological analyzers across a variety of performance metrics.

Awards
  • College of Engineering Teaching Award, 1998-1999
University Activities 
  • Chair: Computer Science colloquium series.  
  • Member: Field of Cognitive Studies. 
Professional Activities  
  • Reviewer: Computer Speech and Language  
  • Reviewer: Natural Language Engineering 
  • NSF review panel  
  • Program Committees: 37th Annual Meeting of the Association for Computational
    Linguistics (ACL 99) (reviewer); Fourth Conference on Empirical Methods in Natural Language Processing/Very Large Corpora (EMNLP/VLC '99); ACL-99 Workshop on Unsupervised Learning in Natural Language Processing; Student Abstract and Poster Program, Sixteenth National Conference on Artificial Intelligence (reviewer) 
Lectures
  • Statistical methods in natural language processing (Four-hour tutorial). Fifteenth National
    Conference on Artificial Intelligence, Madison, Wisconsin, 1998 (with J. Lafferty).
  • Unsupervised segmentation of Japanese. Invited talk. ACL Workshop on Unsupervised
    Learning in Natural Language Processing, Univ. of Maryland, 1999. 
Publications
  • Similarity-based models of word co-occurrence probabilities. Machine Learning 34 (1999), 43-69. Special Issue on Natural Language Learning (with I. Dagan and F. Pereira).
  • Measures of distributional similarity. 37th Annual Meeting of the Association for Computational Linguistics (1999), 25-32.  
  • Distributional similarity models: Clustering vs. nearest neighbors. 37th Annual Meeting of the Association for Computational Linguistics (1999), 33-40 (with F. Pereira).