Lillian Lee: Research Interests

Now, a few words on looking for things. When you go looking for something specific, your chances of finding it are very bad. Because of all the things in the world, you're only looking for one of them. When you go looking for anything at all, your chances of finding it are very good. Because of all the things in the world, you're sure to find some of them.
--The Zero Effect, written and directed by Jake Kasdan

The goal of natural language processing and information retrieval (I do not subscribe to the view that these are different fields) is, in a broad sense, to enable computers to use human language as a (one-way or two-way) communication medium accurately, robustly, and gracefully [a short, breezy history]. My research interests are in the empirical and theoretical problems that arise in the pursuit of this goal.

Because of the subtleties of human language, high-performance language-capable systems cannot be developed without access to high-quality linguistic and domain knowledge. Unfortunately, the process of gathering and encoding such information by hand is typically tedious and time-consuming; furthermore, the task often requires the aid of human experts. These factors give rise to the knowledge acquisition bottleneck, widely recognized to be one of the biggest obstacles to building sound, general-purpose natural language processing systems.

A major focus of my work has been to create knowledge-lean methods (the term is due to Rebecca Bruce and Ted Pedersen) for overcoming the knowledge acquisition bottleneck. I have developed algorithms that allow a system to automatically acquire linguistic and domain knowledge directly from text samples; in keeping with the "knowledge-lean" idea (and in contrast to supervised machine learning techniques), I have concentrated on approaches that work on essentially raw text rather than human-annotated data.

A sampling of areas I've worked on:

See my papers page for a full list and more information.
Some of the data from previous experiments is available.

Doing this work really just involved following the lead of my students and postdocs, who have been spectacular. My debt to them is unbounded.


Lillian Lee's home page
Cornell Natural Language Processing Group