Now, a few words on looking for things. When you go looking for something specific, your chances of finding it are very bad. Because of all the things in the world, you're only looking for one of them. When you go looking for anything at all, your chances of finding it are very good. Because of all the things in the world, you're sure to find some of them.
--Zero Effect, written and directed by Jake Kasdan
Thou shalt not answer questionnaires
Or quizzes upon World-Affairs,
Nor with compliance
Take any test. Thou shalt not sit
With statisticians nor commit
A social science.
--Under Which Lyre: A Reactionary Tract for the Times, W. H. Auden
The goal of natural language processing and information retrieval (I do not subscribe to the view that these are different fields) is, in a broad sense, to enable computers to use human language as a (one-way or two-way) communication medium accurately, robustly, and gracefully [a short, breezy history]. My research interests are in the empirical and theoretical problems that arise in the pursuit of this goal.
Because of the subtleties of human language, high-performance language-capable systems cannot be developed without access to high-quality linguistic and domain knowledge. Unfortunately, the process of gathering and encoding such information by hand is typically tedious and time-consuming; furthermore, the task often requires the aid of human experts. These factors give rise to the knowledge acquisition bottleneck, widely recognized to be one of the biggest obstacles to building sound, general-purpose natural language processing systems.
A major focus of my work has been to create knowledge-lean methods for overcoming the knowledge acquisition bottleneck. I have developed algorithms that allow a system to automatically acquire linguistic and domain knowledge directly from text samples; in keeping with the “knowledge-lean” idea (and in contrast to supervised machine learning techniques), I have concentrated on approaches that work on essentially raw text rather than human-annotated data.
A sampling of the areas I've worked on:
Doing this work really just involved following the lead of my students and postdocs, who have been spectacular. My debt to them is unbounded.
Papers on these topics.
Lillian Lee's home page
Cornell Natural Language Processing Group