Claire Cardie, Assistant Professor
cardie@cs.cornell.edu
http://www.cs.cornell.edu/home/cardie/cardie.html
Ph.D. University of Massachusetts, Amherst, 1994


My research focuses primarily on corpus-based approaches to understanding and extracting information from natural language texts, but it spans a number of areas, including machine learning, case-based reasoning, and knowledge acquisition. Although current natural language processing (NLP) systems cannot yet perform in-depth text understanding, they can read an arbitrary text and summarize its major events, provided those events fall within a particular domain of interest (e.g., stories about natural disasters or terrorist events). To understand the texts, NLP systems rely heavily on handcrafted linguistic knowledge as well as handcrafted knowledge about the domain and about the world in general. Unfortunately, encoding this background knowledge into the system is difficult, time-consuming, and error-prone, and it invariably requires the expertise of computational linguists familiar with the underlying system.

To avoid these difficulties, we have developed a general knowledge acquisition framework, Kenmore, in which natural language processing systems can begin to bootstrap their own knowledge bases directly from the text. The framework, which combines robust partial parsing and machine learning techniques, essentially allows the NLP system to learn the knowledge it needs to process a text. Thus far, Kenmore has been used with corpora from two real-world domains for part-of-speech tagging, word-sense tagging, concept activation, and relative pronoun resolution.

We continue to investigate the use of machine learning techniques as tools for guiding natural language system development and for exploring the mechanisms that underlie language acquisition. This work includes: (1) extending Kenmore to handle additional knowledge acquisition tasks for NLP, e.g., pronoun resolution; (2) extending Kenmore to handle the task of extracting entire knowledge bases, e.g., a rule base, directly from text; and (3) improving the performance of the system by allowing linguistic and cognitive biases to influence our corpus-based approach to learning linguistic knowledge.


Awards


University Activities

Professional Activities

Lectures

Publications


Personal



Last modified: 1 November 1996 by Denise Moore (denise@cs.cornell.edu).