Claire Cardie
Assistant Professor
PhD Univ. of Massachusetts, Amherst, 1994


The availability of on-line text is rapidly changing the field of natural language processing as researchers recognize the power of exploiting a corpus to develop text-processing techniques that can handle the flexible and complex use of language in real-world text. Our research focuses primarily on corpus-based approaches for understanding and extracting information from natural language texts, but it spans a number of areas, including knowledge acquisition, machine learning, and case-based reasoning. The work has been guided by two fundamental assumptions. First, successful natural language processing systems will ultimately require access to large amounts of semantic, syntactic, pragmatic, and often domain-specific parsing knowledge. Second, if building natural language processing systems that understand open-ended text is to become practical, those systems must incorporate learning components to acquire this knowledge.

In recent work, we have developed Kenmore, a general framework for domain-specific knowledge acquisition for conceptual sentence analysis. The framework combines symbolic machine learning techniques, robust sentence analysis, and minimal human intervention to allow the natural language processor (NLP) to learn the knowledge it needs to process a text. In addition, it uniformly addresses a range of problems in sentence analysis, each of which has traditionally required a separate computational mechanism. Thus far, Kenmore has been used with corpora from two real-world domains to learn solutions to a number of problems in sentence analysis, including part-of-speech tagging, semantic feature tagging, concept activation, and relative clause attachment.

In current research, we continue to investigate the use of machine learning techniques as tools for guiding natural language system development and for exploring the mechanisms that underlie language acquisition. This work includes (1) extending the Kenmore framework to handle additional linguistic phenomena, (2) extending Kenmore to allow a natural language system to acquire increasingly complicated knowledge structures directly from text, e.g., semantic networks, mental models, rules, plans, and scripts, and (3) developing information extraction systems that make more direct use of a user's information needs.




Return to:
1994-1995 Annual Report Home Page
Departmental Home Page



Last modified: 24 November 1995 by Denise Moore (denise@cs.cornell.edu).