Claire Cardie

Assistant Professor
cardie@cs.cornell.edu

Ph.D., University of Massachusetts, Amherst, 1994
The availability of on-line corpora is rapidly changing the field of natural language processing as researchers recognize the power of exploiting a corpus to develop text processing techniques that can handle the flexible and complex use of language in real-world text. Our research focuses primarily on knowledge-based approaches for understanding natural language texts, but spans a number of areas including knowledge acquisition, machine learning, and case-based reasoning. The work has been guided by two fundamental assumptions. First, successful natural language processing systems will ultimately require access to large amounts of semantic, syntactic, pragmatic, and often domain-specific parsing knowledge. Second, if building systems that understand open-ended text is to become practical, those systems must incorporate learning components that acquire this knowledge.

In recent work, we have developed Kenmore, a general framework for domain-specific knowledge acquisition for conceptual sentence analysis. The framework combines symbolic machine learning techniques with robust sentence analysis, and requires only minimal human intervention. In addition, it uniformly addresses a range of problems in sentence analysis, each of which has traditionally required a separate computational mechanism. Thus far, Kenmore has been used with corpora from two real-world domains to (1) perform part-of-speech tagging, word sense tagging, and concept tagging of all open-class words in the corpus; (2) acquire heuristics for part-of-speech disambiguation, word sense disambiguation, and concept activation; and (3) find the antecedents of relative pronouns.

In current research, we continue to investigate the use of machine learning techniques as tools for guiding natural language system development and for exploring the mechanisms that underlie language acquisition. This work includes (1) extending the Kenmore framework to allow a natural language system to acquire increasingly complicated knowledge structures, e.g., semantic networks, mental models, rules, plans, and scripts; (2) replacing the human supervision now required in Kenmore's training phase with unsupervised learning techniques and statistical methods; and (3) developing NLP systems that make more direct use of a user's information needs.

Selected Publications