Claire Cardie
Assistant Professor
cardie@cs.cornell.edu
Ph.D., University of Massachusetts, Amherst, 1994
The availability of on-line corpora is rapidly changing the field of
natural language processing as researchers recognize the power of
exploiting a corpus to develop text processing techniques that can
handle the flexible and complex use of language in real-world text.
Our research focuses primarily on knowledge-based approaches for
understanding natural language texts, but spans a number of areas
including knowledge acquisition, machine learning, and case-based
reasoning. The work has been guided by two fundamental assumptions.
First, successful natural language processing systems ultimately will
require access to large amounts of semantic, syntactic, pragmatic, and
often domain-specific parsing knowledge. Second, if building natural
language processing systems to understand open-ended text is to become
practical, those systems must incorporate learning components to
acquire this knowledge.
In recent work, we have developed Kenmore, a general framework for
domain-specific knowledge acquisition for conceptual sentence
analysis. The framework combines symbolic machine learning techniques
and robust sentence analysis, and requires only minimal human
intervention. In addition, it uniformly addresses a range of problems
in sentence analysis, each of which has traditionally required a
separate computational mechanism. Thus far, Kenmore has been used with
corpora from two real-world domains (1) to perform part-of-speech
tagging, word sense tagging, and concept tagging of all open-class
words in the corpus; (2) to acquire heuristics for part-of-speech
disambiguation, word sense disambiguation, and concept activation; and
(3) to find the antecedents of relative pronouns.
In current research, we continue to investigate the use of machine
learning techniques as tools for guiding natural language system
development and for exploring the mechanisms that underlie language
acquisition. This work includes (1) extending the Kenmore framework to
allow a natural language system to acquire increasingly complicated
knowledge structures, e.g., semantic networks, mental models, rules,
plans, and scripts; (2) replacing the human supervision now required
in Kenmore's training phase with unsupervised learning techniques and
statistical methods; and (3) developing NLP systems that make more
direct use of a user's information needs.
Selected Publications
- Lehnert, W., C. Cardie, D. Fisher, J. McCarthy, E. Riloff, and S.
Soderland. Evaluating an Information Extraction System. Journal
of Integrated Computer-Aided Engineering, vol. 1, number 6, 1994.
- Cardie, C. A Case-Based Approach to Knowledge Acquisition for
Domain-Specific Sentence Analysis. Proceedings of the Eleventh
National Conference on Artificial Intelligence, 798-803, Washington,
DC, 1993. AAAI Press / MIT Press.
- Cardie, C. Using Decision Trees to Improve Case-Based Learning.
Proceedings of the Tenth International Conference on Machine
Learning, 25-32, Amherst, MA, 1993. Morgan Kaufmann.
- Cardie, C. Using Cognitive Biases to Guide Feature Set Selection.
Proceedings of the Fourteenth Annual Conference of the Cognitive
Science Society, 743-748, Bloomington, IN, Lawrence Erlbaum
Associates, and Working Notes of the AAAI Workshop on
Constraining Learning with Prior Knowledge, 11-18, San Jose, CA,
1992.
- Cardie, C. Corpus-Based Acquisition of Relative Pronoun
Disambiguation Heuristics. Proceedings of the 30th Annual
Meeting of the Association for Computational Linguistics, 216-223,
Newark, DE, 1992. Association for Computational Linguistics.
- Cardie, C. Learning to Disambiguate Relative Pronouns.
Proceedings of the Tenth National Conference on Artificial
Intelligence, 38-43, San Jose, CA, 1992. AAAI Press / MIT Press.
- Cardie, C. and W. Lehnert. A Cognitively Plausible Approach to
Understanding Complicated Syntax. Proceedings of the Ninth
National Conference on Artificial Intelligence, 117-124, Anaheim,
CA, 1991. AAAI Press / MIT Press.