Claire Cardie, Assistant Professor.
- 4124 Upson Hall
- Phone: 607-255-9206
- Fax : 607-255-4428
- Email: cardie@cs.cornell.edu
Program chair, Second
Conference on Empirical Methods in Natural Language Processing
(EMNLP-2)
Research in the Natural Language Processing (NLP) group at
Cornell focuses primarily on developing corpus-based techniques for
understanding and extracting information from natural language
texts. In current work, we are investigating the use of machine
learning techniques as tools for guiding natural language system
development and for exploring the mechanisms that underlie language
acquisition. Our work encompasses three related areas: (1) the machine
learning of natural language, (2) the use of corpus-based NLP
techniques to aid information retrieval systems, and (3) the design of
user-trainable systems that can efficiently and reliably extract the
important information from a document.
- The Kenmore Project.
The focus of the Kenmore project is on developing techniques to
automate the knowledge acquisition tasks involved in building
any NLP system. Very generally, Kenmore acquires linguistic knowledge
using a combination of symbolic machine learning techniques and robust
sentence analysis. It has been used with corpora from two real-world
domains to perform part-of-speech tagging, semantic feature tagging,
and concept activation, and to find the antecedents of relative
pronouns. In current work, we are extending Kenmore to handle larger
text corpora and additional disambiguation tasks. In all of our work,
we evaluate the language learning components in the context of the
larger NLP application in which they are embedded. The goal of the
project is to determine the conditions under which machine learning
techniques can be expected to offer a cost-effective approach to
knowledge acquisition for NLP systems. This work is funded under NSF
CAREER Award IRI-9624639.
- High-Precision Information Retrieval.
We are also working with SaBIR Research and Cornell's
Information Retrieval (IR) group to develop a unified approach to
improving the end-user efficiency of state-of-the-art text retrieval
systems. The underlying technology of our research is a novel
combination of statistical and linguistic approaches to text analysis
in which a trainable, high-precision partial parser is used to
recognize those linguistic relationships that are most important for
the larger IR system. We are applying the approach to three
distinct IR tasks: near-duplicate document detection, high-precision
text retrieval, and query-dependent text summarization. This work is
funded under the DARPA-sponsored TIPSTER initiative.
- Information Extraction.
As part of Cornell's CSTR project, we are using information extraction
techniques to support content-based browsing of technical texts.
- Embedded Machine Learning Systems for Natural Language Processing: A
General Framework,
C. Cardie. In S. Wermter, E. Riloff,
and G. Scheler (eds.), Connectionist, Statistical and
Symbolic Approaches to Learning for Natural Language Processing,
Lecture Notes in Artificial Intelligence, 315-328, Springer,
1996. Originally presented at the Workshop on New Approaches to
Learning for Natural Language Processing, 14th International Joint
Conference on Artificial Intelligence (IJCAI-95), 119-126,
1995. AAAI Press.
- Chapter 1 (Introduction), Ph.D. Thesis,
C. Cardie. Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis,
Ph.D. Thesis, University of Massachusetts, Amherst, MA,
1994. Note that this file contains just the introductory chapter of the thesis.
- Domain-Specific Knowledge Acquisition for Conceptual
Sentence Analysis,
C. Cardie. Ph.D. Thesis, University of Massachusetts, Amherst, MA,
1994. Available as University of Massachusetts, CMPSCI Technical Report
94-74. (178 pages)
- Using Cognitive Biases to Guide Feature Set Selection,
C. Cardie. Proceedings of the Fourteenth Annual Conference of the Cognitive
Science Society, 743-748, Bloomington, IN, Lawrence Erlbaum
Associates, and Working Notes of the AAAI Workshop on
Constraining Learning with Prior Knowledge, 11-18, San Jose, CA,
1992.
- Analyzing Research Papers Using Citation Sentences,
W. Lehnert, C. Cardie, and E. Riloff. Proceedings of the Twelfth Annual Conference of the Cognitive
Science Society, 511-518, Cambridge, MA, 1990. Lawrence Erlbaum
Associates.