Claire Cardie
Professor, Department of Computer Science
Charles and Barbara Weiss Director, Information Science
Office hours for Spring 2007:
Tuesdays 9-10am
Thursdays 3-4pm
Current research:
My primary research is in the area of natural language understanding and intelligent text processing where my goal is to develop algorithms and systems that will vastly improve a user's ability to find, absorb, and extract information from on-line text. My group's research generally proceeds at two complementary levels: we focus both on building real systems for large-scale natural language processing tasks and on developing techniques to address underlying theoretical problems in syntactic and semantic analysis of natural language. In particular, we are investigating the use of machine learning techniques as tools for guiding natural language system development and for exploring the mechanisms that underlie language understanding. Our work encompasses a number of related areas:
Currently, we are working on noun phrase coreference (within-document and cross-document), weakly supervised learning methods for NLP, and building opinion-oriented question-answering and summarization systems. For information on these and other NLP projects at Cornell, follow this link.
Some of my research has focused directly on the development of new machine learning techniques. In particular, the group's research in this area has addressed:
Identifying Expressions of Opinion in Context. Eric Breck, Yejin Choi, and Claire Cardie. To appear at the Twentieth International Joint Conference on Artificial Intelligence (IJCAI), 2007.
Joint Extraction of Entities and Relations for Opinion Recognition. Yejin Choi, Eric Breck, and Claire Cardie. Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2006.
Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning. Veselin Stoyanov and Claire Cardie. Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2006.
Toward Opinion Summarization: Linking the Sources. Veselin Stoyanov and Claire Cardie. COLING/ACL 2006 Workshop on Sentiment and Subjectivity in Text, 2006.
Using Natural Language Processing to Improve E-rulemaking. Claire Cardie, Cynthia Farina, Thomas Bruce, and Erica Wagner. Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
Better Inputs for Better Outcomes: Using the Interface to Improve e-Rulemaking. Cynthia Farina, Claire Cardie, Thomas Bruce, and Erica Wagner. Workshop on eRulemaking at the Crossroads, Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
Annotating Expressions of Opinions and Emotions in Language. Janyce Wiebe, Theresa Wilson, and Claire Cardie. Language Resources and Evaluation (formerly Computers and the Humanities), 39:2-3, 2005.
Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns. Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. Proceedings of HLT-EMNLP 2005, 2005.
Multi-Perspective Question Answering Using the OpQA Corpus. Ves Stoyanov, Claire Cardie, and Janyce Wiebe. Proceedings of HLT-EMNLP 2005, 2005.
Optimizing to Arbitrary NLP Metrics using Ensemble Selection. Art Munson, Claire Cardie, and Rich Caruana. Proceedings of HLT-EMNLP 2005, 2005.
OpinionFinder: A System for Subjectivity Analysis. Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. Proceedings of HLT/EMNLP 2005 Interactive Demonstrations, 2005. (demo)
Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus. Veselin Stoyanov, Claire Cardie, Janyce Wiebe, and Diane Litman. In Computing Attitude and Affect in Text: Theory and Practice. Shanahan, Qu, and Wiebe (eds.), Springer, 2005. Originally appeared in 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, AAAI Press, 2004. Answer annotation instructions; Question creation instructions.
Playing the Telephone Game: Determining the Hierarchical Structure of Perspective and Speech Expressions. Eric Breck and Claire Cardie. 20th International Conference on Computational Linguistics (COLING-04), 2004.
Low-Level Annotations and Summary Representations of Opinions for Multiperspective QA. Claire Cardie, Janyce Wiebe, Theresa Wilson, and Diane Litman. In Mark Maybury (ed.), New Directions in Question Answering, AAAI Press/MIT Press, 2004. (Originally appeared at the 2003 AAAI Spring Symposium on New Directions in Question Answering.)
Weakly Supervised Natural Language Learning Without Redundant Views. Vincent Ng and Claire Cardie. Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), 173–180, Association for Computational Linguistics, 2003.
Bootstrapping Coreference Classifiers with Multiple Machine Learning Algorithms. Vincent Ng and Claire Cardie. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), Association for Computational Linguistics, 2003.
Recognizing and Organizing Opinions Expressed in the World Press. Janyce Wiebe, Eric Breck, Chris Buckley, Claire Cardie, Paul Davis, Bruce Fraser, Diane Litman, David Pierce, Ellen Riloff, Theresa Wilson, David Day, and Mark Maybury. 2003 AAAI Spring Symposium on New Directions in Question Answering, 12–19, AAAI Press, 2003.
NRRC Summer Workshop on Multiple-Perspective Question Answering: Final Report. Janyce Wiebe, Eric Breck, Chris Buckley, Claire Cardie, Paul Davis, Bruce Fraser, Diane Litman, David Pierce, Ellen Riloff, and Theresa Wilson. 2002.
Improving Machine Learning Approaches to Coreference Resolution. Vincent Ng and Claire Cardie. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2002.
Identifying Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution. Vincent Ng and Claire Cardie. Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002), 2002.
Combining Sample Selection and Error-Driven Pruning for Machine Learning of Coreference Rules. Vincent Ng and Claire Cardie. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2002.
Detecting Discrepancies in Numerical Estimates Using Multidocument Hypertext Summaries. Michael White, Claire Cardie, Vincent Ng, and Daryl McCullough. Proceedings of the Second International Conference on Human Language Technology Research (HLT-02), 2002.
Selecting Sentences for Multidocument Summaries Using Randomized Local Search. Michael White and Claire Cardie. ACL Workshop on Automatic Summarization, 2002.
Limitations of Co-Training for Natural Language Learning from Large Datasets. David Pierce and Claire Cardie. Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-2001), Association for Computational Linguistics, 2001.
Constrained K-means Clustering with Background Knowledge. Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schroedl. Proceedings of the Eighteenth International Conference on Machine Learning, Morgan Kaufmann, 2001.
Multi-document Summarization via Information Extraction. Michael White, Tanya Korelsky, Claire Cardie, Vincent Ng, David Pierce, and Kiri Wagstaff. Proceedings of the First International Conference on Human Language Technology Research (HLT-01), 2001.
Detecting Discrepancies and Improving Intelligibility: Two Preliminary Evaluations of RIPTIDES. Michael White, Claire Cardie, Vincent Ng, Kiri Wagstaff, and Daryl McCullough. 2001 Document Understanding Conference (DUC-01), 2001.
User-Oriented Machine Learning Strategies for Information Extraction: Putting the Human Back in the Loop. David Pierce and Claire Cardie. Working Notes of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, 80–81, 2001.
Using Clustering and SuperConcepts within SMART: TREC 6. C. Buckley, M. Mitra, J. Walz, and C. Cardie. Information Processing and Management, 36(1), 109–131, 2000.
Examining the Role of Statistical and Linguistic Knowledge Sources in a General-Knowledge Question-Answering System. C. Cardie, V. Ng, D. Pierce, and C. Buckley. Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), 180–187, Association for Computational Linguistics / Morgan Kaufmann, 2000.
Towards Translingual Information Access Using Portable Information Extraction. M. White, C. Cardie, C. Han, N. Kim, B. Lavoie, M. Palmer, O. Rambow, and J. Yoon. Proceedings of the ANLP/NAACL Workshop on Embedded Machine Translation Systems, 31–37, 2000.
Integrating Case-Based Learning and Cognitive Biases for Machine Learning of Natural Language. C. Cardie. Journal of Experimental and Theoretical Artificial Intelligence, 11, 297–337, 1999.
The Role of Lexicalization and Pruning for Base Noun Phrase Grammars. C. Cardie and D. Pierce. Proceedings of the Sixteenth National Conference on Artificial Intelligence, 423–430, AAAI Press, 1999.
The Smart/Empire TIPSTER IR System. Chris Buckley, Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff, and Janet Walz. TIPSTER Phase III Proceedings, 107–121, Morgan Kaufmann, 1999.
SMART High Precision: TREC 7. Chris Buckley, Mandar Mitra, Janet Walz, and Claire Cardie. Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST Special Publication 500-242, 285–298, 1998.
Guest Editors' Introduction: Machine Learning and Natural Language. C. Cardie and R. Mooney. Machine Learning, 11(1–3), 1–5, 1999.
Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification. C. Cardie and D. Pierce. ACL/Coling-98, 218–224, Association for Computational Linguistics, 1998.
Using Clustering and SuperConcepts within SMART: TREC 6. C. Buckley, M. Mitra, J. Walz, and C. Cardie. Proceedings of the Sixth Text REtrieval Conference (TREC-6), NIST Special Publication 500-240, 107–124, 1998.
Proposal for an Interactive Environment for Information Extraction. C. Cardie and D. Pierce. Cornell CS Technical Report TR98-1702, 1998.
Empirical Methods in Information Extraction. C. Cardie. AI Magazine, 18:4, 65–79, 1997. [Note that this is the version of the paper BEFORE it was formatted for AI Magazine by their editors.]
Improving Minority Class Prediction Using Case-Specific Feature Weights. C. Cardie and N. Howe. Proceedings of the Fourteenth International Conference on Machine Learning, D. Fisher, editor, Morgan Kaufmann, 57–65, 1997.
Examining Locally Varying Weights for Nearest Neighbor Algorithms. N. Howe and C. Cardie. Case-Based Reasoning Research and Development: Second International Conference on Case-Based Reasoning, D. Leake and E. Plaza, eds., Lecture Notes in Artificial Intelligence, Springer, 455–466, 1997.
An Analysis of Statistical and Syntactic Phrases. M. Mitra, C. Buckley, A. Singhal, and C. Cardie. 5th RIAO Conference, Computer-Assisted Information Searching on the Internet, 200–214, 1997.
Proposal for a Framework for the High-Precision Identification of Linguistic Relationships. C. Cardie and S. Mardis. Cornell CS Technical Report TR97-1653, 1997.
Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge. C. Cardie. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 113–126, University of Pennsylvania, 1996.
Embedded Machine Learning Systems for Natural Language Processing: A General Framework. C. Cardie. In S. Wermter, E. Riloff, and G. Scheler (eds.), Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Lecture Notes in Artificial Intelligence, 315–328, Springer, 1996. Originally presented at the Workshop on New Approaches to Learning for Natural Language Processing, 14th International Joint Conference on Artificial Intelligence (IJCAI-95), 119–126, AAAI Press, 1995.
Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis. C. Cardie. Ph.D. Thesis, University of Massachusetts, Amherst, MA, 1994. Available as University of Massachusetts, CMPSCI Technical Report 94-74. (178 pages, compressed PostScript)
A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis. C. Cardie. Proceedings of the Eleventh National Conference on Artificial Intelligence, 798–803, Washington, DC, 1993. AAAI Press / MIT Press.
Using Decision Trees to Improve Case-Based Learning. C. Cardie. Proceedings of the Tenth International Conference on Machine Learning, 25–32, Amherst, MA, 1993. Morgan Kaufmann.
Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics. C. Cardie. Proceedings of the 30th Annual Conference of the Association for Computational Linguistics, 216–223, Newark, DE, 1992. Association for Computational Linguistics.
Learning to Disambiguate Relative Pronouns. C. Cardie. Proceedings of the Tenth National Conference on Artificial Intelligence, 38–43, San Jose, CA, 1992. AAAI Press / MIT Press.
Using Cognitive Biases to Guide Feature Set Selection. C. Cardie. Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 743–748, Bloomington, IN, Lawrence Erlbaum Associates, and Working Notes of the AAAI Workshop on Constraining Learning with Prior Knowledge, 11–18, San Jose, CA, 1992.
A Cognitively Plausible Approach to Understanding Complicated Syntax. C. Cardie and W. Lehnert. Proceedings of the Ninth National Conference on Artificial Intelligence, 117–124, Anaheim, CA, 1991. AAAI Press / MIT Press.
Analyzing Research Papers Using Citation Sentences. W. Lehnert, C. Cardie, and E. Riloff. Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, 511–518, Cambridge, MA, 1990. Lawrence Erlbaum Associates.