Claire Cardie
Professor, Department of Computer Science
Charles and Barbara Weiss Director,
Information Science
Office hours (spring 2009): Mondays, 3:30-4:30pm; Wednesdays, 1:00-2:00pm.
My primary research is in the area of natural language understanding and intelligent text processing, where my goal is to develop algorithms and systems that will vastly improve a user's ability to find, absorb, and extract information from on-line text. My group's research generally proceeds at two complementary levels: we focus both on building real systems for large-scale natural language processing tasks and on developing techniques to address underlying theoretical problems in syntactic and semantic analysis of natural language. In particular, we are investigating the use of machine learning techniques as tools for guiding natural language system development and for exploring the mechanisms that underlie language understanding. Our work encompasses a number of related areas:
Currently, we are working on noun phrase coreference (within-document and cross-document), weakly supervised learning methods for NLP, and opinion-oriented question-answering and summarization systems. For information on these and other NLP projects at Cornell, follow this link.
Some of my research has focused directly on the development of new
machine learning techniques. In particular, some of the group's
research in this area has addressed:
Learning with Compositional Semantics as Structural Inference for Subsentential Sentiment Analysis. Yejin Choi and Claire Cardie. Empirical Methods in Natural Language Processing (EMNLP), 2008.
Topic Identification for Fine-Grained Opinion Analysis. Veselin Stoyanov and Claire Cardie. Proceedings of the Conference on Computational Linguistics (COLING 2008), 2008.
The Power of Negative Thinking: Exploiting Label Disagreement in the Min-cut Classification Framework. Mohit Bansal, Claire Cardie, and Lillian Lee. Proceedings of the Conference on Computational Linguistics (COLING 2008): Companion volume: Posters, 2008.
Annotating Topics of Opinions. Veselin Stoyanov and Claire Cardie. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008.
An eRulemaking Corpus: Identifying
Substantive Issues in Public Comments. Claire Cardie, Cynthia Farina, Matt Rawding, Adil Aijaz.
Proceedings of the Sixth International Conference on Language
Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008.
A Study in Rule-Specific Issue Categorization for
e-Rulemaking. Claire Cardie, Cynthia Farina, Adil Aijaz, Matt Rawding, Stephen
Purpura.
9th Annual International Conference on Digital Government
Research, Montreal, Canada, 2008.
Active Learning for e-Rulemaking:
Public Comment Categorization. Stephen Purpura, Claire Cardie, Jesse Simons.
9th Annual International Conference on Digital Government
Research, Montreal, Canada, 2008.
Structured Local Training and Biased Potential Functions for
Conditional Random Fields with Application to Coreference Resolution.
Yejin Choi and Claire Cardie.
NAACL Human Language Technology Conference (NAACL-HLT), 2007.
Identifying Expressions of Opinion in Context.
Eric Breck, Yejin Choi, and Claire Cardie.
Twentieth International Joint Conference on Artificial Intelligence (IJCAI), 2007.
Cornell System Description for the
NTCIR-6 Opinion Task. Eric Breck, Yejin Choi, Veselin Stoyanov,
and Claire Cardie.
The 6th NTCIR Workshop Meeting, Tokyo, Japan, 2007.
Joint Extraction of Entities and Relations for Opinion Recognition.
Yejin Choi, Eric Breck, and Claire Cardie.
Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2006.
Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning.
Veselin Stoyanov and Claire Cardie.
Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2006.
Toward Opinion Summarization: Linking the Sources.
Veselin Stoyanov and Claire Cardie.
COLING-ACL 2006 Workshop on Sentiment and Subjectivity in Text, 2006.
Using Natural Language Processing to Improve E-rulemaking.
Claire Cardie, Cynthia Farina, Thomas Bruce, and Erica Wagner.
Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
Better Inputs for Better Outcomes: Using the Interface to Improve e-Rulemaking.
Cynthia Farina, Claire Cardie, Thomas Bruce, Erica Wagner.
Workshop on eRulemaking at the Crossroads, Proceedings of the
7th Annual International Conference on Digital Government Research,
2006.
Annotating Expressions of Opinions and Emotions in Language. Janyce Wiebe, Theresa Wilson, Claire Cardie. Language Resources and Evaluation (formerly Computers and the Humanities), 39:2-3, 2005.
Identifying Sources of Opinions
with Conditional Random Fields and Extraction Patterns. Yejin Choi, Claire
Cardie, Ellen Riloff, and Siddharth Patwardhan.
Proceedings of HLT-EMNLP 2005, 2005.
Multi-Perspective Question
Answering Using the OpQA Corpus. Ves Stoyanov, Claire Cardie and Janyce Wiebe.
Proceedings of HLT-EMNLP 2005, 2005.
Optimizing to Arbitrary NLP Metrics using Ensemble Selection. Art Munson, Claire Cardie, and Rich Caruana. Proceedings of HLT-EMNLP 2005, 2005.
OpinionFinder: A System for Subjectivity Analysis. Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe; Yejin Choi, Claire Cardie; Ellen Riloff and Siddharth Patwardhan. Proceedings of HLT/EMNLP 2005 Interactive Demonstrations, 2005. (demo)
Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus. Veselin Stoyanov, Claire Cardie, Janyce Wiebe, and Diane Litman. In Computing Attitude and Affect in Text: Theory and Practice. Shanahan, Qu, and Wiebe (eds.), Springer, 2005. Originally appeared in 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, AAAI Press, 2004. Answer annotation instructions; Question creation instructions.
Playing the Telephone Game: Determining the Hierarchical Structure of Perspective and Speech Expressions. Eric Breck and Claire Cardie. 20th International Conference on Computational Linguistics (COLING-04), 2004.
Low-Level Annotations and
Summary Representations of Opinions
for Multiperspective QA. Claire
Cardie, Janyce Wiebe, Theresa Wilson, & Diane Litman. In Mark Maybury
(ed), New Directions in Question Answering, AAAI Press/MIT Press,
2004. (Originally appeared at the 2003 AAAI Spring Symposium
on New Directions in Question Answering.)
Weakly
Supervised Natural Language Learning Without Redundant Views. Vincent
Ng and Claire Cardie. Human Language Technology Conference of the North
American Chapter of the Association for
Computational Linguistics (HLT-NAACL 2003), 173–180, Association for
Computational Linguistics, 2003.
Bootstrapping
Coreference Classifiers with Multiple Machine Learning Algorithms. Vincent
Ng and Claire Cardie. Proceedings of the 2003
Conference on Empirical Methods in Natural Language Processing (EMNLP-2003),
Association for Computational Linguistics, 2003.
Recognizing and
Organizing Opinions Expressed in the World Press. Janyce Wiebe,
Eric Breck, Chris Buckley, Claire Cardie, Paul Davis, Bruce Fraser, Diane
Litman, David Pierce, Ellen Riloff, Theresa Wilson, David Day,
Mark Maybury. 2003 AAAI Spring Symposium on New
Directions in Question Answering, 12–19, AAAI
Press, 2003.
NRRC Summer Workshop
on Multiple-Perspective Question Answering: Final Report. Janyce Wiebe,
Eric Breck, Chris Buckley, Claire Cardie, Paul Davis, Bruce Fraser, Diane
Litman, David Pierce, Ellen Riloff, Theresa Wilson. 2002.
Improving Machine
Learning Approaches to Coreference Resolution. Vincent
Ng and Claire Cardie. Proceedings of the 40th Annual
Meeting of the Association for Computational Linguistics,
Association for Computational Linguistics, 2002.
Identifying
Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution.
Vincent Ng and Claire Cardie. Proceedings
of the 19th International Conference on Computational Linguistics
(COLING-2002), 2002.
Combining Sample Selection and Error-Driven Pruning for Machine Learning of Coreference Rules.
Vincent Ng and Claire Cardie. Proceedings
of the 2002 Conference on Empirical Methods in Natural Language Processing,
Association for Computational Linguistics, 2002.
Detecting
Discrepancies in Numerical Estimates Using Multidocument Hypertext Summaries.
Michael White, Claire Cardie, Vincent Ng, and Daryl McCullough.
Proceedings of the Second International Conference on Human
Language Technology Research (HLT-02), 2002.
Selecting Sentences
for Multidocument Summaries Using Randomized Local Search. Michael White
and Claire Cardie. ACL Workshop on Automatic Summarization, 2002.
Limitations of Co-Training for
Natural Language Learning from Large Datasets. David Pierce and
Claire Cardie.
Proceedings of the 2001 Conference on Empirical
Methods in Natural Language Processing (EMNLP-2001), Association for
Computational Linguistics Research, 2001.
Constrained K-means Clustering with
Background Knowledge. Kiri Wagstaff, Claire Cardie, Seth Rogers,
and Stefan Schroedl.
Proceedings of the Eighteenth International Conference
on Machine Learning, Morgan Kaufmann, 2001.
Multi-document Summarization via
Information Extraction. Michael White, Tanya Korelsky; Claire
Cardie, Vincent Ng, David Pierce, and Kiri Wagstaff.
Proceedings of the First International Conference on Human
Language Technology Research (HLT-01), 2001.
Detecting
Discrepancies and Improving Intelligibility: Two Preliminary Evaluations of
RIPTIDES. Michael White, Claire Cardie,
Vincent Ng, Kiri Wagstaff, and Daryl McCullough. 2001
Document Understanding Conference (DUC-01), 2001.
User-Oriented
Machine Learning Strategies for Information Extraction: Putting the Human
Back in the Loop. David Pierce and Claire Cardie.
Working Notes of the IJCAI-2001 Workshop on Adaptive Text
Extraction and Mining, pages 80-81,
2001.
Using
Clustering and SuperConcepts within SMART: TREC 6. C.
Buckley, M. Mitra, J. Walz, and C. Cardie. Information
Processing and Management,
36(1), 109–131, 2000.
Examining the Role of Statistical
and Linguistic Knowledge Sources in a General-Knowledge
Question-Answering System. C. Cardie, V. Ng, D. Pierce, and
C. Buckley. Proceedings of the Sixth Applied Natural Language
Processing Conference (ANLP-2000), 180--187, Association for
Computational Linguistics / Morgan Kaufmann, 2000.
Towards Translingual Information Access
Using Portable Information Extraction. M. White, C. Cardie, C. Han,
N. Kim, B. Lavoie, M. Palmer, O. Rambow, J. Yoon. Proceedings of
the ANLP/NAACL Workshop on Embedded Machine Translation Systems,
31--37, 2000.
Integrating Case-Based Learning and
Cognitive Biases for Machine Learning of Natural Language.
C. Cardie. Journal of Experimental and Theoretical
Artificial Intelligence, 11, 297--337, 1999.
The Role of Lexicalization and Pruning
for Base Noun Phrase Grammars. C. Cardie and
D. Pierce. Proceedings of the Sixteenth National Conference on
Artificial Intelligence, 423-430, AAAI Press, 1999.
The Smart/Empire TIPSTER IR
System. Chris Buckley, Claire Cardie, Scott Mardis, Mandar Mitra,
David Pierce, Kiri Wagstaff, and Janet Walz. TIPSTER Phase III
Proceedings, 107--121, Morgan Kaufmann, 1999.
SMART High Precision: TREC 7. Chris
Buckley, Mandar Mitra, Janet Walz, and Claire Cardie. Proceedings of
the Seventh Text REtrieval Conference (TREC-7), NIST Special
Publication 500-242, 285-298, 1998.
Guest Editors' Introduction:
Machine Learning and Natural Language. C. Cardie and
R. Mooney. Machine Learning, 11:(1-3), 1--5, 1999.
Error-Driven Pruning of Treebank
Grammars for Base Noun Phrase Identification. C. Cardie and
D. Pierce. ACL/Coling-98, 218--224. Association for
Computational Linguistics, 1998.
Using Clustering and SuperConcepts within SMART: TREC 6.
C. Buckley, M. Mitra, J. Walz, and C. Cardie.
Proceedings of the Sixth Text REtrieval Conference (TREC-6),
NIST Special Publication 500-240, 107-124, 1998.
Proposal
for an Interactive Environment for Information Extraction.
C. Cardie and D. Pierce. Cornell CS Technical Report
TR98-1702, 1998.
Empirical Methods in Information
Extraction. C. Cardie. AI Magazine, 18:4,
65--79, 1997. [Note that this is the version of the paper BEFORE it was
formatted for AI Magazine by their editors.]
Improving Minority Class Prediction
Using Case-Specific Feature Weights. C. Cardie and
N. Howe. Proceedings of the Fourteenth International Conference on
Machine Learning, D. Fisher, editor, Morgan Kaufmann, 57--65,
1997.
Examining Locally Varying Weights for
Nearest Neighbor Algorithms. N. Howe and
C. Cardie. Case-Based Reasoning Research and Development: Second
International Conference on Case-Based Reasoning, D. Leake and
E. Plaza, eds., Lecture Notes in Artificial Intelligence, Springer,
455-466, 1997.
An Analysis of Statistical and
Syntactic Phrases. M. Mitra, C. Buckley, A. Singhal, and
C. Cardie. 5th RIAO Conference, Computer-Assisted Information
Searching On the Internet, 200-214, 1997.
Proposal
for a Framework for the High-Precision Identification of Linguistic
Relationships. C. Cardie and S. Mardis. Cornell CS Technical
Report TR97-1653, 1997.
Automating Feature Set Selection for Case-Based Learning of
Linguistic Knowledge. C. Cardie. Proceedings of the Conference
on Empirical Methods in Natural Language Processing, 113-126, University of Pennsylvania,
1996.
Embedded Machine Learning Systems for Natural Language Processing: A
General Framework.
C. Cardie. In S. Wermter, E. Riloff, and
G. Scheler (eds.), Connectionist, Statistical and
Symbolic Approaches to Learning for Natural Language Processing,
Lecture Notes in Artificial Intelligence, 315-328, Springer,
1996. Originally presented at the Workshop on New Approaches to
Learning for Natural Language Processing, 14th International Joint
Conference on Artificial Intelligence (IJCAI-95), 119-126,
1995. AAAI Press.
Domain-Specific Knowledge Acquisition for Conceptual
Sentence Analysis.
C. Cardie. Ph.D. Thesis, University of Massachusetts, Amherst, MA,
1994. Available as University of Massachusetts, CMPSCI Technical Report
94-74. (178 pages, compressed postscript)
A Case-Based Approach to Knowledge Acquisition for
Domain-Specific Sentence Analysis.
C. Cardie. Proceedings of the Eleventh National Conference on Artificial
Intelligence, 798-803, Washington, DC, 1993. AAAI Press /
MIT Press.
Using Decision Trees to Improve Case-Based Learning.
C. Cardie. Proceedings of the Tenth International Conference on Machine
Learning, 25-32, Amherst, MA, 1993. Morgan Kaufmann.
Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics.
C. Cardie. Proceedings of the 30th Annual Conference of the Association for
Computational Linguistics, 216-223, Newark, DE, 1992. Association for
Computational Linguistics.
Learning to Disambiguate Relative Pronouns.
C. Cardie. Proceedings of the Tenth National Conference on Artificial
Intelligence, 38-43, San Jose, CA, 1992. AAAI Press / MIT Press.
Using Cognitive Biases to Guide Feature Set Selection.
C. Cardie. Proceedings of the Fourteenth Annual Conference of the Cognitive
Science Society, 743-748, Bloomington, IN, Lawrence Erlbaum
Associates, and Working Notes of the AAAI Workshop on
Constraining Learning with Prior Knowledge, 11-18, San Jose, CA,
1992.
A Cognitively Plausible Approach to
Understanding Complicated Syntax.
C. Cardie and W. Lehnert. Proceedings of the Ninth National Conference on Artificial
Intelligence, 117-124, Anaheim, CA, 1991. AAAI Press / MIT Press.
Analyzing Research Papers Using Citation Sentences.
W. Lehnert, C. Cardie, and E. Riloff. Proceedings of the Twelfth Annual Conference of the Cognitive
Science Society, 511-518, Cambridge, MA, 1990. Lawrence Erlbaum
Associates.