Claire Cardie

Assistant Professor

PhD Univ. of Mass., Amherst, 1994

My research focuses on developing corpus-based techniques for understanding and extracting information from natural language texts. In particular, we are investigating the use of machine learning techniques as tools for guiding natural language system development and for exploring the mechanisms that underlie language acquisition. Our work encompasses three related areas: (1) machine learning of natural language, (2) the use of corpus-based natural language processing (NLP) techniques to aid information retrieval (IR) systems, and (3) the design of user-trainable NLP systems that can efficiently and reliably extract the important information from a document.

In recent work with Ph.D. students Scott Mardis and David Pierce, we have developed a new approach to partial parsing of natural language texts that relies on machine learning methods. The approach combines corpus-based grammar induction with a very simple pattern-matching algorithm and an optional constituent verification step. In evaluations on a number of large-scale partial parsing tasks, the approach produces partial parsers that are both fast and accurate. 

We are also working with SaBIR Research to develop a unified approach to improving the end-user efficiency of state-of-the-art information retrieval systems. In particular, we are working on methods that combine statistical and linguistic text analysis for near-duplicate document detection, high-precision text retrieval, query-dependent text summarization, and cross-document text summarization. 


Awards
  • NSF Career Award (1996-2000)

University Activities 
  • Selection Committee: Computer Science Department Chair; Engineering College Assoc.
    Dean for Undergraduate Programs; Cognitive Studies Summer Fellowships; Cognitive
    Studies Continuing Fellowships; Cognitive Studies Incoming Fellowship 
  • Faculty Recruiting Committee: Department of Computer Science 
Professional Activities
  • Editorial Board: Machine Learning, 1999-2001. 
  • Editor: Special Issue of Machine Learning Journal on Natural Language Learning 11 (1-3) (with R. Mooney).
  • Program committees: Sixteenth International Conference on Machine Learning; 37th Annual
    Meeting of the Association for Computational Linguistics; Topic Detection Session, 37th
    Annual Meeting of the Assoc. for Computational Linguistics; Fifteenth National Conference
    on Artificial Intelligence  
  • Executive Board: SIGDAT, Special Interest Group of ACL for Linguistic Data and
    Corpus-based approaches to NLP 
  • NSF Review Panel: Human-Computer Interaction, 1999
  • Panelist: 1999 AAAI/SIGART Doctoral Consortium, a mentoring workshop for Ph.D. 
    students in Artificial Intelligence  
  • Board of Advisors: GirlWideWeb, an internet magazine for girls 9 to 15
Lectures
  • Symbolic machine learning for natural language processing (tutorial). 37th Annual Meeting
    of the Association for Computational Linguistics, Univ. of Maryland, June 1999 (with
    R. Mooney).
  • Combining error-driven pruning and classification for partial parsing. Sixteenth International
    Conference on Machine Learning, Bled, Slovenia, June 1999.  
  • Empire and SMART in the TRUESmart Interface. TIPSTER Text-Processing Initiative 24-Month Workshop, Baltimore, MD, October 1998. 


Publications
  • Guest editors' introduction: Machine learning and natural language. Machine Learning
    11, 1-3 (Feb. 1999), 1-5 (with R. Mooney).
  • Integrating case-based learning and cognitive biases for machine learning of natural language. Journal of Experimental and Theoretical Artificial Intelligence 11 (1999), 1-41. 
  • Clustering and Super Concepts within SMART: TREC 6. Information Processing and
    Management 35 (1999) (with C. Buckley, M. Mitra, and J. Walz).
  • Noun phrase coreference as clustering. Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (1999), 82-89 (with K. Wagstaff).
  • Combining error-driven pruning and classification for partial parsing. Proceedings of the
    Sixteenth International Conference on Machine Learning (1999), 87-96 (with S. Mardis
    and D. Pierce).
  • The Smart/Empire TIPSTER IR System. TIPSTER Phase III Proceedings, Morgan Kaufmann (1999) (with C. Buckley, S. Mardis, M. Mitra, D. Pierce, K. Wagstaff, and J. Walz).
  • SMART high precision: TREC 7. Proceedings of the Seventh Text Retrieval Conference (TREC-7) (1999), 285-298 (with C. Buckley, M. Mitra, and J. Walz).
  • Error-driven pruning of treebank grammars for base noun phrase identification. Proc. Ann. Conf. Assoc. Computational Linguistics and COLING-98, Assoc. for Computational Linguistics (1998), 218-224 (with D. Pierce).