PhD Univ. of Mass., Amherst, 1994
My research focuses on developing corpus-based
techniques for understanding and extracting
information from natural language texts. In particular,
we are investigating the use of machine learning
techniques as tools for guiding natural language
system development and for exploring the
mechanisms that underlie language acquisition. Our
work encompasses three related areas: (1) machine
learning of natural language, (2) the use of natural
language processing (NLP)
techniques to aid information retrieval (IR) systems,
and (3) the design of user-trainable NLP systems that
can efficiently and reliably extract the important
information from a document.
In recent work with Ph.D. students Scott Mardis and
David Pierce, we have developed a new approach to
partial parsing of natural language texts that relies on
machine learning methods. The approach combines
corpus-based grammar induction with a very simple
pattern-matching algorithm and an optional constituent
verification step. In evaluations on a number of
large-scale partial parsing tasks, the approach
produces partial parsers that are both fast and
We are also working with SaBIR Research to
develop a unified approach to improving the end-user
efficiency of state-of-the-art information retrieval
systems. In particular, we are working on methods
that combine statistical and linguistic text analysis for
near-duplicate document detection, high-precision
text retrieval, query-dependent text summarization,
and cross-document text summarization.
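The partial parsing approach described above (corpus-based grammar induction followed by a very simple pattern matcher) can be illustrated with a minimal sketch. It is not the implementation used in this work: the toy corpus, the tag names, and the function names `induce_grammar` and `parse` are hypothetical, and the pruning and constituent verification steps are omitted. The sketch assumes that each training sentence comes with part-of-speech tags and bracketed base noun phrases.

```python
from collections import Counter

# Hypothetical toy training corpus: (POS tags, base-NP spans as [start, end)).
TRAIN = [
    (["DT", "NN", "VBD", "IN", "DT", "NN"], [(0, 2), (4, 6)]),
    (["PRP", "VBD", "DT", "JJ", "NN"], [(0, 1), (2, 5)]),
]


def induce_grammar(corpus):
    """Grammar induction: record every tag sequence bracketed as a base NP."""
    rules = Counter()
    for tags, np_spans in corpus:
        for start, end in np_spans:
            rules[tuple(tags[start:end])] += 1
    return rules


def parse(tags, rules):
    """Pattern matching: scan left to right, taking the longest rule that fits."""
    max_len = max((len(rule) for rule in rules), default=0)
    spans, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + n]) in rules:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no rule starts here; move to the next token
    return spans


rules = induce_grammar(TRAIN)
print(parse(["DT", "JJ", "NN", "VBD", "DT", "NN"], rules))  # [(0, 3), (4, 6)]
```

Because the matcher only looks up tag sequences in a table, parsing time grows roughly linearly with sentence length for a fixed maximum rule length, which is one plausible reason such a parser can be fast.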
- Selection Committee: Computer Science
Department Chair; Engineering College Assoc.
Dean for Undergraduate Programs; Cognitive Studies Summer Fellowships; Cognitive
Studies Continuing Fellowships; Cognitive
Studies Incoming Fellowship
- Faculty Recruiting Committee: Department of
- Editorial Board: Machine
- Editor: Special Issue of
Machine Learning Journal on
Natural Language Learning
11 (1-3) (with R. Mooney).
- Program committees: Sixteenth
International Conference on
Machine Learning; 37th Annual
Meeting of the Association
for Computational Linguistics;
Topic Detection Session, 37th
Annual Meeting of the
Assoc. for Computational
Linguistics; Fifteenth National Conference
on Artificial Intelligence
- Executive Board: SIGDAT,
Special Interest Group of ACL
for Linguistic Data and
Corpus-based approaches to
- NSF Review Panel:
- Panelist: 1999 AAAI/SIGART
Doctoral Consortium, a
mentoring workshop for Ph.D.
students in Artificial Intelligence
- Board of Advisors:
GirlWideWeb, an internet
magazine for girls 9 to 15
- Symbolic machine learning for natural language
processing (tutorial). 37th Annual Meeting
of the Association for
Computational Linguistics, Univ.
of Maryland, June 1999 (with
- Combining error-driven pruning
and classification for partial
parsing. Sixteenth International
Conference on Machine
Learning, Bled, Slovenia, June
- Empire and SMART in the
Initiative 24-Month Workshop,
Baltimore, MD, October 1998.
- Guest editors' introduction:
Machine learning and natural
language. Machine Learning
11, 1-3 (Feb. 1999), 1-5 (with
- Integrating case-based learning
and cognitive biases for machine
learning of natural language.
Journal of Experimental and Theoretical Artificial
Intelligence 11 (1999), 1-41.
- Clustering and Super Concepts
within SMART: TREC 6. Information
Management 35 (1999) (with
C. Buckley, M. Mitra, and J.
- Noun phrase coreference as clustering. Proceedings
of the Joint Conference on Empirical Methods in Natural Language
Processing and Very Large Corpora (1999), 82-89 (with K. Wagstaff).
- Combining error-driven pruning
and classification for partial parsing.
Proceedings of the
Sixteenth International Conference on
Machine Learning (1999), 87-96 (with S.
Mardis and D. Pierce).
- The Smart/Empire TIPSTER IR System.
TIPSTER Phase III Proceedings, Morgan
Kaufmann (1999) (with C. Buckley, S.
Mardis, M. Mitra, D. Pierce, K. Wagstaff, and J. Walz).
- SMART high precision: TREC 7. Proceedings of the Seventh Text Retrieval
Conference (TREC-7) (1999),
285-298 (with C. Buckley, M. Mitra, and J.
- Error-driven pruning of treebank grammars
for base noun phrase identification. Proc.
Ann. Conf. Assoc. Computational
Linguistics and COLING-98, Assoc. for Computational
Linguistics (1998), 218-224 (with D. Pierce).