Andrew McCallum

University of Massachusetts Amherst

Information Extraction, Information Integration and Joint Inference

Advances in machine learning have enabled the research community to build fairly accurate models for individual components of a natural language processing system, such as noun phrase segmentation, named entity recognition and entity resolution. However, there has been significantly less success stitching these components together into a useful, high-accuracy end-to-end system. This is because errors cascade and compound in a pipeline: for example, six components each having 90% accuracy may have only about 50% accuracy when pipelined.
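The 50% figure follows from simple arithmetic. The sketch below is only an illustration, under two simplifying assumptions that the abstract does not state: each stage errs independently of the others, and the final output is correct only if every stage is correct. The function name pipeline_accuracy is hypothetical.

```python
from math import prod  # available in Python 3.8+


def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy of a pipeline.

    Assumes (for illustration) that stage errors are independent and that
    the output is correct only when every stage is correct.
    """
    return prod(stage_accuracies)


# Six stages, each 90% accurate: 0.9 ** 6
print(round(pipeline_accuracy([0.9] * 6), 3))  # 0.531
```

In a real pipeline errors are usually correlated, so the true end-to-end accuracy can be higher or lower than this product.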


In this talk I will describe work in probabilistic models that perform joint inference across multiple components of an information processing pipeline in order to avoid the brittle accumulation of errors. The need for joint inference appears not only in extraction and integration, but also in natural language processing, computer vision, robotics and elsewhere. I will argue that joint inference is one of the most fundamental issues in artificial intelligence.


I will present recent work in conditional random fields for information extraction and integration, with a focus on joint inference through stochastic approximations, weighted first-order logic, and new methods of probabilistic programming that enable reasoning about large-scale data.


Joint work with colleagues at UMass: Charles Sutton, Aron Culotta, Khashayar Rohanimanesh, Chris Pal, Greg Druck, Karl Schultz, Sameer Singh, Pallika Kanani, Kedar Bellare, Michael Wick, Rob Hall and Gideon Mann.


Andrew McCallum is an Associate Professor and Director of the Information Extraction and Synthesis Laboratory in the Computer Science Department at the University of Massachusetts Amherst. New work on search and bibliometric analysis of open-access research literature can be found at  McCallum's web page is


B17 Upson Hall

Tuesday, February 17, 2009

Refreshments at 3:45pm in the Upson 4th Floor Atrium

Computer Science


Spring 2009