Artificial Intelligence Seminar

Spring 2004
Friday 12:00-1:15
Upson 5130

Sponsored by the Intelligent Information Systems Institute (IISI),
Computing and Information Science, Cornell

The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. Lunch will be served starting at noon, with the talks running between 12:15 and 1:15. The new format is designed to allow AI chit-chat before the talks begin. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.
Date

Title/Speaker/Abstract/Host

January 30

Updating Probabilities
Joe Halpern

As examples such as the Monty Hall puzzle show, applying conditioning to update a probability distribution on a "naive space", which does not take into account the protocol used, can often lead to counterintuitive results. In this talk, I examine why. A criterion known as CAR ("coarsening at random") in the statistical literature characterizes when "naive" conditioning in a naive space works. I show that the CAR condition holds rather infrequently, and provide a procedural characterization of it, by giving a randomized algorithm that generates all and only distributions for which CAR holds. This substantially extends previous characterizations of CAR. If time permits, I will also consider more generalized notions of update such as Jeffrey conditioning and minimizing relative entropy (MRE), give a generalization of the CAR condition that characterizes when Jeffrey conditioning leads to appropriate answers, and show that there exist some very simple settings in which MRE essentially never gives the right results. This generalizes and interconnects previous results obtained in the literature on CAR and MRE.

This is joint work with Peter Grunwald.
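The Monty Hall discrepancy the abstract alludes to can be checked by enumeration. The sketch below (mine, not from the talk) contrasts "naive" conditioning, which only rules out the opened door, with conditioning in a space that models the host's protocol:

```python
from fractions import Fraction

# Protocol-aware space: enumerate (car, door_opened) worlds. The host
# opens a goat door other than the contestant's pick (door 1), choosing
# uniformly when two such doors are available.
worlds = []  # (car, opened, probability)
for car in (1, 2, 3):
    p_car = Fraction(1, 3)
    options = [d for d in (1, 2, 3) if d != 1 and d != car]
    for opened in options:
        worlds.append((car, opened, p_car / len(options)))

# Condition on the observation "host opened door 3".
obs = [(car, p) for car, opened, p in worlds if opened == 3]
total = sum(p for _, p in obs)
p_stay = sum(p for car, p in obs if car == 1) / total    # car behind pick
p_switch = sum(p for car, p in obs if car == 2) / total  # car behind door 2

# "Naive" conditioning on the naive space {car=1, car=2, car=3} simply
# rules out door 3, leaving doors 1 and 2 equally likely:
p_naive = Fraction(1, 2)
```

Here protocol-aware conditioning gives 1/3 for staying and 2/3 for switching, while naive conditioning gives 1/2 for each; the gap is exactly the kind of counterintuitive result the talk examines.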

February 6

No AI-Seminar - no room

February 13

Learning Models for Object Recognition with the Hausdorff Distance
Pedro F. Felzenszwalb

We consider learning models for object recognition from examples.  Our method is motivated by systems that use the Hausdorff distance as a shape comparison measure.  Typically an object is represented in terms of a model shape.  A new shape is classified as being an instance of the object when the Hausdorff distance between the model and the new shape is small.  We show that such object concepts can be seen as halfspaces (linear threshold functions) in a transformed input space. This makes it possible to use a number of standard algorithms to learn object models from training examples.  When a good model exists, we are guaranteed to find one that provides (with high probability) a recognition rule that is accurate.  Moreover, our approach provides a measure which generalizes the Hausdorff distance in interesting ways. To demonstrate our method we trained a system to detect people in images and a system that can recognize hand-written digits.  The learning techniques can be extended to represent objects using multiple model shapes.  In this way, we might be able to automatically learn a small set of canonical shapes that characterize the appearance of an object.
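For reference, the symmetric Hausdorff distance that motivates the method can be written down directly for finite point sets; this is the standard definition only, not the learned generalization described in the talk:

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets.

    The directed distance from X to Y is the largest distance from a
    point of X to its nearest neighbor in Y; the symmetric distance is
    the maximum of the two directed distances.
    """
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))
```

In a system of the kind the abstract describes, a new shape would be classified as an instance of the object when `hausdorff(model, shape)` falls below some threshold; the talk's contribution is learning the model (and a generalized measure) from examples rather than fixing it by hand.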

February 20

The Computational Cogs of Cognitive Grammar
Peter Brodsky

Approaches to Natural Language Processing that concern themselves with syntax (such as CFG-based models) generally assume that an understanding of syntactic structure will provide a roadmap for interpreting the semantic values of linguistic input.  However, systems that incorporate semantic features (such as HPSG) often shift the burden of disambiguation and interpretation to pragmatics, and from there to unnamed cognitive mechanisms that potentially integrate non-linguistic knowledge.  A similar process of deferral occurs in the field of vision, where it is often argued that low-level information, such as line detection, is processed and made sense of by some higher-level system.  Approaches that rely on such unspecified mechanisms for a full account of the stimulus, whether in the realm of vision, language, or another modality, have enjoyed success that is inverse to the openness of the domain in which they are deployed.  Even if a mosaic of approaches could be built to cover a very large domain (where each component covers a specific subset of that domain), the problem of identifying when to use which component must be of essentially equal complexity to the initial problem of handling the given domain.  The problem at hand, I believe, is that of meaning.  This talk will describe a model of language whose goal is to render a representation of meaning rich enough to be converted back into text of similar or equivalent meaning to the text from which the representation was derived. The talk will detail the system's method of representing meaning and meaning's spatial, qualitative and quantitative facets.

February 27

End-to-End Learning of Visual Categories
Yann LeCun

Machine learning and statistical modeling are at the core of many recent advances in data mining, forecasting, biological data analysis, information retrieval, and human-computer interfaces. Despite these successes, many of the classical "grand challenges" of AI, such as the automatic detection and recognition of objects in natural images, have been paradoxically neglected until very recently by the Machine Learning community as well as by the Computer Vision community.

Arguably, all the difficulties are combined in the problem of recognizing generic object categories (e.g. picking out people, animals, cars, trucks, or airplanes in natural images), independently of the particular instance, the pose, the illumination, and surrounding objects and textures: The input dimension is overwhelming (pixels), the intra-class variabilities are extremely complex (variation of appearance due to shape, pose, and illumination), the decision surfaces are highly non-linear, and the necessary number of training samples very large.

Cracking this problem will require solving the problem of learning invariant representations. It will require building very large learning systems composed of multiple heterogeneous modules with millions of adjustable parameters, trained on millions of examples so as to optimize a global performance measure. End-to-end training of such systems from raw pixels to object categories requires new ways of integrating heterogeneous trainable modules such as object detectors, segmenters, feature extractors, object recognizers, and models of composite objects, so that they can be trained cooperatively.

We will first describe a general methodology with which to construct such large-scale heterogeneous learning machines. We will then show that training such systems can be seen as the process of minimizing the average difference between two quantities that are analogous to the Free Energy in statistical physics.

We will report results of generic object recognition experiments with various popular architectures using the NORB dataset. This dataset comprises stereo image pairs of 50 uniform-colored toys under 18 azimuths, 9 elevations, and 6 lighting conditions. The objects belong to 5 generic categories (with 10 instances for each): four-legged animals, human figures, airplanes, trucks, and cars. The systems were trained on 5 instances of each object and tested on the 5 different instances. These experiments demonstrate the advantages of the convolutional neural net architecture (a biologically-inspired model of vision with good invariance properties) and the limitations of template-matching-based architectures such as the ever popular Support Vector Machines. They also point out the relative merit of "deep" versus "shallow" architectures.

We will briefly describe a check reading system, trained with a discriminative Free Energy Difference criterion, that combines convolutional networks, trainable graph manipulation modules, and stochastic language models. This system is integrated in several commercial recognition engines, and currently processes an estimated 10% to 20% of all the checks written in the US with record accuracy.

We will show several live demonstrations of convolutional nets that recognize handwriting, detect human faces in images, and classify generic objects independently of pose and lighting.

Parts of this work are joint with Fu Jie Huang (NYU) and Leon Bottou (NEC Labs).

March 5

Budgeted Learning of Naive-Bayes Classifiers
Russ Greiner

There is often a cost associated with acquiring training data. We consider the situation where the learner, with a fixed budget, can only see data that it has 'purchased' during training. Our first results deal with the simplest such problem: We are given a set of independent systems each of which, when queried, returns a reward, drawn from its specific Bernoulli distribution (think "tossing a coin"). Our goal is to identify which of these systems has the largest expected reward. To do this, we sequentially specify which system to query, subject to the constraint that we can only perform a fixed total number of queries. After explaining how this task differs from many related ideas, including active learning and standard bandit problems, we address the computational complexity of the problem, propose a number of algorithms, and report both the approximation properties of these algorithms and their empirical performance on various problem instances. Our results show that some novel algorithms (including "biased-robin") can significantly out-perform the obvious algorithms, such as round robin and random. We next use these results to learn the parameters of a naive-Bayes classifier, subject to the same type of budget constraints. We again consider a number of algorithms, both familiar and novel, and again show empirically that some of our novel algorithms can often out-perform the standard ones.

See http://www.cs.ualberta.ca/~madani/BudgetedLearningPage.html. This is joint work with Dan Lizotte and Omid Madani.
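As an illustration (not the authors' code), the budgeted Bernoulli setting can be sketched with two allocation policies. The "biased-robin" rule below, which stays with an arm while it pays off and advances to the next arm after a failure, is one plausible reading of the algorithm named in the abstract; the outcome streams are fixed lists here to keep the sketch deterministic:

```python
def round_robin(outcome_streams, budget):
    """Query arms cyclically; return the index of the arm with the
    highest posterior mean under a uniform Beta(1,1) prior."""
    n = len(outcome_streams)
    succ, fail = [1] * n, [1] * n   # Beta(1,1) pseudo-counts
    streams = [iter(s) for s in outcome_streams]
    for t in range(budget):
        i = t % n
        if next(streams[i]):
            succ[i] += 1
        else:
            fail[i] += 1
    return max(range(n), key=lambda i: succ[i] / (succ[i] + fail[i]))

def biased_robin(outcome_streams, budget):
    """Stay on the current arm after a success; move to the next arm
    after a failure. (An assumed reading of "biased-robin".)"""
    n = len(outcome_streams)
    succ, fail = [1] * n, [1] * n
    streams = [iter(s) for s in outcome_streams]
    i = 0
    for _ in range(budget):
        if next(streams[i]):
            succ[i] += 1
        else:
            fail[i] += 1
            i = (i + 1) % n
    return max(range(n), key=lambda k: succ[k] / (succ[k] + fail[k]))
```

On a stream where arm 0 always fails and arm 1 always succeeds, biased-robin spends only one query on the bad arm before concentrating its budget on the good one, which is the behavior that lets it beat round robin when queries are scarce.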

March 12

(Smart) Look-Ahead Arc Consistency ...and the Complexity of Constraint Satisfaction
Hubie Chen

Constraint satisfaction problems arise in a wide variety of domains, such as combinatorics, logic, algebra, database theory, and artificial intelligence. An instance of the constraint satisfaction problem (CSP) consists of a set of variables and a set of constraints on those variables; the goal is to decide whether or not there is an assignment to the variables satisfying all of the constraints.
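As a concrete (exponential-time) baseline, assuming a finite CSP encoded as per-variable domains plus allowed-tuple relations, the decision problem can be settled by brute force; tractable restrictions such as CSP(B) for well-behaved constraint languages B are interesting precisely because they avoid this enumeration:

```python
from itertools import product

def csp_satisfiable(domains, constraints):
    """Decide a finite CSP by exhaustive search.

    domains: list of domains, one per variable (indexed 0..n-1).
    constraints: list of (scope, relation) pairs, where scope is a
    tuple of variable indices and relation is a set of allowed tuples.
    """
    for assignment in product(*domains):
        if all(tuple(assignment[v] for v in scope) in relation
               for scope, relation in constraints):
            return True
    return False
```

For example, with the disequality relation this encodes Graph H-Colorability: a single edge is 2-colorable, a triangle is not.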

The CSP is NP-complete in general, motivating the search for polynomial-time tractable cases of the CSP. A particularly useful way to restrict the CSP in order to obtain tractable cases is to require the "constraint language" B to be fixed; denote this restriction by CSP(B). This form of restriction, which goes back to the classic 1978 paper of Schaefer, can capture and place into a unified framework many particular cases of the CSP that have been independently investigated -- for instance, the Horn Satisfiability, 2-Satisfiability, and Graph H-Colorability problems.

In recent years, much effort has been directed towards the fundamental research program of classifying the complexity of CSP(B) for all constraint languages B; substantial and impressive progress has been made along these lines, in part due to new connections between CSP complexity and universal algebra that were uncovered in the nineties. A central component of the CSP(B) classification program is to identify those constraint languages B such that CSP(B) is tractable. The acquisition of CSP(B) tractability results has, almost exclusively, proceeded by isolating an easily describable condition believed to guarantee tractability, and then demonstrating an algorithm that decides CSP(B) for all B satisfying the condition.

In this work, we introduce a radically different approach to the acquisition of CSP(B) tractability results. Instead of taking as our starting point a well-characterized class of constraint languages, we begin with an algorithm for constraint satisfaction, and prove that this algorithm is well-characterized. In particular, we introduce a polynomial time algorithm which we call look-ahead arc consistency (LAAC), and show that those constraint languages B for which LAAC decides CSP(B) are exactly those B satisfying a simple algebraic criterion.

It is our hope that this work will inspire further research devoted to giving algebraic characterizations of algorithms similar to our characterization, and that such research will stimulate an interplay between our new approach of studying well-characterized algorithms, and the classical approach of studying well-characterized problem restrictions.

Joint work with Victor Dalmau.

March 19

Conditional Random Fields for Shallow Parsing and Information Extraction
Fernando Pereira

Conditional random fields (CRFs) are a recently developed method for sequence labeling that offers advantages over both generative models like HMMs and classifiers applied at each sequence position. I will introduce CRFs and discuss their application to shallow parsing and information extraction. For shallow parsing, I show that CRFs achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. For information extraction, I will discuss ongoing work on identifying gene and mutation mentions in text for an NSF-funded project on information extraction from biomedical text. Our results rely on improved training methods based on modern optimization algorithms and a new feature induction algorithm by Andrew McCallum. I will conclude with a discussion of ongoing work on improved regularization for CRF models.

Joint work with Ryan McDonald and Fei Sha (University of Pennsylvania), and Hanna Wallach (Cambridge University), in collaboration with Andrew McCallum and his group at the University of Massachusetts. Partial funding from NSF (EIA 0205448 and 0205456) and DARPA (SRI contract NBCHD030010).

March 26

No AI-Seminar - Spring Break

April 2

Sentiment Analysis
Bo Pang
April 9

Rhodes 484

Robust Activity Recognition
Henry Kautz

In the early days of the field of artificial intelligence, researchers often explicitly strove to develop systems that could understand a broad range of everyday human experience. As the field matured, research concentrated on better-defined, more limited domains. In the Assisted Cognition project we are returning to this original vision, armed with the modern toolkit for probabilistic inference and a new generation of ubiquitous sensing technology. I will describe progress in learning and tracking activities of daily living in the home and patterns of transportation use throughout a community, in the context of practical applications for assistive healthcare technology. Finally, I will mention some surprising new connections between probabilistic inference and classical methods for logical reasoning.

April 16

The Language-modeling Approach to Ad-hoc Information Retrieval
Oren Kurland

April 23

Learning, Logic, and Probability: a Unified View
Pedro Domingos

AI systems must be able to learn, reason logically, and handle uncertainty. While much research has focused on each of these goals individually, only recently have we begun to attempt to achieve all three at once. In this talk I will describe Markov logic, a representation that combines the full power of first-order logic and probabilistic graphical models, and algorithms for learning and inference in it. Syntactically, Markov logic is first-order logic augmented with a weight for each formula. Semantically, a set of Markov logic formulas represents a probability distribution over possible worlds, in the form of a Markov network with one feature per grounding of a formula in the set, with the corresponding weight. Formulas and weights are learned from relational databases using inductive logic programming and iterative optimization of a pseudo-likelihood measure. Inference is performed by Markov chain Monte Carlo over the minimal subset of the ground network required for answering the query. Experiments in a real-world university domain illustrate the promise of this approach.

(Joint work with Matt Richardson.)
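The semantics described above can be illustrated with a toy example (mine, not the speaker's): a single weighted formula over a one-constant domain, giving a Markov network whose world probabilities are proportional to exp(weight x number of satisfied groundings):

```python
import math
from itertools import product

# Toy Markov logic network over one constant A with two ground atoms,
# Smokes(A) and Cancer(A), and a single weighted formula (the weight
# 1.5 is arbitrary, chosen for illustration):
#     1.5 : Smokes(A) => Cancer(A)
w = 1.5

def n_satisfied(world):
    """Number of groundings of the formula satisfied in this world
    (here there is only one grounding)."""
    smokes, cancer = world
    return 1 if (not smokes) or cancer else 0

worlds = list(product([False, True], repeat=2))   # all 4 possible worlds
unnorm = {wld: math.exp(w * n_satisfied(wld)) for wld in worlds}
Z = sum(unnorm.values())                          # partition function
prob = {wld: unnorm[wld] / Z for wld in worlds}

# The one world violating the formula, (Smokes(A), not Cancer(A)),
# is e^1.5 times less likely than each world satisfying it.
```

Raising the weight makes the formula behave more like a hard first-order constraint; at weight 0 it has no effect, which is the sense in which Markov logic interpolates between pure logic and pure probability.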

April 30

Toward Unified Models of Information Extraction and Data Mining
Andrew McCallum

Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually not aware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from, or its inherent uncertainties. The result is that the accuracy of both suffers, and significant mining of complex text sources is beyond reach.

In this talk I will describe work in relational, undirected graphical models for information extraction and data mining. After briefly introducing conditional random fields, I will describe three pieces of recent work that make small steps toward joint models that make more unified decisions: (1) a random field method for noun co-reference resolution that has strong ties to graph partitioning, (2) an extension of CRFs to factorial state representation, enabling simultaneous part-of-speech tagging and noun-phrase segmentation, (3) an example of improving co-reference by modeling uncertainty about segmentation, and improving segmentation by leveraging co-reference.

Joint work with colleagues at UMass: Ben Wellner, Khashayar Rohanimanesh, Charles Sutton, Michael Hay, Fuchun Peng, David Pinto, Xing Wei, Wei Li, and Bruce Croft.

May 7

Sow's Ears, Silk Purses, and Supervised Learning
Rich Caruana

Supervised learning has made tremendous strides in the last 30 years, and we now have a dozen different supervised learning algorithms from which to choose. But this progress has brought confusion: each algorithm optimizes a different thing, no one algorithm dominates the others, and it's difficult to say in advance which algorithm will perform best on each problem and performance metric. 

In this talk we present results from a massive empirical comparison of supervised learning methods. We compare the learning methods using a variety of performance criteria such as accuracy, precision/recall, ROC area, lift, squared error, calibration, and cross entropy, making this one of the more comprehensive empirical studies of supervised learning algorithms ever performed. The results of the study are surprising: it *is* possible to make a silk purse out of a sow's ear, the silk can then be transformed into gold, and that gold can then be transformed into platinum --- yielding a very expensive, but rather stylish, bag.

Joint work with Alex Niculescu.

See also the AI graduate study brochure.

Please contact tj@cs.cornell.edu or any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up! 


CS772, Spring '04
Claire Cardie

Rich Caruana
Carla Gomes
Joe Halpern
Dan Huttenlocher
Thorsten Joachims
Lillian Lee
Bart Selman
Golan Yona
Ramin Zabih
