Artificial Intelligence Seminar

Fall 2003
Friday 12:00-1:15
Upson 5130

Sponsored by the Intelligent Information Systems Institute (IISI),
Computing and Information Science, Cornell

The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. Lunch will be served starting at noon, with the talks running between 12:15 and 1:15. The new format is designed to allow AI chit-chat before the talks begin. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.
Date / Speaker / Title / Abstract / Host

October 3

Machine Learning for Coreference Resolution
Vincent Ng

It is commonly observed that a human speaker or author will avoid repetition by using a variety of noun phrases (NPs) to refer to the same real-world entity. While human audiences generally have little trouble mapping a collection of NPs onto the same entity, this task of coreference resolution can present a challenge to natural language processing systems.

This talk examines coreference resolution in the context of two machine learning paradigms: supervised learning and weakly supervised learning. Supervised approaches to coreference resolution have been reasonably successful, operating primarily by training a classifier that determines whether two NPs are co-referring. In the first part of the talk, we will discuss potential problems with recasting coreference resolution as a classification task and show how to circumvent these problems via example selection and error-driven pruning of classification rulesets.

The second part of the talk will center around weakly supervised approaches to coreference resolution, in which learning proceeds by bootstrapping a large set of automatically labeled data for coreference from a very small set of labeled instances. In particular, we investigate the applicability of multi-view weakly supervised algorithms to coreference resolution, a task that does not appear to have a natural feature split.

This is joint work with Claire Cardie.
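The multi-view bootstrapping setting described above can be illustrated with a toy co-training loop in the style of Blum and Mitchell. The synthetic two-view data, confidence threshold, and choice of classifier below are all hypothetical assumptions for illustration, not details from the talk:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Toy data with two "views" per example -- the natural feature split
# that, as noted above, coreference resolution seems to lack.
n = 200
y_true = np.array([0, 1] * (n // 2))
X1 = y_true[:, None] + rng.normal(0, 0.5, (n, 2))  # view 1
X2 = y_true[:, None] + rng.normal(0, 0.5, (n, 2))  # view 2

labeled = list(range(10))                # very small labeled seed set
pseudo = {i: int(y_true[i]) for i in labeled}
unlabeled = set(range(10, n))

for _ in range(5):                       # a few bootstrapping rounds
    idx = sorted(pseudo)
    lab = np.array([pseudo[i] for i in idx])
    c1 = GaussianNB().fit(X1[idx], lab)
    c2 = GaussianNB().fit(X2[idx], lab)
    rest = sorted(unlabeled)
    if not rest:
        break
    # Each view's classifier pseudo-labels the examples it is most
    # confident about; those join the labeled pool for the next round.
    for clf, X in ((c1, X1), (c2, X2)):
        for i, proba in zip(rest, clf.predict_proba(X[rest])):
            if proba.max() > 0.95 and i in unlabeled:
                pseudo[i] = int(proba.argmax())
                unlabeled.discard(i)

final = GaussianNB().fit(X1[sorted(pseudo)],
                         [pseudo[i] for i in sorted(pseudo)])
acc = float((final.predict(X1) == y_true).mean())
```

Each round grows the automatically labeled set, which is exactly the bootstrapping step the abstract refers to; the open question is what plays the role of the two views when no natural split exists.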

October 10

Transductive Learning via Spectral Graph Partitioning
Thorsten Joachims

Consider the following type of prediction problem: you have a large database of objects (e.g., text documents) that you would like to organize according to some classification. You know the classification for a small subset of the objects and would like to infer the classes of the remaining objects. A typical example is relevance feedback in information retrieval: a user marks some documents as relevant or non-relevant and wants to find all other relevant documents in the collection. This type of machine learning task is called transductive inference. A key difference from the typical inductive machine learning problem is that the learning algorithm can observe the location of the test points. I will argue that exploiting cluster structure in the test set can lead to more accurate predictions, in particular for text classification.

However, designing transductive learning algorithms that simultaneously minimize training error and maximize a measure of clustering in the test set is challenging. In this talk, I will present an approach based on spectral graph partitioning. In this approach, objects in the database form the nodes of a graph, while the edges represent dependencies. For such graphs, I will generalize ratio-cuts to find a clustering of the database that obeys the known classifications for the training examples. The relaxation of the resulting optimization problem can be solved as an eigenvalue problem. Empirical results show improved prediction accuracy especially for small training sets. Furthermore, the framework is very flexible, making it possible to implement co-training as a special case.
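The unconstrained spectral idea can be sketched briefly: thresholding the graph Laplacian's second eigenvector (the Fiedler vector) yields a relaxed ratio-cut. The toy graph and weights below are hypothetical, and the talk's generalization to cuts constrained by known training labels is not reproduced here:

```python
import numpy as np

# Toy graph: two triangles joined by one weak edge. W is the symmetric
# affinity matrix; all weights here are hypothetical.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2),   # cluster A
             (3, 4), (3, 5), (4, 5)]:  # cluster B
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                # weak bridge between clusters

d = W.sum(axis=1)
L = np.diag(d) - W                     # unnormalized graph Laplacian

# The eigenvector of the second-smallest eigenvalue (the Fiedler
# vector) is the relaxed ratio-cut solution; thresholding it at zero
# recovers a discrete two-way partition.
vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
fiedler = vecs[:, 1]
partition = (fiedler > 0).astype(int)
```

On this graph the relaxation cuts the weak bridge, separating the two triangles, which is the cluster structure the transductive learner wants to respect.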

October 17

Using Machine Learning to Model Standard Medical Practice: Retrospective Analysis of Group C-Section Rate via Bagged Decision Trees
Rich Caruana

We used bagged probabilistic decision trees to train models to predict the probability that an expectant mother will need a C-section. The models were trained on a database of 22,175 expectant mothers, 16.8% of whom delivered by Caesarean section. The patients in the study are from 17 different physician practices. Our goal is not to predict whether a new patient will need a C-section, but to perform a retrospective analysis of the 17 physician practices to see if some practices are performing C-sections at abnormally high or low rates compared to their colleagues. To do this we train models to predict the standard practice across all the physician groups, and then use the models to evaluate the aggregate risk of C-section for the patients in each practice. If a practice has an observed C-section rate higher than what the model predicts for it, we conclude that it is performing too many C-sections. For this to work, it is critical that the models yield well-calibrated probabilities so that they do not systematically under- or over-predict on physician practices specializing in high-risk patients, i.e., practices that see patients predominantly in the tails of the risk distribution.

This is joint work with Radu Niculescu (CMU), Bharat Rao (Siemens), and Cynthia Simms MD (Magee Womens Hospital). This paper will be presented at the American Medical Informatics Association (AMIA) Meeting this November.
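As a rough sketch of this methodology, one can bag probabilistic decision trees and compare each group's observed outcome rate against the model's aggregate prediction. The synthetic data, features, thresholds, and library calls below are illustrative assumptions, not the study's actual setup:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Hypothetical synthetic stand-in for the patient database: two risk
# features, a practice id, and a binary outcome driven by true risk.
n = 3000
X = rng.normal(size=(n, 2))
practice = rng.integers(0, 17, n)
risk = 1 / (1 + np.exp(-(1.5 * X[:, 0] + X[:, 1] - 1.6)))
y = (rng.random(n) < risk).astype(int)

# Bagged probabilistic decision trees: averaging leaf frequencies over
# many trees gives smoother probability estimates than a single tree.
model = BaggingClassifier(DecisionTreeClassifier(min_samples_leaf=20),
                          n_estimators=50, random_state=0).fit(X, y)
p = model.predict_proba(X)[:, 1]

# Retrospective comparison: observed minus model-predicted rate per
# practice; a large positive gap would flag a high-rate practice.
gaps = {g: float(y[practice == g].mean() - p[practice == g].mean())
        for g in range(17)}
```

The per-practice gap is only meaningful if the probabilities are well calibrated, which is why calibration is emphasized in the abstract.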

WEDNESDAY, October 22, 10:00am

Efficient Learning in Stochastic Games
Ronen Brafman

Stochastic (or Markov) games are a rich model for multi-agent interactions. They generalize the Markov decision process model and the repeated game model. At each stage the agents play a single-shot game whose outcome determines both each agent's reward and the next stage's game.

I will describe R-max, a very simple model-based algorithm for learning in zero-sum stochastic games. R-max is the first algorithm shown to converge to near-optimal value in polynomial time. Moreover, unlike most other algorithms, it has a built-in method for trading off exploration and exploitation, which provides the first formal justification for some popular reinforcement learning heuristics.

Time permitting, I will also describe extensions to common-interest stochastic games and a new normative criterion for game-learning algorithms called an efficient learning equilibrium.
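The core R-max idea can be shown in a simplified single-agent (MDP) form rather than the zero-sum game setting of the talk: treat insufficiently visited state-action pairs as if they paid the maximal reward forever, so planning itself drives exploration. The toy MDP and all parameters below are hypothetical:

```python
import numpy as np

# Tiny deterministic two-state MDP (hidden from the learner): action 1
# in state 0 leads to state 1, where every action pays reward 1.
def step(s, a):
    if s == 0 and a == 1:
        return 1, 0.0
    if s == 1:
        return 1, 1.0
    return 0, 0.0

n_states, n_actions = 2, 2
R_MAX, K, gamma = 1.0, 5, 0.9
counts = np.zeros((n_states, n_actions))
r_sum = np.zeros((n_states, n_actions))
next_obs = np.zeros((n_states, n_actions), dtype=int)  # deterministic, so
                                                       # one successor suffices

def plan():
    # Value iteration on the empirical model; state-action pairs seen
    # fewer than K times are treated optimistically as worth R_MAX
    # forever, pulling the greedy policy toward unexplored regions.
    V = np.zeros(n_states)
    for _ in range(200):
        Q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                if counts[s, a] < K:
                    Q[s, a] = R_MAX / (1 - gamma)      # optimism
                else:
                    Q[s, a] = (r_sum[s, a] / counts[s, a]
                               + gamma * V[next_obs[s, a]])
        V = Q.max(axis=1)
    return Q

s = 0
for _ in range(100):                  # act greedily w.r.t. the plan
    a = int(plan()[s].argmax())
    s2, r = step(s, a)
    counts[s, a] += 1
    r_sum[s, a] += r
    next_obs[s, a] = s2
    s = s2
```

After a handful of steps the optimistic values lead the agent to discover and then stay in the rewarding state, with no explicit exploration schedule.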

October 31

Reasoning about Evidence
Riccardo Pucella

As soon as one wants to reason about systems containing a mix of nondeterministic actions and probabilistic actions, annoying things start to happen. A purely probabilistic analysis of such systems, the default method, often fails to capture fairly simple intuitions about them. Examples readily arise in the field of verification of probabilistic protocols, and in decision theory. It turns out that a key notion, evidence, can nicely capture these intuitions that cannot be accounted for directly by existing analyses. Intuitively, we can view evidence as a function from prior beliefs (before an observation is made) to posterior beliefs (after the observation). I formalize this notion of evidence in a propositional modal logic, present a sound and complete axiomatization of the logic, and say a word about complexity.

Joint work with Joe Halpern.
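The prior-to-posterior view of evidence can be made concrete with a small Bayesian update. The coin-flip hypotheses below are a hypothetical numeric illustration, not the talk's formal modal-logic treatment:

```python
# Two hypotheses about a coin: fair, or biased toward heads.
likelihood = {"fair": 0.5, "biased": 0.8}   # P(heads | hypothesis)

def update(prior, obs):
    """Map a prior over hypotheses to the posterior after one flip."""
    like = {h: (p if obs == "H" else 1 - p)
            for h, p in likelihood.items()}
    z = sum(prior[h] * like[h] for h in prior)   # normalizer
    return {h: prior[h] * like[h] / z for h in prior}

# The same observation transports different priors to different
# posteriors: the evidence is the mapping itself, not a fixed belief.
post_a = update({"fair": 0.5, "biased": 0.5}, "H")
post_b = update({"fair": 0.9, "biased": 0.1}, "H")
```

Seeing heads shifts a uniform prior toward "biased" while a strong "fair" prior remains mostly intact, which is exactly the function-from-beliefs-to-beliefs reading of evidence described above.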

November 7

Accelerating Random Walks
Wei Wei

In recent years, there has been much success using random walk style local search algorithms to solve combinatorial search problems, including Boolean satisfiability problems. However, such procedures are not as effective on problem classes that contain many hidden variable dependencies. We will discuss two approaches for speeding up the combinatorial search process by extracting structure from the underlying problem domain. The first method employs a general machine learning framework to discover good start states for local search methods. The second approach transforms the search space by introducing new problem constraints. These new constraints can be derived efficiently, and, literally, "accelerate" the random walk search process. 

Joint work with Bart Selman.
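A baseline WalkSAT-style random walk, of the kind these speed-up techniques build on, can be sketched as follows. This is the standard procedure, not the modified one presented in the talk; the parameters and the tiny formula are illustrative:

```python
import random

# Minimal WalkSAT-style local search for SAT. A clause is a list of
# signed integers: literal 3 means x3, literal -3 means (not x3).
def walksat(clauses, n_vars, p=0.5, max_flips=10000, seed=0):
    rnd = random.Random(seed)
    assign = {v: rnd.choice([True, False]) for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                  # all clauses satisfied
        clause = rnd.choice(unsat)         # focus on a broken clause
        if rnd.random() < p:
            var = abs(rnd.choice(clause))  # pure random-walk move
        else:
            # greedy move: flip the variable leaving fewest clauses unsat
            def breaks(v):
                assign[v] = not assign[v]
                b = sum(not any(sat(l) for l in c) for c in clauses)
                assign[v] = not assign[v]
                return b
            var = min((abs(l) for l in clause), key=breaks)
        assign[var] = not assign[var]
    return None                            # gave up

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
model = walksat([[1, 2], [-1, 3], [-2, -3]], 3)
```

The two speed-up ideas in the abstract attack this loop from outside: one learns better start states for `assign`, the other adds derived constraints so the walk escapes dependent-variable traps faster.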

November 14

On some Semantical and Computational Issues in Database Preference Queries
Carmel Domshlak

The problem of preference elicitation and management has generated much interest in the database systems community in recent years. This interest stems from a rapidly growing class of untrained, lay users browsing extremely large databases accessible through the Internet. Typically these users do not have clear knowledge about the particular items that these databases contain, nor do they typically have a particular result in mind.  Rather, they are attempting to identify items that are useful for them in some manner, or in other words, items that suit their preferences best.
 
To support such users, database systems must be able to process preference queries, i.e., queries that describe desirable properties of the end result of the search process. Such queries must be intuitive for the user to formulate, correctly interpreted by the system, computationally efficient to process, and must enable the user to quickly home in on the desired elements. The main problem is that handling preference queries extends the current capabilities of database systems. This problem has not gone unnoticed, and in the last decade the database community has begun to address it, proposing general frameworks for managing preference queries.
 
In the first part of my talk I will describe the notion of preference queries and introduce the core properties of the general frameworks proposed in the database literature to handle these queries. In the main part of the talk I will identify and discuss several semantic and computational traps that one faces when trying to concretize one of these frameworks. Fortunately, I will show that these traps can be escaped, and I will make a concrete proposal for how to do so.
 
Joint work with R. Brafman.
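One concrete instance of a preference query is the skyline (Pareto-optimal) query studied in the database literature: return every item not dominated by another on all stated criteria. The hotel example below, preferring lower price and shorter distance to the beach, is hypothetical:

```python
# Hypothetical catalog: (name, price, km to beach); the user prefers
# both attributes to be as small as possible.
hotels = [("A", 120, 2.0), ("B", 90, 5.0), ("C", 150, 0.5),
          ("D", 80, 4.0), ("E", 200, 0.3)]

def dominates(h1, h2):
    """h1 dominates h2 if it is no worse on both criteria, better on one."""
    _, p1, d1 = h1
    _, p2, d2 = h2
    return p1 <= p2 and d1 <= d2 and (p1 < p2 or d1 < d2)

# The skyline: items not dominated by any other item.
skyline = [h for h in hotels
           if not any(dominates(o, h) for o in hotels)]
```

Here hotel B is dominated by D (cheaper and closer) and drops out, while every trade-off on the Pareto frontier survives; the user then homes in on a result without stating an exact target.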

November 21

Learning Functional Dependencies via Margin Maximization
Thomas Hofmann

Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. Recent progress in machine learning has mainly focused on designing flexible and powerful input representations, for instance, via kernel functions (Support Vector Machines) or by using implicit feature selection mechanisms (AdaBoost). This has greatly increased the range of applicability of learning methods to input representations based on strings, graphs, automata, etc.

This talk will deal with the complementary problem of generalizing methods such as Support Vector Machines from binary/multiclass classification to prediction problems involving more general output spaces, e.g. problems with multiple dependent output variables, structured output spaces, and classification with class descriptors. These extensions have important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. The talk will outline the general framework and discuss some specific instantiations, most notably, hierarchical classification and label sequence learning.

November 28: Thanksgiving (no seminar)
December 5

Algorithms for Markov Random Fields in Computer Vision
Dan Huttenlocher

Markov random field (MRF) models provide a unified theoretical framework for a broad range of problems in computer vision, ranging from low-level tasks such as stereopsis and motion estimation to high-level tasks such as recognition and pose recovery. Algorithmic advances over the past few years have resulted in MRF models being of practical as well as theoretical utility. For instance, most of the current best performing stereo algorithms are based on MRF formulations. This talk will describe the use of MRF models in computer vision, and discuss some recent algorithmic advances that make these models practical for a wide range of problems.
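For chain-structured MRFs, MAP inference is exact via dynamic programming (min-sum), the idea underlying scanline stereo methods. The unary costs and smoothness penalty below are hypothetical stand-ins for a stereo data term and disparity-smoothness term:

```python
import numpy as np

# Toy chain MRF: each "pixel" picks one of a few "disparity" labels.
n_pixels, n_labels = 5, 3
rng = np.random.default_rng(2)
data_cost = rng.random((n_pixels, n_labels))  # unary (data) terms
smooth = lambda a, b: 0.4 * abs(a - b)        # pairwise smoothness penalty

# Forward pass: M[i, l] = cheapest energy of pixels 0..i ending with
# label l at pixel i; back[i, l] remembers the minimizing predecessor.
M = data_cost.copy()
back = np.zeros((n_pixels, n_labels), dtype=int)
for i in range(1, n_pixels):
    for l in range(n_labels):
        costs = [M[i - 1, k] + smooth(k, l) for k in range(n_labels)]
        back[i, l] = int(np.argmin(costs))
        M[i, l] = data_cost[i, l] + min(costs)

# Backward pass recovers the minimum-energy labeling.
labels = np.zeros(n_pixels, dtype=int)
labels[-1] = int(np.argmin(M[-1]))
for i in range(n_pixels - 1, 0, -1):
    labels[i - 1] = back[i, labels[i]]

energy = data_cost[np.arange(n_pixels), labels].sum() + sum(
    smooth(labels[i], labels[i + 1]) for i in range(n_pixels - 1))
```

On chains this is exact and linear in the number of pixels; the algorithmic advances the talk surveys are about making inference practical on the loopy 2D grids that full MRF vision models require.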

See also the AI graduate study brochure.

Please contact any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up! 


CS772, Fall '03
Claire Cardie

Rich Caruana
Joe Halpern
Thorsten Joachims
Lillian Lee
Bart Selman
Golan Yona

