Artificial Intelligence Seminar

Fall 2007
Friday 12:00-1:15
Upson 5130

Sponsored by the Intelligent Information Systems Institute (IISI),
Computing and Information Science, Cornell

The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. Lunch will be served starting at noon, with the talks running between 12:15 and 1:15. The new format is designed to allow AI chit-chat before the talks begin. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.
 

Date

Title/Speaker/Abstract/Host

August 31 Active Exploration for Learning to Rank
Filip Radlinski, Cornell University

Search engine logs are a well recognized source of training data for learning to rank. However, previous work has only considered logs collected passively. We show that an active exploration strategy that actively selects rankings to show can provide training data that leads to faster learning. Specifically, we develop a Bayesian approach for selecting rankings to present users so that interactions result in more informative training data. Our results using the TREC-10 Web corpus, as well as on synthetic data, demonstrate that a directed exploration strategy quickly leads to users being presented improved rankings in an online learning setting. We find that active exploration substantially outperforms passive observation and random exploration.

This is joint work with Thorsten Joachims.

Host: Thorsten Joachims

September 7 Computational Workflows...
Yolanda Gil, University of Southern California

Computational workflows have emerged as an important paradigm in large-scale, distributed scientific investigation. Specifying sequences of tasks, as well as the mapping of these tasks to the underlying computational environment, these workflows are used to coordinate the thousands of distributed operations that may be required to obtain a scientific result from raw experimental or simulation data. I will present our results to date on using Artificial Intelligence (AI) techniques to assist users in specifying workflows using domain-relevant descriptions and to automate the generation of executable workflows that can be submitted to distributed resources. In a recent application for seismic hazard analysis, our Wings/Pegasus workflow system exploits semantic representations and AI planning techniques to generate workflows of more than 8,000 computations, create more than 100,000 data products with automatically generated rich metadata descriptions, and manage the execution of the workflow for a total of 1.9 CPU years.

Drawing from these experiences, I will discuss the potential of stronger synergies between AI and computational workflows. I will propose a variety of AI techniques that are relevant to current challenges in computational workflows, including dynamic self-configuration, interactive steering, continuous and robust operations, and performance optimization. I will also discuss recent work on applying computational workflows to Artificial Intelligence as a scientific domain, and present our new work on large-scale integrative machine learning and natural language processing.

This work is in collaboration with researchers from the Center for Grid Technologies as well as the Intelligent Systems group at USC/ISI, and dozens of worldwide collaborators from a variety of scientific domains.

BIO:  Dr. Yolanda Gil is Associate Division Director at the Information Sciences Institute of the University of Southern California, and Research Associate Professor in the Computer Science Department. She received her M.S. and Ph.D. degrees in Computer Science from Carnegie Mellon University. Dr. Gil leads a group that conducts research on various aspects of Interactive Knowledge Capture. Her research interests include intelligent user interfaces, knowledge-rich problem solving, scientific and grid computing, and the semantic web. An area of recent interest is large-scale distributed data analysis through computational workflows. Dr. Gil was Program Chair of the Intelligent User Interfaces (IUI) Conference in 2002, was co-founder and co-chair of the First International Conference on Knowledge Capture (K-CAP) in 2001, and was program co-chair of the International Semantic Web Conference (ISWC) in 2005. She was elected to the Council of the American Association for Artificial Intelligence (AAAI), and was program co-chair of the AAAI conference in 2006. She serves on the Advisory Committee of the Computer Science and Engineering Directorate of the National Science Foundation.

Host: Carla Gomes

September 14 Reverse-Engineering Nonlinear Systems
Mike Schmidt, Cornell University

Many branches of science and engineering represent dynamical systems mathematically as sets of differential equations, derived laboriously from basic principles and through experimentation. We are developing new approaches to reverse-engineer the analytical differential equations of a dynamical system automatically. We use genetic programming techniques to perturb and destabilize the system to reveal its hidden characteristics and to infer nonlinear symbolic relationships between variables. Our research has shown the ability to infer a seven-variable cell glycolysis system directly from data (the largest system inferred automatically to date), and to recover from unexpected damage in autonomous robots through continuous self-modeling. Our focus is to advance this approach to operate in high-noise and limited-observability environments, where manual methods for modeling are most overwhelmed.
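As a toy illustration of the idea of searching over symbolic equations, a minimal mutation-based hill-climber over expression trees might look like the sketch below. This is not the speakers' actual algorithm (which uses full genetic programming with populations, crossover, and perturbation-based fitness); the operator set and all function names are assumptions.

```python
import random

# Binary operators available to candidate equations.
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_expr(depth, rng):
    """Grow a random expression tree over the variable 'x' and constants."""
    if depth == 0 or rng.random() < 0.3:
        return 'x' if rng.random() < 0.5 else rng.uniform(-2, 2)
    op = rng.choice(list(OPS))
    return (op, random_expr(depth - 1, rng), random_expr(depth - 1, rng))

def evaluate(expr, x):
    """Evaluate an expression tree at a point x."""
    if expr == 'x':
        return x
    if isinstance(expr, tuple):
        op, a, b = expr
        return OPS[op](evaluate(a, x), evaluate(b, x))
    return expr  # numeric constant leaf

def fitness(expr, data):
    """Sum of squared errors of the candidate equation on (x, y) data."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data)

def mutate(expr, rng):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(2, rng)
    op, a, b = expr
    if rng.random() < 0.5:
        return (op, mutate(a, rng), b)
    return (op, a, mutate(b, rng))

def search(data, iters, rng):
    """Hill-climb: keep a mutation only if it reduces the error."""
    best = random_expr(3, rng)
    best_f = fitness(best, data)
    for _ in range(iters):
        cand = mutate(best, rng)
        f = fitness(cand, data)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f

# Example: search for y = x^2 from samples (the search is stochastic and
# may or may not find the exact form in a short run).
data = [(x / 2.0, (x / 2.0) * (x / 2.0)) for x in range(-6, 7)]
best, err = search(data, 500, random.Random(3))
```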

Host: Hod Lipson

September 21 McRank: Learning to Rank Using Classification and Gradient Boosting
Ping Li, Cornell University Statistics

Relevance ranking is critical in building commercial search engines. We cast the ranking problem as (1) multiple classification ("Mc") and (2) multiple ordinal classification, both of which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores, and that DCG errors are bounded by classification errors. We propose using the "Expected Relevance" to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on very large-scale datasets show that our approach can improve on current state-of-the-art rankers, including RankNet/LambdaRank, RankBoost, RankSVM, and a regression-based ranker, in terms of the (normalized) DCG scores (NDCG). An efficient implementation of the boosting tree algorithm is also presented. Some of these results will also be presented at NIPS 2007.
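The DCG measure and the "Expected Relevance" conversion mentioned in the abstract can be sketched in a few lines. This is a minimal sketch, not the speakers' code: the (2^rel − 1) gain and log2 discount are the common conventions, and the function names and toy probabilities are mine.

```python
import math

def dcg(relevances, k=10):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

def expected_relevance(class_probs):
    """Convert per-document class probabilities P(rel = k) into one
    ranking score: E[rel] = sum_k k * P(rel = k)."""
    return sum(k * p for k, p in enumerate(class_probs))

# Rank documents by expected relevance derived from a classifier's output.
probs = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
scores = [expected_relevance(p) for p in probs]
ranking = sorted(range(len(probs)), key=lambda i: -scores[i])
```

The point of the conversion is that a multiple classifier's probabilistic output induces a total order on documents, so classification accuracy translates directly into ranking quality under DCG.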

Host: Thorsten Joachims

September 28 Fields of Experts: High-order Markov Random Field Models of Natural Scenes
Michael J. Black, Department of Computer Science, Brown University http://www.cs.brown.edu/people/black/

We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for Bayesian inference in a variety of machine vision tasks. The approach provides a practical method for learning high-order Markov Random Field (MRF) models with potential functions that extend over large pixel neighborhoods. These high-order models significantly increase the expressive power of MRFs. The key insight involves modeling the MRF potentials using a Products-of-Experts framework that exploits non-linear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field of Experts (FoE) model with two example applications in archival film restoration: image denoising and image inpainting. Film grain noise in archival films is particularly challenging because it varies with image intensity, is non-Gaussian, and is spatially correlated. While the FoE model is trained on a generic image database, and is not tuned toward a specific application, we obtain results that compete with and even outperform specialized techniques.
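The FoE prior evaluates expert potentials on linear filter responses at every image clique. A minimal sketch of the resulting (unnormalized) log prior appears below, using Student-t experts as in the FoE work; the filters and expert weights are assumed given here, whereas in the actual model they are learned from data, and the function names are mine.

```python
import math

def student_t_expert(response, alpha):
    """Log of a Student-t expert evaluated on one filter response."""
    return -alpha * math.log(1.0 + 0.5 * response ** 2)

def foe_log_prior(image, filters, alphas, size=3):
    """Unnormalized log prior of an image: sum of expert log-potentials
    over every size x size clique, each filtered by every filter.
    `filters` are flattened size*size linear filters; `alphas` are the
    corresponding expert weights."""
    h, w = len(image), len(image[0])
    total = 0.0
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            patch = [image[r + dr][c + dc]
                     for dr in range(size) for dc in range(size)]
            for J, a in zip(filters, alphas):
                resp = sum(j * p for j, p in zip(J, patch))
                total += student_t_expert(resp, a)
    return total

# A flat image scores higher (log prior 0 with zero-mean filters) than a
# noisy checkerboard, which is what makes the prior useful for denoising.
flat = [[1.0] * 4 for _ in range(4)]
grad_filter = [[1.0, -1.0] + [0.0] * 7]  # one zero-mean 3x3 filter
smooth_score = foe_log_prior(flat, grad_filter, [1.0])
```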

This is joint work with Stefan Roth and Teodor Moldovan.

Host: Dan Huttenlocher

October 5 A-Exam: Modeling Additive Structure and Detecting Interactions with Groves of Regression Trees
Daria Sorokina, Cornell University

Discovery of additive structure is an important step towards understanding a complex multi-dimensional function, because it allows for expressing this function as the sum of lower-dimensional or otherwise simpler components. Modeling additive structure also opens up opportunities for learning better regression models.

In the first part of the talk I will describe a new regression algorithm called Groves, an ensemble of additive regression trees. It builds on existing techniques such as bagging and additive models; their combination allows us to use large trees in the ensemble while still modeling the additive structure of the response function. I will present an efficient way to train such models. The resulting algorithm outperforms other state-of-the-art regression ensembles such as bagged trees or stochastic gradient boosting. I will also show that, in addition to exhibiting superior performance on a suite of regression test problems, bagged Groves of trees are very resistant to overfitting.
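The additive-ensemble idea can be illustrated with a backfitting loop: each member of the ensemble is repeatedly refit on the residuals of all the others, so the ensemble's sum models the response. The sketch below shrinks each "tree" to a one-split stump on a 1-D toy problem purely to keep the code short; real Groves use large trees and add bagging on top, and all function names here are mine.

```python
def fit_stump(xs, ys):
    """Best single-split regression stump on 1-D inputs (by SSE)."""
    best = (float('inf'), 0.0, 0.0, 0.0)  # (sse, threshold, left, right)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        sse = (sum((y - lm) ** 2 for y in left) +
               sum((y - rm) ** 2 for y in right))
        if sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_grove(xs, ys, n_trees=3, n_rounds=10):
    """Backfitting: refit each stump on the residuals of the others,
    cycling until the additive model stabilizes."""
    stumps = [(0.0, 0.0, 0.0)] * n_trees
    for _ in range(n_rounds):
        for i in range(n_trees):
            resid = [y - sum(stump_predict(stumps[j], x)
                             for j in range(n_trees) if j != i)
                     for x, y in zip(xs, ys)]
            stumps[i] = fit_stump(xs, resid)
    return stumps

def grove_predict(stumps, x):
    """The prediction is the sum of the members' predictions."""
    return sum(stump_predict(s, x) for s in stumps)
```

On a two-step target function, two stumps trained this way converge to an exact additive decomposition that no single stump could represent.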

In the second part of the talk I will discuss the problem of interaction detection. When variables interact, their effects cannot be decomposed into independent lower-dimensional contributions and hence must be modeled simultaneously. We introduce a new approach to interaction detection: it is based on comparing the performance of restricted and unrestricted predictive models. I will show that bagged Groves of trees allow variable interactions to be carefully controlled and therefore are especially useful for this framework. Interaction detection has significant direct practical importance, because it can provide valuable knowledge about a domain. We will apply our techniques to real ecological data about bird observations in North America, where they have the potential to reveal previously unknown relationships between environmental features and abundance of wild birds.

Finally, I will discuss several directions for future work; in particular, how we can extend our algorithms to the task of binary classification by using generalized additive models such as logistic regression.

This is joint work with Rich Caruana, Mirek Riedewald and Daniel Fink.

Part of this work was presented at ECML' 07, where it won the Best Student Paper award.

Host: Rich Caruana

October 12 Unsupervised Group Discovery in Relational Datasets: A nonparametric Bayesian Approach
Steve Koutsourelakis, Cornell University, Civil Engineering

Clustering is one of the most common statistical procedures and a standard tool for pattern discovery and dimension reduction. Most often the objects to be clustered are described by a set of measurements or observables, e.g. the coordinates of vectors or the attributes of people. In many cases, however, the available observations appear in the form of links or connections (e.g. communication or transaction networks). This data contains valuable information that can in general be exploited in order to discover groups and better understand the structure of the dataset. Since in most real-world datasets several of these links are missing, it is also useful to develop procedures that can predict those unobserved connections.

In this talk we address the problem of unsupervised group discovery in relational datasets. A fundamental issue in all clustering problems is that the actual number of clusters is unknown a priori. In most cases this is addressed by running the model several times, assuming a different number of clusters each time, and selecting the value that provides the best fit based on some criterion (e.g. the Bayes factor in the case of Bayesian techniques). It would clearly be preferable to develop techniques in which the number of clusters is learned from the data along with the rest of the model parameters. For that purpose, we adopt a nonparametric Bayesian framework, which provides a very flexible modeling environment in which the size of the model, i.e. the number of clusters, can adapt to the available data and readily accommodate outliers. In this context we present two models that can account for mixed-membership effects, i.e. the possibility that an individual can belong to several groups simultaneously with different degrees of membership.
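The nonparametric idea that the number of clusters is learned rather than fixed can be illustrated with the Chinese restaurant process, the clustering prior underlying many Dirichlet-process models. This is a generic toy sketch, not the speaker's model; the concentration parameter and function name are assumptions.

```python
import random

def crp_assignments(n, alpha, rng):
    """Sample cluster assignments for n items from a Chinese restaurant
    process with concentration parameter alpha: item i joins an existing
    cluster with probability proportional to that cluster's size, or
    opens a new cluster with probability proportional to alpha."""
    counts = []        # counts[k] = current size of cluster k
    assignments = []
    for i in range(n):
        r = rng.uniform(0, i + alpha)   # total weight = sum(counts) + alpha
        acc = 0.0
        k = 0
        for k, w in enumerate(counts + [alpha]):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)            # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

# The number of clusters grows slowly with the data (roughly
# alpha * log n in expectation) instead of being chosen in advance.
labels = crp_assignments(200, 1.0, random.Random(0))
```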

Host: Thorsten Joachims  

October 19 Modeling Science: Topic models of Scientific Journals and Other Large Document Collections
David Blei, Princeton University

A surge of recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models. These models are called "topic models" because the word patterns often reflect the underlying topics that are combined to form the documents; however, topic models also naturally apply to other data such as images and biological sequences.
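The generative story behind basic topic models (LDA) fits in a few lines: each document draws a topic mixture, and each word position draws a topic and then a word from that topic. The sketch below is a generic illustration rather than the speaker's code, and samples Dirichlet draws via normalized Gamma variates; the toy topics are assumptions.

```python
import random

def sample_dirichlet(alphas, rng):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_discrete(probs, rng):
    """Sample an index from a discrete distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

def generate_document(topics, alpha, length, rng):
    """LDA generative process: draw a per-document topic mixture theta,
    then for each word position draw a topic z ~ theta and a word id
    from that topic's word distribution. `topics` is a list of word
    distributions over a shared vocabulary."""
    theta = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(length):
        z = sample_discrete(theta, rng)                 # topic choice
        words.append(sample_discrete(topics[z], rng))   # word choice
    return words

# Two toy topics over a 4-word vocabulary; documents mix both.
rng = random.Random(42)
topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
doc = generate_document(topics, alpha=0.5, length=20, rng=rng)
```

Inference in a topic model inverts this process: given only the observed words, it recovers the topics and each document's mixture.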

After reviewing the basics of topic modeling, I will describe two related lines of research in this field, which extend the current state of the art.

First, while previous topic models have assumed that the corpus is static, many document collections actually change over time: scientific articles, emails, and search queries reflect evolving content, and it is important to model the corresponding evolution of the underlying topics. For example, an article about biology in 1885 will exhibit significantly different word frequencies than one in 2005. I will describe probabilistic models designed to capture the dynamics of topics as they evolve over time.

Second, previous models have assumed that the occurrences of the different latent topics are independent. In many document collections, the presence of one topic may be correlated with the presence of another. For example, a document about sports is more likely to also be about health than about international finance. I will describe a probabilistic topic model which can capture such correlations between the hidden topics.

In addition to giving quantitative, predictive models of a corpus, topic models provide a qualitative window into the structure of a large document collection. This perspective allows a user to explore a corpus in a topic-guided fashion. We demonstrate the capabilities of these new models on the archives of the journal Science, founded in 1880 by Thomas Edison. Our models are built on the noisy text from JSTOR, an online scholarly journal archive, resulting from an optical character recognition engine run over the original bound journals.

(joint work with J. Lafferty)

Host: Thorsten Joachims

October 26 *No Seminar - DEPARTMENT REVIEW*
November 2 Identifying expressions of opinion in context
Eric Breck, Cornell University

While traditional information extraction systems have been built to answer questions about facts, subjective information extraction systems will answer questions about feelings and opinions. A crucial step towards this goal is identifying the words and phrases that express opinions in text. We present an approach for identifying opinion expressions that uses conditional random fields and we evaluate the approach at the expression-level using a standard sentiment corpus. Our approach achieves expression-level performance within 5% of the human interannotator agreement. 

This is joint work with Yejin Choi and Claire Cardie, and was presented at IJCAI-2007.

November 9 Structured Local Training and Biased Potential Functions for Conditional Random Fields with Application to Coreference Resolution
Yejin Choi, Cornell University

Conditional Random Fields (CRFs) have shown great success for problems involving structured output variables. However, for many real-world NLP applications, exact maximum-likelihood training is intractable because computing the global normalization factor even approximately can be extremely hard. In addition, optimizing likelihood often does not correlate with maximizing task-specific evaluation measures. In this talk, we present a novel training procedure, structured local training, that maximizes likelihood while exploiting the benefits of global inference during training: hidden variables are used to capture interactions between local inference and global inference. Furthermore, we introduce biased potential functions that empirically drive CRFs towards performance improvements with respect to the preferred evaluation measure for the learning task. We report promising experimental results on two coreference data sets using two task-specific evaluation measures.

This is joint work with Claire Cardie, and was presented at NAACL-2007.

November 16 *No Seminar - ACSU Luncheon*
November 23 *No Seminar - THANKSGIVING BREAK*
November 30 *No Seminar - Cancelled*
December 7 *No Seminar - First Day of Final Exams*

See also the AI graduate study brochure.

Please contact any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up! 


CS772, Fall '07
Claire Cardie

Rich Caruana
Carla Gomes
Joe Halpern
Dan Huttenlocher
Thorsten Joachims
Lillian Lee
Bart Selman
Ramin Zabih

Back to CS course websites