Artificial Intelligence Seminar

Fall 2006
Friday 12:00-1:15
Upson 5130

Sponsored by the Intelligent Information Systems Institute (IISI),
Computing and Information Science, Cornell

The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. Lunch will be served starting at noon, with the talks running between 12:15 and 1:15. The new format is designed to allow AI chit-chat before the talks begin. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.

Date
Title/Speaker/Abstract/Host

September 1 No Seminar

September 8 Only Connect! Explorations in Graph-based Document Processing
Lillian Lee
Can we create a system that can learn to understand political speeches well enough to determine the speaker's viewpoint, without having to give the system a great deal of linguistic knowledge? Can we improve information retrieval by using link analysis, as is famously done in Web search, if we are dealing with documents that don't contain hyperlinks? And how can these two questions form the basis of a coherent talk? Answer: graphs!
Joint work with Oren Kurland, Bo Pang, and Matt Thomas.

September 15 Constraint Programming: Algorithms and Applications
Willem-Jan van Hoeve
Constraint Programming is a powerful paradigm to model and solve combinatorial problems. The field has matured significantly in the last 10-15 years. I will contrast some of the key features of this approach with the more traditional optimization techniques as developed for example in Operations Research. In particular, I'll discuss so-called filtering algorithms that efficiently handle several broad classes of combinatorial constraints. I will also discuss several application areas of these techniques.

September 22 Toward Opinion Summarization: Linking the Source through Partially Supervised Clustering
Ves Stoyanov
Recent research efforts in NLP have attempted to extract opinions expressed in text. Starting with all individual opinions extracted from a document, the goal of our work is to aggregate these opinions into a more concise and useful summary representation. I will begin by giving an example of such a summary representation and arguing for its usefulness.

Creating opinion summaries requires solving a number of research challenges. I will briefly explain these challenges and concentrate on a particular problem: linking opinions attributed to the same source (source coreference resolution). In the rest of the talk, I will discuss our approach to source coreference resolution, which we define in terms of the more general problem of partially supervised clustering.

Joint work with Claire Cardie.

September 29 Axiomatizing Ranking Systems
Alon Altman, Technion
This talk will survey our recent work in applying the axiomatic approach to ranking systems. Ranking systems are systems in which agents rank each other to produce a social ranking. In the axiomatic approach we study ranking systems in light of basic properties, or axioms. In this talk I will present our axiomatization theorem for the PageRank ranking system, prove both an impossibility result and a possibility result for general ranking systems, and discuss the issue of incentives in ranking systems. Finally, I will show initial results regarding personalized ranking systems, where a specialized ranking is generated for each agent.
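For readers unfamiliar with PageRank as a ranking system, here is a minimal sketch of the standard damped power iteration over a graph of agents voting for one another. It is not taken from the talk and does not illustrate the axiomatization itself. The function name `pagerank`, the damping factor 0.85, the iteration count, and the toy `votes` graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank agents by damped power iteration.

    `links` maps each agent to the list of agents it votes for.
    All names and defaults here are illustrative assumptions.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every agent receives a uniform baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this agent's damped rank evenly among its votes.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An agent with no votes spreads its rank over everyone.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a votes for b and c, b votes for c, c votes for a.
votes = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(votes)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, `ranking` orders the agents from highest to lowest score, and the scores sum to 1.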

October 6 No AI-Seminar

October 13 ONBIRES: A Platform for Information Extraction from Biomedical Literature
Xiaoyan Zhu, Tsinghua University (visiting Cornell)
An important issue today is how to make effective use of the enormous amount of biomedical data to improve our understanding of complex biological systems. These data come from various sources, such as electronic medical journals, microarray experiments, and other biological experiments. Automatically and effectively extracting, integrating, and making use of the information embedded in such heterogeneous, unstructured data is a challenging task.
ONBIRES (ONtology-based BIological Relation Extraction System) is designed to perform the following tasks: automatic relation extraction from the literature; knowledge management, such as biological interaction network visualization and query answering; hypothesis prediction to support the discovery of new biological concepts; and even providing information for knowledge inference.
Several information processing theories and techniques are involved. A dynamic programming algorithm is used for pattern generation, and the MDL (Minimum Description Length) principle is used for pattern optimization. A semi-supervised ML algorithm with an effective evaluation function is proposed. Named entity recognition, ontology integration and expansion, and classification methods also play important roles in the system.

October 20

October 27 Support Vector Training of Protein Alignment Models
Chun-Nam Yu
Sequence to structure alignment is an important step in homology modeling of protein structures, and the alignment accuracy significantly affects the quality of the final three-dimensional model. Incorporating features like secondary structure, solvent accessibility, or evolutionary information improves sequence to structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way.

In this talk, we will discuss overcoming this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundreds of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions, which allows accounting for different types of alignment errors and for the inherent ambiguity in sequence to structure alignment.

In a rigorous empirical evaluation, the SVM algorithm outperforms SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM from a training set of about 5000 examples aligns 47% of the residues correctly and aligns over 70% of the residues within a shift of 4 positions.

The talk will be self-contained and will include a brief introduction to proteins and the alignment problem. No biological knowledge is assumed.

November 3 Understanding how Real Neurons Respond to Simple Two-dimensional Form
Jonathan Victor, Weill Medical College of Cornell University
I will begin by briefly reviewing some of the differences between real neurons and artificial neurons. This review highlights the need for conceptual but nevertheless realistic models of the computations that real neurons and neural networks perform.
The neurons of the primary visual cortex play a pivotal role in vision, and have been studied for almost 50 years in an effort to achieve the above goal. There is a consensus that at a very qualitative level, these neurons resemble oriented spatiotemporal filters, but a predictively accurate model (i.e., one that can account for how they respond to complex or natural stimuli) remains elusive.
Localization in space and spatial frequency is a standard starting point for a theoretical understanding of the properties of these neurons. The usual Heisenberg-like notion of joint localization of space and spatial frequency has certain shortcomings in this context; resolution of these shortcomings leads to a central role for two-dimensional Hermite functions rather than Gabor functions. Experimental study of cortical neurons with two-dimensional Hermite functions reveals the unexpected prevalence of highly nonlinear behavior that may be relevant to understanding their responses to complex, natural stimuli.

Thursday
November 9 Toward First-Order Probabilistic Models for Language Processing
Aron Culotta, University of Massachusetts at Amherst
I will review recent work on increasingly flexible representations for probabilistic models of language. Drawing on examples from named-entity finding, co-reference resolution, and data mining, I'll show how representing language phenomena with first-order logic motivates the need for new approximate inference methods. I'll then discuss ongoing work on an approximate inference algorithm that combines stochastic satisfiability solvers and message-passing in junction trees.

November 17 No AI-Seminar (ACSU Lunch)

November 24 No AI-Seminar (Thanksgiving)

December 1 Using Access Data for Paper Recommendations on ArXiv.org
Stefan Pohl, Universitaet Darmstadt (visiting Cornell)
We investigate the use of HTTP access logs from arXiv.org as a source of information for identifying related papers. Compared to citation information, access logs have the advantage of being readily available, with no need for manual or automatic extraction of the citation graph. We compare access-, content-, and citation-based measures of relatedness on the task of creating reading lists, focusing in particular on the detection of recently published related papers. In cases where citation data is unavailable or expensive to acquire, even a simple measure like co-access outperforms textual similarity.
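As a rough illustration of what a co-access measure might look like, the sketch below counts how often two papers appear in the same access session and uses those counts to suggest related papers. The abstract does not define co-access precisely, so the session-based definition, the function names `co_access_counts` and `related`, and the toy paper IDs are all assumptions rather than the speaker's method.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(sessions):
    """Count, for each unordered pair of papers, the sessions that accessed both.

    `sessions` is a list of lists of paper IDs (an assumed log format).
    """
    counts = Counter()
    for papers in sessions:
        # Deduplicate within a session and sort so each pair has one key.
        for a, b in combinations(sorted(set(papers)), 2):
            counts[(a, b)] += 1
    return counts


def related(paper, counts, top_k=3):
    """Return up to top_k papers most often co-accessed with `paper`."""
    scored = []
    for (a, b), count in counts.items():
        if a == paper:
            scored.append((b, count))
        elif b == paper:
            scored.append((a, count))
    # Highest count first; ties broken by paper ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in scored[:top_k]]


# Hypothetical sessions, each listing the papers one reader accessed.
sessions = [["p1", "p2", "p3"], ["p1", "p2"], ["p2", "p4"], ["p1", "p3"]]
counts = co_access_counts(sessions)
suggestions = related("p1", counts)
```

A measure like this needs only the access log itself, which is the availability advantage the abstract points to over citation-based measures.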

See also the AI graduate study brochure.

Please contact any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up!

CS772, Fall '06
Claire Cardie
Rich Caruana
Carla Gomes
Joe Halpern
Dan Huttenlocher
Thorsten Joachims
Lillian Lee
Bart Selman
Ramin Zabih

