The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. Lunch will be served starting at noon, with the talks running between 12:15 and 1:15. The new format is designed to allow AI chit-chat before the talks begin. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.
Schedule (because of departmental meetings, we will begin in February):
| Date | Speaker/Title/Abstract/Host |
| February 7 | Golan Yona
Using a mixture of probabilistic decision trees for direct prediction of protein function I will describe a new approach to decision tree learning and its applications to protein classification. Specifically, I will introduce the mixture model of probabilistic decision trees and demonstrate how it can be used to learn the set of potentially complex relationships between protein features and protein function. Our model addresses some of the fundamental problems with traditional decision tree learning algorithms. Specifically, we address four elements: optimization, evaluation, biased sample sets, and model selection. More precisely, we first propose an effective method of searching the hypothesis space that overcomes the pitfalls of the deterministic learning algorithms. Secondly, we introduce a novel criterion function to evaluate decision tree performance. Thirdly, we describe a method of dealing with distributions in which negative samples far outnumber positive samples, such as in our protein classification problem. Lastly, we propose an alternative method for deciding on the most probable model that is especially effective for small data sets. The model was tested on two well-established classifications of proteins. The model is very effective in learning highly diverged protein families or families that are not defined based on sequence. The resulting tree structure indicates the properties that are strongly correlated with structural and functional aspects of protein families, and can be used to suggest a concise definition of a protein family. Joint work with M.Eng. student Umar Syed. Host: Rich |
| February 14 | *** no class *** (FCI Founders meeting) |
| February 21 | *** no class *** |
| February 28 | *** no class *** |
| March 7 | John Langford
The One Bound It turns out that every(*) bound on the true error rate of a classifier which holds for all distributions can be stated in terms of the communication complexity of the labels given the unlabeled data. I'll discuss "the One Bound" and the relationships with several families of other bounds. (*) at least, every bound that has been checked |
| March 14 | ***no class*** (room is being used for brown-bag presentation) |
| March 21 | ***no class*** (Spring Break) |
| March 28 | Rich Caruana
Extreme Ensemble Selection Host: Rich |
| April 4 | Decision theory symposium |
| April 11 | ***no class*** (ACSU student/faculty lunch) |
| April 18 | Shimon Edelman (combined with the Brownbag seminar)
Unsupervised efficient learning and representation of language structure We describe a linguistic pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of corpus data. This is achieved by compactly coding recursively structured constituent patterns, and by placing strings that have an identical backbone and similar context structure into the same equivalence class. The resulting representations constitute an efficient encoding of linguistic knowledge and support systematic generalization to unseen sentences. Joint work with Zach Solan, Eytan Ruppin and David Horn. Host: Lillian |
| April 25 | Dimitris Agrafiotis,
Johnson & Johnson Pharmaceutical Research & Development
New algorithms for the analysis of large data sets and their application in molecular design The multitude of potential drug targets emerging from genome sequencing
demands new approaches to drug discovery. A chemo-genomics strategy,
involving the generation of small molecule compounds that can be used both
as tools to probe biological mechanisms and as leads for drug property
optimization, provides a highly parallel, industrialized solution. Key to
the success of this strategy is an integrated suite of data-driven chemi-informatics
tools that can enable the rapid and directed optimization of chemical
compounds with drug-like properties using just-in-time combinatorial
chemical synthesis. An effective embodiment of this process requires new
computational and data mining techniques that cover all aspects of library
generation, modeling and design, and work effectively on a massive scale.
This talk will introduce the essential elements of such a system, and
highlight key algorithmic advances that expand, by several orders of
magnitude, the number of compounds that can be assessed as potential
drugs. Particular emphasis will be placed on a novel self-organizing
algorithm for extracting the metric structure and intrinsic dimensionality
of large experimental observation spaces. The algorithm, known as
stochastic proximity embedding or SPE, attempts to generate
low-dimensional Euclidean embeddings that best preserve the geodesic
distances between a set of related observations. Unlike previous
approaches, our method can reveal the underlying geometry of the data
without intensive nearest neighbor or shortest-path computations, and can
reproduce the true geodesic distances of the data points in the
low-dimensional embedding without requiring that these distances be
estimated from the data sample. More importantly, SPE scales linearly with
the number of points, and can be Host: Golan Yona |
| May 2 | Cognitive Studies Spring Symposium |
See also the AI graduate study brochure.
Please contact any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up!