Poster Sessions
There will be two poster sessions. The speakers from talks will present posters the same day as the talk. Other posters have been split up between the two sessions semi-randomly. Click on the title of each poster to read the abstract.
The poster locations indicate where the poster will be presented, please make sure to hang your poster in the right location. Attendees will get a map at registration.
Posters - Friday, 7pm
Location: Duffield Atrium
Recurrent Boosting Method for Time-Dependent Classification of Epileptiform Signals, Robert D Vincent, Joelle Pineau, Philip de Guzman and Massimo AvoliPoster Location: 1
Abstract Boosted ensemble classifiers have a demonstrated ability to discover
regularities in large, poorly modeled datasets. In this paper we present
an application of multi-hypothesis AdaBoost to detect epileptic activity
from electrophysiological recordings. While existing boosting methods do
not account automatically for the sequence information that is available
when analyzing time-series data, we present a recurrent extension to
AdaBoost, and show that it improves classification accuracy in our
application domain.
Propagating errors and beliefs for large-scale nonlinear structure prediction, Roland MemisevicPoster Location: 2
Abstract Nonlinear extensions to structured supervised learning can
be achieved by replacing the common linear compatibility
score with a nonlinear function, such as a neural network.
A potential advantage of such a nonlinear variant is that it can,
in contrast to kernel based approaches, be trained on very large
datasets, because computational complexity scales only linearly
with the number of examples. Supervised Clustering with Support Vector Machines, Thomas William Finley and Thorsten JoachimsPoster Location: 3
Abstract Supervised clustering is the problem of training a clustering algorithm to produce desirable clusterings: given sets of items and complete clusterings over these sets, we learn how to cluster future sets of items. Example applications include noun-phrase coreference clustering, and clustering news articles by whether they refer to the same topic. In this paper we present an SVM algorithm that trains a clustering algorithm by adapting the item-pair similarity measure. The algorithm may optimize a variety of different clustering functions to a variety of clustering performance measures. We empirically evaluate the algorithm for noun-phrase and news article clustering. Preconditioner Approximations for Probabilistic Graphical Models, Pradeep Ravikumar and John LaffertyPoster Location: 4
Abstract We present a family of approximation techniques
for probabilistic graphical models, based on the
use of graphical preconditioners developed in the
scientific computing literature. Our framework
yields rigorous upper and lower bounds on event
probabilities and the log partition function of
undirected graphical models, using non-iterative
procedures that have low time complexity. As
in mean-field approaches, the approximations are
built upon tractable subgraphs; however, we recast
the problem of optimizing the tractable distribution
parameters and approximate inference
in terms of the well-studied linear systems problem
of obtaining a good matrix preconditioner.
Experiments are presented that compare the new
approximation schemes to variational methods. Dynamic Embedding of Author-Word Co-occurrence Data Over Time, Purnamrita Sarkar and Sajid M. SiddiqiPoster Location: 5
Abstract We address the problem of embedding entities into Euclidean space over time
based on co-occurrence data. We model the probability of a co-occurrence given
the (hidden) coordinates of the entities as a joint distribution over the entities. This
leads to a non-standard factored state space model with real-valued hidden parent
nodes and discrete observation nodes. An approximation on the log partition
function of the observation model makes it conjugate to the Normal distribution,
allowing us to formulate the entire dynamic model as a Kalman filter. Qualitative
results on co-occurrences of authors and words in the NIPS corpus show that our
model results in efficient and meaningful embeddings of large datasets. Relaxations to Zero-Norm Support Vector Machines, Gautam KunapuliPoster Location: 6
Abstract The use of zero-norm rather than the classical two-norm
regularization in Support Vector Machines (SVMs) for
binary classification is considered.
This converts the easily solvable quadratic program to a
NP-complete combinatorial optimization problem. The approach
presented here converts the problem to a
Mixed Integer Linear Program (MILP) and explores
various relaxations. It is shown that
these relaxations lead to computationally tractable algorithms
and produce solutions that generalize
comparably to SVMs while using fewer support vectors. Error Rates for Causal Discovery Strategies, Frederick EberhardtPoster Location: 7
Abstract We simulate two different strategies to discover the causal structure
among $N$ variables with a sequence of experiments. In the first, we
randomize no more than one variable per experiment, whereas in the
second, several variables can be randomized simultaneously. Both
strategies are derived from worst case bounds on the number of
experiments necessary and sufficient to determine the causal structure
among $N$ variables described in Eberhardt et al, 2004, 2005. The
first strategy requires $N-1$ experiments, whereas the second only
requires $lfloor log_2(N)rfloor+1$. Since there is no closed form
expression for the error rates of these strategies as a function of
sample size or other features of the enterprise, for example the
spareness of the generating causal graph, we compute error rates from
a large simulation study. Apart from providing insight on these error
rates, our results raise a variety of questions pertaining to
meta-analysis, pooling of data, search strategies and optimal
sequences of experiments.
Cluster Ensembles for Network Anomaly Detection, Art Munson and Rich CaruanaPoster Location: 8
Abstract Cluster ensembles aim to find better, more natural clusterings by
combining multiple clusterings. We apply ensemble clustering to
anomaly detection, hypothesizing that multiple views of the data will
improve the detection of attacks. Each clustering rates how anomalous
a point is; ratings are combined by averaging or taking either the
minimum, the maximum, or median score. The evaluation shows that
taking the median prediction from the cluster ensemble results in
better performance than single clusterings. Surprisingly, averaging
the individual predictions a) leads to worse performance than that of
individual clusterings, and b) performs identically to taking the
minimum prediction from the ensemble. This counter-intuitive result
stems from asymmetric prediction distributions. Learning distance metrics for clustering using predictability, Abhishek A Gupta, Dean P Foster and Lyle H UngarPoster Location: 9
Abstract We propose a clustering approach which uses supervised learning
methods in a purely unsupervised setting. We seek linear
transformations of the points that give clean,
well-separated clusters in the transformed space, where "clean"
clusters are those for which cluster membership can be
accurately predicted. This corresponds to learning a distance metric
which minimizes a criterion reminiscent to that in k-means.
We propose an iterative algorithm which predicts cluster memberships
using linear regression
and then uses k-means to cluster these
predictions. When iterated, this procedure converges to a fixed
point. Further these clusters are invariant to linear transformations
of the original features because of the invariance of linear
regression. We also give an oracle argument to justify
the predictability of clusters, and discuss extensions using RKHSs. Learning sparse overcomplete representations with an energy-based factor graph, Marc\'Aurelio Ranzato and Yann LeCunPoster Location: 10
Abstract We describe the symmetric product of experts, a novel
unsupervised method for learning sparse, overcomplete features. The
method uses a factor graph with an ``encoder'' factor and a
``decoder'' factor. The encoder factor computes a Gaussian conditional
distribution over code vectors given an input image patch. The code
vector is transformed through a non-linear transformation dubbed
temporal softmax to produce a sparse code vector whose components
are positive or zero. The decoder factor computes a Gaussian
conditional distribution over reconstructed image patches given a
sparse code vector. Given an input patch, learning proceeds in a
two-phase EM-like fashion: (1) computing the maximum-likelihood code
vector, (2) adjusting the parameters of the encoder and decoder so as
to minimize a simple loss function. A system with linear filters in
the encoder and decoder was trained on handwritten numerals and
natural image patches. The learned filters resemble ``object parts''
for the numerals, and localized, oriented features for natural
patches. Unlike with other feature learning methods, inference is
extremely fast, no preprocessing of input patches is required, and no
sampling method is used.
Using object models to learn object features - away from keypoints, Alexander Sorokin, Himanshu Arora, Nicolas Loeff and David ForsythPoster Location: 11
Abstract We present our work in progress on how to diminish the dependence of
weakly supervised object recognition approaches on keypoint-detection.
Keypoint detection is crucial to reduce the complexity of the learning
problem while keep the most relevant information. But once the model
is learnt, the information it provides can be used to construct more
reliable feature detectors. Finally, after learning the feature
detectors the model can be retrained without dependence on
keypoints. A Situation Calculus Based Approach for Model Checking, Yilan Gu and Iluju KiringaPoster Location: 12
Abstract To reason about properties of reactive programs, one may usually follow
either an operational or a deductive approach. In this paper,
we propose representing the classical model checking approach of Clarke and
Emerson in the situation calculus, and give an approach that
merges the above two approaches into one single
framework by translating Kripke models that represent system specifications
into theories formulated in the situation calculus and by recasting CTL as
a decidable sublanguage of the calculus.
A Hierarchical Approach to Efficient Reinforcement Learning in Deterministic Domains, Carlos G Diuk, Alexander L Strehl and Michael LittmanPoster Location: 13
Abstract Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large-scale state spaces. We bring these three ideas together in a new algorithm. Our algorithm tackles two open problems from the reinforcement-learning literature, and provides a solution to those problems in deterministic domains. First, it shows how models can improve learning speed in the hierarchy-based MaxQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity for learning, but provably efficient planning as well. We illustrate the resulting performance gains in example domains. We prove polynomial bounds on the computational effort needed to attain near
optimal performance within the hierarchy. Cascade-correlation algorithms for on-line reinforcement learning, Marc BellemarePoster Location: 14
Abstract Neural networks have been used very successfully to represent value
functions for reinforcement learning algorithms in several applications.
However, considerable hand-crafting is needed in order to obtain a good
network structure. To address this problem, cite{rivest03combining}
proposed an approach to using cascade correlation neural networks with
reinforcement learning. Their approach relies on a cache of data, which is
used to accumulate a batch of patterns for training the network. In this
paper, we explore a version of cascade correlation which has a more
on-line flavor. Empirical results on the game of Backgammon
show that the variations we discuss are more robust to the noise inherent in
RL tasks. PSIGRAPH: A Psiform-Based Conformant Graphplan, Alan Scott CarlinPoster Location: 15
Abstract Conformant planners solve problems with a correct but incomplete description of the initial state, by finding plans that are valid for all possible assignments of the unknown atoms. Most conformant planners, however, do not handle universal quantification, which is a problem when the set of all domain objects is unknown or very large, and thus can not be enumerated. This paper discusses PSIGRAPH, a conformant planner that operates with universally quantified statements in the initial and goal states, as well as in the action preconditions. Thus, PSIGRAPH does not need to know the complete set of domain objects. PSIGRAPH is based on Graphplan (Blum and Furst 1995), but differs from previous approaches such as Conformant Graphplan (Smith and Weld 1998) in that it does not create multiple planning graphs. We present the algorithm and the results of its experimental evaluation, showing that PSIGRAPH is competitive with other planners. Finally, we discuss recent additions to the algorithm that have been performed since the original publication of PSIGRAPH. An Empirical Evaluation of Interval Estimation for Markov Decision Processes, Alexander L Strehl and Michael L LittmanPoster Location: 16
Abstract This paper takes an empirical approach to evaluating three
model-based reinforcement-learning methods. All methods
intend to speed the learning process by mixing
exploitation of learned knowledge with exploration of possibly
promising alternatives. We consider $epsilon$-greedy exploration,
which is computationally cheap and popular, but unfocused in its
exploration effort; R-Max exploration, a simplification of an
exploration scheme that comes with a theoretical guarantee of efficiency;
and a well-grounded
approach, model-based interval estimation, that
better integrates exploration and exploitation. Our experiments indicate that effective exploration can result in dramatic improvements in the
observed rate of learning. Using SAT in QBF, Horst Samulowitz and Fahiem BacchusPoster Location: 17
Abstract QBF is the problem of deciding the satisfiability of quantified
boolean formulae in which variables can be either universally or
existentially quantified. QBF generalizes SAT and is in practice much
harder to solve than SAT. One of the sources of added complexity in
QBF arises from the restrictions quantifier nesting places on the
variable orderings that can be utilized during backtracking search.
In this paper we present a technique for alleviating some of this
complexity by utilizing an order unconstrained SAT solver during QBF
solving. The innovation of our paper lies in the integration of SAT
and QBF. We have developed methods that allow information obtained
from each solver to be used to improve the performance of the other.
Unlike previous attempts to avoid the ordering constraints imposed
by quantifier nesting, our algorithm retains the polynomial space
requirements of standard backtracking search. Our empirical results
demonstrate that our techniques allow improvements over the current
state-of-the-art in QBF solvers. Preprocessing QBF, Jessica Davies, Horst Samulowitz and Fahiem BacchusPoster Location: 18
Abstract In this paper we investigate the use of preprocessing when solving
Quantified Boolean Formulas (QBF). Many different problems can be
efficiently encoded as QBF instances, and there has been a great
deal of recent interest and progress in solving such instances
efficiently. We show that QBF instances can be simplified using
techniques related to those used for preprocessing SAT. These
simplifications can be performed in polynomial time, and are used to
preprocess the instance prior to invoking a worst case exponential
algorithm to solve it. We develop a method for preprocessing QBF
instances that is empirically very effective. That is, the
preprocessed formulas can be solved significantly faster, even when
we account for the time required to perform the preprocessing. Our
method significantly improves the efficiency of a range of
state-of-the-art QBF solvers. Furthermore, our method is able to
completely solve some instances just by preprocessing, including
some instances that to our knowledge have never been solved before
by any QBF solver. Improving Performance of Functional Magnetic Image Analysis by Parameter Sharing in Bayesian Networks, Mark M PalatucciPoster Location: 19
Abstract Bayesian Networks have proven useful for classifying cognitive states from functional magnetic resonance images (fMRI). Despite recent progress, training effective classifiers is still a difficult problem: dimensionality of fMRI data is extremely high, and training examples are both sparse and noisy.
Effective classification in this domain typically requires feature reduction or parameter sharing in order to reduce the number of estimated parameters in the Bayesian network. In this abstract, we present an overview of our current research for parameter reduction of fMRI data. Our goal is to show that we can improve classifier accuracy by sharing parameters across regions of the brain with highly correlated neural
activity. Latent variable models for integrating gene evidence for eukaryotic gene prediction, Qian LiuPoster Location: 20
Abstract Integrating multiple sources of evidence into consistent
gene predictions demands great effort by human
curators. To automate this process, we have developed
a new computational method – Dagger, based
on probabilistic graphical models. The model parameters
encode the strength of evidence sources, and they
are estimated from unlabeled data by likelihood maximization.
Given the estimated parameters and the
observed evidence, the Viterbi algorithm is then used
to compute the most likely consensus prediction. Our
method has two advantages. First, they are able to
integrate different types of evidence, including gene
models, protein/EST alignments, splice site predictions,
etc. Second, it uses unsupervised learning and
thus does not require separately annotated training
data. We have performed extensive experiments on
combining multiple sources of evidence for Arabidopsis
thaliana and Honeybee genomes. Those experiments
show that Dagger achieve better specificity and
sensitivity on whole genes, exons, and nucleotides
than any of the individual sources of evidence. Predicting Protein Complexes using Relational Features from Experimental Data, Lisa Friedland, David Jensen and David KulpPoster Location: 21
Abstract Predicting which proteins physically interact is one aspect of understanding the cellular processes within yeast. We treat this as a link prediction task in a graph and develop a novel approach to incorporate network features into the model alongside diverse data sources. The classifier gains in performance as a result of the relational features: compared to a published model with the same data, it achieved comparable AUC-0.05 even though we withheld the two most predictive, but hard-to-obtain, attributes. Hidden Process Models, Rebecca A. Hutchinson, Tom M. Mitchell and Indrayana RustandiPoster Location: 22
Abstract We introduce Hidden Process Models (HPMs), a class of probabilistic
models for multivariate time series data. The design of HPMs has been
motivated by the challenges of modeling hidden cognitive processes in
the brain, given functional Magnetic Resonance Imaging (fMRI) data.
fMRI data is sparse, high-dimensional, non-Markovian, and often
involves prior knowledge of the form "hidden event A occurs n times
within the interval [t,t']." HPMs provide a generalization of
the widely used General Linear Model approaches to fMRI analysis, and
HPMs can also be viewed as a subclass of Dynamic Bayes Networks. Co-Designing Agents, Albert Goldfain, Michael W Kandefer, Stuart C Shapiro and Josephine AnsteyPoster Location: 23
Abstract In this paper we sketch a new approach for agent design and
operability called co-designing agents (CDA). As a working definition,
we take a CDA to be an agent that participates in some aspect of its
own design. The precise manner and degree of a CDA's participation
will ultimately be a design-time decision, but by considering a
specific agent implementation (that of AI actor-agents in a virtual
drama) we are able to extract a general set of CDA requirements. The
CDA approach utilizes concepts from several areas of active research
in AI. We present a broad summary of the relevant literature and
discuss its applicability to CDA design. We then consider the SNePS
knowledge representation, reasoning, and acting system as a potential
CDA implementation platform. Discriminative Multilingual Dependency Parsing, Kevin Lerman, Ryan McDonald and Fernando PereiraPoster Location: 24
Abstract We present a two-stage multilingual dependency parser and evaluate it on 13 diverse languages. The first stage is based on the unlabeled dependency parsing models described in McDonald and Pereira (2006) augmented with morphological features for a subset of the languages. The second stage takes the output from the first and labels all the edges in the dependency graph with appropriate syntactic categories using a globally trained sequence classifier over components of the graph. We report results on the CoNLL-X shared task data sets and present an error analysis. Feature Design for Transfer Learning, Mark Dredze, John Blitzer, Koby Crammer and Fernando PereiraPoster Location: 25
Abstract Discriminative learning methods for classification perform well when
training and test data are drawn from the same distribution and
labeled using the same function. However, often we have labeled data
for a task related to the target task but not for the target task
itself. Under what conditions does a good classifier for the related
task transfer to the target task? Feature representations that
capture uniformities between the two tasks are a crucial factor in
the success of transfer. We formalize this intuition theoretically
with a generalization bound for transfer, and demonstrate it
empirically with a particular kind of representation, user-centric
features, that helps transfer an email reply predictor between
different email users. Story Building: A Preliminary Study on Hypothesis Generation and Ranking, Shizhuo ZhuPoster Location: 26
Abstract Story building is the ability of creating certain hypothetical explanations for those not-yet-understood observations. In this paper, “story” and “story building” are defined from a logic perspective in the sense of decision making. After that we propose a basic idea of generating and ranking hypotheses from technical point of view. We attempt to apply abductive reasoning in generating hypotheses, based on the principle of relevance. Then we model the story building problem as an HMM, from which multiple algorithms could be adopted for hypothesis ranking. Synchronous Binarization for Machine Translation, Hao Zhang, Liang Huang, Daniel Gildea and Kevin KnightPoster Location: 27
Abstract Systems based on synchronous grammars and tree transducers
promise to improve the quality of statistical machine translation output, but
are often very computationally intensive. The complexity is
exponential in the size of individual grammar rules
due to arbitrary re-orderings between the two languages,
and rules extracted from parallel corpora can be quite large.
We devise a linear-time algorithm for factoring syntactic re-orderings
by binarizing synchronous rules when possible and show that the resulting rule set
significantly improves the speed and accuracy of a state-of-the-art
syntax-based machine translation system.
Effect of Degraded Input on Statistical Machine Translation, Faisal Farooq and Yaser Al-OnaizanPoster Location: 28
Abstract In this paper, we study the effects of degraded (noisy) input on the accuracy of Natural Language Processing(NLP) systems such as Machine Translation(MT). The noisy input is often the only available input as it is the output from another NLP application such as an Optical Character Recognition (OCR) system. We show that correcting OCR output reduces the degradation of the translation quality of a state-of-the-art Statistical Machine Translation (SMT) system when the corrected output is used, instead of the original output, as input to the MT system. We use a novel method for correcting the input based on a direct "phrase-based" translation system in contrast to tranditional HMM source-channel models.
Posters - Saturday, 3:15pm
Location: Upson Hall, 5th Floor
Modeling Query Term Dependencies in Information Retrieval with Markov Random Fields, Donald A. Metzler and W. Bruce CroftPoster Location: 1
Abstract This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model by directly maximizing mean average precision. Our results show that significant improvements are possible by modeling dependencies, especially on larger web collections. Re-ranking Search Results based on Perturbation of Concept-Association Graphs, Gaurav S Chandalia and Rohini K SrihariPoster Location: 2
Abstract There are many difficult IR queries that warrant the use of a more sophisticated content representation of a document corpus as opposed to the traditional bag of words model. Moreover, effective algorithms are required to fully exploit the semantic information contained in such a representation. In this paper, we propose a characterization of the document corpus that is able to capture such information based on inherent relationships between concepts across the corpus. These relationships are then used to construct a concept-association graph from a small set of relevant documents with respect to the query. We also propose an algorithm based on perturbation of this graph and apply it to the task of re-ranking search results. Our technique, which is completely independent of the query employs a variation of the Subspace HITS algorithm (Ng et al., 2001b). Experiments carried out on the TREC 2003 HARD track dataset show an improvement in precision as compared to its baseline run. Topic Modeling: Beyond Bag-of-Words, Hanna M. WallachPoster Location: 3
Abstract Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assumption, in which word order is ignored. Previously, these methods have not been combined. In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. The model hyperparameters are inferred using a Gibbs EM algorithm.
On two data sets, each of 150 documents, the new model exhibits better predictive accuracy than either a hierarchical Dirichlet bigram language model or a unigram topic model. Additionally, the inferred topics are less dominated by function words than are topics discovered using unigram statistics, potentially making them more meaningful.
Minimally Invasive Randomization for Collecting Unbiased Preferences from Clickthrough Logs, Filip Radlinski and Thorsten JoachimsPoster Location: 4
Abstract Clickthrough data is a particularly inexpensive and plentiful
resource to obtain implicit relevance feedback for improving
and personalizing search engines. However, it is well known
that the probability of a user clicking on a result is
strongly biased toward documents presented higher in the
result set irrespective of relevance. We introduce a simple
method to modify the presentation of search results that
provably gives relevance judgments that are unaffected by
presentation bias under reasonable assumptions. We validate this property of the training
data in interactive real world experiments. Finally,
we show that using these unbiased relevance judgments
learning methods can be guaranteed to converge to an ideal
ranking given sufficient data.
Regularizing Query-Based Retrieval Scores, Fernando David DiazPoster Location: 5
Abstract In information retrieval, the cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting query-based information retrieval scores from an initial retrieval so that topically related documents receive similar scores. We refer to this process as score regularization. Score regularization can be presented as an optimization problem, allowing the use of results from semi-supervised learning. We demonstrate that regularized scores consistently and significantly rank documents better than un-regularized scores, given a variety of initial retrieval algorithms.
Solving POMDPs Using Quadratically Constrained Linear Programs, Christopher Amato, Daniel S. Bernstein and Shlomo ZilbersteinPoster Location: 6
Abstract Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One promising approach is based on representing POMDP policies as finite-state controllers. This method has been used successfully to address the intractable memory requirements that have limited POMDP algorithms. We illustrate some fundamental disadvantages of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), the optimal solution of which provides an optimal controller of a desired size. We consider a leading optimization method for solving QCLPs and compare its performance with existing POMDP optimization methods. While the optimization algorithm used in this paper only guarantees local optimality, it produces consistent solution quality improvement over the state-of-the-art techniques. The results show that powerful nonlinear programming methods can be used effectively to improve the performance and scalability of POMDP algorithms. An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem, Matthew J Streeter and Stephen F SmithPoster Location: 7
Abstract We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O ( ln (1/delta) ln(n)^2 / epsilon^2 ) trials are sufficient to identify, with probability at least 1-delta, a machine whose expected maximum payoff is within epsilon of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n -> infinity.
Regret-based Incremental Partial Revelation Mechanisms, Nathanael Hyafil and Craig BoutilierPoster Location: 8
Abstract Classic direct mechanisms suffer from the drawback of requiring
full type revelation from participating agents. In complex settings
with multi-attribute utility, assessing utility functions can be very difficult,
a problem addressed by recent work on preference elicitation.
In this work
we propose a framework for incremental, partial revelation
mechanisms and study the use of \emph{minimax regret} as
an optimization criterion for allocation determination with
type uncertainty. We examine the incentive properties of incremental
mechanisms when minimax regret is used to determine allocations
with no additional elicitation of payment information;
and when additional payment information is obtained. We argue that
elicitation effort can be focused simultaneously on reducing
allocation and payment uncertainty.
Invariant Mapping for Dimensionality Reduction, Raia Hadsell, Sumit Chopra and Yann LeCunPoster Location: 9
Abstract Dimensionality reduction involves mapping a set of high dimensional input points onto a low dimensional manifold so that ``similar" points in input space are mapped to nearby points on the manifold. We present a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent non-linear function that maps the data evenly to the output manifold. The learning relies solely on neighborhood relationships and does not require any distance measure in the input space. The method can learn mappings that are invariant to certain transformations of the inputs, as is demonstrated with a number of experiments.
Comparisons are made to other techniques, in particular LLE.
Discovering Natural Kinds of Robot Sensory Experiences in Unstructured Environments, Daniel H Grollman, Odest Chadwicke Jenkins and Frank WoodPoster Location: 10
Abstract We address the symbol grounding problem for robot perception through a
data-driven approach to deriving sensor categories. Unlike
model-based approaches, our method learns intrinsic categories (or
natural kinds) from the raw data itself. We approximate a manifold
underlying sensor data using Isomap nonlinear dimension reduction and
apply Bayesian clustering (Gaussian mixture models) to discover
categories. We demonstrate our method through the learning of sensory
kinds from trials in various indoor environments with
multiple sensor modalities. Learned kinds are used to classify new
sensor data (out-of-sample readings). We present results indicating
greater consistency in classifying sensor data employing mixture
models in low-dimensional embeddings.
A Learned Similarity Metric for Identity Verification, Sumit Chopra, Raia Hadsell and Yann LeCunPoster Location: 11
Abstract We present a method for training a similarity metric from data. The
method can be used for recognition or verification applications
where the number of categories is very large and not known during
training, and where the number of training samples for a single
category is very small. The idea is to learn a function that maps
input patterns into a target space such that the $L_1$ norm in the
target space approximates the ``semantic'' distance in the input
space. The method is applied to a face verification task, using the
Purdue/AR face database which has a very high degree of variability
in pose, lighting, expression, and position, and includes artificial
occlusions such as dark glasses and obscuring scarves. Decision-Theoretic User Modeling, Bowen Hui and Craig BoutilierPoster Location: 12
Abstract Automated software customization is drawing increasing attention as a means to help users deal with the scope, complexity, potential intrusiveness, and ever-changing nature of modern software. The ability to automatically customize functionality, interfaces, and advice to specific users is made more difficult by the uncertainty about the needs of specific individuals and their preferences for interaction. Following recent probabilistic techniques in user modeling, we model our user with a dynamic Bayesian network (DBN) and propose to explicitly infer the ``user's type'' --- a composite of personality and affect variables --- in real time. We design the system to reason about the impact of its actions given the user's current state. Our simulations show that user types can be inferred quickly and that a myopic policy offers considerable benefit by adapting to both different types and changing attitudes. We then develop a more realistic user model, using behavioural data from 45 users to learn model parameters and the topology of our proposed user types. With the new model, we conduct a usability experiment with 4 users and 4 different policies. These experiments, while preliminary, show encouraging results for our adaptive policy. Sketch Recognition Using Multiscale Models of Temporal Patterns, Metin SezginPoster Location: 13
Abstract With the emergence and widespread use of
Tablet PCs, recognizing freehand sketches has
gained importance as an enabling technology
for pen-based intelligent human computer interfaces.
This paper presents a sketch recognition
framework based on multiscale statistical
models of temporal patterns. Previous work in
sketch recognition has shown that in certain domains,
stroke orderings used in the course of
drawing individual objects contain temporal patterns
that can aid recognition. We build on this
work to show how sketch recognition systems
can use knowledge of both common stroke orderings
and common object orderings. We describe
a statistical framework based on Dynamic
Bayesian Networks that can learn temporal models
of object-level and stroke-level patterns for
recognition. Our recognition framework supports
multi-object strokes, multi-stroke objects,
and it supports real-valued feature representations
using a numerically stable recognition algorithm.
We present recognition results for handdrawn
electronic circuit diagrams. The results
show that modeling temporal patterns at multiple
scales provides a significant increase in correct
recognition rates, with no added computational
penalties. Gene Expression Time Course Clustering with Countably Infinite Hidden Markov Models, Praveen Krishnamurthy and Matthew J BealPoster Location: 14
Abstract Most existing approaches to clustering gene
expression time course data treat the different
time points as independent dimensions
and are invariant to permutations, such as reversal,
of the experimental time course. Approaches
utilizing HMMs have been shown
to be helpful in this regard, but are hampered
by having to choose model architectures
with appropriate complexities. Here we
propose for a clustering application an HMM
with a countably infinite state space; inference
in this model is possible by recasting it
in the hierarchical Dirichlet process (HDP)
framework (Teh et al. 2006), and hence we
call it the HDP-HMM. We show that the
infinite model outperforms model selection
methods over finite models, and traditional
time-independent methods, as measured by
a variety of external and internal indices for
clustering on two large publicly available data
sets. Moreover, we show that the infinite
models utilize more hidden states and employ
richer architectures (e.g. state-to-state
transitions) without the damaging effects of
overfitting. Implicit nonlinear encapsulation of anatomical structure relations in scoliosis permitting almost linear regression schemes, Charles Bergeron, Farida Cheriet, Janet Ronsky, Ronald Zernicke and Hubert LabellePoster Location: 15
Abstract This paper represents the asymmetric trunk surface and deformed spinal curve in scoliosis patients from the approach of functional data analysis. The minimum noise fractions transform (a generalized principal components analysis) extracts noiseless and shape-comprehensive functional components. Coefficients thereof are used to relate the trunk surface to the spine curve using two multivariate multiresponse learning techniques: least squares linear regression and kernel partial least squares regression. This paper finds optimal models that are almost linear and competitive with data acquisition errors, explaining $74\%$ of spinal variance with a prediction error of 9.1 millimetres. These results are clinically significant, and suggests a noninvasive protocol for scoliosis follow-up. An Architecture for Distributed Internet Fault Diagnosis using Probabilistic Relational Models, George J LeePoster Location: 16
Abstract Internet fault diagnosis today is slow, costly, and error-prone
because it requires humans to run diagnostic tests and interpret
their results. We can automate much of the process of diagnosis
using a network of intelligent diagnostic agents. Distributed
diagnosis in such an network requires a common ontology for
expressing diagnostic knowledge and data and a protocol for
distributed probabilistic reasoning. In this paper I show how the
Common Architecture for Probabilistic Reasoning for Internet
Diagnosis (CAPRID) can satisfy these requirements using
probabilistic relational models (PRMs) and describe how agents can
diagnose Internet failures using this architecture. Typing Staphylococcus aureus using the spa gene and novel distance measures, Phaedra Agius, Barry N. Kreiswirth, Steve Naidich and Kristin P. BennettPoster Location: 17
Abstract We develop an approach for identifying groups of
Staphylococcus aureus bacteria based on genotype data. With
the emergence of drug resistant strains, S. aureus represents
a significant human health threat. Identifying the family types
efficiently is crucial in community settings. We develop a hybrid
sequence algorithm approach to type this bacterium using only its
spa gene. Two of the sequence algorithms we used are well
established, while the third, the Best Common Gap-Weighted Sequence
(BCGS), is novel. We combined the sequence algorithms with a
weighted match/mismatch algorithm for the spa sequence ends.
Normalized similarity scores and distances between the sequences
were derived and used within unsupervised clustering methods. The
resulting spa groupings and the visual displays we obtained
correlated strongly with the groups defined by the well-established
Multi locus sequence typing (MLST) method. Spa typing is
preferable to MLST typing which types seven genes instead of just
one. Furthermore, our spa clustering methods can be
fine-tuned to be more discriminative, to identify new strains that
the MLST method may not. The proposed methodology provides a
promising new approach to molecular epidemiology.
SVM Learning of IP Address Structure for Latency Prediction, Robert Beverly and Karen SollinsPoster Location: 18
Abstract We examine the ability to exploit the hierarchical structure of
Internet IP addresses in order to predict network properties.
Specifically, we consider machine learning for prediction of
round-trip latency to random network destinations. Our method forms a
prediction solely based upon an IP address without prior knowledge of
that IP address. Using Support Vector Machine (SVM)
multi-classification on a randomly collected Internet data set, we
achieve over 80% accuracy using only 20% of our samples for
training. SVM regression yields a mean prediction error of 25ms with
20% training samples. This level of performance can equip a variety
of network architectures with intelligence including service
selection, user-directed routing, resource scheduling and network
measurement. Finally, we analyze the information content of IP
addresses and find that the first eight most significant bits provide
surprisingly strong discriminative power.
Combining Data from Multiple Subjects in Classification of fMRI Data, Indrayana RustandiPoster Location: 19
Abstract I propose an approach to account for differences in individual
subjects in the context of classification of fMRI data. Exploiting Spatial Information to Improve fMRI Pattern Classification, Melissa K Carroll, Kenneth A Norman, James V Haxby and Robert E SchapirePoster Location: 20
Abstract Classification methods have been successfully
applied to pattern extraction from fMRI
(Mitchell et al., 2004; Polyn et al., 2005).
These approaches have used individual voxels
as features, ignoring the spatial correlation
of activity between voxels. The present
work describes one method, adapted from
computer vision, for generating features that
capture spatial correlation, and demonstrates
how it can improve classification accuracy by
5% or more. This approach also has the potential
for discerning which types of neural
features are most useful for discriminating
between cognitive states.
Estimating Variable Interaction with Ensembles of Trees, Daria SorokinaPoster Location: 21
Abstract Data mining methods can help to extract previously unknown information from data, in particular information about different relationships between data attributes. Variable interaction detection is one of such tasks: it questions whether the target function is representable as the sum of several functions depending on smaller number of variables. We propose an algorithm for this task, discuss its advantages over the previous solutions and mention potential problems. Scalable Online Support Vector Machines, Seyda Ertekin, Leon Bottou and C. Lee GilesPoster Location: 22
Abstract The goal of this paper is to introduce fast kernel classifier algorithm which has an outstanding speed improvement over the classical Support Vector Machines (SVMs) and the known online SVM algorithms, while preserving the classification accuracy rates of the state-of-the-art SVM solvers. We also show that not all the examples are equally informative in the training set. We present methods to reach the most informative examples and exploit those to reduce the computational requirements of learning algorithms. Inductive Transfer for Bayesian Network Structure Learning, Alexandru Niculescu-Mizil and Rich CaruanaPoster Location: 23
Abstract We consider the problem of learning Bayes Net structures for related
tasks. We present an algorithm for learning Bayes Net structures that
takes advantage of the similarity between tasks by biasing learning
toward similar structures for each task. Heuristic search is used to
find a high scoring set of structures (one for each task), where
the score for a set of structures is computed in a principled
way. Experiments on synthetic problems generated from the ALARM and
INSURANCE networks show that learning the structures for related tasks
using the proposed method yields better results than learning the
structures independently.
Model Compression, Cristian Bucila, Rich Caruana and Alexandru Niculescu-MizilPoster Location: 24
Abstract Often the best performing supervised learning models are ensembles of
hundreds or thousands of base-level classifiers. Unfortunately,
the space required to store this many classifiers, and the time
required to execute them at run-time, prohibits their use in
applications where test sets are large (e.g. Google), where storage
space is at a premium (e.g. PDAs), and where computational power is
limited (e.g. hearing aids). We present a method for ``compressing''
large, complex ensembles into smaller, faster models, usually without
significant loss in performance.
Meta Clustering, Rich Caruana, Mohamed Elhawary, Nam Nguyen and Casey SmithPoster Location: 25
Abstract Clustering is ill-defined. Unlike supervised learning where labels lead
to crisp performance criteria such as accuracy and squared error,
clustering quality depends on how the results will be used. Devising
clustering criteria that capture what users need is difficult. Whereas
most clustering algorithms search for the single, optimal clustering, we
present an approach that searches for many alternate reasonable
clusterings of the data, and then allows users to select the
clustering(s) that best fit their needs. We cluster the clusterings so
that users must only examine a small number of qualitatively different
clusterings. In this paper, we present methods for automatically
generating a diverse set of alternate clusterings, as well as methods
for grouping clusterings into meta clusters. Surprisingly, clusterings
that would be of most interest to users often are not the most compact
clusterings. Practical Probabilistic Planning in Real-World Domains, Huy Pham, Mikhail Soutchanski and John MylopoulosPoster Location: 26
Abstract Planning under uncertainty, and its asscociated computational difficulties, has attracted a signifi-cant amount of attention in AI and other fields. To overcome the computational problems asso-ciated with Markov Decision Processes (MDPs) in large scale domains, researchers often take advantage of the problem’s structural properties and look for approximately optimal solutions. DTGolog, a Situation Caluculus-based decision-theoretic agent programming language, was pro-posed to ease some of the computational difficul-ties by using natural ordering constraints on exe-cution of actions. Using DTGolog, domain spe-cific constraints on the set of available policies can be expressed in a high-level program and this program helps to reduce significantly com-putation required to find a policy optimal in this set.
In this paper, we investigate DTGolog’s scalabil-ity to large and complex domains by applying it to the well-known problem of the London Am-bulance Service, a real-world domain that, while still largely unknown in AI, has recieved a lot of attention in Software Engineering, and demon-strate that DTGolog can be used to quantitatively evaluate several different designs of a decision making agent.
Adaptive Planning for a Team of Agents with Resource Constraints, Rui WangPoster Location: 27
Abstract The dynamic nature of many real-world domains (e.g., military, homeland security and spacecraft maneuvers, etc) requires plans to be modified to adapt to changes in the environment. While the plan adaptation problem for individual agent has been investigated by many researchers, collaborative plan adaptation for a team of agents raises important issues that haven’t been fully addressed. These issues are challenging because an agent has limited belief and information about other agents’ resources and their states. This limitation makes it difficult to detect and resolve potential conflicts of resources among different agents due to changes in the environment. This paper focuses on those issues and presents novel solutions based on the anticipation capability of a multi-agent teamwork model called CAST (Collaborative Agents for Simulating Teamwork). More specific, this research aims to enable an agent to anticipate information and resource needs of its teammates for coordinating the process of adapting existing plans online, in real-time, under the dynamically changing environment. As a result, each planner can evaluate the implications of the change and generate response options better based on more complete information. The adaptation to changes will be fast, accurate and relieve the cognitive load of human planners.
|