Talk Schedule
Talk sessions will be held in Upson Hall rooms B17 (in the basement) and in Phillips Hall room 101 (on the ground floor). Each talk will have a 30 minute time slot including 5 minutes for questions. The talks will be presented in the order below. Click on the talk titles to see an abstract.
Session A1: Machine Learning
Saturday, 9am-10:30pm. Room: Upson B17
Convex Clustering with Exemplar-based Models, Danial LashkariAbstract Clustering is often formulated as the maximum
likelihood estimation of a mixture model that explains the data. The
EM algorithm widely used to solve the resulting optimization problem
is inherently a gradient-descent method and is sensitive to
initialization. The resulting solution is a local optimum in the
neighborhood of the initial guess. This sensitivity to
initialization presents a significant challenge in clustering large
data sets into many clusters. In this paper, we present a different
approach to approximate mixture fitting for clustering. We introduce
an exemplar-based likelihood function that approximates the exact
likelihood. This formulation leads to a convex minimization problem
and an efficient algorithm with guaranteed convergence to the
globally optimal solution. The resulting clustering can be thought
of as a probabilistic mapping of the data points to the set of
exemplars that minimizes the average distance and the
information-theoretic cost of mapping. We present experimental
results illustrating the performance of our algorithm and its
comparison with the conventional approach to mixture model
clustering.
Learning from Labeled Features using Generalized Expectation Criteria, Gregory Druck, Gideon Mann and Andrew McCallumAbstract It is difficult to apply machine learning to new domains because there are often no existing labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in the form of correlations between input features and classes. Specifically, we propose a method for training discriminative probabilistic models with ``labeled features" and unlabeled instances. Unlike previous approaches that use labeled features to create labeled instances, we use labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria --- terms in a parameter estimation objective function that assign scores to values of a model expectation. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled features, and experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. Using Functions on a Model Graph for Inductive Transfer, Eric Eaton, Marie desJardins and Terran LaneAbstract In this paper, we propose a novel graph-based method for knowledge transfer. We embed a set of learned background models in a graph that captures the transferability between the models. We then learn a function on this graph that automatically determines the parameters to transfer to each learning task. Transfer to a new problem proceeds by mapping the problem into the graph, then using the function to determine the parameters to transfer in learning the new model. This method is analogous to inductive transfer along a manifold that captures the transfer relationships between the tasks.
Session A2: NLP
Saturday, 9am-10:30pm. Room: Phillips 101.
Forest Reranking: Discriminative Parsing with Non-Local Features, Liang HuangAbstract Conventional n-best reranking techniques often suffer from the limited scope of the
n-best list, which rules out many potentially good alternatives. We instead propose forest reranking, a method that reranks a packed forest of exponentially many parses. Although exact inference is intractable with non-local features, we present an approximation algorithm that makes discriminative training practical over the whole Treebank. Our final result, an F-score of 91.7, outperforms both 50-best and 100-best reranking baselines, and is better than any previously reported systems trained on the Treebank.
Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis, Kevin Lerman, Ari Gilder, Mark Dredze and Fernando PereiraAbstract Media reporting of current events shapes public opinion which can in turn influence events. This is particularly true of political elections, in which candidates both respond to and shape public perception of their campaigns. We use computational linguistics to automatically predict changes in public perception of candidates during the 2004 US Presidential election season. Our system uses daily newspaper articles to predict shifts in public opinion as reflected in prediction markets. We discuss various types of features designed for this problem. The news system improves market prediction over baseline market systems. Better Alignments = Better Translations?, Joao Graca, Kuzman Ganchev and Ben TaskarAbstract Automatic word alignment is a key step in training statistical machine
translation systems. Despite much recent work on word alignment
methods, alignment accuracy increases often produce little or no
improvements in machine translation quality. In this work we analyze a
recently proposed agreement-constrained EM algorithm for unsupervised
alignment models. We attempt to tease apart the effects that this
simple but effective modification has on alignment precision and
recall trade-offs, and how rare and common words are affected across
several language pairs. We propose and extensively evaluate a simple
method for using alignment models to produce alignments better-suited
for phrase-based MT systems, and show significant gains (as measured
by BLEU score) in end-to-end translation systems for six languages
pairs used in recent MT competitions.
Session B1: Structured Learning
Saturday, 2:30pm-4pm. Room: Upson B17
Structured Learning with Approximate Inference, Alex Kulesza and Fernando PereiraAbstract In many structured prediction problems, the highest-scoring labeling
is hard to compute exactly, leading to the use of approximate
inference methods. However, when inference is used in a learning
algorithm, a good approximation of the score may not be sufficient. We
show in particular that learning can fail even with an approximate
inference method with rigorous approximation guarantees. There are
two reasons for this. First, approximate methods can effectively
reduce the expressivity of an underlying model by making it impossible
to choose parameters that reliably give good predictions. Second,
approximations can respond to parameter changes in such a way that
standard learning algorithms are misled. In contrast, we give two
positive results in the form of learning bounds for the use of
LP-relaxed inference in structured perceptron and empirical risk
minimization settings. We argue that without understanding
combinations of inference and learning, such as these, that are
appropriately compatible, learning performance under approximate
inference cannot be guaranteed. Training Structural SVMs when Exact Inference is Intractable, Thomas W Finley and Thorsten JoachimsAbstract While discriminative training (e.g., CRF, structural SVM) holds much promise for machine translation, image segmentation, and clustering, the complex inference these applications require make exact training intractable. This leads to a need for approximate training methods. Unfortunately, knowledge about how to perform efficient and effective approximate training is limited. Focusing on structural SVMs, we provide and explore algorithms for two different classes of approximate training algorithms, which we call undergenerating (e.g., greedy) and overgenerating (e.g., relaxations) algorithms. We provide a theoretical and empirical analysis of both types of approximate trained structural SVMs, focusing on fully connected pairwise Markov random fields. We find that models trained with overgenerating methods have theoretic advantages over undergenerating methods, are empirically robust relative to their undergenerating brethren, and relaxed trained models favor non-fractional predictions from relaxed predictors. Collective Inference on Markov Models for Modeling Bird Migration, Daniel Sheldon, M. A. Saleh Elmohamed and Dexter KozenAbstract We investigate a family of inference problems on Markov models, where
many sample paths are drawn from a Markov chain and partial
information is revealed to an observer who attempts to reconstruct the
sample paths. We present algorithms and hardness results for several
variants of this problem which arise by revealing different
information to the observer and imposing different requirements for
the reconstruction of sample paths. Our algorithms are analogous to
the classical Viterbi algorithm for Hidden Markov Models, which finds
the single most probable sample path given a sequence of observations. Our
work is motivated by an important application in ecology: inferring
bird migration paths from a large database of observations.
Session B2: ML in Vision
Saturday, 2:30pm-4pm. Room: Phillips 101
Deep Belief Net Learning for Online Terrain Classification on a Mobile Robot, Raia Hadsell, Ayse Erkan, Pierre Sermanet, Marco Scoffier, Urs Muller and Yann LeCunAbstract We present an online classifier that is able to accurately classify
complex terrain at distances up to the horizon, thus allowing
high-level strategic planning for a mobile robot. A deep belief
network is trained to extract informative and meaningful features from
an input image, and the features are used to train a realtime
classifier to predict traversability. Supervision for the training
comes from a short-range stereo module, which robustly labels the
image using 5 classes. The process was developed and tested on the
LAGR mobile robot. A Discriminative Semi-Markov Model for Robust Scene Text Recognition, Jerod J Weinman, Erik Learned-Miller and Allen R HansonAbstract We present a system for recognizing scene text that is aided by the use of a lexicon, yet also allows non-lexicon words. Our discriminative semi-Markov probabilistic model fully integrates both character and word segmentation with recognition. The system uses wavelet image features for recognition and requires only approximate location of the text baseline and font size, but no binarization or prior word segmentation is necessary. To facilitate inference with a large lexicon, we compare several methods for approximate Viterbi beam search. Our system performs robustly on low-resolution images of signs containing text in fonts atypical of documents. Dynamic Visual Category Learning, Tom YehAbstract Dynamic visual category learning calls for efficient adaptation as
new training images become available or new categories are
defined, existing training images or categories become modified or
obsolete, or when categories are divided into subcategories or
merged together. We develop novel methods for efficient
incremental learning of SVM-based visual category classifiers to
handle such dynamic tasks. Our method exploits previous classifier
estimates to more efficiently learn the optimal parameters for the
current set of training images and categories. We show
empirically that for dynamic visual category tasks, our
incremental learning methods are significantly faster than batch
retraining.
Session C1: Game theory/Multi-agent theory
Saturday, 4:30pm-6pm. Room: Upson B17
Learning When to Take Advice: A Statistical Test for Achieving A Correlated Equilibrium, Greg Hines and Kate LarsonAbstract In this paper we study a multiagent learning problem where agents can either learn via repeated interactions, or can follow the advice of a mediator who suggests possible actions to take. We present an algorithm that each agent can use so that, with high probability, they can verify whether or not the mediator's advice is useful. In particular, if the mediator's advice is useful then agents will reach a correlated equilibrium, but if the mediator's advice was not useful, then agents are not harmed by using our test, and can fall back to their original learning algorithm. We then generalize our algorithm and show that in the limit it always correctly verifies the mediator's advice. Copeland Voting: Ties Matter, Piotr Faliszewski, Edith Hemaspaandra and Henning SchnoorAbstract We study the complexity of manipulation for a family of election systems derived from Copeland voting via introducing a parameter alpha that describes how ties in head-to-head contests are valued. We show that the thus obtained problem of manipulation for unweighted Copeland^alpha elections is NP-complete even if the size of the manipulating coalition is limited to two. Our result holds for all rational values of alpha such that 0 < alpha < 1 except for alpha = 1/2 . Learning from Collective Behavior, Michael Kearns and Jennifer WortmanAbstract Inspired by longstanding lines of research in sociology and related fields, and by more recent large-population human subject experiments on the Internet and the Web, we initiate a study of the computational issues in learning to model collective behavior from observed data. We define formal models for efficient learning in such settings, and provide both general theory and specific learning algorithms for these models.
Session C2: Applied AI
Saturday, 4:30pm-6pm. Room: Phillips 101
Ranking in a Multiple Instance Setting, Charles Bergeron, Jed Zaretzki, Curt Breneman and Kristin P. BennettAbstract This paper introduces a novel machine learning model called
multiple instance ranking (MIRank) that enables ranking to be
performed in a multiple instance learning setting. The motivation
for MIRank stems from the hydrogen abstraction problem in
computational chemistry, that of predicting the grouping of
hydrogen atoms from which a hydrogen is abstracted (removed)
during metabolism. The model predicts the preferred hydrogen
grouping within a molecule by ranking the groups, with the
ambiguity of not knowing which hydrogen within the preferred
grouping is actually abstracted. This paper formulates MIRank in
its general context and proposes an algorithm for solving MIRank
problems using successive linear programming. The method
outperforms multiple instance classification models on several
real and synthetic datasets. An Empirical Evaluation of Supervised Learning in High Dimensions, Ainur Yessenalina, Nikos Karampatziakis and Rich CaruanaAbstract In this paper we perform an empirical evaluation
of supervised learning methods on highdimensional
data. We evaluate learning performance
on three metrics: accuracy, AUC,
and squared loss. We also study the effect
of increasing dimensionality on the relative
performance of the learning algorithms. Our
findings are consistent with previous studies
for problems of relatively low dimension,
but suggest that as dimensionality increases
the relative performance of the various learning
algorithms changes. Minimax regret based elicitation of generalized additive utilities, Darius Braziunas and Craig BoutilierAbstract We describe the semantic foundations for elicitation of generalized additively independent (GAI) utilities using the minimax regret criterion, and propose several new query types and strategies for this purpose. Computational feasibility is obtained by exploiting the local GAI structure in the model. Our results provide a practical approach for implementing preference-based constrained configuration optimization as well as effective search in multiattribute product databases.
Session D1: Planning and Search
Sunday, 9:30am-10:30am. Room: Upson B17
Conformant Planning through QBF, Alexandra Goultiaeva and Fahiem BacchusAbstract Conformant planning involves finding a plan that is guaranteed to achieve the goal in the face of incomplete information. Unlike classical planning, conformant planning cannot be directly encoded into SAT. It can, however, be directly encoded in the more expressive language of Quantified Boolean Formulas (QBF). The resulting QBF can then be solved in different ways including through the application of generic QBF solvers. In this paper we investigate how recent advances in QBF solving, specifically the use of circuit-based representations, can be exploited to solve conformant planning problems more effectively. We develop some new ways of encoding these problems so as to better exploit the circuit-based representation, and show empirically that this can lead to significant performance improvements. Using Inadmissible Heuristics for Bounded Suboptimal Searc, Jordan Thayer, Wheeler Ruml and Ephrat BittonAbstract Applications often demand we tackle problems that are too large to
solve optimally. We introduce two general approaches to heuristic
search with inadmissible heuristics and show how to guarantee a bound
on the suboptimality of the resulting solutions while ignoring many
duplicate states. We also propose a new inadmissible heuristic based
on on-line temporal difference learning. We show empirically that
these new approaches to suboptimal heuristic search can outperform
weighted A* on standard grid-world path-finding and temporal planning
benchmarks when finding near-optimal solutions.
|