Artificial Intelligence Seminar

Fall 2017
Friday 12:00-1:15
Gates Hall G01

Lunch in Gates 122 at 11:45


The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. Lunch will be served starting at 11:45, with the talks running between 12:15 and 1:15. The new format is designed to allow AI chit-chat before the talks begin. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.

If you or others would like to be added to or removed from this announcement list, please contact Vanessa Maley at

Friday, August 25th, 2017

Speaker: Bharath Hariharan


Title: Exploring visual reasoning using synthetic controlled datasets

Abstract: There has been a lot of recent work on visual question answering (VQA), where the goal is to get machines to answer arbitrary questions about images. It is assumed that solving this task requires not just understanding the content of the scene but also reasoning about the image and what the question asks. However, recent research has shown that simple baselines without reasoning do just as well as more sophisticated algorithms, perhaps because of subtle biases in the dataset. To control for these biases and to test different aspects of VQA systems in a controlled manner, we create a synthetic dataset with ground truth semantics for both the visual scene and the question, and show that on these datasets simple baselines predictably fail. We also identify weaknesses in state-of-the-art VQA systems. Further, we demonstrate that these weaknesses can be reduced by building reasoning into the architecture of these systems. In this talk, I will describe this work, its conclusions and caveats, and place it in the context of other contemporary and subsequent work.

Joint work with Justin Johnson (Stanford) and colleagues at Stanford and FAIR.

“The AI-Seminar is sponsored by Microsoft”

Friday, September 1st, 2017

Each talk is 30 minutes.

Talk #1: Speaker: Justine Zhang

Title: Asking too much? The rhetorical role of questions in political discourse

Abstract: Questions play a prominent role in social interactions, performing rhetorical functions that go beyond simple informational exchange. The surface form of a question can signal the intention and background of the person asking it, as well as the nature of their relation with the interlocutor. While the informational nature of questions has been extensively examined in the context of question-answering applications, their rhetorical aspects have been largely understudied.

In this work we introduce an unsupervised methodology for extracting surface motifs that recur in questions, and for grouping them according to their latent rhetorical role.  By applying this framework to the setting of question sessions in the UK parliament, we show that the resulting typology encodes key aspects of the political discourse---such as the bifurcation in questioning behavior between government and opposition parties---and reveals new insights into the effects of a legislator's tenure and political career ambitions.

This is joint work with Cristian Danescu-Niculescu-Mizil and Arthur Spirling. 

Talk #2: Speaker: Tianze Shi

Title: Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set

Abstract: We present a minimal feature set for transition-based dependency parsing, continuing a recent trend of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework to produce the first implementation of worst-case O(n^3) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n^3) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the "second-best-in-class" result on the English Penn Treebank.
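
For readers unfamiliar with the transition systems named above, the standard arc-hybrid system can be pinned down in a few lines. This is a generic sketch of the transition system only, not the paper's O(n^3) dynamic-programming decoder or its bi-directional LSTM features; the `oracle` callback stands in for the learned scoring model.

```python
def arc_hybrid_parse(n, oracle):
    """Arc-hybrid transition system for dependency parsing.
    n: number of words (token ids 1..n; 0 is the artificial root).
    oracle: called with (stack, buffer), returns "shift", "left", or "right".
    Returns the list of (head, dependent) arcs."""
    stack, buffer, arcs = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "shift":                  # push the buffer front onto the stack
            stack.append(buffer.pop(0))
        elif action == "left":                 # s0's head is the buffer front b0
            arcs.append((buffer[0], stack.pop()))
        else:                                  # "right": s0's head is s1, below it
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return arcs

# A scripted oracle parsing a two-word sentence: 2 heads 1, root heads 2.
actions = iter(["shift", "left", "shift", "right"])
arcs = arc_hybrid_parse(2, lambda stack, buffer: next(actions))
```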

This is joint work with Liang Huang and Lillian Lee. The talk will be self-contained.


Friday, September 8th, 2017

Speaker: Gao Huang

Title: Dense connectivity for efficient deep learning

Abstract: In recent years we have witnessed astonishing progress in the field of deep learning, largely driven by the ability to train and run inference with very deep models, which have even managed to surpass human-level performance on many specific tasks. However, the requirements of real-world applications differ from those necessary to win competitions, as computational efficiency is a major concern in practice. In this talk, I will first introduce a densely connected network (DenseNet) with shortcut connections throughout the network. The dense connectivity alleviates the vanishing-gradient problem, encourages feature reuse, and substantially reduces the number of parameters in the network. I will also introduce a multi-scale dense network (MSDNet) with shortcut classifiers, which facilitate retrieving fast and accurate predictions from intermediate layers, leading to significantly improved inference efficiency over typical convolutional networks.
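
The dense connectivity pattern itself is simple to state: every layer receives the concatenation of the block input and all preceding layers' outputs. The sketch below shows only that pattern; real DenseNets use convolutional layers, a growth rate, and transition layers between blocks, none of which appear here, and "layers" are arbitrary callables over plain Python lists.

```python
def concat(feature_lists):
    """Concatenate feature vectors (a stand-in for channel-wise concatenation)."""
    return [value for features in feature_lists for value in features]

def dense_block(x, layers):
    """Dense connectivity: each layer's input is the concatenation of the
    block input and every earlier layer's output."""
    outputs = [x]
    for layer in layers:
        outputs.append(layer(concat(outputs)))
    return concat(outputs)

# A probe layer that emits one feature: the width of its input.
width_probe = lambda features: [float(len(features))]
result = dense_block([1.0, 2.0], [width_probe, width_probe])
```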


Friday, September 15th, 2017

Two student talks, 30 minutes each

Talk #1:

Speaker: Ben Athiwaratkun

Title: Multimodal Word Distributions

Abstract: Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams, and Gaussian embeddings, on benchmark datasets such as word similarity and entailment. 
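
One concrete ingredient of such a model can be sketched: the similarity between two words, each represented as a Gaussian mixture, can be measured with the expected likelihood kernel, i.e. the inner product of the two densities, which has a closed form for Gaussians. This 1-D sketch illustrates that kernel only; it is one natural choice of energy, and the paper's actual objective, dimensionality, and training procedure are richer.

```python
import math

def gaussian_density(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def expected_likelihood(mix_f, mix_g):
    """Inner product <f, g> of two 1-D Gaussian mixtures, each given as a
    list of (weight, mean, variance) components. Uses the closed form
    <N(.; m1, v1), N(.; m2, v2)> = N(m1; m2, v1 + v2)."""
    return sum(wf * wg * gaussian_density(mf, mg, vf + vg)
               for wf, mf, vf in mix_f
               for wg, mg, vg in mix_g)

same = [(1.0, 0.0, 1.0)]                     # a single standard Gaussian
near = [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)]    # overlapping mixture
far = [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)]     # distant mixture
```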

Talk #2:

Speaker: Christoforos Mavrogiannis

Title: Socially Competent Navigation Planning by Deep Learning of Multi-Agent Path Topologies

Abstract: We present a novel, data-driven framework for planning socially competent robot behaviors in crowded environments. The core of our approach is a topological model of collective navigation behaviors, based on braid groups. This model constitutes the basis for the design of a human-inspired probabilistic inference mechanism that predicts the topology of multiple agents' future trajectories, given observations of the context. We derive an approximation of this mechanism by employing a neural network learning architecture on synthetic data of collective navigation behaviors. Our planner makes use of this mechanism as a tool for interpreting the context and understanding what future behaviors are in compliance with it. The planning agent uses this understanding to determine a personal action that contributes to the context in the clearest way possible, while ensuring progress towards its destination. Our simulations provide evidence that our planning framework results in socially competent navigation behaviors not only for the planning agent, but also for interacting naive agents. Performance benefits include (1) early conflict resolutions and (2) faster uncertainty decrease for the other agents in the scene.

This is joint work with Valts Blukis and Ross Knepper.


Friday, September 22nd, 2017

Speaker: Christopher Matthew De Sa

Title: Fast Stochastic Algorithms for Machine Learning

Abstract: As machine learning applications become larger and more widely used, there is an increasing need for efficient systems solutions. The performance of essentially all machine learning applications is limited by bottlenecks, such as parallelizability and memory bandwidth, with effects that cut across traditional layers in the software stack. The key property that helps us address these bottlenecks is the fact that machine learning problems are statistical and thus have some built-in error tolerance: this gives us additional degrees of freedom that we can use when designing and optimizing machine learning algorithms.  In order to use these extra degrees of freedom effectively, we need to develop techniques that can leverage noise-tolerance to increase the throughput of common machine learning algorithms, while provably having little effect on their accuracy.

In this talk, I will identify a broad class of algorithms, stochastic iterative algorithms, that often determine the performance of machine learning systems. I will describe several methods that can be applied to speed up stochastic iterative algorithms in a principled way by using high-level structural information about a problem.  I will demonstrate how my approach can be applied to a specific bottleneck (parallel overheads), problem (inference), and algorithm (asynchronous Gibbs sampling).  Finally, I will illustrate the effectiveness of these methods on a range of problems including CNNs.
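
As a toy illustration of what "stochastic iterative algorithm" means here, the canonical example is plain SGD: repeatedly update an iterate with a cheap, noisy gradient estimate from one data point at a time. The speaker's techniques (asynchrony, reduced precision, and so on) operate on loops of exactly this shape; the one-dimensional least-squares problem below is purely illustrative.

```python
import random

def stochastic_iterate(grad_estimate, w0, data, step=0.1, epochs=50, seed=0):
    """Generic stochastic iterative algorithm (here, SGD): update the
    iterate using a noisy per-sample gradient estimate rather than the
    exact full gradient."""
    rng = random.Random(seed)
    w = w0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):   # one pass in random order
            w -= step * grad_estimate(w, x)
    return w

# Minimize the mean squared distance to the data points:
# the per-sample gradient of 0.5 * (w - x)^2 is (w - x), so w -> mean(data).
w_star = stochastic_iterate(lambda w, x: w - x, 0.0, [1.0, 3.0])
```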


Friday, September 29th, 2017

Speaker: Karthik Sridharan

Title: A Journey Towards Plug-&-Play ML: Online Learning, Probabilistic Inequalities and Zigzag Concave Functions

Abstract: At first glance, online machine learning, probabilistic inequalities such as concentration bounds for martingale-valued random variables, and convex/concave analysis don't seem related. But in this talk we will see how these concepts are inherently interlinked. While the topics seem esoteric, we will see that in the end the connection yields a simple and elegant way to design adaptive machine learning algorithms. We will also see how all of this helps us get a step closer to what I shall term Plug-&-Play ML: that is, it moves us a step towards building machine learning systems automatically.


Friday, October 6th, 2017

*Live-streamed remote talk*

Speaker: Ray Mooney, University of Texas at Austin

Host: Yoav Artzi

Title: Robots that Learn Grounded Language Through Interactive Dialog

Abstract: In order to develop an office robot that learns to accept natural language commands, we have developed methods that learn from natural dialog with ordinary users rather than from manually labeled data. By engaging users in dialog, the system learns a semantic parser, an effective dialog management policy, and a grounded semantic lexicon that connects words to multi-modal (visual, auditory and haptic) perception. In addition to learning from clarification dialogs when understanding user commands, it also engages people in interactive games such as "I Spy." We have tested our approach both on simulated robots with crowdsourced users on the web and with people interacting with real robots in our lab. Experimental results demonstrate that our methods produce more successful, shorter dialogs over time and learn to accurately identify objects from natural language descriptions using multi-modal perception.

Bio: Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana-Champaign. He is an author of over 160 published research papers, primarily in the areas of machine learning and natural language processing. He was the President of the International Machine Learning Society from 2008-2011, program co-chair for AAAI 2006, general chair for HLT-EMNLP 2005, and co-chair for ICML 1990. He is a Fellow of the American Association for Artificial Intelligence, the Association for Computing Machinery, and the Association for Computational Linguistics, and the recipient of best paper awards from AAAI-96, KDD-04, ICML-05 and ACL-07.


Friday, October 13th, 2017

Speaker: Andrew McCallum

Title: Knowledge Bases of Science with Representation and Reasoning through Universal Schema

Abstract: We want to build large-scale knowledge bases of science containing entities and relations in fields such as biomedicine, material science, computer science, and STEM career paths. Work in knowledge representation and knowledge bases has long struggled to design schemas of entity- and relation-types that capture the desired balance of specificity and generality while also supporting reasoning and information integration from various sources of input evidence. In this talk I will describe our work in "universal schema," a deep learning approach to knowledge representation in which we operate on the union of all input schemas (from structured KBs to natural language textual patterns) while also supporting integration and generalization by learning vector embeddings whose neighborhoods capture semantic implicature. I will also discuss our work in (a) large-scale, non-greedy clustering for entity resolution, (b) question answering with chains of reasoning, using reinforcement learning to guide the efficient search for meaningful chains, (c) embedded vector representations of common sense, and (d) applications to material science (in collaboration with Elsa Olivetti, MIT) and biomedicine. I will also briefly touch on our ongoing efforts to revolutionize scientific peer review by creating systems supporting a variety of reviewing workflows, including "open peer review" and improved expertise modeling.

Bio: Andrew McCallum is a Professor and Director of the Information Extraction and Synthesis Laboratory, as well as Director of the Center for Data Science in the College of Information and Computer Science at the University of Massachusetts Amherst. He has published over 250 papers in many areas of AI, including natural language processing, machine learning and reinforcement learning; his work has received over 50,000 citations. He obtained his PhD from the University of Rochester in 1995 with Dana Ballard and a postdoctoral fellowship from CMU with Tom Mitchell and Sebastian Thrun. In the early 2000's he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is an AAAI Fellow, the recipient of the UMass Chancellor's Award for Research and Creative Activity, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from Google, IBM, Microsoft, Oracle, and Yahoo. He was the General Chair for the International Conference on Machine Learning (ICML) 2012, is now serving as Past-President of the International Machine Learning Society, and is a member of the editorial board of the Journal of Machine Learning Research. For the past twenty years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, entity resolution, social network analysis, structured prediction, semi-supervised learning, and deep neural networks for knowledge representation. His work on open peer review can be found via McCallum's web page.


Friday, October 20th, 2017

Speaker: Andrew Wilson

Title: Bayesian Generative Adversarial Networks

Abstract: Through an adversarial game, generative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood. I will present a practical Bayesian formulation for unsupervised and semi-supervised learning with GANs. Within this framework, we use stochastic gradient Hamiltonian Monte Carlo for marginalizing parameters. The resulting approach can automatically discover complementary and interpretable generative hypotheses for collections of images. Moreover, by exploring an expressive posterior over these hypotheses, we show that it is possible to achieve state-of-the-art quantitative results on image classification benchmarks, even with less than 1% of the labelled training data.
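
For readers unfamiliar with the sampler mentioned above, here is a minimal scalar sketch of stochastic gradient Hamiltonian Monte Carlo in its simplest form: a momentum variable is driven by the (possibly noisy) gradient of the log posterior, damped by friction, and refreshed with matching Gaussian noise. In the actual Bayesian GAN the gradients come from minibatches, the parameters are high-dimensional network weights, and the hyperparameters are tuned; this toy samples a one-dimensional standard normal.

```python
import math
import random

def sghmc(grad_log_post, theta0, n_steps, step=1e-3, friction=0.1, seed=0):
    """Stochastic gradient Hamiltonian Monte Carlo, simplest discretization:
    v <- (1 - friction) * v + step * grad(theta) + Gaussian noise,
    theta <- theta + v. Returns the list of visited samples."""
    rng = random.Random(seed)
    theta, v, samples = theta0, 0.0, []
    noise_scale = math.sqrt(2 * friction * step)   # noise matched to friction
    for _ in range(n_steps):
        v = (1 - friction) * v + step * grad_log_post(theta) \
            + rng.gauss(0.0, noise_scale)
        theta += v
        samples.append(theta)
    return samples

# Target posterior N(0, 1): grad of the log density is -theta.
draws = sghmc(lambda t: -t, 0.0, 50000)[5000:]   # discard burn-in
```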



Friday, October 27th, 2017

Speaker: Pantelis Pipergias Analytis

Title: You're special, but it doesn't matter if you're a greenhorn: Social recommender strategies for mere mortals.

Abstract: Most choices people make are about “matters of taste” on which there is no universal, objective truth. Nevertheless, people can learn from the experiences of individuals with similar tastes who have already evaluated the available options—a potential harnessed by recommender systems. We mapped recommender system algorithms to models of human judgment and decision making about “matters of fact” and recast the latter as social learning strategies for “matters of taste.” Using computer simulations on a large-scale, empirical dataset, we studied how people could leverage the experiences of others to make better decisions. We found that experienced individuals can benefit from relying mostly on the opinions of seemingly similar people. Inexperienced individuals, in contrast, cannot reliably estimate similarity and are better off picking the mainstream option despite differences in taste. Crucially, the level of experience beyond which people should switch to similarity-heavy strategies varies substantially across individuals and depends on (i) how mainstream (or alternative) an individual’s tastes are and (ii) how much the similarity of the individual’s taste to that of the other people in the population differs across those people.
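
The two strategy families contrasted above can be caricatured in code. This is a hypothetical illustration, not the paper's exact algorithms or similarity measure: an experienced user follows the most similar peer, while an inexperienced user, who cannot estimate similarity reliably, takes the mainstream (highest average rated) option.

```python
def similarity(a, b):
    """Taste similarity: negative mean absolute rating difference on co-rated items."""
    shared = [item for item in a if item in b]
    return -sum(abs(a[item] - b[item]) for item in shared) / len(shared)

def recommend(me, others, options, experienced):
    """Experienced: follow the most similar peer's ratings.
    Inexperienced: pick the option with the highest average rating."""
    if experienced:
        peer = max(others, key=lambda other: similarity(me, other))
        return max(options, key=lambda opt: peer.get(opt, float("-inf")))
    return max(options, key=lambda opt:
               sum(o[opt] for o in others if opt in o) /
               sum(1 for o in others if opt in o))

me = {"a": 5, "b": 1}                                  # my past ratings
others = [{"a": 5, "b": 1, "x": 5, "y": 1},            # my taste twin
          {"a": 1, "b": 5, "x": 1, "y": 5},            # my taste opposite
          {"a": 3, "b": 3, "x": 2, "y": 4}]            # a middling rater
```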

This is joint work with Daniel Barkoczi and Stefan Herzog. 


Friday, November 3rd, 2017

Speaker: Ashish Sabharwal, Allen Institute for Artificial Intelligence (AI2)

Title: Global Reasoning over Semantic Abstractions and Semi-Formal Knowledge Bases

Abstract: Reasoning with natural language is fundamental to the long-term vision of AI, and has driven much research into effective question answering (QA) systems. Despite tremendous advances, however, the best systems to date are fooled by simple textual variations and still struggle with routine tests of human intelligence, such as standardized science exams, even at the elementary level. Unlike several prominent QA datasets, such exams appeal to broad common-sense and domain knowledge, and require multi-fact inference in the presence of generics. The difficulty of creating high-quality questions in such complex domains results in limited training data, which calls into question the viability of the currently popular paradigm of learning everything end-to-end.

The Aristo project at AI2 studies reasoning and linguistic challenges in the microcosm of science QA. In this talk, I'll describe our recent work on global reasoning (via Integer Linear Programming) over semantic abstractions of text. Such abstractions are either collected offline in a semi-formal knowledge base or derived on-the-fly using off-the-shelf, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Despite substantial noise in these abstractions, our approach outperforms state-of-the-art methods, including specialized techniques as well as neural models, by 2%-6%. I'll also describe a novel method for evaluating the validity of noisy semi-formal knowledge bases -- and indeed of any massive dataset -- using a cost-efficient crowdsourcing strategy that requires, for example, only 48K annotations for datasets with 2B items.


Friday, November 10th, 2017

Speaker: Andrew Wilson & Peter Frazier

Title: Bayesian Optimization with Gradients

Abstract: Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbours.


Friday, November 17th, 2017


*2:30p.m. - 3:30p.m., in Gates 122. Refreshments served at 2p.m.*

Speaker: Dylan Foster

Title: Parameter-free Online Learning Via Model Selection

Abstract: Algorithms for online learning and optimization such as gradient descent typically depend on a hyperparameter called the learning rate, which must be tuned correctly to prevent overfitting and ensure rapid convergence. We consider the problem of developing algorithms that are able to "learn the learning rate" and thus be parameter-free. While work in this area has focused on specific, highly structured function classes, such as nested balls in Hilbert space, we eschew this approach and propose a generic meta-algorithm framework that can provably learn the learning rate under minimal structural assumptions. This allows us to derive new computationally efficient parameter-free algorithms for a wide range of settings where such results were previously unavailable. The performance of these algorithms is quantified via so-called model selection oracle inequalities, and the connection between this type of inequality and parameter-free online learning will be a running theme of the talk.
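
The flavor of "learning the learning rate" can be conveyed with a much cruder device than the paper's framework: run one online gradient descent expert per candidate step size and aggregate them with exponential weights on each expert's incurred linear loss. This toy is only a sketch of the idea; the talk's meta-algorithm and its oracle inequalities are substantially more refined, and the grid of step sizes here is an arbitrary illustrative choice.

```python
import math

def meta_ogd(gradients, etas=(0.01, 0.1, 1.0), meta_lr=0.5):
    """One OGD expert per candidate learning rate; a multiplicative-weights
    meta-learner plays a weighted average of the experts' iterates, so the
    combination adapts without a hand-tuned step size. Rounds present
    linear losses l_t(w) = g_t * w via the list of gradients g_t."""
    w = {eta: 0.0 for eta in etas}            # each expert's iterate
    log_weight = {eta: 0.0 for eta in etas}   # meta-learner's log-weights
    played = []
    for g in gradients:
        z = max(log_weight.values())          # normalize for stability
        p = {e: math.exp(log_weight[e] - z) for e in etas}
        total = sum(p.values())
        played.append(sum(p[e] / total * w[e] for e in etas))
        for e in etas:
            log_weight[e] -= meta_lr * g * w[e]   # reward low-loss experts
            w[e] -= e * g                          # each expert's own OGD step
    return played

# With a persistent gradient of -1 the loss favors large iterates, so the
# meta-learner should lock onto the largest step size.
played = meta_ogd([-1.0] * 20)
```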

Joint work with Satyen Kale, Mehryar Mohri, and Karthik Sridharan.


Friday, November 24th, 2017

No Seminar due to the Thanksgiving Holiday

Friday, December 1st, 2017

Speaker: Sudeep Bhatia, UPenn

Host: Pantelis Analytis

Title: Knowledge Representation in Human Judgment

Abstract: I discuss how insights from computational linguistics can be used to inform psychological research on human judgment. My approach uses vector space semantic models, and shows that these models can approximate the knowledge representations that people use to gauge probabilities, forecast future events, assess facts about the world, evaluate brands, and judge other people. By applying existing psychological insights to knowledge representations obtained through vector space semantic analysis, it is possible to build computational models that predict human responses in a large variety of high-level judgment tasks.
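
A hypothetical sketch of the kind of prediction this line of work enables: score an item's word vector by its relative closeness to a positive versus a negative anchor concept in the embedding space. The two-dimensional vectors and anchors below are made up for illustration; real models use high-dimensional embeddings trained on large corpora and richer judgment models on top.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def judged_score(item, positive_anchor, negative_anchor):
    """Judge an item by whether its vector sits closer to the positive or
    the negative anchor concept (e.g. 'good' vs. 'bad' for a brand)."""
    return cosine(item, positive_anchor) - cosine(item, negative_anchor)

good, bad = [1.0, 0.0], [0.0, 1.0]           # toy anchor vectors
liked, disliked = [0.9, 0.1], [0.1, 0.9]     # toy item vectors
```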



See also the AI graduate study brochure.

Please contact any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up!

Sponsored by

CS7790, Fall '17

Erik Andersen
Yoav Artzi
Kavita Bala
Serge Belongie
Claire Cardie
Tanzeem Choudhury
Cristian Danescu-Niculescu-Mizil
Shimon Edelman
Carla Gomes
Joe Halpern
Haym Hirsh
Thorsten Joachims
Ross Knepper
Lillian Lee
David Mimno
Bart Selman
Karthik Sridharan
Kilian Weinberger
Ramin Zabih
Andrew Wilson
Madeleine Udell
Bharath Hariharan

