Artificial Intelligence Seminar

Spring 2018, Fridays
Talks - 12:15-1:15 p.m. in Gates G01
Discussion & Refreshments - 1:15-2:00 p.m. in Gates 122


The AI seminar will meet weekly for lectures by graduate students, faculty, and researchers emphasizing work-in-progress and recent results in AI research. The talks will run in Gates G01 between 12:15 and 1:15 p.m., with a discussion and lunch in Gates 122 following at 1:15 p.m. The new format is designed to allow AI chit-chat after the talks. Also, we're trying to make some of the presentations less formal so that students and faculty will feel comfortable using the seminar to give presentations about work in progress or practice talks for conferences.

If you or others would like to be added to or removed from this announcement list, please contact Vanessa Maley at


January 26th, 2018

Speaker: Yisong Yue, California Institute of Technology

Host: Thorsten Joachims, Cornell University

Title: New Frontiers in Imitation Learning

Abstract: Imitation learning is a branch of machine learning that pertains to learning to make (a sequence of) decisions given demonstrations and/or feedback. Canonical settings include self-driving cars and playing games. When scaling up to complex state/action spaces, one major challenge is how best to incorporate structure into the learning process.  For instance, the complexity of unstructured imitation learning can scale very poorly w.r.t. the naive size of the state/action space.

In this talk, I will describe recent and ongoing work in developing principled structured imitation learning approaches that can exploit interdependencies in the state/action space, and achieve orders-of-magnitude improvements in learning rate or accuracy, or both.  These approaches are showcased on a wide range of (often commercially deployed) applications, including modeling professional sports, laboratory animals, speech animation, and expensive computational oracles.

Biography: Yisong Yue is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology.  He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign.

Yisong's research interests lie primarily in the theory and application of statistical machine learning. His research is largely centered around developing learning approaches that can characterize structured and adaptive decision-making settings. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, policy learning in robotics, and adaptive planning & allocation problems.

February 2nd, 2018

Speaker: Ross Knepper, Cornell University

Title: Communicative Actions in Human-Robot Teams

Abstract: Robots out in the world today work for people but not with people.  Before robots can work closely with ordinary people as part of a human-robot team in a home or office setting, robots need the ability to think and act more like people. When people act jointly as part of a team, they engage in collaborative planning, which involves forming a consensus through an exchange of information about goals, capabilities, and partial plans.  In this talk, I describe a framework for robots to understand and generate messages broadly -- not only through natural language but also by functional actions that carry meaning through the context in which they occur. Careful action selection allows robots to clearly and concisely communicate meaning with human partners in a manner that almost resembles telepathy.  I show examples of how this implicit communication can facilitate activities as basic as hallway navigation and as sophisticated as collaborative tool use in assembly tasks.  I also show how these abilities can assist in recovery after a failure.

February 9th, 2018

No Speaker - Lunch Discussion Only at 12 (noon)


February 16th, 2018

No Speaker - Lunch Discussion Only at 12 (noon)


February 23rd, 2018

Speaker: Olga Russakovsky, Princeton

*This is a joint seminar with Cornell Tech Learning Machine Seminar Series. It will be live at the Cornell Tech Campus, but broadcast to Gates G01 at 12:00 p.m.* More info here:

Title: The Human Side of Computer Vision

Abstract: Intelligent agents acting in the real world need advanced vision capabilities to perceive, learn from, reason about and interact with their environment. In this talk, I will explore the role that humans play in the design and deployment of computer vision systems. Large-scale manually labeled datasets have proven instrumental for scaling up visual recognition, but they come at a substantial human cost. I will first briefly talk about strategies for making optimal use of human annotation effort for computer vision progress. However, no dataset can foresee all the visual scenarios that a real-world system might encounter. I will describe several recent works that integrate human and computer expertise for visual recognition in the fields of semantic segmentation and visual question answering. I will conclude with some thoughts around making fair, transparent and representative computer vision systems going forward.

Bio: Dr. Olga Russakovsky is an Assistant Professor in the Computer Science Department at Princeton University. Her research is in computer vision, closely integrated with machine learning and human-computer interaction. She completed her PhD at Stanford University and her postdoctoral fellowship at Carnegie Mellon University. She was awarded the PAMI Everingham Prize as one of the leaders of the ImageNet Large Scale Visual Recognition Challenge, the NSF Graduate Fellowship and the MIT Technology Review 35-under-35 Innovator award. In addition to her research, she co-founded the Stanford AI Laboratory’s outreach camp SAILORS to educate high school girls about AI. She then co-founded and continues to serve as a board member of the AI4ALL foundation dedicated to educating a diverse group of future AI leaders.


March 2nd, 2018

Speaker: Wei-Lun (Harry) Chao, University of Southern California

Host: Kilian Weinberger, Cornell University

Title: Transfer learning towards intelligent systems in the wild

Abstract: Developing intelligent systems for vision and language understanding in the wild has long been a central part of how people envision the future. In the past few years, with access to large-scale data and advances in machine learning algorithms, vision and language understanding has made significant progress in constrained environments. However, it remains challenging in unconstrained environments in the wild, where an intelligent system must tackle unseen objects and unfamiliar language usage that it has not been trained on. Transfer learning, which aims to transfer and adapt knowledge learned in the training environment to a different but related test environment, has thus emerged as a promising paradigm to remedy this difficulty.

In this talk, I will present my recent work on transfer learning towards intelligent systems in the wild. I will begin with zero-shot learning, which aims to expand the learned knowledge from seen objects, of which we have training data, to unseen objects, of which we have no training data. I will present an algorithm SynC that can construct classifiers of any object class given its semantic description, even without training data, followed by a comprehensive study on how to apply it to different environments. I will then describe an adaptive visual question answering framework that builds upon the insight of zero-shot learning and can further adapt its knowledge to the new environment given limited information. I will finish my talk with some directions for future research.
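The classifier-synthesis idea behind SynC can be illustrated with a toy sketch (this is a hedged illustration, not the paper's implementation; all names, the Gaussian similarity kernel, and the dimensions are hypothetical): a classifier for any class, seen or unseen, is built as a similarity-weighted combination of a small set of learned base classifiers, with weights computed from the class's semantic description.

```python
import numpy as np

def synthesize_classifier(class_embedding, base_embeddings, base_classifiers, sigma=1.0):
    """Build a linear classifier for a (possibly unseen) class from its
    semantic embedding, as a similarity-weighted combination of learned
    base classifiers. A rough sketch of the SynC idea."""
    # Similarity of the class to each base, via a Gaussian kernel.
    d2 = np.sum((base_embeddings - class_embedding) ** 2, axis=1)
    s = np.exp(-d2 / (2 * sigma ** 2))
    s /= s.sum()                    # normalize the combination weights
    # Classifier weights: convex combination of base classifiers.
    return s @ base_classifiers     # shape: (feature_dim,)

# Toy setup: 3 base classifiers over 5-dim features, 4-dim semantic space.
rng = np.random.default_rng(0)
bases_sem = rng.normal(size=(3, 4))   # semantic embeddings of the bases
bases_w = rng.normal(size=(3, 5))     # learned base classifier weights
w_unseen = synthesize_classifier(bases_sem[0], bases_sem, bases_w)
```

No training data for the target class is needed at synthesis time; only its semantic description enters the computation, which is what makes the zero-shot setting possible.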

Bio: Wei-Lun (Harry) Chao is a Computer Science PhD candidate at University of Southern California, working with Fei Sha. His research interests are in machine learning and its applications to computer vision and artificial intelligence. His recent work has focused on transfer learning towards vision and language understanding in the wild. His earlier research includes work on probabilistic inference, structured prediction for video summarization, and face understanding.


March 9th, 2018

Speaker: Byron Boots, Georgia Tech

*This is a joint seminar with Cornell Tech Learning Machine Seminar Series* More info here:

Title: Learning Perception and Control for Agile Off-Road Autonomous Driving

Abstract: The main goal of this talk is to illustrate how machine learning can start to address some of the fundamental perceptual and control challenges involved in building intelligent robots. I’ll start by introducing a new high speed autonomous “rally car” platform built at Georgia Tech, and discuss an off-road racing task that requires impressive sensing, speed, and agility to complete. I will discuss two approaches to this problem, one based on model predictive control and one based on learning deep policies that directly map images to actions. Along the way I’ll introduce new tools from reinforcement learning, imitation learning, and online learning and show how theoretical insights help us to overcome some of the practical challenges involved in learning on a real-world platform. I will conclude by discussing ongoing work in my lab related to machine learning for robotics. 


March 16th, 2018

Speaker: Andrew Wilson, Cornell University

Title: Loss Valleys and Generalization in Deep Learning

Abstract: In this talk, we present two surprising geometric discoveries, leading to two different practical methods for training deep neural networks.  The first result shows that the optima of deep neural networks are not isolated, but can be connected along simple curves, such as a polygonal chain or a quadratic Bezier curve, of near-constant accuracy. We present a new training procedure for finding such paths, and an ensembling algorithm, Fast Geometric Ensembling, inspired by this insight.  This paper can be found at:

The second result is from a paper to be announced in the next few days.  This work helps advance the debate about the width of optima and generalization, as well as our understanding of whether SGD does indeed converge to broad optima.  In this second paper, we provide a general procedure for training neural networks with greatly improved performance over SGD training (for essentially any architecture and any benchmark) and no overhead.

This is joint work with Pavel Izmailov (Cornell), Timur Garipov, Dmitrii Podoprikhin, and Dmitry Vetrov (Moscow State University and the Higher School of Economics).
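The curve-finding result above can be sketched in a few lines (a hypothetical toy that treats networks as flat weight vectors; the names and numbers are made up): a quadratic Bezier path between two trained solutions w1 and w2 is parameterized by a single trainable control point theta, and an FGE-style ensemble averages predictions of models sampled along such a low-loss path.

```python
import numpy as np

def bezier_point(t, w1, theta, w2):
    """Point on a quadratic Bezier curve through weight space:
    phi(0) = w1, phi(1) = w2, with trainable control point theta."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

# Toy weights: two "trained" solutions and a control point.
w1 = np.array([0.0, 1.0, 2.0])
w2 = np.array([4.0, 3.0, 2.0])
theta = np.array([2.0, 5.0, 0.0])   # in practice, trained to keep loss low along the curve

# Sample models along the curve, as an FGE-style ensemble would.
samples = [bezier_point(t, w1, theta, w2) for t in np.linspace(0, 1, 5)]
```

In the actual method, theta is optimized so that the loss stays near-constant along the whole curve; here it is just a fixed vector for illustration.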


March 23rd, 2018

Speaker: Jesse Thomason, UT Austin

Host: Ross Knepper, Cornell University

Title: Improving Semantic Parsing and Language Grounding through Human-Robot Dialog

Abstract: As robots become more ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person asks an assistant robot to "take me to Alice's office," the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug," the robot must know that "heavy," "green," and "mug" are properties that describe a physical object in the environment, and must have accurate concept models of those properties to select the right one. In this talk, we discuss work that performs language parsing with sparse initial data, using conversations between a robot and human users to induce pairs of natural language utterances and the target semantic forms a robot discovers through its questions. Additionally, we discuss strategies for learning perceptual concepts like "heavy," and the objects those concepts apply to, using multi-modal sensory information and interaction with humans. Finally, we present a system with both parsing and perception capabilities that learns from conversations with users to improve both components over time.

Bio: Jesse Thomason is a fifth year PhD candidate working with Dr. Raymond Mooney and collaborating with Dr. Peter Stone at the University of Texas at Austin Computer Science Department (UTCS). He works at the intersection of natural language processing and robotics. His research interests are primarily in semantic understanding and language grounding in human-robot dialogs. He focuses on algorithms that bootstrap robot understanding from interaction with humans, improving language understanding and perceptual grounding for whatever task and domain an embodied robot operates in. He is supported by a National Science Foundation Graduate Research Fellowship and has published at AI, Robotics, and NLP venues such as AAAI, CoRL, IJCAI, NAACL, and COLING.


March 30th, 2018

Speaker: Yuqian Zhang

Host: Kilian Weinberger, Cornell University

Title: Low-Complexity Modeling for Visual Data: Representations and Algorithms

Abstract: This talk focuses on representations and algorithms for visual data, in light of recent theoretical and algorithmic developments in high-dimensional data analysis. We first consider the problem of modeling a given dataset as superpositions of basic motifs. This simple model arises from several important applications, including microscopy image analysis, neural spike sorting and image deblurring. This motif-finding problem can be phrased as "short-and-sparse" blind deconvolution, in which the goal is to recover a short motif (convolution kernel) from its convolution with a random spike train. We assume the kernel to have unit Frobenius norm, and formulate it as a nonconvex optimization problem over the sphere. By analyzing the optimization landscape, we argue that when the target spike train is sufficiently sparse, then, on a region of the sphere, every local minimum is equivalent to the ground truth. This geometric characterization implies that efficient methods obtain the ground truth under the same conditions. We next consider the problem of modeling physical nuisances across a collection of images, in the context of illumination-invariant object detection and recognition. We study the image formation process for general nonconvex objects (faces, etc.), and propose a test data construction methodology that achieves object verification with worst-case performance guarantees. In addition, we leverage tools from sparse and low-rank decomposition to reduce the complexity for both storage and computation. These examples show the possibility of formalizing certain vision problems with rigorous guarantees.
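A minimal numerical sketch of the "short-and-sparse" setup described in the abstract (the names, sizes, and sparsity pattern here are hypothetical; the talk's contribution is the landscape analysis, not this toy code): the observation y is the circular convolution of a short unit-norm kernel with a sparse spike train, and any iterate is retracted back onto the unit sphere.

```python
import numpy as np

def objective(k, x, y):
    """Squared reconstruction error ||y - k * x||^2 under circular convolution."""
    conv = np.real(np.fft.ifft(np.fft.fft(k) * np.fft.fft(x)))
    return np.sum((y - conv) ** 2)

def project_sphere(k):
    """Retract a kernel iterate back onto the unit sphere ||k|| = 1."""
    return k / np.linalg.norm(k)

# Toy instance: length-64 signal, kernel supported on the first 5 entries.
rng = np.random.default_rng(1)
n = 64
k_true = project_sphere(rng.normal(size=n) * (np.arange(n) < 5))  # short, unit norm
x = np.zeros(n)
x[[3, 17, 40]] = rng.normal(size=3)                               # sparse spike train
y = np.real(np.fft.ifft(np.fft.fft(k_true) * np.fft.fft(x)))      # observation
```

The landscape result says, roughly, that when x is sparse enough, minimizing this objective over the sphere cannot get stuck at a spurious local minimum within the analyzed region.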


April 6th, 2018



April 13th, 2018

ACSU Faculty Luncheon - NO SEMINAR


April 20th, 2018

Speaker: Molly Feldman, Cornell University

Title: Automatic Diagnosis of Student Misconceptions in K-8 Mathematics

Abstract: K-8 mathematics students must learn many procedures, such as addition and subtraction. Students frequently learn “buggy” variations of these procedures, which we ideally could identify automatically. This is challenging because there are many possible variations that reflect deep compositions of procedural thought. 

In this talk, I will discuss a system we built that examines students’ answers and infers how they incorrectly combine basic skills into complex procedures. We evaluate this approach on data from approximately 300 students. Our system replicates 86% of the answers that contain clear systematic mistakes (13%). Investigating further, we found 77% at least partially replicate a known misconception, with 53% matching exactly. We also present data from 29 participants showing that our system can demonstrate inferred incorrect procedures to an educator as successfully as a human expert.

This is joint work with Ji Yong Cho, Monica Ong, Sumit Gulwani, Zoran Popović, and Erik Andersen.


April 27th, 2018

*No Speaker -- Discussion and lunch in Gates 122 at 12:15 p.m.*

May 4th, 2018

Speaker: Fujun Luan, Cornell University

Title: Style Transfer in Photos and Paintings

Abstract: In this talk, we present how recent advances in style transfer techniques can be adapted and applied to new applications such as photorealistic style transfer and painterly harmonization. For photorealistic style transfer, the original work of Gatys et al. produces strong distortions in the output, which are acceptable for artistic paintings but not for photographs; we introduce a photorealistic regularization term that constrains the transformation from the input to the output to be locally affine in colorspace. For painterly harmonization, we ensure both spatial and inter-scale consistency, and demonstrate that both aspects are key to generating quality harmonization results.
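The locally affine property can be illustrated with a toy check (a hedged sketch, not the paper's actual regularizer, which is built on a Matting-Laplacian formulation; all names and sizes here are hypothetical): within a small patch, the output RGB values should be an affine function of the input RGB values, which a least-squares fit can test.

```python
import numpy as np

def affine_residual(patch_in, patch_out):
    """Fit output = input @ A + b over a patch by least squares and return
    the residual norm; a small residual means the input-to-output color
    transform is locally affine, the property the photorealistic
    regularizer encourages."""
    n = patch_in.shape[0]
    X = np.hstack([patch_in, np.ones((n, 1))])       # append a bias column
    coef, *_ = np.linalg.lstsq(X, patch_out, rcond=None)
    return np.linalg.norm(patch_out - X @ coef)

# Toy patch: 10 pixels of RGB, with output exactly affine in the input.
rng = np.random.default_rng(2)
p_in = rng.random((10, 3))
A = rng.normal(size=(3, 3))
b = rng.normal(size=3)
p_out = p_in @ A + b
```

Distortion-free stylization keeps this residual small in every local patch, which is how the regularizer distinguishes photorealistic outputs from painterly ones.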

Bio: Fujun Luan is a third-year Ph.D. student in the Computer Graphics & Vision Group at Cornell University, advised by Prof. Kavita Bala. He works on the boundary of computer graphics and vision. His research interests include physically-based rendering and denoising, fabrics modeling, and developing machine learning techniques for image editing and inverse rendering.



See also the AI graduate study brochure.

Please contact any of the faculty below if you'd like to give a talk this semester. We especially encourage graduate students to sign up!

CS7790, Spring 18


Back to CS course websites