01.31.25: Reasoning in the Wild
Speaker: Wenting Zhao, Cornell
Host: Claire Cardie
Abstract: In this talk, I will discuss how to build natural language processing (NLP) systems that solve real-world problems requiring complex reasoning. I will address three key challenges. First, because real-world reasoning tasks often differ from the data used in pretraining, I will introduce WildChat, a dataset of reasoning questions collected from users, and demonstrate how training on it enhances language models’ reasoning abilities. Second, because supervision is often limited in practice, I will describe my approach to enabling models to perform multi-hop reasoning without direct supervision. Finally, since many real-world applications demand reasoning beyond natural language, I will introduce a language agent capable of acting on external feedback. I will conclude by outlining a vision for training the next generation of AI reasoning models.
02.07.25: Egocentric Computer Vision, for Fun and for Science - To Be Rescheduled
Speaker: David Crandall, Indiana University
Host: Waki Kamino
Abstract: The typical datasets we use to train and test computer vision algorithms consist of millions of consumer-style photos downloaded from the Internet. But this imagery is, arguably, very artificial: it's significantly different from what humans actually see as they go about their daily lives. Low-cost, lightweight wearable cameras (like GoPro) make it possible to record people's lives from a first-person, "egocentric" perspective that approximates their actual field of view. In this talk, I'll share some of our recent work on studying computer vision from the egocentric point of view. I'll argue that the egocentric perspective offers an opportunity for a more human-centered approach to computer vision research. In addition to "fun" consumer applications, I'll talk about our joint work with developmental psychologists that has used first-person cameras on young children to better understand how they learn and, in the process, how we might improve computer vision.
02.14.25: Efficient Local and Global Causal Discovery: Methods Leveraging Causal Substructures for Improved Finite Sample Performance
Speaker: Kyra Gan, Cornell
Host: Kilian Weinberger
Abstract: In this talk, we introduce two complementary methods for local and global causal discovery that leverage causal substructures for improved finite-sample performance. The first method, under an additive noise model (ANM) setting, exploits ancestral relationships to produce a more informative topological ordering than traditional linear orderings, generalizing to nonlinear causal relationships. The second method is a constraint-based approach that focuses on efficiently identifying valid adjustment sets (VAS) for confounding control, without relying on parametric or pretreatment assumptions. Both methods offer theoretical guarantees, run in polynomial time, and are empirically validated on synthetic data. Together, they highlight how harnessing local and global structures can reduce computational overhead, enhance accuracy in identifying causal edges, and improve downstream inference in observational studies.
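To make the idea of a valid adjustment set concrete, here is a minimal, illustrative sketch (not the speaker's constraint-based algorithm, which works without knowing the graph): in a known DAG, the parents of the treatment form a valid adjustment set by the backdoor criterion, and a valid set may contain no descendant of the treatment. All graph names here are toy assumptions.

```python
# Hypothetical sketch: backdoor adjustment in a *known* DAG. The talk's method
# identifies valid adjustment sets without parametric assumptions; this toy
# only illustrates what "valid adjustment set" means.

def descendants(graph, node):
    """All nodes reachable from `node` via directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def parent_adjustment_set(graph, treatment):
    """Parents of the treatment block all backdoor paths into it."""
    return {v for v, children in graph.items() if treatment in children}

# Toy DAG: Z confounds the effect of X on Y.
dag = {"Z": ["X", "Y"], "X": ["Y"], "Y": []}
vas = parent_adjustment_set(dag, "X")
assert vas == {"Z"}                       # adjust for the confounder Z
assert not (vas & descendants(dag, "X"))  # no descendant of X included
```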
02.21.25: Using Algorithms to Understand Transformers, and Using Transformers to Understand Algorithms
Speaker: Vatsal Sharan, USC
Host: Michael Kim
Abstract: We will discuss how algorithmic tools and understanding borrowed from optimization theory, Fourier transforms, and Boolean function analysis can help understand the mechanisms employed by Transformers to solve basic computational tasks such as linear regression and addition. We will examine the role of the architecture and pre-trained data in enabling Transformers to learn their employed mechanisms. Finally, we will discuss work on using Transformers themselves to discover and design data structures for tasks such as nearest neighbor search.
02.28.25: All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Speaker: Gokul Swamy, Carnegie Mellon
Host: Sanjiban Choudhury
Abstract: From a first-principles perspective, it may seem odd that the strongest results in foundation model fine-tuning (FT) are achieved via a relatively complex, two-stage training procedure. Specifically, one first trains a reward model (RM) on some dataset (e.g. human preferences) before using it to provide online feedback as part of a downstream reinforcement learning (RL) procedure, rather than directly optimizing the policy parameters on the dataset via offline maximum likelihood estimation. In fact, from an information-theoretic perspective, we can only lose information via passing through a reward model and cannot create any new information via on-policy sampling. To explain this discrepancy, we scrutinize several hypotheses on the value of RL in FT through both theoretical and empirical lenses. Of the hypotheses considered, we find the most support for the following explanation: on problems with a generation-verification gap, the relatively simple RM (verifier) is easy to learn from the preference data, and the downstream RL procedure can then filter its search space to the subset of policies (generators) that are optimal for such simple verifiers; this combination is what leads to the superior performance of online FT.
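A toy sketch of the two-stage pipeline the abstract contrasts with offline MLE: (1) fit a simple reward model (the "verifier") from pairwise preferences via the Bradley-Terry loss, then (2) use it to filter on-policy samples (the "generators"). Everything here, including the one-parameter reward model and the scalar "completions", is an illustrative assumption, not the paper's setup.

```python
import math

# Stage 1 -- reward modeling. Bradley-Terry: P(chosen > rejected) is the
# sigmoid of the reward difference; we fit a 1-D reward r(x) = w*x by
# gradient descent on that negative log-likelihood.
def bt_loss(r_chosen, r_rejected):
    """Bradley-Terry negative log-likelihood for one preference pair."""
    return -math.log(1.0 / (1.0 + math.exp(r_rejected - r_chosen)))

def fit_reward(pairs, lr=0.1, steps=200):
    """Fit w so that r(x) = w*x ranks chosen above rejected."""
    w = 0.0
    for _ in range(steps):
        for x_chosen, x_rejected in pairs:
            p = 1.0 / (1.0 + math.exp(w * (x_rejected - x_chosen)))
            w += lr * (1.0 - p) * (x_chosen - x_rejected)  # ascend likelihood
    return w

# Stage 2 -- "RL as filtering": keep the on-policy sample the verifier
# scores highest (best-of-n as a stand-in for a full RL procedure).
def best_of_n(policy_samples, w):
    return max(policy_samples, key=lambda x: w * x)

pairs = [(x + 1.0, x) for x in (0.0, 0.5, 1.0)]  # "larger is preferred"
w = fit_reward(pairs)
assert w > 0                                  # RM learned the preference
assert best_of_n([0.2, 0.9, 0.4], w) == 0.9   # verifier filters generators
```

The point of the sketch mirrors the abstract's hypothesis: the verifier (`w`) is far simpler to learn than a full generator, and the second stage only has to search among candidates that score well under it.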
03.07.25: The OLMo Cookbook: Open Recipes for Language Model Data Curation
Speaker: Kyle Lo, Allen Institute for AI
Host: Tanya Goyal
Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it can be challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities, risks and limitations. In this talk, I'll present how we approach data curation research for OLMo, our project to develop and share fully open language models. Reflecting on our journey from OLMo 1 to our latest release of OLMo 2, I'll explore how data curation practices have matured across our work and the broader open data research ecosystem. Finally, I'll examine key challenges and opportunities for open data amid a rapidly changing language model landscape.
03.14.25: Building Real-Time, Adaptive Robots Using Online Search
Speaker: Preston Culbertson, Cornell
Host: Tapomayukh Bhattacharjee
Abstract: Robots operating in unpredictable environments often struggle when offline-trained policies—particularly those developed in simulation—face domain shifts. Although techniques like domain-randomized reinforcement learning and imitation learning have shown impressive results in hardware, they typically require extensive retraining and are unable to adapt to changing conditions. In this talk, I introduce an alternative approach that leverages online search to adapt pre-trained policies at runtime, allowing robots to meet new constraints and optimize new objectives without expensive retraining. I will first present our work on stochastic safety, which employs a control barrier function calibrated with a hardware-trained generative model to enable real-time collision avoidance for quadrotors and humanoid robots. I will also discuss our recent work on online search for dexterous manipulation, where forward search via the cross-entropy method and vision-based state estimation achieve robust in-hand cube reorientation in hardware, without any policy pretraining. These findings demonstrate that runtime computation and online search can enable robust adaptation and multi-task generalization, opening promising new research directions in adaptive robotics.
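The cross-entropy method mentioned above is simple enough to sketch: repeatedly sample candidate actions from a distribution, keep the lowest-cost "elites", and refit the distribution to them. The scalar action space and quadratic cost below are toy assumptions for illustration, not the speaker's manipulation setup.

```python
import random
import statistics

# Minimal cross-entropy method (CEM) for forward search: sample candidates,
# select elites under the cost, refit the Gaussian sampling distribution.
def cem(cost, mu=0.0, sigma=2.0, n=64, n_elite=8, iters=30, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        elites = sorted(samples, key=cost)[:n_elite]   # lowest-cost samples
        mu = statistics.mean(elites)                   # refit mean
        sigma = max(statistics.stdev(elites), 1e-3)    # refit spread, floored
    return mu

# Toy "tracking" cost: drive the action toward a target value.
target = 1.3
best = cem(lambda a: (a - target) ** 2)
assert abs(best - target) < 0.05
```

In a manipulation setting the scalar would be replaced by an action sequence and the cost by a rollout through a dynamics model, but the sample–select–refit loop is the same.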
03.21.25: Reasoning about Large Language Models
Speaker: Guy Van den Broeck, UCLA
Host: Kevin Ellis
Abstract: Today, many expect AI to tackle complex problems by performing reasoning—commonly interpreted as large language models generating sequences of tokens that resemble chains of thought. Yet historically, AI reasoning had a very different meaning: executing symbolic algorithms that performed logical or probabilistic deduction to derive definite answers to questions about knowledge. In this talk, I show that such old-fashioned ideas are very relevant to reasoning with large language models today. In particular, I will demonstrate that integrating symbolic reasoning algorithms directly into the architecture of language models enables state-of-the-art capabilities in controllable text generation and alignment.
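A minimal sketch of the flavor of idea involved: a symbolic constraint can be enforced during generation by masking, at each step, any token that would violate it. The toy binary vocabulary and the constraint "no two 1s in a row" are assumptions for illustration; the talk's architecture-level integration is far more sophisticated.

```python
import random

# Constrained decoding sketch: a logical constraint acts as a hard mask over
# the next-token choices, so every generated string satisfies it by
# construction (a uniform sampler stands in for the language model).
def allowed(prefix, token):
    """Constraint: never emit '1' immediately after '1'."""
    return not (token == "1" and prefix.endswith("1"))

def constrained_generate(rng, length=12, vocab=("0", "1")):
    out = ""
    for _ in range(length):
        choices = [t for t in vocab if allowed(out, t)]  # symbolic mask
        out += rng.choice(choices)                       # "model" sampling
    return out

s = constrained_generate(random.Random(0))
assert "11" not in s and len(s) == 12  # constraint holds by construction
```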
03.28.25: Sequential Decision Making Using Online Variational Bayes
Speaker: Kevin Murphy, Google
Host: Kevin Ellis
Abstract: After reviewing the basics of approximate sequential Bayesian inference for state-space models, I will present several algorithms we have developed [1-4] to make this process more efficient and robust by combining various tricks (e.g., linearization, low-rank matrix updates, natural gradients, generalized Bayes). I will then show how this can be applied to the problem of learning neural networks from streaming, non-stationary data, which is needed when tackling various kinds of sequential decision-making problems, such as bandits, Bayesian optimization, and RL.
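The core primitive behind such methods can be sketched in its simplest form: a recursive (Kalman-style) Gaussian posterior update over a scalar weight, applied one observation at a time as data streams in. The model y = w·x + noise with known noise variance is a toy assumption; the talk's algorithms handle full neural networks with linearization and low-rank tricks.

```python
# One-dimensional online conjugate update for y = w*x + Gaussian noise:
# each observation sharpens the Gaussian posterior over w in closed form,
# without revisiting past data -- the streaming setting the talk targets.
def online_update(mu, var, x, y, noise_var=0.25):
    """One recursive Gaussian posterior update (scalar Kalman-style step)."""
    precision = 1.0 / var + x * x / noise_var
    var_new = 1.0 / precision
    mu_new = var_new * (mu / var + x * y / noise_var)
    return mu_new, var_new

mu, var = 0.0, 10.0          # broad prior over the weight w
true_w = 2.0
for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    mu, var = online_update(mu, var, x, true_w * x)  # noiseless toy stream
assert abs(mu - true_w) < 0.05   # posterior mean converges to the truth
assert var < 10.0                # posterior sharpened by the data
```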
04.18.25: GenAI for Social Sciences: A Data-Driven-Robust-Control Approach to Corporate Finance
Speaker: Lin William Cong
Host: Thorsten Joachims
Abstract: I overview non-text-based generative modeling (involving transformer-based reinforcement learning or the novel "panel trees") for portfolio management, test asset creation, and detecting heterogeneity groups (e.g., asset clusters with differential return predictability), before briefly introducing the concept of data-driven generative equilibrium for counterfactual analysis in economics. I then focus on how goal-oriented GenAI applies to corporate decision-making that entails complex, high-dimensional, and non-linear stochastic control during which managers learn and adapt via dynamic interactions with the market environment. In Campello, Cong, and Zhou (2024), we propose a data-driven-robust-control (DDRC) framework to complement traditional theory, reduced-form models, and structural estimations in corporate finance research, emphasizing both empirical explanation and prediction of firm outcomes while delivering policy recommendations for a variety of business objectives. Specifically, we develop a predictive environment module using supervised deep learning and integrate a decision-making module based on generative deep reinforcement learning. By incorporating model ambiguity and robust control techniques, our framework not only better explains and predicts corporate outcomes in- and out-of-sample but also prescribes key managerial actions that significantly outperform historical ones. We document rich heterogeneity in model ambiguity, prediction performance, and policy efficacy in the cross section of U.S. public firms and over time. Importantly, DDRC helps delineate where theory and causal analysis should concentrate, integrate fragmented prior knowledge (e.g., via transfer learning), and understand managerial preferences (through an extension involving inverse reinforcement learning and generative adversarial networks).
04.25.25: The Loop: The interaction of creation and verification in designing complex systems
Speaker: Cody Roux, Applied Scientist, Amazon Web Services
Host: John Thickstun
Abstract: In designing complex systems, a well-known (and somewhat inevitable) pattern is cyclic: alternating between a creative design and implementation phase, followed by a more reflective validation and verification phase, which either declares success, or identifies issues and goes back to the creative phase. The scale of this cycle in terms of time and effort varies wildly, going from years and millions of dollars to weeks in more "agile" philosophies.
The recent advent of LLMs that can plausibly write code and reason may dramatically shrink this loop by placing a fully automated solution on one or both sides of the cycle.
It is therefore interesting to examine how such loops work, and what their failure modes are. I will mostly talk about my experiences applying automated reasoning as a human component of such loops, usually on the validation side, but the lessons learned are applicable to a broader context.
05.02.25: Reward-Guided Generation in Diffusion Models: Toward Programmable Protein Design
Speaker: Masatoshi Uehara, Evolutionary Scale
Host: Karthik Sridharan
Abstract: Diffusion models are celebrated for their strong generative capabilities. However, practical applications often demand sample generation that not only produces realistic outputs but also optimizes specific objectives (e.g., human preference scores in computer vision, binding affinity in proteins). To address this, diffusion models can be adapted to explicitly maximize desired reward metrics. While many methods have been developed for domains like computer vision, applying reward-guided generation to biological design poses unique challenges: (1) reward functions are often non-differentiable, and (2) the underlying data are frequently discrete. In this talk, I will present our recent advances in test-time controlled generation methods that address these challenges. I will also discuss how these techniques enable real-world applications across molecular design tasks, including protein, DNA, RNA, and small molecule generation.
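The simplest instance of test-time reward guidance with a non-differentiable reward over discrete data is best-of-N: sample candidates from the base generator and keep the one the black-box reward scores highest. The uniform "RNA" generator and the GC-count reward below are toy assumptions, not the speaker's models.

```python
import random

def base_generator(rng, length=8, alphabet="ACGU"):
    """Stand-in for a pretrained discrete generative model (e.g., over RNA)."""
    return "".join(rng.choice(alphabet) for _ in range(length))

def reward(seq):
    """Non-differentiable black-box score: count of G/C (toy 'stability')."""
    return sum(c in "GC" for c in seq)

def best_of_n(rng, n=256):
    """Test-time guidance without gradients: sample N, keep the best."""
    candidates = [base_generator(rng) for _ in range(n)]
    return max(candidates, key=reward)

s = best_of_n(random.Random(0))
assert reward(s) >= 6  # guided selection skews far above the mean of 4
```

More refined test-time methods steer the sampler during generation rather than only at the end, but they share this structure: the reward is only ever queried, never differentiated.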
05.09.25: Auditing, Understanding, and Evaluating Large Language Models
Speaker: Robin Jia, USC
Host: Jennifer Sun
Abstract: The widespread adoption of large language models places a responsibility on the AI research community to rigorously study and understand them. In this talk, I will describe my group’s research on analyzing language models’ training data, internal mechanisms, and downstream behavior. First, I will discuss two complementary approaches to audit usage of copyrighted data for language model training. Next, I will describe my group’s recent work on understanding how language models work internally, including a case study of how they use Fourier features to solve arithmetic problems. Finally, I will highlight our collaborative efforts to characterize LLMs’ strengths and weaknesses in application domains spanning medicine, robotics, and software engineering.