Professor of Computer Science
Cornell University & ASAPP Inc.
ACM Fellow | AAAI Fellow | Jolly Good Fellow
Biography
Kilian Q. Weinberger is a Professor in the Department of Computer Science at Cornell University. He received his Ph.D. in Machine Learning from the University of Pennsylvania under the supervision of Lawrence Saul, and his undergraduate degree in Mathematics and Computing from the University of Oxford.
During his career he has won several best paper awards at ICML (2004), CVPR (2004, 2017), AISTATS (2005), and KDD (2014, runner-up award). In 2011 he was awarded the Outstanding AAAI Senior Program Chair Award, and in 2012 he received an NSF CAREER award. He is the recipient of the Daniel M. Lazar '29 Excellence in Teaching Award (2016) and the Ann S. Bowers Teaching and Advising Excellence Award (2024).
As of 2024 he is an ACM and AAAI Fellow, and in 2021 he became a Blavatnik National Awards Finalist. He was elected co-Program Chair for ICML 2016 and AAAI 2018 and has been on the ICML board since 2016. He served as the 7th president of ICML from 2023 until 2025. Since 2024 he has been a member of the Sloan Research Fellowships Selection Committee.
Kilian Weinberger's research focuses on Machine Learning and its applications. In particular, he has worked on learning under resource constraints, metric learning, AI in Science, computer vision, autonomous vehicles, Gaussian Processes, and deep learning. Before joining Cornell University, he was an Associate Professor at Washington University in St. Louis and before that he worked as a research scientist at Yahoo! Research in Santa Clara.
About Me
I am married to (the amazing) Anne Bracy. Together we have three children, Timo, Koby, and Nika. In my spare time I like running (when I am not injured), reading, biking, or boating on Cayuga Lake. Some books that I really enjoyed are The Three-Body Problem Trilogy, A Gentleman in Moscow, The Things They Carried, and Never Let Me Go. In 2025 I particularly enjoyed reading Source Code (by Bill Gates).
My Lab
I have been lucky to work with incredibly talented and fun PhD, Masters, and Undergraduate students and Postdocs.
Current Lab Members
Former Lab Members
I am looking for strong PhD students most of the time. The most important prerequisites for success in machine learning are a strong mathematical background and solid coding skills.
If you are interested, please do not apply to me directly, as all applications are centralized through the department. Please indicate on your application that you are interested in working on machine learning and that you are interested in joining my group. As we receive thousands of applicants, I sometimes apply a filter and first look at applications that mention my name.
All students who are accepted will receive a fully funded fellowship that covers tuition, 12 months' salary, and health insurance. (Please don't send me any emails with questions about the application process, as I am not involved in it.)
A few years ago I summarized my research philosophy in a NeurIPS workshop talk. My research focuses on algorithm design for machine learning, with a specific emphasis on representation learning. Over the years my work has spanned several interconnected research directions, from fundamental questions about how to compare data to practical applications in autonomous driving.
Metric Learning
One of the fundamental challenges of machine learning is how to compare examples. My early work introduced Large Margin Nearest Neighbor (LMNN), which learns distance metrics by pulling similarly labeled inputs close while pushing dissimilarly labeled inputs apart. This framework popularized the triplet loss objective that is now widely used in computer vision.
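To make the objective concrete, here is a minimal sketch of the triplet-style hinge loss described above (function name and toy vectors are illustrative, not from the original LMNN implementation): it penalizes a positive example that is not at least a margin closer to the anchor than a negative example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pulls a similarly labeled point within `margin` of
    the anchor relative to a dissimilarly labeled point (squared
    Euclidean distances, in the spirit of LMNN)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# The loss is zero once the negative is pushed sufficiently far away.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same class, nearby
n = np.array([3.0, 0.0])   # different class, far away
print(triplet_loss(a, p, n))  # 0.0, since d_pos + margin < d_neg
```

In practice the loss is summed over many triplets and minimized with respect to the embedding (or metric) that produces the distances.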
When word embeddings rose to prominence, we introduced the Word Mover's Distance (WMD), a novel approach to measuring similarity between text documents. WMD elegantly incorporates the fact that different words can have similar meanings by casting document comparison as a transportation problem over word embeddings. We later extended this idea to contextual embeddings with BertScore, which has been widely adopted for evaluating machine translation and text generation systems.
Resource Efficient Learning
In industrial applications, all resources are limited and must be accounted for. My group was among the first to formally integrate resource constraints into learning algorithms, treating feature extraction cost as a natural trade-off with accuracy. This work introduced feature hashing, now widely known as the "hashing trick," which allows learning tasks to operate within fixed memory budgets.
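A minimal sketch of the hashing trick (function name and hash choice are illustrative; the original uses a signed hash in the same way): each feature is hashed into a fixed number of buckets, with a second hash bit choosing a sign so that collisions cancel in expectation.

```python
import numpy as np
from zlib import crc32

def hash_features(tokens, dim=16):
    """The hashing trick: map arbitrary token counts into a fixed-size
    vector, regardless of vocabulary size. The top hash bit picks a sign
    so that colliding features cancel in expectation."""
    x = np.zeros(dim)
    for tok in tokens:
        h = crc32(tok.encode("utf-8"))        # deterministic 32-bit hash
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0
        x[h % dim] += sign
    return x

# Memory stays fixed no matter how large the vocabulary grows.
v = hash_features(["the", "hashing", "trick"], dim=16)
print(v.shape)  # (16,)
```

Because no dictionary is stored, the memory footprint is exactly `dim` floats, which is what makes the trick attractive under hard resource budgets.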
During the rise of deep learning, we extended these ideas to neural networks, showing that network parameters are highly redundant and can be compressed by orders of magnitude without significant accuracy loss. This contributed to the now-vibrant subfield of neural network compression and the ongoing discussion about over-parameterization.
Deep Network Architectures
Our work on network compression revealed surprising redundancy in deep networks, raising questions about whether this redundancy is necessary or avoidable. We introduced stochastic depth, showing that deliberately increasing redundancy can substantially improve generalization.
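The mechanism can be sketched in a few lines (a toy forward pass, not the original implementation; the block functions and survival probability are illustrative): during training each residual block is skipped entirely with some probability, while at test time every block runs, scaled by its survival probability.

```python
import numpy as np

def residual_forward(x, blocks, p_survive=0.8, train=True, rng=None):
    """Stochastic depth over a stack of residual blocks: in training each
    block is dropped with probability 1 - p_survive (identity shortcut
    only); at test time all blocks run, scaled by p_survive."""
    if rng is None:
        rng = np.random.default_rng()
    for f in blocks:
        if train:
            if rng.random() < p_survive:
                x = x + f(x)           # block survives this pass
            # else: skip the block entirely
        else:
            x = x + p_survive * f(x)   # expected contribution at test time
    return x

blocks = [lambda x: 0.1 * x for _ in range(3)]
print(residual_forward(np.ones(2), blocks, train=False))  # 1.08**3 per entry
```

Randomly shortening the network during training acts as a strong regularizer and also reduces expected training cost, since skipped blocks are never evaluated.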
To understand the purpose of this redundancy, we designed DenseNet, which introduces direct skip connections between all layers of the same size. This architecture drastically improves both generalization performance and parameter efficiency, winning the CVPR 2017 best paper award and establishing itself as one of the most widely used neural network architectures.
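The connectivity pattern can be illustrated with a toy dense block (numpy stand-ins for convolutional layers; the layer functions and growth rate are illustrative): each layer consumes the concatenation of all preceding feature maps and contributes a small number of new ones.

```python
import numpy as np

def dense_block(x, layers):
    """DenseNet-style connectivity: every layer receives the concatenated
    outputs of all preceding layers, so features are reused rather than
    re-learned."""
    features = [x]
    for f in layers:
        out = f(np.concatenate(features, axis=-1))
        features.append(out)
    return np.concatenate(features, axis=-1)

# Each toy "layer" produces growth_rate = 2 new features from its input.
growth = 2
layers = [lambda h: 0.5 * h[..., :growth] for _ in range(3)]
x = np.ones(4)
print(dense_block(x, layers).shape)  # (10,): 4 input + 3 layers x 2 new
```

Because every layer adds only a few feature maps on top of everything already computed, the block stays parameter-efficient while each layer sees the full feature history.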
Recognizing that neural networks are increasingly used in high-stakes medical decisions, we investigated why network probability outputs are poorly calibrated. Our work on calibration showed that temperature scaling is highly effective for reliable probability estimates, and this approach has become the standard method for network calibration.
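Temperature scaling itself is a one-parameter fix, sketched below (function name and toy logits are illustrative): divide the logits by a scalar T, tuned on a validation set, before the softmax.

```python
import numpy as np

def temperature_softmax(logits, T=1.0):
    """Temperature scaling: divide logits by a single scalar T (fit on a
    validation set) before the softmax. T > 1 softens over-confident
    probabilities without changing the predicted class."""
    z = logits / T
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0])
print(temperature_softmax(logits, T=1.0).round(3))
print(temperature_softmax(logits, T=2.5).round(3))  # softer, same argmax
```

Because T rescales all logits uniformly, accuracy is untouched; only the confidence of the probability estimates changes, which is exactly what calibration requires.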
More recently, we studied graph convolutional neural networks and discovered that much of their complexity was unnecessary. Our Simplifying Graph Convolutional Networks paper showed that a simple closed-form preprocessing step paired with logistic regression can match the performance of complex GCNs while being orders of magnitude faster.
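The preprocessing step can be sketched as follows (an illustrative numpy version; in the paper the propagated features are then fed to plain logistic regression): K rounds of feature smoothing with the symmetrically normalized adjacency matrix, with no learned weights or nonlinearities in between.

```python
import numpy as np

def sgc_features(A, X, K=2):
    """Simplified Graph Convolution: K rounds of feature smoothing with
    the normalized adjacency matrix (self-loops added). The propagation
    is a fixed linear map -- all learning happens in the classifier."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt        # symmetric normalization
    for _ in range(K):
        X = S @ X                              # one smoothing step
    return X

# A 3-node path graph: smoothing mixes each node's features with its neighbors'.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.eye(3)
print(sgc_features(A, X, K=2).round(2))
```

Since S and X can be multiplied once in closed form before training, the expensive per-epoch message passing of a full GCN disappears entirely.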
Efficient Inference for Gaussian Processes
To make Gaussian Processes as accessible as deep learning, my students, Andrew G. Wilson, and I (but really, mostly Geoff Pleiss and Jake Gardner) developed GPyTorch, a highly modular library that leverages GPU-optimized matrix operations. This platform has become one of the most popular GP coding frameworks, with contributors from universities and companies worldwide.
Perception for Autonomous Driving
In collaboration with colleagues in Mechanical Engineering and Computer Science, we investigated whether 3D object detection for self-driving cars could be performed with passive cameras instead of expensive LiDAR sensors. Our pseudo-LiDAR approach converts stereo camera depth estimates into a LiDAR-like 3D point cloud, dramatically improving camera-based detection accuracy.
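The core conversion is standard pinhole-camera geometry, sketched here (an illustrative back-projection, not the pseudo-LiDAR pipeline itself; intrinsics and the toy depth map are made up): every pixel with an estimated depth becomes a 3D point.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a (stereo-estimated) depth map into a 3D point cloud
    with the pinhole camera model: pixel (u, v) at depth z maps to
    ((u - cx) * z / fx, (v - cy) * z / fy, z)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# A flat wall 5 m away; the principal-point pixel lies on the optical axis.
depth = np.full((4, 4), 5.0)
pts = depth_to_point_cloud(depth, fx=100., fy=100., cx=2., cy=2.)
print(pts[2 * 4 + 2])  # [0. 0. 5.]: pixel (2, 2) back-projects to the axis
```

Once the depth map is in this point-cloud form, existing LiDAR-based 3D detectors can be applied to camera data essentially unchanged, which is what makes the representation change so effective.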
Teaching
One of my favorite parts of my job is teaching. I have mostly taught Machine Learning, Deep Learning, and AI. Some of my lectures on Machine Learning are available on YouTube.
At Cornell University
CS4782 - Introduction to Deep Learning
Spring 2026, Spring 2025
CS6784 - Advanced Topics in Machine Learning
Fall 2025, Fall 2024, Fall 2023, Fall 2022, Fall 2017, Fall 2016, Spring 2016
CS3780/4780/5780 - Machine Learning
Spring 2024, Spring 2023, Spring 2022, Fall 2021 (co-taught with Anil Damle), Fall 2018, Spring 2018 (co-taught with Chris de Sa), Spring 2017, Fall 2015
At Washington University in St. Louis
CSE517a - Machine Learning
Spring 2015, Spring 2014, Spring 2010
CSE519T - Advanced Machine Learning
Fall 2014, Fall 2012
CSE 511a - Artificial Intelligence
Fall 2013, Spring 2012, Fall 2010
Contact
Office
Professor of Computer Science
Cornell University
Bowers, Room 475
Ithaca, NY 14853-7501
kilian () cornell.edu
Phone
(607) 255 4845