To be useful to downstream applications, visual recognition systems have to solve a diverse array of tasks: they need to recognize a large number of categories, localize instances of these categories precisely in the visual field, estimate their pose accurately, and so on. This set of requirements is also not fixed a priori and can change over time, requiring recognition systems to learn new tasks quickly and with minimal training. In contrast, visual recognition systems today produce only a shallow understanding of images, restricted to recognizing categories in an image, and expanding this shallow understanding requires the expensive collection of training data.
In this talk I will describe my work on removing this shortcoming. I will show how we can build recognition systems that produce richer outputs, such as pixel-precise localization of detected objects, and how we can make progress towards making these systems capable of visual reasoning. Building models for these new goals requires a lot of training data. To reduce this requirement, I will present ways of leveraging past visual experience to learn new tasks, such as recognizing unseen categories, from very little data.
Bio:
I am currently a post-doctoral scholar at Facebook AI Research (FAIR). Before joining FAIR, I did my PhD with Prof. Jitendra Malik at the University of California, Berkeley, where I was awarded the Berkeley Fellowship and the Microsoft Research Fellowship. My interests are in object recognition in computer vision and machine learning.