Recognition
Core Recognition
- Identify examples of motivating applications, e.g., robotics, internet applications, healthcare, and science
- Define the tasks of Image Classification, Object Detection, Instance Segmentation, Semantic Segmentation and Pose Estimation in terms of the input, the desired output, and evaluation metrics
- Outline why machine learning is needed to solve these tasks and what its limitations are for them.
- For each of the above tasks, describe the key challenges that make them difficult even with powerful machine learning systems.
- Define neural network architectures, training objectives and training pipelines for each of these tasks.
- Describe the overarching framework of structured prediction, how it is relevant to the above tasks, and what the challenges are in learning structured prediction models
- Describe the motivation for training models that refine a previous prediction, and the two main instantiations of this idea
Learning with limited labeled data
- Articulate why labeled training datasets are a necessity
- Describe potential alternative problem setups that use less labeled data: Few-shot learning, Transfer learning, Semi-supervised learning and Self-supervised learning.
- Identify the key learning signals and cues that enable the above setups
- Describe the contrastive learning paradigm and reason about why it works well
- Describe the idea of self-training / pseudo-labeling and when it leads to improvements
- Describe the meaning of consistency regularization
Vision-language learning
- Define the captioning, visual question answering (VQA), and retrieval tasks
- Articulate the challenges with evaluating captioning and VQA performance
- Describe the contrastive architectures for matching text captions to images
- Describe how language models can be useful for vision tasks, e.g., ViperGPT
- Connect vision-language learning with learning from fewer labels by describing the idea of weak supervision and the notion of "zero-shot" learning
Task framing and ethics
- Ask critically who the beneficiaries of particular applications are
- Identify whether particular tasks or recognition problems make sense, or whether they purport to do the impossible
- Discuss which metrics and what levels of accuracy would be required to deploy particular technologies
- Interrogate the data collection pipeline: identify the issues that stem from large uncurated datasets, data scraped without consent, datasets that are kept private and undisclosed, and datasets that replicate biases and problematic content
- Argue whether or not modern advances should count as progress given the issues above.
Reconstruction
Basics
- Derive equations describing image formation and the matrices defining the camera
- Derive equations relating pixel color to surface reflectance and lighting
- Describe mathematically the epipolar constraint
- Describe mathematically how one can reconstruct camera pose and 3D structure given correspondences
- Describe classical (SIFT) and modern learning-based correspondence pipelines
- Describe qualitatively modern pipelines for correspondence-based Structure-from-Motion (SfM)
- Identify weaknesses of this pipeline and where it may fail, e.g., repeating textures and sparse views
Neural Fields
- Describe qualitatively the idea of space carving
- Describe mathematically the notion of radiance fields and how they relate to image pixels
- Discuss the challenges of using neural networks to approximate radiance fields and how NeRF addresses them
- Discuss similar alternatives: Plenoxels, multi-plane images
- Contrast neural-field reconstructions with those produced by SfM
- Discuss the challenges associated with modeling dynamic scenes
- Define Signed Distance Fields and other representations of shape
- Describe techniques for representing shapes with neural fields
Synthesis
- Discuss the motivations for building generative models
- Derive the training objectives for GANs, VAEs and Diffusion Models
- Contrast the benefits and challenges of each generative model and connect them to its training objective
- Describe architectures for conditional generation based on text
- Discuss ethical issues around algorithmic synthesis, e.g., misinformation and the scraping of artists' content
Embodied Vision
- Describe why embodiment might be necessary for learning
- Describe qualitatively the Markov decision process (MDP) and partially observable MDP (POMDP) problems, as well as basic reinforcement learning (RL) algorithms
- Describe the notion of intuitive physics and world models, and the idea of model-based reinforcement learning
- Compare model-based and model-free approaches intuitively
- Discuss the usefulness and the challenges of cognitive-science-inspired architectures, especially with respect to generalization
- Discuss challenges associated with other vision sensors common in robotics, such as 3D sensors
Research skills
- Concretely identify a research question of significance
- Review prior work and crisply identify limitations
- Design new solutions to address identified limitations
- Evaluate and audit the proposed solution with the actual application and potential ethical considerations in mind.
- Provide feedback and guidance to peer researchers.
- Write a technical paper that can be accepted to a workshop or conference.