Recognition

Core Recognition

Identify motivating applications, e.g., robotics, Internet applications, healthcare, and science
Define the tasks of Image Classification, Object Detection, Instance Segmentation, Semantic Segmentation and Pose Estimation in terms of the input, the desired output, and evaluation metrics
Outline why machine learning is needed to solve these tasks, and where machine learning falls short on them.
For each of the above tasks, describe the key challenges that make them difficult even with powerful machine learning systems.
Define neural network architectures, training objectives and training pipelines for each of these tasks.
Describe the overarching framework of structured prediction, how it is relevant to the above tasks, and what the challenges are in learning structured prediction models
Describe the motivation of training models that refine a previous prediction and the two main instantiations of the idea
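
As a small illustration of the evaluation metrics mentioned above, the following sketch computes intersection-over-union (IoU) between two axis-aligned boxes, the overlap measure that detection metrics are commonly built on. The `(x1, y1, x2, y2)` box format, the function names, and the 0.5 threshold are assumptions made for this example, not conventions fixed by this outline.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Width and height of the overlap rectangle; clamped to zero when disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Count a prediction as correct when its IoU with the ground truth
    reaches the threshold (0.5 here is an assumed, commonly used value)."""
    return box_iou(pred, gt) >= threshold
```

Average precision for detection is typically computed by sweeping a confidence threshold over predictions matched in this way; that step is omitted here for brevity.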

Articulate why labeled training datasets are a necessity
Describe potential alternative problem setups that use less labeled data: Few-shot learning, Transfer learning, Semi-supervised learning and Self-supervised learning.
Identify the key learning signals and cues that enable the above setups
Describe the contrastive learning paradigm and reason about why it works well
Describe the idea of self-training / pseudo-labeling and when it leads to improvement
Describe the meaning of consistency regularization
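
To make the contrastive learning objective above concrete, here is a minimal sketch of an InfoNCE-style loss for a single anchor, written in plain Python. It is one common formulation, not necessarily the one used in the course. The function names, the temperature value, and the use of cosine similarity are assumptions for illustration.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive.

    All vectors are L2-normalized so the dot product is a cosine similarity.
    """
    a = normalize(anchor)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(a, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is small when the anchor is closer to its positive (for example, another augmentation of the same image) than to the negatives, and large otherwise.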

Define the captioning, visual question answering (VQA) and retrieval tasks
Articulate the challenges with evaluating captioning and VQA performance
Describe the contrastive architectures for matching text captions to images
Describe how language models can be useful for vision tasks, e.g., ViperGPT
Connect vision-language learning to learning from fewer labels by describing the idea of weak supervision and the notion of "zero-shot" learning
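
The "zero-shot" idea above can be sketched as nearest-prompt classification in a shared image-text embedding space. The embeddings below are hand-made toy vectors standing in for the outputs of trained image and text encoders. The prompt strings and the helper names `cosine` and `zero_shot_classify` are hypothetical.

```python
import math


def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Return the prompt whose embedding is most similar to the image embedding."""
    return max(
        prompt_embeddings,
        key=lambda name: cosine(image_embedding, prompt_embeddings[name]),
    )


# Toy 3-D embeddings (assumed values, not produced by any real model).
prompts = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image = [0.2, 0.8, 0.1]
prediction = zero_shot_classify(image, prompts)
```

No classifier is trained on labeled examples of the target classes here; the class set is specified only through text, which is what makes the setup "zero-shot".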

Ask critically who the beneficiaries of particular applications are
Identify whether particular tasks / recognition problems make sense, or purport to do the impossible
Discuss what metrics and what levels of accuracy would be required for deploying particular technologies
Interrogate the data collection pipeline: identify the issues that stem from large uncurated datasets, data scraped without consent, datasets that are kept private and undisclosed, and datasets that replicate biases and problematic content
Argue whether or not modern advances should count as progress given the issues above.

Derive equations describing image formation and the matrices defining the camera
Derive equations relating pixel color to surface reflectance and lighting
Describe mathematically the epipolar constraint
Describe mathematically how one can reconstruct camera pose and 3D structure given correspondences
Describe classical (e.g., SIFT) and modern learning-based correspondence pipelines
Describe qualitatively modern pipelines for correspondence-based Structure-from-Motion
Identify weaknesses of this pipeline and where it may fail, e.g., repeating textures and sparse views.
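
For reference, the image-formation and epipolar relations named above are commonly written as follows. This is a sketch in one standard notation, and the course's own derivation may use different symbols. It assumes a pinhole camera with intrinsics $K$, the first camera at $[I \mid 0]$, and the second at $[R \mid t]$, so that a point maps between camera frames as $X_2 = R X_1 + t$.

```latex
% Pinhole projection of a homogeneous 3D point \tilde{X} to a homogeneous pixel \tilde{x}
\lambda \, \tilde{x} = K \, [\, R \mid t \,] \, \tilde{X},
\qquad
K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

% Epipolar constraint between corresponding pixels \tilde{x}_1 and \tilde{x}_2
\tilde{x}_2^{\top} F \, \tilde{x}_1 = 0,
\qquad
F = K_2^{-\top} E \, K_1^{-1},
\qquad
E = [t]_{\times} R
```

Here $\lambda$ is the projective depth, $[t]_{\times}$ is the skew-symmetric cross-product matrix of $t$, $E$ is the essential matrix, and $F$ is the fundamental matrix.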

Describe qualitatively the idea of space carving
Describe mathematically the notion of radiance fields and how they relate to image pixels
Discuss the challenges with using neural networks to approximate radiance fields and how they are solved in NeRF
Discuss similar alternatives: Plenoxels, multi-plane images
Contrast these reconstructions with those produced by Structure-from-Motion (SfM)
Discuss the challenges associated with modeling dynamic scenes
Define Signed Distance Fields and other representations of shape
Describe techniques to represent shapes as neural scenes
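
The relation between a radiance field and a pixel color is usually expressed through volume rendering along a camera ray. The sketch below implements the standard discrete quadrature, $C = \sum_i T_i \, (1 - e^{-\sigma_i \delta_i}) \, c_i$ with $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$, for samples that have already been evaluated. The function name and the list-based inputs are assumptions for illustration. In NeRF-style methods the densities and colors would come from a learned model.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite per-sample densities and RGB colors into one pixel color.

    densities: sigma_i at each sample along the ray (front to back)
    colors:    [r, g, b] at each sample
    deltas:    distance between consecutive samples
    Returns (pixel_rgb, remaining_transmittance).
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed before this sample
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

Because every step is differentiable in the densities and colors, the rendered pixel can be compared to an observed pixel and the error used to fit the underlying field.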

Discuss the motivations for building generative models
Derive the training objectives for GANs, VAEs and Diffusion Models
Contrast the benefits and challenges for each generative model and connect them to training objectives
Describe architectures for conditional generation based on text
Discuss ethical issues around algorithmic synthesis: misinformation, scraping artist content
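
For orientation, the training objectives named above are commonly stated in the following forms. These are the widely used textbook versions, given as endpoints rather than derivations. The notation ($G$, $D$, $q_\phi$, $p_\theta$, $\epsilon_\theta$, $\bar{\alpha}_t$) is assumed here and may differ from the course's.

```latex
% GAN: minimax game between generator G and discriminator D
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}} \big[ \log D(x) \big]
+ \mathbb{E}_{z \sim p(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]

% VAE: evidence lower bound (ELBO), maximized over \theta and \phi
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)} \big[ \log p_\theta(x \mid z) \big]
- \mathrm{KL}\big( q_\phi(z \mid x) \,\|\, p(z) \big)

% Diffusion: simplified noise-prediction loss
L_{\text{simple}} =
\mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}
\Big[ \big\| \epsilon - \epsilon_\theta\big( \sqrt{\bar{\alpha}_t}\, x_0
+ \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t \big) \big\|^2 \Big]
```

The adversarial objective has no explicit likelihood, while the VAE and diffusion objectives are both variational bounds on the log-likelihood (the diffusion loss shown is a reweighted form of such a bound).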

Describe why embodiment might be necessary for learning
Describe qualitatively the Markov decision process (MDP) and partially observable MDP (POMDP) problems as well as basic RL algorithms
Describe the notion of intuitive physics and world models and the idea of model-based Reinforcement Learning
Compare model-based and model-free approaches intuitively
Discuss the usefulness and the challenges of cognitive-science-inspired architectures, especially with respect to generalization
Discuss challenges associated with other vision sensors common in robotics, e.g., 3D sensors
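
As one example of a basic model-free RL algorithm, the sketch below applies the tabular Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha \, [\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,]$ to a toy two-state problem. The state and action names, the nested-dictionary Q-table, and the hyperparameter values are all assumptions made for this illustration.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    A state with no actions in the table is treated as terminal (value 0).
    """
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Toy chain: taking "right" in s0 reaches terminal state s1 with reward 1.
q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {}}
for _ in range(20):
    q_learning_update(q, "s0", "right", reward=1.0, next_state="s1")
```

The update uses only sampled transitions and never an explicit transition model, which is the sense in which it is model-free. A model-based method would instead learn or use the dynamics and plan with them.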

Concretely identify a research question of significance
Review prior work and crisply identify limitations
Design new solutions to address identified limitations
Evaluate and audit the proposed solution with the actual application and potential ethical considerations in mind.
Provide feedback and guidance to peer researchers.
Write a technical paper that can be accepted to a workshop or conference.