Frontiers of Computer Vision

Fall 2025

CS 5672

Course website: https://canvas.cornell.edu/courses/80112

Enrollment questions: courses@cis.cornell.edu

Faculty Email: bharathh@cs.cornell.edu

Faculty Office Hours: Gates 311, Tuesdays 2:00 - 3:00 p.m., starting the week of Sep 1.

Course Staff and Course Staff Office Hours:

This course will have 2 teaching assistants and ~2 office hours per week. Times and venues for office hours will be posted the first week of classes.

Prerequisites/Corequisites:
Knowledge of linear algebra, programming and probability/statistics is required.
Knowledge of machine learning basics is recommended.

Time and Location: Mondays/Wednesdays 8:40 am - 9:55 am in Kimball Hall. Total of 28 lectures.

Course Description

This course will cover contemporary advances in computer vision. We will briefly review classical computer vision before delving into neural networks, modern methods for 3D reconstruction such as neural radiance fields, modern recognition systems based on vision-language models and multimodal language models, and recent trends in image synthesis and generation.

Course Objectives/Student Learning Outcomes

After taking this course, students will be able to:

Describe the classical pipelines for 3D reconstruction.

Describe intuitively and mathematically the geometry and physics of image formation
Define clearly the information that gets lost in image formation
Derive the mathematics behind inverting the projection process and recovering camera poses and point clouds
Identify why correspondences are needed
Define the challenges in identifying correspondences and describe how reconstruction algorithms handle outlier correspondences
Describe the architecture and training pipelines of DUSt3R and its variants
Use established libraries to reconstruct scenes

Implement simple versions of neural radiance fields, and explain modern neural radiance field papers.

Define the goal of neural radiance fields
Differentiate it from classical reconstruction pipelines
Define the need for positional encodings
Derive the mathematical expressions involved in volume rendering
Implement the NeRF training and inference pipeline
Identify scenarios where NeRFs fail to model scenes
Discuss the advantages and disadvantages of Gaussian Splats as an alternative to NeRFs

Describe the network architectures of recognition systems, and analyze the capabilities of modern multimodal language models.

Design, implement and train convolutional networks for classification.
Design, implement and train transformers for classification.
Describe the many ways of transfer learning, including fine-tuning, prompt tuning and parameter-efficient fine-tuning
Describe architectures and training pipelines for object detection
Describe architectures and training pipelines for semantic and instance segmentation
Articulate the challenges and training methodologies for structured prediction
Write down loss functions used for self-supervised learning, and explain the corresponding intuitions
Write down loss functions used for contrastive vision language models and explain the corresponding intuitions
Identify the capabilities and weaknesses of vision language models
Identify the capabilities and limitations of multimodal language models

Derive the mathematical formulation of GANs, VAEs, diffusion models and conditional image generation.

Mathematically describe the generative modeling problem
Intuitively describe the challenges behind generative modeling.
Define the training objective of GANs and describe how it solves these challenges.
Define the training objective of VAEs and contrast them with GANs
Define the training objective and architectures of diffusion models and their different variants.
Describe how the architecture and objectives are altered for conditional models
Describe the emergent properties of diffusion models
Compare diffusion models with novel view synthesis methods and describe how they may be combined with NeRFs.

Course Materials

Course materials, in the form of lecture slides, will be available through the course Canvas page.

Method of Assessing Student Achievement

Deliverables:

Projects: This course will have 2 projects, on 3D reconstruction and recognition respectively. The projects must be done in groups of 2. Students who want to do a project alone must let the instructor know.
Individual homeworks: This course will have 2 homeworks to be done individually. The homework will involve both written and programming components.
Exams: This course will have one take-home final exam.

Grading policies

Late work: You will have 10 slip days for the entire course. Once these are exhausted, you will lose 5% of the assignment grade for every day of delay.
Missed work: If you miss projects, homeworks or exams due to unforeseen medical emergencies, contact the instructor of the course with the reason and appropriate documentation. If the instructor finds the reason justified, you will be given the option of rescaling whatever work you have done to calculate your grade. The details of how this rescaling will work are at the discretion of the instructor.
However, this option will only be available if you have submitted at least one programming assignment and one exam. If you are unable to do this minimum amount of work, then you will be encouraged to take an INC and finish the work later, or switch to S/U.
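As a rough illustration of the late-work arithmetic, the sketch below assumes that remaining slip days are spent before any penalty applies, and that each further late day removes 5% of the score earned on that assignment. Both readings are assumptions, the function name and arguments are hypothetical, and the instructor's interpretation of the policy governs.

```python
def late_adjusted_score(raw_score, days_late, slip_days_left, penalty_per_day=0.05):
    """Return (adjusted_score, slip_days_remaining) under the assumed policy."""
    # Assumption: available slip days are used up first, automatically.
    slip_used = min(days_late, slip_days_left)
    penalized_days = days_late - slip_used
    # Assumption: each penalized day removes 5% of the earned score, floored at zero.
    factor = max(0.0, 1.0 - penalty_per_day * penalized_days)
    return raw_score * factor, slip_days_left - slip_used
```

For example, under these assumptions a submission that is 3 days late with 1 slip day left uses that slip day and is penalized for the other 2 days, keeping 90% of its earned score.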

Grade distribution:

Assignment, Assessment or activity	Percentage of grade or points
Projects	40%
Individual Homeworks	40%
Final	20%
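The weights above combine into an overall percentage as a weighted sum. The sketch below is only illustrative: it assumes each component has already been reduced to a single percentage between 0 and 100 (how individual projects or homeworks are averaged within a component is not specified here), and the names `WEIGHTS` and `course_percentage` are hypothetical.

```python
# Weights taken from the grade distribution table.
WEIGHTS = {"projects": 0.40, "homeworks": 0.40, "final": 0.20}

def course_percentage(scores):
    """Combine per-component percentages (0-100) into an overall percentage."""
    # Weighted sum over the three components; every key in WEIGHTS must be present.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For instance, component scores of 90 (projects), 80 (homeworks) and 70 (final) give 36 + 32 + 14, or about 82% overall.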

Grading scale

95-100%	A+	50-65%	C+
90-95%	A	40-50%	C
85-90%	A-	30-40%	C-
80-85%	B+	20-30%	D+
75-80%	B	10-20%	D
65-75%	B-	<10%	F

Course Management

ACADEMIC INTEGRITY:

Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work.

In particular, since modern language-model-based tools (like ChatGPT) can reproduce text from their training data without citation, the use of such tools will be considered plagiarism and is therefore strictly prohibited.

ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES:

Students with Disabilities: Your access in this course is important. Please give me your Student Disability Services (SDS) accommodation letter and email me a note early in the semester so that we have adequate time to arrange your approved academic accommodations. If you need an immediate accommodation for equal access, please speak with me after class or send an email message to me and/or SDS at sds_cu@cornell.edu. If the need arises for additional accommodations during the semester, please contact SDS. Student Disability Services is located at Cornell Health Level 5, 110 Ho Plaza, 607-254-4545, sds.cornell.edu.

INCLUSIVITY:

Computer vision is a technology fraught with many ethical issues in its current practice. As new entrants into this field, you have the power to change this for the better. We can start by keeping our course an inclusive environment that supports everyone’s learning, maintains a civil discourse, and respects what every one of us brings to the table.

MENTAL HEALTH AND STRESS MANAGEMENT RESOURCES

If you are feeling overwhelmed, or worried about a friend, please reach out to one of your instructors or your academic advisor.

Please look at this guide, which collects all the resources available to you.

Note that Cornell has trained peer mentors available to listen and help through the Empathy, Assistance, and Referral Service, as well as trained counselors through Cornell Health's Counseling and Psychological Services (CAPS, 607-255-5155) and Let’s Talk.