Frontiers of Computer Vision

Fall 2025

CS 5672

Course website: https://canvas.cornell.edu/courses/80112

Enrollment questions: courses@cis.cornell.edu

Faculty Name(s): Bharath Hariharan

Faculty Email: bharathh@cs.cornell.edu

Faculty Office Hours: Gates 311, Tuesdays 2:00 - 3:00 p.m., starting the week of Sep 1.

 

Course Staff and Course Staff Office Hours:

This course will have 2 teaching assistants and ~2 office hours per week. Times and venues for office hours will be posted the first week of classes.

 

Prerequisites/Corequisites:
Knowledge of linear algebra, programming and probability/statistics is required.
Knowledge of machine learning basics is recommended.

Time and Location: Mondays/Wednesdays 8:40 am - 9:55 am in Kimball Hall. Total of 28 lectures.

Course Description

This course will cover contemporary advances in computer vision. We will briefly review classical computer vision before delving into neural networks, modern methods for 3D reconstruction such as neural radiance fields, modern recognition systems based on vision-language models and multimodal language models, and recent trends in image synthesis and generation.

 

Course Objectives/Student Learning Outcomes

After taking this course, students will be able to:

  1. Describe intuitively and mathematically the geometry and physics of image formation
  2. Define clearly the information that gets lost in image formation
  3. Derive the mathematics behind inverting the projection process and recovering camera poses and point clouds
  4. Identify why correspondences are needed
  5. Define the challenges in identifying correspondences and describe how reconstruction algorithms handle outlier correspondences
  6. Describe the architecture and training pipelines of DUSt3R and its variants
  7. Use established libraries to reconstruct scenes

  1. Define the goal of neural radiance fields (NeRFs)
  2. Differentiate NeRFs from classical reconstruction pipelines
  3. Define the need for positional encodings
  4. Derive the mathematical expressions involved in volume rendering
  5. Implement the NeRF training and inference pipeline
  6. Identify scenarios where NeRFs fail to model scenes
  7. Discuss the advantages and disadvantages of Gaussian Splats as an alternative to NeRFs

  1. Design, implement and train convolutional networks for classification
  2. Design, implement and train transformers for classification
  3. Describe the main approaches to transfer learning, including fine-tuning, prompt tuning and parameter-efficient fine-tuning
  4. Describe architectures and training pipelines for object detection
  5. Describe architectures and training pipelines for semantic and instance segmentation
  6. Articulate the challenges and training methodologies for structured prediction
  7. Write down loss functions used for self-supervised learning and explain the corresponding intuitions
  8. Write down loss functions used for contrastive vision-language models and explain the corresponding intuitions
  9. Identify the capabilities and weaknesses of vision-language models
  10. Identify the capabilities and limitations of multimodal language models

  1. Mathematically describe the generative modeling problem
  2. Intuitively describe the challenges behind generative modeling
  3. Define the training objective of GANs and describe how it addresses these challenges
  4. Define the training objective of VAEs and contrast them with GANs
  5. Define the training objective and architectures of diffusion models and their different variants
  6. Describe how the architecture and objectives are altered for conditional models
  7. Describe the emergent properties of diffusion models
  8. Compare diffusion models with novel view synthesis methods and describe how they may be combined with NeRFs

Course Materials

Course materials, in the form of lecture slides, will be available through the Canvas webpage.

Method of Assessing Student Achievement

  1. Projects: This course will have 2 projects, on the topics of 3D reconstruction and recognition respectively. The projects must be done in groups of 2. Students wanting to do the projects alone must let the instructor know.
  2. Individual homeworks: This course will have 2 homeworks to be done individually. The homeworks will involve both written and programming components.
  3. Exams: This course will have one take-home final exam.

  1. Late work: You will have 10 slip days for the entire course. Once these are exhausted, you will lose 5% of the assignment grade for every day of delay.
  2. Missed work: If you miss assignments, homeworks or exams due to unforeseen medical emergencies, contact the instructor of the course with the reason and appropriate documentation. If the instructor finds the reason justified, you will be given the option of rescaling whatever work you have done to calculate your grade. The details of how this rescaling will work are at the discretion of the instructor. However, this option will only be available if you have submitted at least one programming assignment and one exam. If you are unable to do this minimum amount of work, then you will be encouraged to take an INC and finish the work later, or switch to S/U.
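As a rough illustration of the late-work arithmetic above, here is a minimal Python sketch. It is not an official grading tool. The function name `late_penalty_percent` is hypothetical, and the sketch assumes that delays are counted in whole days, that remaining slip days are applied automatically before any penalty, and that the penalty is capped at 100% of the assignment grade; the syllabus does not state these details, so check with the instructor.

```python
def late_penalty_percent(days_late, slip_days_left, penalty_per_day=5):
    """Return (penalty in percent of the assignment grade, slip days remaining).

    Assumptions (not stated in the syllabus): whole-day granularity,
    slip days are consumed automatically before any penalty applies,
    and the penalty is capped at 100%.
    """
    if days_late < 0 or slip_days_left < 0:
        raise ValueError("days_late and slip_days_left must be non-negative")
    covered = min(days_late, slip_days_left)      # days absorbed by slip days
    penalized_days = days_late - covered          # days that incur a penalty
    penalty = min(100, penalized_days * penalty_per_day)
    return penalty, slip_days_left - covered


# 3 days late with all 10 slip days available: no penalty, 7 slip days remain.
print(late_penalty_percent(3, 10))   # (0, 7)
# 4 days late with 2 slip days left: 2 penalized days, i.e. 10% of the grade.
print(late_penalty_percent(4, 2))    # (10, 0)
```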

Grade distribution:

Assignment, assessment or activity    Percentage of grade
Projects                              40%
Individual homeworks                  40%
Final                                 20%
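To see how these weights combine, here is a small Python sketch of the weighted-average arithmetic. The names `WEIGHTS` and `course_percentage` are illustrative, and the sketch assumes each category score is already a percentage between 0 and 100; how individual projects or homeworks are combined within a category is not specified in this syllabus.

```python
# Category weights in percent, taken from the grade distribution above.
WEIGHTS = {"projects": 40, "homeworks": 40, "final": 20}


def course_percentage(scores):
    """Weighted course percentage from per-category percentages (0-100).

    `scores` maps each category name in WEIGHTS to a percentage.
    Assumption: each category has already been reduced to one percentage.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Example: 90% on projects, 80% on homeworks, 70% on the final.
print(course_percentage({"projects": 90, "homeworks": 80, "final": 70}))  # 82.0
```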

                                                                                                                       

 

           

Percentage    Letter grade
95-100%       A+
90-95%        A
85-90%        A-
80-85%        B+
75-80%        B
65-75%        B-
50-65%        C+
40-50%        C
30-40%        C-
20-30%        D+
10-20%        D
<10%          F

 

 

 

 

 

 

Course Management

ACADEMIC INTEGRITY:  

Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work.

In particular, since modern language model-based tools (like ChatGPT) can copy any text in their training data without citation, the use of such tools will be considered plagiarism and is therefore strictly prohibited.

 

ACCOMMODATIONS FOR STUDENTS WITH DISABILITIES:  

Students with Disabilities: Your access in this course is important. Please give me your Student Disability Services (SDS) accommodation letter and email me a note early in the semester so that we have adequate time to arrange your approved academic accommodations. If you need an immediate accommodation for equal access, please speak with me after class or send an email message to me and/or SDS at sds_cu@cornell.edu. If the need arises for additional accommodations during the semester, please contact SDS. Student Disability Services is located at Cornell Health Level 5, 110 Ho Plaza, 607-254-4545, sds.cornell.edu.

 

 

 

INCLUSIVITY:

 Computer vision is a technology fraught with many ethical issues in its current practice. As new entrants into this field, you have the power to change this for the better. We can start by keeping our course an inclusive environment that supports everyone’s learning, maintains a civil discourse, and respects what every one of us brings to the table.

MENTAL HEALTH AND STRESS MANAGEMENT RESOURCES

 If you are feeling overwhelmed, or worried about a friend, please reach out to one of your instructors or your academic advisor.

Please consult this guide, which collects all the resources available to you.

 

Note that Cornell has trained peer mentors available to listen and help through the Empathy, Assistance, and Referral Service, as well as trained counselors through Cornell Health's Counseling and Psychological Services (CAPS, 607-255-5155) and Let’s Talk.