Syllabus
Logistical information
CS 6241, Spring 2021: Numerical Methods for Data Science
Lecture time: TR 2:45-4:00
Lecture location: Zoom
Prof: David Bindel
Email: bindel@cornell.edu
OH: Mon 2-4, Fri 12:30-1:30 or 2:00-3:00, or by appointment.
TA: Xinran Zhu
Email: xz584@cornell.edu
OH: Thu 10:00-12:00.
Course description
This is a graduate-level course on numerical methods prominent in modern data analysis and machine learning. Students must have a strong grounding in linear algebra and probability as well as sufficient mathematical maturity. Prior experience with numerical methods at the level of CS 4210/4220 or CS 6210/6220 will be highly useful, though not strictly required. The course will consist of six units of about two weeks each:
- Least squares and regression
- Low-rank factorizations for matrix and tensor data
- Low-dimensional structure in function approximation
- Kernel interpolation and Gaussian processes
- Numerical methods for graph data analysis
- Methods for learning models of dynamics
We will pay particular attention throughout to sparsity, rank structure, and spectral behavior of underlying linear algebra problems; convergence behavior and “regularization via iteration” effects for standard solvers; and comparisons between numerical methods for data analysis and large-scale numerical methods used in other areas of science and engineering.
Course work
Notes and readings will be posted on the course web page. We recommend reading the notes prior to the meeting.
Class meetings will typically consist of an ice-breaker, open question, or group quiz, followed by lecture and full-class discussion. We will try to reserve part of the time for small-group work involving either paper discussions or working through a Jupyter notebook.
Other than activities to help students keep on track with the readings, the main course activity will be a course research project.
Course technology
The public course web page will be used for all activities that can readily be shared with the world. Otherwise, we will use Canvas for assignments, together with integrations for discussion (Ed Discussion) and social annotation of the reading (Hypothes.is). We will use Jupyter notebooks and the Julia programming language for in-class exercises; these can be run locally on your machine or via Google Colab.
Course policies
Grading
Graded work will be 50% term research project, 45% participation, and 5% course feedback and evaluations. On a 200-point scale, this will consist of the components described below.
Research project
After an initial reaction paper (done individually), students will work in groups of 1-3 on a term-length research project. Part of the credit will involve participating in peer review of contributions from two other groups. The parts of this project will include:
- Reaction paper involving reading and critique of at least two papers - 10 points
- Project proposal for 1-3 people, including pointers to related work and a plan for how team members will work together - 10 points
- Short progress report - 5 points
- Draft report - 10 points
- Peer review of two other draft reports - 10 points each
- Final report - 40 points
Participation and feedback
After the first lecture, where we lay out this syllabus, there will be 25 additional lectures. We ask for 90 points of “participation work,” defined as:
- Participating in a class opening activity or quiz (1 point)
- Completing a class/homework notebook (1 point per question)
- Written participation in class discussion of a paper (1 point)
- Providing a (solved) question suitable for a homework (1 point)
- Defining an open research question, with reference to literature (1 point)
For homework, quizzes, or questions that are clearly incorrect, we will provide feedback and a chance to provide a correction (or convince us that we are wrong!). Participation work does not have to be done wholly during class meetings.
Collaboration
Collaboration in this class is explicitly encouraged, as is reference to the research literature. You can and should work together, with co-authorship for group assignments.
An assignment is an academic document, like a journal article. When you turn it in, you are claiming everything in it is your original work, unless you cite a source for it.
If you get an idea from a classmate, the TA or professor, a book or other published source, or elsewhere, please provide an appropriate citation. This is not only critical to maintaining academic integrity, but it is also an important way for you to give credit to those who have helped you out. When in doubt, cite! Code or writeups with appropriate citations will never be considered a violation of academic integrity in this class (though you will not receive credit for code or writeups that were shared when you should have done them yourself).
For more information, see Cornell’s Code of Academic Integrity.
Emergency procedures
In the event of a major campus emergency, course requirements, deadlines, and grading percentages are subject to changes that may be necessitated by a revised semester calendar or other circumstances. Any such announcements will be posted to Canvas and the course home page.