CS 6220: Data-sparse matrix computations


Instructor: Anil Damle

Contact: damle@cornell.edu

Office hours: Mondays 11 am till 12 pm, Wednesdays 10 am - 11 am, and by appointment on Zoom

Lectures: Tuesdays and Thursdays from 1 pm till 2:15 pm synchronously via Zoom

Zoom links: Meeting information for lectures and office hours are available here.

Course overview: Matrices and linear systems can be data-sparse in a wide variety of ways, and we can often leverage such underlying structure to perform matrix computations efficiently. This course will discuss several varieties of structured problems and associated algorithms. Example topics include randomized algorithms for numerical linear algebra, Krylov subspace methods, sparse recovery, and assorted matrix factorizations.

News and important dates


  • February 9 — First day of class

Homework/Project


Your grade in this course is comprised of three components: scribing lectures (think of these as individualized homeworks) and participating in peer review, writing a paper review, and a course project. The final grade will be heavily weighted towards the project. While these assignments are constitute the formal workload for the course, I encourage you to look further into the given references for any topics that you find particularly interesting. There is far more to many of these topics than we will have time to cover in this course.

  • Lecture scribing and peer review:

    You will be responsible for scribing two lectures given in this course and reviewing two sets of lecture notes. Scribing a lecture involves not just taking good notes and typesetting them, but also elaborating on points you find interesting, filling in details, example implementations and simple numerical experiments (if appropriate), and adding references as needed. Therefore, I recommend picking a topic you find interesting as this can provide a good opportunity to learn more. It would be reasonable to think of these as two homeworks for the class. An example set of scribed notes (not from this class but on a relevant topic) may be found here.

    Peer review of lectures notes consists of carefully reading through another students scribed notes and providing feedback (typos, suggestions for improvement, etc.) to them so they can iterate on the notes before they are final. The purpose of this assignment is both to ensure the quality of notes (they will be made available on the course website for others in the class) and provide experience providing written feedback on technical material.

    The first draft of the notes should be completed within one week after the class you are scribing occurs and the peer review process including revisions should occur within one week after that. As with most due dates in this class (save for the final report) I am flexible — let me know if you need a short extension. Once this process is completed I should receive the original draft of the notes, the review, and the revised version so I may peruse them myself and add the final version to the course website. I am happy to aid in this process and please let me know if you have any questions along the way. I will not be grading the notes per se and this portion of the class grade is essentially based on completion (if I feel improvements are needed to a set of notes I will ask for them).

  • Paper review:

    As part of this course you will have to complete and submit one paper review. In particular, at any point in this class you should select a paper relevant to topics we are discussing (I am happy to provide suggestions and there will be many on the course website) and write a brief review of it. A review consists of a summary discussion of the key points of the paper (theorems, experiments, etc.) and why you think they are interesting. This must be turned in by the last day of class, though I would encourage you to submit it in a timely manner with respect to our discussion of the topic the paper you choose covers. It is perfectly fine to align this assignment with either the lecture scribing or your project. For example, you could review a paper particularly relevant to the lecture(s) you scribe and/or choose one closely related to your project.

  • Course project:

    As part of this course you will be required to complete a course project on a topic of your choosing (some examples will be added here in the near future and given in class). This may be done individually or in groups of up to three students (flexible), though the scope should be scaled accordingly. While the project should contain a mix of implementation, application examples, and theoretical discussion. However, depending on the specific topic the relative composition of the project may vary as appropriate. Further details are available here.

    As part of the project you will write a report and present your work in the final week of the course. The specifics of how the presentations work logistically is subject to change pending the final enrollment count for the course. A two page proposal is due on April 1 (I am happy to discuss ideas at any time) and the final report is due on May 24 at 1:00 pm. A brief document is available here that provides guidelines for the report.

References


There is no one textbook that covers all of the material for this course — in fact, there is no required text for this course. In lieu of that, a collection of references (textbooks, papers, and notes) will be progressively added that cover various aspects of the course.

Books

  • Matrix Computations by Golub and Van Loan
    This text is the standard reference text in numerical linear algebra and, therefore, valuable as a backstop for much of what we will talk about in this courses
  • Numerical Linear Algebra by Trefethen and Bau
    A good textbook that broadly covers numerical linear algebra at the graduate level. It is well written, certainly more colloquial the Golub and Van Loan, and many find it provides a smoother introduction to the topics we will discuss. However, by construction, it is not as comprehensive as Golub and Van Loan. This book is available online through the library. [Cornell library]
  • Applied Numerical Linear Algebra by Demmel
    A good textbook that covers numerical linear algebra at the graduate level. It may prove useful as a secondary source of information for some of the material, or as a vehicle for further study. This book is also available online through through Cornell's SIAM subscription. [SIAM online]

Papers

  • A short course on fast multipole methods by Rick Beatson and Leslie Greengard
    These lecture notes provide a good introduction to the FMM, cover the algorithm in three dimensions, and includes more detail than presented in this courses lectures. [Online]
  • Fast Algorithms for Boundary Integral Equations by Lexing Ying
    This book chapter provides both an introduction to boundary integral equations and a discussion of various fast algorithms constructed to solve them. Notably, this includes the FMM, some of its variants, rank-structured techniques, and more. [Online]
  • Rank-structured matrix papers
    There are several good references for rank-structured matrix algorithms of the variety we discussed in class. A selection includes: the work by Martinsson and Rokhlin in 2005 [PDF], the recursive skeletonization factorization by Ho and Greengard in 2012 [PDF], a more thorough treatment of the low-rank compression scheme by Cheng et. al. in 2005 [PDF], and a recent paper on a strongly admissible recursive skeletonization factorization by Minden et al. in 2017 [PDF]. The last reference is essentially what we covered in class, contains numerous references to previous work, and section 3.2 describes the use of a so-called proxy surface in some detail.
  • Rank-structured matrix code
    The FLAM package mentioned in class is available online as is the code for the recursive skeletonization based on strong admissibility.
  • Randomized Numerical Linear Algebra: Foundations & Algorithms by Martinsson and Tropp
    This monograph provides a nice, recent introduction to randomized methods in numerical linear algebra and related fields. [arXiv]
  • Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions by Halko, Martinsson, and Tropp
    This SIAM Review article provides a nice introduction to randomized methods in numerical linear algebra. [SIAM Online and arXiv]
  • CUR matrix decompositions for improved data analysis by Mahoney and Drineas
    This PNAS paper provides a brief introduction to the CUR factorization, some applications, and references to the more detailed works in the area. [PNAS Online]
  • LSRN by Meng, Saunders, and Mahoney
    This paper presents the LSRN method, an iterative solver utilizing randomness that solves least squares problems. [Paper and code]
  • Blendenpik by Avron, Maymounkov, and Toledo
    This paper presents the Blendenpik algorithm for overdetermined least squares problems. [Paper]
  • Low-rank approximation and regression in input sparsity time by Clarkson and Woodruff
    This paper presents and analyzes the sparse random matrices we discussed in class. [Journal and arXiv]
  • Atomic decomposition by basis pursuit by Chen, Donoho and Saunders
    This paper presents the basis pursuit denoising algorithms we discussed in class and gives comparisons with other methods. [Journal]
  • Sparse and redundant representations by Elad
    This book is a good resource for the sparse recovery discussions we have had in class and contains many of the theoretical statements with proofs. It should be available online in PDF form via the Cornell library [Springer online]
  • Optimization resources
    There are many algorithms and techniques for solving the problems we discussed, including PDCO, ASP, CVX, TFOCS, and many more.
  • Low rank plus sparse and robust PCA
    Two papers relevant to our discussion in class are Candès, Li, Ma, and Wright 2011 [Journal link] and Chandrasekaran, Sanghavi, Parrilo, and Willsky 2009 [Journal link].

Schedule


A tentative schedule follows, and includes the topics we will be covering and relevant reference material (More details on the listed papers are given in the References section). Scribed notes provided here are written by students in the class and, while peer reviewed by

Date Topic References Notes/assignments
2/09 Introduction
2/11 Intro to randomized numerical linear algebra SIAM review paper by Martinsson, Halko, and Tropp and Monograph by Martinsson and Tropp
2/16 Randomized NLA, algorithmic improvements and proofs SIAM review paper by Martinsson, Halko, and Tropp and Monograph by Martinsson and Tropp
2/18 Randomized NLA, proofs SIAM review paper by Martinsson, Halko, and Tropp and Monograph by Martinsson and Tropp
2/23 Randomization for least squares problems LSRN paper by Meng, Saunders, and Mahoney
2/25 Randomization for least squares problems Blendenpik paper by Avron, Maymounkov, and Toledo
3/2 Blendenpik and CUR decompositions
3/04 Column subset selection
3/09 Wellness day No Class
3/11 Low-dimensional embeddings
3/16 The FFT
3/18 The NUFFT
3/23 The idea behind the FMM
3/25 Rank-structured matrices
3/30 Rank-structured matrices Paper by Minden et al.
4/01 Rank-structured matrices Paper by Minden et al.
4/06 Actual sparse matrices Direct Methods for Sparse Linear Systems by Davis
4/08 Actual sparse matrices Direct Methods for Sparse Linear Systems by Davis
4/13 Actual sparse matrices Direct Methods for Sparse Linear Systems by Davis
4/15 Sparse recovery Sparse and redundant representations by Elad
4/20 Sparse recovery Sparse and redundant representations by Elad
4/22 Sparse recovery Sparse and redundant representations by Elad
4/27 Sparse recovery Sparse and redundant representations by Elad
4/29 Compressed sensing
5/04 Compressed sensing
5/06 Phase retrieval and low-rank plus sparse decompositions
5/11 Limiting distributions of neural networks
5/13 Project presentations

Notes


Additional notes may be added here throughout the course. Please email the instructor if you believe to have found any errors so they may be corrected.

Course policies


Inclusiveness

You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn and enjoy this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the Computer Science Department’s Values of Inclusion. [Statement reproduced with permission from Dan Grossman.]

Participation

You are encouraged to actively participate in class. This can take the form of asking questions in class, responding to questions to the class, and actively asking/answering questions on the online discussion board.

Collaboration policy

You may discuss the homework and projects freely with other students, though any work you turn in must be your own and provide appropriate citations to existing work, implementations, etc. if they are used.

Academic integrity

The Cornell Code of Academic Integrity applies to this course.

Accommodations

In compliance with the Cornell University policy and equal access laws, I am available to discuss appropriate academic accommodations that may be required for student with disabilities. Requests for academic accommodations are to be made during the first three weeks of the semester, except for unusual circumstances, so arrangements can be made. Students are encouraged to register with Student Disability Services to verify their eligibility for appropriate accommodations.