CS 6241: Numerical Methods for Data Science

Cornell University, Spring 2019.
Lectures: 2:55 PM–4:10 PM, Tuesdays and Thursdays, 140 Bard Hall.
Instructor: Austin Benson, Assistant Professor, Computer Science.
Contact: arb@cs.cornell.edu.
Office hours: 1:30 PM–2:30 PM Tuesdays or by appointment, 413B Gates.

Resources

Course administration and discussion:
Textbooks:
Other course web pages:
Datasets:

Coursework schedule

Class schedule

This schedule is tentative and subject to change.
Week 1
Lectures 1 & 2 (Tu 1/22 and Th 1/24): Numerical linear algebra review, optimization problems, linear least squares
Topics: course overview and expectations, vector and matrix norms, linear systems, eigenvalues, basic matrix factorizations, basics of optimization problems, linear least squares
Readings: Lecture notes:
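As an illustrative aside (a sketch, not part of the official notes), the linear least squares problem min_x ||Ax - b||_2 can be solved stably through a thin QR factorization, with NumPy's built-in solver as a cross-check:

    # Solve min_x ||Ax - b||_2 via thin QR: A = QR, then R x = Q^T b.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 5))    # tall, full-rank design matrix
    b = rng.standard_normal(100)

    Q, R = np.linalg.qr(A)               # thin QR: Q is 100 x 5, R is 5 x 5
    x_qr = np.linalg.solve(R, Q.T @ b)   # solve the triangular system R x = Q^T b

    x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(x_qr, x_ref)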
Week 2
Lecture 3 (Tu 1/29): Regularized linear least squares
Topics: Tikhonov regularization / ridge regression, Lasso, pivoted QR
Readings: Lecture notes:
Lecture 4 (Th 1/31): Basics of iterative solvers and sparse linear least squares
Topics: Basics of Krylov subspace methods, LSQR, LSMR
Readings: Lecture notes:
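Tying Lectures 3 and 4 together: Tikhonov-regularized least squares min_x ||Ax - b||_2^2 + lam * ||x||_2^2 is an ordinary least squares problem on an augmented matrix, which a sparse iterative solver like SciPy's LSQR handles directly. A minimal sketch, assuming SciPy is available:

    # Ridge regression via LSQR on the augmented system
    #   min_x || [A; sqrt(lam) I] x - [b; 0] ||_2.
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import lsqr

    rng = np.random.default_rng(0)
    A = sp.random(500, 50, density=0.05, random_state=0, format="csr")
    b = rng.standard_normal(500)
    lam = 1e-2

    A_aug = sp.vstack([A, np.sqrt(lam) * sp.identity(50)]).tocsr()
    b_aug = np.concatenate([b, np.zeros(50)])
    x = lsqr(A_aug, b_aug)[0]   # equivalently: lsqr(A, b, damp=np.sqrt(lam))[0]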
Week 3
Lecture 5 (Tu 2/5): Latent factor models, linear dimensionality reduction, and matrix factorization
Topics: PCA, robust PCA, CUR
Readings: Lecture notes:
Lecture 6 (Th 2/7): Latent factor models, linear dimensionality reduction, and matrix factorization
Topics: NMF
Readings: Lecture notes:
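A minimal PCA sketch in the spirit of Lecture 5 (illustrative only): center the data, compute a thin SVD, and project onto the top-k right singular vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))        # 200 samples, 10 features
    k = 2

    Xc = X - X.mean(axis=0)                   # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                    # k-dimensional coordinates
    explained = s[:k] ** 2 / np.sum(s ** 2)   # variance fraction per component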
Week 4
Lecture 7 (Tu 2/12): Randomized numerical linear algebra [Guest lecture by Anil Damle]
Topics: randomized SVD, LSRN, random projections for NLA
Readings:
No lecture Th 2/14 [Austin is in Australia]
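To accompany Lecture 7, here is a sketch of a basic randomized SVD in the spirit of the Halko-Martinsson-Tropp framework (the oversampling parameter is an illustrative choice): sample a Gaussian test matrix, capture the range of A, then take a small deterministic SVD:

    import numpy as np

    def randomized_svd(A, k, oversample=10, seed=None):
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((A.shape[1], k + oversample))
        Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for range(A @ Omega)
        B = Q.T @ A                           # small (k + p) x n matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return Q @ Ub[:, :k], s[:k], Vt[:k]

    A = np.random.default_rng(0).standard_normal((1000, 300))
    U, s, Vt = randomized_svd(A, k=10)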
Week 5
Lectures 8 & 9 (Tu 2/19 & Th 2/21): Tensor eigenvectors and decompositions
Topics: problems with border rank, ill-posedness, and Eckart–Young analogs; best rank-1 approximation and connections to tensor eigenvectors; Z-eigenpairs and H-eigenpairs; CP and Tucker decompositions
Readings (broad ideas):
Readings (rank and decompositions):
Readings (eigenvalues and eigenvectors):
Readings (applications):
Lecture notes:
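A small illustration of one of the week's ideas: a Z-eigenpair of a symmetric 3-mode tensor T satisfies T(x, x) = lam * x with ||x||_2 = 1, suggesting the power-iteration-style sketch below. (Plain iteration need not converge; the SS-HOPM method adds a shift to guarantee monotone convergence.)

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((5, 5, 5))
    T = sum(T.transpose(p) for p in                    # symmetrize over all axis orders
            [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6

    x = rng.standard_normal(5)
    x /= np.linalg.norm(x)
    for _ in range(200):
        y = np.einsum("ijk,j,k->i", T, x, x)           # y_i = sum_jk T_ijk x_j x_k
        x = y / np.linalg.norm(y)

    lam = np.einsum("ijk,i,j,k->", T, x, x, x)         # candidate Z-eigenvalue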
Week 6
No lecture Tu 2/26 (February break)

Lecture 10 (Th 2/28): Nonlinear dimensionality reduction
Topics: ISOMAP, LLE, (t-)SNE
Readings: Lecture notes:
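For a feel of these methods in practice, scikit-learn (assuming it is installed) ships Isomap and t-SNE implementations; a toy run on a Swiss roll:

    import numpy as np
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap, TSNE

    X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 2-D manifold in R^3
    Y_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    Y_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)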
Week 7 [HW1 due, Th 3/7 at 11:59pm ET]
Lectures 11 & 12 (Tu 3/5 & Th 3/7): Basic network analysis, structure, heavy tails, random graph models
Topics: adjacency matrix, graph Laplacian, random walk matrix, basic spectral graph theory, sparsity, hop distances, clustering, heavy tails and degree distributions, Erdős–Rényi, configuration models, stochastic block models
Readings (survey):
Readings (matrices, spectra, random walks):
Readings (empirical network properties):
Readings (heavy-tailed distributions):
Readings (random graph models):
Lecture notes:
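A quick sketch of the basic matrices from these lectures, built on an Erdős–Rényi sample (assuming networkx and SciPy are available):

    import networkx as nx
    import numpy as np
    import scipy.sparse as sp

    G = nx.erdos_renyi_graph(n=500, p=0.02, seed=0)
    A = nx.adjacency_matrix(G).astype(float)     # sparse adjacency matrix
    d = np.asarray(A.sum(axis=1)).ravel()        # degree sequence
    L = sp.diags(d) - A                          # combinatorial Laplacian L = D - A
    P = sp.diags(1.0 / np.maximum(d, 1)) @ A     # random walk matrix P = D^{-1} A
                                                 # (the max guards isolated vertices)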
Week 8
Lectures 13 & 14 (Tu 3/12 & Th 3/14): Learning on graphs: unsupervised network clustering and community detection
Topics: Spectral methods for bisection, ratio cut, normalized cut, and modularity
Readings (spectral methods):
Readings (more general community detection):
Lecture notes:
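A minimal spectral bisection sketch (illustrative, not the lecture notes): split by the sign pattern of the Fiedler vector, the eigenvector of the second-smallest Laplacian eigenvalue:

    import networkx as nx
    import numpy as np

    G = nx.planted_partition_graph(2, 50, 0.2, 0.01, seed=0)  # two planted clusters
    L = nx.laplacian_matrix(G).toarray().astype(float)
    vals, vecs = np.linalg.eigh(L)       # dense; use scipy.sparse.linalg.eigsh at scale
    fiedler = vecs[:, 1]                 # eigenvector of the 2nd-smallest eigenvalue
    part = fiedler > 0                   # boolean split into two clusters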
Week 9
Lecture 15 (Tu 3/19): Graph-based semi-supervised learning
Topics: graph-based semi-supervised techniques and connections to random walks and spectral clustering
Readings: Lecture notes:
Lecture 16 (Th 3/21): Node representation learning in graphs
Topics: node embeddings, latent space models, role discovery, node2vec
Readings: Lecture notes:
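A sketch of the harmonic-function approach from Lecture 15 (in the style of Zhu, Ghahramani, and Lafferty; the seed nodes below are an arbitrary illustrative choice): clamp the labeled nodes and solve a Laplacian system for the rest:

    import networkx as nx
    import numpy as np

    G = nx.karate_club_graph()
    L = nx.laplacian_matrix(G).toarray().astype(float)
    n = G.number_of_nodes()

    labeled = np.array([0, 33])          # two seed nodes (hypothetical choice)
    y = np.array([0.0, 1.0])             # their labels
    unlabeled = np.setdiff1d(np.arange(n), labeled)

    f = np.zeros(n)
    f[labeled] = y
    # Harmonic solution: L_uu f_u = -L_ul y_l
    f[unlabeled] = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                                   -L[np.ix_(unlabeled, labeled)] @ y)
    pred = f > 0.5                       # threshold for a binary labeling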
Week 10 [Reaction paper due Th 3/28 at 11:59pm ET]
Lectures 17 & 18 (Tu 3/26 & Th 3/28): Small patterns in networks
Topics: motifs, graphlets, and small subgraphs
Readings (foundations):
Readings (applications):
Readings (counting algorithms):
Lecture notes:
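A warm-up computation for the smallest motif: the number of triangles in an undirected graph is trace(A^3)/6, since each triangle contributes six closed walks of length three:

    import networkx as nx
    import numpy as np

    G = nx.karate_club_graph()
    A = nx.adjacency_matrix(G).toarray().astype(float)
    n_triangles = int(round(np.trace(A @ A @ A) / 6))
    assert n_triangles == sum(nx.triangles(G).values()) // 3  # networkx cross-check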
Week 11
No lecture Tu 4/2 or Th 4/4 (spring break)
Week 12 [Project proposal due Th 4/11 at 11:59pm ET]
No lecture Tu 4/9 [Austin is in DC]

Lecture 19 (Th 4/11): Ranking and network centrality
Topics: Katz, eigenvector, PageRank, and HITS
Readings (original papers):
Readings (overviews and more modern takes):
Lecture notes:
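A minimal PageRank sketch (illustrative; it assumes no dangling nodes, which holds for this toy graph): power iteration on the teleporting random walk x <- alpha * P^T x + (1 - alpha) * v:

    import networkx as nx
    import numpy as np

    G = nx.karate_club_graph()
    A = nx.adjacency_matrix(G).toarray().astype(float)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic random walk matrix
    alpha, v = 0.85, np.ones(n) / n        # damping factor, uniform teleportation

    x = v.copy()
    for _ in range(100):
        x = alpha * P.T @ x + (1 - alpha) * v
    # cross-check: nx.pagerank(G, alpha=0.85) agrees to several digits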
Week 13 [HW2 due, Th 4/18 at 11:59pm ET]
No lecture Tu 4/16 [Austin is in DC]

Lecture 20 (Th 4/18): Ranking and network centrality in hypergraphs
Topics: Extensions of PageRank and eigenvector centrality to hypergraphs
Main readings:
Additional readings (other extensions and computational considerations):
Lecture notes:
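One simple way (among several in the readings) to extend PageRank to hypergraphs is to run it on a weighted clique expansion; the toy hyperedges and weighting below are illustrative assumptions, not the lecture's definition:

    import itertools
    import networkx as nx

    hyperedges = [(0, 1, 2), (1, 2, 3, 4), (3, 5)]   # toy hypergraph
    G = nx.Graph()
    for e in hyperedges:
        for u, v in itertools.combinations(e, 2):
            w = G.get_edge_data(u, v, default={"weight": 0.0})["weight"]
            G.add_edge(u, v, weight=w + 1.0 / (len(e) - 1))  # split hyperedge weight
    scores = nx.pagerank(G, alpha=0.85, weight="weight")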
Week 14
Lecture 21 (Tu 4/23): Special topics: Kernels and function approximation
Topics: feature maps, kernel trick, RKHS, Gaussian Processes
Readings: Lecture notes:
No lecture Th 4/25 [Austin is in Boston]
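For Lecture 21's material, a self-contained Gaussian process regression sketch under a squared-exponential kernel (the length scale and noise level are arbitrary illustrative values):

    import numpy as np

    def rbf(X1, X2, ell=0.5):
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * ell ** 2))

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(20, 1))               # training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
    Xs = np.linspace(-3, 3, 100)[:, None]              # test inputs
    sigma2 = 0.01                                      # observation noise variance

    K = rbf(X, X) + sigma2 * np.eye(len(X))
    Ks = rbf(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)                  # posterior mean
    var = rbf(Xs, Xs).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))      # posterior variance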
Week 15 [Project progress report due Th 5/2 at 11:59pm ET]
Lectures 22 & 23 (Tu 4/30 & Th 5/2): Special topics: Discrete choice
Topics: random utility models, (mixed) logits, IIA, probit, network growth as discrete choice
Readings: Lecture notes:
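A quick numerical illustration of the (multinomial) logit model: with utilities u_i, the choice probabilities are a softmax, and the IIA property is visible in that the probability ratio between two items ignores everything else in the choice set:

    import numpy as np

    def logit_probs(u):
        z = np.exp(u - u.max())       # subtract the max for numerical stability
        return z / z.sum()

    u = np.array([1.0, 0.5, -0.2])    # hypothetical utilities for three items
    print(logit_probs(u))             # sums to 1; ratios depend only on utility gaps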
Week 16
Last day of class (Tu 5/7): Project feedback sessions
Week 17 [Final project report due Wed 5/15 4:30pm ET]

Coursework

Coursework will be managed through, and assignments submitted on, CMS. The required coursework consists of three components:
  • Homework (10% each)
    There will be two homework assignments in which you implement numerical methods that we learned in class and use them to analyze datasets.
  • Reaction paper (20%)
    The second component of the course is writing a (roughly) 5-page reaction paper that summarizes and critiques at least two closely related published papers relevant to the class. The motivation for this assignment is to get everyone thinking about research issues related to the class material and to stimulate ideas for the course project (described below). You can work individually or in groups of two or three. Your group members can be the same as your project partners, but they do not have to be. Additional details will be posted soon.
  • Course project (60%)
    Most of your grade will be determined by a course project, where the topic is of your choosing but must be related to the class. You can work individually or in groups of two or three (the scope of the project should be scaled according to the group size). I will be looking for three key pieces in the course project: (i) some theoretical or mathematical discussion of a numerical method or algorithm (broadly construed); (ii) some software implementation of a method or algorithm; and (iii) some analysis of a real-world dataset. The project will be split into three components: a proposal, a progress report with feedback session attendance, and a final report, worth 1/4, 1/4, and 1/2 of the project grade, respectively. More details will be posted soon.