1 Introduction

1.1 Overview and philosophy

The title of this course is Numerical Methods for Data Science. What does that mean? Before we dive into the course technical material, let’s put things into context. I will not attempt to completely define either “numerical methods” or “data science,” but will at least give some thoughts on each.

Numerical methods are algorithms that solve problems of continuous mathematics: finding solutions to systems of linear or nonlinear equations, minimizing or maximizing functions, computing approximations to functions, simulating how systems of differential equations evolve in time, and so forth. Numerical methods are used everywhere, and many mathematicians and scientists focus on designing these methods, analyzing their properties, adapting them to work well for specific types of problems, and implementing them to run fast on modern computers. Scientific computing, also called Computational Science and Engineering (CSE), is about applying numerical methods — as well as the algorithms and approaches of discrete mathematics — to solve “real world” problems from some application field. Though different researchers in scientific computing focus on different aspects, they share the interplay between the domain expertise and modeling, mathematical analysis, and efficient computation.

I have read many descriptions of data science, and have not been satisfied by any of them. The fashion now is to call oneself a data scientist and (if in a university) perhaps to start a master’s program to train students to call themselves data scientists. There are books and web sites and conferences devoted to data science; SIAM even has a journal on the Mathematics of Data Science. But what is data science, really? Statisticians may claim that data science is a modern rebranding of statistics. Computer scientists may reply that it is all about machine learning¹ and scalable algorithms for large data sets. Experts from various scientific fields might claim the name of data science for work that combines statistics, novel algorithms, and new sources of large scale data like modern telescopes or DNA sequencers. And from my biased perspective, data science sounds a lot like scientific computing!

Though I am uncertain how data science should be defined, I am certain that a foundation of numerical methods should be involved. Moreover, I am certain that advances in data science, broadly construed, will drive research in numerical method design in new and interesting directions. In this course, we will explore some of the fundamental numerical methods for optimization, numerical linear algebra, and function approximation, and see the role they play in different styles of data analysis problems that are currently in fashion. In particular, we will spend roughly two weeks each talking about

Linear algebra and optimization concepts for ML.
Latent factor models, factorizations, and analysis of matrix data.
Low-dimensional structure in function approximation.
Function approximation and kernel methods.
Numerical methods for graph data analysis.
Methods for learning models of dynamics.

You will not strictly need to have a prior numerical analysis course for this course, though it will help (the same is true of prior ML coursework). But you should have a good grounding in calculus, linear algebra, and probability, as well as some “mathematical maturity.”

1.2 Readings

In the next chapter, we give a lightning review of some background material, largely to remind you of things you have forgotten, but also perhaps to fill in some things you may not have seen. Nonetheless, I have never believed it is possible to have too many books, and there are many references that you might find helpful along the way. All the texts mentioned here are either openly available or can be accessed as electronic resources via many university libraries. I note abbreviations for the books where there are actually assigned readings.

1.2.1 General Numerics

If you want to refresh your general numerical analysis chops and have fun doing it, I recommend the Afternotes books by Pete Stewart. If you would like a more standard text that covers most of the background relevant to this class, you may like Heath’s book (expanded for the “SIAM Classics” edition). I was involved in a book on many of the same topics, together with Jonathan Goodman at NYU. O’Leary’s book on Scientific Computing with Case Studies is probably the closest of the lot to the topics of this course, with particularly relevant case studies. And Higham’s Accuracy and Stability of Numerical Methods is a magesterial treatment of all manner of error analysis (highly recommended, but perhaps not as a starting point).

Afternotes on Numerical Analysis and Afternotes Goes to Graduate School, Stewart
Scientific Computing: An Introductory Survey, Heath
Principles of Scientific Computing, Bindel and Goodman
Scientific Computing with Case Studies, O’Leary
Accuracy and Stability of Numerical Algorithms, Higham

1.2.2 Numerical Linear Algebra

I learned numerical linear algebra from Demmel’s book, and still tend to go to it as a reference when I think about how to teach. Trefethen and Bau is another popular take, created from when Trefethen taught at Cornell CS. Golub and Van Loan’s book on Matrix Computations ought to be on your shelf if you decide to do this stuff professionally, but I also like the depth of coverage in Stewart’s Matrix Algorithms (in two volumes). And Elden’s Matrix Methods in Data Mining and Pattern Recognition is one of the closest books I’ve found to the spirit of this course (or at least part of it).

ALA: Applied Numerical Linear Algebra, Demmel
Numerical Linear Algebra, Trefethen and Bau
Matrix Algorithms, Vol 1 and Matrix Algorithms, Vol 2, Stewart
Matrix Methods in Data Mining and Pattern Recognition, Elden

1.2.3 Numerical Optimization

My go-to book on numerical optimization is Nocedal and Wright, with the book by Gill, Murray, and Wright as a close second (the two Wrights are unrelated). For the particular case of convex optimization, the standard reference is Boyd and Vandeberghe. And given how much of data fitting revolves around linear and nonlinear least squares problems, we also mention an old favorite by Bjorck.

NO: Numerical Optimization, Nocedal and Wright
Practical Optimization, Gill, Murray, and Wright
Convex Optimization, Boyd and Vandenberghe
Numerical Methods for Least Squares Problems, Bjorck

1.2.4 Machine Learning and Statistics

This class is primarily about numerical methods, but the application (to tasks in statistics, data science, and machine learning) is important to the shape of the methods. My favorite book for background in this direction is Hastie, Tribshirani, and Friedman, but the first book I picked up (and one I still think is good) was Bishop. And while you may decide not to read the entirety of Wasserman’s book, I highly recommend at least reading the preface, and specifically the “statistics/data mining dictionary”.

ESL: Elements of Statistical Learning, Hastie, Tribshirani, and Friedman
Pattern Recognition and Machine Learning, Bishop
All of Statistics, Wasserman

1.2.5 Math Background

If you want a quick refresher of “math I thought you knew” and prefer something beyond my own notes, Garrett Thomas’s notes on “Mathematics for Machine Learning” are a good start. If you want much, much more math for ML (and CS beyond ML), the book(?) by Gallier and Quaintance will keep you busy for some time.

Mathematics for Machine Learning, Thomas
Much more math for CS and ML, Gallier and Quaintance

The statisticians could retort that machine learning is itself a modern rebranding of statistics, with some justification.↩︎