CS 6210: Matrix Computations

Introduction

David Bindel

2025-08-25

Vector spaces

Concrete spaces

Example vector: \[ v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \in \mathbb{R}^2 \] Set of such vectors forms a concrete vector space.

  • Most common cases: \(\mathbb{R}^n\) and \(\mathbb{C}^n\)
  • We denote these as columns (statisticians default to rows)
  • What we actually compute with (except in floating point)
  • But not the whole game…

Abstract spaces

Ingredients:

  • Set \(\mathcal{V}\) of vectors (including a zero vector)
  • Scalar field \(\mathbb{F}\) (\(\mathbb{R}\) or \(\mathbb{C}\) for us)
  • Sensible addition and scalar multiplication operations

Axioms

Need for all \(u, v, w \in \mathcal{V}\) and \(\alpha, \beta \in \mathbb{F}\): \[\begin{aligned} 0 v &= 0 & 1 v &= v \\ u + v &= v + u & (u + v) + w &= u + (v + w) \\ \alpha (u + v) &= \alpha u + \alpha v & (\alpha + \beta) u &= \alpha u + \beta u \end{aligned}\]

Examples

\[\begin{aligned} \mathcal{P}_d &= \{ \mbox{polynomials of degree at most $d$} \} \\ \mathcal{V}^* &= \{ \mbox{linear functions $\mathcal{V} \rightarrow \mathbb{R}$ (or $\mathbb{C}$)} \} \\ L(\mathcal{V}, \mathcal{W}) &= \{ \mbox{linear maps $\mathcal{V}\rightarrow \mathcal{W}$} \} \\ \mathcal{C}^k(\Omega) &= \{\mbox{ $k$-times differentiable functions on a set $\Omega$} \} \end{aligned}\]

  • Mostly interested in finite-dimensional case
  • … but \(\mathcal{C}^k(\Omega)\) is an infinite-dimensional example!

Subspaces

\(\mathcal{U}\) is a subspace of vector space \(\mathcal{V}\) if

  • \(\mathcal{U}\) is a subset of \(\mathcal{V}\)
  • \(\mathcal{U}\) is closed under the vector space operations

Sums of subspaces \(\mathcal{V}_1 \subset \mathcal{V}\) and \(\mathcal{V}_2 \subset \mathcal{V}\):

  • \(\mathcal{V}_1 + \mathcal{V}_2 = \{ v_1 + v_2 : v_1 \in \mathcal{V}_1, v_2 \in \mathcal{V}_2 \}\)
  • \(\mathcal{V}_1 + \mathcal{V}_2\) is a subspace of \(\mathcal{V}\) (tedious exercise: show it!)
  • Direct sum \(\mathcal{V}_1 \oplus \mathcal{V}_2\) if decomposition \(v_1+v_2\) is unique

Can also quotient: elements of \(\mathcal{V}/\mathcal{U}\) are the classes \([v] = \{ v + u : u \in \mathcal{U}\}\).

Direct sum decompositions

When \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\), we have component projectors \[ \Pi_1 (v_1 + v_2) = v_1, \quad \Pi_2 (v_1 + v_2) = v_2 \] Example: \(\mathcal{P}_d\) is the direct sum of its even and odd subspaces \[\begin{aligned} (\Pi_{\mathrm{even}} q)(x) &= \frac{1}{2} (q(x) + q(-x)) \\ (\Pi_{\mathrm{odd}} q)(x) &= \frac{1}{2} (q(x) - q(-x)) \end{aligned}\] Idea generalizes to when \(\mathcal{V}= \mathcal{V}_1 \oplus \ldots \oplus \mathcal{V}_k\).
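
For instance, with \(q(x) = 1 + x + x^2\): \((\Pi_{\mathrm{even}} q)(x) = 1 + x^2\) and \((\Pi_{\mathrm{odd}} q)(x) = x\), which sum back to \(q\).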

Spanning sets and bases

Spanning sets

\(S \subset \mathcal{V}\) a spanning set if any \(v \in \mathcal{V}\) is a linear combination \[ v = \sum_{j=1}^m \alpha_j s_j \] for some \(s_j \in S\) and \(\alpha_j \in \mathbb{F}\).

Example: \(\mathcal{P}_2 = \operatorname{span}\{1, x, -x, x^2\}\).

Linear independence

\(S \subset \mathcal{V}\) is linearly independent if the representation of any \(v \in \operatorname{span}(S)\) as \[ v = \sum_{j=1}^m \alpha_j s_j \] is unique. Equivalently: \(S\) is linearly independent if no nontrivial linear combination of elements of \(S\) gives 0.

Example:

  • \(\{1, x, -x, x^2\}\) is not linearly independent (\(x + (-x) = 0\))
  • \(\{1, x, x^2\}\) is linearly independent

Bases

\(S \subset \mathcal{V}\) is a basis if

  • \(S\) is a spanning set
  • \(S\) is linearly independent

If \(S\) is a basis, \(d = |S|\) is the dimension.

  • But dimension doesn’t actually depend on the basis

Dual bases

Basis \(\{ w_1^*, \ldots, w_d^* \}\) for \(\mathcal{V}^*\) and basis \(\{ v_1, \ldots, v_d \}\) for \(\mathcal{V}\) are dual to each other if \[ w_i^* \left( \sum_{j=1}^d \alpha_j v_j \right) = \alpha_i. \] Equivalently: \(w_i^* v_j = \delta_{ij}\).

Basis quasimatrices

  • Conventional linear algebra: work with basis sets
  • But we invariably give the elements indices (usually integer)!
  • Idea: just use ordered lists instead of sets
  • Notation: write as quasimatrices

Basis quasimatrices

Basis quasimatrix \(V = \begin{bmatrix} v_1 & \ldots & v_d \end{bmatrix}\) for \(\mathcal{V}\)

  • Ordered list written like a matrix
  • Each column is an abstract vector
  • Matrix-vector products denote linear combinations
  • Represents mapping from concrete space to abstract space

Basis quasimatrices

Dual basis quasimatrix \(W^* = \begin{bmatrix} w_1^* \\ \vdots \\ w_d^* \end{bmatrix}\) for \(\mathcal{V}^*\)

  • Each row is an abstract dual vector (element of \(\mathcal{V}^*\))
  • \(W^*\) maps abstract space to concrete space
  • If \(W^*\) and \(V\) are dual basis quasimatrices (\(W^* = V^{-1}\))
    • \(W^* V = I\) (identity on concrete space)
    • \(V W^* = I\) (identity on abstract space)

Example: Concrete space

Standard basis for \(\mathbb{R}^n\) has elements

\[ e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},~~ e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},~~ e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix},~~ \cdots,~~ e_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

Basis quasimatrix looks like \(I\). This is not very interesting!

Example: Polynomials

Power basis

Example: Power basis for \(\mathcal{P}_2\) is \[ P = \begin{bmatrix} 1 & x & x^2 \end{bmatrix} \] Write \(p(x) = 1 + x^2\) as \[ p = \begin{bmatrix} 1 & x & x^2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \]

Chebyshev polynomials

Can also make a basis of Chebyshev polynomials: \[\begin{aligned} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{k+1}(x) &= 2xT_k(x) - T_{k-1}(x), \quad k \geq 1 \end{aligned}\] Note: \[ T_k(\cos(\theta)) = \cos(k\theta) \] Chebyshev polynomials equi-oscillate on \([-1,1]\).
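
A minimal numerical sketch (Python/NumPy here; the helper name chebyshev_T is our own) of evaluating \(T_k\) by the recurrence and spot-checking the trigonometric identity:

```python
import numpy as np

def chebyshev_T(k, x):
    """Evaluate T_k(x) by the three-term recurrence."""
    Tprev, Tcur = np.ones_like(x), np.asarray(x, dtype=float)  # T_0, T_1
    if k == 0:
        return Tprev
    for _ in range(k - 1):
        Tprev, Tcur = Tcur, 2 * x * Tcur - Tprev
    return Tcur

# Spot-check T_k(cos(theta)) = cos(k*theta)
theta = np.linspace(0.0, np.pi, 7)
assert np.allclose(chebyshev_T(5, np.cos(theta)), np.cos(5 * theta))
```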

Chebyshev basis

Two bases for \(\mathcal{P}_d\):

  • Power: \(P_d = \begin{bmatrix} 1 & x & \ldots & x^d \end{bmatrix}\)
  • Chebyshev: \(T_d = \begin{bmatrix} T_0(x) & T_1(x) & \ldots & T_d(x) \end{bmatrix}\)

Change of basis example: \(T_2 = P_2 X\) where \[ X = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \] Question: How would you compute \(X\) for general \(d\)?
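
One possible approach, sketched in NumPy (chebyshev_change_of_basis is our name): run the recurrence on power-basis coefficient vectors, since multiplying by \(x\) just shifts coefficients down one slot; column \(k\) of \(X\) then holds the power-basis coefficients of \(T_k\).

```python
import numpy as np

def chebyshev_change_of_basis(d):
    """X[:, k] holds power-basis coefficients of T_k, so T_d = P_d X."""
    X = np.zeros((d + 1, d + 1))
    X[0, 0] = 1.0                       # T_0 = 1
    if d >= 1:
        X[1, 1] = 1.0                   # T_1 = x
    for k in range(1, d):
        X[1:, k + 1] = 2.0 * X[:-1, k]  # coefficients of 2 x T_k ...
        X[:, k + 1] -= X[:, k - 1]      # ... minus those of T_{k-1}
    return X

print(chebyshev_change_of_basis(2))     # [[1, 0, -1], [0, 1, 0], [0, 0, 2]]
```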

Norms

Properties of a norm

Norm is \(\| \cdot \| : \mathcal{V}\rightarrow \mathbb{R}\) satisfying

  • Positive definiteness: \(\|v\| \geq 0\) with equality iff \(v = 0\)
  • Triangle inequality: \(\|u+v\| \leq \|u\| + \|v\|\)
  • Homogeneity: \(\|\alpha v\| = |\alpha| \, \|v\|\)

An aside:

  • These properties make \(d(u,v) = \|u-v\|\) a metric
  • Completeness wrt this metric means \(\mathcal{V}\) is a Banach space
  • Finite-dimensional normed vector spaces over \(\mathbb{R}\) and \(\mathbb{C}\) are all Banach spaces.

Concrete space norms

Our three favorite norms for \(\mathbb{R}^n\) (or \(\mathbb{C}^n\)) are the 2-norm (Euclidean norm), 1-norm (Manhattan norm) and \(\infty\)-norm (max norm): \[\begin{aligned} \|x\|_2 &= \sqrt{\sum_{j=1}^n |x_j|^2} \\ \|x\|_1 &= \sum_{j=1}^n |x_j| \\ \|x\|_\infty &= \max_{1 \leq j \leq n} |x_j| \end{aligned}\]

Polynomial space norms

For polynomials on \([-1,1]\), favorite norms are: \[\begin{aligned} \|p\|_2 &= \sqrt{\int_{-1}^1 |p(x)|^2 \, dx} \\ \|p\|_1 &= \int_{-1}^1 |p(x)| \, dx \\ \|p\|_\infty &= \max_{x \in [-1,1]} |p(x)| \end{aligned}\] Q: What are these for \(p(x) = x\)?
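
A: For \(p(x) = x\), \[ \|p\|_2 = \sqrt{\int_{-1}^1 x^2 \, dx} = \sqrt{\frac{2}{3}}, \qquad \|p\|_1 = \int_{-1}^1 |x| \, dx = 1, \qquad \|p\|_\infty = \max_{x \in [-1,1]} |x| = 1. \]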

Norm equivalence

Norms \(\|\cdot\|\) and \(\|\cdot\|_*\) are equivalent if \(\exists c, C > 0\) s.t. \[ \forall v \in \mathcal{V}, c\|v\| \leq \|v\|_* \leq C\|v\|. \]

  • All norms on finite-dimensional spaces are equivalent
  • Says nothing about the size of \(c\) and \(C\)!
  • A statement about the topology, not the geometry, of finite-dimensional spaces

Q: Find \(c, C\) relating \(\|\cdot\|_1\) and \(\|\cdot\|_\infty\) on \(\mathbb{R}^n\)?
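
A: We can take \(c = 1\) and \(C = n\): \[ \|x\|_\infty \leq \|x\|_1 \leq n \|x\|_\infty, \] with equality on the left at \(x = e_1\) and on the right at \(x = (1, \ldots, 1)^T\), so both constants are sharp.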

Inner products

Properties of an inner product

Function \(\langle \cdot, \cdot \rangle : \mathcal{V}\times \mathcal{V}\rightarrow \mathbb{R}\) (or \(\mathbb{C}\)) satisfying:

  • Linearity in first slot: \(\langle \alpha v, w \rangle = \alpha \langle v, w \rangle\) and \(\langle u+v, w \rangle = \langle u, w \rangle + \langle v, w \rangle\).
  • Hermitian (or symmetric): \(\langle v, w \rangle = \overline{\langle w, v \rangle}\).
  • Positive definiteness: \(\langle v, v \rangle \geq 0\) with equality iff \(v = 0\).

Check: \(\|v\| = \sqrt{\langle v, v \rangle}\) is a norm (the Euclidean norm for the inner product).

Terminology

An inner product is a positive definite

  • symmetric bilinear form (for \(\mathbb{R}\)) OR
  • Hermitian sesquilinear form (for \(\mathbb{C}\))

Standard inner product

On \(\mathbb{R}^n\) or \(\mathbb{C}^n\), the standard inner product (dot product) is \[ \langle x, y \rangle = \sum_{j=1}^n x_j \overline{y}_j = y^* x. \] This is not the only inner product even on these spaces!

Expanding the square

We do something like this a lot: \[\begin{aligned} \|v+w\|^2 &= \langle v+w, v+w \rangle \\ &= \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle \\ &= \|v\|^2 + 2\Re \langle v, w \rangle + \|w\|^2 \end{aligned}\] This turns out to be useful in theory and in computation!

Cauchy-Schwarz

For real spaces: \[\begin{aligned} \|v+w\|^2 &= \|v\|^2 + 2 \langle v, w \rangle + \|w\|^2 \\ \|v+w\|^2 &\leq (\|v\|+\|w\|)^2 \\ &= \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \end{aligned}\] Therefore \[ \langle v, w \rangle \leq \|v\| \|w\| \] More generally: \(|\langle v, w \rangle| \leq \|v\| \|w\|\).

Law of cosines

Define angle between \(v\) and \(w\) (for a real space) by \[ \cos(\theta) = \frac{\langle v, w \rangle}{\|v\| \|w\|}. \] Then expanding square is equivalent to \[ \|v+w\|^2 = \|v\|^2 + 2 \|v\| \|w\| \cos(\theta) + \|w\|^2. \] This is the law of cosines from basic trig.

Pythagorean theorem

Suppose \(\langle v, w \rangle = 0\), i.e. \(v\) and \(w\) are orthogonal or normal. Then \[ \|v+w\|^2 = \|v\|^2 + 2\Re \langle v, w \rangle + \|w\|^2 = \|v\|^2 + \|w\|^2. \] This is the Pythagorean theorem.

Polynomial inner product

For polynomials, the \(L^2([-1,1])\) inner product is \[ \langle p, q \rangle = \int_{-1}^1 p(x) \overline{q(x)} \, dx. \] This is analogous to the standard inner product on \(\mathbb{R}^n\).

Gram matrices

Suppose \(V\) is a basis for an inner product space \(\mathcal{V}\). \[\begin{aligned} \langle Vc, Vd \rangle &= \left\langle \sum_j v_j c_j, \sum_i v_i d_i \right\rangle \\ &= \sum_{i,j} \langle v_j, v_i \rangle c_j \overline{d}_i \\ &= \sum_{i,j} g_{ij} c_j \overline{d}_i = d^* G c \end{aligned}\] The Gram matrix \(G\) of inner products (\(g_{ij} = \langle v_j, v_i \rangle\)) is symmetric (Hermitian) and positive definite.
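
A concrete sketch (Python/NumPy; gram_power_basis is our name): for the power basis of \(\mathcal{P}_d\) under the \(L^2([-1,1])\) inner product above, \(g_{ij} = \int_{-1}^1 x^{i+j} \, dx\).

```python
import numpy as np

def gram_power_basis(d):
    """Gram matrix g_ij = <x^j, x^i> for the power basis of P_d
    in the L^2([-1,1]) inner product."""
    G = np.zeros((d + 1, d + 1))
    for i in range(d + 1):
        for j in range(d + 1):
            if (i + j) % 2 == 0:        # odd total degree integrates to 0
                G[i, j] = 2.0 / (i + j + 1)
    return G

# Positive definiteness check: Cholesky succeeds only for SPD matrices
np.linalg.cholesky(gram_power_basis(2))
```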

Orthonormal bases

A basis \(V\) is orthonormal if

  • \(\langle v_i, v_j \rangle = \delta_{ij}\) or equivalently
  • The Gram matrix is \(I\)

An orthonormal basis quasimatrix \(V\) is an isometry between the concrete and abstract spaces (with respect to the Euclidean norms): \[ \|Vc\| = \|c\|. \]

Legendre polynomials

Consider the Legendre polynomials (usually on \([-1,1]\)) \[\begin{aligned} P_0(x) &= 1 \\ P_1(x) &= x \\ (n+1) P_{n+1}(x) &= (2n+1) x P_n(x) - n P_{n-1}(x) \end{aligned}\] These satisfy \[ \langle P_n, P_m \rangle = \frac{2}{2n+1} \delta_{mn}. \]

Scaled Legendre polynomials

Scaled Legendre polynomials form an orthonormal basis for \(\mathcal{P}_d\): \[ Q_n = \sqrt{\frac{2n+1}{2}} P_n \] These are very useful in function approximation.
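
A numerical spot-check of orthonormality (a sketch using NumPy's numpy.polynomial.legendre; 20-point Gauss-Legendre quadrature is exact here, since the integrands have degree at most \(2d\)):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

x, w = leggauss(20)                     # quadrature nodes and weights

def inner(p, q):
    return np.sum(w * p(x) * q(x))      # L^2([-1,1]) inner product

d = 5
Q = [np.sqrt((2 * n + 1) / 2) * Legendre.basis(n) for n in range(d + 1)]
G = np.array([[inner(Q[i], Q[j]) for j in range(d + 1)]
              for i in range(d + 1)])
assert np.allclose(G, np.eye(d + 1))    # Gram matrix is the identity
```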

Mappings in linear algebra

What does a matrix mean?

Intro class: “matrices represent linear maps.”
But there’s more to the story!

Linear maps

Suppose \(V, W\) are bases for \(\mathcal{V}, \mathcal{W}\) and \(\mathcal{A} \in L(\mathcal{V}, \mathcal{W})\).
Matrix is given by: \[ A = W^{-1} \mathcal{A} V \] That is, \(y = Ax\) represents \[ (Wy) = \mathcal{A} (Vx) \]

Operators

Suppose \(V\) a basis for \(\mathcal{V}\) and \(\mathcal{A} \in L(\mathcal{V}, \mathcal{V})\).
Matrix is given by: \[ A = V^{-1} \mathcal{A} V. \] That is, \(y = Ax\) represents \[ (Vy) = \mathcal{A} (Vx). \] We say \(A\) and \(\mathcal{A}\) are similar (and \(A = V^{-1} \mathcal{A} V\) is a similarity transformation).

Bilinear forms

Suppose \(V, W\) are bases for \(\mathcal{V}, \mathcal{W}\) and \(a : \mathcal{V}\times \mathcal{W}\rightarrow \mathbb{R}\) is bilinear (linear in both slots).
Then for the matrix \(A\) with entries \[ a_{ij} = a(v_j, w_i), \] we have \[ a(Vx,Wy) = y^T A x \]

Sesquilinear forms

Suppose \(V, W\) are bases for \(\mathcal{V}, \mathcal{W}\) and \(a : \mathcal{V}\times \mathcal{W}\rightarrow \mathbb{C}\) is sesquilinear (linear in first slot, conjugate linear in second).
Then for the matrix \(A\) with entries \[ a_{ij} = a(v_j, w_i), \] we have \[ a(Vx,Wy) = y^* A x \]

Quadratic forms

Suppose \(V\) a basis for \(\mathcal{V}\) and \(\phi : \mathcal{V}\rightarrow \mathbb{R}\) is a quadratic form: \(\phi(v) = a(v,v)\) for a symmetric bilinear form \(a\).
Then for the matrix \(A\) with entries \[ a_{ij} = a(v_j, v_i) \] we have \[ \phi(Vx) = x^T A x. \]

Quadratic forms

Q: How could we get \(a(v_i,v_j)\) given just access to \(\phi\)?
Hint: Think about expanding \(\phi(v_i+v_j)\)!
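
A: Expanding and using symmetry of \(a\): \[ \phi(v_i + v_j) = \phi(v_i) + 2 a(v_i, v_j) + \phi(v_j), \] so \[ a(v_i, v_j) = \frac{1}{2} \left( \phi(v_i + v_j) - \phi(v_i) - \phi(v_j) \right). \]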

Matrix norms

Norms for maps

\(L(\mathcal{V}, \mathcal{W})\) is a vector space

  • Can have the same structures as other spaces: norms, inner products
    • Ex: Frobenius inner product \(\langle A, B \rangle_F = \operatorname{tr}(B^* A)\)
    • Concrete case: \(\langle A, B \rangle_F = \sum_{i,j} a_{ij} \overline{b}_{ij}\)
  • What are some desirable properties?

Consistency

A norm on \(L(\mathcal{V}, \mathcal{W})\) is consistent with norms on \(\mathcal{V}, \mathcal{W}\) if \[ \|Av\| \leq \|A\| \|v\|. \] Ex: the Frobenius norm is consistent with the vector 2-norms: \[ \|Ax\|_2 \leq \|A\|_F \|x\|_2 \] Q: Why? (Hint: only one named inequality in these slides!)
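
A: Writing \(a_i^*\) for row \(i\) of \(A\), Cauchy-Schwarz gives \[ \|Ax\|_2^2 = \sum_i |a_i^* x|^2 \leq \sum_i \|a_i\|_2^2 \|x\|_2^2 = \|A\|_F^2 \|x\|_2^2. \]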

Induced norms

If \(\mathcal{V}\) and \(\mathcal{W}\) have norms, the induced norm on \(L(\mathcal{V},\mathcal{W})\) is \[ \|\mathcal{A}\|_{\mathcal{V},\mathcal{W}} = \max_{v \neq 0} \frac{\|\mathcal{A}v\|_\mathcal{W}}{\|v\|_{\mathcal{V}}} \] For concrete case with our favorite norms, we have \[\begin{aligned} \|A\|_1 &= \max_j \sum_i |a_{ij}| \\ \|A\|_\infty &= \max_i \sum_j |a_{ij}| \\ \|A\|_2 &= \mbox{???} \end{aligned}\]
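
A quick sanity check of the first two formulas (a sketch in NumPy, whose np.linalg.norm computes the induced matrix norms for ord=1 and ord=inf):

```python
import numpy as np

rng = np.random.default_rng(6210)
A = rng.standard_normal((4, 3))
# ord=1: max absolute column sum; ord=inf: max absolute row sum
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())
```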

Decompositions

Factorization paradigm

Basic idea: Write \(A\) as a product of other matrices! \[\begin{aligned} PA &= LU & \mbox{Gaussian elimination} \\ A &= QR & \mbox{Used for least squares, etc} \\ A &= U \Sigma V^* & \mbox{Singular value decomposition} \\ A &= V \Lambda V^{-1} & \mbox{Eigenvalue decomposition} \\ A &= U T U^* & \mbox{Schur decomposition} \end{aligned}\] Claim: the last three are different in kind from the first two!
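
For orientation, all five factorizations are one-liners in SciPy (a sketch assuming scipy.linalg; note SciPy's lu uses the convention \(A = PLU\) rather than \(PA = LU\)):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

P, L, U = linalg.lu(A)        # A = P @ L @ U  (SciPy's permutation convention)
Q, R = linalg.qr(A)           # A = Q @ R
U2, s, Vh = linalg.svd(A)     # A = U2 @ np.diag(s) @ Vh
lam, V = linalg.eig(A)        # A = V @ np.diag(lam) @ inv(V)
T, Z = linalg.schur(A, output='complex')  # A = Z @ T @ Z.conj().T
```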

Canonical decompositions

Canonical form: the “simplest” matrix achievable over the allowed bases.

  • Have canonical forms for maps, operators, quadratic forms
  • Intro linear algebra: Mostly consider any bases
  • Numerical linear algebra: Mostly consider orthonormal bases

Rank and nullity

Mapping type: \(L(\mathcal{V}, \mathcal{W})\) or bilinear or sesquilinear forms
No restrictions on bases.
Canonical form: \[ \begin{bmatrix} I_{r \times r} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} \] Rank is \(r\), null space dimension is \(n-r\).

Decomposition: \(\mathcal{A} = X_1 Y_1^*\), where \(X_1\) and \(Y_1^*\) are the first \(r\) columns and rows of bases \(X\) for \(\mathcal{W}\) and \(Y^*\) for \(\mathcal{V}^*\).

Singular value decomposition

Mapping type: \(L(\mathcal{V}, \mathcal{W})\) or bilinear or sesquilinear forms
Restrict to orthonormal bases.
Canonical form: \[ \begin{bmatrix} \Sigma_1 & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} \] where \(\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)\) with \(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0\) the nonzero singular values.

Decomposition: \(\mathcal{A} = U_1 \Sigma_1 V_1^*\) (economy SVD)

Jordan form

Mapping type: \(L(\mathcal{V}, \mathcal{V})\).
No restriction on basis.
Canonical form (almost all matrices): \[ \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \] where \(\lambda_j\) are the eigenvalues (basis of eigenvectors).

Decomposition: \(\mathcal{A} = V \Lambda V^{-1}\)

Sometimes need generalized eigenvectors, which gives the more complicated Jordan form.

Schur form

Mapping type: \(L(\mathcal{V}, \mathcal{V})\) (over \(\mathbb{C}\)).
Restrict to orthonormal basis.
Canonical form (all matrices): \[ T \mbox{ upper triangular, i.e. } t_{ij} = 0 \mbox{ for } i > j. \]

Decomposition: \(\mathcal{A} = U T U^*\).

Prefixes of the basis vectors span invariant subspaces.

Sylvester inertia

Mapping type: Quadratic form \(\phi\).
No restriction on basis.
Canonical form: \[ \begin{bmatrix} I_{\nu_+} & 0 & 0 \\ 0 & 0_{\nu_0} & 0 \\ 0 & 0 & -I_{\nu_-} \end{bmatrix} \] where the triple \(\nu = (\nu_+, \nu_0, \nu_-)\) is Sylvester’s inertia.

Corresponds to decomposing the space into positive curvature, zero curvature, and negative curvature subspaces.

Symmetric eigendecomposition

Mapping type: Quadratic form \(\phi\).
Restrict to orthonormal basis.
Canonical form: \[ \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) \] where \(\lambda_1 \geq \lambda_2 \geq \ldots\) are eigenvalues and the first \(\nu_+\) of the eigenvalues are positive, the next \(\nu_0\) are zero, and the remaining \(\nu_-\) are negative. The basis is eigenvectors.

Decomposition: \(\phi(x) = x^T Q \Lambda Q^T x\) or \(\phi(Qy) = y^T \Lambda y\).

SVD and 2-norm

Unitary invariance

For \(U\) an orthonormal basis or a unitary matrix (columns are an orthonormal basis for \(\mathbb{C}^n\)): \(\|Ux\| = \|x\|\)

Therefore if \(A = U \Sigma V^*\) is the full SVD: \[ \frac{\|Ax\|_2}{\|x\|_2} = \frac{\|U\Sigma V^* x\|_2}{\|x\|_2} = \frac{\|\Sigma V^* x\|_2}{\|V^* x\|_2}. \]

SVD and 2-norm

Therefore, maximizing \(\|Ax\|_2/\|x\|_2\) is equivalent to maximizing \[ \sqrt{\|\Sigma y\|_2^2/\|y\|_2^2} = \sqrt{\sum_j \sigma_j^2 w_j} \] where the weights \(w_j = |y_j|^2/\|y\|^2\) are nonnegative and sum to one. The maximum of this weighted average is attained at \(w_1 = 1\), giving \(\|A\|_2 = \sigma_1\).
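
Numerically (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(6210)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
```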

Inverse norm

Similar logic on \(A^{-1} = V \Sigma^{-1} U^*\) gives \[ \|A^{-1}\|_2 = \sigma_n^{-1}. \]