CS 6210: Matrix Computations

Matrix representations and operations

David Bindel

2025-09-03

Matrix algebra and linear algebra

Linear algebra

Focus on:

  • Abstract vector spaces
  • Mappings on/between those spaces
  • Invariant properties

Matrix computations

Add concern with:

  • Matrix shapes
  • Graph of a matrix
  • Efficient representations

… all of which are basis-dependent!

Dense matrix basics

The basics

Build on Basic Linear Algebra Subroutines (BLAS):

  • Level 1: Vector operations (e.g. \(y^T x\))
  • Level 2: Matrix-vector operations (e.g. \(Ax\))
  • Level 3: Matrix-matrix operations (e.g. \(AB\))

Arithmetic costs are \(O(n^1)\), \(O(n^2)\), and \(O(n^3)\), respectively.
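
For concreteness, a minimal sketch of what the three levels look like in Julia (the level 2 and 3 calls dispatch to a tuned BLAS):

# using LinearAlgebra
let
    n = 100
    x, y = rand(n), rand(n)
    A, B = rand(n, n), rand(n, n)
    dot(y, x)   # Level 1: O(n) flops
    A*x         # Level 2: O(n^2) flops
    A*B         # Level 3: O(n^3) flops
end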

Computation and memory

Worry about two costs:

  • Storage used
  • Time used
    • For arithmetic…
    • and for data movement

Data movement time varies a lot, depending on

  • Data layout in memory
  • Access patterns in algorithms

Memory and meaning

Memory (linearly addressed) contains 1D floating point array:

1.0
2.0
3.0
4.0

Interpretation:

\[ x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \mbox{ or } A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \mbox{ or ...} \]
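
A small Julia sketch of the same ambiguity: reshape reinterprets the same column-major storage without copying.

let
    data = [1.0; 2.0; 3.0; 4.0]
    x = data                 # x = [1; 2; 3; 4]
    A = reshape(data, 2, 2)  # A = [1 3; 2 4], same memory, no copy
    x, A
end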

Column major and row major

1.0
2.0
3.0
4.0

Column major

\[ A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \] (\(a_{ij}\) at data[i+j*m])

Fortran, MATLAB, Julia, R, NumPy / SciPy, Eigen and Armadillo (C++)

Row major

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

(\(a_{ij}\) at data[j+i*n])

C and C++ (built in), Pascal
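
A quick sanity check of the column-major formula in Julia (1-based indexing, so \(a_{ij}\) sits at data[i + (j-1)*m]; the data[i+j*m] above assumes 0-based indices):

let
    m, n = 2, 2
    A = [1.0 3.0;
         2.0 4.0]
    data = vec(A)   # the underlying column-major storage
    all(A[i,j] == data[i + (j-1)*m] for i = 1:m, j = 1:n)
end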

Column major in practice

let
    A = [1.0 3.0;
         2.0 4.0]
    A[:]  # Flatten the array into a vector, revealing column-major order
end
4-element Vector{Float64}:
 1.0
 2.0
 3.0
 4.0

So what?

Why do you need to know this?

Consider matvecs

Can be row-oriented (ij) or column-oriented (ji)

\[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{lightgray}{y_{2}} \\ \color{lightgray}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{red}{x_{2}} \\ \color{red}{x_{3}} \end{bmatrix} \] \[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{purple}{y_{2}} \\ \color{purple}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{lightgray}{x_{2}} \\ \color{lightgray}{x_{3}} \end{bmatrix} \]

Which is faster?

function matvec1_row(A, x)
  m, n = size(A)
  y = zeros(m)
  for i = 1:m
    for j = 1:n
      y[i] += A[i,j]*x[j]
    end
  end
  y
end
function matvec1_col(A, x)
  m, n = size(A)
  y = zeros(m)
  for j = 1:n
    for i = 1:m
      y[i] += A[i,j]*x[j]
    end
  end
  y
end
  • Consider \(m = n = 4000\) on my laptop (M1 MacBook Pro)
  • matvec1_row takes about 118 ms
  • matvec1_col takes about 14 ms
  • A*x takes about 3 ms
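
(Timings like these can be reproduced with the BenchmarkTools package, as in the sketch below; exact numbers are machine-dependent.)

# using BenchmarkTools
let
    A, x = rand(4000, 4000), rand(4000)
    @btime matvec1_row($A, $x)
    @btime matvec1_col($A, $x)
    @btime $A*$x
end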

Why is it faster?

Caches take advantage of locality of reference

  • Temporal locality: Access the same items close in time
  • Spatial locality: Access data in close proximity together
  • Cache hierarchy: small/fast to big/slow
  • Get a line at a time (say 64 bytes) – spatial locality
  • Try to keep recently-used data – temporal locality

Cache on my laptop

Numbers from one core on my laptop:

  • Registers (on chip): use immediately
  • L1 cache (128 KB): 3-4 cycles latency
  • L2 cache (12 MB, shared): 18 cycles latency
  • L3 cache (8 MB, shared): 18 cycles latency
  • M1 uses 128-byte cache lines (64 bytes is more common)
  • Main memory: worst case about 350 cycles
  • Note: Can launch several vector ops per cycle!

Explanation

  • At 4000-by-4000 with 8-byte floats, A takes about 128 MB
    • \(10\times\) the biggest cache!
    • So we are mostly going to main memory
  • matvec1_col accesses A with unit stride
    • This is great for spatial locality!
    • One cache line is 16 doubles
    • So about an order of magnitude fewer cache misses
    • And arithmetic is negligible compared to data transfers

Matrix-matrix products

\[c_{ij} = \sum_k a_{ik} b_{kj}\]

  • Not two, but six possible orderings of \(i, j, k\)
  • These provide some useful perspectives…

Matrix-matrix: inner products

ij(k) or ji(k): Each entry of \(C\) is a dot product of a row of \(A\) and a column of \(B\).

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]

Matrix-matrix: outer products

k(ij) or k(ji): \(C\) is a sum of outer products of column \(k\) of \(A\) and row \(k\) of \(B\).

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{purple}{c_{23}} \\ \color{purple}{c_{31}} & \color{purple}{c_{32}} & \color{purple}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{lightgray}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{lightgray}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]

Matrix-matrix: row matvecs

i(jk) or i(kj): Each row of \(C\) is a row of \(A\) multiplied by \(B\)

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{red}{b_{21}} & \color{red}{b_{22}} & \color{red}{b_{23}} \\ \color{red}{b_{31}} & \color{red}{b_{32}} & \color{red}{b_{33}} \end{bmatrix} \]

Matrix-matrix: col matvecs

j(ik) or j(ki): Each column of \(C\) is \(A\) times a column of \(B\)

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{purple}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{purple}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{blue}{a_{21}} & \color{blue}{a_{22}} & \color{blue}{a_{23}} \\ \color{blue}{a_{31}} & \color{blue}{a_{32}} & \color{blue}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
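
As a sketch, here is the j(ki) ordering as a Julia loop nest: the innermost loop runs down columns of \(A\) and \(C\) (unit stride), so each column of \(C\) accumulates as a combination of columns of \(A\). (Illustrative only; the built-in A*B is much faster.)

function matmul_jki(A, B)
    m, p = size(A)
    n = size(B, 2)
    C = zeros(m, n)
    for j = 1:n          # column of C (and of B)
        for k = 1:p      # column of A / row of B
            for i = 1:m  # unit-stride sweep down column k of A
                C[i,j] += A[i,k]*B[k,j]
            end
        end
    end
    C
end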

Matrix-matrix: blocking

  • Can do any traversal over \((i,j,k)\) index space
  • So organize around blocks: \(C_{ij} = \sum_k A_{ik} B_{kj}\)

\[ \begin{bmatrix} \color{purple}{C_{11}} & \color{blue}{C_{12}} \\ \color{red}{C_{21}} & \color{gray}{C_{22}} \end{bmatrix} = \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{blue}{c_{13}} & \color{blue}{c_{14}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{blue}{c_{23}} & \color{blue}{c_{24}} \\ \color{red}{c_{31}} & \color{red}{c_{32}} & \color{gray}{c_{33}} & \color{gray}{c_{34}} \\ \color{red}{c_{41}} & \color{red}{c_{42}} & \color{gray}{c_{43}} & \color{gray}{c_{44}} \end{bmatrix} \]

Q: Why is this a useful idea?

Blocking and cache

  • For spatial locality, want unit stride access
  • For temporal locality, want lots of re-use
    • Requires re-use in computation and good access pattern!
    • Measure re-use via arithmetic intensity: flops/memory ops
  • Idea: use small blocks that fit in cache
    • Simple model: \(b\)-by-\(b\) blocks, explicit cache control
    • Memory transfers to multiply blocks: \(O(b^2)\)
    • Arithmetic costs to multiply blocks: \(O(b^3)\)
  • Block recursively for better use at multiple cache levels
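
A minimal sketch of the blocking idea above (assuming square matrices with \(n\) divisible by the block size \(b\); a real tuned kernel does much more):

function matmul_blocked(A, B, b)
    n = size(A, 1)
    C = zeros(n, n)
    for J = 1:b:n, K = 1:b:n, I = 1:b:n
        # b-by-b block update: C_IJ += A_IK * B_KJ
        @views C[I:I+b-1, J:J+b-1] .+= A[I:I+b-1, K:K+b-1]*B[K:K+b-1, J:J+b-1]
    end
    C
end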

Blocking and linear algebra

For \(\mathcal{A} \in L(\mathcal{V}_1 \oplus \mathcal{V}_2, \mathcal{W}_1 \oplus \mathcal{W}_2)\): \[ w = \mathcal{A} v \equiv \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} \\ \mathcal{A}_{21} & \mathcal{A}_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \] where \[ v_i = \Pi_{\mathcal{V}_i} v, \quad w_j = \Pi_{\mathcal{W}_j} w, \quad \mathcal{A}_{ij} = \Pi_{\mathcal{W}_i} \mathcal{A} |_{\mathcal{V}_j} \] Given bases \(V_1\), \(V_2\), and \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}\) for \(\mathcal{V}_1\), \(\mathcal{V}_2\) and \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\) (and similarly for \(\mathcal{W}\)), get matrix representation with same block structure as above.

Performance the lazy way

How LAPACK does it:

  • Organize around block matrices (subspace sums)
  • Do as much as possible with level 3 BLAS calls
    • Matrix-matrix multiply, rank \(k\) update, etc
  • Use someone else’s well-tuned BLAS library

Matrix shapes and structures

Linear algebra structures

Invariant under appropriate change of basis:

  • \(A \in \mathbb{R}^{n \times n}\) is nonsingular if \(A^{-1}\) exists; otherwise singular.

  • \(Q \in \mathbb{R}^{n \times n}\) is orthogonal if \(Q^T Q = I\).

  • \(A \in \mathbb{R}^{n \times n}\) is symmetric if \(A = A^T\).

  • \(S \in \mathbb{R}^{n \times n}\) is skew-symmetric if \(S = -S^T\).

  • \(L \in \mathbb{R}^{n \times m}\) is low rank if \(L = UV^T\) for \(U \in \mathbb{R}^{n \times k}\) and \(V \in \mathbb{R}^{m \times k}\) where \(k \ll \min(m,n)\).

Reflect fundamental properties of maps, operators, forms.

Matrix structures

  • Shape (nonzero structure) depends on basis.
  • These are not linear algebraic structures!
  • But they are very useful in matrix computations…

Matrix structures

\(D\) is diagonal if \(d_{ij} = 0\) for \(i \neq j\).

\[\begin{bmatrix} d_1 & \color{lightgray}{0} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & d_2 & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{lightgray}{0} & d_3 & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{lightgray}{0} & d_4 \end{bmatrix} \rightarrow \begin{bmatrix} \times & & & \\ & \times & & \\ & & \times & \\ & & & \times \end{bmatrix} \]

Diagramming convention (“spy plots”):

  • \(\times\) (or something) for structural nonzeros
  • Blank space for zeros (or a zero symbol)

Matrix structures

\(T\) is tridiagonal if \(t_{ij} = 0\) for \(i \not \in \{j-1, j, j+1\}\).

\[ \begin{bmatrix} \times & \times & & & \\ \times & \times & \times & & \\ & \times & \times & \times & \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]

Matrix structures

\(U\) is upper bidiagonal if only main diagonal and first superdiagonal are nonzero (similar with lower bidiagonal).

\[ \begin{bmatrix} \times & \times & & & \\ & \times & \times & & \\ & & \times & \times & \\ & & & \times & \times \\ & & & & \times \end{bmatrix} \]

Matrix structures

\(U\) is upper triangular if \(u_{ij} = 0\) for \(i > j\) and strictly upper triangular if \(u_{ij} = 0\) for \(i \geq j\) (lower triangular and strictly lower triangular are similarly defined).

\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \end{bmatrix} , \begin{bmatrix} & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \\ & & & & \end{bmatrix} \]

Matrix structures

\(H\) is upper Hessenberg if \(h_{ij} = 0\) for \(i > j+1\).

\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]

Matrix structures

\(B\) is banded with lower bandwidth \(b_l\) and upper bandwidth \(b_u\) if \(b_{ij} = 0\) unless \(-b_l \leq j-i \leq b_{u}\).

\[ \begin{bmatrix} \times & \times & \times & & \\ \times & \times & \times & \times & \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]

Sparse formats

Sparse formats

  • Matrix is sparse if most entries are zero
  • Diagonal, tridiagonal, band matrices are examples
  • \(\operatorname{nnz}(A)\) is number of nonzeros in \(A\)
  • Cost of storage, multiply is \(O(\operatorname{nnz}(A))\)

Diagonal storage

4×4 Diagonal{Float64, Vector{Float64}}:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   2.0   ⋅    ⋅ 
  ⋅    ⋅   3.0   ⋅ 
  ⋅    ⋅    ⋅   4.0
  • Only explicitly store diagonal entries
  • Equivalent: Diagonal(d) * x and d .* x
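
A quick check of the equivalence (Diagonal is from the LinearAlgebra standard library):

# using LinearAlgebra
let
    d = [1.0; 2.0; 3.0; 4.0]
    x = [5.0; 6.0; 7.0; 8.0]
    Diagonal(d)*x == d .* x
end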

Tridiagonal storage

4×4 SymTridiagonal{Float64, Vector{Float64}}:
 1.0  5.0   ⋅    ⋅ 
 5.0  2.0  6.0   ⋅ 
  ⋅   6.0  3.0  7.0
  ⋅    ⋅   7.0  4.0
  • Have Tridiagonal and SymTridiagonal variants
  • Only explicitly store diagonal and first super/sub diagonal

Q: How would we write T*x via diagonal/superdiagonal?

Tridiagonal storage

let
    d, du = [1.0; 2.0; 3.0; 4.0], [5.0; 6.0; 7.0]
    T = SymTridiagonal(d, du)
    x = [9.0; 10.0; 11.0; 12.0]

    # Use Julia built-in for tridiagonal
    y1 = T*x

    # Implement multiply directly with d, du
    y2 = d .* x                       # Diagonal
    y2[2:end]   .+= du .* x[1:end-1]  # Subdiagonal
    y2[1:end-1] .+= du .* x[2:end]    # Superdiagonal
    
    # Check equivalence
    y1 == y2
end
true

Band storage

Store \(a_{ij}\) at B[j, j-i+bl+1] (rows indexed by column of \(A\), columns by diagonal): \[ \begin{bmatrix} a_{11} & \color{green}{a_{12}} & \color{blue}{a_{13}} & & \\ \color{red}{a_{21}} & a_{22} & \color{green}{a_{23}} & \color{blue}{a_{24}} & \\ & \color{red}{a_{32}} & a_{33} & \color{green}{a_{34}} & \color{blue}{a_{35}} \\ & & \color{red}{a_{43}} & a_{44} & \color{green}{a_{45}} \\ & & & \color{red}{a_{54}} & a_{55} \end{bmatrix} \rightarrow \begin{bmatrix} \color{red}{a_{21}} & a_{11} & & \\ \color{red}{a_{32}} & a_{22} & \color{green}{a_{12}} & \\ \color{red}{a_{43}} & a_{33} & \color{green}{a_{23}} & \color{blue}{a_{13}} \\ \color{red}{a_{54}} & a_{44} & \color{green}{a_{34}} & \color{blue}{a_{24}} \\ & a_{55} & \color{green}{a_{45}} & \color{blue}{a_{35}} \end{bmatrix} \]

  • No band matrix type built into Julia base LinearAlgebra
  • But there is a package! (and can call LAPACK directly)
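
For concreteness, a minimal matvec sketch against the band layout above (the bandwidths bl and bu are parameters; illustrative only, not a library routine):

# y = A*x where a_ij is stored at B[j, j-i+bl+1]
function band_matvec(B, bl, bu, x)
    n = length(x)
    y = zeros(n)
    for j = 1:n                            # sweep columns of A
        for i = max(1, j-bu):min(n, j+bl)  # rows inside the band
            y[i] += B[j, j-i+bl+1]*x[j]
        end
    end
    y
end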

Permutations

Consider \((Pv)_i = v_{\pi(i)}\):

  • \(P\) a sparse matrix with \(p_{ij} = \delta_{\pi(i),j}\)

  • Can store as integer vector \(\begin{bmatrix} \pi(1) & \ldots & \pi(n) \end{bmatrix}^T\):

    Pv = v[pi]     # apply P (gather)
    v[pi] = Pv     # scatter back: applies P^{-1} = P^T, recovering v
  • Not the only compact representation!

    • Ex: LAPACK partial pivot uses product of transpositions
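
A small sketch checking the identities above (p below is a hypothetical permutation; Matrix(1.0I, n, n) needs LinearAlgebra):

# using LinearAlgebra
let
    v = [10.0; 20.0; 30.0; 40.0]
    p = [2; 4; 1; 3]                # π stored as an integer vector
    P = Matrix(1.0I, 4, 4)[p, :]    # dense P with p_ij = δ_{π(i),j}
    Pv = v[p]                       # apply P without forming it
    w = zeros(4); w[p] = Pv         # scatter applies P^{-1} = P^T, recovering v
    P*v == Pv && w == v
end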

General sparse matrix formats

  • General sparse formats: store nonzero locations explicitly
  • Coordinate form: \((i, j, a_{ij})\) tuples (usually in parallel arrays)
    • Entries \((i, j, a_{ij})\) and \((i, j, a_{ij}')\) implicitly summed
    • Fast for adding new nonzeros
    • Slow for matrix-vector products
  • Main representation: Compressed Sparse Column (CSC)
    • Fast (vs COO) for matrix-vector products
    • Slow for adding new nonzeros

General sparse matrix formats (COO)

\[ \begin{bmatrix} a_{11} & & & a_{14} \\ & a_{22} & & a_{24} \\ & & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]

i:    1    4    2    4    3    4    1    2    3    4
j:    1    1    2    2    3    3    4    4    4    4
data: a11  a41  a22  a42  a33  a43  a14  a24  a34  a44
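
A minimal COO matvec sketch over parallel arrays like those above (duplicate entries accumulate, matching the implicit summing convention):

function coo_matvec(i, j, data, x, m)
    y = zeros(m)
    for k = 1:length(data)
        y[i[k]] += data[k]*x[j[k]]   # scatter each nonzero's contribution
    end
    y
end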

General sparse matrix formats (CSC)

\[ \begin{bmatrix} a_{11} & & & a_{14} \\ & a_{22} & & a_{24} \\ & & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]

colptr: 1    3    5    7    11
rowval: 1    4    2    4    3    4    1    2    3    4
nzval:  a11  a41  a22  a42  a33  a43  a14  a24  a34  a44

CSC construction

# using SparseArrays
let
    i = [1; 4; 2; 4; 3; 4; 1; 2; 3; 4]
    j = [1; 1; 2; 2; 3; 3; 4; 4; 4; 4]
    data = [1.0; 1.1; 2.0; 2.1; 3.0; 3.1; 4.0; 4.1; 4.2; 4.3]
    A = sparse(i,j,data)
end
4×4 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
 1.0   ⋅    ⋅   4.0
  ⋅   2.0   ⋅   4.1
  ⋅    ⋅   3.0  4.2
 1.1  2.1  3.1  4.3

Could then multiply with A*x

Q: How is A*x implemented?

CSC multiply

Column-oriented layout, so column-oriented matvec!

function matvec(A :: SparseMatrixCSC, x :: AbstractVector)
    colptr, rowval, nzval = A.colptr, A.rowval, A.nzval
    m, n = size(A)
    y = zeros(m)
    for j = 1:n
        lo, hi = colptr[j], colptr[j+1]-1
        y[rowval[lo:hi]] .+= nzval[lo:hi] .* x[j]
    end
    y
end
  • Optional type annotations (e.g. A :: SparseMatrixCSC)
  • Last expression in function is returned

A slightly bigger example

\[ \begin{bmatrix} \times & \times & & \times & & & & & \\ \times & \times & \times & & \times & & & & \\ & \times & \times & & & \times & & & \\ \times & & & \times & \times & & \times & & \\ & \times & & \times & \times & \times & & \times & \\ & & \times & & \times & \times & & & \times \\ & & & \times & & & \times & \times & \\ & & & & \times & & \times & \times & \times \\ & & & & & \times & & \times & \times \end{bmatrix} \]

Graph of a matrix

Nodes are rows/columns, edge \((i,j)\) if \(a_{ij} \neq 0\).

[Graph diagram: a 3-by-3 grid with nodes 1-9 and edges 1-2, 1-4, 2-3, 2-5, 3-6, 4-5, 4-7, 5-6, 5-8, 6-9, 7-8, 8-9.]

Data-sparse matrices

What is data sparsity?

  • \(A\) is sparse: \(\operatorname{nnz}(A) \ll mn\)
  • \(A\) is data sparse: Represent with \(\ll mn\) parameters
    • Ordinary sparsity is a special case
    • More generally involves formulas for elements
  • Data sparse representations often allow fast matvecs

Low-rank

If \(A \in \mathbb{R}^{m \times n}\) has rank \(k\), can write as \[ A = UV^T, \quad U \in \mathbb{R}^{m \times k}, V \in \mathbb{R}^{n \times k}. \] Fast matvecs using associativity:

y = (U*V')*x   # O(mn) storage, O(mnk) flops
y = U*(V'*x)   # O((m+n) k) storage and flops
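
A quick check that the two parenthesizations agree up to roundoff, with hypothetical sizes:

# using LinearAlgebra
let
    m, n, k = 1000, 800, 5
    U, V, x = rand(m, k), rand(n, k), rand(n)
    y1 = (U*V')*x           # forms the m-by-n matrix
    y2 = U*(V'*x)           # never forms it
    norm(y1-y2)/norm(y1)    # small relative difference (roundoff only)
end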

Toeplitz

\[ A = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 \\ a_{-1} & a_0 & a_1 & a_2 \\ a_{-2} & a_{-1} & a_0 & a_1 \\ a_{-3} & a_{-2} & a_{-1} & a_0 \end{bmatrix} \]

  • Toeplitz matrices have constant diagonals
  • Related: Hankel matrices (constant anti-diagonals)

Circulant

\[ C = \begin{bmatrix} \color{red}{a_0 } & \color{red}{a_1 } & \color{red}{a_2 } & \color{red}{a_3} & a_{-3} & a_{-2} & a_{-1}\\ \color{red}{a_{-1}} & \color{red}{a_0 } & \color{red}{a_1 } & \color{red}{a_2} & a_{3} & a_{-3} & a_{-2}\\ \color{red}{a_{-2}} & \color{red}{a_{-1}} & \color{red}{a_0 } & \color{red}{a_1} & a_{2} & a_{3} & a_{-3} \\ \color{red}{a_{-3}} & \color{red}{a_{-2}} & \color{red}{a_{-1}} & \color{red}{a_0} & a_{1} & a_{2} & a_{3} \\ a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 & a_1 & a_2 \\ a_2 & a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 & a_1 \\ a_1 & a_2 & a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 \\ \end{bmatrix} \]

  • Can embed any Toeplitz in a circulant matrix
  • Fast matvec with \(C\) yields fast matvec with \(A\) (how?)

Circulants and permutations

Write previous circulant concisely as \[ C = \sum_{k=-3}^3 a_k P^{-k}, \] where \(P\) is the forward cyclic permutation matrix \[ P_{ij} = \begin{cases} 1, & i = j+1 \mod n \\ 0, & \mbox{otherwise} \end{cases} \] FFT diagonalizes \(P\) (and therefore \(C\)) – more later in semester
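
A sketch of the resulting fast matvec, assuming the FFTW package and real inputs: if c is the first column of \(C\), then \(Cx\) can be computed in \(O(n \log n)\) time.

# using FFTW
# Multiply by the circulant whose first column is c, in O(n log n)
circulant_matvec(c, x) = real.(ifft(fft(c) .* fft(x)))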

Kronecker products

\[ A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \ldots & a_{1n} B \\ a_{21} B & a_{22} B & \ldots & a_{2n} B \\ \vdots & \vdots & & \vdots \\ a_{m1} B & a_{m2} B & \ldots & a_{mn} B \end{bmatrix}. \] Multiplication represents vector triple product \[ (A \otimes B) \operatorname{vec}(X) = \operatorname{vec}(BXA^T) \] where \(\operatorname{vec}(X)\) represents X[:] (column-major flattening)
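
A quick check of the identity with hypothetical sizes (kron and vec match the definitions above):

# using LinearAlgebra
let
    A, B = rand(3, 3), rand(4, 4)
    X = rand(4, 3)               # shaped so vec(X) conforms with A ⊗ B
    y1 = kron(A, B)*vec(X)       # forms the 12-by-12 Kronecker product
    y2 = vec(B*X*A')             # same result without forming it
    y1 ≈ y2
end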

Low-rank block structure

Many matrices have (nearly) low-rank blocks. Ex:

let
    T = Tridiagonal(rand(9), rand(10), rand(9))
    B = inv(T)
    U, s, V = svd(B[1:4,5:10])  # Superdiagonal block is rank 1
    s
end
4-element Vector{Float64}:
 13.861349606169064
  2.3147718848826372e-15
  5.145067223414744e-16
  1.139063951832993e-16
  • Low-rank block representation yields fast matvecs
  • Basis for fast multipole method (FMM)
  • Also fast PDE solvers, methods in controls, …

Punchline

Types of structure

  • Linear algebraic structure (low rank, orthogonality)
  • Matrix “shapes” (nonzero pattern) as structure
    • Equivalent to specifying a graph structure
  • Other types of “data sparse” structure

Simple representations

  • Full dense matrices (column major)
    • Operations may involve lots of arithmetic
    • But fast with tuned level 3 BLAS, blocking
  • Special sparse matrices
    • Bidiagonal, tridiagonal, band
    • Indexing can be done implicitly

General sparse and data sparse

  • General sparse matrices
    • Assume \(\operatorname{nnz}(A) \ll mn\)
    • Explicit structure for graph (indexing)
  • Data sparse matrices
    • Assume \(\ll mn\) parameters describe \(A\)
    • Sparse matrices are a special case
    • May give a “black box” for fast matvecs