CS 6210: Matrix Computations

Matrix representations and operations

David Bindel

2025-09-03

Matrix algebra and linear algebra

Linear algebra

Focus on:

  • Abstract vector spaces
  • Mappings on/between those spaces
  • Invariant properties

Matrix computations

Add concern with:

  • Matrix shapes
  • Graph of a matrix
  • Efficient representations

… all of which are basis-dependent!

Dense matrix basics

The basics

Build on Basic Linear Algebra Subroutines (BLAS):

  • Level 1: Vector operations (e.g. \(y^T x\))
  • Level 2: Matrix-vector operations (e.g. \(Ax\))
  • Level 3: Matrix-matrix operations (e.g. \(AB\))

Arithmetic costs are \(O(n^1)\), \(O(n^2)\), and \(O(n^3)\), respectively.
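
For concreteness, a minimal sketch of what the three levels look like in Julia (the level 2 and 3 calls dispatch to a tuned BLAS):

# using LinearAlgebra
let
    n = 100
    x, y = rand(n), rand(n)
    A, B = rand(n, n), rand(n, n)
    dot(y, x)   # Level 1: O(n) flops
    A*x         # Level 2: O(n^2) flops
    A*B         # Level 3: O(n^3) flops
end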

Computation and memory

Worry about two costs:

  • Storage used
  • Time used
    • For arithmetic…
    • and for data movement

Data movement time varies a lot, depending on

  • Data layout in memory
  • Access patterns in algorithms

Memory and meaning

Memory (linearly addressed) contains 1D floating point array:

1.0
2.0
3.0
4.0

Interpretation:

\[ x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \mbox{ or } A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \mbox{ or ...} \]
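
A small Julia sketch of the same ambiguity: reshape reinterprets the same column-major storage without copying.

let
    data = [1.0; 2.0; 3.0; 4.0]
    x = data                 # x = [1; 2; 3; 4]
    A = reshape(data, 2, 2)  # A = [1 3; 2 4], same memory, no copy
    x, A
end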

Column major and row major

1.0
2.0
3.0
4.0

Column major

\[ A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \] (\(a_{ij}\) at data[i+j*m])

Fortran, MATLAB, Julia, R, NumPy / SciPy, Eigen and Armadillo (C++)

Row major

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

(\(a_{ij}\) at data[j+i*n])

C and C++ (built in), Pascal
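
A quick sanity check of the column-major formula in Julia (1-based indexing, so \(a_{ij}\) sits at data[i + (j-1)*m]; the data[i+j*m] above assumes 0-based indices):

let
    m, n = 2, 2
    A = [1.0 3.0;
         2.0 4.0]
    data = vec(A)   # the underlying column-major storage
    all(A[i,j] == data[i + (j-1)*m] for i = 1:m, j = 1:n)
end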

Column major in practice

let
    A = [1.0 3.0;
         2.0 4.0]
    A[:]  # Flatten the array into a vector, revealing column-major order
end
4-element Vector{Float64}:
 1.0
 2.0
 3.0
 4.0

So what?

Why do you need to know this?

Consider matvecs

Can be row-oriented (ij) or column-oriented (ji)

\[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{lightgray}{y_{2}} \\ \color{lightgray}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{red}{x_{2}} \\ \color{red}{x_{3}} \end{bmatrix} \] \[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{purple}{y_{2}} \\ \color{purple}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{lightgray}{x_{2}} \\ \color{lightgray}{x_{3}} \end{bmatrix} \]

Which is faster?

function matvec1_row(A, x)
  m, n = size(A)
  y = zeros(m)
  for i = 1:m
    for j = 1:n
      y[i] += A[i,j]*x[j]
    end
  end
  y
end
function matvec1_col(A, x)
  m, n = size(A)
  y = zeros(m)
  for j = 1:n
    for i = 1:m
      y[i] += A[i,j]*x[j]
    end
  end
  y
end
  • Consider \(m = n = 4000\) on my laptop (M1 MacBook Pro)
  • matvec1_row takes about 118 ms
  • matvec1_col takes about 14 ms
  • A*x takes about 3 ms
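
(Timings like these can be reproduced with the BenchmarkTools package, as in the sketch below; exact numbers are machine-dependent.)

# using BenchmarkTools
let
    A, x = rand(4000, 4000), rand(4000)
    @btime matvec1_row($A, $x)
    @btime matvec1_col($A, $x)
    @btime $A*$x
end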

Why is it faster?

Caches take advantage of locality of reference

  • Temporal locality: Access the same items close in time
  • Spatial locality: Access data in close proximity together
  • Cache hierarchy: small/fast to big/slow
  • Get a line at a time (say 64 bytes) – spatial locality
  • Try to keep recently-used data – temporal locality

Cache on my laptop

Numbers from one core on my laptop:

  • Registers (on chip): use immediately
  • L1 cache (128 KB): 3-4 cycles latency
  • L2 cache (12 MB, shared): 18 cycles latency
  • L3 cache (8 MB, shared): 18 cycles latency
  • M1 uses 128-byte cache lines (64 bytes is more common)
  • Main memory: worst case about 350 cycles
  • Note: Can launch several vector ops per cycle!

Explanation

  • At 4000-by-4000 with 8-byte floats, A takes about 128 MB
    • \(10\times\) the biggest cache!
    • So we are mostly going to main memory
  • matvec1_col accesses A with unit stride
    • This is great for spatial locality!
    • One cache line is 16 doubles
    • So about an order of magnitude fewer cache misses
    • And arithmetic is negligible compared to data transfers

Matrix-matrix products

\[c_{ij} = \sum_k a_{ik} b_{kj}\]

  • Not two, but six possible orderings of \(i, j, k\)
  • These provide some useful perspectives…

Matrix-matrix: inner products

ij(k) or ji(k): Each entry of \(C\) is a dot product of a row of \(A\) and a column of \(B\).

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]

Matrix-matrix: outer products

k(ij) or k(ji): \(C\) is a sum of outer products of column \(k\) of \(A\) and row \(k\) of \(B\).

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{purple}{c_{23}} \\ \color{purple}{c_{31}} & \color{purple}{c_{32}} & \color{purple}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{lightgray}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{lightgray}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]

Matrix-matrix: row matvecs

i(jk) or i(kj): Each row of \(C\) is a row of \(A\) multiplied by \(B\)

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{red}{b_{21}} & \color{red}{b_{22}} & \color{red}{b_{23}} \\ \color{red}{b_{31}} & \color{red}{b_{32}} & \color{red}{b_{33}} \end{bmatrix} \]

Matrix-matrix: col matvecs

j(ik) or j(ki): Each column of \(C\) is \(A\) times a column of \(B\)

\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{purple}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{purple}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{blue}{a_{21}} & \color{blue}{a_{22}} & \color{blue}{a_{23}} \\ \color{blue}{a_{31}} & \color{blue}{a_{32}} & \color{blue}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
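
As a sketch, here is the j(ki) ordering as a Julia loop nest: the innermost loop runs down columns of \(A\) and \(C\) (unit stride), so each column of \(C\) accumulates as a combination of columns of \(A\). (Illustrative only; the built-in A*B is much faster.)

function matmul_jki(A, B)
    m, p = size(A)
    n = size(B, 2)
    C = zeros(m, n)
    for j = 1:n          # column of C (and of B)
        for k = 1:p      # column of A / row of B
            for i = 1:m  # unit-stride sweep down column k of A
                C[i,j] += A[i,k]*B[k,j]
            end
        end
    end
    C
end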

Matrix-matrix: blocking

  • Can do any traversal over \((i,j,k)\) index space
  • So organize around blocks: \(C_{ij} = \sum_k A_{ik} B_{kj}\)

\[ \begin{bmatrix} \color{purple}{C_{11}} & \color{blue}{C_{12}} \\ \color{red}{C_{21}} & \color{gray}{C_{22}} \end{bmatrix} = \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{blue}{c_{13}} & \color{blue}{c_{14}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{blue}{c_{23}} & \color{blue}{c_{24}} \\ \color{red}{c_{31}} & \color{red}{c_{32}} & \color{gray}{c_{33}} & \color{gray}{c_{34}} \\ \color{red}{c_{41}} & \color{red}{c_{42}} & \color{gray}{c_{43}} & \color{gray}{c_{44}} \end{bmatrix} \]

Q: Why is this a useful idea?

Blocking and cache

  • For spatial locality, want unit stride access
  • For temporal locality, want lots of re-use
    • Requires re-use in computation and good access pattern!
    • Measure re-use via arithmetic intensity: flops/memory ops
  • Idea: use small blocks that fit in cache
    • Simple model: \(b\)-by-\(b\) blocks, explicit cache control
    • Memory transfers to multiply blocks: \(O(b^2)\)
    • Arithmetic costs to multiply blocks: \(O(b^3)\)
  • Block recursively for better use at multiple cache levels
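
A minimal sketch of the blocking idea above (assuming square matrices with \(n\) divisible by the block size \(b\); a real tuned kernel does much more):

function matmul_blocked(A, B, b)
    n = size(A, 1)
    C = zeros(n, n)
    for J = 1:b:n, K = 1:b:n, I = 1:b:n
        # b-by-b block update: C_IJ += A_IK * B_KJ
        @views C[I:I+b-1, J:J+b-1] .+= A[I:I+b-1, K:K+b-1]*B[K:K+b-1, J:J+b-1]
    end
    C
end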

Blocking and linear algebra

For \(\mathcal{A} \in L(\mathcal{V}_1 \oplus \mathcal{V}_2, \mathcal{W}_1 \oplus \mathcal{W}_2)\): \[ w = \mathcal{A} v \equiv \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} \\ \mathcal{A}_{21} & \mathcal{A}_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \] where \[ v_i = \Pi_{\mathcal{V}_i} v, \quad w_j = \Pi_{\mathcal{W}_j} w, \quad \mathcal{A}_{ij} = \Pi_{\mathcal{W}_i} \mathcal{A} |_{\mathcal{V}_j} \] Given bases \(V_1\), \(V_2\), and \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}\) for \(\mathcal{V}_1\), \(\mathcal{V}_2\) and \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\) (and similarly for \(\mathcal{W}\)), get matrix representation with same block structure as above.

Performance the lazy way

How LAPACK does it:

  • Organize around block matrices (subspace sums)
  • Do as much as possible with level 3 BLAS calls
    • Matrix-matrix multiply, rank \(k\) update, etc
  • Use someone else’s well-tuned BLAS library

Matrix shapes and structures

Linear algebra structures

Invariant under appropriate change of basis:

  • \(A \in \mathbb{R}^{n \times n}\) is nonsingular if \(A^{-1}\) exists; otherwise singular.

  • \(Q \in \mathbb{R}^{n \times n}\) is orthogonal if \(Q^T Q = I\).

  • \(A \in \mathbb{R}^{n \times n}\) is symmetric if \(A = A^T\).

  • \(S \in \mathbb{R}^{n \times n}\) is skew-symmetric if \(S = -S^T\).

  • \(L \in \mathbb{R}^{n \times m}\) is low rank if \(L = UV^T\) for \(U \in \mathbb{R}^{n \times k}\) and \(V \in \mathbb{R}^{m \times k}\) where \(k \ll \min(m,n)\).

Reflect fundamental properties of maps, operators, forms.

Matrix structures

  • Shape (nonzero structure) depends on basis.
  • These are not linear algebraic structures!
  • But they are very useful in matrix computations…

Matrix structures

\(D\) is diagonal if \(d_{ij} = 0\) for \(i \neq j\).

\[\begin{bmatrix} d_1 & \color{lightgray}{0} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & d_2 & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{lightgray}{0} & d_3 & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{lightgray}{0} & d_4 \end{bmatrix} \rightarrow \begin{bmatrix} \times & & & \\ & \times & & \\ & & \times & \\ & & & \times \end{bmatrix} \]

Diagramming convention (“spy plots”):

  • \(\times\) (or something) for structural nonzeros
  • Blank space for zeros (or a zero symbol)

Matrix structures

\(T\) is tridiagonal if \(t_{ij} = 0\) for \(i \not \in \{j-1, j, j+1\}\).

\[ \begin{bmatrix} \times & \times & & & \\ \times & \times & \times & & \\ & \times & \times & \times & \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]

Matrix structures

\(U\) is upper bidiagonal if only main diagonal and first superdiagonal are nonzero (similar with lower bidiagonal).

\[ \begin{bmatrix} \times & \times & & & \\ & \times & \times & & \\ & & \times & \times & \\ & & & \times & \times \\ & & & & \times \end{bmatrix} \]

Matrix structures

\(U\) is upper triangular if \(u_{ij} = 0\) for \(i > j\) and strictly upper triangular if \(u_{ij} = 0\) for \(i \geq j\) (lower triangular and strictly lower triangular are similarly defined).

\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \end{bmatrix} , \begin{bmatrix} & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \\ & & & & \end{bmatrix} \]

Matrix structures

\(H\) is upper Hessenberg if \(h_{ij} = 0\) for \(i > j+1\).

\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]

Matrix structures

\(B\) is banded with lower bandwidth \(b_l\) and upper bandwidth \(b_u\) if \(b_{ij} = 0\) unless \(-b_l \leq j-i \leq b_{u}\).

\[ \begin{bmatrix} \times & \times & \times & & \\ \times & \times & \times & \times & \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]

Sparse formats

Sparse formats

  • Matrix is sparse if most entries are zero
  • Diagonal, tridiagonal, band matrices are examples
  • \(\operatorname{nnz}(A)\) is number of nonzeros in \(A\)
  • Cost of storage, multiply is \(O(\operatorname{nnz}(A))\)

Diagonal storage

4×4 Diagonal{Float64, Vector{Float64}}:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   2.0   ⋅    ⋅ 
  ⋅    ⋅   3.0   ⋅ 
  ⋅    ⋅    ⋅   4.0
  • Only explicitly store diagonal entries
  • Equivalent: Diagonal(d) * x and d .* x
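
A quick check of the equivalence (Diagonal is from the LinearAlgebra standard library):

# using LinearAlgebra
let
    d = [1.0; 2.0; 3.0; 4.0]
    x = [5.0; 6.0; 7.0; 8.0]
    Diagonal(d)*x == d .* x
end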

Tridiagonal storage

4×4 SymTridiagonal{Float64, Vector{Float64}}:
 1.0  5.0   ⋅    ⋅ 
 5.0  2.0  6.0   ⋅ 
  ⋅   6.0  3.0  7.0
  ⋅    ⋅   7.0  4.0
  • Have Tridiagonal and SymTridiagonal variants
  • Only explicitly store diagonal and first super/sub diagonal

Q: How would we write T*x via diagonal/superdiagonal?

Tridiagonal storage

let
    d, du = [1.0; 2.0; 3.0; 4.0], [5.0; 6.0; 7.0]
    T = SymTridiagonal(d, du)
    x = [9.0; 10.0; 11.0; 12.0]

    # Use Julia built-in for tridiagonal
    y1 = T*x

    # Implement multiply directly with d, du
    y2 = d .* x                       # Diagonal
    y2[2:end]   .+= du .* x[1:end-1]  # Subdiagonal
    y2[1:end-1] .+= du .* x[2:end]    # Superdiagonal
    
    # Check equivalence
    y1 == y2
end
true

Band storage

Store \(a_{ij}\) at B[j, j-i+bl+1] (rows indexed by column of \(A\), columns by diagonal): \[ \begin{bmatrix} a_{11} & \color{green}{a_{12}} & \color{blue}{a_{13}} & & \\ \color{red}{a_{21}} & a_{22} & \color{green}{a_{23}} & \color{blue}{a_{24}} & \\ & \color{red}{a_{32}} & a_{33} & \color{green}{a_{34}} & \color{blue}{a_{35}} \\ & & \color{red}{a_{43}} & a_{44} & \color{green}{a_{45}} \\ & & & \color{red}{a_{54}} & a_{55} \end{bmatrix} \rightarrow \begin{bmatrix} \color{red}{a_{21}} & a_{11} & & \\ \color{red}{a_{32}} & a_{22} & \color{green}{a_{12}} & \\ \color{red}{a_{43}} & a_{33} & \color{green}{a_{23}} & \color{blue}{a_{13}} \\ \color{red}{a_{54}} & a_{44} & \color{green}{a_{34}} & \color{blue}{a_{24}} \\ & a_{55} & \color{green}{a_{45}} & \color{blue}{a_{35}} \end{bmatrix} \]

  • No band matrix type built into Julia base LinearAlgebra
  • But there is a package! (and can call LAPACK directly)
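
For concreteness, a minimal matvec sketch against the band layout above (the bandwidths bl and bu are parameters; illustrative only, not a library routine):

# y = A*x where a_ij is stored at B[j, j-i+bl+1]
function band_matvec(B, bl, bu, x)
    n = length(x)
    y = zeros(n)
    for j = 1:n                            # sweep columns of A
        for i = max(1, j-bu):min(n, j+bl)  # rows inside the band
            y[i] += B[j, j-i+bl+1]*x[j]
        end
    end
    y
end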

Permutations

Consider \((Pv)_i = v_{\pi(i)}\):

  • \(P\) a sparse matrix with \(p_{ij} = \delta_{\pi(i),j}\)

  • Can store as integer vector \(\begin{bmatrix} \pi(1) & \ldots & \pi(n) \end{bmatrix}^T\):

    Pv = v[pi]     # apply P (gather)
    v[pi] = Pv     # scatter back: applies P^{-1} = P^T, recovering v
  • Not the only compact representation!

    • Ex: LAPACK partial pivot uses product of transpositions
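
A small sketch checking the identities above (p below is a hypothetical permutation; Matrix(1.0I, n, n) needs LinearAlgebra):

# using LinearAlgebra
let
    v = [10.0; 20.0; 30.0; 40.0]
    p = [2; 4; 1; 3]                # π stored as an integer vector
    P = Matrix(1.0I, 4, 4)[p, :]    # dense P with p_ij = δ_{π(i),j}
    Pv = v[p]                       # apply P without forming it
    w = zeros(4); w[p] = Pv         # scatter applies P^{-1} = P^T, recovering v
    P*v == Pv && w == v
end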

General sparse matrix formats

  • General sparse formats: store nonzero locations explicitly
  • Coordinate form: \((i, j, a_{ij})\) tuples (usually in parallel arrays)
    • Entries \((i, j, a_{ij})\) and \((i, j, a_{ij}')\) implicitly summed
    • Fast for adding new nonzeros
    • Slow for matrix-vector products
  • Main representation: Compressed Sparse Column (CSC)
    • Fast (vs COO) for matrix-vector products
    • Slow for adding new nonzeros

General sparse matrix formats (COO)

\[ \begin{bmatrix} a_{11} & & & a_{14} \\ & a_{22} & & a_{24} \\ & & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]

i:    1    4    2    4    3    4    1    2    3    4
j:    1    1    2    2    3    3    4    4    4    4
data: a11  a41  a22  a42  a33  a43  a14  a24  a34  a44
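
A minimal COO matvec sketch over parallel arrays like those above (duplicate entries accumulate, matching the implicit summing convention):

function coo_matvec(i, j, data, x, m)
    y = zeros(m)
    for k = 1:length(data)
        y[i[k]] += data[k]*x[j[k]]   # scatter each nonzero's contribution
    end
    y
end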

General sparse matrix formats (CSC)

\[ \begin{bmatrix} a_{11} & & & a_{14} \\ & a_{22} & & a_{24} \\ & & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]

colptr: 1    3    5    7    11
rowval: 1    4    2    4    3    4    1    2    3    4
nzval:  a11  a41  a22  a42  a33  a43  a14  a24  a34  a44

CSC construction

# using SparseArrays
let
    i = [1; 4; 2; 4; 3; 4; 1; 2; 3; 4]
    j = [1; 1; 2; 2; 3; 3; 4; 4; 4; 4]
    data = [1.0; 1.1; 2.0; 2.1; 3.0; 3.1; 4.0; 4.1; 4.2; 4.3]
    A = sparse(i,j,data)
end
4×4 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
 1.0   ⋅    ⋅   4.0
  ⋅   2.0   ⋅   4.1
  ⋅    ⋅   3.0  4.2
 1.1  2.1  3.1  4.3

Could then multiply with A*x

Q: How is A*x implemented?

CSC multiply

Column-oriented layout, so column-oriented matvec!

function matvec(A :: SparseMatrixCSC, x :: AbstractVector)
    colptr, rowval, nzval = A.colptr, A.rowval, A.nzval
    m, n = size(A)
    y = zeros(m)
    for j = 1:n
        lo, hi = colptr[j], colptr[j+1]-1
        y[rowval[lo:hi]] .+= nzval[lo:hi] .* x[j]
    end
    y
end
  • Optional type annotations (e.g. A :: SparseMatrixCSC)
  • Last expression in function is returned

A slightly bigger example

\[ \begin{bmatrix} \times & \times & & \times & & & & & \\ \times & \times & \times & & \times & & & & \\ & \times & \times & & & \times & & & \\ \times & & & \times & \times & & \times & & \\ & \times & & \times & \times & \times & & \times & \\ & & \times & & \times & \times & & & \times \\ & & & \times & & & \times & \times & \\ & & & & \times & & \times & \times & \times \\ & & & & & \times & & \times & \times \end{bmatrix} \]

Graph of a matrix

Nodes are rows/columns, edge \((i,j)\) if \(a_{ij} \neq 0\).

[Graph diagram: a 3-by-3 grid with nodes 1-9 and edges 1-2, 1-4, 2-3, 2-5, 3-6, 4-5, 4-7, 5-6, 5-8, 6-9, 7-8, 8-9.]

Data-sparse matrices

What is data sparsity?

  • \(A\) is sparse: \(\operatorname{nnz}(A) \ll mn\)
  • \(A\) is data sparse: Represent with \(\ll mn\) parameters
    • Ordinary sparsity is a special case
    • More generally involves formulas for elements
  • Data sparse representations often allow fast matvecs

Low-rank

If \(A \in \mathbb{R}^{m \times n}\) has rank \(k\), can write as \[ A = UV^T, \quad U \in \mathbb{R}^{m \times k}, V \in \mathbb{R}^{n \times k}. \] Fast matvecs using associativity:

y = (U*V')*x   # O(mn) storage, O(mnk) flops
y = U*(V'*x)   # O((m+n) k) storage and flops
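
A quick check that the two parenthesizations agree up to roundoff, with hypothetical sizes:

# using LinearAlgebra
let
    m, n, k = 1000, 800, 5
    U, V, x = rand(m, k), rand(n, k), rand(n)
    y1 = (U*V')*x           # forms the m-by-n matrix
    y2 = U*(V'*x)           # never forms it
    norm(y1-y2)/norm(y1)    # small relative difference (roundoff only)
end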

Toeplitz

\[ A = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 \\ a_{-1} & a_0 & a_1 & a_2 \\ a_{-2} & a_{-1} & a_0 & a_1 \\ a_{-3} & a_{-2} & a_{-1} & a_0 \end{bmatrix} \]

  • Toeplitz matrices have constant diagonals
  • Related: Hankel matrices (constant anti-diagonals)

Circulant

\[ C = \begin{bmatrix} \color{red}{a_0 } & \color{red}{a_1 } & \color{red}{a_2 } & \color{red}{a_3} & a_{-3} & a_{-2} & a_{-1}\\ \color{red}{a_{-1}} & \color{red}{a_0 } & \color{red}{a_1 } & \color{red}{a_2} & a_{3} & a_{-3} & a_{-2}\\ \color{red}{a_{-2}} & \color{red}{a_{-1}} & \color{red}{a_0 } & \color{red}{a_1} & a_{2} & a_{3} & a_{-3} \\ \color{red}{a_{-3}} & \color{red}{a_{-2}} & \color{red}{a_{-1}} & \color{red}{a_0} & a_{1} & a_{2} & a_{3} \\ a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 & a_1 & a_2 \\ a_2 & a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 & a_1 \\ a_1 & a_2 & a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 \\ \end{bmatrix} \]

  • Can embed any Toeplitz in a circulant matrix
  • Fast matvec with \(C\) yields fast matvec with \(A\) (how?)

Circulants and permutations

Write previous circulant concisely as \[ C = \sum_{k=-3}^3 a_k P^{-k}, \] where \(P\) is the forward cyclic permutation matrix \[ P_{ij} = \begin{cases} 1, & i = j+1 \mod n \\ 0, & \mbox{otherwise} \end{cases} \] FFT diagonalizes \(P\) (and therefore \(C\)) – more later in semester
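
A sketch of the resulting fast matvec, assuming the FFTW package and real inputs: if c is the first column of \(C\), then \(Cx\) can be computed in \(O(n \log n)\) time.

# using FFTW
# Multiply by the circulant whose first column is c, in O(n log n)
circulant_matvec(c, x) = real.(ifft(fft(c) .* fft(x)))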

Kronecker products

\[ A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \ldots & a_{1n} B \\ a_{21} B & a_{22} B & \ldots & a_{2n} B \\ \vdots & \vdots & & \vdots \\ a_{m1} B & a_{m2} B & \ldots & a_{mn} B \end{bmatrix}. \] Multiplication represents vector triple product \[ (A \otimes B) \operatorname{vec}(X) = \operatorname{vec}(BXA^T) \] where \(\operatorname{vec}(X)\) represents X[:] (column-major flattening)
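
A quick check of the identity with hypothetical sizes (kron and vec match the definitions above):

# using LinearAlgebra
let
    A, B = rand(3, 3), rand(4, 4)
    X = rand(4, 3)               # shaped so vec(X) conforms with A ⊗ B
    y1 = kron(A, B)*vec(X)       # forms the 12-by-12 Kronecker product
    y2 = vec(B*X*A')             # same result without forming it
    y1 ≈ y2
end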

Low-rank block structure

Many matrices have (nearly) low-rank blocks. Ex:

let
    T = Tridiagonal(rand(9), rand(10), rand(9))
    B = inv(T)
    U, s, V = svd(B[1:4,5:10])  # Superdiagonal block is rank 1
    s
end
4-element Vector{Float64}:
 13.861349606169064
  2.3147718848826372e-15
  5.145067223414744e-16
  1.139063951832993e-16
  • Low-rank block representation yields fast matvecs
  • Basis for fast multipole method (FMM)
  • Also fast PDE solvers, methods in controls, …

Punchline

Types of structure

  • Linear algebraic structure (low rank, orthogonality)
  • Matrix “shapes” (nonzero pattern) as structure
    • Equivalent to specifying a graph structure
  • Other types of “data sparse” structure

Simple representations

  • Full dense matrices (column major)
    • Operations may involve lots of arithmetic
    • But fast with tuned level 3 BLAS, blocking
  • Special sparse matrices
    • Bidiagonal, tridiagonal, band
    • Indexing can be done implicitly

General sparse and data sparse

  • General sparse matrices
    • Assume \(\operatorname{nnz}(A) \ll mn\)
    • Explicit structure for graph (indexing)
  • Data sparse matrices
    • Assume \(\ll mn\) parameters describe \(A\)
    • Sparse matrices are a special case
    • May give a “black box” for fast matvecs