Matrix representations and operations
2025-09-03
Focus on:
Also concern ourselves with:
… all of which are basis-dependent!
Build on Basic Linear Algebra Subprograms (BLAS):
Level 1, 2, and 3 arithmetic costs are \(O(n)\), \(O(n^2)\), and \(O(n^3)\), respectively.
Worry about two costs: arithmetic and data movement.
Data movement time varies a lot, depending on access patterns and where the data sits in the memory hierarchy (cache vs. main memory).
Memory (linearly addressed) contains 1D floating point array:
Interpretation:
\[ x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \mbox{ or } A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \mbox{ or ...} \]
\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
\] (\(a_{ij}\) at data[i+j*m])
Fortran, MATLAB, Julia, R, Eigen and Armadillo (C++); NumPy / SciPy support both layouts (row major by default)
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
(\(a_{ij}\) at data[j+i*n])
C and C++ (built in), Pascal
Why do you need to know this?
Can be row-oriented (ij) or column-oriented (ji)
\[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{lightgray}{y_{2}} \\ \color{lightgray}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{red}{x_{2}} \\ \color{red}{x_{3}} \end{bmatrix} \] \[ \begin{bmatrix} \color{purple}{y_{1}} \\ \color{purple}{y_{2}} \\ \color{purple}{y_{3}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{x_{1}} \\ \color{lightgray}{x_{2}} \\ \color{lightgray}{x_{3}} \end{bmatrix} \]
Numbers from one core on my laptop (A takes about 120 MB):
matvec1_row takes about 118 ms
matvec1_col takes about 14 ms
A*x takes about 3 ms
Caches take advantage of locality of reference: matvec1_col accesses A with unit stride.
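The timed routines are not shown here; a minimal sketch of what matvec1_row and matvec1_col presumably look like (names from the timings above, bodies assumed):

```julia
# Sketch of the two loop orderings (the actual timed routines are assumed
# to look roughly like this).
function matvec1_row(A, x)        # ij order: stride-m access to A
    m, n = size(A)
    y = zeros(m)
    for i = 1:m, j = 1:n
        y[i] += A[i,j] * x[j]
    end
    y
end

function matvec1_col(A, x)        # ji order: unit-stride access to A
    m, n = size(A)
    y = zeros(m)
    for j = 1:n, i = 1:m
        y[i] += A[i,j] * x[j]
    end
    y
end
```

Both compute the same \(y = Ax\); only the memory access pattern differs.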
\[c_{ij} = \sum_k a_{ik} b_{kj}\]
ij(k) or ji(k): Each entry of \(C\) is a dot product of a row of \(A\) and a column of \(B\).
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
k(ij) or k(ji): \(C\) is a sum of outer products of column \(k\) of \(A\) and row \(k\) of \(B\).
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{purple}{c_{23}} \\ \color{purple}{c_{31}} & \color{purple}{c_{32}} & \color{purple}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{lightgray}{a_{12}} & \color{lightgray}{a_{13}} \\ \color{blue}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{blue}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{lightgray}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{lightgray}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
i(jk) or i(kj): Each row of \(C\) is a row of \(A\) multiplied by \(B\)
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{purple}{c_{13}} \\ \color{lightgray}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{lightgray}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{lightgray}{a_{21}} & \color{lightgray}{a_{22}} & \color{lightgray}{a_{23}} \\ \color{lightgray}{a_{31}} & \color{lightgray}{a_{32}} & \color{lightgray}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{red}{b_{12}} & \color{red}{b_{13}} \\ \color{red}{b_{21}} & \color{red}{b_{22}} & \color{red}{b_{23}} \\ \color{red}{b_{31}} & \color{red}{b_{32}} & \color{red}{b_{33}} \end{bmatrix} \]
j(ik) or j(ki): Each column of \(C\) is \(A\) times a column of \(B\)
\[ \begin{bmatrix} \color{purple}{c_{11}} & \color{lightgray}{c_{12}} & \color{lightgray}{c_{13}} \\ \color{purple}{c_{21}} & \color{lightgray}{c_{22}} & \color{lightgray}{c_{23}} \\ \color{purple}{c_{31}} & \color{lightgray}{c_{32}} & \color{lightgray}{c_{33}} \end{bmatrix} = \begin{bmatrix} \color{blue}{a_{11}} & \color{blue}{a_{12}} & \color{blue}{a_{13}} \\ \color{blue}{a_{21}} & \color{blue}{a_{22}} & \color{blue}{a_{23}} \\ \color{blue}{a_{31}} & \color{blue}{a_{32}} & \color{blue}{a_{33}} \end{bmatrix} \begin{bmatrix} \color{red}{b_{11}} & \color{lightgray}{b_{12}} & \color{lightgray}{b_{13}} \\ \color{red}{b_{21}} & \color{lightgray}{b_{22}} & \color{lightgray}{b_{23}} \\ \color{red}{b_{31}} & \color{lightgray}{b_{32}} & \color{lightgray}{b_{33}} \end{bmatrix} \]
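As one concrete instance of these orderings, here is a sketch of the j(ki) variant, where the innermost loop runs down columns (unit stride in column-major storage):

```julia
# Sketch of the j(ki) ordering: each column of C is A times a column of B,
# accumulated one column of A at a time.
function matmul_jki(A, B)
    m, n = size(A)
    _, p = size(B)
    C = zeros(m, p)
    for j = 1:p, k = 1:n
        for i = 1:m   # innermost loop: unit stride through A and C
            C[i,j] += A[i,k] * B[k,j]
        end
    end
    C
end
```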
\[ \begin{bmatrix} \color{purple}{C_{11}} & \color{blue}{C_{12}} \\ \color{red}{C_{21}} & \color{gray}{C_{22}} \end{bmatrix} = \begin{bmatrix} \color{purple}{c_{11}} & \color{purple}{c_{12}} & \color{blue}{c_{13}} & \color{blue}{c_{14}} \\ \color{purple}{c_{21}} & \color{purple}{c_{22}} & \color{blue}{c_{23}} & \color{blue}{c_{24}} \\ \color{red}{c_{31}} & \color{red}{c_{32}} & \color{gray}{c_{33}} & \color{gray}{c_{34}} \\ \color{red}{c_{41}} & \color{red}{c_{42}} & \color{gray}{c_{43}} & \color{gray}{c_{44}} \end{bmatrix} \]
Q: Why is this a useful idea?
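One answer: block operations keep small submatrices in cache and reuse them many times. A minimal blocked-matmul sketch (assuming for simplicity that the block size b divides n):

```julia
# Blocked matrix multiply: update C one b-by-b block product at a time.
# Assumes square n-by-n inputs with b dividing n (an illustrative simplification).
function matmul_blocked(A, B, b)
    n = size(A, 1)
    C = zeros(n, n)
    for J = 1:b:n, K = 1:b:n, I = 1:b:n
        # One block update: C_IJ += A_IK * B_KJ
        @views C[I:I+b-1, J:J+b-1] .+= A[I:I+b-1, K:K+b-1] * B[K:K+b-1, J:J+b-1]
    end
    C
end
```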
For \(\mathcal{A} \in L(\mathcal{V}_1 \oplus \mathcal{V}_2, \mathcal{W}_1 \oplus \mathcal{W}_2)\): \[ w = \mathcal{A} v \equiv \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} \\ \mathcal{A}_{21} & \mathcal{A}_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \] where \[ v_i = \Pi_{\mathcal{V}_i} v, \quad w_j = \Pi_{\mathcal{W}_j} w, \quad \mathcal{A}_{ij} = \Pi_{\mathcal{W}_i} \mathcal{A} |_{\mathcal{V}_j} \] Given bases \(V_1\), \(V_2\), and \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}\) for \(\mathcal{V}_1\), \(\mathcal{V}_2\) and \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\) (and similarly for \(\mathcal{W}\)), get matrix representation with same block structure as above.
How LAPACK does it:
Invariant under appropriate change of basis:
\(A \in \mathbb{R}^{n \times n}\) is nonsingular if \(A^{-1}\) exists; otherwise singular.
\(Q \in \mathbb{R}^{n \times n}\) is orthogonal if \(Q^T Q = I\).
\(A \in \mathbb{R}^{n \times n}\) is symmetric if \(A = A^T\).
\(S \in \mathbb{R}^{n \times n}\) is skew-symmetric if \(S = -S^T\).
\(L \in \mathbb{R}^{n \times m}\) is low rank if \(L = UV^T\) for \(U \in \mathbb{R}^{n \times k}\) and \(V \in \mathbb{R}^{m \times k}\) where \(k \ll \min(m,n)\).
Reflect fundamental properties of maps, operators, forms.
\(D\) is diagonal if \(d_{ij} = 0\) for \(i \neq j\).
\[\begin{bmatrix} d_1 & \color{lightgray}{0} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & d_2 & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{lightgray}{0} & d_3 & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{lightgray}{0} & d_4 \end{bmatrix} \rightarrow \begin{bmatrix} \times & & & \\ & \times & & \\ & & \times & \\ & & & \times \end{bmatrix} \]
Diagramming convention (“spy plots”):
\(T\) is tridiagonal if \(t_{ij} = 0\) for \(i \not \in \{j-1, j, j+1\}\).
\[ \begin{bmatrix} \times & \times & & & \\ \times & \times & \times & & \\ & \times & \times & \times & \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]
\(U\) is upper bidiagonal if only main diagonal and first superdiagonal are nonzero (similar with lower bidiagonal).
\[ \begin{bmatrix} \times & \times & & & \\ & \times & \times & & \\ & & \times & \times & \\ & & & \times & \times \\ & & & & \times \end{bmatrix} \]
\(U\) is upper triangular if \(u_{ij} = 0\) for \(i > j\) and strictly upper triangular if \(u_{ij} = 0\) for \(i \geq j\) (lower triangular and strictly lower triangular are similarly defined).
\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \end{bmatrix} , \begin{bmatrix} & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \\ & & & & \times \\ & & & & \end{bmatrix} \]
\(H\) is upper Hessenberg if \(h_{ij} = 0\) for \(i > j+1\).
\[ \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]
\(B\) is banded with lower bandwidth \(b_l\) and upper bandwidth \(b_u\) if \(b_{ij} = 0\) unless \(-b_l \leq j-i \leq b_{u}\).
\[ \begin{bmatrix} \times & \times & \times & & \\ \times & \times & \times & \times & \\ & \times & \times & \times & \times \\ & & \times & \times & \times \\ & & & \times & \times \end{bmatrix} \]
4×4 Diagonal{Float64, Vector{Float64}}:
1.0 ⋅ ⋅ ⋅
⋅ 2.0 ⋅ ⋅
⋅ ⋅ 3.0 ⋅
⋅ ⋅ ⋅ 4.0
Diagonal(d) * x and d .* x
4×4 SymTridiagonal{Float64, Vector{Float64}}:
1.0 5.0 ⋅ ⋅
5.0 2.0 6.0 ⋅
⋅ 6.0 3.0 7.0
⋅ ⋅ 7.0 4.0
Tridiagonal and SymTridiagonal variants
Q: How would we write T*x via diagonal/superdiagonal?
let
d, du = [1.0; 2.0; 3.0; 4.0], [5.0; 6.0; 7.0]
T = SymTridiagonal(d, du)
x = [9.0; 10.0; 11.0; 12.0]
# Use Julia built-in for tridiagonal
y1 = T*x
# Implement multiply directly with d, du
y2 = d .* x # Diagonal
y2[2:end] .+= du .* x[1:end-1] # Subdiagonal
y2[1:end-1] .+= du .* x[2:end] # Superdiagonal
# Check equivalence
y1 == y2
end
true
Store \(a_{ij}\) at B[j, j-i+bl+1] (row indexed by column of \(A\), column by diagonal): \[
\begin{bmatrix}
a_{11} & \color{green}{a_{12}} & \color{blue}{a_{13}} & & \\
\color{red}{a_{21}} & a_{22} & \color{green}{a_{23}} & \color{blue}{a_{24}} & \\
& \color{red}{a_{32}} & a_{33} & \color{green}{a_{34}} & \color{blue}{a_{35}} \\
& & \color{red}{a_{43}} & a_{44} & \color{green}{a_{45}} \\
& & & \color{red}{a_{54}} & a_{55}
\end{bmatrix} \rightarrow
\begin{bmatrix}
\color{red}{a_{21}} & a_{11} & & \\
\color{red}{a_{32}} & a_{22} & \color{green}{a_{12}} & \\
\color{red}{a_{43}} & a_{33} & \color{green}{a_{23}} & \color{blue}{a_{13}} \\
\color{red}{a_{54}} & a_{44} & \color{green}{a_{34}} & \color{blue}{a_{24}} \\
 & a_{55} & \color{green}{a_{45}} & \color{blue}{a_{35}}
\end{bmatrix}
\]
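A matvec can read straight off this packed layout. The helper below is a hypothetical sketch (not LAPACK's actual gbmv interface), using the B[j, j-i+bl+1] indexing: column j of B holds the nonzeros of column j of A, indexed by diagonal.

```julia
# Matvec from packed band storage: a_ij lives at B[j, j-i+bl+1]
# for -bl <= j-i <= bu.  B is n-by-(bl+bu+1); band_matvec is a
# hypothetical helper name.
function band_matvec(B, bl, bu, x)
    n = size(B, 1)
    y = zeros(n)
    for j = 1:n
        for i = max(1, j-bu):min(n, j+bl)   # rows in the band for column j
            y[i] += B[j, j-i+bl+1] * x[j]
        end
    end
    y
end
```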
Consider \((Pv)_i = v_{\pi(i)}\):
\(P\) a sparse matrix with \(p_{ij} = \delta_{\pi(i),j}\)
Can store as integer vector \(\begin{bmatrix} \pi(1) & \ldots & \pi(n) \end{bmatrix}^T\):
Not the only compact representation!
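With the integer-vector representation, applying \(P\) is a gather and applying \(P^T = P^{-1}\) is a scatter (helper names here are hypothetical):

```julia
# (Pv)_i = v_{π(i)}: apply P by indexing (gather)
perm_apply(p, v) = v[p]

# P^T = P^{-1}: (P^T w)_{π(i)} = w_i, so scatter instead of gather
function perm_apply_inv(p, w)
    v = similar(w)
    v[p] = w
    return v
end
```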
\[ \begin{bmatrix} a_{11} & & & a_{14} \\ & a_{22} & & a_{24} \\ & & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]
4×4 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
1.0 ⋅ ⋅ 4.0
⋅ 2.0 ⋅ 4.1
⋅ ⋅ 3.0 4.2
1.1 2.1 3.1 4.3
Could then multiply with A*x
Q: How is A*x implemented?
Column-oriented layout, so column-oriented matvec!
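A sketch of such a column-oriented matvec, written directly against the CSC fields (A.colptr, A.rowval, A.nzval) rather than Julia's actual internal routine:

```julia
using SparseArrays

# Column-oriented sparse matvec: walk the nonzeros of each column j.
function csc_matvec(A::SparseMatrixCSC, x)
    y = zeros(size(A, 1))
    for j = 1:size(A, 2)
        for p = A.colptr[j]:A.colptr[j+1]-1   # nonzero entries of column j
            y[A.rowval[p]] += A.nzval[p] * x[j]
        end
    end
    y
end
```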
\[ \begin{bmatrix} \times & \times & & \times & & & & & \\ \times & \times & \times & & \times & & & & \\ & \times & \times & & & \times & & & \\ \times & & & \times & \times & & \times & & \\ & \times & & \times & \times & \times & & \times & \\ & & \times & & \times & \times & & & \times \\ & & & \times & & & \times & \times & \\ & & & & \times & & \times & \times & \times \\ & & & & & \times & & \times & \times \end{bmatrix} \]
Nodes are rows/columns, edge \((i,j)\) if \(a_{ij} \neq 0\).
If \(A \in \mathbb{R}^{m \times n}\) has rank \(k\), can write as \[ A = UV^T, \quad U \in \mathbb{R}^{m \times k}, V \in \mathbb{R}^{n \times k}. \] Fast matvecs using associativity:
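The associativity trick is a one-liner: applying \(V^T\) first costs \(O(nk)\), then \(U\) costs \(O(mk)\), versus \(O(mn)\) for forming \(A\) explicitly.

```julia
# O((m+n)k) matvec with A = U*V': never form A explicitly
lowrank_matvec(U, V, x) = U * (V' * x)
```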
\[ A = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 \\ a_{-1} & a_0 & a_1 & a_2 \\ a_{-2} & a_{-1} & a_0 & a_1 \\ a_{-3} & a_{-2} & a_{-1} & a_0 \end{bmatrix} \]
\[ C = \begin{bmatrix} \color{red}{a_0 } & \color{red}{a_1 } & \color{red}{a_2 } & \color{red}{a_3} & a_{-3} & a_{-2} & a_{-1}\\ \color{red}{a_{-1}} & \color{red}{a_0 } & \color{red}{a_1 } & \color{red}{a_2} & a_{3} & a_{-3} & a_{-2}\\ \color{red}{a_{-2}} & \color{red}{a_{-1}} & \color{red}{a_0 } & \color{red}{a_1} & a_{2} & a_{3} & a_{-3} \\ \color{red}{a_{-3}} & \color{red}{a_{-2}} & \color{red}{a_{-1}} & \color{red}{a_0} & a_{1} & a_{2} & a_{3} \\ a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 & a_1 & a_2 \\ a_2 & a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 & a_1 \\ a_1 & a_2 & a_3 & a_{-3} & a_{-2} & a_{-1} & a_0 \\ \end{bmatrix} \]
Write previous circulant concisely as \[ C = \sum_{k=-3}^3 a_k P^{-k}, \] where \(P\) is the forward cyclic permutation matrix \[ P_{ij} = \begin{cases} 1, & i = j+1 \mod n \\ 0, & \mbox{otherwise} \end{cases} \] FFT diagonalizes \(P\) (and therefore \(C\)) – more later in semester
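Even before bringing in the FFT, the permutation-sum form gives a simple matvec: \(P^{-k} x\) is a cyclic shift, which Julia spells circshift(x, -k). The coefficient layout below is an assumption for illustration (a[k + kmax + 1] holds \(a_k\)):

```julia
# Circulant matvec via C = Σ_k a_k P^{-k}.
# Assumed layout: a[k + kmax + 1] holds a_k for k = -kmax:kmax.
function circ_matvec(a, kmax, x)
    y = zeros(length(x))
    for k = -kmax:kmax
        y .+= a[k + kmax + 1] .* circshift(x, -k)   # P^{-k} x
    end
    y
end
```

This is \(O(n \cdot \text{kmax})\); the FFT route reduces a full circulant matvec to \(O(n \log n)\).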
\[
A \otimes B =
\begin{bmatrix}
a_{11} B & a_{12} B & \ldots & a_{1n} B \\
a_{21} B & a_{22} B & \ldots & a_{2n} B \\
\vdots & \vdots & & \vdots \\
a_{m1} B & a_{m2} B & \ldots & a_{mn} B
\end{bmatrix}.
\] Multiplication represents vector triple product \[
(A \otimes B) \operatorname{vec}(X) = \operatorname{vec}(BXA^T)
\] where \(\operatorname{vec}(X)\) represents X[:] (column-major flattening)
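A quick sanity check of the identity using Base's kron and vec on small example matrices:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]
X = [5.0 6.0; 7.0 8.0]

# (A ⊗ B) vec(X) should equal vec(B X Aᵀ)
lhs = kron(A, B) * vec(X)
rhs = vec(B * X * transpose(A))
lhs ≈ rhs   # true
```

Note the practical payoff: applying \(A \otimes B\) via \(BXA^T\) never forms the \(mn \times mn\) Kronecker product.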
Many matrices have (nearly) low-rank blocks. Ex:
4-element Vector{Float64}:
13.861349606169064
2.3147718848826372e-15
5.145067223414744e-16
1.139063951832993e-16
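The singular values above indicate numerical rank one. A hypothetical way to produce this kind of decay (the specific kernel and points here are illustrative assumptions, not necessarily the example used above): sample a smooth kernel on two well-separated point sets.

```julia
using LinearAlgebra

# Smooth kernel sampled on well-separated sets: the resulting block
# has rapidly decaying singular values (nearly low rank).
x = range(0.0, 1.0, length=4)
y = range(10.0, 11.0, length=4)
K = [1.0 / (xi - yj) for xi in x, yj in y]
svdvals(K)
```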