CS 6210: Matrix Computations

Gaussian elimination

David Bindel

2025-09-15

Introduction

Introduction

Goal (3 lectures): LU factorization and company

  • AKA Gaussian elimination
  • Workhorse for linear systems (not the only way)
  • Not just for linear systems!
  • More subtle than you think…

Points to remember

  • Nonzero matrices can still be singular
  • Some matrices are rectangular
  • Can still factor singular and rectangular matrices!

Julia notes

  • Will want to use LinearAlgebra throughout
  • A\b and c'/A compute \(A^{-1} b\) and \(c^T A^{-1}\)
  • Can also use forward/backslash with factorization objects
  • We can explicitly call inv, but almost never should

Linear systems in Julia

let
    # Set up test problem with known solution
    A    = [1.0 2.0; 3.0 4.0]
    xref = [5.0; 6.0]
    b    = A*xref

    x1 = inv(A)*b  # AVOID EXPLICIT INVERSES!
    x2 = A\b       # Good for one solve

    F = lu(A)      # Save factorization object
    x3 = F\b       # Good for multiple solves

    x1 ≈ xref, x2 ≈ xref, x3 ≈ xref
end
(true, true, true)

Julia views

\[\begin{bmatrix} 1 & 4 & 7 \\ 2 & {\color{red}5} & {\color{red}8} \\ 3 & {\color{red}6} & {\color{red}9} \end{bmatrix}\]

let
    A = [1 4 7; 2 5 8; 3 6 9]  # Create matrix, bind to name A
    B = @view A[2:3,2:3]       # Create view, bind to name B
    B[1,1] = 10
    A
end
3×3 Matrix{Int64}:
 1   4  7
 2  10  8
 3   6  9

Julia views

\[\begin{bmatrix} 1 & 4 & 7 \\ 2 & {\color{red}5} & {\color{red}8} \\ 3 & {\color{red}6} & {\color{red}9} \end{bmatrix}\]

  • Matrix object owns storage
  • A submatrix view references a piece of that storage
  • Ex: B[1,1] = 10 also changes A[2,2] to 10 (same data!)

Julia views

In general:

  • Views provide interpretation of storage
  • That storage is “owned” by someone else

In particular: strided views have

  • Fixed offset between rows (row stride — unit is best)
  • Fixed offset between cols (col stride)
  • Often relatively fast arithmetic routines

Julia views

Views don’t have to just reference submatrices!

let
    A = [1 4 7; 2 5 8; 3 6 9]
    L = UnitLowerTriangular(A)  # This is a view
    U = UpperTriangular(A)      # This is, too!
    A[3,3] = 10
    display(L)
    display(U)
end
3×3 UnitLowerTriangular{Int64, Matrix{Int64}}:
 1  ⋅  ⋅
 2  1  ⋅
 3  6  1
3×3 UpperTriangular{Int64, Matrix{Int64}}:
 1  4   7
 ⋅  5   8
 ⋅  ⋅  10

Gaussian elimination

Gauss transformations

Example: \(M = \begin{bmatrix} 1 & 0 \\ -0.5 & 1 \end{bmatrix}\)

Gauss transformations

  • Geometrically: a type of simple shear (preserves volume!)
  • Write as \(M^{(j)} = I - \tau_j e_j^T\), \((\tau_j)_k = 0\) for \(k \leq j\)

Gauss transformations

Write as \(M^{(j)} = I - \tau_j e_j^T\). Then for \((\tau_j)_k = v_k/v_j\):

\[(I - \tau_j e_j^T) v = \begin{bmatrix} v_1 \\ \vdots \\ v_j \\ 0 \\ \vdots \\ 0 \end{bmatrix}\]

Operation “zeros out” entries after index \(j\).

Gauss transformations

Write as \(M^{(j)} = I - \tau_j e_j^T\).

  • Interpretation of \(M^{(j)} v\): subtract \((\tau_{j})_k v_j\) from \(v_k\), \(k > j\).
  • Interpretation of \((M^{(j)})^{-1} v\): add \((\tau_{j})_k v_j\) to \(v_k\), \(k > j\)
  • Note: \((M^{(j)})^{-1} = I + \tau_j e_j^T\)

Q: Why is \((I + \tau_j e_j^T) (I - \tau_j e_j^T) = I\)?
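
Expanding the product, the cross terms cancel and the remaining term vanishes because \(e_j^T \tau_j = (\tau_j)_j = 0\):

\[ (I + \tau_j e_j^T)(I - \tau_j e_j^T) = I + \tau_j e_j^T - \tau_j e_j^T - \tau_j (e_j^T \tau_j) e_j^T = I. \]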

Gaussian elimination

  • Apply Gauss transforms to \(A\) to “zero out” subdiagonals
  • Transformed matrix is upper triangular (\(U\))
  • (Inverse) transforms into unit lower triangular \(L\)
  • Result: \(A = LU\)

Example

\[ A = A^{(0)} = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} \]

Example

\[ \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{-2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{-3} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{blue}{1} & \color{blue}{4} & \color{blue}{7} \\ \color{black}{2} & \color{black}{5} & \color{black}{8} \\ \color{black}{3} & \color{black}{6} & \color{black}{10} \end{bmatrix} = \begin{bmatrix} \color{blue}{1} & \color{blue}{4} & \color{blue}{7} \\ \color{lightgray}{0} & \color{purple}{-3} & \color{purple}{-6} \\ \color{lightgray}{0} & \color{purple}{-6} & \color{purple}{-11} \end{bmatrix} \] Subtract multiples of row 1 to introduce zeros in column 1: \[ (I-\tau_1 e_1^T) A^{(0)} = A^{(1)}, \quad\tau_1 = \begin{bmatrix} \color{lightgray}{0} \\ \color{red}{2} \\ \color{red}{3} \end{bmatrix} ,\quad e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

Example

\[ \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{black}{1} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{red}{-2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{black}{0} & \color{blue}{-3} & \color{blue}{-6} \\ \color{black}{0} & \color{black}{-6} & \color{black}{-11} \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{blue}{-3} & \color{blue}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{purple}{1} \end{bmatrix} \] Subtract multiples of row 2 to introduce zeros in column 2: \[ (I-\tau_2 e_2^T) A^{(1)} = A^{(2)}, \quad\tau_2 = \begin{bmatrix} \color{lightgray}{0} \\ \color{lightgray}{0} \\ \color{red}{2} \end{bmatrix} ,\quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

Example

\[ \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{black}{1} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{red}{-2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{-2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{-3} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{black}{-3} & \color{black}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \]

But note: \((I-\tau_j e_j^T)^{-1} = (I + \tau_j e_j^T)\) – so

\[ \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{black}{1} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{red}{2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{black}{-3} & \color{black}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \]

Example

Now note: \((I+\tau_1 e_1^T) (I + \tau_2 e_2^T) = I + \tau_1 e_1^T + \tau_2 e_2^T\)

\[ \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{black}{1} & \color{lightgray}{0} \\ \color{lightgray}{0} & \color{red}{2} & \color{black}{1} \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{red}{2} & \color{black}{1} \end{bmatrix} \] Therefore \(A = LU\):

\[ \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{red}{2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{black}{-3} & \color{black}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \]

Example

\[ \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{red}{2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{black}{-3} & \color{black}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \] Solve \(Ax = b\) by:

  • Forward substitution \(Ly = b\) and
  • Back substitution \(Ux = y\):
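
The two triangular solves can be written out directly; a minimal sketch for this example (in practice one uses `\` or the factorization object):

```julia
L = [1.0 0.0 0.0; 2.0 1.0 0.0; 3.0 2.0 1.0]
U = [1.0 4.0 7.0; 0.0 -3.0 -6.0; 0.0 0.0 1.0]
b = [12.0; 15.0; 19.0]
n = 3

# Forward substitution: solve L*y = b top to bottom (L has unit diagonal)
y = copy(b)
for i = 2:n, j = 1:i-1
    y[i] -= L[i,j]*y[j]
end

# Back substitution: solve U*x = y bottom to top
x = copy(y)
for i = n:-1:1
    for j = i+1:n
        x[i] -= U[i,j]*x[j]
    end
    x[i] /= U[i,i]
end
```

This reproduces \(y = (12, -9, 1)\) and \(x = (1, 1, 1)\) from the slides.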

Example

Forward substitution: \[ \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{red}{2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 12 \\ 15 \\ 19 \end{bmatrix} \]

\[ y = \begin{bmatrix} 12 \\ -9 \\ 1 \end{bmatrix} \]

Example

Back substitution: \[ \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{black}{-3} & \color{black}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 12 \\ -9 \\ 1 \end{bmatrix} \]

\[ x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

Example

\[ \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} = \begin{bmatrix} \color{black}{1} & \color{lightgray}{0} & \color{lightgray}{0} \\ \color{red}{2} & \color{black}{1} & \color{lightgray}{0} \\ \color{red}{3} & \color{red}{2} & \color{black}{1} \end{bmatrix} \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{lightgray}{0} & \color{black}{-3} & \color{black}{-6} \\ \color{lightgray}{0} & \color{lightgray}{0} & \color{black}{1} \end{bmatrix} \] And we can pack both factors into one matrix: \[ \begin{bmatrix} \color{black}{1} & \color{black}{4} & \color{black}{7} \\ \color{red}{2} & \color{black}{-3} & \color{black}{-6} \\ \color{red}{3} & \color{red}{2} & \color{black}{1} \end{bmatrix} \]

Code: Separate factors

function mylu_v1(A)
    n = size(A)[1]
    L = Matrix(1.0I, n, n)
    U = copy(A)
    for j = 1:n-1
        for i = j+1:n
            # Multiple of row j to subtract from row i
            L[i,j] = U[i,j]/U[j,j]
            U[i,j] = 0.0
            for k = j+1:n
                # Subtract the appropriate multiple of row j from row i
                U[i,k] -= L[i,j]*U[j,k]
            end
        end
    end
    L, U
end
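
A quick sanity check of `mylu_v1` on the running example (the function is repeated so the snippet stands alone):

```julia
using LinearAlgebra

function mylu_v1(A)
    n = size(A)[1]
    L = Matrix(1.0I, n, n)
    U = copy(A)
    for j = 1:n-1
        for i = j+1:n
            L[i,j] = U[i,j]/U[j,j]   # Multiple of row j to subtract from row i
            U[i,j] = 0.0
            for k = j+1:n
                U[i,k] -= L[i,j]*U[j,k]
            end
        end
    end
    L, U
end

A = [1.0 4.0 7.0; 2.0 5.0 8.0; 3.0 6.0 10.0]
L, U = mylu_v1(A)
```

Here `L*U` reproduces `A`, with `L` unit lower triangular and `U` upper triangular.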

Code: Packed storage

function mylu_v2(A)
    n = size(A)[1]
    A = copy(A)
    for j = 1:n-1
        for i = j+1:n
            # Multiple of row j to subtract from row i
            A[i,j] = A[i,j]/A[j,j]
            for k = j+1:n
                # Subtract the appropriate multiple of row j from row i
                A[i,k] -= A[i,j]*A[j,k]
            end
        end
    end
    # Return views of (unit) lower and upper triangles
    UnitLowerTriangular(A), UpperTriangular(A)
end

Code: Packed storage

function mylu_v2b(A)
    n = size(A)[1]
    A = copy(A)
    for j = 1:n-1
        # Multiples of row j to subtract from rows j+1:n
        A[j+1:n,j] ./= A[j,j]
        # Subtract multiples of row j (this version forms an outer product)
        A[j+1:n,j+1:n] .-= A[j+1:n,j]*A[j,j+1:n]'
    end
    # Return views of (unit) lower and upper triangles
    UnitLowerTriangular(A), UpperTriangular(A)
end

Code: Packed storage

function mylu_v2c(A)
    n = size(A)[1]
    A = copy(A)
    for j = 1:n-1
        # Use views of pieces of A
        lj = @view A[j+1:n,j]
        uj = @view A[j,j+1:n]
        # Multiples of row j to subtract from rows j+1:n
        lj ./= A[j,j]
        # Subtract multiples of row j via a BLAS rank-1 update (more efficient)
        BLAS.ger!(-1.0, lj, uj, view(A,j+1:n,j+1:n))
    end
    # Return views of (unit) lower and upper triangles
    UnitLowerTriangular(A), UpperTriangular(A)
end

Block elimination

Block 2-by-2 LU

Block perspective on \(A = LU\): \[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ & U_{22} \end{bmatrix} \] Can think of this as four matrix equations \[\begin{aligned} A_{11} &= L_{11} U_{11} \\ A_{12} &= L_{11} U_{12} \\ A_{21} &= L_{21} U_{11} \\ A_{22} &= L_{21} U_{12} + L_{22} U_{22} \end{aligned}\]

Block LU computation

Matrix equations \(\implies\) computational steps

\[\begin{aligned} A_{11} &= L_{11} U_{11} \\ A_{12} &= L_{11} U_{12} \\ A_{21} &= L_{21} U_{11} \\ A_{22} &= L_{21} U_{12} + L_{22} U_{22} \end{aligned}\]

\[\begin{aligned} L_{11} U_{11} &= A_{11} \\ U_{12} &= L_{11}^{-1} A_{12} \\ L_{21} &= A_{21} U_{11}^{-1} \\ L_{22} U_{22} &= A_{22} - L_{21} U_{12} \end{aligned}\]

Trailing submatrix in last step is a Schur complement: \[\begin{aligned} S &= A_{22} - L_{21} U_{12} \\ &= A_{22} - A_{21} A_{11}^{-1} A_{12} \end{aligned}\]
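
These identities are easy to check numerically; a small sketch on an arbitrary 4-by-4 example partitioned 2+2, using Julia's unpivoted `lu` (the `NoPivot()` option assumes Julia 1.7+) so that the blocks line up:

```julia
using LinearAlgebra

A = [4.0 1.0 2.0 2.0;
     1.0 5.0 1.0 0.0;
     2.0 1.0 6.0 1.0;
     2.0 0.0 1.0 7.0]
A11, A12 = A[1:2,1:2], A[1:2,3:4]
A21, A22 = A[3:4,1:2], A[3:4,3:4]

S = A22 - A21*(A11\A12)        # Schur complement of A11 in A

F = lu(A, NoPivot())           # Unpivoted LU, so A = L*U blockwise
L21, U12 = F.L[3:4,1:2], F.U[1:2,3:4]
L22, U22 = F.L[3:4,3:4], F.U[3:4,3:4]
```

Both `A22 - L21*U12` and `L22*U22` reproduce `S`.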

Alternate: block elimination

Can do blocked view of LU or eliminate blockwise: \[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} I & 0 \\ A_{21} A_{11}^{-1} & I \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ 0 & S \end{bmatrix} \] where \(S = A_{22} - A_{21} A_{11}^{-1} A_{12}\) as before!

Why does it matter?

Suppose we have a fast solver \(g(b)\) for \(Ax = b\), and consider the bordered system

\[\begin{bmatrix} A & v \\ v^T & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}\]

Schur complement system gives:

\[\lambda = \frac{v^T g(b)}{v^T g(v)}\]

Back substitute to get \(x = g(b) - \lambda g(v)\). Only two \(g\) calls!
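
A sketch of the bordered solve, with backslash standing in for the fast solver \(g\) (the matrix and vectors here are just arbitrary examples):

```julia
using LinearAlgebra

A = [4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0]
v = [1.0; 2.0; 3.0]
b = [1.0; 1.0; 1.0]

g(y) = A\y                  # Stand-in for the fast solver

gb, gv = g(b), g(v)         # The only two calls to g
λ = (v'*gb)/(v'*gv)         # Solve the (scalar) Schur complement system
x = gb - λ*gv               # Back substitute
```

The computed `x` and `λ` then satisfy the bordered system.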

Schur complement meaning

Let \(B = A^{-1}\). Then \(AB = LUB = I\), so the second block column of \(B\) satisfies

\[ \begin{bmatrix} L_{11} & \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ & U_{22} \end{bmatrix} \begin{bmatrix} B_{12} \\ B_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ I \end{bmatrix}. \]

Block forward and back substitution gives:

\[ \begin{bmatrix} B_{12} \\ B_{22} \end{bmatrix} = \begin{bmatrix} -U_{11}^{-1} U_{12} S^{-1} \\ S^{-1} \end{bmatrix} \]

where the Schur complement is \(S = L_{22} U_{22}\).

Schur complement meaning

Conclusion: \[S = L_{22} U_{22} = B_{22}^{-1} = ((A^{-1})_{22})^{-1}\]

Schur complement is an

  • Inverse
  • Of a submatrix
  • Of an inverse

Can make sense of each of these in an abstract setting!
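
A minimal numerical check of that statement, on a 2+2 partition of an arbitrary 4-by-4 matrix (`inv` appears here for demonstration only):

```julia
using LinearAlgebra

A = [4.0 1.0 2.0 2.0;
     1.0 5.0 1.0 0.0;
     2.0 1.0 6.0 1.0;
     2.0 0.0 1.0 7.0]

# Schur complement of the leading 2x2 block
S = A[3:4,3:4] - A[3:4,1:2]*(A[1:2,1:2]\A[1:2,3:4])

# Inverse of the (2,2) submatrix of the inverse
B = inv(A)
T = inv(B[3:4,3:4])
```

Here `S ≈ T`, as claimed.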

Block perspective gains

  • Fast codes rich in level-3 BLAS
  • Approach to fast solves for bordered systems
  • Interesting interpretation of Schur complement

Backward error analysis

Backward error

Idea: write \(A = LU\) as: \[ u_{jk} = a_{jk} - \sum_{i=1}^{j-1} l_{ji} u_{ik} \] Computed version has \(|\delta_i| \leq (j-1) \epsilon_{\mathrm{mach}}\) s.t. \[ \hat{u}_{jk} = a_{jk}(1+\delta_0) - \sum_{i=1}^{j-1} \hat{l}_{ji} \hat{u}_{ik} (1+\delta_i) + O(\epsilon_{\mathrm{mach}}^2) \] This holds independent of the order of accumulation.

Turn it around

\[\begin{aligned} a_{jk} & = \frac{1}{1+\delta_0} \left( \hat{l}_{jj} \hat{u}_{jk} + \sum_{i=1}^{j-1} \hat{l}_{ji} \hat{u}_{ik} (1 + \delta_i) \right) + O(\epsilon_{\mathrm{mach}}^2) \\ & = \hat{l}_{jj} \hat{u}_{jk} (1 - \delta_0) + \sum_{i=1}^{j-1} \hat{l}_{ji} \hat{u}_{ik} (1 + \delta_i - \delta_0) + O(\epsilon_{\mathrm{mach}}^2) \\ & = \left( \hat{L} \hat{U} \right)_{jk} + e_{jk} \\ e_{jk} &= -\hat{l}_{jj} \hat{u}_{jk} \delta_0 + \sum_{i=1}^{j-1} \hat{l}_{ji} \hat{u}_{ik} (\delta_i - \delta_0) + O(\epsilon_{\mathrm{mach}}^2) \end{aligned}\]

A little more work, and…

\[ A = \hat{L} \hat{U} + E, \mbox{ where } |E| \leq n \epsilon_{\mathrm{mach}}|\hat{L}| |\hat{U}| + O(\epsilon_{\mathrm{mach}}^2). \] Similar backward error bounds hold for the forward and back substitution steps.

  • But wait: what if \(|L| |U| \gg |A|\)?
  • Control via growth factor
  • This really just gives the problem a name…
  • (Partial) solution: (partial) pivoting

Pivoting

The problem

  • What if \(|L| |U| \gg |A|\)?
  • Extreme case: divide by zero
  • Still bad: divide by tiny

Fixing the problem?

Bad: \[ A = \begin{bmatrix} \delta & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \delta^{-1} & 1 \end{bmatrix} \begin{bmatrix} \delta & 1 \\ 0 & 1-\delta^{-1} \end{bmatrix}. \]

Fine: \[ PA = \begin{bmatrix} 1 & 1 \\ \delta & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \delta & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1-\delta \end{bmatrix}. \] How to choose permutation \(P\) more generally?
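
The two factorizations above can be compared in floating point (a small sketch with \(\delta = 10^{-16}\) and the factors entered by hand):

```julia
δ = 1.0e-16
A = [δ 1.0; 1.0 1.0]

# No pivoting: the multiplier 1/δ is huge, and the 1.0 in
# 1.0 - 1/δ is absorbed entirely by rounding
L1 = [1.0 0.0; 1/δ 1.0]
U1 = [δ 1.0; 0.0 1.0 - 1/δ]

# Partial pivoting: swap rows first; multipliers stay bounded by 1
P  = [0.0 1.0; 1.0 0.0]
L2 = [1.0 0.0; δ 1.0]
U2 = [1.0 1.0; 0.0 1.0 - δ]
```

In Float64, `L1*U1` no longer reproduces `A` (the (2,2) entry is lost, an \(O(1)\) backward error), while `L2*U2` matches `P*A` to machine precision.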

Partial pivoting

Goal: \(PA = LU\) where \(L\) is controlled.
Idea: At step \(j\), eliminate \(x_j\) from remaining equations

  • Can use any of equations \(j\) through \(n\)
  • Choose equation \(i\) with largest \(|S^{(j)}_{ij}|\)
  • Swap rows \(i\) and \(j\) and eliminate
  • Result: \(|l_{ij}| \leq 1\)

Example

\[ \begin{bmatrix} \color{black}{0.69} & \color{black}{0.39} & \color{black}{0.32} & \color{black}{0.41} \\ \color{black}{0.86} & \color{black}{0.71} & \color{black}{0.01} & \color{black}{0.22} \\ \color{black}{0.40} & \color{black}{0.51} & \color{black}{0.75} & \color{black}{1.00} \\ \color{black}{0.53} & \color{black}{0.42} & \color{black}{0.58} & \color{black}{0.12} \end{bmatrix} \quad \begin{bmatrix} \color{black}{1} \\ \color{black}{2} \\ \color{black}{3} \\ \color{black}{4} \end{bmatrix} \] Initial matrix and permutation \(\pi\)

Example

\[ \begin{bmatrix} \color{brown}{0.69} & \color{brown}{0.39} & \color{brown}{0.32} & \color{brown}{0.41} \\ \color{brown}{0.86} & \color{brown}{0.71} & \color{brown}{0.01} & \color{brown}{0.22} \\ \color{black}{0.40} & \color{black}{0.51} & \color{black}{0.75} & \color{black}{1.00} \\ \color{black}{0.53} & \color{black}{0.42} & \color{black}{0.58} & \color{black}{0.12} \end{bmatrix} \quad \begin{bmatrix} \color{brown}{1} \\ \color{brown}{2} \\ \color{black}{3} \\ \color{black}{4} \end{bmatrix} \] Identify pivot (column 1)

Example

\[ \begin{bmatrix} \color{brown}{0.86} & \color{brown}{0.71} & \color{brown}{0.01} & \color{brown}{0.22} \\ \color{brown}{0.69} & \color{brown}{0.39} & \color{brown}{0.32} & \color{brown}{0.41} \\ \color{black}{0.40} & \color{black}{0.51} & \color{black}{0.75} & \color{black}{1.00} \\ \color{black}{0.53} & \color{black}{0.42} & \color{black}{0.58} & \color{black}{0.12} \end{bmatrix} \quad \begin{bmatrix} \color{brown}{2} \\ \color{brown}{1} \\ \color{black}{3} \\ \color{black}{4} \end{bmatrix} \] Apply swap (1 with 2)

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.80} & \color{black}{0.39} & \color{black}{0.32} & \color{black}{0.41} \\ \color{red}{0.46} & \color{black}{0.51} & \color{black}{0.75} & \color{black}{1.00} \\ \color{red}{0.62} & \color{black}{0.42} & \color{black}{0.58} & \color{black}{0.12} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{1} \\ \color{black}{3} \\ \color{black}{4} \end{bmatrix} \] Compute multipliers in column 1

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.80} & \color{purple}{-0.17} & \color{purple}{0.31} & \color{purple}{0.24} \\ \color{red}{0.46} & \color{purple}{0.18} & \color{purple}{0.74} & \color{purple}{0.90} \\ \color{red}{0.62} & \color{purple}{-0.02} & \color{purple}{0.57} & \color{purple}{-0.01} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{1} \\ \color{black}{3} \\ \color{black}{4} \end{bmatrix} \] Schur complement

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{brown}{0.80} & \color{brown}{-0.17} & \color{brown}{0.31} & \color{brown}{0.24} \\ \color{brown}{0.46} & \color{brown}{0.18} & \color{brown}{0.74} & \color{brown}{0.90} \\ \color{red}{0.62} & \color{black}{-0.02} & \color{black}{0.57} & \color{black}{-0.01} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{brown}{1} \\ \color{brown}{3} \\ \color{black}{4} \end{bmatrix} \] Identify pivot (column 2)

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{brown}{0.46} & \color{brown}{0.18} & \color{brown}{0.74} & \color{brown}{0.90} \\ \color{brown}{0.80} & \color{brown}{-0.17} & \color{brown}{0.31} & \color{brown}{0.24} \\ \color{red}{0.62} & \color{black}{-0.02} & \color{black}{0.57} & \color{black}{-0.01} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{brown}{3} \\ \color{brown}{1} \\ \color{black}{4} \end{bmatrix} \] Apply swap (2 with 3)

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{red}{0.80} & \color{red}{-0.93} & \color{black}{0.31} & \color{black}{0.24} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{black}{0.57} & \color{black}{-0.01} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{black}{1} \\ \color{black}{4} \end{bmatrix} \] Compute multipliers in column 2

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{red}{0.80} & \color{red}{-0.93} & \color{purple}{0.99} & \color{purple}{1.07} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{purple}{0.64} & \color{purple}{0.06} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{black}{1} \\ \color{black}{4} \end{bmatrix} \] Schur complement

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{brown}{0.80} & \color{brown}{-0.93} & \color{brown}{0.99} & \color{brown}{1.07} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{black}{0.64} & \color{black}{0.06} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{brown}{1} \\ \color{black}{4} \end{bmatrix} \] Identify pivot (column 3)

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{brown}{0.80} & \color{brown}{-0.93} & \color{brown}{0.99} & \color{brown}{1.07} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{black}{0.64} & \color{black}{0.06} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{brown}{1} \\ \color{black}{4} \end{bmatrix} \] Apply swap (3 with 3)

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{red}{0.80} & \color{red}{-0.93} & \color{blue}{0.99} & \color{blue}{1.07} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{red}{0.64} & \color{black}{0.06} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{black}{1} \\ \color{black}{4} \end{bmatrix} \] Compute multipliers in column 3

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{red}{0.80} & \color{red}{-0.93} & \color{blue}{0.99} & \color{blue}{1.07} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{red}{0.64} & \color{purple}{-0.63} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{black}{1} \\ \color{black}{4} \end{bmatrix} \] Schur complement

Example

\[ \begin{bmatrix} \color{blue}{0.86} & \color{blue}{0.71} & \color{blue}{0.01} & \color{blue}{0.22} \\ \color{red}{0.46} & \color{blue}{0.18} & \color{blue}{0.74} & \color{blue}{0.90} \\ \color{red}{0.80} & \color{red}{-0.93} & \color{blue}{0.99} & \color{blue}{1.07} \\ \color{red}{0.62} & \color{red}{-0.08} & \color{red}{0.64} & \color{blue}{-0.63} \end{bmatrix} \quad \begin{bmatrix} \color{black}{2} \\ \color{black}{3} \\ \color{black}{1} \\ \color{black}{4} \end{bmatrix} \] Final factorization \({\color{red}L}{\color{blue}U}\) and \(\pi\)

Partial pivoting code

function mypivlu(A)
    n = size(A)[1]
    A = copy(A)
    p = Vector(1:n)
    for j = 1:n-1
        jj = findmax(abs.(A[j:n,j]))[2] + j-1    # Find pivot
        p[jj], p[j] = p[j], p[jj]                # Record it
        for k = 1:n                              # Apply swap
            A[jj,k], A[j,k] = A[j,k], A[jj,k]
        end
        A[j+1:n,j] /= A[j,j]                     # Multipliers
        A[j+1:n,j+1:n] -= A[j+1:n,j]*A[j,j+1:n]' # Schur complement
    end
    p, UnitLowerTriangular(A), UpperTriangular(A)
end
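
A sanity check on the slides' 4-by-4 example (repeating `mypivlu` so the snippet stands alone):

```julia
using LinearAlgebra

function mypivlu(A)
    n = size(A)[1]
    A = copy(A)
    p = Vector(1:n)
    for j = 1:n-1
        jj = findmax(abs.(A[j:n,j]))[2] + j-1    # Find pivot
        p[jj], p[j] = p[j], p[jj]                # Record it
        for k = 1:n                              # Apply swap
            A[jj,k], A[j,k] = A[j,k], A[jj,k]
        end
        A[j+1:n,j] /= A[j,j]                     # Multipliers
        A[j+1:n,j+1:n] -= A[j+1:n,j]*A[j,j+1:n]' # Schur complement
    end
    p, UnitLowerTriangular(A), UpperTriangular(A)
end

A = [0.69 0.39 0.32 0.41;
     0.86 0.71 0.01 0.22;
     0.40 0.51 0.75 1.00;
     0.53 0.42 0.58 0.12]
p, L, U = mypivlu(A)
```

Then `A[p,:] ≈ L*U`, and every multiplier satisfies \(|l_{ij}| \leq 1\).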

Deferred updating

What about blocking? Do standard LU on a few columns \[ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \rightarrow \begin{bmatrix} (L/U)_{11} & A_{12} \\ L_{21} & A_{22} \end{bmatrix} \] Then apply permutations in a batch \[ \begin{bmatrix} (L/U)_{11} & (PA)_{12} \\ L_{21} & (PA)_{22} \end{bmatrix} \] Then \(U_{12} = L_{11}^{-1} (PA)_{12}\) (level 3 BLAS)
Then Schur complement (level 3 BLAS)
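
The blocked scheme can be sketched as follows, ignoring pivoting for clarity (so this is the blocked analogue of `mylu_v2`, not of GEPP; `NoPivot` assumes Julia 1.7+, and `myblocklu!` is our own name for the sketch):

```julia
using LinearAlgebra

# Blocked right-looking LU (no pivoting), block size b
function myblocklu!(A, b=2)
    n = size(A,1)
    for j = 1:b:n
        J = j:min(j+b-1,n)           # Current block columns
        K = J[end]+1:n               # Trailing index range
        lu!(view(A,J,J), NoPivot())  # Unblocked LU of diagonal block (packed L11/U11)
        ldiv!(UnitLowerTriangular(view(A,J,J)), view(A,J,K))     # U12 = L11 \ A12
        rdiv!(view(A,K,J), UpperTriangular(view(A,J,J)))         # L21 = A21 / U11
        mul!(view(A,K,K), view(A,K,J), view(A,J,K), -1.0, 1.0)   # Schur update (GEMM)
    end
    UnitLowerTriangular(A), UpperTriangular(A)
end

A = [4.0 1.0 2.0 2.0; 1.0 5.0 1.0 0.0; 2.0 1.0 6.0 1.0; 2.0 0.0 1.0 7.0]
L, U = myblocklu!(copy(A))
```

The triangular solves and the Schur complement update are all level-3 BLAS operations.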

Problems with partial pivoting

Gaussian elimination with Partial Pivoting (GEPP) is standard

  • But it is not uniformly backward stable
    • \(L\) remains controlled, but \(U\) entries can (rarely) grow
    • Studied since Wilkinson (1961), and still an active topic today!
  • And pivoting increases memory movement
    • Deferred updating helps with this
    • But still not great in parallel

Beyond partial pivoting

  • For better stability, permute rows and columns
    • Opens broader search for big elements
    • Complete pivoting: Search entire Schur complement
    • Rook pivoting: Search column \(j\) and row \(j\)
  • More recent: tournament pivoting
    • Choose \(b\) pivot rows in one go
    • Can improve both stability and performance!
  • Can also just use GEPP (or less) and clean up later…

Summary

LU factorization

  • Gaussian elimination \(\equiv PA = LU\)
  • Block perspective is good for performance
    • Also good for algorithmic variants!
  • Schur complements are interesting on their own

Error and pivoting

  • \(\hat{L} \hat{U} = A + E\) where \(|E| \leq n \epsilon_{\mathrm{mach}}|\hat{L}| |\hat{U}|\)
  • Could have \(|L| |U| \gg |A|\)!
  • Partial solution from partial pivoting: \(PA = LU\)
  • Options beyond partial pivoting with different tradeoffs