CS 6210: Matrix Computations

LU, Cholesky, refinement, condest

David Bindel

2025-09-17

Diagonal dominance

Recall GEPP

Compute \(PA = LU\), control \(L\)

  • Partial pivoting generally works (not always)
  • More elaborate pivoting may cost more
  • But maybe hypotheses on \(A\) will help?

Diagonal dominance

\(A \in \mathbb{R}^{n \times n}\) is (column) diagonally dominant if \[ |a_{jj}| \geq \sum_{i \neq j} |a_{ij}|. \]

  • Strictly diagonally dominant if inequality strict
  • Strict diagonal dominance implies invertible
  • We will see this structure at various points later…
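
A quick way to sanity-check the definition (a one-liner sketch; is_sdd is just an illustrative name, not part of the notes):

is_sdd(A) = all(abs(A[j,j]) > sum(abs, A[:,j]) - abs(A[j,j]) for j in axes(A,2))

is_sdd([4.0 1.0; -1.0 3.0])   # true: each diagonal beats its off-diagonal column sum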

So how does this connect to LU?

Schur complements

Sanity check: what is the first Schur complement in \[A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}\]

Answer: \[S = F - \frac{bc^T}{\alpha}\]

Note: GEPP wouldn’t pivot in the SDD case, since \(|\alpha|\) is the largest magnitude in its column
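
A small check of the formula (a sketch; the 3-by-3 SDD matrix is an arbitrary example): one explicit elimination step on the first column leaves exactly the Schur complement in the trailing block.

A = [4.0 1.0 1.0;
     1.0 3.0 1.0;
     1.0 1.0 5.0]
α, b, c, F = A[1,1], A[2:end,1], A[1,2:end], A[2:end,2:end]
S = F - b*c'/α                      # Schur complement formula
M = copy(A)
M[2:end,:] .-= (b/α) .* M[1:1,:]    # one step of elimination
S ≈ M[2:end,2:end]                  # true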

SDD Schur complements

\[ A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}, \quad S = F - \frac{bc^T}{\alpha} \]

\[\begin{aligned} |\alpha| & > \sum_{i} |b_i|, & |f_{jj}| & > \sum_{i \neq j} |f_{ij}| + |c_j| \end{aligned}\]

Multiply first inequality by \(|c_j|/|\alpha|\): \[\sum_i \frac{|b_i||c_j|}{|\alpha|} < |c_j|\]

SDD Schur complements

\[ A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}, \quad S = F - \frac{bc^T}{\alpha} \]

\[\begin{aligned} |\alpha| & > \sum_{i} |b_i|, & |f_{jj}| & > \sum_{i \neq j} |f_{ij}| + |c_j| \end{aligned}\]

Substitute: \[|f_{jj}| > \sum_{i \neq j} |f_{ij}| + \sum_i \frac{|b_i| |c_j|}{|\alpha|}\]

SDD Schur complements

\[ A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}, \quad S = F - \frac{bc^T}{\alpha} \]

\[\begin{aligned} |\alpha| & > \sum_{i} |b_i|, & |f_{jj}| & > \sum_{i \neq j} |f_{ij}| + |c_j| \end{aligned}\]

Subtract diagonal piece from last term: \[ |f_{jj}| - \frac{|b_j| |c_j|}{|\alpha|} > \sum_{i \neq j} \left( |f_{ij}| + \frac{|b_i| |c_j|}{|\alpha|} \right) \]

SDD Schur complements

\[ A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}, \quad S = F - \frac{bc^T}{\alpha} \]

\[\begin{aligned} |\alpha| & > \sum_{i} |b_i|, & |f_{jj}| & > \sum_{i \neq j} |f_{ij}| + |c_j| \end{aligned}\]

Triangle inequality: \[ |s_{jj}| \geq |f_{jj}| - \frac{|b_j| |c_j|}{|\alpha|} > \sum_{i \neq j} \left( |f_{ij}| + \frac{|b_i| |c_j|}{|\alpha|} \right) \geq \sum_{i \neq j} |s_{ij}| \]

SDD Schur complements

\[ A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}, \quad S = F - \frac{bc^T}{\alpha} \]

\[\begin{aligned} |\alpha| & > \sum_{i} |b_i|, & |f_{jj}| & > \sum_{i \neq j} |f_{ij}| + |c_j| \end{aligned}\]

Conclusion: Schur complement is also SDD: \[|s_{jj}| > \sum_{i \neq j} |s_{ij}|\]
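
A numerical sanity check (a sketch; the random SDD construction is ad hoc): build a strictly column diagonally dominant matrix, form the first Schur complement, and confirm it is still strictly dominant.

n = 8
A = randn(n, n)
for j = 1:n
    A[j,j] = sum(abs, A[:,j]) + 1.0           # force strict column dominance
end
S = A[2:end,2:end] - A[2:end,1]*A[1,2:end]'/A[1,1]
all(abs(S[j,j]) > sum(abs, S[:,j]) - abs(S[j,j]) for j = 1:n-1)   # true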

SDD and GEPP

Why is this relevant to Gaussian elimination?

  • Biggest element in each column is on the diagonal
  • And Schur complements remain strictly diag dominant
  • Therefore, GEPP does no pivoting on SDD matrices!
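
A quick numerical check of this claim (a sketch; the SDD matrix is generated the same ad hoc way as above): lu uses partial pivoting, and the permutation it returns should be the identity.

using LinearAlgebra
n = 8
A = randn(n, n)
for j = 1:n
    A[j,j] = sum(abs, A[:,j]) + 1.0   # strictly column diagonally dominant
end
F = lu(A)                             # GEPP
F.p == 1:n                            # true: no row swaps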

What other matrices can we stably factor without pivoting?

Symmetry and Cholesky

Reminder: Quadratic forms

Symmetric \(A \in \mathbb{R}^{n \times n}\) represents a quadratic form \[ \phi(x) = x^T A x \] Characterize by inertia \(\nu = (\nu_+, \nu_0, \nu_-)\).

Inertia and LU

Suppose \(A = LU\), and write \[ U = DM^T, \quad D = \operatorname{diag}(U). \]

  • Then \(A = A^T = M(DL^T)\) is another unit LU factorization of \(A\)
  • By uniqueness of LU, \(M = L\), so \(A = LDL^T\)
  • Inertia \(\nu(D) = \nu(A)\) (just a change of basis!)

But in general still need to pivot for stability.
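
A minimal sketch of the \(LDL^T\) observation, assuming a Julia recent enough for lu(A, NoPivot()) and an example matrix that happens not to need pivoting (inertia is a throwaway helper):

using LinearAlgebra
B = randn(5, 5)
A = B + B'                          # random symmetric matrix
F = lu(A, NoPivot())                # unpivoted LU (fine for generic symmetric A)
D = Diagonal(diag(F.U))
M = (D \ F.U)'                      # unit lower triangular factor in U = D*M'
M ≈ F.L                             # so A = L*D*L'
inertia(v) = (count(>(0), v), count(==(0), v), count(<(0), v))
inertia(diag(D)) == inertia(eigvals(Symmetric(A)))   # Sylvester's law of inertia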

Positive definiteness

Positive definite if \(\nu = (n, 0, 0)\), i.e. \[ \forall x \neq 0, \phi(x) > 0. \]

  • Notation: \(A \succ 0\)
  • \(A = LL^T\) is Cholesky factorization
  • Also write as \(A = R^T R\) where \(R\) upper triangular

Fun facts!

For \(A \succ 0\): \[\begin{aligned} A^{-1} &\succ 0 \\ A_{11} &\succ 0 \\ \kappa(A_{11}) &\leq \kappa(A) \\ S & \succ 0 \\ \kappa(S) & \leq \kappa(A) \end{aligned}\] where \(A\) is block 2-by-2 and \(S = A_{22} - A_{21} A_{11}^{-1} A_{12}\)
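
These are easy to spot-check numerically; a sketch with a random SPD matrix (cond below is the 2-norm condition number):

using LinearAlgebra
B = randn(6, 6)
A = B'*B + I                          # random SPD matrix
A11, A12 = A[1:3,1:3], A[1:3,4:6]
A21, A22 = A[4:6,1:3], A[4:6,4:6]
S = A22 - A21*(A11\A12)               # Schur complement
isposdef(A11), isposdef(Symmetric(S)), cond(A11) <= cond(A), cond(S) <= cond(A)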

Cholesky

Partition \(A\) into first row/column and rest. Then: \[ \begin{bmatrix} a_{11} & a_{21}^T \\ a_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 \\ l_{21} & L_{22} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21}^T \\ 0 & L_{22}^T \end{bmatrix} \] Recursively look at \[\begin{aligned} l_{11} &= \sqrt{a_{11}} \\ l_{21} &= a_{21}/l_{11} \\ L_{22} L_{22}^T &= A_{22} - l_{21} l_{21}^T \end{aligned}\] Q: Why no worries about validity of \(\sqrt{\cdot}\)?

Cholesky

  • Factorization exists iff \(A\) is spd (why?)
  • Factorization is backward stable (without pivoting!)
  • Get straightforward bounds on elements of \(L\)

Cholesky

Algorithm looks a lot like LU!

using LinearAlgebra

# Overwrites A in place with its Cholesky factor (hence the !)
function mychol!(A)
    n = size(A, 1)
    for j = 1:n
        if A[j,j] <= 0.0
            error("Matrix is not positive definite")
        end
        A[j,j] = sqrt(A[j,j])                 # diagonal entry of L
        A[j+1:end,j] /= A[j,j]                # scale the column below the diagonal
        A[j+1:end,j+1:end] -= A[j+1:end,j]*A[j+1:end,j]'  # Schur complement update
    end
    LowerTriangular(A)
end

As with LU, we usually block for level 3 BLAS.
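
A minimal sketch of what a blocked right-looking variant might look like (illustrative only: the name blocked_chol!, the block size, and the use of rdiv!/mul! on views are my choices; a production code would update only the lower triangle of the trailing block):

using LinearAlgebra

function blocked_chol!(A, nb=32)
    n = size(A, 1)
    for j = 1:nb:n
        J = j:min(j+nb-1, n)
        K = last(J)+1:n
        F = cholesky!(Symmetric(view(A, J, J), :L))   # factor diagonal block
        L11 = F.L
        A21 = view(A, K, J)
        rdiv!(A21, L11')                    # panel solve: L21 = A21 / L11'
        A22 = view(A, K, K)
        mul!(A22, A21, A21', -1.0, 1.0)     # trailing rank-nb update: level-3 BLAS
    end
    LowerTriangular(A)
end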

Iterative refinement

Slightly sloppy

Suppose a slightly-sloppy solve: \[ \hat{A} \hat{x} = b, \quad \hat{A} = A + E \] where \(\|E\| \leq \gamma \|A\|\) (think \(\gamma \ll 1\), but not \(O(\epsilon_{\mathrm{mach}})\)).

  • Because of pivot growth in GEPP
  • Because of static or approximate pivoting
  • Because of a low-precision factorization

Can we improve the solution?

Forward error and residual

\[\begin{aligned} (A + E) \hat{x} &= b \\ A x &= b \end{aligned}\] Subtracting the second equation from the first: \[ A(\hat{x}-x) + E \hat{x} = 0 \]

Forward error \(e = \hat{x}-x\) is given by \[\begin{aligned} e &= -A^{-1} E \hat{x} \\ &= A^{-1} (b - A\hat{x}) \end{aligned}\] Can compute \(b-A \hat{x}\) up to roundoff…

Refinement

Idea: Compute an (approximate) correction to \(\hat{x}_1\): \[\begin{aligned} \hat{x}_1 &= \hat{A}^{-1} b \\ \hat{x}_2 &= \hat{x}_1 + \hat{A}^{-1} (b-A \hat{x}_1) \end{aligned}\] Can repeat if needed (iterative refinement).
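
A minimal sketch, assuming the “sloppy” solver is an LU factorization computed in Float32 while residuals are formed in Float64 (refine and nsteps are illustrative names):

using LinearAlgebra

function refine(A, b; nsteps=3)
    F = lu(Float32.(A))                 # low-precision ("sloppy") factorization
    x = Float64.(F \ Float32.(b))       # x1
    for _ = 1:nsteps
        r = b - A*x                     # residual in working precision
        x += Float64.(F \ Float32.(r))  # approximate correction
    end
    x
end

For \(A\) not too ill-conditioned relative to single precision, a few such steps typically recover a small double-precision residual.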

Analysis

Iteration equation: \(\hat{x}_{k+1} = \hat{x}_k + d_k\), \[ (A+E_k) d_k = b-A\hat{x}_k + f_k \] where \[\|E_k\| \leq \gamma \|A\|, \quad \|f_k\| \leq \phi \]

Analysis

Iteration equation: \(\hat{x}_{k+1} = \hat{x}_k + d_k\)

Goal: Get an iteration for the residual.

\[\begin{aligned} && (A+E_k) d_k &= b-A\hat{x}_k + f_k \\ &\implies& (I+E_k A^{-1}) (r_{k+1}-r_k) &= -r_k + f_k \\ &\implies& (I+E_k A^{-1}) r_{k+1} &= E_k A^{-1} r_k + f_k \end{aligned}\]

Analysis

Goal: Get an iteration for \(\|r_k\|\) from \[ (I+E_k A^{-1}) r_{k+1} = E_k A^{-1} r_k + f_k \] where \[\|E_k\| \leq \gamma \|A\|, \quad \|f_k\| \leq \phi\] Hint: Neumann bounds (assume \(\kappa(A) \gamma < 1\))

\[ \|r_{k+1}\| \leq \frac{\kappa(A) \gamma \|r_k\| + \phi}{1-\kappa(A) \gamma} \]
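
One way to fill in the hint: since \(\|E_k A^{-1}\| \leq \gamma \|A\| \|A^{-1}\| = \kappa(A)\gamma < 1\), a Neumann series bound gives \[ \|(I + E_k A^{-1})^{-1}\| \leq \frac{1}{1 - \kappa(A)\gamma}, \] and taking norms in the iteration for \(r_{k+1}\) yields the displayed bound.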

Analysis

Iteration is \[ \|r_{k+1}\| \leq \alpha \|r_k\| + \tilde{\phi} \] starting from \(\|r_0\| = \|b\|\), where \[\begin{aligned} \alpha &= \frac{\kappa(A) \gamma}{1-\kappa(A) \gamma} \\ \tilde{\phi} &= \frac{\phi}{1-\kappa(A) \gamma} \end{aligned}\]

Analysis

Q: How to bound \(\|r_k\|\) from recurrence? \[ \|r_{k+1}\| \leq \alpha \|r_k\| + \tilde{\phi} \]

Assuming \(\alpha < 1\), can show by induction \[ \|r_k\| \leq \alpha^k \|b\| + \frac{\tilde{\phi}}{1-\alpha}. \] For \(\alpha\) sufficiently less than one, residual is asymptotically determined by \(\phi\).
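
Unrolling the recurrence makes the induction explicit: \[ \|r_k\| \leq \alpha^k \|b\| + (1 + \alpha + \cdots + \alpha^{k-1}) \tilde{\phi} \leq \alpha^k \|b\| + \frac{\tilde{\phi}}{1-\alpha}. \]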

Iterative refinement results

For a good enough approximate solver (relative to \(\kappa(A)\)):

  • Iterative refinement improves solution cheaply
  • If all one precision, get backward stability
  • More generally, accuracy limited by accuracy of \(b-A\hat{x}\)

Condition estimation

Condition number quandary

Consider \[\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1\]

  • Easy to estimate \(\|A\|_1\)
  • How do we estimate \(\|A^{-1}\|_1\)?

Note: this is a convex optimization problem! \[ \|A^{-1}\|_1 = \max_{\|x\|_1 \leq 1} \|A^{-1} x\|_1 \]

Gradient relation

Define \(\xi_j = \operatorname{sign}(y_j)\). Then \[ \|y\|_1 = \xi^T y \] and if \(y\) has no zero components, \(\xi\) is a gradient: \[ \delta\left( \|y\|_1 \right) = \xi^T \, \delta y. \]

Idea

Predict using gradient: \[ \|A^{-1} x_{\mathrm{new}}\|_1 \approx \xi^T A^{-1} x_{\mathrm{new}} \]

  • Know \(x_{\mathrm{new}}\) can be taken to be some \(e_j\) (a vertex of the \(\ell_1\) ball)
  • Best guess: \(j = \operatorname{argmax}_j |z_j|\) where \(z = A^{-T} \xi\)

This gives Hager’s estimator (independently derived but equivalent to Boyd’s \(p\)-norm power method).

Hager’s estimator

using LinearAlgebra

function hager(n, solveA, solveAT)
    x = ones(n)/n
    while true
        y = solveA(x)    # Evaluate y = A^-1 x
        xi = sign.(y)    # and z = A^-T sign(y), the
        z = solveAT(xi)  # subgradient of x -> |A^-1 x|_1
        znorm, j = findmax(abs.(z))  # Next candidate coordinate
        if znorm <= dot(z,x)         # Convergence check
            return norm(y,1)         # Estimate of |A^-1|_1
        end
        x[:] .= 0.0      # Update x to e_j and repeat
        x[j] = 1.0
    end
end
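
A possible usage sketch with the routine above (the random matrix and the comparison against cond(A,1) are just for illustration):

using LinearAlgebra
A = randn(100, 100)
F = lu(A)
est = hager(100, x -> F\x, x -> F'\x)   # estimate of opnorm(inv(A), 1)
opnorm(A, 1) * est, cond(A, 1)          # estimated vs. true 1-norm condition number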

Improved version (Higham 1988) in LAPACK.gecon!