LU, Cholesky, refinement, condest
2025-09-17
Compute \(PA = LU\), control \(L\)
\(A \in \mathbb{R}^{n \times n}\) is (column) diagonally dominant if \[ |a_{jj}| \geq \sum_{i \neq j} |a_{ij}|, \] and strictly (column) diagonally dominant (SDD) if the inequality is strict.
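To make the definition concrete, here is a small checker (a sketch; `is_col_dd` is a hypothetical helper, not from the notes):

```julia
# Column diagonal dominance check: |a_jj| >= (or >) sum_{i != j} |a_ij|, columnwise.
function is_col_dd(A; strict=false)
    n = size(A, 1)
    offdiag(j) = sum(abs, A[:,j]) - abs(A[j,j])    # sum_{i != j} |a_ij|
    if strict
        return all(abs(A[j,j]) > offdiag(j) for j in 1:n)
    else
        return all(abs(A[j,j]) >= offdiag(j) for j in 1:n)
    end
end
```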
So how does this connect to LU?
Sanity check: what is the first Schur complement in \[A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}\]
Answer: \[S = F - \frac{bc^T}{\alpha}\]
Note: GEPP would not pivot in the SDD case, since \(|\alpha|\) is the largest entry in its column.
\[ A = \begin{bmatrix} \alpha & c^T \\ b & F \end{bmatrix}, \quad S = F - \frac{bc^T}{\alpha} \]
\[\begin{aligned} |\alpha| & > \sum_{i} |b_i|, & |f_{jj}| & > \sum_{i \neq j} |f_{ij}| + |c_j| \end{aligned}\]
Multiply first inequality by \(|c_j|/|\alpha|\): \[\sum_i \frac{|b_i||c_j|}{|\alpha|} < |c_j|\]
Substitute: \[|f_{jj}| > \sum_{i \neq j} |f_{ij}| + \sum_i \frac{|b_i| |c_j|}{|\alpha|}\]
Subtract diagonal piece from last term: \[ |f_{jj}| - \frac{|b_j| |c_j|}{|\alpha|} > \sum_{i \neq j} \left( |f_{ij}| + \frac{|b_i| |c_j|}{|\alpha|} \right) \]
Triangle inequality: \[ |s_{jj}| \geq |f_{jj}| - \frac{|b_j| |c_j|}{|\alpha|} > \sum_{i \neq j} \left( |f_{ij}| + \frac{|b_i| |c_j|}{|\alpha|} \right) \geq \sum_{i \neq j} |s_{ij}| \]
Conclusion: Schur complement is also SDD: \[|s_{jj}| > \sum_{i \neq j} |s_{ij}|\]
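A quick numerical sanity check of this argument (a sketch, not from the notes): build a strictly column diagonally dominant matrix, form the first Schur complement directly from the formula above, and confirm that it is again SDD and that partial pivoting leaves the rows in place.

```julia
using LinearAlgebra

n = 6
A = randn(n, n)
for j in 1:n
    A[j,j] = sum(abs, A[:,j]) + 1.0   # force strict column diagonal dominance
end

alpha, c, b, F = A[1,1], A[1,2:end], A[2:end,1], A[2:end,2:end]
S = F - b*c'/alpha                    # first Schur complement

# S is again strictly column diagonally dominant ...
@show all(abs(S[j,j]) > sum(abs, S[:,j]) - abs(S[j,j]) for j in 1:n-1)

# ... and GEPP does not actually pivot (the permutation is the identity)
@show lu(A).p == 1:n
```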
Why is this relevant to Gaussian elimination?
What other matrices can we stably factor without pivoting?
Symmetric \(A \in \mathbb{R}^{n \times n}\) represents a quadratic form \[ \phi(x) = x^T A x \] Characterize by inertia \(\nu = (\nu_+, \nu_0, \nu_-)\).
Suppose \(A = LU\), and write \[ U = DM^T, \quad D = \operatorname{diag}(U). \] For symmetric \(A\), uniqueness of the factorization gives \(M = L\), so \(A = LDL^T\).
But in general still need to pivot for stability.
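When no pivoting is needed, the claim is easy to check numerically. A minimal sketch (assuming a symmetric \(A\) for which unpivoted LU exists; here \(A = B^T B + I\), which is also positive definite):

```julia
using LinearAlgebra

B = randn(5, 5)
A = B'B + I                  # symmetric, safely factorable without pivoting

F = lu(A, NoPivot())         # A = L*U with no row exchanges
L, U = F.L, F.U
D = Diagonal(diag(U))
M = (D \ U)'                 # U = D*M', so M should equal L

@show norm(M - L)            # ~ roundoff
@show norm(L*D*L' - A)       # A = L*D*L'
```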
Positive definite if \(\nu = (n, 0, 0)\), i.e. \[ \forall x \neq 0, \phi(x) > 0. \]
For \(A \succ 0\): \[\begin{aligned} A^{-1} &\succ 0 \\ A_{11} &\succ 0 \\ \kappa(A_{11}) &\leq \kappa(A) \\ S & \succ 0 \\ \kappa(S) & \leq \kappa(A) \end{aligned}\] where \(A\) is block 2-by-2 and \(S = A_{22} - A_{21} A_{11}^{-1} A_{12}\)
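These facts are easy to sanity check numerically (a sketch, not from the notes; \(\kappa\) here is the 2-norm condition number computed by `cond`):

```julia
using LinearAlgebra

B = randn(8, 8)
A = B'B + I                         # SPD test matrix
A11, A12 = A[1:4, 1:4], A[1:4, 5:8]
A21, A22 = A[5:8, 1:4], A[5:8, 5:8]
S = A22 - A21*(A11\A12)             # Schur complement

@show isposdef(A11), isposdef(S)
@show cond(A11) <= cond(A)
@show cond(S) <= cond(A)
```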
Partition \(A\) into first row/column and rest. Then: \[ \begin{bmatrix} a_{11} & a_{21}^T \\ a_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 \\ l_{21} & L_{22} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21}^T \\ 0 & L_{22}^T \end{bmatrix} \] Recursively look at \[\begin{aligned} l_{11} &= \sqrt{a_{11}} \\ l_{21} &= a_{21}/l_{11} \\ L_{22} L_{22}^T &= A_{22} - l_{21} l_{21}^T \end{aligned}\] Q: Why no worries about validity of \(\sqrt{\cdot}\)?
Algorithm looks a lot like LU!
As with LU, we usually block for level 3 BLAS.
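A minimal unblocked sketch of the recursion above (for illustration only; `mychol` is a hypothetical name, and `LinearAlgebra.cholesky` is what you would use in practice):

```julia
using LinearAlgebra

# Unblocked right-looking Cholesky, following the recursion above.
function mychol(A)
    n = size(A, 1)
    L = zeros(n, n)
    S = copy(Matrix(A))                                 # current trailing Schur complement
    for k in 1:n
        L[k,k] = sqrt(S[k,k])                           # l11 = sqrt(a11), positive since S is SPD
        L[k+1:n, k] = S[k+1:n, k] / L[k,k]              # l21 = a21 / l11
        S[k+1:n, k+1:n] -= L[k+1:n, k] * L[k+1:n, k]'   # A22 - l21*l21'
    end
    return LowerTriangular(L)
end
```

For SPD input, `norm(mychol(A)*mychol(A)' - A)` sits at roundoff level; the trailing Schur complements stay positive definite, which is exactly why the square roots never fail.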
Suppose a slightly-sloppy solve: \[ \hat{A} \hat{x} = b, \quad \hat{A} = A + E \] where \(\|E\| \leq \gamma \|A\|\) (think \(\gamma \ll 1\), but not \(O(\epsilon_{\mathrm{mach}})\)).
Can we improve the solution?
\[\begin{aligned} (A + E) \hat{x} &= b \\ A x &= b \end{aligned}\] Subtracting the second equation from the first: \[ A(\hat{x}-x) + E \hat{x} = 0 \]
Forward error \(e = \hat{x}-x\) is given by \[\begin{aligned} e &= -A^{-1} E \hat{x} \\ &= -A^{-1} (b - A\hat{x}) \end{aligned}\] Can compute the residual \(b-A \hat{x}\) up to roundoff…
Idea: Compute an (approximate) correction to \(\hat{x}_1\): \[\begin{aligned} \hat{x}_1 &= \hat{A}^{-1} b \\ \hat{x}_2 &= \hat{x}_1 + \hat{A}^{-1} (b-A \hat{x}_1) \end{aligned}\] Can repeat if needed (iterative refinement).
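A minimal sketch of this idea (assumptions not in the notes: the sloppy solver is an LU factorization computed in single precision, while residuals are formed in double precision; `refine` and `nsteps` are illustrative names):

```julia
using LinearAlgebra

# Iterative refinement: A_hat is A rounded to Float32 and factored there, so
# ||E||/||A|| is roughly single-precision unit roundoff; residuals use Float64.
function refine(A, b; nsteps = 5)
    F = lu(Float32.(A))                  # factor A_hat = A + E once
    x = Float64.(F \ Float32.(b))        # x_1 = A_hat^{-1} b
    for _ in 1:nsteps
        r = b - A*x                      # residual in double precision
        x += Float64.(F \ Float32.(r))   # x_{k+1} = x_k + A_hat^{-1} r_k
    end
    return x
end
```

Each sweep contracts the residual by roughly \(\kappa(A)\gamma\) with \(\gamma \approx \epsilon_{\mathrm{mach}}\) for `Float32`, consistent with the analysis that follows.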
Iteration equation: \(\hat{x}_{k+1} = \hat{x}_k + d_k\), \[ (A+E_k) d_k = b-A\hat{x}_k + f_k \] where \[\|E_k\| \leq \gamma \|A\|, \quad \|f_k\| \leq \phi \]
Goal: Get an iteration for the residual.
\[\begin{aligned} && (A+E_k) d_k &= b-A\hat{x}_k + f_k \\ &\implies& (I+E_k A^{-1}) (r_{k+1}-r_k) &= -r_k + f_k \\ &\implies& (I+E_k A^{-1}) r_{k+1} &= E_k A^{-1} r_k + f_k \end{aligned}\]
Goal: Get an iteration for \(\|r_k\|\) from \[ (I+E_k A^{-1}) r_{k+1} = E_k A^{-1} r_k + f_k \] where \[\|E_k\| \leq \gamma \|A\|, \quad \|f_k\| \leq \phi\] Hint: Neumann bounds (assume \(\kappa(A) \gamma < 1\))
\[ \|r_{k+1}\| \leq \frac{\kappa(A) \gamma \|r_k\| + \phi}{1-\kappa(A) \gamma} \]
Iteration is \[ \|r_{k+1}\| \leq \alpha \|r_k\| + \tilde{\phi} \] starting from \(\|r_0\| = \|b\|\), where \[\begin{aligned} \alpha &= \frac{\kappa(A) \gamma}{1-\kappa(A) \gamma} \\ \tilde{\phi} &= \frac{\phi}{1-\kappa(A) \gamma} \end{aligned}\]
Q: How to bound \(\|r_k\|\) from recurrence? \[ \|r_{k+1}\| \leq \alpha \|r_k\| + \tilde{\phi} \]
Assuming \(\alpha < 1\), can show by induction \[ \|r_k\| \leq \alpha^k \|b\| + \frac{\tilde{\phi}}{1-\alpha}. \] For \(\alpha\) sufficiently less than one, residual is asymptotically determined by \(\phi\).
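Unrolling the recurrence makes the induction explicit: \[ \|r_k\| \leq \alpha^k \|b\| + \left(1 + \alpha + \cdots + \alpha^{k-1}\right) \tilde{\phi} \leq \alpha^k \|b\| + \frac{\tilde{\phi}}{1-\alpha}. \]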
For a good enough approximate solver relative to \(\kappa(A)\) (i.e. \(\kappa(A)\,\gamma\) well below one), iterative refinement drives the residual down to roughly the level \(\phi\) at which we can evaluate it.
Consider \[\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1\]
Note: this maximizes a convex function over a convex set: \[ \|A^{-1}\|_1 = \max_{\|x\|_1 \leq 1} \|A^{-1} x\|_1 \] so the maximum is attained at an extreme point of the unit ball, i.e. at some \(\pm e_j\).
Define \(\xi_j = \operatorname{sign}(y_j)\). Then \[ \|y\|_1 = \xi^T y \] and if \(y\) has no zero components, \(\xi\) is a gradient: \[ \delta\left( \|y\|_1 \right) = \xi^T \, \delta y. \]
Predict using gradient: \[ \|A^{-1} x_{\mathrm{new}}\|_1 \approx \xi^T A^{-1} x_{\mathrm{new}} \]
This gives Hager’s estimator (independently derived but equivalent to Boyd’s \(p\)-norm power method).
```julia
using LinearAlgebra

function hager(n, solveA, solveAT)
    x = ones(n)/n                    # start in the interior of the 1-norm ball
    while true
        y = solveA(x)                # y = A^-1 x
        xi = sign.(y)                # subgradient of y -> ||y||_1
        z = solveAT(xi)              # z = A^-T xi, gradient of x -> ||A^-1 x||_1
        znorm, j = findmax(abs.(z))  # best vertex e_j to try next
        if znorm <= dot(z, x)        # no vertex predicts improvement: converged
            return norm(y, 1)        # estimate of ||A^-1||_1
        end
        x .= 0.0                     # update x to e_j and repeat
        x[j] = 1.0
    end
end
```

Improved version (Higham 1988) in `LAPACK.gecon!`.
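A typical use (a sketch, not from the notes): pass `hager` solves backed by a precomputed factorization, then compare against the exact value.

```julia
using LinearAlgebra

A = randn(100, 100) + 10I
F = lu(A)                                  # one factorization, reused by both solves
est = hager(100, x -> F \ x, x -> F' \ x)  # estimate of ||A^-1||_1
@show est, opnorm(inv(A), 1)               # estimate vs. exact 1-norm of A^-1
@show est * opnorm(A, 1)                   # estimated kappa_1(A)
```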