1. Solves the least squares problem via QR (behind the scenes); see the sketch below
2. \(r\) is not zero! Should satisfy \(A^T r = 0\)
3. Measure \(\|A^T r\|/\|b\|\) (about \(3 \epsilon_{\mathrm{mach}}\) here)
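A minimal Julia sketch of the demo these notes describe (the random `A` and `b` here are stand-ins, not the lecture's data):

```julia
using LinearAlgebra

A = rand(10, 2)
b = rand(10)
x = A \ b               # least squares solve; Julia uses QR behind the scenes
r = b - A*x             # the residual is nonzero...
norm(A'*r) / norm(b)    # ...but A'r vanishes up to a few eps_mach
```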
Householder, Givens, and QR factorization
2025-09-29
Goal: Solve the linear least squares problem \[ \min \|Ax-b\|^2 \] where \(A \in \mathbb{R}^{m \times n}\), \(m > n\). Solution (assuming \(A\) has full column rank): \[ x = A^\dagger b = (A^T A)^{-1} A^T b. \] \(A^\dagger\) is the Moore-Penrose pseudoinverse.
In terms of the economy QR factorization \(A = Q_1 R_1\), write \(A^\dagger = R_1^{-1} Q_1^T\); with \(\Pi\) the orthogonal projector onto \(\operatorname{range}(A)\), note: \[\begin{aligned} A^\dagger &= R_1^{-1} Q_1^T \\ \Pi &= Q_1 Q_1^T \\ I-\Pi &= Q_2 Q_2^T \end{aligned}\] Sanity check: Derive the latter two from the first.
In Julia, with `F = qr(A)`: `F.Q * F.R` does what you'd hope! And `F.Q[:,1:2]` materializes explicit columns of \(Q\).

Can compute a full SVD from full QR and SVD of \(R\): \[ \tilde{U}_1 \Sigma_1 V^T = R, \quad U_1 = Q_1 \tilde{U}_1, \quad U_2 = Q_2 \]
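A small sketch of the calls being referenced (a random `A`; `Q1` is my name for the extracted block):

```julia
using LinearAlgebra

A = rand(10, 2)
F = qr(A)             # Householder QR; F.Q is stored in implicit form
norm(F.Q*F.R - A)     # ~eps_mach: the implicit Q still multiplies like a matrix
Q1 = F.Q[:,1:2]       # indexing materializes explicit columns of Q
```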
In terms of SVD, write \(A^\dagger = V \Sigma_1^{-1} U_1^T\); note: \[\begin{aligned} A^\dagger &= V \Sigma_1^{-1} U_1^T \\ \Pi &= U_1 U_1^T \\ I-\Pi &= U_2 U_2^T \end{aligned}\]
`svd` forms ordinary dense representations of \(U\) and \(V\).

Intro LA version: Given a basis \[ V = \begin{bmatrix} v_1 & v_2 & \ldots & v_n \end{bmatrix} \] Produce an orthonormal basis \[ Q = \begin{bmatrix} q_1 & q_2 & \ldots & q_n \end{bmatrix} \] s.t. for \(k = 1, \ldots, n\) \[ \mathcal{V}_{1:k} \equiv \operatorname{span}\{v_1, \ldots, v_k\} = \operatorname{span}\{q_1, \ldots, q_k\} \]
Define \(Q_{1:k} = \begin{bmatrix} q_1 & \ldots & q_k \end{bmatrix}\). The orthogonal projector onto \(\mathcal{V}_{1:k}\) is \[ \Pi^{(k)} = Q_{1:k} Q_{1:k}^T = V_{1:k} V_{1:k}^\dagger \] Therefore \[ w_{k} = \left( I - \Pi^{(k-1)} \right) v_{k} \] is the residual of \(v_k\) against \(\mathcal{V}_{1:k-1}\). Take \(q_k = w_k/\|w_k\|\).
\[\begin{aligned} v_k &= \Pi^{(k-1)} v_k + w_k \\ &= Q_{1:k-1} r_{1:k-1,k} + q_k r_{kk}, \end{aligned}\] where \(r_{1:k-1,k} = Q_{1:k-1}^T v_k\) and \(r_{kk} = \|w_k\|\)
Clean up indexing with blocking! Given \(V_1 = Q_1 R_{11}\), want \[\begin{bmatrix} V_1 & v_2 \end{bmatrix} = \begin{bmatrix} Q_1 & q_2 \end{bmatrix} \begin{bmatrix} R_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix}\] Get \(r_{12}\) from \[ Q_1^T v_2 = Q_1^T Q_1 r_{12} + Q_1^T q_2 r_{22} = r_{12} \] Leaves \(q_2 r_{22} = (I-Q_1 Q_1^T) v_2\).
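A classical Gram-Schmidt sketch in Julia following this derivation (my code, not the lecture's listing):

```julia
using LinearAlgebra

function cgs(V)
    m, n = size(V)
    Q, R = zeros(m, n), zeros(n, n)
    for k = 1:n
        R[1:k-1,k] = Q[:,1:k-1]'*V[:,k]      # r_{1:k-1,k} = Q_{1:k-1}' v_k
        w = V[:,k] - Q[:,1:k-1]*R[1:k-1,k]   # w_k = (I - Pi^(k-1)) v_k
        R[k,k] = norm(w)
        Q[:,k] = w/R[k,k]                    # q_k = w_k/||w_k||
    end
    Q, R
end
```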
Output from the lecture demo, read as (factorization residual, loss of orthogonality \(\|Q^T Q - I\|\), condition number):

(0.0, 1.3796212406123276e-6, 488620.25639022485)

Classical Gram-Schmidt loses orthogonality badly when the basis is ill-conditioned.
Two ideas improve stability:

1. *Modified Gram-Schmidt*: apply the projectors one vector at a time rather than all at once (see the observation and sketch below).
2. *Reorthogonalization*: run the orthogonalization a second time ("twice is enough").
Observe: \[ I - \sum_{i=1}^j q_i q_i^T = \prod_{i=1}^j (I-q_i q_i^T) \] (the cross terms in the product vanish since \(q_i^T q_j = 0\) for \(i \neq j\)), so we can apply the projectors one at a time.
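Based on this observation, a modified Gram-Schmidt sketch (my code; it applies one projector at a time, working on a copy of \(V\)):

```julia
using LinearAlgebra

function mgs(V)
    m, n = size(V)
    Q, R = float(copy(V)), zeros(n, n)
    for k = 1:n
        R[k,k] = norm(Q[:,k])
        Q[:,k] /= R[k,k]
        for j = k+1:n                # apply (I - q_k q_k') to later columns
            R[k,j] = Q[:,k]'*Q[:,j]
            Q[:,j] -= R[k,j]*Q[:,k]
        end
    end
    Q, R
end
```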
With modified Gram-Schmidt, the (residual, loss of orthogonality) pair from the demo improves to

(2.2079048057097415e-17, 2.8258300911654438e-11)

and with a second pass of orthogonalization both reach machine precision:

(5.630766018929308e-17, 2.8782379311608624e-16)
Fortunately, there is another way…
Consider an algorithm style: rather than orthogonalizing the columns directly, apply a sequence of simple orthogonal transformations to reduce \(A\) to upper triangular form; orthogonality of the accumulated \(Q\) then holds by construction.
Need a simple reflection to introduce zeros in a column!
A Householder transformation (elementary reflector) is \[ H = I-2vv^T, \quad \|v\|_2 = 1 \] Taking \(v \propto x - s\|x\| e_1\) with \(s = \pm 1\) gives \(Hx = s\|x\| e_1\).

Why do we care about the sign? Choosing \(s = -\operatorname{sign}(x_1)\) means the terms in \(x - s\|x\| e_1\) add rather than cancel, as in the sketch below.
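A sketch of the construction with this sign choice (`house` is a hypothetical helper, not from the lecture):

```julia
using LinearAlgebra

function house(x)
    s = x[1] >= 0 ? -1.0 : 1.0   # reflect toward the opposite sign of x[1]
    v = copy(x)
    v[1] -= s*norm(x)            # v = x - s*norm(x)*e1: terms add, no cancellation
    v /= norm(v)
    v, s
end

x = rand(4)
v, s = house(x)
(I - 2*v*v')*x                   # ≈ [s*norm(x), 0, 0, 0]
```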
For each column \(k\): find a reflector that zeros out \(A[k+1{:}\mathrm{end},k]\), then apply it to the trailing submatrix. Below, \(V^{(k)}\) collects the reflector vectors and \(A^{(k)}\) is the partially triangularized matrix.
\[ V^{(0)} = \begin{bmatrix} \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \end{bmatrix} \] \[ A^{(0)} = \begin{bmatrix} 0.8067 & 0.9139 & 0.1586 \\ 0.4203 & 0.1499 & 0.3644 \\ 0.3801 & 0.3566 & 0.0895 \\ 0.9338 & 0.8856 & 0.2698 \end{bmatrix} \]
\[ V^{(1)} = \begin{bmatrix} \color{red}{0.8928} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{red}{0.1734} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{red}{0.1568} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{red}{0.3851} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \end{bmatrix} \] \[ A^{(1)} = \begin{bmatrix} \color{blue}{-1.3579} & \color{purple}{-1.2981} & \color{purple}{-0.4177} \\ \color{blue}{0.0} & \color{purple}{-0.2796} & \color{purple}{0.2525} \\ \color{blue}{0.0} & \color{purple}{-0.0319} & \color{purple}{-0.0117} \\ \color{blue}{0.0} & \color{purple}{-0.0687} & \color{purple}{0.0212} \end{bmatrix} \]
\[ V^{(2)} = \begin{bmatrix} \color{black}{0.8928} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{black}{0.1734} & \color{red}{-0.9913} & \color{lightgray}{0.0} \\ \color{black}{0.1568} & \color{red}{-0.0555} & \color{lightgray}{0.0} \\ \color{black}{0.3851} & \color{red}{-0.1196} & \color{lightgray}{0.0} \end{bmatrix} \] \[ A^{(2)} = \begin{bmatrix} \color{black}{-1.3579} & \color{black}{-1.2981} & \color{black}{-0.4177} \\ \color{lightgray}{0.0} & \color{blue}{0.2897} & \color{purple}{-0.2475} \\ \color{lightgray}{0.0} & \color{blue}{0.0} & \color{purple}{-0.0397} \\ \color{lightgray}{0.0} & \color{blue}{0.0} & \color{purple}{-0.0391} \end{bmatrix} \]
\[ V^{(3)} = \begin{bmatrix} \color{black}{0.8928} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{black}{0.1734} & \color{black}{-0.9913} & \color{lightgray}{0.0} \\ \color{black}{0.1568} & \color{black}{-0.0555} & \color{red}{-0.9252} \\ \color{black}{0.3851} & \color{black}{-0.1196} & \color{red}{-0.3794} \end{bmatrix} \] \[ A^{(3)} = \begin{bmatrix} \color{black}{-1.3579} & \color{black}{-1.2981} & \color{black}{-0.4177} \\ \color{lightgray}{0.0} & \color{black}{0.2897} & \color{black}{-0.2475} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{blue}{0.0557} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{blue}{0.0} \end{bmatrix} \]
\[ V^{(3)} = \begin{bmatrix} \color{black}{0.8928} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \\ \color{black}{0.1734} & \color{black}{-0.9913} & \color{lightgray}{0.0} \\ \color{black}{0.1568} & \color{black}{-0.0555} & \color{black}{-0.9252} \\ \color{black}{0.3851} & \color{black}{-0.1196} & \color{black}{-0.3794} \end{bmatrix} \] \[ A^{(3)} = \begin{bmatrix} \color{black}{-1.3579} & \color{black}{-1.2981} & \color{black}{-0.4177} \\ \color{lightgray}{0.0} & \color{black}{0.2897} & \color{black}{-0.2475} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{black}{0.0557} \\ \color{lightgray}{0.0} & \color{lightgray}{0.0} & \color{lightgray}{0.0} \end{bmatrix} \]
Nonzero structure of the reflector normals almost complements that of \(R\): the step-\(k\) vector is zero in its first \(k-1\) entries, so it nearly fits below the diagonal of \(R\).
Idea: Write \[ H = I - \tau w w^T \] where \(w\) is normalized to start with 1. Then the trailing entries of \(w\) can overwrite the entries of \(A\) that \(H\) zeros out, with only \(\tau\) stored separately.
```julia
using LinearAlgebra

function hqr!(A)
    m, n = size(A)
    tau = zeros(n)
    for j = 1:n
        # -- Find H = I-tau*w*w' to zero out A[j+1:end,j]
        normx = norm(A[j:end,j])
        s = A[j,j] >= 0 ? -1 : 1    # Sign chosen to avoid cancellation in u1
        u1 = A[j,j] - s*normx
        w = A[j:end,j]/u1
        w[1] = 1.0
        A[j+1:end,j] = w[2:end]     # Save trailing part of w
        A[j,j] = s*normx            # Diagonal element of R
        tau[j] = -s*u1/normx        # Save scaling factor
        # -- Update trailing submatrix by multiplying by H
        A[j:end,j+1:end] -= tau[j]*w*(w'*A[j:end,j+1:end])
    end
    A, tau
end
```

For level 3 BLAS, can compute block reflectors \[ I-WY^T = I-YTY^T, \quad T \mbox{ upper triangular} \]
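A usage sketch: sweep the stored reflectors to apply \(Q^T\), then do a triangular solve for least squares (`applyQT!` is my own helper name):

```julia
using LinearAlgebra

# Apply Q' from the packed factors: Q' = H_n ... H_1, so apply H_j in order
function applyQT!(b, QR, tau)
    m, n = size(QR)
    for j = 1:n
        w = [1.0; QR[j+1:end,j]]               # reconstruct w for step j
        b[j:end] -= tau[j]*w*(w'*b[j:end])
    end
    b
end

A = rand(10, 3); b = rand(10)
QR, tau = hqr!(copy(A))
x = UpperTriangular(QR[1:3,1:3]) \ applyQT!(copy(b), QR, tau)[1:3]
```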
A Givens rotation for \(x \in \mathbb{R}^{2}\) is \[ G = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \quad c^2 + s^2 = 1 \] such that \[ Gx = \begin{bmatrix} r \\ 0 \end{bmatrix} \] Composing such rotations is an alternative to Householder reflectors for building orthogonal matrices.
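A quick numerical check of one such rotation (my construction of \(c\) and \(s\)):

```julia
# G = [c -s; s c] with c = x1/r, s = -x2/r zeroes the second entry of x
function givens2(x1, x2)
    r = hypot(x1, x2)
    x1/r, -x2/r
end

c, s = givens2(3.0, 4.0)
[c -s; s c]*[3.0, 4.0]   # ≈ [5.0, 0.0]
```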
Think linear system iterative refinement on \[\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}.\]
Gives code along these lines:
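(A sketch assuming a precomputed QR factorization; it reduces each refinement step to triangular solves with \(R\), and is my reconstruction rather than the lecture's listing.)

```julia
using LinearAlgebra

# One refinement step: solve [I A; A' 0][dr; dx] = [f; g] using A'A = R'R
function ls_refine(A, F, b, x, r)
    f = b - r - A*x                  # residual of the first block row
    g = -A'*r                        # residual of the second block row
    dx = UpperTriangular(F.R) \ (LowerTriangular(F.R') \ (A'*f - g))
    dr = f - A*dx
    x + dx, r + dr
end

A = rand(10, 3); b = rand(10)
F = qr(A)
x = F \ b; r = b - A*x
x, r = ls_refine(A, F, b, x, r)
```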
When \(A \in \mathbb{R}^{m \times n}\) and \(m \gg n\), may make sense to block: \[ A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = \begin{bmatrix} Q_1 R_1 \\ Q_2 R_2 \end{bmatrix} = \begin{bmatrix} Q_1 & \\ & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ R_2 \end{bmatrix}. \] Now compute QR for full \(A\) via \[ \begin{bmatrix} R_1 \\ R_2 \end{bmatrix} = \tilde{Q} R, \quad Q = \begin{bmatrix} Q_1 & \\ & Q_2 \end{bmatrix} \tilde{Q} \] Good for parallelism. Can apply recursively.
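A two-block sketch of this idea (sizes are arbitrary stand-ins):

```julia
using LinearAlgebra

A = rand(1000, 20)
A1, A2 = A[1:500,:], A[501:end,:]
F1, F2 = qr(A1), qr(A2)
F = qr([F1.R; F2.R])             # QR of the stacked R factors
R = F.R                          # the R factor for all of A
Qt = Matrix(F.Q)                 # thin 40x20 Q-tilde
Q = [Matrix(F1.Q)*Qt[1:20,:];    # Q = diag(Q1, Q2) * Q-tilde
     Matrix(F2.Q)*Qt[21:40,:]]
norm(Q*R - A)                    # ~eps_mach
```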
Consider semi-normal equations: \[ x = R^{-1} (R^{-T} (A^T b)). \]
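A sketch reusing \(R\) from any QR of \(A\); this avoids keeping \(Q\) but is less accurate than a full QR solve, which is why it pairs naturally with the refinement above:

```julia
using LinearAlgebra

A = rand(100, 10); b = rand(100)
R = qr(A).R                      # keep only the triangular factor
x = UpperTriangular(R) \ (LowerTriangular(R') \ (A'*b))
norm(A'*(b - A*x)) / norm(b)     # small, though not as small as with Q
```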