Matrix calculus, sensitivity, conditioning
2025-08-27
Consider: \[ f = y^T x = \sum_i x_i y_i \] Then \[\begin{aligned} \frac{\partial f}{\partial x_j} &= y_j, & \frac{\partial f}{\partial y_j} &= x_j \end{aligned}\]
Alternate notation: \[ f(s) = (y + s \, \delta y)^T (x + s \, \delta x) \] Then \[ f'(0) = y^T \, \delta x + \delta y^T x \] Generally suppress the scaling variable \(s\) and write \[ \delta f = y^T \, \delta x + \delta y^T x. \] Variational notation for a Gateaux (directional) derivative.
Just to be sure!
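For instance, one can check the variational formula against a finite difference. A minimal NumPy sketch (the vectors, seed, and step size here are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10), rng.standard_normal(10)
dx, dy = rng.standard_normal(10), rng.standard_normal(10)

f = lambda s: (y + s*dy) @ (x + s*dx)   # f(s) = (y + s dy)^T (x + s dx)
h = 1e-6
fd = (f(h) - f(-h)) / (2*h)             # centered difference estimate of f'(0)
analytic = y @ dx + dy @ x              # delta f = y^T dx + dy^T x
print(fd, analytic)                     # should agree to many digits
```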
Matrix product rule: \[ \delta (AB) = (\delta A) B + A (\delta B) \] Like the usual product rule – but \(A\) and \(B\) don’t commute!
Consider \(B = A^{-1}\) and differentiate \(AB = I\): \[ \delta (AB) = (\delta A) B + A (\delta B) = 0. \] Rearrange to get \[ \delta(A^{-1}) = \delta B = -A^{-1} (\delta A) A^{-1}. \] Much easier than differentiating componentwise!
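The same kind of finite-difference check works for the inverse-perturbation formula (again a sketch; the random matrices and step size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A  = rng.standard_normal((5, 5))
dA = rng.standard_normal((5, 5))
h  = 1e-6

fd = (np.linalg.inv(A + h*dA) - np.linalg.inv(A - h*dA)) / (2*h)  # finite-difference delta(A^{-1})
analytic = -np.linalg.inv(A) @ dA @ np.linalg.inv(A)              # -A^{-1} (dA) A^{-1}
print(np.linalg.norm(fd - analytic) / np.linalg.norm(analytic))   # small relative error
```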
Vector two norm: \[\delta(\|x\|^2) = 2 \Re(\delta x^* x)\]
Frobenius norm: \[\delta(\|A\|_F^2) = 2 \Re\,\operatorname{tr}(\delta A^* A)\]
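Both norm variations can be sanity-checked the same way (sketch; complex test data chosen arbitrarily to exercise the real part and the trace):

```python
import numpy as np

rng = np.random.default_rng(2)
x  = rng.standard_normal(8) + 1j*rng.standard_normal(8)
dx = rng.standard_normal(8) + 1j*rng.standard_normal(8)
A  = rng.standard_normal((6, 4)) + 1j*rng.standard_normal((6, 4))
dA = rng.standard_normal((6, 4)) + 1j*rng.standard_normal((6, 4))
h  = 1e-6

# delta(||x||_2^2) = 2 Re(dx^* x)
fd_x = (np.linalg.norm(x + h*dx)**2 - np.linalg.norm(x - h*dx)**2) / (2*h)
print(fd_x, 2*np.real(np.vdot(dx, x)))

# delta(||A||_F^2) = 2 Re tr(dA^* A)
fd_A = (np.linalg.norm(A + h*dA, 'fro')**2 - np.linalg.norm(A - h*dA, 'fro')**2) / (2*h)
print(fd_A, 2*np.real(np.trace(dA.conj().T @ A)))
```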
From definition \[ \|A\|_2^2 = \max_{\|v\|_2^2=1} \|Av\|_2^2 \] Lagrangian for optimization \[ L(v, \lambda) = \frac{1}{2} \|Av\|_2^2 - \frac{\lambda}{2} (\|v\|^2-1) \] Q: What’s the derivative of \(L\)?
Stationary points satisfy (for any \(\delta v\) and \(\delta \lambda\)) \[ \delta L = \delta v^T (A^* A v - \lambda v) - \frac{\delta \lambda}{2} (\|v\|^2 - 1) = 0 \] So maximum (or minimum, etc) at an eigenvector of \[ A^* A = (V \Sigma U^*) (U \Sigma V^*) = V \Sigma^2 V^* \] Note that \(A v_i = u_i \sigma_i\), so we can get everything.
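In code, this says that the square root of the largest eigenvalue of \(A^* A\) is the largest singular value, i.e. \(\|A\|_2\). A sketch of that check (real \(A\) for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 5))

# Largest eigenpair of A^* A gives sigma_max^2 and the top right singular vector v
evals, evecs = np.linalg.eigh(A.T @ A)
sigma_from_eig, v = np.sqrt(evals[-1]), evecs[:, -1]

# Compare against the SVD and the operator 2-norm
U, S, Vt = np.linalg.svd(A)
print(sigma_from_eig, S[0], np.linalg.norm(A, 2))
print(np.linalg.norm(A @ v) - sigma_from_eig)   # ~0: ||A v|| = sigma_max since ||v|| = 1
```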
Suppose \(F \in L(\mathcal{V}, \mathcal{V})\) and \(\|F\| < 1\). Then define \[ G_n = \sum_{j=0}^n F^j. \] Note that \[ (I-F) G_n = I-F^{n+1} \] and \(\|F^{n+1}\| \leq \|F\|^{n+1}\) for consistent norms.
Triangle inequality and consistency: \[ \|G_n\| \leq \sum_{j=0}^n \|F\|^j = \frac{1-\|F\|^{n+1}}{1-\|F\|}. \] Note that for \(m > n\), \[ \|G_n-G_m\| \leq \|F\|^n \frac{1-\|F\|^{(m-n)+1}}{1-\|F\|} < \frac{\|F\|^n}{1-\|F\|}. \] Cauchy sequence, so converges!
For \(\|F\| < 1\), have the convergent Neumann series \[ (I-F)^{-1} = \sum_{j=0}^\infty F^j. \] Consistency and triangle inequality give \[ \|(I-F)^{-1}\| \leq \sum_{j=0}^\infty \|F\|^j = (1-\|F\|)^{-1}. \] This is a Neumann series bound.
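A small numerical illustration of the series and the bound (sketch; \(F\) is rescaled so that \(\|F\|_2 = 1/2\)):

```python
import numpy as np

rng = np.random.default_rng(4)
F = rng.standard_normal((6, 6))
F *= 0.5 / np.linalg.norm(F, 2)     # rescale so ||F||_2 = 0.5 < 1
I = np.eye(6)

# Partial Neumann sum converges to (I - F)^{-1}
G = sum(np.linalg.matrix_power(F, j) for j in range(60))
print(np.linalg.norm(G - np.linalg.inv(I - F)))

# Norm bound ||(I - F)^{-1}|| <= 1/(1 - ||F||)
print(np.linalg.norm(np.linalg.inv(I - F), 2), 1/(1 - 0.5))
```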
Consider invertible \(A\) and \(\|A^{-1} E\| < 1\): \[ \|(A+E)^{-1}\| = \|(I+A^{-1} E)^{-1} A^{-1}\| \leq \frac{\|A^{-1}\|}{1-\|A^{-1} E\|}. \]
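And a check of the perturbed-inverse bound (sketch; \(A\) is shifted to be safely invertible and \(E\) is scaled so that \(\|A^{-1}E\|_2 = 0.1\)):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6)) + 6*np.eye(6)          # comfortably invertible
E = rng.standard_normal((6, 6))
E *= 0.1 / np.linalg.norm(np.linalg.inv(A) @ E, 2)     # enforce ||A^{-1} E||_2 = 0.1

lhs = np.linalg.norm(np.linalg.inv(A + E), 2)
rhs = np.linalg.norm(np.linalg.inv(A), 2) / (1 - 0.1)
print(lhs, rhs)   # lhs <= rhs
```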
\[e_{\mbox{abs}} = |\hat{x}-x|\]
\[e_{\mbox{rel}} = \frac{|\hat{x}-x|}{|x|}\]
\[e_{\mbox{mix}} = \frac{|\hat{x}-x|}{|x| + \tau}\]
Can do all the above with norms \[\begin{aligned} e_{\mbox{abs}} &= \|\hat{x}-x\| \\ e_{\mbox{rel}} &= \frac{\|\hat{x}-x\|}{\|x\|} \\ e_{\mbox{mix}} &= \frac{\|\hat{x}-x\|}{\|x\| + \tau} \end{aligned}\]
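These measures are one-liners in code (sketch; the vectors and the scale \(\tau\) are arbitrary illustrations):

```python
import numpy as np

x    = np.array([1.0, 2.0, 3.0])
xhat = np.array([1.0, 2.0, 3.001])
tau  = 1e-8                         # reference scale for the mixed measure

e_abs = np.linalg.norm(xhat - x)
e_rel = e_abs / np.linalg.norm(x)
e_mix = e_abs / (np.linalg.norm(x) + tau)
print(e_abs, e_rel, e_mix)
```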
Consider \(y = f(x)\). Forward error for \(\hat{y}\): \[ \hat{y}-y \] Can also consider the backward error \(\hat{x}-x\), where \(\hat{x}\) satisfies \[ \hat{y} = f(\hat{x}). \] Treat the error as a perturbation to the input rather than the output.
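A toy illustration of the distinction, with \(f(x) = x^2\) evaluated in single precision (a hypothetical example, not from the notes):

```python
import numpy as np

f = lambda t: t*t
x = np.pi
y = f(x)
yhat = np.float32(x) * np.float32(x)        # "computed" value, rounded to single precision

forward  = abs(yhat - y) / abs(y)           # relative forward error in the output
xhat     = np.sqrt(np.float64(yhat))        # an input that (nearly) explains the output: f(xhat) ~ yhat
backward = abs(xhat - x) / abs(x)           # relative backward error in the input
print(forward, backward)
```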
First-order bound on relation between relative changes in input and output: \[ \frac{\|\hat{y}-y\|}{\|y\|} \lesssim \kappa_f(x) \frac{\|\hat{x}-x\|}{\|x\|}. \] How to get (tight) constant \(\kappa_f(x)\)?
Consider \(\hat{y} = (A+E)x\) vs \(y = Ax\) (\(A\) invertible). \[ \frac{\|\hat{y}-y\|}{\|y\|} = \frac{\|Ex\|}{\|y\|} \leq \kappa(A) \frac{\|E\|}{\|A\|}. \] What should \(\kappa(A)\) be? Write \(x = A^{-1} y\); then \[ \frac{\|Ex\|}{\|y\|} = \frac{\|EA^{-1} y\|}{\|y\|} \leq \|EA^{-1}\| \leq \|E\| \|A^{-1}\|. \] So \(\kappa(A) = \|A\| \|A^{-1}\|\).
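A quick check of this bound with random data (sketch, 2-norms throughout; `np.linalg.cond` computes \(\kappa(A)\) directly):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 6))
E = 1e-8 * rng.standard_normal((6, 6))
x = rng.standard_normal(6)

y, yhat = A @ x, (A + E) @ x
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)   # = np.linalg.cond(A, 2)
lhs = np.linalg.norm(yhat - y) / np.linalg.norm(y)
rhs = kappa * np.linalg.norm(E, 2) / np.linalg.norm(A, 2)
print(lhs, rhs)   # lhs <= rhs
```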
When \(\|M\| \leq \|~|M|~\|\) for every \(M\) (true for all our favorites) and \(|E| \leq \epsilon |A|\) (elementwise): \[ \frac{\|Ex\|}{\|y\|} = \frac{\|EA^{-1} y\|}{\|y\|} \leq \|EA^{-1}\| \leq \|~|E|~|A^{-1}|~\| \leq \epsilon \|~|A|~|A^{-1}|~\| \] So \(\|~|A|~|A^{-1}|~\|\) is the relative condition number for elementwise relative perturbations to \(A\).
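The elementwise version can be checked the same way; the \(\infty\)-norm is used below since \(\|~|M|~\|_\infty = \|M\|_\infty\) (sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
x = rng.standard_normal(6)
eps = 1e-8
E = eps * np.abs(A) * rng.uniform(-1, 1, (6, 6))   # |E| <= eps |A| elementwise

y, yhat = A @ x, (A + E) @ x
rel_cond = np.linalg.norm(np.abs(A) @ np.abs(np.linalg.inv(A)), np.inf)
lhs = np.linalg.norm(yhat - y, np.inf) / np.linalg.norm(y, np.inf)
print(lhs, eps * rel_cond)   # lhs <= eps * || |A| |A^{-1}| ||
```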
Q: What about \(\hat{y} = A \hat{x}\)? Same \(\kappa(A)\): here \(\|\hat{y}-y\| = \|A(\hat{x}-x)\| \leq \|A\| \|\hat{x}-x\|\) and \(\|x\| = \|A^{-1} y\| \leq \|A^{-1}\| \|y\|\), so \[ \frac{\|\hat{y}-y\|}{\|y\|} \leq \kappa(A) \frac{\|\hat{x}-x\|}{\|x\|}. \]
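The same check for a perturbed input (sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 6))
x = rng.standard_normal(6)
xhat = x + 1e-8 * rng.standard_normal(6)

y, yhat = A @ x, A @ xhat
kappa = np.linalg.cond(A, 2)
lhs = np.linalg.norm(yhat - y) / np.linalg.norm(y)
rhs = kappa * np.linalg.norm(xhat - x) / np.linalg.norm(x)
print(lhs, rhs)   # lhs <= rhs
```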