Introduction
2025-08-25
Example vector: \[ v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \in \mathbb{R}^2 \] Set of such vectors forms a concrete vector space.
Ingredients: a set of vectors \(\mathcal{V}\), a field of scalars \(\mathbb{F}\) (usually \(\mathbb{R}\) or \(\mathbb{C}\)), and operations of vector addition and scalar multiplication.
Need for all \(u, v, w \in \mathcal{V}\) and \(\alpha, \beta \in \mathbb{F}\): \[\begin{aligned} 0 v &= 0 & 1 v &= v \\ u + v &= v + u & (u + v) + w &= u + (v + w) \\ \alpha (u + v) &= \alpha u + \alpha v & (\alpha + \beta) u &= \alpha u + \beta u \end{aligned}\]
More examples of vector spaces: \[\begin{aligned} \mathcal{P}_d &= \{ \mbox{polynomials of degree at most $d$} \} \\ \mathcal{V}^* &= \{ \mbox{linear functions $\mathcal{V} \rightarrow \mathbb{R}$ (or $\mathbb{C}$)} \} \\ L(\mathcal{V}, \mathcal{W}) &= \{ \mbox{linear maps $\mathcal{V}\rightarrow \mathcal{W}$} \} \\ \mathcal{C}^k(\Omega) &= \{\mbox{$k$-times differentiable functions on a set $\Omega$} \} \end{aligned}\]
\(\mathcal{U}\) is a subspace of vector space \(\mathcal{V}\) if it is a subset of \(\mathcal{V}\) closed under vector addition and scalar multiplication (and hence is itself a vector space).
Sums of subspaces \(\mathcal{V}_1 \subset \mathcal{V}\) and \(\mathcal{V}_2 \subset \mathcal{V}\): \[ \mathcal{V}_1 + \mathcal{V}_2 = \{ v_1 + v_2 : v_1 \in \mathcal{V}_1, v_2 \in \mathcal{V}_2 \}. \] The sum is direct, written \(\mathcal{V}_1 \oplus \mathcal{V}_2\), if \(\mathcal{V}_1 \cap \mathcal{V}_2 = \{0\}\) (equivalently, if the decomposition \(v = v_1 + v_2\) is unique).
Can also quotient: elements of \(\mathcal{V}/\mathcal{U}\) are the sets \([v] = v + \mathcal{U} = \{ v + u : u \in \mathcal{U}\}\).
When \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\), we have component projectors \[ \Pi_1 (v_1 + v_2) = v_1, \quad \Pi_2 (v_1 + v_2) = v_2 \] Example: \(\mathcal{P}_d\) is the direct sum of its even and odd subspaces \[\begin{aligned} (\Pi_{\mathrm{even}} q)(x) &= \frac{1}{2} (q(x) + q(-x)) \\ (\Pi_{\mathrm{odd}} q)(x) &= \frac{1}{2} (q(x) - q(-x)) \end{aligned}\] The idea generalizes to \(\mathcal{V}= \mathcal{V}_1 \oplus \ldots \oplus \mathcal{V}_k\).
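As a concrete illustration (not from the notes), the even/odd projectors act on power-basis coefficient vectors by zeroing out the odd-degree or even-degree coefficients. A minimal sketch, assuming a polynomial in \(\mathcal{P}_d\) is stored as its coefficient vector \([c_0, \ldots, c_d]\):

```python
import numpy as np

def project_even(c):
    """Even part: zero out the odd-degree coefficients."""
    out = c.copy()
    out[1::2] = 0.0
    return out

def project_odd(c):
    """Odd part: zero out the even-degree coefficients."""
    out = c.copy()
    out[0::2] = 0.0
    return out

c = np.array([1.0, 2.0, 3.0, 4.0])   # q(x) = 1 + 2x + 3x^2 + 4x^3
assert np.allclose(project_even(c) + project_odd(c), c)   # Pi_even + Pi_odd = identity
```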
\(S \subset \mathcal{V}\) a spanning set if any \(v \in \mathcal{V}\) is a linear combination \[ v = \sum_{j=1}^m \alpha_j s_j \] for some \(s_j \in S\) and \(\alpha_j \in \mathbb{F}\).
Example: \(\mathcal{P}_2 = \operatorname{span}\{1, x, -x, x^2\}\).
\(S \subset \mathcal{V}\) is linearly independent if any \(v \in \mathcal{V}\) can be written in at most one way as \[ v = \sum_{j=1}^m \alpha_j s_j. \] Equivalently: \(S\) is linearly independent if no nontrivial linear combination of elements of \(S\) gives 0.
Example: \(\{1, x, x^2\}\) is linearly independent in \(\mathcal{P}_2\); the spanning set \(\{1, x, -x, x^2\}\) from before is not, since \(x + (-x) = 0\) is a nontrivial combination giving 0.
\(S \subset \mathcal{V}\) is a basis if \(S\) spans \(\mathcal{V}\) and is linearly independent.
If \(S\) is a basis, \(d = |S|\) is the dimension.
Basis \(\{ w_1^*, \ldots, w_d^* \}\) for \(\mathcal{V}^*\) and basis \(\{ v_1, \ldots, v_d \}\) for \(\mathcal{V}\) are dual to each other if \[ w_i^* \left( \sum_{j=1}^d \alpha_j v_j \right) = \alpha_i. \] Equivalently: \(w_i^* v_j = \delta_{ij}\).
Basis quasimatrix \(V = \begin{bmatrix} v_1 & \ldots & v_d \end{bmatrix}\) for \(\mathcal{V}\) maps coefficient vectors to vectors: \(Vc = \sum_{j=1}^d c_j v_j\).
Dual basis quasimatrix \(W^* = \begin{bmatrix} w_1^* \\ \vdots \\ w_d^* \end{bmatrix}\) for \(\mathcal{V}^*\) maps vectors to coefficient vectors: \((W^* v)_i = w_i^* v\).
Standard basis for \(\mathbb{R}^n\) has elements
\[ e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},~~ e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},~~ e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix},~~ \cdots,~~ e_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]
Basis quasimatrix looks like \(I\). This is not very interesting!
Example: Power basis for \(\mathcal{P}_2\) is \[ P = \begin{bmatrix} 1 & x & x^2 \end{bmatrix} \] Write \(p(x) = 1 + x^2\) as \[ p = \begin{bmatrix} 1 & x & x^2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \]
Can also make a basis of Chebyshev polynomials: \[\begin{aligned} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{k+1}(x) &= 2xT_k(x) - T_{k-1}(x), \quad k \geq 1 \end{aligned}\] Note: \[ T_k(\cos(\theta)) = \cos(k\theta) \] Chebyshev polynomials equi-oscillate on \([-1,1]\).
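A quick sketch (mine, not part of the notes) of evaluating \(T_k\) by the recurrence and spot-checking \(T_k(\cos\theta) = \cos(k\theta)\):

```python
import numpy as np

def cheb_T(k, x):
    """Evaluate T_k(x) via T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)."""
    Tprev, Tcur = np.ones_like(x), x     # T_0 = 1, T_1 = x
    if k == 0:
        return Tprev
    for _ in range(k - 1):
        Tprev, Tcur = Tcur, 2 * x * Tcur - Tprev
    return Tcur

theta = np.linspace(0.0, np.pi, 100)
assert np.allclose(cheb_T(5, np.cos(theta)), np.cos(5 * theta))
```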
Two bases for \(\mathcal{P}_d\): the power basis \(P_d = \begin{bmatrix} 1 & x & \cdots & x^d \end{bmatrix}\) and the Chebyshev basis \(T_d = \begin{bmatrix} T_0 & T_1 & \cdots & T_d \end{bmatrix}\).
Change of basis example: \(T_2 = P_2 X\) where \[ X = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \] Question: How would you compute \(X\) for general \(d\)?
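One possible answer (a sketch of mine, not necessarily the intended one): run the recurrence on power-basis coefficient vectors, so that column \(k\) of \(X\) holds the power-basis coefficients of \(T_k\).

```python
import numpy as np

def cheb_to_power(d):
    """X[:, k] = power-basis coefficients of T_k, for k = 0, ..., d."""
    X = np.zeros((d + 1, d + 1))
    X[0, 0] = 1.0                      # T_0 = 1
    if d >= 1:
        X[1, 1] = 1.0                  # T_1 = x
    for k in range(1, d):
        X[1:, k + 1] = 2 * X[:-1, k]   # multiplying by x shifts coefficients up a degree
        X[:, k + 1] -= X[:, k - 1]     # T_{k+1} = 2x T_k - T_{k-1}
    return X

print(cheb_to_power(2))   # recovers the 3-by-3 matrix X above
```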
Norm is \(\| \cdot \| : \mathcal{V}\rightarrow \mathbb{R}\) satisfying, for all \(u, v \in \mathcal{V}\) and \(\alpha \in \mathbb{F}\): \[\begin{aligned} \|v\| &\geq 0, \mbox{ with } \|v\| = 0 \mbox{ iff } v = 0 \\ \|\alpha v\| &= |\alpha| \|v\| \\ \|u+v\| &\leq \|u\| + \|v\| \end{aligned}\]
An aside:
Our three favorite norms for \(\mathbb{R}^n\) (or \(\mathbb{C}^n\)) are the 2-norm (Euclidean norm), 1-norm (Manhattan norm), and \(\infty\)-norm (max norm): \[\begin{aligned} \|x\|_2 &= \sqrt{\sum_{j=1}^n |x_j|^2} \\ \|x\|_1 &= \sum_{j=1}^n |x_j| \\ \|x\|_\infty &= \max_{1 \leq j \leq n} |x_j| \end{aligned}\]
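In NumPy (an illustration of mine, not part of the notes):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(x, 2))        # 2-norm: sqrt(9 + 16 + 1)
print(np.linalg.norm(x, 1))        # 1-norm: 3 + 4 + 1 = 8
print(np.linalg.norm(x, np.inf))   # max-norm: 4
```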
For polynomials on \([-1,1]\), favorite norms are: \[\begin{aligned} \|p\|_2 &= \sqrt{\int_{-1}^1 |p(x)|^2 \, dx} \\ \|p\|_1 &= \int_{-1}^1 |p(x)| \, dx \\ \|p\|_\infty &= \max_{x \in [-1,1]} |p(x)| \end{aligned}\] Q: What are these for \(p(x) = x\)?
Norms \(\|\cdot\|\) and \(\|\cdot\|_*\) are equivalent if \(\exists c, C > 0\) s.t. \[ \forall v \in \mathcal{V}, c\|v\| \leq \|v\|_* \leq C\|v\|. \]
Q: Find \(c, C\) relating \(\|\cdot\|_1\) and \(\|\cdot\|_\infty\) on \(\mathbb{R}^n\)?
Function \(\langle \cdot, \cdot \rangle : \mathcal{V}\times \mathcal{V}\rightarrow \mathbb{R}\) (or \(\mathbb{C}\)) satisfying: \[\begin{aligned} \langle v, v \rangle &\geq 0, \mbox{ with equality iff } v = 0 \\ \langle \alpha u + v, w \rangle &= \alpha \langle u, w \rangle + \langle v, w \rangle \\ \langle v, w \rangle &= \overline{\langle w, v \rangle} \end{aligned}\]
Check: \(\|v\| = \sqrt{\langle v, v \rangle}\) is a norm (the Euclidean norm for the inner product).
An inner product is a positive definite symmetric bilinear form (or, in the complex case, a positive definite Hermitian sesquilinear form).
On \(\mathbb{R}^n\) or \(\mathbb{C}^n\), the standard inner product (dot product) is \[ \langle x, y \rangle = \sum_{j=1}^n x_j \overline{y}_j = y^* x. \] This is not the only inner product even on these spaces!
We do something like this a lot: \[\begin{aligned} \|v+w\|^2 &= \langle v+w, v+w \rangle \\ &= \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle \\ &= \|v\|^2 + 2\Re \langle v, w \rangle + \|w\|^2 \end{aligned}\] This turns out to be useful in theory and in computation!
For real spaces: \[\begin{aligned} \|v+w\|^2 &= \|v\|^2 + 2 \langle v, w \rangle + \|w\|^2 \\ \|v+w\|^2 &\leq (\|v\|+\|w\|)^2 \\ &= \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \end{aligned}\] Therefore \[ \langle v, w \rangle \leq \|v\| \|w\| \] More generally: \(|\langle v, w \rangle| \leq \|v\| \|w\|\) (the Cauchy–Schwarz inequality).
Define the angle between \(v\) and \(w\) (for a real space) by \[ \cos(\theta) = \frac{\langle v, w \rangle}{\|v\| \|w\|}. \] Then expanding the square gives \[ \|v+w\|^2 = \|v\|^2 + 2 \|v\| \|w\| \cos(\theta) + \|w\|^2. \] This is the law of cosines from basic trig.
Suppose \(\langle v, w \rangle = 0\), i.e. \(v\) and \(w\) are orthogonal or normal. Then \[ \|v+w\|^2 = \|v\|^2 + 2\Re \langle v, w \rangle + \|w\|^2 = \|v\|^2 + \|w\|^2. \] This is the Pythagorean theorem.
For polynomials, the \(L^2([-1,1])\) inner product is \[ \langle p, q \rangle = \int_{-1}^1 p(x) \overline{q(x)} \, dx. \] This is analogous to the standard inner product on \(\mathbb{R}^n\).
Suppose \(V\) is a basis for an inner product space \(\mathcal{V}\). \[\begin{aligned} \langle Vc, Vd \rangle &= \left\langle \sum_j v_j c_j, \sum_i v_i d_i \right\rangle \\ &= \sum_{i,j} \langle v_j, v_i \rangle c_j \overline{d}_i \\ &= \sum_{i,j} g_{ij} c_j \overline{d}_i = d^* G c \end{aligned}\] The Gram matrix \(G\) of inner products \(g_{ij} = \langle v_j, v_i \rangle\) is symmetric (Hermitian) and positive definite.
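For example (my sketch, not from the notes), the Gram matrix of the power basis \(\{1, x, \ldots, x^d\}\) under the \(L^2([-1,1])\) inner product has entries \(g_{ij} = \int_{-1}^1 x^{i+j}\,dx\), and we can confirm positive definiteness numerically via a Cholesky factorization:

```python
import numpy as np

d = 3
i, j = np.meshgrid(np.arange(d + 1), np.arange(d + 1), indexing="ij")
# g_ij = int_{-1}^{1} x^(i+j) dx = 2/(i+j+1) if i+j is even, else 0
G = np.where((i + j) % 2 == 0, 2.0 / (i + j + 1), 0.0)

np.linalg.cholesky(G)   # succeeds, so G is symmetric positive definite
```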
A basis \(V\) is orthonormal if \(\langle v_i, v_j \rangle = \delta_{ij}\), i.e. if the Gram matrix is \(G = I\).
An orthonormal basis is an isometry between the concrete and abstract spaces (with respect to the Euclidean norms): \[ \|Vc\| = \|c\|. \]
Consider Legendre polynomials (usually on \([-1,1]\)) \[\begin{aligned} P_0(x) &= 1 \\ P_1(x) &= x \\ (n+1) P_{n+1}(x) &= (2n+1) x P_n(x) - n P_{n-1}(x) \end{aligned}\] These satisfy \[ \langle P_n, P_m \rangle = \frac{2}{2n+1} \delta_{mn}. \]
Scaled Legendre polynomials form an orthonormal basis for \(\mathcal{P}_d\): \[ Q_n = \sqrt{\frac{2n+1}{2}} P_n \] These are very useful in function approximation.
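A quick numerical check (my own sketch) of the recurrence and the scaling, using Gauss–Legendre quadrature for the integrals:

```python
import numpy as np

def legendre_P(n, x):
    """Evaluate P_n(x) via (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    Pprev, Pcur = np.ones_like(x), x          # P_0 = 1, P_1 = x
    if n == 0:
        return Pprev
    for k in range(1, n):
        Pprev, Pcur = Pcur, ((2 * k + 1) * x * Pcur - k * Pprev) / (k + 1)
    return Pcur

x, w = np.polynomial.legendre.leggauss(20)    # nodes/weights, exact for degree <= 39
Q = lambda n: np.sqrt((2 * n + 1) / 2) * legendre_P(n, x)
print(w @ (Q(3) * Q(3)))   # ~ 1
print(w @ (Q(3) * Q(5)))   # ~ 0
```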
Intro class: “matrices represent linear maps.”
But there’s more to the story!
Suppose \(V, W\) are bases for \(\mathcal{V}, \mathcal{W}\) and \(\mathcal{A} \in L(\mathcal{V}, \mathcal{W})\).
Matrix is given by: \[
A = W^{-1} \mathcal{A} V
\] That is, \(y = Ax\) represents \[
(Wy) = \mathcal{A} (Vx)
\]
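Concrete example (mine, not from the notes): take \(\mathcal{A} = d/dx\) from \(\mathcal{P}_2\) to \(\mathcal{P}_1\) with power bases \(V = \begin{bmatrix} 1 & x & x^2 \end{bmatrix}\) and \(W = \begin{bmatrix} 1 & x \end{bmatrix}\); column \(j\) of \(A\) holds the \(W\)-coordinates of \(\mathcal{A} v_j\).

```python
import numpy as np

# d/dx 1 = 0, d/dx x = 1, d/dx x^2 = 2x, each expressed in the basis W = [1, x]
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

x = np.array([1.0, 0.0, 1.0])   # p(x) = 1 + x^2 in V-coordinates
print(A @ x)                    # [0, 2]: W-coordinates of p'(x) = 2x
```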
Suppose \(V\) a basis for \(\mathcal{V}\) and \(\mathcal{A} \in L(\mathcal{V},
\mathcal{V})\).
Matrix is given by: \[
A = V^{-1} \mathcal{A} V.
\] That is, \(y = Ax\) represents \[
(Vy) = \mathcal{A} (Vx).
\] We say \(A\) and \(\mathcal{A}\) are similar (and \(A = V^{-1}
\mathcal{A} V\) is a similarity transformation).
Suppose \(V, W\) are bases for \(\mathcal{V}, \mathcal{W}\) and \(a : \mathcal{V}\times
\mathcal{W}\rightarrow \mathbb{R}\) is bilinear (linear in both slots).
Then for the matrix \(A\) with entries \[
a_{ij} = a(v_j, w_i),
\] we have \[
a(Vx,Wy) = y^T A x
\]
Suppose \(V, W\) are bases for \(\mathcal{V}, \mathcal{W}\) and \(a : \mathcal{V}\times
\mathcal{W}\rightarrow \mathbb{C}\) is sesquilinear (linear in first slot, conjugate linear in second).
Then for the matrix \(A\) with entries \[
a_{ij} = a(v_j, w_i),
\] we have \[
a(Vx,Wy) = y^* A x
\]
Suppose \(V\) a basis for \(\mathcal{V}\) and \(\phi : \mathcal{V}\rightarrow \mathbb{R}\) is a quadratic form: \(\phi(v) = a(v,v)\) for a symmetric bilinear form \(a\).
Then for the matrix \(A\) with entries \[
a_{ij} = a(v_j, v_i)
\] we have \[
\phi(Vx) = x^T A x.
\]
Q: How could we get \(a(v_i,v_j)\) given just access to \(\phi\)?
Hint: Think about expanding \(\phi(v_i+v_j)\)!
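Expanding as the hint suggests (the standard polarization trick, spelled out here for completeness): \[ \phi(v_i + v_j) = \phi(v_i) + 2 a(v_i, v_j) + \phi(v_j), \quad \mbox{so} \quad a(v_i, v_j) = \frac{1}{2} \left( \phi(v_i+v_j) - \phi(v_i) - \phi(v_j) \right). \]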
\(L(\mathcal{V}, \mathcal{W})\) is a vector space
A norm on \(L(\mathcal{V}, \mathcal{W})\) is consistent with norms on \(\mathcal{V}, \mathcal{W}\) if \[ \|Av\| \leq \|A\| \|v\|. \] Ex: the Frobenius norm \(\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}\) is consistent with the vector 2-norms: \[ \|Ax\|_2 \leq \|A\|_F \|x\|_2 \] Q: Why? (Hint: only one named inequality in these slides!)
If \(\mathcal{V}\) and \(\mathcal{W}\) have norms, the induced norm on \(L(\mathcal{V},\mathcal{W})\) is \[ \|\mathcal{A}\|_{\mathcal{V},\mathcal{W}} = \max_{v \neq 0} \frac{\|\mathcal{A}v\|_\mathcal{W}}{\|v\|_{\mathcal{V}}} \] For concrete case with our favorite norms, we have \[\begin{aligned} \|A\|_1 &= \max_j \sum_i |a_{ij}| \\ \|A\|_\infty &= \max_i \sum_j |a_{ij}| \\ \|A\|_2 &= \mbox{???} \end{aligned}\]
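In NumPy (an illustration of mine):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
print(np.linalg.norm(A, 1))       # max column sum of |a_ij| = 6
print(np.linalg.norm(A, np.inf))  # max row sum of |a_ij| = 7
print(np.linalg.norm(A, 2))       # largest singular value (see the SVD discussion later)
```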
Basic idea: Write \(A\) as a product of other matrices! \[\begin{aligned} PA &= LU & \mbox{Gaussian elimination} \\ A &= QR & \mbox{Used for least squares, etc} \\ A &= U \Sigma V^* & \mbox{Singular value decomposition} \\ A &= V \Lambda V^{-1} & \mbox{Eigenvalue decomposition} \\ A &= U T U^* & \mbox{Schur decomposition} \end{aligned}\] Claim: the last three are different in kind from the first two!
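All of these can be computed with standard library calls; a minimal NumPy/SciPy sketch (my own illustration):

```python
import numpy as np
import scipy.linalg

A = np.random.randn(5, 5)

P, L, U = scipy.linalg.lu(A)                    # A = P L U, i.e. P^T A = L U
Q, R = np.linalg.qr(A)                          # A = Q R
U2, sigma, Vh = np.linalg.svd(A)                # A = U2 diag(sigma) Vh
lam, V = np.linalg.eig(A)                       # A = V diag(lam) V^{-1}
T, Z = scipy.linalg.schur(A, output="complex")  # A = Z T Z^*
```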
Canonical form: the “simplest” matrix achievable under an allowed choice of bases.
Mapping type: \(L(\mathcal{V}, \mathcal{W})\) or bilinear or sesquilinear forms
No restrictions on bases.
Canonical form: \[
\begin{bmatrix}
I_{r \times r} & 0_{r \times (n-r)} \\
0_{(m-r) \times r} & 0_{(m-r) \times (n-r)}
\end{bmatrix}
\] Rank is \(r\), null space dimension is \(n-r\).
Decomposition: \(\mathcal{A} = X_1 Y_1^*\), where \(X_1\) consists of the first \(r\) vectors of a basis \(X\) for \(\mathcal{W}\) and \(Y_1^*\) of the first \(r\) rows of a dual basis quasimatrix \(Y^*\) for \(\mathcal{V}^*\).
Mapping type: \(L(\mathcal{V}, \mathcal{W})\) or bilinear or sesquilinear forms
Restrict to orthonormal bases.
Canonical form: \[
\begin{bmatrix}
\Sigma_1 & 0_{r \times (n-r)} \\
0_{(m-r) \times r} & 0_{(m-r) \times (n-r)}
\end{bmatrix}
\] where \(\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)\) with \(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0\) the nonzero singular values.
Decomposition: \(\mathcal{A} = U_1 \Sigma_1 V_1^*\) (economy SVD)
Mapping type: \(L(\mathcal{V}, \mathcal{V})\).
No restriction on basis.
Canonical form (almost all matrices): \[
\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),
\] where \(\lambda_j\) are the eigenvalues (basis of eigenvectors).
Decomposition: \(\mathcal{A} = V \Lambda V^{-1}\)
Sometimes need generalized eigenvectors, which gives the more complicated Jordan form.
Mapping type: \(L(\mathcal{V}, \mathcal{V})\) (over \(\mathbb{C}\)).
Restrict to orthonormal basis.
Canonical form (all matrices): \[
T \mbox{ upper triangular, i.e. } t_{ij} = 0 \mbox{ for } i > j.
\]
Decomposition: \(\mathcal{A} = U T U^*\).
Prefixes of the basis vectors span invariant subspaces.
Mapping type: Quadratic form \(\phi\).
No restriction on basis.
Canonical form: \[
\begin{bmatrix}
I_{\nu_+} & 0 & 0 \\
0 & 0_{\nu_0} & 0 \\
0 & 0 & -I_{\nu_-}
\end{bmatrix}
\] where the triple \(\nu = (\nu_+, \nu_0, \nu_-)\) is the inertia (basis-independent by Sylvester’s law of inertia).
Corresponds to decomposing the space into positive curvature, zero curvature, and negative curvature subspaces.
Mapping type: Quadratic form \(\phi\).
Restrict to orthonormal bases.
Canonical form: \[
\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)
\] where \(\lambda_1 \geq \lambda_2 \geq \ldots\) are eigenvalues and the first \(\nu_+\) of the eigenvalues are positive, the next \(\nu_0\) are zero, and the remaining \(\nu_-\) are negative. The basis is eigenvectors.
Decomposition: \(\phi(x) = x^T Q \Lambda Q^T x\) or \(\phi(Qy) = y^T \Lambda y\).
For \(U\) an orthonormal basis or a unitary matrix (columns are an orthonormal basis for \(\mathbb{C}^n\)): \(\|Ux\| = \|x\|\)
Therefore if \(A = U \Sigma V^*\) is the full SVD: \[ \frac{\|Ax\|_2}{\|x\|_2} = \frac{\|U\Sigma V^* x\|_2}{\|x\|_2} = \frac{\|\Sigma V^* x\|_2}{\|V^* x\|_2}. \]
Therefore, maximizing \(\|Ax\|_2/\|x\|_2\) is equivalent to maximizing \[ \sqrt{\|\Sigma y\|_2^2/\|y\|_2^2} = \sqrt{\sum_j \sigma_j^2 w_j} \] where the weights \(w_j = |y_j|^2/\|y\|_2^2\) are nonnegative and sum to one. The weighted average \(\sum_j \sigma_j^2 w_j\) is maximized by putting all the weight on \(\sigma_1^2\), so \(\|A\|_2 = \sigma_1\).
Similar logic applied to \(A^{-1} = V \Sigma^{-1} U^*\) (for square invertible \(A\)) gives \[ \|A^{-1}\|_2 = \sigma_n^{-1}. \]
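A quick numerical check (my own sketch):

```python
import numpy as np

A = np.random.randn(6, 6)
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, in descending order

print(np.linalg.norm(A, 2), sigma[0])                        # ||A||_2 = sigma_1
print(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / sigma[-1])  # ||A^{-1}||_2 = 1/sigma_n
```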