4  Linear Algebra

We will not get far in this book without a strong foundation in linear algebra. That means knowing how to compute with matrices, but it also means understanding the abstractions of linear algebra well enough to use them effectively. We assume prior familiarity with linear algebra at the level of an undergraduate course taught from Strang (2023) or Lay, Lay, and McDonald (2015). It will be helpful, but not necessary, to have a more advanced perspective as covered in Axler (2024) or even Lax (2007). Even for readers well-versed with linear algebra, we suggest skimming this chapter, as some ideas (quasimatrices, the framework of canonical forms) are not always standard in first (or even second) linear algebra courses.

While our treatment will be fairly abstract, we try to keep things concrete by regularly giving examples in Julia. Readers unfamiliar with Julia may want to review Chapter 2; it is also possible to simply skim past the code examples.

We do not assume a prior course in functional analysis. This will not prevent us from sneaking a little functional analysis into our discussion, or treating some linear algebra concepts from a functional analysis perspective.

4.1 Vector spaces

A vector space (or linear space) over a field \(\mathbb{F}\) is a set with elements (vectors) that can be added or scaled in a sensible way – that is, addition is associative and commutative and scaling (by elements of the field) is distributive. We will always take \(\mathbb{F}\) to be \(\mathbb{R}\) or \(\mathbb{C}\). We generally denote vector spaces by script letters (e.g. \(\mathcal{V}, \mathcal{W}\)), vectors by lower case Roman letters (e.g. \(v, w\)), and scalars by lower case Greek letters (e.g. \(\alpha, \beta\)). But we feel free to violate these conventions according to the dictates of our conscience or in deference to other conflicting conventions.

There are many types of vector spaces. Apart from the ubiquitous concrete vector spaces \(\mathbb{R}^n\) and \(\mathbb{C}^n\), the most common vector spaces in applied mathematics are different types of function spaces. These include \[\begin{aligned} \mathcal{P}_d &= \{ \mbox{polynomials of degree at most $d$} \}; \\ L(\mathcal{V}, \mathcal{W}) &= \{ \mbox{linear maps $\mathcal{V}\rightarrow \mathcal{W}$} \}; \\ \mathcal{C}^k(\Omega) &= \{\mbox{ $k$-times differentiable functions on a set $\Omega$} \}; \\ \ell^1 &= \{ \mbox{absolutely summable sequences } x_1, x_2, \ldots \} \\ \ell^2 &= \{ \mbox{absolutely square-summable sequences } x_1, x_2, \ldots \} \\ \ell^\infty &= \{ \mbox{bounded sequences } x_1, x_2, \ldots \} \end{aligned}\] and many more. A finite-dimensional vector space can be put into 1-1 correspondence with some concrete vector space \(\mathbb{F}^n\) (using a basis). In this case, \(n\) is the dimension of the space. Vector spaces that are not finite-dimensional are infinite-dimensional.

4.1.1 Polynomials

We compute with vectors in \(\mathbb{F}^n\) (\(\mathbb{R}^n\) and \(\mathbb{C}^n\)), which we represent concretely by tuples of numbers in memory, usually stored in sequence. To keep a broader perspective, we will also frequently describe examples involving the polynomial spaces \(\mathcal{P}_d\).

The Julia Polynomials.jl package provides a variety of representations of and operations on polynomials.

using Polynomials

For example, we can construct a polynomial in standard form from its coefficients in the monomial basis or from its roots, or we can keep it in factored form internally.

Polynomial([2, -3, 1])
2 - 3∙x + x²
fromroots([1, 2])
2 - 3∙x + x²
fromroots(FactoredPolynomial, [1, 2])
(x - 1) * (x - 2)

The fact that there are several natural representations for polynomials is part of what makes them a good example of an abstract vector space.

4.1.2 Dual spaces

A vector space in isolation is a lonely thing1. Fortunately, every vector space has an associated dual space of linear functionals2, that is \[ \mathcal{V}^* = \{ \mbox{linear functions $\mathcal{V} \rightarrow \mathbb{F}$} \}. \] Dual pairs are everywhere in applied mathematics. The relation between random variables and probability densities, the theory of Lagrange multipliers, and the notion of generalized functions are all examples. It is also sometimes convenient to recast equations in terms of linear functionals, e.g. using the fact that \(v = 0\) iff \(\forall w^* \in \mathcal{V}^*, w^* v = 0\).

In matrix terms, we usually associate vectors with columns (ordinary vectors) and dual vectors with rows. In Julia, we construct a dual vector by taking the conjugate transpose of a column, e.g.

typeof([1.0; 2.0; 3.0])   # Column vector in R^3
Vector{Float64} (alias for Array{Float64, 1})
typeof([0.0; 1.0; 0.0]')  # Row vector in R^3
Adjoint{Float64, Vector{Float64}}

Applying a linear functional is concretely just “row times column”:

let
    v = [1.0; 2.0; 3.0]   # Column vector in R^3
    w = [0.0; 1.0; 0.0]'  # Row vector in R^3
    w*v
end
2.0

In order to try to keep the syntax uniform between concrete vectors and operations on abstract polynomial spaces, we will define a DualVector type representing a (linear) functional:

struct DualVector
    f :: Function
end

We overload the function evaluation and multiplication syntax so that we can write application of a dual vector \(w^*\) to a vector \(v\) as w(v) or as w*v.

(w :: DualVector)(v) = w.f(v)
Base.:*(w :: DualVector, v) = w(v)

Because dual vectors still make up a vector space, we would like to be able to add and scale them:

Base.:+(w1 :: DualVector, w2 :: DualVector) =
    DualVector(v -> w1(v) + w2(v))
Base.:-(w1 :: DualVector, w2 :: DualVector) =
    DualVector(v -> w1(v) - w2(v))
Base.:-(w :: DualVector) =
    DualVector(v -> -w(v))
Base.:*(c :: Number, w :: DualVector) =
    DualVector(v -> c*w(v))
Base.:*(w :: DualVector, c :: Number) = c*w
Base.:/(w :: DualVector, c :: Number) =
    DualVector(v -> w(v)/c)

Evaluating a polynomial at a point is one example of a functional in \(\mathcal{P}_d^*\); another example is a definite integral operation:

let
    p = Polynomial([2.0; -3.0; 1.0])
    w1 = DualVector(p -> p(0.0))
    w2 = DualVector(p -> integrate(p, -1.0, 1.0))
    w1*p, w2*p
end
(2.0, 4.666666666666666)

4.1.3 Subspaces

Given a vector space \(\mathcal{V}\), a (nonempty3) subset \(\mathcal{U}\subset \mathcal{V}\) is a subspace if it is closed under the vector space operations (addition and scalar multiplication). We usually take \(\mathcal{U}\) to inherit any applicable structure from \(\mathcal{V}\), such as a norm or inner product. Many concepts in numerical linear algebra and approximation theory are best thought of in terms of subspaces, whether that is nested subspaces used in approximation theory, subspaces that are invariant under some linear map, or subspaces and their orthogonal complements.

There are a few standard operations involving subspaces:

  • The span of a subset \(S \subset \mathcal{V}\) is the smallest subspace that contains \(S\), i.e. the set of all finite linear combinations \(\sum_i c_i s_i\) where each \(s_i \in S\) and \(c_i \in \mathbb{F}\).

  • Given a collection of subspaces \(\mathcal{U}_i\), their sum is the subspace \(\mathcal{U}\) consisting of all sums of elements \(\sum_i u_i\) where only finitely many \(u_i \in \mathcal{U}_i\) are nonzero. We say the sum is a direct sum if the decomposition into vectors \(u_i \in \mathcal{U}_i\) is unique.

  • The annihilator for a subspace \(\mathcal{U}\subset \mathcal{V}\) is the set of linear functionals that are zero on \(\mathcal{U}\). That is: \[ \mathcal{U}^\perp = \{ w^* \in \mathcal{V}^* : \forall u \in \mathcal{U}, w^* u = 0 \}. \] In inner product spaces, the annihilator of a subspace is identified with the orthogonal complement (Section 4.1.7). A short code sketch after this list gives a concrete example.

  • Given a subspace \(\mathcal{U}\subset \mathcal{V}\), the quotient space \(\mathcal{V} / \mathcal{U}\) is a vector space whose elements are equivalence classes under the relation \(v_1 \sim v_2\) if \(v_1-v_2 \in \mathcal{U}\). We can always find a “complementary” subspace \(\mathcal{U}^c \subset \mathcal{V}\) isomorphic to \(\mathcal{V}/ \mathcal{U}\) such that each equivalence class in \(\mathcal{V}/ \mathcal{U}\) contains a unique representative from \(\mathcal{U}^c\). Indeed, we can generally find many such spaces! For any complementary subspace \(\mathcal{U}^c\), \(\mathcal{V}\) is a direct sum \(\mathcal{U} \oplus \mathcal{U}^c\).

  • In the infinite-dimensional setting, the closure of a subspace consists of the subspace together with all of its limit points. This only makes sense in topological vector spaces, where we have enough structure to take limits. We will always consider the even stronger structure of normed vector spaces.
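As a small illustration of the annihilator, consider the subspace of even polynomials in \(\mathcal{P}_2\). The functional \(w^*(p) = p(1) - p(-1)\) vanishes on every even polynomial, so it belongs to the annihilator of that subspace. The sketch below (which assumes the DualVector type and Polynomials.jl setup from earlier in the chapter) checks this on a spanning set; it should evaluate to true.

let
    w = DualVector(p -> p(1.0) - p(-1.0))
    even_span = [Polynomial([1.0]), Polynomial([0.0, 0.0, 1.0])]  # span{1, x^2}
    all(iszero(w*p) for p in even_span)
end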

4.1.4 Quasimatrices

Matrix notation is convenient for expressing linear algebra over concrete vector spaces like \(\mathbb{R}^n\) and \(\mathbb{C}^n\). To get that same convenience for abstract vector spaces, we introduce the more general notion of quasimatrices, matrix-like objects that involve more than just ordinary numbers. This includes letting columns or rows belong to an abstract vector space instead of a concrete space like \(\mathbb{R}^m\) or \(\mathbb{R}^n\), or using linear maps between subspaces as the “entries” of a matrix.

4.1.4.1 Column quasimatrices

A (column) quasimatrix is a list of vectors in some space \(\mathcal{V}\) interpreted as columns of a matrix-like object, i.e. \[ U = \begin{bmatrix} u_1 & u_2 & \ldots & u_n \end{bmatrix}. \] We write a linear combination of the columns of \(U\) as \[ Uc = \sum_{i=1}^n u_i c_i \] for some coefficient vector \(c \in \mathbb{F}^n\). The range space \(\mathcal{R}(U) = \{ Uc : c \in \mathbb{F}^n \}\) (or column space) is the set of all possible linear combinations of the columns; this is the same as the span of the set of column vectors. The key advantages of column quasimatrices over sets of vectors are that we assign indices to name the vectors and that we allow for the possibility of duplicates.

When \(\mathcal{V}\) is a concrete vector space like \(\mathbb{R}^m\) or \(\mathbb{C}^m\), the notion of a quasimatrix coincides with the idea of an ordinary matrix, and taking a linear combination of columns corresponds to ordinary matrix-vector multiplication. We will typically use the word “quasimatrix” when the columns correspond to vectors in an abstract vector space like \(\mathcal{P}_d\).

We can represent a quasimatrix with columns in \(\mathcal{P}_d\) by an ordinary Julia array of Polynomials, e.g.

Udemo = let
    p1 = Polynomial([1.0])
    p2 = Polynomial([-1.0, 1.0])
    p3 = Polynomial([2.0, -3.0, 1.0])
    transpose([p1; p2; p3])
end
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(1.0)  Polynomial(-1.0 + 1.0*x)  Polynomial(2.0 - 3.0*x + 1.0*x^2)

We can use these together with standard matrix operations involving concrete vectors, e.g.

Udemo*[1.0 1.0; 1.0 1.0; 1.0 0.0]
1×2 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(2.0 - 2.0*x + 1.0*x^2)  Polynomial(1.0*x)

We also want to be able to multiply a dual vector by a quasimatrix to get a concrete row vector:

Base.:*(w :: DualVector, U :: AbstractMatrix) = map(w, U)

For example,

DualVector(p->p(0.0)) * Udemo
1×3 transpose(::Vector{Float64}) with eltype Float64:
 1.0  -1.0  2.0

We sometimes want to refer to a subset of the columns. In mathematical writing, we will use notation similar to that in Julia or MATLAB: \[ U_{:,p:q} = \begin{bmatrix} u_p & u_{p+1} & \ldots & u_q \end{bmatrix}. \] We will also sometimes (horizontally) concatenate two quasimatrices with columns in the same \(\mathcal{V}\), e.g. \[ U = \begin{bmatrix} U_{:,1:p} & U_{:,p+1:n} \end{bmatrix}. \] The range of the horizontal concatenation of two quasimatrices is the same as the sum of the ranges, i.e. \[ \mathcal{R}(U) = \mathcal{R}(U_{:,1:p}) + \mathcal{R}(U_{:,p+1:n}). \]
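As a quick check (using the Udemo quasimatrix defined above), column selection and horizontal concatenation work just as they do for ordinary matrices; the following expression should evaluate to true.

[Udemo[:, 1:2] Udemo[:, 3:3]] == Udemo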

4.1.4.2 Row quasimatrices

Every vector space \(\mathcal{V}\) over a field \(\mathbb{F}\) has a natural dual space \(\mathcal{V}^*\) of linear functions from \(\mathcal{V}\) to \(\mathbb{F}\) (aka linear functionals). A row quasimatrix is a list of vectors in a dual space \(\mathcal{V}^*\) interpreted as the rows of a matrix-like object, i.e. \[ W^* = \begin{bmatrix} w_1^* \\ w_2^* \\ \vdots \\ w_m^* \end{bmatrix}. \] We write linear combinations of rows as \[ c^* W^* = \sum_{i=1}^m \bar{c}_i w_i^*; \] the set of all such linear combinations is the row space of the quasimatrix. We can refer to a subset of rows with colon notation as in Julia or MATLAB: \[ (W^*)_{p:q,:} = \begin{bmatrix} w_p^* \\ w_{p+1}^* \\ \vdots \\ w_q^* \end{bmatrix}. \] We can also combine row quasimatrices via horizontal concatenation, e.g. \[ W^* = \begin{bmatrix} (W^*)_{1:p,:} \\ (W^*)_{p+1:m,:} \end{bmatrix} . \]

In Julia, analogous to the column quasimatrices over spaces \(\mathcal{P}_d\), we can represent row quasimatrices over spaces \(\mathcal{P}_d^*\) by ordinary Julia arrays of DualVector objects.

Wdemo = DualVector.(
    [p -> p(0.0);
     p -> p(1.0);
     p -> integrate(p, -1.0, 1.0)])
3-element Vector{DualVector}:
 DualVector(var"#27#30"())
 DualVector(var"#28#31"())
 DualVector(var"#29#32"())

4.1.4.3 Duality relations

For a column quasimatrix \(U\) with columns in \(\mathcal{V}\), the map \(c \mapsto Uc\) takes a concrete vector \(c \in \mathbb{F}^n\) to an abstract vector in \(\mathcal{V}\). Alternately, if \(w^* \in \mathcal{V}^*\) is a linear functional, we have \[ d^* = w^* U = \begin{bmatrix} w^* u_1 & w^* u_2 & \ldots & w^* u_n \end{bmatrix}, \] representing a concrete row vector such that \(d^* c = w^* (Uc)\). So \(U\) can be associated either with a linear map from a concrete vector space \(\mathbb{F}^n\) to an abstract vector space \(\mathcal{V}\), or with a linear map from the abstract dual space \(\mathcal{V}^*\) to a concrete (row) vector space associated with linear functionals on \(\mathbb{F}^n\).

# Example: map concrete [1; 1; 1] to a Polynomial
Udemo*[1; 1; 1]
2.0 - 2.0∙x + 1.0∙x²
# Example: map DualVector to a concrete row
DualVector(p->p(0))*Udemo
1×3 transpose(::Vector{Float64}) with eltype Float64:
 1.0  -1.0  2.0

Similarly, a row quasimatrix \(W^*\) can be interpreted as a map from a concrete row vector to an abstract dual vector in \(\mathcal{V}^*\), or as a map from an abstract vector in \(\mathcal{V}\) to a concrete vector in \(\mathbb{F}^m\).

# Example: map concrete row to an abstract dual vector
typeof([1.0; 1.0; 1.0]'*Wdemo)
DualVector
# Example: map Polynomial to a concrete vector
Wdemo.*Polynomial([2.0, -3.0, 1.0])
3-element Vector{Float64}:
 2.0
 0.0
 4.666666666666666

Composing the actions of a row quasimatrix with a column quasimatrix gives us an ordinary matrix representing a mapping between two abstract vector spaces: \[ W^* U = \begin{bmatrix} w_1^* u_1 & \ldots & w_1^* u_n \\ \vdots & & \vdots \\ w_m^* u_1 & \ldots & w_m^* u_n \end{bmatrix} \] For example,

Wdemo*Udemo
3×3 Matrix{Float64}:
 1.0  -1.0  2.0
 1.0   0.0  0.0
 2.0  -2.0  4.66667

Conversely, composing the actions of a column quasimatrix with a row quasimatrix (assuming \(m = n\)) gives us an operator mapping \(\mathcal{V}\) to \(\mathcal{V}\): \[ U W^* = \sum_{k=1}^n u_k w_k^*. \] In Julia, we can write this mapping as

UWdemo(p) = Udemo*(Wdemo.*p)
UWdemo (generic function with 1 method)

These two compositions correspond respectively to the inner product and outer product interpretations of matrix-matrix multiplication.

4.1.4.4 Infinite quasimatrices

We may (rarely) discuss “infinite” quasimatrices with a countable number of columns. We will be analysts about it instead of algebraists. In the case of column quasimatrices:

  • We assume the columns lie in some complete vector space.

  • When we take a linear combination \(Uc\), we assume the sequence \(c\) is restricted to make sense of \[ Uc = \lim_{n\rightarrow \infty} U_{:,1:n} c_{1:n}. \] Usually, this will mean \(c\) is bounded (\(\ell^\infty\)), absolutely summable (\(\ell^1\)), or absolutely square summable (\(\ell^2\)).

  • We take the “range” of such a quasimatrix to be the set of allowed limits of linear combinations.

We can similarly define infinite versions of row-wise quasimatrices.

4.1.5 Bases

A basis set for a finite-dimensional vector space \(\mathcal{V}\) is a finite set \(S \subset \mathcal{V}\) that spans \(\mathcal{V}\) and whose elements are linearly independent; that is, the only linear combination of elements of \(S\) equal to zero is the trivial one with all coefficients zero. We can equivalently define a basis quasimatrix \(U\) such that \(\mathcal{R}(U) = \mathcal{V}\) and the mapping \(c \mapsto Uc\) is one-to-one. That is, we think of a basis quasimatrix as an invertible linear mapping from a concrete vector space \(\mathbb{F}^n\) to an abstract space \(\mathcal{V}\). While we may refer to either the set or the quasimatrix as “a basis” when terminology is unambiguous, by default we use quasimatrices.

Thinking of dual spaces as consisting of rows, we generally represent the basis of \(\mathcal{V}^*\) by a row quasimatrix \(W^*\). The map \(d^* \mapsto d^* W^*\) is then an invertible linear map from the set of concrete row vectors to \(\mathcal{V}^*\). If \(U\) and \(W^*\) are bases of \(\mathcal{V}\) and \(\mathcal{V}^*\), the matrix \(X = W^* U\) represents the composition of two invertible maps (from \(\mathbb{F}^n\) to \(\mathcal{V}\) and back via the two bases), and so is invertible. The map \(U^{-1} = X^{-1} W^*\) is the dual basis associated with \(U\), for which \(U^{-1} U\) is the \(n\)-by-\(n\) identity matrix and \(U U^{-1}\) is the identity map on \(\mathcal{V}\).
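As a concrete check (using the Udemo and Wdemo quasimatrices from above, with Udual a hypothetical name for the dual basis), we can form \(X = W^* U\), build the rows of \(X^{-1} W^*\) as DualVectors, and verify that they behave like \(U^{-1}\); the expression should evaluate to true.

let
    Xmat = Wdemo*Udemo                          # X = W*U (invertible)
    Udual = [inv(Xmat)[i,:]'*Wdemo for i=1:3]   # rows of X⁻¹W* as DualVectors
    p = Polynomial([2.0, -3.0, 1.0])
    Udemo*(Udual.*p) ≈ p &&                     # UU⁻¹ acts as the identity on P_2
        [Udual[i]*Udemo[j] for i=1:3, j=1:3] ≈ I(3)  # U⁻¹U is the 3-by-3 identity
end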

If \(U\) and \(\hat{U} = UX\) are two bases for \(\mathcal{V}\), a vector can be written as \(v = Uc\) or \(v = \hat{U} \hat{c}\) where \(\hat{c} = X^{-1} c\). Because we use \(X\) to change from the \(U\) basis to the \(\hat{U}\) basis and use \(X^{-1}\) to change from the \(c\) coefficients to the \(\hat{c}\) coefficients, we say the coefficient vector is contravariant (it changes in contrast to the basis). In contrast, an abstract dual vector \(w^* \in \mathcal{V}^*\) can be written in terms of the \(U\) and \(\hat{U}\) bases with the concrete row vectors \(d^* = w^* U\) or \(\hat{d}^* = w^* \hat{U} = d^* X\). Because the transformations from \(U\) to \(\hat{U}\) and from \(d^*\) to \(\hat{d}^*\) both involve the same operation with \(X\), we say this coefficient vector is covariant (it changes together with the basis).
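A small numerical illustration of contravariance, again using the Udemo quasimatrix and an arbitrarily chosen change of basis \(X\): the coefficients transform by \(X^{-1}\) while the basis transforms by \(X\), and both representations give the same polynomial. The expression should evaluate to true.

let
    X = [1.0 1.0 0.0;
         0.0 1.0 1.0;
         0.0 0.0 1.0]        # change of basis matrix
    Uhat = Udemo*X           # Û = UX
    c = [1.0; 2.0; 3.0]
    chat = X\c               # ĉ = X⁻¹c (contravariant transformation)
    Udemo*c ≈ Uhat*chat      # same abstract vector in either representation
end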

4.1.5.1 Component conventions

We will generally follow the convention common in many areas of numerical analysis: a vector \(v \in \mathcal{V}\) is associated with a column, a dual vector \(w^* \in \mathcal{V}^*\) is associated with a row, and writing a dual vector followed by a vector (\(w^* v\)) indicates evaluation. However, there are some shortcomings to this convention, such as situations where we want to apply a dual vector to a vector but some other notational convention prevents us from writing the two symbols adjacent to each other. This issue becomes particularly obvious in multilinear algebra (tensor algebra), as we will see in Section 4.5.

An alternate convention that addresses some of these issues is frequently used in areas of physics where tensors are particularly common. In this convention (“Einstein notation”), vectors with respect to a basis \[ V = \begin{bmatrix} \mathbf{v}_1 & \ldots & \mathbf{v}_n \end{bmatrix} \] are written as \[ \mathbf{v} = \mathbf{v}_i c^i \] where the summation convention is in effect, i.e. there is an implicit summation over repeated indices. Dual vectors are written with respect to the basis \[ V^{-1} = \begin{bmatrix} \mathbf{w}^1 \\ \vdots \\ \mathbf{w}^n \end{bmatrix} \] as \[ \mathbf{w}^* = d_i \mathbf{w}^i. \] Therefore \[ \mathbf{w}^* \mathbf{v} = d_i c^i. \] Note that ordering of the symbols in the product makes no difference in this notation: \(d_i c^i\) and \(c^i d_i\) indicate the same scalar value.

The convention of using superscripts to denote vector coefficients (contravariant coefficients) and subscripts to denote dual coefficients (covariant coefficients) gives us a mechanism for “type checking” expressions: summation over a repeated index should involve one superscript (a contravariant index) and one subscript (a covariant index). At the same time, many authors who follow the summation convention only use subscript indices in order to save the superscript space for other purposes. Caveat lector!4

Indices are useful for matching functions (dual vectors) and arguments (vectors) without explicit reference to position in an expression. At the same time, indicial notation presumes a basis, with particularly basis-centric authors sometimes writing statements like “a vector is an array that transforms like a vector.” While we usually use bases in our computation, we prefer a basis-independent perspective for mathematical arguments when possible. With this said, we will revert to indicial notation when the situation merits it.

4.1.5.2 Polynomial bases

As an example of different choices of bases, we consider the polynomial spaces \(\mathcal{P}_d\). A common choice of basis for \(\mathcal{P}_d\) is the power basis \[ X_{0:d} = \begin{bmatrix} 1 & x & x^2 & \ldots & x^d \end{bmatrix}. \] That is, we write polynomials \(p \in \mathcal{P}_d\) in terms of the basis as \[ p = X_{0:d} c = \sum_{j=0}^d c_j x^j. \] A common basis for \(\mathcal{P}_d^*\) is a basis of point evaluation functionals, i.e. \[ Y^* p = \begin{bmatrix} p(x_0) \\ p(x_1) \\ \vdots \\ p(x_d) \end{bmatrix} \] for distinct points \(x_0, x_1, \ldots, x_d\). The matrix \(Y^* X_{0:d}\) is the Vandermonde matrix \[ V = \begin{bmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^d \\ 1 & x_1 & x_1^2 & \ldots & x_1^d \\ 1 & x_2 & x_2^2 & \ldots & x_2^d \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_d & x_d^2 & \ldots & x_d^d \end{bmatrix}. \] The basis \(V^{-1} Y^*\) is the dual basis associated with the monomial basis. For example, for quartic polynomials sampled at equally spaced points on \([-1, 1]\), we can write

let
    p = Polynomial([1.0; 2.0; 3.0; 4.0; 5.0])
    X = range(-1.0, 1.0, length=5)
    V = [xi^j for xi in X, j in 0:4]
    c = V\p.(X)
end
5-element Vector{Float64}:
 1.0
 2.0
 3.0
 4.0
 5.0

That is, if \(p(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4\) and \(p_X\) denotes the vector of samples \(p(-1), p(-0.5), p(0), p(0.5), p(1)\), then \(c = V^{-1} p_X\). This is, alas, not a particularly good numerical approach to recovering a polynomial from point values, a topic that we will return to in Chapter 10.

The power basis is not the only basis for \(\mathcal{P}_d\). Other common choices include Newton or Lagrange polynomials with respect to a set of points. Along with Vandermonde matrices, we will discuss these topics in Chapter 10. We will sometimes use the (first kind) Chebyshev5 polynomial basis \(T_{0:d}\) with columns \(T_0(x), \ldots, T_d(x)\) given by the recurrence \[\begin{aligned} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{j+1}(x) &= 2 x T_{j}(x) - T_{j-1}(x), \quad j \geq 1. \end{aligned}\] The Chebyshev polynomials can also be written on \([-1, 1]\) in terms of trigonometric functions: \[ T_m(\cos(\theta)) = \cos(m \theta). \] This connection between Chebyshev polynomials and trigonometric functions is part of their utility, and allows us to use transform methods for working with Chebyshev polynomials (Chapter 19).
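A quick numerical spot check of the identity \(T_m(\cos\theta) = \cos(m\theta)\), using the ChebyshevT type from Polynomials.jl (here \(T_3\) at an arbitrarily chosen angle); the expression should evaluate to true.

let
    T3 = ChebyshevT([0.0, 0.0, 0.0, 1.0])  # coefficient vector e_4 picks out T_3
    θ = 0.3
    T3(cos(θ)) ≈ cos(3θ)
end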

The related second kind Chebyshev polynomial basis \(U_{0:d}\) satisfies \[\begin{aligned} U_0(x) &= 1 \\ U_1(x) &= 2x \\ U_{j+1}(x) &= 2x U_j(x) - U_{j-1}(x), \quad j \geq 1. \end{aligned}\] These polynomials satisfy \[ U_m(\cos(\theta)) \sin(\theta) = \sin((m+1) \theta). \] These polynomials are useful in part because of the derivative relationship \(T_m'(x) = m U_{m-1}(x)\).

In matrix terms, we can write the relationship between the Chebyshev polynomials and the power basis as \(T_{0:d} = X_{0:d} M\) where the matrix \(M\) is computed via the three-term recurrence:

function chebyshev_power_coeffs(d, kind=1)
    n = max(d+1, 2)
    M = zeros(n, n)
    M[1,1] = 1                          # p0(x) = 1
    M[2,2] = kind                       # p1(x) = kind*x
    for j = 2:d                         # p_{j+1}(x) =
        M[2:end,j+1] .= 2*M[1:end-1,j]  #   2x p_j(x)
        M[:,j+1] .-= M[:,j-1]           #   -p_{j-1}(x)
    end
    UpperTriangular(M[1:d+1,1:d+1])
end
chebyshev_power_coeffs (generic function with 2 methods)

For example, for the quadratic case, we have

chebyshev_power_coeffs(2)
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.0  -1.0
  ⋅   1.0   0.0
  ⋅    ⋅    2.0

and the coefficient vectors \(d\) and \(c\) with respect to the Chebyshev and power bases are such that \(p = T_{0:d} d = X_{0:d} Md = X_{0:d} c\), so \(Md = c\). For example, to write the polynomial \(p(x) = 2 - 3x + x^2\) in the Chebyshev basis, we solve the triangular linear system:

chebyshev_power_coeffs(2) \ [2.0; -3.0; 1.0]
3-element Vector{Float64}:
  2.5
 -3.0
  0.5

If we would like to form a representation of the Chebyshev basis, we could use the power basis expansion

function chebyshev_basis(d, kind=1)
    M = chebyshev_power_coeffs(d, kind)
    transpose([Polynomial(M[1:j,j]) for j=1:d+1])
end

chebyshev_basis(2)           
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(1.0)  Polynomial(1.0*x)  Polynomial(-1.0 + 2.0*x^2)

However, the Chebyshev polynomials are natively supported in the Polynomials.jl package, and we can explicitly write polynomials in terms of Chebyshev polynomials. For example,

ChebyshevT([2.5, -3.0, 0.5])
2.5⋅T_0(x) - 3.0⋅T_1(x) + 0.5⋅T_2(x)
convert(Polynomial, ChebyshevT([2.5, -3.0, 0.5]))
2.0 - 3.0∙x + 1.0∙x²

The coefficients satisfy exactly the equations from above.

The bases \(T_{0:d}\) and \(U_{0:d}\) are also both bases for \(\mathcal{P}_d\), so there must be a change of basis matrix that converts between the two. We can look up or derive that \[ T_n(x) = \frac{1}{2} \left( U_n(x) - U_{n-2}(x) \right), \] or, in terms of a matrix representation of the change of basis, \[ T_{0:d} = U_{0:d} B \] where \(B\) is the upper triangular matrix \[ B = \begin{bmatrix} 1 & & -\frac{1}{2} \\ & \frac{1}{2} & & -\frac{1}{2} \\ & & \frac{1}{2} & & \ddots \\ & & & \frac{1}{2} & \ddots & -\frac{1}{2} \\ & & & & \ddots & \\ & & & & & \frac{1}{2} \end{bmatrix}. \] Alternately, in code we have

function change_U_to_T(d)
    B = Matrix(0.5*I, d+1, d+1)
    B[1,1] = 1.0
    for j = 3:d+1
        B[j-2,j] = -0.5
    end
    B
end
change_U_to_T (generic function with 1 method)

That is, we should have

chebyshev_basis(5) ==
    chebyshev_basis(5,2) * change_U_to_T(5)
true

In addition to the Chebyshev polynomials, we will sometimes want the Legendre polynomial basis \(P_{0:d}(x)\) with columns \(P_j(x)\) given by the recurrence \[\begin{aligned} P_0(x) &= 1 \\ P_1(x) &= x \\ (j+1) P_{j+1}(x) &= (2j+1) x P_j(x) - j P_{j-1}(x). \end{aligned}\] As with the Chebyshev polynomials, we can define an upper triangular matrix whose columns are coefficients for the Legendre polynomials in the power basis:

function legendre_power_coeffs(d)
    n = max(d+1, 2)
    M = zeros(n, n)
    M[1,1] = 1
    M[2,2] = 1
    for j = 2:d
        M[2:end,j+1] .= (2*j-1)*M[1:end-1,j]
        M[:,j+1] .-= (j-1)*M[:,j-1]
        M[:,j+1] ./= j
    end
    UpperTriangular(M[1:d+1,1:d+1])
end
legendre_power_coeffs (generic function with 1 method)

If we want a representation in terms of a list of Polynomial objects (representing the basis polynomials), we can simply convert the columns:

function legendre_basis(d)
    M = legendre_power_coeffs(d)
    transpose([Polynomial(M[1:j,j]) for j=1:d+1])
end

legendre_basis(2)           
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(1.0)  Polynomial(1.0*x)  Polynomial(-0.5 + 1.5*x^2)

Each basis we have discussed has a different use case. Sometimes the “obvious” choice of basis (e.g. the standard basis in \(\mathbb{R}^n\) or the power basis in \(\mathcal{P}_d\)) is not the best choice for numerical computations.

4.1.5.3 Block bases

Consider the direct sum \[ \mathcal{V}= \mathcal{U}_1 \oplus \mathcal{U}_2 \oplus \ldots \oplus \mathcal{U}_p. \] Given basis quasimatrices \(U_j\) for the spaces \(\mathcal{U}_j\), there is an associated basis quasimatrix \(V\) for \(\mathcal{V}\) defined by horizontal concatenation: \[ V = \begin{bmatrix} U_1 & U_2 & \ldots & U_p \end{bmatrix}. \] Conversely, we can think of partitioning the columns of a basis \(V\) for \(\mathcal{V}\) into disjoint subsets; then each subset is itself a basis for some subspace, and \(\mathcal{V}\) is a direct sum of those subspaces. This sort of “block thinking” will generally be useful in our treatment of numerical linear algebra.
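For example, \(\mathcal{P}_3\) is the direct sum of the even and odd polynomials of degree at most 3. A minimal sketch (with ad hoc names Veven and Vodd) concatenates bases for the two subspaces into a basis for \(\mathcal{P}_3\); the check should evaluate to true.

let
    Veven = [Polynomial([1.0]) Polynomial([0.0, 0.0, 1.0])]             # 1, x^2
    Vodd  = [Polynomial([0.0, 1.0]) Polynomial([0.0, 0.0, 0.0, 1.0])]   # x, x^3
    V = [Veven Vodd]              # basis quasimatrix for P_3
    V*[1.0; 2.0; 3.0; 4.0] == [Polynomial([1.0, 3.0, 2.0, 4.0])]  # 1 + 3x + 2x^2 + 4x^3
end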

4.1.5.4 Infinite bases

The notion of a basis can be generalized to the case of an infinite-dimensional space \(\mathcal{V}\) in two ways. The algebraist’s version (the Hamel basis) is a set of linearly independent vectors such that any vector is a finite linear combination of basis vectors. Such a basis always exists if we assume the axiom of choice, but is not particularly useful for our purposes. More useful for topics such as approximation theory and probability is the analyst’s notion of a basis: an infinite quasimatrix that represents a map between some sequence space (e.g. \(\ell^1\), \(\ell^2\), or \(\ell^\infty\)) and an abstract vector space.

4.1.6 Vector norms

A norm \(\|\cdot\|\) measures vector lengths. It is positive definite, (absolutely) homogeneous, and sub-additive: \[\begin{aligned} \|v\| & \geq 0 \mbox{ and } \|v\| = 0 \mbox{ iff } v = 0 \\ \|\alpha v\| &= |\alpha| \|v\| \\ \|u+v\| & \leq \|u\| + \|v\|. \end{aligned}\] The three most common vector norms we work with in \(\mathbb{R}^n\) are the Euclidean norm (aka the 2-norm), the \(\infty\)-norm (or max norm), and the \(1\)-norm: \[\begin{aligned} \|v\|_2 &= \sqrt{\sum_j |v_j|^2} \\ \|v\|_\infty &= \max_j |v_j| \\ \|v\|_1 &= \sum_j |v_j| \end{aligned}\] These are all available via Julia’s norm function:

let
    x = [3.0; 4.0]
    norm(x),      # Euclidean norm (also written norm(x,2))
    norm(x,Inf),  # Max norm
    norm(x,1)     # 1-norm
end
(5.0, 4.0, 7.0)

Many other norms can be related to one of these three norms. In particular, a “natural” norm in an abstract vector space will often look strange in the corresponding concrete representation with respect to some basis. For example, consider the vector space of polynomials with degree at most 2 on \([-1,1]\). This space also has a natural Euclidean norm, max norm, and \(1\)-norm; for a given polynomial \(p(x)\) these are \[\begin{aligned} \|p\|_2 &= \sqrt{ \int_{-1}^1 |p(x)|^2 \, dx } \\ \|p\|_\infty &= \max_{x \in [-1,1]} |p(x)| \\ \|p\|_1 &= \int_{-1}^1 |p(x)| \, dx. \end{aligned}\] These norms are not built into the Polynomials package, but they are not difficult to compute for real polynomials:

real_roots(p) = sort(real(filter(isreal, roots(p))))
real_roots(p, a, b) = filter(x -> (a <= x <= b),
                             real_roots(p))

normL2(p) = sqrt(integrate(p*p, -1, 1))

function normL∞(p)
    nodes = [-1.0; real_roots(derivative(p), -1, 1); 1.0]
    maximum(abs.(p.(nodes)))
end

function normL1(p)
    nodes = [-1.0; real_roots(p, -1, 1); 1.0]
    sum(abs(integrate(p, nodes[j], nodes[j+1]))
        for j = 1:length(nodes)-1)
end

let
    p = Polynomial([-0.5, 0.0, 1.0])
    normL2(p),
    normL∞(p),
    normL1(p)
end
(0.483045891539648, 0.5, 0.60947570824873)

The same norms can also be approximated from samples using the midpoint rule (a quadrature rule discussed in a later chapter):

let
    n = 100
    p = Polynomial([-0.5, 0.0, 1.0])
    X = range(-1.0, 1.0, length=n+1)
    X = (X[1:end-1] + X[2:end])/2
    pX = p.(X)
    norm(pX)*sqrt(2/n),
    norm(pX, Inf),
    norm(pX, 1)*2/n
end
(0.48297688971626773, 0.4999, 0.6093599999999999)

Of course, when we write \(p(x)\) in terms of the coefficient vector with respect to the power basis (for example), the max norm of the polynomial is not the same as the max norm of the coefficient vector.

In a finite-dimensional vector space, all norms are equivalent: that is, if \(\|\cdot\|\) and \({\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert \cdot \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}\) are two norms on the same finite-dimensional vector space, then there exist constants \(c\) and \(C\) such that for any \(v\) in the space, \[ c \|v\| \leq {\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert v \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert} \leq C \|v\|. \] Of course, there is no guarantee that the constants are small! In infinite-dimensional spaces, not all norms are equivalent.
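As a small numerical illustration in \(\mathbb{R}^{10}\), here is one chain of equivalence inequalities relating the three norms above, \(\|v\|_\infty \leq \|v\|_2 \leq \|v\|_1 \leq n \|v\|_\infty\); the check holds for any vector, so the expression should evaluate to true.

let
    v = randn(10)
    norm(v, Inf) <= norm(v, 2) <= norm(v, 1) <= 10*norm(v, Inf)
end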

A normed vector space is a metric space, and so we can use it to define things like Cauchy sequences and limits. In a finite-dimensional normed space \(\mathcal{V}\), any Cauchy sequence \(\{v_k\}_{k=1}^\infty\) converges to a limit \(v_* \in \mathcal{V}\). A normed infinite-dimensional vector space where Cauchy sequences converge (i.e. a space that is complete under the norm) is called a Banach space.

For any Banach space \(\mathcal{V}\), we can define the dual norm for continuous functionals \(w^*\) in the dual space \(\mathcal{V}^*\) by \[ \|w^*\| = \sup_{v \neq 0} \frac{w^* v}{\|v\|} = \sup_{\|v\| = 1} w^* v. \] In the finite-dimensional case, the supremum is always achieved, i.e. for any \(w^*\) there exists a \(v\) of unit norm (\(\|v\|=1\)) such that \(w^* v = \|w^*\|\). Conversely, for any \(v\) there exists a \(w^*\) of unit norm such that \(w^* v = \|v\|\). The \(\infty\)-norm and the \(1\)-norm are dual norms of each other, and the Euclidean norm is its own dual.
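For example, the fact that the \(1\)-norm is dual to the \(\infty\)-norm can be seen concretely: for a row vector \(w^*\), the vector \(v\) with entries \(\pm 1\) matching the signs of \(w^*\) has unit \(\infty\)-norm and achieves \(w^* v = \|w^*\|_1\). A quick check, which should evaluate to true:

let
    w = [1.0; -2.0; 3.0]'   # a dual vector (row)
    v = sign.(w')           # ‖v‖_∞ = 1, aligned with the signs of w
    w*v ≈ norm(w', 1)
end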

4.1.7 Inner products

An inner product \(\langle \cdot, \cdot \rangle\) is a function that maps two vectors to a real number (or a complex number for a complex vector space). It is positive definite, linear in the first slot, and symmetric (or Hermitian in the case of complex vectors); that is: \[\begin{aligned} \langle v, v \rangle & \geq 0 \mbox{ and } \langle v, v \rangle = 0 \mbox{ iff } v = 0 \\ \langle \alpha u, w \rangle &= \alpha \langle u, w \rangle \mbox{ and } \langle u+v, w \rangle = \langle u, w \rangle + \langle v, w \rangle \\ \langle u, v \rangle &= \overline{\langle v, u \rangle}, \end{aligned}\] where the overbar in the latter case corresponds to complex conjugation. The standard inner product on \(\mathbb{C}^n\) is \[ x \cdot y = y^* x = \sum_{j=1}^n \bar{y}_j x_j. \] A vector space with an inner product is sometimes called an inner product space or a Euclidean space.

4.1.7.1 The Riesz map

The inner product defines an (anti)linear map (the Riesz map) from vectors \(v\) to dual vectors \(v^*\) via \(v^* u = \langle u, v \rangle\). In terms of the Julia code in this chapter,

riesz(dot) = v -> DualVector(u -> dot(u, v))

According to the Riesz representation theorem, in finite-dimensional spaces, this defines a 1-1 correspondence between vectors and dual vectors. For \(\mathbb{R}^n\) or \(\mathbb{C}^n\) under the standard inner product, the Riesz map is simply the (conjugate) transpose: \[ \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \mapsto \begin{bmatrix} \bar{c}_1 & \bar{c}_2 & \ldots & \bar{c}_n \end{bmatrix}. \] Going beyond single vectors at a time, we can think of the Riesz map as inducing a mapping from column quasimatrices to row quasimatrices.
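A quick check of this correspondence in \(\mathbb{R}^3\), using the riesz helper above with Julia's standard dot (which agrees with the conjugate transpose action on concrete real vectors); the expression should evaluate to true.

let
    v = [1.0; 2.0; 3.0]
    u = [4.0; 5.0; 6.0]
    vstar = riesz(dot)(v)   # dual vector u ↦ ⟨u, v⟩
    vstar*u == v'*u
end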

In infinite-dimensional inner product spaces, we have a 1-1 correspondence between vectors and continuous dual vectors (which is all dual vectors for a complete inner product space). A (not-necessarily finite-dimensional) Euclidean space that is complete under the Euclidean norm is called a Hilbert space.

4.1.7.2 The Gram matrix

Suppose \(\mathcal{V}\) is an arbitrary space with an inner product and \(V\) is a basis for that space. Then for vectors expressed in that basis, the inner product is \[\begin{aligned} \langle Vc, Vd \rangle &= \left\langle \sum_j v_j c_j, \sum_i v_i d_i \right\rangle \\ &= \sum_{i,j} \bar{d}_i c_j \langle v_j, v_i \rangle \\ &= d^* M c = \langle c, d \rangle_M, \end{aligned}\] where \(M = V^* V\) is the Gram matrix whose entries are inner products of basis elements. In Julia code, we have

gram(dot, V) = Symmetric([dot(vj, vi) for vi in V[:], vj in V[:]])

The Gram matrix is Hermitian and (because \(V\) is a basis) positive definite. We call the expression \(d^* M c\) the \(M\)-inner product between the concrete vectors \(c\) and \(d\).

We can also write the Riesz map in terms of \(V\) and \(M\). Given \(w^* \in \mathcal{V}^*\), the Riesz map is \(w = V M^{-1} (w^* V)^*\); we can verify this based on the expression of the \(\mathcal{V}\)-inner product as an \(M\)-inner product over coefficient vectors: \[ \langle u, w \rangle = \langle V^{-1} u, M^{-1} (w^* V)^* \rangle_M = w^* V M^{-*} M V^{-1} u = w^* u. \]

4.1.7.3 Euclidean norms

Every inner product defines a corresponding norm \[ \|v\| = \sqrt{ \langle v, v \rangle}. \] The inner product can also be recovered from the norm. In general, we can expand \[\begin{aligned} \|u+v\|^2 &= \langle u + v, u + v \rangle \\ &= \langle u, u \rangle + \langle v, u \rangle + \langle u, v \rangle + \langle v, v \rangle \\ &= \|u\|^2 + 2 \Re \langle u, v \rangle + \|v\|^2 \end{aligned}\] Therefore \[\begin{aligned} \Re \langle u, v \rangle &= \frac{1}{2} \left( \|u+v\|^2 - \|u\|^2 - \|v\|^2 \right) \\ \Im \langle u, v \rangle = \Re \langle u, iv \rangle &= \frac{1}{2} \left( \|u+iv\|^2 - \|u\|^2 - \|v\|^2 \right). \end{aligned}\] Of course, for real vector spaces only the first equation is needed.
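A numerical check of these polarization identities on \(\mathbb{C}^2\). Since Julia's dot conjugates its first argument, we take \(\langle u, v \rangle\) to be dot(v, u) to match the convention above (linear in the first slot); the expression should evaluate to true.

let
    u = [1.0 + 2.0im; 0.5]
    v = [2.0 - 1.0im; 1.0im]
    ip = dot(v, u)          # ⟨u, v⟩ in the convention above
    real(ip) ≈ (norm(u+v)^2 - norm(u)^2 - norm(v)^2)/2 &&
        imag(ip) ≈ (norm(u + im*v)^2 - norm(u)^2 - norm(v)^2)/2
end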

4.1.7.4 Angles and orthogonality

The inner product and the associated norm satisfy the Cauchy-Schwarz inequality \[ |\langle u, v \rangle| \leq \|u\| \|v\|. \] We define the cosine of the angle between nonzero real vectors \(u\) and \(v\) to be \[ \cos \theta = \frac{\langle u, v \rangle}{\|u\| \|v\|}. \] Because of the Cauchy-Schwarz inequality, the cosine is always at most one in absolute value. For a real inner product space, the expansion of \(\|u+v\|^2\) and the definition of the cosine give us the law of cosines: for any nonzero \(u\) and \(v\), \[ \|u+v\|^2 = \|u\|^2 + 2 \|u\| \|v\| \cos(\theta) + \|v\|^2. \] We say two vectors \(u\) and \(v\) are orthogonal with respect to an inner product if \(\langle u, v \rangle = 0\). If \(u\) and \(v\) are orthogonal, we have the Pythagorean theorem: \[ \|u+v\|^2 = \|u\|^2 + \|v\|^2. \]

If \(\mathcal{U}\) is any subspace of an inner product space \(\mathcal{V}\), we can define the orthogonal complement \(\mathcal{U}^\perp\): \[ \mathcal{U}^\perp = \{ v \in \mathcal{V}: \forall u \in \mathcal{U}, \langle u, v \rangle = 0 \}. \] We can always write \(\mathcal{V}\) as the direct sum \(\mathcal{U}\oplus \mathcal{U}^\perp\). In other words, the orthogonal complement is isomorphic to the quotient space \(\mathcal{V}/ \mathcal{U}\), and in inner product spaces these two are often identified with each other.

Vectors that are mutually orthogonal and have unit length in the Euclidean norm are said to be orthonormal. A basis \(V\) for an \(n\)-dimensional space \(\mathcal{V}\) is an orthonormal basis if all its columns are orthonormal. In this case, \(V^* V\) is the \(n\)-by-\(n\) identity matrix and \(V V^*\) is the identity map on \(\mathcal{V}\). That is, for an orthonormal basis quasimatrix, the Riesz map and the dual basis are the same, i.e. \(V^* = V^{-1}\).

4.1.7.5 Einstein notation

For physicists working with Einstein notation on real vector spaces, the Gram matrix (aka the metric tensor) is sometimes written in terms of the components \(g_{ij} = \langle \mathbf{v}_j, \mathbf{v}_i \rangle\); the inverse of the Gram matrix is written with components \(g^{ij}\). In this convention, the inverse Riesz map from \(\mathcal{V}^*\) to \(\mathcal{V}\) is associated with using the metric tensor to “raise an index”: \[ v^i = g^{ij} w_j. \] Conversely, the Riesz map from \(\mathcal{V}\) to \(\mathcal{V}^*\) is associated with using the metric tensor to “lower an index”: \[ w_i = g_{ij} v^j. \] And the Euclidean norm is written as \[ \|\mathbf{v}\|^2 = g_{ij} v^i v^j. \] When an orthonormal basis is used, the Gram matrix is just the identity, and raising or lowering indices appears to be a purely formal matter.

4.1.7.6 The \(L^2\) inner product

Returning to our example of a vector space of polynomials, the standard \(L^2([-1,1])\) inner product is \[ \langle p, q \rangle_{L^2([-1,1])} = \int_{-1}^1 p(x) \bar{q}(x) \, dx. \] In Julia code, we can write this as

dotL2(p, q) = integrate(p*conj(q), -1, 1)

The Gram matrix for this inner product with the power basis (using zero-based indexing) is \[ a_{ij} = \int_{-1}^1 x^{i} x^{j} \, dx = \begin{cases} 2/(i+j+1), & i+j \mbox{ even} \\ 0, & \mbox{ otherwise} \end{cases} \] which we can implement in Julia as

gram_L2_power(d) = [(i+j)%2 == 0 ? 2.0/(i+j+1) : 0.0
                    for i=0:d, j=0:d]

For example, for quadratics we have

gram_L2_power(2)
3×3 Matrix{Float64}:
 2.0       0.0       0.666667
 0.0       0.666667  0.0
 0.666667  0.0       0.4

Hence we can compute \(\langle 1 + x + x^2, 2 -3x + x^2 \rangle\) either via

let
    p = Polynomial([1.0; 1.0; 1.0])
    q = Polynomial([2.0; -3.0; 1.0])
    dotL2(p, q)
end
4.4

or via

[1.0; 1.0; 1.0]'*gram_L2_power(2)*[2.0; -3.0; 1.0]
4.3999999999999995

For the Chebyshev basis, the Gram matrix has entries \[ \int_{-1}^1 T_{i}(x) T_j(x) \, dx. \] This can also be computed using the relations \[\begin{aligned} T_m(x) T_n(x) &= \frac{1}{2} \left( T_{m+n}(x) + T_{|m-n|}(x) \right) \\ \int_{-1}^1 T_n(x) \, dx &= \begin{cases} \frac{2}{1-n^2}, & n \mbox{ even} \\ 0, & n \mbox{ odd}. \end{cases} \end{aligned}\]

function gram_L2_chebyshev(d)
    tint(j) = j%2 == 0 ? 2.0/(1-j^2) : 0.0
    m(i,j) = (tint(i+j) + tint(abs(i-j)))/2.0
    [m(i,j) for i=0:d, j=0:d]
end
gram_L2_chebyshev (generic function with 1 method)

For example, the Gram matrix for \(T_{0:4}\) is

gram_L2_chebyshev(4)
5×5 Matrix{Float64}:
  2.0        0.0       -0.666667   0.0       -0.133333
  0.0        0.666667   0.0       -0.4        0.0
 -0.666667   0.0        0.933333   0.0       -0.361905
  0.0       -0.4        0.0        0.971429   0.0
 -0.133333   0.0       -0.361905   0.0        0.984127

We can sanity check our integration by comparing to another routine:

gram_L2_chebyshev(4) ≈ gram(dotL2, chebyshev_basis(4)[:])
true

For the Legendre basis, the Gram matrix is

gram(dotL2, legendre_basis(4)[:])
5×5 Symmetric{Float64, Matrix{Float64}}:
 2.0  0.0       0.0  0.0       0.0
 0.0  0.666667  0.0  0.0       0.0
 0.0  0.0       0.4  0.0       0.0
 0.0  0.0       0.0  0.285714  0.0
 0.0  0.0       0.0  0.0       0.222222

That is, the Gram matrix for the Legendre basis is diagonal, with diagonal entries \(2/1, 2/3, 2/5, 2/7, 2/9, \ldots\). In other words, the Legendre basis vectors are orthogonal with respect to the \(L^2([-1,1])\) inner product. The normalized Legendre polynomials are \[ \sqrt{\frac{2j+1}{2}} P_j, \] or, in code

normalized_legendre_basis(d) =
    transpose([sqrt((2j+1)/2)*Pj
               for (Pj,j) = zip(legendre_basis(d), 0:d)])
normalized_legendre_basis (generic function with 1 method)

and these form an orthonormal basis for the \(\mathcal{P}_d\) spaces under the \(L^2([-1,1])\) inner product. Hence, the Gram matrix for a normalized Legendre basis (using the \(L^2([-1,1])\) inner product) is just the identity:

let
    P̄ = normalized_legendre_basis(3)
    gram(dotL2, P̄)
end
4×4 Symmetric{Float64, Matrix{Float64}}:
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0

Using an orthonormal basis \(U\) makes it straightforward to evaluate the Riesz map. For a given functional \(w^* \in \mathcal{V}^*\), the associated \(w \in \mathcal{V}\) under the Riesz map is \[ w = U (w^* U)^*, \] where \((w^* U)^*\) represents the ordinary conjugate transpose of the concrete row vector \(w^* U\). Then taking the inner product with \(w\) is equivalent to evaluating \(w^*\), since \[ \langle Uc, w \rangle = \langle Uc, U (w^* U)^* \rangle = (w^* U) c = w^* (Uc). \] In the case of the normalized Legendre polynomials, for example, this gives us a way of computing a vector \(\delta_x\) associated with the evaluation functional \(\delta_x^*\) such that \(\delta_x^* p = p(x)\):

function dualL2_δx(x, d)
    P̄ = normalized_legendre_basis(d)
    c = [P̄j(x) for P̄j in P̄]'
    P̄*c[:]
end
dualL2_δx (generic function with 1 method)

For example, evaluating \(2 -3x + x^2\) at \(x = 0.5\) gives \(p(0.5) = \langle p, \delta_{0.5} \rangle = 0.75\); in code, we have

let
    p = Polynomial([2.0; -3.0; 1.0])
    δhalf = dualL2_δx(0.5, 2)
    p(0.5) ≈ dotL2(p, δhalf)
end
true

4.2 Maps and forms

A central object in linear algebra is a linear map between two spaces. If \(\mathcal{V}\) and \(\mathcal{U}\) are two vector spaces, we denote the vector space of linear maps6 between them by \(L(\mathcal{V}, \mathcal{U})\).

Closely associated with linear maps between two vector spaces are certain scalar-valued maps defined on pairs of vectors. A bilinear form is a map \(a : \mathcal{U}\times \mathcal{V}\rightarrow \mathbb{F}\) which is linear in both arguments. Such a form can be identified with a linear map from \(\mathcal{V}\) to \(\mathcal{U}^*\) by partial evaluation: \(v \mapsto a(\cdot, v)\). A sesquilinear form is linear in the first argument and antilinear (conjugate linear) in the second argument, and can be identified with an antilinear map from \(\mathcal{V}\) to \(\mathcal{U}^*\). We are generally interested in bilinear forms over real spaces and sesquilinear forms over complex spaces. When we want to talk about real bilinear forms and complex sesquilinear forms at the same time, we will usually just say “sesquilinear forms.”

On inner product spaces, each sesquilinear form \(a : \mathcal{V}\times \mathcal{W} \rightarrow \mathbb{F}\) is associated with a unique linear map \(L \in L(\mathcal{V}, \mathcal{W})\) such that \[ a(v, w) = \langle Lv, w \rangle_{\mathcal{W}}. \] That is, sesquilinear forms can be identified with linear maps and vice-versa, and so we will address them together.

A linear map from a space to itself is called a linear operator. Linear operators have enough additional structure that we will deal with them separately.

4.2.1 Dual and adjoint maps

Any linear map \(L \in L(\mathcal{V}, \mathcal{U})\) comes with an associated dual map in \(L(\mathcal{U}^*, \mathcal{V}^*)\) with the action \(u^* \mapsto u^* L\). If both spaces have inner products, then we compose the dual map with the Riesz map on both sides to get the adjoint \(L^* \in L(\mathcal{U}, \mathcal{V})\) with the action \(u \mapsto (u^* L)^*\). Equivalently, the adjoint map has the property that \[ \langle Lv, u \rangle_{\mathcal{U}} = \langle v, L^* u \rangle_{\mathcal{V}}. \] In concrete spaces with the standard inner product, the adjoint map is represented by the conjugate transpose of the matrix of the original map.
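For concrete real vector spaces with the standard inner product, this is just the familiar transpose identity \(\langle Lv, u \rangle = \langle v, L^* u \rangle\); a quick numerical check, which should evaluate to true:

let
    L = [1.0 2.0;
         3.0 4.0;
         5.0 6.0]           # a map from R^2 to R^3
    v = [1.0; -1.0]
    u = [2.0; 0.0; 1.0]
    dot(L*v, u) ≈ dot(v, L'*u)
end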

4.2.2 Matrices

4.2.2.1 Matrices for maps

The usual representation of a linear map \(L \in L(\mathbb{F}^n, \mathbb{F}^m)\) is via an \(m\)-by-\(n\) matrix \(A \in \mathbb{F}^{m \times n}\). Applying the function \(y = Lx\) is equivalent to the matrix multiplication \(y = Ax\): \[ y_i = \sum_{j=1}^n a_{ij} x_j. \]

More generally, if \(L \in L(\mathcal{V}, \mathcal{U})\) and \(V\) and \(U\) are bases for \(\mathcal{V}\) and \(\mathcal{U}\), then there is an associated matrix representation \[ A = U^{-1} L V. \] The action of the matrix \(A\) comes from converting from \(\mathbb{F}^n\) to \(\mathcal{V}\) (via \(V\)), applying \(L\) to get a vector in \(\mathcal{U}\), and mapping from \(\mathcal{U}\) back to the concrete space \(\mathbb{F}^m\). Equivalently, we can write the abstract operator as \[ L = U A V^{-1}, \] i.e. we map from \(\mathcal{V}\) to a concrete vector in \(\mathbb{F}^n\), map that to a concrete vector in \(\mathbb{F}^m\) by matrix multiplication with \(A\), and map \(\mathbb{F}^m\) to \(\mathcal{U}\) via the \(U\) basis.

If we consider the case where \(V'\) and \(U'\) are alternate bases for \(\mathcal{V}\) and \(\mathcal{U}\), there is an associated matrix representation \[ A' = U'^{-1} L V' = (U'^{-1} U) A (V^{-1} V') \] Here the (invertible) matrix \(V^{-1} V'\) represents a change of basis: if \(v = V'c\) is a representation in the \(V'\) basis and \(v = Vd\) is a representation in the \(V\) basis, then \(d = V^{-1} V' c\). Similarly, the matrix \(U'^{-1} U\) converts the coefficients in the \(U\) basis to coefficients in the \(U'\) basis.

4.2.2.2 Matrices for sesquilinear forms

Suppose \(a : \mathcal{V}\times \mathcal{W}\rightarrow \mathbb{F}\) is a sesquilinear form and \(V\) and \(W\) are bases for \(\mathcal{V}\) and \(\mathcal{W}\). Then \[ a(Vx, Wy) = \sum_{j,i} x_j \bar y_i a(v_j, w_i) = y^* A x \] where \(A\) is the matrix with entries \(a_{ij} = a(v_j, w_i)\). In an inner product space where we can write \(a\) in terms of a linear operator \(L\), we have \[ a(Vx, Wy) = \langle L Vx, Wy \rangle_{\mathcal{W}} = y^* W^* L V x, \] i.e. we can also think of \(A\) as \(W^* L V\).

We note that the matrix representation of the sesquilinear form is not identical to the matrix representation of \(L\) as a linear map, except in the special case where we constrain \(V\) and \(W\) to be orthonormal bases. For more general bases, the expression \(W^* L V\) involves both the matrix for the linear map (\(W^{-1} L V\)) and the Gram matrix (\(M = W^* W\)); that is, we can also write \[ y^* W^* L V x = y^* (W^* W) W^{-1} L V x = \langle W^{-1} L V x, y \rangle_M. \]

4.2.2.3 Representing derivatives

As a concrete example of matrix representations, we consider the derivative operator \(D : \mathcal{P}_d \rightarrow \mathcal{P}_{d-1}\). With respect to the power basis for both \(\mathcal{P}_d\) and \(\mathcal{P}_{d-1}\), we write the relation \[ \frac{d}{dx} x^n = n x^{n-1} \] as \[ D X_{0:d} = X_{0:d-1} A \] where \[ A = \begin{bmatrix} 0 & 1 \\ & & 2 \\ & & & 3 \\ & & & & \ddots \\ & & & & & d \end{bmatrix}. \] In code, we have

function derivative_power(d)
    A = zeros(d,d+1)
    for i=1:d
        A[i,i+1] = i
    end
    A
end
derivative_power (generic function with 1 method)

Alternately, we can write \(D = X_{0:d-1} A X_{0:d}^{-1}\) or \(A = X_{0:d-1}^{-1} D X_{0:d}\).
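We can sanity check that \(D X_{0:3} = X_{0:2} A\) by representing the power bases as arrays of Polynomial objects (the names X3 and X2 below are ad hoc, in the same style as Udemo); the comparison should evaluate to true.

let
    X3 = transpose([Polynomial([zeros(j); 1.0]) for j in 0:3])  # 1, x, x^2, x^3
    X2 = transpose([Polynomial([zeros(j); 1.0]) for j in 0:2])  # 1, x, x^2
    derivative.(X3) == X2*derivative_power(3)
end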

A similar relationship holds for the first and second kind Chebyshev polynomials: \[ \frac{d}{dx} T_n(x) = n U_{n-1}(x), \] and so if we use the first-kind polynomial basis \(T_{0:d}\) for \(\mathcal{P}_d\) and the second-kind polynomial basis \(U_{0:d-1}\) for \(\mathcal{P}_{d-1}\), we have \[ D T_{0:d} = U_{0:d-1} A \] where \(A\) is the same matrix as before. Using the change of basis, we have \[ D T_{0:d} = T_{0:d-1} B^{-1} A. \] In code, the matrix representation \(T_{0:d-1}^{-1} D T_{0:d} = B^{-1} A\) can be computed by

derivative_chebyshevT(d) =
    change_U_to_T(d-1)\derivative_power(d)
derivative_chebyshevT (generic function with 1 method)

It is always useful to compare against another implementation of the same computation, e.g.

let
    Dmatrix = zeros(5,6)
    DT = derivative.(chebyshev_basis(5))
    for j = 1:length(DT)
        for ci = collect(pairs(convert(ChebyshevT, DT[j])))
            Dmatrix[ci[1]+1,j] = ci[2]
        end
    end
    Dmatrix ≈ derivative_chebyshevT(5)
end
true

Now consider a sesquilinear form on \(\mathcal{P}_d \times \mathcal{P}_{d-1}\) given by \[ a(u, w) = \int_{-1}^1 \bar{w}(x) u'(x) \, dx. \] If we use the Chebyshev basis for both arguments, we have \[\begin{aligned} a(T_{0:d} x, T_{0:d-1} y) &= y^* T_{0:d-1}^* D T_{0:d} x \\ &= y^* T_{0:d-1}^* T_{0:d-1} B^{-1} A x, \end{aligned}\] that is, the matrix is \(T_{0:d-1}^* T_{0:d-1} B^{-1} A\). In terms of codes we have written, we have (for \(d = 5\))

derivative_chebyshevT_bilinear(d) =
    gram_L2_chebyshev(d-1)*derivative_chebyshevT(d)
derivative_chebyshevT_bilinear (generic function with 1 method)

As before, we can sanity check against an alternate computation

let
    T = chebyshev_basis(5)
    a(p, q) = dotL2(derivative(p), q)

    derivative_chebyshevT_bilinear(5) ≈
        [a(T[j+1], T[i+1]) for i=0:4, j=0:5]
end
true

With respect to the Legendre basis, one can differentiate the Bonnet recurrence relation for \(P_n\) and rearrange to get a recurrence relation for the derivatives: \[ P_{n+1}'(x) = (2n+1) P_n(x) + P_{n-1}'(x). \] Together with the initial conditions \(P_0'(x) = 0\) and \(P_1'(x) = P_0(x)\), we have the following expression of \(D\) with respect to the Legendre bases for \(\mathcal{P}_d\) and \(\mathcal{P}_{d-1}\), i.e. \(D P_{0:d} = P_{0:d-1} C\):

function derivative_legendre(d)
    DP = zeros(d,d+1)
    if d > 0
        DP[1,2] = 1
    end
    for j = 1:d-1
        DP[:,  j+2] = DP[:,j]
        DP[j+1,j+2] = 2j+1
    end
    DP
end
derivative_legendre (generic function with 1 method)

As usual, we sanity check our computation:

derivative.(legendre_basis(3)) ==
    legendre_basis(2)*derivative_legendre(3)
true

If we want the derivative for the normalized Legendre polynomials, it is convenient to write them as \(P_{0:d} S_d\) where \(S_d\) is the diagonal matrix with elements \(\sqrt{(2j+1)/2}\) for \(j=0\) to \(d\). Then \[ D P_{0:d} S_d = P_{0:d-1} S_{d-1} (S_{d-1}^{-1} C S_d). \] In code, we have

derivative_nlegendre(d) =
    sqrt.(2.0./(2*(0:d-1).+1)) .*
    derivative_legendre(d) .*
    sqrt.((2*(0:d).+1)/2)'
derivative_nlegendre (generic function with 1 method)

Unlike some of our earlier computations, roundoff leads to small differences between the two approaches to computing derivatives in the normalized Legendre basis, so we check whether the results are close rather than exactly the same:

let
    err = normalized_legendre_basis(2)*derivative_nlegendre(3) -
        derivative.(normalized_legendre_basis(3))
    all(iszero.(truncate.(err, atol=1e-12)))
end
true

4.2.3 Block matrices

Sometimes we are interested in \(L \in L(\mathcal{V}, \mathcal{U})\) where the two spaces are direct sums: \[\begin{aligned} \mathcal{V}&= \mathcal{V}_1 \oplus \mathcal{V}_2 \oplus \ldots \oplus \mathcal{V}_n \\ \mathcal{U}&= \mathcal{U}_1 \oplus \mathcal{U}_2 \oplus \ldots \oplus \mathcal{U}_m. \end{aligned}\] In this case, we can write \(u = Lv\) with the subspace decompositions \(u = u_1 + \ldots + u_m\) and \(v = v_1 + \ldots + v_n\) as \[ u_i = \sum_{j=1}^n L_{ij} v_j \] where \(L_{ij} \in L(\mathcal{V}_j, \mathcal{U}_i)\). We can write this in block quasimatrix notation as \[ \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix} = \begin{bmatrix} L_{11} & \ldots & L_{1n} \\ \vdots & & \vdots \\ L_{m1} & \ldots & L_{mn} \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}. \] Now suppose we have associated partitioned bases \[\begin{aligned} V &= \begin{bmatrix} V_1 & V_2 & \ldots & V_n \end{bmatrix} \\ U &= \begin{bmatrix} U_1 & U_2 & \ldots & U_m \end{bmatrix}. \end{aligned}\] Then the matrix \(A = U^{-1} L V\) can be thought of as a block matrix with blocks \(A_{ij} = U_i^{-1} L_{ij} V_j\) associated with each of the \(L_{ij}\). Concretely, if \(u_i = U_i d_i\) and \(v_j = V_j c_j\) for appropriately-sized concrete vectors \(d_i\) and \(c_j\), we have \[ \begin{bmatrix} d_1 \\ \vdots \\ d_m \end{bmatrix} = \begin{bmatrix} A_{11} & \ldots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \ldots & A_{mn} \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \]

4.2.4 Canonical forms

A canonical form for a map \(L \in L(\mathcal{V}, \mathcal{U})\) is a “simple as possible” matrix representation associated with a particular choice of bases. This also makes sense when \(\mathcal{V}\) and \(\mathcal{U}\) are concrete vector spaces, in which case the canonical form is a sort of matrix decomposition.

In general, we will consider two types of canonical forms: those associated with general choices of bases and those associated with orthonormal bases (when \(\mathcal{V}\) and \(\mathcal{U}\) are inner product spaces). The canonical forms are essentially the same whether we are working with linear maps or sesquilinear forms; we will focus on the linear map case.

4.2.4.1 Block decomposition

The first step to the canonical forms for a general linear map is to decompose \(\mathcal{V}\) and \(\mathcal{U}\) into special direct sums. For \(\mathcal{V}\), we decompose into the kernel (or null space) of \(L\), i.e. \(\mathcal{N}(L) = \{v \in \mathcal{V}: Lv = 0\}\), along with any complementary space: \[ \mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2, \quad \mathcal{V}_2 = \mathcal{N}(L). \] For \(\mathcal{U}\), we decompose into the range space \(\mathcal{R}(L) = \{Lv : v \in \mathcal{V}\}\) and any complementary space: \[ \mathcal{U}= \mathcal{U}_1 \oplus \mathcal{U}_2, \quad \mathcal{U}_1 = \mathcal{R}(L). \] The null space \(\mathcal{N}(L)\) and the range \(\mathcal{R}(L)\) are sometimes also known as the kernel and the image of \(L\). The space \(\mathcal{U}_2\) is isomorphic to \(\mathcal{U}/ \mathcal{R}(L)\), sometimes called the cokernel, while the space \(\mathcal{V}_1\) is isomorphic to \(\mathcal{V}/ \mathcal{N}(L)\), sometimes called the coimage. These four spaces (the kernel, cokernel, range, and coimage) play a key role in linear algebra. The rank \(r\) is thus both the dimension of the range and the dimension of the coimage.

With respect to such a decomposition, we have the block quasimatrix \[ L = \begin{bmatrix} L_{11} & 0_{r \times (n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{bmatrix}. \] We have indicated the dimensions of the zero blocks here for clarity. The structure of this quasimatrix comes directly from the definition of the null space (which guarantees that the second block column should be zero) and the range space (which guarantees that the second block row should be zero). The block \(L_{11} \in L(\mathcal{V}_1, \mathcal{U}_1)\) is one-to-one (because all null vectors of \(L\) are in the \(\mathcal{V}_2\) space) and onto (because \(\mathcal{U}_1\) is the range space). Therefore, \(L_{11}\) is invertible.

The block decomposition of a map is closely connected to the Fredholm alternative theorem. Consider the equations \(Lx = b\), and suppose we decompose our spaces in terms of kernel, cokernel, range, and corange as above to get the quasimatrix equation \[ \begin{bmatrix} L_{11} & 0_{r \times (n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}. \] Then one of two conditions must hold; the system is either

  • Consistent (\(b_2 = 0\)): There is a solution to the system. In fact, if \(x_*\) is any solution (a particular solution), then the set of all possible solutions is \(\{ x_* + v : Lv = 0 \}\), an \((n-r)\)-dimensional affine set of solutions.
  • Inconsistent (\(b_2 \neq 0\)): There is no solution to the system, but there does exist \(y^* \in \mathcal{U}^*\) such that \(y^* L = 0\) and \(y^* b \neq 0\).

Without the decomposition of \(\mathcal{U}\) and \(\mathcal{V}\), the statement of the Fredholm alternative (for finite dimensional spaces) is a little mysterious. With the decomposition, it is almost a matter of inspection.
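
We can also watch the alternative play out numerically. The sketch below is illustrative: it builds a random rank-2 matrix mapping \(\mathbb{R}^3\) to \(\mathbb{R}^4\), checks that a right-hand side in the range is consistent, and exhibits a certificate \(y\) with \(y^* L = 0\) and \(y^* b \neq 0\) for a right-hand side that is not.

let
    # Random 4-by-3 matrix of rank 2 (m = 4 equations, n = 3 unknowns)
    L = rand(4,2)*rand(2,3)

    # A right-hand side in the range, and one (almost surely) not in the range
    b_in  = L*rand(3)
    b_out = rand(4)

    # Least squares solutions via the pseudoinverse (explicit tolerance,
    # since L is numerically rank-deficient)
    x_in  = pinv(L, atol=1e-8)*b_in
    x_out = pinv(L, atol=1e-8)*b_out

    # In the inconsistent case, the residual is a certificate: y'L = 0, y'b != 0
    y = b_out - L*x_out

    norm(L*x_in - b_in) < 1e-8,          # consistent: a solution exists
    norm(L*x_out - b_out) > 1e-2,        # inconsistent: no solution
    norm(L'*y) < 1e-8 && abs(y'*b_out) > 1e-4
end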

The decomposition of a map based on range and cokernel (for \(\mathcal{U}\)) and kernel and coimage (for \(\mathcal{V}\)) works equally well in the infinite-dimensional setting. Of course, in this setting some of the block dimensions will be infinite! But when the kernel and cokernel are both finite-dimensional, we can define the index of \(L\) to be the difference between the dimension of the kernel and the dimension of the cokernel (or the codimension of the range space, which is the same thing). In the finite-dimensional picture, this is \[ \dim(\mathcal{V}_2) - \dim(\mathcal{U}_2) = (n-r)-(m-r) = n-m. \] In other words, the index generalizes the notion of “how many more unknowns than equations” there are.

4.2.4.2 General bases

Let \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2\) and \(\mathcal{U}= \mathcal{U}_1 \oplus \mathcal{U}_2\) as above (i.e. \(\mathcal{U}_1\) is the range space and \(\mathcal{V}_2\) is the null space). For any basis \(V_1\) for \(\mathcal{V}_1\), the quasimatrix \(U_1 = LV_1\) is a basis for \(\mathcal{U}_1\). Concatenating any bases \(V_2\) and \(U_2\) for \(\mathcal{V}_2\) and \(\mathcal{U}_2\), we have bases \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}\) for \(\mathcal{V}\) and \(U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\) for \(\mathcal{U}\). With respect to these bases, we have the matrix representation \[ U^{-1} L V = \begin{bmatrix} I_{r \times r} & 0 \\ 0 & 0 \end{bmatrix}. \] This matrix representation is completely characterized by the dimensions of \(\mathcal{V}\) and \(\mathcal{U}\) and the rank \(r\).
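
For a concrete matrix, we can build such bases explicitly from the SVD (any other way of computing the null space and range would work equally well). An illustrative sketch:

let
    # Random 4-by-3 matrix with rank r = 2 by construction (m = 4, n = 3)
    L = rand(4,2)*rand(2,3)
    m, n, r = 4, 3, 2

    # Choose bases from the SVD: V1 spans a complement of the null space,
    # V2 the null space, U2 a complement of the range, and U1 = L*V1 the range
    F = svd(L, full=true)
    V1, V2 = F.V[:,1:r], F.V[:,r+1:n]
    U1, U2 = L*V1, F.U[:,r+1:m]
    U, V = [U1 U2], [V1 V2]

    # In these bases, the matrix representation is [I 0; 0 0]
    U\(L*V) ≈ [Matrix(1.0I,r,r) zeros(r,n-r); zeros(m-r,r) zeros(m-r,n-r)]
end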

4.2.4.3 Orthonormal bases

In a general vector space, we can use any complementary subspaces to the range and null space of \(L\). But in an inner product space, we prefer to use the orthogonal complements: \[\begin{aligned} \mathcal{V}_1 &= \mathcal{N}(L)^\perp & \mbox{coimage} \\ \mathcal{V}_2 &= \mathcal{N}(L) & \mbox{kernel / null space} \\ \mathcal{U}_1 &= \mathcal{R}(L) & \mbox{range / image} \\ \mathcal{U}_2 &= \mathcal{R}(L)^\perp & \mbox{cokernel} \end{aligned}\] These are sometimes referred to as the “four fundamental subspaces” for a linear map7 (see Strang (2023)).

Our canonical form in this setting involves a special choice for the bases \(V_1\) and \(U_1\). Specifically, we choose unit-length vectors \(v_1, \ldots, v_r\) by maximizing \(\|Lv_j\|\) over all unit-length vectors in \(\mathcal{V}\) orthogonal to \(v_1, \ldots, v_{j-1}\). As we will discuss in Chapter 18, the vectors \(Lv_1, \ldots, Lv_r\) themselves are then (nonzero) orthogonal vectors. The vectors \(v_j\) are the right singular vectors, the norms \(\sigma_j = \|Lv_j\|\) are the singular values, and the vectors \(u_j = Lv_j/\sigma_j\) are the left singular vectors. Concatenating the orthonormal bases \(V_1 = \begin{bmatrix} v_1 & \ldots & v_r \end{bmatrix}\) and \(U_1 = \begin{bmatrix} u_1 & \ldots & u_r \end{bmatrix}\) with orthonormal bases \(V_2\) and \(U_2\) for the orthogonal complements, we have the canonical form \[ U^* L V = \Sigma = \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & 0 \end{bmatrix} \] where \(\Sigma_{11}\) is the diagonal matrix with the singular values \(\sigma_1, \ldots, \sigma_r\) on the diagonal. By construction, these singular values are nonzero and are listed in descending order. Frequently, we write this canonical form as the singular value decomposition (SVD) of \(L\): \[ L = U \Sigma V^* = U_1 \Sigma_{11} V_1^*. \]

4.2.4.4 Decomposed derivatives

We return again to our example of the derivative mapping from \(\mathcal{P}_d\) to \(\mathcal{P}_{d-1}\). For this example, the range space is all of \(\mathcal{P}_{d-1}\) and the null space is the constant vectors. Decomposing with respect to the \(L^2([-1,1])\) inner product, the cokernel is the trivial subspace \(\{0\}\) and the coimage is the space of mean-zero vectors. Therefore, we can find general bases in which the matrix is \[ \begin{bmatrix} I_{d \times d} & 0_{d \times 1} \end{bmatrix}. \]

More interesting is the singular value decomposition in the \(L^2([-1,1])\) inner product. Here it is easiest to compute using the matrix representation of the derivative with respect to orthonormal bases like the normalized Legendre polynomials \(\bar{P}_{0:d}\). Then taking \(C = U \Sigma V^T\) as the singular value decomposition of the matrix representing the derivative, the singular value decomposition of the derivative map is \[ D = (\bar{P}_{0:d-1} U) \Sigma (\bar{P}_{0:d} V)^T. \] That is, the orthonormal bases that diagonalize \(D\) are given by \(\hat{U} = \bar{P}_{0:d-1} U\) and \(\hat{V} = \bar{P}_{0:d} V\).

The singular values of the derivative mapping from \(\mathcal{P}_d\) to \(\mathcal{P}_{d-1}\) depend on the dimension \(d\) of the space. It is illuminating to look at the behavior of these values in ascending order:

let
    dmax = 10

    # Compute the singular values for each degree d
    σs = zeros(dmax,dmax)
    for d = 1:dmax
        F = svd(derivative_nlegendre(d))
        σs[1:d,d] .= F.S[end:-1:1]
    end

    # Draw one line for the ith smallest singular value as a function of d
    p = plot(1:dmax, σs[1,:], legend=:false, marker=:star, xlabel="d")
    for i = 2:dmax
        plot!(i:dmax, σs[i,i:dmax], marker=:star)
    end
    p
end

As we increase \(d\), we get more singular values; but each singular value (ordered from smallest to biggest) appears to fairly quickly converge from above to a limiting value. Indeed, we can make sense of these limits.

As an example, consider the smallest singular value and the associated basis vectors for the derivative acting on \(\mathcal{P}_{10}\) with the \(L^2([-1,1])\) inner product. We expect that regardless of the degree, we should have \(Dv_{\min} = u_{\min} \sigma_{\min}\) where \(\sigma_{\min}\) is the smallest (nonzero) singular value and \(u_{\min}\) and \(v_{\min}\) are the associated singular vectors (columns in the \(U\) and \(V\) bases). By \(d = 10\), we also expect to be close to the limiting behavior: the smallest singular value should converge to \(\pi/2\), with associated singular vectors \(v_{\min}(x) \rightarrow \sin(x\pi/2)\) and \(u_{\min}(x) \rightarrow \cos(x\pi/2)\). This anticipated behavior is borne out in numerical experiment:

let
    # Compute the SVD of D acting on P_10
    d = 10
    F = svd(derivative_nlegendre(d))
    U = normalized_legendre_basis(d-1)*F.U
    V = normalized_legendre_basis(d)*F.V
    σs = F.S

    # Get the singular triple associated with
    # the smallest (nonzero) singular value
    σ, u, v = σs[end], U[end], V[end]

    # Check various relations (use a mesh of 100 points)
    xx = range(-1,1,length=100)
    s = sign(u(0))
    println("""
- |D*v-u*σ| = $(norm((derivative(v)-u*σ).(xx), Inf))
- |sin(π/2*x)-v| = $(norm(s*sin.(π/2*xx)-v.(xx), Inf))
- |cos(π/2*x)-u| = $(norm(s*cos.(π/2*xx)-u.(xx), Inf))
- |σmin-π/2| = $(abs(σ-π/2))
""")
end
- |D*v-u*σ| = 5.076326047387471e-14
- |sin(π/2*x)-v| = 4.919919749379886e-9
- |cos(π/2*x)-u| = 1.3086657329447118e-7
- |σmin-π/2| = 3.552713678800501e-15

In general, the \(k\)th smallest singular value converges to \(k\pi/2\), and the singular vectors converge to sines and cosines.

4.2.5 (Pseudo)inverses

Suppose \(L \in L(\mathcal{V}, \mathcal{U})\) has a canonical form with respect to general bases, \[ L = U \begin{bmatrix} I_{r \times r} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} V^{-1}. \] If \(r < m\) or \(r < n\), then \(L\) does not have an inverse. However, we can define an associated pseudoinverse \(M \in L(\mathcal{U}, \mathcal{V})\) by \[ M = V \begin{bmatrix} I_{r \times r} & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{bmatrix} U^{-1}. \] Let \(P_V = ML\) and \(P_U = LM\). In terms of the bases, we have \[ P_V = V \begin{bmatrix} I_{r \times r} & 0_{r \times (n-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (n-r)} \end{bmatrix} V^{-1} \] and \[ P_U = U \begin{bmatrix} I_{r \times r} & 0_{r \times (m-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (m-r)} \end{bmatrix} U^{-1}. \] Both \(P_V\) and \(P_U\) are projectors, i.e. \(P_V^2 = P_V\) and \(P_U^2 = P_U\). The \(P_V\) projector is the identity on the subspace \(\mathcal{V}_1\) that is complementary to the kernel, and its null space is the kernel \(\mathcal{V}_2\). The \(P_U\) projector is the identity on the range space \(\mathcal{U}_1\), and its null space is \(\mathcal{U}_2\). If \(L\) is one-to-one, \(P_V\) is simply the identity; and if \(L\) is onto, \(P_U\) is the identity.

The pseudoinverse depends on the decompositions \(\mathcal{V}= \mathcal{V}_1 \oplus \mathcal{N}(L)\) and \(\mathcal{U}= \mathcal{R}(L) \oplus \mathcal{U}_2\), though it is independent of the details of the bases chosen for the four subspaces. In an inner product space, we most often choose to decompose \(\mathcal{V}\) and \(\mathcal{U}\) into orthogonal complements. This gives the Moore-Penrose pseudoinverse (often just called “the pseudoinverse” when the context does not specify some other pseudoinverse). The Moore-Penrose pseudoinverse plays a central role in least squares and minimal norm problems, as we discuss in Chapter 17. In terms of the singular value decomposition, the Moore-Penrose pseudoinverse is \[ L^\dagger = V \begin{bmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix} U^*. \] For the Moore-Penrose pseudoinverse, the associated projectors are \(P_V = V_1 V_1^*\) and \(P_U = U_1 U_1^*\). These are orthogonal projectors, since their range spaces are orthogonal complements of their null spaces. Projectors that are not orthogonal are said to be oblique projectors.
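
For concrete matrices, we can check the Moore-Penrose identities and the projector formulas directly from the SVD. An illustrative sketch, with the rank \(r = 2\) known by construction:

let
    # Random 4-by-3 matrix with rank r = 2 by construction
    A = rand(4,2)*rand(2,3)
    r = 2

    # Moore-Penrose pseudoinverse from the SVD formula V1 Σ11^{-1} U1'
    F = svd(A)
    U1, V1, Σ11 = F.U[:,1:r], F.V[:,1:r], Diagonal(F.S[1:r])
    Adag = V1*(Σ11\U1')

    # Associated projectors onto the coimage and the range
    PV, PU = Adag*A, A*Adag

    A*Adag*A ≈ A, Adag*A*Adag ≈ Adag,    # pseudoinverse identities
    PV ≈ PV*PV, PV ≈ V1*V1',             # orthogonal projector onto the coimage
    PU ≈ PU*PU, PU ≈ U1*U1'              # orthogonal projector onto the range
end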

As an example, the Moore-Penrose pseudoinverse of the derivative operator from \(\mathcal{P}_d\) to \(\mathcal{P}_{d-1}\) is the indefinite integral operator from \(\mathcal{P}_{d-1}\) to \(\mathcal{P}_d\) in which the constant of integration is chosen to make the result have zero mean. The associated projector on \(\mathcal{P}_d\) is \(q(x) \mapsto q(x) - \bar{q}\) where \(\bar{q} = \frac{1}{2} \int_{-1}^1 q(x) \, dx\).
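
We can verify this concretely with the normalized Legendre machinery from before. The sketch below is illustrative and assumes (as in the earlier singular-vector example) that the quasimatrix returned by normalized_legendre_basis supports column indexing; it compares each column of \(\bar{P}_{0:d}\) times the pseudoinverse of the derivative matrix against antidifferentiation followed by subtraction of the mean over \([-1,1]\).

let
    d = 3
    A = derivative_nlegendre(d)    # matrix of D : P_d -> P_{d-1}
    P̄d, P̄dm1 = normalized_legendre_basis(d), normalized_legendre_basis(d-1)

    # Quasimatrix for the pseudoinverse map from P_{d-1} to P_d
    Ddag = P̄d*pinv(A)

    # "Antidifferentiate, then shift to zero mean on [-1,1]"
    zero_mean_integral(q) = (Q = integrate(q); Q - integrate(Q, -1, 1)/2)

    errs = [zero_mean_integral(P̄dm1[j]) - Ddag[j] for j = 1:d]
    all(iszero.(truncate.(errs, atol=1e-12)))
end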

4.2.6 Norms

The space \(L(\mathcal{V}, \mathcal{U})\) is a vector space; and, as with other vector spaces, it makes sense to talk about norms. In particular, we frequently want norms that are consistent with vector norms on the range and domain spaces; that is, for any \(v\), we want \[ u = Lv \implies \|u\| \leq \|L\| \|v\|. \] Particularly useful are the norms induced by vector norms on \(\mathcal{V}\) and \(\mathcal{U}\): \[ \|A\| = \sup_{v \neq 0} \frac{\|Av\|}{\|v\|} = \sup_{\|v\|=1} \|Av\|. \] In a finite-dimensional space8, the supremum is achieved, i.e. there exists a maximizing \(v\).

Suppose \(S \in L(\mathcal{U}, \mathcal{V})\) and \(T \in L(\mathcal{V}, \mathcal{W})\), and we have consistent norms on the three vector spaces \(\mathcal{U}, \mathcal{V}, \mathcal{W}\) and the spaces of linear maps \(L(\mathcal{U}, \mathcal{V})\) and \(L(\mathcal{V}, \mathcal{W})\). Then for all \(u \in \mathcal{U}\), consistency implies \[ \|TSu\| \leq \|T\| \|Su\| \leq \|T\| \|S\| \|u\|. \] Taking the supremum over \(\|u\| = 1\) shows that in the induced norm on \(L(\mathcal{U}, \mathcal{W})\), we have \[ \|TS\| \leq \|T\| \|S\|. \] This fact is most often used when the norms on \(L(\mathcal{U}, \mathcal{V})\) and \(L(\mathcal{V}, \mathcal{W})\) are also induced norms.
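
A quick numerical illustration of consistency and submultiplicativity for the induced Euclidean norm, using opnorm from LinearAlgebra:

let
    S = rand(5,4)      # maps R^4 to R^5
    T = rand(3,5)      # maps R^5 to R^3
    u = rand(4)

    norm(S*u) <= opnorm(S)*norm(u),      # consistency of the induced norm
    opnorm(T*S) <= opnorm(T)*opnorm(S)   # submultiplicativity
end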

For inner product spaces, we can also define norms in terms of the singular value decomposition. Some examples include

  • The spectral norm (the largest of the singular values)
  • The Frobenius norm (the two-norm of the singular values)
  • The nuclear norm (the sum of the singular values)

All of these are also examples of Ky Fan norms (the Ky Fan \(k\)-norm is the sum of the \(k\) largest singular values) or Schatten norms (the Schatten \(p\)-norm is the \(p\)-norm of the vector of singular values). The spectral norm is the norm induced by the Euclidean structure on the two spaces. All the other Ky Fan and Schatten norms are consistent, though they are not induced norms.
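
For concrete matrices, all of these norms are easy to compute from the singular values; a short illustrative check against svdvals, opnorm, and norm:

let
    A = rand(5,3)
    σ = svdvals(A)
    spectral, frobenius, nuclear = maximum(σ), norm(σ), sum(σ)

    spectral ≈ opnorm(A),               # norm induced by the Euclidean norm
    frobenius ≈ norm(A),                # entrywise two-norm
    spectral <= frobenius <= nuclear    # Ky Fan norms increase with k
end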

For concrete vector spaces, we can write the Frobenius norm as \[ \|A\|_F = \sqrt{ \sum_{i,j} |a_{ij}|^2 }. \] The induced norms corresponding to the vector 1-norm and \(\infty\)-norm are \[\begin{aligned} \|A\|_1 &= \max_j \sum_i |a_{ij}| \quad \mbox{(max absolute column sum)}\\ \|A\|_\infty &= \max_i \sum_j |a_{ij}| \quad \mbox{(max absolute row sum)} \end{aligned}\] The norm induced by the standard Euclidean norm (the spectral norm) is harder to compute without appealing to the singular value decomposition.
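
These formulas are what opnorm computes for \(p = 1\) and \(p = \infty\); a quick illustrative check:

let
    A = rand(4,3) .- 0.5

    opnorm(A,1) ≈ maximum(sum(abs.(A), dims=1)),     # max absolute column sum
    opnorm(A,Inf) ≈ maximum(sum(abs.(A), dims=2)),   # max absolute row sum
    norm(A) ≈ sqrt(sum(abs2, A))                     # Frobenius norm
end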

4.2.7 Isometries

A linear map that preserves norms is called an isometry. Isometries are always one-to-one, since any null vector must have zero norm. If an isometry is also onto, it is sometimes called a global isometry. Global isometries are invertible, and the inverse is also an isometry.

For general norms, there are not always many interesting isometries. For example, the only isometries from \(\mathbb{R}^n\) to itself preserving the 1-norm or the \(\infty\)-norm are (signed) permutations. For inner product spaces, though, there is a much richer group of isometries, and we will focus on this case.

Isometries between inner product spaces necessarily preserve inner products, since the inner product can be recovered from the associated norm. For global isometries between inner product spaces, the adjoint and the inverse are the same. A global isometry from a space to itself is called an orthogonal operator in the real case, or a unitary operator in the complex case.

A unitary operator maps an orthonormal basis to another orthonormal basis. Therefore, if \(L \in L(\mathcal{V}, \mathcal{U})\) and \(W\) and \(Z\) are unitary operators on \(\mathcal{U}\) and \(\mathcal{V}\), respectively, then \(L\) and \(W L Z\) have the same singular values. This also means that any norms that can be defined in terms of singular values (including the spectral norm, the Frobenius norm, and the nuclear norm) are the same for \(L\) and \(W L Z\). This property is known as unitary invariance (or orthogonal invariance in the real case).
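
We can check unitary invariance numerically by generating random orthogonal matrices from QR factorizations (an illustrative sketch):

let
    A = rand(4,3)

    # Random orthogonal W (4-by-4) and Z (3-by-3)
    W = Matrix(qr(rand(4,4)).Q)
    Z = Matrix(qr(rand(3,3)).Q)

    svdvals(W*A*Z) ≈ svdvals(A),     # same singular values
    opnorm(W*A*Z) ≈ opnorm(A),       # hence the same spectral norm...
    norm(W*A*Z) ≈ norm(A)            # ...and the same Frobenius norm
end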

4.2.8 Volume scaling

In an inner product space, we usually say that a parallelepiped associated with an orthonormal basis (i.e. a unit cube) has unit volume. A unitary operator between two spaces maps a unit cube in one space to a unit cube in another space, i.e. it preserves volume. More generally, if \(L \in L(\mathcal{V}, \mathcal{U})\) and \(V\) and \(U\) are orthonormal bases associated with the singular value canonical form, then we can see \(L = U \Sigma V^*\) as

  • A volume-preserving transformation \(V^*\) from \(\mathcal{V}\) into \(\mathbb{R}^n\) (or \(\mathbb{C}^n\)).

  • A diagonal scaling operation on \(\mathbb{R}^n\) that changes each side length from \(1\) to \(\sigma_j\). The volume of the scaled parallelepiped in \(\mathbb{R}^n\) is then \(\prod_{j=1}^n \sigma_j\).

  • A volume-preserving transformation \(U\) from \(\mathbb{R}^n\) into \(\mathcal{U}\).

Therefore, if \(\Omega \subset \mathcal{V}\) is a region with volume \(\operatorname{vol}(\Omega)\), then \[ \operatorname{vol}(L \Omega) = \left( \prod_{j=1}^n \sigma_j \right) \operatorname{vol}(\Omega). \] In other words, the product of the singular values gives the magnitude of the determinant of \(L\). We will discuss determinants and signed volumes in Section 4.5.5.
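
For a square matrix, this relation between the determinant and the singular values is easy to check numerically (illustrative):

let
    L = rand(4,4)
    abs(det(L)) ≈ prod(svdvals(L))
end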

4.3 Operators

An operator is a map \(L \in L(\mathcal{V}, \mathcal{V})\). Everything we know about general maps also works for operators, but we have a lot of additional structure coming from the fact that the space is mapped to itself.

Most of our discussion of operators focuses on three connected ideas:

  • A key feature of an operator \(L\) is its invariant subspaces, i.e. subspaces \(\mathcal{U}\subset \mathcal{V}\) such that \(L \mathcal{U}= \{ Lu : u \in \mathcal{U}\}\) lies within \(\mathcal{U}\). Given an invariant subspace, we can analyze the associated restriction \(L|_{\mathcal{U}}\) of \(L\) to \(\mathcal{U}\). One-dimensional invariant subspaces play a particularly notable role, as they contain the eigenvectors of \(L\); the restrictions are the associated eigenvalues.

  • We can sensibly talk about polynomials in \(L\): if \(p(z) = \sum_{j=0}^d c_j z^j\), then \(p(L) = \sum_{j=0}^d c_j L^j\). Such polynomials are useful for designing and analyzing numerical methods (e.g. Krylov subspace methods for linear systems and eigenvalue problems). They are also useful as a more purely theoretical tool.

In Chapter 5, we will also discuss an alternative approach to analyzing operators by applying the tools of complex analysis to the resolvent \(R(z) = (zI-L)^{-1}\), a rational function on \(\mathbb{C}\) whose poles correspond to eigenvalues. This perspective is useful both for computation and theory. It is particularly useful if we want to generalize from the finite-dimensional to the infinite-dimensional setting.

4.3.1 Minimal polynomial

If \(\mathcal{V}\) is an \(n\)-dimensional space, then \(L(\mathcal{V}, \mathcal{V})\) is an \(n^2\)-dimensional space, and so there must be a linear dependency among the \(n^2+1\) operators \(\{L^j\}_{j=0}^{n^2}\). Consequently, there must be an annihilating polynomial of degree at most \(n^2\), i.e. a polynomial \(p\) such that \(p(L) = 0\). Among the annihilating polynomials for \(L\), there must be some monic9 annihilating polynomial \(p_{\min}\) of minimal degree \(m\). We call \(p_{\min}\) the minimal polynomial.

If \(p\) is any annihilating polynomial, polynomial division lets us write \[ p(z) = q(z) p_{\min}(z) + r(z), \quad \deg(r) < m. \] Because \(p_{\min}(L) = 0\), we have \[ p(L) = q(L) p_{\min}(L) + r(L) = r(L); \] since \(p(L) = 0\) as well, we conclude that \(r(L) = 0\), and therefore \(r = 0\): otherwise \(r\) would be an annihilating polynomial of lower degree than \(m\), contradicting the minimality. Therefore any annihilating polynomial can be written as \(q(z) p_{\min}(z)\) for some other polynomial \(q(z)\). As a direct consequence, the minimal polynomial is unique. Also as a consequence, the minimal polynomial of \(L|_{\mathcal{U}}\) for any invariant subspace \(\mathcal{U}\) divides \(p_{\min}\) for \(L\), since \(p_{\min}\) for \(L\) is an annihilating polynomial for \(L|_{\mathcal{U}}\).

We can write the minimal polynomial in factored form (over \(\mathbb{C}\)) as \[ p_{\min}(z) = \prod_{j=1}^s (z-\lambda_j)^{d_j}. \] Each term \(L-\lambda_j I\) must be singular; otherwise, we would have \((L-\lambda_j I)^{-1} p_{\min}(L) = 0\), contradicting minimality.

The \(\lambda_j\) are the eigenvalues of \(L\), and the null spaces of each \(L-\lambda_j I\) are comprised of eigenvectors. If \(d_j = 1\) for each eigenvalue, then there is a basis of eigenvectors; otherwise, we have to consider generalized eigenvectors as well to get a basis. In either case, the space \(\mathcal{V}\) can be decomposed into a direct sum of invariant subspaces \(\mathcal{N}\left( (L-\lambda_j I)^{d_j} \right)\); these are the maximal invariant subspaces associated with the eigenvalues.

If we let \(m_j = \dim \mathcal{N}\left( (L-\lambda_j I)^{d_j} \right)\), we can also define the characteristic polynomial \[ p_{\mbox{char}}(z) = \prod_{j=1}^s (z-\lambda_j)^{m_j}. \] The dimension \(m_j\) is the algebraic multiplicity of the eigenvalue \(\lambda_j\). The geometric multiplicity is the dimension of \(\mathcal{N}(L - \lambda_j I)\), i.e. the number of independent eigenvectors for \(\lambda_j\).
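
As a small concrete illustration (a made-up example), consider a matrix with one 2-by-2 Jordan block and one 1-by-1 block for \(\lambda = 2\) together with a simple eigenvalue \(\lambda = 3\). The minimal polynomial is \((z-2)^2 (z-3)\), the characteristic polynomial is \((z-2)^3 (z-3)\), and \(\lambda = 2\) has algebraic multiplicity 3 but geometric multiplicity 2.

let
    # One 2x2 Jordan block and one 1x1 block for λ = 2; a simple eigenvalue 3
    A = [2.0 1 0 0;
         0   2 0 0;
         0   0 2 0;
         0   0 0 3]

    pmin(X) = (X - 2I)^2 * (X - 3I)    # minimal polynomial (z-2)^2 (z-3)
    plow(X) = (X - 2I) * (X - 3I)      # lower-degree candidate (z-2)(z-3)

    norm(pmin(A)) < 1e-12,                            # annihilates A
    norm(plow(A)) > 1e-2,                             # does not annihilate A
    size(nullspace(A - 2I), 2) == 2,                  # geometric multiplicity of λ = 2
    count(λ -> abs(λ - 2) < 1e-6, eigvals(A)) == 3    # algebraic multiplicity of λ = 2
end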

The characteristic polynomial is often also written as \[ p_{\mbox{char}}(z) = \det(zI-L), \] but writing the characteristic polynomial in terms of determinants has the disadvantage that we need an appropriate definition of determinants first!

4.3.2 Canonical forms

As in the case of maps between different spaces, we are interested in canonical matrix representations for operators \(L \in L(\mathcal{V}, \mathcal{V})\), i.e. choices of bases that make the matrix representation “as simple as possible.” We consider two such forms: one involving an arbitrary basis, and the other involving an orthonormal basis.

4.3.2.1 Jordan form

As we noted above, the space \(\mathcal{V}\) can be written as \[ \mathcal{V}= \mathcal{V}_1 \oplus \mathcal{V}_2 \oplus \ldots \oplus \mathcal{V}_s, \] where \(\mathcal{V}_j\) is the maximal invariant subspace associated with the eigenvalue \(\lambda_j\), i.e. \[ \mathcal{V}_j = \mathcal{N}((L-\lambda_j I)^{d_j}). \] With respect to this partitioning of \(\mathcal{V}\), we have the block diagonal quasimatrix representation \[ \begin{bmatrix} L_{11} \\ & L_{22} \\ & & \ddots \\ & & & L_{ss} \end{bmatrix} \] where \(L_{jj}\) represents the restriction \(L|_{\mathcal{V}_j}\). With a choice of bases for each of the \(\mathcal{V}_j\), this leads to a block diagonal matrix representation.

When the minimal polynomial exponent \(d_j\) is 1, we have that \(\mathcal{V}_j = \mathcal{N}(L-\lambda_j I)\), and so \[ L|_{\mathcal{V}_j} = \lambda_j I. \] Hence, if there are no repeated zeros of the minimal polynomial, we can choose bases \(V_j\) for each subspace \(\mathcal{V}_j\), and each will satisfy \[ L V_{j} = V_j (\lambda_j I). \] Putting these together, we have the concatenated basis \[ V = \begin{bmatrix} V_1 & V_2 & \ldots & V_s \end{bmatrix} \] satisfying the relationships \[ LV = V \Lambda \] where \(\Lambda\) is a diagonal matrix in which each eigenvalue \(\lambda_j\) appears \(\dim \mathcal{V}_j\) times in a row. With respect to the eigenvector basis \(V\), we therefore have the canonical matrix representation \[ \Lambda = V^{-1} L V. \] Hence, such operators are diagonalizable.
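
For a concrete matrix, eigen computes exactly such a basis; a random matrix almost surely has distinct eigenvalues, so the following illustrative check should succeed:

let
    A = rand(4,4)    # almost surely has distinct eigenvalues
    F = eigen(A)
    V, Λ = F.vectors, Diagonal(F.values)

    A*V ≈ V*Λ,       # L V = V Λ
    V\(A*V) ≈ Λ      # Λ = V^{-1} L V
end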

When the minimal polynomial has zeros with multiplicity greater than one, the matrix is not diagonalizable. In this case, we cannot form a basis of eigenvectors, but we can form a basis if we include generalized eigenvectors. In the Jordan canonical form, we find bases consisting of Jordan chains, i.e. vectors satisfying \[\begin{aligned} (L-\lambda_j I) u_1 &= 0 \\ (L-\lambda_j I) u_k &= u_{k-1}, \quad 1 < k \leq b. \end{aligned}\] Rewriting the second expression as \(L u_k = u_{k-1} + \lambda_j u_k\) and defining the quasimatrix \(U = \begin{bmatrix} u_1 & \ldots & u_b \end{bmatrix}\), we can summarize with the matrix equation \[ L U = U J_{b\times b}(\lambda_j) \] where \[ J_{b \times b}(\lambda_j) = \begin{bmatrix} \lambda_j & 1 \\ & \lambda_j & 1 \\ & & \ddots & \ddots \\ & & & \lambda_j & 1 \\ & & & & \lambda_j \end{bmatrix} \] is a \(b\)-dimensional Jordan block. The maximum size of any Jordan block is equal to the multiplicity \(d_j\) of \(\lambda_j\) in the minimal polynomial; the number of Jordan blocks for \(\lambda_j\) is equal to the geometric multiplicity \(\dim \mathcal{N}(L-\lambda_j I)\); and the sum of the sizes of the Jordan blocks is equal to the algebraic multiplicity \(\dim \mathcal{N}((L-\lambda_j I)^{d_j})\). Taking a basis formed from concatenating such Jordan chains for each space, we have the Jordan canonical form, a block diagonal matrix in which each diagonal block is a Jordan block.

The Jordan form is fragile: almost all operators are diagonalizable and have distinct eigenvalues, and infinitesimally small changes will usually completely destroy any nontrivial Jordan blocks. Moreover, diagonalizable operators that are close to operators with nontrivial Jordan blocks may have eigenvector bases, but those eigenvector bases are very ill-conditioned. We therefore would like an alternate canonical form that gives up some of the goal of being “as diagonal as possible.”
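
We can see this fragility numerically (a made-up example): perturbing a 4-by-4 Jordan block with eigenvalue zero by a random matrix of size \(10^{-12}\) gives a matrix that is (almost surely) diagonalizable, but whose eigenvalues scatter to a distance on the order of \(10^{-3}\) (roughly the fourth root of the perturbation size) and whose eigenvector basis is extremely ill-conditioned.

let
    # 4-by-4 Jordan block for λ = 0, plus a tiny perturbation
    J = diagm(1 => ones(3))
    A = J + 1e-12*rand(4,4)

    F = eigen(A)
    # How far the eigenvalues moved, and how ill-conditioned the basis is
    maximum(abs.(F.values)), cond(F.vectors)
end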

4.3.2.2 Schur form

In our discussion of the Jordan form, we started with the decomposition of \(\mathcal{V}\) into maximal invariant subspaces associated with each eigenvalue. For the Schur form, we decompose \(\mathcal{V}\) in terms of nested invariant subspaces: \[ \mathcal{W}_k = \oplus_{j=1}^k \mathcal{V}_j. \] Assuming \(\mathcal{V}\) is an inner product space, we can also write \[ \mathcal{W}_k = \oplus_{j=1}^k \mathcal{U}_j \] where \(\mathcal{U}_1 = \mathcal{V}_1\) and otherwise \(\mathcal{U}_j\) is the orthogonal complement of \(\mathcal{W}_{j-1}\) in \(\mathcal{W}_j\). With respect to the decomposition \(\mathcal{V}= \oplus_{j=1}^s \mathcal{U}_j\), we have the block upper triangular quasimatrix representation \[ \begin{bmatrix} L_{11} & L_{12} & \ldots & L_{1s} \\ & L_{22} & \ldots & L_{2s} \\ & & \ddots & \vdots \\ & & & L_{ss} \end{bmatrix} \] With an appropriate choice of orthonormal basis \(U_j\) for each \(\mathcal{U}_j\) and setting \(U = \begin{bmatrix} U_1 & U_2 & \ldots & U_s \end{bmatrix}\), we have \[ U^* L U = T \] where \(T\) is an upper triangular matrix for which the eigenvalues are on the diagonal. This is a (complex) Schur canonical form.

The complex Schur form may in general involve complex choices of the basis \(U\) and a complex matrix representation \(T\), even if \(\mathcal{V}\) is most naturally treated as a vector space over the reals. However, there is a real Schur form which involves a real orthonormal basis and a block upper triangular matrix \(T\) where the diagonal blocks are 1-by-1 (for real eigenvalues) or 2-by-2 (for conjugate pairs of eigenvalues).
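
The schur routine in LinearAlgebra computes both forms; the sketch below (illustrative) checks the complex Schur factorization of a random real matrix along with the corresponding real Schur factorization.

let
    A = rand(5,5)

    # Complex Schur form: A = Z T Z' with T upper triangular
    Fc = schur(complex(A))
    # Real Schur form: orthogonal Z, quasi-triangular T (1x1 and 2x2 blocks)
    Fr = schur(A)

    Fc.Z*Fc.T*Fc.Z' ≈ A,
    norm(tril(Fc.T, -1)) < 1e-12,    # T is (numerically) upper triangular
    Fr.Z*Fr.T*Fr.Z' ≈ A
end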

4.3.3 Similar matrices

So far, we have focused on matrices as representations of operators on abstract vector spaces, where we choose one basis \(V\) for the space \(\mathcal{V}\): \[ A = V^{-1} L V. \] If we consider an alternate basis \(V' = VX\) for an invertible matrix \(X\), we have the associated matrix representation \[ A' = V'^{-1} L V' = (V^{-1} V')^{-1} A (V^{-1} V') = X^{-1} A X. \] The transformation \(A \mapsto X^{-1} A X\) is a similarity transformation, and the matrices \(A\) and \(A'\) are said to be similar.

The properties we have discussed so far (eigenvalues, the minimal and characteristic polynomial, the Jordan form) are fundamentally properties of operators, and do not depend on the basis in which those operators are expressed. We try to make this clear by giving basis-free explanations when possible. But it is also possible to give matrix-based arguments in which we argue for invariance under similarity transformations. For example, if \(\hat{A} = X^{-1} A X\), then \[ \hat{A}^j = (X^{-1} A X)^j = X^{-1} A^j X \] and, more generally, \(p(\hat{A}) = X^{-1} p(A) X\). Therefore, any annihilating polynomial for \(A\) is also an annihilating polynomial for \(\hat{A}\); and the minimal and characteristic polynomials for \(A\) will similarly be the minimal and characteristic polynomials for \(\hat{A}\).

4.3.4 Trace

We typically describe the trace in terms of a matrix representation: the trace of a matrix is the sum of its diagonal entries, i.e. \[ \operatorname{tr}(A) = \sum_{i=1}^n a_{ii}. \] The trace satisfies the cyclic property that for any \(B \in \mathbb{F}^{m \times n}\) and \(C \in \mathbb{F}^{n \times m}\) we have \[\begin{aligned} \operatorname{tr}(BC) &= \sum_{i=1}^m \sum_{j=1}^n b_{ij} c_{ji} \\ &= \sum_{j=1}^n \sum_{i=1}^m c_{ji} b_{ij} \\ &= \operatorname{tr}(CB). \end{aligned}\] Using this property, for any invertible \(X \in \mathbb{F}^{n \times n}\) we have \[ \operatorname{tr}(X^{-1} A X) = \operatorname{tr}(A X X^{-1}) = \operatorname{tr}(A). \] That is, the trace is invariant under similarity, and is thus a property of the underlying linear operator and not just of a particular matrix representation. In particular, if \(X^{-1} A X = J\) is a Jordan canonical form, we can see that the trace is the sum of the eigenvalues (counting algebraic multiplicity).
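
A quick numerical check of the cyclic property, similarity invariance, and the connection to eigenvalues (illustrative):

let
    B, C = rand(4,3), rand(3,4)
    A, X = rand(4,4), rand(4,4)

    tr(B*C) ≈ tr(C*B),             # cyclic property
    tr(X\(A*X)) ≈ tr(A),           # similarity invariance
    tr(A) ≈ sum(eigvals(A))        # trace = sum of eigenvalues
end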

4.3.5 Frobenius inner product

The trace is also useful for defining other structures on spaces of linear maps in a basis-free way. Let \(L \in L(\mathcal{V}, \mathcal{U})\) be a linear map between two inner product spaces, and let \(L = U \Sigma V^*\) be the singular value decomposition. Then \[ \operatorname{tr}(L^* L) = \operatorname{tr}(V \Sigma^2 V^*) = \operatorname{tr}(\Sigma^2 V^* V) = \operatorname{tr}(\Sigma^2) = \sum_{j=1}^n \sigma_j^2 = \|L\|_F^2. \] where \(\|L\|_F^2\) is the squared Frobenius norm. Also, if \(B\) is any matrix representation of \(L\) with respect to orthonormal bases, then we have \[ \|L\|_F^2 = \|B\|_F^2 = \operatorname{tr}(B^* B) = \sum_{i,j} |b_{ij}|^2, \] so it is very simple to compute the Frobenius norm of \(L\) given any matrix representation with respect to orthonormal bases.

More generally, the trace can be used to define the Frobenius inner product on the space \(L(\mathcal{V}, \mathcal{W})\) via \[ \langle L, M \rangle_F = \operatorname{tr}(M^* L). \] The Frobenius norm is the standard Euclidean norm associated with the Frobenius inner product.

4.3.6 Hermitian and skew

If \(L \in L(\mathcal{V}, \mathcal{V})\), then \[ L = H + S, \quad H = \frac{1}{2} (L+L^*), \quad S = \frac{1}{2}(L-L^*). \]

The operators \(H\) and \(S\) are known as the Hermitian and skew-Hermitian parts of \(L\). We note that \(\operatorname{Re} \langle H, S \rangle_F = 0\) (in the real case, \(\langle H, S \rangle_F = 0\) exactly), so by the Pythagorean theorem, \[ \|L\|_F^2 = \|H\|_F^2 + \|S\|_F^2. \] More generally, \(L(\mathcal{V}, \mathcal{V})\), viewed as a real vector space, is a direct sum of the Hermitian and skew-Hermitian subspaces, and these subspaces are orthogonal complements of each other with respect to the real part of the Frobenius inner product.

Any operator \(L\) has a Schur canonical form \[ U^* L U = T \] where \(T\) is an upper triangular matrix. For a Hermitian matrix \(H\), we have \[ T = U^* H U = (U^* H U)^* = T^*, \] which implies that \(T\) must be diagonal (since the subdiagonal elements of \(T\) are zero and the superdiagonal elements of \(T^*\) are zero), and that the diagonal part of \(T\) must be real (since \(t_{ii} = t_{ii}^*\)). Similarly, for a skew-Hermitian matrix \(S\), we have \[ T = U^* S U = -(U^* S U)^* = -T^*, \] which implies that in this case \(T\) is diagonal and the diagonal part is purely imaginary. Hence, the decomposition of a matrix into Hermitian and skew-Hermitian parts in many ways mirrors the decomposition of a complex number into real and imaginary parts.
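
A numerical check of the decomposition for a real matrix, where the symmetric and skew-symmetric parts are exactly orthogonal in the Frobenius inner product (illustrative):

let
    A = rand(4,4)
    H = (A + A')/2      # symmetric (Hermitian) part
    S = (A - A')/2      # skew-symmetric part

    abs(dot(H, S)) < 1e-12,                       # Frobenius-orthogonal
    norm(A)^2 ≈ norm(H)^2 + norm(S)^2,            # Pythagorean relation
    maximum(abs.(imag.(eigvals(H)))) < 1e-12,     # eigenvalues of H are real
    maximum(abs.(real.(eigvals(S)))) < 1e-12      # eigenvalues of S are imaginary
end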

There is an even stronger connection between the Hermitian / skew-Hermitian decomposition and the complex numbers for the case of \(\mathbb{R}^{2 \times 2}\). Let \(g : \mathbb{C}\rightarrow \mathbb{R}^{2 \times 2}\) be the map \[ g(a + bi) = aI + bJ, \quad J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. \] Then \[\begin{aligned} g(0) &= 0 \\ g(1) &= I \\ g(z+w) &= g(z) + g(w) \\ g(zw) &= g(z) g(w) \\ g(z^{-1}) &= g(z)^{-1}. \end{aligned}\] That is, the complex numbers are isomorphic to the set of 2-by-2 real matrices whose symmetric part is proportional to the identity.
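
A tiny illustrative check of this correspondence:

let
    J = [0.0 -1.0; 1.0 0.0]
    g(z) = real(z)*I + imag(z)*J

    z, w = 1.0 + 2.0im, -0.5 + 0.3im

    g(z + w) ≈ g(z) + g(w),
    g(z*w) ≈ g(z)*g(w),
    g(1/z) ≈ inv(g(z))
end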

4.4 Hermitian and quadratic forms

A sesquilinear form \(a : \mathcal{V}\times \mathcal{V}\rightarrow \mathbb{F}\) is Hermitian if \(a(v,w) = \overline{a(w,v)}\).
If \(a(v,w) = -\overline{a(w,v)}\), the form is called skew-Hermitian. Any sesquilinear form on \(\mathcal{V}\times \mathcal{V}\) can be decomposed into Hermitian and skew-Hermitian parts: \[\begin{aligned} a(v,w) &= a^H(v,w) + a^S(v,w) \\ a^H(v,w) &= \frac{1}{2} \left( a(v,w) + \overline{a(w,v)} \right) \\ a^S(v,w) &= \frac{1}{2} \left( a(v,w) - \overline{a(w,v)} \right). \end{aligned}\] We use the terms Hermitian and skew-Hermitian generically to also refer to symmetric and skew-symmetric forms on real vector spaces.

If \(\mathcal{V}\) is an inner product space, then for any sesquilinear form \(a : \mathcal{V}\times \mathcal{V}\rightarrow \mathbb{F}\), there is an associated operator \(L : \mathcal{V}\rightarrow \mathcal{V}\) such that \[ a(v,w) = \langle Lv, w \rangle. \] When \(a\) is Hermitian, we have \[ \langle Lv, w \rangle = a(v,w) = \overline{a(w,v)} = \langle v, Lw \rangle, \] i.e. \(L = L^*\) is a self-adjoint operator. A similar argument says that if \(a\) is skew-Hermitian, the corresponding operator will satisfy \(L = -L^*\). Hence, the study of Hermitian forms is closely linked to the study of self-adjoint operators.

A quadratic form is a function \(\phi : \mathcal{V}\rightarrow \mathbb{F}\) that is homogeneous of degree 2 (i.e. \(\phi(\alpha v) = \alpha^2 \phi(v)\)) and such that \((u, v) \mapsto \phi(u+v) - \phi(u) - \phi(v)\) is bilinear. The associated bilinear form is always symmetric. Hence, on real vector spaces the study of Hermitian forms is also closely linked to the study of quadratic forms.

4.4.1 Matrices

Given a basis \(V\) for \(\mathcal{V}\), the standard matrix representation of a sesquilinear form on \(\mathcal{V}\times \mathcal{V}\) is a square matrix with entries \(a_{ij} = a(v_j, v_i)\) so that \[ a(Vc, Vd) = d^* A c. \] Decomposing the form into Hermitian and skew-Hermitian parts corresponds to decomposing \(A = H+S\) where \(H=H^*\) and \(S=-S^*\) are the Hermitian and skew-Hermitian parts of the matrix.

A quadratic form \(\phi\) has an associated symmetric bilinear form \[ a(u, v) = \frac{1}{2} \left( \phi(u+v) - \phi(u) - \phi(v) \right). \] The matrix representation for a quadratic form is \[ \phi(Vc) = c^T A c \] where the entries of \(A\) are \(a_{ij} = a(v_i, v_j)\). Conversely, for any real bilinear form, \(a\), we have that \(\phi(v) := a(v,v)\) is a quadratic form, and \(\phi(Vc) = c^T H c\) where \(H\) is the matrix for the symmetric part of \(a\).

Let \(A\) be a matrix representing a sesquilinear form \(a : \mathcal{V}\times \mathcal{V} \rightarrow \mathbb{F}\) with respect to a basis \(V\), i.e. \[ a(Vc, Vd) = d^* A c. \] We can write any other basis as \(\hat{V} = VX\) for some nonsingular \(X\); with respect to the \(\hat{V}\) basis, we have \[ a(\hat{V}c, \hat{V}d) = a(VXc, VXd) = d^* (X^* A X) c. \] For any nonsingular \(X\), we say \(A\) and \(\hat{A} = X^* A X\) are congruent. Just as similar matrices represent the same operator under different bases, congruent matrices represent the same sesquilinear form under different bases. When we restrict our attention to orthonormal bases, the matrix \(X\) will be unitary; we note that \(X^* = X^{-1}\) for unitary \(X\), so a unitary congruence and a unitary similarity are the same thing.

4.4.2 Canonical forms

If we allow an arbitrary choice of basis, there exists a basis in which the matrix representation of a Hermitian form is \[ \begin{bmatrix} I_{\nu_+} & & \\ & 0_{\nu_0} & \\ & & -I_{\nu_{-}} \end{bmatrix}. \] The triple \((\nu_+, \nu_0, \nu_-)\) is the inertia of the form. Inertia, like rank, is a property of the linear algebraic object (in this case the Hermitian form) rather than its representation in any specific basis. Let \(n = \dim(\mathcal{V})\); then we classify the form based on its inertia as:

  • Positive definite if \(\nu = (n, 0, 0)\)
  • Positive semidefinite if \(\nu = (r, n-r, 0)\)
  • Negative definite if \(\nu = (0, 0, n)\)
  • Negative semidefinite if \(\nu = (0, n-r, r)\)
  • Strongly indefinite if \(\nu_+ > 0\) and \(\nu_- > 0\).

A Hermitian positive definite form is an inner product.

If \(\mathcal{V}\) is an inner product space and we restrict ourselves to orthonormal bases, the canonical form is a diagonal matrix \(\Lambda\) with \(\nu_+\) positive values, \(\nu_0\) zeros, and \(\nu_-\) negative values. If we identify \(a(v, w)\) with \(\langle Lv, w \rangle\), then this canonical form gives that \(a(v_i, v_j) = \langle Lv_i, v_j \rangle = \lambda_i \delta_{ij}\). The Hermitian property implies that \(a(v_i, v_i) = \overline{a(v_i, v_i)} = \lambda_i\), so \(\lambda_i\) must be real even when the form is over \(\mathbb{C}\). Given the orthonormality of the basis, this means \[ Lv_j = v_j \lambda_j, \] i.e. the canonical decomposition of the Hermitian form is really the same as the eigenvalue decomposition of the associated self-adjoint operator \(L\).
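
Since the inertia can be read off from the signs of the eigenvalues of any matrix representation, we can check numerically that it is unchanged by congruence (this invariance is Sylvester's law of inertia). An illustrative sketch:

let
    # A Hermitian form with inertia (2, 1, 1), represented in one basis ...
    A = Diagonal([1.0, 2.0, 0.0, -3.0])
    X = rand(4,4)            # change of basis (nonsingular with probability 1)
    B = X'*A*X               # ... and the congruent representation in another

    function inertia(M)
        λ = eigvals(Symmetric(Matrix(M)))
        (count(>(1e-8), λ), count(x -> abs(x) <= 1e-8, λ), count(<(-1e-8), λ))
    end

    inertia(A) == (2, 1, 1) == inertia(B)
end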

The Hermitian eigenvalue problem has an enormous amount of structure that comes from the fact that it can be interpreted either in terms of a Hermitian form (or the associated quadratic form) or in terms of an operator.

4.4.3 A derivative example

As before, it is useful to consider a specific example that does not involve a concrete vector space. We will use the real symmetric form on \(\mathcal{P}_d\) given by \[ a(p, q) = \left\langle \frac{dp}{dx}, \frac{dq}{dx} \right\rangle \] using the \(L^2([-1,1])\) inner product between polynomials. Integration by parts implies that \[\begin{aligned} a(p,q) &= \int_{-1}^1 \frac{dp}{dx} \frac{dq}{dx} \, dx \\ &= \left. \frac{dp}{dx} q \right|_{-1}^1 - \int_{-1}^1 \frac{d^2p}{dx^2} q \, dx \\ &= \left\langle \delta_1 \delta_1^* \frac{dp}{dx} - \delta_{-1} \delta_{-1}^* \frac{dp}{dx} - \frac{d^2 p}{dx^2}, q \right\rangle, \end{aligned}\] where \(\delta_1^*\) and \(\delta_{-1}^*\) are the evaluation functionals at \(\pm 1\) and \(\delta_1\) and \(\delta_{-1}\) are the associated vectors under the Riesz map. Hence, the self-adjoint linear map associated with the form is \[ L = \delta_1 \delta_1^* \frac{d}{dx} - \delta_{-1} \delta_{-1}^* \frac{d}{dx} - \frac{d^2}{dx^2}. \] We can check ourselves by implementing this relation in code

let
    # Implement the L map
    d = 2
    δp1 = dualL2_δx( 1.0, d)
    δn1 = dualL2_δx(-1.0, d)
    function L(p)
        Dp = derivative(p)
        D2p = derivative(Dp)
        δp1*Dp(1) - δn1*Dp(-1) - D2p
    end

    # Test polynomials p and q
    p = Polynomial([2.0; -3.0; 1.0])
    q = Polynomial([1.0; 1.0; 1.0])

    # Test agreement with the form and self-adjointness
    dotL2(derivative(p), derivative(q)) ≈ dotL2(L(p), q),
    dotL2(L(p), q) ≈ dotL2(p, L(q))
end
(true, true)

The form is positive semidefinite, since \(a(1,1) = 0\) but \(a(p,p)\) can never be negative (by positive definiteness of the inner product); the inertia is \((d, 1, 0)\). While we can compute the eigenvalues in terms of a matrix derived from \(L\), it is simpler to compute the matrix directly from the form:

let
    d = 3

    # Compute the matrix in terms of the normalized Legendre polynomials
    P̄ = normalized_legendre_basis(d)
    AP̄ = gram(dotL2, derivative.(P̄))

    # Compute the eigendecomposition
    F = eigen(AP̄)
    U = P̄*F.vectors

    # Check orthonormality of the U basis,
    # and that the associated matrix is Λ
    gram(dotL2, U) ≈ I,
    gram(dotL2, derivative.(U)) ≈ Diagonal(F.values)
end
(true, true)

Similar to what we saw in our discussion of the singular value decomposition, the \(k\)th nonzero eigenvalue converges as a function of \(d\) to a limiting value of \((k \pi/2)^2\). We plot the convergence of the first few eigenvalues below:

let
    d = 15
    P̄ = normalized_legendre_basis(d)
    AP̄ = gram(dotL2, derivative.(P̄))

    p = plot()
    for j=2:5
        λs = [eigvals(AP̄[1:n,1:n])[j] for n=j:d+1]
        plot!(j:d+1, abs.(λs.-((j-1)*π/2)^2),
              yscale=:log10, label="λ[$j]")
    end
    p
end

4.5 Tensors and determinants

So far, we have dealt with linear, bilinear, sesquilinear, and quadratic functions on vector spaces. Going beyond this to functions that are linear in several arguments takes us into the land of tensors.

4.5.1 Takes on tensors

There are several ways to approach tensors. In much of numerical analysis, only tensors of concrete spaces are considered, and tensors are treated as multi-dimensional arrays of numbers (with matrices as the special case where two dimensions are used for indexing). For example, a tensor with three indices is represented by an element \(a_{ijk}\) of \(\mathbb{F}^{m \times n \times p}\). We will see this perspective in detail later, both in our discussion of numerical linear algebra and in our discussion of tensors in latent factor models.

Here, though, we will focus on tensors as basic objects of multilinear algebra. There are still several ways to approach this:

  • We can consider tensors purely in terms of bases, along with change-of-basis rules. This view, common in some corners of physics, says that “a tensor is an object that transforms like a tensor,” a perspective similar to “a vector is an object that transforms like a vector” that we mentioned earlier. We will prefer a discussion that gives changes of basis as a consequence of a more fundamental definition, rather than as the definition itself.

  • We can consider tensors from an algebraic perspective as a quotient space: the tensor product space \(\mathcal{V}\otimes \mathcal{W}\) is a vector space spanned by elements of the Cartesian product \(\mathcal{V}\times \mathcal{W}\), modulo the equivalence relations \[\begin{aligned} \alpha (v, w) &\sim (\alpha v, w) \sim (v, \alpha w) \\ (u+v, w) &\sim (u,w) + (v,w) \\ (u, v+w) &\sim (u,v) + (u,w) \end{aligned}\] We write \(v \otimes w\) to denote the equivalence class associated with \((v,w)\). While this definition has the advantage of being very general (it can generalize to structures other than vector spaces) and it does not depend on a basis, it can be a bit overly abstract.

  • We will mostly consider tensors as multilinear forms, generalizing the notion of bilinear forms discussed above. This has the advantage of extending our discussion of bilinear and sesquilinear forms from earlier, remaining abstracted away from any specific basis without becoming too abstract.

4.5.2 Multilinear forms

Let \(\mathcal{V}\) and \(\mathcal{U}\) be two vector spaces. For any \(u \in \mathcal{U}\) and \(w^* \in \mathcal{V}^*\), the outer product \(uw^*\) is associated with a rank 1 map in \(L(\mathcal{V}, \mathcal{U})\) given by \(v \mapsto u w^* v\). Alternately, we can see \(uw^*\) as a bilinear map from \(\mathcal{U}^* \times \mathcal{V}\rightarrow \mathbb{F}\) given by \((f^*, v) \mapsto f^* u w^* v\); or we can see \(uw^*\) as a dual map in \(L(\mathcal{U}^*, \mathcal{V}^*)\) given by \(f^* \mapsto f^* u w^*\). We will use the uniform notation \(u \otimes w^*\) to cover all of these possible interpretations of the outer product.

More generally, let \(\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_d\) be vector spaces. Then for any vectors \(v_1, v_2, \ldots, v_d\) in these spaces, the tensor product \(v_1 \otimes v_2 \otimes \ldots \otimes v_d\) can be interpreted as the multilinear map from \(\mathcal{V}_1^* \times \mathcal{V}_2^* \times \ldots \mathcal{V}_d^* \rightarrow \mathbb{F}\) given by \[ (v_1 \otimes v_2 \otimes \ldots \otimes v_d)(w_1^*, w_2^*, \ldots, w_d^*) = (w_1^* v_1) (w_2^* v_2) \ldots (w_d^* v_d). \] As in the case of just two vector spaces, we can define other maps by partial evaluation. For example, we can consider this as a map from \(\mathcal{V}_2^* \times \ldots \times \mathcal{V}_d^* \rightarrow \mathcal{V}_1\) by \[ (w_2^*, \ldots, w_d^*) \mapsto a(\cdot, w_2^*, \ldots, w_d^*) \] where “filling in” all but the first argument gives an element of \(\mathcal{V}_1\) (or technically of \(\mathcal{V}_1^{**}\), but this is isomorphic10 to \(\mathcal{V}_1\)).

The vector space of all multilinear forms \(\mathcal{V}_1^* \times \ldots \mathcal{V}_d^* \rightarrow \mathbb{F}\) is written as \(\mathcal{V}_1 \otimes \ldots \otimes \mathcal{V}_d\). If \(\mathcal{V}_j\) has a basis \(V^{(j)}\) and associated dual basis \((W^{(j)})^*\), then an arbitrary element \(a \in \mathcal{V}_1 \otimes \ldots \otimes \mathcal{V}_d\) can be written uniquely as \[ a = \sum_{i_1, \ldots, i_d} a_{i_1 \ldots i_d} \, v_{i_1}^{(1)} \otimes \ldots \otimes v_{i_d}^{(d)}. \] where \[ a_{i_1 \ldots i_d} = a(w_{i_1}^{(1)*}, \ldots w_{i_d}^{(d)*}). \] That is, the set of all tensor products of basis vectors for the spaces \(\mathcal{V}_j\) form a basis for the tensor product space \(\mathcal{V}_1 \otimes \ldots \otimes \mathcal{V}_d\).

In terms of the partial evaluation operation discussed above, we might consider \(a\) as a map from \(\mathcal{V}_2^* \otimes \ldots \otimes \mathcal{V}_d^*\) to \(\mathcal{V}_1\) by taking \[\begin{aligned} b &= \sum_{i_2, \ldots, i_d} b_{i_2 \ldots i_d} \, w_{i_2}^{(2)*} \otimes \ldots \otimes w_{i_d}^{(d)*} \\ &\mapsto \sum_{i_2, \ldots, i_d} b_{i_2 \ldots i_d} \, a(\cdot, w_{i_2}^{(2)*}, \ldots, w_{i_d}^{(d)*}) \\ &= \sum_{i_1, i_2, \ldots, i_d} v_{i_1}^{(1)} \, a_{i_1 \ldots i_d} b_{i_2 \ldots i_d}. \end{aligned}\] This generalized partial evaluation operation is known as a tensor contraction. In general, we can contract two tensors on any pairs of “slots” in the form that take vectors from a dual pair of spaces.

When a real inner product space \(\mathcal{V}\) appears in a tensor, we can construct a new tensor in which \(\mathcal{V}^*\) appears in the same slot by composing that slot with the inverse Riesz map (“lowering the index” in the physics parlance). Similarly, we can convert a \(\mathcal{V}^*\) slot to a \(\mathcal{V}\) slot via the Riesz map (“raising the index”). When a tensor is represented with respect to orthonormal bases, raising or lowering operations do not change any of the coefficients.

It is tempting to ask if the coefficients might be made simple by a special choice of bases for the component spaces, analogous to the canonical forms we discussed before. Unfortunately, tensors beyond bilinear forms lack any nice canonical representation.

4.5.3 Polynomial products

As an example of tensor product spaces in action, we consider the space of polynomials in three variables with maximum degree per variable of \(d\). In the power basis, we would write this as \[ \mathcal{P}_d \otimes \mathcal{P}_d \otimes \mathcal{P}_d = \left\{ \sum_{i,j,k} c_{ijk} x^i y^j z^k : c_{ijk} \in \mathbb{R}^{(d+1) \times (d+1) \times (d+1)} \right\}. \] Evaluating a polynomial \(p \in \mathcal{P}_d \otimes \mathcal{P}_d \otimes \mathcal{P}_d\) at a particular point \((\alpha, \beta, \gamma)\) is associated with evaluation of a trilinear form, or equivalently with contraction against a tensor in \(\mathcal{P}_d^* \otimes \mathcal{P}_d^* \otimes \mathcal{P}_d^*\); i.e. \[ p \cdot (\delta_{\alpha}^* \otimes \delta_{\beta}^* \otimes \delta_{\gamma}^*). \] We can similarly think about contracting in some slots but not others, or contracting against other linear functionals (like integration).

For concrete tensors, contractions can also be used to express operations like changing basis. For example, if \(T_{0:d} A = X_{0:d}\), we can write \[\begin{aligned} p(x,y,z) &= \sum_{i,j,k} c_{ijk} \, x^i y^j z^k \\ &= \sum_{l,m,n} d_{lmn} \, T_l(x) T_m(y) T_n(z) \\ d_{lmn} &= \sum_{i,j,k} c_{ijk} a_{li} a_{mj} a_{nk} \end{aligned}\] That is, we change bases by applying \(A\) on each “fiber” of the coefficient tensor (i.e. a vector associated with holding two indices fixed and letting the other vary). Indeed, the transformation from the \(c_{ijk}\) to the \(d_{lmn}\) coefficients is most easily coded in terms of such a matrix operation:

let
    d = 2
    C = rand(d+1,d+1,d+1)  # Power coeffs

    # Transform to the Chebyshev coefficients
    D = copy(C)
    M = chebyshev_power_coeffs(d)  # A = inv(M)

    # Contract with a_li
    for j=1:d+1
        for k = 1:d+1
            D[:,j,k] = M\D[:,j,k]
        end
    end

    # Contract with a_mj
    for i=1:d+1
        for k=1:d+1
            D[i,:,k] = M\D[i,:,k]
        end
    end

    # Contract with a_nk
    for i=1:d+1
        for j=1:d+1
            D[i,j,:] = M\D[i,j,:]
        end
    end

    # Check correctness by comparing evaluations at a random point
    T = chebyshev_basis(d)
    x,y,z = rand(3)
    p1 = 0.0
    p2 = 0.0
    for i=1:d+1
        for j=1:d+1
            for k=1:d+1
                p1 += C[i,j,k] * x^(i-1) * y^(j-1) * z^(k-1)
                p2 += D[i,j,k] * T[i](x) * T[j](y) * T[k](z)
            end
        end
    end
    p1 ≈ p2
end
true

4.5.4 Alternating forms

An alternating form is a multilinear form on several copies of the same vector space \(\mathcal{V}\) with the property that it is zero if any input is repeated. This implies that swapping the order of any pair of arguments reverses the sign; for example, if \(a : \mathcal{V}\times \mathcal{V} \times \mathcal{V}\rightarrow \mathbb{F}\) is a multilinear form, then \[\begin{aligned} a(u, v, w) + a(w, v, u) &= a(u, v, w) + a(u, v, u) + a(w, v, w) + a(w, v, u) \\ &= a(u, v, w+u) + a(w, v, w+u) \\ &= a(u+w, v, w+u) \\ &= 0. \end{aligned}\] The set of all alternating forms on \(k\) copies of \(\mathcal{V}\) is written as \(\mathcal{V}^* \wedge \ldots \wedge \mathcal{V}^*\), a subspace of \(\mathcal{V}^* \otimes \ldots \otimes \mathcal{V}^*\).

The wedge product of two tensors over copies of the same space is \[ f \wedge g = f \otimes g - g \otimes f. \] Alternating forms of a given number of arguments also form a vector space. For example, \(\mathcal{V}^* \wedge \mathcal{V}^* \wedge \mathcal{V}^*\) has a basis of vectors \(w_i^* \wedge w_j^* \wedge w_k^*\) for each possible subset \(\{i, j, k\}\) of three distinct indices from the range \(1, \ldots, n\). Hence, it has dimension \(n\) choose \(3\).

If \(\mathcal{V}\) is \(n\)-dimensional, the space of \(n\) copies of \(\mathcal{V}^*\) wedged together has \(n\) choose \(n\) dimensions — that is, there is only one alternating form on \(n\) inputs, up to scaling. We arbitrarily choose one such form by picking a basis \(V\) and letting \(a(v_1, v_2, \ldots, v_n) = 1\) and declaring this to be the signed volume form. In most situations, \(\mathcal{V}\) is an inner product space and we choose \(V\) to be an orthonormal basis.

4.5.5 Determinants

Let \(f \in \mathcal{V}^* \wedge \ldots \wedge \mathcal{V}^*\) be the signed volume form for a space \(\mathcal{V}\). Rather than writing evaluation of \(f\) as \(f(v_1, v_2, \ldots, v_n)\), we will collect the arguments into a quasimatrix and write \(f(V)\) for the volume associated with a parallelepiped whose edges are the vectors \(V\).

If \(U\) is a basis such that \(f(U) = 1\), then we can write any other set of \(n\) vectors as \(UA\) for some matrix \(A\). The determinant of \(A\) is the function such that \(f(UA) = f(U) \det(A) = \det(A)\). Put differently, the determinant represents the change of volume in a mapping \(L = UAU^{-1}\) represented with respect to a basis quasimatrix \(U\) corresponding to a parallelepiped of unit (signed) volume. In order to satisfy this definition, the determinant must be

  • An alternating form on the concrete space \(\mathbb{F}^n \otimes \ldots \otimes \mathbb{F}^n\) (written in terms of matrices \(\mathbb{F}^{n \times n}\))

  • Equal to one for the identity matrix.

The interpretation of the determinant as representing a scaling of (signed) volumes explains various other properties, such as the important fact that \(\det(AB) = \det(A) \det(B)\). It also explains why determinants arise in the change-of-variables formula for integration, which is one of the main places where we will see them.

The determinant of a singular matrix must be zero, which explains why we can write the characteristic polynomial for a matrix as \(\det(zI-A)\). However, we prefer to treat both determinants and characteristic polynomials as interesting objects in their own right, and the connection through the determinant as a derived fact.
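
As a final illustrative check, the determinant of a concrete matrix is multiplicative, and \(\det(zI - A)\) indeed vanishes at the eigenvalues:

let
    A, B = rand(4,4), rand(4,4)

    det(A*B) ≈ det(A)*det(B),
    maximum(abs(det(λ*I - A)) for λ in eigvals(A)) < 1e-8
end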


  1. “Vector spaces are like potato chips: you cannot have just one.” - W. Kahan↩︎

  2. In infinite-dimensional settings, we will restrict our attention to continuous linear functionals (the continuous dual space) rather than all linear functionals (the algebraic dual space).↩︎

  3. The set containing only the zero vector is the smallest subspace we ever talk about.↩︎

  4. “Caveat lector”: let the reader beware!↩︎

  5. Pafnuty Chebyshev was a nineteenth century Russian mathematician, and his name has been transliterated from the Cyrillic alphabet into the Latin alphabet in several different ways. We inherit our usual spelling from one of the French transliterations, but the symbol \(T\) for the polynomials comes from the German transliteration Tschebyscheff.↩︎

  6. Some authors write \(L(\mathcal{V}, \mathcal{U})\) for the set of bounded linear maps between two spaces. This distinction matters when we are dealing with infinite-dimensional normed vector spaces — since we are mostly concerned with the finite-dimensional case, we will not worry about it.↩︎

  7. In the absence of an inner product, the coimage is most naturally thought of as a subspace of \(\mathcal{V}^*\) and the cokernel as a subspace of \(\mathcal{U}^*\). The Riesz map allows us to move these back to subspaces of \(\mathcal{V}\) and \(\mathcal{U}\).↩︎

  8. The notion of an induced norm makes sense in infinite-dimensional spaces as well, but there the supremum may not be achieved.↩︎

  9. A monic polynomial is one in which the coefficient in front of the highest-degree term is 1.↩︎

  10. There is a canonical injection \(\mathcal{V}\rightarrow \mathcal{V}^{**}\) via \(v \mapsto (w^* \mapsto w^* v)\). In finite-dimensional spaces, this is always also surjective; in infinite-dimensional spaces, it is only surjective when the space is reflexive.↩︎