\documentclass[11pt,onecolumn]{article}
\usepackage{amssymb, amsmath, amsthm,graphicx, paralist,algpseudocode,algorithm,cancel,url,color}
\usepackage{sectsty}
\usepackage{fancyvrb}
\usepackage{mathrsfs}
\usepackage{multirow}
\usepackage{hhline}
\usepackage{booktabs}
\usepackage[table]{xcolor}
\usepackage{tikz}
% \usepackage[framed,numbered,autolinebreaks,useliterate]{mcode}
\usepackage{listings}
\usepackage{enumitem}
\newcommand{\bvec}[1]{\mathbf{#1}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rnn}{\R^{n\times n}}
\newcommand{\Rmn}{\R^{m\times n}}
\newcommand{\Cnn}{\C^{n\times n}}
\newcommand{\Cmn}{\C^{m\times n}}
\newcommand{\cO}{\mathcal{O}}
\DeclareMathOperator{\Tr}{Tr}
\DeclareMathOperator{\trace}{trace}
\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\dist}{dist}
\DeclareMathOperator{\spann}{span}
\sectionfont{\Large\sc}
\subsectionfont{\sc}
\usepackage[margin=1in]{geometry}
\begin{document}
\noindent
\textsc{\Large CS 6210: Homework 5}\\
Instructor: Anil Damle\\
Due: November 20, 2023\\
\subsection*{Policies}
You may discuss the homework problems freely with other students, but please
refrain from looking at their code or writeups (or sharing your own).
Ultimately, you must implement your own code and write up your own solution to
be turned in. Your solution, including plots and requested output from your code,
should be typeset and submitted via Gradescope as a PDF file. Additionally, please
submit any code written for the assignment. This can be done either by including
it in your solution as an appendix or by uploading it as a zip file to the
separate Gradescope assignment.\\
\hrule
\noindent
\subsection*{Question 1:}
Here we will prove one nice property of the Rayleigh quotient $\left(r_A(z) = \frac{z^TAz}{z^Tz}\right)$ and use that to prove the
convergence rate of Rayleigh iteration. Assume that $v$ is an eigenvector of $A$ with
simple eigenvalue $\lambda$ and $z$ is some vector of unit length such that
$\dist(v,z) = \cO(\epsilon).$
\begin{enumerate}[label=(\alph*)]
\item Prove that if $A$ is real and symmetric then $\lvert \lambda - r_A(z)\rvert \leq \cO(\epsilon^2).$
\item Assume that $A$ is an $n\times n$ real and symmetric matrix with distinct eigenvalues
$\lambda_1,\lambda_2,\ldots,\lambda_n.$ Under the assumption that Rayleigh iteration converges
(technically it does for almost all starting vectors, i.e., all but a set of measure zero)
prove that the convergence is asymptotically cubic to some eigenvalue of $A.$
\item Assuming that $A$ is not symmetric (or Hermitian), argue that in general we may only expect
the Rayleigh quotient $r_A$ to provide an eigenvalue estimate satisfying
$\lvert \lambda - r_A(z)\rvert \leq \cO(\epsilon)$ when $\dist(v,z) = \cO(\epsilon).$
\end{enumerate}
(Hint: it may be useful as an intermediary to prove that for two one-dimensional subspaces
represented by unit-length vectors $v$ and $z$, $\dist(v,z) = \sqrt{1-(v^Tz)^2}.$)
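For reference, the subspace distance intended here is presumably the usual one: for subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ with orthogonal projectors $P_1$ and $P_2,$
\[
\dist(\mathcal{S}_1,\mathcal{S}_2) = \lVert P_1 - P_2 \rVert_2,
\]
so for unit-length vectors $v$ and $z$ one takes $P_1 = vv^T$ and $P_2 = zz^T.$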
\subsection*{Question 2:}
Assume you are given an $n\times n$ matrix $A$ with eigenvalues such that
$\lvert \lambda_1 \rvert = \lvert \lambda_2 \rvert > \lvert \lambda_3 \rvert > \cdots >
\lvert\lambda_n\rvert.$ Is there any sense in which the iterates generated by the power
method converge? If you think they do, is there any way to get an estimate of any of the
eigenvalues of $A$ given the iterates?
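Though this question asks for analysis, a small numerical experiment can help build intuition before committing to an answer. The following sketch is purely illustrative (the test matrix and function names are not part of the assignment); it runs the power method and records the Rayleigh quotient at each step so you can inspect what, if anything, settles down.
\begin{lstlisting}[language=Python]
import numpy as np

def power_method(A, x0, iters):
    # Power method: repeatedly apply A and normalize; also record
    # the Rayleigh quotient r_A(x) = x^T A x / x^T x at each step.
    x = x0 / np.linalg.norm(x0)
    history = []
    for _ in range(iters):
        y = A @ x
        x = y / np.linalg.norm(y)
        history.append(x @ A @ x)
    return x, history

# Two dominant eigenvalues of equal magnitude and opposite sign.
A = np.diag([2.0, -2.0, 0.5])
rng = np.random.default_rng(0)
x, history = power_method(A, rng.standard_normal(3), 100)
# Compare consecutive iterates (and every-other iterates) to probe
# what converges; note x @ (A @ (A @ x)) estimates lambda_1^2.
\end{lstlisting}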
\newpage
\subsection*{Question 3:}
Implement orthogonal iteration (also sometimes known as simultaneous iteration); you may use a
built-in eigenvalue solver to find the $r$ eigenvalues of the projection of $A$ onto the subspace
of current iterates.\\
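For concreteness, one possible shape of such an implementation is sketched below. This is a minimal sketch, not a prescribed solution; all names are illustrative and you should adapt the interface to the experiments you run.
\begin{lstlisting}[language=Python]
import numpy as np

def orthogonal_iteration(A, r, iters, rng=None):
    # Orthogonal (simultaneous) iteration: repeatedly form A Q and
    # re-orthonormalize via a QR factorization.  Eigenvalue estimates
    # come from the r x r projection Q^T A Q via a built-in solver.
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(A @ Q)
    evals = np.linalg.eigvals(Q.T @ A @ Q)
    return Q, evals

# Toy example with a clear gap between |lambda_2| and |lambda_3|.
A = np.diag([5.0, 4.0, 1.0, 0.5])
Q, evals = orthogonal_iteration(A, 2, 200)
\end{lstlisting}
To measure subspace convergence in the experiments below, one natural quantity is $\lVert QQ^T - Q_1Q_1^T \rVert_2$ against the reference invariant subspace.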
\noindent
For the remainder of this problem let $A$ be an $n\times n$ real matrix with the real block Schur
factorization (see HW1 for details)
\[
A = \begin{bmatrix}Q_1 & Q_2\end{bmatrix}
\begin{bmatrix}T_{11} & T_{12} \\ & T_{22}\end{bmatrix}
\begin{bmatrix}Q_1 & Q_2\end{bmatrix}^T
\]
where $Q_1$ consists of the first $r$ columns of the orthogonal matrix
$Q = \begin{bmatrix}Q_1 & Q_2\end{bmatrix},$ $T_{11}$ is $r\times r,$ and the remaining
dimensions are as inferred. Furthermore, let $A$ have eigenvalues satisfying
$\lvert \lambda_1 \rvert \geq \cdots \geq \lvert \lambda_r \rvert > \lvert \lambda_{r+1}
\rvert \geq \lvert \lambda_{r+2} \rvert \geq \cdots \geq \lvert \lambda_n \rvert$ and assume
that $T_{11}$ has eigenvalues $\lambda_1,\ldots,\lambda_r.$
\begin{enumerate}[label=(\alph*)]
\item For real symmetric $A$ of size $100 \times 100,$ generate instances of this problem where
$\lambda_{r+1}\rightarrow \lambda_r$ and show that your implementation achieves the desired
convergence rate for the invariant subspace of interest, both in terms of iterations and as a
function of the eigenvalue gap. Think carefully about how to illustrate this.
\item Now, assume $A$ is not normal and numerically explore how the convergence behavior of orthogonal
iteration changes as a function of $\|T_{12}\|_F$ and $\|N\|_F,$ where $N$ is the off-diagonal part of $T$ (i.e., $T = D + N$ where $D$ is diagonal). What are the implications of your observations? This question is deliberately quite open ended---the goal is to explore convergence behavior. So, points will be given for reasonable experiments and discussions and are not tied to any highly specific conclusion.
\end{enumerate}
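For part (a), one convenient way to generate such instances (a sketch, assuming it is acceptable to prescribe the spectrum directly; names and the particular eigenvalue ranges are illustrative) is to conjugate a chosen diagonal matrix by a random orthogonal matrix:
\begin{lstlisting}[language=Python]
import numpy as np

def symmetric_with_gap(n, r, gap, rng=None):
    # Random symmetric matrix with prescribed eigenvalues chosen so
    # that lambda_r - lambda_{r+1} = gap; shrinking `gap` sends
    # lambda_{r+1} -> lambda_r.
    rng = rng or np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    lam = np.concatenate([np.linspace(3.0, 1.0 + gap, r),
                          np.linspace(1.0, 0.1, n - r)])
    return Q @ np.diag(lam) @ Q.T, lam

A, lam = symmetric_with_gap(100, 5, 1e-2)
\end{lstlisting}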
\subsection*{Question 4:}
We will now consider some details related to the convergence of the QR algorithm for non-normal matrices and reiterate a few points from class. Suppose that $H$ is an $n\times n$ diagonalizable upper Hessenberg matrix with distinct eigenvalues. Say we run the QR algorithm for $k$ steps with real shifts $\{\mu^{(i)}\}_{i=1}^k,$ i.e., we let $H^{(1)} = H$ and define the iterates for $i = 1,2,\ldots$ by (1) computing the QR factorization $(H^{(i)} - \mu^{(i)} I) = Q^{(i)}R^{(i)}$ and (2) forming $H^{(i+1)} = R^{(i)}Q^{(i)} + \mu^{(i)}I.$ (Note the slight index shift from the notation used in class.)
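The two steps above translate directly into code. The following sketch (illustrative only; the small symmetric test matrix is not part of the assignment) performs one shifted QR step and then runs the Rayleigh-shift iteration as a sanity check.
\begin{lstlisting}[language=Python]
import numpy as np

def shifted_qr_step(H, mu):
    # (1) factor H - mu I = QR, then (2) form RQ + mu I, which is
    # orthogonally similar to H (so the spectrum is preserved).
    n = H.shape[0]
    Q, R = np.linalg.qr(H - mu * np.eye(n))
    return R @ Q + mu * np.eye(n)

H0 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 0.5],
               [0.0, 0.5, 4.0]])  # tridiagonal, hence upper Hessenberg
H = H0.copy()
for _ in range(10):
    H = shifted_qr_step(H, H[-1, -1])  # Rayleigh shift mu = H_{n,n}
# H[-1, -2] should now be tiny and H[-1, -1] near an eigenvalue of H0.
\end{lstlisting}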
\begin{enumerate}[label=(\alph*)]
\item Assume we are given an $n\times n$ upper Hessenberg matrix $H$ and are
running the QR algorithm with the Rayleigh shift $\mu^{(k)} = H^{(k)}_{n,n}$. Now, say
that at step $k$ we are computing the QR factorization of $H^{(k)} - \mu^{(k)}I$ using
Givens rotations and prior to applying the final necessary rotation we observe the structure
\[
G^{(n-2)}\cdots G^{(1)}\left(H^{(k)} - \mu^{(k)}I\right)
=
\begin{bmatrix}
R^{(k)}_{11} & R^{(k)}_{12} & R^{(k)}_{13} \\
0 & a & b \\
0 & \epsilon & 0
\end{bmatrix}
\]
where $R_{11}^{(k)}$ is $(n-2) \times (n-2)$ and $a,b,$ and $\epsilon$ are scalars. Derive an
expression for $H^{(k+1)}_{n,n-1}$ where $H^{(k+1)} = R^{(k)}Q^{(k)} + \mu^{(k)}I$ and give
some conditions under which we would expect that entry to be $\cO(\epsilon^2).$ Notably,
this argues that we may sometimes expect quadratic convergence to each eigenvalue in the
non-symmetric case.
\item Prove that
\[
(H-\mu^{(k)} I)(H-\mu^{(k-1)} I)\cdots (H-\mu^{(1)} I) = Q^{(1)}\cdots Q^{(k)} R^{(k)} \cdots R^{(1)}.
\]
In other words, after $k$ steps of this process we have implicitly computed a QR factorization of some polynomial of $H$ whose roots are the shifts.
\item The result in part (b) is also useful in showing the connection to shifted inverse iteration. Assuming that none of the selected shifts are eigenvalues of $H,$ show that the last column of $Q^{(1)}\cdots Q^{(k)}$ is equivalent to the vector obtained by running $k$ steps of shifted inverse iteration with $H^*$ starting from $e_n$ and with shifts $\{\mu^{(i)}\}_{i=1}^k.$ Does the order in which we use the shifts change this result? (E.g., what if we used the shifts in a different order in the shifted QR algorithm than in the sequence of shifted inverse iteration steps?)
\item There are suitable assumptions that allow us to assert that $Q^{(1)}\cdots Q^{(k)}$ is converging to the unitary factor of a Schur form of $H.$ We want to show that our previous results mesh well with this perspective (while omitting details of when we can expect such a result). If $H$ has Schur form $H = UTU^*$ show that: (1) the last column of $U$ is an eigenvector of $H^*$ and (2) we can construct a Schur form of $(H-\mu^{(k)} I)(H-\mu^{(k-1)} I)\cdots (H-\mu^{(1)} I)$ that has the same unitary factor $U$ as in a Schur form of $H.$ (Note that the careful language here is because the Schur form is not quite unique---the eigenvalues on the diagonal of $T$ can be arbitrarily ordered and that changes the specifics of the unitary factor. Fortunately, these two results are stated in a way that is not sensitive to this fact.)
\end{enumerate}
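Identities like the one in part (b) are also easy to sanity check numerically before proving them. The sketch below (illustrative; the matrix size and shifts are arbitrary) runs three shifted QR steps on a random upper Hessenberg matrix and compares both sides of the claimed factorization.
\begin{lstlisting}[language=Python]
import numpy as np

rng = np.random.default_rng(1)
n = 5
H = np.triu(rng.standard_normal((n, n)), -1)  # random upper Hessenberg
mus = [0.3, -1.2, 0.7]                        # shifts mu^{(1)}, mu^{(2)}, mu^{(3)}

Hk, Qs, Rs = H.copy(), [], []
for mu in mus:
    Q, R = np.linalg.qr(Hk - mu * np.eye(n))
    Hk = R @ Q + mu * np.eye(n)
    Qs.append(Q)
    Rs.append(R)

# Left-hand side: (H - mu3 I)(H - mu2 I)(H - mu1 I).
lhs = np.eye(n)
for mu in mus:
    lhs = (H - mu * np.eye(n)) @ lhs
# Right-hand side: Q1 Q2 Q3 R3 R2 R1.
rhs = Qs[0] @ Qs[1] @ Qs[2] @ Rs[2] @ Rs[1] @ Rs[0]
\end{lstlisting}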
\end{document}