This article assumes the reader has a basic knowledge of linear algebra.

The standalone PDF version of this article is available at {{< resource path="gmres.pdf" type="file" >}}. Or, if you prefer the blog (web page) version, you can print this article by clicking the print button.

We only consider real-valued linear systems.

## Subspace Projection Method 

### The Idea 

Consider the nonsingular linear system

$$ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{b} $$

where $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ and $\boldsymbol{b} \in \mathbb{R}^n$. The system is equivalent to the optimization problem:

$$ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{b} \Leftrightarrow \underset{\boldsymbol{x} \in \mathbb{R}^n}{\min} \lVert \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x} \rVert $$
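As a quick sanity check (a minimal NumPy sketch with a random, diagonally dominant matrix, which is not taken from the article), the minimizer of the residual norm coincides with the exact solution when $\boldsymbol{A}$ is nonsingular:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)  # random, diagonally dominant, hence nonsingular
b = rng.standard_normal(n)

x_solve = np.linalg.solve(A, b)                  # direct solution of A x = b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizer of || b - A x ||

print(np.allclose(x_solve, x_lstsq))             # True: the two formulations agree
```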

For a large-scale linear system, $n$ can be very large. Instead of finding a solution in $\mathbb{R}^n$, the subspace projection method tries to find an approximate solution in a subspace $\mathcal{S}$, such that the approximate solution satisfies some constraint making it a good estimate of the optimal solution:

$$ \underset{\boldsymbol{x} \in \mathbb{R}^n}{\min} \lVert \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x} \rVert \Rightarrow \underset{\boldsymbol{x} \in \mathcal{S}}{\min} \lVert \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x} \rVert $$

Let $\mathcal{S}$ and $\mathcal{C}$ be $m$-dimensional subspaces of $\mathbb{R}^n$, where $\dim{(\mathcal{S})} = \dim{(\mathcal{C})} = m \ll n$. $\mathcal{S}$ is called the search space and $\mathcal{C}$ is called the constraint space. The projection method for solving the linear system $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$ is formulated as:

$$ \text{find} \quad \boldsymbol{x}^* \in \mathcal{S} \quad \text{such that} \quad \boldsymbol{r}^* = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}^* \perp \mathcal{C} $$

Given an initial guess $\boldsymbol{x}_0$ with initial residual $\boldsymbol{r}_0 = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_0$, we can instead search for the solution in the affine space $\boldsymbol{x}_0 + \mathcal{S}$, and the problem is reformulated as:

$$ \boxed{\text{find} \quad \boldsymbol{x}^* = \boldsymbol{x}_0 + \boldsymbol{\delta}, \ \boldsymbol{\delta} \in \mathcal{S} \quad \text{such that} \quad \boldsymbol{r}^* = \boldsymbol{r}_0 - \boldsymbol{A}\boldsymbol{\delta} \perp \mathcal{C}} $$

The number of degrees of freedom is reduced from $n$ to $m$, with $m$ constraints. The orthogonality condition $$ \boxed{\boldsymbol{r}^* = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}^* \perp \mathcal{C}} $$ is called the Petrov-Galerkin condition; in particular, if $\mathcal{S} = \mathcal{C}$, it is called the Galerkin condition.

### Matrix Representation 

Let $\boldsymbol{S} = \lbrack \ \boldsymbol{s}_1 \mid \dots \mid \boldsymbol{s}_m \ \rbrack$ whose columns form a basis of $\mathcal{S}$, and $\boldsymbol{C} = \lbrack \ \boldsymbol{c}_1 \mid \dots \mid \boldsymbol{c}_m \ \rbrack$ whose columns form a basis of $\mathcal{C}$. The approximate solution $\boldsymbol{x}^*$ can be written as

$$ \begin{align} \boldsymbol{x}^* &= \boldsymbol{x}_0 + \boldsymbol{\delta}, \quad \boldsymbol{\delta} \in \mathcal{S} \\ &= \boldsymbol{x}_0 + \boldsymbol{S}\boldsymbol{t}, \quad \text{for some}\ \boldsymbol{t} \in \mathbb{R}^m \end{align} $$

The Petrov-Galerkin condition gives

$$ \begin{align} &\boldsymbol{r}_0 - \boldsymbol{A}\boldsymbol{\delta} \perp \mathcal{C} \\ \Leftrightarrow\ &\boldsymbol{r}_0 - \boldsymbol{A}\boldsymbol{S}\boldsymbol{t} \perp \boldsymbol{c}_i \quad \forall i = 1, \dots, m \\ \Leftrightarrow\ &\boldsymbol{C}^{\top} (\boldsymbol{r}_0 - \boldsymbol{A}\boldsymbol{S}\boldsymbol{t}) = 0 \\ \Leftrightarrow\ &\boldsymbol{C}^{\top}\boldsymbol{A}\boldsymbol{S}\boldsymbol{t} = \boldsymbol{C}^{\top}\boldsymbol{r}_0 \end{align} $$

If $\boldsymbol{C}^{\top} \boldsymbol{A}\boldsymbol{S}$ is nonsingular, then

$$ \begin{align} &\boldsymbol{t} = (\boldsymbol{C}^{\top}\boldsymbol{A}\boldsymbol{S})^{-1} \boldsymbol{C}^{\top}\boldsymbol{r}_0 \\ \Rightarrow\ &\boldsymbol{x}^* = \boldsymbol{x}_0 + \boldsymbol{S}(\boldsymbol{C}^{\top}\boldsymbol{A}\boldsymbol{S})^{-1} \boldsymbol{C}^{\top}\boldsymbol{r}_0 \end{align} $$

    \begin{algorithm}
\caption{Subspace Projection Method}
\begin{algorithmic}
\PROCEDURE{subspace-projection-method}{$\boldsymbol{A}, \boldsymbol{b}, \boldsymbol{x}_0$}
    \FOR{$k = 0$ \TO convergence}
        \STATE Find bases $\boldsymbol{S} = \lbrack \ \boldsymbol{s}_1 \mid \dots \mid \boldsymbol{s}_{m} \ \rbrack$ and $\boldsymbol{C} = \lbrack \ \boldsymbol{c}_1 \mid \dots \mid \boldsymbol{c}_{m} \ \rbrack$ for $\mathcal{S}$ and $\mathcal{C}$
        \STATE $\boldsymbol{r}_k := \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_k$
        \STATE $\boldsymbol{t}_k := (\boldsymbol{C}^{\top}\boldsymbol{A}\boldsymbol{S})^{-1} \boldsymbol{C}^{\top}\boldsymbol{r}_k$
        \STATE $\boldsymbol{x}_{k+1} := \boldsymbol{x}_k + \boldsymbol{S}\boldsymbol{t}_k$
    \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
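Below is a minimal NumPy sketch of a single projection step (the bases $\boldsymbol{S}$ and $\boldsymbol{C}$ are random here purely for illustration; choosing them well is exactly what Krylov subspace methods are about):

```python
import numpy as np

def projection_step(A, b, x0, S, C):
    """One subspace projection step: find x* in x0 + Col(S) such that r* is orthogonal to Col(C)."""
    r0 = b - A @ x0
    # Solve the small m x m system (C^T A S) t = C^T r0
    t = np.linalg.solve(C.T @ A @ S, C.T @ r0)
    return x0 + S @ t

rng = np.random.default_rng(1)
n, m = 50, 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x0 = np.zeros(n)
S = rng.standard_normal((n, m))   # basis of the search space (random, for illustration)
C = rng.standard_normal((n, m))   # basis of the constraint space (random, for illustration)

x1 = projection_step(A, b, x0, S, C)
print(np.linalg.norm(C.T @ (b - A @ x1)))   # ~ 0: the new residual is orthogonal to C
```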

## Krylov Subspace 

The order-$m$ (or $m$-th) Krylov subspace generated by $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ and $\boldsymbol{v} \in \mathbb{R}^{n}$, denoted by $\mathcal{K}_m(\boldsymbol{A}, \boldsymbol{v})$, is defined as the linear subspace spanned by the vectors $\boldsymbol{A}^{i}\boldsymbol{v}$ for $i = 0, \dots, m-1$; formally,

$$ \boxed{\mathcal{K}_m(\boldsymbol{A}, \boldsymbol{v}) = \text{span} \lbrace \boldsymbol{v}, \boldsymbol{A}\boldsymbol{v}, \dots, \boldsymbol{A}^{m-1}\boldsymbol{v} \rbrace} \subset \mathbb{R}^n $$

To express it in matrix form, we define the Krylov matrix with respect to $\boldsymbol{A}$ and $\boldsymbol{v}$, denoted by $\boldsymbol{K}_m$, to be the $n \times m$ matrix whose $(j+1)$-th column is $\boldsymbol{A}^{j}\boldsymbol{v}$, that is

$$ \boxed{\boldsymbol{K}_m(\boldsymbol{A}, \boldsymbol{v}) = \boldsymbol{K}_m = \begin{bmatrix} \ \boldsymbol{v} \mid \boldsymbol{A}\boldsymbol{v} \mid \dots \mid \boldsymbol{A}^{m-1}\boldsymbol{v} \ \end{bmatrix}} $$

$\boldsymbol{A}$ and $\boldsymbol{v}$ are often omitted if they are understood in context, then $\mathcal{K}_m(\boldsymbol{A}, \boldsymbol{v}) = \mathcal{K}_m$ and $\boldsymbol{K}_m(\boldsymbol{A}, \boldsymbol{v}) = \boldsymbol{K}_m$.

The columns of $\boldsymbol{K}_m$ span the $m$-th Krylov subspace, that is

$$ \text{Col} (\boldsymbol{K}_m) = \mathcal{K}_m $$
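A minimal NumPy sketch of building the Krylov matrix column by column (the helper name is mine); note that the columns quickly become nearly linearly dependent in floating point, which is why the Arnoldi orthogonalization below is used in practice:

```python
import numpy as np

def krylov_matrix(A, v, m):
    """Return K_m = [ v | A v | ... | A^{m-1} v ] as an n x m matrix."""
    cols = [v]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(2)
n, m = 30, 6
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)

K = krylov_matrix(A, v, m)
print(K.shape)                   # (30, 6)
print(np.linalg.matrix_rank(K))  # dim(K_m) <= m
```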

Properties of the Krylov subspace:

  1. The dimension of the $m$-th Krylov subspace is less than or equal to $m$.

    $$ \boxed{\dim{(\mathcal{K}_m)} \leq m} $$

  2. Given $\boldsymbol{A}$ and $\boldsymbol{v}$, a higher-order Krylov subspace contains lower-order subspaces, that is

    $$ \boxed{\mathcal{K}_1 \subseteq \mathcal{K}_2 \subseteq \dots \subseteq \mathcal{K}_m} $$

    The set of Krylov subspaces is nested.

  3. Given $\boldsymbol{A}$ and $\boldsymbol{v}$, we have $$ \text{Col} (\boldsymbol{A} \boldsymbol{K}_m) \subseteq \text{Col} (\boldsymbol{K}_{m+1}) $$

    If we define $\boldsymbol{A} \mathcal{K}_m = \text{span} \lbrace \boldsymbol{A}\boldsymbol{v}, \dots, \boldsymbol{A}^{m-1}\boldsymbol{v}, \boldsymbol{A}^{m}\boldsymbol{v} \rbrace = \text{Col} (\boldsymbol{A} \boldsymbol{K}_m)$, then it can be rewritten as

    $$ \boxed{\boldsymbol{A} \mathcal{K}_m \subseteq \mathcal{K}_{m+1}} $$

In Krylov subspace methods for solving $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$, the Krylov subspace $\mathcal{K}_m(\boldsymbol{A}, \boldsymbol{r}_0)$ generated by the initial residual (which equals $\mathcal{K}_m(\boldsymbol{A}, \boldsymbol{b})$ when $\boldsymbol{x}_0 = \boldsymbol{0}$) is employed as the search space in the projection method:

$$ \mathcal{S} = \mathcal{K}_m $$

Different choices of the constraint space lead to different Krylov subspace methods:

  • $\mathcal{C} = \mathcal{K}_m$ : FOM, CG, SYMMLQ
  • $\mathcal{C} = \boldsymbol{A}\mathcal{K}_m$ : MINRES, GMRES
  • $\mathcal{C} = \mathcal{K}_m(\boldsymbol{A}^{\top}, \boldsymbol{r}_0)$ : BiCG

## Arnoldi Iteration 

### GS-Arnoldi 

Arnoldi iteration (process) can be used to build an orthonormal basis of the Krylov subspace $\mathcal{K}_m$.

    \begin{algorithm}
\caption{Arnoldi Iteration}
\begin{algorithmic}
\PROCEDURE{arnoldi-iteration}{$\boldsymbol{A}$, $\boldsymbol{r}$}
    \STATE $\displaystyle\boldsymbol{v}_1 := \frac{\boldsymbol{r}}{\lVert \boldsymbol{r} \rVert}$
    \FOR{$j = 1, 2, \dots, m$}
        \FOR{$\color{blue}{i = 1, 2, \dots, j}$}
            \STATE $\color{blue}{h_{ij} := \langle \boldsymbol{A} \boldsymbol{v}_j, \boldsymbol{v}_i \rangle}$
        \ENDFOR
        \STATE $\color{blue}{\displaystyle\boldsymbol{w}_j := \boldsymbol{A} \boldsymbol{v}_j - \sum_{i=1}^{j} h_{ij} \boldsymbol{v}_i}$
        \STATE $h_{j+1,j} := \lVert \boldsymbol{w}_j \rVert$
        \IF{$h_{j+1,j} = 0$}
            \BREAK
        \ENDIF
        \STATE $\displaystyle\boldsymbol{v}_{j+1} := \frac{\boldsymbol{w}_j}{h_{j+1,j}}$
    \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

$\boldsymbol{v}_i$ is called an Arnoldi vector. The Arnoldi iteration is basically a standard Gram-Schmidt process. At each step $j$, the algorithm first computes the vector $\boldsymbol{A}\boldsymbol{v}_j$, then orthogonalizes $\boldsymbol{A}\boldsymbol{v}_j$ against all previous Arnoldi vectors $\boldsymbol{v}_i$, $i = 1, \dots, j$, giving the vector $\boldsymbol{w}_j$. If $\lVert \boldsymbol{w}_j \rVert$ is zero the algorithm stops; otherwise, the normalized $\boldsymbol{w}_j$ becomes the next Arnoldi vector $\boldsymbol{v}_{j+1}$.

The resulting set of (Arnoldi) vectors $\lbrace \boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_m \rbrace$ forms an orthonormal basis of $\mathcal{K}_m(\boldsymbol{A}, \boldsymbol{r})$. If the algorithm terminates early, that is, $h_{k+1,k} = 0$ at step $k$, then $\boldsymbol{A}\boldsymbol{v}_k$ is a linear combination of $\boldsymbol{v}_1, \dots, \boldsymbol{v}_k$.

### MGS-Arnoldi 

However, in practice, for numerical stability reasons, the modified Gram-Schmidt (MGS) process or the Householder transformation is employed instead of the standard Gram-Schmidt process. They are mathematically equivalent. The only difference is that lines $4$-$6$ of the Arnoldi iteration are replaced with lines $4$-$7$ of the MGS-Arnoldi iteration.

    \begin{algorithm}
\caption{MGS-Arnoldi Iteration}
\begin{algorithmic}
\PROCEDURE{mgs-arnoldi-iteration}{$\boldsymbol{A}$, $\boldsymbol{r}$}
    \STATE $\displaystyle\boldsymbol{v}_1 := \frac{\boldsymbol{r}}{\lVert \boldsymbol{r} \rVert}$
    \FOR{$j = 1, 2, \dots, m$}
        \STATE $\color{blue}{\boldsymbol{w}_j := \boldsymbol{A} \boldsymbol{v}_j}$
        \FOR{$\color{blue}{i = 1, 2, \dots, j}$}
            \STATE $\color{blue}{h_{ij} := \langle \boldsymbol{w}_j, \boldsymbol{v}_i \rangle}$
            \STATE $\color{blue}{\boldsymbol{w}_j := \boldsymbol{w}_j - h_{ij} \boldsymbol{v}_i}$
        \ENDFOR
        \STATE $h_{j+1,j} := \lVert \boldsymbol{w}_j \rVert$
        \IF{$h_{j+1,j} = 0$}
            \BREAK
        \ENDIF
        \STATE $\displaystyle\boldsymbol{v}_{j+1} := \frac{\boldsymbol{w}_j}{h_{j+1,j}}$
    \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
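A minimal NumPy translation of the MGS-Arnoldi iteration (names are mine; the early-termination case simply truncates the output). It also verifies the orthonormality of the Arnoldi vectors and the relation $\boldsymbol{A} \boldsymbol{V}_m = \boldsymbol{V}_{m+1} \bar{\boldsymbol{H}}_{m+1}$ derived in the next subsection:

```python
import numpy as np

def mgs_arnoldi(A, r, m):
    """MGS-Arnoldi: returns V (n x (m+1), orthonormal columns) and the Hessenberg H ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r / np.linalg.norm(r)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # orthogonalize against the previous Arnoldi vectors
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:            # breakdown: return the square block, A V = V H exactly
            return V[:, : j + 1], H[: j + 1, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(3)
n, m = 40, 8
A = rng.standard_normal((n, n))
r = rng.standard_normal(n)

V, H = mgs_arnoldi(A, r, m)
print(np.allclose(V.T @ V, np.eye(V.shape[1])))  # the Arnoldi vectors are orthonormal
print(np.allclose(A @ V[:, :m], V @ H))          # A V_m = V_{m+1} H_bar
```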

### Matrix Representation 

From the Arnoldi process we have, for $j = 1, \dots, m$

$$ \begin{align} &h_{j+1,j} \boldsymbol{v}_{j+1} = \boldsymbol{A} \boldsymbol{v}_j - \sum_{i=1}^{j} h_{ij} \boldsymbol{v}_i \\ \Rightarrow\ & \boldsymbol{A} \boldsymbol{v}_j = h_{j+1,j} \boldsymbol{v}_{j+1} + \sum_{i=1}^{j} h_{ij} \boldsymbol{v}_i \\ \Rightarrow\ & \boldsymbol{A} \boldsymbol{v}_j = \sum_{i=1}^{j+1} h_{ij} \boldsymbol{v}_i \\ \Rightarrow\ & \boldsymbol{A} \boldsymbol{V}_m = \boldsymbol{V}_{m+1} \bar{\boldsymbol{H}}_{m+1} \end{align} $$

where $\boldsymbol{V}_m$ is the matrix whose $j$-th column is the Arnoldi vector $\boldsymbol{v}_j$:

$$ \boldsymbol{V}_{m+1} = \begin{bmatrix} \boldsymbol{V}_{m} \mid \boldsymbol{v}_{m+1} \end{bmatrix} = \begin{bmatrix} \boldsymbol{v}_1 \mid \boldsymbol{v}_2 \mid \dots \mid \boldsymbol{v}_m \mid \boldsymbol{v}_{m+1} \end{bmatrix} \in \mathbb{R}^{n \times (m+1)} $$

and $\bar{\boldsymbol{H}}_{m+1}$ is the matrix whose $(i, j)$-element is $h_{ij}$, with all remaining entries zero:

$$ \boldsymbol{\bar{H}}_{m+1} = \begin{bmatrix} \boldsymbol{H}_m \\ \hline \ \boldsymbol{0} \mid h_{m+1,m} \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1,m-1} & h_{1m} \\ h_{21} & h_{22} & \cdots & h_{2,m-1} & h_{2m} \\ 0 & h_{32} & \cdots & h_{3,m-1} & h_{3m} \\ 0 & 0 & \cdots & h_{4,m-1} & h_{4m} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & h_{m,m-1} & h_{m,m} \\ \hline \ 0 & 0 & \cdots & 0 & h_{m+1,m} \end{bmatrix} \in \mathbb{R}^{(m+1) \times m} $$

$\boldsymbol{H}_{m}$ is an upper Hessenberg matrix (the elements below its subdiagonal are all zeros).

Since the columns of $\boldsymbol{V}_{m+1}$ are orthonormal (so $\boldsymbol{V}_m^{\top} \boldsymbol{V}_{m+1} = \lbrack\, \boldsymbol{I}_m \mid \boldsymbol{0} \,\rbrack$), we have

$$ \begin{align} & \boxed{\boldsymbol{A} \boldsymbol{V}_m = \boldsymbol{V}_{m+1} \bar{\boldsymbol{H}}_{m+1}} \\ \Rightarrow\ & \boxed{\boldsymbol{V}^{\top}_m \boldsymbol{A} \boldsymbol{V}_m = \boldsymbol{H}_{m}} \end{align} $$

## Generalized Minimal Residual Method 

The generalized minimal residual method (GMRES) is one of the most widely used methods for solving large-scale nonsymmetric linear systems.

Any vector $\boldsymbol{x} \in \mathbb{R}^n$ in $\boldsymbol{x}_0 + \mathcal{K}_m$ can be represented as

$$ \boldsymbol{x} = \boldsymbol{x}_0 + \boldsymbol{V}_m \boldsymbol{t} $$

where $\boldsymbol{V}_m \in \mathbb{R}^{n \times m}$, $\boldsymbol{t} \in \mathbb{R}^m$, and $\boldsymbol{x}_0$ is the initial guess.

Finding the best approximate solution $\boldsymbol{x}^*$ in $\boldsymbol{x}_0 + \mathcal{K}_m$ is equivalent to solving the least-squares problem

$$ \begin{align} & \underset{\boldsymbol{x} \in \boldsymbol{x}_0 + \mathcal{K}_m}{\min} \ \lVert \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x} \rVert \\ \Leftrightarrow\ & \underset{\boldsymbol{t} \in \mathbb{R}^m}{\min} \ \lVert \boldsymbol{b} - \boldsymbol{A} (\boldsymbol{x}_0 + \boldsymbol{V}_m \boldsymbol{t}) \rVert \\ \Leftrightarrow\ & \underset{\boldsymbol{t} \in \mathbb{R}^m}{\min} \ \lVert \boldsymbol{r}_0 - \boldsymbol{A} \boldsymbol{V}_m \boldsymbol{t} \rVert \\ \Leftrightarrow\ & \underset{\boldsymbol{t} \in \mathbb{R}^m}{\min} \ \lVert \beta \boldsymbol{v}_1 - \boldsymbol{V}_{m+1} \bar{\boldsymbol{H}}_{m+1} \boldsymbol{t} \rVert \qquad \beta = \lVert \boldsymbol{r}_0 \rVert \\ \Leftrightarrow\ & \underset{\boldsymbol{t} \in \mathbb{R}^m}{\min} \ \lVert \boldsymbol{V}_{m+1} (\beta \boldsymbol{e}_1 - \bar{\boldsymbol{H}}_{m+1} \boldsymbol{t}) \rVert \\ \Leftrightarrow\ & \underset{\boldsymbol{t} \in \mathbb{R}^m}{\min} \ \lVert \beta \boldsymbol{e}_1 - \bar{\boldsymbol{H}}_{m+1} \boldsymbol{t} \rVert \qquad \text{the columns of}\ \boldsymbol{V}_{m+1}\ \text{are orthonormal} \end{align} $$

Therefore, the approximate solution $\boldsymbol{x}_m$ is given by

$$ \begin{align} & \boldsymbol{x}_m = \boldsymbol{x}_0 + \boldsymbol{V}_m \boldsymbol{t}_m \\ \text{where}\quad & \boldsymbol{t}_m = \underset{\boldsymbol{t} \in \mathbb{R}^m}{\text{argmin}} \lVert \beta \boldsymbol{e}_1 - \bar{\boldsymbol{H}}_{m+1} \boldsymbol{t} \rVert \end{align} $$

Typically $m \ll n$, so this reduced least-squares problem is much cheaper to solve than the original one.

    \begin{algorithm}
\caption{GMRES Method}
\begin{algorithmic}
\PROCEDURE{gmres}{$\boldsymbol{A}$, $\boldsymbol{b}$}
    \STATE $\boldsymbol{x}_0 :=$ initialize randomly
    \STATE $\boldsymbol{r}_0 := \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x}_0$
    \STATE $\displaystyle\boldsymbol{v}_1 := \frac{\boldsymbol{r}_0}{\lVert \boldsymbol{r}_0 \rVert_2}$
    \STATE $\beta := \lVert \boldsymbol{r}_0 \rVert_2$
    \FOR{$j := 1$ \TO $m$}
        \STATE $\boldsymbol{w}_j := \boldsymbol{A} \boldsymbol{v}_j$
        \FOR{$i := 1$ \TO $j$}
            \STATE $h_{ij} := \langle \boldsymbol{w}_j, \boldsymbol{v}_i \rangle$
            \STATE $\boldsymbol{w}_j := \boldsymbol{w}_j - h_{ij} \boldsymbol{v}_i$
        \ENDFOR
        \STATE $h_{j+1,j} := \lVert \boldsymbol{w}_j \rVert_2$
        \IF{$h_{j+1,j} = 0$}
            \STATE $m = j$
            \BREAK
        \ENDIF
        \STATE $\displaystyle\boldsymbol{v}_{j+1} := \frac{\boldsymbol{w}_j}{h_{j+1,j}}$
    \ENDFOR
    \STATE $\bar{\boldsymbol{H}}_{m+1} := \begin{bmatrix} h_{ij} \end{bmatrix}_{\ i \in [1, m+1],\ j \in [1, m]}$
    \STATE $\boldsymbol{t}_m = \underset{\boldsymbol{t} \in \mathbb{R}^m}{\text{argmin}} \lVert \beta \boldsymbol{e}_1 - \bar{\boldsymbol{H}}_{m+1} \boldsymbol{t} \rVert$
    \STATE $\boldsymbol{x}_m := \boldsymbol{x}_0 + \boldsymbol{V}_m \boldsymbol{t}_m$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

MGS-Arnoldi is employed in lines 6-15.
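For reference, here is a minimal NumPy sketch of the GMRES method exactly as written above, solving the small $(m+1) \times m$ least-squares problem with a dense solver (the practical variant below replaces this with Givens rotations); the function name and the test matrix are mine:

```python
import numpy as np

def gmres_basic(A, b, x0, m):
    """Basic GMRES: build K_m(A, r0) by MGS-Arnoldi, then solve the small least-squares problem."""
    n = A.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                        # MGS-Arnoldi iteration
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                # happy breakdown: exact solution lies in K_{j+1}
            m = j + 1
            V, H = V[:, :m], H[: m + 1, :m]
            break
        V[:, j + 1] = w / H[j + 1, j]
    # t_m = argmin || beta e_1 - H_bar t ||
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta
    t, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return x0 + V[:, :m] @ t

rng = np.random.default_rng(4)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)        # diagonally dominant: GMRES converges quickly
b = rng.standard_normal(n)
x = gmres_basic(A, b, np.zeros(n), m=30)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # small relative residual
```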

## Givens Rotation 

A Givens rotation (also called a plane rotation) is represented by a matrix of the form

$$ \boldsymbol{G}(i, j, \theta) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & c & \cdots & s & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -s & \cdots & c & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} $$

where $c = \cos{\theta}$ and $s = \sin{\theta}$; its $(k,l)$-entry $g_{kl}$ is defined as $$ g_{kl} = \begin{cases} c &\qquad k = l = i \ \text{or}\ k = l = j \\ s &\qquad k = i, \ l = j \\ -s &\qquad k = j, \ l = i \\ 1 &\qquad k = l \neq i,\ j \\ 0 &\qquad \text{otherwise} \end{cases} $$

It differs from the identity matrix only in the elements $(i,i)$, $(i,j)$, $(j,i)$ and $(j, j)$; all the other elements are the same.

Givens matrix $\boldsymbol{G}(i, j, \theta)$ is an orthogonal matrix (obvious :D).

Given a Givens matrix $\boldsymbol{G} = \boldsymbol{G}(i, j, \theta) \in \mathbb{R}^{n \times n}$ and vector $\boldsymbol{x} \in \mathbb{R}^n$, the product $\boldsymbol{G}\boldsymbol{x}$ means:

  • Rotate the vector $\boldsymbol{x}$ by $\theta$ radians in the $(i, j)$ plane (clockwise, with the sign convention above). More precisely, it rotates the projection of $\boldsymbol{x}$ onto the $(i, j)$ plane, which is why it is also called a plane rotation matrix.
  • $\boldsymbol{G}$ only changes the $i$-th and $j$-th elements of $\boldsymbol{x}$; the other elements are unaffected.
    • When $\boldsymbol{G}$ is left (right) multiplied to a matrix $\boldsymbol{A}$, only the $i$-th and $j$-th rows (columns) of $\boldsymbol{A}$ are affected.

The main use of the Givens matrix in numerical linear algebra is to transform a vector into a form in which a chosen element is zero.

Given a vector $\boldsymbol{x} \in \mathbb{R}^n$ $$ \boldsymbol{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\top} $$

then the Givens transformation $\boldsymbol{G}(i, j, \theta)$ can be constructed, where $\displaystyle\theta = \arctan{\frac{x_j}{x_i}} \left( c = \displaystyle\frac{x_i}{\sqrt{x_i^2 + x_j^2}} \ \text{and}\ s = \displaystyle\frac{x_j}{\sqrt{x_i^2 + x_j^2}} \right)$

$$ \boldsymbol{G}_{ij} = \begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & \displaystyle\frac{x_i}{\sqrt{x_i^2 + x_j^2}} & \cdots & \displaystyle\frac{x_j}{\sqrt{x_i^2 + x_j^2}} & & \\ & & \vdots & \ddots & \vdots & & \\ & & -\displaystyle\frac{x_j}{\sqrt{x_i^2 + x_j^2}} & \cdots & \displaystyle\frac{x_i}{\sqrt{x_i^2 + x_j^2}} & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix} $$

such that

$$ \begin{matrix} \boldsymbol{G}_{ij}\boldsymbol{x} = \lbrack x_1 & x_2 & \cdots & \sqrt{x_i^2 + x_j^2} & \cdots & 0 & \cdots & x_n \rbrack^{\top} \\ &&& \uparrow && \uparrow && \\ &&& i && j && \\ \end{matrix} $$

The $j$-th element $x_j$ is transformed to zero.
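A minimal NumPy sketch of this construction (0-based indices), building $c$ and $s$ from $x_i$ and $x_j$ and applying the rotation to zero out $x_j$:

```python
import numpy as np

def givens(xi, xj):
    """Return (c, s) such that [[c, s], [-s, c]] @ [xi, xj] = [sqrt(xi^2 + xj^2), 0]."""
    r = np.hypot(xi, xj)
    return xi / r, xj / r

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
i, j = 1, 3                      # zero out x[j] using x[i]
c, s = givens(x[i], x[j])

G = np.eye(len(x))               # full n x n Givens matrix G(i, j, theta)
G[i, i], G[i, j] = c, s
G[j, i], G[j, j] = -s, c

print(G @ x)                     # x[i] -> sqrt(x[i]^2 + x[j]^2), x[j] -> 0, others unchanged
```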

Givens transformations can be employed to perform QR decomposition: repeatedly constructing Givens matrices to eliminate the elements below the diagonal of the matrix gives an upper triangular matrix.

Given a matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$, denote by $\boldsymbol{G}_{ij}$ the Givens matrix that transforms the $(i, j)$-element $a_{ij}$ to $0$, constructed from the elements $a_{jj}$ and $a_{ij}$.

For example, $\boldsymbol{G}_{21}$ only affects the first two rows of $\boldsymbol{A}$, and $a_{21}$ becomes zero

$$ \boldsymbol{G}_{21} \boldsymbol{A} = \begin{bmatrix} \tilde{a}_{11} & \tilde{a}_{12} & \cdots & \tilde{a}_{1n} \\ 0 & \tilde{a}_{22} & \cdots & \tilde{a}_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \\ \end{bmatrix} $$

To transform the elements under $a_{11}$ in the first column of $\boldsymbol{A}$ to zeros, construct $\boldsymbol{G}_{21}, \boldsymbol{G}_{31}, \dots, \boldsymbol{G}_{n1}$, left multiply these matrices to $\boldsymbol{A}$ gives

$$ \boldsymbol{G}_{n1} \cdots \boldsymbol{G}_{31} \boldsymbol{G}_{21} \boldsymbol{A} = \begin{bmatrix} \star & \star & \cdots & \star \\ 0 & \star & \cdots & \star \\ \vdots & \vdots & & \vdots \\ 0 & \star & \cdots & \star \\ \end{bmatrix} $$

A similar procedure can be applied to the $2$nd through the $(n-1)$-th columns; in the end, we have constructed $\displaystyle\frac{n(n-1)}{2}$ Givens matrices, such that

$$ \boldsymbol{R} = \boldsymbol{G}_{n,n-1} \cdots \boldsymbol{G}_{21} \boldsymbol{A} $$

where $\boldsymbol{R}$ is an upper triangular matrix. Let $\boldsymbol{Q}^{\top} = \boldsymbol{G}_{n,n-1} \cdots \boldsymbol{G}_{21}$; since Givens matrices are orthogonal, $\boldsymbol{Q}$ is also orthogonal. Thus we have a QR decomposition of $\boldsymbol{A}$:

$$ \begin{align} &\quad \boldsymbol{A} = \boldsymbol{Q}\boldsymbol{R} \\ \text{where}&\quad \boldsymbol{Q} = \left( \boldsymbol{G}_{n,n-1} \cdots \boldsymbol{G}_{n2} \cdots \boldsymbol{G}_{32} \, \boldsymbol{G}_{n1} \cdots \boldsymbol{G}_{31} \boldsymbol{G}_{21} \right)^{\top} \\ \text{and}&\quad \boldsymbol{R} = \boldsymbol{Q}^{\top} \boldsymbol{A} \end{align} $$

    \begin{algorithm}
\caption{Givens-QR Decomposition}
\begin{algorithmic}
\PROCEDURE{givens-qr}{$\boldsymbol{A} \in \mathbb{R}^{m \times n}$}
    \STATE $\boldsymbol{Q} := \boldsymbol{I}_{m}$
    \FOR{$j = 1$ \TO $n-1$}
        \FOR{$i = j + 1$ \TO $m$}
            \STATE $c, s :=$ \CALL{givens}{$a_{jj}, a_{ij}$}
            \STATE $\boldsymbol{G}_{ij} := \begin{bmatrix} c & s \\ -s & c \end{bmatrix}$
            \STATE $\begin{bmatrix}\boldsymbol{A}(j, j:n) \\ \boldsymbol{A}(i, j:n)\end{bmatrix} := \boldsymbol{G}_{ij} \begin{bmatrix}\boldsymbol{A}(j, j:n) \\ \boldsymbol{A}(i, j:n)\end{bmatrix}$
            \STATE $\begin{bmatrix}\boldsymbol{Q}(:, j) \ \boldsymbol{Q}(:, i)\end{bmatrix} := \begin{bmatrix}\boldsymbol{Q}(:, j) \ \boldsymbol{Q}(:, i)\end{bmatrix} \boldsymbol{G}_{ij}^{\top}$
        \ENDFOR
    \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

Note that the algorithm is performed "in place": after the iteration finishes, $\boldsymbol{A}$ has become $\boldsymbol{R}$.

For a dense matrix, Givens-QR decomposition is expensive. But for a sparse matrix, or a matrix with few nonzeros in its lower triangular part (e.g. an upper Hessenberg matrix), Givens-QR is fast and cheap.
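Here is a minimal NumPy sketch of the Givens-QR algorithm (not in place: $\boldsymbol{A}$ is copied into $\boldsymbol{R}$, and only the $2 \times 2$ rotation blocks are formed):

```python
import numpy as np

def givens_qr(A):
    """QR decomposition by Givens rotations: A = Q R with Q orthogonal, R upper triangular."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):                           # eliminate below-diagonal entries column by column
        for i in range(j + 1, m):
            if R[i, j] == 0.0:
                continue                         # already zero, nothing to do
            r = np.hypot(R[j, j], R[i, j])
            c, s = R[j, j] / r, R[i, j] / r
            G = np.array([[c, s], [-s, c]])
            R[[j, i], j:] = G @ R[[j, i], j:]    # rotate rows j and i of R
            Q[:, [j, i]] = Q[:, [j, i]] @ G.T    # accumulate Q = (G_k ... G_1)^T
    return Q, R

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A))               # A = Q R
print(np.allclose(np.tril(R, -1), 0.0))    # R is upper triangular
print(np.allclose(Q.T @ Q, np.eye(6)))     # Q is orthogonal
```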

## Practical GMRES 

The difficulty with the GMRES method as stated above is that the approximate solution (and its residual norm) is not available at each step, so we do not know when to stop.

Givens rotations are employed at each step to transform the upper Hessenberg matrix into upper triangular form. In the end, the least-squares problem becomes an upper triangular linear system; moreover, the last entry of the rotated right-hand side satisfies $|\rho_{j+1}| = \lVert \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_j \rVert$, which gives a cheap convergence check at every step.

    \begin{algorithm}
\caption{Practical GMRES Method}
\begin{algorithmic}
\PROCEDURE{givens}{$x, y$}
    \STATE $r := \sqrt{x^2 + y^2}$
    \STATE $\displaystyle c, s := \frac{x}{r}, \frac{y}{r}$
    \RETURN $c, s$
\ENDPROCEDURE

\PROCEDURE{practical-gmres}{$\boldsymbol{A}$, $\boldsymbol{b}, \epsilon$}
    \STATE $\boldsymbol{x}_0 :=$ initialize randomly
    \STATE $\boldsymbol{r}_0 := \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x}_0$
    \STATE $\beta := \lVert \boldsymbol{r}_0 \rVert_2$
    \STATE $\displaystyle\boldsymbol{v}_1 := \frac{\boldsymbol{r}_0}{\beta}$
    \STATE $\boldsymbol{\rho} = \beta \boldsymbol{e}_1$
    \FOR{$j := 1$ \TO $m$}
        \STATE $\boldsymbol{w}_j := \boldsymbol{A} \boldsymbol{v}_j$
        \FOR{$i := 1$ \TO $j$}
            \STATE $h_{ij} := \langle \boldsymbol{w}_j, \boldsymbol{v}_i \rangle$
            \STATE $\boldsymbol{w}_j := \boldsymbol{w}_j - h_{ij} \boldsymbol{v}_i$
        \ENDFOR
        \STATE $h_{j+1,j} := \lVert \boldsymbol{w}_j \rVert_2$
        \STATE \COMMENT{Apply the Givens rotations $\boldsymbol{G}_1, \dots, \boldsymbol{G}_{j-1}$ to the last column of $\bar{\boldsymbol{H}}_{j+1}$}
        \FOR{$i := 1$ \TO $j-1$}
            \STATE $\begin{bmatrix}h_{i,j} \\ h_{i+1,j}\end{bmatrix} = \begin{bmatrix}c_i & s_i \\ -s_i & c_i\end{bmatrix} \begin{bmatrix}h_{i,j} \\ h_{i+1,j}\end{bmatrix}$
        \ENDFOR
        \IF{$h_{j+1,j} = 0$}
            \STATE $m = j$
            \BREAK
        \ENDIF
        \STATE $\displaystyle\boldsymbol{v}_{j+1} := \frac{\boldsymbol{w}_j}{h_{j+1,j}}$
        \STATE \COMMENT{Compute the givens rotation $\boldsymbol{G}_j$}
        \STATE $c_j, s_j :=$ \CALL{givens}{$h_{j,j}, h_{j+1,j}$}
        \STATE \COMMENT{Apply $\boldsymbol{G}_j$ to the last column of $\bar{\boldsymbol{H}}_{j+1}$ to annihilate $h_{j+1,j}$}
        \STATE $\begin{bmatrix}h_{j,j} \\ h_{j+1,j}\end{bmatrix} = \begin{bmatrix}c_j & s_j \\ -s_j & c_j\end{bmatrix} \begin{bmatrix}h_{j,j} \\ h_{j+1,j}\end{bmatrix}$
        \STATE \COMMENT{Apply $\boldsymbol{G}_j$ to the rhs $\boldsymbol{\rho}$}
        \STATE $\begin{bmatrix}\rho_j \\ \rho_{j+1}\end{bmatrix} = \begin{bmatrix}c_j & s_j \\ -s_j & c_j\end{bmatrix} \begin{bmatrix}\rho_j \\ 0\end{bmatrix}$
        \STATE \COMMENT{Check convergence}
        \IF{$\displaystyle\frac{|\rho_{j+1}|}{\beta} < \epsilon$}
            \STATE $m = j$
            \BREAK
        \ENDIF
    \ENDFOR
    \STATE Let $\boldsymbol{H}_m := \begin{bmatrix} h_{ij} \end{bmatrix}_{\ i, j \in [1, m]}$, $\boldsymbol{\rho}_m := \begin{bmatrix}\rho_j\end{bmatrix}_{j \in [1, m]}$ and $\boldsymbol{V}_m := \begin{bmatrix}\boldsymbol{v}_j\end{bmatrix}_{j \in [1, m]}$
    \STATE Solve the upper triangular linear system $\boldsymbol{H}_m \boldsymbol{t}_m = \boldsymbol{\rho}_m$
    \STATE Compute the solution $\boldsymbol{x}_m := \boldsymbol{x}_0 + \boldsymbol{V}_m \boldsymbol{t}_m$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
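A minimal NumPy translation of the practical GMRES pseudocode above (restarting and preconditioning are omitted; the variable names `cs`, `sn`, `rho` are mine and mirror $c_j$, $s_j$, $\boldsymbol{\rho}$):

```python
import numpy as np

def practical_gmres(A, b, x0, m, eps):
    """GMRES with Givens rotations: |rho[j+1]| gives the residual norm at every iteration."""
    n = A.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    cs, sn = np.zeros(m), np.zeros(m)       # stored Givens rotations G_1, ..., G_m
    rho = np.zeros(m + 1)
    rho[0] = beta
    k = m                                   # number of iterations actually performed
    for j in range(m):
        w = A @ V[:, j]                     # MGS-Arnoldi step
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        for i in range(j):                  # apply the previous rotations to the new column of H
            hi, hi1 = H[i, j], H[i + 1, j]
            H[i, j] = cs[i] * hi + sn[i] * hi1
            H[i + 1, j] = -sn[i] * hi + cs[i] * hi1
        if H[j + 1, j] == 0.0:              # breakdown: the exact solution has been reached
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
        r = np.hypot(H[j, j], H[j + 1, j])  # compute G_j and annihilate H[j+1, j]
        cs[j], sn[j] = H[j, j] / r, H[j + 1, j] / r
        H[j, j], H[j + 1, j] = r, 0.0
        rho[j + 1] = -sn[j] * rho[j]        # apply G_j to the right-hand side rho
        rho[j] = cs[j] * rho[j]
        k = j + 1
        if abs(rho[j + 1]) / beta < eps:    # |rho[j+1]| is the current residual norm
            break
    t = np.linalg.solve(np.triu(H[:k, :k]), rho[:k])   # upper triangular system H_k t = rho_k
    return x0 + V[:, :k] @ t

rng = np.random.default_rng(6)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = practical_gmres(A, b, np.zeros(n), m=50, eps=1e-10)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # about eps or smaller
```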

## References 

## Appendix 

English-Chinese terminology table:

| English Terminology | 中文术语 |
| --- | --- |
| Subspace Projection Method | 子空间投影法 |
| Petrov–Galerkin Condition, Galerkin Condition | 彼得罗夫-伽辽金条件,伽辽金条件 |
| Krylov Subspace | 克雷洛夫子空间 |
| Krylov Subspace (Iterative) Methods | 克雷洛夫子空间(迭代)法 |
| {Upper, Lower} Hessenberg Matrix | {上,下}海森堡矩阵 |
| Subdiagonal, Superdiagonal | 下对角线,上对角线 |
| Arnoldi Iteration | 阿尔诺迪迭代 |
| Generalized Minimal Residual Method (GMRES) | 广义极小残差法 |
| Givens {Transformation, Rotation, Matrix}, Plane Rotation Matrix | 吉文斯{变换,旋转,矩阵},平面旋转矩阵 |