8.5. GMRES#

The most important use of the Arnoldi iteration is to solve the square linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\).

In Demo 8.4.3, we attempted to replace the linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\) by the lower-dimensional approximation

\[\min_{\mathbf{x}\in \mathcal{K}_m} \| \mathbf{A}\mathbf{x}-\mathbf{b} \| = \min_{\mathbf{z}\in\mathbb{C}^m} \| \mathbf{A}\mathbf{K}_m\mathbf{z}-\mathbf{b} \|,\]

where \(\mathbf{K}_m\) is the Krylov matrix generated using \(\mathbf{A}\) and the seed vector \(\mathbf{b}\). This method was unstable due to the poor conditioning of \(\mathbf{K}_m\), which is a numerically poor basis for \(\mathcal{K}_m\).

The Arnoldi algorithm yields an orthonormal basis of the same space and fixes the stability problem. Set \(\mathbf{x}=\mathbf{Q}_m\mathbf{z}\) and obtain

(8.5.1)#\[\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z}-\mathbf{b} \bigr\|.\]

From the fundamental Arnoldi identity (8.4.6), this is equivalent to

(8.5.2)#\[\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\|.\]

Note that \(\mathbf{q}_1\) is a unit multiple of \(\mathbf{b}\), so \(\mathbf{b} = \|\mathbf{b}\| \mathbf{Q}_{m+1}\mathbf{e}_1\). Thus (8.5.2) becomes

(8.5.3)#\[\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} (\mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\mathbf{e}_1) \bigr\|.\]

The least-squares problems (8.5.1), (8.5.2), and (8.5.3) are all \(n\times m\). But observe that for any \(\mathbf{w}\in\mathbb{C}^{m+1}\),

\[ \|\mathbf{Q}_{m+1}\mathbf{w}\|^2 = \mathbf{w}^*\mathbf{Q}_{m+1}^*\mathbf{Q}_{m+1}\mathbf{w} = \mathbf{w}^*\mathbf{w} = \|\mathbf{w}\|^2.\]

The first norm in that equation is on \(\mathbb{C}^n\), while the last is on the much smaller space \(\mathbb{C}^{m+1}\). Hence the least-squares problem (8.5.3) is equivalent to

(8.5.4)#\[ \min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\,\mathbf{e}_1 \bigr\|,\]

which is of size \((m+1)\times m\). We call the solution of this minimization \(\mathbf{z}_m\), and then \(\mathbf{x}_m=\mathbf{Q}_m \mathbf{z}_m\) is the \(m\)th approximation to the solution of \(\mathbf{A}\mathbf{x}=\mathbf{b}\).
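
To make the equivalence concrete, here is a small numerical check (our own, with arbitrary sizes) comparing the minimizer of the full problem (8.5.1) to the minimizer of the reduced problem (8.5.4), using the arnoldi function from the preceding section.

A = rand(40,40);  b = rand(40);  m = 6
Q,H = FNC.arnoldi(A,b,m)             # Q is 40×(m+1), H is (m+1)×m
z_full = (A*Q[:,1:m]) \ b            # minimizer of (8.5.1)
z_small = H \ [norm(b); zeros(m)]    # minimizer of (8.5.4)
norm(z_full - z_small)               # difference should be tiny

The two minimizers agree up to roundoff, even though the first least-squares problem has 40 rows and the second only \(m+1=7\).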

Algorithm 8.5.1 :  GMRES

Given \(n\times n\) matrix \(\mathbf{A}\) and \(n\)-vector \(\mathbf{b}\):

For \(m=1,2,\ldots\), let \(\mathbf{x}_m=\mathbf{Q}_m \mathbf{z}_m\), where \(\mathbf{z}_m\) solves the linear least-squares problem (8.5.4), and \(\mathbf{Q}_m,\mathbf{H}_m\) arise from the Arnoldi iteration.

GMRES uses the Arnoldi iteration to minimize the residual \(\mathbf{b} - \mathbf{A}\mathbf{x}\) over successive Krylov subspaces. In exact arithmetic, GMRES reaches the exact solution when \(m=n\), but in practice the goal is to reduce the residual enough to stop at some \(m \ll n\).

Demo 8.5.2

We define a triangular matrix with known eigenvalues and a random vector \(\mathbf{b}\).

λ = @. 10 + (1:100)
A = triu(rand(100,100),1) + diagm(λ)
b = rand(100);

Instead of building the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.

Q,H = FNC.arnoldi(A,b,60);

The Arnoldi bases are used to solve the least-squares problems defining the GMRES iterates.

resid = [norm(b); zeros(60)]
for m in 1:60
    s = [norm(b); zeros(m)]
    z = H[1:m+1,1:m] \ s
    x = Q[:,1:m]*z
    resid[m+1] = norm(b - A*x)
end

The approximations converge smoothly, practically all the way to machine epsilon.

plot(0:60,resid,m=:o,
    xaxis=(L"m"),yaxis=(:log10,"norm of mth residual"), 
    title="Residual for GMRES",leg=:none)

Compare the graph in Demo 8.5.2 to the one in Demo 8.4.3. Both start with the same linear convergence, but only the version using Arnoldi avoids the instability created by the poor Krylov basis.

A basic implementation of GMRES is given in Function 8.5.3.

Function 8.5.3 :  gmres

GMRES for a linear system

 1"""
 2    gmres(A,b,m)
 3
 4Do `m` iterations of GMRES for the linear system `A`*x=`b`. Returns
 5the final solution estimate x and a vector with the history of
 6residual norms. (This function is for demo only, not practical use.)
 7"""
 8function gmres(A,b,m)
 9    n = length(b)
10    Q = zeros(n,m+1)
11    Q[:,1] = b/norm(b)
12    H = zeros(m+1,m)
13
14    # Initial solution is zero.
15    x = 0
16    residual = [norm(b);zeros(m)]
17
18    for j in 1:m
19        # Next step of Arnoldi iteration.
20        v = A*Q[:,j]
21        for i in 1:j
22            H[i,j] = dot(Q[:,i],v)
23            v -= H[i,j]*Q[:,i]
24        end
25        H[j+1,j] = norm(v)
26        Q[:,j+1] = v/H[j+1,j]
27
28        # Solve the minimum residual problem.
29        r = [norm(b); zeros(j)]
30        z = H[1:j+1,1:j] \ r
31        x = Q[:,1:j]*z
32        residual[j+1] = norm( A*x - b )
33    end
34    return x,residual
35end
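
As a quick usage sketch (not part of the text), we can repeat the experiment of Demo 8.5.2 with this function, assuming it has been defined in the current session. The final entry of the returned residual history should match the residual of the returned solution.

λ = @. 10 + (1:100)
A = triu(rand(100,100),1) + diagm(λ)
b = rand(100)
x,residual = gmres(A,b,30)
norm(b - A*x), residual[end]    # these two values should agree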

Convergence and restarting#

Thanks to Theorem 8.4.2, minimization of \(\|\mathbf{b}-\mathbf{A}\mathbf{x}\|\) over \(\mathcal{K}_{m+1}\) includes minimization over \(\mathcal{K}_m\). Hence the norm of the residual \(\mathbf{r}_m = \mathbf{b} - \mathbf{A}\mathbf{x}_m\) (being the minimized quantity) cannot increase as the iteration unfolds.
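
As an informal check of this monotonicity (our own experiment, applying Function 8.5.3 to an arbitrary well-conditioned matrix), the residual history should never increase beyond roundoff.

A = rand(80,80) + 20I    # diagonal shift keeps the system well behaved
b = rand(80)
x,residual = gmres(A,b,40)
maximum(diff(residual))    # should be nonpositive, up to roundoff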

Unfortunately, making other conclusive statements about the convergence of GMRES is not easy. Demo 8.5.2 shows the cleanest behavior: essentially linear convergence down to the range of machine epsilon. But it is possible for the convergence to go through phases of sublinear and superlinear convergence as well. There is a strong dependence on the eigenvalues of the matrix, a fact we state with more precision and detail in the next section.

One of the practical challenges in GMRES is that as the dimension of the Krylov subspace grows, the number of new entries to be found in \(\mathbf{H}_m\) and the total number of columns in \(\mathbf{Q}\) also grow. Thus both the work and the storage requirements are quadratic in \(m\), which can become intolerable in some applications. For this reason, GMRES is often used with restarting.

Suppose \(\hat{\mathbf{x}}\) is an approximate solution of \(\mathbf{A}\mathbf{x}=\mathbf{b}\). Then if we set \(\mathbf{x}=\mathbf{u}+\hat{\mathbf{x}}\), we have \(\mathbf{A}(\mathbf{u}+\hat{\mathbf{x}}) = \mathbf{b}\), or \(\mathbf{A}\mathbf{u} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\). The conclusion is that if we get an approximate solution and compute its residual \(\mathbf{r}=\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\), then we need only solve \(\mathbf{A}\mathbf{u} = \mathbf{r}\) in order to get a correction to \(\hat{\mathbf{x}}\). In restarted GMRES, the iteration is stopped after a fixed number of steps, the residual of the current approximation is computed, and a fresh GMRES iteration is begun on the correction equation \(\mathbf{A}\mathbf{u}=\mathbf{r}\); the process repeats as needed.

Restarting guarantees a fixed upper bound on the per-iteration cost of GMRES. However, this benefit comes at a price. Even though restarting preserves progress made in previous iterations, the Krylov space information is discarded and the residual minimization process starts again over low-dimensional spaces. That can significantly retard the convergence, or even cause it to stagnate.
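
A minimal sketch of restarted GMRES along these lines, built on top of Function 8.5.3, follows. The name gmres_restarted, the restart length m, and the number of cycles k are ours for illustration and are not part of the book's code. Each cycle computes the current residual and applies m GMRES iterations to the correction equation \(\mathbf{A}\mathbf{u}=\mathbf{r}\).

function gmres_restarted(A,b,m,k)
    x = zeros(length(b))
    hist = [norm(b)]
    for cycle in 1:k
        r = b - A*x               # residual of the current approximation
        u,res = gmres(A,r,m)      # m iterations on the correction equation
        x += u                    # apply the correction
        append!(hist,res[2:end])  # inner residuals equal the outer residuals
    end
    return x,hist
end

For example, gmres_restarted(A,b,20,5) performs five cycles of 20 iterations each, so that the work and storage of any one cycle never exceed those of a 20-dimensional Krylov subspace.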

Demo 8.5.4

The following experiments are based on a matrix resulting from discretization of a partial differential equation.

A = FNC.poisson(50)
n = size(A,1)
b = ones(n);
spy(A,color=:blues)