# 8.7. Matrix-free iterations#

A primary reason for our interest in matrices is their relationship to linear transformations. If we define $$\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}$$, then for all vectors $$\mathbf{x}$$, $$\mathbf{y}$$, and scalars $$\alpha$$,

(8.7.1)#$\begin{split} \mathbf{f}(\mathbf{x} + \mathbf{y} ) &= \mathbf{f}(\mathbf{x}) + \mathbf{f}(\mathbf{y} ), \\ \mathbf{f}(\alpha \mathbf{x} ) & = \alpha\, \mathbf{f}(\mathbf{x}). \end{split}$

These properties define a linear transformation. Moreover, every linear transformation between finite-dimensional vector spaces can be represented as a matrix-vector multiplication.
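As a quick numerical illustration (the matrix and vectors here are invented for the example), both defining properties can be checked directly:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
f(x) = A * x                     # the transformation induced by A

x, y, α = [1.0, -2.0], [0.5, 3.0], 1.75

f(x + y) ≈ f(x) + f(y)           # additivity holds
f(α * x) ≈ α * f(x)              # homogeneity holds
```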

## Matrix-free iterations#

In Chapter 4 we solved the nonlinear rootfinding problem $$\mathbf{f}(\mathbf{x})=\boldsymbol{0}$$ with methods that needed only the ability to evaluate $$\mathbf{f}$$ at any known value of $$\mathbf{x}$$. By repeatedly evaluating $$\mathbf{f}$$ at cleverly chosen points, these algorithms were able to return an estimate for $$\mathbf{f}^{-1}(\boldsymbol{0})$$.

A close examination reveals that the power method and Krylov subspace methods share this structure: the only appearance of the matrix $$\mathbf{A}$$ in them is to multiply a known vector, i.e., to evaluate $$\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}$$. Yet from such evaluations alone, a Krylov method can approximate the result of applying the inverse, $$\mathbf{A}^{-1}\mathbf{b}$$.

Bringing these points of view together leads us to a cornerstone of modern scientific computation: matrix-free iterations. Krylov subspace methods can be used to invert a linear transformation if one provides code for the transformation, even if its associated matrix is not known explicitly.
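To make the idea concrete, here is a sketch (not the book's code) of a power iteration written so that it touches the matrix only through a user-supplied function evaluating $$\mathbf{x}\mapsto\mathbf{A}\mathbf{x}$$; the function `poweriter` and the small test matrix are assumptions for illustration:

```julia
using LinearAlgebra

# Power iteration that needs only a function evaluating x -> A*x,
# never the matrix itself.
function poweriter(mulA, n; iters=200)
    x = normalize(randn(n))
    γ = 0.0
    for _ in 1:iters
        y = mulA(x)
        γ = dot(x, y)        # Rayleigh quotient estimate of the eigenvalue
        x = normalize(y)
    end
    return γ, x
end

A = [2.0 1.0; 1.0 3.0]       # symmetric; dominant eigenvalue is (5+√5)/2
γ, x = poweriter(u -> A * u, 2)
```

Nothing in `poweriter` would change if the closure `u -> A * u` were replaced by code that applies the transformation without any stored matrix.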

## Blurring images#

In Section 7.1 we saw that a grayscale image can be represented as an $$m\times n$$ matrix $$\mathbf{X}$$ of pixel intensity values. Now consider a simple model for blurring the image. Define $$\mathbf{B}$$ as the $$m\times m$$ tridiagonal matrix

(8.7.2)#$\begin{split}B_{ij} = \begin{cases} \tfrac{1}{2} & \text{if } i=j,\\ \tfrac{1}{4} & \text{if } |i-j|=1,\\ 0 & \text{otherwise.} \end{cases}\end{split}$

The product $$\mathbf{B}\mathbf{X}$$ applies $$\mathbf{B}$$ to each column of $$\mathbf{X}$$. Within that column it does a weighted average of the values of each pixel and its two neighbors. That has the effect of blurring the image vertically. We can increase the amount of blur by applying $$\mathbf{B}$$ repeatedly.
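For instance, on a single short column (values invented for illustration), multiplying by $$\mathbf{B}$$ replaces each interior entry with $$\tfrac14$$ of the value above, $$\tfrac12$$ of itself, and $$\tfrac14$$ of the value below:

```julia
using LinearAlgebra

# A 5x5 instance of the blurring matrix in (8.7.2).
B = diagm(0 => fill(0.5, 5), 1 => fill(0.25, 4), -1 => fill(0.25, 4))
col = [0.0, 0.0, 1.0, 0.0, 0.0]    # one bright pixel in a dark column
B * col                            # -> [0.0, 0.25, 0.5, 0.25, 0.0]
```

The single bright pixel has bled into its two neighbors, which is exactly a one-step vertical blur.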

In order to blur horizontally, we can transpose the image and apply blurring in the same way. We need a blurring matrix defined as in (8.7.2) but with size $$n\times n$$. We call this matrix $$\mathbf{C}$$. Altogether the horizontal blurring is done by transposing, applying $$\mathbf{C}$$, and transposing back to the original orientation. That is,

$\bigl(\mathbf{C} \mathbf{X}^T\bigr)^T = \mathbf{X}\mathbf{C}^T = \mathbf{X}\mathbf{C},$

using the symmetry of $$\mathbf{C}$$. So we can describe blur in both directions as the function

(8.7.3)#$\operatorname{blur}(\mathbf{X}) = \mathbf{B}^k \mathbf{X} \mathbf{C}^k$

for a positive integer $$k$$.
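The transpose identity above is easy to confirm numerically; here is a small check with a random image matrix and a symmetric tridiagonal $$\mathbf{C}$$ (sizes chosen arbitrarily for the illustration):

```julia
using LinearAlgebra

m, n = 4, 6
X = rand(m, n)
C = diagm(0 => fill(0.5, n), 1 => fill(0.25, n-1), -1 => fill(0.25, n-1))

# Horizontal blur two ways: transpose, blur, transpose back
# versus right-multiplication by the symmetric C.
(C * X')' ≈ X * C
```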

Demo 8.7.1

We use a readily available test image.

```julia
using Images, Plots, TestImages

img = testimage("mandrill")
m, n = size(img)
X = @. Float64(Gray(img))
plot(Gray.(X), title="Original image", aspect_ratio=1)
```



We define the one-dimensional tridiagonal blurring matrices.

```julia
using SparseArrays

function blurmatrix(d)
    v1 = fill(0.25, d-1)
    return spdiagm(0 => fill(0.5, d), 1 => v1, -1 => v1)
end
B, C = blurmatrix(m), blurmatrix(n);
```


Finally, we show the results of using $$k=12$$ repetitions of the blur in each direction.

```julia
blur = X -> B^12 * X * C^12;
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
```


## Deblurring#

A more interesting operation is deblurring: given an image blurred by poor focus, can we reconstruct the true image? Conceptually, we want to invert the function $$\operatorname{blur}(\mathbf{X})$$.

It’s easy to see from (8.7.3) that the blur operation is a linear transformation on image matrices. But an $$m\times n$$ image matrix is equivalent to a length-$$mn$$ vector—it’s just a matter of interpreting the shape of the same data. Let $$\operatorname{vec}(\mathbf{X})=\mathbf{x}$$ and $$\operatorname{unvec}(\mathbf{x})=\mathbf{X}$$ be the mathematical statements of such reshaping operations. Now say $$\mathbf{X}$$ is the original image and $$\mathbf{Z}=\operatorname{blur}(\mathbf{X})$$ is the blurred one. Then by linearity there is some matrix $$\mathbf{A}$$ such that

$\mathbf{A} \operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{Z}),$

or $$\mathbf{A}\mathbf{x}=\mathbf{z}$$.

The matrix $$\mathbf{A}$$ is $$mn\times mn$$; for a 12-megapixel image, it would have $$1.4\times 10^{14}$$ entries! Admittedly, it is extremely sparse, but the point is that we don’t need it at all.
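For a tiny "image," though, we can actually form $$\mathbf{A}$$ and confirm it agrees with the blur function, using the standard identity $$\operatorname{vec}(\mathbf{B}\mathbf{X}\mathbf{C}) = (\mathbf{C}^T \otimes \mathbf{B})\operatorname{vec}(\mathbf{X})$$; this sketch is purely illustrative, since at realistic sizes one never builds $$\mathbf{A}$$:

```julia
using LinearAlgebra

m, n, k = 3, 4, 2
B = diagm(0 => fill(0.5, m), 1 => fill(0.25, m-1), -1 => fill(0.25, m-1))
C = diagm(0 => fill(0.5, n), 1 => fill(0.25, n-1), -1 => fill(0.25, n-1))
blur(X) = B^k * X * C^k

# vec(B^k X C^k) = ((C^k)' ⊗ B^k) vec(X), and C is symmetric.
A = kron(C^k, B^k)

X = rand(m, n)
A * vec(X) ≈ vec(blur(X))    # the explicit matrix and the function agree
```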

Instead, given any vector $$\mathbf{u}$$ we can compute $$\mathbf{v}=\mathbf{A}\mathbf{u}$$ through the steps

\begin{align*} \mathbf{U} &= \operatorname{unvec}(\mathbf{u}),\\ \mathbf{V} &= \operatorname{blur}(\mathbf{U}),\\ \mathbf{v} &= \operatorname{vec}(\mathbf{V}). \end{align*}
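In code, the three steps above can be sketched as follows; the blurring matrices and the value $$k=2$$ here are stand-ins chosen only so the sketch is self-contained:

```julia
using LinearAlgebra

m, n = 3, 4
B = diagm(0 => fill(0.5, m), 1 => fill(0.25, m-1), -1 => fill(0.25, m-1))
C = diagm(0 => fill(0.5, n), 1 => fill(0.25, n-1), -1 => fill(0.25, n-1))
blur(X) = B^2 * X * C^2

unvec(u) = reshape(u, m, n)       # U = unvec(u)
mulA(u) = vec(blur(unvec(u)))     # v = vec(blur(U))

u = rand(m * n)
v = mulA(u)                       # the product A*u, with A never formed
```

Since `reshape` and `vec` merely reinterpret the same column-major data, `mulA` costs no more than the blur itself.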

The following example shows how to put these ideas into practice with MINRES.

Demo 8.7.2

We repeat the earlier process to blur an original image $$\mathbf{X}$$ to get $$\mathbf{Z}$$.

```julia
img = testimage("lighthouse")
m, n = size(img)
X = @. Float64(Gray(img))

B = spdiagm(0 => fill(0.5, m),
    1 => fill(0.25, m-1), -1 => fill(0.25, m-1))
C = spdiagm(0 => fill(0.5, n),
    1 => fill(0.25, n-1), -1 => fill(0.25, n-1))
blur = X -> B^12 * X * C^12
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
```
