8.2. Power iteration
Given that matrix-vector multiplication is fast for sparse matrices, let’s see what we might accomplish with only that at our disposal.
Here we choose a random 5×5 matrix and a random 5-vector.
A = rand(1.:9.,5,5)
A = A./sum(A,dims=1)
x = randn(5)
5-element Vector{Float64}:
0.425781202469713
-0.5205355821472383
0.7579623174056564
1.2660763121739163
-1.3063430455782654
Applying matrix-vector multiplication once doesn’t do anything recognizable.
y = A*x
5-element Vector{Float64}:
0.06207777505990392
0.17097054226012237
0.270047544447043
-0.052553182031896906
0.1723985245886096
Repeating the multiplication still doesn’t do anything obvious.
z = A*y
5-element Vector{Float64}:
0.1390817587106179
0.09320200009545425
0.0988458893342177
0.13047409038352917
0.16133746579996294
But if we keep repeating the matrix-vector multiplication, something remarkable happens: \(\mathbf{A} \mathbf{x} \approx \mathbf{x}\).
for j in 1:8; x = A*x; end
[x A*x]
5×2 Matrix{Float64}:
0.154275 0.154275
0.10942 0.109421
0.108751 0.108752
0.120164 0.120161
0.130332 0.130332
This phenomenon seems to occur regardless of the starting vector.
x = randn(5)
for j in 1:8; x = A*x; end
[x A*x]
5×2 Matrix{Float64}:
0.134334 0.134337
0.0952766 0.0952794
0.0946953 0.0946971
0.104639 0.10463
0.113487 0.113488
There was a little cheating in Demo 8.2.1 to make the story come out neatly (specifically, the line A = A./sum(A,dims=1), which rescales \(\mathbf{A}\) so that every column sums to 1). But it illustrates an important general fact that we investigate now.
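As a quick check of why that line matters (a sketch of ours, not part of the demo): once the columns sum to 1, the all-ones row vector is a left eigenvector with eigenvalue 1, and because the entries of \(\mathbf{A}\) are positive, the Perron–Frobenius theorem makes 1 the dominant eigenvalue.
using LinearAlgebra
A = rand(1.:9.,5,5)
A = A./sum(A,dims=1)      # every column of A now sums to 1
@show ones(1,5)*A         # the all-ones row vector is a left eigenvector for eigenvalue 1
@show abs.(eigvals(A));   # 1 is the largest eigenvalue in magnitude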
Dominant eigenvalue
Analysis of matrix powers is most straightforward in the diagonalizable case. Let \(\mathbf{A}\) be any diagonalizable \(n\times n\) matrix having eigenvalues \(\lambda_1,\ldots,\lambda_n\) and corresponding linearly independent eigenvectors \(\mathbf{v}_1,\ldots,\mathbf{v}_n\). Furthermore, suppose the eigenvalues are such that

\[
|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|. \tag{8.2.1}
\]
Given (8.2.1) we say that \(\lambda_1\) is the dominant eigenvalue. This was the case with \(\lambda_1=1\) for \(\mathbf{A}\) in Demo 8.2.1.
Now let \(\mathbf{x}\) be an \(n\)-vector, let \(k\) be a positive integer, and refer to (7.2.5):

\[
\mathbf{A}^k \mathbf{x} = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}\mathbf{x}.
\]
Let \(\mathbf{z}=\mathbf{V}^{-1}\mathbf{x}\), and recall that \(\mathbf{D}\) is a diagonal matrix of eigenvalues. Then

\[
\mathbf{A}^k\mathbf{x} = \mathbf{V}\mathbf{D}^k \mathbf{z}
= \lambda_1^k \left[ z_1 \mathbf{v}_1 + z_2 \left(\frac{\lambda_2}{\lambda_1}\right)^{\!k} \mathbf{v}_2 + \cdots + z_n \left(\frac{\lambda_n}{\lambda_1}\right)^{\!k} \mathbf{v}_n \right]. \tag{8.2.2}
\]
Since \(\lambda_1\) is dominant, we conclude that if \(z_1\neq 0\),

\[
\left\| \frac{\mathbf{A}^k\mathbf{x}}{\lambda_1^k} - z_1\mathbf{v}_1 \right\|
\le |z_2|\left|\frac{\lambda_2}{\lambda_1}\right|^{k}\|\mathbf{v}_2\| + \cdots + |z_n|\left|\frac{\lambda_n}{\lambda_1}\right|^{k}\|\mathbf{v}_n\|
\rightarrow 0 \quad \text{as } k\rightarrow\infty. \tag{8.2.3}
\]
That is, \(\mathbf{A}^k\mathbf{x}\) eventually is a scalar multiple of the dominant eigenvector.[1]
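As a quick numerical sanity check of this conclusion (a sketch of ours, not part of the text), we can verify (8.2.3) directly for a small diagonalizable matrix with an obviously dominant eigenvalue:
using LinearAlgebra
A = [2 1 0; 0 1 0; 0 0 -0.5]     # triangular, so its eigenvalues 2, 1, -0.5 lie on the diagonal
λ,V = eigen(A)
i = argmax(abs.(λ))              # index of the dominant eigenvalue, λ₁ = 2
x = randn(3)
z = V\x                          # coefficients of x in the eigenvector basis
y = x
for j in 1:30; y = A*y; end      # y = A^30 x via repeated matrix-vector products
norm( y/λ[i]^30 - z[i]*V[:,i] )  # should be very small, as (8.2.3) predicts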
Attention
For algorithmic purposes, it is important to interpret \(\mathbf{A}^k\mathbf{x}\) as \(\mathbf{A}\bigl( \cdots\bigl( \mathbf{A} (\mathbf{A}\mathbf{x})\bigl) \cdots\bigl)\), i.e., as repeated applications of \(\mathbf{A}\) to a vector. Doing so allows us to fully exploit sparsity of \(\mathbf{A}\), something which is not preserved by taking a matrix power \(\mathbf{A}^k\) explicitly before the multiplication with \(\mathbf{x}\) (see Demo 8.1.2).
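The difference can be seen in a rough comparison like the following (our own sketch; exact timings and nonzero counts will vary with the random matrix): repeated products keep \(\mathbf{A}\) sparse, while the explicit power fills it in.
using SparseArrays, LinearAlgebra
n = 2000
A = sprandn(n,n,0.001) + I          # sparse matrix with roughly 3 nonzeros per row
x = randn(n)
@time for j in 1:6; x = A*x; end    # six sparse matrix-vector products
@time A6 = A^6;                     # explicit sparse matrix power
nnz(A), nnz(A6)                     # A^6 has vastly more nonzeros than A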
Power iteration
An important technicality separates us from an algorithm: unless \(|\lambda_1|=1\), the factor \(\lambda_1^k\) tends to make \(\|\mathbf{A}^k\mathbf{x}\|\) either very large or very small. Nor can we easily normalize by \(\lambda_1^k\), as in (8.2.3), unless we know \(\lambda_1\) in advance.
To make a practical algorithm, we alternate matrix-vector multiplication with a renormalization of the vector. In the following, we use \(x_{k,m}\) and \(y_{k,m}\) to mean the \(m\)th component of vectors \(\mathbf{x}_k\) and \(\mathbf{y}_k\).
Algorithm 8.2.2 (Power iteration). Given matrix \(\mathbf{A}\):
Choose \(\mathbf{x}_1\).
For \(k=1,2,\ldots\),
a. Set \(\mathbf{y}_k = \mathbf{A} \mathbf{x}_k\).
b. Find \(m\) such that \(|y_{k,m}|=\|{\mathbf{y}_k} \|_\infty\).
c. Set \(\alpha_k = \dfrac{1}{y_{k,m}}\) and \(\,\beta_k = \dfrac{y_{k,m}}{x_{k,m}}\).
d. Set \(\mathbf{x}_{k+1} = \alpha_k \mathbf{y}_k\).
Note that by definition, \(\| \mathbf{x}_{k+1}\|_\infty=1\). Also, we can write

\[
\mathbf{x}_{k+1} = (\alpha_1 \alpha_2 \cdots \alpha_k) \, \mathbf{A}^k \mathbf{x}_1. \tag{8.2.4}
\]
Thus Algorithm 8.2.2 modifies (8.2.2) and (8.2.3) only slightly.
Finally, if \(\mathbf{x}_k\) is nearly a dominant eigenvector of \(\mathbf{A}\), then \(\mathbf{A}\mathbf{x}_k\) is nearly \(\lambda_1\mathbf{x}_k\), and we can take the ratio \(\beta_k=y_{k,m}/x_{k,m}\) as an eigenvalue estimate. In fact, revisiting (8.2.2), the extra \(\alpha_j\) normalization factors cancel in the ratio, and, after some simplification, we get

\[
\beta_k = \frac{y_{k,m}}{x_{k,m}}
= \lambda_1 \, \frac{1 + b_2 r_2^{\,k} + \cdots + b_n r_n^{\,k}}{1 + b_2 r_2^{\,k-1} + \cdots + b_n r_n^{\,k-1}}, \tag{8.2.5}
\]
where \(r_j=\lambda_j/\lambda_1\) and the \(b_j\) are constants. By assumption (8.2.1), each \(r_j\) satisfies \(|r_j|<1\), so we see that \(\beta_k\rightarrow \lambda_1\) as \(k\rightarrow\infty\).
Function 8.2.3 is our implementation of power iteration.
Power iteration to find a dominant eigenvalue
1"""
2 poweriter(A,numiter)
3
4Perform `numiter` power iterations with the matrix `A`, starting
5from a random vector. Returns a vector of eigenvalue estimates
6and the final eigenvector approximation.
7"""
8function poweriter(A,numiter)
9 n = size(A,1)
10 x = normalize(randn(n),Inf)
11 β = zeros(numiter)
12 for k in 1:numiter
13 y = A*x
14 m = argmax(abs.(y))
15 β[k] = y[m]/x[m]
16 x = y/y[m]
17 end
18 return β,x
19end
Observe that the only use of \(\mathbf{A}\) is to find the matrix-vector product \(\mathbf{A}\mathbf{x}\), which makes exploitation of the sparsity of \(\mathbf{A}\) trivial.
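For instance (a usage sketch of ours, not one of the book's demos, and assuming the FNC package is loaded as in the demos), the function can be applied unchanged to a large sparse matrix. Here we reuse the column-normalization trick from Demo 8.2.1 so that the dominant eigenvalue is known to be 1:
using SparseArrays, LinearAlgebra
n = 100_000
A = sprand(n,n,8/n) + 0.1I       # nonnegative sparse matrix, about 8 nonzeros per column
c = vec(sum(A,dims=1))           # column sums (all positive because of the diagonal term)
A = A * spdiagm(0 => 1 ./ c)     # rescale columns to sum to 1; the product stays sparse
β,x = FNC.poweriter(A,100)
β[end]                           # should be very close to 1 (assuming, as is overwhelmingly likely, the random pattern is irreducible)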
Convergence
Let’s examine the terms in the numerator and denominator of (8.2.5) more carefully:

\[
b_2 r_2^{\,k} + b_3 r_3^{\,k} + \cdots + b_n r_n^{\,k}
= r_2^{\,k} \left[ b_2 + b_3 \left(\frac{r_3}{r_2}\right)^{\!k} + \cdots + b_n \left(\frac{r_n}{r_2}\right)^{\!k} \right]. \tag{8.2.6}
\]
At this point we’ll introduce an additional assumption,

\[
|\lambda_2| > |\lambda_3| \ge \cdots \ge |\lambda_n|. \tag{8.2.7}
\]
This condition isn’t strictly necessary, but it simplifies the following statements considerably because now it’s clear that the quantity in (8.2.6) approaches \(b_2 r_2^k\) as \(k\rightarrow \infty\).
Next we estimate (8.2.5) for large \(k\), using a geometric series expansion for the denominator to get

\[
\beta_k \approx \lambda_1 \left(1 + b_2 r_2^{\,k}\right)\left(1 - b_2 r_2^{\,k-1} + O\!\left(r_2^{\,2k-2}\right)\right)
= \lambda_1 \left[ 1 + b_2 r_2^{\,k-1}(r_2 - 1) + O\!\left(r_2^{\,2k-2}\right) \right]. \tag{8.2.8}
\]
This is linear convergence with factor \(r_2\):

\[
\frac{\beta_{k+1} - \lambda_1}{\beta_k - \lambda_1} \rightarrow r_2 = \frac{\lambda_2}{\lambda_1} \quad \text{as } k \rightarrow \infty. \tag{8.2.9}
\]
The error in the power iteration eigenvalue estimates \(\beta_k\) is reduced asymptotically by a constant factor \(\lambda_2/\lambda_1\) at each iteration.
We will experiment with the power iteration on a 5×5 matrix with prescribed eigenvalues and dominant eigenvalue at 1.
λ = [1,-0.75,0.6,-0.4,0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5,5),1) + diagm(λ)
5×5 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0
0.0 -0.75 1.0 1.0 1.0
0.0 0.0 0.6 1.0 1.0
0.0 0.0 0.0 -0.4 1.0
0.0 0.0 0.0 0.0 0.0
We run the power iteration 60 times. The best estimate of the dominant eigenvalue is the last entry of the first output.
β,x = FNC.poweriter(A,60)
eigval = β[end]
1.000000007641057
We check for linear convergence using a log-linear plot of the error.
err = @. 1 - β
plot(0:59,abs.(err),m=:o,title="Convergence of power iteration",
xlabel=L"k",yaxis=(L"|\lambda_1-\beta_k|",:log10,[1e-10,1]))
The asymptotic trend seems to be a straight line, consistent with linear convergence. To estimate the convergence rate, we look at the ratio of two consecutive errors in the linear part of the convergence curve. The ratio of the first two eigenvalues should match the observed rate.
@show theory = λ[2]/λ[1];
@show observed = err[40]/err[39];
theory = λ[2] / λ[1] = -0.75
observed = err[40] / err[39] = -0.7505804799199765
Note that the error is expected to change sign on each iteration, because the convergence factor \(r_2=\lambda_2/\lambda_1=-0.75\) is negative here. The effect of these alternating signs is that estimates oscillate around the exact value.
β[26:30]
5-element Vector{Float64}:
1.0001362923884616
0.9998992158824787
1.0000764485393467
0.999943179638739
1.0000429247061229
In practical situations, we don’t know the exact eigenvalue that the algorithm is supposed to find. In that case we would base errors on the final \(\beta\) that was found, as in the following plot.
err = @. β[end] - β[1:end-1]
plot(0:58,abs.(err),m=:o,title="Convergence of power iteration",
xlabel=L"k",yaxis=(L"|\beta_{60}-\beta_k|",:log10,[1e-10,1]))
The results are very similar until the last few iterations, when the limited accuracy of the reference value begins to show. That is, while it is a good estimate of \(\lambda_1\), it is less good as an estimate of the error in nearby estimates.
The practical utility of (8.2.9) is limited because if we knew \(\lambda_1\) and \(\lambda_2\), we wouldn’t be running the power iteration in the first place! Sometimes it’s possible to find estimates of or bounds on the ratio. If nothing else, though, it is useful to know that linear convergence is expected at a rate based solely on the dominant eigenvalues.
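One crude possibility, sketched below using the \(\beta\) values from the experiment above (our own illustration, not from the text), is to estimate the ratio from the iteration itself: since the errors shrink by the factor \(r_2\), so do the successive differences of the \(\beta_k\), and their ratios estimate \(\lambda_2/\lambda_1\) without any knowledge of \(\lambda_1\).
Δ = diff(β)      # differences of successive eigenvalue estimates from the demo above
Δ[40]/Δ[39]      # should be near λ[2]/λ[1] = -0.75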
Exercises
⌨ Use Function 8.2.3 to perform 20 power iterations for the following matrices. Quantitatively compare the observed convergence to the prediction in (8.2.9).
(a) \(\mathbf{A} = \begin{bmatrix} 1.1 & 1 \\ 0.1 & 2.4 \end{bmatrix} \quad\) (b) \(\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix} \quad\) (c) \( \mathbf{A} = \begin{bmatrix} 6 & 5 & 4 \\ 5 & 4 & 3 \\ 4 & 3 & 2 \end{bmatrix}\)
(d) \(\mathbf{A} = \begin{bmatrix} 8 & -14 & 0 & -14 \\ -8 & 1 & 1 & 1 \\ -4 & -2 & 0 & 2 \\ 8 & -7 & -1 & -7 \end{bmatrix}\)
✍ Describe what happens during power iteration using the matrix \(\mathbf{A}= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\) and initial vector \(\mathbf{x}=\begin{bmatrix} 0.4\\0.7 \end{bmatrix}\). Does the algorithm converge to an eigenvector? How does this relate to (8.2.2)?
⌨ In Exercise 2.3.5 we considered a mass-lumped model of a hanging string that led to a tridiagonal system of linear equations. Then, in Exercise 7.2.6, we found that eigenvectors of the same matrix correspond to vibrational modes of the string. The same setup can be applied to a membrane hanging from a square frame. Lumping the mass onto a Cartesian grid, each mass interacts with its four neighbors to the north, south, east, and west. If \(n\) masses are used in each coordinate direction, we get an \(n^2\times n^2\) sparse matrix \(\mathbf{A}\) that can be constructed by FNC.poisson(n).

(a) Let \(n=10\) and make a spy plot of \(\mathbf{A}\). What is the density of \(\mathbf{A}\)? Most rows have the same number of nonzeros; find this number.

(b) Find the dominant \(\lambda_1\) using eigs for \(n=10,15,20,25\).

(c) For each \(n\) in part (b), apply 100 steps of Function 8.2.3. On one graph, plot the four convergence curves \(|\beta_k-\lambda_1|\) using a semi-log scale. (They will not be smooth curves because this matrix has many repeated eigenvalues that complicate the convergence analysis.)
⌨ Copy the instructions from Exercise 8.1.5 to obtain a large, sparse matrix \(\mathbf{A}\). Use Function 8.2.3 to find the leading eigenvalue of \(\mathbf{A}^T\mathbf{A}\) to at least six significant digits.
⌨ For symmetric matrices, the Rayleigh quotient (7.4.2) converts an \(O(\epsilon)\) eigenvector estimate into an \(O(\epsilon^2)\) eigenvalue estimate. Duplicate Function 8.2.3 and rename it to powersym. Modify the new function to use the Rayleigh quotient to produce the entries of β. Your function should not introduce any additional matrix-vector multiplications. Apply the original Function 8.2.3 and the new powersym on the matrix matrixdepot("fiedler",100), plotting the convergence curves on one graph.
[1] If \(\mathbf{x}\) is chosen randomly, the probability that \(z_1=0\) is mathematically zero.