7.2. Eigenvalue decomposition

To this point we have dealt frequently with the solution of the linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\). Of comparable importance in linear algebra is the eigenvalue problem.

Definition 7.2.1 : Eigenvalue and eigenvector

Given a square matrix \(\mathbf{A}\), if

\[\mathbf{A}\mathbf{x} = \lambda \mathbf{x} \tag{7.2.1}\]

for a scalar \(\lambda\) and a nonzero vector \(\mathbf{x}\), then \(\lambda\) is an eigenvalue and \(\mathbf{x}\) is an associated eigenvector.

Complex matrices

A matrix with real entries can have complex eigenvalues. Therefore we assume all matrices, vectors, and scalars may be complex in what follows. Recall that a complex number can be represented as \(a+i b\) for real \(a\) and \(b\), where \(i^2=-1\). The complex conjugate of \(z=a+i b\) is denoted \(\bar{z}\) and is given by \(\bar{z}=a-i b\). The magnitude or modulus of \(z\) is

\[ |z| = \sqrt{z\cdot \bar{z}}. \]
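In Julia the imaginary unit is written im, and the functions conj and abs give the conjugate and the modulus. A minimal check of the formula above, using an arbitrarily chosen \(z\):

```julia
z = 3 + 4im                  # a = 3, b = 4
zbar = conj(z)               # complex conjugate, 3 - 4im
# z * conj(z) = a^2 + b^2 is real, so its square root is the modulus
modulus = sqrt(real(z * zbar))
println(modulus)             # 5.0
println(abs(z))              # 5.0, Julia's built-in modulus
```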

Definition 7.2.2 : Terms for complex matrices

The adjoint (or conjugate transpose) of a matrix \(\mathbf{A}\) is denoted \(\mathbf{A}^*\) and is given by \(\mathbf{A}^*=(\overline{\mathbf{A}})^T=\overline{\mathbf{A}^T}\). The matrix is self-adjoint or hermitian if \(\mathbf{A}^*=\mathbf{A}\).

The 2-norm of a complex vector \(\mathbf{u}\) is \(\sqrt{\mathbf{u}^*\mathbf{u}}\). Other vector norms, and all matrix norms, are as defined in Section 2.7.

Complex vectors \(\mathbf{u}\) and \(\mathbf{v}\) of the same dimension are orthogonal if \(\mathbf{u}^*\mathbf{v}=0\). Orthonormal vectors are mutually orthogonal and have unit 2-norm. A unitary matrix is a square matrix with orthonormal columns, or, equivalently, a matrix satisfying \(\mathbf{A}^* = \mathbf{A}^{-1}\).

For the most part, “adjoint” replaces “transpose,” “hermitian” replaces “symmetric,” and “unitary matrix” replaces “orthogonal matrix” when applying our previous results to complex matrices.
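The terms above map onto Julia's LinearAlgebra standard library roughly as follows; this is a small sketch with arbitrarily chosen test matrices. The postfix ' operator is the adjoint, and ishermitian tests whether a matrix is self-adjoint.

```julia
using LinearAlgebra

A = [1+2im  3-1im;
     0+1im  4+0im]

# In Julia, ' is the adjoint: transpose plus complex conjugation
println(A' == transpose(conj(A)))               # true

# A + A* is always hermitian
H = A + A'
println(ishermitian(H))                         # true

# 2-norm of a complex vector from the definition sqrt(u* u)
u = [1+1im, 2-1im]
println(norm(u) ≈ sqrt(real(u' * u)))           # true

# The Q factor of a complex QR factorization is unitary: Q* Q = I
Q = Matrix(qr(randn(3, 3) + 1im * randn(3, 3)).Q)
println(Q' * Q ≈ Matrix{ComplexF64}(I, 3, 3))   # true
```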

Eigenvalue decomposition

An easy rewrite of the eigenvalue definition (7.2.1) is that \((\mathbf{A} - \lambda\mathbf{I}) \mathbf{x} = \boldsymbol{0}\). Hence \((\mathbf{A} - \lambda\mathbf{I})\) is singular, and it therefore must have a zero determinant. This is the property most often used to compute eigenvalues by hand.

Example 7.2.3

Given

\[\begin{split}\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix},\end{split}\]

we compute

\[\begin{split}\begin{vmatrix} 1-\lambda & 1\\ 4 & 1-\lambda \end{vmatrix} = (1-\lambda)^2 - 4 = \lambda^2-2\lambda-3.\end{split}\]

The eigenvalues are the roots of this quadratic, \(\lambda_1=3\) and \(\lambda_2=-1\).
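A quick numerical check of Example 7.2.3, assuming Julia with the LinearAlgebra standard library: each root of the characteristic polynomial makes \(\mathbf{A}-\lambda\mathbf{I}\) singular, and eigvals finds the same values.

```julia
using LinearAlgebra

A = [1 1; 4 1]

# Characteristic polynomial computed in Example 7.2.3
charpoly(λ) = λ^2 - 2λ - 3

for λ in (3, -1)
    # Each root makes A - λI singular, so its determinant is (numerically) zero
    println("λ = $λ:  p(λ) = $(charpoly(λ)),  det(A - λI) = $(det(A - λ*I))")
end

# Numerical eigenvalues for comparison (expected ≈ -1 and 3)
λs = eigvals(A)
println(λs)
```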

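The determinant \(\det(\mathbf{A} - \lambda \mathbf{I})\) is called the characteristic polynomial. It is a polynomial of degree \(n\) in \(\lambda\), so by the fundamental theorem of algebra it has \(n\) complex roots, counting algebraic multiplicity. Since its roots are the eigenvalues, an \(n\times n\) matrix has \(n\) eigenvalues, counting algebraic multiplicity.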
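Two small illustrations, with matrices chosen for this purpose: a real matrix whose eigenvalues are complex, and a matrix whose characteristic polynomial \((1-\lambda)^2\) has a double root. In both cases eigvals returns \(n=2\) values.

```julia
using LinearAlgebra

# A real matrix (rotation by 90 degrees): characteristic polynomial λ^2 + 1
R = [0.0 -1.0; 1.0 0.0]
println(eigvals(R))      # two complex eigenvalues, ±i

# Characteristic polynomial (1 - λ)^2: one root counted twice
T = [1.0 1.0; 0.0 1.0]
println(eigvals(T))      # the eigenvalue 1 appears twice
```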

Suppose that \(\mathbf{A}\mathbf{v}_k=\lambda_k\mathbf{v}_k\) for \(k=1,\ldots,n\). We can summarize these as

\[\begin{split} \begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \cdots & \mathbf{A}\mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{bmatrix}, \\[1mm] \mathbf{A} \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix},\end{split}\]

which we write as

\[ \mathbf{A} \mathbf{V} = \mathbf{V} \mathbf{D}. \tag{7.2.2}\]

If we find that \(\mathbf{V}\) is a nonsingular matrix, then we arrive at a key factorization.[1]

Definition 7.2.4 : Eigenvalue decomposition (EVD)

An eigenvalue decomposition (EVD) of a square matrix \(\mathbf{A}\) is

\[\mathbf{A} = \mathbf{V} \mathbf{D} \mathbf{V}^{-1}. \tag{7.2.3}\]

If \(\mathbf{A}\) has an EVD, we say that \(\mathbf{A}\) is diagonalizable; otherwise \(\mathbf{A}\) is nondiagonalizable (or defective).
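For the matrix of Example 7.2.3, which has two distinct eigenvalues, a numerical EVD can be checked against (7.2.2) and (7.2.3). This sketch relies on Julia's eigen returning the eigenvectors as the columns of V, normalized to unit length; for this matrix \(\kappa(\mathbf{V})\) works out to 2, since the unit eigenvectors are multiples of \([1,2]^T\) and \([1,-2]^T\).

```julia
using LinearAlgebra

A = [1.0 1.0; 4.0 1.0]       # matrix from Example 7.2.3
λ, V = eigen(A)              # eigenvalues, and eigenvectors as columns of V
D = Diagonal(λ)

println(norm(A*V - V*D))     # (7.2.2) A V = V D, up to roundoff
println(cond(V))             # ≈ 2 here, so V is safely nonsingular
println(norm(A - V*D/V))     # (7.2.3): V D V^{-1} reproduces A up to roundoff
```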

Observe that if \(\mathbf{A}\mathbf{v} = \lambda \mathbf{v}\) for nonzero \(\mathbf{v}\), then the equation remains true for any nonzero multiple of \(\mathbf{v}\). Therefore, eigenvectors are not unique, and thus neither is an EVD.
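The nonuniqueness is easy to see numerically. Here \([1,2]^T\) is an eigenvector of the matrix from Example 7.2.3 for \(\lambda=3\), and so is every nonzero multiple of it, including complex multiples:

```julia
using LinearAlgebra

A = [1.0 1.0; 4.0 1.0]
v = [1.0, 2.0]                     # an eigenvector for λ = 3, since A*v = [3, 6]
for c in (2.0, -0.5, 1im)          # nonzero scalar multiples, including a complex one
    w = c * v
    println(c, " => ", A*w ≈ 3w)   # true each time: w is also an eigenvector
end
```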

We stress that while (7.2.2) is possible for all square matrices, (7.2.3) is not. One simple example of a nondiagonalizable matrix is

\[\begin{split} \mathbf{B} = \begin{bmatrix} 1 & 1\\0 & 1 \end{bmatrix}.\end{split} \tag{7.2.4}\]

There is a common circumstance in which we can guarantee an EVD exists. The proof of the following theorem can be found in many elementary texts on linear algebra.

Theorem 7.2.5

If the \(n\times n\) matrix \(\mathbf{A}\) has \(n\) distinct eigenvalues, then \(\mathbf{A}\) is diagonalizable.

Demo 7.2.6

The eigvals function returns a vector of the eigenvalues of a matrix.

using LinearAlgebra
A = π*ones(2,2)

2×2 Matrix{Float64}:
 3.14159  3.14159
 3.14159  3.14159

λ = eigvals(A)

2-element Vector{Float64}:
 0.0
 6.283185307179586

If you want the eigenvectors as well, use eigen.

λ,V = eigen(A)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
2-element Vector{Float64}:
 0.0
 6.283185307179586
vectors:
2×2 Matrix{Float64}:
 -0.707107  0.707107
  0.707107  0.707107

norm( A*V[:,2] - λ[2]*V[:,2] )

0.0

Both functions allow you to sort the eigenvalues by specified criteria.

A = diagm(-2.3:1.7)
@show eigvals(A,sortby=real);
@show eigvals(A,sortby=abs);

eigvals(A, sortby = real) = [-2.3, -1.3, -0.3, 0.7, 1.7]

eigvals(A, sortby = abs) = [-0.3, 0.7, -1.3, 1.7, -2.3]

If the matrix is not diagonalizable, no message is given, but V will be singular, or nearly so in floating-point arithmetic. The robust way to detect that circumstance is via \(\kappa(\mathbf{V})\).

A = [-1 1;0 -1]
λ,V = eigen(A)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
2-element Vector{Float64}:
 -1.0
 -1.0
vectors:
2×2 Matrix{Float64}:
 1.0  -1.0
 0.0   2.22045e-16

cond(V)

9.007199254740991e15

Even in the nondiagonalizable case, \(\mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{D}\) holds.

opnorm(A*V - V*diagm(λ))

2.220446049250313e-16

Similarity and matrix powers

The particular relationship between matrices \(\mathbf{A}\) and \(\mathbf{D}\) in (7.2.3) is important.

Definition 7.2.7 : Similar matrices

If \(\mathbf{S}\) is any nonsingular matrix, we say that \(\mathbf{B}=\mathbf{S}\mathbf{A}\mathbf{S}^{-1}\) is a similarity transformation of \(\mathbf{A}\), and we say that \(\mathbf{B}\) is similar to \(\mathbf{A}\).

A similarity transformation does not change eigenvalues, a fact that is typically proved in elementary linear algebra texts.

Theorem 7.2.8

If \(\mathbf{S}\) is a nonsingular matrix, then \(\mathbf{S}\mathbf{A}\mathbf{S}^{-1}\) has the same eigenvalues as \(\mathbf{A}\).
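A numerical illustration of Theorem 7.2.8, using an arbitrarily chosen nonsingular \(\mathbf{S}\). Writing S * A / S computes \(\mathbf{S}\mathbf{A}\mathbf{S}^{-1}\) without forming the inverse explicitly.

```julia
using LinearAlgebra

A = [1.0 1.0; 4.0 1.0]     # eigenvalues 3 and -1
S = [2.0 1.0; 1.0 1.0]     # an arbitrary nonsingular matrix (det(S) = 1)
B = S * A / S              # B = S A S^{-1}, a similarity transformation of A

println(B)                 # a different matrix...
println(eigvals(B))        # ...with the same eigenvalues, ≈ -1 and 3
```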

The EVD is especially useful for matrix powers. To begin,

\[\mathbf{A}^2=(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})=\mathbf{V}\mathbf{D}(\mathbf{V}^{-1}\mathbf{V})\mathbf{D}\mathbf{V}^{-1}=\mathbf{V}\mathbf{D}^2\mathbf{V}^{-1}.\]

Multiplying this result by \(\mathbf{A}\) repeatedly, we find that

\[\mathbf{A}^k = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}. \tag{7.2.5}\]

Because \(\mathbf{D}\) is diagonal, its power \(\mathbf{D}^k\) is just the diagonal matrix of the \(k\)th powers of the eigenvalues.
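As a sketch of (7.2.5) with the matrix from Example 7.2.3, raising \(\mathbf{D}\) to a power only requires raising each eigenvalue to that power. For this matrix \(\mathbf{A}^5\) is exactly \(\begin{bmatrix}121 & 61\\ 244 & 121\end{bmatrix}\), and the EVD reproduces it up to roundoff.

```julia
using LinearAlgebra

A = [1.0 1.0; 4.0 1.0]
λ, V = eigen(A)
k = 5

# V D^k V^{-1}: only the scalar eigenvalues need to be raised to the power k
Ak_evd = V * Diagonal(λ .^ k) / V

println(A^k)                                # [121.0 61.0; 244.0 121.0]
println(norm(A^k - Ak_evd) / norm(A^k))     # relative difference at roundoff level
```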

Furthermore, given a polynomial \(p(z)=c_0+c_1 z + \cdots + c_m z^m\), we can apply the polynomial to the matrix in a straightforward way,

\[p(\mathbf{A}) = c_0\mathbf{I} +c_1 \mathbf{A} + \cdots + c_m \mathbf{A}^m. \tag{7.2.6}\]

Applying (7.2.5) leads to

\[\begin{split}p(\mathbf{A}) & = c_0\mathbf{V}\mathbf{V}^{-1} +c_1 \mathbf{V}\mathbf{D}\mathbf{V}^{-1} + \cdots + c_m \mathbf{V}\mathbf{D}^m\mathbf{V}^{-1} \\ &= \mathbf{V} \cdot [ c_0\mathbf{I} +c_1 \mathbf{D} + \cdots + c_m \mathbf{D}^m] \cdot \mathbf{V}^{-1} \\[1mm] &= \mathbf{V} \cdot \begin{bmatrix} p(\lambda_1) & & & \\ & p(\lambda_2) & & \\ & & \ddots & \\ & & & p(\lambda_n) \end{bmatrix} \cdot \mathbf{V}^{-1}.\end{split} \tag{7.2.7}\]
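The two sides of (7.2.7) can be compared directly. The cubic \(p\) below is an arbitrary choice made for illustration:

```julia
using LinearAlgebra

A = [1.0 1.0; 4.0 1.0]
c = [2.0, -3.0, 0.5, 1.0]        # coefficients c_0, c_1, c_2, c_3 of an arbitrary cubic
p(z) = c[1] + c[2]*z + c[3]*z^2 + c[4]*z^3

# (7.2.6): direct evaluation c_0 I + c_1 A + c_2 A^2 + c_3 A^3
pA_direct = c[1]*I + c[2]*A + c[3]*A^2 + c[4]*A^3

# (7.2.7): apply p to each eigenvalue, then transform back
λ, V = eigen(A)
pA_evd = V * Diagonal(p.(λ)) / V

println(norm(pA_direct - pA_evd))   # agreement up to roundoff
```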

Finally, because many common functions (such as \(e^z\) and \(\cos z\)) are limits of their Taylor polynomials, we can apply a function \(f\) to a square matrix by replacing \(p\) with \(f\) in (7.2.7).
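With \(f(z)=e^z\), the same idea gives the matrix exponential. In this sketch the EVD result is compared with Julia's exp applied to a square matrix, which computes the matrix exponential (the elementwise exponential would instead be written exp.(A)).

```julia
using LinearAlgebra

A = [1.0 1.0; 4.0 1.0]
λ, V = eigen(A)

# f(A) = V f(D) V^{-1} with f = exp, applied eigenvalue by eigenvalue
expA_evd = V * Diagonal(exp.(λ)) / V

# Compare with Julia's built-in matrix exponential of a square matrix
println(norm(exp(A) - expA_evd) / norm(exp(A)))   # roundoff level
```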

Conditioning of eigenvalues

Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. While many possible results can be derived, we will use just one, the Bauer–Fike theorem.

Theorem 7.2.9 : Bauer–Fike

Let \(\mathbf{A}\in\mathbb{C}^{n\times n}\) be diagonalizable, \(\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}\), with eigenvalues \(\lambda_1,\ldots,\lambda_n\). If \(\mu\) is an eigenvalue of \(\mathbf{A}+\mathbf{E}\) for a complex matrix \(\mathbf{E}\), then

\[\min_{j=1,\ldots,n} |\mu - \lambda_j| \le \kappa(\mathbf{V}) \, \| \mathbf{E} \|\,, \tag{7.2.8}\]

where \(\|\cdot\|\) and \(\kappa\) are in the 2-norm.

The Bauer–Fike theorem tells us that eigenvalues can be perturbed by up to \(\kappa(\mathbf{V})\) times the size of the perturbation to the matrix. This result is a bit less straightforward than it might seem: eigenvectors are not unique, so there are multiple possible values for \(\kappa(\mathbf{V})\). Even so, the theorem indicates caution when a matrix has eigenvectors that form an ill-conditioned matrix. The limiting case of \(\kappa(\mathbf{V})=\infty\) might be interpreted as indicating a nondiagonalizable matrix \(\mathbf{A}\). The other extreme is also of interest: \(\kappa(\mathbf{V})=1\), which implies that \(\mathbf{V}\) is unitary.
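A small non-normal example, chosen for illustration, shows the amplification that (7.2.8) allows. For this matrix \(\kappa(\mathbf{V})\) is about 200, and a perturbation of 2-norm \(10^{-6}\) moves the eigenvalues by about \(10^{-4}\), which is within the bound:

```julia
using LinearAlgebra

A = [1.0 100.0; 0.0 2.0]        # non-normal; eigenvalues 1 and 2
λ, V = eigen(A)
E = [0.0 0.0; 1e-6 0.0]         # small perturbation with ‖E‖₂ = 1e-6
μ = eigvals(A + E)

# For each perturbed eigenvalue, distance to the nearest original eigenvalue
dist = [minimum(abs.(m .- λ)) for m in μ]
bound = cond(V) * opnorm(E)     # right-hand side of (7.2.8)

println("max shift = ", maximum(dist))   # much larger than ‖E‖₂
println("κ(V)‖E‖  = ", bound)            # but within the Bauer–Fike bound
```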

Definition 7.2.10 : Normal matrix

If \(\mathbf{A}\) has an EVD (7.2.3) with a unitary eigenvector matrix \(\mathbf{V}\), then \(\mathbf{A}\) is a normal matrix.

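As we will see in Section 7.4, hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is equal to 1, (7.2.8) guarantees that every eigenvalue of a perturbed normal matrix \(\mathbf{A}+\mathbf{E}\) lies within \(\|\mathbf{E}\|\) of an eigenvalue of \(\mathbf{A}\); that is, the eigenvalues change by no more than the size of the perturbation.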

Demo 7.2.11

We first define a hermitian matrix. Note that the ' operation is the adjoint and includes complex conjugation.

n = 7
A = randn(n,n) + 1im*randn(n,n)
A = (A+A')/2

7×7 Matrix{ComplexF64}:
   0.581646+0.0im         -1.14439-2.01986im   …   -1.57284+0.0713392im
   -1.14439+2.01986im     0.436616+0.0im          -0.473855+0.102027im
 -0.0415607+0.235214im   -0.412786+0.968471im      0.330738-0.00972205im
   0.545272-0.730313im    0.998448+0.461714im       -1.4534-0.186603im
    0.08257-0.45582im      1.28887+0.720884im      0.529256+0.382783im
  -0.849645+0.0401748im   0.116424-0.260548im  …   -1.36784-0.459035im
   -1.57284-0.0713392im  -0.473855-0.102027im      0.616586+0.0im

We confirm that the matrix \(\mathbf{A}\) is normal by checking that \(\kappa(\mathbf{V}) = 1\) (to within roundoff).

λ,V = eigen(A)
@show cond(V);

cond(V) = 1.0000000000000007

Now we perturb \(\mathbf{A}\) and measure the effect on the eigenvalues. The Bauer–Fike theorem uses absolute differences, not relative ones.

Since the ordering of eigenvalues can change, we look at all pairwise differences and take the minima.

ΔA = 1e-8*normalize(randn(n,n) + 1im*randn(n,n))
λ̃ = eigvals(A+ΔA)
dist = minimum( [abs(x-y) for x in λ̃, y in λ], dims=2 )

7×1 Matrix{Float64}:
240134615330935e-10
39461619974461e-10
649536179262133e-10
6291480651451636e-9
683011620578484e-9
0693421672262406e-9
8462247607754486e-9

As promised, the perturbations in the eigenvalues do not exceed the normwise perturbation to the original matrix.

Now we see what happens for a triangular matrix.

n = 20
x = 1:n
A = triu( x*ones(n)' )
A[1:5,1:5]

5×5 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0
 0.0  2.0  2.0  2.0  2.0
 0.0  0.0  3.0  3.0  3.0
 0.0  0.0  0.0  4.0  4.0
 0.0  0.0  0.0  0.0  5.0

This matrix is not especially close to normal.

λ,V = eigen(A)
@show cond(V);

cond(V) = 6.149906664929389e9

As a result, the eigenvalues can change by a good deal more.

ΔA = 1e-8*normalize(randn(n,n) + 1im*randn(n,n))
λ̃ = eigvals(A+ΔA)
dist = minimum( [abs(x-y) for x in λ̃, y in λ], dims=2 )
BF_bound = cond(V)*norm(ΔA)
@show maximum(dist),BF_bound;

(maximum(dist), BF_bound) = (0.21791487335613785, 61.499066649293894)

If we plot the eigenvalues of many perturbations, we get a cloud of points that roughly represents all the possible eigenvalues when representing this matrix with single-precision accuracy.

using Plots
plt = scatter(λ,zeros(n),aspect_ratio=1)
for _ in 1:200
    ΔA = eps(Float32)*normalize(randn(n,n) + 1im*randn(n,n))
    λ̃ = eigvals(A+ΔA)
    scatter!(real(λ̃),imag(λ̃),m=1,color=:black)
end
plt

The plot shows that some eigenvalues are much more affected than others. This situation is not unusual, but it is not explained by the Bauer–Fike theorem.

Computing the EVD

Roots of the characteristic polynomial are not used in numerical methods for finding eigenvalues.[2] Practical algorithms for computing the EVD go beyond the scope of this book. The essence of the matter is the connection to matrix powers indicated in (7.2.5). (We will see much more about the importance of matrix powers in Chapter 8.)

If the eigenvalues have distinct magnitudes \(|\lambda_j|\), then as \(k\to\infty\) the entries on the diagonal of \(\mathbf{D}^k\) become increasingly well separated in size and easy to pick out. It turns out that there is an astonishingly easy and elegant way to accomplish this separation without explicitly computing the matrix powers.
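To make the separation concrete, here is a sketch that tabulates the magnitudes of the diagonal entries of \(\mathbf{D}^k\), relative to the largest one, for an illustrative set of eigenvalues. As \(k\) grows, every ratio except the largest tends to zero:

```julia
# Eigenvalues chosen for illustration, with distinct magnitudes
λ = [-6.0, -1.0, 2.0, 4.0, 5.0]

# Magnitudes of the diagonal of D^k, scaled by the largest one
relative_powers(k) = abs.(λ .^ k) ./ maximum(abs.(λ .^ k))

for k in (1, 10, 40)
    println("k = $k: ", round.(relative_powers(k), sigdigits=3))
end
```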

Demo 7.2.12

Let’s start with a known set of eigenvalues and an orthogonal eigenvector basis.

D = diagm( [-6,-1,2,4,5] )
V,R = qr(randn(5,5))    # V is unitary
A = V*D*V'

5×5 Matrix{Float64}:
  0.062894  -0.423156  -1.0086    -1.77949   -0.554159
 -0.423156  -0.4128     3.98033   -1.87395   -1.03854
 -1.0086     3.98033    0.133188   2.7109     0.965617
 -1.77949   -1.87395    2.7109     0.368932   0.220078
 -0.554159  -1.03854    0.965617   0.220078   3.84779

eigvals(A)

5-element Vector{Float64}:
 -6.000000000000008
 -1.000000000000001
  1.9999999999999984
  4.0
  4.999999999999997

Now we will take the QR factorization and just reverse the factors.

Q,R = qr(A)
A = R*Q;

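This is a similarity transformation: since \(\mathbf{A}=\mathbf{Q}\mathbf{R}\) with \(\mathbf{Q}\) orthogonal, we have \(\mathbf{R}=\mathbf{Q}^T\mathbf{A}\), so \(\mathbf{R}\mathbf{Q}=\mathbf{Q}^T\mathbf{A}\mathbf{Q}=\mathbf{Q}^{-1}\mathbf{A}\mathbf{Q}\). Hence the eigenvalues are unchanged.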
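The identity \(\mathbf{R}\mathbf{Q}=\mathbf{Q}^T\mathbf{A}\mathbf{Q}\) can be checked numerically on a random matrix. This sketch uses Matrix(F.Q) to form the orthogonal factor explicitly:

```julia
using LinearAlgebra

A = randn(5, 5)
F = qr(A)
Q, R = Matrix(F.Q), F.R          # explicit orthogonal Q and upper triangular R

# A = QR  ⇒  R = QᵀA  ⇒  RQ = QᵀAQ = Q⁻¹AQ
println(norm(R*Q - Q'*A*Q))      # roundoff level
```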

eigvals(A)

5-element Vector{Float64}:
 -6.000000000000007
 -0.9999999999999997
  1.9999999999999998
  4.0
  5.000000000000002

What’s remarkable, and not elementary, is that if we repeat this transformation many times, the resulting matrix converges to a diagonal matrix with the same diagonal entries as \(\mathbf{D}\), though not necessarily in the same order.

for k in 1:40
    Q,R = qr(A)
    A = R*Q
end
A

5×5 Matrix{Float64}:
 -5.99994       0.0247765    -5.44932e-7    1.71064e-15  -1.37885e-15
  0.0247765     4.99994      -4.3434e-5    -4.53226e-16   2.53933e-15
 -5.44932e-7   -4.3434e-5     4.0           1.66301e-12   2.43691e-15
 -4.44677e-19  -1.33889e-16   1.66247e-12   2.0           2.10008e-10
 -3.01204e-29  -9.03837e-27   1.13121e-22   2.10007e-10  -1.0

The process demonstrated in Demo 7.2.12 is known as the Francis QR iteration, and it can be formulated as an \(O(n^3)\) algorithm for finding the EVD. Such an algorithm is the foundation of what the eigen function uses.

Exercises

(a) ✍ Suppose that matrix \(\mathbf{A}\) has an eigenvalue \(\lambda\). Show that for any induced matrix norm, \(\| \mathbf{A} \|\ge |\lambda|\).

(b) ✍ Find a matrix \(\mathbf{A}\) such that \(\| \mathbf{A} \|_2\) is strictly larger than \(|\lambda|\) for all eigenvalues \(\lambda\). (Proof-by-computer isn’t allowed here. You don’t need to compute \(\| \mathbf{A} \|_2\) exactly, just a lower bound for it.)

✍ Prove that the matrix \(\mathbf{B}\) in (7.2.4) does not have two independent eigenvectors.

⌨ Use eigvals to find the eigenvalues of each matrix. Then for each eigenvalue \(\lambda\), use rank to verify that \(\lambda\mathbf{I}\) minus the given matrix is singular.

\(\mathbf{A} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}\qquad\) \(\mathbf{B} = \begin{bmatrix} 2 & -1 & -1 \\ -2 & 2 & -1 \\ -1 & -2 & 2 \end{bmatrix} \qquad\) \( \mathbf{C} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \)

\(\mathbf{D} = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 3 \end{bmatrix}\qquad \) \(\mathbf{E} = \begin{bmatrix} 4 & -3 & -2 & -1\\ -2 & 4 & -2 & -1 \\ -1 & -2 & 4 & -1 \\ -1 & -2 & -1 & 4 \\ \end{bmatrix} \)

(a) ✍ Show that the eigenvalues of a diagonal \(n\times n\) matrix \(\mathbf{D}\) are the diagonal entries of \(\mathbf{D}\). (That is, produce the associated eigenvectors.)

(b) ✍ The eigenvalues of a triangular matrix are its diagonal entries. Prove this in the \(3\times 3\) case,

\[\begin{split} \mathbf{T} = \begin{bmatrix} t_{11} & t_{12}& t_{13}\\ 0 & t_{22} & t_{23} \\ 0 & 0 & t_{33} \end{bmatrix},\end{split}\]

by finding the eigenvectors. (Start by showing that \([1,0,0]^T\) is an eigenvector. Then show how to make \([a,1,0]^T\) an eigenvector, except for one case that does not change the outcome. Continue the same logic for \([a,b,1]^T\).)

✍ Let \(\mathbf{A}=\displaystyle\frac{\pi}{6}\begin{bmatrix} 4 & 1 \\ 4 & 4 \end{bmatrix}\).

(a) Show that

\[\begin{split} \lambda_1=\pi,\, \mathbf{v}_1=\begin{bmatrix}1 \\ 2 \end{bmatrix}, \quad \lambda_2=\frac{\pi}{3},\, \mathbf{v}_2=\begin{bmatrix}1 \\ -2 \end{bmatrix} \end{split}\]

yield an EVD of \(\mathbf{A}\).

(b) Use (7.2.7) to evaluate \(p(\mathbf{A})\), where \(p(x) = (x-\pi)^4\).

(c) Use the function analog of (7.2.7) to evaluate \(\cos(\mathbf{A})\).

⌨ In Exercise 2.3.5, you showed that the displacements of point masses placed along a string satisfy a linear system \(\mathbf{A}\mathbf{q}=\mathbf{f}\) for an \((n-1)\times(n-1)\) matrix \(\mathbf{A}\). The eigenvalues and eigenvectors of \(\mathbf{A}\) correspond to resonant frequencies and modes of vibration of the string. For \(n=40\) and the physical parameters given in part (b) of that exercise, find the eigenvalue decomposition of \(\mathbf{A}\). Report the three eigenvalues with smallest absolute value, and plot all three associated eigenvectors on a single graph (as functions of the vector row index).

⌨ Demo 7.2.12 suggests that the result of the Francis QR iteration as \(k\to\infty\) sorts the eigenvalues on the diagonal according to a particular ordering. Following the code there as a model, create a random matrix with eigenvalues equal to \(-9.6,-8.6,\ldots,10.4\), perform the iteration 200 times, and check whether the sorting criterion holds in your experiment as well.

⌨ Eigenvalues of random matrices and their perturbations can be very interesting.

(a) Let A=randn(60,60). Scatter plot its eigenvalues in the complex plane, using aspect_ratio=1 and red diamonds as markers.

(b) Let \(\mathbf{E}\) be another random \(60\times 60\) matrix, and on top of the previous graph, plot the eigenvalues of \(\mathbf{A}+0.05\mathbf{E}\) as blue dots. Repeat this for 100 different values of \(\mathbf{E}\).

(c) Let T=triu(A). On a new graph, scatter plot the eigenvalues of \(\mathbf{T}\) in the complex plane. (They all lie on the real axis.)

(d) Repeat part (b) with \(\mathbf{T}\) in place of \(\mathbf{A}\).

(e) Compute some condition numbers and apply Theorem 7.2.9 to explain the dramatic difference between your plots with respect to the dot distributions.

1: The terms “factorization” and “decomposition” are equivalent; they coexist mainly for historical reasons.
2: In fact, the situation is reversed: eigenvalue methods are among the best ways to compute the roots of a given polynomial.