5  Eigenvalues

Code
using Plots, LinearAlgebra, LaTeXStrings
default(label="", linewidth=3, markersize=4, size=(500,320))

We now steer into the part of linear algebra most related to ODEs.

Important

For this chapter, we deal with square matrices only.

5.1 Determinants

There are many ways to characterize singular matrices, but only a few of them are computationally attractive. One that stands out is a function of square matrices called the determinant. (You probably saw some \(2\times 2\) and \(3\times 3\) determinants in vector calculus. This is the same thing.)

A definition of the determinant from fundamentals is actually quite tricky. We are going to take a shortcut and define it by a formula for computing it. The \(2\times 2\) case is easy:

\[ \det\left( \twomat{a}{b}{c}{d} \right) = \twodet{a}{b}{c}{d} = ad-bc. \]

This definition can be bootstrapped into a real-valued function for square matrices of any size.

Definition 5.1 (Determinant) If \(\bfA\) is \(n\times n\), then its determinant is

\[ \det(\bfA) = \sum (-1)^{i+j} A_{ij} \det\bigl( \mathbf{M}_{ij} \bigr), \tag{5.1}\]

where the sum is taken over all the entries of any single row or column of \(\bfA\), and \(\mathbf{M}_{ij}\) is the matrix that results from deleting row \(i\) and column \(j\) from \(\bfA\).

The formula in Equation 5.1, which is called cofactor expansion, is recursive: the \(n\times n\) case is defined in terms of the \((n-1)\times (n-1)\) case, and so on all the way back down to \(2\times 2\).

Since expanding along any row or column gives the same result, it can be advantageous to choose one with lots of zeros to cut down on the total computation.

Example 5.1 Compute the determinant of

\[ \begin{bmatrix} 2 & 0 & -1 \\ -2 & 3 & -1 \\ 2 & 0 & 1 \end{bmatrix}. \]

Solution. Using cofactor expansion along the first row,

\[ \begin{split} \begin{vmatrix} 2 & 0 & -1 \\ -2 & 3 & -1 \\ 2 & 0 & 1 \end{vmatrix} & = (2) \twodet{3}{-1}{0}{1} - (0) \twodet{-2}{-1}{2}{1} + (-1)\twodet{-2}{3}{2}{0} \\ & = 2(3-0) + (-1)(0-6) = 12. \\ \end{split} \]

Here, it might have been a tad easier to expand along the second column instead, because then we can ignore all but one of the \(2\times 2\) cases:

\[ \begin{split} \begin{vmatrix} 2 & 0 & -1 \\ -2 & 3 & -1 \\ 2 & 0 & 1 \end{vmatrix} & = -(0) \begin{vmatrix} & & \\ & & \end{vmatrix} + (3) \twodet{2}{-1}{2}{1} - (0)\begin{vmatrix} & & \\ & & \end{vmatrix} \\ & = 3(2+2) = 12. \\ \end{split} \]
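
As a quick numerical check of Example 5.1, the `det` function from `LinearAlgebra` (loaded at the top of the chapter) computes the same value; this is just a verification, not a new method.

Code
using LinearAlgebra
A = [2 0 -1; -2 3 -1; 2 0 1]
det(A)    # equals 12, up to floating-point roundoff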

5.1.1 Triangular matrices

There is one class of matrices for which determinants are super easy to calculate: the triangular matrices.

Definition 5.2 A matrix \(\bfA\) is upper triangular if \(A_{ij}=0\) whenever \(i>j\). It is lower triangular if \(A_{ij}=0\) whenever \(i<j\).

Important

A triangular matrix must have zeros in the indicated positions, but its other elements may or may not be zero.

Note

A matrix that is both upper and lower triangular is diagonal.

The next result follows easily from cofactor expansion.

Theorem 5.1 The determinant of a triangular matrix is the product of its diagonal elements. That is,

\[ \det(\mathbf{T}) = \prod_{i=1}^n T_{ii} \]

if \(\mathbf{T}\) is triangular.
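
Here is a small numerical illustration of Theorem 5.1; the particular upper triangular matrix is made up for this sketch.

Code
using LinearAlgebra
T = [2 5 -1; 0 3 4; 0 0 -2]      # upper triangular
det(T), prod(diag(T))            # both equal 2*3*(-2) = -12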

5.1.2 Properties

Theorem 5.2 Let \(\bfA\) and \(\bfB\) be \(n\times n\), and let \(c\) be a number. Then

  1. \(\det(c\bfA) = c^n \det(\bfA)\),
  2. \(\det(\bfA\bfB) = \det(\bfA)\det(\bfB)\),
  3. If \(\bfA\) is nonsingular, \(\det(\bfA^{-1})=\bigl[\det(\bfA)\bigr]^{-1}\).
  4. \(\det(\bfA)=0\) if and only if \(\bfA\) is singular.

It’s the last property above that is of the greatest interest.

Note

The determinant is often the easiest way to check by hand for the invertibility of a small matrix.
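
The properties in Theorem 5.2 are easy to spot-check numerically. Here is a minimal sketch using random matrices; `≈` compares values up to roundoff.

Code
using LinearAlgebra
A, B = rand(3, 3), rand(3, 3)
c = 2.5
@show det(c*A) ≈ c^3 * det(A)      # property 1, with n = 3
@show det(A*B) ≈ det(A) * det(B)   # property 2
@show det(inv(A)) ≈ 1 / det(A)     # property 3 (a random A is almost surely nonsingular)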

5.1.3 Cramer’s Rule

Even though a \(2\times 2\) inverse is easy to find, it’s still not the most convenient way to solve a linear system \(\bfA\bfx=\bfb\) by hand. There is another shortcut known as Cramer’s Rule:

\[ \begin{split} x_1 = \frac{ \twodet{b_1}{A_{12}}{b_2}{A_{22}} }{ \det(\bfA) },\qquad x_2 = \frac{ \twodet{A_{11}}{b_1}{A_{21}}{b_2} }{ \det(\bfA) }. \end{split} \tag{5.2}\]

Obviously, this does not work if \(\det(\bfA)=0\), i.e., when the matrix is singular. In that case, you have to fall back on row elimination.

Example 5.2 Solve

\[\begin{split} -x + 3y & = 1 \\ 3x + y & = 7 \end{split}\]

by Cramer’s Rule.

Solution. Plug and play (or is it plug and pray?):

\[\begin{split} x & = \frac{ \twodet{1}{3}{7}{1} }{ \det(\bfA) }= \frac{ \twodet{1}{3}{7}{1} }{ \twodet{-1}{3}{3}{1} } = \frac{-20}{-10} = 2, \\ y & = \frac{ \twodet{-1}{1}{3}{7} }{ \det(\bfA) } = \frac{ \twodet{-1}{1}{3}{7} }{ \twodet{-1}{3}{3}{1} } = \frac{-10}{-10} = 1.\\ \end{split}\]
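
Equation 5.2 is easy to transcribe into code. The following sketch reproduces Example 5.2 and compares the result with the built-in backslash solver.

Code
using LinearAlgebra
A = [-1 3; 3 1];  b = [1, 7]
x = det([b[1] A[1,2]; b[2] A[2,2]]) / det(A)    # Cramer's Rule for x
y = det([A[1,1] b[1]; A[2,1] b[2]]) / det(A)    # Cramer's Rule for y
@show x, y       # (2.0, 1.0)
@show A \ b      # same solution from the built-in solver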


5.2 Eigenvalues

The importance and usefulness of the following definition won’t be apparent for a while.

Definition 5.3 (Eigenvalue and eigenvector) Suppose \(\bfA\in\cmn{n}{n}\). If there exist a number \(\lambda\) and a nonzero vector \(\bfv\) such that

\[ \bfA \bfv = \lambda \bfv, \]

then \(\lambda\) is an eigenvalue of \(\bfA\) with associated eigenvector \(\bfv\).

If you think of \(\bfA\) as acting on vectors, then an eigenvector gives a direction in which that action is simply multiplication by a scalar.

Example 5.3 Let \(\bfA = -\dfrac{1}{6}\twomat{1}{5}{10}{-4}\). For any value of \(\theta\), \(\bfx = [\cos\theta,\sin\theta]\) is a vector in \(\real^2\) in the direction of \(\theta\). If we choose the direction randomly, then there is no special relationship between \(\bfx\) (blue) and \(\bfA\bfx\) (red). But in two special directions, the result \(\bfA\bfx\) is parallel to \(\bfx\). For these directions, \(\bfx\) is an eigenvector, and the corresponding eigenvalue is either \(1.5\) or \(-1\).

Code
V = [-2 1;4 1]
D = [1.5 0;0 -1]
A = V*D/V

plot(size=(600,200),layout=(1,3),frame=:zerolines,aspect_ratio=1,xticks=[],yticks=[],xlims=[-1,1],ylims=1.4*[-1,1])
t = (0:360)*2pi/360

plot!(cos.(t),sin.(t),color=RGBA(0,0,0,.5),l=1,subplot=1)
x = normalize([-1,-0.4])
y = A*x
plot!([0,x[1]],[0,x[2]],color=:darkblue,l=3,arrow=true,subplot=1)
annotate!(x[1]/2+.1,x[2]/2-0.2,L"\mathbf{x}",color=:darkblue,subplot=1)
plot!([0,y[1]],[0,y[2]],color=:red,l=3,arrow=true,subplot=1)
annotate!(y[1]/2+.33,y[2]/2-0.1,L"\mathbf{Ax}",color=:red,subplot=1)

plot!(cos.(t),sin.(t),color=RGBA(0,0,0,.5),l=1,subplot=2)
x = normalize([-2,4])
y = A*x
plot!([0,y[1]],[0,y[2]],color=:red,l=3,arrow=true,subplot=2)
plot!([0,x[1]],[0,x[2]],color=:darkblue,l=3,arrow=true,subplot=2)

plot!(cos.(t),sin.(t),color=RGBA(0,0,0,.5),l=1,subplot=3)
x = normalize([1,1])
y = A*x
plot!([0,y[1]],[0,y[2]],color=:red,l=3,arrow=true,subplot=3)
plot!([0,x[1]],[0,x[2]],color=:darkblue,l=3,arrow=true,subplot=3)

The geometric intuition above is nice, but it is limiting.

Example 5.4 The matrix

\[ \bfA=\twomat{0}{-1}{1}{0} \]

rotates 2-vectors by 90 degrees counterclockwise. For instance,

\[ \bfA \twovec{\sqrt{3}}{1} = \twovec{-1}{\sqrt{3}}. \]

Therefore, \(\bfA \bfv\) can never be geometrically parallel to \(\bfv\) unless \(\bfv=\bfzero\). That would seem to rule out the possibility of eigenvalues and eigenvectors. However, consider:

\[ \bfA \twovec{1}{i} = \twovec{-i}{1} = -i \twovec{1}{i}. \]

It turns out that both \(i\) and \(-i\) are eigenvalues! The corresponding eigenvectors are \([1,-i]\) and \([1,i]\), respectively.

5.2.1 Properties

Except for the first one, these properties are all easy to prove.

Theorem 5.3 Suppose \(\bfA\) is an \(n\times n\) square matrix.

  1. \(\bfA\) has at least one and at most \(n\) distinct eigenvalues.
  2. If \(\lambda\) is an eigenvalue of \(\bfA\), then \(c\lambda\) is an eigenvalue of \(c\bfA\).
  3. If \(\lambda\) is an eigenvalue of \(\bfA\), then \(\lambda-c\) is an eigenvalue of \(\bfA-c\meye\).
  4. If \(\bfA\) is a triangular square matrix, then its eigenvalues are its diagonal elements.
  5. A matrix is singular if and only if \(0\) is among its eigenvalues.

The last property above arises because \(\bfA\bfx=\bfzero\) has a nonzero solution if and only if \(\bfA\) is singular. But \(\bfA\bfx=\bfzero\) is equivalent to \(\bfA\bfx=0\bfx\), so \(0\) is an eigenvalue of \(\bfA\) if and only if \(\bfA\) is singular.
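
These properties can be spot-checked with the `eigvals` function. Here is a sketch using the matrix from Example 5.3, whose eigenvalues are \(1.5\) and \(-1\); sorting makes the comparisons insensitive to the ordering of the returned values.

Code
using LinearAlgebra
A = -1/6 * [1 5; 10 -4]               # matrix from Example 5.3
λ = sort(eigvals(A))                  # [-1.0, 1.5]
@show sort(eigvals(3A)) ≈ 3λ          # property 2, with c = 3
@show sort(eigvals(A - 2I)) ≈ λ .- 2  # property 3, with c = 2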

5.2.2 Eigenspaces

An eigenvalue is a clean, well-defined target. Eigenvectors are a little slipperier. For starters, if \(\bfA\bfv=\lambda\bfv\), then

\[ \bfA(c\bfv) = c(\bfA\bfv)=c(\lambda\bfv)=\lambda(c\bfv). \]

Hence:

Theorem 5.4 Every nonzero multiple of an eigenvector is also an eigenvector for the same eigenvalue.

But a simple example shows that there can be even more ambiguity.

Example 5.5 Let \(\meye\) be an identity matrix. Then \(\meye\bfx=\bfx\) for any vector \(\bfx\), so every nonzero vector is an eigenvector!

Fortunately we already have the tools we need to describe a more robust target, based on the very simple reformulation

\[ \bfzero=\bfA\bfv-\lambda\bfv=(\bfA-\lambda\meye)\bfv. \]

The requirement that an eigenvector be nonzero, combined with Theorem 4.8, leads to the following crucial conclusion.

Theorem 5.5 \(\lambda\) is an eigenvalue of \(\bfA\) if and only if \(\bfA-\lambda\meye\) is singular.

Definition 5.4 (Eigenspace) Let \(\lambda\) be an eigenvalue of \(\bfA\). The eigenspace associated with \(\lambda\) is the null space of \(\bfA-\lambda\meye\).

Eigenspaces, unlike eigenvectors, are uniquely associated with their eigenvalues.

Example 5.6 Find the eigenvalues and eigenspaces of \(\bfA = \begin{bmatrix} -2 & 4 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & -2 \end{bmatrix}.\)

Solution. Because the matrix is upper triangular, we see right away that its eigenvalues are \(\lambda_1=-2\) and \(\lambda_2=1\). The eigenspace for \(\lambda_1\) is the null space of

\[ \bfA - (-2)\meye = \begin{bmatrix} 0 & 4 & 0 \\ 0 & 3 & -3 \\ 0 & 0 & 0 \end{bmatrix} \quad \overset{\text{RREF}}{\Longrightarrow} \quad \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}. \]

Since the solution of the homogeneous system with this matrix is \(x_1=t\), \(x_2=x_3=0\), a basis for this eigenspace is \([1,0,0]\). The eigenspace for \(\lambda_2\) is the null space of

\[ \bfA - (1)\meye = \begin{bmatrix} -3 & 4 & 0 \\ 0 & 0 & -3 \\ 0 & 0 & -3 \end{bmatrix} \quad \overset{\text{RREF}}{\Longrightarrow} \quad \begin{bmatrix} 1 & -\frac{4}{3} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}. \]

A basis for this eigenspace is \([\frac{4}{3},1,0]\), though a more convenient choice is \([4,3,0]\).
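
We can confirm Example 5.6 numerically: `eigvals` returns the eigenvalues (with \(-2\) listed twice, matching its double appearance on the diagonal), and `nullspace` returns an orthonormal basis for each eigenspace, proportional to the basis vectors found above.

Code
using LinearAlgebra
A = [-2 4 0; 0 1 -3; 0 0 -2]
@show eigvals(A)            # -2 (twice) and 1
@show nullspace(A + 2I)     # eigenspace for λ = -2: a multiple of [1, 0, 0]
@show nullspace(A - I)      # eigenspace for λ = 1: a multiple of [4, 3, 0]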

Note

For practical purposes, we often refer to and work with eigenvectors regardless of their nonuniqueness. It’s usually implied that we could answer a problem asking for them with any nonzero multiple of the ones we happen to give.

5.2.3 Fundamental Theorem redux

We can extend Theorem 4.8 to include statements about determinants and eigenvalues.

Theorem 5.6 (FTLA, Extended Edition) If \(\bfA\) is an \(n\times n\) matrix, then each of these statements is equivalent to all of the others.

  1. \(\bfA\) is invertible.
  2. The linear system \(\bfA\bfx=\bfb\) has the unique solution \(\bfx=\bfA^{-1}\bfb\) for each \(\bfb\).
  3. The null space of \(\bfA\) is just \(\{\bfzero\}\).
  4. The RRE form of \(\bfA\) is the identity matrix.
  5. \(\rank(\bfA)=n\).
  6. \(\det(\bfA)\neq 0\).
  7. None of the eigenvalues of \(\bfA\) is zero.

5.3 Computing eigenvalues

The most common way to find eigenvalues by hand is to use the determinant to detect when \(\bfA-\lambda\meye\) is singular, as required by Theorem 5.5. This determinant has a particular form and name.

Definition 5.5 (Characteristic polynomial of a matrix) Suppose \(\bfA\) is an \(n\times n\) matrix. The function \(p(z) = \det(\bfA-z\meye)\) is a polynomial of degree \(n\) in \(z\) called the characteristic polynomial of \(\bfA\).

Theorem 5.7 (Computing eigenvalues and eigenspaces) Given an \(n\times n\) matrix \(\bfA\), the following procedure can be used to find its eigenvalues and eigenspaces:

  1. Find the characteristic polynomial \(p\) of \(\bfA\).
  2. Let \(\lambda_1,\ldots,\lambda_k\) be the distinct roots of \(p(z)\). These are the eigenvalues. (If \(k<n\), it’s because one or more roots has multiplicity greater than 1.)
  3. For each \(\lambda_j\), find the general solution of \((\bfA-\lambda_j\meye)\bfv=\bfzero\). This is the eigenspace associated with \(\lambda_j\).

Example 5.7 Find the eigenvalues and eigenspaces of

\[ \bfA = \begin{bmatrix} 1 & 1 \\ 4 & 1 \end{bmatrix}. \]

Solution. Start by computing the characteristic polynomial:

\[ \det \left(\twomat{1}{1}{4}{1} - \twomat{\lambda}{0}{0}{\lambda} \right) = \twodet{1-\lambda}{1}{4}{1-\lambda} = (1-\lambda)^2 - 4 = \lambda^2-2\lambda-3. \]

We find eigenvalues by finding its roots, in this case \(\lambda_1=3\) and \(\lambda_2=-1\).

For \(\lambda_1=3\),

\[ \bfA-3 \meye = \twomat{-2}{1}{4}{-2} \quad \overset{\text{RREF}}{\Longrightarrow} \quad \twomat{1}{-1/2}{0}{0}. \]

The homogeneous solution can be expressed as \(x_1=s/2\), \(x_2=s\), or \(\bfx=s\cdot[1/2;\,1]\). So \([1/2;\,1]\) is a basis for this eigenspace. Since eigenvectors can be rescaled at will, we prefer to use \(\twovec{1}{2}\) as the basis vector.

For \(\lambda_2=-1\),

\[ \bfA+ \meye = \twomat{2}{1}{4}{2} \quad \overset{\text{RREF}}{\Longrightarrow} \quad \twomat{1}{1/2}{0}{0}, \]

leading to the eigenspace basis \([-1/2;\,1]\) or equivalently, \(\twovec{-1}{2}\).
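
The `eigen` function carries out this computation numerically. Its eigenvector columns are scaled to unit length, so below we rescale each one to have first component \(1\) for easier comparison with the bases found above; any nonzero multiple is an equally valid answer.

Code
using LinearAlgebra
A = [1 1; 4 1]
λ, V = eigen(A)
@show λ                    # -1 and 3
@show V[:, 1] / V[1, 1]    # [1, -2], a multiple of the basis vector [-1, 2]
@show V[:, 2] / V[1, 2]    # [1, 2], matching the basis vector for λ = 3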

Because eigenvalues are the roots of polynomials, we make an important observation:

Important

Real matrices may have complex eigenvalues occurring in conjugate pairs.

We catch a little break from the following fact.

Theorem 5.8 If \(\bfv\) is an eigenvector for a complex eigenvalue \(\lambda\) of a real matrix, then its conjugate \(\overline{\bfv}\) is an eigenvector for \(\overline{\lambda}\).

Example 5.8 Find eigenvalues and eigenspaces of \(\bfA = \begin{bmatrix} 0 & 0 & 6 \\ 0 & 0 & -3 \\ -3 & -3 & 0 \end{bmatrix}.\)

Solution. \[ \begin{split} \det(\bfA - z\meye) & = \begin{vmatrix} -z & 0 & 6 \\ 0 & -z & -3 \\ -3 & -3 & -z \end{vmatrix} \\ & = -z(z^2-9) + 6(-3z) \\ & = -z(z^2 - 9 + 18) = -z(z^2+9) = -z(z+3i)(z-3i), \end{split} \]

which gives us \(\lambda_1=0\), \(\lambda_2=3i\), \(\lambda_3=-3i\).

The eigenspace for \(\lambda_1\) is the null space of

\[ \bfA - (0)\meye = \begin{bmatrix} 0 & 0 & 6 \\ 0 & 0 & -3 \\ -3 & -3 & 0 \end{bmatrix} \quad \overset{\text{RREF}}{\Longrightarrow} \quad \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}. \]

The only free column is the second, and a basis for this eigenspace is \([-1,1,0]\).

The eigenspace for \(\lambda_2\) is the null space of

\[ \bfA - (3i)\meye = \begin{bmatrix} -3i & 0 & 6 \\ 0 & -3i & -3 \\ -3 & -3 & -3i \end{bmatrix} \quad \overset{\text{RREF}}{\Longrightarrow} \quad \begin{bmatrix} 1 & 0 & 2i \\ 0 & 1 & -i \\ 0 & 0 & 0 \end{bmatrix}. \]

The third column is free, and a basis for this eigenspace is \([-2i,i,1]\).

Because \(\lambda_3\) is the conjugate of \(\lambda_2\), its eigenspace has basis \([2i,-i,1]\).
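
As a numerical check of Example 5.8, `eigen` handles the complex eigenvalues automatically. Rescaling the eigenvector reported for \(3i\) so that its last component is \(1\) recovers the basis \([-2i,\,i,\,1]\), up to roundoff.

Code
using LinearAlgebra
A = [0 0 6; 0 0 -3; -3 -3 0]
λ, V = eigen(A)
@show λ              # ≈ -3im, 0, 3im
v = V[:, end]        # eigenvector reported for λ = 3im
@show v / v[end]     # ≈ [-2im, im, 1]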

5.3.1 Eigenstuff for \(2\times 2\)

Finding the exact roots of a cubic polynomial is rarely easy. Thus most of our hand computations will be for \(2\times 2\) matrices. This allows some shortcuts.

First of all, the characteristic polynomial takes a memorable form.

Theorem 5.9 The characteristic polynomial of \(\twomat{a}{b}{c}{d}\) is

\[ z^2 - (a+d)z + (ad-bc). \tag{5.3}\]

Definition 5.6 The trace of a matrix is the sum of its diagonal elements. It is denoted \(\operatorname{tr}(\bfA)\).

Thus, we can write the characteristic polynomial of a \(2\times 2\) matrix as

\[ z^2 - \operatorname{tr}(\bfA)z + \det(\bfA). \]

Now suppose \(\lambda\) is known to be an eigenvalue of \(\bfA\). Let

\[ \bfB = \bfA - \lambda\meye = \twomat{\alpha}{\beta}{\gamma}{\delta}. \]

This has to be a singular matrix, and since it’s square, that means the RRE form of the matrix must have a row of zeros. So we must be able to do row operations such that

\[ \bfB \implies \twomat{\alpha}{\beta}{0}{0}. \]

It’s clear that

\[ \twomat{\alpha}{\beta}{0}{0} \twovec{\beta}{-\alpha} = \twovec{0}{0}, \]

so provided that \(\alpha\) and \(\beta\) are not both zero, we have found an eigenvector of \(\bfA\). Taking care of the exception, we can deduce the following.

Theorem 5.10 (Eigenvectors for \(2\times 2\)) Given an eigenvalue \(\lambda\) of \(2\times 2\) matrix \(\bfA\), let \(\bfB = \bfA-\lambda\meye\).

  1. If \(\bfB\) is the zero matrix, then all of \(\complex^2\) is the eigenspace of \(\lambda\). Otherwise, continue:
  2. If the first row of \(\bfB\) is zero, swap the rows of \(\bfB\).
  3. Let the first row of \(\bfB\) be written as the vector \([\alpha,\beta]\). The vector \([\beta,\: -\alpha]\) is a basis of the eigenspace of \(\lambda\).

Example 5.9 Find the eigenstuff of

\[ \bfA = \twomat{1}{1}{-1}{1}. \]

Solution. We start by finding eigenvalues. The characteristic polynomial is

\[ z^2 - (1+1)z + (1+1) = z^2 - 2z + 2. \]

The eigenvalues are therefore

\[ \lambda = 1 \pm \sqrt{1-2} = 1 \pm i. \]

The eigenspace for \(\lambda_1=1+i\) is the null space of

\[ \bfA - (1+i) \meye = \twomat{-i}{1}{-1}{-i}. \]

To find a basis we just use the first row as explained in Theorem 5.10, getting \(\twovec{1}{i}\).

Since the matrix is real, we can apply Theorem 5.8. A basis for the other eigenspace can be found by conjugating this one, to get \(\twovec{1}{-i}\).
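
The recipe of Theorem 5.10 is short enough to express as a function. This is only a sketch; `eig2x2vec` is a name made up here, not a library routine. Applying it to the matrix and eigenvalue of Example 5.9 reproduces the eigenvector \([1, i]\).

Code
using LinearAlgebra
# Sketch of Theorem 5.10 for a 2x2 matrix A and a known eigenvalue λ.
function eig2x2vec(A, λ)
    B = A - λ*I
    iszero(B) && return [1 0; 0 1]               # case 1: columns form a basis of the whole plane
    α, β = iszero(B[1, :]) ? B[2, :] : B[1, :]   # case 2: use a nonzero row of B
    return [β, -α]                               # case 3: basis vector for the eigenspace
end

eig2x2vec([1 1; -1 1], 1 + im)    # returns [1, im], as in Example 5.9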

5.4 Similarity

Here is a vague English word that has a specific meaning in linear algebra.

Definition 5.7 (Similar matrices) Two \(n\times n\) matrices \(\bfA\) and \(\bfB\) are similar if there exists an invertible matrix \(\bfP\) such that

\[ \bfB=\bfP^{-1}\bfA\bfP. \tag{5.4}\]

Similarity is an odd-looking relationship. If we were dealing with scalars, then similarity would just be equality. In the matrix case, Equation 5.4 has an interesting relationship to eigenvalues. First, if \(z\) is any complex scalar, then

\[ \bfB - z\meye = \bfP^{-1}\bfA\bfP - z\bfP^{-1}\bfP = \bfP^{-1}(\bfA - z\meye)\bfP. \]

Upon taking determinants and applying Theorem 5.2, we get

\[ \det(\bfB - z\meye) = \det(\bfA - z\meye). \]

Hence:

Theorem 5.11 Similar matrices have the same characteristic polynomials, and therefore the same eigenvalues.

Example 5.10 Explain why the matrix \(\bfA = \twomat{1}{1}{0}{0}\) cannot be similar to the identity matrix.

Solution. If \(\bfA\) and \(\meye\) were similar, then they would have the same eigenvalues. But \(\bfA\), which is triangular, has eigenvalues \(\lambda_1=1\) and \(\lambda_2=0\), while \(\meye\) has only \(\lambda=1\).
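
Theorem 5.11 is also easy to verify numerically. In this sketch, \(\bfP\) is just an arbitrarily chosen invertible matrix.

Code
using LinearAlgebra
A = [1 1; 4 1]
P = [2 1; 1 1]                          # an arbitrary invertible matrix
B = inv(P) * A * P                      # B is similar to A
sort(eigvals(B)) ≈ sort(eigvals(A))     # true: the eigenvalues agree up to roundoff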

5.4.1 Matrix powers

One use of similarity is to compute powers of matrices. Suppose \(\bfB=\bfP^{-1}\bfA\bfP\). Then

\[ \begin{split} \bfB^2 &= \bigl(\bfP^{-1}\bfA\bfP\bigr) \bigl(\bfP^{-1}\bfA\bfP\bigr) \\ &= \bfP^{-1}\bfA\bigl( \bfP \bfP^{-1}\bigr) \bfA\bfP \\ &= \bfP^{-1}\bfA \bfA\bfP \\ & = \bfP^{-1}\bfA^2\bfP. \end{split} \]

A similar argument shows that for any positive integer \(k\),

\[ \bfB^k = \bfP^{-1}\bfA^k\bfP. \tag{5.5}\]

Equation 5.5 becomes useful when powers of \(\bfA\) are easy to compute. The easiest case is when \(\bfA\) is diagonal, because

\[ \begin{bmatrix} D_{11} & & & \\ & D_{22} & & \\ & & \ddots & \\ & & & D_{nn} \end{bmatrix}^k = \begin{bmatrix} D_{11}^k & & & \\ & D_{22}^k & & \\ & & \ddots & \\ & & & D_{nn}^k \end{bmatrix}. \tag{5.6}\]

Example 5.11 Suppose that \[ \bfA = \twomat{1}{3}{2}{0} \; \twomat{1}{0}{0}{-1} \; \twomat{1}{3}{2}{0}^{-1}. \] Find \(\bfA^{100}\).

Solution. Applying Equation 5.5 and Equation 5.6, we get

\[ \begin{split} \bfA^{100} &= \twomat{1}{3}{2}{0} \; \twomat{1}{0}{0}{-1}^{100} \; \twomat{1}{3}{2}{0}^{-1} \\[1ex] &= \twomat{1}{3}{2}{0} \; \twomat{1^{100}}{0}{0}{(-1)^{100}} \; \twomat{1}{3}{2}{0}^{-1} \\[1ex] &= \twomat{1}{3}{2}{0} \; \twomat{1}{0}{0}{1} \; \twomat{1}{3}{2}{0}^{-1} \\[1ex] &= \twomat{1}{3}{2}{0} \; \twomat{1}{3}{2}{0}^{-1} \\[1em] &= \meye. \end{split} \]
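
As a numerical check of Example 5.11 (the matrices below come straight from its statement), we can build \(\bfA\) and raise it to the 100th power directly.

Code
using LinearAlgebra
V = [1 3; 2 0];  D = [1 0; 0 -1]
A = V * D * inv(V)
A^100      # ≈ the identity matrix, up to roundoff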

You may be wondering: Example 5.11 is great and all, but how likely is it that we will find that some matrix is similar to a diagonal matrix?

Funny you should ask.

5.5 Diagonalization

Suppose \(\bfA\) is \(n\times n\) and that it has eigenvalues \(\lambda_1,\ldots,\lambda_n\) associated with linearly independent eigenvectors \(\bfv_1,\ldots,\bfv_n\). Writing out the equations \(\bfA\bfv_j = \lambda_j \bfv_j\) in columns, we find

\[ \begin{split} \begin{bmatrix} \bfA \bfv_1 & \bfA \bfv_2 & \cdots & \bfA \bfv_n \end{bmatrix} &= \begin{bmatrix} \lambda_1 \bfv_1 & \lambda_2 \bfv_2 & \cdots & \lambda_n \bfv_n \end{bmatrix} \\[1em] \bfA \begin{bmatrix} \bfv_1 & \bfv_2 & \cdots & \bfv_n \end{bmatrix} &= \begin{bmatrix} \bfv_1 & \bfv_2 & \cdots & \bfv_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \\[1em] \bfA \bfV &= \bfV \mathbf{D}, \end{split} \]

where we defined

\[ \bfV = \begin{bmatrix} \bfv_1 & \bfv_2 & \cdots & \bfv_n \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}. \tag{5.7}\]

Since we assumed that the columns of \(\bfV\) are linearly independent vectors, the column space of \(\bfV\) is \(n\)-dimensional. Hence \(\rank(\bfV)=n\) and \(\bfV\) is invertible, and we can conclude that \(\bfA\) and \(\mathbf{D}\) are similar.

Definition 5.8 (Diagonalization) A diagonalization of square matrix \(\bfA\) is the similarity factorization

\[ \bfA = \bfV \mathbf{D} \bfV^{-1}, \tag{5.8}\]

where \(\mathbf{D}\) is a diagonal matrix of eigenvalues and \(\bfV\) is a square matrix whose columns are corresponding eigenvectors, as defined in Equation 5.7. If such a factorization exists, we say that \(\bfA\) is diagonalizable.

Caution

Diagonalizable and invertible are independent properties. Diagonalization refers to the invertibility of the eigenvector matrix, not the original one.

Example 5.12 Find a diagonalization of \(\bfA = \twomat{1}{1}{0}{0}\).

Solution. Since \(\bfA\) is triangular, we see immediately that its eigenvalues are \(\lambda_1=1\) and \(\lambda_2=0\). The eigenspace for \(\lambda_1\) is the null space of

\[ \bfA - (1)\meye = \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}, \]

and the eigenspace is spanned by \([1,0]\). The eigenspace for \(\lambda_2\) is the null space of

\[ \bfA - (0)\meye = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \]

and the eigenspace is spanned by \([-1,1]\). Thus we propose the matrices

\[ \mathbf{D} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \bfV = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}. \]

If \(\bfV\) is invertible, then \(\bfA\) is diagonalizable. The determinant of \(\bfV\) is \(1\), so it is invertible. We note that

\[ \bfV^{-1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \]

which finally implies

\[ \bfA = \bfV \mathbf{D} \bfV^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \]
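
We can confirm Example 5.12 in code, and also compare with `eigen`, which builds its own (differently scaled) eigenvector matrix.

Code
using LinearAlgebra
A = [1 1; 0 0]
V = [1 -1; 0 1];  D = [1 0; 0 0]
@show V * D * inv(V) ≈ A           # the diagonalization found above checks out
λ, W = eigen(A)                    # eigen's eigenvector columns have unit length
@show A * W ≈ W * Diagonal(λ)      # same relation A*V = V*D, different column scaling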

Example 5.13 Suppose \(\bfA\) has the diagonalization

\[ \bfA = \begin{bmatrix} 1 & 0 & 1 \\ -1 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \, \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix} \, \begin{bmatrix} 1 & 0 & -\tfrac{1}{2} \\ -1 & -1 & \tfrac{1}{2} \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix}. \]

Find the eigenvalues and eigenvectors of \(\bfA\).

Solution. Fight the temptation to multiply out the product to find \(\bfA\)! All the information needed is in the first two matrices, which by Equation 5.8 give the eigenvectors and their corresponding eigenvalues:

\[ \lambda_1 = -2, \quad \bfv_1 = \threevec{1}{-1}{0}, \qquad \lambda_2 = 1, \quad \bfv_2 = \threevec{0}{-1}{0}, \qquad \lambda_3 = 5, \quad \bfv_3 = \threevec{1}{0}{2}. \]

5.5.1 Multiplicity

The eigenvalues of a matrix are the roots of its characteristic polynomial. In the general \(n\times n\) case, it factors as

\[ p(z) = (-1)^n (z-\lambda_1)^{m_1}(z-\lambda_2)^{m_2}\cdots(z-\lambda_k)^{m_k}, \tag{5.9}\]

for positive integer multiplicities such that \(m_1+\cdots+m_k=n\).

Definition 5.9 (Algebraic multiplicity) The algebraic multiplicity (AM) of an eigenvalue is its multiplicity as a root of the characteristic polynomial. An eigenvalue with AM equal to 1 is called simple.

The easiest situation is when all of the eigenvalues are simple:

Theorem 5.12 (Distinct eigenvalues) If all the eigenvalues of \(\bfA\) are simple, i.e., distinct, then \(\bfA\) is diagonalizable.

If the eigenvalues are not all simple, then the matrix may or may not be diagonalizable, and we have to go into greater detail. The following example illustrates a peculiar possibility.

Example 5.14 Find the eigenspaces of \(\bfA=\twomat{4}{1}{0}{4}\).

Solution. The characteristic polynomial is

\[ \twodet{4-\lambda}{1}{0}{4-\lambda} = (4-\lambda)^2, \]

so the double root \(\lambda_1=4\) is the only eigenvalue. Since

\[ \bfA - 4\meye = \twomat{0}{1}{0}{0}, \]

the eigenspace has basis \(\twovec{1}{0}\). Up to nonzero scalar multiples, there is only one eigenvector.
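
Numerically, this situation shows up as an eigenvector matrix with dependent columns. Here is a sketch for the matrix of Example 5.14.

Code
using LinearAlgebra
A = [4 1; 0 4]
λ, V = eigen(A)
@show λ                    # the eigenvalue 4 is listed twice
@show V                    # both columns are (numerically) parallel to [1, 0]
@show nullspace(A - 4I)    # a basis for the one-dimensional eigenspace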

Example 5.14 leads us to define a second notion of multiplicity.

Definition 5.10 (Geometric multiplicity) The geometric multiplicity (GM) of an eigenvalue is the dimension of its associated eigenspace.

Here is an important fact we won’t try to justify.

Theorem 5.13 The multiplicities of any eigenvalue satisfy

\[ 1 \le \text{GM} \le \text{AM}. \]

5.5.2 Defectiveness

In Example 5.14 we found an eigenvalue \(\lambda_1=4\) of algebraic multiplicity 2 whose geometric multiplicity is 1.

Definition 5.11 (Defectiveness) An eigenvalue \(\lambda\) whose geometric multiplicity is strictly less than its algebraic multiplicity is said to be defective. A matrix is called defective if any of its eigenvalues are defective.

And now we can tie this discussion neatly back to the diagonalization of a matrix:

Theorem 5.14 A matrix has a diagonalization if and only if it is not defective.

5.5.3 2-by-2 case

A \(2\times 2\) matrix has two eigenvalues, counting multiplicities. If they are distinct, then Theorem 5.12 applies and the matrix is diagonalizable. Otherwise, there is a single eigenvalue of AM 2.

Example 5.14 showed a defective case. Here is a nondefective case.

Example 5.15 The \(2\times 2\) identity matrix \(\meye\) has a lone eigenvalue \(\lambda_1=1\) of algebraic multiplicity 2. The system \((\meye - \meye)\bfv=\bfzero\) has an RRE form that is the zero matrix, so there are two free variables and two basis vectors. Hence the geometric multiplicity of \(\lambda_1\) is also 2.

In fact, the difference between the cases is easy to spot in \(2\times 2\) matrices.

Theorem 5.15 (\(2\times 2\) defectiveness) Any \(\bfA\in\cmn{2}{2}\) that has a single repeated eigenvalue is either defective or a multiple of the identity matrix.