4  Matrix algebra

We now dive deeply into the world of vectors and matrices. There are a ton of new facts in this chapter, presented in the mathematical form of definitions and theorems so that they are stated precisely. But the terminology overlaps tremendously, and there are actually relatively few unique ideas.

4.1 Elementwise operations

In this game, we often refer to mere numbers as scalars. That’s because they just scale every element, like in

\[ c \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} = \begin{bmatrix} cA_{11} & \cdots & cA_{1n} \\ \vdots & & \vdots \\ cA_{m1} & \cdots & cA_{mn} \end{bmatrix}. \]

It’s easy to add or subtract two vectors or two matrices that have the same size. Just act elementwise:

\[ \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} + \begin{bmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & & \vdots \\ B_{m1} & \cdots & B_{mn} \end{bmatrix} = \begin{bmatrix} A_{11}+B_{11} & \cdots & A_{1n}+B_{1n} \\ \vdots & & \vdots \\ A_{m1}+B_{m1} & \cdots & A_{mn}+B_{mn} \end{bmatrix} \]

We consider the operation of adding matrices of different sizes to be undefined.

Note

Mathematically, we leave the operation of adding a scalar to a vector or matrix undefined as well, although MATLAB and NumPy will happily do that for you.
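For instance, here is a minimal NumPy sketch (the array values are arbitrary) of that behavior:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

# NumPy broadcasts the scalar across every entry, even though we
# leave "matrix plus scalar" undefined mathematically.
print(A + 10)
# [[11. 12.]
#  [13. 14.]]
```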

You would probably expect that we define matrix multiplication similarly:

\[ \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} \cdot \begin{bmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & & \vdots \\ B_{m1} & \cdots & B_{mn} \end{bmatrix} \stackrel{??}{=} \begin{bmatrix} A_{11}B_{11} & \cdots & A_{1n}B_{1n} \\ \vdots & & \vdots \\ A_{m1}B_{m1} & \cdots & A_{mn}B_{mn} \end{bmatrix} \]

But we don’t! OK, technically this is called a Hadamard product, and it has some uses. But 99.9999% of the time a different, less obvious way of multiplying matrices does a better job of respecting critical mathematical structure.
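In software, the distinction shows up as two different operators. In NumPy, for instance, `*` is the elementwise (Hadamard) product, while `@` is the matrix product defined later in this chapter; a minimal sketch with arbitrary entries:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)   # elementwise (Hadamard) product: [[ 5 12], [21 32]]
print(A @ B)   # matrix product (defined below):  [[19 22], [43 50]]
```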

4.2 Matrix times vector

The idea of linear combinations, as defined in Definition 3.11, serves as the foundation of multiplication between a matrix and a vector.

Definition 4.1 (Matrix times vector) Given \(\bfA\in\cmn{m}{n}\) and \(\bfx\in\complex^{n}\), the product \(\bfA\bfx\) is defined as

\[ \bfA\bfx = x_1 \bfa_1 + x_2 \bfa_2 + \cdots + x_n \bfa_n = \sum_{j=1}^n x_j \bfa_j, \tag{4.1}\]

where \(\bfa_j\) refers to the \(j\)th column of \(\bfA\).

In order for \(\bfA\bfx\) to be defined, the number of columns in \(\bfA\) has to be the same as the number of elements in \(\bfx\).

Note that when \(\bfA\) is \(m\times n\), then \(\bfx\) must be in \(\real^n\) or \(\complex^n\), and \(\bfA\bfx\) has dimension \(m\).

Example 4.1 Calculate the product

\[ \begin{bmatrix} 1 & -1 & -1 \\ 3 & -2 & 0 \\ 1 & -2 & -1 \end{bmatrix} \threevec{-1}{2}{-1}. \]

Solution. The product is equivalent to

\[ (-1) \threevec{1}{3}{1} + (2) \threevec{-1}{-2}{-2} + (-1) \threevec{-1}{0}{-1} = \threevec{-2}{-7}{-4}. \]

We don’t often write out the product in this much detail. Instead we “zip together” the rows of the matrix with the entries of the vector:

\[ \threevec{(-1)(1)+(2)(-1)+(-1)(-1)}{(-1)(3)+(2)(-2)+(-1)(0)}{(-1)(1)+(2)(-2)+(-1)(-1)} = \threevec{-2}{-7}{-4}. \]

You might recognize the “zip” expressions in this vector as dot products from vector calculus.
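Here is a quick NumPy cross-check of Example 4.1 (a sketch; `@` performs the matrix-vector product defined above):

```python
import numpy as np

A = np.array([[1, -1, -1],
              [3, -2,  0],
              [1, -2, -1]])
x = np.array([-1, 2, -1])

print(A @ x)   # [-2 -7 -4], matching the hand computation
```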

Note

We can regard a vector \(\bfx \in \real^n\) as also being a matrix, in two ways: as a member of \(\rmn{1}{n}\), making it a row vector, or as a member of \(\rmn{n}{1}\), making it a column vector. Our convention is that when we want to interpret a named vector as a matrix, it’s a column vector.

However, Python assumes a row vector, MATLAB lets you choose either, and Julia considers it a column vector. It’s a mess that can lead to frustrating errors in computer codes.
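Here is a small sketch of the NumPy side of that mess: a plain 1D array carries no row/column orientation at all, and you must reshape it to force one. (MATLAB and Julia behave differently, as noted above.)

```python
import numpy as np

x = np.array([1, 2, 3])   # 1D array: shape (3,), no explicit orientation
col = x.reshape(3, 1)     # an explicit column vector, shape (3, 1)
row = x.reshape(1, 3)     # an explicit row vector, shape (1, 3)

print(x.shape, col.shape, row.shape)   # (3,) (3, 1) (1, 3)
```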

4.2.1 Properties

What justifies calling this operation multiplication? In large part, it’s the natural distributive properties

\[ \begin{split} \bfA(\bfx+\bfy) & = \bfA\bfx + \bfA\bfy,\\ (\bfA+\bfB)\bfx & = \bfA\bfx + \bfB\bfx, \end{split} \]

along with

\[ \bfA(c\bfx)=c(\bfA\bfx) \]

for any scalar \(c\).

But there is a major departure from multiplication as we usually know it.

Warning

Matrix-vector products are not commutative. In fact, \(\bfx\bfA\) is not defined even when \(\bfA\bfx\) is.

This is the first time we see something that will happen more generally:

Important

The order of the terms in a product of non-scalars cannot be changed without some explicit justification.

4.2.2 Connection to linear systems

The following observation finally brings us back around to the introduction of linear systems through the insultingly simple scalar equation \(ax=b\).

Theorem 4.1 The linear system with coefficient matrix \(\bfA\), forcing vector \(\bfb\), and solution \(\bfx\) is equivalent to the equation \(\bfA\bfx=\bfb\).

The following result follows quickly from our definitions.

Theorem 4.2 The linear system \(\bfA\bfx=\bfb\) is consistent if and only if \(\bfb\) is in the span of the columns of \(\bfA\).

4.3 Matrix times matrix

We can think of vectors as a special kind of matrix, and accordingly we can generalize matrix-vector products to matrix-matrix products. There are many equivalent ways to define these products. Here is the one we start with.

Definition 4.2 (Matrix times matrix) If \(\bfA\) is \(m\times n\) and \(\bfB\) is \(n\times p\), then the product \(\bfA\bfB\) is defined as

\[ \bfA\mathbf{B} = \bfA \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_p \end{bmatrix} = \begin{bmatrix} \bfA\mathbf{b}_1 & \bfA\mathbf{b}_2 & \cdots & \bfA\mathbf{b}_p \end{bmatrix}. \tag{4.2}\]

In words, a matrix-matrix product is the horizontal concatenation of matrix-vector products involving the columns of the right-hand matrix.

Important

In order to define \(\bfA\bfB\), we require that the number of columns in \(\bfA\) is the same as the number of rows in \(\bfB\). That is, the inner dimensions must agree. The result has size determined by the outer dimensions of the original matrices.

When we compute a matrix product by hand, we usually don’t write out the above. Instead we use a more compact definition for the individual entries of \(\mathbf{C} = \bfA\bfB\),

\[ C_{ij} = \sum_{k=1}^n A_{ik}B_{kj}, \qquad i=1,\ldots,m, \quad j=1,\ldots,p. \tag{4.3}\]

In words:

Important

The \((i,j)\) entry of the matrix product \(\bfA \bfB\) is the dot product of row \(i\) of \(\bfA\) with column \(j\) of \(\bfB\).

Example 4.2 Find \(\mathbf{A}\mathbf{B}\) if

\[ \bfA = \begin{bmatrix} 1 & -1 \\ 0 & 2 \\ -3 & 1 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 2 & -1 & 0 & 4 \\ 1 & 1 & 3 & 2 \end{bmatrix}. \]

Solution. Using Equation 4.3,

\[ \begin{split} \bfA\mathbf{B} &= \begin{bmatrix} (1)(2) + (-1)(1) & (1)(-1) + (-1)(1) & (1)(0) + (-1)(3) & (1)(4) + (-1)(2) \\ (0)(2) + (2)(1) & (0)(-1) + (2)(1) & (0)(0) + (2)(3) & (0)(4) + (2)(2) \\ (-3)(2) + (1)(1) & (-3)(-1) + (1)(1) & (-3)(0) + (1)(3) & (-3)(4) + (1)(2) \end{bmatrix} \\ & = \begin{bmatrix} 1 & -2 & -3 & 2 \\ 2 & 2 & 6 & 4 \\ -5 & 4 & 3 & -10 \end{bmatrix} \end{split}. \]

Observe that

\[ \bfA \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 0 \\ -3 \end{bmatrix} + 1 \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix}, \]

and so on.
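For a computational cross-check of Example 4.2, here is a NumPy sketch:

```python
import numpy as np

A = np.array([[ 1, -1],
              [ 0,  2],
              [-3,  1]])
B = np.array([[2, -1, 0, 4],
              [1,  1, 3, 2]])

C = A @ B
print(C.shape)   # (3, 4): the outer dimensions of a (3x2) times a (2x4)
print(C)
# [[  1  -2  -3   2]
#  [  2   2   6   4]
#  [ -5   4   3 -10]]
```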

4.3.1 Counterintuitive observations

Even though matrix multiplication has a lot of similarity to scalar multiplication, there are some fundamental differences.

Example 4.3 Suppose

\[ \bfA = \begin{bmatrix} 1 & 2 & -2 \\ 3 & 0 & 0 \end{bmatrix}, \qquad \bfB = \begin{bmatrix} 0 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 2 & 3\end{bmatrix}. \]

Then

\[ \bfA \bfB = \begin{bmatrix} 0 + 2 - 2 & -1 + 2 - 4 & -1 + 0 - 6 \\ 0 + 0 + 0 & -3 + 0 + 0 & -3+0+0 \end{bmatrix} = \begin{bmatrix} 0 & -3 & -7 \\ 0 & -3 & -3 \end{bmatrix}. \]

But \(\bfB\bfA\) is not defined, because \(\bfB\) is \(3\times 3\) and \(\bfA\) is \(2\times 3\).

Example 4.4 Suppose

\[ \bfA = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}, \qquad \bfB = \begin{bmatrix} 0 & -2 \\ 2 & -1 \end{bmatrix}. \]

Then

\[ \bfA\bfB = \begin{bmatrix} 0-4 & -4+2 \\ 0+2 & -2-1 \end{bmatrix} = \begin{bmatrix} -4 & -2 \\ 2 & -3 \end{bmatrix}, \]

and

\[ \bfB\bfA = \begin{bmatrix} 0-2 & 0 -2 \\ 4-1 & -4-1\end{bmatrix} = \begin{bmatrix} -2 & -2 \\ 3 & -5 \end{bmatrix}. \]

Warning

Matrix multiplication is not commutative. If \(\bfA\bfB\) is defined, then \(\bfB\bfA\) may not be, and even if it is, it may not equal \(\bfA\bfB\).

In other words, you cannot change the order of the terms in a matrix product without some explicit justification.
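A quick NumPy check of Example 4.4 makes the point concrete:

```python
import numpy as np

A = np.array([[2, -2],
              [1,  1]])
B = np.array([[0, -2],
              [2, -1]])

print(A @ B)   # [[-4 -2], [ 2 -3]]
print(B @ A)   # [[-2 -2], [ 3 -5]]  -- a different matrix
```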

A zero matrix is a matrix of all zeros. When the sizes are compatible, the product of any matrix with a zero matrix is another zero matrix. But here is another critical difference from the scalar case.

Example 4.5 Let \(\bfA = \twomat{0}{0}{1}{0}\). Note that

\[ \bfA \bfA = \twomat{0}{0}{1}{0} \cdot \twomat{0}{0}{1}{0} = \twomat{0}{0}{0}{0}. \]

As a result, we cannot make the implication \[ \bfA\bfB = \bfzero \implies \bfA = \bfzero \text{ or } \bfB=\bfzero \qquad \text{(FALSE!)} \]

Warning

It is possible for the product of two nonzero matrices to be zero.
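The matrix of Example 4.5, checked numerically (a NumPy sketch):

```python
import numpy as np

A = np.array([[0, 0],
              [1, 0]])

print(A @ A)   # [[0 0], [0 0]]: a zero product from a nonzero matrix
```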

If you think about it, Example 4.5 also means that we lose another routine property of scalar multiplication; taking \(\bfB=\bfA\) and \(\mathbf{C}=\bfzero\) there gives \(\bfA\bfB = \bfA\mathbf{C}\) with \(\bfB \neq \mathbf{C}\):

\[ \bfA \bfB = \bfA \mathbf{C} \implies \bfB = \mathbf{C} \qquad \text{(FALSE!)} \]

Warning

We cannot always cancel out a matrix appearing in a product on both sides of an equation.

4.3.2 Properties

Fortunately, other familiar and handy properties of scalar multiplication do come along for the ride:

  1. \((\bfA\bfB)\mathbf{C}=\bfA(\bfB \mathbf{C})\qquad\) (association)
  2. \(\bfA(\bfB+\mathbf{C}) = \bfA\bfB + \bfA\mathbf{C}\qquad\) (right distribution)
  3. \((\bfA+\bfB)\mathbf{C} = \bfA\mathbf{C} + \bfB\mathbf{C}\qquad\) (left distribution)

These properties are easy to check computationally. (But keep in mind that a numerical demonstration, or an algebraic one at particular sizes, is not a general proof.) In addition, matrix multiplication plays well with numbers:

  1. \((c\bfA) \bfB = c (\bfA \bfB) = \bfA (c \bfB)\)
  2. \(c(\bfA + \bfB) = (c\bfA) + (c\bfB)\)
  3. \((c+d) \bfA = (c\bfA) + (d\bfA)\)
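Here is the kind of numerical spot-check referred to above, sketched with random matrices; remember that passing such a check is evidence, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))
B = rng.random((4, 5))
B2 = rng.random((4, 5))
C = rng.random((5, 2))
c = 2.5

print(np.allclose((A @ B) @ C, A @ (B @ C)))       # association
print(np.allclose(A @ (B + B2), A @ B + A @ B2))   # right distribution
print(np.allclose((c * A) @ B, c * (A @ B)))       # scalars move freely
print(np.allclose((c * A) @ B, A @ (c * B)))
```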

Finally, we observe that if \(\bfA\) is \(m\times n\) and \(\bfx\) is an \(n\)-vector, then \(\bfA\bfx\) gives the same result whether we interpret \(\bfx\) as a vector or as an \(n\times 1\) matrix.

4.3.3 Matrix powers

When \(\bfA\) is square, meaning \(n \times n\), we can define positive integer powers of \(\bfA\) in the usual way:

\[ \bfA^2 = \bfA\cdot \bfA, \quad \bfA^3 = \bfA\cdot \bfA\cdot \bfA = \bfA^2\cdot \bfA = \bfA\cdot \bfA^2, \]

and so on. We have to wait for the next section to define \(\bfA^0\) and negative powers. (The question of noninteger powers of a matrix is a topic for a graduate-level math course.)
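Computationally, integer matrix powers are available in NumPy as `np.linalg.matrix_power`; note that this is repeated matrix multiplication, not the elementwise power `A**3`. A sketch with an arbitrary matrix:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])

print(np.linalg.matrix_power(A, 3))   # A @ A @ A = [[1 3], [0 1]]
print(A**3)                           # elementwise cubes [[1 1], [0 1]] -- not the same
```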

4.4 Identity and inverse

You solve \(ax=b\) for nonzero \(a\) without thinking about it: \(x=b/a\). If we do break it down a little, we can see that when we multiply both sides of \(ax=b\) by the number \(1/a\), then on the left the terms \(1/a\) and \(a\) combine to give \(1\), and \(1x=x\). So the key to the solution is the presence of a multiplicative identity value \(1\), and the existence of the multiplicative inverse \(1/a\) when \(a\neq 0\). These same two ingredients are the key to understanding the matrix-vector equation \(\bfA\bfx=\bfb\).

4.4.1 Identity matrix

Suppose we are given an \(m\times n\) matrix \(\bfA\). Writing its columns as the vectors \(\bfa_1,\ldots,\bfa_n\), we can make the rather obvious observations

\[ \begin{split} \bfa_1 &= 1\cdot \bfa_1 + 0 \cdot \bfa_2 + \cdots + 0\cdot \bfa_n,\\ \bfa_2 &= 0\cdot \bfa_1 + 1 \cdot \bfa_2 + \cdots + 0\cdot \bfa_n,\\ &\; \vdots \\ \bfa_n &= 0\cdot \bfa_1 + 0 \cdot \bfa_2 + \cdots + 1\cdot \bfa_n. \end{split} \]

The purpose in using these expressions is to interpret them as linear combinations, and thus as matrix-vector products. Let’s define \(\bfe_j\) for \(j=1,\ldots,n\) as follows.

Definition 4.3 (Standard vectors) \[ \text{$i$th component of }\bfe_j = \begin{cases} 1, & i=j, \\ 0, & i\neq j. \end{cases} \]

Now we can write

\[ \bfa_j = \bfA \bfe_j, \quad j=1,\ldots,n. \tag{4.4}\]

Furthermore, we can use the definition of matrix products as a concatenation of matrix-vector products to derive

\[ \begin{split} \bfA &= \begin{bmatrix} \bfa_1 & \bfa_2 & \cdots & \bfa_n \end{bmatrix} \\ &= \begin{bmatrix} \bfA\bfe_1 & \bfA\bfe_2 & \cdots & \bfA\bfe_n \end{bmatrix}\\ &= \bfA \begin{bmatrix} \bfe_1 & \bfe_2 & \cdots & \bfe_n \end{bmatrix}. \end{split} \]

Definition 4.4 (Identity matrix) The \(n\times n\) identity matrix is

\[ \meye = \begin{bmatrix} \bfe_1 & \bfe_2 & \cdots & \bfe_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ & & \ddots & & \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}. \]

Note

Sometimes, when we need to indicate the size of the identity, we use a subscript, as in \(\meye_4\) to represent the \(4\times 4\) case. Usually, though, it’s implied by the context.

Theorem 4.3 (Multiplicative identity) If \(\bfA\) is \(m\times n\), then \(\bfA = \meye_m \bfA = \bfA \meye_n\).

Example 4.6 Compute

\[ \begin{bmatrix} 7 & -2 & 11 \\ 1131 & \pi & -\sqrt{13} \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}. \]

Solution. You can grind through the multiplication algorithm, of course, but there is a shortcut:

\[ \begin{split} \begin{bmatrix} 7 & -2 & 11 \\ 1131 & \pi & -\sqrt{13} \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} & = \begin{bmatrix} 7 & -2 & 11 \\ 1131 & \pi & -\sqrt{13} \end{bmatrix} \left( 2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \\ & = 2 \begin{bmatrix} 7 & -2 & 11 \\ 1131 & \pi & -\sqrt{13} \end{bmatrix} \cdot \meye \\ & = \begin{bmatrix} 14 & -4 & 22 \\ 2262 & 2\pi & -2\sqrt{13} \end{bmatrix}. \end{split} \]

4.4.2 Inverse

We are now going to introduce a major simplification.

Definition 4.5 (Square matrix) A square matrix has the same number of rows as columns.

Here is what we seek from a multiplicative inverse.

Definition 4.6 (Inverse) Suppose \(\bfA\) is a square matrix. A matrix \(\mathbf{Z}\) of the same size such that \(\mathbf{Z}\bfA = \meye\) and \(\bfA\mathbf{Z}=\meye\) is called the inverse of \(\bfA\), written \(\mathbf{Z} = \bfA^{-1}\). In this case we say \(\bfA\) is invertible. A matrix that has no inverse is singular.

The definition of inverse mentions that both \(\bfA\mathbf{Z}\) and \(\mathbf{Z}\bfA\) are the identity matrix. It turns out that one of these facts always implies the other.

Theorem 4.4 If \(\bfA\) and \(\mathbf{Z}\) are square matrices, then \(\bfA\mathbf{Z} = \meye\) if and only if \(\mathbf{Z}\bfA = \meye\). In either case, \(\mathbf{Z} = \bfA^{-1}\).

Finding the inverse of a given square \(\bfA\) is usually hard. But checking whether a proposed \(\mathbf{Z}\) is the inverse of \(\bfA\) is easy: just multiply them together (in either order) and see if you get an identity matrix.

Example 4.7 The matrix \(\mathbf{R}(\theta) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}\) performs rotation in the plane around the origin by angle \(\theta\). Show that \(\mathbf{R}(-\theta)\) is the inverse of \(\mathbf{R}(\theta)\).

Solution. All we need to do is to check that the product, in either order, is the identity matrix:

\[ \begin{split} \mathbf{R}(-\theta)\mathbf{R}(\theta) &= \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \\ &= \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \\ &= \begin{bmatrix} \cos^2(\theta)+\sin^2(\theta) & -\cos(\theta)\sin(\theta) + \sin(\theta)\cos(\theta) \\ -\sin(\theta)\cos(\theta) + \cos(\theta)\sin(\theta) & \sin^2(\theta) + \cos^2(\theta) \end{bmatrix} \\ &= \twomat{1}{0}{0}{1}. \end{split} \]
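A numerical sanity check of Example 4.7 (a NumPy sketch; the angle value is arbitrary):

```python
import numpy as np

def R(theta):
    # plane rotation by angle theta, as in Example 4.7
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.7
print(np.allclose(R(-theta) @ R(theta), np.eye(2)))   # True
print(np.allclose(R(theta) @ R(-theta), np.eye(2)))   # True
```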

If \(\mathbf{S}\) is an \(n\times n\) matrix of all zeros, then \(\mathbf{S}\bfA\) and \(\bfA\mathbf{S}\) are also zero matrices whenever the sizes are compatible. Therefore, \(\mathbf{S}\) is singular—no inverse is possible, just like for the scalar zero. But unlike with scalars, there are other, nonzero singular matrices.

Important

Some nonzero matrices are singular.

Example 4.8 Let \(\bfA = \twomat{0}{0}{1}{0}\). Suppose that

\[ \meye = \twomat{a}{b}{c}{d} \bfA = \twomat{b}{0}{d}{0}. \]

This is clearly impossible for any choices of \(a,b,c,d\). Hence \(\bfA\) is singular.

4.4.3 Properties

Here are some facts about inverses that we will use without justification.

Theorem 4.5 If \(\bfA\) is an invertible square matrix, then:

  1. \(\bfA^{-1}\) is unique.
  2. \((\bfA^{-1})^{-1} = \bfA\); that is, \(\bfA\) is the inverse of \(\bfA^{-1}\).
  3. If \(c\) is a nonzero number, then \((c\bfA)^{-1}= \dfrac{1}{c}\bfA^{-1}\).
  4. If \(\bfB\) is invertible and the same size as \(\bfA\), then \(\bfA\bfB\) is invertible and

\[ (\bfA\bfB)^{-1} = \bfB^{-1}\bfA^{-1}. \tag{4.5}\]

The last identity above is easy to get wrong, so it bears restatement (and a little generalization) in words:

Important

The inverse of a product of two or more invertible matrices is the product of the inverses in the reverse order.

In Section 4.3 we saw some counterintuitive properties of matrix multiplication. Those were due entirely to matrix singularity. Specifically, in Example 4.5, we saw that

\[ \bfA = \twomat{0}{0}{1}{0} \]

implies that \(\bfA^2 = \bfzero\). This is possible only because, as shown in Example 4.8, \(\bfA\) is singular. According to Theorem 4.5, the product of invertible matrices is invertible, so that product can’t be zero. We can refine our earlier observation:

Important

A product that includes a nonzero singular matrix may be zero.

Similarly, the “lost” cancellation property,

\[ \bfA \bfB = \bfA \mathbf{C} \implies \bfB = \mathbf{C} \qquad \text{(FALSE!)}, \]

becomes true if \(\bfA\) is invertible. The proof is easy:

\[ \bfA^{-1}( \bfA \bfB) = (\bfA^{-1} \bfA) \bfB = (\meye) \bfB = \bfB, \]

and since we can do the same to show that \(\bfA^{-1}(\bfA\mathbf{C}) = \mathbf{C}\), we conclude that \(\bfB=\mathbf{C}\).

4.4.4 Diagonal matrix

Definition 4.7 A diagonal matrix \(\mathbf{D}\) is one in which \(D_{ij}=0\) whenever \(i\neq j\).

If any diagonal element \(D_{ii}\) is zero, then the matrix is singular. Otherwise, its inverse is trivial to write down, thanks to how matrix multiplication is defined.

Theorem 4.6 (Inverse of a diagonal matrix) \[ \begin{bmatrix} A_{11} & & & \\ & A_{22} & & \\ & & \ddots & \\ & & & A_{nn} \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{A_{11}} & & & \\ & \frac{1}{A_{22}} & & \\ & & \ddots & \\ & & & \frac{1}{A_{nn}} \end{bmatrix} \tag{4.6}\]

4.4.5 \(2\times 2\)

In the \(2\times 2\) case, the inverse is easy enough to memorize.

Theorem 4.7 (Inverse of \(2\times 2\)) \[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad-bc}\: \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{4.7}\]

This formula breaks down if \(ad=bc\), in which case the matrix is singular.
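A quick check of Equation 4.7 against a general-purpose inverse (a NumPy sketch; the entries are arbitrary, chosen so that \(ad-bc\neq 0\)):

```python
import numpy as np

a, b, c, d = 2.0, -2.0, 1.0, 1.0          # ad - bc = 4, so invertible
A = np.array([[a, b],
              [c, d]])

Z = np.array([[ d, -b],
              [-c,  a]]) / (a*d - b*c)    # Equation 4.7

print(np.allclose(Z, np.linalg.inv(A)))   # True
print(np.allclose(A @ Z, np.eye(2)))      # True
```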

4.5 Fundamental Theorem

The following theorem is in every linear algebra course, but it does not have a universally accepted name.

Theorem 4.8 (Fundamental Theorem of Linear Algebra, FTLA) If \(\bfA\) is an \(n\times n\) matrix, then each of these statements is equivalent to all of the others.

  1. \(\bfA\) is invertible.
  2. The linear system \(\bfA\bfx=\bfb\) has the unique solution \(\bfx=\bfA^{-1}\bfb\).
  3. The RRE form of \(\bfA\) is the identity matrix.
  4. \(\rank(\bfA)=n\).

Important

Because the statements in Theorem 4.8 are equivalent, so are their negations. That is, if \(\bfA\) is an \(n\times n\) matrix, then these statements are all equivalent as well:

  1. \(\bfA\) is singular.
  2. The linear system \(\bfA\bfx=\bfb\) has infinitely many solutions or is inconsistent.
  3. The RRE form of \(\bfA\) has at least one row of zeros.
  4. \(\rank(\bfA) < n\).

Note

The solution formula \(\bfx=\bfA^{-1}\bfb\) from Theorem 4.8 is theoretically valuable but can be applied only if the inverse is available. Computing a matrix inverse is usually harder than doing row elimination on a linear system, so it’s not a useful algorithm in general.
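In NumPy terms, that advice means preferring `np.linalg.solve` (which performs elimination via an LU factorization) over forming \(\bfA^{-1}\) explicitly; a sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((4, 4))
b = rng.random(4)

x1 = np.linalg.solve(A, b)    # elimination: preferred
x2 = np.linalg.inv(A) @ b     # works, but slower and usually less accurate

print(np.allclose(x1, x2))    # True, up to roundoff
```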

4.6 Subspaces

You are familiar with the idea that the \(x\)-axis is a special subset of the plane, and that the \(xy\)-plane is a special subset of \(\real^3\). These examples generalize in an important way.

Definition 4.8 (Subspace) A subspace of \(\real^n\) is a subset \(S\) satisfying two conditions:

  1. The zero vector is in \(S\).
  2. Every linear combination of vectors in \(S\) is also in \(S\). (This is called closure.)

Note

We will be making statements about real spaces like \(\real^n\), but everything also works for the complex case \(\complex^n\), which turns out to be important later.

Example 4.9 The equation \(x + 2y + 3z = 0\) describes a plane passing through the origin in \(\real^3\). It’s clear geometrically that scaling a vector in the plane leaves you in the plane, and adding vectors in the plane does as well. This is enough to show that this plane is a subspace of \(\real^3\).

The equation \(x+y+z=1\) is also a plane in \(\real^3\), but it does not pass through the origin, so it cannot be a subspace.

Example 4.10 The singleton set \(\{\bfzero\}\) is a subspace of \(\real^n\) for any \(n\). It’s called the trivial subspace.

Here is a common way to encounter subspaces: the span of a set of vectors.

Theorem 4.9 If \(S=\span(\bfv_1,\ldots,\bfv_k)\) for any vectors \(\bfv_j\) in \(\real^n\), then \(S\) is a subspace of \(\real^n\).

4.6.1 Column space

Each matrix generates several subspaces associated with it.

Definition 4.9 (Column space) Let \(\bfA\) be an \(m\times n\) matrix. The column space of \(\bfA\), \(\colsp(\bfA)\), is the span of the columns of \(\bfA\).

Important

We have encountered column spaces a lot, actually, just not by name. Here is a collection of equivalent statements:

  1. \(\bfb\) is in the column space of \(\bfA\).
  2. \(\bfb\) is a linear combination of the columns of \(\bfA\).
  3. \(\bfA\bfx=\bfb\) for at least one choice of \(\bfx\).

The last item above implies that the column space is the set of all products \(\bfA\bfx\), taken over every possible \(\bfx\). It’s also known as the range of \(\bfA\) for this reason, by thinking of \(\bfA\) as the function \(\mathbf{f}(\bfx)=\bfA\bfx\).

Example 4.11 Is \(\threevec{1}{2}{3}\) in the column space of \(\bfA = \begin{bmatrix} 1 & 1 \\ -1 & 1 \\ 2 & 1 \end{bmatrix}\)?

Solution. We can check by looking at the consistency of the system \(\bfA\bfx = \threevec{1}{2}{3}\). The augmented matrix is

\[ \augmat{\begin{matrix} 1 & 1 \\ -1 & 1 \\ 2 & 1 \end{matrix}}{\begin{matrix} 1 \\ 2 \\ 3 \end{matrix}} \stackrel{\text{RRE}}{\implies} \augmat{\begin{matrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{matrix}}{\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}}. \]

This system is inconsistent, so the given vector is not in the column space.
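Computationally, the same check can be phrased with ranks: the system is consistent exactly when appending \(\bfb\) as a new column does not increase the rank. A NumPy sketch for Example 4.11:

```python
import numpy as np

A = np.array([[ 1, 1],
              [-1, 1],
              [ 2, 1]])
b = np.array([[1], [2], [3]])

aug = np.hstack([A, b])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(aug))
# 2 3 -> the ranks differ, so the system is inconsistent and b is not in colsp(A)
```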

4.6.2 Null space

Here is another subspace automatically associated with a matrix.

Definition 4.10 The null space of a matrix \(\bfA\), written \(\nullsp(\bfA)\), is the set of all solutions to the homogeneous linear system with coefficient matrix \(\bfA\).

Note

Sometimes the null space is called the kernel of the matrix.

By definition, the zero vector \(\bfzero\) is in the null space of every matrix.

To find the members of the null space of \(\bfA\), we solve the homogeneous system \(\bfA\bfx=\bfzero\). Because row operations can never change a column of zeros, we can use row elimination to find the RRE form of \(\bfA\) and then read off the null space.

Example 4.12 Suppose the RRE form of a matrix \(\bfA\) is

\[ \mathbf{R} = \begin{bmatrix} 0 & 1 & 4 & 0 & 1 \\ 0 & 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]

Therefore, the RRE form of the augmented matrix \(\augmat{\mathbf{A}}{\boldsymbol{0}}\) is \(\augmat{\mathbf{R}}{\boldsymbol{0}}\). The pivot columns are 2 and 4, so \(x_1\), \(x_3\), and \(x_5\) are free variables; call them \(r\), \(s\), and \(t\), respectively. The nonzero rows lead to

\[ \begin{split} x_2 + 4x_3 + x_5 &= 0 &\quad \Rightarrow \quad x_2 &= -4s-t,\\ x_4 -3x_5 &= 0 & \quad \Rightarrow \quad x_4 &= 3t. \end{split} \]

One way to express a generic solution vector in the null space is by a linear combination of constant vectors:

\[ \begin{bmatrix} r \\ -4s-t \\ s \\ 3t\\ t \end{bmatrix} = r \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} 0 \\ -4 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ -1 \\ 0 \\ 3 \\ 1 \end{bmatrix} = r\bfv_1 + s\bfv_2 + t\bfv_3 . \]

Hence \(\nullsp(\bfA)=\span(\bfv_1,\bfv_2,\bfv_3)\), where the \(\bfv_j\) are the constant vectors above.
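SymPy can reproduce this computation; a sketch is below. (Its `nullspace` method happens to return the same free-variable basis, while a purely numerical tool such as `scipy.linalg.null_space` would return an orthonormal basis that looks different but spans the same subspace.)

```python
from sympy import Matrix

R = Matrix([[0, 1, 4, 0,  1],
            [0, 0, 0, 1, -3],
            [0, 0, 0, 0,  0],
            [0, 0, 0, 0,  0]])

# Row operations do not change the null space, so nullsp(A) = nullsp(R).
for v in R.nullspace():
    print(v.T)
# Matrix([[1, 0, 0, 0, 0]])
# Matrix([[0, -4, 1, 0, 0]])
# Matrix([[0, -1, 0, 3, 1]])
```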

4.6.3 Basis

Definition 4.11 (Basis) A basis of a subspace \(S\) is any set of linearly independent vectors that spans \(S\).

Finding a basis for a null space was demonstrated in Example 4.12. The column space is also found from the RRE form.

Theorem 4.10 Let \(\bfA\) have RRE form with pivot columns numbered \(j_1,\ldots,j_r\). Then columns \(j_1,\ldots,j_r\) of \(\bfA\) are a basis for \(\colsp(\bfA)\).

Example 4.13 Find bases for the null space and column space of

\[ \bfA = \begin{bmatrix} 1 & 2 & 0 & -4 \\ -2 & -4 & 1 & 9 \\ -3 & -6 & 1 & 13 \\ -2 & -4 & 0 & 8 \end{bmatrix}. \]

Solution. You can compute that the RRE form of \(\bfA\) is

\[ \begin{bmatrix} 1 & 2 & 0 & -4 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]

To get a basis for \(\colsp(\bfA)\), we choose columns 1 and 3 of \(\bfA\), i.e., \(\{[1,-2,-3,-2], [0,1,1,0] \}\).

The homogeneous system \(\bfA\bfx = \bfzero\) has free variables \(x_2=s\), \(x_4=t\). Solving for the other variables gives the solution set

\[ \bfx = s \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 4 \\ 0 \\ -1 \\ 1 \end{bmatrix}, \]

which makes \(\{[-2,1,0,0],[4,0,-1,1] \}\) a basis for \(\nullsp(\bfA)\).
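A SymPy sketch of Example 4.13: `rref` reports the pivot columns (numbered from 0), which pick out the column-space basis, and `nullspace` returns a null-space basis:

```python
from sympy import Matrix

A = Matrix([[ 1,  2, 0, -4],
            [-2, -4, 1,  9],
            [-3, -6, 1, 13],
            [-2, -4, 0,  8]])

R, pivots = A.rref()
print(pivots)                        # (0, 2): columns 1 and 3 in this chapter's numbering
print([A[:, j].T for j in pivots])   # basis for colsp(A)
print([v.T for v in A.nullspace()])  # basis for nullsp(A)
```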

Example 4.14 Find a basis for the span of the vectors \(\bfv_1=[1,-2,-3,-2]\), \(\bfv_2=[2,-4,-6,-4]\), \(\bfv_3=[0,1,1,0]\), \(\bfv_4=[-4,9,13,8]\).

Solution. If we use the given vectors as columns of a matrix, then their span is equivalent to the column space of the matrix:

\[ \colsp\left( \begin{bmatrix} 1 & 2 & 0 & -4 \\ -2 & -4 & 1 & 9 \\ -3 & -6 & 1 & 13 \\ -2 & -4 & 0 & 8 \end{bmatrix} \right). \]

This is the same matrix whose column space was found in Example 4.13. Columns 1 and 3 are the pivot columns, and we get the basis \(\bfv_1,\bfv_3\) as before.

4.6.4 Dimension

You have an intuitive idea of dimension. Bases let us lock it down for all subspaces.

Theorem 4.11 Every basis for a subspace \(S\) has the same number of members.

Definition 4.12 (Dimension) The dimension of a subspace \(S\), written \(\dim(S)\), is the number of vectors in any basis of \(S\). The dimension of the singleton subspace \(\{\bfzero\}\) is defined to be zero.

Note

As you would expect, \(\dim(\real^n)=n\).

Relative to a subspace \(S\), a set of vectors has three properties: the number of members, whether they are independent, and whether they span \(S\). The following theorem shows that knowing any two of these properties determines the third.

Theorem 4.12 Suppose \(V\) is a set of \(k\) vectors in subspace \(S\).

  1. If \(k < \dim(S)\), then \(V\) cannot span \(S\).
  2. If \(k > \dim(S)\), then \(V\) cannot be linearly independent.
  3. Suppose \(k=\dim(S)\). If \(V\) is independent or if \(V\) spans \(S\), then \(V\) is a basis for \(S\).

Example 4.15 Determine whether the vectors \(\bfv_1=[1,2,3]\), \(\bfv_2=[2,4,0]\) are a basis of \(\real^3\).

Solution. Since \(\real^3\) is 3-dimensional, and there are only 2 vectors, it’s impossible for them to span it, so they are not a basis.

Example 4.16 Determine whether the vectors \(\bfv_1=[1,2]\), \(\bfv_2=[3,4]\), \(\bfv_3=[1,1]\) are a basis of \(\real^2\).

Solution. Since \(\real^2\) is 2-dimensional, and there are 3 vectors, it’s impossible for them to be independent, so they are not a basis.

Example 4.17 Determine whether the vectors \(\bfv_1=[3,2,1]\), \(\bfv_2=[0,-1,2]\), \(\bfv_3=[1,1,1]\) are a basis of \(\real^3\).

Solution. We find a basis for their span by putting them as columns of a matrix, then looking at its column space:

\[ \begin{bmatrix} 3 & 0 & 1 \\ 2 & -1 & 1 \\ 1 & 2 & 1 \end{bmatrix} \stackrel{\text{RRE}}{\implies} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

Every column is a pivot column. The original three vectors form a basis for their span, so they are independent. Since there are 3 of them, they are also a basis of the 3-dimensional \(\real^3\).
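The reasoning in Example 4.17 can also be phrased numerically: three vectors in \(\real^3\) form a basis exactly when the matrix having them as columns has rank 3. A NumPy sketch:

```python
import numpy as np

V = np.array([[3,  0, 1],
              [2, -1, 1],
              [1,  2, 1]])   # columns are v1, v2, v3

print(np.linalg.matrix_rank(V))   # 3, so the columns form a basis of R^3
```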

Earlier we defined rank as the number of pivot columns in the RRE form of the matrix. So now we have:

Theorem 4.13 For any \(m\times n\) matrix \(\bfA\), \(\rank(\bfA)=\dim(\colsp(\bfA))\).

The dimension of the null space also gets its own name.

Definition 4.13 (Nullity) The nullity of a matrix \(\bfA\), written \(\nullity(\bfA)\), is the dimension of \(\nullsp(\bfA)\).

Finally, we have a fancy way of saying that every variable in a linear system is either a pivot variable or a free variable:

Theorem 4.14 For any \(m\times n\) matrix \(\bfA\),

  1. \(\dim(\nullsp(\bfA))\) is the number of free variables in the RRE form of \(\mathbf{A}\).
  2. \(\rank(\bfA) + \nullity(\bfA) = n\).

Here is a table that tries to organize much of the language of linear algebra learned so far.

And, here is an expanded form of the Fundamental Theorem.

Theorem 4.15 (Fundamental Theorem of Linear Algebra, Director’s Cut) If \(\bfA\) is a real \(n\times n\) matrix, then each of these statements is equivalent to all of the others.

  1. \(\bfA\) is invertible.
  2. The linear system \(\bfA\bfx=\bfb\) has the unique solution \(\bfx=\bfA^{-1}\bfb\).
  3. The RRE form of \(\bfA\) is the identity matrix.
  4. \(\rank(\bfA)=n\).
  5. The columns of \(\bfA\) form a basis of \(\real^n\).
  6. \(\nullity(\bfA)=0\).
  7. The null space of \(\bfA\) is just \(\{\bfzero\}\).

4.7 Other vector spaces

Here’s where things get a little weird.

4.7.1 Matrices

If we stack the columns of an \(m\times n\) matrix on top of one another, then we get a vector in \(\complex^{mn}\). Conversely, if we start with a vector in \(\complex^{mn}\), we can easily reshape it to be \(m\times n\). Let’s define and name these operations.

Definition 4.14 For fixed values of \(m\) and \(n\), we define the following functions:

\[ \vec\left( \begin{bmatrix} \bfa_1 & \bfa_2 & \cdots & \bfa_n \end{bmatrix} \right) = \begin{bmatrix} \bfa_1 \\ \bfa_2 \\ \vdots \\ \bfa_n \end{bmatrix}, \]

which maps \(\bfA \in \cmn{m}{n}\) to a vector in \(\complex^{mn}\), and its inverse function,

\[ \unvec\left( \begin{bmatrix} A_{11} \\ \vdots \\ A_{m1} \\ A_{12}\\ \vdots \\ A_{m2} \\ \vdots \\ A_{mn} \end{bmatrix} \right) = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}. \]

Not only do we have a correspondence between \(\cmn{m}{n}\) and \(\complex^{mn}\), the operations of addition and scalar multiplication mirror one another. For instance, if \(\bfv=\vec(\bfA)\) and \(\bfw=\vec(\bfB)\), then \(\bfv+\bfw=\vec(\bfA+\bfB)\), and \(\bfA+\bfB = \unvec(\bfv) + \unvec(\bfw)\). That is, to add matrices in \(\cmn{m}{n}\), we can transform them over to \(\complex^{mn}\), add them there, and then transform the result back. The same is true for scalar multiplication. Finally, note that the vec of a zero matrix is equal to the zero vector in \(\complex^{mn}\).

All this is to say that the set \(\cmn{m}{n}\) acts just like the set \(\complex^{mn}\) when it comes to linear combinations. We can say more simply that \(\cmn{m}{n}\) is a vector space.
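In NumPy, a column-stacking vec and its inverse can be written with `reshape` in column-major (Fortran) order; the helper names `vec` and `unvec` below simply mirror Definition 4.14 (a sketch):

```python
import numpy as np

def vec(A):
    # stack the columns of A into one long vector (column-major order)
    return A.reshape(-1, order="F")

def unvec(v, m, n):
    # reshape a length-mn vector back into an m-by-n matrix, column by column
    return v.reshape((m, n), order="F")

A = np.array([[1, 3],
              [2, 4]])
print(vec(A))               # [1 2 3 4]
print(unvec(vec(A), 2, 2))  # recovers A
```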

Example 4.18 Determine whether the matrix \(\twomat{1}{-4}{2}{1}\) lies in the span of \(\twomat{1}{0}{0}{1}\), \(\twomat{0}{1}{1}{0}\), and \(\twomat{1}{1}{-1}{1}\).

Solution. Simply reinterpret the question by mapping each given matrix to its \(\vec()\) equivalent. We therefore want to know about the consistency of the linear system

\[ \augmat{\begin{matrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{matrix}}{\begin{matrix} 1 \\ 2 \\ -4 \\ 1 \end{matrix}}. \]

The RRE form is

\[ \augmat{\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{matrix}}{\begin{matrix} 4 \\ -1 \\ -3 \\ 0 \end{matrix}}, \]

which is consistent. Hence \(\twomat{1}{-4}{2}{1}\) is in the span of the three given matrices.

4.7.2 Polynomials

Definition 4.15 Let \(\mathcal{P}_n\) denote the set of all polynomials of degree \(n\) or less having real coefficients.

For instance, \(2x^2-1\) is a member of \(\mathcal{P}_2\) (as well as \(\mathcal{P}_3\), \(\mathcal{P}_4\), etc.), but \(x^4+x\) is not.

It is easy to set up a correspondence between every member of \(\mathcal{P}_n\) and a vector in \(\real^{n+1}\):

\[ c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n \Longleftrightarrow \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}. \tag{4.8}\]

As with \(\cmn{m}{n}\), this correspondence preserves the structure of addition, scalar multiplication, and zero. Therefore, \(\mathcal{P}_n\) is a vector space.

Example 4.19 Determine whether the polynomials \(3x+4\), \(x^2-1\), and \(x^2+x\) are linearly dependent.

Solution. This question is identical to asking about the independence of

\[ \begin{bmatrix} 4 \\ 3 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}. \]

Hence we seek nontrivial solutions of the linear system

\[ \augmat{\begin{matrix} 4 & -1 & 0 \\ 3 & 0 & 1 \\ 0 & 1 & 1 \end{matrix}}{\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}}. \]

The RRE form is

\[ \augmat{\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}}{\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}}, \]

which shows that only the trivial solution exists. Hence the polynomials are linearly independent.

Note

The standard vectors \(\bfe_1, \ldots, \bfe_{n+1}\) of \(\real^{n+1}\) correspond to the monomials \(1, x, x^2, \ldots, x^n\) in \(\mathcal{P}_n\).

4.7.3 ODE solutions

All solutions of the homogeneous linear ODE \(x''+x = 0\) are in the form

\[ x(t) = c_1 \cos t + c_2 \sin t. \]

We can describe \(x\) as belonging to a solution space of the ODE that is spanned by \(\cos(t)\) and \(\sin(t)\). This space supports linear combinations of its members, just like the other vector spaces. We could set up a correspondence with \(\real^2\), but that’s not as useful as the ones we’ve seen so far. For instance, \(\cos(t+\pi)\) is equal to \(-\cos(t)\), which is not an easy connection to make outside the context of trigonometric functions.

4.8 Coordinates

When we want to specify a vector in, say, \(\real^3\), we usually give its three elements. But there are infinitely many different ways to express a given vector.

Example 4.20 If we specify \(\bfv = \threevec{2}{-1}{3}\), we are really making a statement about a linear combination of standard vectors in \(\real^3\):

\[ \bfv = 2\threevec{1}{0}{0} - \threevec{0}{1}{0} + 3\threevec{0}{0}{1} = 2\bfe_1 - \bfe_2 + 3\bfe_3. \]

We can get a different expression for \(\bfv\) if we use the basis \(\bfu_1 = \threevec{1}{1}{0}\), \(\bfu_2 = \threevec{0}{1}{1}\), \(\bfu_3 = \threevec{1}{0}{1}\), by solving for the coefficients in

\[ \bfv = x_1 \bfu_1 + x_2 \bfu_2 + x_3 \bfu_3. \]

This equation is a linear system \(\mathbf{U} \bfx = \bfv\), specifically,

\[ \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \bfx = \threevec{2}{-1}{3}. \]

Given that \(\bfu_1, \bfu_2, \bfu_3\) form a basis of \(\real^3\), we know that \(\rank(\mathbf{U})=3\); therefore, the FTLA guarantees that a unique solution \(\bfx\) exists.

Definition 4.16 (Coordinates) In an \(n\)-dimensional vector space, the coordinates of a vector \(\bfv\) relative to a basis \(\bfu_1, \ldots, \bfu_n\) are the unique coefficients in the expression

\[ \bfv = x_1 \bfu_1 + \cdots + x_n \bfu_n. \tag{4.9}\]

If we name the basis as \(B\), then we write the coordinates relative to \(B\) as

\[ \bigl[ \bfv \bigr]_B = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. \]

The standard coordinates of a vector are its coordinates relative to the standard basis. We use \(E\) to denote the standard basis, with the vector space it refers to implied by context.

4.8.1 Computing coordinates

Here is a typical example.

Example 4.21 Find the coordinates of \(\bfv = \twomat{2}{-1}{-1}{3}\) relative to the basis

\[ B = \twomat{1}{1}{0}{1},\; \twomat{1}{0}{1}{1}, \; \twomat{0}{1}{0}{0}, \; \twomat{1}{1}{1}{0}. \]

Solution. If we write out Equation 4.9, we get 4 linear equations for \(x_1,\ldots,x_4\):

\[ \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \end{bmatrix} \bfx = \begin{bmatrix} 2 \\ -1 \\ -1 \\ 3 \end{bmatrix}. \]

(Note that each column of the coefficient matrix is the vec() of a basis member.) The solution of the system gives the coordinate vector:

\[ \bigl[ \bfv \bigr]_B = \bfx = \begin{bmatrix} 3 \\ 0 \\ -3 \\ -1 \end{bmatrix}. \]
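The linear system in Example 4.21 can be handed straight to a solver, since each column of the coefficient matrix is the vec of a basis member; a NumPy sketch:

```python
import numpy as np

# Each column is the vec of a basis member, in standard coordinates.
U = np.array([[1, 1, 0, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 0]])
v = np.array([2, -1, -1, 3])   # vec of the target matrix

print(np.linalg.solve(U, v))   # [ 3.  0. -3. -1.]  (up to roundoff)
```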

There’s some subtle sleight-of-hand here. Theoretically, a vector exists independently of any of the infinitely many ways we might choose to express it. But in practice, we often have to pick a coordinate system in order to work with that vector. Usually, that’s the standard basis, and we don’t even think of a vector as being different from its standard coordinates. Even though it isn’t stated, the problem statement in Example 4.21 is actually giving not \(\bfv\) but \(\bigl[ \bfv \bigr]_E\), and the same is true for the basis members.

I’m going to introduce some nonstandard notation here that I think helps with these sorts of problems—and what is to come.

Definition 4.17 (Standard matrix) The standard matrix of a basis \(B\) is the matrix whose columns are the basis vectors as expressed in standard coordinates. We denote it as \(\bigl[ B \bigr]_E\).

To be fully pedantic, if \(B=\bfu_1,\ldots,\bfu_n\) is a basis of an \(n\)-dimensional vector space, then

\[ \bigl[ B \bigr]_E = \begin{bmatrix} \bigl[ \bfu_1 \bigr]_E & \cdots & \bigl[ \bfu_n \bigr]_E \end{bmatrix}. \]

This notation allows us to write the linear system in Example 4.21, for instance, in an abstract form.

Theorem 4.16 If \(B\) is a basis of a vector space and \(\bfv\) is a vector in that space, then

\[ \bigl[ B \bigr]_E \cdot \bigl[\bfv\bigr]_B = \bigl[ \bfv \bigr]_E. \tag{4.10}\]

Note the cancellation pattern in the above:

\[ \bigl[ \cancel{B} \bigr]_E \cdot \bigl[\bfv\bigr]_{\cancel{B}} = \bigl[ \bfv \bigr]_E. \]

4.9 Change of basis

As stated earlier, in order to work with a vector space, we usually have to refer to a coordinate system, and the standard basis provides the default. Sometimes, though, we’d like to work with alternative bases.

Suppose \(B\) and \(C\) are bases for the same space. We can apply Equation 4.10 twice to get

\[ \bigl[ B \bigr]_E \cdot \bigl[\bfv\bigr]_B = \bigl[ \bfv \bigr]_E \]

and

\[ \bigl[ C \bigr]_E \cdot \bigl[\bfv\bigr]_C = \bigl[ \bfv \bigr]_E \quad. \]

This implies

\[ \bigl[ B \bigr]_E \cdot \bigl[\bfv\bigr]_B = \bigl[ C \bigr]_E \cdot \bigl[\bfv\bigr]_C \quad. \tag{4.11}\]

The standard matrices are invertible (hello, basis!), and we could write

\[ \bigl[\bfv\bigr]_B = \Bigl( \bigl[ B \bigr]_E^{-1} \cdot \bigl[ C \bigr]_E \Bigr) \bigl[\bfv\bigr]_C \quad. \]

This equation allows us to change from \(C\)-coordinates to \(B\)-coordinates with a single matrix-vector multiplication. Such a matrix represents a change of basis.

Here are the key results.

Definition 4.18 The coordinate matrix of a basis \(C\) relative to a basis \(B\) of the same vector space is the matrix \(\bigl[ C \bigr]_B\) whose columns are the basis vectors of \(C\) expressed in \(B\)-coordinates. That is, if \(C\) is \(\bfu_1,\ldots,\bfu_n\), then

\[ \bigl[ C \bigr]_B = \begin{bmatrix} \bigl[ \bfu_1 \bigr]_B & \cdots & \bigl[ \bfu_n \bigr]_B \end{bmatrix}. \]

Theorem 4.17 (Change of basis) Suppose that \(B\) and \(C\) are bases of the same vector space. Then

\[ \bigl[ C \bigr]_B = \bigl[ B \bigr]_E^{-1} \cdot \bigl[ C \bigr]_E\quad , \tag{4.12}\]

where \(E\) is the standard basis of the space, and

\[ \bigl[\bfv\bigr]_B = \bigl[ C \bigr]_B \cdot \bigl[\bfv\bigr]_C \quad. \tag{4.13}\]

Both Equation 4.13 and Equation 4.12 have the same cancellation pattern as Equation 4.10, if we allow for the inverse of a matrix to swap the roles of upper/lower position. This is clearer if we rewrite Equation 4.12 as

\[ \bigl[ \cancel{B} \bigr]_{E} \cdot \bigl[ C \bigr]_{\cancel{B}} = \bigl[ C \bigr]_E\quad . \]

Example 4.22 Find the matrix that changes coordinates from the \(\mathcal{P}_1\) basis \(B = 1+t\), \(1-t\) to the basis \(C=1\), \(1+2t\).

Solution. The first step is to translate everything into \(\real^2\):

\[ \begin{split} B &= \twovec{1}{1}, \; \twovec{1}{-1} \\ C &= \twovec{1}{0}, \; \twovec{1}{2}. \end{split} \]

The standard matrices are therefore

\[ \bigl[ B \bigr]_E = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad \bigl[ C \bigr]_E= \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}. \]

The change of basis matrix is

\[ \begin{split} \bigl[ B \bigr]_C &= \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \\[1em] &= \frac{1}{2} \begin{bmatrix} 2 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \\[1em] &= \frac{1}{2} \begin{bmatrix} 1 & 3 \\ 1 & -1 \end{bmatrix}. \end{split} \]
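We can verify this numerically (a NumPy sketch): applying \(\bigl[ B \bigr]_C\) to the \(B\)-coordinates of \(1+t\), namely \([1,0]\), should return its \(C\)-coordinates \([1/2,\, 1/2]\), since \(1+t = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}(1+2t)\).

```python
import numpy as np

B_E = np.array([[1,  1],
                [1, -1]])    # standard matrix of B = {1 + t, 1 - t}
C_E = np.array([[1, 1],
                [0, 2]])     # standard matrix of C = {1, 1 + 2t}

B_C = np.linalg.inv(C_E) @ B_E
print(B_C)                      # [[ 0.5  1.5], [ 0.5 -0.5]] = (1/2) [[1, 3], [1, -1]]
print(B_C @ np.array([1, 0]))   # [0.5 0.5]: the C-coordinates of 1 + t
```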

Example 4.23 Given the bases \(B = \threevec{1}{1}{1}\), \(\threevec{1}{0}{1}\), \(\threevec{1}{1}{0}\) and \(C = \threevec{1}{0}{-1}\), \(\threevec{0}{1}{1}\), \(\threevec{0}{2}{1}\) of \(\real^3\), and a vector whose \(C\)-coordinates are \(\threevec{1}{2}{3}\), find its \(B\)-coordinates.

Solution. The hard way to solve this problem would be to first apply Theorem 4.17 to find the change-of-basis matrix \(\bigl[ B \bigr]_C\). That would require finding the inverse of a \(3\times 3\) matrix.

The easy way is to use the standard basis as an intermediary, like in Equation 4.11:

\[ \begin{split} \bigl[ B \bigr]_E \cdot \bigl[\bfv\bigr]_B &= \bigl[ C \bigr]_E \cdot \bigl[\bfv\bigr]_C \\[1em] \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \cdot \bigl[\bfv\bigr]_B &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ -1 & 1 & 1 \end{bmatrix} \cdot \threevec{1}{2}{3} \\[1em] \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \cdot \bigl[\bfv\bigr]_B &= \threevec{1}{8}{4} \end{split} \]

This has the solution \(\bigl[\bfv\bigr]_B = \threevec{11}{-7}{-3}\).