3  Linear algebraic systems

3.1 Overview

It might feel silly, but let’s review what it means to solve the linear equation

\[ax = b\]

for \(x\).

  • If \(a\neq 0\), there is a unique solution, \(x=b/a\).
  • Otherwise,
    • If also \(b=0\), then every value of \(x\) is a valid solution.
    • Otherwise, there are no solutions.
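Though hardly necessary for a single equation, the case analysis above is easy to express in code, as in this minimal Python sketch (the function name and return convention are just for illustration):

```python
def solve_scalar(a, b):
    """Case analysis for the scalar linear equation a*x == b."""
    if a != 0:
        return "unique", b / a      # the one solution, x = b/a
    elif b == 0:
        return "infinite", None     # every x is a solution
    else:
        return "none", None         # no solutions
```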

It turns out that no matter how many equations and variables we have, the only three possibilities are the ones above: zero, one, or infinitely many solutions. The main difference is that the condition “\(a=0\)” has to be generalized.

Definition 3.1 (Consistent linear system) A linear system that has at least one solution is called consistent. Otherwise, it is inconsistent.

3.1.1 Two variables

Going one more baby step to two equations in two variables, we want to solve

\[ \begin{split} ax + by &= f, \\ cx + dy &= g. \end{split} \]

There is some easy geometric intuition here. Each equation represents a straight line in the plane, and solving both equations simultaneously means finding an intersection of these lines. Our cases above translate directly into:

  • If the lines are not parallel, there is a unique solution.
  • Otherwise,
    • If the lines are identical, there are infinitely many solutions.
    • Otherwise, there are no solutions.

It’s not hard to turn those statements into algebraic conditions. The slopes of the two lines are (ignoring infinities for a moment) \(-a/b\) and \(-c/d\); cross-multiplying shows that the slopes are equal if and only if \(ad=bc\), a condition that remains meaningful even when \(b\) or \(d\) is zero.

  • If \(ad \neq bc\), there is a single solution.
  • Otherwise,
    • If one equation is a multiple of the other, there are infinitely many solutions.
    • Otherwise, there are no solutions.
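These conditions translate directly into code. Here is a hedged Python sketch of the two-variable case analysis; the proportionality test at the end assumes neither equation has all zero coefficients:

```python
def solve_2x2(a, b, c, d, f, g):
    """Case analysis for  a*x + b*y = f,  c*x + d*y = g."""
    det = a * d - b * c
    if det != 0:
        # Unique solution, by the usual elimination formulas.
        return "unique", ((d * f - b * g) / det, (a * g - c * f) / det)
    # Parallel lines: is one equation a multiple of the other?
    if a * g == c * f and b * g == d * f:
        return "infinite", None
    return "none", None
```

For Example 3.1 below, `solve_2x2(1, -3, -2, 6, 1, 2)` reports that there are no solutions.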

Example 3.1 Find all solutions to the equations

\[ \begin{split} x - 3y & = 1, \\ -2x + 6y & = 2. \end{split} \]

Solution. We identify \((a,b,c,d)=(1,-3,-2,6)\), and \(ad-bc=6-6=0\). So we know there is not a unique solution (i.e., the lines are parallel). Dividing the second equation by \(-2\) leads to the equivalent system

\[ \begin{split} x - 3y & = 1, \\ x - 3y & = -1. \end{split} \]

It’s now clear that there is no way to satisfy both equations simultaneously. The system is inconsistent.

3.1.2 General case

Time to put on our big-kid pants. For \(m\) equations in \(n\) variables, we need to use subscripts rather than different letters for everything.

Definition 3.2 A linear-algebraic system, usually just written linear system, is the set of simultaneous equations \[ \begin{split} A_{11} x_1 + A_{12} x_2 + \cdots + A_{1n} x_n & = b_1 \\ A_{21} x_1 + A_{22} x_2 + \cdots + A_{2n} x_n & = b_2 \\ & \vdots \\ A_{m1} x_1 + A_{m2} x_2 + \cdots + A_{mn} x_n & = b_m, \end{split} \tag{3.1}\] where all the \(A_{ij}\) and \(b_i\) are assumed to be known. A solution of the system is a list of values \(x_1,\ldots,x_n\) that makes all the equations true.

We want to gather similar elements in Equation 3.1 into collective mathematical objects.

Definition 3.3 (Vector) A vector is a finite collection of numbers known as its elements. The set of all vectors with \(n\) real-valued elements is denoted \(\real^n\). If the elements are complex numbers, we use the symbol \(\complex^n\).

I use boldfaced lowercase letters to represent vectors, and a subscript to refer to an individual element within one. For instance, \(x_3\) is the third element of a vector \(\bfx\).

Next, we come to the doubly-indexed values \(A_{ij}\). The first index is the equation number, and the second corresponds to the variable number. This implies a 2D table or array of values.

Definition 3.4 (Matrix) A matrix is an \(m\times n\) array of numbers known as its elements. The set of all \(m\times n\) matrices with real elements is denoted \(\rmn{m}{n}\), and with complex elements it’s \(\cmn{m}{n}\).

I use boldfaced uppercase letters to represent matrices, and a pair of subscripts to refer to an individual element within one. For instance, \(A_{23}\) is the element in row 2, column 3 of the matrix \(\bfA\).

Important

Matrix subscripts are always in the order row first, column second. (This is the opposite of Excel—which is just one of myriad reasons that Excel is evil.)

In the context of Equation 3.1, we call the \(m\times n\) matrix \(\bfA\) the coefficient matrix of the linear system.

Important

The rows of the coefficient matrix correspond to the equations in the linear system, and the columns correspond to the variables of the system.

The vector \(\bfb\in\real^m\) implied in Equation 3.1 doesn’t have a standard name, but I will call it the forcing vector by analogy with linear ODEs. Altogether, \(\bfA\) and \(\bfb\) are the data of the linear system, and the vector \(\bfx \in \real^{n}\) is the solution.

Example 3.2 Sometimes zeros are needed as placeholders in the coefficient matrix. In the linear system

\[ \begin{split} x_1 - x_2 + 3 x_3 &= 4,\\ 2x_2 + 5x_4 &= -1, \end{split} \]

the coefficient matrix is \(2\times 4\),

\[ \bfA = \begin{bmatrix} 1 & -1 & 3 & 0 \\ 0 & 2 & 0 & 5 \end{bmatrix}, \]

and the forcing vector is \(\twovec{4}{-1}\).
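In code, the placeholder zeros must be entered explicitly. A quick sketch using Python's numpy package, which is one common way to store matrices and vectors:

```python
import numpy as np

# Coefficient matrix for  x1 - x2 + 3*x3 = 4  and  2*x2 + 5*x4 = -1;
# a variable missing from an equation gets a zero coefficient.
A = np.array([[1, -1, 3, 0],
              [0,  2, 0, 5]])
b = np.array([4, -1])
print(A.shape)   # (2, 4): rows = equations, columns = variables
```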

3.2 Row elimination

You’ve probably solved small systems of equations by substitution. In order to solve systems with more equations and variables, we systematize this idea as an elimination algorithm. The goal of elimination is to transform the system to an equivalent one whose solution(s) we can deduce easily. There are three legal moves in this game:

Definition 3.5 (Row operations) The following operations do not change the solutions of a linear system:

  1. Swap the positions of two equations.
  2. Multiply an equation by a nonzero constant.
  3. Add a multiple of one equation to another equation.
Note

Recall that the equations of a linear system correspond to rows of the coefficient matrix \(\bfA\) and forcing vector \(\bfb\), hence the name “row operations.”

Example 3.3 Let’s use elimination to solve the \(3\times 3\) system

\[ \begin{split} x_1 - x_2 - x_3 & = 2, \\ 3x_1 - 2x_2 & = 9, \\ x_1 - 2x_2 - x_3 & = 5. \end{split} \]

The first step is to use the first equation to eliminate \(x_1\) from the second and third equations. We therefore add \(-3\) times equation 1 to equation 2, and \(-1\) times equation 1 to equation 3:

\[ \begin{split} x_1 - x_2 - x_3 & = 2, \\ (3x_1 - 2x_2) - 3(x_1 - x_2 - x_3) & = 9 - 3(2),\\ (x_1 - 2x_2 - x_3) - 1(x_1 - x_2 - x_3) & = 5 - 1(2) . \end{split} \]

This takes us to \[ \begin{split} x_1 - x_2 - x_3 & = 2, \\ x_2 + 3x_3 &= 3, \\ -x_2 &= 3. \end{split} \]

Note

It’s tempting to grab that last equation above and use it to remove \(x_2\) from everything else. That move certainly works out here. But in the context of a fully automatic algorithm that works every time, we will continue as though the last equation still might contain a dependence on \(x_3\).

Next, we leave the first equation alone and use the second to eliminate \(x_2\) from all the others below it.

\[ \begin{split} x_1 - x_2 - x_3 & = 2, \\ x_2 + 3x_3 & = 3, \\ (-x_2) + (x_2+3x_3) & = 3 + 3. \end{split} \]

We now have the system, equivalent to the original one, comprising

\[ \begin{split} x_1 - x_2 - x_3 & = 2, \\ x_2 + 3x_3 & = 3, \\ 3x_3 & = 6. \end{split} \]

We can now deduce that \(x_3=2\) from the last equation. Moving up to the second equation, we then find that \(x_2=3-3(2)=-3\). Finally, the first equation yields \(x_1=2+(-3)+(2)=1\).

The process in Example 3.3, leading to a triangular system, is commonly known as Gaussian elimination. (It’s a poor name, as the process was known in China thousands of years before Gauss.) Finding the solution from the triangular form is called backward substitution.
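Both stages are short enough to write out in full. Here is a hedged Python/numpy sketch of Gaussian elimination and backward substitution; it assumes that no zero pivots arise, so that no equation swaps are ever needed (we deal with that wrinkle in the next section):

```python
import numpy as np

def gaussian_eliminate(A, b):
    """Reduce the square system A x = b to triangular form.

    Assumes no zero pivots are encountered along the way."""
    A, b = A.astype(float), b.astype(float)   # astype makes copies
    n = len(b)
    for j in range(n - 1):                    # pivot column
        for i in range(j + 1, n):             # rows below the pivot
            m = A[i, j] / A[j, j]             # multiplier
            A[i, j:] -= m * A[j, j:]          # R_i = R_i - m*R_j
            b[i] -= m * b[j]
    return A, b

def backward_substitute(U, c):
    """Solve the triangular system U x = c from the bottom up."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[1, -1, -1], [3, -2, 0], [1, -2, -1]])
b = np.array([2, 9, 5])
U, c = gaussian_eliminate(A, b)
print(backward_substitute(U, c))   # [ 1. -3.  2.]
```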

3.2.1 Augmented matrix

Before looking further into the elimination process, we will lighten the notational load by using matrices.

Definition 3.6 The augmented matrix of the linear system Equation 3.1 is the \(m\times (n+1)\) matrix \(\mathbf{G} = \augmat{\bfA}{\bfb}\).

Example 3.4 The starting system in Example 3.3 has augmented matrix

\[ \augmat{\begin{matrix} 1 & -1 & -1 \\ 3 &-2 & 0 \\ 1 & -2 & -1 \end{matrix}}{ \begin{matrix}2\\9\\5\end{matrix} }. \]

The final system in that example has augmented matrix \[ \augmat{\begin{matrix} 1 & -1 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 3 \end{matrix}}{ \begin{matrix}2\\3\\6\end{matrix} }. \]

The nonzeros in the final augmented matrix have a triangular shape that is the key to making backward substitution work. Next, we will get specific about what this form requires.

3.2.2 Row echelon form

Here is a formal description of the goal of row elimination.

Definition 3.7 (Row echelon form (RE form)) A matrix is in RE form if:

  1. Any rows that are completely zero are at the bottom.
  2. The leftmost nonzero element in a row, called the leading element, is to the left of any other leading elements below it.

Example 3.5 The following matrices are not in RE form:

\[ \begin{bmatrix} 2 & 0 & 1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & -4 & 1 & 0 \end{bmatrix} \] (leading nonzeros do not move to the right from row 2 to 3)

\[ \begin{bmatrix} 2 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & -4 & 1 & 0 \end{bmatrix} \] (zero row not at the bottom)

To achieve RE form by row operations on a given starting matrix, we start at the top left and use row operations to introduce zeros under the leading element of the first row. We then move down one row and repeat. It’s easier to understand from more examples than to write out a formal algorithm.

Example 3.6 We start from \[ \begin{bmatrix} 1 & 2 & -4 & 5 \\ 2 & 4 & 0 & 2 \\ 2 & 3 & 2 & 5 \\ -1 & 1 & -8 & 5 \end{bmatrix} \]

In the first row, the 1 will be the leading element. We then add multiples of the first row to the rows underneath it to create zeros there.

\[ \begin{split} & \stackrel{R_2=R_2 - 2R_1}{\Longrightarrow} \begin{bmatrix} 1 & 2 & -4 & 5 \\ 0 & 0 & 8 & -8 \\ 2 & 3 & 2 & 5 \\ -1 & 1 & -8 & 5 \end{bmatrix} \\[1ex] & \stackrel{R_3=R_3 - 2R_1}{\Longrightarrow} \begin{bmatrix} 1 & 2 & -4 & 5 \\ 0 & 0 & 8 & -8 \\ 0 & -1 & 10 & -5 \\ -1 & 1 & -8 & 5 \end{bmatrix} \\[1ex] & \stackrel{R_4=R_4 + 1R_1}{\Longrightarrow} \begin{bmatrix} 1 & 2 & -4 & 5 \\ 0 & 0 & 8 & -8 \\ 0 & -1 & 10 & -5 \\ 0 & 3 & -12 & 10 \end{bmatrix} \end{split} \]

The leading 1 in the first row is now all set. From now on, we can safely ignore the first column, as those zeros will remain throughout.

Moving to the second row, we notice that if we designated the 8 to be the leading element, then at least one of the rows underneath it would have a leading element to its left. That’s a violation of RE form. So we can swap either of those rows with the current row 2:

\[ \stackrel{R_2 \leftrightarrow R_3}{\Longrightarrow} \begin{bmatrix} 1 & 2 & -4 & 5 \\ 0 & -1 & 10 & -5 \\ 0 & 0 & 8 & -8 \\ 0 & 3 & -12 & 10 \end{bmatrix} \]

We can now select \(-1\) as the leading element in row 2 and create zeros underneath it. Of course, we don’t have any work to do in the current row 3.

\[ \stackrel{R_4 = R_4 + 3R_2}{\Longrightarrow} \begin{bmatrix} 1 & 2 & -4 & 5 \\ 0 & -1 & 10 & -5 \\ 0 & 0 & 8 & -8 \\ 0 & 0 & 18 & -5 \end{bmatrix} \]

Our first two rows are set. We’re happy choosing \(8\) as the leading element in row 3, and one more operation is needed to clear the element below it:

\[ \stackrel{R_4 = R_4 - \frac{18}{8}R_3}{\Longrightarrow} \begin{bmatrix} 1 & 2 & -4 & 5 \\ 0 & -1 & 10 & -5 \\ 0 & 0 & 8 & -8 \\ 0 & 0 & 0 & 13 \end{bmatrix} \]

The matrix is now in RE form.

If our starting point is the augmented matrix of a linear system, then our new linear system would be

\[ \begin{split} 1x_1 + 2x_2 -4x_3 &= 5 \\ 0x_1 - 1x_2 +10x_3 &= -5 \\ 0x_1 + 0x_2 +8x_3 &= -8 \\ 0x_1 + 0x_2 + 0x_3 &= 13 \end{split} \]

That last equation is a dealbreaker! The system is inconsistent.
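For the record, here is a hedged Python/numpy sketch of the reduction to RE form, including the row swaps. Testing floating-point values against exact zero, as done here, is only safe for small integer examples like ours; serious numerical software chooses pivots differently.

```python
import numpy as np

def to_re_form(M):
    """Row-reduce M to a row echelon form."""
    M = M.astype(float)                     # astype makes a copy
    m, n = M.shape
    r = 0                                   # row where the next pivot belongs
    for j in range(n):                      # sweep the columns left to right
        nonzero = [i for i in range(r, m) if M[i, j] != 0]
        if not nonzero:
            continue                        # no pivot available in this column
        M[[r, nonzero[0]]] = M[[nonzero[0], r]]   # swap a pivot up into row r
        for k in range(r + 1, m):
            M[k, j:] -= (M[k, j] / M[r, j]) * M[r, j:]   # zero out below pivot
        r += 1
    return M

M = np.array([[1, 2, -4, 5], [2, 4, 0, 2], [2, 3, 2, 5], [-1, 1, -8, 5]])
print(to_re_form(M))   # reproduces the RE form found in Example 3.6
```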

Definition 3.8 (Pivots) The leading elements encountered during row elimination are called pivots. The variables corresponding to the columns they occur in are called leading variables, while the other variables (if any) are called free variables.

Example 3.7 Suppose we begin with the augmented matrix \[ \augmat{\begin{matrix} 1 & 1 & -1 \\ 1 & 2 & 2 \\ 2 & 1 & -5 \end{matrix}}{ \begin{matrix}4\\3\\9\end{matrix} }. \]

The first 1 in the first row is a pivot, leading to

\[ \stackrel{R_2 = R_2 - R_1;\; R_3 = R_3 - 2R_1}{\Longrightarrow} \augmat{\begin{matrix} 1 & 1 & -1 \\ 0 & 1 & 3 \\ 0 & -1 & -3 \end{matrix}}{ \begin{matrix}4\\-1\\1\end{matrix} } \]

The leading 1 in the second row is the next pivot:

\[ \stackrel{R_3 = R_3 + R_2}{\Longrightarrow} \augmat{\begin{matrix} 1 & 1 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{matrix}}{ \begin{matrix}4\\-1\\0\end{matrix} }. \]

Only the first two variables are considered leading variables, and \(x_3\) remains free. So we say that \(x_3=t\) for any arbitrary value of \(t\). The last row of the final augmented matrix represents the equation \(0=0\), which is true and implies no constraint. The second row implies

\[ x_2 + 3x_3 = -1, \]

which is equivalent to \(x_2=-1-3t\). Finally, the first row says that

\[ x_1 + x_2 - x_3 = 4, \]

so that \(x_1 = 4 - (-1-3t) + (t) = 5+4t\). The system has infinitely many solutions.

3.3 RRE form

An alternative to backward substitution is to transform the RE form further into something even simpler.

Definition 3.9 (RRE form) A matrix is in RRE form (reduced row-echelon form) if it meets all of these requirements:

  1. Any rows of all zeros appear below all nonzero rows.
  2. The leading element of any row is a one.
  3. Every leading 1 that is lower than another leading 1 is also to the right of it.
  4. Every leading 1 is the only nonzero in its column.

The columns that have leading ones are called pivot columns. The other columns are called free columns.

There are two things that make RRE form useful. One is that it makes a unique target:

Theorem 3.1 Every matrix is equivalent to a unique matrix in RRE form.

The other useful fact is that RRE form completely exposes everything we want to know about a linear system. In fact, everything else we will do theoretically with matrices comes back to the RRE form, whether or not it’s an augmented matrix.

Example 3.8 At the end of Example 3.3, we had reached the augmented matrix

\[ \augmat{\begin{matrix} 1 & -1 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 3 \end{matrix}}{ \begin{matrix}2\\3\\6\end{matrix} }. \]

Note that the leading element of row 3 is not a 1. We fix that by multiplying it through:

\[ \stackrel{R_3 = \frac{1}{3}R_3}{\Longrightarrow} \augmat{\begin{matrix} 1 & -1 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{matrix}}{ \begin{matrix}2\\3\\2\end{matrix} }. \]

The leading one in row 1 is alone in its column. But the other two leading ones are not. We start at the bottom right and eliminate upwards now:

\[ \begin{split} & \stackrel{R_2 = R_2 - 3R_3}{\Longrightarrow} \augmat{\begin{matrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}}{ \begin{matrix}2\\-3\\2\end{matrix} } \\[1ex] & \stackrel{R_1 = R_1 + 1R_3}{\Longrightarrow} \augmat{\begin{matrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}}{ \begin{matrix}4\\-3\\2\end{matrix} }. \end{split} \]

At this point, the leading one of row 3 is ready to go. We have just one more upwards step to perform from row 2:

\[ \stackrel{R_1 = R_1 + 1R_2}{\Longrightarrow} \augmat{\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}}{ \begin{matrix}1\\-3\\2\end{matrix} }. \]

The system is now trivial: \(x_1=1\), \(x_2=-3\), and \(x_3=2\). Done!

The process above is called Gauss–Jordan elimination, to distinguish it from Gaussian elimination.

Tip

You can use Wolfram Alpha to find the RRE form of a matrix.

Caution

While MATLAB has an rref command, it should not be relied on. It isn’t really MATLAB’s fault; the RRE form is a terrible target for numerical computations, because it is infinitely sensitive to rounding errors, and MATLAB calculates using numerical approximations for speed.
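In exact arithmetic, though, the objection disappears. For instance, Python's sympy package computes with exact rational numbers, so its `rref` method is trustworthy for hand-sized problems. A quick sketch, applied to the augmented matrix reached at the start of Example 3.8:

```python
from sympy import Matrix

G = Matrix([[1, -1, -1, 2],
            [0,  1,  3, 3],
            [0,  0,  3, 6]])
R, pivot_cols = G.rref()   # exact reduced row echelon form
print(R)                   # last column gives x1 = 1, x2 = -3, x3 = 2
print(pivot_cols)          # (0, 1, 2)
```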

3.3.1 Solution from the RRE form

The RRE form of an augmented matrix represents a linear system that we can solve by inspection.

Theorem 3.2 To solve a system with augmented matrix in RRE form:

  1. Ignore all zero rows.
  2. If a leading one occurs in the last column, the system is inconsistent.
  3. Otherwise, each variable associated with a free column is assigned to a free parameter (e.g., \(s\), \(t\), etc.).
  4. Use the rows containing the leading ones to solve for the leading variables in terms of the free parameters.
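The steps of Theorem 3.2 can be mechanized. Here is a hedged Python/sympy sketch that reads the solution set off an augmented matrix already in RRE form; the function name and parameter conventions are my own, not a library routine:

```python
from sympy import Matrix, symbols

def solve_from_rref(R, num_vars):
    """Read the solution set off an augmented matrix R in RRE form.

    Returns None for an inconsistent system, else a list of expressions."""
    pivots = {}                              # column -> row of its leading 1
    for i in range(R.rows):
        lead = next((j for j in range(R.cols) if R[i, j] != 0), None)
        if lead is None:
            continue                         # step 1: ignore zero rows
        if lead == num_vars:
            return None                      # step 2: leading 1 in last column
        pivots[lead] = i
    free = [j for j in range(num_vars) if j not in pivots]
    params = symbols(f"t0:{len(free)}")      # step 3: a parameter per free column
    x = [None] * num_vars
    for j, t in zip(free, params):
        x[j] = t
    for j, i in pivots.items():              # step 4: solve for leading variables
        x[j] = R[i, num_vars] - sum(R[i, k] * x[k] for k in free)
    return x
```

Applied to the RRE matrices in the examples below, it reproduces the solutions found by inspection.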

Example 3.9 In Example 3.7, we reduced a linear system to the RE augmented matrix

\[ \augmat{\begin{matrix} 1 & 1 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{matrix}}{ \begin{matrix}4\\-1\\0\end{matrix} }. \]

To put this matrix into RRE form, we can use the second row to put a zero above its leading 1:

\[ \stackrel{R_1 = R_1 - R_2}{\Longrightarrow} \augmat{\begin{matrix} 1 & 0 & -4 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{matrix}}{ \begin{matrix}5\\-1\\0\end{matrix} }. \]

This is the system

\[ \begin{split} x_1 - 4x_3 &= 5 \\ x_2 + 3x_3 &= -1, \end{split} \]

in which \(x_3=t\) is a free variable. Thus, \(x_2=-1-3t\) and \(x_1=5+4t\), which is the same conclusion we reached in Example 3.7 by backward substitution.

Example 3.10 A linear system has an augmented matrix equivalent to the RRE matrix

\[ \augmat{ \begin{matrix} 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix} }{ \begin{matrix}2 \\ -4 \\ 0 \\ 0 \end{matrix} } \]

Find all solutions to the linear system.

Solution. The last two rows have no information and can be ignored. Columns 2 and 4 are pivot columns. We give variables from the other columns arbitrary values, \(x_1=s\) and \(x_3=t\). The pivot variables are solved in terms of these to get \(x_2=2+3t\) and \(x_4=-4\). All solutions are expressed as the vector

\[ \bfx = \begin{bmatrix} s \\ 2+3t \\ t \\ -4 \end{bmatrix}, \qquad s,t\in \real. \]

Example 3.11 A linear system has an augmented matrix equivalent to the RRE form

\[ \augmat{ \begin{matrix} 1 & 0 & -4 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{matrix}}{ \begin{matrix}0 \\ 0 \\ 0 \\ 1 \end{matrix} } \]

Find all solutions to the linear system.

Solution. Look at the last row of the system. It expresses the equation \(0=1\), which is impossible to satisfy. Thus the system is inconsistent.

3.3.2 Rank

There are only three possible outcomes for a linear system, all deducible from the RRE form of the augmented matrix:

  1. There is a leading 1 in the last column, in which case there are no solutions.
  2. There are fewer pivot columns than variables, in which case there are infinitely many solutions.
  3. There is a unique solution.

Really, these come down to counting the number of leading ones/pivot columns.

Definition 3.10 (Rank) The rank of a matrix is the number of pivot columns in its RRE or any equivalent RE form.

Theorem 3.3 If \(\bfA\) is an \(m\times n\) matrix, then \(\rank(\bfA) \le m\) and \(\rank(\bfA) \le n\).

Proof. Each leading one in the RRE form requires a row and column of its own.

Theorem 3.4 Suppose \(\bfA\) is \(m\times n\). If \(\rank(\bfA)<n\), then the linear system with augmented matrix \(\augmat{\bfA}{\bfb}\) cannot have a unique solution: depending on \(\bfb\), it is either inconsistent or has infinitely many solutions.

Proof. Let \(r=\rank(\bfA)\). Then the RRE form of \(\bfA\) has all zero elements in rows \(r+1\) to \(m\). Now augment \(\bfA\) with \(\bfb\) and perform the row operations that reduce \(\bfA\) to RRE form. If any of rows \(r+1\) to \(m\) has a nonzero in the last column, the system is inconsistent. Otherwise, since \(r<n\), there is at least one free variable, and the system has infinitely many solutions.

Corollary 3.1 A linear system with more variables than equations cannot have a unique solution.

Proof. \(\rank(\bfA) \le m < n\).
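Python's sympy can compute this count directly via `Matrix.rank()`, using exact arithmetic. A quick check on the coefficient matrix from Example 3.2:

```python
from sympy import Matrix

A = Matrix([[1, -1, 3, 0],
            [0,  2, 0, 5]])
print(A.rank())   # 2, consistent with Theorem 3.3: rank <= min(m, n) = 2
```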

3.4 Linear combinations

There is a different perspective, with its own terminology, that is equivalent to linear systems and statements about them. In fact, we first saw this terminology for solutions of linear ODEs.

Definition 3.11 (Linear combination) A linear combination of vectors \(\bfv_1, \ldots, \bfv_k\) with coefficients \(c_1,\ldots,c_k\) is the vector

\[ c_1 \bfv_1 + c_2 \bfv_2 + \cdots + c_k \bfv_k. \]

Example 3.12 Is \(\threevec{1}{0}{3}\) a linear combination of \(\threevec{1}{2}{3}\) and \(\threevec{-1}{1}{-3}\)?

Solution. To rephrase the question: do there exist numbers \(x_1\) and \(x_2\) so that

\[ \threevec{1}{0}{3} \stackrel{?}{=} x_1 \threevec{1}{2}{3} + x_2 \threevec{-1}{1}{-3}. \]

This is the same as

\[ \threevec{1}{0}{3} \stackrel{?}{=} \threevec{1x_1 - 1x_2}{2x_1 + 1x_2}{3x_1 - 3x_2}. \]

This is equivalent to the linear system

\[ \begin{split} x_1 - x_2 &= 1 \\ 2x_1 + x_2 &= 0 \\ 3x_1 - 3x_2 &= 3, \end{split} \]

which has the augmented matrix

\[ \augmat{\begin{matrix} 1 & -1 \\ 2 & 1 \\ 3 & -3 \end{matrix}}{ \begin{matrix}1\\0\\3\end{matrix} }. \]

The RRE form of this matrix is
\[ \stackrel{RRE}{\Longrightarrow} \augmat{\begin{matrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{matrix}}{ \begin{matrix}\frac{1}{3}\\-\frac{2}{3}\\0\end{matrix} }, \]

which implies that the system is consistent and has a unique solution. Therefore, the answer to the original question is yes, and we also found that \(x_1=\frac{1}{3}\) and \(x_2=-\frac{2}{3}\).
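The whole computation is easy to reproduce with Python's sympy; here is a short sketch:

```python
from sympy import Matrix

v1, v2, b = Matrix([1, 2, 3]), Matrix([-1, 1, -3]), Matrix([1, 0, 3])
R, pivot_cols = Matrix.hstack(v1, v2, b).rref()   # reduce [v1 v2 | b]
print(R)                 # last column shows x1 = 1/3, x2 = -2/3
print(2 in pivot_cols)   # False: no pivot in the last column, so consistent
```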

The preceding example generalizes to the following statement.

Theorem 3.5 The vector \(\bfb\) is a linear combination of \(\bfv_1,\ldots,\bfv_k\) if and only if the system with augmented matrix \(\augmat{\bfA}{\bfb}\) is consistent, where \(\bfA\) is the matrix whose columns are \(\bfv_1,\ldots,\bfv_k\).

3.4.1 Span

Definition 3.12 The set of all linear combinations of \(\bfv_1,\ldots,\bfv_k\) is known as the span of those vectors. The span of an empty set is the singleton set \(\{\bfzero\}\).

For instance, in Example 3.12, we could replace the question of “Is \(\bfb\) equal to a linear combination of \(\bfv_1\) and \(\bfv_2\)?” with, “Is \(\bfb\) in the span of \(\bfv_1\) and \(\bfv_2\)?”

We can now restate Theorem 3.5 as follows.

Theorem 3.6 Let \(\bfa_1,\ldots,\bfa_n\) be vectors in \(\real^m\), and let

\[ \bfA = \begin{bmatrix} \bfa_1 & \bfa_2 & \cdots & \bfa_n \end{bmatrix} \in \rmn{m}{n}. \]

Then a vector \(\bfb \in \real^m\) lies in the span of \(\bfa_1,\ldots,\bfa_n\) if and only if the system with augmented matrix \(\augmat{\bfA}{\bfb}\) is consistent.

Example 3.13 Show that \(\threevec{1}{4}{3}\) is in the span of \(\bfu_1=\threevec{1}{-2}{-1}\) and \(\bfu_2=\threevec{3}{0}{1}\).

Solution. The goal is equivalent to showing that the linear system with augmented matrix

\[ \augmat{\begin{matrix} 1 & 3 \\ -2 & 0 \\ -1 & 1 \end{matrix}}{ \begin{matrix}1\\4\\3\end{matrix} } \]

is consistent. The RRE form of this matrix is

\[ \stackrel{RRE}{\Longrightarrow} \augmat{ \begin{matrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{matrix}}{ \begin{matrix} -2 \\ 1 \\ 0 \end{matrix} } \]

which is consistent, so we are done.

Looking carefully at Example 3.13, we can deduce a general statement. Because the coefficient matrix describing the span of the two vectors is taller than it is wide, its RE form must have at least one row that is zero to the left of the augmented-matrix bar. The question really came down to whether the element to the right of that bar is also zero. That was the case for the specific vectors in the example, but it won’t always be. Geometrically, two vectors in \(\real^3\) span a plane through the origin, and there will be vectors in \(\real^3\) that are left out.

Theorem 3.7 Any set of \(n\) vectors cannot span \(\real^m\) if \(n < m\).

3.4.2 Independence

Here is an old ODE friend dressed up in new clothes.

Definition 3.13 (Linear independence) Vectors \(\bfv_1,\bfv_2,\ldots,\bfv_n\) are linearly dependent if there is a way to choose constants \(c_1,\ldots,c_n\), not all zero, such that

\[ c_1\bfv_1 + c_2\bfv_2 + \cdots + c_n \bfv_n = \bfzero. \tag{3.2}\]

If the vectors are not linearly dependent, then they are linearly independent.

Example 3.14 Any set of vectors that contains the zero vector must be dependent: set the coefficient \(c_j\) of the zero vector in Equation 3.2 to 1 and all the others to 0, and the equation is true.

A linear combination is just a linear system in a trench coat, so there is a definition on that side of the fence as well.

Definition 3.14 (Homogeneous linear system) A homogeneous linear system is one that has forcing vector \(\bfb\) equal to zero. We say it has a nontrivial solution if the zero vector is not the only solution.

Note

In linear ODEs, we found that every problem depends on the associated homogeneous problem. The same is true for linear systems.

The following conclusions ensue immediately from our definitions and previous results.

Theorem 3.8 A homogeneous system in \(n\) variables with coefficient matrix \(\bfA\) has a nontrivial solution if and only if \(\rank(\bfA)< n\). In particular, a homogeneous system with more variables than equations has a nontrivial solution.
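For a concrete illustration, Python's sympy can produce nontrivial solutions directly through its `nullspace` method. A sketch with a \(2\times 4\) coefficient matrix, i.e., more variables than equations:

```python
from sympy import Matrix

A = Matrix([[1, -1, 3, 0],
            [0,  2, 0, 5]])
print(A.rank())        # 2, which is less than n = 4
print(A.nullspace())   # a basis of two vectors; any combination solves A x = 0
```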

Theorem 3.9 Let \(\bfa_1,\ldots,\bfa_n\) be vectors in \(\real^m\), and let

\[ \bfA = \begin{bmatrix} \bfa_1 & \bfa_2 & \cdots & \bfa_n \end{bmatrix} \in \rmn{m}{n}. \]

The following statements are equivalent.

  1. The vectors \(\bfa_1,\ldots,\bfa_n\) are linearly independent.
  2. The system with augmented matrix \(\augmat{\bfA}{\bfzero}\) has only zero as a solution.
  3. \(\rank(\bfA) = n\).

Example 3.15 Revisiting Example 3.14, if your set includes the zero vector, it will put a column of zeros in the matrix \(\bfA\) in Theorem 3.9. Since that column can’t ever have a leading nonzero in it, the rank of \(\bfA\) will be less than the number of columns. So (we again conclude) the set is dependent.

Example 3.16 Determine whether the vectors \(\bfv_1=\threevec{1}{0}{-2}\), \(\bfv_2=\threevec{1}{0}{1}\), and \(\bfv_3 = \threevec{0}{0}{-1}\) are linearly independent.

Solution. These vectors are dependent if, and only if, we can find \(c_1,c_2,c_3\), not all zero, such that

\[ c_1 \bfv_1 + c_2 \bfv_2 + c_3 \bfv_3 = \bfzero. \]

Equating components on the two sides of this equation leads to the linear system with augmented matrix

\[ \augmat{ \begin{matrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ -2 & 1 & -1 \end{matrix}}{ \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} } \]

The RRE form of this system is

\[ \stackrel{RRE}{\Longrightarrow} \augmat{ \begin{matrix} 1 & 0 & \frac{1}{3} \\ 0 & 1 & -\frac{1}{3} \\ 0 & 0 & 0 \end{matrix}}{ \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} }, \]

so the rank is equal to 2, which is less than the number of vectors in the question. Therefore, the vectors are linearly dependent.

Example 3.17 Determine whether the vectors (written in row form for compactness) \([2,0,0,-1]\), \([1,1,0,0]\), and \([0,0,-1,1]\) are linearly independent.

Solution. We use the given vectors as columns of a coefficient matrix:

\[ \bfA = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 1 \end{bmatrix}. \]

The linear system with augmented matrix \(\augmat{\bfA}{\bfzero}\) has RRE form

\[ \stackrel{RRE}{\Longrightarrow} \augmat{ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{matrix}}{ \begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix} }. \]

This has rank 3, which is the same as the number of vectors in the question, so the vectors are linearly independent.
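Both of the last two examples reduce to a rank computation, which we can package as a short Python/sympy sketch (the helper function is our own, not a library routine):

```python
from sympy import Matrix

def are_independent(*vectors):
    """Theorem 3.9: vectors are independent iff rank equals their number."""
    return Matrix.hstack(*vectors).rank() == len(vectors)

# Example 3.16: dependent, since the rank is 2 < 3
print(are_independent(Matrix([1, 0, -2]), Matrix([1, 0, 1]), Matrix([0, 0, -1])))
# Example 3.17: independent, since the rank is 3 = 3
print(are_independent(Matrix([2, 0, 0, -1]), Matrix([1, 1, 0, 0]),
                      Matrix([0, 0, -1, 1])))
```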

Besides the presence of a zero vector, there is one easy case for detecting dependence.

Example 3.18 Determine whether the vectors \(\bfv_1=\threevec{1}{0}{-2}\), \(\bfv_2=\threevec{1}{0}{1}\), \(\bfv_3 = \threevec{0}{0}{-1}\), and \(\bfv_4 = \threevec{1}{1}{0}\) are linearly independent.

Solution. According to Theorem 3.9, for these vectors to be independent, the matrix

\[ \bfA = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ -2 & 1 & -1 & 0 \end{bmatrix} \]

must have rank 4. Without doing any work, we know that the rank is at most 3, since there are only 3 rows and at most one pivot per row. Therefore, the vectors are dependent.

Example 3.18 is an instance of the following, which is a counterpart of Theorem 3.7.

Theorem 3.10 Any collection of \(n\) vectors in \(\real^m\) is dependent if \(n > m\).

Warning

Theorem 3.10 says, “if \(n > m\), then dependent,” but the converse statement, “if dependent, then \(n > m\),” is not logically correct. There are examples of dependent sets of vectors with \(n < m\), \(n=m\), and \(n>m\).