LU factorization

2.4. LU factorization#

A major tool in numerical linear algebra is to factor a given matrix into terms that are individually easier to deal with than the original. In this section we derive a means to express a square matrix using triangular factors, which will allow us to solve a linear system using forward and backward substitution.

Outer products#

Our derivation of the factorization hinges on an expression of matrix products in terms of vector outer products. If \(\mathbf{u}\in\real^m\) and \(\mathbf{v}\in\real^n\), then the outer product of these vectors is the \(m\times n\) matrix

(2.4.1)#\[\begin{split}\mathbf{u} \mathbf{v}^T = \begin{bmatrix} u_1 v_1 & u_1 v_2 & \cdots & u_1 v_n \\u_2 v_1 & u_2 v_2 & \cdots & u_2 v_n \\ \vdots & \vdots & & \vdots \\ u_m v_1 & u_m v_2 & \cdots & u_m v_n \end{bmatrix}.\end{split}\]

We illustrate the connection of outer products to matrix multiplication by a small example.

Example 2.4.1

According to the usual definition of matrix multiplication,

\[\begin{split} \small \begin{bmatrix} 4 & -1 \\ -3 & 5 \\ -2 & 6 \end{bmatrix} \begin{bmatrix} 2 & -7 \\ -3 & 5 \end{bmatrix} & = \small \begin{bmatrix} (4)(2) + (-1)(-3) & (4)(-7) + (-1)(5) \\ (-3)(2) + (5)(-3) & (-3)(-7) + (5)(5) \\ (-2)(2) + (6)(-3) & (-2)(-7) + (6)(5) \end{bmatrix}. \end{split}\]

If we break this up into the sum of two matrices, however, each is an outer product.

\[\begin{split} & = \small \begin{bmatrix} (4)(2) & (4)(-7) \\ (-3)(2) & (-3)(-7) \\ (-2)(2) & (-2)(-7) \end{bmatrix} + \begin{bmatrix} (-1)(-3) & (-1)(5) \\ (5)(-3) & (5)(5) \\ (6)(-3) & (6)(5) \end{bmatrix}\\[2mm] & = \small \begin{bmatrix} 4 \\ -3 \\ -2 \end{bmatrix} \begin{bmatrix} 2 & -7 \end{bmatrix} \: + \: \begin{bmatrix} -1 \\ 5 \\ 6 \end{bmatrix} \begin{bmatrix} -3 & 5 \end{bmatrix}. \end{split}\]

Note that the vectors here are columns of the left-hand matrix and rows of the right-hand matrix. The matrix product is defined only if there are equal numbers of these.

It is not hard to derive the following generalization of Example 2.4.1 to all matrix products.

Theorem 2.4.2 : Matrix multiplication by outer products

Write the columns of \(\mathbf{A}\) as \(\mathbf{a}_1,\dots,\mathbf{a}_n\) and the rows of \(\mathbf{B}\) as \(\mathbf{b}_1^T,\dots,\mathbf{b}_n^T\). Then

(2.4.2)#\[\mathbf{A}\mathbf{B} = \sum_{k=1}^n \mathbf{a}_k \mathbf{b}_k^T.\]

Triangular product#

Equation (2.4.2) has some interesting structure for the product \(\mathbf{L}\mathbf{U}\), where \(\mathbf{L}\) is \(n\times n\) and lower triangular (i.e., zero above the main diagonal) and \(\mathbf{U}\) is \(n\times n\) and upper triangular (zero below the diagonal).

Demo 2.4.3

We explore the outer product formula for two random triangular matrices.

L = tril( rand(1:9,3,3) )

3×3 Matrix{Int64}:
0  0
4  0
8  1

U = triu( rand(1:9,3,3) )

3×3 Matrix{Int64}:
9  3
7  7
0  8

Here are the three outer products in the sum in (2.4.2):

Although U[1,:] is a row of U, it is a vector, and as such it has a default column interpretation.

L[:,1]*U[1,:]'

3×3 Matrix{Int64}:
81  27
 9   3
54  18

L[:,2]*U[2,:]'

3×3 Matrix{Int64}:
 0   0
28  28
56  56

L[:,3]*U[3,:]'

3×3 Matrix{Int64}:
0  0
0  0
0  8

Simply because of the triangular zero structures, only the first outer product contributes to the first row and first column of the entire product.

Let the columns of \(\mathbf{L}\) be written as \(\boldsymbol{\ell}_k\) and the rows of \(\mathbf{U}\) be written as \(\mathbf{u}_k^T\). Then the first row of \(\mathbf{L}\mathbf{U}\) is

(2.4.3)#\[\mathbf{e}_1^T \sum_{k=1}^n \boldsymbol{ℓ}_k \mathbf{u}_k^T = \sum_{k=1}^n (\mathbf{e}_1^T \boldsymbol{\ell}_k) \mathbf{u}_k^T = L_{11} \mathbf{u}_1^T.\]

Likewise, the first column of \(\mathbf{L}\mathbf{U}\) is

(2.4.4)#\[\left( \sum_{k=1}^n \mathbf{ℓ}_k \mathbf{u}_k^T\right) \mathbf{e}_1 = \sum_{k=1}^n \mathbf{\ell}_k (\mathbf{u}_k^T \mathbf{e}_1) = U_{11}\boldsymbol{\ell}_1.\]

These two calculations are enough to derive one of the most important algorithms in scientific computing.

Triangular factorization#

Our goal is to factor a given \(n\times n\) matrix \(\mathbf{A}\) as the triangular product \(\mathbf{A}=\mathbf{L}\mathbf{U}\). It turns out that we have \(n^2+n\) total nonzero unknowns in the two triangular matrices, so we set \(L_{11}=\cdots = L_{nn}=1\), making \(\mathbf{L}\) a unit lower triangular matrix.

Demo 2.4.4

For illustration, we work on a \(4 \times 4\) matrix. We name it with a subscript in preparation for what comes.

A₁ = [
     2    0    4     3 
    -4    5   -7   -10 
     1   15    2   -4.5
    -2    0    2   -13
    ];
L = diagm(ones(4))
U = zeros(4,4);

Now we appeal to (2.4.3). Since \(L_{11}=1\), we see that the first row of \(\mathbf{U}\) is just the first row of \(\mathbf{A}_1\).

U[1,:] = A₁[1,:]
U

4×4 Matrix{Float64}:
0  0.0  4.0  3.0
0  0.0  0.0  0.0
0  0.0  0.0  0.0
0  0.0  0.0  0.0

From (2.4.4), we see that we can find the first column of \(\mathbf{L}\) from the first column of \(\mathbf{A}_1\).

L[:,1] = A₁[:,1]/U[1,1]
L

4×4 Matrix{Float64}:
  1.0  0.0  0.0  0.0
 -2.0  1.0  0.0  0.0
  0.5  0.0  1.0  0.0
 -1.0  0.0  0.0  1.0

We have obtained the first term in the sum (2.4.2) for \(\mathbf{L}\mathbf{U}\), and we subtract it away from \(\mathbf{A}_1\).

A₂ = A₁ - L[:,1]*U[1,:]'

4×4 Matrix{Float64}:
0   0.0  0.0    0.0
0   5.0  1.0   -4.0
0  15.0  0.0   -6.0
0   0.0  6.0  -10.0

Now \(\mathbf{A}_2 = \boldsymbol{\ell}_2\mathbf{u}_2^T + \boldsymbol{\ell}_3\mathbf{u}_3^T + \boldsymbol{\ell}_4\mathbf{u}_4^T.\) If we ignore the first row and first column of the matrices in this equation, then in what remains we are in the same situation as at the start. Specifically, only \(\boldsymbol{\ell}_2\mathbf{u}_2^T\) has any effect on the second row and column, so we can deduce them now.

U[2,:] = A₂[2,:]
L[:,2] = A₂[:,2]/U[2,2]
L

4×4 Matrix{Float64}:
  1.0  0.0  0.0  0.0
 -2.0  1.0  0.0  0.0
  0.5  3.0  1.0  0.0
 -1.0  0.0  0.0  1.0

If we subtract off the latest outer product, we have a matrix that is zero in the first two rows and columns.

A₃ = A₂ - L[:,2]*U[2,:]'

4×4 Matrix{Float64}:
0  0.0   0.0    0.0
0  0.0   0.0    0.0
0  0.0  -3.0    6.0
0  0.0   6.0  -10.0

Now we can deal with the lower right \(2\times 2\) submatrix of the remainder in a similar fashion.

U[3,:] = A₃[3,:]
L[:,3] = A₃[:,3]/U[3,3]
A₄ = A₃ - L[:,3]*U[3,:]'

4×4 Matrix{Float64}:
0  0.0  0.0  0.0
0  0.0  0.0  0.0
0  0.0  0.0  0.0
0  0.0  0.0  2.0

Finally, we pick up the last unknown in the factors.

U[4,4] = A₄[4,4];

We now have all of \(\mathbf{L}\),

4×4 Matrix{Float64}:
  1.0  0.0  -0.0  0.0
 -2.0  1.0  -0.0  0.0
  0.5  3.0   1.0  0.0
 -1.0  0.0  -2.0  1.0

and all of \(\mathbf{U}\),

4×4 Matrix{Float64}:
0  0.0   4.0   3.0
0  5.0   1.0  -4.0
0  0.0  -3.0   6.0
0  0.0   0.0   2.0

We can verify that we have a correct factorization of the original matrix by computing the backward error:

A₁ - L*U

4×4 Matrix{Float64}:
0  0.0  0.0  0.0
0  0.0  0.0  0.0
0  0.0  0.0  0.0
0  0.0  0.0  0.0

In floating point, we cannot always expect all the elements to be exactly equal as we found in this toy example. Instead, we would be satisfied to see that each element of the difference above is comparable in size to machine precision.

We have arrived at the linchpin of solving linear systems.

Definition 2.4.5 : LU factorization

Given \(n\times n\) matrix \(\mathbf{A}\), its LU factorization is

(2.4.5)#\[\mathbf{A} = \mathbf{L}\mathbf{U},\]

where \(\mathbf{L}\) is a unit lower triangular matrix and \(\mathbf{U}\) is an upper triangular matrix.

The outer product algorithm for LU factorization seen in Demo 2.4.4 is coded as Function 2.4.6.

Function 2.4.6 : lufact

LU factorization (not stable)

"""
    lufact(A)

Compute the LU factorization of square matrix `A`, returning the
factors.
"""
function lufact(A)
    n = size(A,1)
    L = diagm(ones(n))   # ones on main diagonal, zeros elsewhere
    U = zeros(n,n)
    Aₖ = float(copy(A))

    # Reduction by outer products
    for k in 1:n-1
        U[k,:] = Aₖ[k,:]
        L[:,k] = Aₖ[:,k]/U[k,k]
        Aₖ -= L[:,k]*U[k,:]'
    end
    U[n,n] = Aₖ[n,n]
    return LowerTriangular(L),UpperTriangular(U)
end

About the code

Line 11 of Function 2.4.6 points out two subtle Julia issues. First, vectors and matrix variables are really just references to blocks of memory. Such a reference is much more efficient to pass around than the complete contents of the array. However, it means that a statement Aₖ=A just clones the array reference of A into the variable Aₖ. Any changes made to entries of Aₖ would then also be made to entries of A because they refer to the same location in memory. In this context we don’t want to change the original matrix, so we use copy here to create an independent copy of the array contents and a new reference to them.

The second issue is that even when A has all integer entries, its LU factors may not. So we convert Aₖ to floating point so that line 17 will not fail due to the creation of floating-point values in an integer matrix. An alternative would be to require the caller to provide a floating-point array in the first place.

Gaussian elimination and linear systems#

In your first matrix algebra course, you probably learned a triangularization technique called Gaussian elimination or row elimination to solve a linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\). In most presentations, you form an augmented matrix \([\mathbf{A}\;\mathbf{b}]\) and do row operations until the system reaches an upper triangular form, followed by backward substitution. LU factorization is equivalent to Gaussian elimination in which no row swaps are performed, and the elimination procedure produces the factors if you keep track of the row multipliers appropriately.

Like Gaussian elimination, the primary use of LU factorization is to solve a linear system. It reduces a given linear system to two triangular ones. From this, solving \(\mathbf{A}\mathbf{x}=\mathbf{b}\) follows immediately from associativity:

\[ \mathbf{b} = \mathbf{A} \mathbf{x} = (\mathbf{L} \mathbf{U}) \mathbf{x} = \mathbf{L} (\mathbf{U} \mathbf{x}). \]

Defining \(\mathbf{z} = \mathbf{U} \mathbf{x}\) leads to the following.

Algorithm 2.4.7 : Solution of linear systems by LU factorization (unstable)

Factor \(\mathbf{L}\mathbf{U}=\mathbf{A}\).
Solve \(\mathbf{L}\mathbf{z}=\mathbf{b}\) for \(\mathbf{z}\) using forward substitution.
Solve \(\mathbf{U}\mathbf{x}=\mathbf{z}\) for \(\mathbf{x}\) using backward substitution.

A key advantage of the factorization point of view is that it depends only on the matrix \(\mathbf{A}\). If systems are to be solved for a single \(\mathbf{A}\) but multiple different versions of \(\mathbf{b}\), then the factorization approach is more efficient, as we’ll see in Section 2.5.

Demo 2.4.8

Here are the data for a linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\).

A = [2 0 4 3; -4 5 -7 -10; 1 15 2 -4.5; -2 0 2 -13];
b = [4,9,9,4];

We apply Function 2.4.6 and then do two triangular solves.

L,U = FNC.lufact(A)
z = FNC.forwardsub(L,b)
x = FNC.backsub(U,z)

4-element Vector{Float64}:
 192.66666666666666
 -15.533333333333335
 -65.33333333333333
 -40.0

A check on the residual assures us that we found the solution.

b - A*x

4-element Vector{Float64}:
  0.0
 -5.684341886080802e-14
  2.842170943040401e-14
  0.0

As noted in the descriptions of Function 2.4.6 and Algorithm 2.4.7, the LU factorization as we have seen it so far is not stable for all matrices. In fact, it does not always even exist. The missing element is the row swapping allowed in Gaussian elimination. We will address these issues in Section 2.6.

Exercises#

✍ For each matrix, produce an LU factorization by hand.

(a) \(\quad \displaystyle \begin{bmatrix} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{bmatrix}\qquad\) (b) \(\quad \displaystyle \begin{bmatrix} 6 & -2 & -4 & 4\\ 3 & -3 & -6 & 1 \\ -12 & 8 & 21 & -8 \\ -6 & 0 & -10 & 7 \end{bmatrix}\)
⌨ The matrices

\[\begin{split}\mathbf{T}(x,y) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ x & y & 1 \end{bmatrix},\qquad \mathbf{R}(\theta) = \begin{bmatrix} \cos\theta & \sin \theta & 0 \\ -\sin\theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\end{split}\]

are used to represent translations and rotations of plane points in computer graphics. For the following, let

\[\begin{split}\mathbf{A} = \mathbf{T}(3,-1)\mathbf{R}(\pi/5)\mathbf{T}(-3,1), \qquad \mathbf{z} = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}.\end{split}\]

(a) Find \(\mathbf{b} = \mathbf{A}\mathbf{z}\).

(b) Use Function 2.4.6 to find the LU factorization of \(\mathbf{A}\).

(c) Use the factors with triangular substitutions in order to solve \(\mathbf{A}\mathbf{x}=\mathbf{b}\), and find \(\mathbf{x}-\mathbf{z}\).
⌨ In Julia, define

\[\begin{split}\mathbf{A}= \begin{bmatrix} 1 & 0 & 0 & 0 & 10^{12} \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad \hat{\mathbf{x}} = \begin{bmatrix} 0 \\ 1/3 \\ 2/3 \\ 1 \\ 4/3 \end{bmatrix}, \quad \mathbf{b} = \mathbf{A}\hat{\mathbf{x}}.\end{split}\]

(a) Using Function 2.4.6 and triangular substitutions, solve the linear system \(\mathbf{A}\mathbf{x}=\mathbf{b}\), showing the result. To the nearest integer, how many accurate digits are in the result? (The answer is much less than the full 16 of double precision.)

(b) Repeat part (a) with \(10^{20}\) as the element in the upper right corner. (The result is even less accurate. We will study the causes of such low accuracy in Section 2.8.)
⌨ Let

\[\begin{split} \mathbf{A} = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}. \end{split}\]

Verify computationally that if \(\mathbf{A}=\mathbf{L}\mathbf{U}\) is the LU factorization, then the elements of \(\mathbf{L}\), \(\mathbf{U}\), \(\mathbf{L}^{-1}\), and \(\mathbf{U}^{-1}\) are all integers. Do not rely just on visual inspection of the numbers; perform a more definitive test.
⌨ Function 2.4.6 factors \(\mathbf{A}=\mathbf{L}\mathbf{U}\) in such a way that \(\mathbf{L}\) is a unit lower triangular matrix—that is, has all ones on the diagonal. It is also possible to define the factorization so that \(\mathbf{U}\) is a unit upper triangular matrix instead. Write a function lufact2 that uses Function 2.4.6 without modification to produce this version of the factorization. (Hint: Begin with the standard LU factorization of \(\mathbf{A}^T\).) Demonstrate on a nontrivial \(4\times 4\) example.
When computing the determinant of a matrix by hand, it’s common to use cofactor expansion and apply the definition recursively. But this is terribly inefficient as a function of the matrix size.

(a) ✍ Explain using determinant properties why, if \(\mathbf{A}=\mathbf{L}\mathbf{U}\) is an LU factorization,

\[ \det(\mathbf{A}) = U_{11}U_{22}\cdots U_{nn}=\prod_{i=1}^n U_{ii}.\]

(b) ⌨ Using the result of part (a), write a function determinant(A) that computes the determinant using Function 2.4.6. Test your function on at least two nontriangular \(5\times 5\) matrices, comparing your result to the result of the standard det function.