2.5. Efficiency of matrix computations
Predicting how long an algorithm will take to solve a particular problem, on a particular computer, as written in a particular way in a particular programming language, is an enormously difficult undertaking. It's more practical to predict how the required time will scale as a function of the size of the problem. In the case of a linear system of equations, the problem size is $n$, the number of equations and variables in the system.
Asymptotic analysis
Let $f(n)$ and $g(n)$ be positive-valued functions of an integer $n$.

We say $f(n) = O(g(n))$ ("$f$ is big-O of $g$") as $n \to \infty$ if $f(n)/g(n)$ is bounded above as $n \to \infty$. We say $f(n) \sim g(n)$ ("$f$ is asymptotic to $g$") as $n \to \infty$ if $f(n)/g(n) \to 1$ as $n \to \infty$.[^1]

One immediate consequence is that $f \sim g$ implies $f = O(g)$.
Consider the functions $f(n) = a_1 n^3 + b_1 n$ and $g(n) = a_2 n^3$ in the limit $n \to \infty$. Then

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \lim_{n \to \infty} \frac{a_1 + b_1 n^{-2}}{a_2} = \frac{a_1}{a_2}.$$

Since $a_1/a_2$ is a positive constant, we conclude both that $f = O(g)$ and that $g = O(f)$. However, $f \sim g$ only in the special case $a_1 = a_2$.
Consider $f(n) = \sin(1/n)$, $g(n) = 1/n$, and $h(n) = 1/n^2$. For large $n$, Taylor's theorem with remainder implies that

$$f(n) = \frac{1}{n} - \cos(\xi)\,\frac{1}{6n^3},$$

where $\xi$ lies between $0$ and $1/n$. Therefore

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \lim_{n \to \infty} \left[ 1 - \cos(\xi)\,\frac{1}{6n^2} \right] = 1,$$

and so $f \sim g$. On the other hand,

$$\lim_{n \to \infty} \frac{f(n)}{h(n)} = \lim_{n \to \infty} \left[ n - \cos(\xi)\,\frac{1}{6n} \right] = \infty,$$

so we cannot say that $f = O(h)$.
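The conclusions of this example are easy to check numerically. In this throwaway illustration (our own, not one of the text's demos), the ratio $f/g$ approaches 1, while $f/h$ grows without bound:

```julia
f(n) = sin(1/n);  g(n) = 1/n;  h(n) = 1/n^2
for n in 10 .^ (1:4)
    # f/g approaches 1, while f/h grows like n
    println("n = $n:  f/g = $(f(n)/g(n)),  f/h = $(f(n)/h(n))")
end
```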
It's conventional to use asymptotic notation that is as specific as possible. For instance, while it is true that $n^2 + n = O(n^3)$, it is more informative, and customary, to say $n^2 + n = O(n^2)$.
There is a memorable way to use asymptotic notation to simplify sums:

$$\sum_{k=1}^{n} k \sim \frac{n^2}{2}, \qquad \sum_{k=1}^{n} k^2 \sim \frac{n^3}{3}, \qquad \sum_{k=1}^{n} k^p \sim \frac{n^{p+1}}{p+1} = O\left(n^{p+1}\right). \tag{2.5.1}$$

These formulas greatly resemble the definite integral of $x^p$.
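These equivalences are easy to sanity-check numerically. For instance, in this throwaway snippet (our own, not part of the text), the ratio of $\sum_{k=1}^n k^2$ to $n^3/3$ approaches 1:

```julia
for n in [10, 100, 1000, 10000]
    # Compare the exact sum to its asymptotic approximation n³/3.
    ratio = sum(k^2 for k in 1:n) / (n^3 / 3)
    println("n = $n:  ratio = $ratio")
end
```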
Flop counting
Traditionally, in numerical linear algebra we count floating-point operations, or flops for short. In our interpretation each scalar addition, subtraction, multiplication, division, and square root counts as one flop. Given any algorithm, we simply add up the number of scalar flops and ignore everything else.
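To make the convention concrete, here is a small hand-count on a single expression (our own illustration):

```julia
a, b, c, d = 1.0, 2.0, 3.0, 4.0
s = a*b + c/d    # 1 multiply + 1 divide + 1 add = 3 flops
# Indexing, comparisons, and memory traffic are all ignored by this model.
```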
Here is a straightforward implementation of matrix-vector multiplication.
```julia
n = 6
A,x = randn(n,n),rand(n)
y = zeros(n)
for i in 1:n
    for j in 1:n
        y[i] += A[i,j]*x[j]    # 1 multiply, 1 add
    end
end
```
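As a quick sanity check (not in the original demo), the loop's result should agree with the built-in product up to roundoff:

```julia
@show maximum(abs.(y - A*x));   # expected to be zero or near machine precision
```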
Each of the loops implies a summation of flops. The total flop count for this algorithm is

$$\sum_{i=1}^n \sum_{j=1}^n 2 = 2n^2.$$

Since the matrix $A$ has $n^2$ elements, all of which have to be involved in the product, it seems unlikely that a flop count smaller than $O(n^2)$ is possible in general.
Let's run an experiment with the built-in matrix-vector multiplication. Note that Julia is unusual in that a loop has a variable scope separate from that of the enclosing code. Thus, `for n in n` below means that inside the loop, the name `n` will take on each one of the values that were previously assigned to the vector `n`.
The `push!` function attaches a new value to the end of a vector.
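A throwaway illustration of `push!`:

```julia
v = [1, 2]
push!(v, 3)    # appends 3, so v is now [1, 2, 3]
@show v;
```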
```julia
n = 1000:1000:5000
t = []
for n in n
    A = randn(n,n)
    x = randn(n)
    time = @elapsed for j in 1:80; A*x; end
    push!(t,time)
end
```
The reason for doing multiple repetitions at each value of $n$ is to avoid timing measurements that are too short to be reliable.
```julia
using PrettyTables   # provides pretty_table

pretty_table([n t], header=(["size", "time"], ["", "(sec)"]))
```
```
┌──────┬───────────┐
│ size │   time    │
│      │   (sec)   │
├──────┼───────────┤
│ 1000 │ 0.0169825 │
│ 2000 │  0.090136 │
│ 3000 │  0.217271 │
│ 4000 │  0.295689 │
│ 5000 │  0.400646 │
└──────┴───────────┘
```
Looking at the timings just for $n = 2000$ and $n = 4000$, they have the ratio:
The expression `n.==4000` here produces a vector of Boolean (true/false) values the same size as `n`. This result is used to index within `t`, accessing only the value for which the comparison is true.
```julia
@show t[n.==4000] ./ t[n.==2000];
```

```
t[n .== 4000] ./ t[n .== 2000] = [3.2804724749267775]
```
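Logical indexing of this kind works on any vector. A quick illustration (not part of the demo):

```julia
idx = [10, 20, 30]
vals = [0.1, 0.2, 0.3]
@show vals[idx .== 20];    # selects the entry of vals where idx equals 20
```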
If the run time is dominated by flops, then we expect this ratio to be

$$\frac{(4000)^2}{(2000)^2} = 4.$$
Suppose that the running time $t$ of an algorithm obeys a function that is $O(n^p)$. For sufficiently large $n$,

$$t \approx C n^p$$

for a constant $C$. Taking logs,

$$\log t \approx p\,(\log n) + \log C.$$

So we expect that a graph of $\log t$ as a function of $\log n$ will be a straight line of slope $p$.
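This suggests estimating $p$ by fitting a straight line to $(\log n, \log t)$. A sketch of such a fit (our own addition, assuming the vectors `n` and `t` from the experiment above):

```julia
# Least-squares fit of log(t) ≈ p*log(n) + log(C); the slope estimates p.
logn, logt = log.(n), log.(float.(t))
V = [logn ones(length(logn))]
p, logC = V \ logt
@show p;    # should be near 2 if flops dominate
```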
Let's repeat the experiment of the previous figure for more, and larger, values of $n$.
```julia
randn(5,5)*randn(5);  # throwaway to force compilation

n = 400:200:6000
t = []
for n in n
    A = randn(n,n)
    x = randn(n)
    time = @elapsed for j in 1:50; A*x; end
    push!(t,time)
end
```
Plotting the time as a function of $n$ on log-log scales is equivalent to plotting the logs of both variables.
```julia
using Plots, LaTeXStrings   # plotting and LaTeX-labeled axes

scatter(n,t,label="data",legend=false,
    xaxis=(:log10,L"n"), yaxis=(:log10,"elapsed time (sec)"),
    title="Timing of matrix-vector multiplications")
```
You can see that while the full story is complicated, the graph is trending to a straight line of positive slope. For comparison, we can plot a line that represents $O(n^2)$ growth exactly.
```julia
plot!(n,t[end]*(n/n[end]).^2,l=:dash,
    label=L"O(n^2)",legend=:topleft)
```
Solution of linear systems
Recall the steps of Algorithm 2.4.7 for the system $Ax = b$:

1. Factor $A = LU$ using Gaussian elimination.
2. Solve $Lz = b$ for $z$ using forward substitution.
3. Solve $Ux = z$ for $x$ using backward substitution.
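As an illustration, the three steps can be carried out with Julia's built-in factorization. This is a minimal sketch (our own, using `lu` and its row-pivoting permutation rather than the book's functions):

```julia
using LinearAlgebra

A = [4.0 3 1; 2 5 2; 1 2 6]
b = [1.0, 2.0, 3.0]

F = lu(A)            # step 1: factor PA = LU (with row pivoting)
z = F.L \ b[F.p]     # step 2: forward substitution on the permuted b
x = F.U \ z          # step 3: backward substitution
@show norm(A*x - b);
```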
The second and third steps are solved by Function 2.3.4 and Function 2.3.5. Only one line in each of these functions dominates the arithmetic. Take `forwardsub`, for instance. It has a single flop in line 11. Line 13 computes
```julia
sum( L[i,j]*x[j] for j in 1:i-1 )
```
This line requires $i-1$ multiplications and $i-2$ additions; together with the subtraction and division also performed in line 13, the cost of row $i$ is $2i-1$ flops, and the total over all rows is

$$\sum_{i=1}^{n} (2i - 1).$$

It is not hard to find an exact formula for the sum, but we use (2.5.1) to simplify it to $\sim n^2$.

Solving a triangular $n \times n$ linear system by forward or backward substitution takes $\sim n^2$ flops asymptotically.
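An informal check of the $\sim n^2$ scaling (our own experiment, in the spirit of the demos above): doubling $n$ should roughly quadruple the time of a triangular solve.

```julia
using LinearAlgebra

t_tri = []
for n in [2000, 4000]
    L = LowerTriangular(rand(n, n) + n*I)   # well-conditioned triangular matrix
    b = rand(n)
    L \ b                                   # throwaway solve to force compilation
    push!(t_tri, @elapsed for j in 1:100; L \ b; end)
end
@show t_tri[2] / t_tri[1];                  # expect a ratio near 4
```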
Before counting flops for the LU factorization, we have to admit that Function 2.4.6 (`lufact`) is not written as economically as it could be. Recall from our motivating example in Demo 2.4.4 that we zero out the first row and column of $A$ with the first outer product, the second row and column with the second outer product, and so on. There is no reason to continue computing with rows and columns that have already been zeroed out, so we could replace lines 15–19 of `lufact` with:
```julia
15 for k in 1:n-1
16     U[k,k:n] = Aₖ[k,k:n]
17     L[k:n,k] = Aₖ[k:n,k]/U[k,k]
18     Aₖ[k:n,k:n] -= L[k:n,k]*U[k,k:n]'
19 end
```
We will use the following handy fact.
The range `k:n`, where $k \le n$, has $n-k+1$ elements.
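A one-line confirmation:

```julia
@show length(3:7);    # 7 - 3 + 1 == 5
```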
Line 17 above divides each element of the vector `Aₖ[k:n,k]` by a scalar. Hence the number of flops equals the length of the vector, which is $n-k+1$.

Line 18 has an outer product followed by a matrix subtraction. The definition (1) of the outer product makes it clear that the computation takes one flop (multiplication) per element of the result, which here is $(n-k+1)^2$ flops; the elementwise subtraction then takes the same number again.
Altogether the factorization takes

$$\sum_{k=1}^{n-1} \left[ (n-k+1) + 2(n-k+1)^2 \right] \text{ flops.} \tag{2.5.4}$$

There are different ways to simplify this expression. We will make a change of summation index using $j = n-k$. The endpoints of the sum are $j = n-1$ when $k = 1$ and $j = 1$ when $k = n-1$. Since the order of summation is irrelevant,

$$\sum_{j=1}^{n-1} \left[ (j+1) + 2(j+1)^2 \right] \sim \frac{2}{3} n^3,$$

where the asymptotic value follows from (2.5.1).
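A quick numerical confirmation of this simplification (our own check, not part of the text):

```julia
for n in [100, 1000, 10000]
    # Ratio of the exact flop count (2.5.4) to the asymptotic value 2n³/3.
    exact = sum((n-k+1) + 2*(n-k+1)^2 for k in 1:n-1)
    println("n = $n:  ratio = $(exact / (2n^3/3))")
end
```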
We have proved the following.
The LU factorization of an $n \times n$ matrix takes $\sim \frac{2}{3} n^3$ flops as $n \to \infty$. This dominates the flop count for the solution of an $n \times n$ linear system.
We'll test the conclusion of the $\sim \frac{2}{3} n^3$ flop count experimentally, using the built-in `lu` function instead of the purely instructive `lufact`.
The first time a function is invoked, there may be significant time needed to compile it in memory. Thus, when timing a function, run it at least once before beginning the timing.
```julia
using LinearAlgebra   # provides lu

lu(randn(3,3));   # throwaway to force compilation

n = 400:400:4000
t = []
for n in n
    A = randn(n,n)
    time = @elapsed for j in 1:12; lu(A); end
    push!(t,time)
end
```
We plot the timings on a log-log graph and compare it to $O(n^3)$.
```julia
scatter(n,t,label="data",legend=:topleft,
    xaxis=(:log10,L"n"), yaxis=(:log10,"elapsed time"))
plot!(n,t[end]*(n/n[end]).^3,l=:dash,label=L"O(n^3)")
```
In practice, flops are not the only aspect of an implementation that occupies significant time. Our position is that counting flops as a measure of performance is a useful oversimplification. We will assume that LU factorization (and, as a result, the solution of a linear system of $n$ equations) requires a real-world time that is roughly $O(n^3)$.
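This assumption yields a handy back-of-the-envelope rule: if a solve at size $n_1$ takes time $t_1$, then a solve at size $n_2$ should take roughly $t_1 (n_2/n_1)^3$. A sketch with made-up placeholder numbers (our own illustration):

```julia
t1, n1 = 0.5, 2000          # hypothetical: 0.5 seconds for a 2000×2000 system
n2 = 8000
t2 = t1 * (n2/n1)^3         # cubic-scaling estimate
@show t2;                   # (8000/2000)³ = 64, so about 32 seconds
```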
Exercises
1. ✍ The following are asymptotic assertions about the limit $n \to \infty$. In each case, prove the statement true or false: (a) …; (b) … if …; (c) …; (d) ….

2. ✍ The following are asymptotic assertions about the limit …. In each case, prove the statement true or false: (a) …; (b) … if …; (c) …; (d) ….

3. ✍ Show that the inner product of two $n$-vectors takes exactly $2n-1$ flops.

4. ✍ Show that the multiplication of two $n \times n$ matrices takes $\sim 2n^3$ flops.

5. ✍ This problem is about evaluation of a polynomial $c_1 + c_2 x + \cdots + c_n x^{n-1}$.

   (a) Here is a little code to do the evaluation.

   ```julia
   y = c[1]
   xpow = 1
   for i in 2:n
       xpow *= x
       y += c[i]*xpow
   end
   ```

   Assuming that `x` is a scalar, how many flops does this function take, as a function of $n$?

   (b) Compare the count from (a) to the flop count for Horner's algorithm, Function 1.3.2.

6. The exact sums for $p = 1$ and $p = 2$ in (2.5.1) are as follows:

   $$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$

   (a) ✍ Use these to find the exact result for (2.5.4).

   (b) ⌨ Plot the ratio of your result from (a) and the asymptotic result $\frac{2}{3}n^3$ for all $n$ in …, using a log scale for $n$ and a linear scale for the ratio. (The curve should approach 1 asymptotically.)

7. ✍ Show that for any nonnegative constant integer $m$,

   $$\sum_{k=0}^{n-m} k^p \sim \frac{n^{p+1}}{p+1}.$$

8. ⌨ The `UpperTriangular` and `LowerTriangular` matrix types cause specialized algorithms to be invoked by the backslash. Define

   ```julia
   A = rand(1000,1000)
   B = tril(A)
   C = LowerTriangular(B)
   b = rand(1000)
   ```

   Using `@elapsed` with the backslash solver, time how long it takes to solve the linear system $Ax = b$ 100 times; then do the same for matrices $B$ and $C$. Is the timing for $B$ closer to that of $A$ or to that of $C$? (Hint: Remember to make one timing run without recording results, so that compilation time is not counted.)
[^1]: More precisely, $O(g)$ and $\sim g$ are sets of functions, and $\sim g$ is a subset of $O(g)$. That we write $f = O(g)$ rather than $f \in O(g)$ is a quirk of convention.