```
```{raw} latex
%%start demo%%
```
Here is a straightforward implementation of matrix-vector multiplication.
```{code-cell}
n = 6
A,x = randn(n,n),rand(n)
y = zeros(n)
for i in 1:n
    for j in 1:n
        y[i] += A[i,j]*x[j]    # 1 multiply, 1 add
    end
end
```
Each of the loops implies a summation of flops. The total flop count for this algorithm is
$$ \sum_{i=1}^n \sum_{j=1}^n 2 = \sum_{i=1}^n 2n = 2n^2. $$
Since the matrix $\mathbf{A}$ has $n^2$ elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than $O(n^2)$ in the general case.
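As a sanity check on this count, we can instrument the double loop to tally flops as they occur. This is a throwaway sketch, not part of the algorithm: the counter `flops` and the function wrapper `matvec_flops` are ours, introduced only so the tally is easy to verify.

```julia
# Count the flops in the straightforward matrix-vector product.
# Wrapping the loop in a function keeps the counter in local scope.
function matvec_flops(n)
    A, x = randn(n, n), rand(n)
    y = zeros(n)
    flops = 0
    for i in 1:n
        for j in 1:n
            y[i] += A[i, j] * x[j]    # 1 multiply, 1 add
            flops += 2
        end
    end
    return flops
end

@show matvec_flops(6)    # 2n^2 = 72 for n = 6
```

The count is independent of the matrix entries; only the size $n$ matters.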
```{index} ! Julia; push\!, ! Julia; for
```
::::{panels}
:column: col-7 left-side
:card: border-0 shadow-none
```{raw} latex
\begin{minipage}[t]{0.5\textwidth}
```
Let's run an experiment with the built-in matrix-vector multiplication. Note that Julia is unusual in that a loop body has a variable scope separate from that of the enclosing code. Thus, `for n in n` below means that inside the loop, the name `n` takes on each of the values previously assigned to the vector `n`.
```{raw} latex
\end{minipage}\hfill
```
---
:column: col-5 right-side
:card: shadow-none comment
```{raw} latex
\begin{minipage}[t]{0.4\textwidth}\begin{mdframed}[default]\small
```
The `push!` function attaches a new value to the end of a vector.
```{raw} latex
\end{mdframed}\end{minipage}
```
::::
```{code-cell}
n = 1000:1000:5000
t = []
for n in n
    A = randn(n,n)
    x = randn(n)
    time = @elapsed for j in 1:80; A*x; end
    push!(t,time)
end
```
The reason for doing multiple repetitions at each value of $n$ in the loop above is to avoid having times so short that the resolution of the timer is significant.
```{code-cell}
pretty_table((n=n,t=t),["size" "time";"" "(sec)"])
```
```{index} Julia; Boolean indexing
```
::::{panels}
:column: col-7 left-side
:card: border-0 shadow-none
```{raw} latex
\begin{minipage}[t]{0.5\textwidth}
```
Comparing the timings for just $n=2000$ and $n=4000$, we find that their ratio is
```{raw} latex
\end{minipage}\hfill
```
---
:column: col-5 right-side
:card: shadow-none comment
```{raw} latex
\begin{minipage}[t]{0.4\textwidth}\begin{mdframed}[default]\small
```
The expression `n.==4000` here produces a vector of Boolean (true/false) values the same size as `n`. This result is used to index within `t`, accessing only the value for which the comparison is true.
```{raw} latex
\end{mdframed}\end{minipage}
```
::::
```{code-cell}
@show t[n.==4000] ./ t[n.==2000];
```
If the run time is dominated by flops, then we expect this ratio to be $(4000)^2/(2000)^2=4$.
```{raw} html

```
```{raw} latex
%%end demo%%
```
Suppose that the running time $t$ of an algorithm obeys a function that is $O(n^p)$. For sufficiently large $n$, $t\approx Cn^p$ for a constant $C$ should be a good approximation. Hence
```{math}
:label: loglogfit
t \approx Cn^p \qquad \Longleftrightarrow \qquad \log t \approx p(\log n) + \log C.
```
So we expect that a graph of $\log t$ as a function of $\log n$ will be a straight line of slope $p$.
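As a quick sketch of this idea, we can recover $p$ and $\log C$ from data by a least-squares fit of $\log t$ against $\log n$. The data below are synthetic, generated from an exact power law rather than from real timings, so the fitted slope should reproduce the chosen exponent.

```julia
# Synthetic power-law "timings": t = C*n^p with known C and p.
n = [400, 800, 1600, 3200]
C, p = 5e-9, 2
t = C .* n .^ p

# Fit log(t) ≈ (log C) + p*log(n) by linear least squares.
V = [ones(length(n)) log.(n)]
c = V \ log.(t)
logC, slope = c[1], c[2]

@show slope    # should match p = 2 up to rounding
```

With real timings the fit is only approximate, but for large $n$ the slope estimates the exponent of growth.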
(demo-flops-loglog)=
```{proof:demo}
```
```{raw} html
```
```{raw} latex
%%start demo%%
```
Let's repeat the experiment of the previous demo for more, and larger, values of $n$.
```{code-cell}
randn(5,5)*randn(5); # throwaway to force compilation
n = 400:200:6000
t = []
for n in n
    A = randn(n,n)
    x = randn(n)
    time = @elapsed for j in 1:50; A*x; end
    push!(t,time)
end
```
Plotting the time as a function of $n$ on log-log scales is equivalent to plotting the logs of the variables.
```{code-cell}
scatter(n,t,label="data",legend=false,
    xaxis=(:log10,L"n"), yaxis=(:log10,"elapsed time (sec)"),
    title="Timing of matrix-vector multiplications")
```
You can see that while the full story is complicated, the graph is trending toward a straight line of positive slope. For comparison, we can plot a line that represents $O(n^2)$ growth exactly; on log-log axes, any such line has slope equal to 2.
```{code-cell}
plot!(n,t[end]*(n/n[end]).^2,l=:dash,
    label=L"O(n^2)",legend=:topleft)
```
```{raw} html

```
```{raw} latex
%%end demo%%
```
## Solution of linear systems
Recall the steps of {numref}`Algorithm {number}
```
```{raw} latex
%%start demo%%
```
::::{panels}
:column: col-7 left-side
:card: border-0 shadow-none
```{raw} latex
\begin{minipage}[t]{0.5\textwidth}
```
We'll test the conclusion of $O(n^3)$ flops experimentally, using the built-in `lu` function instead of the purely instructive `lufact`.
```{raw} latex
\end{minipage}\hfill
```
---
:column: col-5 right-side
:card: shadow-none comment
```{raw} latex
\begin{minipage}[t]{0.4\textwidth}\begin{mdframed}[default]\small
```
The first time a function is invoked, there may be significant time needed to compile it in memory. Thus, when timing a function, run it at least once before beginning the timing.
```{raw} latex
\end{mdframed}\end{minipage}
```
::::
```{code-cell}
lu(randn(3,3)); # throwaway to force compilation
n = 400:400:4000
t = []
for n in n
    A = randn(n,n)
    time = @elapsed for j in 1:12; lu(A); end
    push!(t,time)
end
```
We plot the timings on a log-log graph and compare them to $O(n^3)$. The result could vary significantly from machine to machine, but in theory the data should start to parallel the line as $n\to\infty$.
```{code-cell}
scatter(n,t,label="data",legend=:topleft,
    xaxis=(:log10,L"n"), yaxis=(:log10,"elapsed time"))
plot!(n,t[end]*(n/n[end]).^3,l=:dash,label=L"O(n^3)")
```
```{raw} html

```
```{raw} latex
%%end demo%%
```
In practice, flops are not the only aspect of an implementation that occupies significant time. Our position is that counting flops as a measure of performance is a useful oversimplification. We will assume that LU factorization (and as a result, the solution of a linear system of $n$ equations) requires a real-world time that is roughly $O(n^3)$. This growth rate is a great deal more tolerable than, say, $O(2^n)$, but it does mean that for (at this writing) $n$ greater than 10,000 or so, something other than general LU factorization will have to be used.
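To get a feel for what cubic growth means in practice, here is a back-of-envelope extrapolation. The baseline time is a made-up illustration, not a measurement; only the $O(n^3)$ scaling relation itself comes from the discussion above.

```julia
# If LU factorization takes T0 seconds at size n0 and scales like O(n^3),
# then the predicted time at size n is T0*(n/n0)^3.
# T0 = 0.02 sec at n0 = 1000 is a hypothetical baseline, chosen for illustration.
n0, T0 = 1000, 0.02
predict(n) = T0 * (n / n0)^3

@show predict(10_000)     # 10^3 times the baseline
@show predict(100_000)    # 10^6 times the baseline: hours, not seconds
```

Doubling $n$ multiplies the predicted time by 8, which is why general LU factorization becomes impractical well before memory runs out.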
## Exercises
1. ✍ The following are asymptotic assertions about the limit $n\rightarrow\infty$. In each case, prove the statement true or false.
**(a)** $n^2 = O(\log n),\quad$
**(b)** $n^{a} = O(n^b)$ if $a\le b,\quad$
**(c)** $e^n \sim e^{2n},\quad$
**(d)** $n+\sqrt{n}\sim n+2\sqrt{n}$.
2. ✍ The following are asymptotic assertions about the limit $h\to 0$. In each case, prove the statement true or false.
**(a)** $h^2\log(h) = O(h^3),\quad$
**(b)** $h^{a} = O(h^b)$ if $a < b,\quad$
**(c)** $\sin(h) \sim h,\quad$
**(d)** $(e^{2h}-1)\sim h$.
3. ✍ Show that the inner product of two $n$-vectors takes exactly $2n-1$ flops.
4. ✍ Show that the multiplication of two $n\times n$ matrices takes $\sim 2n^3$ flops.
5. ✍ This problem is about evaluation of a polynomial $c_1 + c_2 x + \cdots + c_{n}x^{n-1}$.
**(a)** Here is a little code to do the evaluation.
``` julia
y = c[1]
xpow = 1
for i in 2:n
    xpow *= x
    y += c[i]*xpow
end
```
Assuming that `x` is a scalar, how many flops does this code require, as a function of $n$?
**(b)** Compare the count from (a) to the flop count for Horner's algorithm, {numref}`Function {number}