\n",
"```\n",
"\n",
"```{raw} latex\n",
"%%start demo%%\n",
"```\n",
"\n",
"\n",
"Here is a straightforward implementation of matrix-vector multiplication."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3dc1c88e",
"metadata": {},
"outputs": [],
"source": [
"n = 6\n",
"A,x = randn(n,n),rand(n)\n",
"y = zeros(n)\n",
"for i in 1:n\n",
" for j in 1:n\n",
" y[i] += A[i,j]*x[j] # 1 multiply, 1 add\n",
" end\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "96d87e73",
"metadata": {},
"source": [
"Each of the loops implies a summation of flops. The total flop count for this algorithm is\n",
"\n",
"$$ \\sum_{i=1}^n \\sum_{j=1}^n 2 = \\sum_{i=1}^n 2n = 2n^2. $$\n",
"\n",
"Since the matrix $\\mathbf{A}$ has $n^2$ elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than $O(n^2)$ in the general case.\n",
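"\n",
"As a quick sanity check (a sketch, not part of the algorithm itself), we can count the operations in the double loop explicitly:\n",
"\n",
"```julia\n",
"# Count the multiplications and additions in the double loop explicitly.\n",
"function count_flops(n)\n",
"    flops = 0\n",
"    for i in 1:n, j in 1:n\n",
"        flops += 2    # one multiply and one add per inner iteration\n",
"    end\n",
"    return flops\n",
"end\n",
"\n",
"count_flops(6) == 2*6^2    # true\n",
"```\n",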
"\n",
"```{index} ! Julia; push\\!, ! Julia; for\n",
"```\n",
"\n",
"::::{panels}\n",
":column: col-7 left-side\n",
":card: border-0 shadow-none\n",
"```{raw} latex\n",
"\\begin{minipage}[t]{0.5\\textwidth}\n",
"```\n",
"Let's run an experiment with the built-in matrix-vector multiplication. Note that Julia is unusual in that a loop body has a variable scope separate from that of the enclosing code. Thus, `for n in n` below means that inside the loop, the name `n` takes on each of the values previously assigned to the vector `n`.\n",
"\n",
"```{raw} latex\n",
"\\end{minipage}\\hfill\n",
"```\n",
"---\n",
":column: col-5 right-side\n",
":card: shadow-none comment\n",
"```{raw} latex\n",
"\\begin{minipage}[t]{0.4\\textwidth}\\begin{mdframed}[default]\\small\n",
"```\n",
"The `push!` function attaches a new value to the end of a vector.\n",
"```{raw} latex\n",
"\\end{mdframed}\\end{minipage}\n",
"```\n",
"::::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a030bebc",
"metadata": {},
"outputs": [],
"source": [
"n = 1000:1000:5000\n",
"t = []\n",
"for n in n\n",
" A = randn(n,n) \n",
" x = randn(n)\n",
" time = @elapsed for j in 1:80; A*x; end\n",
" push!(t,time)\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "855a709f",
"metadata": {},
"source": [
"The reason for doing multiple repetitions at each value of $n$ in the loop above is to avoid having times so short that the resolution of the timer is significant."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e78c5efe",
"metadata": {},
"outputs": [],
"source": [
"pretty_table((n=n,t=t),[\"size\" \"time\";\"\" \"(sec)\"])"
]
},
{
"cell_type": "markdown",
"id": "f3cf4efd",
"metadata": {},
"source": [
"```{index} Julia; Boolean indexing\n",
"```\n",
"\n",
"::::{panels}\n",
":column: col-7 left-side\n",
":card: border-0 shadow-none\n",
"```{raw} latex\n",
"\\begin{minipage}[t]{0.5\\textwidth}\n",
"```\n",
"Looking at just the timings for $n=2000$ and $n=4000$, we find that their ratio is\n",
"\n",
"```{raw} latex\n",
"\\end{minipage}\\hfill\n",
"```\n",
"---\n",
":column: col-5 right-side\n",
":card: shadow-none comment\n",
"```{raw} latex\n",
"\\begin{minipage}[t]{0.4\\textwidth}\\begin{mdframed}[default]\\small\n",
"```\n",
"The expression `n.==4000` here produces a vector of Boolean (true/false) values the same size as `n`. This result is used to index within `t`, accessing only the value for which the comparison is true.\n",
"```{raw} latex\n",
"\\end{mdframed}\\end{minipage}\n",
"```\n",
"::::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aaa9b7b2",
"metadata": {},
"outputs": [],
"source": [
"@show t[n.==4000] ./ t[n.==2000];"
]
},
{
"cell_type": "markdown",
"id": "d0b79eb1",
"metadata": {},
"source": [
"If the run time is dominated by flops, then we expect this ratio to be $(4000)^2/(2000)^2=4$.\n",
"```{raw} html\n",
"\n",
"```\n",
"\n",
"```{raw} latex\n",
"%%end demo%%\n",
"```\n",
"\n",
"Suppose that the running time $t$ of an algorithm is $O(n^p)$ as a function of the problem size $n$. For sufficiently large $n$, $t\\approx Cn^p$ for a constant $C$ should be a good approximation. Hence\n",
"\n",
"```{math}\n",
" :label: loglogfit\n",
" t \\approx Cn^p \\qquad \\Longleftrightarrow \\qquad \\log t \\approx p(\\log n) + \\log C.\n",
"```\n",
"\n",
"So we expect that a graph of $\\log t$ as a function of $\\log n$ will be a straight line of slope $p$.\n",
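"\n",
"As a sketch of how $p$ can be estimated from timing data (the vectors `n` and `t` here are synthetic, not measured timings), we can fit a line to the logs by least squares:\n",
"\n",
"```julia\n",
"# Hypothetical data obeying t = C*n^p exactly, with p = 2 and C = 2e-9.\n",
"n = [400, 800, 1600, 3200]\n",
"t = 2e-9 * n.^2\n",
"# Least-squares fit of log(t) = p*log(n) + log(C), per the relation above.\n",
"V = [log.(n) ones(length(n))]\n",
"p, logC = V \\ log.(t)    # p comes out very close to 2\n",
"```\n",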
"\n",
"\n",
"(demo-flops-loglog)=\n",
"```{proof:demo}\n",
"```\n",
"\n",
"```{raw} html\n",
"\n",
"```\n",
"\n",
"```{raw} latex\n",
"%%start demo%%\n",
"```\n",
"\n",
"Let's repeat the experiment of the previous figure for more, and larger, values of $n$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7a564ce",
"metadata": {},
"outputs": [],
"source": [
"randn(5,5)*randn(5); # throwaway to force compilation\n",
"\n",
"n = 400:200:6000\n",
"t = []\n",
"for n in n\n",
" A = randn(n,n) \n",
" x = randn(n)\n",
" time = @elapsed for j in 1:50; A*x; end\n",
" push!(t,time)\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "04c3b6ff",
"metadata": {},
"source": [
"Plotting the time as a function of $n$ on log-log scales is equivalent to plotting the logs of the variables."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbeb4226",
"metadata": {},
"outputs": [],
"source": [
"scatter(n,t,label=\"data\",legend=false,\n",
" xaxis=(:log10,L\"n\"), yaxis=(:log10,\"elapsed time (sec)\"),\n",
" title=\"Timing of matrix-vector multiplications\")"
]
},
{
"cell_type": "markdown",
"id": "75b5a2a9",
"metadata": {},
"source": [
"You can see that while the full story is complicated, the graph is trending toward a straight line of positive slope. For comparison, we can plot a line that represents $O(n^2)$ growth exactly. (On log-log axes, all such lines have slope equal to 2.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38ebae3c",
"metadata": {},
"outputs": [],
"source": [
"plot!(n,t[end]*(n/n[end]).^2,l=:dash,\n",
" label=L\"O(n^2)\",legend=:topleft)"
]
},
{
"cell_type": "markdown",
"id": "65a437ce",
"metadata": {},
"source": [
"```{raw} html\n",
"\n",
"```\n",
"\n",
"```{raw} latex\n",
"%%end demo%%\n",
"```\n",
"\n",
"## Solution of linear systems\n",
"\n",
"Recall the steps of {numref}`Algorithm {number} \n",
"\n",
"```{raw} latex\n",
"%%start demo%%\n",
"```\n",
"\n",
"::::{panels}\n",
":column: col-7 left-side\n",
":card: border-0 shadow-none\n",
"```{raw} latex\n",
"\\begin{minipage}[t]{0.5\\textwidth}\n",
"```\n",
"We'll test the conclusion of $O(n^3)$ flops experimentally, using the built-in `lu` function instead of the purely instructive `lufact`.\n",
"\n",
"```{raw} latex\n",
"\\end{minipage}\\hfill\n",
"```\n",
"---\n",
":column: col-5 right-side\n",
":card: shadow-none comment\n",
"```{raw} latex\n",
"\\begin{minipage}[t]{0.4\\textwidth}\\begin{mdframed}[default]\\small\n",
"```\n",
"The first time a function is invoked, there may be significant time needed to compile it in memory. Thus, when timing a function, run it at least once before beginning the timing.\n",
"```{raw} latex\n",
"\\end{mdframed}\\end{minipage}\n",
"```\n",
"::::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa7ccbea",
"metadata": {},
"outputs": [],
"source": [
"lu(randn(3,3)); # throwaway to force compilation\n",
"\n",
"n = 400:400:4000\n",
"t = []\n",
"for n in n\n",
" A = randn(n,n) \n",
" time = @elapsed for j in 1:12; lu(A); end\n",
" push!(t,time)\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "9389638b",
"metadata": {},
"source": [
"We plot the timings on a log-log graph and compare them to $O(n^3)$. The result could vary significantly from machine to machine, but in theory the data should start to parallel the line as $n\\to\\infty$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8861611f",
"metadata": {},
"outputs": [],
"source": [
"scatter(n,t,label=\"data\",legend=:topleft,\n",
" xaxis=(:log10,L\"n\"), yaxis=(:log10,\"elapsed time\"))\n",
"plot!(n,t[end]*(n/n[end]).^3,l=:dash,label=L\"O(n^3)\")"
]
},
{
"cell_type": "markdown",
"id": "9e9852aa",
"metadata": {},
"source": [
"```{raw} html\n",
"\n",
"```\n",
"\n",
"```{raw} latex\n",
"%%end demo%%\n",
"```\n",
"\n",
"\n",
"In practice, flops are not the only aspect of an implementation that occupies significant time. Our position is that counting flops as a measure of performance is a useful oversimplification. We will assume that LU factorization (and as a result, the solution of a linear system of $n$ equations) requires a real-world time that is roughly $O(n^3)$. This growth rate is a great deal more tolerable than, say, $O(2^n)$, but it does mean that for (at this writing) $n$ greater than 10,000 or so, something other than general LU factorization will have to be used.\n",
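"\n",
"To put these growth rates in perspective (a rough sketch; the flop rate is a hypothetical assumption, not a measured value), compare the work implied by each at a nominal $10^9$ flops per second:\n",
"\n",
"```julia\n",
"rate = 1e9                     # hypothetical machine speed, flops/sec\n",
"t_cubic = (1e4)^3 / rate       # O(n^3) work at n = 10,000: about 10^3 sec\n",
"t_expon = big(2.0)^100 / rate  # O(2^n) work at just n = 100: over 10^20 sec\n",
"```\n",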
"\n",
"## Exercises\n",
"\n",
"1. ✍ The following are asymptotic assertions about the limit $n\\rightarrow\\infty$. In each case, prove the statement true or false.\n",
"\n",
" **(a)** $n^2 = O(\\log n),\\quad$ \n",
" **(b)** $n^{a} = O(n^b)$ if $a\\le b,\\quad$\n",
" **(c)** $e^n \\sim e^{2n},\\quad$\n",
" **(d)** $n+\\sqrt{n}\\sim n+2\\sqrt{n}$.\n",
"\n",
"2. ✍ The following are asymptotic assertions about the limit $h\\to 0$. In each case, prove the statement true or false.\n",
"\n",
" **(a)** $h^2\\log(h) = O(h^3),\\quad$\n",
" **(b)** $h^{a} = O(h^b)$ if $a < b,\\quad$\n",
" **(c)** $\\sin(h) \\sim h,\\quad$\n",
" **(d)** $(e^{2h}-1)\\sim h$.\n",
"\n",
"3. ✍ Show that the inner product of two $n$-vectors takes exactly $2n-1$ flops.\n",
"\n",
"4. ✍ Show that the multiplication of two $n\\times n$ matrices takes $\\sim 2n^3$ flops.\n",
"\n",
"5. ✍ This problem is about evaluation of a polynomial $c_1 + c_2 x + \\cdots + c_{n}x^{n-1}$.\n",
" \n",
" **(a)** Here is a little code to do the evaluation.\n",
"\n",
" ``` julia\n",
" y = c[1]\n",
" xpow = 1\n",
" for i in 2:n\n",
" xpow *= x\n",
" y += c[i]*xpow\n",
" end\n",
" ```\n",
"\n",
" Assuming that `x` is a scalar, how many flops does this function take, as a function of $n$?\n",
"\n",
" **(b)** Compare the count from (a) to the flop count for Horner's algorithm, {numref}`Function {number}