1.4. Stability#

If we solve a problem using a computer algorithm and see a large error in the result, we might suspect poor conditioning in the original mathematical problem. But algorithms can also be sources of errors. When error in the result of an algorithm exceeds what conditioning can explain, we call the algorithm unstable.

Case study#

In Example 1.2.7 we showed that finding the roots of a quadratic polynomial \(ax^2 + b x+c\) is poorly conditioned if and only if the roots are close to each other relative to their size. Hence for the polynomial

(1.4.1)#\[p(x) = (x-10^6)(x-10^{-6}) = x^2 - (10^6+10^{-6})x + 1,\]

finding roots is a well-conditioned problem. An obvious algorithm for finding those roots is to directly apply the familiar quadratic formula,

(1.4.2)#\[x_1 = \frac{-b + \sqrt{b^2-4ac}}{2a}, \qquad x_2 = \frac{-b - \sqrt{b^2-4ac}}{2a}.\]
Demo 1.4.1
a = 1;  b = -(1e6+1e-6);  c = 1;
@show x₁ = (-b + sqrt(b^2-4a*c)) / 2a;
@show x₂ = (-b - sqrt(b^2-4a*c)) / 2a;
x₁ = (-b + sqrt(b ^ 2 - (4a) * c)) / (2a) = 1.0e6
x₂ = (-b - sqrt(b ^ 2 - (4a) * c)) / (2a) = 1.00000761449337e-6

The first value is correct to all stored digits, but the second has fewer than six accurate digits:

error = abs(1e-6-x₂)/1e-6 
@show accurate_digits = -log10(error);
accurate_digits = -(log10(error)) = 5.118358987126217

The instability is easily explained. Since \(a=c=1\), we treat them as exact numbers. First, we compute the condition numbers with respect to \(b\) for each elementary step in finding the “good” root:




\(u_1 = b^2\)

\(1.000000000002000\times 10^{12}\)


\(u_2 = u_1 - 4\)

\(9.999999999980000\times 10^{11}\)

\(\approx 1.00\)

\(u_3 = \sqrt{u_2}\)



\(u_4 = u_3 - b\)


\(\approx 0.500\)

\(u_5 = u_4/2\)



Using (1.2.9), the chain rule for condition numbers, the conditioning of the entire chain is the product of the individual steps, so there is essentially no growth of relative error here. However, if we use the quadratic formula for the “bad” root, the next-to-last step becomes \(u_4=(-u_3) - b\), and now \(\kappa=|u_3|/|u_4|\approx 5\times 10^{11}\). So we can expect to lose 11 digits of accuracy, which is what we observed. The key issue is the subtractive cancellation in this one step.

Demo 1.4.1 suggests that the venerable quadratic formula is an unstable means of computing roots in finite precision. The roots themselves were not sensitive to the data or arithmetic—it’s the specific computational path we chose that caused the huge growth in errors.

We can confirm this conclusion by finding a different path that avoids subtractive cancellation. A little algebra using (1.4.2) confirms the additional formula \(x_1x_2=c/a\). So given one root \(r\), we compute the other root using \(c/ar\), which has only multiplication and division and therefore creates no numerical trouble.

Demo 1.4.2

We repeat the rootfinding experiment of Demo 1.4.1 with an alternative algorithm.

a = 1;  b = -(1e6+1e-6);  c = 1;

First, we find the “good” root using the quadratic formula.

@show x₁ = (-b + sqrt(b^2-4a*c)) / 2a;
x₁ = (-b + sqrt(b ^ 2 - (4a) * c)) / (2a) = 1.0e6

Then we use the identity \(x_1x_2=\frac{c}{a}\) to compute the smaller root:

@show x₂ = c / (a*x₁);
x₂ = c / (a * x₁) = 1.0e-6

As you see in this output, Julia often suppresses trailing zeros in a decimal expansion. To be sure we have an accurate result, we compute its relative error.

abs(x₂-1e-6) / 1e-6

The algorithms in Demo 1.4.1 and Demo 1.4.2 are equivalent when using real numbers and exact arithmetic. When results are perturbed by machine representation at each step, though, the effects may depend dramatically on the specific sequence of operations, thanks to the chain rule (1.2.9).

Observation 1.4.3

The sensitivity of a problem \(f(x)\) is governed only by \(\kappa_f\), but the sensitivity of an algorithm depends on the condition numbers of all of its individual steps.

This situation may seem hopelessly complicated. But the elementary operations we take for granted, such as those in Table 1.2.1, are well-conditioned in most circumstances. Exceptions usually occur when \(|f(x)|\) is much smaller than \(|x|\), although not every such case signifies trouble. The most common culprit is simple subtractive cancellation.

A practical characterization of instability is that results are much less accurate than the conditioning of the problem can explain. Typically one should apply an algorithm to test problems whose answers are well-known, or for which other programs are known to work well, in order to spot possible instabilities. In the rest of this book we will see some specific ways in which instability is manifested for different types of problems.

Backward error#

In the presence of poor conditioning for a problem \(f(x)\), even just the act of rounding the data to floating point may introduce a large change in the result. It’s not realistic, then, to expect any algorithm \(\tilde{f}\) to have a small error in the sense \(\tilde{f}(x)\approx f(x)\). There is another way to characterize the error, though, that can be a useful alternative measurement. Instead of asking, “Did you get nearly the right answer?”, we ask, “Did you answer nearly the right question?”

Definition 1.4.4 :  Backward error

Let \(\tilde{f}\) be an algorithm for the problem \(f\). Let \(y=f(x)\) be an exact result and \(\tilde{y}=\tilde{f}(x)\) be its approximation by the algorithm. If there is a value \(\tilde{x}\) such that \(f(\tilde{x}) = \tilde{y}\), then the relative backward error in \(\tilde{y}\) is

(1.4.3)#\[\frac{ |\tilde{x}-x| } { |x| }. \]

The absolute backward error is \(|\tilde{x}-x|\).

Backward error measures the change to the original data that reproduces the result that was found by the algorithm. The situation is illustrated in Fig. 1.4.1.


Fig. 1.4.1 Backward error is the difference between the original data and the data that exactly produces the computed value.#

Demo 1.4.5

Our first step is to construct a polynomial with six known roots.

r = [-2.0,-1,1,1,3,6]
p = fromroots(r)
36.0 - 36.0∙x - 43.0∙x2 + 44.0∙x3 + 6.0∙x4 - 8.0∙x5 + 1.0∙x6

Now we use a standard numerical method for finding those roots, pretending that we don’t know them already. This corresponds to \(\tilde{y}\) in Definition 1.4.4.

 = sort(roots(p))   # type r\tilde and then press Tab
6-element Vector{Float64}:
println("Root errors:") 
@. abs(r-) / r
Root errors:
6-element Vector{Float64}:

It seems that the forward error is acceptably close to machine epsilon for double precision in all cases except the double root at \(x=1\). This is not a surprise, though, given the poor conditioning at such roots.

Let’s consider the backward error. The data in the rootfinding problem are the polynomial coefficients. We can apply fromroots to find the coefficients of the polynomial (that is, the data) whose roots were actually computed by the numerical algorithm. This corresponds to \(\tilde{x}\) in Definition 1.4.4.

 = fromroots()
35.99999999999993 - 35.999999999999915∙x - 42.99999999999997∙x2 + 43.99999999999998∙x3 + 5.999999999999973∙x4 - 7.999999999999991∙x5 + 1.0∙x6

We find that in a relative sense, these coefficients are very close to those of the original, exact polynomial:

c, = coeffs(p),coeffs()
println("Coefficient errors:") 
@. abs(c-) / c
Coefficient errors:
7-element Vector{Float64}:

In summary, even though there are some computed roots relatively far from their correct values, they are nevertheless the roots of a polynomial that is very close to the original.

Small backward error is the best we can hope for in a poorly conditioned problem. Without getting into the formal details, know that if an algorithm always produces small backward errors, then it is stable. But the converse is not always true: some stable algorithms may produce a large backward error.

Example 1.4.6

One stable algorithm that is not backward stable is floating-point evaluation for our old friend, \(f(x)=x+1\). If \(|x|<\epsilon_\text{mach}/2\), then the computed result is \(\tilde{f}(x)=1\), since there are no floating-point numbers between \(1\) and \(1+\epsilon_\text{mach}\). Hence the only possible choice for a real number \(\tilde{x}\) satisfying (1.4.3) is \(\tilde{x}=0\). But then \(|\tilde{x}-x|/|x|=1\), which indicates 100% backward error!


  1. The formulas

    \[f(x)=\frac{1-\cos(x)}{\sin(x)}, \quad g(x) = \frac{2\sin^2(x/2)}{\sin(x)},\]

    are mathematically equivalent, but they suggest evaluation algorithms that can behave quite differently in floating point.

    (a) ✍ Using (1.2.6), find the relative condition number of \(f\). (Because \(f\) and \(g\) are equivalent, the condition number of \(g\) is the same.) Show that it approaches 1 as \(x\to 0\). (Hence it should be possible to compute the function accurately near zero.)

    (b) ⌨ Compute \(f(10^{-6})\) using a sequence of four elementary operations. Using Table 1.2.1, make a table like the one in Demo 1.4.1 that shows the result of each elementary result and the numerical value of the condition number of that step.

    (c) ⌨ Repeat part (b) for \(g(10^{-6})\), which has six elementary steps.

    (d) ✍ Based on parts (b) and (c), is the numerical value of \(f(10^{-6})\) more accurate, or is \(g(10^{-6})\) more accurate?

  2. Let \(f(x) = \frac{e^x-1}{x}\).

    (a) ✍ Find the condition number \(\kappa_f(x)\). What is the maximum of \(\kappa_f(x)\) over \(-1\le x \le 1\)?

    (b) ⌨ Use the “obvious” algorithm

    (exp(x)-1) / x

    to compute \(f(x)\) at \(x=10^{-2},10^{-3},10^{-4},\ldots,10^{-11}\).

    (c) ⌨ Create a second algorithm from the first 8 terms of the Maclaurin series, i.e.,

    \[p(x) = 1 + \frac{1}{2!}x + \frac{1}{3!}x^2 + \cdots + \frac{1}{8!}x^8.\]

    Evaluate it at the same values of \(x\) as in part (b).

    (d) ⌨ Make a table of the relative difference between the two algorithms as a function of \(x\). Which algorithm is more accurate, and why?

  3. ⌨ The function

    \[x = \cosh(y) = \frac{e^y + e^{-y}}{2}\]

    can be inverted to yield a formula for \(\operatorname{acosh}(x)\):

    (1.4.4)#\[\operatorname{acosh}(x) = y = \log\bigl(x-\sqrt{x^2-1}\bigr).\]

    For the steps below, define \(y_i=-4i\) and \(x_i=\cosh(y_i)\) for \(i=1,\dots,4\). Hence \(y_i=\operatorname{acosh}(x_i)\).

    (a) Find the relative condition number of evaluating \(f(x) = \operatorname{acosh}(x)\). (You can use (1.4.4) or look up a formula for \(f'\) in a calculus book.) Evaluate \(\kappa_f\) at all the \(x_i\). (You will find that the problem is well-conditioned at these inputs.)

    (b) Use (1.4.4) to approximate \(f(x_i)\) for all \(i\). Compute the relative accuracy of the results. Why are some of the results so inaccurate?

    (c) An alternative formula is

    (1.4.5)#\[y = -2\log\left(\sqrt{\frac{x+1}{2}} + \sqrt{\frac{x-1}{2}}\right).\]

    Apply (1.4.5) to approximate \(f(x_i)\) for all \(i\), again computing the relative accuracy of the results.

  4. ⌨ (Continuation of Exercise 1.3.2. Adapted from [Hig02].) One drawback of the formula (1.3.2) for sample variance is that you must compute a sum for \(\overline{x}\) before beginning another sum to find \(s^2\). Some statistics textbooks quote a single-loop formula

    \[\begin{split}\begin{split} s^2 &= \frac{1}{n-1} \left( u - \tfrac{1}{n}v^2 \right),\\ u & = \sum_{i=1}^n x_i^2, \\ v &= \sum_{i=1}^n x_i. \end{split}\end{split}\]

    Try this formula for these three datasets, each of which has a variance exactly equal to 1:

    x = [ 1e6, 1+1e6, 2+1e6 ]
    x = [ 1e7, 1+1e7, 2+1e7 ]
    x = [ 1e8, 1+1e8, 2+1e8 ]

    Explain the results.