6.5. Adaptive Runge–Kutta#
The derivation and analysis of methods for initial-value problems usually assumes a fixed step size \(h\). While the error behavior \(O(h^p)\) is guaranteed by Theorem 6.2.7 as \(h\rightarrow 0\), this bound comes with an unknowable constant, and it is not very useful as a guide to the numerical value of the error at any particular value of \(h\). Furthermore, as we saw in Section 5.7 for numerical integration, in many problems a fixed step size is far from the most efficient strategy.
In response we will employ the basic strategy of Section 5.7: estimate the error and adapt the step size in order to reach an accuracy goal. Unlike the integration problem, though, the “integrand” of an IVP is dependent on the solution itself, so the details differ greatly.
Step size prediction#
Suppose that, starting from a given value \(u_i\) and using a step size \(h\), we run one step of two different RK methods simultaneously: one method with order \(p\), producing \(u_{i+1}\), and the other method with order \(p+1\), producing \(\tilde{u}_{i+1}\). In most circumstances, we can expect that \(\tilde{\mathbf{u}}_{i+1}\) is a much better approximation to the solution than \(\mathbf{u}_{i+1}\) is. So it seems reasonable to use
as an estimate of the actual local error made by the \(p\)th-order method. For a vector IVP, we would use a norm rather than an absolute value.
Now we ask: looking back, what step size should we have taken to meet an error target of size \(\epsilon\)? Let’s speculate, given the behavior of local truncation error as \(h\rightarrow 0\), that \(E_i(h)\approx C h^{p+1}\) for an unknown constant \(C\). If we had used a step size \(q h\) for some \(q>0\), then trivially, we would expect
Our best guess for \(q\) would therefore be to set \(E_i(qh)\approx \epsilon\), or
Perhaps, though, we should aim to control the contribution to global error, which is closer to \(E_i(qh)/(q h)\). Then we end up with
Experts have different recommendations about whether to use (6.5.1) or (6.5.2). Even though (6.5.2) appears to be more in keeping with our assumptions about global errors, modern practice seems to favor (6.5.1).
We now have an outline of an algorithm.
Given a solution estimate \(u_i\) at \(t=t_i\), and a step size \(h\), do the following:
Many details remain unspecified at this point, but we first address step 1.
Embedded formulas#
Suppose, for example, we choose to use a pair of second- and third-order RK methods to get the \(\mathbf{u}_{i+1}\) and \(\tilde{\mathbf{u}}_{i+1}\) needed in Algorithm 6.5.1. Then we seem to need at least \(2+3=5\) evaluations of \(f(t,y)\) for each attempted time step. This is more than double the computational work needed by the second-order method without adaptivity.
Fortunately, the marginal cost of adaptivity can be substantially reduced by using embedded Runge–Kutta formulas. Embedded RK formulas are a pair of RK methods whose stages share the same internal \(f\) evaluations, combining them differently in order to get estimates of two different orders of accuracy.
A good example of an embedded method is the Bogacki–Shampine (BS23) formula, given by the table
The top part of the table describes four stages in the usual RK fashion. The last two rows describe how to construct a third-order estimate \(\tilde{\mathbf{u}}_{i+1}\) and a second-order estimate \(\mathbf{u}_{i+1}\) by taking different combinations of those stages.
Implementation#
Our implementation of an embedded second/third-order (RK23) code is given in Function 6.5.2.
Adaptive IVP solver based on embedded RK formulas
1"""
2 rk23(ivp,tol)
3
4Apply an adaptive embedded RK formula pair to solve given IVP with
5estimated error `tol`. Returns a vector of times and a vector of
6solution values.
7"""
8function rk23(ivp,tol)
9 # Initialize for the first time step.
10 a,b = ivp.tspan
11 t = [a]
12 u = [float(ivp.u0)]; i = 1;
13 h = 0.5*tol^(1/3)
14 s₁ = ivp.f(ivp.u0,ivp.p,a)
15
16 # Time stepping.
17 while t[i] < b
18 # Detect underflow of the step size.
19 if t[i]+h == t[i]
20 @warn "Stepsize too small near t=$(t[i])"
21 break # quit time stepping loop
22 end
23
24 # New RK stages.
25 s₂ = ivp.f( u[i]+(h/2)*s₁, ivp.p, t[i]+h/2 )
26 s₃ = ivp.f( u[i]+(3h/4)*s₂, ivp.p, t[i]+3h/4 )
27 unew2 = u[i] + h*(2s₁ + 3s₂ + 4s₃)/9 # 2rd order solution
28 s₄ = ivp.f( unew2, ivp.p, t[i]+h )
29 err = h*(-5s₁/72 + s₂/12 + s₃/9 - s₄/8) # 2nd/3rd difference
30 E = norm(err,Inf) # error estimate
31 maxerr = tol*(1 + norm(u[i],Inf)) # relative/absolute blend
32
33 # Accept the proposed step?
34 if E < maxerr # yes
35 push!(t,t[i]+h)
36 push!(u,unew2)
37 i += 1
38 s₁ = s₄ # use FSAL property
39 end
40
41 # Adjust step size.
42 q = 0.8*(maxerr/E)^(1/3) # conservative optimal step factor
43 q = min(q,4) # limit stepsize growth
44 h = min(q*h,b-t[i]) # don't step past the end
45 end
46 return t,u
47end
About the code
The check t[i]+h==t[i]
on line 19 is to detect when \(h\) has become so small that it no longer changes the floating-point value of \(t_i\). This may be a sign that the underlying exact solution has a singularity near \(t=t_i\), but in any case, the solver must halt by using a break
statement to exit the loop.
On line 30, we use a combination of absolute and relative tolerances to judge the acceptability of a solution value, as in (5.7.4). In lines 41–43 we underestimate the step factor \(q\) a bit and prevent a huge increase in the step size, since a rejected step is expensive, and then we make sure that our final step doesn’t take us past the end of the domain.
Finally, line 37 exploits a subtle property of the BS23 formula called first same as last (FSAL).
While (6.5.3) calls for four stages to find the paired second- and third-order estimates, the final stage computed in stepping from \(t_i\) to \(t_{i+1}\) is identical to the first stage needed to step from \(t_{i+1}\) to \(t_{i+2}\). By repurposing s₄
as s₁
for the next pass, one of the stage evaluations comes for free, and only three evaluations of \(f\) are needed per successful step.
Let’s run adaptive RK on \(u'=e^{t-u\sin u}\).
f = (u,p,t) -> exp(t-u*sin(u))
ivp = ODEProblem(f,0,(0.,5.))
t,u = FNC.rk23(ivp,1e-5)
plot(t,u,m=2,
xlabel=L"t",ylabel=L"u(t)",title="Adaptive IVP solution")
The solution makes a very abrupt change near \(t=2.4\). The resulting time steps vary over three orders of magnitude.
Δt = diff(t)
plot(t[1:end-1],Δt,title="Adaptive step sizes",
xaxis=(L"t",(0,5)),yaxis=(:log10,"step size"))
If we had to run with a uniform step size to get this accuracy, it would be
println( "minimum step size = $(minimum(Δt))" )
minimum step size = 4.6096854609878335e-5
On the other hand, the average step size that was actually taken was
println( "average step size = $(sum(Δt)/(length(t)-1))" )
average step size = 0.03205128205128205
We took fewer steps by a factor of almost 1000! Even accounting for the extra stage per step and the occasional rejected step, the savings are clear.
In Demo 6.1.4 we saw an IVP that appears to blow up in a finite amount of time. Because the solution increases so rapidly as it approaches the blowup, adaptive stepping is required even to get close.
f = (u,p,t) -> (t+u)^2
ivp = ODEProblem(f,1,(0.,1.))
t,u = FNC.rk23(ivp,1e-5);
┌ Warning: Stepsize too small near t=0.7854087204072808
└ @ FundamentalsNumericalComputation ~/.julia/packages/FundamentalsNumericalComputation/Gvhz2/src/chapter06.jl:93
In fact, the failure of the adaptivity gives a decent idea of when the singularity occurs.
plot(t,u,legend=:none,
xlabel=L"t",yaxis=(:log10,L"u(t)"),title="Finite-time blowup")
tf = t[end]
vline!([tf],l=:dash)
annotate!(tf,1e5,latexstring(@sprintf("t = %.6f ",tf)),:right)
Often the adaptively chosen steps clearly correspond to identifiable features of the solution. However, there are so-called stiff problems in which the time steps seem unreasonably small in relation to the observable behavior of the solution. These problems benefit from a particular type of solver that is considered in Section 6.7.
Exercises#
⌨ Using Function 6.5.2 with an error tolerance of \(10^{-8}\), solve \(y'' +(1+y')^3 y = 0\) over \( 0 \le t \le 4 \pi\) with the indicated initial conditions. Plot \(y(t)\) and \(y'(t)\) as functions of \(t\) and separately plot the time step size as a function of \(t\).
(a) \(y(0) = 0.1, \quad y'(0) = 0\)
(b) \(y(0) = 0.5, \quad y'(0) = 0\)
(c) \(y(0) = 0.75, \quad y'(0) = 0\)
(d) \(y(0) = 0.95, \quad y'(0) = 0\)
⌨ Solve the FitzHugh–Nagumo system from Exercise 4.3.6 for \(I=0.05740\) using Function 6.5.2 with error tolerance \(10^{-2}\), \(10^{-3}\), and \(10^{-4}\). (This illustrates that the error tolerance is a target, not a guarantee!)
✍ Derive Equation (6.5.2) using the stated assumption about controlling global rather than local error.
⌨ Solve the problem \(u'=100u^2-u^3\), \(u(0)=0.0002\), \(0\le t \le 100\), and make plots that show both the solution and the time steps taken. The solution makes a quick transition between two nearly constant states. Does the step size selection behave the same in both states?