Unreasonable effectiveness of Runge-Kutta 2

I used to give this example of an initial-value problem whenever numerical solutions of ODE came up in class:

\displaystyle    y' = -y^2, \quad y(0)=1

The exact solution is {y(t)=1/(1+t)}. Using Euler’s method with step {h=1} (admittedly large) we get, starting with {y_0=1}:

\displaystyle    y_1 = y_0 + h (-y_0^2) = 0,\quad y_2 = y_1 + h (-y_1^2) = 0, \dots

The numerical solution drops down to zero at once and stays there. Not good at all.

Euler's method
Euler’s method

The trapezoidal version of RK2 (also known as improved Euler, or Heun’s method) treats the above value of {y_1} merely as prediction {\tilde y_1}, and computes the actual {y_1} using the average of the slopes at {y_0} and {\tilde y_1}:

\displaystyle    y_1 = y_0 + \frac{h}{2} (-y_0^2 - \tilde y_1^2) = 1 + \frac12 (-1 - 0 ) = \frac12

… which is right on target. Let’s do it again: the prediction {\tilde y_2 = y_1 + h (-y_1^2) = 1/4} and correction

\displaystyle    y_2 = y_1 + \frac{h}{2} (-y_1^2 - \tilde y_2^2) = 1 + \frac12 \left(-\frac{1}{4}-\frac{1}{16} \right) =    \frac{11}{32}

… which is {\approx 0.34}, compared to the exact solution {1/3}. Nice!

Trapezoidal RK2
Trapezoidal RK2

I like the example, because all numbers are manageable with mental arithmetics for two steps. And it’s clear that {11/32} is pretty close to {1/3 = 11/33}.


But the fact that the relative error stays below {3.7\%} throughout the computation despite the large step size {h=1} seems… too good to be true. Consider that for the exact solution, the expression {y^2} can differ by the factor of {4} between two {t}-values at distance {1}.

The midpoint version of RK2, which generally performs about as well as the trapezoidal version, is nowhere as close in this example. Indeed, using the same data as above, we get

\displaystyle    y_1 = y_0 - h \left(\frac{y_0+\tilde y_1}{2}\right)^2 = \frac{3}{4}

and so forth: the relative error reaches {59\%} at the second step. That’s 16 times worse than the trapezoidal version.

Midpoint RK2
Midpoint RK2

What happens here has less to do with numerical analysis than with algebra of rational functions. Using {h=1} in trapezoidal RK2, we are in effect iterating the function

\displaystyle \varphi(y) = y + \frac12 (-y^2-(y-y^2)^2) = y-y^2+y^3-\frac12 y^4

The exact solution would be obtained by iterating

\displaystyle \Phi (y) = \frac{y}{1+y} = y-y^2+ y^3 - y^4 + \dots

Two functions just happen to coincide at {y=1}, which is our starting point here.

Two iterands
Two iterands

From there we get to {0.5}, and on {[0,0.5]} they are really close anyway, due to another coincidence: the truncation error {|\varphi(y)-\Phi(y)|} is {O(y^4)} instead of {O(y^3)} as it is normally for second-order methods.

The midpoint method with {h=1} amounts to iterating

\displaystyle \psi(y) = y - \left(\frac{y+y-y^2}{2} \right)^2 = y-y^2+y^3-\frac14 y^4

which is not substantially further away from {\Phi}, but does not enjoy the lucky coincidence at {y=1}.


The tables are turned with the initial value {y_0=3}. The exact solution is {y(t) = 3/(1+3t)}, which drops sharply from {t=0} to {t=1}; its slope decreases by the factor of {16} during one step. Still, midpoint-RK2 does a decent job with {h=1}:

Midpoint RK2 starting with y=3
Midpoint RK2 starting with y=3

while trapezoidal RK2 outputs {y_1=-19.5}, {y_2=-80110}, {y_3 = -2\cdot 10^{19}} and promptly overflows.


With a reasonable step size, like {h=0.1}, normality is restored: both methods perform about equally well, with {0.13\%} error for trapezoidal and {0.21\%} for midpoint.

IVP vs BVP

A typical engineering-oriented course in ordinary differential equations focuses on solving initial value problems (IVP): first by elementary methods, then with power series (if nobody updated the syllabus since 1950s), then with the Laplace transform. One also considers the stability of equilibrium solutions and draws various diagrams to show how other solutions flow around equilibria. And then the semester is over.

And so, the boundary value problems (BVP) are likely to go unnoticed. Let’s compare:

  • IVP answers the question: if I aim at this angle, who will be shot?
  • BVP answers the question: how should I aim to shoot this guy?

Hm, which question is more practical?

  • IVP has a unique solution, as long as the functions involved are not too bad. This was observed already by the Cheshire Cat: you’re sure to get somewhere if you only walk long enough.
  • BVP can easily have no solutions, more than one, or infinitely many. The Scholarpedia article on BVP makes this fact intuitive with the example of the motion of a projectile: there may be one, two, or zero ways to hit the target, depending on how far it is.
From Scholarpedia, http://www.scholarpedia.org/article/Boundary_value_problem

The stability is also quite different for IVP and BVP. For example,

  • the IVP x''=x, x(0)=0, x'(0)=0 has a unique solution x(t)\equiv 0, which is unstable
  • the BVP x''=x, x(0)=0, x(1)=0 has a unique solution x(t)\equiv 0, which is stable

The instability of IVP means that a small perturbation of the initial values such as \tilde x(0)=\epsilon, \tilde x'(0)=0, will produce a solution \tilde x that moves away from x. Indeed, \tilde x(t)=\epsilon(e^t+e^{-t})/2 and the difference |\tilde x-x| grows exponentially in time. One can also see this instability without solving the equation. Just convert it into a 2×2 system of 1st order ODE and find that the Jacobian matrix \displaystyle \begin{pmatrix} 0 & 1 \\ 1& 0\end{pmatrix} has eigenvalues \pm 1. The positive eigenvalue makes the equilibrium unstable.

But if we perturb the boundary values, for example \tilde x(0)=\epsilon, \tilde x(1)=0, the new solution is \tilde x(t)=\epsilon(e^{1-t}-e^{t-1})/(e-e^{-1}). The plot is below: the important this is that |\tilde x-x|\le \epsilon, and this would be true even if we considered the BVP on the interval [0,1000] instead of [0,1].

Perturbed BVP (epsilon=0.1)

Same happens if the boundary derivatives are prescribed: for example, \tilde x'(0)=0.1 and \tilde x'(1)=-0.2:

Another perturbed BVP

Notice how the perturbed solution tries to approach the unperturbed x(t)=0 in the middle of the plot.

Where does this BVP stability come from? Turns out that the equation x''=x is the Euler-Lagrange equation for the energy functional \displaystyle E(x)=\int_0^1 ((x')^2+x^2)\,dt. Although the Wikipedia page explains why, let’s verify this directly. First, perform the variation by replacing x with x+\epsilon \eta, where \eta is some small “bump” function that vanishes near the endpoints. We get

\displaystyle E(x+\epsilon \eta)=\int_0^1 ((x'+\epsilon\eta')^2+(x+\epsilon\eta)^2)\,dt

Expand, take the first derivative with respect to \epsilon, then equate it to zero at \epsilon=0 (if x is a critical point, the derivative must be zero.) One notices that all this stuff amounts to keeping only the terms in which \eta appears to first power:

\displaystyle \int_0^1 (x'\eta'+x\eta)\,dt=0

In the term x'\eta' we want to move the derivative from \eta onto x. That is, integrate by parts:

\displaystyle \int_0^1 (-x''\eta+x\eta)\,dt=0

Now \eta factors out, and since it was arbitrary, the Euler-Lagrange equation is -x''+x=0, which is what we had. This explains why the solution was slightly bent toward the x=0 axis: it tried to minimize the integral x^2 while keeping the first derivative small, as to not have a large term (x')^2.

The boundary conditions determine a space V of all functions that satisfy them. The energy functional \displaystyle E(x)=\int_0^1 ((x')^2+x^2)\,dt is a strictly convex function from V to real numbers. Therefore, it has only one critical point on V which is the global minimum of energy, and therefore a stable equilibrium of whatever object is described by the energy E.