Rarefaction colliding with two shocks

The Burgers equation {v_t+vv_x =0} is a great illustration of shock formation and propagation, with {v(x,t)} representing velocity at time {t} at position {x}. Let’s consider it with piecewise constant initial condition

\displaystyle v(x,0) = \begin{cases} 1,\quad & x<0 \\ 0,\quad & 0<x<1 \\ 2,\quad & 1<x<2 \\ 0,\quad & x>2 \end{cases}

The equation, rewritten as {(v_t,v_x)\cdot (1,v)=0}, states that the function {v} stays constant on each line of the form {x= x_0 + v(x_0,0)t}. Since we know {v(x_0,0)}, we can go ahead and plot these lines (characteristics). The vertical axis is {t}, horizontal {x}.

burger0
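
For reference, here is a sketch of how one might draw these constant-speed characteristics. It uses plain Python with numpy and matplotlib (an assumption; it is not the code behind the figures in this post):

import numpy as np
import matplotlib.pyplot as plt

def v0(x0):                       # the piecewise constant initial velocity
    if x0 < 0:
        return 1
    if x0 < 1:
        return 0
    if x0 < 2:
        return 2
    return 0

t = np.array([0.0, 3.0])          # each characteristic is a straight line
for x0 in np.arange(-3, 6, 0.25):
    plt.plot(x0 + v0(x0)*t, t, color='C0', lw=0.5)   # x = x0 + v(x0,0) t
plt.xlabel('x')
plt.ylabel('t')
plt.show()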

Clearly, this picture isn’t complete. The gap near {1} is due to the jump of velocity from {0} to {2}: the particles in front move faster than those behind. This creates a rarefaction wave: the gap gets filled by characteristics emanating from {(1,0)}. They have different slopes, and so the velocity {v} also varies within the rarefaction: it is {v(x,t)=(x-1)/t}, namely the velocity with which one has to travel from {(1,0)} to {(x,t)}.

burger1

The intersecting families of characteristics indicate shock waves. The propagation speed of a shock is the average of the values of {v} on its left and on its right. Let’s draw the shock waves in red.

burger2

Characteristics terminate when they run into a shock. Truncating the constant-speed characteristics clears up the picture:

burger3

This is already accurate up to the time {t=1}. But after that we encounter a complication: the rarefaction wave catches up with the shock on the right, so the velocity on the left of this shock is no longer constant. Its propagation is now described by the ODE

\displaystyle \frac{dx}{dt} = \frac12 \left( \frac{x-1}{t} + 0 \right) = \frac{x-1}{2t}

which is easily solved with the initial condition {x(1)=3} (the shock, moving at speed {1}, is at {x=3} when the rarefaction catches it): {x(t) = 2\sqrt{t} +1}.
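
As a sanity check, the trajectory can be verified symbolically; here is a sketch using sympy (an assumption; any computer algebra system would do):

import sympy as sp

t = sp.symbols('t', positive=True)
x = 2*sp.sqrt(t) + 1                            # proposed trajectory of the right shock
print(sp.simplify(x.diff(t) - (x - 1)/(2*t)))   # 0, so the ODE is satisfied
print(x.subs(t, 1))                             # 3, the shock position at t = 1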

Similarly, the shock on the left catches up with the rarefaction wave at {t=2}, and its position after that is found from the ODE

\displaystyle \frac{dx}{dt} = \frac12 \left( 1 + \frac{x-1}{t} \right) = \frac{x-1+t}{2t}

Namely, with the initial condition {x(2)=1}, we get {x(t) = t-\sqrt{2t}+1} for {t>2}. Let’s plot:

burger4

The space-time trajectories of both shocks are now curved. Initially the shock on the right moved twice as fast as the shock on the left, but the rarefaction wave pulls them toward each other. What is this leading to? Yet another collision.

The final collision occurs at the time {t=6+4\sqrt{2} \approx 11.66} when the two shock waves meet. At this moment, the rarefaction wave ceases to exist. A single shock wave forms, separating two constant velocity fields (velocities {1} and {0}). It propagates to the right at constant speed {1/2}. Here is the complete picture of the process:

burger5

I don’t know where the holes in the spacetime came from; they aren’t predicted by the Burgers equation.

Oscillatory explosion

Nonlinear differential equations don’t necessarily have globally defined solutions, even if the equation is autonomous (no time is involved explicitly). The simplest example is {y'=y^2} with initial condition {y(0)=1}: the solution is {y(t) = 1/(1-t )}, which ceases to exist at {t=1}, escaping to infinity.


This kind of behavior can often be demonstrated without solving the ODE. Consider {y''=y^3} with {y(0)=1}, {y'(0)=0}. I’ve no idea what the explicit solution is, but it’s clear that the quantity {E(t) = \frac12 (y')^2 - \frac14 y^4} remains constant: indeed, {E'(t) = y''y' - y^3 y' = 0}. (Here, {E} is analogous to the sum of kinetic and potential energy in physics.)

The solution {y} is increasing and convex. Let {t_n} be such that {y(t_n)=n}; in particular, {t_1=0}. The invariance of {E} yields {y'(t_n) = \sqrt{(n^4-1)/2} \ge 2^{-1/2}(n^2 - 1)}. Since {y'} is increasing, the mean value theorem gives {t_{n+1}-t_n \le \sqrt{2}(n^2-1)^{-1}} for {n=2,3,\dots}. Since the series on the right converges, the limit {T = \lim_{n\rightarrow\infty }t_n} is finite; this is the blow-up time.
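
The blow-up is easy to observe numerically; here is a sketch (scipy is an assumption, not something used in the post) that integrates the equation and stops once {y} exceeds {10^6}:

from scipy.integrate import solve_ivp

def rhs(t, u):                    # u = (y, y'), the system form of y'' = y^3
    return [u[1], u[0]**3]

def blow_up(t, u):                # event: y reaches 10^6
    return u[0] - 1e6
blow_up.terminal = True

sol = solve_ivp(rhs, [0, 10], [1.0, 0.0], events=blow_up, rtol=1e-10, atol=1e-12)
print(sol.t_events[0][0])         # a finite time, approximating the blow-up time T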


But no blow-up occurs for the equation {y''=-y^3}, where the nonlinear term pushes back toward equilibrium. Indeed, the invariant energy is now {E= \frac12 (y')^2 + \frac14 y^4}, which implies that {y} and {y'} stay bounded. In the phase space {(y,y')} the solution with initial values {y(0)=1}, {y'(0)=0} traces out this curve:

Closed trajectory in the phase space (y,y')

Accordingly, the solution is a periodic function of {t} (although this is not a trigonometric function):

Periodic solution, globally defined

Everything so far has been pretty straightforward. But here is a stranger animal: {y'''=-y^3}. As in the previous example, the nonlinear term pushes toward equilibrium. Using initial conditions {y(0)=1}, {y'(0)=y''(0)=0}, I get this numerical solution up to time {T=5}:

0<t<5

As in the previous example, {y} oscillates. But the speed and amplitude of oscillation are increasing.

0<t<5.4
0<t<5.7

Rapidly increasing:

0<t<5.9
0<t<6

In the plane {(y,y')} the solution spirals out:

The plane (y,y')

The plots suggest that the solution ceases to exist in finite time, but I don’t have a proof. The issue is that the function {y} does not escape to infinity in just one direction. Indeed, if {y(t_0)>0}, then regardless of the values of {y'} and {y''} at {t_0}, the negative third derivative will eventually make the function {y} decrease, and so {y} will attain the value {0} at some {t_1>t_0}. After that, the third derivative is positive, guaranteeing the existence of a time {t_2>t_1} when {y} returns to {0} again. I haven’t succeeded in proving that the limit {\lim t_n} is finite, which would give the time when the oscillatory explosion occurs.


The plots were made in SageMathCloud:

var('t y y1 y2')
# Write y''' = -y^3 as a first-order system: y' = y1, y1' = y2, y2' = -y^3,
# with initial values t=0, y=1, y1=0, y2=0.
P = desolve_system_rk4([y1, y2, -y^3], [y, y1, y2], ics=[0,1,0,0], ivar=t, end_points=5, step=0.01)
Q = [[i, j] for i, j, k, l in P]   # keep only the (t, y) pairs
list_plot(Q, plotjoined=True)

Implicit Möbius strip

Reading the Wolfram MathWorld™ article on the Möbius strip, I came across an equation that was news to me.

After the familiar parametric form

\displaystyle    x=(1+s\cos(t/2))\cos t,\quad y=(1+s\cos(t/2))\sin t, \quad z = s\sin(t/2) \qquad (1)

it is said that “In this parametrization, the Möbius strip is therefore a cubic surface with equation…”

\displaystyle  y(x^2+y^2+z^2-1)-2z(x^2+y^2+x) =0 \qquad\qquad\qquad(2)

That’s a neat cubic equation for sure. But… wouldn’t the sign of the left-hand side of (2) (call it {F}) distinguish the “positive” and “negative” sides of the surface, contrary to its non-orientability?

I went to SageMathCloud™ to check and got this plot of (2):

Implicit plot, looks complicated

That’s… not exactly the Möbius strip as I know it. But it’s true that (1) implies (2): the cubic surface contains the Möbius strip. Here is the plot of (1) superimposed on the plot of (2).

Same with the Möbius band superimposed

The self-intersection arises where the “positive” side {F > 0} abruptly meets the “negative” side {F < 0}.
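
One can also confirm symbolically that (1) satisfies (2); here is a sketch using sympy (an assumption; the same check can be done in Sage):

import sympy as sp

s, t, u = sp.symbols('s t u', real=True)
x = (1 + s*sp.cos(t/2))*sp.cos(t)
y = (1 + s*sp.cos(t/2))*sp.sin(t)
z = s*sp.sin(t/2)
F = y*(x**2 + y**2 + z**2 - 1) - 2*z*(x**2 + y**2 + x)
# Substitute t = 2u so that all trigonometric arguments are u and 2u, then simplify.
print(sp.simplify(sp.expand_trig(F.subs(t, 2*u))))   # 0: the strip lies on the cubic surface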

Trigonometric approximation and the Clenshaw-Curtis quadrature

Trying to approximate a generic continuous function on {[-1,1]} with the Fourier trigonometric series of the form {\sum_n (A_n\cos \pi nx+B_n\sin \pi nx)} is in general not very fruitful. Here’s such an approximation to {f(x)=\exp(x)}, with the sum over {n\le 4}:

Poor approximation to exp(x)

It’s better to make a linear change of variable: consider {f(2x-1)} on the interval {[0,1]}, and use the formula for the cosine series. This results in {\exp(2x-1)}, which is approximated by the partial sum {\sum_{n=0}^4 A_n\cos \pi nx} of its cosine Fourier series as follows.

Better approximation after a change of variable

But one can do much better with a different, nonlinear change of variable. Consider {f(\cos x)} on the interval {[0,\pi]}, and again use the formula for the cosine series. This results in {\exp(\cos x)}, which is approximated by the partial sum {\sum_{n=0}^4 A_n\cos nx} of its cosine Fourier series as follows.

Excellent approximation after nonlinear change of variable

Yes, I can’t see any difference either: the error is less than {10^{-3}}.

The composition with cosine improves approximation because {f(\cos t)} is naturally a periodic function, with no jumps or corners in its graph. Fourier series, which are periodic by nature, follow such functions more easily.
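
Here is a numeric check of that {10^{-3}} claim, as a sketch in plain Python (numpy and scipy are assumptions; the coefficients are computed by ordinary numerical integration):

import numpy as np
from scipy.integrate import quad

g = lambda t: np.exp(np.cos(t))
# Cosine coefficients on [0, pi]: a_0 = (1/pi) int g, and a_n = (2/pi) int g*cos(nt) for n >= 1
a = [(1 if n == 0 else 2) / np.pi * quad(lambda t, n=n: g(t)*np.cos(n*t), 0, np.pi)[0]
     for n in range(5)]
t = np.linspace(0, np.pi, 10001)
partial = sum(a[n]*np.cos(n*t) for n in range(5))
print(np.max(np.abs(g(t) - partial)))   # below 10^(-3), as claimed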


A practical implication of this approximation of {f(\cos t)} is the Clenshaw-Curtis integration method. It can be expressed in one line:

\displaystyle    \int_{-1}^1 f(x)\,dx = \int_0^\pi f(\cos t)\sin t\,dt \approx \int_0^\pi \sum_{n=0}^N a_n \cos nt \sin t\,dt   = \sum_{n=0}^N \frac{(1+(-1)^n) a_n}{1-n^2}

The integral {\int_{-1}^1 f(x)\,dx} is approximated by summing {2a_{2k}/(1-4k^2)}, where {a_{2k}} are the even-numbered cosine coefficients of {f(\cos t)}. In the example with {f(x)=\exp(x)}, using just three coefficients yields

\displaystyle    \frac{2a_0}{1}+\frac{2a_2}{-3}+\frac{2a_4}{-15} \approx 2.350405

while the exact integral is {\approx 2.350402}.


At first this doesn’t look practical at all: after all, the Fourier coefficients are themselves found by integration. But since {f(\cos t)} is so close to a trigonometric polynomial, one can sample it at equally spaced points and apply the Fast Fourier transform to the result, quickly obtaining many coefficients at once. This is what the Clenshaw-Curtis quadrature does (at least in principle; the practical implementation may streamline these steps).
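
Here is the one-line formula taken literally, as a sketch in plain Python (numpy assumed); a real implementation would obtain the coefficients with the FFT rather than the trapezoidal sums used here:

import numpy as np

def clenshaw_curtis(f, N):
    """Approximate the integral of f over [-1, 1] from N+1 samples of f(cos t)."""
    t = np.pi * np.arange(N + 1) / N              # equally spaced points in [0, pi]
    samples = f(np.cos(t))
    w = np.full(N + 1, np.pi / N)                 # trapezoidal weights in t
    w[0] = w[-1] = np.pi / (2 * N)
    # Cosine coefficients a_n, with the convention f(cos t) ~ sum of a_n*cos(nt), n = 0..N
    a = np.array([np.sum(w * samples * np.cos(n * t)) for n in range(N + 1)]) * 2 / np.pi
    a[0] /= 2
    even = np.arange(0, N + 1, 2)                 # only even n contribute to the integral
    return np.sum(2 * a[even] / (1 - even**2))

print(clenshaw_curtis(np.exp, 8))                 # about 2.3504; the exact value is e - 1/e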

Completely monotone imitation of 1/x

I wanted an example of a function {f} that behaves mostly like {1/x} (the product {xf(x)} is bounded between two positive constants), but such that {xf(x)} does not have a limit as {x\rightarrow 0}.

The first thing that comes to mind is {(2+\sin(1/x))/x}, but this function does not look very much like {1/x}.

sin(1/x) makes it too wiggly

Then I tried {f(x)=(2+\sin\log x)/x}, recalling an example from Linear Approximation and Differentiability. It worked well:

I can’t believe it’s not a hyperbola!

In fact, it worked much better than I expected. Not only is {f'} of constant sign, but so are {f''} and {f'''}. Indeed,

\displaystyle    f'(x) = \frac{\cos \log x - \sin \log x - 2}{x^2}

is always negative,

\displaystyle    f''(x) = \frac{4 -3\cos \log x + \sin \log x}{x^3}

is always positive,

\displaystyle    f'''(x) = \frac{10\cos \log x -12}{x^4}

is always negative. The sign becomes less obvious with the fourth derivative,

\displaystyle    f^{(4)}(x) = \frac{48-40\cos\log x - 10\sin \log x}{x^5}

because the triangle inequality isn’t conclusive now. But the amplitude of {A\cos t+B\sin t} is {\sqrt{A^2+B^2}}, and {\sqrt{40^2+10^2}<48}.

So, it seems that {f} is completely monotone, meaning that {(-1)^n f^{(n)}(x)\ge 0} for all {x>0} and for all {n=0,1,2,\dots}. But we already saw that this sign pattern can break after many steps. So let’s check carefully.

Direct calculation yields the neat identity

\displaystyle    \left(\frac{1+a\cos \log x+b\sin\log x}{x^n}\right)' = -n\,\frac{1+(a-b/n)\cos\log x+(b+a/n) \sin\log x}{x^{n+1}}

With its help, the process of differentiating the function {f(x) = (1+a\cos \log x+b\sin\log x)/x} can be encoded as follows: {a_1=a}, {b_1=b}, then {a_{n+1}=a_n-b_n/n} and {b_{n+1} = b_n+a_n/n}. The presence of {1/n} is disconcerting because the harmonic series diverges. But orthogonality helps: the added vector {(-b_n/n, a_n/n)} is orthogonal to {(a_n,b_n)}.

The above example, rewritten as {f(x)=(1+\frac12\sin\log x)/x}, corresponds to starting with {(a,b) = (0,1/2)}. I calculated and plotted {10000} iterations: the points {(a_n,b_n)} are joined by a piecewise linear curve.

Harmonic spiral

The total length of this curve is infinite, since the harmonic series diverges. The question is, does it stay within the unit disk? Let’s find out. By the above recursion,

\displaystyle    a_{n+1}^2 + b_{n+1}^2 = \left(1+\frac{1}{n^2}\right) (a_n^2+b_n^2)

Hence, the squared magnitude of {(a_n,b_n)} will always be less than

\displaystyle    \frac14 \prod_{n=1}^\infty \left(1+\frac{1}{n^2}\right)

with {1/4} being {a^2+b^2}. The infinite product evaluates to {\frac{\sinh \pi}{\pi}\approx 3.7} (explained here), and thus the polygonal spiral stays within the disk of radius {\frac12 \sqrt{\frac{\sinh \pi}{\pi}}\approx 0.96}. In conclusion,

\displaystyle    (-1)^{n} \left(\frac{1+(1/2)\sin\log x}{x}\right)^{(n)} = n!\,\frac{1+a_{n+1}\cos\log x+b_{n+1} \sin\log x}{x^{n+1}}

where the trigonometric function {a_{n+1}\cos\log x+b_{n+1} \sin\log x} has amplitude strictly less than {1}. Since the expression on the right is positive, {f} is completely monotone.
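
A quick numeric check of this bound, as a sketch in plain Python:

import math

p = 1.0
for n in range(1, 100000):
    p *= 1 + 1/n**2
print(p, math.sinh(math.pi)/math.pi)              # both about 3.68
print(0.5*math.sqrt(math.sinh(math.pi)/math.pi))  # about 0.96, the radius of the disk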

The plot was generated in Sage using the code below.

# Iterate a_{n+1} = a_n - b_n/n, b_{n+1} = b_n + a_n/n, starting from (a_1, b_1) = (0, 1/2)
a = 0
b = 1/2
l = [(a, b)]
for k in range(1, 10000):
    c = a - b/k
    d = b + a/k
    l.append((c, d))
    a = c
    b = d
show(line(l), aspect_ratio=1)   # join the points (a_n, b_n) by a polygonal curve

Linear approximation and differentiability

If a function {f\colon \mathbb R\rightarrow \mathbb R} is differentiable at {a\in \mathbb R}, then it admits good linear approximation at small scales. Precisely: for every {\epsilon>0} there is {\delta>0} and a linear function {\ell(x)} such that {|f(x)-\ell(x)|<\epsilon \,\delta} for all {|x-a|<\delta}. Having {\delta} multiplied by {\epsilon} means that the deviation from linearity is small compared to the (already small) scale {\delta} on which the function is considered.

For example, this is a linear approximation to {f(x)=e^x} near {0} at scale {\delta=0.1}.

Linear approximation to exponential function

As is done on this graph, we can always take {\ell} to be the secant line to the graph of {f} based on the endpoints of the interval of consideration. This is because if {L} is another line for which {|f(x)-L(x)|<\epsilon \,\delta} holds, then {|\ell-L|\le \epsilon \,\delta} at the endpoints, and therefore on all of the interval (the function {x\mapsto |\ell(x)-L(x)|} is convex). Hence {|f-\ell|\le |f-L|+|L-\ell|<2\epsilon\,\delta} on the interval, which is just as good up to the choice of {\epsilon}.


Here is a non-differentiable function that obviously fails the linear approximation property at {0}.

Self-similar graph

(By the way, this post is mostly about me trying out SageMathCloud.) A nice thing about {f(x)=x\sin \log |x|} is self-similarity: {f(rx)=rf(x)} with the similarity factor {r=e^{2\pi}}. This implies that no matter how far we zoom in on the graph at {x=0}, the graph will not get any closer to linear.

I like {x\sin \log |x|} more than its famous, but not self-similar, cousin {x\sin(1/x)}, pictured below.

Standard example from intro to real analysis

Interestingly, the linear approximation property does not imply differentiability. The function {f(x)=x\sin \sqrt{-\log|x|}} has this property at {0}, but it lacks a derivative there since {f(x)/x} does not have a limit as {x\rightarrow 0}. Here is how it looks.

Now with the square root!

Let’s look at the scale {\delta=0.1}

scale 0.1

and compare to the scale {\delta=0.001}

scale 0.001

Well, that was disappointing. Let’s use math instead. Fix {\epsilon>0} and consider the function {\phi(\delta)=\sqrt{-\log \delta}-\sqrt{-\log (\epsilon \delta)}}. Rewriting it as

\displaystyle    \frac{\log \epsilon}{\sqrt{-\log \delta}+\sqrt{-\log (\epsilon \delta)}}

shows {\phi(\delta)\rightarrow 0} as {\delta\rightarrow 0}. Choose {\delta} so that {|\phi(\delta)|<\epsilon} and define {\ell(x)=x\sin\sqrt{-\log \delta}}. Then for {\epsilon \,\delta\le |x|< \delta} we have {|f(x)-\ell(x)| = |x|\,|\sin\sqrt{-\log |x|}-\sin\sqrt{-\log \delta}|\le |x|\,|\phi(\delta)| < \epsilon |x|<\epsilon\,\delta}, and for {|x|<\epsilon \delta} the trivial bound {|f(x)-\ell(x)|\le |f(x)|+|\ell(x)|} suffices.

Thus, {f} can be well approximated by linear functions near {0}; it’s just that the linear function has to depend on the scale on which the approximation is made: its slope {\sin\sqrt{-\log \delta}} does not have a limit as {\delta\to0}.

The linear approximation property does not become apparent until extremely small scales. Here is {\delta = 10^{-30}}.

Scale 1e-30
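
A numeric version of the same statement, as a sketch assuming numpy (the line used is the one from the argument above, with slope {\sin\sqrt{-\log \delta}}):

import numpy as np

f = lambda x: x*np.sin(np.sqrt(-np.log(np.abs(x))))
delta = 1e-30
slope = np.sin(np.sqrt(-np.log(delta)))
x = np.linspace(delta*1e-3, delta, 100001)       # one side of the interval, away from 0
print(np.max(np.abs(f(x) - slope*x)) / delta)    # small compared to 1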

Using a paraboloid to cover points with a disk

Find the equation of the tangent line to the parabola {y=x^2}… borrring calculus drill.

Okay. Draw two tangent lines to the parabola, then. Where do they intersect?

Two tangent lines

If the points of tangency are {a} and {b}, then the tangent lines are
{y=2a(x-a)+a^2} and {y=2b(x-b)+b^2}. Equate and solve:

\displaystyle    2a(x-a)+a^2 = 2b(x-b)+b^2 \implies x = \frac{a+b}{2}

Neat! The {x}-coordinate of the intersection point is midway between {a} and {b}.

What does the {y}-coordinate of the intersection tell us? It simplifies to

\displaystyle    2a(b-a)/2+a^2 = ab

the geometric meaning of which is not immediately clear. But maybe we should look at the vertical distance from intersection to the parabola itself. That would be

\displaystyle    x^2 - y = \left(\frac{a+b}{2}\right)^2 -ab = \left(\frac{a-b}{2}\right)^2

This is the square of the distance from the midpoint to {a} and {b}. In other words, the squared radius of the smallest “disk” covering the set {\{a,b\}}.
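
These little computations are easy to confirm symbolically; here is a sketch using sympy (an assumption, since the code later in this post is written for Sage):

import sympy as sp

a, b, x = sp.symbols('a b x', real=True)
ta = 2*a*(x - a) + a**2                            # tangent line at a
tb = 2*b*(x - b) + b**2                            # tangent line at b
xi = sp.simplify(sp.solve(sp.Eq(ta, tb), x)[0])    # x-coordinate of the intersection
yi = sp.simplify(ta.subs(x, xi))                   # y-coordinate of the intersection
print(xi, yi, sp.factor(xi**2 - yi))               # (a + b)/2, a*b, (a - b)**2/4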


The same happens in higher dimensions, where the parabola is replaced with the paraboloid {z=|\mathbf x|^2}, {\mathbf x = (x_1,\dots, x_n)}.

Paraboloid

Indeed, the tangent planes at {\mathbf a} and {\mathbf b} are
{z=2\mathbf a\cdot (\mathbf x-\mathbf a)+|\mathbf a|^2} and {z=2\mathbf b\cdot (\mathbf x-\mathbf b)+|\mathbf b|^2}. Equate and solve:

\displaystyle    2(\mathbf a-\mathbf b)\cdot \mathbf x = |\mathbf a|^2-|\mathbf b|^2 \implies \left(\mathbf x-\frac{\mathbf a+\mathbf b}{2}\right)\cdot (\mathbf a-\mathbf b) =0

So, {\mathbf x} lies on the plane equidistant from {\mathbf a} and {\mathbf b}. And, as above, at the midpoint {\mathbf x = (\mathbf a+\mathbf b)/2}, the vertical distance to the paraboloid

\displaystyle    |\mathbf x|^2 -z = \left|\frac{\mathbf a-\mathbf b}{2}\right|^2

is the square of the radius of the smallest ball covering both {\mathbf a} and {\mathbf b}.


The above observations are useful for finding the smallest disk (or ball) covering given points. For simplicity, I stick to two dimensions: covering points on a plane with the smallest disk possible. The algorithm is:

  1. Given points {(x_i,y_i)}, {i=1,\dots,n}, write down the equations of the tangent planes to the paraboloid {z=x^2+y^2} at the points lying above them. These are {z=2(x_i x+y_i y)-(x_i^2+y_i^2)}.
  2. Find the point {(x,y,z)} that minimizes the vertical distance to paraboloid, that is {x^2+y^2-z}, and lies (non-strictly) below all of these tangent planes.
  3. The {x,y} coordinates of this point give the center of the smallest disk covering the points (known as the Chebyshev center of the set). Also, {\sqrt{x^2+y^2-z}} is the radius of this disk, known as the Chebyshev radius.

The advantage conferred by the paraboloid model is that at step 2 we are minimizing a quadratic function subject to linear constraints. Implementation in Sage:

points = [[1,3], [1.5,2], [3,2], [2,-1], [-1,0.5], [-1,1]]
# One constraint per point: the tangent plane at (x_i, y_i) must lie (weakly) above z,
# that is, 2*x_i*x + 2*y_i*y - (x_i^2 + y_i^2) - z >= 0.
constraints = [lambda x, p=q: 2*x[0]*p[0]+2*x[1]*p[1]-p[0]^2-p[1]^2-x[2] for q in points]
target = lambda x: x[0]^2+x[1]^2-x[2]   # vertical distance from (x,y,z) to the paraboloid
m = minimize_constrained(target,constraints,[0,0,0])
circle((m[0],m[1]),sqrt(m[0]^2+m[1]^2-m[2]),color='red') + point(points)

Smallest disk covering the points

Credit: this post is an expanded version of a comment by David Speyer on last year’s post Covering points with caps, where I considered the same problem on a sphere.

Words that contain UIO, and best-fitting lines

In Calculus I we spend a fair amount of time talking about how nicely the tangent line fits a smooth curve.

Tangent line to exponential function at x=0

But truth be told, it fits only near the point of tangency. How can we find the best approximating line for a function {f} on a given interval?

Best linear fit to the exponential function on [0,1]

A natural measure of quality of approximation is the maximum deviation of the curve from the line, {E(f;\alpha,\beta) = \max_{[a, b]} |f(x) - \alpha x-\beta|} where {\alpha,\beta} are the coefficients in the line equation, to be determined. We need {\alpha,\beta} that minimize {E(f;\alpha,\beta)}.

The Chebyshev equioscillation theorem is quite useful here. For one thing, its name contains the letter combination uio, which Scrabble players may appreciate. (Can you think of other words with this combination?) Also, its statement does not involve concepts outside of Calculus I. Specialized to the case of linear fit, it says that {\alpha,\beta} are optimal if and only if there exist three numbers { x_1<x_2<x_3} in {[a, b]} such that the deviations {\delta_i = f(x_i) - \alpha x_i-\beta}

  • are equal to {E(f;\alpha,\beta)} in absolute value: {|\delta_i| = E(f;\alpha,\beta)} for {i=1,2,3}
  • have alternating signs: {\delta_1 = -\delta_2 = \delta_3}

Let’s consider what this means. First, {f'(x_i) =\alpha} unless {x_i} is an endpoint of {[a,b]}. Since {x_2} cannot be an endpoint, we have {f'(x_2)=\alpha}.

Furthermore, {f(x) - \alpha x } takes the same value at {x_1} and {x_3}. This gives an equation for {x_2}

\displaystyle    f(x_1)-f'(x_2)x_1 = f(x_3)-f'(x_2) x_3    \qquad \qquad (1)

which can be rewritten in the form resembling the Mean Value Theorem:

\displaystyle    f'(x_2) = \frac{f(x_1)-f(x_3)}{x_1-x_3}   \qquad \qquad (2)

If {f'} is strictly monotone, there can be only one {x_i} with {f'(x_i)=\alpha}. Hence {x_1=a} and {x_3=b} in this case, and we find {x_2} by solving (2). This gives {\alpha = f'(x_2)}; then {\beta = \frac12\left(f(x_1)+f(x_2)-\alpha(x_1+x_2)\right)}, which balances the deviations at {x_1} and {x_2}.

Here is how I did this in Sage:

var('x')
f = sin(x)   # or another function
df = f.diff(x)
a = 0        # left endpoint (for example)
b = pi       # right endpoint (for example)

That was the setup. Now the actual computation:

var('x2')    # x2 stays symbolic; find_root will solve for it
x1 = a
x3 = b
# Solve equation (1): f(x1) - f'(x2)*x1 == f(x3) - f'(x2)*x3 for x2 in (a, b)
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, a, b)
alpha = df(x=x2)                                   # slope of the best-fitting line
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)   # balances the deviations at x1 and x2
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))
Fitting a line to the sine curve

However, the algorithm fails to properly fit a line to the sine function on {[0,3\pi/2]}:

Not the best fit, obviously

The problem is that {f'(x)=\cos x} is no longer monotone, making it possible for two of the {x_i} to be interior points. Recalling the identities for cosine, we see that these points must be symmetric about {x=\pi}. One of the {x_i} must still be an endpoint, so either {x_1=a} (and {x_3=2\pi-x_2}) or {x_3=b} (and {x_1=2\pi-x_2}). The first option works:

Best fit on [0,3pi/2]

This same line is also the best fit on the full period {[0,2\pi]}. It passes through {(\pi,0)} and has slope {-0.2172336...}, which is not a number I recognize.

On the interval {[0,4\pi]}, all three of the above approaches fail:

Wrong!
Also wrong!
What were you thinking?

Luckily we don’t need a computer in this case. Whenever {|f|} has at least three points of maximum with alternating signs of {f}, the Chebyshev equioscillation theorem implies that the best linear fit is the zero function.

Improving the Wallis product

The Wallis product for {\pi}, as seen on Wikipedia, is

{\displaystyle 2\prod_{k=1}^\infty \frac{4k^2}{4k^2-1}  = \pi \qquad \qquad (1)}

Historical significance of this formula notwithstanding, one has to admit that this is not a good way to approximate {\pi}. For example, the product up to {k=10} is

{\displaystyle 2\,\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{1\cdot 3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21} =\frac{137438953472}{44801898141} }

And all we get for this effort is the lousy approximation {\pi\approx \mathbf{3.0677}}.

But it turns out that (1) can be dramatically improved with a little tweak. First, let us rewrite partial products in (1) in terms of double factorials. This can be done in two ways: either

{\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1}  =  (4n+2) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2  \qquad \qquad (2)}

or

{\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1}  =  \frac{2}{2n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2  \qquad \qquad (3)}

Seeing how badly (2) underestimates {\pi}, it is natural to bump it up: replace {4n+2} with {4n+3}:

{\displaystyle \pi \approx b_n= (4n+3) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2  \qquad \qquad (4)}

Now with {n=10} we get {\mathbf{3.1407}} instead of {\mathbf{3.0677}}. The error is down by two orders of magnitude, and all we had to do was to replace the factor of {4n+2=42} with {4n+3=43}. In particular, the size of numerator and denominator hardly changed:

{\displaystyle b_{10}=43\, \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21\cdot 21} }

Approximation (4) differs from (2) by the additional term {\left(\frac{(2n)!!}{(2n+1)!!}\right)^2}, which decreases to zero. Therefore, it is not obvious whether the sequence {b_n} is increasing. To prove that it is, observe that the ratio {b_{n+1}/b_n} is

{\displaystyle  \frac{4n+7}{4n+3}\left(\frac{2n+2}{2n+3}\right)^2}

which is greater than 1 because

{\displaystyle  (4n+7)(2n+2)^2 - (4n+3)(2n+3)^2 = 1 >0 }

Sweet cancellation here. Incidentally, it shows that if we used {4n+3+\epsilon} instead of {4n+3}, the sequence would overshoot {\pi} and no longer be increasing.

The formula (3) can be similarly improved. The fraction {2/(2n+1)} is secretly {4/(4n+2)}, which should be replaced with {4/(4n+1)}. The resulting approximation for {\pi}

{\displaystyle c_n =  \frac{4}{4n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2  \qquad \qquad (5)}

is about as good as {b_n}, but it approaches {\pi} from above. For example, {c_{10}\approx \mathbf{3.1425}}.
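
A quick numeric check of these values, as a sketch in plain Python (the double-factorial ratio is accumulated as a running product):

from math import pi

def ratio_sq(n):
    """((2n)!! / (2n+1)!!)^2, accumulated as a running product."""
    r = 1.0
    for k in range(1, n + 1):
        r *= 2*k / (2*k + 1)
    return r*r

n = 10
wallis = (4*n + 2) * ratio_sq(n)               # the plain partial product (2): about 3.0677
b = (4*n + 3) * ratio_sq(n)                    # b_n from (4): about 3.1407
c = 4/(4*n + 1) * (2*n + 1)**2 * ratio_sq(n)   # c_n from (5): about 3.1425
print(wallis, b, c, (b + c)/2, pi)             # the midpoint is about 3.1416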

The proof that {c_n} is decreasing is familiar: the ratio {c_{n+1}/c_n} is

{\displaystyle  \frac{4n+1}{4n+5}\left(\frac{2n+2}{2n+1}\right)^2}

which is less than 1 because

{\displaystyle  (4n+1)(2n+2)^2 - (4n+5)(2n+1)^2 = -1 <0 }

Sweet cancellation once again.

Thus, {b_n<\pi<c_n} for all {n}. The midpoint of this containing interval provides an even better approximation: for example, {(b_{10}+c_{10})/2 \approx \mathbf{3.1416}}. The plot below displays the quality of approximation as logarithm of the absolute error:

Logarithm of approximation error
  • yellow dots show the error of Wallis partial products (2)-(3)
  • blue is the error of {b_n}
  • red is for {c_n}
  • black is for {(b_n+c_n)/2}

And all we had to do was to replace {4n+2} with {4n+3} or {4n+1} in the right places.

Real zeros of sine Taylor polynomials

The more terms of the Taylor series {\displaystyle \sin x = x-\frac{x^3}{3!}+ \frac{x^5}{5!}- \cdots } we use, the more resemblance we see between the Taylor polynomial and the sine function itself. The first-degree polynomial matches one zero of the sine, and gets the slope right. The third-degree polynomial has three zeros in about the right places.

Third degree, three zeros

The fifth-degree polynomial will of course have … wait a moment.

Fifth degree, only one zero

Since all four critical points are in the window, there are no real zeros outside of our view. Adding the fifth-degree term not only fails to increase the number of zeros to five, it even drops it back to the level of {T_1(x)=x}. How odd.

Since the sine Taylor series converges uniformly on bounded intervals, for every { A } there exists { n } such that {\max_{[-A,A]} |\sin x-T_n(x)|<1 }. Then { T_n } will have the same sign as { \sin x } at the maxima and minima of the latter. Consequently, it will have about { 2A/\pi } zeros on the interval {[-A,A] }. Indeed, the intermediate value theorem guarantees that many; and the fact that {T_n'(x) \approx \cos x } on { [-A,A]} will not allow for extraneous zeros within this interval.

Using the Taylor remainder estimate and Stirling's approximation, we find {A\approx (n!)^{1/n} \approx n/e }. Therefore, { T_n } will have about { 2n/(\pi e) } real zeros at about the right places. What happens when {|x| } is too large for the Taylor remainder estimate to be effective, we can't tell.

Let's just count the zeros, then. Sage online makes it very easy:

# For each odd degree 2n-1, count the real roots of the Taylor polynomial of sin(x) at 0
sineroots = [[2*n-1, len(sin(x).taylor(x,0,2*n-1).roots(ring=RR))] for n in range(1,51)]
scatter_plot(sineroots)
Roots of sine Taylor polynomials

The up-and-down pattern in the number of zeros makes for a neat scatter plot. How close is this data to the predicted number { 2n/(\pi e) }? Pretty close.

var('n')   # needed: by default n is Sage's numerical-approximation function
scatter_plot(sineroots,facecolor='#eeee66') + plot(2*n/(pi*e),(n,1,100))
Compared to 2n/pi*e

The slope of the blue line is { 2/(\pi e) \approx 0.2342 }; the (ir)rationality of this number is unknown. Thus, just under a quarter of the zeros of { T_n } are expected to be real when { n } is large.

The actual number of real zeros tends to exceed the prediction (by only a few) because some Taylor polynomials have real zeros in the region where they no longer follow the function. For example, { T_{11} } does this:

Spurious zero around x=7

Richard S. Varga and Amos J. Carpenter wrote a series of papers titled Zeros of the partial sums of { \cos z } and {\sin z } in which they classify real zeros into Hurwitz (which follow the corresponding trigonometric function) and spurious. They give the precise count of the Hurwitz zeros: {1+2\lfloor n/(\pi e)\rfloor } for the sine and {2\lfloor n/(\pi e)+1/2\rfloor } for the cosine. The total number of real roots does not appear to admit such an explicit formula. It is the sequence A012264 in the OEIS.