## Rarefaction colliding with two shocks

The Burgers equation ${v_t+vv_x =0}$ is a great illustration of shock formation and propagation, with ${v(x,t)}$ representing velocity at time ${t}$ at position ${x}$. Let’s consider it with piecewise constant initial condition

$\displaystyle u(x,0) = \begin{cases} 1,\quad & x<0 \\ 0,\quad & 02 \end{cases}$

The equation, rewritten as ${(v_t,v_x)\cdot (1,v)=0}$, states that the function ${v}$ stays constant on each line of the form ${x= x_0 + v(x_0,0)t}$. Since we know ${v(x_0,0)}$, we can go ahead and plot these lines (characteristics). The vertical axis is ${t}$, horizontal ${x}$.

Clearly, this picture isn’t complete. The gap near ${1}$ is due to the jump of velocity from ${0}$ to ${2}$: the particles in front move faster than those in back. This creates a rarefaction wave: the gap gets filled by characteristics emanating from ${(1,0)}$. They have different slopes, and so the velocity ${v}$ also varies within the rarefaction: it is ${v(x,t)=(x-1)/t}$, namely the velocity with which one has to travel from ${(1,0)}$ to ${(x,t)}$.

The intersecting families of characteristics indicate a shock wave. Its propagation speed is the average of two values of ${v}$ to the left and to the right. Let’s draw the shock waves in red.

Characteristics terminate when they run into shock. Truncating the constant-speed characteristics clears up the picture:

This is already accurate up to the time ${t=1}$. But after that we encounter a complication: the shock wave to the right separates velocity fields of varying intensity, due to rarefaction to the left of the shock. Its propagation is now described by the ODE

$\displaystyle \frac{dx}{dt} = \frac12 \left( \frac{x-1}{t} + 0 \right) = \frac{x-1}{2t}$

which can be solved easily: ${x(t) = 2\sqrt{t} +1}$.

Similarly, the shock on the left catches up with rarefaction at ${t=2}$, and its position after that is found from the ODE

$\displaystyle \frac{dx}{dt} = \frac12 \left( 1 + \frac{x-1}{t} \right) = \frac{x-1+t}{2t}$

Namely, ${x(t) = t-\sqrt{2t}+1}$ for ${t>2}$. Let’s plot:

The space-time trajectories of both shocks became curved. Although initially, the shock on the right moved twice as fast as the shock on the left, the rarefaction wave pulls them toward each other. What is this leading to? Yet another collision.

The final collision occurs at the time ${t=6+4\sqrt{2} \approx 11.5}$ when the two shock waves meet. At this moment, rarefaction ceases to exist. The single shock wave forms, separating two constant velocity fields (velocities ${1}$ and ${0}$). It propagates to the right at constant speed ${1/2}$. Here is the complete picture of the process:

I don’t know where the holes in the spacetime came from; they aren’t predicted by the Burgers equation.

## Oscillatory explosion

Nonlinear differential equations don’t necessarily have globally defined solutions, even if the equation is autonomous (no time is involved explicitly). The simplest example is ${y'=y^2}$ with initial condition ${y(0)=1}$: the solution is ${y(t) = 1/(1-t )}$, which ceases to exist at ${t=1}$, escaping to infinity.

This kind of behavior can often be demonstrated without solving the ODE. Consider ${y''=y^3}$ with ${y(0)=1}$, ${y'(0)=0}$. I’ve no idea what the explicit solution is, but it’s clear that the quantity ${E(t) = \frac12 (y')^2 - \frac14 y^4}$ remains constant: indeed, ${E'(t) = y''y' - y^3 y' = 0}$. (Here, ${E}$ is analogous to the sum of kinetic and potential energy in physics.)

The solution ${y}$ is increasing and convex. Let ${t_n}$ be such that ${y(t_n)=n}$, e.g., ${t_1=0}$. The invariance of ${E}$ yields ${y'(t_n) = 2^{-1/2} (n^2 - 1)}$. By the mean value theorem, ${t_{n+1}-t_n \le \sqrt{2}(n^2-1)^{-1}}$ for ${n=2,3,\dots}$. Since the series on the right converges, the limit ${T = \lim_{n\rightarrow\infty }t_n}$ is finite; this is the blow-up time.

But no blow-up occurs for the equation ${y''=-y^3}$, where the nonlinear term pushes back toward equilibrium. Indeed, the invariant energy is now ${E= \frac12 (y')^2 + \frac14 y^4}$, which implies that ${y}$ and ${y'}$ stay bounded. In the phase space ${(y,y')}$ the solution with initial values ${y(0)=1}$, ${y'(0)=0}$ traces out this curve:

Accordingly, the solution is a periodic function of ${t}$ (although this is not a trigonometric function):

Everything so far has been pretty straightforward. But here is a stranger animal: ${y'''=-y^3}$. As in the previous example, nonlinear term pushes toward equilibrium. Using initial conditions ${y(0)=1}$, ${y'(0)=y''(0)=0}$, I get this numerical solution up to time ${T=5}$:

As in the previous example, ${y}$ oscillates. But the speed and amplitude of oscillation are increasing.

Rapidly increasing:

In the plane ${(y,y')}$ the solution spirals out:

The plots make it clear that the solution ceases to exist in finite time, but I don’t have a proof. The issue is that the function ${y}$ does not escape to infinity in just one direction. Indeed, if ${y(t_0)>0}$, then regardless of the values of ${y'}$ and ${y''}$ at ${t_0}$, the negative third derivative will eventually make the function ${y}$ decrease, and so ${y}$ will attain the value ${0}$ at some ${t_1>t_0}$. After that, the third derivative is positive, guaranteeing the existence of time ${t_2>t_1}$ when ${y}$ returns to ${0}$ again. I haven’t succeeded in proving that the limit ${\lim t_n}$ is finite, giving the time when oscillatory explosion occurs.

The plots were made in SageMathCloud:

var('t y y1 y2')
P = desolve_system_rk4([y1,y2,-y^3],[y,y1,y2], ics=[0,1,0,0],ivar=t,end_points=5,step=0.01)
Q=[ [i,j] for i,j,k,l in P]
list_plot(Q, plotjoined=True)

## Implicit MÃ¶bius strip

Reading the Wolfram MathWorldâ„¢ article on the MÃ¶bius strip I came across an equation that was news to me.

After the familiar parametric form

$\displaystyle x=(1+s\cos(t/2))\cos t,\quad y=(1+s\cos(t/2))\sin t, \quad z = s\sin(t/2) \qquad (1)$

it is said that “In this parametrization, the MÃ¶bius strip is therefore a cubic surface with equation…”

$\displaystyle y(x^2+y^2+z^2-1)-2z(x^2+y^2+x) =0 \qquad\qquad\qquad(2)$

That’s a neat cubic equation for sure. But… wouldn’t the sign of the left hand side of (2) (call it $F$) distinguish the “positive” and “negative” sides of the surface, contrary to its non-orientability?

I went to SageMathCloudâ„¢ to check and got this plot of (2):

That’s… not exactly the MÃ¶bius strip as I know it. But it’s true that (1) implies (2): the cubic surface contains the MÃ¶bius strip. Here is the plot of (1) superimposed on the plot of (2).

The self-intersection arises where the “positive side” $F > 0$ suddenly transitions to negative $F< 0$.

## Trigonometric approximation and the Clenshaw-Curtis quadrature

Trying to approximate a generic continuous function on ${[-1,1]}$ with the Fourier trigonometric series of the form ${\sum_n (A_n\cos \pi nx+B_n\sin \pi nx)}$ is in general not very fruitful. Here’s such an approximation to ${f(x)=\exp(x)}$, with the sum over ${n\le 4}$:

It’s better to make a linear change of variable: consider ${f(2x-1)}$ on the interval ${[0,1]}$, and use the formula for the cosine series. This results in ${\exp(2x-1)}$, which is approximated by the partial sum ${\sum_{n=0}^4 A_n\cos \pi nx}$ of its cosine Fourier series as follows.

But one can do much better with a different, nonlinear change of variable. Consider ${f(\cos x)}$ on the interval ${[0,\pi]}$, and again use the formula for the cosine series. This results in ${\exp(\cos x)}$, which is approximated by the partial sum ${\sum_{n=0}^4 A_n\cos nx}$ of its cosine Fourier series as follows.

Yes, I can’t see any difference either: the error is less than ${10^{-3}}$.

The composition with cosine improves approximation because ${f(\cos t)}$ is naturally a periodic function, with no jumps or corners in its graph. Fourier series, which are periodic by nature, follow such functions more easily.

A practical implication of this approximation of ${f(\cos t)}$ is the Clenshaw-Curtis integration method. It can be expressed in one line:

$\displaystyle \int_{-1}^1 f(x)\,dx = \int_0^\pi f(\cos t)\sin t\,dt \approx \int_0^\pi \sum_{n=0}^N a_n \cos nt \sin t\,dt = \sum_{n=0}^N \frac{(1+(-1)^n) a_n}{1-n^2}$

The integral ${\int_{-1}^1 f(x)\,dx}$ is approximated by summing ${2a_{2k}/(1-4k^2)}$, where ${a_{2k}}$ are even-numbered cosine coefficients of ${f(\cos t)}$. In the example with ${f(x)=\exp(x)}$ using just three coefficients yields

$\displaystyle \frac{2a_0}{1}+\frac{2a_2}{-3}+\frac{2a_4}{-15} \approx 2.350405$

while the exact integral is ${\approx 2.350402}$.

At first this doesn’t look practical at all: after all, the Fourier coefficients are themselves found by integration. But since ${f(\cos t)}$ is so close to a trigonometric polynomial, one can sample it at equally spaced points and apply the Fast Fourier transform to the result, quickly obtaining many coefficients at once. This is what the Clenshaw-Curtis quadrature does (at least in principle, the practical implementation may streamline these steps.)

## Completely monotone imitation of 1/x

I wanted an example of a function ${f}$ that behaves mostly like ${1/x}$ (the product ${xf(x)}$ is bounded between two positive constants), but such that ${xf(x)}$ does not have a limit as ${x\rightarrow 0}$.

The first thing that comes to mind is ${(2+\sin(1/x))/x}$, but this function does not look very much like ${1/x}$.

Then I tried ${f(x)=(2+\sin\log x)/x}$, recalling an example from Linear Approximation and Differentiability. It worked well:

In fact, it worked much better than I expected. Not only if ${f'}$ of constant sign, but so are ${f''}$ and ${f'''}$. Indeed,

$\displaystyle f'(x) = \frac{\cos \ln x - \sin \log x - 2}{x^2}$

is always negative,

$\displaystyle f''(x) = \frac{4 -3\cos \log x + \sin \log x}{x^3}$

is always positive,

$\displaystyle f'''(x) = \frac{10\cos \log x -12}{x^4}$

is always negative. The sign becomes less obvious with the fourth derivative,

$\displaystyle f^{(4)}(x) = \frac{48-40\cos\log x - 10\sin \cos \ln x}{x^5}$

because the triangle inequality isn’t conclusive now. But the amplitude of ${A\cos t+B\sin t}$ is ${\sqrt{A^2+B^2}}$, and ${\sqrt{40^2+10^2}<48}$.

So, it seems that ${f}$ is completely monotone, meaning that ${(-1)^n f^{(n)}(x)\ge 0}$ for all ${x>0}$ and for all ${n=0,1,2,\dots}$. But we already saw that this sign pattern can break after many steps. So let’s check carefully.

Direct calculation yields the neat identity

$\displaystyle \left(\frac{1+a\cos \log x+b\sin\log x}{x^n}\right)' = -n\,\frac{1+(a-b/n)\cos\log x+(b+a/n) \sin\log x}{x^{n+1}}$

With its help, the process of differentiating the function ${f(x) = (1+a\cos \log x+b\sin\log x)/x}$ can be encoded as follows: ${a_1=a}$, ${b_1=b}$, then ${a_{n+1}=a_n-b_n/n}$ and ${b_{n+1} = b_n+a_n/n}$. The presence of ${1/n}$ is disconcerting because the harmonic series diverges. But orthogonality helps: the added vector ${(-b_n/n, a_n/n)}$ is orthogonal to ${(a_n,b_n)}$.

The above example, rewritten as ${f(x)=(1+\frac12\sin\log x)/x}$, corresponds to starting with ${(a,b) = (0,1/2)}$. I calculated and plotted ${10000}$ iterations: the points ${(a_n,b_n)}$ are joined by piecewise linear curve.

The total length of this curve is infinite, since the harmonic series diverges. The question is, does it stay within the unit disk? Let’s find out. By the above recursion,

$\displaystyle a_{n+1}^2 + b_{n+1}^2 = \left(1+\frac{1}{n^2}\right) (a_n^2+b_n^2)$

Hence, the squared magnitude of ${(a_n,b_n)}$ will always be less than

$\displaystyle \frac14 \prod_{n=1}^\infty \left(1+\frac{1}{n^2}\right)$

with ${1/4}$ being ${a^2+b^2}$. The infinite product evaluates to ${\frac{\sinh \pi}{\pi}a\approx 3.7}$ (explained here), and thus the polygonal spiral stays within the disk of radius ${\frac12 \sqrt{\frac{\sinh \pi}{\pi}}\approx 0.96}$. In conclusion,

$\displaystyle (-1)^{n} \left(\frac{1+(1/2)\sin\log x}{x}\right)^{(n)} = n!\,\frac{1+a_{n+1}\cos\log x+b_{n+1} \sin\log x}{x^{n+1}}$

where the trigonometric function ${a_{n+1}\cos\log x+b_{n+1} \sin\log x}$ has amplitude strictly less than ${1}$. Since the expression on the right is positive, ${f}$ is completely monotone.

The plot was generated in Sage using the code below.

a,b,c,d = var('a b c d')
a = 0
b = 1/2
l = [(a,b)]
for k in range(1,10000):
c = a-b/k
d = b+a/k
l.append((c,d))
a = c
b = d
show(line(l),aspect_ratio=1)

## Linear approximation and differentiability

If a function ${f\colon \mathbb R\rightarrow \mathbb R}$ is differentiable at ${a\in \mathbb R}$, then it admits good linear approximation at small scales. Precisely: for every ${\epsilon>0}$ there is ${\delta>0}$ and a linear function ${\ell(x)}$ such that ${|f(x)-\ell(x)|<\epsilon \,\delta}$ for all ${|x|<\delta}$. Having ${\delta}$ multiplied by ${\epsilon}$ means that the deviation from linearity is small compared to the (already small) scale ${\delta}$ on which the function is considered.

For example, this is a linear approximation to ${f(x)=e^x}$ near ${0}$ at scale ${\delta=0.1}$.

As is done on this graph, we can always take ${\ell}$ to be the secant line to the graph of ${f}$ based on the endpoints of the interval of consideration. This is because if ${L}$ is another line for which ${|f(x)-L(x)|<\epsilon \,\delta}$ holds, then ${|\ell-L|\le \epsilon \,\delta}$ at the endpoints, and therefore on all of the interval (the function ${x\mapsto |\ell(x)-L(x)|}$ is convex).

Here is a non-differentiable function that obviously fails the linear approximation property at ${0}$.

(By the way, this post is mostly about me trying out SageMathCloud.) A nice thing about ${f(x)=x\sin \log |x|}$ is self-similarity: ${f(rx)=rf(x)}$ with the similarity factor ${r=e^{2\pi}}$. This implies that no matter how far we zoom in on the graph at ${x=0}$, the graph will not get any closer to linear.

I like ${x\sin \log |x|}$ more than its famous, but not self-similar, cousin ${x\sin(1/x)}$, pictured below.

Interestingly, linear approximation property does not imply differentiability. The function ${f(x)=x\sin \sqrt{-\log|x|}}$ has this property at ${0}$, but it lacks derivative there since ${f(x)/x}$ does not have a limit as ${x\rightarrow 0}$. Here is how it looks.

Let’s look at the scale ${\delta=0.1}$

and compare to the scale ${\delta=0.001}$

Well, that was disappointing. Let’s use math instead. Fix ${\epsilon>0}$ and consider the function ${\phi(\delta)=\sqrt{-\log \delta}-\sqrt{-\log (\epsilon \delta)}}$. Rewriting it as

$\displaystyle \frac{\log \epsilon}{\sqrt{-\log \delta}+\sqrt{-\log (\epsilon \delta)}}$

shows ${\phi(\delta)\rightarrow 0}$ as ${\delta\rightarrow 0}$. Choose ${\delta}$ so that ${|\phi(\delta)|<\epsilon}$ and define ${\ell(x)=x\sqrt{-\log \delta}}$. Then for ${\epsilon \,\delta\le |x|< \delta}$ we have ${|f(x)-\ell(x)|\le \epsilon |x|<\epsilon\,\delta}$, and for ${|x|<\epsilon \delta}$ the trivial bound ${|f(x)-\ell(x)|\le |f(x)|+|\ell(x)|}$ suffices.

Thus, ${f}$ can be well approximated by linear functions near ${0}$; it’s just that the linear function has to depend on the scale on which approximation is made: its slope ${\sqrt{-\log \delta}}$ does not have a limit as ${\delta\to0}$.

The linear approximation property does not become apparent until extremely small scales. Here is ${\delta = 10^{-30}}$.

## Using a paraboloid to cover points with a disk

Find the equation of tangent line to parabola ${y=x^2}$borrring calculus drill.

Okay. Draw two tangent lines to the parabola, then. Where do they intersect?

If the points of tangency are ${a}$ and ${b}$, then the tangent lines are
${y=2a(x-a)+a^2}$ and ${y=2b(x-b)+b^2}$. Equate and solve:

$\displaystyle 2a(x-a)+a^2 = 2b(x-b)+b^2 \implies x = \frac{a+b}{2}$

Neat! The ${x}$-coordinate of the intersection point is midway between ${a}$ and ${b}$.

What does the ${y}$-coordinate of the intersection tell us? It simplifies to

$\displaystyle 2a(b-a)/2+a^2 = ab$

the geometric meaning of which is not immediately clear. But maybe we should look at the vertical distance from intersection to the parabola itself. That would be

$\displaystyle x^2 - y = \left(\frac{a+b}{2}\right)^2 -ab = \left(\frac{a-b}{2}\right)^2$

This is the square of the distance from the midpoint to ${a}$ and ${b}$. In other words, the squared radius of the smallest “disk” covering the set ${\{a,b\}}$.

Same happens in higher dimensions, where parabola is replaced with the paraboloid ${z=|\mathbf x|^2}$, ${\mathbf x = (x_1,\dots x_n)}$.

Indeed, the tangent planes at ${\mathbf a}$ and ${\mathbf b}$ are
${z=2\mathbf a\cdot (\mathbf x-\mathbf a)+|\mathbf a|^2}$ and ${z=2\mathbf b\cdot (\mathbf x-\mathbf b)+|\mathbf b|^2}$. Equate and solve:

$\displaystyle 2(\mathbf a-\mathbf b)\cdot \mathbf x = |\mathbf a|^2-|\mathbf b|^2 \implies \left(\mathbf x-\frac{\mathbf a+\mathbf b}{2}\right)\cdot (\mathbf a-\mathbf b) =0$

So, ${\mathbf x}$ lies on the equidistant plane from ${\mathbf a}$ and ${\mathbf b}$. And, as above,

$\displaystyle |\mathbf x|^2 -z = \left|\frac{\mathbf a-\mathbf b}{2}\right|^2$

is the square of the radius of smallest disk covering both ${\mathbf a}$ and ${\mathbf b}$.

The above observations are useful for finding the smallest disk (or ball) covering given points. For simplicity, I stick to two dimensions: covering points on a plane with the smallest disk possible. The algorithm is:

1. Given points ${(x_i,y_i)}$, ${i=1,\dots,n}$, write down the equations of tangent planes to paraboloid ${z=x^2+y^2}$. These are ${z=2(x_i x+y_i y)-(x_i^2+y_i^2)}$.
2. Find the point ${(x,y,z)}$ that minimizes the vertical distance to paraboloid, that is ${x^2+y^2-z}$, and lies (non-strictly) below all of these tangent planes.
3. The ${x,y}$ coordinates of this point is the center of the smallest disk covering the points. (Known as the Chebyshev center of the set). Also, ${\sqrt{x^2+y^2-z}}$ is the radius of this disk; known as the Chebyshev radius.

The advantage conferred by the paraboloid model is that at step 2 we are minimizing a quadratic function subject to linear constraints. Implementation in Sage:

 points = [[1,3], [1.5,2], [3,2], [2,-1], [-1,0.5], [-1,1]] constraints = [lambda x, p=q: 2*x[0]*p[0]+2*x[1]*p[1]-p[0]^2-p[1]^2-x[2] for q in points] target = lambda x: x[0]^2+x[1]^2-x[2] m = minimize_constrained(target,constraints,[0,0,0]) circle((m[0],m[1]),sqrt(m[0]^2+m[1]^2-m[2]),color='red') + point(points) 

Credit: this post is an expanded version of a comment by David Speyer on last year’s post Covering points with caps, where I considered the same problem on a sphere.

## Words that contain UIO, and best-fitting lines

In Calculus I we spend a fair amount of time talking about how nicely the tangent line fits a smooth curve.

But truth be told, it fits only near the point of tangency. How can we find the best approximating line for a function ${f}$ on a given interval?

A natural measure of quality of approximation is the maximum deviation of the curve from the line, ${E(f;\alpha,\beta) = \max_{[a, b]} |f(x) - \alpha x-\beta|}$ where ${\alpha,\beta}$ are the coefficients in the line equation, to be determined. We need ${\alpha,\beta}$ that minimize ${E(f;\alpha,\beta)}$.

The Chebyshev equioscillation theorem is quite useful here. For one thing, its name contains the letter combination uio, which Scrabble players may appreciate. (Can you think of other words with this combination?) Also, its statement does not involve concepts outside of Calculus I. Specialized to the case of linear fit, it says that ${\alpha,\beta}$ are optimal if and only if there exist three numbers ${ x_1 in ${[a, b]}$ such that the deviations ${\delta_i = f(x_i) - \alpha x_i-\beta}$

• are equal to ${E(f;\alpha,\beta)}$ in absolute value: ${|\delta_i| = E(f;\alpha,\beta)}$ for ${i=1,2,3}$
• have alternating signs: ${\delta_1 = -\delta_2 = \delta_3}$

Let’s consider what this means. First, ${f'(x_i) =\alpha}$ unless ${x_i}$ is an endpoint of ${[a,b]}$. Since ${x_2}$ cannot be an endpoint, we have ${f'(x_2)=\alpha}$.

Furthermore, ${f(x) - \alpha x }$ takes the same value at ${x_1}$ and ${x_3}$. This gives an equation for ${x_2}$

$\displaystyle f(x_1)-f'(x_2)x_1 = f(x_3)-f'(x_2) x_3 \qquad \qquad (1)$

which can be rewritten in the form resembling the Mean Value Theorem:

$\displaystyle f'(x_2) = \frac{f(x_1)-f(x_3)}{x_1-x_3} \qquad \qquad (2)$

If ${f'}$ is strictly monotone, there can be only one ${x_i}$ with ${f'(x_i)=\alpha}$. Hence ${x_1=a}$ and ${x_3=b}$ in this case, and we find ${x_2}$ by solving (2). This gives ${\alpha = f'(x_2)}$, and then ${\beta}$ is not hard to find.

Here is how I did this in Sage:

var('x a b')
f = sin(x)  # or another function
df = f.diff(x)
a = # left endpoint
b = # right endpoint

That was the setup. Now the actual computation:

var('x1 x2 x3')
x1 = a
x3 = b
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, a, b)
alpha = df(x=x2)
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))

However, the algorithm fails to properly fit a line to the sine function on ${[0,3\pi/2]}$:

The problem is, ${f'(x)=\cos x}$ is no longer monotone, making it possible for two of ${x_i}$ to be interior points. Recalling the identities for cosine, we see that these points must be symmetric about ${x=\pi}$. One of ${x_i}$ must still be an endpoint, so either ${x_1=a}$ (and ${x_3=2\pi-x_2}$) or ${x_3=b}$ (and ${x_1=2\pi-x_2}$). The first option works:

This same line is also the best fit on the full period ${[0,2\pi]}$. It passes through ${(\pi,0)}$ and has the slope of ${-0.2172336...}$ which is not a number I can recognize.

On the interval ${[0,4\pi]}$, all three of the above approaches fail:

Luckily we don’t need a computer in this case. Whenever ${|f|}$ has at least three points of maximum with alternating signs of ${f}$, the Chebyshev equioscillation theorem implies that the best linear fit is the zero function.

## Improving the Wallis product

The Wallis product for ${\pi}$, as seen on Wikipedia, is

${\displaystyle 2\prod_{k=1}^\infty \frac{4k^2}{4k^2-1} = \pi \qquad \qquad (1)}$

Historical significance of this formula nonwithstanding, one has to admit that this is not a good way to approximate ${\pi}$. For example, the product up to ${k=10}$ is

${\displaystyle 2\,\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{1\cdot 3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21} =\frac{137438953472}{44801898141} }$

And all we get for this effort is the lousy approximation ${\pi\approx \mathbf{3.0677}}$.

But it turns out that (1) can be dramatically improved with a little tweak. First, let us rewrite partial products in (1) in terms of double factorials. This can be done in two ways: either

${\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1} = (4n+2) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2 \qquad \qquad (2)}$

or

${\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1} = \frac{2}{2n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2 \qquad \qquad (3)}$

Seeing how badly (2) underestimates ${\pi}$, it is natural to bump it up: replace ${4n+2}$ with ${4n+3}$:

${\displaystyle \pi \approx b_n= (4n+3) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2 \qquad \qquad (4)}$

Now with ${n=10}$ we get ${\mathbf{3.1407}}$ instead of ${\mathbf{3.0677}}$. The error is down by two orders of magnitude, and all we had to do was to replace the factor of ${4n+2=42}$ with ${4n+3=43}$. In particular, the size of numerator and denominator hardly changed:

${\displaystyle b_{10}=43\, \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21\cdot 21} }$

Approximation (4) differs from (2) by additional term ${\left(\frac{(2n)!!}{(2n+1)!!}\right)^2}$, which decreases to zero. Therefore, it is not obvious whether the sequence ${b_n}$ is increasing. To prove that it is, observe that the ratio ${b_{n+1}/b_n}$ is

${\displaystyle \frac{4n+7}{4n+3}\left(\frac{2n+2}{2n+3}\right)^2}$

which is greater than 1 because

${\displaystyle (4n+7)(2n+2)^2 - (4n+3)(2n+3)^2 = 1 >0 }$

Sweet cancellation here. Incidentally, it shows that if we used ${4n+3+\epsilon}$ instead of ${4n+3}$, the sequence would overshoot ${\pi}$ and no longer be increasing.

The formula (3) can be similarly improved. The fraction ${2/(2n+1)}$ is secretly ${4/(4n+2)}$, which should be replaced with ${4/(4n+1)}$. The resulting approximation for ${\pi}$

${\displaystyle c_n = \frac{4}{4n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2 \qquad \qquad (5)}$

is about as good as ${b_n}$, but it approaches ${\pi}$ from above. For example, ${c_{10}\approx \mathbf{3.1425}}$.

The proof that ${c_n}$ is decreasing is familiar: the ratio ${c_{n+1}/c_n}$ is

${\displaystyle \frac{4n+1}{4n+5}\left(\frac{2n+2}{2n+1}\right)^2}$

which is less than 1 because

${\displaystyle (4n+1)(2n+2)^2 - (4n+5)(2n+1)^2 = -1 <0 }$

Sweet cancellation once again.

Thus, ${b_n<\pi for all ${n}$. The midpoint of this containing interval provides an even better approximation: for example, ${(b_{10}+c_{10})/2 \approx \mathbf{3.1416}}$. The plot below displays the quality of approximation as logarithm of the absolute error:

• yellow dots show the error of Wallis partial products (2)-(3)
• blue is the error of ${b_n}$
• red is for ${c_n}$
• black is for ${(b_n+c_n)/2}$

And all we had to do was to replace ${4n+2}$ with ${4n+3}$ or ${4n+1}$ in the right places.

## Real zeros of sine Taylor polynomials

The more terms of Taylor series ${\displaystyle \sin x = x-\frac{x^3}{3!}+ \frac{x^5}{5!}- \cdots }$ we use, the more resemblance we see between the Taylor polynomial and the sine function itself. The first-degree polynomial matches one zero of the sine, and gets the slope right. The third-degree polynomial has three zeros in about the right places.

The fifth-degree polynomial will of course have … wait a moment.

Since all four critical points are in the window, there are no real zeros outside of our view. Adding the fifth-degree term not only fails to increase the number of zeros to five, it even drops it back to the level of ${T_1(x)=x}$. How odd.

Since the sine Taylor series converges uniformly on bounded intervals, for every ${ A }$ there exists ${ n }$ such that ${\max_{[-A,A]} |\sin x-T_n(x)|<1 }$. Then ${ T_n }$ will have the same sign as ${ \sin x }$ at the maxima and minima of the latter. Consequently, it will have about ${ 2A/\pi }$ zeros on the interval ${[-A,A] }$. Indeed, the intermediate value theorem guarantees that many; and the fact that ${T_n'(x) \approx \cos x }$ on ${ [-A,A]}$ will not allow for extraneous zeros within this interval.

Using the Taylor remainder estimate and Stirling's approximation, we find ${A\approx (n!)^{1/n} \approx n/e }$. Therefore, ${ T_n }$ will have about ${ 2n/(\pi e) }$ real zeros at about the right places. What happens when ${|x| }$ is too large for Taylor remainder estimate to be effective, we can't tell.

Let's just count the zeros, then. Sage online makes it very easy:

sineroots = [[2*n-1,len(sin(x).taylor(x,0,2*n-1).roots(ring=RR))] for n in range(1,51)]
scatter_plot(sineroots) 

The up-and-down pattern in the number of zeros makes for a neat scatter plot. How close is this data to the predicted number ${ 2n/(\pi e) }$? Pretty close.

scatter_plot(sineroots,facecolor='#eeee66') + plot(2*n/(pi*e),(n,1,100))

The slope of the blue line is ${ 2/(\pi e) \approx 0.2342 }$; the (ir)rationality of this number is unknown. Thus, just under a quarter of the zeros of ${ T_n }$ are expected to be real when ${ n }$ is large.

The actual number of real zeros tends to exceed the prediction (by only a few) because some Taylor polynomials have real zeros in the region where they no longer follow the function. For example, ${ T_{11} }$ does this:

Richard S. Varga and Amos J. Carpenter wrote a series of papers titled Zeros of the partial sums of ${ \cos z }$ and ${\sin z }$ in which they classify real zeros into Hurwitz (which follow the corresponding trigonometric function) and spurious. They give the precise count of the Hurwitz zeros: ${1+2\lfloor n/(\pi e)\rfloor }$ for the sine and ${2\lfloor n/(\pi e)+1/2\rfloor }$ for the cosine. The total number of real roots does not appear to admit such an explicit formula. It is the sequence A012264 in the OEIS.