Wild power pie

Many people are aware of {\pi} being a number between 3 and 4, and some also know that {e} is between 2 and 3. Although the difference {\pi-e} is less than 1/2, it’s enough to place the two constants in separate buckets on the number line, separated by an integer.

When dealing with powers of {e}, using {e>2} is frequently wasteful, so it helps to know that {e^2>7}. Similarly, {\pi^2<10} is way more precise than {\pi<4}. To summarize: {e^2} is between 7 and 8, while {\pi^2} is between 9 and 10.

Do any two powers of {\pi} and {e} have the same integer part? That is, does the equation {\lfloor \pi^n \rfloor = \lfloor e^m \rfloor} have a solution in positive integers {m,n}?

Probably not. Chances are that the only pairs {(m,n)} for which {|\pi^n - e^m|<10} are {m,n\in \{1,2\}}, the smallest difference attained by {m=n=1}.

Indeed, having {|\pi^n - e^m|<1} implies that {|n\log \pi - m|<e^{-m}}, or put differently, {\left|\log \pi - \dfrac{m}{n}\right| < \dfrac{1}{n \,\pi^n}}. This would be an extraordinary rational approximation… for example, with {n=100} it would mean that {\log \pi = 1.14\ldots} with the following {50} digits all being {0}. This isn’t happening.

Looking at the continued fraction expansion of {\log \pi} shows the denominators of modest size {[1; 6, 1, 10, 24, \dots]}, indicating the lack of extraordinarily nice rational approximations. Of course, can use them to get good approximations, {\left|\log \pi - \dfrac{m}{n}\right| < \dfrac{1}{n^2}}, which leads to {\pi^n\approx e^m} with small relative error. For example, dropping {24} and subsequent terms yields the convergent {87/76}, and one can check that {\pi^{76} = 6.0728... \cdot 10^{37}} while {e^{87} = 6.0760...\cdot 10^{37}}.

Trying a few not-too-obscure constants with the help of mpmath library, the best coincidence of integer parts that I found is the following: the 13th power of the golden ratio {\varphi = (\sqrt{5}+1)/2} and the 34th power of Apèry’s constant {\zeta(3) = 1^{-3}+2^{-3}+3^{-3}+4^{-4}+\dots} both have integer part 521.

Down with sines!

Suppose you have a reasonable continuous function {f} on some interval, say {f(x)=e^x} on {[-1,1]}, and you want to approximate it by a trigonometric polynomial. A straightforward approach is to write

\displaystyle    f(x) \approx \frac12 A_0+\sum_{n=1}^N (A_n\cos n \pi x +B_n \sin n \pi x)

where {A_n} and {B_n} are the Fourier coefficients:

\displaystyle    A_n= \int_{-1}^1 e^x \cos \pi n x \,dx = (-1)^n \frac{e-e^{-1}}{1+ \pi^2 n^2 }

\displaystyle    B_n= \int_{-1}^1 e^x \sin \pi n x \,dx = (-1)^{n-1} \frac{\pi n (e-e^{-1})}{1+ \pi^2 n^2 }

(Integration can be done with the standard Calculus torture device). With {N=4}, we get

\displaystyle    e^{x} \approx 1.175 -0.216\cos\pi x +0.679 \sin \pi x +0.058\cos 2\pi x -0.365 \sin 2\pi x -0.026\cos 3\pi x +0.247 \sin 3\pi x +0.015\cos 4\pi x   -0.186 \sin 4\pi x

which, frankly, is not a very good deal for the price.

Would not buy again
Would not buy again

Still using the standard Fourier expansion formulas, one can improve approximation by shifting the function to {[0,2]} and expanding it into the cosine Fourier series.

\displaystyle    f(x-1) \approx \frac12 A_0+\sum_{n=1}^N A_n\cos \frac{n \pi x}{2}


\displaystyle    A_n= \int_{0}^2 e^{x-1} \cos \frac{\pi n x}{2} \,dx = \begin{cases} \dfrac{e-e^{-1}}{1+ \pi^2 k^2 }  \quad & n=2k \\ {} \\   (-1)^k \dfrac{4(e+e^{-1})}{4+ \pi^2 (2k-1)^2 } \quad & n = 2k-1 \end{cases}

Then replace {x} with {x+1} to shift the interval back. With {N=4}, the partial sum is

\displaystyle    e^x \approx 1.175 -0.890\cos \frac{\pi(x+1)}{2} +0.216\cos \pi(x+1)-0.133\cos \frac{3\pi(x+1)}{2} +0.058\cos 2\pi (x+1)

which gives a much better approximation with fewer coefficients to calculate.

When the sines don't get in the way.
When the sines don’t get in the way

To see what is going on, one has to look beyond the interval on which {f} is defined. The first series actually approximates the periodic extension of {f}, which is discontinuous because the endpoint values are not equal:

Periodic extension of a continuous function may be discontinuous
Periodic extension of a continuous function may be discontinuous

Cosines, being even, approximate the symmetric periodic extension of {f}, which is continuous whenever {f} is.

Symmetric periodic extension preserves continuity
Symmetric periodic extension preserves continuity

Discontinuities hurt the quality of Fourier approximation more than the lack of smoothness does.

Just for laughs I included the pure sine approximation, also with {N=4}.

Down with sines
Down with sines

Improving the Wallis product

The Wallis product for {\pi}, as seen on Wikipedia, is

{\displaystyle 2\prod_{k=1}^\infty \frac{4k^2}{4k^2-1}  = \pi \qquad \qquad (1)}

Historical significance of this formula nonwithstanding, one has to admit that this is not a good way to approximate {\pi}. For example, the product up to {k=10} is

{\displaystyle 2\,\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{1\cdot 3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21} =\frac{137438953472}{44801898141} }

And all we get for this effort is the lousy approximation {\pi\approx \mathbf{3.0677}}.

But it turns out that (1) can be dramatically improved with a little tweak. First, let us rewrite partial products in (1) in terms of double factorials. This can be done in two ways: either

{\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1}  =  (4n+2) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2  \qquad \qquad (2)}


{\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1}  =  \frac{2}{2n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2  \qquad \qquad (3)}

Seeing how badly (2) underestimates {\pi}, it is natural to bump it up: replace {4n+2} with {4n+3}:

{\displaystyle \pi \approx b_n= (4n+3) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2  \qquad \qquad (4)}

Now with {n=10} we get {\mathbf{3.1407}} instead of {\mathbf{3.0677}}. The error is down by two orders of magnitude, and all we had to do was to replace the factor of {4n+2=42} with {4n+3=43}. In particular, the size of numerator and denominator hardly changed:

{\displaystyle b_{10}=43\, \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21\cdot 21} }

Approximation (4) differs from (2) by additional term {\left(\frac{(2n)!!}{(2n+1)!!}\right)^2}, which decreases to zero. Therefore, it is not obvious whether the sequence {b_n} is increasing. To prove that it is, observe that the ratio {b_{n+1}/b_n} is

{\displaystyle  \frac{4n+7}{4n+3}\left(\frac{2n+2}{2n+3}\right)^2}

which is greater than 1 because

{\displaystyle  (4n+7)(2n+2)^2 - (4n+3)(2n+3)^2 = 1 >0 }

Sweet cancellation here. Incidentally, it shows that if we used {4n+3+\epsilon} instead of {4n+3}, the sequence would overshoot {\pi} and no longer be increasing.

The formula (3) can be similarly improved. The fraction {2/(2n+1)} is secretly {4/(4n+2)}, which should be replaced with {4/(4n+1)}. The resulting approximation for {\pi}

{\displaystyle c_n =  \frac{4}{4n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2  \qquad \qquad (5)}

is about as good as {b_n}, but it approaches {\pi} from above. For example, {c_{10}\approx \mathbf{3.1425}}.

The proof that {c_n} is decreasing is familiar: the ratio {c_{n+1}/c_n} is

{\displaystyle  \frac{4n+1}{4n+5}\left(\frac{2n+2}{2n+1}\right)^2}

which is less than 1 because

{\displaystyle  (4n+1)(2n+2)^2 - (4n+5)(2n+1)^2 = -1 <0 }

Sweet cancellation once again.

Thus, {b_n<\pi<c_n} for all {n}. The midpoint of this containing interval provides an even better approximation: for example, {(b_{10}+c_{10})/2 \approx \mathbf{3.1416}}. The plot below displays the quality of approximation as logarithm of the absolute error:

Logarithm of approximation error
  • yellow dots show the error of Wallis partial products (2)-(3)
  • blue is the error of {b_n}
  • red is for {c_n}
  • black is for {(b_n+c_n)/2}

And all we had to do was to replace {4n+2} with {4n+3} or {4n+1} in the right places.

Dirichlet vs Fejér

Convolution of a continuous function {f} on the circle {\mathbb T=\mathbb R/\mathbb (2\pi \mathbb Z)} with the Fejér kernel {\displaystyle F_N(x)=\frac{1-\cos (N+1)x}{(N+1)(1-\cos x)}} is guaranteed to produce trigonometric polynomials that converge to {f} uniformly as {N\rightarrow\infty}. For the Dirichlet kernel {\displaystyle D_N(x)=\frac{\sin((N+1/2)x)}{\sin(x/2)}} this is not the case: the sequence may fail to converge to {f} even pointwise. The underlying reason is that {\int_{\mathbb T} |D_N|\rightarrow \infty }, while the Fejér kernel, being positive, has constant {L^1} norm. Does this mean that Fejér’s kernel is to be preferred for approximation purposes?

Let’s compare the performance of both kernels on the function {f(x)=2\pi^2 x^2-x^4}, which is reasonably nice: {f\in C^2(\mathbb T)}. Convolution with {D_2} yields {\displaystyle \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)D_2(x-t)\,dt = \frac{7\pi^4}{15} -48 \cos x +3 \cos 2x }. The trigonometric polynomial is in blue, the original function in red:

Convolution with D_2
Convolution with D_2

I’d say this is a very good approximation.

Now try the Fejér kernel, also with {N=2}. The polynomial is {\displaystyle \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)K_2(x-t)\,dt = \frac{7\pi^4}{15} - 32 \cos x + \cos 2x }

Convolution with F_2
Convolution with F_2

This is not good at all.

And even with {N=20} terms the Fejér approximation is not as good as Dirichlet with merely {N=2}.

Convolution with F_20
Convolution with F_20

The performance of {F_{50}} is comparable to that of {D_2}. Of course, a {50}-term approximation is not what one normally wants to use. And it still has visible deviation near the origin, where the function {f} is {C^\infty} smooth:

Convolution with F_50
Convolution with F_50

In contrast, the Dirichlet kernel with {N=4} gives a low-degree polynomial
{\displaystyle \frac{7\pi^4}{15} -48 \cos x +3 \cos 2x -\frac{16}{27}\cos 3x+\frac{3}{16}\cos 4x} that approximates {f} to within the resolution of the plot:

Convolution with D_4
Convolution with D_4

What we have here is the trigonometric version of Biased and unbiased mollification. Convolution with {D_N} amounts to truncation of the Fourier series at index {N}. Therefore, it reproduces the trigonometric polynomials of low degrees precisely. But {F_N} performs soft thresholding: it multiplies the {n}th Fourier coefficient of {f} by {(1-|n|/(N+1))^+}. In particular, it transforms {\cos x} into {(N/(N+1))\cos x}, introducing the error of order {1/N} — a pretty big one. Since this error is built into the kernel, it limits the rate of convergence no matter how smooth the function {f} is. Such is the price that must be paid for positivity.

This reminds me of a parenthetical remark by G. B. Folland in Real Analysis (2nd ed., page 264):

if one wants to approximate a function {f\in C(\mathbb T)} uniformly by trigonometric polynomials, one should not count on partial sums {S_mf} to do the job; the Cesàro means work much better in general.

Right, for ugly “generic” elements of {C(\mathbb T)} the Fejér kernel is a safer option. But for decently behaved functions the Dirichlet kernel wins by a landslide. The function above was {C^2}-smooth; as a final example I take {f(x)=x^2} which is merely Lipschitz on {\mathbb T}. The original function is in red, {f*D_4} is in blue, and {f*F_4} is in green.

Dirichlet wins again
Dirichlet wins again

Added: the Jackson kernel {J_{2N}} is the square of {F_{N}}, normalized. I use {2N} as the index because squaring doubles the degree. Here is how it approximates {f(x)=2\pi^2 x^2-x^4}:

Convolution with J_2
Convolution with J_2
Convolution with J_4
Convolution with J_4
Convolution with J_20
Convolution with J_20

The Jackson kernel performs somewhat better than {F_N}, because the coefficient of {\cos x} is off by {O(1/N^2)}. Still not nearly as good as the non-positive Dirichlet kernel.

Biased and unbiased mollification

When we want to smoothen (mollify) a given function {f:{\mathbb R}\rightarrow{\mathbb R}}, the standard recipe suggests: take the {C^{\infty}}-smooth bump function

\displaystyle    \varphi(t) = \begin{cases} c\, \exp\{1/(1-t^2)\}\quad & |t|<1; \\   0 \quad & |t|\ge 1 \end{cases}

where {c} is chosen so that {\int_{{\mathbb R}} \varphi=1} (for the record, {c\approx 2.2523}).

Standard bump
Standard bump

Make the bump narrow and tall: {\varphi_{\delta}(t)=\delta^{-1}\varphi(t/\delta)}. Then define {f_\delta = f*\varphi_\delta}, that is

\displaystyle    f_\delta(x) = \int_{\mathbb R} f(x-t) \varphi_\delta(t)\,dt = \int_{\mathbb R} f(t) \varphi_\delta(x-t)\,dt

The second form of the integral makes it clear that {f_\delta} is infinitely differentiable. And it is easy to prove that for any continuous {f} the approximation {f_\delta} converges to { f} uniformly on compact subsets of {{\mathbb R}}.

The choice of the particular mollifier given above is quite natural: we want a {C^\infty} function with compact support (to avoid any issues with fast-growing functions {f}), so it cannot be analytic. And functions like {\exp(-1/t)} are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?

Well, one may ask for a good rate of convergence. If {f} is an ugly function like {f(x)=\sqrt{|x|}}, then we probably should not expect fast convergence. But is could be something like {f(x)=|x|^7}; a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start {f\in C^6} that we are given?

No, it will not. The limiting factor is not the lack of seventh derivative at {x=0}; it is the presence of (nonzero) second derivative at {x\ne 0}. To study this effect in isolation, consider the function {f(x)=x^2}, which has nothing beyond the second derivative. Here it is together with {f_{0.1}}: the red and blue graphs are nearly indistinguishable.

Good approximation

But upon closer inspection, {f_{0.1}} misses the target by almost {2\cdot 10^{-3}}. And not only around the origin: the difference {f_{0.1}-f} is constant.

But it overshoots the target

With {\delta=0.01} the approximation is better.

Better approximation
Better approximation

But upon closer inspection, {f_{0.01}} misses the target by almost {2\cdot 10^{-5}}.

Still overshoots
Still overshoots

And so it continues, with the error of order {\delta^2}. And here is where it comes from:

\displaystyle f_\delta(0) = \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = \delta^{-1} \int_{\mathbb R} t^2\varphi(t/\delta)\,dt  = \delta^{2} \int_{\mathbb R} s^2\varphi(s)\,ds

The root of the problem is the nonzero second moment {\int_{\mathbb R} s^2\varphi(s)\,ds \approx 0.158}. But of course, this moment cannot be zero if {\varphi} does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as {\varphi}, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order {\delta^2}.

Let’s find a mollifier without such limitations; that is, with zero moments of all orders. One way to do it is to use the Fourier transform. Since {\int_{\mathbb R} t^n \varphi(t)\,dt } is a multiple of {\widehat{\varphi}^{(n)}(0)}, it suffices to find a nice function {\psi} such that {\psi(0)=1} and {\psi^{(n)}(0)=0} for {n\ge 1}; the mollifier will be the inverse Fourier transform of {\psi}.

As an example, I took something similar to the original {\varphi}, but with a flat top:

\displaystyle  \psi(\xi) = \begin{cases} 1 \quad & |\xi|\le 0.1; \\    \exp\{1-1/(1-(|\xi|-0.01)^2)\} \quad & 0.1<|\xi|<1.1\\  0\quad & |\xi|\ge 1.1  \end{cases}

Fourier transform of unbiased mollifier
Fourier transform of unbiased mollifier

The inverse Fourier transform of {\psi} is a mollifier that reproduces all polynomials exactly: {p*\varphi = p} for any polynomial. Here it is:

Unbiased mollifier
Unbiased mollifier

Since I did not make {\psi} very carefully (its second derivative is discontinuous at {\pm 0.01}), the mollifier {\varphi} has a moderately heavy tail. With a more careful construction it would decay faster than any power of {t}. However, it cannot be compactly supported. Indeed, if {\varphi} were compactly supported, then {\widehat{\varphi}} would be real-analytic; that is, represented by its power series centered at {\xi=0}. But that power series is

\displaystyle 1+0+0+0+0+0+0+0+0+0+\dots

The idea of using negative weights in the process of averaging {f} looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.

Credit: I got the idea of this post from the following remark by fedja on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond {C^2} and {\delta^2} with them.