Wild power pie

Many people are aware of ${\pi}$ being a number between 3 and 4, and some also know that ${e}$ is between 2 and 3. Although the difference ${\pi-e}$ is less than 1/2, it’s enough to place the two constants in separate buckets on the number line, separated by an integer.

When dealing with powers of ${e}$, using ${e>2}$ is frequently wasteful, so it helps to know that ${e^2>7}$. Similarly, ${\pi^2<10}$ is way more precise than ${\pi<4}$. To summarize: ${e^2}$ is between 7 and 8, while ${\pi^2}$ is between 9 and 10.

Do any two powers of ${\pi}$ and ${e}$ have the same integer part? That is, does the equation ${\lfloor \pi^n \rfloor = \lfloor e^m \rfloor}$ have a solution in positive integers ${m,n}$?

Probably not. Chances are that the only pairs ${(m,n)}$ with ${|\pi^n - e^m|<10}$ are those with ${m,n\in \{1,2\}}$, and the smallest difference is attained by ${m=n=1}$.

Indeed, having ${|\pi^n - e^m|<1}$ implies ${|n\log \pi - m|\lesssim \pi^{-n}}$, or put differently, ${\left|\log \pi - \dfrac{m}{n}\right| < \dfrac{1}{n \,\pi^n}}$. This would be an extraordinary rational approximation… for example, with ${n=100}$ it would mean that ${\log \pi = 1.14\ldots}$ with the following ${50}$ digits all being ${0}$. This isn’t happening.

Looking at the continued fraction expansion of ${\log \pi}$ shows denominators of modest size ${[1; 6, 1, 10, 24, \dots]}$, indicating the absence of extraordinarily good rational approximations. Of course, one can still use the convergents to get decent approximations, ${\left|\log \pi - \dfrac{m}{n}\right| < \dfrac{1}{n^2}}$, which leads to ${\pi^n\approx e^m}$ with small relative error. For example, dropping ${24}$ and subsequent terms yields the convergent ${87/76}$, and one can check that ${\pi^{76} = 6.0728... \cdot 10^{37}}$ while ${e^{87} = 6.0760...\cdot 10^{37}}$.
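Both claims are easy to verify numerically. A hypothetical check in plain Python (the post works with mpmath, but double precision suffices at these magnitudes):

```python
import math

# Distance from log(pi) to the convergent 87/76: convergents of the
# continued fraction satisfy |x - m/n| < 1/n^2.
err = abs(math.log(math.pi) - 87 / 76)
print(err < 1 / 76**2)           # True

# The resulting powers pi^76 and e^87 agree to about three significant digits.
rel = abs(math.pi**76 - math.e**87) / math.pi**76
print(f"{rel:.1e}")
```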

Trying a few not-too-obscure constants with the help of the mpmath library, the best coincidence of integer parts that I found is the following: the 13th power of the golden ratio ${\varphi = (\sqrt{5}+1)/2}$ and the 34th power of Apéry’s constant ${\zeta(3) = 1^{-3}+2^{-3}+3^{-3}+4^{-3}+\dots}$ both have integer part 521.
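A quick sanity check of this coincidence, as a hypothetical snippet with plain floats (a partial sum stands in for mpmath’s zeta; the tail after ${10^5}$ terms is far too small to affect the integer part):

```python
import math

phi = (math.sqrt(5) + 1) / 2
# Partial sum for Apéry's constant; the tail after K terms is below 1/(2K^2).
zeta3 = sum(k**-3 for k in range(1, 100_000))

print(math.floor(phi**13), math.floor(zeta3**34))  # 521 521
```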

Down with sines!

Suppose you have a reasonable continuous function ${f}$ on some interval, say ${f(x)=e^x}$ on ${[-1,1]}$, and you want to approximate it by a trigonometric polynomial. A straightforward approach is to write

$\displaystyle f(x) \approx \frac12 A_0+\sum_{n=1}^N (A_n\cos n \pi x +B_n \sin n \pi x)$

where ${A_n}$ and ${B_n}$ are the Fourier coefficients:

$\displaystyle A_n= \int_{-1}^1 e^x \cos \pi n x \,dx = (-1)^n \frac{e-e^{-1}}{1+ \pi^2 n^2 }$

$\displaystyle B_n= \int_{-1}^1 e^x \sin \pi n x \,dx = (-1)^{n-1} \frac{\pi n (e-e^{-1})}{1+ \pi^2 n^2 }$

(Integration can be done with the standard Calculus torture device.) With ${N=4}$, we get

$\displaystyle e^{x} \approx 1.175 -0.216\cos\pi x +0.679 \sin \pi x +0.058\cos 2\pi x -0.365 \sin 2\pi x -0.026\cos 3\pi x +0.247 \sin 3\pi x +0.015\cos 4\pi x -0.186 \sin 4\pi x$

which, frankly, is not a very good deal for the price.
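The closed-form coefficients are easy to tabulate; a small hypothetical script reproducing the numbers above:

```python
import math

E = math.e - 1 / math.e   # e - e^{-1}

def A(n):  # cosine coefficients of e^x on [-1, 1]
    return (-1)**n * E / (1 + (math.pi * n)**2)

def B(n):  # sine coefficients
    return (-1)**(n - 1) * math.pi * n * E / (1 + (math.pi * n)**2)

print(round(A(0) / 2, 3))                      # 1.175
for n in range(1, 5):
    print(n, round(A(n), 3), round(B(n), 3))
```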

Still using the standard Fourier expansion formulas, one can improve the approximation by shifting the function to ${[0,2]}$ and expanding it into a cosine Fourier series:

$\displaystyle f(x-1) \approx \frac12 A_0+\sum_{n=1}^N A_n\cos \frac{n \pi x}{2}$

where

$\displaystyle A_n= \int_{0}^2 e^{x-1} \cos \frac{\pi n x}{2} \,dx = \begin{cases} \dfrac{e-e^{-1}}{1+ \pi^2 k^2 } \quad & n=2k \\ {} \\ -\dfrac{4(e+e^{-1})}{4+ \pi^2 (2k-1)^2 } \quad & n = 2k-1 \end{cases}$

Then replace ${x}$ with ${x+1}$ to shift the interval back. With ${N=4}$, the partial sum is

$\displaystyle e^x \approx 1.175 -0.890\cos \frac{\pi(x+1)}{2} +0.216\cos \pi(x+1)-0.133\cos \frac{3\pi(x+1)}{2} +0.058\cos 2\pi (x+1)$

which gives a much better approximation with fewer coefficients to calculate.
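To quantify “much better”, one can compare the maximum errors of the two ${N=4}$ partial sums on a grid. A hypothetical sketch (the cosine coefficients come from direct integration, and the odd ones are all negative):

```python
import math

E1 = math.e - 1 / math.e      # e - e^{-1}
E2 = math.e + 1 / math.e      # e + e^{-1}

def full_sum(x, N=4):         # standard Fourier partial sum on [-1, 1]
    s = E1 / 2
    for n in range(1, N + 1):
        d = 1 + (math.pi * n)**2
        s += (-1)**n * E1 / d * math.cos(math.pi * n * x)
        s += (-1)**(n - 1) * math.pi * n * E1 / d * math.sin(math.pi * n * x)
    return s

def cosine_sum(x, N=4):       # cosine series of the shifted function
    s = E1 / 2
    for n in range(1, N + 1):
        if n % 2 == 0:        # n = 2k:  (e - e^{-1}) / (1 + pi^2 k^2)
            a = E1 / (1 + (math.pi * n / 2)**2)
        else:                 # n = 2k-1: -4(e + e^{-1}) / (4 + pi^2 n^2)
            a = -4 * E2 / (4 + (math.pi * n)**2)
        s += a * math.cos(math.pi * n * (x + 1) / 2)
    return s

xs = [i / 1000 for i in range(-1000, 1001)]
err_full = max(abs(full_sum(x) - math.exp(x)) for x in xs)
err_cos = max(abs(cosine_sum(x) - math.exp(x)) for x in xs)
print(round(err_full, 3), round(err_cos, 3))
```

The full series suffers from the jump at the endpoints, so its error stays above 1 there; the cosine series error is several times smaller.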

To see what is going on, one has to look beyond the interval on which ${f}$ is defined. The first series actually approximates the periodic extension of ${f}$, which is discontinuous because the endpoint values are not equal:

Cosines, being even, approximate the symmetric periodic extension of ${f}$, which is continuous whenever ${f}$ is.

Discontinuities hurt the quality of Fourier approximation more than the lack of smoothness does.

Just for laughs I included the pure sine approximation, also with ${N=4}$.

Improving the Wallis product

The Wallis product for ${\pi}$, as seen on Wikipedia, is

${\displaystyle 2\prod_{k=1}^\infty \frac{4k^2}{4k^2-1} = \pi \qquad \qquad (1)}$

The historical significance of this formula notwithstanding, one has to admit that this is not a good way to approximate ${\pi}$. For example, the product up to ${k=10}$ is

${\displaystyle 2\,\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{1\cdot 3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21} =\frac{137438953472}{44801898141} }$

And all we get for this effort is the lousy approximation ${\pi\approx \mathbf{3.0677}}$.
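The partial product can be reproduced exactly with rational arithmetic; a hypothetical snippet:

```python
import math
from fractions import Fraction

# Partial Wallis product up to k = 10, kept as an exact fraction.
p = 2 * math.prod(Fraction(4*k*k, 4*k*k - 1) for k in range(1, 11))
print(p)           # 137438953472/44801898141
print(float(p))    # 3.0677...
```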

But it turns out that (1) can be dramatically improved with a little tweak. First, let us rewrite the partial products in (1) in terms of double factorials. This can be done in two ways: either

${\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1} = (4n+2) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2 \qquad \qquad (2)}$

or

${\displaystyle 2\prod_{k=1}^n \frac{4k^2}{4k^2-1} = \frac{2}{2n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2 \qquad \qquad (3)}$

Seeing how badly (2) underestimates ${\pi}$, it is natural to bump it up: replace ${4n+2}$ with ${4n+3}$:

${\displaystyle \pi \approx b_n= (4n+3) \left(\frac{(2n)!!}{(2n+1)!!}\right)^2 \qquad \qquad (4)}$

Now with ${n=10}$ we get ${\mathbf{3.1407}}$ instead of ${\mathbf{3.0677}}$. The error is down by two orders of magnitude, and all we had to do was replace the factor ${4n+2=42}$ with ${4n+3=43}$. In particular, the sizes of the numerator and the denominator hardly changed:

${\displaystyle b_{10}=43\, \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10 \cdot 10\cdot 12\cdot 12\cdot 14\cdot 14\cdot 16\cdot 16\cdot 18 \cdot 18\cdot 20\cdot 20}{3\cdot 3\cdot 5 \cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11\cdot 11\cdot 13\cdot 13\cdot 15\cdot 15\cdot 17\cdot 17\cdot 19\cdot 19\cdot 21\cdot 21} }$

Approximation (4) differs from (2) by the additional term ${\left(\frac{(2n)!!}{(2n+1)!!}\right)^2}$, which decreases to zero. Therefore, it is not obvious whether the sequence ${b_n}$ is increasing. To prove that it is, observe that the ratio ${b_{n+1}/b_n}$ is

${\displaystyle \frac{4n+7}{4n+3}\left(\frac{2n+2}{2n+3}\right)^2}$

which is greater than 1 because

${\displaystyle (4n+7)(2n+2)^2 - (4n+3)(2n+3)^2 = 1 >0 }$

Sweet cancellation here. Incidentally, it shows that if we used ${4n+3+\epsilon}$ instead of ${4n+3}$, the sequence would overshoot ${\pi}$ and no longer be increasing.

The formula (3) can be similarly improved. The fraction ${2/(2n+1)}$ is secretly ${4/(4n+2)}$, which should be replaced with ${4/(4n+1)}$. The resulting approximation for ${\pi}$

${\displaystyle c_n = \frac{4}{4n+1} \left(\frac{(2n)!!}{(2n-1)!!}\right)^2 \qquad \qquad (5)}$

is about as good as ${b_n}$, but it approaches ${\pi}$ from above. For example, ${c_{10}\approx \mathbf{3.1425}}$.

The proof that ${c_n}$ is decreasing is familiar: the ratio ${c_{n+1}/c_n}$ is

${\displaystyle \frac{4n+1}{4n+5}\left(\frac{2n+2}{2n+1}\right)^2}$

which is less than 1 because

${\displaystyle (4n+1)(2n+2)^2 - (4n+5)(2n+1)^2 = -1 <0 }$

Sweet cancellation once again.
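Both cancellations are identities between cubic polynomials, so agreement at four points already forces them; still, a hypothetical range check in exact integer arithmetic costs nothing:

```python
# Each side is a cubic in n, so matching at four points proves the identity;
# checking a whole range in exact integer arithmetic is just as easy.
b_ok = all((4*n + 7)*(2*n + 2)**2 - (4*n + 3)*(2*n + 3)**2 == 1 for n in range(1000))
c_ok = all((4*n + 1)*(2*n + 2)**2 - (4*n + 5)*(2*n + 1)**2 == -1 for n in range(1000))
print(b_ok, c_ok)  # True True
```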

Thus, ${b_n<\pi<c_n}$ for all ${n}$. The midpoint of this containing interval provides an even better approximation: for example, ${(b_{10}+c_{10})/2 \approx \mathbf{3.1416}}$. The plot below displays the quality of approximation as the logarithm of the absolute error:

• yellow dots show the error of Wallis partial products (2)-(3)
• blue is the error of ${b_n}$
• red is for ${c_n}$
• black is for ${(b_n+c_n)/2}$

And all we had to do was to replace ${4n+2}$ with ${4n+3}$ or ${4n+1}$ in the right places.
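The sequences ${b_n}$, ${c_n}$ and their midpoint are easy to tabulate; a hypothetical sketch with a small double-factorial helper (the helper is mine, not from the post):

```python
import math

def dfact(n):
    """Double factorial n!! (hypothetical helper)."""
    return math.prod(range(n, 0, -2))

def b(n):  # formula (4), approaches pi from below
    return (4*n + 3) * (dfact(2*n) / dfact(2*n + 1))**2

def c(n):  # formula (5), approaches pi from above
    return 4 / (4*n + 1) * (dfact(2*n) / dfact(2*n - 1))**2

print(round(b(10), 4), round(c(10), 4), round((b(10) + c(10)) / 2, 4))
# 3.1407 3.1425 3.1416
```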

Dirichlet vs Fejér

Convolution of a continuous function ${f}$ on the circle ${\mathbb T=\mathbb R/(2\pi \mathbb Z)}$ with the Fejér kernel ${\displaystyle F_N(x)=\frac{1-\cos (N+1)x}{(N+1)(1-\cos x)}}$ is guaranteed to produce trigonometric polynomials that converge to ${f}$ uniformly as ${N\rightarrow\infty}$. For the Dirichlet kernel ${\displaystyle D_N(x)=\frac{\sin((N+1/2)x)}{\sin(x/2)}}$ this is not the case: the sequence may fail to converge to ${f}$ even pointwise. The underlying reason is that ${\int_{\mathbb T} |D_N|\rightarrow \infty }$, while the Fejér kernel, being positive, has constant ${L^1}$ norm. Does this mean that Fejér’s kernel is to be preferred for approximation purposes?

Let’s compare the performance of both kernels on the function ${f(x)=2\pi^2 x^2-x^4}$, which is reasonably nice: ${f\in C^2(\mathbb T)}$. Convolution with ${D_2}$ yields ${\displaystyle \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)D_2(x-t)\,dt = \frac{7\pi^4}{15} -48 \cos x +3 \cos 2x }$. The trigonometric polynomial is in blue, the original function in red:

I’d say this is a very good approximation.

Now try the Fejér kernel, also with ${N=2}$. The polynomial is ${\displaystyle \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)F_2(x-t)\,dt = \frac{7\pi^4}{15} - 32 \cos x + \cos 2x }$

This is not good at all.

And even with ${N=20}$ terms the Fejér approximation is not as good as Dirichlet with merely ${N=2}$.

The performance of ${F_{50}}$ is comparable to that of ${D_2}$. Of course, a ${50}$-term approximation is not what one normally wants to use. And it still has visible deviation near the origin, where the function ${f}$ is ${C^\infty}$ smooth:

In contrast, the Dirichlet kernel with ${N=4}$ gives a low-degree polynomial
${\displaystyle \frac{7\pi^4}{15} -48 \cos x +3 \cos 2x -\frac{16}{27}\cos 3x+\frac{3}{16}\cos 4x}$ that approximates ${f}$ to within the resolution of the plot:

What we have here is the trigonometric version of Biased and unbiased mollification. Convolution with ${D_N}$ amounts to truncation of the Fourier series at index ${N}$. Therefore, it reproduces the trigonometric polynomials of low degrees precisely. But ${F_N}$ performs soft thresholding: it multiplies the ${n}$th Fourier coefficient of ${f}$ by ${(1-|n|/(N+1))^+}$. In particular, it transforms ${\cos x}$ into ${(N/(N+1))\cos x}$, introducing the error of order ${1/N}$ — a pretty big one. Since this error is built into the kernel, it limits the rate of convergence no matter how smooth the function ${f}$ is. Such is the price that must be paid for positivity.
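This accounting can be replayed numerically from the Fourier coefficients of ${f}$, which are ${a_n = 48(-1)^n/n^4}$ with constant term ${7\pi^4/15}$ (consistent with the ${D_2}$ polynomial above); a hypothetical sketch:

```python
import math

def f(x):
    return 2 * math.pi**2 * x**2 - x**4

def approx(x, N, fejer):
    # Dirichlet truncates the series at N; Fejér multiplies the n-th
    # coefficient by the soft-threshold factor (1 - n/(N+1)).
    s = 7 * math.pi**4 / 15
    for n in range(1, N + 1):
        w = 1 - n / (N + 1) if fejer else 1.0
        s += w * 48 * (-1)**n / n**4 * math.cos(n * x)
    return s

def max_err(N, fejer):
    xs = [math.pi * i / 500 for i in range(-500, 501)]
    return max(abs(approx(x, N, fejer) - f(x)) for x in xs)

for label, N, fejer in [("D_2", 2, False), ("F_2", 2, True), ("F_20", 20, True)]:
    print(label, round(max_err(N, fejer), 2))
```

The printed errors confirm the pictures: ${D_2}$ beats even ${F_{20}}$, whose error is dominated by the ${(1/21)\cdot 48\cos x}$ deficiency in the first coefficient.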

This reminds me of a parenthetical remark by G. B. Folland in Real Analysis (2nd ed., page 264):

if one wants to approximate a function ${f\in C(\mathbb T)}$ uniformly by trigonometric polynomials, one should not count on partial sums ${S_mf}$ to do the job; the Cesàro means work much better in general.

Right, for ugly “generic” elements of ${C(\mathbb T)}$ the Fejér kernel is a safer option. But for decently behaved functions the Dirichlet kernel wins by a landslide. The function above was ${C^2}$-smooth; as a final example I take ${f(x)=x^2}$ which is merely Lipschitz on ${\mathbb T}$. The original function is in red, ${f*D_4}$ is in blue, and ${f*F_4}$ is in green.

Added: the Jackson kernel ${J_{2N}}$ is the square of ${F_{N}}$, normalized. I use ${2N}$ as the index because squaring doubles the degree. Here is how it approximates ${f(x)=2\pi^2 x^2-x^4}$:

The Jackson kernel performs somewhat better than ${F_N}$, because the coefficient of ${\cos x}$ is off by ${O(1/N^2)}$. Still not nearly as good as the non-positive Dirichlet kernel.

Biased and unbiased mollification

When we want to smoothen (mollify) a given function ${f:{\mathbb R}\rightarrow{\mathbb R}}$, the standard recipe suggests: take the ${C^{\infty}}$-smooth bump function

$\displaystyle \varphi(t) = \begin{cases} c\, \exp\{-1/(1-t^2)\}\quad & |t|<1; \\ 0 \quad & |t|\ge 1 \end{cases}$

where ${c}$ is chosen so that ${\int_{{\mathbb R}} \varphi=1}$ (for the record, ${c\approx 2.2523}$).

Make the bump narrow and tall: ${\varphi_{\delta}(t)=\delta^{-1}\varphi(t/\delta)}$. Then define ${f_\delta = f*\varphi_\delta}$, that is

$\displaystyle f_\delta(x) = \int_{\mathbb R} f(x-t) \varphi_\delta(t)\,dt = \int_{\mathbb R} f(t) \varphi_\delta(x-t)\,dt$

The second form of the integral makes it clear that ${f_\delta}$ is infinitely differentiable. And it is easy to prove that for any continuous ${f}$ the approximation ${f_\delta}$ converges to ${ f}$ uniformly on compact subsets of ${{\mathbb R}}$.

The choice of the particular mollifier given above is quite natural: we want a ${C^\infty}$ function with compact support (to avoid any issues with fast-growing functions ${f}$), so it cannot be analytic. And functions like ${\exp(-1/t)}$ are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?

Well, one may ask for a good rate of convergence. If ${f}$ is an ugly function like ${f(x)=\sqrt{|x|}}$, then we probably should not expect fast convergence. But it could be something like ${f(x)=|x|^7}$, a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start ${f\in C^6}$ that we are given?

No, it will not. The limiting factor is not the lack of seventh derivative at ${x=0}$; it is the presence of (nonzero) second derivative at ${x\ne 0}$. To study this effect in isolation, consider the function ${f(x)=x^2}$, which has nothing beyond the second derivative. Here it is together with ${f_{0.1}}$: the red and blue graphs are nearly indistinguishable.

But upon closer inspection, ${f_{0.1}}$ misses the target by almost ${2\cdot 10^{-3}}$. And not only around the origin: the difference ${f_{0.1}-f}$ is constant.

With ${\delta=0.01}$ the approximation is better.

But upon closer inspection, ${f_{0.01}}$ misses the target by almost ${2\cdot 10^{-5}}$.

And so it continues, with the error of order ${\delta^2}$. And here is where it comes from:

$\displaystyle f_\delta(0) = \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = \delta^{-1} \int_{\mathbb R} t^2\varphi(t/\delta)\,dt = \delta^{2} \int_{\mathbb R} s^2\varphi(s)\,ds$

The root of the problem is the nonzero second moment ${\int_{\mathbb R} s^2\varphi(s)\,ds \approx 0.158}$. But of course, this moment cannot be zero if ${\varphi}$ does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as ${\varphi}$, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order ${\delta^2}$.
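The constants quoted above can be reproduced with a crude midpoint rule (a hypothetical check; the bump vanishes to all orders at ${\pm 1}$, so the midpoint rule converges very fast here):

```python
import math

def bump(t):  # unnormalized bump exp(-1/(1-t^2)) on (-1, 1)
    return math.exp(-1 / (1 - t*t)) if abs(t) < 1 else 0.0

M = 4000
h = 2 / M
nodes = [-1 + (i + 0.5) * h for i in range(M)]   # midpoint nodes on [-1, 1]

mass = h * sum(bump(t) for t in nodes)               # integral of the bump
m2 = h * sum(t*t * bump(t) for t in nodes) / mass    # second moment of phi
print(round(1 / mass, 4), round(m2, 3))              # c ≈ 2.2523, m2 ≈ 0.158

# The bias of mollifying f(x) = x^2 at the origin is exactly delta^2 * m2.
delta = 0.1
print(delta**2 * m2)   # ≈ 1.6e-3, the "almost 2e-3" seen above
```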

Let’s find a mollifier without such limitations; that is, with zero moments of all orders. One way to do it is to use the Fourier transform. Since ${\int_{\mathbb R} t^n \varphi(t)\,dt }$ is a multiple of ${\widehat{\varphi}^{(n)}(0)}$, it suffices to find a nice function ${\psi}$ such that ${\psi(0)=1}$ and ${\psi^{(n)}(0)=0}$ for ${n\ge 1}$; the mollifier will be the inverse Fourier transform of ${\psi}$.

As an example, I took something similar to the original ${\varphi}$, but with a flat top:

$\displaystyle \psi(\xi) = \begin{cases} 1 \quad & |\xi|\le 0.1; \\ \exp\{1-1/(1-(|\xi|-0.1)^2)\} \quad & 0.1<|\xi|<1.1\\ 0\quad & |\xi|\ge 1.1 \end{cases}$

The inverse Fourier transform of ${\psi}$ is a mollifier that reproduces all polynomials exactly: ${p*\varphi = p}$ for any polynomial. Here it is:

Since I did not make ${\psi}$ very carefully (its second derivative is discontinuous at ${\pm 0.1}$), the mollifier ${\varphi}$ has a moderately heavy tail. With a more careful construction it would decay faster than any power of ${t}$. However, it cannot be compactly supported. Indeed, if ${\varphi}$ were compactly supported, then ${\widehat{\varphi}}$ would be real-analytic; that is, represented by its power series centered at ${\xi=0}$. But that power series is

$\displaystyle 1+0+0+0+0+0+0+0+0+0+\dots$

The idea of using negative weights in the process of averaging ${f}$ looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.

Credit: I got the idea of this post from the following remark by fedja on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond ${C^2}$ and ${\delta^2}$ with them.