Many people are aware of being a number between 3 and 4, and some also know that is between 2 and 3. Although the difference is less than 1/2, it’s enough to place the two constants in separate buckets on the number line, separated by an integer.
When dealing with powers of , using is frequently wasteful, so it helps to know that . Similarly, is way more precise than . To summarize: is between 7 and 8, while is between 9 and 10.
Do any two powers of and have the same integer part? That is, does the equation have a solution in positive integers ?
Probably not. Chances are that the only pairs for which are , the smallest difference attained by .
Indeed, having implies that , or put differently, . This would be an extraordinary rational approximation… for example, with it would mean that with the following digits all being . This isn’t happening.
Looking at the continued fraction expansion of shows the denominators of modest size , indicating the lack of extraordinarily nice rational approximations. Of course, can use them to get good approximations, , which leads to with small relative error. For example, dropping and subsequent terms yields the convergent , and one can check that while .
Trying a few not-too-obscure constants with the help of mpmath library, the best coincidence of integer parts that I found is the following: the 13th power of the golden ratio and the 34th power of Apèry’s constant both have integer part 521.
which, frankly, is not a very good deal for the price.
Still using the standard Fourier expansion formulas, one can improve approximation by shifting the function to and expanding it into the cosine Fourier series.
Then replace with to shift the interval back. With , the partial sum is
which gives a much better approximation with fewer coefficients to calculate.
To see what is going on, one has to look beyond the interval on which is defined. The first series actually approximates the periodic extension of , which is discontinuous because the endpoint values are not equal:
Cosines, being even, approximate the symmetric periodic extension of , which is continuous whenever is.
Discontinuities hurt the quality of Fourier approximation more than the lack of smoothness does.
Just for laughs I included the pure sine approximation, also with .
Convolution of a continuous function on the circle with the Fejér kernel is guaranteed to produce trigonometric polynomials that converge to uniformly as . For the Dirichlet kernel this is not the case: the sequence may fail to converge to even pointwise. The underlying reason is that , while the Fejér kernel, being positive, has constant norm. Does this mean that Fejér’s kernel is to be preferred for approximation purposes?
Let’s compare the performance of both kernels on the function , which is reasonably nice: . Convolution with yields . The trigonometric polynomial is in blue, the original function in red:
I’d say this is a very good approximation.
Now try the Fejér kernel, also with . The polynomial is
This is not good at all.
And even with terms the Fejér approximation is not as good as Dirichlet with merely .
The performance of is comparable to that of . Of course, a -term approximation is not what one normally wants to use. And it still has visible deviation near the origin, where the function is smooth:
In contrast, the Dirichlet kernel with gives a low-degree polynomial
that approximates to within the resolution of the plot:
What we have here is the trigonometric version of Biased and unbiased mollification. Convolution with amounts to truncation of the Fourier series at index . Therefore, it reproduces the trigonometric polynomials of low degrees precisely. But performs soft thresholding: it multiplies the th Fourier coefficient of by . In particular, it transforms into , introducing the error of order — a pretty big one. Since this error is built into the kernel, it limits the rate of convergence no matter how smooth the function is. Such is the price that must be paid for positivity.
This reminds me of a parenthetical remark by G. B. Folland in Real Analysis (2nd ed., page 264):
if one wants to approximate a function uniformly by trigonometric polynomials, one should not count on partial sums to do the job; the Cesàro means work much better in general.
Right, for ugly “generic” elements of the Fejér kernel is a safer option. But for decently behaved functions the Dirichlet kernel wins by a landslide. The function above was -smooth; as a final example I take which is merely Lipschitz on . The original function is in red, is in blue, and is in green.
Added: the Jackson kernel is the square of , normalized. I use as the index because squaring doubles the degree. Here is how it approximates :
The Jackson kernel performs somewhat better than , because the coefficient of is off by . Still not nearly as good as the non-positive Dirichlet kernel.
When we want to smoothen (mollify) a given function , the standard recipe suggests: take the -smooth bump function
where is chosen so that (for the record, ).
Make the bump narrow and tall: . Then define , that is
The second form of the integral makes it clear that is infinitely differentiable. And it is easy to prove that for any continuous the approximation converges to uniformly on compact subsets of .
The choice of the particular mollifier given above is quite natural: we want a function with compact support (to avoid any issues with fast-growing functions ), so it cannot be analytic. And functions like are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?
Well, one may ask for a good rate of convergence. If is an ugly function like , then we probably should not expect fast convergence. But is could be something like ; a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start that we are given?
No, it will not. The limiting factor is not the lack of seventh derivative at ; it is the presence of (nonzero) second derivative at . To study this effect in isolation, consider the function , which has nothing beyond the second derivative. Here it is together with : the red and blue graphs are nearly indistinguishable.
But upon closer inspection, misses the target by almost . And not only around the origin: the difference is constant.
With the approximation is better.
But upon closer inspection, misses the target by almost .
And so it continues, with the error of order . And here is where it comes from:
The root of the problem is the nonzero second moment . But of course, this moment cannot be zero if does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as , have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order .
Let’s find a mollifier without such limitations; that is, with zero moments of all orders. One way to do it is to use the Fourier transform. Since is a multiple of , it suffices to find a nice function such that and for ; the mollifier will be the inverse Fourier transform of .
As an example, I took something similar to the original , but with a flat top:
The inverse Fourier transform of is a mollifier that reproduces all polynomials exactly: for any polynomial. Here it is:
Since I did not make very carefully (its second derivative is discontinuous at ), the mollifier has a moderately heavy tail. With a more careful construction it would decay faster than any power of . However, it cannot be compactly supported. Indeed, if were compactly supported, then would be real-analytic; that is, represented by its power series centered at . But that power series is
The idea of using negative weights in the process of averaging looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.
Credit: I got the idea of this post from the following remark by fedja on MathOverflow:
The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond and with them.