When we want to smoothen (*mollify*) a given function $f\colon \mathbb{R}\to\mathbb{R}$, the standard recipe suggests: take the $C^\infty$-smooth bump function

$$\varphi(t) = c\,\exp\left(-\frac{1}{1-t^2}\right) \ \text{ for } |t|<1, \qquad \varphi(t) = 0 \ \text{ for } |t|\ge 1$$

where $c$ is chosen so that $\int_{-1}^{1}\varphi(t)\,dt = 1$ (for the record, $c\approx 2.2523$).

Make the bump narrow and tall: $\varphi_\delta(t) = \delta^{-1}\varphi(t/\delta)$. Then define $f_\delta = f * \varphi_\delta$, that is

$$f_\delta(x) = \int_{\mathbb{R}} f(x-t)\,\varphi_\delta(t)\,dt = \int_{\mathbb{R}} f(t)\,\varphi_\delta(x-t)\,dt$$

The second form of the integral makes it clear that $f_\delta$ is infinitely differentiable. And it is easy to prove that for any continuous $f$ the approximation $f_\delta$ converges to $f$ uniformly on compact subsets of $\mathbb{R}$ as $\delta\to 0$.
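As a quick numerical sanity check of the uniform convergence (the code and all names in it are mine, not from the post), here is a discretized convolution against the normalized bump, applied to the continuous but non-differentiable function $f(x)=|x|$:

```python
import numpy as np

def bump(t):
    """The standard bump exp(-1/(1-t^2)) on (-1, 1), zero outside (unnormalized)."""
    out = np.zeros_like(t, dtype=float)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

def mollify(f, x, delta, n=4001):
    """Approximate f_delta(x) = int f(x - t) phi_delta(t) dt by a Riemann sum."""
    t = np.linspace(-delta, delta, n)
    dt = t[1] - t[0]
    w = bump(t / delta)
    w /= w.sum() * dt          # normalize so the discrete weights integrate to 1
    return np.array([np.sum(f(xi - t) * w) * dt for xi in np.atleast_1d(x)])

f = np.abs                     # continuous, but not differentiable at 0
x = np.linspace(-1.0, 1.0, 9)
errors = [np.max(np.abs(mollify(f, x, d) - f(x))) for d in (0.5, 0.1, 0.02)]
print(errors)                  # the uniform error shrinks as delta -> 0
```

The error is largest at the corner $x=0$ and decreases proportionally to $\delta$ there, since $|x|$ has no second derivative to blame yet.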

The choice of the particular *mollifier* $\varphi$ given above is quite natural: we want a function with compact support (to avoid any issues with fast-growing functions $f$), so it cannot be analytic. And functions like $e^{-1/(1-t^2)}$ are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else is there to ask for?

Well, one may ask for a good rate of convergence. If $f$ is an ugly function, say merely continuous and nowhere differentiable, then we probably should not expect fast convergence. But $f$ could be something like $f(x) = |x|^7$: a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start that we are given?

No, it will not. The limiting factor is not the lack of a seventh derivative at $0$; it is the presence of a (nonzero) **second derivative** at other points. To study this effect in isolation, consider the function $f(x) = x^2$, which has nothing beyond the second derivative. Here it is together with its mollification $f_\delta$: the red and blue graphs are nearly indistinguishable.

But upon closer inspection, $f_\delta$ misses the target by a small but definite amount. And not only around the origin: the difference $f_\delta - f$ is constant.

With a smaller $\delta$ the approximation is better.

But upon closer inspection, $f_\delta$ again misses the target, by a smaller but still constant amount.

And so it continues, with the error of order $\delta^2$. And here is where it comes from: for $f(x)=x^2$,

$$f_\delta(x) - f(x) = \int \big(f(x-t)-f(x)\big)\,\varphi_\delta(t)\,dt = \int \big(t^2 - 2xt\big)\,\varphi_\delta(t)\,dt = \delta^2\int_{-1}^{1} s^2\,\varphi(s)\,ds$$

where the term $-2xt$ contributes nothing because $\varphi_\delta$ is even.
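The constant gap can be checked numerically. A sketch (names and numerics mine): compute the second moment of the normalized bump and verify that mollifying $x^2$ overshoots by exactly that moment times $\delta^2$, at every $x$:

```python
import numpy as np

def bump(t):
    """exp(-1/(1-t^2)) on (-1, 1), zero outside (unnormalized)."""
    out = np.zeros_like(t, dtype=float)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

t = np.linspace(-1.0, 1.0, 200001)
dt = t[1] - t[0]
w = bump(t)
w /= w.sum() * dt                  # normalized mollifier phi
m2 = np.sum(t**2 * w) * dt         # second moment of phi

delta = 0.1
x = np.linspace(-2.0, 2.0, 11)
# The substitution t -> delta*t turns the convolution into an
# integral against the unscaled phi: f_delta(x) = int (x - delta*t)^2 phi(t) dt.
f_delta = np.array([np.sum((xi - delta * t) ** 2 * w) * dt for xi in x])
err = f_delta - x**2
print(m2, err)                     # err equals the constant m2 * delta^2 everywhere
```

The odd-moment term cancels on the symmetric grid, so `err` is flat to machine precision.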

The root of the problem is the nonzero second moment $\int_{-1}^{1} t^2\varphi(t)\,dt$. But of course, this moment cannot be zero if $\varphi$ does not change sign. All familiar mollifiers, from the Gaussian and Poisson kernels to compactly supported bumps such as $\varphi$ above, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with a nonzero second derivative better than to the order $\delta^2$.

Let’s find a mollifier without such limitations; that is, with zero moments of all orders: $\int_{\mathbb{R}} t^k\psi(t)\,dt = 0$ for all $k\ge 1$, while $\int_{\mathbb{R}}\psi = 1$. One way to do it is to use the Fourier transform. Since $\int_{\mathbb{R}} t^k\psi(t)\,dt$ is a multiple of $\widehat{\psi}^{(k)}(0)$, it suffices to find a nice function $\widehat{\psi}$ such that $\widehat{\psi}(0)=1$ and $\widehat{\psi}^{(k)}(0)=0$ for all $k\ge 1$; the mollifier $\psi$ will be the inverse Fourier transform of $\widehat{\psi}$.
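The moment-to-derivative link is easy to sanity-check on the original bump. With the convention $\widehat{\varphi}(\xi)=\int\varphi(t)e^{-it\xi}\,dt$, one has $\int t^2\varphi(t)\,dt = -\widehat{\varphi}''(0)$. A sketch (my numerics, not from the post), comparing the second moment with a central second difference of the transform at $0$:

```python
import numpy as np

def bump(t):
    """exp(-1/(1-t^2)) on (-1, 1), zero outside (unnormalized)."""
    out = np.zeros_like(t, dtype=float)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

t = np.linspace(-1.0, 1.0, 100001)
dt = t[1] - t[0]
w = bump(t)
w /= w.sum() * dt                          # normalized mollifier phi

def phi_hat(xi):
    """Fourier transform: int phi(t) exp(-i t xi) dt, by a Riemann sum."""
    return np.sum(w * np.exp(-1j * t * xi)) * dt

m2 = np.sum(t**2 * w) * dt                 # second moment of phi
h = 1e-3                                   # central second difference at xi = 0
hat_dd0 = (phi_hat(h) - 2 * phi_hat(0) + phi_hat(-h)).real / h**2
print(m2, -hat_dd0)                        # the two values agree
```

So killing moments is the same task as flattening $\widehat{\psi}$ at the origin.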

As an example, I took something similar to the original $\varphi$, but with a flat top:

$$\widehat{\psi}(\xi) = \begin{cases} 1, & |\xi|\le 1, \\ \exp\left(1-\dfrac{1}{1-(|\xi|-1)^2}\right), & 1<|\xi|<2, \\ 0, & |\xi|\ge 2. \end{cases}$$

The inverse Fourier transform of $\widehat{\psi}$ is a mollifier $\psi$ that reproduces all polynomials **exactly**: $p * \psi_\delta = p$ for any polynomial $p$. Here it is:
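The exactness claim follows from a finite Taylor expansion; filling in the step (my own, under the vanishing-moments property): for a polynomial $p$ of degree $n$, Taylor's formula at $x$ is exact, so

$$(p * \psi_\delta)(x) = \int_{\mathbb{R}} p(x-t)\,\psi_\delta(t)\,dt = \sum_{k=0}^{n} \frac{p^{(k)}(x)}{k!}\,(-\delta)^k \int_{\mathbb{R}} s^k\,\psi(s)\,ds = p(x),$$

since the $k=0$ moment equals $1$ and all higher moments vanish.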

Since I did not make $\widehat{\psi}$ very carefully (its second derivative is discontinuous at $\pm 1$), the mollifier $\psi$ has a moderately heavy tail. With a more careful construction it would decay faster than any power of $t$. However, it cannot be compactly supported. Indeed, if $\psi$ were compactly supported, then $\widehat{\psi}$ would be real-analytic; that is, represented by its power series centered at $\xi = 0$. But that power series is

$$1 + 0\cdot\xi + 0\cdot\xi^2 + \cdots \equiv 1,$$

which contradicts, for example, $\widehat{\psi}(2) = 0$.

The idea of using **negative** weights in the process of **averaging** looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.
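To see those negative weights concretely, one can invert a flat-top transform numerically. A sketch, assuming the flat-top shape described above (equal to $1$ on $[-1,1]$, bump-like decay on $1<|\xi|<2$; this exact formula, like all names here, is my assumption, not the post's):

```python
import numpy as np

def psi_hat(xi):
    """Assumed flat-top transform: 1 on [-1,1], smooth decay to 0 on 1 < |xi| < 2."""
    a = np.abs(xi)
    out = np.zeros_like(a, dtype=float)
    out[a <= 1] = 1.0
    mid = (a > 1) & (a < 2)
    out[mid] = np.exp(1.0 - 1.0 / (1.0 - (a[mid] - 1.0) ** 2))
    return out

xi = np.linspace(0.0, 2.0, 20001)
dxi = xi[1] - xi[0]
h = psi_hat(xi)

def psi(t):
    """Inverse transform psi(t) = (1/pi) int_0^2 psi_hat(xi) cos(t xi) dxi (psi_hat is even)."""
    return np.array([np.sum(h * np.cos(ti * xi)) * dxi / np.pi
                     for ti in np.atleast_1d(t)])

t = np.linspace(0.0, 40.0, 801)
vals = psi(t)
print(vals[0], vals.min())           # psi(0) > 0, yet psi dips below zero
print(np.abs(vals[t > 30]).max())    # small but slowly decaying tail
```

The mollifier is positive at the origin, oscillates into negative values (the counterintuitive negative weights), and its tail dies out slowly, consistent with the discontinuous second derivative of $\widehat{\psi}$.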

*Credit:* I got the idea of this post from the following remark by *fedja* on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond $\delta^2$ and $C^{1,1}$ with them.