When we want to smoothen (mollify) a given function $f\colon \mathbb{R}\to\mathbb{R}$, the standard recipe suggests: take the $C^\infty$-smooth bump function
$$\varphi(x) = \begin{cases} c\, e^{-1/(1-x^2)}, & |x|<1, \\ 0, & |x|\ge 1, \end{cases}$$
where $c$ is chosen so that $\int_{\mathbb{R}} \varphi = 1$ (for the record, $c \approx 2.2523$).
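A quick numerical sanity check of this constant (a minimal sketch, assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.integrate import quad

# un-normalized bump: exp(-1/(1-x^2)) on (-1, 1), zero elsewhere
bump = lambda x: np.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

mass, _ = quad(bump, -1, 1)   # integral of the un-normalized bump, ~0.444
print(1.0 / mass)             # the normalizing constant c, ~2.2523
```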

Make the bump narrow and tall: $\varphi_\delta(x) = \delta^{-1}\varphi(x/\delta)$. Then define $f_\delta = f * \varphi_\delta$, that is,
$$f_\delta(x) = \int_{\mathbb{R}} f(x-t)\,\varphi_\delta(t)\,dt = \int_{\mathbb{R}} f(t)\,\varphi_\delta(x-t)\,dt.$$
The second form of the integral makes it clear that $f_\delta$ is infinitely differentiable (all derivatives fall on $\varphi_\delta$). And it is easy to prove that for any continuous $f$ the approximation $f_\delta$ converges to $f$ uniformly on compact subsets of $\mathbb{R}$ as $\delta \to 0$.
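Here is a small sketch of this in code (my own illustration, with $f(x)=|x|$ as the continuous test function; NumPy/SciPy assumed): the worst error over $[-2,2]$ shrinks as $\delta$ does, for this $f$ roughly linearly in $\delta$.

```python
import numpy as np
from scipy.integrate import quad

bump = lambda t: np.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0
c = 1.0 / quad(bump, -1, 1)[0]          # normalizing constant
phi = lambda t: c * bump(t)

def mollify(f, delta):
    # f_delta(x) = integral of f(x - t) * phi(t/delta)/delta over |t| < delta
    return lambda x: quad(lambda t: f(x - t) * phi(t / delta) / delta,
                          -delta, delta)[0]

f = abs                                  # continuous, but with a corner at 0
xs = np.linspace(-2, 2, 201)
for delta in (0.5, 0.1, 0.02):
    fd = mollify(f, delta)
    print(delta, max(abs(fd(x) - f(x)) for x in xs))   # decreasing with delta
```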
The choice of the particular mollifier given above is quite natural: we want a function with compact support (to avoid any issues with fast-growing functions $f$), so it cannot be analytic. And functions like $e^{-1/(1-x^2)}$ are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?
Well, one may ask for a good rate of convergence. If $f$ is an ugly function (merely continuous, say), then we probably should not expect fast convergence. But $f$ could be something like $|x|^7$: a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start that we are given?

No, it will not. The limiting factor is not the lack of a seventh derivative at $0$; it is the presence of a (nonzero) second derivative elsewhere. To study this effect in isolation, consider the function $f(x) = x^2$, which has nothing beyond the second derivative. Here it is together with $f_\delta$ (with $\delta = 0.1$, say): the red and blue graphs are nearly indistinguishable.

But upon closer inspection, $f_\delta$ misses the target by about $1.6\cdot 10^{-3}$. And not only around the origin: the difference $f_\delta - f$ is constant.
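This is easy to confirm numerically (again a sketch with SciPy; $\delta = 0.1$ as above): the difference $f_\delta(x) - x^2$ comes out the same at every $x$, and it equals $\delta^2 \int t^2 \varphi(t)\,dt$.

```python
import numpy as np
from scipy.integrate import quad

bump = lambda t: np.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0
c = 1.0 / quad(bump, -1, 1)[0]
phi = lambda t: c * bump(t)
m2 = quad(lambda t: t * t * phi(t), -1, 1)[0]   # second moment of phi, ~0.16

delta = 0.1
fd = lambda x: quad(lambda t: (x - t) ** 2 * phi(t / delta) / delta,
                    -delta, delta)[0]           # mollification of f(x) = x^2

for x in (0.0, 0.7, -1.3, 2.0):
    print(x, fd(x) - x * x)    # the same value at every x ...
print(delta ** 2 * m2)         # ... namely delta^2 times the second moment
```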

With $\delta = 0.01$ the approximation is better.

But upon closer inspection, $f_\delta$ again misses the target by a constant amount, now about $1.6\cdot 10^{-5}$.

And so it continues, with the error of order $\delta^2$. And here is where it comes from: for a smooth $f$,
$$f_\delta(x) - f(x) = \int \big(f(x-t)-f(x)\big)\,\varphi_\delta(t)\,dt \approx \int \Big(-f'(x)\,t + \tfrac{1}{2} f''(x)\,t^2\Big)\,\varphi_\delta(t)\,dt = \tfrac{1}{2} f''(x)\,\delta^2 \int t^2 \varphi(t)\,dt.$$
The root of the problem is the nonzero second moment $\int t^2 \varphi(t)\,dt$. But of course, this moment cannot be zero if $\varphi$ does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as $\varphi$ above, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with a nonzero second derivative better than to the order $\delta^2$.
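The Taylor-expansion formula above is easy to test numerically. A sketch (my choice of $f=\cos$ and of the point $x=0.7$; SciPy assumed): the actual error $f_\delta(x)-f(x)$ and the prediction $\tfrac12 f''(x)\,\delta^2\int t^2\varphi$ agree closely, and both shrink like $\delta^2$.

```python
import numpy as np
from scipy.integrate import quad

bump = lambda t: np.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0
c = 1.0 / quad(bump, -1, 1)[0]
phi = lambda t: c * bump(t)
m2 = quad(lambda t: t * t * phi(t), -1, 1)[0]   # the culprit: second moment

f, x = np.cos, 0.7                              # a smooth f; here f''(x) = -cos(x)
for delta in (0.2, 0.1, 0.05):
    fd = quad(lambda t: f(x - t) * phi(t / delta) / delta, -delta, delta)[0]
    prediction = 0.5 * (-np.cos(x)) * delta ** 2 * m2
    print(delta, fd - f(x), prediction)         # error vs its delta^2 prediction
```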
Let’s find a mollifier without such limitations; that is, with zero moments of all orders. One way to do it is to use the Fourier transform. Since $\int t^n \varphi(t)\,dt$ is a multiple of $\widehat{\varphi}^{\,(n)}(0)$, it suffices to find a nice function $\psi$ such that $\psi(0)=1$ and $\psi^{(n)}(0)=0$ for all $n\ge 1$; the mollifier will be the inverse Fourier transform of $\psi$.
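To make the dictionary between moments and derivatives concrete, here is a numerical check for the original bump $\varphi$ (a sketch with SciPy, using the convention $\widehat{\varphi}(\xi)=\int \varphi(t)e^{-i\xi t}\,dt$, under which $\widehat{\varphi}''(0) = -\int t^2\varphi(t)\,dt$). A $\psi$ that is flat near $0$ kills all these derivatives, and with them all the higher moments, at once.

```python
import numpy as np
from scipy.integrate import quad

bump = lambda t: np.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0
c = 1.0 / quad(bump, -1, 1)[0]
phi = lambda t: c * bump(t)

# Fourier transform of phi; phi is real and even, so a cosine integral suffices
phi_hat = lambda xi: quad(lambda t: phi(t) * np.cos(xi * t), -1, 1)[0]

m2 = quad(lambda t: t * t * phi(t), -1, 1)[0]            # second moment
h = 0.05                                                 # finite-difference step
d2 = (phi_hat(h) - 2 * phi_hat(0.0) + phi_hat(-h)) / h ** 2
print(m2, -d2)    # the second moment equals minus the second derivative at 0
```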
As an example, I took something similar to the original $\varphi$, but with a flat top:

The inverse Fourier transform of $\psi$ is a mollifier $\Phi$ that reproduces all polynomials exactly: $p * \Phi_\delta = p$ for any polynomial $p$. Here it is:

Since I did not make $\psi$ very carefully (its second derivative is discontinuous at the edges of the flat top), the mollifier $\Phi$ has a moderately heavy tail. With a more careful construction it would decay faster than any power of $x$. However, it cannot be compactly supported. Indeed, if $\Phi$ were compactly supported, then $\widehat{\Phi}=\psi$ would be real-analytic; that is, represented by its power series centered at $0$. But that power series is
$$1 + 0\xi + 0\xi^2 + 0\xi^3 + \cdots \equiv 1,$$
which would force $\psi \equiv 1$, and $\psi$ is certainly not identically $1$.
The idea of using negative weights in the process of averaging looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.
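To see what a little negative weight buys, here is the crudest possible version of the idea (my own sketch, not the Fourier construction above): mix two copies of the classical bump at scales $\delta$ and $2\delta$ with weights $4/3$ and $-1/3$. The combination still has total mass $1$, but its second moment cancels, and the error drops from order $\delta^2$ to order $\delta^4$.

```python
import numpy as np
from scipy.integrate import quad

bump = lambda t: np.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0
c = 1.0 / quad(bump, -1, 1)[0]
phi = lambda t: c * bump(t)

plain = phi                          # the usual nonnegative bump, mass 1
mixed = lambda t: (4.0 / 3.0) * phi(t) - (1.0 / 6.0) * phi(t / 2.0)
# mixed has mass 4/3 - 1/3 = 1, zero second moment, and lives on (-2, 2)

def smooth(f, x, delta, kernel):
    # convolution of f with the rescaled kernel, evaluated at the point x
    return quad(lambda t: f(x - t) * kernel(t / delta) / delta,
                -2 * delta, 2 * delta)[0]

f, x = np.cos, 0.7
for delta in (0.4, 0.2, 0.1):
    print(delta,
          smooth(f, x, delta, plain) - f(x),    # shrinks like delta^2
          smooth(f, x, delta, mixed) - f(x))    # shrinks like delta^4
```

This only kills the second moment, of course; the Fourier-built mollifier kills them all. But it already shows the mechanism at work.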
Credit: I got the idea of this post from the following remark by fedja on MathOverflow:
The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond $C^{1,1}$ and $\delta^2$ with them.