Biased and unbiased mollification

When we want to smoothen (mollify) a given function {f:{\mathbb R}\rightarrow{\mathbb R}}, the standard recipe suggests: take the {C^{\infty}}-smooth bump function

\displaystyle    \varphi(t) = \begin{cases} c\, \exp\{1/(1-t^2)\}\quad & |t|<1; \\   0 \quad & |t|\ge 1 \end{cases}

where {c} is chosen so that {\int_{{\mathbb R}} \varphi=1} (for the record, {c\approx 2.2523}).

Standard bump
Standard bump

Make the bump narrow and tall: {\varphi_{\delta}(t)=\delta^{-1}\varphi(t/\delta)}. Then define {f_\delta = f*\varphi_\delta}, that is

\displaystyle    f_\delta(x) = \int_{\mathbb R} f(x-t) \varphi_\delta(t)\,dt = \int_{\mathbb R} f(t) \varphi_\delta(x-t)\,dt

The second form of the integral makes it clear that {f_\delta} is infinitely differentiable. And it is easy to prove that for any continuous {f} the approximation {f_\delta} converges to { f} uniformly on compact subsets of {{\mathbb R}}.

The choice of the particular mollifier given above is quite natural: we want a {C^\infty} function with compact support (to avoid any issues with fast-growing functions {f}), so it cannot be analytic. And functions like {\exp(-1/t)} are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?

Well, one may ask for a good rate of convergence. If {f} is an ugly function like {f(x)=\sqrt{|x|}}, then we probably should not expect fast convergence. But is could be something like {f(x)=|x|^7}; a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start {f\in C^6} that we are given?

No, it will not. The limiting factor is not the lack of seventh derivative at {x=0}; it is the presence of (nonzero) second derivative at {x\ne 0}. To study this effect in isolation, consider the function {f(x)=x^2}, which has nothing beyond the second derivative. Here it is together with {f_{0.1}}: the red and blue graphs are nearly indistinguishable.

Good approximation

But upon closer inspection, {f_{0.1}} misses the target by almost {2\cdot 10^{-3}}. And not only around the origin: the difference {f_{0.1}-f} is constant.

But it overshoots the target

With {\delta=0.01} the approximation is better.

Better approximation
Better approximation

But upon closer inspection, {f_{0.01}} misses the target by almost {2\cdot 10^{-5}}.

Still overshoots
Still overshoots

And so it continues, with the error of order {\delta^2}. And here is where it comes from:

\displaystyle f_\delta(0) = \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = \delta^{-1} \int_{\mathbb R} t^2\varphi(t/\delta)\,dt  = \delta^{2} \int_{\mathbb R} s^2\varphi(s)\,ds

The root of the problem is the nonzero second moment {\int_{\mathbb R} s^2\varphi(s)\,ds \approx 0.158}. But of course, this moment cannot be zero if {\varphi} does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as {\varphi}, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order {\delta^2}.

Let’s find a mollifier without such limitations; that is, with zero moments of all orders. One way to do it is to use the Fourier transform. Since {\int_{\mathbb R} t^n \varphi(t)\,dt } is a multiple of {\widehat{\varphi}^{(n)}(0)}, it suffices to find a nice function {\psi} such that {\psi(0)=1} and {\psi^{(n)}(0)=0} for {n\ge 1}; the mollifier will be the inverse Fourier transform of {\psi}.

As an example, I took something similar to the original {\varphi}, but with a flat top:

\displaystyle  \psi(\xi) = \begin{cases} 1 \quad & |\xi|\le 0.1; \\    \exp\{1-1/(1-(|\xi|-0.01)^2)\} \quad & 0.1<|\xi|<1.1\\  0\quad & |\xi|\ge 1.1  \end{cases}

Fourier transform of unbiased mollifier
Fourier transform of unbiased mollifier

The inverse Fourier transform of {\psi} is a mollifier that reproduces all polynomials exactly: {p*\varphi = p} for any polynomial. Here it is:

Unbiased mollifier
Unbiased mollifier

Since I did not make {\psi} very carefully (its second derivative is discontinuous at {\pm 0.01}), the mollifier {\varphi} has a moderately heavy tail. With a more careful construction it would decay faster than any power of {t}. However, it cannot be compactly supported. Indeed, if {\varphi} were compactly supported, then {\widehat{\varphi}} would be real-analytic; that is, represented by its power series centered at {\xi=0}. But that power series is

\displaystyle 1+0+0+0+0+0+0+0+0+0+\dots

The idea of using negative weights in the process of averaging {f} looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.

Credit: I got the idea of this post from the following remark by fedja on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond {C^2} and {\delta^2} with them.