Biased and unbiased mollification

When we want to smooth (mollify) a given function ${f:{\mathbb R}\rightarrow{\mathbb R}}$, the standard recipe is to take the ${C^{\infty}}$-smooth bump function

$\displaystyle \varphi(t) = \begin{cases} c\, \exp\{-1/(1-t^2)\}\quad & |t|<1; \\ 0 \quad & |t|\ge 1 \end{cases}$

where ${c}$ is chosen so that ${\int_{{\mathbb R}} \varphi=1}$ (for the record, ${c\approx 2.2523}$).
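The value of ${c}$ is easy to check numerically. The following is a minimal sketch, assuming the bump is ${\exp\{-1/(1-t^2)\}}$ on ${(-1,1)}$ (the exponent has to be negative for the function to vanish at ${\pm 1}$); the helper names `bump` and `midpoint` are made up for this illustration.

```python
import math

def bump(t):
    """Unnormalized bump exp(-1/(1 - t^2)) on (-1, 1), zero elsewhere."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def midpoint(g, a, b, n=20000):
    """Midpoint rule for the integral of g over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# c is the reciprocal of the integral of the unnormalized bump.
c = 1.0 / midpoint(bump, -1.0, 1.0)
print(round(c, 4))
```

The midpoint rule is a reasonable choice here because the integrand vanishes to infinite order at both endpoints, so even a modest number of nodes should reproduce the quoted digits.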

Make the bump narrow and tall: ${\varphi_{\delta}(t)=\delta^{-1}\varphi(t/\delta)}$. Then define ${f_\delta = f*\varphi_\delta}$, that is

$\displaystyle f_\delta(x) = \int_{\mathbb R} f(x-t) \varphi_\delta(t)\,dt = \int_{\mathbb R} f(t) \varphi_\delta(x-t)\,dt$

The second form of the integral makes it clear that ${f_\delta}$ is infinitely differentiable. And it is easy to prove that for any continuous ${f}$ the approximation ${f_\delta}$ converges to ${ f}$ uniformly on compact subsets of ${{\mathbb R}}$.

The choice of the particular mollifier given above is quite natural: we want a ${C^\infty}$ function with compact support (to avoid any issues with fast-growing functions ${f}$), so it cannot be analytic. And functions like ${\exp(-1/t)}$ are standard examples of infinitely smooth non-analytic functions. Being nonnegative is obviously a plus. What else to ask for?

Well, one may ask for a good rate of convergence. If ${f}$ is an ugly function like ${f(x)=\sqrt{|x|}}$, then we probably should not expect fast convergence. But it could be something like ${f(x)=|x|^7}$, a function that is already six times differentiable. Will the rate of convergence be commensurate with the head start ${f\in C^6}$ that we are given?

No, it will not. The limiting factor is not the lack of seventh derivative at ${x=0}$; it is the presence of (nonzero) second derivative at ${x\ne 0}$. To study this effect in isolation, consider the function ${f(x)=x^2}$, which has nothing beyond the second derivative. Here it is together with ${f_{0.1}}$: the red and blue graphs are nearly indistinguishable.

But upon closer inspection, ${f_{0.1}}$ misses the target by almost ${2\cdot 10^{-3}}$. And not only around the origin: the difference ${f_{0.1}-f}$ is constant.

With ${\delta=0.01}$ the approximation is better.

But upon closer inspection, ${f_{0.01}}$ misses the target by almost ${2\cdot 10^{-5}}$.
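These numbers can be reproduced without any plotting. Below is a rough sketch, not the code used for the figures: it assumes the bump ${\exp\{-1/(1-t^2)\}}$, rewrites ${f_\delta(x)}$ as ${\int f(x-\delta s)\varphi(s)\,ds}$, and approximates the integral by a normalized midpoint sum. The names `bump`, `mollify` and `square` are illustrative.

```python
import math

def bump(s):
    """Unnormalized bump exp(-1/(1 - s^2)) on (-1, 1), zero elsewhere."""
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def mollify(f, x, delta, n=20000):
    """Approximate f_delta(x) = integral of f(x - delta*s) * phi(s) ds over (-1, 1).

    Midpoint rule; dividing by the sum of the weights normalizes the bump,
    so the constant c never has to be typed in.
    """
    h = 2.0 / n
    nodes = [-1.0 + (k + 0.5) * h for k in range(n)]
    weights = [bump(s) for s in nodes]
    total = sum(weights)
    return sum(w * f(x - delta * s) for s, w in zip(nodes, weights)) / total

def square(x):
    return x * x

# Error f_delta(x) - f(x) at a few sample points, for both values of delta.
for delta in (0.1, 0.01):
    errors = [mollify(square, x, delta) - square(x) for x in (0.0, 1.0, -2.5)]
    print(delta, errors)
```

For each ${\delta}$ the three printed errors should agree with each other, and shrinking ${\delta}$ by a factor of ${10}$ should shrink them by a factor of ${100}$.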

And so it continues, with the error of order ${\delta^2}$. And here is where it comes from:

$\displaystyle f_\delta(0) = \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = \delta^{-1} \int_{\mathbb R} t^2\varphi(t/\delta)\,dt = \delta^{2} \int_{\mathbb R} s^2\varphi(s)\,ds$

The root of the problem is the nonzero second moment ${\int_{\mathbb R} s^2\varphi(s)\,ds \approx 0.158}$. But of course, this moment cannot be zero if ${\varphi}$ does not change sign. All familiar mollifiers, from Gaussian and Poisson kernels to compactly supported bumps such as ${\varphi}$, have this limitation. Since they do not reproduce quadratic polynomials exactly, they cannot approximate anything with nonzero second derivative better than to the order ${\delta^2}$.
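The same computation at an arbitrary point ${x}$ shows why the error does not depend on ${x}$. Here is a short worked version, using only that ${\varphi_\delta}$ is even and has integral ${1}$, and writing ${m_2=\int_{\mathbb R} s^2\varphi(s)\,ds}$ (a shorthand introduced just for this line):

$\displaystyle f_\delta(x) = \int_{\mathbb R} (x-t)^2\varphi_\delta(t)\,dt = x^2\int_{\mathbb R} \varphi_\delta(t)\,dt - 2x\int_{\mathbb R} t\,\varphi_\delta(t)\,dt + \int_{\mathbb R} t^2\varphi_\delta(t)\,dt = x^2 + \delta^2 m_2$

With ${m_2\approx 0.158}$ this gives ${f_\delta-f\approx 1.58\cdot 10^{-3}}$ for ${\delta=0.1}$ and ${1.58\cdot 10^{-5}}$ for ${\delta=0.01}$, consistent with the errors quoted above.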

Let’s find a mollifier without such limitations; that is, with zero moments of all positive orders. One way to do it is to use the Fourier transform. Since ${\int_{\mathbb R} t^n \varphi(t)\,dt }$ is a multiple of ${\widehat{\varphi}^{(n)}(0)}$, it suffices to find a nice function ${\psi}$ such that ${\psi(0)=1}$ and ${\psi^{(n)}(0)=0}$ for ${n\ge 1}$; the mollifier will be the inverse Fourier transform of ${\psi}$.
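To spell out the relation between moments and derivatives, here is the one-line computation. It assumes the convention ${\widehat{\varphi}(\xi)=\int_{\mathbb R}\varphi(t)e^{-i\xi t}\,dt}$ (the post does not fix one; other conventions only change the constant) and that ${\varphi}$ decays fast enough to differentiate under the integral sign:

$\displaystyle \widehat{\varphi}^{(n)}(0) = \frac{d^n}{d\xi^n}\int_{\mathbb R} \varphi(t)e^{-i\xi t}\,dt\,\Big|_{\xi=0} = (-i)^n\int_{\mathbb R} t^n\varphi(t)\,dt$

In particular ${\widehat{\varphi}(0)=\int_{\mathbb R}\varphi}$, which is why ${\psi(0)=1}$ is the normalization condition.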

As an example, I took something similar to the original ${\varphi}$, but with a flat top:

$\displaystyle \psi(\xi) = \begin{cases} 1 \quad & |\xi|\le 0.1; \\ \exp\{1-1/(1-(|\xi|-0.1)^2)\} \quad & 0.1<|\xi|<1.1; \\ 0\quad & |\xi|\ge 1.1 \end{cases}$

The inverse Fourier transform of ${\psi}$ is a mollifier that reproduces all polynomials exactly: ${p*\varphi = p}$ for any polynomial. Here it is:

Since I did not make ${\psi}$ very carefully (its second derivative is discontinuous at ${\pm 0.1}$), the mollifier ${\varphi}$ has a moderately heavy tail. With a more careful construction it would decay faster than any power of ${t}$. However, it cannot be compactly supported. Indeed, if ${\varphi}$ were compactly supported, then ${\widehat{\varphi}}$ would be real-analytic; that is, represented by its power series centered at ${\xi=0}$. But that power series is

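$\displaystyle 1+0+0+0+0+0+0+0+0+0+\dots$

which would force ${\widehat{\varphi}\equiv 1}$, and that is impossible because the Fourier transform of an integrable function tends to zero at infinity.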
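The unbiased kernel can be evaluated pointwise by numerical Fourier inversion. The sketch below makes two assumptions that the post does not state: the convention ${\varphi(t)=\frac{1}{2\pi}\int_{\mathbb R}\psi(\xi)e^{i\xi t}\,d\xi}$ (so that ${\int\varphi=\psi(0)=1}$), and a shift of ${0.1}$ in the middle branch of ${\psi}$, which is what makes ${\psi}$ continuous at ${|\xi|=0.1}$. Since ${\psi}$ is even, the integral reduces to a cosine transform over ${[0,1.1]}$. The function names are illustrative.

```python
import math

def psi(xi):
    """Flat-top cutoff: 1 on |xi| <= 0.1, smooth decay to 0 at |xi| = 1.1."""
    a = abs(xi)
    if a <= 0.1:
        return 1.0
    u = a - 0.1                      # assumed shift, see the lead-in
    d = 1.0 - u * u
    if d <= 0.0:
        return 0.0
    return math.exp(1.0 - 1.0 / d)

def kernel(t, n=20000):
    """phi(t) = (1/pi) * integral over [0, 1.1] of psi(xi) * cos(xi * t) d xi.

    Midpoint rule with n subintervals.
    """
    h = 1.1 / n
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) * h
        total += psi(xi) * math.cos(xi * t)
    return h * total / math.pi

# Sample the kernel at a few points; note the sign at t = 6.
for t in (0.0, 3.0, 6.0, 9.0):
    print(t, kernel(t))
```

The kernel is positive at the origin and negative around ${t=6}$. A sign change somewhere is unavoidable: a nonnegative kernel with integral ${1}$ would have a strictly positive second moment.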

The idea of using negative weights in the process of averaging ${f}$ looks counterintuitive, but it’s a fact of life. Like the appearance of negative coefficients in the 9-point Newton-Cotes quadrature formula… but that’s another story.

Credit: I got the idea of this post from the following remark by fedja on MathOverflow:

The usual spherical cap mollifiers reproduce constants and linear functions faithfully but have a bias on quadratic polynomials. That’s why you cannot go beyond ${C^2}$ and ${\delta^2}$ with them.