Convolution of a continuous function $f$ on the circle $\mathbb T$ with the Fejér kernel $F_N$ is guaranteed to produce trigonometric polynomials $f*F_N$ that converge to $f$ uniformly as $N\to\infty$. For the Dirichlet kernel $D_N$ this is not the case: the sequence $f*D_N$ may fail to converge to $f$ even pointwise. The underlying reason is that $\|D_N\|_{L^1}\to\infty$, while the Fejér kernel, being positive, has constant $L^1$ norm. Does this mean that Fejér’s kernel is to be preferred for approximation purposes?
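The difference in $L^1$ norms is easy to check numerically. Below is a minimal sketch in Python that uses only the standard library. It assumes the normalizations $D_N(t)=\frac{\sin((N+1/2)t)}{\sin(t/2)}$ and $F_N(t)=\frac{1}{N+1}\Bigl(\frac{\sin((N+1)t/2)}{\sin(t/2)}\Bigr)^2$, with norms taken with respect to $dt/2\pi$. The helper names are illustrative. The midpoint rule never samples $t=0$, where both formulas have a removable singularity.

```python
import math

def dirichlet(N, t):
    """Dirichlet kernel D_N(t) = sin((N + 1/2) t) / sin(t / 2)."""
    return math.sin((N + 0.5) * t) / math.sin(t / 2)

def fejer(N, t):
    """Fejér kernel F_N(t) = (sin((N + 1) t / 2) / sin(t / 2))**2 / (N + 1)."""
    return (math.sin((N + 1) * t / 2) / math.sin(t / 2)) ** 2 / (N + 1)

def l1_norm(kernel, N, points=20000):
    """(1/2π) ∫_{-π}^{π} |kernel(N, t)| dt, using evenness and the midpoint rule on (0, π)."""
    h = math.pi / points
    total = sum(abs(kernel(N, (k + 0.5) * h)) for k in range(points))
    return total * h / math.pi

for N in (1, 10, 100):
    print(N, round(l1_norm(dirichlet, N), 3), round(l1_norm(fejer, N), 3))
```

The Dirichlet column keeps growing, since the Lebesgue constants behave like $\frac{4}{\pi^2}\log N+O(1)$. The Fejér column stays at $1$ up to rounding.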

Let’s compare the performance of both kernels on the function , which is reasonably nice: . Convolution with yields . The trigonometric polynomial is in blue, the original function in red:

I’d say this is a very good approximation.

Now try the Fejér kernel with the same index. The polynomial is

This is not good at all.

And even with terms the Fejér approximation is not as good as Dirichlet with merely .
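Here is a reproducible version of this kind of comparison. The sketch uses a test function chosen for illustration, not necessarily the one plotted above: $g(t)=2\pi^2t^2-t^4$ on $[-\pi,\pi]$, extended $2\pi$-periodically. This function is $C^2$ on the circle, and its Fourier series is $g(t)=\frac{7\pi^4}{15}+\sum_{n\ge1}\frac{48(-1)^n}{n^4}\cos nt$. The code measures the uniform error on a grid for two approximations: the Dirichlet partial sum of degree 2 and the Fejér mean. It then searches for the smallest degree at which the Fejér mean does as well as the Dirichlet sum. All names are hypothetical.

```python
import math

def coeff(n):
    """Cosine coefficients of g(t) = 2*pi^2*t^2 - t^4, extended 2*pi-periodically."""
    if n == 0:
        return 7 * math.pi**4 / 15
    return 48 * (-1) ** n / n**4

def g(t):
    """The test function itself, for t in [-pi, pi]."""
    return 2 * math.pi**2 * t**2 - t**4

# Uniform grid on [-pi, pi] including t = +-pi, where the Dirichlet error is largest.
GRID = [math.pi * k / 200 for k in range(-200, 201)]

def sup_error(N, fejer):
    """Max error on GRID of the degree-N Dirichlet sum (fejer=False) or Fejér mean (fejer=True)."""
    weights = [1 - n / (N + 1) if fejer else 1.0 for n in range(N + 1)]
    c = [coeff(n) for n in range(N + 1)]
    err = 0.0
    for t in GRID:
        approx = sum(weights[n] * c[n] * math.cos(n * t) for n in range(N + 1))
        err = max(err, abs(approx - g(t)))
    return err

target = sup_error(2, fejer=False)  # Dirichlet kernel, degree 2
N = 2
while sup_error(N, fejer=True) > target:
    N += 1
print(f"Dirichlet error at degree 2: {target:.4f}")
print(f"smallest Fejér degree that does as well: {N}")
```

At $t=\pi$, every term of the Fejér error has the same sign. The error there is therefore at least $48/(N+1)$, so the search cannot stop before $N=50$.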

The performance of is comparable to that of . Of course, a -term approximation is not what one normally wants to use. And it still has visible deviation near the origin, where the function is smooth:

In contrast, the Dirichlet kernel with gives a low-degree polynomial

that approximates the function to within the resolution of the plot:

What we have here is the trigonometric version of “Biased and unbiased mollification.” Convolution with the Dirichlet kernel $D_N$ amounts to truncating the Fourier series at index $N$, so it reproduces trigonometric polynomials of degree at most $N$ exactly. But the Fejér kernel $F_N$ performs *soft thresholding*. With the normalization $F_N=\frac{1}{N+1}\sum_{k=0}^{N}D_k$, it multiplies the $n$th Fourier coefficient of the function by $1-|n|/(N+1)$ for $|n|\le N$, and by $0$ beyond that. In particular, it transforms $\cos t$ into $\frac{N}{N+1}\cos t$, introducing an error of order $1/N$, which is a pretty big one. Since this error is built into the kernel, it limits the rate of convergence no matter how smooth the function is. Such is the price that must be paid for positivity.
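The soft-thresholding description can be checked directly on the coefficient side. The sketch below represents a cosine series as a dictionary `{n: coefficient of cos(n t)}` and applies the two multipliers. It assumes the normalization $F_N=\frac{1}{N+1}\sum_{k=0}^{N}D_k$, and the helper names are illustrative. Applied to $\cos t$, truncation returns the function unchanged for every $N\ge1$. The Fejér mean leaves a uniform error of exactly $1/(N+1)$, which decays no faster than $1/N$.

```python
import math

def dirichlet_multiplier(n, N):
    """Convolution with D_N keeps the n-th Fourier coefficient if |n| <= N, drops it otherwise."""
    return 1.0 if abs(n) <= N else 0.0

def fejer_multiplier(n, N):
    """Convolution with F_N scales the n-th coefficient by 1 - |n|/(N+1), and by 0 if |n| > N."""
    return max(0.0, 1.0 - abs(n) / (N + 1))

def filtered(coeffs, multiplier, N):
    """Apply a Fourier multiplier to a cosine series given as {n: coefficient of cos(n t)}."""
    return {n: c * multiplier(n, N) for n, c in coeffs.items()}

def sup_distance(a, b, samples=1000):
    """Max over a grid of |sum_n (a[n] - b[n]) cos(n t)| for two cosine series with the same keys."""
    return max(
        abs(sum((a[n] - b[n]) * math.cos(n * t) for n in a))
        for t in (2 * math.pi * k / samples for k in range(samples))
    )

cos_t = {1: 1.0}  # the trigonometric polynomial cos t
for N in (1, 10, 100):
    exact = sup_distance(cos_t, filtered(cos_t, dirichlet_multiplier, N))
    fejer = sup_distance(cos_t, filtered(cos_t, fejer_multiplier, N))
    print(N, exact, fejer)  # first error is 0.0; second is 1/(N+1) up to rounding
```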

This reminds me of a parenthetical remark by G. B. Folland in *Real Analysis* (2nd ed., page 264):

> if one wants to approximate a function uniformly by trigonometric polynomials, one should not count on partial sums to do the job; the Cesàro means work much better in general.

Right, for ugly “generic” continuous functions the Fejér kernel is a safer option. But for decently behaved functions the Dirichlet kernel wins by a landslide. The function above was -smooth; as a final example I take which is merely Lipschitz on . The original function is in red, is in blue, and is in green.
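The same experiment can be run on a Lipschitz function. As a stand-in, not necessarily the function in the plot, take $|t|$ on $[-\pi,\pi]$. Its Fourier series is $\frac{\pi}{2}-\frac{4}{\pi}\sum_{n\text{ odd}}\frac{\cos nt}{n^2}$. The uniform error of the Dirichlet sum is exactly $\frac{4}{\pi}\sum_{n>N,\ n\text{ odd}}n^{-2}\approx\frac{2}{\pi N}$. At $t=0$ the Fejér mean picks up an extra $\frac{4}{\pi(N+1)}\sum_{n\le N,\ n\text{ odd}}\frac1n$, which is of order $(\log N)/N$. The function names in this sketch are hypothetical.

```python
import math

def abs_series(t, N, fejer):
    """Degree-N Dirichlet sum (fejer=False) or Fejér mean (fejer=True) of |t|.

    Uses |t| = pi/2 - (4/pi) * sum over odd n of cos(n t)/n^2 on [-pi, pi].
    """
    s = math.pi / 2
    for n in range(1, N + 1, 2):  # only odd harmonics appear
        weight = 1 - n / (N + 1) if fejer else 1.0
        s -= weight * 4 / (math.pi * n**2) * math.cos(n * t)
    return s

GRID = [math.pi * k / 500 for k in range(-500, 501)]  # contains t = 0 and t = +-pi

def sup_error(N, fejer):
    """Max of |approximation - |t|| over GRID."""
    return max(abs(abs_series(t, N, fejer) - abs(t)) for t in GRID)

for N in (5, 25, 125):
    print(N, round(sup_error(N, False), 4), round(sup_error(N, True), 4))
```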

**Added**: the Jackson kernel is the square of the Fejér kernel, normalized. I use as the index because squaring doubles the degree. Here is how it approximates :

The Jackson kernel performs somewhat better than the Fejér kernel, because the coefficient of $\cos t$ is off by only $O(1/N^2)$ for a kernel of degree $N$, rather than $O(1/N)$. Still not nearly as good as the non-positive Dirichlet kernel.
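The $O(1/N^2)$ claim can be checked exactly with rational arithmetic. The Fourier coefficients of a product are the convolution of the two coefficient sequences. The coefficients of $F_m^2$ are therefore the autocorrelation of the triangular sequence $1-|k|/(m+1)$, $|k|\le m$. The sketch assumes the Jackson kernel is $F_m^2$ divided by its mean, so that its constant coefficient is $1$. Here $m$ is the index of the Fejér kernel being squared, and $2m$ is the resulting degree; this indexing is an assumption of the sketch. Working out the sums gives the closed form $1-\frac{3}{2(m+1)^2+1}$ for the multiplier of $\cos t$. The loop checks this exactly and compares it with the multiplier $1-\frac{1}{2m+1}$ of the Fejér kernel of the same degree.

```python
from fractions import Fraction

def fejer_coeffs(m):
    """Fourier coefficients of F_m: 1 - |k|/(m+1) for |k| <= m (mean value 1)."""
    return {k: Fraction(m + 1 - abs(k), m + 1) for k in range(-m, m + 1)}

def jackson_coeffs(m):
    """Fourier coefficients of F_m**2, normalized so the constant coefficient is 1.

    The coefficients of a product are the convolution of the two coefficient
    sequences, so here they are the autocorrelation of fejer_coeffs(m).
    """
    a = fejer_coeffs(m)
    conv = {}
    for j, aj in a.items():
        for k, ak in a.items():
            conv[j + k] = conv.get(j + k, 0) + aj * ak
    return {n: c / conv[0] for n, c in conv.items()}

for m in (2, 5, 10, 50):
    degree = 2 * m
    jackson_gap = 1 - jackson_coeffs(m)[1]   # deficit in the cos t coefficient
    fejer_gap = 1 - fejer_coeffs(degree)[1]  # Fejér kernel of the same degree
    assert jackson_gap == Fraction(3, 2 * (m + 1) ** 2 + 1)
    print(degree, float(jackson_gap), float(fejer_gap))
```

The Jackson deficit shrinks like $\frac{3}{2m^2}$, while the Fejér deficit shrinks only like $\frac{1}{2m}$.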