Dirichlet vs Fejér

Convolution of a continuous function ${f}$ on the circle ${\mathbb T=\mathbb R/(2\pi \mathbb Z)}$ with the Fejér kernel $F_N(x)=\frac{1-\cos (N+1)x}{(N+1)(1-\cos x)}$ is guaranteed to produce trigonometric polynomials that converge to ${f}$ uniformly as ${N\rightarrow\infty}$. For the Dirichlet kernel $D_N(x)=\frac{\sin((N+1/2)x)}{\sin(x/2)}$ this is not the case: the sequence ${f*D_N}$ may fail to converge to ${f}$ even pointwise. The underlying reason is that ${\int_{\mathbb T} |D_N|\rightarrow \infty }$, while the Fejér kernel, being positive, has constant ${L^1}$ norm. Does this mean that Fejér’s kernel is to be preferred for approximation purposes?

Let’s compare the performance of both kernels on the function ${f(x)=2\pi^2 x^2-x^4}$, which is reasonably nice: ${f\in C^2(\mathbb T)}$. Convolution with ${D_2}$ yields $\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)D_2(x-t)\,dt = \frac{7\pi^4}{15} -48 \cos x +3 \cos 2x$. The trigonometric polynomial is in blue, the original function in red:

I’d say this is a very good approximation.
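The coefficients of this degree-2 polynomial can be checked by numerical quadrature. The following is a minimal sketch, not the computation used for the plots: the function names and the sample count are illustrative choices, and the integral is approximated by the midpoint rule on an equispaced grid.

```python
import math

def f(x):
    # The test function from the text, for x in [-pi, pi]
    return 2 * math.pi**2 * x**2 - x**4

def cosine_coefficient(n, samples=20000):
    # a_n = (1/pi) * integral over [-pi, pi] of f(t) cos(n t) dt,
    # approximated by the midpoint rule (an assumption of this sketch)
    h = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * math.cos(n * t)
    return total * h / math.pi

mean = cosine_coefficient(0) / 2   # constant term of the series
a1 = cosine_coefficient(1)         # coefficient of cos x
a2 = cosine_coefficient(2)         # coefficient of cos 2x

print(mean, 7 * math.pi**4 / 15)
print(a1, a2)
```

The two printed values on the first line should agree closely, and the second line should be close to the coefficients $-48$ and $3$ quoted above.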

Now try the Fejér kernel, also with ${N=2}$. The polynomial is $\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)F_2(x-t)\,dt = \frac{7\pi^4}{15} - 32 \cos x + \cos 2x$.

This is not good at all.

And even with ${N=20}$ terms the Fejér approximation is not as good as Dirichlet with merely ${N=2}$.
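To put numbers on these comparisons, one can estimate the sup-norm error of each approximation on a grid. The sketch below assumes the closed form $48(-1)^n/n^4$ for the cosine coefficients of ${f}$, which agrees with the coefficients $-48$, $3$, $-\frac{16}{27}$, $\frac{3}{16}$ appearing in this post; the function names and grid size are illustrative. The loop also covers the ${N=50}$ and ${N=4}$ cases discussed below.

```python
import math

def f(x):
    # The test function from the text, for x in [-pi, pi]
    return 2 * math.pi**2 * x**2 - x**4

def a(n):
    # Assumed closed form of the cosine coefficients of f (n >= 1)
    return 48 * (-1)**n / n**4

def approx(x, N, kernel):
    # Dirichlet keeps each coefficient as is; Fejer damps the n-th
    # coefficient by the factor 1 - n/(N+1)
    total = 7 * math.pi**4 / 15
    for n in range(1, N + 1):
        weight = 1.0 if kernel == "dirichlet" else 1 - n / (N + 1)
        total += weight * a(n) * math.cos(n * x)
    return total

def sup_error(N, kernel, points=2001):
    # Maximum absolute error on an equispaced grid covering [-pi, pi]
    grid = [-math.pi + 2 * math.pi * k / (points - 1) for k in range(points)]
    return max(abs(f(x) - approx(x, N, kernel)) for x in grid)

cases = [("dirichlet", 2), ("fejer", 2), ("fejer", 20),
         ("fejer", 50), ("dirichlet", 4)]
for kernel, N in cases:
    print(kernel, N, sup_error(N, kernel))
```

Since the grid is finite, the printed values are lower bounds for the true sup-norm errors; here the largest deviation occurs at $x=\pm\pi$, which the grid includes.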

The performance of ${F_{50}}$ is comparable to that of ${D_2}$. Of course, a ${50}$-term approximation is not what one normally wants to use. And it still has visible deviation near the origin, where the function ${f}$ is ${C^\infty}$ smooth:

In contrast, the Dirichlet kernel with ${N=4}$ gives a low-degree polynomial $\frac{7\pi^4}{15} -48 \cos x +3 \cos 2x -\frac{16}{27}\cos 3x+\frac{3}{16}\cos 4x$ that approximates ${f}$ to within the resolution of the plot:

What we have here is the trigonometric version of Biased and unbiased mollification. Convolution with ${D_N}$ amounts to truncation of the Fourier series at index ${N}$. Therefore, it reproduces trigonometric polynomials of degree at most ${N}$ exactly. But ${F_N}$ performs soft thresholding: it multiplies the ${n}$th Fourier coefficient of ${f}$ by ${(1-|n|/(N+1))^+}$. In particular, it transforms ${\cos x}$ into ${(N/(N+1))\cos x}$, introducing an error of order ${1/N}$, which is pretty big. Since this error is built into the kernel, it limits the rate of convergence no matter how smooth the function ${f}$ is. Such is the price that must be paid for positivity.
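The multiplier formula can be checked directly from the closed form of the kernel. In this sketch the kernel is rewritten with the half-angle identity $1-\cos y = 2\sin^2(y/2)$, which gives the same function as the formula at the top of the post but avoids cancellation near $x=0$. The function names and the sample count are illustrative assumptions.

```python
import math

def fejer_kernel(x, N):
    # F_N(x) = sin^2((N+1)x/2) / ((N+1) sin^2(x/2)), equal to
    # (1 - cos((N+1)x)) / ((N+1)(1 - cos x)) by the half-angle identity
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return N + 1.0  # limiting value of the kernel at x = 0
    return math.sin((N + 1) * x / 2) ** 2 / ((N + 1) * s * s)

def fejer_multiplier(n, N, samples=4096):
    # (1/2pi) * integral of cos(n t) F_N(t) dt: the factor by which
    # convolution with F_N scales cos(n x). Midpoint rule on an
    # equispaced grid, with samples assumed larger than N + n.
    h = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        t = -math.pi + (k + 0.5) * h
        total += math.cos(n * t) * fejer_kernel(t, N)
    return total * h / (2 * math.pi)

for N in (2, 20, 50):
    print(N, fejer_multiplier(1, N), N / (N + 1))
```

Each line should show two nearly equal numbers, the second being ${N/(N+1)}$. Calling `fejer_multiplier(n, N)` with ${n>N}$ should return a value near zero, reflecting the cutoff in ${(1-|n|/(N+1))^+}$.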

This reminds me of a parenthetical remark by G. B. Folland in Real Analysis (2nd ed., page 264):

if one wants to approximate a function ${f\in C(\mathbb T)}$ uniformly by trigonometric polynomials, one should not count on partial sums ${S_mf}$ to do the job; the Cesàro means work much better in general.

Right, for ugly “generic” elements of ${C(\mathbb T)}$ the Fejér kernel is a safer option. But for decently behaved functions the Dirichlet kernel wins by a landslide. The function above was ${C^2}$-smooth; as a final example I take ${f(x)=x^2}$ which is merely Lipschitz on ${\mathbb T}$. The original function is in red, ${f*D_4}$ is in blue, and ${f*F_4}$ is in green.
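The same comparison can be made numerically for ${f(x)=x^2}$. This sketch relies on the standard Fourier series $x^2=\frac{\pi^2}{3}+\sum_{n\ge 1}\frac{4(-1)^n}{n^2}\cos nx$ on ${[-\pi,\pi]}$; the function names and the grid size are illustrative.

```python
import math

def coeff(n):
    # Cosine coefficients of x^2 on [-pi, pi] (n >= 1)
    return 4 * (-1)**n / n**2

def approx(x, N, fejer):
    # fejer=False: partial sum (convolution with D_N)
    # fejer=True:  Cesaro mean (convolution with F_N)
    total = math.pi**2 / 3
    for n in range(1, N + 1):
        weight = 1 - n / (N + 1) if fejer else 1.0
        total += weight * coeff(n) * math.cos(n * x)
    return total

def sup_error(N, fejer, points=2001):
    # Maximum absolute error on an equispaced grid covering [-pi, pi]
    grid = [-math.pi + 2 * math.pi * k / (points - 1) for k in range(points)]
    return max(abs(x * x - approx(x, N, fejer)) for x in grid)

print("Dirichlet, N=4:", sup_error(4, fejer=False))
print("Fejer,     N=4:", sup_error(4, fejer=True))
```

With these settings the Dirichlet error should come out well below the Fejér error, even though ${x^2}$ has a corner at $x=\pm\pi$ on the circle.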

Added: the Jackson kernel ${J_{2N}}$ is the square of ${F_{N}}$, normalized. I use ${2N}$ as the index because squaring doubles the degree. Here is how it approximates ${f(x)=2\pi^2 x^2-x^4}$:

The Jackson kernel performs somewhat better than ${F_N}$, because the coefficient of ${\cos x}$ is off by ${O(1/N^2)}$. It is still not nearly as good as the non-positive Dirichlet kernel.
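The ${O(1/N^2)}$ claim can be checked without any integration, since squaring a kernel convolves its sequence of Fourier coefficients with itself. The sketch below takes the Fejér multipliers ${(1-|k|/(N+1))^+}$ as given and treats "normalized" as dividing by the zeroth coefficient; the function names are illustrative, and the argument `N` is the index of ${F_N}$, not of ${J_{2N}}$.

```python
def fejer_weights(N):
    # Fourier multipliers of F_N: the triangle 1 - |k|/(N+1) for |k| <= N
    return {k: 1 - abs(k) / (N + 1) for k in range(-N, N + 1)}

def jackson_multiplier(n, N):
    # Coefficients of F_N squared are the self-convolution of the triangle;
    # dividing by the zeroth coefficient normalizes the kernel to mean 1.
    w = fejer_weights(N)

    def c(k):
        return sum(w[j] * w.get(k - j, 0.0) for j in w)

    return c(n) / c(0)

for N in (5, 10, 20, 40):
    deficit = 1 - jackson_multiplier(1, N)   # error in the cos x coefficient
    print(N, deficit, N * N * deficit)
```

The last column should stay bounded as ${N}$ grows, which is the ${O(1/N^2)}$ behavior; for these weights the deficit works out to ${3/(2N^2+4N+3)}$.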