## Discrete maximum principle for polynomials

The polynomially convex hull of a compact set ${K\subset \mathbb C}$ is defined as the set of all points ${z\in \mathbb C}$ such that the inequality ${|p(z)|\le \sup_K |p|}$ (a form of the maximum principle) holds for every polynomial ${p}$. For example, the polynomially convex hull of a simple closed curve is the union of that curve with its interior region. In general, this process fills up the holes in the set ${K}$, resulting in the complement of the unbounded connected component of ${\mathbb C\setminus K}$.

We can recover the usual convex hull from this construction by restricting ${p}$ to the polynomials of first degree. Indeed, when ${p}$ is linear, the set ${|p|\le M}$ is a closed disk, and we end up with the intersection of all closed disks that contain ${K}$. This is precisely the convex hull of ${K}$.

What if we restrict ${p}$ to the polynomials of degree at most ${n}$? Let’s call the resulting set the degree-${n}$ convex hull of ${K}$, denoted ${K_n}$. By definition, it is contained in the convex hull and contains the polynomially convex hull. Computing ${K_n}$ exactly for general ${K}$ appears to be difficult even when ${n=2}$.

Consider finite sets. When ${K}$ has at most ${n}$ points, we have ${K_n=K}$ because there is a polynomial of degree ${n}$ whose zero set is precisely ${K}$. So, the first nontrivial case is of ${K}$ having ${n+1}$ points. Let us write ${K=\{z_0, \dots, z_n\}}$.

Depending on the location of the points, ${K_n}$ may be strictly larger than ${K}$. For example, if ${K}$ consists of the vertices of a regular ${(n+1)}$-gon, then ${K_n}$ also contains its center. Here is why. By a linear transformation, we can make sure that ${K=\{\zeta^k\colon k=0, \dots, n\}}$ where ${\zeta = \exp(2\pi i/(n+1))}$. For ${j=1, \dots, n}$ we have ${\sum_{k=0}^n \zeta^{kj} = (\zeta^{(n+1)j}-1)/(\zeta^j - 1) = 0}$. Hence, for any polynomial ${p}$ of degree at most ${n}$, the sum ${\sum_{k=0}^n p(\zeta^k)}$ is equal to ${(n+1)p(0)}$. This implies ${|p(0)|\le \max_{0\le k\le n}|p(\zeta^k)|}$, a kind of a discrete maximum principle.

A more systematic approach is to use the Lagrange basis polynomials, that is ${L_j(z) = \prod_{k\ne j} (z-z_k)/(z_j-z_k)}$, which satisfy ${L_j(z_k) = \delta_{jk}}$. Since ${p = \sum_j p(z_j)L_j}$ for any polynomial of degree at most ${n}$, it follows that ${z\in K_n}$ if and only if ${\left|\sum_j c_j L_j(z)\right|\le \max |c_j| }$ holds for every choice of scalars ${c_0, \dots, c_n}$. The latter is equivalent to the inequality ${\sum_j |L_j(z)|\le 1}$.

This leads us to consider the function ${S=\sum_j |L_j|}$, the sum of the absolute values of the Lagrange basis polynomials. (Remark: ${S}$ is known as the Lebesgue function of this interpolation problem.) Since ${\sum_j L_j\equiv 1}$, it follows that ${S\ge 1}$ everywhere. By construction, ${S=1}$ on ${K}$. At a point ${z\notin K}$, the equality ${S(z)=1}$ holds if and only if ${\arg L_j(z)}$ is the same for all ${j}$.
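The criterion ${S(z)\le 1}$ is easy to test numerically. Here is a small sketch (the helper `lebesgue_S` is my own, not part of the original post) that evaluates ${S}$ for the vertices of a regular pentagon:

```python
import numpy as np

def lebesgue_S(z, nodes):
    """Sum of |L_j(z)| over the Lagrange basis polynomials for the given nodes."""
    total = 0.0
    for j, zj in enumerate(nodes):
        others = np.array([zk for k, zk in enumerate(nodes) if k != j])
        total += abs(np.prod((z - others) / (zj - others)))
    return total

# vertices of a regular pentagon: S(0) = 1, so 0 lies in the degree-4 hull
pentagon = np.exp(2j * np.pi * np.arange(5) / 5)
print(lebesgue_S(0, pentagon))          # ≈ 1

# S >= 1 everywhere, with equality exactly on the degree-n hull
rng = np.random.default_rng(0)
for z in rng.normal(size=10) + 1j * rng.normal(size=10):
    assert lebesgue_S(z, pentagon) >= 1 - 1e-12
```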

In the trivial case ${K=\{z_0, z_1\}}$, the function ${S(z) = (|z-z_0|+|z-z_1|)/|z_0-z_1|}$ is equal to ${1}$ precisely on the linear segment with endpoints ${z_0, z_1}$. Of course, this only repeats what we already knew: the degree-1 convex hull is the ordinary convex hull.

If ${K=\{x_0, \dots, x_n\}}$ with ${x_0<\cdots<x_n}$ real and ${n\ge 2}$, then ${K_n=K}$. Indeed, if ${x\in K_n\setminus K}$, then ${x}$ lies in the convex hull of ${K}$, and therefore ${x_{j-1}<x<x_j}$ for some ${j}$. The basis polynomial ${L_j}$ is positive at ${x}$, since it is equal to ${1}$ at ${x_j}$ and does not vanish outside of ${K}$. Since a polynomial changes its sign at every simple zero, it follows that ${L_{j+1}(x) < 0}$. (There is no ${L_{j+1}}$ when ${j=n}$, but in that case the same reasoning shows ${L_{j-2}(x)<0}$.) In any case, the conclusion is that ${\arg L_k(x)}$ cannot be the same for all ${k}$.

At this point one might guess that vertex sets of regular polygons are the only finite sets that admit a nontrivial discrete maximum principle for polynomials. But this is not so: the vertices of a rhombus work as well. Indeed, if ${K=\{a, -a, ib, -ib\}}$ with ${a, b>0}$, then ${L_j(0)>0}$ for all ${j}$, hence ${S(0)=1}$.

The vertices of a non-square rectangle do not work: if ${K}$ is the set of these vertices, the associated function ${S}$ is strictly greater than 1 on the complement of ${K}$.
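Both claims are easy to check numerically. A quick sketch (the helper `S` is mine) comparing a rhombus, a square, and a non-square rectangle at the center:

```python
import numpy as np

def S(z, nodes):
    # Lebesgue function: sum of |L_j(z)| over the Lagrange basis for the nodes
    return sum(abs(np.prod([(z - w) / (v - w) for w in nodes if w != v]))
               for v in nodes)

rhombus = [1, -1, 2j, -2j]                 # a = 1, b = 2
square = [1, 1j, -1, -1j]                  # regular 4-gon
rectangle = [1 + 2j, -1 + 2j, -1 - 2j, 1 - 2j]   # non-square rectangle

print(S(0, rhombus))     # ≈ 1: the center belongs to the degree-3 hull
print(S(0, square))      # ≈ 1
print(S(0, rectangle))   # ≈ 1.25 > 1: the center is excluded
```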

Are there any other finite sets that support a discrete maximum principle for polynomials?

## Discrete Cosine Transforms: Trapezoidal vs Midpoint

The existence of multiple versions of Discrete Cosine Transform (DCT) can be confusing. Wikipedia explains that its 8 types are determined by how one reflects across the boundaries. E.g., one can reflect 1, 2, 3 across the left boundary as 3, 2, 1, 2, 3 or as 3, 2, 1, 1, 2, 3, and there are such choices for the other boundary too (also, the other reflection can be odd or even). Makes sense enough.

But there is another aspect to the two most used forms, DCT-I and DCT-II (types I and II): they can be expressed in terms of the Trapezoidal and Midpoint rules for integration. Here is how.

The cosines ${\cos kx}$, ${k=0,1,\dots}$ are orthogonal on the interval ${[0,\pi]}$ with respect to the Lebesgue measure. The basis of discrete transforms is that these cosines are also orthogonal with respect to certain discrete measures, until a certain frequency is reached. Indeed, if the cosines are orthogonal with respect to a measure ${\mu}$ whose support consists of ${n}$ points, then we can efficiently use them to represent any function defined on the support of ${\mu}$, and such functions are naturally identified with sequences of length ${n}$.

How to find such measures? It helps that some simple rules of numerical integration are exact for trigonometric polynomials up to some degree.

For example, the trapezoidal rule with ${n}$ sample points exactly integrates the functions ${\cos kx}$ for ${k=0,\dots, 2n-3}$. This can be checked by converting to exponential form and summing geometric progressions, but here is a visual explanation with ${n=4}$, where the interval ${[0,\pi]}$ is represented as the upper semi-circle. The radius of each red circle indicates the weight placed at that point; the endpoints get ${1/2}$ of the weight of the other sample points. To integrate ${\cos x}$ correctly, we must have the ${x}$-coordinate of the center of mass equal to zero, which is obviously the case.

Replacing ${\cos x}$ by ${\cos kx}$ means multiplying the polar angle of each sample point by ${k}$. This is what we get:

In all cases the x-coordinate of the center of mass is zero. With k=6 this breaks down, as all the weight gets placed in one point. And this is how it goes in general with integration of sines and cosines: equally spaced points work perfectly until they don’t work at all, which happens when the step size is equal to the period of the function. When ${k=2n-2}$, the period ${2\pi/k}$ is equal to ${\pi/(n-1)}$, the spacing of points in the trapezoidal rule.
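This breakdown is easy to confirm numerically; a small sketch of my own, with ${n=4}$ as in the pictures:

```python
import numpy as np

n = 4
h = np.pi / (n - 1)
x = np.arange(n) * h                         # nodes 0, h, ..., pi
w = np.full(n, h); w[0] /= 2; w[-1] /= 2     # trapezoidal weights

# the integral of cos(kx) over [0, pi] is 0 for every k >= 1
for k in range(1, 2*n - 2):                  # k = 1, ..., 2n-3: exact
    assert abs(np.sum(w * np.cos(k * x))) < 1e-12
# k = 2n-2: the spacing equals the period, every sample has cos = 1
print(np.sum(w * np.cos((2*n - 2) * x)))     # = pi instead of 0
```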

The orthogonality of cosines has to do with the product formula ${\cos kx \cos j x = \frac12\cos (k-j)x + \frac12\cos (k+j)x}$. Let ${\tau_n}$ be the measure expressing the trapezoidal rule on ${[0,\pi]}$ with ${n}$ sample points: it is the sum of point masses at ${0, \pi/(n-1), \dots , \pi}$, with the masses at the two endpoints halved. Then ${\{\cos k x \colon k=0,\dots, n-1\}}$ are orthogonal with respect to ${\tau_n}$, because any product ${\cos k x\cos j x }$ with ${k >j}$ taken from this range has ${k-j, k+j \le 2n-3}$. Consequently, we can compute the coefficients of any function ${f}$ in the cosine basis as
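The discrete orthogonality can be verified directly by computing the Gram matrix; a sketch with ${n=6}$:

```python
import numpy as np

n = 6
x = np.arange(n) * np.pi / (n - 1)
w = np.full(n, np.pi / (n - 1)); w[0] /= 2; w[-1] /= 2   # weights of tau_n

# Gram matrix of cos(kx), k = 0..n-1, with respect to tau_n
G = np.array([[np.sum(w * np.cos(k*x) * np.cos(j*x)) for j in range(n)]
              for k in range(n)])
off = G - np.diag(np.diag(G))
print(np.max(np.abs(off)))   # ≈ 0: the cosines are orthogonal w.r.t. tau_n
```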

${\displaystyle c_k = \int f(x)\cos kx\,d\tau_n(x) \bigg/ \int \cos^2 kx\,d\tau_n(x)}$

The above is what DCT-I (discrete cosine transform of type 1) does, up to normalization.

The DCT-II transform uses the Midpoint rule instead of the Trapezoidal rule. Let ${\mu_n}$ be the measure expressing the Midpoint rule on ${[0,\pi]}$ with ${n}$ sample points; it gives equal mass to the points ${(2j-1)\pi/(2n)}$ for ${j=1,\dots, n}$. These are spaced at ${\pi/n}$, and therefore the midpoint rule is exact for ${\cos kx}$ with ${k=0,\dots, 2n-1}$, which is better than what the trapezoidal rule does. Perhaps more significantly, by identifying the given data points with function values at the midpoints of subintervals we stay away from the endpoints ${0,\pi}$, where the cosines are constrained to have zero slope.
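The same kind of check as for the trapezoidal rule confirms the midpoint rule's larger exactness range (again ${n=4}$; my own sketch):

```python
import numpy as np

n = 4
x = (2*np.arange(1, n + 1) - 1) * np.pi / (2*n)   # midpoints, spacing pi/n
w = np.pi / n                                     # equal weights

for k in range(1, 2*n):                           # k = 1, ..., 2n-1: exact
    assert abs(w * np.sum(np.cos(k * x))) < 1e-12
# k = 2n: every sample is cos((2j-1)pi) = -1
print(w * np.sum(np.cos(2*n * x)))                # = -pi instead of 0
```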

Let’s compare DCT-I and DCT-II on the same data set, ${y=(0, \sqrt{1}, \sqrt{2}, \dots, \sqrt{8})}$. There are 9 numbers here. Following DCT-I we place them at the sample points of the trapezoidal rule, and expand into cosines using the inner product with respect to ${\tau_n}$. Here is the plot of the resulting trigonometric polynomial: of course it interpolates the data.

But DCT-II does it better, despite having exactly the same cosine functions. The only change is that we use ${\mu_n}$ and so place the ${y}$-values along its support.

Less oscillation means the high-degree coefficients are smaller, and therefore easier to discard in order to compress information. For example, drop the last two coefficients in each expansion, keeping 7 numbers instead of 9. DCT-II clearly wins in accuracy then.

Okay, so the Midpoint rule is better, no surprise. After all, it’s in general about twice as accurate as the Trapezoidal rule. What about Simpson’s rule, would it lead to some super-efficient form of DCT? That is, why don’t we let ${\sigma_n}$ be the discrete measure that expresses Simpson’s rule and use the inner product ${\int fg\,d\sigma_n}$ for cosine expansion? Alas, Simpson’s rule on ${n}$ points is exact only for ${\cos kx}$ with ${k=0,\dots, n-2}$, which is substantially worse than either Trapezoidal or Midpoint rules. As a result, we don’t get enough orthogonal cosines with respect to ${\sigma_n}$ to have an orthogonal basis. Simpson’s rule has an advantage when dealing with algebraic polynomials, not with trigonometric ones.
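The poor exactness range of Simpson's rule is also easy to confirm numerically; a sketch of my own with ${n=9}$ sample points:

```python
import numpy as np

n = 9                         # Simpson's rule needs an odd number of points
h = np.pi / (n - 1)
x = np.arange(n) * h
w = (h / 3) * np.array([1, 4, 2, 4, 2, 4, 2, 4, 1], dtype=float)

# the integral of cos(kx) over [0, pi] is 0 for every k >= 1
errors = [abs(np.sum(w * np.cos(k * x))) for k in range(1, n)]
print(max(errors[:n - 2]))    # k = 1..n-2: exact, errors ≈ 0
print(errors[n - 2])          # k = n-1: already fails (error pi/3 here)
```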

Finally, the Python code used for the graphics; I did not use SciPy’s DCT method (which is of course more efficient) in order to keep the relation to numerical integration explicit in the code. The function np.trapz implements the trapezoidal rule, and the midpoint rule is just the summation of sampled values. In both cases there is no need to worry about the factor ${dx}$, since it cancels out when we divide one numerical integral by the other.

```python
import numpy as np
import matplotlib.pyplot as plt

# Setup
y = np.sqrt(np.arange(9))
c = np.zeros_like(y)
n = y.size

# DCT-I, trapezoidal
x = np.arange(n)*np.pi/(n-1)
for k in range(n):
    c[k] = np.trapz(y*np.cos(k*x))/np.trapz(np.cos(k*x)**2)
t = np.linspace(0, np.pi, 500)
yy = np.sum(c*np.cos(np.arange(n)*t.reshape(-1, 1)), axis=1)
plt.plot(x, y, 'ro')
plt.plot(t, yy)
plt.show()

# DCT-II, midpoint
x = np.arange(n)*np.pi/n + np.pi/(2*n)
for k in range(n):
    c[k] = np.sum(y*np.cos(k*x))/np.sum(np.cos(k*x)**2)
t = np.linspace(0, np.pi, 500)
yy = np.sum(c*np.cos(np.arange(n)*t.reshape(-1, 1)), axis=1)
plt.plot(x, y, 'ro')
plt.plot(t, yy)
plt.show()
```

## Connecting dots naturally

How to draw a nice curve through given points ${(x_i,y_i)}$?

One way is to connect them with straight lines, creating a piecewise linear function:

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

$\displaystyle \int_0^{10} \sqrt{1+(f'(x))^2}\,dx$

among all functions with ${f(x_i)=y_i}$. As is often the case, the length functional can be replaced with the elastic energy

$\displaystyle \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx$

because the piecewise linear ${f}$ (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could try to fit a polynomial function to these points, which is guaranteed to be smooth. With 11 points we need a 10th degree polynomial. The result is disappointing:

It is not natural for a curve connecting the points with ${1\le y\le 9}$ to shoot up over ${y=40}$. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

$\displaystyle \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx$

and the function that minimizes it subject to conditions ${f(x_i)=y_i}$ looks very nice indeed:

The Euler-Lagrange equation for the functional ${\mathcal{B}}$ dictates that the fourth derivative of ${f}$ is zero in the intervals between the knots ${x_i}$. Thus, ${f}$ is a piecewise cubic polynomial. Also, both ${f}$ and ${f'}$ must be continuous for any function with integrable second derivative. More delicate analysis is required for ${f''}$, but it too can be shown to be continuous for the minimizing function ${f}$; moreover, ${f''}$ must vanish at the endpoints ${0}$ and ${10}$. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term ${C}$ to the piecewise linear interpolant ${L}$. Here the spline is shown together with ${L}$ (green) and ${C}$ (magenta).

On each interval ${[x_i,x_{i+1}]}$ the correction term ${C}$ is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

$\displaystyle C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2$

on this interval. There are 20 coefficients ${\alpha_i}$, ${\beta_i}$ to find. At each of the 9 interior knots ${1,2,\dots,9}$ we have two conditions: ${C''}$ must be continuous (its one-sided limits must agree) and ${C'}$ must jump by the amount opposite to the jump of ${L'}$. Since ${C''}$ also vanishes at the endpoints ${0}$ and ${10}$, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of ${z_i=-C''(x_i)}$ (this sign convention matches the code below). Indeed, the values of ${C''}$ at two consecutive knots determine the correction term within: ${\alpha_i= \dfrac{z_{i+1}+2z_i}{6} }$ and ${\beta_i = \dfrac{2z_{i+1}+z_i}{6}}$. This leaves ${n-1}$ equations (from the jumps of ${C'}$) for the ${n-1}$ unknowns ${z_1,\dots,z_{n-1}}$, where ${n=10}$ is the number of subintervals. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

$\displaystyle \frac{1}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix}$

One can solve the system for ${z_i}$ within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

```scilab
a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))
z = [0,-jumps/A,0]
allx = []; spl = []
for i=1:n
    xL = a+h*(i-1)
    xR = a+h*i
    x = linspace(xL,xR,100)
    linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h
    cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
    allx = [allx, x]
    spl = [spl, linear + cor]
end
plot(allx, spl)
plot(a:h:b, y, 'r*')
```
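For readers who prefer Python, here is a NumPy version of the same construction, written by me as a cross-check. It uses the standard convention ${M_i = f''(x_i)}$, so the correction enters with the opposite sign compared to the ${z}$ in the Scilab code:

```python
import numpy as np

a, b = 0.0, 10.0
y = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], dtype=float)
n = y.size - 1                      # number of subintervals
h = (b - a) / n
jumps = np.diff(y, 2) / h           # jumps of the slope of the linear interpolant

# tridiagonal system (h/6) * tridiag(1, 4, 1) for the second derivatives M_i
A = (h / 6) * (4*np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))
M = np.zeros(n + 1)                 # natural boundary conditions: M_0 = M_n = 0
M[1:-1] = np.linalg.solve(A, jumps)

def spline(x):
    x = np.asarray(x, dtype=float)
    i = np.clip(((x - a) / h).astype(int), 0, n - 1)
    xL, xR = a + h*i, a + h*(i + 1)
    linear = y[i]*(xR - x)/h + y[i + 1]*(x - xL)/h
    cor = -((M[i+1] + 2*M[i])*(xR - x) + (2*M[i+1] + M[i])*(x - xL)) \
          * (xR - x)*(x - xL)/(6*h)
    return linear + cor

knots = a + h*np.arange(n + 1)
print(np.max(np.abs(spline(knots) - y)))   # ≈ 0: the spline interpolates
```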

## Convexity of polar curves

Everybody knows the second derivative test for the convexity of Cartesian curves ${y=y(x)}$. What is the convexity test for polar curves ${r=r(\theta)}$? Google search brought up Robert Israel’s answer on Math.SE: the relevant inequality is

$\displaystyle \mathcal{C}[r]:=r^2+2(r')^2-r\,r''\ge 0 \ \ \ \ \ (1)$

But when using it, one should be aware of the singularity at the origin. For example, ${r=1+\cos \theta}$ satisfies $\mathcal{C}[r] = 3(1+\cos \theta)\ge 0$ but the curve is not convex: it’s the cardioid.

The formula (1) was derived for ${r> 0}$; the points with ${r=0}$ must be investigated directly. Actually, it is easy to see that when ${r }$ has a strict local minimum with value ${0}$, the polar curve has an inward cusp and therefore is not convex.

As usual, theoretical material is followed by an exercise.

Exercise: find all real numbers ${p}$ such that the polar curve ${r=(1+\cos 2\theta)^p}$ is convex.

All values ${p>0}$ are ruled out by the cusp formed at ${\theta=\pi/2}$. For ${p=0}$ we get a circle, obviously convex. When ${p<0}$, some calculations are in order:

$\displaystyle \mathcal{C}[r] = (1+4p+4p^2+(1-4p^2)\cos 2\theta)(1+\cos 2\theta)^{2p-1}$

For this to be nonnegative for all ${\theta}$, we need ${1+4p+4p^2\ge |1-4p^2|}$, which is equivalent to the two inequalities ${p(4+8p) \ge 0}$ and ${2+4p\ge 0}$. Since ${p<0}$, the first inequality forces ${p\le -1/2}$, while the second forces ${p\ge -1/2}$.

Answer: ${p=0}$ and ${p=-1/2}$.
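The calculation can be double-checked numerically (the helper functions below are mine): compare the closed form for ${\mathcal{C}[r]}$ with finite differences, and test the sign for a few powers.

```python
import numpy as np

def C_num(p, t, eps=1e-5):
    # C[r] = r^2 + 2(r')^2 - r r'' by central differences, for r = (1+cos 2t)^p
    r = lambda s: (1 + np.cos(2*s))**p
    r1 = (r(t + eps) - r(t - eps)) / (2*eps)
    r2 = (r(t + eps) - 2*r(t) + r(t - eps)) / eps**2
    return r(t)**2 + 2*r1**2 - r(t)*r2

def C_formula(p, t):
    return (1 + 4*p + 4*p**2 + (1 - 4*p**2)*np.cos(2*t)) \
           * (1 + np.cos(2*t))**(2*p - 1)

print(abs(C_num(-1.0, 0.7) - C_formula(-1.0, 0.7)))  # small: the formula checks out
print(C_formula(-1.0, 0.0))    # negative: p = -1 gives a non-convex curve
print(C_formula(-0.5, 0.3))    # zero: p = -1/2 is the borderline case
```

(For ${p=-1/2}$ both coefficients ${1+4p+4p^2}$ and ${1-4p^2}$ vanish, so ${\mathcal{C}[r]\equiv 0}$: this curve has zero curvature, consistent with convexity as a borderline case.)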

This exercise is relevant to the problem from the previous post Peanut allergy and Polar interpolation about the search for the “right” interpolation method in polar coordinates. Given a set of ${(r,\theta)}$ values, I interpolated ${(r^{1/p},\theta)}$ with a trigonometric polynomial, and then raised that polynomial to power ${p}$.

If the given points had Cartesian coordinates ${(\pm a,0), (0,\pm b)}$ then this interpolation yields ${r=(\alpha+\beta \cos \theta)^p}$ where ${\alpha,\beta}$ depend on ${a,b}$ and satisfy ${\alpha>|\beta|}$. Using the exercise above, one can deduce that ${p=-1/2}$ is the only nonzero power for which the interpolated curve is convex for any given points of the form ${(\pm a,0), (0,\pm b)}$.

In general, curves of the form ${r=P(\theta)^{-1/2}}$, with ${P}$ a trigonometric polynomial, need not be convex. But even then they look more natural than their counterparts with powers ${p\ne -1/2}$. Perhaps this is not surprising: the equation ${r^2P(\theta)=1}$ has a decent chance of being algebraic when the degree of ${P}$ is low.

Random example at the end: ${r=(3-\sin \theta +2\cos \theta+\cos 2\theta-2\sin 2\theta)^{-1/2}}$

## Peanut allergy and Polar interpolation

Draw a closed curve passing through the points ${(\pm 3,0)}$ and ${(0,\pm 1)}$: more specifically, from (3,0) to (0,1) to (-3,0) to (0,-1) and back to (3,0).

I’ll wait.

… Done?

This is an interpolation problem. The geometry of the problem does not allow for curves of the form ${y=f(x)}$. But polar graphs ${\rho=f(\theta)}$ should work. (Aside: in ${(x,y)}$ the variable usually taken to be independent is listed first; in ${(\rho,\theta)}$ it is listed second. Is consistent notation too much to ask for? Yes, I know the answer…) Since ${f}$ must be ${2\pi}$-periodic, it is natural to interpolate ${(3,0),(1,\pi/2),(3,\pi),(1,3\pi/2)}$ by a trigonometric polynomial. The polynomial ${f(\theta)=2+\cos 2\theta}$ does the job. But does it do it well?
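Before judging the quality, one can confirm the interpolation itself: with four equally spaced nodes it is just an inverse DFT. A quick check of my own, mirroring what the Scilab code at the end of the post does:

```python
import numpy as np

c = np.fft.ifft([3, 1, 3, 1])          # Fourier coefficients of the data
print(c.real)                          # [2, 0, 1, 0]: rho = 2 + cos(2 theta)

theta = np.array([0, np.pi/2, np.pi, 3*np.pi/2])
rho = sum(c[k] * np.exp(-1j*k*theta) for k in range(4)).real
print(rho)                             # recovers 3, 1, 3, 1
```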

This is not what I would draw. My picture would be more like the ellipse with the given points as vertices:

The unnecessary concavity of the peanut graph comes from the second derivative ${f''}$ being too large at the points of minimum (and too small at the points of maximum). We need a function with flat minima and sharp maxima. Here is the comparison between ${f(\theta)=2+\cos 2\theta}$ (red) and the function that describes the ellipse in polar coordinates (blue):

I thought of pre-processing: apply some monotone function ${\varphi\colon (0,\infty)\rightarrow \mathbb R}$ to the ${\rho}$-values, interpolate, and then compose the interpolant with ${\varphi^{-1}}$. The function ${\varphi}$ should have a large derivative near ${0}$, so that its inverse has a small derivative there, flattening the minima.

The first two candidates ${\varphi(r)=\log r}$ and ${\varphi(r)=1/r}$ did not improve the picture enough. But when I tried ${\varphi(r)=1/r^2}$, the effect surprised me. Interpolation between ${(1/9,0),(1,\pi/2),(1/9,\pi),(1,3\pi/2)}$ yields ${(5-4\cos 2\theta)/9}$, which after application of ${\varphi^{-1}}$ produces ${\rho = 3/\sqrt{5-4\cos 2\theta}}$: exactly the ellipse pictured above.
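One can confirm that this really is the ellipse with semi-axes 3 and 1: squaring ${\rho^2(5-4\cos2\theta)=9}$ and using ${x=\rho\cos\theta}$, ${y=\rho\sin\theta}$ unwinds to ${x^2+9y^2=9}$. A numeric check:

```python
import numpy as np

t = np.linspace(0, 2*np.pi, 1000)
rho = 3 / np.sqrt(5 - 4*np.cos(2*t))
x, y = rho*np.cos(t), rho*np.sin(t)
print(np.max(np.abs(x**2/9 + y**2 - 1)))   # ≈ 0: the curve lies on the ellipse
```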

Next, I tried a less symmetric example: curve through ${(5,0);(0,2);(-3,0);(0,-2)}$.

Again, the second approach produces a more natural looking curve.

Finally, a curve with five-fold symmetry, with ${\rho}$ ranging from ${2}$ to ${5}$.

The interpolation and plotting were done in Scilab, by invoking the functions given below with commands such as polar1([3,1,3,1]) or polar2([2,5,2,5,2,5,2,5,2,5]). Because I considered equally spaced interpolation nodes, the coefficients of the interpolating polynomials are nothing but the (inverse) discrete Fourier transform of the given values. The first function interpolates the radius ${\rho}$ itself.

```scilab
function polar1(R)
    clf()
    p = fft(R,1)
    t = 0:.01:2*%pi
    rho = zeros(t)
    for i = 1:length(p)
        rho = rho + real(p(i)*exp(-%i*(i-1)*t))
    end
    polarplot(t,rho,style=5)
    t = resize_matrix(linspace(0,2*%pi,length(R)+1),1,length(R))
    plot(R.*cos(t), R.*sin(t), 'o')
endfunction
```

The second function includes preprocessing: it interpolates ${1/\rho^2}$ and then raises the interpolant to power ${-1/2}$.

```scilab
function polar2(R)
    clf()
    p = fft(R^(-2),1)
    t = 0:.01:2*%pi
    rho = zeros(t)
    for i = 1:length(p)
        rho = rho + real(p(i)*exp(-%i*(i-1)*t))
    end
    rho = max(rho,0.01*ones(rho))
    polarplot(t,rho^(-1/2),style=5)
    t = resize_matrix(linspace(0,2*%pi,length(R)+1),1,length(R))
    plot(R.*cos(t), R.*sin(t), 'o')
endfunction
```

Added: one more comparison, ${\rho^{1}}$ vs ${\rho^{-2}}$ vs ${\rho^{-3}}$; the last one loses convexity again.

And then to ${\rho^{-7}}$:

## Gregory-Newton series

> [Interpolation] rarely appears in calculus books today, and then only as numerical method. Yet three of the most important founders of calculus, Newton, Gregory, and Leibniz, began their work with interpolation… Of course, interpolation is a numerical method in practice, when one uses only a few terms of the Gregory-Newton series, but the full series is exact and hence of much greater interest.
>
> — John Stillwell, Mathematics and its History.

For a function $f\colon\mathbb R\to\mathbb R$ define $\Delta^0 f=f$, and then inductively $\Delta^{n+1}f(a)=\Delta^n f(a+h)-\Delta^n f(a)$, where $h$ is a fixed step size. Also, let $\displaystyle \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\dots (\alpha-n+1)}{n!}$ where $\alpha\in \mathbb R$ is not necessarily an integer. The Gregory-Newton series for $f$ (centered at zero) is

$\displaystyle f(x) \sim \sum_{n=0}^\infty \Delta^n f(0) \binom{x/h}{n}$

The idea is that the $N$th partial sum is the (unique) polynomial of degree at most $N$ that agrees with $f$ at the points $0,h,2h,\dots,Nh$, and we hope that the limit $N\to\infty$ recovers $f$ exactly. But does it really? After all, the formula uses only the values of $f$ at integer multiples of $h$, and there are infinitely many functions (including analytic ones) that have the same values as $f$ at these points. For example, with $f(x)=\sin x$ and $h=\pi$ we get $\Delta^n f(0)=0$ for all $n$, hence the sum is identically zero.

Convergence is also an issue. Since $\binom{x/h}{n}^{1/n}\to 1$ for most values of $x$, we need $\Delta^n f(0)$ to decay exponentially for the series to be of much use. This is not the case when $f(x)=\cos x$ and $h=\pi$: it is easy to see that $\Delta^n f(0) = (-2)^n$, and therefore the series diverges, even though the function is analytic. I used an OpenOffice spreadsheet to carry out the computations for $f(x)=\cos x$ with various step sizes.

With $h=\pi/2$ we still have exponential divergence:

But at $h=\pi/3$ the pattern becomes periodic:

With $h=1$ the trend is not immediately clear:

But for slightly smaller values, such as $0.9$, the exponential decay becomes evident:

What is happening here? As usual, it helps to switch from trigonometric to exponential form and consider $f(x)=\cos x + i\sin x=e^{ix}$. Another thing that helps is the identity $\displaystyle \Delta^n f(0) = \sum_{k=0}^n (-1)^{n-k}\binom{n}{k} f(kh)$, which is a straightforward induction exercise. Putting the two together, we find $\displaystyle \Delta^n f(0) = (e^{ih}-1)^n$. We could now take the real part to return to the cosine, but it is already clear that $\Delta^n f(0)$ decays exponentially if and only if $|e^{ih}-1|<1$. The latter holds exactly when $|h|<\pi/3$; in particular the step size $h=1$ should work. Here is a Maple plot comparing the degree 10 partial sum (in red) and the cosine itself (in green), using $h=1$. I expanded the plot range beyond $[0,10]$; otherwise the graphs would be indistinguishable.
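Here is a short numeric confirmation of my own (not the Maple code used for the plot): the identity ${\Delta^n f(0) = (e^{ih}-1)^n}$ for the cosine (taking real parts), and the convergence of the partial sums when $h=1$.

```python
import numpy as np

h = 1.0

# forward differences of cos at 0 agree with Re (e^{ih} - 1)^n
vals = np.cos(h * np.arange(13))
for n in range(13):
    delta = np.diff(vals, n)[0] if n > 0 else vals[0]
    assert abs(delta - ((np.exp(1j*h) - 1)**n).real) < 1e-9

def gbinom(a, n):
    # generalized binomial coefficient binom(a, n) for real a
    out = 1.0
    for k in range(n):
        out *= (a - k) / (k + 1)
    return out

def gn_partial_sum(x, N):
    # Nth partial sum of the Gregory-Newton series for cos with step h
    return sum(((np.exp(1j*h) - 1)**n).real * gbinom(x/h, n)
               for n in range(N + 1))

print(abs(gn_partial_sum(3.7, 300) - np.cos(3.7)))   # small: the series converges
```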

In contrast, the Taylor polynomial of 10th degree visibly diverges from the function starting at $x=\pm 4$. The interval in which it represents the cosine accurately is shorter than for the interpolating polynomial.

With the real exponential function the situation is somewhat different. If $f(x)=e^x$ then $\displaystyle \Delta^n f(0) = (e^{h}-1)^n$ and convergence holds when $h<\ln 2\approx 0.69$, which is worse than for trigonometric functions. On the other hand, $f(x)=e^{-x}$ has $\displaystyle \Delta^n f(0) = (e^{-h}-1)^n$ which decays exponentially no matter how large $h>0$ is.

Of course, the convergence of a series does not mean that its sum is equal to $f$. We already saw this with $f(x)=\sin x$ and $h=\pi$: the Gregory-Newton series degenerates to a constant, which of course does not represent the function. The same happens to either cosine or sine when $h=2\pi$.

On the other hand, divergent series are not necessarily useless. Here is the 10th partial sum of the series for cosine with $h=\pi/2$. The series diverges everywhere except at $0$, but the partial sum still does a decent job. The key is to use the right number of terms: not too few, but not too many either.

I find the step size $h=\pi$ particularly interesting, because it addresses a quasi-philosophical question: is the trigonometric wave the “canonical” function alternating between $1$ and $-1$ at equal intervals? Although the Gregory-Newton series rapidly diverges, its $N$th partial sum resembles the cosine near the midpoint of the interval, $N\pi/2$. However, the quality of approximation is poor. I used $N=20$ below, but even with $N=50$ the sum visibly deviates from the cosine in every period.