Discrete Cosine Transforms: Trapezoidal vs Midpoint

The existence of multiple versions of the Discrete Cosine Transform (DCT) can be confusing. Wikipedia explains that its 8 types are determined by how one reflects the data across the boundaries: e.g., one can reflect 1, 2, 3 across the left boundary as 3, 2, 1, 2, 3 or as 3, 2, 1, 1, 2, 3, and there are similar choices at the other boundary (where, in addition, the reflection can be even or odd). Makes sense enough.

But there is another aspect to the two most used forms, DCT-I and DCT-II (types I and II): they can be expressed in terms of the Trapezoidal and Midpoint rules for integration. Here is how.

The cosines {\cos kx}, {k=0,1,\dots} are orthogonal on the interval {[0,\pi]} with respect to the Lebesgue measure. The basis of discrete transforms is that these cosines are also orthogonal with respect to some discrete measures, up to a certain frequency. Indeed, if the cosines are orthogonal with respect to a measure {\mu} whose support consists of {n} points, then we can efficiently use them to represent any function defined on the support of {\mu}, and such functions are naturally identified with sequences of length {n}.

How to find such measures? It helps that some simple rules of numerical integration are exact for trigonometric polynomials up to some degree.

For example, the trapezoidal rule with n sample points exactly integrates the functions {\cos kx} for {k=0,\dots, 2n-3}. This can be checked by converting to exponential form and summing geometric progressions, but here is a visual explanation with n=4, where the interval {[0,\pi]} is represented as the upper semicircle. The radius of each red circle indicates the weight placed at that point; the endpoints get 1/2 of the weight of the other sample points. To integrate {\cos x} correctly, we must have the x-coordinate of the center of mass equal to zero, which is obviously the case.

Trapezoidal rule with 4 points

Replacing {\cos x} by {\cos kx} means multiplying the polar angle of each sample point by {k}. This is what we get:

Argument of cosine multiplied by k=2 or k=4
Argument multiplied by k=3
Argument multiplied by k=5

In all cases the x-coordinate of the center of mass is zero. With k=6 this breaks down, as all the weight gets placed in one point. And this is how it goes in general with integration of sines and cosines: equally spaced points work perfectly until they don’t work at all, which happens when the step size is equal to the period of the function. When {k=2n-2}, the period {2\pi/k} is equal to {\pi/(n-1)}, the spacing of points in the trapezoidal rule.
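This is easy to check numerically; here is a quick NumPy sketch of the exactness claim for n = 4:

import numpy as np
n = 4
x = np.arange(n)*np.pi/(n-1)            # sample points 0, pi/3, 2pi/3, pi
w = np.full(n, np.pi/(n-1))             # trapezoidal weights, halved at the endpoints
w[0] /= 2; w[-1] /= 2
for k in range(2*n - 1):
    exact = np.pi if k == 0 else 0.0    # integral of cos(kx) over [0, pi]
    print(k, np.isclose(np.sum(w*np.cos(k*x)), exact))
# True for k = 0, ..., 2n-3 = 5; False at k = 2n-2 = 6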

The orthogonality of cosines has to do with the product formula {\cos kx \cos jx = \frac12\cos (k-j)x + \frac12\cos (k+j)x}. Let {\tau_n} be the measure expressing the trapezoidal rule on {[0,\pi]} with {n} sample points; it is the sum of point masses at {0, \pi/(n-1), \dots , \pi} (with half weights at the two endpoints, as above). Then {\{\cos k x \colon k=0,\dots, n-1\}} are orthogonal with respect to {\tau_n}: any product {\cos k x\cos j x } with {k >j} taken from this range satisfies {1\le k-j} and {k+j \le 2n-3}, so the rule integrates it exactly, and the exact integral is zero. Consequently, we can compute the coefficients of any function {f} in the cosine basis as

{\displaystyle c_k = \int f(x)\cos kx\,d\tau_n(x) \bigg/ \int \cos^2 kx\,d\tau_n(x)}

The above is what DCT-I (discrete cosine transform of type 1) does, up to normalization.

The DCT-II transform uses the Midpoint rule instead of the Trapezoidal rule. Let {\mu_n} be the measure expressing the Midpoint rule on {[0,\pi]} with {n} sample points; it gives equal mass to the points {(2j-1)\pi/(2n)} for {j=1,\dots, n}. These are spaced at {\pi/n}, and therefore the midpoint rule is exact for {\cos kx} with {k=0,\dots, 2n-1}, which is better than what the trapezoidal rule achieves. Perhaps more significantly, by identifying the given data points with function values at the midpoints of subintervals we stay away from the endpoints {0,\pi}, where the cosines are constrained to have zero slope.
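The orthogonality of {\cos kx}, {k=0,\dots,n-1} with respect to {\mu_n} can also be checked numerically; a quick NumPy sketch:

import numpy as np
n = 9
x = (2*np.arange(1, n+1) - 1)*np.pi/(2*n)      # midpoint nodes (2j-1)pi/(2n)
C = np.cos(np.outer(np.arange(n), x))          # row k holds cos(kx) sampled at the nodes
G = (np.pi/n)*(C @ C.T)                        # Gram matrix with respect to mu_n
print(np.allclose(G, np.diag(np.diag(G))))     # True: the cosines are orthogonal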

Let’s compare DCT-I and DCT-II on the same data set, {y=(0, \sqrt{1}, \sqrt{2}, \dots, \sqrt{8})}. There are 9 numbers here. Following DCT-I we place them at the sample points of the trapezoidal rule, and expand into cosines using the inner product with respect to {\tau_n}. Here is the plot of the resulting trigonometric polynomial: of course it interpolates the data.

DCT-I interpolation

But DCT-II does it better, despite having exactly the same cosine functions. The only change is that we use {\mu_n} and so place the {y}-values along its support.

DCT-II interpolation

Less oscillation means the high-degree coefficients are smaller, and therefore easier to discard in order to compress information. For example, drop the last two coefficients in each expansion, keeping 7 numbers instead of 9. DCT-II clearly wins in accuracy then.

Truncated DCT-I
Truncated DCT-II

Okay, so the Midpoint rule is better, no surprise. After all, it’s in general about twice as accurate as the Trapezoidal rule. What about Simpson’s rule, would it lead to some super-efficient form of DCT? That is, why don’t we let {\sigma_n} be the discrete measure that expresses Simpson’s rule and use the inner product {\int fg\,d\sigma_n} for cosine expansion? Alas, Simpson’s rule on {n} points is exact for {\cos kx} only up to {k=n-2}: it fails already at {k=n-1}, which is substantially worse than either the Trapezoidal or the Midpoint rule. As a result, we don’t get enough orthogonal cosines with respect to {\sigma_n} to form an orthogonal basis. Simpson’s rule has an advantage when dealing with algebraic polynomials, not with trigonometric ones.
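A quick numerical check of where Simpson's rule stops being exact (a NumPy sketch):

import numpy as np
n = 9                                   # number of sample points; must be odd for Simpson's rule
h = np.pi/(n-1)
x = np.arange(n)*h
w = np.full(n, 2.0); w[1::2] = 4.0; w[0] = w[-1] = 1.0
w *= h/3                                # composite Simpson weights on [0, pi]
for k in range(n):
    exact = np.pi if k == 0 else 0.0
    print(k, np.isclose(np.sum(w*np.cos(k*x)), exact))
# True for k = 0, ..., n-2 = 7; False at k = n-1 = 8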

Finally, the Python code used for the graphics. I did not use SciPy’s DCT routine (which is of course more efficient), in order to keep the relation to numerical integration explicit in the code. The NumPy function trapz implements the trapezoidal rule, and the midpoint rule is just the summation of the sampled values. In both cases there is no need to worry about the factor dx, since it cancels out when we divide one numerical integral by the other.

import numpy as np
import matplotlib.pyplot as plt
#   Setup 
y = np.sqrt(np.arange(9))
c = np.zeros_like(y)
n = y.size
#   DCT-I, trapezoidal
x = np.arange(n)*np.pi/(n-1)
for k in range(n):
  c[k] = np.trapz(y*np.cos(k*x))/np.trapz(np.cos(k*x)**2)
t = np.linspace(0, np.pi, 500)
yy = np.sum(c*np.cos(np.arange(n)*t.reshape(-1, 1)), axis=1)
plt.plot(x, y, 'ro')
plt.plot(t, yy)
#   DCT-II, midpoint (plotted in a separate figure)
plt.figure()
x = np.arange(n)*np.pi/n + np.pi/(2*n)
for k in range(n):
  c[k] = np.sum(y*np.cos(k*x))/np.sum(np.cos(k*x)**2)
t = np.linspace(0, np.pi, 500)
yy = np.sum(c*np.cos(np.arange(n)*t.reshape(-1, 1)), axis=1)
plt.plot(x, y, 'ro')
plt.plot(t, yy)
plt.show()

Connecting dots naturally

How to draw a nice curve through given points {(x_i,y_i)}?

Some data

One way is to connect them with straight lines, creating a piecewise linear function:

Piecewise linear interpolant

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

\displaystyle    \int_0^{10} \sqrt{1+(f'(x))^2}\,dx

among all functions with {f(x_i)=y_i}. As is often the case, the length functional can be replaced with the elastic energy

\displaystyle    \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx

because the piecewise linear {f} (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could instead fit a polynomial to these points; a polynomial is certainly smooth. With 11 points we need a polynomial of degree 10. The result is disappointing:

Interpolating polynomial

It is not natural for a curve connecting the points with {1\le y\le 9} to shoot up over {y=40}. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

\displaystyle    \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx

and the function that minimizes it subject to conditions {f(x_i)=y_i} looks very nice indeed:

Natural cubic spline

The Euler-Lagrange equation for the functional {\mathcal{B}} dictates that the fourth derivative of {f} is zero in the intervals between the knots {x_i}. Thus, {f} is a piecewise cubic polynomial. Also, both {f} and {f'} must be continuous for any function with integrable second derivative. More delicate analysis is required for {f''}, but it too can be shown to be continuous for the minimizing function {f}; moreover, {f''} must vanish at the endpoints {0} and {10}. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term {C} to the piecewise linear interpolant {L}. Here the spline is shown together with {L} (green) and {C} (magenta).

PL interpolant, correction term, and their sum: the cubic spline

On each interval {[x_i,x_{i+1}]} the correction term {C} is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

\displaystyle    C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2

on this interval. There are 20 coefficients {\alpha_i}, {\beta_i} to find. At each of the 9 interior knots 1, 2, …, 9 we have two conditions: the two one-sided limits of {C''} must agree (so that {f''} is continuous), and {C'} must jump by the amount opposite to the jump of {L'}. Since {C''} also vanishes at the endpoints {0, 10}, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of {z_i=-C''(x_i)}. Indeed, the values of {C''} at two consecutive knots determine the correction term within: {\alpha_i= \dfrac{z_{i+1}+2z_i}{6h} } and {\beta_i = \dfrac{2z_{i+1}+z_i}{6h}}, where {h} is the spacing of the knots (here {h=1}). This leaves {n-1} equations (from the jumps of {C'}) for {n-1} unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

\displaystyle    \frac{1}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\    1 & 4 & 1 & 0 & \ldots & 0 & 0 \\    0 & 1 & 4 & 1 & \ldots & 0 & 0 \\    \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\    0 & 0 & 0 & 0 & \ldots & 1 & 4    \end{pmatrix}
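In NumPy terms, setting up and solving this system looks as follows (a sketch using the same data and the same z as above; the Scilab version I actually used follows below):

import numpy as np
y = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], dtype=float)
h = 1.0
jumps = np.diff(y, 2)/h                                       # jumps of L' at the interior knots
A = (h/6)*(4*np.eye(9) + np.eye(9, k=1) + np.eye(9, k=-1))    # the tridiagonal matrix
z = np.concatenate(([0.0], np.linalg.solve(A, -jumps), [0.0]))
print(z)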

One can solve the system for {z_i} within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))
z = [0,-jumps/A,0] 
allx = []; spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h
   cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
   allx = [allx, x]   
   spl = [spl, linear + cor]
end
plot(allx, spl) 
plot(a:h:b, y, 'r*')
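As a cross-check, SciPy's built-in spline constructor with natural boundary conditions produces the same curve (the natural cubic spline is unique); a minimal Python sketch, assuming SciPy and Matplotlib are available:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
x = np.arange(11)                              # knots 0, 1, ..., 10
y = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
spline = CubicSpline(x, y, bc_type='natural')  # natural boundary: f'' = 0 at the endpoints
t = np.linspace(0, 10, 500)
plt.plot(t, spline(t))
plt.plot(x, y, 'r*')
plt.show()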

Convexity of polar curves

Everybody knows the second derivative test for the convexity of Cartesian curves {y=y(x)}. What is the convexity test for polar curves {r=r(\theta)}? Google search brought up Robert Israel’s answer on Math.SE: the relevant inequality is

\displaystyle  \mathcal{C}[r]:=r^2+2(r')^2-r\,r''\ge 0 \ \ \ \ \ (1)

But when using it, one should be aware of the singularity at the origin. For example, {r=1+\cos \theta} satisfies {\mathcal{C}[r] = 3(1+\cos \theta)\ge 0} but the curve is not convex: it’s the cardioid.

FooPlot graphics today: fooplot.com

The formula (1) was derived for {r> 0}; the points with {r=0} must be investigated directly. Actually, it is easy to see that when {r } has a strict local minimum with value {0}, the polar curve has an inward cusp and therefore is not convex.

As usual, theoretical material is followed by an exercise.

Exercise: find all real numbers {p} such that the polar curve {r=(1+\cos 2\theta)^p} is convex.

All values {p>0} are ruled out by the cusp formed at {\theta=\pi/2}. For {p=0} we get a circle, obviously convex. When {p<0}, some calculations are in order:

\displaystyle \mathcal{C}[r] = (1+4p+4p^2+(1-4p^2)\cos 2\theta)(1+\cos 2\theta)^{2p-1}

For this to be nonnegative for all {\theta}, we need {1+4p+4p^2\ge |1-4p^2|}, which is equivalent to two inequalities: {p(4+8p) \ge 0} and {2+4p\ge 0}. Since {p<0}, the first inequality forces {p\le -1/2}, while the second requires {p\ge -1/2}.

Answer: {p=0} and {p=-1/2}.
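The factorization of {\mathcal{C}[r]} above can be spot-checked symbolically; a short SymPy sketch:

import sympy as sp
theta, p = sp.symbols('theta p', real=True)
r = (1 + sp.cos(2*theta))**p
C = r**2 + 2*sp.diff(r, theta)**2 - r*sp.diff(r, theta, 2)         # the expression (1)
claimed = (1 + 4*p + 4*p**2 + (1 - 4*p**2)*sp.cos(2*theta))*(1 + sp.cos(2*theta))**(2*p - 1)
for pv in (-1.5, -0.5, 0.3):                   # a few sample powers and angles
    for tv in (0.4, 1.0, 2.2):
        print((C - claimed).subs({p: pv, theta: tv}).evalf())      # approximately zero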

This exercise is relevant to the problem from the previous post, finding the “right” interpolation method in polar coordinates. Given a set of {(r,\theta)} values, I interpolated {(r^{1/p},\theta)} with a trigonometric polynomial, and then raised that polynomial to power {p}.

If the given points had Cartesian coordinates {(\pm a,0), (0,\pm b)}, then this interpolation yields {r=(\alpha+\beta \cos 2\theta)^p} where {\alpha,\beta} depend on {a,b} and satisfy {\alpha>|\beta|}. Using the exercise above, one can deduce that {p=-1/2} is the only nonzero power for which the interpolated curve is convex for every choice of points of the form {(\pm a,0), (0,\pm b)}.

In general, curves of the form {r=P(\theta)^{-1/2}}, with {P} a trigonometric polynomial, need not be convex. But even then they look more natural than their counterparts with powers {p\ne -1/2}. Perhaps this is not surprising: the equation {r^2P(\theta)=1} has a decent chance of defining an algebraic curve when the degree of {P} is low.

Random example at the end: {r=(3-\sin \theta +2\cos \theta+\cos 2\theta-2\sin 2\theta)^{-1/2}}

Celebration of power -1/2

Peanut allergy and Polar interpolation

Draw a closed curve passing through the points {(\pm 3,0)} and {(0,\pm 1)}: more specifically, from (3,0) to (0,1) to (-3,0) to (0,-1) and back to (3,0).

How hard could it be?

I’ll wait.

… Done?

Okay, you may read further.

This is an interpolation problem. The geometry of the problem does not allow for curves of the form {y=f(x)}. But polar graphs {\rho=f(\theta)} should work. (Aside: in {(x,y)} the variable usually taken to be independent is listed first; in {(\rho,\theta)} it is listed second. Is consistent notation too much to ask for? Yes, I know the answer…) Since {f} must be {2\pi}-periodic, it is natural to interpolate {(3,0),(1,\pi/2),(3,\pi),(1,3\pi/2)} by a trigonometric polynomial. The polynomial {f(\theta)=2+\cos 2\theta} does the job. But does it do it well?

I did not order peanuts

This is not what I would draw. My picture would be more like the ellipse with the given points as vertices:


The unnecessary concavity of the peanut graph comes from the second derivative {f''} being too large at the points of minimum (and too small at the points of maximum). We need a function with flat minima and sharp maxima. Here is the comparison between {f(\theta)=2+\cos 2\theta} (red) and the function that describes the ellipse in polar coordinates (blue):

We want blue, not red.

I thought of pre-processing: apply some monotone function {\varphi\colon (0,\infty)\rightarrow \mathbb R} to the {\rho}-values, interpolate, and then compose the interpolant with {\varphi^{-1}}. The function {\varphi} should have a large derivative near {0}, so that its inverse has a small derivative there, flattening the minima.

The first two candidates {\varphi(r)=\log r} and {\varphi(r)=1/r} did not improve the picture enough. But when I tried {\varphi(r)=1/r^2}, the effect surprised me. Interpolation between {(1/9,0),(1,\pi/2),(1/9,\pi),(1,3\pi/2)} yields {(5-4\cos 2\theta)/9}, which after application of {\varphi^{-1}} produces {\rho = 3/\sqrt{5-4\cos 2\theta}}: exactly the ellipse pictured above.
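Here is the same computation as a NumPy sketch (the Scilab functions actually used for the figures are given at the end of the post):

import numpy as np
import matplotlib.pyplot as plt
R = np.array([3.0, 1.0, 3.0, 1.0])              # rho at theta = 0, pi/2, pi, 3pi/2
nodes = np.arange(4)*np.pi/2
p = np.fft.ifft(R**(-2))                        # trigonometric interpolation of 1/rho^2
t = np.linspace(0, 2*np.pi, 400)
f = sum(np.real(p[k]*np.exp(-1j*k*t)) for k in range(len(p)))
rho = f**(-0.5)                                 # undo the preprocessing
plt.plot(rho*np.cos(t), rho*np.sin(t))          # the ellipse 3/sqrt(5 - 4 cos 2 theta)
plt.plot(R*np.cos(nodes), R*np.sin(nodes), 'o')
plt.axis('equal')
plt.show()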

Next, I tried a less symmetric example: curve through {(5,0);(0,2);(-3,0);(0,-2)}.

Egg without preprocessing
Preprocessed egg

Again, the second approach produces a more natural looking curve.

Finally, a curve with five-fold symmetry, with {\rho} ranging from {2} to {5}.

Star without preprocessing
Star with preprocessing

The interpolation and plotting were done in Scilab, by invoking the functions given below with commands such as polar1([3,1,3,1]) or polar2([2,5,2,5,2,5,2,5,2,5]). Because the interpolation nodes are equally spaced, the coefficients of the interpolating polynomials are nothing but the (inverse) discrete Fourier transform of the given values. The first function interpolates the radius {\rho} itself.

function polar1(R)
 p = fft(R,1)                         // inverse DFT of the given radii
 t = 0:.01:2*%pi
 rho = zeros(t)
 for i = 1:length(p)
     rho = rho + real(p(i)*exp(-%i*(i-1)*t))
 end
 plot(rho.*cos(t), rho.*sin(t))       // the interpolated curve
 t = resize_matrix(linspace(0,2*%pi,length(R)+1),1,length(R))
 plot(R.*cos(t), R.*sin(t), 'o')      // the given points
endfunction

The second function includes preprocessing: it interpolates {1/\rho^2} and then raises the interpolant to power {-1/2}.

function polar2(R)
 p = fft(R.^(-2),1)                   // inverse DFT of 1/rho^2
 t = 0:.01:2*%pi
 rho = zeros(t)
 for i = 1:length(p)
     rho = rho + real(p(i)*exp(-%i*(i-1)*t))
 end
 rho = max(rho,0.01*ones(rho))        // guard against nonpositive values
 rho = rho.^(-1/2)                    // undo the preprocessing
 plot(rho.*cos(t), rho.*sin(t))       // the interpolated curve
 t = resize_matrix(linspace(0,2*%pi,length(R)+1),1,length(R))
 plot(R.*cos(t), R.*sin(t), 'o')      // the given points
endfunction

Added: one more comparison, {\rho^{1}} vs {\rho^{-2}} vs {\rho^{-3}}; the last one loses convexity again.

Raising to -3 is going too far.

And then to {\rho^{-7}}:

Power -7

Gregory-Newton series

[Interpolation] rarely appears in calculus books today, and then only as numerical method. Yet three of the most important founders of calculus, Newton, Gregory, and Leibniz, began their work with interpolation… Of course, interpolation is a numerical method in practice, when one uses only a few terms of the Gregory-Newton series, but the full series is exact and hence of much greater interest.

— John Stillwell, Mathematics and its History.

For a function f\colon\mathbb R\to\mathbb R define \Delta^0 f=f, and then inductively \Delta^{n+1}f(a)=\Delta^n f(a+h)-\Delta^n f(a), where h is a fixed step size. Also, let \displaystyle \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\dots (\alpha-n+1)}{n!} where \alpha\in \mathbb R is not necessarily an integer. The Gregory-Newton series for f (centered at zero) is

\displaystyle f(x) \sim \sum_{n=0}^\infty \Delta^n f(0) \binom{x/h}{n}

The idea is that the Nth partial sum is the (unique) polynomial of degree at most N that agrees with f at the points 0,h,2h,\dots,Nh, and we hope that the limit N\to\infty recovers f exactly. But does it really? After all, the formula uses only the values of f at integer multiples of h, and there are infinitely many functions (including analytic ones) that have the same values as f at these points. For example, with f(x)=\sin x and h=\pi we get \Delta^n f(0)=0 for all n, hence the sum is zero.

Convergence is also an issue. Since \binom{x/h}{n}^{1/n}\to 1 for most values of x, we need \Delta^n f(0) to decay exponentially for the series to be of much use. This is not the case when f(x)=\cos x and h=\pi: it is easy to see that \Delta^n f(0) = (-2)^n, and therefore the series diverges. Even though the function is analytic… I used OpenOffice spreadsheet to carry out the computations for f(x)=\cos x with various step sizes.
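The same computation takes only a few lines of NumPy (a sketch, as an alternative to the spreadsheet):

import numpy as np
def delta_n(h, N=20):
    # forward differences Delta^n cos(0), n = 0, ..., N, with step size h
    values = np.cos(h*np.arange(N + 1))
    out = []
    for n in range(N + 1):
        out.append(values[0])
        values = np.diff(values)
    return np.array(out)
for h in (np.pi/2, np.pi/3, 1.0, 0.9):
    print(h, np.abs(delta_n(h))[-1])            # size of Delta^20 f(0): growth, oscillation, or decay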

With h=\pi/2 we still have exponential divergence:

h = pi / 2

But at h=\pi/3 the pattern becomes periodic:

h = pi / 3

With h=1 the trend is not immediately clear:

h = 1

But for slightly smaller values, such as 0.9, the exponential decay becomes evident:

h = 0.9

What is happening here? As usual, it helps to switch from trigonometric to exponential form and consider f(x)=\cos x + i\sin x=e^{ix}. Another thing that helps is the identity \displaystyle \Delta^n f(0) = \sum_{k=0}^n (-1)^{n-k}\binom{n}{k} f(kh), which is a straightforward induction exercise. Putting the two together, we find \displaystyle \Delta^n f(0) = (e^{ih}-1)^n. We could now take the real part to return to the cosine, but it is already clear that \Delta^n f(0) decays exponentially if and only if |e^{ih}-1|<1. The latter holds exactly when |h|<\pi/3; in particular the step size h=1 should work. Here is a Maple plot comparing the degree 10 partial sum (in red) and the cosine itself (in green), using h=1. I expanded the plot range beyond [0,10], otherwise the graphs would be indistinguishable.

Degree 10 interpolation
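The same comparison can be reproduced in Python (a sketch; the figure above was made in Maple):

import numpy as np
import matplotlib.pyplot as plt
def gbinom(a, n):
    # generalized binomial coefficient binom(a, n) for real a and integer n >= 0
    out = np.ones_like(a)
    for k in range(n):
        out = out*(a - k)/(k + 1)
    return out
h, N = 1.0, 10
vals = np.cos(h*np.arange(N + 1))
d = [vals[0]]
for _ in range(N):                              # forward differences Delta^n f(0)
    vals = np.diff(vals)
    d.append(vals[0])
x = np.linspace(-1.5, 11.5, 600)
partial = sum(d[n]*gbinom(x/h, n) for n in range(N + 1))
plt.plot(x, partial, 'r', x, np.cos(x), 'g')    # partial sum (red) vs cosine (green)
plt.show()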

In contrast, the Taylor polynomial of 10th degree visibly diverges from the function starting at x=\pm 4. The interval in which it represents the cosine accurately is shorter than for the interpolating polynomial.

Taylor polynomial of 10th degree

With the real exponential function the situation is somewhat different. If f(x)=e^x then \displaystyle \Delta^n f(0) = (e^{h}-1)^n and convergence holds when h<\ln 2\approx 0.69, which is worse than for trigonometric functions. On the other hand, f(x)=e^{-x} has \displaystyle \Delta^n f(0) = (e^{-h}-1)^n which decays exponentially no matter how large h>0 is.

Of course, the convergence of a series does not mean that its sum is equal to f. We already saw this with the sine and h=\pi; likewise, with h=2\pi for either cosine or sine the Gregory-Newton series degenerates to a constant, which of course does not represent the function.

On the other hand, divergent series are not necessarily useless. Here is the 10th partial sum of the series for cosine with h=\pi/2. The series diverges everywhere except at the sample points 0, h, 2h, \dots, where it has only finitely many nonzero terms, but the partial sum still does a decent job. The key is to use the right number of terms: not too few, but not too many either.

Degree 10, step size pi/2

I find the step size h=\pi particularly interesting, because it addresses a quasi-philosophical question: is the trigonometric wave the “canonical” function alternating between 1 and -1 at equal intervals? Although the Gregory-Newton series rapidly diverges, its Nth partial sum resembles the cosine near the midpoint of the interpolation interval, N\pi/2. However, the quality of approximation is poor. I used N=20 below, but even with N=50 the sum visibly deviates from the cosine in every period.

Partial sum of degree 20, using step size pi.