Discrete Cosine Transforms: Trapezoidal vs Midpoint

The existence of multiple versions of the Discrete Cosine Transform (DCT) can be confusing. Wikipedia explains that its 8 types are determined by how one reflects across the boundaries. E.g., one can reflect 1, 2, 3 across the left boundary as 3, 2, 1, 2, 3 or as 3, 2, 1, 1, 2, 3, and there are such choices for the other boundary too (also, each reflection can be odd or even). Makes sense enough.

But there is another aspect to the two most used forms, DCT-I and DCT-II (types I and II): they can be expressed in terms of the Trapezoidal and Midpoint rules for integration. Here is how.

The cosines {\cos kx}, {k=0,1,\dots}, are orthogonal on the interval {[0,\pi]} with respect to the Lebesgue measure. The basis of discrete transforms is that these cosines are also orthogonal with respect to some discrete measures, until a certain frequency is reached. Indeed, if the cosines are orthogonal with respect to a measure {\mu} whose support consists of {n} points, then we can efficiently use them to represent any function defined on the support of {\mu}, and such functions are naturally identified with sequences of length {n}.

How to find such measures? It helps that some simple rules of numerical integration are exact for trigonometric polynomials up to some degree.

For example, the trapezoidal rule with n sample points exactly integrates the functions {\cos kx} for {k=0,\dots, 2n-3}. This could be checked by converting to exponential form and summing geometric progressions, but here is a visual explanation with n=4, where the interval {[0,\pi]} is represented as the upper semicircle. The radius of each red circle indicates the weight placed at that point; the endpoints get 1/2 of the weight of the other sample points. To integrate {\cos x} correctly, we must have the x-coordinate of the center of mass equal to zero, which is obviously the case.

Trapezoidal rule with 4 points

Replacing {\cos x} by {\cos kx} means multiplying the polar angle of each sample point by {k}. This is what we get:

Argument of cosine multiplied by k=2 or k=4
Argument multiplied by k=3
Argument multiplied by k=5

In all cases the x-coordinate of the center of mass is zero. With k=6 this breaks down, as all the weight gets placed in one point. And this is how it goes in general with integration of sines and cosines: equally spaced points work perfectly until they don’t work at all, which happens when the step size is equal to the period of the function. When {k=2n-2}, the period {2\pi/k} is equal to {\pi/(n-1)}, the spacing of points in the trapezoidal rule.
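This is also easy to confirm numerically; here is a minimal sketch with the trapezoidal weights written out explicitly, for n=4:

import numpy as np

n = 4
x = np.arange(n)*np.pi/(n-1)        # equally spaced sample points on [0, pi]
w = np.full(n, np.pi/(n-1))         # trapezoidal weights
w[0] /= 2; w[-1] /= 2               # half weight at the endpoints
for k in range(7):
  exact = np.pi if k == 0 else 0.0  # integral of cos(kx) over [0, pi]
  print(k, np.sum(w*np.cos(k*x)), exact)  # exact for k <= 5, fails at k = 6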

The orthogonality of cosines has to do with the formula {\cos kx \cos jx = \frac12\cos((k-j)x) + \frac12\cos((k+j)x)}. Let {\tau_n} be the measure expressing the trapezoidal rule on {[0,\pi]} with {n} sample points; so it is the sum of point masses at {0, \pi/(n-1), \dots, \pi}, with half-weights at the two endpoints. Then {\{\cos kx \colon k=0,\dots, n-1\}} are orthogonal with respect to {\tau_n} because any product {\cos kx\cos jx} with {k >j} taken from this range will have {k-j, k+j \le 2n-3}. Consequently, we can compute the coefficients of any function {f} in the cosine basis as

{\displaystyle c_k = \int f(x)\cos kx\,d\tau_n(x) \bigg/ \int \cos^2 kx\,d\tau_n(x)}

The above is what DCT-I (discrete cosine transform of type 1) does, up to normalization.

The DCT-II transform uses the Midpoint rule instead of the Trapezoidal rule. Let {\mu_n} be the measure expressing the Midpoint rule on {[0,\pi]} with {n} sample points; it gives equal mass to the points {(2j-1)\pi/(2n)} for {j=1,\dots, n}. These are spaced at {\pi/n}, and therefore the midpoint rule is exact for {\cos kx} with {k=0,\dots, 2n-1}, which is better than what the trapezoidal rule achieves. Perhaps more significantly, by identifying the given data points with function values at the midpoints of subintervals we stay away from the endpoints {0,\pi}, where the cosines are somewhat restricted by having to have zero slope.
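The same kind of numerical check works here; with n=4, the equal weights {\pi/n} reproduce the integrals of {\cos kx} for {k\le 2n-1=7} and fail at {k=2n=8}:

import numpy as np

n = 4
x = (2*np.arange(1, n+1) - 1)*np.pi/(2*n)  # midpoints of the n subintervals
for k in range(9):
  exact = np.pi if k == 0 else 0.0
  print(k, (np.pi/n)*np.sum(np.cos(k*x)), exact)  # exact for k <= 7, fails at k = 8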

Let’s compare DCT-I and DCT-II on the same data set, {y=(0, \sqrt{1}, \sqrt{2}, \dots, \sqrt{8})}. There are 9 numbers here. Following DCT-I we place them at the sample points of the trapezoidal rule, and expand into cosines using the inner product with respect to {\tau_n}. Here is the plot of the resulting trigonometric polynomial: of course it interpolates the data.

DCT-I interpolation

But DCT-II does it better, despite having exactly the same cosine functions. The only change is that we use {\mu_n} and so place the {y}-values along its support.

DCT-II interpolation

Less oscillation means the high-degree coefficients are smaller, and therefore easier to discard in order to compress information. For example, drop the last two coefficients in each expansion, keeping 7 numbers instead of 9. DCT-II clearly wins in accuracy then.

Truncated DCT-I
Truncated DCT-II
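This is easy to check numerically. Here is a minimal sketch that zeroes the two highest coefficients and measures the maximum deviation at the sample points (just one possible choice of error measure):

import numpy as np

y = np.sqrt(np.arange(9))
n = y.size
t_x = np.arange(n)*np.pi/(n-1)                 # DCT-I sample points
t_w = np.ones(n); t_w[0] = t_w[-1] = 0.5       # trapezoidal weights (constant factor cancels)
m_x = (2*np.arange(1, n+1) - 1)*np.pi/(2*n)    # DCT-II sample points
m_w = np.ones(n)
for name, x, w in [('DCT-I', t_x, t_w), ('DCT-II', m_x, m_w)]:
  c = np.array([np.sum(w*y*np.cos(k*x))/np.sum(w*np.cos(k*x)**2) for k in range(n)])
  c[-2:] = 0                                   # discard the two highest coefficients
  yy = np.cos(np.outer(x, np.arange(n))) @ c   # evaluate the truncated expansion
  print(name, np.max(np.abs(yy - y)))          # DCT-II should come out smaller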

Okay, so the Midpoint rule is better, no surprise. After all, it’s in general about twice as accurate as the Trapezoidal rule. What about Simpson’s rule, would it lead to some super-efficient form of DCT? That is, why don’t we let {\sigma_n} be the discrete measure that expresses Simpson’s rule and use the inner product {\int fg\,d\sigma_n} for cosine expansion? Alas, Simpson’s rule on {n} points is exact only for {\cos kx} with {k=0,\dots, n-2}, which is substantially worse than either Trapezoidal or Midpoint rules. As a result, we don’t get enough orthogonal cosines with respect to {\sigma_n} to have an orthogonal basis. Simpson’s rule has an advantage when dealing with algebraic polynomials, not with trigonometric ones.
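The failure is easy to observe numerically; here is a quick sketch using explicit Simpson weights on n=5 points:

import numpy as np

n = 5                                    # Simpson's rule needs an odd number of points
x = np.arange(n)*np.pi/(n-1)
w = np.ones(n); w[1:-1:2] = 4; w[2:-1:2] = 2
w *= np.pi/(3*(n-1))                     # weights h/3 * [1, 4, 2, ..., 2, 4, 1]
for k in range(6):
  exact = np.pi if k == 0 else 0.0
  print(k, np.sum(w*np.cos(k*x)), exact)  # exact for k <= 3 = n-2, fails at k = 4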

Finally, the Python code used for the graphics; I did not use SciPy's DCT method (which is of course more efficient) in order to keep the relation to numerical integration explicit in the code. The function np.trapz implements the trapezoidal rule, and the midpoint rule is just the summation of sampled values. In both cases there is no need to worry about the factor dx, since it cancels out when we divide one numerical integral by the other.

import numpy as np
import matplotlib.pyplot as plt
#   Setup
y = np.sqrt(np.arange(9))   # the data to expand
c = np.zeros_like(y)        # cosine coefficients
n = y.size
#   DCT-I, trapezoidal: sample points j*pi/(n-1)
x = np.arange(n)*np.pi/(n-1)
for k in range(n):
  c[k] = np.trapz(y*np.cos(k*x))/np.trapz(np.cos(k*x)**2)
t = np.linspace(0, np.pi, 500)
yy = np.sum(c*np.cos(np.arange(n)*t.reshape(-1, 1)), axis=1)
plt.figure()
plt.plot(x, y, 'ro')
plt.plot(t, yy)
#   DCT-II, midpoint: sample points (2j+1)*pi/(2n)
x = np.arange(n)*np.pi/n + np.pi/(2*n)
for k in range(n):
  c[k] = np.sum(y*np.cos(k*x))/np.sum(np.cos(k*x)**2)
yy = np.sum(c*np.cos(np.arange(n)*t.reshape(-1, 1)), axis=1)
plt.figure()
plt.plot(x, y, 'ro')
plt.plot(t, yy)
plt.show()

From boring to puzzling in 30 iterative steps

The function {f(x)=2\cos x } may be nice and important as a part of trigonometric basis, but there is nothing exciting in its graph:

f(x) = 2 cos x

Let’s look at its iterations {f^n=f\circ f\circ \dots \circ f} where {n } is the number of iterations, not an exponent. Here is the graph of {f^{14}}:

14th iteration

A rectangular pattern is already visible above; further iterations only make it stronger. For example, {f^{30} }:

30 iterations

It may be impossible to see on the graph, but the rectangles are slightly apart from one another (though of course they are connected by the graph of a continuous function). This is easier to see on the histogram of the values {f^{n}(0)} for {n=0,\dots, 10000}, which contains two small gaps in addition to a large one:

Histogram of an orbit of f
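Here is a minimal sketch reproducing the histogram:

import numpy as np
import matplotlib.pyplot as plt

orbit = np.empty(10001)            # f^n(0) for n = 0, ..., 10000
orbit[0] = 0.0
for i in range(1, orbit.size):
  orbit[i] = 2*np.cos(orbit[i-1])
plt.hist(orbit, bins=200)          # the three gaps separate four bands
plt.show()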

What goes on here? The range of {f} on {[-2,2]}, as well as the range of any of its iterates, is of course connected: it is the closed interval {[f^{2}(0),f(0)] = [2 \cos 2, 2]}. But the second iterate {f^2=f\circ f} also has two invariant subintervals, marked here by horizontal lines:

Second iterate

Namely, they are {I_1=[f^{2}(0), f^{4}(0)]} and {I_2=[f^{3}(0),2]}. It is easy to see that {f(I_1)=I_2} and {f(I_2)=I_1}. The gap between {I_1} and {I_2} contains the repelling fixed point of {f}, approximately {x=1.03}. Every orbit except for the fixed point itself is pushed away from this point and is eventually trapped in the cycle between {I_1} and {I_2}.
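Here is a quick check of the fixed point and of the swap {f(I_1)=I_2}, {f(I_2)=I_1}, evaluated at the endpoints (SciPy is assumed to be available for the root finding):

import numpy as np
from scipy.optimize import brentq   # SciPy assumed available

f = lambda x: 2*np.cos(x)
print(brentq(lambda x: f(x) - x, 0, 2))  # repelling fixed point, about 1.0299

fn = [0.0]
for _ in range(4):
  fn.append(f(fn[-1]))              # fn[k] = f^k(0)
I1 = (fn[2], fn[4])                 # [f^2(0), f^4(0)]
I2 = (fn[3], 2.0)                   # [f^3(0), 2]
print(f(I1[0]), f(I1[1]))           # both land in I2; the maximum f(0) = 2 gives its right endpoint
print(f(I2[0]), f(I2[1]))           # f is decreasing here, so these are exactly the endpoints of I1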

But there is more. A closer look at the fourth iterate reveals smaller invariant subintervals of {f^4}. Here is what it does on {I_2}:

Fourth iterate

Here the gap contains a repelling fixed point of {f^2}, approximately {1.8}. The invariant subintervals of {I_2} are {I_{21}=[f^{3}(0), f^{7}(0)]} and {I_{22}=[f^9(0), 2]}. Also, {I_1} contains invariant subintervals {I_{11}=[f^{2}(0), f^{6}(0)]} and {I_{12}=[f^8(0), f^4(0)]}. These are the projections of the rectangles in the graph of {f^{30}} onto the vertical axis.
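Since all these endpoints are iterates of 0, they are easy to compute:

import numpy as np

f = lambda x: 2*np.cos(x)
fn = [0.0]
for _ in range(9):
  fn.append(f(fn[-1]))          # fn[k] = f^k(0)
print('I11', (fn[2], fn[6]))    # [f^2(0), f^6(0)]
print('I12', (fn[8], fn[4]))    # [f^8(0), f^4(0)]
print('I21', (fn[3], fn[7]))    # [f^3(0), f^7(0)]
print('I22', (fn[9], 2.0))      # [f^9(0), 2]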

No more such splitting occurs. The histogram of the values of iterates of {f} indeed consists of four disjoint intervals. Can one get a Cantor-type set in this way, starting from some boring function?

Integrate by parts twice and solve for the integral

The dreaded calculus torture device that works for exactly two integrals, {\int e^{ax}\sin bx\,dx} and {\int e^{ax}\cos bx\,dx}.

Actually, no. A version of it (with one integration by parts) works for {\int x^n\,dx}:

\displaystyle    \int x^n\,dx = x\cdot x^n - \int x\, d(x^n) = x^{n+1} - n \int x^n\,dx

hence (assuming {n\ne -1})

\displaystyle  \int x^n\,dx = \frac{x^{n+1}}{n+1} +C

Yes, this is more of a calculus joke. A more serious example comes from Fourier series.

The functions {\sin nx}, {n=1,2,\dots}, are orthogonal on {[0,\pi]}, in the sense that

\displaystyle    \int_0^\pi \sin nx \sin mx \,dx =0 , \quad m\ne n

This is usually proved using a trigonometric identity that converts the product to a sum. But the double integration by parts gives a nicer proof, because no obscure identities are needed. No boundary terms will appear because the sines vanish at both endpoints:

\displaystyle    \int_0^\pi \sin nx \sin mx \,dx = \frac{n}{m} \int_0^\pi \cos nx \cos mx \,dx = \frac{n^2}{m^2} \int_0^\pi \sin nx \sin mx \,dx

All integrals here must vanish because {n^2/m^2\ne 1}. As a bonus, we get the orthogonality of cosines, {\int_0^\pi \cos nx \cos mx \,dx=0}, with no additional effort.

The double integration by parts is also a more conceptual proof, because it gets to the heart of the matter: eigenvectors of a symmetric matrix (operator) that correspond to different eigenvalues are orthogonal. The trigonometric form is incidental, the eigenfunction property is essential. Let’s try this one more time, for the mixed boundary value problem {f(a)=0}, {f'(b)=0}. Suppose that {f} and {g} satisfy the boundary conditions, {f''=\lambda f}, and {g''=\mu g}. Since {fg'} and {f'g} vanish at both endpoints, we can pass the primes easily:

\displaystyle    \int_a^b fg= \frac{1}{\mu}\int_a^b fg'' = -\frac{1}{\mu}\int_a^b f'g' = \frac{1}{\mu}\int_a^b f''g = \frac{\lambda}{\mu} \int_a^b fg

If {\lambda\ne \mu}, all integrals must vanish.
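For a concrete check: with {a=0}, {b=1} the eigenfunctions of the mixed problem are {\sin((k+1/2)\pi x)}, and a few quadratures confirm their orthogonality. (The sketch below assumes SciPy is available.)

import numpy as np
from scipy.integrate import quad   # SciPy assumed available

# Eigenfunctions of f'' = lambda*f with f(0) = 0, f'(1) = 0 on [0, 1]
phi = lambda k: (lambda x: np.sin((k + 0.5)*np.pi*x))
for k in range(3):
  for m in range(k + 1, 4):
    val, _ = quad(lambda x, k=k, m=m: phi(k)(x)*phi(m)(x), 0, 1)
    print(k, m, val)               # all zero up to quadrature error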

Gregory-Newton series

[Interpolation] rarely appears in calculus books today, and then only as numerical method. Yet three of the most important founders of calculus, Newton, Gregory, and Leibniz, began their work with interpolation… Of course, interpolation is a numerical method in practice, when one uses only a few terms of the Gregory-Newton series, but the full series is exact and hence of much greater interest.

— John Stillwell, Mathematics and its History.

For a function f\colon\mathbb R\to\mathbb R define \Delta^0 f=f, and then inductively \Delta^{n+1}f(a)=\Delta^n f(a+h)-\Delta^n f(a), where h is a fixed step size. Also, let \displaystyle \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\dots (\alpha-n+1)}{n!} where \alpha\in \mathbb R is not necessarily an integer. The Gregory-Newton series for f (centered at zero) is

\displaystyle f(x) \sim \sum_{n=0}^\infty \Delta^n f(0) \binom{x/h}{n}

The idea is that the Nth partial sum is the (unique) Nth degree polynomial that agrees with f at the points 0,h,2h,\dots,Nh, and we hope that the limit N\to\infty recovers f exactly. But does it really? After all, the formula uses only the values of f at integer multiples of h, and there are infinitely many functions (including analytic ones) that have the same values as f at these points. For example, with f(x)=\sin x and h=\pi we get \Delta^n f(0)=0 for all n, hence the sum is zero.

Convergence is also an issue. Since \binom{x/h}{n}^{1/n}\to 1 for most values of x, we need \Delta^n f(0) to decay exponentially for the series to be of much use. This is not the case when f(x)=\cos x and h=\pi: it is easy to see that \Delta^n f(0) = (-2)^n, and therefore the series diverges. Even though the function is analytic… I used OpenOffice spreadsheet to carry out the computations for f(x)=\cos x with various step sizes.

With h=\pi/2 we still have exponential divergence:

h = pi / 2

But at h=\pi/3 the pattern becomes periodic:

h = pi / 3

With h=1 the trend is not immediately clear:

h = 1

But for slightly smaller values, such as 0.9, the exponential decay becomes evident:

h = 0.9
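The spreadsheet computations are easy to redo in Python; here is a minimal sketch tracking the size of \Delta^n f(0) for f=\cos at these step sizes:

import numpy as np

def deltas(h, N=25):
  vals = np.cos(np.arange(N + 1)*h)   # f(0), f(h), ..., f(Nh)
  out = []
  for n in range(N + 1):
    out.append(vals[0])               # current value of Delta^n f(0)
    vals = np.diff(vals)
  return np.array(out)

for h in [np.pi/2, np.pi/3, 1.0, 0.9]:
  print(h, np.abs(deltas(h)[-1]))     # grows for h = pi/2, stays bounded at pi/3, decays for h < pi/3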

What is happening here? As usual, it helps to switch from trigonometric to exponential form and consider f(x)=\cos x + i\sin x=e^{ix}. Another thing that helps is the identity \displaystyle \Delta^n f(0) = \sum_{k=0}^n (-1)^{n-k}\binom{n}{k} f(kh), which is a straightforward induction exercise. Putting the two together, we find \displaystyle \Delta^n f(0) = (e^{ih}-1)^n. We could now take the real part to return to the cosine, but it is already clear that \Delta^n f(0) decays exponentially if and only if |e^{ih}-1|<1. The latter holds exactly when |h|<\pi/3; in particular the step size h=1 should work. Here is a Maple plot comparing the degree 10 partial sum (in red) and the cosine itself (in green), using h=1. I expanded the plot range beyond [0,10], otherwise the graphs would be indistinguishable.

Degree 10 interpolation
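Roughly the same picture can be reproduced in Python; a sketch (scipy.special.binom handles the non-integer binomial coefficients):

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import binom      # accepts a non-integer first argument

h, N = 1.0, 10
vals = np.cos(np.arange(N + 1)*h)                          # cos(0), cos(h), ..., cos(Nh)
d = np.array([np.diff(vals, n)[0] for n in range(N + 1)])  # Delta^n f(0)
x = np.linspace(-2, 12, 500)
s = sum(d[n]*binom(x/h, n) for n in range(N + 1))          # degree-10 partial sum
plt.plot(x, s, 'r', x, np.cos(x), 'g')
plt.ylim(-2, 2)
plt.show()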

In contrast, the Taylor polynomial of 10th degree visibly diverges from the function starting at x=\pm 4. The interval in which it represents the cosine accurately is shorter than for the interpolating polynomial.

Taylor polynomial of 10th degree

With the real exponential function the situation is somewhat different. If f(x)=e^x then \displaystyle \Delta^n f(0) = (e^{h}-1)^n and convergence holds when h<\ln 2\approx 0.69, which is worse than for trigonometric functions. On the other hand, f(x)=e^{-x} has \displaystyle \Delta^n f(0) = (e^{-h}-1)^n which decays exponentially no matter how large h>0 is.

Of course, the convergence of a series does not mean that its sum is equal to f. We already saw what happens for the sine with h=\pi (and the same occurs for either function with h=2\pi): the Gregory-Newton series degenerates to a constant, which of course does not represent the function.

On the other hand, divergent series are not necessarily useless. Here is the 10th partial sum of the series for cosine with h=\pi/2. The series diverges everywhere except at 0, but the partial sum still does a decent job. The key is to use the right number of terms: not too few, but not too many either.

Degree 10, step size pi/2

I find the step size h=\pi particularly interesting, because it addresses a quasi-philosophical question: is the trigonometric wave the “canonical” function alternating between 1 and -1 at equal intervals? Although the Gregory-Newton series rapidly diverges, its Nth partial sum resembles the cosine near the midpoint of the interval, N\pi/2. However, the quality of approximation is poor. I used N=20 below, but even with N=50 the sum visibly diverges from the cosine in every period.

Partial sum of degree 20, using step size pi.