Words that contain UIO, and best-fitting lines

In Calculus I we spend a fair amount of time talking about how nicely the tangent line fits a smooth curve.

Tangent line to exponential function at x=0

But truth be told, it fits only near the point of tangency. How can we find the best approximating line for a function {f} on a given interval?

Best linear fit to the exponential function on [0,1]

A natural measure of the quality of approximation is the maximum deviation of the curve from the line, {E(f;\alpha,\beta) = \max_{[a, b]} |f(x) - \alpha x-\beta|}, where {\alpha,\beta} are the coefficients in the line equation, to be determined. We need the {\alpha,\beta} that minimize {E(f;\alpha,\beta)}.

The Chebyshev equioscillation theorem is quite useful here. For one thing, its name contains the letter combination uio, which Scrabble players may appreciate. (Can you think of other words with this combination?) Also, its statement does not involve concepts outside of Calculus I. Specialized to the case of linear fit, it says that {\alpha,\beta} are optimal if and only if there exist three numbers { x_1<x_2<x_3} in {[a, b]} such that the deviations {\delta_i = f(x_i) - \alpha x_i-\beta}

  • are equal to {E(f;\alpha,\beta)} in absolute value: {|\delta_i| = E(f;\alpha,\beta)} for {i=1,2,3}
  • have alternating signs: {\delta_1 = -\delta_2 = \delta_3}

Let’s consider what this means. First, {f'(x_i) =\alpha} unless {x_i} is an endpoint of {[a,b]}. Since {x_2} cannot be an endpoint, we have {f'(x_2)=\alpha}.

Furthermore, {f(x) - \alpha x } takes the same value at {x_1} and {x_3}. This gives an equation for {x_2}

\displaystyle    f(x_1)-f'(x_2)x_1 = f(x_3)-f'(x_2) x_3    \qquad \qquad (1)

which can be rewritten in a form resembling the Mean Value Theorem:

\displaystyle    f'(x_2) = \frac{f(x_1)-f(x_3)}{x_1-x_3}   \qquad \qquad (2)

If {f'} is strictly monotone, there can be only one {x_i} with {f'(x_i)=\alpha}. Hence {x_1=a} and {x_3=b} in this case, and we find {x_2} by solving (2). This gives {\alpha = f'(x_2)}, and then {\beta} is not hard to find.

Here is how I did this in Sage:

var('x a b')
f = sin(x)  # or another function
df = f.diff(x)
a = 0   # left endpoint (example value)
b = pi  # right endpoint (example value)

That was the setup. Now the actual computation:

var('x2')   # x2 stays symbolic until find_root pins it down
x1 = a      # f' is monotone here, so both endpoints are among the extreme points
x3 = b
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, a, b)   # equation (1)
alpha = df(x=x2)
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)   # deviations at x1 and x2 are equal and opposite
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))
Fitting a line to the sine curve
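
To sanity-check a computed line, here is a small helper (my addition, not part of the original code) that approximates {E(f;\alpha,\beta)} by sampling the interval; for the line just found, deviation(f, alpha, beta, a, b) should match the largest vertical gap visible in the plot.

def deviation(f, alpha, beta, a, b, samples=1000):
    # approximate max over [a,b] of |f(x) - alpha*x - beta| on a uniform grid
    pts = [a + (b - a)*k/samples for k in range(samples + 1)]
    return max(abs(f(x=t) - alpha*t - beta).n() for t in pts)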

However, the algorithm fails to properly fit a line to the sine function on {[0,3\pi/2]}:

Not the best fit, obviously

The problem is, {f'(x)=\cos x} is no longer monotone, making it possible for two of {x_i} to be interior points. Recalling the identities for cosine, we see that these points must be symmetric about {x=\pi}. One of {x_i} must still be an endpoint, so either {x_1=a} (and {x_3=2\pi-x_2}) or {x_3=b} (and {x_1=2\pi-x_2}). The first option works:

Best fit on [0,3pi/2]
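
Here is one way to adapt the earlier computation to this case (a sketch under the assumption {x_1=a} and {x_3=2\pi-x_2}, reusing f and df from the setup above; not necessarily the code behind the plot):

a = 0
b = 3*pi/2
x1 = a
var('x2')
# equation (1) with x3 replaced by 2*pi - x2; the constraints force pi/2 <= x2 < pi
x2 = find_root(f(x=x1) - df(x=x2)*x1 == f(x=2*pi-x2) - df(x=x2)*(2*pi-x2), pi/2, pi)
x3 = 2*pi - x2
alpha = df(x=x2)
beta = 1/2*(f(x=x1) - alpha*x1 + f(x=x2) - alpha*x2)
show(plot(f, a, b) + plot(alpha*x + beta, a, b, color='red'))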

This same line is also the best fit on the full period {[0,2\pi]}. It passes through {(\pi,0)} and has slope {-0.2172336...}, which is not a number I can recognize.

On the interval {[0,4\pi]}, all three of the above approaches fail:

Wrong!
Also wrong!
What were you thinking?

Luckily we don’t need a computer in this case. Whenever {|f|} has at least three points of maximum with alternating signs of {f}, the Chebyshev equioscillation theorem implies that the best linear fit is the zero function. For {\sin x} on {[0,4\pi]}, the points {\pi/2, 3\pi/2, 5\pi/2} already provide such a triple, so the best fitting line is {y=0}.

Integrate by parts twice and solve for the integral

The dreaded calculus torture device that works for exactly two integrals, {\int e^{ax}\sin bx\,dx} and {\int e^{ax}\cos bx\,dx}.

Actually, no. A version of it (with one integration by parts) works for {\int x^n\,dx}:

\displaystyle    \int x^n\,dx = x^n x - \int x\, d(x^n) = x^{n+1} - n \int x^n\,dx

hence (assuming {n\ne -1})

\displaystyle  \int x^n\,dx = \frac{x^{n+1}}{n+1} +C

Yes, this is more of a calculus joke. A more serious example comes from Fourier series.

The functions {\sin nx}, {n=1,2,\dots}, are orthogonal on {[0,\pi]}, in the sense that

\displaystyle    \int_0^\pi \sin nx \sin mx \,dx =0 , \quad m\ne n

This is usually proved using a trigonometric identity that converts the product to a sum. But double integration by parts gives a nicer proof, because no obscure identities are needed. No boundary terms will appear because the sines vanish at both endpoints:

\displaystyle    \int_0^\pi \sin nx \sin mx \,dx = \frac{n}{m} \int_0^\pi \cos nx \cos mx \,dx = \frac{n^2}{m^2} \int_0^\pi \sin nx \sin mx \,dx

All integrals here must vanish because {n^2/m^2\ne 1}. As a bonus, we get the orthogonality of cosines, {\int_0^\pi \cos nx \cos mx \,dx=0}, with no additional effort.
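
A quick symbolic check in Sage (my addition, for two specific values of {m\ne n}):

var('x')
n, m = 3, 5   # any two distinct positive integers will do
print(integrate(sin(n*x)*sin(m*x), x, 0, pi))   # prints 0
print(integrate(cos(n*x)*cos(m*x), x, 0, pi))   # prints 0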

The double integration by parts is also a more conceptual proof, because it gets to the heart of the matter: eigenvectors of a symmetric matrix (operator) that correspond to different eigenvalues are orthogonal. The trigonometric form is incidental, the eigenfunction property is essential. Let’s try this one more time, for the mixed boundary value problem {f(a)=0}, {f'(b)=0}. Suppose that {f} and {g} satisfy the boundary conditions, {f''=\lambda f}, and {g''=\mu g}. Since {fg'} and {f'g} vanish at both endpoints, we can pass the primes easily:

\displaystyle    \int_a^b fg= \frac{1}{\mu}\int_a^b fg'' = -\frac{1}{\mu}\int_a^b f'g' = \frac{1}{\mu}\int_a^b f''g = \frac{\lambda}{\mu} \int_a^b fg

If {\lambda\ne \mu}, all integrals must vanish.
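
For a concrete instance (my example, not from the original post), take {a=0}, {b=\pi}: the functions {\sin(x/2)} and {\sin(3x/2)} satisfy the boundary conditions and {f''=\lambda f} with {\lambda=-1/4} and {\mu=-9/4}, so they should be orthogonal.

var('x')
f = sin(x/2)    # eigenvalue lambda = -1/4
g = sin(3*x/2)  # eigenvalue mu = -9/4
print(f(x=0), f.diff(x)(x=pi), g(x=0), g.diff(x)(x=pi))   # boundary conditions: all 0
print(integrate(f*g, x, 0, pi))                            # orthogonality: 0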

Real zeros of sine Taylor polynomials

The more terms of Taylor series {\displaystyle \sin x = x-\frac{x^3}{3!}+ \frac{x^5}{5!}- \cdots } we use, the more resemblance we see between the Taylor polynomial and the sine function itself. The first-degree polynomial matches one zero of the sine, and gets the slope right. The third-degree polynomial has three zeros in about the right places.

Third degree Taylor polynomial: three zeros

The fifth-degree polynomial will of course have … wait a moment.

Fifth degree Taylor polynomial: only one zero

Since all four critical points are in the window, there are no real zeros outside of our view. Adding the fifth-degree term not only fails to increase the number of zeros to five, it even drops it back to the level of {T_1(x)=x}. How odd.

Since the sine Taylor series converges uniformly on bounded intervals, for every { A } there exists { n } such that {\max_{[-A,A]} |\sin x-T_n(x)|<1 }. Then { T_n } will have the same sign as { \sin x } at the maxima and minima of the latter. Consequently, it will have about { 2A/\pi } zeros on the interval {[-A,A] }. Indeed, the intermediate value theorem guarantees that many; and the fact that {T_n'(x) \approx \cos x } on { [-A,A]} will not allow for extraneous zeros within this interval.

Using the Taylor remainder estimate and Stirling's approximation, we find {A\approx (n!)^{1/n} \approx n/e }. Therefore, { T_n } will have about { 2n/(\pi e) } real zeros at about the right places. We can't tell what happens when {|x| } is too large for the Taylor remainder estimate to be effective.

Let's just count the zeros, then. Sage online makes it very easy:

# [degree, number of real roots] for the Taylor polynomials of degree 1, 3, ..., 99
sineroots = [[2*n-1, len(sin(x).taylor(x,0,2*n-1).roots(ring=RR))] for n in range(1,51)]
scatter_plot(sineroots)
Roots of sine Taylor polynomials

The up-and-down pattern in the number of zeros makes for a neat scatter plot. How close is this data to the predicted number { 2n/(\pi e) }? Pretty close.

var('n')
scatter_plot(sineroots, facecolor='#eeee66') + plot(2*n/(pi*e), (n, 1, 100))
Compared to 2n/pi*e

The slope of the blue line is { 2/(\pi e) \approx 0.2342 }; the (ir)rationality of this number is unknown. Thus, just under a quarter of the zeros of { T_n } are expected to be real when { n } is large.

The actual number of real zeros tends to exceed the prediction (by only a few) because some Taylor polynomials have real zeros in the region where they no longer follow the function. For example, { T_{11} } does this:

Spurious zero around x=7

Richard S. Varga and Amos J. Carpenter wrote a series of papers titled Zeros of the partial sums of { \cos z } and {\sin z } in which they classify real zeros into Hurwitz (which follow the corresponding trigonometric function) and spurious. They give the precise count of the Hurwitz zeros: {1+2\lfloor n/(\pi e)\rfloor } for the sine and {2\lfloor n/(\pi e)+1/2\rfloor } for the cosine. The total number of real roots does not appear to admit such an explicit formula. It is the sequence A012264 in the OEIS.

Gregory-Newton series

[Interpolation] rarely appears in calculus books today, and then only as numerical method. Yet three of the most important founders of calculus, Newton, Gregory, and Leibniz, began their work with interpolation… Of course, interpolation is a numerical method in practice, when one uses only a few terms of the Gregory-Newton series, but the full series is exact and hence of much greater interest.

— John Stillwell, Mathematics and its History.

For a function f\colon\mathbb R\to\mathbb R define \Delta^0 f=f, and then inductively \Delta^{n+1}f(a)=\Delta^n f(a+h)-\Delta^n f(a), where h is a fixed step size. Also, let \displaystyle \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\dots (\alpha-n+1)}{n!} where \alpha\in \mathbb R is not necessarily an integer. The Gregory-Newton series for f (centered at zero) is

\displaystyle f(x) \sim \sum_{n=0}^\infty \Delta^n f(0) \binom{x/h}{n}

The idea is that the Nth partial sum is the (unique) Nth degree polynomial that agrees with f at the points 0,h,2h,\dots,Nh, and we hope that the limit N\to\infty recovers f exactly. But does it really? After all, the formula uses only the values of f at integer multiples of h, and there are infinitely many functions (including analytic ones) that have the same values as f at these points. For example, with f(x)=\sin x and h=\pi we get \Delta^n f(0)=0 for all n, hence the sum is zero.
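
Here is a small Sage sketch (my addition, not from the original post) of the forward differences and the Nth partial sum; the last line confirms the degenerate example f(x)=\sin x, h=\pi just mentioned.

def forward_diffs(f, h, N):
    # Delta^0 f(0), Delta^1 f(0), ..., Delta^N f(0) with step size h
    vals = [f(k*h) for k in range(N + 1)]
    diffs = [vals[0]]
    for _ in range(N):
        vals = [vals[k+1] - vals[k] for k in range(len(vals) - 1)]
        diffs.append(vals[0])
    return diffs

def gregory_newton(f, h, N, x):
    # Nth partial sum of the Gregory-Newton series centered at 0
    d = forward_diffs(f, h, N)
    return sum(d[k] * binomial(x/h, k) for k in range(N + 1))

print(forward_diffs(sin, pi, 6))   # all zeros, so the series collapses to the zero function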

Convergence is also an issue. Since \binom{x/h}{n}^{1/n}\to 1 for most values of x, we need \Delta^n f(0) to decay exponentially for the series to be of much use. This is not the case when f(x)=\cos x and h=\pi: it is easy to see that \Delta^n f(0) = (-2)^n, and therefore the series diverges. Even though the function is analytic… I used an OpenOffice spreadsheet to carry out the computations for f(x)=\cos x with various step sizes.
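
The same computations are easy to script; using forward_diffs from the sketch above (same assumptions), one can tabulate |\Delta^n f(0)| for several step sizes:

for h in [pi/2, pi/3, 1, 0.9]:
    d = forward_diffs(cos, h, 12)
    print(h, [numerical_approx(abs(t), digits=3) for t in d])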

With h=\pi/2 we still have exponential divergence:

h = pi / 2

But at h=\pi/3 the pattern becomes periodic:

h = pi / 3

With h=1 the trend is not immediately clear:

h = 1

But for slightly smaller values, such as 0.9, the exponential decay becomes evident:

h = 0.9

What is happening here? As usual, it helps to switch from trigonometric to exponential form and consider f(x)=\cos x + i\sin x=e^{ix}. Another thing that helps is the identity \displaystyle \Delta^n f(0) = \sum_{k=0}^n (-1)^{n-k}\binom{n}{k} f(kh), which is a straightforward induction exercise. Putting the two together, we find \displaystyle \Delta^n f(0) = (e^{ih}-1)^n. We could now take the real part to return to the cosine, but it is already clear that \Delta^n f(0) decays exponentially if and only if |e^{ih}-1|<1. The latter holds exactly when |h|<\pi/3; in particular the step size h=1 should work. Here is a Maple plot comparing the degree 10 partial sum (in red) and the cosine itself (in green), using h=1. I expanded the plot range beyond [0,10], otherwise the graphs would be indistinguishable.

Degree 10 interpolation
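
A similar picture can be produced with the gregory_newton sketch above (my reconstruction, not the original Maple session):

var('x')
P = gregory_newton(cos, 1, 10, x)   # degree-10 partial sum, step size h = 1
plot(P, (x, -2, 12), color='red') + plot(cos(x), (x, -2, 12), color='green')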

In contrast, the Taylor polynomial of 10th degree visibly diverges from the function starting at x=\pm 4. The interval in which it represents the cosine accurately is shorter than for the interpolating polynomial.

Taylor polynomial of 10th degree

With the real exponential function the situation is somewhat different. If f(x)=e^x then \displaystyle \Delta^n f(0) = (e^{h}-1)^n and convergence holds when h<\ln 2\approx 0.69, which is worse than for trigonometric functions. On the other hand, f(x)=e^{-x} has \displaystyle \Delta^n f(0) = (e^{-h}-1)^n which decays exponentially no matter how large h>0 is.
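
A quick numerical check of these formulas, again using forward_diffs from the earlier sketch (the step size below is an arbitrary choice):

h = 1/2
print([numerical_approx(t, digits=6) for t in forward_diffs(exp, h, 5)])
print([numerical_approx((e^h - 1)^k, digits=6) for k in range(6)])   # same numbers as above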

Of course, the convergence of a series does not mean that its sum is equal to f. We already saw what happens for the sine with h=\pi, and the same thing occurs with h=2\pi for either cosine or sine: the Gregory-Newton series degenerates to a constant, which of course does not represent the function.

On the other hand, divergent series are not necessarily useless. Here is the 10th partial sum of the series for cosine with h=\pi/2. The series diverges everywhere except at 0, but the partial sum still does a decent job. The key is to use the right number of terms: not too few, but not too many either.

Degree 10, step size pi/2

I find the step size h=\pi particularly interesting, because it addresses a quasi-philosophical question: is the trigonometric wave the “canonical” function alternating between 1 and -1 at equal intervals? Although the Gregory-Newton series rapidly diverges, its Nth partial sum resembles the cosine near the midpoint of the interval, N\pi/2. However, the quality of approximation is poor. I used N=20 below, but even with N=50 the sum visibly diverges from the cosine in every period.

Partial sum of degree 20, using step size pi.
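
For reference, a partial sum like the one pictured can be rebuilt from the sketches above (my reconstruction; the window is centered near the midpoint N\pi/2, where the resemblance is best):

var('x')
N = 20
P = gregory_newton(cos, pi, N, x)
show(plot(P, (x, 8*pi, 12*pi), color='red') + plot(cos(x), (x, 8*pi, 12*pi), color='green'), ymin=-2, ymax=2)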