Critical points of a cubic spline

The choice of piecewise polynomials of degree 3 for interpolation is justifiably popular: even-degree splines are algebraically awkward to construct, degree 1 is simply piecewise linear interpolation (not smooth), and degree 5, while feasible, entails juggling too many coefficients. Besides, a cubic polynomial minimizes the amount of wiggling (the integral of second derivative squared) for given values and slopes at the endpoints of an interval. (Recall Connecting dots naturally.)

But the derivative of a cubic spline is a quadratic spline. And one needs the derivative to find the critical points. This results in an awkward example in the SciPy documentation, annotated with “(NB: sproot only works for order 3 splines, so we fit an order 4 spline)”.

Although this is not implemented in SciPy, computing the roots of a quadratic spline is a simple task. Obtaining the roots from SciPy's internal representation of a quadratic spline (as a linear combination of B-splines) would take some work and reading. But a quadratic polynomial is determined by three values, so sampling it at three points, such as two consecutive knots and their average, is enough.

Quadratic formula with values instead of coefficients

Suppose we know the values of a quadratic polynomial q at -1, 0, 1, and wish to determine whether it has roots between -1 and 1. Let's normalize so that q(0)=1 (if q(0)=0 we already have a root, and if q(0)<0 we can multiply q by -1, which flips the signs of all three values), and let x = q(-1), y = q(1). If either x or y is negative, the intermediate value theorem guarantees a root in the interval. If both are positive, there is still a chance: the parabola must be convex (concave up), attain its minimum within [-1, 1], and have a negative minimum value. All of this is easily determined once we note that the coefficients of the polynomial are a = (x+y)/2 - 1, b = (y-x)/2, and c = 1.
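For the record, matching the three sampled values of {q(t)=at^2+bt+c} gives {a-b+c=x}, {c=1}, {a+b+c=y}; solving for the coefficients and computing the discriminant,

\displaystyle  b^2-4ac = \frac{(y-x)^2}{4} - 2(x+y) + 4 = \frac{(x-y)^2-8(x+y-2)}{4}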

The inequality {(x-y)^2 \ge 8(x+y-2)} says precisely that the discriminant is nonnegative, so that q has real roots. The boundary curve is a parabola with vertex (1, 1) and focus (2, 2), contained in the first quadrant and tangent to the axes at (4, 0) and (0, 4). Within the orange region there are no real roots.

[Figure: No real roots in the orange region]

The line x+y=2, tangent to the parabola at its vertex, separates convex and concave parabolas. While concavity together with positive x, y definitely precludes roots in [-1, 1], slight convexity is not much better: the roots it produces are real but lie outside of the interval. Here is the complete picture: green means there is a root in [-1, 1], orange means there are no real roots, and red covers the rest.

[Figure: Green = there is a root in the interval [-1, 1]]
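As a quick numerical check of this classification, here is a small sketch (my addition, not from the original post) using NumPy's polynomial solver, the same trick as in the spline routine below:

import numpy as np

def root_status(x, y):
    # Classify (x, y) = (q(-1), q(1)), assuming q(0) = 1.
    # Doubled coefficients of q: 2a = x + y - 2, 2b = y - x, 2c = 2.
    r = np.roots([x + y - 2, y - x, 2])
    if r.size and np.any(np.isreal(r) & (np.abs(r) <= 1)):
        return 'green'   # a real root lies in [-1, 1]
    if r.size == 0 or not np.any(np.isreal(r)):
        return 'orange'  # no real roots at all
    return 'red'         # real roots, but all outside [-1, 1]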

Back to splines

Since the derivative of a spline is available in SciPy (B-splines have a nice formula for derivatives), all we need is a root-finding routine for quadratic splines. Here it is, based on the above observations, but using the built-in NumPy polynomial solver np.roots to avoid dealing with various special cases for the coefficients.

import numpy as np

def quadratic_spline_roots(spl):
    roots = []
    knots = spl.get_knots()
    for a, b in zip(knots[:-1], knots[1:]):
        # Sample the quadratic at the endpoints and midpoint of the interval.
        u, v, w = spl(a), spl((a+b)/2), spl(b)
        # Doubled coefficients in the normalized variable t on [-1, 1],
        # where q(-1) = u, q(0) = v, q(1) = w.
        t = np.roots([u+w-2*v, w-u, 2*v])
        # Keep real roots inside [-1, 1], then map back to [a, b].
        t = t[np.isreal(t) & (np.abs(t) <= 1)].real
        roots.extend(t*(b-a)/2 + (b+a)/2)
    return np.array(roots)

Here is a demonstration; it plots the spline (blue), its critical points (red), and the original data points (black):

[Figure: There can be 0, 1, or 2 critical points between two knots]
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.arange(7)
y = np.array([3, 1, 1, 2, 2, 4, 3])
f = InterpolatedUnivariateSpline(x, y, k=3)
# The derivative of a cubic spline is a quadratic spline:
crit_pts = quadratic_spline_roots(f.derivative())

t = np.linspace(x[0], x[-1], 500)
plt.plot(t, f(t))
plt.plot(x, y, 'kd')
plt.plot(crit_pts, f(crit_pts), 'ro')
plt.show()


Between a cubic spline and the line of regression

Given some data points {(x_i,y_i)}, one can devise various functions that extract information from them, such as:

  • Regression line (considered in When the digits of {\pi} go to 11): it detects a linear trend {y=ax+b}, minimizing the sum of squares of residuals {y_i-(ax_i+b)}.
  • Natural cubic spline (considered in Connecting dots naturally), which passes through every data point while minimizing the amount of wiggling, measured by the integral of its second derivative squared. Like this:
[Figure: Natural cubic spline]

A smoothing spline is something in between these extremes: it insists neither on being a line (i.e., having zero second derivative) nor on passing through the given points (i.e., having zero residuals). Instead, it minimizes a weighted sum of both quantities: the integral of the second derivative squared and the sum of the residuals squared. Like this plot, where the red points are given but the spline chooses to interpolate the green ones:

[Figure: Smoothing spline]

I’ll demonstrate a version of a smoothing spline that might not be exactly canonical, but is very easy to implement in Matlab or Scilab (which I prefer to use). As in Connecting dots naturally, assume the knots {a=x_0<x_1<\dots<x_n=b} are equally spaced at distance {h=(b-a)/n}. The natural cubic spline is determined by the values {z_1,\dots,z_{n-1}}, where {z_i=-f''(x_i)} (the sign convention of Connecting dots naturally). As explained there, these can be found from the linear system

\displaystyle    \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1}\end{pmatrix}   = - \frac{1}{h} \begin{pmatrix} \Delta^2 y_1 \\ \Delta^2 y_2 \\ \vdots \\ \Delta^2 y_{n-1}\end{pmatrix}

where the column on the right contains the second differences {\Delta^2 y_i}; divided by {h}, these are the amounts by which the derivative of the piecewise linear interpolant through the given points jumps at the interior knots. The notation {\Delta^2y} means the second difference of {(y_i)}, for example {\Delta^2y_1 =y_2-2y_1+y_0}.

A smoothing spline is also a natural spline, but for a different set of points {(x_i,\tilde y_i)}. One has to find {\tilde y_i} that minimize a weighted sum of {\sum_i (\tilde y_i-y_i)^2} and of {\int_a^b (f''(x))^2\,dx}. The latter integral is easily computed in terms of {z_i}: it is equal to {\frac{h}{3}\sum_{i=1}^{n}(z_i^2+z_{i-1}^2+z_iz_{i-1})}. Since this quadratic form is comparable to {\sum_{i=1}^{n}z_i^2}, I’ll work with the latter instead.
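Indeed, on {[x_{i-1},x_i]} the function {f''} is linear, with endpoint values {-z_{i-1}} and {-z_i}, so writing {s=x-x_{i-1}} gives

\displaystyle  \int_{x_{i-1}}^{x_i} (f''(x))^2\,dx = \frac{1}{h^2}\int_0^h \left((h-s)z_{i-1}+s z_i\right)^2 ds = \frac{h}{3}\left(z_{i-1}^2+z_{i-1}z_i+z_i^2\right)

and summing over the {n} intervals yields the formula above.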

The idea is to set up an underdetermined system for {z_i} and {\tilde y_i-y_i}, and let Scilab find a least-squares solution of it. Let's introduce a weight parameter {\lambda>0} that determines the relative importance of curvature vs. residuals. It is convenient to let {\tilde y_i=y_i+\lambda h^2 \delta_i}, so that both {\delta_i} and {z_i} scale the same way. The terms {\delta_i} contribute to the linear system for {z_i}, since the right-hand side now has {\tilde y_i} in place of {y_i}. This contribution is {-\lambda h\, \Delta^2 \delta}. Moving it to the left-hand side (since the {\delta_i} are unknowns), we obtain the following system.

\displaystyle    \begin{pmatrix} A & | & B \end{pmatrix} \begin{pmatrix} z \\ \delta \end{pmatrix}   = - \frac{1}{h} \Delta^2 y

where {A} is the same tridiagonal matrix as above, and {B} is the rectangular Laplacian-type matrix

\displaystyle    B = \lambda h \begin{pmatrix} 1 & -2 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & -2 & 1 & \ldots & 0 & 0 \\ 0 & 0 & 1 & -2 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & -2 & 1 \end{pmatrix}

All in all, the system has {2n} unknowns {z_1,\dots,z_{n-1},\delta_0,\dots,\delta_n} and {n-1} equations, one for the continuity of the first derivative at each interior knot. The lsq command of Scilab finds the solution with the least sum of squares of the unknowns, which is what we are after.
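For comparison, here is a minimal NumPy sketch of the same construction (the function name and layout are mine; np.linalg.lstsq plays the role of Scilab's lsq, returning the minimum-norm solution of an underdetermined system). With h = 1, as in the example below, it should match the Scilab computation.

import numpy as np

def smoothing_spline_data(y, h, lam):
    # Returns (z, newy): the z-values and the adjusted data values
    # interpolated by the smoothing spline.
    y = np.asarray(y, dtype=float)
    n = len(y) - 1
    jumps = np.diff(y, 2) / h
    A = (h / 6) * (4 * np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))
    # Rows of the second-difference operator at the interior knots:
    D2 = (np.eye(n + 1, k=1) - 2 * np.eye(n + 1) + np.eye(n + 1, k=-1))[1:-1, :]
    C = np.hstack([A, lam * h * D2])
    sol, *_ = np.linalg.lstsq(C, -jumps, rcond=None)  # minimum-norm solution
    z = np.concatenate([[0.0], sol[:n - 1], [0.0]])   # natural end conditions
    newy = y + lam * h**2 * sol[n - 1:]
    return z, newy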

Time for some examples. {\lambda=0} and {\lambda=0.05} can be seen above. Here are more:

[Figure: lambda = 0.1]
[Figure: lambda = 1]

As {\lambda} increases, the spline approaches the regression line:

[Figure: lambda = 10]
[Figure: lambda = 100]

Finally, the Scilab code. It is almost the same as for the natural spline; the difference is in the five lines from B=... to newy=... The code after that merely evaluates and plots the spline.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
lambda = 0.1

n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h    // derivative jumps of the piecewise linear interpolant
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))

B = lambda*h*(diag(-2*ones(1,n+1))+diag(ones(1,n),1)+diag(ones(1,n),-1))   // second-difference block
C = [A, B(2:$-1,:)]    // underdetermined system: n-1 equations, 2n unknowns
sol = lsq(C,-jumps')'  // minimum-norm least-squares solution
z = [0,sol(1:n-1),0]   // natural end conditions
newy = y + lambda*h^2*sol(n:$)    // adjusted data values interpolated by the spline

allx = []
spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = newy(i)*(xR-x)/h + newy(i+1)*(x-xL)/h
   correction = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
   allx = [allx, x]   
   spl = [spl, linear+correction] 
end

plot(allx, spl) 
plot(a:h:b, newy, 'g*')
plot(a:h:b, y, 'r*')

B-splines and probability

If one picks two real numbers {X_1,X_2} from the interval {[0,1]} (independent, uniformly distributed), their sum {S_2=X_1+X_2} has the triangular distribution.
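Explicitly, writing {S_2} for the density as well (a convention the formulas below follow),

\displaystyle  S_2(x)=\begin{cases} x, & 0\le x\le 1 \\ 2-x, & 1\le x\le 2 \\ 0, & \text{otherwise} \end{cases}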

[Figure: The triangular density, also known as the hat function]

The sum {S_3} of three such numbers has a differentiable probability density function:

[Figure: Piecewise quadratic, C^1 smooth]

And the density of {S_4=X_1+X_2+X_3+X_4} is smoother still: the p.d.f. has two continuous derivatives.

[Figure: This begins to look normal]

As the number of summands increases, these distributions converge to normal if they are translated and scaled properly. But I am not going to do that. Let’s keep the number of summands to four at most.

The p.d.f. of {S_n} is a piecewise polynomial of degree {n-1}. Indeed, for {S_1=X_1} the density is piecewise constant, and the formula

\displaystyle  S_n(x) = \int_{x-1}^x S_{n-1}(t)\,dt

provides the inductive step.
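Here is a quick numerical illustration of the recurrence (a sketch of mine, not from the post): each density is obtained from the previous one by convolution with the uniform density, approximated on a grid.

import numpy as np
import matplotlib.pyplot as plt

dx = 0.001
x = np.arange(0, 4 + dx, dx)
uniform = ((x >= 0) & (x <= 1)).astype(float)   # density of S_1
density = uniform.copy()
for n in range(2, 5):
    # Discrete approximation of the integral recurrence (a convolution):
    density = np.convolve(density, uniform)[:len(x)] * dx
    plt.plot(x, density, label=f'S_{n}')
plt.legend()
plt.show()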

For each {n}, the translated copies of function {S_n} form a partition of unity:

\displaystyle  \sum_{k\in\mathbb Z}S_n(x-k)\equiv 1

The integral recurrence relation gives an easy proof of this:

\displaystyle  \sum_{k\in\mathbb Z}\int_{x-k-1}^{x-k} S_{n-1}(t)\,dt = \int_{\mathbb R} S_{n-1}(t)\,dt = 1

And here is the picture for the quadratic case:

[Figure: Piecewise quadratic partition of unity]
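A direct numerical check for the quadratic case (the closed-form pieces of {S_3} are standard; the code itself is my addition):

import numpy as np

def S3(x):
    # Piecewise quadratic density of the sum of three uniform variables.
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 1), x**2 / 2,
           np.where((x >= 1) & (x < 2), (-2*x**2 + 6*x - 3) / 2,
           np.where((x >= 2) & (x <= 3), (3 - x)**2 / 2, 0.0)))

x = np.linspace(0, 1, 5)
print(sum(S3(x - k) for k in range(-2, 1)))   # should print [1. 1. 1. 1. 1.]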

A partition of unity can be used to approximate functions by piecewise polynomials: just multiply each partition element by the value of the function at the center of the corresponding interval, and add the results.

Doing this with {S_2} amounts to piecewise linear interpolation: the original function {f(x)=e^{-x/2}} is in blue, the weighted sum of hat functions in red.

[Figure: PL interpolation of exp(-x/2)]

With {S_4} we get a smooth curve.

[Figure: Approximating exp(-x/2) with a cubic B-spline]
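Here is a short SciPy sketch of this weighted sum (my reconstruction, not the post's code): scipy.interpolate.BSpline.basis_element with knots 0, 1, 2, 3, 4 is exactly the cubic {S_4}, supported on [0, 4] and centered at 2.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import BSpline

f = lambda t: np.exp(-t / 2)
S4 = BSpline.basis_element(np.arange(5), extrapolate=False)  # cubic B-spline on [0, 4]

x = np.linspace(0, 6, 500)
approx = np.zeros_like(x)
for k in range(-4, 7):
    # Weight each shifted copy by the function's value at its center.
    approx += f(k + 2) * np.nan_to_num(S4(x - k))

plt.plot(x, f(x), 'b')
plt.plot(x, approx, 'r')
plt.show()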

Unlike interpolating splines, this curve does not attempt to pass through the given points exactly. However, it has several advantages:

  • It is easier to calculate: there is no linear system to solve;
  • It yields a positive function for positive data;
  • It yields a monotone function for monotone data.

Connecting dots naturally

How to draw a nice curve through given points {(x_i,y_i)}?

[Figure: Some data]

One way is to connect them with straight lines, creating a piecewise linear function:

[Figure: Piecewise linear interpolant]

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

\displaystyle    \int_0^{10} \sqrt{1+(f'(x))^2}\,dx

among all functions with {f(x_i)=y_i}. As is often the case, the length functional can be replaced with the elastic energy

\displaystyle    \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx

because the piecewise linear {f} (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could instead fit a polynomial, which is guaranteed to be smooth; with 11 points we need a polynomial of degree 10. The result is disappointing:

[Figure: Interpolating polynomial]
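The disappointing curve is easy to reproduce (a sketch of mine, assuming the same data as in the code at the end of this post):

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(11)
y = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
p = np.polyfit(x, y, 10)   # degree-10 interpolating polynomial (may warn about conditioning)
t = np.linspace(0, 10, 500)
plt.plot(t, np.polyval(p, t))
plt.plot(x, y, 'r*')
plt.show()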

It is not natural for a curve connecting the points with {1\le y\le 9} to shoot up over {y=40}. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

\displaystyle    \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx

and the function that minimizes it subject to conditions {f(x_i)=y_i} looks very nice indeed:

[Figure: Natural cubic spline]

The Euler-Lagrange equation for the functional {\mathcal{B}} dictates that the fourth derivative of {f} is zero in the intervals between the knots {x_i}. Thus, {f} is a piecewise cubic polynomial. Also, both {f} and {f'} must be continuous for any function with integrable second derivative. More delicate analysis is required for {f''}, but it too can be shown to be continuous for the minimizing function {f}; moreover, {f''} must vanish at the endpoints {0} and {10}. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term {C} to the piecewise linear interpolant {L}. Here the spline is shown together with {L} (green) and {C} (magenta).

[Figure: PL interpolant, correction term, and their sum: the cubic spline]

On each interval {[x_i,x_{i+1}]} the correction term {C} is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

\displaystyle    C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2

on this interval. There are 20 coefficients {\alpha_i}, {\beta_i} to find. At each of the 9 interior knots {1,2,\dots,9} we have two conditions: {C''} must have a removable singularity (matching one-sided limits) and {C'} must jump by the amount opposite to the jump of {L'}. Since {C''} also vanishes at the endpoints {0} and {10}, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of {z_i=-C''(x_i)} (the sign is chosen to match the system below and the code). Indeed, the values of {C''} at two consecutive knots determine the correction term within: {\alpha_i= \dfrac{z_{i+1}+2z_i}{6h} } and {\beta_i = \dfrac{2z_{i+1}+z_i}{6h}}. This leaves {n-1} equations (from the jumps of {C'}) for {n-1} unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

\displaystyle    \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\    1 & 4 & 1 & 0 & \ldots & 0 & 0 \\    0 & 1 & 4 & 1 & \ldots & 0 & 0 \\    \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\    0 & 0 & 0 & 0 & \ldots & 1 & 4    \end{pmatrix}
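As a sanity check of the formulas for {\alpha_i} and {\beta_i}: on {[x_i,x_{i+1}]}, writing {s=x-x_i},

\displaystyle  C''(x) = \alpha_i(6s-4h)+\beta_i(2h-6s)

so {C''(x_i)=-4\alpha_i h+2\beta_i h=-z_i} and {C''(x_{i+1})=2\alpha_i h-4\beta_i h=-z_{i+1}}, consistent with the sign convention above.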

One can solve the system for {z_i} within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second-order differences of the sequence of y-values.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h    // derivative jumps of the piecewise linear interpolant
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))
z = [0,-jumps/A,0]     // solve the tridiagonal system; natural end conditions
allx = []; spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h              // PL interpolant L
   cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)   // correction C
   allx = [allx, x]   
   spl = [spl, linear + cor] 
end
plot(allx, spl) 
plot(a:h:b, y, 'r*')