Partitions of unity and monotonicity-preserving approximation

There are many ways to approximate a given continuous function {f\colon [a, b]\to \mathbb R} (I will consider the interval {[a, b]=[0, 1]} for convenience.) For example, one can use piecewise linear interpolation through the points {(k/n, f(k/n))}, where {k=0, 1, \dots, n}. The resulting piecewise linear function {g} has some nice properties: for example, it is increasing if {f} is increasing. But it is not smooth.

A convenient way to represent piecewise linear interpolation is the sum {g(x) = \sum_{k=0}^n f(k/n) \varphi_k(x)} where the functions {\varphi_k} are the triangles shown below: {\varphi_k(x) = \max(0, 1 - |nx-k|)}.

Triangular basis functions

The functions {{\varphi_k}} form a partition of unity, meaning that {\sum_k \varphi_k \equiv 1} and all {\varphi_k} are nonnegative. This property leads to the estimate

{\displaystyle |f(x) - g(x)| = \left| \sum_{k=0}^n (f(x) - f(k/n)) \varphi_k(x)\right| \le \sum_{k=0}^n |f(x) - f(k/n)| \varphi_k(x) }

The latter sum is small because when {x} is close to {k/n}, the first factor {|f(x) - f(k/n)|} is small by virtue of continuity, while the second factor {\varphi_k(x)} is bounded by {1}. When {x} is far from {k/n}, the second factor {\varphi_k(x)} is zero, so the first one is irrelevant. The upshot is that {f-g} is uniformly small.

But if we want a smooth approximation {g}, we need a smooth partition of unity {{\varphi_k}}. But not just any set of smooth nonnegative functions that add up to {0} is equally good. One desirable property is preserving monotonicity: if {f} is increasing, then {g} should be increasing, just as this works for piecewise linear interpolation. What does this condition require of our partition of unity?

An increasing function can be expressed as a limit of sums of the form {\sum_{j} c_j [x \ge t_j]} where {c_j>0} and {[\cdots ]} is the Iverson bracket: 1 if true, 0 if false. By linearity, it suffices to have increasing {g} for the case {f(x) = [x \ge t]}. In this case {g} is simply {s_m := \sum_{k=m}^n \varphi_k} for some {m}, {0\le m\le n}. So we want all {s_m} to be increasing functions. Which is the case for the triangular partition of unity, when each {s_m} looks like this:

Also known as a piecewise linear activation function

One smooth choice is Bernstein basis polynomials: {\displaystyle \varphi_k(x) = \binom{n}{k} x^k (1-x)^{n-k}}. These are nonnegative on {[0, 1]}, and the binomial formula shows {\displaystyle \sum_{k=0}^n \varphi_k(x) = (x + 1-x)^n \equiv 1}. Are the sums {\displaystyle s_m(x) = \sum_{k=m}^n \binom{n}{k} x^k (1-x)^{n-k}} increasing with {x}? Let’s find out. By the product rule,

{\displaystyle s_m'(x) = \sum_{k=m}^n \binom{n}{k} k x^{k-1} (1-x)^{n-k} - \sum_{k=m}^n \binom{n}{k} (n-k) x^{k} (1-x)^{n-k - 1}}

In the second sum the term with {k=n} vanishes, and the terms with {k<n} can be rewritten as {\displaystyle \frac{n!}{k! (n-k)!} (n-k) x^{k} (1-x)^{n-k - 1}}, which is {\frac{n!}{(k+1)! (n-k-1)!} (k+1) x^{k} (1-x)^{n-k - 1}}, which is {\binom{n}{k+1} (k+1) x^{k} (1-x)^{n-k - 1} }. After the index shift {k+1\mapsto k} this becomes identical to the terms of the first sum and cancels them out (except for the first one). Thus,

{\displaystyle s_m'(x) = \binom{n}{m} m x^{m-1} (1-x)^{n-m} \ge 0 }

To summarize: the Bernstein polynomials {\displaystyle B_n(x) = \sum_{k=0}^n f(k/n) \binom{n}{k} x^k (1-x)^{n-k}} are monotone whenever {f} is. On the other hand, the proof that {B_n\to f} uniformly is somewhat complicated by the fact that the polynomial basis functions {\varphi_k} are not localized the way that the triangle basis functions are: the factors {\varphi_k(x)} do not vanish when {x} is far from {k/n}. I refer to Wikipedia for a proof of convergence (which, by the way, is quite slow).

Bernstein polynomial basis

Is there some middle ground between non-smooth triangles and non-localized polynomials? Yes, of course: piecewise polynomials, splines. More specifically, B-splines which can be defined as follows: B-splines of degree {1} are the triangle basis functions shown above; a B-spline of degree {d+1} is the moving averages of a {B}-spline of degree {d} with a window of length {h = 1/n}. The moving average of {F} can be written as {\frac{1}{h} \int_{x-h/2}^{x+h/2} f}. We get a partition of unity because the sum of moving averages is the moving average of a sum, and averaging a constant function does not change it.

The splines of even degrees are awkward to work with… they are obtained from the triangles by taking those integrals with {h/2} an odd number of times, which makes their knots fall in the midpoints of the uniform grid instead of the grid points themselves. But I will use {d=2} anyway, because this degree is enough for {C^1}-smooth approximation.

Recall that a triangular basis function {\varphi_k} has slope {\pm n} and is supported on an interval {[(k-1)h, (k+1)h]} where {h=1/n}. Accordingly, its moving average {\psi_k} will be supported on {[(k-3/2)h, (k+3/2)h]}. Since {\psi_k'(x) = n(\phi_k(x+h/2) - \phi_k(x-h/2))}, the second derivative {\psi_k''} is {n^2} when {-3/2< nx-k< -1/2}, is {-2n^2} when {|nx-k| < 1/2}, and is {n^2} again when {1/2 < nx-k < 3/2}. This is enough to figure out the formula for {\psi_k}:

{\displaystyle \psi_k(n) = \begin{cases} (nx-k+3/2)^2 / 2, & -3/2\le nx -k\le -1/2 \\ 3/4 -(nx-k)^2, & -1/2\le nx-k \le 1/2 \\ (nx-k-3/2)^2 / 2, & 1/2\le nx -k \le 3/2 \\ \end{cases} }

These look like:

Is this right?

Nice! But wait a moment, the sum near the endpoints is not constant: it is less than 1 because we do not get a contributions of two splines to the left and right of the interval. To correct for this boundary effect, replace {\psi_0} with {\psi_0 + \psi_{-1}} and {\psi_n} with {\psi_n + \psi_{n+1}}, using “ghost” elements of the basis that lie outside of the actual grid. Now the quadratic B-spline basis is correct:

A smooth partition of unity by quadratic splines

Does this partition of unity preserve monotinicity? Yes, it does: {\displaystyle \sum_{k\ge m}\psi_k'(x) = n\sum_{k\ge m} (\phi_k(x+h/2) - \phi_k(x-h/2)) = n(s(x+h/2) - s(x-h/2))} which is nonnegative because the sum {s := \sum_{k\ge m} \phi_k} is an increasing piecewise linear function, as noted previously. Same logic works for B-splines of higher degree.

In conclusion, here is a quadratic B-spline approximation (orange) to a tricky increasing function (blue).

Smooth approximation

One may wonder why the orange curve deviates from the line at the end – did we miss some boundary effect there? Yes, in a way… the spline actually approximates the continuous extension of our original function by constant values on the left and right. Imagine the blue graph continuing to the right as a horizontal line: this creates a corner at {x=1} and the spline is smoothing that corner. To avoid this effect, one may want to extend {f} in a better way and then work with the extended function, not folding the ghosts {\psi_{-1}, \psi_{n+1}} into {\psi_0, \psi_n}.

But even so, B-spline achieves a better approximation than the Bernstein polynomial with the same number of basis functions (eight):

Bernstein polynomial

The reason is the non-local nature of the polynomial basis {\varphi_k}, which was noted above. Bernstein polynomials do match the function perfectly at the endpoints, but this is small consolation.

Critical points of a cubic spline

The choice of piecewise polynomials of degree 3 for interpolation is justifiably popular: even-degree splines are algebraically awkward to construct, degree 1 is simply piecewise linear interpolation (not smooth), and degree 5, while feasible, entails juggling too many coefficients. Besides, a cubic polynomial minimizes the amount of wiggling (the integral of second derivative squared) for given values and slopes at the endpoints of an interval. (Recall Connecting dots naturally.)

But the derivative of a cubic spline is a quadratic spline. And one needs the derivative to find the critical points. This results in an awkward example in SciPy documentation, annotated with “(NB: sproot only works for order 3 splines, so we fit an order 4 spline)”.

Although not implemented in SciPy, the task of computing the roots of a quadratic spline is a simple one. Obtaining the roots from the internal representation of a quadratic spline in SciPy (as a linear combination of B-splines) would take some work and reading. But a quadratic polynomial is determined by three values, so sampling it at three points, such as two consecutive knots and their average, is enough.

Quadratic formula with values instead of coefficients

Suppose we know the values of a quadratic polynomial q at -1, 0, 1, and wish to find if it has roots between -1 and 1. Let’s normalize so that q(0)=1, and let x = q(-1), y = q(1). If either x or y is negative, there is definitely a root on the interval. If they are positive, there is still a chance: we need the parabola to be concave up, have a minimum within [-1, 1], and for the minimum to be negative. All of this is easily determined once we note that the coefficients of the polynomial are a = (x+y)/2 – 1, b = (y-x)/2, and c = 1.

The inequality {(x-y)^2 \ge 8(x+y-2)} ensures the suitable sign of the discriminant. It describes a parabola with vertex (1, 1) and focus (2, 2), contained in the first quadrant and tangent to the axes at (4, 0) and (0, 4). Within the orange region there are no real roots.

No real roots in the orange region

The line x+y=2, tangent to the parabola at its vertex, separates convex and concave parabolas. While concavity in conjunction with x, y being positive definitely precludes having roots in [-1, 1], slight convexity is not much better: it results in real roots outside of the interval. Here is the complete picture: green means there is a root in [-1, 1], orange means no real roots, red covers the rest.

Green = there is a root in the interval [-1, 1]

Back to splines

Since the derivative of a spline is implemented in SciPy (B-splines have a nice formula for derivatives), all we need is a root-finding routine for quadratic splines. Here it is, based on the above observations but using built-in NumPy polynomial solver np.roots to avoid dealing with various special cases for the coefficients.

def quadratic_spline_roots(spl):
    roots = []
    knots = spl.get_knots()
    for a, b in zip(knots[:-1], knots[1:]):
        u, v, w = spl(a), spl((a+b)/2), spl(b)
        t = np.roots([u+w-2*v, w-u, 2*v])
        t = t[np.isreal(t) & (np.abs(t) <= 1)]
        roots.extend(t*(b-a)/2 + (b+a)/2)
    return np.array(roots)

A demonstration, which plots the spline (blue), its critical points (red), and original data points (black) as follows:

There can be 0, 1, or 2 critical points between two knots

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.arange(7)
y = np.array([3, 1, 1, 2, 2, 4, 3])
f = InterpolatedUnivariateSpline(x, y, k=3)
crit_pts = quadratic_spline_roots(f.derivative())

t = np.linspace(x[0], x[-1], 500)
plt.plot(t, f(t))
plt.plot(x, y, 'kd')
plt.plot(crit_pts, f(crit_pts), 'ro')


Between a cubic spline and the line of regression

Given some data points {(x_i,y_i)} one can devise various functions that extract information from them. Such as

  • Regression line (considered in When the digits of {\pi} go to 11): it detects a linear trend {y=ax+b}, minimizing the sum of squares of residuals {y_i-(ax_i+b)}.
  • Natural cubic spline (considered in Connecting dots naturally), which passes through every data point while minimizing the amount of wiggling, measured by the square of its second derivative. Like this:

Natural cubic spline
Natural cubic spline

A smoothing spline is something in between the above extremes: it insists on neither being a line (i.e., having zero second derivative) nor on passing through given points (i.e., having zero residuals). Instead, it minimizes a weighted sum of both things: the integral of second derivative squared, and the sum of residuals squared. Like this plot, where the red points are given but the spline chooses to interpolate the green ones:

Smoothing spline
Smoothing spline

I’ll demonstrate a version of a smoothing spline that might not be exactly canonical, but is very easy to implement in Matlab or Scilab (which I prefer to use). As in Connecting dots naturally, assume the knots {a=x_0<x_1<\dots<x_n=b} equally spaced at distance {h=(b-a)/n}. The natural cubic spline is determined by the values of its second derivative at {x_1,\dots,x_{n-1}}, denoted {z_1,\dots,z_n}. As explained earlier, these can be found from the linear system

\displaystyle    \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1}\end{pmatrix}   = - \frac{1}{h} \begin{pmatrix} \Delta^2 y_1 \\ \Delta^2 y_2 \\ \vdots \\ \Delta^2 y_{n-1}\end{pmatrix}

where the column on the right contains the amounts by which the derivative of the piecewise linear interpolant through given points jumps at every knot. The notation {\Delta^2y } means the second difference of {(y_i)}, for example {\Delta^2y_1 =y_2-2y_1+y_0}.

A smoothing spline is also a natural spline, but for a different set of points {(x_i,\tilde y_i)}. One has to find {\tilde y_i} that minimize a weighted sum of {\sum_i (\tilde y_i-y_i)^2} and of {\int_a^b (f''(x))^2\,dx}. The latter integral is easily computed in terms of {z_i}: it is equal to {\frac{h}{3}\sum_{i=1}^{n}(z_i^2+z_{i-1}^2+z_iz_{i-1})}. Since this quadratic form is comparable to {\sum_{i=1}^{n}z_i^2}, I’ll work with the latter instead.

The idea is to set up an underdetermined system for {z_i} and {\tilde y_i-y_i}, and let Scilab find a least squares solution of that. Let’s introduce a weight parameter {\lambda>0} that will determine the relative importance of curvature vs residuals. It is convenient to let {\tilde y_i=y_i+\lambda h^2 \delta_i}, so that both {\delta_i} and {z_i} (second derivative) scale the same way. The terms {\delta_i} contributes to the linear system for {z_i}, since the right hand side now has {\tilde y_i} instead of {y_i}. This contribution is {- \lambda h \Delta^2 \delta}. Moving it to the left hand-side (since {\delta_i} are unknowns) we obtain the following system.

\displaystyle    \begin{pmatrix} A & | & B \end{pmatrix} \begin{pmatrix} z \\ \delta \end{pmatrix}   = - \frac{1}{h} \Delta^2 y

where {A} is the same tridiagonal matrix as above, and {B} is the rectangular Laplacian-type matrix

\displaystyle    B = \frac{\lambda}{h} \begin{pmatrix} -1 & 2 & -1 & 0 & \ldots & 0 & 0 \\ 0 & -1 & 2 & -1 & \ldots & 0 & 0 \\ 0 & 0 & -1 & 2 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 2 & -1 \end{pmatrix}

All in all, the system has {2n } unknowns {z_1,\dots,z_{n-1},\delta_0,\dots,\delta_n}, and {(n-1)} equations, reflecting the continuity of first derivative at each interior knot. The lsq command of Scilab finds the solution with the least sum of squares of the unknowns, which is what we are after.

Time for some examples. {\lambda=0} and {\lambda=0.05} can be seen above. Here are more:

lambda = 0.1
lambda = 0.1

lambda = 1
lambda = 1

As {\lambda} increases, the spline approaches the regression line:

lambda = 10
lambda = 10


Finally, the Scilab code. It is almost the same as for natural spline; the difference is in five lines from B=... to newy=... The code after that is merely evaluation and plotting of the spline.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
lambda = 0.1

n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))

B = lambda/h*(diag(-2*ones(1,n+1))+diag(ones(1,n),1)+diag(ones(1,n),-1))
C = [A, B(2:$-1,:)]
sol = lsq(C,-jumps')'
z = [0,sol(1:n-1),0]
newy = y + lambda*h^2*sol(n:$)

allx = []
spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = newy(i)*(xR-x)/h + newy(i+1)*(x-xL)/h
   correction = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
   allx = [allx, x]   
   spl = [spl, linear+correction] 

plot(allx, spl) 
plot(a:h:b, newy, 'g*')
plot(a:h:b, y, 'r*')

B-splines and probability

If one picks two real numbers {X_1,X_2} from the interval {[0,1]} (independent, uniformly distributed), their sum {S_2=X_1+X_2} has the triangular distribution.

Also known as the hat function
Also known as the hat function

The sum {S_3} of three such numbers has a differentiable probability density function:

Piecewise quadratic, C1 smooth
Piecewise quadratic, C1 smooth

And the density of {S_4=X_1+X_2+X_3+X_4} is smoother still: the p.d.f. has two
continuous derivatives.

This begins to look normal
This begins to look normal

As the number of summands increases, these distributions converge to normal if they are translated and scaled properly. But I am not going to do that. Let’s keep the number of summands to four at most.

The p.d.f. of {S_n} is a piecewise polynomial of degree {n-1}. Indeed, for {S_1=X_1} the density is piecewise constant, and the formula

\displaystyle  S_n(x) = \int_{x-1}^x S_{n-1}(t)\,dt

provides the inductive step.

For each {n}, the translated copies of function {S_n} form a partition of unity:

\displaystyle  \sum_{k\in\mathbb Z}S_n(x-k)\equiv 1

The integral recurrence relation gives an easy proof of this:

\displaystyle  \sum_{k\in\mathbb Z}\int_{x-k-1}^{x-k} S_{n-1}(t)\,dt = \int_{\mathbb R} S_{n-1}(t)\,dt = 1

And here is the picture for the quadratic case:

Piecewise quadratic partition of unity
Piecewise quadratic partition of unity

A partition of unity can be used to approximate functions by piecewise polynomials: just multiply each partition element by the value of the function at the center of the corresponding interval, and add the results.

Doing this with {S_2} amounts to piecewise linear interpolation: the original function {f(x)=e^{-x/2}} is in blue, the weighted sum of hat functions in red.

Using the function exp(-x/2)
PL interpolation of exp(-x/2)

With {S_4} we get a smooth curve.

Approximating exp(-x/2) with a cubic B-spline
Approximating exp(-x/2) with a cubic B-spline

Unlike interpolating splines, this curve does not attempt to pass through the given points exactly. However, it has several advantages over interpolating splines:

  • Is easier to calculate; no linear system to solve;
  • Yields positive function for positive data;
  • Yields monotone function for monotone data

Connecting dots naturally

How to draw a nice curve through given points {(x_i,y_i)}?

Some data
Some data

One way is to connect them with straight lines, creating a piecewise linear function:

Piecewise linear interpolant
Piecewise linear interpolant

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

\displaystyle    \int_0^{10} \sqrt{1+(f'(x))^2}\,dx

among all functions with {f(x_i)=y_i}. As is often the case, the length functional can be replaced with the elastic energy

\displaystyle    \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx

because the piecewise linear {f} (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could try to fit a polynomial function to these points, which is guaranteed to be smooth. With 11 points we need a 10th degree polynomial. The result is disappointing:

Interpolating polynomial
Interpolating polynomial

It is not natural for a curve connecting the points with {1\le y\le 9} to shoot up over {y=40}. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

\displaystyle    \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx

and the function that minimizes it subject to conditions {f(x_i)=y_i} looks very nice indeed:

Natural cubic spline
Natural cubic spline

The Euler-Lagrange equation for the functional {\mathcal{B}} dictates that the fourth derivative of {f} is zero in the intervals between the knots {x_i}. Thus, {f} is a piecewise cubic polynomial. Also, both {f} and {f'} must be continuous for any function with integrable second derivative. More delicate analysis is required for {f''}, but it also can be shown to be continuous for minimizing function {f}; moreover, {f''} must vanish at the endpoints {0} and {10}. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term {C} to the piecewise linear interpolant {L}. Here the spline is shown together with {L} (green) and {C} (magenta).

PL interpolant, correction term, and their sum: the cubic spline
PL interpolant, correction term, and their sum: cubic spline

On each interval {[x_i,x_{i+1}]} the correction term {C} is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

\displaystyle    C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2

on this interval. There are 20 coefficients {\alpha_i}, {\beta_i} to find. At each of 9 knots 1,2,…9 we have two conditions: {C''} must have removable singularity and {C'} must jump by the amount opposite to the jump of {L'}. Since {C''} also vanishes at {1,10}, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of {z_i=C''(x_i)}. Indeed, the values of {C''} at two consecutive knots determine the correction term within: {\alpha_i= \dfrac{z_{i+1}+2z_i}{6} } and {\beta_i = \dfrac{2z_{i+1}+z_i}{6}}. This leaves {n-1} equations (from the jumps of {C'}) for {n-1} unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

\displaystyle    \frac{1}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\    1 & 4 & 1 & 0 & \ldots & 0 & 0 \\    0 & 1 & 4 & 1 & \ldots & 0 & 0 \\    \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\    0 & 0 & 0 & 0 & \ldots & 1 & 4    \end{pmatrix}

One can solve the system for {z_i} within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))
z = [0,-jumps/A,0] 
allx = []; spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h
   cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
   allx = [allx, x]   
   spl = [spl, linear + cor] 
plot(allx, spl) 
plot(a:h:b, y, 'r*')