B-splines and probability

If one picks two real numbers {X_1,X_2} from the interval {[0,1]} (independent, uniformly distributed), their sum {S_2=X_1+X_2} has the triangular distribution.

Also known as the hat function
Also known as the hat function

The sum {S_3} of three such numbers has a differentiable probability density function:

Piecewise quadratic, C1 smooth
Piecewise quadratic, C1 smooth

And the density of {S_4=X_1+X_2+X_3+X_4} is smoother still: the p.d.f. has two
continuous derivatives.

This begins to look normal
This begins to look normal

As the number of summands increases, these distributions converge to normal if they are translated and scaled properly. But I am not going to do that. Let’s keep the number of summands to four at most.

The p.d.f. of {S_n} is a piecewise polynomial of degree {n-1}. Indeed, for {S_1=X_1} the density is piecewise constant, and the formula

\displaystyle  S_n(x) = \int_{x-1}^x S_{n-1}(t)\,dt

provides the inductive step.

For each {n}, the translated copies of function {S_n} form a partition of unity:

\displaystyle  \sum_{k\in\mathbb Z}S_n(x-k)\equiv 1

The integral recurrence relation gives an easy proof of this:

\displaystyle  \sum_{k\in\mathbb Z}\int_{x-k-1}^{x-k} S_{n-1}(t)\,dt = \int_{\mathbb R} S_{n-1}(t)\,dt = 1

And here is the picture for the quadratic case:

Piecewise quadratic partition of unity
Piecewise quadratic partition of unity

A partition of unity can be used to approximate functions by piecewise polynomials: just multiply each partition element by the value of the function at the center of the corresponding interval, and add the results.

Doing this with {S_2} amounts to piecewise linear interpolation: the original function {f(x)=e^{-x/2}} is in blue, the weighted sum of hat functions in red.

Using the function exp(-x/2)
PL interpolation of exp(-x/2)

With {S_4} we get a smooth curve.

Approximating exp(-x/2) with a cubic B-spline
Approximating exp(-x/2) with a cubic B-spline

Unlike interpolating splines, this curve does not attempt to pass through the given points exactly. However, it has several advantages over interpolating splines:

  • Is easier to calculate; no linear system to solve;
  • Yields positive function for positive data;
  • Yields monotone function for monotone data

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s