Irrational sunflowers

A neat way to visualize a real number {\alpha} is to make a sunflower out of it. This is an arrangement of points with polar angles {2 \pi \alpha k} and polar radii {\sqrt{k}}, for {k=1,2,\dots} (so that concentric disks around the origin contain a number of points proportional to their area). The prototypical sunflower has {\alpha=(\sqrt{5}+1)/2}, the golden ratio. This is about the most uniform arrangement of points within a disk that one can get.

Golden ratio sunflower

But nothing stops us from using other numbers. The square root of 5 is not nearly as uniform, forming distinct spirals.

Square root of 5

The number {e} begins with spirals, but quickly turns into something more uniform.

Euler’s sunflower

The number {\pi} has stronger spirals: seven of them, due to the approximation {\pi\approx 22/7}.

pi sunflower

Of course, if {\pi} was actually {22/7}, the arrangement would have rays instead of spirals:

Rational sunflower: 22/7

What if we used more points? The previous pictures have 500 points; here is {\pi} with {3000}. The new pattern has 113 rays: {\pi\approx 355/113}.

pi with 3000 points

Apéry’s constant, after beginning with five spirals, refuses to form rays or spirals again even with 3000 points.

Apéry’s constant, 3000 points

The images were made with Scilab as follows, with the index offset by 1/2 under the square root so that the center of the sunflower does not stick out too much.

n = 500                           // number of points
alpha = (sqrt(5)+1)/2             // the number being visualized
r = sqrt([1:n]-1/2)               // polar radii, with the 1/2 offset
theta = 2*%pi*alpha*[1:n]         // polar angles
plot(r.*cos(theta), r.*sin(theta), '*');
set(gca(), "isoview", "on")       // same scale on both axes
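
For the other pictures only the first two lines change; for instance, the denser {\pi} sunflower shown above corresponds to

n = 3000
alpha = %pi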

Between a cubic spline and the line of regression

Given some data points {(x_i,y_i)} one can devise various functions that extract information from them. Such as

  • Regression line (considered in When the digits of {\pi} go to 11): it detects a linear trend {y=ax+b}, minimizing the sum of squares of residuals {y_i-(ax_i+b)}.
  • Natural cubic spline (considered in Connecting dots naturally), which passes through every data point while minimizing the amount of wiggling, measured by the integral of the square of its second derivative. Like this:
Natural cubic spline

A smoothing spline is something in between the above extremes: it insists on neither being a line (i.e., having zero second derivative) nor on passing through given points (i.e., having zero residuals). Instead, it minimizes a weighted sum of both things: the integral of second derivative squared, and the sum of residuals squared. Like this plot, where the red points are given but the spline chooses to interpolate the green ones:

Smoothing spline

I’ll demonstrate a version of a smoothing spline that might not be exactly canonical, but is very easy to implement in Matlab or Scilab (which I prefer to use). As in Connecting dots naturally, assume the knots {a=x_0<x_1<\dots<x_n=b} are equally spaced at distance {h=(b-a)/n}. The natural cubic spline is determined by the values of its second derivative at {x_1,\dots,x_{n-1}}, denoted {z_1,\dots,z_{n-1}}. As explained earlier, these can be found from the linear system

\displaystyle    \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1}\end{pmatrix}   = - \frac{1}{h} \begin{pmatrix} \Delta^2 y_1 \\ \Delta^2 y_2 \\ \vdots \\ \Delta^2 y_{n-1}\end{pmatrix}

where the column on the right contains the amounts by which the derivative of the piecewise linear interpolant through given points jumps at every knot. The notation {\Delta^2y } means the second difference of {(y_i)}, for example {\Delta^2y_1 =y_2-2y_1+y_0}.

A smoothing spline is also a natural spline, but for a different set of points {(x_i,\tilde y_i)}. One has to find {\tilde y_i} that minimize a weighted sum of {\sum_i (\tilde y_i-y_i)^2} and of {\int_a^b (f''(x))^2\,dx}. The latter integral is easily computed in terms of {z_i}: it is equal to {\frac{h}{3}\sum_{i=1}^{n}(z_i^2+z_{i-1}^2+z_iz_{i-1})}. Since this quadratic form is comparable to {\sum_{i=1}^{n}z_i^2}, I’ll work with the latter instead.
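
Here is the quick computation behind that formula: on {[x_{i-1},x_i]} the second derivative of the spline is linear, with values {z_{i-1}} and {z_i} at the endpoints, so

\displaystyle    \int_{x_{i-1}}^{x_i} (f''(x))^2\,dx = \int_0^{h} \left( z_{i-1}\,\frac{h-t}{h} + z_i\,\frac{t}{h}\right)^2 dt = \frac{h}{3}\left(z_{i-1}^2+z_{i-1}z_i+z_i^2\right)

and summing over the intervals gives the stated expression.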

The idea is to set up an underdetermined system for {z_i} and {\tilde y_i-y_i}, and let Scilab find a least squares solution of it. Let’s introduce a weight parameter {\lambda>0} that will determine the relative importance of curvature vs. residuals. It is convenient to let {\tilde y_i=y_i+\lambda h^2 \delta_i}, so that both {\delta_i} and {z_i} (second derivative) scale the same way. The terms {\delta_i} contribute to the linear system for {z_i}, since the right-hand side now has {\tilde y_i} instead of {y_i}. This contribution is {- \lambda h \Delta^2 \delta}. Moving it to the left-hand side (since {\delta_i} are unknowns) we obtain the following system.

\displaystyle    \begin{pmatrix} A & | & B \end{pmatrix} \begin{pmatrix} z \\ \delta \end{pmatrix}   = - \frac{1}{h} \Delta^2 y

where {A} is the same tridiagonal matrix as above, and {B} is the rectangular Laplacian-type matrix

\displaystyle    B = \frac{\lambda}{h} \begin{pmatrix} -1 & 2 & -1 & 0 & \ldots & 0 & 0 \\ 0 & -1 & 2 & -1 & \ldots & 0 & 0 \\ 0 & 0 & -1 & 2 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 2 & -1 \end{pmatrix}

All in all, the system has {2n } unknowns {z_1,\dots,z_{n-1},\delta_0,\dots,\delta_n}, and {(n-1)} equations, reflecting the continuity of first derivative at each interior knot. The lsq command of Scilab finds the solution with the least sum of squares of the unknowns, which is what we are after.
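
A tiny illustration of how lsq behaves (my own toy example, not from the post): for an underdetermined consistent system it returns the solution of least Euclidean norm.

A = [1 1 0; 0 1 1]      // 2 equations, 3 unknowns
b = [1; 1]
x = lsq(A, b)           // least-norm solution, here [1/3; 2/3; 1/3]
norm(A*x - b)           // residual is zero (up to rounding)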

Time for some examples. {\lambda=0} and {\lambda=0.05} can be seen above. Here are more:

lambda = 0.1
lambda = 1

As {\lambda} increases, the spline approaches the regression line:

lambda = 10
lambda=100

Finally, the Scilab code. It is almost the same as for the natural spline; the difference is in the five lines from B=... to newy=... The code after that is merely the evaluation and plotting of the spline.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
lambda = 0.1

n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))

B = lambda/h*(diag(-2*ones(1,n+1))+diag(ones(1,n),1)+diag(ones(1,n),-1))  // second-difference matrix acting on delta
C = [A, B(2:$-1,:)]                  // combined underdetermined system for [z, delta]
sol = lsq(C,-jumps')'                // least-norm solution
z = [0,sol(1:n-1),0]                 // z at the knots; zero at the endpoints
newy = y + lambda*h^2*sol(n:$)       // adjusted y-values, interpolated by the spline

allx = []
spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = newy(i)*(xR-x)/h + newy(i+1)*(x-xL)/h
   correction = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
   allx = [allx, x]   
   spl = [spl, linear+correction] 
end

plot(allx, spl) 
plot(a:h:b, newy, 'g*')
plot(a:h:b, y, 'r*')

Normalizing to zero mean or median

Pick two random numbers {x_1,x_2} from the interval {[0,1]}; independent, uniformly distributed. Normalize them to have mean zero, which simply means subtracting their mean {(x_1+x_2)/2} from each. Repeat many times. Plot the histogram of all numbers obtained in the process.
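
A minimal sketch of how these histograms can be produced (the same pattern as the Scilab code at the end of this section, with the mean in place of the median):

k = 2                              // two numbers per sample
A = rand(200000,k)                 // each row is one sample
A = A - mean(A,'c').*.ones(1,k)    // subtract each row's mean from its entries
histplot(100,A(:))                 // histogram of all normalized numbers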

Two random numbers normalized to zero mean

No surprise here. In effect this is the distribution of {Y=(X_1-X_2)/2} with {X_1,X_2} independent and uniformly distributed over {[0,1]}. The probability density function of {Y} is found via convolution, and the convolution of {\chi_{[0,1]}} with itself is a triangular function.
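
Explicitly, {X_1-X_2} has the triangular density {1-|t|} on {[-1,1]}, so the density of {Y} is

\displaystyle    p_Y(y) = 2(1-2|y|), \qquad |y|\le 1/2

(and zero otherwise), which is the triangle seen in the histogram above.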

Repeat the same with four numbers {x_1,\dots,x_4}, again subtracting the mean. Now the distribution looks vaguely bell-shaped.

Four random numbers normalized to zero mean

With ten numbers or more, the distribution is not so bell-shaped anymore: the top is too flat.

Ten random numbers normalized to zero mean

The mean now follows an approximately normal distribution, but the fact that it’s subtracted from uniformly distributed {x_1,\dots,x_{10}} amounts to convolving the Gaussian with {\chi_{[0,1]}}. Hence the flattened top.


What if we use the median instead of the mean? With two numbers there is no difference: the median is the same as the mean. With four there is.

Four random numbers normalized to zero median

That’s an odd-looking distribution, with convex curves on both sides of a pointy maximum. And with {10} points it becomes even more strange.

Ten random numbers normalized to zero median

Scilab code:

k = 10                               // how many numbers in each sample
A = rand(200000,k)                   // each row is one sample
A = A - median(A,'c').*.ones(1,k)    // subtract each row's median from its entries
histplot(100,A(:))                   // histogram of all normalized numbers

Tossing a continuous coin

To generate a random number {X} uniformly distributed on the interval {[0,1]}, one can keep tossing a fair coin, record the outcomes as an infinite sequence {(d_k)} of 0s and 1s, and let {X=\sum_{k=1}^\infty 2^{-k} d_k}. Here is a histogram of {10^6} samples from the uniform distribution… nothing to see here, except maybe an incidental interference pattern.

Sampling the uniform distribution
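
Here is a sketch of how such samples could be produced by literal coin tossing, truncating the series at 30 digits (this mirrors the code used below for the continuous version; the figure itself may well have been made with rand directly):

n = 1e6
z = zeros(1,n)
for k = 1:30
    d = floor(2*rand(1,n))    // a fair coin toss: 0 or 1 with probability 1/2 each
    z = z + d/2^k
end
histplot(200,z)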

Let’s note that {X=\frac12 (d_1+X_1)} where {X_1=\sum_{k=1}^\infty 2^{-k} d_{k+1}} has the same distribution as {X} itself, and {d_1} is independent of {X_1}. This has an implication for the (constant) probability density function of {X}:

\displaystyle    p(x) = p(2x) + p(2x-1)

because {2 p(2x)} is the p.d.f. of {\frac12 X_1} and {2p(2x-1)} is the p.d.f. of {\frac12(1+X_1)}. Simply put, {p} is equal to the convolution of the rescaled function {2p(2x)} with the discrete measure {\frac12(\delta_0+\delta_{1/2})}.


Let’s iterate the above construction by letting each {d_k} be uniformly distributed on {[0,1]} instead of being constrained to the endpoints. This is like tossing a “continuous fair coin”. Here is a histogram of {10^6} samples of {X=\sum_{k=1}^\infty 2^{-k} d_k}; predictably, with more averaging the numbers gravitate toward the middle.

Sampling the Fabius distribution

This is not a normal distribution; the top is too flat. The plot was made with this Scilab code, putting n samples into b buckets:

n = 1e6                      // number of samples
b = 200                      // number of buckets
z = zeros(1,n)
for i = 1:10                 // 10 terms of the series suffice at this resolution
    z = z + rand(1,n)/2^i    // continuous "coin toss" d_i, weighted by 2^(-i)
end
c = histplot(b,z)            // plot the histogram, keeping the bar heights for the c.d.f. below

If this plot is too jagged, look at the cumulative distribution function instead:

Fabius function

It took just one more line of code: plot(linspace(0,1,b),cumsum(c)/sum(c))

Compare the two plots: the c.d.f. looks very similar to the left half of the p.d.f. It turns out, they are identical up to scaling.


Let’s see what is going on here. As before, {X=\frac12 (d_1+X_1)} where {X_1=\sum_{k=1}^\infty 2^{-k} d_{k+1}} has the same distribution as {X} itself, and the summands {d_1,X_1} are independent. But now that {d_1} is uniform, the implication for the p.d.f. of {X} is different:

\displaystyle    p(x) = \int_0^{1} 2p(2x-t)\,dt

This is a direct relation between {p} and its antiderivative. Incidentally, it shows that {p} is infinitely differentiable, because the right hand side always has one more derivative than the left hand side.
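
Indeed, substituting {u=2x-t} and writing {P} for an antiderivative of {p}, the relation becomes

\displaystyle    p(x) = \int_{2x-1}^{2x} 2p(u)\,du = 2\left(P(2x)-P(2x-1)\right)

so every antiderivative gained on the right yields one more derivative of {p} on the left.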


To state the self-similarity property of {X} in the cleanest way possible, one introduces the cumulative distribution function {F} (the Fabius function) and extends it beyond {[0,1]} by alternating even and odd reflections across the right endpoint. The resulting function satisfies the delay-differential equation {F\,'(x)=2F(2x)}: the derivative is a rescaled copy of the function itself.

Since {F} vanishes at the even integers, it follows that at every dyadic rational, all but finitely many derivatives of {F} are zero. The Taylor expansion at such points is a polynomial, while {F} itself is not. Thus, {F} is nowhere analytic despite being everywhere {C^\infty}.
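
One way to see this: differentiating {F\,'(x)=2F(2x)} repeatedly gives

\displaystyle    F^{(n)}(x) = 2^{n(n+1)/2}\, F(2^n x)

and if {x} is a dyadic rational with denominator {2^k}, then {2^n x} is an even integer for every {n>k}, so all these derivatives vanish.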

This was, in fact, the motivation for J. Fabius to introduce this construction in the 1966 paper Probabilistic Example of a Nowhere Analytic {C^\infty}-Function.

The Nelder-Mead minimization algorithm

It is easy to find the minimum of {f(x,y) = x^2+16y^2} if you are human. For a computer this takes more work:

Search for the minimum of x^2+16y^2

The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of {f}. Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?

A natural way to look for the minimum is to slide along the graph in the direction opposite to {\nabla f}; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of {f}: a discrete process of flips and flops. Each flip amounts to the triangle of contact being replaced by an adjacent triangle that shares a side with it. The triangle is flipped in the direction away from the highest vertex.

This is already a reasonable minimization algorithm: begin with a triangle {T}; find the values of {f} at the vertices of {T}; reflect the triangle away from the highest value; if the reflected point {R} has a smaller value, move there; otherwise stop.

But there’s a problem: the size of the triangle never changes in this process. If {T} is large, we won’t know where the minimum is even if {T} eventually covers it. If {T} is small, it will be moving in tiny steps.

Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of {T}. It is natural to contract it toward the “best” vertex (the one with the smallest value of {f}), replacing two other vertices with the midpoints of corresponding sides. Then repeat. The stopping condition can be the values of {f} at all vertices becoming very close to one another.

This looks clever, but the results are unspectacular. The algorithm is prone to converging to a non-stationary point where, just by accident, the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of {f}.

The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can grow smaller and larger, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:

  • Begin with some triangle {T}.
  • Evaluate the function {f} at each vertex. Call the vertices {W,G,B} where {W} is the worst one (the largest value of {f}) and {B} is the best.
  • Reflect {W} about the midpoint of the good side {GB}. Let {R} be the reflected point.
  • If {f(R)<f(B)}, then we consider moving even further in the same direction, extending the segment {WR} beyond {R} by half the length of {WR}; call the resulting point {E}. Choose between {R} and {E} based on where {f} is smaller, and make the chosen point a new vertex of our triangle, replacing {W}.
  • Else, do not reflect and instead shrink the triangle toward {B}.
  • Repeat, stopping when we either exceed the maximum number of iterations or all values of {f} at the vertices of the triangle become nearly equal.

(The full version of the Nelder-Mead algorithm also includes the comparison of {R} with {G}, and also involves trying a point inside the triangle.)

Rosenbrock’s function

This is Rosenbrock’s function {f(x,y)=100(x^2-y)^2 + (x-1)^2}, one of the standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola {y=x^2}. At the bottom of the valley, the incline toward the minimum {(1,1)} is relatively small compared to the steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.

The algorithm described above gets within {0.001} of the minimum in 65 steps.

Minimizing Rosenbrock’s function

In conclusion, here is the Scilab code for this algorithm.

deff('z = f(x,y)', 'z = 100*(x.^2-y).^2 + (x-1).^2')   // Rosenbrock's function (not defined in the original snippet)
x = -0.4:0.1:1.6; y = -2:0.1:1.4          // viewing window  
[X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30)  // contour plot
plot([1],[1],'r+')                             // minimum point
tol = 10^(-6)
n = 0
T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5]        //  initial triangle
for i=1:3 
    values(i) = f(T(i,1), T(i,2))
end
while (%T) 
    xpoly(T(:,1),T(:,2),'lines',1)         // draw the triangle  
    [values, index] = gsort(values)          // sort the values 
    T = T(index,:)       
    if values(1)-values(3) < tol            // close enough?
        mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2))
        break 
    end
    R = T(2,:) + T(3,:) - T(1,:)             // reflected  
    fR = f(R(1),R(2))
    if fR < values(3)                         
        E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:)  // extended  
        fE = f(E(1),E(2))
        if fE < fR 
            T(1,:)=E; values(1)=fE     // pick extended 
        else
            T(1,:)=R; values(1)=fR     // pick reflected 
        end
    else 
        for i=1:2
            T(i,:) = (T(i,:)+T(3,:))/2      // shrink
            values(i) = f(T(i,1), T(i,2))      
        end
    end
    n = n+1
    if n >= 200
        disp('Failed to converge'); break    //  too bad 
    end
end

Fractal-ish monotone functions

There are several ways to construct a strictly increasing continuous function which has zero derivative almost everywhere. I like the explicit construction given by R. Salem, On some singular monotonic functions which are strictly increasing (1943).

lambda = 1/4

Here is a streamlined version of the construction.

Fix {\lambda\in (0,1/2)} (in the picture above, {\lambda=1/4}). Let {f_0(x)=x}, and inductively define {f_{n+1}} so that

  1. {f_{n+1} (x) = f_n(x)} when {x\in 2^{-n}\mathbb Z}.
  2. If {x\in 2^{-n}\mathbb Z}, let {f_{n+1}(x+2^{-n-1}) =\lambda f_n(x) + (1-\lambda) f_n(x+2^{-n})}.
  3. Now that {f_{n+1}} has been defined on {2^{-n-1}\mathbb Z}, extend it to {\mathbb R} by linear interpolation.
  4. Let {f=\lim f_n}.
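
For instance, with {\lambda=1/4} the first two steps give {f_1(1/2)=3/4}, then {f_2(1/4)=\frac34\cdot\frac34=\frac{9}{16}} and {f_2(3/4)=\frac14\cdot\frac34+\frac34\cdot 1=\frac{15}{16}}; between these points each {f_n} is linear.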

Since {f(x+1)=f(x)+1} by construction, it suffices to understand the behavior of {f} on {[0,1]}.

Each {f_n} is piecewise linear and increasing. At each step of the construction, every line segment of {f_n} (say, with slope {s}) is replaced by two segments, with slopes {2(1-\lambda)s} and {2\lambda s}. Since {\lambda<1/2} and {f_n} is increasing, {\lambda f_n(x)+(1-\lambda)f_n(x+2^{-n})\ge \frac12\left(f_n(x)+f_n(x+2^{-n})\right)=f_n(x+2^{-n-1})}. Hence, {f_{n+1}\ge f_n}.

Since {f(x)=f_n(x)} when {x\in 2^{-n}\mathbb Z}, it is easy to understand {f} by considering its values at dyadic rationals and using monotonicity. This is how one can see that:

  • The difference of values of {f} at consecutive points of {2^{-n}\mathbb Z} is at most {(1-\lambda)^{n}}. Therefore, {f} is Hölder continuous with exponent {\alpha = - \frac{\log (1-\lambda)}{\log 2}}.
  • The difference of values of {f} at consecutive points of {2^{-n}\mathbb Z} is at least {\lambda^{n}}. Therefore, {f} is strictly increasing, and its inverse is Hölder continuous with exponent {\alpha = - \frac{\log 2}{\log \lambda}}.

It remains to check that {f'=0} almost everywhere. Since {f} is monotone, it is differentiable almost everywhere. Let {x} be a point of differentiability (and not a dyadic rational, though this is automatic). For each {n} there is {x_n\in 2^{-n}\mathbb Z} such that {x_n < x< x_n+2^{-n}}. Let {s_n = 2^{n} (f_n(x_n+2^{-n})-f_n(x_n))}; this is the slope of {f_n} on the {2^{-n}}-dyadic interval containing {x}. Since {f'(x)} exists, we must have {f'(x) = \lim_{n\rightarrow\infty} s_n}. On the other hand, the ratio of consecutive terms of this sequence, {s_{n+1}/s_n}, is always either {2 (1-\lambda )} or {2\lambda}. Such a sequence cannot have a finite nonzero limit. Thus {f'(x)=0}.


Here is another {f}, with {\lambda=1/8}.

lambda = 1/8

By making {\lambda} very small, and being more careful with the analysis of {f'}, one can make the Hausdorff dimension of the complement of {\{x \colon f'(x)=0\}} arbitrarily small.


An interesting modification of Salem’s function was introduced by Tukia in Hausdorff dimension and quasisymmetric mappings (1989). For the functions considered above, the one-sided derivatives at every dyadic rational are zero and infinity, which is a rather non-symmetric state of affairs. In particular, these functions are not quasisymmetric. But Tukia showed that if one alternates between {\lambda} and {1-\lambda} at every step, the resulting homeomorphism of {\mathbb R} becomes quasisymmetric. Here is a picture of the alternating construction with {\lambda=1/4}; preliminary stages of the construction are in green.

Tukia’s modification of Salem’s function

This is quite similar to how the introduction of alternating signs turns the Takagi curve (blancmange curve) into a quasiarc (i.e., a curve without cusps); see Sweetened and flavored dessert made from gelatinous or starchy ingredients and milk. But the fractal curves in this post are relatively mild-mannered: they are rectifiable (and thus not really fractal).


Here is the simple Scilab code I used for the above plots.

r = 1/4                                  // the parameter lambda
t = [0 1]
f = t
for i = 1:10
    t = [t, t(1:$-1)+2^(-i)]             // add the midpoints of the current dyadic grid
    f = [f, r*f(1:$-1)+(1-r)*f(2:$)]     // values at the midpoints, with weights r and 1-r
    [t, ind] = gsort(t,'g','i')          // sort the points in increasing order...
    f = f(ind)                           // ...and reorder the values to match
end
plot(t,f)

To have preliminary stages shown as well, move plot(t,f) into the loop. For Tukia’s alternating version, insert the line r = 1-r into the loop.

Connecting dots naturally

How to draw a nice curve through given points {(x_i,y_i)}?

Some data

One way is to connect them with straight lines, creating a piecewise linear function:

Piecewise linear interpolant

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

\displaystyle    \int_0^{10} \sqrt{1+(f'(x))^2}\,dx

among all functions with {f(x_i)=y_i}. As is often the case, the length functional can be replaced with the elastic energy

\displaystyle    \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx

because the piecewise linear {f} (and only it) minimizes this energy as well.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could try to fit a polynomial function to these points, which is guaranteed to be smooth. With 11 points we need a 10th degree polynomial. The result is disappointing:

Interpolating polynomial
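
For completeness, here is one way such an interpolating polynomial could be computed in Scilab (a sketch of my own, not code from the post); it solves a Vandermonde system, which is ill-conditioned but adequate for a picture:

x = 0:10
y = [3 1 4 1 5 9 2 6 5 3 5]
V = zeros(11,11)
for j = 0:10
    V(:,j+1) = (x.^j)'              // Vandermonde matrix: column j+1 holds x^j
end
c = V \ y'                          // coefficients of the degree-10 interpolant
p = poly(c','x','coeff')            // build the polynomial, constant term first
xx = linspace(0,10,500)
plot(xx, horner(p,xx))
plot(x, y, 'r*')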

It is not natural for a curve connecting the points with {1\le y\le 9} to shoot up over {y=40}. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

\displaystyle    \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx

and the function that minimizes it subject to conditions {f(x_i)=y_i} looks very nice indeed:

Natural cubic spline

The Euler-Lagrange equation for the functional {\mathcal{B}} dictates that the fourth derivative of {f} is zero in the intervals between the knots {x_i}. Thus, {f} is a piecewise cubic polynomial. Also, both {f} and {f'} must be continuous for any function with integrable second derivative. More delicate analysis is required for {f''}, but it can also be shown to be continuous for the minimizing function {f}; moreover, {f''} must vanish at the endpoints {0} and {10}. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term {C} to the piecewise linear interpolant {L}. Here the spline is shown together with {L} (green) and {C} (magenta).

PL interpolant, correction term, and their sum: the cubic spline

On each interval {[x_i,x_{i+1}]} the correction term {C} is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

\displaystyle    C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2

on this interval. There are 20 coefficients {\alpha_i}, {\beta_i} to find. At each of the 9 interior knots {1,2,\dots,9} we have two conditions: {C''} must have a removable singularity (the one-sided limits from the two adjacent intervals must agree) and {C'} must jump by the amount opposite to the jump of {L'}. Since {C''} also vanishes at the endpoints {0} and {10}, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of {z_i=C''(x_i)}. Indeed, the values of {C''} at two consecutive knots determine the correction term within: {\alpha_i= \dfrac{z_{i+1}+2z_i}{6} } and {\beta_i = \dfrac{2z_{i+1}+z_i}{6}}. This leaves {n-1} equations (from the jumps of {C'}) for {n-1} unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

\displaystyle    \frac{1}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\    1 & 4 & 1 & 0 & \ldots & 0 & 0 \\    0 & 1 & 4 & 1 & \ldots & 0 & 0 \\    \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\    0 & 0 & 0 & 0 & \ldots & 1 & 4    \end{pmatrix}

One can solve the system for {z_i} within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h         // jumps of L' at the interior knots (second differences of y)
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))    // tridiagonal matrix of the system
z = [0,-jumps/A,0]          // solve for z at the interior knots; z = 0 at the endpoints
allx = []; spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h
   cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
   allx = [allx, x]   
   spl = [spl, linear + cor] 
end
plot(allx, spl) 
plot(a:h:b, y, 'r*')