## Irrational sunflowers

A neat way to visualize a real number ${\alpha}$ is to make a sunflower out of it. This is an arrangement of points with polar angles ${2 \pi \alpha k}$ and polar radii ${\sqrt{k}}$ (so that the concentric disks around the origin get the number of points proportional to their area). The prototypical sunflower has ${\alpha=(\sqrt{5}+1)/2}$, the golden ratio. This is about the most uniform arrangement of points within a disk that one can get.

But nothing stops us from using other numbers. The square root of 5 is not nearly as uniform, forming distinct spirals.

The number ${e}$ begins with spirals, but quickly turns into something more uniform.

The number ${\pi}$ has stronger spirals: seven of them, due to ${\pi\approx 22/7}$ approximation.

Of course, if ${\pi}$ was actually ${22/7}$, the arrangement would have rays instead of spirals:

What if we used more points? The previous pictures have 500 points; here is ${\pi}$ with ${3000}$. The new pattern has 113 rays: ${\pi\approx 355/113}$.

Apéry’s constant, after beginning with five spirals, refuses to form rays or spirals again even with 3000 points.

The images were made with Scilab as follows, with an offset by 1/2 in the polar radius to prevent the center from sticking out too much.

n = 500
alpha = (sqrt(5)+1)/2
r = sqrt([1:n]-1/2)
theta = 2*%pi*alpha*[1:n]
plot(r.*cos(theta), r.*sin(theta), '*');
set(gca(), "isoview", "on")

## Between a cubic spline and the line of regression

Given some data points ${(x_i,y_i)}$ one can devise various functions that extract information from them. Such as

A smoothing spline is something in between the above extremes: it insists on neither being a line (i.e., having zero second derivative) nor on passing through given points (i.e., having zero residuals). Instead, it minimizes a weighted sum of both things: the integral of second derivative squared, and the sum of residuals squared. Like this plot, where the red points are given but the spline chooses to interpolate the green ones:

I’ll demonstrate a version of a smoothing spline that might not be exactly canonical, but is very easy to implement in Matlab or Scilab (which I prefer to use). As in Connecting dots naturally, assume the knots ${a=x_0 equally spaced at distance ${h=(b-a)/n}$. The natural cubic spline is determined by the values of its second derivative at ${x_1,\dots,x_{n-1}}$, denoted ${z_1,\dots,z_n}$. As explained earlier, these can be found from the linear system

$\displaystyle \frac{h}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1}\end{pmatrix} = - \frac{1}{h} \begin{pmatrix} \Delta^2 y_1 \\ \Delta^2 y_2 \\ \vdots \\ \Delta^2 y_{n-1}\end{pmatrix}$

where the column on the right contains the amounts by which the derivative of the piecewise linear interpolant through given points jumps at every knot. The notation ${\Delta^2y }$ means the second difference of ${(y_i)}$, for example ${\Delta^2y_1 =y_2-2y_1+y_0}$.

A smoothing spline is also a natural spline, but for a different set of points ${(x_i,\tilde y_i)}$. One has to find ${\tilde y_i}$ that minimize a weighted sum of ${\sum_i (\tilde y_i-y_i)^2}$ and of ${\int_a^b (f''(x))^2\,dx}$. The latter integral is easily computed in terms of ${z_i}$: it is equal to ${\frac{h}{3}\sum_{i=1}^{n}(z_i^2+z_{i-1}^2+z_iz_{i-1})}$. Since this quadratic form is comparable to ${\sum_{i=1}^{n}z_i^2}$, I’ll work with the latter instead.

The idea is to set up an underdetermined system for ${z_i}$ and ${\tilde y_i-y_i}$, and let Scilab find a least squares solution of that. Let’s introduce a weight parameter ${\lambda>0}$ that will determine the relative importance of curvature vs residuals. It is convenient to let ${\tilde y_i=y_i+\lambda h^2 \delta_i}$, so that both ${\delta_i}$ and ${z_i}$ (second derivative) scale the same way. The terms ${\delta_i}$ contributes to the linear system for ${z_i}$, since the right hand side now has ${\tilde y_i}$ instead of ${y_i}$. This contribution is ${- \lambda h \Delta^2 \delta}$. Moving it to the left hand-side (since ${\delta_i}$ are unknowns) we obtain the following system.

$\displaystyle \begin{pmatrix} A & | & B \end{pmatrix} \begin{pmatrix} z \\ \delta \end{pmatrix} = - \frac{1}{h} \Delta^2 y$

where ${A}$ is the same tridiagonal matrix as above, and ${B}$ is the rectangular Laplacian-type matrix

$\displaystyle B = \frac{\lambda}{h} \begin{pmatrix} -1 & 2 & -1 & 0 & \ldots & 0 & 0 \\ 0 & -1 & 2 & -1 & \ldots & 0 & 0 \\ 0 & 0 & -1 & 2 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 2 & -1 \end{pmatrix}$

All in all, the system has ${2n }$ unknowns ${z_1,\dots,z_{n-1},\delta_0,\dots,\delta_n}$, and ${(n-1)}$ equations, reflecting the continuity of first derivative at each interior knot. The lsq command of Scilab finds the solution with the least sum of squares of the unknowns, which is what we are after.

Time for some examples. ${\lambda=0}$ and ${\lambda=0.05}$ can be seen above. Here are more:

As ${\lambda}$ increases, the spline approaches the regression line:

Finally, the Scilab code. It is almost the same as for natural spline; the difference is in five lines from B=... to newy=... The code after that is merely evaluation and plotting of the spline.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
lambda = 0.1

n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))

B = lambda/h*(diag(-2*ones(1,n+1))+diag(ones(1,n),1)+diag(ones(1,n),-1))
C = [A, B(2:$-1,:)] sol = lsq(C,-jumps')' z = [0,sol(1:n-1),0] newy = y + lambda*h^2*sol(n:$)

allx = []
spl = []
for i=1:n
xL = a+h*(i-1)
xR = a+h*i
x = linspace(xL,xR,100)
linear = newy(i)*(xR-x)/h + newy(i+1)*(x-xL)/h
correction = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
allx = [allx, x]
spl = [spl, linear+correction]
end

plot(allx, spl)
plot(a:h:b, newy, 'g*')
plot(a:h:b, y, 'r*')

## Normalizing to zero mean or median

Pick two random numbers ${x_1,x_2}$ from the interval ${[0,1]}$; independent, uniformly distributed. Normalize them to have mean zero, which simply means subtracting their mean ${(x_1+x_2)/2}$ from each. Repeat many times. Plot the histogram of all numbers obtained in the process.

No surprise here. In effect this is the distribution of ${Y=(X_1-X_2)/2}$ with ${X_1,X_2}$ independent and uniformly distributed over ${[0,1]}$. The probability density function of ${Y}$ is found via convolution, and the convolution of ${\chi_{[0,1]}}$ with itself is a triangular function.

Repeat the same with four numbers ${x_1,\dots,x_4}$, again subtracting the mean. Now the distribution looks vaguely bell-shaped.

With ten numbers or more, the distribution is not so bell-shaped anymore: the top is too flat.

The mean now follows an approximately normal distribution, but the fact that it’s subtracted from uniformly distributed ${x_1,\dots,x_{10}}$ amounts to convolving the Gaussian with ${\chi_{[0,1]}}$. Hence the flattened top.

What if we use the median instead of the mean? With two numbers there is no difference: the median is the same as the mean. With four there is.

That’s an odd-looking distribution, with convex curves on both sides of a pointy maximum. And with ${10}$ points it becomes even more strange.

Scilab code:

k = 10
A = rand(200000,k)
A = A - median(A,'c').*.ones(1,k)
histplot(100,A(:))

## Tossing a continuous coin

To generate a random number ${X}$ uniformly distributed on the interval ${[0,1]}$, one can keep tossing a fair coin, record the outcomes as an infinite sequence ${(d_k)}$ of 0s and 1s, and let ${X=\sum_{k=1}^\infty 2^{-k} d_k}$. Here is a histogram of ${10^6}$ samples from the uniform distribution… nothing to see here, except maybe an incidental interference pattern.

Let’s note that ${X=\frac12 (d_1+x_1)}$ where ${X_1=\sum_{k=1}^\infty 2^{-k} d_{k+1}}$ has the same distribution as ${X}$ itself, and ${d_1}$ is independent of ${X_1}$. This has an implication for the (constant) probability density function of ${X}$:

$\displaystyle p(x) = p(2x) + p(2x-1)$

because ${2 p(2x)}$ is the p.d.f. of ${\frac12 X_1}$ and ${2p(2x-1)}$ is the p.d.f. of ${\frac12(1+X_1)}$. Simply put, ${p}$ is equal to the convolution of the rescaled function ${2p(2x)}$ with the discrete measure ${\frac12(\delta_0+\delta_1)}$.

Let’s iterate the above construction by letting each ${d_k}$ be uniformly distributed on ${[0,1]}$ instead of being constrained to the endpoints. This is like tossing a “continuous fair coin”. Here is a histogram of ${10^6}$ samples of ${X=\sum_{k=1}^\infty 2^{-k} d_k}$; predictably, with more averaging the numbers gravitate toward the middle.

This is not a normal distribution; the top is too flat. The plot was made with this Scilab code, putting n samples put into b buckets:

n = 1e6
b = 200
z = zeros(1,n)
for i = 1:10
z = z + rand(1,n)/2^i
end
c = histplot(b,z)


If this plot too jagged, look at the cumulative distribution function instead:

It took just more line of code: plot(linspace(0,1,b),cumsum(c)/sum(c))

Compare the two plots: the c.d.f. looks very similar to the left half of the p.d.f. It turns out, they are identical up to scaling.

Let’s see what is going on here. As before, ${X=\frac12 (d_1+X_1)}$ where ${X_1=\sum_{k=1}^\infty 2^{-k} d_{k+1}}$ has the same distribution as ${X}$ itself, and the summands ${d_1,X_1}$ are independent. But now that ${d_1}$ is uniform, the implication for the p.d.f of ${X}$ is different:

$\displaystyle p(x) = \int_0^{1} 2p(2x-t)\,dt$

This is a direct relation between ${p}$ and its antiderivative. Incidentally, if shows that ${p}$ is infinitely differentiable because the right hand side always has one more derivative than the left hand side.

To state the self-similarity property of ${X}$ in the cleanest way possible, one introduces the cumulative distribution function ${F}$ (the Fabius function) and extends it beyond ${[0,1]}$ by alternating even and odd reflections across the right endpoint. The resulting function satisfies the delay-differential equation ${F\,'(x)=2F(2x)}$: the derivative is a rescaled copy of the function itself.

Since ${F}$ vanishes at the even integers, it follows that at every dyadic rational, all but finitely many derivatives of ${F}$ are zero. The Taylor expansion at such points is a polynomial, while ${F}$ itself is not. Thus, ${F}$ is nowhere analytic despite being everywhere ${C^\infty}$.

This was, in fact, the motivation for J. Fabius to introduce this construction in 1966 paper Probabilistic Example of a Nowhere Analytic ${C^\infty}$-Function.

## The Nelder-Mead minimization algorithm

It is easy to find the minimum of ${f(x,y) = x^2+16y^2}$ if you are human. For a computer this takes more work:

The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of ${f}$. Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?

A natural way to look for minimum is to slide along the graph in the direction opposite to ${\nabla f}$; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of ${f}$; this is a discrete process of flips and flops. The process amounts to the triangle of contact being replaced by another triangle with an adjacent side. The triangle is flipped in the direction away from the highest vertex.

This is already a reasonable minimization algorithm: begin with a triangle ${T}$; find the values of ${f}$ at the vertices of ${T}$; reflect the triangle away from the highest value; if the reflected point ${R}$ has a smaller value, move there; otherwise stop.

But there’s a problem: the size of triangle never changes in this process. If ${T}$ is large, we won’t know where the minimum is even if ${T}$ eventually covers it. If ${T}$ is small, it will be moving in tiny steps.

Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of ${T}$. It is natural to contract it toward the “best” vertex (the one with the smallest value of ${f}$), replacing two other vertices with the midpoints of corresponding sides. Then repeat. The stopping condition can be the values of ${f}$ at all vertices becoming very close to one another.

This looks clever, but the results are unspectacular. The algorithm is prone to converge to a non-stationary point where just by an accident the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of ${f}$.

The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can grow smaller and larger, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:

• Begin with some triangle ${T}$.
• Evaluate the function ${f}$ at each vertex. Call the vertices ${W,G,B}$ where ${W}$ is the worst one (the largest value of ${f}$) and ${B}$ is the best.
• Reflect ${W}$ about the midpoint of the good side ${GB}$. Let ${R}$ be the reflected point.
• If ${f(R), then we consider moving even further in the same direction, extending the line ${WR}$ beyond ${R}$ by half the length of ${WR}$. Choose between ${R}$ and ${E}$ based on where ${f}$ is smaller, and make the chosen point a new vertex of our triangle, replacing ${W}$.
• Else, do not reflect and instead shrink the triangle toward ${B}$.
• Repeat, stopping when we either exceed the number of iterations or all values of ${f}$ at the vertices of triangle become nearly equal.

(The full version of the Nelder-Mead algorithm also includes the comparison of ${R}$ with ${G}$, and also involves trying a point inside the triangle.)

This is Rosenbrock’s function ${f(x,y)=100(x^2-y)^2 + (x-1)^2}$, one of standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola ${y=x^2}$. At the bottom of the valley, the incline toward the minimum ${(1,1)}$ is relatively small, compared to steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.

The algorithm described above gets within ${0.001}$ of the minimum in 65 steps.

In conclusion, Scilab code with this algorithm.

x = -0.4:0.1:1.6; y = -2:0.1:1.4          // viewing window
[X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30)  // contour plot
plot([1],[1],'r+')                             // minimum point
tol = 10^(-6)
n = 0
T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5]        //  initial triangle
for i=1:3
values(i) = f(T(i,1), T(i,2))
end
while (%T)
xpoly(T(:,1),T(:,2),'lines',1)         // draw the triangle
[values, index] = gsort(values)          // sort the values
T = T(index,:)
if values(1)-values(3) < tol            // close enough?
mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2))
break
end
R = T(2,:) + T(3,:) - T(1,:)             // reflected
fR = f(R(1),R(2))
if fR < values(3)
E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:)  // extended
fE = f(E(1),E(2))
if fE < fR
T(1,:)=E; values(1)=fE     // pick extended
else
T(1,:)=R; values(1)=fR     // pick reflected
end
else
for i=1:2
T(i,:) = (T(i,:)+T(3,:))/2      // shrink
values(i) = f(T(i,1), T(i,2))
end
end
n = n+1
if n >= 200
disp('Failed to converge'); break    //  too bad
end
end

## Fractal-ish monotone functions

There are several ways to construct a strictly increasing continuous function which has zero derivative almost everywhere. I like the explicit construction given by R. Salem, On some singular monotonic functions which are strictly increasing (1943).

Here is a streamlined version of the construction.

Fix ${\lambda\in (0,1/2)}$ (on the above picture ${\lambda=1/4}$). Let ${f_0(x)=x}$, and inductively define ${f_{n+1}}$ so that

1. ${f_{n+1} (x) = f_n(x)}$ when ${x\in 2^{-n}\mathbb Z}$.
2. If ${x\in 2^{-n}\mathbb Z}$, let ${f_{n+1}(x+2^{-n-1}) =\lambda f_n(x) + (1-\lambda) f_n(x+2^{-n})}$.
3. Now that ${f_{n+1}}$ has been defined on ${2^{-n-1}\mathbb Z}$, extend it to ${\mathbb R}$ by linear interpolation.
4. Let ${f=\lim f_n}$.

Since ${f(x+1)=f(x)+1}$ by construction, it suffices to understand the behavior of ${f}$ on ${[0,1]}$.

Each ${f_n}$ is piecewise linear and increasing. At each step of the construction, every line segment of ${f_n}$ (say, with slope ${s}$) is replaced by two segments, with slopes ${2(1-\lambda)s}$ and ${2\lambda s}$. Since ${\lambda f_n(x+2^{-n-1})}$. Hence, ${f_{n+1}\ge f_n}$.

Since ${f(x)=f_n(x)}$ when ${x\in 2^{-n}\mathbb Z}$, it is easy to understand ${f}$ by considering its values at dyadic rationals and using monotonicity. This is how one can see that:

• The difference of values of ${f}$ at consecutive points of ${2^{-n}\mathbb Z}$ is at most ${(1-r)^{n}}$. Therefore, ${f}$ is Hölder continuous with exponent ${\alpha = - \frac{\log (1-r)}{\log 2}}$.
• The difference of values of ${f}$ at consecutive points of ${2^{-n}\mathbb Z}$ is at least ${r^{n}}$. Therefore, ${f}$ is strictly increasing, and its inverse is Hölder continuous with exponent ${\alpha = - \frac{\log r}{\log 2}}$.

It remains to check that ${f'=0}$ almost everywhere. Since ${f}$ is monotone, it is differentiable almost everywhere. Let ${x}$ be a point of differentiability (and not a dyadic rational, though this is automatic). For each ${n}$ there is ${x_n\in 2^{-n}\mathbb Z}$ such that ${x_n < x< x_n+2^{-n}}$. Let ${s_n = 2^{n} (f_n(x_n+2^{-n})-f_n(x_n))}$; this is the slope of ${f_n}$ on the ${2^{-n}}$-dyadic interval containing ${x}$. Since ${f'(x)}$ exists, we must have ${f'(x) = \lim_{n\rightarrow\infty} s_n}$. On the other hand, the ratio of consecutive terms of this sequence, ${s_{n+1}/s_n}$, is always either ${2 (1-\lambda )}$ or ${2\lambda}$. Such a sequence cannot have a finite nonzero limit. Thus ${f'(x)=0}$.

Here is another ${f}$, with ${\lambda=1/8}$.

By making ${\lambda}$ very small, and being more careful with the analysis of ${f'}$, one can make the Hausdorff dimension of the complement of ${\{x \colon f'(x)=0\}}$ arbitrarily small.

An interesting modification of Salem’s function was introduced by Tukia in Hausdorff dimension and quasisymmetric mappings (1989). For the functions considered above, the one-sided derivatives at every dyadic rational are zero and infinity, which is a rather non-symmetric state of affair. In particular, these functions are not quasisymmetric. But Tukia showed that if one alternates between ${\lambda}$ and ${1-\lambda}$ at every step, the resulting homeomorphism of ${\mathbb R}$ becomes quasisymmetric. Here is the picture of alternating construction with ${\lambda=1/4}$; preliminary stages of construction are in green.

This is quite similar to how the introduction of alternating signs turns Takagi curve (blancmange curve) into a quasiarc (i.e., a curve without cusps); see Sweetened and flavored dessert made from gelatinous or starchy ingredients and milk. But the fractal curves in this post are relatively mild-mannered: they are rectifiable (and thus, not really fractal).

Here is the simple Scilab code I used for the above plots.

r = 1/4
t = [0 1]
f = t
for i = 1:10
t = [t, t(1:$-1)+2^(-i)] f = [f, r*f(1:$-1)+(1-r)*f(2:\$)]
[t, ind] = gsort(t,'g','i')
f = f(ind)
end
plot(t,f)


To have preliminary stages shown as well, move plot(t,f) into the loop. For Tukia’s alternating version, insert the line r = 1-r into the loop.

## Connecting dots naturally

How to draw a nice curve through given points ${(x_i,y_i)}$?

One way is to connect them with straight lines, creating a piecewise linear function:

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

$\displaystyle \int_0^{10} \sqrt{1+(f'(x))^2}\,dx$

among all functions with ${f(x_i)=y_i}$. As is often the case, the length functional can be replaced with the elastic energy

$\displaystyle \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx$

because the piecewise linear ${f}$ (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could try to fit a polynomial function to these points, which is guaranteed to be smooth. With 11 points we need a 10th degree polynomial. The result is disappointing:

It is not natural for a curve connecting the points with ${1\le y\le 9}$ to shoot up over ${y=40}$. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

$\displaystyle \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx$

and the function that minimizes it subject to conditions ${f(x_i)=y_i}$ looks very nice indeed:

The Euler-Lagrange equation for the functional ${\mathcal{B}}$ dictates that the fourth derivative of ${f}$ is zero in the intervals between the knots ${x_i}$. Thus, ${f}$ is a piecewise cubic polynomial. Also, both ${f}$ and ${f'}$ must be continuous for any function with integrable second derivative. More delicate analysis is required for ${f''}$, but it also can be shown to be continuous for minimizing function ${f}$; moreover, ${f''}$ must vanish at the endpoints ${0}$ and ${10}$. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term ${C}$ to the piecewise linear interpolant ${L}$. Here the spline is shown together with ${L}$ (green) and ${C}$ (magenta).

On each interval ${[x_i,x_{i+1}]}$ the correction term ${C}$ is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

$\displaystyle C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2$

on this interval. There are 20 coefficients ${\alpha_i}$, ${\beta_i}$ to find. At each of 9 knots 1,2,…9 we have two conditions: ${C''}$ must have removable singularity and ${C'}$ must jump by the amount opposite to the jump of ${L'}$. Since ${C''}$ also vanishes at ${1,10}$, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of ${z_i=C''(x_i)}$. Indeed, the values of ${C''}$ at two consecutive knots determine the correction term within: ${\alpha_i= \dfrac{z_{i+1}+2z_i}{6} }$ and ${\beta_i = \dfrac{2z_{i+1}+z_i}{6}}$. This leaves ${n-1}$ equations (from the jumps of ${C'}$) for ${n-1}$ unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

$\displaystyle \frac{1}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & 4 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & 4 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\ 0 & 0 & 0 & 0 & \ldots & 1 & 4 \end{pmatrix}$

One can solve the system for ${z_i}$ within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

a = 0; b = 10
y = [3 1 4 1 5 9 2 6 5 3 5]
n = length(y)-1
h = (b-a)/n
jumps = diff(y,2)/h
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))
z = [0,-jumps/A,0]
allx = []; spl = []
for i=1:n
xL = a+h*(i-1)
xR = a+h*i
x = linspace(xL,xR,100)
linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h
cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)
allx = [allx, x]
spl = [spl, linear + cor]
end
plot(allx, spl)
plot(a:h:b, y, 'r*')

## Random knight

A knight walks randomly on the standard chessboard. What is the proportion of time that it will spend at e4?

The answer (1/42) is not hard to get, but I’ll take the computational approach first (testing Scilab 5.5.0 beta 1, by the way). Begin by placing the knight at b1 and letting it make five random moves. This is the distribution of its position after five moves:

Unsurprisingly, half of the board is black; actually more than half because the knight can’t get to h8 in five moves. the other half isn’t — you can even get to h8 in five moves (want to find all possible ways to do this?).

After ten moves, the colors become more uniform.

After 200 (or any large even number) of moves, the distribution is little changed. But you may notice that it is centrally symmetric, while the previous one isn’t quite.

Let’s repeat the process beginning with two knights at b1 and g1. After five moves of each:

After ten moves:

After a large number of moves (does not matter, even or odd), the variety of colors is greatly reduced:

Indeed, this is the distribution which also describes the proportion of time that the knight (wherever it started) will spend at a given square Q.

Precisely, the proportion of time spent at Q is P(Q)=N(Q)/336 where N(Q) is the number of legal moves from Q. For the sixteen central squares P(Q) = 8/336 = 1/42, while for the corners we get 2/336 = 1/168.

Here is a quick argument to support the above. Let Q1 and Q2 be two squares such that the move from Q1 to Q2 is legal. The proportion of time that the knight makes this move is P(Q1)/N(Q1). Similarly, the time proportion of Q2-Q1 move is P(Q2)/N(Q2). Since the process is symmetric in time (we have a reversible Markov chain), P(Q1)/N(Q1)=P(Q2)/N(Q2). In other words, the ratio P/Q is the same for any two squares that are joined by a legal move; it follows that P/Q is the same for all cells. Finding the coefficient of proportionality is a matter of counting, since the sum of P(Q) over all Q is 1.

The Scilab code I wrote for this post is largely not knight-specific. The function update receives the initial distribution of a chess piece and the set of its legal moves. It computes the distribution after one random move.

function newState=update(state,moves)
[maxK,n]=size(moves)
[n,n]=size(state)
newState=zeros(state)
for i=1:n
for j=1:n
neighborCount=0
contribution=zeros(state)
for k=1:maxK
pos=[i,j]+moves(k,:)
if (min(pos)>=1)&(max(pos)<=n) then
neighborCount=neighborCount+1
contribution(pos(1),pos(2))=state(i,j)
end
end
newState=newState+contribution/neighborCount
end
end
endfunction

This set is given in the function knight which calls update iteratively.

function state=knight(state,iter)
moves=[2 1;1 2;-2 1;1 -2;2 -1;-1 2;-2 -1;-1 -2]
for i=1:iter
state=update(state,moves)
end
endfunction

For example, this is how I plotted the distribution after 10 random moves starting at b1:

initial = zeros(8,8)
initial(8,2)=1
state = knight(initial,10)
isoview(0,0,9,9)
Matplot(state*500)
f = gcf()
f.color_map = hotcolormap(32)

NB: getting the correct color representation of matrices from Matplot requires an up-to-date version of Scilab; in version 5.4.0 Matplot smoothens colors, producing intriguing but less informative images.

## Structure of the 3n+1 stopping time

Returning to the stopping time of the ${3n+1}$ process (first part here):

Here is the plot of the stopping time embellished with a few logarithmic functions.

This structure is explained by looking at the number of ${3x+1}$ operations that a number experiences before reaching ${1}$.

No ${3x+1}$ operations. Such number are obviously of the form ${2^m}$, with stopping time ${m}$. These creates the points ${(2^m,m)}$ which lie on the curve ${y=\log_2 x}$.

One ${3x+1}$ operation. To find such numbers, follow their orbit backward: a series of multiplication by ${2}$, then ${(x-1)/3}$ operation, then more multiplications by ${2}$. This leads to numbers of the form ${2^n \dfrac{2^{m}-1}{3}}$ where ${m}$ must be even in order for ${2^m-1}$ to be divisible by ${3}$. The stopping time is ${n+m+1}$. Since ${2^n \dfrac{2^{m}-1}{3} \approx \dfrac{2^{n+m}}{3}}$, the corresponding points lie close to the curve ${y=1+\log_2(3x)}$. Also notice that unlike the preceding case, clusters appear: there may be multiple pairs ${(n,m)}$ with even ${m}$ and the same ${n+m}$. The larger the sum ${n+2m}$ is, the more such pairs occur.

Two ${3x+1}$ operations. Tracing the orbit backwards again, we find that these are numbers of the form

${2^p\dfrac{2^n \dfrac{2^{m}-1}{3} -1}{3}}$

It is straightforward to work out the conditions on ${(m,n)}$ which allow both divisions by ${3}$ to proceed. They are: either ${n}$ is odd and ${m \equiv 4 \mod 6}$, or ${n}$ is even and ${m\equiv 2 \mod 6}$. In any event, the stopping time is ${m+n+p+2}$ and the number itself is approximately ${2^{m+n+p}/9}$. On the above chart, these points lie near the curve ${y=2+\log_2(9x)}$. Clustering will be more prominent than in the previous case, because we now deal with triples ${(n,m,p)}$ that will be nearby each other if ${n+m+p}$ is the same.

k) ${k}$ operations of ${3x+1}$ kind. These yield the points near the curve ${y=k+\log_2(3^k x)}$, or, to put it another way, ${y=k\log_2 6+\log_2(x)}$. The plot above shows such curves for ${k=0,1,\dots,11}$.

## Asymptotics of 3n+1 stopping time

It is a well-known open problem whether the following process terminates for every positive integer:

Experiments suggest that it does, possibly with unintended side effects.

Since for any odd integer ${n}$ the number ${3n+1}$ is even, it is convenient to replace the step ${n:=3n+1}$ with ${n:=(3n+1)/2}$, as show below:

As a function of ${n}$, the stopping time has a nice patterned graph:

An odd integer ${n}$ is of the form ${4k+1}$ or ${4k+3}$. In the first case, ${(3n+1)/2 = 6k+2}$ is even, while in the second case ${(3n+1)/2 = 6k+5}$ is odd. So, if ${n}$ is picked randomly from all odd integers of certain size, the probability of ${(3n+1)/2}$ being even is ${1/2}$. Similarly, for an even ${n}$, the number ${n/2}$ can be even or odd with probability ${1/2}$.

This leads to a stochastic model of the process:

The graph of stopping time in the stochastic model is, of course random. It looks nothing like the nice pattern of the deterministic process.

However, smoothing out both graphs by moving window of width ${200}$ or so, we see the similarity:

The stochastic process is much easier to analyze. Focusing on the logarithm ${\log x}$, we see that it changes either by ${\log(1/2)}$ or by approximately ${\log (3/2)}$. The expected value of the change is ${\displaystyle \frac12\log (3/4)}$. This suggests that we can expect the logarithm to drop down to ${\log 1=0}$ in about ${\displaystyle \frac{2}{\log (4/3)}\log x}$ steps. (Rigorous derivation requires more tools from probability theory, but is still routine.)

The curve ${\displaystyle \frac{2}{\log (4/3)}\log x}$ fits the experimental data nicely. (The red curve, being randomly generated, is different from the one on the previous graph.)

For an in-depth investigation, see Lower bounds for the total stopping time of 3X+1 iterates by Applegate and Lagarias.

For the computations, I used Scilab. The function hail(n,m) calculates the stopping times up to given value of n, and takes moving average with window size m (which can be set to 1 for no averaging).

 function hail(n,m) steps=zeros(1:n); steps(1)=0 for i=2:n k=i; s=0; while k>=i if modulo(k,2)==0 then k=k/2; s=s+1; else k=(3*k+1)/2; s=s+1; end end steps(i)=s+steps(k); end total = cumsum(steps) for i=1:n-m average(i)=(total(i+m)-total(i))/m; end plot(average,'+'); endfunction 

As soon as the result of computations drops below the starting value, the number of remaining steps is fetched from the array that is already computed. This speeds up the process a bit.

The second function follows the stochastic model, for which the aforementioned optimization is not available. This is actually an interesting point: it is conceivable that the stochastic model would be more accurate if it also used the pre-computed stopping time once ${x}$ drops below the starting value. This would change the distribution of stopping times, resulting in wider fluctuations after averaging.

 function randomhail(n,m) rsteps=zeros(1:n); rsteps(1)=0 for i=2:n k=i; s=0; while k>1 if grand(1,1,"bin",1,1/2)==0 then k=k/2; s=s+1; else k=(3*k+1)/2; s=s+1; end end rsteps(i)=s; end rtotal = cumsum(rsteps) for i=1:n-m raverage(i)=(rtotal(i+m)-rtotal(i))/m; end plot(raverage,'r+'); endfunction