Peanut allergy and polar interpolation

Draw a closed curve passing through the points {(\pm 3,0)} and {(0,\pm 1)}: more specifically, from (3,0) to (0,1) to (-3,0) to (0,-1) and back to (3,0).

How hard could it be?

I’ll wait.

… Done?

Okay, you may read further.

This is an interpolation problem. The geometry of the problem does not allow for curves of the form {y=f(x)}. But polar graphs {\rho=f(\theta)} should work. (Aside: in {(x,y)} the variable usually taken to be independent is listed first; in {(\rho,\theta)} it is listed second. Is consistent notation too much to ask for? Yes, I know the answer…) Since {f} must be {2\pi}-periodic, it is natural to interpolate {(3,0),(1,\pi/2),(3,\pi),(1,3\pi/2)} by a trigonometric polynomial. The polynomial {f(\theta)=2+\cos 2\theta} does the job. But does it do it well?
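For the record, the four interpolation conditions are immediate to check:

\displaystyle f(0)=2+\cos 0=3,\quad f(\pi/2)=2+\cos\pi=1,\quad f(\pi)=2+\cos 2\pi=3,\quad f(3\pi/2)=2+\cos 3\pi=1.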

I did not order peanuts

This is not what I would draw. My picture would be more like the ellipse with the given points as vertices:

Ellipse

The unnecessary concavity of the peanut graph comes from the second derivative {f''} being too large at the points of minimum (and too small at the points of maximum). We need a function with flat minima and sharp maxima. Here is the comparison between {f(\theta)=2+\cos 2\theta} (red) and the function that describes the ellipse in polar coordinates (blue):

We want blue, not red.

I thought of pre-processing: apply some monotone function {\varphi\colon (0,\infty)\rightarrow \mathbb R} to the {\rho}-values, interpolate, and then compose the interpolant with {\varphi^{-1}}. The function {\varphi} should have a large derivative near {0}, so that its inverse has a small derivative there, flattening the minima.

The first two candidates {\varphi(r)=\log r} and {\varphi(r)=1/r} did not improve the picture enough. But when I tried {\varphi(r)=1/r^2}, the effect surprised me. Interpolating the transformed values {(1/9,0),(1,\pi/2),(1/9,\pi),(1,3\pi/2)} yields {(5-4\cos 2\theta)/9}, and applying {\varphi^{-1}} produces {\rho = 3/\sqrt{5-4\cos 2\theta}}: exactly the ellipse pictured above.
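Indeed, substituting {x=\rho\cos\theta} and {y=\rho\sin\theta} into the equation {x^2/9+y^2=1} of the ellipse with vertices {(\pm 3,0)} and {(0,\pm 1)} gives

\displaystyle \rho^2\left(\frac{\cos^2\theta}{9}+\sin^2\theta\right)=1, \qquad \cos^2\theta+9\sin^2\theta=5-4\cos 2\theta,

so {\rho=3/\sqrt{5-4\cos 2\theta}} is the polar equation of that ellipse.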

Next, I tried a less symmetric example: a curve through {(5,0), (0,2), (-3,0), (0,-2)}.

Egg without preprocessing
Preprocessed egg

Again, the second approach produces a more natural looking curve.

Finally, a curve with five-fold symmetry, with {\rho} ranging from {2} to {5}.

Star without preprocessing
Star with preprocessing

The interpolation and plotting were done in Scilab, by invoking the functions given below with commands such as polar1([3,1,3,1]) or polar2([2,5,2,5,2,5,2,5,2,5]). Because the interpolation nodes are equally spaced, the coefficients of the interpolating trigonometric polynomial are nothing but the (inverse) discrete Fourier transform of the given values. The first function interpolates the radius {\rho} itself.

function polar1(R)
 clf()
 p = fft(R,1)  
 t = 0:.01:2*%pi
 rho = zeros(t)
 for i = 1:length(p)
     rho = rho + real(p(i)*exp(-%i*(i-1)*t))
 end
 polarplot(t,rho,style=5)
 t = resize_matrix(linspace(0,2*%pi,length(R)+1),1,length(R))
 plot(R.*cos(t), R.*sin(t), 'o')
endfunction
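As a sanity check: with Scilab's convention that fft(·,1) is the inverse discrete Fourier transform (including the 1/N factor), the peanut data R = [3,1,3,1] should give p = (2, 0, 1, 0), so the reconstruction loop returns {\rho(\theta)=2+\cos 2\theta}, the interpolant from the beginning of the post.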

The second function includes preprocessing: it interpolates {1/\rho^2} and then raises the interpolant to power {-1/2}.

function polar2(R)
 clf()
 p = fft(R^(-2),1)
 t = 0:.01:2*%pi
 rho = zeros(t)
 for i = 1:length(p)
     rho = rho + real(p(i)*exp(-%i*(i-1)*t))
 end
 rho = max(rho,0.01*ones(rho))
 polarplot(t,rho^(-1/2),style=5)
 t = resize_matrix(linspace(0,2*%pi,length(R)+1),1,length(R))
 plot(R.*cos(t), R.*sin(t), 'o')
endfunction

Added: one more comparison, {\rho^{1}} vs {\rho^{-2}} vs {\rho^{-3}}; the last one loses convexity again.

Raising to -3 is going too far.

And then to {\rho^{-7}}:

Power -7

Tristram-Levine signatures with Scilab

The signature of a Hermitian matrix {A} can be defined either as the pair (number of positive eigenvalues, number of negative eigenvalues), or simply as the difference

\displaystyle s(A)=\#\{\lambda_i: \lambda_i>0\}-\#\{\lambda_i: \lambda_i<0\}

The function {s(A)} hides some information when the matrix is degenerate, but this will not be of concern here. For a matrix of size {n}, the number {s(A)} is between {-n} and {n} and has the same parity as {n}.

Given any square matrix {V} with real entries and a complex number {\omega}, we can form {V_\omega = (1-\omega)V+(1-\bar\omega)V^T}, which is a Hermitian matrix. Then {s(\omega):=s(V_\omega)} is an integer-valued function of {\omega}. Restricting attention to the unit circle {|\omega|=1}, we obtain a piecewise constant function with jumps at the points where {V_\omega} is degenerate.
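To make the definition concrete, here is a minimal Scilab sketch (separate from the plotting code below) that evaluates {s(\omega)} at a single point of the unit circle, using the trefoil's Seifert matrix that appears later in this post:

V = [-1 0; -1 -1];                      // a Seifert matrix of the trefoil (see below)
theta = %pi/2;                          // sample point omega = exp(i*theta) on the unit circle
omega = exp(%i*theta);
B = (1-omega)*V + (1-conj(omega))*V';   // the Hermitian matrix V_omega
s = sum(sign(real(spec(B))))            // #positive - #negative eigenvalues; here s = -2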

When {V} is a Seifert matrix of a knot, the function {s(\omega)} is the Tristram-Levine signature of the knot. To each knot {K} there correspond infinitely many Seifert surfaces {F}, and to each Seifert surface infinitely many Seifert matrices {V}, depending on how we choose the generators of {H_1(F)}. Yet {s(\omega)} depends on {K} alone.

Below I plot {s(\omega)} for a few knots, computing the signature in Scilab and taking the knot data from

J. C. Cha and C. Livingston, KnotInfo: Table of Knot Invariants

Technical point: since Scilab enumerates colors using positive integers, the colors below correspond to the number of positive eigenvalues rather than to the signature: black for 0, blue for 1, green for 2, and cyan for 3. (The code uses the color index {(s+n)/2+1}, and {(s+n)/2} is the number of positive eigenvalues when {V_\omega} is nondegenerate.)

As always, first comes the trefoil:

Trefoil 3_1

One of its Seifert matrices is

\displaystyle \begin{pmatrix} -1 & 0 \\ -1 & -1 \end{pmatrix}

and the Tristram-Levine signature is

trefoil

Next, the knot {8_5}

8_5

with Seifert matrix

\displaystyle  \begin{pmatrix} -1& 0& 0& -1& -1& -1\\ 0& 1& 0& 0& 0& 0\\ -1& 0& -1& -1& -1& -1\\ 0& -1& 0& -1& -1& -1\\ 0& -1& 0& 0& -1& 0\\ 0& -1& 0& 0& -1& -1 \end{pmatrix}

and the signature

TL signature for 8_5

And finally, the knot {8_{19}}:

8_19

with Seifert matrix

\displaystyle  \begin{pmatrix} -1& 0& 0& 0& 0& 0\\ -1& -1& 0& 0& 0& 0\\ -1& -1& -1& -1& 0& -1\\ -1& -1& 0& -1& 0& 0\\ 0& 0& -1& -1& -1& -1\\ -1& -1& 0& -1& 0& -1\end{pmatrix}

and the signature

TL signature for 8_19

I experimented with a few more, trying to create more colors. However, guessing the complexity of the signature by looking at the Seifert matrix is one of many skills I do not have. So I conclude with the simple code used to plot the signatures.

function sig(A)
    clf();
    [n,n] = size(A);                  // n = size of the (square) Seifert matrix
    Npoints = 200;
    r = 2*%pi/Npoints;                // size of the marks drawn on the unit circle
    arcs = zeros(6,Npoints);
    colors = zeros(Npoints);
    for m = 1:Npoints
        // omega runs over midpoints of Npoints equal arcs of the unit circle
        x = cos(2*%pi*(m-1/2)/Npoints); 
        y = sin(2*%pi*(m-1/2)/Npoints);
        omega = complex(x,y);
        B = (1-omega)*A+(1-conj(omega))*A';   // the Hermitian matrix V_omega
        signature = sum(sign(spec(B)));       // #positive - #negative eigenvalues
        colors(m) = (signature+n)/2+1;        // color index = #positive eigenvalues + 1
        arcs(:,m) = [x-r,y-r,2*r,2*r,0,360*64]';  // bounding box and angles for xfarcs
    end
    xfarcs(arcs,colors');             // filled arcs along the unit circle, colored as above
    replot([-1,-1,1,1]);
    a=get("current_axes");
    a.isoview="on";
endfunction

Space-time (holiday) trees

No new math here, only new graphics. The following plot is from Collisions II, where it illustrates the process of contracting a 6-point planar set into a single point.

6 points
Intersections?

I remarked there that the trajectories cross in space, but not in space-time. The space-time plot of the contraction process is always a tree, but I never showed these trees to you.

Well, here they are. Time flows downward, so the trees have their branches (the initial points) above the root.

4 points

The four points above were placed at the vertices of a rectangle. The plots below use points with random coordinates.

7 points
10 points

The Scilab command was

gravity4(grand(1,10,"def"),grand(1,10,"def"),250)

and the function itself is below.

function gravity4(vx,vy,Nsteps)
    Npoints = length(vx);
    h = 0.01/Npoints;
    moving = ones(vx);
    tolerance = h*(Npoints-1)/3;
    x = zeros(Nsteps,Npoints);
    y = zeros(Nsteps,Npoints);
    z = zeros(Nsteps,Npoints);    
    x(1,:) = vx;
    y(1,:) = vy;
    for m = 1:(Nsteps-1)
        x(m+1,:)= x(m,:);
        y(m+1,:)= y(m,:);
        z(m+1,:) = z(m,:);
        for i = 1:Npoints
            if moving(i)==0 then continue
            end
            z(m+1,i)=z(m,i)-h*Npoints;
            for j = (i+1):Npoints
                if moving(j)==0 then continue
                end
                v = [x(m,j),y(m,j)]-[x(m,i),y(m,i)];
                if norm(v) < tolerance then 
                    moving(j)=0; 
                else
                    x(m+1,i) = x(m+1,i) + h*v(1)/norm(v);
                    x(m+1,j) = x(m+1,j) - h*v(1)/norm(v);
                    y(m+1,i) = y(m+1,i) + h*v(2)/norm(v);
                    y(m+1,j) = y(m+1,j) - h*v(2)/norm(v);
                end
            end
        end
    end
    clf();
    param3d1(x(:,:),y(:,:),z(:,:));
    a = get("current_axes");
    curves = a.children.children; 
    for i=1:Npoints
        curves(i).foreground = 2+modulo(i,5);
        curves(i).thickness = 4;    
    end
endfunction 

I tried to replace a.children.children with a.grandchildren, but that did not work…

Nested powers and collisions: final chapter

This is where the story of nested powers of metric spaces and of collision-inducing ODE reaches its logical conclusion.

Theorem 1. For every {d\ge 1} and every {n\ge 2} there exists a Lipschitz retraction {r\colon {\mathbb R}^d(n)\rightarrow {\mathbb R}^d(n-1)}.

Compared to previous posts, I changed notation to improve readability: {X(n)} now stands for the set of all nonempty subsets of {X} with at most {n} elements. The metric on {X(n)} is the Hausdorff metric {d_H}. The map {r} was defined in Collisions II as follows:

Given a set {A\in {\mathbb R}^d(n)\setminus {\mathbb R}^d(n-1)}, enumerate its elements {a_1,\dots,a_n} and consider the initial-value problem

\displaystyle    \dot{x}_i = \sum_{j\ne i}\frac{x_j-x_i}{|x_j-x_i|}, \quad x_i(0)=a_i, \quad i=1,\dots,n

Define {r(A) = \{x_i(t_c)\}}, where {t_c} is the time of the first collision between particles in this system. Also define {r(A)=A} when {A\in{\mathbb R}^d(n-1)}.

I claim that

\displaystyle    d_H(r(A),r(B))\le (2n-1)\sqrt{n} \,d_H(A,B)   \ \ \ \ \ (1)


for all {A,B\in{\mathbb R}^d(n)}. Interestingly, the Lipschitz constant does not depend on the dimension of ambient space. (Well, I’m not so sure it has to depend on cardinality either.)

Proof. Let

\displaystyle \delta(A)=\begin{cases} \min_{i\ne j}|a_i-a_j|,\quad  A\in{\mathbb R}^d(n)\setminus {\mathbb R}^d(n-1) \\ 0,\quad \quad A\in \mathbb R^d(n-1) \end{cases}

It is easy to check that the function {A\mapsto \delta(A)} is {2}-Lipschitz on {{\mathbb R}^d(n)}.

Earlier I proved that { t_c\le \dfrac{1}{2}\delta(A)} and, as a consequence,

\displaystyle    d_H(r(A),A)\le \frac{n-1}{2} \delta(A)   \ \ \ \ \ (2)

Let {\rho=d_H(A,B)}. If {\delta(A)+\delta(B)\le 4\,\rho}, then (2) already gives (1). From now on assume {\delta(A)+\delta(B)> 4\,\rho}; since (1) is symmetric in {A} and {B}, we may take {\delta(A)> 2\,\rho}, and then {\delta(B)\ge \delta(A)-2\rho>0} because {\delta} is {2}-Lipschitz. The geometric meaning of these inequalities is that the points of {A} are spread out in space, yet each of them is close to a point of {B}. Therefore, we can enumerate them {a_i} and {b_i} in such a way that {|a_i-b_i|\le \rho} for all {i}.

Let {(x_i)} and {(y_i)} be the solutions of our ODE with initial data {(a_i)} and {(b_i)}, respectively. Here comes a monotonicity lemma.

Lemma 2. {\sum_{i=1}^n |x_i-y_i|^2} is nonincreasing with time {t}.

Proof: Encode the sequence {(x_i)} by a single point {\mathbf{x}\in {\mathbb R}^{nd}}. This point is moved by the gradient flow of the convex function {\Phi(\mathbf{x}) = \sum_{i<j}|x_i-x_j|}. Since the gradient of a convex function is monotone, we have {\langle {\mathbf{ \dot x}}-{\mathbf{\dot y}}, \mathbf{x}-\mathbf{y}\rangle\le 0}. It follows that {|\mathbf{x}-\mathbf{y}|^2} is nonincreasing. \Box
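Written out, the last step is the one-line computation (recall that {\dot{\mathbf{x}}=-\nabla\Phi(\mathbf{x})} and {\dot{\mathbf{y}}=-\nabla\Phi(\mathbf{y})}):

\displaystyle \frac{d}{dt}|\mathbf{x}-\mathbf{y}|^2 = 2\langle \dot{\mathbf{x}}-\dot{\mathbf{y}}, \mathbf{x}-\mathbf{y}\rangle = -2\langle \nabla\Phi(\mathbf{x})-\nabla\Phi(\mathbf{y}), \mathbf{x}-\mathbf{y}\rangle \le 0.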

By Lemma 2, {|x_i-y_i|\le \sqrt{n}\,\rho } holds for all {i} until a collision occurs in either system. We may assume that {x}-particles collide first. Then, at the time of their collision they form the set {r(A)} while the {y}-particles form some set {B'}, with {d_H(r(A),B')\le \sqrt{n}\,\rho}. Since {r(A)} has fewer than {n} elements, {\delta(B')\le 2d_H(r(A),B')}. From (2) we obtain {d_H(r(B'),B')\le 2(n-1)\sqrt{n}\,\rho}. Note that {r(B')=r(B)} because our ODE system is autonomous. The final estimate is {d_H(r(A),r(B))\le d_H(r(A),B')+d_H(r(B'),B')}, which is (1). {\Box}

Here is how a 20-point set gets retracted onto {{\mathbb R}^2(19)}:

20 particles, stopped

The plot was produced by a version of the familiar Scilab routine, modified to stop the computation at first collision. It was called with

gravity3(grand(1,20,"unf",0,2),grand(1,20,"def"),300)
function gravity3(vx,vy,Nsteps)
    Npoints = length(vx);
    h = 0.01/Npoints;
    tolerance = h*(Npoints-1)/3;
    collided = 0;
    x = zeros(Nsteps,Npoints);
    y = zeros(Nsteps,Npoints);
    x(1,:) = vx;
    y(1,:) = vy;
    m = 1;
    while (m<Nsteps & collided==0) do
        x(m+1,:)= x(m,:);
        y(m+1,:)= y(m,:);
        for i = 1:Npoints
            for j = (i+1):Npoints
                v = [x(m,j),y(m,j)]-[x(m,i),y(m,i)];
                if norm(v) < tolerance then 
                    collided=1; 
                else
                    x(m+1,i) = x(m+1,i) + h*v(1)/norm(v);
                    x(m+1,j) = x(m+1,j) - h*v(1)/norm(v);
                    y(m+1,i) = y(m+1,i) + h*v(2)/norm(v);
                    y(m+1,j) = y(m+1,j) - h*v(2)/norm(v);
                end
            end
        end
    m = m+1;    
    end
    clf();
    plot(x(:,:),y(:,:),'o');
endfunction

Collisions II

Continuation of the preceding post, in which I considered the ODE system

\displaystyle    \dot{x}_i = \sum_{j\ne i}\frac{x_j-x_i}{|x_j-x_i|},\quad i=1,\dots,n

with the initial conditions {x_i(0)=p_i}, where {p_1,\dots,p_n} are distinct points in a Euclidean space. This can be seen as the gradient flow (in the space of all {n}-point subsets) of the function {\sum_{i<j}|x_i-x_j|}. Not surprisingly, minimization of the sum of distances leads to collisions. And they happen as quickly as they possibly could: here is a precise statement.

Lemma 1. Let {t_c} be the time of the first collision. Then

{\displaystyle \frac{1}{2(n-1)}\,\delta\le t_c\le \frac{1}{2}\,\delta}

where {\displaystyle \delta=\min_{i\ne j}|p_i-p_j|}.

The lower bound follows at once from {|\dot{x}_i|\le n-1}. To prove the upper bound on {t_c}, we need the concept of monotonicity. A map {F\colon{\mathbb R}^d\rightarrow{\mathbb R}^d} is called monotone if {\langle F(a)-F(b),a-b\rangle \ge 0} for all {a,b\in{\mathbb R}^d}. It is not hard to prove that the gradient of any convex function is monotone. In particular, {F(x)=x/|x|} is a monotone map, being the gradient of the convex function {x\mapsto |x|} (the lack of smoothness at the origin need not concern us here).
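For completeness: if {\psi} is a differentiable convex function, adding the two inequalities

\displaystyle \psi(b)\ge \psi(a)+\langle \nabla\psi(a), b-a\rangle, \qquad \psi(a)\ge \psi(b)+\langle \nabla\psi(b), a-b\rangle

gives {\langle \nabla\psi(a)-\nabla\psi(b), a-b\rangle \ge 0}, which is exactly the monotonicity of {\nabla\psi}.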

Renumbering the points, we may assume {|p_1-p_2|=\delta}. Consider the function {\varphi(t)=|x_1-x_2|}, {0<t<t_c}. Differentiation yields

\displaystyle \dot{\varphi}=|x_1-x_2|^{-1}\langle \dot{x}_1-\dot{x}_2, x_1-x_2\rangle.

This inner product consists of the term

\displaystyle \langle F(x_2-x_1)-F(x_1-x_2), x_1-x_2\rangle = -2|x_1-x_2|

and the sum over {j=3,\dots,n} of

\displaystyle \langle F(x_j-x_1)-F(x_j-x_2), x_1-x_2\rangle \\ = - \langle F(x_j-x_1)-F(x_j-x_2),\ (x_j-x_1)-(x_j-x_2)\rangle \\ \le 0.

Thus, {\dot{\varphi} \le -2} for {0<t<t_c}, which implies {t_c\le \varphi(0)/2}. {\Box}

It is worth noting that if the process is stopped at the moment of the first collision, it defines a map between symmetric products {r\colon ({\mathbb R}^d)^{n}\rightarrow ({\mathbb R}^d)^{n-1}} which fixes {({\mathbb R}^d)^{n-1}} pointwise. The latter statement can now be made quantitative: if a set {A\in ({\mathbb R}^d)^{n}} is at distance {\epsilon} from {({\mathbb R}^d)^{n-1}}, then {d_H(r(A),A)\le (n-1)\,\epsilon}. Indeed, in the notation of Lemma 1 we have {\delta=2 \epsilon}: collapsing the two closest points of {A} to their midpoint produces a set with fewer points at distance {\delta/2}, and no set with fewer than {n} points can be closer. Since the first collision occurs at a time {t_c\le \epsilon}, it follows that {|x_i(t_c)-p_i|\le (n-1) t_c\le (n-1)\,\epsilon}.

Time for some pictures. I changed the Scilab routine so that after each collision it drops one of the collided particles and restarts. As predicted, this did not change the qualitative picture. The main difference is that the trajectories are now visibly non-smooth: angles form when a collision happens elsewhere.

4 points
5 points
6 points

In the above examples some trajectories cross in space, but not in space-time.

7 points

8 points
9 points

While the above plots may look like hierarchical clustering trees, solving a system of {n} nonlinear ODE is hardly a practical algorithm for clustering {n} points.

The updated Scilab routine is below. This time I called it with random parameters:

 gravity2(grand(1,7,"unf",0,2),grand(1,7,"def"),300) 
function gravity2(vx,vy,Nsteps)
    h = 0.002;
    Npoints = length(vx);
    moving = ones(vx);
    tolerance = h*(Npoints-1)/3;
    x = zeros(Nsteps,Npoints);
    y = zeros(Nsteps,Npoints);
    x(1,:) = vx;
    y(1,:) = vy;
    for m = 1:(Nsteps-1)
        x(m+1,:)= x(m,:);
        y(m+1,:)= y(m,:);
        for i = 1:Npoints
            if moving(i)==0 then continue
            end
            for j = (i+1):Npoints
                if moving(j)==0 then continue
                end
                v = [x(m,j),y(m,j)]-[x(m,i),y(m,i)];
                if norm(v) < tolerance then 
                    moving(j)=0; 
                else
                    x(m+1,i) = x(m+1,i) + h*v(1)/norm(v);
                    x(m+1,j) = x(m+1,j) - h*v(1)/norm(v);
                    y(m+1,i) = y(m+1,i) + h*v(2)/norm(v);
                    y(m+1,j) = y(m+1,j) - h*v(2)/norm(v);
                end
            end
        end
    end
    clf();
    plot(x(:,:),y(:,:),'o');
endfunction 

Clustering by collisions

This construction looks like it should answer the question raised at the end of the previous post, but instead of checking whether it really works, I decided to post some pictures.

Given {n} distinct points {p_1,\dots,p_n} in a Euclidean space {{\mathbb R}^d}, consider the ODE system

\displaystyle    \dot{x}_i = \sum_{j\ne i}\frac{x_j-x_i}{|x_j-x_i|},\quad i=1,\dots,n

with the initial condition {x_i(0)=p_i}. The solution exists and is well-behaved until the first collision: {x_i=x_j} for some {i\ne j}. We can stop the process then (thus obtaining a set of at most {n-1} points) or remove the duplicates and restart. Let’s try the latter approach. To simplify things (at a cost), we can write the system as follows:

\displaystyle    \dot{x}_i = \sum_{x_j\ne x_i}\frac{x_j-x_i}{|x_j-x_i|},\quad i=1,\dots,n.

Now the solution exists for all times, but becomes stationary when all points come together. The cost to which I alluded above is that keeping multiple copies of collided points distorts the post-collision behavior, but I hope that the qualitative picture survives.

I used Scilab to plot the trajectories of this system for several initial configurations. For purposes of numeric computation, the ODE was modified once again:

\displaystyle    \dot{x}_i = \sum_{|x_j- x_i|>\epsilon }\frac{x_j-x_i}{|x_j-x_i|},\quad i=1,\dots,n.

where {\epsilon} is a tolerance related to the step size used by the solver (Euler's method works fine).

Three points
Four points: vertices of a rectangle
Four points: vertices of a long rectangle
Five points
Six points

In each case the particles obviously want to collide: the points placed nearby move toward each other. Indeed, suppose that {p_1} and {p_2} are much closer to each other than to the other points. The ODE system yields

\displaystyle    \frac{d}{dt}(x_1-x_2) = -2\,\frac{x_1-x_2}{|x_1-x_2|} + \text{(small terms)}

where the smallness is due to the near cancellation of {\displaystyle \frac{x_j-x_1}{|x_j-x_1|}} and {\displaystyle \frac{x_j-x_2}{|x_j-x_2|}} for {j\ne 1,2}. As long as the sum of the small terms is bounded by a constant less than {2}, the distance {|x_1-x_2|} will decrease at a linear rate, heading toward zero.
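One convenient way to quantify the smallness: for nonzero vectors {u,v},

\displaystyle \left|\frac{u}{|u|}-\frac{v}{|v|}\right| \le \frac{2|u-v|}{\max(|u|,|v|)},

so, taking {u=x_j-x_1} and {v=x_j-x_2}, each small term has norm at most {2|x_1-x_2|/\max(|x_j-x_1|,|x_j-x_2|)}.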

Here is the (hastily written) Scilab function. It takes three parameters: the vector of x-coordinates, the vector of y-coordinates, and the number of steps. I use 100 for the latter, increasing it if the points do not have enough time to collide. For example, try gravity([0,1,2,3],[2,3,0,0.5],100).


function gravity(vx,vy,Nsteps)
h = 0.01;
Npoints = length(vx);
tolerance = h*(Npoints-1)/2;
x = zeros(Nsteps,Npoints);
y = zeros(Nsteps,Npoints);
x(1,:) = vx;
y(1,:) = vy;
for m = 1:(Nsteps-1)
    x(m+1,:)= x(m,:);
    y(m+1,:)= y(m,:);
    for i = 1:Npoints
        for j = 1:Npoints
            v = [x(m,j),y(m,j)]-[x(m,i),y(m,i)];
            if norm(v) > tolerance then 
                x(m+1,i) = x(m+1,i) + h*v(1)/norm(v);
                y(m+1,i) = y(m+1,i) + h*v(2)/norm(v);
            end
        end
    end
end
clf();
plot(x(:,:),y(:,:));
endfunction

Kolakoski sequence II

The computational evidence in my previous post on the Kolakoski sequence was meagre: I looked only at the first 500 terms. By definition, the Kolakoski sequence consists of 1s and 2s, begins with 1, and is self-similar in the following sense: replacing each run by its length, you recover the original sequence. Here is its beginning, with the runs separated by spaces:

1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 11 2 11 2 1 22 1 22 11 2 1 22 1 2 11 2 11 22 1 22 11 2 1 22 1 22 11 2 11 2 1 22 1 2 11 22 1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 1 22 11 2 11 2 1 22 11 2 11 22 1 2 11 2 11 22 1 22 11 2 1 22 1 2 11 22 1 22 1 2 11 2 11 22 1 22 11 …

Conjecturally, the density of 1s and of 2s in this sequence is 50%. And indeed, the number of 2s among the first N terms grows in a way that’s hard to distinguish from N/2 on a plot. However, the density looks quite different if one looks at three subsequences separately:

  • a_{3k+1} (red subsequence)
  • a_{3k+2} (green subsequence)
  • a_{3k+3} (blue subsequence)

At the very beginning (among the first 500 terms) the red subsequence is dominated by 1s while the blue one is dominated by 2s. But the story changes quickly. The plot below shows the count of 2s in each subsequence, minus the linear prediction. The average of the red, green, and blue counts is shown in black.

200000 terms, separated into 3 subsequences

The nearly flawless behavior of the overall count of 2s hides wild oscillations within the three subsequences. Presumably they continue to move back and forth, and the amplitude of the oscillation appears to grow. However, the growth looks sublinear, which means that even within each subsequence the asymptotic density of 2s might still converge to 1/2.

The computation was done in Scilab. For simplicity I replaced the 1s and 2s by 0s and 1s, so that the length of the jth run is a_j+1. The variable rep indicates whether the next term should be the same as its predecessor; j keeps the count of runs.

N=200000; a=zeros(N); j=1; rep=0;
for i=2:N
if (rep==1) then a(i)=a(i-1); rep=0;
else a(i)=1-a(i-1); j=j+1; rep=a(j);
end
end

Pretty neat, isn’t it? The rest of the code is less neat, though. I still don’t know of any good way to calculate running totals in Scilab.

M=floor(N/3)+1; s=zeros(M,4);
for i=0:M-2
for j=1:3
s(i+2,j)=s(i+1,j)+a(3*i+j)
end
s(i+2,4)=i/2;
end
plot(s(:,1)-s(:,4),'r'); plot(s(:,2)-s(:,4),'g'); plot(s(:,3)-s(:,4),'b');
plot((s(:,1)+s(:,2)+s(:,3))/3-s(:,4),'k');
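For what it's worth, here is an alternative sketch (not the code used for the plots above) that forms the same running totals with Scilab's cumsum, assuming a is the 0/1 vector built earlier:

M3 = floor(N/3);                  // number of complete triples of terms
k = (1:M3)';
red   = cumsum(a(1:3:3*M3));      // running count of 2s among a_1, a_4, a_7, ...
green = cumsum(a(2:3:3*M3));
blue  = cumsum(a(3:3:3*M3));
clf();
plot(k, red(:)-k/2, 'r'); plot(k, green(:)-k/2, 'g'); plot(k, blue(:)-k/2, 'b');
plot(k, (red(:)+green(:)+blue(:))/3-k/2, 'k');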

Bonus content: I extended the computations to N=1663500 (yeah, should have increased the stack size beforehand).

1663500 terms

The OEIS sequence A000002

The sequences in The On-Line Encyclopedia of Integer Sequences® are themselves arranged into a sequence, each being indexed by Axxxxxx. Hence, the sequence
2, 3, 2, 1, 3, 4, 1, 7, 7, 5, 45, 2, 181, 43, 17, ...
can never be included in the encyclopedia: its definition is a_n=1+\text{nth term of the nth sequence in the OEIS}.

By the way, which sequences have the honor of being on the virtual first page of the OEIS? A000004 consists of zeros. Its description offers the following explicit formula:

a(n)=A*sin(n*Pi) for any A. [From Jaume Oliver Lafont, Apr 02 2010]

The sequence A000001 counts the groups of order n.

The sequence A000002 consists of 1s and 2s only:
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, ...
The defining property of this sequence is self-similarity: consider the runs of equal numbers
1, 22, 11, 2, 1, 22, 1, 22, 11, 2, 11, 22, 1, 2, 11, 2, 1, 22, 11, ...
and replace each run by its length:
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, ...
You get the original sequence again.

This is the Kolakoski sequence, which first appeared in print in 1965. Contrary to what one might expect from such a simple definition, there is no simple pattern; it is not even known whether 1s and 2s appear with the same asymptotic frequency 1/2. One way to look for patterns in a function/sequence is to take its Fourier transform, and this is what I did below:

Fourier transform: click to magnify

Specifically, this is the absolute value of \displaystyle F(t)=\sum_{n=1}^{500} (2a_n-3) e^{2\pi i n t}. To better see where the peaks fall, it’s convenient to look at |F(1/t)|:

Reciprocal axis: click to magnify

Something is going on at t=1/3, i.e. at period 3. Recall that the sequence contains no triples 111 or 222: there are no runs of more than 2 equal numbers. So perhaps it’s not surprising that a pattern emerges at period 3. And indeed, even though the percentage of 1s among the first 500 terms of the sequence is 49.8%, breaking it down by n mod 3 yields the following:

  • Among a_{3k}, the percentage of 1s is 17.4%
  • Among a_{3k+1}, the percentage of 1s is 85.2%
  • Among a_{3k+2}, the percentage of 1s is 46.8%

Noticing another peak at t=1/9, I also broke down the percentages by n mod 9:

  • n=9k: there are 5.4% 1s
  • n=9k+1: there are 74.8% 1s
  • n=9k+2: there are 41.4% 1s
  • n=9k+3: there are 21.6% 1s
  • n=9k+4: there are 97.2% 1s
  • n=9k+5: there are 70.2% 1s
  • n=9k+6: there are 25.2% 1s
  • n=9k+7: there are 84.6% 1s
  • n=9k+8: there are 28.8% 1s

Nothing close to 50%…

In conclusion, my code for the Kolakoski sequence. This time I used Maple (instead of Scilab) for no particular reason.

N:=500; ks:=Array(1..N+2); i:=0; j:=1; new:=1;
while i<=N do i:=i+1; ks[i]:=new;
if (ks[j]=2) then i:=i+1; ks[i]:=new; end if;
j:=j+1; new:=3-new;
end do:
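The Fourier plot above can also be reproduced with a short Scilab sketch (a hypothetical companion to the Maple code, assuming the first 500 terms of the sequence, as 1s and 2s, are stored in a vector a):

t = 0:0.001:1;
F = zeros(t);
for n = 1:500
    F = F + (2*a(n)-3)*exp(2*%pi*%i*n*t);
end
clf(); plot(t, abs(F));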

The Alabern-Mateu-Verdera characterization of Sobolev spaces

I’ve been trying to understand how the Alabern-Mateu-Verdera characterization of Sobolev spaces works. Consider a function f\colon \mathbb R\to\mathbb R. Let f_t be the average of f on scale t>0; that is, \displaystyle f_t(x)=\frac{1}{2t}\int_{x-t}^{x+t}f(y)\,dy. The difference |f_t-f| measures the deviation of f from a line. AMV define the square function S_f as a weighted average of the squares of such deviations:

\displaystyle S_f^2(x)=\int_0^{\infty} \left|\frac{f_t(x)-f(x)}{t}\right|^2 \frac{dt}{t}

Since I’m mostly interested in local matters, I’ll use the truncated square function S_f^2(x)=\int_0^{1}\dots, which avoids large-scale considerations. If f is very nice (say, twice continuously differentiable), then |f_t-f| is of order t^2, and the integral converges with room to spare. For example, here is the Gaussian (f is in blue, S_f in red):

Gaussian

This looks suspicious. Clearly, S_f measures the size of the second derivative, not of the first. Yet, one of the Alabern-Mateu-Verdera theorems is for the Sobolev space of first order W^{1,p}: namely, f\in W^{1,p} \iff S_f\in L^p (for finite p). So, the degree of integrability of |f'| matches that of S_f, even though the functions look very different.
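A second-order Taylor expansion makes the t^2 claim, and the appearance of f'', explicit: the linear term of f(y)-f(x) averages to zero over [x-t,x+t], so

\displaystyle f_t(x)-f(x)=\frac{1}{2t}\int_{x-t}^{x+t}\bigl(f(y)-f(x)\bigr)\,dy=\frac{f''(x)}{2}\cdot \frac{1}{2t}\int_{-t}^{t}s^2\,ds+o(t^2)=\frac{f''(x)}{6}\,t^2+o(t^2).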

For functions that are not very nice, S_f may be infinite for some values of x. For example, if the graph of f has a corner at some point, then |f_t-f| is of order t there, and the integral defining S_f diverges like \displaystyle \frac{dt}{t}. Take the triangle f(x)=(1-|x|)^+:

Triangle

The triangle is Lipschitz, i.e., f\in W^{1,\infty}, but its S_f is not bounded. So the AMV characterization f\in W^{1,p} \iff S_f\in L^p does not extend to p=\infty. However, the blow-up rate of S_f in this example is merely logarithmic (|\log{}|^{1/2}, to be precise), which implies S_f\in L^p for all p<\infty, in accordance with the AMV theorem. Again, we notice that S_f and |f'| look rather unlike each other… except that S_f now resembles the absolute value of the Hilbert transform of f'.
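For instance, at the corner x=0 of the triangle, f_t(0)=\frac{1}{2t}\int_{-t}^{t}(1-|y|)\,dy=1-\frac{t}{2} for 0<t\le 1, so the integrand in S_f^2(0) is exactly \frac14\,\frac{dt}{t}, and the truncated integral indeed diverges there.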

Here is the semicircle f(x)=\sqrt{1-x^2}:

At the endpoints of the arc |f'| blows up as (1-|x|)^{-1/2}, and therefore f\in W^{1,p} only when p<2. And indeed, near the endpoints the nonlinearity on scale t is about \sqrt{t}, which turns the integrand in the definition of S_f into \displaystyle \frac{dt}{t^2}. Hence, S_f^2(x)\sim \frac{1}{|x-1|} as x\to 1. We have S_f\in L^p iff p<2, as needed.

The last example, f(x)=(1-\sqrt{|x|})^+, has both a cusp and two corners, demonstrating the different rates at which S_f blows up.

Cusp and corners

My Scilab code is probably not the most efficient one for this purpose. I calculated f_t-f using a multidiagonal matrix with -1 on the main diagonal and 1/(2k) on the nearest 2k diagonals.

step=0.01; scale=1; x=[-3:step:3]; n=length(x); s=zeros(n)
f=max(0,1-sqrt(abs(x)))
for k=1:(scale/step) do
avg=-diag(ones(n,1))
for j=1:k do
avg=avg+diag(ones(n-j,1)/(2*k),j)+diag(ones(n-j,1)/(2*k),-j)
end
s=s+(avg*f').^2/(step^2*k^3)
end
s=s.^(1/2)
clf(); plot(x,f); plot(x,s,'red')

Galperin’s billiard method of computing pi

For the lack of original pi-related ideas, I describe a billiard experiment invented and popularized by Gregory Galperin (Григорий Гальперин). I read about it in a short note he published in 2001 in the Russian journal “Mathematical Education” (“Математическое Просвещение”). Although the table of contents is in Russian, it’s not hard to find the article with “pi” in the title. It’s clear from the article that the observation precedes its publication by some years; the author gave talks on this subject for a while before committing it to print.

Place two billiard balls (red and blue) on the half-line [0,\infty): the red ball is at rest, and the blue ball is to the right of it, moving to the left. At x=0 there is an elastic wall.

1-dimensional billiard table

Assume perfectly elastic collisions: the kinetic energy is preserved. How many collisions will happen depends on the masses of the balls, or rather on their ratio. Suppose they have equal masses (ratio=1). Then we’ll see the following:

  1. Blue hits Red and stops; Red begins to move to the left
  2. Red hits the wall and bounces back
  3. Red hits Blue and stops; Blue begins to move to the right

That’s all, 3 collisions in total. Here is a Scilab-generated illustration: I placed red at x=1 and blue at x=1.5, with initial velocity -1. The time axis is vertical.

Mass ratio 1 leads to 3 collisions

Number 3 is also the first digit of \pi. Coincidence? No.

Consider the blue/red mass ratio of 100. Now we really need Scilab. Count the collisions as you scroll down:

Mass ratio 100 leads to 31 collisions

Yes, that’s exactly 31 collisions. The mass ratio of 10000 would yield 314 collisions, and so on.

How does this work? The trick is to consider the two-dimensional configuration space in which the axes are scaled in proportion to the square roots of the masses. A point on the configuration plane describes the position of two balls at once, but it also behaves just like a billiard ball itself, bouncing off the lines (red=0) and (red=blue). The scaling by square roots of masses is needed to make the (red=blue) bounce work correctly: the angle of incidence is equal to the angle of reflection. (See the Update below.)

Configuration space

The opening angle is \alpha=\tan^{-1}\sqrt{M_r/M_b}. Now we can straighten the billiard trajectory by repeated reflection: this will help us count the number of collisions.

Straightening a billiard trajectory by reflection

It remains to count how many angles of size \alpha=\tan^{-1}\sqrt{M_r/M_b} fit inside the angle \pi. By taking \sqrt{M_r/M_b}=10^{-n}, we should get 10^n \pi collisions, rounded to an integer one way or the other. It seems that the rounding is always down; that is, we get exactly the first n+1 digits of \pi. I once tried to prove that this always happens, but unsuccessfully. The error of the approximation \tan^{-1} 10^{-n}\approx 10^{-n} needs to be compared to the fractional part of 10^n \pi, which looks like tricky business.

Here’s the Scilab source. The plots in this post were obtained with galperin(1) and galperin(100).

function galperin(m)
    h=0.001; steps=4000;  x1=zeros(steps); x2=zeros(steps);
    x1(1)=1.5;  v1=-1;  x2(1)=1;  v2=0; collision=0; j=1;
    while j<steps,
        select collision
        case 0 then
            newx1=x1(j)+h*v1; newx2=x2(j)+h*v2;
            if (newx2<0) then collision=1;
            elseif (newx1<newx2) then collision=2;
            else
                x1(j+1)=newx1; x2(j+1)=newx2; j=j+1;
            end
        case 1 then
            v2=-v2; x1(j+1)=x1(j); x2(j+1)=x2(j); collision=0; j=j+1;
        case 2 then
            newv1=(v1*(m-1)+2*v2)/(m+1);  newv2=(v2*(1-m)+2*m*v1)/(m+1);
            v1=newv1; v2=newv2; x1(j+1)=x1(j); x2(j+1)=x2(j);
            collision=0; j=j+1;
        end
    end
    t=linspace(0, h*steps, steps); plot(t, x1); plot(t, x2, 'red');
endfunction
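For reference, the velocity update in case 2 is the standard one-dimensional elastic-collision law, written in terms of the blue/red mass ratio m=M_b/M_r (v_1 is the blue velocity and v_2 the red one, as in the code):

\displaystyle v_1' = \frac{(m-1)v_1+2v_2}{m+1}, \qquad v_2' = \frac{(1-m)v_2+2m v_1}{m+1}.

These are the velocities that conserve both momentum m v_1+v_2 and kinetic energy \tfrac12(m v_1^2+v_2^2) (in units where M_r=1); the only other solution of the two conservation equations is the trivial one v_1'=v_1, v_2'=v_2.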

Update: this post was referred to from Math StackExchange, with a request for a more intuitive explanation of how the scaling by \sqrt{M} works. Let V_c be the velocity vector of the point in the configuration space. Thanks to the scaling, we have the following:

  • The square of the magnitude of V_c is equal to the total kinetic energy of the system, up to a constant factor.
  • The projection of V_c onto the collision line x_r=x_b is equal to the total momentum of the system, up to a constant factor.

When the balls collide with each other, both the energy and the momentum are preserved. It follows that the new vector V_c will have the same magnitude and the same projection onto the collision line. By trigonometry, this implies the equality of the incident and reflected angles.