## Kolakoski turtle curve

Let’s take another look at the Kolakoski sequence (part 1, part 2) which, by definition, is the sequence of 1s and 2s in which the nth term is equal to the length of the nth run of consecutive equal numbers in the same sequence. When a sequence has only two distinct entries, it can be visualized with the help of a turtle that turns left (when the entry is 1) or right (when the entry is 2). This visualization method seems particularly appropriate for the  Kolakoski sequence since there are no runs of 3 equal entries, meaning the turtle will never move around a square of sidelength equal to its step. In particular, this leaves open the possibility of getting a simple curve… Here are the first 300 terms; the turtle makes its first move down and then goes left-right-right-left-left-right-left-… according to the terms 1,2,2,1,1,2,1,…

No self-intersections yet… alas, at the 366th term it finally happens.

Self-intersections keep occurring after that:

again and again…

Okay, the curve obviously doesn’t mind intersecting self. But it can’t be periodic since the Kolakoski sequence isn’t. This leaves two questions unresolved:

• Does the turtle ever get above its initial position? Probably… I haven’t tried more than 5000 terms
• Is the curve bounded? Unlikely, but I’ve no idea how one would dis/prove that. For example, there cannot be a long diagonal run (left-right-left-right-left) because having 1,2,1,2,1 in the sequence implies that elsewhere, there are three consecutive 1s, and that doesn’t happen.

Here’s the Python code used for the above. I represented the sequence as a Boolean array with 1 = False, 2 = True.

import numpy as np
import turtle
n = 500                   # number of terms to compute
a = np.zeros(n, dtype=np.bool_)
j = 0                     # the index to look back at
same = False   # will next term be same as current one?
for i in range(1, n):
if same:
a[i] = a[i-1]     # current run continues
same = False
else:
a[i] = not a[i-1] # the run is over
j += 1            # another run begins
same = a[j]       # a[j] determines its length

turtle.hideturtle()
turtle.right(90)
for i in range(n):
turtle.forward(10)    # used steps of 10 or 5 pixels
if a[i]:
turtle.right(90)
else:
turtle.left(90)


## Kolakoski sequence II

The computational evidence in my previous post on the Kolakoski sequence was meagre: I looked only at the first 500 terms. By definition, the Kolakoski sequence consists of 1s and 2s, begins with 1, and is self-similar in the following sense: replacing each run by its length, you recover the original sequence. Here is its beginning, with separating spaces omitted within each run.

1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 11 2 11 2 1 22 1 22 11 2 1 22 1 2 11 2 11 22 1 22 11 2 1 22 1 22 11 2 11 2 1 22 1 2 11 22 1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 1 22 11 2 11 2 1 22 11 2 11 22 1 2 11 2 11 22 1 22 11 2 1 22 1 2 11 22 1 22 1 2 11 2 11 22 1 22 11 …

Conjecturally, the density of 1s and of 2s in this sequence is 50%. And indeed, the number of 2s among the first N terms grows in a way that’s hard to distinguish from N/2 on a plot. However, the density looks quite different if one looks at three subsequences separately:

• $a_{3k+1}$ (red subsequence)
• $a_{3k+2}$ (green subsequence)
• $a_{3k+3}$ (blue subsequence)

At the very beginning (among the first 500 terms) the red sequence is dominated by 1s while the blue is dominated by 2s. But the story changes quickly. The plot below shows the count of 2s in each subsequence, minus the linear prediction. The average of red, green, and blue is shown in black.

The nearly flawless behavior of the overall count of 2s hides the wild oscillation within the three subsequences. Presumably, they continue moving back and forth and the amplitude of their oscillation appears to be growing. However, the growth looks sublinear, which means that even within each subsequence, the asymptotic density of 2s might converge to 1/2.

The computation was done in scilab. For simplicity I replaced 1s and 2s by 0s ans 1s, so that the length of the $j$th run is $a_j+1$. The variable rep indicates whether the next term should be the same as its predecessor; $j$ keeps the count of runs.
 N=200000; a=zeros(N); j=1; rep=0; for i=2:N if (rep==1) a(i)=a(i-1); rep=0; else a(i)=1-a(i-1); j=j+1; rep=a(j); end end 

Pretty neat, isn’t it? The rest of the code is less neat, though. I still don’t know of any good way to calculate running totals in scilab.
 M=floor(N/3)+1; s=zeros(M,4); for i=0:M-2 for j=1:3 s(i+2,j)=s(i+1,j)+a(3*i+j) end s(i+2,4)=i/2; end plot(s(:,1)-s(:,4),'r'); plot(s(:,2)-s(:,4),'g'); plot(s(:,3)-s(:,4),'b'); plot((s(:,1)+s(:,2)+s(:,3))/3-s(:,4),'k'); 

Bonus content: I extended the computations to N=1663500 (yeah, should have increased the stack size beforehand).

## The OEIS sequence A000002

The sequences in The On-Line Encyclopedia of Integer Sequences® are themselves arranged into a sequence, each being indexed by Axxxxxx. Hence, the sequence
2, 3, 2, 1, 3, 4, 1, 7, 7, 5, 45, 2, 181, 43, 17, ...
can never be included in the encyclopedia: its definition is $a_n=1+\text{nth term of the nth sequence in the OEIS}$.

By the way, which sequences have the honor of being on the virtual first page of the OEIS? A000004 consists of zeros. Its description offers the following explicit formula:

a(n)=A*sin(n*Pi) for any A. [From Jaume Oliver Lafont, Apr 02 2010]

The sequence A000001 counts the groups of order n.

The sequence A000002 consists of 1s and 2s only:
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, ...
The defining property of this sequence is self-similarity: consider the runs of equal numbers
1, 22, 11, 2, 1, 22, 1, 22, 11, 2, 11, 22, 1, 2, 11, 2, 1, 22, 11, ...
and replace each run by its length:
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, ...
You get the original sequence again.

This is the Kolakoski sequence, which first appeared in print in 1965. Contrary to what one might expect from such a simple definition, there is no simple pattern; it is not even known whether 1s and 2s appear with the same asymptotic frequency 1/2. One way to look for patterns in a function/sequence is to take its Fourier transform, and this is what I did below:

Specifically, this is the absolute value of $\displaystyle F(t)=\sum_{n=1}^{500} (2a_n-3) e^{2\pi i n t}$. To better see where the peaks fall, it’s convenient to look at $|F(1/t)|$:

Something is going on at $t=1/3$, i.e. at period 3. Recall that the sequence contains no triples 111 or 222: there are no runs of more than 2 equal numbers. So perhaps it’s not surprising that a pattern emerges at period 3. And indeed, even though the percentage of 1s among the first 500 terms of the sequence is 49.8%, breaking it down by n mod 3 yields the following:

• Among $a_{3k}$, the percentage of 1s is 17.4%
• Among $a_{3k+1}$, the percentage of 1s is 85.2%
• Among $a_{3k+2}$, the percentage of 1s is 46.8%

Noticing another peak at $t=1/9$, I also broke down the percentages by n mod 9:

• $n=9k$: there are 5.4% 1s
• $n=9k+1$: there are 74.8% 1s
• $n=9k+2$: there are 41.4% 1s
• $n=9k+3$: there are 21.6% 1s
• $n=9k+4$: there are 97.2% 1s
• $n=9k+5$: there are 70.2% 1s
• $n=9k+6$: there are 25.2% 1s
• $n=9k+7$: there are 84.6% 1s
• $n=9k+8$: there are 28.8% 1s

Nothing close to 50%…

In conclusion, my code for the Kolakoski sequence. This time I used Maple (instead of Scilab) for no particular reason.

N:=500; ks:=Array(1..N+2); i:=0; j:=1; new:=1; while i<=N do i:=i+1; ks[i]:=new; if (ks[j]=2) then i:=i+1; ks[i]:=new; end if; j:=j+1; new:=3-new; end do: