The computational evidence in my previous post on the Kolakoski sequence was meagre: I looked only at the first 500 terms. By definition, the Kolakoski sequence consists of 1s and 2s, begins with 1, and is self-similar in the following sense: replacing each run by its length, you recover the original sequence. Here is its beginning, with separating spaces omitted within each run.

1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 11 2 11 2 1 22 1 22 11 2 1 22 1 2 11 2 11 22 1 22 11 2 1 22 1 22 11 2 11 2 1 22 1 2 11 22 1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 1 22 11 2 11 2 1 22 11 2 11 22 1 2 11 2 11 22 1 22 11 2 1 22 1 2 11 22 1 22 1 2 11 2 11 22 1 22 11 …

Conjecturally, the density of 1s and of 2s in this sequence is 50%. And indeed, the number of 2s among the first N terms grows in a way that’s hard to distinguish from N/2 on a plot. However, the density looks quite different if one looks at three subsequences separately:

- (red subsequence)
- (green subsequence)
- (blue subsequence)

At the very beginning (among the first 500 terms) the red sequence is dominated by 1s while the blue is dominated by 2s. But the story changes quickly. The plot below shows the count of 2s in each subsequence, minus the linear prediction. The average of red, green, and blue is shown in black.

The nearly flawless behavior of the overall count of 2s hides the wild oscillation within the three subsequences. Presumably, they continue moving back and forth and the amplitude of their oscillation appears to be growing. However, the growth looks sublinear, which means that even within each subsequence, the asymptotic density of 2s might converge to 1/2.

The computation was done in scilab. For simplicity I replaced 1s and 2s by 0s ans 1s, so that the length of the th run is . The variable *rep* indicates whether the next term should be the same as its predecessor; keeps the count of runs.

N=200000; a=zeros(N); j=1; rep=0;

for i=2:N

if (rep==1) a(i)=a(i-1); rep=0;

else a(i)=1-a(i-1); j=j+1; rep=a(j);

end

end

Pretty neat, isn’t it? The rest of the code is less neat, though. I still don’t know of any good way to calculate running totals in scilab.

M=floor(N/3)+1; s=zeros(M,4);

for i=0:M-2

for j=1:3

s(i+2,j)=s(i+1,j)+a(3*i+j)

end

s(i+2,4)=i/2;

end

plot(s(:,1)-s(:,4),'r'); plot(s(:,2)-s(:,4),'g'); plot(s(:,3)-s(:,4),'b');

plot((s(:,1)+s(:,2)+s(:,3))/3-s(:,4),'k');

Bonus content: I extended the computations to N=1663500 (yeah, should have increased the stack size beforehand).