Asymptotics of 3n+1 stopping time

It is a well-known open problem whether the following process terminates for every positive integer:

3n+1 flow chart
3n+1 flow chart

Experiments suggest that it does, possibly with unintended side effects.

Since for any odd integer {n} the number {3n+1} is even, it is convenient to replace the step {n:=3n+1} with {n:=(3n+1)/2}, as show below:

Optimized flow
(3n+1)/2 flow chart

As a function of {n}, the stopping time has a nice patterned graph:

Stopping time
Stopping time

An odd integer {n} is of the form {4k+1} or {4k+3}. In the first case, {(3n+1)/2 = 6k+2} is even, while in the second case {(3n+1)/2 = 6k+5} is odd. So, if {n} is picked randomly from all odd integers of certain size, the probability of {(3n+1)/2} being even is {1/2}. Similarly, for an even {n}, the number {n/2} can be even or odd with probability {1/2}.

This leads to a stochastic model of the process:

Stochastic flow
Stochastic flow

The graph of stopping time in the stochastic model is, of course random. It looks nothing like the nice pattern of the deterministic process.

Stopping time, stochastic version
Stopping time, stochastic version

However, smoothing out both graphs by moving window of width {200} or so, we see the similarity:

Moving averages, deterministic and stochastic
Moving averages, deterministic and stochastic

The stochastic process is much easier to analyze. Focusing on the logarithm {\log x}, we see that it changes either by {\log(1/2)} or by approximately {\log (3/2)}. The expected value of the change is {\displaystyle \frac12\log (3/4)}. This suggests that we can expect the logarithm to drop down to {\log 1=0} in about {\displaystyle \frac{2}{\log (4/3)}\log x} steps. (Rigorous derivation requires more tools from probability theory, but is still routine.)

The curve {\displaystyle \frac{2}{\log (4/3)}\log x} fits the experimental data nicely. (The red curve, being randomly generated, is different from the one on the previous graph.)

Logarithmic growth
Logarithmic growth

For an in-depth investigation, see Lower bounds for the total stopping time of 3X+1 iterates by Applegate and Lagarias.

For the computations, I used Scilab. The function hail(n,m) calculates the stopping times up to given value of n, and takes moving average with window size m (which can be set to 1 for no averaging).

function hail(n,m)
    for i=2:n 
        while k>=i 
            if modulo(k,2)==0 then 
    total = cumsum(steps) 
    for i=1:n-m

As soon as the result of computations drops below the starting value, the number of remaining steps is fetched from the array that is already computed. This speeds up the process a bit.

The second function follows the stochastic model, for which the aforementioned optimization is not available. This is actually an interesting point: it is conceivable that the stochastic model would be more accurate if it also used the pre-computed stopping time once {x} drops below the starting value. This would change the distribution of stopping times, resulting in wider fluctuations after averaging.

function randomhail(n,m)
    for i=2:n 
        while k>1 
            if grand(1,1,"bin",1,1/2)==0 then 
    rtotal = cumsum(rsteps) 
    for i=1:n-m

Two subspaces, part II

Good news first. Given a 2-dimensional subspace M of \mathbb R^4 (or \mathbb C^4) such that e_j\notin M\cup M^\perp (j=1,2,3,4), we can always find a coordinate subspace N such that M and N are in generic position. (Recall that generic position means (M\cup M^\perp)\cap (N\cup N^\perp)=\{0\}, which is feasible only when the dimension of M and N is half the dimension of the ambient space.)

Indeed, suppose that none of the subspaces \langle e_1,e_2\rangle, \langle e_1,e_3\rangle, \langle e_1,e_4\rangle are in generic position with M. By the pigeonhole principle, either M or M^\perp meets at least two of these subspaces. We may assume M meets \langle e_1,e_2\rangle and \langle e_1,e_3\rangle. Since M is two-dimensional, it follows that M\subset \langle e_1,e_2,e_3\rangle. Hence e_4\in M^\perp, a contradiction.

The story is different in n=6 and higher dimensions. Consider the space

M=\langle e_1-e_2,e_1-e_3,e_4+e_5+e_6\rangle, with M^\perp=\langle e_1+e_2+e_3,e_4-e_5,e_4-e_6\rangle

The condition \forall j\ e_j\notin M\cup M^\perp holds. Yet, for any 3-dimensional coordinate space N, either N or its complement contains at least two of vectors e_1,e_2,e_3, and therefore intersect M. Damn perfidious pigeons.

So it’s not enough to demand that M\cup M^\perp does not contain any basis vectors. Let’s ask it to stay away from the basis vectors, as far as possible. By the Pythagorean theorem, the maximal possible distance of e_j to M\cup M^\perp is 1/\sqrt{2}, attained when e_j is equidistant from M and M^\perp. Let’s call M an equidistant subspace if this holds for all e_j. There are at least two other natural ways to express this property:

  • In the basis \{e_1,\dots,e_{2n}\}, the projection onto M is a matrix with 1/2 on the diagonal
  • Reflection across M sends e_j into a vector orthogonal to e_j. As a matrix, this reflection has 0s on the diagonal.

In two dimensions, the only equidistant subspaces are y=x and y=-x. In higher dimensions they form a connected subset of the Grassmannian \mathrm{Gr} (n/2, n) (self-promotion).

Is every equidistant subspace in generic position with some coordinate subspace?

We already saw that the answer is yes when n=2,4. It is also affirmative when n=6 (G. Weiss and V. Zarikian, “Paving small matrices and the Kadison-Singer extension problem”, 2010). I am 75% sure that the answer is yes in general.

The OEIS sequence A000002

The sequences in The On-Line Encyclopedia of Integer Sequences® are themselves arranged into a sequence, each being indexed by Axxxxxx. Hence, the sequence
2, 3, 2, 1, 3, 4, 1, 7, 7, 5, 45, 2, 181, 43, 17, ...
can never be included in the encyclopedia: its definition is a_n=1+\text{nth term of the nth sequence in the OEIS}.

By the way, which sequences have the honor of being on the virtual first page of the OEIS? A000004 consists of zeros. Its description offers the following explicit formula:

a(n)=A*sin(n*Pi) for any A. [From Jaume Oliver Lafont, Apr 02 2010]

The sequence A000001 counts the groups of order n.

The sequence A000002 consists of 1s and 2s only:
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, ...
The defining property of this sequence is self-similarity: consider the runs of equal numbers
1, 22, 11, 2, 1, 22, 1, 22, 11, 2, 11, 22, 1, 2, 11, 2, 1, 22, 11, ...
and replace each run by its length:
1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, ...
You get the original sequence again.

This is the Kolakoski sequence, which first appeared in print in 1965. Contrary to what one might expect from such a simple definition, there is no simple pattern; it is not even known whether 1s and 2s appear with the same asymptotic frequency 1/2. One way to look for patterns in a function/sequence is to take its Fourier transform, and this is what I did below:

Fourier transform: click to magnify

Specifically, this is the absolute value of \displaystyle F(t)=\sum_{n=1}^{500} (2a_n-3) e^{2\pi i n t}. To better see where the peaks fall, it’s convenient to look at |F(1/t)|:

Reciprocal axis: click to magnify

Something is going on at t=1/3, i.e. at period 3. Recall that the sequence contains no triples 111 or 222: there are no runs of more than 2 equal numbers. So perhaps it’s not surprising that a pattern emerges at period 3. And indeed, even though the percentage of 1s among the first 500 terms of the sequence is 49.8%, breaking it down by n mod 3 yields the following:

  • Among a_{3k}, the percentage of 1s is 17.4%
  • Among a_{3k+1}, the percentage of 1s is 85.2%
  • Among a_{3k+2}, the percentage of 1s is 46.8%

Noticing another peak at t=1/9, I also broke down the percentages by n mod 9:

  • n=9k: there are 5.4% 1s
  • n=9k+1: there are 74.8% 1s
  • n=9k+2: there are 41.4% 1s
  • n=9k+3: there are 21.6% 1s
  • n=9k+4: there are 97.2% 1s
  • n=9k+5: there are 70.2% 1s
  • n=9k+6: there are 25.2% 1s
  • n=9k+7: there are 84.6% 1s
  • n=9k+8: there are 28.8% 1s

Nothing close to 50%…

In conclusion, my code for the Kolakoski sequence. This time I used Maple (instead of Scilab) for no particular reason.

N:=500; ks:=Array(1..N+2); i:=0; j:=1; new:=1;
while i<=N do i:=i+1; ks[i]:=new;
if (ks[j]=2) then i:=i+1; ks[i]:=new; end if;
j:=j+1; new:=3-new;
end do:

Oversampling of polynomials

Suppose p(z)=c_0+\dots +c_d z^d is a polynomial which we can evaluate at the Nth roots of unity. By way of normalization, assume that |p(z)|\le 1 whenever z^N=1. What is the best upper estimate for |p| on the unit circle, i.e., the smallest number K(N,d) for which we can say \sup_{|z|=1}|p(z)|\le K(N,d)?

There is no bound at all if N\le d, because p can vanish at all roots of unity and be as large elsewhere as it pleases. So the first nontrivial case is N=d+1. In this case our p is the Lagrange interpolation polynomial with equidistributed nodes on the unit circle. Since its values at the nodes are bounded by 1, the matter reduces to estimating the basis polynomials \ell_j(z)=\prod_{m\ne j}\frac{z-\zeta_m}{\zeta_j-\zeta_m} where \zeta_m are the nodes. This leads to geometrically appealing calculations such as

N black dots uniformly distributed on the unit circle


N black dots uniformly distributed on the unit circle; and a green midpoint

Alexander Overwijk I am not.

After summation we get a partial sum of the harmonic series, and so K(d+1,d) is of order \log d. I’ve seen this result attributed to Marcinkiewicz, specifically to his 1937 paper Sur la divergence de polynomes d’interpolation. However, I could not find anything of the sort in this paper. The estimate is used, without proof, in his other paper in the same issue of the journal: Quelques remarques sur l’interpolation. I’m pretty sure Marcinkiewicz did not consider the estimate new; most likely, Bernstein and Faber did such computations earlier.

Anyway, what happens if N>d+1? Fast forward to the 21 century. In the 2005 paper On discrete norms of polynomials by Рахманов and Шехтман, the constant K(d,N) is shown to be of order \log \frac{N}{N-d}+1 for all N>d. The logarithm matters only if N/d is close to 1; for large values (say, N\ge 2d) this estimate is simply K_1\le K(N,d)\le K_2 with implicit constants K_1,K_2. Keeping d fixed, we should have K(N,d)\to 1 as N\to\infty, but the estimate does not tell us this.

Shortly thereafter (2008), T. Sheil-Small obtained a clean estimate K(N,d) \le \sqrt{\frac{N}{N-d}}, which indeed has the property K(N,d)\to 1 as N\to\infty. Interestingly, the estimate is sharp when N=2d (and only in this case): a multiple of z^d+i achieves the value K(2d,d)=\sqrt{2}. The extremal polynomial has a zero at the midpoint of every other gap between the Nth roots of unity.

And quite recently (2011), В.Н. Дубинин obtained the estimate K(N,d) \le 1/\cos\frac{\pi d}{2N} which is sharp whenever d divides N (and only then). In particular, that \sqrt{2} was secretly 1/\cos \frac{\pi}{4}. The polynomial that achieves the value K(md,d) has a zero at the midpoint of every mth gap between the Nth roots of unity.

Can one compute K(N,d) when N is not divisible by d? I don’t want to say it’s impossible, but one would first need to understand the extremal polynomials, and my numerical experiments show no clear pattern. Their zeros do not lie on the unit circle; they are sometimes inside and sometimes outside. So I’m not optimistic.

Integer sets without singular matrices

Let’s say that a set A\subset \mathbb N contains an n\times n singular matrix if there exists an n\times n matrix with determinant zero whose entries are distinct elements of A. For example, the set of prime numbers does not contain any 2\times 2 singular matrices; however, for every n\ge 3 it contains infinitely many n\times n singular matrices.

I don’t know of an elementary proof of the latter fact. By a 1939 theorem of van der Corput, the set of primes contains infinitely many progressions of length 3. (Much more recently, Green and Tao replaced 3 with an arbitrary k.) If every row of a matrix begins with a 3-term arithmetic progression, the matrix is singular.

In the opposite direction, one may want to see an example of an infinite set A\subset \mathbb N that contains no singular matrices of any size. Here is one:
Continue reading “Integer sets without singular matrices”

Almost norming functionals, Part 2

Let E be a real Banach space with the dual E^*. Fix \delta\in (0,1) and call a linear functional e^*\in E^* almost norming for e if |e|=|e^*|=1 and e^*(e)\ge \delta. In Part 1 I showed that in any Banach space there exists a continuous selection of almost norming functionals. Here I will prove that there is no uniformly continuous selection in \ell_1.

Claim. Let S be the unit sphere in \ell_1^n, the n-dimensional \ell_1-space.  Suppose that f\colon S\to \ell_{\infty}^n is a map such that f(e) is almost norming e in the above sense. Then the modulus of continuity \omega_f satisfies \omega_f(2/n)\ge 2\delta.

(If an uniformly continuous selection was available in \ell_1, it would yield selections in \ell_1^n with a modulus of continuity independent of n.)

Proof. Write f=(f_1,\dots,f_n). For any \epsilon\in \{-1,1\}^n we have n^{-1}\epsilon \in S, hence

\sum\limits_{i=1}^n \epsilon_i f_i(n^{-1}\epsilon)\ge n\delta for all \epsilon\in \{-1,1\}^n. Sum over all \epsilon and change the order of summation:

\sum\limits_{i=1}^n \sum\limits_{\epsilon}\epsilon_i f_i(n^{-1}\epsilon)\ge n2^n\delta

There exists i\in\{1,2,\dots,n\} such that

\sum\limits_{\epsilon}\epsilon_i f_i(n^{-1}\epsilon) \ge 2^n \delta

Fix this i from now on. Define \tilde \epsilon to be the same \pm vector as \epsilon, but with the ith component flipped. Rewrite the previous sum as

\sum\limits_{\epsilon} -\epsilon_i f_i(n^{-1}\tilde \epsilon)\ge 2^n\delta

and add them together:

\sum\limits_{\epsilon}\epsilon_i [f_i(n^{-1}\epsilon)-f_i(n^{-1}\tilde \epsilon)]\ge 2^{n+1}\delta

Since \|n^{-1}\epsilon-n^{-1}\tilde \epsilon\|=2/n, it follows that 2^n \omega_f(2/n) \ge 2^{n+1}\delta, as claimed.

Controlled bilipschitz extension

A map f\colon X\to Y is L-bilipschitz if L^{-1} |a-b| \le |f(a)-f(b)| \le L |a-b| for all a,b\in X. This definition makes sense if X and Y are general metric spaces, but let’s suppose they are subsets on the plane \mathbb R^2.

Definition 1. A set A\subset \mathbb R^2 has the BL extension property if any bilipschitz map f\colon A\to\mathbb R^2 can be extended to a bilipschitz map F\colon \mathbb R^2\to\mathbb R^2. (Extension means that F is required to agree with f on A.)

Lines and circles have the BL extension property. This was proved in early 1980s independently by Tukia, Jerison and Kenig, and Latfullin.

Definition 2. A set A\subset \mathbb R^2 has the controlled BL extension property if there exists a constant C such that any L-bilipschitz map f\colon A\to\mathbb R^2 can be extended to a C L-bilipschitz map F\colon \mathbb R^2\to\mathbb R^2.

Clearly, Definition 2 asks for more than Definition 1. I can prove that a line has the controlled BL extension property, even with a modest constant such as C=2000. (Incidentally, one cannot take C=1.) I still can’t prove the controlled BL extension property for a circle.

Update: extension from line is done in this paper and from the circle in this one.