## Asymptotics of 3n+1 stopping time

It is a well-known open problem whether the following process terminates for every positive integer ${n}$: if ${n}$ is even, replace it with ${n/2}$; if ${n}$ is odd, replace it with ${3n+1}$; repeat until ${n}$ reaches ${1}$.

Experiments suggest that it does, possibly with unintended side effects.

Since for any odd integer ${n}$ the number ${3n+1}$ is even, it is convenient to combine the two steps, replacing ${n:=3n+1}$ with ${n:=(3n+1)/2}$.
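In Python terms, the accelerated process and its stopping time look like the following sketch (the function name is mine):

```python
def stopping_time(n):
    """Number of accelerated Collatz steps to get from n down to 1.

    Even n is halved; odd n is sent to (3n+1)/2 in a single step.
    Termination is assumed -- that is the open problem.
    """
    steps = 0
    while n > 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = (3 * n + 1) // 2
        steps += 1
    return steps
```

For example, `stopping_time(3)` counts the chain ${3\to 5\to 8\to 4\to 2\to 1}$, which is ${5}$ steps.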

As a function of ${n}$, the stopping time has a nicely patterned graph:

An odd integer ${n}$ is of the form ${4k+1}$ or ${4k+3}$. In the first case, ${(3n+1)/2 = 6k+2}$ is even, while in the second case ${(3n+1)/2 = 6k+5}$ is odd. So, if ${n}$ is picked at random from all odd integers of a given size, the probability of ${(3n+1)/2}$ being even is ${1/2}$. Similarly, for an even ${n}$, the number ${n/2}$ is even or odd with probability ${1/2}$.
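This half-and-half split can be confirmed by direct counting; a quick Python check (my own sketch, not part of the original computations):

```python
# Among the odd integers from 1 to 199 (there are 100 of them), exactly
# those of the form 4k+1 make (3n+1)/2 even -- that is, half of them.
odds = range(1, 200, 2)
even_count = sum(1 for n in odds if ((3 * n + 1) // 2) % 2 == 0)
print(even_count, len(odds))  # 50 100
```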

This leads to a stochastic model of the process: start with ${x=n}$ and, at each step, replace ${x}$ with either ${x/2}$ or ${(3x+1)/2}$, each with probability ${1/2}$ and independently of the past, stopping when ${x\le 1}$.
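One run of this stochastic model (halve ${x}$ or send it to ${(3x+1)/2}$ on a fair coin flip, until ${x}$ drops to ${1}$ or below) can be sketched in Python as follows; the function name is mine:

```python
import random

def random_stopping_time(x, rng=random):
    """One sample of the stochastic model: at each step x is halved or
    replaced by (3x+1)/2, each with probability 1/2, until x <= 1."""
    steps = 0
    while x > 1:
        x = x / 2 if rng.random() < 0.5 else (3 * x + 1) / 2
        steps += 1
    return steps
```

Since a step shrinks ${x}$ by at most a factor of ${2}$, a sample started at ${x}$ can never be smaller than ${\log_2 x}$.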

The graph of the stopping time in the stochastic model is, of course, random. It looks nothing like the nice pattern of the deterministic process.

However, after smoothing out both graphs by a moving window of width ${200}$ or so, we see the similarity:

The stochastic process is much easier to analyze. Focusing on the logarithm ${\log x}$, we see that it changes either by ${\log(1/2)}$ or by approximately ${\log (3/2)}$. The expected value of the change is ${\displaystyle \frac12\log (3/4)}$. This suggests that we can expect the logarithm to drop down to ${\log 1=0}$ in about ${\displaystyle \frac{2}{\log (4/3)}\log x}$ steps. (Rigorous derivation requires more tools from probability theory, but is still routine.)
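Both the drift computation and the resulting prediction are easy to sanity-check numerically. Here is a Python sketch (the simulation and all names are mine, not part of the original Scilab code):

```python
import math
import random

# Expected change of log x per step: (1/2)(log(1/2) + log(3/2)) = (1/2) log(3/4)
drift = 0.5 * (math.log(1/2) + math.log(3/2))
assert math.isclose(drift, 0.5 * math.log(3/4))

# Predicted number of steps to bring log x down to 0
x0 = 10**6
predicted = 2 / math.log(4/3) * math.log(x0)

# Monte Carlo estimate of the mean stopping time of the stochastic model
rng = random.Random(0)

def sample(x):
    steps = 0
    while x > 1:
        x = x / 2 if rng.random() < 0.5 else (3 * x + 1) / 2
        steps += 1
    return steps

mean = sum(sample(x0) for _ in range(2000)) / 2000
print(round(predicted), round(mean))  # the two agree to within a few percent
```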

The curve ${\displaystyle \frac{2}{\log (4/3)}\log x}$ fits the experimental data nicely. (The red curve, being randomly generated, is different from the one on the previous graph.)

For an in-depth investigation, see *Lower bounds for the total stopping time of 3x+1 iterates* by Applegate and Lagarias.

For the computations, I used Scilab. The function `hail(n,m)` calculates the stopping times for all starting values up to `n`, and takes a moving average with window size `m` (which can be set to `1` for no averaging).

```
function hail(n,m)
    steps = zeros(1:n); steps(1) = 0
    for i = 2:n
        k = i; s = 0;
        while k >= i
            if modulo(k,2) == 0 then
                k = k/2; s = s+1;
            else
                k = (3*k+1)/2; s = s+1;
            end
        end
        steps(i) = s + steps(k);
    end
    total = cumsum(steps)
    for i = 1:n-m
        average(i) = (total(i+m) - total(i))/m;
    end
    plot(average, '+');
endfunction
```

As soon as the result of computations drops below the starting value, the number of remaining steps is fetched from the array that is already computed. This speeds up the process a bit.
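A Python rendering of the same idea (names mine; the plotting and averaging are omitted) may make the optimization clearer:

```python
def stopping_times(n):
    """steps[i] = accelerated stopping time of i, for i = 1..n.

    As in hail(): once an iterate k drops below the starting value i,
    its stopping time steps[k] has already been computed, so we stop
    iterating and add it on.
    """
    steps = [0] * (n + 1)
    for i in range(2, n + 1):
        k, s = i, 0
        while k >= i:
            k = k // 2 if k % 2 == 0 else (3 * k + 1) // 2
            s += 1
        steps[i] = s + steps[k]
    return steps
```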

The second function follows the stochastic model, for which the aforementioned optimization is not available. This is actually an interesting point: it is conceivable that the stochastic model would be more accurate if it also used the pre-computed stopping time once ${x}$ drops below the starting value. This would change the distribution of stopping times, resulting in wider fluctuations after averaging.

```
function randomhail(n,m)
    rsteps = zeros(1:n); rsteps(1) = 0
    for i = 2:n
        k = i; s = 0;
        while k > 1
            if grand(1,1,"bin",1,1/2) == 0 then
                k = k/2; s = s+1;
            else
                k = (3*k+1)/2; s = s+1;
            end
        end
        rsteps(i) = s;
    end
    rtotal = cumsum(rsteps)
    for i = 1:n-m
        raverage(i) = (rtotal(i+m) - rtotal(i))/m;
    end
    plot(raverage, 'r+');
endfunction
```

## Walking randomly again

This is a follow-up to An exercise in walking randomly, but can be read independently.

Suppose I randomly and blindly walk back and forth along this hallway, which may or may not happen in reality. The safest place to start is in the middle, next to the department’s main office. How long will it take before I bump into a wall at one of the ends?

(Visitors are advised to not walk blindly near the ongoing construction.)

Formally, this is a random walk: I start at the midpoint of the interval $[-n,n]$ and at each step move one unit to the left or right, each with probability $1/2$. The walk ends when I reach either $n$ or $-n$. What is the probability that the walk lasts exactly $k$ steps?

The answer (found in the earlier post) is given by the coefficient of $y^k$ in the power series for $\mathcal T_n(1/y)^{-1}$, where $\mathcal T_n$ is the Chebyshev polynomial of degree $n$. The plot below shows the exit time distribution for $n=10$, with an extraneous but nice pattern.

In a comment, Lianxin He pointed out an alternative approach via the reflection principle: it gives the answer as a sum of binomial coefficients, which can be truncated to yield a simple and effective approximation. I describe his approach below.

To begin with, assume that $k+n$ is even, otherwise the probability is zero. Out of $2^k$ possible walks, $\displaystyle \binom{k}{(k+n)/2}$ make $n$ more steps to the right than to the left. Dividing this binomial coefficient by $2^k$ and multiplying by two (to account for both exits), we conclude that $\displaystyle p_0(k)=2^{1-k} \binom{k}{(k+n)/2}$ is the probability of reaching an end of the hallway at time $k$. The graph of $p_0$ is in red below:

This is not good at all. We neglected to exclude the walks that hit a wall before time $k$. How many of them are there? Let’s consider only the walks that end at $n$ at time $k$: formally, $S_k=n$, where $S_j$ denotes the position at time $j$.
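The overcounting is easy to see by brute force for small parameters. This Python sketch (names mine) enumerates all $2^k$ walks for $n=2$, $k=4$ and compares $p_0$ with the exact first-exit probability:

```python
from itertools import product
from math import comb

n, k = 2, 4

# p0: probability of |S_k| = n, ignoring earlier wall hits
p0 = 2 ** (1 - k) * comb(k, (k + n) // 2)

# exact probability that the FIRST visit to {-n, n} happens at time k,
# found by enumerating all 2^k walks
hits = 0
for steps in product((-1, 1), repeat=k):
    s = 0
    exit_time = None
    for j, step in enumerate(steps, start=1):
        s += step
        if abs(s) == n:
            exit_time = j
            break
    if exit_time == k:
        hits += 1
exact = hits / 2 ** k

print(p0, exact)  # 0.5 versus 0.25: p0 also counts walks that hit a wall early
```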

1. The walks that hit $-n$ before time $k$ can be reflected about $-n$ the first time they get there. The reflection puts them in bijection with the walks such that $S_k=-3n$. There are $\displaystyle \binom{k}{(k+3n)/2}$ of those.
2. If a walk hits $n$ before time $k$, we flip a fair coin deciding whether to reflect it about $n$. As a result of our intervention, such walks will have $S_{k-1}=n-1$ or $S_{k-1}=n+1$ with equal probability. But on the other hand, our intervention did not change the distribution of walks. Hence, the number of walks hitting $n$ before time $k$ is twice the number of walks with $S_{k-1}=n+1$. Namely, $\displaystyle 2\binom{k-1}{(k+n)/2}$.

Excluding both types 1) and 2) counted above, we arrive at the new probability estimate

$\displaystyle p_1(k)=2^{1-k} \left\{\binom{k}{(k+n)/2}-2\binom{k-1}{(k+n)/2}-\binom{k}{(k+3n)/2}\right\}$

Let’s see how it works…

Looks very good at the beginning, but for large $k$ we are seriously undercounting. Indeed, the events 1) and 2) described above are not mutually exclusive. For example, a walk could hit $n$, then $-n$, and then arrive at $n$ again at time $k$. At present we are excluding such a walk twice. Let’s fix this by adding such walks back.

• The walks that hit $n$ and then $-n$ before time $k$ can be reflected twice, putting them in bijection with the walks such that $S_k=5n$. There are $\displaystyle \binom{k}{(k+5n)/2}$ of those.
• If a walk hits $-n$ and then $n$ before time $k$, we reflect at $-n$ and then flip a coin when the walk reaches $-3n$. Following the reasoning in 2) above, the count is $\displaystyle 2\binom{k-1}{(k+3n)/2}$.

Our new estimate, then, is $\displaystyle p_2(k)=p_1(k)+2^{1-k} \left\{2\binom{k-1}{(k+3n)/2} + \binom{k}{(k+5n)/2}\right\}$. Here it is.

Looks good! But of course now we overcount the walks that hit the walls three times before time $k$. Fortunately, the pattern is now clear: the next approximation will be $\displaystyle p_3(k)=p_2(k)-2^{1-k} \left\{ 2\binom{k-1}{(k+5n)/2} + \binom{k}{(k+7n)/2}\right\}$.

You already can’t tell this approximation from the exact distribution. To summarize (pun intended), the exact probability of exiting $(-n,n)$ at time $k$ is

$\displaystyle p(k)=2^{1-k}\left\{\binom{k}{(k+n)/2} +\sum_{r=1}^\infty (-1)^r \left[2\binom{k-1}{(k+(2r-1)n)/2}+\binom{k}{(k+(2r+1)n)/2}\right] \right\}$

where the series is actually finite.
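The sum is short enough to implement directly. Here is a Python sketch (the function name is mine) that also shows why the series terminates; for the hallway $[-2,2]$ it reproduces the probabilities $1/2^{k/2}$ at even times $k$:

```python
from math import comb

def exit_probability(k, n):
    """P(the walk first exits (-n, n) at time k), by the reflection series.

    The series terminates: once (2r-1)n > k, every remaining binomial
    coefficient is zero (math.comb returns 0 when the lower index
    exceeds the upper one).
    """
    if (k + n) % 2 != 0:
        return 0.0
    total = comb(k, (k + n) // 2)
    r = 1
    while (2 * r - 1) * n <= k:
        sign = -1 if r % 2 == 1 else 1
        total += sign * (2 * comb(k - 1, (k + (2 * r - 1) * n) // 2)
                         + comb(k, (k + (2 * r + 1) * n) // 2))
        r += 1
    return 2 ** (1 - k) * total
```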

## An exercise in walking randomly

Suppose I randomly and blindly walk back and forth along this hallway, which may or may not happen in reality. The safest place to start is in the middle, next to the department’s main office. How long will it take before I bump into a wall at one of the ends?

Formally, let $X_1,X_2,\dots$ be independent random variables which take values $\pm 1$ with equal probability $1/2$. These are the steps of the symmetric random walk. Starting from $S_0=0$, after $n$ steps I end up at the point $S_n=X_1+\dots +X_n$. The sequence $(S_n)$ is a martingale, being the sum of independent random variables with zero mean. Less obviously, for every number $\theta\in\mathbb R$ the sequence $Z_n=\exp(\theta S_n-n\log \cosh \theta)$ is also a martingale, even though the increments $d_n=Z_n-Z_{n-1}$ are not independent of each other. The reason for $\log \cosh\theta$ is that $E(e^{\theta X_n})=\cosh\theta$. It is convenient to write $s=\log\cosh\theta$ in what follows. We check the martingale property of $Z_n=e^{\theta S_n-ns}$ by conditioning on a value $S_{n-1}=\alpha$ and computing $E(Z_n | S_{n-1}=\alpha)=e^{\theta\alpha-(n-1)s}=Z_{n-1}$.

If our hallway is the interval $[-N,N]$, we should investigate the stopping time $T=\min\{n\ge 1\colon |S_n|\ge N\}$. The optional stopping time theorem applies to $Z_T$ because $|Z_n|\le e^{|\theta| N}$ whenever $n\le T$. Thus, $E(Z_T)=E(Z_0)=1$. In terms of $S_n$ this means $E e^{\theta S_T-sT}=1$.

Now, $S_T$ is either $N$ or $-N$; both happen with $P=1/2$ and in both cases $T$ has the same distribution. Therefore, $1=E e^{\theta S_T-sT}=\frac{1}{2}(e^{N\theta}+e^{-N\theta}) E e^{-sT}$. We end up with the Laplace transform of $T$,

$\displaystyle E e^{-sT}=\frac{1}{\cosh N\theta} = \frac{1}{\cosh N\, \mathrm{arccosh}\, e^{s}}, \qquad s\ge 0$

(I would not want to write $\cosh^{-1}$ in such a formula.) Since $N$ is an integer and $e^{s}\ge 1$, a magical identity applies:

$\displaystyle \cosh N \,\mathrm{arccosh}\,x = \mathcal T_N(x)\qquad x\ge 1$,

$\mathcal T_N$ being the $N$th Chebyshev polynomial of the first kind. (Compare to $\displaystyle \cos N \,\mathrm{arccos}\,x = \mathcal T_N(x)$, which holds for $|x|\le 1$.) Thus,

$\displaystyle E e^{-sT}=\frac{1}{\mathcal T_N (e^{s})}, \qquad s\ge 0$

One more change of notation: let $e^{-s}=y$, so that $E e^{-sT}=E(y^T)=\sum_{n=1}^{\infty} P(T=n)\,y^n$. The formula

$\displaystyle \sum_{n=1}^{\infty} P(T=n)\,y^n=\frac{1}{\mathcal T_N (1/y)}$

gives us the probabilities $P(T=n)$ for each $n$, assuming we can expand the reciprocal of the Chebyshev polynomial into a power series. Let’s check two simple cases:

• if $N=1$, then $\mathcal T_1(x)=x$ and $\frac{1}{\mathcal T_1 (1/y)}=y$. Hence $T\equiv 1$ which is correct: the walk immediately hits one of the walls $\pm 1$.
• if $N=2$, then $\mathcal T_2(x)=2x^2-1$, hence $\displaystyle \frac{1}{\mathcal T_2 (1/y)}=\frac{y^2}{2}\frac{1}{1-y^2/2}=\sum_{k=1}^\infty \frac{1}{2^k} y^{2k}$. This means $T$ can never be odd (which makes sense, since it must have the parity of $N$), and the probability of hitting a wall in exactly $2k$ steps is $1/2^k$. A moment’s reflection confirms that the answer is right.

For a serious example, with $N=10$, I used Maple:

```
with(plots):
n := 150:
ser := series(1/ChebyshevT(10, 1/x), x = 0, 2*n+1):
for i from 1 to 2*n do a[i] := coeff(ser, x^i) end do:
listplot([seq(a[i], i = 1..2*n)], color = blue, thickness = 2);
```

By default listplot connects the dots, and since every other term is zero, the plot has a trippy pattern.

Obviously, it’s impossible to hit a wall in fewer than $N$ steps. The formula confirms this: writing $1/\mathcal T_N (1/y)$ as a rational function with numerator $y^N$, we see that no terms $y^n$ with $n<N$ can appear in the expansion.
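The same expansion can be checked without Maple. The Python sketch below (all names are mine) builds the coefficients of $\mathcal T_N$ from the recurrence $\mathcal T_{m+1}(x)=2x\,\mathcal T_m(x)-\mathcal T_{m-1}(x)$, then expands $1/\mathcal T_N(1/y)=y^N/\bigl(y^N\mathcal T_N(1/y)\bigr)$ by long division of power series; exact rational arithmetic avoids round-off.

```python
from fractions import Fraction

def chebyshev_coeffs(N):
    """Coefficients of T_N (index = power of x), from the recurrence
    T_0 = 1, T_1 = x, T_{m+1} = 2x T_m - T_{m-1}."""
    t_prev, t_cur = [1], [0, 1]
    if N == 0:
        return t_prev
    for _ in range(N - 1):
        shifted = [0] + [2 * c for c in t_cur]                 # 2x * T_m
        padded = t_prev + [0] * (len(shifted) - len(t_prev))   # align degrees
        t_prev, t_cur = t_cur, [a - b for a, b in zip(shifted, padded)]
    return t_cur

def exit_time_series(N, terms):
    """First `terms` coefficients of 1/T_N(1/y); the coefficient of y^k
    is P(T = k) for the walk on [-N, N] started at 0."""
    a = chebyshev_coeffs(N)
    d = a[::-1]                    # q(y) = y^N * T_N(1/y), a polynomial in y
    c = [Fraction(0)] * terms
    for k in range(N, terms):      # solve q(y) * c(y) = y^N term by term
        s = Fraction(1 if k == N else 0)
        for m in range(1, min(k, N) + 1):
            s -= d[m] * c[k - m]
        c[k] = s / d[0]
    return c
```

For $N=2$ this yields $\tfrac12, 0, \tfrac14, 0, \tfrac18,\dots$ starting at $y^2$, matching the hand computation above, and every coefficient below $y^N$ vanishes, as it should.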

(This post is an exercise which I should have done long ago but never did. Thanks are due to Ted Cox for a walk-through.)