# Recursive randomness of integers

Entering a string such as “random number 0 to 7” into Google search brings up a neat random number generator. For now, it supports only uniform probability distributions over integers. That’s still enough to play a little game.

Pick a positive number, such as 7. Then pick a number at random between 0 and 7 (integers, with equal probability); for example, 5. Then pick a number between 0 and 5, perhaps 2… repeat indefinitely. When we reach 0, the game becomes really boring, so that is a good place to stop. Ignoring the initial non-random number, we got a random non-increasing sequence such as 5, 2, 1, 1, 0. The sum of this one is 9… how are these sums distributed?

Let’s call the initial number A and the sum S. The simplest case is A=1, when S is the number of returns to 1 until the process hits 0. Since each return to 1 has probability 1/2, we get the following geometric distribution Sum Probability 0 1/2 1 1/4 2 1/8 3 1/16 k 1/2k+1

When starting with A=2, things are already more complicated: for one thing, the probability mass function is no longer decreasing, with P[S=2] being greater than P[S=1]. The histogram shows the counts obtained after 2,000,000 trials with A=2. The probability mass function is still not too hard to compute: let’s say b is the number of times the process arrives at 2, then the sum is 2b + the result with A=1. So we end up convolving two geometric distributions, one of which is supported on even integers: hence the bias toward even sums.

 Sum Probability 0 1/3 1 1/6 2 5/36 3 7/72 k ((4/3)[k/2]+1-1)/2k

For large k, the ratio P[S=k+2]/P[s=k] is asymptotic to (4/3)/4 = 1/3, which means that the tail of the distribution is approximately geometric with the ratio of ${1/\sqrt{3}}$.

I did not feel like computing exact distribution for larger A, resorting to simulations. Here is A=10 (ignore the little bump at the end, an artifact of truncation): There are three distinct features: P[S=0] is much higher than the rest; the distribution is flat (with a bias toward even, which is diminishing) until about S=n, and after that it looks geometric. Let’s see what we can say for a general starting value A.

Perhaps surprisingly, the expected value E[S] is exactly A. To see this, consider that we are dealing with a Markov chain with states 0,1,…,A. The transition probabilities from n to any number 0,…,n are 1/(n+1). Ignoring the terminal state 0, which does not contribute to the sum, we get the following kind of transition matrix (the case A=4 shown): $M = \begin{pmatrix}1/2 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 0 \\ 1/5 & 1/5 & 1/5 & 1/5\end{pmatrix}$

The initial state is a vector such as ${v = (0,0,0,1)}$. So ${vM^j}$ is the state after j steps. The expected value contributed by the j-th step is ${vM^jw}$ where ${w = (1,2,3,4)^T}$ is the weight vector. So, the expected value of the sum is $\sum_{j=1}^\infty vM^jw = v\left(\sum_{j=1}^\infty M^j\right)w = vM(I-M)^{-1}w$

It turns out that the matrix ${M(I-M)^{-1}}$ has a simple form, strongly resembling M itself. $M(I-M)^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1/2 & 0 & 0 \\ 1 & 1/2 & 1/3 & 0 \\ 1 & 1/2 & 1/3 & 1/4 \end{pmatrix}$

Left multiplication by v extracts the bottom row of this matrix, and we are left with a dot product of the form ${(1,1/2,1/3,1/4)\cdot (1,2,3,4) = 1 + 1 + 1 + 1 = 4 }$. Neat.

What else can we say? The median is less than A, which is no surprise given the long tail on the right. Also, P[S=0] = 1/(A+1) since the only way to have zero sum is to hit 0 at once. A more interesting question is: what is the limit of the distribution of T = S/A as A tends to infinity? Here is the histogram of 2,000,000 trials with A=50. It looks like the distribution of T tends to a limit, which has constant density until 1 (so, until A before rescaling) and decays exponentially after that. Writing the supposed probability density function as ${f(t) = c}$ for ${0\le t\le 1}$, ${f(t) = c\exp(k(1-t))}$ for ${t > 1}$, and using the fact that the expected value of T is 1, we arrive at ${c = 2-\sqrt{2} \approx 0.586}$ and ${k=\sqrt{2}}$. This is a pretty good approximation in some aspects: the median of this distribution is ${1/(2c)}$, suggesting that the median of S is around ${n/(4-2\sqrt{2})}$ which is in reasonable agreement with experiment. But the histogram for A=1000 still has a significant deviation from the exponential curve, indicating that the supposedly geometric part of T isn’t really geometric: