Entering a string such as “random number 0 to 7” into Google search brings up a neat random number generator. For now, it supports only uniform probability distributions over integers. That’s still enough to play a little game.

Pick a positive number, such as 7. Then pick a number at random between 0 and 7 (integers, with equal probability); for example, 5. Then pick a number between 0 and 5, perhaps 2… repeat indefinitely. When we reach 0, the game becomes really boring, so that is a good place to stop. Ignoring the initial non-random number, we got a random non-increasing sequence such as 5, 2, 1, 1, 0. The sum of this one is 9… how are these sums distributed?

Let’s call the initial number A and the sum S. The simplest case is A=1, when S is the number of returns to 1 until the process hits 0. Since each return to 1 has probability 1/2, we get the following geometric distribution

Sum | Probability |

0 | 1/2 |

1 | 1/4 |

2 | 1/8 |

3 | 1/16 |

k | 1/2^{k+1} |

When starting with A=2, things are already more complicated: for one thing, the probability mass function is no longer decreasing, with P[S=2] being greater than P[S=1]. The histogram shows the counts obtained after 2,000,000 trials with A=2.

The probability mass function is still not too hard to compute: let’s say b is the number of times the process arrives at 2, then the sum is 2b + the result with A=1. So we end up convolving two geometric distributions, one of which is supported on even integers: hence the bias toward even sums.

Sum | Probability |

0 | 1/3 |

1 | 1/6 |

2 | 5/36 |

3 | 7/72 |

k | ((4/3)^{[k/2]+1}-1)/2^{k} |

For large k, the ratio P[S=k+2]/P[s=k] is asymptotic to (4/3)/4 = 1/3, which means that the tail of the distribution is approximately geometric with the ratio of .

I did not feel like computing exact distribution for larger A, resorting to simulations. Here is A=10 (ignore the little bump at the end, an artifact of truncation):

There are three distinct features: P[S=0] is much higher than the rest; the distribution is flat (with a bias toward even, which is diminishing) until about S=n, and after that it looks geometric. Let’s see what we can say for a general starting value A.

Perhaps surprisingly, the expected value E[S] is exactly A. To see this, consider that we are dealing with a Markov chain with states 0,1,…,A. The transition probabilities from n to any number 0,…,n are 1/(n+1). Ignoring the terminal state 0, which does not contribute to the sum, we get the following kind of transition matrix (the case A=4 shown):

The initial state is a vector such as . So is the state after j steps. The expected value contributed by the j-th step is where is the weight vector. So, the expected value of the sum is

It turns out that the matrix has a simple form, strongly resembling M itself.

Left multiplication by v extracts the bottom row of this matrix, and we are left with a dot product of the form . Neat.

What else can we say? The median is less than A, which is no surprise given the long tail on the right. Also, P[S=0] = 1/(A+1) since the only way to have zero sum is to hit 0 at once. A more interesting question is: what is the limit of the distribution of T = S/A as A tends to infinity? Here is the histogram of 2,000,000 trials with A=50.

It looks like the distribution of T tends to a limit, which has constant density until 1 (so, until A before rescaling) and decays exponentially after that. Writing the supposed probability density function as for , for , and using the fact that the expected value of T is 1, we arrive at and . This is a pretty good approximation in some aspects: the median of this distribution is , suggesting that the median of S is around which is in reasonable agreement with experiment. But the histogram for A=1000 still has a significant deviation from the exponential curve, indicating that the supposedly geometric part of T isn’t really geometric:

One can express S as a sum of several independent geometric random variables, but the number of summands grows quadratically in A, and I didn’t get any useful asymptotics from this. What is the true limiting distribution of S/A, if it’s not the red curve above?