Relating integers by differences of reciprocals

Let’s say that two positive integers m, n are compatible if the difference of their reciprocal is the reciprocal of an integer: that is, mn/(m-n) is an integer. For example, 2 is compatible with 1, 3, 4, and 6 but not with 5. Compatibility is a symmetric relation, which we’ll denote {m\sim n} even though it’s not an equivalence. Here is a chart of this relation, with red dots indicating compatibility.

Compatibility of integers up to 200


A few natural questions arise, for any given {n}:

  1. What is the greatest number compatible with {n}?
  2. What is the smallest number compatible with {n}?
  3. How many numbers are compatible with {n}?

Before answering them, let’s observe that {m\sim n} if and only if {|m-n|} is a product of two divisors of {n}. Indeed, for {mn/(m-n)} to be an integer, we must be able to write {|n-m|} as the product of a divisor of {m} and a divisor of {n}. But a common divisor of {m} and {m-n} is also a divisor of {n}.

Of course, “a product of two divisors of {n}” is the same as “a divisor of {n^2}“. But it’s sometimes easier to think in terms of the former.

Question 1 is now easy to answer: the greatest number compatible with {n} is {n+n^2 = n(n+1)}.

But there is no such an easy answer to Questions 2 and 3, because of the possibility of overshooting into negative when subtracting a divisor of {n^2} from {n}. The answer to Question 2 is {n-d} where {d} is the greatest divisor of {n^2} than is less than {n}. This is the OEIS sequence A063428, pictured below.

The smallest compatible number for n

The lines are numbers with few divisors: for a prime p, the smallest compatible number is p-1, while for 2p it is p, etc.

The answer to Question 3 is: the number of distinct divisors of {n^2}, plus the number of such divisors that are less than {n}. This is the OEIS sequence A146564.


Since consecutive integers are compatible, every number {n} is a part of a compatible chain {1\sim \cdots\sim n}. How to build a short chain like this?

Strategy A: starting with {n}, take the smallest integer compatible with the previous one. This is sure to reach 1. But in general, this greedy algorithm is not optimal. For n=22 it yields 22, 11, 10, 5, 4, 2, 1 but there is a shorter chain: 22, 18, 6, 2, 1.

Strategy B: starting with {1}, write the greatest integer compatible with the previous one (which is {k(k+1)} by the above). This initially results in the OEIS sequence A007018: 1, 2, 6, 42, 1806, … which is notable, among other things, for giving a constructive proof of the infinitude of primes, even with a (very weak) lower density bound. Eventually we have to stop adding {k^2} to {k} every time; so instead add the greatest divisor of {k^2} such that the sum does not exceed {n}. For 22, this yields the optimal chain 1, 2, 6, 18, 22 stated above. But for 20 strategy B yields 1, 2, 6, 18, 20 while the shortest chain is 1, 2, 4, 20. Being greedy is not optimal, in either up or down direction.

Strategy C is not optimal either, but it is explicit and provides a simple upper bound on the length of a shortest chain. It uses the expansion of n in the factorial number system which is the sum of factorials k! with coefficients less than k. For example {67 = 2\cdot 4! + 3\cdot 3! + 0\cdot 2! + 1\cdot 1!}, so its factorial representation is 2301.

If n is written as abcd in the factorial system, then the following is a compatible chain leading to n (possibly with repetitions), as is easy to check:

1, 10, 100, 1000, 1000, a000, ab00, abc0, abcd

In the example with 67, this chain is

1, 10, 100, 1000, 2000, 2300, 2300, 2301

in factorial notation, which converts to decimals as

1, 2, 6, 24, 48, 66, 66, 67

The repeated 66 (due to 0 in 2301) should be removed.

Thus, for an even number s, there is a chain of length at most s leading to any integer that is less than (s/2+1)!.

As a consequence, the smallest possible length of chain leading to n is {O(\log n/\log \log n)}. Is this the best possible O-estimate?

All of the strategies described above produce monotone chains, but the shortest chain is generally not monotone. For example, 17 can be reached by the non-monotone chain 1, 2, 6, 18, 17 of length 5 but any monotone chain will have length at least 6.

The smallest-chain-length sequence is 1, 2, 3, 3, 4, 3, 4, 4, 4, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 4, 5, 5, 5, 4, 5, … which is not in OEIS.  Here is its plot:

Smallest chain length for integers from 1 to 1000

The sequence 1, 2, 3, 5, 11, 29, 67, 283, 2467,… lists the smallest numbers that require a chain of given length — for example, 11 is the first number that requires a chain of length 5, etc. Not in OEIS; the rate of growth is unclear.

Need for speed vs bounded position and acceleration

You are driving a car with maximal acceleration (and deceleration) A on a road that’s been blocked off in both directions (or, if you prefer, on the landing strip of an aircraft carrier). Let L be the length of the road available to you.
What is the maximal speed you can reach?

Besides A and L, the answer also depends on your mood: do you want to live, or are you willing to go out in a blaze of glory? In the latter case the answer is obvious: position the car at one end of the interval, and put the pedal to the metal. The car will cover the distance L within the time {\sqrt{2L/A}}, reaching the speed {v=\sqrt{2AL}} at the end. In the former scenario one has to switch to the brake pedal midway through the distance, so the maximal speed will be attained at half the length, {\sqrt{AL}}.

Rephrased in mathematical terms: if {f} is a twice differentiable function and {M_k = \sup|f^{(k)}|} for {k=0,1,2}, then {M_1^2 \le 4M_0M_2} if {f} is defined on a half-infinite interval, and {M_1^2 \le 2M_0M_2} if the domain of {f} is the entire line. To connect the notation, just put {L=2M_0} and {A=M_1} in the previous paragraph… and I guess some proof other than “this is obvious” is called for, but it’s not hard to find one: this is problem 5.15 in Rudin’s Principles of Mathematical Analysis.

Perhaps more interesting is to study the problem in higher dimensions: one could be driving in a parking lot of some shape, etc. Let’s normalize the maximal acceleration as 1, keeping in mind it’s a vector. Given a set E, let S(E) be the square of maximal speed attainable by a unit-acceleration vehicle which stays in E indefinitely. Also let U(E) be the square of maximal speed one can attain while crashing out of bounds after the record is set. Squaring makes these quantities scale linearly with the size of the set. Both are monotone with respect to set inclusion. And we know what they are for an interval of length L: namely, {S = L} and {U=2L}, so that gives some lower bounds for sets that contain a line interval.

When E is a circle of radius 1, the best we can do is to drive along it with constant speed 1; then the centripetal acceleration is also 1. Any higher speed will exceed the allowable acceleration in the normal direction, never mind the tangential one. So, for a circle both S and U are equal to its radius.

On the other hand, if E is a disk of radius R, then driving along its diameter is better: it gives {S\ge 2R} and {U\ge 4R}.

Some questions:

  1. If E is a convex set of diameter D, is it true that {S(E) = D} and {U(E) = 2D}?
  2. Is it true that {U\le 2S} in general?
  3. How to express S and U for a smooth closed curve in terms of its curvature? They are not necessarily equal (like they are for a circle): consider thin ellipses converging to a line segment, for which S and U approach the corresponding values for that segment.

The answer to Question 1 is yes. Consider the orthogonal projection of E, and of a trajectory it contains, onto some line L. This does not increase the diameter or the acceleration; thus, the one-dimensional result implies that the projection of velocity vector onto L does not exceed {\sqrt{D}} (or {\sqrt{2D}} for the crashing-out version). Since L was arbitrary, it follows that {S(E) \le D} and {U(E) \le 2D}. These upper bounds hold for general sets, not only convex ones. But when E is convex, we get matching lower bounds by considering the longest segment contained in E.

I don’t have answers to questions 2 and 3.

Lightness, hyperspace, and lower oscillation bounds

When does a map {f\colon X\to Y} admit a lower “anti-continuity” bound like {d_Y(f(a),f(b))\ge \lambda(d_X(a,b))} for some function {\lambda\colon (0,\infty)\to (0, \infty)} and for all {a\ne b}? The answer is easy: {f} must be injective and its inverse must be uniformly continuous. End of story.

But recalling what happened with diameters of connected sets last time, let’s focus on the inequality {\textrm{diam}\, f(E)\ge \lambda (\textrm{diam}\, E)} for connected subsets {E\subset X}. If such {\lambda} exists, the map f has the LOB property, for “lower oscillation bound” (oscillation being the diameter of image). The LOB property does not require {f} to be injective. On the real line, {f(x)=|x|} satisfies it with {\lambda(\delta)=\delta/2}: since it simply folds the line, the worst that can happen to the diameter of an interval is to be halved. Similarly, {f(x)=x^2} admits a lower oscillation bound {\lambda(\delta) = (\delta/2)^2}. This one decays faster than linear at 0, indicating some amount of squeezing going on. One may check that every polynomial has the LOB property as well.

On the other hand, the exponential function {f(x)=e^x} does not have the LOB property, since {\textrm{diam}\, f([x,x+1])} tends to {0} as {x\to-\infty}. No surprise there; we know from the relation of continuity and uniform continuity that things like that happen on a non-compact domain.

Also, a function that is constant on some nontrivial connected set will obviously fail LOB. In topology, a mapping is called light if the preimage of every point is totally disconnected, which is exactly the same as not being constant on any nontrivial connected set. So, lightness is necessary for LOB, but not sufficient as {e^x} shows.

Theorem 1: Every continuous light map {f\colon X\to Y} with compact domain {X} admits a lower oscillation bound.

Proof. Suppose not. Then there exists {\epsilon>0} and a sequence of connected subsets {E_n\subset X} such that {\textrm{diam}\, E_n\ge \epsilon} and {\textrm{diam}\, f(E_n)\to 0}. We can assume {E_n} compact, otherwise replace it with its closure {\overline{E_n}} which we can because {f(\overline{E_n})\subset \overline{f(E_n)}}.

The space of nonempty compact subsets of {X} is called the hyperspace of {X}; when equipped with the Hausdorff metric, it becomes a compact metric space itself. Pass to a convergent subsequence, still denoted {\{E_n\}}. Its limit {E} has diameter at least {\epsilon}, because diameter is a continuous function on the hyperspace. Finally, using the uniform continuity of {f} we get {\textrm{diam}\, f(E) = \lim \textrm{diam}\, f(E_n) = 0}, contradicting the lightness of {f}. {\quad \Box}

A homeomorphism lacking LOB

Here is another example to demonstrate the importance of compactness (not just boundedness) and continuity: on the domain {X = \{(x,y)\colon 0 < x < 1, 0 < y < 1\}} define {f(x,y)=(x,xy)}. This is a homeomorphism, the inverse being {(u,v)\mapsto (u, v/u)}. Yet it fails LOB because the image of line segment {\{x\}\times (0,1)} has diameter {x}, which can be arbitrarily close to 0. So, the lack of compactness hurts. Extending {f} to the closed square in a discontinuous way, say by letting it be the identity map on the boundary, we see that continuity is also needed, although it’s slightly non-intuitive that one needs continuity (essentially an upper oscillation bound) to estimate oscillation from below.

All that said, on a bounded interval of real line we need neither compactness nor continuity.

Theorem 2: If {I\subset \mathbb R} is a bounded interval, then every light map {f\colon I\to Y} admits a lower oscillation bound.

Proof. Following the proof of Theorem 1, consider a sequence of intervals {(a_n, b_n)} such that {b_n-a_n\ge \epsilon} and {\textrm{diam}\, f((a_n,b_n))\to 0}. There is no loss of generality in considering open intervals, since it can only make the diameter of the image smaller. Also WLOG, suppose {a_n\to a} and {b_n\to b}; this uses the boundedness of {I}. Consider a nontrivial closed interval {[c,d]\subset (a,b)}. For all sufficiently large {n} we have {[c,d]\subset (a_n,b_n)}, which implies {\textrm{diam}\, f([c,d])\le \textrm{diam}\, f((a_n,b_n))\to 0}. Thus {f} is constant on {[c,d]}, a contradiction. {\quad \Box}

The property that distinguishes real line here is that nontrivial connected sets have nonempty interior. The same works on the circle and various tree-like spaces, but fails for spaces that don’t look one-dimensional.

Continuity and diameters of connected sets

The definition of uniform continuity (if it’s done right) can be phrased as: {f\colon X\to Y} is uniformly continuous if there exists a function {\omega\colon (0,\infty)\to (0,\infty)}, with {\omega(0+)=0}, such that {\textrm{diam}\, f(E)\le \omega (\textrm{diam}\, E)} for every set {E\subset X}. Indeed, when {E} is a two-point set {\{a,b\}} this is the same as {|f(a)-f(b)|\le \omega(|a-b|)}, the modulus of continuity. Allowing general sets {E} does not change anything, since the diameter is determined by two-point subsets.

Does it make a difference if we ask for {\textrm{diam}\, f(E)\le \omega (\textrm{diam}\, E)} only for connected sets {E}? For functions defined on the real line, or on an interval of the line, there is no difference: we can just consider the intervals {[a,b]} and obtain

{|f(a)-f(b)|\le \textrm{diam}\, f([a,b]) \le \omega(|a-b|)}

as before.

However, the situation does change for maps defined on a non-convex domain. Consider the principal branch of square root, {f(z)=\sqrt{z}}, defined on the slit plane {G=\mathbb C\setminus (-\infty, 0]}.

Conformal map of a slit domain is not uniformly continuous

This function is continuous on {G} but not uniformly continuous, since {f(-1 \pm i y) \to \pm i } as {y\to 0+}. Yet, it satisfies {\textrm{diam}\, f(E)\le \omega(\textrm{diam}\, E)} for connected subsets {E\subset G}, where one can take {\omega(\delta)=2\sqrt{\delta}}. I won’t do the estimates; let’s just note that although the points {-1 \pm i y} are close to each other, any connected subset of {G} containing both of them has diameter greater than 1.

These points are far apart with respect to the inner diameter metric

In a way, this is still uniform continuity, just with respect to a different metric. Given a metric space {(X,d)}, one can define inner diameter metric {\rho} on {X} by letting {\rho(a,b)} be the infimum of diameters of connected sets that contain both {a} and {b}. This is indeed a metric if the space {X} is reasonable enough (i.e., any two points are contained in some bounded connected set). On a convex subset of {\mathbb R^n}, the inner diameter metric coincides with the Euclidean metric {d_2}.

One might think that the equality {\rho=d_e} should imply that the domain is convex, but this is not so. Indeed, consider the union of three quadrants on a plane, say {A = \{(x,y) \colon x > 0\text{ or }y > 0\}}. Any two points of {A} can be connected by going up from whichever is lower, and then moving horizontally. The diameter of a right triangle is equal to its hypotenuse, which is the Euclidean distance between the points we started with.

A non-convex domain where inner diameter metric is the same as Euclidean

Inner diameter metric comes up (often implicitly) in complex analysis. By the Riemann mapping theorem, every simply connected domain {G\subset \mathbb C}, other than {\mathbb C} itself, admits a conformal map {f\colon G\to \mathbb D} onto the unit disk {D}. This map need not be uniformly continuous in the Euclidean metric (the slit plane is one example), but it is uniformly continuous with respect to the inner diameter metric on {G}.

Furthermore, by normalizing the situation in a natural way (say, {G \supset \mathbb D} and {f(0)=0}), one can obtain a uniform modulus of continuity for all conformal maps {f} onto the unit disk, whatever the domain is. This uniform modulus of continuity can be taken of the form {\omega(\delta) = C\sqrt{\delta}} for some universal constant {C}. Informally speaking, this means that a slit domain is the worst that can happen to the continuity of a conformal map. This fact isn’t often mentioned in complex analysis books. A proof can be found in the book Conformally Invariant Processes in the Plane by Gregory Lawler, Proposition 3.85. A more elementary proof, with a rougher estimate for the modulus of continuity, is on page 15 of lecture notes by Mario Bonk.

Green envelopes

Green functions of the Laplacian are bounded in one dimensional domains, a reflection of the fact that square integrability of the derivative implies boundedness in one dimension but not in two or more. So, it’s possible to say something about the envelope {\sup_y g(x,y)} obtained by taking the supremum over source location y.

In order to satisfy the distributional equation {-\frac{d^2}{dx^2}g(x,y) = \delta_{x=y}}, Green functions have a corner at y, where the derivative jumps down by 1. Combining this property with the boundary conditions identifies the function. Let’s use the interval (0,1) as the domain.

The Dirichlet boundary condition {g(0,y) = 0 = g(1,y)} leads to the formula {g(x,y)=\min(x,y)-xy}:


The upper envelope is the parabola {x(1-x)}. This fact becomes more evident if we use more functions.

Dirichlet envelope

Mixed Dirichlet-Neumann conditions {g(0,y) = 0 = g_x'(1,y)} yield an even simpler formula {g(x,y) = \min(x,y)}. The envelope is boring in this case:

Dirichlet-Neumann envelope

The pure Neumann condition {g_x'(0,y) = 0 = g_x'(1,y)} requires an adjustment to the definition of Green function, since it is incompatible with {-\frac{d^2}{dx^2}g(x,y) = \delta_{x=y}}. Instead, let’s ask for {-\frac{d^2}{dx^2}g(x,y) = \delta_{x=y} - c} where the constant c is the reciprocal of the length of the interval. This ensures that the integral of the second derivative is zero. Also, it is more convenient to use the interval (-1,1) here. The Neumann Green function for this interval is {g(x,y) = (x^2+y^2)/4 - |x-y|/2}, up to an additive constant.

Neumann envelope

So the envelope is also determined up to a constant. On the picture above it is {x^2/2}.

Finally, let’s consider the periodic condition {g(0,y) = g(1,y)}, {g_x'(0,y) = g_x'(1,y)}. As with the Neumann condition, the definition is adjusted to allow the second derivative integral be zero. Using the interval (-1, 1) we get {g(x,y) = (1+x\wedge y)(1-x\vee y)/2 + (x^2+y^2)/4} using the notation {x\wedge y = \min(x,y)} and {x\vee y = \max(x,y)} for brevity. Note that the first term, {(1+x\wedge y)(1-x\vee y)/2}, is the Dirichlet Green function for this interval. The additional quadratic term brings the integral of the second derivative to zero, which ensures the periodic condition for the first derivative. And the upper envelope is, up to a constant…

Periodic envelope

a constant. Not exciting but makes perfect sense, considering that we are dealing with the Green function of a circle, where moving the source amounts to shifting the function.

Recursive randomness of reals: summing a random decreasing sequence

In a comment to Recursive randomness of integers Rahul pointed out a continuous version of the same problem: pick {x_0} uniformly in {[0,1]}, then {x_1} uniformly in {[0,x_0]}, then {x_2} uniformly in {[0, x_1]}, etc. What is the distribution of the sum {S=\sum_{n=0}^\infty x_n}?

The continuous version turns out to be easier to analyse. To begin with, it’s equivalent to picking uniformly distributed, independent {y_n\in [0,1]} and letting {x_n = y_0y_1\cdots y_n}. Then the sum is

{\displaystyle y_0+y_0y_1 + y_0y_1y_2 + \cdots }

which can be written as

{\displaystyle y_0(1+y_1(1+y_2(1+\cdots )))}

So, {S} is a stationary point of the random process {X\mapsto (X+1)U} where {U} is uniformly distributed in {[0,1]}. Simply put, {S} and {(S+1)U} have the same distribution. This yields the value of {E[S]} in a much simpler way than in the previous post:

{E[S]=E[(S+1)U] = (E[S] + 1) E[U] = (E[S] + 1)/2}

hence {E[S]=1}.

We also get an equation for the cumulative distribution function {C(t) = P[S\le t]}. Indeed,

{\displaystyle P[S\le t] = P[(S+1)U \le t] = P[S \le t/U-1]}

The latter probability is {\int_0^1 P[S\le t/u-1]\,du = \int_0^1 C(t/u-1)\,du}. Conclusion: {C(t) = \int_0^1 C(t/u-1)\,du}. Differentiate to get an equation for the probability density function {p(t)}, namely {p(t) = \int_0^1 p(t/u-1)\,du/u}. It’s convenient to change the variable of integration to {v = t/u-1}, which leads to

{\displaystyle p(t) = \int_{t-1}^\infty p(v)\,\frac{dv}{v+1}}

Another differentiation turns the integral equation into a delay differential equation,

{\displaystyle p'(t) = - \frac{p(t-1)}{t}}

Looks pretty simple, doesn’t it? Since the density is zero for negative arguments, it is constant on {[0,1]}. This constant, which I’ll denote {\gamma}, is {\int_{0}^\infty p(v)\,\frac{dv}{v+1}}, or simply {E[1/(S+1)]}. I couldn’t get an analytic formula for {\gamma}. My attempt was {E[1/(S+1)] = \sum_{n=0}^\infty (-1)^n M_n} where {M_n=E[S^n]} are the moments of {S}. The moments can be computed recursively using {E[S^n] = E[(S+1)^n]E[U^n]}, which yields

{\displaystyle M_n=\frac{1}{n} \sum_{k=0}^{n-1} \binom{n}{k}M_k}

The first few moments, starting with {M_0}, are 1, 1, 3/2, 17/6, 19/3, 81/5, 8351/80… Unfortunately the series {\sum_{n=0}^\infty (-1)^n M_n} diverges, so this approach seems doomed. Numerically {\gamma \approx 0.5614} which is not far from the Euler-Mascheroni constant, hence the choice of notation.

On the interval (1,2) we have {p'(t) = -\gamma/t}, hence

{p(t) = \gamma(1-\log t)} for {1 \le t \le 2}.

The DDE gets harder to integrate after that… on the interval {[2,3]} the solution already involves the dilogarithm (Spence’s function):

{\displaystyle p(t) = \gamma(1+\pi^2/12 - \log t + \log(t-1)\log t + \mathrm{Spence}\,(t))}

following SciPy’s convention for Spence. This is as far as I went… but here is an experimental confirmation of the formulas obtained so far (link to full size image).

Histogram of 10 million samples and the pdf (red) plotted on [0,3]

To generate a sample from distribution S, I begin with a bunch of zeros and repeat “add 1, multiply by U[0,1]” many times. That’s it.

import numpy as np
import matplotlib.pyplot as plt
trials = 10000000
terms = 10000
x = np.zeros(shape=(trials,))
for _ in range(terms):
    np.multiply(x+1, np.random.uniform(size=(trials,)), out=x)
_ = plt.hist(x, bins=1000, normed=True)

I still want to know the exact value of {\gamma}… after all, it’s also the probability that the sum of our random decreasing sequence is less than 1.


The constant I called “{\gamma}” is in fact {\exp(-\gamma)} where {\gamma} is indeed Euler’s constant… This is what I learned from the Inverse Symbolic Calculator after solving the DDE (with initial value 1) numerically, and calculating the integral of the solution. From there, it did not take long to find that

Oh well. At least I practiced solving delay differential equations in Python. There is no built-in method in SciPy for that, and although there are some modules for DDE out there, I decided to roll my own. The logic is straightforward: solve the ODE on an interval of length 1, then build an interpolating spline out of the numeric solution and use it as the right hand side in the ODE, repeat. I used Romberg’s method for integrating the solution; the integration is done separately on each interval [k, k+1] because of the lack of smoothness at the integers.

import numpy as np
from scipy.integrate import odeint, romb
from scipy.interpolate import interp1d
numpoints = 2**12 + 1
solution = [lambda x: 1]
integrals = [1]
for k in range(1, 15):
    y0 = solution[k-1](k)
    t = np.linspace(k, k+1, numpoints)
    rhs = lambda y, x: -solution[k-1](np.clip(x-1, k-1, k))/x
    y = odeint(rhs, y0, t, atol=1e-15, rtol=1e-13).squeeze()
    solution.append(interp1d(t, y, kind='cubic', assume_sorted=True))
    integrals.append(romb(y, dx=1/(numpoints-1)))
total_integral = sum(integrals)

As a byproduct, the program found the probabilities of the random sum being in each integer interval:

  • 56.15% in [0,1]
  • 34.46% in [1,2]
  • 8.19% in [2,3]
  • 1.1% in [3,4]
  • 0.1% in [4,5]
  • less than 0.01% chance of being greater than 5

Recursive randomness of integers

Entering a string such as “random number 0 to 7” into Google search brings up a neat random number generator. For now, it supports only uniform probability distributions over integers. That’s still enough to play a little game.

Pick a positive number, such as 7. Then pick a number at random between 0 and 7 (integers, with equal probability); for example, 5. Then pick a number between 0 and 5, perhaps 2… repeat indefinitely. When we reach 0, the game becomes really boring, so that is a good place to stop. Ignoring the initial non-random number, we got a random non-increasing sequence such as 5, 2, 1, 1, 0. The sum of this one is 9… how are these sums distributed?

Let’s call the initial number A and the sum S. The simplest case is A=1, when S is the number of returns to 1 until the process hits 0. Since each return to 1 has probability 1/2, we get the following geometric distribution


Sum Probability
0 1/2
1 1/4
2 1/8
3 1/16
k 1/2k+1

When starting with A=2, things are already more complicated: for one thing, the probability mass function is no longer decreasing, with P[S=2] being greater than P[S=1]. The histogram shows the counts obtained after 2,000,000 trials with A=2.


The probability mass function is still not too hard to compute: let’s say b is the number of times the process arrives at 2, then the sum is 2b + the result with A=1. So we end up convolving two geometric distributions, one of which is supported on even integers: hence the bias toward even sums.

Sum Probability
0 1/3
1 1/6
2 5/36
3 7/72
k ((4/3)[k/2]+1-1)/2k

For large k, the ratio P[S=k+2]/P[s=k] is asymptotic to (4/3)/4 = 1/3, which means that the tail of the distribution is approximately geometric with the ratio of {1/\sqrt{3}}.

I did not feel like computing exact distribution for larger A, resorting to simulations. Here is A=10 (ignore the little bump at the end, an artifact of truncation): n10t2m

There are three distinct features: P[S=0] is much higher than the rest; the distribution is flat (with a bias toward even, which is diminishing) until about S=n, and after that it looks geometric. Let’s see what we can say for a general starting value A.

Perhaps surprisingly, the expected value E[S] is exactly A. To see this, consider that we are dealing with a Markov chain with states 0,1,…,A. The transition probabilities from n to any number 0,…,n are 1/(n+1). Ignoring the terminal state 0, which does not contribute to the sum, we get the following kind of transition matrix (the case A=4 shown):

{\displaystyle M = \begin{pmatrix}1/2 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 0 \\ 1/5 & 1/5 & 1/5 & 1/5\end{pmatrix} }

The initial state is a vector such as {v = (0,0,0,1)}. So {vM^j} is the state after j steps. The expected value contributed by the j-th step is {vM^jw} where {w = (1,2,3,4)^T} is the weight vector. So, the expected value of the sum is

{\displaystyle \sum_{j=1}^\infty vM^jw = v\left(\sum_{j=1}^\infty M^j\right)w = vM(I-M)^{-1}w}

It turns out that the matrix {M(I-M)^{-1}} has a simple form, strongly resembling M itself.

{\displaystyle M(I-M)^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1/2 & 0 & 0 \\ 1 & 1/2 & 1/3 & 0 \\ 1 & 1/2 & 1/3 & 1/4 \end{pmatrix} }

Left multiplication by v extracts the bottom row of this matrix, and we are left with a dot product of the form {(1,1/2,1/3,1/4)\cdot (1,2,3,4) = 1 + 1 + 1 + 1 = 4 }. Neat.

What else can we say? The median is less than A, which is no surprise given the long tail on the right. Also, P[S=0] = 1/(A+1) since the only way to have zero sum is to hit 0 at once. A more interesting question is: what is the limit of the distribution of T = S/A as A tends to infinity? Here is the histogram of 2,000,000 trials with A=50.


It looks like the distribution of T tends to a limit, which has constant density until 1 (so, until A before rescaling) and decays exponentially after that. Writing the supposed probability density function as {f(t) = c} for {0\le t\le 1}, {f(t) = c\exp(k(1-t))} for {t > 1}, and using the fact that the expected value of T is 1, we arrive at {c = 2-\sqrt{2} \approx 0.586} and {k=\sqrt{2}}. This is a pretty good approximation in some aspects: the median of this distribution is {1/(2c)}, suggesting that the median of S is around {n/(4-2\sqrt{2})} which is in reasonable agreement with experiment. But the histogram for A=1000 still has a significant deviation from the exponential curve, indicating that the supposedly geometric part of T isn’t really geometric:

The distribution of S with A=1000 versus the constant-geometric approximation

One can express S as a sum of several independent geometric random variables, but the number of summands grows quadratically in A, and I didn’t get any useful asymptotics from this. What is the true limiting distribution of S/A, if it’s not the red curve above?