The standard normal probability density function $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ has inflection points at $x = \pm 1$, where $f(x) = \frac{1}{\sqrt{2\pi e}}$, which is about 60.7% of the maximum of this function. For this, as for other bell-shaped curves, the inflection points are also the points of steepest incline.

This is good to know for drawing an accurate sketch of this function, but in general, the Gaussian curve may be scaled differently, like $\exp(-x^2)$, and then the inflection points will be elsewhere. However, their relative height is invariant under scaling: it is always $e^{-1/2} \approx 60.7\%$ of the maximum height of the curve. Since it is the height that we focus on, let us normalize various bell-shaped curves to have maximum $1$:

So, the Gaussian curve is inflected at the relative height of $1/\sqrt{e} \approx 0.607$. For the Cauchy density $\frac{1}{1+x^2}$ the inflection is noticeably higher, at $3/4$ of the maximum:

Another popular bell shape, hyperbolic secant or simply $\operatorname{sech} x$, is in between with inflection height $1/\sqrt{2} \approx 0.707$. It is slightly unexpected to see an algebraic number arising from this transcendental function.

Can we get inflection height below $1/2$? One candidate is $1/(1+x^n)$ with large even $n$, but this does not work: the relative height of inflection is $\frac{n+1}{2n} > \frac{1}{2}$, as the plot for one such $n$ shows:

However, increasing the power of $x$ in the Gaussian curve works: for example, $\exp(-x^4)$ has inflection at relative height $e^{-3/4} \approx 0.472$:

More generally, the relative height of inflection for $\exp(-x^n)$ is $\exp(1/n - 1)$ for even $n$. As $n \to \infty$, this approaches $1/e \approx 0.368$. Can we go lower?
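The heights quoted so far can be checked numerically. The sketch below is only an illustration under stated assumptions: the curve is symmetric with a single maximum at 0, and the inflection point is located as the point of steepest descent on a fine grid. The helper name `relative_inflection_height` and the grid parameters are mine.

```python
import numpy as np

def relative_inflection_height(f, x_max=5.0, num=200001):
    """Height of f at its point of steepest descent on [0, x_max], relative to f(0)."""
    x = np.linspace(0.0, x_max, num)
    y = f(x)
    steps = np.diff(y)              # finite-difference stand-in for f'
    i = np.argmin(steps)            # steepest descent happens between x[i] and x[i+1]
    return 0.5 * (y[i] + y[i + 1]) / y[0]

curves = {
    "Gaussian exp(-x^2/2)": lambda x: np.exp(-x**2 / 2),
    "Cauchy 1/(1+x^2)": lambda x: 1 / (1 + x**2),
    "sech(x)": lambda x: 1 / np.cosh(x),
    "exp(-x^4)": lambda x: np.exp(-x**4),
}
for name, f in curves.items():
    print(f"{name}: {relative_inflection_height(f):.4f}")
```

Searching for the steepest descent rather than for a sign change of the second difference avoids trouble with curves such as $\exp(-x^4)$ that are extremely flat near the maximum.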

Well, there are compactly supported bump functions which look bell-shaped, for example $\exp(1/(x^2-1))$ for $|x| < 1$. Normalizing the height to $1$ makes it $\exp(x^2/(x^2-1))$. For this function inflection occurs at relative height about $0.255$.

Once again, we can replace $2$ by an arbitrary positive even integer $n$ and get relative inflection height down to $\exp\left(-\frac{1+\sqrt{5-4/n}}{2}\right)$. As $n$ increases, this height decreases to $e^{-\varphi} \approx 0.198$ where $\varphi = \frac{1+\sqrt{5}}{2}$ is the golden ratio. This is less than $1/5$ which is low enough for me today. The smallest $n$ for which the height is less than $1/5$ is $n = 54$: it achieves inflection at $\approx 0.19994$.
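For the bump family $\exp(x^n/(x^n-1))$ the inflection height can be tabulated directly. The closed form used below is my own derivation, not a formula quoted from the post: with $u = x^n - 1$ the condition $f'' = 0$ reduces to $(n+1)u^2 + 3nu + n = 0$, and the height is $\exp(1 + 1/u)$. The function names are hypothetical.

```python
import math

def bump_inflection_height(n):
    """Relative inflection height of exp(x^n / (x^n - 1)) on |x| < 1, for even n >= 2.

    Assumed closed form: exp(-(1 + sqrt(5 - 4/n)) / 2).
    """
    return math.exp(-(1 + math.sqrt(5 - 4 / n)) / 2)

def smallest_even_n_below(threshold, n_max=10_000):
    """Smallest even n with inflection height below the threshold, or None if not found."""
    for n in range(2, n_max + 1, 2):
        if bump_inflection_height(n) < threshold:
            return n
    return None

golden_ratio = (1 + math.sqrt(5)) / 2
print(bump_inflection_height(2))       # the x^2 bump function
print(math.exp(-golden_ratio))         # limiting height as n grows
print(smallest_even_n_below(0.2))      # first even n below one fifth
```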

In the opposite direction, it is easy to produce a bell-shaped curve with high inflection points: either $1/(1+|x|^p)$ or $\exp(-|x|^p)$ will do, where $p$ is slightly larger than $1$. But these examples are only once differentiable, unlike the infinitely smooth examples above. Aside: as $p \to 1$, the latter function converges to the (rescaled) density of the Laplace distribution and the former to a non-integrable function.

As for the middle between two extremes… I did not find a reasonable bell-shaped curve that inflects at exactly half of its maximal height. An artificial example is $\exp(-|x|^p)$ with $p = 1/(1-\log 2) \approx 3.26$ but this is ugly and only $C^3$ smooth.

Suppose we have $n$ real numbers $x_1, \dots, x_n$ and want to find the sum of all distances $|x_j - x_k|$ over $j < k$. Why? Maybe because over five years ago, the gradient flow of this quantity was used for "clustering by collision" (part 1, part 2, part 3).

If I have a Python console open, the problem appears to be solved with one line:

>>> 0.5 * np.abs(np.subtract.outer(x, x)).sum()

where the outer difference of x with x creates a matrix of all differences $x_j - x_k$, then absolute values are taken, and then they are all added up. Double-counted, hence the factor of 0.5.

But trying this with, say, one million numbers is not likely to work. If each number takes 8 bytes of memory (64 bits, double precision), then the array x is still pretty small (under 8 MB) but a million-by-million matrix will require over 7 terabytes, and I won’t have that kind of RAM anytime soon.

In principle, one could run a loop adding these values, or store the matrix on a hard drive. Both are going to take forever.

There is a much better way, though. First, sort the numbers in nondecreasing order; this does not require much time or memory (compared to quadratic memory cost of forming a matrix). Then consider the partial sums $s_k = x_1 + \dots + x_k$; the cost of computing them is linear in time and memory. For each fixed $k$, the sum of distances to $x_j$ with $j < k$ is simply $(k-1)x_k - s_{k-1}$, or, equivalently, $kx_k - s_k$. So, all we have to do is add these up. Still one line of code (after sorting), but a much faster one:
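A minimal sketch of this computation, assuming `x` is a one-dimensional array of numbers. The function name `pairwise_distance_sum` is mine, and unlike an in-place `x.sort()` this version works on a sorted copy.

```python
import numpy as np

def pairwise_distance_sum(x):
    """Sum of |x_j - x_k| over all pairs j < k, in O(n log n) time and O(n) memory."""
    xs = np.sort(np.asarray(x, dtype=float))   # sorted copy; the input is left untouched
    k = np.arange(1, xs.size + 1)              # 1-based positions in sorted order
    # k*x_k - s_k is the sum of distances from x_k to all smaller entries
    return float((k * xs - np.cumsum(xs)).sum())

x = np.random.uniform(size=1000)
print(pairwise_distance_sum(x))
print(0.5 * np.abs(np.subtract.outer(x, x)).sum())   # quadratic-memory check on a small array
```

The two printed values should agree up to rounding. Only the first approach remains feasible for a million numbers.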

For example, x could be a sample from some continuous distribution. Assuming the distribution has a mean (i.e., is not too heavy tailed), the sum of all pairwise distances grows quadratically with n, and its average approaches a finite limit. For the uniform distribution on [0, 1] the computation shows this limit is 1/3. For the standard normal distribution it is 1.128… which is not as recognizable a number.

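As $n \to \infty$, the average distance of a sample taken from a distribution converges to the expected value of |X-Y| where X, Y are two independent variables with that distribution. Let’s express this in terms of the probability density function $p$ and the cumulative distribution function $\Phi$. By symmetry, we can integrate over $x > y$ and double the result:

$$\frac{1}{2} E|X-Y| = \int_{-\infty}^{\infty} p(x)\,dx \int_{-\infty}^{x} (x-y)\,p(y)\,dy$$

Integrate by parts in the second integral: $p(y) = \Phi'(y)$, and the boundary terms are zero.

$$\frac{1}{2} E|X-Y| = \int_{-\infty}^{\infty} p(x)\,dx \int_{-\infty}^{x} \Phi(y)\,dy$$

Integrate by parts in the other integral, throwing the derivative onto the indefinite integral and thus eliminating it. There is a boundary term this time.

$$\frac{1}{2} E|X-Y| = \Phi(\infty) \int_{-\infty}^{\infty} \Phi(y)\,dy - \int_{-\infty}^{\infty} \Phi(x)^2\,dx$$

Since $\Phi(\infty) = 1$, this simplifies nicely:

$$\frac{1}{2} E|X-Y| = \int_{-\infty}^{\infty} \Phi(x)\,(1-\Phi(x))\,dx$$

This is a lot neater than I expected: $\frac{1}{2} E|X-Y|$ is simply the integral of $\Phi(1-\Phi)$. I don’t often see CDF squared, like here. Some examples: for the uniform distribution on [0,1] we get

$$\frac{1}{2} E|X-Y| = \int_0^1 x(1-x)\,dx = \frac{1}{6}$$

and for the standard normal, with $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$, it is

$$\frac{1}{2} E|X-Y| = \int_{-\infty}^{\infty} \Phi(x)\,(1-\Phi(x))\,dx = \frac{1}{\sqrt{\pi}} \approx 0.564$$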
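The two limits quoted earlier (1/3 for the uniform distribution and 1.128… for the standard normal) can be recovered from the integral of CDF times (1 − CDF). This is a rough numerical sketch, assuming a plain trapezoid rule on a truncated interval is accurate enough; the helper name `half_mean_distance` and the truncation to $[-10, 10]$ are my choices.

```python
import math
import numpy as np

def half_mean_distance(cdf, a, b, num=200001):
    """Trapezoid-rule approximation of the integral of F(1 - F) over [a, b]."""
    x = np.linspace(a, b, num)
    F = np.array([cdf(t) for t in x])
    g = F * (1 - F)
    dx = x[1] - x[0]
    return float(dx * (g.sum() - 0.5 * (g[0] + g[-1])))

def uniform_cdf(t):
    return min(max(t, 0.0), 1.0)

def normal_cdf(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

print(2 * half_mean_distance(uniform_cdf, 0, 1))      # should be close to 1/3
print(2 * half_mean_distance(normal_cdf, -10, 10))    # should be close to 2/sqrt(pi)
```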

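The trick with sorting and cumulative sums can also be used to find, for every point $x_k$, the sum (or average) of distances to all other points. To do this, we don’t sum over $k$ but must also add $|x_j - x_k|$ for $j > k$. The latter sum is simply $S - s_k - (n-k)x_k$ where $s_k = x_1 + \dots + x_k$ is the partial sum of the sorted numbers and $S = s_n$ is the total sum. So, all we need is to combine the two parts: $(2k-n)x_k - 2s_k + S$.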
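A sketch of the per-point version, under the same assumptions as before (one-dimensional numeric input). The function name `distance_sums` is hypothetical, and the result is returned in the original order of the input rather than in sorted order.

```python
import numpy as np

def distance_sums(x):
    """For each entry of x, the sum of its distances to all other entries."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    xs = x[order]                      # sorted values
    n = xs.size
    k = np.arange(1, n + 1)            # 1-based positions in sorted order
    s = np.cumsum(xs)                  # partial sums s_k; s[-1] is the total S
    sums_sorted = (2 * k - n) * xs - 2 * s + s[-1]
    result = np.empty(n)
    result[order] = sums_sorted        # undo the sort
    return result

x = np.array([0.0, 3.0, 1.0])
print(distance_sums(x))                # sums of distances from 0, from 3, from 1
```

Dividing by `n - 1` turns each sum into the average distance to the other points.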

Unfortunately, the analogous problems for vector-valued sequences are not as easy. If the Manhattan metric is used, we can do the computations for each coordinate separately, and add the results. For the Euclidean metric…

Given a sequence of numbers $x_1, \dots, x_L$ of length $L$ one may want to look for evidence of its periodic behavior. One way to do this is by computing autocorrelation, the correlation of the sequence with a shift of itself. Here is one reasonable way to do so: for lag values $\ell = 1, \dots, \lfloor L/2 \rfloor$ compute the correlation coefficient of $(x_1, \dots, x_{L-\ell})$ with $(x_{\ell+1}, \dots, x_L)$. That the lag does not exceed $L/2$ ensures the entire sequence participates in the computation, so we are not making a conclusion about its periodicity after comparing a handful of terms at the beginning and the end. In other words, we are not going to detect periodicity if the period is more than half of the observed time period.

Having obtained the correlation coefficients, pick one with the largest absolute value; call it R. How large does R have to be in order for us to conclude the correlation is not a fluke? The answer depends on the distribution of our data, but an experiment can be used to get some idea of likelihood of large R.
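The definition of R can be spelled out literally with `np.corrcoef`, one lag at a time. This is a slow but direct sketch, not the vectorized approach used for the experiments; the function name `max_autocorrelation` is mine.

```python
import numpy as np

def max_autocorrelation(x):
    """Correlation coefficient of largest absolute value over lags 1..len(x)//2."""
    x = np.asarray(x, dtype=float)
    L = x.size
    best = 0.0
    for lag in range(1, L // 2 + 1):
        # correlate the first L-lag terms with the last L-lag terms
        r = np.corrcoef(x[:L - lag], x[lag:])[0, 1]
        if abs(r) > abs(best):
            best = r
    return best

periodic = np.tile([1.0, 2.0, 4.0], 10)        # period 3, length 30
print(max_autocorrelation(periodic))           # a perfectly periodic sequence gives R close to 1
print(max_autocorrelation(np.random.normal(size=100)))
```

Each lag uses the sample means and variances of the two overlapping segments, so the result is an honest correlation coefficient for every lag, at the cost of a Python loop.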

I picked $x_1, \dots, x_L$ independently from the standard normal distribution, and computed $R$ as above. After 5 million trials with a sequence of length 100, the distribution of R was as follows:

Based on this experiment, the probability of obtaining |R| greater than 0.5 is less than 0.0016. So, 0.5 is pretty solid evidence. The probability of is two orders of magnitude less, etc. Also, |R| is unlikely to be very close to zero unless the data is structured in some strange way. Some kind of correlation ought to be present in the white noise.

Aside: it’s not easy to construct perfectly non-autocorrelated sequences for the above test. For length 5 an example is 1,2,3,2,3. Indeed, (1,2,3,2) is uncorrelated with (2,3,2,3) and (1,2,3) is uncorrelated with (3,2,3). For length 6 and more I can’t construct these without filling them with a bunch of zeros.

Repeating the experiment with sequences of length 1000 shows a tighter distribution of R: now |R| is unlikely to be above 0.2. So, if a universal threshold is to be used here, we need to adjust R based on sequence length.

I did not look hard for statistical studies of this subject, resorting to an experiment. Experimentally obtained p-values are pretty consistent for the criterion $L^{0.45}|R| > 4$. The number of trials was not very large (10000) so there is some fluctuation, but the pattern is clear.

Length, L    P(L^{0.45}|R| > 4)
100          0.002
300          0.0028
500          0.0022
700          0.0028
900          0.0034
1100         0.0036
1300         0.0039
1500         0.003
1700         0.003
1900         0.0042
2100         0.003
2300         0.0036
2500         0.0042
2700         0.0032
2900         0.0043
3100         0.0042
3300         0.0025
3500         0.0031
3700         0.0027
3900         0.0042

Naturally, all this depends on the assumption of independent normal variables.

And this is the approach I took to computing r in Python:

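import numpy as np
n = 1000
x = np.random.normal(size=(n,))
# Raw lagged products summed up; with mode='same' the zero-lag term sits at index n//2
acorr = np.correlate(x, x, mode='same')
# Keep positive lags only and divide each sum by (variance * number of overlapping terms).
# This treats the mean as 0 and uses the variance of the whole sequence for every lag.
acorr = acorr[n//2+1:]/(x.var()*np.arange(n-1, n//2, -1))
# The value with the largest absolute value
r = acorr[np.abs(acorr).argmax()]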

In a comment to Recursive randomness of integers Rahul pointed out a continuous version of the same problem: pick $x_0$ uniformly in $[0,1]$, then $x_1$ uniformly in $[0,x_0]$, then $x_2$ uniformly in $[0,x_1]$, etc. What is the distribution of the sum $S = \sum_{n=0}^{\infty} x_n$?

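The continuous version turns out to be easier to analyse. To begin with, it’s equivalent to picking uniformly distributed, independent $y_n \in [0,1]$ and letting $x_n = y_0 y_1 \cdots y_n$. Then the sum is

$$S = y_0 + y_0 y_1 + y_0 y_1 y_2 + \cdots$$

which can be written as

$$S = y_0(1 + y_1(1 + y_2(1 + \cdots)))$$

So, $S$ is a stationary point of the random process $X \mapsto (X+1)U$ where $U$ is uniformly distributed in $[0,1]$. Simply put, $S$ and $(S+1)U$ have the same distribution. This yields the value of $E[S]$ in a much simpler way than in the previous post:

$$E[S] = E[(S+1)U] = \frac{E[S]+1}{2}$$

hence $E[S] = 1$.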

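We also get an equation for the cumulative distribution function $C(t) = P[S \le t]$. Indeed,

$$P[S \le t] = P[(S+1)U \le t] = P[S \le t/U - 1]$$

The latter probability is $\int_0^1 P[S \le t/u - 1]\,du$. Conclusion: $C(t) = \int_0^1 C(t/u - 1)\,du$. Differentiate to get an equation for the probability density function $p(t)$, namely $p(t) = \int_0^1 p(t/u - 1)\,\frac{du}{u}$. It’s convenient to change the variable of integration to $v = t/u - 1$, which leads to

$$p(t) = \int_{t-1}^{\infty} p(v)\,\frac{dv}{v+1}$$

Looks pretty simple, doesn’t it? Since the density is zero for negative arguments, it is constant on $[0,1]$. This constant, which I’ll denote $\gamma$, is $\int_0^{\infty} p(v)\,\frac{dv}{v+1}$, or simply $E[1/(S+1)]$. I couldn’t get an analytic formula for $\gamma$. My attempt was $\gamma = \sum_{n=0}^{\infty} (-1)^n M_n$ where $M_n = E[S^n]$ are the moments of $S$. The moments can be computed recursively using $E[S^n] = E[(S+1)^n]\,E[U^n]$, which yields

$$M_n = \frac{1}{n} \sum_{k=0}^{n-1} \binom{n}{k} M_k$$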

The first few moments, starting with $M_0$, are 1, 1, 3/2, 17/6, 19/3, 81/5, 8351/180… Unfortunately the series diverges, so this approach seems doomed. Numerically $\gamma \approx 0.5615$ which is not far from the Euler-Mascheroni constant, hence the choice of notation.
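The moments can be generated exactly with rational arithmetic. The sketch assumes the recurrence $M_n = \frac{1}{n}\sum_{k<n}\binom{n}{k}M_k$, which follows from $E[S^n] = E[(S+1)^n]\,E[U^n]$ with $E[U^n] = 1/(n+1)$; the function name `moments` is mine.

```python
from fractions import Fraction
from math import comb

def moments(count):
    """Exact moments M_0, ..., M_{count-1} of S as fractions."""
    M = [Fraction(1)]                              # M_0 = 1
    for n in range(1, count):
        # M_n = (1/n) * sum over k < n of C(n, k) * M_k
        M.append(sum(comb(n, k) * M[k] for k in range(n)) / n)
    return M

print(", ".join(str(m) for m in moments(7)))
```

Printing a few dozen terms makes the rapid growth of $M_n$ visible, which is why the alternating series is of no use here.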

On the interval (1,2) we have $p'(t) = -p(t-1)/t = -\gamma/t$, hence

$$p(t) = \gamma(1 - \log t)$$

for $1 \le t \le 2$.

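The DDE gets harder to integrate after that… on the interval $[2,3]$ the solution already involves the dilogarithm (Spence’s function), which is what the integral below evaluates to:

$$p(t) = \gamma\left(1 - \log t + \int_2^t \frac{\log(s-1)}{s}\,ds\right)$$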

To generate a sample from distribution S, I begin with a bunch of zeros and repeat “add 1, multiply by U[0,1]” many times. That’s it.

import numpy as np
import matplotlib.pyplot as plt
trials = 10000000
terms = 10000
x = np.zeros(shape=(trials,))
for _ in range(terms):
    np.multiply(x+1, np.random.uniform(size=(trials,)), out=x)
# density=True replaces the normed=True argument, which newer Matplotlib no longer accepts
_ = plt.hist(x, bins=1000, density=True)
plt.show()

I still want to know the exact value of $\gamma$… after all, it’s also the probability that the sum of our random decreasing sequence is less than 1.

Update

The constant I called “$\gamma$” is in fact $\exp(-\gamma)$ where $\gamma$ is indeed Euler’s constant… This is what I learned from the Inverse Symbolic Calculator after solving the DDE (with initial value 1) numerically, and calculating the integral of the solution. From there, it did not take long to find that

The p.d.f. of S is the Dickman function, normalized to have integral 1. Wikipedia plots it on log scale, but Wolfram Mathworld has a plot identical to the above. Neither one mentions the sum of a random decreasing sequence.

The problem and its solution go back to the 1973 paper “A probabilistic approach to differential-difference equations arising in analytic number theory” by J.-M.-F. Chamayou, cited in the survey Euler’s constant: Euler’s work and modern developments by J. C. Lagarias.

Oh well. At least I practiced solving delay differential equations in Python. There is no built-in method in SciPy for that, and although there are some modules for DDE out there, I decided to roll my own. The logic is straightforward: solve the ODE on an interval of length 1, then build an interpolating spline out of the numeric solution and use it as the right hand side in the ODE, repeat. I used Romberg’s method for integrating the solution; the integration is done separately on each interval [k, k+1] because of the lack of smoothness at the integers.

import numpy as np
from scipy.integrate import odeint, romb
from scipy.interpolate import interp1d
numpoints = 2**12 + 1
solution = [lambda x: 1]
integrals = [1]
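for k in range(1, 15):
    # start each piece from the endpoint value of the previous one
    y0 = solution[k-1](k)
    t = np.linspace(k, k+1, numpoints)
    # p'(x) = -p(x-1)/x; the clip keeps x-1 inside [k-1, k], where the previous piece is defined
    rhs = lambda y, x: -solution[k-1](np.clip(x-1, k-1, k))/x
    y = odeint(rhs, y0, t, atol=1e-15, rtol=1e-13).squeeze()
    solution.append(interp1d(t, y, kind='cubic', assume_sorted=True))
    # Romberg integration needs 2**m + 1 equally spaced samples, hence numpoints
    integrals.append(romb(y, dx=1/(numpoints-1)))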
total_integral = sum(integrals)
print("{:.15f}".format(1/total_integral))

As a byproduct, the program found the probabilities of the random sum being in each integer interval:

Entering a string such as “random number 0 to 7” into Google search brings up a neat random number generator. For now, it supports only uniform probability distributions over integers. That’s still enough to play a little game.

Pick a positive number, such as 7. Then pick a number at random between 0 and 7 (integers, with equal probability); for example, 5. Then pick a number between 0 and 5, perhaps 2… repeat indefinitely. When we reach 0, the game becomes really boring, so that is a good place to stop. Ignoring the initial non-random number, we got a random non-increasing sequence such as 5, 2, 1, 1, 0. The sum of this one is 9… how are these sums distributed?

Let’s call the initial number A and the sum S. The simplest case is A=1, when S is the number of returns to 1 until the process hits 0. Since each return to 1 has probability 1/2, we get the following geometric distribution

Sum    Probability
0      1/2
1      1/4
2      1/8
3      1/16
k      1/2^{k+1}

When starting with A=2, things are already more complicated: for one thing, the probability mass function is no longer decreasing, with P[S=2] being greater than P[S=1]. The histogram shows the counts obtained after 2,000,000 trials with A=2.

The probability mass function is still not too hard to compute: let’s say b is the number of times the process arrives at 2, then the sum is 2b + the result with A=1. So we end up convolving two geometric distributions, one of which is supported on even integers: hence the bias toward even sums.

Sum    Probability
0      1/3
1      1/6
2      7/36
3      7/72
k      ((4/3)^{[k/2]+1}-1)/2^{k}

For large k, the ratio P[S=k+2]/P[S=k] is asymptotic to (4/3)/4 = 1/3, which means that the tail of the distribution is approximately geometric with the ratio of $1/\sqrt{3}$.
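The exact distribution for any starting value can be computed by dynamic programming over the states of the game. This is a sketch with exact fractions; the function name `sum_distribution` and the table layout are my own choices, not something taken from the original computation.

```python
from fractions import Fraction

def sum_distribution(A, max_sum):
    """Exact P[S = s] for s = 0..max_sum when the game starts from A."""
    # p[n][s] = probability that the numbers drawn after standing at n add up to s
    p = [[Fraction(0)] * (max_sum + 1) for _ in range(A + 1)]
    p[0][0] = Fraction(1)                          # at 0 the game is over
    for n in range(1, A + 1):
        for s in range(max_sum + 1):
            total = Fraction(1 if s == 0 else 0)   # the next draw is 0
            for m in range(1, n + 1):              # the next draw is m, adding m to the sum
                if s >= m:
                    total += p[m][s - m]           # for m == n this entry was filled earlier
            p[n][s] = total / (n + 1)
    return p[A]

print([str(q) for q in sum_distribution(2, 5)])
```

For A=2 the output can be compared against the closed form ((4/3)^{[k/2]+1}-1)/2^{k}; for larger A it gives exact values where the text relies on simulation.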

I did not feel like computing exact distribution for larger A, resorting to simulations. Here is A=10 (ignore the little bump at the end, an artifact of truncation):

There are three distinct features: P[S=0] is much higher than the rest; the distribution is flat (with a bias toward even, which is diminishing) until about S=A, and after that it looks geometric. Let’s see what we can say for a general starting value A.

Perhaps surprisingly, the expected value E[S] is exactly A. To see this, consider that we are dealing with a Markov chain with states 0,1,…,A. The transition probabilities from n to any number 0,…,n are 1/(n+1). Ignoring the terminal state 0, which does not contribute to the sum, we get the following kind of transition matrix (the case A=4 shown):

$$M = \begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 0 \\ 1/5 & 1/5 & 1/5 & 1/5 \end{pmatrix}$$

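The initial state is a vector such as $v = (0,0,0,1)$. So $vM^j$ is the state after j steps. The expected value contributed by the j-th step is $vM^jw$ where $w = (1,2,3,4)^T$ is the weight vector. So, the expected value of the sum is

$$\sum_{j=1}^{\infty} vM^jw = v\left(\sum_{j=1}^{\infty} M^j\right)w = vM(I-M)^{-1}w$$

It turns out that the matrix $M(I-M)^{-1}$ has a simple form, strongly resembling M itself.

$$M(I-M)^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1/2 & 0 & 0 \\ 1 & 1/2 & 1/3 & 0 \\ 1 & 1/2 & 1/3 & 1/4 \end{pmatrix}$$

Left multiplication by v extracts the bottom row of this matrix, and we are left with a dot product of the form $(1, 1/2, 1/3, 1/4) \cdot (1, 2, 3, 4) = 1+1+1+1 = 4$. Neat.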
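The matrix identity is easy to check numerically for any A. A small sketch, assuming the row-vector convention described above (state n corresponds to row n-1); the function name `expected_sum` is mine.

```python
import numpy as np

def expected_sum(A):
    """Return (E[S], matrix of expected visit counts) for the game started at A."""
    n = np.arange(1, A + 1)
    # Row for state n: probability 1/(n+1) of moving to each of the states 1..n
    M = np.tril(np.ones((A, A))) / (n[:, None] + 1)
    visits = M @ np.linalg.inv(np.eye(A) - M)      # equals the sum of M^j over j >= 1
    v = np.zeros(A)
    v[-1] = 1.0                                    # start in state A
    w = n.astype(float)                            # weights 1, 2, ..., A
    return float(v @ visits @ w), visits

value, visits = expected_sum(4)
print(value)
print(visits[-1])        # bottom row: expected number of visits to states 1..4
```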

What else can we say? The median is less than A, which is no surprise given the long tail on the right. Also, P[S=0] = 1/(A+1) since the only way to have zero sum is to hit 0 at once. A more interesting question is: what is the limit of the distribution of T = S/A as A tends to infinity? Here is the histogram of 2,000,000 trials with A=50.

It looks like the distribution of T tends to a limit, which has constant density until 1 (so, until A before rescaling) and decays exponentially after that. Writing the supposed probability density function as $f(t) = c$ for $0 \le t \le 1$, $f(t) = c\exp(k(1-t))$ for $t > 1$, and using the fact that the expected value of T is 1, we arrive at $c = 2 - \sqrt{2} \approx 0.586$ and $k = \sqrt{2}$. This is a pretty good approximation in some aspects: the median of this distribution is $\frac{1}{2c} = \frac{2+\sqrt{2}}{4} \approx 0.854$, suggesting that the median of S is around $0.85A$ which is in reasonable agreement with experiment. But the histogram for A=1000 still has a significant deviation from the exponential curve, indicating that the supposedly geometric part of T isn’t really geometric:

One can express S as a sum of several independent geometric random variables, but the number of summands grows quadratically in A, and I didn’t get any useful asymptotics from this. What is the true limiting distribution of S/A, if it’s not the red curve above?

Pick two random numbers $x_1, x_2$ from the interval $[0,1]$; independent, uniformly distributed. Normalize them to have mean zero, which simply means subtracting their mean from each. Repeat many times. Plot the histogram of all numbers obtained in the process.

No surprise here. In effect this is the distribution of $Y = (X_1 - X_2)/2$ with $X_1, X_2$ independent and uniformly distributed over $[0,1]$. The probability density function of $Y$ is found via convolution, and the convolution of $\chi_{[0,1]}$ with itself is a triangular function.

Repeat the same with four numbers $x_1, \dots, x_4$, again subtracting the mean. Now the distribution looks vaguely bell-shaped.

With ten numbers or more, the distribution is not so bell-shaped anymore: the top is too flat.

The mean now follows an approximately normal distribution, but the fact that it’s subtracted from uniformly distributed $x_1, \dots, x_n$ amounts to convolving the Gaussian with $\chi_{[0,1]}$. Hence the flattened top.

What if we use the median instead of the mean? With two numbers there is no difference: the median is the same as the mean. With four there is.

That’s an odd-looking distribution, with convex curves on both sides of a pointy maximum. And with $10$ points it becomes even more strange.

Scilab code:

k = 10
A = rand(200000,k)
A = A - median(A,'c').*.ones(1,k)
histplot(100,A(:))

If one picks two real numbers $X_1, X_2$ from the interval $[0,1]$ (independent, uniformly distributed), their sum $X_1 + X_2$ has the triangular distribution.

The sum of three such numbers has a differentiable probability density function:

And the density of $X_1 + X_2 + X_3 + X_4$ is smoother still: the p.d.f. has two continuous derivatives.

As the number of summands increases, these distributions converge to normal if they are translated and scaled properly. But I am not going to do that. Let’s keep the number of summands to four at most.

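The p.d.f. of $X_1 + \dots + X_n$, denoted $f_n$ below, is a piecewise polynomial of degree $n-1$. Indeed, for $n = 1$ the density is piecewise constant, and the formula

$$f_{n+1}(x) = \int_{x-1}^{x} f_n(t)\,dt$$

provides the inductive step.

For each $n$, the translated copies of function $f_n$ form a partition of unity:

$$\sum_{k \in \mathbb{Z}} f_n(x-k) \equiv 1$$

The integral recurrence relation gives an easy proof of this:

$$\sum_{k \in \mathbb{Z}} f_{n+1}(x-k) = \sum_{k \in \mathbb{Z}} \int_{x-k-1}^{x-k} f_n(t)\,dt = \int_{-\infty}^{\infty} f_n(t)\,dt = 1$$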
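The partition-of-unity property can also be checked numerically. The sketch below does not use the integral recurrence; instead it assumes the standard closed form of the density of a sum of $n$ uniform variables (the Irwin-Hall formula), valid here for $n \ge 2$. The function names are mine.

```python
import math
import numpy as np

def irwin_hall_pdf(x, n):
    """Density of the sum of n >= 2 independent U[0,1] variables (Irwin-Hall formula)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for k in range(n + 1):
        # truncated power (x - k)_+^(n-1) with alternating binomial weights
        total += (-1) ** k * math.comb(n, k) * np.maximum(x - k, 0.0) ** (n - 1)
    # outside [0, n] the density is zero; enforce it to suppress rounding noise
    return np.where((x >= 0) & (x <= n), total / math.factorial(n - 1), 0.0)

def sum_of_translates(x, n):
    """Sum of f_n(x - k) over the integer shifts k that can matter for x in [0, 1]."""
    return sum(irwin_hall_pdf(x - k, n) for k in range(-n, 2))

x = np.linspace(0, 1, 11)
for n in (2, 3, 4):
    print(n, sum_of_translates(x, n))      # each array should consist of ones
```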

And here is the picture for the quadratic case:

A partition of unity can be used to approximate functions by piecewise polynomials: just multiply each partition element by the value of the function at the center of the corresponding interval, and add the results.

Doing this with $f_2$ amounts to piecewise linear interpolation: the original function is in blue, the weighted sum of hat functions in red.

With we get a smooth curve.

Unlike interpolating splines, this curve does not attempt to pass through the given points exactly. However, it has several advantages over interpolating splines:

Is easier to calculate; no linear system to solve;

To generate a random number $X$ uniformly distributed on the interval $[0,1]$, one can keep tossing a fair coin, record the outcomes as an infinite sequence $(d_k)$ of 0s and 1s, and let $X = \sum_{k=1}^{\infty} 2^{-k} d_k$. Here is a histogram of samples from the uniform distribution… nothing to see here, except maybe an incidental interference pattern.

Let’s note that $X = \frac{1}{2}(d_1 + X_1)$ where $X_1 = \sum_{k=1}^{\infty} 2^{-k} d_{k+1}$ has the same distribution as $X$ itself, and $d_1$ is independent of $X_1$. This has an implication for the (constant) probability density function $p$ of $X$:

$$p(x) = p(2x) + p(2x-1)$$

because $2p(2x)$ is the p.d.f. of $\frac{1}{2}X_1$ and $2p(2x-1)$ is the p.d.f. of $\frac{1}{2}(1 + X_1)$. Simply put, $p$ is equal to the convolution of the rescaled function $2p(2x)$ with the discrete measure $\frac{1}{2}(\delta_0 + \delta_{1/2})$.

Let’s iterate the above construction by letting each $d_k$ be uniformly distributed on $[0,1]$ instead of being constrained to the endpoints. This is like tossing a “continuous fair coin”. Here is a histogram of $10^6$ samples of $X$; predictably, with more averaging the numbers gravitate toward the middle.

This is not a normal distribution; the top is too flat. The plot was made with this Scilab code, putting n samples into b buckets:

n = 1e6
b = 200
z = zeros(1,n)
for i = 1:10
z = z + rand(1,n)/2^i
end
c = histplot(b,z)

If this plot is too jagged, look at the cumulative distribution function instead:

It took just one more line of code: plot(linspace(0,1,b),cumsum(c)/sum(c))

Compare the two plots: the c.d.f. looks very similar to the left half of the p.d.f. It turns out, they are identical up to scaling.

Let’s see what is going on here. As before, $X = \frac{1}{2}(d_1 + X_1)$ where $X_1$ has the same distribution as $X$ itself, and the summands $d_1, X_1$ are independent. But now that $d_1$ is uniform, the implication for the p.d.f. $p$ of $X$ is different:

$$p(x) = 2\int_{2x-1}^{2x} p(t)\,dt$$

This is a direct relation between $p$ and its antiderivative. Incidentally, it shows that $p$ is infinitely differentiable because the right hand side always has one more derivative than the left hand side.

To state the self-similarity property of $X$ in the cleanest way possible, one introduces the cumulative distribution function $F$ (the Fabius function) and extends it beyond $[0,1]$ by alternating even and odd reflections across the right endpoint. The resulting function satisfies the delay-differential equation $F'(x) = 2F(2x)$: the derivative is a rescaled copy of the function itself.

Since $F$ vanishes at the even integers, it follows that at every dyadic rational, all but finitely many derivatives of $F$ are zero. The Taylor expansion at such points is a polynomial, while $F$ itself is not. Thus, $F$ is nowhere analytic despite being everywhere $C^\infty$.

This was, in fact, the motivation for J. Fabius to introduce this construction in the 1966 paper Probabilistic Example of a Nowhere Analytic $C^\infty$-Function.