Inflection points of bell-shaped curves

The standard normal probability density function {f(x) = \exp(-x^2/2)/\sqrt{2\pi}} has inflection points at {x = \pm 1}, where {y=\exp(-1/2) /\sqrt{2\pi}\approx 0.24}, which is about 60.7% of the maximum of this function. For this, as for other bell-shaped curves, the inflection points are also the points of steepest incline.
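Indeed, differentiating twice gives {f''(x) = (x^2-1)f(x)}, which vanishes exactly at {x = \pm 1}.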

standard_gaussian
Standard Gaussian with inflection points marked in red

This is good to know for drawing an accurate sketch of this function, but in general, the Gaussian curve may be scaled differently, like {A\exp(-B x^2)}, and then the inflection points will be elsewhere. However, their relative height is invariant under scaling: it is always about 60.7% of the maximum height of the curve. Since it is the height that we focus on, let us normalize various bell-shaped curves to have maximum {1}:

height1_gaussian
Gaussian rescaled to height 1
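A quick way to confirm this scale invariance is to let SymPy do the differentiation (just a sketch; the symbols and the computation below are mine):

>>> import sympy as sp
>>> x, A, B = sp.symbols('x A B', positive=True)
>>> f = A * sp.exp(-B * x**2)
>>> c = [r for r in sp.solve(sp.diff(f, x, 2), x) if r.is_positive][0]  # inflection point
>>> sp.simplify(f.subs(x, c) / f.subs(x, 0))  # exp(-1/2), independent of A and B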

So, the Gaussian curve is inflected at the relative height of {\exp(-1/2)\approx 0.607}. For the Cauchy density {1/(x^2+1)} the inflection is noticeably higher, at {3/4 = 0.75} of the maximum:

Cauchy
Cauchy distribution
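Indeed, the second derivative of {1/(x^2+1)} equals {(6x^2-2)/(x^2+1)^3}, which vanishes where {x^2 = 1/3}; at these points the curve has height {1/(1+1/3) = 3/4}.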

Another popular bell shape, hyperbolic secant or simply {2/(e^x+e^{-x})}, is in between with inflection height {1/\sqrt{2}\approx 0.707}. It is slightly unexpected to see an algebraic number arising from this transcendental function.
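The appearance of an algebraic number is explained by the identity {f'' = f(1-2f^2)} for {f(x) = \mathrm{sech}\, x}: the second derivative vanishes exactly where {f = 1/\sqrt{2}}.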

sech
sech is such a cool name for a function

Can we get inflection height below {1/2}? One candidate is {1/(x^n+1)} with large even {n}, but this does not work: the relative height of inflection is {\displaystyle \frac{n+1}{2n} > \frac{1}{2}}. Shown here for {n=10}:

pow10
A smooth trapezoid
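Where does {(n+1)/(2n)} come from? The second derivative of {1/(x^n+1)} vanishes where {(n-1)(x^n+1) = 2nx^n}, that is, where {x^n = (n-1)/(n+1)}; substituting back gives the height {(n+1)/(2n)}.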

However, increasing the power of {x} in the Gaussian curve works: for example, {\exp(-x^4)} has inflection at relative height {\exp(-3/4) \approx 0.47}:

exp4
Does this distribution have a name?

More generally, the relative height of inflection for {\exp(-x^n)} is {\exp(1/n - 1)} for even {n}. As {n\to \infty}, this approaches {1/e\approx 0.37}. Can we go lower?
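This formula comes from the same calculation: for {f(x)=\exp(-x^n)} we have {f''(x) = \left(n^2x^{2n-2} - n(n-1)x^{n-2}\right)\exp(-x^n)}, which vanishes where {x^n = 1-1/n}; the height there is {\exp(-(1-1/n)) = \exp(1/n-1)}.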

Well, there are compactly supported bump functions which look bell-shaped, for example {\exp(1/(x^2-1))} for {|x|<1}. Normalizing the height to {1} makes it {\exp(x^2/(x^2-1))}. For this function inflection occurs at relative height about {0.255}.

bump1
This one actually looks like a bell

Once again, we can replace {2} by an arbitrary positive even integer {n} and get relative inflection height down to {\displaystyle\exp\left(-\left(1+\sqrt{5-4/n}\right)/2\right)}. As {n} increases, this height decreases to {\displaystyle e^{-\varphi}} where {\varphi} is the golden ratio. This is less than {0.2} which is low enough for me today. The smallest {n} for which the height is less than {0.2} is {n=54}: it achieves inflection at {y\approx 0.1999}.
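For the record, here is where {\exp\left(-\left(1+\sqrt{5-4/n}\right)/2\right)} comes from. Writing {u = x^n}, the normalized bump is {\exp(u/(u-1))}, and setting its second derivative to zero reduces to the quadratic equation

{\displaystyle (n+1)u^2 + (n-2)u - (n-1) = 0}

Its root in {(0,1)} locates the inflection point, and the relative height is {\exp(u/(u-1))} evaluated there, which simplifies to the expression above. In the limit {n\to\infty} the equation becomes {u^2+u-1=0}, so {u = 1/\varphi} and {u/(u-1) = -\varphi}, which is where the golden ratio comes from.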

bump54
A tough one for derivative-based numerical solvers

In the opposite direction, it is easy to produce bell-shaped curves with high inflection points: either {1/(|x|^p + 1)} or {\exp(-|x|^p)} will do, where {p} is slightly larger than {1}. But these examples are only once differentiable, unlike the infinitely smooth examples above. Aside: as {p\to 1}, the latter function converges to the (rescaled) density of the Laplace distribution and the former to a non-integrable function.

As for the middle ground between these two extremes… I did not find a reasonable bell-shaped curve that inflects at exactly half of its maximal height. An artificial example is {\exp(-|x|^p)} with {p=1/(1-\log 2) \approx 3.26}, obtained by solving {\exp(1/p-1)=1/2} for {p}, but this is ugly and only {C^3} smooth.

The sum of pairwise distances and the square of CDF

Suppose we have {n} real numbers {x_0,\dots, x_{n-1}} and want to find the sum of all distances {|x_j-x_k|} over {j < k}. Why? Maybe because over five years ago, the gradient flow of this quantity was used for "clustering by collision" (part 1, part 2, part 3).

If I have a Python console open, the problem appears to be solved with one line:

>>> import numpy as np
>>> 0.5 * np.abs(np.subtract.outer(x, x)).sum()

where the outer difference of x with x creates a matrix of all differences {x_i-x_j}, then absolute values are taken, and then they are all added up. Double-counted, hence the factor of 0.5.

But trying this with, say, one million numbers is not likely to work. If each number takes 8 bytes of memory (64 bits, double precision), then the array x is still pretty small (under 8 MB) but a million-by-million matrix will require over 7 terabytes, and I won’t have that kind of RAM anytime soon.

In principle, one could run a loop adding these values, or store the matrix on a hard drive. Both are going to take forever.

There is a much better way, though. First, sort the numbers in nondecreasing order; this does not require much time or memory (compared to the quadratic memory cost of forming a matrix). Then consider the partial sums {s_k = x_0+\dots+x_k}; the cost of computing them is linear in time and memory. For each fixed {k}, the sum of the distances from {x_k} to the {x_j} with {j<k} is simply {kx_k - s_{k-1}}, or, equivalently, {(k+1)x_k - s_k}. So, all we have to do is add these up. Still one line of code (after sorting), but a much faster one:

>>> x.sort()
>>> n = x.size
>>> (np.arange(1, n+1)*x - np.cumsum(x)).sum()

For example, x could be a sample from some continuous distribution. Assuming the distribution has a mean (i.e., is not too heavy tailed), the sum of all pairwise distances grows quadratically with n, and its average approaches a finite limit. For the uniform distribution on [0, 1] the computation shows this limit is 1/3. For the standard normal distribution it is 1.128… which is not as recognizable a number.
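Here is a quick sanity check of both methods on a made-up uniform sample (a sketch; the sample size is kept small so that the quadratic-memory method remains feasible):

>>> rng = np.random.default_rng(0)
>>> n = 2000
>>> x = rng.uniform(0, 1, n)
>>> slow = 0.5 * np.abs(np.subtract.outer(x, x)).sum()  # quadratic memory
>>> x.sort()
>>> fast = (np.arange(1, n+1)*x - np.cumsum(x)).sum()
>>> np.isclose(fast, slow)  # True
>>> fast / (n*(n-1)/2)  # average pairwise distance, close to 1/3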


As {n\to \infty}, the average pairwise distance within a sample taken from a distribution converges to the expected value of {|X-Y|}, where {X, Y} are two independent variables with that distribution. Let’s express this in terms of the probability density function {p} and the cumulative distribution function {\Phi}. By symmetry, we can integrate over {x> y} and double the result:

{\displaystyle \frac12 E|X-Y| = \int_{-\infty}^\infty p(x)\,dx \int_{-\infty}^x (x-y) p(y)\,dy}

Integrate by parts in the second integral: {p(y) = \Phi'(y)}, and the boundary terms are zero.

{\displaystyle \frac12 E|X-Y| = \int_{-\infty}^\infty p(x)\,dx \int_{-\infty}^x \Phi(y)\,dy}

Integrate by parts in the other integral, throwing the derivative onto the indefinite integral and thus eliminating it. There is a boundary term this time.

{\displaystyle \frac12 E|X-Y| = \Phi(\infty) \int_{-\infty}^\infty \Phi(y)\,dy - \int_{-\infty}^\infty \Phi(x)^2\,dx}

Since {\Phi(\infty) = 1}, this simplifies nicely:

{\displaystyle \frac12 E|X-Y| = \int_{-\infty}^\infty \Phi(x) (1-\Phi(x))\,dx}

This is a lot neater than I expected: {E|X-Y|} is simply the integral of {2\Phi(1-\Phi)}. I don’t often see a CDF squared, as happens here. Some examples: for the uniform distribution on [0,1] we get

{\displaystyle E|X-Y| = \int_0^1 2x(1-x)\,dx = \frac13}

and for the standard normal, with {\Phi(x) = (1+\mathrm{erf}\,(x/\sqrt{2}))/2}, it is

{\displaystyle \int_{-\infty}^\infty \frac12 \left(1-\mathrm{erf}\,(x/\sqrt{2}) ^2 \right)\,dx = \frac{2}{\sqrt{\pi}}\approx 1.12838\ldots }
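This is easy to double-check numerically (a sketch; SciPy's quad handles the infinite limits):

>>> from scipy.integrate import quad
>>> from scipy.special import erf
>>> quad(lambda x: 0.5*(1 - erf(x/np.sqrt(2))**2), -np.inf, np.inf)[0]  # 1.12838...
>>> 2/np.sqrt(np.pi)  # 1.12838...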


The trick with sorting and cumulative sums can also be used to find, for every point {x_k}, the sum (or average) of its distances to all other points. Here we do not sum over {k}; instead, for each {k} we also add the distances {|x_j-x_k|} with {j>k}. That sum is simply {S - s_k - (n-k-1)x_k}, where {S} is the total sum. So, all we need is

>>> (2*np.arange(1,n+1)-n)*x - 2*np.cumsum(x) + x.sum()

Unfortunately, the analogous problems for vector-valued sequences are not as easy. If the Manhattan metric is used, we can do the computations for each coordinate separately, and add the results. For the Euclidean metric…
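Back to the Manhattan case: the per-coordinate computation takes only a couple more lines (a sketch; here pts is assumed to be an n-by-d array with one point per row):

>>> y = np.sort(pts, axis=0)  # sort each coordinate independently
>>> m = np.arange(1, len(y)+1)[:, None]
>>> (m*y - np.cumsum(y, axis=0)).sum()  # total pairwise Manhattan distance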

Kolakoski sequence II

The computational evidence in my previous post on the Kolakoski sequence was meagre: I looked only at the first 500 terms. By definition, the Kolakoski sequence consists of 1s and 2s, begins with 1, and is self-similar in the following sense: replacing each run by its length, one recovers the original sequence. Here is its beginning, with the runs separated by spaces.

1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 11 2 11 2 1 22 1 22 11 2 1 22 1 2 11 2 11 22 1 22 11 2 1 22 1 22 11 2 11 2 1 22 1 2 11 22 1 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 22 1 22 11 2 11 2 1 22 11 2 11 22 1 2 11 2 11 22 1 22 11 2 1 22 1 2 11 22 1 22 1 2 11 2 11 22 1 22 11 …

Conjecturally, the density of 1s and of 2s in this sequence is 50%. And indeed, the number of 2s among the first N terms grows in a way that is hard to distinguish from N/2 on a plot. However, the densities look quite different if one examines three subsequences separately:

  • a_{3k+1} (red subsequence)
  • a_{3k+2} (green subsequence)
  • a_{3k+3} (blue subsequence)

At the very beginning (among the first 500 terms) the red sequence is dominated by 1s while the blue is dominated by 2s. But the story changes quickly. The plot below shows the count of 2s in each subsequence, minus the linear prediction. The average of red, green, and blue is shown in black.

200000 terms, separated into 3 subsequences

The nearly flawless behavior of the overall count of 2s hides the wild oscillation within the three subsequences. Presumably, they continue moving back and forth and the amplitude of their oscillation appears to be growing. However, the growth looks sublinear, which means that even within each subsequence, the asymptotic density of 2s might converge to 1/2.

The computation was done in Scilab. For simplicity I replaced 1s and 2s by 0s and 1s, so that the length of the jth run is a_j+1. The variable rep indicates whether the next term should be the same as its predecessor; j keeps the count of runs.

N=200000; a=zeros(1,N); j=1; rep=0;      // a(i) is 0 or 1, standing for 1 or 2
for i=2:N
    if rep==1 then
        a(i)=a(i-1); rep=0;              // second term of a run of length 2
    else
        a(i)=1-a(i-1); j=j+1; rep=a(j);  // start a new run; its length is a(j)+1
    end
end

Pretty neat, isn’t it? The rest of the code is less neat, though. I still don’t know of any good way to calculate running totals in Scilab.

M=floor(N/3)+1; s=zeros(M,4);
for i=0:M-2
    for j=1:3
        s(i+2,j)=s(i+1,j)+a(3*i+j);      // running count of 2s in the jth subsequence
    end
    s(i+2,4)=i/2;                        // linear prediction to subtract
end
plot(s(:,1)-s(:,4),'r'); plot(s(:,2)-s(:,4),'g'); plot(s(:,3)-s(:,4),'b');
plot((s(:,1)+s(:,2)+s(:,3))/3-s(:,4),'k');
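For what it's worth, here is a Python sketch of the same computation (my translation, not the original code); numpy's cumsum takes care of the running totals:

import numpy as np

N = 200000
a = np.zeros(N, dtype=int)  # a[i] is 0 or 1, standing for 1 or 2 in the Kolakoski sequence
j, rep = 0, 0               # j indexes runs; rep = 1 means the next term repeats its predecessor
for i in range(1, N):
    if rep:
        a[i] = a[i-1]
        rep = 0
    else:
        a[i] = 1 - a[i-1]
        j += 1
        rep = a[j]

M = N // 3
counts = np.cumsum(a[:3*M].reshape(M, 3), axis=0)  # running counts of 2s in the three subsequences
excess = counts - np.arange(1, M+1)[:, None]/2     # deviation from the linear prediction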

Bonus content: I extended the computations to N=1663500 (yeah, should have increased the stack size beforehand).

1663500 terms