Laplacian spectrum of small graphs

This is a collection of entirely unoriginal remarks about the Laplacian spectrum of graphs. For an accessible overview of the subject I recommend the M.S. thesis The Laplacian Spectrum of Graphs by Michael William Newman, which also includes a large table of graphs with their spectra. Here I will avoid introducing matrices and enumerating vertices.

Let {V} be the vertex set of a graph. Write {u\sim v} if {u, v} are adjacent vertices. Given a function {f\colon V\to \mathbb R}, define {L f(v) = \sum_{u\colon u\sim v}(f(v)-f(u))}.
This is a linear operator (the graph Laplacian) on the Euclidean space {\ell^2(V)} of all functions {f\colon V\to \mathbb R} with the norm {\|f\|^2 = \sum_{v\in V} f(v)^2}. It is symmetric: {\langle L f, g\rangle = \langle f, L g\rangle } and positive semidefinite: {\langle L f, f\rangle = \frac12 \sum_{u\sim v}(f(u)-f(v))^2\ge 0}. Since equality is attained for constant {f}, 0 is always an eigenvalue of {L}.

This is the standard setup, but I prefer to change things a little and replace {\ell^2(V)} by the smaller space {\ell^2_0(V)} of functions with zero mean: {\sum_{v\in V}f(v)=0}. Indeed, {L} maps {\ell^2(V)} to {\ell^2_0(V)} anyway, and since it kills the constants, it makes sense to focus on {\ell^2_0(V)}. It is a vector space of dimension {n-1} where {n=|V|}.

One advantage is that the smallest eigenvalue is 0 if and only if the graph is disconnected: indeed, {\langle L f, f\rangle=0} is equivalent to {f} being constant on each connected component. We also gain better symmetry between {L} and the Laplacian of the graph complement, denoted {L'}. Indeed, since {L' f(v) = \sum_{u\colon u\not \sim v}(f(v)-f(u))}, it follows that {(L+L')f(v) = \sum_{u\colon u\ne v} (f(v)-f(u)) = n f(v)} for every {f\in \ell^2_0(V)}. So, the identity {L+L' = nI} holds on {\ell^2_0(V)} (it does not hold on {\ell^2(V)}). Hence the eigenvalues of {L'} are obtained by subtracting the eigenvalues of {L} from {n}. As a corollary, the largest eigenvalue of {L} is at most {n}, with equality if and only if the graph complement is disconnected. More precisely, the multiplicity of eigenvalue {n} is one less than the number of connected components of the graph complement.
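A quick numerical check of this relation between the two spectra, as a minimal sketch using NetworkX on an arbitrary small graph (both the graph and its complement are connected here, so each has a single zero eigenvalue to drop):

import networkx as nx
import numpy as np

G = nx.path_graph(5)                      # arbitrary example: the 5-path
H = nx.complement(G)
n = G.number_of_nodes()

# Laplacian spectra in ascending order; drop the eigenvalue 0 of the constants
spec = np.sort(nx.laplacian_spectrum(G))[1:]
spec_comp = np.sort(nx.laplacian_spectrum(H))[1:]

# On ell^2_0 the complement's eigenvalues should be n minus the original ones
print(np.allclose(np.sort(n - spec), spec_comp))   # True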

Let {D} denote the diameter of the graph. Then the number of distinct Laplacian eigenvalues is at least {D}. Indeed, let {u, v} be two vertices at distance {D} from each other. Define {f_0(u) = 1} and {f_0=0} elsewhere. Also let {f_k=L^k f_0} for {k=1, 2, \dots}. Note that {f_k\in \ell_0^2(V)} for all {k\ge 1}. One can prove by induction that {f_k(w)=0} when the distance from {w} to {u} is greater than {k}, and {(-1)^k f_k(w) > 0} when the distance from {w} to {u} is equal to {k}. In particular, {f_k(v) = 0} when {k<D} and {f_D(v)\ne 0}. This shows that {f_D} is not a linear combination of {f_1, \dots, f_{D-1}}. Since {f_k = L^{k-1}f_1}, it follows that {L^{D-1}} is not a linear combination of {L^0, L^1, \dots, L^{D-2}}. Hence the minimal polynomial of {L} has degree at least {D}, which implies the claim.
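Here is a minimal sketch of this bound, again with NetworkX, on a small random connected graph; the rounding tolerance used to count distinct eigenvalues is an arbitrary choice:

import networkx as nx
import numpy as np

# Generate a small random graph, retrying until it is connected
G = nx.gnp_random_graph(8, 0.35)
while not nx.is_connected(G):
    G = nx.gnp_random_graph(8, 0.35)

# Distinct Laplacian eigenvalues on ell^2_0: round to absorb numerical noise,
# count distinct values, and drop the eigenvalue 0 of the constants
eigs = np.round(nx.laplacian_spectrum(G), 8)
distinct = len(np.unique(eigs)) - 1
print(nx.diameter(G) <= distinct)   # True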

Let’s consider a few examples of connected graphs.

3 vertices

There are two connected graphs: the 3-path (D=2) and the 3-cycle (D=1). In both cases we get D distinct eigenvalues. The spectra are [1, 3] and [3, 3], respectively.

4 vertices

  • One graph of diameter 3, the path. Its spectrum is {[2-\sqrt{2}, 2, 2+\sqrt{2}]}.
  • One graph of diameter 1, the complete graph. Its spectrum is {[4, 4, 4]}. This pattern continues for other complete graphs: since the complement is the empty graph ({n} components), all {n-1} eigenvalues are equal to {n}.
  • Four graphs of diameter 2, listed below with their spectra.
4-0: 1, 1, 4
4-2: 1, 3, 4
4-3: 2, 2, 4
4-4: 2, 4, 4

Remarks:

  • The graph [1, 3, 4] has more distinct eigenvalues than its diameter.
  • The graph [2, 2, 4] is regular (all vertices have the same degree).
  • The smallest eigenvalue of the graphs [1, 1, 4] and [2, 2, 4] is a multiple eigenvalue, because these graphs have a large group of automorphisms (here rotations): applying some of these automorphisms to an eigenfunction for the smallest eigenvalue yields another eigenfunction.
  • [1, 3, 4] and [2, 4, 4] also have automorphisms, but their automorphisms preserve the eigenfunction for the lowest eigenvalue, up to a constant factor.

5 vertices

  • One graph of diameter 4, the path. Its spectrum is related to the golden ratio: it consists of {(3\pm \sqrt{5})/2, (5\pm \sqrt{5})/2}.
  • One graph of diameter 1, the complete one: [5, 5, 5, 5]
  • Five graphs of diameter 3. All have connected complement, with the highest eigenvalue strictly between 4 and 5. None are regular. Each has 4 distinct eigenvalues.
  • 14 graphs of diameter 2. Some of these are noted below.

Two have connected complement, so their eigenvalues are less than 5.

One is the smallest graph with both integers and non-integers in its spectrum. Its eigenvalues are {3\pm \sqrt{2}, 3, 5}.

5-15: 1.585786, 3, 4.414214, 5

Two have eigenvalues of multiplicity 3, indicating a high degree of symmetry.

Two have all eigenvalues integer and distinct.

The 5-cycle and the complete graph are the only regular graphs on 5 vertices.

6 vertices

This is where we first encounter isospectral graphs: the Laplacian spectrum cannot tell them apart.

Both of these have spectrum {3\pm \sqrt{5}, 2, 3, 3} but they are obviously non-isomorphic (consider the vertex degrees):

Both of these have spectrum {3\pm \sqrt{5}, 3, 3, 4} and are non-isomorphic.

Indeed, the second pair is obtained from the first by taking graph complement.

Also notable are regular graphs on 6 vertices, all of which have integer spectrum.

6-46: 1, 1, 3, 3, 4
6-71: 3, 3, 3, 3, 6
6-98: 2, 3, 3, 5, 5
6-108: 4, 4, 4, 6, 6
6-111: 6, 6, 6, 6, 6

Here [3, 3, 3, 3, 6] (complete bipartite) and [2, 3, 3, 5, 5] (prism) are both regular of degree 3, but the spectrum allows us to tell them apart.

The prism is the smallest regular graph whose smallest eigenvalue is simple. It has plenty of automorphisms, but the relevant eigenfunction (1 on one triangular face of the prism, -1 on the other) is compatible with all of them.

7 vertices

There are four regular graphs on 7 vertices. Two of them are by now familiar: 7-cycle and complete graph. Here are the other two, both regular of degree 4 but with different spectra.

7-720: 3, 4, 4, 5, 5, 7
7-832: 3.198, 3.198, 4.555, 4.555, 6.247, 6.247

There are lots of isospectral pairs of graphs on 7 vertices, so I will list only the isospectral triples, of which there are five.

Spectrum 0.676596, 2, 3, 3.642074, 5, 5.681331:

Spectrum 0.726927, 2, 3.140435, 4, 4, 6.132637:

Spectrum 0.867363, 3, 3, 3.859565, 5, 6.273073:

Spectrum 1.318669, 2, 3.357926, 4, 5, 6.323404:

All of the triples mentioned so far have connected complement: for example, taking the complement of the triple with the spectrum [0.676596, 2, 3, 3.642074, 5, 5.681331] turns it into the triple with the spectrum [1.318669, 2, 3.357926, 4, 5, 6.323404].

Last but not least, an isospectral triple with an integer spectrum: 3, 4, 4, 6, 6, 7. This one has no counterpart since the complement of each of these graphs is disconnected.

8 vertices

Here are the regular graphs, excluding the cycle (spectrum 0.585786, 0.585786, 2, 2, 3.414214, 3.414214, 4) and the complete graph.

Degree 3 regular:

8-4326: 2, 2, 2, 4, 4, 4, 6
8-6409: 0.763932, 2, 4, 4, 4, 4, 5.236068
8-6579: 1.438447, 2.381966, 2.381966, 3, 4.618034, 4.618034, 5.561553
8-6716: 1.267949, 2, 2.585786, 4, 4, 4.732051, 5.414214
8-8725: 2, 2, 2.585786, 2.585786, 4, 5.414214, 5.414214

Degree 4 regular:

8-4575: 4, 4, 4, 4, 4, 4, 8
8-9570: 2.763932, 4, 4, 4, 4, 6, 7.236068
8-10188: 2.438447, 3.381966, 3.381966, 5, 5.618034, 5.618034, 6.561553
8-10202: 2.585786, 3.267949, 4, 4, 5.414214, 6, 6.732051
8-10819: 2.585786, 2.585786, 4, 5.414214, 5.414214, 6, 6
8-10888: 2, 4, 4, 4, 6, 6, 6

Degree 5 regular:

8-10481: 4.381966, 4.381966, 5, 5, 6.618034, 6.618034, 8
8-10975: 4, 4, 6, 6, 6, 6, 8
8-11082: 4, 4.585786, 4.585786, 6, 6, 7.414214, 7.414214

Degree 6 regular:

8-11112: 6, 6, 6, 6, 8, 8, 8

Credits: the list of graphs is by Brendan McKay, and the computations used the NetworkX library, specifically the functions read_graph6 (to read the files provided by Prof. McKay), laplacian_spectrum, diameter, degree, and draw.
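For reference, a minimal sketch of that pipeline; the file name graphs6.g6 is hypothetical and stands in for one of the graph6 files in the collection:

import networkx as nx

# Hypothetical file name; read_graph6 returns a list when the file contains several graphs
graphs = nx.read_graph6("graphs6.g6")

for G in graphs:
    if nx.is_connected(G):
        spec = nx.laplacian_spectrum(G)[1:]    # ascending order; drop the zero eigenvalue
        print(nx.diameter(G), [round(x, 6) for x in spec])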

The distribution of pow(2, n, n)

For a positive integer {n} let {f(n)} be the remainder of the division of {2^n} by {n}. This is conveniently computed in Python as pow(2, n, n). For example, the first 20 values are returned by

[pow(2, n, n) for n in range(1, 21)]

and they are:

[0, 0, 2, 0, 2, 4, 2, 0, 8, 4, 2, 4, 2, 4, 8, 0, 2, 10, 2, 16]

Obviously, {f(n)=0} if and only if {n} is a power of {2}. It also looks like {f} is always even, with powers of 2 dominating the list of its values. But {f} does take on odd values, although this does not happen often: only 6 times in the first 100 integers.

[(n, pow(2, n, n)) for n in range(1, 101) if pow(2, n, n) % 2]

returns

[(25, 7), (45, 17), (55, 43), (91, 37), (95, 13), (99, 17)]

It is conjectured that the range of {f} consists of all nonnegative integers except 1. Here is a proof that {f(n)\ne 1}, provided by Max Alexeyev on the OEIS page for A036236.

Suppose {2^n \equiv 1 \bmod n} for some {n>1}. Then {n} is odd, since it divides the odd number {2^n-1}. Let {p} be the smallest prime divisor of {n}; it is odd, so {2} is invertible mod {p}. Since {2^n\equiv 1 \bmod p}, the order of {2} in the multiplicative group {(\mathbb Z/ p \mathbb Z)^*} is a divisor of {n}. But this group has {p-1} elements, so the order also divides {p-1} and is therefore smaller than {p}. The only divisor of {n} smaller than {p} is {1}. Thus, the order of {2} is {1}, which means {2\equiv 1 \bmod p}, which is absurd.

The fact that the smallest {n} with {f(n) = 3} is 4700063497 deserves to be mentioned here. But I would like to consider the most frequent values of {f} instead. Experimentally, they are found like this:

from collections import Counter
freq = Counter(pow(2, n, n) for n in range(1, 100000000 + 1))
print(freq.most_common(20))

which prints 20 pairs (value, frequency):

(2, 5763518),
(4, 3004047),
(8, 2054167), 
(16, 1569725), 
(64, 1076293), 
(256, 824438), 
(512, 737450), 
(1024, 668970), 
(32, 638063), 
(4096, 569705), 
(128, 466015), 
(65536, 435306), 
(262144, 389305), 
(1048576, 351723), 
(2097152, 326926), 
(16384, 246533), 
(16777216, 243841), 
(32768, 232037), 
(2048, 153537), 
(8192, 131614)

These are all powers of 2… but not exactly in the order one might expect. I repeated the experiment for the first {10^8k} integers with {k=1, 2, 3, 4, 5, 6}. The frequency order remained stable at the top. These are the most common 11 values, with their frequency (in %) listed for the aforementioned six ranges.

2, [5.8, 5.5, 5.4, 5.3, 5.3, 5.2]
4, [3.0, 2.9, 2.8, 2.8, 2.7, 2.7]
8, [2.1, 2.0, 1.9, 1.9, 1.9, 1.8]
16, [1.6, 1.5, 1.5, 1.4, 1.4, 1.4]
64, [1.1, 1.0, 1.0, 1.0, 1.0, 1.0]
256, [0.8, 0.8, 0.8, 0.8, 0.7, 0.7]
512, [0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
1024, [0.7, 0.6, 0.6, 0.6, 0.6, 0.6]
32, [0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
4096, [0.6, 0.5, 0.5, 0.5, 0.5, 0.5]
128, [0.5, 0.4, 0.4, 0.4, 0.4, 0.4]

It seems that 32 and 128 are consistently less common than one might expect.

The most common value, 2, is contributed by the primes (via Fermat's little theorem) and by pseudoprimes to base 2. The value 4 appears when {n = 2p} with {p} an odd prime (but not only then). Still, since the primes have zero density, primality alone does not explain the frequency pattern. A more convincing reason could be: when {n=2k} is even, we are computing {4^k \bmod 2k}, and that is likely to be a power of {4}. This boosts the frequency of the even (generally, composite) powers of 2 in the sequence.
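A quick empirical check of the {n=2p} observation (a minimal sketch; the range of primes tested is arbitrary):

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))

# f(2p) = 4 for every odd prime p: 4^p is congruent to 4 mod p by Fermat,
# and both sides are congruent to 0 mod 2
print(all(pow(2, 2*p, 2*p) == 4 for p in range(3, 100) if is_prime(p)))   # True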

Numerical integration visualized

scipy.integrate.quad is a popular method of numerical integration with Python. Let’s see how it chooses the points at which to evaluate the function being integrated. Begin with a simple example, the exponential function.

[Figure: exp(x) on [-1, 1]]
The blue dots indicate the evaluation points, their y-coordinate being the order of evaluation. So, it begins with x=0, continues with something close to -1, etc. The function, drawn in red, is not to its true vertical scale.

The placement of dots gives away the method of integration: it is the Gauss-Kronrod quadrature with 10 Gauss nodes and 21 Kronrod nodes, abbreviated (G10, K21).  The Gauss nodes are included in the Kronrod nodes, and of course the function is not evaluated there again. The evaluation process is slightly out of order in that 0 is a Kronrod node but comes first, followed by 10 Gauss nodes, followed by 10 remaining Kronrod nodes. The process ends there, as the comparison of G10 and K21 results shows the necessary precision was reached.
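One way to confirm the evaluation count is the full_output option of quad, which returns a dictionary with the number of integrand evaluations (a minimal check with a smooth integrand):

import numpy as np
from scipy.integrate import quad

# For a well-behaved integrand a single (G10, K21) rule suffices, so neval is 21
result, error, info = quad(np.exp, -1, 1, full_output=True)
print(info['neval'])   # 21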

The square root on [0, 1] is not such a nice function, so (G10, K21) does not reach the required precision at once. The interval is bisected again and again until it does.

[Figure: sqrt(x) on [0, 4]]
  Surely the cube root is even worse. Or is it?

[Figure: cbrt(x) on [-1, 1]]
The nodes are symmetric about the midpoint of the interval of integration. So for any odd function on an interval symmetric about 0 we get G10 = 0 and K21 = 0, and the process stops at once. To see the effect of the cube root singularity, one has to use a non-symmetric interval such as [-1, 2].

[Figure: cbrt(x) on [-1, 2]]
That’s still fewer subdivisions than for the square root: cancellation between the left and right neighborhoods of 0 still helps. Let’s look at smooth functions next.

[Figure: sin(x^2) on [0, 7]]
Rapid oscillation forces subdivisions here. There are also other reasons for subdivision, such as the loss of analyticity:

[Figure: exp(-1/x) on [0, 1]]
Although exp(-1/x) is infinitely differentiable on this interval, the fact that it is not analytic at 0 makes it a more difficult integrand than exp(x). Finally, an analytic function with no oscillation which still needs a bunch of subintervals:

[Figure: 1/(1+x^2) on [-5, 5]]
This is the standard example used to illustrate the Runge phenomenon. Although we are not interpolating here, numerical integration is also influenced by the function having a small radius of convergence of its Taylor series. After all, G10 can be thought of as degree-9 interpolation of the given function at the Gauss nodes (the zeros of a Legendre polynomial),  with the formula returning the integral of the interpolating polynomial.

The code used to plot these things:

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

f = lambda x: np.exp(x)

def f_int(x):
    # Record every point at which quad evaluates the integrand
    eval_points.append(x)
    return f(x)

eval_points = []
a, b = -1, 1
quad(f_int, a, b)
n = len(eval_points)

# Plot the function (rescaled vertically to fit) and the evaluation points,
# with the y-coordinate of each dot giving the order of evaluation
t = np.linspace(a, b, 1000)
y = f(t)
yy = n*(y - y.min())/(y.max() - y.min())
plt.plot(t, yy, 'r')
plt.plot(eval_points, np.arange(0, n), '.')
plt.show()

Critical points of a cubic spline

The choice of piecewise polynomials of degree 3 for interpolation is justifiably popular: even-degree splines are algebraically awkward to construct, degree 1 is simply piecewise linear interpolation (not smooth), and degree 5, while feasible, entails juggling too many coefficients. Besides, a cubic polynomial minimizes the amount of wiggling (the integral of second derivative squared) for given values and slopes at the endpoints of an interval. (Recall Connecting dots naturally.)

But the derivative of a cubic spline is a quadratic spline. And one needs the derivative to find the critical points. This results in an awkward example in SciPy documentation, annotated with “(NB: sproot only works for order 3 splines, so we fit an order 4 spline)”.

Although not implemented in SciPy, the task of computing the roots of a quadratic spline is a simple one. Obtaining the roots from the internal representation of a quadratic spline in SciPy (as a linear combination of B-splines) would take some work and reading. But a quadratic polynomial is determined by three values, so sampling it at three points, such as two consecutive knots and their average, is enough.

Quadratic formula with values instead of coefficients

Suppose we know the values of a quadratic polynomial q at -1, 0, 1, and wish to find if it has roots between -1 and 1. Let’s normalize so that q(0)=1, and let x = q(-1), y = q(1). If either x or y is negative, there is definitely a root on the interval. If they are positive, there is still a chance: we need the parabola to be concave up, have a minimum within [-1, 1], and for the minimum to be negative. All of this is easily determined once we note that the coefficients of the polynomial are a = (x+y)/2 - 1, b = (y-x)/2, and c = 1.
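A quick check of these coefficient formulas on an arbitrary quadratic (a minimal sketch):

import numpy as np

q = np.poly1d([2.0, -0.5, 1.0])       # q(t) = 2t^2 - 0.5t + 1, already normalized so q(0) = 1
x, y = q(-1), q(1)
a, b = (x + y)/2 - 1, (y - x)/2       # recover the coefficients from the values
print(np.allclose([a, b], [2.0, -0.5]))   # True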

The inequality {(x-y)^2 \ge 8(x+y-2)} ensures the suitable sign of the discriminant. It describes a parabola with vertex (1, 1) and focus (2, 2), contained in the first quadrant and tangent to the axes at (4, 0) and (0, 4). Within the orange region there are no real roots.

[Figure: no real roots in the orange region]

The line x+y=2, tangent to the parabola at its vertex, separates convex and concave parabolas. While concavity in conjunction with x, y being positive definitely precludes having roots in [-1, 1], slight convexity is not much better: it results in real roots outside of the interval. Here is the complete picture: green means there is a root in [-1, 1], orange means no real roots, red covers the rest.

[Figure: green = there is a root in the interval [-1, 1]]

Back to splines

Since the derivative of a spline is implemented in SciPy (B-splines have a nice formula for derivatives), all we need is a root-finding routine for quadratic splines. Here it is, based on the above observations but using built-in NumPy polynomial solver np.roots to avoid dealing with various special cases for the coefficients.

def quadratic_spline_roots(spl):
    # Find the roots of a quadratic spline by sampling each knot interval
    # at its endpoints and midpoint, then solving the resulting quadratic
    roots = []
    knots = spl.get_knots()
    for a, b in zip(knots[:-1], knots[1:]):
        u, v, w = spl(a), spl((a+b)/2), spl(b)
        # On the interval rescaled to [-1, 1] the polynomial takes the values u, v, w
        # at -1, 0, 1; its coefficients (times 2) are u+w-2v, w-u, 2v
        t = np.roots([u+w-2*v, w-u, 2*v])
        t = t[np.isreal(t) & (np.abs(t) <= 1)].real
        roots.extend(t*(b-a)/2 + (b+a)/2)
    return np.array(roots)

A demonstration that plots the spline (blue), its critical points (red), and the original data points (black):

[Figure: there can be 0, 1, or 2 critical points between two knots]
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.arange(7)
y = np.array([3, 1, 1, 2, 2, 4, 3])
f = InterpolatedUnivariateSpline(x, y, k=3)
crit_pts = quadratic_spline_roots(f.derivative())

t = np.linspace(x[0], x[-1], 500)
plt.plot(t, f(t))
plt.plot(x, y, 'kd')
plt.plot(crit_pts, f(crit_pts), 'ro')
plt.show()


The sum of pairwise distances and the square of CDF

Suppose we have {n} real numbers {x_0,\dots, x_{n-1}} and want to find the sum of all distances {|x_j-x_k|} over {j < k}. Why? Maybe because over five years ago, the gradient flow of this quantity was used for "clustering by collision" (part 1, part 2, part 3).

If I have a Python console open, the problem appears to be solved with one line:

>>> 0.5 * np.abs(np.subtract.outer(x, x)).sum()

where the outer difference of x with x creates a matrix of all differences {x_i-x_j}, then absolute values are taken, and then they are all added up. Double-counted, hence the factor of 0.5.

But trying this with, say, one million numbers is not likely to work. If each number takes 8 bytes of memory (64 bits, double precision), then the array x is still pretty small (under 8 MB) but a million-by-million matrix will require over 7 terabytes, and I won’t have that kind of RAM anytime soon.

In principle, one could run a loop adding these values, or store the matrix on a hard drive. Both are going to take forever.

There is a much better way, though. First, sort the numbers in nondecreasing order; this does not require much time or memory (compared to quadratic memory cost of forming a matrix). Then consider the partial sums {s_k = x_0+\dots+x_k}; the cost of computing them is linear in time and memory. For each fixed {k}, the sum of distances to {x_j} with {j<k} is simply {kx_k - s_{k-1}}, or, equivalently, {(k+1)x_k - s_k}. So, all we have to do is add these up. Still one line of code (after sorting), but a much faster one:

>>> x.sort()
>>> n = len(x)
>>> (np.arange(1, n+1)*x - np.cumsum(x)).sum()
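As a sanity check, the two one-liners can be compared on a small sample (a minimal sketch; the sample size is arbitrary):

import numpy as np

x = np.random.uniform(size=1000)
naive = 0.5 * np.abs(np.subtract.outer(x, x)).sum()   # quadratic memory, fine for n = 1000

x.sort()
n = len(x)
fast = (np.arange(1, n+1)*x - np.cumsum(x)).sum()     # sort-and-cumsum version

print(np.isclose(naive, fast))   # True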

For example, x could be a sample from some continuous distribution. Assuming the distribution has a mean (i.e., is not too heavy tailed), the sum of all pairwise distances grows quadratically with n, and its average approaches a finite limit. For the uniform distribution on [0, 1] the computation shows this limit is 1/3. For the standard normal distribution it is 1.128… which is not as recognizable a number.


As {n\to \infty}, the average distance of a sample taken from a distribution converges to the expected value of |X-Y| where X, Y are two independent variables with that distribution. Let’s express this in terms of the probability density function {p} and the cumulative distribution function {\Phi}. By symmetry, we can integrate over {x> y} and double the result:

{\displaystyle \frac12 E|X-Y| = \int_{-\infty}^\infty p(x)\,dx \int_{-\infty}^x (x-y) p(y)\,dy}

Integrate by parts in the second integral: {p(y) = \Phi'(y)}, and the boundary terms are zero.

{\displaystyle \frac12 E|X-Y| = \int_{-\infty}^\infty p(x)\,dx \int_{-\infty}^x \Phi(y)\,dy}

Integrate by parts in the other integral, throwing the derivative onto the indefinite integral and thus eliminating it. There is a boundary term this time.

{\displaystyle \frac12 E|X-Y| = \Phi(\infty) \int_{-\infty}^\infty \Phi(y)\,dy - \int_{-\infty}^\infty \Phi(x)^2\,dx}

Since {\Phi(\infty) = 1}, this simplifies nicely:

{\displaystyle \frac12 E|X-Y| = \int_{-\infty}^\infty \Phi(x) (1-\Phi(x))\,dx}

This is a lot neater than I expected: {E|X-Y|} is simply the integral of {2\Phi(1-\Phi)}. I don’t often see CDF squared, like here. Some examples: for the uniform distribution on [0,1] we get

{\displaystyle E|X-Y| = \int_0^1 2x(1-x)\,dx = \frac13}

and for the standard normal, with {\Phi(x) = (1+\mathrm{erf}\,(x/\sqrt{2}))/2}, it is

{\displaystyle \int_{-\infty}^\infty \frac12 \left(1-\mathrm{erf}\,(x/\sqrt{2}) ^2 \right)\,dx = \frac{2}{\sqrt{\pi}}\approx 1.12838\ldots }
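As a numerical check of the last formula, a small sketch using the normal CDF from scipy.stats:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# E|X - Y| = integral of 2*Phi*(1 - Phi) over the real line, for the standard normal
integrand = lambda x: 2*norm.cdf(x)*(1 - norm.cdf(x))
value, _ = quad(integrand, -np.inf, np.inf)
print(value, 2/np.sqrt(np.pi))   # both approximately 1.12838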


The trick with sorting and cumulative sums can also be used to find, for every point {x_k}, the sum (or average) of distances to all other points. To do this, we don’t sum over {k} but must also add {|x_j-x_k|} for {j>k}. The latter sum is simply {S - s_k - (n-k-1)x_k} where {S} is the total sum. So, all we need is

>>> (2*np.arange(1,n+1)-n)*x - 2*np.cumsum(x) + x.sum()

Unfortunately, the analogous problems for vector-valued sequences are not as easy. If the Manhattan metric is used, we can do the computations for each coordinate separately, and add the results. For the Euclidean metric…

Experiments with the significance of autocorrelation

Given a sequence of numbers {x_j} of length {L} one may want to look for evidence of its periodic behavior. One way to do this is by computing autocorrelation, the correlation of the sequence with a shift of itself. Here is one reasonable way to do so: for lag values {\ell=1,\dots, \lfloor L/2 \rfloor} compute the correlation coefficient of {(x_1,\dots, x_{L-\ell})} with {(x_{\ell+1},\dots, x_L)}. That the lag does not exceed {L/2} ensures the entire sequence participates in the computation, so we are not making a conclusion about its periodicity after comparing a handful of terms at the beginning and the end. In other words, we are not going to detect periodicity if the period is more than half of the observed time period.

Having obtained the correlation coefficients, pick one with the largest absolute value; call it R. How large does R have to be in order for us to conclude the correlation is not a fluke? The answer depends on the distribution of our data, but an experiment can be used to get some idea of the likelihood of a large R.
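A direct, if unoptimized, implementation of this definition might look like the following sketch (the function name is mine, and np.corrcoef is used for each lag):

import numpy as np

def max_autocorrelation(x):
    # Correlation coefficient of the sequence with its shift, for lags 1, ..., len(x)//2;
    # return the coefficient of largest absolute value
    L = len(x)
    corrs = np.array([np.corrcoef(x[:L-lag], x[lag:])[0, 1] for lag in range(1, L//2 + 1)])
    return corrs[np.abs(corrs).argmax()]

x = np.random.normal(size=100)
print(max_autocorrelation(x))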

I picked {x_j} independently from the standard normal distribution, and computed {R} as above. After 5 million trials with a sequence of length 100, the distribution of R was as follows:

[Figure: extremal correlation coefficient in a sequence of length 100]

Based on this experiment, the probability of obtaining |R| greater than 0.5 is less than 0.0016. So, 0.5 is pretty solid evidence. The probability of {|R| > 0.6} is two orders of magnitude less, etc. Also, |R| is unlikely to be very close to zero unless the data is structured in some strange way. Some kind of correlation ought to be present in the white noise.

Aside: it’s not easy to construct perfectly non-autocorrelated sequences for the above test. For length 5 an example is 1,2,3,2,3. Indeed, (1,2,3,2) is uncorrelated with (2,3,2,3) and (1,2,3) is uncorrelated with (3,2,3). For length 6 and more I can’t construct these without filling them with a bunch of zeros.

Repeating the experiment with sequences of length 1000 shows a tighter distribution of R: now |R| is unlikely to be above 0.2. So, if a universal threshold is to be used here, we need to adjust R based on sequence length.

[Figure: extremal correlation coefficient in a sequence of length 1000]

I did not look hard for statistical studies of this subject, resorting to an experiment. Experimentally obtained p-values are pretty consistent for the criterion {L^{0.45}|R| > 4}. The number of trials was not very large (10000) so there is some fluctuation, but the pattern is clear.

Length L    P(L^{0.45}|R| > 4)
100 0.002
300 0.0028
500 0.0022
700 0.0028
900 0.0034
1100 0.0036
1300 0.0039
1500 0.003
1700 0.003
1900 0.0042
2100 0.003
2300 0.0036
2500 0.0042
2700 0.0032
2900 0.0043
3100 0.0042
3300 0.0025
3500 0.0031
3700 0.0027
3900 0.0042

Naturally, all this depends on the assumption of independent normal variables.

And this is the approach I took to computing r in Python:

import numpy as np
n = 1000
x = np.random.normal(size=(n,))
# Raw autocorrelations; with mode='same' the zero-lag term is at index n//2
acorr = np.correlate(x, x, mode='same')
# Keep lags 1 through n//2 - 1, normalizing by the overlap length and the variance
acorr = acorr[n//2+1:]/(x.var()*np.arange(n-1, n//2, -1))
r = acorr[np.abs(acorr).argmax()]

Recursive randomness of reals: summing a random decreasing sequence

In a comment to Recursive randomness of integers Rahul pointed out a continuous version of the same problem: pick {x_0} uniformly in {[0,1]}, then {x_1} uniformly in {[0,x_0]}, then {x_2} uniformly in {[0, x_1]}, etc. What is the distribution of the sum {S=\sum_{n=0}^\infty x_n}?

The continuous version turns out to be easier to analyse. To begin with, it’s equivalent to picking uniformly distributed, independent {y_n\in [0,1]} and letting {x_n = y_0y_1\cdots y_n}. Then the sum is

{\displaystyle y_0+y_0y_1 + y_0y_1y_2 + \cdots }

which can be written as

{\displaystyle y_0(1+y_1(1+y_2(1+\cdots )))}

So, {S} is a stationary point of the random process {X\mapsto (X+1)U} where {U} is uniformly distributed in {[0,1]}. Simply put, {S} and {(S+1)U} have the same distribution. This yields the value of {E[S]} in a much simpler way than in the previous post:

{E[S]=E[(S+1)U] = (E[S] + 1) E[U] = (E[S] + 1)/2}

hence {E[S]=1}.

We also get an equation for the cumulative distribution function {C(t) = P[S\le t]}. Indeed,

{\displaystyle P[S\le t] = P[(S+1)U \le t] = P[S \le t/U-1]}

The latter probability is {\int_0^1 P[S\le t/u-1]\,du = \int_0^1 C(t/u-1)\,du}. Conclusion: {C(t) = \int_0^1 C(t/u-1)\,du}. Differentiate to get an equation for the probability density function {p(t)}, namely {p(t) = \int_0^1 p(t/u-1)\,du/u}. It’s convenient to change the variable of integration to {v = t/u-1}, which leads to

{\displaystyle p(t) = \int_{t-1}^\infty p(v)\,\frac{dv}{v+1}}

Another differentiation turns the integral equation into a delay differential equation,

{\displaystyle p'(t) = - \frac{p(t-1)}{t}}

Looks pretty simple, doesn’t it? Since the density is zero for negative arguments, it is constant on {[0,1]}. This constant, which I’ll denote {\gamma}, is {\int_{0}^\infty p(v)\,\frac{dv}{v+1}}, or simply {E[1/(S+1)]}. I couldn’t get an analytic formula for {\gamma}. My attempt was {E[1/(S+1)] = \sum_{n=0}^\infty (-1)^n M_n} where {M_n=E[S^n]} are the moments of {S}. The moments can be computed recursively using {E[S^n] = E[(S+1)^n]E[U^n]}, which yields

{\displaystyle M_n=\frac{1}{n} \sum_{k=0}^{n-1} \binom{n}{k}M_k}

The first few moments, starting with {M_0}, are 1, 1, 3/2, 17/6, 19/3, 81/5, 8351/180, … Unfortunately the series {\sum_{n=0}^\infty (-1)^n M_n} diverges, so this approach seems doomed. Numerically {\gamma \approx 0.5614}, which is not far from the Euler-Mascheroni constant, hence the choice of notation.
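The recursion is easy to run exactly (a small sketch using Python's fractions module):

from fractions import Fraction
from math import comb

# M_n = (1/n) * sum_{k<n} C(n, k) * M_k, starting from M_0 = 1
M = [Fraction(1)]
for n in range(1, 7):
    M.append(sum(comb(n, k)*M[k] for k in range(n)) / Fraction(n))
print([str(m) for m in M])   # ['1', '1', '3/2', '17/6', '19/3', '81/5', '8351/180']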

On the interval (1,2) we have {p'(t) = -\gamma/t}, hence

{p(t) = \gamma(1-\log t)} for {1 \le t \le 2}.

The DDE gets harder to integrate after that… on the interval {[2,3]} the solution already involves the dilogarithm (Spence’s function):

{\displaystyle p(t) = \gamma(1+\pi^2/12 - \log t + \log(t-1)\log t + \mathrm{Spence}\,(t))}

following SciPy’s convention for Spence. This is as far as I went… but here is an experimental confirmation of the formulas obtained so far.

[Figure: histogram of 10 million samples and the pdf (red) plotted on [0,3]]

To generate a sample from distribution S, I begin with a bunch of zeros and repeat “add 1, multiply by U[0,1]” many times. That’s it.

import numpy as np
import matplotlib.pyplot as plt
trials = 10000000
terms = 10000
x = np.zeros(shape=(trials,))
for _ in range(terms):
    # One step of the iteration: add 1, then multiply by a uniform random number
    np.multiply(x+1, np.random.uniform(size=(trials,)), out=x)
_ = plt.hist(x, bins=1000, density=True)
plt.show()

I still want to know the exact value of {\gamma}… after all, it’s also the probability that the sum of our random decreasing sequence is less than 1.

Update

The constant I called “{\gamma}” is in fact {\exp(-\gamma)} where {\gamma} is indeed Euler’s constant… This is what I learned from the Inverse Symbolic Calculator after solving the DDE (with initial value 1) numerically, and calculating the integral of the solution. From there, it did not take long to find that

Oh well. At least I practiced solving delay differential equations in Python. There is no built-in method in SciPy for that, and although there are some modules for DDE out there, I decided to roll my own. The logic is straightforward: solve the ODE on an interval of length 1, then build an interpolating spline out of the numeric solution and use it as the right hand side in the ODE, repeat. I used Romberg’s method for integrating the solution; the integration is done separately on each interval [k, k+1] because of the lack of smoothness at the integers.

import numpy as np
from scipy.integrate import odeint, romb
from scipy.interpolate import interp1d

numpoints = 2**12 + 1
solution = [lambda x: 1]     # the solution is the constant 1 on [0, 1]
integrals = [1]              # its integral over [0, 1]
for k in range(1, 15):
    # Solve the ODE on [k, k+1], using the spline built on the previous interval
    # as the delayed term p(x-1)
    y0 = solution[k-1](k)
    t = np.linspace(k, k+1, numpoints)
    rhs = lambda y, x: -solution[k-1](np.clip(x-1, k-1, k))/x
    y = odeint(rhs, y0, t, atol=1e-15, rtol=1e-13).squeeze()
    solution.append(interp1d(t, y, kind='cubic', assume_sorted=True))
    integrals.append(romb(y, dx=1/(numpoints-1)))
total_integral = sum(integrals)
print("{:.15f}".format(1/total_integral))   # gamma is approximately 1 / (integral of the solution)

As a byproduct, the program found the probabilities of the random sum being in each integer interval:

  • 56.15% in [0,1]
  • 34.46% in [1,2]
  • 8.19% in [2,3]
  • 1.1% in [3,4]
  • 0.1% in [4,5]
  • less than 0.01% chance of being greater than 5