## Experiments with the significance of autocorrelation

Given a sequence of numbers ${x_j}$ of length ${L}$ one may want to look for evidence of its periodic behavior. One way to do this is by computing autocorrelation, the correlation of the sequence with a shift of itself. Here is one reasonable way to do so: for lag values ${\ell=1,\dots, \lfloor L/2 \rfloor}$ compute the correlation coefficient of ${(x_1,\dots, x_{L-\ell}}$ with ${(x_{\ell+1},\dots, x_L)}$. That the lag does not exceed ${L/2}$ ensures the entire sequence participates in the computation, so we are not making a conclusion about its periodicity after comparing a handful of terms at the beginning and the end. In other words, we are not going to detect periodicity if the period is more than half of the observed time period.

Having obtained the correlation coefficients, pick one with the largest absolute value; call it R. How large does R have to be in order for us to conclude the correlation is not a fluke? The answer depends on the distribution of our data, but an experiment can be used to get some idea of likelihood of large R.

I picked ${x_j}$ independently from the standard normal distribution, and computed ${r}$ as above. After 5 million trials with a sequence of length 100, the distribution of R was as follows:

Based on this experiment, the probability of obtaining |R| greater than 0.5 is less than 0.0016. So, 0.5 is pretty solid evidence. The probability of ${|R| > 0.6}$ is two orders of magnitude less, etc. Also, |R| is unlikely to be very close to zero unless the data is structured in some strange way. Some kind of correlation ought to be present in the white noise.

Aside: it’s not easy to construct perfectly non-autocorrelated sequences for the above test. For length 5 an example is 1,2,3,2,3. Indeed, (1,2,3,2) is uncorrelated with (2,3,2,3) and (1,2,3) is uncorrelated with (3,2,3). For length 6 and more I can’t construct these without filling them with a bunch of zeros.

Repeating the experiment with sequences of length 1000 shows a tighter distribution of R: now |R| is unlikely to be above 0.2. So, if a universal threshold is to be used here, we need to adjust R based on sequence length.

I did not look hard for statistical studies of this subject, resorting to an experiment. Experimentally obtained p-values are pretty consistent for the criterion ${L^{0.45}|R| > 4}$. The number of trials was not very large (10000) so there is some fluctuation, but the pattern is clear.

Length, L P(L0.45|R| > 4)
100 0.002
300 0.0028
500 0.0022
700 0.0028
900 0.0034
1100 0.0036
1300 0.0039
1500 0.003
1700 0.003
1900 0.0042
2100 0.003
2300 0.0036
2500 0.0042
2700 0.0032
2900 0.0043
3100 0.0042
3300 0.0025
3500 0.0031
3700 0.0027
3900 0.0042

Naturally, all this depends on the assumption of independent normal variables.

And this is the approach I took to computing r in Python:

import numpy as np
n = 1000
x = np.random.normal(size=(n,))
acorr = np.correlate(x, x, mode='same')
acorr = acorr[n//2+1:]/(x.var()*np.arange(n-1, n//2, -1))
r = acorr[np.abs(acorr).argmax()]

## Recursive randomness of reals: summing a random decreasing sequence

In a comment to Recursive randomness of integers Rahul pointed out a continuous version of the same problem: pick ${x_0}$ uniformly in ${[0,1]}$, then ${x_1}$ uniformly in ${[0,x_0]}$, then ${x_2}$ uniformly in ${[0, x_1]}$, etc. What is the distribution of the sum ${S=\sum_{n=0}^\infty x_n}$?

The continuous version turns out to be easier to analyse. To begin with, it’s equivalent to picking uniformly distributed, independent ${y_n\in [0,1]}$ and letting ${x_n = y_0y_1\cdots y_n}$. Then the sum is

${\displaystyle y_0+y_0y_1 + y_0y_1y_2 + \cdots }$

which can be written as

${\displaystyle y_0(1+y_1(1+y_2(1+\cdots )))}$

So, ${S}$ is a stationary point of the random process ${X\mapsto (X+1)U}$ where ${U}$ is uniformly distributed in ${[0,1]}$. Simply put, ${S}$ and ${(S+1)U}$ have the same distribution. This yields the value of ${E[S]}$ in a much simpler way than in the previous post:

${E[S]=E[(S+1)U] = (E[S] + 1) E[U] = (E[S] + 1)/2}$

hence ${E[S]=1}$.

We also get an equation for the cumulative distribution function ${C(t) = P[S\le t]}$. Indeed,

${\displaystyle P[S\le t] = P[(S+1)U \le t] = P[S \le t/U-1]}$

The latter probability is ${\int_0^1 P[S\le t/u-1]\,du = \int_0^1 C(t/u-1)\,du}$. Conclusion: ${C(t) = \int_0^1 C(t/u-1)\,du}$. Differentiate to get an equation for the probability density function ${p(t)}$, namely ${p(t) = \int_0^1 p(t/u-1)\,du/u}$. It’s convenient to change the variable of integration to ${v = t/u-1}$, which leads to

${\displaystyle p(t) = \int_{t-1}^\infty p(v)\,\frac{dv}{v+1}}$

Another differentiation turns the integral equation into a delay differential equation,

${\displaystyle p'(t) = - \frac{p(t-1)}{t}}$

Looks pretty simple, doesn’t it? Since the density is zero for negative arguments, it is constant on ${[0,1]}$. This constant, which I’ll denote ${\gamma}$, is ${\int_{0}^\infty p(v)\,\frac{dv}{v+1}}$, or simply ${E[1/(S+1)]}$. I couldn’t get an analytic formula for ${\gamma}$. My attempt was ${E[1/(S+1)] = \sum_{n=0}^\infty (-1)^n M_n}$ where ${M_n=E[S^n]}$ are the moments of ${S}$. The moments can be computed recursively using ${E[S^n] = E[(S+1)^n]E[U^n]}$, which yields

${\displaystyle M_n=\frac{1}{n} \sum_{k=0}^{n-1} \binom{n}{k}M_k}$

The first few moments, starting with ${M_0}$, are 1, 1, 3/2, 17/6, 19/3, 81/5, 8351/80… Unfortunately the series ${\sum_{n=0}^\infty (-1)^n M_n}$ diverges, so this approach seems doomed. Numerically ${\gamma \approx 0.5614}$ which is not far from the Euler-Mascheroni constant, hence the choice of notation.

On the interval (1,2) we have ${p'(t) = -\gamma/t}$, hence

${p(t) = \gamma(1-\log t)}$ for ${1 \le t \le 2}$.

The DDE gets harder to integrate after that… on the interval ${[2,3]}$ the solution already involves the dilogarithm (Spence’s function):

${\displaystyle p(t) = \gamma(1+\pi^2/12 - \log t + \log(t-1)\log t + \mathrm{Spence}\,(t))}$

following SciPy’s convention for Spence. This is as far as I went… but here is an experimental confirmation of the formulas obtained so far (link to full size image).

To generate a sample from distribution S, I begin with a bunch of zeros and repeat “add 1, multiply by U[0,1]” many times. That’s it.

import numpy as np
import matplotlib.pyplot as plt
trials = 10000000
terms = 10000
x = np.zeros(shape=(trials,))
for _ in range(terms):
np.multiply(x+1, np.random.uniform(size=(trials,)), out=x)
_ = plt.hist(x, bins=1000, normed=True)
plt.show()

I still want to know the exact value of ${\gamma}$… after all, it’s also the probability that the sum of our random decreasing sequence is less than 1.

### Update

The constant I called “${\gamma}$” is in fact ${\exp(-\gamma)}$ where ${\gamma}$ is indeed Euler’s constant… This is what I learned from the Inverse Symbolic Calculator after solving the DDE (with initial value 1) numerically, and calculating the integral of the solution. From there, it did not take long to find that

Oh well. At least I practiced solving delay differential equations in Python. There is no built-in method in SciPy for that, and although there are some modules for DDE out there, I decided to roll my own. The logic is straightforward: solve the ODE on an interval of length 1, then build an interpolating spline out of the numeric solution and use it as the right hand side in the ODE, repeat. I used Romberg’s method for integrating the solution; the integration is done separately on each interval [k, k+1] because of the lack of smoothness at the integers.

import numpy as np
from scipy.integrate import odeint, romb
from scipy.interpolate import interp1d
numpoints = 2**12 + 1
solution = [lambda x: 1]
integrals = [1]
for k in range(1, 15):
y0 = solution[k-1](k)
t = np.linspace(k, k+1, numpoints)
rhs = lambda y, x: -solution[k-1](np.clip(x-1, k-1, k))/x
y = odeint(rhs, y0, t, atol=1e-15, rtol=1e-13).squeeze()
solution.append(interp1d(t, y, kind='cubic', assume_sorted=True))
integrals.append(romb(y, dx=1/(numpoints-1)))
total_integral = sum(integrals)
print("{:.15f}".format(1/total_integral))

As a byproduct, the program found the probabilities of the random sum being in each integer interval:

• 56.15% in [0,1]
• 34.46% in [1,2]
• 8.19% in [2,3]
• 1.1% in [3,4]
• 0.1% in [4,5]
• less than 0.01% chance of being greater than 5

## The Kolakoski-Cantor set

A 0-1 sequence can be interpreted as a point in the interval [0,1]. But this makes the long-term behavior of the sequence practically invisible due to limited resolution of our screens (and eyes). To make it visible, we can also plot the points obtained by shifting the binary sequence to the left (Bernoulli shift, which also goes by many other names). The resulting orbit  is often dense in the interval, which doesn’t really help us visualize any patterns. But sometimes we get an interesting complex structure.

The vertical axis here is the time parameter, the number of dyadic shifts. The 0-1 sequence being visualized is the Kolakoski sequence in its binary form, with 0 and 1 instead of 1 and 2. By definition, the n-th run of equal digits in this sequence has length ${x_n+1}$. In particular, 000 and 111 never occur, which contributes to the blank spots near 0 and 1.

Although the sequence is not periodic, the set is quite stable in time; it does not make a visible difference whether one plots the first 10,000 shifts, or 10,000,000. The apparent symmetry about 1/2 is related to the open problem of whether the Kolakoski sequence is mirror invariant, meaning that together with any finite word (such as 0010) it also contains its complement (that would be 1101).

There are infinitely many forbidden words apart from 000 and 111 (and the words containing those). For example, 01010 cannot occur because it has 3 consecutive runs of length 1, which implies having 000 elsewhere in the sequence. For the same reason, 001100 is forbidden. This goes on forever: 00100100 is forbidden because it implies having 10101, etc.

The number of distinct words of length n in the Kolakoski sequence is bounded by a power of n (see F. M. Dekking, What is the long range order in the Kolakoski sequence?). Hence, the set pictured above is covered by ${O(n^p)}$ intervals of length ${2^{-n}}$, which implies it (and even its closure) is zero-dimensional in any fractal sense (has Minkowski dimension 0).

The set KC apparently does not have any isolated points; this is also an open problem, of recurrence (whether every word that appears in the sequence has to appear infinitely many times). Assuming this is so, the closure of the orbit is a totally disconnected compact set without isolated points, i.e., a Cantor set. It is not self-similar (not surprising, given it’s zero-dimensional), but its relation to the Bernoulli shift implies a structure resembling self-similarity:

Applying the transformations ${x\mapsto x/2}$ and ${x\mapsto (1+x)/2}$ yields two disjoint smaller copies that cover the original set, but with some spare parts left. The leftover bits exist because not every word in the sequence can be preceded by both 0 and 1.

Applying the transformations ${x\mapsto 2x}$ and ${x\mapsto 2x-1}$ yields two larger copies that cover the original set. There are no extra parts within the interval [0,1] but there is an overlap between the two copies.

The number ${c = \inf KC\approx 0.146778684766479}$ appears several times in the structure of the set: for instance, the central gap is ${((1-c)/2, (1+c)/2)}$, the second-largest gap on the left has the left endpoint ${(1-c)/4}$, etc. The Inverse Symbolic Calculator has not found anything about this number. Its binary expansion begins with 0.001 001 011 001 001 101 001 001 101 100… which one can recognize as the smallest binary number that can be written without doing anything three times in a row. (Can’t have 000; also can’t have 001 three times in a row; and 001 010 is not allowed because it contains 01010, three runs of length 1. Hence, the number begins with 001 001 011.) This number is obviously irrational, but other than that…

In conclusion, the Python code used to plot KC.

import numpy as np
import matplotlib.pyplot as plt
n = 1000000
a = np.zeros(n, dtype=int)
j = 0
same = False
for i in range(1, n):
if same:
a[i] = a[i-1]
same = False
else:
a[i] = 1 - a[i-1]
j += 1
same = bool(a[j])
v = np.array([1/2**k for k in range(60, 0, -1)])
b = np.convolve(a, v, mode='valid')
plt.plot(b, np.arange(np.size(b)), '.', ms=2)
plt.show()

## Pisot constant beyond 0.843

In a 1946 paper Charles Pisot proved a theorem involving a curious constant ${\gamma_0= 0.843\dots}$. It can be defined as follows:

${\gamma_0= \sup\{r \colon \exists }$ monic polynomial ${p}$ such that ${|p(e^z)| \le 1}$ whenever ${|z|\le r \}}$

Equivalently, ${\gamma_0}$ is determined by the requirement that the set ${\{e^z\colon |z|\le \gamma_0\}}$ have logarithmic capacity 1; this won’t be used here. The theorem is stated below, although this post is really about the constant.

Theorem: If an entire function takes integer values at nonnegative integers and is ${O(e^{\gamma |z|})}$ for some ${\gamma < \gamma_0}$, then it is a finite linear combination of terms of the form ${z^n \alpha^z}$, where each ${\alpha }$ is an algebraic integer.

The value of ${\gamma_0}$ is best possible; thus, in some sense Pisot’s theorem completed a line of investigation that began with a 1915 theorem by Pólya which had ${\log 2}$ in place of ${\gamma_0}$, and where the conclusion was that ${f}$ is a polynomial. (Informally speaking, Pólya proved that ${2^z}$ is the “smallest” entire-function that is integer-valued on nonnegative integers.)

Although the constant ${\gamma_0}$ was mentioned in later literature (here, here, and here), no further digits of it have been stated anywhere, as far as I know. So, let it be known that the decimal expansion of ${\gamma_0}$ begins with 0.84383.

A lower bound on ${\gamma_0}$ can be obtained by constructing a monic polynomial that is bounded by 1 on the set ${E(r) = \{e^z \colon |z|\le r \}}$. Here is E(0.843):

It looks pretty round, except for that flat part on the left. In fact, E(0.82) is covered by a disk of unit radius centered at 1.3, which means that the choice ${p(z) = z-1.3}$ shows ${\gamma_0 > 0.82}$.

How to get an upper bound on ${\gamma_0}$? Turns out, it suffices to exhibit a monic polynomial ${q}$ that has all zeros in ${E(r)}$ and satisfies ${|q|>1}$ on the boundary of ${E(r)}$. The existence of such ${q}$ shows ${\gamma_0 < r}$. Indeed, suppose that ${p}$ is monic and ${|p|\le 1}$ on ${E(r)}$. Consider the function ${\displaystyle u(z) = \frac{\log|p(z)|}{\deg p} - \frac{\log|q(z)|}{\deg q}}$. By construction ${u<0}$ on the boundary of ${E(r)}$. Also, ${u}$ is subharmonic in its complement, including ${\infty}$, where the singularities of both logarithms cancel out, leaving ${u(\infty)=0}$. This contradicts the maximum principle for subharmonic functions, according to which ${u(\infty)}$ cannot exceed the maximum of ${u}$ on the boundary.

The choice of ${q(z) = z-1.42}$ works for ${r=0.89}$.

So we have ${\gamma_0}$ boxed between 0.82 and 0.89; how to get more precise bounds? I don’t know how Pisot achieved the precision of 0.843… it’s possible that he strategically picked some linear and quadratic factors, raised them to variable integer powers and optimized the latter. Today it is too tempting to throw some optimization routine on the problem and let it run for a while.

But what to optimize? The straightforward approach is to minimize the maximum of ${|p(e^z)|}$ on the circle ${|z|=r}$, approximated by sampling the function at a sufficiently fine uniform grid ${\{z_k\}}$ and picking the maximal value. This works… unspectacularly. One problem is that the objective function is non-differentiable. Another is that taking maximum throws out a lot of information: we are not using the values at other sample points to better direct the search. After running optimization for days, trying different optimization methods, tolerance options, degrees of the polynomial, and starting values, I was not happy with the results…

Turns out, the optimization is much more effective if one minimizes the variance of the set ${\{|p(\exp(z_k))|^2\}}$. Now we are minimizing a polynomial function of ${p(\exp(z_k)}$, which pushes them toward having the same absolute value — the behavior that we want the polynomial to have. It took from seconds to minutes to produce the polynomials shown below, using BFGS method as implemented in SciPy.

As the arguments for optimization function I took the real and imaginary parts of the zeros of the polynomial. The symmetry about the real axis was enforced automatically: the polynomial was the product of quadratic terms ${(z-x_k-iy_k) (z-x_k+iy_k)}$. This eliminated the potentially useful option of having real zeros of odd order, but I did not feel like special-casing those.

### Three digits

Real part: 0.916, 1.186, 1.54, 1.783
Imaginary part: 0.399, 0.572, 0.502, 0.199

Here and below, only the zeros with positive imaginary part are listed (in the left-to-right order), the others being their conjugates.

Real part: 0.878, 1.0673, 1.3626, 1.6514, 1.8277
Imaginary part: 0.3661, 0.5602, 0.6005, 0.4584, 0.171

### Four digits

Real part: 0.8398, 0.9358, 1.1231, 1.357, 1.5899, 1.776, 1.8788
Imaginary part: 0.3135, 0.4999 ,0.6163, 0.637, 0.553, 0.3751, 0.1326

Real part: 0.8397, 0.9358, 1.1231, 1.3571, 1.5901, 1.7762, 1.879
Imaginary part: 0.3136, 0.5, 0.6164, 0.6372, 0.5531, 0.3751, 0.1326

No, I didn’t post the same picture twice. The polynomials are just that similar. But as the list of zeros shows, there are tiny differences…

### Five digits

Real part: 0.81527, 0.8553, 0.96028, 1.1082, 1.28274, 1.46689, 1.63723, 1.76302, 1.82066, 1.86273
Imaginary part: 0.2686, 0.42952, 0.556, 0.63835, 0.66857, 0.63906, 0.54572, 0.39701, 0.23637, 0.08842

Real part: 0.81798, 0.85803, 0.95788, 1.09239, 1.25897, 1.44255, 1.61962, 1.76883, 1.86547, 1.89069
Imaginary part: 0.26631, 0.4234, 0.54324, 0.62676, 0.66903, 0.65366, 0.57719, 0.44358, 0.26486, 0.07896

Again, nearly the same polynomial works for upper and lower bounds. The fact that the absolute value of each of these polynomials is below 1 (for lower bounds) or greater than 1 (for upper bounds) can be ascertained by sampling them and using an upper estimate on the derivative; there is enough margin to trust computations with double precision.

Finally, the Python script I used. The function “obj” is getting minimized while function “values” returns the actual values of interest: the minimum and maximum of polynomial. The degree of polynomial is 2n, and the radius under consideration is r. The sample points are collected in array s. To begin with, the roots are chosen randomly. After minimization runs (inevitably, ending in a local minimum of which there are myriads), the new starting point is obtained by randomly perturbing the local minimum found. (The perturbation is smaller if minimization was particularly successful.)

import numpy as np
from scipy.optimize import minimize

def obj(r):
rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
p = np.prod(np.abs(s-rc)**2, axis=0)
return np.var(p)

def values(r):
rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
p = np.prod(np.abs(s-rc), axis=0)
return [np.min(p), np.max(p)]

r = 0.84384
n = 10
record = 2
s = np.exp(r * np.exp(1j*np.arange(0, np.pi, 0.01)))
xr = np.random.uniform(0.8, 1.8, size=(n,))
xi = np.random.uniform(0, 0.7, size=(n,))
x0 = np.concatenate((xr, xi))

while True:
res = minimize(obj, x0, method = 'BFGS')
if res['fun'] < record:
record = res['fun']
print(repr(res['x']))
print(values(res['x']))
x0 = res['x'] + np.random.uniform(-0.001, 0.001, size=x0.shape)
else:
x0 = res['x'] + np.random.uniform(-0.05, 0.05, size=x0.shape)

## Laguerre polynomials under 1

Laguerre polynomials have many neat definitions; I am partial to ${\displaystyle L_n(x) = \left(\frac{d}{dx} - 1\right)^n \frac{x^n}{n!}}$ because it’s so easy to remember:

1. Begin with ${x^n}$
2. Repeat “subtract the derivative” ${n}$ times
3. Normalize so the constant term is 1.

For example, for n=3 this process goes as ${x^3}$ to ${x^3-3x^2}$ to ${x^3 -6x^2 + 6x}$ to ${x^3-9x^2+18x -6}$, which normalizes to ${-\frac16x^3+\frac{3}{2}x^2 -3x +1}$. This would make a mean exercise on differentiating polynomials: every mistake snowballs throughout the computation.

What would happen if we added the derivative instead? Nothing really new: this change is equivalent to reversing the direction of the x-axis, so we’d end up with ${L_n(-x)}$. Incidentally, this shows that the polynomial ${L_n(-x)}$ has positive coefficients, which means the behavior of ${L_n}$ for negative ${x}$ is boring: the values go up as ${x}$ becomes more negative. Laguerre polynomials are all about the interval ${[0,\infty)}$ on which they are orthogonal with respect to the weight ${\exp(-x)}$ and therefore change sign often.

But when I look at the plot shown above, it’s not the zeros that draw my attention (perhaps because the x-axis is not shown) but the horizontal line ${y=1}$, the zero-degree polynomial. The coefficients of ${L_n}$ have alternating signs; in particular, ${L_n(0)=1}$ and ${L_n'(0)=-n}$. So, nonconstant Laguerre polynomials start off with the value of 1 and immediately dive below it. All except the linear one, ${L_1(x)=1-x}$, eventually recover and reach 1 again (or so it seems; I don’t have a proof).

The yellow curve that is the first to cross the blue line is the 5th degree Laguerre polynomial. Let’s see if any of the other polynomials rises about 1 sooner…

Still, nobody beats ${L_5}$ (and the second place is held by ${L_4}$). By the way, the visible expansion of oscillations is approximately exponential; multiplying the polynomials by ${\exp(-x/2)}$ turns the envelopes into horizontal lines:

Back to the crossing of y=1 line. The quantity to study is the smallest positive root of ${L_n - 1}$, denoted  ${r(n)}$ from now on. (It is the second smallest root overall; as discussed above, this polynomial has a root at x=0 and no negative roots.) For n=2,3,4,5,6, the value of ${r(n)}$ is  ${4, 3, 6-2\sqrt{3}, (15-\sqrt{105})/2, 6}$ which evaluates to 4, 3, 2.536…, 2.377…, and 6 respectively. I got these with Python / SymPy:

from sympy import *
x = Symbol('x')
[Poly(laguerre(n, x) - 1).all_roots()[1] for n in range(2, 7)]

For higher degrees we need numerics. SymPy can still help (applying .evalf() to the roots), but the process gets slow. Switching to NumPy’s roots method speeds things up, but when it indicated than ${r(88)}$  and a few others are in double digits, I became suspicious…  a closer check showed this was a numerical artifact.

Conjecture: ${r(5) \le r(n) \le r(6)}$ for all ${n}$. Moreover, ${3 < r(n) < 6}$ when ${n \ge 7}$.

Here is a closer look at the exceptional polynomials of degrees 3, 4, 5 and 6, with 1 subtracted from each:

The first local maximum of ${L_n}$ shifts down and to the left as the degree n increases. The degree n=5 is the last for which ${L_n}$ exceeds 1 on the first attempt, so it becomes the quickest to do so. On the other hand, n=6 fails on its first attempt to clear the bar, and its second attempt is later than for any subsequent Laguerre polynomial; so it sets the record for maximal ${r(n)}$.

Evaluating high-degree Laguerre polynomials is a numerical challenge: adding large terms of alternating signs can reduce accuracy dramatically. Here is a plot of the degree 98 polynomial (minus 1): where is its first positive root?

Fortunately, SymPy can evaluate Laguerre polynomials at rational points using exact arithmetics since the coefficients are rational. For example, when it evaluates the expression laguerre(98, 5) > 1 to True, that’s a (computer-assisted) proof that ${r(98) < 5}$, which one could in principle "confirm" by computing the same rational value of ${L_{98}(5) }$ by hand (of course, in this situation a human is far less trustworthy than a computer) . Evaluation at the 13 rational points 3, 3.25, 3.5, … , 5.75, 6 is enough to certify that ${r(n) < 6}$ for ${n}$ up to 200 (with the aforementioned exception of ${r(6) = 6}$).

The lower bounds call for Sturm’s theorem which is more computationally expensive than sampling at a few rational points. SymPy offers a root-counting routine based on this theorem (it counts the roots within the given closed interval):

for n in range(2, 101):
num_roots = count_roots((laguerre(n,x)-1)/x, 0, 3)
print('{} roots for n = {}'.format(num_roots, n))

Division by x eliminates the root at 0, so we are left with the root count on (0,3] — which is 1 for n=3,4 and 2 for n=5. The count is zero for all other degrees up to 100, confirming that ${r(n) > 3}$ for ${n \ge 6}$.

So, the conjecture looks solid. I don’t have a clue to its proof (nor do I know if it’s something known). The only upper bound on ${L_n}$ that I know is Szegő’s ${|L_n(x)|\le \exp(x/2)}$ for ${x\ge 0}$, which is not helping here.

## Complex Cantor sets

Every real number in the interval [0,1] can be written in binary as ${\sum_{k=1}^\infty c_k(1/2)^k}$ where each coefficient ${c_k}$ is either 0 or 1. Another way to put this: the set of all possible sums ${\sum_{k=1}^\infty c_kb^k}$ for b = 1/2 is a line segment.

What is this set for other values of “base” b, then? Let’s stick to |b| < 1 for now, so that the series converges. Nothing interesting happens for real b between 1/2 and 1; the segment grows longer, to length b/(1-b). When b is between 0 and 1, we get Cantor sets, with the classical middle-third set being the case b = 1/3.

There is no need to consider negative b, because of a symmetry between b and -b. Indeed, up to scaling and translation, the coefficients can be taken from {-1, 1} instead of {0, 1}. Then it’s obvious that changing the sign of b is the same as flipping half of coefficients the other way — does not change the set of possible sums.

Let’s look at purely imaginary b, then. Here is b = 0.6i

Why so rectangular? The real part is the sum of ${c_kb^k}$ over even k, and the imaginary part is the sum over odd k. Each of these yields a Cantor type set as long as ${|b|^2 < 1/2}$. Since the odd- and even-numbered coefficients are independent of each other, we get the product of two Cantor sets. Which changes into a rectangle when  ${|b| \ge \sqrt{1/2}}$:

(I didn’t think a full-size picture of a solid rectangle was necessary here.)

This is already interesting: the phase transition from dust to solid (connected, and even with interior) happens at different values in the real and imaginary directions: 1/2 versus ${\sqrt{1/2}}$. What will happen for other complex values? Using complex conjugation and the symmetry between b and -b, we reduce the problem to the quarter-disk in the first quadrant. Which still leaves a room for a lot of things to happen…

It’s clear that for |b| < 1/2 we get a totally disconnected set — it is covered by 2 copies of itself scaled by the factor of |b|, so its Hausdorff dimension is less than 1 when |b| is less than 1/2. Also, the argument of b is responsible for rotation of the scaled copies, and it looks like rotation favors disconnectivity… but then again, the pieces may link up again after being scaled-rotated a few times, so the story is not a simple one.

The set of bases b for which the complex Cantor set is connected is a Mandelbrot-like set introduced by Barnsley and Harrington in 1985. It has the symmetries of a rectangle, and features a prominent hole centered at 0 (discussed above). But it actually has infinitely many holes, with “exotic” holes being tiny islands of disconnectedness, surrounded by connected sets. This was proved in 2014 by Calegari, Koch, Walker, so I refer to Danny Calegari’s post for an explanation and more pictures (much better looking than mine).

Besides “disconnected to connected”, there is another phase transition: empty interior to nonempty interior. Hare and Sidorov proved that the complex Cantor set has nonempty interior when  ${|b| > 2^{-1/4}}$; their path to the proof involved a MathOverflow question The Minkowski sum of two curves which is of its own interest.

The pictures were made with a straightforward Python script, using expansions of length 20:

import matplotlib.pyplot as plt
import numpy as np
import itertools
n = 20
b = 0.6 + 0.3j
c = np.array(list(itertools.product([0, 1], repeat=n)))
w = np.array([b**k for k in range(n)]).reshape(1, -1)
z = np.sum(c*w, axis=1)
plt.plot(np.real(z), np.imag(z), '.', ms=4)
plt.axis('equal')
plt.show()

Since we are looking at partial sums anyway, it’s not necessary to limit ourselves to |b| being less than 1. Replacing b by 1/b only scales the picture, so the place to look for new kinds of pictures is the unit circle. Let’s try a 7th root of unity:

The set above looks sparse because many points overlap. Let’s change b to something non-algebraic:

What’s with the cusps along the perimeter?

## Vectorization of integration

A frequently asked question about numerical integration is: how to speed up the process of computing ${\int_a^b f(x,p)\,dx}$ for many values of parameter ${p}$? Running an explicit for over the values of ${p}$ seems slow… is there a way to vectorize this, performing integration on an array of functions at once (which in reality means, pushing the loop to a lower level language)?

Usually, the answer is some combination of “no” and “you are worrying about the wrong thing”. Integration routines are typically adaptive, meaning they pick evaluation points based on the function being integrated. Different values of the parameter will require different evaluation points to achieve the desired precision, so vectorization is out of question. This applies, for example, to quad method of SciPy integration module.

Let’s suppose we insist on vectorization and are willing to accept a non-adaptive method. What are our options, considering SciPy/NumPy? I will compare

The test case is ${\int_0^1 (p+1)x^p\,dx}$ with ${p=0,0.01,\dots,99.99}$. Theoretically all these are equal to ${1}$. But most of these functions are not analytic at ${0}$, and some aren’t even differentiable there, so their integration is not a piece of cake.

Keeping in mind that Romberg’s integration requires ${2^k}$ subintervals, let’s use ${1024}$ equal subintervals, hence ${1025}$ equally spaced sample points. Here is the vectorized evaluation of all our functions at these points, resulting in a 2D array “values”:

import numpy as np
import scipy.integrate as si
x = np.linspace(0, 1, 1025).reshape(1, -1)
dx = x[0,1] - x[0,0]
p = np.arange(0, 100, 0.01).reshape(-1, 1)
values = (p+1)*x**p

This computation took 0.799 seconds on my AWS instance (t2.nano). Putting the results of evaluation together takes less time:

• Trapezoidal rule np.trapz(values, dx=dx) took 0.144 seconds and returned values ranging from 0.99955 to 1.00080.
• Simpson’s rule si.simps(values, dx=dx) took 0.113 seconds and returned values ranging from 0.99970 to 1.0000005.
• Romberg quadrature si.romb(values, dx=dx) was fastest at 0.0414 seconds, and also most accurate, with output between 0.99973 and 1.000000005.

Conclusions so far:

• SciPy’s implementation of Romberg quadrature is surprisingly fast, considering that this algorithm is the result of repeated Richardson extrapolation of the trapezoidal rule (and Simpson’s rule is just the result of the first extrapolation step). It’d be nice to backport whatever makes Romberg so effective back to Simpson’s rule.
• The underestimation errors 0.999… are greatest when ${p}$ is near zero, so the integrand is very nonsmooth at ${0}$. The lack of smoothness renders Richardson extrapolation ineffective, hence all three rules have about the same error here.
• The overestimation errors are greatest when ${p}$ is large. The function is pretty smooth then, so upgrading from trapezoidal to Simpson to Romberg brings substantial improvements.

All of this is nice, but the fact remains that non-vectorized adaptive integration is both faster and much more accurate. The following loop with quad, which uses adaptive Clenshaw-Curtis quadrature, runs in 0.616 seconds (less than it took to evaluate our functions on a grid) and returns values between 0.99999999995 and 1.00000000007. Better to use exponential notation here: ${1-5\cdot 10^{-11} }$ and ${1+7\cdot 10^{-11}}$.

funcs = [lambda x, p=p_: (p+1)*x**p for p_ in np.arange(0, 100, 0.01)]
result = [si.quad(fun, 0, 1)[0] for fun in funcs]

An adaptive algorithm shines when ${p}$ is small, by placing more evaluation points near zero. It controls both over- and under- estimates equally well, continuing to add points until the required precision is reached.

The last hope of non-adaptive methods is Gaussian quadrature, which is implemented in SciPy as fixed_quad (“fixed” referring to the number of evaluation points used). With 300 evaluation points, the integration takes 0.263 seconds (excluding the computation of nodes and weights, which can be cached) and the results are between ${1-2\cdot 10^{-12}}$ and ${1+2\cdot 10^{-7}}$. This is twice as fast as a loop with quad, and more accurate for large ${p}$ — but sadly, not so accurate for small ${p}$. As said before, ${x^p}$ with ${p}$ near zero is really a showcase for adaptive methods.