Continuity and diameters of connected sets

The definition of uniform continuity (if it’s done right) can be phrased as: {f\colon X\to Y} is uniformly continuous if there exists a function {\omega\colon (0,\infty)\to (0,\infty)}, with {\omega(0+)=0}, such that {\textrm{diam}\, f(E)\le \omega (\textrm{diam}\, E)} for every set {E\subset X}. Indeed, when {E} is a two-point set {\{a,b\}} this is the same as {|f(a)-f(b)|\le \omega(|a-b|)}, the modulus of continuity. Allowing general sets {E} does not change anything, since the diameter is determined by two-point subsets.

Does it make a difference if we ask for {\textrm{diam}\, f(E)\le \omega (\textrm{diam}\, E)} only for connected sets {E}? For functions defined on the real line, or on an interval of the line, there is no difference: we can just consider the intervals {[a,b]} and obtain

{|f(a)-f(b)|\le \textrm{diam}\, f([a,b]) \le \omega(|a-b|)}

as before.

However, the situation does change for maps defined on a non-convex domain. Consider the principal branch of square root, {f(z)=\sqrt{z}}, defined on the slit plane {G=\mathbb C\setminus (-\infty, 0]}.

sqrt
Conformal map of a slit domain is not uniformly continuous

This function is continuous on {G} but not uniformly continuous, since {f(-1 \pm i y) \to \pm i } as {y\to 0+}. Yet, it satisfies {\textrm{diam}\, f(E)\le \omega(\textrm{diam}\, E)} for connected subsets {E\subset G}, where one can take {\omega(\delta)=2\sqrt{\delta}}. I won’t do the estimates; let’s just note that although the points {-1 \pm i y} are close to each other, any connected subset of {G} containing both of them has diameter greater than 1.
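
Here is a quick numerical check of this (a sketch of mine; numpy’s complex square root uses the principal branch):

import numpy as np

# the images of -1 + iy and -1 - iy stay about distance 2 apart as y -> 0+
for y in [1e-1, 1e-3, 1e-5]:
    gap = abs(np.sqrt(-1 + 1j*y) - np.sqrt(-1 - 1j*y))
    print(y, gap)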

Capture3
These points are far apart with respect to the inner diameter metric

In a way, this is still uniform continuity, just with respect to a different metric. Given a metric space {(X,d)}, one can define the inner diameter metric {\rho} on {X} by letting {\rho(a,b)} be the infimum of diameters of connected sets that contain both {a} and {b}. This is indeed a metric if the space {X} is reasonable enough (i.e., any two points are contained in some bounded connected set). On a convex subset of {\mathbb R^n}, the inner diameter metric coincides with the Euclidean metric {d_2}.

One might think that the equality {\rho=d_2} should imply that the domain is convex, but this is not so. Indeed, consider the union of three quadrants of the plane, say {A = \{(x,y) \colon x > 0\text{ or }y > 0\}}. Any two points of {A} can be connected by going up from whichever point is lower and then moving horizontally; this path stays in {A} and traces the two legs of a right triangle. The diameter of this configuration equals the hypotenuse, which is the Euclidean distance between the points we started with.

Capture2
A non-convex domain where inner diameter metric is the same as Euclidean

The inner diameter metric comes up (often implicitly) in complex analysis. By the Riemann mapping theorem, every simply connected domain {G\subset \mathbb C}, other than {\mathbb C} itself, admits a conformal map {f\colon G\to \mathbb D} onto the unit disk {\mathbb D}. This map need not be uniformly continuous in the Euclidean metric (the slit plane is one example), but it is uniformly continuous with respect to the inner diameter metric on {G}.

Furthermore, by normalizing the situation in a natural way (say, {G \supset \mathbb D} and {f(0)=0}), one can obtain a uniform modulus of continuity for all conformal maps {f} onto the unit disk, whatever the domain is. This uniform modulus of continuity can be taken of the form {\omega(\delta) = C\sqrt{\delta}} for some universal constant {C}. Informally speaking, this means that a slit domain is the worst that can happen to the continuity of a conformal map. This fact isn’t often mentioned in complex analysis books. A proof can be found in the book Conformally Invariant Processes in the Plane by Gregory Lawler, Proposition 3.85. A more elementary proof, with a rougher estimate for the modulus of continuity, is on page 15 of lecture notes by Mario Bonk.

Green envelopes

Green functions of the Laplacian are bounded in one-dimensional domains, a reflection of the fact that square integrability of the derivative implies boundedness in one dimension but not in two or more. So, it’s possible to say something about the envelope {\sup_y g(x,y)} obtained by taking the supremum over the source location y.

In order to satisfy the distributional equation {-\frac{d^2}{dx^2}g(x,y) = \delta_{x=y}}, Green functions have a corner at {x=y}, where the derivative in x jumps down by 1. Combining this property with the boundary conditions identifies the function. Let’s use the interval (0,1) as the domain.

The Dirichlet boundary condition {g(0,y) = 0 = g(1,y)} leads to the formula {g(x,y)=\min(x,y)-xy}:

green10

The upper envelope is the parabola {x(1-x)}. This fact becomes more evident if we use more functions.
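
Here is a minimal plotting sketch along these lines (mine, not necessarily the code behind the figures):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 500)
for y in np.linspace(0.05, 0.95, 19):        # source locations
    plt.plot(x, np.minimum(x, y) - x*y, 'b', lw=0.5)
plt.plot(x, x*(1-x), 'r', lw=2)              # the envelope x(1-x)
plt.show()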

green200
Dirichlet envelope

Mixed Dirichlet-Neumann conditions {g(0,y) = 0 = g_x'(1,y)} yield an even simpler formula {g(x,y) = \min(x,y)}. The envelope is boring in this case:

greendn50
Dirichlet-Neumann envelope

The pure Neumann condition {g_x'(0,y) = 0 = g_x'(1,y)} requires an adjustment to the definition of Green function, since it is incompatible with {-\frac{d^2}{dx^2}g(x,y) = \delta_{x=y}}. Instead, let’s ask for {-\frac{d^2}{dx^2}g(x,y) = \delta_{x=y} - c} where the constant c is the reciprocal of the length of the interval. This ensures that the integral of the second derivative is zero. Also, it is more convenient to use the interval (-1,1) here. The Neumann Green function for this interval is {g(x,y) = (x^2+y^2)/4 - |x-y|/2}, up to an additive constant.

greenn50large
Neumann envelope

So the envelope is also determined up to a constant. On the picture above it is {x^2/2}.

Finally, let’s consider the periodic conditions {g(-1,y) = g(1,y)} and {g_x'(-1,y) = g_x'(1,y)}, again on the interval (-1, 1). As with the Neumann condition, the definition is adjusted so that the integral of the second derivative is zero. We get {g(x,y) = (1+x\wedge y)(1-x\vee y)/2 + (x^2+y^2)/4}, using the notation {x\wedge y = \min(x,y)} and {x\vee y = \max(x,y)} for brevity. Note that the first term, {(1+x\wedge y)(1-x\vee y)/2}, is the Dirichlet Green function for this interval. The additional quadratic term brings the integral of the second derivative to zero, which ensures the periodic condition for the first derivative. And the upper envelope is, up to a constant…

greenp50
Periodic envelope

a constant. Not exciting but makes perfect sense, considering that we are dealing with the Green function of a circle, where moving the source amounts to shifting the function.
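
As a quick numerical sanity check (a sketch, not from the original post), one can sample the periodic Green function on a grid and take the maximum over the source location; the result is 1/2 for every x:

import numpy as np

x = np.linspace(-1, 1, 401)
y = np.linspace(-1, 1, 2001)
X, Y = np.meshgrid(x, y, indexing='ij')
g = (1 + np.minimum(X, Y)) * (1 - np.maximum(X, Y)) / 2 + (X**2 + Y**2) / 4
envelope = g.max(axis=1)                 # maximum over the source location y
print(envelope.min(), envelope.max())    # both close to 1/2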

Recursive randomness of reals: summing a random decreasing sequence

In a comment to Recursive randomness of integers Rahul pointed out a continuous version of the same problem: pick {x_0} uniformly in {[0,1]}, then {x_1} uniformly in {[0,x_0]}, then {x_2} uniformly in {[0, x_1]}, etc. What is the distribution of the sum {S=\sum_{n=0}^\infty x_n}?

The continuous version turns out to be easier to analyse. To begin with, it’s equivalent to picking uniformly distributed, independent {y_n\in [0,1]} and letting {x_n = y_0y_1\cdots y_n}. Then the sum is

{\displaystyle y_0+y_0y_1 + y_0y_1y_2 + \cdots }

which can be written as

{\displaystyle y_0(1+y_1(1+y_2(1+\cdots )))}

So, {S} is a stationary point of the random process {X\mapsto (X+1)U} where {U} is uniformly distributed in {[0,1]}. Simply put, {S} and {(S+1)U} have the same distribution, where {U} is taken independent of {S}. This yields the value of {E[S]} in a much simpler way than in the previous post:

{E[S]=E[(S+1)U] = (E[S] + 1) E[U] = (E[S] + 1)/2}

hence {E[S]=1}.

We also get an equation for the cumulative distribution function {C(t) = P[S\le t]}. Indeed,

{\displaystyle P[S\le t] = P[(S+1)U \le t] = P[S \le t/U-1]}

The latter probability is {\int_0^1 P[S\le t/u-1]\,du = \int_0^1 C(t/u-1)\,du}. Conclusion: {C(t) = \int_0^1 C(t/u-1)\,du}. Differentiate to get an equation for the probability density function {p(t)}, namely {p(t) = \int_0^1 p(t/u-1)\,du/u}. It’s convenient to change the variable of integration to {v = t/u-1}, which leads to

{\displaystyle p(t) = \int_{t-1}^\infty p(v)\,\frac{dv}{v+1}}

Another differentiation turns the integral equation into a delay differential equation,

{\displaystyle p'(t) = - \frac{p(t-1)}{t}}

Looks pretty simple, doesn’t it? Since the density is zero for negative arguments, it is constant on {[0,1]}. This constant, which I’ll denote {\gamma}, is {\int_{0}^\infty p(v)\,\frac{dv}{v+1}}, or simply {E[1/(S+1)]}. I couldn’t get an analytic formula for {\gamma}. My attempt was {E[1/(S+1)] = \sum_{n=0}^\infty (-1)^n M_n} where {M_n=E[S^n]} are the moments of {S}. The moments can be computed recursively using {E[S^n] = E[(S+1)^n]E[U^n]}, which yields

{\displaystyle M_n=\frac{1}{n} \sum_{k=0}^{n-1} \binom{n}{k}M_k}

The first few moments, starting with {M_0}, are 1, 1, 3/2, 17/6, 19/3, 81/5, 8351/180… Unfortunately the series {\sum_{n=0}^\infty (-1)^n M_n} diverges, so this approach seems doomed. Numerically {\gamma \approx 0.5614}, which is not far from the Euler-Mascheroni constant, hence the choice of notation.
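
For the record, here is a short sketch of the moment recursion (not from the original post); it also shows how quickly the moments grow:

from math import comb

M = [1.0]                                   # M_0 = 1
for n in range(1, 12):
    M.append(sum(comb(n, k) * M[k] for k in range(n)) / n)
print(M[:7])   # 1, 1, 1.5, 2.833..., 6.333..., 16.2, 46.394...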

On the interval (1,2) we have {p'(t) = -\gamma/t}, hence

{p(t) = \gamma(1-\log t)} for {1 \le t \le 2}.

The DDE gets harder to integrate after that… on the interval {[2,3]} the solution already involves the dilogarithm (Spence’s function):

{\displaystyle p(t) = \gamma(1+\pi^2/12 - \log t + \log(t-1)\log t + \mathrm{Spence}\,(t))}

following SciPy’s convention for Spence. This is as far as I went… but here is an experimental confirmation of the formulas obtained so far.

match03
Histogram of 10 million samples and the pdf (red) plotted on [0,3]

To generate samples from the distribution of {S}, I begin with a bunch of zeros and repeat “add 1, multiply by U[0,1]” many times. That’s it.

import numpy as np
import matplotlib.pyplot as plt

trials = 10000000            # number of independent samples
terms = 10000                # number of "add 1, multiply by uniform" iterations
x = np.zeros(shape=(trials,))
for _ in range(terms):
    # one step of the iteration X -> (X+1)U, applied to all samples at once
    np.multiply(x+1, np.random.uniform(size=(trials,)), out=x)
_ = plt.hist(x, bins=1000, density=True)   # 'density' replaces the older 'normed' argument
plt.show()

I still want to know the exact value of {\gamma}… after all, it’s also the probability that the sum of our random decreasing sequence is less than 1.

Update

The constant I called “{\gamma}” is in fact {\exp(-\gamma)} where {\gamma} is indeed Euler’s constant… This is what I learned from the Inverse Symbolic Calculator after solving the DDE (with initial value 1) numerically, and calculating the integral of the solution. From there, it did not take long to find that the solution with initial value 1 is the Dickman function {\rho}, whose integral over {(0,\infty)} is {e^{\gamma}}; hence the density is {p(t) = e^{-\gamma}\rho(t)}.

Oh well. At least I practiced solving delay differential equations in Python. There is no built-in method in SciPy for that, and although there are some modules for DDE out there, I decided to roll my own. The logic is straightforward: solve the ODE on an interval of length 1, then build an interpolating spline out of the numeric solution and use it as the right hand side in the ODE, repeat. I used Romberg’s method for integrating the solution; the integration is done separately on each interval [k, k+1] because of the lack of smoothness at the integers.

import numpy as np
from scipy.integrate import odeint, romb
from scipy.interpolate import interp1d

numpoints = 2**12 + 1                   # 2**k + 1 points, as required by romb
solution = [lambda x: 1]                # the solution with initial value 1 on [0,1]
integrals = [1]                         # its integral over [0,1]
for k in range(1, 15):
    y0 = solution[k-1](k)               # continuity at the integer k
    t = np.linspace(k, k+1, numpoints)
    # the delayed term uses the spline from the previous interval;
    # np.clip guards against arguments slightly outside its domain
    rhs = lambda y, x: -solution[k-1](np.clip(x-1, k-1, k))/x
    y = odeint(rhs, y0, t, atol=1e-15, rtol=1e-13).squeeze()
    solution.append(interp1d(t, y, kind='cubic', assume_sorted=True))
    integrals.append(romb(y, dx=1/(numpoints-1)))
total_integral = sum(integrals)
print("{:.15f}".format(1/total_integral))   # prints exp(-gamma) = 0.5614594...

As a byproduct, the program found the probabilities of the random sum being in each integer interval:

  • 56.15% in [0,1]
  • 34.46% in [1,2]
  • 8.19% in [2,3]
  • 1.1% in [3,4]
  • 0.1% in [4,5]
  • less than 0.01% chance of being greater than 5

Recursive randomness of integers

Entering a string such as “random number 0 to 7” into Google search brings up a neat random number generator. For now, it supports only uniform probability distributions over integers. That’s still enough to play a little game.

Pick a positive number, such as 7. Then pick a number at random between 0 and 7 (integers, with equal probability); for example, 5. Then pick a number between 0 and 5, perhaps 2… repeat indefinitely. When we reach 0, the game becomes really boring, so that is a good place to stop. Ignoring the initial non-random number, we got a random non-increasing sequence such as 5, 2, 1, 1, 0. The sum of this one is 9… how are these sums distributed?

Let’s call the initial number A and the sum S. The simplest case is A=1, when S is the number of returns to 1 until the process hits 0. Since each return to 1 has probability 1/2, we get the following geometric distribution

n1t2m

Sum   Probability
0     1/2
1     1/4
2     1/8
3     1/16
k     {1/2^{k+1}}

When starting with A=2, things are already more complicated: for one thing, the probability mass function is no longer decreasing, with P[S=2] being greater than P[S=1]. The histogram shows the counts obtained after 2,000,000 trials with A=2.

n2t2m

The probability mass function is still not too hard to compute: let’s say b is the number of times the process arrives at 2, then the sum is 2b + the result with A=1. So we end up convolving two geometric distributions, one of which is supported on even integers: hence the bias toward even sums.

Sum   Probability
0     1/3
1     1/6
2     7/36
3     7/72
k     {((4/3)^{\lfloor k/2\rfloor+1}-1)/2^k}

For large k, the ratio P[S=k+2]/P[S=k] is asymptotic to (4/3)/4 = 1/3, which means that the tail of the distribution is approximately geometric with ratio {1/\sqrt{3}}.
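
A small simulation sketch (mine, not from the post) agrees with the table and the formula:

import numpy as np

rng = np.random.default_rng(0)
trials = 200000
counts = np.zeros(6, dtype=int)
for _ in range(trials):
    total, current = 0, 2
    while current > 0:
        current = rng.integers(0, current + 1)   # uniform on 0, ..., current
        total += current
    if total < len(counts):
        counts[total] += 1
print(counts / trials)                                       # ~ 1/3, 1/6, 7/36, 7/72, ...
print([((4/3)**(k//2 + 1) - 1) / 2**k for k in range(6)])    # exact values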

I did not feel like computing the exact distribution for larger A, so I resorted to simulations. Here is A=10 (ignore the little bump at the end, an artifact of truncation):

n10t2m

There are three distinct features: P[S=0] is much higher than the rest; the distribution is flat (with a bias toward even, which is diminishing) until about S=A, and after that it looks geometric. Let’s see what we can say for a general starting value A.

Perhaps surprisingly, the expected value E[S] is exactly A. To see this, consider that we are dealing with a Markov chain with states 0,1,…,A. The transition probability from n to each of 0,…,n is 1/(n+1). Ignoring the terminal state 0, which does not contribute to the sum, we get the following kind of transition matrix (the case A=4 shown):

{\displaystyle M = \begin{pmatrix}1/2 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 0 \\ 1/5 & 1/5 & 1/5 & 1/5\end{pmatrix} }

The initial state is a vector such as {v = (0,0,0,1)}. So {vM^j} is the state after j steps. The expected value contributed by the j-th step is {vM^jw} where {w = (1,2,3,4)^T} is the weight vector. So, the expected value of the sum is

{\displaystyle \sum_{j=1}^\infty vM^jw = v\left(\sum_{j=1}^\infty M^j\right)w = vM(I-M)^{-1}w}

It turns out that the matrix {M(I-M)^{-1}} has a simple form, strongly resembling M itself.

{\displaystyle M(I-M)^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1/2 & 0 & 0 \\ 1 & 1/2 & 1/3 & 0 \\ 1 & 1/2 & 1/3 & 1/4 \end{pmatrix} }

Left multiplication by v extracts the bottom row of this matrix, and we are left with a dot product of the form {(1,1/2,1/3,1/4)\cdot (1,2,3,4) = 1 + 1 + 1 + 1 = 4 }. Neat.
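
Here is a quick numpy sketch of the same computation for a general starting value A (my code, not from the original post):

import numpy as np

A = 7
# transition matrix on the states 1..A: from state n, each of 0..n has probability 1/(n+1);
# the row for state n keeps only the columns corresponding to 1..n
M = np.array([[1/(i+2) if j <= i else 0 for j in range(A)] for i in range(A)])
w = np.arange(1, A+1)                      # weight vector (1, 2, ..., A)
v = np.zeros(A); v[-1] = 1                 # start in state A
print(v @ np.linalg.solve(np.eye(A) - M, M @ w))   # expected sum, equal to A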

What else can we say? The median is less than A, which is no surprise given the long tail on the right. Also, P[S=0] = 1/(A+1) since the only way to have zero sum is to hit 0 at once. A more interesting question is: what is the limit of the distribution of T = S/A as A tends to infinity? Here is the histogram of 2,000,000 trials with A=50.

n50t2m

It looks like the distribution of T tends to a limit, which has constant density until 1 (so, until A before rescaling) and decays exponentially after that. Writing the supposed probability density function as {f(t) = c} for {0\le t\le 1}, {f(t) = c\exp(k(1-t))} for {t > 1}, and using the fact that the expected value of T is 1, we arrive at {c = 2-\sqrt{2} \approx 0.586} and {k=\sqrt{2}}. This is a pretty good approximation in some aspects: the median of this distribution is {1/(2c)}, suggesting that the median of S is around {A/(4-2\sqrt{2})}, which is in reasonable agreement with experiment. But the histogram for A=1000 still has a significant deviation from the exponential curve, indicating that the supposedly geometric part of T isn’t really geometric:

fit1000
The distribution of S with A=1000 versus the constant-geometric approximation

One can express S as a sum of several independent geometric random variables, but the number of summands grows quadratically in A, and I didn’t get any useful asymptotics from this. What is the true limiting distribution of S/A, if it’s not the red curve above?

Very fractional geometric progressions with an integer ratio

The geometric progression 1/3, 2/3, 4/3, 8/3, 16/3,… is notable for being dyadic (ratio 2) and staying away from integers as much as possible (distance 1/3 between this progression and the set of integers; no other dyadic progression stays further away from integers). This property is occasionally useful: by taking the union of the dyadic partition of an interval with its shift by 1/3, one gets a system of intervals that comfortably covers every point: for every point x and every (small) radius r there is an interval of size comparable to r in which x lies near the middle.

dyadic
Endpoints of the intervals in one group are away from the endpoints of the other

It’s easy to see that for any real number x, the distance between the progression {x, 2x, 4x, 8x, …} and the set of integers cannot be greater than 1/3. Indeed, since the integer part of x does not matter, it suffices to consider x between 0 and 1. The values between 0 and 1/3 lose immediately; the values between 1/3 and 1/2 lose after being multiplied by 2. And since x and 1-x yield the same distance, we are done.

Let’s find the most fractional progressions with other integer ratios r. When r is odd, the solution is obvious: starting with 1/2 keeps all the terms at half-integers, so the distance 1/2 is achieved. When r is even, say r = 2k, the best starting value is x = k/(2k+1), which achieves the distance x since rx = k-x. The values between 0 and k/(2k+1) are obviously worse, and those between k/(2k+1) and 1/2 become worse after being multiplied by r: they are mapped to the interval between k-x and k.
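
A tiny check of this for r = 10 (so k = 5, and the best starting value is 5/11): the orbit of 5/11 under multiplication by 10 mod 1 just alternates between 5/11 and 6/11.

from fractions import Fraction

x = Fraction(5, 11)              # k/(2k+1) for r = 2k = 10
for _ in range(6):
    print(x, min(x, 1 - x))      # the distance to the nearest integer stays 5/11
    x = (10 * x) % 1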

The problem is solved! But what if… x is irrational?

Returning to ratio r=2, it is clear that 1/3 is no longer attainable. The base-2 expansion of x cannot be 010101… as that would be periodic. So it must contain either 00 or 11 somewhere. Either of those will bring a dyadic multiple of x within distance less than 0.001111… (base 2) of an integer, that is distance 1/4.

The goal is to construct x so that its binary expansion is as balanced between 0 and 1 as possible, but without being periodic. The Thue-Morse constant does exactly this. It’s constructed by starting with 0 and then adding the complement of the sequence constructed so far: x = .0 1 10 1001 10010110 … which is approximately 0.412. The closest the dyadic geometric progression starting with x comes to an integer is 2x, which has distance about 0.175. The Wikipedia article links to the survey The ubiquitous Prouhet-Thue-Morse sequence by Allouche and Shallit, in which Corollary 2 implies that no other irrational number has a dyadic progression with a greater distance from integers, provided that this distance is attained. I have not been able to sort out the case in which the distance from a progression to the integers is not attained, but it seems very likely that Thue-Morse remains on top.
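
Here is a short sketch (mine, exact arithmetic via fractions) that builds a truncation of the Thue-Morse constant and measures how close its dyadic progression comes to the integers:

from fractions import Fraction

# Thue-Morse sequence: t_n = parity of the number of 1 bits of n
bits = [bin(n).count('1') % 2 for n in range(200)]
x = sum(Fraction(b, 2**(k+1)) for k, b in enumerate(bits))   # truncated Thue-Morse constant
print(float(x))                                # about 0.412

y, d = x, Fraction(1)
for _ in range(100):                           # first 100 terms of x, 2x, 4x, ...
    d = min(d, y, 1 - y)
    y = (2 * y) % 1
print(float(d))                                # about 0.175, attained at 2x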

What about other ratios? When the ratio r is even, the situation is essentially the same as for r=2, for the following reason. In base r there are two digits nearest (r-1)/2, for example 4 and 5 in base 10. Using these digits in the Thue-Morse sequence, we get a strong candidate for the most fractional progression with ratio r: for example, 0.455454455445… in base 10, with a distance of about 0.445. Using any other digit loses the game at once: for example, having the digit 3 in the decimal expansion implies that some term {10^n x} of the progression is within less than 0.39999… = 0.4 of an integer.

When the ratio is odd, there are three digits that could conceivably be used in the extremal x: namely, (r-1)/2 and its two neighbors. If the central digit (r-1)/2 is never used, we are back to the Thue-Morse pattern, such as x = 0.0220200220020220… in base 3 (an element of the standard Cantor set, by the way). But this is an unspectacular achievement, with a distance of about 0.0852. One can do better by starting with 1/2 = 0.1111111… and sprinkling this ternary expansion with 0s or 2s in some aperiodic way, doing so very infrequently. By making the runs of 1s extremely long, we get the distance arbitrarily close to 1 - 0.2111111… (base 3), which is simply 1/2 - 1/3 = 1/6.

So it seems that for irrational geometric progressions with an odd ratio r, the distance to integers can be arbitrarily close to 1/2 - 1/r, but there is no progression achieving this value.

The Kolakoski-Cantor set

A 0-1 sequence can be interpreted as a point in the interval [0,1]. But this makes the long-term behavior of the sequence practically invisible due to limited resolution of our screens (and eyes). To make it visible, we can also plot the points obtained by shifting the binary sequence to the left (the Bernoulli shift, which also goes by many other names). The resulting orbit is often dense in the interval, which doesn’t really help us visualize any patterns. But sometimes we get an interesting complex structure.

kol_set_small
The Kolakoski-Cantor set, KC

The vertical axis here is the time parameter, the number of dyadic shifts. The 0-1 sequence being visualized is the Kolakoski sequence in its binary form, with 0 and 1 instead of 1 and 2. By definition, the n-th run of equal digits in this sequence has length {x_n+1}, where {x_n} is the n-th term of the sequence itself. In particular, 000 and 111 never occur, which contributes to the blank spots near 0 and 1.

Although the sequence is not periodic, the set is quite stable in time; it does not make a visible difference whether one plots the first 10,000 shifts, or 10,000,000. The apparent symmetry about 1/2 is related to the open problem of whether the Kolakoski sequence is mirror invariant, meaning that together with any finite word (such as 0010) it also contains its complement (that would be 1101).

There are infinitely many forbidden words apart from 000 and 111 (and the words containing those). For example, 01010 cannot occur because it has 3 consecutive runs of length 1, which implies having 000 elsewhere in the sequence. For the same reason, 001100 is forbidden. This goes on forever: 00100100 is forbidden because it implies having 10101, etc.

The number of distinct words of length n in the Kolakoski sequence is bounded by a power of n (see F. M. Dekking, What is the long range order in the Kolakoski sequence?). Hence, the set pictured above is covered by {O(n^p)} intervals of length {2^{-n}}, which implies it (and even its closure) is zero-dimensional in any fractal sense (has Minkowski dimension 0).

The set KC apparently does not have any isolated points; this is also an open problem, namely the problem of recurrence (whether every word that appears in the sequence has to appear infinitely many times). Assuming this is so, the closure of the orbit is a totally disconnected compact set without isolated points, i.e., a Cantor set. It is not self-similar (not surprising, given it’s zero-dimensional), but its relation to the Bernoulli shift implies a structure resembling self-similarity:

kolhalf
KC is covered by two copies scaled by 1/2

Applying the transformations {x\mapsto x/2} and {x\mapsto (1+x)/2} yields two disjoint smaller copies that cover the original set, but with some spare parts left. The leftover bits exist because not every word in the sequence can be preceded by both 0 and 1.

kolx2
KC covered by two copies scaled by 2

Applying the transformations {x\mapsto 2x} and {x\mapsto 2x-1} yields two larger copies that cover the original set. There are no extra parts within the interval [0,1] but there is an overlap between the two copies.

The number {c = \inf KC\approx 0.146778684766479} appears several times in the structure of the set: for instance, the central gap is {((1-c)/2, (1+c)/2)}, the second-largest gap on the left has the left endpoint {(1-c)/4}, etc. The Inverse Symbolic Calculator has not found anything about this number. Its binary expansion begins with 0.001 001 011 001 001 101 001 001 101 100… which one can recognize as the smallest binary number that can be written without doing anything three times in a row. (Can’t have 000; also can’t have 001 three times in a row; and 001 010 is not allowed because it contains 01010, three runs of length 1. Hence, the number begins with 001 001 011.) This number is obviously irrational, but other than that…

In conclusion, here is the Python code used to plot KC.

import numpy as np
import matplotlib.pyplot as plt

n = 1000000
a = np.zeros(n, dtype=int)       # the Kolakoski sequence in 0-1 form
j = 0                            # position whose value dictates the current run length
same = False                     # whether the current digit should be repeated
for i in range(1, n):
    if same:
        a[i] = a[i-1]            # second digit of a run of length 2
        same = False
    else:
        a[i] = 1 - a[i-1]        # start a new run
        j += 1
        same = bool(a[j])        # a[j] = 1 means the new run has length 2
v = np.array([1/2**k for k in range(60, 0, -1)])
b = np.convolve(a, v, mode='valid')   # b[m] = 0.a[m]a[m+1]... as a binary fraction
plt.plot(b, np.arange(np.size(b)), '.', ms=2)
plt.show()
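
As a byproduct, the array b computed by this script also gives the constant c = inf KC discussed above (assuming the first million shifts already come close to the infimum); append these lines:

c = b.min()
print(c)                                   # approximately 0.1467786847...
print(format(int(c * 2**30), '030b'))      # the binary expansion begins 001001011...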

Pisot constant beyond 0.843

In a 1946 paper Charles Pisot proved a theorem involving a curious constant {\gamma_0= 0.843\dots}. It can be defined as follows:

{\gamma_0= \sup\{r \colon \exists } monic polynomial {p} such that {|p(e^z)| \le 1} whenever {|z|\le r \}}

Equivalently, {\gamma_0} is determined by the requirement that the set {\{e^z\colon |z|\le \gamma_0\}} have logarithmic capacity 1; this won’t be used here. The theorem is stated below, although this post is really about the constant.

Theorem: If an entire function takes integer values at nonnegative integers and is {O(e^{\gamma |z|})} for some {\gamma < \gamma_0}, then it is a finite linear combination of terms of the form {z^n \alpha^z}, where each {\alpha } is an algebraic integer.

The value of {\gamma_0} is best possible; thus, in some sense Pisot’s theorem completed a line of investigation that began with a 1915 theorem of Pólya, which had {\log 2} in place of {\gamma_0} and concluded that the function is a polynomial. (Informally speaking, Pólya proved that {2^z} is the “smallest” transcendental entire function that is integer-valued on nonnegative integers.)

Although the constant {\gamma_0} was mentioned in later literature, no further digits of it have been stated anywhere, as far as I know. So, let it be known that the decimal expansion of {\gamma_0} begins with 0.84383.


A lower bound on {\gamma_0} can be obtained by constructing a monic polynomial that is bounded by 1 on the set {E(r) = \{e^z \colon |z|\le r \}}. Here is E(0.843):

er

It looks pretty round, except for that flat part on the left. In fact, E(0.82) is covered by a disk of unit radius centered at 1.3, which means that the choice {p(z) = z-1.3} shows {\gamma_0 > 0.82}.
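
This is easy to check numerically by sampling the boundary circle (the maximum of {|e^z - 1.3|} over the disk {|z|\le 0.82} is attained on it, by the maximum modulus principle):

import numpy as np

z = 0.82 * np.exp(1j * np.linspace(0, 2*np.pi, 10001))
print(np.abs(np.exp(z) - 1.3).max())   # about 0.998, (barely) less than 1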

082disk
p(z) = z-1.3 gives lower bound 0.82

How to get an upper bound on {\gamma_0}? Turns out, it suffices to exhibit a monic polynomial {q} that has all zeros in {E(r)} and satisfies {|q|>1} on the boundary of {E(r)}. The existence of such {q} shows {\gamma_0 < r}. Indeed, suppose that {p} is monic and {|p|\le 1} on {E(r)}. Consider the function {\displaystyle u(z) = \frac{\log|p(z)|}{\deg p} - \frac{\log|q(z)|}{\deg q}}. By construction {u<0} on the boundary of {E(r)}. Also, {u} is subharmonic in the complement of {E(r)}, including {\infty}, where the singularities of both logarithms cancel out, leaving {u(\infty)=0}. Since the value {u(\infty)=0} exceeds the (negative) boundary values, this contradicts the maximum principle for subharmonic functions, according to which {u(\infty)} cannot exceed the maximum of {u} on the boundary.

The choice of {q(z) = z-1.42} works for {r=0.89}.

089disk

So we have {\gamma_0} boxed between 0.82 and 0.89; how to get more precise bounds? I don’t know how Pisot achieved the precision of 0.843… it’s possible that he strategically picked some linear and quadratic factors, raised them to variable integer powers and optimized over the exponents. Today it is too tempting to throw some optimization routine at the problem and let it run for a while.

But what to optimize? The straightforward approach is to minimize the maximum of {|p(e^z)|} on the circle {|z|=r}, approximated by sampling the function at a sufficiently fine uniform grid {\{z_k\}} and picking the maximal value. This works… unspectacularly. One problem is that the objective function is non-differentiable. Another is that taking maximum throws out a lot of information: we are not using the values at other sample points to better direct the search. After running optimization for days, trying different optimization methods, tolerance options, degrees of the polynomial, and starting values, I was not happy with the results…

Turns out, the optimization is much more effective if one minimizes the variance of the set {\{|p(\exp(z_k))|^2\}}. Now we are minimizing a polynomial function of the values {p(\exp(z_k))}, which pushes them toward having the same absolute value, the behavior that we want the polynomial to have. It took from seconds to minutes to produce the polynomials shown below, using the BFGS method as implemented in SciPy.

As the arguments of the objective function I took the real and imaginary parts of the zeros of the polynomial. The symmetry about the real axis was enforced automatically: the polynomial was the product of quadratic terms {(z-x_k-iy_k)(z-x_k+iy_k)}. This eliminated the potentially useful option of having real zeros of odd order, but I did not feel like special-casing those.

Three digits

843
Degree 8, lower bound 0.843

Real part: 0.916, 1.186, 1.54, 1.783
Imaginary part: 0.399, 0.572, 0.502, 0.199

Here and below, only the zeros with positive imaginary part are listed (in the left-to-right order), the others being their conjugates.

844
Degree 10, upper bound 0.844

Real part: 0.878, 1.0673, 1.3626, 1.6514, 1.8277
Imaginary part: 0.3661, 0.5602, 0.6005, 0.4584, 0.171

Four digits

8438
Degree 14, lower bound 0.8438

Real part: 0.8398, 0.9358, 1.1231, 1.357, 1.5899, 1.776, 1.8788
Imaginary part: 0.3135, 0.4999, 0.6163, 0.637, 0.553, 0.3751, 0.1326

8439
Degree 14, upper bound 0.8439

Real part: 0.8397, 0.9358, 1.1231, 1.3571, 1.5901, 1.7762, 1.879
Imaginary part: 0.3136, 0.5, 0.6164, 0.6372, 0.5531, 0.3751, 0.1326

No, I didn’t post the same picture twice. The polynomials are just that similar. But as the list of zeros shows, there are tiny differences…

Five digits

84383
Degree 20, lower bound 0.84383

Real part: 0.81527, 0.8553, 0.96028, 1.1082, 1.28274, 1.46689, 1.63723, 1.76302, 1.82066, 1.86273
Imaginary part: 0.2686, 0.42952, 0.556, 0.63835, 0.66857, 0.63906, 0.54572, 0.39701, 0.23637, 0.08842

84384
Degree 20, upper bound 0.84384

Real part: 0.81798, 0.85803, 0.95788, 1.09239, 1.25897, 1.44255, 1.61962, 1.76883, 1.86547, 1.89069
Imaginary part: 0.26631, 0.4234, 0.54324, 0.62676, 0.66903, 0.65366, 0.57719, 0.44358, 0.26486, 0.07896

Again, nearly the same polynomial works for upper and lower bounds. The fact that the absolute value of each of these polynomials is below 1 (for lower bounds) or greater than 1 (for upper bounds) can be ascertained by sampling them and using an upper estimate on the derivative; there is enough margin to trust computations with double precision.

Finally, the Python script I used. The function “obj” is getting minimized, while the function “values” returns the actual values of interest: the minimum and maximum of the polynomial’s absolute value over the sample points. The degree of the polynomial is 2n, and the radius under consideration is r. The sample points are collected in the array s. To begin with, the roots are chosen randomly. After minimization runs (inevitably, ending in a local minimum of which there are myriads), the new starting point is obtained by randomly perturbing the local minimum found. (The perturbation is smaller if minimization was particularly successful.)

import numpy as np
from scipy.optimize import minimize

def obj(r):
    # r holds the real parts (first n entries) and imaginary parts (last n) of the zeros;
    # the conjugate zeros are appended to keep the polynomial real
    rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
    p = np.prod(np.abs(s-rc)**2, axis=0)   # |p(z)|^2 at the sample points
    return np.var(p)

def values(r):
    rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
    p = np.prod(np.abs(s-rc), axis=0)      # |p(z)| at the sample points
    return [np.min(p), np.max(p)]

r = 0.84384        # radius under consideration
n = 10             # the degree of the polynomial is 2n
record = 2
# sample points: e^z for z on the upper half of the circle |z| = r
# (the lower half is covered by the symmetry of the polynomial)
s = np.exp(r * np.exp(1j*np.arange(0, np.pi, 0.01)))
xr = np.random.uniform(0.8, 1.8, size=(n,))
xi = np.random.uniform(0, 0.7, size=(n,))
x0 = np.concatenate((xr, xi))

while True:
    res = minimize(obj, x0, method='BFGS')
    if res['fun'] < record:
        record = res['fun']
        print(repr(res['x']))
        print(values(res['x']))
        # small perturbation after an improvement, larger perturbation otherwise
        x0 = res['x'] + np.random.uniform(-0.001, 0.001, size=x0.shape)
    else:
        x0 = res['x'] + np.random.uniform(-0.05, 0.05, size=x0.shape)