Pisot constant beyond 0.843

In a 1946 paper Charles Pisot proved a theorem involving a curious constant {\gamma_0= 0.843\dots}. It can be defined as follows:

{\gamma_0= \sup\{r \colon \exists } monic polynomial {p} such that {|p(e^z)| \le 1} whenever {|z|\le r \}}

Equivalently, {\gamma_0} is determined by the requirement that the set {\{e^z\colon |z|\le \gamma_0\}} have logarithmic capacity 1; this won’t be used here. The theorem is stated below, although this post is really about the constant.

Theorem: If an entire function takes integer values at nonnegative integers and is {O(e^{\gamma |z|})} for some {\gamma < \gamma_0}, then it is a finite linear combination of terms of the form {z^n \alpha^z}, where each {\alpha } is an algebraic integer.

The value of {\gamma_0} is best possible; thus, in some sense Pisot’s theorem completed a line of investigation that began with a 1915 theorem by Pólya which had {\log 2} in place of {\gamma_0}, and where the conclusion was that {f} is a polynomial. (Informally speaking, Pólya proved that {2^z} is the “smallest” entire-function that is integer-valued on nonnegative integers.)

Although the constant {\gamma_0} was mentioned in later literature (here, here, and here), no further digits of it have been stated anywhere, as far as I know. So, let it be known that the decimal expansion of {\gamma_0} begins with 0.84383.


A lower bound on {\gamma_0} can be obtained by constructing a monic polynomial that is bounded by 1 on the set {E(r) = \{e^z \colon |z|\le r \}}. Here is E(0.843):

er

It looks pretty round, except for that flat part on the left. In fact, E(0.82) is covered by a disk of unit radius centered at 1.3, which means that the choice {p(z) = z-1.3} shows {\gamma_0 > 0.82}.

082disk
p(z) = z-1.3 gives lower bound 0.82

How to get an upper bound on {\gamma_0}? Turns out, it suffices to exhibit a monic polynomial {q} that has all zeros in {E(r)} and satisfies {|q|>1} on the boundary of {E(r)}. The existence of such {q} shows {\gamma_0 < r}. Indeed, suppose that {p} is monic and {|p|\le 1} on {E(r)}. Consider the function {\displaystyle u(z) = \frac{\log|p(z)|}{\deg p} - \frac{\log|q(z)|}{\deg q}}. By construction {u<0} on the boundary of {E(r)}. Also, {u} is subharmonic in its complement, including {\infty}, where the singularities of both logarithms cancel out, leaving {u(\infty)=0}. This contradicts the maximum principle for subharmonic functions, according to which {u(\infty)} cannot exceed the maximum of {u} on the boundary.

The choice of {q(z) = z-1.42} works for {r=0.89}.

089disk

So we have {\gamma_0} boxed between 0.82 and 0.89; how to get more precise bounds? I don’t know how Pisot achieved the precision of 0.843… it’s possible that he strategically picked some linear and quadratic factors, raised them to variable integer powers and optimized the latter. Today it is too tempting to throw some optimization routine on the problem and let it run for a while.

But what to optimize? The straightforward approach is to minimize the maximum of {|p(e^z)|} on the circle {|z|=r}, approximated by sampling the function at a sufficiently fine uniform grid {\{z_k\}} and picking the maximal value. This works… unspectacularly. One problem is that the objective function is non-differentiable. Another is that taking maximum throws out a lot of information: we are not using the values at other sample points to better direct the search. After running optimization for days, trying different optimization methods, tolerance options, degrees of the polynomial, and starting values, I was not happy with the results…

Turns out, the optimization is much more effective if one minimizes the variance of the set {\{|p(\exp(z_k))|^2\}}. Now we are minimizing a polynomial function of {p(\exp(z_k)}, which pushes them toward having the same absolute value — the behavior that we want the polynomial to have. It took from seconds to minutes to produce the polynomials shown below, using BFGS method as implemented in SciPy.

As the arguments for optimization function I took the real and imaginary parts of the zeros of the polynomial. The symmetry about the real axis was enforced automatically: the polynomial was the product of quadratic terms {(z-x_k-iy_k) (z-x_k+iy_k)}. This eliminated the potentially useful option of having real zeros of odd order, but I did not feel like special-casing those.

Three digits

843
Degree 8, lower bound 0.843

Real part: 0.916, 1.186, 1.54, 1.783
Imaginary part: 0.399, 0.572, 0.502, 0.199

Here and below, only the zeros with positive imaginary part are listed (in the left-to-right order), the others being their conjugates.

844
Degree 10, upper bound 0.844

Real part: 0.878, 1.0673, 1.3626, 1.6514, 1.8277
Imaginary part: 0.3661, 0.5602, 0.6005, 0.4584, 0.171

Four digits

8438
Degree 14, lower bound 0.8438

Real part: 0.8398, 0.9358, 1.1231, 1.357, 1.5899, 1.776, 1.8788
Imaginary part: 0.3135, 0.4999 ,0.6163, 0.637, 0.553, 0.3751, 0.1326

8439
Degree 14, upper bound 0.8439

Real part: 0.8397, 0.9358, 1.1231, 1.3571, 1.5901, 1.7762, 1.879
Imaginary part: 0.3136, 0.5, 0.6164, 0.6372, 0.5531, 0.3751, 0.1326

No, I didn’t post the same picture twice. The polynomials are just that similar. But as the list of zeros shows, there are tiny differences…

Five digits

84383
Degree 20, lower bound 0.84383

Real part: 0.81527, 0.8553, 0.96028, 1.1082, 1.28274, 1.46689, 1.63723, 1.76302, 1.82066, 1.86273
Imaginary part: 0.2686, 0.42952, 0.556, 0.63835, 0.66857, 0.63906, 0.54572, 0.39701, 0.23637, 0.08842

84384
Degree 20, upper bound 0.84384

Real part: 0.81798, 0.85803, 0.95788, 1.09239, 1.25897, 1.44255, 1.61962, 1.76883, 1.86547, 1.89069
Imaginary part: 0.26631, 0.4234, 0.54324, 0.62676, 0.66903, 0.65366, 0.57719, 0.44358, 0.26486, 0.07896

Again, nearly the same polynomial works for upper and lower bounds. The fact that the absolute value of each of these polynomials is below 1 (for lower bounds) or greater than 1 (for upper bounds) can be ascertained by sampling them and using an upper estimate on the derivative; there is enough margin to trust computations with double precision.

Finally, the Python script I used. The function “obj” is getting minimized while function “values” returns the actual values of interest: the minimum and maximum of polynomial. The degree of polynomial is 2n, and the radius under consideration is r. The sample points are collected in array s. To begin with, the roots are chosen randomly. After minimization runs (inevitably, ending in a local minimum of which there are myriads), the new starting point is obtained by randomly perturbing the local minimum found. (The perturbation is smaller if minimization was particularly successful.)

import numpy as np
from scipy.optimize import minimize

def obj(r):
    rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
    p = np.prod(np.abs(s-rc)**2, axis=0)
    return np.var(p)

def values(r):
    rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
    p = np.prod(np.abs(s-rc), axis=0)
    return [np.min(p), np.max(p)]

r = 0.84384
n = 10
record = 2 
s = np.exp(r * np.exp(1j*np.arange(0, np.pi, 0.01)))
xr = np.random.uniform(0.8, 1.8, size=(n,))
xi = np.random.uniform(0, 0.7, size=(n,))
x0 = np.concatenate((xr, xi))

while True:
    res = minimize(obj, x0, method = 'BFGS')
    if res['fun'] < record:
        record = res['fun']
        print(repr(res['x']))
        print(values(res['x']))
        x0 = res['x'] + np.random.uniform(-0.001, 0.001, size=x0.shape)
    else:
        x0 = res['x'] + np.random.uniform(-0.05, 0.05, size=x0.shape)

 

Using a paraboloid to cover points with a disk

Find the equation of tangent line to parabola {y=x^2}borrring calculus drill.

Okay. Draw two tangent lines to the parabola, then. Where do they intersect?

Two tangent lines
Two tangent lines

If the points of tangency are {a} and {b}, then the tangent lines are
{y=2a(x-a)+a^2} and {y=2b(x-b)+b^2}. Equate and solve:

\displaystyle    2a(x-a)+a^2 = 2b(x-b)+b^2 \implies x = \frac{a+b}{2}

Neat! The {x}-coordinate of the intersection point is midway between {a} and {b}.

What does the {y}-coordinate of the intersection tell us? It simplifies to

\displaystyle    2a(b-a)/2+a^2 = ab

the geometric meaning of which is not immediately clear. But maybe we should look at the vertical distance from intersection to the parabola itself. That would be

\displaystyle    x^2 - y = \left(\frac{a+b}{2}\right)^2 -ab = \left(\frac{a-b}{2}\right)^2

This is the square of the distance from the midpoint to {a} and {b}. In other words, the squared radius of the smallest “disk” covering the set {\{a,b\}}.


Same happens in higher dimensions, where parabola is replaced with the paraboloid {z=|\mathbf x|^2}, {\mathbf x = (x_1,\dots x_n)}.

Paraboloid
Paraboloid

Indeed, the tangent planes at {\mathbf a} and {\mathbf b} are
{z=2\mathbf a\cdot (\mathbf x-\mathbf a)+|\mathbf a|^2} and {z=2\mathbf b\cdot (\mathbf x-\mathbf b)+|\mathbf b|^2}. Equate and solve:

\displaystyle    2(\mathbf a-\mathbf b)\cdot \mathbf x = |\mathbf a|^2-|\mathbf b|^2 \implies \left(\mathbf x-\frac{\mathbf a+\mathbf b}{2}\right)\cdot (\mathbf a-\mathbf b) =0

So, {\mathbf x} lies on the equidistant plane from {\mathbf a} and {\mathbf b}. And, as above,

\displaystyle    |\mathbf x|^2 -z = \left|\frac{\mathbf a-\mathbf b}{2}\right|^2

is the square of the radius of smallest disk covering both {\mathbf a} and {\mathbf b}.


The above observations are useful for finding the smallest disk (or ball) covering given points. For simplicity, I stick to two dimensions: covering points on a plane with the smallest disk possible. The algorithm is:

  1. Given points {(x_i,y_i)}, {i=1,\dots,n}, write down the equations of tangent planes to paraboloid {z=x^2+y^2}. These are {z=2(x_i x+y_i y)-(x_i^2+y_i^2)}.
  2. Find the point {(x,y,z)} that minimizes the vertical distance to paraboloid, that is {x^2+y^2-z}, and lies (non-strictly) below all of these tangent planes.
  3. The {x,y} coordinates of this point is the center of the smallest disk covering the points. (Known as the Chebyshev center of the set). Also, {\sqrt{x^2+y^2-z}} is the radius of this disk; known as the Chebyshev radius.

The advantage conferred by the paraboloid model is that at step 2 we are minimizing a quadratic function subject to linear constraints. Implementation in Sage:

points = [[1,3], [1.5,2], [3,2], [2,-1], [-1,0.5], [-1,1]] 
constraints = [lambda x, p=q: 2*x[0]*p[0]+2*x[1]*p[1]-p[0]^2-p[1]^2-x[2] for q in points]
target = lambda x: x[0]^2+x[1]^2-x[2]
m = minimize_constrained(target,constraints,[0,0,0]) 
circle((m[0],m[1]),sqrt(m[0]^2+m[1]^2-m[2]),color='red') + point(points)

Smallest disk covering the points
Smallest disk covering the points

Credit: this post is an expanded version of a comment by David Speyer on last year’s post Covering points with caps, where I considered the same problem on a sphere.

The Nelder-Mead minimization algorithm

It is easy to find the minimum of {f(x,y) = x^2+16y^2} if you are human. For a computer this takes more work:

Search for the minimum of x^2+16y^2
Search for the minimum of x^2+16y^2

The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of {f}. Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?

A natural way to look for minimum is to slide along the graph in the direction opposite to {\nabla f}; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of {f}; this is a discrete process of flips and flops. The process amounts to the triangle of contact being replaced by another triangle with an adjacent side. The triangle is flipped in the direction away from the highest vertex.

This is already a reasonable minimization algorithm: begin with a triangle {T}; find the values of {f} at the vertices of {T}; reflect the triangle away from the highest value; if the reflected point {R} has a smaller value, move there; otherwise stop.

But there’s a problem: the size of triangle never changes in this process. If {T} is large, we won’t know where the minimum is even if {T} eventually covers it. If {T} is small, it will be moving in tiny steps.

Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of {T}. It is natural to contract it toward the “best” vertex (the one with the smallest value of {f}), replacing two other vertices with the midpoints of corresponding sides. Then repeat. The stopping condition can be the values of {f} at all vertices becoming very close to one another.

This looks clever, but the results are unspectacular. The algorithm is prone to converge to a non-stationary point where just by an accident the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of {f}.

The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can grow smaller and larger, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:

  • Begin with some triangle {T}.
  • Evaluate the function {f} at each vertex. Call the vertices {W,G,B} where {W} is the worst one (the largest value of {f}) and {B} is the best.
  • Reflect {W} about the midpoint of the good side {GB}. Let {R} be the reflected point.
  • If {f(R)<f(B)}, then we consider moving even further in the same direction, extending the line {WR} beyond {R} by half the length of {WR}. Choose between {R} and {E} based on where {f} is smaller, and make the chosen point a new vertex of our triangle, replacing {W}.
  • Else, do not reflect and instead shrink the triangle toward {B}.
  • Repeat, stopping when we either exceed the number of iterations or all values of {f} at the vertices of triangle become nearly equal.

(The full version of the Nelder-Mead algorithm also includes the comparison of {R} with {G}, and also involves trying a point inside the triangle.)

Rosenbrock's function
Rosenbrock’s function

This is Rosenbrock’s function {f(x,y)=100(x^2-y)^2 + (x-1)^2}, one of standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola {y=x^2}. At the bottom of the valley, the incline toward the minimum {(1,1)} is relatively small, compared to steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.

The algorithm described above gets within {0.001} of the minimum in 65 steps.

Minimizing Rosenbrock's function
Minimizing Rosenbrock’s function

In conclusion, Scilab code with this algorithm.

x = -0.4:0.1:1.6; y = -2:0.1:1.4          // viewing window  
[X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30)  // contour plot
plot([1],[1],'r+')                             // minimum point
tol = 10^(-6)
n = 0
T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5]        //  initial triangle
for i=1:3 
    values(i) = f(T(i,1), T(i,2))
end
while (%T) 
    xpoly(T(:,1),T(:,2),'lines',1)         // draw the triangle  
    [values, index] = gsort(values)          // sort the values 
    T = T(index,:)       
    if values(1)-values(3) < tol            // close enough?
        mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2))
        break 
    end
    R = T(2,:) + T(3,:) - T(1,:)             // reflected  
    fR = f(R(1),R(2))
    if fR < values(3)                         
        E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:)  // extended  
        fE = f(E(1),E(2))
        if fE < fR 
            T(1,:)=E; values(1)=fE     // pick extended 
        else
            T(1,:)=R; values(1)=fR     // pick reflected 
        end
    else 
        for i=1:2
            T(i,:) = (T(i,:)+T(3,:))/2      // shrink
            values(i) = f(T(i,1), T(i,2))      
        end
    end
    n = n+1
    if n >= 200
        disp('Failed to converge'); break    //  too bad 
    end
end

Covering points with caps

Sometimes easy problems should be solved in a hard way.

Problem 1. Given a finite subset of a circle, find a shortest arc containing the set.

Easy way. Sort the points by polar angle; find a maximal gap between points; return its complement as an answer.

Shortest arc
Shortest arc

Hard way. Not only is this approach harder, it also requires an additional assumption: the set is contained in some open semi-circle.

Draw the tangent line at every point {x_k} of the set. It divides the plane into two half-planes. Focus on the closed half-plane {P_k} that does not contain the circle. Find the point {x^*\in \bigcap P_k} that is closest to the center of circle. The line {L} from the center to {x^*} is the line of symmetry of the required arc. Its size is determined by the maximal distance of given points to {L}.

Intersection of halfplanes determined by tangent lines
Intersection of halfplanes determined by tangent lines

Finding {x^*} is a quadratic minimization problem {\|x\|^2\rightarrow \min} with linear constraints, which is not so bad. Still, this is obviously the harder way.


Problem 2. Given a finite subset of a sphere, find the smallest spherical cap containing the set.

A moment’s reflection is enough to see that the easy way is no longer available. What was previously the hard way now becomes a reasonable solution which works in every dimension (provided that the set is contained in an open hemisphere). Assuming the sphere is a unit sphere, all you do is minimize {\|x\|^2} subject to linear constraints {\langle x,x_k\rangle\ge 1 } for every {k}.

Why this works: {\{y:\langle x,y\rangle\ge 1 \}} is a half-space that intersects the sphere along a spherical cap. The smaller {\|x\|} is, the greater is the distance from this half-space to the origin, and the smaller spherical cap is cut off. The constraints ensure that the cap contains the given points.


One might expect the “flat” version of Problem 2, with plane instead of sphere, to be easier. Yet, I don’t see a way to reduce the flat problem to quadratic minimization with linear constraints.

Problem 3. Given a finite subset of the plane, find the smallest disk containing the set.

This is an instance of the problem of finding the Chebyshev center of a set.

Inner-variational equations

It’s been a while since the last time I posted a post-colloquium post. This one is based on a colloquium given by me, but don’t worry: none of my results are included here.

As a warm-up, consider the (trivial) problem of finding the shortest path between two points a,b\in\mathbb R^n. The naive approach is to minimize the length L(f)=\int_0^1 |f\,'(t)|\,dt among all maps f\colon [0,1]\to \mathbb R^n that are sufficiently smooth and satisfy the boundary conditions f(0)=a and f(1)=b. This turns out to be a bad idea: L(f) is neither strictly convex nor differentiable, and the set of minimizing maps is huge, containing some rather nonsmooth specimen.

The right approach is to minimize the energy E(f)=\int_0^1 |f\,'(t)|^2\,dt. While this functional is not immediately related to length for general maps, it is not hard to see that for minimizing maps f we have E(f)=L(f)^2. Indeed, consider performing the inner variation \widetilde{f}(t)=f(t+\epsilon \eta(t)) where \eta\colon [0,1]\to\mathbb R is smooth and vanishes at the endpoints. Expanding the inequality E(\widetilde{f})\ge E(f), we arrive at \int_0^1 |f\,'(t)|^2\eta'(t)\,dt=0, which after integration by parts yields \frac{d}{dt}|f\,'(t)|^2=0. Thus, minimization of energy enforces constant-speed parametrization, and since for constant-speed maps we have E(f)=L(f)^2, the geometric nature of the variational problem has not been lost.

As a side remark, the inner-variational equation \frac{d}{dt}|f\,'(t)|^2=0 could be written as f\,''\cdot f\,'=0, which is a nonlinear second-order equation.

For comparison, try the first variation \widetilde{f}(t)=f(t)+\epsilon \eta(t) where \eta\colon [0,1]\to\mathbb R^n is smooth and vanishes at the endpoints. Expanding the inequality E(\widetilde{f})\ge E(f), we arrive at \int_0^1 f\,'(t)\cdot \eta'(t)\,dt=0, which after integration by parts yields f\,''\equiv 0, a linear second-order equation. This Euler-Lagrange equation immediately tells us what the minimizing map f is: there is only one linear map with the given boundary values. Obviously, f\,''\equiv 0 is a much stronger statement that f\,''\cdot f\,'=0. However, if f+\epsilon \eta is not admissible for some geometric reason (e.g., the curve must avoid an obstacle), the Euler-Lagrange equation may not be available.

Moving one dimension up, consider the problem of parameterizing a given simply-connected domain \Omega\subset \mathbb C. Now we are to minimize the energy of diffeomorphisms f\colon \mathbb D\to\Omega which is defined, as before, to be the sum of squares of derivatives. (Here \mathbb D is the unit disk.) In complex notation, E(f)=\iint_{\mathbb D}(|f_z|^2+|f_{\bar z}|^2). For definiteness assume f is sense-preserving, that is |f_z|\ge |f_{\bar z}|. Minimizers ought to be conformal maps onto \Omega, but how can we see this from variational equations?

The Euler-Lagrange equation that we get from E(f)\le E(f+\epsilon\eta) turns out to be the Laplace equation \Delta f=0. This is much weaker than the Cauchy-Riemann equation f_{\bar z}=0 that we expect. One problem is that \eta must vanish on the boundary: otherwise f+\epsilon\eta will violate the geometric constraint. We could try to move the values of f in the direction tangent to \partial\Omega, but since does not necessarily make sense since the boundary of \Omega could be something like the von Koch snowflake. And of course, it is not at all clear why the minimum of E must be attained by a diffeomorphism. If the class of maps is expanded to include suitable limits of diffeomorphisms, then it’s no longer clear (actually, not true) that f+\epsilon\eta belongs to the same class. All things considered, the approach via the first variation does not appear promising.

Let’s try the inner variation instead. For small \epsilon the map z\mapsto z+\epsilon\eta is a diffeomorphism of \mathbb D, hence its composition with f is as good a candidate as f itself. Furthermore, since the inner variation deals with the model domain \mathbb D and not with the generic domain \Omega, it is easy to allow modification of boundary values: \eta should be tangent to \partial \mathbb D, i.e., \mathrm{Re}\,\bar z\eta(z) should vanish on the boundary. It takes a bit of computation to turn E(f(z+\epsilon \eta(z))) into something manageable, but the final outcome is remarkably simple. The inner-variational equation says that the function \varphi:=f_z\overline{f_{\bar z}} is holomorphic in \mathbb D and z^2 \varphi is real on the boundary \partial \mathbb D. (Technically speaking, \varphi\, dz^2 must be a real holomorphic quadratic differential.) What can we conclude from this? To begin with, the maximum principle implies that z^2 \varphi is a constant function. And since it vanishes at z=0, the inevitable conclusion is \varphi\equiv 0. Recalling the sense-preserving constraint |f_z|\ge |f_{\bar z}|, we arrive at f_{\bar z}\equiv 0, the desired Cauchy-Riemann equation.

Executive summary

  • Suppose we are to minimize some quantity (called “energy”) that depends on function (or a map) f
  • We can consider applying the first variation f+ \epsilon\eta of the inner variation f\circ (\mathrm{id}+\epsilon \eta). Here \eta is a small perturbation which is applied differently: in the first case it changes the values of f, in the second it shuffles them around.
  • Inner variation applies even in the presence of geometric constraints that make first variation illegal. One example of such constraint is “f must be injective”.
  • Inner-variational equations are quite different from the Euler-Lagrange equations. Even for simple quadratic functionals they are nonlinear.
  • Inner-variational equations are useful because they tell us something about the maps of minimal energy.