provides reasonable approximations to the function near zero: for example, with we get

The quality of approximation not being spectacular, one can try to improve it by using rational functions instead of polynomials. In view of the identity

one can get with . The improvement is substantial for negative :

Having rational approximation of the form makes perfect sense, because such approximants obey the same functional equation as the exponential function itself. We cannot hope to satisfy other functional equations like by functions simpler than .

However, the polynomial is not optimal for approximation except for . For degree , the optimal choice is . In the same plot window as above, the graph of is indistinguishable from .

This is a Padé approximant to the exponential function. One way to obtain such approximants is to replace the Taylor series with continued fraction, using long division:

Terminating the expansion after an even number of apprearances gives a Padé approximant of the above form.

This can be compared to replacing the decimal expansion of number :

with the continued fraction expansion

which, besides greater accuracy, has a regular pattern: 121 141 161 181 …

The numerators of the (diagonal) Padé approximants to the exponential function happen to have a closed form:

which shows that for every fixed , the coefficient of converges to as . The latter is precisely the Taylor coefficient of .

In practice, a recurrence relation is probably the easiest way to get these numerators: begin with and , and after that use . This relation can be derived from the recurrence relations for the convergents of a generalized continued fraction . Namely, and . Only the first relation is actually needed here.

Using the recurrence relation, we get

(so, not all coefficients have numerator 1…)

The quality of approximation to is best seen on logarithmic scale: i.e., how close is to ? Here is this comparison using .

For comparison, the Taylor polynomial of fifth degree, also on logarithmic scale: where .

There are many ways to approximate a given continuous function (I will consider the interval for convenience.) For example, one can use piecewise linear interpolation through the points , where . The resulting piecewise linear function has some nice properties: for example, it is increasing if is increasing. But it is not smooth.

A convenient way to represent piecewise linear interpolation is the sum where the functions are the triangles shown below: .

The functions form a partition of unity, meaning that and all are nonnegative. This property leads to the estimate

The latter sum is small because when is close to , the first factor is small by virtue of continuity, while the second factor is bounded by . When is far from , the second factor is zero, so the first one is irrelevant. The upshot is that is uniformly small.

But if we want a smooth approximation , we need a smooth partition of unity . But not just any set of smooth nonnegative functions that add up to is equally good. One desirable property is preserving monotonicity: if is increasing, then should be increasing, just as this works for piecewise linear interpolation. What does this condition require of our partition of unity?

An increasing function can be expressed as a limit of sums of the form where and is the Iverson bracket: 1 if true, 0 if false. By linearity, it suffices to have increasing for the case . In this case is simply for some , . So we want all to be increasing functions. Which is the case for the triangular partition of unity, when each looks like this:

One smooth choice is Bernstein basis polynomials: . These are nonnegative on , and the binomial formula shows . Are the sums increasing with ? Let’s find out. By the product rule,

In the second sum the term with vanishes, and the terms with can be rewritten as , which is , which is . After the index shift this becomes identical to the terms of the first sum and cancels them out (except for the first one). Thus,

To summarize: the Bernstein polynomials are monotone whenever is. On the other hand, the proof that uniformly is somewhat complicated by the fact that the polynomial basis functions are not localized the way that the triangle basis functions are: the factors do not vanish when is far from . I refer to Wikipedia for a proof of convergence (which, by the way, is quite slow).

Is there some middle ground between non-smooth triangles and non-localized polynomials? Yes, of course: piecewise polynomials, splines. More specifically, B-splines which can be defined as follows: B-splines of degree are the triangle basis functions shown above; a B-spline of degree is the moving averages of a -spline of degree with a window of length . The moving average of can be written as . We get a partition of unity because the sum of moving averages is the moving average of a sum, and averaging a constant function does not change it.

The splines of even degrees are awkward to work with… they are obtained from the triangles by taking those integrals with an odd number of times, which makes their knots fall in the midpoints of the uniform grid instead of the grid points themselves. But I will use anyway, because this degree is enough for -smooth approximation.

Recall that a triangular basis function has slope and is supported on an interval where . Accordingly, its moving average will be supported on . Since , the second derivative is when , is when , and is again when . This is enough to figure out the formula for :

These look like:

Nice! But wait a moment, the sum near the endpoints is not constant: it is less than 1 because we do not get a contributions of two splines to the left and right of the interval. To correct for this boundary effect, replace with and with , using “ghost” elements of the basis that lie outside of the actual grid. Now the quadratic B-spline basis is correct:

Does this partition of unity preserve monotinicity? Yes, it does: which is nonnegative because the sum is an increasing piecewise linear function, as noted previously. Same logic works for B-splines of higher degree.

In conclusion, here is a quadratic B-spline approximation (orange) to a tricky increasing function (blue).

One may wonder why the orange curve deviates from the line at the end – did we miss some boundary effect there? Yes, in a way… the spline actually approximates the continuous extension of our original function by constant values on the left and right. Imagine the blue graph continuing to the right as a horizontal line: this creates a corner at and the spline is smoothing that corner. To avoid this effect, one may want to extend in a better way and then work with the extended function, not folding the ghosts into .

But even so, B-spline achieves a better approximation than the Bernstein polynomial with the same number of basis functions (eight):

The reason is the non-local nature of the polynomial basis , which was noted above. Bernstein polynomials do match the function perfectly at the endpoints, but this is small consolation.

In Calculus I students are taught how to find the points at which the graph of a function has zero curvature (that is, the points of inflection). The points of maximal curvature are usually not discussed. This post attempts to rectify this omission.

The (signed) curvature of is

We want to maximize the absolute value of , whether the function is positive or negative there (so both maxima and minima of can be of interest). The critical points of are the zeros of

So we are lead to consider a polynomial of the first three derivatives of , namely .

Begin with some simple examples:

has so the curvature of a parabola is maximal at its vertex. No surprise there.

has , indicating two symmetric points of maximal curvature, , pretty close to the point of inflection.

has . This has three real roots, but actually minimizes curvature (it vanishes there).

More generally, with positive integer yields indicating two points which tend to as grows.

The graph of a polynomial of degree can have at most points of zero curvature, because the second derivative vanishes at those. How many points of maximal curvature can it have? The degree of expression above is but it is not obvious whether all of its roots can be real and distinct, and also be the maxima of (unlike for ). For we do get point of maximal curvature. But for , there can be at most such points, not . Edwards and Gordon (Extreme curvature of polynomials, 2004) conjectured that the graph of a polynomial of degree has at most points of maximal curvature. This remains open despite several partial results: see the recent paper Extreme curvature of polynomials and level sets (2017).

A few more elementary functions:

has , so the curvature is maximal at . Did I expect the maximum of curvature to occur for a negative ? Not really.

has . The first factor is irrelevant: the points of maximum curvature of a sine wave are at its extrema, as one would guess.

has which is zero at… ahem. The expression factors as

Writing we can get a cubic equation in , but it is not a nice one. Or we could do some trigonometry and reduce to the equation . Either way, a numerical solution is called for: (and for other periods).

The polynomially convex hull of a compact set is defined as the set of all points such that the inequality (a form of the maximum principle) holds for every polynomial . For example, the polynomially convex hull of a simple closed curve is the union of that curve with its interior region. In general, this process fills up the holes in the set , resulting in the complement of the unbounded connected component of .

We can recover the usual convex hull from this construction by restricting to the polynomials of first degree. Indeed, when is linear, the set is a closed disk, and we end up with the intersection of all closed disks that contain . This is precisely the convex hull of .

What if we restrict to the polynomials of degree at most ? Let’s call the resulting set the degree- convex hull of , denoted . By definition, it is contained in the convex hull and contains the polynomially convex hull. To exactly compute for general appears to be difficult even when .

Consider finite sets. When has at most points, we have because there is a polynomial of degree whose zero set is precisely . So, the first nontrivial case is of having points. Let us write .

Depending on the location of the points, may be strictly larger than . For example, if consists of the vertices of a regular -gon, then also contains its center. Here is why. By a linear transformation, we can make sure that where . For we have . Hence, for any polynomial of degree at most , the sum is equal to . This implies , a kind of a discrete maximum principle.

A more systematic approach is to use the Lagrange basis polynomials, that is , which satisfy . Since for any polynomial of degree at most , it follows that if and only if holds for every choice of scalars . The latter is equivalent to the inequality .

This leads us to consider the function , the sum of the absolute values of the Lagrange basis polynomials. (Remark: S is called a Lebesgue function for this interpolation problem.) Since , it follows that everywhere. By construction, on . At a point , the equality holds if and only if is the same for all .

In the trivial case , the function is equal to precisely on the linear segment with endpoints . Of course, this only repeats what we already knew: the degree-1 convex hull is the ordinary convex hull.

If with real and , then . Indeed, if , then lies in the convex hull of , and therefore for some . The basis polynomial is positive at , since it is equal to at and does not vanish outside of . Since a polynomial changes its sign at every simple zero, it follows that . Well, there is no if , but in that case, the same reasoning applies to . In any case, the conclusion is that cannot be the same for all .

At this point one might guess that the vertex set of a regular polygon is the only kind of finite sets that admit a nontrivial discrete maximum principle for polynomials. But this is not so: the vertices of a rhombus work as well. Indeed, if with , then for all , hence .

The vertices of a non-square rectangle do not work: if is the set of these vertices, the associated function is strictly greater than 1 on the complement of .

Are there any other finite sets that support a discrete maximum principle for polynomials?

Suppose is a holomorphic function in the unit disk such that in the disk. How large can its Taylor polynomial be in the disk?

We should not expect to be bounded by 1 as well. Indeed, the Möbius transformation has Taylor expansion , so in this case. This turns out to be the worst case: in general is bounded by 5/4 in the disk.

For the second-degree polynomial the sharp bound is , attained when ; the image of the unit circle under the extremal is shown below. Clearly, there is something nontrivial going on.

Edmund Landau established the sharp bound for in his paper Abschätzung der Koeffizientensumme einer Potenzreihe, published in Archiv der Mathematik und Physik (3) 21 in 1913. Confusingly, there are two papers with the same title in the same issue of the journal: one on pages 42-50, the other on pages 250-255, and they appear in different volumes of Landau’s Collected Works. The sharp bound is in the second paper.

First steps

By rotation, it suffices to bound , which is . As is often done, we rescale a bit so that it’s holomorphic in a slightly larger disk, enabling the use of the Cauchy integral formula on the unit circle . The Cauchy formula says . Hence

It is natural to use now, which leads to

Here we can use the geometric sum formula and try to estimate the integral of on the unit circle. This is what Landau does in the first of two papers; the result is which is the correct rate of growth (this is essentially the Dirichlet kernel estimate from the theory of Fourier series). But there is a way to do better and get the sharp bound.

Key ideas

First idea: the factor could be replaced by any polynomial as long as the coefficients of powers up to stay the same. Higher powers contribute nothing to the integral that evaluates , but they might reduce the integral of .

Second idea: we should choose to be the square of some polynomial, , because can be computed exactly: it is just the sum of squares of the coefficients of , by Parseval’s formula.

Implementation

Since is the -th degree Taylor polynomial of , it is natural to choose to be the -th degree Taylor polynomial of . Indeed, if , then as desired (asymptotics as ). The binomial formula tells us that

The coefficient of here can be written out as or rewritten as which shows that in lowest terms, its denominator is a power of 2. To summarize, is bounded by the sum of squares of the coefficients of . Such sums are referred to as the Landau constants,

A number of asymptotic and non-asymptotic formulas have been derived for , for example Brutman (1982) shows that is between 1 and 1.0663.

Sharpness

To demonstrate the sharpness of the bound , we want and on the unit circle. Both are arranged by taking which is a Blaschke product of degree . Note that the term can also be written as . Hence which is simply when . Equality holds in all the estimates above, so they are sharp.

Here are the images of the unit circle under extremal Taylor polynomials and .

These polynomials attain large values only on a short subarc of the circle; most of the time they oscillate at levels less than 1. Indeed, the mean value of cannot exceed the mean of which is at most 1. Here is the plot of the roots of extremal : they are nearly uniform around the circle, except for a gap near 1.

But we are not done…

Wait a moment. Does define a holomorphic function in the unit disk? We are dividing by here. Fortunately, has no zeros in the unit disk, because its coefficients are positive and decreasing as the exponent increases. Indeed, if with , then has constant term and other coefficients , , … , . Summing the absolute values of the coefficients of nonconstant terms we get . So, when these coefficients are attached to with , the sum of nonconstant terms is strictly less than in absolute value. This proves in the unit disk. Landau credits Adolf Hurwitz with this proof.

In fact, the zeros of (Taylor polynomials of ) lie just outside of the unit disk.

The zeros of the Blaschke products formed from are the reciprocals of the zeros of , so they lie just inside the unit circle, much like the zeros of (though they are different).

For every nonnegative integer there exists a (unique) polynomial of degree in and separately with the following reproducing property:

for every polynomial of degree at most , and for every . For example, ; other examples are found in the post Polynomial delta function.

This fact gives an explicit pointwise bound on a polynomial in terms of its integral on an interval:

where . For example, .

Although in principle could be any real or complex number, it makes sense to restrict attention to , where integration takes place. This leads to the search for extreme values of on the square . Here is how this function looks for :

The symmetries are evident here.

Explicitly,

where is the Legendre polynomial of degree and the factor is included to make the polynomials an orthonormal set in . Since oscillates between and , it follows that

and this bound is attained at because and .

Is

the minimum value of on the square ? Certainly not for even . Indeed, differentiating the sum

with respect to and using , we arrive at

which is negative if is even, ruling out this point as a minimum.

What about odd , then: is it true that on the square ?

: yes, is clear enough.

: the inequality is also true… at least numerically. It can be simplified to but I do not see a way forward from there. At least on the boundary of it can be shown without much work:

The quadratic term has no real roots, which is easy to check.

: similar story, the inequality appears to be true but I can only prove it on the boundary, using

The quartic term has no real roots, which is not so easy to check.

: surprisingly, which is about , disproving the conjectural bound . This fact is not at all obvious from the plot.

Questions:

Is on the square with some universal constant ?

Is the minimum of on always attained on the boundary of ?

Can be expressed in closed form, at least for small degrees ?

The choice of piecewise polynomials of degree 3 for interpolation is justifiably popular: even-degree splines are algebraically awkward to construct, degree 1 is simply piecewise linear interpolation (not smooth), and degree 5, while feasible, entails juggling too many coefficients. Besides, a cubic polynomial minimizes the amount of wiggling (the integral of second derivative squared) for given values and slopes at the endpoints of an interval. (Recall Connecting dots naturally.)

But the derivative of a cubic spline is a quadratic spline. And one needs the derivative to find the critical points. This results in an awkward example in SciPy documentation, annotated with “(NB: sproot only works for order 3 splines, so we fit an order 4 spline)”.

Although not implemented in SciPy, the task of computing the roots of a quadratic spline is a simple one. Obtaining the roots from the internal representation of a quadratic spline in SciPy (as a linear combination of B-splines) would take some work and reading. But a quadratic polynomial is determined by three values, so sampling it at three points, such as two consecutive knots and their average, is enough.

Quadratic formula with values instead of coefficients

Suppose we know the values of a quadratic polynomial q at -1, 0, 1, and wish to find if it has roots between -1 and 1. Let’s normalize so that q(0)=1, and let x = q(-1), y = q(1). If either x or y is negative, there is definitely a root on the interval. If they are positive, there is still a chance: we need the parabola to be concave up, have a minimum within [-1, 1], and for the minimum to be negative. All of this is easily determined once we note that the coefficients of the polynomial are a = (x+y)/2 – 1, b = (y-x)/2, and c = 1.

The inequality ensures the suitable sign of the discriminant. It describes a parabola with vertex (1, 1) and focus (2, 2), contained in the first quadrant and tangent to the axes at (4, 0) and (0, 4). Within the orange region there are no real roots.

The line x+y=2, tangent to the parabola at its vertex, separates convex and concave parabolas. While concavity in conjunction with x, y being positive definitely precludes having roots in [-1, 1], slight convexity is not much better: it results in real roots outside of the interval. Here is the complete picture: green means there is a root in [-1, 1], orange means no real roots, red covers the rest.

Back to splines

Since the derivative of a spline is implemented in SciPy (B-splines have a nice formula for derivatives), all we need is a root-finding routine for quadratic splines. Here it is, based on the above observations but using built-in NumPy polynomial solver np.roots to avoid dealing with various special cases for the coefficients.

def quadratic_spline_roots(spl):
roots = []
knots = spl.get_knots()
for a, b in zip(knots[:-1], knots[1:]):
u, v, w = spl(a), spl((a+b)/2), spl(b)
t = np.roots([u+w-2*v, w-u, 2*v])
t = t[np.isreal(t) & (np.abs(t) <= 1)]
roots.extend(t*(b-a)/2 + (b+a)/2)
return np.array(roots)

A demonstration, which plots the spline (blue), its critical points (red), and original data points (black) as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline
x = np.arange(7)
y = np.array([3, 1, 1, 2, 2, 4, 3])
f = InterpolatedUnivariateSpline(x, y, k=3)
crit_pts = quadratic_spline_roots(f.derivative())
t = np.linspace(x[0], x[-1], 500)
plt.plot(t, f(t))
plt.plot(x, y, 'kd')
plt.plot(crit_pts, f(crit_pts), 'ro')
plt.show()

In a 1946 paper Charles Pisot proved a theorem involving a curious constant . It can be defined as follows:

monic polynomial such that whenever

Equivalently, is determined by the requirement that the set have logarithmic capacity 1; this won’t be used here. The theorem is stated below, although this post is really about the constant.

Theorem: If an entire function takes integer values at nonnegative integers and is for some , then it is a finite linear combination of terms of the form , where each is an algebraic integer.

The value of is best possible; thus, in some sense Pisot’s theorem completed a line of investigation that began with a 1915 theorem by Pólya which had in place of , and where the conclusion was that is a polynomial. (Informally speaking, Pólya proved that is the “smallest” entire-function that is integer-valued on nonnegative integers.)

Although the constant was mentioned in later literature (here, here, and here), no further digits of it have been stated anywhere, as far as I know. So, let it be known that the decimal expansion of begins with 0.84383.

A lower bound on can be obtained by constructing a monic polynomial that is bounded by 1 on the set . Here is E(0.843):

It looks pretty round, except for that flat part on the left. In fact, E(0.82) is covered by a disk of unit radius centered at 1.3, which means that the choice shows .

How to get an upper bound on ? Turns out, it suffices to exhibit a monic polynomial that has all zeros in and satisfies on the boundary of . The existence of such shows . Indeed, suppose that is monic and on . Consider the function . By construction on the boundary of . Also, is subharmonic in its complement, including , where the singularities of both logarithms cancel out, leaving . This contradicts the maximum principle for subharmonic functions, according to which cannot exceed the maximum of on the boundary.

The choice of works for .

So we have boxed between 0.82 and 0.89; how to get more precise bounds? I don’t know how Pisot achieved the precision of 0.843… it’s possible that he strategically picked some linear and quadratic factors, raised them to variable integer powers and optimized the latter. Today it is too tempting to throw some optimization routine on the problem and let it run for a while.

But what to optimize? The straightforward approach is to minimize the maximum of on the circle , approximated by sampling the function at a sufficiently fine uniform grid and picking the maximal value. This works… unspectacularly. One problem is that the objective function is non-differentiable. Another is that taking maximum throws out a lot of information: we are not using the values at other sample points to better direct the search. After running optimization for days, trying different optimization methods, tolerance options, degrees of the polynomial, and starting values, I was not happy with the results…

Turns out, the optimization is much more effective if one minimizes the variance of the set . Now we are minimizing a polynomial function of , which pushes them toward having the same absolute value — the behavior that we want the polynomial to have. It took from seconds to minutes to produce the polynomials shown below, using BFGS method as implemented in SciPy.

As the arguments for optimization function I took the real and imaginary parts of the zeros of the polynomial. The symmetry about the real axis was enforced automatically: the polynomial was the product of quadratic terms . This eliminated the potentially useful option of having real zeros of odd order, but I did not feel like special-casing those.

Again, nearly the same polynomial works for upper and lower bounds. The fact that the absolute value of each of these polynomials is below 1 (for lower bounds) or greater than 1 (for upper bounds) can be ascertained by sampling them and using an upper estimate on the derivative; there is enough margin to trust computations with double precision.

Finally, the Python script I used. The function “obj” is getting minimized while function “values” returns the actual values of interest: the minimum and maximum of polynomial. The degree of polynomial is 2n, and the radius under consideration is r. The sample points are collected in array s. To begin with, the roots are chosen randomly. After minimization runs (inevitably, ending in a local minimum of which there are myriads), the new starting point is obtained by randomly perturbing the local minimum found. (The perturbation is smaller if minimization was particularly successful.)

import numpy as np
from scipy.optimize import minimize
def obj(r):
rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
p = np.prod(np.abs(s-rc)**2, axis=0)
return np.var(p)
def values(r):
rc = np.concatenate((r[:n]+1j*r[n:], r[:n]-1j*r[n:])).reshape(-1,1)
p = np.prod(np.abs(s-rc), axis=0)
return [np.min(p), np.max(p)]
r = 0.84384
n = 10
record = 2
s = np.exp(r * np.exp(1j*np.arange(0, np.pi, 0.01)))
xr = np.random.uniform(0.8, 1.8, size=(n,))
xi = np.random.uniform(0, 0.7, size=(n,))
x0 = np.concatenate((xr, xi))
while True:
res = minimize(obj, x0, method = 'BFGS')
if res['fun'] < record:
record = res['fun']
print(repr(res['x']))
print(values(res['x']))
x0 = res['x'] + np.random.uniform(-0.001, 0.001, size=x0.shape)
else:
x0 = res['x'] + np.random.uniform(-0.05, 0.05, size=x0.shape)

The function has a curious property: for any linear function , and any point , the integral evaluates to . This is easy to check using the fact that odd powers of integrate to zero:

More generally, for any integer there exists a unique symmetric polynomial that has degree in and separately and satisfies for all polynomials of degree at most . For example, (obviously) and

The formula is not really intuitive, and a 3d plot would not help the matter much. To visualize , I plotted , , and below (green, red, blue respectively).

For and large , the function approaches the Dirac delta at , although the convergence is slow, especially when is close to . I don’t think there is anything good to be said about the case .

The existence and uniqueness of are a consequence of the Riesz representation of linear functionals on an inner product space. Indeed, polynomials of degree at most form such a space with inner product , and the functional is linear for any fixed . Hence, this functional can be written as for some . The function is a reproducing kernel for this space. Its symmetry is not immediately obvious.

The Legendre polynomials are an orthogonal basis of ; more precisely, form an orthonormal basis. It’s a general fact about reproducing kernels that

(which, incidentally, proves the symmetry ). Indeed, taking this sum as the definition of and writing , we find

This is the Sage code used for the above plots.

n = 20
k = sum([(j+1/2)*legendre_P(j,x)*legendre_P(j,y) for j in range(0,n+1)])
plot(k(x,y=-3/4),(x,-1,1),color='green') + plot(k(x,y=0),(x,-1,1),color='red') + plot(k(x,y=1/2),(x,-1,1),color='blue')

If one picks two real numbers from the interval (independent, uniformly distributed), their sum has the triangular distribution.

The sum of three such numbers has a differentiable probability density function:

And the density of is smoother still: the p.d.f. has two
continuous derivatives.

As the number of summands increases, these distributions converge to normal if they are translated and scaled properly. But I am not going to do that. Let’s keep the number of summands to four at most.

The p.d.f. of is a piecewise polynomial of degree . Indeed, for the density is piecewise constant, and the formula

provides the inductive step.

For each , the translated copies of function form a partition of unity:

The integral recurrence relation gives an easy proof of this:

And here is the picture for the quadratic case:

A partition of unity can be used to approximate functions by piecewise polynomials: just multiply each partition element by the value of the function at the center of the corresponding interval, and add the results.

Doing this with amounts to piecewise linear interpolation: the original function is in blue, the weighted sum of hat functions in red.

With we get a smooth curve.

Unlike interpolating splines, this curve does not attempt to pass through the given points exactly. However, it has several advantages over interpolating splines:

Is easier to calculate; no linear system to solve;