Unpopular positive opinion challenge

Challenge accepted. I got three, though none are below 40%.

The Yellow Birds (2018), 45% on RT

The market isn’t so hot for Iraq war movies. And it’s nearly impossible to adapt such an introspective novel into film. I still respect the effort and its outcome, even if all references to the normal distribution got left out of it.

I spent a lot of time trying to identify the exact point at which I noticed a change in Murph, somehow thinking that if I could figure out where he had begun to slide down the curve of the bell that I could do something about it. But these are subtle shifts, and trying to distinguish them is like trying to measure the degrees of gray when evening comes. It’s impossible to identify the cause of anything, and I began to see the war as a big joke, for how cruel it was, for how desperately I wanted to measure the particulars of Murph’s new, strange behavior and trace it back to one moment, to one cause, to one thing I would not be guilty of. And I realized very suddenly one afternoon while throwing rocks into a bucket in a daze that the joke was in fact on me. Because how can you measure deviation if you don’t know the mean? There was no center in the world. The curves of all our bells were cracked.

(From The Yellow Birds by Kevin Powers)

Hearts in Atlantis (2001), 49% on RT

Two actors with (essentially) the same first name and over 50 years of age difference (Anthony Hopkins 1937-, Anton Yelchin 1989-2016) make this Stephen King adaptation well worth watching.

He made another circuit of his room, working the tingles out of his legs, feeling like a prisoner pacing his cell. The door had no lock on it—no more than his mom’s did—but he felt like a jailbird just the same. He was afraid to go out. She hadn’t called him for supper, and although he was hungry—a little, anyway—he was afraid to go out. He was afraid of how he might find her… or of not finding her at all. Suppose she had decided she’d finally had enough of Bobby-O, stupid lying little Bobby-O, his father’s son? Even if she was here, and seemingly back to normal… was there even such a thing as normal? People had terrible things behind their faces sometimes. He knew that now.

(From Low Men in Yellow Coats by Stephen King)

Maze Runner: The Death Cure (2018), 43% on RT

Sure, it’s not as good as the first film in the series (which does not qualify for the challenge, scoring 65% on RT), but a major improvement on the mindless zombie chases of the second part. I like to think of it as a parable illustrating ethical issues in public health… allowing for the customary movie-science vs actual-science differences.

Measuring nonlinearity and reducing it

How to measure the nonlinearity of a function {f\colon I\to \mathbb R} where {I\subset \mathbb R} is an interval? A natural way is to consider the smallest possible deviation from a line {y=kx+b}, that is {\inf_{k, b}\sup_{x\in I}|f(x)-kx-b|}. It turns out to be convenient to divide this by {|I|}, the length of the interval {I}. So, let {\displaystyle NL(f;I) = \frac{1}{|I|} \inf_{k, b}\sup_{x\in I}|f(x)-kx-b|}. (This is similar to β-numbers of Peter Jones, except the deviation from a line is measured only in the vertical direction.)

NL(f; I) is the maximal vertical distance between red and black, divided by the length of I

Relation with derivatives

The definition of derivative immediately implies that if {f'(a)} exists, then {NL(f;I)\to 0} as {I} shrinks to {a} (that is, gets smaller while containing {a}). A typical construction of a nowhere differentiable continuous function is based on making {NL(f;I)} bounded from below; it is enough to do this for dyadic intervals, and that can be done by adding wiggly terms like {2^{-n}\mathrm{dist}\,(x, 2^{-n}\mathbb Z)}: see the blancmange curve.

The converse is false: if {NL(f; I)\to 0} as {I} shrinks to {a}, the function {f} may still fail to be differentiable at {a}. The reason is that the affine approximation may have different slopes at different scales. An example is {f(x)=x \sin \sqrt{-\log |x|}} in a neighborhood of {0}. Consider a small interval {[-\delta, \delta]}. The line {y = kx} with {k=\sin\sqrt{-\log \delta}} is a good approximation to {f} because {f(x)/x\approx k} on most of the interval except for a very small part near {0}, and on that part {f} is very close to {0} anyway.

Why the root of logarithm? Because {\sin \log |x|} has a fixed amount of change on a fixed proportion of  {[-\delta, \delta]}, independently of {\delta}. We need a function slower than the logarithm, so that as {\delta} decreases, there is a smaller amount of change on a larger part of the interval {[-\delta, \delta]}.

Nonlinearity of Lipschitz functions

Suppose {f} is a Lipschitz function, that is, there exists a constant {L} such that {|f(x)-f(y)|\le L|x-y|} for all {x, y\in I}. It’s easy to see that {NL(f;I)\le L/2}, by taking the mid-range approximation {y=\frac12 (\max_I f + \min_I f)}. But the sharp bound is {NL(f;I)\le L/4} whose proof is not as trivial. The sharpness is shown by {f(x)=|x|} with {I=[-1,1]}.

With the maximum difference 1/2 and the length of interval 2, we get NL(f; [-1, 1]) = 1/4
Proof. Let {k} be the slope of the linear function that agrees with {f} at the endpoints of {I}. Subtracting this linear function from {f} gives us a Lipschitz function {g} such that {-L-k\le g'\le L-k} and {\int_I g'= 0}. Let {A = \int_I (g')^+ = \int_I (g')^-}. Chebyshev’s inequality gives lower bounds for the measures of the sets {g'>0} and {g'<0}: namely, {|g'>0|\ge A/(L-k)} and {|g'<0|\le A/(L+k)}. By adding these, we find that {|I| \ge 2LA/(L^2-k^2)\ge 2A/L}. Since {\max _I g - \min_I g \le A}, the mid-range approximation to {g} has error at most {A/2 \le |I|L/4}. Hence {NL(f; I) = NL(g; I) \le L/4}.

Reducing nonlinearity

Turns out, the graph of every Lipschitz function has relatively large almost-flat pieces.  That is, there are subintervals of nontrivial size where the measure of nonlinearity is much smaller than the Lipschitz constant. This result is a special (one-dimensional) case of Theorem 2.3 in Affine approximation of Lipschitz functions and nonlinear quotients by Bates, Johnson, Lindenstrauss, Preiss, and Schechtman.

Theorem AA (for “affine approximation”): For every {\epsilon>0} there exists {\delta>0} with the following property. If {f\colon I\to \mathbb R} is an {L}-Lipschitz function, then there exists an interval {J\subset I} with {|J|\ge \delta |I|} and {NL(f; J)\le \epsilon L}.

Theorem AA should not be confused with Rademacher’s theorem which says that a Lipschitz function is differentiable almost everywhere. The point here is a lower bound on the size of the interval {J}. Differentiability does not provide that. In fact, if we knew that {f} is smooth, or even a polynomial, the proof of Theorem AA would not become any easier.

Proof of Theorem AA

We may assume {I=[-1, 1]} and {L=1}. For {t\in (0, 2]} let {L(t) = \sup \{|f(x)-f(y)|/|x-y| \colon x, y\in I, \ |x-y|\ge t\}}. That is, {L(t)} is the restricted Lipschitz constant, one that applies for distances at least {t}. It is a decreasing function of {t}, and {L(0+)=1}.

Note that {|f(-1)-f(1)|\le 2L(1)} and that every value of {f} is within {2L(1)} of either {f(-1)} or {f(1)}. Hence, the oscillation of {f} on {I} is at most {6L(1)}. If {L(1) \le \epsilon/3}, then the constant mid-range approximation on {I} gives the desired conclusion, with {J=I}. From now on {L(1) > \epsilon/3}.

The sequence {L_k = L(4^{-k})} is increasing toward {L(0+)=1}, which implies {L_{k+1}\le (1+\epsilon) L_k} for some {k}. Pick an interval {[a, b]\subset I} that realizes {L_k}, that is {b-a\ge 4^{-k}} and {|f(b)-f(a)| = 4^{-k}L_k}. Without loss of generality {f(b)>f(a)} (otherwise consider {-f}). Let {J = [(3a+b)/4, (a+3b)/4]} be the middle half of {[a. b]}. Since each point of {J} is within distance {\ge 4^{-k-1}} of both {a} and {b}, it follows that {\displaystyle f(b) + L_{k+1}(x-b) \le f(x) \le f(a) + L_{k+1}(x-a) } for all {x \in J}.

The green lines have slope L_{k+1} which is close to the slope L_k of the secant line through (a, f(a)) and (b, f(b)). The graph of f is pinched between these green lines, except near a or b.

So far we have pinched {f} between two affine functions of equal slope. Let us consider their difference:
{\displaystyle (f(a) + L_{k+1}(x-a)) - (f(b) + L_{k+1}(x-b)) = (L_{k+1}-L_k) (b-a)}. Recall that {L_{k+1}\le (1+\epsilon) L_k}, which gives a bound of {\epsilon L_k(b-a) \le 2\epsilon L |J|} for the difference. Approximating {f} by the average of the two affine functions we conclude that {NL(f;J)\le \epsilon L} as required.

It remains to consider the size of {J}, about which we only know {|J|\ge 4^{-k}/2} so far. Naturally, we want to take the smallest {k} such that {L_{k+1}\le (1+\epsilon) L_k} holds. Let {m} be this value; then {L_m > (1+\epsilon)^{m} L_0}. Here {L_m\le 1} and {L_0 = L(1)> \epsilon/3 }. The conclusion is that {(1+\epsilon)^m < 3/\epsilon}, hence {m< \log(3/\epsilon)/\log(1+\epsilon)}. This finally yields {\displaystyle \delta = 4^{-\log(3/\epsilon)/\log(1+\epsilon)}/2} as an acceptable choice, completing the proof of Theorem AA.

A large amount of work has been done on quantifying {\delta} in various contexts; for example Heat flow and quantitative differentiation by Hytönen and Naor.

Laplacian spectrum of small graphs

This is a collection of entirely unoriginal remarks about Laplacian spectrum of graphs. For an accessible overview of the subject I recommend the M.S. thesis The Laplacian Spectrum of Graphs by Michael William Newman. It also includes a large table of graphs with their spectra. Here I will avoid introducing matrices and enumerating vertices.

Let {V} be the vertex set of a graph. Write {u\sim v} if {u, v} are adjacent vertices. Given a function {f\colon V\to \mathbb R}, define {L f(v) = \sum_{u\colon u\sim v}(f(v)-f(u))}.
This is a linear operator (the graph Laplacian) on the Euclidean space {\ell^2(V)} of all functions {f\colon V\to \mathbb R} with the norm {\|f\|^2 = \sum_{v\in V} f(v)^2}. It is symmetric: {\langle L f, g\rangle = \langle f, L g\rangle } and positive semidefinite: {\langle L f, f\rangle = \frac12 \sum_{u\sim v}(f(u)-f(v))^2\ge 0}. Since equality is attained for constant {f}, 0 is always an eigenvalue of {L}.

This is the standard setup, but I prefer to change things a little and replace {\ell^2(V)} by the smaller space {\ell^2_0(V)} of functions with zero mean: {\sum_{v\in V}f(v)=0}. Indeed, {L} maps {\ell^2(V)} to {\ell^2_0(V)} anyway, and since it kills the constants, it makes sense to focus on {\ell^2_0(V)}. It is a vector space of dimension {n-1} where {n=|V|}.

One advantage is that the smallest eigenvalue is 0 if and only if the graph is disconnected: indeed, {\langle L f, f\rangle=0} is equivalent to {f} being constant on each connected component. We also gain better symmetry between {L} and the Laplacian of the graph complement, denoted {L'}. Indeed, since {L' f(v) = \sum_{u\colon u\not \sim v}(f(v)-f(u))}, it follows that {(L+L')f(v) = \sum_{u\colon u\ne v} (f(v)-f(u)) = n f(v)} for every {f\in \ell^2_0(V)}. So, the identity {L+L' = nI} holds on {\ell^2_0(V)} (it does not hold on {\ell^2(V)}). Hence the eigenvalues of {L'} are obtained by subtracting the eigenvalues of {L} from {n}. As a corollary, the largest eigenvalue of {L} is at most {n}, with equality if and only if the graph complement is disconnected. More precisely, the multiplicity of eigenvalue {n} is one less than the number of connected components of the graph complement.

Let {D} denote the diameter of the graph. Then the number of distinct Laplacian eigenvalues is at least {D}. Indeed, let {u, v} be two vertices at distance {D} from each other. Define {f_0(u) = 1} and {f_0=0} elsewhere. Also let {f_k=L^k f_0} for {k=1, 2, \dots}. Note that {f_k\in \ell_0^2(V)} for all {k\ge 1}. One can prove by induction that {f_k(w)=0} when the distance from {w} to {u} is greater than {k}, and {(-1)^k f_k(w) > 0} when the distance from {w} to {u} is equal to {k}. In particular, {f_k(v) = 0} when {k<D} and {f_D(v)\ne 0}. This shows that {f_D} is not a linear combination of {f_1, \dots, f_{D-1}}. Since {f_k = L^{k-1}f_1}, it follows that {L^{D-1}} is not a linear combination of {L^0, L^1, \dots, L^{D-2}}. Hence the minimal polynomial of {L} has degree at least {D}, which implies the claim.

Let’s consider a few examples of connected graphs.

3 vertices

There are two connected graphs: the 3-path (D=2) and the 3-cycle (D=1). In both cases we get D distinct eigenvalues. The spectra are [1, 3] and [3, 3], respectively.

4 vertices

  • One graph of diameter 3, the path. Its spectrum is {[2-\sqrt{2}, 2, 2+\sqrt{2}]}.
  • One graph of diameter 1, the complete graph. Its spectrum is {[4, 4, 4]}. This pattern continues for other complete graphs: since the complement is the empty graph ({n} components), all {n-1} eigenvalues are equal to {n}.
  • Four graphs of diameter 2, which are shown below, with each caption being the spectrum.
1, 1, 4
1, 3, 4
2, 2, 4
2, 4, 4


  • The graph [1, 3, 4] has more distinct eigenvalues than its diameter.
  • The graph [2, 2, 4] is regular (all vertices have the same degree).
  • The smallest eigenvalue of graphs [1, 1, 4] and [2, 2, 4] is multiple, due to the graphs having a large group of automorphisms (here rotations); applying some of these automorphisms to an eigenfunctions for the smallest eigenvalue yields another eigenfunction.
  • [1, 3, 4] and [2, 4, 4] also have automorphisms, but their automorphisms preserve the eigenfunction for the lowest eigenvalue, up to a constant factor.

5 vertices

  • One graph of diameter 4, the path. Its spectrum is related to the golden ratio: it consists of {(3\pm \sqrt{5})/2, (5\pm \sqrt{5})/2}.
  • One graph of diameter 1, the complete one: [5, 5, 5, 5]
  • Five graphs of diameter 3. All have connected complement, with the highest eigenvalue strictly between 4 and 5. None are regular. Each has 4 distinct eigenvalues.
  • 14 graphs of diameter 2. Some of these are noted below.

Two have connected complement, so their eigenvalues are less than 5 (spectrum shown on hover):

One has both integers and non-integers in its spectrum, the smallest such graph. Its eigenvalues are {3\pm \sqrt{2}, 3, 5}.

1.585786, 3, 4.414214, 5

Two have eigenvalues of multiplicity 3, indicating a high degree of symmetry (spectrum shown on hover).

Two have all eigenvalues integer and distinct:


The 5-cycle and the complete graph are the only regular graphs on 5 vertices.

6 vertices

This is where we first encounter isospectral graphs: the Laplacian spectrum cannot tell them apart.

Both of these have spectrum {3\pm \sqrt{5}, 2, 3, 3} but they are obviously non-isomorphic (consider the vertex degrees):

Both have these have spectrum {3\pm \sqrt{5}, 3, 3, 4} and are non-isomorphic.

Indeed, the second pair is obtained from the first by taking graph complement.

Also notable are regular graphs on 6 vertices, all of which have integer spectrum.

1, 1, 3, 3, 4
3, 3, 3, 3, 6
2, 3, 3, 5, 5
4, 4, 4, 6, 6
6, 6, 6, 6, 6

Here [3, 3, 3, 3, 6] (complete bipartite) and [2, 3, 3, 5, 5] (prism) are both regular of degree 3, but the spectrum allows us to tell them apart.

The prism is the smallest regular graph for which the first eigenvalue is a simple one. It has plenty of automorphisms, but the relevant eigenfunction (1 on one face of the prism, -1 on the other face) is compatible with all of them.

7 vertices

There are four regular graphs on 7 vertices. Two of them are by now familiar: 7-cycle and complete graph. Here are the other two, both regular of degree 4 but with different spectra.

3, 4, 4, 5, 5, 7
3.198, 3.198, 4.555, 4.555, 6.247, 6.247

There are lots of isospectral pairs of graphs on 7 vertices, so I will list only the isospectral triples, of which there are five.

Spectrum 0.676596, 2, 3, 3.642074, 5, 5.681331:

Spectrum 0.726927, 2, 3.140435, 4, 4, 6.132637:

Spectrum 0.867363, 3, 3, 3.859565, 5, 6.273073:

Spectrum 1.318669, 2, 3.357926, 4, 5, 6.323404:

All of the triples mentioned so far have connected complement: for example, taking the complement of the triple with the spectrum [0.676596, 2, 3, 3.642074, 5, 5.681331] turns it into the triple with the spectrum [1.318669, 2, 3.357926, 4, 5, 6.323404].

Last but not least, an isospectral triple with an integer spectrum: 3, 4, 4, 6, 6, 7. This one has no counterpart since the complement of each of these graphs is disconnected.

8 vertices

Regular graphs, excluding the cycle (spectrum 0.585786, 0.585786, 2, 2, 3.414214, 3.414214, 4) and the complete one.

Degree 3 regular:

2, 2, 2, 4, 4, 4, 6
0.763932, 2, 4, 4, 4, 4, 5.236068
1.438447, 2.381966, 2.381966, 3, 4.618034, 4.618034
1.267949, 2, 2.585786, 4, 4, 4.732051,
2, 2, 2.585786, 2.585786, 4, 5.414214,

Degree 4 regular

4, 4, 4, 4, 4, 4, 8
2.763932, 4, 4, 4, 4, 6, 7.236068
2.438447, 3.381966, 3.381966, 5, 5.618034, 5.618034, 6.561553
2.585786, 3.267949, 4, 4, 5.414214, 6, 6.732051
2.585786, 2.585786, 4, 5.414214, 5.414214, 6, 6
2, 4, 4, 4, 6, 6, 6

Degree 5 regular

4.381966, 4.381966, 5, 5, 6.618034, 6.618034, 8
4, 4, 6, 6, 6, 6, 8
4, 4.585786, 4.585786, 6, 6, 7.414214, 7.414214

Degree 6 regular

6, 6, 6, 6, 8, 8, 8

Credits: the list of graphs by Brendan McKay and NetworkX library specifically the methods read_graph6 (to read the files provided by Prof. McKay), laplacian_spectrum, diameter, degree, and draw.

Extremal Taylor polynomials

Suppose {f(z)=a_0+a_1z+a_2z^2+\cdots} is a holomorphic function in the unit disk {|z|<1} such that {|f|\le 1} in the disk. How large can its Taylor polynomial {T_n(z)=a_0+a_1z+\cdots +a_n z^n} be in the disk?

We should not expect {T_n} to be bounded by 1 as well. Indeed, the Möbius transformation {f(z)=(z+1/2)/(1+z/2)} has Taylor expansion {(z+1/2)(1-z/2+O(z^2)) = 1/2 + (3/4)z + O(z^2)}, so {T_1(1)=5/4} in this case. This turns out to be the worst case: in general {T_1} is bounded by 5/4 in the disk.

For the second-degree polynomial {T_2} the sharp bound is {89/64}, attained when {f(z) = (8z^2 + 4z + 3)/(3z^2 + 4z + 8)}; the image of the unit circle under the extremal {T_2} is shown below. Clearly, there is something nontrivial going on.

Extremal T_2 attains 89/64 > 1.39

Edmund Landau established the sharp bound for {|T_n|} in his paper Abschätzung der Koeffizientensumme einer Potenzreihe, published in Archiv der Mathematik und Physik (3) 21 in 1913. Confusingly, there are two papers with the same title in the same issue of the journal: one on pages 42-50, the other on pages 250-255, and they appear in different volumes of Landau’s Collected Works. The sharp bound is in the second paper.

First steps

By rotation, it suffices to bound {|T_n(1)|}, which is {|a_0+\cdots +a_n|}. As is often done, we rescale {f} a bit so that it’s holomorphic in a slightly larger disk, enabling the use of the Cauchy integral formula on the unit circle {\mathbb T}. The Cauchy formula says {2\pi i a_k = \int_{\mathbb T} z^{-k-1} f(z) \,dz}. Hence

{\displaystyle 2\pi |T_n(1)| = \left| \int_{\mathbb T} z^{-n-1}(1+z+\dots+z^n) f(z) \,dz \right|}

It is natural to use {|f(z)|\le 1} now, which leads to

{\displaystyle 2\pi |T_n(1)| \le \int_{\mathbb T} |1+z+\dots+z^n|\, |dz| }

Here we can use the geometric sum formula and try to estimate the integral of {|(1-z^{n+1})/(1-z)|} on the unit circle. This is what Landau does in the first of two papers; the result is {O(\log n)} which is the correct rate of growth (this is essentially the Dirichlet kernel estimate from the theory of Fourier series). But there is a way to do better and get the sharp bound.

Key ideas

First idea: the factor {1+z+\dots+z^n} could be replaced by any polynomial {Q} as long as the coefficients of powers up to {n} stay the same. Higher powers contribute nothing to the integral that evaluates {T_n(1)}, but they might reduce the integral of {|Q|}.

Second idea: we should choose {Q} to be the square of some polynomial, {Q=P^2}, because {(2\pi)^{-1}\int_{\mathbb T} |P(z)|^2\, |dz|} can be computed exactly: it is just the sum of squares of the coefficients of {P}, by Parseval’s formula.


Since {1+z+\dots+z^n} is the {n}-th degree Taylor polynomial of {(1-z)^{-1}}, it is natural to choose {P} to be the {n}-th degree Taylor polynomial of {(1-z)^{-1/2}}. Indeed, if {P_n(z) = (1-z)^{-1/2} + O(z^{n+1})}, then {P_n(z)^2 = (1-z)^{-1} + O(z^{n+1}) = 1+z+\dots+z^n + O(z^{n+1})} as desired (asymptotics as {z\to 0}). The binomial formula tells us that
{\displaystyle P_n(z)=\sum_{k=0}^n (-1)^k\binom{-1/2}{k}z^k }

The coefficient of {z^k} here can be written out as {(2k-1)!!/(2k)!!} or rewritten as {4^{-k}\binom{2k}{k}} which shows that in lowest terms, its denominator is a power of 2. To summarize, {|T_n(1)|} is bounded by the sum of squares of the coefficients of {P_n}. Such sums are referred to as the Landau constants,

{\displaystyle G_n = 1+ \left(\frac{1}{2}\right)^2 + \left(\frac{1\cdot 3}{2\cdot 4}\right)^2 + \cdots + \left(\frac{(2n-1)!!}{(2n)!!}\right)^2 }

A number of asymptotic and non-asymptotic formulas have been derived for {G_n}, for example Brutman (1982) shows that {G_n - (1/\pi)\log(n+1)} is between 1 and 1.0663.


To demonstrate the sharpness of the bound {|T_n|\le G_n}, we want {|f|\equiv 1} and {P_n(z)^2f(z)/z^n\ge 0} on the unit circle. Both are arranged by taking {f(z) = z^n P_n(1/z) / P_n(z)} which is a Blaschke product of degree {n}. Note that the term {P_n(1/z)} can also be written as {\overline{P_n(1/\bar z)}}. Hence {P_n(z)^2f(z)/z^n = P_n(z) \overline{P_n(1/\bar z)}} which is simply {|P_n(z)|^2} when {|z|=1}. Equality holds in all the estimates above, so they are sharp.

Here are the images of the unit circle under extremal Taylor polynomials {T_5} and {T_{20}}.

Extremal Taylor polynomial of 5th degree
Extremal Taylor polynomial of 20th degree

These polynomials attain large values only on a short subarc of the circle; most of the time they oscillate at levels less than 1. Indeed, the mean value of {|T_n|^2} cannot exceed the mean of {|f|^2} which is at most 1. Here is the plot of the roots of extremal {T_n}:  they are nearly uniform around the circle, except for a gap near 1.

Roots of extremail T_10
Roots of extremal T_20

But we are not done…

Wait a moment. Does {f(z) = z^n P_n(1/z) / P_n(z)} define a holomorphic function in the unit disk? We are dividing by {P_n} here. Fortunately, {P_n} has no zeros in the unit disk, because its coefficients are positive and decreasing as the exponent {k} increases. Indeed, if {p(z)=c_0+c_1z+\cdots + c_nz^n} with {c_0>c_1>\dots>c_n > 0}, then {(1-z)p(z)} has constant term {c_0} and other coefficients {c_1-c_0}, {c_2-c_1}, … {c_n-c_{n-1}}, {-c_n}. Summing the absolute values of the coefficients of nonconstant terms we get {c_0}. So, when these coefficients are attached to {z^k} with {|z|<1}, the sum of nonconstant terms is strictly less than {c_0} in absolute value. This proves {P_n\ne 0} in the unit disk. Landau credits Adolf Hurwitz with this proof.

In fact, the zeros of {P_n} (Taylor polynomials of {(1-z)^{-1/2}}) lie just outside of the unit disk.

Zeros of P_20
Zeros of P_50

The zeros of the Blaschke products formed from {P_n} are the reciprocals of the zeros of  {P_n}, so they lie just inside the unit circle, much like the zeros of {T_n} (though they are different).

The distribution of pow(2, n, n)

For a positive integer {n} let {f(n)} be the remainder of the division of {2^n} by {n}. This is conveniently computed in Python as pow(2, n, n). For example, the first 20 values are returned by

[pow(2, n, n) for n in range(1, 21)]

and they are:

[0, 0, 2, 0, 2, 4, 2, 0, 8, 4, 2, 4, 2, 4, 8, 0, 2, 10, 2, 16]

Obviously, {f(n)=0} if and only if {n} is a power of {2}. It also looks like {f} is always even, with powers of 2 dominating the list of its values. But {f} does take on odd values, although this does not happen often: only 6 times in the first 100 integers.

[(n, pow(2, n, n)) for n in range(1, 101) if pow(2, n, n) % 2]


[(25, 7), (45, 17), (55, 43), (91, 37), (95, 13), (99, 17)]

It is conjectured that the range of {f} consists of all nonnegative integers except 1. Here is a proof that {f(n)\ne 1}, provided by Max Alexeyev on the OEIS page for A036236

Suppose {2^n \equiv 1 \bmod n} for some {n>1}. Let {p} be the smallest prime divisor of {n}. Then {2^n\equiv 1 \bmod p}. This means that the order of {2} in the multiplicative group {(\mathbb Z/ p \mathbb Z)^*} is a divisor of {n}. But this group has {p-1} elements, and the only divisor of {n} that is smaller than {p} is {1}. Thus, the order of {2} is {1}, which means {2\equiv 1 \bmod p}, which is absurd.

The fact that the smallest {n} with {f(n) = 3} is 4700063497 deserves to be mentioned here. But I would like to consider the most frequent values of {f} instead. Experimentally, they are found like this:

from collections import Counter
freq = Counter(pow(2, n, n) for n in range(1, 100000000 + 1))

which prints 20 pairs (value, frequency):

(2, 5763518),
(4, 3004047),
(8, 2054167), 
(16, 1569725), 
(64, 1076293), 
(256, 824438), 
(512, 737450), 
(1024, 668970), 
(32, 638063), 
(4096, 569705), 
(128, 466015), 
(65536, 435306), 
(262144, 389305), 
(1048576, 351723), 
(2097152, 326926), 
(16384, 246533), 
(16777216, 243841), 
(32768, 232037), 
(2048, 153537), 
(8192, 131614)

These are all powers of 2… but not exactly in the order one might expect. I repeated the experiment for the first {10^8k} integers with {k=1, 2, 3, 4, 5, 6}. The frequency order remained stable at the top. These are the most common 11 values, with their frequency (in %) listed for the aforementioned six ranges.

2, [5.8, 5.5, 5.4, 5.3, 5.3, 5.2]
4, [3.0, 2.9, 2.8, 2.8, 2.7, 2.7]
8, [2.1, 2.0, 1.9, 1.9, 1.9, 1.8]
16, [1.6, 1.5, 1.5, 1.4, 1.4, 1.4]
64, [1.1, 1.0, 1.0, 1.0, 1.0, 1.0]
256, [0.8, 0.8, 0.8, 0.8, 0.7, 0.7]
512, [0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
1024, [0.7, 0.6, 0.6, 0.6, 0.6, 0.6]
32, [0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
4096, [0.6, 0.5, 0.5, 0.5, 0.5, 0.5]
128, [0.5, 0.4, 0.4, 0.4, 0.4, 0.4]

It seems that 32 and 128 are consistently less common than one might expect.

The most common value, 2, is contributed by primes and pseudoprimes to base 2. The value of 4 appears when {n = 2p} with {p} prime (but not only then). Still, the primes having zero density makes it difficult to explain the frequency pattern based on primality. A more convincing reason could be: when {n=2k} is even, we are computing {4^k \bmod 2k} and that is likely to be a power of {4}. This boosts the frequency of the even (generally, composite) powers of 2 in the sequence.

Magic angles

The function {u(x, y, z) = 2z^2 - x^2 - y^2} is notable for the following combination of properties:

  1. It is a harmonic function: {\Delta u = -2 - 2 + 4 = 0}
  2. It vanishes at the origin {(0, 0, 0)} together with its gradient.
  3. It is positive on the cone {C=\{(x, y, z) : z > \sqrt{(x^2+y^2)/2}\}}

The cone C has the opening angle {\theta_3 = \cos^{-1}(1/\sqrt{3}) \approx 57.4^\circ} which is known as the magic angle in the context of NMR spectroscopy. Let us consider the mathematical side of its magic.

If C is replaced by any larger cone, the properties 1-2-3 cannot be satisfied by a harmonic function in a neighborhood of the origin. That is, C is the largest cone such that a harmonic function can have a critical point at its vertex which is also a point of its extremum on the cone. Why is that?

Let {u} be a harmonic function in some neighborhood of {(0,0,0)} and suppose {u(0,0,0)=0}, {\nabla u(0,0,0) = 0}, and {u>0} on some cone {C = \{(x, y, z) \colon z>c \sqrt{x^2+y^2+z^2} \}}. Expand {u} into a sum of polynomials {p_2+p_3+\cdots } where each {p_k} is a harmonic homogeneous polynomial of degree {k}. Let {m} be the smallest integer such that {p_m} is not identically zero. Then {p_m} has the same properties as {u} itself, since it dominates the other terms near the origin. We may as well replace {u} by {p_m}: that is, {u} is a spherical harmonic from now on.

Rotating {u} around the {z}-axis preserves all the properties of interest: harmonic, positive on the cone, zero gradient at the origin. Averaging over all these rotations we get a rotationally symmetric function known as a zonal spherical harmonic. Up to a constant factor, such a function is given by {P_m(\cos \phi)} where {\phi} is a spherical coordinate (angle with the {z}-axis) and {P_m} is the Legendre polynomial of degree {m}.

The positivity condition requires {P_m(t) > 0} for {t>c}. In other words, the bound on {\theta} comes from the greatest zero of the Legendre polynomial. As is true for orthogonal polynomials in general, the zeros are interlaced: that is, the zeros of {P_m=0} appear strictly between any two consecutive zeros of {P_{m+1}}. It follows that the value of the greatest zero grows with {m}. Thus, it is smallest when {m=2}. Since {P_2(t) = (3t^2-1)/2}, the zero of interest is {1/\sqrt{3}}, and we conclude that {c \ge  1/\sqrt{3}}. Hence the magic angle.

A piece of magic cone

The magic angle is easy to visualize: it is formed by an edge of a cube and its space diagonal. So, the magic cone with vertex at (0,0,0) is the one that passes through (1, 1, 1), as shown above.

In other dimensions the zonal harmonics are expressed in terms of Gegenbauer polynomials (which reduce to Chebyshev polynomials in dimensions 2 and 4). The above argument applies to them just as well. The relevant Gegenbauer polynomial of degree {2} is {nx^2-1} up to a constant. Thus, in {\mathbb R^n} the magic angle is {\cos^{-1}(1/\sqrt{n})}, illustrated by the harmonic function {u(x)=nx_n^2 - |x|^2}.

This analysis is reminiscent of the Hopf Lemma which asserts that if a positive harmonic function in a smooth domain has boundary value 0 at some point, the normal derivative cannot vanish at that point. The smoothness requirement is typically expressed as the interior ball condition: one can touch every boundary point by a ball contained in the domain. The consideration of magic angles shows that if the function is also harmonic in a larger domain, the interior ball condition can be replaced by the interior cone condition, provided that the opening angle of the cone is greater than {\cos^{-1}(1/\sqrt{n})}.

Measurability of Banach space valued functions

There is only an indirect proof of the existence of a function {f\colon [0, 1]\to \mathbb R} that is not Lebesgue measurable. But it’s easy to give an explicit example when the codomain of {f} is a Banach space: just let {b(t)} be the sequence of the binary digits of {t}, considered as an element of the sequence space {\ell_\infty}.

Why is {b} not measurable? Recall that a Banach space-valued function {f} is (Bochner) measurable iff there is a sequence of simple functions {\sum v_k \chi_{A_k}} (finite sum, measurable {A_k}, arbitrary vectors {v_k}) that converges to {f} almost everywhere. This property implies that, with an exception of a null set, the range of {f} lies in the separable subspace spanned by all the vectors {v_k} used in the sequence of simple functions. But {b} has the property {\|b(t)-b(s)\|=1} whenever {t\ne s}, so the image of any uncountable set under {b} is nonseparable.

Another way to look at this: on the interval [0, 1) the function {b} is injective and its range has discrete topology, which implies that every subset of [0, 1) is the preimage of some open subset of {\ell_\infty} under {b}.

The binary-digits functions can also be used to illustrate an issue with the duality of Lebesgue-Bochner spaces {L_p(0, 1; X)} where {X} is a Banach space and {1\le p<\infty}. (So, {f} belongs to this space iff it is Bochner measurable and the {L^p} norm of {\|f\|\colon [0, 1]\to [0, \infty)} is finite.) In general we do not have the expected relation {L_p(0, 1; X)^* = L_q(0, 1; X^*)} with {1/p+1/q=1}. The natural isometric embedding of {L_q(0, 1; X^*)} into {L_p(0, 1; X)^*} is still there: any {g\in L_q(0, 1; X^*)} acts on {L_p(0, 1; X)} by {f\mapsto \int \langle f(t), g(t) \rangle\, dt}. But unless {X^*} has the Radon–Nikodym property, these are more bounded linear functionals on {L_p(0, 1; X)}.

To construct such a functional, let {b_n(t)} be the {n}-th binary digit of {t}. Given {f\in L_1(0, 1; \ell_1)}, write it in coordinates as {(f_1, f_2, \dots)} and define {\varphi(f) = \sum_n \int_0^1 f_n b_n}. This is a bounded linear functional, since {|\varphi(f)|\le \sum_n \int_0^1 |f_n| = \|f\|}. But there is no function {g\in L_\infty(0, 1; \ell_\infty)} that represents it, i.e., {\varphi(f) = \int_0^1 \langle f(t), g(t)\rangle \,dt = \sum_n \int_0^1 f_n g_n }. Indeed, if such {g} existed then by considering {f} with only one nonzero coordinate, we find that {g_n} must be {b_n}, using the duality {L_1^* =  L_\infty} in the scalar case. But the function {[0, 1]\to \ell_\infty} with the components {(b_1, b_2, \dots)} is not measurable, as shown above.

This example, which applies to all {1\le p<\infty}, also serves as a reminder that the duality relation {L_p(0, 1; X)^* = L_q(0, 1; X^*)} depends on the dual space {X^*} having the Radon-Nikodym property (RNP), not {X} itself. Indeed, {X=\ell_1} has the RNP; its dual does not.

The importance of {X^*} having the RNP becomes clear once one tries to follow the usual proof of {L_p^*=L_q}. Given {\varphi\in L_p(0,1;X)^*}, we can define an {X^*}-valued measure {\tau} on {[0, 1]} by {\tau(A)(v) = \varphi( v\chi_A)} where {A\subset [0, 1]} is Lebesgue measurable and {v\in X}. This measure has reasonable finiteness and continuity properties coming from {\varphi} being a bounded functional. Still, the existence of a density {g\colon [0, 1]\to X^*} of the measure {\tau} depends on the structure of the Banach space {X^*}.

Immaculate perfection

— I am so lucky. I was born on April 4th 1944, that’s 4/4/44. If you add that up, it comes to 16, one six. One plus six is seven: luckiest number of all.
— You know your math.
— It’s more than math, Mike, it’s… immaculate perfection.

The value of 7 for the the digital root indeed seems preferable, at least on this blog. But looking at the dates formed by repeated digits is too anthropocentric (not to mention xkcd1179). Besides, mathematics already has a concept of perfect numbers: being equal to the sum of proper divisors.

So, are there any perfect numbers with digital root equal to 7, equivalently {n\equiv 7\bmod 9}? For even perfect numbers, one can expect that the Euclid-Euler theorem {n = 2^{p-1} (2^p-1)} will help with the answer, but even so, the solution turns out to be surprisingly simple.

A lucky coincidence is that {2^6\equiv 1\bmod 9} ({64 = 1 + 7\cdot 9}). Since all primes greater than 3 have remainder 1 or 5 mod 6, there are only a few cases to check:

  • {p\equiv 1\bmod 6}, hence {2^{p-1}\equiv 2^{1-1} \equiv 1 \bmod 9} and {2^p-1\equiv 2^1-1 \equiv 1\bmod 9}. In this case {n\equiv 1\cdot 1 \equiv 1 \bmod 9}
  • {p\equiv 5\bmod 6}, hence {2^{p-1}\equiv 2^{5-1} \equiv 7 \bmod 9} and {2^p-1\equiv 2^5-1 \equiv 4\bmod 9}. In this case {n\equiv 7\cdot 4 \equiv 1 \bmod 9}
  • Exceptional cases p=2, 3 correspond to n = 6, 28. The latter is congruent to 1 mod 9, matching the preceding cases.

So, every even perfect number except 6 is congruent to 1 mod 9. Should a perfect number be congruent to 7 mod 9, it has to be odd. An odd perfect number, should it exist, must be either divisible by 9 (not good for us) or congruent to 1 mod 12. (Reference: On the form of an odd perfect number). The condition {n\equiv 1 \bmod 12} is consistent with {n\equiv 7 \bmod 9}: for example, 25 satisfies both. Come to think of it, the sum of proper divisors of 25 is… not 25. But there might still be a 7 mod 9 perfect number out there.

What numbers k have the property that the sequence {n mod k : n is perfect} stabilizes? Assuming odd perfect numbers do not exist, every power of 2 has this property. So do 3 and 9 by the above (and therefore, their products with powers of 2). I don’t know any other such k, in particular n mod 27 does not appear to stabilize. Of course, it is impossible to prove any negative result here since we don’t know if there are infinitely many perfect numbers.

Bipartite metric spaces and positive kernels

Bipartite metric spaces provide simple examples of highly non-Euclidean metric spaces. Let {A} and {B} be two disjoint abstract sets, say, with {m} and {n} points respectively. Define a metric on the union {A\cup B} as follows, where {a, b, c} are some positive numbers:

  • The distance between distinct elements of {A} is {a}.
  • The distance between distinct elements of {B} is {b}.
  • The distance between two elements from different sets is {c}.

The triangle inequality is satisfied provided that {a, b\le 2c}. When {c} is large, the space is not very interesting: it is approximately the union of two simplexes placed at some large distance from each other. On the other hand, when {a=b=2c} we encounter a bizarre situation: every point of {B} is a “midpoint” between any two distinct points of {A}, and vice versa.

A function {K\colon X\times X \to \mathbb R} is called a positive kernel if the matrix {K(x_i, x_j)_{i, j=1}^n} is positive semidefinite for any choice of {n} and {x_1, \dots, x_n\in X}. A classical theorem of Schoenberg says that a metric space {(X, d)} is isometric to a subset of a Hilbert space if and only if {\exp(-\gamma  d^2)} is a positive kernel for all {\gamma >0}. One may ask: is there any function {f\colon [0, \infty)\to\mathbb R} such that {f\circ d} is a positive kernel on {X} for every metric space {(X, d)}?

Suppose {f} is such a function. Applying it to the bipartite metric defined above, we find that the following matrix must be positive semidefinite (the case {m=3}, {n=2} shown; in general the diagonal blocks have sizes {m\times m} and {n\times n}):

{\displaystyle \begin{pmatrix} f(0) & f(a) & f(a)  & f(c) & f(c) \\   f(a) & f(0) & f(a)  & f(c) & f(c) \\   f(a) & f(a) & f(0)  & f(c) & f(c) \\   f(c) & f(c) & f(c)  & f(0) & f(b) \\   f(c) & f(c) & f(c)  & f(b) & f(0)  \end{pmatrix} }

Test the PSD property against vectors of the form {(s, s, s, t, t)}, that is {m} entries equal to some number {s} followed by {n} entries equal to some number {t}. The result is the quadratic form {As^2 + Bt^2 + 2C^2st } where {A = (m^2-m) f(a)+mf(0)}, {B = (n^2-n)f(b)+nf(0)}, and {C = mnf(c)}.

For this quadratic form to be positive semidefinite, we need {A\ge 0} and {C^2\le AB} to hold. The inequality {A\ge 0} amounts to {f(a)\ge 0} and {f(0)\ge 0}, because {m} can be any positive integer. Simply put, {f} must be nonnegative.

The inequality {C^2\le AB} simplifies to

{ mn f(c)^2 \le ( (m-1) f(a)+f(0)) \cdot ((n-1)f(b)+ f(0)) }

When {m=n=1} and {a=b}, we obtain {f(c)\le f(0)}; so, {f} is maximized at {0}. Letting {m, n\to \infty} yields {f(c)^2 \le f(a)f(b)}. What does this mean?

If {c=(a+b)/2}, the inequality {f(c)^2 \le f(a)f(b)} amounts to log-convexity of {f}, and there are lots of log-convex functions. But crucially, our only constraint on {c} in the construction of a bipartite metric space is {c\ge \max(a, b)/2}. In the special case {a=b} we find that {f(c)\le f(a)} whenever {c\ge a/2} (and {a>0}). On one hand, {f} must be nonincreasing because all {c\ge a} are allowed. On the other hand, {f(a/2)\le f(a)} implies that the sequence {k\mapsto f(2^k)} is nondecreasing for {k\in\mathbb Z}. These conditions can be satisfied together only if {f} is constant on {(0, \infty)}.

And there you have it, a complete description of functions {f\colon [0, \infty)\to\mathbb R} that transform every distance matrix into a positive semidefinite matrix: {f\equiv c\ge 0} on {(0, \infty)}, and {f(0)\ge c}. It is immediate that such {f} indeed works for all metric spaces, not just bipartite ones.

Noncontracting Jordan curves

A simple closed curve {\Gamma} on the plane can be parameterized by a homeomorphism {f\colon \mathbb T\to \Gamma} in infinitely many ways. It is natural to look for “nice” parameterizations, like nonexpanding ones in the previous post. This time, let us look for noncontracting parameterizations: {|f(a)-f(b)|\ge |a-b|} for all {a, b\in \mathbb T}. Note that Euclidean distance is used here, not arclength. And we still want {f} to be a homeomorphism. Its noncontracting property simply means that the inverse {f^{-1}} is nonexpanding aka 1-Lipschitz.

What are some necessary conditions for the existence of a noncontracting parameterization? We can mimic the three from the earlier post Nonexpanding Jordan curves, with similar proofs:

  1. The curve must have length at least {2\pi}.
  2. The curve must have diameter at least 2.
  3. The curve must enclose some disk of radius 1. (Apply Kirszbraun’s theorem to {f^{-1}} and note that the resulting Lipschitz map G of the plane will attain 0 somewhere, by a topological degree / winding number argument. Any point where G = 0 works as the center of such a disk.)

This time, the 3rd item supersedes both 1 and 2. Yet, the condition it presents is not sufficient for the existence of a noncontracting parameterization. A counterexample is a disk with a “comb-over”:


Indeed, suppose that {g=f^{-1}} is a nonexpanding map from this curve onto the unit circle. Let {1 = A < B < C} be the three points at which the curve meets the positive x-axis. Since every point of the curve is within small distance from its arc AB, it follows that {g(AB)} is a large subarc of {\mathbb T} that covers almost all of the circle. But the same argument applies to the arcs {g(BC)} and {g(CA)}, a contradiction.

No matter how large the enclosed disk is, a tight combover around it can force arbitrarily large values of the Lipschitz constant of {f^{-1}}. Sad!

What about sufficient conditions?

  1. It is sufficient for the curve to be star-shaped with respect to the origin, in addition to enclosing the unit disk. (Equivalent statement: {\Gamma} has a polar equation {r = r(\theta)} in which {r \ge 1}.) Indeed, the nearest-point projection onto a convex set is a nonexpanding map, and projecting {\Gamma} onto the unit circle in this way gives the desired parameterization.
  2. It is sufficient for the curve to have curvature bounded by 1. I am not going into this further, because second derivatives should not be needed to control a Lipschitz constant.

Recall that for nonexpanding parameterizations, the length of the curve was a single quantity that mostly answered the existence question (length at most 4 — yes; length greater than {2\pi} — no). I do not know of such a quantity for the noncontracting case.