The effect of adding an edge on Laplacian eigenvalues

Adding an edge to a connected graph (between two of its existing vertices) makes the graph “even more” connected. How to quantify this? The Poincaré inequality gives a way to do this: for any function {f\colon V\to \mathbb R} with zero mean,

{\displaystyle \sum_V f(v)^2 \le C \sum_E (f(u)-f(v))^2 }

where the sum on the left is over the vertex set {V}, the sum on the right is over the edge set {E}, and {u, v} on the right are the endpoints of an edge. Suppose {C} is the smallest possible constant in this inequality, the Poincaré constant of the graph. Adding an edge to the graph does not change the sum on the left but may increase the sum on the right. Thus, the new Poincaré constant {C'} satisfies {C'\le C}: greater connectivity means smaller Poincaré constant.

It is more convenient to work with the reciprocal of {C}, especially because {1/C = \lambda_2}, the smallest positive eigenvalue of the graph Laplacian. Let {\lambda_2'} be the corresponding eigenvalue after an edge is added. The above shows that {\lambda_2 \le \lambda_2'}. But there is also an upper bound on how much this eigenvalue (or any Laplacian eigenvalue) can grow. Indeed, adding an edge amounts to adding the matrix

{\displaystyle B = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}} (with a bunch of zero rows/columns)

to the Laplacian matrix {L}. The eigenvalues of {B} are {0} and {2}. As a consequence of the Courant-Fischer minimax formulas, the ordered eigenvalues {\lambda_1 \le \dots \le \lambda_n} satisfy {\lambda_k(L) \le \lambda_k(L+B) \le \lambda_k(L) + 2} for every {k}.

Both of the above bounds are sharp. If an diagonal edge is added to the 4-cycle,


the smallest positive eigenvalue remains the same ({2}). The reason is that a typical eigenfunction for {\lambda_2} takes on values {-1, 0, 1, 0} (in cyclic order) and adding an edge between two vertices with the same value of such eigenfunction does not affect the Poincaré constant.

On the other hand, adding one more edge increases {\lambda_2} from {2} to {4}.


Normalized Laplacian

Another form of the Poincaré inequality involves the vertex degree: when summing over vertices, we give each of them weight equal to the number of neighbors.

{\displaystyle \sum_V f(v)^2 d_v \le C \sum_E (f(u)-f(v))^2 } provided {\displaystyle \sum_V f(v)d_v = 0}

Here {1/C = \mu_2}, the smallest positive eigenvalue of the normalized Laplacian {L_N}, the matrix with {1} on the diagonal, where off-diagonal entries are {-1/\sqrt{d_ud_v}} if {u\sim v}, and {0} otherwise. The sum of the normalized Laplacian eigenvalues is equal to {n} (the number of vertices), regardless of how many edges are there. So we cannot expect all them to grow when an edge is added. But how do they change?

Let {\mu_2'} be the corresponding eigenvalue after an edge is added.
Here are two examples for which {\mu_2' - \mu_2 = 1/2}:

Completing a 3-path to a 3-cycle increases the eigenvalue from 1 to 3/2.


Completing a 4-path to a 4-cycle increases the eigenvalue from 1/2 to 1.

This pattern does not continue: “5-path to 5-cycle” results in a smaller increase, which is not even largest among 5-vertex graphs. It appears that the above examples are the only two cases of {\mu_2' - \mu_2 = 1/2}, with all other graphs having {\mu_2' - \mu_2 < 1/2}. (I do not have a proof of this.)

The opposite direction

Adding an edge can also make {\mu_2} smaller (thus, the weighted Poincaré constant becomes larger, showing that it may not be as useful for quantifying the connectivity of a graph). The smallest example is adding an edge to the star graph on 4 vertices:


here {\mu_2 = 1} but {\mu_2' = 5/4 - \sqrt{33}/12 \approx 0.77}.

Let us analyze the star graphs on {n} vertices, for general {n}. In an earlier post we saw that {\mu_2 = 1}, so it remains to find {\mu_2'}. Let us order the vertices so that {1} is the center of the star and {(2,3)} is the added edge. Write {\beta = -1/\sqrt{n-1}} for brevity. Then the normalized Laplacian matrix {L_N} is

{\displaystyle \begin{pmatrix} 1 & \beta/\sqrt{2} & \beta/\sqrt{2} & \beta & \cdots & \beta \\ \beta/\sqrt{2} & 1 & -1/2 & 0 & \cdots & 0 \\ \beta/\sqrt{2} & -1/2 & 1 & 0 & \cdots & 0 \\ \beta & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \beta & 0 & 0 & 0 & \cdots & 1 \\ \end{pmatrix}}

where all rows numbered {i > 3} consist of {\beta} in the first column and {1} in the {i}th column. This shows that {L_N-I} has rank {4}, hence {1} is an eigenvalue of multiplicity {n-4}. Also, {0} is a simple eigenvalue as for any connected graph. Less obviously, {3/2} is an eigenvalue with the corresponding eigenvector {(0, 1, -1, 0, \dots, 0)} – this eigenvalue comes from the submatrix in row/columns 2 and 3, where the surrounding values in these row/columns are nice enough to cancel out. We are left with two eigenvalues to find, and we know their sum is {n - (n-4) - 3/2 = 5/2}. This means the characteristic polynomial of {L_N} is of the form

{\displaystyle p(t) = t (t-1)^{n-4} (t-3/2) (t^2 - 5t/2 + \gamma)}

where the constant {\gamma} remains to be found. I am going to skip to the answer here: {\gamma = n/(n-1)}. The quadratic formula delivers the last two eigenvalues:

{\displaystyle \frac14 \left( 5 \pm \sqrt{\frac{9n-25}{n-1}}\right) }

The smaller of these is {\mu_2'} that we were looking for:

{\displaystyle \mu_2' = \frac14 \left( 5 - \sqrt{\frac{9n-25}{n-1}}\right) \to \frac12 } as {n\to\infty}

Thus, adding an edge to a star graph results in negative {\mu_2' - \mu_2} which decreases to {-1/2} as {n\to\infty}. Exhaustive search for {n\le 9} shows that the star graph has the smallest value of {\mu_2' - \mu_2} among all graphs on {n} vertices. It is reasonable to expect this pattern to continue, which would imply {\mu_2' - \mu_2 > -1/2} in all cases.

Continued fractions vs Power series

The Taylor series of the exponential function,
{\displaystyle e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \cdots }
provides reasonable approximations to the function near zero: for example, with {\displaystyle T_3(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} } we get

Taylor polynomial T_3 in red

The quality of approximation not being spectacular, one can try to improve it by using rational functions instead of polynomials. In view of the identity
{\displaystyle e^x = \frac{e^{x/2}}{e^{-x/2}} \approx \frac{T_3(x/2)}{T_3(-x/2)}}
one can get {e^x\approx p(x)/p(-x)} with {\displaystyle p(x) = 1 + \frac{x}{2} + \frac{x^2}{8} + \frac{x^3}{48} }. The improvement is substantial for negative {x}:

Rational approximation based on exp(x/2)

Having rational approximation of the form {e^x\approx p(x)/p(-x)} makes perfect sense, because such approximants obey the same functional equation {f(x)f(-x)=1} as the exponential function itself. We cannot hope to satisfy other functional equations like {f(x+1)=e f(x)} by functions simpler than {\exp}.

However, the polynomial {T_n(x/2)} is not optimal for approximation {e^x \approx p(x)/p(-x)} except for {n=0, 1}. For degree {3}, the optimal choice is {\displaystyle p(x) = 1 + \frac{x}{2} + \frac{x^2}{10} + \frac{x^3}{120} }. In the same plot window as above, the graph of {p(x)/p(-x)} is indistinguishable from {e^x}.

Red=rational, Blue=exponential

This is a Padé approximant to the exponential function. One way to obtain such approximants is to replace the Taylor series with continued fraction, using long division:

{\displaystyle e^x = 1 + \dfrac{x}{1 + \dfrac{-x/2}{1 + \dfrac{x/6}{1 + \dfrac{-x/6}{1 + \dfrac{x/10}{1 + \dfrac{-x/10}{1 + \cdots }}}}}} }

Terminating the expansion after an even number of {x} apprearances gives a Padé approximant of the above form.

This can be compared to replacing the decimal expansion of number {e}:

{\displaystyle e = 2 + \frac{7}{10} + \frac{1}{10^2} + \frac{8}{10^3} + \frac{2}{10^4}+\cdots }

with the continued fraction expansion

{\displaystyle e = 2 + \dfrac{1}{1 + \dfrac{1}{2 + \dfrac{1}{1 + \dfrac{1}{1 + \dfrac{1}{4 + \dfrac{1}{1 + \cdots }}}}}} }

which, besides greater accuracy, has a regular pattern: 121 141 161 181 …

The numerators {p} of the (diagonal) Padé approximants to the exponential function happen to have a closed form:

{\displaystyle p(x) = \sum_{k=0}^n \frac{\binom{n}{k}}{\binom{2n}{k}} \frac{x^k}{k!} }

which shows that for every fixed {k}, the coefficient of {x^k} converges to {2^{-k}/k!} as {n\to\infty }. The latter is precisely the Taylor coefficient of {e^{x/2}}.

In practice, a recurrence relation is probably the easiest way to get these numerators: begin with {p_0(x)=1} and {p_1(x) = 1+x/2}, and after that use {\displaystyle p_{n+1}(x) = p_n(x) + \frac{x^2}{16n^2 - 4} p_{n-1}(x)}. This relation can be derived from the recurrence relations for the convergents {A_n/B_n} of a generalized continued fraction {\displaystyle \mathop{\raisebox{-5pt}{\huge K}}_{n=1}^\infty \frac{a_n}{b_n}}. Namely, {A_n = b_nA_{n-1}+a_nA_{n-2}} and {B_n = b_nB_{n-1}+a_nB_{n-2}}. Only the first relation is actually needed here.

Using the recurrence relation, we get

{\displaystyle p_2(x) = p_1(x) + \frac{x^2}{12}p_0(x) = 1 + \frac{x}{2} + \frac{x^{2}}{12} }

{\displaystyle p_3(x) = p_2(x) + \frac{x^2}{60}p_1(x) = 1 + \frac{x}{2} + \frac{x^{2}}{10} + \frac{x^{3}}{120} }

{\displaystyle p_4(x) = p_3(x) + \frac{x^2}{140}p_2(x) = 1 + \frac{x}{2} + \frac{3 x^{2}}{28} + \frac{x^{3}}{84} + \frac{x^{4}}{1680} }

(so, not all coefficients have numerator 1…)

{\displaystyle p_5(x) = p_4(x) + \frac{x^2}{252}p_3(x) = 1 + \frac{x}{2} + \frac{x^{2}}{9} + \frac{x^{3}}{72} + \frac{x^{4}}{1008} + \frac{x^{5}}{30240} }

The quality of approximation to {e^x} is best seen on logarithmic scale: i.e., how close is {\log(p(x)/p(-x))} to {x}? Here is this comparison using {p_5}.

Rational approximation based on 5th degree polynomial

For comparison, the Taylor polynomial of fifth degree, also on logarithmic scale: {\log(T_5(x))} where {\displaystyle T_5(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120}}.

log(T_5) in red

Inflection points of bell shaped curves

The standard normal probability density function {f(x) = \exp(-x^2/2)/\sqrt{2\pi}} has inflection points at {x = \pm 1} where {y=\exp(-1/2) /\sqrt{2\pi}\approx 0.24} which is about 60.5% of the maximum of this function. For this, as for other bell-shaped curves, the inflection points are also the points of steepest incline.

Standard Gaussian with inflection points marked in red

This is good to know for drawing an accurate sketch of this function, but in general, the Gaussian curve may be scaled differently, like {A\exp(-B x^2)}, and then the inflection points will be elsewhere. However, their relative height is invariant under scaling: it is always 60.5% of the maximum height of the curve. Since it is the height that we focus on, let us normalize various bell-shaped curves to have maximum {1}:

Gaussian rescaled to height 1

So, the Gaussian curve is inflected at the relative height of {\exp(-1/2)\approx 0.605}. For the Cauchy density {1/(x^2+1)} the inflection is noticeably higher, at {3/4 = 0.75} of the maximum:

Cauchy distribution

Another popular bell shape, hyperbolic secant or simply {2/(e^x+e^{-x})}, is in between with inflection height {1/\sqrt{2}\approx 0.707}. It is slightly unexpected to see an algebraic number arising from this transcendental function.

sech is such a cool name for a function

Can we get inflection height below {1/2}? One candidate is {1/(x^n+1)} with large even {n}, but this does not work: the relative height of inflection is {\displaystyle \frac{n+1}{2n} > \frac{1}{2}}. Shown here for {n=10}:

A smooth trapezoid

However, increasing the power of {x} in the Gaussian curve works: for example, {\exp(-x^4)} has inflection at relative height {\exp(-3/4) \approx 0.47}:

Does this distribution have a name?

More generally, the relative height of inflection for {\exp(-x^n)} is {\exp(1/n - 1)} for even {n}. As {n\to \infty}, this approaches {1/e\approx 0.37}. Can we go lower?

Well, there are compactly supported bump functions which look bell-shaped, for example {\exp(1/(x^2-1))} for {|x|<1}. Normalizing the height to {1} makes it {\exp(x^2/(x^2-1))}. For this function inflection occurs at relative height about {0.255}.

This one actually looks like a bell

Once again, we can replace {2} by an arbitrary positive even integer {n} and get relative inflection height down to {\displaystyle\exp\left(-\left(1+\sqrt{5-4/n}\right)/2\right)}. As {n} increases, this height decreases to {\displaystyle e^{-\varphi}} where {\varphi} is the golden ratio. This is less than {0.2} which is low enough for me today. The smallest {n} for which the height is less than {0.2} is {n=54}: it achieves inflection at {y\approx 0.1999}.

A tough one for derivative-based numerical solvers

In the opposite direction, it is easy to produce bell-shaped curve with high inflection points: either {1/(|x|^p + 1)} or {\exp(-|x|^p)} will do, where {p} is slightly larger than {1}. But these examples are only once differentiable, unlike the infinitely smooth examples above. Aside: as {p\to 1}, the latter function converges to the (rescaled) density of the Laplace distribution and the former to a non-integrable function.

As for the middle between two extremes… I did not find a reasonable bell-shaped curve that inflects at exactly half of its maximal height. An artificial example is {\exp(-|x|^p)} with {p=1/(1-\log 2) \approx 3.26} but this is ugly and only {C^3} smooth.

Maximal algebraic connectivity among graphs of bounded degree

The algebraic connectivity { a(G) } of a graph {G } is its smallest nontrivial Laplacian eigenvalue. Equivalently, it is the minimum of edge sums {\sum_{e\in E} df(e)^2} over all functions {f\colon V\to\mathbb R} normalized by { \sum_{v\in V} f(v) = 0 } and {\sum_{v\in V} f(v)^2 = 1 }. Here  { V, E } are vertex/edge sets, and { df(e)} means the difference of the values of {f } at the vertices of the edge { e}.

It is clear from the definition that adding edges to { G} cannot make  {a(G) } smaller; thus, among all graphs on {n } vertices the maximal value of {a(G) } is attained by the complete graph, for which {a(G) = n}. For any other graph { a(G)\le n-2} which can be shown as follows. Pick two non-adjacent vertices {u } and {v } and let {f(u)=1/\sqrt{2} }, { f(v) = -1/\sqrt{2}}, and {f=0 } elsewhere. This function is normalized as required above, and its edge sum {\sum_{e\in E} df(e)^2} is at most { n-2} since there are at most {  2(n-2)} edges with a nonzero contribution.

What if we require the graph to have degree vertex at most {d}, and look for maximal connectivity then? First of all, only connected graphs are under consideration, since {a(G)=0} for non-connected graphs. Also, only the cases {n\ge d+2 } are of interest, otherwise the complete graph wins. The argument in the previous paragraph shows that { a(G) \le d} but is this bound attained?

The case {d=2} is boring: the only two connected graphs are the path {P_n} and the cycle { C_n}. The cycle wins with  {a(C_n) = 2(1-\cos(2\pi/n)) } versus { a(P_n) = 2(1-\cos(\pi/n))}.

When { d=4}, one might suspect a pattern based on the following winners:

6 vertices, max degree 4, with a = 4
7 vertices, max degree 4, with a = 3.2

The structure of these two is the same: place {n } points on a circle, connect each of them to {4 } nearest neighbors.

But this pattern does not continue: the 8-vertex winner is completely different.

8 vertices, max degree 4, with a = 4

This is simply the complete bipartite graph { K_{4, 4} }. And it makes sense that the “4 neighbors” graph loses when the number of vertices is large: there is too much “redundancy” among its edges, many of which connect the vertices that were already connected by short paths.

In general, when {n = 2d }, the complete bipartite graph { K_{d, d}} achieves {a = d } and therefore maximizes the algebraic connectivity. The fact that {a(K_{d, d}) = d } follows by considering graph complement, as discussed in Laplacian spectrum of small graphs. The complement of {K_{d, d} } is the disjoint union of two copies of the complete graph {K_d }, for which the maximal eigenvalue is {d }. Hence {a(G) = n - \lambda_{\max}(G^c) = n-d = d }.

When { n = 2d-1} is odd, we have a natural candidate in { K_{d,d-1}} for which the argument from the previous paragraph shows {a(G) = d-1 }. This is indeed a winner when {n=5, d=3 }:

5 vertices, max degree 3, with a = 2

The winner is not unique since one can add another edge between two of the vertices of degree 2. This does not change the {a(G)}, however: there is a fundamental eigenfunction that has equal values at the vertices of that added edge.

Same for {n=9, d=5 }: the complete bipartite graph shares the maximum value of algebraic connectivity with two other graphs formed by adding edges to it:

9 vertices, max degree 5, with a = 4

However, the family { K_{d,d-1}} does not win the case { n = 2d-1} in general: we already saw a 4-regular graph on 7 vertices with {a(G)\approx 3.2 }, beating { a(K_{4, 3}) = 3}. Perhaps { K_{d,d-1}} wins when { d} is odd?

I do not have any other patterns to conjecture, but here are two winners for { n = 8, d= 3}: the cube and the “twisted cube”.

8 vertices, max degree 3, with a = 2
8 vertices, max degree 3, with a = 2

The cube is twisted by replacing a pair of edges on the top face with the diagonals. This is still a 3-regular graph and the algebraic connectivity stays the same, but it is no longer bipartite: a 5-cycle appears.

Sequential characterization and removability for uniform continuity

There is a useful sequential characterization of continuity in metric spaces. Let {f\colon X\to Y} be a map between metric spaces. If for every convergent sequence {x_n\to p} in {X} we have {f(x_n)\to f(p)} in {Y}, then {f} is continuous. And the converse is true as well.

Uniformly continuous functions also have a useful property related to sequences: if {f\colon X\to Y} is uniformly continuous and {\{x_n\}} is a Cauchy sequence in {X}, then {\{f(x_n)\}} is a Cauchy sequence in {Y}. However, this property does not characterize uniform continuity. For example, if {X=Y=\mathbb R}, then Cauchy sequences are the same as convergent sequences, and therefore any continuous function preserves the Cauchy-ness of sequences—it does not have to be uniformly continuous.

Let us say that two sequences {\{x_n\}} and {\{x_n'\}} are equivalent if the distance from {x_n} to {x_n'} tends to zero. The sequential characterization of uniform continuity is: {f\colon X\to Y} is uniformly continuous if and only if for any two equivalent sequences {\{x_n\}} and {\{x_n'\}} in {X}, their images {\{f(x_n)\}} and {\{f(x_n')\}} are equivalent in {Y}. The proof of this claim is straightforward.

In the special case when {\{x_n'\}} is a constant sequence, the sequential characterization of uniform continuity reduces to the sequential characterization of continuity.

A typical example of the use of this characterization is the proof that a continuous function on a compact set is uniformly continuous: pick two equivalent sequences with non-equivalent images, pass to suitable subsequences, get a contradiction with continuity.

Here is a different example. To state it, introduce the notation {N_r(p) = \{x\colon d(x, p)<r\}}.

Removability Theorem. Let {f\colon X\to Y} be continuous. Suppose that there exists {p\in X} such that for every {r>0}, the restriction of {f} to {X\setminus N_r(p)} is uniformly continuous. Then {f} is uniformly continuous on {X}.

This is a removability result because from having a certain property on subsets of {X} we get it on all of {X}. To demonstrate its use, let {X=[0, \infty)} with the standard metric, {p=0}, and {f(x)=\sqrt{x}}. The uniform continuity of {f} on {X\setminus N_r(p)} follows immediately from the derivative {f'} being bounded on that set (so, {f} is Lipschitz continuous there). By the removability theorem, {f} is uniformly continuous on {X}.

Before proving the theorem, let us restate the sequential characterization in an equivalent form (up to passing to subsequences): {f\colon X\to Y} is uniformly continuous if and only if for any two equivalent sequences {\{x_n\}} and {\{x_n'\}} there exist equivalent subsequences {\{f(x_{n_k})\}} and {\{f(x_{n_k}')\}}, with the same choice of indices {n_k} in both.

Proof of the theorem. Suppose {\{x_n\}} and {\{x_n'\}} are equivalent sequences in {X}. If {x_n\to p}, then {x_n'\to p} as well, and the continuity of {f} at {p} implies that both {\{f(x_n)\}} and {\{f(x_n')\}} converge to {p}, hence are equivalent sequences. If {x_n\not\to p}, then by passing to a subsequence we can achieve {d(x_n, p)\ge r } for some constant {r>0}. By the triangle inequality, for sufficiently large {n} we have {d(x_n', p)\ge r/2}. Since {f} is uniformly continuous on {X\setminus N_{r/2}(p)}, it follows that {\{f(x_n)\}} and {\{f(x_n')\}} are equivalent.

Boundary value problems: not all explicit solutions are useful

Consider this linear differential equation: {y''(t) + 4y'(t) + 2t y(t) = 7} with boundary conditions {y(0)=1} and {y(1)=0}. Nothing looks particularly scary here. Just one nonconstant coefficient, and it’s a simple one. Entering this problem into Wolfram Alpha produces the following explicit solution:


I am not sure how anyone could use this formula for any purpose.

Let us see what simple linear algebra can do here. The differential equation can be discretized by placing, for example, {4} equally spaces interior grid points on the interval: {t_k = k/5}, {k=1, \dots, 4}. The yet-unknown values of {y} at these points are denoted {y_1,\dots, y_4}. Standard finite-difference formulas provide approximate values of {y'} and {y''}:

{\displaystyle y'(t) \approx \frac{y(t+h) - y(t-h)}{2h}}

{\displaystyle y''(t) \approx \frac{y(t+h) - 2y(t) + y(t-h)}{h^2}}

where {h} is the step size, {1/5} in our case. Stick all this into the equation: we get 4 linear equations, one for each interior point. Namely, at {t_1 = 1/5} it is

{\displaystyle \frac{y_2 - 2y_1 + 1}{(1/5)^2} + 4 \frac{y_2 - 1}{2/5} + 2\cdot \frac15 y_1 = 7 }

(notice how the condition {y(0)=1} is used above), at {t_2 = 2/5} it is

{\displaystyle \frac{y_3 - 2y_2 + y_1}{(1/5)^2} + 4 \frac{y_3 - y_1}{2/5} + 2 \cdot \frac25 y_2 = 7 }

and so on. Clean up this system and put it in matrix form:

{\displaystyle \begin{pmatrix} -49.6 & 35 & 0 & 0 \\ 15 & -49.2 & 35 & 0 \\ 0 & 15 & -48.8 & 35 \\ 0 & 0 & 15 & -48.4 \end{pmatrix} \vec y = \begin{pmatrix} -8 \\ 7 \\ 7 \\ 7 \end{pmatrix} }

This isn’t too hard to solve even with pencil and paper. The solution is

{\displaystyle y = \begin{pmatrix} -0.2859 \\ -0.6337\\ -0.5683\\ -0.3207 \end{pmatrix}}

It can be visualized by plotting 4 points {(t_k, y_k)}:

Discrete solution

Not particularly impressive is it? And why are all these negative y-values in a problem with boundary condition {y(0)=1}? They do not really look like they want to approach {1} at the left end of the interval. But let us go ahead and plot them together with the boundary conditions, using linear interpolation in between:

Linear algebra + linear interpolation

Or better, use cubic spline interpolation, which only adds another step of linear algebra (see Connecting dots naturally) to our computations.

Same points, cubic spline interpolation

This begins to look believable. For comparison, I used a heavier tool: BVP solver from SciPy. Its output is the red curve below.

Cubic spline and BVP solver

Those four points we got from a 4-by-4 system, solvable by hand, pretty much tell the whole story. At any rate, they tell a better story than the explicit solution does.

Graphics made with: SciPy and Matplotlib using Google Colab.

Functions of bounded or vanishing nonlinearity

A natural way to measure the nonlinearity of a function {f\colon I\to \mathbb R}, where {I\subset \mathbb R} is an interval, is the quantity {\displaystyle NL(f;I) = \frac{1}{|I|} \inf_{k, r}\sup_{x\in I}|f(x)-kx-r|} which expresses the deviation of {f} from a line, divided by the size of interval {I}. This quantity was considered in Measuring nonlinearity and reducing it.

Let us write {NL(f) = \sup_I NL(f; I)} where the supremum is taken over all intervals {I} in the domain of definition of {f}. What functions have finite {NL(f)}? Every Lipschitz function does, as was noted previously: {NL(f) \le \frac14 \mathrm{Lip}\,(f)}. But the converse is not true: for example, {NL(f)} is finite for the non-Lipschitz function {f(x)=x\log|x|}, where {f(0)=0}.

y  = x log|x|

The function looks nice, but {f(x)/x} is clearly unbounded. What makes {NL(f)} finite? Note the scale-invariant feature of NL: for any {t>0} the scaled function {f_t(x) = t^{-1}f(tx)} satisfies {NL(f_t)=NL(f)}, and more precisely {NL(f; tI) = NL(f_t; I)}. On the other hand, our function has a curious scaling property {f_t(x) = f(x) + x\log t} where the linear term {x\log t} does not affect NL at all. This means that it suffices to bound {NL(f; I)} for intervals {I} of unit length. The plot of {f} shows that not much deviation from the secant line happens on such intervals, so I will not bother with estimates.

The class of functions {f} with {NL(f)<\infty} is precisely the Zygmund class {\Lambda^*} defined by the property {|f(x-h)-2f(x)+f(x+h)| \le Mh} with {M} independent of {x, h}. Indeed, since the second-order difference {f(x-h)-2f(x)+f(x+h)} is unchanged by adding an affine function to {f}, we can replace {f} by {f(x)-kx-r} with suitable {k, r} and use the triangle inequality to obtain

{\displaystyle |f(x-h)-2f(x)+f(x+h)| \le 4 \sup_I |f(x)-kx-r| = 8h\; NL(f; I)}

where {I=[x-h, x+h]}. Conversely, suppose that {f\in \Lambda^*}. Given an interval {I=[a, b]}, subtract an affine function from {f} to ensure {f(a)=f(b)=0}. We may assume {|f|} attains its maximum on {I} at a point {\xi \le (a + b)/2}. Applying the definition of {\Lambda^*} with {x = \xi} and {h = \xi - a}, we get {|f(2\xi - a) - 2f(\xi )| \le M h}, hence {|f(\xi )| \le Mh}. This shows {NL(f; I)\le M/2}. The upshot is that {NL(f)} is equivalent to the Zygmund seminorm of {f} (i.e., the smallest possible M in the definition of {\Lambda^*}).

A function in {\Lambda^*} may be nowhere differentiable: it is not difficult to construct {f} so that {NL(f;I)} is bounded between two positive constants. The situation is different for the small Zygmund class {\lambda^*} whose definition requires that {NL(f; I)\to 0} as {|I|\to 0}. A function {f \in \lambda^*} is differentiable at any point of local extremum, since the condition {NL(f; I)\to 0} forces its graph to be tangent to the horizontal line through the point of extremum. Given any two points {a, b} we can subtract the secant line from {f} and thus create a point of local extremum between {a } and {b}. It follows that {f} is differentiable on a dense set of points.

The definitions of {\Lambda^* } and {\lambda^*} apply equally well to complex-valued functions, or vector-valued functions. But there is a notable difference in the differentiability properties: a complex-valued function of class {\lambda^*} may be nowhere differentiable [Ullrich, 1993]. Put another way, two real-valued functions in {\lambda^*} need not have a common point of differentiability. This sort of thing does not often happen in analysis, where the existence of points of “good” behavior is usually based on the prevalence of such points in some sense, and therefore a finite collection of functions is expected to have common points of good behavior.

The key lemma in Ullrich’s paper provides a real-valued VMO function that has infinite limit at every point of a given {F_\sigma} set {E} of measure zero. Although this is a result of real analysis, the proof is complex-analytic in nature and involves a conformal mapping. It would be interesting to see a “real” proof of this lemma. Since the antiderivative of a VMO function belongs to {\lambda^* }, the lemma yields a   function {v \in \lambda^*} that is not differentiable at any point of {E}. Consider the lacunary series {u(t) = \sum_{n=1}^\infty a_n 2^{-n} \cos (2^n t)}. One theorem of Zygmund shows that {u \in \lambda^*} when {a_n\to 0}, while another shows that {u} is almost nowhere differentiable when {\sum a_n^2 = \infty}. It remains to apply the lemma to get a function {v\in \lambda^*} that is not differentiable at any point where {u} is differentiable.

Folded letters and uniformly bounded function sequences

We know that every bounded sequence of real numbers has a convergent subsequence. For a sequence of functions, say {f_n\colon [0, 1]\to \mathbb R}, the notion of boundedness can be stated as: there exists a constant {M} such that {|f_n(x)|\le M} for every {n} and for all {x\in [0, 1]}. Such a sequence is called uniformly bounded.

Once we fix some point {x\in [0, 1]}, the boundedness assumption provides a subsequence {\{f_{n_k}\}} which converges at that point. But since different points may require different subsequences, it is not obvious whether we can pick a subsequence {\{f_{n_k}\}} which converges for all {x\in [0, 1]}. (Such a subsequence is called pointwise convergent.)

It is easy to give an example of a uniformly bounded sequence with no uniformly convergent subsequence (uniform convergence {f_n\to f} requires {\sup |f_n-f|\to 0}, which is stronger than {f_n(x) \to f(x)} for every {x}). Indeed, {f_n(x) = x^n} does the job. This sequence is uniformly bounded (by {M=1}) and converges pointwise to a discontinuous function {g} such that {g(1)=1 } and {g(x)=0} elsewhere. Any subsequence {f_{n_k}} has the same pointwise limit {g} and since {g} is not continuous, the convergence cannot be uniform.

But what would be an example of a uniformly bounded sequence of continuous functions with no pointwise convergent subsequence? In Principles of Mathematical Analysis Rudin gives {f_n(x) = \sin nx} as such an example but then uses the Lebesgue Dominated Convergence Theorem to prove the non-existence of pointwise convergent subsequences. I do not want to use the DCT.

The simplest example I could think of is based on the letter-folding function {F\colon [0, 1]\to [0, 1]} defined by

{\displaystyle F(x) = \begin{cases} 3x, & x\in [0, 1/3] \\ 2 - 3x, & x\in [1/3, 2/3] \\  3x - 2, & x \in [2/3, 1]  \end{cases} }

or by a magic one-line formula if you prefer: {F(x) = 3x - 1 - |3x-1| + |3x - 2|}.

Letter-folding function

Let {\{f_n\}} be the sequence of the iterates of {L}, that is {f_0(x) = x} and {f_{n+1}(x) = F(f_n(x))}. This means {f_1 = F}, {f_2 = F\circ F}, {f_3 = F\circ F \circ F}, and so on.

The second and third iterates of F

By construction, the sequence {\{f_n\}} is uniformly bounded ({M=1}). It is somewhat similar to the example {\{\sin nx\}} in that we have increasingly rapid oscillations. But the proof that {\{f_n\}} has no pointwise convergent subsequence is elementary. It is based on the following observations.

(A) Suppose that {\{a_j\}} is a sequence such that {a_j\in \{0, 2\}} for each {j \in \mathbb N}. Then the number {\displaystyle x = \sum_{j=1}^\infty \frac{a_j}{3^j}} satisfies {x\in [0, 1/3]} if {a_1=0} and {x\in [2/3, 1]} if {a_1 = 2}.

The proof of (A) amounts to summing a geometric series. Incidentally, observe that {a_1, a_2, \dots} are the digits of {x} in base {3}.

(B) For {x} as above we have {\displaystyle F(x) = \sum_{j=1}^\infty \frac{a_{j+1}}{3^j}}. In other words, {F} shifts the ternary digits of {x} to the left. As a consequence, {f_n} shifts them to the left by {n} places.

(C) Given any subsequence {\{f_{n_k}\}}, let {a_j = 2} if {j = n_{2k} + 1} for some {k}, and {a_j=0} otherwise. By part (B), {\displaystyle f_{n_k}(x) = \sum_{j=1}^\infty \frac{a_{j + n_k}}{3^j} } which means the first ternary digit of {f_{n_k}(x)} is {a_{n_k + 1}}. By construction, this digit is {2} when {k} is even, and {0} when {k} is odd. By part (A) we have {f_{n_k}(x) \ge 2/3} when {k} is even, and {f_{n_k}(x) \le 1/3} when {k} is odd. Thus, {\{f_{n_k}(x)\}} does not converge. This completes the proof.


The set of all points {x} of the form considered in (A), (B), (C), i.e., those with all ternary digits {0} or {2}, is precisely the standard Cantor set {C}.

The function {F} magnifies each half of {C} by the factor of {3}, and maps it onto {C}.

The important part of the formula for {F} is that {F(x) = 3x\bmod 1} when {x\in C}. The rest of it could be some other continuous extension to {[0, 1]}.

A similar example could be based on the tent map {T(x) = 1 - |2x-1|}, whose graph is shown below.

The tent map

However, in case of the tent map it is more convenient to use the sequence of even iterates: {T\circ T}, {T\circ T\circ T\circ T}, and so on.

The second and fourth iterates of T

Indeed, since {T(T(x)) = 4x \bmod 1} when {x\in [0, 1/4]\cup [1/2, 3/4]}, one can simply replace base {3} with base {4} in all of the above computations and arrive at the same conclusion, except the orbit of {x} will now be jumping between the intervals {[0, 1/4]} and {[1/2, 3/4]}.

The tent map is conjugate to the logistic map {L(x) = 4x(1-x)}, shown below. This conjugacy is {L\circ \varphi = \varphi\circ T} where {\varphi(x) = (1 - \cos \pi x)/2}.

Logistic map

The conjugacy shows that the {n}th iterate {L^{n}} is {\varphi\circ T^n \circ \varphi^{-1}}. Therefore, the sequence of the even iterates of {L} has no pointwise convergent subsequence. This provides an example with smooth functions, even polynomials.

The second and fourth iterates of L

One could use the sequence of all iterates (not just the even ones) in the constructions based on {T} and {L}, it just takes a bit of extra work that I prefer to avoid.

Pi and Python: how 22/7 morphs into 355/113

This is a brief return to the topic of Irrational Sunflowers. The sunflower associated with a real number {a} is the set of points with polar coordinates {r=\sqrt{k}} and {\theta = a(2\pi k)}, {k=1, 2, \dots, n}. A sunflower reduces to {n} equally spaced rays if and only if {a} is a rational number written in lowest terms as {m/n}.

Here is the sunflower of {a=\pi} of size {n = 10000}.

Click to see the larger image

Seven rays emanate from the center because {\pi \approx 22/7}, then they become spirals, and spirals rearrange themselves into 113 rays because {\pi \approx 355/113}. Counting these rays is boring, so here is a way to do this automatically with Python (+NumPy as np):

a = np.pi
n = 5000
x = np.mod(a*np.arange(n, 2*n), 1)
np.sum(np.diff(np.sort(x)) > 1/n)

This code computes the polar angles of sunflower points indexed {5000\le k<10000}, sorts them and counts the relatively large gaps between the sorted values. These correspond to the gaps between sunflower rays, except that one of the gaps gets lost when the circle is cut and straightened onto the interval {[0, 2\pi)}. So the program output (112) means there are 113 rays.

Here is the same sunflower with the points alternatively colored red and blue.

Click to see the larger image

The colors blur into purple when the rational approximation pattern is strong. But they are clearly seen in the transitional period from 22/7 approximation to 355/113.

  1. How many points would we need to see the next rational approximation after 355/113?
  2. What will that approximation be? Yes, 22/7 and 355/113 and among the convergent of the continued fraction of {\pi}. But so is 333/106 which I do not see in the sunflower. Are some convergents better than others?

Finally, the code I used to plot sunflowers.

import numpy as np
import matplotlib.pyplot as plt
a = np.pi
k = np.arange(10000)
r = np.sqrt(k)
t = a*2*np.pi*k 
plt.plot(r*np.cos(t), r*np.sin(t), '.')

Transcendental-free Riemann-Lebesgue lemma

Calculus books tend to introduce transcendental functions (trigonometric, exponential, logarithm) early. Analysis textbooks such as Principles of Mathematical Analysis by Rudin tend to introduce them later, because of how long it takes to develop enough of the theory of power series.

The Riemann-Lebesgue lemma involves either trigonometric or exponential functions. But the following version works with the “late transcendentals” approach.

Transcendental-free Riemann-Lebesgue Lemma

TFRLL. Suppose that {f\colon [a, b]\to \mathbb R} and {g\colon \mathbb R\to \mathbb R} are continuously differentiable functions, and {g} is bounded. Then {\int_a^b f(x)g'(nx)\,dx \to 0} as {n\to\infty}.

The familiar form of the lemma is recovered by letting {g(x) = \sin x} or {g(x) = \cos x}.

Proof. By the chain rule, {g'(nx)} is the derivative of {g(nx)/n}. Integrate by parts:

{\displaystyle  \int_a^b f(x)g'(nx)\,dx = \frac{f(b)g(nb)}{n} - \frac{f(a)g(na)}{n} - \int_a^b f'(x)\frac{g(nx)}{n}\,dx }

By assumption, there exists a constant {M} such that {|g|\le M} everywhere. Hence {\displaystyle \left| \frac{f(b)g(nb)}{n}\right| \le \frac{|f(b)| M}{n} }, {\displaystyle \left|\frac{f(a)g(na)}{n}\right| \le \frac{|f(a)| M}{n}}, and {\displaystyle \left|\int_a^b f'(x)\frac{g(nx)}{n}\,dx\right| \le \frac{M}{n} \int_a^b |f'(x)|\,dx}. By the triangle inequality,

{\displaystyle \left|\int_a^b f(x)g'(nx)\,dx \right| \le \frac{M}{n}\left(|f(b)|+|f(a)| + \int_a^b |f'(x)|\,dx \right) \to 0 }

completing the proof.

As a non-standard example, TFRLL applies to, say, {g(x) = \sin (x^2) } for which {g'(x) = 2x\cos (x^2)}. The conclusion is that {\displaystyle \int_a^b f(x) nx \cos (n^2 x^2) \,dx \to 0 }, that is, {\displaystyle  \int_a^b xf(x) \cos (n^2 x^2) \,dx = o(1/n)} which seems somewhat interesting. When {0\notin [a, b]}, the factor of {x} can be removed by applying the result to {f(x)/x}, leading to {\displaystyle \int_a^b f(x) \cos (n^2 x^2) \,dx = o(1/n)}.

What if we tried less smoothness?

Later in Rudin’s book we encounter the Weierstrass theorem: every continuous function on {[a, b]} is a uniform limit of polynomials. Normally, this would be used to make the Riemann-Lebesgue lemma work for any continuous function {f}. But the general form given above, with an unspecified {g}, presents a difficulty.

Indeed, suppose {f} is continuous on {[a, b]}. Given {\epsilon > 0 }, choose a polynomial {p} such that {|p-f|\le \epsilon} on {[a, b]}. Since {p} has continuous derivative, it follows that {\int_a^b p(x)g'(nx)\,dx \to 0}. It remains to show that {\int_a^b p(x)g'(nx)\,dx} is close to {\int_a^b f(x)g'(nx)\,dx}. By the triangle inequality,

{\displaystyle \left| \int_a^b (p(x) - f(x))g'(nx)\,dx \right| \le \epsilon \int_a^b |g'(nx)|\,dx }

which is bounded by … um. Unlike for {\sin } and {\cos}, we do not have a uniform bound for {|g'|} or for its integral. Indeed, with {g(x) = \sin x^2} the integrals {\displaystyle  \int_0^1 |g'(nx)| \,dx = \int_0^1 2nx |\cos (n^2x^2)| \,dx  } grow linearly with {n}. And this behavior would be even worse with {g(x) = \sin e^x}, for example.

At present I do not see a way to prove TFRLL for continuous {f}, let alone for integrable {f}. But I do not have a counterexample either.