Graphs with integer normalized Laplacian eigenvalues

Let {V} be the vertex set of a graph; all graphs here are assumed connected. Write {u\sim v} if {u, v} are adjacent vertices. The Laplacian of a function {f\colon V\to \mathbb R} is defined as {L f(v) = \sum_{u\colon u\sim v}(f(v)-f(u))}. Its properties were discussed earlier in Laplacian spectrum of small graphs.

The normalized Laplacian involves the vertex degree {d_v} (the number of vertices adjacent to {v}). It is defined by

{\displaystyle L_N f(v) = \frac{1}{\sqrt{d_v}}\sum_{u\colon u\sim v}\left( \frac{f(v)}{\sqrt{d_v}} - \frac{f(u)}{\sqrt{d_u}} \right)}

The matrix of {L_N} is obtained by multiplying the matrix of {L} on both sides by the diagonal matrix with {1/\sqrt{d_v}} on the diagonal. Hence, {L_N} is also positive semidefinite and has the same rank as {L}. In particular, it has a simple eigenvalue 0 and the rest of the eigenvalues are positive. Let us write {\mu_1 \le \cdots \le \mu_n} for the eigenvalues of {L_N}. In this notation, {\mu_1= 0} and {\mu_k>0} for {k\ge 2}.

There are many graphs with integer Laplacian eigenvalues; they do not seem to follow any particular pattern. In contrast, the graphs with integer normalized Laplacian eigenvalues have a simple description.

The matrix of the normalized Laplacian {L_N} has {1} in every diagonal entry. Its off-diagonal entries are {-1/\sqrt{d_ud_v}} if {u\sim v}, and {0} otherwise. Since the trace is {n} and there are {n-1} positive eigenvalues, an integer spectrum would consist of {n-1} positive integers summing to {n}; this forces {n-2} of them to be {1} and one of them to be {2}. So the only way a normalized Laplacian spectrum can consist of integers is {(0, 1, \ldots, 1, 2)}.

It turns out that the largest normalized Laplacian eigenvalue {\mu_n} is equal to {2} if and only if the graph is bipartite; otherwise it is less than {2} (as a reminder, all graphs are assumed connected in this post). Indeed, {\mu_n} is the norm of {L_N}, which is the supremum of {\langle L_N f, f\rangle/ \langle f, f\rangle} over all functions {f}. A computation shows {\langle L_N f, f\rangle = \sum_E (f(u)/\sqrt{d_u} - f(v) / \sqrt{d_v})^2} where the sum is taken over the edge set {E}: each edge contributes one term to the sum, with {u, v} being its endpoints. The sum becomes simpler if we write {f(v)=\sqrt{d_v} g(v)}; after this substitution,

{\displaystyle \mu_n = \sup_g \frac{\sum_E (g(u)-g(v))^2 }{\sum_V g(v)^2 d_v}}

The denominator can be rewritten as a sum over edges, namely {\sum_E (g(u)^2 + g(v)^2) }. This shows that the introduction of vertex degree into the denominator achieves certain symmetry: each value of {f} appears in the numerator as many times as in the denominator.

Since {(g(u)-g(v))^2 \le 2(g(u)^2 + g(v)^2)}, the numerator is at most {2 \sum_E (g(u)^2 + g(v)^2)}. Thus {\mu_n\le 2}, and to achieve equality here we need {g(u) + g(v) = 0} whenever {u\sim v}. This requires {|g|} to be constant and the graph to be bipartite, its 2-coloring provided by {\mathrm{sign}\,g}. Conversely, any 2-coloring of a graph provides us with a function {g\colon V\to \{-1, 1\}} for which {\sum_E (g(u)-g(v))^2 = 2\sum_V g(v)^2 d_v} according to the above.

To achieve integer spectrum, we also need {1} to be an eigenvalue of multiplicity {n-2} for {L_N}, which is equivalent to {\mathrm{rank}\,(I-L_N) = 2}. Since the graph is bipartite, {I-L_N} has the block form

{\displaystyle I - L_N = \begin{pmatrix} 0 & A \\ A^t & 0\end{pmatrix}}

where the (possibly rectangular) block {A} has entries {1/\sqrt{d_u d_v}} corresponding to {u\sim v}. In order for {I-L_N} to have rank 2, the matrix {A} must have rank 1. Any zero entry of a rank-1 matrix lies in a zero row, but {A} cannot have a zero row because the graph is connected. Thus, all entries of {A} are positive, which means we have a complete bipartite graph.

The converse is immediate: for a complete bipartite graph {K_{p, q}} the matrix {A} has all entries equal to {1/\sqrt{pq}}, hence its rank is 1.
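This characterization is easy to test numerically. Here is a quick numpy sketch (not part of the argument; the helper names are mine) confirming that {K_{2,3}} has spectrum {(0,1,1,1,2)} while the 6-cycle, which is bipartite but not complete bipartite, has non-integer eigenvalues.

```python
import numpy as np

def normalized_laplacian(adj):
    """L_N = D^{-1/2} (D - A) D^{-1/2} for a 0-1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    d = adj.sum(axis=1)
    dinv = np.diag(1 / np.sqrt(d))
    return dinv @ (np.diag(d) - adj) @ dinv

def complete_bipartite(p, q):
    a = np.zeros((p + q, p + q))
    a[:p, p:] = 1
    a[p:, :p] = 1
    return a

mu = np.sort(np.linalg.eigvalsh(normalized_laplacian(complete_bipartite(2, 3))))
# mu is approximately (0, 1, 1, 1, 2)

# the 6-cycle: bipartite but not complete bipartite, so non-integers appear
C6 = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
mu_c6 = np.sort(np.linalg.eigvalsh(normalized_laplacian(C6)))
```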

As a final remark, the ordinary (un-normalized) Laplacian of a complete bipartite graph {K_{p, q}} also has integer eigenvalues. Indeed, the complement of {K_{p, q}} is the disjoint union of complete graphs {K_p} and {K_q}, with the spectra {(0, p, \ldots, p)} and {(0, q, \ldots, q)}. Passing to the complement means replacing each eigenvalue {\lambda} by {n-\lambda}, except that one {0} stays {0} (this is discussed in Laplacian spectrum of small graphs). Hence, the Laplacian spectrum of {K_{p, q}} consists of: simple eigenvalues {0} and {n}, eigenvalue {p} of multiplicity {q-1}, and eigenvalue {q} of multiplicity {p-1}. Worth noting that the Laplacian spectrum determines a complete bipartite graph up to an isomorphism, while the normalized Laplacian spectrum does not.
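The multiplicity count for the ordinary Laplacian can also be confirmed directly; a sketch (helper names mine) for {K_{3,5}}, whose spectrum should be {0}, {3} with multiplicity {4}, {5} with multiplicity {2}, and {8}:

```python
import numpy as np
from collections import Counter

def laplacian_spectrum(adj):
    """Eigenvalues of L = D - A for a 0-1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.linalg.eigvalsh(np.diag(adj.sum(axis=1)) - adj)

p, q = 3, 5
a = np.zeros((p + q, p + q))
a[:p, p:] = 1
a[p:, :p] = 1
spec = Counter(int(round(x)) for x in laplacian_spectrum(a))
# spec == {0: 1, 3: 4, 5: 2, 8: 1}
```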

Functions with horizontally scaled derivative

Exponential functions {f_\lambda(x)=e^{\lambda x}} are characterized by the property that the graph of the derivative {f_\lambda'} is the graph of {f_\lambda} scaled by the factor of {\lambda} in the vertical direction: {f_\lambda'(x) = \lambda f_\lambda(x)}. We also have {f_\lambda(0)=1} for the sake of normalization.

What happens if we replace “vertical” by “horizontal”? That is, let {g_\lambda} be the function such that {g_\lambda'(x) = g_\lambda(x/\lambda)}, still with {g_\lambda(0)=1} for normalization. Clearly, {g_1(x)=e^x=f_1(x)}. Let’s consider {\lambda > 1} from now on. Assuming we can express {g_\lambda} as a power series {g_\lambda(x) = \sum\limits_{n=0}^\infty c_n x^n}, the comparison of {g_\lambda'(x) =\sum\limits_{n=0}^\infty c_{n+1} (n+1) x^n } and {g_\lambda(x/\lambda) =\sum\limits_{n=0}^\infty c_{n} \lambda^{-n}x^n } yields the recurrence relation {c_{n+1} = \lambda^{-n} c_n/(n+1)}. This relation leads to an explicit formula: {c_n = \lambda^{-\binom{n}{2}}/n!} where {\binom{n}{2}=n(n-1)/2}. That is,

{\displaystyle g_\lambda(x) = \sum_{n=0}^\infty \lambda^{-\binom{n}{2}} \frac{x^n}{n!}}

The behavior of {g_2, g_3}, and {g_5} in the vicinity of {0} is shown below.

Blue = g_2, red = g_3, yellow = g_5

A few questions come up. Is {g_\lambda} strictly increasing? Does it have a finite limit at {-\infty}? If so, is this limit negative? If so, where does {g_\lambda} cross the {x}-axis? What is its rate of growth as {x\to \infty}? Does the number {g_\lambda(1)} have any significance, considering that {g_1(1) = e}?

First of all: if {g_\lambda} does become negative at some point then its derivative also becomes negative further to the left, which can make {g_\lambda} rise to positive values again, and then the process will probably repeat as shown below. Also, {g_\lambda} is negative at each point of local minimum, and positive at each point of local maximum. This is because both the function and its derivative are positive at {0}, and moving to the left, the derivative cannot change the sign until the function itself changes the sign.

Oscillations are more rapid when lambda is smaller

When {\lambda} is close to {1}, the oscillating pattern takes longer to develop: here it is with {\lambda=1.1}. Note the vertical scale: these are very small oscillations, which is why this plot does not extend to the zero mark.

The waves get smaller but then start growing again, moving to the left

For any {\lambda > 1} the alternating series estimate gives {g_\lambda (x)>1+x} when {x\in [-1, 0]}, hence {g_\lambda(x)>0} for {x\ge -1}. It follows that {g_\lambda} is strictly increasing when {x\ge -\lambda}. We have

{\displaystyle g_\lambda(-\lambda) = \sum_{n=0}^\infty (-1)^n \lambda^{n-\binom{n}{2}}/n! = \frac{5}{6} - \frac{\lambda}{2} + \sum_{n=4}^\infty (-1)^n \lambda^{n-\binom{n}{2}}/n! }

Since {n-\binom{n}{2}} is decreasing for {n\ge 4}, the alternating series estimate applies and shows that {\displaystyle g_\lambda(-\lambda) < \frac{5}{6} - \frac{\lambda}{2} + \frac{\lambda^{-2}}{24}}

So, when {\lambda \ge 2} we have {g_{\lambda}(-\lambda) < 0} and therefore there is a unique point {r\in (-\lambda, -1)} where {g_\lambda(r)=0}. Specifically for {g_2}, this root is {r \approx -1.488}. Its significance is that {\lambda r} is the critical point closest to {0}, meaning that {[\lambda r, \infty)} is the largest interval of monotonicity of {g_\lambda}.

In general it is not true that {g_\lambda(-\lambda) < 0}. Indeed, {g_\lambda(x)\to e^x} uniformly on bounded sets as {\lambda \to 1}. I do not have a proof that {g_\lambda} has a real root for every {\lambda > 1}.

When {\lambda=2}, the sequence of denominators {2^{\binom{n}{2}}n!} is A011266 in OEIS (it is related to counting the evil and odious numbers). But the sum of its reciprocals, {g_2(1) \approx 2.2714925555}, did not show up anywhere.
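Both numbers quoted above — the root {r\approx -1.488} and the constant {g_2(1)\approx 2.2714925555} — are easy to reproduce from the series; a quick sketch (the function name is mine):

```python
from math import comb, factorial

def g(lam, x, terms=60):
    """Partial sum of g_lambda(x) = sum lambda^(-C(n,2)) x^n / n!."""
    return sum(lam ** -comb(n, 2) * x ** n / factorial(n) for n in range(terms))

c = g(2, 1)            # approximately 2.2714925555

# bisection for the root of g_2 in (-2, -1); g_2(-2) < 0 < g_2(-1)
a, b = -2.0, -1.0
for _ in range(60):
    m = (a + b) / 2
    if g(2, a) * g(2, m) <= 0:
        b = m
    else:
        a = m
root = (a + b) / 2     # approximately -1.488
```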

Riesz projection on polynomials

Consider a trigonometric polynomial of degree {n} with complex coefficients, represented as a Laurent polynomial {L(z) = \sum\limits_{k=-n}^n c_k z^k} where {|z|=1}. The Riesz projection of {L} is just the regular part of {L}, the one without negative powers: {R(z) = \sum\limits_{k=0}^n c_k z^k}. Let’s compare the supremum norm of {L}, { \|L\|=\max\limits_{|z|=1} |L(z)|}, with the norm of {R}.

The ratio {\|R\|/\|L\|} may exceed {1}. By how much?

The extreme example for {n=1} appears to be {L(z) = 2z + 4 - z^{-1}}, pictured below together with {R(z)=2z+4}. The polynomial {L} is in blue, {R} is in red, and the point {0} is marked for reference.

Laurent polynomial in blue, its regular part in red

Since {R(z)=2z+4} has positive coefficients, its norm is just {R(1)=6}. To compute the norm of {L}, let’s rewrite {|L(z)|^2 = L(z)L(z^{-1})} as a polynomial of {x=\mathrm{Re}\,(z)}. Namely, {|L(z)|^2 = -2z^2 + 4z + 21 + 4z^{-1} - 2z^{-2}} which simplifies to {27 - 2(2x-1)^2} in terms of {x}. Hence {\|L\| = \sqrt{27}} and {\|R\|/\|L\| = 6/\sqrt{27} = 2/\sqrt{3}\approx 1.1547}.

The best example for {n=2} appears to be vaguely binomial: {L(z) = 2z^2 + 4z + 6 - 4z^{-1} + z^{-2}}. Note that the range of {R} is a cardioid.

Laurent polynomial in blue, its regular part in red

Once again, {R(z) = 2z^2 + 4z + 6} has positive coefficients, hence {\|R\| = R(1) = 12}. And once again, {|L(z)|^2} is a polynomial of {x=\mathrm{Re}\,(z)}, specifically {|L(z)|^2 = 81 - 8(1-x^2)(2x-1)^2}. Hence {\|L\| = 9} and {\|R\|/\|L\| = 12/9 = 4/3\approx 1.3333}.
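Both ratios can be double-checked by brute force, sampling the unit circle densely (a numpy sketch, not part of the argument; since {\|R\|} is attained at {z=1}, the sampled maximum there is exact):

```python
import numpy as np

def sup_norm(coeffs, samples=20000):
    """Sup over |z|=1 of |sum c_k z^k|; coeffs maps power k -> coefficient."""
    t = np.linspace(0, 1, samples, endpoint=False)
    z = np.exp(2j * np.pi * t)
    return np.abs(sum(c * z ** k for k, c in coeffs.items())).max()

L1 = {1: 2, 0: 4, -1: -1};  R1 = {1: 2, 0: 4}
L2 = {2: 2, 1: 4, 0: 6, -1: -4, -2: 1};  R2 = {2: 2, 1: 4, 0: 6}

r1 = sup_norm(R1) / sup_norm(L1)   # approximately 2/sqrt(3) = 1.1547...
r2 = sup_norm(R2) / sup_norm(L2)   # approximately 4/3
```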

I do not have a symbolic candidate for the extremal polynomial of degree {n= 3}. Numerically, it should look like this:

Laurent polynomial in blue, its regular part in red

Is the maximum of {\|R\|/\|L\|} attained by polynomials with real, rational coefficients (which can be made integer)? Do they have some hypergeometric structure? Compare with the Extremal Taylor polynomials, another family of polynomials that maximize the supremum norm after the elimination of some coefficients.

Riesz projection as a contraction

To have some proof content here, I add a 2010 theorem by Marzo and Seip: {\|R\|_4 \le \|L\|} where {\|R\|_p^p = \int_0^1 |R(e^{2\pi i t})|^p\,dt}.

The theorem is not just about polynomials: it says the Riesz projection is a contraction (has norm {1}) as an operator {L^\infty\to L^4}.

Proof. Let {S=L-R}, the singular part of {L}. The polynomial {R-S} differs from {L} only by the sign of the singular part, hence {\|R-S\|_2 = \|L\|_2} by Parseval’s theorem.

Since {S^2} consists of negative powers of {z}, while {R^2} does not contain any negative powers, these polynomials are orthogonal on the unit circle. By the Pythagorean theorem, {\|R^2-S^2\|_2 \ge \|R^2\|_2 = \|R\|_4^2}. On the other hand, {R^2-S^2 = (R+S)(R-S)=L(R-S)}. Therefore, {\|R^2-S^2\|_2 \le \|L\| \|R-S\|_2 = \|L\| \|L\|_2 \le \|L\|^2}, completing the proof.

This is so neat. And the exponent {4} is best possible: the Riesz projection is not a contraction from {L^\infty} to {L^p} when {p>4} (the Marzo-Seip paper has a counterexample).
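The theorem is also easy to test on random Laurent polynomials. A convenient fact: for a trigonometric polynomial, {\|R\|_4} is computed exactly by equispaced sampling, since {|R|^4} is itself a trigonometric polynomial of low degree. A sketch (not from the Marzo–Seip paper):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 4096, endpoint=False)
z = np.exp(2j * np.pi * t)

worst = 0.0
for _ in range(200):
    n = int(rng.integers(1, 6))
    # random complex coefficients of z^{-n}, ..., z^{n}
    c = rng.normal(size=2 * n + 1) + 1j * rng.normal(size=2 * n + 1)
    L = sum(ck * z ** k for ck, k in zip(c, range(-n, n + 1)))
    R = sum(ck * z ** k for ck, k in zip(c[n:], range(0, n + 1)))
    R4 = np.mean(np.abs(R) ** 4) ** 0.25   # exact quadrature for |R|^4
    worst = max(worst, R4 / np.abs(L).max())
# worst stays below 1, consistent with ||R||_4 <= ||L||_inf
```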

Measure-distanced subsets of an interval

Suppose {(X, d)} is a bounded metric space in which we want to find {n} points at safe distance {\delta} from one another: {d(x_j, x_k)\ge \delta} for all {j\ne k}. Let {\delta_n} be the greatest value of {\delta} for which this is possible (technically, the supremum which might not be attained). We have {\mathrm{diam}\,(X) = \delta_2 \ge \delta_3\ge \cdots}.

The sequence {\delta_n} tends to zero if and only if {X} is totally bounded. From now on, consider a specific metric space which is bounded but not totally bounded: the space {X} of all measurable subsets of {[0, 1]} with the metric {d(A, B)=|A\triangle B|}, the measure of the symmetric difference. More specifically, the elements of {X} are equivalence classes of sets that differ by a null subset. Another way to think of it is the subset of {L^1([0, 1])} which consists of functions that attain only the values {0} and {1}. In this form we identify a set {A} with its characteristic function {\chi_A}. The measure space {[0, 1]} could be replaced by any isomorphic probability space, of which there are many.

The space {X} contains an infinite subset {\{E_1, E_2, \dots \}} with {\delta=1/2} separation: namely, let {E_k} be the set of all numbers {x\in [0, 1]} such that the fractional part of {2^k x} is less than {1/2}; or equivalently, the {k}th binary digit of {x} is zero. Each symmetric difference {E_j\triangle E_k} has measure exactly {1/2}. Therefore, {\delta_n\ge 1/2} for every {n}.
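The binary-digit construction can be checked in a discrete model, representing {[0,1]} by {2^m} dyadic subintervals of equal measure (a sketch; the encoding is mine):

```python
# E_k = set of subinterval indices whose k-th binary digit is 0
# (digit k of x in [0,1) corresponds to bit m-k of the index i).
m = 10
N = 2 ** m
E = [{i for i in range(N) if not (i >> (m - k)) & 1} for k in range(1, m + 1)]

# every pairwise symmetric difference has measure exactly 1/2
dists = {len(E[j] ^ E[k]) / N for j in range(m) for k in range(j + 1, m)}
```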

Upper bound for distances

To get an upper bound for {\delta_n}, suppose {\{E_1, \dots, E_n\}} are {\delta}-separated. Let {\chi_k} be the characteristic function of {E_k}. By assumption, {\int_0^1 |\chi_k-\chi_j|\ge \delta} whenever {1\le k < j \le n}. Summing over all such pairs {(k, j)} shows that {\int_0^1 \sum_{k<j} |\chi_k-\chi_j|\ge \delta n(n-1)/2}. Fix a point {x\in [0, 1]} and let {p} be the number of sets among {E_1, \dots, E_n} that contain {x}. An easy counting argument shows {\sum_{k<j} |\chi_k(x)-\chi_j(x)| = p(n-p)}. The latter quantity is bounded by {r(n-r)} where {r = \lceil n/2\rceil}. Since this bound holds for every {x}, it follows that {\delta n(n-1)/2 \le r(n-r)}, which simplifies to

{\displaystyle \delta \le \frac{r}{2r-1}}, where {r = \lceil n/2\rceil}.

For example, {\delta_4\le \delta_3\le 2/3}, {\delta_6\le \delta_5\le 3/5}, {\delta_8\le \delta_7\le 4/7}, and so on. Are these estimates sharp? For an affirmative answer it would be enough to demonstrate that {\delta_{2r}\ge r/(2r-1)} by finding a suitable collection of {2r} subsets. Note that sharpness requires that almost every point of {[0, 1]} be covered by exactly {r} of these subsets, and that all pairwise distances between sets must be equal to {r/(2r-1)}.

Sharpness of the upper bound

Four subsets with {2/3} separation are easy to find: {[0,1/3]}, {[1/3, 2/3]}, {[2/3, 1]}, and {[0, 1]}. Thus, {\delta_4=2/3}.

To find six subsets with {3/5} separation, some preparation is helpful.

For any set {A}, the map {E\mapsto E\triangle A} is an isometry of {X} onto {X}. So, there is no loss of generality in assuming that one of our {2r} sets is {[0, 1]}. Then we are to find {E_1,\dots, E_{2r-1}} of measure {1-r/(2r-1) = (r-1)/(2r-1)}. In order to have the specified distances, the measure of intersection of any two sets must be {(r-2)/(4r-2)}.

For example, if {r=2}, the above says we need three disjoint sets of measure {1/3}, which is a trivial thing to find. For {r>2} the sets will no longer be disjoint.

Discretization of the problem

A natural way to proceed is to partition {[0, 1]} into {4r-2} equal subintervals (or other sets of equal measure). Then we look for {2r-1} sets, each of which consists of {(4r-2)(r-1)/(2r-1) = 2r-2} subintervals. Their pairwise intersections must consist of {(4r-2)(r-2)/(4r-2) = r-2} subintervals.

The case {r=2} was done above. If {r=3}, we want to use {10} subintervals to form {5} sets with {4} subintervals in each. Pairwise intersections must consist of precisely {1} subinterval. Encoding subintervals by their indices 0-9, we can describe such sets as 0123, 3456, 6780, 1479, 2589. And since 10 is a triangular number, there is a neat illustration of these five sets: three sides of the triangle, and two “stars” connecting its center to the sides.

I admit it could be neater

There is no need to translate this back into sets formed by subintervals: as long as the combinatorial problem has a solution, we have the desired conclusion. In this case, the conclusion is {\delta_6=3/5}.
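The combinatorial check behind {\delta_6=3/5} takes a few lines (a sketch):

```python
from itertools import combinations
from collections import Counter

F = [set(s) for s in ("0123", "3456", "6780", "1479", "2589")]

# every pairwise intersection is a single subinterval ...
inters = {len(a & b) for a, b in combinations(F, 2)}
# ... so every pairwise distance is (4 + 4 - 2*1)/10 = 3/5
dists = {(len(a) + len(b) - 2 * len(a & b)) / 10 for a, b in combinations(F, 2)}
# and every subinterval is covered exactly twice, as sharpness requires
cover = Counter(x for s in F for x in s)
```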

Next up, {r=4}. Following the above method, we partition {[0, 1]} into {4r-2 = 14} equal subintervals and look for {7} sets, each of which consists of {6} subintervals. Pairwise intersections must consist of {2} subintervals each. Can this be done?

Think of a cube. It has {8} vertices and {6} faces, and {8+6=14}. In the set {faces and vertices}, we choose {7} subsets of cardinality {6} as follows:

  • {face, all its vertices, and the opposite face}. There are 6 such sets. Any two of them have exactly 2 elements in common.
  • {all faces} is our 7th set, it has 2 elements in common with each of the above.

Thus, {\delta_8 = 4/7}. But it is concerning that {\delta_6} and {\delta_8} required different combinatorial constructions. What happens next?

For {r=5}, we partition {[0, 1]} into {18} equal subintervals and look for {9} sets, with {8} subintervals in each. Pairwise intersections must consist of {3} subintervals. I don’t see any neat construction here.

Summary so far

The results obtained so far for {r=2, 3, 4} can be summarized as follows: there exists a 0-1 matrix {B_r} of size {(2r-1)\times (4r-2)} in which the dot product of two rows is {r-2} when the rows are distinct and {2r-2} otherwise. Specifically,

{\displaystyle B_2 = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} }

{\displaystyle B_3 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ \end{pmatrix}}

The matrix {B_4} is best written in block form {[F | V]} according to the faces-vertices interpretation. Here

{\displaystyle F = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}} and {\displaystyle V = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ \end{pmatrix}}
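The stated dot products can be verified directly from the matrix (a numpy sketch; the arrays are copied from the display above, and the Gram matrix of the rows should have {2r-2=6} on the diagonal and {r-2=2} off it):

```python
import numpy as np

F = np.array([[1,1,1,1,1,1],
              [1,1,0,0,0,0],
              [1,1,0,0,0,0],
              [0,0,1,1,0,0],
              [0,0,1,1,0,0],
              [0,0,0,0,1,1],
              [0,0,0,0,1,1]])
V = np.array([[0,0,0,0,0,0,0,0],
              [1,1,1,1,0,0,0,0],
              [0,0,0,0,1,1,1,1],
              [1,1,0,0,1,1,0,0],
              [0,0,1,1,0,0,1,1],
              [1,0,1,0,1,0,1,0],
              [0,1,0,1,0,1,0,1]])
B4 = np.hstack([F, V])
G = B4 @ B4.T
expected = 2 * np.ones((7, 7), dtype=int) + 4 * np.eye(7, dtype=int)
```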

Does {B_5} exist? Perhaps not.

Combinatorics of finite sets

In combinatorial terms, we are looking for a {k}-uniform {\ell}-intersecting family of {m} subsets of {[n]}, with specific values

{k = 2r-2}, {\ell=r-2}, {m=2r-1}, {n=4r-2}.

Explanation of terms: {[n]=\{1, 2, \dots, n\}}, being {k}-uniform means every set has {k} elements, and being {\ell}-intersecting means all pairwise intersections have {\ell} elements.

Thus began my fishing expedition into the literature on combinatorics of finite sets. Surprisingly, it ended with a pivot back to where it started: measurable subsets of an interval.

Fractional intersecting uniform families

A recent (2018) preprint Uniform sets in a family with restricted intersections (Yandong Bai, Binlong Li, Jiuqiang Liu, Shenggui Zhang) relates the combinatorial problem back to subsets of an interval. I quote the relevant result (Theorem 2.5) with slight changes of notation.


Let {X=[0,n]} be the real interval and let {\lambda} be the Lebesgue measure on {X}. If {\mathcal{F}=\{F_1,\ldots,F_m\}} is a family of subsets of {X} such that

  1. {\lambda(F_i)=k} for all {i\in [m]}, and
  2. {\lambda(F_i\cap F_j)=\ell} for all distinct {i,j\in[m]},

then we call {\mathcal{F}} a fractional {\ell}-intersecting {k}-uniform {m}-family of {X}. For given real numbers {n,\ell} and integer {m}, let {\kappa_{\ell}^{\text{frac}}(n,m)} be the largest real number {k} such that there exists a fractional {\ell}-intersecting {k}-uniform {m}-family of {[0,n]}.

Theorem 2.5 Let {n>\ell\ge 0} be two real numbers and let {m\ge 1} be an integer. Then

{\displaystyle \kappa_{\ell}^{\text{frac}}(n,m)={m-1\choose s-1}\alpha+{m-1\choose t-1}\beta, }

where {\displaystyle s=\left\lfloor\frac{1+\sqrt{1+4m(m-1)\ell/n}}{2}\right\rfloor}, {\displaystyle t=\left\lceil\frac{1+\sqrt{1+4m(m-1)\ell/n}}{2}\right\rceil}

and {(\alpha,\beta)} is the solution of the system

{\displaystyle \binom{m}{s}\alpha+\binom{m}{t}\beta=n}

{\displaystyle \binom{m-2}{s-2}\alpha+\binom{m-2}{t-2}\beta=\ell}.


This looks complicated, but with our values {\ell=r-2}, {m=2r-1}, {n=4r-2} both {s} and {t} turn out to be equal to {r-1}. This is an exceptional case when the linear system is degenerate: it amounts to {\alpha+\beta = (4r-2)/\binom{2r-1}{r-1}}. The final result is {\kappa_{\ell}^{\text{frac}}(n,m)=2r-2} as we wanted. Yay.
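The simplification can be checked by direct computation: the discriminant {1+4m(m-1)\ell/n} equals {(2r-3)^2} exactly, and in the degenerate case {\kappa = \binom{m-1}{s-1}(\alpha+\beta)} with {\alpha+\beta = n/\binom{m}{s}}. A sketch (exact integer identities keep the floating-point floor/ceiling safe here):

```python
from math import comb, floor, ceil, sqrt

for r in range(2, 60):
    ell, m, n = r - 2, 2 * r - 1, 4 * r - 2
    disc = sqrt(1 + 4 * m * (m - 1) * ell / n)   # equals 2r - 3 exactly
    s, t = floor((1 + disc) / 2), ceil((1 + disc) / 2)
    assert s == t == r - 1
    # kappa = C(m-1, s-1) * n / C(m, s) should equal 2r - 2
    assert comb(m - 1, s - 1) * n == (2 * r - 2) * comb(m, s)
```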

So, the upper bounds for distances are indeed sharp, meaning that {n} subsets of the unit interval can be separated by measure-distance of {r/(2r-1)}, where {r = \lceil n/2\rceil}. But dividing the intervals into {4r-2} equal parts may not be enough to construct such subsets: in general one has to use a larger number of parts, some integer multiple of {4r-2}.

The authors remark that if {\alpha, \beta} are integers, then a partition with {k=\kappa^{\text{frac}}} exists in the combinatorial realm of {\{1, \dots, n\}}. However, this is not “if and only if”. Indeed, {\alpha+\beta = (4r-2)/\binom{2r-1}{r-1}} which takes integer values when {r=1, 2, 3}, and is fractional (less than 1) when {r\ge 4}. But we do have a combinatorial solution when {r=4}. So it is not quite clear for what values of {r} there exists a {(2r-2)}-uniform {(r-2)}-intersecting family of {2r-1} subsets of {[4r-2]}. Perhaps an answer could be found by following the construction for this specific case, when the formulas magically simplify.

Visualizing the Hardy norms on polynomials

The Hardy norm of a polynomial {f} is

{\displaystyle \|f\|_p = \left( \int_0^{1} |f(e^{2\pi it})|^p \,dt \right)^{1/p} }

with the usual convention

{\displaystyle \|f\|_\infty = \sup_{[0, 1]} |f(e^{2\pi it})| }

(This applies more generally to holomorphic functions, but polynomials suffice for now.) The quantity {\|f\|_p} makes sense for {0<p\le \infty} but is not actually a norm when {p<1}.

When restricted to polynomials of degree {d}, the Hardy norm provides a norm on {\mathbb C^{d+1}}, which we can try to visualize. In the special case {p=2} the Hardy norm agrees with the Euclidean norm by Parseval’s theorem. But what is it for other values of {p}?

Since the space {\mathbb C^{d+1}} has {2d+2} real dimensions, it is hard to visualize unless {d} is very small. When {d=0}, the norm we get on {\mathbb C^1} is just the Euclidean norm regardless of {p}. The first nontrivial case is {d=1}. That is, we consider the norm

{\displaystyle \|(a, b)\|_p = \left( \int_0^{1} |a + b e^{2\pi it}|^p \,dt \right)^{1/p} }

Since the integral on the right does not depend on the arguments of the complex numbers {a, b}, we lose nothing by restricting attention to {(a, b)\in \mathbb R^2}. (Note that this would not be the case for degrees {d\ge 2}.)
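The numerics behind the plots below amount to one quadrature; here is a minimal sketch (function name mine) that also confirms the two easy cases, {p=2} (Euclidean, by Parseval) and {p=\infty} (where {\|(a,b)\|_\infty=|a|+|b|}):

```python
import numpy as np

def hardy_norm(a, b, p, samples=4096):
    """||(a,b)||_p, by sampling |a + b e^{2 pi i t}| at equispaced t."""
    t = np.linspace(0, 1, samples, endpoint=False)
    vals = np.abs(a + b * np.exp(2j * np.pi * t))
    if p == np.inf:
        return vals.max()
    return np.mean(vals ** p) ** (1 / p)
```

For example, `hardy_norm(3, 4, 2)` returns {5} and `hardy_norm(1, 1, np.inf)` returns {2}, and the values increase with {p}, matching the nested unit balls shown below.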

One easy case is {\|(a, b)\|_\infty = \sup_{[0, 1]} |a+be^{2\pi i t}| = |a|+|b|} since we can find {t} for which the triangle inequality becomes an equality. For the values {p\ne 2,\infty} numerics will have to do: we can use them to plot the unit ball for each of these norms. Here it is for {p=4}:

hardy4
p = 4

And {p=10}:

hardy10
p = 10

And {p=100}, which is pretty close to the rotated square that we would get for {p=\infty}.

hardy100
p = 100

In the opposite direction, {p=1} brings a surprise: the unit ball of the Hardy {1}-norm is strictly convex, unlike that of the usual {1}-norm.

hardy1
p = 1

This curve already appeared on this blog in Maximum of three numbers: it’s harder than it sounds where I wrote

it’s a strange shape whose equation involves the complete elliptic integral of second kind. Yuck.

Well, that was 6 years ago; with time I grew to appreciate this shape and the elliptic integrals.

Meanwhile, surprises continue: when {p=1/2 < 1}, the Hardy norm is an actual norm, with a convex unit ball.

hardy05
p = 0.5

Same holds for {p=1/5}:

hardy02
p = 0.2

And even for {p=1/100}:

hardy001
p = 0.01

And here are all these norms together: from the outside in, the values of {p} are 0.01, 0.2, 0.5, 1, 2, 4, 10, 100. Larger {p} makes a larger Hardy norm, hence a smaller unit ball: the opposite behavior to the usual {p}-norms {(|a|^p + |b|^p)^{1/p}}.

hardy_all
All together now

I think this is a pretty neat picture: although the shapes look vaguely like {\ell^p} balls, the fact that the range of the exponent is {(0, \infty]} instead of {[1, \infty]} means there is something different in the behavior of these norms. For one thing, the Hardy norm with {p=1} turns out to be almost isometrically dual to the Hardy norm with {p=4}: this became the subject of a paper and then another one.

The effect of adding an edge on Laplacian eigenvalues

Adding an edge to a connected graph (between two of its existing vertices) makes the graph “even more” connected. How to quantify this? The Poincaré inequality gives a way to do this: for any function {f\colon V\to \mathbb R} with zero mean,

{\displaystyle \sum_V f(v)^2 \le C \sum_E (f(u)-f(v))^2 }

where the sum on the left is over the vertex set {V}, the sum on the right is over the edge set {E}, and {u, v} on the right are the endpoints of an edge. Suppose {C} is the smallest possible constant in this inequality, the Poincaré constant of the graph. Adding an edge to the graph does not change the sum on the left but may increase the sum on the right. Thus, the new Poincaré constant {C'} satisfies {C'\le C}: greater connectivity means smaller Poincaré constant.

It is more convenient to work with the reciprocal of {C}, especially because {1/C = \lambda_2}, the smallest positive eigenvalue of the graph Laplacian. Let {\lambda_2'} be the corresponding eigenvalue after an edge is added. The above shows that {\lambda_2 \le \lambda_2'}. But there is also an upper bound on how much this eigenvalue (or any Laplacian eigenvalue) can grow. Indeed, adding an edge amounts to adding the matrix

{\displaystyle B = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}} (with a bunch of zero rows/columns)

to the Laplacian matrix {L}. The eigenvalues of {B} are {0} and {2}. As a consequence of the Courant-Fischer minimax formulas, the ordered eigenvalues {\lambda_1 \le \dots \le \lambda_n} satisfy {\lambda_k(L) \le \lambda_k(L+B) \le \lambda_k(L) + 2} for every {k}.

Both of the above bounds are sharp. If a diagonal edge is added to the 4-cycle,

C4add

the smallest positive eigenvalue remains the same ({2}). The reason is that a typical eigenfunction for {\lambda_2} takes on values {-1, 0, 1, 0} (in cyclic order) and adding an edge between two vertices with the same value of such eigenfunction does not affect the Poincaré constant.

On the other hand, adding one more edge increases {\lambda_2} from {2} to {4}.

C4add2
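Both claims about the 4-cycle, and the interlacing bound {\lambda_k \le \lambda_k' \le \lambda_k+2}, can be confirmed numerically (a numpy sketch; helper names mine):

```python
import numpy as np

def lap_spectrum(a):
    """Ordered eigenvalues of the Laplacian L = D - A."""
    a = np.asarray(a, dtype=float)
    return np.sort(np.linalg.eigvalsh(np.diag(a.sum(axis=1)) - a))

C4 = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
e0 = lap_spectrum(C4)                                  # (0, 2, 2, 4)

one_diag = C4.copy(); one_diag[0, 2] = one_diag[2, 0] = 1
e1 = lap_spectrum(one_diag)                            # lambda_2 is still 2

both = one_diag.copy(); both[1, 3] = both[3, 1] = 1    # now the graph is K_4
e2 = lap_spectrum(both)                                # lambda_2 jumps to 4
```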

Normalized Laplacian

Another form of the Poincaré inequality involves the vertex degree: when summing over vertices, we give each of them weight equal to the number of neighbors.

{\displaystyle \sum_V f(v)^2 d_v \le C \sum_E (f(u)-f(v))^2 } provided {\displaystyle \sum_V f(v)d_v = 0}

Here {1/C = \mu_2}, the smallest positive eigenvalue of the normalized Laplacian {L_N}, the matrix with {1} on the diagonal, where off-diagonal entries are {-1/\sqrt{d_ud_v}} if {u\sim v}, and {0} otherwise. The sum of the normalized Laplacian eigenvalues is equal to {n} (the number of vertices), regardless of how many edges there are. So we cannot expect all of them to grow when an edge is added. But how do they change?

Let {\mu_2'} be the corresponding eigenvalue after an edge is added.
Here are two examples for which {\mu_2' - \mu_2 = 1/2}:

Completing a 3-path to a 3-cycle increases the eigenvalue from 1 to 3/2.

P3add

Completing a 4-path to a 4-cycle increases the eigenvalue from 1/2 to 1.

P4add

This pattern does not continue: “5-path to 5-cycle” results in a smaller increase, which is not even largest among 5-vertex graphs. It appears that the above examples are the only two cases of {\mu_2' - \mu_2 = 1/2}, with all other graphs having {\mu_2' - \mu_2 < 1/2}. (I do not have a proof of this.)
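These eigenvalue increments are quick to verify (a numpy sketch; helper names mine):

```python
import numpy as np

def mu2(adj):
    """Smallest positive normalized Laplacian eigenvalue."""
    adj = np.asarray(adj, dtype=float)
    d = adj.sum(axis=1)
    dinv = np.diag(1 / np.sqrt(d))
    LN = dinv @ (np.diag(d) - adj) @ dinv
    return np.sort(np.linalg.eigvalsh(LN))[1]

def path(n):
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1
    return a

def add_edge(a, i, j):
    b = a.copy(); b[i, j] = b[j, i] = 1
    return b

# increment of mu_2 when the n-path is completed to the n-cycle
inc = {n: mu2(add_edge(path(n), 0, n - 1)) - mu2(path(n)) for n in (3, 4, 5)}
# inc[3] and inc[4] equal 1/2; inc[5] is smaller
```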

The opposite direction

Adding an edge can also make {\mu_2} smaller (thus, the weighted Poincaré constant becomes larger, showing that it may not be as useful for quantifying the connectivity of a graph). The smallest example is adding an edge to the star graph on 4 vertices:

S4add

here {\mu_2 = 1} but {\mu_2' = 5/4 - \sqrt{33}/12 \approx 0.77}.

Let us analyze the star graphs on {n} vertices, for general {n}. In an earlier post we saw that {\mu_2 = 1}, so it remains to find {\mu_2'}. Let us order the vertices so that {1} is the center of the star and {(2,3)} is the added edge. Write {\beta = -1/\sqrt{n-1}} for brevity. Then the normalized Laplacian matrix {L_N} is

{\displaystyle \begin{pmatrix} 1 & \beta/\sqrt{2} & \beta/\sqrt{2} & \beta & \cdots & \beta \\ \beta/\sqrt{2} & 1 & -1/2 & 0 & \cdots & 0 \\ \beta/\sqrt{2} & -1/2 & 1 & 0 & \cdots & 0 \\ \beta & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \beta & 0 & 0 & 0 & \cdots & 1 \\ \end{pmatrix}}

where all rows numbered {i > 3} consist of {\beta} in the first column and {1} in the {i}th column. This shows that {L_N-I} has rank {4}, hence {1} is an eigenvalue of multiplicity {n-4}. Also, {0} is a simple eigenvalue as for any connected graph. Less obviously, {3/2} is an eigenvalue with the corresponding eigenvector {(0, 1, -1, 0, \dots, 0)} – this eigenvalue comes from the submatrix in row/columns 2 and 3, where the surrounding values in these row/columns are nice enough to cancel out. We are left with two eigenvalues to find, and we know their sum is {n - (n-4) - 3/2 = 5/2}. This means the characteristic polynomial of {L_N} is of the form

{\displaystyle p(t) = t (t-1)^{n-4} (t-3/2) (t^2 - 5t/2 + \gamma)}

where the constant {\gamma} remains to be found. I am going to skip to the answer here: {\gamma = n/(n-1)}. The quadratic formula delivers the last two eigenvalues:

{\displaystyle \frac14 \left( 5 \pm \sqrt{\frac{9n-25}{n-1}}\right) }

The smaller of these is {\mu_2'} that we were looking for:

{\displaystyle \mu_2' = \frac14 \left( 5 - \sqrt{\frac{9n-25}{n-1}}\right) \to \frac12 } as {n\to\infty}

Thus, adding an edge to a star graph results in a negative difference {\mu_2' - \mu_2}, which decreases to {-1/2} as {n\to\infty}. Exhaustive search for {n\le 9} shows that the star graph has the smallest value of {\mu_2' - \mu_2} among all graphs on {n} vertices. It is reasonable to expect this pattern to continue, which would imply {\mu_2' - \mu_2 > -1/2} in all cases.
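The formula for {\mu_2'} can be checked against a direct eigenvalue computation (a numpy sketch; helper names mine):

```python
import numpy as np

def mu2_star_plus_edge(n):
    """Smallest positive normalized Laplacian eigenvalue of the n-vertex star
    with an extra edge joining two of its leaves."""
    a = np.zeros((n, n))
    a[0, 1:] = a[1:, 0] = 1      # vertex 0 is the center
    a[1, 2] = a[2, 1] = 1        # the added edge between two leaves
    d = a.sum(axis=1)
    dinv = np.diag(1 / np.sqrt(d))
    LN = dinv @ (np.diag(d) - a) @ dinv
    return np.sort(np.linalg.eigvalsh(LN))[1]

formula = lambda n: (5 - np.sqrt((9 * n - 25) / (n - 1))) / 4
# agreement for n = 4, 5, ..., and formula(4) = 5/4 - sqrt(33)/12
```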

Continued fractions vs Power series

The Taylor series of the exponential function,
{\displaystyle e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \cdots }
provides reasonable approximations to the function near zero: for example, with {\displaystyle T_3(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} } we get

T3
Taylor polynomial T_3 in red

The quality of approximation not being spectacular, one can try to improve it by using rational functions instead of polynomials. In view of the identity
{\displaystyle e^x = \frac{e^{x/2}}{e^{-x/2}} \approx \frac{T_3(x/2)}{T_3(-x/2)}}
one can get {e^x\approx p(x)/p(-x)} with {\displaystyle p(x) = 1 + \frac{x}{2} + \frac{x^2}{8} + \frac{x^3}{48} }. The improvement is substantial for negative {x}:

exp-half
Rational approximation based on exp(x/2)

Having rational approximation of the form {e^x\approx p(x)/p(-x)} makes perfect sense, because such approximants obey the same functional equation {f(x)f(-x)=1} as the exponential function itself. We cannot hope to satisfy other functional equations like {f(x+1)=e f(x)} by functions simpler than {\exp}.

However, the polynomial {T_n(x/2)} is not optimal for the approximation {e^x \approx p(x)/p(-x)} except for {n=0, 1}. For degree {3}, the optimal choice is {\displaystyle p(x) = 1 + \frac{x}{2} + \frac{x^2}{10} + \frac{x^3}{120} }. In the same plot window as above, the graph of {p(x)/p(-x)} is indistinguishable from {e^x}.

[Figure: Red = rational, Blue = exponential]

This is a Padé approximant to the exponential function. One way to obtain such approximants is to replace the Taylor series with a continued fraction, using long division:

{\displaystyle e^x = 1 + \dfrac{x}{1 + \dfrac{-x/2}{1 + \dfrac{x/6}{1 + \dfrac{-x/6}{1 + \dfrac{x/10}{1 + \dfrac{-x/10}{1 + \cdots }}}}}} }

Terminating the expansion after an even number of appearances of {x} gives a Padé approximant of the above form.

This can be compared to replacing the decimal expansion of the number {e}:

{\displaystyle e = 2 + \frac{7}{10} + \frac{1}{10^2} + \frac{8}{10^3} + \frac{2}{10^4}+\cdots }

with the continued fraction expansion

{\displaystyle e = 2 + \dfrac{1}{1 + \dfrac{1}{2 + \dfrac{1}{1 + \dfrac{1}{1 + \dfrac{1}{4 + \dfrac{1}{1 + \cdots }}}}}} }

which, besides greater accuracy, has a regular pattern: 121 141 161 181 …
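For a concrete comparison (my own snippet, not from the post): the convergent through the partial quotient {6} already beats the decimal truncation {2.7182}.

```python
# Evaluate a convergent of e = [2; 1, 2, 1, 1, 4, 1, 1, 6, ...] exactly,
# and compare its accuracy with the truncated decimal expansion 2.7182.
from fractions import Fraction
import math

def e_convergent(terms):
    # terms: the partial quotients after the leading 2
    x = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        x = a + 1 / x
    return 2 + 1 / x

approx = e_convergent([1, 2, 1, 1, 4, 1, 1, 6])   # the fraction 1264/465
assert abs(approx - math.e) < abs(Fraction(27182, 10000) - math.e)
```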

The numerators {p} of the (diagonal) Padé approximants to the exponential function happen to have a closed form:

{\displaystyle p(x) = \sum_{k=0}^n \frac{\binom{n}{k}}{\binom{2n}{k}} \frac{x^k}{k!} }

which shows that for every fixed {k}, the coefficient of {x^k} converges to {2^{-k}/k!} as {n\to\infty }. The latter is precisely the Taylor coefficient of {e^{x/2}}.

In practice, a recurrence relation is probably the easiest way to get these numerators: begin with {p_0(x)=1} and {p_1(x) = 1+x/2}, and after that use {\displaystyle p_{n+1}(x) = p_n(x) + \frac{x^2}{16n^2 - 4} p_{n-1}(x)}. This relation can be derived from the recurrence relations for the convergents {A_n/B_n} of a generalized continued fraction {\displaystyle \mathop{\raisebox{-5pt}{\huge K}}_{n=1}^\infty \frac{a_n}{b_n}}. Namely, {A_n = b_nA_{n-1}+a_nA_{n-2}} and {B_n = b_nB_{n-1}+a_nB_{n-2}}. Only the first relation is actually needed here.
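Here is this recurrence in code (a sketch with exact rational arithmetic; the function name is mine). Polynomials are stored as coefficient lists {[c_0, c_1, \ldots]}.

```python
# Compute the Padé numerators p_0, ..., p_N via
# p_{n+1}(x) = p_n(x) + x^2/(16n^2 - 4) * p_{n-1}(x).
from fractions import Fraction

def pade_numerators(N):
    p = [[Fraction(1)], [Fraction(1), Fraction(1, 2)]]       # p_0, p_1
    for n in range(1, N):
        shifted = [Fraction(0), Fraction(0)] + p[n-1]        # x^2 * p_{n-1}
        padded = p[n] + [Fraction(0)] * (len(shifted) - len(p[n]))
        c = Fraction(1, 16*n*n - 4)
        p.append([a + c*b for a, b in zip(padded, shifted)])
    return p

p = pade_numerators(5)
assert p[3] == [1, Fraction(1, 2), Fraction(1, 10), Fraction(1, 120)]
assert p[5][2] == Fraction(1, 9)
```

With these coefficient lists, the approximant {p(x)/p(-x)} can be evaluated directly at any point.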

Using the recurrence relation, we get

{\displaystyle p_2(x) = p_1(x) + \frac{x^2}{12}p_0(x) = 1 + \frac{x}{2} + \frac{x^{2}}{12} }

{\displaystyle p_3(x) = p_2(x) + \frac{x^2}{60}p_1(x) = 1 + \frac{x}{2} + \frac{x^{2}}{10} + \frac{x^{3}}{120} }

{\displaystyle p_4(x) = p_3(x) + \frac{x^2}{140}p_2(x) = 1 + \frac{x}{2} + \frac{3 x^{2}}{28} + \frac{x^{3}}{84} + \frac{x^{4}}{1680} }

(so, not all coefficients have numerator 1…)

{\displaystyle p_5(x) = p_4(x) + \frac{x^2}{252}p_3(x) = 1 + \frac{x}{2} + \frac{x^{2}}{9} + \frac{x^{3}}{72} + \frac{x^{4}}{1008} + \frac{x^{5}}{30240} }

The quality of approximation to {e^x} is best seen on logarithmic scale: i.e., how close is {\log(p(x)/p(-x))} to {x}? Here is this comparison using {p_5}.

[Figure: Rational approximation based on 5th degree polynomial]

For comparison, the Taylor polynomial of fifth degree, also on logarithmic scale: {\log(T_5(x))} where {\displaystyle T_5(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120}}.

[Figure: log(T_5) in red]

Inflection points of bell shaped curves

The standard normal probability density function {f(x) = \exp(-x^2/2)/\sqrt{2\pi}} has inflection points at {x = \pm 1} where {y=\exp(-1/2) /\sqrt{2\pi}\approx 0.24} which is about 60.5% of the maximum of this function. For this, as for other bell-shaped curves, the inflection points are also the points of steepest incline.

[Figure: Standard Gaussian with inflection points marked in red]

This is good to know for drawing an accurate sketch of this function, but in general, the Gaussian curve may be scaled differently, like {A\exp(-B x^2)}, and then the inflection points will be elsewhere. However, their relative height is invariant under scaling: it is always 60.5% of the maximum height of the curve. Since it is the height that we focus on, let us normalize various bell-shaped curves to have maximum {1}:

[Figure: Gaussian rescaled to height 1]

So, the Gaussian curve is inflected at the relative height of {\exp(-1/2)\approx 0.605}. For the Cauchy density {1/(x^2+1)} the inflection is noticeably higher, at {3/4 = 0.75} of the maximum:

[Figure: Cauchy distribution]

Another popular bell shape, hyperbolic secant or simply {2/(e^x+e^{-x})}, is in between with inflection height {1/\sqrt{2}\approx 0.707}. It is slightly unexpected to see an algebraic number arising from this transcendental function.

[Figure: sech is such a cool name for a function]
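These three inflection heights can be confirmed numerically without any closed-form differentiation (a sketch of my own: bisect on the sign of a central-difference second derivative).

```python
# Locate the inflection point of a bell curve on a bracket [a, b] where the
# second derivative changes sign, then return the height there.
import math

def inflection_height(f, a, b, h=1e-4, tol=1e-10):
    d2 = lambda x: f(x - h) - 2*f(x) + f(x + h)   # same sign as f''(x)
    while b - a > tol:
        m = (a + b) / 2
        if d2(a) * d2(m) <= 0:
            b = m
        else:
            a = m
    return f((a + b) / 2)

gauss = lambda x: math.exp(-x*x/2)
cauchy = lambda x: 1/(x*x + 1)
sech = lambda x: 2/(math.exp(x) + math.exp(-x))

# expected heights: exp(-1/2), 3/4, 1/sqrt(2)
for f, expected in [(gauss, math.exp(-0.5)), (cauchy, 0.75), (sech, 2**-0.5)]:
    assert abs(inflection_height(f, 0.5, 2.0) - expected) < 1e-5
```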

Can we get inflection height below {1/2}? One candidate is {1/(x^n+1)} with large even {n}, but this does not work: the relative height of inflection is {\displaystyle \frac{n+1}{2n} > \frac{1}{2}}. Shown here for {n=10}:

[Figure: A smooth trapezoid]

However, increasing the power of {x} in the Gaussian curve works: for example, {\exp(-x^4)} has inflection at relative height {\exp(-3/4) \approx 0.47}:

[Figure: Does this distribution have a name?]

More generally, the relative height of inflection for {\exp(-x^n)} is {\exp(1/n - 1)} for even {n}. As {n\to \infty}, this approaches {1/e\approx 0.37}. Can we go lower?
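Indeed, for {f(x)=\exp(-x^n)} one computes {f''(x) = \left(n^2x^{2n-2} - n(n-1)x^{n-2}\right)\exp(-x^n)}, which vanishes at {x^n = (n-1)/n}, giving the height {\exp(-(n-1)/n) = \exp(1/n-1)}. A quick check of this (the code is my own sketch):

```python
# Relative inflection height of exp(-x^n): evaluate f at the root of f''.
import math

def inflection_height_exp(n):
    x = ((n - 1) / n) ** (1 / n)      # where the second derivative vanishes
    return math.exp(-x**n)            # equals exp(1/n - 1)

assert abs(inflection_height_exp(4) - math.exp(-3/4)) < 1e-12
assert abs(inflection_height_exp(1000) - 1/math.e) < 1e-3   # approaches 1/e
```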

Well, there are compactly supported bump functions which look bell-shaped, for example {\exp(1/(x^2-1))} for {|x|<1}. Normalizing the height to {1} makes it {\exp(x^2/(x^2-1))}. For this function inflection occurs at relative height about {0.255}.

[Figure: This one actually looks like a bell]

Once again, we can replace {2} by an arbitrary positive even integer {n} and get relative inflection height down to {\displaystyle\exp\left(-\left(1+\sqrt{5-4/n}\right)/2\right)}. As {n} increases, this height decreases to {\displaystyle e^{-\varphi}} where {\varphi} is the golden ratio. This is less than {0.2} which is low enough for me today. The smallest {n} for which the height is less than {0.2} is {n=54}: it achieves inflection at {y\approx 0.1999}.

[Figure: A tough one for derivative-based numerical solvers]

In the opposite direction, it is easy to produce a bell-shaped curve with high inflection points: either {1/(|x|^p + 1)} or {\exp(-|x|^p)} will do, where {p} is slightly larger than {1}. But these examples are only once differentiable, unlike the infinitely smooth examples above. Aside: as {p\to 1}, the latter function converges to the (rescaled) density of the Laplace distribution and the former to a non-integrable function.

As for the middle between two extremes… I did not find a reasonable bell-shaped curve that inflects at exactly half of its maximal height. An artificial example is {\exp(-|x|^p)} with {p=1/(1-\log 2) \approx 3.26} but this is ugly and only {C^3} smooth.

Maximal algebraic connectivity among graphs of bounded degree

The algebraic connectivity { a(G) } of a graph {G } is its smallest nontrivial Laplacian eigenvalue. Equivalently, it is the minimum of edge sums {\sum_{e\in E} df(e)^2} over all functions {f\colon V\to\mathbb R} normalized by { \sum_{v\in V} f(v) = 0 } and {\sum_{v\in V} f(v)^2 = 1 }. Here  { V, E } are vertex/edge sets, and { df(e)} means the difference of the values of {f } at the vertices of the edge { e}.

It is clear from the definition that adding edges to { G} cannot make  {a(G) } smaller; thus, among all graphs on {n } vertices the maximal value of {a(G) } is attained by the complete graph, for which {a(G) = n}. For any other graph { a(G)\le n-2}, which can be shown as follows. Pick two non-adjacent vertices {u } and {v } and let {f(u)=1/\sqrt{2} }, { f(v) = -1/\sqrt{2}}, and {f=0 } elsewhere. This function is normalized as required above, and its edge sum {\sum_{e\in E} df(e)^2} is at most { n-2}: there are at most {  2(n-2)} edges with a nonzero contribution, and each of them contributes {(1/\sqrt{2})^2 = 1/2}.
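A quick numerical illustration with numpy (my own sketch, not from the post): {a(K_n)=n}, and deleting a single edge drops the algebraic connectivity to exactly {n-2}.

```python
# Algebraic connectivity = second-smallest Laplacian eigenvalue.
import numpy as np

def algebraic_connectivity(A):
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

n = 7
A = np.ones((n, n)) - np.eye(n)        # complete graph K_n
assert abs(algebraic_connectivity(A) - n) < 1e-9
A[0, 1] = A[1, 0] = 0                  # delete one edge
assert abs(algebraic_connectivity(A) - (n - 2)) < 1e-9
```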

What if we require the graph to have vertex degree at most {d}, and look for the maximal connectivity then? First of all, only connected graphs are under consideration, since {a(G)=0} for non-connected graphs. Also, only the cases {n\ge d+2 } are of interest; otherwise the complete graph wins. The argument in the previous paragraph shows that { a(G) \le d}, but is this bound attained?

The case {d=2} is boring: the only two connected graphs are the path {P_n} and the cycle { C_n}. The cycle wins with  {a(C_n) = 2(1-\cos(2\pi/n)) } versus { a(P_n) = 2(1-\cos(\pi/n))}.
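Both closed forms are easy to confirm numerically (my own sketch):

```python
# Compare a(C_n) = 2(1 - cos(2*pi/n)) with a(P_n) = 2(1 - cos(pi/n)).
import numpy as np

def a(A):
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

def path(n):
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i+1] = A[i+1, i] = 1
    return A

def cycle(n):
    A = path(n)
    A[0, n-1] = A[n-1, 0] = 1
    return A

for n in range(3, 12):
    assert abs(a(cycle(n)) - 2*(1 - np.cos(2*np.pi/n))) < 1e-9
    assert abs(a(path(n)) - 2*(1 - np.cos(np.pi/n))) < 1e-9
    assert a(cycle(n)) > a(path(n))    # the cycle always wins
```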

When { d=4}, one might suspect a pattern based on the following winners:

[Figure: 6 vertices, max degree 4, with a = 4]

[Figure: 7 vertices, max degree 4, with a = 3.2]

The structure of these two is the same: place {n } points on a circle and connect each of them to its {4 } nearest neighbors.

But this pattern does not continue: the 8-vertex winner is completely different.

[Figure: 8 vertices, max degree 4, with a = 4]

This is simply the complete bipartite graph { K_{4, 4} }. And it makes sense that the “4 neighbors” graph loses when the number of vertices is large: there is too much “redundancy” among its edges, many of which connect the vertices that were already connected by short paths.

In general, when {n = 2d }, the complete bipartite graph { K_{d, d}} achieves {a = d } and therefore maximizes the algebraic connectivity. The fact that {a(K_{d, d}) = d } follows by considering graph complement, as discussed in Laplacian spectrum of small graphs. The complement of {K_{d, d} } is the disjoint union of two copies of the complete graph {K_d }, for which the maximal eigenvalue is {d }. Hence {a(G) = n - \lambda_{\max}(G^c) = n-d = d }.
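This is easy to confirm numerically (my own sketch):

```python
# Laplacian spectrum of K_{d,d}: expect a(K_{d,d}) = d and largest eigenvalue 2d.
import numpy as np

d = 4
A = np.block([[np.zeros((d, d)), np.ones((d, d))],
              [np.ones((d, d)), np.zeros((d, d))]])   # adjacency of K_{d,d}
L = np.diag(A.sum(axis=1)) - A
eigs = np.sort(np.linalg.eigvalsh(L))
assert abs(eigs[1] - d) < 1e-9       # algebraic connectivity equals d
assert abs(eigs[-1] - 2*d) < 1e-9    # largest eigenvalue equals n = 2d
```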

When { n = 2d-1} is odd, we have a natural candidate in { K_{d,d-1}} for which the argument from the previous paragraph shows {a(G) = d-1 }. This is indeed a winner when {n=5, d=3 }:

[Figure: 5 vertices, max degree 3, with a = 2]

The winner is not unique, since one can add another edge between two of the vertices of degree 2. This does not change {a(G)}, however: there is a fundamental eigenfunction that has equal values at the vertices of that added edge.

Same for {n=9, d=5 }: the complete bipartite graph shares the maximum value of algebraic connectivity with two other graphs formed by adding edges to it:

[Figure: 9 vertices, max degree 5, with a = 4]

However, the family { K_{d,d-1}} does not win the case { n = 2d-1} in general: we already saw a 4-regular graph on 7 vertices with {a(G)\approx 3.2 }, beating { a(K_{4, 3}) = 3}. Perhaps { K_{d,d-1}} wins when { d} is odd?

I do not have any other patterns to conjecture, but here are two winners for { n = 8, d= 3}: the cube and the “twisted cube”.

[Figure: 8 vertices, max degree 3, with a = 2]

[Figure: 8 vertices, max degree 3, with a = 2]

The cube is twisted by replacing a pair of edges on the top face with the diagonals. This is still a 3-regular graph and the algebraic connectivity stays the same, but it is no longer bipartite: a 5-cycle appears.

Sequential characterization and removability for uniform continuity

There is a useful sequential characterization of continuity in metric spaces. Let {f\colon X\to Y} be a map between metric spaces. If for every convergent sequence {x_n\to p} in {X} we have {f(x_n)\to f(p)} in {Y}, then {f} is continuous. And the converse is true as well.

Uniformly continuous functions also have a useful property related to sequences: if {f\colon X\to Y} is uniformly continuous and {\{x_n\}} is a Cauchy sequence in {X}, then {\{f(x_n)\}} is a Cauchy sequence in {Y}. However, this property does not characterize uniform continuity. For example, if {X=Y=\mathbb R}, then Cauchy sequences are the same as convergent sequences, and therefore any continuous function preserves the Cauchy-ness of sequences—it does not have to be uniformly continuous.

Let us say that two sequences {\{x_n\}} and {\{x_n'\}} are equivalent if the distance from {x_n} to {x_n'} tends to zero. The sequential characterization of uniform continuity is: {f\colon X\to Y} is uniformly continuous if and only if for any two equivalent sequences {\{x_n\}} and {\{x_n'\}} in {X}, their images {\{f(x_n)\}} and {\{f(x_n')\}} are equivalent in {Y}. The proof of this claim is straightforward.
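A small numerical illustration (my own, with sequences chosen for the purpose): {x_n = n} and {x_n' = n + 1/n} are equivalent; the square root maps them to equivalent sequences, while {x\mapsto x^2}, which is not uniformly continuous, does not.

```python
# Distances between images of the equivalent sequences n and n + 1/n.
import math

n_vals = range(1, 10001)
gap_sqrt = [abs(math.sqrt(n + 1/n) - math.sqrt(n)) for n in n_vals]
gap_square = [abs((n + 1/n)**2 - n**2) for n in n_vals]

assert gap_sqrt[-1] < 1e-4     # tends to 0: consistent with uniform continuity
assert gap_square[-1] > 1.99   # stays near 2: x^2 is not uniformly continuous
```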

In the special case when {\{x_n'\}} is a constant sequence, the sequential characterization of uniform continuity reduces to the sequential characterization of continuity.

A typical example of the use of this characterization is the proof that a continuous function on a compact set is uniformly continuous: pick two equivalent sequences with non-equivalent images, pass to suitable subsequences, get a contradiction with continuity.

Here is a different example. To state it, introduce the notation {N_r(p) = \{x\colon d(x, p)<r\}}.

Removability Theorem. Let {f\colon X\to Y} be continuous. Suppose that there exists {p\in X} such that for every {r>0}, the restriction of {f} to {X\setminus N_r(p)} is uniformly continuous. Then {f} is uniformly continuous on {X}.

This is a removability result because from having a certain property on subsets of {X} we get it on all of {X}. To demonstrate its use, let {X=[0, \infty)} with the standard metric, {p=0}, and {f(x)=\sqrt{x}}. The uniform continuity of {f} on {X\setminus N_r(p)} follows immediately from the derivative {f'} being bounded on that set (so, {f} is Lipschitz continuous there). By the removability theorem, {f} is uniformly continuous on {X}.

Before proving the theorem, let us restate the sequential characterization in an equivalent form (up to passing to subsequences): {f\colon X\to Y} is uniformly continuous if and only if for any two equivalent sequences {\{x_n\}} and {\{x_n'\}} there exist equivalent subsequences {\{f(x_{n_k})\}} and {\{f(x_{n_k}')\}}, with the same choice of indices {n_k} in both.

Proof of the theorem. Suppose {\{x_n\}} and {\{x_n'\}} are equivalent sequences in {X}. If {x_n\to p}, then {x_n'\to p} as well, and the continuity of {f} at {p} implies that both {\{f(x_n)\}} and {\{f(x_n')\}} converge to {f(p)}, hence are equivalent sequences. If {x_n\not\to p}, then by passing to a subsequence we can achieve {d(x_n, p)\ge r } for some constant {r>0}. By the triangle inequality, for sufficiently large {n} we have {d(x_n', p)\ge r/2}. Since {f} is uniformly continuous on {X\setminus N_{r/2}(p)}, a set that eventually contains both sequences, it follows that {\{f(x_n)\}} and {\{f(x_n')\}} are equivalent.