## Sequential characterization and removability for uniform continuity

There is a useful sequential characterization of continuity in metric spaces. Let ${f\colon X\to Y}$ be a map between metric spaces. If for every convergent sequence ${x_n\to p}$ in ${X}$ we have ${f(x_n)\to f(p)}$ in ${Y}$, then ${f}$ is continuous. And the converse is true as well.

Uniformly continuous functions also have a useful property related to sequences: if ${f\colon X\to Y}$ is uniformly continuous and ${\{x_n\}}$ is a Cauchy sequence in ${X}$, then ${\{f(x_n)\}}$ is a Cauchy sequence in ${Y}$. However, this property does not characterize uniform continuity. For example, if ${X=Y=\mathbb R}$, then Cauchy sequences are the same as convergent sequences, and therefore any continuous function preserves the Cauchy-ness of sequences—it does not have to be uniformly continuous.

Let us say that two sequences ${\{x_n\}}$ and ${\{x_n'\}}$ are equivalent if the distance from ${x_n}$ to ${x_n'}$ tends to zero. The sequential characterization of uniform continuity is: ${f\colon X\to Y}$ is uniformly continuous if and only if for any two equivalent sequences ${\{x_n\}}$ and ${\{x_n'\}}$ in ${X}$, their images ${\{f(x_n)\}}$ and ${\{f(x_n')\}}$ are equivalent in ${Y}$. The proof of this claim is straightforward.
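To see the characterization in action, here is a quick numerical illustration (a Python sketch): the sequences ${x_n = n}$ and ${x_n' = n + 1/n}$ are equivalent, the map ${x\mapsto x^2}$ destroys their equivalence (so it is not uniformly continuous on ${[0,\infty)}$), while ${x\mapsto\sqrt{x}}$ preserves it.

```python
import math

n = 10**4
x, xp = n, n + 1/n          # terms of two equivalent sequences: |x - xp| = 1/n

# Under f(x) = x^2 the image gap is 2 + 1/n^2: it does NOT tend to 0,
# so the squaring map is not uniformly continuous on [0, infinity).
gap_square = abs(x**2 - xp**2)

# Under f(x) = sqrt(x) the image gap tends to 0, consistent with
# sqrt being uniformly continuous there.
gap_sqrt = abs(math.sqrt(x) - math.sqrt(xp))

print(gap_square, gap_sqrt)
```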

In the special case when ${\{x_n'\}}$ is a constant sequence, the sequential characterization of uniform continuity reduces to the sequential characterization of continuity.

A typical example of the use of this characterization is the proof that a continuous function on a compact set is uniformly continuous: pick two equivalent sequences with non-equivalent images, pass to suitable subsequences, get a contradiction with continuity.

Here is a different example. To state it, introduce the notation ${N_r(p) = \{x\colon d(x, p) < r\}}$ for the open ball of radius ${r}$ centered at ${p}$.

Removability Theorem. Let ${f\colon X\to Y}$ be continuous. Suppose that there exists ${p\in X}$ such that for every ${r>0}$, the restriction of ${f}$ to ${X\setminus N_r(p)}$ is uniformly continuous. Then ${f}$ is uniformly continuous on ${X}$.

This is a removability result because from having a certain property on subsets of ${X}$ we get it on all of ${X}$. To demonstrate its use, let ${X=[0, \infty)}$ with the standard metric, ${p=0}$, and ${f(x)=\sqrt{x}}$. The uniform continuity of ${f}$ on ${X\setminus N_r(p)}$ follows immediately from the derivative ${f'}$ being bounded on that set (so, ${f}$ is Lipschitz continuous there). By the removability theorem, ${f}$ is uniformly continuous on ${X}$.
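Both halves of this example are easy to observe numerically. In the sketch below, `max_slope` is an ad hoc helper that estimates the largest difference quotient over a grid: away from ${0}$ the slopes of ${\sqrt{x}}$ stay below ${1/(2\sqrt{r})}$, while near ${0}$ they blow up.

```python
import math

def max_slope(f, a, b, steps=1000):
    """Largest |f(x)-f(y)|/|x-y| over consecutive sample points of [a, b]."""
    xs = [a + (b - a) * k / steps for k in range(steps + 1)]
    return max(abs(f(xs[k + 1]) - f(xs[k])) / (xs[k + 1] - xs[k])
               for k in range(steps))

f = math.sqrt
r = 0.25
# Away from 0 the slopes are bounded by f'(r) = 1/(2*sqrt(0.25)) = 1.0,
# so f is Lipschitz (hence uniformly continuous) on [r, infinity).
slope_away = max_slope(f, r, 100.0)
# Near 0 the slopes blow up: f is not Lipschitz on [0, infinity),
# yet the removability theorem still gives uniform continuity there.
slope_near_zero = max_slope(f, 0.0, 1.0)
print(slope_away, slope_near_zero)
```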

Before proving the theorem, let us restate the sequential characterization in an equivalent form (up to passing to subsequences): ${f\colon X\to Y}$ is uniformly continuous if and only if for any two equivalent sequences ${\{x_n\}}$ and ${\{x_n'\}}$ there exist equivalent subsequences ${\{f(x_{n_k})\}}$ and ${\{f(x_{n_k}')\}}$, with the same choice of indices ${n_k}$ in both.

Proof of the theorem. Suppose ${\{x_n\}}$ and ${\{x_n'\}}$ are equivalent sequences in ${X}$. If ${x_n\to p}$, then ${x_n'\to p}$ as well, and the continuity of ${f}$ at ${p}$ implies that both ${\{f(x_n)\}}$ and ${\{f(x_n')\}}$ converge to ${f(p)}$, hence are equivalent sequences. If ${x_n\not\to p}$, then by passing to a subsequence (the same one in both sequences) we can achieve ${d(x_n, p)\ge r }$ for some constant ${r>0}$. By the triangle inequality, for sufficiently large ${n}$ we have ${d(x_n', p)\ge r/2}$. Since ${f}$ is uniformly continuous on ${X\setminus N_{r/2}(p)}$, it follows that ${\{f(x_n)\}}$ and ${\{f(x_n')\}}$ are equivalent along this subsequence, which suffices by the restated characterization.

## Bipartite metric spaces and positive kernels

Bipartite metric spaces provide simple examples of highly non-Euclidean metric spaces. Let ${A}$ and ${B}$ be two disjoint abstract sets, say, with ${m}$ and ${n}$ points respectively. Define a metric on the union ${A\cup B}$ as follows, where ${a, b, c}$ are some positive numbers:

• The distance between distinct elements of ${A}$ is ${a}$.
• The distance between distinct elements of ${B}$ is ${b}$.
• The distance between two elements from different sets is ${c}$.

The triangle inequality is satisfied provided that ${a, b\le 2c}$. When ${c}$ is large, the space is not very interesting: it is approximately the union of two simplexes placed at some large distance from each other. On the other hand, when ${a=b=2c}$ we encounter a bizarre situation: every point of ${B}$ is a “midpoint” between any two distinct points of ${A}$, and vice versa.
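A small Python sketch confirms the triangle inequality condition (`bipartite_metric` and `satisfies_triangle` are ad hoc helpers, not standard functions):

```python
from itertools import product

def bipartite_metric(m, n, a, b, c):
    """Distance matrix of the bipartite space A ∪ B with |A| = m, |B| = n."""
    N = m + n
    def d(i, j):
        if i == j:
            return 0
        if i < m and j < m:
            return a        # both points in A
        if i >= m and j >= m:
            return b        # both points in B
        return c            # one point in each set
    return [[d(i, j) for j in range(N)] for i in range(N)]

def satisfies_triangle(D):
    N = len(D)
    return all(D[i][k] <= D[i][j] + D[j][k]
               for i, j, k in product(range(N), repeat=3))

ok  = satisfies_triangle(bipartite_metric(3, 2, a=1.0, b=1.5, c=1.0))  # a, b <= 2c
bad = satisfies_triangle(bipartite_metric(3, 2, a=1.0, b=1.5, c=0.7)) # b > 2c = 1.4
print(ok, bad)
```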

A function ${K\colon X\times X \to \mathbb R}$ is called a positive kernel if the matrix ${K(x_i, x_j)_{i, j=1}^n}$ is positive semidefinite for any choice of ${n}$ and ${x_1, \dots, x_n\in X}$. A classical theorem of Schoenberg says that a metric space ${(X, d)}$ is isometric to a subset of a Hilbert space if and only if ${\exp(-\gamma d^2)}$ is a positive kernel for all ${\gamma >0}$. One may ask: is there any function ${f\colon [0, \infty)\to\mathbb R}$ such that ${f\circ d}$ is a positive kernel on ${X}$ for every metric space ${(X, d)}$?

Suppose ${f}$ is such a function. Applying it to the bipartite metric defined above, we find that the following matrix must be positive semidefinite (the case ${m=3}$, ${n=2}$ shown; in general the diagonal blocks have sizes ${m\times m}$ and ${n\times n}$):

${\displaystyle \begin{pmatrix} f(0) & f(a) & f(a) & f(c) & f(c) \\ f(a) & f(0) & f(a) & f(c) & f(c) \\ f(a) & f(a) & f(0) & f(c) & f(c) \\ f(c) & f(c) & f(c) & f(0) & f(b) \\ f(c) & f(c) & f(c) & f(b) & f(0) \end{pmatrix} }$

Test the PSD property against vectors of the form ${(s, s, s, t, t)}$, that is, ${m}$ entries equal to some number ${s}$ followed by ${n}$ entries equal to some number ${t}$. The result is the quadratic form ${As^2 + Bt^2 + 2Cst }$ where ${A = (m^2-m) f(a)+mf(0)}$, ${B = (n^2-n)f(b)+nf(0)}$, and ${C = mnf(c)}$.
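One can sanity-check the coefficients ${A, B, C}$ by comparing the quadratic form ${As^2+Bt^2+2Cst}$ against a direct computation of ${v^T M v}$ (a Python sketch; the test function ${f}$ is arbitrary here, since the identity does not depend on it):

```python
def f(t):
    return 1.0 / (1.0 + t)   # an arbitrary test function

m, n = 3, 2
a, b, c = 1.0, 1.5, 1.2
s, t = 0.8, -1.3

# The (m+n) x (m+n) matrix with entries f(distance):
M = [[f(0) if i == j
      else f(a) if i < m and j < m
      else f(b) if i >= m and j >= m
      else f(c)
      for j in range(m + n)] for i in range(m + n)]

v = [s] * m + [t] * n
quad = sum(M[i][j] * v[i] * v[j] for i in range(m + n) for j in range(m + n))

A = (m*m - m) * f(a) + m * f(0)
B = (n*n - n) * f(b) + n * f(0)
C = m * n * f(c)
match = abs(quad - (A*s*s + B*t*t + 2*C*s*t)) < 1e-9
print(match)
```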

For this quadratic form to be positive semidefinite, we need ${A\ge 0}$ and ${C^2\le AB}$ to hold. The inequality ${A\ge 0}$ amounts to ${f(a)\ge 0}$ and ${f(0)\ge 0}$, because ${m}$ can be any positive integer. Simply put, ${f}$ must be nonnegative.

The inequality ${C^2\le AB}$ simplifies to

${ mn f(c)^2 \le ( (m-1) f(a)+f(0)) \cdot ((n-1)f(b)+ f(0)) }$

When ${m=n=1}$, the inequality reads ${f(c)^2 \le f(0)^2}$, so ${f(c)\le f(0)}$; that is, ${f}$ is maximized at ${0}$. Letting ${m, n\to \infty}$ yields ${f(c)^2 \le f(a)f(b)}$. What does this mean?

If ${c=(a+b)/2}$, the inequality ${f(c)^2 \le f(a)f(b)}$ amounts to log-convexity of ${f}$, and there are lots of log-convex functions. But crucially, our only constraint on ${c}$ in the construction of a bipartite metric space is ${c\ge \max(a, b)/2}$. In the special case ${a=b}$ we find that ${f(c)\le f(a)}$ whenever ${c\ge a/2}$ (and ${a>0}$). On one hand, ${f}$ must be nonincreasing because all ${c\ge a}$ are allowed. On the other hand, ${f(a/2)\le f(a)}$ implies that the sequence ${k\mapsto f(2^k)}$ is nondecreasing for ${k\in\mathbb Z}$. These conditions can be satisfied together only if ${f}$ is constant on ${(0, \infty)}$.

And there you have it, a complete description of functions ${f\colon [0, \infty)\to\mathbb R}$ that transform every distance matrix into a positive semidefinite matrix: ${f\equiv c\ge 0}$ on ${(0, \infty)}$, and ${f(0)\ge c}$. It is immediate that such ${f}$ indeed works for all metric spaces, not just bipartite ones.
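The sufficiency is also easy to verify numerically: for such ${f}$, the matrix ${f(d(x_i,x_j))}$ is ${cJ + (f(0)-c)I}$ regardless of the metric, and ${v^T(cJ+(f(0)-c)I)v = c(\sum v_i)^2 + (f(0)-c)\sum v_i^2 \ge 0}$. A Python sketch testing random vectors:

```python
import random

# For f equal to a constant c on (0, inf) with f(0) >= c >= 0, the matrix
# f(d(x_i, x_j)) is c*J + (f(0) - c)*I no matter what the metric is.
c, f0 = 0.5, 0.8
N = 6
F = [[f0 if i == j else c for j in range(N)] for i in range(N)]

random.seed(1)
ok = True
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(N)]
    quad = sum(F[i][j] * v[i] * v[j] for i in range(N) for j in range(N))
    ok = ok and quad >= -1e-12      # PSD up to rounding error
print(ok)
```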

## Quasi-projections and isometries

A typical scenario: given a subset ${E}$ of a metric space ${X}$ and a point ${x\in X}$, we look for a point ${y\in E}$ that is nearest to ${x}$: that is, ${d(x, y) = \mathrm{dist}\,(x, E)}$. Such a point is generally not unique: for example, if ${E}$ is the graph of the cosine function and ${x = (\pi, \pi/2)}$, then both ${(\pi/2, 0)}$ and ${(3\pi/2, 0)}$ qualify as nearest to ${x}$. This makes the nearest-point projection onto ${E}$ discontinuous: moving ${x}$ slightly to the left or to the right makes its projection onto ${E}$ jump from one point to another. Not good.

Even when the nearest point projection is well-defined and continuous, it may not be the kind of projection we want. For example, in a finite-dimensional normed space with strictly convex norm we have a continuous nearest-point projection onto any linear subspace, but it is in general a nonlinear map.

Let’s say that ${P\colon X\to E}$ is a quasi-projection if ${d(x, P(x)) \le C \mathrm{dist}\,(x, E)}$ for some constant ${C}$ independent of ${x}$. Such maps are much easier to construct: indeed, every Lipschitz continuous map ${P\colon X\to E}$ such that ${P(x)=x}$ for ${x \in E}$ is a quasi-projection. For example, one quasi-projection onto the graph of cosine is the map ${(x, y)\mapsto (x, \cos x)}$.

If ${X}$ is a Banach space and ${E}$ is its subspace, then any idempotent operator with range ${E}$ is a quasi-projection onto ${E}$. Not every subspace admits such an operator but many do (these are complemented subspaces; they include all subspaces of finite dimension or finite codimension). By replacing “nearest” with “close enough” we gain linearity. And even some subspaces that are not linearly complemented admit a continuous quasi-projection.

Here is a neat fact: if ${M}$ and ${N}$ are subspaces of a Euclidean space and ${\dim M = \dim N}$, then there exists an isometric quasi-projection of ${M}$ onto ${N}$ with constant ${C=\sqrt{2}}$. This constant is best possible: for example, an isometry from the ${y}$-axis onto the ${x}$-axis has to send ${(0, 1)}$ to one of ${(\pm 1, 0)}$, thus moving it by distance ${\sqrt{2}}$.

Proof. Let ${k}$ be the common dimension of ${M}$ and ${N}$. Fix some orthonormal bases in ${M}$ and ${N}$. In these bases, the orthogonal (nearest-point) projection from ${M}$ to ${N}$ is represented by some ${k\times k}$ matrix ${P}$ of norm at most ${1}$. We need an orthogonal ${k\times k}$ matrix ${Q}$ such that the map ${M\to N}$ that it defines is a ${\sqrt{2}}$-quasi-projection. What exactly does this condition mean for ${Q}$?

Let’s say ${x\in M}$, ${y\in N}$ is the orthogonal projection of ${x}$ onto ${N}$, and ${z\in N}$ is where we want to send ${x}$ by an isometry. Our goal is ${\|x-z\|\le \sqrt{2}\|x-y\|}$, in addition to ${\|z\|=\|x\|}$. Squaring and expanding inner products yields ${2\langle x, y\rangle - \langle x, z \rangle \le \|y\|^2}$. Since both ${y}$ and ${z}$ are in ${N}$, we can replace ${x}$ on the left by its projection ${y}$. So, the goal simplifies to ${\|y\|^2 \le \langle y, z\rangle}$. Geometrically, this means placing ${z}$ so that its projection onto the line through ${y}$ lies on the continuation of this line beyond ${y}$.

So far so good, but the disappearance of ${x}$ from the inequality is disturbing. Let’s bring it back by observing that ${\|y\|^2 \le \langle y, z\rangle}$ is equivalent to ${4(\|y\|^2 - \langle y, z\rangle) + \|z\|^2 \le \|x\|^2}$, which is simply ${\|2y-z\| \le \|x\|}$. So that’s what we want to do: map ${x}$ so that the distance from its image to ${2y}$ does not exceed ${\|x\|}$. In terms of matrices and their operator norm, this means ${\|2P-Q\|\le 1}$.

It remains to show that every square matrix of norm at most ${2}$ (such as ${2P}$ here) is within distance ${1}$ of some orthogonal matrix. Let ${2P = U\Sigma V^T}$ be the singular value decomposition, with ${U, V}$ orthogonal and ${\Sigma}$ a diagonal matrix with the singular values of ${2P}$ on the diagonal. Since the singular values of ${2P}$ are between ${0}$ and ${2}$, it follows that ${\|\Sigma-I\|\le 1}$. Hence ${\|2P - UV^T\|\le 1}$, and taking ${Q=UV^T}$ concludes the proof.

(The proof is based on a Stack Exchange post by user hypernova.)
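The last step is easy to test numerically: any matrix of operator norm at most ${1}$ will do in place of ${P}$ (below, a random matrix rescaled to norm ${1}$ rather than an actual projection matrix), and ${Q=UV^T}$ from the SVD of ${2P}$ lands within distance ${1}$. A Python sketch using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in for the projection matrix: random 4x4 matrix rescaled to norm 1.
A = rng.standard_normal((4, 4))
P = A / np.linalg.norm(A, 2)          # now ||P|| = 1, so ||2P|| = 2

# The step from the proof: SVD of 2P, then Q = U V^T.
U, sigma, Vt = np.linalg.svd(2 * P)
Q = U @ Vt

orthogonal = np.allclose(Q @ Q.T, np.eye(4))        # Q is orthogonal
within_one = np.linalg.norm(2 * P - Q, 2) <= 1 + 1e-9   # and ||2P - Q|| <= 1
print(orthogonal, within_one)
```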

## Continuity and diameters of connected sets

The definition of uniform continuity (if it’s done right) can be phrased as: ${f\colon X\to Y}$ is uniformly continuous if there exists a function ${\omega\colon (0,\infty)\to (0,\infty)}$, with ${\omega(0+)=0}$, such that ${\textrm{diam}\, f(E)\le \omega (\textrm{diam}\, E)}$ for every set ${E\subset X}$. Indeed, when ${E}$ is a two-point set ${\{a,b\}}$ this is the same as ${|f(a)-f(b)|\le \omega(|a-b|)}$, the modulus of continuity. Allowing general sets ${E}$ does not change anything, since the diameter is determined by two-point subsets.

Does it make a difference if we ask for ${\textrm{diam}\, f(E)\le \omega (\textrm{diam}\, E)}$ only for connected sets ${E}$? For functions defined on the real line, or on an interval of the line, there is no difference: we can just consider the intervals ${[a,b]}$ and obtain

${|f(a)-f(b)|\le \textrm{diam}\, f([a,b]) \le \omega(|a-b|)}$

as before.

However, the situation does change for maps defined on a non-convex domain. Consider the principal branch of square root, ${f(z)=\sqrt{z}}$, defined on the slit plane ${G=\mathbb C\setminus (-\infty, 0]}$.

This function is continuous on ${G}$ but not uniformly continuous, since ${f(-1 \pm i y) \to \pm i }$ as ${y\to 0+}$. Yet, it satisfies ${\textrm{diam}\, f(E)\le \omega(\textrm{diam}\, E)}$ for connected subsets ${E\subset G}$, where one can take ${\omega(\delta)=2\sqrt{\delta}}$. I won’t do the estimates; let’s just note that although the points ${-1 \pm i y}$ are close to each other, any connected subset of ${G}$ containing both of them has diameter greater than 1.
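The jump across the slit is easy to see numerically (a Python sketch): the inputs ${-1\pm iy}$ collide as ${y\to 0+}$ while the outputs stay about distance ${2}$ apart.

```python
import cmath

# sqrt(-1 + iy) -> i and sqrt(-1 - iy) -> -i as y -> 0+; the error is
# roughly y/2, so it stays below y at every scale tested here.
ok = all(abs(cmath.sqrt(-1 + 1j * y) - 1j) < y and
         abs(cmath.sqrt(-1 - 1j * y) + 1j) < y
         for y in (1e-2, 1e-6, 1e-12))
print(ok)
```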

In a way, this is still uniform continuity, just with respect to a different metric. Given a metric space ${(X,d)}$, one can define inner diameter metric ${\rho}$ on ${X}$ by letting ${\rho(a,b)}$ be the infimum of diameters of connected sets that contain both ${a}$ and ${b}$. This is indeed a metric if the space ${X}$ is reasonable enough (i.e., any two points are contained in some bounded connected set). On a convex subset of ${\mathbb R^n}$, the inner diameter metric coincides with the Euclidean metric ${d_2}$.

One might think that the coincidence of ${\rho}$ with the Euclidean metric should imply that the domain is convex, but this is not so. Indeed, consider the union of three quadrants in the plane, say ${A = \{(x,y) \colon x > 0\text{ or }y > 0\}}$. Any two points of ${A}$ can be joined by an L-shaped path that goes up from whichever point is lower and then moves horizontally; this path stays in ${A}$. The diameter of such a path is the length of the hypotenuse of the right triangle formed by its two legs, which is exactly the Euclidean distance between the points we started with.
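The up-then-horizontal argument can be checked numerically for a concrete pair of points (a Python sketch; `l_path` is an ad hoc helper that samples the L-shaped path):

```python
import math

def l_path(p, q, samples=200):
    """Sample the path that goes straight up from the lower point, then horizontally."""
    (x1, y1), (x2, y2) = sorted([p, q], key=lambda pt: pt[1])   # lower point first
    vertical   = [(x1, y1 + (y2 - y1) * k / samples) for k in range(samples + 1)]
    horizontal = [(x1 + (x2 - x1) * k / samples, y2) for k in range(samples + 1)]
    return vertical + horizontal

def diameter(points):
    return max(math.dist(u, v) for u in points for v in points)

p, q = (-1.0, 5.0), (2.0, -1.0)   # both points lie in A = {x > 0 or y > 0}
path = l_path(p, q)               # this particular path stays inside A
# The path's diameter equals the Euclidean distance between its endpoints.
matches = abs(diameter(path) - math.dist(p, q)) < 1e-9
print(matches)
```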

Inner diameter metric comes up (often implicitly) in complex analysis. By the Riemann mapping theorem, every simply connected domain ${G\subset \mathbb C}$, other than ${\mathbb C}$ itself, admits a conformal map ${f\colon G\to \mathbb D}$ onto the unit disk ${\mathbb D}$. This map need not be uniformly continuous in the Euclidean metric (the slit plane is one example), but it is uniformly continuous with respect to the inner diameter metric on ${G}$.

Furthermore, by normalizing the situation in a natural way (say, ${G \supset \mathbb D}$ and ${f(0)=0}$), one can obtain a uniform modulus of continuity for all conformal maps ${f}$ onto the unit disk, whatever the domain is. This uniform modulus of continuity can be taken of the form ${\omega(\delta) = C\sqrt{\delta}}$ for some universal constant ${C}$. Informally speaking, this means that a slit domain is the worst that can happen to the continuity of a conformal map. This fact isn’t often mentioned in complex analysis books. A proof can be found in the book Conformally Invariant Processes in the Plane by Gregory Lawler, Proposition 3.85. A more elementary proof, with a rougher estimate for the modulus of continuity, is on page 15 of lecture notes by Mario Bonk.

## Gromov-Hausdorff convergence

The Hausdorff distance ${d_H(A,B)}$ between two subsets ${A,B}$ of a metric space ${X}$ is defined by ${d_H(A,B)=\inf\{r>0\colon A\subset B_r \text{ and } B\subset A_r \}}$, where ${A_r, B_r}$ are (open/closed does not matter) ${r}$-neighborhoods of the sets. Informally: ${d_H(A,B)<r}$ if no matter where you are in one set, you can jump into the other by traveling less than ${r}$.

Generally, one assumes the sets to be closed (to avoid zero distance) and bounded (to avoid infinite distance). But I will not; in this post I’m not really interested in verifying all the axioms of a metric.

The Gromov-Hausdorff distance is defined between metric spaces ${X,Y}$ as follows: it is the infimum of ${d_{ H}(f(X),g(Y))}$ taken over all isometric embeddings ${f\colon X\rightarrow Z}$ and ${g\colon Y\rightarrow Z}$ into some metric space ${Z}$.

The infimum over all pairs of embeddings into all conceivable metric spaces does not sound like something you would want to compute in practice. Of course, the matter boils down to equipping the abstract union ${X\sqcup Y}$ with pseudometrics that are compatible with the original metrics on ${X}$ and ${Y}$.

A more directly computable notion of distance (not necessarily a metric) can be given as follows: ${\rho_{GH}(X,Y)}$ is the infimum of all ${\epsilon>0}$ for which there exist two maps ${f\colon X\rightarrow Y}$ and ${g\colon Y\rightarrow X}$ such that:

1. ${d_X(g\circ f(x),x)\le \epsilon}$ for all ${x\in X}$
2. ${d_Y(f\circ g(y),y) \le \epsilon}$ for all ${y\in Y}$
3. ${|d_Y(f(x_1), f(x_2)) - d_X(x_1,x_2)| \le \epsilon }$ for all ${x_1,x_2\in X}$
4. ${|d_X(g(y_1), g(y_2)) - d_Y(y_1,y_2)| \le \epsilon }$ for all ${y_1,y_2\in Y}$

This is not as elegant as “infimize over all metric spaces”, but is more practical. For example, it is easy to check that the sequence of one-sheeted hyperboloids ${H_n = \{x^2+y^2=z^2+1/n\}}$ converges to the cone ${C = \{x^2+y^2=z^2\}}$.

Using cylindrical coordinates, define ${f_n\colon H_n\rightarrow C}$ by ${f_n(r,\theta,z) = (\sqrt{r^2-1/n}, \theta,z) }$ and ${g_n\colon C\rightarrow H_n}$ by ${g_n(r,\theta,z) = (\sqrt{r^2+1/n}, \theta,z)}$, with an arbitrary choice of ${\theta}$ made for ${g_n(0,0,0)}$, since the apex of the cone has no well-defined ${\theta}$. Now check the items one by one:

1. ${f_n\circ g_n}$ is the identity map on ${C}$
2. ${g_n\circ f_n}$ fixes all points of ${H_n}$ except those on the waist circle ${z=0}$, which all go to a single point of the waist. They are displaced by at most ${\pi/\sqrt{n}}$, half the length of the waist circle.
3. follows from ${|\sqrt{r^2-1/n} - r|\le 1/\sqrt{n}}$ on ${[1/\sqrt{n},\infty)}$, which contains all radii occurring on ${H_n}$
4. follows from ${|\sqrt{r^2+1/n} - r|\le 1/\sqrt{n}}$ on ${[0,\infty)}$
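The radial estimates in the last two items are easy to verify numerically (a Python sketch; note that on ${H_n}$ the radius satisfies ${r\ge 1/\sqrt{n}}$, so the first square root is real):

```python
import math

def radial_errors(n, rs):
    eps = 1 / math.sqrt(n)
    # item 3: |sqrt(r^2 - 1/n) - r| <= 1/sqrt(n) for radii occurring on H_n
    ok_f = all(abs(math.sqrt(r*r - 1/n) - r) <= eps + 1e-12
               for r in rs if r*r >= 1/n)
    # item 4: |sqrt(r^2 + 1/n) - r| <= 1/sqrt(n) for all radii
    ok_g = all(abs(math.sqrt(r*r + 1/n) - r) <= eps + 1e-12
               for r in rs)
    return ok_f and ok_g

rs = [k * 0.01 for k in range(1001)]           # radii in [0, 10]
verified = all(radial_errors(n, rs) for n in (1, 4, 100))
print(verified)
```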

Note that the Gromov-Hausdorff convergence of manifolds is understood with respect to their intrinsic metrics. Although both ${H_n}$ and ${C}$ are naturally identified with subsets of ${\mathbb R^3}$, it would be a mistake to use the Hausdorff distance based on the Euclidean metric of ${\mathbb R^3}$. Even though this extrinsic metric is bi-Lipschitz equivalent to the intrinsic metric on both ${H_n}$ and ${C}$, bi-Lipschitz equivalence is too coarse to preserve the GH convergence in general. In general, the intrinsic metric on a manifold cannot be realized as the extrinsic metric in any Euclidean space.

In the example of hyperboloids we are lucky to have a Gromov-Hausdorff convergent sequence of unbounded spaces. Normally, bounded sets are required, and boundedness is imposed either by an exhaustion argument, or by changing the metric (moving to the projective space). For instance, the parabolas ${y=x^2/n}$ do not converge to the line ${y=0}$ as unbounded subsets of ${\mathbb R^2}$, but they do converge as subsets of ${\mathbb R\mathbb P^2}$. In the process, the degree jumps down from ${2}$ to ${1}$.

It seems that the Gromov-Hausdorff limit of algebraic varieties of degree at most ${d}$ is also an algebraic variety of degree at most ${d}$ (provided that we use the intrinsic metric; otherwise flat ellipses ${x^2+n^2y^2=1}$ would converge to a line segment). If this is true, I’m sure there’s a proof somewhere but I never saw one.