Weak convergence in metric spaces

Weak convergence in a Hilbert space is defined as pointwise convergence of the functionals associated to the elements of the space. Specifically, {x_n\rightarrow x} weakly if the associated functionals {f_n(y) = \langle x_n, y \rangle} converge to {f(y) = \langle x, y \rangle} pointwise.

How could this idea be extended to metric spaces without linear structure? To begin with, {f_n(y)} could be replaced with {\|x_n-y\|^2-\|x_n\|^2}, since this agrees with the original {f_n} up to a sign, a factor of {2}, and a term independent of {n}. Also, {\|x_n\|^2} here could be {\|x_n-z\|^2} for any fixed {z}; to avoid introducing another variable, let’s use {x_1} as the fixed reference point {z}. Now we have a metric space version of weak convergence: the functions
{f_n(y) = d(x_n,y)^2 - d(x_n,x_1)^2}
must converge pointwise to
{f(y) = d(x,y)^2 - d(x,x_1)^2}
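To check that this really is a generalization, expand the squares in a Hilbert space:

\displaystyle    f_n(y) = \|x_n-y\|^2 - \|x_n-x_1\|^2 = \|y\|^2 - \|x_1\|^2 - 2\langle x_n, y-x_1\rangle

Since {y-x_1} runs over the whole space as {y} does, pointwise convergence of these functions to {f} is exactly weak convergence {x_n\rightarrow x}.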

The triangle inequality shows that strong convergence {d(x_n,x)\rightarrow 0} implies weak convergence, as expected. The converse is not necessarily true, as the example of an orthonormal basis {(e_n)} in a Hilbert space shows: it converges weakly to {0} but not strongly.


Aside: the above is not the only way to define weak convergence in metric spaces. Another approach is to think of {\langle x_n , y\rangle} in terms of projection onto a line through {y}. A metric space version of this concept is the nearest-point projection onto a geodesic curve. This is a useful approach, but it is only viable for metric spaces with additional properties (geodesic, nonpositive curvature).

Also, both of these approaches take the Hilbert space case as the point of departure, and do not necessarily capture weak convergence in other normed spaces.


Let’s try this out in {\ell^1} with the standard basis sequence {e_n}. Here {f_n(y) = \|e_n-y\|^2 - 4 \rightarrow (\|y\| + 1)^2 - 4}. Is there an element {x\in \ell^1} such that

{\|x-y\|^2 - \|x-e_1\|^2= (\|y\|+1)^2 - 4} for all {y}?

Considering both sides as functions of one variable {y_n}, for a fixed {n}, shows that {x_n=0} for every {n}: the left-hand side is non-differentiable at {y_n=x_n}, while the right-hand side is non-differentiable at {y_n=0}. The desired identity then simplifies to {\|y\|^2 - 1 = (\|y\|+1)^2 - 4}, which is false. Oh well, that sequence wasn’t weakly convergent to begin with: by Schur’s theorem, every weakly convergent sequence in {\ell^1} also converges strongly.

This example also shows that not every bounded sequence in a metric space has a weakly convergent subsequence, unlike the way it works in Hilbert spaces.
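As a sanity check on the computation of {f_n} above, here is a minimal numeric sketch (Python with numpy assumed; elements of {\ell^1} are truncated to finitely many coordinates, and the particular {y} is an arbitrary choice):

```python
import numpy as np

def l1(v):                       # the ell^1 norm of a truncated sequence
    return np.abs(v).sum()

def basis(n, dim):               # e_n, truncated to `dim` coordinates
    e = np.zeros(dim)
    e[n - 1] = 1.0
    return e

dim = 1000
y = np.zeros(dim)
y[:4] = [0.5, -0.25, 0.125, -0.0625]     # a finitely supported y

print((l1(y) + 1)**2 - 4)                # the claimed limit of f_n(y)
for n in [10, 100, 1000]:
    f_n = l1(basis(n, dim) - y)**2 - l1(basis(n, dim) - basis(1, dim))**2
    print(n, f_n)                        # agrees once n exceeds the support of y
```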

Rough isometries

An isometry is a map {f\colon X\rightarrow Y} between two metric spaces {X,Y} which preserves all distances: {d_Y(f(x),f(x')) = d_X(x,x')} for all {x,x'\in X}. (After typing a bunch of such formulas, one tends to prefer shorter notation: {|f(x)f(x')| = |xx'|}, with the metric inferred from context.) It’s a popular exercise to prove that every isometry from a compact metric space into itself is surjective.

A rough isometry allows additive distortion of distances: {|\,|f(x)f(x')| - |xx'|\,|\le \epsilon}. (As contrasted with bi-Lipschitz maps, which allow multiplicative distortion). Rough isometries ignore small-scale features (in particular, they need not be continuous), but accurately capture the large scale structure of a space. This makes them convenient for studying hyperbolic spaces (trees and their relatives), where the large-scale structure is of interest.

If {f} is an {\epsilon}-rough isometry, then the Gromov-Hausdorff distance {d_{GH}} between {X} and {f(X)} is at most {\epsilon/2}. This follows from a convenient characterization of {d_{GH}}: it is equal to {\frac12 \inf_R \textrm{dis}\,(R)} where the infimum is taken over all subsets {R\subset X\times Y} that project surjectively onto each factor, and {\textrm{dis}\,(R) = \sup \{|\,|xx'|-|yy'|\,| \colon xRy, \ x'Ry' \} }.


Since small-scale features are ignored, it is not quite natural to talk about rough isometries being surjective. A natural concept for them is {\epsilon}-surjectivity, meaning that every point of {Y} is within distance {\le \epsilon} of {f(X)}. When this happens, the Hausdorff distance between {Y} and {f(X)} is at most {\epsilon}, and the triangle inequality gives {d_{GH}(X,Y)\le d_{GH}(X,f(X)) + d_{GH}(f(X),Y)\le \epsilon/2+\epsilon = 3\epsilon/2}.

Recalling that an isometry of a compact metric space {X} into {X} is automatically onto, it is natural to ask whether {\epsilon}-rough isometries of {X} into {X} are {\epsilon}-surjective. This, however, turns out to be false in general.


First example, a finite space: {X=\{0,1,2,\dots,n\}} with the metric {d(i,j) = i+j} (when {i\ne j}). Consider the backward shift {f(i)=i-1} (and {f(0)=0}). By construction, this is a rough isometry with {\epsilon=2}. Yet the point {n} is at distance {n} from {f(X) = \{0,1,2,\dots, n-1\}}.
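A minimal script confirming both claims for, say, {n=10} (Python assumed; brute force over all pairs):

```python
import itertools

n = 10
X = range(n + 1)
d = lambda i, j: 0 if i == j else i + j      # the metric d(i,j) = i + j
f = lambda i: max(i - 1, 0)                  # the backward shift

dis = max(abs(d(f(i), f(j)) - d(i, j)) for i, j in itertools.product(X, X))
print(dis)                                   # 2: an epsilon = 2 rough isometry

print(min(d(n, f(i)) for i in X))            # n: the point n is n away from f(X)
```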

The above metric can be thought of as the distance one has to travel from {i} to {j} with a mandatory visit to {0}. This makes it similar to the second example.


Second example, a geodesic space: {X} is the graph shown below, a subset of {\mathbb R^2} with the intrinsic (path) metric, i.e., the length of shortest path within the set.

Comb space

Define {f(x,y) = ((x-1)^+, (y-1)^+)} where {{\,}^+} means the positive part. Again, this is a {2}-rough isometry. The omitted part, shown in red below, contains a ball of radius {5} centered at {(5,5)}. Of course, {5} can be replaced by any positive integer.

Omitted part in red

Both of these examples are intrinsically high-dimensional: they cannot be accurately embedded into {\mathbb R^d} for low {d} (using the restriction metric rather than the path metric). This raises a question: does there exist {C=C(d)} such that for every compact subset {K\subset \mathbb R^d}, equipped with the restriction metric, every {\epsilon}-rough isometry {f\colon K\rightarrow K} is {C(d)\epsilon}-surjective?

Binary intersection property, and not fixing what isn’t broken

A metric space has the binary intersection property if every collection of closed balls has nonempty intersection unless there is a trivial obstruction: the distance between centers of two balls exceeds the sum of their radii. In other words, for every family of points {x_\alpha\in X} and numbers {r_\alpha>0} such that {d(x_\alpha,x_\beta)\le r_\alpha+r_\beta} for all {\alpha,\beta} there exists {x\in X} such that {d(x_\alpha,x)\le r_\alpha} for all {\alpha}.

For example, {\mathbb R} has this property: {x=\inf_\alpha (x_\alpha+r_\alpha)} works, since the hypothesis gives {x_\beta-r_\beta \le x_\alpha+r_\alpha} for all {\alpha,\beta}, so this infimum lies in every ball. But {\mathbb R^2} does not:

Failure of the binary intersection property

The space of bounded sequences {\ell^\infty} has the binary intersection property, and so does the space {B[0,1]} of all bounded functions {f:[0,1]\rightarrow\mathbb R} with the supremum norm. Indeed, the construction for {\mathbb R} generalizes: given a family of bounded functions {f_\alpha} and numbers {r_\alpha>0} as in the definition, let {f(x)=\inf_\alpha (f_\alpha(x)+r_\alpha)}.
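Here is a minimal numeric illustration of the inf-construction (Python with numpy assumed; the three functions and the radii are arbitrary choices satisfying {\|f_\alpha-f_\beta\|\le r_\alpha+r_\beta}):

```python
import numpy as np

x = np.linspace(0, 1, 1001)
fs = [np.sin(2 * np.pi * x), x**2, np.abs(x - 0.5)]   # a family f_alpha
rs = [1.0, 1.0, 1.0]                                  # radii r_alpha

f = np.min([g + r for g, r in zip(fs, rs)], axis=0)   # f = inf(f_alpha + r_alpha)
print([np.abs(f - g).max() for g in fs])              # each value is <= 1.0
```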


The better known space of continuous functions {C[0,1]} satisfies the finite version of the binary intersection property, because for a finite family the construction {\inf_\alpha (f_\alpha(x)+r_\alpha)} produces a continuous function. However, the property fails without finiteness, as the following example shows.

Example. Let {f_n\in C[0,1]} be a function such that {f_n(x)=-1} for {x\le \frac12-\frac1n}, {f_n(x)=1} for {x\ge \frac12+\frac1n}, and {f_n} is linear in between.

Since {\|f_n-f_m\| \le 1} for all {n,m}, we can choose {r_n=1/2} for all {n}. But if a function {f} is such that {\|f-f_n\|\le \frac12} for all {n}, then {f(x) \le -\frac12} for {x<\frac12} and {f(x) \ge \frac12} for {x>\frac12}. There is no continuous function that does that.

More precisely, for every {f\in C[0,1]} we have {\liminf_{n\to\infty} \|f-f_n\|\ge 1}, because {f(x)\approx f(1/2)} in a small neighborhood of {1/2}, while {f_n} changes from {-1} to {1} in the same neighborhood when {n} is large.
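A quick grid check of the pairwise bound {\|f_n-f_m\|\le 1} (Python with numpy assumed):

```python
import numpy as np

x = np.linspace(0, 1, 10001)

def f(n):    # ramp from -1 to 1 over [1/2 - 1/n, 1/2 + 1/n]
    return np.clip(n * (x - 0.5), -1.0, 1.0)

print(max(np.abs(f(n) - f(m)).max() for n in range(1, 50)
          for m in range(1, 50)))     # < 1, so r_n = 1/2 is admissible
```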


Given a discontinuous function, one can approximate it with a continuous function in some way: typically, using a mollifier. But such approximations tend to change the function even if it was continuous to begin with. Let’s try to not fix what isn’t broken: look for a retraction of {B[0,1]} onto {C[0,1]}, that is, a map {\Phi:B[0,1]\rightarrow C[0,1]} such that {\Phi(f)=f} for all {f\in C[0,1]}.

The failure of binary intersection property, as demonstrated by the sequence {(f_n)} above, implies that {\Phi} cannot be a contraction. Indeed, let {f(x)= \frac12 \,\mathrm{sign}\,(x-1/2)}. This is a discontinuous function such that {\|f-f_n\|\le 1/2} for all {n}. Since {\liminf_{n\to\infty} \|\Phi(f)-f_n\|\ge 1}, it follows that {\Phi} cannot be {L}-Lipschitz with a constant {L<2}.


It was known for a while that there is a retraction from {B[0,1]} onto {C[0,1]} with the Lipschitz constant at most {20}: see Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss. In a paper published in 2007, Nigel Kalton proved the existence of a {2}-Lipschitz retraction.

Graphical convergence

The space of continuous functions (say, on {[0,1]}) is usually given the uniform metric: {d_u(f,g) = \sup_{x}|f(x)-g(x)|}. In other words, this is the smallest number {\rho} such that from every point of the graph of one function we can jump to the graph of another function by moving at distance {\le \rho} in vertical direction.

Uniform metric is based on vertical distances

Now that I put it this way, why don’t we drop “in vertical direction”? It’ll still be a metric, namely the Hausdorff metric between the graphs of {f} and {g}. It’s natural to call it the graphical metric, denoted {d_g}; from the definition it’s clear that {d_g\le d_u}.

Graphical metric: Hausdorff distance

Some interesting things happen when the space of continuous functions is equipped with {d_g}. For one thing, it’s no longer a complete space: the sequence {f_n(x)=x^n} is Cauchy in {d_g} but has no limit.

Sequence of x^n
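To watch the two metrics diverge, one can approximate {d_g} by the Hausdorff distance between densely sampled graphs. A crude sketch (Python with numpy assumed; sampling introduces a small error of its own):

```python
import numpy as np

def graph(f, samples=1001):      # sampled graph of f on [0,1]
    x = np.linspace(0, 1, samples)
    return np.column_stack([x, f(x)])

def hausdorff(A, B):             # Hausdorff distance between finite point sets
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

g10, g20 = graph(lambda x: x**10), graph(lambda x: x**20)
print(hausdorff(g10, g20))                    # small: the graphs are d_g-close
print(np.abs(g10[:, 1] - g20[:, 1]).max())    # ~0.25: they stay d_u-far
```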

On the other hand, the bounded subsets of {(C[0,1],d_g) } are totally bounded. Indeed, given {M>0} and {\epsilon>0} we can cover the rectangle {[0,1]\times [-M,M]} with a rectangular mesh of diameter at most {\epsilon}. For each function with {\sup|f|\le M}, consider the set of rectangles that its graph visits. There are finitely many possibilities for the sets of visited rectangles. And two functions that share the same set of visited rectangles are at graphical distance at most {\epsilon} from each other.

Boxy approximation to the graph

Thus, the completion of {C[0,1]} in the graphical metric should be a nice space: bounded closed subsets will be compact in it. What is this completion, concretely?

Here is a partial answer: if {(f_n)} is a graphically Cauchy sequence, its limit is the compact set {\{(x,y): g(x)\le y\le h(x)\}} where

\displaystyle    g(x) = \inf_{x_n\rightarrow x} \liminf f_n(x_n)

(the infimum taken over all sequences converging to {x}), and

\displaystyle    h(x) = \sup_{x_n\rightarrow x} \limsup f_n(x_n)

It’s not hard to see that {g} is lower semicontinuous and {h} is upper semicontinuous (this is what makes the limit set closed). Of course, {g\le h}. It seems that the set of such pairs {(g,h)} indeed describes the graphical completion of continuous functions.

For example, the limit of {f_n(x)=x^n} is described by the pair {g(x)\equiv 0}, {h(x)=\chi_{\{1\}}}. Geometrically, it’s a broken line with horizontal and vertical segments: {([0,1]\times\{0\}) \cup (\{1\}\times[0,1])}.

For another example, the limit of {f_n(x)=\sin^2 nx} is described by the pair {g(x)\equiv 0}, {h(x)\equiv 1}. Geometrically, it’s a square.

Stripey sines

Infinite beatitude of non-existence: a journey into Nothingland

In the novella Flatland by Edwin A. Abbott, the Sphere leads the Square “downward to the lowest depth of existence, even to the realm of Pointland, the Abyss of No dimensions”:

I caught these words, “Infinite beatitude of existence! It is; and there is nothing else beside It.” […] “It fills all Space,” continued the little soliloquizing Creature, “and what It fills, It is. What It thinks, that It utters; and what It utters, that It hears; and It itself is Thinker, Utterer, Hearer, Thought, Word, Audition; it is the One, and yet the All in All. Ah, the happiness, ah, the happiness of Being!”

Indeed, Pointland (a one-point space) is zero-dimensional by every concept of dimension that I know of. Yet there is something smaller: Nothingland — empty space, {\varnothing} — whose non-existent inhabitants must be perpetually enjoying the happiness of Non-Being.

What is the dimension of Nothingland?

In topology, the empty set has dimension {-1}. This fits the inductive definition of topological dimension, which is the smallest number {d} such that the space can be minced by removing a subset of dimension {\le d-1}. (Let’s say a space has been minced if what’s left has no connected subsets other than points.)

Thus, a nonempty finite (or countable) set has dimension {0}: it’s minced already, so we remove nothing, a set of dimension {-1}. A line or a curve is one-dimensional: it can be minced by removing a zero-dimensional subset, such as the rational points.

A curve minced by removing a zero-dimensional set

Flatland itself can be minced by removing a one-dimensional subset (e.g., circles with rational radius and rational coordinates of the center), so it is two-dimensional. And so on.

Flatland minced by removing a one-dimensional subset

The convention {\mathrm{dim}\,\varnothing = -1}, helpful in the definition, gets in the way later. For example, the topological dimension is subadditive under products: {\mathrm{dim}\,(A\times B)\le \mathrm{dim}\,A + \mathrm{dim}\,B} … unless both {A} and {B} are empty, because then {-1\le -2} is false. So the case {A=B=\varnothing} must be excluded from the product theorem. We would not have to do this if {\mathrm{dim}\,\varnothing } was defined to be {-\infty}.

Next, consider the Hausdorff dimension. Its definition is not inductive, but it requires a couple of preliminary notions. First, define the {d}-dimensional premeasure at scale {\delta>0}:

\displaystyle    \mathcal H^d_\delta (X) = \inf \sum_j (\mathrm{diam}\,{U_j})^{d}

where the infimum is taken over all covers of {X} by nonempty subsets {U_j} with {\mathrm{diam}\,{U_j}\le \delta}. Requiring {U_j} to be nonempty avoids the need to define the diameter of Nothingland, which would be another story. The empty space can be covered by the empty family of nonempty subsets. The sum of an empty set of numbers is {0}, and so {\mathcal H^d_\delta (\varnothing) = 0}.

Then we define the {d}-dimensional Hausdorff measure:

\displaystyle    \mathcal H^d (X) = \lim_{\delta\rightarrow0} \mathcal H^d_\delta (X)

and finally,

\displaystyle    \mathrm{dim}_H (X) = \inf \{ d \colon \mathcal H^d (X)=0\}
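For a nonempty example, take the unit interval: covering {[0,1]} by {n} subintervals of length {1/n} gives

\displaystyle    \mathcal H^d_{1/n} ([0,1]) \le n \cdot n^{-d} = n^{1-d}

which tends to {0} when {d>1}. So {\mathcal H^d([0,1])=0} for every {d>1}, while {\mathcal H^1([0,1])=1}; hence {\mathrm{dim}_H\,[0,1]=1}.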

If we require {d>0} in the infimum defining {\mathrm{dim}_H}, the result is {\mathrm{dim}_H (\varnothing) =0}. But why make this restriction? The {d}-dimensional premeasures and measures make sense for all real {d}. It’s just that for nonempty {X} and {d<0}, we are raising some small (or even zero) numbers to a negative power, getting something large as a result. Consequently, every nonempty space has {\mathcal H^d = \infty} for all {d < 0}.

But {\mathcal H^d_\delta (\varnothing) = 0} for all real {d}, since the sum of an empty collection of numbers is zero. Hence {\mathcal H^d (\varnothing) = 0} for all real {d}, and this leads to {\mathrm{dim}_H\,\varnothing = -\infty}.

To have {\mathrm{dim}_H\,\varnothing = -\infty} is also convenient because the Hausdorff dimension is superadditive under products: {\mathrm{dim}_H\,(A\times B)\ge \mathrm{dim}_H\,A + \mathrm{dim}_H\,B}. This inequality was proved for general metric spaces as recently as 1995, by John Howroyd. If we don’t have {\mathrm{dim}_H\,\varnothing = -\infty}, then both factors {A} and {B} must be assumed nonempty.

So… should Nothingland have topological dimension {-1} and Hausdorff dimension {-\infty}? But that would violate the inequality {\mathrm{dim} (X)\le \mathrm{dim}_H (X)} which holds for every other separable metric space. In fact, for such spaces the topological dimension is simply the infimum of the Hausdorff dimension over all metrics compatible with the topology.

I am inclined to let the dimension of Nothingland be {-\infty} for every concept of dimension.

Polygonal inequalities: beyond the triangle

(Related to the previous post, but can be read independently.) The triangle inequality, one of the axioms of a metric space, can be visualized by coloring the vertices of a triangle by red and blue.

Triangle with colored vertices

The inequality says that the sum of monochromatic distances must not exceed the sum of dichromatic distances. That is, for every assignment of the vertices to points in the space, the sum of all red-red and blue-blue distances does not exceed the sum of red-blue distances. An assignment is just a map from the set of vertices {V} to the metric space {X}; it need not be injective.

But why stop at the triangle? We can take any number of points (vertices), color them in some way, and require the same polygonal inequality:

\displaystyle \text{monochromatic } \le \text{ dichromatic}

Already for the square we have two distinct plausible colorings to explore: even red-blue split

Evenly colored square

and predominantly red square

(mostly) Red Square

But it turns out that the second coloring is useless: the inequality it defines fails in every metric space with more than one point. More generally, suppose we have {R} red points and {B} blue ones, and {R- B\ge 2}. Pick two distinct points {a,b\in X}. Assign one red point to {a} and all others to {b}. The sum of monochromatic distances is {(R-1)\,d(a,b)} while the sum of dichromatic distances is {B\,d(a,b)}, which is strictly smaller because {B\le R-2<R-1}.

So, we are limited to nearly-even colorings: those with {|R-B|\le 1}. For even numbers of vertices this means even split, while odd numbers should be divided as evenly as possible: like 3+2 for the pentagon.

Here is the pentagram again.
Colored Pentagon

The inequalities turn out to be related. For every {n}, the {n}-gonal inequality implies the {(n-2)}-gonal inequality, because if we assign two opposite-colored vertices to the same point, their contributions cancel out. More interestingly, when {n} is odd, the {n}-gonal inequality implies the {(n-1)}-gonal inequality. Indeed, suppose we have {n-1} points, evenly colored. Add an arbitrary {n}th point, colored either blue or red; both colorings are nearly even, so the {n}-gonal inequality holds for each. Averaging the two inequalities, we see the contribution of the added point cancel out, leaving the {(n-1)}-gonal inequality.

So, if the {n}-gonal inequality holds for all odd {n}, it holds for all {n}. This property is exactly the hypermetric property from the previous post. Except it was stated there in a different form:

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. But if each point {x_i} is repeated {|b_i|} times, we can replace {b_i} by {\mathrm{sign}\,b_i}. Then represent {+1} as blue and {-1} as red.

The hypermetric inequalities were introduced by John B. Kelly (a student of William Ted Martin) in the late 1960s. He showed they are necessary for the space to be embeddable into {\ell^1}. It would be great if they were also sufficient (and for some classes of spaces they are), but this is not so: a counterexample was given by Patrice Assouad in 1977.

It is also interesting to consider the {n}-gonal inequalities for even {n} only. By repetition of vertices, they are equivalent to requiring

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \quad\quad \quad (1)

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=0}. But of course then we have (1) for rational {b_i} (just clear the denominators), hence for all real {b_i} (by approximation), as long as they sum to {0}. So, the requirement amounts to the matrix {(d(x_i,x_j))} being negative semidefinite on the subspace {\{b\colon \sum_i b_i=0\}}. Such metrics are called metrics of negative type.

Their relation to embeddability of the space is well-known: {(X,d)} is of negative type if and only if the “snowflake” {(X,\sqrt{d})} isometrically embeds into a Hilbert space. In other words, we can “draw” any finite metric space of negative type in a Euclidean space, with the understanding that Euclidean distances represent the square roots of the actual distances. This embedding result is a 1935 theorem of Isaac Schoenberg, who is also known for connecting dots naturally (introducing splines).
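Schoenberg’s equivalence is constructive enough to try out: for a finite space with distance matrix {D}, the centered matrix {-\frac12 PDP} (where {P} is the orthogonal projection onto {\{b\colon \sum_i b_i=0\}}) is positive semidefinite exactly when {d} is of negative type, and it is then the Gram matrix of the snowflake embedding. A sketch (Python with numpy assumed; random Euclidean data, which is known to be of negative type):

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.standard_normal((7, 3))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)   # Euclidean metric

n = len(D)
P = np.eye(n) - np.ones((n, n)) / n        # projection orthogonal to (1,...,1)
G = -0.5 * P @ D @ P                       # PSD iff D is of negative type
w, V = np.linalg.eigh(G)
emb = V * np.sqrt(np.clip(w, 0, None))     # rows: points with Gram matrix G

dist = np.linalg.norm(emb[:, None] - emb[None, :], axis=2)
print(np.abs(dist - np.sqrt(D)).max())     # ~0: the snowflake embeds isometrically
```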

Pentagrams and hypermetrics

The Wikipedia article Metric (mathematics) offers plenty of flavors of metrics, from common to obscure: ultrametric, pseudometric, quasimetric, semimetric, premetric, hemimetric and pseudoquasimetric (I kid you not).

One flavor it does not mention is a hypermetric. This is a metric {d} on a set {X} such that the inequality

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \ \ \ \ \ \ \ \ \ (1)

holds for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. The requirement that {b_i} be integers gives some combinatorial meaning to (1); this is not just some quadratic form being negative semidefinite.

As a warm-up, observe that (1) contains in it the triangle inequality: with {b_1=b_2=1} and {b_3=-1} we get {d(x_1,x_2)-d(x_1,x_3)-d(x_2,x_3)\le 0}. But it appears that (1) says something about “polygons” with more vertices too.

To make (1) worth thinking about, it should be satisfied by some important metric space. Such as the real line {\mathbb R}, for example. It is not quite obvious that the inequality

\displaystyle  \sum_{i,j}b_i b_j |a_i - a_j| \le 0 \ \ \ \ \ \ \ \ \ (2)

holds for all reals {a_i} and all integers {b_i} adding up to {1}. It helps to order the numbers: {a_1\le \dots\le a_m} and focus on the contribution of a particular gap {[a_k,a_{k+1}]} to the sum (2). The amount it contributes is {|a_k-a_{k+1}|} multiplied by

\displaystyle    \sum_{i\le k<j} b_i b_j = \left(\sum_{i\le k} b_i \right) \left(\sum_{j > k} b_j \right) = \left(\sum_{i\le k} b_i \right) \left(1-\sum_{i\le k} b_i \right) \le 0

because {n(1-n)\le 0} for every integer {n}. This proves (2).
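The same inequality is easy to stress-test by brute force (Python with numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(10000):
    m = int(rng.integers(2, 7))
    a = rng.standard_normal(m)           # points on the line
    b = rng.integers(-3, 4, size=m)
    b[0] += 1 - b.sum()                  # force the integers to sum to 1
    q = sum(b[i] * b[j] * abs(a[i] - a[j])
            for i in range(m) for j in range(m))
    assert q <= 1e-9                     # inequality (2)
print("(2) held in every trial")
```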

Now that we have one hypermetric space, {\mathbb R}, other such spaces can be created easily. If {X} is any set and {f \colon X\rightarrow\mathbb R} any function, consider {d(x,y) = |f(x)-f(y)|}, the pullback pseudometric on {X}. By applying (2) to the numbers {f(x_i)}, we see that {d} satisfies the hypermetric inequality. Since (1) is additive in {d}, we can take any family of functions {f_\alpha \colon X\rightarrow\mathbb R} and add together the corresponding pseudometrics. Or even integrate them against a positive measure: {d(x,y)=\int |f_\alpha(x)-f_\alpha(y)|\,d\mu(\alpha)}.

For example, the plane {\mathbb R^2} is a hypermetric space, because the distance between two points {(x_1,y_1)} and {(x_2,y_2)}, besides the familiar form

\displaystyle  \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}

can also be represented as an integral of the aforementioned pullbacks:

\displaystyle  \frac12 \int_0^\pi \big| (x_1-x_2)\cos \alpha + (y_1-y_2) \sin\alpha \big| \,d\alpha

A similar integral representation holds in all dimensions; thus, all Euclidean spaces are hypermetric.
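The constant {\frac12} in the integral is easy to confirm numerically (Python with numpy assumed):

```python
import numpy as np

dx, dy = 3.0, -4.0                          # difference of two points
alpha = np.linspace(0, np.pi, 200001)
integrand = np.abs(dx * np.cos(alpha) + dy * np.sin(alpha))
print(0.5 * integrand.mean() * np.pi)       # ~5.0 = sqrt(3^2 + 4^2)
```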

Okay, what is not a hypermetric then? For example, the cube distance induced by the norm {\|x\|_\infty=\max |x_i|} is not, in dimensions 3 and higher. Specifically, (1) fails as the five-point inequality with {(b_1,\dots,b_5) =(1,1,1,-1,-1)}. I’ll call it the pentagram inequality:

also known as K_5 in mystical literature

It says that for any five points in the space the sum of monochromatic distances does not exceed the sum of all bi-chromatic (red-blue) distances.

The pentagram inequality fails when {x_1,\dots,x_5} are the columns of the matrix

\displaystyle  \begin{pmatrix}    1& 1& 1& 2& 0\\    0& 2& 2& 1& 1\\    0& 1& 0& 1& 0\\    \end{pmatrix}

(first three columns blue, the last two red). Indeed, the sum of monochromatic distances is {2+1+2+2=7} while the sum of bichromatic distances is {1+1+1+1+1+1=6}.
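Verifying the arithmetic (Python with numpy assumed; the columns of the matrix are the points, the first three blue and the last two red):

```python
import numpy as np
from itertools import combinations

X = np.array([[1, 0, 0], [1, 2, 1], [1, 2, 0],    # blue points (columns 1-3)
              [2, 1, 1], [0, 1, 0]])              # red points (columns 4-5)
d = lambda p, q: np.abs(p - q).max()              # the cube (max) metric

blue, red = X[:3], X[3:]
mono = sum(d(p, q) for p, q in combinations(blue, 2)) + d(*red)
di = sum(d(p, q) for p in blue for q in red)
print(mono, di)        # 7 6: monochromatic exceeds dichromatic
```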

If the above example does not look conceptual enough, it’s because I found it via computer search. I don’t have much intuition for the pentagram inequality.

Anyway, the example delivers another proof that taking the maximum of three numbers is hard. More precisely, there is no isometric embedding of {\mathbb R^3} with the maximum metric into {\ell^1}. Unlike the earlier proof, this one does not assume the embedding is linear.

A good reference for hypermetric inequalities is the book Geometry of cuts and metrics by Deza and Laurent.