Weak convergence in metric spaces

Weak convergence in a Hilbert space is defined as pointwise convergence of the functionals associated to the elements of the space. Specifically, {x_n\rightarrow x} weakly if the associated functionals {f_n(y) = \langle x_n, y \rangle} converge to {f(y) = \langle x, y \rangle} pointwise.

How could this idea be extended to metric spaces without linear structure? To begin with, {f_n(y)} could be replaced with {\|x_n-y\|^2-\|x_n\|^2}, since this agrees with the original {f_n} up to a constant factor and terms that do not depend on {n}. Also, {\|x_n\|^2} here could be {\|x_n-z\|^2} for any fixed {z}; to avoid introducing another variable, let’s use {x_1} as the fixed reference point {z}. Now we have a metric space version of weak convergence: the functions
{f_n(y) = d(x_n,y)^2 - d(x_n,x_1)^2}
must converge pointwise to
{f(y) = d(x,y)^2 - d(x,x_1)^2}

The triangle inequality shows that strong convergence {d(x_n,x)\rightarrow 0} implies weak convergence, as expected. The converse is not necessarily true, as the example of an orthonormal basis in a Hilbert space shows.
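To make this concrete, here is a minimal numerical sketch (not from the original argument; all names and the truncation are mine). It treats {\ell^2} as a metric space and watches the functions {f_n} converge for the orthonormal basis {x_n=e_n}, whose weak limit is {x=0}:

```python
import numpy as np

N = 200                                     # truncation dimension for ell^2
def e(n):                                   # n-th standard basis vector
    v = np.zeros(N); v[n - 1] = 1.0
    return v

y = np.array([1.0 / k for k in range(1, N + 1)])   # a test point
x, x1 = np.zeros(N), e(1)                   # weak limit and reference point

def f(p, q):                                # d(p,q)^2 - d(p,x1)^2
    return np.sum((p - q) ** 2) - np.sum((p - x1) ** 2)

for n in (2, 10, 100):
    print(f(e(n), y))                       # tends to f(x,y) = ||y||^2 - 1
print(f(x, y), np.linalg.norm(e(100) - x))  # the limit; yet d(e_n, x) = 1
```

The values {f_n(y)} approach {f(y)} even though {d(e_n,0)=1} for every {n}: weak but not strong convergence.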


Aside: the above is not the only way to define weak convergence in metric spaces. Another approach is to think of {\langle x_n , y\rangle} in terms of projection onto a line through {y}. A metric space version of this concept is the nearest-point projection onto a geodesic curve. This is a useful approach, but it is only viable for metric spaces with additional properties (geodesic, nonpositive curvature).

Also, both of these approaches take the Hilbert space case as the point of departure, and do not necessarily capture weak convergence in other normed spaces.


Let’s try this out in {\ell^1} with the standard basis sequence {e_n}. Here {f_n(y) = \|e_n-y\|^2 - \|e_n-e_1\|^2 = \|e_n-y\|^2 - 4} for {n\ge 2}, and {f_n(y)\rightarrow (\|y\| + 1)^2 - 4} pointwise. Is there an element {x\in \ell^1} such that

{\|x-y\|^2 - \|x-e_1\|^2= (\|y\|+1)^2 - 4} for all {y}?

Considering both sides as functions of one variable {y_n}, for a fixed {n}, shows that {x_n=0} for every {n}, because the left hand side is non-differentiable at {y_n=x_n} while the right hand side is non-differentiable at {y_n=0}. Then the desired identity simplifies to {\|y\|^2 - 1 = (\|y\|+1)^2 - 4}, which is false. Oh well, that sequence wasn’t weakly convergent to begin with: by Schur’s theorem, every weakly convergent sequence in {\ell^1} also converges strongly.

This example also shows that not every bounded sequence in a metric space has a weakly convergent subsequence, unlike the way it works in Hilbert spaces.

Rough isometries

An isometry is a map {f\colon X\rightarrow Y} between two metric spaces {X,Y} which preserves all distances: {d_Y(f(x),f(x')) = d_X(x,x')} for all {x,x'\in X}. (After typing a bunch of such formulas, one tends to prefer shorter notation: {|f(x)f(x')| = |xx'|}, with the metric inferred from context.) It’s a popular exercise to prove that every isometry from a compact metric space into itself is surjective.

A rough isometry allows additive distortion of distances: {|\,|f(x)f(x')| - |xx'|\,|\le \epsilon}. (As contrasted with bi-Lipschitz maps, which allow multiplicative distortion). Rough isometries ignore small-scale features (in particular, they need not be continuous), but accurately capture the large scale structure of a space. This makes them convenient for studying hyperbolic spaces (trees and their relatives), where the large-scale structure is of interest.

If {f} is an {\epsilon}-rough isometry, then the Gromov-Hausdorff distance {d_{GH}} between {X} and {f(X)} is at most {\epsilon/2}. This follows from a convenient characterization of {d_{GH}}: it is equal to {\frac12 \inf_R \textrm{dis}\,(R)} where the infimum is taken over all subsets {R\subset X\times Y} that project surjectively onto each factor, and {\textrm{dis}\,(R) = \sup \{|\,|xx'|-|yy'|\,| \colon xRy, \ x'Ry' \} }.


Since small-scale features are ignored, it is not quite natural to talk about rough isometries being surjective. A natural concept for them is {\epsilon}-surjectivity, meaning that every point of {Y} is within distance {\le \epsilon} of {f(X)}. When this happens, we can conclude that {d_{GH}(X,Y)\le 3\epsilon/2}, because the Hausdorff distance between {Y} and {f(X)} is at most {\epsilon}.

Recalling that an isometry of a compact metric space {X} into {X} is automatically onto, it is natural to ask whether {\epsilon}-rough isometries of {X} into {X} are {\epsilon}-surjective. This, however, turns out to be false in general.


First example, a finite space: {X=\{0,1,2,\dots,n\}} with the metric {d(i,j) = i+j} (when {i\ne j}). Consider the backward shift {f(i)=i-1} (and {f(0)=0}). By construction, this is a rough isometry with {\epsilon=2}. Yet, the point {n} is at distance {n} from {f(X) = \{0,1,2,\dots, n-1\}}.
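A few lines of code (a sketch with made-up variable names) confirm both claims about this example:

```python
n = 10
X = range(n + 1)
d = lambda i, j: 0 if i == j else i + j    # mandatory visit to 0
f = lambda i: max(i - 1, 0)                # backward shift

distortion = max(abs(d(f(i), f(j)) - d(i, j)) for i in X for j in X)
gap = min(d(n, f(i)) for i in X)           # distance from n to f(X)
print(distortion, gap)                     # prints: 2 10
```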

The above metric can be thought of as the distance one has to travel from {i} to {j} with a mandatory visit to {0}. This makes it similar to the second example.


Second example, a geodesic space: {X} is the graph shown below, a subset of {\mathbb R^2} with the intrinsic (path) metric, i.e., the length of shortest path within the set.

Comb space

Define {f(x,y) = ((x-1)^+, (y-1)^+)} where {{\,}^+} means the positive part. Again, this is a {2}-rough isometry. The omitted part, shown in red below, contains a ball of radius {5} centered at {(5,5)}. Of course, {5} can be replaced by any positive integer.

Omitted part in red

Both of these examples are intrinsically high-dimensional: they cannot be accurately embedded into {\mathbb R^d} for low {d} (using the restriction metric rather than the path metric). This raises a question: does there exist {C=C(d)} such that for every compact subset {K\subset \mathbb R^d}, equipped with the restriction metric, every {\epsilon}-rough isometry {f\colon K\rightarrow K} is {C(d)\epsilon}-surjective?

Binary intersection property, and not fixing what isn’t broken

A metric space has the binary intersection property if every collection of closed balls has nonempty intersection unless there is a trivial obstruction: the distance between centers of two balls exceeds the sum of their radii. In other words, for every family of points {x_\alpha\in X} and numbers {r_\alpha>0} such that {d(x_\alpha,x_\beta)\le r_\alpha+r_\beta} for all {\alpha,\beta} there exists {x\in X} such that {d(x_\alpha,x)\le r_\alpha} for all {\alpha}.

For example, {\mathbb R} has this property: {x=\inf_\alpha (x_\alpha+r_\alpha)} works. But {\mathbb R^2} does not:

Failure of the binary intersection property

The space of bounded sequences {\ell^\infty} has the binary intersection property, and so does the space {B[0,1]} of all bounded functions {f:[0,1]\rightarrow\mathbb R} with the supremum norm. Indeed, the construction for {\mathbb R} generalizes: given a family of bounded functions {f_\alpha} and numbers {r_\alpha>0} as in the definition, let {f(x)=\inf_\alpha (f_\alpha(x)+r_\alpha)}.
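Here is a quick numerical sketch of this construction (my own illustration, with functions sampled on a finite grid and radii chosen large enough that the pairwise condition holds):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 101)
F = rng.uniform(-1, 1, size=(8, xs.size))     # a family f_alpha, sampled

# r_alpha = half the largest sup-distance to the other functions,
# which guarantees ||f_a - f_b|| <= r_a + r_b for every pair
r = np.abs(F[:, None, :] - F[None, :, :]).max(axis=2).max(axis=1) / 2

f = np.min(F + r[:, None], axis=0)            # inf_alpha (f_alpha + r_alpha)
print(np.all(np.abs(F - f).max(axis=1) <= r + 1e-12))   # True
```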


The better known space of continuous functions {C[0,1]} has the finite version of the binary intersection property, because for a finite family, the construction {\inf_\alpha (f_\alpha(x)+r_\alpha)} produces a continuous function. However, the property fails without finiteness, as the following example shows.

Example. Let {f_n\in C[0,1]} be a function such that {f_n(x)=-1} for {x\le \frac12-\frac1n}, {f_n(x)=1} for {x\ge \frac12+\frac1n}, and {f_n} is linear in between.

Since {\|f_n-f_m\| \le 1} for all {n,m}, we can choose {r_n=1/2} for all {n}. But if a function {f} is such that {\|f-f_n\|\le \frac12} for all {n}, then {f(x) \le -\frac12} for {x<\frac12} and {f(x) \ge \frac12} for {x>\frac12}. There is no continuous function that does that.

More precisely, for every {f\in C[0,1]} we have {\liminf_{n\to\infty} \|f-f_n\|\ge 1 } because {f(x)\approx f(1/2)} in a small neighborhood of {1/2}, while {f_n} changes from {-1} to {1} in the same neighborhood when {n} is large.


Given a discontinuous function, one can approximate it with a continuous function in some way: typically, using a mollifier. But such approximations tend to change the function even if it was continuous to begin with. Let’s try not to fix what isn’t broken: look for a retraction of {B[0,1]} onto {C[0,1]}, that is, a map {\Phi:B[0,1]\rightarrow C[0,1]} such that {\Phi(f)=f} for all {f\in C[0,1]}.

The failure of the binary intersection property, as demonstrated by the sequence {(f_n)} above, implies that {\Phi} cannot be a contraction. Indeed, let {f(x)= \frac12 \,\mathrm{sign}\,(x-1/2)}. This is a discontinuous function such that {\|f-f_n\|\le 1/2} for all {n}. Since {\liminf_{n\to\infty} \|\Phi(f)-f_n\|\ge 1}, it follows that {\Phi} cannot be {L}-Lipschitz with a constant {L<2}.


It had been known for a while that there is a retraction from {B[0,1]} onto {C[0,1]} with Lipschitz constant at most {20}: see Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss. In a paper published in 2007, Nigel Kalton proved the existence of a {2}-Lipschitz retraction.

Graphical convergence

The space of continuous functions (say, on {[0,1]}) is usually given the uniform metric: {d_u(f,g) = \sup_{x}|f(x)-g(x)|}. In other words, this is the smallest number {\rho} such that from every point of the graph of one function we can jump to the graph of another function by moving at distance {\le \rho} in vertical direction.

Uniform metric is based on vertical distances

Now that I put it this way, why don’t we drop “in vertical direction”? It’ll still be a metric, namely the Hausdorff metric between the graphs of {f} and {g}. It’s natural to call it the graphical metric, denoted {d_g}; from the definition it’s clear that {d_g\le d_u}.

Graphical metric: Hausdorff distance

Some interesting things happen when the space of continuous functions is equipped with {d_g}. For one thing, it’s no longer a complete space: the sequence {f_n(x)=x^n} is Cauchy in {d_g} but has no limit.

Sequence of x^n
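Here is a numeric sketch (my own, with each graph approximated by a dense point sample) comparing the two metrics along this sequence:

```python
import numpy as np

xs = np.linspace(0, 1, 1500)

def hausdorff(P, Q):                 # Hausdorff distance of two point sets
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def graph(n):                        # sampled graph of x^n
    return np.column_stack([xs, xs ** n])

for n in (5, 20, 80):
    d_u = np.max(np.abs(xs ** n - xs ** (2 * n)))
    d_g = hausdorff(graph(n), graph(2 * n))
    print(n, round(d_u, 4), round(d_g, 4))
# d_u stays at 0.25 while d_g shrinks: the sequence is Cauchy in d_g only.
```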

On the other hand, the bounded subsets of {(C[0,1],d_g) } are totally bounded. Indeed, given {M>0} and {\epsilon>0} we can cover the rectangle {[0,1]\times [-M,M]} with a rectangular mesh of diameter at most {\epsilon}. For each function with {\sup|f|\le M}, consider the set of rectangles that its graph visits. There are finitely many possibilities for the sets of visited rectangles. And two functions that share the same set of visited rectangles are at graphical distance at most {\epsilon} from each other.

Boxy approximation to the graph

Thus, the completion of {C[0,1]} in the graphical metric should be a nice space: bounded closed subsets will be compact in it. What is this completion, concretely?

Here is a partial answer: if {(f_n)} is a graphically Cauchy sequence, its limit is the compact set {\{(x,y): g(x)\le y\le h(x)\}} where

\displaystyle    g(x) = \inf_{x_n\rightarrow x} \liminf f_n(x_n)

(the infimum taken over all sequences converging to {x}), and

\displaystyle    h(x) = \sup_{x_n\rightarrow x} \limsup f_n(x_n)

It’s not hard to see that {g} is upper semicontinuous and {h} is lower semicontinuous. Of course, {g\le h}. It seems that the set of such pairs {(g,h)} indeed describes the graphical completion of continuous functions.

For example, the limit of {f_n(x)=x^n} is described by the pair {g(x)\equiv 0}, {h=\chi_{\{1\}}}. Geometrically, it’s a broken line with horizontal and vertical segments.

For another example, the limit of {f_n(x)=\sin^2 nx} is described by the pair {g(x)\equiv 0}, {h(x)\equiv 1}. Geometrically, it’s a square.

Stripey sines

Infinite beatitude of non-existence: a journey into Nothingland

In the novella Flatland by Edwin A. Abbott, the Sphere leads the Square “downward to the lowest depth of existence, even to the realm of Pointland, the Abyss of No dimensions”:

I caught these words, “Infinite beatitude of existence! It is; and there is nothing else beside It.” […] “It fills all Space,” continued the little soliloquizing Creature, “and what It fills, It is. What It thinks, that It utters; and what It utters, that It hears; and It itself is Thinker, Utterer, Hearer, Thought, Word, Audition; it is the One, and yet the All in All. Ah, the happiness, ah, the happiness of Being!”

Indeed, Pointland (a one-point space) is zero-dimensional by every concept of dimension that I know of. Yet there is something smaller: Nothingland — empty space, {\varnothing} — whose non-existent inhabitants must be perpetually enjoying the happiness of Non-Being.

What is the dimension of Nothingland?

In topology, the empty set has dimension {-1}. This fits the inductive definition of topological dimension, which is the smallest number {d} such that the space can be minced by removing a subset of dimension {\le d-1}. (Let’s say a space has been minced if what’s left has no connected subsets other than points.)

Thus, a nonempty finite (or countable) set has dimension {0}: it’s minced already, so we remove nothing, a set of dimension {-1}. A line or a curve is one-dimensional: they can be minced by removing a zero-dimensional subset, like rational numbers.

A curve minced by removing a zero-dimensional set

Flatland itself can be minced by removing a one-dimensional subset (e.g., circles with rational radius and rational coordinates of the center), so it is two-dimensional. And so on.

Flatland minced by removing a one-dimensional subset

The convention {\mathrm{dim}\,\varnothing = -1}, helpful in the definition, gets in the way later. For example, the topological dimension is subadditive under products: {\mathrm{dim}\,(A\times B)\le \mathrm{dim}\,A + \mathrm{dim}\,B} … unless both {A} and {B} are empty, because then {-1\le -2} is false. So the case {A=B=\varnothing} must be excluded from the product theorem. We would not have to do this if {\mathrm{dim}\,\varnothing} were defined to be {-\infty}.

Next, consider the Hausdorff dimension. Its definition is not inductive, but it requires a few preliminary concepts. First, define the {d}-dimensional premeasure on scale {\delta>0}:

\displaystyle    \mathcal H^d_\delta (X) = \inf \sum_j (\mathrm{diam}\,{U_j})^{d}

where the infimum is taken over all covers of {X} by nonempty subsets {U_j} with {\mathrm{diam}\,{U_j}\le \delta}. Requiring {U_j} to be nonempty avoids the need to define the diameter of Nothingland, which would be another story. The empty space can be covered by the empty family of nonempty subsets. The sum of an empty set of numbers is {0}, and so {\mathcal H^d_\delta (\varnothing) = 0}.

Then we define the {d}-dimensional Hausdorff measure:

\displaystyle    \mathcal H^d (X) = \lim_{\delta\rightarrow0} \mathcal H^d_\delta (X)

and finally,

\displaystyle    \mathrm{dim}_H (X) = \inf \{ d \colon \mathcal H^d (X)=0\}

If in this last infimum we require {d>0}, the result is {\mathrm{dim}_H (\varnothing) =0}. But why make this restriction? The {d}-dimensional premeasures and measures make sense for all real {d}. It’s just that for nonempty {X}, we are raising some small (or even zero) numbers to a negative power, getting something large as a result. Consequently, every nonempty space has {\mathcal H^d = \infty} for all {d < 0}.

But {\mathcal H^d_\delta (\varnothing) = 0}, the sum of an empty collection of numbers being zero. Hence {\mathcal H^d (\varnothing) = 0} for all real {d}, and this leads to {\mathrm{dim}_H\,\varnothing = -\infty}.

To have {\mathrm{dim}_H\,\varnothing = -\infty} is also convenient because the Hausdorff dimension is superadditive under products: {\mathrm{dim}_H\,(A\times B)\ge \mathrm{dim}_H\,A + \mathrm{dim}_H\,B}. This inequality was proved for general metric spaces as recently as 1995, by John Howroyd. If we don’t have {\mathrm{dim}_H\,\varnothing = -\infty}, then both factors {A} and {B} must be assumed nonempty.

So… should Nothingland have topological dimension {-1} and Hausdorff dimension {-\infty}? But that would violate the inequality {\mathrm{dim} (X)\le \mathrm{dim}_H (X)} which holds for every other separable metric space. In fact, for such spaces the topological dimension is simply the infimum of the Hausdorff dimension over all metrics compatible with the topology.

I am inclined to let the dimension of Nothingland be {-\infty} for every concept of dimension.

Polygonal inequalities: beyond the triangle

(Related to the previous post, but can be read independently.) The triangle inequality, one of the axioms of a metric space, can be visualized by coloring the vertices of a triangle in red and blue.

Triangle with colored vertices

The inequality says that the sum of monochromatic distances must not exceed the sum of dichromatic distances. That is, for every assignment of the vertices to points in the space, the sum of all red-red and blue-blue distances does not exceed the sum of red-blue distances. An assignment is just a map from the set of vertices {V} to the metric space {X}; it need not be injective.

But why stop at the triangle? We can take any number of points (vertices), color them in some way, and require the same polygonal inequality:

\displaystyle \text{monochromatic } \le \text{ dichromatic}

Already for the square we have two distinct plausible colorings to explore: even red-blue split

Evenly colored square

and predominantly red square

(mostly) Red Square

But it turns out that the second coloring is useless: the inequality it defines fails in every metric space with more than one point. More generally, suppose we have {R} red points and {B} blue ones, and {R- B\ge 2}. Pick two distinct points {a,b\in X}. Assign one red point to {a} and all others to {b}. The sum of monochromatic distances is {(R-1)\,d(a,b)} while the sum of dichromatic distances is {B\,d(a,b)}, which is strictly less.

So, we are limited to nearly-even colorings: those with {|R-B|\le 1}. For even numbers of vertices this means even split, while odd numbers should be divided as evenly as possible: like 3+2 for the pentagon.

Here is the pentagram again.
Colored Pentagon

The inequalities turn out to be related. For every {n}, the {n}-gonal inequality implies the {(n-2)}-gonal inequality, because if we assign two opposite-colored vertices to the same point, their contributions cancel out. More interestingly: when {n} is odd, the {n}-gonal inequality implies the {(n-1)}-gonal inequality. Indeed, suppose we have {n-1} points, evenly colored. Add an arbitrary {n}th point. Whether the added point is blue or red, the {n}-gonal inequality holds. Averaging these two inequalities, we see the effect of the added point canceling out.

So, if the {n}-gonal inequality holds for all odd {n}, it holds for all {n}. This property is exactly the hypermetric property from the previous post. Except it was stated there in a different form:

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. But if the point {x_i} is repeated {|b_i|} times, we can replace {b_i} by {\mathrm{sign}\,b_i}. Then represent +1 as blue and -1 as red.

The hypermetric inequalities were introduced by John B. Kelly (a student of William Ted Martin) in the late 1960s. He showed they are necessary for the space to be embeddable into {\ell^1}. It would be great if they were also sufficient (and for some classes of spaces they are), but this is not so: a counterexample was given by Patrice Assouad in 1977.

It is also interesting to consider the {n}-gonal inequalities for even {n} only. By repetition of vertices, they are equivalent to requiring

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \quad\quad \quad (1)

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=0}. But of course then we have (1) for rational {b_i} (just clear the denominators), hence for all real {b_i} (by approximation), as long as they sum to {0}. So, the requirement amounts to the matrix {(d(x_i,x_j))} being negative semidefinite on the subspace {\{b\colon \sum_i b_i=0\}}. Such metrics are called metrics of negative type.

Their relation to embeddability of the space is well-known: {(X,d)} is of negative type if and only if the “snowflake” {(X,\sqrt{d})} isometrically embeds into a Hilbert space. In other words, we can “draw” any finite metric space of negative type in a Euclidean space, with the understanding that Euclidean distances represent the square roots of the actual distances. This embedding result is a 1935 theorem of Isaac Schoenberg, who is also known for connecting dots naturally (introducing splines).
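As a quick sanity check (my own sketch, not part of the original post), one can test the negative-type inequality for random points in the plane:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.normal(size=(7, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)  # distance matrix

for _ in range(10_000):
    b = rng.normal(size=7)
    b -= b.mean()                    # force sum(b) = 0
    assert b @ D @ b <= 1e-9         # negative type: b^T D b <= 0
print("negative type verified on random data")
```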

Pentagrams and hypermetrics

The Wikipedia article Metric (mathematics) offers plenty of flavors of metrics, from common to obscure: ultrametric, pseudometric, quasimetric, semimetric, premetric, hemimetric and pseudoquasimetric (I kid you not).

One flavor it does not mention is a hypermetric. This is a metric {d} on a set {X} such that the inequality

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \ \ \ \ \ \ \ \ \ (1)

holds for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. The requirement that {b_i} be integers gives some combinatorial meaning to (1); this is not just some quadratic form being negative semidefinite.

As a warm-up, observe that (1) contains in it the triangle inequality: with {b_1=b_2=1} and {b_3=-1} we get {d(x_1,x_2)-d(x_1,x_3)-d(x_2,x_3)\le 0}. But it appears that (1) says something about “polygons” with more vertices too.

To make (1) worth thinking about, it should be satisfied by some important metric space. Such as the real line {\mathbb R}, for example. It is not quite obvious that the inequality

\displaystyle  \sum_{i,j}b_i b_j |a_i - a_j| \le 0 \ \ \ \ \ \ \ \ \ (2)

holds for all reals {a_i} and all integers {b_i} adding up to {1}. It helps to order the numbers: {a_1\le \dots\le a_m} and focus on the contribution of a particular gap {[a_k,a_{k+1}]} to the sum (2). The amount it contributes is {|a_k-a_{k+1}|} multiplied by

\displaystyle    \sum_{i\le k<j} b_i b_j = \left(\sum_{i\le k} b_i \right) \left(\sum_{j > k} b_j \right) = \left(\sum_{i\le k} b_i \right) \left(1-\sum_{i\le k} b_i \right) \le 0

because {n(1-n)\le 0} for every integer {n}. This proves (2).
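Before moving on, here is a brute-force sanity check of (2), a sketch of mine with random data:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(10_000):
    a = rng.normal(size=6)                    # points on the real line
    b = rng.integers(-5, 6, size=6)
    b[0] += 1 - b.sum()                       # force sum(b) = 1
    S = sum(b[i] * b[j] * abs(a[i] - a[j])
            for i in range(6) for j in range(6))
    assert S <= 1e-9
print("inequality (2) verified on random data")
```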

Now that we have one hypermetric space, {\mathbb R}, other such spaces can be created easily. If {X} is any set and {f \colon X\rightarrow\mathbb R} any function, consider {d(x,y) = |f(x)-f(y)|}, the pullback pseudometric on {X}. By applying (2) to the numbers {f(x_i)}, we see that {d} satisfies the hypermetric inequality. Since (1) is additive in {d}, we can take any family of functions {f_\alpha \colon X\rightarrow\mathbb R} and add together the corresponding pseudometrics. Or even integrate them against a positive measure: {d(x,y)=\int |f_\alpha(x)-f_\alpha(y)|\,d\mu(\alpha)}.

For example, the plane {\mathbb R^2} is a hypermetric space, because the distance between two points {(x_1,y_1)} and {(x_2,y_2)}, besides the familiar form

\displaystyle  \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}

can also be represented as an integral of the aforementioned pullbacks:

\displaystyle  \frac12 \int_0^\pi \big| (x_1-x_2)\cos \alpha + (y_1-y_2) \sin\alpha \big| \,d\alpha

A similar integral representation holds in all dimensions; thus, all Euclidean spaces are hypermetric.
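The integral representation is easy to check numerically; here is a sketch with an arbitrary pair of points:

```python
import numpy as np

p, q = np.array([0.3, -1.2]), np.array([2.0, 0.7])
alphas = np.linspace(0, np.pi, 100_000)
integrand = np.abs((p[0] - q[0]) * np.cos(alphas)
                   + (p[1] - q[1]) * np.sin(alphas))
print(0.5 * integrand.mean() * np.pi)    # the integral, ~2.5495
print(np.linalg.norm(p - q))             # sqrt(6.5) = 2.5495...
```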

Okay, what is not a hypermetric then? For example, the cube distance induced by the norm {\|x\|_\infty=\max |x_i|} is not, in dimensions 3 and higher. Specifically, (1) fails as the five-point inequality with {(b_1,\dots,b_5) =(1,1,1,-1,-1)}. I’ll call it the pentagram inequality:

also known as K_5 in mystical literature

It says that for any five points in the space the sum of monochromatic distances does not exceed the sum of all bi-chromatic (red-blue) distances.

The pentagram inequality fails when {x_1,\dots,x_5} are the columns of the matrix

\displaystyle  \begin{pmatrix}    1& 1& 1& 2& 0\\    0& 2& 2& 1& 1\\    0& 1& 0& 1& 0\\    \end{pmatrix}

(first three columns blue, the last two red). Indeed, the sum of monochromatic distances is {2+1+2+2=7} while the sum of bichromatic distances is {1+1+1+1+1+1=6}.
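The arithmetic is easy to verify; this sketch (names mine) repeats the final check:

```python
import numpy as np
from itertools import combinations

pts = np.array([[1, 0, 0], [1, 2, 1], [1, 2, 0],   # blue: x1, x2, x3
                [2, 1, 1], [0, 1, 0]])              # red:  x4, x5
b = [1, 1, 1, -1, -1]
d = lambda u, v: np.max(np.abs(u - v))              # ell-infinity metric

mono = sum(d(pts[i], pts[j]) for i, j in combinations(range(5), 2)
           if b[i] == b[j])
dichro = sum(d(pts[i], pts[j]) for i, j in combinations(range(5), 2)
             if b[i] != b[j])
print(mono, dichro)                                 # 7 6: inequality fails
```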

If the above example does not look conceptual enough, it’s because I found it via computer search. I don’t have much intuition for the pentagram inequality.

Anyway, the example delivers another proof that taking the maximum of three numbers is hard. More precisely, there is no isometric embedding of {\mathbb R^3} with the maximum metric into {\ell^1}. Unlike the earlier proof, this one does not assume the embedding is linear.

A good reference for hypermetric inequalities is the book Geometry of cuts and metrics by Deza and Laurent.

Re: “How many sides does a circle have?”

The post is inspired by this story told by JDH at Math.SE.


My third-grade son came home a few weeks ago with similar homework questions:

How many faces, edges and vertices do the following
have?

  • cube
  • cylinder
  • cone
  • sphere

Like most mathematicians, my first reaction was that for the latter objects the question would need a precise definition of face, edge and vertex, and isn’t really sensible without such definitions.

But after talking about the problem with numerous people, conducting a kind of social/mathematical experiment, I observed something intriguing. What I observed was that none of my non-mathematical friends and acquaintances had any problem with using an intuitive geometric concept here, and they all agreed completely that the answers should be

  • cube: 6 faces, 12 edges, 8 vertices
  • cylinder: 3 faces, 2 edges, 0 vertices
  • cone: 2 faces, 1 edge, 1 vertex
  • sphere: 1 face, 0 edges, 0 vertices

Indeed, these were also the answers desired by my son’s teacher (who is a truly outstanding teacher). Meanwhile, all of my mathematical colleagues hemmed and hawed about how we can’t really answer, and what does “face” mean in this context anyway, and so on; most of them wanted ultimately to say that a sphere has infinitely many faces and infinitely many vertices and so on. For the homework, my son wrote an explanation giving the answers above, but also explaining that there was a sense in which some of the answers were infinite, depending on what was meant.

At a party this past weekend full of mathematicians and philosophers, it was a fun game to first ask a mathematician the question, who invariably made various objections and refusals and said it made no sense and so on, and then the non-mathematical spouse would forthrightly give a completely clear account. There were many friendly disputes about it that evening.


Let’s track down this intuitive geometric concept that non-mathematicians possess. We are given a set {E\subset \mathbb R^n} and a point {p\in E}, and try to figure out whether {p} is a vertex, a part of an edge, or a part of a face. The answer should depend only on the shape of the set near {p}.

It is natural to say that a vector {v} is tangent to {E} at {p} if going along {v} we stay close to the set. Formally, the condition is {\lim_{t\to 0+} t^{-1}\,\mathrm{dist}\,(p+tv,E)=0}. Notice that the limit is one-sided: if {v} is tangent, {-v} may or may not be tangent.

The set of all tangent vectors to {E} at {p} is denoted by {T_pE} and is called the tangent cone. It is indeed a cone in the sense of being invariant under scaling. This set contains the zero vector, but need not be a linear space. Let’s say that the rank of point {p} is {k} if {T_pE} contains a linear space of dimension {k} but no linear space of dimension {k+1}.

Finally, define a rank {k} stratum of {E} as a connected component of the set of all points of rank {k}.

If {E} is the surface of a polyhedron, we get the familiar concepts of vertices (rank 0 strata), edges (rank 1) and faces (rank 2). For each of the homework solids the answer agrees with the opinion of the non-mathematical crowd. Take the cone as an example:

Cone

At the vertex the tangent cone to the cone is… a cone. It contains no nontrivial linear space, hence the rank is 0. This is indeed a vertex.
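One can even test tangency numerically. The sketch below is my own; it samples only the lateral surface of the cone, uses scipy's KD-tree for distance queries, and checks that at the apex {p=(0,0,1)} a vector along a ruling is tangent (small ratios, up to sampling resolution) while the vertical vector is not (ratio exactly 1):

```python
import numpy as np
from scipy.spatial import cKDTree

z, th = np.meshgrid(np.linspace(0, 1, 400), np.linspace(0, 2 * np.pi, 400))
E = np.column_stack([((1 - z) * np.cos(th)).ravel(),
                     ((1 - z) * np.sin(th)).ravel(), z.ravel()])
tree = cKDTree(E)

p = np.array([0.0, 0.0, 1.0])                 # the apex
for v in (np.array([1.0, 0.0, -1.0]),         # along a ruling: tangent
          np.array([0.0, 0.0, 1.0])):         # straight up: not tangent
    ratios = [tree.query(p + t * v)[0] / t for t in (0.1, 0.05, 0.025)]
    print(v, np.round(ratios, 3))             # near 0 vs. order 1
```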

Along the edge of the base the tangent cone is the union of two halfplanes:

Tangent cone at an edge point

Here the rank is 1: the tangent cone contains a line, but no planes.

Finally, at every point of smoothness the tangent cone is the tangent plane, so the rank is 2. The set of such points has two connected components, separated by the circular edge.

So much for the cone. As for the circle mentioned in the title, I regrettably find myself in agreement with Travis.


More seriously: the surface of a convex body is a classical example of an Alexandrov space (metric space of curvature bounded below in the triangle comparison sense). Perelman proved that any Alexandrov space can be stratified into topological manifolds. Lacking an ambient vector space, one obtains tangent cones by taking the Gromov-Hausdorff limit of blown-up neighborhoods of {p}. The tangent cone has no linear structure either — it is also a metric space — but it may be isometric to the product of {\mathbb R^k} with another metric space. The maximal {k} for which the tangent cone splits off {\mathbb R^k} becomes the rank of {p}.

Recently, Colding and Naber showed that the above approach breaks down for spaces which have only Ricci curvature bounds instead of triangle-comparison curvature. More precisely, their examples are metric spaces that arise as a noncollapsed limit of manifolds with a uniform lower Ricci bound. In this setting tangent cones are no longer uniquely determined by {p}, and they show that different cones at the same point may have different ranks.

Four-point CAT(0) condition

The definition of a metric space guarantees that if we pick three points {A,B,C} from it, we can draw a triangle in the Euclidean plane, with vertices labeled {A,B,C} so that the side lengths are equal to the corresponding distances {|AB|, |BC|, |CA|} in the metric space. In other words, we can represent any triple of points as vertices of a triangle.

Triangle representation

The above paragraph is from Tight spans and Gromov hyperbolicity, but here the similarity ends. Moving on to four points {A,B,C,D} we find that it is possible to draw a quadrilateral with side lengths equal to the corresponding distances
{|AB|, |BC|, |CD|, |DA|}.

Quadrilateral with prescribed side lengths

Except for the degenerate case (one length equal to the sum of three others), the quadrilateral is not rigid: one can change the angles while keeping the sidelengths the same.

Since the quadrilateral has one degree of freedom, we cannot hope to have the correct lengths of both diagonals {AC} and {BD}. One of the two diagonals can be made to match the metric space distance, just by drawing two triangles and gluing them together. But this is not so interesting.

There is an interesting class of metric spaces for which one can always draw a Euclidean quadrilateral as above so that neither diagonal is too short (they are allowed to be too long). These are called CAT(0) spaces, and they generalize the concept of a Riemannian manifold of nonpositive sectional curvature (more on this later). Every Euclidean space is CAT(0) for obvious reasons.

For example, if a CAT(0) space contains four points with pairwise distances {|AB|=|CD|=2}, {|BC|=|DA|=1}, then {|AC|^2+|BD|^2\le 10}. Indeed, any Euclidean quadrilateral with these sidelengths is a parallelogram, in which the sum of squares of the diagonals is {2\cdot (2^2+1^2)=10}. (It never pays to draw {ABCD} with self-intersections, because that only makes the “diagonals” shorter).

For a non-parallelogram example, take {|AB|=2}, {|BC|=3}, {|CD|=4}, and {|DA|=5}. The diagonal lengths {|AC|=4} and {|BD|=6} are allowed by the axioms of a metric space, but not by the CAT(0) condition.

Non-CAT(0) quadrilateral

This can be checked using a bunch of cosine formulas:

\displaystyle \cos\angle ADB = \frac{5^2+6^2-2^2}{2\cdot 5\cdot 6} = \frac{19}{20}, \qquad \cos\angle CDB = \frac{4^2+6^2-3^2}{2\cdot 4\cdot 6} = \frac{43}{48}, \qquad \cos\angle CDA = \frac{|AD|/2}{|CD|} = \frac{5}{8},

where the last computation is simpler because {\triangle CDA} is isosceles. Since {\angle CDA=\angle CDB+\angle ADB}, the addition law of cosines yields a contradiction.
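In code, the contradiction is a two-line numerical check:

```python
import numpy as np

ADB, CDB, CDA = np.arccos(19 / 20), np.arccos(43 / 48), np.arccos(5 / 8)
print(np.degrees([ADB, CDB, CDA]))   # ~[18.19  26.36  51.32]
print(ADB + CDB < CDA)               # True: angle CDA cannot be the sum
```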

Let’s see what CAT(0) has to do with curvature. First, a closed curve with intrinsic metric is not a CAT(0) space. Indeed, one can find four points {A,B,C,D} on the curve that are equally spaced at distances {L/4}, where {L} is the length of the curve. Then the corresponding Euclidean quadrilateral must be a rhombus of sidelength {L/4}. One of its diagonals will be at most {L\sqrt{2}/4}, which is less than {L/2} required by the intrinsic metric of the curve.

As a corollary, a CAT(0) space does not have closed geodesics. One can object that compact negatively curved manifolds (e.g., the double torus) have plenty of closed geodesics. The answer is that to actually characterize nonpositively curved spaces, one must impose the CAT(0) condition only locally.

The standard definitions of a CAT(0) space require any two points to be connected by a geodesic, and are stated in terms of geodesics. Here is one of them: for any geodesic {\gamma\colon [a,b]\rightarrow X} and for any fixed point {A\in X} the function

\displaystyle  F(t)=d\,^2(\gamma(t),A)-t^2 \ \ \ \ \ (1)

is convex. (Here {d\,^2} is the square of the distance.)

Distance function on a geodesic

The condition (1) may look contrived, until one realizes that in a Euclidean space {F} is always an affine function. Thus, the definition says that the distance function in a CAT(0) space is at least as convex as in a Euclidean space. By the way, (1) indeed implies that {t\mapsto d(\gamma(t),A)} is a convex function. I leave this as an exercise.

Tight spans and Gromov hyperbolicity

The definition of a metric space guarantees that if we pick three points {A,B,C} from it, we can draw a triangle in the Euclidean plane, with vertices labeled {A,B,C}, so that the side lengths are equal to the corresponding distances {|AB|, |AC|, |BC|}. In other words, we can represent any triple of points as vertices of a triangle.

Triangle representation

There is another, perhaps less obvious representation: as vertices of a tripod. Specifically, we can draw a graph of the form shown below and assign length to each edge so that {|AB|} is equal to the length of the shortest path from {A} to {B}, etc.

Tripod representation

Indeed, we have the system of linear equations

{\displaystyle \begin{cases}   a+b & = |AB| \\    a+c & = |AC| \\    b+c & = |BC| \\    \end{cases}}

with a unique solution

\displaystyle   \begin{cases} a &= \frac{1}{2}(|AB|+|AC|-|BC|) \\   b &= \frac{1}{2}(|AB|+|BC|-|AC|) \\    c &= \frac{1}{2}(|AC|+|BC|-|AB|)    \end{cases}

The numbers {a,b,c} are clearly nonnegative; if a number is zero, the corresponding edge can be contracted to a point. Simple as they are, these formulas have a name: {a} is called the Gromov product of {B} and {C} with respect to {A}, and is denoted {(B|C)_A}.
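As a sketch, the tripod solver is three lines (the function name is mine):

```python
def tripod(AB, AC, BC):
    a = (AB + AC - BC) / 2    # (B|C)_A
    b = (AB + BC - AC) / 2    # (A|C)_B
    c = (AC + BC - AB) / 2    # (A|B)_C
    return a, b, c

print(tripod(3, 4, 5))        # (1.0, 2.0, 3.0)
```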

The more geometric way to arrive at tripod representation is to inscribe a circle in the triangle shown above, and note how the points of tangency divide the sides of the triangle:

Gromov products from inscribed circle

Next question: can we represent four points {A,B,C,D} in a similar way? The first approach fails: in general there is no Euclidean quadrilateral in which the sides and diagonals represent the distances between these four points. For example, if {|AB|=|BC|=|CD|=|DA|=1} and {|AC|=|BD|=2}, then any Euclidean representation must place both {B} and {D} in the midpoint between {A} and {C}, which contradicts {|BD|} being nonzero. Let’s try the metric graph approach.

By analogy with the tripod, we might hope at first for something like

Metric tree

but this does not work in general. Indeed, four points determine six distances, and the graph has only five degrees of freedom. We can add one more degree of freedom by thickening the connecting edge to a rectangle:

Generic tight span of a 4-point metric space

Does this work? Can we really find the required nonnegative numbers {a,b,c,d,e,f}? Yes! Writing and solving a system of six linear equations is tedious but can be done. A better way is to recognize that we have four tripods here, such as the tripod with vertices {A,B,C} and the edge lengths {a+e}, {b}, {c+f}. This immediately tells us that {b=(A|C)_B}. Similarly, {a=(B|D)_A}, {c=(B|D)_C} and {d=(A|C)_D}. The remaining lengths {e} and {f} can now be found directly. But in fact, we can find {e,f} independently of the other four lengths by considering the perfect matchings between the points {A,B,C,D}. Namely,

\displaystyle    \begin{cases}   |AB|+|CD| &= (a+b+c+d) + 2e \\   |AD|+|BC| &= (a+b+c+d) + 2f \\   |AC|+|BD| &= (a+b+c+d) + 2e+2f    \end{cases}

From here we get nonnegative {e} and {f} if and only if {\{AC,BD\}} is the longest matching of vertices (or one of the longest). In other words, we must consider the lengths of matchings before marking {A,B,C,D} on the graph. But once they are marked correctly, the rest is now clear: the numbers {2e} and {2f} are the amounts by which the longest matching exceeds the other two. In particular, the rectangle collapses into an edge precisely when the two longest matchings have the same length.
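Putting the formulas together gives a small solver (a sketch of mine; the formulas for {e} and {f} come from subtracting the matching equations above, and the labels are assumed chosen so that {\{AC,BD\}} is a longest matching):

```python
import math

def tight_span(AB, AC, AD, BC, BD, CD):
    a = (AB + AD - BD) / 2            # (B|D)_A
    b = (AB + BC - AC) / 2            # (A|C)_B
    c = (BC + CD - BD) / 2            # (B|D)_C
    d = (AD + CD - AC) / 2            # (A|C)_D
    e = (AC + BD - AD - BC) / 2
    f = (AC + BD - AB - CD) / 2
    return a, b, c, d, e, f

# corners of a square of side 2 (see the R^2 example further below):
A, B, C, D = (-1, -1), (1, -1), (1, 1), (-1, 1)
print(tight_span(math.dist(A, B), math.dist(A, C), math.dist(A, D),
                 math.dist(B, C), math.dist(B, D), math.dist(C, D)))
# e = f = 2*sqrt(2) - 2; scaling the square scales e and f with it,
# which is exactly the obstruction to hyperbolicity of the plane.
```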

What we have here are examples of tight spans of metric spaces: the tripod is the tight span of 3-point space {\{A,B,C\}}, and the six-edge graph is the 1-skeleton of the tight span of {\{A,B,C,D\}}. (The rectangle should have been filled in to form a 2-cell; in general the tight span of n points is a polyhedral complex of dimension up to {\lfloor n/2\rfloor}.)

By definition, a metric space is Gromov hyperbolic if there exists a constant {\delta\ge 0} such that the tight span of any 4-point subset satisfies {\min(e,f)\le \delta}; that is, the rectangle in the middle is not too fat. Informally, this means the space is (contained in) a uniformly thickened metric tree. In terms of distances, the definition says that the lengths of the two longest matchings of {A,B,C,D} differ by at most {2\delta}. Out of several equivalent definitions of Gromov hyperbolicity, this is the most elementary one; it does not require the consideration of geodesics.

However, the real power of Gromov hyperbolicity reveals itself in the setting of geodesic spaces, where any two points are connected by an isometric image of a line segment. To begin with, in geodesic spaces it suffices to consider the case when {D} lies on the geodesic from {A} to {C} (otherwise we replace {D} with a nearby point on the geodesic). The hyperbolicity condition now says that in the geodesic triangle {ABC} each point on the side {AC} is within distance {\delta} of one of the two other sides. Informally, all geodesic triangles in a Gromov hyperbolic space are uniformly thin.

This is indeed the case in the hyperbolic plane {\mathbb H^2}. Indeed, a direct computation shows that the area of any geodesic triangle in {\mathbb H^2} is less than {\pi} (it is {\pi} minus the sum of the angles). On the other hand, the area of a hyperbolic disk of radius {r} grows indefinitely as {r\rightarrow\infty}, since the total area of {\mathbb H^2} is infinite. It follows that in {\mathbb H^2} there is a uniform upper bound on the radius of a circle that can be inscribed into a triangle; the Gromov hyperbolicity of {\mathbb H^2} follows at once.

Even more immediately, {\mathbb R} is Gromov hyperbolic, with {\delta=0}. But {\mathbb R^2} is not, as it contains arbitrarily fat triangles (also, the perfect matchings of the points {\{(\pm n, \pm n)\}} have lengths {4\sqrt{2}n, 4n, 4n}, contradicting the four-point condition). Similarly, the graph of {y=|x|} with the metric induced by restriction contains the four-point configurations {\{(\pm n,n),(\pm 2n,2n)\}} and therefore is not Gromov hyperbolic. Since this graph is bi-Lipschitz equivalent to a line, we see that in general metric spaces Gromov hyperbolicity is not a bi-Lipschitz invariant. How could we expect it to be? Bi-Lipschitz maps introduce multiplicative distortion of distances {L^{-1}|AB|\le |f(A)f(B)|\le L|AB|}, while Gromov hyperbolicity requires an additive bound.

Yet, if two geodesic metric spaces are bi-Lipschitz equivalent, then either both are Gromov hyperbolic, or neither. One can even replace bi-Lipschitz equivalence with a coarser notion of quasi-isometry, which requires {L^{-1}|AB|-M\le |f(A)f(B)|\le L|AB|+M} (and {f} need not be surjective; it should only cover some {\epsilon}-net in the codomain). The quasi-isometric invariance is what makes it possible to discuss hyperbolicity of finitely generated groups… but that topic will have to wait for another day.