Measure-distanced subsets of an interval

Suppose {(X, d)} is a bounded metric space in which we want to find {n} points at safe distance {\delta} from one another: {d(x_j, x_k)\ge \delta} for all {j\ne k}. Let {\delta_n} be the greatest value of {\delta} for which this is possible (technically, the supremum, which might not be attained). We have {\mathrm{diam}\,(X) = \delta_2 \ge \delta_3\ge \cdots}.

The sequence {\delta_n} tends to zero if and only if {X} is totally bounded. From now on, consider a specific metric space which is bounded but not totally bounded: the space {X} of all measurable subsets of {[0, 1]} with the metric {d(A, B)=|A\triangle B|}, the measure of the symmetric difference. More specifically, the elements of {X} are equivalence classes of sets whose symmetric difference is a null set. Another way to think of it is the subset of {L^1([0, 1])} which consists of functions that attain only the values {0} and {1}. In this form we identify a set {A} with its characteristic function {\chi_A}. The measure space {[0, 1]} could be replaced by any isomorphic probability space, of which there are many.

The space {X} contains an infinite subset {\{E_1, E_2, \dots \}} with {\delta=1/2} separation: namely, let {E_k} be the set of all numbers {x\in [0, 1]} such that the fractional part of {2^k x} is less than {1/2}; or equivalently, the {k}th binary digit of {x} is zero. Each symmetric difference {E_j\triangle E_k} has measure exactly {1/2}. Therefore, {\delta_n\ge 1/2} for every {n}.
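
This is easy to check by machine. Here is a minimal Python sketch (mine, just for verification) that computes the symmetric differences exactly on the dyadic grid where the first {N} binary digits are constant:

```python
from fractions import Fraction
from itertools import combinations

N = 10  # the first N binary digits are constant on each cell [i/2^N, (i+1)/2^N)

def E(k):
    # indices of the dyadic cells on which the k-th binary digit is 0 (k <= N)
    return {i for i in range(2**N) if not (i >> (N - k)) & 1}

for j, k in combinations(range(1, 6), 2):
    print(j, k, Fraction(len(E(j) ^ E(k)), 2**N))  # every pair gives 1/2
```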

Upper bound for distances

To get an upper bound for {\delta_n}, suppose {\{E_1, \dots, E_n\}} are {\delta}-separated. Let {\chi_k} be the characteristic function of {E_k}. By assumption, {\int_0^1 |\chi_k-\chi_j|\ge \delta} whenever {1\le k < j \le n}. Summing over all such pairs {(k, j)} shows that {\int_0^1 \sum_{k<j} |\chi_k-\chi_j|\ge \delta n(n-1)/2}. Fix a point {x\in [0, 1]} and let {p} be the number of sets among {E_1, \dots, E_n} that contain {x}. An easy counting argument shows {\sum_{k<j} |\chi_k(x)-\chi_j(x)| = p(n-p)}. The latter quantity is bounded by {r(n-r)} where {r = \lceil n/2\rceil}. Since this bound holds for every {x}, it follows that {\delta n(n-1)/2 \le r(n-r)}, which simplifies to

{\displaystyle \delta \le \frac{2r(n-r)}{n(n-1)} = \frac{r}{2r-1}}, where {r = \lceil n/2\rceil}; the last equality holds both for {n=2r} and for {n=2r-1}.

For example, {\delta_4\le \delta_3\le 2/3}, {\delta_6\le \delta_5\le 3/5}, {\delta_8\le \delta_7\le 4/7}, and so on. Are these estimates sharp? For an affirmative answer it would be enough to demonstrate that {\delta_{2r}\ge r/(2r-1)} by finding a suitable collection of {2r} subsets. Note that sharpness requires that almost every point of {[0, 1]} be covered by exactly {r} of these subsets, and that all pairwise distances between the sets be equal to {r/(2r-1)}.

Sharpness of the upper bound

Four subsets with {2/3} separation are easy to find: {[0,1/3]}, {[1/3, 2/3]}, {[2/3, 1]}, and {[0, 1]}. Thus, {\delta_4=2/3}.

To find six subsets with {3/5} separation, some preparation is helpful.

For any set {A}, the map {E\mapsto E\triangle A} is an isometry of {X} onto {X}. So, there is no loss of generality in assuming that one of our {2r} sets is {[0, 1]}. Then we are to find {E_1,\dots, E_{2r-1}} of measure {1-r/(2r-1) = (r-1)/(2r-1)}. In order to have the specified distances, the measure of intersection of any two sets must be {(r-2)/(4r-2)}, since {d(E_j, E_k) = |E_j|+|E_k|-2|E_j\cap E_k|}.

For example, if {r=2}, the above says we need three disjoint sets of measure {1/3}, which is a trivial thing to find. For {r>2} the sets will no longer be disjoint.

Discretization of the problem

A natural way to proceed is to partition {[0, 1]} into {4r-2} equal subintervals (or other sets of equal measure). Then we look for {2r-1} sets, each of which consists of {(4r-2)(r-1)/(2r-1) = 2r-2} subintervals. Their pairwise intersections must consist of {(4r-2)(r-2)/(4r-2) = r-2} subintervals.

The case {r=2} was done above. If {r=3}, we want to use {10} subintervals to form {5} sets with {4} subintervals in each. Pairwise intersections must consist of precisely {1} subinterval. Encoding subintervals by their indices 0-9, we can describe such sets as 0123, 3456, 6780, 1479, 2589. And since 10 is a triangular number, there is a neat illustration of these five sets: three sides of the triangle, and two “stars” connecting its center to the sides.

I admit it could be neater
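
The combinatorics is easy to verify mechanically; a minimal sketch:

```python
from itertools import combinations

family = [set(s) for s in ("0123", "3456", "6780", "1479", "2589")]
assert all(len(F) == 4 for F in family)
assert all(len(F & G) == 1 for F, G in combinations(family, 2))
```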

There is no need to translate this back into sets formed by subintervals: as long as the combinatorial problem has a solution, we have the desired conclusion. In this case, the conclusion is {\delta_6=3/5}.

Next up, {r=4}. Following the above method, we partition {[0, 1]} into {4r-2 = 14} equal subintervals and look for {7} sets, each of which consists of {6} subintervals. Pairwise intersections must consist of {2} subintervals each. Can this be done?

Think of a cube. It has {8} vertices and {6} faces, and {8+6=14}. In the set {faces and vertices}, we choose {7} subsets of cardinality {6} as follows:

  • {face, all its vertices, and the opposite face}. There are 6 such sets. Any two of them have exactly 2 elements in common.
  • {all faces} is our 7th set, it has 2 elements in common with each of the above.
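
Here is a quick machine check of the construction just described; the encoding of faces and vertices by tagged tuples is mine, chosen just for this sketch:

```python
from itertools import combinations, product

faces = [(axis, side) for axis in range(3) for side in (0, 1)]
vertices = list(product((0, 1), repeat=3))  # 6 + 8 = 14 elements in total

def face_set(axis, side):
    # a face, all four of its vertices, and the opposite face
    return ({("f", (axis, side)), ("f", (axis, 1 - side))}
            | {("v", v) for v in vertices if v[axis] == side})

family = [face_set(a, s) for a, s in faces] + [{("f", f) for f in faces}]

assert all(len(S) == 6 for S in family)
assert all(len(S & T) == 2 for S, T in combinations(family, 2))
```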

Thus, {\delta_8 = 4/7}. But it is concerning that {\delta_6} and {\delta_8} required different combinatorial constructions. What happens next?

For {r=5}, we partition {[0, 1]} into {18} equal subintervals and look for {9} sets, with {8} subintervals in each. Pairwise intersections must consist of {3} subintervals. I don’t see any neat construction here.

Summary so far

The results obtained so far for {r=2, 3, 4} can be summarized as follows: there exists a 0-1 matrix {B_r} of size {(2r-1)\times (4r-2)} in which the dot product of two rows is {r-2} when the rows are distinct and {2r-2} otherwise. Specifically,

{\displaystyle B_2 = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} }

{\displaystyle B_3 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ \end{pmatrix}}

The matrix {B_4} is best written in block form {[F | V]} according to the faces-vertices interpretation. Here

{\displaystyle F = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}} and {\displaystyle V = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ \end{pmatrix}}
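
The stated dot-product property is easy to confirm numerically; a sketch for {B_2} and {B_3} (the same check applies to {B_4 = [F|V]}):

```python
import numpy as np

def check(B, r):
    G = B @ B.T  # Gram matrix of the rows
    off = ~np.eye(len(G), dtype=bool)
    assert (np.diag(G) == 2 * r - 2).all() and (G[off] == r - 2).all()

check(np.array([[1, 1, 0, 0, 0, 0],
                [0, 0, 1, 1, 0, 0],
                [0, 0, 0, 0, 1, 1]]), r=2)

check(np.array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
                [0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
                [0, 0, 1, 0, 0, 1, 0, 1, 0, 1],
                [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]]), r=3)
```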

Does {B_5} exist? Perhaps not.

Combinatorics of finite sets

In combinatorial terms, we are looking for a {k}-uniform {\ell}-intersecting family of {m} subsets of {[n]}, with specific values

{k = 2r-2}, {\ell=r-2}, {m=2r-1}, {n=4r-2}.

Explanation of terms: {[n]=\{1, 2, \dots, n\}}, being {k}-uniform means every set has {k} elements, and being {\ell}-intersecting means all pairwise intersections have {\ell} elements.

Thus began my fishing expedition into the literature on combinatorics of finite sets. Surprisingly, it ended with a pivot back to where it started: measurable subsets of an interval.

Fractional intersecting uniform families

A recent (2018) preprint Uniform sets in a family with restricted intersections (Yandong Bai, Binlong Li, Jiuqiang Liu, Shenggui Zhang) relates the combinatorial problem back to subsets of an interval. I quote the relevant result (Theorem 2.5) with slight changes of notation.


Let {X=[0,n]} be the real interval and let {\lambda} be the Lebesgue measure on {X}. If {\mathcal{F}=\{F_1,\ldots,F_m\}} is a family of subsets of {X} such that

  1. {\lambda(F_i)=k} for all {i\in [m]}, and
  2. {\lambda(F_i\cap F_j)=\ell} for all distinct {i,j\in[m]},

then we call {\mathcal{F}} a fractional {\ell}-intersecting {k}-uniform {m}-family of {X}. For given real numbers {n,\ell} and integer {m}, let {\kappa_{\ell}^{\text{frac}}(n,m)} be the largest real number {k} such that there exists a fractional {\ell}-intersecting {k}-uniform {m}-family of {[0,n]}.

Theorem 2.5 Let {n>\ell\ge 0} be two real numbers and let {m\ge 1} be an integer. Then

{\displaystyle \kappa_{\ell}^{\text{frac}}(n,m)={m-1\choose s-1}\alpha+{m-1\choose t-1}\beta, }

where {\displaystyle s=\left\lfloor\frac{1+\sqrt{1+4m(m-1)\ell/n}}{2}\right\rfloor}, {\displaystyle t=\left\lceil\frac{1+\sqrt{1+4m(m-1)\ell/n}}{2}\right\rceil}

and {(\alpha,\beta)} is the solution of the system

{\displaystyle \binom{m}{s}\alpha+\binom{m}{t}\beta=n}

{\displaystyle \binom{m-2}{s-2}\alpha+\binom{m-2}{t-2}\beta=\ell}.


This looks complicated, but with our values {\ell=r-2}, {m=2r-1}, {n=4r-2}, the quantity {1+4m(m-1)\ell/n} simplifies to {1+4(r-1)(r-2) = (2r-3)^2}, so both {s} and {t} turn out to be equal to {r-1}. This is an exceptional case when the linear system is degenerate: it amounts to {\alpha+\beta = (4r-2)/\binom{2r-1}{r-1}}. The final result is {\kappa_{\ell}^{\text{frac}}(n,m)=2r-2}, as we wanted. Yay.
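
One can confirm this with exact arithmetic; a sketch with the formulas of Theorem 2.5 transcribed directly:

```python
from fractions import Fraction
from math import comb

for r in range(2, 20):
    l, m, n = r - 2, 2 * r - 1, 4 * r - 2
    disc = 1 + Fraction(4 * m * (m - 1) * l, n)
    assert disc == (2 * r - 3) ** 2            # so s = t = r - 1 exactly
    s = r - 1
    alpha_plus_beta = Fraction(n, comb(m, s))  # the degenerate linear system
    kappa = comb(m - 1, s - 1) * alpha_plus_beta
    assert kappa == 2 * r - 2
```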

So, the upper bounds for distances are indeed sharp, meaning that {n} subsets of the unit interval can be separated by measure-distance of {r/(2r-1)}, where {r = \lceil n/2\rceil}. But dividing the interval into {4r-2} equal parts may not be enough to construct such subsets: in general one has to use a larger number of parts, some integer multiple of {4r-2}.

The authors remark that if {\alpha, \beta} are integers, then a partition with {k=\kappa^{\text{frac}}} exists in the combinatorial realm of {\{1, \dots, n\}}. However, this is not “if and only if”. Indeed, {\alpha+\beta = (4r-2)/\binom{2r-1}{r-1}} which takes integer values when {r=1, 2, 3}, and is fractional (less than 1) when {r\ge 4}. But we do have a combinatorial solution when {r=4}. So it is not quite clear for what values of {r} there exists a {(2r-2)}-uniform {(r-2)}-intersecting family of {2r-1} subsets of {[4r-2]}. Perhaps an answer could be found by following the construction for this specific case, when the formulas magically simplify.

Weak convergence in metric spaces

Weak convergence in a Hilbert space is defined as pointwise convergence of functionals associated to the elements of the space. Specifically, {x_n\rightarrow x} weakly if the associated functionals {f_n(y) = \langle x_n, y \rangle} converge to {f(y) = \langle x, y \rangle} pointwise.

How could this idea be extended to metric spaces without linear structure? To begin with, {f_n(y)} could be replaced with {\|x_n-y\|^2-\|x_n\|^2}, since this equals {\|y\|^2 - 2f_n(y)}, which for each fixed {y} differs from {-2f_n(y)} by a constant. Also, {\|x_n\|^2} here could be {\|x_n-z\|^2} for any fixed {z}; to avoid introducing another variable, let’s use {x_1} as the fixed reference point {z}. Now we have a metric space version of the weak convergence: the functions
{f_n(y) = d(x_n,y)^2 - d(x_n,x_1)^2}
must converge pointwise to
{f(y) = d(x,y)^2 - d(x,x_1)^2}

The triangle inequality shows that strong convergence {d(x_n,x)\rightarrow 0} implies weak convergence, as expected. And the converse is not necessarily true, as the example of a Hilbert space shows.


Aside: the above is not the only way to define weak convergence in metric spaces. Another approach is to think of {\langle x_n , y\rangle} in terms of projection onto a line through {y}. A metric space version of this concept is the nearest-point projection onto a geodesic curve. This is a useful approach, but it is only viable for metric spaces with additional properties (geodesic, nonpositive curvature).

Also, both of these approaches take the Hilbert space case as the point of departure, and do not necessarily capture weak convergence in other normed spaces.


Let’s try this out in {\ell^1} with the standard basis sequence {e_n}. Here {f_n(y) = \|e_n-y\|^2 - 4} for {n\ge 2}, and {f_n(y) \rightarrow (\|y\| + 1)^2 - 4} pointwise. Is there an element {x\in \ell^1} such that

{\|x-y\|^2 - \|x-e_1\|^2= (\|y\|+1)^2 - 4} for all {y}?

Considering both sides as functions of one variable {y_n}, for a fixed {n}, shows that {x_n=0} for every {n}, because the left hand side is non-differentiable at {y_n=x_n} while the right hand side is non-differentiable at {y_n=0}. Then the desired identity simplifies to {\|y\|^2 - 1 = (\|y\|+1)^2 - 4}, which is false. Oh well, that sequence wasn’t weakly convergent to begin with: by Schur’s theorem, every weakly convergent sequence in {\ell^1} also converges strongly.
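
For what it’s worth, the convergence {\|e_n-y\|\rightarrow \|y\|+1} is easy to see numerically; a sketch with a truncated element of {\ell^1}:

```python
import numpy as np

y = 1 / np.arange(1, 10**6 + 1) ** 2  # a nonnegative element of l^1 (truncated)
norm_y = y.sum()

def dist(n):
    # ||e_n - y||_1, computed coordinate by coordinate
    return abs(1 - y[n - 1]) + (norm_y - y[n - 1])

print(norm_y + 1)                                       # the claimed limit
print([round(dist(n), 6) for n in (10, 1000, 100000)])  # approaches it
```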

This example also shows that not every bounded sequence in a metric space has a weakly convergent subsequence, unlike the way it works in Hilbert spaces.

Rough isometries

An isometry is a map {f\colon X\rightarrow Y} between two metric spaces {X,Y} which preserves all distances: {d_Y(f(x),f(x')) = d_X(x,x')} for all {x,x'\in X}. (After typing a bunch of such formulas, one tends to prefer shorter notation: {|f(x)f(x')| = |xx'|}, with the metric inferred from context.) It’s a popular exercise to prove that every isometry from a compact metric space into itself is surjective.

A rough isometry allows additive distortion of distances: {|\,|f(x)f(x')| - |xx'|\,|\le \epsilon}. (As contrasted with bi-Lipschitz maps, which allow multiplicative distortion). Rough isometries ignore small-scale features (in particular, they need not be continuous), but accurately capture the large scale structure of a space. This makes them convenient for studying hyperbolic spaces (trees and their relatives), where the large-scale structure is of interest.

If {f} is an {\epsilon}-rough isometry, then the Gromov-Hausdorff distance {d_{GH}} between {X} and {f(X)} is at most {\epsilon/2}. This follows from a convenient characterization of {d_{GH}}: it is equal to {\frac12 \inf_R \textrm{dis}\,(R)} where the infimum is taken over all subsets {R\subset X\times Y} that project surjectively onto each factor, and {\textrm{dis}\,(R) = \sup \{|\,|xx'|-|yy'|\,| \colon xRy, \ x'Ry' \} }.


Since small-scale features are ignored, it is not quite natural to talk about rough isometries being surjective. A natural concept for them is {\epsilon}-surjectivity, meaning that every point of {Y} is within distance {\le \epsilon} of {f(X)}. When this happens, we can conclude that {d_{GH}(X,Y)\le 3\epsilon/2}, because the Hausdorff distance between {Y} and {f(X)} is at most {\epsilon}.

Recalling that an isometry of a compact metric space {X} into {X} is automatically onto, it is natural to ask whether {\epsilon}-rough isometries of {X} into {X} are {\epsilon}-surjective. This, however, turns out to be false in general.


First example, a finite space: {X=\{0,1,2,\dots,n\}} with the metric {d(i,j) = i+j} (when {i\ne j}). Consider the backward shift {f(i)=i-1} (and {f(0)=0}). By construction, this is a rough isometry with {\epsilon=2}. Yet, the point {n} is at distance {n} from {f(X) = \{0,1,2,\dots, n-1\}}.
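
A few lines of code make the distortion and the gap explicit; a sketch:

```python
from itertools import combinations

n = 10
d = lambda i, j: 0 if i == j else i + j  # travel from i to j via 0
f = lambda i: max(i - 1, 0)              # the backward shift

distortion = max(abs(d(f(i), f(j)) - d(i, j))
                 for i, j in combinations(range(n + 1), 2))
gap = min(d(n, f(i)) for i in range(n + 1))  # distance from the point n to f(X)
print(distortion, gap)  # 2 and n: the distortion stays bounded, the gap does not
```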

The above metric can be thought of as the distance one has to travel from {i} to {j} with a mandatory visit to {0}. This makes it similar to the second example.


Second example, a geodesic space: {X} is the graph shown below, a subset of {\mathbb R^2} with the intrinsic (path) metric, i.e., the length of shortest path within the set.

Comb space

Define {f(x,y) = ((x-1)^+, (y-1)^+)} where {{\,}^+} means the positive part. Again, this is a {2}-rough isometry. The omitted part, shown in red below, contains a ball of radius {5} centered at {(5,5)}. Of course, {5} can be replaced by any positive integer.

Omitted part in red


Both of these examples are intrinsically high-dimensional: they cannot be accurately embedded into {\mathbb R^d} for low {d} (using the restriction metric rather than the path metric). This raises a question: does there exist {C=C(d)} such that for every compact subset {K\subset \mathbb R^d}, equipped with the restriction metric, every {\epsilon}-rough isometry {f\colon K\rightarrow K} is {C(d)\epsilon}-surjective?

Binary intersection property, and not fixing what isn’t broken

A metric space has the binary intersection property if every collection of closed balls has nonempty intersection unless there is a trivial obstruction: the distance between centers of two balls exceeds the sum of their radii. In other words, for every family of points {x_\alpha\in X} and numbers {r_\alpha>0} such that {d(x_\alpha,x_\beta)\le r_\alpha+r_\beta} for all {\alpha,\beta} there exists {x\in X} such that {d(x_\alpha,x)\le r_\alpha} for all {\alpha}.

For example, {\mathbb R} has this property: {x=\inf_\alpha (x_\alpha+r_\alpha)} works. But {\mathbb R^2} does not:

Failure of the binary intersection property

The space of bounded sequences {\ell^\infty} has the binary intersection property, and so does the space {B[0,1]} of all bounded functions {f:[0,1]\rightarrow\mathbb R} with the supremum norm. Indeed, the construction for {\mathbb R} generalizes: given a family of bounded functions {f_\alpha} and numbers {r_\alpha>0} as in the definition, let {f(x)=\inf_\alpha (f_\alpha(x)+r_\alpha)}.


The better known space of continuous functions {C[0,1]} has the finite version of the binary intersection property, because for a finite family, the construction {\inf_\alpha (f_\alpha(x)+r_\alpha)} produces a continuous function. However, the property fails without finiteness, as the following example shows.

Example. Let {f_n\in C[0,1]} be a function such that {f_n(x)=-1} for {x\le \frac12-\frac1n}, {f_n(x)=1} for {x\ge \frac12+\frac1n}, and {f_n} is linear in between.

Since {\|f_n-f_m\| \le 1} for all {n,m}, we can choose {r_n=1/2} for all {n}. But if a function {f} is such that {\|f-f_n\|\le \frac12} for all {n}, then {f(x) \le -\frac12} for {x<\frac12} and {f(x) \ge \frac12} for {x>\frac12}. There is no continuous function that does that.

More precisely, for every {f\in C[0,1]} we have {\liminf_{n\to\infty} \|f-f_n\|\ge 1 } because {f(x)\approx f(1/2)} in a small neighborhood of {1/2}, while {f_n} changes from {-1} to {1} in the same neighborhood when {n} is large.
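
A quick grid computation illustrates both claims; a sketch, with {f(x)=x-1/2} as a sample continuous function:

```python
import numpy as np

x = np.linspace(0, 1, 100001)
f_n = lambda n: np.clip(n * (x - 0.5), -1, 1)  # the piecewise linear functions above

# pairwise uniform distances stay at most 1
print(max(np.abs(f_n(n) - f_n(m)).max() for n in range(2, 12) for m in range(2, 12)))

# but a fixed continuous function drifts to distance 1 from f_n
f = x - 0.5
print([round(np.abs(f - f_n(n)).max(), 3) for n in (2, 10, 100, 1000)])
```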


Given a discontinuous function, one can approximate it with a continuous function in some way: typically, using a mollifier. But such approximations tend to change the function even if it was continuous to begin with. Let’s try to not fix what isn’t broken: look for a retraction of {B[0,1]} onto {C[0,1]}, that is, a map {\Phi:B[0,1]\rightarrow C[0,1]} such that {\Phi(f)=f} for all {f\in C[0,1]}.

The failure of binary intersection property, as demonstrated by the sequence {(f_n)} above, implies that {\Phi} cannot be a contraction. Indeed, let {f(x)= \frac12 \,\mathrm{sign}\,(x-1/2)}. This is a discontinuous function such that {\|f-f_n\|\le 1/2} for all {n}. Since {\liminf_{n\to\infty} \|\Phi(f)-f_n\|\ge 1}, it follows that {\Phi} cannot be {L}-Lipschitz with a constant {L<2}.


It was known for a while that there is a retraction from {B[0,1]} onto {C[0,1]} with the Lipschitz constant at most {20}: see Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss. In a paper published in 2007, Nigel Kalton proved the existence of a {2}-Lipschitz retraction.

Graphical convergence

The space of continuous functions (say, on {[0,1]}) is usually given the uniform metric: {d_u(f,g) = \sup_{x}|f(x)-g(x)|}. In other words, this is the smallest number {\rho} such that from every point of the graph of one function we can jump to the graph of another function by moving at distance {\le \rho} in vertical direction.

Uniform metric is based on vertical distances

Now that I put it this way, why don’t we drop “in vertical direction”? It’ll still be a metric, namely the Hausdorff metric between the graphs of {f} and {g}. It’s natural to call it the graphical metric, denoted {d_g}; from the definition it’s clear that {d_g\le d_u}.

Graphical metric: Hausdorff distance

Some interesting things happen when the space of continuous functions is equipped with {d_g}. For one thing, it’s no longer a complete space: the sequence {f_n(x)=x^n} is Cauchy in {d_g} but has no limit.

Sequence of x^n
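
One can watch this happen numerically: the Hausdorff distance between the graphs of {x^n} and {x^{2n}} shrinks, even though the uniform distance stays at {1/4}. A sketch, with graphs sampled on a grid:

```python
import numpy as np

x = np.linspace(0, 1, 1001)

def d_graph(u, v):
    # Hausdorff distance between the sampled graphs {(x, u(x))} and {(x, v(x))}
    A, B = np.column_stack([x, u]), np.column_stack([x, v])
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(D.min(axis=0).max(), D.min(axis=1).max())

for n in (5, 10, 20, 40):
    u, v = x**n, x**(2 * n)
    print(n, round(d_graph(u, v), 4), round(np.abs(u - v).max(), 4))
```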

On the other hand, the bounded subsets of {(C[0,1],d_g) } are totally bounded. Indeed, given {M>0} and {\epsilon>0} we can cover the rectangle {[0,1]\times [-M,M]} with a rectangular mesh of diameter at most {\epsilon}. For each function with {\sup|f|\le M}, consider the set of rectangles that its graph visits. There are finitely many possibilities for the sets of visited rectangles. And two functions that share the same set of visited rectangles are at graphical distance at most {\epsilon} from each other.

Boxy approximation to the graph

Thus, the completion of {C[0,1]} in the graphical metric should be a nice space: bounded closed subsets will be compact in it. What is this completion, concretely?

Here is a partial answer: if {(f_n)} is a graphically Cauchy sequence, its limit is the compact set {\{(x,y): g(x)\le y\le h(x)\}} where

\displaystyle    g(x) = \inf_{x_n\rightarrow x} \liminf f_n(x_n)

(the infimum taken over all sequences converging to {x}), and

\displaystyle    h(x) = \sup_{x_n\rightarrow x} \limsup f_n(x_n)

It’s not hard to see that {g} is lower semicontinuous and {h} is upper semicontinuous. Of course, {g\le h}. It seems that the set of such pairs {(g,h)} indeed describes the graphical completion of continuous functions.

For example, the limit of {f_n(x)=x^n} is described by the pair {g(x)\equiv 0}, {h(x)=\chi_{\{1\}}}. Geometrically, it’s a broken line with horizontal and vertical segments.

For another example, the limit of {f_n(x)=\sin^2 nx} is described by the pair {g(x)\equiv 0}, {h(x)\equiv 1}. Geometrically, it’s a square.

Stripey sines

Infinite beatitude of non-existence: a journey into Nothingland

In the novella Flatland by Edwin A. Abbott, the Sphere leads the Square “downward to the lowest depth of existence, even to the realm of Pointland, the Abyss of No dimensions”:

I caught these words, “Infinite beatitude of existence! It is; and there is nothing else beside It.” […] “It fills all Space,” continued the little soliloquizing Creature, “and what It fills, It is. What It thinks, that It utters; and what It utters, that It hears; and It itself is Thinker, Utterer, Hearer, Thought, Word, Audition; it is the One, and yet the All in All. Ah, the happiness, ah, the happiness of Being!”

Indeed, Pointland (a one-point space) is zero-dimensional by every concept of dimension that I know of. Yet there is something smaller: Nothingland — empty space, {\varnothing} — whose non-existent inhabitants must be perpetually enjoying the happiness of Non-Being.

What is the dimension of Nothingland?

In topology, the empty set has dimension {-1}. This fits the inductive definition of topological dimension, which is the smallest number {d} such that the space can be minced by removing a subset of dimension {\le d-1}. (Let’s say a space has been minced if what’s left has no connected subsets other than points.)

Thus, a nonempty finite (or countable) set has dimension {0}: it’s minced already, so we remove nothing, a set of dimension {-1}. A line or a curve is one-dimensional: they can be minced by removing a zero-dimensional subset, like rational numbers.

One-dimensional curve minced by removing zero-dimensional points.

The Flatland itself can be minced by removing a one-dimensional subset (e.g., circles with rational radius and rational coordinates of the center), so it is two-dimensional. And so on.

Flatland minced by removing a one-dimensional subset

The convention {\mathrm{dim}\,\varnothing = -1}, helpful in the definition, gets in the way later. For example, the topological dimension is subadditive under products: {\mathrm{dim}\,(A\times B)\le \mathrm{dim}\,A + \mathrm{dim}\,B} … unless both {A} and {B} are empty, because then {-1\le -2} is false. So the case {A=B=\varnothing} must be excluded from the product theorem. We would not have to do this if {\mathrm{dim}\,\varnothing } was defined to be {-\infty}.

Next, consider the Hausdorff dimension. Its definition is not inductive, but it requires a couple of preliminary concepts. First, define the {d}-dimensional premeasure on scale {\delta>0}:

\displaystyle    \mathcal H^d_\delta (X) = \inf \sum_j (\mathrm{diam}\,{U_j})^{d}

where the infimum is taken over all covers of {X} by nonempty subsets {U_j} with {\mathrm{diam}\,{U_j}\le \delta}. Requiring {U_j} to be nonempty avoids the need to define the diameter of Nothingland, which would be another story. The empty space can be covered by the empty family of nonempty subsets. The sum of an empty set of numbers is {0}, and so {\mathcal H^d_\delta (\varnothing) = 0}.

Then we define the {d}-dimensional Hausdorff measure:

\displaystyle    \mathcal H^d (X) = \lim_{\delta\rightarrow0} \mathcal H^d_\delta (X)

and finally,

\displaystyle    \mathrm{dim}_H (X) = \inf \{ d \colon \mathcal H^d (X)=0\}

If in this last infimum we require {d>0}, the result is {\mathrm{dim}_H (\varnothing) =0}. But why make this restriction? The {d}-dimensional pre-measures and measures make sense for all real {d}. It’s just that for nonempty {X}, we are raising some small (or even zero) numbers to negative power, getting something large as a result. Consequently, every nonempty space has {\mathcal H^d = \infty} for all {d < 0}.

But {\mathcal H^d_\delta (\varnothing) = 0}, from the sum of empty collection of numbers being zero. Hence, {\mathcal H^d (\varnothing) = 0} for all real {d}, and this leads to {\mathrm{dim}_H\,\varnothing = -\infty}.

To have {\mathrm{dim}_H\,\varnothing = -\infty} is also convenient because the Hausdorff dimension is superadditive under products: {\mathrm{dim}_H\,(A\times B)\ge \mathrm{dim}_H\,A + \mathrm{dim}_H\,B}. This inequality was proved for general metric spaces as recently as 1995, by John Howroyd. If we don’t have {\mathrm{dim}_H\,\varnothing = -\infty}, then both factors {A} and {B} must be assumed nonempty.

So… should Nothingland have topological dimension {-1} and Hausdorff dimension {-\infty}? But that would violate the inequality {\mathrm{dim} (X)\le \mathrm{dim}_H (X)} which holds for every other separable metric space. In fact, for such spaces the topological dimension is simply the infimum of the Hausdorff dimension over all metrics compatible with the topology.

I am inclined to let the dimension of Nothingland be {-\infty} for every concept of dimension.

Polygonal inequalities: beyond the triangle

(Related to previous post but can be read independently). The triangle inequality, one of the axioms of a metric space, can be visualized by coloring the vertices of a triangle by red and blue.

Triangle with colored vertices

The inequality says that the sum of monochromatic distances must not exceed the sum of dichromatic distances. That is, for every assignment of the vertices to points in the space, the sum of all red-red and blue-blue distances does not exceed the sum of red-blue distances. An assignment is just a map from the set of vertices {V} to the metric space {X}; it need not be injective.

But why stop at the triangle? We can take any number of points (vertices), color them in some way, and require the same polygonal inequality:

\displaystyle \text{monochromatic } \le \text{ dichromatic}

Already for the square we have two distinct plausible colorings to explore: even red-blue split

Evenly colored square

and predominantly red square

(mostly) Red Square

But it turns out that the second coloring is useless: the inequality it defines fails in every metric space with more than one point. More generally, suppose we have {R} red points and {B} blue ones, and {R- B\ge 2}. Pick two distinct points {a,b\in X}. Assign one red point to {a} and all others to {b}. The sum of monochromatic distances is {(R-1)\,d(a,b)} while the sum of dichromatic distances is {B\,d(a,b)}, which is strictly less.

So, we are limited to nearly-even colorings: those with {|R-B|\le 1}. For even numbers of vertices this means even split, while odd numbers should be divided as evenly as possible: like 3+2 for the pentagon.

Here is the pentagram again.
Colored Pentagon

The inequalities turn out to be related. For every {n}, the {n}-gonal inequality implies the {(n-2)}-gonal inequality, because if we assign two opposite-colored vertices to the same point, their contributions cancel out. More interestingly: when {n} is odd, the {n}-gonal inequality implies the {(n-1)}-gonal inequality. Indeed, suppose we have {(n-1)} points, evenly colored. Add an arbitrary {n}th point. Whether the added point is blue or red, the {n}-gonal inequality holds. Averaging these two inequalities, we see the effect of the added point canceling out.

So, if the {n}-gonal inequality holds for all odd {n}, it holds for all {n}. This property is exactly the hypermetric property from the previous post. Except it was stated there in a different form:

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. But if the point {x_i} is repeated {|b_i|} times, we can replace {b_i} by {\mathrm{sign}\,b_i}. Then represent +1 as blue and -1 as red.

The hypermetric inequalities were introduced by John B. Kelly (a student of William Ted Martin) in the late 1960s. He showed they are necessary for the space to be embeddable into {\ell^1}. It would be great if they were also sufficient (and for some classes of spaces they are), but this is not so: a counterexample was given by Patrice Assouad in 1977.

It is also interesting to consider the {n}-gonal inequalities for even {n} only. By repetition of vertices, they are equivalent to requiring

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \quad\quad \quad (1)

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=0}. But of course then we have (1) for rational {b_i} (just clear the denominators), hence for all real {b_i} (by approximation), as long as they sum to {0}. So, the requirement amounts to the matrix {(d(x_i,x_j))} being negative semidefinite on the subspace {\sum_i b_i=0}. Such metrics are called metrics of negative type.
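
The Euclidean case of this zero-sum condition is easy to test numerically; a sketch with random points and random real coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((6, 3))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)  # Euclidean distance matrix

for _ in range(10000):
    b = rng.standard_normal(6)
    b -= b.mean()            # restrict to the subspace where the b_i sum to 0
    assert b @ D @ b < 1e-9  # the quadratic form is negative semidefinite there
```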

Their relation to embeddability of the space is well-known: {(X,d)} is of negative type if and only if the “snowflake” {(X,\sqrt{d})} isometrically embeds into a Hilbert space. In other words, we can “draw” any finite metric space of negative type in a Euclidean space, with the understanding that Euclidean distances represent the square roots of the actual distances. This embedding result is a 1935 theorem of Isaac Schoenberg who is also known for connecting dots naturally (introducing splines).

Pentagrams and hypermetrics

The Wikipedia article Metric (mathematics) offers plenty of flavors of metrics, from common to obscure: ultrametric, pseudometric, quasimetric, semimetric, premetric, hemimetric and pseudoquasimetric (I kid you not).

One flavor it does not mention is a hypermetric. This is a metric {d} on a set {X} such that the inequality

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \ \ \ \ \ \ \ \ \ (1)

holds for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. The requirement that {b_i} be integers gives some combinatorial meaning to (1); this is not just some quadratic form being negative semidefinite.

As a warm-up, observe that (1) contains in it the triangle inequality: with {b_1=b_2=1} and {b_3=-1} we get {d(x_1,x_2)-d(x_1,x_3)-d(x_2,x_3)\le 0}. But it appears that (1) says something about “polygons” with more vertices too.

To make (1) worth thinking about, it should be satisfied by some important metric space. Such as the real line {\mathbb R}, for example. It is not quite obvious that the inequality

\displaystyle  \sum_{i,j}b_i b_j |a_i - a_j| \le 0 \ \ \ \ \ \ \ \ \ (2)

holds for all reals {a_i} and all integers {b_i} adding up to {1}. It helps to order the numbers: {a_1\le \dots\le a_m} and focus on the contribution of a particular gap {[a_k,a_{k+1}]} to the sum (2). The amount it contributes is {|a_k-a_{k+1}|} multiplied by

\displaystyle    \sum_{i\le k<j} b_i b_j = \left(\sum_{i\le k} b_i \right) \left(\sum_{j > k} b_j \right) = \left(\sum_{i\le k} b_i \right) \left(1-\sum_{i\le k} b_i \right) \le 0

because {n(1-n)\le 0} for every integer {n}. This proves (2).
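
A brute-force check of (2) on random data; a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    a = rng.random(7)                # points of the real line
    b = rng.integers(-3, 4, size=7)
    b[0] += 1 - b.sum()              # force the integers to sum to 1
    total = sum(b[i] * b[j] * abs(a[i] - a[j])
                for i in range(7) for j in range(7))
    assert total <= 1e-9
```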

Now that we have one hypermetric space, {\mathbb R}, other such spaces can be created easily. If {X} is any set and {f \colon X\rightarrow\mathbb R} any function, consider {d(x,y) = |f(x)-f(y)|}, the pullback pseudometric on {X}. By applying (2) to the numbers {f(x_i)}, we see that {d} satisfies the hypermetric inequality. Since (1) is additive in {d}, we can take any family of functions {f_\alpha \colon X\rightarrow\mathbb R} and add together the corresponding pseudometrics. Or even integrate them against a positive measure: {d(x,y)=\int |f_\alpha(x)-f_\alpha(y)|\,d\mu(\alpha)}.

For example, the plane {\mathbb R^2} is a hypermetric space, because the distance between two points {(x_1,y_1)} and {(x_2,y_2)}, besides the familiar form

\displaystyle  \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}

can also be represented as an integral of the aforementioned pullbacks:

\displaystyle  \frac12 \int_0^\pi \big| (x_1-x_2)\cos \alpha + (y_1-y_2) \sin\alpha \big| \,d\alpha

A similar integral representation holds in all dimensions; thus, all Euclidean spaces are hypermetric.
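
The integral representation is easy to verify numerically; a sketch in the plane:

```python
import numpy as np

u = np.array([0.3, -1.2])  # the difference of two points in the plane
alpha = np.linspace(0, np.pi, 200001)
integrand = np.abs(u[0] * np.cos(alpha) + u[1] * np.sin(alpha))
print(0.5 * integrand.mean() * np.pi)  # (1/2) times the integral over [0, pi]
print(np.linalg.norm(u))               # the Euclidean distance: the two agree
```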

Okay, what is not a hypermetric then? For example, the cube distance induced by the norm {\|x\|_\infty=\max |x_i|} is not, in dimensions 3 and higher. Specifically, (1) fails as the five-point inequality with {(b_1,\dots,b_5) =(1,1,1,-1,-1)}. I’ll call it the pentagram inequality:

also known as K_5 in mystical literature

It says that for any five points in the space the sum of monochromatic distances does not exceed the sum of all bi-chromatic (red-blue) distances.

The pentagram inequality fails when {x_1,\dots,x_5} are the columns of the matrix

\displaystyle  \begin{pmatrix}    1& 1& 1& 2& 0\\    0& 2& 2& 1& 1\\    0& 1& 0& 1& 0\\    \end{pmatrix}

(first three columns blue, the last two red). Indeed, the sum of monochromatic distances is {2+1+2+2=7} while the sum of bichromatic distances is {1+1+1+1+1+1=6}.
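
Here is the verification spelled out; a sketch:

```python
from itertools import combinations

cols = [(1, 0, 0), (1, 2, 1), (1, 2, 0), (2, 1, 1), (0, 1, 0)]
blue, red = cols[:3], cols[3:]
d = lambda p, q: max(abs(s - t) for s, t in zip(p, q))  # the l-infinity metric

mono = sum(d(p, q) for p, q in combinations(blue, 2)) + d(*red)
bi = sum(d(p, q) for p in blue for q in red)
print(mono, bi)  # 7 > 6: the pentagram inequality fails
```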

If the above example does not look conceptual enough, it’s because I found it via computer search. I don’t have much intuition for the pentagram inequality.

Anyway, the example delivers another proof that taking the maximum of three numbers is hard. More precisely, there is no isometric embedding of {\mathbb R^3} with the maximum metric into {\ell_1}. Unlike the earlier proof, this one does not assume the embedding is linear.

A good reference for hypermetric inequalities is the book Geometry of cuts and metrics by Deza and Laurent.

Re: “How many sides does a circle have?”

The post is inspired by this story told by JDH at Math.SE.


My third-grade son came home a few weeks ago with similar homework questions:

How many faces, edges and vertices do the following have?

  • cube
  • cylinder
  • cone
  • sphere

Like most mathematicians, my first reaction was that for the latter objects the question would need a precise definition of face, edge and vertex, and isn’t really sensible without such definitions.

But after talking about the problem with numerous people, conducting a kind of social/mathematical experiment, I observed something intriguing. What I observed was that none of my non-mathematical friends and acquaintances had any problem with using an intuitive geometric concept here, and they all agreed completely that the answers should be

  • cube: 6 faces, 12 edges, 8 vertices
  • cylinder: 3 faces, 2 edges, 0 vertices
  • cone: 2 faces, 1 edge, 1 vertex
  • sphere: 1 face, 0 edges, 0 vertices

Indeed, these were also the answers desired by my son’s teacher (who is a truly outstanding teacher). Meanwhile, all of my mathematical colleagues hemmed and hawed about how we can’t really answer, and what does “face” mean in this context anyway, and so on; most of them wanted ultimately to say that a sphere has infinitely many faces and infinitely many vertices and so on. For the homework, my son wrote an explanation giving the answers above, but also explaining that there was a sense in which some of the answers were infinite, depending on what was meant.

At a party this past weekend full of mathematicians and philosophers, it was a fun game to first ask a mathematician the question, who invariably made various objections and refusals and said it made no sense and so on, and then the non-mathematical spouse would forthrightly give a completely clear account. There were many friendly disputes about it that evening.


Let’s track down this intuitive geometric concept that non-mathematicians possess. We are given a set {E\subset \mathbb R^n} and a point {p\in E}, and try to figure out whether {p} is a vertex, a part of an edge, or a part of a face. The answer should depend only on the shape of the set near {p}.

It is natural to say that a vector {v} is tangent to {E} at {p} if going along {v} we stay close to the set. Formally, the condition is {\lim_{t\to 0+} t^{-1}\,\mathrm{dist}\,(p+tv,E)=0}. Notice that the limit is one-sided: if {v} is tangent, {-v} may or may not be tangent.

The set of all tangent vectors to {E} at {p} is denoted by {T_pE} and is called the tangent cone. It is indeed a cone in the sense of being invariant under scaling. This set contains the zero vector, but need not be a linear space. Let’s say that the rank of point {p} is {k} if {T_pE} contains a linear space of dimension {k} but no linear space of dimension {k+1}.

Finally, define a rank {k} stratum of {E} as a connected component of the set of all points of rank {k}.

If {E} is the surface of a polyhedron, we get the familiar concepts of vertices (rank 0 strata), edges (rank 1) and faces (rank 2). For each of the homework solids the answer agrees with the opinion of non-mathematical crowd. Take the cone as an example:

Cone

At the vertex the tangent cone to the cone is… a cone. It contains no nontrivial linear space, hence the rank is 0. This is indeed a vertex.

Along the edge of the base the tangent cone is the union of two halfplanes:

Tangent cone at an edge point

Here the rank is 1: the tangent cone contains a line, but no planes.

Finally, at every point of smoothness the tangent cone is the tangent plane, so the rank is 2. The set of such points has two connected components, separated by the circular edge.

So much for the cone. As for the circle mentioned in the title, I regrettably find myself in agreement with Travis.


More seriously: the surface of a convex body is a classical example of an Alexandrov space (metric space of curvature bounded below in the triangle comparison sense). Perelman proved that any Alexandrov space can be stratified into topological manifolds. Lacking an ambient vector space, one obtains tangent cones by taking the Gromov-Hausdorff limit of blown-up neighborhoods of {p}. The tangent cone has no linear structure either — it is also a metric space — but it may be isometric to the product of {\mathbb R^k} with another metric space. The maximal {k} for which the tangent cone splits off {\mathbb R^k} becomes the rank of {p}.

Recently, Colding and Naber showed that the above approach breaks down for spaces which have only Ricci curvature bounds instead of triangle-comparison curvature. More precisely, their examples are metric spaces that arise as a noncollapsed limit of manifolds with a uniform lower Ricci bound. In this setting tangent cones are no longer uniquely determined by {p}, and they show that different cones at the same point may have different ranks.

Four-point CAT(0) condition

The definition of a metric space guarantees that if we pick three points {A,B,C} from it, we can draw a triangle in the Euclidean plane, with vertices labeled {A,B,C} so that the side lengths are equal to the corresponding distances {|AB|, |BC|, |CA|} in the metric space. In other words, we can represent any triple of points as vertices of a triangle.

Triangle representation

The above paragraph is from Tight spans and Gromov hyperbolicity, but here the similarity ends. Moving on to four points {A,B,C,D} we find that it is possible to draw a quadrilateral with side lengths equal to the corresponding distances
{|AB|, |BC|, |CD|, |DA|}.

Quadrilateral with prescribed side lengths

Except for the degenerate case (one length equal to the sum of three others), the quadrilateral is not rigid: one can change the angles while keeping the sidelengths the same.

Since the quadrilateral has one degree of freedom, we cannot hope to have the correct lengths of both diagonals {AC} and {BD}. One of the two diagonals can be made to match the metric space distance, just by drawing two triangles and gluing them together. But this is not so interesting.

There is an interesting class of metric spaces for which one can always draw a Euclidean quadrilateral as above so that neither diagonal is too short (they are allowed to be too long). These are called CAT(0) spaces, and they generalize the concept of a Riemannian manifold of nonpositive sectional curvature (more on this later). Every Euclidean space is CAT(0) for obvious reasons.

For example, if a CAT(0) space contains four points with pairwise distances {|AB|=|CD|=2}, {|BC|=|DA|=1}, then {|AC|^2+|BD|^2\le 10}. Indeed, any Euclidean quadrilateral with these sidelengths is a parallelogram, in which the sum of squares of the diagonals is {2\cdot (2^2+1^2)=10}. (It never pays to draw {ABCD} with self-intersections, because that only makes the “diagonals” shorter).

For a non-parallelogram example, take {|AB|=2}, {|BC|=3}, {|CD|=4}, and {|DA|=5}. The diagonal lengths {|AC|=4} and {|BD|=6} are allowed by the axioms of a metric space, but not by the CAT(0) condition.

Non-CAT(0) quadrilateral

This can be checked using a bunch of cosine formulas:

\displaystyle    \cos\angle ADB = \frac{5^2+6^2-2^2}{2\cdot 5\cdot 6} =\frac{19}{20}, \\ {}\\    \cos\angle CDB = \frac{4^2+6^2-3^2}{2\cdot 4\cdot 6} =\frac{43}{48}, \\ {} \\   \cos\angle CDA = \frac{|AD|/2}{|CD|} =\frac{5}{8},

where the last computation is simpler because {\triangle CDA} is isosceles. Since {\angle CDA=\angle CDB+\angle ADB}, the addition law of cosines yields a contradiction.
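
Numerically, the three angles refuse to fit together; a sketch:

```python
from math import acos, degrees

print(degrees(acos(19 / 20) + acos(43 / 48)))  # angle ADB + angle CDB: about 44.6
print(degrees(acos(5 / 8)))                    # angle CDA: about 51.3, contradiction
```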

Let’s see what CAT(0) has to do with curvature. First, a closed curve with intrinsic metric is not a CAT(0) space. Indeed, one can find four points {A,B,C,D} on the curve that are equally spaced at distances {L/4}, where {L} is the length of the curve. Then the corresponding Euclidean quadrilateral must be a rhombus of sidelength {L/4}. One of its diagonals will be at most {L\sqrt{2}/4}, which is less than {L/2} required by the intrinsic metric of the curve.

As a corollary, a CAT(0) space does not have closed geodesics. One can object that compact negatively curved manifolds (e.g., the double torus) have plenty of closed geodesics. The answer is that to actually characterize nonpositively curved spaces, one must impose the CAT(0) condition only locally.

The standard definitions of a CAT(0) space require any two points to be connected by a geodesic, and are stated in terms of geodesics. Here is one of them: for any geodesic {\gamma\colon [a,b]\rightarrow X} and for any fixed point {A\in X} the function

\displaystyle  F(t)=d\,^2(\gamma(t),A)-t^2 \ \ \ \ \ (1)

is convex. (Here {d\,^2} is the square of the distance.)

Distance function on a geodesic

The condition (1) may look contrived, until one realizes that in a Euclidean space {F} is always an affine function. Thus, the definition says that the distance function in a CAT(0) space is at least as convex as in a Euclidean space. By the way, (1) indeed implies that {t\mapsto d(\gamma(t),A)} is a convex function. I leave this as an exercise.
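
To see that {F} really is affine along a Euclidean line, one can check that its second differences vanish; a sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
p, A = rng.random(3), rng.random(3)
v = rng.random(3)
v /= np.linalg.norm(v)  # unit-speed line gamma(t) = p + t v

F = lambda t: np.linalg.norm(p + t * v - A) ** 2 - t ** 2
vals = np.array([F(t) for t in np.linspace(-2, 2, 9)])
print(np.diff(vals, 2))  # second differences vanish (up to rounding): F is affine
```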