Bourgain’s embedding theorem

Let X be a finite metric space, say of cardinality n. For n\ge 4 we cannot expect to have an isometric embedding of X into a Euclidean space; the 4-cycle with path metric cannot be so embedded. So we ask for an embedding f\colon X\to \mathbb R^n with reasonable distortion D(f). The distortion is defined as the product of the Lipschitz norms of f and of its inverse,

\displaystyle D=\mathrm{Lip}(f)\mathrm{Lip}(f^{-1}):=\sup_{a\ne b}\frac{|f(a)-f(b)|}{d_X(a,b)} \cdot \frac{d_X(a,b)}{|f(a)-f(b)|}

So, D(f)=1 if and only if f scales all distances by the same factor (which we can then make equal to 1 by rescaling the target space).

It is easy to isometrically embed X into the normed vector space \ell_{\infty}(X) which consists of all functions X\to\mathbb R with the maximum norm. This (Kuratowski) embedding simply sends each point p\in X into the function \rho_p(x)=d(x,p). It is easy to see that \max |\rho_p-\rho_q|=d(p,q), attained when x\in \lbrace p,q\rbrace. The identity map from \ell_{\infty}(X) to \ell_{2}(X) has distortion \sqrt{n}, and this is the price D(f) we have to pay.

In 1985 Bourgain achieved a dramatic improvement: D(f) \le C\log n, which is the best possible bound apart from the value of the absolute constant C. (It can be taken to be 128.) The starting point is remarkably simple: instead of \rho_p(x)=d(x,p), which measures the distances of p to other points, Bourgain used \rho_p(S)=d(p,S) where S runs over all nonempty proper subsets of X. The fact that the target space now has 2^n-2 dimensions is of no theoretical consequence, since the image of X is always contained in an (n-1) dimensional hyperplane. (From the practical viewpoint, even dimension n is much too high, but the Johnson-Lindenstrauss flattening lemma reduces it greatly.) We can hope to improve upon the Kuratowski embedding because the geometry of X is now measured by more sophisticated probes than singletons.

As before, the map p\mapsto \rho_p is easily seen to be an isometric embedding into \ell_{\infty}(P^X) where P^X is the set of all nonempty proper subsets of X. But using the identity map onto \ell_{2}(P^X) is out of question: its Lipschitz constant is almost 2^{n/2}. Instead, Bourgain applies the diagonal operator T\colon \ell_{\infty}(P^X)\to \ell_{2}(P^X) defined by

\displaystyle T(e_S) := \frac{1}{\sqrt{|S|}} \binom{n}{|S|}^{-1/2}e_S

To understand this better, write \ell_{\infty}(P^X) as the direct sum of spaces \ell_{\infty}(P_k^X) where P_k^X is the family of all k-point subsets of X. The dimension of \ell_{\infty}(P_k^X) is exactly \binom{n}{k}, and the square root appears because we are after the \ell_2 norm.

But it would be better to say that \sqrt{\cdot} appears because \ell_2 is halfway between \ell_\infty and \ell_1. Indeed, Bourgain applies T twice: first as a map \ell_{\infty}(P^X)\to \ell_{2}(P^X) and then as \ell_{2}(P^X)\to \ell_{1}(P^X). The point is that \|T^2\|_{\ell_\infty\to\ell_1} is easy to estimate: decomposing P_X by cardinality as above, we find that the \ell_\infty\to\ell_1 norm of T^2 on k-subsets is at most 1/k, due to the strategically placed factor 1/{\sqrt{|S|}}. Unfortunately, the harmonic series diverges, but fortunately it diverges slowly, yielding the estimate \|T^2\|_{\ell_\infty\to\ell_1}\le \sum_{k=1}^{n-1} \frac{1}{k}\le \log n. This is where the logarithm comes from.

The hard part of the proof is the lower bound \|T^2(\rho_p)-T^2(\rho_q)\|_{\ell_1} \ge c\, d(p,q) with universal c. I will not say anything about that.

Let’s test this on the 4-cycle with the path metric. Split \ell_{\infty}(P^X) by subset cardinality k=1,2,3, as before. Let a,b,c,d denote the vertices. The image of the vertex a is:

  • in \ell_{\infty}(P_1^X): (0,1,2,1): this is the Kuratowski embedding. The singletons are ordered a,b,c,d.
  • in \ell_{\infty}(P_2^X): (0,0,0,1,1,1) where the sets are ordered ab,ac,ad,bc,bd,cd.
  • in \ell_{\infty}(P_3^X): (0,0,0,1). Sets are ordered abc,abd,acd,bcd.

Now we apply T, which means multiplication by \frac{1}{\sqrt{k}} \binom{n}{k}^{-1/2}. The resulting vector

\displaystyle \left(0,\frac{1}{2},1,\frac{1}{2},0,0,0,\frac{1}{\sqrt{12}},\frac{1}{\sqrt{12}},\frac{1}{\sqrt{12}},0,0,0,\frac{1}{\sqrt{6}}\right)

represents the vertex a. In much the same way, the vertices b and c are mapped into

\displaystyle \left(\frac{1}{2},0,\frac{1}{2},1,0,\frac{1}{\sqrt{12}},\frac{1}{\sqrt{12}},0,0,\frac{1}{\sqrt{12}},0,0,\frac{1}{\sqrt{6}},0\right)


\displaystyle \left(1,\frac{1}{2},0,\frac{1}{2},\frac{1}{\sqrt{12}},0,\frac{1}{\sqrt{12}},0,\frac{1}{\sqrt{12}},0,0,\frac{1}{\sqrt{6}},0,0\right)

respectively. And so, |f(a)-f(b)|=\sqrt{15}/3\approx 1.29 while |f(a)-f(c)|=2\sqrt{6}/3\approx 1.633. This is compared to d_X(a,b)=1 and d_X(a,c)=2. The distortion D(f)\approx 1.58 is only slightly worse than for the optimal embedding of X (into the vertices of a square), which has D(f)=\sqrt{2}\approx 1.41. Well, we could hardly expect a general algorithm like this to find the absolutely best way to embed each finite metric space.

Apart from the constant, Bourgain’s theorem cannot be improved for general metric spaces. However, there are better embedding results for special classes of metric spaces (doubling, ultrametric etc.)

Hausdorff extension of Lipschitz functions

The extension appeared in the previous post and is very simple: given a closed set A\subset X and a bounded function f\colon A\to\mathbb R, we define \displaystyle F(x)=\inf_{a\in A} \left(f(a)+\frac{d(x,a)}{d(x,A)}-1\right) for all x\in X\setminus A. It is easy to see that F(x)\to f(c) as x\to c\in A. Indeed, the boundedness of f implies that the infimum is actually taken only over the set A_{x,M}=\lbrace a\in A\colon d(x,a)\le M\,d(x,A)\rbrace where M=1+\sup f-\inf f. As the distance \rho=d(x,c) tends to 0, the set A_{x,M} shrinks at the same rate: indeed, A_{x,M} is contained in the ball of radius M\rho with center c. Thus, F furnishes a continuous extension of f whenever f is continuous on A.

Now suppose that f is Lipschitz on A. Will the extension be Lipschitz? Let’s consider a one-dimensional example with X=\mathbb R, A=[0,5], and f(x)=x on A. When x<5, the extension is

\displaystyle F(x)=\inf_{a\in [0,5]} \left(a+\frac{x-a}{x-5}-1\right)=\frac{5}{x-5}+\inf_{a\in [0,5]} a\,\left(1-\frac{1}{x-5}\right)

When 5<x<6, the infimum is at a=5, hence F(x)=1. When x> 6, it shifts to a=0, which yields F(x)=\frac{5}{x-5}. Here is the plot:

Large Lipschitz constant

While the original function f had Lipschitz constant 1, the extension has Lipschitz constant 5 (attained to the right of the transition point x=6). We can make this progressively worse by replacing 5 with larger numbers. So, the Lipschitz constant of the extension does not admit a bound in terms of \mathrm{Lip}\,(f) alone. The oscillation M=\sup f-\inf f enters the game as well… although it’s not entirely clear that it must. In the example above, there is no apparent reason for F to decrease so quickly (or at all) when x>6.

Tietze and Hausdorff extension formulas

Given a continuous bounded real-valued function f defined on a closed subset A of a metric space X, we would like to extend it to a continuous function F on the entire space X. The first thought is that F(x) should be equal to the value of f at the nearest point of A. But of course such point may fail to exist, or to be unique, or to depend continuously on x. But the idea can be rescued.

Let d(x,x') denote the distance between two points on X. The distance from point x to set A is d(x,A)=\inf_{a\in A} d(x,a). This distance is strictly positive when x\notin A, although the infimum is not necessarily attained.

Tietze (1915, paper by subscription) proved the existence of a continuous extension by setting \displaystyle F(x)=\sup_{a\in A}\left(\frac{f(a)}{1+d(x,a)^2}\right)^{1/d(x,A)} (under the assumption \inf_A f>0, which can be achieved without loss of generality). The idea is that even though the supremum allows a to be any point of A, the “far-away” points have d(x,a)\gg d(x,A) which makes denominator large and thus prevents such points from affecting the supremum. In contrast, when a is the nearest or almost-nearest point, we have d(x,a)\approx d(x,A) and as the distance tends to zero, the denominator approaches 1.

A few years later Hausdorff (1919, paper by subscription) offered a simpler and more natural formula: \displaystyle F(x)=\inf_{a\in A} \left(f(a)+\frac{d(x,a)}{d(x,A)}-1\right). The penalty is now assessed via addition rather than division, which eliminates the need to make f positive prior to extension.

The simplest nontrivial example is extension from a two-point set. I took X=\mathbb R^2; the set A consists of the points (0,0), (1,0), at which f is defined to be 0 and 1, respectively. Here is the extension F computed according to Hausdorff (not to scale):

Hausdorff formula

And this is its slice long the x-axis, true scale:


As you can see, the extension isn’t the best one could have in terms of continuity, but its attractive feature is that one does not need to compute anything like the modulus of continuity of f (or its Lipschitz constant, as with the McShane-Whitney extension operator).

Strength in unity

Elementary point-set topology today, on a metric space X.

  • X is compact if every open cover of X has a finite subcover. One could just as well say “every sequence has a convergent subsequence”, but it’s considered bad form.
  • X is totally bounded if for every r>0 the space can be covered by finitely many balls of radius r. One could just as well say “every sequence has a Cauchy subsequence”, but it’s probably bad form, too.
  • X is complete if every Cauchy sequence converges. I don’t know if anyone considers this sequential definition bad form.

A look at the sequential forms of definitions reveals that a space is compact iff it is complete and totally bounded.

Which of the above properties survive under continuous maps?

  • If X is compact, then every continuous image of X is also compact.
  • If X is totally bounded, its continuous image may fail to be totally bounded, or even bounded. For example, the interval (0,1] is continuously mapped onto [1,\infty) by x\mapsto x^{-1}.
  • if X is complete, its continuous image may fail to be complete. For example, the interval [1,\infty) is continuously mapped onto (0,1] by x\mapsto x^{-1}.

There must be other natural examples of how two properties P_1 and P_2 are not invariant (under some class of transformations) on their own, but are invariant together. None come to mind at this moment.

Injective:Surjective :: Isometric:?

I used this sort of title before, in “Continuous:Lipschitz :: Open:?“, but the topics are related anyway.

In some sense (formal or informal) the following classes of maps are dual to each other.

  • Injective : Surjective
  • Immersion : Submersion
  • Monomorphism : Epimorphism

An isometric embedding of metric space X into a metric space Y is a map f\colon X\to Y such that d_Y(f(a),f(b))=d_X(a,b) for all a,b\in X. This concept belongs to the left column of the table above. What should be its counterpart?

Candidate 1. Observe that a 1-Lipschitz map f\colon X\to Y is an isometric embedding iff it does not factor through any 1-Lipschitz surjection g\colon X\to Z (for any space Z) unless g is an isometric isomorphism. Reversing the order of arrows, we arrive at the following concept:

A 1-Lipschitz map f\colon Y\to X is a metric quotient map if it does not factor through any 1-Lipschitz injection g\colon Z\to X (for any space Z) unless g is an isometric isomorphism.

This can be reformulated as follows: \rho(a,b):=d_{X}(f(a),f(b)) is the greatest pseudo-metric on Y subject to

  1. \rho(a,b)\le d_{Y}(a,b)
  2. \rho(a,b)=0 if f(a)=f(b)

This seems reasonable: f does as little damage as possible, given the structure of its fibers. There is also a natural way to construct \rho for any reasonable fibering of Y: begin by defining \widetilde{d_Y}=d_Y for points in different fibers and 0 otherwise. Then force the triangle inequality by letting \rho(a,b)=\inf \sum_{j=1}^n \widetilde{d_Y}(y_j,y_{j-1}) subject to y_0=a and y_n=b. As long as the fibers stay at positive distance from one another, this \rho will be a metric. The corresponding metric quotient map sends each point of Y onto its fiber.

Here is a simple example, in which a part of an interval is mapped to a point.

These aren't the quotients I'm looking for

However, the above example made me unhappy. The only nontrivial fiber is the interval [1,2]. Both points 0 and 3 belong to trivial fibers, but the distance between them decreases from 3 to 2. This looks like a wrong kind of quotient to me.

Candidate 2 already appeared in my post on Lipschitz quotient, but wasn’t recognized at the time. It could be called (1,1)-Lipschitz quotient, but a better name is available. A map f\colon Y\to X is a submetry if f(B_Y(y,r))=B_X(f(y),r) where the balls are closed (using open balls yields something almost equivalent, but generally weaker). Such f need not be an isometry: consider orthogonal projections in a Euclidean space. It does have to be 1-Lipschitz. The additional property that distinguishes it from general 1-Lipschitz maps is the 2-point lifting property: for every x_0,x_1\in X and every y_0\in f^{-1}(x_0) there is y_1\in f^{-1}(x_1) such that d_{Y}(y_0,y_1)=d_X(x_0,x_1). Incidentally, this shows that x\mapsto f^{-1}(x) is an isometric embedding of X into the hyperspace \mathcal{H}(Y) which I covered earlier (“This ain’t like dusting crops, boy“).

The concept and the name were introduced by В.Н. Берестовский (V.N. Berestovskii) in his paper “Submetries of space-forms of nonnegative curvature” published in 1987 in Siberian Math. J. Among other things, he proved that a submetry between spheres (of any dimension) is an isometry. Of course, there are many submetries of S^{n-1} onto other spaces: take the quotient by a subgroup of O(n), which can be either discrete or continuous. Are there any submetries that are not quotients by isometries?

Yes, there are. I’ll describe a (modified) example given by Berestovskii and Guijarro (2000). Let \mathbb{H} be the hyperbolic plane realized as the upper half-plane with the metric \frac{dx^2+dy^2}{y^2}. Define f\colon \mathbb{H}\to\mathbb{R} by

\displaystyle f(x,y)=\begin{cases} \log y, \qquad x\le 0 \\ \log\frac{x^2+y^2}{y}, \qquad x\ge 0 \end{cases}

Don’t panic; this is just the signed distance function (in the metric of \mathbb H) to the fat green curve below. I also drew two other level sets of f, to the best of my Friday night line-drawing ability.

Weird submetry

To convince yourself that f is a submetry, first consider y\mapsto \log y for which the submetry property is clear (it’s the quotient by horizontal translation), and then note that the inversion in the unit circle exchanges horizontal lines (horocycles at infinity) with horocycles at 0. An interesting feature of this submetry that it is not very smooth: C^{1,1} but not C^2.

More fractals: Laakso spaces and slit carpets

In early 2000s Tomi Laakso published two papers which demonstrated that metric spaces could behave in very Euclidean ways without looking Euclidean at all. One of his examples became known as the Laakso graph space, since it is the limit of a sequence of graphs. In fact, the best known version of the construction was introduced by Lang and Plaut, who modified Laakso’s original example to make it more symmetric. The building block is this graph with 6 edges:

Building block

Each edge is assigned length 1/4, so that the distance between the leftmost and rightmost points is 1. Next, replace each edge with a copy of the building block. This increases the number of edges by the factor of 6; their length goes down by the factor of 4 (so that the leftmost-rightmost distance is still 1). Repeat.

Laakso graph (click to magnify)

The resulting metric space is doubling: every ball can be covered by a fixed number (namely, 6) balls of half the size. This is the typical behavior of subsets of Euclidean spaces. Yet, the Laakso space does not admit a bi-Lipschitz embedding into any Euclidean space (in fact, even into any uniformly convex Banach space). It remains the simplest known example of a doubling space without such an embedding. Looking back at the building block, one recognizes the cycle C_4 as the source of non-embeddability (a single cycle forces a certain amount of distortion; adding cycles withing cycles ad infinitum forces infinite distortion). The extra edges on the left and right are necessary for the doubling condition.

In some sense, the Laakso space is the antipode of the Cantor set: instead of deleting the middle part repeatedly, we duplicate it. But it’s also possible to construct it with a ‘removal’ process very similar to Cantor’s. Begin with the square and slit it horizontally in the center; let the length of the slit be 1/3 of the sidelength. Then repeat as with the Cantor set, except in addition to cutting left and right of the previous cut, we also do it up and down. Like this:

Slit pseudo-carpet: beginning

Our metric space if the square minus the slits, equipped with the path metric: the distance between two points is the infimum of the length of curves connecting them within the space. Thus, the slits seriously affect the metric. This is how the set will look after a few more iterations:

Slit pseudo-carpet (click to magnify)

I called this a slit pseudo-carpet because it has nonempty interior, unlike true Sierpinski carpets. To better see the similarity with the Laakso space, multiply the vertical coordinate by \epsilon and let \epsilon\to 0 (equivalently, redefine the length of curves as \int |x'| instead of \int \sqrt{|x'|^2+|y'|^2}). This collapses all vertical segments remaining in the set, leaving us with a version of the Laakso graph space.

Finally, some Scilab code. The Laakso graph was plotted using the Chaos game, calling the function below with parameters laakso(0.7,50000).

function laakso(angle,steps)
xoffset = [0,1-s,s,s,1/2,1/2];
yoffset = [0,0,0,0,s*sin(angle),-s*sin(angle)];
rotation =[0,0,angle,-angle,-angle,angle];
point = zeros(steps,2);
vert = grand(1,steps,'uin',1,6);
for j = 2:steps
point(j,:) = point(j-1,:)*[sc(vert(j)),ss(vert(j));-ss(vert(j)),sc(vert(j))] + [xoffset(vert(j)), yoffset(vert(j))];