Retraction by contraction

The Euclidean space {\mathbb R^n} has a nice property: every closed convex subset {C\subset \mathbb R^n} is the image of the whole space under a map {f} that is simultaneously:

  • a contraction, meaning {\|f(a)-f(b)\|\le \|a-b\|} for all {a,b\in\mathbb R^n};
  • a retraction, meaning {f(a)=a} for all {a\in C}.

Indeed, we can simply define {f(x)} to be the (unique) nearest point of {C} to {x}; it takes a bit of work to verify that {f} is a contraction, but not much.


In other normed spaces, this nearest point projection does not work that well. For example, take {X=\ell_1^2}, the two-dimensional space with the Manhattan metric. Consider the line {C=\{(2t,t)\colon t\in\mathbb R\}} which is a closed and convex set. The nearest point of {C} to {(2,0)} is {(2,1)}: moving straight up, since changing the first coordinate doesn’t pay off. Since {(0,0)} remains fixed, the nearest point projection increases some pairwise distances, in this case from {2} to {3}.

However, there is a contractive retraction onto this line, given by the formuls {T(x_1,x_2) =    \left(\frac23(x_1+x_2), \frac13(x_1+x_2)\right)}. Indeed, this is a linear map that fixes the line {x_1=2x_2} pointwise and has norm {1} because

\displaystyle \|T(x)\|_1 = |x_1+x_2|\le \|x\|_1


More generally, in every normed plane, every closed convex subset admits a contractive retraction. To prove this, it suffices to consider closed halfplanes, since a closed convex set is an intersection of such, and contractions form a semigroup. Furthermore, it suffices to consider lines, because having a contractive retraction onto a line, we can redefine it to be the identity map on one side of the line, and get a contractive retraction onto a halfplane.

Such a retraction onto a line, which is a linear map, is illustrated below.

Every line in a normed plane is 1-complemented
Every line in a normed plane is 1-complemented

Given the unit circle (black) and a line (blue), draw supporting lines (red) to the unit circle at the points where it meets the line. Project onto the blue line along the red ones. By construction, the image of the unit disk under this projection is contained in the unit disk. This precisely means that the map has operator norm {1}.


In spaces of dimensions {3} or higher, there are closed convex subsets without a contractive retraction. For example, consider the plane in {\ell_\infty^3} passing through the points {A = (2,2,0)}, {B= (2,0,2)}, and {C= (0,2,2)}. This plane has equation {x_1+x_2+x_3=4}. The point {D=(1,1,1)} is at distance {1} from each of A,B,C, and it does not lie on the plane. For any point E other than D, at least one of the distances AE, BE, CE exceeds 1. More precisely, the best place to project D is {(4/3, 4/3, 4/3)} which is at distance {4/3} from A, B, and C.


Two natural questions: (a) is there a nice characterization of normed spaces that admit a contractive retraction onto every closed convex subset? (b) what is the smallest constant {L=L(n)} such that every {n}-dimensional normed space admits an {L}-Lipschitz retraction onto every closed convex subset?

(The answers may well be known, but not to me at present.)

Binary intersection property, and not fixing what isn’t broken

A metric space has the binary intersection property if every collection of closed balls has nonempty intersection unless there is a trivial obstruction: the distance between centers of two balls exceeds the sum of their radii. In other words, for every family of points {x_\alpha\in X} and numbers {r_\alpha>0} such that {d(x_\alpha,x_\beta)\le r_\alpha+r_\beta} for all {\alpha,\beta} there exists {x\in X} such that {d(x_\alpha,x)\le r_\alpha} for all {\alpha}.

For example, {\mathbb R} has this property: {x=\inf_\alpha (x_\alpha+r_\alpha)} works. But {\mathbb R^2} does not:

Failure of the binary intersection property
Failure of the binary intersection property

The space of bounded sequences {\ell^\infty} has the binary intersection property, and so does the space {B[0,1]} of all bounded functions {f:[0,1]\rightarrow\mathbb R} with the supremum norm. Indeed, the construction for {\mathbb R} generalizes: given a family of bounded functions {f_\alpha} and numbers {r_\alpha>0} as in the definition, let {f(x)=\inf_\alpha (f_\alpha(x)+r_\alpha)}.


The better known space of continuous functions {C[0,1]} has the finite version of binary intersection property, because for a finite family, the construction {\inf_\alpha (f_\alpha(x)+r_\alpha)} produces a continuous function. However, the property fails without finiteness, as the following example shows.

Example. Let {f_n\in C[0,1]} be a function such that {f_n(x)=-1} for {x\le \frac12-\frac1n}, {f_n(x)=1} for {x\ge \frac12+\frac1n}, and {f_n} is linear in between.

Since {\|f_n-f_m\| \le 1} for all {n,m}, we can choose {r_n=1/2} for all {n}. But if a function {f} is such that {\|f-f_n\|\le \frac12} for all {n}, then {f(x) \le -\frac12} for {x<\frac12} and {f(x) \ge \frac12} for {x>\frac12}. There is no continuous function that does that.

More precisely, for every {f\in C[0,1]} we have {\liminf_{n\to\infty} \|f-f_n\|\ge 1 } because {f(x)\approx f(1/2)} in a small neighborhood of {1/2}, while {f_n} change from {1} to {-1} in the same neighborhood when {n} is large.


Given a discontinuous function, one can approximate it with a continuous function in some way: typically, using a mollifier. But such approximations tend to change the function even if it was continuous to begin with. Let’s try to not fix what isn’t broken: look for a retraction of {B[0,1]} onto {C[0,1]}, that is a map {\Phi:B[0,1]\rightarrow C[0,1]} such that {\Phi(f)=f} for all {f\in C[0,1]}.

The failure of binary intersection property, as demonstrated by the sequence {(f_n)} above, implies that {\Phi} cannot be a contraction. Indeed, let {f(x)= \frac12 \,\mathrm{sign}\,(x-1/2)}. This is a discontinuous function such that {\|f-f_n\|\le 1/2} for all {n}. Since {\liminf_{n\to\infty} \|\Phi(f)-f_n\|\ge 1}, it follows that {\Phi} cannot be {L}-Lipschitz with a constant {L<2}.


It was known for a while that there is a retraction from {B[0,1]} onto {C[0,1]} with the Lipschitz constant at most {20}: see Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss. In a paper published in 2007, Nigel Kalton proved the existence of a {2}-Lipschitz retraction.

Nested powers and collisions: final chapter

This is where the story of nested powers of metric spaces and of collision-inducing ODE reaches its logical conclusion.

Theorem 1. For every {d\ge 1} and every {n\ge 2} there exists a Lipschitz retraction {r\colon {\mathbb R}^d(n)\rightarrow {\mathbb R}^d(n-1)}.

Compared to previous posts, I changed notation to improve readability: {X(n)} now stands for the set of all nonempty subsets of {X} with at most {n} elements. The metric on {X(n)} is the Hausdorff metric {d_H}. The map {r} was defined in Collisions II as follows:

Given a set {A\in {\mathbb R}^d(n)\setminus {\mathbb R}^d(n-1)}, enumerate its elements {a_1,\dots,a_n} and consider the initial-value problem

\displaystyle    \dot{x}_i = \sum_{j\ne i}\frac{x_j-x_i}{|x_j-x_i|}, \quad x_i(0)=a_i, \quad i=1,\dots,n

Define {r(A) = \{x_i(t_c)\}} where {t_c} be the time of the first collision between particles in this system. Also define {r(A)=A} when {A\in{\mathbb R}^d(n-1)}.

I claim that

\displaystyle    d_H(r(A),r(B))\le (2n-1)\sqrt{n} \,d_H(A,B)   \ \ \ \ \ (1)


for all {A,B\in{\mathbb R}^d(n)}. Interestingly, the Lipschitz constant does not depend on the dimension of ambient space. (Well, I’m not so sure it has to depend on cardinality either.)

Proof. Let

\displaystyle \delta(A)=\begin{cases} \min_{i\ne j}|a_i-a_j|,\quad  A\in{\mathbb R}^d(n)\setminus {\mathbb R}^d(n-1) \\ 0,\quad \quad A\in \mathbb R^d(n-1) \end{cases}

It is easy to check that the function {A\mapsto \delta(A)} is {2}-Lipschitz on {{\mathbb R}^d(n)}.

Earlier I proved that { t_c\le \dfrac{1}{2}\delta(A)} and, as a consequence,

\displaystyle    d_H(r(A),A)\le \frac{n-1}{2} \delta(A)   \ \ \ \ \ (2)

Let {\rho=d_H(A,B)}. If {\delta(A)+\delta(B)\le 4\,\rho}, then (2) already gives (1). From now on assume {\delta(A)> 2\,\rho}, hence {\delta(B)>0}. The geometric meaning of these inequalities is that the points of {A} are spread out in space, yet each of them is close to a point of {B}. Therefore, we can enumerate them {a_i} and {b_i} in such a way that {|a_i-b_i|\le \rho} for all {i}.

Let {(x_i)} and {(y_i)} be the solutions of our ODE with initial data {(a_i)} and {(b_i)}, respectively. Here comes a monotonicity lemma.

Lemma 2. {\sum_{i=1}^n |x_i-y_i|^2} is nonincreasing with time {t}.

Proof: Encode the sequence {(x_i)} by a single point {\mathbf{x}\in {\mathbb R}^{nd}}. This point is moved by the gradient flow of the convex function {\Phi(\mathbf{x}) = \sum_{i<j}|x_i-x_j|}. Since the gradient of a convex function is monotone, we have {\langle {\mathbf{ \dot x}}-{\mathbf{\dot y}}, \mathbf{x}-\mathbf{y}\rangle\le 0}. It follows that {|\mathbf{x}-\mathbf{y}|^2} is nonincreasing. \Box

By Lemma 2, {|x_i-y_i|\le \sqrt{n}\,\rho } holds for all {i} until a collision occurs in either system. We may assume that {x}-particles collide first. Then, at the time of their collision they form the set {r(A)} while the {y}-particles form some set {B'}, with {d_H(r(A),B')\le \sqrt{n}\,\rho}. Since r(A) has fewer than n elements, {\delta(B')\le 2d_H(r(A),B')}. From (2) we obtain {d_H(r(B'),B')\le 2(n-1)\sqrt{n}\,\rho}. Note that {r(B')=r(B)} because our ODE system is autonomous. The final estimate is {d_H(r(A),r(B))\le d_H(r(A),B')+d_H(r(B'),B')}, which is (1). {\Box}

Here is how a 20-point set gets retracted onto {{\mathbb R}^2(19)}:

20 particles, stopped
20 particles, stopped

The plot was produced by a version of the familiar Scilab routine, modified to stop the computation at first collision. It was called with

gravity3(grand(1,20,"unf",0,2),grand(1,20,"def"),300)
function gravity3(vx,vy,Nsteps)
    Npoints = length(vx);
    h = 0.01/Npoints;
    tolerance = h*(Npoints-1)/3;
    collided = 0;
    x = zeros(Nsteps,Npoints);
    y = zeros(Nsteps,Npoints);
    x(1,:) = vx;
    y(1,:) = vy;
    m = 1;
    while (m<Nsteps & collided==0) do
        x(m+1,:)= x(m,:);
        y(m+1,:)= y(m,:);
        for i = 1:Npoints
            for j = (i+1):Npoints
                v = [x(m,j),y(m,j)]-[x(m,i),y(m,i)];
                if norm(v) < tolerance then 
                    collided=1; 
                else
                    x(m+1,i) = x(m+1,i) + h*v(1)/norm(v);
                    x(m+1,j) = x(m+1,j) - h*v(1)/norm(v);
                    y(m+1,i) = y(m+1,i) + h*v(2)/norm(v);
                    y(m+1,j) = y(m+1,j) - h*v(2)/norm(v);
                end
            end
        end
    m = m+1;    
    end
    clf();
    plot(x(:,:),y(:,:),'o');
endfunction

Nested powers and their retracts

It is common to consider vector spaces {{\mathbb R}}, {{\mathbb R}^2}, {{\mathbb R}^3}, … as a nested sequence with inclusion maps

\displaystyle \iota \colon {\mathbb R}^n\rightarrow{\mathbb R}^{n+1}, \quad (x_1, \dots, x_n) \mapsto (x_1,\dots,x_n,0)

To define these inclusion maps we did not need any structure of {{\mathbb R}} other than the existence of a distinguished element {0\in{\mathbb R}}. The same could be done for any pointed set, i.e., a set {X} with a distinguished element {x_0\in X}. Each inclusion {\iota\colon X^n\rightarrow X^{n+1}} is split, because we can retract {X^{n+1}} onto {X^n} by deleting the last coordinate. This works in the category of sets, and also in every other category I can think of.

However, many naturally occurring manifolds (more generally, metric spaces) do not have any distinguished elements: spheres, hyperbolic spaces, tori, … indeed, such spaces are called homogeneous because all their points look alike (the group of isometries is transitive). The line did not have a distinguished element until we gave it the group structure of {{\mathbb R}}, and to introduce such a structure we had to make an arbitrary choice of a point to be called {0}.

Is there a way to have a nested sequence of powers of a space {X} without distinguished points? Yes, but it will not be the Cartesian product {X\times \dots \times X}. Instead, we consider the symmetric product {X^{(n)}} whose elements are nonempty subsets of {X} with cardinality at most {n}. This product (really a power, but the name “symmetric product” is more common) is naturally related to {X^n}:

  • There is a canonical surjection {q\colon X^{n}\rightarrow X^{(n)}} which sends {(x_1,\dots,x_n)} to {\{x_1,\dots,x_n\}}.
  • If {A_1,\dots,A_n} are disjoint subsets of {X}, then the restriction of {q} to {A_1\times \dots \times A_n} is injective.

The symmetric products are nested by virtue of the obvious inclusion {\iota \colon X^{(n)}\rightarrow X^{(n+1)}}: each set stays as it is, nothing needs to be added.

If {X} is a topological or metric space, we can give the same structure to {X^{(n)}} by declaring {q} to be a quotient map. In the metric case there is a more direct and explicit definition of a metric on {X^{(n)}}: the Hausdorff metric. First, introduce {\delta}-neighborhood of a set {A}: this is the set of all points at distance at most {\delta} from {A}, denoted {A(\delta)}. Given two sets {A,B\in X^{(n)}}, define

\displaystyle d_H(A,B)=\inf\{\delta\colon A\subset B(\delta) \text{ and } B\subset A(\delta) \}

To better relate this metric to {X^n}, we use the maximum metric {d_\infty} on {X^n}: the distance between {(x_j),(y_j)\in X^n} is {d_\infty= \max_j d_X(x_j,y_j)}. It is clear that the forgetful surjection {q} is {1}-Lipschitz. Unfortunately, it is not a submetry: for example, the points {\{1,2,5\}} and {\{1,4,5\}} are at unit distance in {{\mathbb R}^{(3)}} but any pair of their preimages under {q} has distance at least {2} in {{\mathbb R}^3}. Oh well.

Recalling that the inclusions of Cartesian products are split, we ask if the same holds for symmetric products: is there always a retraction {r\colon X^{(n)}\rightarrow X^{(m)}} when {n>m}? A retraction is a Lipschitz map (in topological setting, continuous) which agrees with the identity on {X^{(m)}}.

For example, is the circle {\mathbb T} a retract of {\mathbb T^{(2)}}? The first idea may be to appeal to non-existence of continuous single-valued square root, but then we recall that a retraction need not send {\{a,b\}} into either {\{a\}} or {\{b\}}. The following argument works, though: supposing that {r\colon \mathbb{T}^{(2)}\rightarrow\mathbb{T}} is a retraction, for each fixed {t\in [0,\pi]} we define the map {f_t\colon \mathbb{T}\rightarrow\mathbb{T}} so that {f_t(e^{i \theta})} is the sole element of {r(\{e^{i\theta}, e^{i\theta+it}\})}. Then {f_0} is the identity, {f_{\pi}} is an even map (hence of even degree), but they are homotopic via {\{f_t\}}, which is a contradiction.

In the positive direction, {{\mathbb R}^{(m)}} is a Lipschitz retract of {{\mathbb R}^{(n)}} whenever {m<n}. The idea is simple: given a set of cardinality {n}, label its elements as {a_1<\dots<a_n} and define {a_j'=a_j-(j-1)\delta} where {\delta=\min \{ a_{j+1}-a_j \colon j=1,\dots,n-1\}}. This can even be promoted to a deformation retraction via {a_j^t=a_j-t(j-1)\delta}, {0\le t\le 1}. Geometrically, this map deflates all gaps in the set at the same rate, until the smallest one collapses.

Reducing the gaps
Reducing the gaps

For a slightly more general version, see Lemma 4.2 here.

I got stuck on the case when {X} is {\mathbb{P}}, the Euclidean plane {{\mathbb R}^2}. It is easy to retract {\mathbb{P}^{(2)}} onto {\mathbb{P}^{(1)}}: just send each two-point set into its center of mass (midpoint). I also have a Lipschitz retraction of {\mathbb{P}^{(3)}} onto {\mathbb{P}^{(2)}}, but it is not as simple. Given a {3}-point set {\{a,b,c\}} in the plane, let {o} be the center of the circle inscribed in the triangle with vertices {abc}; if the points are collinear, let {o} be the one that lies between two others. Equivalently, {o} is the center of mass of the vertices where each vertex is given weight equal to the length of opposite side. One can check that the map {\{a,b,c\}\mapsto o} is Lipschitz. By relabeling vertices, we may assume that {|a-o|\ge |b-o|\ge |c-o|}. Let {a'} be the point on the segment {ao} at distance {|b-o|} from {a}. The map {\{a,b,c\}\mapsto \{a',o\}} is the desired retraction. Again, we can make it into a deformation retraction in an obvious way.

From a 3-point set to a 2-point set
From a 3-point set to a 2-point set

In both cases, the idea is to pretend that the original set lives on a tree, and pull all of its elements toward some vertex of the tree. The incenter of triangle enters the game once one recalls the tripod representation of a {3}-point set. However, with sets of greater cardinality the non-hyperbolicity of {\mathbb{P}} becomes apparent, making it hard to pretend that we work with a tree. I do not know if there is a Lipschitz retract from {\mathbb{P}^{(n+1)}} to {\mathbb{P}^{(n)}} for every {n}.

Update: such a retraction exists.