Retraction by contraction

The Euclidean space {\mathbb R^n} has a nice property: every closed convex subset {C\subset \mathbb R^n} is the image of the whole space under a map {f} that is simultaneously:

  • a contraction, meaning {\|f(a)-f(b)\|\le \|a-b\|} for all {a,b\in\mathbb R^n};
  • a retraction, meaning {f(a)=a} for all {a\in C}.

Indeed, we can simply define {f(x)} to be the (unique) nearest point of {C} to {x}; it takes a bit of work to verify that {f} is a contraction, but not much.


In other normed spaces, this nearest point projection does not work that well. For example, take {X=\ell_1^2}, the two-dimensional space with the Manhattan metric. Consider the line {C=\{(2t,t)\colon t\in\mathbb R\}} which is a closed and convex set. The nearest point of {C} to {(2,0)} is {(2,1)}: moving straight up, since changing the first coordinate doesn’t pay off. Since {(0,0)} remains fixed, the nearest point projection increases some pairwise distances, in this case from {2} to {3}.

However, there is a contractive retraction onto this line, given by the formuls {T(x_1,x_2) =    \left(\frac23(x_1+x_2), \frac13(x_1+x_2)\right)}. Indeed, this is a linear map that fixes the line {x_1=2x_2} pointwise and has norm {1} because

\displaystyle \|T(x)\|_1 = |x_1+x_2|\le \|x\|_1


More generally, in every normed plane, every closed convex subset admits a contractive retraction. To prove this, it suffices to consider closed halfplanes, since a closed convex set is an intersection of such, and contractions form a semigroup. Furthermore, it suffices to consider lines, because having a contractive retraction onto a line, we can redefine it to be the identity map on one side of the line, and get a contractive retraction onto a halfplane.

Such a retraction onto a line, which is a linear map, is illustrated below.

Every line in a normed plane is 1-complemented
Every line in a normed plane is 1-complemented

Given the unit circle (black) and a line (blue), draw supporting lines (red) to the unit circle at the points where it meets the line. Project onto the blue line along the red ones. By construction, the image of the unit disk under this projection is contained in the unit disk. This precisely means that the map has operator norm {1}.


In spaces of dimensions {3} or higher, there are closed convex subsets without a contractive retraction. For example, consider the plane in {\ell_\infty^3} passing through the points {A = (2,2,0)}, {B= (2,0,2)}, and {C= (0,2,2)}. This plane has equation {x_1+x_2+x_3=4}. The point {D=(1,1,1)} is at distance {1} from each of A,B,C, and it does not lie on the plane. For any point E other than D, at least one of the distances AE, BE, CE exceeds 1. More precisely, the best place to project D is {(4/3, 4/3, 4/3)} which is at distance {4/3} from A, B, and C.


Two natural questions: (a) is there a nice characterization of normed spaces that admit a contractive retraction onto every closed convex subset? (b) what is the smallest constant {L=L(n)} such that every {n}-dimensional normed space admits an {L}-Lipschitz retraction onto every closed convex subset?

(The answers may well be known, but not to me at present.)

The shortest circle is a hexagon

Let {\|\cdot\|} be some norm on {{\mathbb R}^2}. The norm induces a metric, and the metric yields a notion of curve length: the supremum of sums of distances over partitions. The unit circle {C=\{x\in \mathbb R^2\colon \|x\|=1\}} is a closed curve; how small can its length be under the norm?

For the Euclidean norm, the length of unit circle is {2\pi\approx 6.28}. But it can be less than that: if {C} is a regular hexagon, its length is exactly {6}. Indeed, each of the sides of {C} is a unit vector with respect to the norm defined by {C}, being a parallel translate of a vector connecting the center to a vertex.

Hexagon as unit disk
Hexagon as unit disk

To show that {6} cannot be beaten, suppose that {C} is the unit circle for some norm. Fix a point {p\in C}. Draw the circle {\{x\colon \|x-p\|=1\}}; it will cross {C} at some point {q}. The points {p,q,q-p, -p, -q, p-q} are vertices of a hexagon inscribed in {C}. Since every side of the hexagon has length {1}, the length of {C} is at least {6}.

It takes more effort to prove that the regular hexagon and its affine images, are the only unit circles of length {6}; a proof can be found in Geometry of Spheres in Normed Spaces by Juan Jorge Schäffer.

Nonlinear Closed Graph Theorem

If a function {f\colon {\mathbb R}\rightarrow {\mathbb R}} is continuous, then its graph {G_f = \{(x,f(x))\colon x\in{\mathbb R}\}} is closed. The converse is false: a counterexample is given by any extension of {y=\tan x} to the real line.

Closed graph, not continuous
Closed graph, not continuous

The Closed Graph Theorem of functional analysis states that a linear map between Banach spaces is continuous whenever its graph is closed. Although the literal extension to nonlinear maps fails, it’s worth noting that linear maps are either continuous or discontinuous everywhere. Hence, if one could show that a nonlinear map with a closed graph has at least one point of continuity, that would be a nonlinear version of the Closed Graph Theorem.


Here is an example of a function with a closed graph and an uncountable set of discontinuities. Let {C\subset {\mathbb R}} be a closed set with empty interior, and define

\displaystyle f(x) = \begin{cases} 0,\quad & x\in C \\ \textrm{dist}\,(x,C)^{-1},\quad & x\notin C \end{cases}

For a general function, the set of discontinuities is an Fσ set. When the graph is closed, we can say more: the set of discontinuities is closed. Indeed, suppose that a function {f} is bounded in a neighborhood of {a} but is not continuous at {a}. Then there are two sequences {x_n\rightarrow a} and {y_n\rightarrow a} such that both sequences {f(x_n)} and {f(y_n)} converge but have different limits. Since at least one of these limits must be different from {f(a)}, the graph of {f} is not closed. Conclusion: a function with a closed graph is continuous at {a} if and only if it is bounded in a neighborhood of {a}. In particular, the set of discontinuities is closed.


Furthermore, the set of discontinuities has empty interior. Indeed, suppose that {f} is discontinuous at every point of a nontrivial closed interval {[a,b]}. Let {A_n = G_f \cap ([a,b]\times [-n,n])}; this is a closed bounded set, hence compact. Its projection onto the {x}-axis is also compact, and this projection is exactly the set {B_n=\{x\in [a,b] : |f(x)|\le n\}}. Thus, {B_n} is closed. The set {B_n} has empty interior, since otherwise {f} would be continuous at its interior points. Finally, {\bigcup B_n=[a,b]}, contradicting the Baire Category theorem.

Summary: for closed-graph functions on {\mathbb R}, the sets of discontinuity are precisely the closed sets with empty interior. In particular, every such function has a point of continuity. The proof works just as well for maps from {\mathbb R^n} to any metric space.


However, the above result does not extend to the setting of Banach spaces. Here is an example of a map {F\colon X\rightarrow X} on a Banach space {X} such that {\|F(x)-F(y)\|=1} whenever {x\ne y}; this property implies that the graph is closed, despite {F} being discontinuous everywhere.

Let {X} the space of all bounded functions {\phi \colon (0,1]\rightarrow\mathbb R} with the supremum norm. Let {(q_n)_{n=1}^\infty} be an enumeration of all rational numbers. Define the function {\psi =F(\phi )} separately on each subinterval {(2^{-n},2^{1-n}]}, {n=1,2,\dots} as

{\displaystyle \psi(t) = \begin{cases} 1 \quad &\text{if } \phi (2^nt-1) > q_n \\ 0 \quad &\text{if } \phi (2^n t-1)\le q_n\end{cases}}

For any two distinct elements {\phi_1,\phi_2} of {X} there is a point {s\in (0,1]} and a number {n\in\mathbb N} such that {q_n} is strictly between {\phi_1(s)} and {\phi_2(s)}. According to the definition of {F} this implies that the functions {F(\phi_1)} and {F(\phi_2)} take on different values at the point {t=2^{-n}(s+1)}. Thus the norm of their difference is {1}.


So much for Nonlinear Closed Graph Theorem. However, the space {X} in the above example is nonseparable. Is there an nowhere continuous map between separable Banach spaces such that its graph is closed?

Binary intersection property, and not fixing what isn’t broken

A metric space has the binary intersection property if every collection of closed balls has nonempty intersection unless there is a trivial obstruction: the distance between centers of two balls exceeds the sum of their radii. In other words, for every family of points {x_\alpha\in X} and numbers {r_\alpha>0} such that {d(x_\alpha,x_\beta)\le r_\alpha+r_\beta} for all {\alpha,\beta} there exists {x\in X} such that {d(x_\alpha,x)\le r_\alpha} for all {\alpha}.

For example, {\mathbb R} has this property: {x=\inf_\alpha (x_\alpha+r_\alpha)} works. But {\mathbb R^2} does not:

Failure of the binary intersection property
Failure of the binary intersection property

The space of bounded sequences {\ell^\infty} has the binary intersection property, and so does the space {B[0,1]} of all bounded functions {f:[0,1]\rightarrow\mathbb R} with the supremum norm. Indeed, the construction for {\mathbb R} generalizes: given a family of bounded functions {f_\alpha} and numbers {r_\alpha>0} as in the definition, let {f(x)=\inf_\alpha (f_\alpha(x)+r_\alpha)}.


The better known space of continuous functions {C[0,1]} has the finite version of binary intersection property, because for a finite family, the construction {\inf_\alpha (f_\alpha(x)+r_\alpha)} produces a continuous function. However, the property fails without finiteness, as the following example shows.

Example. Let {f_n\in C[0,1]} be a function such that {f_n(x)=-1} for {x\le \frac12-\frac1n}, {f_n(x)=1} for {x\ge \frac12+\frac1n}, and {f_n} is linear in between.

Since {\|f_n-f_m\| \le 1} for all {n,m}, we can choose {r_n=1/2} for all {n}. But if a function {f} is such that {\|f-f_n\|\le \frac12} for all {n}, then {f(x) \le -\frac12} for {x<\frac12} and {f(x) \ge \frac12} for {x>\frac12}. There is no continuous function that does that.

More precisely, for every {f\in C[0,1]} we have {\liminf_{n\to\infty} \|f-f_n\|\ge 1 } because {f(x)\approx f(1/2)} in a small neighborhood of {1/2}, while {f_n} change from {1} to {-1} in the same neighborhood when {n} is large.


Given a discontinuous function, one can approximate it with a continuous function in some way: typically, using a mollifier. But such approximations tend to change the function even if it was continuous to begin with. Let’s try to not fix what isn’t broken: look for a retraction of {B[0,1]} onto {C[0,1]}, that is a map {\Phi:B[0,1]\rightarrow C[0,1]} such that {\Phi(f)=f} for all {f\in C[0,1]}.

The failure of binary intersection property, as demonstrated by the sequence {(f_n)} above, implies that {\Phi} cannot be a contraction. Indeed, let {f(x)= \frac12 \,\mathrm{sign}\,(x-1/2)}. This is a discontinuous function such that {\|f-f_n\|\le 1/2} for all {n}. Since {\liminf_{n\to\infty} \|\Phi(f)-f_n\|\ge 1}, it follows that {\Phi} cannot be {L}-Lipschitz with a constant {L<2}.


It was known for a while that there is a retraction from {B[0,1]} onto {C[0,1]} with the Lipschitz constant at most {20}: see Geometric Nonlinear Functional Analysis by Benyamini and Lindenstrauss. In a paper published in 2007, Nigel Kalton proved the existence of a {2}-Lipschitz retraction.

Graphical embedding

This post continues the theme of operating with functions using their graphs. Given an integrable function {f} on the interval {[0,1]}, consider the region {R_f} bounded by the graph {y=f(x)}, the axis {y=0}, and the vertical lines {x=0}, {x=1}.

Total area under and over the graph is the L1 norm
Total area under and over the graph is the L1 norm

The area of {R_f} is exactly {\int_0^1 |f(x)|\,dx}, the {L^1} norm of {f}. On the other hand, the area of a set is the integral of its characteristic function,

\displaystyle    \chi_f = \begin{cases}1, \quad x\in R_f, \\ 0,\quad x\notin R_f \end{cases}

So, the correspondence {f\mapsto \chi_f } is a map from the space of integrable functions on {[0,1]}, denoted {L^1([0,1])}, to the space of integrable functions on the plane, denoted {L^1(\mathbb R^2)}. The above shows that this correspondence is norm-preserving. It also preserves the metric, because integration of {|\chi_f-\chi_g|} gives the area of the symmetric difference {R_f\triangle R_g}, which in turn is equal to {\int_0^1 |f-g| }. In symbols:

\displaystyle    \|\chi_f-\chi_g\|_{L^1} = \int |\chi_f-\chi_g| = \int |f-g| = \|f-g\|_{L^1}

Distance between two functions in terms of their graphs
Distance between two functions in terms of their graphs

The map {f\mapsto \chi_f} is nonlinear: for example {2f} is not mapped to {2 \chi_f} (the function that is equal to 2 on the same region) but rather to a function that is equal to 1 on a larger region.

So far, this nonlinear embedding did not really offer anything new: from one {L^1} space we got into another. It is more interesting (and more difficult) to embed things into a Hilbert space such as {L^2(\mathbb R^2)}. But for the functions that take only the values {0,1,-1}, the {L^2} norm is exactly the square root of the {L^1} norm. Therefore,

\displaystyle    \|\chi_f-\chi_g\|_{L^2} = \sqrt{\int |\chi_f-\chi_g|^2} =    \sqrt{\int |\chi_f-\chi_g|} = \sqrt{\|f-g\|_{L^1}}

In other words, raising the {L^1} metric to power {1/2} creates a metric space that is isometric to a subset of a Hilbert space. The exponent {1/2} is sharp: there is no such embedding for the metric {d(f,g)=\|f-g\|_{L^1}^{\alpha} } with {\alpha>1/2}. The reason is that {L^1}, having the Manhattan metric, contains geodesic squares: 4-cycles where the distances between adjacent vertices are 1 and the diagonal distances are equal to 2. Having such long diagonals is inconsistent with the parallelogram law in Hilbert spaces. Taking the square root reduces the diagonals to {\sqrt{2}}, which is the length they would have in a Hilbert space.

This embedding, and much more, can be found in the ICM 2010 talk by Assaf Naor.

Maximum of three numbers: it’s harder than it sounds

This simple identity hold for any two real numbers {x,y}:

\displaystyle   \max(|x|,|y|) = \frac12\,(|x+y|+|x-y|)   \ \ \ \ \  \ \ \ \ \ (1)

Indeed, if {|x|} realizes the maximum, then both {x+y} and {x-y} have the same sign as {x}. After opening the absolute value signs, we get {y} to cancel out.

So, (1) represents {\max(|x|,|y|)}, also known as the {\ell_\infty} norm, as the sum of absolute values of linear functions. Let’s try the same with {\max(|x|,|y|,|z|)}. Since the right hand side of (1) is just the average of {|\pm x \pm y|} over all possible choices of {\pm } signs, the natural thing to do is to average {|\pm x \pm y \pm z|} over all eight choices. The sign in front of {x} can be taken to be {+}, which simplifies the average to

\displaystyle    \frac14\,(|x+y+z|+|x+y-z|+|x-y+z|+|x-y-z|)    \ \ \ \ \  \ \ \ \ \ (2)

Does this formula give {\max(|x|,|y|,|z|)}? Instead of trying random numbers, let’s just plot the unit ball for the norm given by (2). If the identity works, it will be a cube. I used Maple:

with(plots): f:=(abs(x+y+z)+abs(x+y-z)+abs(x-y-z)+abs(x-y+z))/4:
implicitplot3d(f=1,x=-1/4..1/4,y=-1/4..1/4,z=-1/4..1/4,grid=[25,25,25]);
My precious!
My precious!

Close, but no cube. Luckily, this is my favorite Archimedean solid, the cuboctahedron.

Although all terms of (2) look exactly the same, the resulting shape has both triangular and square faces. Where does the difference of shapes come from?

More importantly, is (2) really the best we can do? Is there some other sum of moduli of linear functions that will produce {\max(|x|,|y|,|z|)}?

— No.

Even if negative coefficients are allowed?

— Even then. (But you can come arbitrarily close.)

What if we allow integrals with respect to an arbitrary (signed) measure, as in

\displaystyle    \iiint |\alpha x+\beta y+\gamma z|\,d \mu(\alpha, \beta, \gamma)    \ \ \ \ \  \ \ \ \ \ (3)

— Still no. But if {\mu} is allowed to be a distribution of higher order (an object more singular than a measure), then a representation exists (W. Weil, 1976). Yes, one needs the theory of distributions to write the maximum of three numbers as a combination of linear functions.

I’ll only prove that there is no identity of the form

\displaystyle  \max(|x|,|y|,|z|) = \sum_{i=1}^N |\alpha_i x+\beta_i y+ \gamma_i z|

Indeed, such an identity amounts to having an isometric embedding {T\colon \ell_\infty^3 \rightarrow \ell_1^N}. The adjoint operator {T^* \colon \ell_\infty^N \rightarrow \ell_1^3} is a submetry meaning that it maps the unit ball of {\ell_\infty^N } onto the unit ball {\ell_1^3}. The unit ball of {\ell_\infty^N } is just a cube; all of its faces are centrally symmetric, and this symmetry is preserved by linear maps. But {\ell_1^3} is an octahedron, with triangular faces. A contradiction. {\ \Box}

An aside: what if instead of averaging {|\pm x \pm y|} over all {\pm } choices (i.e., unimodular real coefficients) we take the average over all unimodular complex coefficients? This amounts to {\|(x,y)\| = \frac{1}{2\pi} \int_0^{2\pi} |x+e^{it}y|\,dt}. I expected something nice from this norm, but

Complex averaging in real space
Complex averaging in real space

it’s a strange shape whose equation involves the complete elliptic integral of second kind. Yuck.

Subspaces and projections

Let M be a closed subspace of a Banach space X. In general, there is no linear projection P\colon X\to M, the canonical example being c_0 in \ell^\infty. At least we can construct a projection when M is finite-dimensional. The one-dimensional case is easy: take a unit vector x_1\in M, pick a norming functional f and define P(x)=f(x)x_1. If M is n-dimensional, one can construct P as the sum of n rank-one projections, achieving \|P\| \le n. Which is pretty bad: distorting distances by a factor comparable to dimension may render high-dimensional data useless. One usually seeks estimates that are logarithmic in dimension (or better yet, dimension-independent).

Recall that a retraction is a continuous map f\colon X\to M which is the identity on M. A projection is a linear retraction. The linearity is quite a rigid condition. We may have better luck with retractions in other classes, such as Lipschitz maps. And indeed, there is a 2-Lipschitz retraction from \ell^\infty onto c_0. Given x\in \ell^\infty, let d=\limsup|x_n| and define r(x)=y with y_n=(|x_n|-d)^+\mathrm{sign}\,x_n. Since d is a 1-Lipschitz function, r is 2-Lipschitz. It is a retraction because d=0 when x\in c_0.

One of many open problems in the book Geometric Nonlinear Functional Analysis I (Benyamini and Lindenstrauss) is whether every Banach space is a Lipschitz retract of its bidual. This problem is also mentioned in 2007 survey by Nigel Kalton.

One thing to like about linear projections is their openness: any linear surjection between Banach spaces is an open map. This is not the case for Lipschitz surjections: for instance, f(x)=(|x|-1)^+\mathrm{sign}\,x is a Lipschitz surjection \mathbb R\to\mathbb R which maps (-1,1) to a point. This example resembles the retraction r above. And indeed, r is not open either: the image of a small open ball centered at (0,1,1,1,1,\dots) is contained in the hyperplane x_1=0.

In the context of Lipschitz maps it is natural to quantify openness in the same way as continuity: i.e., by requiring the image of a ball B(x,r) to contain B(f(x),r/C) with C independent of x,r. This defines Lipschitz quotients, which appear to be the right concept of “nonlinear projection”. However, it remains unknown whether there is a Lipschitz quotient Q\colon \ell^\infty\to c_0. [Benyamini and Lindenstrauss]