The displacement set of nonlinear maps in vector spaces

Given a vector space {V} and a map {f\colon V\to V} (linear or not), consider the displacement set of {f}, denoted {D(f) = \{f(x)-x\colon x\in V\}}. For linear maps this is simply the range of the operator {f-I} and therefore is a subspace.

The essentially nonlinear operations of taking the inverse or composition of maps become almost linear when the displacement set is considered. Specifically, if {f} has an inverse, then {D(f^{-1}) = -D(f)}, which is immediate from the definition. Also, {D(f\circ g)\subset D(f)+D(g)}.

When {V} is a topological vector space, the maps for which {D(f)} has compact closure are of particular interest: these are compact perturbations of the identity, for which degree theory can be developed. The consideration of {D(f)} makes it very clear that if {f} is an invertible compact perturbation of the identity, then {f^{-1}} is in this class as well.

It is also of interest to consider the maps for which {D(f)} is either bounded, or is bounded away from {0}. Neither case can occur for linear operators, so this is essentially nonlinear analysis. In the nonlinear case, the boundedness assumption for linear operators is usually replaced by the Lipschitz condition. Let us say that {f} is {(L, \ell)}-bi-Lipschitz if {\ell\|x-y\|\le \|f(x)-f(y)\|\le L\|x-y\|} for all {x, y} in the domain of {f}.

Brouwer’s fixed point theorem fails in infinite-dimensional Hilbert spaces, but it not yet clear how hard it can fail. The strongest possible counterexample would be a bi-Lipschitz automorphism of the unit ball with displacement bounded away from 0. The existence of such a map is unknown. If it does not exist, that would imply that the unit ball and the unit sphere in the Hilbert space are not bi-Lipschitz equivalent, because the unit sphere does have such an automorphism: {x\mapsto -x}.

Concerning the maps with bounded displacement, here is a theorem from Patrick Biermann’s thesis (Theorem 3.3.2): if {f} is an {(L, \ell)}-bi-Lipschitz map in a Hilbert space, {L/\ell < \pi/\sqrt{8}}, and {f} has bounded displacement, then {f} is onto. The importance of bounded displacement is illustrated by the forward shift map {S(x_1, x_2, \dots) = (0, x_1, x_2, \dots)} for which {L=\ell=1} but surjectivity nonetheless fails.

It would be nice to get rid of the assumption {L/\ell < \pi/\sqrt{8}} in the preceding paragraph. I guess any bi-Lipschitz map with bounded displacement should be surjective, at least in Hilbert spaces, but possibly in general Banach spaces as well.

Rough isometries

An isometry is a map {f\colon X\rightarrow Y} between two metric spaces {X,Y} which preserves all distances: {d_Y(f(x),f(x')) = d_X(x,x')} for all {x,x'\in X}. (After typing a bunch of such formulas, one tends to prefer shorter notation: {|f(x)f(x')| = |xx'|}, with the metric inferred from contexts.) It’s a popular exercise to prove that every isometry from a compact metric space into itself is surjective.

A rough isometry allows additive distortion of distances: {|\,|f(x)f(x')| - |xx'|\,|\le \epsilon}. (As contrasted with bi-Lipschitz maps, which allow multiplicative distortion). Rough isometries ignore small-scale features (in particular, they need not be continuous), but accurately capture the large scale structure of a space. This makes them convenient for studying hyperbolic spaces (trees and their relatives), where the large-scale structure is of interest.

If {f} is an {\epsilon}-rough isometry, then the Gromov-Hausdorff distance {d_{GH}} between {X} and {f(X)} is at most {\epsilon/2}. This follows from a convenient characterization of {d_{GH}}: it is equal to {\frac12 \inf_R \textrm{dis}\,(R)} where the infimum is taken over all subsets {R\subset X\times Y} that project surjectively onto each factor, and {\textrm{dis}\,(R) = \sup \{|\,|xx'|-|yy'|\,| \colon xRy, \ x'Ry' \} }.


Since small-scale features are ignored, it is not quite natural to talk about rough isometries being surjective. A natural concept for them is {\epsilon}-surjectivity, meaning that every point of {Y} is within distance {\le \epsilon} of {f(X)}. When this happens, we can conclude that {d_{GH}(X,Y)\le 3\epsilon/2}, because the Hausdorff distance between {Y} and {f(X)} is at most {\epsilon}.

Recalling that an isometry of a compact metric space {X} into {X} is automatically onto, it is natural to ask whether {\epsilon}-rough isometries of {X} into {X} are {\epsilon}-surjective. This, however, turns out to be false in general.


First example, a finite space: {X=\{0,1,2,\dots,n\}} with the metric {d(i,j) = i+j} (when {i\ne j}). Consider the backward shift {f(i)=i-1} (and {f(0)=0}). By construction, this is a rough isometry with {\epsilon=2}. Yet, the point {n} is within distance {n} from {f(X) = \{0,1,2,\dots, n-1\}}.

The above metric can be thought of a the distance one has to travel from {i} to {j} with a mandatory visit to {0}. This makes it similar to the second example.


Second example, a geodesic space: {X} is the graph shown below, a subset of {\mathbb R^2} with the intrinsic (path) metric, i.e., the length of shortest path within the set.

Comb space
Comb space

Define {f(x,y) = ((x-1)^+, (y-1)^+)} where {{\,}^+} means the positive part. Again, this is a {2}-rough isometry. The omitted part, shown in red below, contains a ball of radius {5} centered at {(5,5)}. Of course, {5} can be replaced by any positive integer.

Omitted part in red
Omitted part in red

Both of these examples are intrinsically high-dimensional: they cannot be accurately embedded into {\mathbb R^d} for low {d} (using the restriction metric rather than the path metric). This raises a question: does there exist {C=C(d)} such that for every compact subset {K\subset \mathbb R^d}, equipped with the restriction metric, every {\epsilon}-rough isometry {f\colon K\rightarrow K} is {C(d)\epsilon}-surjective?

Graphical convergence

The space of continuous functions (say, on {[0,1]}) is usually given the uniform metric: {d_u(f,g) = \sup_{x}|f(x)-g(x)|}. In other words, this is the smallest number {\rho} such that from every point of the graph of one function we can jump to the graph of another function by moving at distance {\le \rho} in vertical direction.

Uniform metric: vertical distances
Uniform metric is based on vertical distances

Now that I put it this way, why don’t we drop “in vertical direction”? It’ll still be a metric, namely the Hausdorff metric between the graphs of {f} and {g}. It’s natural to call it the graphical metric, denoted {d_g}; from the definition it’s clear that {d_g\le d_u}.

Graphical metric: Hausdorff distance
Weird metric, weird color

Some interesting things happen when the space of continuous functions is equipped with {d_g}. For one thing, it’s no longer a complete space: the sequence {f_n(x)=x^n} is Cauchy in {d_g} but has no limit.

Sequence of x^n
Sequence of x^n

On the other hand, the bounded subsets of {(C[0,1],d_g) } are totally bounded. Indeed, given {M>0} and {\epsilon>0} we can cover the rectangle {[0,1]\times [-M,M]} with a rectangular mesh of diameter at most {\epsilon}. For each function with {\sup|f|\le M}, consider the set of rectangles that its graph visits. There are finitely many possibilities for the sets of visited rectangles. And two functions that share the same set of visited rectangles are at graphical distance at most {\epsilon} from each other.

Boxy approximation to the graph
Boxy approximation to the graph

Thus, the completion of {C[0,1]} in the graphical metric should be a nice space: bounded closed subsets will be compact in it. What is this completion, concretely?

Here is a partial answer: if {(f_n)} is a graphically Cauchy sequence, its limit is the compact set {\{(x,y): g(x)\le y\le h(x)\}} where

\displaystyle    g(x) = \inf_{x_n\rightarrow x} \liminf f_n(x_n)

(the infimum taken over all sequences converging to {x}), and

\displaystyle    h(x) = \sup_{x_n\rightarrow x} \limsup f_n(x_n)

It’s not hard to see that {g} is upper semicontinuous and {h} is lower semicontinuous. Of course, {g\le h}. It seems that the set of such pairs {(g,h)} indeed describes the graphical completion of continuous functions.

For example, the limit of {f_n(x)=x^n} is described by the pair {g(x)\equiv 0}, {h(x)=\chi_{\{1\}}}. Geometrically, it’s a broken line with horizontal and vertical segments

For another example, the limit of {f_n(x)=\sin^2 nx} is described by the pair {g(x)\equiv 0}, {h(x)\equiv 1}. Geometrically, it’s a square.

Stripey sines
Stripey sines

Sequences and nets

According to the (Banach-)Alaoglu theorem, for any Banach space X the closed unit ball of X^* is compact in the weak* topology (the weakest/coarsest topology that makes all evaluation functionals f\mapsto f(x) continuous).

For example, take X=\ell_{\infty}. The dual space \ell_{\infty}^* contains an isometric copy of \ell_1 because \ell_\infty^*=\ell_1^{**}. The sequence x_n=n^{-1}(e_1+\dots+e_n), where e_n are the standard basis vectors, is contained in the unit sphere of \ell_1. Should (x_n) have a weak*-convergent subsequence? Maybe it should, but it does not.

Indeed, take any subsequence (x_{n_k}). If necessary, choose a further subsequence so that n_{k+1}\ge 3n_k for all k. Define

\displaystyle y=\sum_{k}(-1)^k\sum_{n_{k-1}< j\le n_k} e_j

where we set n_0=0. Two things to notice here: (1) y\in \ell_{\infty}; and (2) at least 2/3 of the coefficients of e_j, 1\le j\le n_k, have the sign (-1)^k. Hence, \langle x_{n_k},y\rangle\le -1/3 when k is odd and \langle x_{n_k},y\rangle\ge 1/3 when k is even. This shows that (x_{n_k}) does not converge in the weak* topology.

The above does not contradict the Banach-Alaoglu theorem. Since \ell_\infty is not separable, the weak* topology on the unit ball of its dual is not metrizable. The compactness can be stated in terms of nets instead of sequences: every bounded net in X^* has a convergent subnet. In particular, the sequence (x_n) has a convergent subnet (which is not a sequence). I personally find subnets a recipe for erroneous arguments. So I prefer to say: the infinite set \{x_n\} has a cluster point x; namely, every neighborhood of x contains some x_n. You can use the reverse inclusion of neighborhoods to define a subnet, but I’d rather not to. Everything we want to know about x can be easily proved from the cluster point definition. For example,

  • \|x\|_{X^*}=1
  • \langle x, 1_{\infty}\rangle =1 where 1_{\infty} stands for the \ell_\infty vector with all coordinates 1.
  • \langle x, y\rangle = \langle x, Sy\rangle where S is the shift operator on \ell_\infty