The displacement set of nonlinear maps in vector spaces

Given a vector space {V} and a map {f\colon V\to V} (linear or not), consider the displacement set of {f}, denoted {D(f) = \{f(x)-x\colon x\in V\}}. For linear maps this is simply the range of the operator {f-I} and therefore is a subspace.

The essentially nonlinear operations of taking the inverse or composition of maps become almost linear when the displacement set is considered. Specifically, if {f} has an inverse, then {D(f^{-1}) = -D(f)}, which is immediate from the definition. Also, {D(f\circ g)\subset D(f)+D(g)}.
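
For the record, here are the one-line verifications. Writing {y = f(x)}, so that {x = f^{-1}(y)},

{\displaystyle f^{-1}(y) - y = x - f(x) = -(f(x) - x) \in -D(f),}

and for any {x\in V},

{\displaystyle f(g(x)) - x = \bigl(f(g(x)) - g(x)\bigr) + \bigl(g(x) - x\bigr) \in D(f) + D(g).}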

When {V} is a topological vector space, the maps for which {D(f)} has compact closure are of particular interest: these are compact perturbations of the identity, for which degree theory can be developed. The consideration of {D(f)} makes it very clear that if {f} is an invertible compact perturbation of the identity, then {f^{-1}} is in this class as well.

It is also of interest to consider the maps for which {D(f)} is either bounded, or is bounded away from {0}. Neither case can occur for linear operators (except for the trivial {D(f)=\{0\}} when {f=I}), so this is essentially nonlinear analysis. In the nonlinear setting, the role that boundedness plays for linear operators is usually taken over by the Lipschitz condition. Let us say that {f} is {(L, \ell)}-bi-Lipschitz if {\ell\|x-y\|\le \|f(x)-f(y)\|\le L\|x-y\|} for all {x, y} in the domain of {f}.

Brouwer’s fixed point theorem fails in infinite-dimensional Hilbert spaces, but it is not yet clear how badly it can fail. The strongest possible counterexample would be a bi-Lipschitz automorphism of the unit ball with displacement bounded away from 0. The existence of such a map is unknown. If it does not exist, that would imply that the unit ball and the unit sphere in the Hilbert space are not bi-Lipschitz equivalent, because the unit sphere does have such an automorphism: {x\mapsto -x}.

Concerning the maps with bounded displacement, here is a theorem from Patrick Biermann’s thesis (Theorem 3.3.2): if {f} is an {(L, \ell)}-bi-Lipschitz map in a Hilbert space, {L/\ell < \pi/\sqrt{8}}, and {f} has bounded displacement, then {f} is onto. The importance of bounded displacement is illustrated by the forward shift map {S(x_1, x_2, \dots) = (0, x_1, x_2, \dots)} for which {L=\ell=1} but surjectivity nonetheless fails.
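
Indeed, the displacement of the shift is unbounded, which is why the theorem does not apply to it. A quick numerical sketch (mine; finite truncations stand in for vectors of {\ell^2}):

```python
import numpy as np

def shift(x):
    """Truncated forward shift: S(x_1, x_2, ...) = (0, x_1, x_2, ...)."""
    return np.concatenate(([0.0], x[:-1]))

# In l^2 the shift is an isometry (L = l = 1), yet its displacement S(x) - x
# is unbounded: along alternating-sign vectors it grows roughly like 2||x||.
# (The truncation only underestimates the true displacement norm.)
for n in [4, 16, 64, 256]:
    x = np.array([(-1.0) ** k for k in range(n)])
    print(f"||x|| = {np.linalg.norm(x):6.2f}   ||S(x) - x|| = {np.linalg.norm(shift(x) - x):6.2f}")
```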

It would be nice to get rid of the assumption {L/\ell < \pi/\sqrt{8}} in the theorem above. I guess any bi-Lipschitz map with bounded displacement should be surjective, at least in Hilbert spaces, but possibly in general Banach spaces as well.

Measuring nonlinearity and reducing it

How to measure the nonlinearity of a function {f\colon I\to \mathbb R} where {I\subset \mathbb R} is an interval? A natural way is to consider the smallest possible deviation from a line {y=kx+b}, that is {\inf_{k, b}\sup_{x\in I}|f(x)-kx-b|}. It turns out to be convenient to divide this by {|I|}, the length of the interval {I}. So, let {\displaystyle NL(f;I) = \frac{1}{|I|} \inf_{k, b}\sup_{x\in I}|f(x)-kx-b|}. (This is similar to β-numbers of Peter Jones, except the deviation from a line is measured only in the vertical direction.)

[Figure: NL(f; I) is the maximal vertical distance between red and black, divided by the length of I.]
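
For experimentation, {NL(f; I)} is easy to approximate numerically: for a fixed slope {k}, the best offset {b} is the mid-range of {f(x)-kx}, so only a one-dimensional search over {k} is needed. A small sketch of mine (not from the post; the helper name and the slope search bounds are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nl(f, a, b, n=2001):
    """Approximate NL(f; [a,b]) = (1/|I|) inf_{k,c} sup_x |f(x) - kx - c| on a grid."""
    x = np.linspace(a, b, n)
    y = f(x)

    def sup_error(k):
        r = y - k * x                    # residual for slope k
        return (r.max() - r.min()) / 2   # the best offset is the mid-range of r

    best = minimize_scalar(sup_error, bounds=(-10, 10), method='bounded')
    return best.fun / (b - a)

print(nl(np.abs, -1, 1))      # |x| on [-1,1]: about 1/4 (the sharpness example below)
print(nl(np.square, 0, 1))    # x^2 on [0,1]: best affine error 1/8, so NL is about 1/8
```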

Relation with derivatives

The definition of derivative immediately implies that if {f'(a)} exists, then {NL(f;I)\to 0} as {I} shrinks to {a} (that is, gets smaller while containing {a}). A typical construction of a nowhere differentiable continuous function is based on making {NL(f;I)} bounded from below; it is enough to do this for dyadic intervals, and that can be done by adding wiggly terms like {2^{-n}\mathrm{dist}\,(x, 2^{-n}\mathbb Z)}: see the blancmange curve.

The converse is false: if {NL(f; I)\to 0} as {I} shrinks to {a}, the function {f} may still fail to be differentiable at {a}. The reason is that the affine approximation may have different slopes at different scales. An example is {f(x)=x \sin \sqrt{-\log |x|}} in a neighborhood of {0}. Consider a small interval {[-\delta, \delta]}. The line {y = kx} with {k=\sin\sqrt{-\log \delta}} is a good approximation to {f} because {f(x)/x\approx k} on most of the interval except for a very small part near {0}, and on that part {f} is very close to {0} anyway.

Why the root of logarithm? Because {\sin \log |x|} has a fixed amount of change on a fixed proportion of  {[-\delta, \delta]}, independently of {\delta}. We need a function slower than the logarithm, so that as {\delta} decreases, there is a smaller amount of change on a larger part of the interval {[-\delta, \delta]}.
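
A numerical illustration of this example (my own sketch, not from the post): at the scales {\delta_n = \exp(-(\pi/2 + \pi n)^2)}, where {\sin\sqrt{-\log \delta_n} = \pm 1}, the best-fitting slope on {[-\delta_n, \delta_n]} flips sign from one scale to the next, while the nonlinearity at each of these scales stays small.

```python
import numpy as np

def best_affine(f, a, b, n=4001, slopes=np.linspace(-1.5, 1.5, 3001)):
    """Brute-force the best slope k and NL(f; [a,b]) over a grid of slopes."""
    x = np.linspace(a, b, n)
    y = f(x)

    def sup_error(k):
        r = y - k * x
        return (r.max() - r.min()) / 2   # best offset = mid-range

    errors = np.array([sup_error(k) for k in slopes])
    i = int(errors.argmin())
    return slopes[i], errors[i] / (b - a)

def f(x):
    y = np.zeros_like(x)
    m = x != 0
    y[m] = x[m] * np.sin(np.sqrt(-np.log(np.abs(x[m]))))
    return y

for n in range(3):
    delta = np.exp(-(np.pi / 2 + np.pi * n) ** 2)   # sqrt(-log delta) = pi/2 + pi*n
    k, nonlin = best_affine(f, -delta, delta)
    print(f"delta = {delta:.1e}   best slope = {k:+.2f}   NL = {nonlin:.3f}")
```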

Nonlinearity of Lipschitz functions

Suppose {f} is a Lipschitz function, that is, there exists a constant {L} such that {|f(x)-f(y)|\le L|x-y|} for all {x, y\in I}. It’s easy to see that {NL(f;I)\le L/2}, by taking the mid-range approximation {y=\frac12 (\max_I f + \min_I f)}. But the sharp bound is {NL(f;I)\le L/4} whose proof is not as trivial. The sharpness is shown by {f(x)=|x|} with {I=[-1,1]}.

[Figure: sharpness. With the maximum difference 1/2 and the length of interval 2, we get NL(f; [-1, 1]) = 1/4.]
Proof. Let {k} be the slope of the linear function that agrees with {f} at the endpoints of {I}. Subtracting this linear function from {f} gives us a Lipschitz function {g} such that {-L-k\le g'\le L-k} and {\int_I g'= 0}. Let {A = \int_I (g')^+ = \int_I (g')^-}. Chebyshev’s inequality gives lower bounds for the measures of the sets {g'>0} and {g'<0}: namely, {|g'>0|\ge A/(L-k)} and {|g'<0|\ge A/(L+k)}. By adding these, we find that {|I| \ge 2LA/(L^2-k^2)\ge 2A/L}. Since {\max _I g - \min_I g \le A}, the mid-range approximation to {g} has error at most {A/2 \le |I|L/4}. Hence {NL(f; I) = NL(g; I) \le L/4}.

Reducing nonlinearity

Turns out, the graph of every Lipschitz function has relatively large almost-flat pieces.  That is, there are subintervals of nontrivial size where the measure of nonlinearity is much smaller than the Lipschitz constant. This result is a special (one-dimensional) case of Theorem 2.3 in Affine approximation of Lipschitz functions and nonlinear quotients by Bates, Johnson, Lindenstrauss, Preiss, and Schechtman.

Theorem AA (for “affine approximation”): For every {\epsilon>0} there exists {\delta>0} with the following property. If {f\colon I\to \mathbb R} is an {L}-Lipschitz function, then there exists an interval {J\subset I} with {|J|\ge \delta |I|} and {NL(f; J)\le \epsilon L}.

Theorem AA should not be confused with Rademacher’s theorem which says that a Lipschitz function is differentiable almost everywhere. The point here is a lower bound on the size of the interval {J}. Differentiability does not provide that. In fact, if we knew that {f} is smooth, or even a polynomial, the proof of Theorem AA would not become any easier.

Proof of Theorem AA

We may assume {I=[-1, 1]} and {L=1}. For {t\in (0, 2]} let {L(t) = \sup \{|f(x)-f(y)|/|x-y| \colon x, y\in I, \ |x-y|\ge t\}}. That is, {L(t)} is the restricted Lipschitz constant, one that applies for distances at least {t}. It is a decreasing function of {t}, and {L(0+)=1}.

Note that {|f(-1)-f(1)|\le 2L(1)} and that every value of {f} is within {2L(1)} of either {f(-1)} or {f(1)}. Hence, the oscillation of {f} on {I} is at most {6L(1)}. If {L(1) \le \epsilon/3}, then the constant mid-range approximation on {I} gives the desired conclusion, with {J=I}. From now on {L(1) > \epsilon/3}.

The sequence {L_k = L(4^{-k})} is increasing toward {L(0+)=1}, which implies {L_{k+1}\le (1+\epsilon) L_k} for some {k}. Pick an interval {[a, b]\subset I} that realizes {L_k}, that is {b-a\ge 4^{-k}} and {|f(b)-f(a)| = L_k(b-a)}. Without loss of generality {f(b)>f(a)} (otherwise consider {-f}). Let {J = [(3a+b)/4, (a+3b)/4]} be the middle half of {[a, b]}. Since each point of {J} is at distance at least {4^{-k-1}} from both {a} and {b}, it follows that {\displaystyle f(b) + L_{k+1}(x-b) \le f(x) \le f(a) + L_{k+1}(x-a) } for all {x \in J}.

[Figure: The green lines have slope L_{k+1}, which is close to the slope L_k of the secant line through (a, f(a)) and (b, f(b)). The graph of f is pinched between these green lines, except near a or b.]

So far we have pinched {f} between two affine functions of equal slope. Let us consider their difference:
{\displaystyle (f(a) + L_{k+1}(x-a)) - (f(b) + L_{k+1}(x-b)) = (L_{k+1}-L_k) (b-a)}. Recall that {L_{k+1}\le (1+\epsilon) L_k}, which gives a bound of {\epsilon L_k(b-a) \le 2\epsilon L |J|} for the difference. Approximating {f} by the average of the two affine functions we conclude that {NL(f;J)\le \epsilon L} as required.

It remains to consider the size of {J}, about which we only know {|J|\ge 4^{-k}/2} so far. Naturally, we want to take the smallest {k} such that {L_{k+1}\le (1+\epsilon) L_k} holds. Let {m} be this value; then {L_m > (1+\epsilon)^{m} L_0}. Here {L_m\le 1} and {L_0 = L(1)> \epsilon/3 }. The conclusion is that {(1+\epsilon)^m < 3/\epsilon}, hence {m< \log(3/\epsilon)/\log(1+\epsilon)}. This finally yields {\displaystyle \delta = 4^{-\log(3/\epsilon)/\log(1+\epsilon)}/2} as an acceptable choice, completing the proof of Theorem AA.
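
Just to see how small this {\delta} is, here is the arithmetic for a few values of {\epsilon} (a quick sketch; since {\delta} underflows double precision already at {\epsilon = 0.01}, the script prints only its order of magnitude):

```python
import math

def log10_delta(eps):
    """log10 of delta = 4^(-log(3/eps)/log(1+eps)) / 2 from the proof of Theorem AA."""
    m = math.log(3 / eps) / math.log(1 + eps)   # upper bound on the index m
    return -m * math.log10(4) - math.log10(2)

for eps in [0.5, 0.1, 0.01]:
    print(f"epsilon = {eps}:   delta is about 10^{log10_delta(eps):.0f}")
```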

A large amount of work has been done on quantifying {\delta} in various contexts; for example Heat flow and quantitative differentiation by Hytönen and Naor.

Noncontracting Jordan curves

A simple closed curve {\Gamma} on the plane can be parameterized by a homeomorphism {f\colon \mathbb T\to \Gamma} in infinitely many ways. It is natural to look for “nice” parameterizations, like nonexpanding ones in the previous post. This time, let us look for noncontracting parameterizations: {|f(a)-f(b)|\ge |a-b|} for all {a, b\in \mathbb T}. Note that Euclidean distance is used here, not arclength. And we still want {f} to be a homeomorphism. Its noncontracting property simply means that the inverse {f^{-1}} is nonexpanding aka 1-Lipschitz.

What are some necessary conditions for the existence of a noncontracting parameterization? We can mimic the three from the earlier post Nonexpanding Jordan curves, with similar proofs:

  1. The curve must have length at least {2\pi}.
  2. The curve must have diameter at least 2.
  3. The curve must enclose some disk of radius 1. (Apply Kirszbraun’s theorem to {f^{-1}} and note that the resulting Lipschitz map G of the plane will attain 0 somewhere, by a topological degree / winding number argument. Any point where G = 0 works as the center of such a disk.)

This time, the 3rd item supersedes both 1 and 2. Yet, the condition it presents is not sufficient for the existence of a noncontracting parameterization. A counterexample is a disk with a “comb-over”:

[Figure: a disk with a “comb-over”.]

Indeed, suppose that {g=f^{-1}} is a nonexpanding map from this curve onto the unit circle. Let {1 = A < B < C} be the three points at which the curve meets the positive x-axis. Since every point of the curve is within a small distance of the arc AB, it follows that {g(AB)} is a large subarc of {\mathbb T}, covering almost all of the circle. But the same argument applies to the arcs {g(BC)} and {g(CA)}, a contradiction.

No matter how large the enclosed disk is, a tight combover around it can force arbitrarily large values of the Lipschitz constant of {f^{-1}}. Sad!

What about sufficient conditions?

  1. It is sufficient for the curve to be star-shaped with respect to the origin, in addition to enclosing the unit disk. (Equivalent statement: {\Gamma} has a polar equation {r = r(\theta)} in which {r \ge 1}.) Indeed, the nearest-point projection onto a convex set is a nonexpanding map, and projecting {\Gamma} onto the unit circle in this way gives the desired parameterization.
  2. It is sufficient for the curve to have curvature bounded by 1. I am not going into this further, because second derivatives should not be needed to control a Lipschitz constant.

Recall that for nonexpanding parameterizations, the length of the curve was a single quantity that mostly answered the existence question (length at most 4 — yes; length greater than {2\pi} — no). I do not know of such a quantity for the noncontracting case.

Nonexpanding Jordan curves

A simple closed curve {\Gamma} on the plane can be parameterized by a homeomorphism {f\colon \mathbb T\to \Gamma} in infinitely many ways. It is natural to look for “nice” parameterizations: smooth ones, for example. I do not want to require smooth {\Gamma} here, so let us try to find {f} that is nonexpanding, that is {|f(a)-f(b)|\le |a-b|} for all {a, b\in \mathbb T}. Note that Euclidean distance is used here, not arclength.

What are some necessary conditions for the existence of a nonexpanding parameterization?

  1. The curve must have length at most {2\pi}, since nonexpanding maps do not increase length. But this is not sufficient: an equilateral triangle of sidelength {2\pi/3} has no nonexpanding parameterization, despite its length being {2\pi}.
  2. The curve must have diameter at most 2 (which the triangle in item 1 fails). Indeed, nonexpanding maps do not increase the diameter either. However, {\mathrm{diam}\,\Gamma\le 2} is not sufficient either: an equilateral triangle of sidelength {2} has no nonexpanding parameterization, despite its diameter being 2 (and length {6 < 2\pi}).
  3. The curve must be contained in some closed disk of radius 1. This is not as obvious as the previous two items. We need Kirszbraun’s theorem here: any nonexpanding map {f\colon \mathbb T\to \Gamma} extends to a nonexpanding map {F\colon \mathbb R^2\to\mathbb R^2}, and therefore {f(\mathbb T)} is contained in the closed disk of radius 1 centered at {F(0)}. (This property fails for the triangle in item 2.)

The combination of 1 and 3 (with 2 being superseded by 3) still is not sufficient. A counterexample is given by any polygon that has length {2\pi} but is small enough to fit in a unit disk, for example:

[Figure: a crown-shaped polygon. Uneasy lies the head that cannot find a nonexpanding parameterization.]

Indeed, since the length is exactly {2\pi}, a nonexpanding parametrization must have constant speed 1. But mapping a circular arc onto a line segment with speed 1 increases pairwise Euclidean distances, since we are straightening out the arc.

Since I do not see a way to refine the necessary conditions further, let us turn to the sufficient ones.

  1. It is sufficient for {\Gamma} to be a convex curve contained in the unit disk. Indeed, the nearest-point projection onto a convex set is a nonexpanding map, and projecting the unit circle onto the curve in this way gives the desired parameterization.
  2. It is sufficient for the curve to have length at most 4. Indeed, in this case there exists a parameterization {f\colon \mathbb T\to\Gamma} with {|f'|\le 4/(2\pi) = 2/\pi}. For any {a, b\in \mathbb T} the length of the shorter subarc {\gamma} between {a} and {b} is at most {(\pi/2)|a-b|}. Therefore, the length of {f(\gamma)} is at most {|a-b|}, which implies {|f(a)-f(b)|\le |a-b|}.

Clearly, neither of the two sufficient conditions is necessary for the existence of a nonexpanding parameterization. But one can consider “length {\le 2\pi} is necessary, length {\le 4} is sufficient” a reasonable resolution of the problem: up to a constant factor, the length of the curve decides the existence one way or the other.

Stay tuned for a post on noncontracting parameterizations…

Gromov’s “Hilbert volume in metric spaces I”

Just finished writing a summary of Gromov’s Hilbert Volume in Metric Spaces, Part 1 for Zentralblatt. Not an easy task to summarize such an article, and I essentially limited myself to the introductory part. But at least I contributed a Parseval frame analogy, which is not explicit in the article. I like frames in general, and tight/Parseval frames most of all.

And since the Zentralblatt server for review submission is down at the moment, the outlet for this text defaults to my blog.


Let Df denote the Jacobian matrix of a differentiable map f\colon \mathbb R^n\to\mathbb R^n. One way to quantify the infinitesimal dilation of f is to consider the operator norm \|Df\|. This corresponds to the local form of the Lipschitz constant \mathrm{Lip}\,f=\sup_{a,b} \frac{|f(a)-f(b)|}{|a-b|}, and thus makes sense in general metric spaces. Another natural, and often more convenient, way to measure dilation is the Hilbert-Schmidt norm \|Df\|_{HS}. However, the latter does not immediately generalize to metric spaces. The present article develops such a generalization and uses it to derive several previously known results in a unified and elegant way.

Instead of trying to describe the construction in full generality, let us consider the special case of Lipschitz maps f\colon X\to \mathbb R^n, where X is a metric space. Let \mu be a measure on the set \mathcal P of all rank-1 projections, which can be identified with the (n-1)-dimensional projective space. Of particular importance are the measures \mu for which \int_{\mathcal P}p(x)\,d\mu = x for all x\in \mathbb R^n. Such a measure is called an axial partition of unity; a related term in harmonic analysis is a Parseval frame. The L_2-dilation of f with respect to \mu is \|\mathrm{dil}^* f\|_{L_2(\mu)}=\left(\int_{\mathcal P} \mathrm{Lip}^2(p\circ f) \,d\mu(p)\right)^{1/2}. In terms of frames, this definition means that one applies the analysis operator to f and measures the Lipschitz constant of the output. Another approach is to use the synthesis operator: consider all Lipschitz maps \tilde f\colon X\to L_2(\mathcal P,\mu) from which f can be synthesized and define \|\widetilde{\mathrm{dil}}^* f\|_{L_2(\mu)}=\inf_{\tilde f} \left(\int_{\mathcal P} \mathrm{Lip}^2(\tilde f(\cdot,p))\,d\mu(p) \right)^{1/2}.
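
To make the frame analogy concrete, here is a minimal numerical sketch (mine, not from the article): in \mathbb{R}^2 the three "Mercedes-Benz" directions at 120 degrees, each carrying weight 2/3, form a discrete axial partition of unity, i.e. the weighted sum of the rank-1 projections is the identity.

```python
import numpy as np

# Three unit vectors at 120-degree angles (a tight/Parseval-type frame in R^2)
angles = np.pi / 2 + np.array([0, 2 * np.pi / 3, 4 * np.pi / 3])
U = np.stack([np.cos(angles), np.sin(angles)], axis=1)    # rows are unit vectors u_i
projections = [np.outer(u, u) for u in U]                 # rank-1 projections p_i
weights = [2 / 3] * 3                                     # the measure mu

# Axial partition of unity: sum_i mu_i p_i = I, hence sum_i mu_i p_i(x) = x for all x
total = sum(w * p for w, p in zip(weights, projections))
print(np.allclose(total, np.eye(2)))                      # True

x = np.array([0.3, -1.7])
print(np.allclose(sum(w * (p @ x) for w, p in zip(weights, projections)), x))   # True
```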

For every axial partition of unity one has \|\widetilde{\mathrm{dil}}^* f\|_{L_2(\mu)} \le \|\mathrm{dil}^* f\|_{L_2(\mu)} because the composition of analysis and synthesis recovers f. Taking the infimum over all axial partitions of unity \mu yields minimal L_2-dilations \|\min\mathrm{dil}^* f\|_{L_2} and \|\min\widetilde{\mathrm{dil}}^* f\|_{L_2}.

For linear maps between Euclidean spaces the minimal L_2-dilation of either kind is exactly the Hilbert-Schmidt norm. For non-linear maps they need to be localized first, by taking restrictions to small neighborhoods of a point. The concept turns out to be useful, e.g., for proving volume comparison theorems. The author proves an elegant form of F. John’s ellipsoid theorem in terms of \|\min\mathrm{dil}^* f\|_{L_2}, recasts the Burago-Ivanov proof of the Hopf conjecture [Geom. Funct. Anal. 4, No.3, 259-269 (1994; Zbl 0808.53038)] in these new terms, and presents further extensions and applications of his approach.

Injective:Surjective :: Isometric:?

I used this sort of title before, in “Continuous:Lipschitz :: Open:?”, but the topics are related anyway.

In some sense (formal or informal) the following classes of maps are dual to each other.

  • Injective : Surjective
  • Immersion : Submersion
  • Monomorphism : Epimorphism

An isometric embedding of metric space X into a metric space Y is a map f\colon X\to Y such that d_Y(f(a),f(b))=d_X(a,b) for all a,b\in X. This concept belongs to the left column of the table above. What should be its counterpart?

Candidate 1. Observe that a 1-Lipschitz map f\colon X\to Y is an isometric embedding iff it does not factor through any 1-Lipschitz surjection g\colon X\to Z (for any space Z) unless g is an isometric isomorphism. Reversing the order of arrows, we arrive at the following concept:

A 1-Lipschitz map f\colon Y\to X is a metric quotient map if it does not factor through any 1-Lipschitz injection g\colon Z\to X (for any space Z) unless g is an isometric isomorphism.

This can be reformulated as follows: \rho(a,b):=d_{X}(f(a),f(b)) is the greatest pseudo-metric on Y subject to

  1. \rho(a,b)\le d_{Y}(a,b)
  2. \rho(a,b)=0 if f(a)=f(b)

This seems reasonable: f does as little damage as possible, given the structure of its fibers. There is also a natural way to construct \rho for any reasonable fibering of Y: begin by defining \widetilde{d_Y}=d_Y for points in different fibers and 0 otherwise. Then force the triangle inequality by letting \rho(a,b)=\inf \sum_{j=1}^n \widetilde{d_Y}(y_j,y_{j-1}) subject to y_0=a and y_n=b. As long as the fibers stay at positive distance from one another, this \rho will be a metric. The corresponding metric quotient map sends each point of Y onto its fiber.

Here is a simple example, in which a part of an interval is mapped to a point.

[Figure: These aren't the quotients I'm looking for.]

However, the above example made me unhappy. The only nontrivial fiber is the interval [1,2]. Both points 0 and 3 belong to trivial fibers, but the distance between them decreases from 3 to 2. This looks like a wrong kind of quotient to me.
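
The drop from 3 to 2 is easy to reproduce with the chain construction described above, on a discretization of [0,3] whose only nontrivial fiber is [1,2] (a rough numerical sketch of mine):

```python
import numpy as np

# Discretize [0, 3]; the only nontrivial fiber is the segment [1, 2].
points = np.linspace(0, 3, 301)
in_fiber = (points > 1 - 1e-9) & (points < 2 + 1e-9)

# d_tilde: the usual distance between different fibers, 0 within the same fiber
D = np.abs(points[:, None] - points[None, :])
D[np.ix_(in_fiber, in_fiber)] = 0.0

# rho(a,b) = inf over chains of sums of d_tilde = all-pairs shortest paths (Floyd-Warshall)
for k in range(len(points)):
    D = np.minimum(D, D[:, [k]] + D[[k], :])

print(round(D[0, -1], 6))   # rho(0, 3) = 2.0, not 3.0
```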

Candidate 2 already appeared in my post on Lipschitz quotient, but wasn’t recognized at the time. It could be called (1,1)-Lipschitz quotient, but a better name is available. A map f\colon Y\to X is a submetry if f(B_Y(y,r))=B_X(f(y),r) where the balls are closed (using open balls yields something almost equivalent, but generally weaker). Such f need not be an isometry: consider orthogonal projections in a Euclidean space. It does have to be 1-Lipschitz. The additional property that distinguishes it from general 1-Lipschitz maps is the 2-point lifting property: for every x_0,x_1\in X and every y_0\in f^{-1}(x_0) there is y_1\in f^{-1}(x_1) such that d_{Y}(y_0,y_1)=d_X(x_0,x_1). Incidentally, this shows that x\mapsto f^{-1}(x) is an isometric embedding of X into the hyperspace \mathcal{H}(Y) which I covered earlier (“This ain’t like dusting crops, boy“).

The concept and the name were introduced by В.Н. Берестовский (V.N. Berestovskii) in his paper “Submetries of space-forms of nonnegative curvature” published in 1987 in Siberian Math. J. Among other things, he proved that a submetry between spheres (of any dimension) is an isometry. Of course, there are many submetries of S^{n-1} onto other spaces: take the quotient by a subgroup of O(n), which can be either discrete or continuous. Are there any submetries that are not quotients by isometries?

Yes, there are. I’ll describe a (modified) example given by Berestovskii and Guijarro (2000). Let \mathbb{H} be the hyperbolic plane realized as the upper half-plane with the metric \frac{dx^2+dy^2}{y^2}. Define f\colon \mathbb{H}\to\mathbb{R} by

\displaystyle f(x,y)=\begin{cases} \log y, \qquad x\le 0 \\ \log\frac{x^2+y^2}{y}, \qquad x\ge 0 \end{cases}

Don’t panic; this is just the signed distance function (in the metric of \mathbb H) to the fat green curve below. I also drew two other level sets of f, to the best of my Friday night line-drawing ability.

[Figure: Weird submetry: the fat green curve and two other level sets of f.]

To convince yourself that f is a submetry, first consider y\mapsto \log y for which the submetry property is clear (it’s the quotient by horizontal translation), and then note that the inversion in the unit circle exchanges horizontal lines (horocycles at infinity) with horocycles at 0. An interesting feature of this submetry is that it is not very smooth: C^{1,1} but not C^2.
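
The failure of C^2 smoothness is easy to verify along the seam x = 0 (a quick symbolic check of mine, not from the paper): the first-order partials of the two formulas agree there, but the second x-derivatives do not.

```python
import sympy as sp

x = sp.symbols('x', real=True)
y = sp.symbols('y', positive=True)        # upper half-plane; the seam is x = 0
f_left = sp.log(y)                        # formula for x <= 0
f_right = sp.log((x**2 + y**2) / y)       # formula for x >= 0

for deriv in [(x,), (y,), (x, x)]:
    left = sp.diff(f_left, *deriv).subs(x, 0)
    right = sp.simplify(sp.diff(f_right, *deriv).subs(x, 0))
    print(deriv, left, right)
# first partials agree (0 and 1/y), but f_xx jumps from 0 to 2/y**2 across x = 0
```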

This ain’t like dusting crops, boy

The hyperspace is a set of sets equipped with a metric or at least with a topology. Given a metric space X, let \mathcal{H}(X) be the set of all nonempty closed subsets of X with the Hausdorff metric: d(A,B)<r if no matter where you are in one set, you can jump into the other by traveling less than r. So, the distance between letters S and U is the length of the longer green arrow.
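
For finite point sets the definition is straightforward to compute (a small sketch of mine; the two point sets are arbitrary stand-ins for the letters in the picture):

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A, B (one point per row)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)    # pairwise distances
    # from each point of one set, the shortest jump into the other; then the worst case
    return max(D.min(axis=1).max(), D.min(axis=0).max())

A = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
B = np.array([[0.0, 0.1], [3.0, 0.0]])
print(hausdorff(A, B))   # 2.0: the longest required jump is from (3,0) back into A
```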

The requirement of closedness ensures d(A,B)>0 for A\ne B. If X is unbounded, then d(A,B) will be infinite for some pairs of sets, which is natural: the hyperspace contains infinitely many parallel universes which do not interact, being at infinite distance from one another.

[Figure: “Imagine that”.]

Every continuous surjection f\colon X\to Y has an inverse f^{-1}\colon Y\to \mathcal{H}(X) defined in the obvious way: f^{-1}(y)=f^{-1}(y). Yay ambiguous notation! The subset of \mathcal{H}(X) that consists of the singletons is naturally identified with X, so for bijective maps we recover the usual inverse.

Exercise: what conditions on f guarantee that f^{-1} is (a) continuous; (b) Lipschitz? After the previous post it should not be surprising that

  • Even if f is open and continuous, f^{-1} may be discontinuous.
  • If f is a Lipschitz quotient, then f^{-1} is Lipschitz.

Proofs are not like dusting crops—they are easier.