Lightness, hyperspace, and lower oscillation bounds

When does a map {f\colon X\to Y} admit a lower “anti-continuity” bound like {d_Y(f(a),f(b))\ge \lambda(d_X(a,b))} for some function {\lambda\colon (0,\infty)\to (0, \infty)} and for all {a\ne b}? The answer is easy: {f} must be injective and its inverse must be uniformly continuous. End of story.

But recalling what happened with diameters of connected sets last time, let’s focus on the inequality {\textrm{diam}\, f(E)\ge \lambda (\textrm{diam}\, E)} for connected subsets {E\subset X}. If such a {\lambda} exists, the map {f} has the LOB property, for “lower oscillation bound” (the oscillation of a map on a set being the diameter of the image). The LOB property does not require {f} to be injective. On the real line, {f(x)=|x|} satisfies it with {\lambda(\delta)=\delta/2}: since it simply folds the line, the worst that can happen to the diameter of an interval is to be halved. Similarly, {f(x)=x^2} admits the lower oscillation bound {\lambda(\delta) = (\delta/2)^2}, the worst case being an interval centered at {0}. This bound decays faster than linearly at {0}, indicating some amount of squeezing. One may check that every nonconstant polynomial has the LOB property as well.
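As a quick numerical sanity check, here is a minimal sketch (the test functions, the window {[-3,3]}, and the sampling densities are arbitrary choices of mine) that estimates {\lambda(\delta)} by minimizing the oscillation over subintervals of length {\delta}:

```python
import numpy as np

def oscillation(f, a, b, samples=1000):
    """Diameter of f([a, b]), estimated by sampling the interval."""
    y = f(np.linspace(a, b, samples))
    return y.max() - y.min()

def lob(f, delta, lo=-3.0, hi=3.0, trials=2000):
    """Estimate lambda(delta): the infimum of diam f(E) over
    subintervals E of [lo, hi] with diam E = delta."""
    return min(oscillation(f, a, a + delta)
               for a in np.linspace(lo, hi - delta, trials))

delta = 1.0
print(lob(np.abs, delta))     # ~0.5  = delta/2,   worst interval [-1/2, 1/2]
print(lob(np.square, delta))  # ~0.25 = (delta/2)^2, same worst interval
```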

On the other hand, the exponential function {f(x)=e^x} does not have the LOB property, since {\textrm{diam}\, f([x,x+1])} tends to {0} as {x\to-\infty}. No surprise there; we know from the relation of continuity and uniform continuity that things like that happen on a non-compact domain.

Also, a function that is constant on some nontrivial connected set will obviously fail LOB. In topology, a mapping is called light if the preimage of every point is totally disconnected, which is exactly the same as not being constant on any nontrivial connected set. So, lightness is necessary for LOB, but not sufficient, as {e^x} shows.

Theorem 1: Every continuous light map {f\colon X\to Y} with compact domain {X} admits a lower oscillation bound.

Proof. Suppose not. Then there exists {\epsilon>0} and a sequence of connected subsets {E_n\subset X} such that {\textrm{diam}\, E_n\ge \epsilon} and {\textrm{diam}\, f(E_n)\to 0}. We may assume each {E_n} is compact: replacing {E_n} by its closure {\overline{E_n}} preserves connectedness, does not decrease the diameter, and does not increase the oscillation, since {f(\overline{E_n})\subset \overline{f(E_n)}} by continuity.

The space of nonempty compact subsets of {X} is called the hyperspace of {X}; when equipped with the Hausdorff metric, it becomes a compact metric space itself. Pass to a convergent subsequence, still denoted {\{E_n\}}. Its limit {E} is connected, since a Hausdorff limit of connected compact sets is connected, and has diameter at least {\epsilon}, because diameter is a continuous function on the hyperspace. Finally, using the uniform continuity of {f} we get {\textrm{diam}\, f(E) = \lim \textrm{diam}\, f(E_n) = 0}, so {f} is constant on the nontrivial connected set {E}, contradicting its lightness. {\quad \Box}

A homeomorphism lacking LOB

Here is another example to demonstrate the importance of compactness (not just boundedness) and continuity: on the domain {X = \{(x,y)\colon 0 < x < 1, 0 < y < 1\}} define {f(x,y)=(x,xy)}. This is a homeomorphism, the inverse being {(u,v)\mapsto (u, v/u)}. Yet it fails LOB because the image of the line segment {\{x\}\times (0,1)} has diameter {x}, which can be arbitrarily close to {0}. So, the lack of compactness hurts. Extending {f} to the closed square in a discontinuous way, say by letting it be the identity map on the boundary, we see that continuity is also needed, although it is slightly non-intuitive that one needs continuity (essentially an upper oscillation bound) to estimate oscillation from below.

All that said, on a bounded interval of the real line we need neither compactness nor continuity.

Theorem 2: If {I\subset \mathbb R} is a bounded interval, then every light map {f\colon I\to Y} admits a lower oscillation bound.

Proof. Following the proof of Theorem 1, consider a sequence of intervals {(a_n, b_n)\subset I} such that {b_n-a_n\ge \epsilon} and {\textrm{diam}\, f((a_n,b_n))\to 0}. There is no loss of generality in considering open intervals, since passing from a connected set to its interior can only make the diameter of the image smaller. Also, passing to a subsequence, suppose {a_n\to a} and {b_n\to b}; this uses the boundedness of {I}. Consider a nontrivial closed interval {[c,d]\subset (a,b)}. For all sufficiently large {n} we have {[c,d]\subset (a_n,b_n)}, which implies {\textrm{diam}\, f([c,d])\le \textrm{diam}\, f((a_n,b_n))\to 0}. Thus {f} is constant on {[c,d]}, contradicting lightness. {\quad \Box}

The property that distinguishes the real line here is that its nontrivial connected sets have nonempty interior. The same works on the circle and various tree-like spaces, but fails for spaces that don’t look one-dimensional.

Graphical convergence

The space of continuous functions (say, on {[0,1]}) is usually given the uniform metric: {d_u(f,g) = \sup_{x}|f(x)-g(x)|}. In other words, this is the smallest number {\rho} such that from every point of the graph of one function we can jump to the graph of the other function by moving a distance {\le \rho} in the vertical direction.

Uniform metric is based on vertical distances

Now that I put it this way, why don’t we drop “in the vertical direction”? It’ll still be a metric, namely the Hausdorff metric between the graphs of {f} and {g}. It’s natural to call it the graphical metric, denoted {d_g}; from the definition it’s clear that {d_g\le d_u}.

Graphical metric: Hausdorff distance (weird metric, weird color)

Some interesting things happen when the space of continuous functions is equipped with {d_g}. For one thing, it’s no longer a complete space: the sequence {f_n(x)=x^n} is Cauchy in {d_g} but has no limit in {C[0,1]}, since the graphs converge to a set that is not the graph of any continuous function.

Sequence of x^n
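To see this numerically, one can approximate {d_g} by the Hausdorff distance between densely sampled graphs. A minimal sketch (SciPy’s directed_hausdorff computes one-sided distances; the grid size is an arbitrary choice):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def graph(f, n_pts=2001):
    """Sample the graph of f over [0, 1] as a point array."""
    x = np.linspace(0.0, 1.0, n_pts)
    return np.column_stack([x, f(x)])

def d_graphical(f, g):
    """Approximate d_g: symmetric Hausdorff distance between sampled graphs."""
    A, B = graph(f), graph(g)
    return max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])

for n in [5, 20, 80]:
    f = lambda x, n=n: x**n
    g = lambda x, n=n: x**(2 * n)
    d_u = np.abs(graph(f)[:, 1] - graph(g)[:, 1]).max()
    print(n, d_u, d_graphical(f, g))  # d_u stays at 1/4; d_g tends to 0
```

The uniform distance between {x^n} and {x^{2n}} is exactly {1/4} for every {n} (maximize {t-t^2} with {t=x^n}), while the graphical distance shrinks: the sequence is {d_g}-Cauchy.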

On the other hand, the bounded subsets of {(C[0,1],d_g)} are totally bounded. Indeed, given {M>0} and {\epsilon>0} we can cover the rectangle {[0,1]\times [-M,M]} by a mesh of rectangles of diameter at most {\epsilon}. For each function with {\sup|f|\le M}, consider the set of rectangles that its graph visits. There are finitely many possibilities for this set of visited rectangles, and two functions that share the same set of visited rectangles are at graphical distance at most {\epsilon} from each other.

Boxy approximation to the graph

Thus, the completion of {C[0,1]} in the graphical metric should be a nice space: bounded closed subsets will be compact in it. What is this completion, concretely?

Here is a partial answer: if {(f_n)} is a graphically Cauchy sequence, its limit is the compact set {\{(x,y): g(x)\le y\le h(x)\}} where

\displaystyle    g(x) = \inf_{x_n\rightarrow x} \liminf f_n(x_n)

(the infimum taken over all sequences converging to {x}), and

\displaystyle    h(x) = \sup_{x_n\rightarrow x} \limsup f_n(x_n)

It’s not hard to see that {g} is lower semicontinuous and {h} is upper semicontinuous, which is consistent with the limit set {\{(x,y): g(x)\le y\le h(x)\}} being closed. Of course, {g\le h}. It seems that the set of such pairs {(g,h)} indeed describes the graphical completion of continuous functions.

For example, the limit of {f_n(x)=x^n} is described by the pair {g(x)\equiv 0}, {h(x)=\chi_{\{1\}}}. Geometrically, it’s a broken line with horizontal and vertical segments.

For another example, the limit of {f_n(x)=\sin^2 nx} is described by the pair {g(x)\equiv 0}, {h(x)\equiv 1}. Geometrically, it’s a square.

Stripey sines

Squarish polynomials

For some reason I wanted to construct polynomials approximating this piecewise constant function {f}:

How to approximate this with polynomials? So square.

Of course the approximation cannot be uniform, since the function is not continuous. But it can be achieved in the sense of convergence of graphs in the Hausdorff metric: their limit should be the “graph” shown above, with the vertical line included. In concrete terms, this means that for every {\epsilon>0} there is {N} such that for {n\ge N} the polynomial {p_n} satisfies

\displaystyle  |p_n-f|\le \epsilon\quad \text{ on }\ [0,2]\setminus [1-\epsilon,1+\epsilon]

and also

\displaystyle  -\epsilon\le  p_n  \le 1+\epsilon\quad \text{ on }\ [1-\epsilon,1+\epsilon]

How to get such {p_n} explicitly? I started with the functions {f_m(x) = \exp(-x^m)} for large {m}. The idea is that as {m\rightarrow\infty}, the limit of {\exp(-x^m)} is what is wanted: {1} when {x<1}, {0} when {x>1}. Also, for each {m} there is a Taylor polynomial {T_{m,n}} that approximates {f_m} uniformly on {[0,2]}. Since the Taylor series is alternating, it is not hard to find a suitable {n} (see the sketch after the list). Let’s shoot for {\epsilon=0.01} in the Taylor remainder and see where this leads:

  • Degree {7} polynomial for {\exp(-x)}
  • Degree {26} polynomial for {\exp(-x^2)}
  • Degree {69} polynomial for {\exp(-x^3)}
  • Degree {180} polynomial for {\exp(-x^4)}
  • Degree {440} polynomial for {\exp(-x^5)}
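These degrees are easy to reproduce: since the series alternates, the remainder at {x=2} is at most the first omitted term {2^{m(n+1)}/(n+1)!}. A few lines of code (a sketch; the function name is mine) recover the list above:

```python
from math import factorial

def taylor_degree(m, x=2.0, eps=0.01):
    """Smallest degree m*n such that the first omitted term of the
    alternating series exp(-x^m) = sum (-1)^k x^(m k) / k! is below eps."""
    n = 0
    while x ** (m * (n + 1)) / factorial(n + 1) >= eps:
        n += 1
    return m * n

print([taylor_degree(m) for m in [1, 2, 3, 4, 5]])  # [7, 26, 69, 180, 440]
```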

The results are unimpressive, though:

Taylor polynomials of exp(-x^m) are not so square

To get within {0.01} of the desired square-ness, we need {\exp(-1.01^m)<0.01} to the right of the jump and {\exp(-0.99^m)>0.99} to the left. The second condition is the binding one: it requires {0.99^m\le -\ln 0.99\approx 0.01}, that is, {m\ge 458}. Then, to have the Taylor remainder bounded by {0.01} at {x=2}, we need {2^{458n}/n! < 0.01}. Instead of messing with Stirling’s formula, just observe that {2^{458n}/n!} does not even begin to decrease until {n} exceeds {2^{458}}, which is more than {10^{137}}. That’s a … high degree polynomial. I would not try to ask a computer algebra system to plot it.

Bernstein polynomials turn out to work better. On the interval {[0,2]} they are given by

\displaystyle    p_n(x) = 2^{-n} \sum_{k=0}^n f(2k/n) \binom{n}{k} x^k (2-x)^{n-k}

To avoid dealing with {f(1)}, it is better to use odd degrees. For comparison, I used the same or smaller degrees as above: {7, 25, 69, 179, 439}.

Squarish Bernstein polynomials

Looks good. But I don’t know of a way to estimate the degree of the Bernstein polynomial required to obtain Hausdorff distance less than a given {\epsilon} (say, {0.01}) from the square function.
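Lacking an estimate, one can at least measure it experimentally. Here is a minimal sketch (the discretization of the limiting “graph”, vertical segment included, is an arbitrary choice of mine) that builds {p_n} from the formula above and computes the Hausdorff distance to it:

```python
import numpy as np
from math import comb
from scipy.spatial.distance import directed_hausdorff

def f(t):
    """The square function on [0, 2]: 1 left of the jump, 0 right of it."""
    return 1.0 if t < 1.0 else 0.0

def bernstein(n, x):
    """Bernstein polynomial of f on [0, 2]; odd n avoids evaluating f(1)."""
    return sum(f(2 * k / n) * comb(n, k) * x**k * (2 - x)**(n - k)
               for k in range(n + 1)) / 2.0**n

x = np.linspace(0.0, 2.0, 1001)
# The limiting "graph": two horizontal pieces plus the vertical segment at x = 1.
limit = np.vstack([np.column_stack([x, np.where(x < 1, 1.0, 0.0)]),
                   np.column_stack([np.ones(201), np.linspace(0.0, 1.0, 201)])])

for n in [7, 25, 69, 179, 439]:
    graph_n = np.column_stack([x, [bernstein(n, t) for t in x]])
    d = max(directed_hausdorff(graph_n, limit)[0],
            directed_hausdorff(limit, graph_n)[0])
    print(n, round(d, 3))  # the distance should decrease as n grows
```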

Walking dogs and comparing sticks

Then he dropped two in at once, and leant over the bridge to see which of them would come out first; and one of them did; but as they were both the same size, he didn’t know if it was the one which he wanted to win, or the other one. — A. A. Milne

It’s useful to have a way of measuring how different two sticks (or fir cones) are in size, shape, and position in a river. Yes, we have the Hausdorff distance {d_H} between sets, but it does not take into account the orientation of the sticks. And it performs poorly when the sticks are broken: the Hausdorff distance between these blue and red curves does not capture the disparity of their shapes:

Small Hausdorff distance, totally different curves

Indeed, {d_H} is relatively small here, because from any point of the red curve one can easily jump to some point of the blue curve, and the other way around. However, this kind of measurement completely ignores the fact that curves are meant to be traveled along in a continuous, monotone way.

There is a concept of distance that is better suited for comparing curves: the Fréchet distance {d_F}. Wikipedia gives this (folklore) description:

Imagine a dog walking along one curve and the dog’s owner walking along the other curve, connected by a leash. Both walk continuously along their respective curve from the prescribed start point to the prescribed end point of the curve. Both may vary their speed, and even stop, at arbitrary positions and for arbitrarily long. However, neither can backtrack. The Fréchet distance between the two curves is the length of the shortest leash that is sufficient for traversing both curves in this manner.


To get started, let’s compute this distance for two oriented line segments {AB} and {CD}. The length of the leash must be at least {|AC|} in order to begin the walk, and at least {|BD|} to finish. So,

\displaystyle    d_F(AB,CD) \ge \max(|AC|, |BD|)

In fact, equality holds here. In order to bound {d_F} from above, we just need one parametrization of the segments. Take the parametrization proportional to length:

\displaystyle    P=(1-t)A+tB,\quad Q=(1-t)C+tD

Then {|PQ|^2} is a quadratic polynomial in {t}. Without doing any computations, we can say that the coefficient of {t^2} is nonnegative, because {|PQ|^2} cannot be negative for any {t\in\mathbb R}. Hence, this polynomial is a convex function of {t}, which implies that its maximum on the interval {[0,1]} is attained at an endpoint. And the endpoints we already considered. (By the way, this proof works in every CAT(0) metric space.)


In general, the Fréchet distance is not realized by constant-speed parametrization. Consider these two curves, each with a long detour:

Symmetry breaking

It would be impractical for the dog and the owner to go on the detour at the same time. One should go first while the other waits for his/her/its turn. In particular, we see symmetry breaking here: even for two perfectly symmetric curves, the Fréchet-optimal parametrizations would not be symmetric to each other.


It is not obvious from the definition of {d_F} whether it is a metric; as usual, it’s the triangle inequality that is suspect. However, {d_F} indeed satisfies the triangle inequality. To prove this, we should probably formalize the definition of {d_F}. Given two continuous maps {f,g} from {[0,1]} into {\mathbb R^2} (or any metric space), define

\displaystyle    d_F(f,g) = \inf_{\phi,\psi}\max_{[0,1]} |f\circ \phi-g\circ \psi|

where {\phi} and {\psi} range over all nondecreasing functions from {[0,1]} onto itself. Actually, we can require {\phi} and {\psi} to be strictly increasing (it only takes a small perturbation), which in dog/owner terms means they are not allowed to stop, but can mosey along as slowly as they want. Then we don’t need both {\phi} and {\psi}, since

\displaystyle    \max_{[0,1]} |f\circ \phi-g\circ \psi| = \max_{[0,1]} |f -g\circ (\psi\circ \phi^{-1})|

So, given {f,g,h} we can pick {\phi} such that {\max_{[0,1]} |f-g\circ \phi|} is within {\epsilon} of {d_F(f,g)}; then pick {\psi} such that {\max_{[0,1]} | g\circ \phi - h\circ \psi | } is within {\epsilon} of {d_F(g,h)}. Then

\displaystyle    d_F(f,h)\le \max_{[0,1]} |f- g\circ \phi|+\max_{[0,1]} |g\circ \phi - h\circ \psi|   \le d_F(f,g)+d_F(g,h)+2\epsilon

and the triangle inequality follows.
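For polygonal curves there is a standard computable analogue, the discrete Fréchet distance of Eiter and Mannila, in which the dog and the owner hop from vertex to vertex. A minimal dynamic-programming sketch (the implementation details are mine):

```python
import numpy as np
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polygonal curves given by
    vertex arrays P and Q (the Eiter-Mannila recursion)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)

    @lru_cache(maxsize=None)
    def c(i, j):
        d = float(np.linalg.norm(P[i] - Q[j]))
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i, j - 1), c(i - 1, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)

# Two oriented segments AB and CD, represented by their endpoints only:
# the answer is max(|AC|, |BD|), as computed above.
print(discrete_frechet([(0, 0), (1, 0)], [(0, 1), (2, 1)]))  # sqrt(2)
```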

Continuity of circumcenter and circumradius

For a bounded set {A} on the plane (or in any Euclidean space) one can define the circumcenter {c(A)} and circumradius {r(A)} as follows: {r(A)} is the smallest radius of a closed disk containing {A}, and {c(A)} is the center of such a disk. (Other terms in use: Chebyshev center and Chebyshev radius.)

Smallest disk covering the points

The fact that {c(A)} is well-defined may not be obvious: what if there are multiple disks of radius {r(A)} that contain {A}? To investigate, introduce the farthest distance function {f_A(x) = \sup_{a\in A} |x-a|}. By definition, {c(A)} is where {f_A} attains its minimum. The function {f_A} is convex, being the supremum of a family of convex functions. However, that does not guarantee the uniqueness of its minimum. We have two issues here:

  • {x\mapsto |x-a|} is not strictly convex
  • the supremum of an infinite family of strictly convex functions can fail to be strictly convex (like {\sup_{1<p<2} x^p = x} on the interval {[0,1]}).

The first issue is resolved by squaring {f_A}. Indeed, {f_A^2} attains its minimum at the same place where {f_A} does, and {f_A(x)^2 = \sup_{a\in A} |x-a|^2} where each term {|x-a|^2} is strictly convex.

Also, we don’t want to lose strict convexity when taking the supremum over {a\in A}. For this purpose, we must replace strict convexity by something more robust. The appropriate substitute is strong convexity: a function {f} is strongly convex if there is {\lambda>0} such that {f(x)-\lambda |x|^2} is convex. Let’s say that {f} is {\lambda}-convex in this case.

Since {|x-a|^2-|x|^2 = -2\langle x,a\rangle + |a|^2} is a convex (in fact affine) function of {x}, we see that {|x-a|^2} is {1}-convex. This property passes to the supremum: subtracting {|x|^2} from the supremum is the same as subtracting it from each term. Strong convexity implies strict convexity and with it, the uniqueness of the minimum point. So {c(A)}, the minimum point of {f_A^2}, is uniquely defined. (Finding it in practice may be difficult; the spherical version of this problem is considered in Covering points with caps.)
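Still, for a finite set a generic convex minimization works, precisely because {f_A^2} is strongly convex. A minimal numerical sketch (the solver and tolerances are arbitrary choices of mine):

```python
import numpy as np
from scipy.optimize import minimize

def circumcenter_radius(A):
    """Chebyshev center and radius of a finite planar point set A,
    found by minimizing the squared farthest-distance function."""
    A = np.asarray(A, float)
    obj = lambda x: np.max(np.sum((A - x) ** 2, axis=1))  # f_A(x)^2
    res = minimize(obj, A.mean(axis=0), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    return res.x, np.sqrt(res.fun)

c, r = circumcenter_radius([(-1, 0), (1, 0), (1, 0.2)])
print(c, r)  # right triangle: center at the hypotenuse midpoint (0, 0.1)
```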


Having established uniqueness, it is natural to ask about stability, or more precisely, the continuity of {c(A)} and {r(A)} with respect to {A}. Introduce the Hausdorff distance {d_{\mathcal H}} on the set of bounded subsets. By definition, {d_{\mathcal H}(A,B)\le \delta} if {A} is contained in the {\delta}-neighborhood of {B}, and {B} is contained in the {\delta}-neighborhood of {A}. It is easy to see that {r(B)\le r(A) + d_{\mathcal H}(A,B)}, and therefore

\displaystyle  |r(A)-r(B)|\le d_{\mathcal H}(A,B)

In words, the circumradius is a {1}-Lipschitz function of the set.

What about the circumcenter? If the set {A} is shifted by {d} units in some direction, the circumcenter moves by the same amount. So it may appear that it should also be a {1}-Lipschitz function of {A}. But this is false.

Observe (or recall from middle-school geometry) that the circumcenter of a right triangle is the midpoint of its hypotenuse:

Circumcenter of a right triangle

Consider two right triangles:

  • Vertices {(-1,0), (1,0), (1,\epsilon)}. The right angle is at {(1,0)}, and the circumcenter is the midpoint of the opposite side: {(0,\epsilon/2)}.
  • Vertices {(-1,0), (1,0), (\sqrt{1-\epsilon^2},\epsilon)}. The right angle is at {(\sqrt{1-\epsilon^2},\epsilon)}, and the circumcenter is at {(0,0)}.

The Hausdorff distance between these two triangles is merely {1-\sqrt{1-\epsilon^2} < \epsilon^2}, yet the distance between their circumcenters is {\epsilon/2}. So, Lipschitz continuity fails, and the most we can hope for is Hölder continuity with exponent {1/2}.
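This example can be checked numerically with the circumcenter_radius sketch above: the center gap scales like {\epsilon/2} while the Hausdorff distance between the vertex sets scales like {\epsilon^2/2}.

```python
import numpy as np

for eps in [0.2, 0.1, 0.05]:
    c1, _ = circumcenter_radius([(-1, 0), (1, 0), (1, eps)])
    c2, _ = circumcenter_radius([(-1, 0), (1, 0), (np.sqrt(1 - eps**2), eps)])
    d_H = 1 - np.sqrt(1 - eps**2)  # Hausdorff distance between the vertex sets
    print(eps, np.linalg.norm(c1 - c2), d_H)  # gap ~ eps/2, d_H ~ eps^2/2
```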

And indeed, the circumcenter is locally {1/2}-Hölder continuous. To prove this, suppose {|c(A)-c(B)|\ge \epsilon}. The {1}-convexity of {f_A^2} means that it exceeds its minimum value by at least the squared distance to the minimum point:

\displaystyle  f_A(c(B))^2 \ge f_A(c(A))^2+|c(A)-c(B)|^2 \ge r(A)^2 + \epsilon^2

On the other hand, since {|f_A-f_B|\le d_{\mathcal H}(A,B)} everywhere,

\displaystyle  f_A(c(B))\le f_B(c(B)) + d_{\mathcal H}(A,B) = r(B) + d_{\mathcal H}(A,B) \le r(A) + 2 d_{\mathcal H}(A,B)

Putting things together,

\displaystyle  d_{\mathcal H}(A,B) \ge \frac12 (\sqrt{r(A)^2 + \epsilon^2} - r(A))  = \frac{\epsilon^2}{2( \sqrt{r(A)^2 + \epsilon^2} + r(A) )}

Thus, as long as {r(A)} remains bounded above, we have an inequality of the form {d_{\mathcal H}(A,B) \ge c\, \epsilon^2}, which is exactly {1/2}-Hölder continuity.


Remark. The proof uses no information about {\mathbb R^n} other than the {1}-convexity of the squared distance function. As such, it applies to every CAT(0) space.

This ain’t like dusting crops, boy

The hyperspace is a set of sets equipped with a metric or at least with a topology. Given a metric space {X}, let {\mathcal{H}(X)} be the set of all nonempty closed subsets of {X} with the Hausdorff metric: {d(A,B)<r} if no matter where you are in one set, you can jump into the other by traveling less than {r}. So, the distance between the letters S and U is the length of the longer green arrow.

The requirement of closedness ensures {d(A,B)>0} for {A\ne B}. If {X} is unbounded, then {d(A,B)} will be infinite for some pairs of sets, which is natural: the hyperspace contains infinitely many parallel universes which do not interact, being at infinite distance from one another.

Imagine that

Every continuous surjection {f\colon X\to Y} has an inverse {f^{-1}\colon Y\to \mathcal{H}(X)} defined in the obvious way: {f^{-1}(y)=f^{-1}(y)}. Yay ambiguous notation! The subset of {\mathcal{H}(X)} that consists of the singletons is naturally identified with {X}, so for bijective maps we recover the usual inverse.

Exercise: what conditions on {f} guarantee that {f^{-1}} is (a) continuous; (b) Lipschitz? After the previous post it should not be surprising that

  • Even if {f} is open and continuous, {f^{-1}} may be discontinuous.
  • If {f} is a Lipschitz quotient, then {f^{-1}} is Lipschitz.

Proofs are not like dusting crops—they are easier.