Lightness, hyperspace, and lower oscillation bounds

When does a map {f\colon X\to Y} admit a lower “anti-continuity” bound like {d_Y(f(a),f(b))\ge \lambda(d_X(a,b))} for some function {\lambda\colon (0,\infty)\to (0, \infty)} and for all {a\ne b}? The answer is easy: {f} must be injective and its inverse must be uniformly continuous. End of story.

But recalling what happened with diameters of connected sets last time, let’s focus on the inequality {\textrm{diam}\, f(E)\ge \lambda (\textrm{diam}\, E)} for connected subsets {E\subset X}. If such {\lambda} exists, the map {f} has the LOB property, for “lower oscillation bound” (oscillation being the diameter of the image). The LOB property does not require {f} to be injective. On the real line, {f(x)=|x|} satisfies it with {\lambda(\delta)=\delta/2}: since it simply folds the line, the worst that can happen to the diameter of an interval is to be halved. Similarly, {f(x)=x^2} admits a lower oscillation bound {\lambda(\delta) = (\delta/2)^2}. This one decays faster than linear at 0, indicating some amount of squeezing going on. One may check that every nonconstant polynomial has the LOB property as well.
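Here is a quick numerical sanity check of the bound {\lambda(\delta)=(\delta/2)^2} for {f(x)=x^2} (a Sage/Python sketch with my own choice of sampling; not a proof):

# Sanity check (not a proof): f(x) = x^2 should satisfy
# diam f(E) >= (diam E / 2)^2 for every interval E.
import random

def image_diam(c, d, num=1000):
    # approximate diam f([c, d]) for f(x) = x^2 by sampling the interval
    vals = [(c + (d - c) * k / num) ** 2 for k in range(num + 1)]
    return max(vals) - min(vals)

ratios = []
for _ in range(1000):
    c = random.uniform(-5, 5)
    delta = random.uniform(0.1, 3.0)
    ratios.append(image_diam(c, c + delta) / (delta / 2) ** 2)

print(min(ratios))   # stays at or above 1 (up to sampling error); near 1 for intervals centered near 0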

On the other hand, the exponential function {f(x)=e^x} does not have the LOB property, since {\textrm{diam}\, f([x,x+1])} tends to {0} as {x\to-\infty}. No surprise there; we know from the relationship between continuity and uniform continuity that such things happen on non-compact domains.

Also, a function that is constant on some nontrivial connected set will obviously fail LOB. In topology, a mapping is called light if the preimage of every point is totally disconnected, which is exactly the same as not being constant on any nontrivial connected set. So, lightness is necessary for LOB, but not sufficient as {e^x} shows.

Theorem 1: Every continuous light map {f\colon X\to Y} with compact domain {X} admits a lower oscillation bound.

Proof. Suppose not. Then there exists {\epsilon>0} and a sequence of connected subsets {E_n\subset X} such that {\textrm{diam}\, E_n\ge \epsilon} and {\textrm{diam}\, f(E_n)\to 0}. We may assume each {E_n} is compact: replacing {E_n} by its closure {\overline{E_n}} does not decrease its diameter, and since {f(\overline{E_n})\subset \overline{f(E_n)}} by continuity, it does not change {\textrm{diam}\, f(E_n)} either.

The space of nonempty compact subsets of {X} is called the hyperspace of {X}; when equipped with the Hausdorff metric, it becomes a compact metric space itself. Pass to a convergent subsequence, still denoted {\{E_n\}}. Its limit {E} has diameter at least {\epsilon}, because diameter is a continuous function on the hyperspace; and {E} is connected, being the Hausdorff limit of connected compact sets. Finally, the uniform continuity of {f} gives {\textrm{diam}\, f(E) = \lim \textrm{diam}\, f(E_n) = 0}, so {f} is constant on the nontrivial connected set {E}, contradicting the lightness of {f}. {\quad \Box}

A homeomorphism lacking LOB

Here is another example to demonstrate the importance of compactness (not just boundedness) and continuity: on the domain {X = \{(x,y)\colon 0 < x < 1, 0 < y < 1\}} define {f(x,y)=(x,xy)}. This is a homeomorphism, the inverse being {(u,v)\mapsto (u, v/u)}. Yet it fails LOB because the image of the line segment {\{x\}\times (0,1)} has diameter {x}, which can be arbitrarily close to 0, even though the segment itself has diameter {1}. So, the lack of compactness hurts. Extending {f} to the closed square in a discontinuous way, say by letting it be the identity map on the boundary, we see that continuity is also needed, although it’s slightly non-intuitive that one needs continuity (essentially an upper oscillation bound) to estimate oscillation from below.

All that said, on a bounded interval of the real line we need neither compactness nor continuity.

Theorem 2: If {I\subset \mathbb R} is a bounded interval, then every light map {f\colon I\to Y} admits a lower oscillation bound.

Proof. Following the proof of Theorem 1, consider a sequence of intervals {(a_n, b_n)} such that {b_n-a_n\ge \epsilon} and {\textrm{diam}\, f((a_n,b_n))\to 0}. There is no loss of generality in considering open intervals, since passing to an open interval can only make the diameter of the image smaller. Also WLOG, suppose {a_n\to a} and {b_n\to b}; this uses the boundedness of {I}. Consider a nontrivial closed interval {[c,d]\subset (a,b)}. For all sufficiently large {n} we have {[c,d]\subset (a_n,b_n)}, which implies {\textrm{diam}\, f([c,d])\le \textrm{diam}\, f((a_n,b_n))\to 0}. Thus {f} is constant on {[c,d]}, contradicting its lightness. {\quad \Box}

The property that distinguishes the real line here is that its nontrivial connected sets have nonempty interior. The same argument works on the circle and on various tree-like spaces, but it fails for spaces that don’t look one-dimensional.

Words that contain UIO, and best-fitting lines

In Calculus I we spend a fair amount of time talking about how nicely the tangent line fits a smooth curve.

Tangent line to exponential function at x=0

But truth be told, it fits only near the point of tangency. How can we find the best approximating line for a function {f} on a given interval?

Best linear fit to the exponential function on [0,1]

A natural measure of quality of approximation is the maximum deviation of the curve from the line, {E(f;\alpha,\beta) = \max_{x\in [a, b]} |f(x) - \alpha x-\beta|}, where {\alpha,\beta} are the coefficients in the line equation, to be determined. We need {\alpha,\beta} that minimize {E(f;\alpha,\beta)}.
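For experimenting, here is a small Sage helper (hypothetical, not from the original post) that approximates {E(f;\alpha,\beta)} by sampling a uniform grid:

var('x')

def max_dev(f, alpha, beta, a, b, num=1000):
    # approximate E(f; alpha, beta) = max over [a, b] of |f(x) - alpha*x - beta|
    pts = [a + (b - a) * k / num for k in range(num + 1)]
    return max(abs(n(f(x=t) - alpha*t - beta)) for t in pts)

print(max_dev(sin(x), 0, 0, 0, pi))   # about 1.0: the deviation of sin(x) from the zero line on [0, pi]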

The Chebyshev equioscillation theorem is quite useful here. For one thing, its name contains the letter combination uio, which Scrabble players may appreciate. (Can you think of other words with this combination?) Also, its statement does not involve concepts outside of Calculus I. Specialized to the case of linear fit, it says that {\alpha,\beta} are optimal if and only if there exist three numbers { x_1<x_2<x_3} in {[a, b]} such that the deviations {\delta_i = f(x_i) - \alpha x_i-\beta}

  • are equal to {E(f;\alpha,\beta)} in absolute value: {|\delta_i| = E(f;\alpha,\beta)} for {i=1,2,3}
  • have alternating signs: {\delta_1 = -\delta_2 = \delta_3}

Let’s consider what this means. First, {f'(x_i) =\alpha} unless {x_i} is an endpoint of {[a,b]}, because the deviation {f(x) - \alpha x-\beta} attains an extremum at each {x_i}. Since {x_2} cannot be an endpoint, we have {f'(x_2)=\alpha}.

Furthermore, {f(x) - \alpha x } takes the same value at {x_1} and {x_3}, because {\delta_1=\delta_3}. Combined with {\alpha = f'(x_2)}, this gives an equation for {x_2}

\displaystyle    f(x_1)-f'(x_2)x_1 = f(x_3)-f'(x_2) x_3    \qquad \qquad (1)

which can be rewritten in the form resembling the Mean Value Theorem:

\displaystyle    f'(x_2) = \frac{f(x_1)-f(x_3)}{x_1-x_3}   \qquad \qquad (2)

If {f'} is strictly monotone, there can be only one {x_i} with {f'(x_i)=\alpha}. Hence {x_1=a} and {x_3=b} in this case, and we find {x_2} by solving (2). This gives {\alpha = f'(x_2)}, and then {\beta = \frac12\left(f(x_1)-\alpha x_1 + f(x_2)-\alpha x_2\right)} follows from the condition {\delta_1=-\delta_2}.

Here is how I did this in Sage:

var('x a b')
f = sin(x)  # or another function
df = f.diff(x)
a = 0   # left endpoint (example value)
b = pi  # right endpoint (example value)

That was the setup. Now the actual computation:

var('x1 x2 x3')
x1 = a          # f' is strictly monotone, so x1 and x3 are the endpoints
x3 = b
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, a, b)   # equation (1)
alpha = df(x=x2)
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)   # from delta_1 = -delta_2
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))
Fitting a line to the sine curve
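As a quick sanity check (assuming the example values {a=0}, {b=\pi} above), the three deviations should have equal absolute value and alternating signs:

devs = [f(x=t) - alpha*t - beta for t in (x1, x2, x3)]
print([n(d) for d in devs])   # roughly [-0.5, 0.5, -0.5] for f = sin(x) on [0, pi]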

However, the algorithm fails to properly fit a line to the sine function on {[0,3\pi/2]}:

Not the best fit, obviously

The problem is, {f'(x)=\cos x} is no longer monotone, making it possible for two of {x_i} to be interior points. Recalling the identities for cosine, we see that these points must be symmetric about {x=\pi}. One of {x_i} must still be an endpoint, so either {x_1=a} (and {x_3=2\pi-x_2}) or {x_3=b} (and {x_1=2\pi-x_2}). The first option works:

Best fit on [0,3pi/2]
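Here is a sketch of how the first option can be computed, reusing the setup above with {f=\sin x} (the bracket {(\pi/2,\pi)} passed to find_root is my own choice):

var('x2')                       # make x2 symbolic again
a = 0
b = 3*pi/2
x1 = a
x3 = 2*pi - x2                  # the two interior critical points are symmetric about pi
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, pi/2, pi)
x3 = 2*pi - x2                  # now a number
alpha = df(x=x2)
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))

The computed slope is about {-0.217}, consistent with the value quoted below.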

This same line is also the best fit on the full period {[0,2\pi]}. It passes through {(\pi,0)} and has slope {-0.2172336...}, which is not a number I can recognize.

On the interval {[0,4\pi]}, all three of the above approaches fail:

Wrong!
Also wrong!
What were you thinking?

Luckily we don’t need a computer in this case. Whenever {|f|} attains its maximum at three or more points at which the signs of {f} alternate, the Chebyshev equioscillation theorem implies that the best linear fit is the zero function.

Scaling and oscillation

A function {f\colon \mathbb R\rightarrow\mathbb R} can be much larger than its derivative. Take the constant function {f(x)=10^{10}}, for example. Or {f(x)=10^{10}+\sin x} to make it nonconstant. But if one subtracts the average (mean) from {f}, the residual is nicely estimated by the derivative:

\displaystyle    \frac{1}{b-a}\int_a^b |f(x)-\overline{f}|\,dx \le \frac12 \int_a^b |f'(x)|\,dx    \ \ \ \ \ (1)

Here {\overline{f}} is the mean of {f} on {[a,b]}, namely {\overline{f}=\frac{1}{b-a}\int_a^b f(t)\,dt}. Indeed, what’s the worst that could happen? Something like this:

Deviation from the mean

Here {H} is at most the integral of {|f'|}, and the shaded area is at most {\frac12 H(b-a)}. This is what the inequality (1) says.
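Here is a quick numerical check of (1) in Sage, for one sample function of my choosing (a sanity check, not part of the argument):

var('x')
f = sin(3*x) + x                      # sample function (my choice)
df = f.diff(x)
a, b = 0, 2
fbar = numerical_integral(f, a, b)[0] / (b - a)                            # mean of f on [a, b]
lhs = numerical_integral(lambda t: abs(f(x=t) - fbar), a, b)[0] / (b - a)
rhs = 1/2 * numerical_integral(lambda t: abs(df(x=t)), a, b)[0]
print(lhs, rhs, lhs <= rhs)           # lhs does not exceed rhs, as (1) predicts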

An appealing feature of (1) is that it is scale-invariant. For example, if we change the variable {u=2x}, both sides remain the same. The derivative goes up by a factor of {2}, but it is integrated over an interval of half the length. And on the left we have averages upon averages, which do not change under scaling.

What happens in higher dimensions? Let’s stick to two dimensions and consider a smooth function {f\colon\mathbb R^2\rightarrow\mathbb R}. Instead of an interval we now have a square, denoted {Q}. It makes sense to denote squares by {Q}, because it’s natural to call a square a cube, and “Q” is the first letter of “cube”. Oh wait, it isn’t. Moving on…

The quantity {b-a} was the length of interval of integration. Now we will use the area of {Q}, denoted {|Q|}. And {\overline{f}=\frac{1}{|Q|}\iint_Q f} is now the mean value of {f} on {Q}. At first glance one might conjecture the following version of (1):

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le C \int_Q |\nabla f(x,y)|\,dx\,dy   \ \ \ \ \ (2)

But this can’t be true because of inconsistent scaling. The left side of (2) is scale-invariant as before. The right side is not: if we shrink the cube by a factor of {2}, the gradient {|\nabla f|} goes up by {2}, but the area goes down by {4}. This suggests that the correct inequality should be

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le C \left(\int_Q |\nabla f(x,y)|^2\,dx\,dy\right)^{1/2}   \ \ \ \ \ (3)

We need the square root so that the right side of (3) scales correctly with {f}: to first power.
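To spell out the scaling: if {Q'} is {Q} shrunk by a factor of {2} and {g(x,y)=f(2x,2y)} on {Q'}, then {\nabla g(x,y) = 2\,(\nabla f)(2x,2y)}, so

\displaystyle    \int_{Q'} |\nabla g| = \frac{2}{4}\int_Q |\nabla f|, \qquad \left(\int_{Q'} |\nabla g|^2\right)^{1/2} = \left(\frac{4}{4}\int_Q |\nabla f|^2\right)^{1/2} = \left(\int_Q |\nabla f|^2\right)^{1/2}

while the left side of (3) does not change.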

And here is the proof. Let {f(*,y)} denote {f} averaged over {x}. Applying (1) to every horizontal segment in {Q}, we obtain

\displaystyle    \frac{1}{h}\iint_Q |f(x,y)-f(*,y)|\,dx\,dy \le \frac12 \int_Q |f_x(x,y)|\,dx\,dy    \ \ \ \ \ (4)

where {h} is the sidelength of {Q}. Now work with {f(*,y)}, using (1) along vertical segments:

\displaystyle  \frac{1}{h}\iint_Q |f(*,y)-f(*,*)|\,dx\,dy \le \frac12 \int_Q |f_y(*,y)|\,dx\,dy    \ \ \ \ \ (5)

Of course, {f(*,*)} is the same as {\overline{f}}. The derivative on the right can be estimated: the absolute value of the derivative of an average does not exceed the average of the absolute value of the derivative. To keep the estimates clean, simply bound both partial derivatives by {|\nabla f|}. Adding (4) and (5) and using the triangle inequality {|f-\overline{f}|\le |f-f(*,y)|+|f(*,y)-\overline{f}|}, it follows that

\displaystyle    \frac{1}{h}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le \int_Q |\nabla f(x,y)|\,dx\,dy    \ \ \ \ \ (6)

This is an interesting result (a form of the Poincaré inequality), but in the present form it’s not scale-invariant. Remember that we expect the square of the gradient on the right. Cauchy-Schwarz to the rescue:

\displaystyle    \int_Q 1\cdot |\nabla f| \le \left( \int_Q 1 \right)^{1/2} \left( \int_Q |\nabla f|^2 \right)^{1/2}

The first factor on the right is simply {h}, because {\int_Q 1 = h^2}. Move it to the left and we are done:

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le \left(\int_Q |\nabla f(x,y)|^2\,dx\,dy\right)^{1/2}   \ \ \ \ \ (7)
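For what it’s worth, here is a crude numerical check of (7) on the unit square {Q=[0,1]^2}, for one sample function of my choosing (a midpoint Riemann sum; a sanity check, not a proof):

import numpy as np

m = 400
dx = 1.0 / m                           # grid spacing; the sidelength of Q is 1
xs = (np.arange(m) + 0.5) * dx         # midpoints of the grid cells
X, Y = np.meshgrid(xs, xs)

F  = np.sin(3*X) * np.exp(Y)           # sample smooth function f(x, y)
Fx = 3*np.cos(3*X) * np.exp(Y)         # its exact partial derivatives
Fy = np.sin(3*X) * np.exp(Y)

mean_f = F.mean()
lhs = np.abs(F - mean_f).mean()                      # (1/|Q|) * integral of |f - mean|
rhs = np.sqrt(((Fx**2 + Fy**2) * dx * dx).sum())     # (integral of |grad f|^2)^(1/2)
print(lhs, rhs, lhs <= rhs)                          # lhs does not exceed rhs, as (7) predicts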

In higher dimensions we would of course have {n} instead of {2}, with Hölder’s inequality in place of Cauchy-Schwarz. This is one of many reasons why analysis in two dimensions is special: {L^n} is a Hilbert space only when {n=2}.

The left side of (7) is the mean oscillation of {f} on the square {Q}. The integrability of {|\nabla f|^n} in {n} dimensions ensures that {f} is a function of bounded mean oscillation, known as BMO. Actually, it is even in the smaller space VMO, because the right side of (7) tends to zero as the square shrinks. But it need not be continuous or even bounded: for {f(x)=\log\log (1/|x|)} the integral of {|\nabla f|^n} converges in a neighborhood of the origin (just barely, thanks to {\log^n (1/|x|)} in the denominator). This is unlike the one-dimensional situation, where the integrability of {|f'|} guarantees that the function is bounded.
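For the record, here is the computation behind that claim (near the origin, with {c_n} denoting the area of the unit sphere):

\displaystyle    |\nabla f(x)| = \frac{1}{|x|\,\log(1/|x|)}, \qquad \int_{|x|<1/2} |\nabla f|^n\,dx = c_n\int_0^{1/2} \frac{dr}{r\,\log^n(1/r)} = c_n\int_{\log 2}^{\infty} \frac{dt}{t^n} < \infty

for {n\ge 2}, by the substitution {t=\log(1/r)}.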