Scaling and oscillation

A function {f\colon \mathbb R\rightarrow\mathbb R} can be much larger than its derivative. Take the constant function {f(x)=10^{10}}, for example. Or {f(x)=10^{10}+\sin x} to make it nonconstant. But if one subtracts the average (mean) from {f}, the residual is nicely estimated by the derivative:

\displaystyle    \frac{1}{b-a}\int_a^b |f(x)-\overline{f}|\,dx \le \frac12 \int_a^b |f'(x)|\,dx    \ \ \ \ \ (1)

Here {\overline{f}} is the mean of {f} on {[a,b]}, namely {\overline{f}=\frac{1}{b-a}\int_a^b f(t)\,dt}. Indeed, what’s the worst that could happen? Something like this:

Deviation from the mean

Here {H} is at most the integral of {|f'|}, and the shaded area is at most {\frac12 H(b-a)}. This is what the inequality (1) says.
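For the skeptics, (1) is easy to test numerically. Here is a quick midpoint-rule check in Python (an illustrative sketch; the function and the interval are arbitrary choices of mine):

```python
import math

# Midpoint-rule check of inequality (1) for f(x) = 10^10 + sin(x) on [0, 3].
# The huge constant cancels on the left once the mean is subtracted.
def check_inequality_1(f, df, a, b, n=100000):
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    mean = math.fsum(f(x) for x in xs) * h / (b - a)
    lhs = math.fsum(abs(f(x) - mean) for x in xs) * h / (b - a)
    rhs = 0.5 * math.fsum(abs(df(x)) for x in xs) * h
    return lhs, rhs

lhs, rhs = check_inequality_1(lambda x: 1e10 + math.sin(x),
                              lambda x: math.cos(x), 0.0, 3.0)
assert lhs <= rhs
```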

An appealing feature of (1) is that it is scale-invariant. For example, if we change the variable {u=2x}, both sides remain the same. The derivative will be greater by the factor of {2}, but will be integrated over the shorter interval. And on the left we have averages upon averages, which do not change under scaling.
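The invariance can be made concrete: replace {f(x)=\sin x} on {[0,3]} by {g(x)=\sin 2x} on {[0,3/2]}; the derivative doubles, the interval halves, and neither side of (1) moves. (A throwaway Python sketch with this arbitrary sample function.)

```python
import math

# Both sides of (1) for a function f on [a, b], by the midpoint rule.
def sides(f, df, a, b, n=100000):
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    mean = math.fsum(f(x) for x in xs) * h / (b - a)
    lhs = math.fsum(abs(f(x) - mean) for x in xs) * h / (b - a)
    rhs = 0.5 * math.fsum(abs(df(x)) for x in xs) * h
    return lhs, rhs

# f(x) = sin(x) on [0, 3] versus g(x) = sin(2x) on [0, 3/2]:
# the derivative doubles, the interval halves, both sides stay put.
l1, r1 = sides(math.sin, math.cos, 0.0, 3.0)
l2, r2 = sides(lambda x: math.sin(2 * x),
               lambda x: 2 * math.cos(2 * x), 0.0, 1.5)
assert abs(l1 - l2) < 1e-9 and abs(r1 - r2) < 1e-9
```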

What happens in higher dimensions? Let’s stick to two dimensions and consider a smooth function {f\colon\mathbb R^2\rightarrow\mathbb R}. Instead of an interval we now have a square, denoted {Q}. It makes sense to denote squares by {Q}, because it’s natural to call a square a cube, and “Q” is the first letter of “cube”. Oh wait, it isn’t. Moving on…

The quantity {b-a} was the length of interval of integration. Now we will use the area of {Q}, denoted {|Q|}. And {\overline{f}=\frac{1}{|Q|}\iint_Q f} is now the mean value of {f} on {Q}. At first glance one might conjecture the following version of (1):

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le C \iint_Q |\nabla f(x,y)|\,dx\,dy   \ \ \ \ \ (2)

But this can’t be true, because the two sides scale differently. The left side of (2) is scale-invariant as before; the right side is not. If we shrink the cube by a factor of {2}, the gradient {|\nabla f|} goes up by a factor of {2}, but the area goes down by a factor of {4}. This suggests that the correct inequality should be

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le C \left(\iint_Q |\nabla f(x,y)|^2\,dx\,dy\right)^{1/2}   \ \ \ \ \ (3)

We need the square root so that the right side of (3) scales correctly with {f}: to first power.

And here is the proof. Let {f(*,y)} denote {f} averaged over {x}. Applying (1) to every horizontal segment in {Q}, we obtain

\displaystyle    \frac{1}{h}\iint_Q |f(x,y)-f(*,y)|\,dx\,dy \le \frac12 \iint_Q |f_x(x,y)|\,dx\,dy    \ \ \ \ \ (4)

where {h} is the sidelength of {Q}. Now work with {f(*,y)}, using (1) along vertical segments:

\displaystyle    \frac{1}{h}\iint_Q |f(*,y)-f(*,*)|\,dx\,dy \le \frac12 \iint_Q |f_y(*,y)|\,dx\,dy    \ \ \ \ \ (5)

Of course, {f(*,*)} is the same as {\overline{f}}. The derivative on the right can be estimated: the derivative of the average does not exceed the average of the absolute value of the derivative. To keep the estimates clean, simply bound both partial derivatives by {|\nabla f|}. From (4) and (5) taken together, it follows that

\displaystyle    \frac{1}{h}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le \iint_Q |\nabla f(x,y)|\,dx\,dy    \ \ \ \ \ (6)

This is an interesting result (a form of the Poincaré inequality), but in the present form it’s not scale-invariant. Remember that we expect the square of the gradient on the right. Cauchy-Schwarz to the rescue:

\displaystyle    \iint_Q 1\cdot |\nabla f| \le \left( \iint_Q 1 \right)^{1/2} \left( \iint_Q |\nabla f|^2 \right)^{1/2}

The first factor on the right is simply {h}. Move it to the left and we are done:

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le \left(\iint_Q |\nabla f(x,y)|^2\,dx\,dy\right)^{1/2}   \ \ \ \ \ (7)
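As a sanity check, here is (7) evaluated numerically for one smooth function on the unit square (a throwaway Python sketch; the test function is an arbitrary choice of mine):

```python
import math

# Midpoint-grid check of (7) on Q = [0,1]^2 with f(x,y) = sin(pi x) cos(pi y).
n = 200
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]
f  = lambda x, y: math.sin(math.pi * x) * math.cos(math.pi * y)
fx = lambda x, y: math.pi * math.cos(math.pi * x) * math.cos(math.pi * y)
fy = lambda x, y: -math.pi * math.sin(math.pi * x) * math.sin(math.pi * y)

mean = math.fsum(f(x, y) for x in pts for y in pts) * h * h
lhs = math.fsum(abs(f(x, y) - mean) for x in pts for y in pts) * h * h
rhs = math.sqrt(math.fsum(fx(x, y) ** 2 + fy(x, y) ** 2
                          for x in pts for y in pts) * h * h)
assert lhs <= rhs   # lhs is about 0.405, rhs about 2.22
```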

In higher dimensions we would of course have {n} instead of {2}, which is one of many reasons why analysis in two dimensions is special: {L^n} is a Hilbert space only when {n=2}.

The left side of (7) is the mean oscillation of {f} on the square {Q}. The integrability of {|\nabla f|^n} in {n} dimensions ensures that {f} is a function of bounded mean oscillation, known as BMO. Actually, it is even in the smaller space VMO, because the right side of (7) tends to zero as the square shrinks. But it need not be continuous or even bounded: for {f(x)=\log\log (1/|x|)} the integral of {|\nabla f|^n} converges in a neighborhood of the origin (just barely, thanks to {\log^n (1/|x|)} in the denominator). This is unlike the one-dimensional situation, where the integrability of {|f'|} guarantees that the function is bounded.
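To see the “just barely” concretely in two dimensions: for {f(x)=\log\log(1/|x|)} one has {|\nabla f(x)| = 1/(|x|\log(1/|x|))}, so in polar coordinates the integral of {|\nabla f|^2} over a disk of radius {R<1} is {2\pi\int_0^R dr/(r\log^2(1/r)) = 2\pi/\log(1/R)}, which is finite. A rough numerical confirmation in Python, on a geometric grid of radii:

```python
import math

# Integrate |grad f|^2 = 1/(r log(1/r))^2 over the disk of radius R = 1/2,
# i.e. 2*pi times the integral of dr / (r log(1/r)^2), on a geometric grid.
R, q = 0.5, 0.99
total, r = 0.0, R
while r > 1e-200:            # truncate near 0; the neglected tail is tiny
    r_next = r * q
    r_mid = 0.5 * (r + r_next)
    total += (r - r_next) / (r_mid * math.log(1.0 / r_mid) ** 2)
    r = r_next
total *= 2 * math.pi
# exact value of the full integral: 2*pi / log 2, about 9.06
assert abs(total - 2 * math.pi / math.log(2.0)) < 0.05
```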

Vanishing Mean Oscillation

The oscillation of a function f on a set A is the diameter of f(A). For real-valued functions it’s written as \mathrm{osc}_A\,f=\sup_A f - \inf_A f. The relation to uniform continuity is immediate: f is uniformly continuous if and only if \mathrm{osc}_I\,f is small (<\epsilon) on all sufficiently short intervals (|I|<\delta). Here I'm following the traditional definition of uniform continuity, even though it’s wrong.

In statistics, the difference (maximum − minimum) is called the range of a set of data values, and is one of the common measures of variation. It is a crude one, though, and is obviously influenced by outliers. There are better ones, such as the standard deviation: subtract the mean, square each term, average, then take the square root… you know the drill. What happens if we use the standard deviation in the definition of uniform continuity?

We get the space of functions of vanishing mean oscillation, known as VMO. To avoid technicalities, I will consider only the functions f\colon \mathbb R\to\mathbb R that vanish outside of the interval [-1,1]. Like this one:

Hat function

Given an interval I\subset \mathbb R, let f_I denote the mean of f on I, namely \frac{1}{|I|}\int_I f. By definition, f is in VMO if and only if |f - f_I|_I is small (<\epsilon) on all sufficiently short intervals (|I|<\delta).

Wait, what happened to squaring and taking a root afterwards? Turns out they are not really necessary: the above definition is equivalent to the one with standard deviation \sqrt{(|f - f_I|^2)_I}. In fact, VMO functions are integrable to any power 1\le p<\infty, and any such p can be used in the definition (a consequence of the John-Nirenberg lemma). Both squared and unsquared versions have their advantages:

  • |f - f_I|_I is finite for any locally integrable function; we don’t need to know in advance that f is square integrable.
  • \sqrt{(|f - f_I|^2)_I} is better suited for precise computation, since it’s just \sqrt{ (f^2)_I-f_I^2}.

For example, taking I=[-1,1] for the hat function, we get f_I=1/2 and (f^2)_I=1/3. So, \sqrt{(|f - f_I|^2)_I} = 1/\sqrt{12}, and this is as large as it gets. The supremum of mean oscillation over all intervals is called the BMO norm. The space BMO, naturally, includes all functions with finite BMO norm. VMO is a closed subspace of BMO — the closed span of continuous functions (uniformly continuous and vanishing at infinity, to be precise). In particular, VMO is separable while BMO is not.
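The hat-function numbers above are easy to verify numerically. A quick Python sketch, taking the hat to be f(x) = 1 - |x| on [-1,1] (my reading of the picture):

```python
import math

# Mean, second moment, and standard deviation of the hat f(x) = 1 - |x|
# on I = [-1, 1], by the midpoint rule.
n = 200000                      # even, so the kink at 0 lands on a grid line
h = 2.0 / n
xs = [-1.0 + (i + 0.5) * h for i in range(n)]
f = lambda x: 1.0 - abs(x)
f_mean = math.fsum(f(x) for x in xs) * h / 2.0
f2_mean = math.fsum(f(x) ** 2 for x in xs) * h / 2.0
std = math.sqrt(f2_mean - f_mean ** 2)
assert abs(f_mean - 0.5) < 1e-9        # f_I = 1/2
assert abs(f2_mean - 1.0 / 3.0) < 1e-9 # (f^2)_I = 1/3
assert abs(std - 1.0 / math.sqrt(12.0)) < 1e-9
```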

The definition of VMO clearly rules out jump discontinuities such as x/|x|. Yet, it allows some discontinuous functions and even unbounded ones. The standard example of an unbounded function in VMO is the double logarithm:

f(x) = log(1-log|x|)

If this function does not look unbounded, it’s because it reaches the level y=5 only when |x|\approx 10^{-64}, which is well below the resolution of your screen. Even if you have a Retina display.
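Indeed, solving \log(1-\log|x|)=5 gives |x| = e^{1-e^5}, which is roughly 10^{-64}. A one-line check in Python:

```python
import math

# The level y = 5 is reached at |x| = exp(1 - e^5), roughly 1e-64.
x = math.exp(1.0 - math.exp(5.0))
assert abs(math.log(1.0 - math.log(x)) - 5.0) < 1e-9
assert 1e-65 < x < 1e-63
```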

The classical paper by Sarason which introduced VMO has 381 Google Scholar citations as of now. Will this post count as the 382nd? We’ll see…


1. Sarason, Donald. “Functions of vanishing mean oscillation.” Trans. Amer. Math. Soc. 207 (1975), 391–405.