A function can be much larger than its derivative. Take the constant function $f \equiv C$ with a huge constant $C$, for example. Or $f(x) = C + \sin x$ to make it nonconstant. But if one subtracts the mean from $f$, the residual is nicely estimated by the derivative:

$$\frac{1}{|I|}\int_I |f - \bar f|\,dx \;\le\; \int_I |f'|\,dx \qquad (1)$$
Here $\bar f$ is the mean of $f$ on the interval $I$, namely $\bar f = \frac{1}{|I|}\int_I f(t)\,dt$. Indeed, what’s the worst that could happen? Something like this:

*(figure: the graph of $f$ crossing its mean line $\bar f$, with the area between them shaded)*

Here $|f(x) - \bar f|$ is at most the integral of $|f'|$ (the mean $\bar f$ is among the values of $f$, so this follows from the fundamental theorem of calculus), and therefore the shaded area is at most $|I|\int_I |f'|$. This is what the inequality (1) says.
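As a quick numerical sanity check of (1), here is a sketch with NumPy; the sample function $f(x) = \sin 3x + x^2$ and the interval $[0,2]$ are my own choices, not from the text.

```python
import numpy as np

def trap(y, x):
    """Trapezoid rule for the integral of y dx."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Inequality (1):  (1/|I|) * integral |f - mean| dx  <=  integral |f'| dx
a, b = 0.0, 2.0
x = np.linspace(a, b, 100_001)
f = np.sin(3 * x) + x**2            # sample smooth function (my choice)
fp = 3 * np.cos(3 * x) + 2 * x      # its derivative

mean_f = trap(f, x) / (b - a)
lhs = trap(np.abs(f - mean_f), x) / (b - a)   # mean oscillation of f on I
rhs = trap(np.abs(fp), x)                     # total variation of f on I

print(lhs, rhs)
assert lhs <= rhs
```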

An appealing feature of (1) is that it is *scale-invariant*. For example, if we change the variable via $x = cy$, replacing $f(x)$ by $g(y) = f(cy)$, both sides remain the same. The derivative $g'$ will be greater by the factor of $c$, but it will be integrated over an interval shorter by the same factor. And on the left we have averages upon averages, which do not change under scaling.
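The scale-invariance can also be confirmed numerically. In this sketch the sample function and the scaling factor $c$ are my own choices; the check compares both sides of (1) before and after the substitution $x = cy$.

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def mean_osc(vals, t):
    """Mean oscillation: (1/|I|) * integral |vals - mean| dt."""
    m = trap(vals, t) / (t[-1] - t[0])
    return trap(np.abs(vals - m), t) / (t[-1] - t[0])

c = 5.0                                    # scaling factor (my choice)
f  = lambda t: np.exp(t) * np.sin(4 * t)   # sample function (my choice)
fp = lambda t: np.exp(t) * (np.sin(4 * t) + 4 * np.cos(4 * t))

x = np.linspace(0.0, 1.0, 200_001)         # original interval [0, 1]
y = x / c                                  # shrunk interval [0, 1/c]; g(y) = f(c*y)

# Left side of (1): the mean oscillation is unchanged.
assert np.isclose(mean_osc(f(x), x), mean_osc(f(c * y), y), rtol=1e-9)
# Right side of (1): g' = c*f'(c*y) is bigger, but the interval is shorter.
assert np.isclose(trap(np.abs(fp(x)), x),
                  trap(np.abs(c * fp(c * y)), y), rtol=1e-9)
print("both sides of (1) are scale-invariant")
```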

What happens in higher dimensions? Let’s stick to two dimensions and consider a smooth function $f\colon \mathbb{R}^2\to\mathbb{R}$. Instead of an interval we now have a square, denoted $Q$. It makes sense to denote squares by $Q$, because it’s natural to call a square a cube, and “Q” is the first letter of “cube”. Oh wait, it isn’t. Moving on…

The quantity $|I|$ was the length of the interval of integration. Now we will use the area of $Q$, denoted $|Q|$. And $\bar f$ is now the mean value of $f$ on $Q$, namely $\bar f = \frac{1}{|Q|}\iint_Q f\,dA$. At first glance one might conjecture the following version of (1):

$$\frac{1}{|Q|}\iint_Q |f - \bar f|\,dA \;\le\; \iint_Q |\nabla f|\,dA \qquad (2)$$
But this can’t be true, because of inconsistent scaling. The left side of (2) is scale-invariant as before. The right side is not. If we shrink the square by the factor of $2$, the gradient will go up by the factor of $2$, but the area will go down by the factor of $4$. This suggests that the correct inequality should be

$$\frac{1}{|Q|}\iint_Q |f - \bar f|\,dA \;\le\; C\left(\iint_Q |\nabla f|^2\,dA\right)^{1/2} \qquad (3)$$
We need the square root so that the right side of (3) scales correctly with $f$: to the first power.
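The scaling bookkeeping can be confirmed numerically. In this sketch (the sample function $f(x,y)=\sin x\cos y$ and the shrink factor $s=2$ are my choices), the first-power gradient integral from (2) shrinks by the factor $s$, while the square-root expression from (3) stays put.

```python
import numpy as np

def quad2d(G, h, n):
    """Trapezoid rule for the integral of G dA over a square of side h, n x n grid."""
    w = np.ones(n); w[0] = w[-1] = 0.5
    return float(np.sum(np.outer(w, w) * G)) * (h / (n - 1)) ** 2

s, n = 2.0, 1_001                          # shrink factor, grid size (my choices)
# |grad f| for f(x,y) = sin(x) cos(y): f_x = cos(x)cos(y), f_y = -sin(x)sin(y)
grad = lambda X, Y: np.hypot(np.cos(X) * np.cos(Y), np.sin(X) * np.sin(Y))

x1 = np.linspace(0, 1, n)
X1, Y1 = np.meshgrid(x1, x1, indexing="ij")        # Q = [0,1]^2
x2 = x1 / s
X2, Y2 = np.meshgrid(x2, x2, indexing="ij")        # shrunk square [0,1/s]^2

I1 = quad2d(grad(X1, Y1), 1.0, n)                  # integral |grad f| over Q
I2 = quad2d(s * grad(s * X2, s * Y2), 1 / s, n)    # same for f(sx, sy)
J1 = quad2d(grad(X1, Y1) ** 2, 1.0, n) ** 0.5      # (integral |grad f|^2)^(1/2)
J2 = quad2d((s * grad(s * X2, s * Y2)) ** 2, 1 / s, n) ** 0.5

assert np.isclose(I2, I1 / s, rtol=1e-9)   # right side of (2): NOT invariant
assert np.isclose(J2, J1, rtol=1e-9)       # right side of (3): invariant
```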

And here is the proof. Let $g(y)$ denote the average of $f$ over the horizontal segment at height $y$. Applying (1) to every horizontal segment in $Q$, we obtain

$$\frac{1}{|Q|}\iint_Q |f(x,y) - g(y)|\,dA \;\le\; \frac{1}{h}\iint_Q |f_x|\,dA \qquad (4)$$
where $h$ is the sidelength of $Q$. Now work with $g$, using (1) along vertical segments:

$$\frac{1}{h}\int |g(y) - \bar g|\,dy \;\le\; \int |g'(y)|\,dy \qquad (5)$$
Of course, $\bar g$ is the same as $\bar f$. The derivative on the right can be estimated: the derivative of the average does not exceed the average of the absolute value of the derivative, that is, $|g'(y)| \le \frac{1}{h}\int |f_y(x,y)|\,dx$. To keep the estimates clean, simply estimate both partial derivatives by $|\nabla f|$. From (4) and (5) taken together it follows that

$$\frac{1}{|Q|}\iint_Q |f - \bar f|\,dA \;\le\; \frac{2}{h}\iint_Q |\nabla f|\,dA \qquad (6)$$
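Inequality (6) is easy to test numerically; here is a sketch on the unit square, with a sample function of my own choosing.

```python
import numpy as np

def quad2d(G, h, n):
    """Trapezoid rule for the integral of G dA over a square of side h, n x n grid."""
    w = np.ones(n); w[0] = w[-1] = 0.5
    return float(np.sum(np.outer(w, w) * G)) * (h / (n - 1)) ** 2

# Inequality (6):  (1/|Q|) * integral |f - mean| dA  <=  (2/h) * integral |grad f| dA
h, n = 1.0, 801
x = np.linspace(0.0, h, n)
X, Y = np.meshgrid(x, x, indexing="ij")

F  = np.sin(3 * X) * np.cos(2 * Y) + X * Y          # sample smooth f (my choice)
Fx = 3 * np.cos(3 * X) * np.cos(2 * Y) + Y          # df/dx
Fy = -2 * np.sin(3 * X) * np.sin(2 * Y) + X         # df/dy

area = h * h
mean_f = quad2d(F, h, n) / area
lhs = quad2d(np.abs(F - mean_f), h, n) / area       # mean oscillation on Q
rhs = (2.0 / h) * quad2d(np.hypot(Fx, Fy), h, n)

print(lhs, rhs)
assert lhs <= rhs
```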
This is an interesting result (a form of the Poincaré inequality), but in the present form it’s not scale-invariant. Remember that we expect the square of the gradient on the right. Cauchy–Schwarz to the rescue:

$$\iint_Q |\nabla f|\,dA \;\le\; \left(\iint_Q 1\,dA\right)^{1/2}\left(\iint_Q |\nabla f|^2\,dA\right)^{1/2}$$
The first factor on the right is simply $|Q|^{1/2} = h$, which cancels the $h$ in the denominator of (6), and we are done:

$$\frac{1}{|Q|}\iint_Q |f - \bar f|\,dA \;\le\; 2\left(\iint_Q |\nabla f|^2\,dA\right)^{1/2} \qquad (7)$$
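And a numerical check of the final inequality (7), again with a sample function of my own choosing:

```python
import numpy as np

def quad2d(G, h, n):
    """Trapezoid rule for the integral of G dA over a square of side h, n x n grid."""
    w = np.ones(n); w[0] = w[-1] = 0.5
    return float(np.sum(np.outer(w, w) * G)) * (h / (n - 1)) ** 2

# Inequality (7):  (1/|Q|) * integral |f - mean| dA  <=  2 * (integral |grad f|^2 dA)^(1/2)
h, n = 1.0, 801
x = np.linspace(0.0, h, n)
X, Y = np.meshgrid(x, x, indexing="ij")

F  = np.exp(X) * np.sin(2 * Y)            # sample smooth f (my choice)
Fx = np.exp(X) * np.sin(2 * Y)            # df/dx
Fy = 2 * np.exp(X) * np.cos(2 * Y)        # df/dy

area = h * h
mean_f = quad2d(F, h, n) / area
lhs = quad2d(np.abs(F - mean_f), h, n) / area       # mean oscillation on Q
rhs = 2.0 * quad2d(Fx**2 + Fy**2, h, n) ** 0.5

print(lhs, rhs)
assert lhs <= rhs
```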
In higher dimensions we would of course have $\left(\int_Q |\nabla f|^n\,dV\right)^{1/n}$ instead of $\left(\iint_Q |\nabla f|^2\,dA\right)^{1/2}$. Which is one of many reasons why analysis in two dimensions is special: $L^p$ is a Hilbert space only when $p=2$.

The left side of (7) is the *mean oscillation* of $f$ on the square $Q$. The integrability of $|\nabla f|^n$ in $n$ dimensions ensures that $f$ is a function of *bounded mean oscillation*, known as BMO. Actually, it is even in the smaller space VMO, because the right side of (7) tends to zero as the square shrinks. But $f$ need not be continuous or even bounded: for $f(x) = \log\log\frac{1}{|x|}$ the integral of $|\nabla f|^n$ converges in a neighborhood of the origin (just barely, thanks to the logarithm in the denominator of $|\nabla f|$). This is unlike the one-dimensional situation, where the integrability of $|f'|$ guarantees that the function is bounded.
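For $n=2$ this example can be checked numerically: the Dirichlet energy of $\log\log\frac{1}{|x|}$ near the origin reduces, in polar coordinates, to a convergent one-dimensional integral, while the function itself grows without bound. The cutoff radius $R$ and the truncation of the integral are my choices.

```python
import numpy as np

def trap(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# For f(x) = log log(1/|x|):  |grad f(x)| = 1/(|x| log(1/|x|)), so with u = log(1/r)
#   integral_{|x|<R} |grad f|^2 dA = 2*pi * integral_0^R dr/(r log^2(1/r))
#                                  = 2*pi * integral_{log(1/R)}^inf du/u^2
#                                  = 2*pi / log(1/R)  <  infinity.
R = 0.1                                          # neighborhood radius (my choice)
u = np.linspace(np.log(1.0 / R), 10_000.0, 5_000_001)
energy = 2 * np.pi * trap(1.0 / u**2, u)         # truncated at u = 10^4

print(energy, 2 * np.pi / np.log(1.0 / R))
assert np.isclose(energy, 2 * np.pi / np.log(1.0 / R), rtol=1e-2)

# ...yet f itself is unbounded near the origin: already above 5 at |x| = 1e-100
assert np.log(np.log(1e100)) > 5
```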