How to measure the nonlinearity of a function $f\colon I\to\mathbb{R}$, where $I\subset\mathbb{R}$ is an interval? A natural way is to consider the smallest possible deviation from a line $y=kx+b$, that is, $\inf_{k,b}\sup_{x\in I}|f(x)-kx-b|$. It turns out to be convenient to divide this by $|I|$, the length of the interval $I$. So, let $\mathrm{NL}(f;I)=\frac{1}{|I|}\inf_{k,b}\sup_{x\in I}|f(x)-kx-b|$. (This is similar to the β-numbers of Peter Jones, except that the deviation from a line is measured only in the vertical direction.)
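This quantity is easy to experiment with. Below is a numerical sketch (my own code, not from the original text): for a fixed slope $k$ the optimal intercept is the mid-range of $f(x)-kx$, and the resulting error is convex in $k$, so a ternary search over the slope approximates $\mathrm{NL}(f;I)$. The sample count and slope bracket are arbitrary choices.

```python
import numpy as np

def nl(f, a, b, samples=2001, iters=200):
    """Approximate NL(f; [a, b]).

    For a fixed slope k the best intercept is the mid-range of f(x) - k*x,
    so the optimal error for that slope is (max - min)/2 of f(x) - k*x.
    That error is convex in k, so a ternary search over k finds the best
    affine approximation; dividing by b - a gives NL."""
    x = np.linspace(a, b, samples)
    y = f(x)

    def err(k):
        g = y - k * x
        return (g.max() - g.min()) / 2.0

    lo, hi = -10.0, 10.0  # assumed bracket for the optimal slope
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if err(m1) < err(m2):
            hi = m2
        else:
            lo = m1
    return err((lo + hi) / 2.0) / (b - a)

nl_line = nl(lambda x: 2 * x + 3, 0.0, 1.0)  # a line: NL = 0
nl_square = nl(lambda x: x ** 2, 0.0, 1.0)   # x^2 on [0, 1]: NL = 1/8
print(nl_line, nl_square)
```

For $x^2$ on $[0,1]$ the best affine approximation is the Chebyshev one, with uniform error $1/8$, so the computed value should be close to $0.125$.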

### Relation with derivatives

The definition of the derivative immediately implies that if $f'(a)$ exists, then $\mathrm{NL}(f;I)\to 0$ as $I$ shrinks to $a$ (that is, $|I|\to 0$ while $a\in I$). A typical construction of a nowhere differentiable continuous function is based on making $\mathrm{NL}(f;I)$ bounded from below; it is enough to do this for dyadic intervals, and that can be done by adding wiggly terms like $2^{-n}\operatorname{dist}(2^n x,\mathbb{Z})$: see the blancmange curve.
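As a sanity check (my own sketch, not from the original text), one can verify numerically that a truncated blancmange sum has nonlinearity bounded below on dyadic intervals. On $[j2^{-m},(j+1)2^{-m}]$ the values at the endpoints and midpoint force every line to miss by at least $2^{-m}/4$, so $\mathrm{NL}\ge 1/4$ there; the grid search below should confirm this. The helper `nl` and the truncation at 20 terms are my own choices.

```python
import numpy as np

def nl(f, a, b, samples=2001, iters=200):
    """Approximate NL(f; [a, b]) via ternary search over the slope k;
    for fixed k the optimal intercept is the mid-range of f(x) - k*x."""
    x = np.linspace(a, b, samples)
    y = f(x)

    def err(k):
        g = y - k * x
        return (g.max() - g.min()) / 2.0

    lo, hi = -10.0, 10.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if err(m1) < err(m2):
            hi = m2
        else:
            lo = m1
    return err((lo + hi) / 2.0) / (b - a)

def s(x):
    """Distance from x to the nearest integer."""
    return np.abs(x - np.round(x))

def blancmange(x, terms=20):
    """Partial sum of the blancmange (Takagi) series: sum of 2^-n s(2^n x)."""
    return sum(2.0 ** -n * s(2.0 ** n * x) for n in range(terms))

# Nonlinearity over all dyadic intervals of generations 0..4.
worst = min(
    nl(blancmange, j * 2.0 ** -m, (j + 1) * 2.0 ** -m)
    for m in range(5) for j in range(2 ** m)
)
print(worst)  # stays bounded below, consistent with nondifferentiability
```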

The converse is false: even if $\mathrm{NL}(f;I)\to 0$ as $I$ shrinks to $a$, the function may fail to be differentiable at $a$. The reason is that the affine approximation may have different slopes at different scales. An example is $f(x)=x\sin\sqrt{\log(1/|x|)}$ (with $f(0)=0$) in a neighborhood of $0$. Consider a small interval $[-\delta,\delta]$. The line $y=kx$ with $k=\sin\sqrt{\log(1/\delta)}$ is a good approximation to $f$ because $\sin\sqrt{\log(1/|x|)}\approx\sin\sqrt{\log(1/\delta)}$ on most of the interval except for a very small part near $0$, and on that part $|f(x)|\le |x|$ is very close to $0$ anyway. Yet the difference quotient $f(x)/x=\sin\sqrt{\log(1/|x|)}$ keeps oscillating between $-1$ and $1$ as $x\to 0$, so $f'(0)$ does not exist.

Why the root of the logarithm? Because $\log(1/x)$ changes by a fixed amount on a fixed proportion of $[0,\delta]$ (by $\log 2$ on $[\delta/2,\delta]$, say), independently of $\delta$, so with $\sin\log(1/x)$ the nonlinearity would stay bounded below. We need a function that grows more slowly than the logarithm, so that as $\delta$ decreases, there is a smaller amount of change on a larger part of the interval $[0,\delta]$.
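The two halves of this example can be checked numerically (my own sketch; the function, scales, and thresholds are my choices): the difference quotients $f(x)/x$ come arbitrarily close to $\pm 1$ near $0$, while $\mathrm{NL}(f;[-\delta,\delta])$ stays well below the trivial Lipschitz bound. The decay of $\mathrm{NL}$ is only logarithmic, so the code asserts boundedness rather than a visible limit.

```python
import math
import numpy as np

def nl(f, a, b, samples=2001, iters=200):
    """Approximate NL(f; [a, b]) via ternary search over the slope k;
    for fixed k the optimal intercept is the mid-range of f(x) - k*x."""
    x = np.linspace(a, b, samples)
    y = f(x)

    def err(k):
        g = y - k * x
        return (g.max() - g.min()) / 2.0

    lo, hi = -10.0, 10.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if err(m1) < err(m2):
            hi = m2
        else:
            lo = m1
    return err((lo + hi) / 2.0) / (b - a)

def f(x):
    """f(x) = x * sin(sqrt(log(1/|x|))) with f(0) = 0, for |x| < 1."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    out[nz] = x[nz] * np.sin(np.sqrt(np.log(1.0 / np.abs(x[nz]))))
    return out

# Points where sqrt(log(1/x)) hits pi/2 + 2*pi and 3*pi/2 + 2*pi:
# there the difference quotient f(x)/x equals +1 and -1 respectively.
x_top = math.exp(-(math.pi / 2 + 2 * math.pi) ** 2)
x_bot = math.exp(-(3 * math.pi / 2 + 2 * math.pi) ** 2)
q_top = math.sin(math.sqrt(-math.log(x_top)))  # = f(x_top) / x_top
q_bot = math.sin(math.sqrt(-math.log(x_bot)))  # = f(x_bot) / x_bot

# Nonlinearity at several scales around 0.
nls = [nl(f, -d, d) for d in (1e-3, 1e-6, 1e-9, 1e-12)]
print(q_top, q_bot, nls)
```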

### Nonlinearity of Lipschitz functions

Suppose $f\colon I\to\mathbb{R}$ is a Lipschitz function, that is, there exists a constant $L$ such that $|f(x)-f(y)|\le L|x-y|$ for all $x,y\in I$. It's easy to see that $\mathrm{NL}(f;I)\le L/2$, by taking the constant mid-range approximation $y=\frac12(\max_I f+\min_I f)$. But the sharp bound is $\mathrm{NL}(f;I)\le L/4$, whose proof is not as trivial. The sharpness is shown by $f(x)=|x|$ with $I=[-1,1]$.

**Proof**. Let $k$ be the slope of the linear function that agrees with $f$ at the endpoints of $I$; note that $|k|\le L$. Subtracting this linear function from $f$ gives us a Lipschitz function $g$ such that $g=0$ at both endpoints of $I$ and $-(L+k)\le g'\le L-k$ almost everywhere. Let $A=\max_I g$ and $B=-\min_I g$. Since $g$ vanishes at the endpoints, its total rise and total fall on $I$ are both at least $A+B$; that is, $\int_I (g')^+\ge A+B$ and $\int_I (g')^-\ge A+B$. Chebyshev's inequality gives lower bounds for the measures of the sets $\{g'>0\}$ and $\{g'<0\}$: namely, $|\{g'>0\}|\ge \frac{A+B}{L-k}$ and $|\{g'<0\}|\ge \frac{A+B}{L+k}$. By adding these, we find that $|I|\ge (A+B)\left(\frac{1}{L-k}+\frac{1}{L+k}\right)=\frac{2L(A+B)}{L^2-k^2}\ge \frac{2(A+B)}{L}$. Since $-B\le g\le A$, the mid-range approximation $\frac12(A-B)$ to $g$ has error at most $\frac12(A+B)\le \frac{L|I|}{4}$. Hence $\mathrm{NL}(f;I)=\mathrm{NL}(g;I)\le L/4$. $\square$
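The $L/4$ bound can be stress-tested numerically (my own sketch, not part of the proof): random $1$-Lipschitz piecewise-linear functions should never exceed nonlinearity $1/4$, and $f(x)=|x|$ should attain it. The random model, seed, and grid sizes are arbitrary choices.

```python
import numpy as np

def nl(f, a, b, samples=2001, iters=200):
    """Approximate NL(f; [a, b]) via ternary search over the slope k;
    for fixed k the optimal intercept is the mid-range of f(x) - k*x."""
    x = np.linspace(a, b, samples)
    y = f(x)

    def err(k):
        g = y - k * x
        return (g.max() - g.min()) / 2.0

    lo, hi = -10.0, 10.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if err(m1) < err(m2):
            hi = m2
        else:
            lo = m1
    return err((lo + hi) / 2.0) / (b - a)

rng = np.random.default_rng(0)
knots = np.linspace(0.0, 1.0, 12)

worst = 0.0
for _ in range(50):
    slopes = rng.uniform(-1.0, 1.0, len(knots) - 1)  # pieces with |slope| <= 1
    vals = np.concatenate([[0.0], np.cumsum(slopes * np.diff(knots))])
    g = lambda x, kn=knots, vl=vals: np.interp(x, kn, vl)
    worst = max(worst, nl(g, 0.0, 1.0))

nl_abs = nl(np.abs, -1.0, 1.0)  # extremal example: NL(|x|; [-1, 1]) = 1/4
print(worst, nl_abs)
```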

### Reducing nonlinearity

Turns out, the graph of every Lipschitz function has relatively large almost-flat pieces. That is, there are subintervals of nontrivial size where the measure of nonlinearity is much smaller than the Lipschitz constant. This result is a special (one-dimensional) case of Theorem 2.3 in *Affine approximation of Lipschitz functions and nonlinear quotients* by Bates, Johnson, Lindenstrauss, Preiss, and Schechtman.

**Theorem AA** (for “affine approximation”): For every $\varepsilon>0$ there exists $\delta>0$ with the following property. If $f\colon[a,b]\to\mathbb{R}$ is an $L$-Lipschitz function, then there exists an interval $I\subset[a,b]$ with $|I|\ge\delta(b-a)$ and $\mathrm{NL}(f;I)\le\varepsilon L$.

Theorem AA should not be confused with Rademacher's theorem, which says that a Lipschitz function is differentiable almost everywhere. The point here is the lower bound on the size of the interval $I$. Differentiability does not provide that. In fact, if we knew that $f$ is smooth, or even a polynomial, the proof of Theorem AA would not become any easier.

### Proof of Theorem AA

We may assume $[a,b]=[0,1]$ and $L=1$. For $t\in(0,1]$ let $L(t)=\sup\{|f(x)-f(y)|/|x-y| \colon x,y\in[0,1],\ |x-y|\ge t\}$. That is, $L(t)$ is the restricted Lipschitz constant, one that applies for distances at least $t$. It is a decreasing function of $t$, and $L(t)\le 1$ for all $t$.

Note that $|f(0)-f(1)|\le L(1/4)$ and that every value of $f$ is within $L(1/4)$ of either $f(0)$ or $f(1)$, since every $x\in[0,1]$ is at distance at least $1/2$ from one of the endpoints. Hence, the oscillation of $f$ on $[0,1]$ is at most $3L(1/4)$. If $L(1/4)\le \frac{2}{3}\varepsilon$, then the constant mid-range approximation on $[0,1]$ gives the desired conclusion, with $I=[0,1]$ and $\delta=1$. From now on $L(1/4)>\frac{2}{3}\varepsilon$.

The sequence $L(4^{-n})$ is increasing in $n$ and bounded above by $1$, which implies $L(4^{-n-1})\le(1+\varepsilon)L(4^{-n})$ for some $n$. Pick an interval $[a,b]\subset[0,1]$ that realizes $L(4^{-n})$, that is, $b-a\ge 4^{-n}$ and $|f(b)-f(a)|=L(4^{-n})(b-a)$; the supremum is attained because $f$ is continuous and the set of admissible pairs is compact. Without loss of generality $f(b)>f(a)$ (otherwise consider $-f$). Let $I=[a+\frac{b-a}{4},\,b-\frac{b-a}{4}]$ be the middle half of $[a,b]$. Since each point of $I$ is at distance at least $\frac{b-a}{4}\ge 4^{-n-1}$ from both $a$ and $b$, it follows that $f(x)-f(a)\le L(4^{-n-1})(x-a)$ and $f(b)-f(x)\le L(4^{-n-1})(b-x)$ for all $x\in I$.

So far we have pinched $f$ between two affine functions of equal slope:

$$f(a)+L(4^{-n})(b-a)-L(4^{-n-1})(b-x)\ \le\ f(x)\ \le\ f(a)+L(4^{-n-1})(x-a),\qquad x\in I.$$

Let us consider their difference: it is the constant $\big(L(4^{-n-1})-L(4^{-n})\big)(b-a)$. Recall that $L(4^{-n-1})\le(1+\varepsilon)L(4^{-n})$, which gives a bound of $\varepsilon L(4^{-n})(b-a)\le\varepsilon(b-a)$ for the difference. Approximating $f$ by the average of the two affine functions, and using $|I|=\frac{b-a}{2}$, we conclude that $\mathrm{NL}(f;I)\le \frac{\varepsilon(b-a)/2}{|I|}=\varepsilon$, as required.

It remains to consider the size of $I$, about which we only know $|I|\ge 4^{-n}/2$ so far. Naturally, we want to take the smallest $n$ such that $L(4^{-n-1})\le(1+\varepsilon)L(4^{-n})$ holds. Let $n$ be this value; then $L(4^{-k-1})>(1+\varepsilon)L(4^{-k})$ for $1\le k<n$. Here $L(4^{-n})\le 1$ and $L(4^{-1})>\frac{2}{3}\varepsilon$. The conclusion is that $1\ge L(4^{-n})>(1+\varepsilon)^{n-1}\cdot\frac{2}{3}\varepsilon$, hence $n\le 1+\log\frac{3}{2\varepsilon}\big/\log(1+\varepsilon)$. This finally yields $\delta=\frac12\,4^{-1-\log(3/(2\varepsilon))/\log(1+\varepsilon)}$ as an acceptable choice, completing the proof of Theorem AA.
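The proof is constructive enough to run. Below is a sketch of its search (my own code and test function, not from the original text): approximate the restricted Lipschitz constant on a grid, find the first scale at which $L(4^{-n-1})\le(1+\varepsilon)L(4^{-n})$, and take the middle half of a pair realizing $L(4^{-n})$. The sawtooth, $\varepsilon$, and grid size are arbitrary choices, and the $\delta$ bookkeeping is omitted.

```python
import numpy as np

def nl(f, a, b, samples=2001, iters=200):
    """Approximate NL(f; [a, b]) via ternary search over the slope k;
    for fixed k the optimal intercept is the mid-range of f(x) - k*x."""
    x = np.linspace(a, b, samples)
    y = f(x)

    def err(k):
        g = y - k * x
        return (g.max() - g.min()) / 2.0

    lo, hi = -10.0, 10.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if err(m1) < err(m2):
            hi = m2
        else:
            lo = m1
    return err((lo + hi) / 2.0) / (b - a)

def restricted_lip(x, y, t):
    """Grid approximation of L(t): the largest slope between sample points
    at distance >= t, together with a pair (a, b) attaining it."""
    best, pair = 0.0, (float(x[0]), float(x[-1]))
    for i in range(len(x) - 1):
        d = x[i + 1:] - x[i]
        r = np.where(d >= t, np.abs(y[i + 1:] - y[i]) / d, -1.0)
        j = int(np.argmax(r))
        if r[j] > best:
            best, pair = float(r[j]), (float(x[i]), float(x[i + 1 + j]))
    return best, pair

eps = 0.25
saw = lambda x: np.abs(x * 3 - np.round(x * 3)) / 3  # 1-Lipschitz sawtooth
xs = np.linspace(0.0, 1.0, 961)
ys = saw(xs)

for n in range(1, 8):
    L_n, pair = restricted_lip(xs, ys, 4.0 ** -n)
    L_next, _ = restricted_lip(xs, ys, 4.0 ** -(n + 1))
    if L_next <= (1 + eps) * L_n:  # adjacent scales have comparable slopes
        break

a, b = pair                              # realizes L(4^-n)
I = (a + (b - a) / 4, b - (b - a) / 4)   # middle half of [a, b]
nl_I = nl(saw, *I)
print(n, I, nl_I)
```

The found interval has length at least $4^{-n}/2$, matching the size bound from the proof.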

A large amount of work has been done on quantifying $\delta(\varepsilon)$ in various contexts; see, for example, *Heat flow and quantitative differentiation* by Hytönen and Naor.