Measuring nonlinearity and reducing it

How to measure the nonlinearity of a function ${f\colon I\to \mathbb R}$ where ${I\subset \mathbb R}$ is an interval? A natural way is to consider the smallest possible deviation from a line ${y=kx+b}$, that is ${\inf_{k, b}\sup_{x\in I}|f(x)-kx-b|}$. It turns out to be convenient to divide this by ${|I|}$, the length of the interval ${I}$. So, let ${\displaystyle NL(f;I) = \frac{1}{|I|} \inf_{k, b}\sup_{x\in I}|f(x)-kx-b|}$. (This is similar to β-numbers of Peter Jones, except the deviation from a line is measured only in the vertical direction.)

Relation with derivatives

The definition of derivative immediately implies that if ${f'(a)}$ exists, then ${NL(f;I)\to 0}$ as ${I}$ shrinks to ${a}$ (that is, gets smaller while containing ${a}$). A typical construction of a nowhere differentiable continuous function is based on making ${NL(f;I)}$ bounded from below; it is enough to do this for dyadic intervals, and that can be done by adding wiggly terms like ${2^{-n}\mathrm{dist}\,(x, 2^{-n}\mathbb Z)}$: see the blancmange curve.

The converse is false: if ${NL(f; I)\to 0}$ as ${I}$ shrinks to ${a}$, the function ${f}$ may still fail to be differentiable at ${a}$. The reason is that the affine approximation may have different slopes at different scales. An example is ${f(x)=x \sin \sqrt{-\log |x|}}$ in a neighborhood of ${0}$. Consider a small interval ${[-\delta, \delta]}$. The line ${y = kx}$ with ${k=\sin\sqrt{-\log \delta}}$ is a good approximation to ${f}$ because ${f(x)/x\approx k}$ on most of the interval except for a very small part near ${0}$, and on that part ${f}$ is very close to ${0}$ anyway.

Why the root of logarithm? Because ${\sin \log |x|}$ has a fixed amount of change on a fixed proportion of  ${[-\delta, \delta]}$, independently of ${\delta}$. We need a function slower than the logarithm, so that as ${\delta}$ decreases, there is a smaller amount of change on a larger part of the interval ${[-\delta, \delta]}$.

Nonlinearity of Lipschitz functions

Suppose ${f}$ is a Lipschitz function, that is, there exists a constant ${L}$ such that ${|f(x)-f(y)|\le L|x-y|}$ for all ${x, y\in I}$. It’s easy to see that ${NL(f;I)\le L/2}$, by taking the mid-range approximation ${y=\frac12 (\max_I f + \min_I f)}$. But the sharp bound is ${NL(f;I)\le L/4}$ whose proof is not as trivial. The sharpness is shown by ${f(x)=|x|}$ with ${I=[-1,1]}$.

Proof. Let ${k}$ be the slope of the linear function that agrees with ${f}$ at the endpoints of ${I}$. Subtracting this linear function from ${f}$ gives us a Lipschitz function ${g}$ such that ${-L-k\le g'\le L-k}$ and ${\int_I g'= 0}$. Let ${A = \int_I (g')^+ = \int_I (g')^-}$. Chebyshev’s inequality gives lower bounds for the measures of the sets ${g'>0}$ and ${g'<0}$: namely, ${|g'>0|\ge A/(L-k)}$ and ${|g'<0|\le A/(L+k)}$. By adding these, we find that ${|I| \ge 2LA/(L^2-k^2)\ge 2A/L}$. Since ${\max _I g - \min_I g \le A}$, the mid-range approximation to ${g}$ has error at most ${A/2 \le |I|L/4}$. Hence ${NL(f; I) = NL(g; I) \le L/4}$.

Reducing nonlinearity

Turns out, the graph of every Lipschitz function has relatively large almost-flat pieces.  That is, there are subintervals of nontrivial size where the measure of nonlinearity is much smaller than the Lipschitz constant. This result is a special (one-dimensional) case of Theorem 2.3 in Affine approximation of Lipschitz functions and nonlinear quotients by Bates, Johnson, Lindenstrauss, Preiss, and Schechtman.

Theorem AA (for “affine approximation”): For every ${\epsilon>0}$ there exists ${\delta>0}$ with the following property. If ${f\colon I\to \mathbb R}$ is an ${L}$-Lipschitz function, then there exists an interval ${J\subset I}$ with ${|J|\ge \delta |I|}$ and ${NL(f; J)\le \epsilon L}$.

Theorem AA should not be confused with Rademacher’s theorem which says that a Lipschitz function is differentiable almost everywhere. The point here is a lower bound on the size of the interval ${J}$. Differentiability does not provide that. In fact, if we knew that ${f}$ is smooth, or even a polynomial, the proof of Theorem AA would not become any easier.

Proof of Theorem AA

We may assume ${I=[-1, 1]}$ and ${L=1}$. For ${t\in (0, 2]}$ let ${L(t) = \sup \{|f(x)-f(y)|/|x-y| \colon x, y\in I, \ |x-y|\ge t\}}$. That is, ${L(t)}$ is the restricted Lipschitz constant, one that applies for distances at least ${t}$. It is a decreasing function of ${t}$, and ${L(0+)=1}$.

Note that ${|f(-1)-f(1)|\le 2L(1)}$ and that every value of ${f}$ is within ${2L(1)}$ of either ${f(-1)}$ or ${f(1)}$. Hence, the oscillation of ${f}$ on ${I}$ is at most ${6L(1)}$. If ${L(1) \le \epsilon/3}$, then the constant mid-range approximation on ${I}$ gives the desired conclusion, with ${J=I}$. From now on ${L(1) > \epsilon/3}$.

The sequence ${L_k = L(4^{-k})}$ is increasing toward ${L(0+)=1}$, which implies ${L_{k+1}\le (1+\epsilon) L_k}$ for some ${k}$. Pick an interval ${[a, b]\subset I}$ that realizes ${L_k}$, that is ${b-a\ge 4^{-k}}$ and ${|f(b)-f(a)| = 4^{-k}L_k}$. Without loss of generality ${f(b)>f(a)}$ (otherwise consider ${-f}$). Let ${J = [(3a+b)/4, (a+3b)/4]}$ be the middle half of ${[a. b]}$. Since each point of ${J}$ is within distance ${\ge 4^{-k-1}}$ of both ${a}$ and ${b}$, it follows that ${\displaystyle f(b) + L_{k+1}(x-b) \le f(x) \le f(a) + L_{k+1}(x-a) }$ for all ${x \in J}$.

So far we have pinched ${f}$ between two affine functions of equal slope. Let us consider their difference:
${\displaystyle (f(a) + L_{k+1}(x-a)) - (f(b) + L_{k+1}(x-b)) = (L_{k+1}-L_k) (b-a)}$. Recall that ${L_{k+1}\le (1+\epsilon) L_k}$, which gives a bound of ${\epsilon L_k(b-a) \le 2\epsilon L |J|}$ for the difference. Approximating ${f}$ by the average of the two affine functions we conclude that ${NL(f;J)\le \epsilon L}$ as required.

It remains to consider the size of ${J}$, about which we only know ${|J|\ge 4^{-k}/2}$ so far. Naturally, we want to take the smallest ${k}$ such that ${L_{k+1}\le (1+\epsilon) L_k}$ holds. Let ${m}$ be this value; then ${L_m > (1+\epsilon)^{m} L_0}$. Here ${L_m\le 1}$ and ${L_0 = L(1)> \epsilon/3 }$. The conclusion is that ${(1+\epsilon)^m < 3/\epsilon}$, hence ${m< \log(3/\epsilon)/\log(1+\epsilon)}$. This finally yields ${\displaystyle \delta = 4^{-\log(3/\epsilon)/\log(1+\epsilon)}/2}$ as an acceptable choice, completing the proof of Theorem AA.

A large amount of work has been done on quantifying ${\delta}$ in various contexts; for example Heat flow and quantitative differentiation by Hytönen and Naor.

This site uses Akismet to reduce spam. Learn how your comment data is processed.