Functions of bounded or vanishing nonlinearity

A natural way to measure the nonlinearity of a function ${f\colon I\to \mathbb R}$, where ${I\subset \mathbb R}$ is an interval, is the quantity ${\displaystyle NL(f;I) = \frac{1}{|I|} \inf_{k, r}\sup_{x\in I}|f(x)-kx-r|}$ which expresses the deviation of ${f}$ from a line, divided by the length of the interval ${I}$. This quantity was considered in “Measuring nonlinearity and reducing it”.

Let us write ${NL(f) = \sup_I NL(f; I)}$ where the supremum is taken over all intervals ${I}$ in the domain of definition of ${f}$. What functions have finite ${NL(f)}$? Every Lipschitz function does, as was noted previously: ${NL(f) \le \frac14 \mathrm{Lip}\,(f)}$. But the converse is not true: for example, ${NL(f)}$ is finite for the non-Lipschitz function ${f(x)=x\log|x|}$, where ${f(0)=0}$.

The function looks nice, but ${f(x)/x}$ is clearly unbounded. What makes ${NL(f)}$ finite? Note the scale-invariant feature of NL: for any ${t>0}$ the scaled function ${f_t(x) = t^{-1}f(tx)}$ satisfies ${NL(f_t)=NL(f)}$, and more precisely ${NL(f; tI) = NL(f_t; I)}$. On the other hand, our function has a curious scaling property ${f_t(x) = f(x) + x\log t}$ where the linear term ${x\log t}$ does not affect NL at all. This means that it suffices to bound ${NL(f; I)}$ for intervals ${I}$ of unit length. The plot of ${f}$ shows that not much deviation from the secant line happens on such intervals, so I will not bother with estimates.
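For what it's worth, the scaling identity ${f_t(x) = f(x) + x\log t}$ is easy to confirm numerically; here is a minimal Python check (the sample points are mine, chosen just for illustration):

```python
import math

def f(x):
    # f(x) = x*log|x|, with f(0) = 0
    return x * math.log(abs(x)) if x != 0 else 0.0

# f_t(x) = f(t*x)/t should equal f(x) + x*log(t) for every t > 0
for t in [0.5, 2.0, 10.0]:
    for x in [-3.0, -0.25, 1.7, 5.0]:
        assert abs(f(t * x) / t - (f(x) + x * math.log(t))) < 1e-9
print("scaling identity holds")
```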

The class of functions ${f}$ with ${NL(f)<\infty}$ is precisely the Zygmund class ${\Lambda^*}$ defined by the property ${|f(x-h)-2f(x)+f(x+h)| \le Mh}$ with ${M}$ independent of ${x, h}$. Indeed, since the second-order difference ${f(x-h)-2f(x)+f(x+h)}$ is unchanged by adding an affine function to ${f}$, we can replace ${f}$ by ${f(x)-kx-r}$ with suitable ${k, r}$ and use the triangle inequality to obtain

${\displaystyle |f(x-h)-2f(x)+f(x+h)| \le 4 \sup_{y\in I} |f(y)-ky-r| = 8h\; NL(f; I)}$

where ${I=[x-h, x+h]}$. Conversely, suppose that ${f\in \Lambda^*}$. Given an interval ${I=[a, b]}$, subtract an affine function from ${f}$ to ensure ${f(a)=f(b)=0}$. We may assume ${|f|}$ attains its maximum on ${I}$ at a point ${\xi \le (a + b)/2}$ (otherwise reflect the interval). Applying the definition of ${\Lambda^*}$ with ${x = \xi}$ and ${h = \xi - a}$, and using ${f(a)=0}$, we get ${|f(2\xi - a) - 2f(\xi )| \le M h}$. Since ${|f(2\xi - a)| \le |f(\xi)|}$, it follows that ${|f(\xi )| \le Mh \le M(b-a)/2}$. This shows ${NL(f; I)\le M/2}$. The upshot is that ${NL(f)}$ is equivalent to the Zygmund seminorm of ${f}$ (i.e., the smallest possible ${M}$ in the definition of ${\Lambda^*}$).
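One can also probe the Zygmund condition for ${f(x)=x\log|x|}$ numerically. The following rough Python sketch (a grid search, not a proof; the constant 2 is an empirical bound, not the sharp seminorm) scans the second-order difference quotients:

```python
import math

def f(x):
    return x * math.log(abs(x)) if x != 0 else 0.0

def second_diff_ratio(x, h):
    return abs(f(x - h) - 2*f(x) + f(x + h)) / h

# Scan a grid of centers x and step sizes h; for this f the ratio is
# scale-invariant, and empirically its supremum stays below 2.
worst = max(second_diff_ratio(0.05 * k, 10.0**(-j))
            for k in range(-200, 201) for j in range(0, 6))
print(worst)  # below 2
assert worst < 2.0
```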

A function in ${\Lambda^*}$ may be nowhere differentiable: it is not difficult to construct ${f}$ so that ${NL(f;I)}$ is bounded between two positive constants. The situation is different for the small Zygmund class ${\lambda^*}$ whose definition requires that ${NL(f; I)\to 0}$ as ${|I|\to 0}$. A function ${f \in \lambda^*}$ is differentiable at any point of local extremum, since the condition ${NL(f; I)\to 0}$ forces its graph to be tangent to the horizontal line through the point of extremum. Given any two points ${a, b}$ we can subtract the secant line from ${f}$ and thus create a point of local extremum between ${a }$ and ${b}$. It follows that ${f}$ is differentiable on a dense set of points.

The definitions of ${\Lambda^* }$ and ${\lambda^*}$ apply equally well to complex-valued functions, or vector-valued functions. But there is a notable difference in the differentiability properties: a complex-valued function of class ${\lambda^*}$ may be nowhere differentiable [Ullrich, 1993]. Put another way, two real-valued functions in ${\lambda^*}$ need not have a common point of differentiability. This sort of thing does not often happen in analysis, where the existence of points of “good” behavior is usually based on the prevalence of such points in some sense, and therefore a finite collection of functions is expected to have common points of good behavior.

The key lemma in Ullrich’s paper provides a real-valued VMO function that has infinite limit at every point of a given ${F_\sigma}$ set ${E}$ of measure zero. Although this is a result of real analysis, the proof is complex-analytic in nature and involves a conformal mapping. It would be interesting to see a “real” proof of this lemma. Since the antiderivative of a VMO function belongs to ${\lambda^* }$, the lemma yields a function ${v \in \lambda^*}$ that is not differentiable at any point of ${E}$. Consider the lacunary series ${u(t) = \sum_{n=1}^\infty a_n 2^{-n} \cos (2^n t)}$. One theorem of Zygmund shows that ${u \in \lambda^*}$ when ${a_n\to 0}$, while another shows that ${u}$ is almost nowhere differentiable when ${\sum a_n^2 = \infty}$. It remains to apply the lemma to get a function ${v\in \lambda^*}$ that is not differentiable at any point where ${u}$ is differentiable: then ${u+iv}$ is a complex-valued function of class ${\lambda^*}$ that is differentiable nowhere.

f(f(x)) = 4x

There are plenty of continuous functions ${f}$ such that ${f(f(x)) \equiv x}$. Besides the trivial examples ${f(x)=x}$ and ${f(x)=-x}$, one can take any equation ${F(x,y)=0}$ that is symmetric in ${x,y}$ and has a unique solution for one variable in terms of the other. For example: ${x^3+y^3-1 =0 }$ leads to ${f(x) = (1-x^3)^{1/3}}$.
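This is easy to check numerically; the only subtlety is taking the real cube root of a negative number, which Python's `**` operator does not do directly. A small sketch:

```python
import math

def cbrt(u):
    # real cube root, valid for negative arguments too
    return math.copysign(abs(u) ** (1.0 / 3.0), u)

def f(x):
    # solve x^3 + y^3 - 1 = 0 for y
    return cbrt(1.0 - x**3)

for x in [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0]:
    assert abs(f(f(x)) - x) < 1e-9
print("f(f(x)) = x")
```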

I can’t think of an explicit example that is also differentiable, but implicitly one can be defined by ${x^3+y^3+x+y=1}$, for example. In principle, this can be made explicit by solving the cubic equation for ${x}$, but I’d rather not.

At the time of writing, I could not think of any diffeomorphism ${f\colon \mathbb R \rightarrow \mathbb R}$ such that both ${f}$ and ${f^{-1}}$ have a nice explicit form. But Carl Feynman pointed out in a comment that the hyperbolic sine ${f(x)= \sinh x = (e^x-e^{-x})/2}$ has the inverse ${f^{-1}(x) = \log(x+\sqrt{x^2+1})}$ which certainly qualifies as nice and explicit.
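A quick numerical confirmation of this inverse pair (plain Python; the sample points are arbitrary):

```python
import math

def f(x):
    return math.sinh(x)

def f_inv(x):
    return math.log(x + math.sqrt(x * x + 1.0))

# f_inv really inverts sinh (for large negative x the formula loses some
# precision to cancellation, but it is exact mathematically)
for x in [-5.0, -1.0, 0.0, 0.5, 3.0]:
    assert abs(f_inv(f(x)) - x) < 1e-9
    assert abs(f(f_inv(x)) - x) < 1e-9
print("inverse verified")
```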

Let’s change the problem to ${f(f(x))=4x}$. There are still two trivial, linear solutions: ${f(x)=2x}$ and ${f(x)=-2x}$. Any other? The new equation imposes stronger constraints on ${f}$: for example, it implies

$\displaystyle f(4x) = f(f(f(x))) = 4f(x)$

But here is a reasonably simple nonlinear continuous example: define

$\displaystyle f(x) = \begin{cases} 2^x,\quad & 1\le x\le 2 \\ 4\log_2 x,\quad &2\le x\le 4 \end{cases}$

and extend to all ${x}$ by the rules ${f(4x) = 4f(x)}$ and ${f(-x) = -f(x)}$. The result looks like this, with the line ${y=2x}$ drawn in red for comparison.

To check that this works, notice that ${2^x}$ maps ${[1,2]}$ to ${[2,4]}$, which the function ${4\log_2 x}$ maps to ${[4,8]}$, and of course ${4\log _2 2^x = 4x}$.

From the plot, this function may appear to be differentiable for ${x\ne 0}$, but it is not. For example, at ${x=2}$ the left derivative is ${4\ln 2 \approx 2.8}$ while the right derivative is ${2/\ln 2 \approx 2.9}$.
This could be fixed by picking another building block instead of ${2^x}$, but it is not worth the effort. After all, the property ${f(4x)=4f(x)}$ is inconsistent with differentiability at ${0}$ as long as ${f}$ is nonlinear.

The plots were made in Sage, with the function ${f}$ defined thus:

import math

def f(x):
    if x == 0:
        return 0
    xa = abs(x)
    m = math.floor(math.log(xa, 2))
    if m % 2 == 0:
        return math.copysign(2**(m + xa/2**m), x)
    else:
        return math.copysign(2**(m+1) * (math.log(xa, 2) - m + 1), x)
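And a self-contained sanity check (repeating the definition with the necessary import) that this function really satisfies ${f(f(x))=4x}$:

```python
import math

def f(x):
    if x == 0:
        return 0
    xa = abs(x)
    m = math.floor(math.log(xa, 2))
    if m % 2 == 0:
        return math.copysign(2**(m + xa / 2**m), x)
    else:
        return math.copysign(2**(m + 1) * (math.log(xa, 2) - m + 1), x)

# f(f(x)) = 4x across several scales and both signs
for x in [-10.0, -1.5, -0.3, 0.0, 0.7, 1.0, 2.5, 3.0, 40.0]:
    assert abs(f(f(x)) - 4 * x) < 1e-9
print("f(f(x)) = 4x verified")
```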

3 calculus 3 examples

The function ${f(x,y)=\dfrac{xy}{x^2+y^2}}$ might be the world’s most popular example demonstrating that the existence of partial derivatives does not imply differentiability.

But in my opinion, it is somewhat extreme and potentially confusing, with discontinuity added to the mix. I prefer

$\displaystyle f(x,y)=\frac{xy}{\sqrt{x^2+y^2}}$

pictured below.

This one is continuous. In fact, it is Lipschitz continuous because the first-order partials ${f_x}$ and ${f_y}$ are bounded. The restriction of ${f}$ to the line ${y=x}$ is ${f(x,x)=x^2/\sqrt{2x^2} = |x|/\sqrt{2}}$, which is a familiar single-variable example of a nondifferentiable function.
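A small numerical illustration of both claims, using the restriction to the line ${y=x}$ (the sample points are arbitrary):

```python
import math

def f(x, y):
    r = math.hypot(x, y)
    return x * y / r if r != 0 else 0.0

# The restriction to the line y = x is |x|/sqrt(2):
for x in [-2.0, -0.1, 0.5, 3.0]:
    assert abs(f(x, x) - abs(x) / math.sqrt(2)) < 1e-12
# hence the one-sided derivatives at the origin along that line differ:
h = 1e-6
right = (f(h, h) - f(0.0, 0.0)) / h    # close to +1/sqrt(2)
left = (f(0.0, 0.0) - f(-h, -h)) / h   # close to -1/sqrt(2)
assert abs(right - 1 / math.sqrt(2)) < 1e-9
assert abs(left + 1 / math.sqrt(2)) < 1e-9
print("continuous but not differentiable at the origin")
```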

To unify the analysis of such examples, let ${f(x,y)=xy\,g(x^2+y^2)}$. Then

$\displaystyle f_x = y g+ 2x^2yg'$

With ${g(t)=t^{-1/2}}$, where ${t=x^2+y^2}$, we get

$\displaystyle f_x = O(t^{1/2}) t^{-1/2} + O(t^{3/2})t^{-3/2} = O(1),\quad t\rightarrow 0$

By symmetry, ${f_y}$ is bounded as well.

My favorite example from this family is more subtle, with a deceptively smooth graph:

The formula is

$\displaystyle f(x,y)=xy\sqrt{-\log(x^2+y^2)}$

Since ${f}$ decays almost quadratically near the origin, it is differentiable at ${(0,0)}$. Indeed, the first-order derivatives ${f_x}$ and ${f_y}$ are continuous, as one may observe using ${g(t)=\sqrt{-\log t}}$ above.

And the second-order partials ${f_{xx}}$ and ${f_{yy}}$ are also continuous, if just barely. Indeed,

$\displaystyle f_{xx} = 6xy g'+ 4x^3yg''$

Since the growth of ${g}$ is sub-logarithmic, it follows that ${g'(t)=o(t^{-1})}$ and ${g''(t)=o(t^{-2})}$. Hence,

$\displaystyle f_{xx} = O(t) o(t^{-1}) + O(t^{2}) o(t^{-2}) = o(1),\quad t\rightarrow 0$

So, ${f_{xx}(x,y)\rightarrow 0 = f_{xx}(0,0)}$ as ${(x,y)\rightarrow (0,0)}$. Even though the graph of ${f_{xx}}$ looks quite similar to the first example in this post, this one is continuous. Can’t trust these plots.

By symmetry, ${f_{yy}}$ is continuous as well.

But the mixed partial ${f_{xy}}$ does not exist at ${(0,0)}$, and tends to ${+\infty}$ as ${(x,y)\rightarrow (0,0)}$. The first claim is obvious once we notice that ${f_x(0,y)= y\, g(y^2)}$ and ${g}$ blows up at ${0}$. The second one follows from

$\displaystyle f_{xy} = g + 2(x^2+y^2) g' + 4x^2y^2 g''$

where ${g\rightarrow\infty}$ while the other two terms tend to zero, as in the estimate for ${f_{xx}}$. Here is the graph of ${f_{xy}}$.
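The contrast between ${f_{xx}}$ and ${f_{xy}}$ can also be seen numerically with second-order central differences; in this Python sketch (step size and sample points chosen ad hoc) the ${f_{xx}}$ values slowly shrink while the ${f_{xy}}$ values grow:

```python
import math

def f(x, y):
    t = x*x + y*y
    return x * y * math.sqrt(-math.log(t)) if 0 < t < 1 else 0.0

# Second-order central differences at points (s, s) approaching the origin
h = 1e-5
vals = []
for s in [1e-2, 1e-3, 1e-4]:
    fxx = (f(s + h, s) - 2*f(s, s) + f(s - h, s)) / h**2
    fxy = (f(s + h, s + h) - f(s + h, s - h)
           - f(s - h, s + h) + f(s - h, s - h)) / (4 * h * h)
    vals.append((fxx, fxy))
    print(s, fxx, fxy)
# fxx slowly tends to 0, while fxy grows like sqrt(-log(2*s^2))
```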

This example is significant for the theory of partial differential equations, because it shows that a solution of the Poisson equation ${f_{xx}+f_{yy} = h }$ with continuous ${h}$ may fail to be in ${C^2}$ (twice differentiable, with continuous derivatives). The expected gain of two derivatives does not materialize here.

The situation is rectified by upgrading the continuity condition to Hölder continuity. Then ${f}$ indeed gains two derivatives: if ${h\in C^\alpha}$ for some ${\alpha\in (0,1)}$, then ${f\in C^{2,\alpha}}$. In particular, the Hölder continuity of ${f_{xx} }$ and ${f_{yy} }$ implies the Hölder continuity of ${f_{xy} }$.

How much multivariable calculus can be done along curves?

Working with functions of two (or more) real variables is significantly harder than with functions of one variable. It is tempting to reduce the complexity by considering the restrictions of a multivariate function to lines passing through a point of interest. But standard counterexamples of Calculus III, such as $\displaystyle f(x,y)=\frac{xy^2}{x^2+y^4}$, $f(0,0)=0$, show that lines are not enough: this function $f$ is not continuous at $(0,0)$, even though its restriction to every line is continuous. It takes a parabola, such as $x=y^2$, to detect the discontinuity.
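A quick numerical illustration of this counterexample (plain Python; the sample points are arbitrary):

```python
def f(x, y):
    return x * y**2 / (x**2 + y**4) if (x, y) != (0, 0) else 0.0

# along any line through the origin the values tend to 0 ...
for m in [0.5, 1.0, 3.0]:
    assert abs(f(1e-6, m * 1e-6)) < 1e-4
assert abs(f(0.0, 1e-6)) < 1e-12  # the vertical axis too
# ... but along the parabola x = y^2 the value is identically 1/2
for y in [0.1, 1e-3, 1e-6]:
    assert abs(f(y**2, y) - 0.5) < 1e-12
print("continuous on every line, yet discontinuous at the origin")
```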

Things look brighter if we do allow parabolas and other curves into consideration.

Continuity: $f$ is continuous at $a\in\mathbb R^n$ if and only if $f\circ \gamma$ is continuous at $0$ for every map $\gamma\colon \mathbb R\to \mathbb R^n$ such that $\gamma(0)=a$ and $\gamma$ is continuous at $0$.

Proof: The “only if” direction is immediate. If $f$ is not continuous at $a$, we can find a sequence $a_n\to a$ such that $f(a_n)\not\to f(a)$, and run $\gamma$ through these points, for example in a piecewise-linear way, with $\gamma(0)=a$.

Having been successful at the level of continuity, we can hope for a similar differentiability result:

Differentiability, take 1: $f$ is differentiable at $a\in\mathbb R^n$ if and only if $f\circ \gamma$ is differentiable at $0$ for every map $\gamma\colon \mathbb R\to \mathbb R^n$ such that $\gamma(0)=a$ and $\gamma'(0)$ exists.

Alas, this is false. Take a continuous function $g\colon S^{n-1}\to \mathbb R$ which preserves antipodes (i.e., $g(-x)=-g(x)$) and extend it to $\mathbb R^n$ via $f(tx)=tg(x)$ for $t\ge 0$ and $x\in S^{n-1}$. Consider $\gamma$ as above, with $a\in \mathbb R^n$ being the origin. If $\gamma'(0)=0$, then $(f\circ \gamma)'(0)=0$ because $|f(y)|\le C|y|$ with $C=\max_{S^{n-1}}|g|$. If $\gamma'(0)\ne 0$, we can rescale the parameter so that $\gamma'(0)$ is a unit vector. It is easy to see that $\displaystyle \frac{f(\gamma(t))}{t}= \frac{f(\gamma(t))}{|\gamma(t)|\,\mathrm{sign}\,t}\, \frac{|\gamma(t)|}{|t|}\to g(\gamma'(0))$, hence $f\circ \gamma$ is differentiable at $0$. However, $f$ is not differentiable at $a$ unless $g$ happens to be the restriction of a linear map.
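Here is a numerical illustration of this counterexample in the plane, taking $g(\cos a, \sin a)=\cos^3 a$, so that $f(x,y)=x^3/(x^2+y^2)$ (a standard choice, not the only one):

```python
import math

def f(x, y):
    # homogeneous of degree 1; its restriction to the unit circle,
    # g(cos a, sin a) = cos(a)**3, is continuous and preserves antipodes
    return x**3 / (x**2 + y**2) if (x, y) != (0.0, 0.0) else 0.0

def deriv(v):
    # derivative of f along the line t -> t*v at t = 0
    h = 1e-7
    return f(h * v[0], h * v[1]) / h

# the directional derivative in direction (cos a, sin a) is cos(a)**3 ...
for a in [0.0, 0.7, 2.0]:
    v = (math.cos(a), math.sin(a))
    assert abs(deriv(v) - math.cos(a)**3) < 1e-9
# ... which is not additive in v, so no linear map T can match it
e1, e2 = (1.0, 0.0), (0.0, 1.0)
assert abs(deriv((1.0, 1.0)) - (deriv(e1) + deriv(e2))) > 0.4
print("all directional derivatives exist, yet f is not differentiable at 0")
```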

I can’t think of a way to detect the nonlinearity of directional derivative by probing $f$ with curves. Apparently, it has to be imposed artificially.

Differentiability, take 2: $f$ is differentiable at $a\in\mathbb R^n$ if and only if there exists a linear map $T$ such that $(f\circ \gamma)'(0)=T\gamma'(0)$ for every map $\gamma\colon \mathbb R\to \mathbb R^n$ such that $\gamma(0)=a$ and $\gamma'(0)$ exists.

Note that the only viable candidate for $T$ is given by partial derivatives, and those are computed along lines. Thus, we are able to determine the first-order differentiability of $f$ using only the tools of single-variable calculus.

The proof goes along the same lines as for continuity, with extra care taken in forming $\gamma$.

1. We may assume that $T=0$ by subtracting $Tx$ from our function. Also assume $a=0$.
2. Suppose $f$ is not differentiable at $0$. Then there exist $\epsilon>0$ and a sequence $v_k\to 0$ such that $|f(v_k)|\ge \epsilon |v_k|$ for all $k$.
3. Passing to a subsequence, make sure that $v_k/|v_k|$ tends to a unit vector $v$, and also that $|v_{k+1}|\le 2^{-k}|v_k|$.
4. Connect the points $v_k$ by line segments. Parametrize this piecewise-linear curve by arc length.
5. The distance from $v_{k+1}$ to $v_k$ is bounded by $|v_{k+1}|+|v_k|\le (1+2^{-k})|v_k|$ by the triangle inequality. Hence, the total length between $0$ and $v_k$ does not exceed $\sum_{m\ge k}(1+2^{-m})|v_m| \le (1+c_k)|v_k|$, where $c_k\to 0$ as $k\to \infty$.
6. By 3, 4, and 5, the constructed curve $\gamma$ has a one-sided derivative when it reaches $0$. Shift the parameter so that $\gamma(0)=0$. Extend $\gamma$ linearly to the other side to get a two-sided derivative at $0$.
7. By assumption, $|f(\gamma (t))|/|t|\to 0$ as $t\to 0$. But if $t_k$ is the parameter with $\gamma(t_k)=v_k$, then $|t_k|\le (1+c_k)|v_k|$ by 5, so $|f(\gamma(t_k))|/|t_k|\ge \epsilon/(1+c_k)$ by 2, a contradiction.

Can one go further and detect second-order differentiability by probing $f$ with curves? The difficulty is that the second derivative is not a pointwise asymptotic condition: it requires the first derivative to exist in a neighborhood. The pointwise second derivative might be possible to detect, but I’m not sure… and it’s getting late.