Linear approximation and differentiability

If a function ${f\colon \mathbb R\rightarrow \mathbb R}$ is differentiable at ${a\in \mathbb R}$, then it admits good linear approximation at small scales. Precisely: for every ${\epsilon>0}$ there exist ${\delta>0}$ and a linear function ${\ell(x)}$ such that ${|f(x)-\ell(x)|<\epsilon \,\delta}$ for all ${|x-a|<\delta}$. Having ${\delta}$ multiplied by ${\epsilon}$ means that the deviation from linearity is small compared to the (already small) scale ${\delta}$ on which the function is considered.

For example, this is a linear approximation to ${f(x)=e^x}$ near ${0}$ at scale ${\delta=0.1}$.

As is done on this graph, we can always take ${\ell}$ to be the secant line to the graph of ${f}$ based on the endpoints of the interval of consideration. This is because if ${L}$ is another line for which ${|f(x)-L(x)|<\epsilon \,\delta}$ holds, then ${|\ell-L|\le \epsilon \,\delta}$ at the endpoints, and therefore on all of the interval (the function ${x\mapsto |\ell(x)-L(x)|}$ is convex).
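To put numbers on the secant-line choice, here is a minimal sketch in plain Python (not the SageMathCloud code behind the plots). The helper name `secant_deviation_ratio` and the sampling density are assumptions made for illustration. It samples ${[-\delta,\delta]}$ and reports the largest deviation from the secant line divided by ${\delta}$, which for ${e^x}$ should shrink roughly in proportion to ${\delta}$.

```python
import math

def secant_deviation_ratio(f, delta, samples=2001):
    """Approximate max |f(x) - l(x)| / delta on [-delta, delta],
    where l is the secant line through the endpoints of the interval."""
    left, right = -delta, delta
    slope = (f(right) - f(left)) / (right - left)
    intercept = f(left) - slope * left
    worst = 0.0
    for k in range(samples):
        x = left + (right - left) * k / (samples - 1)
        worst = max(worst, abs(f(x) - (slope * x + intercept)))
    return worst / delta

# The ratio plays the role of epsilon at the given scale.
for delta in (0.1, 0.01, 0.001):
    print(delta, secant_deviation_ratio(math.exp, delta))
```

The maximum is taken over a finite grid, so the printed values are slight underestimates of the true supremum.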

Here is a non-differentiable function that obviously fails the linear approximation property at ${0}$.

(By the way, this post is mostly about me trying out SageMathCloud.) A nice thing about ${f(x)=x\sin \log |x|}$ is self-similarity: ${f(rx)=rf(x)}$ with the similarity factor ${r=e^{2\pi}}$. This implies that no matter how far we zoom in on the graph at ${x=0}$, the graph will not get any closer to linear.
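The self-similarity can be checked numerically. The following is a small sketch in plain Python; the sample points and the helper name `max_self_similarity_defect` are arbitrary choices for illustration. It compares ${f(rx)}$ with ${rf(x)}$ and reports the largest relative discrepancy, which should be at the level of rounding error.

```python
import math

def f(x):
    # f(x) = x sin(log|x|), defined for x != 0
    return x * math.sin(math.log(abs(x)))

r = math.exp(2 * math.pi)  # similarity factor

def max_self_similarity_defect(points):
    """Largest value of |f(r x) - r f(x)| / (r |x|) over the given points."""
    return max(abs(f(r * x) - r * f(x)) / (r * abs(x)) for x in points)

points = [0.3, -0.7, 1e-5, -2e-9]
print(max_self_similarity_defect(points))
```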

I like ${x\sin \log |x|}$ more than its famous, but not self-similar, cousin ${x\sin(1/x)}$, pictured below.

Interestingly, the linear approximation property does not imply differentiability. The function ${f(x)=x\sin \sqrt{-\log|x|}}$ has this property at ${0}$, but it lacks a derivative there since ${f(x)/x}$ does not have a limit as ${x\rightarrow 0}$. Here is how it looks.

Let’s look at the scale ${\delta=0.1}$

and compare to the scale ${\delta=0.001}$

Well, that was disappointing. Let’s use math instead. Fix ${\epsilon>0}$ and consider the function ${\phi(\delta)=\sqrt{-\log \delta}-\sqrt{-\log (\epsilon \delta)}}$. Rewriting it as

$\displaystyle \frac{\log \epsilon}{\sqrt{-\log \delta}+\sqrt{-\log (\epsilon \delta)}}$

shows ${\phi(\delta)\rightarrow 0}$ as ${\delta\rightarrow 0}$. Choose ${\delta}$ so that ${|\phi(\delta)|<\epsilon}$ and define ${\ell(x)=x\sin\sqrt{-\log \delta}}$. For ${\epsilon \,\delta\le |x|< \delta}$ the inequality ${|\sin u-\sin v|\le |u-v|}$ gives ${|f(x)-\ell(x)|\le |x|\,|\phi(\delta)|\le \epsilon |x|<\epsilon\,\delta}$, and for ${|x|<\epsilon \delta}$ the trivial bound ${|f(x)-\ell(x)|\le |f(x)|+|\ell(x)|\le 2|x|<2\epsilon\,\delta}$ suffices (the factor of 2 is harmless because ${\epsilon}$ is arbitrary).

Thus, ${f}$ can be well approximated by linear functions near ${0}$; it’s just that the linear function has to depend on the scale on which approximation is made: its slope ${\sin\sqrt{-\log \delta}}$ does not have a limit as ${\delta\to0}$.

The linear approximation property does not become apparent until extremely small scales. Here is ${\delta = 10^{-30}}$.
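To see how slowly the property kicks in, here is a rough numerical sketch in plain Python. It is not the code used for the plots, and the helper name `scaled_deviation`, the cutoff `s_max`, and the grid size are assumptions. It takes the line ${\ell(x)=x\sin\sqrt{-\log\delta}}$ and estimates the supremum of ${|f(x)-\ell(x)|/\delta}$ over ${0<|x|<\delta}$. Writing ${|x|=\delta e^{-s}}$ keeps everything in logarithmic variables, so tiny scales cause no underflow. Since ${|\sin u-\sin v|\le|u-v|}$ and ${se^{-s}\le 1/e}$, this quantity is at most ${1/(2e\sqrt{-\log\delta})}$, which decays only like the inverse square root of ${\log(1/\delta)}$.

```python
import math

def scaled_deviation(delta, s_max=60.0, samples=60001):
    """Approximate sup of |f(x) - l(x)| / delta over 0 < |x| < delta,
    for f(x) = x sin(sqrt(-log|x|)) and l(x) = x sin(sqrt(-log(delta))).
    Both functions are odd, so it is enough to look at x > 0."""
    big_l = -math.log(delta)              # positive when delta < 1
    slope = math.sin(math.sqrt(big_l))    # scale-dependent slope of l
    worst = 0.0
    for k in range(samples):
        s = s_max * k / (samples - 1)     # x = delta * exp(-s)
        t = math.exp(-s)                  # t = x / delta
        value = t * abs(math.sin(math.sqrt(big_l + s)) - slope)
        worst = max(worst, value)
    return worst

for delta in (1e-1, 1e-3, 1e-30, 1e-300):
    bound = 1 / (2 * math.e * math.sqrt(-math.log(delta)))
    print(delta, scaled_deviation(delta), bound)
```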

Words that contain UIO, and best-fitting lines

In Calculus I we spend a fair amount of time talking about how nicely the tangent line fits a smooth curve.

But truth be told, it fits only near the point of tangency. How can we find the best approximating line for a function ${f}$ on a given interval?

A natural measure of quality of approximation is the maximum deviation of the curve from the line, ${E(f;\alpha,\beta) = \max_{[a, b]} |f(x) - \alpha x-\beta|}$ where ${\alpha,\beta}$ are the coefficients in the line equation, to be determined. We need ${\alpha,\beta}$ that minimize ${E(f;\alpha,\beta)}$.

The Chebyshev equioscillation theorem is quite useful here. For one thing, its name contains the letter combination uio, which Scrabble players may appreciate. (Can you think of other words with this combination?) Also, its statement does not involve concepts outside of Calculus I. Specialized to the case of linear fit, it says that ${\alpha,\beta}$ are optimal if and only if there exist three numbers ${x_1<x_2<x_3}$ in ${[a, b]}$ such that the deviations ${\delta_i = f(x_i) - \alpha x_i-\beta}$

• are equal to ${E(f;\alpha,\beta)}$ in absolute value: ${|\delta_i| = E(f;\alpha,\beta)}$ for ${i=1,2,3}$
• have alternating signs: ${\delta_1 = -\delta_2 = \delta_3}$

Let’s consider what this means. First, ${f'(x_i) =\alpha}$ unless ${x_i}$ is an endpoint of ${[a,b]}$. Since ${x_2}$ cannot be an endpoint, we have ${f'(x_2)=\alpha}$.

Furthermore, ${f(x) - \alpha x }$ takes the same value at ${x_1}$ and ${x_3}$. This gives an equation for ${x_2}$

$\displaystyle f(x_1)-f'(x_2)x_1 = f(x_3)-f'(x_2) x_3 \qquad \qquad (1)$

which can be rewritten in the form resembling the Mean Value Theorem:

$\displaystyle f'(x_2) = \frac{f(x_1)-f(x_3)}{x_1-x_3} \qquad \qquad (2)$

If ${f'}$ is strictly monotone, there can be only one ${x_i}$ with ${f'(x_i)=\alpha}$. Hence ${x_1=a}$ and ${x_3=b}$ in this case, and we find ${x_2}$ by solving (2). This gives ${\alpha = f'(x_2)}$, and then ${\beta}$ is not hard to find.

Here is how I did this in Sage:

var('x')
f = sin(x)  # or another function
df = f.diff(x)
a = 0.0  # left endpoint (example value, change as needed)
b = 1.5  # right endpoint (example value, change as needed)

That was the setup. Now the actual computation:

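var('x2')  # the unknown interior point
x1 = a     # endpoints are extremal when f' is strictly monotone
x3 = b
# solve equation (1) for x2 on [a, b]
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, a, b)
alpha = df(x=x2)
# beta places the line halfway between the extreme deviations at x1 and x2
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))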
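For readers without Sage, the same recipe can be sketched in plain Python. This is an illustration under the assumption that ${f'}$ is strictly monotone on ${[a,b]}$; the function name `best_line_monotone` and the use of bisection in place of `find_root` are choices made here, not part of the Sage code. The slope is the secant slope from (2), and the three deviations printed at the end should have equal absolute values and alternating signs.

```python
import math

def best_line_monotone(f, df, a, b, iterations=200):
    """Best uniform linear fit on [a, b], assuming df is strictly monotone.
    Returns (alpha, beta, x2) for the line alpha*x + beta."""
    alpha = (f(b) - f(a)) / (b - a)       # equation (2) with x1 = a, x3 = b
    lo, hi = a, b
    g_lo = df(lo) - alpha
    for _ in range(iterations):           # bisection for df(x2) = alpha
        mid = 0.5 * (lo + hi)
        g_mid = df(mid) - alpha
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    x2 = 0.5 * (lo + hi)
    # put the line halfway between the deviations at a and x2
    beta = 0.5 * (f(a) - alpha * a + f(x2) - alpha * x2)
    return alpha, beta, x2

alpha, beta, x2 = best_line_monotone(math.sin, math.cos, 0.0, math.pi / 2)
deviations = [math.sin(x) - alpha * x - beta for x in (0.0, x2, math.pi / 2)]
print(alpha, beta, deviations)
```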

However, the algorithm fails to properly fit a line to the sine function on ${[0,3\pi/2]}$:

The problem is, ${f'(x)=\cos x}$ is no longer monotone, making it possible for two of ${x_i}$ to be interior points. Recalling the identities for cosine, we see that these points must be symmetric about ${x=\pi}$. One of ${x_i}$ must still be an endpoint, so either ${x_1=a}$ (and ${x_3=2\pi-x_2}$) or ${x_3=b}$ (and ${x_1=2\pi-x_2}$). The first option works:

This same line is also the best fit on the full period ${[0,2\pi]}$. It passes through ${(\pi,0)}$ and has the slope of ${-0.2172336...}$ which is not a number I can recognize.
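The first option can be reproduced in plain Python. This is a sketch rather than the computation used for the plot, and the bracket ${[\pi/2,\pi]}$ for the root is an assumption checked by hand. With ${x_1=0}$ and ${x_3=2\pi-x_2}$, equation (2) becomes ${\sin x_2+(2\pi-x_2)\cos x_2=0}$, which bisection solves easily. Substituting ${u=2\pi-x_2}$ turns this into ${\tan u=u}$, so the slope is ${\cos u}$ at the first positive root of that equation.

```python
import math

def g(x2):
    # equation (2) with x1 = 0 and x3 = 2*pi - x2, cleared of denominators
    return math.sin(x2) + (2 * math.pi - x2) * math.cos(x2)

lo, hi = math.pi / 2, math.pi     # g(lo) > 0 > g(hi), g decreasing in between
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid

x2 = 0.5 * (lo + hi)
x1, x3 = 0.0, 2 * math.pi - x2
alpha = math.cos(x2)              # slope of the best-fitting line
beta = 0.5 * (math.sin(x1) - alpha * x1 + math.sin(x2) - alpha * x2)
deviations = [math.sin(x) - alpha * x - beta for x in (x1, x2, x3)]
print(alpha, beta, deviations)
```

The printed deviations should alternate in sign with equal absolute value, and ${\alpha\pi+\beta}$ should vanish up to rounding, which is the statement that the line passes through ${(\pi,0)}$.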

On the interval ${[0,4\pi]}$, all three of the above approaches fail:

Luckily we don’t need a computer in this case. Whenever ${|f|}$ has at least three points of maximum with alternating signs of ${f}$, the Chebyshev equioscillation theorem implies that the best linear fit is the zero function.

Partition of the plane by lines

It’s an exercise in induction to prove that $n$ lines in general position divide the plane into $M_n=n(n+1)/2+1$ regions, and this number of regions is the maximal possible. Here is a partition that realizes $M_3$:

Three lines can also divide the plane into 6 regions instead of 7: this happens if the triangle collapses to a point, or if two of the lines are made parallel. However, 3 lines can never divide the plane into 5 regions.
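These counts are easy to check by machine. The sketch below is plain Python with exact rational arithmetic; the function name `count_regions` and the encoding of a line as an integer triple $(A,B,C)$ meaning $Ax+By=C$ are assumptions made for illustration. It relies on a standard counting formula that is not proved here: $n$ distinct lines cut the plane into $1+n+\sum_p (m_p-1)$ regions, where the sum runs over the intersection points $p$ and $m_p$ is the number of lines through $p$.

```python
from fractions import Fraction
from itertools import combinations

def count_regions(lines):
    """Number of regions cut out by distinct lines A*x + B*y = C,
    each given as an integer triple (A, B, C)."""
    through = {}  # intersection point -> indices of the lines through it
    for (i, (a1, b1, c1)), (j, (a2, b2, c2)) in combinations(enumerate(lines), 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines do not meet
        point = (Fraction(c1 * b2 - c2 * b1, det),
                 Fraction(a1 * c2 - a2 * c1, det))
        through.setdefault(point, set()).update((i, j))
    return 1 + len(lines) + sum(len(s) - 1 for s in through.values())

general = [(1, 0, 0), (0, 1, 0), (1, 1, 1)]       # a triangle
concurrent = [(1, 0, 0), (0, 1, 0), (1, 1, 0)]    # triangle collapsed to a point
two_parallel = [(1, 0, 0), (1, 0, 1), (0, 1, 0)]  # two of the lines parallel
print(count_regions(general), count_regions(concurrent), count_regions(two_parallel))
```

The lines $y=kx+k^2$ for $k=1,\dots,n$ are in general position, which gives a quick check of the formula for $M_n$.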

Define $\mu_n$ to be the smallest integer such that for any integer $m\in [\mu_n, M_n]$ there is a partition of the plane into $m$ regions by $n$ lines. So, $\mu_3=6$, and of course $\mu_2=3$ and $\mu_1=M_1=2$. Here are a few more values of $\mu_n$, with examples in lieu of proofs.

The last one, unattainability of 23 with 8 lines, isn’t easy to prove.

Does $\mu_n$ have a closed form? 2,3,6,8,12,15,18,24 is not in OEIS, unlike the upper bound.

Update: the 1993 paper Classification of arrangements by the number of their cells by Nicola Martinov gives a complete description of the pairs $(n,f)$ for which there is a partition of the (projective) plane by $n$ lines into $f$ regions. In the affine version considered above we should simply say that the ideal line is also part of the arrangement; thus, 3-line affine arrangements correspond to 4-line projective arrangements, etc. However, it is not entirely trivial to get $\mu_n$ out of Martinov’s formulas.