Need for speed vs bounded position and acceleration

You are driving a car with maximal acceleration (and deceleration) A on a road that’s been blocked off in both directions (or, if you prefer, on the landing strip of an aircraft carrier). Let L be the length of the road available to you.
What is the maximal speed you can reach?

Besides A and L, the answer also depends on your mood: do you want to live, or are you willing to go out in a blaze of glory? In the latter case the answer is obvious: position the car at one end of the interval, and put the pedal to the metal. The car will cover the distance L within the time {\sqrt{2L/A}}, reaching the speed {v=\sqrt{2AL}} at the end. In the former scenario one has to switch to the brake pedal midway through the distance, so the maximal speed will be attained at half the length, {\sqrt{AL}}.

Rephrased in mathematical terms: if {f} is a twice differentiable function and {M_k = \sup|f^{(k)}|} for {k=0,1,2}, then {M_1^2 \le 4M_0M_2} if {f} is defined on a half-infinite interval, and {M_1^2 \le 2M_0M_2} if the domain of {f} is the entire line. To connect the notation, just put {L=2M_0} and {A=M_2} in the previous paragraph… and I guess some proof other than “this is obvious” is called for, but it’s not hard to find one: this is problem 5.15 in Rudin’s Principles of Mathematical Analysis.
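One standard argument, for the record. Taylor’s theorem gives, for any {h>0} such that {x+2h} is in the domain,

\displaystyle    f(x+2h) = f(x) + 2h f'(x) + 2h^2 f''(\xi)

for some {\xi\in(x,x+2h)}, hence {|f'(x)|\le M_0/h + hM_2}; taking {h=\sqrt{M_0/M_2}} gives {M_1^2\le 4M_0M_2}. On the whole line one can expand in both directions and subtract: {f(x+h)-f(x-h)=2hf'(x)+\frac{h^2}{2}(f''(\xi_1)-f''(\xi_2))}, which yields {|f'(x)|\le M_0/h + hM_2/2} and improves the bound to {M_1^2\le 2M_0M_2}.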

Perhaps more interesting is to study the problem in higher dimensions: one could be driving in a parking lot of some shape, etc. Let’s normalize the maximal acceleration as 1, keeping in mind it’s a vector. Given a set E, let S(E) be the square of maximal speed attainable by a unit-acceleration vehicle which stays in E indefinitely. Also let U(E) be the square of maximal speed one can attain while crashing out of bounds after the record is set. Squaring makes these quantities scale linearly with the size of the set. Both are monotone with respect to set inclusion. And we know what they are for an interval of length L: namely, {S = L} and {U=2L}, so that gives some lower bounds for sets that contain a line interval.

When E is a circle of radius 1, the best we can do is to drive along it with constant speed 1; then the centripetal acceleration is also 1. Any higher speed will exceed the allowable acceleration in the normal direction, never mind the tangential one. So, for a circle both S and U are equal to its radius.

On the other hand, if E is a disk of radius R, then driving along its diameter is better: it gives {S\ge 2R} and {U\ge 4R}.

Some questions:

  1. If E is a convex set of diameter D, is it true that {S(E) = D} and {U(E) = 2D}?
  2. Is it true that {U\le 2S} in general?
  3. How to express S and U for a smooth closed curve in terms of its curvature? They are not necessarily equal (like they are for a circle): consider thin ellipses converging to a line segment, for which S and U approach the corresponding values for that segment.

The answer to Question 1 is yes. Consider the orthogonal projection of E, and of a trajectory it contains, onto some line L. This does not increase the diameter or the acceleration; thus, the one-dimensional result implies that the projection of the velocity vector onto L does not exceed {\sqrt{D}} (or {\sqrt{2D}} for the crashing-out version). Since L was arbitrary, it follows that {S(E) \le D} and {U(E) \le 2D}. These upper bounds hold for general sets, not only convex ones. But when E is convex, we get matching lower bounds by considering the longest segment contained in E.

I don’t have answers to questions 2 and 3.

Linear approximation and differentiability

If a function {f\colon \mathbb R\rightarrow \mathbb R} is differentiable at {a\in \mathbb R}, then it admits a good linear approximation at small scales. Precisely: for every {\epsilon>0} there is {\delta>0} and a linear function {\ell(x)} such that {|f(x)-\ell(x)|<\epsilon \,\delta} for all {|x-a|<\delta}. Having {\delta} multiplied by {\epsilon} means that the deviation from linearity is small compared to the (already small) scale {\delta} on which the function is considered.

For example, this is a linear approximation to {f(x)=e^x} near {0} at scale {\delta=0.1}.

Linear approximation to exponential function

As is done on this graph, we can always take {\ell} to be the secant line to the graph of {f} based on the endpoints of the interval of consideration. This is because if {L} is another line for which {|f(x)-L(x)|<\epsilon \,\delta} holds, then {|\ell-L|\le \epsilon \,\delta} at the endpoints, and therefore on all of the interval (the function {x\mapsto |\ell(x)-L(x)|} is convex). Consequently {|f-\ell|\le |f-L|+|L-\ell|<2\epsilon\,\delta} on the interval, which is the same property with {2\epsilon} in place of {\epsilon}.


Here is a non-differentiable function that obviously fails the linear approximation property at {0}.

Self-similar graph

(By the way, this post is mostly about me trying out SageMathCloud.) A nice thing about {f(x)=x\sin \log |x|} is self-similarity: {f(rx)=rf(x)} with the similarity factor {r=e^{2\pi}}. This implies that no matter how far we zoom in on the graph at {x=0}, the graph will not get any closer to linear.
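A quick numerical check of the scaling identity, sketched in Python (the language SageMathCloud speaks):

import numpy as np

r = np.exp(2*np.pi)   # similarity factor e^(2 pi)

def f(x):
    return x*np.sin(np.log(np.abs(x)))

x = np.linspace(0.1, 1, 5)                # sample points away from 0
print(np.max(np.abs(f(r*x) - r*f(x))))    # ~1e-13, i.e. zero up to rounding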

I like {x\sin \log |x|} more than its famous, but not self-similar, cousin {x\sin(1/x)}, pictured below.

Standard example from intro to real analysis

Interestingly, linear approximation property does not imply differentiability. The function {f(x)=x\sin \sqrt{-\log|x|}} has this property at {0}, but it lacks derivative there since {f(x)/x} does not have a limit as {x\rightarrow 0}. Here is how it looks.

Now with the square root!

Let’s look at the scale {\delta=0.1}

scale 0.1

and compare to the scale {\delta=0.001}

scale 0.001

Well, that was disappointing. Let’s use math instead. Fix {\epsilon>0} and consider the function {\phi(\delta)=\sqrt{-\log \delta}-\sqrt{-\log (\epsilon \delta)}}. Rewriting it as

\displaystyle    \frac{\log \epsilon}{\sqrt{-\log \delta}+\sqrt{-\log (\epsilon \delta)}}

shows {\phi(\delta)\rightarrow 0} as {\delta\rightarrow 0}. Choose {\delta} so that {|\phi(\delta)|<\epsilon} and define {\ell(x)=x\sin \sqrt{-\log \delta}}. Then for {\epsilon \,\delta\le |x|< \delta} we have {|f(x)-\ell(x)|\le \epsilon |x|<\epsilon\,\delta}, because {\sin} is {1}-Lipschitz and {0\le \sqrt{-\log |x|}-\sqrt{-\log \delta}\le |\phi(\delta)|} in this range. For {|x|<\epsilon \delta} the trivial bound {|f(x)-\ell(x)|\le |f(x)|+|\ell(x)|\le 2|x|} suffices.

Thus, {f} can be well approximated by linear functions near {0}; it’s just that the linear function has to depend on the scale on which the approximation is made: its slope {\sin\sqrt{-\log \delta}} does not have a limit as {\delta\to0}.
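A numerical illustration (a small Python sketch of my own): measure the relative error {\max_{|x|\le\delta}|f(x)-\ell(x)|/\delta} with the scale-dependent slope {\sin\sqrt{-\log\delta}}.

import numpy as np

def f(x):
    out = np.zeros_like(x)
    nz = x != 0                      # avoid log(0) at the origin
    out[nz] = x[nz]*np.sin(np.sqrt(-np.log(np.abs(x[nz]))))
    return out

for delta in [1e-1, 1e-3, 1e-30]:
    x = np.linspace(-delta, delta, 200001)
    slope = np.sin(np.sqrt(-np.log(delta)))        # depends on the scale
    err = np.max(np.abs(f(x) - slope*x))/delta     # relative deviation from linearity
    print(delta, slope, err)

The slopes at the three scales are quite different, and the relative error tends to zero very slowly, like a power of {1/\log(1/\delta)}, which is why the property is invisible on the plots above.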

The linear approximation property does not become apparent until extremely small scales. Here is {\delta = 10^{-30}}.

Scale 1e-30

Higher order reflections

Mathematical reflections, not those supposedly practiced in metaphilosophy.

Given a function {f} defined for {x\ge 0}, we have two basic ways to reflect it about {x=0}: even reflection {f(-x)=f(x)} and odd reflection {f(-x)=-f(x)}. Here is the even reflection of the exponential function {e^x}:

Even reflection

The extended function is not differentiable at {0}. The odd reflection, pictured below, is not even continuous at {0}. But to be fair, it has the same slope to the left and to the right of {0}, unlike the even reflection.

Odd reflection

Can we reflect a function preserving both continuity and differentiability? Yes, this is what higher-order reflections are for. They define {f(-x)} not just in terms of {f(x)} but also involve values at other points, like {f(x/2)}. Here is one such smart reflection:

\displaystyle    f(-x) = 4f(x/2)-3f(x)  \qquad\qquad\qquad (1)

Differentiable reflection

Indeed, letting {x\rightarrow 0^+}, we observe continuity: both sides converge to {f(0)}. Taking derivatives of both sides, we get

\displaystyle  -f'(-x) = 2f'(x/2) - 3f'(x)

where the limits of both sides as {x\rightarrow 0^+} again agree: they are {-f'(0)}.

A systematic way to obtain such reflection formulas is to consider what they do to monomials: {1}, {x}, {x^2}, etc. A formula that reproduces the monomials up to degree {d} will preserve the derivatives up to order {d}. For example, plugging {f(x)=1} or {f(x)=x} into (1) we get a valid identity. With {f(x)=x^2} the equality breaks down: {x^2} on the left, {-2x^2} on the right. As a result, the curvature of the graph shown above is discontinuous: at {x=0} it changes sign without passing through {0}.

To fix this, we’ll need to use a third point, for example {x/4}. It’s better not to use points like {2x}, because when the original domain of {f} is a bounded interval {[0,b]}, we probably want the reflection to be defined on all of {[-b,b]}.

So we look for coefficients {A,B,C} such that {f(-x)=Af(x/4)+Bf(x/2)+Cf(x)} holds as identity for {f(x)=1,x,x^2}. The linear system {A+B+C=1}, {A/4+B/2+C=-1}, {A/16+B/4+C=1} has the solution {A=16}, {B=-20}, {C=5}. This is our reflection formula, then:

\displaystyle  f(-x) = 16f(x/4)-20f(x/2)+5f(x)  \qquad\qquad\qquad (2)

And this is the result of reflecting {\exp(x)} according to (2):

Twice differentiable reflection

Now the curvature of the graph is continuous. One could go on, but since the human eye is not sensitive to discontinuities of the third derivative, I’ll stop here.
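If you would rather not solve the system by hand, here is a small Python sketch (mine, not from the post) that recovers the coefficients of (2) and confirms that the formula reproduces {x^k} exactly for {k\le 2} but not for {k=3}:

import numpy as np

# Coefficients A, B, C in f(-x) = A f(x/4) + B f(x/2) + C f(x),
# chosen so that the formula is exact for f = 1, x, x^2.
pts = np.array([1/4, 1/2, 1.0])
M = np.vstack([pts**k for k in range(3)])
b = np.array([(-1.0)**k for k in range(3)])
A, B, C = np.linalg.solve(M, b)
print(A, B, C)            # approximately 16, -20, 5

x = 0.3                   # arbitrary test point
for k in range(4):
    print(k, (-x)**k, A*(x/4)**k + B*(x/2)**k + C*x**k)   # agree for k <= 2, differ at k = 3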


In case you don’t believe the last paragraph, here is the reflection with three continuous derivatives, given by

\displaystyle  f(-x) = \frac{640}{7} f(x/8) - 144f(x/4)+60f(x/2)-\frac{45}{7}f(x)

and below it, the extension given by (2). For these plots I used Desmos because plots in Maple (at least in my version) have pretty bad aliasing.

Three continuous derivatives
Two continuous derivatives

After all, cubic splines have only two continuous derivatives, and they connect dots naturally.

Fractal-ish monotone functions

There are several ways to construct a strictly increasing continuous function which has zero derivative almost everywhere. I like the explicit construction given by R. Salem, On some singular monotonic functions which are strictly increasing (1943).

lambda = 1/4

Here is a streamlined version of the construction.

Fix {\lambda\in (0,1/2)} (on the above picture {\lambda=1/4}). Let {f_0(x)=x}, and inductively define {f_{n+1}} so that

  1. {f_{n+1} (x) = f_n(x)} when {x\in 2^{-n}\mathbb Z}.
  2. If {x\in 2^{-n}\mathbb Z}, let {f_{n+1}(x+2^{-n-1}) =\lambda f_n(x) + (1-\lambda) f_n(x+2^{-n})}.
  3. Now that {f_{n+1}} has been defined on {2^{-n-1}\mathbb Z}, extend it to {\mathbb R} by linear interpolation.
  4. Let {f=\lim f_n}.

Since {f(x+1)=f(x)+1} by construction, it suffices to understand the behavior of {f} on {[0,1]}.

Each {f_n} is piecewise linear and increasing. At each step of the construction, every line segment of {f_n} (say, with slope {s}) is replaced by two segments, with slopes {2(1-\lambda)s} and {2\lambda s}. Since {\lambda<1/2}, the new midpoint value {\lambda f_n(x) + (1-\lambda) f_n(x+2^{-n})} is at least the average of {f_n(x)} and {f_n(x+2^{-n})}, which equals {f_n(x+2^{-n-1})}. Hence, {f_{n+1}\ge f_n}.

Since {f(x)=f_n(x)} when {x\in 2^{-n}\mathbb Z}, it is easy to understand {f} by considering its values at dyadic rationals and using monotonicity. This is how one can see that:

  • The difference of values of {f} at consecutive points of {2^{-n}\mathbb Z} is at most {(1-\lambda)^{n}}. Therefore, {f} is Hölder continuous with exponent {\alpha = - \frac{\log (1-\lambda)}{\log 2}}.
  • The difference of values of {f} at consecutive points of {2^{-n}\mathbb Z} is at least {\lambda^{n}}. Therefore, {f} is strictly increasing, and its inverse is Hölder continuous with exponent {\beta = - \frac{\log 2}{\log \lambda}}.

It remains to check that {f'=0} almost everywhere. Since {f} is monotone, it is differentiable almost everywhere. Let {x} be a point of differentiability (and not a dyadic rational, though this is automatic). For each {n} there is {x_n\in 2^{-n}\mathbb Z} such that {x_n < x< x_n+2^{-n}}. Let {s_n = 2^{n} (f_n(x_n+2^{-n})-f_n(x_n))}; this is the slope of {f_n} on the {2^{-n}}-dyadic interval containing {x}. Since {f'(x)} exists, we must have {f'(x) = \lim_{n\rightarrow\infty} s_n}. On the other hand, the ratio of consecutive terms of this sequence, {s_{n+1}/s_n}, is always either {2 (1-\lambda )} or {2\lambda}. Such a sequence cannot have a finite nonzero limit: that would force {s_{n+1}/s_n\rightarrow 1}, while {2\lambda<1<2(1-\lambda)}. Thus {f'(x)=0}.


Here is another {f}, with {\lambda=1/8}.

lambda = 1/8

By making {\lambda} very small, and being more careful with the analysis of {f'}, one can make the Hausdorff dimension of the complement of {\{x \colon f'(x)=0\}} arbitrarily small.


An interesting modification of Salem’s function was introduced by Tukia in Hausdorff dimension and quasisymmetric mappings (1989). For the functions considered above, the one-sided derivatives at every dyadic rational are zero and infinity, which is a rather non-symmetric state of affairs. In particular, these functions are not quasisymmetric. But Tukia showed that if one alternates between {\lambda} and {1-\lambda} at every step, the resulting homeomorphism of {\mathbb R} becomes quasisymmetric. Here is a picture of the alternating construction with {\lambda=1/4}; preliminary stages of the construction are in green.

Tukia’s modification of Salem’s function

This is quite similar to how the introduction of alternating signs turns the Takagi curve (blancmange curve) into a quasiarc (i.e., a curve without cusps); see “Sweetened and flavored dessert made from gelatinous or starchy ingredients and milk”. But the fractal curves in this post are relatively mild-mannered: they are rectifiable (and thus, not really fractal).


Here is the simple Scilab code I used for the above plots.

r = 1/4                               // the parameter lambda of the construction
t = [0 1]                             // current grid of dyadic points
f = t                                 // f_0(x) = x on that grid
for i = 1:10
    t = [t, t(1:$-1)+2^(-i)]          // append the midpoints of the grid intervals
    f = [f, r*f(1:$-1)+(1-r)*f(2:$)]  // rule 2: midpoint value = r*(left) + (1-r)*(right)
    [t, ind] = gsort(t,'g','i')       // re-sort the grid in increasing order
    f = f(ind)                        // apply the same permutation to the values
end
plot(t,f)

To have preliminary stages shown as well, move plot(t,f) into the loop. For Tukia’s alternating version, insert the line r = 1-r into the loop.

Calculating 2*2 with high precision

The definition of derivative,

\displaystyle    f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h}   \ \ \ \ \ \ \ \ \ \ (1)

is not such a great way to actually find the derivative numerically. Its symmetric version,

\displaystyle    f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x-h)}{2h}   \ \ \ \ \ \ \ \ \ \ (2)

performs much better in computations. For example, consider the derivative of {f(x)=e^x} at the point {x=1}. We know that {f'(1)=2.718281828\dots}. Numerically, with {h=0.001}, we get

\displaystyle    \frac{f(1+h) - f(1)}{h} \approx \mathbf{2.71}9 \dots

(error {>10^{-3}}) versus

\displaystyle    \frac{f(1+h) - f(1-h)}{2h} \approx \mathbf{2.71828}2 \dots

(error {<10^{-6}}).
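These numbers are easy to reproduce; a quick check in Python:

import math

f, x, h = math.exp, 1.0, 1e-3
print((f(x+h) - f(x))/h - math.e)         # one-sided (1): error ~ 1.4e-3
print((f(x+h) - f(x-h))/(2*h) - math.e)   # symmetric (2): error ~ 4.5e-7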

Considering this, why don’t we ditch (1) altogether and adopt (2) as the definition of derivative? Just say that by definition,

\displaystyle    f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h)-f(x-h)}{2h}

whenever the limit exists.

This expands the class of differentiable functions: for example, {f(x)=|x|} becomes differentiable with {f'(0)=0}. Which looks more like a feature than a bug: after all, {f} has a minimum at {0}, and the horizontal line through the minimum is the closest thing to the tangent line that it has.

Another example: the function

\displaystyle     f(x) = \begin{cases} x ,\quad & x\le 0 \\ 3x,\quad & x>0 \end{cases}

has {f'(0)=2} under this definition, because

\displaystyle    \lim_{h\rightarrow 0^+}\frac{f(0+h)-f(0-h)}{2h} = \lim_{h\rightarrow 0^+}\frac{3h-(-h)}{2h} = 2

and

\displaystyle    \lim_{h\rightarrow 0^-}\frac{f(0+h)-f(0-h)}{2h} = \lim_{h\rightarrow 0^-}\frac{h-3(-h)}{2h} = 2

This example also makes sense: since {f(x)=|x|+2x}, getting {f'(0)=0+2} is expected. In fact, with the new definition we still have basic derivative rules: if {f,g} are differentiable, then {f+g}, {f-g}, {fg}, {f/g} are also differentiable (with the usual caveat about {g\ne 0}) and the familiar formulas hold.

Let’s test the chain rule on the function {g = f\circ f}. The rule says

\displaystyle     g'(0) = f'(f(0)) f'(0)    \ \ \ \ \ \ \ \ \ \ (3)

Since {f(0)=0}, the product on the right is {2\cdot 2}. On the other hand,

\displaystyle     g(x) = \begin{cases} x ,\quad & x\le 0 \\ 9x,\quad & x>0 \end{cases}

which implies, by a computation similar to the above, that {g'(0)=5}. So, if we want to have the chain rule (3), we must accept that

\displaystyle     \mathbf{2\cdot 2 = 5}

This is where the desire for high numerical precision leads.
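The arithmetic of the previous paragraphs, checked in Python:

def f(x):
    return x if x <= 0 else 3*x

def sym(g, x, h=1e-6):
    # symmetric difference quotient, as in definition (2)
    return (g(x + h) - g(x - h))/(2*h)

print(sym(f, 0))                   # 2.0
print(sym(lambda x: f(f(x)), 0))   # 5.0, although f'(f(0)) * f'(0) = 4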

Plenty of other things go wrong with the symmetric definition:

  • Maximum or minimum of {f} may be attained where {f'} exists and is nonzero: for example, {f(x)=\min(x,-3x)} has a maximum at {0}, yet {f'(0)=-1} under the symmetric definition.
  • A differentiable function may be discontinuous: any even function, no matter how wild, gets {f'(0)=0}.
  • Having {f'>0} everywhere does not imply that {f} is increasing.
  • Mean Value Theorem fails.

Scaling and oscillation

A function {f\colon \mathbb R\rightarrow\mathbb R} can be much larger than its derivative. Take the constant function {f(x)=10^{10}}, for example. Or {f(x)=10^{10}+\sin x} to make it nonconstant. But if one subtracts the average (mean) from {f}, the residual is nicely estimated by the derivative:

\displaystyle    \frac{1}{b-a}\int_a^b |f(x)-\overline{f}|\,dx \le \frac12 \int_a^b |f'(x)|\,dx    \ \ \ \ \ (1)

Here {\overline{f}} is the mean of {f} on {[a,b]}, namely {\overline{f}=\frac{1}{b-a}\int_a^b f(t)\,dt}. Indeed, what’s the worst that could happen? Something like this:

Deviation from the mean

Here {H} is at most the integral of {|f'|}, and the shaded area is at most {\frac12 H(b-a)}. This is what the inequality (1) says.

An appealing feature of (1) is that it is scale-invariant. For example, if we change the variable {u=2x}, both sides remain the same. The derivative will be greater by a factor of {2}, but will be integrated over the shorter interval. And on the left we have averages upon averages, which do not change under scaling.

What happens in higher dimensions? Let’s stick to two dimensions and consider a smooth function {f\colon\mathbb R^2\rightarrow\mathbb R}. Instead of an interval we now have a square, denoted {Q}. It makes sense to denote squares by {Q}, because it’s natural to call a square a cube, and “Q” is the first letter of “cube”. Oh wait, it isn’t. Moving on…

The quantity {b-a} was the length of interval of integration. Now we will use the area of {Q}, denoted {|Q|}. And {\overline{f}=\frac{1}{|Q|}\iint_Q f} is now the mean value of {f} on {Q}. At first glance one might conjecture the following version of (1):

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le C \int_Q |\nabla f(x,y)|\,dx\,dy   \ \ \ \ \ (2)

But this can’t be true because of inconsistent scaling. The left side of (2) is scale-invariant as before. The right side is not. If we shrink the cube by factor of {2}, the gradient {|\nabla f|} will go up by {2}, but the area goes down by {4}. This suggests that the correct inequality should be

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le C \left(\int_Q |\nabla f(x,y)|^2\,dx\,dy\right)^{1/2}   \ \ \ \ \ (3)

We need the square root so that the right side of (3) scales correctly with {f}: to first power.

And here is the proof. Let {f(*,y)} denote {f} averaged over {x}. Applying (1) to every horizontal segment in {Q}, we obtain

\displaystyle    \frac{1}{h}\iint_Q |f(x,y)-f(*,y)|\,dx\,dy \le \frac12 \int_Q |f_x(x,y)|\,dx\,dy    \ \ \ \ \ (4)

where {h} is the sidelength of {Q}. Now work with {f(*,y)}, using (1) along vertical segments:

\displaystyle  \frac{1}{h}\iint_Q |f(*,y)-f(*,*)|\,dx\,dy \le \frac12 \int_Q |f_y(*,y)|\,dx\,dy    \ \ \ \ \ (5)

Of course, {f(*,*)} is the same as {\overline{f}}. The derivative on the right can be estimated: the derivative of average does not exceed the average of the absolute value of derivative. To keep estimates clean, simply estimate both partial derivatives by {|\nabla f|}. From (4) and (5) taken together it follows that

\displaystyle    \frac{1}{h}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le \int_Q |\nabla f(x,y)|\,dx\,dy    \ \ \ \ \ (6)

This is an interesting result (a form of the Poincaré inequality), but in the present form it’s not scale-invariant. Remember that we expect the square of the gradient on the right. Cauchy-Schwarz to the rescue:

\displaystyle    \int_Q 1\cdot |\nabla f| \le \left( \int_Q 1 \right)^{1/2} \left( \int_Q |\nabla f|^2 \right)^{1/2}

The first factor on the right is simply {h}. Move it to the left and we are done:

\displaystyle    \frac{1}{|Q|}\iint_Q |f(x,y)-\overline{f}|\,dx\,dy \le \left(\int_Q |\nabla f(x,y)|^2\,dx\,dy\right)^{1/2}   \ \ \ \ \ (7)
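As a sanity check of (7), here is a Python sketch that approximates both sides by Riemann sums on the unit square; the test function is my arbitrary choice, not anything from the post.

import numpy as np

n = 500
xs = (np.arange(n) + 0.5)/n              # midpoint grid on [0,1]
X, Y = np.meshgrid(xs, xs)
F = np.sin(3*X)*np.cos(2*Y)              # smooth test function
Fx = 3*np.cos(3*X)*np.cos(2*Y)           # exact partial derivatives
Fy = -2*np.sin(3*X)*np.sin(2*Y)

lhs = np.abs(F - F.mean()).mean()        # mean oscillation over Q
rhs = np.sqrt((Fx**2 + Fy**2).mean())    # L2 norm of the gradient (|Q| = 1)
print(lhs, rhs)                          # lhs should not exceed rhs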

In higher dimensions we would of course have {n} instead of {2}. Which is one of many reasons why analysis in two dimensions is special: {L^n} is a Hilbert space only when {n=2}.

The left side of (7) is the mean oscillation of {f} on the square {Q}. The integrability of {|\nabla f|^n} in {n} dimensions ensures that {f} is a function of bounded mean oscillation, known as BMO. Actually, it is even in the smaller space VMO because the right side of (7) tends to zero as the square shrinks. But it need not be continuous or even bounded: for {f(x)=\log\log (1/|x|)} the integral of {|\nabla f|^n} converges in a neighborhood of the origin (just barely, thanks to {\log^n (1/|x|)} in the denominator). This is unlike the one-dimensional situation where the integrability of {|f'|} guarantees that the function is bounded.
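To see how barely the integral converges: in two dimensions {|\nabla f| = 1/(|x|\log(1/|x|))}, and the substitution {u=\log(1/r)} in polar coordinates gives {\iint_{|x|<1/2} |\nabla f|^2 = 2\pi \int_{\log 2}^{\infty} u^{-2}\,du = 2\pi/\log 2}. A short Python check of this arithmetic (using SciPy, purely as an illustration):

import math
from scipy.integrate import quad

# integral of |grad f|^2 over the disk |x| < 1/2, after u = log(1/r)
val, err = quad(lambda u: 2*math.pi/u**2, math.log(2), math.inf)
print(val, 2*math.pi/math.log(2))   # both ~ 9.0647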