# Continuity of circumcenter and circumradius

For a bounded set ${A}$ on the plane (or in any Euclidean space) one can define the circumcenter ${c(A)}$ and circumradius ${r(A)}$ as follows: ${r(A)}$ is the smallest radius of a closed disk containing ${A}$, and ${c(A)}$ is the center of such a disk. (Other terms in use: Chebyshev center and Chebyshev radius.)

The fact that ${c(A)}$ is well-defined may not be obvious: what if there are multiple disks of radius ${r(A)}$ that contain ${A}$? To investigate, introduce the farthest distance function ${f_A(x) = \sup_{a\in A} |x-a|}$. By definition, ${c(A)}$ is where ${f_A}$ attains its minimum. The function ${f_A}$ is convex, being the supremum of a family of convex functions. However, that does not guarantee the uniqueness of its minimum. We have two issues here:

• ${x\mapsto |x-a|}$ is not strictly convex
• the supremum of an infinite family of strictly convex functions can fail to be strictly convex (like ${\sup_{1 on the interval ${[0,1]}$).

The first issue is resolved by squaring ${f_A}$. Indeed, ${f_A^2}$ attains its minimum at the same place where ${f_A}$ does, and ${f_A(x)^2 = \sup_{a\in A} |x-a|^2}$ where each term ${|x-a|^2}$ is strictly convex.

Also, we don’t want to lose strict convexity when taking the supremum over ${a\in A}$. For this purpose, we must replace strict inequality by something more robust. The appropriate substitute is strong convexity: a function ${f}$ is strongly convex if there is ${\lambda>0}$ such that ${f(x)-\lambda |x|^2 }$ is convex. Let’s say that ${f}$ is ${\lambda}$-convex in this case.

Since ${|x-a|^2-|x|^2 = -2\langle x,a\rangle + |a|^2}$ is a convex (in fact linear) function of ${x}$, we see that ${|x-a|^2}$ is ${1}$-convex. This property passes to supremum: subtracting ${|x|^2}$ from the supremum is the same as subtracting it from each term. Strong convexity implies strict convexity and with it, the uniqueness of the minimum point. So, ${c(A)}$, the minimum of ${f_A^2}$, is uniquely defined. (Finding it in practice may be difficult. The spherical version of this problem is considered in Covering points with caps).

Having established uniqueness, it is natural to ask about stability, or more precisely, the continuity of ${c(A)}$ and ${r(A)}$ with respect to ${A}$. Introduce the Hausdorff distance ${d_{\mathcal H}}$ on the set of bounded subsets. By definition, ${d_{\mathcal H}(A,B)\le \delta}$ if ${A}$ is contained in ${\delta}$-neighborhood of ${B}$, and ${B}$ is contained in ${\delta}$-neighborhood of ${A}$. It is easy to see that ${r(B)\le r(A) + d_{\mathcal H}(A,B)}$, and therefore

$\displaystyle |r(A)-r(B)|\le d_{\mathcal H}(A,B)$

In words, the circumradius is a ${1}$-Lipschitz function of the set.

What about the circumcenter? If the set ${A}$ is shifted by ${d}$ units in some direction, the circumcenter moves by the same amount. So it may appear that it should also be a ${1}$-Lipschitz function of ${A}$. But this is false.

Observe (or recall from middle-school geometry) that the circumcenter of a right triangle is the midpoint of its hypotenuse:

Consider two right triangles:

• Vertices ${(-1,0), (1,0), (1,\epsilon)}$. The right angle is at ${(1,0)}$, and the circumvcenter is the midpoint of opposite side: ${(0,\epsilon/2)}$.
• Vertices ${(-1,0), (1,0), (\sqrt{1-\epsilon^2},\epsilon)}$. The right angle is at
${(\sqrt{1-\epsilon^2},\epsilon)}$ and the circumcenter is at ${(0,0)}$.

The Hausdorff distance between these two triangles is merely ${1-\sqrt{1-\epsilon^2} < \epsilon^2}$, yet the distance between their circumcenters is ${\epsilon/2}$. So, Lipschitz continuity fails, and the most we can hope for is Hölder continuity with exponent ${1/2}$.

And indeed, the circumcenter is locally ${1/2}$-Hölder continuous. To prove this, suppose ${|c(A)-c(B)|\ge \epsilon}$. The ${1}$-convexity of ${f_A^2}$ implies that

$\displaystyle f_A(c(B))^2 \ge f_A(c(A))^2+|c(A)-c(B)|^2 \ge r(A)^2 + \epsilon^2$

On the other hand, since ${|f_A-f_B|\le d_{\mathcal H}(A,B)}$ everywhere,

$\displaystyle f_A(c(B))\le f_B(c(B)) + d_{\mathcal H}(A,B) = r(B) + d_{\mathcal H}(A,B) \le r(A) + 2 d_{\mathcal H}(A,B)$

Putting things together,

$\displaystyle d_{\mathcal H}(A,B) \ge \frac12 (\sqrt{r(A)^2 + \epsilon^2} - r(A)) = \frac{\epsilon^2}{2( \sqrt{r(A)^2 + \epsilon^2} + r(A) )}$

Thus, as long as ${r(A)}$ remains bounded above, we have an inequality of the form ${d_{\mathcal H}(A,B) \ge c\, \epsilon^2}$, which is exactly ${1/2}$-Hölder continuity.

Remark. The proof uses no information about ${\mathbb R^n}$ other than the ${1}$-convexity of the squared distance function. As such, it applies to every CAT(0) space.

# Real line : Complex plane :: Hat : ?

The title is a word analogy puzzle. The plots below are hints. In each pair, the black curve is the same.

### x4

Answer: boa constrictor digesting an elephant.

# Infinite beatitude of non-existence: a journey into Nothingland

In the novella Flatland by Edwin A. Abbott, the Sphere leads the Square “downward to the lowest depth of existence, even to the realm of Pointland, the Abyss of No dimensions”:

I caught these words, “Infinite beatitude of existence! It is; and there is nothing else beside It.” [...] “It fills all Space,” continued the little soliloquizing Creature, “and what It fills, It is. What It thinks, that It utters; and what It utters, that It hears; and It itself is Thinker, Utterer, Hearer, Thought, Word, Audition; it is the One, and yet the All in All. Ah, the happiness, ah, the happiness of Being!”

Indeed, Pointland (a one-point space) is zero-dimensional by every concept of dimension that I know of. Yet there is something smaller: Nothingland — empty space, ${\varnothing}$ — whose non-existent inhabitants must be perpetually enjoying the happiness of Non-Being.

What is the dimension of Nothingland?

In topology, the empty set has dimension ${-1}$. This fits the inductive definition of topological dimension, which is the smallest number ${d}$ such that the space can be minced by removing a subset of dimension ${\le d-1}$. (Let’s say a space has been minced if what’s left has no connected subsets other than points.)

Thus, a nonempty finite (or countable) set has dimension ${0}$: it’s minced already, so we remove nothing, a set of dimension ${-1}$. A line or a curve is one-dimensional: they can be minced by removing a zero-dimensional subset, like rational numbers.

The Flatland itself can be minced by removing a one-dimensional subset (e.g., circles with rational radius and rational coordinates of the center), so it is two-dimensional. And so on.

The convention ${\mathrm{dim}\,\varnothing = -1}$, helpful in the definition, gets in the way later. For example, the topological dimension is subadditive under products: ${\mathrm{dim}\,(A\times B)\le \mathrm{dim}\,A + \mathrm{dim}\,B}$ … unless both ${A}$ and ${B}$ are empty, because then ${-1\le -2}$ is false. So the case ${A=B=\varnothing}$ must be excluded from the product theorem. We would not have to do this if ${\mathrm{dim}\,\varnothing }$ was defined to be ${-\infty}$.

Next, consider the Hausdorff dimension. Its definition is not inductive, but one has to introduce other concepts first. First, define the ${d}$-dimensional premeasure on scale ${\delta>0}$:

$\displaystyle \mathcal H^d_\delta (X) = \inf \sum_j (\mathrm{diam}\,{U_j})^{d}$

where the infimum is taken over all covers of ${X}$ by nonempty subsets ${U_j}$ with ${\mathrm{diam}\,{U_j}\le \delta}$. Requiring ${U_j}$ to be nonempty avoids the need to define the diameter of Nothingland, which would be another story. The empty space can be covered by empty family of nonempty subsets. The sum of empty set of numbers is ${0}$, and so ${\mathcal H^d_\delta (\varnothing) = 0}$.

Then we define the ${d}$-dimensional Hausdorff measure:

$\displaystyle \mathcal H^d (X) = \lim_{\delta\rightarrow0} \mathcal H^d_\delta (X)$

and finally,

$\displaystyle \mathrm{dim}_H (X) = \inf \{ d \colon \mathcal H^d (X)=0\}$

If in this last infimum we require ${d>0}$, the result is ${\mathrm{dim}_H (\varnothing) =0}$. But why make this restriction? The ${d}$-dimensional pre-measures and measures make sense for all real ${d}$. It’s just that for nonempty ${X}$, we are raising some small (or even zero) numbers to negative power, getting something large as a result. Consequently, every nonempty space has ${\mathcal H^d = \infty}$ for all ${d < 0}$.

But ${\mathcal H^d_\delta (\varnothing) = 0}$, from the sum of empty collection of numbers being zero. Hence, ${\mathcal H^d (\varnothing) = 0}$ for all real ${d}$, and this leads to ${\mathrm{dim}_H\,\varnothing = -\infty}$.

To have ${\mathrm{dim}_H\,\varnothing = -\infty}$ is also convenient because the Hausdorff dimension is superadditive under products: ${\mathrm{dim}_H\,(A\times B)\ge \mathrm{dim}_H\,A + \mathrm{dim}_H\,B}$. This inequality was proved for general metric spaces as recently as 1995, by John Howroyd. If we don’t have ${\mathrm{dim}_H\,\varnothing = -\infty}$, then both factors ${A}$ and ${B}$ must be assumed nonempty.

So… should Nothingland have topological dimension ${-1}$ and Hausdorff dimension ${-\infty}$? But that would violate the inequality ${\mathrm{dim} (X)\le \mathrm{dim}_H (X)}$ which holds for every other separable metric space. In fact, for such spaces the topological dimension is simply the infimum of the Hausdorff dimension over all metrics compatible with the topology.

I am inclined to let the dimension of Nothingland be ${-\infty}$ for every concept of dimension.

# Words that contain UIO, and best-fitting lines

In Calculus I we spend a fair amount of time talking about how nicely the tangent line fits a smooth curve.

But truth be told, it fits only near the point of tangency. How can we find the best approximating line for a function ${f}$ on a given interval?

A natural measure of quality of approximation is the maximum deviation of the curve from the line, ${E(f;\alpha,\beta) = \max_{[a, b]} |f(x) - \alpha x-\beta|}$ where ${\alpha,\beta}$ are the coefficients in the line equation, to be determined. We need ${\alpha,\beta}$ that minimize ${E(f;\alpha,\beta)}$.

The Chebyshev equioscillation theorem is quite useful here. For one thing, its name contains the letter combination uio, which Scrabble players may appreciate. (Can you think of other words with this combination?) Also, its statement does not involve concepts outside of Calculus I. Specialized to the case of linear fit, it says that ${\alpha,\beta}$ are optimal if and only if there exist three numbers ${ x_1 in ${[a, b]}$ such that the deviations ${\delta_i = f(x_i) - \alpha x_i-\beta}$

• are equal to ${E(f;\alpha,\beta)}$ in absolute value: ${|\delta_i| = E(f;\alpha,\beta)}$ for ${i=1,2,3}$
• have alternating signs: ${\delta_1 = -\delta_2 = \delta_3}$

Let’s consider what this means. First, ${f'(x_i) =\alpha}$ unless ${x_i}$ is an endpoint of ${[a,b]}$. Since ${x_2}$ cannot be an endpoint, we have ${f'(x_2)=\alpha}$.

Furthermore, ${f(x) - \alpha x }$ takes the same value at ${x_1}$ and ${x_3}$. This gives an equation for ${x_2}$

$\displaystyle f(x_1)-f'(x_2)x_1 = f(x_3)-f'(x_2) x_3 \qquad \qquad (1)$

which can be rewritten in the form resembling the Mean Value Theorem:

$\displaystyle f'(x_2) = \frac{f(x_1)-f(x_3)}{x_1-x_3} \qquad \qquad (2)$

If ${f'}$ is strictly monotone, there can be only one ${x_i}$ with ${f'(x_i)=\alpha}$. Hence ${x_1=a}$ and ${x_3=b}$ in this case, and we find ${x_2}$ by solving (2). This gives ${\alpha = f'(x_2)}$, and then ${\beta}$ is not hard to find.

Here is how I did this in Sage:

var('x a b')
f = sin(x)  # or another function
df = f.diff(x)
a = # left endpoint
b = # right endpoint

That was the setup. Now the actual computation:

var('x1 x2 x3')
x1 = a
x3 = b
x2 = find_root(f(x=x1)-df(x=x2)*x1 == f(x=x3)-df(x=x2)*x3, a, b)
alpha = df(x=x2)
beta = 1/2*(f(x=x1)-alpha*x1 + f(x=x2)-alpha*x2)
show(plot(f,a,b)+plot(alpha*x+beta,a,b,color='red'))

However, the algorithm fails to properly fit a line to the sine function on ${[0,3\pi/2]}$:

The problem is, ${f'(x)=\cos x}$ is no longer monotone, making it possible for two of ${x_i}$ to be interior points. Recalling the identities for cosine, we see that these points must be symmetric about ${x=\pi}$. One of ${x_i}$ must still be an endpoint, so either ${x_1=a}$ (and ${x_3=2\pi-x_2}$) or ${x_3=b}$ (and ${x_1=2\pi-x_2}$). The first option works:

This same line is also the best fit on the full period ${[0,2\pi]}$. It passes through ${(\pi,0)}$ and has the slope of ${-0.2172336...}$ which is not a number I can recognize.

On the interval ${[0,4\pi]}$, all three of the above approaches fail:

Luckily we don’t need a computer in this case. Whenever ${|f|}$ has at least three points of maximum with alternating signs of ${f}$, the Chebyshev equioscillation theorem implies that the best linear fit is the zero function.

# Unreasonable effectiveness of the left endpoint rule

The left endpoint rule, and its twin right endpoint rule, are ugly ducklings of integration methods. The left endpoint rule is just the average of the values of the integrand over left endpoints of equal subintervals:

$\displaystyle \int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{i=0}^{n-1} f(a+i/n)$

Here is its graphical representation with ${n=10}$ on the interval ${[-1,1]}$: the sample points are marked with vertical lines, the length of each line representing the weight given to that point. Every point has the same weight, actually.

Primitive, ineffective, with error ${O(1/n)}$ for ${n}$ points used.

Simpson’s rule is more sophisticated, with error ${O(1/n^4)}$. It uses weights of three sizes:

Gaussian quadrature uses specially designed (and difficult to compute) sample points and weights: more points toward the edges, larger weights in the middle.

Let’s compare these quadrature rules on the integral ${\int_{-1}^1 e^x \cos \pi x\,dx}$, using ${10}$ points as above. Here is the function:

• The exact value of the integral is ${\dfrac{e^{-1}-e}{1+\pi^2}}$, about -0.216.
• Simpson’s rule gets within 0.0007 of the exact value. Well done!
• Gaussian quadrature gets within 0.000000000000003 of the exact value. Amazing!
• And the lame left endpoint rule outputs… a positive number, getting even the sign wrong! This is ridiculous. The error is more than 0.22.

Let’s try another integral: ${\int_{-1}^1 \sqrt{4+\cos \pi x +\sin \pi x} \,dx}$, again using ${10}$ points. The function looks like this:

The integral can be evaluated exactly… sort of. In terms of elliptic integrals. And preferably not by hand:

• Simpson’s rule is within 0.00001 of the exact value, even better than the first time.
• Gaussian quadrature is within 0.00000003, not as spectacular as in the first example.
• And the stupid left endpoint rule is … accurate within 0.00000000000000005. What?

The integral of a smooth periodic function over its period amounts to integration over a circle. When translated to the circle, the elaborate placement of Gaussian sample points is… simply illogical. There is no reason to bunch them up at any particular point: there is nothing special about (-1,0) or any other point of the circle.

The only natural approach here is the simplest one: equally spaced points, equal weights. Left endpoint rule uses it and wins.

# The Nelder-Mead minimization algorithm

It is easy to find the minimum of ${f(x,y) = x^2+16y^2}$ if you are human. For a computer this takes more work:

The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of ${f}$. Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?

A natural way to look for minimum is to slide along the graph in the direction opposite to ${\nabla f}$; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of ${f}$; this is a discrete process of flips and flops. The process amounts to the triangle of contact being replaced by another triangle with an adjacent side. The triangle is flipped in the direction away from the highest vertex.

This is already a reasonable minimization algorithm: begin with a triangle ${T}$; find the values of ${f}$ at the vertices of ${T}$; reflect the triangle away from the highest value; if the reflected point ${R}$ has a smaller value, move there; otherwise stop.

But there’s a problem: the size of triangle never changes in this process. If ${T}$ is large, we won’t know where the minimum is even if ${T}$ eventually covers it. If ${T}$ is small, it will be moving in tiny steps.

Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of ${T}$. It is natural to contract it toward the “best” vertex (the one with the smallest value of ${f}$), replacing two other vertices with the midpoints of corresponding sides. Then repeat. The stopping condition can be the values of ${f}$ at all vertices becoming very close to one another.

This looks clever, but the results are unspectacular. The algorithm is prone to converge to a non-stationary point where just by an accident the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of ${f}$.

The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can grow smaller and larger, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:

• Begin with some triangle ${T}$.
• Evaluate the function ${f}$ at each vertex. Call the vertices ${W,G,B}$ where ${W}$ is the worst one (the largest value of ${f}$) and ${B}$ is the best.
• Reflect ${W}$ about the midpoint of the good side ${GB}$. Let ${R}$ be the reflected point.
• If ${f(R), then we consider moving even further in the same direction, extending the line ${WR}$ beyond ${R}$ by half the length of ${WR}$. Choose between ${R}$ and ${E}$ based on where ${f}$ is smaller, and make the chosen point a new vertex of our triangle, replacing ${W}$.
• Else, do not reflect and instead shrink the triangle toward ${B}$.
• Repeat, stopping when we either exceed the number of iterations or all values of ${f}$ at the vertices of triangle become nearly equal.

(The full version of the Nelder-Mead algorithm also includes the comparison of ${R}$ with ${G}$, and also involves trying a point inside the triangle.)

This is Rosenbrock’s function ${f(x,y)=100(x^2-y)^2 + (x-1)^2}$, one of standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola ${y=x^2}$. At the bottom of the valley, the incline toward the minimum ${(1,1)}$ is relatively small, compared to steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.

The algorithm described above gets within ${0.001}$ of the minimum in 65 steps.

In conclusion, Scilab code with this algorithm.

x = -0.4:0.1:1.6; y = -2:0.1:1.4          // viewing window
[X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30)  // contour plot
plot([1],[1],'r+')                             // minimum point
tol = 10^(-6)
n = 0
T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5]        //  initial triangle
for i=1:3
values(i) = f(T(i,1), T(i,2))
end
while (%T)
xpoly(T(:,1),T(:,2),'lines',1)         // draw the triangle
[values, index] = gsort(values)          // sort the values
T = T(index,:)
if values(1)-values(3) < tol            // close enough?
mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2))
break
end
R = T(2,:) + T(3,:) - T(1,:)             // reflected
fR = f(R(1),R(2))
if fR < values(3)
E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:)  // extended
fE = f(E(1),E(2))
if fE < fR
T(1,:)=E; values(1)=fE     // pick extended
else
T(1,:)=R; values(1)=fR     // pick reflected
end
else
for i=1:2
T(i,:) = (T(i,:)+T(3,:))/2      // shrink
values(i) = f(T(i,1), T(i,2))
end
end
n = n+1
if n >= 200
disp('Failed to converge'); break    //  too bad
end
end

# Hertzsprung–Russell diagram of Stack Exchange sites

The Hertzsprung–Russell diagram is a scatter plot of stars by temperature and luminosity.

Traditionally, it is shown with temperature axis pointing from right to left, which I don’t really like.

Stack Exchange family of sites is not (yet) as numerous as stars in the Universe; there are only 120 or so of them. But they can also be organized on a two-parameter log-log scatterplot. The two parameters are: total number of questions (intrinsic characteristic, like surface temperature) and the number of daily visits (luminosity in Internet terms).

The linear scale chart was not going to work, due to the supergiant size of Stack Overflow:

Even on the log-log scale Stack Overflow is an outlier, but within reason:

The colors follow Stack Exchange classification: Technology, Science, Culture & Recreation, Life & Arts, Business, Professional. The largest of science sites, and the second largest overall, is Mathematics, although it trails several non-science sites in luminosity.

Annotated version of the above diagram:

Both Mathematics and MathOverflow have low traffic compared to their size: perhaps the Internet audience is just not that into math? On the other hand, the young Mathematics Educators site is way up in the traffic category.

Ironically, Astronomy is a low-luminosity site, trailing in traffic the much smaller Earth Science. Other annotated science sites are well established by now: Physics, Statistics, and (slow-moving) Theoretical Computer Science.

The only two non-technology sites that beat Mathematics in traffic are Gaming and English. Gaming isn’t really an exception here, since all large “technology” sites revolve around computers.

Three of the sites have extremely low traffic currently: Beer, Italian, and Expatriates. If this does not improve, they may be shut down… or perhaps merged into one: Beer for Italian Expatriates.

(*) “Hertzsprung-Russel StarData” by ESOhttp://www.eso.org/public/images/. Licensed under CC BY 3.0 via Wikimedia Commons.

# Fractal-ish monotone functions

There are several ways to construct a strictly increasing continuous function which has zero derivative almost everywhere. I like the explicit construction given by R. Salem, On some singular monotonic functions which are strictly increasing (1943).

Here is a streamlined version of the construction.

Fix ${\lambda\in (0,1/2)}$ (on the above picture ${\lambda=1/4}$). Let ${f_0(x)=x}$, and inductively define ${f_{n+1}}$ so that

1. ${f_{n+1} (x) = f_n(x)}$ when ${x\in 2^{-n}\mathbb Z}$.
2. If ${x\in 2^{-n}\mathbb Z}$, let ${f_{n+1}(x+2^{-n-1}) =\lambda f_n(x) + (1-\lambda) f_n(x+2^{-n})}$.
3. Now that ${f_{n+1}}$ has been defined on ${2^{-n-1}\mathbb Z}$, extend it to ${\mathbb R}$ by linear interpolation.
4. Let ${f=\lim f_n}$.

Since ${f(x+1)=f(x)+1}$ by construction, it suffices to understand the behavior of ${f}$ on ${[0,1]}$.

Each ${f_n}$ is piecewise linear and increasing. At each step of the construction, every line segment of ${f_n}$ (say, with slope ${s}$) is replaced by two segments, with slopes ${2(1-\lambda)s}$ and ${2\lambda s}$. Since ${\lambda<1/2}$, it follows that ${f_{n+1}(x+2^{-n-1}) > f_n(x+2^{-n-1})}$. Hence, ${f_{n+1}\ge f_n}$.

Since ${f(x)=f_n(x)}$ when ${x\in 2^{-n}\mathbb Z}$, it is easy to understand ${f}$ by considering its values at dyadic rationals and using monotonicity. This is how one can see that:

• The difference of values of ${f}$ at consecutive points of ${2^{-n}\mathbb Z}$ is at most ${(1-r)^{n}}$. Therefore, ${f}$ is Hölder continuous with exponent ${\alpha = - \frac{\log (1-r)}{\log 2}}$.
• The difference of values of ${f}$ at consecutive points of ${2^{-n}\mathbb Z}$ is at least ${r^{n}}$. Therefore, ${f}$ is strictly increasing, and its inverse is Hölder continuous with exponent ${\alpha = - \frac{\log r}{\log 2}}$.

It remains to check that ${f'=0}$ almost everywhere. Since ${f}$ is monotone, it is differentiable almost everywhere. Let ${x}$ be a point of differentiability (and not a dyadic rational, though this is automatic). For each ${n}$ there is ${x_n\in 2^{-n}\mathbb Z}$ such that ${x_n < x< x_n+2^{-n}}$. Let ${s_n = 2^{n} (f_n(x_n+2^{-n})-f_n(x_n))}$; this is the slope of ${f_n}$ on the ${2^{-n}}$-dyadic interval containing ${x}$. Since ${f'(x)}$ exists, we must have ${f'(x) = \lim_{n\rightarrow\infty} s_n}$. On the other hand, the ratio of consecutive terms of this sequence, ${s_{n+1}/s_n}$, is always either ${2 (1-\lambda )}$ or ${2\lambda}$. Such a sequence cannot have a finite nonzero limit. Thus ${f'(x)=0}$.

Here is another ${f}$, with ${\lambda=1/8}$.

By making ${\lambda}$ very small, and being more careful with the analysis of ${f'}$, one can make the Hausdorff dimension of the complement of ${\{x \colon f'(x)=0\}}$ arbitrarily small.

An interesting modification of Salem’s function was introduced by Tukia in Hausdorff dimension and quasisymmetric mappings (1989). For the functions considered above, the one-sided derivatives at every dyadic rational are zero and infinity, which is a rather non-symmetric state of affair. In particular, these functions are not quasisymmetric. But Tukia showed that if one alternates between ${\lambda}$ and ${1-\lambda}$ at every step, the resulting homeomorphism of ${\mathbb R}$ becomes quasisymmetric. Here is the picture of alternating construction with ${\lambda=1/4}$; preliminary stages of construction are in green.

This is quite similar to how the introduction of alternating signs turns Takagi curve (blancmange curve) into a quasiarc (i.e., a curve without cusps); see Sweetened and flavored dessert made from gelatinous or starchy ingredients and milk. But the fractal curves in this post are relatively mild-mannered: they are rectifiable (and thus, not really fractal).

Here is the simple Scilab code I used for the above plots.

r = 1/4
t = [0 1]
f = t
for i = 1:10
t = [t, t(1:$-1)+2^(-i)] f = [f, r*f(1:$-1)+(1-r)*f(2:\$)]
[t, ind] = gsort(t,'g','i')
f = f(ind)
end
plot(t,f)


To have preliminary stages shown as well, move plot(t,f) into the loop. For Tukia’s alternating version, insert the line r = 1-r into the loop.

Link to preliminary version of this post.

# Unreasonable effectiveness of Runge-Kutta 2

I used to give this example of an initial-value problem whenever numerical solutions of ODE came up in class:

$\displaystyle y' = -y^2, \quad y(0)=1$

The exact solution is ${y(t)=1/(1+t)}$. Using Euler’s method with step ${h=1}$ (admittedly large) we get, starting with ${y_0=1}$:

$\displaystyle y_1 = y_0 + h (-y_0^2) = 0,\quad y_2 = y_1 + h (-y_1^2) = 0, \dots$

The numerical solution drops down to zero at once and stays there. Not good at all.

The trapezoidal version of RK2 (also known as improved Euler, or Heun’s method) treats the above value of ${y_1}$ merely as prediction ${\tilde y_1}$, and computes the actual ${y_1}$ using the average of the slopes at ${y_0}$ and ${\tilde y_1}$:

$\displaystyle y_1 = y_0 + \frac{h}{2} (-y_0^2 - \tilde y_1^2) = 1 + \frac12 (-1 - 0 ) = \frac12$

… which is right on target. Let’s do it again: the prediction ${\tilde y_2 = y_1 + h (-y_1^2) = 1/4}$ and correction

$\displaystyle y_2 = y_1 + \frac{h}{2} (-y_1^2 - \tilde y_2^2) = 1 + \frac12 \left(-\frac{1}{4}-\frac{1}{16} \right) = \frac{11}{32}$

… which is ${\approx 0.34}$, compared to the exact solution ${1/3}$. Nice!

I like the example, because all numbers are manageable with mental arithmetics for two steps. And it’s clear that ${11/32}$ is pretty close to ${1/3 = 11/33}$.

But the fact that the relative error stays below ${3.7\%}$ throughout the computation despite the large step size ${h=1}$ seems… too good to be true. Consider that for the exact solution, the expression ${y^2}$ can differ by the factor of ${4}$ between two ${t}$-values at distance ${1}$.

The midpoint version of RK2, which generally performs about as well as the trapezoidal version, is nowhere as close in this example. Indeed, using the same data as above, we get

$\displaystyle y_1 = y_0 - h \left(\frac{y_0+\tilde y_1}{2}\right)^2 = \frac{3}{4}$

and so forth: the relative error reaches ${59\%}$ at the second step. That’s 16 times worse than the trapezoidal version.

What happens here has less to do with numerical analysis than with algebra of rational functions. Using ${h=1}$ in trapezoidal RK2, we are in effect iterating the function

$\displaystyle \varphi(y) = y + \frac12 (-y^2-(y-y^2)^2) = y-y^2+y^3-\frac12 y^4$

The exact solution would be obtained by iterating

$\displaystyle \Phi (y) = \frac{y}{1+y} = y-y^2+ y^3 - y^4 + \dots$

Two functions just happen to coincide at ${y=1}$, which is our starting point here.

From there we get to ${0.5}$, and on ${[0,0.5]}$ they are really close anyway, due to another coincidence: the truncation error ${|\varphi(y)-\Phi(y)|}$ is ${O(y^4)}$ instead of ${O(y^3)}$ as it is normally for second-order methods.

The midpoint method with ${h=1}$ amounts to iterating

$\displaystyle \psi(y) = y - \left(\frac{y+y-y^2}{2} \right)^2 = y-y^2+y^3-\frac14 y^4$

which is not substantially further away from ${\Phi}$, but does not enjoy the lucky coincidence at ${y=1}$.

The tables are turned with the initial value ${y_0=3}$. The exact solution is ${y(t) = 3/(1+3t)}$, which drops sharply from ${t=0}$ to ${t=1}$; its slope decreases by the factor of ${16}$ during one step. Still, midpoint-RK2 does a decent job with ${h=1}$:

while trapezoidal RK2 outputs ${y_1=-19.5}$, ${y_2=-80110}$, ${y_3 = -2\cdot 10^{19}}$ and promptly overflows.

With a reasonable step size, like ${h=0.1}$, normality is restored: both methods perform about equally well, with ${0.13\%}$ error for trapezoidal and ${0.21\%}$ for midpoint.

# Calculating 2*2 with high precision

The definition of derivative,

$\displaystyle f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h} \ \ \ \ \ \ \ \ \ \ (1)$

is not such a great way to actually find the derivative numerically. Its symmetric version,

$\displaystyle f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x-h)}{2h} \ \ \ \ \ \ \ \ \ \ (2)$

performs much better in computations. For example, consider the derivative ${f(x)=e^x }$ at the point ${x=1}$. We know that ${f'(1)=2.718281828\dots}$. Numerically, with ${h=0.001}$, we get

$\displaystyle \frac{f(1+h) - f(1)}{h} \approx \mathbf{2.71}9 \dots$

(error ${>10^{-3}}$) versus

$\displaystyle \frac{f(1+h) - f(1-h)}{2h} \approx \mathbf{2.71828}2 \dots$

(error ${<10^{-6}}$).

Considering this, why don’t we ditch (1) altogether and adopt (2) as the definition of derivative? Just say that by definition,

$\displaystyle f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h)-f(x-h)}{2h}$

whenever the limit exists.

This expands the class of differentiable functions: for example, ${f(x)=|x|}$ becomes differentiable with ${f'(0)=0}$. Which looks more like a feature than a bug: after all, ${f}$ has a minimum at ${0}$, and the horizontal line through the minimum is the closest thing to the tangent line that it has.

Another example: the function

$\displaystyle f(x) = \begin{cases} x ,\quad & x\le 0 \\ 3x,\quad & x>0 \end{cases}$

has ${f'(0)=2}$ under this definition, because

$\displaystyle \lim_{h\rightarrow 0^+}\frac{f(x+h)-f(x-h)}{2h} = \lim_{h\rightarrow 0^+}\frac{3h-(-h)}{2h} = 2$

and

$\displaystyle \lim_{h\rightarrow 0^-}\frac{f(x+h)-f(x-h)}{2h} = \lim_{h\rightarrow 0^-}\frac{h-3(-h)}{2h} = 2$

This example also makes sense: since ${f(x)=|x|+2x}$, getting ${f'(0)=0+2}$ is expected. In fact, with the new definition we still have basic derivative rules: if ${f,g}$ are differentiable, then ${f+g}$, ${f-g}$, ${fg}$, ${f/g}$ are also differentiable (with the usual caveat about ${g\ne 0}$) and the familiar formulas hold.

Let’s test the chain rule on the function ${g = f\circ f}$. The rule says

$\displaystyle g'(0) = f'(f(0)) f'(0) \ \ \ \ \ \ \ \ \ \ (3)$

Since ${f(0)=0}$, the product on the right is ${2\cdot 2}$. On the other hand,

$\displaystyle g(x) = \begin{cases} x ,\quad & x\le 0 \\ 9x,\quad & x>0 \end{cases}$

which implies, by a computation similar to the above, that ${g'(0)=5}$. So, if we want to have the chain rule (3), we must accept that

$\displaystyle \mathbf{2\cdot 2 = 5}$

This is where the desire for high numerical precision leads.

Plenty of other things go wrong with the symmetric definition:

• Maximum or minimum of ${f}$ may be attained where ${f'}$ exists and is nonzero.
• A differentiable function may be discontinuous.
• Having ${f'>0}$ everywhere does not imply that ${f}$ is increasing.
• Mean Value Theorem fails.