The Nelder-Mead minimization algorithm

It is easy to find the minimum of {f(x,y) = x^2+16y^2} if you are human. For a computer this takes more work:

Search for the minimum of x^2+16y^2

The animation shows a simplified form of the Nelder-Mead algorithm: a simplex-based minimization algorithm that does not use any derivatives of {f}. Such algorithms are easy to come up with for functions of one variable, e.g., the bisection method. But how to minimize a function of two variables?

A natural way to look for the minimum is to slide along the graph in the direction opposite to {\nabla f}; this is the method of steepest descent. But for computational purposes we need a discrete process, not a continuous one. Instead of thinking of a point sliding down, think of a small tetrahedron tumbling down the graph of {f}; this is a discrete process of flips and flops. Each flip amounts to the contact triangle being replaced by an adjacent triangle sharing a side with it. The triangle is flipped in the direction away from the highest vertex.

This is already a reasonable minimization algorithm: begin with a triangle {T}; find the values of {f} at the vertices of {T}; reflect the triangle away from the highest value; if the reflected point {R} has a smaller value, move there; otherwise stop.

But there’s a problem: the size of the triangle never changes in this process. If {T} is large, we won’t know where the minimum is even if {T} eventually covers it. If {T} is small, it will be moving in tiny steps.

Perhaps, instead of stopping when reflection does not work anymore, we should reduce the size of {T}. It is natural to contract it toward the “best” vertex (the one with the smallest value of {f}), replacing the two other vertices with the midpoints of the corresponding sides. Then repeat. The stopping condition can be the values of {f} at all vertices becoming very close to one another.

This looks clever, but the results are unspectacular. The algorithm is prone to converge to a non-stationary point where, just by accident, the triangle attains a nearly horizontal position. The problem is that the triangle, while changing its size, does not change its shape to fit the geometry of the graph of {f}.

The Nelder-Mead algorithm adapts the shape of the triangle by including the possibility of stretching while flipping. Thus, the triangle can shrink or grow, moving faster when the path is clear, or becoming very thin to fit into a narrow passage. Here is a simplified description:

  • Begin with some triangle {T}.
  • Evaluate the function {f} at each vertex. Call the vertices {W,G,B} where {W} is the worst one (the largest value of {f}) and {B} is the best.
  • Reflect {W} about the midpoint of the good side {GB}. Let {R} be the reflected point.
  • If {f(R)<f(B)}, then we consider moving even further in the same direction: extend the segment {WR} beyond {R} by half the length of {WR} to obtain the extended point {E}. Choose between {R} and {E} based on where {f} is smaller, and make the chosen point a new vertex of our triangle, replacing {W}.
  • Else, do not reflect and instead shrink the triangle toward {B}.
  • Repeat, stopping when we either exceed the maximum number of iterations or all values of {f} at the vertices of the triangle become nearly equal.

(The full version of the Nelder-Mead algorithm also includes the comparison of {R} with {G}, and also involves trying a point inside the triangle.)

Rosenbrock’s function

This is Rosenbrock’s function {f(x,y)=100(x^2-y)^2 + (x-1)^2}, one of the standard torture tests for minimization algorithms. Its graph has a narrow valley along the parabola {y=x^2}. At the bottom of the valley, the incline toward the minimum {(1,1)} is relatively small, compared to the steep walls surrounding the valley. The steepest descent trajectory quickly reaches the valley but dramatically slows down there, moving in tiny zig-zagging steps.

The algorithm described above gets within {0.001} of the minimum in 65 steps.

Minimizing Rosenbrock’s function

In conclusion, here is Scilab code implementing this algorithm (with the objective {f}, Rosenbrock’s function from above, defined at the top).

function v = f(x,y)                        // objective: Rosenbrock's function
    v = 100*(x.^2-y).^2 + (x-1).^2
endfunction
x = -0.4:0.1:1.6; y = -2:0.1:1.4          // viewing window
[X,Y] = meshgrid(x,y); contour(x,y,f(X,Y)',30)  // contour plot
plot([1],[1],'r+')                             // minimum point
tol = 10^(-6)
n = 0
T = [0, -1.5 ; 1.4, -1.5; 1.5, 0.5]        //  initial triangle
for i=1:3 
    values(i) = f(T(i,1), T(i,2))
end
while (%T) 
    xpoly(T(:,1),T(:,2),'lines',1)         // draw the triangle  
    [values, index] = gsort(values)          // sort the values 
    T = T(index,:)       
    if values(1)-values(3) < tol            // close enough?
        mfprintf(6, "Minimum at (%.3f, %.3f)",T(3,1),T(3,2))
        break 
    end
    R = T(2,:) + T(3,:) - T(1,:)             // reflected  
    fR = f(R(1),R(2))
    if fR < values(3)                         
        E = 1.5*T(2,:) + 1.5*T(3,:) - 2*T(1,:)  // extended  
        fE = f(E(1),E(2))
        if fE < fR 
            T(1,:)=E; values(1)=fE     // pick extended 
        else
            T(1,:)=R; values(1)=fR     // pick reflected 
        end
    else 
        for i=1:2
            T(i,:) = (T(i,:)+T(3,:))/2      // shrink
            values(i) = f(T(i,1), T(i,2))      
        end
    end
    n = n+1
    if n >= 200
        disp('Failed to converge'); break    //  too bad 
    end
end

Hertzsprung–Russell diagram of Stack Exchange sites

The Hertzsprung–Russell diagram is a scatter plot of stars by temperature and luminosity.

Hertzsprung–Russell diagram*

Traditionally, it is shown with the temperature axis pointing from right to left, which I don’t really like.

The Stack Exchange family of sites is not (yet) as numerous as the stars in the Universe; there are only 120 or so of them. But they can also be organized on a two-parameter log-log scatterplot. The two parameters are the total number of questions (an intrinsic characteristic, like surface temperature) and the number of daily visits (luminosity, in Internet terms).

A linear-scale chart was not going to work, due to the supergiant size of Stack Overflow:

Linear scale

Even on the log-log scale Stack Overflow is an outlier, but within reason:

log-log scale

The colors follow Stack Exchange classification: Technology, Science, Culture & Recreation, Life & Arts, Business, Professional. The largest of science sites, and the second largest overall, is Mathematics, although it trails several non-science sites in luminosity.

Annotated version of the above diagram:

Annotated diagram

Both Mathematics and MathOverflow have low traffic compared to their size: perhaps the Internet audience is just not that into math? On the other hand, the young Mathematics Educators site is way up in the traffic category.

Ironically, Astronomy is a low-luminosity site, trailing in traffic the much smaller Earth Science. Other annotated science sites are well established by now: Physics, Statistics, and (slow-moving) Theoretical Computer Science.

The only two non-technology sites that beat Mathematics in traffic are Gaming and English. Gaming isn’t really an exception here, since all large “technology” sites revolve around computers.

Three of the sites have extremely low traffic currently: Beer, Italian, and Expatriates. If this does not improve, they may be shut down… or perhaps merged into one: Beer for Italian Expatriates.


(*) “Hertzsprung-Russel StarData” by ESO, http://www.eso.org/public/images/. Licensed under CC BY 3.0 via Wikimedia Commons.

Fractal-ish monotone functions

There are several ways to construct a strictly increasing continuous function which has zero derivative almost everywhere. I like the explicit construction given by R. Salem, On some singular monotonic functions which are strictly increasing (1943).

lambda = 1/4

Here is a streamlined version of the construction.

Fix {\lambda\in (0,1/2)} (on the above picture {\lambda=1/4}). Let {f_0(x)=x}, and inductively define {f_{n+1}} so that

  1. {f_{n+1} (x) = f_n(x)} when {x\in 2^{-n}\mathbb Z}.
  2. If {x\in 2^{-n}\mathbb Z}, let {f_{n+1}(x+2^{-n-1}) =\lambda f_n(x) + (1-\lambda) f_n(x+2^{-n})}.
  3. Now that {f_{n+1}} has been defined on {2^{-n-1}\mathbb Z}, extend it to {\mathbb R} by linear interpolation.
  4. Let {f=\lim f_n}.

Since {f(x+1)=f(x)+1} by construction, it suffices to understand the behavior of {f} on {[0,1]}.

Each {f_n} is piecewise linear and increasing. At each step of the construction, every line segment of {f_n} (say, with slope {s}) is replaced by two segments, with slopes {2(1-\lambda)s} and {2\lambda s}. Since {\lambda<1/2}, it follows that {f_{n+1}(x+2^{-n-1}) > f_n(x+2^{-n-1})}. Hence, {f_{n+1}\ge f_n}.

Since {f(x)=f_n(x)} when {x\in 2^{-n}\mathbb Z}, it is easy to understand {f} by considering its values at dyadic rationals and using monotonicity. This is how one can see that:

  • The difference of values of {f} at consecutive points of {2^{-n}\mathbb Z} is at most {(1-\lambda)^{n}}. Therefore, {f} is Hölder continuous with exponent {\alpha = - \frac{\log (1-\lambda)}{\log 2}}.
  • The difference of values of {f} at consecutive points of {2^{-n}\mathbb Z} is at least {\lambda^{n}}. Therefore, {f} is strictly increasing, and its inverse is Hölder continuous with exponent {\alpha = - \frac{\log 2}{\log \lambda}}.

It remains to check that {f'=0} almost everywhere. Since {f} is monotone, it is differentiable almost everywhere. Let {x} be a point of differentiability (and not a dyadic rational, though this is automatic). For each {n} there is {x_n\in 2^{-n}\mathbb Z} such that {x_n < x< x_n+2^{-n}}. Let {s_n = 2^{n} (f_n(x_n+2^{-n})-f_n(x_n))}; this is the slope of {f_n} on the {2^{-n}}-dyadic interval containing {x}. Since {f'(x)} exists, we must have {f'(x) = \lim_{n\rightarrow\infty} s_n}. On the other hand, the ratio of consecutive terms of this sequence, {s_{n+1}/s_n}, is always either {2 (1-\lambda )} or {2\lambda}. Such a sequence cannot have a finite nonzero limit. Thus {f'(x)=0}.


Here is another {f}, with {\lambda=1/8}.

lambda = 1/8

By making {\lambda} very small, and being more careful with the analysis of {f'}, one can make the Hausdorff dimension of the complement of {\{x \colon f'(x)=0\}} arbitrarily small.


An interesting modification of Salem’s function was introduced by Tukia in Hausdorff dimension and quasisymmetric mappings (1989). For the functions considered above, the one-sided derivatives at every dyadic rational are zero and infinity, which is a rather non-symmetric state of affairs. In particular, these functions are not quasisymmetric. But Tukia showed that if one alternates between {\lambda} and {1-\lambda} at every step, the resulting homeomorphism of {\mathbb R} becomes quasisymmetric. Here is a picture of the alternating construction with {\lambda=1/4}; preliminary stages of the construction are in green.

Tukia’s modification of Salem’s function

This is quite similar to how the introduction of alternating signs turns the Takagi curve (blancmange curve) into a quasiarc (i.e., a curve without cusps); see Sweetened and flavored dessert made from gelatinous or starchy ingredients and milk. But the fractal curves in this post are relatively mild-mannered: they are rectifiable (and thus, not really fractal).


Here is the simple Scilab code I used for the above plots.

r = 1/4                                // the parameter lambda
t = [0 1]                              // dyadic points constructed so far
f = t                                  // f_0(x) = x at those points
for i = 1:10
    t = [t, t(1:$-1)+2^(-i)]           // add midpoints of the current dyadic intervals
    f = [f, r*f(1:$-1)+(1-r)*f(2:$)]   // new values: lambda*(left value) + (1-lambda)*(right value)
    [t, ind] = gsort(t,'g','i')        // sort the points in increasing order
    f = f(ind)                         // and permute the values accordingly
end
plot(t,f)

To have preliminary stages shown as well, move plot(t,f) into the loop. For Tukia’s alternating version, insert the line r = 1-r into the loop.
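For concreteness, here is what the modified loop might look like (just a sketch: the variables are the same as in the code above, and the preliminary stages are drawn in green, as on the picture).

r = 1/4
t = [0 1]
f = t
for i = 1:10
    t = [t, t(1:$-1)+2^(-i)]
    f = [f, r*f(1:$-1)+(1-r)*f(2:$)]
    [t, ind] = gsort(t,'g','i')
    f = f(ind)
    plot(t, f, 'g')             // preliminary stage, in green
    r = 1-r                     // Tukia: alternate between lambda and 1-lambda
end
plot(t, f)                      // the final curve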


Link to preliminary version of this post.

Unreasonable effectiveness of Runge-Kutta 2

I used to give this example of an initial-value problem whenever numerical solutions of ODEs came up in class:

\displaystyle    y' = -y^2, \quad y(0)=1

The exact solution is {y(t)=1/(1+t)}. Using Euler’s method with step {h=1} (admittedly large) we get, starting with {y_0=1}:

\displaystyle    y_1 = y_0 + h (-y_0^2) = 0,\quad y_2 = y_1 + h (-y_1^2) = 0, \dots

The numerical solution drops down to zero at once and stays there. Not good at all.

Euler’s method

The trapezoidal version of RK2 (also known as improved Euler, or Heun’s method) treats the above value of {y_1} merely as prediction {\tilde y_1}, and computes the actual {y_1} using the average of the slopes at {y_0} and {\tilde y_1}:

\displaystyle    y_1 = y_0 + \frac{h}{2} (-y_0^2 - \tilde y_1^2) = 1 + \frac12 (-1 - 0 ) = \frac12

… which is right on target. Let’s do it again: the prediction {\tilde y_2 = y_1 + h (-y_1^2) = 1/4} and correction

\displaystyle    y_2 = y_1 + \frac{h}{2} (-y_1^2 - \tilde y_2^2) = \frac12 + \frac12 \left(-\frac{1}{4}-\frac{1}{16} \right) =    \frac{11}{32}

… which is {\approx 0.34}, compared to the exact solution {1/3}. Nice!

Trapezoidal RK2

I like the example, because all numbers are manageable with mental arithmetic for two steps. And it’s clear that {11/32} is pretty close to {1/3 = 11/33}.


But the fact that the relative error stays below {3.7\%} throughout the computation, despite the large step size {h=1}, seems… too good to be true. Consider that for the exact solution, the expression {y^2} can differ by a factor of {4} between two {t}-values at distance {1}.

The midpoint version of RK2, which generally performs about as well as the trapezoidal version, is nowhere near as close in this example. Indeed, using the same data as above, we get

\displaystyle    y_1 = y_0 - h \left(\frac{y_0+\tilde y_1}{2}\right)^2 = \frac{3}{4}

and so forth: the relative error reaches {59\%} at the second step. That’s 16 times worse than the trapezoidal version.

Midpoint RK2

What happens here has less to do with numerical analysis than with algebra of rational functions. Using {h=1} in trapezoidal RK2, we are in effect iterating the function

\displaystyle \varphi(y) = y + \frac12 (-y^2-(y-y^2)^2) = y-y^2+y^3-\frac12 y^4

The exact solution would be obtained by iterating

\displaystyle \Phi (y) = \frac{y}{1+y} = y-y^2+ y^3 - y^4 + \dots

The two functions just happen to coincide at {y=1}, which is our starting point here.

Two iterands

From there we get to {0.5}, and on {[0,0.5]} they are really close anyway, due to another coincidence: the truncation error {|\varphi(y)-\Phi(y)|} is {O(y^4)} instead of {O(y^3)}, as is normal for second-order methods.

The midpoint method with {h=1} amounts to iterating

\displaystyle \psi(y) = y - \left(\frac{y+y-y^2}{2} \right)^2 = y-y^2+y^3-\frac14 y^4

which is not substantially further away from {\Phi}, but does not enjoy the lucky coincidence at {y=1}.


The tables are turned with the initial value {y_0=3}. The exact solution is {y(t) = 3/(1+3t)}, which drops sharply from {t=0} to {t=1}; its slope decreases by a factor of {16} during one step. Still, midpoint RK2 does a decent job with {h=1}:

Midpoint RK2 starting with y=3

while trapezoidal RK2 outputs {y_1=-19.5}, {y_2=-80110}, {y_3 = -2\cdot 10^{19}} and promptly overflows.


With a reasonable step size, like {h=0.1}, normality is restored: both methods perform about equally well, with {0.13\%} error for trapezoidal and {0.21\%} for midpoint.
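For readers who want to reproduce the comparison, here is a minimal Scilab sketch of the two RK2 variants for this equation (the function name fode and the layout are my own; with {h=1} and {y_0=1} the first steps match the hand computation above).

function yp = fode(t, y)                 // right-hand side of y' = -y^2
    yp = -y^2
endfunction

h = 1; tmax = 5; y0 = 1
t = 0:h:tmax
N = length(t)
ytrap = zeros(1,N); ytrap(1) = y0        // trapezoidal RK2 (improved Euler / Heun)
ymid  = zeros(1,N); ymid(1)  = y0        // midpoint RK2
for k = 1:N-1
    pred = ytrap(k) + h*fode(t(k), ytrap(k))                        // Euler prediction
    ytrap(k+1) = ytrap(k) + (h/2)*(fode(t(k), ytrap(k)) + fode(t(k+1), pred))
    half = ymid(k) + (h/2)*fode(t(k), ymid(k))                      // half-step prediction
    ymid(k+1) = ymid(k) + h*fode(t(k) + h/2, half)
end
yexact = (1 + t).^(-1)                   // exact solution for y(0) = 1
plot(t, yexact, 'k', t, ytrap, 'b', t, ymid, 'r')
mprintf("trapezoidal max relative error: %.4f\n", max(abs(ytrap-yexact)./yexact))
mprintf("midpoint    max relative error: %.4f\n", max(abs(ymid-yexact)./yexact))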

Calculating 2*2 with high precision

The definition of derivative,

\displaystyle    f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x)}{h}   \ \ \ \ \ \ \ \ \ \ (1)

is not such a great way to actually find the derivative numerically. Its symmetric version,

\displaystyle    f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h) - f(x-h)}{2h}   \ \ \ \ \ \ \ \ \ \ (2)

performs much better in computations. For example, consider the derivative of {f(x)=e^x} at the point {x=1}. We know that {f'(1)=2.718281828\dots}. Numerically, with {h=0.001}, we get

\displaystyle    \frac{f(1+h) - f(1)}{h} \approx \mathbf{2.71}9 \dots

(error {>10^{-3}}) versus

\displaystyle    \frac{f(1+h) - f(1-h)}{2h} \approx \mathbf{2.71828}2 \dots

(error {<10^{-6}}).
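This is easy to reproduce; here is a small Scilab check with the same {h=0.001}.

h = 0.001
one_sided = (exp(1+h) - exp(1))/h          // definition (1)
symmetric = (exp(1+h) - exp(1-h))/(2*h)    // definition (2)
mprintf("one-sided: %.9f   error: %.1e\n", one_sided, abs(one_sided - exp(1)))
mprintf("symmetric: %.9f   error: %.1e\n", symmetric, abs(symmetric - exp(1)))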

Considering this, why don’t we ditch (1) altogether and adopt (2) as the definition of derivative? Just say that by definition,

\displaystyle    f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h)-f(x-h)}{2h}

whenever the limit exists.

This expands the class of differentiable functions: for example, {f(x)=|x|} becomes differentiable with {f'(0)=0}. Which looks more like a feature than a bug: after all, {f} has a minimum at {0}, and the horizontal line through the minimum is the closest thing to the tangent line that it has.

Another example: the function

\displaystyle     f(x) = \begin{cases} x ,\quad & x\le 0 \\ 3x,\quad & x>0 \end{cases}

has {f'(0)=2} under this definition, because

\displaystyle    \lim_{h\rightarrow 0^+}\frac{f(h)-f(-h)}{2h} = \lim_{h\rightarrow 0^+}\frac{3h-(-h)}{2h} = 2

and

\displaystyle    \lim_{h\rightarrow 0^-}\frac{f(h)-f(-h)}{2h} = \lim_{h\rightarrow 0^-}\frac{h-3(-h)}{2h} = 2

This example also makes sense: since {f(x)=|x|+2x}, getting {f'(0)=0+2} is expected. In fact, with the new definition we still have basic derivative rules: if {f,g} are differentiable, then {f+g}, {f-g}, {fg}, {f/g} are also differentiable (with the usual caveat about {g\ne 0}) and the familiar formulas hold.

Let’s test the chain rule on the function {g = f\circ f}. The rule says

\displaystyle     g'(0) = f'(f(0)) f'(0)    \ \ \ \ \ \ \ \ \ \ (3)

Since {f(0)=0}, the product on the right is {2\cdot 2}. On the other hand,

\displaystyle     g(x) = \begin{cases} x ,\quad & x\le 0 \\ 9x,\quad & x>0 \end{cases}

which implies, by a computation similar to the above, that {g'(0)=5} (for {h>0} the symmetric quotient is {(9h-(-h))/(2h) = 5}, and similarly for {h<0}). So, if we want to have the chain rule (3), we must accept that

\displaystyle     \mathbf{2\cdot 2 = 5}

This is where the desire for high numerical precision leads.

Plenty of other things go wrong with the symmetric definition:

  • Maximum or minimum of {f} may be attained where {f'} exists and is nonzero.
  • A differentiable function may be discontinuous.
  • Having {f'>0} everywhere does not imply that {f} is increasing.
  • Mean Value Theorem fails.

Polygonal inequalities: beyond the triangle

(Related to the previous post, but it can be read independently.) The triangle inequality, one of the axioms of a metric space, can be visualized by coloring the vertices of a triangle red and blue.

Colored Triangle

The inequality says that the sum of monochromatic distances must not exceed the sum of dichromatic distances. That is, for every assignment of the vertices to points in the space, the sum of all red-red and blue-blue distances does not exceed the sum of red-blue distances. An assignment is just a map from the set of vertices {V} to the metric space {X}; it need not be injective.

But why stop at the triangle? We can take any number of points (vertices), color them in some way, and require the same polygonal inequality:

\displaystyle \text{monochromatic } \le \text{ dichromatic}

Already for the square we have two distinct plausible colorings to explore: even red-blue split

Evenly Colored Square

and a predominantly red square

(mostly) Red Square

But it turns out that the second coloring is useless: the inequality it defines fails in every metric space with more than one point. More generally, suppose we have {R} red points and {B} blue ones, and {R- B\ge 2}. Pick two distinct points {a,b\in X}. Assign one red point to {a} and all others to {b}. The sum of monochromatic distances is {(R-1)\,d(a,b)} while the sum of dichromatic distances is {B\,d(a,b)}, which is strictly less.

So, we are limited to nearly-even colorings: those with {|R-B|\le 1}. For even numbers of vertices this means even split, while odd numbers should be divided as evenly as possible: like 3+2 for the pentagon.

Here is the pentagram again.
Colored Pentagon

The inequalities turn out to be related. For every {n}, the {n}-gonal inequality implies the {(n-2)}-gonal inequality, because if we assign two opposite-colored vertices to the same point, their contributions cancel out. More interestingly: when {n} is odd, the {n}-gonal inequality implies the {(n-1)}-gonal inequality. Indeed, suppose we have {(n-1)} points, evenly colored. Add an arbitrary {n}th point. Whether the added point is blue or red, the {n}-gonal inequality holds. Averaging these two inequalities, we see the effect of the added point canceling out.

So, if the {n}-gonal inequality holds for all odd {n}, it holds for all {n}. This property is exactly the hypermetric property from the previous post. Except it was stated there in a different form:

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. But if the point {x_i} is repeated {|b_i|} times, we can replace {b_i} by {\mathrm{sign}\,b_i}. Then represent +1 as blue and -1 as red.

The hypermetric inequalities were introduced by John B. Kelly (a student of William Ted Martin) in the late 1960s. He showed they are necessary for the space to be embeddable into the space {\ell^1}. It would be great if they were also sufficient (and for some classes of spaces they are), but this is not so: a counterexample was given by Patrice Assouad in 1977.

It is also interesting to consider the {n}-gonal inequalities for even {n} only. By repetition of vertices, they are equivalent to requiring

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \quad\quad \quad (1)

for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=0}. But of course then we have (1) for rational {b_i} (just clear the denominators), hence for all real {b_i} (by approximation), as long as they sum to {0}. So, the requirement amounts to the matrix {(d(x_i,x_j))} being negative semidefinite on the subspace {\sum_i b_i=0}. Such metrics are called metrics of negative type.

Their relation to embeddability of the space is well-known: {(X,d)} is of negative type if and only if the “snowflake” {(X,\sqrt{d})} isometrically embeds into a Hilbert space. In other words, we can “draw” any finite metric space of negative type in a Euclidean space, with the understanding that Euclidean distances represent the square roots of the actual distances. This embedding result is a 1935 theorem of Isaac Schoenberg who is also known for connecting dots naturally (introducing splines).

Pentagrams and hypermetrics

The Wikipedia article Metric (mathematics) offers plenty of flavors of metrics, from common to obscure: ultrametric, pseudometric, quasimetric, semimetric, premetric, hemimetric and pseudoquasimetric (I kid you not).

One flavor it does not mention is the hypermetric. This is a metric {d} on a set {X} such that the inequality

\displaystyle  \sum_{i,j}b_i b_j d(x_i , x_j ) \le 0 \ \ \ \ \ \ \ \ \ (1)

holds for every finite sequence of points {x_i\in X} and every choice of integers {b_i} such that {\sum_i b_i=1}. The requirement that {b_i} be integers gives some combinatorial meaning to (1); this is not just some quadratic form being negative semidefinite.

As a warm-up, observe that (1) contains the triangle inequality: with {b_1=b_2=1} and {b_3=-1} we get {d(x_1,x_2)-d(x_1,x_3)-d(x_2,x_3)\le 0}. But it appears that (1) says something about “polygons” with more vertices too.

To make (1) worth thinking about, it should be satisfied by some important metric space. Such as the real line {\mathbb R}, for example. It is not quite obvious that the inequality

\displaystyle  \sum_{i,j}b_i b_j |a_i - a_j| \le 0 \ \ \ \ \ \ \ \ \ (2)

holds for all reals {a_i} and all integers {b_i} adding up to {1}. It helps to order the numbers: {a_1\le \dots\le a_m} and focus on the contribution of a particular gap {[a_k,a_{k+1}]} to the sum (2). The amount it contributes is {|a_k-a_{k+1}|} multiplied by

\displaystyle    \sum_{i\le k<j} b_i b_j = \left(\sum_{i\le k} b_i \right) \left(\sum_{j > k} b_j \right) = \left(\sum_{i\le k} b_i \right) \left(1-\sum_{i\le k} b_i \right) \le 0

because {n(1-n)\le 0} for every integer {n}. This proves (2).
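If you want to see (2) in action, here is a small Scilab experiment with random points and random integer weights summing to 1 (purely a sanity check, not part of the argument).

// test (2): the sum over i,j of b_i*b_j*|a_i - a_j| should never be positive
for trial = 1:1000
    m = 6
    a = rand(1, m)                      // random points on the line
    b = floor(7*rand(1, m)) - 3         // random integers between -3 and 3
    b(1) = b(1) + 1 - sum(b)            // adjust so that sum(b) = 1
    S = 0
    for i = 1:m
        for j = 1:m
            S = S + b(i)*b(j)*abs(a(i) - a(j))
        end
    end
    if S > 1e-12 then
        mprintf("inequality (2) failed: S = %f\n", S)    // should never happen
    end
end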

Now that we have one hypermetric space, {\mathbb R}, other such spaces can be created easily. If {X} is any set and {f \colon X\rightarrow\mathbb R} any function, consider {d(x,y) = |f(x)-f(y)|}, the pullback pseudometric on {X}. By applying (2) to the numbers {f(x_i)}, we see that {d} satisfies the hypermetric inequality. Since (1) is additive in {d}, we can take any family of functions {f_\alpha \colon X\rightarrow\mathbb R} and add together the corresponding pseudometrics. Or even integrate them against a positive measure: {d(x,y)=\int |f_\alpha(x)-f_\alpha(y)|\,d\mu(\alpha)}.

For example, the plane {\mathbb R^2} is a hypermetric space, because the distance between two points {(x_1,y_1)} and {(x_2,y_2)}, besides the familiar form

\displaystyle  \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}

can also be represented as an integral of the aforementioned pullbacks:

\displaystyle  \frac12 \int_0^\pi \big| (x_1-x_2)\cos \alpha + (y_1-y_2) \sin\alpha \big| \,d\alpha

A similar integral representation holds in all dimensions; thus, all Euclidean spaces are hypermetric.
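The integral representation itself is easy to test numerically; here is a quick Scilab check using a midpoint Riemann sum (the difference vector (3, -4) is an arbitrary choice).

dx = 3; dy = -4                                   // difference of two points in the plane
N = 200000
alpha = ((1:N) - 0.5)*%pi/N                       // midpoints of N subintervals of [0, %pi]
approx = 0.5*(%pi/N)*sum(abs(dx*cos(alpha) + dy*sin(alpha)))
mprintf("integral: %.4f   Euclidean distance: %.4f\n", approx, sqrt(dx^2 + dy^2))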

Okay, what is not a hypermetric then? For example, the cube distance induced by the norm {\|x\|_\infty=\max |x_i|} is not, in dimensions 3 and higher. Specifically, (1) fails as the five-point inequality with {(b_1,\dots,b_5) =(1,1,1,-1,-1)}. I’ll call it the pentagram inequality:

also known as K_5 in mystical literature

It says that for any five points in the space the sum of monochromatic distances does not exceed the sum of all bi-chromatic (red-blue) distances.

The pentagram inequality fails when {x_1,\dots,x_5} are the columns of the matrix

\displaystyle  \begin{pmatrix}    1& 1& 1& 2& 0\\    0& 2& 2& 1& 1\\    0& 1& 0& 1& 0\\    \end{pmatrix}

(first three columns blue, the last two red). Indeed, the sum of monochromatic distances is {2+1+2+2=7} while the sum of bichromatic distances is {1+1+1+1+1+1=6}.

If the above example does not look conceptual enough, it’s because I found it via computer search. I don’t have much intuition for the pentagram inequality.
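For what it’s worth, here is a short Scilab script that recomputes the two sums for the matrix above (columns 1-3 blue, columns 4-5 red, distances in the max-norm).

P = [1 1 1 2 0; 0 2 2 1 1; 0 1 0 1 0]          // the five points, as columns
d = zeros(5,5)
for i = 1:5
    for j = 1:5
        d(i,j) = max(abs(P(:,i) - P(:,j)))     // cube (max-norm) distance
    end
end
mono = d(1,2) + d(1,3) + d(2,3) + d(4,5)       // blue-blue and red-red pairs
bi   = sum(d(1:3,4:5))                         // blue-red pairs
mprintf("monochromatic: %d   bichromatic: %d\n", mono, bi)     // 7 versus 6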

Anyway, the example delivers another proof that taking the maximum of three numbers is hard. More precisely, there is no isometric embedding of {\mathbb R^3} with the maximum metric into {\ell_1}. Unlike the earlier proof, this one does not assume the embedding is linear.

A good reference for hypermetric inequalities is the book Geometry of cuts and metrics by Deza and Laurent.

Maximum of three numbers: it’s harder than it sounds

This simple identity holds for any two real numbers {x,y}:

\displaystyle   \max(|x|,|y|) = \frac12\,(|x+y|+|x-y|)   \ \ \ \ \  \ \ \ \ \ (1)

Indeed, if {|x|} realizes the maximum, then both {x+y} and {x-y} have the same sign as {x}. After opening the absolute value signs, we get {y} to cancel out.

So, (1) represents {\max(|x|,|y|)}, also known as the {\ell_\infty} norm, as the sum of absolute values of linear functions. Let’s try the same with {\max(|x|,|y|,|z|)}. Since the right hand side of (1) is just the average of {|\pm x \pm y|} over all possible choices of {\pm } signs, the natural thing to do is to average {|\pm x \pm y \pm z|} over all eight choices. The sign in front of {x} can be taken to be {+}, which simplifies the average to

\displaystyle    \frac14\,(|x+y+z|+|x+y-z|+|x-y+z|+|x-y-z|)    \ \ \ \ \  \ \ \ \ \ (2)

Does this formula give {\max(|x|,|y|,|z|)}? Instead of trying random numbers, let’s just plot the unit ball for the norm given by (2). If the identity works, it will be a cube. I used Maple:

with(plots): f:=(abs(x+y+z)+abs(x+y-z)+abs(x-y-z)+abs(x-y+z))/4:
implicitplot3d(f=1,x=-5/4..5/4,y=-5/4..5/4,z=-5/4..5/4,grid=[25,25,25]);
My precious!

Close, but no cube. Luckily, this is my favorite Archimedean solid, the cuboctahedron.

Although all terms of (2) look exactly the same, the resulting shape has both triangular and square faces. Where does the difference of shapes come from?

More importantly, is (2) really the best we can do? Is there some other sum of moduli of linear functions that will produce {\max(|x|,|y|,|z|)}?

— No.

Even if negative coefficients are allowed?

— Even then. (But you can come arbitrarily close.)

What if we allow integrals with respect to an arbitrary (signed) measure, as in

\displaystyle    \iiint |\alpha x+\beta y+\gamma z|\,d \mu(\alpha, \beta, \gamma)    \ \ \ \ \  \ \ \ \ \ (3)

— Still no. But if {\mu} is allowed to be a distribution of higher order (an object more singular than a measure), then a representation exists (W. Weil, 1976). Yes, one needs the theory of distributions to write the maximum of three numbers as a combination of linear functions.

I’ll only prove that there is no identity of the form

\displaystyle  \max(|x|,|y|,|z|) = \sum_{i=1}^N |\alpha_i x+\beta_i y+ \gamma_i z|

Indeed, such an identity amounts to having an isometric embedding {T\colon \ell_\infty^3 \rightarrow \ell_1^N}. The adjoint operator {T^* \colon \ell_\infty^N \rightarrow \ell_1^3} is a submetry, meaning that it maps the unit ball of {\ell_\infty^N} onto the unit ball of {\ell_1^3}. The unit ball of {\ell_\infty^N} is just a cube; all of its faces are centrally symmetric, and this symmetry is preserved by linear maps. But the unit ball of {\ell_1^3} is an octahedron, with triangular faces. A contradiction. {\ \Box}

An aside: what if instead of averaging {|\pm x \pm y|} over all {\pm } choices (i.e., unimodular real coefficients) we take the average over all unimodular complex coefficients? This amounts to {\|(x,y)\| = \frac{1}{2\pi} \int_0^{2\pi} |x+e^{it}y|\,dt}. I expected something nice from this norm, but

Complex averaging in real space

it’s a strange shape whose equation involves the complete elliptic integral of the second kind. Yuck.

Connecting dots naturally

How to draw a nice curve through given points {(x_i,y_i)}?

Some data

One way is to connect them with straight lines, creating a piecewise linear function:

Piecewise linear interpolant

This is the shortest graph of a function that interpolates the data. In other words, the piecewise linear function minimizes the integral

\displaystyle    \int_0^{10} \sqrt{1+(f'(x))^2}\,dx

among all functions with {f(x_i)=y_i}. As is often the case, the length functional can be replaced with the elastic energy

\displaystyle    \mathcal{E}(f) = \int_0^{10} (f'(x))^2\,dx

because the piecewise linear {f} (and only it) minimizes it too.

Of course, it is not natural for the connecting curve to take such sharp turns at the data points. One could try to fit a polynomial function to these points, which is guaranteed to be smooth. With 11 points we need a 10th degree polynomial. The result is disappointing:

Interpolating polynomial

It is not natural for a curve connecting the points with {1\le y\le 9} to shoot up over {y=40}. We want a connecting curve that does not wiggle more than necessary.

To reduce the wiggling and remove sharp turns at the same time, one can minimize the bending energy of the function, thinking of its graph as a thin metal rod. This energy is

\displaystyle    \mathcal{B}(f) = \int_0^{10} (f''(x))^2\,dx

and the function that minimizes it subject to conditions {f(x_i)=y_i} looks very nice indeed:

Natural cubic spline

The Euler-Lagrange equation for the functional {\mathcal{B}} dictates that the fourth derivative of {f} is zero in the intervals between the knots {x_i}. Thus, {f} is a piecewise cubic polynomial. Also, both {f} and {f'} must be continuous for any function with integrable second derivative. More delicate analysis is required for {f''}, but it can also be shown to be continuous for the minimizing function {f}; moreover, {f''} must vanish at the endpoints {0} and {10}. Taken together, these properties (all derived from the variational problem) complete the description of a natural cubic spline.

It remains to actually construct one. I prefer to think of this process as adding a correction term {C} to the piecewise linear interpolant {L}. Here the spline is shown together with {L} (green) and {C} (magenta).

PL interpolant, correction term, and their sum: the cubic spline

On each interval {[x_i,x_{i+1}]} the correction term {C} is a cubic polynomial vanishing at both endpoints. The space of such polynomials is two-dimensional: thus,

\displaystyle    C(x) = \alpha_i (x_{i+1}-x)^2(x-x_i) + \beta_i (x_{i+1}-x)(x-x_i)^2

on this interval. There are 20 coefficients {\alpha_i}, {\beta_i} to find. At each of the 9 interior knots 1,2,…,9 we have two conditions: {C''} must have a removable singularity and {C'} must jump by the amount opposite to the jump of {L'}. Since {C''} also vanishes at the endpoints {0,10}, there are 20 linear equations for 20 unknowns.

It is easier to set up a linear system in terms of {z_i=C''(x_i)}. Indeed, the values of {C''} at two consecutive knots determine the correction term within: {\alpha_i= \dfrac{z_{i+1}+2z_i}{6} } and {\beta_i = \dfrac{2z_{i+1}+z_i}{6}}. This leaves {n-1} equations (from the jumps of {C'}) for {n-1} unknowns. The best part is that the matrix of this system is really nice: tridiagonal with dominant diagonal.

\displaystyle    \frac{1}{6}\begin{pmatrix} 4 & 1 & 0 & 0 & \ldots & 0 & 0 \\    1 & 4 & 1 & 0 & \ldots & 0 & 0 \\    0 & 1 & 4 & 1 & \ldots & 0 & 0 \\    \vdots & \vdots & \vdots & \vdots & \ddots & \ldots & \ldots \\    0 & 0 & 0 & 0 & \ldots & 1 & 4    \end{pmatrix}

One can solve the system for {z_i} within a for loop, but I used the Scilab solver instead. Here is the Scilab code for the most interesting part: the spline. The jumps of the derivative of the piecewise linear interpolant are obtained from the second order difference of the sequence of y-values.

a = 0; b = 10                            // interpolation interval
y = [3 1 4 1 5 9 2 6 5 3 5]              // data values at equally spaced knots
n = length(y)-1                          // number of subintervals
h = (b-a)/n                              // knot spacing
jumps = diff(y,2)/h                      // jumps of L' at the interior knots
A = (h/6)*(diag(4*ones(1,n-1))+diag(ones(1,n-2),1)+diag(ones(1,n-2),-1))   // tridiagonal matrix
z = [0,-jumps/A,0]                       // values of C'' at the knots; zeros are the natural end conditions
allx = []; spl = []
for i=1:n  
   xL = a+h*(i-1)
   xR = a+h*i
   x = linspace(xL,xR,100)
   linear = y(i)*(xR-x)/h + y(i+1)*(x-xL)/h       // piecewise linear interpolant L
   cor = ((z(i+1)+2*z(i))*(xR-x)+(2*z(i+1)+z(i))*(x-xL)).*(xR-x).*(x-xL)/(6*h)   // correction term C
   allx = [allx, x]   
   spl = [spl, linear + cor]             // the spline L + C
end
plot(allx, spl) 
plot(a:h:b, y, 'r*')                     // mark the data points

Rectangular boxes: handle with care

A rectangular box (aka parallelepiped) looks like a sturdy object:

Rainbow sheep inside

But this particular box, with dimensions 7.147 by 6.021 by 4.095, took me the better part of an hour to figure out.

It was part of a numerical methods assignment: find the dimensions of a box with given volume {V}, surface area {S}, and diameter {D} (i.e., space diagonal). An algebraic approach leads to pretty ugly expressions, which is the motivation for a numerical method. Specifically, the task was to apply the Newton-Raphson method to the map

\displaystyle    F(x,y,z) = \begin{pmatrix} xyz-V  \\ 2(xy+yz+xz)-S  \\ x^2+y^2+z^2-D^2 \end{pmatrix}

Of course, I understood that not every triple {(V,S,D)} is attainable. I also realized that the Jacobian of {F} is degenerate when two of the coordinates coincide, which is a problem for the method. So I thought: let’s generate some random {x,y,z} values that are not too close to one another, and give students the resulting parameters {V,S,D}.

With {x=7.147}, {y=6.021}, and {z=4.095} the parameters are {V=176.216}, {S=193.91}, and {D=10.203}. Sure, a little rounding can’t hurt when numbers are of this size and we are quite far from the critical points of {F}. So I put {V=176}, {S=194} and {D=10} in one of the versions of the assignment.

But the Newton-Raphson method would not converge… because no such box exists! The rounding did hurt after all.
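For illustration, here is a minimal Scilab sketch of the Newton-Raphson iteration for this map {F} (the function and variable names are mine). With the unrounded data there is a box to find; with the rounded values {V=176}, {S=194}, {D=10} there is not, and the iteration has nothing to converge to.

V = 176.216; S = 193.91; D = 10.203            // unrounded (attainable) data
function F = boxF(p, V, S, D)                  // the map F from above
    x = p(1); y = p(2); z = p(3)
    F = [x*y*z - V; 2*(x*y + y*z + x*z) - S; x^2 + y^2 + z^2 - D^2]
endfunction
function J = boxJ(p)                           // Jacobian of F
    x = p(1); y = p(2); z = p(3)
    J = [y*z, x*z, x*y; 2*(y+z), 2*(x+z), 2*(x+y); 2*x, 2*y, 2*z]
endfunction
p = [8; 5; 3]                                  // starting guess with distinct coordinates
for k = 1:20
    p = p - (boxJ(p)\boxF(p, V, S, D))         // Newton-Raphson step
end
disp(p')                                       // dimensions of a box with the given V, S, D (if it converged)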

This motivated me to describe all attainable triples {(V,S,D)} explicitly, which ended up being less of a chore than I expected. It helps to realize that {(x+y+z)^2 = D^2+S}, which reduces the search to the intersection of the sphere {x^2+y^2+z^2=D^2} with the plane {x+y+z=\sqrt{D^2+S}}. This is a circle (called {C} below), and the allowed range for {V} is between the minimum and maximum of {xyz} on {C}.

This goes into Calculus 3 territory. Using Lagrange multipliers with two constraints looks like a tedious job. Instead, I decided to parametrize {C}. Its center is {(c,c,c)} where {c=\dfrac13\sqrt{D^2+S}}. The radius is {\displaystyle r = \sqrt{ D^2 - \frac13(D^2+S)}=\frac13 \sqrt{6D^2-3S}}. We also need an orthonormal basis of the subspace {x+y+z=0}: the vectors

\displaystyle  \frac{1}{\sqrt{6}} \langle 2, -1, -1\rangle \quad \text{and}\quad \frac{1}{\sqrt{2}} \langle 0, 1, -1\rangle

do the job.

So, the circle {C} is parametrized by

\displaystyle    x  = c+\frac{2r}{\sqrt{6}} \cos t \\   y  = c-\frac{r}{\sqrt{6}} \cos t +\frac{r}{\sqrt{2}} \sin t \\   z  = c-\frac{r}{\sqrt{6}} \cos t -\frac{r}{\sqrt{2}} \sin t

This is not as bad as it looks: the product {xyz} simplifies to

\displaystyle xyz = c^3 - \frac{cr^2}{2} + \frac{r^3\sqrt{6}}{18} \cos 3t

which tells us right away that the volume {V} lies within

\displaystyle  c^3 - \frac{cr^2}{2} \pm \frac{r^3\sqrt{6}}{18}

In terms of the original data {S,D} the bounds for {V} take the form

\displaystyle    \frac{5S-4D^2}{54}\sqrt{S+D^2} \pm \frac{\sqrt{2}}{54} (2D^2-S)^{3/2}

(And of course, {V} cannot be negative even if the lower bound is.) It is easy to see that {2D^2-S\ge 0} with equality only for a cube; however {2D^2-S} can be relatively small even when the box does not look very cube-ish. For the box pictured above, the tolerance {\frac{\sqrt{2}}{54} (2D^2-S)^{3/2}} is approximately {1.4}; after rounding {S\approx 194 } and {D\approx 10} this drops to {0.38}, and the desired volume of {176} is way out of the allowable range {181\pm 0.38}.
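These bounds are easy to evaluate; here is a small Scilab helper (my own convenience wrapper) applied to the unrounded and rounded data discussed above.

function [lo, hi] = Vrange(S, D)     // attainable volumes for given surface area and diagonal
    mid = (5*S - 4*D^2)/54*sqrt(S + D^2)
    tol = sqrt(2)/54*(2*D^2 - S)^1.5
    lo = max(mid - tol, 0)           // the volume cannot be negative
    hi = mid + tol
endfunction
[lo, hi] = Vrange(193.91, 10.203); mprintf("unrounded: %.2f .. %.2f\n", lo, hi)
[lo, hi] = Vrange(194, 10);        mprintf("rounded:   %.2f .. %.2f\n", lo, hi)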

Yes, the set of attainable triples {(V,S,D)} is quite thin. Parallelepipeds are fragile objects: handle them with care.

Attainable (V,S,D)