Maximum of three numbers: it’s harder than it sounds

This simple identity holds for any two real numbers {x,y}:

\displaystyle   \max(|x|,|y|) = \frac12\,(|x+y|+|x-y|)   \ \ \ \ \  \ \ \ \ \ (1)

Indeed, if {|x|} realizes the maximum, then both {x+y} and {x-y} have the same sign as {x}. Opening the absolute values on the right-hand side, we see that {y} cancels out.
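Identity (1) is quick to sanity-check numerically; here is a Python sketch (mine, not part of the original argument):

```python
import random

# Check max(|x|,|y|) == (|x+y| + |x-y|)/2 on random inputs
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    lhs = max(abs(x), abs(y))
    rhs = (abs(x + y) + abs(x - y)) / 2
    assert abs(lhs - rhs) < 1e-12
print("identity (1) holds on all samples")
```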

So, (1) represents {\max(|x|,|y|)}, also known as the {\ell_\infty} norm, as the sum of absolute values of linear functions. Let’s try the same with {\max(|x|,|y|,|z|)}. Since the right hand side of (1) is just the average of {|\pm x \pm y|} over all possible choices of {\pm } signs, the natural thing to do is to average {|\pm x \pm y \pm z|} over all eight choices. The sign in front of {x} can be taken to be {+}, which simplifies the average to

\displaystyle    \frac14\,(|x+y+z|+|x+y-z|+|x-y+z|+|x-y-z|)    \ \ \ \ \  \ \ \ \ \ (2)

Does this formula give {\max(|x|,|y|,|z|)}? Instead of trying random numbers, let’s just plot the unit ball for the norm given by (2). If the identity works, it will be a cube. I used Maple:

with(plots): f:=(abs(x+y+z)+abs(x+y-z)+abs(x-y-z)+abs(x-y+z))/4:
My precious!

Close, but no cube. Luckily, this is my favorite Archimedean solid, the cuboctahedron.
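One can also see the discrepancy without plotting. A quick numeric check (a Python sketch of formula (2), my own translation of the Maple line) shows that (2) agrees with the {\ell_\infty} norm at the face centers of the cube but overshoots at the corners, which is exactly where the cuboctahedron cuts them off:

```python
def avg_norm(x, y, z):
    """Formula (2): average of |x + (±)y + (±)z| over the four sign choices."""
    return (abs(x+y+z) + abs(x+y-z) + abs(x-y+z) + abs(x-y-z)) / 4

print(avg_norm(1, 0, 0))   # 1.0, same as max(|x|,|y|,|z|)
print(avg_norm(1, 1, 1))   # 1.5, not 1: the corner of the cube is cut off
```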

Although all terms of (2) look exactly the same, the resulting shape has both triangular and square faces. Where does the difference of shapes come from?

More importantly, is (2) really the best we can do? Is there some other sum of moduli of linear functions that will produce {\max(|x|,|y|,|z|)}?

— No.

Even if negative coefficients are allowed?

— Even then. (But you can come arbitrarily close.)

What if we allow integrals with respect to an arbitrary (signed) measure, as in

\displaystyle    \iiint |\alpha x+\beta y+\gamma z|\,d \mu(\alpha, \beta, \gamma)    \ \ \ \ \  \ \ \ \ \ (3)

— Still no. But if {\mu} is allowed to be a distribution of higher order (an object more singular than a measure), then a representation exists (W. Weil, 1976). Yes, one needs the theory of distributions to write the maximum of three numbers as a combination of linear functions.

I’ll only prove that there is no identity of the form

\displaystyle  \max(|x|,|y|,|z|) = \sum_{i=1}^N |\alpha_i x+\beta_i y+ \gamma_i z|

Indeed, such an identity amounts to having an isometric embedding {T\colon \ell_\infty^3 \rightarrow \ell_1^N}. The adjoint operator {T^* \colon \ell_\infty^N \rightarrow \ell_1^3} is a submetry, meaning that it maps the unit ball of {\ell_\infty^N } onto the unit ball of {\ell_1^3}. The unit ball of {\ell_\infty^N } is just a cube; all of its faces are centrally symmetric, and this symmetry is preserved by linear maps. But the unit ball of {\ell_1^3} is an octahedron, with triangular faces. A contradiction. {\ \Box}

An aside: what if instead of averaging {|\pm x \pm y|} over all {\pm } choices (i.e., unimodular real coefficients) we take the average over all unimodular complex coefficients? This amounts to {\|(x,y)\| = \frac{1}{2\pi} \int_0^{2\pi} |x+e^{it}y|\,dt}. I expected something nice from this norm, but

Complex averaging in real space

it’s a strange shape whose equation involves the complete elliptic integral of the second kind. Yuck.
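The norm itself is easy to evaluate numerically. A Python sketch (the Riemann-sum helper is mine); for instance {\|(1,1)\|} comes out as {4/\pi}, already not piecewise linear:

```python
import math

def cplx_avg_norm(x, y, n=100000):
    """||(x,y)|| = (1/2pi) * integral over [0,2pi] of |x + e^{it} y| dt,
    approximated by a Riemann sum (x, y real)."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        total += math.hypot(x + y * math.cos(t), y * math.sin(t))
    return total / n

print(cplx_avg_norm(1, 0))   # 1.0: on the axes it is just |x|
print(cplx_avg_norm(1, 1))   # ~ 4/pi = 1.2732...
```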

Fourth order obstacle problem

Having solved the obstacle problem for a string, let us turn to a more difficult one, in which an elastic string is replaced with an elastic rod (or plate, if we are looking at a cross-section). Elastic rods resist bending much in the same way that strings don’t. This can be modeled by minimizing the bending energy

\displaystyle B(u) = \frac12 \int_{-2}^2 u''(x)^2 \,dx

subject to the boundary conditions {u(-2)=0=u(2)}, {u'(-2)=0=u'(2)}, and the same obstacle as before: {u(x)\le -\sqrt{1-x^2}}. The boundary conditions for {u'} mean that the rod is clamped on both ends.

As before, the obstacle permits one-sided variations {u+\varphi} with { \varphi\le 0} smooth and compactly supported. The linear term of {B(u+\varphi)} is {\int u''\varphi''}, which after double integration by parts becomes {\int u^{(4)} \varphi}. Since the minimizer satisfies {B(u+\varphi)-B(u)\ge 0}, the conclusion is {\int u^{(4)} \varphi \ge 0 } whenever {\varphi\le 0}. Therefore, {u^{(4)}\le 0} everywhere, at least in the sense of distributions. In the parts where the rod does not touch the obstacle, we can do variations of either sign and obtain {u^{(4)}=0}; that is, {u} is a cubic polynomial there.

So far everything looks similar to the previous post. But the fourth derivative of the obstacle function {-\sqrt{1-x^2}} is {3(4x^2+1)/(1-x^2)^{7/2}}, which is positive. Since the minimizer {u} must satisfy {u^{(4)}\le 0}, it cannot assume the shape of the obstacle. The contact can happen only at isolated points.

Therefore, {u} is a cubic spline with knots at the contact points and at {\pm 2}. The distributional derivative {u^{(4)}} consists of negative point masses placed at the contact points. Integrating twice, we find that {u''} is a piecewise affine concave function; in particular it is continuous. The minimizer will be {C^2}-smooth in {(-2,2)}.

How many contact points are there? If only one, then by symmetry it must be at {x=0}, and the only three-knot cubic spline that satisfies the boundary conditions and passes through {(0,-1)} with zero derivative is {(-1/4)(1+|x|)(2-|x|)^2}. But it does not stay below the obstacle:

Three-knot spline fails the obstacle condition
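The failure is easy to confirm numerically; a Python sketch checking the candidate spline against the obstacle at a sample point:

```python
import math

def spline(x):
    """Three-knot candidate: -(1/4)(1+|x|)(2-|x|)^2 on [-2,2]."""
    return -0.25 * (1 + abs(x)) * (2 - abs(x))**2

def obstacle(x):
    """Lower unit semicircle, defined for |x| <= 1."""
    return -math.sqrt(1 - x*x)

# The spline must stay below the obstacle, but at x = 0.5 it pokes above it:
print(spline(0.5), obstacle(0.5))      # -0.84375 vs -0.866...
print(spline(0.5) > obstacle(0.5))     # True: obstacle condition violated
```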

With a smaller circle, or a longer bar, the one-contact (three-knot) spline would work. For example, on {[-3,3]}:

With a longer bar, a three-knot spline solves the problem

But with our parameters we look for two contact points. By symmetry, the middle piece of the spline must be of the form {q(x)=cx^2+d}. The other two will be {p(x)=(ax+b)(2-x)^2} and {p(-x)}, also by symmetry and to satisfy the boundary conditions at {\pm 2}. At the positive knot {x_0} the following must hold:

\displaystyle p(x_0)=q(x_0)=-\sqrt{1-x_0^2}, \quad p'(x_0)=q'(x_0)=\frac{x_0}{\sqrt{1-x_0^2}}, \quad p''(x_0)=q''(x_0)

where the last condition comes from the fact that {u''} is concave and therefore continuous. With five equations and five unknowns, Maple finds solutions in closed form. One of them has {x_0=0}, {a=b=-1/4} as above, and is not what we want. The other has {x_0=(\sqrt{10}-2)/3\approx 0.3874} and coefficients such as {a=-\frac{1}{3645}\sqrt{505875+164940\sqrt{10}}}. Ugly, but it works:


This time, the bar does stay below the obstacle, touching it only at two points. The amount by which it comes off the obstacle in the middle is very small. Here is the difference {u(x)-(-\sqrt{1-x^2})}:

Deviation from the obstacle

And this is the second derivative {u''}.

Second derivative is Lipschitz continuous

Again, the minimizer has a higher degree of regularity (Lipschitz continuous second derivative) than a generic element of the function space in which minimization takes place (square-integrable second derivative).
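For the record, the knot location can be recovered without symbolic algebra. Treating {x_0} as the free parameter, the tangency conditions (value and slope of the obstacle, whose derivative at the contact point is {x_0/\sqrt{1-x_0^2}}) determine {q} and {p}; bisecting on the remaining condition {p''(x_0)=q''(x_0)} recovers {x_0=(\sqrt{10}-2)/3}. A Python sketch (the helper names are mine):

```python
import math

def mismatch(x0):
    """p''(x0) - q''(x0) for the spline pieces fitted at the contact point x0."""
    s = math.sqrt(1 - x0*x0)            # contact point on the obstacle: (x0, -s)
    c = 1 / (2*s)                       # q'(x0) = 2*c*x0 = x0/s  (tangency)
    ab = -s / (2 - x0)**2               # ab = a*x0 + b, from p(x0) = -s
    a = (x0/s + 2*(2 - x0)*ab) / (2 - x0)**2   # from p'(x0) = x0/s
    p2 = -4*a*(2 - x0) + 2*ab           # p''(x0) for p(x) = (a*x+b)(2-x)^2
    return p2 - 2*c                     # q''(x0) = 2*c

lo, hi = 0.2, 0.6                       # mismatch changes sign on this interval
for _ in range(100):
    mid = (lo + hi) / 2
    if mismatch(lo) * mismatch(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(lo, (math.sqrt(10) - 2) / 3)      # both ~ 0.38743
```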

If the rod is made shorter (and the obstacle stays the same), the two-contact nature of the solution becomes more pronounced.

Shorter rod

Assuming the rod stays in one piece, of course.

Second order obstacle problem

Imagine a circular (or cylindrical, in cross-section) object being supported by an elastic string. Like this:

Obstacle problem

To actually compute the equilibrium mass-string configuration, I would have to take some values for the mass of the object and for the resistance of the string. Instead, I simply chose the position of the object: it is the unit circle with center at {(0,0)}. It remains to find the equilibrium shape of the string. The shape is described by equation {y=u(x)} where {u} minimizes the appropriate energy functional subject to boundary conditions {u(-2)=0=u(2)} and the obstacle {u(x)\le -\sqrt{1-x^2}}. The functional could be the length

\displaystyle  L(u) = \int_{-2}^2 \sqrt{1+u'(x)^2}\,dx

or its quadratization

\displaystyle E(u) = \frac12 \int_{-2}^2 u'(x)^2 \,dx

The second one is nicer because it yields a linear Euler-Lagrange equation/inequality. Indeed, the obstacle permits one-sided variations {u+\varphi} with { \varphi\le 0} smooth and compactly supported. The linear term of {E(u+\varphi)} is {\int u'\varphi'}, which after integration by parts becomes {-\int u'' \varphi}. Since the minimizer satisfies {E(u+\varphi)-E(u)\ge 0}, the conclusion is {\int u'' \varphi \le 0 } whenever {\varphi\le 0}. Therefore, {u''\ge 0} everywhere (at least in the sense of distributions), which means {u} is a convex function. In the parts where the string is free, we can do variations of either sign and obtain {u''=0}; that is, {u} is an affine function there.

The convexity of {u} in the part where it touches the obstacle is consistent with the shape of the obstacle: the string can assume the same shape as the obstacle.

The function {u} can now be determined geometrically: the only way the function can come off the circle, stay convex, and meet the boundary conditions is by leaving the circle along the tangent lines through the endpoints {(\pm 2,0)}. This is the function pictured above. Its derivative is continuous: Lipschitz continuous, to be precise.
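Concretely, the tangent from {(2,0)} touches the lower unit semicircle at {(1/2,-\sqrt{3}/2)}, so the minimizer follows the circle on {|x|\le 1/2} and the tangent lines beyond. A Python sketch (this explicit parametrization is mine) checking that both the value and the slope match at the transition point:

```python
import math

def u(x):
    """Piecewise minimizer: circle for |x| <= 1/2, tangent lines outside."""
    x = abs(x)                         # the configuration is symmetric
    if x <= 0.5:
        return -math.sqrt(1 - x*x)     # string lies on the obstacle
    return (x - 2) / math.sqrt(3)      # tangent line through (2, 0)

# Value match at the transition point x = 1/2:
print(abs(u(0.5) - (0.5 - 2) / math.sqrt(3)) < 1e-12)        # True
# Slope match: d/dx of -sqrt(1-x^2) at 1/2 equals the line's slope 1/sqrt(3):
slope_circle = 0.5 / math.sqrt(1 - 0.25)
print(abs(slope_circle - 1 / math.sqrt(3)) < 1e-12)          # True
```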

First derivative is Lipschitz continuous

The second derivative does not exist at the transition points. Still, the minimizer has a higher degree of regularity (Lipschitz continuous derivative) than a generic element of the function space in which minimization takes place (square-integrable derivative).

As a bonus, the minimizer of energy {E} turns out to minimize the length {L} as well.

All in all, this was an easy problem. Next post will be on its fourth-order version.

Quasi-isometries and stability of quasi-geodesics

Continuation of expository series on Gromov hyperbolicity. Recall that a map {f\colon X\rightarrow Y} is a quasi-isometry if there are constants {L,M} such that {L^{-1}|x_1x_2|-M\le |f(x_1)f(x_2)|\le L|x_1x_2|+M} for all {x_1,x_2\in X}. This is a coarse version of the bi-Lipschitz condition. Surprisingly, Gromov hyperbolicity is preserved under quasi-isometries of geodesic spaces. The surprising part is that the multiplicative constant {L} does not kill the additive constant {\delta}.

Theorem. Suppose {X} and {Y} are geodesic metric spaces, and {Y} is Gromov hyperbolic. If there exists a quasi-isometry {f\colon X\rightarrow Y}, then {X} is also Gromov hyperbolic.

Proof goes like this. Assuming that {X} contains a fat geodesic triangle {a,b,c}, we consider the geodesic triangle in {Y} with vertices {f(a),f(b),f(c)}, and want to prove that it is also fat. Since {f} is a quasi-isometry, it follows that the images of geodesics {ab}, {bc} and {ac} form a roughly-triangular shape which has the fatness property: there is a point on one of the sides that is far away from the other two sides. The problem reduces to showing that this roughly-triangular shape lies within a certain distance {R} (independent of {a,b,c}) from the actual geodesic triangle with vertices {f(a),f(b),f(c)}. This is known as stability of quasi-geodesics. A quasi-geodesic is a quasi-isometric image of a line segment, similar to how a geodesic is a (locally) isometric image of a segment.

By the way, quasi-geodesic stability fails in {\mathbb R^2}. We can connect the points {(-n,0)} and {(n,0)} by the quasi-geodesic {y=n-|x|}, which is at distance {n} from the true geodesic between these points.

Quasi-geodesics are not stable in R^2

I’ll prove a more specialized and weaker statement, which however contains the essence of the full result. Namely, let \mathbb H^2 denote the hyperbolic plane and assume that {f\colon [a,b]\rightarrow \mathbb H^2} is bi-Lipschitz: {L^{-1}|x_1-x_2|\le |f(x_1)f(x_2)|\le L|x_1-x_2|} for all {x_1,x_2\in [a,b]}. The claim is that the image of {f} lies in the {R}-neighborhood of the geodesic through {f(a)} and {f(b)}, where {R} depends only on {L}.

There are three standard models of the hyperbolic plane: disk, halfplane and infinite strip. I’ll use the last one, because it’s the only model in which a geodesic is represented by a Euclidean line. Specifically, {\mathbb H^2} is identified with the infinite strip {\{x+iy\in\mathbb C\colon |y|<\pi/2\}} equipped with the metric {|dz|/\cos y}. (To see where the metric comes from, apply z\mapsto i e^{iz} to map the strip onto the upper halfplane and pull back the hyperbolic metric |dw|/\mathrm{Im}\,w.)

The hyperbolic and Euclidean metrics coincide on the real line, which is where we place {f(a)} and {f(b)} with the help of some hyperbolic isometry. Let {\Gamma=f[a,b]} be our quasi-geodesic. Being a bi-Lipschitz image of a line segment, {\Gamma} satisfies the chord-arc condition: the length of any subarc of {\Gamma} does not exceed {L^2} times the distance between its endpoints. Pick {y_0\in (0,\pi/2)} such that {1/\cos y_0=2L^2}. Let {D} be the hyperbolic distance between the lines {y=y_0} and {y=0}. This distance could be calculated as {\int_0^{y_0}\sec y\,dy}, but I’d rather keep this integral as an exquisite Calculus II torture device.

The problem facing us is that the quasi-geodesic may be L^2 times longer than the distance between its endpoints, which seems to allow it to wander far off the straight path. However, it turns out there is a uniform bound on the length of any subarc {\Gamma'} of {\Gamma} that lies within the substrip {\{y \ge y_0\}}. We lose no generality in assuming that the endpoints of \Gamma' are on the line y=y_0; they will be denoted {x_j+iy_0}, {j=1,2}. The key point is that connecting these two points within \{y\ge y_0\} is rather inefficient, and such inefficiency is controlled by the chord-arc property.

Quasi-geodesic and its inefficient subarc

The hyperbolic distance between {x_1+iy_0} and {x_2+iy_0} is at most |x_1-x_2|+2D, because we can go from {x_1+iy_0} to {x_1} (distance {D}), then from {x_1} to {x_2} (distance {|x_1-x_2|}), and finally from {x_2} to {x_2+iy_0} (distance {D}). On the other hand, the length of {\Gamma'} is at least {|x_1-x_2|/\cos y_0 = 2L^2|x_1-x_2|} because the density of the hyperbolic metric is at least 1/\cos y_0 where \Gamma' lives. The chord-arc property yields {2L^2 |x_1-x_2| \le L^2 (|x_1-x_2|+2D)}, which simplifies to {|x_1-x_2| \le 2D}. Hence, the distance between the endpoints of {\Gamma'} is at most {4D}, and another application of the chord-arc property bounds the length of {\Gamma'} by {4DL^2}.

In conclusion, the claimed stability result holds with {R= D+2DL^2}.
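The constant is quite explicit: since {\sec y_0=2L^2}, the torture-device integral evaluates to {D=\log(2L^2+\sqrt{4L^4-1})}. A Python sketch (my own helper) comparing the closed form against numeric integration and computing {R}:

```python
import math

def stability_R(L, n=100000):
    """R = D + 2*D*L^2, where D = integral of sec(y) dy from 0 to y0
    and y0 is chosen so that sec(y0) = 2*L^2."""
    y0 = math.acos(1 / (2 * L**2))
    h = y0 / n                                      # midpoint-rule step
    D_num = sum(h / math.cos((k + 0.5) * h) for k in range(n))
    D = math.log(2*L**2 + math.sqrt(4*L**4 - 1))    # closed form of the integral
    assert abs(D - D_num) < 1e-6                    # the two agree
    return D + 2 * D * L**2

print(stability_R(1.0))   # 3.9508... = 3*log(2 + sqrt(3)), the case L = 1
```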

Complete proofs can be found in many books, for example Metric Spaces of Non-Positive Curvature by Bridson and Haefliger or Elements of Asymptotic Geometry by Buyalo and Schroeder. I used Schroeder’s lecture notes An introduction to asymptotic geometry.

Tight spans and Gromov hyperbolicity

The definition of a metric space guarantees that if we pick three points {A,B,C} from it, we can draw a triangle in the Euclidean plane, with vertices labeled {A,B,C}, so that the side lengths are equal to the corresponding distances {|AB|, |AC|, |BC|}. In other words, we can represent any triple of points as vertices of a triangle.

Triangle representation

There is another, perhaps less obvious representation: as vertices of a tripod. Specifically, we can draw a graph of the form shown below and assign length to each edge so that {|AB|} is equal to the length of the shortest path from {A} to {B}, etc.

Tripod representation

Indeed, we have the system of linear equations

{\displaystyle \begin{cases}   a+b & = |AB| \\    a+c & = |AC| \\    b+c & = |BC| \\    \end{cases}}

with a unique solution

\displaystyle   \begin{cases} a &= \frac{1}{2}(|AB|+|AC|-|BC|) \\   b &= \frac{1}{2}(|AB|+|BC|-|AC|) \\    c &= \frac{1}{2}(|AC|+|BC|-|AB|)    \end{cases}

The numbers {a,b,c} are clearly nonnegative; if a number is zero, the corresponding edge can be contracted to a point. Simple as they are, these formulas have a name: {a} is called the Gromov product of {B} and {C} with respect to {A}, and is denoted {(B|C)_A}.

The more geometric way to arrive at tripod representation is to inscribe a circle in the triangle shown above, and note how the points of tangency divide the sides of the triangle:

Gromov products from inscribed circle

Next question: can we represent four points {A,B,C,D} in a similar way? The first approach fails: in general there is no Euclidean quadrilateral in which the sides and diagonals represent the distances between these four points. For example, if {|AB|=|BC|=|CD|=|DA|=1} and {|AC|=|BD|=2}, then any Euclidean representation must place both {B} and {D} in the midpoint between {A} and {C}, which contradicts {|BD|} being nonzero. Let’s try the metric graph approach.

By analogy with the tripod, we might hope at first for something like

Metric tree

but this does not work in general. Indeed, four points determine six distances, and the graph has only five degrees of freedom. We can add one more degree of freedom by thickening the connecting edge to a rectangle:

Generic tight span of a 4-point metric space

Does this work? Can we really find the required nonnegative numbers {a,b,c,d,e,f}? Yes! Writing and solving a system of six linear equations is tedious but can be done. A better way is to recognize that we have four tripods here, such as the tripod with vertices {A,B,C} and the edge lengths {a+e}, {b}, {c+f}. This immediately tells us that {b=(A|C)_B}. Similarly, {a=(B|D)_A}, {c=(B|D)_C} and {d=(A|C)_D}. The remaining lengths {e} and {f} can now be found directly. But in fact, we can find {e,f} independently of the other four lengths by considering the perfect matchings between the points {A,B,C,D}. Namely,

\displaystyle    \begin{cases}   |AB|+|CD| &= (a+b+c+d) + 2e \\   |AD|+|BC| &= (a+b+c+d) + 2f \\   |AC|+|BD| &= (a+b+c+d) + 2e+2f    \end{cases}

From here we get nonnegative {e} and {f} if and only if {\{AC,BD\}} is the longest matching of vertices (or one of the longest). In other words, we must consider the lengths of matchings before marking {A,B,C,D} on the graph. But once they are marked correctly, the rest is now clear: the numbers {2e} and {2f} are the amounts by which the longest matching exceeds the other two. In particular, the rectangle collapses into an edge precisely when the two longest matchings have the same length.
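These formulas are easy to put into code. A Python sketch (the helper is mine; it assumes the points are labeled so that {\{AC,BD\}} is the longest matching), tested on the four corners {(\pm 1,\pm 1)} of a square labeled cyclically:

```python
def tight_span(AB, AC, AD, BC, BD, CD):
    """Edge lengths (a,b,c,d,e,f) of the tight span of {A,B,C,D}.
    Assumes the labeling makes {AC, BD} the longest perfect matching."""
    a = (AB + AD - BD) / 2            # Gromov product (B|D)_A
    b = (AB + BC - AC) / 2            # (A|C)_B
    c = (BC + CD - BD) / 2            # (B|D)_C
    d = (AD + CD - AC) / 2            # (A|C)_D
    e = ((AC + BD) - (AD + BC)) / 2   # excess of the longest matching
    f = ((AC + BD) - (AB + CD)) / 2
    return a, b, c, d, e, f

# Square A=(1,1), B=(1,-1), C=(-1,-1), D=(-1,1): sides 2, diagonals 2*sqrt(2)
s, diag = 2.0, 2 * 2**0.5
a, b, c, d, e, f = tight_span(s, diag, s, s, diag, s)
print(e, f)            # both 2*sqrt(2) - 2; scaling by n shows R^2 is not hyperbolic
print(a + e + b)       # reconstructs |AB| = 2
```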

What we have here are examples of tight spans of metric spaces: the tripod is the tight span of 3-point space {\{A,B,C\}}, and the six-edge graph is the 1-skeleton of the tight span of {\{A,B,C,D\}}. (The rectangle should have been filled in to form a 2-cell; in general the tight span of n points is a polyhedral complex of dimension up to {\lfloor n/2\rfloor}.)

By definition, a metric space is Gromov hyperbolic if there exists a constant {\delta\ge 0} such that the tight span of any 4-point subset satisfies {\min(e,f)\le \delta}; that is, the rectangle in the middle is not too fat. Informally, this means the space is (contained in) a uniformly thickened metric tree. In terms of distances, the definition says that the lengths of the two longest matchings of {A,B,C,D} differ by at most {2\delta}. Out of several equivalent definitions of Gromov hyperbolicity, this is the most elementary one; it does not require the consideration of geodesics.

However, the real power of Gromov hyperbolicity transpires in the setting of geodesic spaces, where any two points are connected by an isometric image of a line segment. To begin with, in geodesic spaces it suffices to consider the case when {D} lies on the geodesic from {A} to {C} (otherwise we replace {D} with a nearby point on the geodesic). The hyperbolicity condition now says that in the geodesic triangle {ABC} each point on the side {AC} is within distance {\delta} of one of the two other sides. Informally, all geodesic triangles in a Gromov hyperbolic space are uniformly thin.

This is indeed the case in the hyperbolic plane {\mathbb H^2}: a direct computation shows that the area of any geodesic triangle in {\mathbb H^2} is less than {\pi} (it is {\pi} minus the sum of the angles). On the other hand, the area of a hyperbolic disk of radius {r} grows indefinitely as {r\rightarrow\infty}, since the total area of {\mathbb H^2} is infinite. It follows that in {\mathbb H^2} there is a uniform upper bound on the radius of a circle that can be inscribed into a triangle; the Gromov hyperbolicity of {\mathbb H^2} follows at once.

Even more immediately, {\mathbb R} is Gromov hyperbolic, with {\delta=0}. But {\mathbb R^2} is not, as it contains arbitrarily fat triangles (also, the perfect matchings of the points {\{(\pm n, \pm n)\}} have lengths {4\sqrt{2}n, 4n, 4n}, contradicting the four-point condition). Similarly, the graph of {y=|x|} with the metric induced by restriction contains four-point configurations \{(\pm n,n),(\pm 2n,2n)\} and therefore is not Gromov hyperbolic. Since this graph is bi-Lipschitz equivalent to a line, we see that in general metric spaces Gromov hyperbolicity is not a bi-Lipschitz invariant. How could we expect it to be? Bi-Lipschitz maps introduce multiplicative distortion of distances {L^{-1}|AB|\le |f(A)f(B)|\le L|AB|}, while Gromov hyperbolicity requires an additive bound.

Yet, if two geodesic metric spaces are bi-Lipschitz equivalent, then either both are Gromov hyperbolic, or neither. One can even replace bi-Lipschitz equivalence with a coarser notion of quasi-isometry, which requires {L^{-1}|AB|-M\le |f(A)f(B)|\le L|AB|+M} (and {f} need not be surjective; it should only cover some {\epsilon}-net in the codomain). The quasi-isometric invariance is what makes it possible to discuss hyperbolicity of finitely generated groups… but that topic will have to wait for another day.

Vanishing Mean Oscillation

The oscillation of a function f on a set A is the diameter of f(A). For real-valued functions it’s written as \mathrm{osc}_A\,f=\sup_A f - \inf_A f. The relation to uniform continuity is immediate: f is uniformly continuous if and only if \mathrm{osc}_I\,f is small (<\epsilon) on all sufficiently short intervals (|I|<\delta). I'm following the traditional definition of uniform continuity here, even though it’s wrong.

In statistics, the difference (maximum minus minimum) is called the range of a set of data values, and is one of the common measures of variation. It is a crude one, though, and is obviously influenced by outliers. There are better ones, such as standard deviation: subtract the mean, square each term, then average them, and take square root… you know the drill. What happens if we use the standard deviation in the definition of uniform continuity?

We get the space of functions of vanishing mean oscillation, known as VMO. To avoid technicalities, I will consider only the functions f\colon \mathbb R\to\mathbb R that vanish outside of the interval [-1,1]. Like this one:

Hat function

Given an interval I\subset \mathbb R, let f_I denote the mean of f on I, namely \frac{1}{|I|}\int_I f. By definition, f is in VMO if and only if |f - f_I|_I is small (<\epsilon) on all sufficiently short intervals (|I|<\delta).

Wait, what happened to squaring and taking a root afterwards? Turns out they are not really necessary: the above definition is equivalent to the one with standard deviation \sqrt{(|f - f_I|^2)_I}. In fact, VMO functions are integrable to any power 1\le p<\infty, and any such p can be used in the definition (a consequence of the John-Nirenberg lemma). Both squared and unsquared versions have their advantages:

  • |f - f_I|_I is finite for any locally integrable function; we don’t need to know in advance that f is square integrable.
  • \sqrt{(|f - f_I|^2)_I} is better suited for precise computation, since it’s just \sqrt{ (f^2)_I-f_I^2}.

For example, taking I=[-1,1] for the hat function, we get f_I=1/2 and (f^2)_I=1/3. So, \sqrt{(|f - f_I|^2)_I} = 1/\sqrt{12}, and this is as large as it gets. The supremum of mean oscillation over all intervals is called the BMO norm. The space BMO, naturally, includes all functions with finite BMO norm. VMO is a closed subspace of BMO — the closed span of continuous functions (uniformly continuous and vanishing at infinity, to be precise). In particular, VMO is separable while BMO is not.
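The hat-function computation is easy to reproduce numerically. A Python sketch (mine; I take the hat to be f(x)=1-|x| on [-1,1], which matches the values f_I=1/2 and (f^2)_I=1/3 above):

```python
import math

def hat(x):
    return max(0.0, 1 - abs(x))

def mean_osc(f, lo, hi, n=200000):
    """sqrt( (f^2)_I - (f_I)^2 ) over I = [lo, hi], by midpoint sums."""
    h = (hi - lo) / n
    xs = [lo + (k + 0.5) * h for k in range(n)]
    m1 = sum(f(x) for x in xs) / n          # f_I
    m2 = sum(f(x)**2 for x in xs) / n       # (f^2)_I
    return math.sqrt(m2 - m1 * m1)

print(mean_osc(hat, -1, 1))    # 0.28867... = 1/sqrt(12)
```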

The definition of VMO clearly rules out jump discontinuities such as x/|x|. Yet, it allows some discontinuous functions and even unbounded ones. The standard example of an unbounded function in VMO is the double logarithm:

f(x) = log(1-log|x|)

If this function does not look unbounded, it’s because it reaches level y=5 only when |x|\approx 10^{-64}, which is well below the resolution of your screen. Even if you have Retina display.

The classical paper by Sarason which introduced VMO has 381 Google Scholar citations as of now. Will this post count as 382nd? We’ll see…


1. Sarason, Donald. “Functions of vanishing mean oscillation.” Trans. Amer. Math. Soc. 207, no. 2 (1975), 391-405.

Injective:Surjective :: Isometric:?

I used this sort of title before, in “Continuous:Lipschitz :: Open:?“, but the topics are related anyway.

In some sense (formal or informal) the following classes of maps are dual to each other.

  • Injective : Surjective
  • Immersion : Submersion
  • Monomorphism : Epimorphism

An isometric embedding of metric space X into a metric space Y is a map f\colon X\to Y such that d_Y(f(a),f(b))=d_X(a,b) for all a,b\in X. This concept belongs to the left column of the table above. What should be its counterpart?

Candidate 1. Observe that a 1-Lipschitz map f\colon X\to Y is an isometric embedding iff it does not factor through any 1-Lipschitz surjection g\colon X\to Z (for any space Z) unless g is an isometric isomorphism. Reversing the order of arrows, we arrive at the following concept:

A 1-Lipschitz map f\colon Y\to X is a metric quotient map if it does not factor through any 1-Lipschitz injection g\colon Z\to X (for any space Z) unless g is an isometric isomorphism.

This can be reformulated as follows: \rho(a,b):=d_{X}(f(a),f(b)) is the greatest pseudo-metric on Y subject to

  1. \rho(a,b)\le d_{Y}(a,b)
  2. \rho(a,b)=0 if f(a)=f(b)

This seems reasonable: f does as little damage as possible, given the structure of its fibers. There is also a natural way to construct \rho for any reasonable fibering of Y: begin by defining \widetilde{d_Y}=d_Y for points in different fibers and 0 otherwise. Then force the triangle inequality by letting \rho(a,b)=\inf \sum_{j=1}^n \widetilde{d_Y}(y_j,y_{j-1}) subject to y_0=a and y_n=b. As long as the fibers stay at positive distance from one another, this \rho will be a metric. The corresponding metric quotient map sends each point of Y onto its fiber.

Here is a simple example, in which a part of an interval is mapped to a point.

These aren't the quotients I'm looking for

However, the above example made me unhappy. The only nontrivial fiber is the interval [1,2]. Both points 0 and 3 belong to trivial fibers, but the distance between them decreases from 3 to 2. This looks like a wrong kind of quotient to me.
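When there is a single collapsed fiber, the infimum over chains simplifies: the quotient distance is the minimum of the direct distance and the route through the fiber. A Python sketch of the pictured example, the interval [0,3] with [1,2] collapsed to a point (helper names mine):

```python
def quotient_dist(p, q, lo=1.0, hi=2.0):
    """Metric quotient of [0,3] collapsing the single fiber [lo, hi] to a point."""
    def to_fiber(x):
        # Euclidean distance from x to the interval [lo, hi]
        return max(lo - x, x - hi, 0.0)
    return min(abs(p - q), to_fiber(p) + to_fiber(q))

print(quotient_dist(1.2, 1.8))   # 0.0: both points lie in the collapsed fiber
print(quotient_dist(0.0, 3.0))   # 2.0: the distance 3 shrinks to 1 + 1
```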

Candidate 2 already appeared in my post on Lipschitz quotient, but wasn’t recognized at the time. It could be called (1,1)-Lipschitz quotient, but a better name is available. A map f\colon Y\to X is a submetry if f(B_Y(y,r))=B_X(f(y),r) where the balls are closed (using open balls yields something almost equivalent, but generally weaker). Such f need not be an isometry: consider orthogonal projections in a Euclidean space. It does have to be 1-Lipschitz. The additional property that distinguishes it from general 1-Lipschitz maps is the 2-point lifting property: for every x_0,x_1\in X and every y_0\in f^{-1}(x_0) there is y_1\in f^{-1}(x_1) such that d_{Y}(y_0,y_1)=d_X(x_0,x_1). Incidentally, this shows that x\mapsto f^{-1}(x) is an isometric embedding of X into the hyperspace \mathcal{H}(Y) which I covered earlier (“This ain’t like dusting crops, boy“).

The concept and the name were introduced by В.Н. Берестовский (V.N. Berestovskii) in his paper “Submetries of space-forms of nonnegative curvature” published in 1987 in Siberian Math. J. Among other things, he proved that a submetry between spheres (of any dimension) is an isometry. Of course, there are many submetries of S^{n-1} onto other spaces: take the quotient by a subgroup of O(n), which can be either discrete or continuous. Are there any submetries that are not quotients by isometries?

Yes, there are. I’ll describe a (modified) example given by Berestovskii and Guijarro (2000). Let \mathbb{H} be the hyperbolic plane realized as the upper half-plane with the metric \frac{dx^2+dy^2}{y^2}. Define f\colon \mathbb{H}\to\mathbb{R} by

\displaystyle f(x,y)=\begin{cases} \log y, \qquad x\le 0 \\ \log\frac{x^2+y^2}{y}, \qquad x\ge 0 \end{cases}

Don’t panic; this is just the signed distance function (in the metric of \mathbb H) to the fat green curve below. I also drew two other level sets of f, to the best of my Friday night line-drawing ability.

Weird submetry

To convince yourself that f is a submetry, first consider y\mapsto \log y for which the submetry property is clear (it’s the quotient by horizontal translation), and then note that the inversion in the unit circle exchanges horizontal lines (horocycles at infinity) with horocycles at 0. An interesting feature of this submetry is that it is not very smooth: C^{1,1} but not C^2.
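Another check (my own, not from the paper): a signed distance function must have hyperbolic gradient norm 1, and in the upper half-plane model that norm is y|\nabla f|. A Python sketch with central differences confirms this away from the gluing line x=0:

```python
import math

def f(x, y):
    """The Berestovskii-Guijarro submetry, upper half-plane model."""
    return math.log(y) if x <= 0 else math.log((x*x + y*y) / y)

def hyp_grad_norm(x, y, h=1e-6):
    """Hyperbolic norm y*|grad f| at (x, y), via central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return y * math.hypot(fx, fy)

for x, y in [(-1.0, 2.0), (0.5, 1.0), (3.0, 0.7)]:
    print(round(hyp_grad_norm(x, y), 4))   # 1.0 each time: eikonal equation holds
```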

The exits are North South and East but are blocked by an infinite number of mutants

I used the word mutation in the previous post because of the (implicit) connection to quiver mutation. The quiver mutation is easy to define: take an oriented graph (quiver) where multiple edges are allowed, but must have consistent orientation (i.e., no 2-edge oriented cycles are allowed). Mutation at vertex v is done in three steps:

  1. v is removed and each oriented path of length two through v is contracted into an edge. That is, the stopover at v is eliminated.
  2. Step 1 may create some 2-edge oriented cycles, which must be deleted. That is, we cancel the pairs of arrows going in opposite directions.
  3. The replacement vertex v’ is inserted, connected to the rest of the graph in the same way that v was, but with opposite orientation. In practice, one simply reuses v for this purpose.

Some quivers have a finite set of mutation equivalent ones; others an infinite one. Perhaps the simplest nontrivial case is the oriented 3-cycle with edges of multiplicities x,y,z. The finiteness of its equivalents has to do with the Markov constant C(x,y,z)=x^2+y^2+z^2-xyz (not 3xyz this time), which is invariant under mutation. This is investigated in the paper “Cluster-Cyclic Quivers with Three Vertices and the Markov Equation” by Beineke, Brüstle and Hille. The appendix by Kerner relates the Markov constant to Hochschild cohomology, which I take as a clue for me to finish this post.

So I’ll leave you to play with the mutation applet linked in the embedded tweet below.

“Let me say a few words about the solutions in positive integers of the equation

\displaystyle x^2+y^2+z^2=3xyz

This equation is symmetric with respect to the unknowns x, y, z; therefore, knowing one of its solutions

x=\alpha, \quad y=\beta, \quad z=\gamma,

it is easy to find the following five:

x=\alpha, \quad y=\gamma, \quad z=\beta;
x=\beta, \quad y=\gamma, \quad z=\alpha;
x=\beta, \quad y=\alpha, \quad z=\gamma;
x=\gamma, \quad y=\alpha, \quad z=\beta;
x=\gamma, \quad y=\beta, \quad z=\alpha.

Although these six solutions may be different, we will consider them as one, denoted by

\displaystyle (\alpha, \beta, \gamma).

The above is a quote from A.A.Markov’s paper “Sur les formes quadratiques binaires indéfinies” (part 2). Back in 1880 people were patient enough to write out all permutations of three symbols… If we fix x=\alpha and y=\beta, the equation for z becomes

z^2-3\alpha \beta z +(\alpha^2+\beta^2)=0

which admits a second integer root \gamma'=3\alpha\beta-\gamma. We can also write \gamma\gamma'=\alpha^2+\beta^2, which does not immediately show that \gamma' is an integer, but does show that it is positive. For example, from the obvious solution (1,1,1) we get (1,1,2). Mutating in the third variable again would do us no good, for it would just bring us back to (1,1,1). But we can mutate in the second variable, replacing it with 3\cdot 1\cdot 2-1=5. Having understood the symmetry of the equation so well, we write this new solution as (1,2,5), in increasing order. Now we can mutate either 1 or 2, and so it goes…
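
This Vieta-style jumping is easy to iterate. Here is a short sketch of the process (the function names and the breadth-first bookkeeping are mine):

```python
# Grow Markov triples by Vieta jumping: from a sorted triple (x, y, z),
# replacing any coordinate t by 3*(product of the other two) - t gives
# another solution of x^2 + y^2 + z^2 = 3*x*y*z.

def children(triple):
    x, y, z = triple
    out = set()
    for t in [(3*y*z - x, y, z), (x, 3*x*z - y, z), (x, y, 3*x*y - z)]:
        out.add(tuple(sorted(t)))
    out.discard(triple)
    return out

seen = {(1, 1, 1)}
frontier = {(1, 1, 1)}
for _ in range(4):                  # four levels of the tree
    frontier = set().union(*(children(t) for t in frontier)) - seen
    seen |= frontier

for (x, y, z) in seen:              # every triple found is a solution
    assert x*x + y*y + z*z == 3*x*y*z

print(sorted(seen)[:5])  # → [(1, 1, 1), (1, 1, 2), (1, 2, 5), (1, 5, 13), (1, 13, 34)]
```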

Markov number tree

All triples, except for the two at the beginning, consist of distinct numbers (thus, they do generate six distinct solutions if order matters). The tree contains all solutions of the Markov equation. The Wikipedia article also points out the occurrence of Fibonacci numbers along the top branch, as well as a curious identity discovered by Don Zagier: let f(t)=\cosh^{-1}(3t/2); then (for triples written in increasing order)

\displaystyle x^2+y^2+z^2=3xyz+\frac{4}{9} \quad \Leftrightarrow \quad f(x)+f(y)=f(z)

Looks like a fun exercise in simplifying inverse hyperbolic functions.
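
One can at least check the identity numerically: pick x and y, solve f(z)=f(x)+f(y) for z, and compare the two sides of the equation. A quick sketch:

```python
from math import acosh, cosh

# Numerical sanity check of Zagier's identity (a check, not a proof):
# with f(t) = arccosh(3t/2), if z is defined by f(z) = f(x) + f(y),
# then (x, y, z) should satisfy x^2 + y^2 + z^2 = 3*x*y*z + 4/9.

def f(t):
    return acosh(3*t/2)

for x, y in [(1.0, 1.0), (1.0, 2.0), (1.5, 2.5)]:
    z = 2*cosh(f(x) + f(y))/3          # invert f
    lhs = x*x + y*y + z*z
    rhs = 3*x*y*z + 4/9
    assert abs(lhs - rhs) < 1e-9, (x, y, z)
```

For x = y = 1 this produces z = 7/3, and indeed 1 + 1 + 49/9 = 7 + 4/9.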

Also, it’s still unknown whether two distinct Markov triples can have the same maximum \max(\alpha,\beta,\gamma). Looks like a fun problem for amateur number theorists.

To wrap this up, I will describe how Markov (the one of Markov chains fame, not his identically-named son of 4-manifold undecidability fame) came across the equation. Let Q(m,n)=am^2+2bmn+cn^2 be a quadratic form with real coefficients a,b,c normalized by b^2-ac=1. What is the best upper bound on

\displaystyle \min_{(m,n)\in \mathbb{Z}^2\setminus \lbrace(0,0)\rbrace} |Q(m,n)|?

In 1873 Korkin and Zolotarev published a paper showing that the best bound is 2/\sqrt{5}, attained by the form \displaystyle Q_1=\frac{2}{\sqrt{5}}(m^2-mn-n^2). Looks like the case is closed. But what if Q is not Q_1 (precisely, not equivalent to Q_1 under the action of SL(2,\mathbb Z))? Then the best bound improves to 1/\sqrt{2}, attained by \displaystyle Q_2=\frac{1}{\sqrt{2}}(m^2-2mn-n^2) (this is also due to KZ). Well, what if Q is not equivalent to either Q_1 or Q_2? Then the bound improves to \sqrt{\frac{100}{221}} (found by Markov), and we could go on…

But rather than continue in this fashion, Markov looked for the threshold \mu at which the number of inequivalent forms with minimum \ge \mu becomes infinite. And he found it: \mu =2/3 (for comparison, \sqrt{\frac{100}{221}}=0.67267\dots). Specifically, there are only finitely many forms with minimum above 2/3+\epsilon, for every \epsilon>0. But there are infinitely many forms with minimum exactly 2/3, such as \frac{2}{3}(m^2-\sqrt{5}mn-n^2). It was the iterative process of producing more and more of these forms that led Markov to the Diophantine equation x^2+y^2+z^2=3xyz.
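
The two extremal bounds of Korkin and Zolotarev can be sanity-checked by brute force over a window of lattice points (the window size N below is an arbitrary choice of mine; since m^2-mn-n^2 and m^2-2mn-n^2 only take nonzero integer values off the origin, the true minimum is already attained at (1,0)):

```python
from math import sqrt

# Estimate min |Q(m,n)| over nonzero integer pairs in a finite window,
# for the two extremal forms from the text (normalized by b^2 - ac = 1).

def lattice_min(Q, N=60):
    return min(abs(Q(m, n))
               for m in range(-N, N + 1)
               for n in range(-N, N + 1)
               if (m, n) != (0, 0))

Q1 = lambda m, n: 2/sqrt(5) * (m*m - m*n - n*n)
Q2 = lambda m, n: 1/sqrt(2) * (m*m - 2*m*n - n*n)

assert abs(lattice_min(Q1) - 2/sqrt(5)) < 1e-9   # ≈ 0.894
assert abs(lattice_min(Q2) - 1/sqrt(2)) < 1e-9   # ≈ 0.707
```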

The number 2/3 and its square also appear in Zagier’s identity with f(t)=\cosh^{-1}(3t/2)… But enough numerology for today.

Multiplication by consensus, and some arrows

Let’s admit it: it’s hard to keep track of signs when multiplying numbers. Being lazy people, mathematicians seek ways to avoid this chore. One popular way is to work in the enchanted world of \mathbb Z_2, where -1=1. I’ll describe another way, which is to redefine multiplication by letting the factors reach a consensus on what the sign of their product should be.

If both a and b are positive, let their product be positive. And if they are both negative, the product should also be negative. Finally, if the factors can’t agree on which sign they like, they compromise at 0.

In a formula, this operation can be written as a\odot b=\frac{a|b|+|a|b}{2}, but who wants to see that kind of formula? Just try using it to check that the operation is associative (it is).
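
If formulas are unwelcome, code may do. Here is a sketch of \odot, with the associativity claim spot-checked on a small grid rather than proved:

```python
# The consensus product: agree on a sign, or settle at 0.
def odot(a, b):
    return (a*abs(b) + abs(a)*b) / 2

assert odot(2, 3) == 6       # both positive: usual product
assert odot(-2, -3) == -6    # both negative: product stays negative
assert odot(-2, 3) == 0      # no consensus: compromise at 0

# Spot-check associativity on a small grid of values.
vals = [-3, -1, 0, 2, 5]
for a in vals:
    for b in vals:
        for c in vals:
            assert odot(odot(a, b), c) == odot(a, odot(b, c))
```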

But I hear someone complaining that \odot is just an arbitrary operation that does not make any sense. So I’ll reformulate it. Represent real numbers by ordered pairs (a_+,a_-)\in [0,\infty)\times [0,\infty), for example 5 becomes (5,0) and -6 becomes (0,6). Define multiplication component-wise. Better now? You don’t have to keep track of minus signs because there aren’t any.

This \odot comes in handy when multiplying the adjacency matrices of quivers. The Wikipedia article on Quiver illustrates the concept with this picture:

[image: an actual quiver, full of arrows]

But in mathematics, a quiver is a directed graph such as this one:

Quiver = directed graph

Recall that the adjacency matrix A of a graph on vertices \lbrace 1,2,\dots,n\rbrace has A_{ij}=1 if there is an edge between i and j, and A_{ij}=0 otherwise. For a directed graph we modify this definition by letting A_{ij}=1 if the arrow goes from i to j, and A_{ij}=-1 if it goes in the opposite direction. So, for the quiver shown above we get

A=\displaystyle \begin{pmatrix} 0 & 1 & -1 & 1 \\   -1 & 0 & -1 & 1 \\ 1 & 1 & 0 & -1 \\ -1 & -1 & 1 & 0 \end{pmatrix}

For an undirected graph the square A^2 counts the number of ways to get from i to j in exactly 2 steps (and 2 can be replaced by any n). To make this work for a directed graph, we represent numbers as pairs 1=(1,0) and -1=(0,1) and carry on multiplying and adding:

A\odot A=\displaystyle \begin{pmatrix} (0,0) & (0,0) & (1,0) & (1,1) \\   (0,0) & (0,0) & (1,1) & (0,1) \\ (0,1) & (1,1) & (0,0) & (2,0) \\ (1,1) & (1,0) & (0,2) & (0,0) \end{pmatrix}

For instance, there are two ways to get from 3 to 4 in two steps, but none in the opposite direction. This works for any power, and also for multigraphs (with more than one edge between the same vertices). Logically, this is the same as separating the adjacency matrix into its positive and negative parts, and multiplying them separately.
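
Here is a sketch of that computation (the names are mine): each entry is split into its positive and negative parts, and the two parts are multiplied separately, just as described.

```python
# Square the quiver's signed adjacency matrix using the pair
# representation x -> (x_+, x_-), with component-wise products and sums.

A = [[ 0,  1, -1,  1],
     [-1,  0, -1,  1],
     [ 1,  1,  0, -1],
     [-1, -1,  1,  0]]

def pair_square(A):
    n = len(A)
    pos = lambda x: max(x, 0)    # positive part
    neg = lambda x: max(-x, 0)   # negative part
    return [[(sum(pos(A[i][k]) * pos(A[k][j]) for k in range(n)),
              sum(neg(A[i][k]) * neg(A[k][j]) for k in range(n)))
             for j in range(n)] for i in range(n)]

B = pair_square(A)
assert B[2][3] == (2, 0)   # two directed 2-step paths from vertex 3 to 4
assert B[3][2] == (0, 2)   # none from 4 to 3; the 2 counts paths against the arrows
assert B[0][3] == (1, 1)   # one path each way between vertices 1 and 4
```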


The last example is matrix mutation from the theory of cluster algebras. Given a (usually integer) m\times n matrix A and a positive integer k\le \min(m,n), we can mutate A in the direction k by doing the following:

  1. Dump nuclear waste on the kth row and kth column
  2. To each non-radioactive element a_{ij} add a_{ik}\odot a_{kj}, that is, the \odot product of the radioactive elements to which a_{ij} is exposed.
  3. Clean up by flipping the signs of all radioactive elements.

The properties of \odot should make it clear that each mutation is an involution: mutating for the second time in the same direction recovers the original matrix. However, applying mutations in different directions, one can obtain a large, or even infinite, class of mutation-equivalent matrices.
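
The three steps translate into a few lines of code. A sketch (the function names are mine), with the involution property checked on a small example:

```python
# Matrix mutation in direction k (0-indexed), following the three steps
# above: entries in row/column k are the "radioactive" ones.

def odot(a, b):
    return (a*abs(b) + abs(a)*b) // 2   # consensus product; always an integer

def mutate(A, k):
    m, n = len(A), len(A[0])
    return [[-A[i][j] if i == k or j == k           # step 3: flip radioactive entries
             else A[i][j] + odot(A[i][k], A[k][j])  # step 2: add the exposure
             for j in range(n)] for i in range(m)]

# Signed adjacency matrix of an oriented 3-cycle.
A = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
assert mutate(mutate(A, 0), 0) == A    # mutating twice in the same direction
assert mutate(mutate(A, 2), 2) == A    # recovers the original matrix
```

The involution boils down to the sign rule \;(-a)\odot(-b)=-(a\odot b), so the exposure added by the first mutation is exactly cancelled by the second.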