This simple identity hold for any two real numbers :
Indeed, if realizes the maximum, then both and have the same sign as . After opening the absolute value signs, we get to cancel out.
So, (1) represents , also known as the norm, as the sum of absolute values of linear functions. Let’s try the same with . Since the right hand side of (1) is just the average of over all possible choices of signs, the natural thing to do is to average over all eight choices. The sign in front of can be taken to be , which simplifies the average to
Does this formula give ? Instead of trying random numbers, let’s just plot the unit ball for the norm given by (2). If the identity works, it will be a cube. I used Maple:
Although all terms of (2) look exactly the same, the resulting shape has both triangular and square faces. Where does the difference of shapes come from?
More importantly, is (2) really the best we can do? Is there some other sum of moduli of linear functions that will produce ?
Even if negative coefficients are allowed?
— Even then. (But you can come arbitrarily close.)
What if we allow integrals with respect to an arbitrary (signed) measure, as in
— Still no. But if is allowed to be a distribution of higher order (an object more singular than a measure), then a representation exists (W. Weil, 1976). Yes, one needs the theory of distributions to write the maximum of three numbers as a combination of linear functions.
I’ll only prove that there is no identity of the form
Indeed, such an identity amounts to having an isometric embedding . The adjoint operator is a submetry meaning that it maps the unit ball of onto the unit ball . The unit ball of is just a cube; all of its faces are centrally symmetric, and this symmetry is preserved by linear maps. But is an octahedron, with triangular faces. A contradiction.
An aside: what if instead of averaging over all choices (i.e., unimodular real coefficients) we take the average over all unimodular complex coefficients? This amounts to . I expected something nice from this norm, but
it’s a strange shape whose equation involves the complete elliptic integral of second kind. Yuck.
Having solved the obstacle problem for a string, let us turn to a more difficult one, in which an elastic string is replaced with an elastic rod (or plate, if we are looking at a cross-section). Elastic rods resist bending much in the same way that strings don’t. This can be modeled by minimizing the bending energy
subject to the boundary conditions , , and the same obstacle as before: . The boundary conditions for mean that the rod is clamped on both ends.
As before, the obstacle permits one-sided variations with smooth and compactly supported. The linear term of is , which after double integration by parts becomes . Since the minimizer satisfies , the conclusion is whenever . Therefore, everywhere, at least in the sense of distributions. In the parts where the rod does not touch the obstacle, we can do variation of either sign and obtain ; that is, is a cubic polynomial there.
So far everything looks similar to the previous post. But the fourth derivative of the obstacle function is , which is positive. Since the minimizer must satisfy , it cannot assume the shape of the obstacle. The contact can happen only at isolated points.
Therefore, is a cubic spline with knots at the contact points and at . The distributional derivative consists of negative point masses placed at the contact points. Integrating twice, we find that is a piecewise affine concave function; in particular it is continuous. The minimizer will be -smooth in .
How many contact points are there? If only one, then by symmetry it must be at , and the only three-knot cubic spline that satisfies the boundary conditions and passes through with zero derivative is . But it does not stay below the obstacle:
With a smaller circle, or a longer bar, the one-contact (three-knot) spline would work. For example, on :
But with our parameters we look for two contact points. By symmetry, the middle piece of the spline must be of the form . The other two will be and , also by symmetry and to satisfy the boundary conditions at . At the positive knot the following must hold:
where the last condition comes from the fact that is concave and therefore continuous. With five equations and five unknowns, Maple finds solutions in closed form. One of them has , as above, and is not what we want. The other has and coefficients such as . Ugly, but it works:
This time, the bar does stay below the obstacle, touching it only at two points. The amount by which it comes off the obstacle in the middle is very small. Here is the difference :
And this is the second derivative .
Again, the minimizer has a higher degree of regularity (Lipschitz continuous second derivative) than a generic element of the function space in which minimization takes place (square-integrable second derivative).
If the rod is made shorter (and the obstacle stays the same), the two-contact nature of the solution becomes more pronounced.
Imagine a circular (or cylindrical, in cross-section) object being supported by an elastic string. Like this:
To actually compute the equilibrium mass-string configuration, I would have to take some values for the mass of the object and for the resistance of the string. Instead, I simply chose the position of the object: it is the unit circle with center at . It remains to find the equilibrium shape of the string. The shape is described by equation where minimizes the appropriate energy functional subject to boundary conditions and the obstacle . The functional could be the length
or its quadratization
The second one is nicer because it yields linear Euler-Lagrange equation/inequality. Indeed, the obstacle permits one-sided variations with smooth and compactly supported. The linear term of is , which after integration by parts becomes . Since the minimizer satisfies , the conclusion is whenever . Therefore, everywhere (at least in the sense of distributions), which means is a convex function. In the parts where the string is free, we can do variation of either sign and obtain ; that is, is an affine function there.
The convexity of in the part where it touches the obstacle is consistent with the shape of the obstacle: the string can assume the same shape as the obstacle.
The function can now be determined geometrically: the only way the function can come off the circle, stay convex, and meet the boundary condition is by leaving the circle along the tangents that pass through the endpoint . This is the function pictured above. Its derivative is continuous: Lipschitz continuous, to be precise.
The second derivative does not exist at the transition points. Still, the minimizer has a higher degree of regularity (Lipschitz continuous derivative) than a generic element of the function space in which minimization takes place (square-integrable derivative).
As a bonus, the minimizer of energy turns out to minimize the length as well.
All in all, this was an easy problem. Next post will be on its fourth-order version.
Continuation of expository series on Gromov hyperbolicity. Recall that a map is a quasi-isometry if there are constants such that for all . This is a coarse version of the bi-Lipschitz condition. Surprisingly, Gromov hyperbolicity is preserved under quasi-isometries of geodesic spaces. The surprising part is that the multiplicative constant does not kill the additive constant .
Theorem. Suppose and are geodesic metric spaces, and is Gromov hyperbolic. If there exists a quasi-isometry , then is also Gromov hyperbolic.
Proof goes like this. Assuming that contains a fat geodesic triangle , we consider the geodesic triangle in with vertices , and want to prove that it is also fat. Since is a quasi-isometry, it follows that the images of geodesics , and form a roughly-triangular shape which has the fatness property: there is a point on one of the sides that is far away from the other two sides. The problem reduces to showing that this roughly-triangular shape lies within a certain distance (independent of ) from the actual geodesic triangle with vertices . This is known as stability of quasi-geodesics. A quasi-geodesic is a quasi-isometric image of a line segment, similar to how a geodesic is a (locally) isometric image of a segment.
By the way, quasi-geodesic stability fails in . We can connect the points and by the quasi-geodesic , which is at distance from the true geodesic between these points.
I’ll prove a more specialized and weaker statement, which however contains the essence of the full result. Namely, let denote the hyperbolic plane and assume that is bi-Lipschitz: for all . The claim is that the image of lies in the -neighborhood of the geodesic through and , where depends only on .
There are three standard models of hyperbolic plane: disk, halfplane and infinite strip. I’ll use the last one, because it’s the only model in which a geodesic is represented by Euclidean line. Specifically, is identified with the infinite strip equipped with the metric . (To see where the metric comes from, apply to map the strip onto upper halfplane and pull back the hyperbolic metric .)
The hyperbolic and Euclidean metrics coincide on the real line, which is where we place and with the help of some hyperbolic isometry. Let be our quasi-geodesic. Being a bi-Lipschitz image of a line segment, satisfies the chord-arc condition: the length of any subarc of does not exceed times the distance between its endpoints. Pick such that . Let be the hyperbolic distance between the lines and . This distance could be calculated as , but I’d rather keep this integral as an exquisite Calculus II torture device.
The problem facing us is that quasigeodesic may be times longer than the distance between its endpoints, which seems to allows it to wander far off the straight path. However, it turns out there is a uniform bound on the length of any subarc of that lies within the substrip . We lose no generality in assuming that the endpoints of are on the line ; they will be denoted , . The key point is that connecting these two points within is rather inefficient, and such inefficiency is controlled by the chord-arc property.
The hyperbolic distance between is at most , because we can go from to (distance ), then from to (distance ), and finally from to (distance ). On the other hand, the length of is at least because the density of hyperbolic metric is at least where lives. The chord-arc property yields , which simplifies to . Hence, the distance between the endpoints of is at most , and another application of the chord-arc property bounds the length of by .
In conclusion, the claimed stability result holds with .
The definition of a metric space guarantees that if we pick three points from it, we can draw a triangle in the Euclidean plane, with vertices labeled , so that the side lengths are equal to the corresponding distances . In other words, we can represent any triple of points as vertices of a triangle.
There is another, perhaps less obvious representation: as vertices of a tripod. Specifically, we can draw a graph of the form shown below and assign length to each edge so that is equal to the length of the shortest path from to , etc.
Indeed, we have the system of linear equations
with a unique solution
The numbers are clearly nonnegative; if a number is zero, the corresponding edge can be contracted to a point. Simple as they are, these formulas have a name: is called the Gromov product of and with respect to , and is denoted .
The more geometric way to arrive at tripod representation is to inscribe a circle in the triangle shown above, and note how the points of tangency divide the sides of the triangle:
Next question: can we represent four points in a similar way? The first approach fails: in general there is no Euclidean quadrilateral in which the sides and diagonals represent the distances between these four points. For example, if and , then any Euclidean representation must place both and in the midpoint between and , which contradicts being nonzero. Let’s try the metric graph approach.
By analogy with the tripod, we might hope at first for something like
but this does not work in general. Indeed, four points determine six distances, and the graph has only five degrees of freedom. We can add one more degree of freedom by thickening the connecting edge to a rectangle:
Does this work? Can we really find the required nonnegative numbers ? Yes! Writing and solving a system of six linear equations is tedious but can be done. A better way is to recognize that we have four tripods here, such as the tripod with vertices and the edge lengths , , . This immediately tells us that . Similarly, , and . The remaining lengths and can now be found directly. But in fact, we can find independently of the other four lengths by considering the perfect matchings between the points . Namely,
From here we get nonnegative and if and only if is the longest matching of vertices (or one of the longest). In other words, we must consider the lengths of matchings before marking on the graph. But once they are marked correctly, the rest is now clear: the numbers and are the amounts by which the longest matching exceeds the other two. In particular, the rectangle collapses into an edge precisely when the two longest matchings have the same length.
What we have here are examples of tight spans of metric spaces: the tripod is the tight span of 3-point space , and the six-edge graph is the 1-skeleton of the tight span of . (The rectangle should have been filled in to form a 2-cell; in general the tight span of points is a polyhedral complex of dimension up to .)
By definition, a metric space is Gromov hyperbolic if there exists a constant such that the tight span of any 4-point subset satisfies ; that is, the rectangle in the middle is not too fat. Informally, this means the space is (contained in) a uniformly thickened metric tree. In terms of distances, the definition says that the lengths of the two longest matchings of differ by at most . Out of several equivalent definitions of Gromov hyperbolicity, this is the most elementary one; it does not require the consideration of geodesics.
However, the real power of Gromov hyperbolicity transpires in the setting of geodesic spaces, where any two points are connected by an isometric image
of a line segment. To begin with, in geodesic spaces it suffices to consider the case when lies on the geodesic from to (otherwise we replace
with a nearby point on the geodesic). The hyperbolicity condition now says that in the geodesic triangle each point on the side is within distance of one of two other sides. Informally, all geodesic triangles in a Gromov hyperbolic space are uniformly thin.
This is indeed the case in the hyperbolic plane . Indeed, a direct computation shows that the area of any geodesic triangle in is less than (it is minus the sum of the angles). On the other hand, the area of a hyperbolic disk of radius grows indefinitely as , since the total area of is infinite. It follows that in there is a uniform upper bound on the radius of a circle that can be inscribed into a triangle; the Gromov hyperbolicity of follows at once.
Even more immediately, is Gromov hyperbolic, with . But is not, as it contains arbitrarily fat triangles (also, the perfect matchings of the points have lengths , contradicting the four-point condition). Similarly, the graph of with metric induced by restriction contains four-point configurations and therefore is not Gromov hyperbolic. Since this graph is bi-Lipschitz equivalent to a line, we see that in general metric spaces Gromov hyperbolicity is not a bi-Lipschitz invariant. How could we expect it to be? bi-Lipschitz maps introduce multiplicative distortion of distances , while Gromov hyperbolicity requires an additive bound.
Yet, if two geodesic metric spaces are bi-Lipschitz equivalent, then either both are Gromov hyperbolic, or neither. One can even replace bi-Lipschitz equivalence with a coarser notion of quasi-isometry, which requires (and need not be surjective; it should only cover some -net in the codomain). The quasi-isometric invariance is what makes it possible to discuss hyperbolicity of finitely generated groups… but that topic will have to wait for another day.
The oscillation of function on a set is the diameter of . For real-valued functions it’s written as . The relation to uniform continuity is immediate: is uniformly continuous if and only if is small () on all sufficiently short intervals (). Here I'm following the traditional definition of uniform continuity here, even though it’s wrong.
In statistics, the difference (maximum-minimum) is called the range of a set of data values, and is one of common measures of variation. It is a crude one, though, and is obviously influenced by outliers. There are better ones, such as standard deviation: subtract the mean, square each term, then average them, and take square root… you know the drill. What happens if we use the standard deviation in the definition of uniform continuity?
We get the space of functions of vanishing mean oscillation, known as VMO. To avoid technicalities, I will consider only the functions that vanish outside of the interval . Like this one:
Given an interval , let denote the mean of on , namely . By definition, is in VMO if and only if is small () on all sufficiently short intervals ().
Wait, what happened to squaring and taking a root afterwards? Turns out they are not really necessary: the above definition is equivalent to the one with standard deviation . In fact, VMO functions are integrable to any power , and any such can be used in the definition (a consequence of the John-Nirenberg lemma). Both squared and unsquared versions have their advantages:
is finite for any locally integrable function; we don’t need to know in advance that is square integrable.
is better suited for precise computation, since it’s just .
For example, taking for the hat function, we get and . So, , and this is as large as it gets. The supremum of mean oscillation over all intervals is called the BMO norm. The space BMO, naturally, includes all functions with finite BMO norm. VMO is a closed subspace of BMO — the closed span of continuous functions (uniformly continuous and vanishing at infinity, to be precise). In particular, VMO is separable while BMO is not.
The definition of VMO clearly rules out jump discontinuities such as . Yet, it allows some discontinuous functions and even unbounded ones. The standard example of an unbounded function in VMO is the double logarithm:
If this function does not look unbounded, it’s because it reaches level only when , which is well below the resolution of your screen. Even if you have Retina display.
The classical paper by Sarason which introduced VMO has 381 Google Scholar citations as of now. Will this post count as 382nd? We’ll see…
1. Sarason, Donald. “Functions of vanishing mean oscillation.” Trans. Amer. Math. Soc. 207, no. 2 (1975), 391-405.
In some sense (formal or informal) the following classes of maps are dual to each other.
Injective : Surjective
Immersion : Submersion
Monomorphism : Epimorphism
An isometric embedding of metric space into a metric space is a map such that for all . This concept belongs to the left column of the table above. What should be its counterpart?
Candidate 1. Observe that a 1-Lipschitz map is an isometric embedding iff it does not factor through any 1-Lipschitz surjection (for any space ) unless is an isometric isomorphism. Reversing the order of arrows, we arrive at the following concept:
A 1-Lipschitz map is a metric quotient map if it does not factor through any 1-Lipschitz injection (for any space ) unless is an isometric isomorphism.
This can be reformulated as follows: is the greatest pseudo-metric on subject to
This seems reasonable: does as little damage as possible, given the structure of its fibers. There is also a natural way to construct for any reasonable fibering of : begin by defining for points in different fibers and otherwise. Then force the triangle inequality by letting subject to and . As long as the fibers stay at positive distance from one another, this will be a metric. The corresponding metric quotient map sends each point of onto its fiber.
Here is a simple example, in which a part of an interval is mapped to a point.
However, the above example made me unhappy. The only nontrivial fiber is the interval . Both points and belong to trivial fibers, but the distance between them decreases from 3 to 2. This looks like a wrong kind of quotient to me.
Candidate 2 already appeared in my post on Lipschitz quotient, but wasn’t recognized at the time. It could be called -Lipschitz quotient, but a better name is available. A map is a submetry if where the balls are closed (using open balls yields something almost equivalent, but generally weaker). Such need not be an isometry: consider orthogonal projections in a Euclidean space. It does have to be 1-Lipschitz. The additional property that distinguishes it from general 1-Lipschitz maps is the 2-point lifting property: for every and every there is such that . Incidentally, this shows that is an isometric embedding of into the hyperspace which I covered earlier (“This ain’t like dusting crops, boy“).
The concept and the name were introduced by В.Н. Берестовский (V.N. Berestovskii) in his paper “Submetries of space-forms of nonnegative curvature” published in 1987 in Siberian Math. J. Among other things, he proved that a submetry between spheres (of any dimension) is an isometry. Of course, there are many submetries of onto other spaces: take the quotient by a subgroup of , which can be either discrete or continuous. Are there any submetries that are not quotients by isometries?
Yes, there are. I’ll describe a (modified) example given by Berestovskii and Guijarro (2000). Let be the hyperbolic plane realized as the upper half-plane with the metric . Define by
Don’t panic; this is just the signed distance function (in the metric of ) to the fat green curve below. I also drew two other level sets of , to the best of my Friday night line-drawing ability.
To convince yourself that is a submetry, first consider for which the submetry property is clear (it’s the quotient by horizontal translation), and then note that the inversion in the unit circle exchanges horizontal lines (horocycles at infinity) with horocycles at 0. An interesting feature of this submetry that it is not very smooth: but not .
I used the word mutation in the previous post because of the (implicit) connection to quiver mutation. The quiver mutation is easy to define: take an oriented graph (quiver) where multiple edges are allowed, but must have consistent orientation (i.e., no 2-edge oriented cycles are allowed). Mutation at vertex v is done in three steps:
v is removed and each oriented path of length two through v is contracted into an edge. That is, the stopover at v is eliminated.
Step 1 may create some 2-edge oriented cycles, which must be deleted. That is, we cancel the pairs of arrows going in opposite directions.
The replacement vertex v’ is inserted, connected to the rest of the graph in the same way that v was, but with opposite orientation. In practice, one simply reuses v for this purpose.
Some quivers have a finite set of mutation equivalent ones; others an infinite one. Perhaps the simplest nontrivial case is the oriented 3-cycle with edges of multiplicities . The finiteness of its equivalents has to do with the Markov constant (not this time), which is invariant under mutation. This is investigated in the paper “Cluster-Cyclic Quivers with Three Vertices and the Markov Equation” by Beineke, Brűstle and Hille. The appendix by Kerner relates the Markov constant to Hochschild cohomology, which I take as a clue for me to finish this post.
So I’ll leave you to play with the mutation applet linked in the embedded tweet below.
This equation is symmetric with respect to unknown terms , , ; therefore, knowing one of its solutions
it is easy to find the following five:
Although these six solutions may be different, we will consider them as one, denoted by
The above is a quote from A.A.Markov’s paper “Sur les formes quadratiques binaires indéfinies” (part 2). Back in 1880 people were patient enough to write out all permutations of three symbols… If we fix and , the equation for becomes
which admits the second integer root . We can also write which does not immediately tell that is an integer, but it does tell us that is positive. For example, from the obvious solution we get . Now it would not do us any good to mutate in the third variable again, for it would bring us back to . But we can mutate in the second variable, replacing it with . Having understood so well the symmetry of the equation, we write this new solution as , in the increasing order. Now we can mutate either 1 or 2, and so it goes…
All triples, except for the two at the beginning, consist of distinct numbers (thus, they do generate six distinct solutions if order matters). The tree contains all solutions of the Markov equation. The Wikipedia article also points out the occurrence of Fibonacci numbers along the top branch, as well as a curious identity discovered by Don Zagier: let ; then (for triples written in increasing order)
Looks like a fun problem on simplification of inverse hyperbolic trigonometric functions.
Also, it’s still unknown whether two distinct Markov triples can have the same maximum . Looks like a fun problem for amateur number theorists.
To wrap this up, I will describe how Markov (the one of Markov chains fame, not his identically-named son of 4-manifold undecidability fame) came across the equation. Let be a quadratic form with real coefficients normalized by . What is the best upper bound on
In 1873 Korkin and Zolotarev published a paper showing that the best bound is , attained by the form . Looks like the case is closed. But what if is not (precisely, not equivalent to under the action of )? Then the best bound improves to , attained by (this is also due to KZ). Well, what if is not equivalent to either or ? Then the bound improves to (found by Markov), and we could go on…
But rather than continue in this fashion, Markov looked for the threshold at which the number of inequivalent forms with minimum becomes infinite. And he found it: (for comparison, ). Specifically, there are only finitely many forms with minimum above , for every . But there are infinitely many forms with minimum exactly , such as . It was the iterative process of getting more and more of these forms that led Markov to the Diophantine equation .
The number and its square also appear in Zagier’s identity with … But enough numerology for today.
Let’s admit it: it’s hard to keep track of signs when multiplying numbers. Being lazy people, mathematicians seek ways to avoid this chore. One popular way is to work in the enchanted world of , where . I’ll describe another way, which is to redefine multiplication by letting the factors reach a consensus on what the sign of their product should be.
If both and are positive, let their product be positive. And if they are both negative, the product should also be negative. Finally, if the factors can’t agree on which sign they like, they compromise at 0.
In a formula, this operation can be written as , but who wants to see that kind of formulas? Just try using it to check that the operation is associative (which it is).
But I hear someone complaining that is just an arbitrary operation that does not make any sense. So I’ll reformulate it. Represent real numbers by ordered pairs , for example becomes and becomes . Define multiplication component-wise. Better now? You don’t have to keep track of minus signs because there aren’t any.
This comes in handy when multiplying the adjancency matrices of quivers. The Wikipedia article on Quiver illustrates the concept with this picture:
But in mathematics, a quiver is a directed graph such as this one:
Recall that the adjancency matrix of a graph on vertices has if there is an edge between and , and otherwise. For a directed graph we modify this definition by letting if the arrow goes from to , and if it goes in the opposite direction. So, for the quiver shown above we get
For an undirected graph the square counts the number of ways to get from to in exactly 2 steps (and one can replace 2 by n). To make this work for the directed graph, we represent numbers as pairs and and carry on multiplying and adding:
For instance, there are two ways to get from 3 to 4 in two steps, but none in the opposite direction. This works for any powers, and also for multigraphs (with more than one edge between same vertices). Logically, this is the same as separating the adjacency matrix into its positive and negative parts, and multiplying them separately.
The last example is matrix mutation from the theory of cluster algebras. Given a (usually, integer) matrix and a positive integer , we can mutate in the direction by doing the following:
Dump nuclear waste on the th row and th column
To each non-radioactive element add , that is, the product of the radioactive elements to which is exposed.
Clean up by flipping the signs of all radioactive elements.
The properties of should make it clear that each mutation is an involution: mutating for the second time in the same direction recovers the original matrix. However, applying mutations in different directions, one can obtain a large, or even infinite, class of mutation-equivalent matrices.