## This ain’t like dusting crops, boy

The hyperspace is a set of sets equipped with a metric or at least with a topology. Given a metric space $X$, let $\mathcal{H}(X)$ be the set of all nonempty closed subsets of $X$ with the Hausdorff metric: $d(A,B) if no matter where you are in one set, you can jump into the other by traveling less than $r$. So, the distance between letters S and U is the length of the longer green arrow.

The requirement of closedness ensures $d(A,B)>0$ for $A\ne B$. If $X$ is unbounded, then $d(A,B)$ will be infinite for some pairs of sets, which is natural: the hyperspace contains infinitely many parallel universes which do not interact, being at infinite distance from one another.

Every continuous surjection $f\colon X\to Y$ has an inverse $f^{-1}\colon Y\to \mathcal{H}(X)$ defined in the obvious way: $f^{-1}(y)=f^{-1}(y)$. Yay ambiguous notation! The subset of $\mathcal{H}(X)$ that consists of the singletons is naturally identified with $X$, so for bijective maps we recover the usual inverse.

Exercise: what conditions on $f$ guarantee that $f^{-1}$ is (a) continuous; (b) Lipschitz? After the previous post it should not be surprising that

• Even if $f$ is open and continuous, $f^{-1}$ may be discontinuous.
• If $f$ is a Lipschitz quotient, then $f^{-1}$ is Lipschitz.

Proofs are not like dusting crops—they are easier.

## Continuous:Lipschitz :: Open:?

A map is continuous if the preimage of every open set is open. If the topology is defined by a metric, we can reformulate this as: the inverse image of an open ball $B_R(f(x))$ contains an open ball $B_r(x)$. Like this:

But bringing these radii $R$ and $r$ into the picture will not serve any purpose unless we use them to quantify continuity. For example, if we insist that $r\ge cR$ for a fixed constant $c>0$, we arrive at the definition of a Lipschitz map.

But why do we look at the inverse image; what happens if we take the direct image instead? Then we get the definition of an open map: the image of every open set is open. Recasting this in metric terms: the image of an open ball $B_R(x)$ contains an open ball $B_r(f(x))$. Like this:

If we quantify openness by requiring $r\ge cR$ for a fixed $c>0$, we arrive at the definition of a co-Lipschitz map. [Caution: some people use “co-Lipschitz” to mean $|f(a)-f(b)|\ge c|a-b|$, which is a different condition. They coincide if $f$ is bijective.]

I don’t know if openness without continuity is good for anything other than torturing students with exercises such as: “Construct an open discontinuous map from $\mathbb R$ to $\mathbb R$.” We probably want both. At first one can hope that open continuous maps will have reasonable fibers $f^{-1}(x)$: something $(m-n)$-dimensional when going from $m$ dimensions to $n$, with $m\ge n$. The hope is futile: an open continuous map $f\colon \mathbb R^2\to\mathbb R^2$ can squeeze a line segment to a point (construction left as an exercise).

A map that is both Lipschitz and co-Lipschitz is called a Lipschitz quotient; this is a quantitative analog of “open continuous”. It turns out that for any Lipschitz quotient $f\colon \mathbb R^2\to\mathbb R^2$ the preimage of every point is a finite set. Moreover, $f$ factors as $f=g\circ h$ where $g$ is a complex polynomial and $h$ is a homeomorphism.

This is encouraging… but going even one dimension higher, it remains unknown whether a Lipschitz quotient $f\colon \mathbb R^3\to\mathbb R^3$ must have discrete fibers. For an overview of the subject, see Bill Johnson’s slides.

## Sweetened and flavored dessert made from gelatinous or starchy ingredients and milk

Takagi (高木) curves are fractals that are somehow less known than Cantor sets and Sierpinski carpets, yet they can also be useful as (counter-)examples. The general 高木 curve is the graph $y=f(x)$ of a function $f$ that is built from triangular waves. The $n$th generation wave has equation $y=2^{-n} \lbrace 2^n x \rbrace$ where $\lbrace\cdot\rbrace$ means the distance to the nearest integer. Six of these waves are pictured below.

Summation over $n$ creates the standard 高木 curve $T$, also known as the blancmange curve:

$\displaystyle y=\sum_{n=0}^{\infty} 2^{-n} \lbrace 2^n x\rbrace$

Note the prominent cusps at dyadic rationals: more on this later.

General 高木 curves are obtained by attaching coefficients $c_n$ to the terms of the above series. The simplest of these, and the one of most interest to me, is the alternating 高木 curve $T_{alt}$:

$\displaystyle y=\sum_{n=0}^{\infty} (-2)^{-n} \lbrace 2^n x\rbrace$

The alternation of signs destroys the cusps that are so prominent in $T$. Quantitatively speaking, the diameter of any subarc of $T_{alt}$ is bounded by the distance between its endpoints times a fixed constant. The curves with this property are called quasiarcs, and they are precisely the quasiconformal images of line segments.

Both $T$ and $T_{alt}$ have infinite length. More precisely, the length of the $n$th generation of either curve is between $\sqrt{(n+1)/2}$ and $\sqrt{n+1}+1$. Indeed, the derivative of $x\mapsto 2^{-k}\lbrace 2^k x\rbrace$ is just the Rademacher function $r_k$. Therefore, the total variation of the sum $\sum_{k=0}^n c_k 2^{-k}\lbrace 2^k x\rbrace$ is the $L^1$ norm of $\sum_{k=0}^n c_k r_k$. With $c_k=\pm 1$ the sharp form of the Хинчин inequality from the previous post yields

$\displaystyle 2^{-1/2}\sqrt{n+1} \le \left\|\sum_{k=0}^n c_k r_k\right\|_{L^1} \le \sqrt{n+1}$

For the upper bound I added 1 to account for the horizontal direction. Of course, the bound of real interest is the lower one, which proves unrectifiability. So far, a construction involving these curves shed a tiny bit of light on the following questions:

Which sets $K\subset \mathbb R^n$ have the property that any quasiconformal image of $K$ contains a rectifiable curve?

I won’t go (yet) into the reasons why this question arose. Any set with nonempty interior has the above property, since quasiconformal maps are homeomorphisms. A countable union of lines in the plane does not; this is what 高木 curves helped to show. The wide gap between these results remains to be filled.

## The Khintchine inequality

Today’s technology should make it possible to use the native transcription of names like Хинчин without inventing numerous ugly transliterations. The inequality is extremely useful in both analysis and probability: it says that the $L^p$ norm of a linear combination of Rademacher functions (see my post on the Walsh basis) can be computed from its coefficients, up to a multiplicative error that depends only on $p$. Best of all, this works even for the troublesome $p=1$; in fact for all $0. Formally stated, the inequality is

$\displaystyle A_p\sqrt{\sum c_n^2} \le \left\|\sum c_n r_n\right\|_{L^p} \le B_p\sqrt{\sum c_n^2}$

where the constants $A_p,B_p$ depend only on $p$. The orthogonality of Rademacher functions tells us that $A_2=B_2=1$, but what are the other constants? They were not found until almost 60 years after the inequality was proved. The precise values, established by Haagerup in 1982, behave in a somewhat unexpected way. Actually, only $A_p$ does. The upper bound is reasonably simple:

$\displaystyle B_p=\begin{cases} 1, \qquad 0

The lower bound takes an unexpected turn:

$\displaystyle A_p=\begin{cases} 2^{\frac{1}{2}-\frac{1}{p}},\qquad 0

The value of $p_0$ is determined by the continuity of $A_p$, and is not far from $2$: precisely, $p_0\approx 1.84742$. Looks like a bug in the design of the Universe.

For a concrete example, I took random coefficients $c_0...c_4$ and formed the linear combination shown above. Then computed its $L^p$ norm and the bounds in the Khintchine inequality. The norm is in red, the lower bound is green, the upper bound is yellow.

It’s a tight squeeze near $p=2$

## Linear independence, matroids, and fun with Fano

Take a finite set of vectors $v_1,\dots,v_n$ in some vector space. Call a set $A\subset \lbrace 1,2,\dots,n\rbrace$ independent if the corresponding vectors are linearly independent. The collection of all independent sets is denoted $\mathcal{I}$; what properties must it have? Well, it must be a matroid:

A matroid with ground set $E$ is a nonempty collection of subsets $\mathcal{I}\subseteq 2^E$ which is both hereditary and augmentable.

1. Hereditary: if $A\in \mathcal{I}$, then any subset of $A$ is in $\mathcal{I}$
2. Augmentable: if $A,B\in \mathcal{I}$ and $A$ has fewer elements than $B$, then $A$ can take an element from $B$ and still remain in $\mathcal{I}$.

The augmentation property brings to mind wealth redistribution (at least if you grew up in the USSR). Anyway, it is fairly obvious that any collection $\mathcal{I}$ that arises from vectors is a matroid. Is the converse true? Not obvious at all.

Behold the Fano plane, the smallest projective plane in existence: 7 points, 7 lines, any two lines meet at one point, through any two points there is a unique line.

The Fano matroid has these 7 vertices as the ground set. A set of vertices is independent if it has at most three points and is not a line. The hereditary property is clear. The augmentation isn’t hard either: given a 2-point set $A$, let $L$ be the line through it; since $B$ has 3 points and is not a line, it must contain a point outside of $L$, and this is the point we add to $A$.

Can the Fano matroid be realized by 7 vectors in a Euclidean space? Guess what… no.

Suppose we do have such a set of 7 vectors. Then they span a 3-dimensional space, since there is no linearly independent subset with 4 vectors. The three vertices of the triangle are independent, and we may assume they are realized by the standard basis vectors $v_1,v_2,v_3$. Enumerate the other points/vectors $v_4,\dots,v_7$ so that $v_{j+3}$ is the midpoint opposite $v_j$ for $j=1,2,3$. Observe that $v_4$ is a linear combination of $v_2,v_3$, etc. Thus, $v_1,\dots, v_7$ are the columns of the matrix

$\displaystyle M= \begin{pmatrix} 1&0&0&0&*&*&* \\ 0&1&0&*&0&*&* \\ 0&0&1&*&*&0&* \end{pmatrix}$

But wait, there’s more! Denote $v_7=(x,y,z)^T$ and observe that the linear dependence of $v_1,v_4,v_7$ forces $v_4$ to be a scalar multiple of $(0,y,z)^T$. Similarly for $v_5,v_6$. So the matrix $M$ must be of the form

$\displaystyle M= \begin{pmatrix} 1&0&0&0&bx&cx&x \\ 0&1&0&ay&0&cy&y \\ 0&0&1&az&bz&0&z \end{pmatrix}$

But wait, there’s even more! The vectors $v_4,v_5,v_6$ are also dependent (that inscribed circle is actually a line). The determinant of the 4,5,6 columns is $2abcxyz$. Can any of $x,y,z$ be zero? No, because that would make $v_7$ a linear combination of two of $v_1,v_2,v_3$, which is not supposed to happen. Can any of $a,b,c$ be zero? No, because that would create a zero vector, and all of the 1- and 2-subsets are independent. So, the Fano matroid cannot be realized by vectors…

unless 2=0. Over $\mathbb F_2$ there is no problem:

$\displaystyle M= \begin{pmatrix} 1&0&0&0&1&1&1 \\ 0&1&0&1&0&1&1 \\ 0&0&1&1&1&0&1 \end{pmatrix}$

(P.S. All of this stuff is in Whitney’s 1935 paper that introduced matroids.)

## Multiplication by consensus, and some arrows

Let’s admit it: it’s hard to keep track of signs when multiplying numbers. Being lazy people, mathematicians seek ways to avoid this chore. One popular way is to work in the enchanted world of $\mathbb Z_2$, where $-1=1$. I’ll describe another way, which is to redefine multiplication by letting the factors reach a consensus on what the sign of their product should be.

If both $a$ and $b$ are positive, let their product be positive. And if they are both negative, the product should also be negative. Finally, if the factors can’t agree on which sign they like, they compromise at 0.

In a formula, this operation can be written as $a\odot b=\frac{a|b|+|a|b}{2}$, but who wants to see that kind of formulas? Just try using it to check that the operation is associative (which it is).

But I hear someone complaining that $\odot$ is just an arbitrary operation that does not make any sense. So I’ll reformulate it. Represent real numbers by ordered pairs $(a_+,a_-)\in [0,\infty)\times [0,\infty)$, for example $5$ becomes $(5,0)$ and $-6$ becomes $(0,6)$. Define multiplication component-wise. Better now? You don’t have to keep track of minus signs because there aren’t any.

This $\odot$ comes in handy when multiplying the adjancency matrices of quivers. The Wikipedia article on Quiver illustrates the concept with this picture:

But in mathematics, a quiver is a directed graph such as this one:

Recall that the adjancency matrix $A$ of a graph on vertices $\lbrace 1,2,\dots,n\rbrace$ has $A_{ij}=1$ if there is an edge between $i$ and $j$, and $A_{ij}=0$ otherwise. For a directed graph we modify this definition by letting $A_{ij}=1$ if the arrow goes from $i$ to $j$, and $A_{ij}=-1$ if it goes in the opposite direction. So, for the quiver shown above we get

$A=\displaystyle \begin{pmatrix} 0 & 1 & -1 & 1 \\ -1 & 0 & -1 & 1 \\ 1 & 1 & 0 & -1 \\ -1 & -1 & 1 & 0 \end{pmatrix}$

For an undirected graph the square $A^2$ counts the number of ways to get from $i$ to $j$ in exactly 2 steps (and one can replace 2 by n). To make this work for the directed graph, we represent numbers as pairs $1=(1,0)$ and $-1=(0,1)$ and carry on multiplying and adding:

$A\odot A=\displaystyle \begin{pmatrix} (0,0) & (0,0) & (1,0) & (1,1) \\ (0,0) & (0,0) & (1,1) & (0,1) \\ (0,1) & (1,1) & (0,0) & (2,0) \\ (1,1) & (1,0) & (0,2) & (0,0) \end{pmatrix}$

For instance, there are two ways to get from 3 to 4 in two steps, but none in the opposite direction. This works for any powers, and also for multigraphs (with more than one edge between same vertices). Logically, this is the same as separating the adjacency matrix into its positive and negative parts, and multiplying them separately.

The last example is matrix mutation from the theory of cluster algebras. Given a (usually, integer) $m\times n$ matrix $A$ and a positive integer $k\le \min(m,n)$, we can mutate $A$ in the direction $k$ by doing the following:

1. Dump nuclear waste on the $k$th row and $k$th column
2. To each non-radioactive element $a_{ij}$ add $a_{ik}\odot a_{kj}$, that is, the $\odot$ product of the radioactive elements to which $a_{ij}$ is exposed.
3. Clean up by flipping the signs of all radioactive elements.

The properties of $\odot$ should make it clear that each mutation is an involution: mutating for the second time in the same direction recovers the original matrix. However, applying mutations in different directions, one can obtain a large, or even infinite, class of mutation-equivalent matrices.

## Another orthonormal basis: Hermite functions

This is an orthonormal basis for $L^2(\mathbb R)$. Since the measure of $\mathbb R$ is infinite, functions will have to decay at infinity in order to be in $L^2$. The Hermite functions are
$\displaystyle \Phi_n(x)=(2^n n! \sqrt{\pi})^{-1/2} H_n(x)e^{-x^2/2}$
where $H_n$ is the nth Hermite polynomial, defined by
$\displaystyle H_n(x)=(-1)^n e^{x^2} \left(\frac{d}{dx}\right)^n e^{-x^2}$.
The goal is to prove that the functions $\Phi_n$ can be obtained from $x^n e^{-x^2/2}$ via the Gram-Schmidt process. (They actually form a basis, but I won’t prove that.)

One can observe that the term $e^{-x^2/2}$ would be unnecessary if we considered the weighted space $L^2(\mathbb R, w)$ with weight $w(x)=e^{-x^2}$ and the inner product $\langle f,g\rangle=\int_{\mathbb R} fg\,w\,dx$. In this language, we orthogonalize the sequence of monomials $\lbrace x^n\rbrace\subset L^2(\mathbb R, w)$ and get the ON basis of polynomials $\{c_n H_n\}$ with $c_n = (2^n n! \sqrt{\pi})^{-1/2}$ being a normalizing constant. But since weighted spaces were never introduced in class, I’ll proceed with the original formulation. First, an unnecessary graph of $\Phi_0,\dots,\Phi_4$; the order is red, green, yellow, blue, magenta.

Claim 1. $H_n$ is a polynomial of degree $n$ with the leading term $2^n x^n$. Proof by induction, starting with $H_0=1$. Observe that

$\displaystyle H_{n+1}=- e^{x^2} \frac{d}{dx}\left(e^{-x^2} H_n\right) =2x H_n - H_n'$

where the first term has degree $n+1$ and the second $n-1$. So, their sum has degree exactly $n+1$, and the leading coefficient is $2^{n+1}$. Claim 1 is proved.

In particular, Claim 1 tells us that the span of the $\Phi_0,\dots,\Phi_n$ is the same as the span of $\lbrace x^k e^{-x^2/2}\colon 0\le k\le n\rbrace$.

Claim 2. $\Phi_m\perp \Phi_n$ for $m\ne n$. We may assume $m. Must show $\int_{\mathbb R} H_m(x) H_n(x) e^{-x^2}\,dx=0$. Since $H_m$ is a polynomial of degree $m, it suffices to prove

(*) $\displaystyle \int_{\mathbb R} x^k H_n(x) e^{-x^2}\,dx=0$ for integers $0\le k.

Rewrite (*) as $\int_{\mathbb R} x^k \left(\frac{d}{dx}\right)^n e^{-x^2} \,dx=0$ and integrate by parts repeatedly, throwing the derivatives onto $x^k$ until the poor guy can't handle it anymore and dies. No boundary terms appear because $e^{-x^2}$ decays superexponentially at infinity, easily beating any polynomial factors. Claim 2 is proved.

Combining Claim 1 and Claim 2, we see that $\Phi_n$ belongs to the $(n+1)$-dimensional space $\mathrm{span}\,\lbrace x^k e^{-x^2/2}\colon 0\le k\le n\rbrace$, and is orthogonal to the $n$-dimensional subspace $\mathrm{span}\,\lbrace x^k e^{-x^2/2}\colon 0\le k\le n-1\rbrace$. Since the “Gram-Schmidtization'' of $x^n e^{-x^2/2}$ has the same properties, we conclude that $\Phi_n$ agrees with this “Gram-Schmidtization'' up to a scalar factor.

It remains to prove that the scalar factor is unimodular ($\pm 1$ since we are over reals).

Claim 3. $\langle \Phi_n, \Phi_n\rangle=1$ for all $n$. To this end we must show $\int_{\mathbb R} H_n(x)H_n(x)e^{-x^2}\,dx =2^n n! \sqrt{\pi}$. Expand the first factor $H_n$ into monomials, use (*) to kill the degrees less than n, and recall Claim 1 to obtain
$\int_{\mathbb R} H_n(x)H_n(x)e^{-x^2}\,dx = 2^n \int_{\mathbb R} x^n H_n(x)e^{-x^2}\,dx = (-1)^n 2^n\int_{\mathbb R} x^n \left(\frac{d}{dx}\right)^n e^{-x^2} \,dx$.
As in the proof of Claim 2, we integrate by parts throwing the derivatives onto $x^n$. After n integrations the result is
$2^n \int_{\mathbb R} n! e^{-x^2} \,dx = 2^n n! \sqrt{\pi}$, as claimed.

P.S. According to Wikipedia, these are the “physicists’ Hermite polynomials”. The “probabilists’ Hermite polynomials” are normalized to have the leading coefficient 1.

## Three knot invariants

This has been a long Monday, toward the end of which I confused knot width with its bridge number. This post is meant to remind me which is which. Here is a trefoil knot:

The crossing number is the smallest number of self-intersections that the knot can have when thrown on a plane. For the trefoil cr=3.

Here is the same knot on which points of locally maximal height are marked with red, and points of locally minimal height are in blue. There are two extrema of each kind.

The bridge number of a knot is the smallest number of maxima in any of its diagrams. The trefoil has b=2. A knot is in the bridge position if its diagram achieves b, and in addition all the maxima are above all the minima. This allows one to neatly cut the knot by its bridge sphere (in orange), leaving b untangled arcs hanging from either side.

The third invariant also has to do with the extrema of the height function. Now we pick a regular value in each interval between each pair of consecutive critical values, and count the multiplicity of said value. The sum of all these multiplicities is the width of the knot. For the trefoil w=8.

The knot is in a thin position when it attains w, as on the diagram above. Despite their apparent similarity, two different positions of the knot may be required to attain b and w. Also, the invariants behave in different ways under connected sums. Here is a connected sum of two knots, with trefoil on the left:

It’s obvious that the crossing number is subadditive: $\mathrm{cr}\,(K\# K')\le \mathrm{cr}\,(K)+\mathrm{cr}\,(K')$. It remains an open problem whether it’s in fact additive, i.e., whether the equality always holds.

The bridge number is the best behaved of the three: it’s known that $b(K\# K')=b(K)+b(K')-1$ (one-sided inequality is easy: connected sum can kill the absolute maximum on the lower-placed knot). Horst Schubert proved this equality in 1954. Much more recently (in 2004) Jennifer Schultens used modern machinery to give a 6-page proof of Schubert’s theorem.

The width turns out to misbehave. By putting one of the summands way below the other, one sees at once that $w(K\# K')\le w(K)+w(K')-2$. In many examples this is in fact an equality. However, very recently (2011) Blair and Tomova proved that strict inequality may hold; moreover, the Scharlemann-Schultens lower bound $w(K\# K')\ge \min(w(K),w(K'))$ is attained for infinitely many pairs of knots.

## The Walsh basis

If you ask a random passerby to give you an orthonormal basis for $L^2[0,1]$, they will probably respond with $e_n(t)=\exp(2\pi i nt)$, $n\in \mathbb Z$. There is a lot to like about this exponential basis: most importantly, it diagonalizes the $\frac{d}{dt}$ operator: $\frac{d}{dt}e_n=2\pi i n e_n$. This property makes the exponential basis indispensable in the studies of differential equations. However, I prefer to describe the Walsh basis, which has several advantages:

• the basis functions take just two values $\pm 1$, which simplifies the computation of coefficients
• the proof of the basis property is easier than for the exponential basis
• there is a strong connection to probability: the Walsh expansion can be seen as conditional expectation, and the partial sums form a Doob martingale
• partial sums converge a.e. for any $L^1$ function, which is not the case for the exponential basis.

First, introduce the Rademacher functions $r_n=\mathrm{sign}\, \sin (2^{n+1} \pi t)$, $n=0,1,2,\dots$ (The enumeration is slightly different from what I used in class.) These are $r_0,r_1,r_2,r_3$:

Alternatively, one can define $r_n$ as the function which takes the values $+1,-1$ alternatively on the dyadic intervals $\displaystyle \bigg[\frac{j}{2^{n+1}},\frac{j+1}{2^{n+1}}\bigg)$.

To define the $n$th Walsh function $W_n$, express the index $n$ as the sum of powers of 2, i.e., $n=2^{p_1}+2^{p_2}+\dots$ and let $W_n=r_{p_1}r_{p_2}\dots$. For example, $W_{13}=r_3r_2r_0$ because $13=2^3+2^2+2^0$. Since the binary representation is unique, the definition makes sense. We also have $W_0=1$ because the product of an empty set of numbers is 1.

In class I checked that the set $\lbrace W_n\colon n=0,1,2,\dots\rbrace$ is orthonormal. Also, for any integer $k\ge 0$ the linear span of $\lbrace W_n\colon 0\le n< 2^k \rbrace$ is the space $V_k$ of all functions that are constant on the dyadic intervals of length $2^{-k}$. This follows by observing that $\lbrace W_n\colon 0\le n< 2^k \rbrace\subset V_k$ and that the dimension of $V_k$ is $2^k$.

To prove that the Walsh basis is indeed a basis, suppose that $h\in L^2[0,1]$ is orthogonal to all $W_n$. Since $h\perp V_k$ for all $k$, the integral of $h$ over any dyadic interval is zero (note that the characteristic function of any dyadic interval belongs to some $V_k$). But any subinterval $I\subset [0,1]$ can be written as a disjoint countable union of dyadic intervals: just take all dyadic intervals that are contained in $I$. (You don't necessarily get the right type of endpoints, but as long as we work with integrals, the difference between open and closed intervals is immaterial.) Thus, the integral of $h$ over any subinterval of $[0,1]$ is zero. By the Lebesgue differentiation theorem, for a.e. $t$ we have $\displaystyle h(t)=\lim_{\delta\to 0}\frac{1}{2\delta}\int_{t-\delta}^{t+\delta} h =0$. Thus $h=0$ as required.

The proof is even simpler if we use the non-centered form of the Lebesgue differentiation theorem: for a.e. $t$ the average $\frac{1}{b-a}\int_a^b h$ approaches $h(t)$ as $a,b\to t$ in such a way that $a\le t\le b$. Armed with this theorem, we can consider the sequence of dyadic intervals containing $t$, and immediately obtain $h(t)=0$ a.e.

Having proved that $\lbrace W_n\rbrace$ is a basis, let’s expand something in it. For example, this moderately ugly function $f$:

I used Maple to compute the coefficients $c_n=\langle f, W_n\rangle$ and plotted the partial sums $\sum_{n=0}^N c_n W_n$ for $N=1,3,7,15$:

Such partials sums (those that use $2^k$ basis functions) are particularly nice: they are obtained simply by averaging $f$ over each dyadic interval of length $2^{-k}$. In probability theory this is known as conditional expectation. The conditional expectation is a contraction in any $L^p$ space, including $L^1$ which gives so much trouble to the exponential basis. The highly oscillatory parts of $f$ are killed by the dyadic averaging; in contrast, when integrated against the exponentials, they may cleverly accumulate and destroy the convergence of partial sums.

## The magic of cyclic sieving and q-analogs

In my previous post the 14 possible triangulations of hexagons were arranged by Whitehead moves. But if asked, most people would probably group them in this way:

Well, some may decide to put the S- and Z-shapes together since they are mirror images of each other. But I won’t. It’s easy to rotate a piece of paper with a triangulation on it; it’s not nearly as easy to fetch a mirror and study triangulations through it. So let it be recorded that the cyclic group $\mathbb Z_6$ acts on triangulations by rotating the hexagon. The orbits have lengths 6,3,3,2.

In the previous post I also mentioned that the number of triangulations of an (n+2)-gon is the Catalan number $C_n=\frac{1}{n+1}\binom{2n}{n}$. With n=4 we get

$\displaystyle C_4=\frac{1}{5}\,\frac{8\cdot 7\cdot 6\cdot 5}{4\cdot 3\cdot 2}=\frac{8\cdot 7\cdot 6}{4\cdot 3\cdot 2} \qquad (*)$

Amazingly, the formula (*) does not just count triangulations: it also knows how they behave under rotations of the hexagons. But it will only tell us if we don’t rush to its evaluation. Instead, replace each number on the right of (*) by its q-analog

$\displaystyle [n]_q=\frac{1-q^n}{1-q} =1+q+q^2+\dots+q^{n-1}$

The result is

$\displaystyle C_4(q) = \frac{(1-q^8)(1-q^7)(1-q^6)}{(1-q^4)(1-q^3)(1-q^2)} =1+q^2+q^3+2q^4+q^5+2q^6+q^7+2q^8+q^9+q^{10}+q^{12}\qquad (**)$

The fact that we get a polynomial can be explained: since the Catalan number is an integer, every prime divisor of the denominator in (*) also appears in the numerator (to equal or higher power), and this can be translated into the order of vanishing of polynomials in (**) at the roots of unity. It’s not as easy to explain why the coefficients end up nonnegative.

Now, both the cyclic group $\mathbb Z_6$ and the roots of unity were mentioned. Let’s bring them together by introducing $\zeta=\exp(2\pi i/6)$ which generates $\mathbb Z_6$ by multiplication. The cyclic sieving phenomenon (in this special case) is the following: $C_4(\zeta^d)$ is the number of triangulations fixed by the action of $\zeta^d$. Indeed, evaluating $C_4(q)$ at $q=\zeta,\zeta^2,\dots,\zeta^6$ we get $0,2,6,2,0,14$. So, no triangulations are invariant under rotation by $\zeta$, two are invariant under rotation by $\zeta^2$, etc. It’s easy to read the entire orbit structure from this:

• there are no fixed points
• 2 elements of order 2 form a single orbit of size 2
• 6 elements of order 3 form two orbits of size 3
• there are no elements of order 4, since their number is $C_4(\zeta^4)-C_4(\zeta^2)=2-2=0$
• there are $C_4(\zeta^6)-C_4(\zeta^2)-C_4(\zeta^3)=6$ elements of order 6, which form a single orbit of size 6

And it’s not just triangulations. Consider noncrossing matchings of 2n points — these can be observed experimentally by sitting 2n people at a round table and asking them all to shake hands with someone with no arms crossing.

The number of such matchings is also the Catalan number $C_n$ (hence 14 matchings for 8 people). But the orbit structure is completely different! How can the same polynomial $C_4(q)$ work here? It can because we now act by $\mathbb Z_8$ and therefore take $\zeta=\exp(2\pi i/8)$. Evaluation of $C_4(q)$ at the powers of $\zeta$ yields $0,2,0,6,0,2,0,14$. As above, we see the orbit structure in these numbers:

• 2 elements of order 2 form a single orbit
• 4 elements of order 4 form a single orbit
• 8 elements of order 8 form a single orbit

And it’s not just the Catalan numbers. Suppose that among those 8 guests at the round table are 2 men and 6 women. The number of seating arrangements is $\displaystyle \binom{8}{2}=\frac{8\cdot 7}{2}$. Don’t evaluate to 28 just yet. Replace each integer with its q-analog and you’ll get the polynomial $\displaystyle 1+q+2q^2+2q^3+3q^4+3q^5+4q^6+3q^7+3q^8+2q^9+2q^{10}+q^{11}+q^{12}$, which will tell you that under rotation, 4 arrangements form an orbit of size 4, while the other 24 form 3 orbits of size 8.

And there is much more. The cyclic sieving phenomenon was established recently (in 2004) by three mathematicians from Minnesota (Reiner, Stanton, and White), and there are still paths waiting to be explored.