Tristram-Levine signatures with Scilab

The signature of a Hermitian matrix {A} can be defined either as the pair (number of positive eigenvalues, number of negative eigenvalues), or simply as the difference

\displaystyle s(A)=\#\{\lambda_i: \lambda_i>0\}-\#\{\lambda_i: \lambda_i<0\}

The function {s(A)} hides some information when the matrix is degenerate, but this will not be of concern here. For a matrix of size {n}, the number {s(A)} is between {-n} and {n}; when {A} is nondegenerate, it has the same parity as {n}.
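In Scilab, which is used for the plots below, this definition is a one-liner; the same line appears in the plotting code at the end of this post. A minimal sketch, with a small symmetric matrix chosen purely for illustration:

// s(A) = (number of positive eigenvalues) - (number of negative eigenvalues)
A = [2 1 0; 1 -3 0; 0 0 5];     // eigenvalues are approximately 2.19, -3.19 and 5
disp(sum(sign(spec(A))))        // prints 1, consistent with n = 3 being odd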

Given any square matrix {V} with real entries and a complex number {\omega}, we can form {V_\omega = (1-\omega)V+(1-\bar\omega)V^T}, which is a Hermitian matrix. Then {s(\omega):=s(V_\omega)} is an integer-valued function of {\omega}. Restricting attention to the unit circle {|\omega|=1}, we obtain a piecewise constant function with jumps at the points where {V_\omega} is degenerate.

When {A} is a Seifert matrix of a knot, the function {s(\omega)} is the Tristram-Levine signature of the knot. To each knot {K} there are infinitely many Seifert surfaces {F}, and to each Seifert surface there are infinitely many Seifert matrices {A}, depending on how we choose the generators of {H_1(F)}. Yet, {s(\omega)} depends on {K} alone.

Below I plot {s(\omega)} for a few knots, using Scilab for the computations and taking the knot data from

J. C. Cha and C. Livingston, KnotInfo: Table of Knot Invariants

Technical point: since Scilab enumerates colors using positive integers, the colors below correspond to the number of positive eigenvalues rather than the signature. Namely, black for 0, blue (1), green (2), and cyan (3).

As always, first comes the trefoil:

Trefoil 3_1

One of its Seifert matrices is

\displaystyle \begin{pmatrix} -1 & 0 \\ -1 & -1 \end{pmatrix}

and the Tristram-Levine signature is

trefoil
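As a spot check, not part of the original post, here is the same computation at the single point {\omega=-1} (the value {s(-1)} is the classical knot signature). For the Seifert matrix above it gives {-2}, which in the color convention described earlier corresponds to black (no positive eigenvalues).

V = [-1 0; -1 -1];                     // Seifert matrix of the trefoil, from above
omega = -1;
B = (1-omega)*V + (1-conj(omega))*V';  // the Hermitian matrix V_omega
disp(sum(sign(spec(B))))               // prints -2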

Next, the knot {8_5}

8_5

with Seifert matrix

\displaystyle  \begin{pmatrix} -1& 0& 0& -1& -1& -1\\ 0& 1& 0& 0& 0& 0\\ -1& 0& -1& -1& -1& -1\\ 0& -1& 0& -1& -1& -1\\ 0& -1& 0& 0& -1& 0\\ 0& -1& 0& 0& -1& -1 \end{pmatrix}

and the signature

TL signature for 8_5

And finally, the knot {8_{19}}:

8_19

with Seifert matrix

\displaystyle  \begin{pmatrix} -1& 0& 0& 0& 0& 0\\ -1& -1& 0& 0& 0& 0\\ -1& -1& -1& -1& 0& -1\\ -1& -1& 0& -1& 0& 0\\ 0& 0& -1& -1& -1& -1\\ -1& -1& 0& -1& 0& -1\end{pmatrix}

and the signature

TL signature for 8_19

I experimented with a few more, trying to create more colors. However, guessing the complexity of the signature by looking at the Seifert matrix is one of many skills I do not have. So I conclude with the simple code used to plot the signatures.

function sig(A)
    // Plot the Tristram-Levine signature function s(omega) on the unit circle,
    // given a (real, square) Seifert matrix A.
    clf();
    n = size(A, 1);
    Npoints = 200;
    r = 2*%pi/Npoints;                 // radius of the dot drawn at each sample point
    arcs = zeros(6, Npoints);          // one column per dot, in the format expected by xfarcs
    colors = zeros(1, Npoints);
    for m = 1:Npoints
        // sample omega on the unit circle; the half-step offset keeps omega away from 1
        x = cos(2*%pi*(m-1/2)/Npoints);
        y = sin(2*%pi*(m-1/2)/Npoints);
        omega = complex(x, y);
        // the Hermitian matrix (1-omega)A + (1-conj(omega))A^T
        B = (1-omega)*A + (1-conj(omega))*A';
        signature = sum(sign(spec(B)));
        // color index = (number of positive eigenvalues) + 1: 1 black, 2 blue, 3 green, 4 cyan
        colors(m) = (signature+n)/2 + 1;
        // filled disk of radius r centered at (x,y); xfarcs expects the upper-left corner
        // of the bounding box, its width and height, and angles in units of 1/64 degree
        arcs(:,m) = [x-r, y+r, 2*r, 2*r, 0, 360*64]';
    end
    xfarcs(arcs, colors);
    replot([-1,-1,1,1]);               // view the square [-1,1]^2
    a = get("current_axes");
    a.isoview = "on";                  // same scale on both axes, so the circle looks round
endfunction

Forward-thinking business dialogue

My university offers a Certificate of Advanced Study in Data Science:

As the world’s data grow exponentially, organizations need to understand, manage and use big data sets. These data spawned the term “big data,” which now monopolizes forward-thinking business dialogue.

Well, I have some medium-size data from last semester: the grades of business calculus students at the academic drop deadline. At that moment the available data consisted of 18 homework sets, 7 quizzes, and 1 midterm exam: 26 dimensions total. Based on these data, can we tell whether the student should drop the course?

Looks like we need some dimension reduction here. I achieved it by simply replacing the 18 homework scores with their average, and likewise for the quizzes. This has the effect of projecting the 26-dimensional data to 3-dimensional space. I scaled each of the three variables to the interval {[0,1]}, so that the data is represented by 139 points in the cube {[0,1]^3}.

Suppose that the minimal acceptable grade is C-; the grades of D or F are worse than dropping the class. Accordingly, the points are colored blue (C- or better) or red (D or F).

Grade data

The goal is to find a plane {Ax+By+Cz=D} that separates the red dots from the blue ones. Since there is no plane that separates the groups perfectly, the task calls for a soft-margin support vector machine. I decided to get my hands dirty with actual computations in a spreadsheet.

My objective is

\displaystyle \mathcal{E}(A,B,C,D) = \sum_{i=1}^{139} ((0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+)^2 \rightarrow \min

subject to the normalization {A+B+C=1}. Here {t^+=\max(t,0)}, {\epsilon_i=1} for blue dots and {\epsilon_i=-1} for red dots. The logic is that I want {Ax_i+By_i+Cz_i-D} to have the sign {\epsilon_i}, that is, to be on the correct side of the separating plane. Squaring the penalty terms helps with minimization. The term {0.1} introduces a penalty for being too close to the separating plane; this ought to improve the quality of separation.

The gradient of {\mathcal{E}} is easy to calculate: for example,

\displaystyle \frac{ \partial \mathcal{E}}{\partial A}= -2 \sum_{i=1}^{139} \epsilon_i x_i (0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+

The spreadsheet recalculated both {\mathcal{E}} and {\nabla \mathcal{E}} every time I changed the values of {A,B,D} (the value of {C} was set to {1-A-B}). This made it easy to do gradient descent “by hand”, starting with {A=B=C=1/3} and {D=0.7}.
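For readers who would rather script this than drive a spreadsheet, here is a minimal Scilab sketch of the same descent. The data below is randomly generated for illustration only (the actual grades are not reproduced here), and the step size and iteration count are guesses that may need tuning.

// gradient descent for E(A,B,D) with C = 1-A-B, on synthetic data
N = 139;
x = rand(N,1); y = rand(N,1); z = rand(N,1);        // exam, quiz and homework scores in [0,1]
ep = sign(x + y + z - 1.5);                         // synthetic labels: +1 plays the role of blue, -1 of red
ep(ep == 0) = 1;

A = 1/3; B = 1/3; D = 0.7;                          // starting point, as in the post
step = 0.001;                                       // small fixed step for stability
for iter = 1:5000
    C = 1 - A - B;
    p = max(0.1 - ep .* (A*x + B*y + C*z - D), 0);  // the terms (0.1 - eps_i(...))^+
    // partial derivatives of E, with C eliminated via C = 1-A-B
    dA = -2*sum(ep .* (x - z) .* p);
    dB = -2*sum(ep .* (y - z) .* p);
    dD =  2*sum(ep .* p);
    A = A - step*dA; B = B - step*dB; D = D - step*dD;
end
C = 1 - A - B;
p = max(0.1 - ep .* (A*x + B*y + C*z - D), 0);
mprintf("A=%.4f  B=%.4f  C=%.4f  D=%.4f  E=%.4f\n", A, B, C, D, sum(p.^2));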

Minimization process

At the final step the gradient of {\mathcal {E}} is parallel to the gradient of the constraint function {A+B+C}: we are at a critical point. Since {\mathcal{E}} is convex, the point is a minimum. The optimal plane has the equation

\displaystyle 0.4703x+0.3285y+0.2012z=0.69035

which tells us that Exam 1 score x is the most effective predictor while homework average z is the least effective. (Homework was done online, with unlimited attempts, and no control over who actually did the work.)

SVM solution

The plane gives the correct prediction in 123 out of 139 cases. Not bad, considering that the data is incomplete and somewhat biased; it does not include the students who made a forward-thinking business decision and dropped the class by the deadline.

Gromov-Hausdorff convergence

The Hausdorff distance {d_{ H}(A,B)} between two subsets {A,B} of a metric space {X} is defined by {d_{ H}(A,B)=\inf\{r>0: A\subset B_r \text{ and } B\subset A_r \}}, where {A_r,B_r} are (open/closed does not matter) {r}-neighborhoods of the sets. Informally: {d_{ H}(A,B)<r} if no matter where you are in one set, you can jump into the other by traveling less than {r}. For example, the distance between letters S and U is about the length of the longer green arrow.

Hausdorff distance
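For finite point sets the definition translates directly into code. Here is a minimal Scilab sketch (not from the original post); the rows of the matrices are points in the plane.

// one-sided quantity: the largest distance from a point of P to the set Q
function m = maxmindist(P, Q)
    m = 0;
    for i = 1:size(P, 1)
        d = sqrt(sum((Q - ones(size(Q,1),1)*P(i,:)).^2, 'c'));  // distances from P(i,:) to every point of Q
        m = max(m, min(d));
    end
endfunction

// Hausdorff distance: the larger of the two one-sided quantities
function d = hausdorff(P, Q)
    d = max(maxmindist(P, Q), maxmindist(Q, P));
endfunction

A = [0 0; 1 0];
B = [0 0.5; 1 0.5; 2 0.5];
disp(hausdorff(A, B))    // sqrt(1.25), driven by the point (2, 0.5), which is far from A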

Generally, one assumes the sets to be closed (to avoid zero distance) and bounded (to avoid infinite distance). But I will not; in this post I’m not really interested in verifying all the axioms of a metric.

The Gromov-Hausdorff distance is defined between metric spaces {X,Y} as follows: it is the infimum of {d_{ H}(f(X),g(Y))} taken over all isometric embeddings {f\colon X\rightarrow Z} and {g\colon Y\rightarrow Z} into some metric space {Z}.

The infimum over all pairs of embeddings into all conceivable metric spaces does not sound like something you would want to compute in practice. Of course, the matter boils down to equipping the abstract union {X\sqcup Y} with pseudometrics that are compatible with the original metrics on {X} and {Y}.

A more directly computable notion of distance (not necessarily a metric) can be given as follows: {\rho_{GH}(X,Y)} is the infimum of all {\epsilon>0} for which there exist two maps {f\colon X\rightarrow Y} and {g\colon Y\rightarrow X} such that:

  1. {d_X(g\circ f(x),x)\le \epsilon} for all {x\in X}
  2. {d_Y(f\circ g(y),y) \le \epsilon} for all {y\in Y}
  3. {|d_Y(f(x_1), f(x_2)) - d_X(x_1,x_2)| \le \epsilon } for all {x_1,x_2\in X}
  4. {|d_X(g(y_1), g(y_2)) - d_Y(y_1,y_2)| \le \epsilon } for all {y_1,y_2\in Y}

This is not as elegant as “infimize over all metric spaces”, but it is more practical. For example, it is easy to check that the sequence of one-sheeted hyperboloids {H_n = \{x^2+y^2=z^2+1/n\}}

One hyperboloid, …
… another hyperboloid, …

converges to the cone {C = \{x^2+y^2=z^2\}}.

… and we get a cone.

Using cylindrical coordinates, define {f_n\colon H_n\rightarrow C} by {f_n(r,\theta,z) = (\sqrt{r^2-1/n}, \theta,z) } and {g_n\colon C\rightarrow H_n} by {g_n(r,\theta,z) = (\sqrt{r^2+1/n}, \theta,z)}, with an arbitrary choice of {\theta} made for {g_n(0,0,0)}. Now check the items one by one:

  1. {g_n\circ f_n} fixes all points of {H_n} except for those with {z=0}. The latter are displaced by at most {2/\sqrt{n}}.
  2. {f_n\circ g_n} is the identity map on {C}
  3. follows from {|\sqrt{r^2-1/n} - r|\le 1/\sqrt{n}} on {[1/\sqrt{n},\infty)}
  4. follows from {|\sqrt{r^2+1/n} - r|\le 1/\sqrt{n}} on {[0,\infty)}
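For completeness, the bound invoked in items 3 and 4 comes from rationalizing the difference:

\displaystyle \left|\sqrt{r^2\mp 1/n}-r\right| = \frac{1/n}{\sqrt{r^2\mp 1/n}+r}\le \frac{1}{\sqrt{n}}

since the denominator is at least {r\ge 1/\sqrt{n}} when the sign is {-} (item 3) and at least {\sqrt{1/n}} when the sign is {+} (item 4).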

Note that the Gromov-Hausdorff convergence of manifolds is understood with respect to their intrinsic metrics. Although both {H_n} and {C} are naturally identified with subsets of {\mathbb R^3}, it would be a mistake to use the Hausdorff distance based on the Euclidean metric of {\mathbb R^3}. Even though this extrinsic metric is bi-Lipschitz equivalent to the intrinsic metric on both {H_n} and {C}, bi-Lipschitz equivalence is too coarse to preserve the GH convergence in general. In general, the intrinsic metric on a manifold cannot be realized as the extrinsic metric in any Euclidean space.

In the example of hyperboloids we are lucky to have a Gromov-Hausdorff convergent sequence of unbounded spaces. Normally, bounded sets are required, and boundedness is imposed either by an exhaustion argument, or by changing the metric (moving to the projective space). For instance, the parabolas {y=x^2/n} do not converge to the line {y=0} as unbounded subsets of {\mathbb R^2}, but they do converge as subsets of {\mathbb R\mathbb P^2}. In the process, the degree jumps down from {2} to {1}.

It seems that the Gromov-Hausdorff limit of algebraic varieties of degree at most {d} is also an algebraic variety of degree at most {d} (provided that we use the intrinsic metric; otherwise flat ellipses {x^2+n^2y^2=1} would converge to a line segment). If this is true, I’m sure there’s a proof somewhere but I never saw one.

Inner-variational equations

It’s been a while since the last time I posted a post-colloquium post. This one is based on a colloquium given by me, but don’t worry: none of my results are included here.

As a warm-up, consider the (trivial) problem of finding the shortest path between two points a,b\in\mathbb R^n. The naive approach is to minimize the length L(f)=\int_0^1 |f\,'(t)|\,dt among all maps f\colon [0,1]\to \mathbb R^n that are sufficiently smooth and satisfy the boundary conditions f(0)=a and f(1)=b. This turns out to be a bad idea: L(f) is neither strictly convex nor differentiable, and the set of minimizing maps is huge, containing some rather nonsmooth specimens.

The right approach is to minimize the energy E(f)=\int_0^1 |f\,'(t)|^2\,dt. While this functional is not immediately related to length for general maps, it is not hard to see that for minimizing maps f we have E(f)=L(f)^2. Indeed, consider performing the inner variation \widetilde{f}(t)=f(t+\epsilon \eta(t)) where \eta\colon [0,1]\to\mathbb R is smooth and vanishes at the endpoints. Expanding the inequality E(\widetilde{f})\ge E(f), we arrive at \int_0^1 |f\,'(t)|^2\eta'(t)\,dt=0, which after integration by parts yields \frac{d}{dt}|f\,'(t)|^2=0. Thus, minimization of energy enforces constant-speed parametrization, and since for constant-speed maps we have E(f)=L(f)^2, the geometric nature of the variational problem has not been lost.
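For the record, here is the expansion behind that step. Substituting s=t+\epsilon\eta(t), which is a smooth bijection of [0,1] onto itself for small \epsilon because \eta vanishes at the endpoints, we get

\displaystyle E(\widetilde{f})=\int_0^1 |f\,'(t+\epsilon\eta(t))|^2(1+\epsilon\eta'(t))^2\,dt=\int_0^1 |f\,'(s)|^2\big(1+\epsilon\eta'(t(s))\big)\,ds = E(f)+\epsilon\int_0^1 |f\,'|^2\,\eta'\,dt+o(\epsilon)

Since \eta can be replaced by -\eta, the inequality E(\widetilde{f})\ge E(f) forces the first-order term to vanish, which is the identity stated above.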

As a side remark, the inner-variational equation \frac{d}{dt}|f\,'(t)|^2=0 could be written as f\,''\cdot f\,'=0, which is a nonlinear second-order equation.

For comparison, try the first variation \widetilde{f}(t)=f(t)+\epsilon \eta(t) where \eta\colon [0,1]\to\mathbb R^n is smooth and vanishes at the endpoints. Expanding the inequality E(\widetilde{f})\ge E(f), we arrive at \int_0^1 f\,'(t)\cdot \eta'(t)\,dt=0, which after integration by parts yields f\,''\equiv 0, a linear second-order equation. This Euler-Lagrange equation immediately tells us what the minimizing map f is: there is only one affine map with the given boundary values. Obviously, f\,''\equiv 0 is a much stronger statement than f\,''\cdot f\,'=0. However, if f+\epsilon \eta is not admissible for some geometric reason (e.g., the curve must avoid an obstacle), the Euler-Lagrange equation may not be available.

Moving one dimension up, consider the problem of parameterizing a given simply-connected domain \Omega\subset \mathbb C. Now we are to minimize the energy of diffeomorphisms f\colon \mathbb D\to\Omega which is defined, as before, to be the sum of squares of derivatives. (Here \mathbb D is the unit disk.) In complex notation, E(f)=\iint_{\mathbb D}(|f_z|^2+|f_{\bar z}|^2). For definiteness assume f is sense-preserving, that is |f_z|\ge |f_{\bar z}|. Minimizers ought to be conformal maps onto \Omega, but how can we see this from variational equations?

The Euler-Lagrange equation that we get from E(f)\le E(f+\epsilon\eta) turns out to be the Laplace equation \Delta f=0. This is much weaker than the Cauchy-Riemann equation f_{\bar z}=0 that we expect. One problem is that \eta must vanish on the boundary: otherwise f+\epsilon\eta will violate the geometric constraint. We could try to move the values of f in the direction tangent to \partial\Omega, but this does not necessarily make sense, since the boundary of \Omega could be something like the von Koch snowflake. And of course, it is not at all clear why the minimum of E must be attained by a diffeomorphism. If the class of maps is expanded to include suitable limits of diffeomorphisms, then it’s no longer clear (actually, not true) that f+\epsilon\eta belongs to the same class. All things considered, the approach via the first variation does not appear promising.

Let’s try the inner variation instead. For small \epsilon the map z\mapsto z+\epsilon\eta is a diffeomorphism of \mathbb D, hence its composition with f is as good a candidate as f itself. Furthermore, since the inner variation deals with the model domain \mathbb D and not with the generic domain \Omega, it is easy to allow modification of boundary values: \eta should be tangent to \partial \mathbb D, i.e., \mathrm{Re}\,\bar z\eta(z) should vanish on the boundary. It takes a bit of computation to turn E(f(z+\epsilon \eta(z))) into something manageable, but the final outcome is remarkably simple. The inner-variational equation says that the function \varphi:=f_z\overline{f_{\bar z}} is holomorphic in \mathbb D and z^2 \varphi is real on the boundary \partial \mathbb D. (Technically speaking, \varphi\, dz^2 must be a real holomorphic quadratic differential.) What can we conclude from this? To begin with, the maximum principle implies that z^2 \varphi is a constant function. And since it vanishes at z=0, the inevitable conclusion is \varphi\equiv 0. Recalling the sense-preserving constraint |f_z|\ge |f_{\bar z}|, we arrive at f_{\bar z}\equiv 0, the desired Cauchy-Riemann equation.

Executive summary

  • Suppose we are to minimize some quantity (called “energy”) that depends on a function (or map) f
  • We can consider applying the first variation f+ \epsilon\eta or the inner variation f\circ (\mathrm{id}+\epsilon \eta). Here \eta is a small perturbation which is applied differently: in the first case it changes the values of f, in the second it shuffles them around.
  • Inner variation applies even in the presence of geometric constraints that make the first variation illegal. One example of such a constraint is “f must be injective”.
  • Inner-variational equations are quite different from the Euler-Lagrange equations. Even for simple quadratic functionals they are nonlinear.
  • Inner-variational equations are useful because they tell us something about the maps of minimal energy.

Linear independence, matroids, and fun with Fano

Take a finite set of vectors v_1,\dots,v_n in some vector space. Call a set A\subset \lbrace 1,2,\dots,n\rbrace independent if the corresponding vectors are linearly independent. The collection of all independent sets is denoted \mathcal{I}; what properties must it have? Well, it must be a matroid:

A matroid with ground set E is a nonempty collection of subsets \mathcal{I}\subseteq 2^E which is both hereditary and augmentable.

  1. Hereditary: if A\in \mathcal{I}, then any subset of A is in \mathcal{I}
  2. Augmentable: if A,B\in \mathcal{I} and A has fewer elements than B, then A can take an element from B and still remain in \mathcal{I}.

The augmentation property brings to mind wealth redistribution (at least if you grew up in the USSR). Anyway, it is fairly obvious that any collection \mathcal{I} that arises from vectors is a matroid. Is the converse true? Not obvious at all.

Behold the Fano plane, the smallest projective plane in existence: 7 points, 7 lines, any two lines meet at one point, through any two points there is a unique line.

Fano Plane

The Fano matroid has these 7 points as the ground set. A set of points is independent if it has at most three points and is not a line. The hereditary property is clear. The augmentation isn’t hard either: given a 2-point set A, let L be the line through it; since B has 3 points and is not a line, it must contain a point outside of L, and this is the point we add to A. (If A has fewer than 2 points, any point of B \setminus A will do, since every set of at most 2 points is independent.)

Can the Fano matroid be realized by 7 vectors in a Euclidean space? Guess what… no.

Suppose we do have such a set of 7 vectors. Then they span a 3-dimensional space, since there is no linearly independent subset with 4 vectors. The three vertices of the triangle are independent, and we may assume they are realized by the standard basis vectors v_1,v_2,v_3. Enumerate the other points/vectors v_4,\dots,v_7 so that v_{j+3} is the midpoint opposite v_j for j=1,2,3. Observe that v_4 is a linear combination of v_2,v_3, etc. Thus, v_1,\dots, v_7 are the columns of the matrix

\displaystyle M= \begin{pmatrix} 1&0&0&0&*&*&* \\ 0&1&0&*&0&*&* \\ 0&0&1&*&*&0&* \end{pmatrix}

But wait, there’s more! Denote v_7=(x,y,z)^T and observe that the linear dependence of v_1,v_4,v_7 forces v_4 to be a scalar multiple of (0,y,z)^T. Similarly for v_5,v_6. So the matrix M must be of the form

\displaystyle M= \begin{pmatrix} 1&0&0&0&bx&cx&x \\ 0&1&0&ay&0&cy&y \\ 0&0&1&az&bz&0&z \end{pmatrix}

But wait, there’s even more! The vectors v_4,v_5,v_6 are also dependent (that inscribed circle is actually a line). The determinant of the 4,5,6 columns is 2abcxyz. Can any of x,y,z be zero? No, because that would make v_7 a linear combination of two of v_1,v_2,v_3, which is not supposed to happen. Can any of a,b,c be zero? No, because that would create a zero vector, and all of the 1- and 2-subsets are independent. So, the Fano matroid cannot be realized by vectors…

unless 2=0. Over \mathbb F_2 there is no problem:

\displaystyle M= \begin{pmatrix} 1&0&0&0&1&1&1 \\ 0&1&0&1&0&1&1 \\ 0&0&1&1&1&0&1 \end{pmatrix}
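A quick way to convince yourself that this matrix works is to check by brute force that, over \mathbb F_2, the dependent triples of its columns are exactly the 7 lines of the Fano plane. A minimal Scilab sketch, with dependence of a triple tested via the determinant mod 2:

M = [1 0 0 0 1 1 1
     0 1 0 1 0 1 1
     0 0 1 1 1 0 1];
dep = [];
for i = 1:5
    for j = i+1:6
        for k = j+1:7
            if pmodulo(round(det(M(:,[i j k]))), 2) == 0 then
                dep = [dep; i j k];   // dependent triple over F_2, i.e. a line
            end
        end
    end
end
disp(dep)   // lists 7 triples; the remaining 28 triples of columns are independent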

(P.S. All of this stuff is in Whitney’s 1935 paper that introduced matroids.)

Three knot invariants

This has been a long Monday, toward the end of which I confused knot width with its bridge number. This post is meant to remind me which is which. Here is a trefoil knot:

Trefoil knot

The crossing number is the smallest number of self-intersections that the knot can have when thrown on a plane. For the trefoil cr=3.

Here is the same knot on which points of locally maximal height are marked with red, and points of locally minimal height are in blue. There are two extrema of each kind.

Bridge number

The bridge number of a knot is the smallest number of maxima in any of its diagrams. The trefoil has b=2. A knot is in the bridge position if its diagram achieves b, and in addition all the maxima are above all the minima. This allows one to neatly cut the knot by its bridge sphere (in orange), leaving b untangled arcs hanging from either side.

The third invariant also has to do with the extrema of the height function. Now we pick a regular value between each pair of consecutive critical values, and count the multiplicity of said value, that is, the number of points of the knot at that level. The sum of all these multiplicities is the width of the knot. For the trefoil w=8: in the position shown below, the three intermediate levels meet the knot in 2, 4, and 2 points.

Width

The knot is in a thin position when it attains w, as on the diagram above. Despite their apparent similarity, two different positions of the knot may be required to attain b and w. Also, the invariants behave in different ways under connected sums. Here is a connected sum of two knots, with trefoil on the left:

Sum of knots from http://en.wikipedia.org/wiki/File:Sum_of_knots3.svg

It’s obvious that the crossing number is subadditive: \mathrm{cr}\,(K\# K')\le \mathrm{cr}\,(K)+\mathrm{cr}\,(K'). It remains an open problem whether it’s in fact additive, i.e., whether the equality always holds.

The bridge number is the best behaved of the three: it’s known that b(K\# K')=b(K)+b(K')-1 (the one-sided inequality \le is easy: in a connected sum, the absolute maximum of the lower-placed summand can be eliminated). Horst Schubert proved this equality in 1954. Much more recently (in 2004) Jennifer Schultens used modern machinery to give a 6-page proof of Schubert’s theorem.

The width turns out to misbehave. By putting one of the summands way below the other, one sees at once that w(K\# K')\le w(K)+w(K')-2. In many examples this is in fact an equality. However, very recently (2011) Blair and Tomova proved that strict inequality may hold; moreover, the Scharlemann-Schultens lower bound w(K\# K')\ge \min(w(K),w(K')) is attained for infinitely many pairs of knots.

The magic of cyclic sieving and q-analogs

In my previous post the 14 possible triangulations of a hexagon were arranged by Whitehead moves. But if asked, most people would probably group them in this way:

Triangulations of hexagon

Well, some may decide to put the S- and Z-shapes together since they are mirror images of each other. But I won’t. It’s easy to rotate a piece of paper with a triangulation on it; it’s not nearly as easy to fetch a mirror and study triangulations through it. So let it be recorded that the cyclic group \mathbb Z_6 acts on triangulations by rotating the hexagon. The orbits have lengths 6,3,3,2.

In the previous post I also mentioned that the number of triangulations of an (n+2)-gon is the Catalan number C_n=\frac{1}{n+1}\binom{2n}{n}. With n=4 we get

\displaystyle C_4=\frac{1}{5}\,\frac{8\cdot 7\cdot 6\cdot 5}{4\cdot 3\cdot 2}=\frac{8\cdot 7\cdot 6}{4\cdot 3\cdot 2} \qquad (*)

Amazingly, the formula (*) does not just count triangulations: it also knows how they behave under rotations of the hexagon. But it will tell us only if we don’t rush to evaluate it. Instead, replace each number on the right of (*) by its q-analog

\displaystyle [n]_q=\frac{1-q^n}{1-q}  =1+q+q^2+\dots+q^{n-1}

The result is

\displaystyle C_4(q) = \frac{(1-q^8)(1-q^7)(1-q^6)}{(1-q^4)(1-q^3)(1-q^2)} =1+q^2+q^3+2q^4+q^5+2q^6+q^7+2q^8+q^9+q^{10}+q^{12}\qquad (**)

The fact that we get a polynomial can be explained: since the Catalan number is an integer, every prime divisor of the denominator in (*) also appears in the numerator (to equal or higher power), and this can be translated into the order of vanishing of polynomials in (**) at the roots of unity. It’s not as easy to explain why the coefficients end up nonnegative.

Now, both the cyclic group \mathbb Z_6 and the roots of unity were mentioned. Let’s bring them together by introducing \zeta=\exp(2\pi i/6) which generates \mathbb Z_6 by multiplication. The cyclic sieving phenomenon (in this special case) is the following: C_4(\zeta^d) is the number of triangulations fixed by the action of \zeta^d. Indeed, evaluating C_4(q) at q=\zeta,\zeta^2,\dots,\zeta^6 we get 0,2,6,2,0,14. So, no triangulations are invariant under rotation by \zeta, two are invariant under rotation by \zeta^2, etc. It’s easy to read the entire orbit structure from this:

  • there are no fixed points
  • 2 elements of order 2 form a single orbit of size 2
  • 6 elements of order 3 form two orbits of size 3
  • there are no elements of order 4, since their number is C_4(\zeta^4)-C_4(\zeta^2)=2-2=0
  • there are C_4(\zeta^6)-C_4(\zeta^2)-C_4(\zeta^3)=6 elements of order 6, which form a single orbit of size 6
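These evaluations are easy to reproduce numerically. A minimal Scilab sketch, with the coefficients of C_4(q) copied from (**):

c = [1 0 1 1 2 1 2 1 2 1 1 0 1];   // coefficients of q^0, q^1, ..., q^12 in C_4(q)
C4 = poly(c, 'q', 'coeff');
zeta = exp(2*%pi*%i/6);            // generator of Z_6
for d = 1:6
    mprintf("C_4(zeta^%d) = %g\n", d, round(real(horner(C4, zeta^d))));
end
// prints 0, 2, 6, 2, 0, 14

Replacing zeta by exp(2*%pi*%i/8) gives the evaluations 0, 2, 0, 6, 0, 2, 0, 14 used for the matchings below.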

And it’s not just triangulations. Consider noncrossing matchings of 2n points — these can be observed experimentally by sitting 2n people at a round table and asking them all to shake hands with someone with no arms crossing.

Handshake patterns, 8 people

The number of such matchings is also the Catalan number C_n (hence 14 matchings for 8 people). But the orbit structure is completely different! How can the same polynomial C_4(q) work here? It can because we now act by \mathbb Z_8 and therefore take \zeta=\exp(2\pi i/8). Evaluation of C_4(q) at the powers of \zeta yields 0,2,0,6,0,2,0,14. As above, we see the orbit structure in these numbers:

  • 2 elements of order 2 form a single orbit
  • 4 elements of order 4 form a single orbit
  • 8 elements of order 8 form a single orbit

And it’s not just the Catalan numbers. Suppose that among those 8 guests at the round table are 2 men and 6 women. The number of seating arrangements is \displaystyle \binom{8}{2}=\frac{8\cdot 7}{2}. Don’t evaluate to 28 just yet. Replace each integer with its q-analog and you’ll get the polynomial \displaystyle 1+q+2q^2+2q^3+3q^4+3q^5+4q^6+3q^7+3q^8+2q^9+2q^{10}+q^{11}+q^{12}, which will tell you that under rotation, 4 arrangements form an orbit of size 4, while the other 24 form 3 orbits of size 8.
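The same check works here. A short Scilab continuation of the sketch above, with the coefficients of the q-analog of \binom{8}{2} copied from the polynomial displayed in this paragraph; the expected output is forced by the orbit structure just described (nothing is fixed by the odd rotations, and only the 4 antipodal seatings survive a half-turn):

b = [1 1 2 2 3 3 4 3 3 2 2 1 1];   // coefficients of q^0, ..., q^12 in the q-analog of binomial(8,2)
B8 = poly(b, 'q', 'coeff');
zeta8 = exp(2*%pi*%i/8);
for d = 1:8
    mprintf("value at zeta8^%d = %g\n", d, round(real(horner(B8, zeta8^d))));
end
// prints 0, 0, 0, 4, 0, 0, 0, 28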

And there is much more. The cyclic sieving phenomenon was established recently (in 2004) by three mathematicians from Minnesota (Reiner, Stanton, and White), and there are still paths waiting to be explored.