Counting ballots with crystals

In my graduate Advanced Combinatorics class last semester, I covered the combinatorics of crystal base theory. One of the concepts that came up in this context was ballot sequences, which are motivated by the following elementary problem about voting:

Suppose two candidates, A and B, are running for local office. There are 100 voters in the town, 50 of whom plan to vote for candidate A and 50 of whom plan to vote for candidate B. The 100 voters line up in a random order at the voting booth and cast their ballots one at a time, and the votes are counted real-time as they come in with the tally displayed for all to see. What is the probability that B is never ahead of A in the tally?

We’ll provide a solution to this classical problem on page 2 of this post. For now, this motivates the notion of a ballot sequence in two letters, which is a sequence of A’s and B’s such that, as the word is read from left to right, the number of A’s that have been read so far is always at least as large as the number of B’s.

For instance, the sequence AABABB is ballot, because as we read from left to right we get the words A, AA, AAB, AABA, AABAB, and AABABB, each of which has at least as many A’s as B’s. On the other hand, the sequence ABBAAB is not, because after reading the first three letters ABB, there are more B’s than A’s.

If we replace the A’s by $1$’s and $B$’s by $2$’s and reverse the words, we obtain the notion of a ballot sequence in $1$’s and $2$’s described in our previous post on crystals. In particular, we say a sequence of $1$’s and $2$’s is ballot if, when we read the word from right to left, there are at least as many $1$’s as $2$’s at each step. So $221211$ and $211111$ are both ballot, but $111112$ and $211221$ are not.

Enumerating all ballot sequences

When I introduced this notion in class, one of my students asked the following.

How many total ballot sequences of $1$’s and $2$’s are there of length $n$?

Now, as in the first question about voting above, the more common version of this type of question is to fix the number of $1$’s and $2$’s in the sequence (the “content” of the word) and ask how many ballot sequences have exactly that many $1$’s and $2$’s. But in this case, the question was asked with no fixed content, resulting in a sum of Littlewood-Richardson coefficients (or, in voting terms, where the voters have not yet decided who they will vote for when they line up, and may vote for either candidate).

To start, let’s try some examples. For $n=0$, there is only one ballot sequence, namely the empty sequence. For $n=1$, there is also just one: $1$. For $n=2$, there are two: $11$ and $21$. For $n=3$, there are three: $111$, $121$, $211$. For $n=4$, there are six: $1111$, $2111$, $1211$, $1121$, $2211$, $2121$. And for $n=5$, there are ten: $$11111, 21111, 12111, 11211, 11121, 22111, 21211, 21121, 12211, 12121$$

The sequence of answers, $1,1,2,3,6,10,\ldots$, so far agrees with the “middle elements” of the rows of Pascal’s triangle:

$$\begin{array}{ccccccccccc}
& & & & & \color{red}1 & & & & & \\
&&&&\color{red} 1&&1 &&&& \\
&&&1&&\color{red} 2&&1&&& \\
&&1&&\color{red} 3&&3&&1&& \\
&1&&4&&\color{red} 6&&4&&1& \\
1&&5&&{\color{red}{10}}&&10&&5&&1
\end{array}$$

More formally, it appears that the number of ballot sequences of $1$’s and $2$’s of length $2n$ is $\binom{2n}{n}$, and the number of length $2n+1$ is $\binom{2n+1}{n}$.

Now, it is possible to prove this formula holds using a somewhat complicated recursive argument, which we will also illustrate on page 2 of this post. But there is also very elegant solution using crystal operators.

Solution using crystals

Let’s recall the definition of the crystal operator $F_1$ on words of $1$’s and $2$’s. Given such a word, we first replace all $2$’s with left parentheses, “$($”, and all $1$’s with right parentheses, “$)$”. We then “cancel” left and right parentheses in matching pairs as shown in the following example.

\begin{array}{ccccccccccc}
2 & 2 & 1 & 1 & 1 & 1 & 2 & 1 & 2 & 2 & 1 \\
( & ( & ) & ) & ) & ) & ( & ) & ( & ( & ) \\
( & & & ) & ) & ) & & & ( & & \\
& & & & ) & ) & & & ( & &
\end{array}

Once all matching pairs have been cancelled, we are left with a subsequence of the form $$)))\cdots))(((\cdots(($$ consisting of some number of right parentheses (possibly zero) followed by some number of left parentheses (possibly zero). If there is a $)$ remaining, then $F_1$ changes the rightmost $)$ that was not cancelled to $($, changing that $1$ to $2$ in the original word. The word therefore becomes:

\begin{array}{ccccccccccc}
2 & 2 & 1 & 1 & 1 & 2 & 2 & 1 & 2 & 2 & 1.
\end{array}

Thus $F_1(22111121221)=22111221221$. If there were no $)$ symbols remaining after cancelling, the operator $F_1$ is undefined.

Now, consider the directed graph on all words of $1$’s and $2$’s of length $n$, where we draw an arrow from word $w$ to word $v$ if $F_1(w)=v$. Here is the graph for $n=4$:

This graph will in general be a union of disjoint one-directional chains, since when $F_1$ is defined it is invertible: the unbracketed $1$ that is changed to a $2$ is still unbracketed, and we can identify it as the leftmost unbracketed $2$ in the new word. We write $E_1$ to denote this inverse operator, which changes the leftmost unpaired $2$ to a $1$ if it exists, and is undefined otherwise.

We also cannot have cycles in the $F_1$ graph, because the number of $1$’s always decreases with every application of $F_1$. Thus we have chains of arrows going forward until we reach an element $w$ for which $F_1(w)$ is undefined. Similarly, going backwards along the $F_1$ arrows, we can continue until $E_1$ is undefined, and we call these top elements of each chain the highest weight words.

There are six highest weight words in the above diagram: $1111$, $2111$, $1211$, $1121$, $2211$, $2121$. Notice that these are precisely the two-letter ballot sequences of length $6$!

Indeed, if a word is ballot, then every $2$ as a left parentheses will be cancelled with some $1$ as a right parentheses to its right, so $E_1$ is undefined on such a word. Conversely, if a word is not ballot, consider the first step in the right-to-left reading of the word that has more $2$’s than $1$’s. The $2$ that is encountered at that step cannot be bracketed with a $1$ to its right, because there are not enough $1$’s to bracket with the $2$’s in that suffix. Thus a word is ballot if and only if $E_1$ is undefined, which means that it is at the top of its chain, or highest weight.

Since there is exactly one highest weight word per chain, we have the following.

The number of ballot sequences of length $n$ is equal to the number of chains in the $F_1$ crystal graph on all $2^n$ words of $1$’s and $2$’s of length $n$.

So, to count the ballot words, it suffices to count the chains of the $F_1$ graph. And here’s the key idea: instead of counting the top elements, count the middle ones!

In the picture above, the middle elements of each chain are: $$1122, 2112, 1212, 1221, 2211, 2121$$ which is just the set of all words having exactly two $1$’s and two $2$’s, and is clearly counted by $\binom{4}{2}$. Why does this work in general?

Here’s where we need one more fact about the $F_1$ chains: they are “content-symmetric”. If the top element of a chain has $k$ ones and $n-k$ twos, then the bottom element has $n-k$ ones and $k$ twos. This is because the top element, after pairing off an equal number of $2$’s and $1$’s by matching parentheses, has a certain number of unpaired $1$’s, which then all get changed to $2$’s one step at a time as we move towards the bottom of the chain. In particular, the middle element of each chain has exactly as many $1$’s as $2$’s (or, if $n$ is odd, the two “middle elements” have one more $1$ than $2$ and one less $1$ than $2$ respectively.)

Finally, since the graph is drawn on all $2^n$ possible words, every word having the same number of $1$’s as $2$’s (or off by $1$ in the odd case) occurs in exactly one chain. It follows that there is a bijection between the chains and these words, which are enumerated by $\binom{2n}{n}$ for words of length $2n$, and $\binom{2n+1}{n}$ for words of length $2n+1$.

For the more elementary approach, and the solution to the classical ballot problem, turn to the next page!

A linear algebra-free proof of the Matrix-Tree Theorem

As a new assistant professor at Colorado State University, I had the privilege this fall of teaching Math 501, the introductory graduate level course in combinatorics. We encountered many ‘mathematical gemstones’ in the course, and one of my favorites is the Matrix-Tree theorem, which gives a determinantal formula for the number of spanning trees in a graph. In particular, there is a version for directed graphs that can be stated as follows.

Consider a directed graph $D=(V,E)$, consisting of a finite vertex set $V=\{v_1,\ldots,v_n\}$ and a set of directed edges $E\subseteq V\times V$. An oriented spanning tree of $D$ is a subset $T\subset E$ of the edges, along with a chosen root vertex $v_k$, such that there is a unique path in $T$ from any vertex $v_j\in V$ to the root $v_k$. Such a tree is said to be oriented towards $v_k$, since all the edges are `pointing towards’ the root. The term spanning indicates that $T$ is incident to every vertex in $V$. For example, in the digraph $D$ at left below, an oriented spanning tree rooted at $v_9$ is shown using red edges in the graph at right.

Define $\tau(D,v_k)$ to be the number of oriented spanning trees of $D$ rooted at $v_k$. One can check that, in the above graph, we have $\tau(D,v_9)=16$. Now, let $m_{i,j}$ be the number of directed edges from $v_i$ to $v_j$ in $D$, so that $m_{i,j}$ is equal to $1$ if $(v_i,v_j)$ is an edge and $0$ otherwise. Define the Laplacian of the digraph $D$ to be the matrix $$L(D)=\left(\begin{array}{ccccc} \mathrm{out}(v_1) & -m_{1,2} & -m_{1,3} & \cdots & -m_{1,n} \\ -m_{2,1} & \mathrm{out}(v_2) & -m_{2,3} & \cdots & -m_{2,n} \\ -m_{3,1} & -m_{3,2} & \mathrm{out}(v_3) & \cdots & -m_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -m_{n,1} & m_{n,2} & -m_{n,3} & \cdots & \mathrm{out}(v_n) \end{array}\right)$$ where $\mathrm{out}(v_i)$ is the outdegree of $v_i$, the number of non-loop edges having starting vertex $v_i$ (that is, the number of edges from $v_i$ to a vertex other than $v_i$). Then the (directed) Matrix-Tree theorem states that $$\tau(D,v_k)=\det(L_0(D,k))$$ where $L_0(D,k)$ is the deleted Laplacian obtained by deleting the $k$th row and column from $L(D)$. For instance, in the above graph, we have $$\det L_0(D,9)=\det \left(\begin{array}{cccccccc} 2 & -2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 2 & 0 & 0 & -1 & 0 & 0 \\ 0 & -1 & 0 & 2 & -1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 & 2 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & -1 & 2 & 0 & -1 \\ 0 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right)=16$$ There are several known proofs of the Matrix-Tree theorem. One of the more `standard’ proofs is by induction on the number of edges in the digraph, combined with a bit of linear algebra and row reduction. But it got me thinking:

Is there be a way to prove that the determinant formula holds directly, without relying on induction or linear algebra?

In particular, the determinant of a matrix $A=(a_{ij})$ can be defined explicitly as $$\det(A)=\sum_{\pi\in S_n} \mathrm{sgn}(\pi)\prod_{i} a_{i\pi(i)}$$ where $\pi:\{1,2,\ldots,n\}\to \{1,2,\ldots,n\}$ ranges over all permutations (bijections) in the symmetric group $S_n$. For instance, \begin{align*} \det\left(\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right)&=a_{11}a_{22}a_{33}-a_{12}a_{21}a_{33}-a_{11}a_{23}a_{32} \\ &\phantom{=}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}-a_{13}a_{22}a_{31}. \end{align*} It is natural to ask whether applying this formula to the deleted Laplacian gives any combinatorial insight into why the Matrix-Tree theorem should hold. And indeed, there is a direct proof using this combinatorial definition of the determinant!

A combinatorial proof

For simplicity we set $k=n$, so that we are deleting the $n$th row and column to create the deleted Laplacian $\det(L_0(D,n))$. It is sufficient to consider this case since we can always relabel the vertices to have the deleted vertex be the $n$th.

We now give a combinatorial interpretation of each of the terms of the determinant $\det(L_0(D,n)$ as a sum over permutations of $\{1,2,\ldots,n-1\}$. The term corresponding to the identity permutation is the product of the diagonal entries of $L_0(D,n)$, which is $$\prod_{i\neq n} \mathrm{out}(v_i).$$ This counts the number of ways of choosing a non-loop edge starting at each vertex $v_i\neq v_n$; we call such a choice an out-edge subgraph $G$ of $D$. Note that all oriented spanning trees with root $v_n$ are out-edge subgraphs, but in general an out-edge subgraph may have cycles among the vertices other than $v_n$. In fact, it is not hard to see that every out-edge subgraph consists a number of nontrivial directed cycles among non-$v_n$ vertices, along with a unique directed path from every other vertex into either one of the cycles or into $v_n$. Two examples of out-edge subgraphs which are not trees are shown below.

Now, for a general term corresponding to a permutation $\pi$ of $\{1,2,\ldots,n-1\}$, consider the decomposition of $\pi$ into disjoint cycles. Suppose there are $p$ fixed points and $r$ nontrivial cycles; let $a_1,\ldots,a_p$ be the fixed points of $\pi$ and $(a_{1}^{(j)}\cdots a_{c_j}^{(j)})$ are the other cycles of lengths $c_1,\ldots,c_r$. Then the sign of $\pi$ is $$\mathrm{sgn}(\pi)=(-1)^{(c_1-1)+\cdots+(c_r-1)}=(-1)^{(n-1-p)-r}.$$ The entries multiplied together in the term corresponding to $\pi$ are the outdegrees of $v_{a_1},\ldots, v_{a_p}$ along with the values $-m_{a_{t}^{(i)},a_{t+1}^{(i)}}$. Their product is $(-1)^{n-1-p}$ times the number of ways to choose an edge from $v_{a_t^{(i)}}$ to $v_{a_{t+1}^{(i)}}$ for each $i$ and $t$. Putting this all together, the entire term of the determinant corresponding to $\pi$ is $(-1)^{r}$ times the number of subgraphs formed by choosing a cyclic path on the vertices corresponding to each nontrivial cycle in $\pi$, as well as an out edge for each fixed point. Such a choice is an out-edge subgraph that is compatible with $\pi$ in the sense that any cycle of $\pi$ corresponds to a cycle on the subgraph.

For some examples of compatibility, the permutations $(123)$, $(123)(57)$, $(57)$, and the identity are compatible with the out-edge subgraph drawn above at left. The permutations $(365)$ and the identity are compatible with the subgraph above at right.

It follows that we can rewrite the determinant as: $$\det L_0(D,n)=\sum_{(G,\pi)} (-1)^{r(\pi)}$$ where $r(\pi)$ is the number of nontrivial cycles of the $\pi$, and where the sum ranges over all pairs $(G,\pi)$ where $G$ is an out-edge subgraph and $\pi$ is a permutation compatible with $G$. (Note that the same out-edge subgraph $G$ may occur several times, paired with different permutations $\pi$.)

We finally construct a sign-reversing involution on the compatible pairs $(G,\pi)$ that cancel all the negative entries in the sum above. In particular, if $G$ has no cycles then send $(G,\pi)$ to itself, and otherwise consider the cycle $C$ in $G$ containing the vertex with the smallest label among all cycles in $G$. Define $\pi’$ by removing $C$ from $\pi$ if $\pi$ contains the cycle $C$, and otherwise adding $C$ to $\pi$ (in other words, toggle whether the elements of $C$ form a cycle or are all fixed points in the permutation). Then $\pi’$ is still compatible with $G$, so we can map $(G,\pi)$ to $(G,\pi’)$ in this case. This forms a sign-reversing involution in which the only non-canceling terms come from the pairs $$(T,\mathrm{id})$$ where $T$ is an out-edge subgraph with no cycles and $\mathrm{id}$ is the identity permutation.

Since a non-cyclic out-edge subgraph on $v_1,\ldots,v_{n-1}$ must be rooted at $v_n$ (for otherwise it would have a cycle), we can conclude that $\det L_0(D,n)$ is the number of spanning trees of $D$ rooted at $v_n$.

On Raising Your Hand

A few weeks ago I attended the AWM (Association of Women in Mathematics) Research Symposium in Houston, TX. I gave a talk in my special session, speaking on queer supercrystals for the first time, to a room full of female mathematicians.

I was a bit disappointed when, at the end of my talk, no one raised their hand to ask any questions.  It’s usually the classic sign of an uninteresting or inappropriately aimed talk, so I figured that maybe I had to revisit my slides and make them more accessible for the next time I spoke on the subject.

Afterwards, however, several of the women in my session came up to me privately to ask specific questions about my research.  When I told my husband about this after the conference, he pointed out that perhaps they just were the kind of people to prefer asking questions one-on-one rather than raising their hands during or after the lecture.

“Did anyone in your session ask questions after the other talks?” he asked me, testing his theory.

I thought about it, and was surprised when I realized the answer.  “Woah, I think you’re right,” I said.  “I asked at least one question after nearly every talk.  But I think I was the only one.  Once in a while one other woman would ask something too.  But the rest kept their hands down and went up to the speaker during the break to ask their questions.”

Upon further reflection, I realized that this was even true during the plenary talks.  During an absolutely fantastic lecture by Chelsea Walton, I was intrigued by something she said.  She mentioned that the automorphism group of the noncommutative ring $$\mathbb{C}\langle x,y\rangle/(xy-qyx)$$ is $\mathbb{C}^{\times} \times \mathbb{C}^{\times}$ for all $q\neq \pm 1$, but the answer is different at $q=1$ and $q=-1$.  I knew that many of the standard $q$-analogs arise naturally in computations in this particular ring, such as the $q$-numbers $$[n]_q=1+q+q^2+\cdots +q^{n-1}.$$ So, I wondered if the exceptions at $q=1$ and $q=-1$ were happening because $q$ was a root of unity, making some of the $q$-numbers be zero.  So maybe she was considering $q$ as a real parameter?  I raised my hand to ask.

“Is $q$ real or complex in this setting?”

“It’s complex,” Chelsea answered.  “Any nonzero complex parameter $q$.”

“Really?” I asked. “And there are no exceptions at other roots of unity?”

“Nope!” she replied with a smile, getting excited now.  “Just at $1$ and $-1$.  The roots of unity get in your way when looking at the representation theory.  But for the automorphism group, there are only two exceptional values for $q$.”  Fascinating!

No one else asked any mathematical questions during or after that talk.


Now, I have the utmost faith in womankind.  And I would normally have chalked the lack of questions and outspokenness up to it being a less mathematically cohesive conference than most, because the participants were selected from only a small percentage of mathematicians (those that happened to be female).  But it reminded me of another time, several years ago, that I had been surprised to discover the same phenomenon among a group of women in mathematics.

One summer I was visiting the Duluth REU, a fantastic research program for undergraduates run by Joe Gallian in the beautiful and remote city of Duluth, Minnesota.  As a former student at the program myself, I visited for a couple of weeks to hang out and talk math with the students.  I attended all the weekly student talks, and as usual, participated heavily, raising my hand to ask questions and give suggestions.

The day before I left, Joe took me aside.  “I wanted to thank you for visiting,” he said.  “Before you came, the women never raised their hand during the other students’ talks.  But after they saw you doing it, suddenly all of them are participating and raising their hands!”

I was floored.  I didn’t know that being a woman had anything to do with asking questions.

I have always felt a little out of place at AWM meetings.  They are inevitably host to many conversations about the struggles faced by women in competitive male-dominant settings, which I have never really related to on a personal level.  I love the hyper-competitive setting of academia.  I live for competition; I thrive in it.  And it never occurs to me to hold back from raising my hand, especially when I’m genuinely curious about why $q$ can be a complex root of unity without breaking the computation.

But, clearly, many women are in the habit of holding back, staying in the shadows, asking their questions in a one-on-one setting and not drawing attention to themselves.  And I wonder how much this phenomenon plays a role in the gender imbalance and bias in mathematics.


At the reception before the dinner at the AWM conference, I spotted Chelsea.  She was, unsurprisingly, quite popular, constantly engaged in conversation with several people at once.  I eventually made my way into a conversation in a group setting with her in it, and I introduced myself.

“Hi, I just wanted to say I really enjoyed your talk!  I was the one asking you whether $q$ was real.”

Her expression suddenly shifted from ‘oh-no-not-another-random-person-I-have-to-meet’ to a warm, smiling face of recognition.  “Oh!  I liked your question!” she exclaimed.  The conversation immediately turned to math, and she was nice enough to walk me through enough computations to convince me that $q=\pm 1$ were special cases in computing the automorphism group of the noncommutative ring.  (See Page 2 of this post for the full computation!)

The entire experience got me thinking.  It was because I raised my hand that Chelsea recognized me, that she was happy to talk to me and mathematics was communicated.  It was because I raised my hand that I got the question out in the open so that other participants could think about it as well.  It was because I raised my hand that women were doing mathematics together.  And perhaps it is because I raise my hand that I have no problem interacting in a male-dominant environment.  After all, they raise their hands all the time.

It is tempting to want to ask the men in mathematics to take a step back and let the women have the limelight once in a while.  But I don’t think that’s the answer in this case.  Men should keep raising their hands.  It’s part of how mathematics gets done.  It helps to communicate ideas more efficiently, to the whole room at once rather than only in private one-on-one settings.  It draws visibility to the interesting aspects of a talk that other participants may not have thought of.

What we really need is for women to come out of the shadows.  So, to my fellow women in mathematics: I’m calling on all of us to ask all our questions, to engage with the seminar room, to not hold back in those immensely valuable times when we are confused.  And raise our hands!

Addicted to Crystal Math

Some of my recent research has centered on crystals, so here’s an introductory post, from a combinatorial point of view, on what crystals are and where they come up.

First, a clarification: the crystals I have in mind are not the chemical kind that you find in nature, nor are they any of these other uses of the word crystal on the Wikipedia disambiguation page. There is yet another use of the word, sometimes referred to as a crystal base, introduced by Kashiwara in the mid-90’s. Kashiwara developed crystal base theory as a way of understanding the $q\to 0$ limit of the representation theory of quantum groups $U_q(\mathfrak{g})$ where $\mathfrak{g}$ is a Lie algebra and $U_q(\mathfrak{g})$ is a $q$-analog of the universal enveloping algebra $U(\mathfrak{g})$. In the quantized statistical mechanics models in which quantum groups come up, the parameter $q$ is a measure of temperature, so the $q\to 0$ limit is exploring what happens in the situation of absolute zero temperature. Hence the word “crystal”, referring to what happens to many forms of matter when frozen.

Although the original motivation for crystals came from quantum groups, the combinatorial structure is also very useful in symmetric function theory. One example of a crystal is a certain weighted, edge-labeled directed graph structure on the summands that comprise a Schur polynomial. Recall that the Schur function $s_{\lambda/\mu}(x_1,\ldots,x_n)$ is given by the sum $$s_{\lambda/\mu}=\sum_{T\in \mathrm{SSYT}(\lambda/\mu)} x^T$$ where $\mathrm{SSYT}(T)$ is the set of semistandard Young tableaux of skew shape $\lambda/\mu$, and $x^T=x_1^{m_1}x_2^{m_2}\cdots$ where $m_i$ is the number of $i$’s in $T$. For instance, we have $$s_{(2,1)}(x_1,x_2,x_3)=x_1^2x_2+x_1x_2^2+x_1^2x_3+2x_1x_2x_3+x_1x_3^2+x_2^2x_3+x_2x_3^2,$$ since the eight semistandard Young tableaux of straight shape $(2,1)$ are:

We can organize these tableaux by weight in a 2-dimensional grid, since each weight is a tuple $(m_1,m_2,m_3)$ where $m_1+m_2+m_3=3$. We angle the grid so that either changing the weight by $(-1,1,0)$ or by $(0,-1,1)$ is “lowering” the weight, going downwards at a 45 degree angle to the right or left. We will then connect some of the tableaux along the edges of the grid with edges labeled by weight-lowering operators $F_1$ and $F_2$ (with inverses $E_1$ and $E_2$), shown in red and blue below:

In general $F_i$ changes the weight by $(0,0,\ldots,-1,1,0,\ldots,0)$ with the $-1$ in the $i$-th position, and we think of $F_i$ as an operator on tableaux that changes an $i$ into an $i+1$. To define $F_i(T)$, we consider the reading word $w$ of the tableau $T$, formed by concatenating the rows from bottom to top, and consider just the $i$ and $i+1$ entries of $w$. Replace each $i$ with a closed bracket ‘)’, and each $i+1$ with an open bracket ‘(‘, and cancel the brackets in matching pairs until only a word of the form $$)))\cdots ))(((\cdots (($$ (or alternatively, of the form $ii\cdots ii(i+1)(i+1)\cdots(i+1)$) remains. Then $F_i$ changes the last $i$ in this subsequence to $i+1$. For example, if $T$ looks like:

then to apply $F_1$ to $T$, we consider the reading word of $1$s and $2$s and cancel matching pairs as follows:

\begin{array}{ccccccccccc}
2 & 2 & 1 & 1 & 1 & 1 & 2 & 1 & 2 & 2 & 1 \\
( & ( & ) & ) & ) & ) & ( & ) & ( & ( & ) \\
( & & & ) & ) & ) & & & ( & & \\
& & & & ) & ) & & & ( & &
\end{array}

Then $F_1$ changes the rightmost $)$ that was not cancelled to $($, changing that $1$ to $2$. The word becomes:

\begin{array}{ccccccccccc}
2 & 2 & 1 & 1 & 1 & 2 & 2 & 1 & 2 & 2 & 1.
\end{array}

Note that $F_i$ may not be defined, if there is no closed bracket that is not cancelled. If no $F_i$ operator may be applied to a tableau $T$, we say that $T$ is lowest weight. The bottom tableau on the crystal diagram above is lowest weight for the operators $F_1,F_2$. Similarly, we can define raising operators $E_i$ as the natural partial inverses of the $F_i$ operators (simply by reversing the arrows in the crystal graph), and we define highest weight elements as those for which all $E_i$ operators are undefined.

Now here’s the interesting thing: highest weight tableaux or words are precisely the Littlewood-Richardson tableaux that we have encountered in several previous posts. Recall that a tableau is Littlewood-Richardson if its reading word $w=w_1\cdots w_n$ is ballot, meaning that every tail of the word (of the form $w_k\cdots w_n$ for some $k$) has at least as many $1$’s as $2$’s, at least as many $2$’s as $3$’s, and so on. In other words, when reading the word from right to left, the smaller letters always occur more frequently than larger letters. For instance, the word $$12122111$$ is ballot, and it is not hard to see that ballotness in a word with just $1$’s and $2$’s is equivalent to the condition that every $2$ is bracketed with a $1$, so that $E_1$ is undefined.

The key fact about tableau crystals is as follows:

Every connected component of a crystal graph has a unique highest weight element.

For instance, here is the crystal graph for the skew shape $(3,1)/(1)$:

Note that each connected component has a Littlewood-Richardson tableau at the top, and no other elements are killed by all raising operators. Notice also that the first graph above has the same structure as that of the shape $(2,1)$ crystal drawn above. This is no coincidence: another key fact about the operators $F_i$ and $E_i$ is that they commute with Jeu de Taquin rectification, which is easily proven using Knuth equivalence. The tableaux in the shape $(2,1)$ crystal are simply the Jeu de Taquin rectifications of those in the corresponding component of the $(3,1)/(1)$ graph. Similarly, the tableaux in the second component rectify to form the tableau crystal for shape $(3)$. It follows that, if we sum the monomial weights in the graph, we obtain the equation $$s_{(3,1)/(1)}=s_{(2,1)}+s_{(3)},$$ which is an instance of the famous Littlewood-Richardson rule.

In general, let $c^{\lambda}_{\mu\nu}$ be the number of Littlewood-Richardson tableaux of shape $\lambda/\mu$ and content $\nu$. Then recall that the Littlewood-Richardson rule for multiplying Schur functions states:

$$s_\mu s_\nu = \sum_{\lambda} c^{\lambda}_{\mu\nu} s_\lambda,$$

or equivalently, after a bit of symmetric function theory manipulation,

$$s_{\lambda/\mu}=\sum_{\nu} c^{\lambda}_{\mu\nu} s_\nu.$$

The latter equation now follows immediately from the facts about crystals listed above, as we saw for the shape $(3,1)/(1)$. We can also prove the product version directly using crystals, by introducing a tensor product. There is a natural tensor product on crystals coming from their correspondence with Lie algebra representations, which we will define in general on the next page. In combinatorial terms, we can define the tensor product of crystals of tableaux by defining the concatenation of two tableaux $T$ and $S$ to be the tableau $T\cdot S$ formed by drawing $S$ above and to the right of $T$, with the lower left corner of $S$ matching the upper right corner of $T$. For instance, tableaux of skew shape $(3,1)/(1)$ can be formed by concatenating a tableau of shape $(1)$ with a tableau of shape $(2)$. (Note that this concatenates the reading words of $T$ and $S$ as well, and induces the plactic monoid structure on Knuth equivalence classes of words.)

Then the tensor product $\mathcal{B}_\mu \otimes \mathcal{B}_\nu$ of two tableaux crystals on shapes $\mu$ and $\nu$ is the crystal on all words of the form $T\cdot S$ for $T\in \mathcal{B}_\mu$ and $S\in \mathcal{B}_\lambda$, with operators $F_i$ connecting the nodes appropriately. It turns out that every connected component in the resulting crystal are again full crystals of tableaux, with Schur functions as their weight generating functions. We thus obtain a crystal-theoretic interpretation of the multiplicative Littlewood-Richardson rule. Namely, we have
$$\mathcal{B}_\mu \otimes \mathcal{B}_\nu=\bigcup c^\lambda_{\mu\nu} \mathcal{B}_{\lambda}.$$

For instance, $\mathcal{B}_{(1)}\otimes \mathcal{B}_{(2)}$ is the crystal whose graph is precisely the diagram of $\mathcal{B}_{(3,1)/(1)}$ shown above, so $$\mathcal{B}_{(1)}\otimes \mathcal{B}_{(2)}=\mathcal{B}_{(2,1)}\cup \mathcal{B}_{(3)},$$ matching the Schur expansion $s_{(1)}s_{(2)}=s_{(2,1)}+s_{(3)}.$

With these examples in mind, on the next page we will define the tensor category of crystals over any semisimple Lie algebra.

Hockey sticks on planet ABBABA

Question: On the planet ABBABA, the inhabitants have a binary language where the only two letters in their alphabet are A and B. The language is incredibly efficient and complex in that every finite sequence of A’s and B’s is a valid word. How many of the words in this language have exactly five A’s and at most five B’s?

For instance, ABAAABA, AAAAA, and BBBBBAAAAA are all valid such words, since they all have five A’s and no more than five B’s.

For the sake of readers who might want to grapple with this problem on their own, I’ll leave the answer and discussion to the next page. We’ll start with a straightforward counting method, and then go on to find a much faster method that will also lead us to an elegant combinatorial proof of a well-known binomial coefficient identity. Once you think you have the answer, turn to page 2!

The CW complex structure of the Grassmannian

In a previous post, we briefly described the complex Grassmannian $\mathrm{Gr}(n,k)$ as a CW complex whose cells are the Schubert cells with respect to a chosen flag. We’ll now take a closer look at the details of this construction, along the lines of the exposition in this master’s thesis of Tuomas Tajakka (chapter 3) or Hatcher’s book Vector Bundles and $K$-theory (page 31), but with the aid of concrete examples.

Background on CW complexes

Let’s first review the notion of a CW complex (or cell complex), as described in Hatcher’s Algebraic Topology.

An $n$-cell is any topological space homeomorphic to the open ball $B_n=\{v\in\mathbb{R}^n:|v|<1\}$ in $\mathbb{R}^n$. Similarly an $n$-disk is a copy of the closure $\overline{B_n}=\{v\in \mathbb{R}^n:|v|\le 1\}$.

To construct a cell complex, one starts with a set of points called the $0$-skeleton $X^0$, and then attaches $1$-disks $D$ via continuous boundary maps from the boundary $\partial D$ (which simply consists of two points) to $X^0$. The result is a $1$-skeleton $X^1$, which essentially looks like a graph:

This an then be extended to a $2$-skeleton by attaching $2$-disks $D$ via maps from the boundary $\partial D$ (which is a circle) to $X^1$. Note that, as in the picture below, the attaching map may collapse the entire boundary to a point, making the circle into a balloon-like shape.

In general the $n$-skeleton $X^n$ is formed from $X^{n-1}$ by attaching a set of $n$-disks by maps from their boundaries to $X^{n-1}$.

More precisely, to form $X^n$ from $X^{n-1}$, we start with a collection of $n$-disks $D^n_\alpha$ and continuous attaching maps $\varphi_\alpha:\partial D_\alpha^n\to X^{n-1}$. Then $$X^n=\frac{X^{n-1}\sqcup \bigsqcup_\alpha D_\alpha^n}{\sim}$$ where $\sim$ is the identification $x\sim \varphi_\alpha(x)$ for $x\in \partial D^n_\alpha$. The cell complex is $X=\bigcup_n X^n$, which may be simply $X=X^n$ if the process stops at stage $n$.

By the construction, the points $X^0$ along with the open $i$-cells associated to the $i$-disks in $X^i$ for each $i$ are disjoint and cover the cell complex $X$. The topology on $X$ is given by the rule that $A\subset X$ is open if and only if $A\cap X^n$ is open in $X^n$ for all $n$.

The real projective plane

As an example, consider the real projective plane $\mathbb{P}^2_{\mathbb{R}}$. It has a cell complex structure in which $X^0=\{(0:0:1)\}$ is a single point, and $X^1=X^0\sqcup \{(0:1:\ast)\}$ is topologically a circle formed by attaching a $1$-cell to the point at both ends.

Then, $X^2$ is formed by attaching a $2$-cell $\mathbb{R}^2$ to the circle such that the boundary wraps around the $1$-cell twice. Intuitively, this is because the points of the form $(1:xt:yt)$ as $t\to \infty$ and as $t\to -\infty$ both approach the same point in $X^1$, so the boundary map must be a $2$-to-$1$ mapping.

To make this analysis more rigorous, consider $\mathbb{P}^2_{\mathbb{R}}$ as the quotient $(\mathbb{R}^3\setminus\{(0,0,0)\})/\sim$ where $\sim$ is the equivalence relation given by scalar multiplication. In particular there is a natural continuous quotient map $$q:\mathbb{R}^3\setminus \{0\}\to \mathbb{P}^2_{\mathbb{R}}.$$ We can now take $n$-disks in $\mathbb{R}^3\setminus\{0\}$ and map them into $\mathbb{P}^2_{\mathbb{R}}$ via $q$, which will give the CW complex structure described above.

In particular, the point $(0,0,1)$ maps to $(0:0:1)$, giving the $0$-skeleton. Then consider the set of points $$\{(0,a,b):a^2+b^2=1, a\ge 0\}\subseteq \mathbb{R}^3\setminus \{0\},$$ which is a half-circle and thus homeomorphic to a $1$-disk. The quotient map $q$ gives an attaching map of the boundary of this $1$-disk to $(0:0:1)$, by mapping both endpoints $(0,0,1)$ and $(0,0,-1)$ to $(0:0:1)$. It also maps the interior of this disk bijectively to the $1$-cell $(0:1:\ast)$ in $\mathbb{P}^2_{\mathbb{R}}$.

Finally, consider the set $$D=\{(a,b,c):a^2+b^2+c^2=1,a\ge 0\},$$ which is a half-sphere and thus homeomorphic to a $2$-disk. Then under $q$, the interior of $D$ maps to the $2$-cell $(1:\ast,\ast)$, and $q$ gives a $2$-to-$1$ attaching map of the boundary $\partial D=\{(0,b,c): b^2+c^2=1\}$ onto the $1$-skeleton $\{(0:x:y)\}$. Indeed, given a desired ratio $(0:x:y)$, there are two points $(0,b,c)\in \partial D$ mapping to it, since a line with a given slope intersects the unit circle in exactly two points.

The complex projective plane

A nearly identical construction to the one above can help us realize the complex projective plane $\mathbb{P}_{\mathbb{C}}^2$ as a CW complex as well, though the structure is somewhat different.

In particular, we can again start with the point $(0:0:1)$ as our $0$-skeleton, and attach the $1$-cell $(0:1:\ast)$ as follows. Let $$q:\mathbb{C}^3\setminus\{0\}\to \mathbb{P}_{\mathbb{C}}^2$$ be the natural quotient map, and define $$D=\{(0,b,c):|(0,b,c)|=1, b\in \mathbb{R}_{\ge 0}\}\subseteq \mathbb{C}^3\setminus \{0\}.$$ Then we can write $c=x+yi$ and the conditions defining $D$ are equivalent to $b,x,y\in \mathbb{R}$, $b\ge 0$, $b^2+x^2+y^2=1$ (since $|c|^2=c\overline{c}=x^2+y^2$). Thus $D$ is a real half-sphere and hence a $2$-disk. The quotient map takes the interior of $D$, on which $b>0$, bijectively and continuously to $(0:1:\ast)$, while collapsing its entire circular boundary to the $0$-skeleton as the attaching map.

Finally, we can similarly take the image of $$E=\{(a,b,c): |(a,b,c)|=1, a\in \mathbb{R}_{\ge 0}\},$$ which is homeomorphic to a 4-disk, to obtain the remaining points in $\mathbb{P}_{\mathbb{C}}^2$. (Indeed, setting $b=x+yi$ and $c=z+wi$ for real numbers $x,y,z,w$, the space $E$ is cut out by the equations $a^2+x^2+y^2+z^2+w^2=1$, $a\ge 0$ in $5$-dimensional real space, so it is a $4$-hemisphere.) The attaching map on the boundary is “circle-to-one”: every point in the $1$-skeleton has a circle as its preimage in $E$. (Can you prove this?)

Note that these constructions can be continued for higher dimensional projective spaces over $\mathbb{C}$ or $\mathbb{R}$.

The Grassmannian

We now can analyze the complex Grassmannian $\mathrm{Gr}(n,k)$ – the space of $k$-dimensional subspaces of $\mathbb{C}^n$ – in a similar manner to that of a complex projective space. In fact, $\mathbb{P}_{\mathbb{C}}^n=\mathrm{Gr}(n+1,1)$, so we have already done this for some Grassmannians.

For $\mathrm{Gr}(n,k)$ with $k\ge 2$, we use a similar quotient map, but this time from the Stiefel manifold $V_k=V_k(\mathbb{C}^n)$, the space of orthonormal $k$-frames. That is, $V_k$ is the space of all tuples of linearly independent vectors $(v_1,\ldots,v_k)\in (\mathbb{C}^n)^k$ for which $$\langle v_i, v_j\rangle=\delta_{ij}$$ for all $i$ and $j$. (Here $\langle-,-\rangle$ can be taken as the standard Hermitian inner form $\langle (z_1,\ldots,z_n), (w_1,\ldots,w_n) \rangle=\sum z_i \overline{w_i}$.)

Now, there is a natural quotient map $q:V_k\to \mathrm{Gr}(n,k)$ given by row-reducing the matrix formed by the $k$-frame, as we did in this post. Note that $V_1=\mathbb{C}^n\setminus\{0\}$ is the space we used above for projective space, and that $q$ matches our map above in this case.

We will not go into full details of the CW complex construction here, since Tajakka’s paper does it so nicely already. Instead, let’s outline the construction for one of the nontrivial attaching maps in $\mathrm{Gr}(4,2)$.

Recall that in $\mathrm{Gr}(4,2)$, the Schubert varieties are indexed by partitions whose Young diagram fits inside a $2\times 2$ rectangle. The Schubert variety $\Omega_{(2,2)}$ (with respect to the standard flag) consists of the single point of the Grassmannian represented by the row-reduced matrix

$$\left[\begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right]$$

Use this point as the $0$-skeleton of the Grassmannian. For the $2$-skeleton, we attach the $2$-cell given by $$\Omega_{(2,1)}^{\circ}=\left[\begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 1 & \ast & 0 \end{array}\right]$$ to the $0$-skeleton, which is the unique boundary point in its closure. So the $1$-skeleton is actually homeomorphic to the complex projective line.

We now construct the attaching maps that show that if we include the two $4$-dimensional (over $\mathbb{R}$) Schubert cells given by $\Omega_{(1,1)}^{\circ}$ and $\Omega_{(2)}^{\circ}$, we get a $4$-dimensional CW complex. Consider the cell $$\Omega_{(1,1)}=\left[\begin{array}{cccc} 0 & 0 & 1 & \ast \\ 0 & 1 & 0 & \ast \end{array}\right].$$ Define $E$ to be the following subspace of $V_2(\mathbb{C}^4)$:
$$D=\{(v_1,v_2)\in V_2: v_1=(0,0,a,b), v_2=(0,c,d,e)\text{ for some }b,d,e\in \mathbb{C}, a,c\in \mathbb{R}_{\ge 0}\}.$$ Then $D$ maps under $q$ to $\Omega_{(1,1)}=\overline{\Omega_{(1,1)}^\circ}$, its interior mapping to $\Omega_{(1,1)}^\circ$.

We now wish to show that $D$ is topologically a $4$-disk, making $q$ an attaching map. This is the tricky part in general, but we’ll describe the proof explicitly in this special case.

Let $$D_1=\{(0,0,a,b):a\ge 0, b\in \mathbb{C}, a^2+b\overline{b}=1\}$$ and
$$D_2=\{(0,x,0,z):x\ge 0, z\in \mathbb{C}, x^2+z\overline{z}=1\}.$$ Then both are hemispheres in $\mathbb{R}^3$ and hence homeomorphic to $2$-disks, so their Cartesian product $D_1\times D_2$ is homeomorphic to a $4$-disk. So it suffices to show that $D_1\times D_2$ is homeomorphic to $D$.

In particular, given a pair $(v,u)\in D_1\times D_2$, we can apply a certain rotation $T_v$ to $u$ (depending continuously on $v$) that transforms $(v,u)$ into an independent, orthonormal pair of vectors $(v,T_v(u))\in D$. We can define $T_v$ to be the unique rotation that takes $(0,0,1,0)$ to $v$ and fixes all vectors orthogonal to both $(0,0,1,0)$ and $v$. Then since $(0,0,1,0)$ is already orthogonal to $u$ by the definition of $D_2$, this will give a pair $(v,T_v(u))$ of orthonormal vectors, which are independent since $u$ and $v$ are. Tajakka proves (in much more generality) that this map gives the required homeomorphism.

Back to $\mathbb{R}$eality

The real Grassmannian also has a CW complex structure, given by an almost identical construction to the one above (see Hatcher, page 31). Let’s analyze the map described above, replacing all complex coordinates with real ones.

Let $v=(0,0,a,b)$ with $a^2+b^2=1$ and $a\ge 0$ as above. The map $T_v$, in coordinates, sends $(0,x,0,w)$ to $(0,x,-wb,wa)$. Keeping in mind that $a^2+b^2=1$ and $x^2+w^2=1$, it is easy to see that the vectors $(0,0,a,b)$ and $(0,x,-wb,wa)$ form an orthonormal $2$-frame. The disk $D$ in this setting is the set of all orthonormal $2$-frames of this form with $a,x\ge 0$.

Taking the image of $D$ under the quotient map $q:V_2(\mathbb{R}^4)\to \mathrm{Gr}_{\mathbb{R}}(4,2)$, it’s not hard to show that the restriction to the boundary is a $2$-to-$1$ map, as in the construction of the real projective plane. Indeed, if $q((0,0,a,b),(0,x,0,w))$ is the point of the Grassmannian represented by $$\left[\begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 1 & c & 0 \end{array}\right],$$ this forces $a=0$, $b=\pm 1$, $x=1$, $w=-c/b$. So there are two preimages of this point, determined by the two possible values for $b$.

I haven’t worked out (or found in a reference) what the degrees of the attaching maps are for the higher dimensional Schubert cells. If you work out a higher dimensional example, or know of a reference for this, please post it in the comments below!

Thanks to Jake Levinson for enlightening mathematical discussions that motivated me to write this post.

Higher specht polynomials

There are polynomials. There are Specht polynomials. And then there are higher Specht polynomials.

A colleague recently pointed out to me the results of this paper by Ariki, Terasoma, and Yamada, or more concisely summarized here. The authors give a basis of the ring of coinvariants $$R_n=\mathbb{C}[x_1,\ldots,x_n]/(e_1,\ldots,e_n)$$ which, unlike the other bases we’ve discussed on this blog so far, respects the decomposition into irreducible $S_n$-modules. These basis elements include the ordinary Specht polynomials, but also some polynomials of higher degrees, hence the name “higher” Specht polynomials.

Background: What is a Specht polynomial?

The construction of the irreducible representations of the symmetric group $S_n$ (called Specht modules) is often described in terms of “polytabloids”, but can be equivalently described in terms of a basis of Specht polynomials.

Before we describe this basis, recall that the dimension of the irreducible representation $V_\lambda$ of $S_n$ indexed by the partition $\lambda$ is equal to the number $f^\lambda$ of Standard Young Tableaux (SYT) of shape $\lambda$, i.e., the number of fillings of the Young diagram of $\lambda$ with the numbers $1,\ldots,n$ in such a way that the entries are increasing down rows and across columns. An example is shown below for $n=7$ and $\lambda=(3,3,1)$:

Given a standard Young tableau $T$, the Specht polynomial $F_T$ is defined as the following product taken over all of its columns $C$
$$F_T(x_1,\ldots,x_n)=\prod_{C} \prod_{i\lt j\in C}(x_j-x_i)$$

For instance, in the above tableau, $$F_T(x_1,\ldots,x_7)=(x_3-x_1)(x_4-x_1)(x_4-x_3)(x_6-x_2)(x_7-x_5).$$

In other words, the Specht polynomial for $T$ is the product of the Vandermonde determinants given by each of its columns’ entries. It is known that the set of polymomials $F_T$, where $T$ ranges over all SYT’s of shape $\lambda$, form a basis of their span, and their span is isomorphic to $V_\lambda$ as an $S_n$-module (here the $S_n$-action is the usual action on the variables $x_1,\ldots,x_n$).

Specht polynomials in the coinvariant ring

As described in detail in this post, the coinvariant ring is the $S_n$-module $$R_n=\mathbb{C}[x_1,\ldots,x_n]/(e_1,\ldots,e_n)$$ where $e_i$ is the $i$th elementary symmetric function in the variables $x_1,\ldots,x_n$.

The ring $R_n$ inherits the action of $S_n$ on the polynomial ring, and we saw in the previous post on the coinvariant ring that it is isomorphic to the regular representation as an $S_n$-module. It is also graded by polynomial degree, with Hilbert series equal to $(n)_q!$. There are many known bases of $n!$ elements that respect this grading, but the ones we have described in previous posts do not respect the $S_n$-action. In particular, can we find an explicit basis which partitions into bases of the irreducible components in $R_n$?

To start, the Specht polynomials $F_T$ defined above are nonzero in $R_n$ and span one copy of each Specht module. But we know that in the regular representation, there are $f^\lambda$ copies of the irreducible representation $V_\lambda$ for each $\lambda$, and this construction only gives one copy of each. In fact, since the copies of $V_\lambda$ must be alternating in the column variables, they are the lowest-degree copies of each. Hence, we need higher Specht polynomials.

To define them, we first need to understand the RSK correspondence and the charge statistic.

The tools: RSK and charge

Somehow, in all of my years of blogging, I have never actually described the RSK (Robinson-Schensted-Knuth) correspondence. By the nature of the regular representation of $S_n$, we must have $$\sum_{\lambda \vdash n} (f^{\lambda})^2=n!$$ since each irreducible representation $V_\lambda$ of dimension $f^\lambda$ occurs exactly $f^{\lambda}$ times, and the regular representation has dimension $n!$. Recall also that $f^\lambda$ is the number of standard Young tableaux of shape $\lambda$.

The RSK correspondence gives a combinatorially described bijection between pairs of standard Young tableaux of the same partition shape $\lambda$ of $n$ (for some $\lambda$) and permutations of $n$. It’s a beautiful bijection, but since the readers of this post will most likely already be familiar with it, I will simply refer the reader to Simon Rubenstein-Salzedo’s clearly-written introduction to RSK.

The upshot is that there is one higher Specht polynomial for each pair of standard Young tableaux $(S,T)$ of the same shape, denoted $F_T^S$. The degree of $F_T^S$ is given by the charge statistic on $S$, defined as follows. Let $w(S)$ be the reading word of $S$, formed by concatenating the rows of $S$ from bottom to top, and assign subscripts $s(i)$ to the letters $i$ of the reading word recursively as follows. Assign the letter $1$ the subscript $s(1)=0$, and for each $i\ge 1$, if $i+1$ is to the right of $i$ then assign it $s(i+1)=s(i)$, and otherwise $s(i+1)=s(i)+1$. The charge of $S$ is the sum of the subscripts $s(i)$.

For example, let $S$ and $T$ be the tableaux given below:

Then the charge of $S$ is computed by first forming the reading word $$w(S)=3267145$$ and then placing subscripts starting from the letter $1$ according to the rules above:
$$3_2 2_1 6_3 7_3 1_0 4_2 5_2$$
Then the charge of $S$ is $2+1+3+3+0+2+2=13$.

The lowest-degree (minimal charge) case occurs when $S=S_0$ is the canonical choice having elements $1,2,3,\ldots,n$ written in order left to right in each row from top to bottom, as in:

In this case we will recover the standard Specht polynomials, and the others are the higher Specht polynomials.

Construction of the Higher Specht polynomials

We now have the tools to describe the construction. Given a pair $(S,T)$ of standard Young tableaux of the same shape $\lambda$, let $x_T^{\mathrm{charge}(S)}$ be the monomial $x_1^{i_1}\cdots x_n^{i_n}$ where $i_k$ is the charge subscript on the cell in $S$ corresponding to the cell containing $k$ in $T$. For example, if we fill the boxes of $S$ in the example above with their corresponding subscripts in the charge computation, it looks like:

and we have $x_T^{\mathrm{charge}(S)}=x_2^2x_3x_4^2x_5^2x_6^3x_7^3$.

Then the higher Specht polynomial $F_T^S$ is defined as $$F_T^S(x_1,\ldots,x_n)=\sum_{\tau\in \mathrm{Col}(T)}\sum_{\sigma\in \mathrm{Row}(T)} \mathrm{sgn}(\tau)\tau\sigma x_T^{\mathrm{charge}(S)}$$ where $\mathrm{Row}(T)$ and $\mathrm{Col}(T)$ are the subgroups of $S_n$ generated by the permutations of the rows of $T$ and the columns of $T$, respectively.

Notice that this polynomial has degree given by the charge of $S$, which is minimal when $S=S_0$ is the canonical tableau of shape $\lambda$ as described above. And when $S=S_0$, $x_T^{\mathrm{charge}(S)}$ is the product $\prod_{i} x_i^{h_i}$ where $h_i$ is the row of the entry $i$ in $T$. It’s not hard to see that antisymmetrizing this monomial over the column permutations recovers the ordinary Specht polynomial $F_T$. Furthermore, symmetrizing across rows keeps the monomial fixed, and so the higher Specht polynomial $F_T^{S_0}$ is simply a constant multiple of the ordinary Specht polynomial $F_T$.

In general the Specht module $V_\lambda$ occurs once for each tableau $S$ of shape $\lambda$, giving the right multiplicities for each irreducible representation.

Thanks to some Sage code by Nicolas Thiery, we can use this formula to efficiently compute the higher Specht polynomials. Here are the polynomials for $n=3$:

\[
6 \\
2(y-x) \\
2(z-x) \\
z(y-x) \\
y(z-x) \\
(y-x)(z-x)(z-y)
\]

Note that the first three and the last are (scalar multiples of) ordinary Specht polynomials, and $z(y-x)$ and $y(z-x)$ are two higher Specht polynomials giving another copy of $V_{(2,1)}$.

Generalizations

In their paper, Ariki, Terasoma, and Yamada actually construct a more general version of the higher Specht polynomials that respects the restricted $S_\mu$-module structure of $R_n$ where $S_{\mu_1}\times \cdots \times S_{\mu_k}\subseteq S_n$ is any Young subgroup. It’s also natural to ask whether there are generalizations of the higher Specht modules to related rings such as the Garsia-Procesi modules or even (we can dream!) the Garsia-Haiman modules.

Schubert Calculus mini-course

I’ve written a lot about Schubert calculus here over the last few years, in posts such as Schubert Calculus, What do Schubert Curves, Young tableaux, and K-theory have in common? (Part II) and (Part III), and Shifted partitions and the Orthogonal Grassmannian.

I soon found out that writing a lot about your favorite topics on a blog can sometimes get you invited to give talks on said topics. This past June, I gave one of the three graduate mini-courses at the Equivariant Combinatorics workshop at the Center for Mathematics Research (CRM) in Montreal.

Naturally, a lot of what I covered came from the blog posts I already had written, but I also prepared more material on flag varieties and Schubert polynomials, more details on the cohomology of the Grassmannian, and more problems and examples suitable for the workshop. So I organized all of this material into a single lecture notes document available here:

Variations on a Theme of Schubert Calculus

The coolest part was that the CRM had good quality audio/video setup for all three workshops, and so you can also view my lecture videos that accompany these notes at the following five links, and take the entire course yourself:

Lecture 1: Introduction, Projective varieties, Schubert cells in the Grassmannian

Lecture 2: Duality theorem, CW complexes, Homology/cohomology

Lecture 3: Littlewood-Richardson rule, Flag variety, Schubert polynomials

Lecture 4: Cohomology of flag variety, generalized flag varieties

Lecture 5: The orthogonal Grassmannian, recap

You can also find the videos for the other two mini-courses from the summer – Steven Griffeth’s lectures on Cherednik algebras and Jeffrey Remmel’s lectures on symmetric function theory – at this link.

Hope someone out there finds these lectures useful or interesting. I certainly had a blast teaching the course!

Pythagorean triples on a sphere?

It’s 8/15/17, which means it’s time to celebrate! The three numbers making up the date today form a Pythagorean triple, a triple of positive integers $(a,b,c)$ with $a^2+b^2=c^2$. Indeed, $8^2+15^2=64+225=289=17^2$.

Alternatively, by the Pythagorean theorem, a Pythagorean triple is any triple of positive integers which make up the sides of a right triangle:

It’s exciting when all three sides are integers, since many common right triangles’ lengths involve square roots: $(1,1,\sqrt{2})$, $(1,2,\sqrt{5})$, and $(1,\sqrt{3},2)$, to name a few. And these sides aren’t even rational, which the poor Pythagorean scholar Hippasus discovered by proving that $\sqrt{2}$ is irrational and was subsequently drowned to death by his colleagues, according to some historical accounts. So the ancient Pythagoreans in fact only believed in right triangles having rational side lengths.

Of course, given one Pythagorean triple, like $(3,4,5)$, we can construct infinitely many by scaling the sides: $(6,8,10)$, $(9,12,15)$, etc. (In fact, 12/9/15 was the previous Pythagorean triple day, and 12/16/20 will be the next.) So to classify all Pythagorean triples, it suffices to find the reduced triples, those with no common factors greater than $1$.

So the last reduced Pythagorean triple day was back in 2013 on 12/5/13, and the next one won’t be until 7/24/25!

Constructing Pythagorean triples

It’s well known that there are infinitely many reduced Pythagorean triples as well. One beautiful, famous proof of this derives a parameterization of all triples via geometric means:

Consider the unit circle and the point $P=(-1,0)$ on the circle. Let $Q=(0,r/s)$ be a rational point on the $y$-axis inside the circle. Then line $PQ$ intersects the circle at a second point $R$, and it turns out $R$ has rational coordinates as well. Some simple algebra with similar triangles (try it, if you haven’t seen it before!) gives us $$R=\left(\frac{r^2-s^2}{r^2+s^2},\frac{2rs}{r^2+s^2}\right).$$ But since $R$ lies on the unit circle, if $(x,y)$ are the coordinates of $R$ then $x^2+y^2=1$, and substituting and clearing denominators we have $$(r^2-s^2)^2+(2rs)^2=(r^2+s^2)^2$$ (which can also be checked with direct algebraic computation). It follows that $(r^2-s^2,2rs,r^2+s^2)$ is a Pythagorean triple for any integers $r$ and $s$, giving us infinitely many Pythagorean triples. And in fact, for $r$ and $s$ relatively prime of different parity, these triples are reduced.

Spherical considerations

Given that this is a global day to celebrate (assuming you use the standard world calendar), and the Earth is a sphere, I have to wonder whether Pythagorean triples actually exist if drawn on a perfect-sphere approximation of the Earth. First, we’d have to define what we even mean by that – are we measuring in meters? In feet? And what is a right triangle on the surface of a sphere anyway?

The most common definition of a triangle on a sphere is one formed by great circles. A great circle is any circle of maximal diameter around a sphere, in other words, whose plane contains the center of the sphere. So, the equator, the prime meridian, and all longitude lines are great circles on the Earth, but latitude lines are not. A line segment on a sphere is a segment of a great circle, and a (spherical) triangle is a shape formed by three points connected by three (spherical) line segments. The angle between two great circles that intersect is the angle between their planes. (Thanks to Wikipedia for the excellent image below.)

Since our world now has size rather than being a flat plane, let’s set the radius of the Earth to be $1$ unit for simplicity. So we’re working with triangles $ABC$ with a right angle at $C$, and asking when they have rational lengths:

Are there any Pythagorean triples on the unit sphere? Are there infinitely many?

These questions suddenly aren’t very easy. If we scale our sphere down by $\pi/2$ we can get at least one, by taking the equator, the prime meridian, and the $90^\circ$ longitudinal line. This forms a right triangle with all three lengths equal (and all angles right!) and so we can simply scale the Earth to make it any rational lengths we want. But this still doesn’t answer anything about the unit sphere.

There is some hope, however. In this paper by Hartshorne and Van Luijk, the authors show that there are infinitely many Pythagorean triples in the hyperbolic plane, using the Poincare Disk model and some nice hyperbolic trig formulas combined with some Eulerian number theory tricks. So Pythagorean triples are not the sole property of flat Euclidean space.

In addition, there is a “spherical Pythagorean theorem”, which in our notation, since the radius of our sphere is $1$, says that $$\cos(c)=\cos(a)\cos(b)$$ where $a,b,c$ are the side lengths of the triangle and $c$ is opposite the right angle. And yet, this offers little insight into whether this equation has rational solutions. Trig functions are generally much harder to deal with than polynomials, especially when it comes to solving equations over the rationals.

So, if you have any insights on this problem (or references to papers that have worked on it – I couldn’t find any in an initial search, but I am not very familiar with it), please let me know in the comments! And until then, happy reduced-Pythagorean-triple-in-flat-Euclidean-space day!

A faster divisibility rule for $7$ (and $13$)

I occasionally teach evening online classes for Art of Problem Solving (artofproblemsolving.com, a fantastic resource for high school students interested in learning mathematics), and one of the lessons I taught recently was on fast mental arithmetic tricks for testing divisibility by various small integers.

How do you tell if a number is divisible by $2$? Easy: look at the last digit. If that digit is even, the whole number is divisible by $2$, otherwise it’s not.

How do you tell if a number is divisible by $3$? Easy: Add up the digits and see if the sum is divisible by $3$. For instance, $1642$ is not divisible by $3$ because $1+6+4+2=13$ is not, but $1644$ is because $1+6+4+4=15$ is divisible by $3$.

The list goes on; there are nice, well-known divisibility rules for $4$, $5$, $6$, $8$, $9$, $10$, $11$, and $12$. But $7$ turns out to be rather annoying. Many of the existing texts on this subject use a recursive rule that goes something like this: Cross off the last digit, double that digit, and subtract it from the number that remains. Then keep doing that until you get a small number and see if the result is divisible by $7$.

Ouch.

Not only is this method somewhat cumbersome, the small number you get in the end does not tell you the remainder mod $7$ (see this post for an introduction to modular arithmetic), only whether or not that remainder is $0$. But it turns out that one can derive a different method along the same lines as the divisibility rule for $3$. (See this Wikipedia article for many, many divisibility rules!)

First, let’s recall the proof of the rule for $3$. Given a number in base ten, like $1642$, we can write it out as a sum of powers of $10$:

$$1642=1\cdot 10^3+6\cdot 10^2+4\cdot 10+2$$
Notice that $10$ has a remainder of $1$ when divided by $3$, so when looking at remainders mod $3$, we can replace all these $10$’s in the expression above with $1$’s:
$$1642\equiv 1+6+4+2 \pmod{3}$$ and we have recovered the sum of the digits expression.

Similarly, for mod $7$, we can write things out in terms of powers of $10$, and reduce these powers modulo $7$. The number $10$ itself is $3$ mod $7$, then $100\equiv 10^2\equiv 3^2$ is $2$ mod $7$. The next remainders, for $10^3,10^4, 10^5$ are $6,4,5$, and then the remainder when $10^6$ is divided by $7$ is $1$ again, wrapping around to the same value as $10^0$. Writing this out in a table, we see a pattern start to emerge:

$$\begin{array}{cccccccccc}
n & 1 & 10 & 10^2 & 10^3 & 10^4 & 10^5 & 10^6 & 10^7 & 10^8 \\
n \bmod 7 & 1 & 3 & 2 & 6 & 4 & 5 & 1 & 3 & 2
\end{array}$$

Indeed, the remainders $1,3,2,6,4,5$ repeat every six steps, since at each step we are simply multiplying the previous remainder by $10$ and taking the result mod $7$. But we can make this pattern even simpler by replacing $6$ with $-1$, $4$ with $-3$, and $5$ with $-2$, which still have the same remainders mod $7$ (they differ by $7$ from the originals):

$$\begin{array}{cccccccccc}
n & 1 & 10 & 10^2 & 10^3 & 10^4 & 10^5 & 10^6 & 10^7 & 10^8 \\
n \bmod 7 & 1 & 3 & 2 & -1 & -3 & -2 & 1 & 3 & 2
\end{array}$$

So, the pattern simply continues $1,3,2,-1,-3,-2,1,3,2,-1,-3,-2,\ldots$. We can therefore replace the powers of $10$ in a base ten expansion by these smaller digits to compute the number mod $7$. For instance, $$1642=10^3+6\cdot 10^2+4\cdot 10+2\equiv -1+6\cdot 2+4\cdot 3+2=25\equiv 4\pmod 7,$$ so $1642$ has a remainder of $4$ when divided by $7$.

As a larger example, starting with the number $133251140379$, we can make a simple table with the pattern $1,3,2,-1,-3,-2,\ldots$ underneath the digits from right to left (we reverse the direction since the powers of $10$ increase to the left):

$$\begin{array}{cccccccccccc}
1 & 3 & 3 & 2 & 5 & 1 & 1 & 4 & 0 & 3 & 7 & 9 \\
-2 & -3& -1& 2 & 3 & 1 & -2& -3& -1& 2 & 3 & 1
\end{array}$$

Then we can simply multiply the pairs of numbers in column together (and take the results mod 7) and add the results mod $7$. We can write each product below the pairs:

$$\begin{array}{cccccccccccc}
1 & 3 & 3 & 2 & 5 & 1 & 1 & 4 & 0 & 3 & 7 & 9 \\
-2 & -3& -1& 2 & 3 & 1 & -2& -3& -1& 2 & 3 & 1 \\\hline
-2 & -2& -3& 4 & 1 & 1 & -2& -5& 0 & 6 & 0 & 2
\end{array}$$

(Notice that in the computation of the products we sometimes automatically take the result mod $7$: for instance, $5\cdot 3=15\equiv 1\pmod 7$, so under the fifth column we simply write $1$.) Finally, the sum of the numbers in the bottom row is $-2-2-3+4+1+1-2-5+6+2=0$, so $133251140379$ is indeed divisible by $7$.

I tested this method by hand on a few large numbers and compared my calculation time to the iterative subtraction method, and it saves a good chunk of time. The particular number above took me 35 seconds to check with the $1,3,2$-pattern method, and 69 seconds to check with the double-and-subtract rule. Try it out, and let me know in the comments whether you find this method easier!

Addendum: A similar rule for $13$

While this method may become more cumbersome for larger moduli (which will have a longer pattern to memorize), it turns out that powers of $10$ mod $13$ also have an easily-memorizable repeating pattern:

$$\begin{array}{cccccccccc}
n & 1 & 10 & 10^2 & 10^3 & 10^4 & 10^5 & 10^6 & 10^7 & 10^8 \\
n \bmod 13 & 1 & -3 & -4 & -1 & 3 & 4 & 1 & -3 & -4
\end{array}$$

The pattern is $1,-3,-4,-1,3,4$ repeating, which can be memorized by taking the length $3$ string $1,3,4$ and then writing negative signs on every other block of three entries starting at the tens digit. Let’s try it out on our example number above.

$$\begin{array}{cccccccccccc}
1 & 3 & 3 & 2 & 5 & 1 & 1 & 4 & 0 & 3 & 7 & 9 \\
4 & 3 &-1 &-4 &-3 & 1 & 4 & 3 & -1&-4 &-3 & 1 \\\hline
4 &-4 &-3 & 5 &-2 & 1 & 4 & -1& 0 & 1 & 5 & -4
\end{array}$$
We have $4-4-3+5-2+1+4-1+1+5-4=6$, so $133251140379$ has a remainder of $6$ when divided by $13$.