In my graduate Advanced Combinatorics class last semester, I covered the combinatorics of crystal base theory. One of the concepts that came up in this context was ballot sequences, which are motivated by the following elementary problem about voting:
Suppose two candidates, A and B, are running for local office. There are 100 voters in the town, 50 of whom plan to vote for candidate A and 50 of whom plan to vote for candidate B. The 100 voters line up in a random order at the voting booth and cast their ballots one at a time, and the votes are counted real-time as they come in with the tally displayed for all to see. What is the probability that B is never ahead of A in the tally?
We’ll provide a solution to this classical problem on page 2 of this post. For now, this motivates the notion of a ballot sequence in two letters, which is a sequence of A’s and B’s such that, as the word is read from left to right, the number of A’s that have been read so far is always at least as large as the number of B’s.
For instance, the sequence AABABB is ballot, because as we read from left to right we get the words A, AA, AAB, AABA, AABAB, and AABABB, each of which has at least as many A’s as B’s. On the other hand, the sequence ABBAAB is not, because after reading the first three letters ABB, there are more B’s than A’s.
If we replace the A’s by $1$’s and $B$’s by $2$’s and reverse the words, we obtain the notion of a ballot sequence in $1$’s and $2$’s described in our previous post on crystals. In particular, we say a sequence of $1$’s and $2$’s is ballot if, when we read the word from right to left, there are at least as many $1$’s as $2$’s at each step. So $221211$ and $211111$ are both ballot, but $111112$ and $211221$ are not.
Enumerating all ballot sequences
When I introduced this notion in class, one of my students asked the following.
How many total ballot sequences of $1$’s and $2$’s are there of length $n$?
Now, as in the first question about voting above, the more common version of this type of question is to fix the number of $1$’s and $2$’s in the sequence (the “content” of the word) and ask how many ballot sequences have exactly that many $1$’s and $2$’s. But in this case, the question was asked with no fixed content, resulting in a sum of Littlewood-Richardson coefficients (or, in voting terms, where the voters have not yet decided who they will vote for when they line up, and may vote for either candidate).
To start, let’s try some examples. For $n=0$, there is only one ballot sequence, namely the empty sequence. For $n=1$, there is also just one: $1$. For $n=2$, there are two: $11$ and $21$. For $n=3$, there are three: $111$, $121$, $211$. For $n=4$, there are six: $1111$, $2111$, $1211$, $1121$, $2211$, $2121$. And for $n=5$, there are ten: $$11111, 21111, 12111, 11211, 11121, 22111, 21211, 21121, 12211, 12121$$
The sequence of answers, $1,1,2,3,6,10,\ldots$, so far agrees with the “middle elements” of the rows of Pascal’s triangle:
$$\begin{array}{ccccccccccc}
& & & & & \color{red}1 & & & & & \\
&&&&\color{red} 1&&1 &&&& \\
&&&1&&\color{red} 2&&1&&& \\
&&1&&\color{red} 3&&3&&1&& \\
&1&&4&&\color{red} 6&&4&&1& \\
1&&5&&{\color{red}{10}}&&10&&5&&1
\end{array}$$
More formally, it appears that the number of ballot sequences of $1$’s and $2$’s of length $2n$ is $\binom{2n}{n}$, and the number of length $2n+1$ is $\binom{2n+1}{n}$.
Now, it is possible to prove this formula holds using a somewhat complicated recursive argument, which we will also illustrate on page 2 of this post. But there is also very elegant solution using crystal operators.
Solution using crystals
Let’s recall the definition of the crystal operator $F_1$ on words of $1$’s and $2$’s. Given such a word, we first replace all $2$’s with left parentheses, “$($”, and all $1$’s with right parentheses, “$)$”. We then “cancel” left and right parentheses in matching pairs as shown in the following example.
\begin{array}{ccccccccccc}
2 & 2 & 1 & 1 & 1 & 1 & 2 & 1 & 2 & 2 & 1 \\
( & ( & ) & ) & ) & ) & ( & ) & ( & ( & ) \\
( & & & ) & ) & ) & & & ( & & \\
& & & & ) & ) & & & ( & &
\end{array}
Once all matching pairs have been cancelled, we are left with a subsequence of the form $$)))\cdots))(((\cdots(($$ consisting of some number of right parentheses (possibly zero) followed by some number of left parentheses (possibly zero). If there is a $)$ remaining, then $F_1$ changes the rightmost $)$ that was not cancelled to $($, changing that $1$ to $2$ in the original word. The word therefore becomes:
\begin{array}{ccccccccccc}
2 & 2 & 1 & 1 & 1 & 2 & 2 & 1 & 2 & 2 & 1.
\end{array}
Thus $F_1(22111121221)=22111221221$. If there were no $)$ symbols remaining after cancelling, the operator $F_1$ is undefined.
Now, consider the directed graph on all words of $1$’s and $2$’s of length $n$, where we draw an arrow from word $w$ to word $v$ if $F_1(w)=v$. Here is the graph for $n=4$:

This graph will in general be a union of disjoint one-directional chains, since when $F_1$ is defined it is invertible: the unbracketed $1$ that is changed to a $2$ is still unbracketed, and we can identify it as the leftmost unbracketed $2$ in the new word. We write $E_1$ to denote this inverse operator, which changes the leftmost unpaired $2$ to a $1$ if it exists, and is undefined otherwise.
We also cannot have cycles in the $F_1$ graph, because the number of $1$’s always decreases with every application of $F_1$. Thus we have chains of arrows going forward until we reach an element $w$ for which $F_1(w)$ is undefined. Similarly, going backwards along the $F_1$ arrows, we can continue until $E_1$ is undefined, and we call these top elements of each chain the highest weight words.
There are six highest weight words in the above diagram: $1111$, $2111$, $1211$, $1121$, $2211$, $2121$. Notice that these are precisely the two-letter ballot sequences of length $6$!
Indeed, if a word is ballot, then every $2$ as a left parentheses will be cancelled with some $1$ as a right parentheses to its right, so $E_1$ is undefined on such a word. Conversely, if a word is not ballot, consider the first step in the right-to-left reading of the word that has more $2$’s than $1$’s. The $2$ that is encountered at that step cannot be bracketed with a $1$ to its right, because there are not enough $1$’s to bracket with the $2$’s in that suffix. Thus a word is ballot if and only if $E_1$ is undefined, which means that it is at the top of its chain, or highest weight.
Since there is exactly one highest weight word per chain, we have the following.
The number of ballot sequences of length $n$ is equal to the number of chains in the $F_1$ crystal graph on all $2^n$ words of $1$’s and $2$’s of length $n$.
So, to count the ballot words, it suffices to count the chains of the $F_1$ graph. And here’s the key idea: instead of counting the top elements, count the middle ones!

In the picture above, the middle elements of each chain are: $$1122, 2112, 1212, 1221, 2211, 2121$$ which is just the set of all words having exactly two $1$’s and two $2$’s, and is clearly counted by $\binom{4}{2}$. Why does this work in general?
Here’s where we need one more fact about the $F_1$ chains: they are “content-symmetric”. If the top element of a chain has $k$ ones and $n-k$ twos, then the bottom element has $n-k$ ones and $k$ twos. This is because the top element, after pairing off an equal number of $2$’s and $1$’s by matching parentheses, has a certain number of unpaired $1$’s, which then all get changed to $2$’s one step at a time as we move towards the bottom of the chain. In particular, the middle element of each chain has exactly as many $1$’s as $2$’s (or, if $n$ is odd, the two “middle elements” have one more $1$ than $2$ and one less $1$ than $2$ respectively.)
Finally, since the graph is drawn on all $2^n$ possible words, every word having the same number of $1$’s as $2$’s (or off by $1$ in the odd case) occurs in exactly one chain. It follows that there is a bijection between the chains and these words, which are enumerated by $\binom{2n}{n}$ for words of length $2n$, and $\binom{2n+1}{n}$ for words of length $2n+1$.
For the more elementary approach, and the solution to the classical ballot problem, turn to the next page!
Very nice post! Is there a proof of the “hook length formula” using cystals?
That’s a good question! I don’t actually know. I’ll have to think about it!