Can you Prove it… combinatorially?

This year’s Prove it! Math Academy was a big success, and it was an enormous pleasure to teach the seventeen talented high school students that attended this year. Some of the students mentioned that they felt even more inspired to study math further after our two-week program, but the inspiration went both ways – they inspired me with new ideas as well!

Maria Gemstones - 02

One of the many, many things we investigated at the camp was the Fibonacci sequence, formed by starting with the two numbers $0$ and $1$ and then at each step, adding the previous two numbers to form the next: $$0,1,1,2,3,5,8,13,21,\ldots$$ If $F_n$ denotes the $(n+1)$st term of this sequence (where $F_0=0$ and $F_1=1$), then there is a remarkable formula for the $n$th term, known as Binet’s Formula:

$$F_n=\frac{1}{\sqrt{5}}\left( \left(\frac{1+\sqrt{5}}{2}\right)^n – \left(\frac{1-\sqrt{5}}{2}\right)^n \right)$$

Looks crazy, right? Why would there be $\sqrt 5$’s showing up in a sequence of integers?

At Prove it!, we first derived the formula using generating functions. I mentioned during class that it can also be proven by induction, and later, one of our students was trying to work out the induction proof on a white board outside the classroom. She was amazed how many different proofs there could be of the same fact, and it got me thinking: what if we expand each of the terms using the binomial theorem? Is there a combinatorial proof of the resulting identity?

In particular, suppose we use the binomial theorem to expand $(1+\sqrt{5})^n$ and $(1-\sqrt{5})^n$ in Binet’s formula. The resulting expression is:

$$F_n=\frac{1}{\sqrt{5}\cdot 2^n}\left( \left(\sum_{i=0}^n \binom{n}{i}(\sqrt{5})^i \right) – \left(\sum_{i=0}^n (-1)^i\binom{n}{i}(\sqrt{5})^i \right) \right)$$

Note that the even terms in the two summations cancel, and combining the odd terms gives us:

$$F_n=\frac{1}{\sqrt{5}\cdot 2^n}\left( \sum_{k=0}^{\lfloor n/2\rfloor} 2 \binom{n}{2k+1}(\sqrt{5})^{2k+1} \right)$$
Since $(\sqrt{5})^{2k+1}=\sqrt{5}\cdot 5^k$, we can cancel the factors of $\sqrt{5}$ and multiply both sides by $2^{n-1}$ to obtain:

$$2^{n-1}\cdot F_n=\sum_{k=0}^{\lfloor n/2\rfloor} \binom{n}{2k+1}\cdot 5^k.$$

Now, the left hand and right hand side are clearly nonnegative integers, and one handy fact about nonnegative integers is that they count the number of elements in some collection. The proof method of counting in two ways is the simple principle that if by some method one can show that a collection $A$ has $n$ elements, and by another method one can show that $A$ has $m$ elements, then it follows that $m=n$. Such a “combinatorial proof” may be able to be used to prove the identity above, with $m$ being the left hand side of the equation and $n$ being the right.

I started thinking about this after Prove it! ended, and remembered that the $(n+1)$st Fibonacci number $F_n$ counts the number of ways to color a row of $n-2$ fenceposts either black or white such that no two adjacent ones are black. (Can you see why this combinatorial construction would satisfy the Fibonacci recurrence?) For instance, we have $F_5=5$, because there are five such colorings of a row of $3$ fenceposts:
$$\begin{array}{ccc} \square & \square & \square
\\ \\ \square & \square & \blacksquare
\\ \\ \square & \blacksquare & \square
\\ \\ \blacksquare & \square & \square
\\ \\ \blacksquare & \square & \blacksquare
\end{array}$$

Note also that $2^{n-1}$ counts the number of length-$(n-1)$ sequences of $0$’s and $1$’s. Thus, the left hand side of our identity, $2^{n-1}\cdot F_n$, counts the number of ways of choosing a binary sequence of length $n-1$ and also a fence post coloring of length $n-2$. Because of their lengths, given such a pair we can interlace their entries, forming an alternating sequence of digits and fence posts such as: $$1\, \square\, 0\, \square\, 1\, \blacksquare\, 1$$ We will call such sequences interlaced sequences.

We now need only to show that the right hand side also counts these interlaced sequences. See the next page for my solution, or post your own solution in the comments below!

What is a $q$-analog? (Part 2)

This is a continuation of Part 1 of this series of posts on $q$-analogs.

Counting by $q$’s

Another important area in which $q$-analogs come up is in combinatorics. In this context, $q$ is a formal variable, and the $q$-analog is a generating function in $q$, but viewed in a different light than usual generating functions. We think of the $q$-analog of as “$q$-counting” a set of weighted objects, where the weights are given by powers of $q$.

Say you’re trying to count permutations of $1,\ldots,n$, that is, ways of rearranging the numbers $1,\ldots,n$ in a row. There are $n$ ways to choose the first number, and once we choose that there are $n-1$ remaining choices for the second, then $n-2$ for the third, and so on. So there are $n!=n\cdot (n-1)\cdot \cdots \cdot 2\cdot 1$ ways to rearrange the entries. For instance, the $3!=6$ permutations of $1,2,3$ are $123$, $132$, $213$, $231$, $312$, $321$.

Now, say we weight the permutations according to how “mixed up” they are, in the sense of how many pairs of numbers are out of order. An inversion is a pair of entries in the permutation in which the bigger number is to the left of the smaller, and $\mathrm{inv}(\pi)$ denotes the number of inversions of the permutation $\pi$. The table below shows the permutations of 3 along with the number of inversions they contain.

$$\begin{array}{ccc}
p & \mathrm{inv}(p) & q^{\mathrm{inv}(p)}\\\hline
123 & 0 & 1 \\
132 & 1 & q\\
213 & 1 & q\\
231 & 2 & q^2\\
312 & 2 & q^2 \\
321 & 3 & q^3
\end{array}
$$

We weight each permutation $p$ by $q^{\mathrm{inv}(p)}$, and $q$-count by summing these $q$-powers, to form the sum $$\sum_{p\in S_n}q^{\mathrm{inv}(p)}$$ where $S_n$ is the set of all permutations of $1,\ldots,n$. So for $n=3$, the sum is $1+2q+2q^2+q^3$ by our table above.

We now come to an important philosophical distinction between $q$-analogs and generating functions. As a generating function, the sum $1+2q+2q^2+q^3$ is thought of in terms of the sequence of coefficients, $1,2,2,1$. Generatingfunctionologically, we might instead write the sum as $\sum_{i=0}^\infty c_i q^i$ where $c_i$ is the number of permutations of length $n$ with $i$ inversions. But in $q$-analog notation, $\sum_{p\in S_n}q^{\mathrm{inv}(p)}$, we understand that it is not the coefficients but rather the exponents of our summation that we are interested in..

In general, a combinatorial $q$-analog can be defined as a summation of $q$-powers $q^{\mathrm{stat}(p)}$ where $p$ ranges over a certain set of combinatorial objects and $\mathrm{stat}$ is a statistic on these objects. Recall that we defined an “interesting $q$-analog” of an expression $P$ to be an expression $P_q$ such that

  1. Setting $q=1$ or taking the limit as $q\to 1$ results in $P$,
  2. $P_q$ can be expressed in terms of (possibly infinite) sums or products of rational functions of $q$ over some field,
  3. $P_q$ gives us more refined information about something that $P$ describes, and
  4. $P_q$ has $P$-like properties.

Certainly setting $q=1$ in a combinatorial $q$-analog results in the total number of objects, and the $q$-analog gives us more information about the objects than just their total number. It’s also a polynomial in $q$, so it satisfies properties 1, 2, and 3 above. Let’s now see how our $q$-analog $\sum_{p\in S_n}q^{\mathrm{inv}(p)}$, which is a $q$-analog of $n!$, also satisfies property 4.

New-attempt

Notice that $1+2q+2q^2+q^3$ factors as $(1)(1+q)(1+q+q^2)$. Indeed, in general this turns out to be the same $q$-factorial we saw in the last post! That is, $$\sum_{p\in S_n}q^{\mathrm{inv}(p)}=(1)(1+q)(1+q+q^2)\cdots(1+q+\cdots+q^n)=(n)_q!.$$ So it satisfies property 4 by exhibiting a product formula like $n!$ itself. I posted a proof of this fact in this post, but let’s instead prove it by building up the theory of $q$-counting from the ground up.

The multiplication principle in combinatorics is the basic fact that the number of ways of choosing one thing from a set of $m$ things and another from a set of $n$ things is the product $m\cdot n$. But what if the things are weighted?

$q$-Multiplication Principle: Given two weighted sets $A$ and $B$ with $q$-counts $M(q)$ and $N(q)$, the $q$-count of the ways of choosing one element from $A$ and another from $B$ is the product $M(q)N(q)$, where the weight of a pair is the sum of the weights of the elements.

Let’s see how this plays out in the case of $(n)_q!$. If each entry in the permutation is weighted by the number of inversions it forms with smaller entries (to its right), then the first entry can be any of $1,2,\ldots,n$, which contributes a factor of $1+q+q^2+\cdots+q^{n-1}$ to the total. The next entry then can be any of the $n-1$ remaining entries, and since the first entry cannot be the smaller entry of an inversion with the second, this choice contributes a factor of $1+q+q^2+\cdots+q^{n-2}$ to the total by the same argument. Continuing in this fashion we get the $q$-factorial as our $q$-count.

Notice that even the proof was a $q$-analog of the proof that $n!$ is the number of permutations of $1,\ldots,n$, now that we have the $q$-Multiplication Principle.

That’s all for now! In the next post we’ll talk about how to use the $q$-Multiplication Principle to derive a combinatorial interpretation of the $q$-binomial coefficient, and discuss $q$-Catalan numbers and other fun $q$-analogs. Stay tuned!

What is a $q$-analog? (Part 1)

Hi, I’m Maria and I’m a $q$-analog addict. The theory of $q$-analogs is a little-known gem, and in this series of posts I’ll explain why they’re so awesome and addictive!

So what is a $q$-analog? It is one of those rare mathematical terms whose definition doesn’t really capture what it is about, but let’s start with the definition anyway:

Definition: A $q$-analog of a statement or expression $P$ is a statement or expression $P_q$, depending on $q$, such that setting $q=1$ in $P_q$ results in $P$.

So, for instance, $2q+3q^2$ is a $q$-analog of $5$, because if we plug in $q=1$ we get $5$.

Sometimes, if $P_q$ is not defined at $q=1$, we also say it’s a $q$-analog if $P$ can be recovered by taking the limit as $q$ approaches $1$. For instance, the expression $$\frac{q^5-1}{q-1}$$ is another $q$-analog of $5$ – even though we get division by zero if we plug in $q=1$, we do have a well defined limit that we can calculate, for instance using L’Hospital’s Rule: $$\lim_{q\to 1} \frac{q^5-1}{q-1}=\lim_{q\to 1} \frac{5q^4}{1} = 5.$$

Now of course, there are an unlimited supply of $q$-analogs of the number $5$, but certain $q$-analogs are more important than others. When mathematicians talk about $q$-analogs, they are usually referring to “good” or “useful” $q$-analogs, which doesn’t have a widely accepted standard definition, but which I’ll attempt to define here:

More Accurate Definition: An interesting $q$-analog of a statement or expression $P$ is a statement or expression $P_q$ depending on $q$ such that:

  1. Setting $q=1$ or taking the limit as $q\to 1$ results in $P$,
  2. $P_q$ can be expressed in terms of (possibly infinite) sums or products of rational functions of $q$ over some field,
  3. $P_q$ gives us more refined information about something that $P$ describes, and
  4. $P_q$ has $P$-like properties.

Because of Property 2, most people would agree that $5^q$ is not an interesting $q$-analog of $5$, because usually we’re looking for polynomial-like things in $q$.

cropped-pic-fixed

On the other hand, $\frac{q^5-1}{q-1}$, is an excellent $q$-analog of $5$ for a number of reasons. It certainly satisfies Property 2. It can also be easily generalized to give a $q$-analog of any real number: we can define $$(a)_q=\frac{q^a-1}{q-1},$$ a $q$-analog of the number $a$.

In addition, for positive integers $n$, the expression simplifies:
$$(n)_q=\frac{q^n-1}{q-1}=1+q+q^2+\cdots+q^{n-1}.$$
So for instance, $(5)_q=1+q+q^2+q^3+q^4$, which is a natural $q$-analog of the basic fact that $5=1+1+1+1+1$. The powers of $q$ are just distinguishing each of our “counts” as we count to $5$. This polynomial also captures the fact that $5$ is prime, in a $q$-analog-y way: the polynomial $1+q+q^2+q^3+q^4$ cannot be factored into two smaller-degree polynomials with integer coefficients. So the $q$-number $(5)_q$ also satisfies Properties 3 and 4 above: it gives us more refined information about $5$-ness, by keeping track of the way we count to $5$, and behaves like $5$ in the sense that it can’t be factored into smaller $q$-analogs of integers.

But it doesn’t stop there. Properties 3 and 4 can be satisfied in all sorts of ways, and this $q$-number is even interesting than we might expect. It comes up in finite geometry, analytic number theory, representation theory, and combinatorics. So much awesome mathematics is involved in the study of $q$-analogs that I’ll only cover one aspect of it today: $q$-analogs that appear in geometry over a finite field $\mathbb{F}_q$. Turn to the next page to see them!

The r-major index

In this post and this one too, we discussed the inv and maj statistics on words, and two different proofs of their equidistribution. In fact, there is an even more unifying picture behind these statistics: they are simply two different instances of an entire parameterized family of statistics, called $r\text{-maj}$, all of which are equidistributed!

Rawlings defined an $r$-inversion of a permutation $\pi$ to be a pair of entries $(\pi_i,\pi_j)$ with $i\lt j$ and $$0\lt \pi_i-\pi_j\lt r.$$ For instance, $21534$ has three total inversions, $(2,1)$, $(5,4)$, and $(5,3)$, but only the first two have $\pi_i-\pi_j<2$, so it has two $2$-inversions. He also defined an $r$-descent to be an index $i$ for which $$\pi_i\ge \pi_{i+1}+r,$$ so that $21534$ has only position $3$ as a $2$-descent.

Finally, he defines the $r\text{-maj}$ of a permutation to be $$r\text{-}\mathrm{maj}(\pi)=\left(\sum_{\pi_i\ge \pi_{i+1}+r}i\right)+\#r\text{-}\mathrm{inversions}.$$ Thus $2\text{-}\mathrm{maj}(21534)=3+2=5$. Notice that $1\text{-maj}$ is the usual major index, and $n\text{-maj}$ is the inv statistic!

Rawling’s result is that these statistics all have the same distribution: for any $r,s\ge 1$, the number of permutations of $\{1,2,\ldots,n\}$ having $r\text{-maj}$ value $k$ is the same as the number of them having $s\text{-maj}$ value $k$ for any $k$. More succinctly, $$\sum_{\pi\in S_n} q^{r\text{-maj}(\pi)}=\sum_{\pi\in S_n} q^{s\text{-maj}(\pi)}.$$

A colleague of mine mentioned this result and challenged me to prove it without reading the proof first, so here goes. I challenge the reader to do the same before turning to the next page. Good luck!

Symmedians and circumcircles

Classical Euclidean geometry is one of the oldest branches of mathematics, and in my opinion still as beautiful as ever. Here is a cute little geometric gemstone that was introduced to me recently by a fellow staff member at MOSP.

Consider a triangle $ABC$ drawn in the plane. Let $M$ be the midpoint of $BC$, and let $O$ be the circumcenter of triangle $ABC$, that is, the center of the circle $\Omega$ passing through $A$, $B$, and $C$.

symmedians1

Now let $\omega$ the circumcircle of triangle $BOC$, and let $D$ be the point on $\omega$ outside of $ABC$ for which $MD$ is perpendicular to $BC$. Then it turns out that the line $AD$ can be obtained by reflecting the median $AM$ across the angle bisector of angle $A$! This reflection of $AM$ is called a symmedian of the triangle.

symmedians2

How can we go about proving this? Well, it suffices to show that angle $BAD$ is congruent to angle $CAM$. One way to go about showing angle measures are equal is by finding similar triangles. Triangles $AMC$ and $ABD$ look promising; let’s start by calculating angle $ABD$. This is equal to angle $B$ plus angle $CBD$, which is an inscribed angle on circle $\omega$. By the inscribed angle theorem (there is a beautiful interactive demonstration at that link!) angle $CBD$ is equal to angle $DOC$, which is half the arc $BC$ on circle $\Omega$. Thus angle $DOC$, and hence angle $CBD$, is equal to angle $A$ of the triangle.

We now have that angle $ABD$ is equal to angle $B$ plus angle $A$, which is $180-C$ (in degrees). So angle $ADB$ is $C-\theta$ where $\theta$ is angle $BAD$. Unfortunately this is not congruent to angle $C$, so triangles $AMC$ and $ABD$ are not similar.

However, they’re rather close to being similar, in the following sense. Suppose we extend line $AB$ to a point $X$ for which $XD=DB$. Then triangle $BXD$ is isosceles, so angle $AXD$ is equal to angle $XBD$, which is supplementary to angle $ABD$. By our calculation above, angle $ABD$ is $A+B$ and so $AXD=XBD=C$. So, it is possible that triangles $ADX$ and $AMC$ are similar; this would suffice. (Notice also that this is a classic case of the falsity of “Angle Side Side” similarity – if it only takes an angle and two side lengths, then $ABD$ and $AMC$ would be similar too!)

symmedians3

Finally, we also extend $XD$ to meet the extension of $AC$ at $Y$. Then by looking at the sum of the angles in triangle $AXY$, we have that the angle at $Y$ is angle $B$. We also know by symmetry that angle $DCY$ is $B$, and so $DCY$ is isosceles and $DC=DY$. But $DC=DB$ as well since $D$ lies on the perpendicular bisector of $BC$. So $$DY=DC=DB=DX,$$ and so $D$ is the midpoint of $XY$!

We are now almost done. Triangles $ABC$ and $AYX$ are now similar since all their angles are congruent, so we have $$\frac{AC}{AX}=\frac{BC}{XY}=\frac{2MC}{2DX}=\frac{MC}{DX},$$ so by Side Angle Side similarity, triangles $AMC$ and $ADX$ are similar. So indeed, angle $BAD$ is equal to angle $MAC$.

Geometry of the real projective plane

I’ll be flying to Nebraska tomorrow to teach at the Math Olympiad Summer Program. One of my more advanced classes will be on projective geometry and how it can help us understand Euclidean geometry. I thought I would share the introduction from my handout, since it is an attempt to demonstrate one of the many reasons projective geometry is such a gemstone:

General projective planes

Let’s start with some general theory. A projective plane is any set $P$, which we call the set of points, and a nonempty collection $L$ of subsets of $P$ called lines, such that the following four axioms hold:

P1: Every pair of distinct points are contained in exactly one line.

P2: Every pair of distinct lines intersect in exactly one point.

P3: There are four distinct points $a,b,c,d\in P$ no three of which lie on a line.

P4: Every line contains at least three points.

Axiom P2 raises some alarms. In Euclidean geometry, we assume that there are parallel lines that never meet, but here, we’re requiring that every pair of lines intersect. So the usual Euclidean plane is not a projective plane.

However, there are many interesting projective planes that do come in handy. The simplest and smallest example of a projective plane is known as the Fano plane, or $F_7$, consisting of seven points and seven lines as shown below.

Fano

Notice the terms “point” and “line” are interchangeable in the axioms above. We can in fact interchange the roles of points and lines in any projective plane to obtain another projective plane, called the “dual” projective plane. Some projective planes, like the Fano plane above, are even self-dual! (In the diagram above, we can associate each point $A, B, C,\ldots$ with its corresponding lowercase-letter line $a, b, c,\ldots$, and incidence is preserved: points $A$ lies on lines $b$, $c$, and $d$, whereas line $a$ contains points $B$, $C$, and $D$. And so on.)

The projective plane we’ll be dealing with today is the real projective plane, which is an extension of the Euclidean plane that gets rid of those nasty parallel lines by adding an extra line out at infinity. Let’s take a look.

The real projective plane

The real projective plane was first defined by Desargues as an extension of the Euclidean plane $\mathbb{R}^2$ that forces parallel lines to intersect “at infinity”, hence creating a plane that satisfies axiom P2 rather than the parallel postulate. When you think about it, this is a rather natural model of things we see in reality. Suppose you are standing on parallel train tracks and looking out towards the horizon along the direction of the tracks. The tracks are parallel, but they appear to converge to a point on the horizon. In this sense, Euclidean geometry is simply a local version of projective geometry; parallel lines are only truly “parallel” near your feet!

To make this rigorous, define a pencil of parallel lines in $\mathbb{R}^2$ to be the set of all lines parallel to a given line in the plane. Then the real projective plane is the set $\mathbb{R}^2\cup l^\infty$ where $l^\infty$ is a set consisting of one point for each pencil of parallel lines in $\mathbb{R}^2$. A line in this plane is then one of:

  • $l^\infty$,
  • Any subset of $\mathbb{R}^2\cup l^\infty$ consisting of a line in $\mathbb{R}^2$ along with the associated point in $l^\infty$.

It’s not hard to see that this satisfies axioms P1-P4 above. Any two lines that were parallel in $\mathbb{R}^2$ now meet at the point at infinity corresponding to their pencil.

Homogeneous coordinates

There is another, more symmetric way to define the real projective plane. We can think of a point in the real projective plane as a ratio $(a:b:c)$ of three real numbers, not all zero, called its homogeneous coordinates. (Since we only care about the ratio between the numbers, $(2:4:5)$ and $(6:12:15)$ describe the same point.)

A line is then defined as the set of solutions $(a:b:c)$ to a linear equation $$\alpha x +\beta y + \gamma z=0$$ for some fixed $\alpha$, $\beta$, $\gamma$.

It’s not hard to see why this is equivalent to our first definition of the real projective plane. Each point either has $z=0$ or $z\neq 0$. Those with $z\neq 0$, we can normalize so that $z=1$, and then take $(x:y:1)$ to be the point $(x,y)$ in $\mathbb{R}^2$. The points with $z=0$ then form the new line $l^\infty$, where $(x:y:0)$ corresponds to the direction with “slope” $y/x$ (or the vertical direction, if $x$ is $0$.)

Exercise. Show that the unit circle, expressed in homogeneous coordinates, is described by the equation $$x^2+y^2=z^2.$$ Why doesn’t the equation $x^2+y^2=1$ make sense in the projective plane?

The q-factorial in terms of the major index

EDIT: As pointed out in the comments, the following proof is found in Knuth’s book, The Art of Computer Programming, Vol. 3, and is attributed to MacMahon.

There are several nice series expansions involving basic permutation statistics which I’ve been reviewing lately. One of these involves the number of inversions of a permutation.

An inversion of a permutation of the numbers $1,2,\ldots,n$ is a pair of numbers which appear “out of order” in the sequence, that is, with the larger number appearing first. For instance, for $\DeclareMathOperator{\maj}{maj} \DeclareMathOperator{\inv}{inv} n=4$, the permutation $3,1,4,2$ has three inversions: $(3,1)$, $(3,2)$, and $(4,2)$. We write $$\inv(3142)=3.$$

It is not hard to show that
$$(1)(1+q)(1+q+q^2)\cdots(1+q+q^2+\cdots+q^{n-1})=\sum q^{\inv(w)}$$ where the sum ranges over all permutations $w$ of $\{1,2,\ldots,n\}$. (See page 3 of this post for a proof.) The product on the left side is called the $n$th $q$-factorial.

The remarkable thing is that there is a similar expansion of the $q$-factorial in terms of the major index. Define a descent of a permutation to be an index $i$ for which the $i$th number is greater than the $(i+1)$st. The major index is defined to be the sum of the descents. For instance, the permutation $3,1,4,2$ has two descents, in positions $1$ and $3$, so the major index is $4$. We write $$\maj(3142)=4.$$

With this notation, we have
\begin{equation}(1)(1+q)(1+q+q^2)\cdots(1+q+q^2+\cdots+q^{n-1})=\sum_{w\in S_n} q^{\maj(w)}\end{equation}
as well, implying that the number of permutations with $k$ inversions is equal to the number of permutations with major index $k$, for any $k$. There is a direct bijection showing this called the Foata bijection.

I was seeking a way to prove equation (1) directly, without the Foata bijection. But the simplest proof I was able to find in the literature, in Stanley’s Enumerative Combinatorics, proved it by showing something much more general and then simply stating it as a corollary. Wishing to see a direct proof of the result, I went through Stanley’s more general proof step-by step to see what was going on in this special case. I thought I would share it here since I had a hard time finding it elsewhere. (Turn to page 2.)

What happens in characteristic p?

There is a reason that the real numbers and complex numbers are so popular. They have the nice property that $1+1+1+\cdots+1$ is not equal to zero, no matter how many times you add $1$ to itself.

Strange things happen in fields of prime characteristic $p\neq 0$, such as the field $\mathbb{Z}/p\mathbb{Z}$ of integers taken modulo $p$. In these fields, $1+1+\cdots+1=0$ for any number of $1$’s which is a multiple of $p$. We get non-intuitive but strangely beautiful identities like $(x+y)^p=x^p+y^p$, since the coefficient $\binom{p}{i}$ is divisible by $p$ for $1\le i\le p-1$.

And, we get a representation of the symmetric group which is indecomposable but not irreducible. Allow me to explain.

The symmetric group $S_n$ is the group of permutations of $\{1,2,\ldots,n\}$ under composition. We can represent these elements as $n\times n$ permutation matrices, by assigning to the permutation $\pi:\{1,2,\ldots,n\}\to \{1,2,\ldots,n\}$ the matrix whose $(i,\pi(i))$th entry is $1$ for all $i$, and whose other entries are all $0$. For example, for $n=3$:

  • The permutation $(123)$ is assigned the matrix:
    $$\left(\begin{array}{ccc}
    0 & 0 & 1\\
    1 & 0 & 0\\
    0 & 1 & 0
    \end{array}
    \right)$$
  • The identity permutation $(1)(2)(3)$ is assigned the identity matrix:
    $$\left(\begin{array}{ccc}
    1 & 0 & 0\\
    0 & 1 & 0\\
    0 & 0 & 1
    \end{array}
    \right)$$
  • The permutation $(12)(3)$ is assigned the matrix:
    $$\left(\begin{array}{ccc}
    0 & 1 & 0\\
    1 & 0 & 0\\
    0 & 0 & 1
    \end{array}
    \right)$$

And so on. This gives an action of $S_n$ on the vectors in $\mathbb{R}^n$ by multiplying by the associated matrix. For instance, $(123)$ sends the vector $(2,4,-3)$ to $(-3,2,4)$ by permuting the coordinates.

There are a few things that are “fixed” by this action. For one, any permutation in $S_3$ maps the vector $(1,1,1)$ to itself. In fact, it fixes the one-dimensional subspace spanned by this vector: every vector of the form $(r,r,r)$ is fixed under any permutation. Let’s call this subspace $U$.

Decomposing vector spaces as the direct sum of two subspaces is fun, so let’s look at the space perpendicular to the all $1$’s vector: the space of vectors whose entries sum to $0$. In $\mathbb{R}^3$, this would be the plane $x+y+z=0$. Notice that every permutation fixes this space too: any permutation of a vector whose sum is $0$, such as $(1,1,-2)$, also has sum $0$. Let’s call this subspace $W$.

So, we can write $\mathbb{R}^n$ as $U\oplus W$, where the action of $S_n$ on $U$ and $W$ give two smaller matrix representations of $S_n$.

This worked out quite nicely when dealing with real numbers. But what happens if we are instead considering vectors whose entries are from a field $k$ of characteristic $p$, for some $p$ dividing $n$? For instance, consider $k^3$ where $k$ has characteristic $3$. Then the vector $(1,1,1)$, an element of the subspace $U$ from above, has sum $1+1+1=0$. But that means that $(1,1,1)$ is in $W$ as well! The spaces that used to be perpendicular are now nested; $U$ is contained in $W$ altogether.

In fact, this representation is now indecomposable; it cannot be written as a direct sum of two smaller representations. It is not irreducible, however; the sub-representation $W$ is still well-defined, but it is not a direct summand of the entire space!

Strange things happen in characteristic $p$.

Equilateral triangles in the complex plane

I came across an exercise in Ahlfors’ Complex Analysis the other day that got me thinking. The exercise asked to prove that the complex numbers $a$, $b$, and $c$ form the vertices of an equilateral triangle if and only if $a^2+b^2+c^2=ab+bc+ca.$ It struck me as quite a nice, simple, and symmetric condition.

My first instinct, in going about proving this, was to see if the condition was translation invariant, so that one of the points can be moved to the origin. Indeed, if we subtract a constant $z$ from each of $a,b,c$ the equation becomes
$(a-z)^2+(b-z)^2+(c-z)^2=(a-z)(b-z)+(b-z)(c-z)+(c-z)(a-z),$
which simplifies to the original equation after expanding each term. So, we can assume without loss of generality that $a=0$, and we wish to show that $0$, $b$, and $c$ form the vertices of an equilateral triangle if and only if $b^2+c^2=bc$.

To finish up, since we can multiply $b$ and $c$ by a constant scaling factor without changing the condition, we can further assume that $b=1$. So then $1+c^2=c$, implying that $c=\frac{1}{2}\pm \frac{\sqrt{3}}{2}i$. So indeed, the three points form an equilateral triangle. QED.

This proof works, but is somehow deeply unsatisfying. I wanted to find a more “symmetric” proof, that didn’t involve moving one of the points to an origin and another to an axis. Such a coordinate-free condition should have a coordinate-free proof.

Another way to approach it is to first manipulate the condition a bit:
\begin{eqnarray*}
a^2+b^2+c^2-ab-bc-ca&=& 0 \\
2a^2+2b^2+2c^2-2ab-2bc-2ca &=& 0 \\
(a^2-2ab+b^2)+(b^2-2bc+c^2)+(c^2-2ac+a^2) &=& 0 \\
(a-b)^2+(b-c)^2+(c-a)^2=0
\end{eqnarray*}

So, we have re-written the equation as a sum of three squares that equals $0$. (Too bad we’re looking for complex and not real solutions!)

Setting $x=a-b$, $y=b-c$, and $z=c-a$, the condition now becomes $x^2+y^2+z^2=0$, along with the additional equation $x+y+z=0$.

Notice that $x$, $y$, and $z$ are, in some sense, the sides of the triangle $abc$ as vectors. So, we are trying to show that if $x^2+y^2+z^2=0$ and $x+y+z=0$, then $x$, $y$, and $z$ all have the same distance to $0$ and are spaced $120^\circ$ apart from each other around the origin. This is starting to sound more tractable.

In fact, upon closer inspection, we are simply studying the intersection of two curves in the complex projective plane. The points in this plane are equivalence classes of triples $(x:y:z)$, not all zero, up to scaling by a complex number (which is dilation and rotation, geometrically). So, we can think of the equation $x^2+y^2+z^2=0$ as a quadratic curve in this plane, and the equation $x+y+z=0$ as a line, and we simply want to know where these curves intersect.

Intuitively, a quadratic curve (such as a circle or a parabola) and a line should intersect in at most 2 points. Indeed, Bezout’s theorem tells us that two complex projective curves of degrees $d$ and $e$ intersect in at most $de$ points. Here, $d$ and $e$ are $2$ and $1$, so we have at most $2$ solutions $(x:y:z)$ to the equations $x^2+y^2+z^2=0$ and $x+y+z=0$.

But we already know two solutions: first, any $x,y,z$ that are equal in norm and spaced at $120^\circ$ angles around $0$ will satisfy $x+y+z=0$, since they form an equilateral triangle with its centroid at the origin. Since squaring them will double their angles, $x^2$, $y^2$, and $z^2$ are at angles of $240^\circ$ from each other, which is the same as $120^\circ$ in the opposite direction! So, $x^2+y^2+z^2=0$ as well in this case.

Moreover, such $x$, $y$, and $z$ are unique up to dilation, rotation, and reflection of the equilateral triangle (say, interchanging $x$ and $y$). But scaling or rotating $x$, $y$, and $z$ by a complex constant does not change the point $(x:y:z)$, and so the only other distinct point is formed by reflecting the triangle. (Thanks to Bryan Gillespie for this helpful geometric insight!)

So we already know two distinct solutions, and by Bezout’s theorem there cannot be any others. It follows that the original triangle $abc$ must be equilateral.

Somehow more satisfactorily, QED.

Theme and variations: the Newton-Girard identities

Time for another gemstone from symmetric function theory! (I am studying for my Ph.D. qualifying exam at the moment, and as a consequence, the next several posts will feature yet more gemstones from symmetric function theory. You can refer back to this post for the basic definitions.)

Start with a polynomial $p(x)$ that factors as
$$p(x)=(x-\alpha_1)(x-\alpha_2)\cdots(x-\alpha_n).$$ The coefficients of $p(x)$ are symmetric functions in $\alpha_1,\ldots,\alpha_n$ – in fact, they are, up to sign, the elementary symmetric functions in $\alpha_1,\ldots,\alpha_n$.

In particular, if $e_i$ denotes $\sum_{j_1<\cdots<j_i} \alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_i}$, then
$$p(x)=x^n-e_1x^{n-1}+e_{2}x^{n-2}-\cdots+(-1)^n e_n.$$ (These coefficients are sometimes referred to as Vieta’s Formulas.)

Since $p(\alpha_i)=0$ for all $i$, we can actually turn the equation above into a symmetric function identity by plugging in $\alpha_1,\ldots,\alpha_n$:
$$\begin{eqnarray*}
\alpha_1^n-e_1\alpha_1^{n-1}+e_{2}\alpha_1^{n-2}-\cdots+(-1)^n e_n&=&0 \\
\alpha_2^n-e_1\alpha_2^{n-1}+e_{2}\alpha_2^{n-2}-\cdots+(-1)^n e_n&=&0 \\
\vdots\hspace{3cm}& &
\end{eqnarray*}$$ and then summing these equations:
$$(\alpha_1^n+\cdots+\alpha_n^n)-e_1(\alpha_1^{n-1}+\cdots+\alpha_n^{n-1})+\cdots+(-1)^n\cdot ne_n=0$$ …and we have stumbled across another important basis of the symmetric functions, the power sum symmetric functions. Defining $p_i=\alpha_1^i+\cdots+\alpha_n^i$, every symmetric function in $\{\alpha_j\}$ can be uniquely expressed as a linear combination of products of these $p_i$`s (we write $p_\lambda=p_{\lambda_1}p_{\lambda_2}\cdots p_{\lambda_n}$.)

So, we have
$$p_n-e_1p_{n-1}+e_2 p_{n-2}-\cdots+(-1)^n\cdot ne_n=0.$$ This is called the $n$th Newton-Girard identity, and it gives us a recursive way of expressing the $p$`s in terms of the $e$`s.

Well, almost. We have so far only shown that this identity holds when dealing with $n$ variables. However, we can plug in any number of zeros for the $\alpha_i$`s to see that the identity also holds for any number of variables $k<n$. And, if we had more than $n$ variables, we can compare coefficients of each monomial individually, which only can involve at most $n$ of the variables at a time since the equation is homogeneous of degree $n$. Setting the rest of the variables equal to zero for each such monomial will do the trick.

So now we have it! The Newton-Girard identities allow us to recursively solve for $p_n$ in terms of the $e_i$`s. Wikipedia does this nicely and explains the computation, and the result is:
$$p_n=\det\left(\begin{array}{cccccc}
e_1 & 1 & 0 & 0 &\cdots & 0 \\
2e_2 & e_1 & 1 & 0 & \cdots & 0 \\
3e_3 & e_2 & e_1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
(n-1)e_{n-1} & e_{n-2} & e_{n-3} & e_{n-4} & \ddots & 1 \\
ne_n & e_{n-1} & e_{n-2} & e_{n-3} & \cdots & e_1
\end{array}\right)$$ For instance, this gives us $p_2=e_1^2-2e_2$, which is true:
$$x^2+y^2+z^2=(x+y+z)^2-2(xy+yz+zx).$$ The Wikipedia page derives a similar identity for expressing the $e$`s in terms of the $p$`s. It also does the same for expressing the complete homogeneous symmetric functions $h_n$ in terms of the $p$`s and vice versa. However, it does not explicitly express the $e$`s in terms of the $h$`s or vice versa. In the name of completeness of the Internet, let’s treat these here.

Fix some number of variables $x_1,\ldots,x_n$. For any $d$, define $h_d$ to be the sum of all monomials of degree $d$ in $x_1,\ldots,x_n$. This is clearly symmetric, and we define
$$h_\lambda = h_{\lambda_1}h_{\lambda_2}\cdots h_{\lambda_k}$$ for any partition $\lambda$, as we did for the elementary symmetric functions last week. The $h_\lambda$`s, called the complete homogeneous symmetric functions, form a basis for the space of symmetric functions.

It’s a fun exercise to derive the following generating function identities for $e_n$ and $h_n$:
$$H(t)=\sum_n h_n t^n = \prod_{i=1}^n \frac{1}{1-x_i t}$$ $$E(t)=\sum_n e_n t^n = \prod_{i=1}^n (1+x_i t)$$ (The first requires expanding out each factor as a geometric series, and then comparing coefficients. Try it!)

From these, we notice that $H(t)E(-t)=1$, and by multiplying the generating functions together and comparing coefficients, we find the identities
$$h_n=h_{n-1}e_1-h_{n-2}e_2+\cdots+(-1)^{n-1}e_n$$ Just as before, this gives us a recursion for $h_n$ in terms of the $e_i$`s. With a bit of straightforward algebra, involving Cramer’s Rule, we can solve for $h_n$:

$$h_n=\det\left(\begin{array}{cccccc}
e_1 & 1 & 0 & 0 &\cdots & 0 \\
e_2 & e_1 & 1 & 0 & \cdots & 0 \\
e_3 & e_2 & e_1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
e_{n-1} & e_{n-2} & e_{n-3} & e_{n-4} & \ddots & 1 \\
e_n & e_{n-1} & e_{n-2} & e_{n-3} & \cdots & e_1
\end{array}\right)$$ We can also use the same equations to solve for $e_n$ in terms of $h_n$:

$$e_n=\det \left(\begin{array}{cccccc}
h_1 & 1 & 0 & 0 &\cdots & 0 \\
h_2 & h_1 & 1 & 0 & \cdots & 0 \\
h_3 & h_2 & h_1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
h_{n-1} & h_{n-2} & h_{n-3} & h_{n-4} & \ddots & 1 \\
h_n & h_{n-1} & h_{n-2} & h_{n-3} & \cdots & h_1
\end{array}\right)$$

I find these two formulas to be more aesthetically appealing than the standard Newton-Gerard formulas between the $p_n$`s and $e_n$`s, since they lack the pesky integer coefficients that appear in the first column of the matrix in the $p$-to-$e$ case. While perhaps not as mainstream, they are gemstones in their own right, and deserve a day to shine.