Theme and variations: the Newton-Girard identities

Time for another gemstone from symmetric function theory! (I am studying for my Ph.D. qualifying exam at the moment, and as a consequence, the next several posts will feature yet more gemstones from symmetric function theory. You can refer back to this post for the basic definitions.)

Start with a polynomial $p(x)$ that factors as
$$p(x)=(x-\alpha_1)(x-\alpha_2)\cdots(x-\alpha_n).$$ The coefficients of $p(x)$ are symmetric functions in $\alpha_1,\ldots,\alpha_n$ – in fact, they are, up to sign, the elementary symmetric functions in $\alpha_1,\ldots,\alpha_n$.

In particular, if $e_i$ denotes $\sum_{j_1<\cdots<j_i} \alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_i}$, then
$$p(x)=x^n-e_1x^{n-1}+e_{2}x^{n-2}-\cdots+(-1)^n e_n.$$ (These expressions for the coefficients in terms of the roots are known as Vieta’s formulas.)
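To make this concrete, here is a quick sanity check in three variables, sketched with Python’s sympy library (the helper `e` and the variable names are my own choices, not anything standard):

```python
# Sketch: verify Vieta's formulas for a cubic with roots a1, a2, a3.
from itertools import combinations
import sympy

x = sympy.symbols('x')
roots = sympy.symbols('a1 a2 a3')

def e(k):
    # k-th elementary symmetric function of the roots
    return sum(sympy.prod(c) for c in combinations(roots, k))

p = sympy.expand(sympy.prod(x - a for a in roots))
vieta = x**3 - e(1)*x**2 + e(2)*x - e(3)
assert sympy.expand(p - vieta) == 0
```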

Since $p(\alpha_i)=0$ for all $i$, we can actually turn the equation above into a symmetric function identity by plugging in $\alpha_1,\ldots,\alpha_n$:
$$\begin{eqnarray*}
\alpha_1^n-e_1\alpha_1^{n-1}+e_{2}\alpha_1^{n-2}-\cdots+(-1)^n e_n&=&0 \\
\alpha_2^n-e_1\alpha_2^{n-1}+e_{2}\alpha_2^{n-2}-\cdots+(-1)^n e_n&=&0 \\
\vdots\hspace{3cm}& &
\end{eqnarray*}$$ and then summing these equations:
$$(\alpha_1^n+\cdots+\alpha_n^n)-e_1(\alpha_1^{n-1}+\cdots+\alpha_n^{n-1})+\cdots+(-1)^n\cdot ne_n=0$$ …and we have stumbled across another important basis of the symmetric functions, the power sum symmetric functions. Defining $p_i=\alpha_1^i+\cdots+\alpha_n^i$, every symmetric function in $\{\alpha_j\}$ can be uniquely expressed as a linear combination of products of these $p_i$’s (we write $p_\lambda=p_{\lambda_1}p_{\lambda_2}\cdots p_{\lambda_k}$ for a partition $\lambda=(\lambda_1,\ldots,\lambda_k)$).

So, we have
$$p_n-e_1p_{n-1}+e_2 p_{n-2}-\cdots+(-1)^n\cdot ne_n=0.$$ This is called the $n$th Newton-Girard identity, and it gives us a recursive way of expressing the $p$’s in terms of the $e$’s.
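As a sketch of how the recursion unwinds in practice (the helper name is my own, and the $e_i$’s are treated as formal symbols):

```python
# Sketch: unwind the Newton-Girard recursion
#   p_n = e_1 p_{n-1} - e_2 p_{n-2} + ... + (-1)^(n-1) n e_n
# symbolically, treating e_1, e_2, ... as formal symbols.
import sympy

def power_sum(n, e):
    # e is a list [e_1, e_2, ...]; e[k-1] plays the role of e_k.
    total = (-1)**(n - 1) * n * e[n - 1]
    for i in range(1, n):
        total += (-1)**(i - 1) * e[i - 1] * power_sum(n - i, e)
    return sympy.expand(total)

e1, e2, e3 = sympy.symbols('e1 e2 e3')
print(power_sum(2, [e1, e2, e3]))  # e1**2 - 2*e2
print(power_sum(3, [e1, e2, e3]))  # e1**3 - 3*e1*e2 + 3*e3
```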

Well, almost. We have so far only shown that this identity holds for exactly $n$ variables. However, we can plug in zeros for some of the $\alpha_i$’s to see that the identity also holds for any number of variables $k<n$. And if we have more than $n$ variables, we can compare the coefficients of each monomial individually: since the identity is homogeneous of degree $n$, each monomial involves at most $n$ of the variables, so setting the remaining variables equal to zero reduces to the case already proven.
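For instance, here is a sketch checking that the $n=3$ identity still holds in five variables (again with my own helper names):

```python
# Sketch: the n = 3 identity p_3 - e_1 p_2 + e_2 p_1 - 3 e_3 = 0
# holds in five variables, not just three.
from itertools import combinations
import sympy

xs = sympy.symbols('x1:6')  # x1, ..., x5

def e(k):
    return sum(sympy.prod(c) for c in combinations(xs, k))

def p(k):
    return sum(xi**k for xi in xs)

assert sympy.expand(p(3) - e(1)*p(2) + e(2)*p(1) - 3*e(3)) == 0
```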

So now we have it! The Newton-Girard identities allow us to recursively solve for $p_n$ in terms of the $e_i$’s. Wikipedia does this nicely and explains the computation, and the result is:
$$p_n=\det\left(\begin{array}{cccccc}
e_1 & 1 & 0 & 0 &\cdots & 0 \\
2e_2 & e_1 & 1 & 0 & \cdots & 0 \\
3e_3 & e_2 & e_1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
(n-1)e_{n-1} & e_{n-2} & e_{n-3} & e_{n-4} & \ddots & 1 \\
ne_n & e_{n-1} & e_{n-2} & e_{n-3} & \cdots & e_1
\end{array}\right)$$ For instance, this gives us $p_2=e_1^2-2e_2$, which is true:
$$x^2+y^2+z^2=(x+y+z)^2-2(xy+yz+zx).$$ The Wikipedia page derives a similar identity for expressing the $e$’s in terms of the $p$’s. It also does the same for expressing the complete homogeneous symmetric functions $h_n$ in terms of the $p$’s and vice versa. However, it does not explicitly express the $e$’s in terms of the $h$’s or vice versa. In the name of completeness of the Internet, let’s treat these here.
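Here is a sketch verifying the determinant formula for $n=3$ directly in the variables $x,y,z$ (same sympy setup as above):

```python
# Sketch: the 3x3 determinant formula recovers p_3 = x^3 + y^3 + z^3.
from itertools import combinations
import sympy

xs = sympy.symbols('x y z')
e = [sum(sympy.prod(c) for c in combinations(xs, k)) for k in (1, 2, 3)]

M = sympy.Matrix([
    [e[0],   1,    0   ],
    [2*e[1], e[0], 1   ],
    [3*e[2], e[1], e[0]],
])
p3 = sum(xi**3 for xi in xs)
assert sympy.expand(M.det() - p3) == 0
```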

Fix some number of variables $x_1,\ldots,x_n$. For any $d$, define $h_d$ to be the sum of all monomials of degree $d$ in $x_1,\ldots,x_n$. This is clearly symmetric, and we define
$$h_\lambda = h_{\lambda_1}h_{\lambda_2}\cdots h_{\lambda_k}$$ for any partition $\lambda$, as we did for the elementary symmetric functions last week. The $h_\lambda$’s, called the complete homogeneous symmetric functions, form a basis for the space of symmetric functions.
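For example, in three variables,
$$h_2 = x^2+y^2+z^2+xy+xz+yz,\qquad h_{(2,1)}=h_2h_1.$$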

It’s a fun exercise to derive the following generating function identities for $e_n$ and $h_n$:
$$H(t)=\sum_{d\ge 0} h_d t^d = \prod_{i=1}^n \frac{1}{1-x_i t}$$ $$E(t)=\sum_{d\ge 0} e_d t^d = \prod_{i=1}^n (1+x_i t)$$ (The first requires expanding out each factor as a geometric series, and then comparing coefficients. Try it!)
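Here is a sketch checking both identities in two variables up to low order, along with the relation $H(t)E(-t)=1$ that drives the next step:

```python
# Sketch: check the generating function identities in two variables.
import sympy

t, x1, x2 = sympy.symbols('t x1 x2')
H = 1 / ((1 - x1*t) * (1 - x2*t))   # claimed H(t)
E = (1 + x1*t) * (1 + x2*t)         # claimed E(t)

# Coefficient of t^2 in H(t) should be h_2 = x1^2 + x1*x2 + x2^2.
h2 = H.series(t, 0, 3).removeO().coeff(t, 2)
assert sympy.expand(h2 - (x1**2 + x1*x2 + x2**2)) == 0

# Coefficient of t^2 in E(t) should be e_2 = x1*x2.
assert sympy.expand(E).coeff(t, 2) == x1*x2

# Bonus: H(t)E(-t) = 1.
assert sympy.simplify(H * E.subs(t, -t)) == 1
```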

From these, we notice that $H(t)E(-t)=1$, and by multiplying the two generating functions together and comparing coefficients of $t^n$ on each side, we find the identity
$$h_n=h_{n-1}e_1-h_{n-2}e_2+\cdots+(-1)^{n-1}e_n.$$ Just as before, this gives us a recursion for $h_n$ in terms of the $e_i$’s. With a bit of straightforward algebra, involving Cramer’s Rule, we can solve for $h_n$:

$$h_n=\det\left(\begin{array}{cccccc}
e_1 & 1 & 0 & 0 &\cdots & 0 \\
e_2 & e_1 & 1 & 0 & \cdots & 0 \\
e_3 & e_2 & e_1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
e_{n-1} & e_{n-2} & e_{n-3} & e_{n-4} & \ddots & 1 \\
e_n & e_{n-1} & e_{n-2} & e_{n-3} & \cdots & e_1
\end{array}\right)$$ We can also use the same equations to solve for $e_n$ in terms of the $h_i$’s:

$$e_n=\det \left(\begin{array}{cccccc}
h_1 & 1 & 0 & 0 &\cdots & 0 \\
h_2 & h_1 & 1 & 0 & \cdots & 0 \\
h_3 & h_2 & h_1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
h_{n-1} & h_{n-2} & h_{n-3} & h_{n-4} & \ddots & 1 \\
h_n & h_{n-1} & h_{n-2} & h_{n-3} & \cdots & h_1
\end{array}\right)$$
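As a sketch, both determinant formulas check out for $n=3$ in three variables (here `h` is built from all degree-$k$ monomials via `combinations_with_replacement`; helper names are my own):

```python
# Sketch: verify the h_3 and e_3 determinant formulas in three variables.
from itertools import combinations, combinations_with_replacement
import sympy

xs = sympy.symbols('x y z')

def e(k):
    return sum(sympy.prod(c) for c in combinations(xs, k))

def h(k):
    # sum of ALL monomials of degree k
    return sum(sympy.prod(c) for c in combinations_with_replacement(xs, k))

det_h = sympy.Matrix([
    [e(1), 1,    0   ],
    [e(2), e(1), 1   ],
    [e(3), e(2), e(1)],
]).det()
assert sympy.expand(det_h - h(3)) == 0

det_e = sympy.Matrix([
    [h(1), 1,    0   ],
    [h(2), h(1), 1   ],
    [h(3), h(2), h(1)],
]).det()
assert sympy.expand(det_e - e(3)) == 0
```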

I find these two formulas to be more aesthetically appealing than the standard Newton-Girard formulas relating the $p_n$’s and $e_n$’s, since they lack the pesky integer coefficients that appear in the first column of the matrix in the $p$-to-$e$ case. While perhaps not as mainstream, they are gemstones in their own right, and deserve a day to shine.

Addendum: An alternate proof of the FTSFT

Last week I posted about the Fundamental Theorem of Symmetric Function Theory. Zarathustra Brady pointed me to the following alternate proof in Serge Lang’s book Algebra. While not as direct or useful in terms of changing basis from the $e_\lambda$’s to the $m_\lambda$’s, it is a nice, clean inductive proof that I thought was worth sharing:

Assume for contradiction that the $e_\lambda$’s do not form a basis of the space of symmetric functions. We have shown that they span the space, so there is a dependence relation: some nontrivial linear combination of $e_\lambda$’s, all necessarily of the same degree, is equal to zero. Among all such linear combinations, choose one (say $P$) that holds for the smallest possible number of variables $x_1,\ldots,x_n$. Furthermore, among the possible linear combinations for $n$ variables, choose $P$ to have minimal degree.

If the number of variables is $1$, then the only elementary symmetric function is $e_1=x_1$, so the $e_\lambda$’s are just the powers $x_1^k$, and there is clearly no linear dependence relation among these. So, $n\ge 2$. Furthermore, if $P$ has degree $1$ as a polynomial in the $x_i$’s, then it can involve only $e_1$, and so it cannot be identically zero. So $P$ has degree at least $2$, in at least $2$ variables.

Now, if $P$ is divisible by $x_n$, then by symmetry it is divisible by each of the variables. So, it is divisible by $e_n=x_1x_2\cdots x_n$, and we can divide the equation by $e_n$ to get a relation of smaller degree, contradicting our choice of $P$. Otherwise, if $P$ is not divisible by $x_n$, set $x_n=0$. Then we get another nontrivial relation among the $e_\lambda$’s in the smaller number of variables $x_1,\ldots,x_{n-1}$, again contradicting the choice of $P$. QED!

The Fundamental Theorem of Symmetric Function Theory

This little gemstone is hidden within the folds of algebraic combinatorics, but certainly deserves its name. Just as the Fundamental Theorem of Arithmetic gives us a way of writing common objects (numbers) in a canonical form (prime factorization), the Fundamental Theorem of Symmetric Function Theory allows us to express any symmetric function in a useful canonical form.

First, some background on symmetric functions. Like generating functions, symmetric functions arise naturally in combinatorics, and the coefficients are often the point of interest. In the poetic words of Herbert Wilf in his book generatingfunctionology:

A generating function is a clothesline on which we hang up a sequence of numbers for display.

I claim, and aim to demonstrate here, that

A symmetric function is a web of clotheslines on which we hang up a collection of numbers that are indexed by partitions.

Okay, so the clothesline might be a bit more complicated in this case. But it is much more useful for areas of mathematics in which partitions play an important role.

Let’s be more rigorous. A symmetric function (over $\mathbb{Q}$) is a polynomial or series in many variables with rational coefficients which is invariant (unchanging) under permutations of the variables. For instance, $x^2 + y^2 + z^2 + 2xyz$ is a symmetric function in three variables, but $x^2 y + y^2 z + z^2 x$ is not, because interchanging $x$ and $y$ does not leave the function unchanged.
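This definition is easy to test mechanically; here is a sketch that checks just the swap $x \leftrightarrow y$ on both examples:

```python
# Sketch: test invariance under the transposition x <-> y.
import sympy

x, y, z = sympy.symbols('x y z')
f = x**2 + y**2 + z**2 + 2*x*y*z    # symmetric
g = x**2*y + y**2*z + z**2*x        # not symmetric

def swap_xy(expr):
    return expr.subs({x: y, y: x}, simultaneous=True)

print(sympy.expand(swap_xy(f) - f) == 0)  # True
print(sympy.expand(swap_xy(g) - g) == 0)  # False: swapping x and y changes g
```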

I find that writing out a symmetric function explicitly as a polynomial, like those above, is similar to expressing a number in base ten – sometimes it is useful, but other times you need its prime factorization. The “primes” of symmetric function theory are the elementary symmetric functions.

First, define $e_n(x_1,\ldots,x_r)=m_{(1^n)}=\sum_{i_1<\cdots<i_n} x_{i_1}\cdots x_{i_n},$ where $m_{(1^n)}$ is the monomial symmetric function of the partition $(1^n)$. For instance,
$$e_3(a,b,c,d)=abc+abd+acd+bcd.$$ Let $\lambda=(\lambda_1,\lambda_2,\ldots,\lambda_k)$ be a partition, that is, a finite sequence of positive integers written in nonincreasing order. Then the elementary symmetric function corresponding to $\lambda$ is defined to be the product
$$e_\lambda=e_{\lambda_1}e_{\lambda_2}\cdots e_{\lambda_k}.$$ The remarkable fact is that the symmetric functions $e_\lambda$ form a basis for the $\mathbb{Q}$-module of symmetric functions! In other words:

Fundamental Theorem of Symmetric Function Theory: Every symmetric function can be written uniquely in the form $\sum_{\lambda} c_{\lambda}e_\lambda$, where each $c_\lambda\in \mathbb{Q}$.

So, the $e_\lambda$’s are a clothesline of sorts, indexed by the partitions, on which to hang coefficients.
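Computing any particular $e_\lambda$ is mechanical; here is a sketch (helper names my own):

```python
# Sketch: build e_lambda as a product of elementary symmetric functions.
from itertools import combinations
import sympy

xs = sympy.symbols('x y z')

def e(k):
    return sum(sympy.prod(c) for c in combinations(xs, k))

def e_lam(lam):
    # lam is a partition, e.g. (2, 1)
    return sympy.prod(e(k) for k in lam)

print(sympy.expand(e_lam((2, 1))))  # (xy + yz + zx)(x + y + z), expanded
```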

For example, let’s try to write the symmetric function $x^3+y^3+z^3$ in terms of elementary symmetric functions. We have the functions $e_1=x+y+z$, $e_2=xy+yz+zx$, and $e_3=xyz$ to work with. We can start by cubing $e_1$:
$$e_1^3=x^3+y^3+z^3+3(x^2y+y^2x+x^2z+z^2x+y^2z+z^2y)+6xyz$$ We can replace the $6xyz$ with $6e_3$. We now only need to write the term $3(x^2y+y^2x+x^2z+z^2x+y^2z+z^2y)$ in terms of $e_i$’s. The product $3e_1e_2$ is pretty close, but leaves us with an extra $9xyz$ that we need to subtract out, giving us $3e_1e_2-9e_3$ for the middle term. Putting this all together and solving for $x^3+y^3+z^3$, we find:

$$x^3+y^3+z^3=e_1^3-3e_1e_2+3e_3=e_{(1,1,1)}-3e_{(2,1)}+3e_{(3)}.$$ It is fairly clear that a similar process can be used to express any symmetric function in terms of $e_{\lambda}$’s. But why is it unique?
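(Before turning to uniqueness, here is a one-assertion sketch double-checking the identity we just derived:)

```python
# Sketch: verify x^3 + y^3 + z^3 = e1^3 - 3 e1 e2 + 3 e3.
import sympy

x, y, z = sympy.symbols('x y z')
e1, e2, e3 = x + y + z, x*y + y*z + z*x, x*y*z
assert sympy.expand(e1**3 - 3*e1*e2 + 3*e3 - (x**3 + y**3 + z**3)) == 0
```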