$\newcommand{\divides}{\mathbin{|}} \DeclareMathOperator{\gl}{gl} \DeclareMathOperator{\End}{End} \DeclareMathOperator{\Im}{Im} \DeclareMathOperator{\Ker}{Ker} \DeclareMathOperator{\ev}{ev} \DeclareMathOperator{\res}{res} \DeclareMathOperator{\id}{id}$
Due to my recent switch to mathematics, I had to re-take linear algebra and was once again confronted with the Jordan normal form of a matrix. To construct it, we introduced a lot of machinery that seemed to appear out of nowhere – the minimal polynomial, generalized eigenspaces, flags, et cetera. I thought I would never have to think about this again.
Fast-forward a few months: I'm taking a course about Lie algebras, which closely follows the book by Humphreys[1] – and I stumbled upon the following:
Proposition. Let V be a finite dimensional vector space over $\mathrm F$, $x\in \End V$.
a) There exist unique $x_s, x_n\in \End V$ satisfying the following conditions: $x = x_s + x_n$, $x_s$ is semisimple, $x_n$ is nilpotent, $x_s$ and $x_n$ commute.
b) There exist polynomials $p(t)$, $q(t)$ in one indeterminate, without constant term, such that $x_s = p(x)$, $x_n = q(x)$. In particular, $x_s$ and $x_n$ commute with any endomorphism commuting with $x$.
c) If $A \leq B \leq V$ are subspaces, and $x$ maps $B$ into $A$, then $x_s$ and $x_n$ also map $B$ into $A$.
Proof. [Some black magic involving the words “Chinese remainder theorem” follows.]
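Here is a quick sanity check of part (a) in sympy, on a made-up matrix whose eigenvalues happen to be rational, so the Jordan form can be computed over $\mathbb Q$ (this is just one way to get at the decomposition, not the book's proof):

```python
import sympy as sp

# A made-up operator with rational eigenvalues, so that sympy can
# compute its Jordan form over Q.
x = sp.Matrix([[3, 1, 0],
               [0, 3, 0],
               [0, 0, 2]])

P, J = x.jordan_form()      # x = P * J * P^(-1)

# Keep only the diagonal of J: that is the semisimple part ...
x_s = P * sp.diag(*[J[i, i] for i in range(J.rows)]) * P.inv()
x_n = x - x_s               # ... and the rest is the nilpotent part.

assert x_s * x_n == x_n * x_s           # the two parts commute
assert (x_n**3).is_zero_matrix          # x_n is nilpotent
```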
Whoa! I was pretty impressed on multiple levels.
The last point is kind of a big deal: in our linear algebra course, we had that restriction! But what does “semisimple” mean in this context? Humphreys says:
Call $x \in \End V$ […] semisimple if the roots of its minimal polynomial over $\mathrm F$ are all distinct. Equivalently ($\mathrm F$ being algebraically closed), $x$ is semisimple if and only if $x$ is diagonalizable.
Hm. The equivalence was not really obvious to me. Also, can it happen that $x$ is semisimple as an element of $\gl(\mathbb R^n)$, but not of $\gl(\mathbb C^n)$? Wikipedia tells a different story:
[T is semisimple] if every T-invariant subspace has a complementary T-invariant subspace.
This is unknown territory for me. Let's try to clear things up a bit.
Throughout, $k$ is a field, $V$ a finite-dimensional $k$-vector space, and $x\in\End(V)$. Why do polynomials enter the picture all of a sudden? Because plugging $x$ into polynomials is a $k$-algebra homomorphism $\ev_x\colon k[X]\to\End(V)$, $p\mapsto p(x)$, and the polynomial ring $k[X]$ is a principal ideal domain, so it behaves very nicely: since $\End(V)$ is finite-dimensional, $\ev_x$ has a nontrivial kernel, and that kernel is generated by a single monic polynomial – the minimal polynomial $\mu_x$ of $x$.
LEM 1 Let $p\in k[X]$ and let $W\leq V$ be $x$-invariant. Then $p(x\vert_W) = p(x)\vert_W$.
Proof. Because both $\res_W \circ \ev_x$ and $\ev_{x\vert_W}$ are $k$-algebra homomorphisms out of the polynomial ring (restriction to an invariant subspace respects sums, scalars, and composition), it is sufficient to check equality for constants and for $X$. These cases reduce to $\left({\lambda \id_V}\right)\vert_W = \lambda \id_W$ and $x\vert_W=x\vert_W$.
LEM 2 Let $W\leq V$ be $x$-invariant. Then $\mu_{x\vert_W}\divides \mu_x$.
Proof. Because of $$ \mu_x(x\vert_W) = \mu_x(x)\vert_W = 0\vert_W = 0, $$ $\mu_x$ is in $\Ker(\ev_{x\vert_W}) = (\mu_{x\vert_W})$, and therefore must be a multiple of the latter.
First of all, one might hope that due to its minimality, $\mu_x$ should not contain any “junk factors” – by that I mean factors $p \divides \mu_x$ for which $p(x)$ doesn't annihilate any nonzero vector.
LEM 3 Let $A,B\in\End(V)$ be commuting endomorphisms, where $B$ is injective. Then $\Ker A = \Ker AB$.
Proof. Because $A$ and $B$ commute, $\Ker AB = \Ker BA$, and by the injectivity of $B$, $BAv = 0$ holds if and only if $Av = 0$. Hence $\Ker AB = \Ker BA = \Ker A$. (Note that we don't even need $\dim V < \infty$ here – but see the remark below.)
An immediate consequence is that the minimal polynomial does indeed not contain any factors with trivial kernel: if $\nu(x) \cdot \rho(x)$ vanishes everywhere with $\rho(x)$ injective, then – the two commute, both being polynomials in $x$ – LEM 3 gives $\Ker \nu(x) = \Ker \nu(x)\rho(x) = V$, so $\nu(x)$ vanishes everywhere as well, and $\nu\cdot \rho$ cannot have been minimal. So no matter how we split it up, we never have any “junk”.
(In the infinite-dimensional case the whole discussion breaks down much earlier: a minimal polynomial need not exist at all – the shift operator $(a_1, a_2, \ldots)\mapsto (0, a_1, a_2, \ldots)$ on $k^{\mathbb N}$ satisfies no nontrivial polynomial identity.)
LEM 4 Let $p, q$ be coprime polynomials and $x\in \End(V)$. Then the kernels of $p(x)$ and $q(x)$ intersect trivially.
Proof. Since $p(x)$ and $q(x)$ both vanish on $K:=\Ker p(x)\cap \Ker q(x)$, so does $r(x)$ for any polynomial linear combination $r = \alpha p + \beta q$ – in particular for the greatest common divisor $\gcd(p,q)=1$ (Bézout), which evaluates to the identity map. However, the only way for the identity to vanish on $K$ is $K=0$.
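A tiny sympy illustration, with a made-up diagonal example:

```python
import sympy as sp

# Made-up example: mu_x = (X-1)(X-2), with coprime factors X-1 and X-2.
x = sp.diag(1, 2)
P = (x - sp.eye(2)).nullspace()      # Ker (x - 1) = <e_1>
Q = (x - 2*sp.eye(2)).nullspace()    # Ker (x - 2) = <e_2>

# Trivial intersection: the kernel vectors stay linearly independent.
assert sp.Matrix.hstack(*(P + Q)).rank() == len(P) + len(Q)
```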
THM 5 Let $x\in\End(V)$ with minimal polynomial $\mu_x=pq$, both factors being monic and coprime. Then $V$ is a direct sum of the $x$-invariant subspaces $P:=\Ker p(x)$ and $Q:=\Ker q(x)$. Furthermore, $\mu_{x\vert_P}=p$, $\mu_{x\vert_Q}=q$.
Before we get to the proof, recall a few important characterizations:
An operator $y\in \End(V)$ vanishes on a sum $P+Q\subseteq V$ if and only if it vanishes on $P$ and on $Q$.
A polynomial divides the minimal polynomial if and only if it divides every polynomial that vanishes when evaluated at $x$.
For coprime $p, q$, $\Ker\,(pq)(x) \subseteq \Ker p(x) + \Ker q(x)$ – write $1 = \alpha p + \beta q$ and split $v = \alpha(x)p(x)v + \beta(x)q(x)v$.
The second point is a formulation of the so-called “Yoneda lemma for posets”: $a = b$ if and only if for every $c$ we have $a \leq c \Leftrightarrow b \leq c$. In our case, the poset is the preorder $(\{p\in k[X] \mid p(x) = 0 \}, \divides)$ modulo association (i.e. modulo scalar factors).
Proof. Because $p(x)q(x) = \mu_x(x)$ vanishes everywhere, $\Im q(x)\leq \Ker p(x)$, and so by rank–nullity $$ \dim V = \dim \Im q(x) + \dim \Ker q(x) \leq \dim \Ker p(x) + \dim \Ker q(x) = \dim P + \dim Q. $$ Because $p$ and $q$ are coprime, $P$ and $Q$ intersect trivially (LEM 4). Therefore, the canonical map $P\oplus Q\to V$ sending $(v,w)$ to $v+w$ is a monomorphism whose codomain does not have strictly larger dimension, thus an isomorphism.
Now let $v\in k[X]$ such that $v(x\vert_P) = 0$, which means that $v(x)$ vanishes at least on $P\leq V$. Because $q(x)$ vanishes on $Q$, the product $(vq)(x) = v(x)q(x) = q(x)v(x)$ vanishes on both $P$ and $Q$, ergo on all of $V$. The minimality of $\mu_x$ implies $pq = \mu_x | vq$, ergo $p | v$. Because $p$ divides any polynomial vanishing on $P$ and itself vanishes on $P$, we conclude that $p$ is associated to the minimal polynomial of $x\vert_P$, and since both are monic, they must be equal. The argument for $q$ follows analogously.
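To see the theorem in action, here is a hedged sympy sketch on a made-up matrix:

```python
import sympy as sp

# mu_x = (X-1)^2 (X-2); p = (X-1)^2 and q = X-2 are coprime.
x = sp.Matrix([[1, 1, 0],
               [0, 1, 0],
               [0, 0, 2]])
I = sp.eye(3)

P = ((x - I)**2).nullspace()    # Ker p(x) = <e_1, e_2>
Q = (x - 2*I).nullspace()       # Ker q(x) = <e_3>

# The kernels intersect trivially and span everything: V = P ⊕ Q.
assert sp.Matrix.hstack(*(P + Q)).rank() == 3 == len(P) + len(Q)
```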
COR Let $x\in\End(V)$ with a factorization $\mu_x=\prod_i \pi_i$ into pairwise coprime factors. Then $V \simeq \bigoplus_i \Ker\pi_i(x)$, where each $\pi_i$ is the minimal polynomial of the restriction of $x$ to $\Ker\pi_i(x)$.
Proof. We proceed by induction over the number of factors. The case of only one factor is clear because $V = \Ker \mu_x(x)$. For the induction step, write $\mu_x = p\cdot q$ with $p := \prod_{i=1}^{n}\pi_i$ and $q := \pi_{n+1}$; these two are again coprime. By THM 5, $V = \Ker p(x) \oplus \Ker q(x)$, where the restrictions of $x$ to the summands have minimal polynomials $p$ and $q$, respectively. Applying the induction hypothesis to $x\vert_{\Ker p(x)}$ splits the first summand further into $\bigoplus_{i=1}^{n} \Ker \pi_i(x\vert_{\Ker p(x)}) = \bigoplus_{i=1}^{n} \Ker \pi_i(x)$, where the last equality holds because $\pi_i \divides p$ implies $\Ker\pi_i(x) \leq \Ker p(x)$.
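The corollary, checked on a made-up example with three pairwise coprime (here: linear) factors – again just a sympy sketch:

```python
import sympy as sp

# Made-up example: mu_x = (X-1)(X-2)(X-3), three pairwise coprime factors.
x = sp.diag(1, 2, 2, 3)
I = sp.eye(4)

kernels = [(x - c*I).nullspace() for c in (1, 2, 3)]

# The three kernels together decompose V = Q^4 completely:
assert sum(len(K) for K in kernels) == 4
assert sp.Matrix.hstack(*[v for K in kernels for v in K]).rank() == 4
```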
The most important results so far, stated explicitly:
Any factorization $\mu_x = \prod_i \pi_i$ into pairwise coprime factors gives rise to a decomposition $V = \bigoplus_i \Ker \pi_i(x)$.
Whenever $p$ is coprime to $\mu_x$, the operator $p(x)$ annihilates only the zero vector.
Why do we even need the restriction lemmata? Here is an important counterexample: not every (complementable, or even minimal) invariant subspace is of the form $\Ker p(x)$ – just look at the identity. Every subspace is invariant under $\id_V$, yet $p(\id_V) = p(1)\id_V$, so $\Ker p(\id_V)$ is either $0$ or all of $V$.
EXAMPLE. Consider $$ x := \begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&0&1\\0&0&0&0\end{pmatrix} \,\in \End\left(ℝ^4\right), $$ which has $\mu_x(X) = (X+1)(X-1)X^2 = \underbrace{(X+1)X}_{=:p}\cdot\underbrace{(X-1)X}_{=:q}$. The kernels $\Ker p(x) = \langle e_2, e_3\rangle$ and $\Ker q(x) = \langle e_1, e_3\rangle$ intersect in the line spanned by $e_3$, reflecting the fact that the two factors are not coprime.
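A sympy verification of this example:

```python
import sympy as sp

x = sp.Matrix([[1, 0, 0, 0],
               [0, -1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]])
I = sp.eye(4)

P = ((x + I) * x).nullspace()   # Ker p(x) = <e_2, e_3>
Q = ((x - I) * x).nullspace()   # Ker q(x) = <e_1, e_3>

# dim P + dim Q = 4, but together they only span 3 dimensions:
# the sum is not direct, the overlap being the line <e_3>.
assert len(P) + len(Q) == 4
assert sp.Matrix.hstack(*(P + Q)).rank() == 3
```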
LEM Let $x\in\End(V)$ with irreducible minimal polynomial $\mu_x$. Then every invariant subspace admits an invariant complement.[2]
Proof. The $x$-invariant subspaces of $V$ are precisely the submodules of $V$ viewed as a $k[X]$-module $M$ via the evaluation map $$ \mathrm{ev}_x\colon k[X]\to \End_{\mathrm{AbGrp}}(V). $$ Because $M$ has annihilator $(\mu_x)$, we can instead view $V$ as a $k[X]/(\mu_x)$-module $M^\prime$ without altering the submodule lattice. In less technical words: since $p(x)v = (p + q)(x)v$ holds for all $v$ precisely when $q$ is a multiple of the minimal polynomial, we can safely define the left multiplication of our module on cosets $p + (\mu_x)$, and the submodules of $M$ are precisely the submodules of $M^\prime$. Since irreducible elements in a PID generate maximal ideals, $k[X]/(\mu_x)$ is a field. Thus, $M^\prime$ is a vector space, and every subspace of a vector space admits a complement (extend a basis) – which here is exactly an $x$-invariant complement.
Example. Consider the rotation operator $$ x := \begin{pmatrix}0&-1&0&0\\1&0&0&0\\0&0&0&-1\\0&0&1&0\end{pmatrix}, $$ which has minimal polynomial $\mu_x(X) = X^2+1$. The lemma effectively tells us that, this minimal polynomial being irreducible, the action of $x$ equips $V$ with the structure of a $k[X]/(X^2+1)$-module – however, this is just $\mathbb C$ in disguise! In particular, the decomposition $$ ℝ^4 = \langle e_1 \rangle \oplus \langle e_3 \rangle = \mathrm{span}_ℝ \{ e_1, e_2\} \oplus \mathrm{span}_ℝ \{ e_3, e_4\} $$ into $x$-invariant subspaces (see the definition of $\langle v\rangle$ just below) allows us to identify each of them with a copy of $ℂ$ by mapping $1\in ℂ$ to $e_1$ or $e_3$, respectively. With this in mind, “multiplying” by $x$ mirrors multiplication by $i$ – doing it twice results in the negative. This is in complete agreement with the canonical isomorphism $k[X]/(X^2+1)\simeq ℂ$, where $X$ is identified with $i$.
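The identification is easy to check in sympy:

```python
import sympy as sp

x = sp.Matrix([[0, -1, 0, 0],
               [1,  0, 0, 0],
               [0,  0, 0, -1],
               [0,  0, 1,  0]])

assert x**2 == -sp.eye(4)   # x satisfies X^2 + 1, just like i does

e1, e2 = sp.eye(4)[:, 0], sp.eye(4)[:, 1]
assert x * e1 == e2         # "i * 1 = i"  under the identification 1 -> e_1
assert x * e2 == -e1        # "i * i = -1"
```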
Let's take the crucial step apart for a bit, and call $k[X]/(\mu_x)=: l$. Let further $$ \langle v \rangle := \mathrm{span}_k\{x^i v\mid 0\leq i \} = \mathrm{span}_l\{v\} $$ denote the $x$-invariant subspace generated by $v$.
We can consider $l$ as an extension of $k$ because the canonical embedding $k\to k[X]$ intersects $(\mu_x)$ trivially (every non-zero scalar is a unit, thus cannot be a multiple of the minimal polynomial), therefore the induced embedding $k\to l$ must stay injective. We can easily see that $l$ must be finite dimensional as a $k$-vector space by noting that
LEM $$ 1 + (\mu_x),\, X + (\mu_x),\, \cdots,\, X^{\mathrm{deg}(\mu_x)-1} + (\mu_x) $$ is a basis of $l$ viewed as a $k$-vector space.
Proof. It is a generating set because every coset $p + (\mu_x)\in l$ can be represented by a polynomial of degree less than $\mathrm{deg}(\mu_x)$ (polynomial division), which clearly is a $k$-linear combination of our generators. Limiting the exponents to less than $\mathrm{deg}(\mu_x)$ ensures that this set is in fact linearly independent: if we had a vanishing nontrivial $k$-linear combination $$ \left( \sum_{i=0}^{\mathrm{deg}(\mu_x)-1} \alpha_i X^i \right) + (\mu_x) = (\mu_x) $$ (remember that the zero element of $l$ is the trivial coset $(\mu_x)$), this would imply that the representative on the left-hand side were a multiple of $\mu_x$; but since every such linear combination has degree less than $\mathrm{deg}(\mu_x)$, the only possibility is for all coefficients to be zero.
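The degree reduction in the first step is just polynomial division; a one-liner in sympy, with an ad-hoc $p$ and $\mu_x$:

```python
import sympy as sp

X = sp.symbols('X')
mu = X**2 + 1                 # ad-hoc mu_x
p = X**5 + 3*X + 2            # ad-hoc representative
# Polynomial division: the coset p + (mu) is represented by a
# polynomial of degree < deg(mu).
assert sp.rem(p, mu, X) == 4*X + 2
```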
To conclude:
LEM If $p\in k[X]$ is irreducible, $k[X]/(p)$ is a finite field extension of $k$ via the induced embedding, with the monomial cosets $$ 1 + (p),\, X + (p),\, \cdots,\, X^{\mathrm{deg}(p)-1} + (p) $$ as a basis.
Now let $M^\prime$ be our $l$-module from the proof, whose underlying elements lie in $V$. The fact that it is a vector space first of all equips us with a basis: let's fix one consisting of vectors $m_1,\ldots,m_r$.
Our vector space decomposes into a direct sum of the subspaces $\langle m_i \rangle$. These are “all the same” in the sense that they are isomorphic as one-dimensional $l$-vector spaces. However, note that because $l$-linearity implies $k$-linearity, they are also isomorphic as $k$-vector spaces, and thus have the same dimension over $k$.
Visualized a different way, consider the $l$-module homomorphism $$ \varphi_i\colon\: l\to \langle m_i \rangle \leq M^\prime ,\quad 1+(\mu_x) \mapsto m_i. $$ It is always surjective: its codomain $\langle m_i\rangle$ is the smallest $l$-submodule containing $m_i$, whereas $\Im \varphi_i$ is some $l$-submodule containing $m_i$. On the other hand, it can fail to be injective only if it has a nontrivial kernel. The latter must be an $l$-submodule of $l$ – in other words, an ideal.
If $l$ is a field, the only possibilities are $0$ or $l$, the latter of which would mean that $m_i = \varphi_i(1+(\mu_x)) = 0$. So for fields and $m_i$ nonzero, the map we began with is an injection and can be interpreted as an identification of $\langle m_i\rangle$ with $l$.
If $l$ were not a field, we could have nonzero elements $p + (\mu_x)\in l$ satisfying $(p + (\mu_x))\cdot m_i = p(x)m_i = 0$ – that is, a polynomial which is not a multiple of the minimal polynomial that still manages to annihilate $m_i$.
To give a third perspective, without using the word “module”: fix $\mu_x$ irreducible and $p\in k[X]$ not a multiple of $\mu_x$. Because the two are coprime, we can find a Bézout decomposition of the unit $$ 1 = \mathrm{gcd}(\mu_x, p) = α(X)p(X) + β(X)\mu_x(X) $$ (note how $α$ is a multiplicative inverse of $p$ when viewed modulo $\mu_x$). Then $p(x)v = 0$ would imply $$ 0 = α(x)p(x)v = (1-β(x)\mu_x(x))v = v, $$ which means that the only thing vanishing under $p(x)$ is the zero vector. Using this, we can verify that $\left(v, xv,\ldots,x^{\mathrm{deg}(\mu_x)-1}v\right)$ is a basis of $\langle v \rangle$ whenever $v\neq 0$, again telling us that all one-element-generated $x$-invariant subspaces “look the same”.
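sympy's `gcdex` computes exactly such a Bézout decomposition; a sketch with ad-hoc choices of $p$ and $\mu_x$:

```python
import sympy as sp

X = sp.symbols('X')
mu = X**2 + 1                # irreducible over Q (ad-hoc choice)
p = X + 1                    # not a multiple of mu, hence coprime to it

# Extended Euclidean algorithm: alpha*p + beta*mu = gcd(p, mu) = 1.
alpha, beta, g = sp.gcdex(p, mu, X)
assert g == 1
assert sp.expand(alpha*p + beta*mu) == 1
```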
In particular, note that these observations give us a slightly stronger statement: for every $v\in V$, the subspace $\langle v \rangle$ is not only the smallest $x$-invariant subspace containing $v$, it does not contain any nontrivial strictly smaller invariant subspaces at all: if an $x$-invariant subspace $S\leq \langle v \rangle$ were nontrivial, any nonzero $s\in S$ would give $\langle s \rangle \leq S \leq \langle v \rangle$, and the outer two having the same dimension forces equality.
An important corollary is that whenever we intersect $\langle v \rangle$ with another $x$-invariant subspace $U$, the resulting space is $x$-invariant again, so the intersection is either trivial or all of $\langle v \rangle$ – and the latter would imply $v\in \langle v \rangle \leq U$. This gives us a strategy to decompose $V$ into a direct sum of subspaces $\langle m_i \rangle$: start with an arbitrary $m_1\neq 0$, and, writing $U_i := \sum_{j=1}^i \langle m_j \rangle$, pick any $m_{i+1}$ outside of $U_i$; by the observation just made, $U_i \cap \langle m_{i+1}\rangle = 0$, so the sum stays direct. Because $V$ is finite-dimensional and $U_i < U_{i+1}$ strictly at every step, this process must stop eventually, yielding the desired decomposition into direct summands.
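This greedy process is easy to turn into code. Here is a sympy sketch (the function name and interface are my own invention) that leans on the theory above – in particular on $\mu_x$ being irreducible of the given degree:

```python
import sympy as sp

def invariant_decomposition(x, deg_mu):
    """Greedy sketch: collect basis vectors of subspaces <m_i>,
    assuming mu_x is irreducible of degree deg_mu, so that U_i and
    <m_(i+1)> automatically intersect trivially."""
    n = x.rows
    basis_cols = []                           # columns spanning U_i so far
    for j in range(n):                        # candidate generators e_1..e_n
        v = sp.eye(n)[:, j]
        if basis_cols:
            U = sp.Matrix.hstack(*basis_cols)
            if sp.Matrix.hstack(U, v).rank() == U.rank():
                continue                      # v already lies in U_i
        # <v> = span{v, x v, ..., x^(deg_mu - 1) v}
        basis_cols += [x**k * v for k in range(deg_mu)]
    return basis_cols

# The rotation operator from the example above (mu_x = X^2 + 1):
x = sp.Matrix([[0, -1, 0, 0], [1, 0, 0, 0],
               [0, 0, 0, -1], [0, 0, 1, 0]])
cols = invariant_decomposition(x, 2)
assert sp.Matrix.hstack(*cols).rank() == 4    # R^4 = <e_1> ⊕ <e_3>
```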
THM Let $x\in\End(V)$. The following are equivalent:
1. Every $x$-invariant subspace of $V$ admits an $x$-invariant complement.
2. $\mu_x$ is a product of pairwise distinct (equivalently, pairwise coprime) irreducible factors.
Proof. 1→2: By contraposition. Assume not every two irreducible factors of $\mu_x$ are distinct, i.e. the factorization $\mu_x = \prod_i \pi_i^{r_i}$ into pairwise distinct irreducible $\pi_i$ has some exponent $r_j>1$. By the corollary, $V = W \oplus W'$ with $W := \Ker \pi_j^{r_j}(x)$ and $W' := \bigoplus_{i\neq j}\Ker\pi_i^{r_i}(x)$, and the restriction $y := x\vert_W$ has minimal polynomial $\pi_j^{r_j}$. Note that property 1 passes to invariant direct summands: an invariant complement of $T \oplus W'$ in $V$, intersected with $W$, is an invariant complement of $T$ in $W$. So it suffices to find an invariant subspace of $W$ without invariant complement in $W$, and $U := \Ker \pi_j(y)$ is one: if $W = U\oplus C$ with $C$ invariant, then $\pi_j(y)$ is injective on $C$ (its kernel there lies in $U\cap C = 0$), so LEM 3 lets us cancel factors $\pi_j$ from $\mu_{y\vert_C}$ – a power of $\pi_j$ – leaving $\mu_{y\vert_C} = 1$ and hence $C = 0$. But then $W = U$ would be annihilated by $\pi_j$, contradicting $\mu_y = \pi_j^{r_j}$ with $r_j > 1$.
2→1: Let $U\leq V$ be invariant. Since $\mu_{x\vert_U}$ divides $\mu_x = \prod_i \pi_i$ (LEM 2), applying the corollary to $x\vert_U$ gives $U = \bigoplus_i \Ker\pi_i(x\vert_U) = \bigoplus_i \left(U \cap \Ker\pi_i(x)\right)$. The restriction of $x$ to $\Ker\pi_i(x)$ has irreducible minimal polynomial $\pi_i$, so by the lemma above each $U\cap\Ker\pi_i(x)$ admits an invariant complement $C_i$ within $\Ker\pi_i(x)$, and $\bigoplus_i C_i$ is an invariant complement of $U$ in $V$.
This is exactly the reconciliation we were after: Humphreys' notion of semisimplicity (squarefree minimal polynomial) and Wikipedia's (every invariant subspace is complemented) agree.
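For the contraposition in 1→2, the smallest counterexample is worth seeing concretely; a sympy sketch with the nilpotent Jordan block (my own illustration):

```python
import sympy as sp

x = sp.Matrix([[0, 1], [0, 0]])   # mu_x = X^2: a repeated irreducible factor
a, b = sp.symbols('a b')
v = sp.Matrix([a, b])             # candidate spanning vector for a complement
                                  # of the invariant line Ker x = <e_1>

# span{v} is x-invariant iff x*v is a multiple of v, i.e. the 2x2 matrix
# with columns x*v and v is singular:
det = sp.Matrix.hstack(x * v, v).det()
assert det == b**2                # forces b = 0, i.e. v lies in Ker x itself,
                                  # so Ker x has no invariant complement.
```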
[1] Humphreys, James E., "Introduction to Lie Algebras and Representation Theory", Springer – ISBN 978-0-387-90052-0