CAPP 30271 — Lecture 19

Eigenvectors

Let $A$ be an $n \times n$ matrix. A scalar $\lambda$ is an eigenvalue of $A$ if $A \mathbf x = \lambda \mathbf x$ for some nonzero vector $\mathbf x$. Such a vector $\mathbf x$ is an eigenvector corresponding to $\lambda$.

This definition tells us that eigenvectors are the vectors that a transformation only scales: the vector stays on the same line through the origin (it flips if $\lambda$ is negative). In some sense, we can think of these vectors as remaining stable under the transformation. This suggests that they capture some information about the structure of the transformation.

Let $A = \begin{bmatrix} 7 & -9 \\ 2 & -2 \end{bmatrix}$. The vector $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ is an eigenvector of $A$. Its associated eigenvalue is 4. To see this, we have: \[\begin{bmatrix} 7 & -9 \\ 2 & -2 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \end{bmatrix} = 4 \begin{bmatrix} 3 \\ 1 \end{bmatrix}.\]
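As a quick sanity check, the product above can be reproduced in a few lines of plain Python. This is only a sketch: the helper name `matvec` is our own choice and not something from the lecture.

```python
def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[7, -9],
     [2, -2]]
x = [3, 1]

Ax = matvec(A, x)               # the left-hand side, A x
scaled = [4 * xi for xi in x]   # the right-hand side, 4 x

print(Ax, scaled, Ax == scaled)
```

If the two lists agree, then $\mathbf x$ is an eigenvector of $A$ with eigenvalue 4.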

Traditionally, the motivation for eigenvectors is in the evolution of linear systems: if we view a matrix $A$ as a process applied repeatedly to some input vector $\mathbf x$, then we can view $A^t \mathbf x$ as the evolution of the system through time step $t$. Obviously, computing powers of a matrix by repeated multiplication is expensive.

Eigenvectors give us a way around this. For an eigenvector $\mathbf x$ with eigenvalue $\lambda$, we have \[A^k \mathbf x = A^{k-1} \lambda \mathbf x = \lambda A^{k-1} \mathbf x = \lambda^2 A^{k-2} \mathbf x = \cdots = \lambda^k \mathbf x.\] This leads to the following ideas:

If we can find $n$ linearly independent eigenvectors, we can use them as a basis and rewrite any vector in terms of eigenvectors. Such a basis is called an eigenbasis.
If we can find $n$ linearly independent eigenvectors, we can use them to decompose our matrix into eigenvector and eigenvalue pieces—diagonalization.
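To make the eigenbasis idea concrete, here is a small sketch using the example matrix. It assumes the two eigenpairs of $A$ (eigenvalue 4 with $(3, 1)$, and eigenvalue 1 with $(3, 2)$; the second pair is derived later in these notes). The coefficients `c1`, `c2` and the helper `matvec` are our own illustrative choices.

```python
def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[7, -9],
     [2, -2]]
x1, lam1 = [3, 1], 4   # eigenpair from the example above
x2, lam2 = [3, 2], 1   # second eigenpair (assumed here, derived later)

# Pick a vector by choosing its coordinates in the eigenbasis by hand.
c1, c2 = 2, 1
v = [c1 * a + c2 * b for a, b in zip(x1, x2)]

k = 3

# Direct route: apply A to v, k times.
direct = v
for _ in range(k):
    direct = matvec(A, direct)

# Eigenbasis route: each eigenvector component is just scaled by lambda^k.
via_eigen = [c1 * lam1**k * a + c2 * lam2**k * b for a, b in zip(x1, x2)]

print(v, direct, via_eigen)
```

The second route never multiplies by $A$ at all, which is the point: once a vector is written in the eigenbasis, applying $A^k$ reduces to scalar powers.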

The set of eigenvalues of $A$ is called the eigenspectrum or spectrum of $A$. From this, we get spectral analysis: studying matrices (and other mathematical objects that can be represented as matrices, such as graphs) based on their eigenvalues and eigenvectors and the structures that they admit.

But we'll see that despite these being useful tools in their own right, what we would really like is an analogous tool for any matrix, not just square ones. Eigenvalues and eigenvectors are only the first step towards this goal.

How do we find eigenvectors? We take our eigenvector equation and rewrite it. The equation $A \mathbf x = \lambda \mathbf x$ becomes \[(A - \lambda I) \mathbf x = \mathbf 0.\] Then this becomes almost the usual question that we've been dealing with all along: solving for $\mathbf x$. Of course, there's an extra step here: what is $\lambda$? To answer that, we have to ask what $A - \lambda I$ is. Obviously, we do not want $A - \lambda I = 0$.

However, if we take a look at this equation, we're back to solving something of the form $B \mathbf x = \mathbf 0$. One vector that satisfies this is $\mathbf x= \mathbf 0$, which we also do not want.

In order for both of these things to be true, it must be the case that $A - \lambda I$ is not invertible—in which case, the null space of $A - \lambda I$ contains more than just the zero vector.

What we need now is a systematic way of computing $\lambda$ based on this information. To do that, we need to discuss the determinant of a matrix.

A brief excursion into determinants

Determinants are useful quantities that say something important about square matrices (rectangular matrices do not have determinants). Specifically, they quantify the change that a transformation makes to vectors in terms of its "volume". Unfortunately, like matrix inverses, they are a real pain to compute and their properties are not particularly relevant for us. However, we do need them for one thing: being able to compute eigenvalues. This makes sense because both of these values say something about the transformation. So we must discuss how to compute a determinant.

Classically, determinants are defined recursively. We begin with our base case, the $2 \times 2$ matrix.

Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then the determinant of $A$ is $\det A = ad - bc$.

You will sometimes see the determinant denoted by surrounding the matrix with bars instead of square brackets: \[\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad-bc.\]

What does this say about linear transformations? Consider the effect of this matrix on the standard basis vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

Notice that this maps the unit square spanned by the standard basis vectors onto a parallelogram whose sides are $\begin{bmatrix} a \\ c \end{bmatrix}$ and $\begin{bmatrix} b \\ d \end{bmatrix}$, the columns of the matrix. An interesting question we can ask is what the area of this parallelogram is, viewing the unit square as having an area of 1. Up to sign, this value is exactly the determinant, and this is what the determinant signifies: it is the factor by which the transformation scales area.
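The area interpretation can be checked numerically. The sketch below uses a made-up matrix `M` (not from the lecture) and compares the $2 \times 2$ determinant formula against the area of the image parallelogram computed independently with the shoelace formula.

```python
def det2(M):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def shoelace_area(points):
    """Area of a simple polygon from its vertices listed in order."""
    total = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

# Hypothetical example matrix; its columns are the images of (1,0) and (0,1).
M = [[2, 1],
     [1, 3]]
u = (M[0][0], M[1][0])   # image of (1, 0)
w = (M[0][1], M[1][1])   # image of (0, 1)

# Vertices of the image of the unit square: 0, u, u + w, w.
parallelogram = [(0, 0), u, (u[0] + w[0], u[1] + w[1]), w]

print(det2(M), shoelace_area(parallelogram))
```

Swapping the two columns of `M` would flip the sign of the determinant but leave the area unchanged, which is why the comparison is made up to sign.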

We can generalize this idea to 3 dimensions (thinking about volume instead of area) or more. The textbook contains details about how the volume arises from this computation in the 3-dimensional case. But we are more concerned with the formula: determinants for a $3 \times 3$ matrix are defined as follows.

Let $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$. Then $\det A = a(ei-fh) - b (di-fg) + c(dh-eg)$.

Be careful: notice that the sign for the terms alternates!

The idea here is that we go through each entry in the first row and consider the determinant of the $2 \times 2$ matrix obtained by removing the row and column that the entry is in. That is, \[\det A = a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix}.\]

For general $n$, we can extend this idea and find that the determinant formula depends on computing the determinant for $n$ different $(n-1) \times (n-1)$ matrices. This definition for the determinant is called the Laplace expansion, due to the 18th c. French mathematician Pierre-Simon Laplace. Notice that because we end up having to compute roughly $n!$ determinants, it is not actually computationally feasible to compute determinants for sufficiently large matrices in this way. For hand computation and small $n$, this will be fine.
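The recursive definition translates almost directly into code. The following is a minimal sketch of expansion along the first row; the function name `det` and the test matrix are our own, and, as noted above, this is only sensible for small matrices.

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    if n == 2:
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    total = 0
    for j in range(n):
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        sign = -1 if j % 2 else 1   # signs alternate +, -, +, ...
        total += sign * M[0][j] * det(minor)
    return total

# Hypothetical 3x3 example.
M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(det(M))
```

Each call on an $n \times n$ matrix makes $n$ recursive calls on $(n-1) \times (n-1)$ matrices, which is where the factorial growth comes from.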

Here are some useful properties of determinants.

Let $A$ be diagonal or triangular. Then $\det(A) = a_{11} \cdots a_{nn}$, the product of the diagonal entries.
$\det(A^T) = \det(A)$.
$\det(AB) = \det(A) \det(B)$.
$\det A = 0$ if and only if $A$ is singular—$A$ is not invertible.
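These properties are easy to spot-check on small examples. The sketch below does so for $2 \times 2$ matrices; a numerical check is of course not a proof, and the matrices `B`, `T`, and `S` are made up for illustration.

```python
def det2(M):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(M):
    """Transpose of a 2x2 matrix."""
    return [[M[j][i] for j in range(2)] for i in range(2)]

A = [[7, -9], [2, -2]]   # the running example
B = [[1, 2], [3, 4]]     # arbitrary second matrix
T = [[2, 5], [0, 3]]     # upper triangular
S = [[1, 2], [2, 4]]     # dependent columns, so singular

print(det2(T), T[0][0] * T[1][1])             # triangular: product of diagonal
print(det2(transpose2(A)), det2(A))           # transpose
print(det2(matmul2(A, B)), det2(A) * det2(B)) # product
print(det2(S))                                # singular
```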

These properties give us some interesting ideas and in particular lead to faster methods for computing determinants.

Recall that if $A$ is invertible, then it can be decomposed into $A = LU$ (possibly after some row exchanges, each of which flips the sign of the determinant), where both $L$ and $U$ are triangular matrices. We can make use of these properties: both $\det L$ and $\det U$ can be computed by multiplying their diagonals. Then we have $\det A = \det L \det U$. But what if $A$ isn't invertible? Then $\det A = 0$, which we'll discover when LU factorization fails to produce $n$ nonzero pivots!
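Here is a sketch of computing a determinant through elimination rather than Laplace expansion. It assumes the usual convention that $L$ has ones on its diagonal, so $\det L = 1$ and the determinant is the product of the pivots of $U$, with a sign flip for every row exchange. The function name is ours, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def det_by_elimination(M):
    """Determinant as the signed product of pivots from Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    sign = 1
    product = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)          # no pivot available: A is singular
        if pivot_row != col:
            A[col], A[pivot_row] = A[pivot_row], A[col]
            sign = -sign                # a row exchange flips the sign
        pivot = A[col][col]
        product *= pivot
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = A[r][col] / pivot
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return sign * product

print(det_by_elimination([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
print(det_by_elimination([[1, 2], [2, 4]]))
```

Elimination takes on the order of $n^3$ arithmetic operations, compared with roughly $n!$ for the recursive definition.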

What is the determinant of an orthogonal matrix $Q$? We know that $\det Q = \det Q^T$ and $Q^T Q = I$, so $(\det Q)^2 = \det Q^T \det Q = \det(Q^T Q) = \det I = 1$. So we must have that $\det Q = \pm 1$.
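Both signs really do occur. As a small illustration (the specific matrices are our own choice), a rotation has determinant $+1$ and a reflection has determinant $-1$:

```python
import math

def det2(M):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

theta = 0.7  # arbitrary angle in radians
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
reflection = [[1, 0],
              [0, -1]]   # reflect across the x-axis

# The rotation entries are floats, so compare with a tolerance.
print(math.isclose(det2(rotation), 1.0), det2(reflection))
```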

The last property is actually where the definition of singular matrix comes from; we saw this earlier simply as square matrices that are not invertible. If we consider the area/volume view of the determinant, this makes sense: a singular matrix has dependent columns, which means one of the dimensions of our parallelepiped collapses and the resultant space has 0 area/volume.

If you read the text, you'll find that determinants allow you to compute the inverse of a matrix without performing elimination. Personally, I think this is a scam which students are too quick to accept because they're tired of doing elimination. But I find computing the determinant and remembering the process to compute the inverse (called Cramer's rule) even more exhausting than just buckling down and doing the elimination, which we know how to do by now. However, if you're more into memorizing formulas, you may find this a more convenient way to compute the inverse.

Back to computing eigenvalues and eigenvectors

Recall that $A - \lambda I$ is not invertible. Then $\det(A - \lambda I) = 0$. This is the key we need to solve for the eigenvalues $\lambda$.

Let $A = \begin{bmatrix} 7 & -9 \\ 2 & -2 \end{bmatrix}$. We have \[A - \lambda I = \begin{bmatrix} 7 - \lambda & -9 \\ 2 & -2-\lambda \end{bmatrix}.\] Then the determinant of this matrix is \[(7-\lambda)(-2-\lambda) - (-9)(2) = \lambda^2 - 5\lambda + 4.\] Recall that the determinant is 0, so this suggests that the eigenvalues $\lambda$ are roots of this polynomial. Indeed, we have that this polynomial factors to $(\lambda - 4)(\lambda - 1)$, so we have $\lambda = 4, 1$.
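For a $2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, expanding $(a - \lambda)(d - \lambda) - bc$ gives $\lambda^2 - (a + d)\lambda + (ad - bc)$, so the eigenvalues can be found with the quadratic formula. The sketch below does exactly that; the function names are ours, and `cmath` is used so that the square root does not fail if the roots happen to be complex.

```python
import cmath

def char_poly_2x2(M):
    """Coefficients (1, p, q) of lambda^2 + p*lambda + q = det(M - lambda*I)."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

def eigenvalues_2x2(M):
    """Roots of the characteristic polynomial via the quadratic formula."""
    _, p, q = char_poly_2x2(M)
    root = cmath.sqrt(p * p - 4 * q)
    return ((-p + root) / 2, (-p - root) / 2)

A = [[7, -9],
     [2, -2]]
print(char_poly_2x2(A))     # coefficients of lambda^2 - 5*lambda + 4
print(eigenvalues_2x2(A))   # returned as complex numbers with zero imaginary part
```

This shortcut is specific to the $2 \times 2$ case; for larger matrices the characteristic polynomial has higher degree and there is no comparably simple formula.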

The polynomial $\det(A - \lambda I)$ is the characteristic polynomial of $A$. The eigenvalues of $A$ are the roots of the characteristic polynomial of $A$.

One of the implications from this definition is that an $n \times n$ matrix will have a characteristic polynomial of degree $n$. This comes from having $n$ $\lambda$'s along the diagonal of the matrix $A - \lambda I$.

Once we have our eigenvalues, to find the eigenvectors, we substitute each eigenvalue into $A - \lambda I$ and solve the equation $(A - \lambda I)\mathbf x = \mathbf 0$.

For $\lambda = 1$, we have $\begin{bmatrix} 6 & -9 \\ 2 & -3 \end{bmatrix} \mathbf x = \mathbf 0$. From this, we get $\mathbf x = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$.
For $\lambda = 4$, we have $\begin{bmatrix} 3 & -9 \\ 2 & -6 \end{bmatrix} \mathbf x = \mathbf 0$. From this, we get $\mathbf x = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$.
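The step from the singular matrix to the eigenvector can be sketched as follows. This is a shortcut that only works in the $2 \times 2$ integer case: if a row of $A - \lambda I$ is $(a, b)$, then $(-b, a)$ is perpendicular to it, and since the matrix is singular the other row is a multiple of the same row. The helper names are hypothetical.

```python
from math import gcd

def shift_2x2(A, lam):
    """Return A - lam * I for a 2x2 matrix."""
    return [[A[i][j] - (lam if i == j else 0) for j in range(2)]
            for i in range(2)]

def null_vector_2x2(B):
    """A nonzero integer x with B x = 0.

    Assumes B is a singular 2x2 integer matrix that is not all zeros.
    """
    for a, b in B:
        if (a, b) != (0, 0):
            g = gcd(a, b)
            return [-b // g, a // g]   # perpendicular to the row (a, b)
    raise ValueError("zero matrix: every vector is in the null space")

def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[7, -9],
     [2, -2]]
for lam in (1, 4):
    x = null_vector_2x2(shift_2x2(A, lam))
    print(lam, x, matvec(A, x) == [lam * xi for xi in x])
```

Any nonzero multiple of the returned vector is an equally valid eigenvector.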

The entire process to find eigenvalues and eigenvectors is summarized:

Compute the determinant of $A - \lambda I$. This is a polynomial in $\lambda$, the characteristic polynomial of $A$.
Since $A - \lambda I$ is not invertible, its determinant is 0. So we solve for the roots of the characteristic polynomial. These are our eigenvalues.
For each eigenvalue, substitute into the equation $(A - \lambda I) \mathbf x = \mathbf 0$ and solve for $\mathbf x$.

There are a few things to watch out for at this point.

Technically speaking, each eigenvalue corresponds not to a single eigenvector, but a subspace of eigenvectors, called an eigenspace. In the cases we've seen, the eigenspace of an eigenvalue is spanned by a single vector—i.e., it has dimension 1.
What happens if $\lambda = 0$? It turns out this is not actually an issue and says something interesting about $A$. We can ask ourselves: if $\lambda = 0$, which vectors $\mathbf x$ satisfy $A \mathbf x = 0 \mathbf x = \mathbf 0$? The answer is: the vectors in the null space of $A$. So $0$ is an eigenvalue of $A$ exactly when the null space contains a nonzero vector, that is, when $A$ is singular.
What happens if our eigenvalues aren't distinct? A repeated root of the characteristic polynomial is what we call an eigenvalue with multiplicity greater than 1. However, finding its eigenvectors is the same as usual: again, we're really solving for the null space of a matrix. It may happen that this space is multidimensional, in which case we get several linearly independent eigenvectors for the same eigenvalue. But it may also be the case that we get fewer linearly independent eigenvectors than the multiplicity. The implications of this will be discussed soon.
What happens if $\lambda$ isn't a real number? It turns out this can actually happen even when our matrices only contain real numbers. Remember that real number polynomials are perfectly capable of having complex roots, so there's no real reason for real matrices to magically avoid that problem. Luckily, we'll see that this is not a huge problem for our purposes, so we'll only note that this is possible but it is not something we'll spend a lot of thought on.
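Each of these cases shows up already for $2 \times 2$ matrices. The sketch below uses four small matrices of our own choosing: one with a zero eigenvalue, two with a repeated eigenvalue (one with a two-dimensional eigenspace and one with only a one-dimensional eigenspace), and one with complex eigenvalues. The helper functions are hypothetical and specific to the $2 \times 2$ case.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of lambda^2 - (a + d)*lambda + (ad - bc)."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)

def eigenspace_dim_2x2(M, lam):
    """Dimension of the null space of M - lam*I, computed as 2 - rank."""
    (a, b), (c, d) = M
    B = [[a - lam, b], [c, d - lam]]
    if all(v == 0 for row in B for v in row):
        rank = 0
    elif B[0][0] * B[1][1] - B[0][1] * B[1][0] == 0:
        rank = 1
    else:
        rank = 2
    return 2 - rank

singular = [[1, 2], [2, 4]]    # dependent columns, so 0 is an eigenvalue
scaled_id = [[2, 0], [0, 2]]   # eigenvalue 2 twice, two independent eigenvectors
shear = [[2, 1], [0, 2]]       # eigenvalue 2 twice, only one independent eigenvector
quarter_turn = [[0, -1], [1, 0]]  # rotation by 90 degrees, no real eigenvalues

print(eigenvalues_2x2(singular), eigenspace_dim_2x2(singular, 0))
print(eigenvalues_2x2(scaled_id), eigenspace_dim_2x2(scaled_id, 2))
print(eigenvalues_2x2(shear), eigenspace_dim_2x2(shear, 2))
print(eigenvalues_2x2(quarter_turn))
```

The rotation case matches geometric intuition: a quarter turn changes the direction of every nonzero real vector, so no real vector can be merely scaled.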