How do we know that a PageRank link matrix has 1 as an eigenvalue? One helpful observation is that $A$ is really a stochastic matrix (i.e. Markov matrix): the column for a page with $n$ outgoing links contains $\frac 1 n$ in the row of each linked page, so every column sums to 1. As it turns out, every stochastic matrix has an eigenvalue of 1.
Every stochastic matrix has 1 as an eigenvalue.
Let $A$ be an $n \times n$ stochastic matrix and let $\mathbf 1 = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$ be a vector of all 1s in $\mathbb R^n$. We have $A^T \mathbf 1 = \mathbf 1$, since all columns of $A$ add up to 1. This means that $\mathbf 1$ is an eigenvector for $A^T$ with eigenvalue 1. Since $A$ and $A^T$ have the same eigenvalues, this means that 1 is an eigenvalue for $A$.
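As a quick numerical check of the key identity $A^T \mathbf 1 = \mathbf 1$, using the link matrix from the example that follows:

```python
import numpy as np

# The link matrix from the example below
A = np.array([[0, 0, 1, 1/2],
              [1/3, 0, 0, 0],
              [1/3, 1/2, 0, 1/2],
              [1/3, 1/2, 0, 0]])

# Each column of A sums to 1, so A^T applied to the all-ones vector returns it
print(A.T @ np.ones(4))  # [1. 1. 1. 1.]
```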
This means that if we compute the eigenvalues and eigenvectors of $A$, we will find that one of the eigenvalues is 1.
>>> A = np.array([[0, 0, 1, 1/2],
[1/3, 0, 0, 0],
[1/3, 1/2, 0, 1/2],
[1/3, 1/2, 0, 0]])
>>> L, X = np.linalg.eig(A)
>>> L
array([ 1. +0.j , -0.36062333+0.41097555j,
-0.36062333-0.41097555j, -0.27875333+0.j ])
>>> X[:,[0]]
array([[0.72101012+0.j],
[0.24033671+0.j],
[0.54075759+0.j],
[0.36050506+0.j]])
Notice that although some of the eigenvalues are complex, we have an eigenvalue of 1 right up front, and we can pick out the associated eigenvector. Typically, we would normalize this vector so that it is stochastic (its entries add up to 1), and since it's real, we can take the opportunity to discard the complex parts.
>>> x = np.real(X[:,[0]])
>>> x = x / np.linalg.norm(x, ord=1)
>>> x
array([[0.38709677],
[0.12903226],
[0.29032258],
[0.19354839]])
Even without normalizing, we can see that the most highly ranked page is page 1, even though only two pages link to it, compared to page 3, which has three incoming links.
This provides another interesting interpretation for the link matrix: it's really a Markov chain where a web "surfer" (as they were called in the 90s) decides to follow outgoing links on each page with uniform probability.
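We can sketch this surfer interpretation with a short power-iteration loop: starting from a uniform distribution over the pages and repeatedly applying $A$, the distribution settles on the eigenvector with eigenvalue 1. (This works here because 1 is strictly the largest eigenvalue in magnitude; convergence is not guaranteed for every link matrix.)

```python
import numpy as np

# The link matrix from the example above
A = np.array([[0, 0, 1, 1/2],
              [1/3, 0, 0, 0],
              [1/3, 1/2, 0, 1/2],
              [1/3, 1/2, 0, 0]])

# Start the surfer with equal probability on every page
x = np.full(4, 1/4)

# Each step, the surfer follows an outgoing link uniformly at random;
# the distribution after one step is A @ x. Since A is stochastic,
# x remains a probability vector throughout.
for _ in range(100):
    x = A @ x

print(x)  # converges to the normalized eigenvector for eigenvalue 1
```

The result matches the normalized eigenvector computed above, without ever calling an eigenvalue routine.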
So far, our discussion of eigenvectors has necessarily focused on square matrices, since only square matrices have eigenvectors. Ultimately, we would like a way to connect this back to matrices of arbitrary size in a meaningful way.
One way to do this is to consider an $m \times n$ matrix $A$ and the product $A^TA$, which we have seen in a few different contexts.
Informally, this product can capture some notion of correlation or similarity. Structurally, we observe that $A^TA$ is square and symmetric.
A matrix $A$ is symmetric if and only if $A^T = A$.
Symmetric matrices give us a surprisingly strong decomposition result, called the Spectral Theorem.
Let $S$ be a real symmetric matrix. Then $S = Q\Lambda Q^T$ for some orthogonal matrix $Q$ and diagonal matrix $\Lambda$.
Suppose $S$ has a nonzero eigenvalue $\lambda$ and a zero eigenvalue. Then $S \mathbf x = \lambda \mathbf x$ and $S \mathbf y = 0 \mathbf y$. These tell us that $\mathbf x$ is in the column space of $S$ (since $\mathbf x = S(\frac 1 \lambda \mathbf x)$) and $\mathbf y$ is in the null space of $S$.
This doesn't seem to help us because the column space and null space are not orthogonal complements. However, $S$ is symmetric—this means its row space is the same as its column space. This is enough to give us that $\mathbf x$ and $\mathbf y$ are orthogonal.
But what if $\mathbf y$ instead has another nonzero eigenvalue $\alpha \neq \lambda$? We have $S \mathbf y = \alpha \mathbf y$. So consider $S - \alpha I$. This gives us $(S - \alpha I) \mathbf y = \mathbf 0$ and $(S - \alpha I) \mathbf x = (\lambda - \alpha)\mathbf x$. Then $\mathbf y$ is in the null space of $S - \alpha I$ and, since $\lambda - \alpha \neq 0$, $\mathbf x$ is in the column space of $S - \alpha I$, which (because $S - \alpha I$ is also symmetric) is the row space of $S - \alpha I$. So the two vectors are orthogonal.
There are a few things to note here.
Consider the matrix $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$.
>>> A = np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]])
>>> L, X = np.linalg.eig(A)
>>> L
array([ 3.0000000e+00, -5.3481844e-16, 3.0000000e+00])
At this point, we see that $A$ has two eigenvalues: 3 and 0 (the middle value is 0 up to floating-point error). Since $\lambda = 3$ has multiplicity 2, we get two eigenvectors for it. We want orthonormal eigenvectors. Since NumPy produces normalized eigenvectors, we just have to deal with orthogonality.
The Spectral Theorem guarantees that eigenvectors of different eigenvalues are orthogonal. This means we need to be careful with eigenvectors that have the same eigenvalue. Again, the Spectral Theorem guarantees that we will have independent eigenvectors, but NumPy is not guaranteed to produce orthogonal eigenvectors.
>>> X
array([[ 0.81649658, -0.57735027, 0.24423262],
[-0.40824829, -0.57735027, -0.79684797],
[-0.40824829, -0.57735027, 0.55261536]])
>>> X.T @ X
array([[ 1.00000000e+00, -1.11022302e-16, 2.99122643e-01],
[-1.11022302e-16, 1.00000000e+00, -3.33066907e-16],
[ 2.99122643e-01, -3.33066907e-16, 1.00000000e+00]])
>>> X @ np.diag(L) @ X.T
array([[ 2.17894871, -1.58384879, -0.59509992],
[-1.58384879, 2.40490008, -0.82105129],
[-0.59509992, -0.82105129, 1.41615121]])
Here, we see that $X$ is not orthogonal: the corner entries of $X^TX$ are 0.299, which is not nearly small enough to be considered 0. But these entries correspond to the two eigenvectors with eigenvalue 3, so those vectors really are in the same eigenspace. Since they're independent vectors in the same eigenspace, we just have to orthogonalize them in the usual way: put them into a matrix and compute the QR factorization.
>>> Qs, _ = np.linalg.qr(X[:,[0,2]])
>>> Qs
array([[-8.16496581e-01, 1.00835816e-16],
[ 4.08248290e-01, -7.07106781e-01],
[ 4.08248290e-01, 7.07106781e-01]])
>>> X[:,[0,2]] = Qs
>>> X
array([[-8.16496581e-01, -5.77350269e-01, 1.00835816e-16],
[ 4.08248290e-01, -5.77350269e-01, -7.07106781e-01],
[ 4.08248290e-01, -5.77350269e-01, 7.07106781e-01]])
>>> X.T @ X
array([[ 1.00000000e+00, 2.22044605e-16, 1.11022302e-16],
[ 2.22044605e-16, 1.00000000e+00, -3.88578059e-16],
[ 1.11022302e-16, -3.88578059e-16, 1.00000000e+00]])
>>> X @ np.diag(L) @ X.T
array([[ 2., -1., -1.],
[-1., 2., -1.],
[-1., -1., 2.]])
Much better. But in fact, this is so important that NumPy has a special eigenvector function specifically for symmetric matrices called numpy.linalg.eigh.
>>> L, Q = np.linalg.eigh(A)
>>> L
array([1.11022302e-16, 3.00000000e+00, 3.00000000e+00])
>>> Q
array([[ 5.77350269e-01, -4.08248290e-01, -7.07106781e-01],
[ 5.77350269e-01, 8.16496581e-01, -1.65911125e-16],
[ 5.77350269e-01, -4.08248290e-01, 7.07106781e-01]])
>>> Q.T @ Q
array([[ 1.00000000e+00, -2.22044605e-16, -2.62322286e-16],
[-2.22044605e-16, 1.00000000e+00, -7.99547152e-17],
[-2.62322286e-16, -7.99547152e-17, 1.00000000e+00]])
>>> Q @ np.diag(L) @ Q.T
array([[ 2., -1., -1.],
[-1., 2., -1.],
[-1., -1., 2.]])
The h in eigh stands for "Hermitian", which refers to matrices $A$ that satisfy $A = \overline{A^T}$, where $\overline{A^T}$ is the transpose of the matrix containing the complex conjugates of the entries of $A$. Since we're dealing with real matrices, a Hermitian matrix is just a symmetric matrix.
It is important to note that NumPy will not stop you from using eigh on non-symmetric matrices and that the result will be incorrect.
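To see what can go wrong, here is a small matrix chosen purely for illustration (it does not appear elsewhere in the text). eigh reads only one triangle of its input (the lower triangle by default, per its UPLO parameter), so on a non-symmetric matrix it silently diagonalizes a different, symmetric matrix:

```python
import numpy as np

# A non-symmetric (in fact nilpotent) matrix: its only eigenvalue is 0
B = np.array([[0., 0.],
              [4., 0.]])

# eig gives the true eigenvalues
print(np.linalg.eig(B)[0])   # [0. 0.]

# eigh does not check symmetry: it reads only the lower triangle
# and behaves as if B were the symmetric matrix [[0, 4], [4, 0]]
print(np.linalg.eigh(B)[0])  # [-4.  4.]
```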
Next, here's a definition that seems like it comes out of nowhere but will be quite useful.
A symmetric real matrix $M$ is said to be positive semidefinite if for all vectors $\mathbf v$, $\mathbf v^T M \mathbf v \geq 0$. (The matrix $M$ is positive definite if $\mathbf v^T M \mathbf v \gt 0$ for all nonzero vectors $\mathbf v$.)
First, let's see that $A^TA$ is positive semidefinite. For any matrix $A$, the matrix $A^TA$ is symmetric, since $(A^TA)^T = A^T(A^T)^T = A^TA$. Then using the definition, we have \[\mathbf v^T A^T A \mathbf v = (A \mathbf v)^T (A \mathbf v) = \|A \mathbf v\|^2 \geq 0,\] since lengths of vectors are nonnegative.
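A quick empirical check of this claim; the random matrix here is an arbitrary stand-in, not from the text:

```python
import numpy as np

# An arbitrary rectangular matrix; any matrix works
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
M = A.T @ A

# v^T (A^T A) v = (A v)^T (A v) = ||A v||^2 >= 0 for every v
for _ in range(1000):
    v = rng.standard_normal(5)
    q = v @ M @ v
    assert abs(q - np.linalg.norm(A @ v) ** 2) < 1e-9
    assert q >= 0
print("quadratic form is nonnegative for all sampled v")
```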
This directly leads to the following significant consequence:
The eigenvalues of $A^TA$ are nonnegative.
Let $\mathbf v$ be an eigenvector of $A^TA$ with eigenvalue $\lambda$. So \begin{align*} A^TA \mathbf v &= \lambda \mathbf v \\ \mathbf v^T A^TA \mathbf v &= \mathbf v^T \lambda \mathbf v \\ (A \mathbf v)^T (A \mathbf v) &= \lambda (\mathbf v^T \mathbf v) \\ \|A \mathbf v\|^2 &= \lambda \|\mathbf v\|^2 \\ \lambda &= \frac{\|A \mathbf v\|^2}{\|\mathbf v\|^2} \end{align*}
We know this is nonnegative because $\mathbf v$ is nonzero and vector lengths are nonnegative.
Interestingly enough, this shows that each eigenvalue of $A^TA$ is the square of the "stretch" factor $\frac{\|A \mathbf v\|}{\|\mathbf v\|}$ that $A$ applies to the corresponding eigenvector $\mathbf v$.
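A small sketch of this interpretation, using an arbitrary random matrix: for each unit eigenvector $\mathbf v$ of $A^TA$ with eigenvalue $\lambda$, the length of $A\mathbf v$ is $\sqrt\lambda$.

```python
import numpy as np

# An arbitrary rectangular matrix (not from the text)
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# Diagonalize A^T A; eigh returns eigenvalues in ascending order
L, V = np.linalg.eigh(A.T @ A)

# For a unit eigenvector v with eigenvalue lam, ||A v||^2 = lam,
# so the stretch factor ||A v|| / ||v|| is sqrt(lam)
for lam, v in zip(L, V.T):
    print(np.linalg.norm(A @ v), np.sqrt(max(lam, 0.0)))
```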
Consider the matrix $A = \begin{bmatrix} 1 & 3 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & 3 \\ 1 & 3 & 1 & 1 & 12 \end{bmatrix}$.
>>> A = np.array([[1, 3, 1, 0, 9], [0, 0, 0, 1, 3], [1, 3, 1, 1, 12]])
>>> L, X = np.linalg.eigh(A.T @ A)
>>> L
array([-1.53390816e-14, 1.41158059e-16, 1.11210331e-14, 2.24038498e+00,
2.55759615e+02])
Notice that the first three eigenvalues are effectively 0 (they are nonzero only due to floating-point error).
Why does this matter? Suppose $A$ is $m \times n$ with rank $r \lt \min(m, n)$. Consider a diagonalization of $A^TA = Q \Lambda Q^T$. Since $A^TA$ is symmetric, $Q$ is orthogonal. Since $A^TA$ is positive semidefinite, $\Lambda$ is nonnegative.
What is the column space of $Q$? Since columns of $Q$ are eigenvectors, they must be in the column space of $A^TA$, since $A^TA \mathbf x = \lambda \mathbf x$. So the columns of $Q$ must form a basis for the column space of $A^TA$.
But what is the column space of $A^TA$ in terms of $A$? If we consider a product $A^T A\mathbf x$, then $A \mathbf x$ is a vector in the column space of $A$, which makes $A^T(A \mathbf x)$ a vector in the column space of $A^T$. But the columns of $A^T$ are the rows of $A$, so we're really in the row space of $A$.
This means that the columns of $Q$ are an orthonormal basis for the row space of $A$. But hold on a second: if $r \lt n$, then that can't be the case. The dimension of the row space of $A$ is only $r$, but we have $n$ orthogonal vectors in $Q$, which is more than $r$.
The key is the eigenvalues: some of those eigenvalues are positive but some are 0. But what are the eigenvectors with eigenvalue 0? Remember that all columns of $Q$ are orthogonal. Which vectors are orthogonal to vectors in the row space but not in the row space itself? It's the null space.
The columns of $Q$ contain $r$ orthogonal vectors in the row space of $A$ and $n-r$ orthogonal vectors in the null space of $A$. In other words, we have an orthogonal basis for the row space and an orthogonal basis for the null space.
From our earlier example, the first three eigenvalues we got were 0. If we take these columns and multiply $A$ by them, we get zeros. So these columns are in the null space of $A$.
>>> A = np.array([[1, 3, 1, 0, 9], [0, 0, 0, 1, 3], [1, 3, 1, 1, 12]])
>>> L, X = np.linalg.eigh(A.T @ A)
>>> L
array([-1.53390816e-14, 1.41158059e-16, 1.11210331e-14, 2.24038498e+00,
2.55759615e+02])
>>> A @ X[:,:3]
array([[ 2.85882429e-15, -4.85722573e-16, -7.13318293e-15],
[-3.41393580e-15, 4.30211422e-16, 5.24580379e-15],
[-5.55111512e-16, -5.55111512e-17, -2.10942375e-15]])
Now, take the two remaining columns. Since they're orthogonal, they're independent. If we put them together with the rows of $A$ (as columns), we see that the rank of that matrix is still 2.
>>> np.linalg.matrix_rank(X[:,3:])
2
>>> np.linalg.matrix_rank(np.hstack((A.T, X[:,3:])))
2
But wait, we can repeat this entire argument using $A^T$ instead, since $(A^T)^T A^T = AA^T$, which is real, symmetric, and positive semidefinite. So we can decompose $AA^T = Q' \Lambda Q'^T$, where $Q'$ is orthogonal and contains a basis for the column space of $A$ and a basis for the left null space of $A$.
Here, the first eigenvalue is 0, so we take the first column and multiply $A^T$ by it. We get the zero vector, which means this column is in the left null space of $A$.
>>> A = np.array([[1, 3, 1, 0, 9], [0, 0, 0, 1, 3], [1, 3, 1, 1, 12]])
>>> L, X = np.linalg.eigh(A @ A.T)
>>> L
array([-1.77655138e-15, 2.24038498e+00, 2.55759615e+02])
>>> X
array([[-0.57735027, 0.55647685, -0.59749492],
[-0.57735027, -0.7956842 , -0.18317563],
[ 0.57735027, -0.23920735, -0.78067055]])
>>> A.T @ X[:,:1]
array([[-5.55111512e-16],
[-1.77635684e-15],
[-5.55111512e-16],
[ 1.99840144e-15],
[ 8.88178420e-16]])
>>> np.linalg.matrix_rank(np.hstack((A, X[:,1:])))
2
One final observation that we can see from the above examples is that the nonzero eigenvalues of $A^TA$ and $AA^T$ are the same. We will see that this provides a link between eigenvectors of $A^TA$ and eigenvectors of $AA^T$, resulting in the singular value decomposition of $A$.
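We can verify this observation numerically with the matrix from the examples above; eigvalsh is the eigenvalue-only counterpart of eigh, and it also returns eigenvalues in ascending order:

```python
import numpy as np

# The matrix from the examples above
A = np.array([[1, 3, 1, 0, 9],
              [0, 0, 0, 1, 3],
              [1, 3, 1, 1, 12]])

L1 = np.linalg.eigvalsh(A.T @ A)  # 5 eigenvalues; 3 are numerically 0
L2 = np.linalg.eigvalsh(A @ A.T)  # 3 eigenvalues; 1 is numerically 0

# The nonzero eigenvalues of A^T A and A A^T agree
print(L1[-2:])
print(L2[-2:])
```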