Returning to the SVD, recall that $A^TA$ is a symmetric $n \times n$ matrix and $AA^T$ is a symmetric $m \times m$ matrix. By the spectral theorem, both of these admit diagonalizations $Q \Lambda Q^T$. Now, suppose we know $A = U \Sigma V^T$. Then \[A^TA = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T\] and \[AA^T = (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma^2 U^T.\]
Of course, this is entirely the wrong way to go about things: we need to show that $A = U \Sigma V^T$ first. But looking at it like this, we can see that in both cases, we have a diagonal matrix $\Sigma^2$ surrounded by an orthogonal matrix and its transpose. This tells us that the eigenvalues of $A^TA$ and $AA^T$ are the squares of the singular values of $A$, that the right singular vectors (the columns of $V$) are orthonormal eigenvectors of $A^TA$, and that the left singular vectors (the columns of $U$) are orthonormal eigenvectors of $AA^T$.
If we were diagonalizing a symmetric matrix, we could stop here, since it's clear how eigenvalues and eigenvectors are related. However, with SVD, we have a bit more work to do: we have to ensure that the left singular vectors and right singular vectors are associated with the correct singular values.
Recall that we have $AV = U\Sigma$. This means that each right singular vector $\mathbf v_k$ will determine the associated left singular vector by the equation \[A \mathbf v_k = \sigma_k \mathbf u_k.\] Of course, in order to know $\mathbf v_k$, we needed to compute $\lambda_k = \sigma_k^2$, so we already have all the pieces to find $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$. Let's convince ourselves that $\mathbf u_k$ is actually a right singular vector. To do that we need to first show that it's an eigenvector of $AA^T$.
Let $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$. Then $\mathbf u_k$ is an eigenvector of $AA^T$ with eigenvalue $\sigma_k^2$.
We see that \[AA^T \mathbf u_k = AA^T \frac{A \mathbf v_k}{\sigma_k} = A \frac{A^TA \mathbf v_k}{\sigma_k} = A \frac{\sigma_k^2 \mathbf v_k}{\sigma_k} = \sigma_k^2 \mathbf u_k.\]
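This proposition is easy to check numerically. Below is a quick NumPy sanity check using the running example from this section (the matrix and eigenvector come from the worked example; the script itself is just an illustration):

```python
import numpy as np

# Running example from this section.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])

# Unit eigenvector of A^T A for the eigenvalue lambda = 7.
v = np.array([1.0, 3.0, 2.0]) / np.sqrt(14)
sigma = np.sqrt(7)

# Construct u = A v / sigma and check that it is an eigenvector of A A^T
# with eigenvalue sigma^2, as the proposition claims.
u = A @ v / sigma
print(np.allclose(A @ A.T @ u, sigma**2 * u))  # True
```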
Then, we want to verify that all of these eigenvectors $\mathbf u_k$ are orthonormal.
Let $\mathbf u_j = \frac{A \mathbf v_j}{\sigma_j}$ and $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$. Then $\mathbf u_j$ and $\mathbf u_k$ are orthonormal.
We have \[\mathbf u_j^T \mathbf u_k = \left(\frac{A \mathbf v_j}{\sigma_j}\right)^T \frac{A \mathbf v_k}{\sigma_k} = \frac{\mathbf v_j^T A^TA\mathbf v_k}{\sigma_j\sigma_k} = \frac{\sigma_k}{\sigma_j} \mathbf v_j^T \mathbf v_k.\] Since we started off with the assumption that the right singular vectors are orthonormal, $\mathbf v_j^T \mathbf v_k$ is 0 when $j \neq k$ and 1 when $j = k$. Therefore, $\mathbf u_j^T \mathbf u_k = 0$ when $j \neq k$, while $\mathbf u_k^T \mathbf u_k = \frac{\sigma_k}{\sigma_k} \cdot 1 = 1$, as desired.
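We can also verify this orthonormality claim numerically for the two right singular vectors of the running example (the vectors are taken from the worked example; the check itself is a sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])

# Orthonormal eigenvectors of A^T A for the eigenvalues 7 and 3.
v1 = np.array([1.0, 3.0, 2.0]) / np.sqrt(14)
v2 = np.array([-1.0, -1.0, 2.0]) / np.sqrt(6)

# Left singular vectors u_k = A v_k / sigma_k.
u1 = A @ v1 / np.sqrt(7)
u2 = A @ v2 / np.sqrt(3)

# u_j^T u_k should be 0 for j != k and 1 for j = k.
print(np.isclose(u1 @ u2, 0.0),
      np.isclose(u1 @ u1, 1.0),
      np.isclose(u2 @ u2, 1.0))  # True True True
```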
From our earlier example, we have \[A A^T = \begin{bmatrix}5 & 2\\2 & 5\end{bmatrix}, \quad A^T A = \begin{bmatrix}1 & 2 & 0\\2 & 5 & 2\\0 & 2 & 4\end{bmatrix}.\] The characteristic polynomial for $A^T A$ is \[(1-\lambda)((5-\lambda)(4-\lambda) -4) - 2(2(4-\lambda)) = -\lambda(\lambda^2-10\lambda + 21).\] We verify that the characteristic polynomial for $AA^T$ is \[(5-\lambda)^2 - 4 = \lambda^2 - 10 \lambda + 21.\] The nonzero roots of these polynomials are 7 and 3. This then gives us the singular values we saw earlier of $\sqrt 7$ and $\sqrt 3$. One can then solve for the eigenvectors of $A^TA$ in the usual way. Our initial example then shows how these eigenvectors (the right singular vectors of $A$) can be used to obtain the left singular vectors of $A$.
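These eigenvalue computations are easy to double-check with NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])

# eigvalsh handles symmetric matrices and returns eigenvalues in
# ascending order.
eig_small = np.linalg.eigvalsh(A @ A.T)   # eigenvalues of the 2 x 2 matrix
eig_big = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of the 3 x 3 matrix

print(np.allclose(eig_small, [3, 7]))     # True
print(np.allclose(eig_big, [0, 3, 7]))    # True: same nonzero eigenvalues
```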
In summary, to compute the SVD for a matrix $A$:

1. Form the symmetric matrix $A^TA$ and find its nonzero eigenvalues $\lambda_1 \geq \cdots \geq \lambda_r > 0$ and corresponding orthonormal eigenvectors $\mathbf v_1, \dots, \mathbf v_r$.
2. The singular values are $\sigma_k = \sqrt{\lambda_k}$, and the right singular vectors $\mathbf v_k$ are the columns of $V$.
3. Compute the left singular vectors $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$; these are the columns of $U$.
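The whole recipe can be sketched as a short NumPy function. This is only an illustration of the procedure (the helper name `svd_from_eigen` and the rank tolerance are our own choices; in practice one would call `np.linalg.svd`, which is more numerically robust):

```python
import numpy as np

def svd_from_eigen(A):
    """Reduced SVD via the eigendecomposition of A^T A (illustrative only)."""
    # Step 1: eigenvalues/eigenvectors of the symmetric matrix A^T A.
    lam, V = np.linalg.eigh(A.T @ A)
    # Sort eigenvalues in decreasing order and keep the nonzero ones.
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    r = np.sum(lam > 1e-12)
    lam, V = lam[:r], V[:, :r]
    # Step 2: singular values are the square roots of the eigenvalues.
    sigma = np.sqrt(lam)
    # Step 3: left singular vectors u_k = A v_k / sigma_k.
    U = (A @ V) / sigma
    return U, sigma, V

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
U, sigma, V = svd_from_eigen(A)
print(np.round(sigma, 4))               # approximately [sqrt(7), sqrt(3)]
print(np.allclose(U * sigma @ V.T, A))  # True: the product reconstructs A
```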
By now, you may have noticed something amiss: our initial definition of the SVD involved an $m \times r$ matrix $U$, an $r \times r$ diagonal matrix $\Sigma$, and an $n \times r$ matrix $V$. However, the outline of our computation suggests an $m \times m$ matrix $U$ (whose columns are the eigenvectors of $AA^T$) and an $n \times n$ matrix $V$ (whose columns are the eigenvectors of $A^TA$). If this is the case, we are left needing an $m \times n$ matrix $\Sigma$ in order to make the product $U \Sigma V^T$ work. What gives?
The definition we gave earlier is, technically, the reduced SVD. The reduced SVD can be thought of as the "useful" part of the SVD. Why?
Suppose we have a matrix $A$ with rank $r < \min\{m, n\}$. In this case, $A$ has $r$ nonzero singular values, but the full decomposition has $\min\{m,n\}$ singular values, and the remaining $\min\{m,n\} - r$ of them are all 0. Then what goes in the rest of the "real" $\Sigma$? We would take our $r \times r$ matrix and pad it out with 0s until it is $m \times n$.
Notice that because the rest of the diagonal of $\Sigma$ would be 0s anyway, any additional vectors aren't needed to reconstruct $A$. We would have \begin{align*} A &= \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T + \sigma_{r+1} \mathbf u_{r+1} \mathbf v_{r+1}^T + \cdots + \sigma_{\min\{m,n\}} \mathbf u_{\min\{m,n\}} \mathbf v_{\min\{m,n\}}^T\\ &= \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T + 0 + \cdots + 0 \\ &= \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T \end{align*}
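A small numerical illustration of this point, using a rank-1 matrix so that there is a zero singular value to discard (this particular matrix is our own example):

```python
import numpy as np

# A rank-1 example: the second singular value is 0, so its rank-1 piece
# contributes nothing and the reduced sum already reconstructs B.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
U, s, Vt = np.linalg.svd(B)

full_sum = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(len(s)))
reduced_sum = s[0] * np.outer(U[:, 0], Vt[0])

print(np.allclose(full_sum, B), np.allclose(reduced_sum, B))  # True True
```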
This leads to the next question: what exactly are the missing columns of $U$ and $V$? There are some hints: since $U$ and $V$ are orthogonal, the missing columns must be orthonormal to the columns we already have, and since the missing columns are paired with zero singular values, each missing column $\mathbf v_k$ of $V$ must satisfy $A \mathbf v_k = \sigma_k \mathbf u_k = \mathbf 0$.
The answer is that the missing columns in $U$ are a basis for the left null space of $A$: they are orthogonal to the first $r$ columns of $U$, which are a basis for the column space of $A$, and there are exactly $m-r$ of them. Similarly, the missing columns in $V$ are a basis for the null space of $A$: they are orthogonal to the first $r$ columns of $V$, a basis for the row space of $A$, and there are exactly $n-r$ of them.
These missing columns together with the matrices we already discussed give us the full singular value decomposition of $A$. In this sense, the SVD really does give us all we want to know about our matrix: not only do we get orthonormal bases for the column and row spaces, like we would expect, but we also get orthonormal bases for the left and ordinary null spaces.
Let $A$ be an $m \times n$ matrix of rank $r$. Then \[A = U \Sigma V^T = \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T,\] where $U$ is an orthogonal $m \times m$ matrix, $\Sigma$ is an $m \times n$ diagonal matrix, and $V$ is an orthogonal $n \times n$ matrix such that the first $r$ columns of $U$ form an orthonormal basis for the column space of $A$ and the remaining $m - r$ columns form an orthonormal basis for the left null space of $A$; the diagonal entries of $\Sigma$ are the singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$, followed by 0s; and the first $r$ columns of $V$ form an orthonormal basis for the row space of $A$ and the remaining $n - r$ columns form an orthonormal basis for the null space of $A$.
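NumPy can produce either form: passing `full_matrices=True` to `np.linalg.svd` gives the full SVD described here, with $\Sigma$ left for us to pad out to $m \times n$. A quick check on the running example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
m, n = A.shape

# Full SVD: U is m x m, V is n x n, and Sigma must be padded to m x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)  # (2, 2) (2, 3) (3, 3)
print(np.allclose(U @ Sigma @ Vt, A))  # True
```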
Let's go back to geometry and transformations to have another look at what the SVD is saying. Recall that we can view a matrix $A$ as a transformation of vectors in $\mathbb R^n$ to vectors in $\mathbb R^m$. The SVD gives us a more detailed breakdown of this process. We know that if $A = U \Sigma V^T$, then the transformation $A$ can be viewed as a transformation $V^T$, followed by $\Sigma$, followed by $U$.
First, we make a key observation: orthogonal matrices preserve lengths and angles. How do we know this? If $Q$ is an orthogonal matrix, then for any vectors $\mathbf x$ and $\mathbf y$, we have $(Q\mathbf x)^T (Q\mathbf y) = \mathbf x^T Q^TQ \mathbf y = \mathbf x^T \mathbf y$, so inner products, and therefore lengths and angles, are unchanged. Moreover, since $Q^TQ = I$, we have $\det Q^T \det Q = (\det Q)^2 = \det I = 1$, so $\det Q = \pm 1$: the transformation is a rotation, possibly combined with a reflection. Intuitively, the "volume" doesn't change after transformation under $Q$. This gives us the following breakdown: $V^T$ rotates (or reflects) the input, $\Sigma$ stretches it along the coordinate axes by the factors $\sigma_k$, and $U$ rotates (or reflects) the result into the output space.
There are some interesting things to note here: all of the stretching is isolated in the diagonal matrix $\Sigma$, while the orthogonal matrices $U$ and $V^T$ leave lengths and angles unchanged, so the singular values alone tell us how much $A$ distorts space.
If we want to compute the full SVD for $A$, we need to add some steps to our process.
From our example above, we have $A = \begin{bmatrix} 1& 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}$ with the reduced SVD \[U = \frac 1 {\sqrt 2} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sqrt 7 & 0 \\ 0 & \sqrt 3 \end{bmatrix}, \quad V = \begin{bmatrix} \frac 1 {\sqrt{14}} & -\frac 1 {\sqrt 6} \\ \frac 3 {\sqrt{14}} & -\frac 1 {\sqrt 6} \\ \frac 2 {\sqrt{14}} & \frac 2 {\sqrt 6} \end{bmatrix}. \] To get the full SVD for $A$, we need to expand $\Sigma$ and $V$. We do not need to change $U$, since it is already the right size ($2 \times 2$). First, we simply pad $\Sigma$ so that it becomes a $2 \times 3$ matrix. We get \[\Sigma = \begin{bmatrix} \sqrt 7 & 0 & 0 \\ 0 & \sqrt 3 & 0 \end{bmatrix}.\]
Then we need to make $V$ a $3 \times 3$ matrix. To do this, we need to compute the null space of $A$. The rref of $A$ is $\begin{bmatrix} 1 & 0 & -4 \\ 0 & 1 & 2 \end{bmatrix}$, which gives us a special solution of $\begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix}$.
Here, we note that this vector is orthogonal to the existing columns of $V$, just as we promised. There are two reasons for this: the existing columns of $V$ lie in the row space of $A$ (each one satisfies $\mathbf v_k = \frac{A^T \mathbf u_k}{\sigma_k}$, a combination of the rows of $A$), and the null space of $A$ is orthogonal to the row space of $A$.
So all that is left to do is normalize our vector, giving us \[V = \begin{bmatrix} \frac 1 {\sqrt{14}} & -\frac 1 {\sqrt 6} & \frac 4 {\sqrt{21}} \\ \frac 3 {\sqrt{14}} & -\frac 1 {\sqrt 6} & \frac{-2}{\sqrt{21}} \\ \frac 2 {\sqrt{14}} & \frac 2 {\sqrt 6} & \frac{1}{\sqrt{21}} \end{bmatrix}. \]
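We can confirm this hand-built full SVD numerically:

```python
import numpy as np

# The full SVD assembled by hand in this section: Sigma padded with a zero
# column, and V extended with the normalized null-space vector (4, -2, 1).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
U = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)
Sigma = np.array([[np.sqrt(7), 0.0, 0.0],
                  [0.0, np.sqrt(3), 0.0]])
V = np.column_stack([
    np.array([1.0, 3.0, 2.0]) / np.sqrt(14),
    np.array([-1.0, -1.0, 2.0]) / np.sqrt(6),
    np.array([4.0, -2.0, 1.0]) / np.sqrt(21),
])

print(np.allclose(U @ Sigma @ V.T, A))  # True
print(np.allclose(V.T @ V, np.eye(3)))  # True: V is orthogonal
```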
The existence of the SVD is both completely natural and kind of astonishing. It is astonishing because it's remarkable that a decomposition containing everything we want to know about our matrix exists at all—particularly when all of the decompositions we've seen before require our matrices to have special properties.
And yet, this is a completely natural result. If we believe that matrices have certain structures to them, like the fundamental subspaces, and these structures themselves have properties of their own, then of course there should be a decomposition that exposes these structures.
In a way, the SVD is the natural ending point for the work that we've done in this course.
We have one more trick up our sleeve, which is arguably the most useful piece of the SVD: it gives us a convenient tool to describe large, complex, high-dimensional data—low-rank approximations. This is a fitting conclusion to the theme of the course: decomposing large, complex matrices into simpler, tractable parts that expose the underlying structures.
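As a brief preview (the random matrix and `truncate` helper below are our own illustration, not from the text): truncating the SVD after the $k$ largest singular values gives a rank-$k$ approximation, and by the Eckart–Young theorem the spectral-norm error of that approximation is exactly the first discarded singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(k):
    """Rank-k approximation: keep only the k largest singular-value pieces."""
    return U[:, :k] * s[:k] @ Vt[:k]

# The spectral-norm error of the rank-k approximation is s[k],
# the largest discarded singular value.
for k in (5, 20):
    err = np.linalg.norm(A - truncate(k), ord=2)
    print(np.isclose(err, s[k]))  # True
```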