Returning to the SVD, recall that $A^TA$ is a symmetric $n \times n$ matrix and $AA^T$ is a symmetric $m \times m$ matrix. By the spectral theorem, both of these admit diagonalizations $Q \Lambda Q^T$. Now, suppose we know $A = U \Sigma V^T$. Then \[A^TA = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T\] and \[AA^T = (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma^2 U^T.\]
Of course, this is entirely the wrong way to go about things: we need to show that $A = U \Sigma V^T$ first. But looking at it like this, we can see that in both cases, we have a diagonal matrix $\Sigma^2$ surrounded by an orthogonal matrix and its transpose. This tells us that the eigenvalues of $A^TA$ and $AA^T$ are the squares of the singular values of $A$, that the right singular vectors (the columns of $V$) are orthonormal eigenvectors of $A^TA$, and that the left singular vectors (the columns of $U$) are orthonormal eigenvectors of $AA^T$.
If we were diagonalizing a symmetric matrix, we could stop here, since it's clear how eigenvalues and eigenvectors are related. However, with SVD, we have a bit more work to do: we have to ensure that the left singular vectors and right singular vectors are associated with the correct singular values.
Recall that we have $AV = U\Sigma$. This means that each right singular vector $\mathbf v_k$ will determine the associated left singular vector by the equation \[A \mathbf v_k = \sigma_k \mathbf u_k.\] Of course, in order to know $\mathbf v_k$, we needed to compute $\lambda_k = \sigma_k^2$, so we already have all the pieces to find $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$. Let's convince ourselves that $\mathbf u_k$ is actually a right singular vector. To do that we need to first show that it's an eigenvector of $AA^T$.
Let $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$. Then $\mathbf u_k$ is an eigenvector of $AA^T$ with eigenvalue $\sigma_k^2$.
We see that \[AA^T \mathbf u_k = AA^T \frac{A \mathbf v_k}{\sigma_k} = A \frac{A^TA \mathbf v_k}{\sigma_k} = A \frac{\sigma_k^2 \mathbf v_k}{\sigma_k} = \sigma_k^2 \mathbf u_k.\]
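This proposition is easy to check numerically. Below is a quick NumPy sanity check using the running example from this section (the matrix and eigenvector come from the worked example; the script itself is just an illustration):

```python
import numpy as np

# Running example from this section.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])

# Unit eigenvector of A^T A for the eigenvalue lambda = 7.
v = np.array([1.0, 3.0, 2.0]) / np.sqrt(14)
sigma = np.sqrt(7)

# Construct u = A v / sigma and check that it is an eigenvector of A A^T
# with eigenvalue sigma^2, as the proposition claims.
u = A @ v / sigma
print(np.allclose(A @ A.T @ u, sigma**2 * u))  # True
```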
Then, we want to verify that all of these eigenvectors $\mathbf u_k$ are orthonormal.
Let $\mathbf u_j = \frac{A \mathbf v_j}{\sigma_j}$ and $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$. Then $\mathbf u_j$ and $\mathbf u_k$ are orthonormal.
We have \[\mathbf u_j^T \mathbf u_k = \left(\frac{A \mathbf v_j}{\sigma_j}\right)^T \frac{A \mathbf v_k}{\sigma_k} = \frac{\mathbf v_j^T A^TA\mathbf v_k}{\sigma_j\sigma_k} = \frac{\sigma_k}{\sigma_j} \mathbf v_j^T \mathbf v_k.\] Since we started off with the assumption that the right singular vectors are orthonormal, $\mathbf v_j^T \mathbf v_k$ is 0 when $j \neq k$ and 1 when $j = k$. Therefore, $\mathbf u_j^T \mathbf u_k = 0$ when $j \neq k$, while $\mathbf u_k^T \mathbf u_k = \frac{\sigma_k}{\sigma_k} \cdot 1 = 1$, as desired.
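We can also verify this orthonormality claim numerically for the two right singular vectors of the running example (the vectors are taken from the worked example; the check itself is a sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])

# Orthonormal eigenvectors of A^T A for the eigenvalues 7 and 3.
v1 = np.array([1.0, 3.0, 2.0]) / np.sqrt(14)
v2 = np.array([-1.0, -1.0, 2.0]) / np.sqrt(6)

# Left singular vectors u_k = A v_k / sigma_k.
u1 = A @ v1 / np.sqrt(7)
u2 = A @ v2 / np.sqrt(3)

# u_j^T u_k should be 0 for j != k and 1 for j = k.
print(np.isclose(u1 @ u2, 0.0),
      np.isclose(u1 @ u1, 1.0),
      np.isclose(u2 @ u2, 1.0))  # True True True
```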
From our earlier example, we have \[A A^T = \begin{bmatrix}5 & 2\\2 & 5\end{bmatrix}, \quad A^T A = \begin{bmatrix}1 & 2 & 0\\2 & 5 & 2\\0 & 2 & 4\end{bmatrix}.\] The characteristic polynomial for $A^T A$ is \[(1-\lambda)((5-\lambda)(4-\lambda) -4) - 2(2(4-\lambda)) = -\lambda(\lambda^2-10\lambda + 21).\] We verify that the characteristic polynomial for $AA^T$ is \[(5-\lambda)^2 - 4 = \lambda^2 - 10 \lambda + 21.\] The nonzero roots of these polynomials are 7 and 3. This then gives us the singular values we saw earlier of $\sqrt 7$ and $\sqrt 3$. One can then solve for the eigenvectors of $A^TA$ in the usual way. Our initial example then shows how these eigenvectors (the right singular vectors of $A$) can be used to obtain the left singular vectors of $A$.
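These eigenvalue computations are easy to double-check with NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])

# eigvalsh handles symmetric matrices and returns eigenvalues in
# ascending order.
eig_small = np.linalg.eigvalsh(A @ A.T)   # eigenvalues of the 2 x 2 matrix
eig_big = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of the 3 x 3 matrix

print(np.allclose(eig_small, [3, 7]))     # True
print(np.allclose(eig_big, [0, 3, 7]))    # True: same nonzero eigenvalues
```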
In summary, to compute the SVD for a matrix $A$:

1. Form the symmetric matrix $A^TA$ and find its nonzero eigenvalues $\lambda_1 \geq \cdots \geq \lambda_r > 0$ and corresponding orthonormal eigenvectors $\mathbf v_1, \dots, \mathbf v_r$.
2. The singular values are $\sigma_k = \sqrt{\lambda_k}$, and the right singular vectors $\mathbf v_k$ are the columns of $V$.
3. Compute the left singular vectors $\mathbf u_k = \frac{A \mathbf v_k}{\sigma_k}$; these are the columns of $U$.
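The whole recipe can be sketched as a short NumPy function. This is only an illustration of the procedure (the helper name `svd_from_eigen` and the rank tolerance are our own choices; in practice one would call `np.linalg.svd`, which is more numerically robust):

```python
import numpy as np

def svd_from_eigen(A):
    """Reduced SVD via the eigendecomposition of A^T A (illustrative only)."""
    # Step 1: eigenvalues/eigenvectors of the symmetric matrix A^T A.
    lam, V = np.linalg.eigh(A.T @ A)
    # Sort eigenvalues in decreasing order and keep the nonzero ones.
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    r = np.sum(lam > 1e-12)
    lam, V = lam[:r], V[:, :r]
    # Step 2: singular values are the square roots of the eigenvalues.
    sigma = np.sqrt(lam)
    # Step 3: left singular vectors u_k = A v_k / sigma_k.
    U = (A @ V) / sigma
    return U, sigma, V

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
U, sigma, V = svd_from_eigen(A)
print(np.round(sigma, 4))               # approximately [sqrt(7), sqrt(3)]
print(np.allclose(U * sigma @ V.T, A))  # True: the product reconstructs A
```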
By now, you may have noticed something amiss: our initial definition of the SVD involved an $m \times r$ matrix $U$, an $r \times r$ diagonal matrix $\Sigma$, and an $n \times r$ matrix $V$. However, the outline of our computation suggests an $m \times m$ matrix $U$ (whose columns are the eigenvectors of $AA^T$) and an $n \times n$ matrix $V$ (whose columns are the eigenvectors of $A^TA$). If this is the case, we are left needing an $m \times n$ matrix $\Sigma$ in order to make the product $U \Sigma V^T$ work. What gives?
The definition we gave earlier is, technically, the reduced SVD. The reduced SVD can be thought of as the "useful" part of the SVD. Why?
Suppose we have a matrix $A$ with rank $r < \min\{m, n\}$. In this case, $A$ has $r$ nonzero singular values, but the full decomposition has $\min\{m,n\}$ singular values, and the remaining $\min\{m,n\} - r$ of them are all 0. Then what goes in the rest of the "real" $\Sigma$? We would take our $r \times r$ matrix and pad it out with 0s until it is $m \times n$.
Notice that because the rest of the diagonal of $\Sigma$ would be 0s anyway, any additional vectors aren't needed to reconstruct $A$. We would have \begin{align*} A &= \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T + \sigma_{r+1} \mathbf u_{r+1} \mathbf v_{r+1}^T + \cdots + \sigma_{\min\{m,n\}} \mathbf u_{\min\{m,n\}} \mathbf v_{\min\{m,n\}}^T\\ &= \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T + 0 + \cdots + 0 \\ &= \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T \end{align*}
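A small numerical illustration of this point, using a rank-1 matrix so that there is a zero singular value to discard (this particular matrix is our own example):

```python
import numpy as np

# A rank-1 example: the second singular value is 0, so its rank-1 piece
# contributes nothing and the reduced sum already reconstructs B.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
U, s, Vt = np.linalg.svd(B)

full_sum = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(len(s)))
reduced_sum = s[0] * np.outer(U[:, 0], Vt[0])

print(np.allclose(full_sum, B), np.allclose(reduced_sum, B))  # True True
```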
This leads to the next question: what exactly are the missing columns of $U$ and $V$? There are some hints: since $U$ and $V$ are orthogonal, the missing columns must be orthonormal to the columns we already have, and since the missing columns are paired with zero singular values, each missing column $\mathbf v_k$ of $V$ must satisfy $A \mathbf v_k = \sigma_k \mathbf u_k = \mathbf 0$.
The answer is that the missing columns in $U$ are a basis for the left null space of $A$: they are orthogonal to the first $r$ columns of $U$, which are a basis for the column space of $A$, and there are exactly $m-r$ of them. Similarly, the missing columns in $V$ are a basis for the null space of $A$: they are orthogonal to the first $r$ columns of $V$, a basis for the row space of $A$, and there are exactly $n-r$ of them.
These missing columns together with the matrices we already discussed give us the full singular value decomposition of $A$. In this sense, the SVD really does give us all we want to know about our matrix: not only do we get orthonormal bases for the column and row spaces, like we would expect, but we also get orthonormal bases for the left and ordinary null spaces.
Let $A$ be an $m \times n$ matrix of rank $r$. Then \[A = U \Sigma V^T = \sigma_1 \mathbf u_1 \mathbf v_1^T + \cdots + \sigma_r \mathbf u_r \mathbf v_r^T,\] where $U$ is an orthogonal $m \times m$ matrix, $\Sigma$ is an $m \times n$ diagonal matrix, and $V$ is an orthogonal $n \times n$ matrix such that the first $r$ columns of $U$ form an orthonormal basis for the column space of $A$ and the remaining $m - r$ columns form an orthonormal basis for the left null space of $A$; the diagonal entries of $\Sigma$ are the singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$, followed by 0s; and the first $r$ columns of $V$ form an orthonormal basis for the row space of $A$ and the remaining $n - r$ columns form an orthonormal basis for the null space of $A$.
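NumPy can produce either form: passing `full_matrices=True` to `np.linalg.svd` gives the full SVD described here, with $\Sigma$ left for us to pad out to $m \times n$. A quick check on the running example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
m, n = A.shape

# Full SVD: U is m x m, V is n x n, and Sigma must be padded to m x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)  # (2, 2) (2, 3) (3, 3)
print(np.allclose(U @ Sigma @ Vt, A))  # True
```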
Let's go back to geometry and transformations to have another look at what the SVD is saying. Recall that we can view a matrix $A$ as a transformation of vectors in $\mathbb R^n$ to vectors in $\mathbb R^m$. The SVD gives us a more detailed breakdown of this process. We know that if $A = U \Sigma V^T$, then the transformation $A$ can be viewed as a transformation $V^T$, followed by $\Sigma$, followed by $U$.
First, we make a key observation: orthogonal matrices preserve lengths and angles. How do we know this? If $Q$ is an orthogonal matrix, then for any vectors $\mathbf x$ and $\mathbf y$, we have $(Q\mathbf x)^T (Q\mathbf y) = \mathbf x^T Q^TQ \mathbf y = \mathbf x^T \mathbf y$, so inner products, and therefore lengths and angles, are unchanged. Moreover, since $Q^TQ = I$, we have $\det Q^T \det Q = (\det Q)^2 = \det I = 1$, so $\det Q = \pm 1$: the transformation is a rotation, possibly combined with a reflection. Intuitively, the "volume" doesn't change after transformation under $Q$. This gives us the following breakdown: $V^T$ rotates (or reflects) the input, $\Sigma$ stretches it along the coordinate axes by the factors $\sigma_k$, and $U$ rotates (or reflects) the result into the output space.
There are some interesting things to note here: all of the stretching is isolated in the diagonal matrix $\Sigma$, while the orthogonal matrices $U$ and $V^T$ leave lengths and angles unchanged, so the singular values alone tell us how much $A$ distorts space.
If we want to compute the full SVD for $A$, we need to add some steps to our process.
From our example above, we have $A = \begin{bmatrix} 1& 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}$ with the reduced SVD \[U = \frac 1 {\sqrt 2} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sqrt 7 & 0 \\ 0 & \sqrt 3 \end{bmatrix}, \quad V = \begin{bmatrix} \frac 1 {\sqrt{14}} & -\frac 1 {\sqrt 6} \\ \frac 3 {\sqrt{14}} & -\frac 1 {\sqrt 6} \\ \frac 2 {\sqrt{14}} & \frac 2 {\sqrt 6} \end{bmatrix}. \] To get the full SVD for $A$, we need to expand $\Sigma$ and $V$. We do not need to change $U$, since it is already the right size ($2 \times 2$). First, we simply pad $\Sigma$ so that it becomes a $2 \times 3$ matrix. We get \[\Sigma = \begin{bmatrix} \sqrt 7 & 0 & 0 \\ 0 & \sqrt 3 & 0 \end{bmatrix}.\]
Then we need to make $V$ a $3 \times 3$ matrix. To do this, we need to compute the null space of $A$. The rref of $A$ is $\begin{bmatrix} 1 & 0 & -4 \\ 0 & 1 & 2 \end{bmatrix}$, which gives us a special solution of $\begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix}$.
Here, we note that this vector is orthogonal to the existing columns of $V$, just as we promised. There are two reasons for this: the existing columns of $V$ lie in the row space of $A$ (each one satisfies $\mathbf v_k = \frac{A^T \mathbf u_k}{\sigma_k}$, a combination of the rows of $A$), and the null space of $A$ is orthogonal to the row space of $A$.
So all that is left to do is normalize our vector, giving us \[V = \begin{bmatrix} \frac 1 {\sqrt{14}} & -\frac 1 {\sqrt 6} & \frac 4 {\sqrt{21}} \\ \frac 3 {\sqrt{14}} & -\frac 1 {\sqrt 6} & \frac{-2}{\sqrt{21}} \\ \frac 2 {\sqrt{14}} & \frac 2 {\sqrt 6} & \frac{1}{\sqrt{21}} \end{bmatrix}. \]
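We can confirm this hand-built full SVD numerically:

```python
import numpy as np

# The full SVD assembled by hand in this section: Sigma padded with a zero
# column, and V extended with the normalized null-space vector (4, -2, 1).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
U = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)
Sigma = np.array([[np.sqrt(7), 0.0, 0.0],
                  [0.0, np.sqrt(3), 0.0]])
V = np.column_stack([
    np.array([1.0, 3.0, 2.0]) / np.sqrt(14),
    np.array([-1.0, -1.0, 2.0]) / np.sqrt(6),
    np.array([4.0, -2.0, 1.0]) / np.sqrt(21),
])

print(np.allclose(U @ Sigma @ V.T, A))  # True
print(np.allclose(V.T @ V, np.eye(3)))  # True: V is orthogonal
```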
The existence of the SVD is both completely natural and kind of astonishing. It is astonishing because it's remarkable that a decomposition containing everything we want to know about our matrix exists at all—particularly when all of the decompositions we've seen before require our matrices to have special properties.
And yet, this is a completely natural result. If we believe that matrices have certain structures to them, like the fundamental subspaces, and these structures themselves have properties of their own, then of course there should be a decomposition that exposes these structures.
In a way, the SVD is the natural ending point for the work that we've done in this course.
We have one more trick up our sleeve, which is arguably the most useful piece of the SVD: it gives us a convenient tool to describe large, complex, high-dimensional data—low-rank approximations. This is a fitting conclusion to the theme of the course: decomposing large, complex matrices into simpler, tractable parts that expose the underlying structures.
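As a brief preview (the random matrix and `truncate` helper below are our own illustration, not from the text): truncating the SVD after the $k$ largest singular values gives a rank-$k$ approximation, and by the Eckart–Young theorem the spectral-norm error of that approximation is exactly the first discarded singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(k):
    """Rank-k approximation: keep only the k largest singular-value pieces."""
    return U[:, :k] * s[:k] @ Vt[:k]

# The spectral-norm error of the rank-k approximation is s[k],
# the largest discarded singular value.
for k in (5, 20):
    err = np.linalg.norm(A - truncate(k), ord=2)
    print(np.isclose(err, s[k]))  # True
```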