CAPP 30271 — Lecture 15

Projections on subspaces, continued

First, let's recall the 1D case. In the 1D case, we had a vector $\mathbf b$ that we wanted to project onto a subspace $S$. This subspace has dimension 1, so it has a single basis vector $\mathbf a$. Then the projection of $\mathbf b$ onto $S$ is a vector $\mathbf p = \mathbf a \mathbf{\hat x}$ for some scalar $\mathbf{\hat x}$.

We want the projection to be the closest vector in $S$ to $\mathbf b$. Unless $\mathbf b = \mathbf p$ (i.e., it is actually in $S$), then there is some "distance" from $\mathbf p$ to $\mathbf b$, which we can describe as a vector, which we call $\mathbf e$ for "error". We then have the relationship $\mathbf b = \mathbf p + \mathbf e$. Since we want the distance between $\mathbf b$ and $\mathbf p$ to be as small as possible, this means we must minimize $\|\mathbf e\|$.

Minimizing $\|\mathbf e\|$ occurs exactly when $\mathbf e$ is orthogonal to $\mathbf a$. So we have that $\mathbf a \cdot \mathbf e = 0$. Since $\mathbf b = \mathbf p + \mathbf e$, we can rewrite this in the following way: \begin{align*} \mathbf a \cdot \mathbf e &= 0 \\ \mathbf a \cdot (\mathbf b - \mathbf p) &= 0 \\ \mathbf a \cdot (\mathbf b - \mathbf a \mathbf{\hat x}) &= 0 \end{align*}

But recall that we can write $\mathbf u \cdot \mathbf v$ as $\mathbf u^T \mathbf v$. So we have \[\mathbf a^T (\mathbf b - \mathbf a \mathbf{\hat x}) = 0. \] Now, when we return to the multi-dimensional case, we swap out the single basis vector $\mathbf a$ for a matrix $A$ whose columns $\mathbf a_1, \dots, \mathbf a_n$ are the basis vectors for $S$, and $\mathbf{\hat x}$ is now a vector. We get that \[A^T (\mathbf b - A \mathbf{\hat x}) = \mathbf 0. \]

From here, we need to find $\mathbf{\hat x}$ to find the projection $\mathbf p = A\mathbf{\hat x}$. First, we rearrange the equation to give us \[A^TA\mathbf{\hat x} = A^T \mathbf b.\] The above equation suggests that if we could undo the multiplication by $A^TA$ on the left, we'd be good. Now, we get to do something we've avoided until now: make use of the inverse. Multiplying both sides by $(A^TA)^{-1}$, we have \[\mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf b.\] This is just the matrix version of $\mathbf{\hat x} = \frac{\mathbf a^T \mathbf b}{\mathbf a^T \mathbf a} = (\mathbf a^T \mathbf a)^{-1} \mathbf a^T \mathbf b$.

This seems fine, but there should be a red flag going off in your head: why are we allowed to do this? Let's go through the possible pitfalls:

  1. $A$ is typically not square (it is $m \times n$ with $m > n$), so $A$ itself has no inverse, and we cannot split $(A^TA)^{-1}$ into $A^{-1}(A^T)^{-1}$. We must keep $(A^TA)^{-1}$ together as one piece.
  2. On the other hand, $A^TA$ is a square $n \times n$ matrix, and it is invertible exactly when the columns of $A$ are linearly independent. Since the columns of $A$ form a basis for $S$, they are independent, so $(A^TA)^{-1}$ exists.

So that's good.

One complication is that we now have to compute an inverse, and we haven't done too much of that. Luckily, we've had plenty of practice doing elimination for the row-reduced echelon form. The process for computing an inverse is pretty much the same:

  1. For an invertible $n \times n$ matrix $A$, construct the augmented matrix $\begin{bmatrix} A & I \end{bmatrix}$, where $I$ is the $n \times n$ identity matrix.
  2. Perform elimination until $A$ is in row-reduced echelon form.
  3. Since $A$ is invertible, all of its columns are independent, so the rref of $A$ is $I$! The resulting matrix is $\begin{bmatrix} I & A^{-1} \end{bmatrix}$.

In fact, we saw this work out the very first time we performed elimination to produce a rref for a matrix.
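The Gauss–Jordan procedure above can be sketched in code. The following is a minimal pure-Python sketch (the function name `invert` is just for illustration), using exact `Fraction` arithmetic so the entries come out as the same fractions we get by hand:

```python
from fractions import Fraction

def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(1 if i == j else 0) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # The left half is now I, so the right half is the inverse.
    return [row[n:] for row in M]
```

For instance, `invert([[3, 3], [3, 5]])` reproduces the $\begin{bmatrix} \frac 5 6 & -\frac 1 2 \\ -\frac 1 2 & \frac 1 2 \end{bmatrix}$ that appears in the example below.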

Once we have found $\mathbf{\hat x}$, it is not hard to find the projection vector $\mathbf p$: \[\mathbf p = A \mathbf{\hat x} = A(A^T A)^{-1} A^T \mathbf b.\] Of course, this is just what we get via straightforward substitution from our previous result. We make one more observation to get the projection matrix: \[P = A(A^T A)^{-1} A^T.\] Again, this is just the matrix that we multiply by $\mathbf b$ from the previous equation.

If we take a closer look at what we've done, we see that it is essentially the same as what we did for projection on a line: \[\mathbf p = \mathbf a \mathbf{\hat x} = \mathbf a \frac{\mathbf a^T \mathbf b}{\mathbf a^T \mathbf a} = \mathbf a (\mathbf a ^T \mathbf a )^{-1} \mathbf a ^T \mathbf b.\]
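The 1D formula is simple enough to state as a one-liner. Here is a sketch using plain lists, with a hypothetical helper name `project_onto_line`; `Fraction` keeps the arithmetic exact when $\mathbf a^T \mathbf b / \mathbf a^T \mathbf a$ is not an integer:

```python
from fractions import Fraction

def project_onto_line(a, b):
    """Project b onto the line through a: p = a * (a.b / a.a)."""
    x_hat = Fraction(sum(ai * bi for ai, bi in zip(a, b)),
                     sum(ai * ai for ai in a))
    return [ai * x_hat for ai in a]
```

For example, projecting $\mathbf b = (3, 1, 5)$ onto $\mathbf a = (1, 1, 1)$ gives $\hat x = 9/3 = 3$ and $\mathbf p = (3, 3, 3)$; the error $\mathbf e = (0, -2, 2)$ is indeed orthogonal to $\mathbf a$.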

Suppose we want to project $\mathbf b = \begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix}$ onto the subspace spanned by the vectors $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$. First, we put our vectors into a matrix $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$. Now, the problem becomes projecting $\mathbf b$ onto the column space of $A$. With $A$, we compute $A^TA$, \[\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}.\] Then we compute $A^T \mathbf b$, \[\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 9 \\ 11 \end{bmatrix}.\]

To solve for $\mathbf{\hat x}$, we need $(A^T A)^{-1}$. We can do this by elimination. \[\begin{bmatrix} 3 & 3 & 1 & 0 \\ 3 & 5 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 3 & 1 & 0 \\ 0 & 2 & -1 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & \frac 1 3 & 0 \\ 0 & 1 & -\frac 1 2 & \frac 1 2 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & \frac 5 6 & -\frac 1 2 \\ 0 & 1 & -\frac 1 2 & \frac 1 2 \end{bmatrix}\] This gives us $(A^TA)^{-1} = \begin{bmatrix} \frac 5 6 & -\frac 1 2 \\ -\frac 1 2 & \frac 1 2 \end{bmatrix}$. We then compute $\mathbf{\hat x} = (A^TA)^{-1} A^T \mathbf b$, \[\begin{bmatrix} \frac 5 6 & -\frac 1 2 \\ -\frac 1 2 & \frac 1 2 \end{bmatrix} \begin{bmatrix} 9 \\ 11 \end{bmatrix} = 9 \begin{bmatrix} \frac 5 6 \\ -\frac 1 2 \end{bmatrix} + 11 \begin{bmatrix} -\frac 1 2 \\ \frac 1 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.\]

Alternatively, we could have set up the system of equations for $A^T A \mathbf{\hat x} = A^T \mathbf b$: \[\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} \hat x_1 \\ \hat x_2 \end{bmatrix} = \begin{bmatrix} 9 \\ 11 \end{bmatrix},\] which gives us the system \begin{align*} 3\hat x_1 + 3 \hat x_2 &= 9 \\ 3\hat x_1 + 5 \hat x_2 &= 11 \end{align*} From this, it's not too hard to see that $\mathbf{\hat x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$.

This then gives us the projection $\mathbf p = A \mathbf{\hat x}$, which is \[\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}.\]

Finally, the projection matrix is $P = A(A^TA)^{-1} A^T$, which is \[\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} \frac 5 6 & -\frac 1 2 \\ -\frac 1 2 & \frac 1 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} \frac 5 6 & \frac 1 3 & -\frac 1 6 \\ \frac 1 3 & \frac 1 3 & \frac 1 3 \\ -\frac 1 6 & \frac 1 3 & \frac 5 6 \end{bmatrix}.\] As a check, multiplying this $P$ by $\mathbf b$ gives $\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \mathbf p$, as expected.

Let's return to our original problem: we want to solve $A \mathbf x = \mathbf b$ but $\mathbf b$ is not in the column space of $A$. This means we want to find $\mathbf x$ that gives us the vector in the column space of $A$ that is closest to $\mathbf b$. This vector is $\mathbf p = A \mathbf{\hat x}$, the projection of $\mathbf b$ onto the column space of $A$.

How do we find $\mathbf{\hat x}$? This depends on the error vector $\mathbf e = \mathbf b - A \mathbf{\hat x}$. Since we want $\mathbf b$ and $\mathbf p$ to be as close as possible, we want the $\mathbf e$ that minimizes $\|\mathbf e\|$, which means $\mathbf e$ is perpendicular to the column space. But this means $\mathbf e$ is in the left null space of $A$! This gives us \[A^T\mathbf e = A^T (\mathbf b - A \mathbf{\hat x}) = \mathbf 0,\] so $A^T A \mathbf{\hat x} = A^T \mathbf b$.

In summary, suppose we want to compute the projection of a vector onto a subspace.

  1. Find a basis for your subspace and make the basis vectors the columns of a matrix $A$. Your subspace is now the column space of $A$. Have $A^T$ ready; it will be useful.
  2. Now, suppose you have a specific vector $\mathbf b$. Find $\mathbf{\hat x} = (A^TA)^{-1} A^T \mathbf b$. Note that this involves computing $(A^TA)^{-1}$, usually by elimination.
  3. Now we can compute $\mathbf p = A \mathbf{\hat x}$.
  4. Finally, to acquire the projection matrix, we compute $P = A(A^TA)^{-1}A^T$.
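The steps above can be sketched end to end in code. This is a minimal pure-Python sketch using exact `Fraction` arithmetic; the helper names `transpose`, `matmul`, `solve`, and `project` are just for illustration. Rather than forming $(A^TA)^{-1}$ explicitly, it solves the system $A^TA \mathbf{\hat x} = A^T \mathbf b$ by elimination, mirroring the alternative approach from the worked example:

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def solve(M, v):
    """Solve M x = v for square M by Gauss-Jordan elimination on [M | v]."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(y)]
           for row, y in zip(M, v)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def project(A, b):
    """Project b onto the column space of A via the normal equations."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [row[0] for row in matmul(At, [[bi] for bi in b])]
    x_hat = solve(AtA, Atb)                    # solve A^T A x̂ = A^T b
    p = matmul(A, [[xi] for xi in x_hat])      # p = A x̂
    return [row[0] for row in p]
```

Running `project` on the matrix and vector from the worked example, `project([[1, 0], [1, 1], [1, 2]], [3, 1, 5])`, reproduces $\mathbf p = (2, 3, 4)$.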