CAPP 30271 — Lecture 10

A more complete picture

Now, $A = CR$ has reappeared, and we have a systematic way of computing it. However, the claim that $A = CR = C\begin{bmatrix} I & F \end{bmatrix}$ implies that the independent columns of $A$ are always the first few on the left. This obviously can't always be the case.

Consider the matrix $A = \begin{bmatrix} 3 & 9 & 0 & 3 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix}$. Let's do elimination. \[\begin{bmatrix} 3 & 9 & 0 & 3 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix} \xrightarrow{\frac 1 3 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix} \xrightarrow{(3) - 5 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 3 & 1 & 6 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{(2) - (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

We arrive at $R = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. There are two observations to make here. The first is that we have a row of zeros. This tells us that the number of independent rows in our matrix is less than the total number of rows. In reality, there are two versions of the row-reduced echelon form: one containing only the nonzero rows, which we saw earlier as $R$, and one that keeps the zero rows, sometimes called $R_0$. The difference is that $R_0$ has as many rows as $A$ does and is what actually gets produced via elimination. Since we only use the nonzero rows, we work with $R$.
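We can check this elimination by machine. Here is a sketch using SymPy (assuming it is available), whose `rref` method produces the version with the zero rows kept, i.e. $R_0$, along with the pivot column indices:

```python
from sympy import Matrix

A = Matrix([[3, 9, 0, 3],
            [1, 3, 1, 6],
            [5, 15, 0, 5]])

# rref() returns the reduced row echelon form (zero rows kept, i.e. R_0)
# together with the indices of the pivot columns (0-indexed).
R0, pivots = A.rref()
print(R0)      # Matrix([[1, 3, 0, 1], [0, 0, 1, 5], [0, 0, 0, 0]])
print(pivots)  # (0, 2): the first and third columns are the independent ones

# Dropping the zero rows gives R; the number of nonzero rows is the rank.
R = R0[:len(pivots), :]
```

Note that the number of pivots immediately tells us the rank, so slicing off everything past the first `len(pivots)` rows of $R_0$ recovers $R$.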

The second observation is that we do not have our nice decomposition of $R$ into $\begin{bmatrix} I & F \end{bmatrix}$. There is a reason for this—this $R$ is telling us that the independent columns of $A$ are the first and the third columns. So the order of the columns matters!

However, this is not a huge problem. Observe that if we rearranged the columns, we would have $\begin{bmatrix} I & F \end{bmatrix}$; it just so happens that our columns are out of order. Luckily, just as we can encode the ordering of rows in the LU factorization with a permutation, we can encode the ordering of the columns of $R$ with a permutation. (Or, rather than a fix, think of this as a way of getting $R$ into the right form.)

So this gives us \[A = CR = C \begin{bmatrix} I & F \end{bmatrix} P.\] Recall that the $P$ on the right means that we are permuting columns.

Expressing $R$ as $\begin{bmatrix} I & F \end{bmatrix}$ also gives us a fast way to identify the special solutions. If $R = \begin{bmatrix} I & F \end{bmatrix}$, then we have \[\begin{bmatrix} I & F \end{bmatrix} \begin{bmatrix} -F \\ I \end{bmatrix} = I(-F) + F(I) = \mathbf 0.\] Then we can read off the columns of $\begin{bmatrix} -F \\ I \end{bmatrix}$ to get our special solutions.
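As a quick sanity check of this block identity, here is a NumPy sketch; the particular $F$ below is an arbitrary choice just to give the blocks concrete sizes:

```python
import numpy as np

# Any F works; this 2x3 block matches the 2x5 shape discussed below.
F = np.array([[3.0, 1.0, 4.0],
              [1.0, 5.0, 9.0]])
r, k = F.shape  # r pivot rows, k free columns

R = np.hstack([np.eye(r), F])   # R = [I  F], shape (r, r+k)
N = np.vstack([-F, np.eye(k)])  # special solutions [-F; I], shape (r+k, k)

print(R @ N)  # the zero matrix: I(-F) + F(I) = 0
```

Each of the `k` columns of `N` is one special solution, so `R @ N` being zero checks all of them at once.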

A quick note about this notation: the $F$ and $-F$ are in the exact same orientation, but the two $I$s can be different! For example, if we have a $2 \times 5$ matrix $R$, then in $\begin{bmatrix} I & F \end{bmatrix}$, $I$ is $2 \times 2$ and $F$ is $2 \times 3$, but in $\begin{bmatrix} -F \\ I \end{bmatrix}$, $-F$ is still $2 \times 3$ but $I$ is now $3 \times 3$ to fill the space. This is what mathematicians sometimes refer to as "abuse of notation": when we use the same name for different things because we assume that everyone can follow along from the context (read: we're lazy). (A computer scientist might call this "overloading")

But what if we have a permutation? Easy: just remember to apply the permutation. \[\begin{bmatrix} I & F \end{bmatrix} P P^T \begin{bmatrix} -F \\ I \end{bmatrix} = \mathbf 0.\] Here, we take advantage of two facts at once: that $P^{-1} = P^T$ and that we permute columns or rows depending on which side of the matrix we're on.

From the previous example, we arrived at $R = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. We need to shuffle the columns around, swapping columns 2 and 3 to get it into $\begin{bmatrix} I & F \end{bmatrix}$ form. This is done by the permutation matrix $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$. As a result, we get $\begin{bmatrix} -3 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ -5 \\ 0 \\ 1 \end{bmatrix}$. But we need to remember to shuffle things back into place, swapping rows 2 and 3 again—in this case the permutation matrix is the same. Doing this, we get $\mathbf s_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ and $\mathbf s_2 = \begin{bmatrix} -1 \\ 0 \\ -5 \\ 1 \end{bmatrix}$.
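We can verify these special solutions numerically against the original matrix $A$ (a small NumPy check):

```python
import numpy as np

A = np.array([[3, 9, 0, 3],
              [1, 3, 1, 6],
              [5, 15, 0, 5]])

# The two special solutions read off above, after un-permuting entries 2 and 3.
s1 = np.array([-3, 1, 0, 0])
s2 = np.array([-1, 0, -5, 1])

print(A @ s1)  # [0 0 0]
print(A @ s2)  # [0 0 0]

# Any linear combination of special solutions is also in the null space.
print(A @ (2 * s1 - 7 * s2))  # [0 0 0]
```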

Solving $A \mathbf x = \mathbf b$

The reappearance of $A = CR$ is an opportune time to revisit some ideas that we planted at the beginning of the quarter. Abstractly, we understand this to mean that $C$ holds the independent columns and $R$ holds the coefficients of the linear combinations of those columns. From a "data" perspective, we can view $C$ as a set of $k$ representative profiles and $R$ as a set of weights for these profiles that correspond to particular accounts. (Note that the number of columns of $C$ must match the number of rows of $R$; both equal the rank $k$.) \[\begin{bmatrix} \bigg\vert & \bigg\vert & & \bigg\vert \\ \mathbf c_1 & \mathbf c_2 & \dots & \mathbf c_k \\ \bigg\vert & \bigg\vert & & \bigg\vert \\\end{bmatrix} \begin{bmatrix} \rule[0.5ex]{3em}{0.5pt} & \mathbf r_1 & \rule[0.5ex]{3em}{0.5pt} \\ \rule[0.5ex]{3em}{0.5pt} & \mathbf r_2 & \rule[0.5ex]{3em}{0.5pt} \\ & \vdots & \\ \rule[0.5ex]{3em}{0.5pt} & \mathbf r_k & \rule[0.5ex]{3em}{0.5pt} \\ \end{bmatrix}\]
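Concretely, with the earlier example matrix, we can extract $C$ (the pivot columns) and $R$ (the nonzero rows of the RREF) and confirm the factorization (a SymPy sketch):

```python
from sympy import Matrix

A = Matrix([[3, 9, 0, 3],
            [1, 3, 1, 6],
            [5, 15, 0, 5]])

R0, pivots = A.rref()
C = A[:, list(pivots)]   # the independent columns of A (the "profiles")
R = R0[:len(pivots), :]  # the nonzero rows of the rref (the "weights")

print(C * R == A)  # True: A = CR
```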

There's an obvious connection to subspaces here: all the accounts are ultimately represented in the column space as a combination of the representative profiles. However, it's not quite as simple as saying that $C$ contains everything we need to know—it's the combination of $C$ and $R$ that makes our data meaningful. We saw in a homework that the column space of $AB$ is contained in the column space of $A$, but this doesn't mean they're the same thing—the column space of $AB$ can be smaller than the column space of $A$.
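The containment $\mathbf C(AB) \subseteq \mathbf C(A)$ shows up numerically as a rank inequality. A sketch with NumPy, where the matrices are chosen arbitrarily (a random $A$ and a deliberately rank-deficient $B$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # generically full rank (rank 3)
B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])       # rank 1: second column is twice the first

# C(AB) sits inside C(A), so the rank can only shrink or stay equal.
print(np.linalg.matrix_rank(A))      # rank 3 for a generic 4x3 matrix
print(np.linalg.matrix_rank(A @ B))  # 1: strictly smaller than rank(A)
```

Here the columns of $AB$ are combinations of the columns of $A$, but the rank-1 $B$ collapses them down to a single direction, so the column space of $AB$ is a strictly smaller subspace.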

But we can also view things from the perspective of the row space. Recall that each row can be viewed as an item (a movie, a book, etc.) and the row contains all of the scores or sentiment of the users reviewing that item. In this case, we can view $R$ as containing the analogous "representative profiles" of the items.

So what does the null space tell us? Recall that when we take $A \mathbf x$, the vector $\mathbf x$ encodes a series of weights, just like columns of $R$ does. If $A$ is invertible, its columns are independent and every $\mathbf x$ gets mapped to a unique vector. This is what we have been "solving" for. But the fact that every set of weights gives us something different may not be that interesting—it doesn't tell us much about what's similar to it. This is why this case is rare—$n$ equations for $n$ variables just tells us that everything is unique.

But if $A \mathbf x = \mathbf b$ doesn't have a unique solution, linear algebra tells us that $A$ has dependent columns, which is guaranteed, for instance, whenever there are more columns than rows. One way to think about this is that we are taking something with a lot of information ($n$ measurements) and turning it into something with less information ($m$ measurements). Since you can represent and differentiate more things with $n$ measurements than with $m$ measurements, you'll inevitably get natural groupings of data.

What the null space tells us is which parts of the data get flattened away: whenever $\mathbf u - \mathbf v$ lies in the null space, the two vectors $\mathbf u$ and $\mathbf v$ give us $A \mathbf u = A \mathbf v = \mathbf b$. And since the null space is a subspace, we have the benefit of its structure: we know that there are only a certain number of representative items (the special solutions) and that we can rely on linear combinations to describe the whole null space.

This ability to describe all vectors in the null space now lets us describe all solutions to the linear equation $A \mathbf x = \mathbf b$. We proceed with elimination again, but this time we need to make sure that we are also applying our elimination operations on the vector $\mathbf b$. The easiest way to do this is to treat the entire system as one matrix $\begin{bmatrix} A & \mathbf b \end{bmatrix}$.

Consider $A \mathbf x = \mathbf b$ with $A = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 2 & 6 & 1 & 7 \end{bmatrix}$ and $\mathbf b = \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix}$. We need to perform elimination on this system, and this time we'll make sure to explicitly include $\mathbf b$. Thus we have \[\begin{bmatrix} 1 & 3 & 0 & 1 & 2 \\ 0 & 0 & 1 & 5 & 3 \\ 2 & 6 & 1 & 7 & 7 \end{bmatrix} \xrightarrow{(3) - 2 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 & 2 \\ 0 & 0 & 1 & 5 & 3 \\ 0 & 0 & 1 & 5 & 3 \end{bmatrix} \xrightarrow{(3) - (2)} \begin{bmatrix} 1 & 3 & 0 & 1 & 2 \\ 0 & 0 & 1 & 5 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \] Now, to solve $A \mathbf x = \mathbf b$, we solve for $\mathbf x$ in the reduced equation $R \mathbf x = \mathbf d$, where $R$ is our matrix in RREF and $\mathbf d = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$.
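From the reduced system we can read off one particular solution by setting the free variables $x_2$ and $x_4$ to zero, which forces $x_1 = 2$ and $x_3 = 3$; the complete solution is that particular solution plus anything in the null space. A NumPy sketch checking this:

```python
import numpy as np

A = np.array([[1, 3, 0, 1],
              [0, 0, 1, 5],
              [2, 6, 1, 7]])
b = np.array([2, 3, 7])

# Particular solution: free variables x2 = x4 = 0 gives x1 = 2, x3 = 3.
xp = np.array([2, 0, 3, 0])
print(A @ xp)  # [2 3 7], which is b

# Special solutions of A x = 0, read off from the same R as before.
s1 = np.array([-3, 1, 0, 0])
s2 = np.array([-1, 0, -5, 1])

# Every x = xp + c1*s1 + c2*s2 also solves A x = b.
print(A @ (xp + 4 * s1 - 2 * s2))  # [2 3 7]
```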

The fact that our augmented matrix has a row of zeros at the bottom turns out to be important—it indicates that our system is actually solvable. Why? Suppose that after row reduction we instead had a row that looked like $\begin{bmatrix} 0 & 0 & 0 & 0 & 2 \end{bmatrix}$. This is the equation $0 = 2$, which is never true!

We know already that not every system of linear equations is going to be solvable when we don't have an invertible matrix. So how do we know which vectors $\mathbf b$ do have solutions? The hint is to look at what happens after row reduction: \[\begin{bmatrix} 1 & 3 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 5 & b_2 \\ 2 & 6 & 1 & 7 & b_3 \end{bmatrix} \xrightarrow{(3) - 2 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 5 & b_2 \\ 0 & 0 & 1 & 5 & b_3 - 2b_1 \end{bmatrix} \xrightarrow{(3) - (2)} \begin{bmatrix} 1 & 3 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 5 & b_2 \\ 0 & 0 & 0 & 0 & b_3 - 2b_1 - b_2 \end{bmatrix} \] This tells us that any vector $\mathbf b$ that is solvable must have $-2b_1 - b_2 + b_3 = 0$, or $b_3 = 2b_1 + b_2$. And because we know that $A \mathbf x = \mathbf b$ is solvable only when $\mathbf b$ is in the column space of $A$, this also tells us which vectors are in the column space of $A$.
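The condition $b_3 = 2b_1 + b_2$ can also be checked mechanically: $A \mathbf x = \mathbf b$ is consistent exactly when appending $\mathbf b$ to $A$ does not increase the rank, i.e. when $\mathbf b$ is already in the column space of $A$. A NumPy sketch of that test:

```python
import numpy as np

A = np.array([[1, 3, 0, 1],
              [0, 0, 1, 5],
              [2, 6, 1, 7]])

def solvable(b):
    # A x = b is consistent iff rank([A | b]) == rank(A),
    # i.e. iff b lies in the column space of A.
    aug = np.hstack([A, b.reshape(-1, 1)])
    return np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A)

print(solvable(np.array([2, 3, 7])))  # True:  7 = 2*2 + 3
print(solvable(np.array([2, 3, 8])))  # False: 8 != 2*2 + 3
```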