The goal now is to get from $A$ to $R$. We will rely on elimination again, but first we will need to extend elimination for this purpose. To do so, we now allow the following operations:
The two differences from the elimination we had before are that we can now modify rows above the pivot as well and that we can scale rows. The goal is to reach a matrix $R$ in which each nonzero row begins with a 1, so being able to scale is necessary. These two generalizations also happen to be what allows us to compute $A^{-1}$. This is not a coincidence.
Let $A = \begin{bmatrix} 2 & 6 & 22 & 32 \\ 2 & 7 & 25 & 36 \end{bmatrix}$. To perform elimination, we first want to get rid of the entry underneath the pivot $2$ in the first column. We can do this by subtracting $1$ times the first row from the second row. This gives us the matrix $\begin{bmatrix} 2 & 6 & 22 & 32 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. Now, since we have a $1$ in the second column of the second row, we want to clear out the entry above it, so we subtract $6$ times the second row from the first and arrive at $\begin{bmatrix} 2 & 0 & 4 & 8 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. Finally, we require that the first non-zero entry of every row is a 1, so we need to scale the first row by $\frac 1 2$. This gives us $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$.
In NumPy, this proceeds pretty straightforwardly.
>>> A = np.array([[2, 6, 22, 32], [2, 7, 25, 36]])
>>> A[1] = A[1] - A[0]
>>> A
array([[ 2,  6, 22, 32],
       [ 0,  1,  3,  4]])
>>> A[0] = A[0] - 6 * A[1]
>>> A
array([[2, 0, 4, 8],
       [0, 1, 3, 4]])
>>> A[0] = 1/2 * A[0]
>>> A
array([[1, 0, 2, 4],
       [0, 1, 3, 4]])
Notice that applying these operations gives us $I$ in the first two columns of $R$. This means that we had inadvertently computed the inverse of the left half of the matrix! To see this, we consider the three matrices for the operations in the elimination process that got us there and multiply them: \[\overbrace{\begin{bmatrix} \frac 1 2 & 0 \\ 0 & 1 \end{bmatrix}}^{\text{scale row 1 by $\frac 1 2$}} \underbrace{\begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix}}_{\text{subtract 6 times row 2 from row 1}} \overbrace{\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}}^{\text{subtract row 1 from row 2}} = \begin{bmatrix} \frac 7 2 & -3 \\ -1 & 1 \end{bmatrix}.\] One can check that the inverse of this matrix is indeed $\begin{bmatrix} 2 & 6 \\ 2 & 7 \end{bmatrix}$ by thinking back to inverses of elimination matrices. In this way, we get a very nice relationship between the parts of $R$ and $A$.
>>> A = np.array([[2, 6, 22, 32], [2, 7, 25, 36]])
>>> E0 = np.eye(2)
>>> E0[1,0] = -1
>>> E0
array([[ 1.,  0.],
       [-1.,  1.]])
>>> E1 = np.eye(2)
>>> E1[0,1] = -6
>>> E1
array([[ 1., -6.],
       [ 0.,  1.]])
>>> E2 = np.diag([1/2, 1])
>>> E2
array([[0.5, 0. ],
       [0. , 1. ]])
>>> E2 @ E1 @ E0
array([[ 3.5, -3. ],
       [-1. ,  1. ]])
>>> E2 @ E1 @ E0 @ A
array([[1., 0., 2., 4.],
       [0., 1., 3., 4.]])
>>> np.linalg.inv(E2 @ E1 @ E0)
array([[2., 6.],
       [2., 7.]])
We can envision this by dividing $A$ into two blocks, $A = \begin{bmatrix} W & H \end{bmatrix}$. Then our elimination matrix is $W^{-1}$, and applying it to $A$, we get \[W^{-1} A = W^{-1} \begin{bmatrix} W & H \end{bmatrix} = \begin{bmatrix} I & W^{-1}H \end{bmatrix} = \begin{bmatrix} I & F \end{bmatrix} = R.\]
Here, we see that we end up with a matrix where the pivots, in the form of the identity matrix $I$, are perfectly separated from $F$, the columns corresponding to the free variables.
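We can check this identity numerically for the earlier example, taking $W$ to be the first two columns of $A$:

```python
import numpy as np

# A and its invertible left block W, from the example above.
A = np.array([[2, 6, 22, 32], [2, 7, 25, 36]], dtype=float)
W = A[:, :2]

# Applying W^{-1} to all of A reproduces R = [I | F].
R = np.linalg.inv(W) @ A
print(np.allclose(R, [[1, 0, 2, 4], [0, 1, 3, 4]]))  # True
```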
This leads us somewhere interesting: this $R$ turns out to be the same as the $R$ in $A = CR$. The $I$ block marks the independent columns of $A$, which are collected in $C$, and $F$ describes how the remaining columns are combinations of them.
Since we had $A = \begin{bmatrix} 2 & 6 & 22 & 32 \\ 2 & 7 & 25 & 36 \end{bmatrix}$, the matrix $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$ tells us that the independent columns of $A$ are $\begin{bmatrix} 2 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 6 \\ 7 \end{bmatrix}$. This gives us $C = \begin{bmatrix} 2 & 6 \\ 2 & 7 \end{bmatrix}$. One can then verify that the remaining columns of $A$ are combined according to $R$.
>>> A = np.array([[2, 6, 22, 32], [2, 7, 25, 36]])
>>> R = np.array([[1, 0, 2, 4], [0, 1, 3, 4]])
>>> C = A[:,[0,1]]
>>> C
array([[2, 6],
       [2, 7]])
>>> C @ R[:,[2]]
array([[22],
       [25]])
>>> C @ R[:,[3]]
array([[32],
       [36]])
This hints at another interesting property that we'll describe in more detail soon. Recall that we said that the rank $r$ of a matrix is the number of its independent columns. This happens to be the same as the number of pivot columns in $R$. This tells us that the number of special solutions is exactly $n - r$, which is how many vectors we need to describe the non-zero vectors in the null space.
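As a quick sanity check, NumPy can report the rank directly (`matrix_rank` computes it numerically via the SVD rather than by elimination):

```python
import numpy as np

# The matrix from the example above has two independent columns.
A = np.array([[2, 6, 22, 32], [2, 7, 25, 36]])
r = np.linalg.matrix_rank(A)   # number of pivot columns in R
n = A.shape[1]                 # total number of columns

print(r)       # 2
print(n - r)   # 2 special solutions describe the null space
```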
For instance, we can think about the situation we had before: what happens to the null space when we have exactly one solution for $A \mathbf x = \mathbf 0$? If we carry out LU factorization, we arrive at an upper triangular matrix $U$. But if we go further, we end up with $R = I$. This tells us:
Now, $A = CR$ has reappeared, and we now have a systematic way of computing it. However, the claim that $A = CR = C\begin{bmatrix} I & F \end{bmatrix}$ would imply that the independent columns of $A$ are always the first few on the left. This obviously can't always be the case.
Consider the matrix $A = \begin{bmatrix} 3 & 9 & 0 & 3 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix}$. Let's do elimination. \[\begin{bmatrix} 3 & 9 & 0 & 3 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix} \xrightarrow{\frac 1 3 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix} \xrightarrow{(3) - 5 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 3 & 1 & 6 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{(2) - (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
>>> A = np.array([[3, 9, 0, 3], [1, 3, 1, 6], [5, 15, 0, 5]])
>>> A[0] = 1/3 * A[0]
>>> A[2] = A[2] - 5 * A[0]
>>> A[1] = A[1] - A[0]
>>> A
array([[1, 3, 0, 1],
       [0, 0, 1, 5],
       [0, 0, 0, 0]])
We arrive at $R = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. There are two observations to make here. The first is that we have a row of zeros. This tells us that the number of independent rows in our matrix is less than the number of rows. In fact, there are two versions of the row-reduced echelon form: one containing only the non-zero rows, which we saw earlier as $R$, and one that keeps the zero rows as well, sometimes called $R_0$. The difference is that $R_0$ has as many rows as $A$ does and is what elimination produces. Since we only use the non-zero rows, we work with $R$.
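A small sketch of the distinction: starting from the $R_0$ that elimination produced, we can drop the zero rows to get $R$ (the row mask below is just one way to do this):

```python
import numpy as np

# R0: the result of elimination, with the same number of rows as A.
R0 = np.array([[1, 3, 0, 1],
               [0, 0, 1, 5],
               [0, 0, 0, 0]])

# R: keep only the rows that contain a nonzero entry.
R = R0[np.any(R0 != 0, axis=1)]
print(R.shape)  # (2, 4)
```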
The second observation is that we do not have our nice decomposition of $R$ into $\begin{bmatrix} I & F \end{bmatrix}$. There is a reason for this: this $R$ is telling us that the independent columns of $A$ are the first and the third columns. So the order of the columns matters!
However, this is not a huge problem. Observe that if we rearranged the columns, we would have $\begin{bmatrix} I & F \end{bmatrix}$; our columns just happen to be out of order. Luckily, just as we can encode the ordering of the rows in the LU factorization with a permutation matrix, we can also encode the ordering of the columns of $R$ with a permutation matrix. (Rather than a fix, this is a way of writing down the correct form.)
So this gives us \[A = CR = C \begin{bmatrix} I & F \end{bmatrix} P.\] Recall that the $P$ on the right means that we are permuting columns.
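We can verify this for the example above. The permutation below is our own bookkeeping: it records that the pivot columns are the first and third, so $\begin{bmatrix} I & F \end{bmatrix}$ has to be shuffled back into column order:

```python
import numpy as np

A = np.array([[3, 9, 0, 3], [1, 3, 1, 6], [5, 15, 0, 5]])
C = A[:, [0, 2]]                      # the pivot columns of A

IF = np.array([[1.0, 0.0, 3.0, 1.0],  # [I | F], pivots gathered on the left
               [0.0, 1.0, 0.0, 5.0]])
P = np.eye(4)[[0, 2, 1, 3]]           # permutation sending columns back in place

R = IF @ P
print(np.array_equal(R, [[1, 3, 0, 1], [0, 0, 1, 5]]))  # True
print(np.array_equal(C @ R, A))                         # True
```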
The reappearance of $A = CR$ is an opportune time to revisit some ideas that we planted at the beginning of the quarter. Abstractly, we understand this to mean that $C$ contains the independent columns and $R$ the linear combinations of those columns. From a "data" perspective, we can view $C$ as a set of representative profiles and $R$ as a set of weights for these profiles that correspond to particular accounts. \[\begin{bmatrix} \bigg\vert & \bigg\vert & & \bigg\vert \\ \mathbf c_1 & \mathbf c_2 & \dots & \mathbf c_r \\ \bigg\vert & \bigg\vert & & \bigg\vert \\\end{bmatrix} \begin{bmatrix} \rule[0.5ex]{3em}{0.5pt} & \mathbf r_1 & \rule[0.5ex]{3em}{0.5pt} \\ \rule[0.5ex]{3em}{0.5pt} & \mathbf r_2 & \rule[0.5ex]{3em}{0.5pt} \\ & \vdots & \\ \rule[0.5ex]{3em}{0.5pt} & \mathbf r_r & \rule[0.5ex]{3em}{0.5pt} \\ \end{bmatrix}\]
There's an obvious connection to subspaces here: all the accounts are ultimately represented in the column space as a combination of the representative profiles. However, it's not quite as simple as saying that $C$ contains everything we need to know—it's the combination of $C$ and $R$ that makes our data meaningful. We saw in a homework that the column space of $AB$ is contained in the column space of $A$, but this doesn't mean they're the same thing—the column space of $AB$ can be smaller than the column space of $A$.
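The containment can be strict, as a tiny example shows (here $B$ collapses the plane onto a line):

```python
import numpy as np

A = np.eye(2)                          # column space: all of R^2
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])             # rank 1

print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(A @ B))    # 1: the column space of AB is smaller
```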
But we can also view things from the perspective of the row space. Recall that each row can be viewed as an item (a movie, a book, etc.) and the row contains all of the scores or sentiment of the users reviewing that item. In this case, we can view $R$ as containing the analogous "representative profiles" of the items.
So what does the null space tell us? Recall that when we take $A \mathbf x$, the vector $\mathbf x$ encodes a series of weights, just like the columns of $R$ do. If $A$ is invertible, its columns are independent and every $\mathbf x$ gets mapped to a unique vector. This is what we have been "solving" for. But the fact that every set of weights gives us something different may not be that interesting: it doesn't tell us much about which inputs are similar. This is why this case is rare: $n$ equations for $n$ variables just tells us that everything is unique.
But if $A \mathbf x = \mathbf b$ doesn't have a unique solution, linear algebra tells us that $A$ has dependent columns, for instance because there are more columns than rows. One way to think about this is that we are taking something with a lot of information ($n$ measurements) and turning it into something with less information ($m$ measurements). Since you can represent and differentiate more things with $n$ measurements than with $m$ measurements, you'll inevitably get natural groupings of data.
What the null space tells us is which parts of the data get flattened away: two vectors $\mathbf u$ and $\mathbf v$ satisfy $A \mathbf u = A \mathbf v = \mathbf b$ exactly when their difference $\mathbf u - \mathbf v$ lies in the null space. And since the null space is a subspace, we can take advantage of its structure: we know that there are only a certain number of representative vectors and that we can rely on linear combinations to describe the whole null space.
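To make this concrete, here is a sketch using the matrix from earlier; $\mathbf s$ is one of its special solutions, and $\mathbf u$ is an arbitrary vector of our choosing:

```python
import numpy as np

A = np.array([[3, 9, 0, 3], [1, 3, 1, 6], [5, 15, 0, 5]])
s = np.array([-3, 1, 0, 0])    # special solution: A @ s = 0
u = np.array([1, 2, 3, 4])     # an arbitrary vector
v = u + s                      # differs from u by a null space vector

print(A @ s)                         # [0 0 0]
print(np.array_equal(A @ u, A @ v))  # True: both map to the same b
```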
This ability to describe all vectors in the null space now lets us describe all solutions to the linear equation $A \mathbf x = \mathbf b$. We proceed with elimination again, but this time we need to make sure that we are also applying our elimination operations to the vector $\mathbf b$. The easiest way to do this is to treat the entire system as one augmented matrix $\begin{bmatrix} A & \mathbf b \end{bmatrix}$.
Consider $A \mathbf x = \mathbf b$ with $A = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 2 & 6 & 1 & 7 \end{bmatrix}$ and $\mathbf b = \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix}$. We need to perform elimination on this system, and this time, we'll make sure to explicitly include $\mathbf b$. Thus we have \[\begin{bmatrix} 1 & 3 & 0 & 1 & 2 \\ 0 & 0 & 1 & 5 & 3 \\ 2 & 6 & 1 & 7 & 7 \end{bmatrix} \xrightarrow{(3) - 2 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 & 2 \\ 0 & 0 & 1 & 5 & 3 \\ 0 & 0 & 1 & 5 & 3 \end{bmatrix} \xrightarrow{(3) - (2)} \begin{bmatrix} 1 & 3 & 0 & 1 & 2 \\ 0 & 0 & 1 & 5 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \] Now, to solve $A \mathbf x = \mathbf b$, we solve for $\mathbf x$ in the reduced equation $R \mathbf x = \mathbf d$, where $R$ is our matrix in reduced row echelon form and $\mathbf d = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$.
>>> A = np.array([[1, 3, 0, 1], [0, 0, 1, 5], [2, 6, 1, 7]])
>>> b = np.array([2, 3, 7])
>>> B = np.hstack((A,b.reshape(3,1)))
>>> B
array([[1, 3, 0, 1, 2],
       [0, 0, 1, 5, 3],
       [2, 6, 1, 7, 7]])
>>> B[2] = B[2] - 2 * B[0]
>>> B[2] = B[2] - B[1]
>>> B
array([[1, 3, 0, 1, 2],
       [0, 0, 1, 5, 3],
       [0, 0, 0, 0, 0]])
The fact that our augmented matrix has a row of zeros at the bottom turns out to be important—it indicates that our system is actually solvable. Why? Suppose that after row reduction we instead had a row that looked like $\begin{bmatrix} 0 & 0 & 0 & 0 & 2 \end{bmatrix}$. This is the equation $0 = 2$, which is never true!
We know already that not every system of linear equations is going to be solvable when we don't have an invertible matrix. So how do we know which vectors $\mathbf b$ do have solutions? The hint is to look at what happens after row reduction: \[\begin{bmatrix} 1 & 3 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 5 & b_2 \\ 2 & 6 & 1 & 7 & b_3 \end{bmatrix} \xrightarrow{(3) - 2 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 5 & b_2 \\ 0 & 0 & 1 & 5 & b_3 - 2b_1 \end{bmatrix} \xrightarrow{(3) - (2)} \begin{bmatrix} 1 & 3 & 0 & 1 & b_1 \\ 0 & 0 & 1 & 5 & b_2 \\ 0 & 0 & 0 & 0 & b_3 - 2b_1 - b_2 \end{bmatrix} \] This tells us that any vector $\mathbf b$ that is solvable must have $-2b_1 - b_2 + b_3 = 0$, or $b_3 = 2b_1 + b_2$. And because we know that $A \mathbf x = \mathbf b$ is solvable exactly when $\mathbf b$ is in the column space of $A$, this condition also tells us which vectors are in the column space of $A$.
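We can package the derived condition as a small check (the helper `solvable` is our own name; it encodes only the condition $b_3 = 2b_1 + b_2$ for this particular $A$):

```python
import numpy as np

def solvable(b):
    # The condition from elimination: b3 - 2*b1 - b2 must be zero.
    return b[2] - 2 * b[0] - b[1] == 0

print(solvable(np.array([2, 3, 7])))   # True:  7 - 4 - 3 = 0
print(solvable(np.array([2, 3, 8])))   # False: 8 - 4 - 3 = 1
```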