CAPP 30271 — Lecture 9

The row-reduced echelon form

We first need to talk about $R$. The form of the matrix $R$ is also special: it is called the row-reduced echelon form.

A matrix $R$ is in row-reduced echelon form if

  1. The first nonzero entry of each nonzero row is a $1$ (its leading $1$)
  2. Each leading $1$ lies to the right of the leading $1$s in the rows above it
  3. Each leading $1$ is the only nonzero entry in its column
  4. Any all-zero rows appear at the bottom

We can view this matrix as a generalized version of an upper triangular matrix. Such a matrix has rows arranged from "longest" to shortest, where each row begins with a $1$ and all entries to the left of the $1$ are $0$. We have the additional restriction that the leading 1s of each row are the only nonzero entries in their respective columns.

Just as we treated the diagonal entries of $U$ as "pivots", we treat the leading entry of each row as a "pivot". The remaining columns correspond to "free variables". For example, above we had $x_3$ and $x_4$ free, since their respective columns were non-pivot columns.

We can formally define special solutions with respect to the columns of row-reduced echelon form matrices.

Let $R$ be a matrix in row-reduced echelon form. The special solutions of the equation $R \mathbf x = \mathbf 0$ are the vectors $\mathbf s_1, \dots, \mathbf s_k$, one for each free variable: set that free variable to $1$, set the remaining free variables to $0$, and solve for the pivot variables.

The special solutions for $R \mathbf x = \mathbf 0$ span the null space of $R$.
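To make this concrete, here is a sketch of the computation in Python (assuming NumPy is available; the helper name `special_solutions` is our own), applied to a small example matrix already in row-reduced echelon form:

```python
import numpy as np

def special_solutions(R):
    """Special solutions of R x = 0, for R in row-reduced echelon
    form with its zero rows already removed."""
    r, n = R.shape
    # The pivot column of each row holds that row's leading 1.
    pivots = [next(j for j in range(n) if R[i, j] != 0) for i in range(r)]
    free = [j for j in range(n) if j not in pivots]
    solutions = []
    for f in free:
        s = np.zeros(n)
        s[f] = 1.0                       # set this free variable to 1
        for i, p in enumerate(pivots):
            s[p] = -R[i, f]              # solve the pivot rows
        solutions.append(s)
    return solutions

R = np.array([[1, 0, 2, 4],
              [0, 1, 3, 4]], dtype=float)
for s in special_solutions(R):
    assert np.allclose(R @ s, 0)         # each one lies in the null space
```

With two non-pivot columns we get two special solutions, and every vector in the null space is a combination of them.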

New and improved elimination for computing $R$

The goal now is to get from $A$ to $R$. We will rely on elimination again, but first we will need to extend elimination for this purpose. To do so, we now allow the following operations:

  1. Subtracting a multiple of one row from another, whether above or below
  2. Scale any row
  3. Exchange rows (permutation)

The two differences from the elimination we had before are that we can now modify rows above as well and that we can scale rows. The goal is to get to a matrix $R$ in which each nonzero row begins with a $1$, so being able to scale is necessary. These two generalizations also happen to be what allows us to compute $A^{-1}$. This is not a coincidence.

Let $A = \begin{bmatrix} 1 & 3 & 11 & 16 \\ 2 & 7 & 25 & 36 \end{bmatrix}$. To perform elimination, we first want to get rid of the entry underneath the $1$ in the first column. We can do this by subtracting $2$ of the first row from it. This gives us the matrix $\begin{bmatrix} 1 & 3 & 11 & 16 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. Now, since we have a $1$ in the second column of the second row, we want to clear out the entry above it, so we remove $3$ of the second row from the first and arrive at $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. Finally, we require that the first nonzero entry of every row is a $1$; here the leading entries are already $1$, so no scaling is needed in this example.
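These steps can be replayed as row operations in code; a sketch with NumPy, using the same matrix as in the $A = CR$ discussion below:

```python
import numpy as np

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]], dtype=float)

R = A.copy()
R[1] -= 2 * R[0]   # clear the entry below the first pivot
R[0] -= 3 * R[1]   # clear the entry above the second pivot
# The leading entries are already 1, so no scaling step is needed here.
assert np.allclose(R, [[1, 0, 2, 4], [0, 1, 3, 4]])
```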

Notice that applying these operations gives us $I$ at the start of $R$. This means that we had inadvertently computed the inverse of the first half of the matrix! To see this, we consider the two elimination matrices that got us there and multiply them: \[\begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix}.\] One can check that the inverse of this matrix is indeed $\begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$ by thinking back to inverses of elimination matrices. In this way, we get a very nice relationship between the parts of $R$ and $A$.

We can envision this by dividing $A$ into two blocks of matrices $A = \begin{bmatrix} W & H \end{bmatrix}$. Then our elimination matrix is $W^{-1}$ and applying this to $A$, we get \[W^{-1} A = W^{-1} \begin{bmatrix} W & H \end{bmatrix} = \begin{bmatrix} I & W^{-1}H \end{bmatrix} = \begin{bmatrix} I & F \end{bmatrix} = R.\]
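We can verify this block identity numerically for the example above (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]], dtype=float)
W, H = A[:, :2], A[:, 2:]        # split A = [W  H]
R = np.linalg.inv(W) @ A         # W^{-1} A = [I  W^{-1} H]
F = np.linalg.inv(W) @ H
assert np.allclose(R[:, :2], np.eye(2))   # the identity block
assert np.allclose(R[:, 2:], F)           # the F block
```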

This leads us somewhere interesting: this $R$ turns out to be the same as the $R$ in $A = CR$. The $I$ block marks the positions of the independent columns of $A$, which are exactly the columns of $C$, and $F$ describes how the remaining columns are combinations of them.

Since we had $A = \begin{bmatrix} 1 & 3 & 11 & 16 \\ 2 & 7 & 25 & 36 \end{bmatrix}$, the matrix $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$ tells us that the independent columns of $A$ are $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 7 \end{bmatrix}$. This gives us $C = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$. One can then verify that the remaining columns of $A$ are combined according to $R$.
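This verification is quick to do numerically (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]], dtype=float)
C = A[:, [0, 1]]                     # the independent columns of A
R = np.array([[1, 0, 2, 4],
              [0, 1, 3, 4]], dtype=float)
assert np.allclose(C @ R, A)         # A = CR
# Column 3 of R says: column 3 of A = 2*(column 1) + 3*(column 2).
assert np.allclose(2 * A[:, 0] + 3 * A[:, 1], A[:, 2])
```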

This hints at another interesting property that we'll describe in more detail soon. Recall that we said that the rank $r$ of a matrix is the number of its independent columns. This happens to be the same as the number of pivot columns in $R$. It follows that the number of special solutions is exactly $n - r$, which is how many vectors we need to span the null space.
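As a quick numerical check of this count for the $2 \times 4$ example above (assuming NumPy is available; `matrix_rank` computes $r$):

```python
import numpy as np

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]])
r = np.linalg.matrix_rank(A)   # number of independent columns
n = A.shape[1]
assert r == 2 and n - r == 2   # two special solutions: x3 and x4 are free
```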

For instance, we can think about the situation we had before: what happens to the null space when we have exactly one solution for $A \mathbf x = \mathbf 0$? If we carry out LU factorization, we arrive at an upper triangular matrix $U$. But if we go further, we end up with $R = I$. This tells us that every column is a pivot column: there are no free variables, no special solutions, and the null space contains only the zero vector.
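A quick check of this situation with SymPy's `rref` (a sketch, assuming SymPy is available):

```python
from sympy import Matrix, eye

A = Matrix([[1, 3], [2, 7]])     # square, with independent columns
R, pivots = A.rref()             # rref returns (matrix, pivot column indices)
assert R == eye(2)               # elimination goes all the way to R = I
assert A.nullspace() == []       # no special solutions: n - r = 0
```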

Now, $A = CR$ has reappeared, and we have a systematic way of computing it. However, the claim that $A = CR = C\begin{bmatrix} I & F \end{bmatrix}$ would imply that the independent columns of $A$ are always the first few on the left, which obviously cannot hold in general.

Consider the matrix $A = \begin{bmatrix} 3 & 9 & 0 & 3 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix}$. Let's do elimination. \[\begin{bmatrix} 3 & 9 & 0 & 3 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix} \xrightarrow{\frac 1 3 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 3 & 1 & 6 \\ 5 & 15 & 0 & 5 \end{bmatrix} \xrightarrow{(3) - 5 \times (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 3 & 1 & 6 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{(2) - (1)} \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

We arrive at $R = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. There are two observations to make here. The first is that we have a row of zeros. This tells us that the number of independent rows in our matrix is less than the number of rows. In fact, there are two versions of the row-reduced echelon form: one containing only the nonzero rows, which we saw earlier as $R$, and one that keeps the zero rows, sometimes called $R_0$. The difference is that $R_0$ has as many rows as $A$ does and is what elimination actually produces. Since we only use the nonzero rows, we work with $R$.
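SymPy's `rref`, for instance, produces the full-size form $R_0$; dropping the zero rows recovers $R$ (a sketch, assuming SymPy is available):

```python
from sympy import Matrix

A = Matrix([[3, 9, 0, 3],
            [1, 3, 1, 6],
            [5, 15, 0, 5]])
R0, pivots = A.rref()            # R_0: same number of rows as A
assert pivots == (0, 2)          # columns 1 and 3 are the pivot columns
assert list(R0.row(2)) == [0, 0, 0, 0]   # the zero row survives in R_0
R = R0[:2, :]                    # keep only the nonzero rows to get R
assert R == Matrix([[1, 3, 0, 1], [0, 0, 1, 5]])
```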

The second observation is that we do not have our nice decomposition of $R$ into $\begin{bmatrix} I & F \end{bmatrix}$. There is a reason for this: this $R$ tells us that the independent columns of $A$ are the first and the third columns. So the order of the columns matters!

However, this is not a huge problem. Observe that if we rearranged the columns, we would have $\begin{bmatrix} I & F \end{bmatrix}$; our columns just happen to be out of order. Luckily, just as we can encode the ordering of rows in the LU factorization with a permutation, we can encode the ordering of the columns of $R$ with a permutation. (Rather than fixing the order, this gives us a way of writing down the right form.)

So this gives us \[A = CR = C \begin{bmatrix} I & F \end{bmatrix} P.\] Recall that the $P$ on the right means that we are permuting columns.
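For the example above, this factorization can be checked directly (a sketch assuming NumPy is available; the permutation reorders the columns of $\begin{bmatrix} I & F \end{bmatrix}$ back into the order they have in $R$):

```python
import numpy as np

A = np.array([[3, 9, 0, 3],
              [1, 3, 1, 6],
              [5, 15, 0, 5]], dtype=float)
C = A[:, [0, 2]]                        # pivot columns of A (columns 1 and 3)
I_F = np.array([[1, 0, 3, 1],
                [0, 1, 0, 5]], dtype=float)   # [I  F]
P = np.eye(4)[:, [0, 2, 1, 3]]          # permutation restoring column order
assert np.allclose(I_F @ P, [[1, 3, 0, 1], [0, 0, 1, 5]])   # = R
assert np.allclose(C @ I_F @ P, A)      # A = C [I F] P
```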