CAPP 30271 — Lecture 9

The null space

The reason we want to start thinking about sets of vectors goes back to our discussion of solutions of linear systems. We made the observation that $A \mathbf x = \mathbf b$ has exactly one solution for every $\mathbf b$ when the columns of $A$ are independent and span the whole space. This is actually a significant restriction: in this case $A$ must be square.

But in fact, most of the matrices we're concerned with aren't square, and such linear systems still describe something useful. Our first step is to recall why dependent columns in $A$ mean that $A \mathbf x = \mathbf b$ can have multiple solution vectors $\mathbf x$.

The reason is that once we have linearly dependent columns, we have multiple ways to get $\mathbf 0$: there is some vector $\mathbf u \neq \mathbf 0$ with $A \mathbf u = \mathbf 0$. This gives us multiple solutions, since \[A(\mathbf x + \mathbf u) = A \mathbf x + A \mathbf u = A \mathbf x + \mathbf 0 = \mathbf b.\]
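To make this concrete, here is a small numeric sanity check in Python (the matrix, the solution, and the null vector here are made up for illustration):

```python
import numpy as np

# A has dependent columns: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

x = np.array([1.0, 1.0])   # one solution of A x = b
u = np.array([2.0, -1.0])  # A u = 0, so u gives "another way to get 0"
b = A @ x

print(A @ u)        # [0. 0.]
print(A @ (x + u))  # [3. 6.], equal to b: a second solution
```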

It becomes clear then that the first step is to ask which vectors $\mathbf x$ satisfy $A \mathbf x = \mathbf 0$. It's clear that $\mathbf 0$ is one such vector. If $A$ has independent columns, it is the only solution. Otherwise, there are more vectors. These vectors are interesting.

The null space of an $m \times n$ matrix $A$ is the set of vectors $\mathbf x$ such that $A \mathbf x = \mathbf 0$.

Our first observation is that this is a set of vectors from $\mathbb R^n$, since a vector needs to have $n$ components to be multiplied by an $m \times n$ matrix. Our next result is important.

Let $A$ be an $m \times n$ matrix. Then the null space of $A$, written $\mathbf N(A)$, is a subspace of $\mathbb R^n$.

To check that $\mathbf N(A)$ is a subspace, we first confirm that $\mathbf 0$ is in $\mathbf N(A)$, which it clearly is, since $A \mathbf 0 = \mathbf 0$. The other vector space axioms come for free, because we already know that vector addition and scalar multiplication over $\mathbb R^n$ behave as expected.

All that remains is to check that linear combinations of vectors stay inside $\mathbf N(A)$. Suppose $\mathbf u$ and $\mathbf v$ are in $\mathbf N(A)$, so that $A \mathbf u = \mathbf 0$ and $A \mathbf v = \mathbf 0$. Then for any scalars $c$ and $d$, \[A(c \mathbf u + d \mathbf v) = c A \mathbf u + d A \mathbf v = c \mathbf 0 + d \mathbf 0 = \mathbf 0.\]

Therefore, the linear combinations of vectors in $\mathbf N(A)$ are still in $\mathbf N(A)$, so $\mathbf N(A)$ is a subspace.

Let's stop to consider what we have. It seems obvious that matrices have a column space—matrices are all about describing vectors as linear combinations of their columns, as we have learned. And given how much we talked about rows, it seems obvious that there's such a thing as a row space—again, how the rows of a matrix are combined seems very important. But here we have stumbled upon another vector space associated with a matrix. This suggests that these vectors, the vectors that get mapped to $\mathbf 0$ by our matrix, are special in some way and carry some structure of their own. Let's hold on to that thought.

So how do we find all the vectors that satisfy $A \mathbf x = \mathbf 0$? When we had a square matrix, elimination allowed us to transform $A$ into an upper triangular matrix $U$. With a rectangular matrix, we can still use elimination (with a few new tricks) to get something useful. We call the resulting matrix $R$ and consider the solutions of $R \mathbf x = \mathbf 0$.

First, what does $R$ look like? How do we use it to describe the null space?

An example of the kind of matrix we're looking for is $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. If we want to solve $R \mathbf x = \mathbf 0$, we recall that we have equations \begin{align*} x_1 + 2x_3 + 4x_4 &= 0 \\ x_2 + 3x_3 + 4x_4 &= 0 \end{align*} Here, we seem to run into an issue: we have too many variables and not enough equations! But this is only a problem if we want exactly one solution, which we know can't be the case here.

What we will do instead is try to find and describe all the vectors that satisfy our equation. We know that there is one such vector, $\mathbf 0$, but that is not so helpful. How many vectors do we need? Observe that there are two variables we can choose freely: $x_3$ and $x_4$; once they are chosen, the equations determine $x_1$ and $x_2$. Two independent choices suggest that we want two vectors.

To do this, we set $x_3 = 1$ and $x_4 = 0$ to get one solution. In this case, we get the equations \begin{align*} x_1 + 2 &= 0 \\ x_2 + 3 &= 0 \end{align*} which gives us $x_1 = -2$ and $x_2 = -3$. Then if we set $x_3 = 0$ and $x_4 = 1$, we have \begin{align*} x_1 + 4 &= 0 \\ x_2 + 4 &= 0 \end{align*} which gives us $x_1 = -4$ and $x_2 = -4$.

This gives us two vectors: the first gets us $\mathbf s_1 = \begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \end{bmatrix}$ and the second gets us $\mathbf s_2 = \begin{bmatrix} -4 \\ -4 \\ 0 \\ 1 \end{bmatrix}$. These solutions are called special solutions. They're "special" because each one is a nonzero solution built in the simplest possible way: one free variable is set to $1$ and the rest to $0$.

These vectors are an important starting point for describing the vectors that satisfy $R \mathbf x = \mathbf 0$—we know that we can also combine them to get other vectors that satisfy the equation. This means that we can describe all vectors satisfying $R \mathbf x = \mathbf 0$ by \[\mathbf x = c_1 \mathbf s_1 + c_2 \mathbf s_2\] for any scalars $c_1$ and $c_2$. (This also tells us that all such vectors form a plane.)
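Both claims are easy to check numerically; here's a quick sketch with NumPy:

```python
import numpy as np

R = np.array([[1.0, 0.0, 2.0, 4.0],
              [0.0, 1.0, 3.0, 4.0]])

s1 = np.array([-2.0, -3.0, 1.0, 0.0])
s2 = np.array([-4.0, -4.0, 0.0, 1.0])

# Each special solution satisfies R s = 0.
print(R @ s1)  # [0. 0.]
print(R @ s2)  # [0. 0.]

# Any combination c1 s1 + c2 s2 is then also a solution.
c1, c2 = 3.0, -5.0
print(R @ (c1 * s1 + c2 * s2))  # [0. 0.]
```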

But why can we do this? It's because we know we have a subspace. Since we're in a subspace, we can combine our special solutions and we're guaranteed to get other solutions of $R \mathbf x = \mathbf 0$. We'll also see soon that our special solutions span the null space, meaning that these vectors alone give us every solution.

To see why that is, we need to first talk about $R$. The form of the matrix $R$ is also special: it is called row-reduced echelon form.

A matrix $R$ is in row-reduced echelon form if

  1. the first nonzero entry of each nonzero row is a $1$ (a leading $1$),
  2. each leading $1$ lies to the right of the leading $1$ in the row above,
  3. each leading $1$ is the only nonzero entry in its column, and
  4. any rows of all zeros appear below the nonzero rows.

We can view this matrix as a generalized version of an upper triangular matrix. Such a matrix has rows arranged from "longest" to shortest, where each row begins with a $1$ and all entries to the left of that $1$ are $0$. We have the additional restriction that the leading $1$ of each row is the only nonzero entry in its column.

Just as we treated the diagonal entries of $U$ as "pivots", we treat the leading entry of each row as a "pivot". The remaining columns correspond to "free variables". For example, above we had $x_3$ and $x_4$ free, since their respective columns were non-pivot columns.
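If you want to check a row reduction by machine, SymPy's rref() returns both the row-reduced matrix and the indices of the pivot columns (the matrix below is the one we will reduce by hand later in these notes):

```python
from sympy import Matrix

A = Matrix([[1, 3, 11, 16],
            [2, 7, 25, 36]])

R, pivot_cols = A.rref()
print(R)           # Matrix([[1, 0, 2, 4], [0, 1, 3, 4]])
print(pivot_cols)  # (0, 1): the first two columns are pivot columns
# The remaining columns (indices 2 and 3) are the non-pivot columns,
# corresponding to the free variables x_3 and x_4.
```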

We can formally define special solutions with respect to the columns of row-reduced echelon form matrices.

Let $R$ be a matrix in row-reduced echelon form. The special solutions of the equation $R \mathbf x = \mathbf 0$ are the vectors $\mathbf s_1, \dots, \mathbf s_k$ obtained by setting one free variable to $1$ and the remaining free variables to $0$, one for each non-pivot column.

The special solutions for $R \mathbf x = \mathbf 0$ span the null space of $R$.
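SymPy can produce these vectors directly: nullspace() returns a basis for the null space, built one vector per free variable, so for our example $R$ it recovers exactly the special solutions:

```python
from sympy import Matrix

R = Matrix([[1, 0, 2, 4],
            [0, 1, 3, 4]])

# One basis vector per free variable; these are the special solutions.
for s in R.nullspace():
    print(s.T)
# Matrix([[-2, -3, 1, 0]])
# Matrix([[-4, -4, 0, 1]])
```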

New and improved elimination for computing $R$

The goal now is to get from $A$ to $R$. We will rely on elimination again, but first we will need to extend elimination for this purpose. To do so, we now allow the following operations:

  1. Subtracting a multiple of one row from another, whether above or below
  2. Scaling any row by a nonzero constant
  3. Exchanging rows (permutation)

The two differences from the elimination we had before are that we may now modify rows above as well, and that we may scale rows. The goal is to get to a matrix $R$ in which each nonzero row begins with a $1$, so being able to scale is necessary. These two generalizations also happen to be what allow us to compute $A^{-1}$. This is not a coincidence.
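To see how the three operations fit together, here is a minimal row-reduction routine in Python; it's an illustrative sketch, not a production implementation (a real library would take more numerical care):

```python
import numpy as np

def rref(A, tol=1e-12):
    """Row-reduce A using only the three allowed operations."""
    R = A.astype(float).copy()
    m, n = R.shape
    pivot_row = 0
    for col in range(n):
        if pivot_row >= m:
            break
        # Operation 3: exchange rows to bring the largest entry up.
        best = pivot_row + np.argmax(np.abs(R[pivot_row:, col]))
        if abs(R[best, col]) < tol:
            continue  # no pivot here: this is a free column
        R[[pivot_row, best]] = R[[best, pivot_row]]
        # Operation 2: scale the pivot row so the pivot becomes 1.
        R[pivot_row] /= R[pivot_row, col]
        # Operation 1: subtract multiples of the pivot row from
        # every other row, both above and below.
        for r in range(m):
            if r != pivot_row:
                R[r] -= R[r, col] * R[pivot_row]
        pivot_row += 1
    return R

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]])
print(rref(A))  # [[1. 0. 2. 4.]
                #  [0. 1. 3. 4.]]
```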

Let $A = \begin{bmatrix} 1 & 3 & 11 & 16 \\ 2 & 7 & 25 & 36 \end{bmatrix}$. To perform elimination, we first want to get rid of the entry underneath the $1$ in the first column. We can do this by subtracting $2$ times the first row from the second. This gives us the matrix $\begin{bmatrix} 1 & 3 & 11 & 16 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. Now, since we have a $1$ in the second column of the second row, we want to clear out the entry above it, so we subtract $3$ times the second row from the first and arrive at $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$.

Notice that applying these operations gives us $I$ at the start of $R$. This means that we have inadvertently computed the inverse of the first half of the matrix! To see this, we consider the two elimination matrices that got us there and multiply them: \[\begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 7 & -3 \\ -2 & 1 \end{bmatrix}.\] One can check that the inverse of this matrix is indeed $\begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$ by thinking back to inverses of elimination matrices. In this way, we get a very nice relationship between the parts of $R$ and $A$.
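Here is a short NumPy check of this computation (the names E1 and E2 for the two elimination matrices are ours):

```python
import numpy as np

E1 = np.array([[1, 0], [-2, 1]])  # subtract 2 * row 1 from row 2
E2 = np.array([[1, -3], [0, 1]])  # subtract 3 * row 2 from row 1

W_inv = E2 @ E1
print(W_inv)                 # [[ 7 -3]
                             #  [-2  1]]
print(np.linalg.inv(W_inv))  # [[1. 3.]
                             #  [2. 7.]], the first two columns of A
```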

We can envision this by dividing $A$ into two blocks of matrices $A = \begin{bmatrix} W & H \end{bmatrix}$. Then our elimination matrix is $W^{-1}$ and applying this to $A$, we get \[W^{-1} A = W^{-1} \begin{bmatrix} W & H \end{bmatrix} = \begin{bmatrix} I & W^{-1}H \end{bmatrix} = \begin{bmatrix} I & F \end{bmatrix} = R.\]

This leads us somewhere interesting: $R$ turns out to be the same as the $R$ in $A = CR$. This means that the $I$ block identifies the independent columns of $A$, which become the columns of $C$, while $F$ records how the remaining columns of $A$ are built as combinations of them.

Since we had $A = \begin{bmatrix} 1 & 3 & 11 & 16 \\ 2 & 7 & 25 & 36 \end{bmatrix}$, the matrix $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$ tells us that the independent columns of $A$ are $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 7 \end{bmatrix}$. This gives us $C = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$. One can then verify that the remaining columns of $A$ are combined according to $R$.
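We can verify the whole factorization in a few lines of NumPy:

```python
import numpy as np

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]])
C = np.array([[1, 3],
              [2, 7]])          # the independent columns of A
R = np.array([[1, 0, 2, 4],
              [0, 1, 3, 4]])

print(np.array_equal(C @ R, A))  # True: C R reproduces A
```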

This hints at another interesting property that we'll describe in more detail soon. Recall that we said that the rank $r$ of a matrix is the number of its independent columns. This happens to be the same as the number of pivot columns in $R$. It follows that the number of special solutions is exactly $n - r$: one for each non-pivot column. This number tells us how many vectors we need to describe the null space.
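For our running example, a quick check of this count (NumPy's matrix_rank computes the rank numerically):

```python
import numpy as np

A = np.array([[1, 3, 11, 16],
              [2, 7, 25, 36]])

r = np.linalg.matrix_rank(A)  # number of independent columns
n = A.shape[1]
print(r)      # 2
print(n - r)  # 2: the number of special solutions we found
```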

For instance, we can think about the situation we had before: what happens to the null space when we have exactly one solution for $A \mathbf x = \mathbf 0$? If we carry out LU factorization, we arrive at an upper triangular matrix $U$. But if we go further, we end up with $R = I$. This tells us that every column is a pivot column: there are no free variables, there are $n - r = 0$ special solutions, and the null space contains only $\mathbf 0$.