CAPP 30271 — Lecture 9

The null space

The reason we want to start thinking about sets of vectors goes back to our discussion of solutions for linear systems. We made the observation that $A \mathbf x = \mathbf b$ has exactly one solution for every $\mathbf b$ when the columns of $A$ are independent and span $\mathbb R^m$. This is actually a significant restriction: in this case $A$ must be square.

In fact, most of the matrices we're concerned with aren't square, but such linear systems still describe something useful. Our first step is to recall why dependent columns in $A$ mean that $A \mathbf x = \mathbf b$ can have multiple possible solutions $\mathbf x$.

The reason for this is that once we have linearly dependent columns, we have multiple ways to combine the columns to get $\mathbf 0$: there is some nonzero vector $\mathbf n$ with $A \mathbf n = \mathbf 0$. But then any solution $\mathbf x$ of $A \mathbf x = \mathbf b$ gives us another solution $\mathbf x + \mathbf n$, since \[A(\mathbf x + \mathbf n) = A \mathbf x + A \mathbf n = \mathbf b + \mathbf 0 = \mathbf b.\]
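We can see this numerically. Here is a small sketch using a hypothetical $2 \times 2$ matrix whose second column is twice its first (so its columns are dependent):

```python
import numpy as np

# A hypothetical matrix with dependent columns: column 2 is 2 * column 1.
A = np.array([[1, 2],
              [2, 4]])
x = np.array([1, 1])
b = A @ x                # b = [3, 6]

# n combines the columns to give zero: 2*col1 + (-1)*col2 = 0.
n = np.array([2, -1])
print(A @ n)             # [0 0]

# Adding n to x produces a second, different solution of A x = b.
print(A @ (x + n))       # [3 6]
```

Both $\mathbf x$ and $\mathbf x + \mathbf n$ solve the same system, which is exactly the multiplicity of solutions described above.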

It becomes clear then that the first step is to ask which vectors $\mathbf x$ satisfy $A \mathbf x = \mathbf 0$. It's clear that $\mathbf 0$ is one such vector. If $A$ has independent columns, it is the only solution. Otherwise, there are more vectors. These vectors are interesting—if we view this from the perspective of data or information, this suggests that these vectors represent some sort of "redundancy" or "equivalence". That is, there are distinguishing features that we've decided are okay to "lose" (i.e. become zero).

The null space of an $m \times n$ matrix $A$ is the set of vectors $\mathbf x$ such that $A \mathbf x = \mathbf 0$.

Our first observation is that this is a set of vectors from $\mathbb R^n$, since a vector needs to have $n$ components to be multiplied by an $m \times n$ matrix. Our next result is important.

Let $A$ be an $m \times n$ matrix. Then the null space of $A$, written $\mathbf N(A)$, is a subspace of $\mathbb R^n$.

To check that $\mathbf N(A)$ is a vector space, we just need to confirm that $\mathbf 0$ is in $\mathbf N(A)$, which it clearly is, since $A \mathbf 0 = \mathbf 0$. The other vector space axioms come for free because we know vector addition and scalar multiplication over $\mathbb R^n$ works as expected.

All that remains is to check that linear combinations of vectors stay inside $\mathbf N(A)$. Suppose $\mathbf u$ and $\mathbf v$ are in $\mathbf N(A)$, so $A \mathbf u = \mathbf 0$ and $A \mathbf v = \mathbf 0$. Then for any scalars $c$ and $d$, \[A(c \mathbf u + d \mathbf v) = c A \mathbf u + d A \mathbf v = c \mathbf 0 + d \mathbf 0 = \mathbf 0.\]

Therefore, linear combinations of vectors in $\mathbf N(A)$ are still in $\mathbf N(A)$, so $\mathbf N(A)$ is a subspace.

Let's stop to consider what we have here. It seems obvious that matrices have a column space—matrices are all about describing vectors as linear combinations of their columns, as we have learned. And given how much we talked about rows, it seems obvious that there's such a thing as a row space—again, how the rows of a matrix are combined seems very important. But here we have stumbled upon another vector space associated with a matrix. This suggests that these vectors, the vectors that get mapped to $\mathbf 0$ by our matrix, are special in some way and hold some special structure. Let's hold on to that thought.

So how do we find all the vectors that satisfy $A \mathbf x = \mathbf 0$? When we had a square matrix, elimination allowed us to transform $A$ into an upper triangular matrix $U$. With a rectangular matrix, we can still use elimination (with a few new tricks) to get something useful. We call this matrix $R$ and consider the solutions of $R \mathbf x = \mathbf 0$, which are the same as the solutions of $A \mathbf x = \mathbf 0$, since elimination doesn't change the solution set.

First, what does $R$ look like? How do we use it to describe the null space?

An example of the kind of matrix we're looking for is $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. If we want to solve $R \mathbf x = \mathbf 0$, we recall that we have the equations \begin{align*} x_1 + 2x_3 + 4x_4 &= 0 \\ x_2 + 3x_3 + 4x_4 &= 0 \end{align*} Here, we seem to run into an issue: we have too many variables and not enough equations! But this is only a problem if we want exactly one solution, which we already know can't happen here.

What we will do instead is try to find and describe the vectors that satisfy our equation. We know that there is one such vector, $\mathbf 0$, but that is not so helpful. How many vectors do we need? In order to solve this system, we observe that there are two variables that occur in both of our equations: $x_3$ and $x_4$. This suggests that we want two vectors.

To do this, we set $x_3 = 1$ and $x_4 = 0$ to get one solution. In this case, we get the equations \begin{align*} x_1 + 2 &= 0 \\ x_2 + 3 &= 0 \end{align*} which gives us $x_1 = -2$ and $x_2 = -3$. Then if we set $x_3 = 0$ and $x_4 = 1$, we have \begin{align*} x_1 + 4 &= 0 \\ x_2 + 4 &= 0 \end{align*} which gives us $x_1 = -4$ and $x_2 = -4$.

This gives us two vectors: the first gets us $\mathbf s_1 = \begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \end{bmatrix}$ and the second gets us $\mathbf s_2 = \begin{bmatrix} -4 \\ -4 \\ 0 \\ 1 \end{bmatrix}$.

Let's verify this.


    >>> import numpy as np
    >>> R = np.array([[1, 0, 2, 4], [0, 1, 3, 4]])
    >>> s1 = np.array([-2, -3, 1, 0])
    >>> s2 = np.array([-4, -4, 0, 1])
    >>> R @ s1
    array([0, 0])
    >>> R @ s2
    array([0, 0])
    

The solutions that we get this way are called special solutions. They are "special" because they're nonzero: each one was built by setting one of the variables $x_3$, $x_4$ to $1$ and the other to $0$.

These vectors are an important starting point for describing the vectors that satisfy $R \mathbf x = \mathbf 0$—we know that we can also combine them to get other vectors that satisfy the equation. This means that we can describe all vectors satisfying $R \mathbf x = \mathbf 0$ by \[\mathbf x = c_1 \mathbf s_1 + c_2 \mathbf s_2\] for any scalars $c_1$ and $c_2$. (This also tells us that all such vectors form a plane.)


    >>> R @ (s1 + s2)
    array([0, 0])
    >>> R @ (-3193 * s1 + 80281 * s2)
    array([0, 0])
    >>> R @ (0.8231 * s1 - 0.19256 * s2)
    array([0., 0.]) 
    

But why can we do this? It's because we know we have a subspace. Since we're in a subspace, we know that we can combine our special solutions and we're guaranteed to get other solutions that satisfy $R \mathbf x = \mathbf 0$. We'll also see soon that our special solutions span the null space, meaning that we know that we have every solution just with these vectors.
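We can check this spanning claim numerically. A sketch, assuming SciPy is available: `scipy.linalg.null_space` returns an orthonormal basis for the null space, and projecting a special solution onto that basis should recover it exactly.

```python
import numpy as np
from scipy.linalg import null_space

R = np.array([[1, 0, 2, 4], [0, 1, 3, 4]])
s1 = np.array([-2, -3, 1, 0])
s2 = np.array([-4, -4, 0, 1])

# An orthonormal basis for N(R); two columns means the null space
# is 2-dimensional, matching our two special solutions.
N = null_space(R)
print(N.shape)                         # (4, 2)

# s1 lies in the span of the basis: projecting it onto the basis
# leaves no residual.
residual = s1 - N @ (N.T @ s1)
print(np.allclose(residual, 0))        # True
```

The basis vectors SciPy returns are not $\mathbf s_1$ and $\mathbf s_2$ themselves (they're orthonormalized), but they span the same subspace.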

The row-reduced echelon form

We first need to talk about $R$. It gets a special name because the form of the matrix $R$ is also special: it is called the row-reduced echelon form.

A matrix $R$ is in row-reduced echelon form if (1) the first nonzero entry of each nonzero row is a $1$ (a pivot); (2) each pivot lies strictly to the right of the pivots in the rows above it; (3) each pivot is the only nonzero entry in its column; and (4) any rows of all zeros are at the bottom.

We can view this matrix as a generalized version of an upper triangular matrix. Such a matrix has rows arranged from "longest" to shortest (hence, echelon), where each row begins with a $1$ and all entries to the left of the $1$ are $0$. We have the additional restriction that the leading 1s of each row are the only nonzero entries in their respective columns.

In the example below, the pivots are in boldface. Notice that everything underneath the pivots must be a 0 and all other entries in the same column as a pivot must also be zero. All other entries, with locations denoted by $\mathsf a$, may be any number. \[\begin{bmatrix} \mathbf{1} & 0 & 0 & \mathsf a & \mathsf a & 0 & 0 & \mathsf a & 0 & \mathsf a & \mathsf a\\ 0 & \mathbf{1} & 0 & \mathsf a & \mathsf a & 0 & 0 & \mathsf a & 0 & \mathsf a & \mathsf a\\ 0 & 0 & \mathbf{1} & \mathsf a & \mathsf a & 0 & 0 & \mathsf a & 0 & \mathsf a & \mathsf a\\ 0 & 0 & 0 & 0 & 0 & \mathbf{1} & 0 & \mathsf a & 0 & \mathsf a & \mathsf a\\ 0 & 0 & 0 & 0 & 0 & 0 & \mathbf{1} & \mathsf a & 0 & \mathsf a & \mathsf a\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \mathbf{1} & \mathsf a & \mathsf a \end{bmatrix}\]

Just as we treated the diagonal entries of $U$ as "pivots", we treat the leading entry of each row as a "pivot". The remaining columns correspond to "free variables". For example, above we had $x_3$ and $x_4$ free, since their respective columns were non-pivot columns.
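SymPy can carry out this elimination for us and report the pivot columns directly. A sketch, assuming SymPy is available, using the $R$ from our earlier example:

```python
from sympy import Matrix

# rref() returns the row-reduced echelon form together with the
# (0-based) indices of the pivot columns.
R = Matrix([[1, 0, 2, 4],
            [0, 1, 3, 4]])
rref_R, pivot_cols = R.rref()
print(pivot_cols)        # (0, 1): the columns of x_1 and x_2

# The remaining columns correspond to the free variables.
free_cols = [j for j in range(R.cols) if j not in pivot_cols]
print(free_cols)         # [2, 3]: the columns of x_3 and x_4
```

Since this $R$ is already in row-reduced echelon form, `rref()` returns it unchanged; for a general $A$, the same call would produce $R$ from scratch.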

We can formally define special solutions with respect to the columns of row-reduced echelon form matrices.

Let $R$ be a matrix in row-reduced echelon form. The special solutions of the equation $R \mathbf x = \mathbf 0$ are the vectors $\mathbf s_1, \dots, \mathbf s_k$, one per non-pivot column, obtained by setting the corresponding free variable to $1$, setting all other free variables to $0$, and solving for the pivot variables.
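We can turn this recipe into code. A minimal sketch (the helper `special_solutions` is ours, not a library function), assuming $R$ is already in row-reduced echelon form with no zero rows:

```python
import numpy as np

def special_solutions(R):
    """Special solutions of R x = 0, assuming R is in row-reduced
    echelon form with no zero rows."""
    m, n = R.shape
    # The pivot column of each row is the index of its first nonzero entry.
    pivots = [int(np.flatnonzero(R[i])[0]) for i in range(m)]
    free = [j for j in range(n) if j not in pivots]
    solutions = []
    for f in free:
        s = np.zeros(n)
        s[f] = 1.0                      # set this free variable to 1
        for i, p in enumerate(pivots):  # row i reads x_p + R[i,f] * x_f = 0,
            s[p] = -R[i, f]             # so x_p = -R[i, f]
        solutions.append(s)
    return solutions

R = np.array([[1.0, 0, 2, 4], [0, 1, 3, 4]])
s1, s2 = special_solutions(R)
print(s1)   # [-2. -3.  1.  0.]
print(s2)   # [-4. -4.  0.  1.]
```

This reproduces exactly the $\mathbf s_1$ and $\mathbf s_2$ we found by hand. The key step is that each pivot variable can be read off directly from its row, precisely because every pivot is the only nonzero entry in its column.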

The special solutions for $R \mathbf x = \mathbf 0$ span the null space of $R$.