CAPP 30271 — Lecture 8

Let's have a look at a few more examples of vector spaces.

For any $n$, the set of all $n \times n$ matrices is a vector space. One can convince oneself of this by noting that addition and scalar multiplication of matrices work the same way, entry by entry, as they do for vectors.

Here's an example of how the choice of operation makes all the difference. Suppose we want to use boolean vectors, so our scalars are $\mathtt T$ and $\mathtt F$. A reasonable choice for scalar multiplication is $\wedge$, conjunction or "AND". And we might think that component-wise vector addition via $\vee$, disjunction or "OR", is also a reasonable choice; there's a nice symmetry to it. But it doesn't work: there is no $y$ with $\mathtt T \vee y = \mathtt F$, so $\mathtt T$ has no additive inverse. Therefore, we don't have a vector space. However, there is a way to get a boolean vector space: instead of using "OR" for vector addition, use "XOR", exclusive or, instead! XOR has the following truth table:

\[\begin{array}{cc|c}
x & y & x \mathop{\mathsf{xor}} y \\ \hline
\mathtt T & \mathtt T & \mathtt F \\
\mathtt T & \mathtt F & \mathtt T \\
\mathtt F & \mathtt T & \mathtt T \\
\mathtt F & \mathtt F & \mathtt F
\end{array}\]
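A quick Python sketch of this failure and fix, representing $\mathtt T$ and $\mathtt F$ as Python booleans (the variable names here are mine, for illustration):

```python
# Why OR fails as vector addition but XOR works.
values = [True, False]

# Under OR, True has no additive inverse: no y makes (True or y) == False.
or_has_inverse_for_T = any((True or y) == False for y in values)
print(or_has_inverse_for_T)   # False: the inverse axiom fails for OR

# Under XOR (^), every x is its own inverse: x ^ x == False,
# so False plays the role of the zero vector.
xor_self_inverse = all((x ^ x) == False for x in values)
print(xor_self_inverse)       # True
```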

Here's a more complicated example. The definition of vector spaces imposes rules on the operations we're allowed to use. This is because that's what algebra is all about: the structures you get from choosing different kinds of operations. One such structure is the tropical semiring. Here, our "items" consist of the real numbers together with infinity ($\mathbb R \cup \{\infty\}$), with the $\min$ function as "addition" and ordinary addition as "multiplication". In other words, consider $x, y \in \mathbb R \cup \{\infty\}$ and define \begin{align*} x \oplus y &= \min\{x,y\}, \\ x \otimes y &= x + y. \end{align*} It looks weird, but you can get a working algebraic structure out of this. This structure is implicitly used when computing things like shortest paths in graphs. Since our operations have changed, we have to think carefully about what the identities for addition and multiplication are: the additive identity is $\infty$, since $\min\{x, \infty\} = x$, and the multiplicative identity is $0$, since $x + 0 = x$.

This almost works as a vector space, but it's missing something: an additive inverse, since there is no $y$ with $\min\{x, y\} = \infty$ when $x \ne \infty$. But such structures are common in computer science. This is another thing these rules are used for: seeing which ones we really need for our applications and which ones we might be able to drop when convenient.
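Here is a small Python sketch of the tropical operations and of how a min-plus matrix product recovers shortest paths; the graph and helper names are illustrative assumptions of mine, not from the lecture:

```python
INF = float("inf")

def t_add(x, y):      # tropical "addition" is min
    return min(x, y)

def t_mul(x, y):      # tropical "multiplication" is ordinary +
    return x + y

# Identities under the new operations:
# t_add(x, INF) == x, so INF is the additive identity;
# t_mul(x, 0) == x, so 0 is the multiplicative identity.
assert t_add(5.0, INF) == 5.0
assert t_mul(5.0, 0.0) == 5.0

# Min-plus matrix product: entry (i,j) is min over k of A[i][k] + B[k][j].
# If D holds edge weights, repeated products give shortest path lengths.
def min_plus(A, B):
    n = len(A)
    return [[min(t_mul(A[i][k], B[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

# Weighted graph on 3 nodes: the direct edge 0->2 costs 9,
# but going through node 1 costs 2 + 3 = 5.
D = [[0,   2,   9],
     [INF, 0,   3],
     [INF, INF, 0]]
D2 = min_plus(D, D)
print(D2[0][2])   # 5: the shortest 0->2 path, found via min-plus
```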

Recall that subspaces are vector spaces inside of another vector space where any linear combination of the vectors stays inside the subspace. Let's look at some subspaces.

We saw earlier that the set containing only the zero vector is a vector space. Since the zero vector is in every vector space by condition (3), this makes $\{\mathbf 0\}$ a subspace of every vector space. We can formally test our conditions:

  1. Take any two vectors from the subspace and add them together: $\mathbf 0 + \mathbf 0 = \mathbf 0$.
  2. Take any vector from the subspace and a scalar: $c\mathbf 0 = \mathbf 0$.

The set of all scalar multiples of the vector $\mathbf v = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$ is a subspace: we observe that every combination of vectors in this set is again a scalar multiple of $\mathbf v$ and, crucially, $\mathbf 0$ is included in this space, because $0\mathbf v = \mathbf 0$. We can test our two conditions:

  1. Take two vectors from the subspace and add them together: $c\mathbf v + d\mathbf v = (c+d)\mathbf v$.
  2. Take any vector from the subspace and a scalar: $c_2(c_1\mathbf v) = (c_2c_1)\mathbf v$.

Notice that this subspace defines a line in 3-space that goes through the origin (the zero vector).
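The two conditions can also be spot-checked numerically (not a proof, of course). A small Python sketch, with helper names of my own choosing:

```python
# Spot-check the closure conditions for the set of all multiples
# of v = (3, 2, 0).
v = [3, 2, 0]

def scale(c, u):
    return [c * ui for ui in u]

def add(u, w):
    return [a + b for a, b in zip(u, w)]

# 1. A sum of two multiples of v is again a multiple of v: cv + dv = (c+d)v.
c, d = 2.5, -4.0
assert add(scale(c, v), scale(d, v)) == scale(c + d, v)

# 2. A scalar times a multiple of v is a multiple of v: c2(c1 v) = (c2 c1)v.
c1, c2 = 3.0, 0.5
assert scale(c2, scale(c1, v)) == scale(c2 * c1, v)

# And the zero vector is included: 0 * v = 0.
assert scale(0, v) == [0, 0, 0]
```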

The set of all combinations of vectors $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ with $x + y + z = 4$ is not a subspace: it does not contain $\mathbf 0$, since $0 + 0 + 0 = 0 \neq 4$.

One observation that we get from this is that subspaces must contain $\mathbf 0$ and that they essentially "fill up" a space through the origin. This space can be a line, a plane, or some other $n$-dimensional space. The idea is that this space has smaller "dimension" than the original space, so we can think of it as having some "orientation", like a line or plane.

The column space of a matrix

The column space of an $m \times n$ matrix $A$ is the set of all linear combinations of columns of $A$. The column space of $A$ is a subspace of $\mathbb R^m$ and is denoted $\mathbf C(A)$.

First, we should verify that this is actually a subspace.

If $A$ is an $m \times n$ matrix, then the column space of $A$ is a subspace of $\mathbb R^m$.

Let $A$ be an arbitrary $m \times n$ matrix. We will verify that the column space of $A$ is a subspace by checking each subspace condition. Take two vectors $\mathbf u$ and $\mathbf v$ in the column space of $A$, say $\mathbf u = A\mathbf x$ and $\mathbf v = A\mathbf y$. Then $\mathbf u + \mathbf v = A\mathbf x + A\mathbf y = A(\mathbf x + \mathbf y)$, which is again a linear combination of columns of $A$. Similarly, for any scalar $c$, we have $c\mathbf u = c(A\mathbf x) = A(c\mathbf x)$, again a combination of columns of $A$. Finally, $\mathbf 0 = A\mathbf 0$ is in the column space.

Therefore, the column space of $A$ is a subspace. Since the columns of $A$ are vectors in $\mathbb R^m$, the column space of $A$ is a subspace of the vector space $\mathbb R^m$.

Now, we have a formal way of describing when the equation $A\mathbf x = \mathbf b$ has solutions. Informally, $\mathbf b$ must be a combination of columns of $A$ in order for $\mathbf x$ to exist. But now, we can phrase this in terms of the column space of $A$.

The equation $A \mathbf x = \mathbf b$ has a solution if and only if $\mathbf b$ is in the column space of $A$.

Intuitively, this makes sense: $A \mathbf x = \mathbf b$ asks for some $\mathbf x$ which describes how to combine columns of $A$ to construct $\mathbf b$. If $\mathbf b$ is not in the column space of $A$, this means that there are no linear combinations of columns of $A$ that result in $\mathbf b$. Therefore, $\mathbf x$ can't exist.

Let $A = \begin{bmatrix} 2 & 4 & 3 \\ 3 & 1 & 5 \\ 0 & 0 & 0 \end{bmatrix}$ and $\mathbf b = \begin{bmatrix} 7 \\ 6 \\ 5 \end{bmatrix}$. Then $\mathbf b$ is not in the column space of $A$: the third row of $A$ is all zeros, so every combination of the columns of $A$ has third component $0$, while $\mathbf b$ has third component $5$. So $A \mathbf x = \mathbf b$ is not solvable. Recall that this also tells us a few things about $A$: since we have a linear equation with no solution, $A$ is not invertible. Since $A$ is not invertible, it must have linearly dependent columns.
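A numerical sanity check in Python (the helper `mat_vec` is mine): because the third row of $A$ is zero, every product $A\mathbf x$ has third component $0$, so it can never equal $\mathbf b$:

```python
import random

A = [[2, 4, 3],
     [3, 1, 5],
     [0, 0, 0]]

def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# Spot-check: Ax always lands in the plane z = 0.
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(3)]
    assert mat_vec(A, x)[2] == 0.0

# b sits outside that plane, so Ax = b has no solution.
b = [7, 6, 5]
print(b[2] != 0)   # True
```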

But what about the rows of $A$? Sometimes we choose to treat them as vectors. Perhaps they form a subspace? Or is there something about the row vectors that breaks our rules about subspaces?

We can see that the rows of $A$ do form a subspace by playing a very simple trick: if we consider the transpose $A^T$ of $A$, which itself is a matrix, we notice that the rows of $A$ are the columns of $A^T$. Since $A^T$ is a matrix, it has a column space. It is this column space that is the row space of $A$.

The row space of an $m \times n$ matrix $A$ is the column space of the transpose of $A$, $\mathbf C(A^T)$, and is a subspace of $\mathbb R^n$.

Consider the rank 1 matrix $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix} \begin{bmatrix} 2 & -6 & 7 \end{bmatrix} = \begin{bmatrix} 6 & -18 & 21 \\ 2 & -6 & 7 \\ 18 & -54 & 63 \end{bmatrix}$. The column space of this matrix consists of all scalar multiples of the vector $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$ and the row space of this matrix consists of all scalar multiples of the vector $\begin{bmatrix} 2 \\ -6 \\ 7 \end{bmatrix}$.
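A small Python sketch of this outer-product construction (variable names mine), checking that every column is a multiple of $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$ and every row is a multiple of $\begin{bmatrix} 2 & -6 & 7 \end{bmatrix}$:

```python
# Build the rank 1 matrix as an outer product u * w^T.
u = [3, 1, 9]
w = [2, -6, 7]

M = [[ui * wj for wj in w] for ui in u]
print(M)   # [[6, -18, 21], [2, -6, 7], [18, -54, 63]]

# Column j of M is w[j] * u, so the columns span the line through u ...
for j in range(3):
    assert [M[i][j] for i in range(3)] == [w[j] * ui for ui in u]

# ... and row i of M is u[i] * w, so the rows span the line through w.
for i in range(3):
    assert M[i] == [u[i] * wj for wj in w]
```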

It's important to hammer on the point that the column/row space of a matrix is the set of all vectors that are linear combinations of the columns/rows and not the columns/rows themselves.

Instead, we have a specific term for the role that the columns play with respect to the column space. We say that they span the column space.

Let $v_1, \dots, v_k$ be a set of vectors in the vector space $\mathbf V$. We say that $v_1, \dots, v_k$ span the vector space $\mathbf V$ if every vector in $\mathbf V$ can be written as a linear combination of $v_1, \dots, v_k$.

From the example above, the column space of the rank 1 matrix is spanned by the vector $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$. But we can also say that it is spanned by $\begin{bmatrix} 6 \\ 2 \\ 18 \end{bmatrix}$ or even that it is spanned by $\begin{bmatrix}6 \\ 2 \\ 18 \end{bmatrix}$ and $\begin{bmatrix} -18 \\ -6 \\ -54 \end{bmatrix}$. In other words, the set of vectors that span a vector space is not unique, nor does such a set need to be independent!

Of course, it may make sense to want something like a spanning set that is linearly independent. This idea of a spanning set leads back to some questions we introduced earlier in the course. For example, suppose we have a spanning set for our vector space: do we really need every vector in that set in order to get every vector in our space? (Once again, we will set this question aside until a bit later.)

The null space

The reason we want to start thinking about sets of vectors goes back to our discussion of solutions for linear systems. We made the observation that $A \mathbf x = \mathbf b$ has at most one solution when the columns of $A$ are independent. If we also want a solution to exist for every $\mathbf b$, that is a significant restriction: in that case $A$ must be square (and invertible).

But in fact, most of the matrices we're concerned with aren't square. Such linear systems still describe something useful, though. Our first step is to recall why dependent columns in $A$ mean that $A \mathbf x = \mathbf b$ can have multiple possible vectors $\mathbf x$.

The reason for this is that once we have linearly dependent columns, we have multiple ways to get $\mathbf 0$ as a combination of the columns. Concretely, if $A \mathbf x = \mathbf b$ and $A \mathbf x_0 = \mathbf 0$ for some $\mathbf x_0 \neq \mathbf 0$, then $A(\mathbf x + \mathbf x_0) = A\mathbf x + A\mathbf x_0 = \mathbf b$, so $\mathbf x + \mathbf x_0$ is another solution.

It becomes clear then that the first step is to ask which vectors $\mathbf x$ satisfy $A \mathbf x = \mathbf 0$. It's clear that $\mathbf 0$ is one such vector. If $A$ has independent columns, it is the only solution. Otherwise, there are more vectors. These vectors are interesting—if we view this from the perspective of data or information, this suggests that these vectors represent some sort of "redundancy" or "equivalence". That is, there are distinguishing features that we've decided are okay to "lose" (i.e. become zero).

The null space of an $m \times n$ matrix $A$ is the set of vectors $\mathbf x$ such that $A \mathbf x = \mathbf 0$.

Our first observation is that this is a set of vectors from $\mathbb R^n$, since a vector needs to have $n$ components to be multiplied by an $m \times n$ matrix. Our next result is important.

Let $A$ be an $m \times n$ matrix. Then the null space of $A$, $\mathbf N(A)$, is a subspace of $\mathbb R^n$.

To check that $\mathbf N(A)$ is a vector space, we just need to confirm that $\mathbf 0$ is in $\mathbf N(A)$, which it clearly is, since $A \mathbf 0 = \mathbf 0$. The other vector space axioms come for free because we know vector addition and scalar multiplication over $\mathbb R^n$ works as expected.

All that remains is to check that linear combinations of vectors stay inside $\mathbf N(A)$. Take any $\mathbf x$ and $\mathbf y$ in $\mathbf N(A)$ and any scalars $c$ and $d$. Then \[A(c\mathbf x + d\mathbf y) = cA\mathbf x + dA\mathbf y = c\mathbf 0 + d\mathbf 0 = \mathbf 0.\]

Therefore, the linear combinations of vectors in $\mathbf N(A)$ are still in $\mathbf N(A)$, so $\mathbf N(A)$ is a subspace.
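This closure property can be spot-checked numerically. A small Python sketch; the matrix and null vectors here are illustrative examples of mine, not from the lecture:

```python
# If Ax = 0 and Ay = 0, then A(cx + dy) = cAx + dAy = 0.
A = [[1, 2, 3],
     [2, 4, 6]]   # dependent columns, so N(A) has nonzero vectors
x = [-2, 1, 0]    # satisfies Ax = 0
y = [-3, 0, 1]    # satisfies Ay = 0

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

assert mat_vec(A, x) == [0, 0]
assert mat_vec(A, y) == [0, 0]

c, d = 5, -7
combo = [c * xi + d * yi for xi, yi in zip(x, y)]
assert mat_vec(A, combo) == [0, 0]   # the combination is still in N(A)
```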

Let's stop to consider what we have here. It seems obvious that matrices have a column space: matrices are all about describing vectors as linear combinations of their columns, as we have learned. And given how much we talked about rows, it seems obvious that there's such a thing as a row space; again, how the rows of matrices are combined seems very important. But here we have stumbled upon another vector space associated with a matrix. This suggests that the vectors that get mapped to $\mathbf 0$ by our matrix are special in some way and hold some special structure. Let's hold on to that thought.

So how do we find all the vectors that satisfy $A \mathbf x = \mathbf 0$? When we had a square matrix, elimination allowed us to transform $A$ into an upper triangular matrix $U$. With a rectangular matrix, we can still use elimination (with a few new tricks) to get something useful. We call this matrix $R$ and consider the solutions $R \mathbf x = \mathbf 0$.

First, what does $R$ look like? How do we use it to describe the null space?

An example of the kind of matrix we're looking for is $R = \begin{bmatrix} 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 4 \end{bmatrix}$. If we want to solve $R \mathbf x = \mathbf 0$, we recall that we have equations \begin{align*} x_1 + 2x_3 + 4x_4 &= 0 \\ x_2 + 3x_3 + 4x_4 &= 0 \end{align*} Here, we seem to run into an issue: we have too many variables and not enough equations! But this is only a problem if we want exactly one solution, which we know can't be the case here.

What we will do instead is try to find and describe the vectors that satisfy our equation. We know that there is one such vector, $\mathbf 0$, but that is not so helpful. How many vectors do we need? Looking at this system, we observe that there are two variables that appear in both of our equations and can be chosen freely: $x_3$ and $x_4$. This suggests that we want two vectors.

To do this, we set $x_3 = 1$ and $x_4 = 0$ to get one solution. In this case, we get the equations \begin{align*} x_1 + 2 &= 0 \\ x_2 + 3 &= 0 \end{align*} which gives us $x_1 = -2$ and $x_2 = -3$. Then if we set $x_3 = 0$ and $x_4 = 1$, we have \begin{align*} x_1 + 4 &= 0 \\ x_2 + 4 &= 0 \end{align*} which gives us $x_1 = -4$ and $x_2 = -4$.

This gives us two vectors: the first gets us $\mathbf s_1 = \begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \end{bmatrix}$ and the second gets us $\mathbf s_2 = \begin{bmatrix} -4 \\ -4 \\ 0 \\ 1 \end{bmatrix}$. These solutions that we get are called special solutions. The solutions are "special" because they're not just zeroes.

These vectors are an important starting point for describing the vectors that satisfy $R \mathbf x = \mathbf 0$—we know that we can also combine them to get other vectors that satisfy the equation. This means that we can describe all vectors satisfying $R \mathbf x = \mathbf 0$ by \[\mathbf x = c_1 \mathbf s_1 + c_2 \mathbf s_2\] for any scalars $c_1$ and $c_2$. (This also tells us that all such vectors form a plane.)
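As a quick check, a Python sketch of mine (using the coefficient matrix of the two equations above) confirming that the special solutions, and any combination of them, satisfy $R \mathbf x = \mathbf 0$:

```python
# Coefficient matrix of x1 + 2x3 + 4x4 = 0 and x2 + 3x3 + 4x4 = 0.
R = [[1, 0, 2, 4],
     [0, 1, 3, 4]]
s1 = [-2, -3, 1, 0]   # special solution with x3 = 1, x4 = 0
s2 = [-4, -4, 0, 1]   # special solution with x3 = 0, x4 = 1

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

assert mat_vec(R, s1) == [0, 0]
assert mat_vec(R, s2) == [0, 0]

# Any combination c1*s1 + c2*s2 is also a solution.
c1, c2 = 3, -2
x = [c1 * a + c2 * b for a, b in zip(s1, s2)]
assert mat_vec(R, x) == [0, 0]
```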

But why can we do this? It's because we know we have a subspace. Since we're in a subspace, we know that we can combine our special solutions and we're guaranteed to get other solutions that satisfy $R \mathbf x = \mathbf 0$. We'll also see soon that our special solutions span the null space, meaning that we know that we have every solution just with these vectors.