CAPP 30271 — Lecture 8

What are some examples of vector spaces? The obvious ones are the ones we've been working in: $\mathbb R^n$. However, notice that while we think of $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -1 \\ 9 \\ 3\end{bmatrix}$ as vectors, they do not belong to the same vector space! This is because we can't add them together.

This is one reason why we need the concept of a vector space—it grounds our discussion to vectors of a certain size. This would be overkill if that's all we wanted to do though. Consider what happens in the following.

Let $A = \begin{bmatrix} 2 & 4 & 3 & 4 \\ 3 & 1 & 9 & 3 \end{bmatrix}$ and $\mathbf v = \begin{bmatrix} 2 \\ 0 \\ \frac 1 3 \\ -1 \end{bmatrix}$. Then \[A \mathbf v = 2 \begin{bmatrix} 2 \\ 3 \end{bmatrix} + 0 \begin{bmatrix} 4 \\ 1 \end{bmatrix} + \frac 1 3 \begin{bmatrix} 3 \\ 9 \end{bmatrix} - 1 \begin{bmatrix} 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}.\]


    >>> A = np.array([[2, 4, 3, 4], [3, 1, 9, 3]])
    >>> v = np.array([2, 0, 1/3, -1])
    >>> A @ v
    array([1., 6.])
    
Let's examine what happened here. The vector $\mathbf v$ is a vector in $\mathbb R^4$. But when multiplied by $A$, the result is a vector in $\mathbb R^2$.

We've recently been working with matrices as transformations, both of other matrices and of vectors. But since we've largely been working with square matrices, we've really only seen examples where a vector gets transformed into a vector in the same vector space. However, what we just saw is a matrix that transforms a vector in some vector space into a vector in a different vector space. In other words, we can think of an $m \times n$ matrix $A$ as a function $A: \mathbb R^n \to \mathbb R^m$.
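We can see this change of vector space directly in the shapes NumPy reports (a quick check reusing the matrix and vector from above):

```python
import numpy as np

# A is 2x4, so as a function it takes vectors in R^4 to vectors in R^2.
A = np.array([[2, 4, 3, 4], [3, 1, 9, 3]])
v = np.array([2, 0, 1/3, -1])

w = A @ v
print(A.shape)  # (2, 4)
print(v.shape)  # (4,) -- the input lives in R^4
print(w.shape)  # (2,) -- the output lives in R^2
```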

It is this aspect of vector spaces that will be necessary when thinking about data. Since data matrices are often rectangular, they can be thought of as transformations from one vector space to another. The structural properties that we get from this viewpoint become important.

As mentioned earlier, we're mostly interested in vectors over $\mathbb R$, so most of the vector space rules are no-brainers. But it helps to see some examples of "weirder" vector spaces to drive home the point that there is a wider world of structures out there and that the rules that we have are not necessarily "obvious".

For any $n$, the vector space that consists only of the zero vector $\mathbf 0 = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$ is a vector space. The rules on our operations work the same way and any combinations of $\mathbf 0$ produce $\mathbf 0$ itself.

We can view polynomials of degree at most $n$ with real coefficients as a vector space. (The "at most" matters: adding two degree-$n$ polynomials can cancel the leading terms and leave a polynomial of smaller degree.) Think about how polynomial addition works: we gather the terms for each $x^k$ and add the coefficients together. In this way, we can view the polynomial $-3x^2 + x - 5$ as the vector $\begin{bmatrix} -3 \\ 1 \\ -5 \end{bmatrix}$ in $\mathbb R^3$. And scaling a polynomial works the way you'd expect.
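As a small sanity check of this correspondence (using NumPy's convention of listing coefficients from highest degree down), adding coefficient vectors agrees with adding the polynomials as functions:

```python
import numpy as np

p = np.array([-3, 1, -5])   # -3x^2 + x - 5
q = np.array([ 1, 0,  2])   #   x^2     + 2

# "Vector addition" of polynomials is coefficient-wise addition.
r = p + q                   # -2x^2 + x - 3

# Evaluating the sum vector matches summing the evaluations.
x = 1.5
print(np.polyval(r, x))
print(np.polyval(p, x) + np.polyval(q, x))
```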

What isn't a vector space? Here's a simple example: suppose we consider only vectors with non-negative real number scalars. This can be a reasonable restriction: for instance, if we know our data has only nonnegative measurements. But this isn't a vector space—if I have a vector $\mathbf v$, there's no corresponding vector $-\mathbf v$ that I can combine with it to get $\mathbf 0$.

For any $m$ and $n$, the set of $m \times n$ matrices is a vector space. One can convince oneself of this by noting that addition and scalar multiplication on matrices work entry-wise, basically the same way as with vectors.

Here's an example of how the choice of operation makes all the difference. Suppose we want to use boolean vectors. So our scalars are $\mathtt T$ and $\mathtt F$. A reasonable choice for scalar multiplication is $\wedge$, conjunction or "AND". And we might think that component-wise vector addition via $\vee$, disjunction or "OR", is also a reasonable choice. There's a nice symmetry to it. But it doesn't work: since $\mathtt T \vee y = \mathtt T$ for every $y$, a vector with a $\mathtt T$ component has no additive inverse. Therefore, we don't have a vector space. However, there is a way to get a boolean vector space: instead of using "OR" for vector addition, use "XOR", which is exclusive or, instead! XOR has the following truth table:

\[\begin{array}{cc|c} x & y & x \mathop{\mathsf{xor}} y \\ \hline \mathtt T & \mathtt T & \mathtt F \\ \mathtt T & \mathtt F & \mathtt T \\ \mathtt F & \mathtt T & \mathtt T \\ \mathtt F & \mathtt F & \mathtt F \end{array}\]
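A quick check of these operations, using NumPy boolean arrays (an illustrative choice; `&` plays the role of scalar multiplication and `^` plays the role of vector addition):

```python
import numpy as np

u = np.array([True, False, True])
v = np.array([True, True, False])

print(u ^ v)      # component-wise XOR: [False  True  True]
print(False & u)  # scaling by F gives the zero vector: all False
print(u ^ u)      # every vector is its own additive inverse: all False
```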

Here's a more complicated example. The definition of vector spaces imposes rules on the operations we're allowed to use. This is because that's what algebra is all about: the structures you get from choosing different kinds of operations. One such structure is the tropical semiring. Here, our "items" consist of the real numbers together with infinity ($\mathbb R \cup \{\infty\}$), the $\min$ function as "addition" and addition as "multiplication". In other words, consider $x, y \in \mathbb R \cup \{\infty\}$ and define \begin{align*} x \oplus y &= \min\{x,y\}, \\ x \otimes y &= x + y. \end{align*} It looks weird, but you can get a working algebraic structure out of this. This structure is implicitly used when computing things like shortest paths in graphs. Since our operations have changed, we have to think carefully about what the identities for addition and multiplication are.

This almost works, but it's missing something: an additive inverse. But such structures are common in computer science. This is also something that these rules are used for: seeing which ones we really need for our applications and which ones we might be able to drop when convenient.
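To make the identities concrete, here's a small sketch in Python (using `math.inf` for $\infty$; the matrix `D` is a made-up example of edge lengths). One min-plus matrix "product" of a distance matrix with itself computes shortest paths that use at most two edges, which is the connection to shortest-path algorithms:

```python
import math

INF = math.inf

def t_add(x, y):
    # Tropical "addition" is min; its identity is infinity, since min(x, inf) = x.
    return min(x, y)

def t_mul(x, y):
    # Tropical "multiplication" is ordinary addition; its identity is 0.
    return x + y

print(t_add(3, INF))  # 3 -- infinity acts like "zero"
print(t_mul(3, 0))    # 3 -- 0 acts like "one"

# Min-plus matrix "product": entry (i, j) is the min over k of D[i][k] + D[k][j].
# With D a matrix of edge lengths (INF = no edge), this gives shortest paths
# that use at most two edges.
D = [[0, 4, INF],
     [INF, 0, 1],
     [2, INF, 0]]
n = len(D)
D2 = [[min(t_mul(D[i][k], D[k][j]) for k in range(n)) for j in range(n)]
      for i in range(n)]
print(D2)
```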

Subspaces

Since we're remaining firmly planted in $\mathbb R^n$, one of the things we gain from talking about vector spaces is the fact that vector spaces can contain vector spaces. Such spaces are called subspaces.

A subspace of a vector space is a subset that is itself a vector space: for any vectors $\mathbf u$ and $\mathbf v$ in the subspace and any scalar $c$,

  1. $\mathbf u + \mathbf v$ is also in the subspace, and
  2. $c\mathbf v$ is also in the subspace.

In other words, subspaces are vector spaces inside of another vector space where any linear combination of the vectors stays inside the subspace.

We said earlier that matrices define some sort of structure. It is the subspaces defined by each matrix that are the structures that we're interested in. The most obvious structure is one we've already been working with: the column space of a matrix.

We saw earlier that the vector space consisting of only the zero vector is a vector space. Since the zero vector is in every vector space, this makes the set containing only the zero vector a subspace of every vector space. We can formally test our conditions:

  1. Take any two vectors from the subspace and add them together: $\mathbf 0 + \mathbf 0 = \mathbf 0$.
  2. Take any vector from the subspace and a scalar: $c\mathbf 0 = \mathbf 0$.

The set of all scalar multiples of the vector $\mathbf v = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$ is a subspace: every combination of vectors in this set is again a scalar multiple of $\mathbf v$ and, crucially, $\mathbf 0$ is included in this space, because $0\mathbf v = \mathbf 0$. We can test our two conditions:

  1. Take two vectors from the subspace and add them together: $c\mathbf v + d\mathbf v = (c+d)\mathbf v$.
  2. Take any vector from the subspace and a scalar: $c_2(c_1\mathbf v) = (c_2c_1)\mathbf v$.

Notice that this subspace defines a line in 3-space that goes through the origin (the zero vector).
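A quick numerical restatement of the closure checks for this line (the scalars $c$ and $d$ here are arbitrary choices):

```python
import numpy as np

v = np.array([3, 2, 0])

# Two vectors in the subspace: scalar multiples of v.
c, d = 2.5, -4.0
u1, u2 = c * v, d * v

# Their sum is again a multiple of v, namely (c + d) v.
print(u1 + u2)
print((c + d) * v)
```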

The set of all vectors $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ with $x + y + z = 4$ is not a subspace: it does not contain the zero vector, since $0 + 0 + 0 = 0 \neq 4$.

One observation that we get from this is that subspaces have to be "around" $\mathbf 0$ and that these spaces essentially "fill up" a space. This space can be a line, a plane, or some other $n$-dimensional space. The idea is that this space has smaller "dimension" than the original space, so we can think of it as having some "orientation", like a line or plane.

The column space of a matrix

As noted earlier, we can treat an $m \times n$ matrix as a function that takes a vector in $\mathbb R^n$ and transforms it into a vector in $\mathbb R^m$. There's a natural followup question to this: where in $\mathbb R^m$ does it end up and what can we say about this place?

In fact, we've already given a name for this: it's the column space of $A$.

The column space of an $m \times n$ matrix $A$ is the set of all linear combinations of columns of $A$. The column space of $A$ is a subspace of $\mathbb R^m$ and is denoted $\mathbf C(A)$.

In other words, the space that we end up in is not only in $\mathbb R^m$, but it's specifically the slice of $\mathbb R^m$ that contains all the linear combinations of the columns of $A$. First, we should verify that this is actually a subspace.

If $A$ is an $m \times n$ matrix, then the column space of $A$ is a subspace of $\mathbb R^m$.

Let $A$ be an arbitrary $m \times n$ matrix. We will verify that the column space of $A$ is a subspace by checking each subspace condition. Suppose $\mathbf u$ and $\mathbf v$ are in the column space of $A$, say $\mathbf u = A\mathbf x$ and $\mathbf v = A\mathbf y$ for some vectors $\mathbf x, \mathbf y \in \mathbb R^n$, and let $c$ be a scalar.

  1. $\mathbf u + \mathbf v = A\mathbf x + A\mathbf y = A(\mathbf x + \mathbf y)$, which is again a linear combination of the columns of $A$.
  2. $c\mathbf u = c(A\mathbf x) = A(c\mathbf x)$, which is again a linear combination of the columns of $A$.

Therefore, the column space of $A$ is a subspace. Since the columns of $A$ are vectors in $\mathbb R^m$, the column space of $A$ is a subspace of the vector space $\mathbb R^m$.

Now, we have a formal way of describing when the equation $A\mathbf x = \mathbf b$ has solutions. Informally, $\mathbf b$ must be a combination of columns of $A$ in order for $\mathbf x$ to exist. But now, we can phrase this in terms of the column space of $A$.

The equation $A \mathbf x = \mathbf b$ has a solution if and only if $\mathbf b$ is in the column space of $A$.

Intuitively, this makes sense: $A \mathbf x = \mathbf b$ asks for some $\mathbf x$ which describes how to combine columns of $A$ to construct $\mathbf b$. If $\mathbf b$ is not in the column space of $A$, this means that there are no linear combinations of columns of $A$ that result in $\mathbf b$. Therefore, $\mathbf x$ can't exist.

Let $A = \begin{bmatrix} 2 & 4 & 3 \\ 3 & 1 & 5 \\ 0 & 0 & 0 \end{bmatrix}$ and $\mathbf b = \begin{bmatrix} 7 \\ 6 \\ 5 \end{bmatrix}$. Then $\mathbf b$ is not in the column space of $A$: the third row of $A$ is all zeros, so no matter what $\mathbf x$ you choose, the third entry of $A \mathbf x$ is $0$, never $5$. So $A \mathbf x = \mathbf b$ is not solvable. Recall that this also tells us a few things about $A$: since we have a linear equation with no solution, $A$ is not invertible. Since $A$ is not invertible, it must have linearly dependent columns.
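One way to check membership in the column space numerically (an illustrative approach using `np.linalg.matrix_rank`, not a method from the text): $\mathbf b$ is in $\mathbf C(A)$ exactly when appending $\mathbf b$ as an extra column does not increase the rank.

```python
import numpy as np

A = np.array([[2, 4, 3], [3, 1, 5], [0, 0, 0]])
b = np.array([7, 6, 5])

# b is in C(A) iff rank([A | b]) == rank(A).
Ab = np.column_stack([A, b])
print(np.linalg.matrix_rank(A))   # 2
print(np.linalg.matrix_rank(Ab))  # 3 -- the rank jumped, so b is not in C(A)
```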

But what about the rows of $A$? Sometimes we choose to treat them as vectors. Perhaps they form a subspace? Or is there something about the row vectors that breaks our rules about subspaces?

We can see that the rows of $A$ do form a subspace by playing a very simple trick. Recall that the transpose $A^T$ of $A$ is the matrix obtained by moving each entry $a_{ij}$ from the $i$th row and $j$th column to the $j$th row and $i$th column. In NumPy, we can obtain the transpose of a 2D array A by using A.T.


    >>> A = np.array([[2, 4, 3, 4], [3, 1, 9, 3]])
    >>> A.T
    array([[2, 3],
           [4, 1],
           [3, 9],
           [4, 3]])
    

Notice that the rows of $A$ are the columns of $A^T$. Since $A^T$ is a matrix, it has a column space, and it is this space that is the row space of $A$.

The row space of an $m \times n$ matrix $A$ is the column space of the transpose of $A$, $\mathbf C(A^T)$, and is a subspace of $\mathbb R^n$.

Consider the rank 1 matrix $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix} \begin{bmatrix} 2 & -6 & 7 \end{bmatrix} = \begin{bmatrix} 6 & -18 & 21 \\ 2 & -6 & 7 \\ 18 & -54 & 63 \end{bmatrix}$. The column space of this matrix consists of all scalar multiples of the vector $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$ and the row space of this matrix consists of all scalar multiples of the vector $\begin{bmatrix} 2 \\ -6 \\ 7 \end{bmatrix}$.
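We can reproduce this rank 1 matrix with `np.outer` and confirm both the rank and the fact that every column is a multiple of the first factor:

```python
import numpy as np

u = np.array([3, 1, 9])
w = np.array([2, -6, 7])

M = np.outer(u, w)   # the rank 1 matrix u w^T
print(M)
print(np.linalg.matrix_rank(M))  # 1

# Column j of M is w[j] * u, so every column is a multiple of u.
print(M[:, 1])  # -6 * u
```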

It's important to hammer on the point that the column/row space of a matrix is the set of all vectors that are linear combinations of the columns/rows and not the columns/rows themselves.

Instead, we have a specific term for the role that the columns play with respect to the column space. We say that they span the column space.

Let $v_1, \dots, v_k$ be a set of vectors in the vector space $\mathbf V$. We say that $v_1, \dots, v_k$ span the vector space $\mathbf V$ if every vector in $\mathbf V$ can be written as a linear combination of $v_1, \dots, v_k$.

From the example above, the column space of the rank 1 matrix is spanned by the vector $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$. But we can also say that it is spanned by $\begin{bmatrix} 6 \\ 2 \\ 18 \end{bmatrix}$ or even that it is spanned by $\begin{bmatrix}6 \\ 2 \\ 18 \end{bmatrix}$ and $\begin{bmatrix} -18 \\ -6 \\ -54 \end{bmatrix}$. In other words, the set of vectors that span a vector space is not unique, nor does such a set need to be independent!
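We can check that these candidate spanning sets all generate the same line by stacking the vectors as columns: no matter which of them we include, the rank stays 1.

```python
import numpy as np

# All three candidate spanning vectors for the line, stacked as columns.
S = np.column_stack([[3, 1, 9], [6, 2, 18], [-18, -6, -54]])
print(np.linalg.matrix_rank(S))  # 1 -- together they still span a 1-dimensional space
```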

Of course, it may make sense to want something like a spanning set that is linearly independent. This idea of a spanning set leads back to some questions we introduced earlier in the course. For example, suppose we have a spanning set for our vector space—do we really need every vector in that set in order to get every vector in our space? (Once again, we will set this question aside until a bit later.)