CAPP 30271 — Lecture 8

Vector spaces

Our discussion of linear systems is helpful for establishing some key ideas, but there's a problem: it relies on the idea that what we want is a single solution to a system of linear equations and that we will always have systems with $n$ equations and $n$ unknowns. In other words, our discussion thus far assumes that we are working with square matrices.

However, we will rarely work with such matrices. Matrices that represent data are often long and skinny: many, many data points and relatively few attributes or properties that are measured. So we need a way to extend the ideas we explored above to matrices of varying dimensions. Furthermore, we need a way to discuss the various sets of vectors we get and their properties.

All of this ties back to one of the initial questions of the course: what is a vector? Answering it is necessary for the idea of sets of vectors having special characteristics. We'll consider such sets as vector spaces. What follows is a very formal definition of a vector space.

A vector space $V$ is a set of vectors equipped with two operations, vector addition and scalar multiplication, such that the following properties hold:

  1. $\mathbf u + \mathbf v = \mathbf v + \mathbf u$ for all vectors $\mathbf u$ and $\mathbf v$ in $V$,
  2. $\mathbf u + (\mathbf v + \mathbf w) = (\mathbf u + \mathbf v) + \mathbf w$ for all vectors $\mathbf u, \mathbf v, \mathbf w$ in $V$,
  3. there is a unique zero vector $\mathbf 0$ such that $\mathbf v + \mathbf 0 = \mathbf v$ for all vectors $\mathbf v$ in $V$,
  4. for each vector $\mathbf v$ in $V$, there is a unique vector $-\mathbf v$ such that $\mathbf v + (- \mathbf v) = \mathbf 0$,
  5. $1 \mathbf v = \mathbf v$ for all vectors $\mathbf v$ in $V$,
  6. $(c_1 c_2) \mathbf v = c_1 (c_2 \mathbf v)$ for all vectors $\mathbf v$ in $V$ and all scalars $c_1$ and $c_2$,
  7. $c (\mathbf u + \mathbf v) = c\mathbf u + c\mathbf v$ for all vectors $\mathbf u, \mathbf v$ in $V$ and all scalars $c$,
  8. $(c_1+c_2) \mathbf v = c_1\mathbf v + c_2 \mathbf v$ for all vectors $\mathbf v$ in $V$ and all scalars $c_1$ and $c_2$.

That is all just a very formal way to say that vector spaces are sets of things that behave like vectors. Specifically, vector addition and scalar multiplication on these things work how you'd expect vector addition and scalar multiplication to work.

| Formal definition | Practical consequence |
| --- | --- |
| $\mathbf u + \mathbf v = \mathbf v + \mathbf u$ for all vectors $\mathbf u$ and $\mathbf v$ in $V$ | It doesn't matter in which order you add two vectors together |
| $\mathbf u + (\mathbf v + \mathbf w) = (\mathbf u + \mathbf v) + \mathbf w$ for all vectors $\mathbf u, \mathbf v, \mathbf w$ in $V$ | If you're adding three vectors, it doesn't matter which two you add together first |
| there is a unique zero vector $\mathbf 0$ such that $\mathbf v + \mathbf 0 = \mathbf v$ for all vectors $\mathbf v$ in $V$ | A zero vector exists, and adding it to a vector gives you the same vector |
| for each vector $\mathbf v$ in $V$, there is a unique vector $-\mathbf v$ such that $\mathbf v + (- \mathbf v) = \mathbf 0$ | Every vector has an inverse vector that you can add to it to get the zero vector |
| $1 \mathbf v = \mathbf v$ for all vectors $\mathbf v$ in $V$ | The scalar 1 exists, and scalar multiplication by 1 gives you the same vector |
| $(c_1 c_2) \mathbf v = c_1 (c_2 \mathbf v)$ for all vectors $\mathbf v$ in $V$ and all scalars $c_1$ and $c_2$ | It doesn't matter whether you combine two scalars together before scaling a vector or apply them one at a time |
| $c (\mathbf u + \mathbf v) = c\mathbf u + c\mathbf v$ for all vectors $\mathbf u, \mathbf v$ in $V$ and all scalars $c$ | It doesn't matter whether you add two vectors together before scaling them or scale each one first and then add |
| $(c_1+c_2) \mathbf v = c_1\mathbf v + c_2 \mathbf v$ for all vectors $\mathbf v$ in $V$ and all scalars $c_1$ and $c_2$ | It doesn't matter whether you add two scalars together before scaling a vector or scale by each one and then add |
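All of these axioms are easy to sanity-check numerically for vectors in $\mathbb R^3$. Here's a minimal NumPy sketch (the example vectors and scalars are arbitrary choices; floating point means we compare with `np.allclose` rather than exact equality):

```python
import numpy as np

# Spot-check the eight vector space axioms for vectors in R^3.
rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 3))   # three arbitrary vectors
c1, c2 = 2.5, -1.5                      # two arbitrary scalars

assert np.allclose(u + v, v + u)                    # 1. commutativity
assert np.allclose(u + (v + w), (u + v) + w)        # 2. associativity
assert np.allclose(v + np.zeros(3), v)              # 3. zero vector
assert np.allclose(v + (-v), np.zeros(3))           # 4. additive inverse
assert np.allclose(1 * v, v)                        # 5. scalar identity
assert np.allclose((c1 * c2) * v, c1 * (c2 * v))    # 6. scalar associativity
assert np.allclose(c1 * (u + v), c1 * u + c1 * v)   # 7. distributes over vectors
assert np.allclose((c1 + c2) * v, c1 * v + c2 * v)  # 8. distributes over scalars
```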

As mentioned earlier, we're mostly interested in vectors over $\mathbb R$, so most of these rules are no-brainers. But it helps to see some examples of "weirder" vector spaces to drive home the point that there is a wider world of structures out there and that the rules that we have are not necessarily "obvious".

For any $n$, the set that consists only of the zero vector $\mathbf 0 = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$ is a vector space. The rules on our operations work the same way, and any combination of $\mathbf 0$ produces $\mathbf 0$ itself.

For any $n$, the set of $n \times n$ matrices is a vector space. One can convince oneself of this by thinking about how addition and scalar multiplication on matrices work basically the same way as with vectors.

We can view polynomials of degree at most $n$ with real coefficients as a vector space. (The "at most" matters: the sum of two polynomials of degree exactly $n$ can have a smaller degree, so we need the lower-degree polynomials in the set too.) Think about how polynomial addition works: we gather the terms for each $x^k$ and add the coefficients together. In this way, we can view the polynomial $-3x^2 + x - 5$ as the vector $\begin{bmatrix} -3 \\ 1 \\ -5 \end{bmatrix}$ in $\mathbb R^3$. And scaling a polynomial works in the way that you'd expect.
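A short NumPy sketch of this correspondence (the coefficient ordering and the second polynomial are just illustrative choices):

```python
import numpy as np

# Polynomials of degree at most 2, stored as coefficient vectors [a2, a1, a0].
p = np.array([-3.0, 1.0, -5.0])   # -3x^2 + x - 5
q = np.array([ 1.0, 0.0,  2.0])   #   x^2     + 2

# Polynomial addition is coefficient-vector addition; scaling likewise.
p_plus_q = p + q                  # -2x^2 + x - 3
two_p = 2 * p                     # -6x^2 + 2x - 10

assert np.array_equal(p_plus_q, np.array([-2.0, 1.0, -3.0]))
assert np.array_equal(two_p, np.array([-6.0, 2.0, -10.0]))
```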

But what isn't a vector space?

Here's a simple example: suppose we consider only vectors whose entries are non-negative real numbers. This can be a reasonable restriction: for instance, if we know our data has only non-negative measurements. But this isn't a vector space: if I have a nonzero vector $\mathbf v$, the corresponding vector $-\mathbf v$ that I'd combine with it to get $\mathbf 0$ has negative entries, so it isn't in the set.

Here's an example of how the choice of operation makes all the difference. Suppose we want to use boolean vectors, so our scalars are $\mathtt T$ and $\mathtt F$. A reasonable choice for scalar multiplication is $\wedge$, conjunction or "AND". And we might think that component-wise vector addition via $\vee$, disjunction or "OR", is also a reasonable choice; there's a nice symmetry to it. But it doesn't work: if $x = \mathtt T$, there is no $y$ such that $x \vee y = \mathtt F$, so $\mathtt T$ has no additive inverse. Therefore, we don't have a vector space. However, there is a way to get a boolean vector space: instead of using "OR" for vector addition, use "XOR", which is exclusive or, instead!
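A quick Python sketch of why XOR fixes the problem (the helper name `bvec_add` is made up for illustration): with XOR, every boolean vector is its own additive inverse, and the all-$\mathtt F$ vector acts as the zero vector.

```python
# Boolean vectors with component-wise XOR as "addition".
def bvec_add(x, y):
    # a != b is exactly XOR for booleans
    return tuple(a != b for a, b in zip(x, y))

u = (True, False, True)
v = (False, False, True)
zero = (False, False, False)

assert bvec_add(u, zero) == u    # the zero vector behaves as expected
assert bvec_add(u, u) == zero    # u is its own additive inverse!
assert bvec_add(u, v) == (True, False, False)
```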

Here's a more complicated example. The definition of vector spaces imposes rules on the operations we're allowed to use. This is because that's what algebra is all about: the structures you get from choosing different kinds of operations. One such structure is the tropical semiring. Here, our "items" consist of the real numbers together with infinity ($\mathbb R \cup \{\infty\}$), with the $\min$ function as "addition" and ordinary addition as "multiplication". In other words, consider $x, y \in \mathbb R \cup \{\infty\}$ and define \begin{align*} x \oplus y &= \min\{x,y\}, \\ x \otimes y &= x + y. \end{align*} It looks weird, but you can get a working algebraic structure out of this. This structure is implicitly used when computing things like shortest paths in graphs. If you check the rules, it even satisfies most of them, but there's a problem. Subtraction doesn't quite work: there's no element $-x$ for each element $x$ that we can "add" (via $\min$) to recover the additive identity (in this case, $\infty$).
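A small Python sketch of min-plus arithmetic in action (the function `tropical_matmul` is a hypothetical helper written for this example, not a library routine). The min-plus "matrix product" is what quietly underlies shortest-path computations: squaring an edge-weight matrix in this sense gives shortest paths using at most two edges.

```python
import numpy as np

INF = np.inf  # the tropical additive identity

def tropical_matmul(A, B):
    # (A "times" B)[i, j] = min_k (A[i, k] + B[k, j]):
    # "addition" is min, "multiplication" is ordinary +.
    n = A.shape[0]
    return np.array([[min(A[i, k] + B[k, j] for k in range(n))
                      for j in range(n)] for i in range(n)])

# Edge-weight matrix of a tiny directed graph (INF = no edge, 0 on the diagonal).
D = np.array([[0.0, 5.0, INF],
              [INF, 0.0, 2.0],
              [1.0, INF, 0.0]])

D2 = tropical_matmul(D, D)
assert D2[0, 2] == 7.0   # shortest path 0 -> 1 -> 2 costs 5 + 2
```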

Subspaces

Since we're remaining firmly planted in $\mathbb R^n$, one of the things we gain from talking about vector spaces is the fact that vector spaces can contain vector spaces. Such spaces are called subspaces.

A subspace of a vector space is a subset that is itself a vector space. To verify that a nonempty subset is a subspace, it's enough to check that for any vectors $\mathbf u$ and $\mathbf v$ in the subspace and any scalar $c$,

  1. $\mathbf u + \mathbf v$ is also in the subspace, and
  2. $c\mathbf v$ is also in the subspace.

In other words, subspaces are vector spaces inside of another vector space where any linear combination of the vectors stays inside the subspace.

We saw earlier that the set containing only the zero vector is a vector space. Since the zero vector is in every vector space by property (3), this makes the set containing only the zero vector a subspace of every vector space. We can formally test our conditions:

  1. Take any two vectors from the subspace and add them together: $\mathbf 0 + \mathbf 0 = \mathbf 0$.
  2. Take any vector from the subspace and a scalar: $c\mathbf 0 = \mathbf 0$.

The set of all scalar multiples of the vector $\mathbf v = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$ is a subspace: every vector in this set is a scalar multiple of $\mathbf v$ and, crucially, $\mathbf 0$ is included in this space, because $0\mathbf v = \mathbf 0$. We can test our two conditions:

  1. Take two vectors from the subspace and add them together: $c\mathbf v + d\mathbf v = (c+d)\mathbf v$, which is again a multiple of $\mathbf v$.
  2. Take any vector from the subspace and a scalar: $c_2(c_1\mathbf v) = (c_2c_1)\mathbf v$, which is again a multiple of $\mathbf v$.
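These two checks are easy to replicate numerically; a minimal NumPy sketch with arbitrary scalars:

```python
import numpy as np

# The subspace of all scalar multiples of v.
v = np.array([3.0, 2.0, 0.0])
c, d = 4.0, -1.5   # arbitrary scalars

assert np.allclose(c * v + d * v, (c + d) * v)   # sums stay in the subspace
assert np.allclose(c * (d * v), (c * d) * v)     # scalings stay in the subspace
assert np.allclose(0 * v, np.zeros(3))           # the zero vector is included
```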

The set of all vectors $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ with $x + y + z = 4$ is not a subspace: the sum of two such vectors has entries that sum to $8$, not $4$, and the set doesn't even contain $\mathbf 0$, since $0 + 0 + 0 \neq 4$.
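A quick numerical illustration (the particular vectors are arbitrary choices with entries summing to 4):

```python
import numpy as np

u = np.array([1.0, 1.0, 2.0])   # entries sum to 4
v = np.array([4.0, 0.0, 0.0])   # entries sum to 4

assert u.sum() == 4 and v.sum() == 4
assert (u + v).sum() == 8       # not 4: the sum leaves the set
assert np.zeros(3).sum() != 4   # the zero vector isn't in the set either
```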

The column space of a matrix

We said earlier that matrices define some sort of structure. It is the subspaces defined by each matrix that are the structures that we're interested in. The most obvious structure is one we've already been working with: the column space of a matrix.

The column space of an $m \times n$ matrix $A$ is the set of all linear combinations of columns of $A$. The column space of $A$ is a subspace of $\mathbb R^m$ and is denoted $\mathbf C(A)$.

Now, we have a formal way of describing when the equation $A\mathbf x = \mathbf b$ has solutions. Informally, $\mathbf b$ must be a combination of columns of $A$ in order for $\mathbf x$ to exist. But now, we can phrase this in terms of the column space of $A$.

The equation $A \mathbf x = \mathbf b$ has a solution if and only if $\mathbf b$ is in the column space of $A$.

Intuitively, this makes sense: $A \mathbf x = \mathbf b$ asks for some $\mathbf x$ which describes how to combine columns of $A$ to construct $\mathbf b$. If $\mathbf b$ is not in the column space of $A$, this means that there are no linear combinations of columns of $A$ that result in $\mathbf b$. Therefore, $\mathbf x$ can't exist.

Let $A = \begin{bmatrix} 2 & 4 & 3 \\ 3 & 1 & 5 \\ 0 & 0 & 0 \end{bmatrix}$ and $\mathbf b = \begin{bmatrix} 7 \\ 6 \\ 5 \end{bmatrix}$. Then $\mathbf b$ is not in the column space of $A$ and $A \mathbf x = \mathbf b$ is not solvable. Recall that this also tells us a few things about $A$: since we have a linear equation with no solution, it implies that $A$ is not invertible. Since $A$ is not invertible, it must have linearly dependent columns.
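One way to check membership in the column space numerically is to compare ranks: appending $\mathbf b$ as an extra column raises the rank exactly when $\mathbf b$ is outside $\mathbf C(A)$. A NumPy sketch using the matrix above:

```python
import numpy as np

A = np.array([[2.0, 4.0, 3.0],
              [3.0, 1.0, 5.0],
              [0.0, 0.0, 0.0]])
b = np.array([7.0, 6.0, 5.0])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))

assert rank_A == 2    # A is not full rank: dependent columns, not invertible
assert rank_Ab == 3   # the rank jumped, so b is not in the column space
```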

But what about the rows of $A$? Sometimes we choose to treat them as vectors. Perhaps they form a subspace? Or is there something about the row vectors that breaks our rules about subspaces?

We can see that the rows of $A$ do form a subspace by playing a very simple trick: if we consider the transpose $A^T$ of $A$, which itself is a matrix, we notice that the rows of $A$ are the columns of $A^T$. Since $A^T$ is a matrix, it has a column space. It is this column space that is the row space of $A$.

The row space of an $m \times n$ matrix $A$ is the column space of the transpose of $A$, $\mathbf C(A^T)$, and is a subspace of $\mathbb R^n$.

Consider the rank 1 matrix $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix} \begin{bmatrix} 2 & -6 & 7 \end{bmatrix} = \begin{bmatrix} 6 & -18 & 21 \\ 2 & -6 & 7 \\ 18 & -54 & 63 \end{bmatrix}$. The column space of this matrix consists of all scalar multiples of the vector $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$ and the row space of this matrix consists of all scalar multiples of the vector $\begin{bmatrix} 2 \\ -6 \\ 7 \end{bmatrix}$.
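We can rebuild this example with NumPy's outer product and confirm the claims about its column and row spaces:

```python
import numpy as np

# The rank 1 matrix from the example, built as an outer product.
u = np.array([3.0, 1.0, 9.0])
w = np.array([2.0, -6.0, 7.0])
A = np.outer(u, w)

assert np.linalg.matrix_rank(A) == 1
# Every column of A is a scalar multiple of u...
assert np.allclose(A[:, 1], -6 * u)
# ...and every row of A is a scalar multiple of w.
assert np.allclose(A[2, :], 9 * w)
```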

It's important to hammer on the point that the column/row space of a matrix is the set of all vectors that are linear combinations of the columns/rows and not the columns/rows themselves.

Instead, we have a specific term for the role that the columns play with respect to the column space. We say that they span the column space.

Let $v_1, \dots, v_k$ be a set of vectors in the vector space $\mathbf V$. We say that $v_1, \dots, v_k$ span the vector space $\mathbf V$ if every vector in $\mathbf V$ can be written as a linear combination of $v_1, \dots, v_k$.

From the example above, the column space of the rank 1 matrix is spanned by the vector $\begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$. But we can also say that it is spanned by $\begin{bmatrix} 6 \\ 2 \\ 18 \end{bmatrix}$ or even that it is spanned by $\begin{bmatrix}6 \\ 2 \\ 18 \end{bmatrix}$ and $\begin{bmatrix} -18 \\ -6 \\ -54 \end{bmatrix}$. In other words, the set of vectors that spans a vector space is not unique, nor does such a set need to be linearly independent!
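A numerical sketch of this non-uniqueness, reusing the rank 1 example: stacking all the candidate spanning vectors and checking the rank confirms they all live on the same line.

```python
import numpy as np

u = np.array([3.0, 1.0, 9.0])
a = np.array([6.0, 2.0, 18.0])
b = np.array([-18.0, -6.0, -54.0])

# All three vectors together still have rank 1: the same one-dimensional space.
assert np.linalg.matrix_rank(np.column_stack([u, a, b])) == 1
# And {a, b} is a dependent spanning set, since b = -3a.
assert np.allclose(b, -3 * a)
```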

Of course, it may make sense to want something like a spanning set that is linearly independent. This idea of a spanning set leads back to some questions we introduced earlier in the course. For example, suppose we have a spanning set for our vector space—do we really need every vector in that set in order to get every vector in our space? (Once again, we will set this question aside until a bit later.)