CAPP 30271 — Lecture 5

Solving linear systems

Now, it's clear why having a collection of linearly dependent vectors translates to a system with either no solutions or infinitely many solutions, but the case of the unique solution corresponding to linearly independent vectors seems to be a bit too coincidental. We'll discover why these two notions coincide in the course of trying to solve one of these systems.

For now, we'll make the following observation. We're already familiar with the idea of linear independence: for a collection of vectors, the question of whether any one of them can be written as a linear combination of the others. While this definition is intuitive, it's actually a bit cumbersome to use, so here's another definition that captures the same idea.

The vectors $\mathbf v_1, \dots, \mathbf v_n$ are said to be linearly independent if and only if the only linear combination satisfying $a_1 \mathbf v_1 + \cdots + a_n \mathbf v_n = \mathbf 0$ is the trivial one, $a_1 = \cdots = a_n = 0$.

Why is this definition the same as our more intuitive definition? Consider a situation in which we have vectors that are not independent, say $a_1 \mathbf v_1 + a_2 \mathbf v_2 = \mathbf v_3$. If we rearrange this equation, we get $a_1 \mathbf v_1 + a_2 \mathbf v_2 - \mathbf v_3 = \mathbf 0$. This is a linear combination of the vectors that produces $\mathbf 0$, but the scalars aren't all 0 (the coefficient of $\mathbf v_3$ is $-1$).
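We can check this definition computationally: stack the vectors as the columns of a matrix, and they are linearly independent exactly when the rank of that matrix equals the number of columns. A small sketch in NumPy, using made-up vectors with $\mathbf v_3 = \mathbf v_1 + \mathbf v_2$:

```python
import numpy as np

# Columns of V are the vectors v1, v2, v3 (made up for illustration).
# Here v3 = v1 + v2, so the columns are dependent.
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

# Independence holds exactly when rank equals the number of columns.
print(np.linalg.matrix_rank(V))                 # 2
print(np.linalg.matrix_rank(V) == V.shape[1])   # False: columns are dependent
```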

This helps us understand why having linearly dependent columns can give us infinitely many solutions for an equation $A \mathbf x = \mathbf b$. Suppose $A$ has linearly dependent columns, say $\mathbf u$ and $\mathbf v$. Then I have $\mathbf u = c \mathbf v$ for some scalar $c$. Well, this gives us $\mathbf u - c\mathbf v = \mathbf 0$, which means that there exists a vector $\mathbf y$, which encodes this linear combination such that $A\mathbf y = \mathbf 0$ and $\mathbf y \neq \mathbf 0$. Then I can take another scalar $k$ and get (infinitely) more combinations of these vectors that give me $k (A \mathbf y) = A(k \mathbf y) = \mathbf 0$.

Now, notice that $A \mathbf x = A \mathbf x + \mathbf 0 = \mathbf b + \mathbf 0 = \mathbf b$. If I have linearly dependent columns, I have infinitely many ways to get $\mathbf 0 = A(k\mathbf y)$, which means that \[\mathbf b = A \mathbf x = A \mathbf x + \mathbf 0 = A \mathbf x + A(k \mathbf y) = A(\mathbf x + k \mathbf y),\] and therefore $\mathbf x + k \mathbf y$ is a solution for all scalars $k$. On the other hand, if I have linearly independent columns, $\mathbf 0$ is the only linear combination of columns of $A$ that gets me $\mathbf 0$ and so there's exactly one way to combine the columns of $A$ to get $\mathbf b$.
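We can see this concretely with a small made-up example: a matrix whose second column is twice its first, so its columns are dependent.

```python
import numpy as np

# A hypothetical matrix with dependent columns: column 2 = 2 * column 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# y encodes the dependency 2*(col 1) - 1*(col 2) = 0, so A @ y = 0 with y != 0.
y = np.array([2.0, -1.0])
print(A @ y)  # the zero vector

# One particular solution x of A x = b ...
x = np.array([1.0, 1.0])
b = A @ x

# ... yields infinitely many: x + k*y is a solution for every scalar k.
for k in (0.0, 1.0, -3.5):
    print(np.allclose(A @ (x + k * y), b))  # True each time
```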

Now, let's think a bit about how we might solve an equation of the form $A \mathbf x = \mathbf b$. If we stare at the equation long enough, we have what looks like a multiplication problem, and we know how to solve those: \[\mathbf x = A^{-1} \mathbf b.\] And so we're done! Of course, there are a few questions that we have to deal with. For example, what does $A^{-1}$ mean? What exactly is the (multiplicative) inverse of a matrix? How does one find the multiplicative inverse of a matrix? How do we know that $A^{-1}$ even exists?

Since $A^{-1}$ is itself a matrix, we can view it in the same way: it's a matrix that describes some collection of vectors. But the scenarios above tell us that such a matrix doesn't always exist.
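NumPy reports exactly this failure: asking for the inverse of a matrix with dependent columns raises an error. (The matrix here is made up for illustration.)

```python
import numpy as np

# Dependent columns (column 2 = 2 * column 1) mean no inverse exists.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as e:
    print("no inverse:", e)
```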

It may surprise you that sometimes we can't, or don't want to, find the inverse of a matrix, but we run into this situation all the time even with ordinary numbers. After all, there's no integer that gives us half of a number when we multiply by it; more technically, the integers (other than $\pm 1$) have no multiplicative inverses that are themselves integers.

Gaussian elimination

What we need is a process that will give us $\mathbf x$ if one exists, and that will also tell us whether it exists at all. As a result, this process will also tell us something about the columns of $A$. Since for now we're not so interested in $A^{-1}$, what we can do is manipulate $A$ in a way that gives us an answer for $\mathbf x$ without necessarily preserving $A$ itself. Why can we do this? It's the same principle as in ordinary algebra: apply the same operations to both sides of the equation.

Let's jump ahead to the goal: an upper triangular matrix with non-zero entries on the diagonal. Once we have one of these, computing $\mathbf x$ becomes very easy.

Let $U = \begin{bmatrix} 7 & 6 & 9 \\ 0 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix}$ such that $U \mathbf x = \begin{bmatrix} -18 \\ -25 \\ -5 \end{bmatrix}$. In other words, we need to solve \[\begin{bmatrix} 7 & 6 & 9 \\ 0 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -18 \\ -25 \\ -5 \end{bmatrix}.\] Recall that the definition of matrix-vector products gives us the following: \[\begin{bmatrix} 7 & 6 & 9 \\ 0 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7x_1 + 6x_2 + 9x_3 \\ 0x_1 + 5x_2 + 6x_3 \\ 0x_1 + 0x_2 + 1x_3 \end{bmatrix} = \begin{bmatrix} -18 \\ -25 \\ -5 \end{bmatrix}.\] This means that we can interpret this as a system of equations. It turns out this one is not so bad and we can solve it quite straightforwardly. The final equation gives us something to start with, $x_3 = -5$, and we keep on substituting: \begin{align*} 5x_2 + 6x_3 & = -25 & 7x_1 + 6x_2 + 9x_3 & = -18 \\ 5x_2 + 6 \cdot (-5) &= -25 & 7x_1 + 6 \cdot 1 + 9 \cdot (-5) &= -18\\ 5x_2 - 30 &= -25 & 7x_1 - 39 &= -18\\ 5x_2 &= 5 & 7x_1 &= 21 \\ x_2 &= 1 & x_1 &= 3 \end{align*} which gives us $\mathbf x = \begin{bmatrix}3 \\ 1 \\ -5 \end{bmatrix}$.
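This back substitution generalizes readily. A minimal sketch in NumPy (the function `back_substitute` is our own illustrative helper, not a library routine; it assumes a square upper triangular $U$ with non-zero diagonal entries):

```python
import numpy as np

def back_substitute(U, c):
    """Solve U x = c for upper triangular U with non-zero diagonal."""
    n = len(c)
    x = np.zeros(n)
    # Work from the last equation upward, substituting known values.
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[7.0, 6.0, 9.0],
              [0.0, 5.0, 6.0],
              [0.0, 0.0, 1.0]])
c = np.array([-18.0, -25.0, -5.0])
print(back_substitute(U, c))  # the solution (3, 1, -5) from above
```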

So the goal is to find a series of operations that we can apply to $A \mathbf x = \mathbf b$ to get $U \mathbf x = \mathbf c$. Again, $\mathbf x$ remains unchanged, because we apply the same transformations to both sides.

What kinds of transformations can we apply? Here, we think back to algebra and solving systems of equations: we want to isolate the final variable so that we can start substituting it into the other equations. We do this by adding multiples of equations to other ones. When viewed as a matrix, this gives us our upper triangular matrix, zeroing out the bottom triangle. This is called Gaussian elimination, after Gauss.

Consider the matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ 6 & 9 & 8 \\ 4 & -10 & -3 \end{bmatrix}$. We want to transform this into an upper triangular matrix.

First, we need to clear out the entries under the first column. We observe that this can be done by subtracting three of the first row from the second to get \[\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 4 & -10 & -3 \end{bmatrix}\] Then we do the same thing with the final row, subtracting two of the first from it. \[\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 0 & -12 & -5 \end{bmatrix}\] Now, we need to do the same thing in the second column. We see that we can "subtract" $-2$ of the second row from the third to do this. Note that this is really adding 2 of the second row to the third, but the text likes to refer to all eliminations as subtraction. \[\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 5 \end{bmatrix}\] And now we have an upper triangular matrix.

Now, suppose we want to solve $A \mathbf x = \begin{bmatrix} 2 \\ -3 \\ 7 \end{bmatrix}$. We have to remember to repeat the same steps on the vector on the right hand side. One easy way to do this is to add the vector as a column to $A$, so we have \[\begin{bmatrix} A & \mathbf b \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 & 2 \\ 6 & 9 & 8 & -3 \\ 4 & -10 & -3 & 7 \end{bmatrix}.\] If we perform the same series of operations, we should end up with a matrix that looks like \[\begin{bmatrix} 2 & 1 & 1 & 2 \\ 0 & 6 & 5 & -9 \\ 0 & 0 & 5 & -15 \end{bmatrix}.\]

So the equation we would want to solve is $\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 5 \end{bmatrix} \mathbf x = \begin{bmatrix} 2 \\ -9 \\ -15 \end{bmatrix}$. Solving this, we should end up with $\mathbf x = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}$, which satisfies both the transformed system as well as the original system.

First, let's verify this.


    >>> import numpy as np
    >>> A = np.array([[2, 1, 1], [6, 9, 8], [4, -10, -3]])
    >>> x = np.array([2, 1, -3])
    >>> A @ x
    array([ 2, -3,  7])
    >>> np.array([[2, 1, 1], [0, 6, 5], [0, 0, 5]]) @ x
    array([  2,  -9, -15])
    

Now, let's try to apply our transformation. We can do this by taking advantage of the fact that we can refer to parts of the array, like columns and rows, and that arrays are mutable. But first, let's augment our matrix $A$ with our vector $\mathbf b$ as an extra column. To do this, we can use the hstack function.


    >>> b = np.array([2, -3, 7])
    >>> C = np.hstack((A, b.reshape(3,1)))
    >>> C
    array([[  2,   1,   1,   2],
           [  6,   9,   8,  -3],
           [  4, -10,  -3,   7]])
    

Now, our first move was to eliminate the bottom of the first column by subtracting rows: we subtract two of Row 1 from Row 3 and three of Row 1 from Row 2.


    >>> C[2] = C[2] - 2 * C[0]
    >>> C
    array([[  2,   1,   1,   2],
           [  6,   9,   8,  -3],
           [  0, -12,  -5,   3]])
    >>> C[1] = C[1] - 3 * C[0]
    >>> C
    array([[  2,   1,   1,   2],
           [  0,   6,   5,  -9],
           [  0, -12,  -5,   3]])
    

Since arrays are mutable, notice (and be careful) that these operations persist and carry over to subsequent steps. To finish off, we add two of Row 2 to Row 3.


    >>> C[2] = C[2] + 2 * C[1]
    >>> C
    array([[  2,   1,   1,   2],
           [  0,   6,   5,  -9],
           [  0,   0,   5, -15]])
    

Now, if you have been paying attention so far, then you will realize that what we've just done is nothing more than a bunch of linear combinations of rows. And as we saw previously, we can express linear combinations of rows as matrix products—for a matrix product $A = CR$, the rows of $C$ describe how to combine rows of $R$ to get rows of $A$.

Suppose we want to subtract the first row twice from the third row and place the result in the third row. What this means is that the new Row 3 is the combination $-2 \cdot \text{Row 1} + 0 \cdot \text{Row 2} + 1 \cdot \text{Row 3}$, while Row 1 and Row 2 are left unchanged, i.e., $1 \cdot \text{Row 1}$ and $1 \cdot \text{Row 2}$, respectively.

Putting this together gives us the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$.

Generally speaking, matrices that encode a single elimination step have the same form: 1's along the diagonal, 0's everywhere else, except for the entry representing the elimination step.


    >>> C = np.hstack((A, b.reshape(3, 1)))
    >>> E = np.eye(3)
    >>> E[2,0] = -2
    >>> E
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [-2.,  0.,  1.]])
    >>> E @ C
    array([[  2.,   1.,   1.,   2.],
           [  6.,   9.,   8.,  -3.],
           [  0., -12.,  -5.,   3.]])
    

What this tells us is that we can define an elimination matrix for each of the row combinations that we performed. We can then multiply these matrices together, in the order the steps were applied, to get one matrix that describes all of those row combinations. This combined matrix is lower triangular, since all of the entries we fill in lie in the lower triangle. Together with the resulting upper triangular matrix, this will turn out to be an important piece of information that we need to reconstruct $A$.
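As a sketch of this sequencing for the example above, here are the three elimination matrices and their product. The names `E21`, `E31`, `E32` are illustrative: each records which entry of the lower triangle it eliminates.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [6.0, 9.0, 8.0],
              [4.0, -10.0, -3.0]])

E21 = np.eye(3); E21[1, 0] = -3.0   # Row 2 -= 3 * Row 1
E31 = np.eye(3); E31[2, 0] = -2.0   # Row 3 -= 2 * Row 1
E32 = np.eye(3); E32[2, 1] = 2.0    # Row 3 += 2 * Row 2

# Steps applied later multiply on the left; the product is lower triangular.
E = E32 @ E31 @ E21
print(E)
print(E @ A)  # the upper triangular matrix from before
```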

The reason we need this extra information is that we typically fix $A$ while $\mathbf b$ can change. So although we can compute the upper triangular matrix from $A$ once, we still need a way to transform any right-hand side vector we want to solve for. We will see how we can use the lower triangular matrix we've defined for that purpose.
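A minimal sketch of this reuse, assuming the combined elimination matrix for our example $A$ (written out here directly; the names `E` and `U` are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [6.0, 9.0, 8.0],
              [4.0, -10.0, -3.0]])

# The product of the three elimination matrices for this particular A.
E = np.array([[ 1.0, 0.0, 0.0],
              [-3.0, 1.0, 0.0],
              [-8.0, 2.0, 1.0]])
U = E @ A  # computed once from A; upper triangular

# For the b from the example, E @ b reproduces the transformed
# right-hand side (2, -9, -15).
b = np.array([2.0, -3.0, 7.0])
print(E @ b)

# For any new b, no re-elimination of A is needed: solve U x = E b instead.
b_new = np.array([1.0, 0.0, 0.0])
x = np.linalg.solve(U, E @ b_new)
print(np.allclose(A @ x, b_new))  # True
```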