CAPP 30271 — Lecture 6

When elimination fails

So far, we've looked at situations where we could successfully transform a matrix $A$ into an upper triangular matrix. However, our view of possible solutions (one, infinitely many, none) tells us this is not always the case. Indeed, here is an example where this is not possible: \[\begin{bmatrix} 3 & 6 & 9 \\ 1 & 2 & 4 \\ 4 & 8 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 6 & 9 \\ 0 & 0 & 1 \\ 0 & 0 & -8 \end{bmatrix}\]

The scenario we've encountered is a zero entry on the diagonal together with two dependent rows below the first. This tells us that we're in a situation where we have dependent rows and columns. Indeed, a quick inspection of the original matrix reveals that column 2 is simply twice column 1.
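We can confirm the column dependence directly in NumPy, using the matrix from the example above (remember that NumPy indexes from 0, so column 1 of the math is column `0` of the array):

```python
import numpy as np

A = np.array([[3, 6, 9],
              [1, 2, 4],
              [4, 8, 4]])

# Column 2 (index 1) is exactly twice column 1 (index 0),
# so the columns of A are dependent.
print(np.array_equal(A[:, 1], 2 * A[:, 0]))  # True
```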

However, we may also find that elimination fails even though the matrix has independent columns. Consider the following, after performing elimination to clear out the first column: \[\begin{bmatrix} 2 & 4 & 3 \\ 4 & 8 & 7 \\ 2 & 8 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 4 & 3 \\ 0 & 0 & 1 \\ 0 & 4 & 2 \end{bmatrix}\] If this were a system of linear equations, we would be pretty happy with this result, because we know we can start substituting. Technically, this is not an upper triangular matrix, but it works for our purposes.
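A minimal sketch of this situation in NumPy, using the matrix from the example: after clearing the first column, the pivot position in row 2 is zero, but swapping rows 2 and 3 rescues the elimination.

```python
import numpy as np

A = np.array([[2., 4., 3.],
              [4., 8., 7.],
              [2., 8., 5.]])

# Clear the first column below the pivot.
A[1] -= 2 * A[0]   # row 2 minus 2 times row 1
A[2] -= 1 * A[0]   # row 3 minus row 1

# The pivot position (row 2, column 2) is now zero, so swap rows 2 and 3.
A[[1, 2]] = A[[2, 1]]
print(A)  # now upper triangular
```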

Intuitively, what we're really doing is assuming that it's safe to just swap the rows. And it is! But if we want to be thorough, we need to encode this action. Luckily, there is a matrix for this action: a permutation matrix. In discrete mathematics, a permutation is a sequencing of objects. What we're sequencing in this case are the rows or columns of our matrix.

But how do we encode sequencing? It's actually very similar to the exchange matrix we saw earlier: it's just another particular form of linear combination.

Suppose we wanted to permute the rows of $A$ so that we have $(1,2,3) \rightarrow (1,3,2)$. This means that row 1 stays where it is, the new row 2 is the old row 3, and the new row 3 is the old row 2.

This gives us the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$.
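Multiplying on the left by this matrix performs the row permutation. A quick check with a hypothetical matrix whose rows are easy to tell apart:

```python
import numpy as np

P = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])

# A hypothetical matrix with distinctive rows.
A = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])

# Left-multiplying by P permutes the rows: (1, 2, 3) -> (1, 3, 2).
print(P @ A)
```

Multiplying on the right, $AP$, permutes the columns instead.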

There are a few interesting things to note about permutation matrices: each row and each column contains exactly one 1; every permutation matrix is just the identity matrix with its rows rearranged; and the inverse of a permutation matrix is its transpose, $P^{-1} = P^T$.

One way to see how this plays out is to manipulate an identity matrix in NumPy.


    >>> import numpy as np
    >>> P = np.eye(3)
    >>> P
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])
    >>> P[[1,2]] = P[[2,1]]  # swap rows 1 and 2
    >>> P
    array([[1., 0., 0.],
           [0., 0., 1.],
           [0., 1., 0.]])
    

You can try the same thing with columns.


    >>> P = np.eye(3)
    >>> P
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])
    >>> P[:,[0,1]] = P[:,[1,0]]  # swap columns 0 and 1
    >>> P
    array([[0., 1., 0.],
           [1., 0., 0.],
           [0., 0., 1.]])
    

So elimination and permutation are the two actions we need and both can be framed in terms of linear combinations.

Matrix inverses

We discussed earlier that not every matrix has an inverse. Even if it does, we don't necessarily want to compute $A^{-1}$. But what exactly is an inverse?

The inverse of a matrix $A$ is a matrix denoted $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$.

We say that if a matrix $A$ has an inverse $A^{-1}$ that it is invertible.

So the product of $A$ and its inverse is the identity matrix. This is generally what we want out of inverses: for instance, the multiplicative inverse of a number $n$ is a number $n^{-1}$ such that $n n^{-1} = 1$, the multiplicative identity. If $n$ is an integer, then its multiplicative inverse is the rational number $\frac 1 n$. Similarly, the additive inverse of a number $n$ is $-n$, and adding these two together gets us the additive identity, 0.

Notice that this implies that $A$ must be a square matrix—otherwise, it cannot be the case that both products $A A^{-1}$ and $A^{-1} A$ exist.

But there are other reasons why, intuitively, not every matrix has an inverse. Recall that we can view a matrix $A$ as a function that takes as input $\mathbf x$ and produces a vector $\mathbf v = A \mathbf x$. Now, it may be the case that we have two different vectors $\mathbf x$ and $\mathbf y$ that give us $A \mathbf x = A \mathbf y = \mathbf v$. In this case, what should $A^{-1} \mathbf v$ be?

The answer is that there is no good choice: a matrix, like any function, can only produce one output for each input, but here $A^{-1} \mathbf v$ would have to be both $\mathbf x$ and $\mathbf y$. So it must be that the inverse can't exist.
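We can see this collision concretely. Taking the matrix with dependent columns from earlier in these notes (column 2 is twice column 1), two different input vectors map to the same output, and NumPy refuses to invert the matrix:

```python
import numpy as np

# Column 2 is twice column 1, so the columns are dependent.
A = np.array([[3., 6., 9.],
              [1., 2., 4.],
              [4., 8., 4.]])

# Two different inputs...
x = np.array([1., 0., 0.])
y = np.array([-1., 1., 0.])

# ...that produce the same output, so A^{-1} cannot exist.
print(np.allclose(A @ x, A @ y))  # True

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print(err)  # the matrix is singular, so inversion fails
```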

If we view $A$ as a transformation on some vector $\mathbf x$, then in order to think about what $A^{-1}$ should be, we have to ask what transformation gets us from $A\mathbf x$ back to $\mathbf x$? In the case of elimination matrices, that's actually not too hard to answer.

Consider the elimination matrix $E_{21} = \begin{bmatrix}1 & 0 & 0 \\ -4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}$ which subtracts 4 of the first row from the second row. The clear inverse operation to restore the matrix would be to just add 4 of the first row back to the second row. We can express this as a matrix: $E_{21}^{-1} = \begin{bmatrix}1 & 0 & 0 \\ 4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}$.


    >>> E = np.eye(3)
    >>> E[1,0] = -4
    >>> Einv = np.eye(3)
    >>> Einv[1,0] = 4
    >>> E @ Einv
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])
    >>> np.linalg.inv(E)
    array([[ 1., -0., -0.],
           [ 4.,  1.,  0.],
           [ 0.,  0.,  1.]])
    

Strang goes through a number of different properties of inverses in (2.2). The important question is: when does a matrix $A$ have an inverse? These properties basically tell us that a matrix has an inverse if and only if its columns are linearly independent.

Thinking about and writing down why each of these is true and how they necessarily imply each other is a nice exercise, but let's think about what this says if we consider matrices as data.

From the function/transformation perspective, we see that inverting a matrix amounts to inverting the transformation it does to a vector. In other words, there is enough information about this transformation, and enough information is preserved by the transformation, to reverse it.

Here, we see that invertible matrices are exactly those with independent columns. This aligns with our previous understanding that a matrix of independent columns results in a linear system with one solution. In other words, the linear system is so well-defined that we can pinpoint a solution—there is enough information to do so.

Now, since we're interested in the inverses of elimination matrices, we should think about how to consider the inverses of products of matrices.

Let $A$ and $B$ be square matrices of the same size. If $A$ and $B$ are invertible, then $AB$ is also invertible. Furthermore, the inverse of $AB$, $(AB)^{-1}$, is $B^{-1} A^{-1}$.

This is important—because the order in which the matrices are multiplied matters, the order in which their inverses are multiplied also matters—we must reverse the order.

If we view matrices as a function that transforms vectors, then we can consider the transformation on a given $\mathbf x$. The product $AB\mathbf x$ can be thought of as a transformation that first creates $B \mathbf x = \mathbf y$ and then computes $A \mathbf y = \mathbf z$. \[\mathbf x \xrightarrow{B} \mathbf y \xrightarrow{A} \mathbf z\]

The procedure to reverse this process becomes clear if we have the inverses $A^{-1}$ and $B^{-1}$: we first need to compute $A^{-1} \mathbf z = \mathbf y$ and then take this vector to get $B^{-1} \mathbf y = \mathbf x$: \[\mathbf x \xleftarrow{B^{-1}} \mathbf y \xleftarrow{A^{-1}} \mathbf z\] This is exactly $(AB)^{-1} \mathbf z = B^{-1} A^{-1} \mathbf z$.
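A quick numerical check of the reverse-order rule, using two random matrices (which are almost surely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3))
B = rng.random((3, 3))

# (AB)^{-1} should equal B^{-1} A^{-1}, with the order reversed.
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))  # True
```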

Recall that $E_{21} = \begin{bmatrix}1 & 0 & 0 \\ -4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}$ subtracts 4 of the first row from the second row and the inverse operation adds 4 of the first row back to the second row: $E_{21}^{-1} = \begin{bmatrix}1 & 0 & 0 \\ 4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}$.

Now, since we handled the second row, suppose we want to do elimination on the third row by using the second row. We use a matrix like $E_{32} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\0 & -2 & 1 \end{bmatrix}$. As above, the inverse of this matrix is $E_{32}^{-1} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\0 & 2 & 1 \end{bmatrix}$.

So suppose we accomplish $E_{32} E_{21} A = U$. What is $E_{32} E_{21}$? We can compute this: \[E_{32} E_{21} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\0 & -2 & 1 \end{bmatrix}\begin{bmatrix}1 & 0 & 0 \\ -4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 8 & -2 & 1 \end{bmatrix}.\]

Now, we went through the trouble of thinking about inverses, so let's consider them. We have \[(E_{32} E_{21})^{-1} = E_{21}^{-1} E_{32}^{-1} = \begin{bmatrix}1 & 0 & 0 \\ 4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}\begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\0 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}.\]

There's something worth noting in the "niceness" of the matrices that we get. The inverse of $E$ looks "nicer" or "cleaner" than $E$: the multiples go in the right spots, no fuss, no mess. In $E$, we expect the negated multiples, yes, but there's also an extra entry that seems to come from nowhere. What's going on?

Remember that these are elimination matrices, which means that we have been zeroing out entries of $A$. If we do this process in sequence, then this means that we must have zeroed out the $2,1$ entry before moving on to the elimination matrices for the third row. What this means is that each elimination matrix is really describing how to combine rows of $U$, not $A$. So if we want a matrix $E$ that describes combinations of rows of $A$, there is going to be some "mess" in some of the entries. But going in reverse, we don't have that problem, so $E^{-1}$ contains only the constants we expect.

Here, we trace this result with three elimination matrices.


    >>> Es = [np.eye(3, dtype=int) for i in range(3)]
    >>> Es[0][1,0] = -4
    >>> Es[1][2,0] = -5
    >>> Es[2][2,1] = 2     # set up the elimination matrices
    >>> Es
    [array([[ 1,  0,  0],
            [-4,  1,  0],
            [ 0,  0,  1]]),
     array([[ 1,  0,  0],
            [ 0,  1,  0],
            [-5,  0,  1]]),
     array([[1, 0, 0],
            [0, 1, 0],
            [0, 2, 1]])]
    >>> Es_inv = [np.linalg.inv(e).astype(int) for e in Es]   # invert them
    >>> Es_inv
    [array([[1, 0, 0],
            [4, 1, 0],
            [0, 0, 1]]),
     array([[1, 0, 0],
            [0, 1, 0],
            [5, 0, 1]]),
     array([[ 1,  0,  0],
            [ 0,  1,  0],
            [ 0, -2,  1]])]
    >>> E = Es[2] @ Es[1] @ Es[0]  # multiply elimination matrices
    >>> E
    array([[  1,   0,   0],
           [ -4,   1,   0],
           [-13,   2,   1]])    
    >>> Es_inv[0] @ Es_inv[1] @ Es_inv[2]  # multiply to get the inverse
    array([[ 1,  0,  0],
           [ 4,  1,  0],
           [ 5, -2,  1]])
    >>> np.linalg.inv(E).astype(int)  # we got the same thing
    array([[ 1,  0,  0],
           [ 4,  1,  0],
           [ 5, -2,  1]])
    

This leads to the following observation. We have been using elimination on a matrix $A$ to transform it into an upper triangular matrix $U$. Elimination can be expressed as a series of elimination matrices. Because elimination changes the bottom rows of a matrix based on the top rows, our elimination matrices are lower triangular matrices, with $1$s along the diagonal.

So if we take all of our elimination matrices, we can multiply them together to get a single matrix $E$. We then have the relationship $E_k E_{k-1} \cdots E_2 E_1 A = EA = U$, where $E_i$ is a single elimination matrix. This is a great result, but as we saw in the example above, the entries are not quite so satisfying. If $\ell_{ij}$ is the corresponding "multiple" from elimination matrix $E_{ij}$, then for a $3 \times 3$ matrix, we have \[E = \begin{bmatrix} 1 & 0 & 0 \\ -\ell_{21} & 1 & 0 \\ \ell_{32} \ell_{21} - \ell_{31} & -\ell_{32} & 1 \end{bmatrix}.\]

However, $E$ is invertible and to get $E^{-1}$, all we need to do is flip the signs for our elimination matrices and multiply them in reverse order. When we do, we get a very interesting looking matrix: \[E^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ \ell_{21} & 1 & 0 \\ \ell_{31} & \ell_{32} & 1 \end{bmatrix}.\]

Indeed, this is exactly what we see from our above code example.

What's more, notice that we get \[E^{-1} E A = A = E^{-1} U.\]

Notice that, like $E$, $E^{-1}$ is also a lower triangular matrix. Elimination matrices are lower triangular, and so are their inverses and their products. Since $E^{-1}$ is lower triangular, we let $L = E^{-1}$. Then we get the factorization \[A = LU.\]

This is the LU-factorization.
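To close the loop, here is a sketch of the factorization in NumPy, with a hypothetical $L$ and $U$ (multipliers $\ell_{21} = 4$, $\ell_{31} = 5$, $\ell_{32} = -2$): building $A = LU$ and then redoing elimination on $A$ recovers the same $U$.

```python
import numpy as np

# Hypothetical L and U; A needs no row swaps.
L = np.array([[1.,  0., 0.],
              [4.,  1., 0.],
              [5., -2., 1.]])
U = np.array([[1., 2., 3.],
              [0., 1., 2.],
              [0., 0., 1.]])
A = L @ U

# Redo elimination on A with the multipliers read off from L...
M = A.copy()
M[1] -= 4 * M[0]    # l21 = 4
M[2] -= 5 * M[0]    # l31 = 5
M[2] -= -2 * M[1]   # l32 = -2

# ...and we recover exactly U.
print(np.allclose(M, U))  # True
```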