So elimination matrices give us a way to express elimination. But elimination is not the only operation we need. Consider the following, after performing elimination to clear out the first column: \[\begin{bmatrix} 2 & 4 & 3 \\ 4 & 8 & 7 \\ 2 & 8 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 4 & 3 \\ 0 & 0 & 1 \\ 0 & 4 & 2 \end{bmatrix}\] If this were a system of linear equations, we would be pretty happy with this result, because we know we can start substituting. Technically, this is not an upper triangular matrix, but it works for our purposes.
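To make the connection to elimination matrices concrete, here's a quick numerical check of the step above (NumPy is just our tool for the arithmetic here; the names `E21` and `E31` follow the usual convention of naming the matrix after the entry it zeroes out):

```python
import numpy as np

# The matrix from the example above.
A = np.array([[2, 4, 3],
              [4, 8, 7],
              [2, 8, 5]])

# E21 subtracts 2x row 1 from row 2; E31 subtracts 1x row 1 from row 3.
E21 = np.array([[ 1, 0, 0],
                [-2, 1, 0],
                [ 0, 0, 1]])
E31 = np.array([[ 1, 0, 0],
                [ 0, 1, 0],
                [-1, 0, 1]])

# Applying both elimination matrices clears the first column.
print(E31 @ E21 @ A)
# [[2 4 3]
#  [0 0 1]
#  [0 4 2]]
```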
Intuitively, what we're really doing is assuming that it's safe to just swap the rows. And it is! But if we want to be thorough, we need to encode this action. Luckily, there is a matrix for this action: a permutation matrix. In discrete mathematics, a permutation is a sequencing of objects. What we're sequencing in this case are the rows or columns of our matrix.
But how do we encode sequencing? It's actually very similar to the column exchange matrix we saw earlier: it's just another particular form of linear combination. The trick is something we alluded to earlier: to think about linear combinations of columns, our transformation has to come after $A$, while if we want to use linear combinations of rows, our transformation matrix has to come before $A$.
Suppose we wanted to "permute" the columns of $A$ so that we have $(1,2,3) \rightarrow (2,3,1)$. Then we have the permutation matrix $\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ and we compute $A\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Now, suppose we wanted to permute the rows of $A$ in the same way. This gives us the permutation matrix $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$ and we compute $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}A$. Note that moving the permutation matrix to the other side changes the interpretation of the matrix (you can try this out for yourself!).
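You can try the side-swapping experiment numerically. Here's a sketch using NumPy (an illustration of the computation, not something from the text itself), with a matrix chosen so the rows and columns are easy to track:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Applied on the RIGHT, this reorders the columns of A to (2, 3, 1).
P_cols = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]])

# Applied on the LEFT, this reorders the rows of A to (2, 3, 1).
P_rows = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

print(A @ P_cols)   # columns of A in the order 2, 3, 1
print(P_rows @ A)   # rows of A in the order 2, 3, 1
```

Notice that the two matrices are transposes of each other: the same pattern of 1s encodes a column permutation on one side and a row permutation on the other.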
We observe that all permutation matrices have this form: exactly one 1 in every row and column and 0 everywhere else.
Elimination and permutation are the two actions we need. However, recall that we may still end up in scenarios where we don't have an upper triangular matrix. \[\begin{bmatrix} 3 & 6 & 9 \\ 1 & 2 & 4 \\ 4 & 8 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 6 & 9 \\ 0 & 0 & 1 \\ 0 & 0 & -8 \end{bmatrix}\]
The scenario we've encountered is a zero entry on the diagonal that no permutation can fix. This tells us that we have dependent rows and columns, and therefore our system has no unique solution: depending on the right-hand side, there are either no solutions or infinitely many. (Indeed, a quick look reveals that column 2 is simply twice column 1.)
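We can check this dependence numerically. The sketch below (again using NumPy purely as a calculator) confirms that the matrix has rank 2, and exhibits a nonzero vector that the matrix sends to zero, built directly from the column relationship:

```python
import numpy as np

A = np.array([[3, 6, 9],
              [1, 2, 4],
              [4, 8, 4]])

# Column 2 is twice column 1, so the columns are dependent:
# the rank is 2, not 3.
print(np.linalg.matrix_rank(A))  # 2

# The dependence gives a nonzero vector sent to zero:
# 2*(column 1) - 1*(column 2) + 0*(column 3) = 0.
v = np.array([2, -1, 0])
print(A @ v)  # [0 0 0]
```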
For now, we'll continue to focus on the situation where we have one solution for our system.
We discussed earlier that not every matrix has an inverse. Even if it does, we don't necessarily want to compute $A^{-1}$. But what exactly is an inverse?
The inverse of a matrix $A$ is a matrix denoted $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$.
If a matrix $A$ has an inverse $A^{-1}$, we say that $A$ is invertible.
So the product of $A$ and its inverse is the identity matrix. This is generally what we want out of inverses: for instance, the multiplicative inverse of a number $n$ is a number $n^{-1}$ such that $n n^{-1} = 1$, the multiplicative identity. If $n$ is a nonzero integer, then its multiplicative inverse is the rational number $\frac 1 n$. Similarly, the additive inverse of a number $n$ is $-n$, and adding the two together gets us the additive identity, 0.
Notice that this implies that $A$ must be a square matrix. If $A$ were $m \times n$ with $m \neq n$, then $A^{-1}$ would have to be $n \times m$, making $AA^{-1}$ an $m \times m$ matrix and $A^{-1}A$ an $n \times n$ matrix; these cannot both equal the same identity matrix.
But there are other reasons why, intuitively, not every matrix has an inverse. Recall that we can view a matrix $A$ as a function that takes as input $\mathbf x$ and produces a vector $\mathbf v = A \mathbf x$. Now, it may be the case that we have two different vectors $\mathbf x$ and $\mathbf y$ that give us $A \mathbf x = A \mathbf y = \mathbf v$. In this case, what should $A^{-1} \mathbf v$ be?
The answer is that $A^{-1} \mathbf v$ can't be both $\mathbf x$ and $\mathbf y$: a matrix, like any function, can only produce one output for each input. Since $A^{-1}$ would have to send $\mathbf v$ to two different vectors, the inverse can't exist.
If we view $A$ as a transformation, the question that we have to ask of $A^{-1}$ is what transformation gets us back to $\mathbf x$? In the case of elimination matrices, that's actually not too hard to answer.
Consider the elimination matrix $E_{21} = \begin{bmatrix}1 & 0 & 0 \\ -4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}$ which subtracts 4 of the first row from the second row. The clear inverse operation to restore the matrix would be to just add 4 of the first row back to the second row. We can express this as a matrix: $E_{21}^{-1} = \begin{bmatrix}1 & 0 & 0 \\ 4 & 1 & 0 \\0 & 0 & 1 \end{bmatrix}$.
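A quick sanity check that these two matrices really undo each other, multiplying in both orders (NumPy here is just doing the bookkeeping for us):

```python
import numpy as np

# E21 subtracts 4x row 1 from row 2; its inverse adds 4x row 1 back.
E21 = np.array([[ 1, 0, 0],
                [-4, 1, 0],
                [ 0, 0, 1]])
E21_inv = np.array([[1, 0, 0],
                    [4, 1, 0],
                    [0, 0, 1]])

# Both products give the identity, as the definition requires.
print(E21 @ E21_inv)
print(E21_inv @ E21)
```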
Strang goes through a number of different properties of inverses in (2.2). The important question is: when does a matrix $A$ have an inverse? These properties basically tell us that a matrix has an inverse if and only if its columns are linearly independent.
Let's look at a few of these in a bit more detail.
If $A$ has an inverse, it is unique.
Suppose that $A$ has two inverses, $B$ and $C$. Since an inverse works on either side of $A$, we can write $BA = I$ and $AC = I$. Now multiply: by associativity, $(BA)C = B(AC)$. This gives us $IC = BI$, and so $B = C$.
This is a common approach for showing that a mathematical object is unique: suppose that you have two of them and then work your way to showing that the two things you thought you had are actually the same thing.
If $A \mathbf v = \mathbf 0$ for some $\mathbf v \neq \mathbf 0$, then $A$ has dependent columns and $A$ has no inverse.
We note that if $A \mathbf v = \mathbf 0$ and $\mathbf v \neq \mathbf 0$, then there is a linear combination of the columns of $A$, with coefficients not all zero, that produces $\mathbf 0$. This is exactly the condition that tells us that our set of vectors is linearly dependent.
So now we have $A \mathbf v = A \mathbf 0 = \mathbf 0$. This means there are at least two different inputs, $\mathbf v$ and $\mathbf 0$, that produce $\mathbf 0$. In other words, there is more than one solution to $A \mathbf x = \mathbf 0$, so by the argument above, $A^{-1}$ cannot exist.
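We can watch this fail in practice. Reusing the dependent-columns matrix from earlier, NumPy refuses to compute an inverse (a sketch for illustration; `np.linalg.inv` raises `LinAlgError` on a singular matrix):

```python
import numpy as np

# The dependent-columns matrix from earlier: column 2 = 2x column 1.
A = np.array([[3, 6, 9],
              [1, 2, 4],
              [4, 8, 4]])

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("no inverse: A is singular")
```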