Let's dive in and be a bit presumptuous. If we stare at the equation long enough, we have what looks like a multiplication problem, and we know how to solve those: \[\mathbf x = A^{-1} \mathbf b.\] And so we're done! Of course, there are a few questions that we have to deal with. For example, what does $A^{-1}$ mean? What exactly is the (multiplicative) inverse of a matrix? How does one find the multiplicative inverse of a matrix? How do we know that $A^{-1}$ even exists?
Since $A^{-1}$ itself is a matrix, we can view it in the same way: it's a matrix that describes some collection of vectors. But the above scenarios tell us that such a matrix doesn't always exist. Furthermore, computing such a matrix can be prohibitively expensive. If all we want to do is find $\mathbf x$ for a given $\mathbf b$, then computing $A^{-1}$ can be overkill, if it can be done at all.
It may surprise you that we can't or don't want to find the inverse of a matrix, but we run into this case all the time even just with ordinary numbers. After all, there's no integer we can multiply by to halve a number; more technically, no integer other than $\pm 1$ has a multiplicative inverse that is itself an integer.
What we need is a process that will give us $\mathbf x$, if one exists, and also tell us whether it exists or not. As a byproduct, this process will also tell us something about the columns of $A$. Since for now we're not so interested in $A^{-1}$, what we can do is manipulate the equation in ways that give us an answer for $\mathbf x$ without necessarily preserving $A$ itself. Why can we do this? It's the same principle as in ordinary algebra: apply the same operations to both sides of the equation.
Let's jump ahead to the goal: an upper triangular matrix with non-zero entries on the diagonal. Once we have one of these, computing $\mathbf x$ becomes very easy.
Let $U = \begin{bmatrix} 7 & 6 & 9 \\ 0 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix}$ such that $U \mathbf x = \begin{bmatrix} -18 \\ -25 \\ -5 \end{bmatrix}$. In other words, we need to solve \[\begin{bmatrix} 7 & 6 & 9 \\ 0 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -18 \\ -25 \\ -5 \end{bmatrix}.\] If we turn this into a system of equations, it's not so bad: the final equation gives us something to start with, $x_3 = -5$, and we keep on substituting: \begin{align*} 5x_2 + 6x_3 & = -25 & 7x_1 + 6x_2 + 9x_3 & = -18 \\ 5x_2 + 6 \cdot (-5) &= -25 & 7x_1 + 6 \cdot 1 + 9 \cdot (-5) &= -18\\ 5x_2 - 30 &= -25 & 7x_1 - 39 &= -18\\ 5x_2 &= 5 & 7x_1 &= 21 \\ x_2 &= 1 & x_1 &= 3 \end{align*} which gives us $\mathbf x = \begin{bmatrix}3 \\ 1 \\ -5 \end{bmatrix}$.
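This substitution process is called back substitution, and it's mechanical enough to sketch in code. The following is a minimal Python sketch (the function name is ours, and it uses exact `Fraction` arithmetic rather than floats), run on the system above:

```python
# Back substitution on an upper triangular system U x = c.
from fractions import Fraction

def back_substitute(U, c):
    """Solve U x = c for upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    # Work from the last equation upward, substituting known values.
    for i in range(n - 1, -1, -1):
        total = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(c[i] - total, 1) / U[i][i]
    return x

U = [[7, 6, 9],
     [0, 5, 6],
     [0, 0, 1]]
c = [-18, -25, -5]
print(back_substitute(U, c))  # x = (3, 1, -5), as computed above
```

Note how the loop mirrors the hand computation exactly: solve the last equation first, then substitute upward one row at a time.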
So the goal is to find a series of operations that we can apply to $A \mathbf x = \mathbf b$ to get $U \mathbf x = \mathbf c$. Again, $\mathbf x$ remains unchanged, because we apply the same transformations to both sides.
What kinds of transformations can we apply? Here, we think back to algebra and solving systems of equations: we want to isolate the final variable so that we can start substituting it into the other equations. We do this by adding multiples of equations to other ones. When viewed as a matrix, this gives us our upper triangular matrix, zeroing out the bottom triangle. This is called elimination.
Consider the matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ 6 & 9 & 8 \\ 4 & -10 & -3 \end{bmatrix}$. We want to transform this into an upper triangular matrix.
First, we need to clear out the entries under the first column. We observe that this can be done by subtracting three of the first row from the second to get \[\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 4 & -10 & -3 \end{bmatrix}\] Then we do the same thing with the final row, subtracting two of the first from it. \[\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 0 & -12 & -5 \end{bmatrix}\] Now, we need to do the same thing in the second column. We see that we can "subtract" $-2$ of the second row from the third to do this. Note that this is really adding 2 of the second row to the third, but the text likes to refer to all eliminations as subtraction. \[\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 5 \end{bmatrix}\] And now we have an upper triangular matrix.
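The elimination steps above can also be sketched in code. This is a minimal version with no row swaps (so it assumes every pivot it meets is nonzero), again using exact `Fraction` arithmetic:

```python
# Forward elimination: subtract multiples of each pivot row from the
# rows below it to zero out everything under the diagonal.
from fractions import Fraction

def eliminate(A):
    """Return an upper triangular version of A (assumes nonzero pivots)."""
    A = [[Fraction(v) for v in row] for row in A]  # work on a copy
    n = len(A)
    for col in range(n):
        for row in range(col + 1, n):
            # The multiplier: how many of the pivot row to subtract.
            m = A[row][col] / A[col][col]
            A[row] = [a - m * p for a, p in zip(A[row], A[col])]
    return A

A = [[2, 1, 1],
     [6, 9, 8],
     [4, -10, -3]]
for row in eliminate(A):
    print(row)  # rows of the upper triangular matrix from the text
```

The multipliers the code computes ($3$, $2$, and $-2$) are exactly the "three of the first row", "two of the first", and "$-2$ of the second" from the steps above.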
Now, suppose we want to solve $A \mathbf x = \begin{bmatrix} 2 \\ -3 \\ 7 \end{bmatrix}$. We have to remember to repeat the same steps on the vector on the right hand side. One easy way to do this is to add the vector as a column to $A$, so we have \[\begin{bmatrix} A & \mathbf b \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 & 2 \\ 6 & 9 & 8 & -3 \\ 4 & -10 & -3 & 7 \end{bmatrix}.\] If we perform the same series of operations, we should end up with a matrix that looks like \[\begin{bmatrix} 2 & 1 & 1 & 2 \\ 0 & 6 & 5 & -9 \\ 0 & 0 & 5 & -15 \end{bmatrix}.\]
So the equation we would want to solve is $\begin{bmatrix} 2 & 1 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 5 \end{bmatrix} \mathbf x = \begin{bmatrix} 2 \\ -9 \\ -15 \end{bmatrix}$. Solving this, we should end up with $\mathbf x = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}$, which satisfies both the transformed system as well as the original system.
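Putting the two phases together, the whole procedure on the augmented matrix can be sketched as follows (a toy solver, not a production one; it assumes no row swaps are needed):

```python
# Solve A x = b: eliminate on the augmented matrix [A | b], then
# back substitute on the resulting triangular system.
from fractions import Fraction

def solve(A, b):
    n = len(A)
    # Append b as an extra column so it undergoes the same row operations.
    M = [[Fraction(v) for v in row] + [Fraction(bi)]
         for row, bi in zip(A, b)]
    for col in range(n):                      # elimination
        for row in range(col + 1, n):
            m = M[row][col] / M[col][col]
            M[row] = [a - m * p for a, p in zip(M[row], M[col])]
    x = [Fraction(0)] * n                     # back substitution
    for i in range(n - 1, -1, -1):
        total = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - total) / M[i][i]
    return x

A = [[2, 1, 1], [6, 9, 8], [4, -10, -3]]
b = [2, -3, 7]
print(solve(A, b))  # the solution x = (2, 1, -3) from the text
```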
This is all well and good, but we should probably ask ourselves whether we're allowed to do this. Recall that in the world of linear algebra, the only thing we're really allowed to do is take linear combinations of things. So the question is: Can we express what we've been doing as some kind of linear combination?
Luckily, the answer is yes: we are taking linear combinations of rows, since all we're doing is scaling rows and adding them together. And as we saw previously, we can express linear combinations of rows as matrix products—for a matrix product $A = CR$, the rows of $C$ describe how to combine rows of $R$ to get rows of $A$.
Suppose we want to subtract the first row twice from the third row and place the result in the third row. What this means is that the first and second rows of the result are unchanged, $1 \cdot \text{row}_1$ and $1 \cdot \text{row}_2$, while the third row of the result is the combination $-2 \cdot \text{row}_1 + 1 \cdot \text{row}_3$.
Putting this together gives us the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$.
Generally speaking, matrices that encode a single elimination step have the same form: 1's along the diagonal, 0's everywhere else, except for the entry representing the elimination step.
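We can check this claim directly: multiplying the elimination matrix above against the intermediate matrix from our running example (after the first step cleared the $(2,1)$ entry) performs the row operation. A small sketch with a hand-rolled matrix multiply:

```python
# A single elimination step as a matrix product E @ A: row i of E
# describes the linear combination of rows of A that lands in row i.

def matmul(X, Y):
    """Multiply matrices X and Y (lists of rows)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

E = [[1, 0, 0],
     [0, 1, 0],
     [-2, 0, 1]]   # new row 3 = -2*(row 1) + 1*(row 3)

A = [[2, 1, 1],
     [0, 6, 5],
     [4, -10, -3]]

print(matmul(E, A))  # row 3 becomes [0, -12, -5]; rows 1 and 2 unchanged
```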
So elimination matrices give us a way to express elimination. But elimination is not the only operation we need. Consider the following, after performing elimination to clear out the first column: \[\begin{bmatrix} 2 & 4 & 3 \\ 4 & 8 & 7 \\ 2 & 8 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 4 & 3 \\ 0 & 0 & 1 \\ 0 & 4 & 2 \end{bmatrix}\] If this were a system of linear equations, we would be pretty happy with this result, because we know we can start substituting. Technically, this is not an upper triangular matrix, but it works for our purposes.
Intuitively, what we're really doing is assuming that it's safe to just swap the rows. And it is! But if we want to be thorough, we need to encode this action. Luckily, there is a matrix for this action: a permutation matrix. In discrete mathematics, a permutation is a rearrangement of objects into some order. What we're rearranging in this case are the rows or columns of our matrix.
But how do we encode rearrangement? It's actually very similar to the elimination matrix we saw earlier: it's just another particular form of linear combination.
Suppose we wanted to permute the rows of $A$ so that we have $(1,2,3) \rightarrow (1,3,2)$. This means that row 1 of the result is row 1 of $A$, row 2 of the result is row 3 of $A$, and row 3 of the result is row 2 of $A$.
This gives us the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$.
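Applying this to the partially eliminated matrix from the example above shows the swap in action, using the same kind of matrix product as before:

```python
# A permutation matrix applied by matrix product: row i of P has a
# single 1, picking out which row of A ends up in row i of the result.

def matmul(X, Y):
    """Multiply matrices X and Y (lists of rows)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[1, 0, 0],
     [0, 0, 1],   # new row 2 = old row 3
     [0, 1, 0]]   # new row 3 = old row 2

A = [[2, 4, 3],
     [0, 0, 1],
     [0, 4, 2]]

print(matmul(P, A))  # [[2, 4, 3], [0, 4, 2], [0, 0, 1]]
```

The result is the upper triangular matrix we wanted.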
There are a few interesting things to note about permutation matrices: every row and every column contains exactly one 1, the inverse of a permutation matrix is its transpose, and the product of two permutation matrices is again a permutation matrix.
So elimination and permutation are the two actions we need, and both can be framed in terms of linear combinations. However, even with this, we may still end up in scenarios where we don't have an upper triangular matrix. \[\begin{bmatrix} 3 & 6 & 9 \\ 1 & 2 & 4 \\ 4 & 8 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 6 & 9 \\ 0 & 0 & 1 \\ 0 & 0 & -8 \end{bmatrix}\]
The scenario we've encountered is when we have a zero entry on the diagonal and no permutation is able to fix this. This tells us that we have dependent rows and columns, and therefore our system has either no solution or infinitely many, depending on $\mathbf b$. (Indeed, some observation reveals that column 2 is simply twice column 1.)
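A solver can detect this situation mechanically: when it looks for a nonzero pivot in a column, at or below the diagonal, and finds none, the matrix is singular. A sketch (the function name and the `(U, ok)` return convention are ours):

```python
# Elimination with row swaps, reporting failure when a column has no
# usable pivot -- the singular case described above.
from fractions import Fraction

def eliminate(A):
    """Return (U, ok): the reduced matrix and whether every pivot was found."""
    A = [[Fraction(v) for v in row] for row in A]
    n = len(A)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return A, False          # dependent columns: no unique solution
        A[col], A[pivot] = A[pivot], A[col]   # swap it into place
        for row in range(col + 1, n):
            m = A[row][col] / A[col][col]
            A[row] = [a - m * p for a, p in zip(A[row], A[col])]
    return A, True

A = [[3, 6, 9], [1, 2, 4], [4, 8, 4]]
U, ok = eliminate(A)
print(ok)  # False: column 2 has no nonzero entry left to pivot on
```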
For now, we'll continue to focus on the situation where our system has exactly one solution.
We discussed earlier that not every matrix has an inverse. Even if it does, we don't necessarily want to compute $A^{-1}$. But what exactly is an inverse?
The inverse of a matrix $A$ is a matrix denoted $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$.
We say that a matrix $A$ is invertible if it has an inverse $A^{-1}$.
So the product of $A$ and its inverse is the identity matrix. This is generally what we want out of inverses: for instance, the multiplicative inverse of a number $n$ is a number $n^{-1}$ such that $n n^{-1} = 1$, the multiplicative identity. If $n$ is an integer, then its multiplicative inverse is the rational number $\frac 1 n$. Similarly, the additive inverse of a number $n$ is $-n$, and adding these two together gets us the additive identity, 0.
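We can check the defining property $AA^{-1} = I$ on a small example. For $2 \times 2$ matrices there's a well-known closed form (swap the diagonal, negate the off-diagonal, divide by the determinant), which we use here purely as an illustration:

```python
# Verify A @ Ainv == I for a small invertible matrix, using the
# standard 2x2 inverse formula.
from fractions import Fraction

def inverse_2x2(A):
    """Inverse via the 2x2 formula; assumes the determinant is nonzero."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det],
            [-c / det, a / det]]

def matmul(X, Y):
    """Multiply matrices X and Y (lists of rows)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1], [5, 3]]
Ainv = inverse_2x2(A)
print(matmul(A, Ainv))  # the 2x2 identity matrix
```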
Notice that this implies that $A$ must be a square matrix: otherwise, the two products $A A^{-1}$ and $A^{-1} A$ would have different sizes and could not both equal the same identity matrix.
But there are other reasons why, intuitively, not every matrix has an inverse. Recall that we can view a matrix $A$ as a function that takes as input $\mathbf x$ and produces a vector $\mathbf v = A \mathbf x$. Now, it may be the case that we have two different vectors $\mathbf x$ and $\mathbf y$ that give us $A \mathbf x = A \mathbf y = \mathbf v$. In this case, what should $A^{-1} \mathbf v$ be?
The answer is that there is no good choice: $A^{-1}$ would have to send $\mathbf v$ to both $\mathbf x$ and $\mathbf y$, but a matrix, like any function, can only produce one output for each input. So if $A$ sends two different vectors to the same output, its inverse can't exist.
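Here is a concrete instance of this failure (the matrix and vectors are our own example): a singular matrix sends two different inputs to the same output, so nothing could undo it.

```python
# A non-invertible matrix collapsing two different inputs to one output.

def matvec(A, x):
    """Multiply matrix A by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2],
     [2, 4]]          # second row is twice the first: singular

x = [2, 0]
y = [0, 1]
print(matvec(A, x))   # [2, 4]
print(matvec(A, y))   # [2, 4] -- same output from a different input
```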