The 3-satisfiability problem is one of the most famous "difficult" problems in computer science. The problem asks, when given a propositional formula in conjunctive normal form where each clause has three literals, whether there is an assignment of the variables that makes the formula evaluate to true. Here, a clause is three literals (variables or their negations) joined together by $\vee$, and the clauses are joined by $\wedge$. For example, \[(x_1 \vee \neg x_2 \vee x_4) \wedge (x_2 \vee x_3 \vee \neg x_4) \wedge (\neg x_1 \vee \neg x_3 \vee x_5).\]
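As an illustration, we can represent a formula as a list of clauses, each a list of (variable, negated?) pairs, and check an assignment by brute force. (This encoding is our own, chosen for the example; it is not a standard format.)

```python
from itertools import product

# The example formula above; a literal is a (variable index, negated?) pair.
formula = [
    [(1, False), (2, True), (4, False)],   # (x1 v ~x2 v x4)
    [(2, False), (3, False), (4, True)],   # (x2 v x3 v ~x4)
    [(1, True), (3, True), (5, False)],    # (~x1 v ~x3 v x5)
]

def satisfied(formula, assignment):
    """True iff every clause contains at least one true literal."""
    return all(any(assignment[var] != neg for var, neg in clause)
               for clause in formula)

# Brute force: try all 2^5 assignments of x1..x5.
satisfying = [bits for bits in product([False, True], repeat=5)
              if satisfied(formula, dict(zip(range(1, 6), bits)))]
```

For instance, $x_1 = x_2 = \text{True}$, $x_3 = \text{False}$ satisfies all three clauses regardless of $x_4, x_5$.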
One (also difficult) variant of this problem asks for an assignment that makes as many clauses true as possible. This is difficult, because if there are $n$ variables, we have to look through as many as $2^n$ different assignments to see which one gives us the maximum number of satisfied clauses.
However, what if we just randomly guessed an assignment? Suppose we flip a coin independently for each variable $x_i$, setting it to True with probability $\frac 1 2$. How many clauses should we expect to satisfy? Our claim is the following.
Let $\varphi$ be a formula in 3-CNF with $k$ clauses. We should expect a uniformly random assignment to satisfy $\frac 7 8$ of the clauses, a perhaps surprisingly large fraction.
Let's name our clauses $C_1, \dots, C_k$. Define the random variable $Z_j$ by \[Z_j = \begin{cases} 1 & \text{if clause $C_j$ is satisfied}, \\ 0 & \text{otherwise}. \end{cases}\] Then we define the random variable $Z$ to be the number of clauses that have been satisfied. That is, $Z = Z_1 + \cdots + Z_k$. We can then compute $E(Z)$ by figuring out $E(Z_j)$ for a single clause. But what is $E(Z_j)$?
Let's consider the possible assignments we can make to a clause $(x \vee y \vee z)$. There are three literals and each one can take on two values, so there are $2^3 = 8$ possible assignments. However, we observe that the clause is satisfied as long as at least one literal evaluates to True. In other words, in order for the clause not to be satisfied, all three literals must evaluate to False. Exactly one of the 8 equally likely assignments does this, so the clause fails with probability $\frac 1 8$, which means that the probability that our clause is satisfied is $1 - \frac 1 8 = \frac 7 8$.
We can trace the reasoning here a bit more formally in the following way. Let $C_i = \ell_1 \vee \ell_2 \vee \ell_3$ be a clause with literals $\ell_1, \ell_2, \ell_3$ over three distinct variables. Then, \begin{align*} \Pr(\text{Clause $C_i$ is satisfied}) &= 1 - \Pr(\text{$C_i$ is not satisfied}) \\ &= 1 - \Pr(\ell_1 = F \wedge \ell_2 = F \wedge \ell_3 = F) \\ &= 1 - \Pr(\ell_1 = F) \cdot \Pr(\ell_2 = F) \cdot \Pr(\ell_3 = F) \\ &= 1 - \frac 1 2 \cdot \frac 1 2 \cdot \frac 1 2 \\ &= 1 - \frac 1 8 \\ &= \frac 7 8 \end{align*} where the third line uses the fact that the variables are assigned independently.
So $E(Z_j) = \frac 7 8$ for all $j$. This gives us \[E(Z) = \sum_{j=1}^k E(Z_j) = \sum_{j=1}^k \frac 7 8 = \frac 7 8 k.\]
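We can sanity-check this calculation with a quick simulation: average the number of satisfied clauses over many coin-flip assignments and compare with $\frac 7 8 k$. (The random formula generator below is our own construction for the experiment.)

```python
import random

def random_3cnf(num_vars, num_clauses, rng):
    """A random 3-CNF formula: each clause has 3 distinct variables,
    each negated with probability 1/2."""
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(num_vars), 3)]
            for _ in range(num_clauses)]

def count_satisfied(formula, assignment):
    """Number of clauses with at least one true literal."""
    return sum(any(assignment[var] != neg for var, neg in clause)
               for clause in formula)

rng = random.Random(0)
k = 40
formula = random_3cnf(20, k, rng)

# Average Z over many random assignments; E(Z) = 7k/8 = 35.
trials = 20000
total = sum(count_satisfied(formula, [rng.random() < 0.5 for _ in range(20)])
            for _ in range(trials))
estimate = total / trials
```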
This is a nice result, which tells us a bit more than what we came here for. It says that under a random assignment, we can expect to satisfy $\frac 7 8$ of the clauses. Now, a single random assignment is not guaranteed to achieve this number; we may end up satisfying fewer. But a random variable cannot always fall below its expectation, so some outcome must satisfy at least the expected number. This gives us an interesting result: there has to exist an assignment of variables that satisfies at least $\frac 7 8$ of the clauses.
This is an example of what's called the probabilistic method, which was popularized by Erdős. Recall that there are a few ways to show existence: we can exhibit an example, we can come up with a constructive proof, or we can show indirectly that it must exist by assuming otherwise and arriving at a contradiction. Probability theory gives us another indirect way to show that an object exists: show that a randomly chosen object has the desired property with nonzero probability. In our case, we've shown that for every 3-CNF formula, a random assignment satisfies a large number of clauses, i.e. at least $\frac 7 8$ths of them, with nonzero probability, so such an assignment must exist.
This also gives a pretty good algorithm for approximating an assignment that satisfies the maximum number of clauses. It's quite simple: randomly choose an assignment, and in expectation it will satisfy $\frac 7 8$ of the clauses. If the particular assignment you draw falls short, you can just try again. Some more complicated analysis can tell you how many times to expect to try before getting a good assignment. But notice that the algorithm itself is quite simple; it's much more complicated to try to pull this off deterministically.
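A minimal sketch of this try-until-good strategy, using the same (non-standard) clause encoding as before:

```python
import math
import random

def count_satisfied(formula, assignment):
    """Number of clauses with at least one true literal;
    a literal is a (variable index, negated?) pair."""
    return sum(any(assignment[var] != neg for var, neg in clause)
               for clause in formula)

def random_max3sat(formula, num_vars, rng):
    """Sample coin-flip assignments until one satisfies
    at least 7/8 of the clauses."""
    target = math.ceil(7 * len(formula) / 8)
    while True:
        assignment = [rng.random() < 0.5 for _ in range(num_vars)]
        if count_satisfied(formula, assignment) >= target:
            return assignment
```

One can show the success probability per trial is at least $\frac{1}{8k}$, so the loop terminates after $O(k)$ expected tries.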
First, recall that the typical paradigm for analyzing algorithms is worst-case analysis. That is, the complexity of an algorithm is determined by the worst-case input to that algorithm. For example, if we consider something like bubble sort or insertion sort, the worst-case input to the algorithm is a list that is in reverse-sorted order. Such an input forces our algorithm to carry out the greatest number of steps, which for a list of size $n$ is typically about $\binom n 2 = O(n^2)$ operations.
Where does the number $\binom n 2$ come from? If we think about those simple sorting algorithms, we notice that we're always considering pairs of elements to put into order. So the maximum number of comparisons should be comparing all possible pairs: $\binom n 2$. (Obviously, there are worse, less clever algorithms that will exceed this.)
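We can verify the $\binom n 2$ count by instrumenting insertion sort with a comparison counter (a sketch of our own; the counter is not part of the usual algorithm):

```python
def insertion_sort_compares(items):
    """Insertion sort; returns the number of element comparisons performed."""
    a = list(items)
    compares = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            compares += 1
            if a[j - 1] > a[j]:          # out of order: swap and keep going
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:                        # in place: stop scanning left
                break
    return compares

# Reverse-sorted input of size 10 forces all C(10, 2) = 45 comparisons.
worst = insertion_sort_compares(range(10, 0, -1))
```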
Based on merge sort, we know that sorting can be done in $O(n \log n)$ time. However, one problem with merge sort is it involves the creation of a bunch of new arrays or lists in order to perform the recursive step. This sort of thing can be expensive. We might rather have a way to sort an array of integers without messing around and creating a million new arrays.
So is there a way to sort an array in $O(n \log n)$ time simply by swapping elements around in an array? Quicksort is such an algorithm, invented by Tony Hoare in 1959, while he was working on machine translation with Kolmogorov (recall his earlier appearance in probability theory).
The idea behind Quicksort is this: given an array $A$,
1. choose an element $p$ of $A$, called the pivot;
2. move the elements of $A$ that are less than $p$ before it, and the elements greater than $p$ after it;
3. recursively sort the portions of $A$ on either side of $p$.
The act of moving elements around the pivot is called partitioning. What does the result look like?
Consider the array $A = [2,7,3,6,1,5,9,0,8,4]$. If we choose $3$ as our pivot, one possible array we can obtain is $[2,1,0,3,7,6,5,9,8,4]$. We would then recursively sort $[2,1,0]$ and $[7,6,5,9,8,4]$.
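The recursive structure can be sketched as follows, setting aside for the moment the question of doing it in place (this version allocates new lists at every level, and fixes an arbitrary pivot choice just for illustration):

```python
def quicksort(a):
    """Quicksort on distinct elements: partition around a pivot,
    then recurse on both sides. Not in-place."""
    if len(a) <= 1:
        return a
    pivot = a[0]  # an arbitrary fixed pivot choice, for illustration only
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)
```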
For the purposes of analysis, we will be assuming that no two elements are the same.
It's not entirely clear that this is actually efficient. There are two things we need to deal with here. First, let's deal with how to do step 2, the swapping. There's an obvious way to do this by constructing a new array. It's not quite so obvious how to do this in-place. But all it requires is a simple trick with pointers.
Consider the following algorithm on an array $A$: first, swap the pivot to the front of $A$. Keep a left pointer (drawn underlined below) just after the pivot and a right pointer (drawn overlined) at the last element. Until the left pointer passes the right pointer, examine the element at the left pointer: if it is greater than the pivot, swap it with the element at the right pointer and move the right pointer one step left; if it is less than the pivot, swap it with the pivot (which sits immediately to its left) and move the left pointer one step right. When the pointers cross, the pivot is in its final position.
If $A = [8,7,6,2,3,1,4,9,0,5]$ and we choose $4$ to be our pivot, we obtain the following arrays: \begin{align*} [8,\underline 7,6,2,3,1,\fbox{4},9,0,\overline 5] \\ [\fbox{4},\underline 7,6,2,3,1,8,9,0,\overline 5] \\ [\fbox{4},\underline 5,6,2,3,1,8,9,\overline 0, 7] \\ [\fbox{4},\underline 0,6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 9,2,3,1,\overline 8, 6, 5, 7] \\ [0, \fbox{4},\underline 8,2,3,\overline 1, 9, 6, 5, 7] \\ [0, \fbox{4},\underline 1,2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, \fbox{4},\underline 2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, 2, \fbox{4},\underline{\overline 3}, 8, 9, 6, 5, 7] \\ [0, 1, 2, 3, \overline{\underline{\fbox{4}}}, 8, 9, 6, 5, 7] \\ \end{align*}
It's clear that this algorithm just moves through the array at most once, manipulating pointers/array indices and performing swaps.
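Here is the partition in code. The scheme is reconstructed from the trace above (pivot position $p$, left pointer $i$, right pointer $j$), and it assumes distinct elements:

```python
def partition(a, pivot_index):
    """Partition a in place around a[pivot_index] using the two-pointer
    scheme traced above. Returns the pivot's final index.
    Assumes all elements are distinct."""
    a[0], a[pivot_index] = a[pivot_index], a[0]  # move the pivot to the front
    p = 0                   # current position of the pivot
    i, j = 1, len(a) - 1    # left (underlined) and right (overlined) pointers
    while i <= j:
        if a[i] > a[p]:
            a[i], a[j] = a[j], a[i]  # large element goes to the back
            j -= 1
        else:
            a[i], a[p] = a[p], a[i]  # slide the pivot one step right
            p = i
            i += 1
    return p
```

Running it on the example array with pivot $4$ reproduces the final state of the trace, $[0, 1, 2, 3, 4, 8, 9, 6, 5, 7]$.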
Observe that in our algorithm, partitioning is where most of the work happens. Since partitioning makes just one pass through the list, the number of comparisons it requires is $O(n)$.
The remaining question is how we should pick our pivot $p$. Intuitively, if we pick $p$ poorly, say too small or too large relative to the rest of the integers in the array, then we end up doing a lot more work in one of the "halves" of the array (they're not really halves, since they needn't be of equal size). But how bad can this get?
Quicksort has running time $O(n^2)$ in the worst case.
Let $p$ be the pivot element. Let $T(n)$ be the number of comparisons on an input of size $n$. Then we have the recurrence \[T(n) = T(p-1) + T(n-p) + O(n).\] Here, we get $O(n)$ from the partition step, $T(p-1)$ for the lesser portion of the array, and $T(n-p)$ for the greater portion of the array. So we want to take $p$ such that this recurrence is maximized, \[T(n) = \max_{1 \leq p \leq n}\{T(p-1) + T(n-p) + O(n)\}.\] It's easy to see that this is maximized when $p = 1$ or $p = n$, giving us the recurrence \[T(n) = T(n-1) + O(n),\] which, if we unroll it, gives us $O(n^2)$.
Basically, when the worst case happens, it's analogous to insertion sort. It's pretty clear then that the optimal choice of $p$ is to choose the median element of the array, which would give you a recurrence resembling $T(n) = 2 T(n/2) + O(n)$, similar to merge sort. But this is sort of a chicken and egg problem: if we knew what the median element of the array was, we probably wouldn't need to sort it!
So the challenge here is that we need to find the median quickly, without sorting. Specifically, we want to do so in linear time: if selecting the pivot costs more than the $O(n)$ partition step, the recurrence no longer solves to $O(n \log n)$, and we lose our advantage over merge sort. But this turns out to be a difficult problem: it wasn't until 1973 that a linear-time algorithm was found for it. So how did Quicksort work until then?
The trick is to use randomness (and in fact, this was the initial version of Quicksort that was published). Instead of trying to figure out how to find the best $p$, let's just make a random choice! So our algorithm looks like the following:
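Putting the pieces together, here is a sketch of randomized Quicksort, using the same partition scheme as before (distinct elements assumed):

```python
import random

def randomized_quicksort(a, lo=0, hi=None, rng=random):
    """Sort a[lo..hi] in place, choosing each pivot uniformly at random.
    Assumes distinct elements."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot_index = rng.randrange(lo, hi + 1)        # the random choice of p
    a[lo], a[pivot_index] = a[pivot_index], a[lo]  # pivot to the front
    p, i, j = lo, lo + 1, hi
    while i <= j:                                  # one pass: partition
        if a[i] > a[p]:
            a[i], a[j] = a[j], a[i]
            j -= 1
        else:
            a[i], a[p] = a[p], a[i]
            p = i
            i += 1
    randomized_quicksort(a, lo, p - 1, rng)        # recurse on both sides
    randomized_quicksort(a, p + 1, hi, rng)
```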
Recall that the operation of interest in sorting is the comparison—i.e. insertion sort can require $\binom n 2$ comparisons in the worst case. So in similar fashion, we'll consider the expected number of comparisons over the entire run of the algorithm.
First, we'll define a random variable $X$ for the total number of comparisons performed by the algorithm. Then we can express the total number of comparisons by defining indicators for each pair of integers that gets compared.
Let $z_1 \lt z_2 \lt \cdots \lt z_n$ be the integers in our list in sorted order. Note that the index here does not indicate the integer's position in the list. The integers are ordered here in increasing order, since whether or not an integer gets compared depends on its size, not its location. We'll see why in a bit.
We observe that two integers $z_i$ and $z_j$ are compared at most once: when either one is the pivot. So we can define indicators $I_{ij}$ by \[I_{ij} = \begin{cases} 1 & \text{if $z_i$ is compared to $z_j$}, \\ 0 & \text{otherwise.} \end{cases}\] which gives us \[X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I_{ij}.\] By linearity of expectation, the expected number of comparisons is \[E(X) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E(I_{ij}).\] Since we need the expectation of each indicator, all that remains is to argue about the probability of each comparison. So we need to take a closer look at which comparisons are actually made by our algorithm.
The probability that $z_i$ and $z_j$ are compared in a run of Quicksort is $\frac{2}{j-i+1}$.
Observe that if any element $x$ with $z_i \lt x \lt z_j$ is chosen as a pivot before $z_i$ or $z_j$, then $z_i$ and $z_j$ end up on different sides of the partition and are never compared. Therefore, $z_i$ and $z_j$ are compared exactly when one of them is the first element of the set $\{z_i, \dots, z_j\}$ to be chosen as a pivot. There are $j-i+1$ elements in this set, each equally likely to be chosen first, so $z_i$ is first with probability $\frac{1}{j-i+1}$, and likewise for $z_j$. Since either event results in the comparison being made, we have a total probability of $\frac{2}{j-i+1}$.
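This probability can be checked empirically with a small simulation: an instrumented, non-in-place Quicksort (our own construction) that records every pair compared against a pivot.

```python
import random
from itertools import combinations

def quicksort_recording(a, rng, record):
    """Quicksort with random pivots on distinct values; records each
    (smaller, larger) pair that gets compared against a pivot."""
    if len(a) <= 1:
        return a
    pivot = a[rng.randrange(len(a))]
    for x in a:
        if x != pivot:
            record.add((min(x, pivot), max(x, pivot)))
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    return (quicksort_recording(smaller, rng, record) + [pivot]
            + quicksort_recording(larger, rng, record))

rng = random.Random(0)
n, trials = 8, 20000
freq = {pair: 0 for pair in combinations(range(n), 2)}
for _ in range(trials):
    record = set()
    quicksort_recording(list(range(n)), rng, record)
    for pair in record:
        freq[pair] += 1

# freq[(i, j)] / trials should approach 2 / (j - i + 1);
# here the value v plays the role of z_{v+1}.
```

For example, adjacent ranks are always compared (probability $\frac 2 2 = 1$), while the extremes $0$ and $7$ should be compared in only about $\frac 2 8 = 25\%$ of runs.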
This probability is enough to let us compute the expected number of comparisons, $E(X)$.
The expected number of comparisons over a single run of Quicksort is $O(n \log n)$.
We take the expectation of $X$, as defined above. Then we have \begin{align*} E(X) &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n E(I_{ij}) \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j-i+1} \\ &= 2 \sum_{i=1}^{n-1} \sum_{j=2}^{n-i+1} \frac 1 j \\ &\leq 2n \sum_{j=1}^{n} \frac 1 j \\ &\leq 2n (1 + \log n) \\ &= O(n \log n) \end{align*} Therefore the expected number of comparisons is $O(n \log n)$.
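We can also evaluate the double sum directly and compare it against the bound $2n(1 + \log n)$ (natural log, matching the derivation above):

```python
import math

def expected_compares(n):
    """Exact expected comparison count: the sum of 2/(j-i+1) over all pairs i < j."""
    return sum(2 / (j - i + 1)
               for i in range(1, n) for j in range(i + 1, n + 1))

n = 1000
exact = expected_compares(n)
bound = 2 * n * (1 + math.log(n))   # 2n(1 + ln n)
```

For $n = 1000$, the exact expectation is about $1.1 \times 10^4$, comfortably below the bound of about $1.58 \times 10^4$.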
Recall that the worst-case running time for the deterministic version of this algorithm is $O(n^2)$. This algorithm has the same worst-case running time, if Quicksort always happens to choose the worst pivot. However, for randomized algorithms, what we really care about is the expected running time. Since the pivot is chosen by a random process, no particular input can force the worst case to occur; the expected number of comparisons is $O(n \log n)$ on every input. This is a nice example of how introducing a bit of randomness can help create efficient algorithms.