CMSC 27100 — Lecture 26

Probabilistic analysis of algorithms

Today, we'll be continuing our discussion of expectation by looking at an application to probabilistic analysis of algorithms.

First, recall that the typical paradigm for analyzing algorithms is worst-case analysis. That is, the complexity of an algorithm is determined by the worst-case input to that algorithm. For example, if we consider something like bubble sort or insertion sort, the worst-case input is a list in reverse-sorted order. Such an input forces the algorithm to carry out the maximum number of steps, which for a list of size $n$ is typically about $\binom n 2 = O(n^2)$ operations.

Where does the number $\binom n 2$ come from? If we think about these simple sorting algorithms, we notice that we're always considering pairs of elements to put into order. So the maximum number of comparisons should be the number of possible pairs: $\binom n 2$. (Of course, there are worse, less clever algorithms that exceed even this.)

Based on merge sort, we know that sorting can be done in $O(n \log n)$ time. However, one problem with merge sort is it involves the creation of a bunch of new arrays or lists in order to perform the recursive step. This sort of thing can be expensive. We might rather have a way to sort an array of integers without messing around and creating a million new arrays.

So is there a way to sort an array in $O(n \log n)$ time simply by swapping elements around within the array? Quicksort is such an algorithm, invented by Tony Hoare in 1959, while he was at Moscow State University working on machine translation and studying probability under Kolmogorov (recall his earlier appearance in probability theory).

The idea behind Quicksort is this: given an array $A$,

  1. Choose an element $p$, which we call a pivot.
  2. Go through the array and move all elements that are less than $p$ to the left of $p$ and all the elements that are greater than $p$ to its right.
  3. Recursively sort the portion of the array to the left of $p$.
  4. Recursively sort the portion of the array to the right of $p$.
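The four steps above can be sketched directly in code. Here is a minimal Python sketch (my own names; not yet the in-place version we develop below, since it builds new lists, and the pivot choice is left as simply the first element):

```python
def quicksort(a):
    """List-based sketch of the four steps above, assuming distinct elements."""
    if len(a) <= 1:
        return a                                     # nothing to sort
    p = a[0]                                         # step 1: choose a pivot
    left = [x for x in a if x < p]                   # step 2: elements less than p
    right = [x for x in a if x > p]                  #         elements greater than p
    return quicksort(left) + [p] + quicksort(right)  # steps 3 and 4

print(quicksort([2, 7, 3, 6, 1, 5, 9, 0, 8, 4]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```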

The act of moving elements around the pivot is called partitioning. What does the result look like?

Consider the array $A = [2,7,3,6,1,5,9,0,8,4]$. If we choose $3$ as our pivot, one possible array we can obtain is $[2,1,0,3,7,6,5,9,8,4]$. We would then recursively sort $[2,1,0]$ and $[7,6,5,9,8,4]$.

For the purposes of analysis, we will be assuming that no two elements are the same.

It's not entirely clear that this is actually efficient. There are two things we need to deal with here. First, let's deal with how to do step 2, the swapping. There's an obvious way to do this by constructing a new array. It's not quite so obvious how to do this in-place. But all it requires is a simple trick with pointers.

Consider the following algorithm on an array $A$:

  1. Let $p$ be the pivot. Swap $p$ to be at the beginning of the array.
  2. Initialize pointers/indices so that $lt$ is at the beginning of the array, $gt$ is at the end of the array, and $i = lt + 1$. Note that $A[lt] = p$.
  3. As $i$ moves through the array:
    1. If $A[i] \lt p$, swap $A[lt]$ and $A[i]$ and increment $i$ and $lt$.
    2. If $A[i] \gt p$, swap $A[gt]$ and $A[i]$ and decrement $gt$.
    We can stop once $i \gt gt$.

If $A = [8,7,6,2,3,1,4,9,0,5]$ and we choose $4$ to be our pivot, we obtain the following arrays: \begin{align*} [8,\underline 7,6,2,3,1,\fbox{4},9,0,\overline 5] \\ [\fbox{4},\underline 7,6,2,3,1,8,9,0,\overline 5] \\ [\fbox{4},\underline 5,6,2,3,1,8,9,\overline 0, 7] \\ [\fbox{4},\underline 0,6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 6,2,3,1,8,\overline 9, 5, 7] \\ [0, \fbox{4},\underline 9,2,3,1,\overline 8, 6, 5, 7] \\ [0, \fbox{4},\underline 8,2,3,\overline 1, 9, 6, 5, 7] \\ [0, \fbox{4},\underline 1,2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, \fbox{4},\underline 2,\overline 3, 8, 9, 6, 5, 7] \\ [0, 1, 2, \fbox{4},\underline{\overline 3}, 8, 9, 6, 5, 7] \\ [0, 1, 2, 3, \overline{\underline{\fbox{4}}}, 8, 9, 6, 5, 7] \\ \end{align*}
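Under our distinct-elements assumption, this pointer-based partition can be sketched in Python as follows (the function and variable names are my own; the trace above is reproduced exactly):

```python
def partition(a, pivot_index):
    """Partition a in place around a[pivot_index], assuming distinct elements.
    Returns the pivot's final index."""
    a[0], a[pivot_index] = a[pivot_index], a[0]  # move the pivot to the front
    p, lt, gt, i = a[0], 0, len(a) - 1, 1        # note a[lt] == p throughout
    while i <= gt:                               # stop once i > gt
        if a[i] < p:
            a[lt], a[i] = a[i], a[lt]            # smaller: swap behind the pivot
            lt += 1
            i += 1
        else:                                    # a[i] > p, since elements are distinct
            a[gt], a[i] = a[i], a[gt]            # larger: swap to the tail
            gt -= 1
    return lt                                    # the pivot now sits at index lt

a = [8, 7, 6, 2, 3, 1, 4, 9, 0, 5]
idx = partition(a, 6)      # pivot 4, as in the worked example
print(idx, a)              # 4 [0, 1, 2, 3, 4, 8, 9, 6, 5, 7]
```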

It's clear that this algorithm just moves through the array at most once, manipulating pointers/array indices and performing swaps.

Observe that in our algorithm, partitioning is where most of the work happens.

The remaining question is how we should pick our pivot $p$. Intuitively, if we pick $p$ poorly, say too small or too large relative to the rest of the integers in the array, then we end up doing a lot more work in one of the two "halves" of the array (which aren't really halves, since the split is uneven). But how bad can this get?

Quicksort has running time $O(n^2)$ in the worst case.

Let $p$ be the pivot element (more precisely, suppose the pivot has rank $p$ among the $n$ elements). Let $T(n)$ be the number of comparisons on an input of size $n$. Then we have the recurrence \[T(n) = T(p-1) + T(n-p) + O(n).\] Here, we get $O(n)$ from the partition step, $T(p-1)$ for the lesser portion of the array, and $T(n-p)$ for the greater portion of the array. For the worst case, we take the $p$ that maximizes this recurrence, \[T(n) = \max_{1 \leq p \leq n}\{T(p-1) + T(n-p) + O(n)\}.\] It's easy to see that this is maximized when $p = 1$ or $p = n$, giving us the recurrence \[T(n) = T(n-1) + O(n),\] which, if we unroll it, gives us $cn + c(n-1) + \cdots + c = O(n^2)$.
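We can observe this worst case empirically. Here is a sketch (names are my own) that counts comparisons when the pivot is always taken to be the first element, run on an already-sorted array, which is exactly the bad case for that pivot rule:

```python
def quicksort_count(a):
    """First-element-pivot quicksort; returns (sorted list, #comparisons)."""
    if len(a) <= 1:
        return a, 0
    p = a[0]
    rest = a[1:]
    comps = len(rest)  # partitioning compares each remaining element with p once
    less = [x for x in rest if x < p]
    greater = [x for x in rest if x > p]
    ls, lc = quicksort_count(less)
    gs, gc = quicksort_count(greater)
    return ls + [p] + gs, comps + lc + gc

n = 100
_, worst = quicksort_count(list(range(n)))  # sorted input: pivot is always the minimum
print(worst, n * (n - 1) // 2)  # 4950 4950, i.e. exactly binom(n, 2) comparisons
```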

Basically, when the worst case happens, it's analogous to insertion sort. It's pretty clear then that the optimal choice of $p$ is to choose the median element of the array, which would give you a recurrence resembling $T(n) = 2 T(n/2) + O(n)$, similar to merge sort. But this is sort of a chicken and egg problem: if we knew what the median element of the array was, we probably wouldn't need to sort it!

So the challenge here is that we need to choose the median in better than $O(n \log n)$ time. Specifically, we want to do so in linear time. This turns out to be a difficult problem—it wasn't until 1973 that a linear time algorithm was found for this problem. So how did Quicksort work until then?

The trick is to use randomness. Instead of trying to figure out how to find the best $p$, let's just make a random choice! So our algorithm looks like the following:

  1. Choose an element $p$ uniformly at random, which we call a pivot.
  2. Go through the array and move all elements that are less than $p$ to the left of $p$ and all the elements that are greater than $p$ to its right.
  3. Recursively sort the portion of the array to the left of $p$.
  4. Recursively sort the portion of the array to the right of $p$.
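Putting the pieces together, the randomized, in-place algorithm might look like this in Python (an illustrative sketch, again assuming distinct elements; the names are my own):

```python
import random

def partition(a, lo, hi):
    """In-place partition of a[lo..hi] around a uniformly random pivot.
    Returns the pivot's final index. Assumes distinct elements."""
    k = random.randrange(lo, hi + 1)   # step 1: choose the pivot uniformly at random
    a[lo], a[k] = a[k], a[lo]          # move it to the front of the subarray
    p, lt, gt, i = a[lo], lo, hi, lo + 1
    while i <= gt:                     # step 2: the pointer-based partition
        if a[i] < p:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        else:                          # a[i] > p, since elements are distinct
            a[gt], a[i] = a[i], a[gt]
            gt -= 1
    return lt

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        m = partition(a, lo, hi)
        quicksort(a, lo, m - 1)        # step 3: sort the lesser portion
        quicksort(a, m + 1, hi)        # step 4: sort the greater portion

xs = [2, 7, 3, 6, 1, 5, 9, 0, 8, 4]
quicksort(xs)
print(xs)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```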

Recall that the operation of interest in sorting is the comparison—i.e. insertion sort can require $\binom n 2$ comparisons in the worst case. So in similar fashion, we'll consider the expected number of comparisons over the entire run of the algorithm.

So we need to take a closer look at which comparisons can be made by our algorithm. For the purposes of analysis, we will rename our integers to $z_1, \dots, z_n$ so that $z_1 \lt \cdots \lt z_n$.

First, we'll define a random variable $X$ for the total number of comparisons performed by the algorithm. Observe that two integers $z_i$ and $z_j$ are compared at most once: they are only ever compared when one of them is the pivot, and after the partition step the pivot is never compared with anything again. So we can write $X$ in terms of indicators $I_{ij}$ defined by \[I_{ij} = \begin{cases} 1 & \text{if $z_i$ is compared to $z_j$}, \\ 0 & \text{otherwise,} \end{cases}\] which gives us \[X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I_{ij}.\] So, the expected number of comparisons is $E(X)$, which by linearity of expectation is \[E(X) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E(I_{ij}).\] Since each $I_{ij}$ is an indicator, $E(I_{ij}) = \Pr(I_{ij} = 1)$, so all that remains is to determine the probability of each comparison.

The probability that $z_i$ and $z_j$ are compared in a run of Quicksort is $\frac{2}{j-i+1}$.

Observe that if any element $x$ with $z_i \lt x \lt z_j$ is chosen as a pivot before $z_i$ or $z_j$, then $z_i$ and $z_j$ are separated into different subarrays and are never compared. Therefore, for the comparison to happen, the first pivot chosen from the set $\{z_i, \dots, z_j\}$ must be either $z_i$ or $z_j$. Since pivots are chosen uniformly at random, each of the $j-i+1$ elements of this set is equally likely to be the first of them chosen as a pivot, so the probability that it is any particular one of them is $\frac{1}{j-i+1}$. Since the comparison is made if either $z_i$ or $z_j$ is chosen first, the total probability is $\frac{2}{j-i+1}$.
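We can sanity-check this probability with a quick simulation (a sketch with my own names; the list-based quicksort here performs the same comparisons as the in-place version, since an element is compared exactly with the pivots of the subproblems containing it):

```python
import random

def quicksort_pairs(a, compared):
    """Randomized quicksort on distinct ints, recording every compared pair."""
    if len(a) <= 1:
        return a
    p = a[random.randrange(len(a))]
    less, greater = [], []
    for x in a:
        if x == p:
            continue
        compared.add((min(x, p), max(x, p)))  # x is compared with the pivot p
        (less if x < p else greater).append(x)
    return quicksort_pairs(less, compared) + [p] + quicksort_pairs(greater, compared)

n, trials, hits = 10, 20000, 0
for _ in range(trials):
    compared = set()
    quicksort_pairs(list(range(1, n + 1)), compared)
    hits += (1, n) in compared   # were z_1 and z_10 compared this run?
est = hits / trials
print(est)  # typically close to 2/(j - i + 1) = 2/10 = 0.2
```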

This probability is enough to let us compute the expected number of comparisons, $E(X)$.

The expected number of comparisons over a single run of Quicksort is $O(n \log n)$.

We take the expectation of $X$, as defined above. Then we have \begin{align*} E(X) &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n E(I_{ij}) \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j-i+1} \\ &= 2 \sum_{i=1}^{n-1} \sum_{j=2}^{n-i+1} \frac 1 j \\ &\leq 2n \sum_{j=1}^{n} \frac 1 j \\ &\leq 2n (1 + \log n) \\ &= O(n \log n) \end{align*} Therefore the expected number of comparisons is $O(n \log n)$.
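As a sanity check, we can compare the empirical average number of comparisons against the $2n(1 + \log n)$ bound from the proof (a rough sketch; the names are my own):

```python
import math
import random

def count_comparisons(a):
    """Randomized quicksort on distinct ints; returns #comparisons performed."""
    if len(a) <= 1:
        return 0
    p = a[random.randrange(len(a))]
    rest = [x for x in a if x != p]
    less = [x for x in rest if x < p]
    greater = [x for x in rest if x > p]
    # each remaining element is compared with the pivot exactly once
    return len(rest) + count_comparisons(less) + count_comparisons(greater)

n, trials = 100, 2000
avg = sum(count_comparisons(list(range(n))) for _ in range(trials)) / trials
bound = 2 * n * (1 + math.log(n))
print(avg, bound)  # the average (typically around 650) sits well below the ~1121 bound
```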

Recall that the worst-case running time for the deterministic version of this algorithm is $O(n^2)$. The randomized algorithm has the same worst-case running time, if it happens to choose the worst pivot at every step. However, for randomized algorithms, what we really care about is the expected running time. Since we use a random process to choose the pivot, no single input can force the worst case: bad behavior now depends on unlucky random choices rather than on the input. This is a nice example of how introducing a bit of randomness can help create efficient algorithms.