CISC 462 — Lecture 32

Derandomization

Last class, we ended with a short history of primality testing, whose complexity slowly descended our hierarchy over a period of about 25 years. This leads us to the question of whether we can do the same for every problem in $\mathbf{BPP}$. But we'll start off with another, similar question that we've avoided until now: is $\mathbf{BPP} \subseteq \mathbf{NP}$? For now, we know at least the following:

Theorem. (Sipser-Gács) $\mathbf{BPP} \subseteq \Sigma_2^P \cap \Pi_2^P$.

In other words, $\mathbf{BPP}$ is in the polynomial hierarchy and we even know that it's somewhere in the second level. What we don't know is how it relates to $\mathbf{NP}$. That is, this is a case where we don't even know if $\mathbf{NP} \subseteq \mathbf{BPP}$ or $\mathbf{BPP} \subseteq \mathbf{NP}$. This seems a bit strange but if you explicitly say the implications out loud, it makes a bit more sense.

It isn't exactly clear that $\mathbf{BPP} \subseteq \mathbf{NP}$, since it's not exactly clear how to convince a polynomial time verifier that you do indeed have the correct answer from some run of a probabilistic TM. On the other hand, if $\mathbf{NP} \subseteq \mathbf{BPP}$, then this would mean that we could use randomized algorithms to solve problems in NP. But then the following result throws some more wrenches into that idea.

Theorem. (Adleman) $\mathbf{BPP} \subseteq \mathbf P/poly$.

Proof. Suppose $L \in \mathbf{BPP}$. By standard error amplification (run the machine polynomially many times and take the majority vote), there is a PTM $M$ such that on an input word $w$ of size $n$, \begin{align*} w \in L &\implies Pr(\text{$M$ accepts $w$}) \geq 1 - 2^{-(n+1)}, \\ w \not \in L &\implies Pr(\text{$M$ accepts $w$}) \leq 2^{-(n+1)}. \\ \end{align*} Now, we can treat the sequence of coin flips used by $M$ as a string $r \in \{0,1\}^m$, where $m$ is polynomial in $n$. For every $w$ of length $n$, at most $\frac{2^m}{2^{n+1}}$ values of $r$ cause $M$ to give an incorrect answer on $w$. Summing across all $2^n$ strings of length $n$, there are at most $2^n \cdot \frac{2^m}{2^{n+1}} = \frac{2^m}2$ strings $r$ that cause $M$ to give an incorrect answer on some word $w$. Since this is fewer than all $2^m$ strings, there exists a string $r_0 \in \{0,1\}^m$ such that $M$ gives the correct answer on every input word of length $n$. Then using this string $r_0$, we can create a polynomial-size circuit $C$ with $r_0$ hardwired into it, so that $C(w)$ gives the same answer as $M$ on $w$ with the random string $r_0$. $$\tag*{$\Box$}$$
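The counting step in this proof is just a union bound, and we can sanity-check it numerically. The sketch below uses small, hypothetical parameters (there's no actual BPP machine here): we adversarially pick a worst-case "bad set" of random strings for each of the $2^n$ inputs and confirm that their union still misses at least one string $r_0$.

```python
import random

# A toy numeric check of the counting argument (hypothetical parameters,
# not a real BPP machine): with 2^n inputs, each having at most
# 2^m / 2^(n+1) "bad" random strings, the union bound leaves at least
# half of {0,1}^m good for every input simultaneously.
n, m = 4, 10
num_inputs = 2 ** n                       # 16 inputs of length n
bad_per_input = 2 ** m // 2 ** (n + 1)    # 2^(m-n-1) = 32 bad strings each

rng = random.Random(0)
bad_sets = [set(rng.sample(range(2 ** m), bad_per_input))
            for _ in range(num_inputs)]   # adversarial worst-case bad sets

union_of_bad = set().union(*bad_sets)
good = [r for r in range(2 ** m) if r not in union_of_bad]

assert len(union_of_bad) <= 2 ** m // 2   # at most half the strings are bad
assert len(good) > 0                      # so a universally good r_0 exists
```

Note that the argument never tells us how to find $r_0$ efficiently; it only guarantees existence, which is exactly why the conclusion is nonuniform ($\mathbf P/poly$) rather than $\mathbf P$.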

The simple observation here is that nonuniformity is at least as powerful as randomness. However, we observe that our scaffolding of complexity classes gets another notch in it. If $\mathbf{NP} \subseteq \mathbf{BPP}$, then by the previous theorem, we have that $\mathbf{NP} \subseteq \mathbf P/poly$. But last week we saw from the Karp-Lipton theorem that if $\mathbf{NP} \subseteq \mathbf P/poly$, then $\mathbf{PH} = \Sigma_2^P$.

Now, this brings us back to the question of whether $\mathbf{BPP} = \mathbf P$. It turns out that unlike most other cases where we don't know the relationship of two classes, most theorists believe that $\mathbf{BPP} = \mathbf P$. The reason for this consists of some intuition and some interesting results where two seemingly unrelated concepts are tied together and give us another way to make our complexity hierarchy come crumbling down.

Let's go through some of the intuition. Basically, it deals with pseudorandom number generators. We're all (hopefully) aware of the fact that whenever we hit up random() in our language of choice, we're not really getting a random number; rather, given some input seed, we're creating a number that looks random in polynomial time (we don't want to wait a few years for our pseudorandom numbers!).

The idea is that if we're seeding the generator with a seed of size $n$, we obviously can't generate more than $2^n$ possible output strings. Thus, the best we can hope for is that our numbers are indistinguishable from truly random numbers by polynomial-time tests. Once we have this, we could show that $\mathbf P = \mathbf{BPP}$ in the following way:
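To make the "at most $2^n$ possible strings" point concrete, here is a toy seeded generator. This is a linear-congruential generator, which is easy to distinguish from random and so is emphatically not a PRG in the complexity-theoretic sense; it only illustrates that the number of distinct output streams is bounded by the number of seeds.

```python
# A toy linear-congruential generator (illustrative only: LCGs are easily
# distinguished from random, so this is NOT a complexity-theoretic PRG).
# The point is just that an s-bit seed yields at most 2^s distinct streams.
def lcg_bits(seed, num_bits, a=1103515245, c=12345, m=2**31):
    state, out = seed, []
    for _ in range(num_bits):
        state = (a * state + c) % m
        out.append((state >> 16) & 1)     # extract one bit per step
    return out

s = 3                                      # a 3-bit seed
streams = {tuple(lcg_bits(seed, 20)) for seed in range(2 ** s)}
assert len(streams) <= 2 ** s              # at most 2^3 = 8 distinct streams
```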

Let $n$ be the size of our input. If we had a $\mathbf{BPP}$ machine that ran in $n^k$ time, it uses at most $n^k$ coin flips, so we'd want a generator that stretches an $O(\log n)$-bit seed into $n^k$ pseudorandom bits. We could then loop over all $2^{O(\log n)} = n^{O(1)}$ seeds, give each output to the $\mathbf{BPP}$ machine to use as its coin flips, and output the majority answer. The probability that the $\mathbf{BPP}$ machine accepts with a given pseudorandom string of coin flips has to be about the same as for a truly random string of coin flips, otherwise the machine could distinguish between them.
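The seed-enumeration loop can be sketched as follows. Both `stretch` and `bpp_machine` here are hypothetical stand-ins (a toy generator and a toy randomized decider for "does $w$ have a 1-majority?"), not real constructions; the part that matters is the structure of `derandomized`: enumerate every short seed, run the machine on each pseudorandom coin sequence, and take the majority vote, which is deterministic and polynomial when seeds have $O(\log n)$ bits.

```python
from collections import Counter

# A sketch of derandomization by seed enumeration (`stretch` and
# `bpp_machine` are hypothetical stand-ins, not real constructions).

def stretch(seed, count):
    """Hypothetical PRG: stretch a short seed into `count` pseudorandom
    numbers. Here a toy 64-bit LCG; a real PRG needs hardness assumptions."""
    nums, state = [], seed + 1
    for _ in range(count):
        state = (state * 6364136223846793005 + 1442695040888963407) % 2**64
        nums.append(state)
    return nums

def bpp_machine(w, coins):
    """Stand-in randomized decider for 'does w have a 1-majority?':
    sample positions of w using the pseudorandom numbers as indices."""
    ones = sum(w[c % len(w)] for c in coins)
    return ones * 2 > len(coins)

def derandomized(w, seed_bits=6, coin_count=40):
    """Loop over all 2^seed_bits seeds (polynomially many when the seed
    has O(log n) bits) and take the majority answer across all runs."""
    answers = Counter()
    for seed in range(2 ** seed_bits):
        answers[bpp_machine(w, stretch(seed, coin_count))] += 1
    return answers.most_common(1)[0][0]
```

Because we cycle through every seed, no randomness remains; correctness then rests entirely on the (unproven) claim that the machine behaves about the same on pseudorandom coins as on random ones.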

The following result will give us some further intuition about what's happening here.

Theorem. (Impagliazzo-Wigderson) Suppose there exists a problem that's solvable in exponential time and that's not solvable in subexponential time even with the help of a subexponential-size advice string. Then $\mathbf P = \mathbf{BPP}$.

The gist of this result is that it places a limit on what kinds of problems can be solved using advice. In particular, we don't believe we can solve exponential time problems with subexponential time even if we had advice that was subexponential (and we already saw how powerful having even one bit of advice was). But this very reasonable but seemingly unrelated assumption, Impagliazzo and Wigderson show, implies that $\mathbf P = \mathbf{BPP}$.

Now, if we flip our hypothetical $\mathbf{BPP}$ machine from above around, we can view the problem as the machine trying to distinguish between random and pseudorandom strings. If it's able to distinguish between pseudorandom and random strings, there must be some input word $w$ for which this is the case. Then we can think of $w$ as the advice string that tells the machine how to distinguish between the two.

Put another way, what this means is if we can show that certain problems are hard even with advice, we would be able to prove that $\mathbf P = \mathbf{BPP}$.

With that in mind, let's go through a recap of what we've covered in the course (in class).