CMSC 27200 — Lecture 24

Polynomial Time Verification

Imagine again that we have a mysterious cube that can solve a particularly difficult problem for us. A good question to ask about our mysterious cube is how we can tell if it's correct. If it can solve a hard problem, how can we possibly check it?

So the mysterious cube says: okay, I can give you proof. Along with a Yes answer to the problem, it prints out a receipt that we can use to check whether it's correct or not. And after being spooked by the cube that was apparently listening in on us, we have to figure out what to do with this new information.

An algorithm $V(x,c)$ is a verifier for a problem $P$ if for every input string $x$, $x \in P$ if and only if there exists a string $c$ such that $V(x,c) = yes$. We say that $c$ is a certificate.

Consider the Independent Set problem. If a graph $G$ contains an independent set $S$, a certificate for this is the set $S$ itself. Given $S$, we can check in polynomial time whether $S$ is an independent set: for each vertex in $S$, check its neighbours and verify that none of them are in $S$.
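As a rough sketch of this check (the adjacency-list representation and the function name are our own choices, not from the lecture):

```python
def verify_independent_set(adj, S):
    """Check that S is an independent set in the graph given by
    adjacency lists adj (a dict mapping each vertex to its neighbours)."""
    S = set(S)
    for u in S:
        # If any neighbour of u is also in S, two vertices of S share an edge.
        if any(v in S for v in adj.get(u, [])):
            return False
    return True

# Example: the path 1-2-3. {1, 3} is independent; {1, 2} is not.
adj = {1: [2], 2: [1, 3], 3: [2]}
print(verify_independent_set(adj, {1, 3}))  # True
print(verify_independent_set(adj, {1, 2}))  # False
```

Each neighbour list is scanned at most once, which is the $O(m)$ bound claimed below.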

Of course, we can ask the same sorts of questions about verification algorithms as we do of any algorithm, particularly what its time complexity is. This leads us to the following idea.

The class $\mathbf{NP}$ is the class of decision problems for which there exists a polynomial time verifier $V$. That is, $V(x,c)$ is a polynomial-time algorithm and the certificate $c$ has size polynomial in the size of $x$; $|c| \leq p(|x|)$ for some polynomial $p$.

So which problems are in $\mathbf{NP}$? One easy thing to see is that any problem that was efficiently solvable to begin with is also efficiently verifiable.

$\mathbf{P} \subseteq \mathbf{NP}$.

Suppose we have a problem $P \in \mathbf P$. Then there is a polynomial time algorithm $A$ that solves it. We will define the following verifier $V$ for $P$:

  1. On input $x$ and certificate $c$, return $A(x)$.

Essentially, this verifier ignores whatever certificate is provided and just calls $A$ to compute the answer itself. This will always give us the correct answer for any certificate $c$. This verifier runs in polynomial time, since $A$ runs in polynomial time. Therefore, $P \in \mathbf{NP}$.
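A minimal sketch of this construction, using a toy problem in $\mathbf P$ ("is the input list sorted?") as a stand-in for the hypothetical algorithm $A$:

```python
def make_verifier(A):
    """Given a polynomial-time algorithm A for a problem, return a
    verifier that ignores its certificate and computes the answer itself."""
    def V(x, c):
        return A(x)  # the certificate c is never consulted
    return V

# Toy problem in P: deciding whether a list is sorted.
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

V = make_verifier(is_sorted)
print(V([1, 2, 3], "any certificate at all"))  # True
print(V([3, 1, 2], ""))                        # False
```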

What else is in $\mathbf{NP}$? A lot of the problems we've seen in this section of the course are also in $\mathbf{NP}$.

The following problems are in $\mathbf{NP}$:

  1. Independent Set
  2. Vertex Cover
  3. Set Cover
  4. Satisfiability
  5. 3-SAT

For each of the problems above, we will describe an appropriate certificate and how to use it to verify a solution in polynomial time.

  1. A certificate for Independent Set is a list of the vertices in the independent set. One can verify that the list of vertices is of size at least $k$ and that no two vertices share an edge in $O(m)$ time.
  2. A certificate for Vertex Cover is a list of the vertices in the vertex cover. One can verify that the list of vertices is of size at most $k$ and that no edge is uncovered in $O(m)$ time.
  3. A certificate for Set Cover is a list of the subsets $X_i$ in the cover. One can verify that the list of subsets is of size at most $k$ and that the union of the subsets is equal to the universe in $O(n)$ time, where $n$ is the number of elements in the universe.
  4. A certificate for Satisfiability is an assignment of each variable. One can verify that the assignment is satisfying for a formula by evaluating the formula in $O(n)$ time, where $n$ is the number of variables. To see this, we can form a parse tree for a well-formed formula and evaluate the truth value at each internal node—there are $O(n)$ of these.
  5. A certificate for 3-SAT is the same as for Satisfiability.
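As a concrete sketch of item 4, here is how one might check a certificate for Satisfiability when the formula is in CNF (the representation is an assumption on our part: a formula is a list of clauses, each clause a list of nonzero integers, where $k$ stands for variable $x_k$ and $-k$ for its negation):

```python
def verify_sat(clauses, assignment):
    """Check that assignment (a dict mapping variable index -> bool)
    satisfies a CNF formula: every clause must contain a true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
print(verify_sat(clauses, {1: True, 2: True, 3: False}))   # True
print(verify_sat(clauses, {1: False, 2: True, 3: False}))  # False
```

The same check handles item 5; a 3-CNF formula is just the special case where every clause has exactly three literals.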


Okay, cool. Is there anything that's not known to be verifiable in polynomial time? Yes, here's one: Suppose you have a boolean formula in 3-CNF; is it unsatisfiable? Here's another one: given a boolean formula in 3-CNF, is there a unique satisfying assignment for it? What do these problems have in common? Just having one solution isn't enough: we need to rule out other possible solutions. So there's a lot more out there than just $\mathbf P$ and $\mathbf{NP}$!

Now, the next natural question to ask is: are there problems that can be efficiently verified but not efficiently solved? The answer is, we don't know, because we don't actually know whether any exist.

Does there exist a problem $P \in \mathbf{NP}$ such that $P \not \in \mathbf P$?

You may be more familiar with the following formulation of the problem:

Does $\mathbf P = \mathbf{NP}$?

This is the famous $\mathbf P$ vs. $\mathbf{NP}$ problem—the biggest open problem in theoretical computer science for almost 50 years. Practically speaking, it's the roadblock that keeps us from definitively answering whether large numbers of problems are efficiently solvable.

Philosophically, this question shapes how we think about computation and efficiency, and it is one of the Millennium Prize Problems. Intuitively, it should be much easier to verify a solution for a problem than it is to find it. And yet, we've been unable to prove it.

A lot of the computational complexity theory that has been developed since the early 70s was in the pursuit of this question. And if you ever go on to study more complexity theory, what you'll find are some answers and way more questions.

$\mathbf{NP}$-completeness

Recall that reduction is not symmetric. This suggests that one thing we could do is try to find the "hardest" problems in $\mathbf{NP}$ and try to separate those from the "easier" ones. But what does "hardest" mean in this context? It would mean the existence of some problem $B$ for which $A \leq_P B$ for all $A \in \mathbf{NP}$. Such problems are called $\mathbf{NP}$-complete.

A problem $P$ is said to be $\mathbf{NP}$-complete if

  1. $P \in \mathbf{NP}$, and
  2. for all problems $P' \in \mathbf{NP}$, $P' \leq_P P$.

If a problem $P$ satisfies (2), we say $P$ is $\mathbf{NP}$-hard.

It's important to note that a problem can be $\mathbf{NP}$-hard but not in $\mathbf{NP}$, which is why that notion is separate from the notion of completeness. Problems that are $\mathbf{NP}$-complete represent the "most difficult" problems in $\mathbf{NP}$.

One consequence of the existence of such a problem is that it would give us a very direct way to attack the $\mathbf P$ vs. $\mathbf{NP}$ question. Suppose that $P$ is $\mathbf{NP}$-complete. If we find a polynomial time algorithm for $P$, then we have a polynomial time algorithm for every problem in $\mathbf{NP}$: any $P' \in \mathbf{NP}$ satisfies $P' \leq_P P$, so we can solve $P'$ by reducing it to $P$ and running our algorithm for $P$. Therefore $\mathbf P = \mathbf{NP}$.

Of course, an obvious question is whether such a problem exists. It sounds like a lot of work to show that a problem $P$ is $\mathbf{NP}$-complete, since one would have to show how to encode an arbitrary problem in $\mathbf{NP}$ into an instance of $P$ (how would this work for something like Vertex Cover?). But for now, let's suppose that there is an $\mathbf{NP}$-complete problem out there. By using transitivity of $\leq_P$, we get the following.

Let $P$ be $\mathbf{NP}$-complete and suppose $P' \in \mathbf{NP}$. If $P \leq_P P'$ then $P'$ is $\mathbf{NP}$-complete.

Let $Q \in \mathbf{NP}$. Since $P$ is $\mathbf{NP}$-complete, we have $Q \leq_P P$. But since $P \leq_P P'$, by transitivity, we have $Q \leq_P P'$. Since $Q$ was arbitrary, we have that $P'$ is $\mathbf{NP}$-complete.

So if we want to show that $P$ is $\mathbf{NP}$-complete, we can avoid having to show how to transform an instance of some arbitrary problem in $\mathbf{NP}$ into an instance of $P$. All we have to do is show that we can transform a problem we've already shown is $\mathbf{NP}$-complete into $P$, which is much easier. The other observation is that this means that all $\mathbf{NP}$-complete problems are polynomial time reducible to each other.
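To make the idea of transforming one problem into another concrete, here is a sketch of a standard mapping between two of the problems above (this particular fact is well known, though the lecture has not proved it): a graph on $n$ vertices has an independent set of size at least $k$ if and only if it has a vertex cover of size at most $n - k$, because the complement of an independent set is a vertex cover. The brute-force deciders are exponential and exist only to sanity-check the mapping on a tiny graph; they are not part of the reduction.

```python
from itertools import combinations

def reduce_is_to_vc(n, edges, k):
    """Map an Independent Set instance (n vertices, edge list, target k)
    to a Vertex Cover instance on the same graph with target n - k:
    S is an independent set iff the remaining vertices form a vertex cover."""
    return n, edges, n - k

# Brute-force deciders, only to sanity-check the mapping on a tiny graph.
def has_independent_set(n, edges, k):
    return any(
        all(not (u in S and v in S) for u, v in edges)
        for S in map(set, combinations(range(n), k))
    )

def has_vertex_cover(n, edges, k):
    return any(
        all(u in C or v in C for u, v in edges)
        for C in map(set, combinations(range(n), k))
    )

# Triangle: largest independent set has size 1, smallest vertex cover size 2.
n, edges = 3, [(0, 1), (1, 2), (0, 2)]
for k in range(n + 1):
    assert has_independent_set(n, edges, k) == has_vertex_cover(*reduce_is_to_vc(n, edges, k))
```

Note that the mapping itself is trivially computable in polynomial time, which is the other half of what a reduction requires.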

Of course, all of this only works if such a problem actually exists. That such a problem does exist was shown in 1971 by Stephen Cook, who we've seen before briefly when we were discussing fast integer multiplication. This is the result for which Cook is most famous. The theorem was independently proved by Leonid Levin, who at the time was in the Soviet Union. Levin is currently a faculty member at Boston University and Cook is a faculty member at the University of Toronto, which he joined in 1970 due to being denied tenure at Berkeley.

Satisfiability is $\mathbf{NP}$-complete.

One thing you might notice is that $\mathbf{NP}$ is not exactly an acronym for polynomially verifiable. That is because efficient verification is an alternate, but equivalent, definition for $\mathbf{NP}$. $\mathbf{NP}$ actually stands for nondeterministic polynomial time. The original definition has to do with distinguishing between problems that could be solved on deterministic vs. nondeterministic Turing machines.

For our purposes, we can think of it this way: we can solve a problem in $\mathbf{NP}$ in polynomial time using nondeterminism. How do we do this? First, we nondeterministically guess/choose a certificate $c$. Since $c$ is polynomial in size, we can do this in polynomial time. Then once we have our string, we can run the verifier in polynomial time to get our answer.
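We can sketch what a deterministic machine would have to do instead: try every candidate certificate, which takes exponential rather than polynomial time. The Subset Sum verifier below is our own toy example, not something from the lecture.

```python
from itertools import product

def decide_by_enumeration(V, x, max_len):
    """Deterministically simulate 'guess a certificate, then verify' by
    trying every binary certificate of length up to max_len. This takes
    exponential time, which is exactly what nondeterminism lets us avoid."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if V(x, "".join(bits)):
                return True
    return False

# Toy verifier for Subset Sum: the certificate selects a subset bit by bit.
def subset_sum_verifier(x, c):
    nums, target = x
    if len(c) != len(nums):
        return False
    return sum(n for n, bit in zip(nums, c) if bit == "1") == target

x = ([3, 5, 8, 13], 16)  # is there a subset summing to 16? (3 + 13 works)
print(decide_by_enumeration(subset_sum_verifier, x, len(x[0])))      # True
print(decide_by_enumeration(subset_sum_verifier, ([3, 5], 100), 2))  # False
```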

We won't go through the proof in detail here, because it involves Turing machines, but the proof is really amazing. So recall that a Turing machine is a model of computation and for all intents and purposes, you can consider a specific Turing machine as a computational device that computes the solution to a particular problem, much like an algorithm. Here, we're particularly interested in nondeterministic Turing machines.

Since $\mathbf{NP}$ is the class of problems that can be solved by a nondeterministic Turing machine in polynomial time, an easy way to show that every problem reduces to a particular problem is just to show how to simulate a nondeterministic Turing machine. This is basically what the proof of Cook-Levin does.

The idea is to take a nondeterministic Turing machine and an input string and to show how to construct a propositional formula that simulates the computation of the Turing machine on the input string. The proof involves showing the following:

  1. The formula is satisfiable if and only if the nondeterministic Turing machine accepts the input string.
  2. The formula can be constructed in time polynomial in the length of the input string.

The consequence of this is that if we can solve Satisfiability, we can solve any problem in $\mathbf{NP}$. How? Here is an algorithm: given an instance $x$ of a problem $Q \in \mathbf{NP}$, use the Cook-Levin construction to build, in polynomial time, a formula $\varphi$ that is satisfiable if and only if $x$ is a Yes instance of $Q$; then run our Satisfiability algorithm on $\varphi$ and return its answer.

So if we can solve Satisfiability in polynomial time, we can solve any problem in $\mathbf{NP}$ in polynomial time.
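The shape of this algorithm can be sketched as follows. Neither ingredient below is realistic as stated: the SAT decider is brute force (no polynomial-time one is known), and for illustration we reduce graph 2-colourability, a problem that is actually in $\mathbf P$, rather than carry out the full Cook-Levin construction.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Exponential-time CNF-SAT decider, standing in for the hypothesized
    polynomial-time Satisfiability algorithm."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def two_colouring_to_sat(n, edges):
    """Poly-time reduction: variable i is True iff vertex i-1 gets colour A.
    Each edge (u, v) forces different colours: (xu or xv) and (-xu or -xv)."""
    clauses = []
    for u, v in edges:
        clauses.append([u + 1, v + 1])
        clauses.append([-(u + 1), -(v + 1)])
    return clauses, n

def solve_via_sat(x, reduce_to_sat, sat_solver):
    """Solve an NP problem by mapping its instance to a formula, then
    handing that formula to the Satisfiability algorithm."""
    return sat_solver(*reduce_to_sat(x))

# A 4-cycle is 2-colourable; a triangle is not.
square = (4, [(0, 1), (1, 2), (2, 3), (3, 0)])
triangle = (3, [(0, 1), (1, 2), (0, 2)])
print(solve_via_sat(square, lambda x: two_colouring_to_sat(*x), brute_force_sat))    # True
print(solve_via_sat(triangle, lambda x: two_colouring_to_sat(*x), brute_force_sat))  # False
```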

The existence of an $\mathbf{NP}$-complete problem was obviously a huge breakthrough, and the natural follow-up question was what other $\mathbf{NP}$-complete problems might be out there. Since these are the hardest problems in $\mathbf{NP}$, the thinking was that we'd find a few more and it wouldn't be very long before we'd be able to show a clear separation from $\mathbf P$. Not long after, in 1972, Richard Karp (of Edmonds-Karp) published a paper showing that 21 problems were $\mathbf{NP}$-complete, some of which we've seen already. And since Cook-Levin and Karp, thousands of problems have been shown to be $\mathbf{NP}$-complete.

Practically speaking, this gives us a much easier way to prove that certain problems are $\mathbf{NP}$-complete (which, for now, means that we can give up on finding an efficient algorithm for them). To do this we need to show that our problem is in $\mathbf{NP}$ and that there is an $\mathbf{NP}$-complete problem that reduces to it. This can be summarized in four steps.

To prove that a problem $A$ is $\mathbf{NP}$-complete,

  1. Show that $A \in \mathbf{NP}$. In other words, describe how to verify Yes instances of $A$ in polynomial time, by describing an appropriate certificate and how you would verify it.
  2. Choose a problem $B$ that you already know is $\mathbf{NP}$-complete.
  3. Describe how to map instances of $B$ to instances of $A$ so that Yes instances map to Yes instances and No instances map to No instances.
  4. Show that your mapping can be computed in polynomial time.