CISC 462 — Lecture 16

NP-completeness

So far we've defined $\mathbf P$, which is the class of problems that encapsulates the notion of what we know is efficiently solvable, and $\mathbf{NP}$, which is the class of problems that encompasses those problems with solutions that can be efficiently verified. By definition, it's clear that $\mathbf P \subseteq \mathbf{NP}$.

This leads us to one of the greatest mysteries and unsolved problems in all of computer science: does $\mathbf P = \mathbf{NP}$? At first, it seems like it should be incredibly obvious that $\mathbf P \neq \mathbf{NP}$. Indeed, our intuition tells us that while certain problems should be very difficult to solve, verifying the solution to a problem should be much easier than solving it. It's for this reason that the overwhelming majority of computer scientists believe that $\mathbf P \neq \mathbf{NP}$.

And yet, we're still unable to definitively prove this apparently obvious fact. So far, the best that we've been able to show is the NTM to TM simluation we looked at last week. In other words, $$ \mathbf{NP} \subseteq \mathbf{EXPTIME} = \bigcup_{k \geq 0} \mathbf{TIME}(2^{n^k}).$$ This gap in knowledge is one of the reasons why the second part of this course is so long and why complexity theory is still an active research area. Much of what we'll be going through in the rest of the course was developed as a possible way to attack the P vs. NP question. A lot of this theory turns out to be quite interesting in its own right and also gave us some interesting practical tools to work with. Of course, you may have noticed that the real prize remains out of reach.

One of the most important early developments in the pursuit of answering this question is the notion of NP-completeness. NP-completeness gives us an extremely useful tool for studying complexity as well as an extremely easy way to show that $\mathbf P = \mathbf{NP}$. Of course, that we haven't been able to show this yet is just another mystery on top of the already huge mystery.

This tool is reducibility, which we have already encountered earlier in the context of computability. Once again, we will use this concept to show that various problems are as complex as some other problem that we already know the complexity of.

A function $f: \Sigma^* \to \Sigma^*$ is a polynomial time computable function if some polynomial time TM $M$ exists that halts with $f(w)$ on its tape when started on any input $w$.

A language $A$ is polynomial time mapping reducible to a language $B$, written $A \leq_P B$, if a polynomial time computable function $f:\Sigma^* \to \Sigma^*$ exists, where for every $w$, $$ w \in A \iff f(w) \in B.$$ The function $f$ is called the polynomial time reduction of $A$ to $B$.

Theorem. If $A \leq_P B$ and $B \in \mathbf P$, then $A \in \mathbf P$.

Proof. Let $M$ be the polynomial time algorithm deciding $B$ and $f$ be the polynomial time reduction from $A$ to $B$. We describe a polynomial time algorithm $N$ for deciding $A$:

On input $w$, compute $f(w)$.
Run $M$ on input $f(w)$ and output whatever $M$ outputs.

Note $M$ accepts $f(w)$ whenever $w \in A$ by definition of $f$. Then $N$ runs in polynomial time since the computation of $f$ and $M$ are both polynomial and composition of polynomials is polynomial. $$\tag*{$\Box$}$$

A language $B$ is $\mathbf{NP}$-complete if it satisfies two conditions:

$B \in \mathbf{NP}$, and
$A \leq_P B$ for all $A \in \mathbf{NP}$.

This leads us to the following results.

Theorem. If $B$ is $\mathbf{NP}$-complete and $B \in \mathbf P$, then $\mathbf P = \mathbf{NP}$.

Theorem. If $B$ is $\mathbf{NP}$-complete and $B \leq_P C$ for $C \in \mathbf{NP}$, then $C$ is $\mathbf{NP}$-complete.

Proof. We already know $C$ is in $\mathbf{NP}$. Since $B$ is NP-complete, every language in $\mathbf{NP}$ is polynomial time reducible to $B$. Note that polynomial time reductions compose. Then since $B \leq_P C$, this implies that every language in $\mathbf{NP}$ is polynomial time reducible to $C$. $$\tag*{$\Box$}$$

The implications of the above two results are quite significant. Basically, if we want to show that $\mathbf P = \mathbf{NP}$, then all we need to do is show that just one problem that is $\mathbf{NP}$-complete reduces to problem in $\mathbf P$, or equivalently, has a polynomial time algorithm. This leads to the natural question of just how many $\mathbf{NP}$-complete problems there are.

The most difficult part in this process is finding the "first" $\mathbf{NP}$-complete problem. After all, it's quite daunting to show that every problem in $\mathbf{NP}$ can somehow reduce to this first problem. But once we're able to get this one problem, then we can just show that other problems are $\mathbf{NP}$-complete by reducing from this first one. And as we get more examples of $\mathbf{NP}$-complete problems, we can reduce from any of the problems we already know are $\mathbf{NP}$-complete.

So how many are there now, in 2017? In 1971, Stephen Cook showed what was essentially the first $\mathbf{NP}$-complete problem. In 1972, Richard Karp published a paper that gave a list of 21 problems and showed that they were $\mathbf{NP}$-complete by showing the polynomial time reductions. In 1979, Garey and Johnson published a book which contained approximately 300 problems that were shown to be $\mathbf{NP}$-complete. Today, we don't really have an accurate count anymore, but the number of problems ranges in the thousands and tens of thousands.

Showing any one of these problems has a polynomial time algorithm is enough to show that $\mathbf P = \mathbf{NP}$.

Satisfiability

The first problem which was shown to be $\mathbf{NP}$-complete is called the satisfiability problem, or $\mathrm{SAT}$. This was shown by Stephen Cook in 1971 and independently by Leonard Levin in 1973.

Recall that a boolean variable can take on a value of True or False. A boolean formula is an expression involving boolean variables and the boolean operations AND, OR, and NOT. A boolean formula is satisfiable if some assignment of 0s and 1s to to the variables makes the formula evaluate to True.

$\mathrm{SAT}$ asks, given a boolean formula, whether or not it is satisfiable.

Theorem. (Cook-Levin) $\mathrm{SAT}$ is $\mathbf{NP}$-complete.