CISC 462 — Lecture 14

P

We define $\mathbf P$ to be the class of languages that are decidable in polynomial time on a deterministic single-tape Turing machine. In other words, $$\mathbf P = \bigcup_{k \geq 1} \mathbf{TIME}(n^k).$$

Why do we care about polynomial time? Broadly speaking, we consider problems in $\mathbf P$ to be "feasible". Another common idea is that an algorithm with a polynomial bound is "fast" while an algorithm with an exponential bound is "slow". This is in line with our typical intuitive understanding of what it means for an algorithm to be "efficient", even before we've encountered any formal complexity theory. Of course, this is only roughly speaking, and there are some common objections: what about running times like $n^{1000}$ or $1.0000001^{n}$?

These objections are quite reasonable. And although it's bad form for a theoretician to appeal to an argument about practicality, it turns out that most natural problems solvable in polynomial time admit algorithms whose running times are bounded by low-degree polynomials. In fact, many of the results that initially gave high-degree polynomial bounds were eventually improved. A similar observation holds for exponential bounds: most problems that we run into that seem to require exponential time really are quite hard.

Here, we can begin to abstract away from Turing machines. Recall that the Church-Turing thesis says that every reasonable model of computation is essentially equivalent in power. In other words, they can all simulate each other and solve exactly the same kinds of problems. The extended Church-Turing thesis says that not only can they all simulate each other, but they can do so with only a polynomial increase in running time. (As it turns out, because of recent developments, we're now much less certain about this than about the standard Church-Turing thesis.)

These two ideas together give us good enough reasons to consider $\mathbf P$ a class that's central to our study of complexity. On the one hand, we have the notion that $\mathbf P$ encompasses the set of problems for which we have efficient algorithms. And on the other, we have that we can abstract the exact machine model we're using because we can simulate it all with only polynomial overhead.

So let's look at some problems and do some analysis. The first problem we'll look at is called $\mathrm{PATH}$. Given a directed graph $G$ and two vertices $s$ and $t$, we want to know whether there's a path from $s$ to $t$.

Theorem. $\mathrm{PATH}$ is in $\mathbf P$.

First, let's recall that a graph is a set of vertices and edges, so a reasonable encoding $\langle G \rangle$ is simply a list of its vertices and edges. The size of $G$ can then be measured by the number of its vertices and edges. If $G$ has $n$ nodes, then a simple path has length at most $n$. If we do a brute force search over sequences of at most $n$ nodes, we end up with roughly $n^n$ potential paths to check, which isn't polynomial in the size of the graph. Instead, we'll do the following.

Proof. The following is a polynomial time algorithm for $\mathrm{PATH}$.

  1. Mark node $s$.
  2. Repeat until no additional nodes can be marked: Scan all the edges of $G$. If an edge $(u,v)$ is found going from a marked node $u$ to an unmarked node $v$, mark $v$.
  3. If $t$ is marked, accept. Otherwise, reject.

To see that this algorithm runs in polynomial time, we note that the middle step is the only one that gets run more than once. Each repetition except the last marks at least one previously unmarked node, so the step repeats at most $n$ times, and each repetition scans every edge of $G$ once. Both quantities are polynomial in the size of $G$, so the whole algorithm runs in polynomial time. $$\tag*{$\Box$}$$
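The marking algorithm above can be sketched directly; this is an illustrative sketch, assuming the graph is given by its edge list of $(u,v)$ pairs (the function name and representation are my own, not from the lecture).

```python
def path(edges, s, t):
    """Decide PATH by the marking algorithm: accept iff t is reachable from s."""
    marked = {s}                       # 1. Mark node s.
    changed = True
    while changed:                     # 2. Repeat until no new nodes are marked.
        changed = False
        for (u, v) in edges:           #    Scan all the edges of G.
            if u in marked and v not in marked:
                marked.add(v)          #    Mark v if (u, v) leaves the marked set.
                changed = True
    return t in marked                 # 3. Accept iff t is marked.
```

The outer loop runs at most once per node and the inner loop once per edge, matching the polynomial bound in the proof.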

Next, we'll consider the problem of whether two given integers are relatively prime, $\mathrm{RELPRIME}$. Two numbers are relatively prime if 1 is the largest integer that divides both of them.

Theorem. $\mathrm{RELPRIME}$ is in $\mathbf P$.

Here, the algorithm seems quite straightforward: simply search through all of the potential divisors of both numbers. Unfortunately, this doesn't quite work; we have to be careful about what our input looks like. Our integers are given in binary, and an integer encoded with $n$ bits can be as large as $2^n$, so the algorithm we just proposed is exponential in the size of the input.
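To make the problem concrete, here is the naive divisor search, written only to illustrate why it fails (the function name is my own):

```python
def relprime_naive(x, y):
    """Trial division: the loop runs up to min(x, y) times, and an n-bit
    input can be as large as 2**n, so this is exponential in the input
    length n even though it looks simple."""
    for d in range(2, min(x, y) + 1):
        if x % d == 0 and y % d == 0:
            return False      # found a common divisor greater than 1
    return True
```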

Instead, we use the Euclidean algorithm. Recall that the greatest common divisor of $x$ and $y$ is the largest integer that divides both $x$ and $y$ and is denoted $\mathrm{gcd}(x,y)$. Then if $x$ and $y$ are relatively prime, we have $\mathrm{gcd}(x,y) = 1$.

Proof. The Euclidean algorithm is as follows.

  1. Given $\langle x,y \rangle$, where $x$ and $y$ are natural numbers in binary,
  2. Repeat until $y = 0$:
    1. Assign $x \gets x \bmod y$.
    2. Exchange $x$ and $y$.
  3. Output $x$.

Then the following algorithm solves $\mathrm{RELPRIME}$.

  1. Run the Euclidean algorithm on $\langle x,y \rangle$.
  2. If the result is 1, accept. Otherwise, reject.
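The two procedures above can be sketched as follows, assuming the inputs are ordinary Python integers (so the binary encoding is implicit):

```python
def euclid(x, y):
    """The Euclidean algorithm: returns gcd(x, y)."""
    while y != 0:            # 2. Repeat until y = 0:
        x = x % y            #    2a. Assign x <- x mod y.
        x, y = y, x          #    2b. Exchange x and y.
    return x                 # 3. Output x.

def relprime(x, y):
    """Accept iff x and y are relatively prime, i.e. gcd(x, y) = 1."""
    return euclid(x, y) == 1
```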

Essentially, we need to show that the Euclidean algorithm runs in polynomial time. The key claim is that every execution of $x \gets x \bmod y$ cuts the value of $x$ at least in half. First, note that $x \bmod y < y$, which means that $x > y$ after $x$ and $y$ are exchanged; so $x > y$ holds whenever the assignment is executed, except possibly the first time. Now, if $\frac x 2 \geq y$, then $x \bmod y < y \leq \frac x 2$ and $x$ is reduced by at least half. If $\frac x 2 < y$, then $x \bmod y = x - y < \frac x 2$ and again $x$ is reduced by at least half.

Because of the exchange, each of the original values of $x$ and $y$ is reduced by at least half every other time through the loop, so the maximum number of times the loop is executed is $\min\{2 \log_2 x, 2 \log_2 y\}$. Note that the length of the binary encoding of an integer $m$ is about $\log_2 m$ (precisely, $\lfloor \log_2 m \rfloor + 1$). Thus, the number of iterations executed in the Euclidean algorithm is $O(n)$, where $n$ is the length of the binary representation of $x$ and $y$, and each iteration performs only arithmetic that takes time polynomial in $n$. $$\tag*{$\Box$}$$
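We can check the logarithmic bound empirically with an instrumented version of the loop; this is just an illustration, not part of the proof.

```python
import math

def euclid_iterations(x, y):
    """Run the Euclidean algorithm, counting loop iterations."""
    count = 0
    while y != 0:
        x = x % y        # each assignment at least halves x (after the first)
        x, y = y, x      # exchange x and y
        count += 1
    return count

# Even for ten-digit inputs, the loop runs only a few dozen times,
# well within the 2 * log2(min(x, y)) + O(1) bound.
```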

Finally, we have the following result given without proof. You can find the proof in the textbook (Theorem 7.16).

Theorem. Every context-free language is a member of $\mathbf P$.

We showed that this was decidable a few weeks ago. As it turns out, that algorithm is not polynomial. Instead, the polynomial bound is achieved using a dynamic programming algorithm.
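The dynamic programming algorithm in question is essentially the CYK algorithm. Here is a sketch, assuming the grammar has already been converted to Chomsky normal form, where every rule has the form $A \to BC$ or $A \to a$; the representation of the grammar as `unit_rules` and `binary_rules` is my own illustration, not the textbook's.

```python
def cyk(w, start, unit_rules, binary_rules):
    """Decide whether the CNF grammar generates the string w.
    unit_rules: list of (A, a) pairs for rules A -> a.
    binary_rules: list of (A, B, C) triples for rules A -> BC.
    table[i][j] holds the set of variables generating w[i..j]."""
    n = len(w)
    if n == 0:
        return False  # handling a rule S -> eps is omitted in this sketch
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i in range(n):                       # substrings of length 1
        for (A, a) in unit_rules:
            if a == w[i]:
                table[i][i].add(A)
    for length in range(2, n + 1):           # longer substrings, bottom up
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):            # every split point
                for (A, B, C) in binary_rules:
                    if B in table[i][k] and C in table[k + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]
```

For a fixed grammar, filling the $O(n^2)$ table entries with $O(n)$ split points each takes $O(n^3)$ time, which is polynomial.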