CS 360 — Lecture 20

Time-bounded computation

Let $f$ and $g$ be functions $f,g:\mathbb N \to \mathbb R$. We say $f(n) = O(g(n))$ if there exist positive constants $c$ and $n_0$ such that for all $n \geq n_0$, $f(n) \leq c\cdot g(n)$. We say that $g(n)$ is an upper bound for $f(n)$.

Let $f$ and $g$ be functions $f,g : \mathbb N \to \mathbb R$. We say that $f(n) = o(g(n))$ if $$ \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0.$$ That is, $f(n) = o(g(n))$ means that for any real number $c > 0$, there exists $n_0$ such that $f(n) < c \cdot g(n)$ for all $n \geq n_0$.
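To make the little-$o$ definition concrete, here's a quick numeric illustration in Python; the particular $f$ and $g$ are just examples of my choosing:

```python
def ratio(f, g, n):
    """f(n) / g(n): if this tends to 0 as n grows, then f(n) = o(g(n))."""
    return f(n) / g(n)

f = lambda n: 5 * n * n   # f(n) = 5n^2
g = lambda n: n ** 3      # g(n) = n^3

for n in (10, 100, 1000, 10000):
    print(n, ratio(f, g, n))   # the ratio is 5/n, shrinking toward 0
```

For any $c > 0$, once $n > 5/c$ the ratio stays below $c$, which is exactly the $n_0$ promised by the definition.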

Let $T: \mathbb N \to \mathbb R$ be a function. The time complexity class $\mathbf{TIME}(T(n))$ is the collection of all languages that are decidable by an $O(T(n))$ time Turing machine.

Now that we have this notion of time based on the single tape deterministic Turing machine, it is worth asking whether the same properties apply to other models. We showed before that all of the augmentations that we made to the Turing machine didn't change its power. That is, a decidable or recognizable problem for one of the machines was also decidable or recognizable on the standard model. However, just because we can simulate a fancier machine doesn't mean that we can do so at no cost.

Theorem. Let $T(n)$ be a function, where $T(n) \geq n$. Then every $T(n)$ time multitape Turing machine has an equivalent $O(T^2(n))$ time single-tape Turing machine.

Proof. Let $M$ be a $k$-tape TM that runs in time $T(n)$. Then we construct $M'$ which has a single tape.

Recall how the single tape machine worked. We put all the tapes on one tape separated by $\#$ and with marked virtual tape heads. The machine makes two passes over the entire tape to check and update the tape heads on each simulation of a computation step on $M$.

How much time does this take? It depends on how much tape is used by the virtual tapes. Since $M$ ran in $T(n)$ time, each virtual tape has size at most $T(n)$. This gives us a total tape length of $O(T(n))$. Since each simulation step passes over the entire tape twice, we get $O(T(n))$ time for each simulation step.

Since we are simulating $M$, we will be simulating the $T(n)$ steps that $M$ takes. Each simulated step takes $O(T(n))$ time, giving us a total of $O(T^2(n))$ steps on $M'$. $$\tag*{$\Box$}$$

This is a fundamental result of Hartmanis and Stearns from 1965, and it is what started the study of computational complexity.

The Time Hierarchy

By this point in your career, you're familiar with analyzing the performance of algorithms, and that translates fairly easily into the notion of problems requiring a certain amount of time. However, we haven't actually proved one very important assumption we've been making all along: that we can solve more problems if we're given more time. It's pretty obvious that for, say, $n^2$ and $n^3$, we can solve an $n^2$ time problem with $n^3$ time. However, we haven't shown that there is an $n^3$ time problem that can't be solved using $n^2$ time.

It seems reasonable that if we have more time, we should be able to solve more problems. Intuitively, if I have two functions, one which grows faster than the other, the time complexity class for the faster growing function should have more problems in it. Of course, things are not so simple.

Theorem. (Borodin) Suppose $h$ is a recursive function such that $h(n) \geq n$. Then there exists an increasing recursive function $g$ such that $$\mathbf{TIME}(g(n)) = \mathbf{TIME}(h(g(n))).$$

This result is called the Gap theorem and was shown by Borodin in 1972. Essentially, this result says that there exists some function $g(n)$ where any $h(g(n))$ time bounded computation can be performed in $g(n)$ time; it doesn't matter that we allow $h(g(n))$ time, even if we had, say, $h(n) = 2^n$. Naturally, this goes against our intuition, and so we need a reasonable restriction on the kinds of functions we consider.

A function $f : \mathbb N \to \mathbb N$, where $f(n) = \Omega(n \log n)$, is time constructible if the function that maps the string $1^n$ to the binary representation of $f(n)$ is computable in time $O(f(n))$.

Most of the functions that you'd expect to run into are time constructible. According to the definition, we only consider functions that are at least $n \log n$. For example, to see that $n \sqrt n$ is time constructible, we first count the number of $1$s in binary, which requires $O(n \log n)$ time since $O(\log n)$ steps are required for each of $n$ positions. Then, compute $\lfloor n \sqrt n \rfloor$ in binary via the binary representation of $n$. This is possible in $O(n \log n)$ time since the length of the numbers is $O(\log n)$.
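As a sanity check on the arithmetic (this verifies the value being computed, not the Turing machine time bound), the map from $n$ to the binary representation of $\lfloor n \sqrt n \rfloor$ is easy to compute exactly, since $n \sqrt n = \sqrt{n^3}$; here's a Python sketch:

```python
import math

def f_in_binary(n):
    # floor(n * sqrt(n)) = floor(sqrt(n^3)), computed exactly with isqrt
    return bin(math.isqrt(n ** 3))[2:]

print(f_in_binary(4))    # 4 * sqrt(4) = 8, which is '1000' in binary
print(f_in_binary(100))  # floor(100 * sqrt(100)) = 1000
```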

Notice that time constructibility is a condition defined in terms of a Turing machine computing the function. This gives us a way to bound the running time of a Turing machine: first compute $f(n)$ and then use it as a step counter. Taking this into consideration, if we restrict our attention to those functions that are time constructible (i.e. reasonable), we're able to say something about time complexity classes.

Theorem. For any time constructible function $f:\mathbb N \to \mathbb N$, a language $A$ exists that is decidable in $O(f(n))$ time but not in $o \left( \frac{f(n)}{\log f(n)} \right)$ time.

Instead of proving this formally (which would take a while), let's just do a quick outline of the idea of the proof and ignore the $\log$ factor. It turns out to be very similar to the kinds of things we were doing to show undecidability of certain languages.

Suppose we have a time-constructible function $f : \mathbb N \to \mathbb N$ and the Turing machine $D$ defined by

  1. On input $w = \langle M \rangle 1 0^k$, where $M$ is a Turing machine and $k \in \mathbb N$, compute $t = f(|w|)$.
  2. Simulate $M$ on $w$ for $t$ steps.
  3. If $M$ rejects, then accept. If $M$ accepts, then reject. If $M$ does not halt within $t$ steps, reject.

So what does $D$ do? It simulates a given Turing machine within time $O(f(n))$. If the simulation uses more time, $D$ rejects. If the simulation successfully finishes, then $D$ does the opposite of $M$.
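The behavior of $D$ can be sketched in Python if we model a "machine" as a generator that yields once per step and returns accept/reject when it halts; this is an illustrative stand-in for a real step-bounded Turing machine simulation, and all the names here are my own:

```python
def f(n):
    return n * n   # a sample time bound; assumed time constructible

def D(machine, w):
    """Simulate `machine` on w for t = f(|w|) steps and do the opposite.
    A "machine" is a generator yielding once per step and returning
    True (accept) or False (reject) when it halts."""
    t = f(len(w))
    sim = machine(w)
    for _ in range(t):
        try:
            next(sim)
        except StopIteration as halt:
            return not halt.value   # halted within t steps: invert
    return False                    # exceeded the step budget: reject

def always_accept(w):
    yield   # a single "step"
    return True

print(D(always_accept, "0000"))  # always_accept accepts, so D rejects: False
```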

So suppose that $L(D)$ is decidable in time $g(n) = o(f(n))$ by some $g(n)$-time Turing machine $M$. So for an appropriately large $n_0$, we feed $\langle M \rangle 1 0^{n_0}$ into $D$ and see what happens. Since $M$ runs in time $g(n) = o(f(n))$, we know the simulation on $M$ will finish and $M$ will either accept or reject and $D$ will do the opposite. Thus, we have $w \in L(D) \iff w \not \in L(M)$. But $M$ was supposed to decide the same language as $D$, so this is a contradiction, and therefore $L(D)$ isn't decidable in $o(f(n))$ time.

The formal proof is along these lines, but there are many details concerning how you'd carry out this kind of simulation and verify that everything can be done within the claimed time bounds; this is essentially where the $\log$ factor comes from.

This result is called the time hierarchy theorem, since in showing the existence of problems that can be solved in $O(f(n))$ time but not $o(f(n))$ time, it establishes the hierarchy of time complexity classes that we've been assuming existed. This is another result of Hartmanis and Stearns from 1965 and one that won them a Turing award.

Corollary. For any two functions $f_1,f_2:\mathbb N \to \mathbb N$, where $f_1(n)$ is $o \left( \frac{f_2(n)}{\log f_2(n)} \right)$ and $f_2$ is time constructible, $\mathbf{TIME}(f_1(n)) \subsetneq \mathbf{TIME}(f_2(n))$.

Corollary. For any two real numbers $1 \leq \varepsilon_1 < \varepsilon_2$, $$\mathbf{TIME}(n^{\varepsilon_1}) \subsetneq \mathbf{TIME}(n^{\varepsilon_2}).$$

As a final fun and strange, unintuitive result, let's think back to coming up with algorithms to solve problems. As you have probably experienced, you are often presented with problems and asked to come up with good algorithms to solve them. The recurring question in algorithm design is "can we do better"? We have plenty of examples from various classes, where you are confronted with a problem and iteratively refine an algorithm until you can't do any better.

But the question of when to stop is not exactly trivial. For many well-studied problems, we still don't really know if we have the best algorithm. Matrix multiplication is a great example: the standard algorithm is $O(n^3)$, and we've spent decades shaving the exponent down, but it's still not known whether we can get it all the way down to $O(n^2)$.

It turns out that this is not that big of a mystery and it was shown that there do exist problems for which no fastest algorithm exists. This result is called Blum's speedup theorem and was shown by Blum in 1967.

Theorem. (Blum) There exists a computable function $P$ such that for any algorithm computing $P$ in time $T(n)$, there exists another algorithm computing $P$ in time $O(\log T(n))$ for all but finitely many $n$.

Of course, we may not need to worry about these kinds of problems so much after all if we think back to the time hierarchy theorem.

P

We define $\mathbf P$ to be the class of languages that are decidable in polynomial time on a deterministic single-tape Turing machine. In other words, $$\mathbf P = \bigcup_{k \geq 1} \mathbf{TIME}(n^k).$$

The notion of characterizing problems by computability in polynomial time was first noted by Cobham in 1964. The most obvious question to come from this is why we should care about polynomial time. Broadly speaking, we consider problems in $\mathbf P$ to be "feasible". Another common idea is that an algorithm with a polynomial bound is "fast" while an algorithm with an exponential bound is "slow". This is in line with our typical intuitive understanding of what it means for an algorithm to be "efficient" if we haven't encountered any formal complexity theory. Of course, this is roughly speaking, and there are some common objections like: what about $n^{1000}$ or $1.0000001^{n}$?

These objections are quite reasonable. And although it's bad form for a theoretician to appeal to an argument about practicality, it turns out that most natural problems solvable in polynomial time admit solutions whose complexity is bounded by low-degree polynomials. In fact, many algorithms that initially had high-degree polynomial bounds were eventually improved. A similar observation holds in the other direction: most problems that we run into that have exponential bounds really are quite hard.

Here, we can begin to abstract away from Turing machines. Recall that the Church-Turing thesis says that every reasonable model of computation is essentially equivalent in power. In other words, they can all simulate each other and solve exactly the same kinds of problems. The extended Church-Turing thesis says that not only can they all simulate each other, but they can do so with only a polynomial increase in running time. (As it turns out, because of recent developments in quantum computing, we're now much less certain about this than we are about the standard Church-Turing thesis.)

These two ideas together give us good enough reasons to consider $\mathbf P$ a class that's central to our study of complexity. On the one hand, we have the notion that $\mathbf P$ encompasses the set of problems for which we have efficient algorithms. And on the other, we have that we can abstract the exact machine model we're using because we can simulate it all with only polynomial overhead.

So let's look at some problems and do some analysis. The first problem we'll look at is called $\mathrm{PATH}$. Given a directed graph $G$ and two vertices $s$ and $t$, we want to know whether there's a path from $s$ to $t$.

Theorem. $\mathrm{PATH}$ is in $\textbf P$.

First, let's recall that a graph is a set of vertices and edges, so $\langle G \rangle$ is simply a list of its vertices and edges, and the size of $G$ can be taken to be the number of its vertices and edges. If $G$ has $n$ nodes, then a path has length at most $n$. If we do a brute-force search, we end up with roughly $n^n$ possible paths to check, which is exponential, not polynomial, in the size of the graph. Instead, we'll do the following.

Proof. The following is a polynomial time algorithm for $\mathrm{PATH}$.

  1. Mark node $s$.
  2. Repeat until no additional nodes can be marked: Scan all the edges of $G$. If an edge $(u,v)$ is found going from a marked node $u$ to an unmarked node $v$, mark $v$.
  3. If $t$ is marked, accept. Otherwise, reject.

To see that this algorithm runs in polynomial time, note that the middle step is the only one that runs more than once. It executes at most $n$ times, since every iteration except the last marks at least one previously unmarked node, and each iteration scans all the edges of $G$. The total running time is therefore polynomial in the size of $G$. $$\tag*{$\Box$}$$
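The marking algorithm above translates directly into Python; the graph representation (an edge list over nodes $0,\dots,n-1$) is my own choice:

```python
def path(edges, s, t):
    """Decide PATH by the marking algorithm: repeatedly scan all edges,
    marking any node reachable from a marked node."""
    marked = {s}                    # step 1: mark s
    changed = True
    while changed:                  # step 2: repeat until nothing new
        changed = False
        for u, v in edges:          # scan all the edges of G
            if u in marked and v not in marked:
                marked.add(v)
                changed = True
    return t in marked              # step 3: accept iff t is marked

edges = [(0, 1), (1, 2), (3, 2)]
print(path(edges, 0, 2))  # True: 0 -> 1 -> 2
print(path(edges, 0, 3))  # False: 3 is not reachable from 0
```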

Next, we'll consider the problem of whether two given integers are relatively prime, $\mathrm{RELPRIME}$. Two numbers are relatively prime if 1 is the largest integer that divides both of them.

Theorem. $\mathrm{RELPRIME}$ is in $\mathbf P$.

Here, the algorithm seems quite straightforward: simply search through all of the divisors for both numbers. Unfortunately, this doesn't quite work; we have to be careful about what our input looks like. If our integers are given as binary (which they are), then the algorithm we just proposed is exponential in the size of the input.

Instead, we use the Euclidean algorithm. Recall that the greatest common divisor of $x$ and $y$ is the largest integer that divides both $x$ and $y$ and is denoted $\mathrm{gcd}(x,y)$. Then if $x$ and $y$ are relatively prime, we have $\mathrm{gcd}(x,y) = 1$.

Proof. The Euclidean algorithm is as follows.

  1. Given $\langle x,y \rangle$, where $x$ and $y$ are natural numbers in binary,
  2. Repeat until $y = 0$:
    1. Assign $x \gets x \bmod y$.
    2. Exchange $x$ and $y$.
  3. Output $x$.

Then the following algorithm solves $\mathrm{RELPRIME}$.

  1. Run the Euclidean algorithm on $\langle x,y \rangle$.
  2. If the result is 1, accept. Otherwise, reject.
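These steps are a direct transcription into Python; as a sketch, the input is taken as Python integers rather than binary strings:

```python
def euclid(x, y):
    """The Euclidean algorithm: repeat until y = 0,
    assigning x <- x mod y and then exchanging x and y."""
    while y != 0:
        x = x % y
        x, y = y, x
    return x

def relprime(x, y):
    # accept exactly when gcd(x, y) = 1
    return euclid(x, y) == 1

print(relprime(10, 21))  # True: gcd(10, 21) = 1
print(relprime(12, 21))  # False: gcd(12, 21) = 3
```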

Essentially, we need to show that the Euclidean algorithm runs in polynomial time. The key claim is that every execution of $x \gets x \bmod y$ cuts the value of $x$ at least in half. First, note that $x \bmod y < y$, which means that after $x$ and $y$ are exchanged, we have $x > y$ at the start of each execution. If $\frac x 2 \geq y$, then $x \bmod y < y \leq \frac x 2$ and $x$ is reduced by at least half. If $\frac x 2 < y$, then $x \bmod y = x - y < \frac x 2$ and $x$ is again reduced by at least half.

Since the values of $x$ and $y$ are each reduced by at least half every other time through the loop, the maximum number of times the loop is executed is $\min\{2 \log_2 x, 2 \log_2 y\}$. Note that the length of the binary encoding of an integer $m$ is about $\log_2 m$. Thus, the number of iterations executed in the Euclidean algorithm is $O(n)$, where $n$ is the length of the binary representations of $x$ and $y$. $$\tag*{$\Box$}$$
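The $\min\{2 \log_2 x, 2 \log_2 y\}$ bound can also be checked empirically; in this small Python experiment, the choice of random range and trial count is arbitrary:

```python
import math
import random

def euclid_steps(x, y):
    # count loop iterations of: x <- x mod y, then exchange x and y
    steps = 0
    while y != 0:
        x, y = y, x % y
        steps += 1
    return steps

random.seed(0)
within_bound = all(
    euclid_steps(x, y) <= 2 * math.log2(min(x, y))
    for x, y in ((random.randrange(10**6, 10**9),
                  random.randrange(10**6, 10**9)) for _ in range(1000))
)
print(within_bound)  # True: iteration counts respect the bound
```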

Finally, we have the following results given without proof.

Theorem. Every context-free language is in $\mathbf P$.

Theorem. Every regular language is decidable in time $o(n \log n)$.