CISC 462 — Lecture 22

TQBF is PSPACE-complete

Theorem. $\mathrm{TQBF}$ is $\mathbf{PSPACE}$-complete.

Proof. First, we show that $\mathrm{TQBF} \in \mathbf{PSPACE}$. We can define a simple recursive algorithm $T$ that decides $\mathrm{TQBF}$ in polynomial space:

  1. On $\langle \varphi \rangle$, if there are no quantifiers, then $\varphi$ contains only constants and we simply evaluate it; accept if it's True and reject if it's False.
  2. If $\varphi = \exists x \, \psi$, then recursively call $T$ on $\psi$ with $x = 0$ and then with $x = 1$. If either of these accepts, then accept; otherwise, reject.
  3. If $\varphi = \forall x \, \psi$, then recursively call $T$ on $\psi$ with $x = 0$ and then with $x = 1$. If both of these accept, then accept; otherwise, reject.

We note that this algorithm has a recursion depth of $n$, one for each variable. At each level, we only store the value of one variable. Thus, the total space used is linear.
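The three cases above can be sketched as a short recursive evaluator. The tuple-based formula representation below is my own choice for illustration, not a standard encoding of $\langle \varphi \rangle$:

```python
def evaluate(phi, env=None):
    """Decide a fully quantified Boolean formula; space is proportional
    to the recursion depth, one level per quantified variable."""
    if env is None:
        env = {}
    if isinstance(phi, bool):          # base case: a constant
        return phi
    if isinstance(phi, str):           # a variable: look up its value
        return env[phi]
    op = phi[0]
    if op == 'exists':                 # try x = 0, then x = 1; either may work
        _, x, body = phi
        return any(evaluate(body, {**env, x: v}) for v in (False, True))
    if op == 'forall':                 # both x = 0 and x = 1 must work
        _, x, body = phi
        return all(evaluate(body, {**env, x: v}) for v in (False, True))
    if op == 'and':
        return evaluate(phi[1], env) and evaluate(phi[2], env)
    if op == 'or':
        return evaluate(phi[1], env) or evaluate(phi[2], env)
    if op == 'not':
        return not evaluate(phi[1], env)
    raise ValueError(f"unknown operator: {op}")
```

For instance, $\forall x \exists y \, (x \oplus y)$ is true, while $\forall x \forall y \, (x \vee y)$ is false, and the evaluator agrees on both.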

Now, we show that $\mathrm{TQBF}$ is $\mathbf{PSPACE}$-hard. Let $A$ be a language decided by a TM $M$ in space $n^k$. We give a polynomial-time reduction from $A$ to $\mathrm{TQBF}$, mapping an input string $w$ to a quantified Boolean formula $\varphi_w$ that is true if and only if $M$ accepts $w$.

Recall that in the proof of the Cook–Levin theorem, we constructed a formula by defining constraints for a tableau of a computation of the machine $M$ on a word $w$. We cannot directly reuse that construction here: a machine running in space $n^k$ can take exponentially many steps, so the tableau, and hence the resulting formula, would be exponential in size, exceeding the polynomial-time restriction on our reduction.

Of course, that assumes that we encode the constraints of each cell in the same way. However, we have some new tools that we can put to use: quantifiers. To see how we can do this, we consider the strategy that we used in the proof of Savitch's theorem. That is, we consider how to construct some general formula $\varphi_{c_1,c_2,t}$, which encodes whether the machine can reach a configuration $c_2$ from $c_1$ in $t$ steps. Once we can do this, we construct a formula that is true when the starting configuration can reach the accepting configuration in $2^{d\cdot f(n)}$ steps, where $f(n)$ is the space used by $M$.

Again, we perform this construction recursively. We first consider the base case, $t=1$. We check that either $c_1 = c_2$ or that $c_2$ is reachable from $c_1$ by a single transition of the machine $M$. This is done in exactly the same way as in the Cook–Levin theorem, checking that every $2 \times 3$ window of cells is "legal". Again, we observe that since the machine runs in $n^k$ space, the size of each configuration is $O(n^k)$, and so each of these formulas has size $O(n^k)$.

Then if $t>1$, we construct $\varphi_{c_1,c_2,t}$ recursively. To start, we consider the obvious construction $$ \varphi_{c_1,c_2,t} = \exists m_1 \left[\varphi_{c_1,m_1,\frac t 2} \wedge \varphi_{m_1,c_2,\frac t 2}\right].$$ Here, we write $m_1$ as shorthand for all of the variables $x_1, \dots, x_{O(n^k)}$ of some configuration $m_1$, so in reality, there are $O(n^k)$ variables being existentially quantified here. Then this formula checks that there exists some configuration $m_1$ that can be reached from $c_1$ in $\frac t 2$ steps and that it can reach $c_2$ in $\frac t 2$ steps. Then these two formulas can themselves be constructed recursively.

However, if we examine the size of the formulas: while we reduce $t$ by half at each level of recursion, we also double the number of subformulas, negating the size savings that we wanted. To solve this, we apply the universal quantifier to get the following formula $$ \varphi_{c_1,c_2,t} = \exists m_1 \forall (c_3,c_4) \in \{(c_1,m_1),(m_1,c_2)\} \left[\varphi_{c_3,c_4,\frac t 2}\right].$$ We take advantage of the fact that the two subformulas we create are nearly identical and put the universal quantifier to work, so that only one copy of the subformula appears per level. Of course, what we wrote is not exactly syntactically correct, but it's not too difficult to see that it can be rewritten "properly" with only $O(n^k)$ additional symbols per level, which does not change the analysis.
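Concretely, the shorthand of quantifying over a finite set can be expanded in the standard way (the variable names here are generic):
$$ \forall x \in \{y, z\} \, [\varphi] \;\equiv\; \forall x \left[ (x = y \vee x = z) \to \varphi \right], $$
where, for configurations, an equality such as $c_3 = c_1$ abbreviates the conjunction of biconditionals $\bigwedge_i (x_i \leftrightarrow y_i)$ over the corresponding Boolean variables of the two configurations.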

Then we can construct $\varphi_{c_{start},c_{accept},h}$, where $c_{start}$ is the starting configuration, $c_{accept}$ is the accepting configuration, and $h = 2^{d\cdot f(n)}$ for some constant $d$ chosen so that $2^{d\cdot f(n)}$ exceeds the number of configurations of $M$ on an input of length $n$. Each level of recursion adds a portion of the formula that is linear in the size of the configurations, so we get size $O(f(n))$ per level. There are $\log(2^{d\cdot f(n)}) = O(f(n))$ levels of recursion, so we get a formula of size $O(f^2(n))$. Since $f(n) = n^k$, we have a formula of size $O(n^{2k})$. $$\tag*{$\Box$}$$
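The size bound can be double-checked with a quick recurrence. Here is a sketch in Python, where `f_n` stands for $f(n)$ and the constant `c` for the per-level cost; both names are my own for illustration:

```python
def formula_size(t, f_n, c=1):
    """Rough symbol count for phi_{c1,c2,t}: each level contributes
    about c*f_n symbols (the quantified middle configuration plus the
    rewritten universal quantifier), and t is halved until the base
    case, whose window checks also cost about c*f_n."""
    if t <= 1:
        return c * f_n
    return c * f_n + formula_size(t // 2, f_n, c)
```

With $t = 2^{f(n)}$ there are $f(n)$ halvings before the base case, so the total comes out to roughly $f(n)^2$, matching the $O(f^2(n))$ bound; e.g. `formula_size(2**10, 10)` returns `110`.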

Logarithmic space

So far, we've been looking at more and more complex problems. Now, we'll go in the other direction and consider problems that can be solved in less space. In particular, we consider sublinear space bounds, something that we can't really do with time (why not?). To do so, we need to rethink our definitions of the Turing machine and of space, since under our current definitions, merely keeping the input on the tape already uses linear space.

We introduce a two-tape Turing machine model where one tape is a read-only input tape and the other is a read/write work tape. The input word is written on the read-only input tape, and its head can only move within the portion of the tape containing the input. The work tape works as normal, and only cells scanned on the work tape count toward the space complexity of the machine. For space bounds that are at least linear, this model has space complexity equivalent to the standard one-tape model (in fact, this model is the standard one used in some textbooks). For sublinear bounds, we only consider the two-tape model.

This new TM model means we have to modify our definition of a TM configuration.

If $M$ is a Turing machine that has a separate read-only input tape and $w$ is an input, a configuration of $M$ on $w$ consists of: the state, the contents of the work tape, and the positions of the two tape heads.

This new definition has a few consequences. First, we note that there are now as many as $n\cdot 2^{O(f(n))}$ configurations. It also allows us to extend some of our earlier results, which previously required a space bound of $f(n) \geq n$, to smaller bounds. For instance, Savitch's theorem now holds even for logarithmic space bounds under this definition.

We define $\mathbf L$ to be the class of languages that are decidable in logarithmic space on a deterministic Turing machine. That is, $$ \mathbf L = \mathbf{SPACE}(\log n).$$ Similarly, $\mathbf{NL}$ is the class of languages that are decidable in logarithmic space on a nondeterministic Turing machine, $$ \mathbf{NL} = \mathbf{NSPACE}(\log n).$$

What are some problems that are in logarithmic space? We'll take a look at some familiar ones.

A language in L

The context-free language $L = \{0^n 1^n \mid n \geq 0\}$ is in $\mathbf L$. It's easy to see that the familiar algorithm that goes back and forth across the tape marking off 0s and 1s runs in linear space. A conceptually simpler algorithm, though more tedious to describe mechanically, is to count the number of 0s and the number of 1s and compare. This method turns out to be the logarithmic-space algorithm we are looking for, since a binary counter holding a number up to $n$ uses only $O(\log n)$ bits.
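The counting algorithm can be sketched as follows; holding the two counts as Python integers mirrors keeping two binary counters of $O(\log n)$ bits on the work tape (the function name is my own):

```python
def in_language(w):
    """Decide {0^n 1^n : n >= 0} with two counters and one flag,
    scanning the input left to right without marking it."""
    zeros = ones = 0
    seen_one = False
    for ch in w:
        if ch == '0':
            if seen_one:          # a 0 after a 1: wrong form, reject
                return False
            zeros += 1
        elif ch == '1':
            seen_one = True
            ones += 1
        else:                     # not a string over {0, 1}
            return False
    return zeros == ones          # accept iff the counts match
```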

PATH is in NL

Another familiar problem, $\mathrm{PATH}$, turns out to be in $\mathbf{NL}$. Recall that $\mathrm{PATH}$ is the problem: Given a directed graph $G$ and vertices $s$ and $t$, is there a path from $s$ to $t$ in $G$? We showed before that $\mathrm{PATH}$ is in $\mathbf P$ by marking reachable vertices, but this requires linear space. We can give a nondeterministic logspace algorithm, but it is not known whether a deterministic logspace algorithm exists.

The nondeterministic algorithm works as follows: starting at the vertex $s$, nondeterministically guess the next vertex, keeping track only of the current vertex and a step counter. Both fit in $O(\log n)$ bits, whereas storing every vertex we had seen would exceed the logarithmic space bound. We nondeterministically choose each next vertex until either we reach the vertex $t$, in which case we accept, or we have run for $n$ steps, where $n$ is the number of vertices, in which case we reject on this branch: if $t$ is reachable from $s$, then some path of length less than $n$ exists, so some branch of the computation reaches $t$ within $n$ steps.
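One computation branch of this algorithm can be sketched as below, with `random.choice` standing in for the nondeterministic guess (the adjacency-list representation and names are my own). A `True` result witnesses a path, but a single `False` run proves nothing: nondeterministic acceptance asks whether *some* branch succeeds.

```python
import random

def one_branch(adj, s, t):
    """Simulate one branch of the NL algorithm for PATH on a directed
    graph given as adjacency lists. Only the current vertex and the
    loop counter are stored, O(log n) bits each."""
    n = len(adj)
    current = s
    for _ in range(n):                  # any path, if one exists, has < n edges
        if current == t:
            return True                 # reached t: this branch accepts
        if not adj[current]:
            return False                # dead end: this branch rejects
        current = random.choice(adj[current])   # the nondeterministic guess
    return current == t
```

On a graph where every vertex has at most one outgoing edge, the "guess" is forced, so the simulation is exact: a chain $0 \to 1 \to 2 \to 3$ yields `True` for $s=0$, $t=3$, and `False` when $t$ has no incoming path.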