Earlier, we saw from the Chomsky hierarchy that the Type-0 or unrestricted grammars are those grammars that generate exactly the Turing-recognizable languages. We will try to work through this result.
We’ve seen already that Turing machine configurations can be represented using strings. We’ve also seen that rewriting a configuration into the configuration that follows from an application of the transition function actually isn’t that hard: we just have to find the state indicator and make the appropriate changes. This actually turns out to be relatively straightforward with a grammar.
For instance, for each transition $\delta(p,X) = (q,Y,\rightarrow)$, we know that this corresponds to the computation step $upXv \vdash uYqv$. With an unrestricted grammar, this can be represented as the rule $pX \to Yq$. Similarly, for $\delta(p,X) = (q,Y,\leftarrow)$, this looks like $uZpXv \vdash uqZYv$, which would be represented using the rule $ZpX \to qZY$, thoug we have to account for all possible $Z \in \Gamma$ since we’re moving to the left.
From this, it’s not hard to see that we can encode the transition function $\delta$ of a Turing machine as rules of an unrestricted grammar. Once we have these rules, we can then transform a starting configuration $q_0 w$ to some accepting configuration $u q_{\mathsf{acc}} v$. This accounts for most of the main “intellectual” core of the idea. What remains is a bit more technical.
Basically, there are two main problems to resolve, having to do with the core differences between the Turing machine and unrestricted grammar models.
So any proof of equivalence must bridge these two gaps. We’ll start with the issue of nondeterminism.
What we will try to do is define the notion of a nondeterministic Turing machine and then show that such a machine can be simulated using our ordinary deterministic Turing machine.
We’ve seen already that the Turing machine model is quite robust to changes. For instance, we discussed the notion of computable functions, for which the notion of acceptance and rejection doesn’t really make sense. In this case, one can modify the Turing machine so that instead of having an accepting or rejecting state, we have a halting state. This model is equivalent since we can interpret an output of $0$ to be “reject” and $1$ to be “accept”.
Another observation is that even the basic definition has variants:
We’ve also seen specialized machines, like the enumeration machine that takes no input, but lists all the strings in some Turing-recognizable language.
It is not hard to see that all of these models can be simulated by the basic Turing machine model (whichever one we decide on). Just like with DFAs and PDAs, it is worth asking what kinds of modifications we can make and still retain the same “power” of the machine.
One example of how we can modify the Turing machine model is to ask what if instead of using only a single tape, we allowed the Turing machine to have multiple tapes with tape heads that move independently of each other? That is, for $k$ tapes, we modify the definition so we have the transition function $\delta : Q \times \Gamma^k \to Q \times \Gamma^k \times \{L,R\}^k$.
Every language recognized by a multi-tape Turing machine can be recognized by a single tape Turing machine.
In fact, we’ve already seen this kind of idea when discussing the universal Turing machine. For that machine, we discussed a scheme to “divide” the tape. The idea of representing $k$ tapes proceeds much along the same lines.
We show how to simulate a multi-tape Turing machine on a single-tape Turing machine. The idea is to store all the contents of each tape on the single tape and separate each tape as required. We also keep track of the tape head positions on each of the virtual tapes.
What does this machine look like? We describe a Turing machine which simulates a $k$-tape TM $M$ as follows:
Since a single-tape Turing machine can obviously be simulated by a $k$-tape Turing machine, this tells us that the two models are equivalent.
Now, based on what we’ve discussed, the $k$-tape machine is really a convenience for us to think about how the machine stores information. As discussed in the proof, we could have accomplished this in the way we’ve been doing all along, by talking about reserved portions of the tape. In any case, we can now make use of this model to discuss how to simulate nondeterminism.
First, what does a nondeterministic Turing machine look like? In a nondeterministic Turing machine, the transition function is a function $\delta: Q \times \Gamma \to 2^{Q \times \Gamma \times \{L,R\}}$. That is, our transition function maps onto a set of possible transitions.
As with other nondeterministic models, we can view the computation of a nondeterministic Turing machine as a tree, where each branch represents a possible choice or guess. Then a nondeterministic Turing machine accepts a word if at least one branch of the computation tree enters an accepting state. It’s important to note that, as with other nondeterministic models, a nondeterministic Turing machine $M$ on an input string $w$
Every nondeterministic Turing machine has an equivalent deterministic Turing machine.
The idea behind this proof is to construct a Turing machine that performs a breadth-first search of the computation tree. Recall that the transition function of a nondeterministic Turing machine is a function $\delta: Q \times \Gamma \to 2^{Q \times \Gamma \times \{\leftarrow, \rightarrow\}}$. Since $Q$ and $\Gamma$ are both finite, any single transition has a finite number of choices, so every node in the tree is guaranteed to have finitely many children.
Since we do a breadth-first search, if the machine accepts the input string, the branch containing an accepting configuration will be finite and we will eventually find it. Similarly, if the machine rejects the input string, we will eventually find all rejecting configurations. And if the machine runs forever, then the search will continue forever.
We use a 3-tape deterministic machine to simulate the nondeterministic machine $N$. We can do this because we have just conveniently showed that a $k$-tape machine is equivalent to a single tape machine. The tapes are used as follows:
The third tape is what will allow us to navigate through the tree. Let $k = \max\limits_{q \in Q, a \in \Gamma} |\delta(q,a)|$. This is the largest number of choices for any single transition in the machine. For each node, we assign a value to each child from $1$ to $k$. Then by considering a string over the alphabet $\{1,\dots,k\}$, we follow a particular branch at each set of choices, starting from the root of the tree.
Note that not every string over $\{1,\dots,k\}$ will correspond to a valid address, since not every nondeterministic set of choices is guaranteed to have as many as $k$ choices. But this is fine, since that address simply will not correspond to a node. Then by enumerating through every string, we are guaranteed to reach every node of the computation tree.
Now, the machine operates as follows:
Something you might notice or object to is that simulating a nondeterministic Turing machine with a deterministic machine can potentially take a very long time. Since we’re performing a breadth-first search, this means that we’re looking at a computation that’s exponential in the length of the shortest accepting path in the nondeterministic computation tree. But that’s okay, because we only care whether our deterministic machine is able to recognize or decide any language that a nondeterministic machine is capable of recognizing.
But for argument’s sake, let’s suppose that we do care about the time that it takes. We can measure this: how many steps is the computation on our machine? We can apply our notions of efficiency from other CS courses and consider polynomially many steps to be good or efficient. So suppose that I have a nondeterministic TM that has an accepting computation branch that’s polynomially long in the size of the input. Is it possible to simulate or come up with a way to solve the same problem deterministically, but still end up with only a polynomially long computation?
The answer is: we don’t know! Because this question is exactly the P vs. NP problem.
Since we now know that nondeterministic and deterministic Turing machines are equivalent, we can use a nondeterministic machine as our basis for discussion and take advantage of nondeterminism. This will make it much easier to think through how the operation of the machine aligns with the behaviour of the grammar.
We’ll start with the easier case: given an unrestricted grammar, show that a Turing machine can recognize the language of the grammar.
Let $G = (V, \Sigma, P, S)$ be an unrestricted grammar. Then $L(G)$ is Turing-recognizable.
Construct a nondeterministic Turing machine that does the following:
Recall that when generating a string using a grammar, there are two possible nondeterministic choices for each step of a derivation:
We use a nondeterministic machine to handle these two choices. The result is a machine that nondeterministically applies productions. If $w$ is a string that is generated by $G$, there must exist a derivation $S \Rightarrow^* w$ in $G$ and so there must exist a series of nondeterministic choices that our Turing machine can apply. So our machine will accept the string.
On the other hand, if $G$ does not generate $w$, then there will be no series of choices that our machine will apply that generates $w$. In this case, the machine will either run out of possible rules to apply or run forever. In either of these case, the machine does not accept $w$.
Showing that a Turing machine can simulate a grammar turned out not to be too difficult, by taking advantge of the ability of a Turing machine to write strings and nondeterministically choose strings for granted. What is less obvious is showing that a grammar can simulate a Turing machine.
One way to do this is to meet in the middle: instead of simulating a Turing machine that accepts a language, we instead simulate a Turing machine that nondeterministically generates a string from the language. This is not too different from how we defined a Turing machine that enumerates all strings from a language.
Let $L$ be a Turing-recognizable language. Then there exists an unrestricted grammar $G$ such that $L(G) = L$.
Let $M$ be a Turing machine that recognizes $L$. Using $M$, we construct a nondeterministic Turing machine $M'$ that nondeterministically “generates” a string from $L$ in the following way:
One way to think about this machine is that for each string $w \in L$, there is a branch of computation that successfully simulates a computation of $w$ on $M$ and ends with $w$ remaining on the tape. For strings $w \not \in L$, there will be no successful computations of $M$ on $w$ so these will either crash/reject or run forever.
One issue here is how to handle the erasing of the work tape in the final step. This issue arises if $M$ ever writes blanks—we need to get rid of any that our machine wrote but not the infinitely many that our machine never encountered. The trick is to make sure that $M'$ never writes a blank ($\Box$) to the tape by making it write something like $\boxdot$ instead. This way, we can distinguish truly blank tape cells and cells in which we’ve overwritten and we don’t try to erase the infinitely many blank cells that were never visited.
Based on this, we can construct the following grammar to simulate $M'$, the nondeterministic generator for $L$. \begin{align*} S &\to q_0 S_1 \\ S_1 &\to \Box S_1 \mid T \\ pX &\to Yq &\text{for each $(q,Y,\rightarrow) \in \delta(p,X)$} \\ ZpX &\to qZY &\text{for each $Z \in \Gamma$ and $(q,Y,\leftarrow) \in \delta(p,X)$} \\ q_{\mathsf{acc}} \boxdot &\to q_{\mathsf{acc}} \\ q_{\mathsf{acc}} \Box &\to q_{\mathsf{acc}} \\ q_{\mathsf{acc}}T &\to \varepsilon \end{align*}
We can split these rules up into three groups:
The equivalence of unrestricted grammars and Turing machines ties a nice bow on many ideas we’ve seen throughout the quarter.
One observation about this is that very little machinery is actually required for computation. The Turing machine and unrestricted grammar models are not extremely complicated. They actually have fairly small, well-defined sets of rules. Despite this, they can express the full power of computation. This is one of the surprising consequences of the Church–Turing thesis and why it’s easy to run into mechanisms that can perform computation. In this sense, it is a bit surprising that we’ve basically hit the limits of computation even on relatively simple systems.
Computation is just strings, which is why computation is everywhere, but we still can’t compute everything.