CMSC 28000 — Lecture 22

Universality

It seems like Turing machines are a very powerful model. Having arbitrary access to unlimited amounts of memory allows us to carry out relatively complicated tasks, even if they take a bit of work to define. And this simple model seems to be able to simulate surprisingly complex modifications to itself.

So just how powerful are Turing machines? We will be seeing some very surprising answers to that question. However, we should begin with some observations.

First, recall that at the very beginning of the quarter, we made a critical assumption: every "thing" we would like to "do computation on" must have a string encoding. We will introduce some notation for this idea.

Let $A$ be an object. We denote by $\llbracket A \rrbracket$ the encoding of the object $A$ as a string.

Next, we observe that anything that we can describe as a string should have a reasonable encoding. This includes Turing machines—recall that a Turing machine $M$ is a 7-tuple $(Q, \Sigma, \Gamma, \delta, q_0, q_{\mathsf{acc}}, q_{\mathsf{rej}})$. This means that to encode a Turing machine, we just need to encode each of these seven components as a string.

Since states and tape symbols can be labelled by integers, and a transition is just a tuple of these labels, it's not hard to see how we can come up with a reasonable encoding. Kozen, Chapter 31, walks through this in a bit more detail, but it is not particularly earth-shattering stuff.
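To make this concrete, here is a sketch of one possible encoding in Python. The scheme and the name `encode_tm` are our own illustrative choices, not Kozen's actual encoding:

```python
def encode_tm(num_states, num_symbols, transitions):
    """Encode a Turing machine as a string, assuming states and tape
    symbols have already been labelled by integers. Each transition
    delta(q, a) = (p, b, d) becomes a comma-separated 5-tuple, with
    direction L encoded as 0 and R as 1."""
    parts = [str(num_states), str(num_symbols)]
    for (q, a), (p, b, d) in sorted(transitions.items()):
        parts.append(f"{q},{a},{p},{b},{0 if d == 'L' else 1}")
    return ";".join(parts)

# A tiny illustrative machine: from state 0 reading symbol 0,
# move to state 1, write 0, and move right.
enc = encode_tm(2, 2, {(0, 0): (1, 0, 'R')})
```

Any such scheme works, as long as encoding and decoding are mechanical; the details only matter up to our ability to recover the 7-tuple from the string.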

But the existence of an encoding for Turing machines means that we can necessarily compute and process encodings of Turing machines. This suggests that we could define a language \[A_{\mathsf{TM}} = \{ \llbracket M,w \rrbracket \mid \text{$M$ is a Turing machine and $M$ accepts $w$} \}\] This is a decision problem—the membership problem for Turing machines. It answers the question "Given the Turing machine $M$ and input string $w$, does $M$ accept $w$?"

Now that we've defined the language, the natural follow-up question is whether there exists a Turing machine that recognizes it. We will see that this is the case.

$A_{\mathsf{TM}}$ is recognizable.

This is not completely obvious. In fact, it's quite a profound idea. What this says is that there is a Turing machine out there that can take an encoding of a Turing machine and some input, and simulate the encoded Turing machine on the desired input word. Such a Turing machine is called a universal Turing machine.

To prove that $A_{\mathsf{TM}}$ is recognizable, we'll describe how a universal Turing machine could work.

We will describe a universal Turing machine $U$. On input $\llbracket M,w \rrbracket$, where $M$ is a Turing machine and $w$ is a string, $U$ operates as follows:

  1. Simulate $M$ on input $w$.
  2. If $M$ enters the accept state, accept; if $M$ ever enters the reject state, reject.

Note that because $U$ simulates another Turing machine, $U$ will loop forever if $M$ never enters the accept or reject state. Of course, this is a rather unsatisfying description of a very important concept, so let's take a closer look at the implementation.

Our universal Turing machine sets up its tape in the following way, for a Turing machine $M$ and input string $w = a_1 a_2 \cdots a_n$:

\[ \llbracket M, w \rrbracket \# \llbracket q_0 \rrbracket \# \llbracket \dot a_1 a_2 \cdots a_n \rrbracket \] That is, the tape is partitioned into three sections, separated by $\#$: the encoding of $M$ itself, the encoding of the current state (initially $q_0$), and the encoding of the simulated tape contents, where the dot over $a_1$ marks the position of the virtual tape head.

The operation of the machine applying one transition is as follows:

  1. Scan the current state
  2. Scan the tape symbol "under" the virtual tape head
  3. Scan the encoding of the Turing machine looking for the corresponding transition
  4. Update the state
  5. Write to the tape
  6. Update the location of the virtual tape head
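The six steps above are essentially an interpreter loop. Here is a minimal sketch in Python, where the "encoding" of $M$ is just a Python dictionary and the names (`simulate`, `delta`, and so on) are illustrative:

```python
def simulate(delta, q0, q_acc, q_rej, w, blank='_'):
    """Simulate a single-tape TM on input w. delta maps
    (state, symbol) -> (state, symbol, 'L' or 'R') and should be
    defined on every reachable pair. Like U, this loops forever
    if the machine never enters its accept or reject state."""
    tape = dict(enumerate(w))   # sparse tape: position -> symbol
    state, head = q0, 0
    while state not in (q_acc, q_rej):
        symbol = tape.get(head, blank)                     # read under the head
        state, tape[head], move = delta[(state, symbol)]   # look up and apply
        head += 1 if move == 'R' else -1                   # move the head
    return state == q_acc

# An illustrative machine over {0, 1} that accepts iff the input
# begins with 1:
delta = {
    ('q0', '1'): ('acc', '1', 'R'),
    ('q0', '0'): ('rej', '0', 'R'),
    ('q0', '_'): ('rej', '_', 'R'),
}
```

Running `simulate(delta, 'q0', 'acc', 'rej', '10')` accepts, while input `'01'` rejects.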


What this says is that it's possible to simulate the computation of a Turing machine. We can then confirm whether a string is accepted by that Turing machine—after all, we just run it. But what if the string isn't accepted by the machine? We have two possibilities: either $M$ halts and rejects $w$, in which case $U$ rejects, or $M$ loops forever, in which case $U$ loops forever as well and never gives an answer.

In essence, what this says is that in order to decide this language, we would need to do something more clever than just running the machine. Is this possible? We'll see that it is not.

But first, there are a few interesting observations we can make.

The Church-Turing thesis

We've studied a number of problems where we're essentially simulating other models of computation, but the idea of a universal Turing machine acts very much like our real general-purpose computers do. Each of our computers differs in various ways, but encodings, together with the power to decode them, allow us to simply write programs and have them work on every computer—in other words, software.

So there's a very clear analogy here to writing programs. Just as we can treat the programs we write as text files, we can treat the Turing machines as strings. But there is another analogy that many of you will be familiar with if you've taken a functional programming class: functions as data. In functional programming, we are used to the idea of treating functions as essentially another kind of value that can be passed around. The idea behind encoding Turing machines as strings is no different.

In fact, it's this idea that ties together all of the formalisms of the 1930s that tried to nail down a definition of effective computability: Church's $\lambda$-calculus, the general recursive functions of Gödel, Herbrand, and Kleene, Post's production systems, and Turing's machines.

These formalisms look very different. One of the issues back in the 1930s was that it wasn't entirely clear that these were all really just different ways of doing the same thing. We've seen hints of this phenomenon already: there's no real reason we should've expected, say, certain automata and grammars to be equivalent, but it turned out they are.

But there are some common themes: each formalism has a finite description, computes in simple mechanical steps, and places no a priori bound on how much time or space a computation may use.

What ended up tying all of these things together was the Turing machine. Recall that what Turing attempted to do was formalize how human computers might compute a problem. We can view the finite state system as a set of instructions we might give to a person, that tells them what to do when they read a particular symbol. We allow this person to read and write on some paper and we let them have as much paper as they want.

This gives us a fairly concrete notion of computation compared to things having to do with recursive functions. The clarity of the model and how it works made it easier to prove that these other systems were equivalent to Turing machines. This led to the observation that's now called the Church-Turing thesis.

Nowadays, the Church-Turing thesis is often paraphrased in the context of real computers and the notion of Turing-completeness of programming languages and other systems. However, the Church-Turing thesis is really a statement about the equivalence of those formalisms from the 30s. Since Turing machines are now accepted as the standard of computability, Turing-completeness is just the property of a particular system to be able to simulate a Turing machine, and implies that this system is capable of "real" computation.

Most people are introduced to the notion of Turing-completeness when thinking about programming languages and languages similar to programming languages. For instance, it's not surprising that your typical programming language is Turing-complete. Where things get more interesting is when dealing with those languages that we don't typically think of as "programming" languages, like $\TeX$ or C++ template metaprogramming. Of course, this has important theoretical uses, like showing that theoretical molecular or quantum models of computation can at least achieve the standard notion of computability. This also has less important theoretical uses, like showing that Magic: The Gathering or PowerPoint transitions also have the full power of computability.

Diagonalization

Now, we'll try to show that $A_{\mathsf{TM}}$ is undecidable. The crux of the idea has to do with the number of languages that exist compared to the number of Turing machines there are.

This will take us back to some basic set theory. Specifically, we'll be dealing with the size of infinite sets. In the late 1800s, Georg Cantor considered the question of sizes of infinite sets. For instance, if we had the set of natural numbers and the set of all even numbers, what can we say about their sizes?

Recall the following definitions about functions.

Let $A$ and $B$ be two sets and consider a function $f:A \to B$. We say $f$ is one-to-one or injective if $f(a) \neq f(b)$ whenever $a \neq b$. We say $f$ is onto or surjective if for every $b \in B$, there exists $a \in A$ such that $f(a) = b$. A function that is both one-to-one and onto is a correspondence or bijection.

We say two sets have the same size if there is a bijection between them. We say a set $A$ is countable if it is either finite or has the same size as $\mathbb N$.

Our intuitive understanding of infinity leads us to think that infinite sets are all the same size, since they all have infinitely many things in them. So we might expect that, if we try hard enough, we can always come up with a correspondence. As it turns out, this is not the case: there are infinite sets that are larger than $\mathbb N$. Such a set is said to be uncountable.

The trick that Cantor uses, the diagonal argument, is a proof by contradiction. Suppose that $\mathbb R$ were countable. Then there's a bijection $f : \mathbb N \to \mathbb R$ and we can list all the elements. We can then use this hypothetical list to construct a real number $x \in \mathbb R$ whose $n$th digit differs from the $n$th digit of $f(n)$. By construction, $x$ differs from every number in the list, which implies that $f$ could not have been a bijection after all.

We'll start by using it to show that unrecognizable languages exist—that is, languages for which no Turing machine exists that will recognize them. But first, here are some theorems about sets of strings.

For every alphabet $\Sigma$, the set of strings $\Sigma^*$ is countable.

The proof of this is not difficult: just take $f(n) = w$, where $w$ is the $n$th string in length-lexicographic order (shorter strings first, and strings of the same length in dictionary order), for some fixed order on $\Sigma$. In a sense, we're really just counting in base $|\Sigma|$ (almost, since ordinary base-$|\Sigma|$ notation doesn't account for strings with leading 0's).

But what about languages? The set of languages turns out to be uncountable. We'll adapt Cantor's diagonal argument for this proof.

For every alphabet $\Sigma$, the set of languages, $2^{\Sigma^*}$, is uncountable.

Suppose that $2^{\Sigma^*}$ is countable. We will make use of the countability of $\Sigma^*$. Since we assume that $2^{\Sigma^*}$ is countable, there exists a bijection $f : \Sigma^* \to 2^{\Sigma^*}$. In other words, we map each string over $\Sigma$ to a language, or a subset of $\Sigma^*$.

Now, define the language $D$ by \[D = \{w \in \Sigma^* \mid w \not \in f(w)\}.\] Since $D \subseteq \Sigma^*$ is a language and $f$ is a bijection, there exists a string $x \in \Sigma^*$ such that $f(x) = D$. So we can ask ourselves whether $x \in D$. We have \[x \in D \iff x \in f(x),\] since $D = f(x)$. But by definition of $D$, we have \[x \in D \iff x \not \in f(x).\] This gives us \[x \in f(x) \iff x \not \in f(x),\] or, in other words, \[x \in D \iff x \not \in D,\] which is a contradiction. Therefore, $2^{\Sigma^*}$ is not countable.

Why is this called a diagonalization argument? Consider the following table, where we have strings running down, and a list of languages running across. This table tells us, hypothetically, whether a particular string is in a particular language, which is something we can construct if we had $f$. Notice that $D$ is constructed by going along the diagonal: $x \in D$ if and only if $x \not\in f(x)$. The contradiction becomes clear when we try to look up the entry for $D$ on $w$, where $f(w) = D$.

\[
\begin{array}{c|cccccccc}
 & L_0 & L_1 & L_2 & L_3 & L_4 & \cdots & D & \cdots \\
 & f(\varepsilon) & f(0) & f(1) & f(00) & f(01) & \cdots & f(x) & \cdots \\
\hline
\varepsilon & \mathbf{1} & 1 & 0 & 1 & 1 & \cdots & 0 & \cdots \\
0 & 0 & \mathbf{0} & 1 & 0 & 1 & \cdots & 1 & \cdots \\
1 & 1 & 1 & \mathbf{1} & 0 & 0 & \cdots & 0 & \cdots \\
00 & 0 & 1 & 0 & \mathbf{0} & 1 & \cdots & 1 & \cdots \\
01 & 1 & 0 & 0 & 1 & \mathbf{1} & \cdots & 0 & \cdots \\
\vdots & & & & & & \ddots & \vdots & \\
x & 0 & 1 & 0 & 1 & 0 & \cdots & \mathbf{?} & \cdots \\
\vdots & & & & & & & \vdots & \ddots
\end{array}
\]
Here, a 1 in row $w$ and column $f(u)$ means $w \in f(u)$ (the particular entries are illustrative). The diagonal entries are shown in bold; the column for $D = f(x)$ flips each of them, so the entry marked $\mathbf{?}$ in row $x$ cannot be filled in consistently.

This proof might seem a bit discomforting because of the way that we defined $D$. And you are not alone—many (admittedly unserious) people are so uncomfortable with diagonalization arguments that they dispute that $|\mathbb R| > |\mathbb N|$. But even if you are uncomfortable with how we defined $D$, it has to exist! There has to be a language containing exactly the words we picked, and since it's a language, it has to show up somewhere in our bijection.

If you're familiar with the original diagonalization argument, another way to think about this is that the set of finite strings corresponds to the set of natural numbers. As we argued before, the strings over an alphabet of size $k$ can be thought of as encodings of numbers in base $k$. But what about languages?

There are a number of representations for real numbers, but the one we're interested in is the real number as an infinite sequence of digits. So if we imagine the real number $0.437$, there's really an infinite sequence of 0's following it. But this is the base-10 representation. If we go to base 2, then we have infinite binary sequences.

We can argue that languages—or, equivalently, their characteristic functions on strings—correspond to infinite binary sequences in the following way. Fix an alphabet $\Sigma$ with $|\Sigma| = k$. We've said already that strings over $\Sigma$ can be read as base-$k$ representations of natural numbers, so there is an $n$th string for every $n$. We can define an infinite binary sequence for a language $L$ by choosing the $n$th digit to be 1 if the $n$th string is in $L$ and 0 otherwise. This infinite binary sequence encodes our language.
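This correspondence is easy to compute for any language given by a membership predicate. A sketch, with illustrative names:

```python
def characteristic_bits(membership, n, alphabet=('0', '1')):
    """Return the first n bits of the infinite binary sequence encoding
    a language: bit i is 1 iff the i-th string (in length-lexicographic
    order) satisfies the membership predicate."""
    bits = []
    strings = ['']   # strings generated in length-lexicographic order
    i = 0
    while len(bits) < n:
        w = strings[i]
        bits.append(1 if membership(w) else 0)
        for a in alphabet:
            strings.append(w + a)   # extend the enumeration as we go
        i += 1
    return bits

# The language of even-length strings over {0, 1}: bits correspond to
# the strings epsilon, 0, 1, 00, 01, 10, 11, ...
even_length = lambda w: len(w) % 2 == 0
```

For the even-length language the sequence begins $1, 0, 0, 1, 1, 1, 1, \ldots$, and every language determines exactly one such infinite sequence.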

We can now use this fact to show that there are languages that are not recognizable. In other words, there are languages that have no Turing machine that recognizes them.

There exist languages that are not Turing-recognizable.

First, we need to show that the set of Turing machines is countable. Recall that the set of all strings $\Sigma^*$ is countable. Then we observe that every Turing machine has a string encoding $\llbracket M \rrbracket$, for some fixed encoding scheme. Now, not every string corresponds to a proper encoding of a Turing machine, but we can ignore or skip all of those. Thus, there are countably many Turing machines.

Now, suppose for contradiction that every language is recognizable. Then each language is recognized by some Turing machine, and we can define a mapping $f : \Sigma^* \to 2^{\Sigma^*}$ that assigns each (encoding of a) Turing machine to the language it recognizes; under our assumption, $f$ would be onto. However, we have shown that the set of all languages is uncountable, so no function from the countable set $\Sigma^*$ onto $2^{\Sigma^*}$ can exist: there are strictly more languages than there are Turing machines. This means there must exist some language that is not recognized by any Turing machine.

This might be a bit surprising, but it is also not quite satisfying. After all, who cares if there's some crazy problem out there that's not solvable? What if we only care about "real" problems?

We will show that there is at least one "useful" problem that is not possible to decide: membership for Turing machines, $A_{\mathsf{TM}}$.

Why is this problem useful? Recall that it is recognizable, just by being able to run the machine. That's not actually that exciting, just running it and checking the result. But what deciding the problem would say is that it is possible to determine the behaviour of the machine without running it and only by examining its description/encoding.

In modern terms, this is the difference between running a program and seeing what it does and reading the source code of a program and figuring out what it does, without having to run it.

An undecidability result tells us that such analysis is not possible—there is no algorithm that can take the source of any program and any input and tell us, without running the program, whether it will accept.