There is a similar result characterizing undecidable properties of languages for subclasses of the decidable languages (i.e. context-free languages, context-sensitive languages, etc.), called Greibach's theorem. As you might expect, this was shown by Sheila Greibach in 1963 (you may remember Greibach being mentioned when we were talking about normal forms for grammars).
We won't prove Greibach's theorem, but we will show that some problems about context-free languages are undecidable. The proofs for these kinds of problems are going to be much trickier, since we don't have the benefit of conjuring up Turing machines. However, we can do something that may be surprising, which is make use of computations of Turing machines.
Definition 25.1. Let $M$ be a Turing machine and $w$ an input string. An accepting computation path for $M$ on $w$ is a sequence of configurations $C_1, \dots, C_\ell$, where $C_1$ is the start configuration of $M$ on $w$, $C_\ell$ is an accepting configuration of $M$, and each $C_i$ follows directly from $C_{i-1}$ (that is, $C_{i-1} \vdash C_i$) as defined by the transition function of $M$. We define a rejecting computation path similarly.
This gives us a very strict format for the computation of $M$ on a word $w$. We will show that the set of all strings that aren't accepting computation paths is context-free.
Lemma 25.2. Consider the language $$A(M,w) = \{\langle C \rangle \mid \text{$C$ is an accepting computation path for $M$ on $w$}\}.$$ Then $\overline{A(M,w)}$ is a context-free language.
Proof. In order for a string to be an accepting computation path for $M$ on $w$, it must satisfy the following conditions:

1. It has the form $\#C_1\#C_2^R\#C_3\#C_4^R\#\cdots\#C_\ell\#$, where each $C_i$ is a configuration of $M$ and every other configuration is written in reverse.
2. $C_1$ is the start configuration of $M$ on $w$.
3. $C_\ell$ is an accepting configuration of $M$.
4. Each configuration follows from the previous one; that is, $C_i \vdash C_{i+1}$ for each $i < \ell$.
You might wonder why we're reversing every other configuration in our encoding of $C$; we will get to that shortly. For each condition in the list above, we can define a language $A_i$ of strings satisfying it. Then $$A(M,w) = A_1 \cap A_2 \cap A_3 \cap A_4,$$ the set of strings which satisfy all of our conditions. By De Morgan's laws, we have $$\overline{A(M,w)} = \overline{A_1} \cup \overline{A_2} \cup \overline{A_3} \cup \overline{A_4}.$$ Then, we must show that each $\overline{A_i}$ is a context-free language. The first three are easy: $A_1, A_2, A_3$ are actually regular, and it's not hard to see how to define a DFA for each of those languages; since regular languages are closed under complement, $\overline{A_1}, \overline{A_2}, \overline{A_3}$ are regular and therefore context-free.
The not so obvious part is showing that $\overline{A_4}$ is context-free. Suppose we have a string that satisfies conditions 1 through 3. For it to fail condition 4, it must contain a substring $\#C_i\#C_{i+1}^R\#$ such that $C_i \not\vdash C_{i+1}$. A PDA can find such a substring: while reading $C_i$, it pushes every symbol of $C_i$ onto the stack, and then while reading $C_{i+1}^R$, it pops symbols off the stack to compare. This is why we reversed every other configuration: the stack hands back the symbols of $C_i$ in reverse order, which is exactly the order in which $C_{i+1}^R$ presents them.
Since our language is specific to $M$, we can compare the two configurations against the transition function of $M$. The crucial thing to notice here is that the change to the configuration by following a single transition of a Turing machine is very localized around the position of the tape head, and there are finitely many changes that one can make.
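To make the locality of a single transition concrete, here is a minimal Python sketch, under a hypothetical encoding (the state symbol written directly into the tape string, just left of the head), of applying one transition to a configuration. Condition 4 then amounts to checking that each configuration equals the step applied to the previous one, and the check only rewrites a window of at most three symbols around the state.

```python
BLANK = "_"

def step(config, delta, states):
    """Apply one TM transition to a configuration string, or return None.

    Encoding assumption (hypothetical): a configuration is the tape contents
    with the state symbol inserted just left of the head, e.g. "xqay" means
    state 'q', head reading 'a', with 'x' to the left.
    """
    i = next((j for j, s in enumerate(config) if s in states), None)
    if i is None:
        return None
    q, left, rest = config[i], config[:i], config[i + 1:]
    a = rest[0] if rest else BLANK          # symbol under the head
    right = rest[1:]
    if (q, a) not in delta:
        return None
    q2, b, d = delta[(q, a)]
    if d == "R":
        return left + b + q2 + right        # head moves right past the written b
    # d == "L": the head moves onto the symbol to its left (staying put at the edge)
    return left[:-1] + q2 + left[-1:] + b + right

# Hypothetical one-rule machine: in state 'q' reading 'a', write 'b', move right.
delta = {("q", "a"): ("p", "b", "R")}
print(step("xqay", delta, {"q", "p"}))      # -> xbpy
```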
Then since each $\overline{A_i}$ is context-free and context-free languages are closed under union, $\overline{A(M,w)}$ is also context-free. $\tag*{$\Box$}$
We can now use this to show that universality for CFGs is undecidable: $$U_{CFG} = \{\langle G \rangle \mid \text{$G$ is a CFG and $L(G) = \Sigma^*$} \}$$
Theorem 25.3. $U_{CFG}$ is undecidable.
Proof. We will assume that $U_{CFG}$ is decidable and show that doing so will allow us to decide $A_{TM}$. Consider the language $\overline{A(M,w)}$ of strings that are not accepting computation paths for $M$ when run on $w$. Observe that $M$ accepts $w$ if and only if there is an accepting computation path $C$ for $M$ when run on $w$; in that case $C \in A(M,w)$ and therefore $C \not\in \overline{A(M,w)}$. So $A(M,w) = \emptyset$ if and only if $M$ does not accept $w$, or equivalently, $M$ does not accept $w$ if and only if $\overline{A(M,w)} = \Sigma^*$.
Now, suppose we have a Turing machine, say $T$, that decides $U_{CFG}$. Then we can construct a Turing machine $S$ that decides $A_{TM}$ that does the following on input $\langle M, w \rangle$:

1. Construct a CFG $G$ with $L(G) = \overline{A(M,w)}$, as in Lemma 25.2.
2. Run $T$ on $\langle G \rangle$.
3. If $T$ accepts, then $\overline{A(M,w)} = \Sigma^*$ and $M$ does not accept $w$, so reject; otherwise, accept.
Since $A_{TM}$ is undecidable, we can conclude that $U_{CFG}$ is undecidable. $$\tag*{$\Box$}$$
This gives us a very easy way to show that language equivalence for CFGs, $$EQ_{CFG} = \{\langle G_1, G_2 \rangle \mid \text{$G_1, G_2$ are CFGs and $L(G_1) = L(G_2)$}\},$$ is undecidable.
Theorem 25.4. $EQ_{CFG}$ is undecidable.
Proof. If we have a TM $R$ that decides $EQ_{CFG}$, then we can decide $U_{CFG}$ with the following machine: on input $\langle G \rangle$,

1. Construct a CFG $H$ with $L(H) = \Sigma^*$.
2. Run $R$ on $\langle G, H \rangle$ and answer as $R$ does, since $L(G) = \Sigma^*$ if and only if $L(G) = L(H)$.
But $U_{CFG}$ is undecidable, so $EQ_{CFG}$ must also be undecidable. $$\tag*{$\Box$}$$
This technique of defining CFGs to generate Turing machine computations (or, as it turns out, strings that are not Turing machine computations) was introduced by Hartmanis in 1967. He showed that several other decision problems regarding CFGs are undecidable using the same approach.
Suppose we have a collection of dominoes. A domino is a tile that has a string on the top and a string on the bottom, like $\binom{a}{ab}$. A collection of dominoes looks like $$ \left\{ \binom{b}{ca}, \binom{a}{ab}, \binom{ca}{a}, \binom{abc}{c} \right\}.$$ We want to find a match: a sequence of dominoes drawn from our collection (with repetition allowed) in which the concatenation of the top strings equals the concatenation of the bottom strings, such as $$ \binom{a}{ab} \binom{b}{ca} \binom{ca}{a} \binom{a}{ab} \binom{abc}{c},$$ where both the tops and the bottoms read $abcaaabc$.
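As a quick sanity check, here is a short Python sketch that verifies the match above, together with a bounded breadth-first search for matches. The search is necessarily bounded: as we are about to see, no algorithm decides this problem in general, so a bounded search can only be a heuristic.

```python
from collections import deque

# The example match above: tops and bottoms concatenate to the same string.
match = [("a", "ab"), ("b", "ca"), ("ca", "a"), ("a", "ab"), ("abc", "c")]
top = "".join(u for u, v in match)
bottom = "".join(v for u, v in match)
print(top, bottom)  # both are "abcaaabc"

def pcp_search(pairs, max_len=10):
    """Bounded breadth-first search for a match (a solution index sequence).

    Since the problem is undecidable, this can only be a heuristic: it may
    miss matches that use more than max_len dominoes.
    """
    queue = deque(((i,), u, v) for i, (u, v) in enumerate(pairs))
    while queue:
        seq, t, b = queue.popleft()
        if t == b:
            return seq
        if len(seq) >= max_len:
            continue
        for i, (u, v) in enumerate(pairs):
            t2, b2 = t + u, b + v
            # Extend only while one side remains a prefix of the other.
            if t2.startswith(b2) or b2.startswith(t2):
                queue.append((seq + (i,), t2, b2))
    return None

pairs = [("b", "ca"), ("a", "ab"), ("ca", "a"), ("abc", "c")]
print(pcp_search(pairs))  # -> (1, 0, 2, 1, 3), i.e. the match shown above
```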
Definition 25.5. An instance of the Post Correspondence Problem (PCP) is a sequence of pairs of strings $$(u_1,v_1),(u_2,v_2),\dots,(u_k,v_k)$$ where $u_i, v_i \in \Sigma^*$ for $i = 1,\dots,k$. A solution of the above instance is a sequence $i_1,i_2,\dots,i_n \in \{1,\dots,k\}$ such that $$u_{i_1} u_{i_2} \cdots u_{i_n} = v_{i_1} v_{i_2} \cdots v_{i_n}.$$ The Post Correspondence Problem asks whether or not a given instance of PCP has a solution. We define the language of PCP instances that have a solution to be $$L_{PCP} = \{ \langle P \rangle \mid \text{$P$ is a PCP instance that has a solution} \}.$$ Then we have the following result.
Theorem 25.6. PCP is undecidable.
First, let's talk about how we do this. The idea is to reduce from $A_{TM}$: we show that for any Turing machine $M$ and input string $w$, we can construct an instance of PCP whose matches spell out accepting computation paths for $M$ on $w$. Thus, the instance has a match if and only if $M$ accepts $w$.
Proof. Suppose there is a TM $R$ that decides PCP. We construct a TM $S$ that decides $A_{TM}$. Let $M = (Q,\Sigma,\Gamma,\delta,q_0,q_A,q_R)$ and let $w$ be an input string. We will have $S$ construct an instance $P$ of PCP that has a match if and only if $M$ accepts $w$. Essentially, we will be defining the pairs of a preliminary set $P'$.
First, we show how to do this with a restricted case of PCP where we require a match to begin with a particular starting pair. Later, we will show how to remove this restriction. The starting pair is defined to be $$\binom{\#}{\#q_0 w_1 w_2 \cdots w_n \#}.$$ Note that the bottom of this pair contains the start configuration of $M$ on $w$.
Next, for every $a,b \in \Gamma$ and every $q,q' \in Q$ where $q \neq q_R$, if $\delta(q,a) = (q',b,R)$, add the pair $\binom{qa}{bq'}$. These pairs handle transitions for the tape head moving to the right.
For every $a,b,c \in \Gamma$ and every $q,q' \in Q$ where $q \neq q_R$, if $\delta(q,a) = (q',b,L)$, add $\binom{cqa}{q'cb}$. These pairs handle transitions for the tape head moving to the left.
For every $a \in \Gamma$, add $\binom a a$. These pairs correspond to tape contents that aren't near the tape head.
Add $\binom \# \#$ and $\binom{\#}{\Box\#}$. These pairs handle the delimiters between the configurations; the second also lets a configuration grow by a blank symbol $\Box$ when the head moves onto previously unused tape.
For every $a \in \Gamma$, add $\binom{a q_A}{q_A}$ and $\binom{q_A a}{q_A}$. This handles when the computation reaches the accept state. Note that the top lags behind the bottom, so we need to add some extra steps in order for us to get the match that we want.
Finally, we add the pair $\binom{q_A\#\#}{\#}$ to finish.
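The construction of $P'$ above can be sketched directly in Python. This is a sketch under stated encoding assumptions: states and tape symbols are strings, `"_"` stands for the blank symbol $\Box$, and `delta` maps `(state, symbol)` to `(state, symbol, direction)`.

```python
def build_tiles(Gamma, delta, q_accept, q_reject, w, q0="q0"):
    """Build the restricted PCP instance P' for the machine delta on input w."""
    tiles = [("#", "#" + q0 + w + "#")]                 # required starting pair
    for (q, a), (q2, b, d) in delta.items():
        if q == q_reject:
            continue
        if d == "R":
            tiles.append((q + a, b + q2))               # head moves right
        else:
            for c in Gamma:                             # head moves left, over c
                tiles.append((c + q + a, q2 + c + b))
    for a in Gamma:
        tiles.append((a, a))                            # tape away from the head
    tiles.append(("#", "#"))                            # configuration delimiter
    tiles.append(("#", "_#"))                           # grow the tape by a blank
    for a in Gamma:
        tiles.append((a + q_accept, q_accept))          # absorb the tape contents
        tiles.append((q_accept + a, q_accept))          # once we've accepted
    tiles.append((q_accept + "##", "#"))                # finishing pair
    return tiles

# Hypothetical toy machine: reading '0' in the start state moves right into qA.
tiles = build_tiles(["0", "_"], {("q0", "0"): ("qA", "0", "R")}, "qA", "qR", "0")
print(("q00", "0qA") in tiles)  # the right-moving transition tile is present
```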
Now, we've constructed an instance $P'$ of PCP with a restriction, so it doesn't quite work the way we want it to. If we just use $P'$, there is a match regardless of whether $M$ accepts $w$ or not. So we have a bit more work to do.
First, let $w = w_1 w_2 \cdots w_n$ be some string of length $n$. Then we define the following strings: $$\star w = \star w_1 \star w_2 \cdots \star w_n, \qquad w \star = w_1 \star w_2 \star \cdots w_n \star, \qquad \star w \star = \star w_1 \star w_2 \star \cdots \star w_n \star.$$ That is, $\star w$ interleaves a new symbol $\star$ before every symbol of $w$, $w \star$ places one after every symbol, and $\star w \star$ does both.
Now, we convert our restricted PCP instance $P'$ that we constructed into a proper PCP instance $P$. If we have $$ P' = \left\{ \binom{u_1}{v_1}, \binom{u_2}{v_2}, \dots , \binom{u_k}{v_k} \right\},$$ where $\binom{u_1}{v_1} = \binom{\#}{\#q_0 w_1 w_2 \cdots w_n \#}$, then we define $P$ to be the set $$ \left\{ \binom{\star u_1}{\star v_1 \star}, \binom{\star u_1}{v_1 \star}, \binom{\star u_2}{v_2 \star}, \dots, \binom{\star u_k}{v_k \star}, \binom{\star \triangle}{\triangle} \right\}.$$
Then in the PCP instance $P$, the tile $\binom{\star u_1}{\star v_1 \star}$ is the only one that can be the first tile, since it is the only one in which the top and the bottom begin with the same symbol, $\star$. The added $\star$s don't affect which matches exist, since they are simply interleaved into the original strings. Then the final tile is always going to be $\binom{\star \triangle}{\triangle}$, which takes care of the extra trailing $\star$ on the bottom. $$\tag*{$\Box$}$$
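The $\star$-interleaving can be spelled out concretely. In this sketch, `*` stands for $\star$ and `^` for $\triangle$, both assumed not to occur in the original tile alphabet:

```python
def star_before(u):
    """*u : a star before every symbol."""
    return "".join("*" + ch for ch in u)

def star_after(v):
    """v* : a star after every symbol."""
    return "".join(ch + "*" for ch in v)

# If the tops and bottoms of a match concatenate to the same string w, then in
# P the tops concatenate to *w followed by the final tile's "*^", while the
# bottoms concatenate to a leading "*" plus w* followed by "^" -- the same
# string, so the match survives the conversion.
w = "abc"
print(star_before(w) + "*^")        # *a*b*c*^
print("*" + star_after(w) + "^")    # *a*b*c*^
```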
Post's Correspondence Problem dates all the way back to 1946 and is due to Emil Post. While we've shown that it is undecidable via Turing machine simulation, Post's original proof of undecidability uses the string generation systems formulated by Post in 1943 which we've alluded to: the Post canonical systems. The languages generated by such systems are exactly the recognizable languages, and so these systems are Turing-complete.
Of course, this isn't just a neat trick to show that there are undecidable problems that look deceptively easy. As with any undecidable problem, it can become another tool in our toolbox to show that a problem is undecidable. For instance, let's consider the problem of intersection emptiness for CFGs, $$ISE_{CFG} = \{ \langle G_1, G_2 \rangle \mid \text{$G_1, G_2$ are CFGs, $L(G_1) \cap L(G_2) = \emptyset$} \}.$$
Theorem 25.7. $ISE_{CFG}$ is undecidable.
Proof. We assume that we can decide intersection emptiness for context-free languages. Let $\Sigma = \{a,b\}$ and let $I$ be an arbitrary PCP instance over $\Sigma$, $$(\alpha_1, \beta_1), \dots, (\alpha_k,\beta_k).$$ We define the following languages over the alphabet $\Sigma \cup \{c\} = \{a,b,c\}$: $$L_0 = \{\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_n} \, c \, (\beta_{i_1} \beta_{i_2} \cdots \beta_{i_n})^R \mid n \geq 1,\ i_1, \dots, i_n \in \{1,\dots,k\}\}$$ and the mirror-image language $$L_{mi} = \{w c w^R \mid w \in \Sigma^*\}.$$
All of these languages are context-free. Then we observe that $L_0 \cap L_{mi} \neq \emptyset$ if and only if $I$ has a solution. Since $L_0$ and $L_{mi}$ are context-free, if we could decide emptiness of the intersection of context-free languages, we can also decide whether or not an arbitrary instance of PCP has a solution. $$\tag*{$\Box$}$$
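As a concrete check (assuming the mirror-image construction, i.e. $L_{mi} = \{wcw^R \mid w \in \Sigma^*\}$ and $L_0$ the set of strings $\alpha_{i_1}\cdots\alpha_{i_n}\,c\,(\beta_{i_1}\cdots\beta_{i_n})^R$), a PCP solution yields a string lying in both languages:

```python
# Hypothetical PCP instance over {a, b} with solution (tile 1, tile 2).
alphas = ["a", "ab"]
betas = ["aa", "b"]
sol = [0, 1]                              # 0-indexed: tiles 1 then 2
w_top = "".join(alphas[i] for i in sol)   # "aab"
w_bot = "".join(betas[i] for i in sol)    # "aab"
assert w_top == w_bot                     # so sol really is a solution

x = w_top + "c" + w_bot[::-1]             # in L_0 by construction
print(x, x == w_top + "c" + w_top[::-1])  # and in L_mi: x has the form w c w^R
```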
We've been thinking a lot about sets of strings and talking roughly about the complexity of their description (i.e. DFA vs. PDA vs. TM). We can do the same thing with strings: what does it mean for a string to be complex? The following notion is due to Andrey Kolmogorov in 1963.
Definition 25.8. The Kolmogorov complexity $C(x)$ of a string $x$ is the size of the smallest computable description of $x$.
Here, "computable description" is left deliberately vague, but generally, we can consider it to be the output of a computable function or machine. Informally, we can think of it as the output of some program in a fixed programming language. As stated, this definition depends on the choice of function, machine, or programming language. However, it's not hard to argue that all of these choices are ultimately equivalent: as long as you have an interpreter, you can, with constant overhead, translate one kind of description into another.
What does Kolmogorov complexity tell us? In general, Kolmogorov complexity tells us about the compressibility of a string. The notion of compression is really just finding a shorter description of a piece of data. Kolmogorov complexity captures that information: if a string is more "complex" then it's harder to "compress". This leads to the following definition.
Definition 25.9. Let $x$ be a string. If $C(x) \geq |x|$, then $x$ is incompressible or random.
Implicit in this definition is the fact that $C(x) \leq |x| + O(1)$. Why? Because you can always give a description of a string about as long as the string itself: just write a program that hardcodes the string and outputs it!
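Both points can be seen empirically, using a real compressor as a computable (and therefore non-optimal, as we'll see shortly) stand-in for $C$:

```python
import zlib

# The hardcoding bound C(x) <= |x| + O(1): a program that prints the string
# literally is only a constant number of bytes longer than the string itself.
s = b"01" * 25                                  # a highly regular 50-byte string
program = b'print(b"' + s + b'")'
print(len(program) - len(s))                    # constant overhead: 10 bytes

# Compressibility: zlib (a computable, non-optimal compressor) squeezes the
# regular string well below its length, witnessing that it is not "random".
print(len(s), len(zlib.compress(s)))
```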
But this definition leads us to another important thing that Kolmogorov complexity tells us about: the nature of randomness. We're all familiar with the notion of fair coin flips having a $\frac 1 2$ probability of landing on either heads or tails. If we consider a binary string of length 50 representing the result of 50 coin flips, we understand that any one of those strings should be equally likely. However, our lizard brains are often alarmed by outcomes like $$01010101010101010101010101010101010101010101010101.$$
Kolmogorov complexity tells us that complexity and randomness are connected: something that's hard to describe is very complex and therefore, it appears random.
The natural question that arises from this is how to compute $C(x)$. If we could do this, then we could figure out the perfect, most compact representation for any given string. As you might guess, this quantity is uncomputable.
Theorem 25.10. The Kolmogorov complexity $C(x)$ of a string $x$ is uncomputable.
Proof. Suppose that $C(x)$ is computable by some Turing machine, say $M$. Now, we define a new Turing machine $M'$ that takes as input an integer $\ell$ in binary. This Turing machine does the following:

1. For each string $y$ of length $\ell$, in lexicographic order, compute $C(y)$ using $M$.
2. Output the first string $y$ found with $C(y) \geq \ell$.
There are $2^\ell$ strings of length $\ell$ but only $2^\ell - 1$ descriptions of length less than $\ell$, so by the pigeonhole principle, there must be at least one string $y$ with $C(y) \geq |y| = \ell$. However, note that $M'$ and $\ell$ together constitute a description of $y$, since $M'$ when run on $\ell$ will always return $y$.
Let $m = |\langle M' \rangle|$. Then $C(y) \leq m + \log \ell + c$ for some constant $c$, since the description consists of $\langle M' \rangle$ together with the $O(\log \ell)$-bit binary encoding of $\ell$. Putting our two inequalities together, we have $$\ell \leq C(y) \leq m + \log \ell + c.$$ But $m$ and $c$ are fixed, so this inequality fails for all sufficiently large $\ell$, a contradiction. $\tag*{$\Box$}$
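The machine $M'$ from the proof can be sketched as follows. Since the proof shows no computable $C$ can exist, the `C` argument here is a hypothetical stand-in:

```python
from itertools import product

def M_prime(ell, C):
    """Return the first length-ell string y (in lexicographic order) with
    C(y) >= ell. By the pigeonhole argument above, such a y always exists
    if C really is Kolmogorov complexity."""
    for bits in product("01", repeat=ell):
        y = "".join(bits)
        if C(y) >= ell:
            return y
    return None

# A stand-in "complexity" just to exercise the search (it is NOT C(x)):
print(M_prime(3, lambda y: len(y)))  # -> 000: every string looks incompressible
```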
One cool thing that you can do with Kolmogorov complexity, shown by Li and Vitányi in 1995, is show that we can replace pumping lemmas with arguments about the Kolmogorov complexity of words in that language. One can also extend these ideas to algorithmically random infinite sequences.
Speaking of algorithmically random infinite sequences: if we go back to the notion of infinite binary sequences, we realize that some of those sequences are algorithmically random. We've already seen that these sequences correspond to languages, but they also correspond to real numbers. The idea that an infinite binary sequence could capture the information in a language is not so far-fetched, given that we were set up with the right perspective. The notion that all of this information is encoded in a single real number is a bit harder to believe, but that's exactly what we've done.
So it's not hard to see how, in the course of talking about the undecidability of some sets of Turing machines, we're really also talking about the undecidability of some sets of natural numbers. After all, Turing machines have a correspondence with the natural numbers. But if binary sequences are real numbers, what we've just discussed means that there are real numbers that are uncomputable.
Definition 25.11. Let $p$ range over the encodings of Turing machines. Then \[\Omega = \sum_{\text{$p$ halts}} 2^{-|p|}.\] The real number $\Omega$ is Chaitin's constant and gives the probability that a randomly chosen Turing machine will halt.
Technically speaking, we have to fix a choice of universal Turing machine and a prefix-free encoding for our Turing machines so that the sum converges, but the idea is pretty much the same. Furthermore, if we could compute the first, say, $n$ bits of this number, then we could figure out whether Turing machines with encodings of size up to $n$ halt.
This is all very informal, and Chaitin himself has a nice informal writeup of what this constant means and its implications.