CISC 462 — Lecture 12

The Post Correspondence Problem

Suppose we have a collection of dominoes. A domino is a tile that has a string on the top and a string on the bottom: $\binom{a}{ab}$. A collection of dominoes looks like $$ \left\{ \binom{b}{ca}, \binom{a}{ab}, \binom{ca}{a}, \binom{abc}{c} \right\}.$$ We want to find a match from a sequence of dominoes drawn from our collection: $$ \binom{a}{ab} \binom{b}{ca} \binom{ca}{a} \binom{a}{ab} \binom{abc}{c}$$.

Formally, an instance of the Post Correspondence Problem is a sequence of pairs of strings $$(u_1,v_1),(u_2,v_2),\dots,(u_k,v_k)$$ where $u_i, v_i \in \Sigma^*, i = 1,\dots,k$. A solution of the above instance is the sequence $i_1,i_2,\dots,i_n \in \{1,\dots,k\}$ such that $$u_{i_1} u_{i_2} \cdots u_{i_n} = v_{i_1} v_{i_2} \cdots v_{i_n}.$$ The Post Correspondence problem asks whether or not a given instance of PCP has a solution. We define the language of PCP instances that have a solution to be $$L_{PCP} = \{ \langle P \rangle \mid \text{$P$ is a PCP instance that has a solution} \}.$$ Then we have the following result.

Theorem. PCP is undecidable.

First, let's talk about how we do this. The idea is to reduce from $A_{TM}$. We show that for any Turing machine $M$ and input string $w$, we can construct an instance of PCP such that a match is an accepting computation history for $M$ on $w$. Thus, if we have a match, then we can determine whether $M$ accepts $w$.

Proof. Suppose there is a TM $R$ that decides PCP. We construct a TM $S$ that decides $A_{TM}$. Let $M = (Q,\Sigma,\Gamma,\delta,q_0,q_A,q_R)$. We will have $S$ construct an instance of the PCP $P$ that has a match if and only if $M$ accepts $w$. Essentially, we will be defining pairs for the set $P'$.

First, we show how to do this with a restricted case of PCP where we require a particular starting pair. Later, we will show how to remove this restriction. The starting pair is defined $$\binom{\#}{\#q_0 w_1 w_2 \cdots w_n \#}.$$ Note that this is the starting configuration for $M$ on $w$.

Next, for every $a,b \in \Gamma$ and every $q,q' \in Q$ where $q \neq q_R$, if $\delta(q,a) = (q',b,R)$, add the pair $\binom{qa}{bq'}$. These pairs handle transitions for the tape head moving to the right.

For every $a,b,c \in \Gamma$ and every $q,q' \in Q$ where $q \neq q_R$, if $\delta(q,a) = (q',b,L)$, add $\binom{cqa}{q'cb}$. These pairs handle transitions for the tape head moving to the left.

For every $a \in \Gamma$, add $\binom a a$. These pairs correspond to tape contents that aren't near the tape head.

Add $\binom \# \#$ and $\binom{\#}{\Box\#}$. These pairs handle the delimiters between the configurations.

For every $a \in \Gamma$, add $\binom{a q_A}{q_A}$ and $\binom{q_A a}{q_A}$. This handles when the computation reaches the accept state. Note that the top lags behind the bottom, so we need to add some extra steps in order for us to get the match that we want.

Finally, we add the pair $\binom{q_A\#\#}{\#}$ to finish.

Now, we've constructed an instance $P'$ of PCP with a restriction, so it doesn't quite work the way we want it to. If we just use $P'$, there is a match regardless of whether $M$ accepts $w$ or not. So we have a bit more work to do.

First, let $w = w_1 w_2 \cdots w_n$ be some string of length $n$. Then we define the following strings:

$\star w = \star w_1 \star w_2 \star \cdots \star w_n$,
$w \star = w_1 \star w_2 \star \cdots \star w_n \star$,
$\star w \star = \star w_1 \star w_2 \star \cdots \star w_n \star$.

Now, we convert our restricted PCP instance $P'$ that we constructed into a proper PCP instance $P$. If we have $$ P' = \left\{ \binom{u_1}{v_1}, \binom{u_2}{v_2}, \dots , \binom{u_k}{v_k} \right\},$$ where $\binom{u_1}{v_1} = \binom{\#}{\#q_0 w_1 w_2 \cdots w_n \#}$, then we define $P$ to be the set $$ \left\{ \binom{\star u_1}{\star v_1 \star}, \binom{\star u_1}{v_1 \star}, \binom{\star u_2}{v_2 \star}, \dots, \binom{\star u_k}{v_k \star}, \binom{\star \triangle}{\triangle} \right\}.$$

Then in the PCP instance $P$, the tile $\binom{\star u_1}{\star v_1 \star}$ is the only one that can be the first tile since it is the only one that starts with the same symbol $\star$. The addition of the $\star$s doesn't matter in the other tiles, since the $\star$s are simply interleaved in the original string. Then the final tile is always going to be $\binom{\star \triangle}{\triangle}$ to take care of the extra $\star$. $$\tag*{$\Box$}$$

Of course, this isn't just a neat trick to show that there are undecidable problems that look deceptively easy. As with any undecidable problem, it can become another tool in our toolbox to show that a problem is undecidable. For instance, let's consider the problem of intersection emptiness for CFGs, $$ISE_{CFG} = \{ \langle G_1, G_2 \rangle \mid \text{$G_1, G_2$ are CFGs, $L(G_1) \cap L(G_2) = \emptyset$} \}.$$

Theorem. $ISE_{CFG}$ is undecidable.

Proof. We assume that we can decide intersection emptiness for context-free languages. Let $\Sigma = \{a,b\}$ and we define $I$ to be an arbitrary PCP instance over $\Sigma$, $$(\alpha_1, \beta_1), \dots, (\alpha_k,\beta_k)$$. We define the following languages over the alphabet $\Sigma \cup \{c\} = \{a,b,c\}$.

$L_{mi} = \{w_1 c w_2 c w_2^R c w_1^R \mid w_1,w_2 \in \{a,b\}^* \}$,
$L(\alpha) = \{b a^{i_n} b a^{i_{n-1}} \cdots b a^{i_1} c \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_n} \mid n \geq 1, 1 \leq i_j \leq n, j = 1, \dots, k\}$,
$L(\beta) = \{b a^{i_n} b a^{i_{n-1}} \cdots b a^{i_1} c \beta_{i_1} \beta_{i_2} \cdots \beta_{i_n} \mid n \geq 1, 1 \leq i_j \leq n, j = 1, \dots, k\}$,
$L_0 = L(\alpha) c L(\beta)^R$.

All of these languages are context-free. Then we observe that $L_0 \cap L_{mi} \neq \emptyset$ if and only if $I$ has a solution. Since $L_0$ and $L_{mi}$ are context-free, if we could decide emptiness of the intersection of context-free languages, we can also decide whether or not an arbitrary instance of PCP has a solution. $$\tag*{$\Box$}$$