CISC 462 — Lecture 5

Context-free languages

Now, we'll turn our attention to the class of context-free languages. Recall that context-free languages are recognized by pushdown automata and are generated by context-free grammars. In today's class, we're going to be dealing mainly with context-free grammars. As you might expect, things will get a bit more complicated, especially as grammars are fairly different objects from finite automata and we don't have many of the same nice properties of regular languages to rely on.

Membership

Again, we begin with membership. In the case of a grammar, rather than acceptance, we have the problem of whether a grammar generates a particular string. Thus, we have the following language $$ A_{CFG} = \{\langle G, w \rangle \mid \text{$G$ is a CFG that generates $w$}\}.$$

Theorem. $A_{CFG}$ is a decidable language.

Since we're trying to determine generation rather than acceptance, the algorithm is not as straightforward as in the case for DFAs. The obvious solution of just trying every single possible derivation doesn't quite work, since there are infinitely many derivations to try. If $w$ were not generated by $G$, the algorithm would never halt, and our language would not be decidable.

To solve this, we need to take advantage of a property of context-free grammars. Remember that we can transform any grammar into an equivalent grammar in Chomsky normal form. A grammar in CNF is one where all of its production rules are of the form $A \to BC$ or $A \to a$, where $A,B,C$ are variables and $a$ is a terminal.

The key to making our algorithm work is the property that for a grammar in CNF, any derivation of a word of length $n \geq 1$ has exactly $2n-1$ steps: it takes $n-1$ applications of rules of the form $A \to BC$ to produce $n$ variables, and $n$ applications of rules of the form $A \to a$ to turn them into terminals. Therefore, instead of checking every derivation, we only need to check derivations with $2n-1$ steps, of which there are only finitely many. Then the Turing machine operates as follows:

  1. On input $\langle G,w \rangle$, where $G$ is a CFG and $w$ is a string, transform $G$ into an equivalent grammar $G'$ in Chomsky normal form.
  2. List all derivations with $2n-1$ steps, where $n = |w|$, unless $n = 0$, in which case list all derivations with 1 step.
  3. If any of these derivations generate $w$, accept; otherwise, reject.
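The algorithm above can be sketched concretely. Rather than literally enumerating all $2n-1$-step derivations, the sketch below uses an equivalent recursion on substrings: in CNF, a variable derives a string of length greater than one only by splitting it between the two variables of some rule $A \to BC$. The grammar encoding (a dict from variables to rule bodies) is an assumption made for illustration, not a fixed convention.

```python
from functools import lru_cache

def cnf_generates(rules, start, w):
    """Decide whether a CNF grammar generates the string w.

    `rules` maps each variable to a list of bodies, where a body is
    either a single terminal string like "a" (a rule A -> a) or a
    pair of variables like ("B", "C") (a rule A -> BC). Memoizing on
    (variable, substring) keeps the search finite, mirroring the
    finitely many derivations of 2n-1 steps.
    """
    @lru_cache(maxsize=None)
    def derives(var, s):
        for body in rules[var]:
            if isinstance(body, str):        # rule A -> a
                if body == s:
                    return True
            else:                            # rule A -> BC
                B, C = body
                # try every split s = xy with x, y both nonempty
                if any(derives(B, s[:i]) and derives(C, s[i:])
                       for i in range(1, len(s))):
                    return True
        return False

    if w == "":
        return False  # assumes no S -> epsilon rule, for simplicity
    return derives(start, w)

# Example: S -> AB, A -> a, B -> b generates exactly {"ab"}
rules = {"S": [("A", "B")], "A": ["a"], "B": ["b"]}
print(cnf_generates(rules, "S", "ab"))  # True
print(cnf_generates(rules, "S", "aa"))  # False
```

Note that this sketch assumes the grammar is already in CNF; step 1 of the algorithm, the conversion to CNF, is omitted.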

Emptiness

We can also show that the emptiness problem for CFGs is decidable. Again, it'll look a bit different from what we've done with DFAs. First, here's the language $$E_{CFG} = \{\langle G \rangle \mid \text{$G$ is a CFG and $L(G) = \emptyset$} \}.$$

Theorem. $E_{CFG}$ is a decidable language.

In a way, we can apply the same kind of idea from the DFA case. Of course, a grammar isn't a graph, so it's not exactly the same. However, in essence, what we want to do is check whether some string of terminals can be derived from the start variable. To do this, we check each variable to see if it can generate a string of terminals. We begin by marking all terminal symbols, then repeatedly mark any variable with a rule whose right-hand side consists entirely of marked symbols. Once the algorithm is finished, we can simply check whether the start variable is marked or not.

  1. On input $\langle G \rangle$ where $G$ is a CFG, mark all terminal symbols in $G$.
  2. Repeat until no new variables get marked:
    Mark any variable $A$ where $G$ has a rule $A \to U_1 U_2 \cdots U_k$ and each symbol $U_1, \dots, U_k$ has already been marked.
  3. If the start variable is not marked, accept; otherwise reject.
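The marking procedure above is a fixpoint computation, and a short sketch makes it concrete. As before, the grammar encoding is an assumption made for illustration: here every rule body is a tuple of symbols, and a symbol is a variable exactly when it is a key of `rules`.

```python
def cfg_is_empty(rules, start):
    """Decide E_CFG by the marking algorithm: return True iff L(G) is empty.

    `rules` maps each variable to a list of bodies; a body is a tuple
    of symbols, where a symbol is a variable if it appears as a key
    of `rules` and a terminal otherwise.
    """
    marked = set()   # variables known to generate a string of terminals
    changed = True
    while changed:   # repeat until no new variables get marked
        changed = False
        for var, bodies in rules.items():
            if var in marked:
                continue
            for body in bodies:
                # a symbol counts as marked if it is a terminal
                # or an already-marked variable
                if all(sym not in rules or sym in marked for sym in body):
                    marked.add(var)
                    changed = True
                    break
    return start not in marked

# S -> AB, A -> a, B -> B: L(G) is empty, since B never
# generates a string of terminals, so S is never marked
rules = {"S": [("A", "B")], "A": [("a",)], "B": [("B",)]}
print(cfg_is_empty(rules, "S"))  # True
```

Each pass of the loop marks at least one new variable or halts, so the loop runs at most as many times as there are variables, and the procedure always terminates.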

Equality

Now we'll consider the problem of determining whether two grammars generate the same language. This is the language $$ EQ_{CFG} = \{\langle G,H \rangle \mid \text{$G$ and $H$ are CFGs and $L(G) = L(H)$}\}.$$ This turns out to be challenging. Unlike regular languages, we can't rely on intersection and complementation, since context-free languages aren't closed under these operations. However, no matter how hard we try, we won't be able to solve this problem: it turns out to be our first encounter with an unsolvable problem.

Decidability

Before we move into how to show that the previous problem is undecidable, we'll first establish that every context-free language is decidable. That is, every language that is recognized by a pushdown automaton is a decidable language. This shows that the context-free languages are contained in the decidable languages; since there are decidable languages that aren't context-free, such as $\{a^n b^n c^n \mid n \geq 0\}$, the containment is in fact strict.

Theorem. Every context-free language is decidable.

The straightforward approach would be to simulate a PDA for our language as we did for DFA membership. However, we hit a snag when we consider that a PDA may be nondeterministic. Unlike NFAs, PDAs cannot in general be determinized, and this adds a difficulty since some branches of the computation may never halt. Instead, we use the Turing machine for deciding CFG membership. Given a CFG $G$ for our language, we build the following Turing machine, with $G$ hard-coded into its description, to decide $L(G)$.

  1. On input $w$: run the CFG membership Turing machine on input $\langle G,w \rangle$.
  2. If the machine accepts, accept; if it rejects, reject.
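The construction above can be sketched as a function that, given a fixed grammar, returns a total decision procedure for its language, with the substring recursion standing in for "run the $A_{CFG}$ Turing machine on $\langle G, w \rangle$". The grammar is assumed to already be in CNF with no $S \to \varepsilon$ rule, and its encoding is the illustrative one used before: each body is either a terminal string or a pair of variables.

```python
from functools import lru_cache

def make_decider(rules, start):
    """Return a decider for L(G), where G is a CNF grammar with the
    grammar hard-coded, mirroring the construction in the theorem.
    The returned function halts on every input."""
    @lru_cache(maxsize=None)
    def derives(var, s):
        for body in rules[var]:
            if isinstance(body, str):        # rule A -> a
                if body == s:
                    return True
            elif any(derives(body[0], s[:i]) and derives(body[1], s[i:])
                     for i in range(1, len(s))):   # rule A -> BC
                return True
        return False

    return lambda w: len(w) > 0 and derives(start, w)

# A decider for {a^n b^n : n >= 1}, using the CNF grammar
# S -> AT | AB, T -> SB, A -> a, B -> b
rules = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")],
         "A": ["a"], "B": ["b"]}
decide = make_decider(rules, "S")
print(decide("aabb"))  # True
print(decide("abab"))  # False
```

Because the recursion only ever examines the finitely many (variable, substring) pairs, every call terminates, which is exactly the point of the theorem: the decider accepts or rejects every input, never looping forever.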