So now, the obvious question to ask is: does nondeterminism really give us more power? That is, can NFAs recognize languages that DFAs can’t? When considering concatenation, we got stuck when we were limited to a deterministic machine and so it’s hard to imagine that we could succeed without nondeterminism. And yet, it turns out DFAs are just as powerful as NFAs.
But how? The key observation is in examining what exactly is happening with our “nondeterministic” transition function. Remember that for each state and symbol, the transition function gives a set of states rather than a single state as a deterministic finite automaton.
So how many sets are there? If there are $n$ states, then there are $2^n$ possible subsets of $Q$. This is a large number which we’ve been trained as computer scientists to be alarmed at, but the important thing is that it is finite. This gives us a way to “simulate” the nondeterminism of an NFA using a DFA: just make each subset of $Q$ its own state!
This construction is called the subset construction and was again due to Rabin and Scott in 1959.
Let’s define the subset construction formally.
Let $N = (Q,\Sigma,\delta,q_0,F)$ be an NFA. We will construct a DFA $\mathcal D(N) = (Q',\Sigma,\delta',q_0',F')$ such that $L(\mathcal D(N)) = L(N)$. We will define each component of $\mathcal D(N)$:
Let’s try this with this example. Here’s the NFA again.
And here is the transition table with all the subsets.
| $a$ | $b$ | |
|---|---|---|
| $\emptyset$ | $\emptyset$ | $\emptyset$ |
| $\{q_0\}$ | $\{q_0\}$ | $\{q_0,q_1\}$ |
| $\{q_1\}$ | $\{q_2\}$ | $\{q_2\}$ |
| $\{q_2\}$ | $\{q_3\}$ | $\{q_3\}$ |
| $\{q_3\}$ | $\emptyset$ | $\emptyset$ |
| $\{q_0,q_1\}$ | $\{q_0,q_2\}$ | $\{q_0,q_1,q_2\}$ |
| $\{q_0,q_2\}$ | $\{q_0,q_3\}$ | $\{q_0,q_1,q_3\}$ |
| $\{q_0,q_3\}$ | $\{q_0\}$ | $\{q_0,q_1\}$ |
| $\{q_1,q_2\}$ | $\{q_2,q_3\}$ | $\{q_2,q_3\}$ |
| $\{q_1,q_3\}$ | $\{q_2\}$ | $\{q_2\}$ |
| $\{q_2,q_3\}$ | $\{q_3\}$ | $\{q_3\}$ |
| $\{q_0,q_1,q_2\}$ | $\{q_0,q_2,q_3\}$ | $\{q_0,q_1,q_2,q_3\}$ |
| $\{q_0,q_1,q_3\}$ | $\{q_0,q_2\}$ | $\{q_0,q_1,q_2\}$ |
| $\{q_0,q_2,q_3\}$ | $\{q_0,q_3\}$ | $\{q_0,q_1,q_3\}$ |
| $\{q_1,q_2,q_3\}$ | $\{q_2,q_3\}$ | $\{q_2,q_3\}$ |
| $\{q_0,q_1,q_2,q_3\}$ | $\{q_0,q_2,q_3\}$ | $\{q_0,q_1,q_2,q_3\}$ |
Note that all the subsets in red cannot be reached from the initial state $\{q_0\}$. Now, here is the state diagram for the DFA.
I’ve omitted all the subsets that can’t be reached, but this is still a large DFA relative to the size of the NFA. In fact, one can show that the language $\{a,b\}^* b \{a,b\}^{n-1}$ can be recognized by an NFA with $n+1$ states but requires a DFA with $2^n$ states. (This is why it was interesting to see that an exercise from the first edition of Hopcroft and Ullman’s book asked for a DFA that accepted this language for $n = 10$)
This particular example gives some insight into what our two models tell us about the same language. Both machines attempt to “check” the third-last symbol of the string, but they employ different strategies to do so. One can think of the NFA as “guessing” that the end of the input string is near and making a nondeterministic choice in doing so. However, the DFA does not have the ability to guess. Instead it must keep track of all possibilities. This amounts to “remembering” the last three symbols it has read. How many binary strings of length 3 are there? Exactly $2^3 = 8$, which is exactly the number of states that are in the DFA.
An interesting question that one can consider is when the worst case of $2^n$ states comes up. Similar questions about the worst-case growth in the number of states of a DFA or NFA can be asked for other transformations or language operations in a line of research in automata theory called state complexity.
Now, we want to show that $\mathcal D(N)$ and $N$ recognize exactly the same language. This is done in the usual way: an inductive argument about the states and transitions of the machine, followed by an if and only if argument about acceptance for the machine.
Let $N$ be an NFA. Then $L(N) = L(\mathcal D(N))$.
We want to show that $w \in L(N)$ if and only if $w \in L(\mathcal D(N))$. This means that by definition, $\delta(q_0,w) \cap F \neq \emptyset$ if and only if $\delta'(\{q_0\},w) \cap F \neq \emptyset$. So, we will first show that $\delta(q,w) = \delta'(\{q\},w)$ by induction on $w$.
First, we have $w = \varepsilon$. Then $\delta(q,\varepsilon) = \{q\} = \delta'(\{q\},\varepsilon)$.
Now, let $w = xa$ with $a \in \Sigma$ and string $x \in \Sigma^*$ and assume that $\delta(q,x) = \delta'(\{q\},x)$. Then we have \begin{align*} \delta'(\{q\},xa) &= \delta'(\delta'(\{q\},x),a) &\text{definition of $\delta'$}\\ &= \delta'(\delta(q,x),a) &\text{inductive hypothesis}\\ &= \bigcup_{p \in \delta(q,x)} \delta(p,a) &\text{definition of $\delta'$}\\ &= \delta(q,xa). &\text{definition of $\delta$} \end{align*} Thus, we have shown that $\delta(q,w) = \delta'(\{q\},w)$ for all $q \in Q$ and $w \in \Sigma^*$. Then by the definition of the final states we can conclude that $L(N) = L(\mathcal D(N))$.
This is enough to give us the following.
A language $L$ is accepted by a DFA if and only if $L$ is accepted by an NFA.
We have already shown the if direction of this theorem (if $L$ is accepted by an NFA, then $L$ is accepted by a DFA) in Theorem 6.3. What’s left to show is the only if direction ($L$ is accepted by a DFA only if $L$ is accepted by an NFA). This involves taking a DFA and constructing an equivalent NFA. This is fairly simple to define and prove correctness for, so we won’t go through it.
Another way to state the last theorem is the following:
A language is recognizable if and only if it is accepted by an NFA.
So we are able to conclude the following.
The class of recognizable languages is closed under concatenation.
Let $A$ and $B$ be DFAs. By Theorem 5.9, there exists an NFA that accepts the language $L(A)L(B)$. Then by Theorem 6.4, there must exist a DFA that accepts $L(A)L(B)$. Therefore, $L(A)L(B)$ is recognizable.
Ultimately, what this says is that nondeterminism is more of a tool of expression rather than power. We can certainly accomplish everything that we can do with NFAs without nondeterminism. However, doing so requires a significant cost in the size of our descriptions. At the same time, real computers are deterministic and must simulate nondeterminism. This comes at a cost.
The subset construction reveals that no matter how much capability we add to a finite automaton model, if we’re stuck with a finite number of states, then we should be able to determinize the model by explicitly encoding all the choices as states.
For instance, a question about the definition of NFAs from class was whether the initial state of an NFA had to be a single state or whether we could have used a set of initial states instead. Indeed, this is the definition that Kozen uses. And based on our earlier discussion of the subset construction, it’s not hard to see and show that this change would be one of convenience. We can always turn such machines into standard NFAs and even DFAs.
Now, we’ll consider another possible enhancement to the finite automaton: allowing transitions to other states via $\varepsilon$. This additional trick will be very convenient, but again, we have to ask the question: does this make finite automata more powerful?
First, the formal definition.
An $\varepsilon$-NFA is a 5-tuple $A = (Q,\Sigma,\delta,q_0,F)$ as in the definition of the NFA, except that the transition function is now a function from a state and either a symbol or $\varepsilon$ to a set of states, $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$.
So the question now is, what is the result of a transition function that allows $\varepsilon$s? Let’s consider the following example.
Consider the state diagram below. We can ask for each state, which other states we should be in if we “read $\varepsilon$”? In other words, if we don’t read anything, where would we be?
You might observe that, as these machines still only have a finite number of states, allowing $\varepsilon$-transitions does not give NFAs any additional power, and you would be absolutely correct. There are a number of ways to get to this conclusion formally:
For the most part, the distinction between an NFA and an $\varepsilon$-NFA is not that big, so it is usually not specified. You can assume that you can use $\varepsilon$-transitions in an NFA unless otherwise specified (but not DFAs).
Here is one way we can use this result.
The class of recognizable languages is closed under Kleene star.
Let $L$ be a recognizable language and suppose $A = (Q,\Sigma,\delta,q_0,F)$ is a DFA that accepts $L$. We can construct an $\varepsilon$-NFA $A'$ that accepts $L^*$ that looks like this:
Formally, we define $A'$ by $A' = (Q \cup \{q_0',\}, \Sigma, \delta', q_0', \{q_0'\})$, where the transition function $\delta' : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q \cup \{q_0'\}}$ is defined by
In other words, we modify $A$ to get $A'$ as follows:
As mentioned prior, $\varepsilon$-transitions are not strictly necessary, but this will make the construction and proof clearer.
First, suppose $w \in L(A’)$. The computation on $w$ must begin at $q_0'$ and must end at $q_0'$, making zero or more traversals through the original DFA $A$. In the case of zero traversals, this is $\varepsilon$, which is in $L^*$ for every language $L$. In the case of a non-empty string $w$, in order for $w$ to end in $q_0'$, each traversal must begin in the original initial state of $A$, $q_0$ and it must end in a final state of $A$, via which it can use an $\varepsilon$-transition to reach $q_0'$. Therefore, wach traversal through $A$ will correspond to a word $u$ such that $\delta(q_0,u) \in F$, i.e. it was accepted by $A$. So we can conclude that $w = u_1 u_2 \cdots u_k$, where $\delta(q_0,u_i) \in F$, for some $k \in \mathbb N$. Hence, each word accepted by $A'$ is a concatenation of zero or more words accepted by $A$ and thus, $w \in L^*$ by definition.
Now, suppose that $w \in L^*$. This means that $w = \varepsilon$, in which case it is accepted, or we can write $w = u_1 u_2 \cdots u_k$, where each $u_i \in L$ for some $k \geq 1$. Since each $u_i \in L$, we have that $u_i \in L(A)$ and therefore, $\delta(q_0, u_i) \in F$. Therefore, we can trace a path for $w$ through $A'$: \[q_0' \xrightarrow{\varepsilon} q_0 \xrightarrow{u_1} \delta(q_0, u_1) \xrightarrow{\varepsilon} q_0' \xrightarrow{\varepsilon} q_0 \xrightarrow{u_2} \delta(q_0, u_2) \xrightarrow{\varepsilon} q_0' \xrightarrow{\varepsilon} q_0 \rightarrow \cdots \rightarrow \delta(q_0, u_k) \xrightarrow{\varepsilon} q_0' .\] Therefore, $\delta'(q_0', w) \cap \{q_0'\} \neq \emptyset$ and therefore $w \in L(A')$.
Did we really need $\varepsilon$-transitions to prove this? According to the results we proved earlier, not really. But we also shouldn’t really need nondeterminism to prove this either. Again, the choice of model is really about how to express the solution. In this case, $\varepsilon$-transitions give us a way to control the entrance and exit of a computation into the original DFA $A$. This becomes a convenient mechanism to use in our proof.