CMSC 28000 — Lecture 6

$\varepsilon$-transitions

Now, we'll consider another possible enhancement to the finite automaton: allowing transitions to other states via $\varepsilon$. This additional trick will be very convenient, but again, we have to ask the question: does this make finite automata more powerful? Some of you have already caught on when I introduced the NFA with multiple initial states and wondered if multiple initial states really makes a difference over a single initial state. That same intuition will serve us when we think about how $\varepsilon$-transitions actually work.

First, the formal definition.

Definition 6.1. An $\varepsilon$-NFA is a 5-tuple $A = (Q,\Sigma,\delta,q_0,F)$ as in the definition of the NFA, except that the transition function is now a function from a state and either a symbol or $\varepsilon$ to a set of states, $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$.

So the question now is, what is the result of a transition function that allows $\varepsilon$s? What we would like is a way to say, for some set of states $P \subseteq Q$, "all the states that are reachable on one or more $\varepsilon$-transitions from a state $p \in P$". We'll define the $\varepsilon$-closure of $P$ to be exactly this.

Definition 6.2. The $\varepsilon$-closure of a state $q \in Q$ is denoted by $\overline \varepsilon(q)$ and is defined inductively by

Then the $\varepsilon$-closure of a set $P \subseteq Q$, denoted $\overline \varepsilon(P)$ is defined by $$\overline \varepsilon(P) = \bigcup_{p \in P} \overline \varepsilon(p).$$

Now, we can define our new extended transition function.

Definition 6.3. Let $\hat \delta : Q \times \Sigma^* \to 2^Q$ be a function defined as follows,

In other words, our new extended transition function for the $\varepsilon$-NFA is just the one for the ordinary NFA, except that we need to ensure that we account for the $\varepsilon$-closure of the result.

You might observe that, as we still only need to consider at most $2^{|Q|}$ possible sets, that allowing $\varepsilon$-transitions do not give NFAs any additional power, and you would be absolutely correct. At this point, you can play the same game as with the NFA and perform the subset construction to acquire an equivalent DFA, as long as you make sure to account for the $\varepsilon$-closure.

If we have an $\varepsilon$-NFA $N = (Q,\Sigma,\delta,q_0,F)$, we can define a DFA $\mathcal D_\varepsilon(N) = (Q',\Sigma,\delta',q_0',F')$ in the same way as the subset construction above, but with two differences:

  1. $q_0' = \overline \varepsilon(q_0)$,
  2. for a set of states $S \subseteq Q$ and symbol $a \in \Sigma$, the transition function $\delta'$ is defined $$\delta'(S,a) = \overline \varepsilon \left(\bigcup_{p \in S} \delta(p,a)\right).$$

In other words, the subset construction for $\varepsilon$-NFAs is exactly the same as for NFAs, except that all the states of $\mathcal D_\varepsilon(N)$ are $\varepsilon$-closed subsets of $Q$. We can then use this to show the following.

Theorem 6.4. A language $L$ is accepted by an $\varepsilon$-NFA if and only if $L$ is accepted by a DFA.

The proof is almost exactly the same as for ordinary NFAs so we won't be going through it here. This result establishes that DFAs, NFAs, and $\varepsilon$-NFAs are all equivalent models and that they all recognize the same class of languages, the regular languages. In other words, nondeterminism and $\varepsilon$-transitions do not give our basic finite automaton model any additional power.

Now, let's finally see how to build an $\varepsilon$-NFA for the concatenation of two languages.

Theorem 6.5. The class of regular languages is closed under concatenation.

Proof. Let $L_1$ and $L_2$ be regular languages that are recognized by DFAs $A_1 = (Q_1, \Sigma, \delta_1, q_{0,1}, F_1)$ and $A_2 = (Q_2, \Sigma, \delta_2, q_{0,2}, F_2)$, respectively. We will construct an $\varepsilon$-NFA $A' = (Q',\Sigma, \delta', I', F')$ to recognize $L_1 L_2$. Here, we have

and the transition function $\delta':(Q_1 \cup Q_2) \times (\Sigma \cup \{\varepsilon\}) \rightarrow $2^{Q_1 \cup Q_2}$ is defined by

Here is a schematic of our machine.

This construction hinges on the observation that we can only make it into the automaton $A_2$ if we hit a final state of $A_1$. Consider a word $w \in \Sigma^*$ that's accepted by $A'$. Then $\delta'(\{q_{0,a}\}, w) \cap F_2 \neq \emptyset$. So we can split the word up into two words $w = xy$. Since the only entry point into $A_2$ is $q_{0,2}$, we have that $\delta_2(q_{0,2},y) \in F_2$. But in order for $q_{0,2}$ to be reached, there had to be some word $x$ such that $\delta_1(q_{0,1},x) \in F_1$. This gives us our two words $x \in L(A_1)$ and $y \in L(A_2)$. $\tag*{$\Box$}$

Let's think about this machine a bit. One thing we may notice is that we can remove the $\varepsilon$-transitions by forming $\varepsilon$-closures of states. To do this, we can add new transitions so that any state that has a transition to a final state of $A_1$ will go to the initial state of $A_2$ on that same symbol. Another way to do this would be to add transitions from $q_{0,2}$ to each of the final states of $A_1$.

And of course, we could construct this NFA and then turn it into a DFA with the subset construction. If $A_1$ was a DFA with $m$ states and $A_2$ was a DFA with $n$ states, then our NFA would have $m+n$ states and performing the subset construction gives us as many as $2^{m+n}$ states. However, we can get away with a lot fewer than this if take a closer look at how the NFA works.

There are some key observations. First, the only point where nondeterminism is introduced in our machine is when we enter a final state of $A_1$. Secondly, we only make use of nondeterminism when exploring $A_2$; we're ever only in one state of $A_1$ at a time.

These observations lead to the following construction, from the early 90s due to Yu, Zhuang, and Salomaa, where they study similar constructions for basic language operations. They also give lower bound examples for these constructions, showing that the these bounds are reachable in the worst case.

Theorem 6.6. Let $A_1$ and $A_2$ be DFAs with $m$ and $n$ states, respectively. Then there exists a DFA with at most $m2^n - 2^{n-1}$ states that recognizes $L(A_1)L(A_2)$.

Proof. Let $A_1 = (Q_1, \Sigma, \delta, q_{0,1}, F_1)$ and $A_2 = (Q_2, \Sigma, \delta_2, q_{0,2}, F_2)$. We will construct a DFA $A' = (Q',\Sigma,\delta',q_0',F')$, where

This DFA keeps track of a computation in $A_1$ and possibly many computations in $A_2$. We begin a new computation in $A_2$ by adding the initial state of $A_2$ whenever we reach a final state in $A_1$, mimicking the $\varepsilon$-transition from our NFA construction. We accept if one of our computations in $A_2$ that we've been keeping track of ends in a final state of $A_2$. We can make the same argument as for the NFA that this is only possible if a final state of $A_1$ was reached at some earlier point.

We'll talk about how we get the number of states next class.