A second important consequence of the Myhill–Nerode theorem comes from the proof of the Myhill–Nerode theorem, when it defines a DFA $A_{\equiv_L}$ for a regular language $L$. If the Myhill–Nerode equivalence classes are about distinguishable prefixes with respect to $L$, this suggests that a DFA recognizing $L$ will need at least as many states as $L$ has Myhill–Nerode equivalence classes. Since these equivalence classes correspond to the states of $A_{\equiv_L}$, which means that any DFA that recognizes $L$ must have at least as many states as $A_{\equiv_L}$.
For example, recall that we showed that we can use the product construction to obtain a DFA that recognizes the intersection of two regular languages. If the two languages have DFAs of size $m$ and $n$, then the product construction gives a DFA with up to $mn$ states. But do we really need that many? We can show that there exist languages for which we do need all of those states.
For all $m,n \geq 1$, there exist regular languages $L_m$ and $L_n$ recognized by a DFA with $m$ and $n$ states, respectively, such that a DFA that recognizes $L_m \cap L_n$ must have at least $mn$ states.
Let $\Sigma = \{a,b\}$ and consider the languages \begin{align*} L_m &= \{w \in \Sigma^* \mid |w|_a = 0 \pmod m\}, \\ L_n &= \{w \in \Sigma^* \mid |w|_b = 0 \pmod n\}. \end{align*} Then $L_m$ can be recognized by a DFA with $m$ states and $L_n$ can be recognized by a DFA with $n$ states (you can try to find the DFAs yourselves as an easy exercise). Then $$L_m \cap L_n = \{w \in \Sigma^* \mid |w|_a = 0 \pmod m \wedge |w|_b = 0 \pmod n\}.$$ So we need to consider the equivalence classes of $\equiv_{L_m \cap L_n}$.
We will consider strings $a^i b^j$ with $0 \leq i \lt m$ and $0 \leq j \lt n$ to be our representatives for the equivalence classes. To show that these are all distinct, we consider two words $x = a^i b^j$ and $y = a^k b^\ell$ with $0 \leq i, k \lt m$, $0 \leq j, \ell \lt n$, and $x \neq y$. This means that $i \neq k$ or $j \neq \ell$ (or both). We suppose $i \neq k$; the $j \neq \ell$ case follows a similar argument.
We choose the word $z = a^{m-i} b^{n-\ell}$. Then $xz \in L_m \cap L_n$ but $yz \not \in L_m \cap L_n$. To see this, the number of $a$s in $xz$ is $i + m - i = 0 \pmod m$ and the number of $b$s in $xz$ is $j + n - j = 0 \pmod n$. But the number of $a$s in $yz$ is $k + m - i \neq 0 \pmod m$ if $k \neq i$. So $z$ distinguishes $x$ and $y$.
This means that there are at least $mn$ pairwise distinguishable Myhill–Nerode equivalence classes ($i$ can range from a to $m-1$ and $j$ can range from a to $n-1$).
Since we know that the product construction has at most $mn$ states, together with this result, we’ve also just shown that there’s a DFA for $L_m \cap L_n$ that has exactly $mn$ states.
This leads to the notion of a minimal DFA, in terms of the number of states it has, for a language. So it’s clear from this that any minimal DFA will have as many states as there are Myhill–Nerode equivalence classes for its language, but we can actually say something stronger than that: the Myhill–Nerode DFA is the only minimal DFA for its language.
Let $L$ be a regular language. The DFA $A_{\equiv_L}$ is the unique minimal DFA for $L$ up to isomorphism.
Let $A = (Q,\Sigma,\delta,q_0,F)$ be a minimal DFA for $L$. Recall that the equivalence classes of the relation $\sim_A$ correspond to states of $A$. By Lemma 11.17, $\sim_A$ refines $\equiv_L$ and therefore, the index of $\sim_A$ must be at least the index of $\equiv_L$. Since we assumed $A$ was minimal, this means that $A$ has exactly as many states as $A_{\equiv_L}$.
Now, we need to show that there is a correspondence between states of $A$ and states of $A_{\equiv_L}$. Let $q \in Q$. Then $q$ must be reachable from the initial state $q_0$, since otherwise, we can remove it. This means there exists a word $w \in \Sigma^*$ such that $\delta(q_0,w) = q$. Then $q$ corresponds to the equivalence class for $[w]$.
Assuming it’s computable, the existence of a minimal DFA and the fact that a regular language has a unique minimal DFA gives us a way to test whether two DFAs or NFAs or regular expressions correspond to the same language: just turn the two objects into a minimal DFA and if they’re the same, then they accept the same language.
Luckily, the minimal DFA can be computed, and even better is that it is efficiently computable. The algorithm with the best worst-case time complexity is Hopcroft’s partition refinement algorithm (due to Hopcroft in 1971) which for an $n$-state DFA has time complexity $O(n \log n)$.
However, we will instead work through the more straightforward algorithm that Kozen gives in Chapter 14 of the textbook. The basic idea behind this is to try to identify states in the DFA that are indistinguishable.
Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA. For states $p,q \in Q$, we say that $p$ and $q$ are indistinguishable if for all $w \in \Sigma^*$, $\delta(p,w) \in F \iff \delta(q,w) \in F$.
A consequence of this definition is that if $p$ and $q$ are indistinguishable, then $p$ and $q$ must correspond to the same Myhill–Nerode equivalence class and we can replace both states with a single state. Now, as stated, this seems difficult to do (verify for all strings in $\Sigma^*$), so we can reframe this property in terms of distinguishability.
Let $A = (Q, \Sigma, \delta, q_0, F)$ be a DFA. Two states $p,q \in Q$ are said to be distinguishable if
Notice that we get this by negating the definition of indistinguishability and implicitly setting up the definition inductively on strings. This leads us to our algorithm for determining distinguishability of pairs of states, which was first formulated by Moore in 1956.
Given a DFA $A = (Q, \Sigma, \delta, q_0, F)$,
Here is an example. We have the following DFA.
First, we set up our table and consider the first iteration of our algorithms. Note that since pairs are unordered, we only fill half the table, so to distinguish unmarked cells from empty cells, we use -. Marked pairs of states are marked by which round of iteration they were marked in, for demonstrative purposes.
| $q_1$ | $q_2$ | $q_3$ | $q_4$ | $q_5$ | |
|---|---|---|---|---|---|
| $q_0$ | 1 | 1 | - | - | 1 |
| $q_1$ | - | 1 | 1 | - | |
| $q_2$ | 1 | 1 | - | ||
| $q_3$ | - | 1 | |||
| $q_4$ | 1 |
Next, we consider each pair of unmarked states and determine whether each pair reaches a marked pair. We have that $(q_1,q_5)$ and $(q_2,q_5)$ get marked in this phase, because $(\delta(q_1,a), \delta(q_5,a)) = (q_3,q_5)$ and $(\delta(q_2,a), \delta(q_5,a)) = (q_4,q_5)$, which have been marked distinguishable.
The reasoning here is simple: suppose that a pair wasn’t distinguishable. Then, because the “reaching the same state relation” $\sim_A$ is right-invariant, reading any word from the two states should get us to indistinguishable states. But clearly, this isn’t the case, since reading $a$ brings us to a pair of distinguishable states. It’s not too hard to see that we can extend this argument inductively.
| $q_1$ | $q_2$ | $q_3$ | $q_4$ | $q_5$ | |
|---|---|---|---|---|---|
| $q_0$ | 1 | 1 | - | - | 1 |
| $q_1$ | - | 1 | 1 | 2 | |
| $q_2$ | 1 | 1 | 2 | ||
| $q_3$ | - | 1 | |||
| $q_4$ | 1 |
We do another round of this, taking into account the new pairs of distinguishable states.
| $q_1$ | $q_2$ | $q_3$ | $q_4$ | $q_5$ | |
|---|---|---|---|---|---|
| $q_0$ | 1 | 1 | 3 | 3 | 1 |
| $q_1$ | - | 1 | 1 | 2 | |
| $q_2$ | 1 | 1 | 2 | ||
| $q_3$ | - | 1 | |||
| $q_4$ | 1 |
At this point, we’re finished because and we have two pairs of states that are indistinguishable. We can merge these states safely to get the following DFA.
An initial analysis of this algorithm puts it at something like $O(|\Sigma|n^4)$, where $n$ is the number of states in the DFA: there are $\binom n 2$ pairs of states and each iteration of the algorithm adds 1 pair in the worst case. A more careful analysis gets us down to $O(|\Sigma| n^3)$ via a not so obvious argument that the number of iterations is the length of the shortest distinguishing string for the last distinguishable pair—this happens to be no more than $n$. Then if you are slightly more clever, we can get this down to $O(|\Sigma| n^2)$, by maintaining a list of pairs $\{p',q'\}$ for each pair $\{p,q\}$, with $p' = \delta(p,a)$ and $q' = \delta(q',a)$ for some $a \in \Sigma$. Then, once the lists are built, we can mark pairs by going through the list in one go.
By virtue of being a polynomial-time algorithm, we’ve shown that we can consider DFA minimization to be “efficient”. So we can take for granted that we can efficiently determine whether two DFAs recognize the same language. For NFAs and regular expressions, things are a bit harder because the determinization process can lead to exponential state blowups. (In fact, while this puts DFA equality in $\mathsf P$, NFA and regular expression equality are actually $\mathsf{PSPACE}$-complete.)
Speaking of exponential state blow up, an algorithm that takes a very different approach from the Moore algorithm is due to Brzozowski. It is surprisingly simple to describe.
The DFA $\mathcal D((\mathcal D(A^R))^R)$ is a minimal DFA equivalent to $A$.
This gives us the algorithm: take your DFA $A$, compute the NFA for its reversal, compute the DFA for the reversal via the subset construction, reverse that, and finally do the subset construction on the result of that. Why does it work? It turns out one can show that applying the subset construction to an NFA that reverses a DFA will give you exactly the minimal DFA for the reversal of original DFA! So all you have to do is do it twice to get back where you started.
Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA accepting $L$ and suppose that every state of $A$ is reachable. Then $\mathcal D(A^R)$ is a minimal DFA for $L^R$.
Because Brzozowski’s algorithm is worst-case exponential, its complexity has been studied over the last few decades, both experimentally and through average-case analysis. Part of the challenge of doing such analysis is that it’s difficult to generate random DFAs and NFAs with a reasonable distribution and that behave in a reasonable way. For instance, just randomly generating DFAs with $n$ states by randomly generating transitions between them will result in a lot of disconnected DFAs.
This is for those of you who care about algebra. The minimal automaton that we get out of all of this is also called the quotient automaton, because if you think about our automaton as an algebraic object (which it is), then we’re essentially performing a quotient construction on it. Generally speaking, we could refer to any equivalence relation, but the one that matters the most in our case is $\equiv_L$, the Myhill–Nerode relation for the language of the machine. And so we can refer to this as $A/\equiv_L$ if we want.
As I’ve mentioned offhandedly at various points, we can think of languages as subsets of the free monoid on $\Sigma$. Now, we have these notions of equivalence classes of words and quotient constructions on automata so you may be wondering how these are all related.
Back when we defined regular languages in terms of regular operations, I mentioned that the same operations (union, concatenation/product, star) were called rational operations by the French and the algebraists. This terminology typically arises when applied to general monoids, not just free monoids, that are then used to define the rational subsets of a monoid.
In fact, there is another notion that we can apply that has suspiciously familiar terminology, called recognizable subsets of a monoid. We use recognizable in kind of but not really the same way as we do when thinking about the finite automaton as a machine, but again, it turns out there is a more general monoid definition for this. A subset $X \subseteq M$ of a monoid $M$ is said to be recognizable if there exist a finite monoid $N$, homomorphism $\varphi:M \to N$ and a subset $P \subseteq N$ such that $X = \varphi^{-1}(P)$.
So if we have $M = \Sigma^*$, then $X$ would be a language. The requirement that $N$ is a finite monoid should get your spidey-senses tingling and that it probably has to do with the Myhill–Nerode equivalence classes somehow. Using that assumption for now, the rest of the definition falls into place: our language $X$ is just the preimage of words that map, via our monoid homomorphism $\varphi$, to $P$, which we think should be the subset of the Myhill–Nerode equivalence classes that make up our language.
Our intuition and feeling around gets us most of the way there, but falls short, since the Myhill–Nerode equivalence is only concerned with right-sided syntactic equivalence ($u \equiv_L v$ iff for all $x \in \Sigma^*, ux \in L \iff vx \in L$). But there’s also a left-sided version of this: $u \approx_L v$ if and only if $\forall x \in \Sigma^*, xu \in L \iff xv \in L$. If we combine the two, we get what is called the Myhill relation or syntactic relation, $$u \cong_L v \iff (\forall s,t \in \Sigma^*, sut \in L \iff svt \in L).$$ Then our finite monoid $N$ from the definition is exactly the quotient monoid $\Sigma^*/\cong_L$. This is called the syntactic monoid of $L$.
So it’s probably still not clear how this is useful, so let’s consider yet another monoid. This time, let’s take our language $L$ and some DFA $A$ that accepts it. It would probably not surprise you at this point that we can also map the structure of the DFA onto a monoid. Instead of thinking of the transition function as a function $\delta:Q \times \Sigma \to Q$, we can consider a set of functions $\delta_a : Q \to Q$, that for each symbol $a \in \Sigma$ describe a transformation of $Q$. Then, we can compose these functions to get transformations on words by $\delta_w = \delta_a \circ \delta_x$ for $w = ax$ with $a \in \Sigma$ and $x \in \Sigma^*$. From this, we get the transition monoid of $A$, which is the monoid of our set of functions $\{\delta_a \mid a \in \Sigma\}$ together with function composition as the binary operation. This gives us a straightforward monoid homomorphism from $\Sigma^*$ to the transition monoid.
The big connection is this: the transition monoid of the minimal DFA of $L$ is isomorphic to the syntactic monoid of $L$. And with this, Kleene's theorem can be restated as a theorem about monoids. Recall that Kleene's theorem says that if a language is regular (can be described using regular operations), then it is accepted by a DFA. But we know that regular languages are just rational subsets (in the monoid sense) of $\Sigma^*$, which are those subsets that can be described using rational operations, which are just regular operations. We also know that languages that are accepted by DFAs are just recognizable subsets (in the monoid sense) of $\Sigma^*$, which are those subsets that are recognized by a finite syntactic monoid, which is isomorphic to the transition monoid of a finite automaton. So we have
Let $\Sigma$ be a finite alphabet. A language $L \subseteq \Sigma^*$ is rational if and only if $L$ is recognizable.
Very cool.