We’ve seen now that regular languages have some finiteness property that we’ve waved our hands around, no matter the formalism we use. This property is clearly exhibited by the states of a finite automaton. We see that this property is present in the statement of the pumping lemma and that if a language doesn’t have this property then we can conclude it is not regular. And we now see it show up again when considering the derivatives of a regular expression.
What is this property? It’s time to try to clearly articulate it. Recall that when we wanted to formally prove the correctness of a finite automaton, we had to argue that its states try to “group” strings together and every string belongs to one of these groups. Suppose $L$ is a regular language. Then there’s a DFA $A$ that accepts $L$ and we know for any string $w \in \Sigma^*$, we always have that $\delta(q_0, w) \in Q$. This suggests that we should start thinking of some kind of equivalence between these strings that is related to our language $L$.
Let’s recall some definitions surrounding equivalence.
A relation $\mathsf R$ over a set $S$ is a subset $\mathsf R \subseteq S \times S$ and we write $x \mathrel{\mathsf R} y$ if $(x,y) \in \mathsf R$. A relation $\mathsf R$ is an equivalence relation if it satisfies the following:
An equivalence relation partitions $S$ into equivalence classes. An equivalence class with representative $x$ is $$[x] = \{y \in S \mid x \mathrel{\mathsf R} y\}.$$
An equivalence relation that most of you should be familiar with is $\equiv_m$, or equivalence modulo $m$ for the integers. For instance, we know that $\equiv_2$ has two equivalence classes: the odd and even integers. This is a partition because every integer must be odd or even and not both.
Now, let’s define an equivalence relation based on a DFA.
Let $A = (Q, \Sigma, \delta, q_0, F)$ be a DFA. We define the relation $\sim_A$ for words $u,v \in \Sigma^*$ by $u \sim_A v$ if and only if $\delta(q_0,u) = \delta(q_0,v)$.
In other words, we can consider two words to be equivalent under $\sim_A$ if they reach the same state. Notice that a DFA naturally partitions all strings—every string gets taken to some state of the DFA and exactly one state. And the notion that the DFA considers two strings equivalent if they reach the same state makes sense: all it knows is that the two strings reached a particular state, but once a string enters the state, it doesn’t know anything more about it, like its exact path it took to get there.
The relation $\sim_A$ is an equivalence relation on $\Sigma^*$.
We will show that $\sim_A$ satisfies each of reflexivity, symmetry, and transitivity.
Therefore, $\sim_A$ is an equivalence relation.
The fact that an automaton will naturally partition all the strings and group them into equivalence classes is interesting but it suggests that this should really be a property of the language. Why? Because a langauge can have many DFAs that accept it. But if you think about it, all of these DFAs should have similar kinds of partitions so there should be a way to think about how strings are related with respect to a language and that should be where the automaton gets its structure from. So we’ll try to define a similar kind of relation based on the language rather than a particular automaton.
We will need some additional definitions about equivalence relations.
The index of a relation $\mathsf R$ is the number of equivalence classes of $\mathsf R$; we say $\mathsf R$ is of finite index if it has finitely many equivalence clases. For two relations $\mathsf R_1, \mathsf R_2$ over $S$, we say $\mathsf R_1$ is a refinement of, or refines $\mathsf R_2$ if $\mathsf R_1 \subseteq \mathsf R_2$ (i.e. $\forall x,y \in S, x \mathrel{\mathsf R_1} y \rightarrow x \mathrel{\mathsf R_2} y$).
The following property is almost immediate from the definition.
If $\mathsf R_1$ refines $\mathsf R_2$, then each equivalence class of $\mathsf R_2$ is a union of some equivalence classes of $\mathsf R_1$.
Consider the equivalence relation is $\equiv_4$, equivalence modulo 4, which has four equivalence classes $[0], [1], [2], [3]$ (so $\equiv_4$ has index 4). Observe that $\equiv_4$ is a refinement of $\equiv_2$. Why?
In other words, we say that $\equiv_4$ refines $\equiv_2$ because it “breaks up” each equivalence class into more equivalence classes: $[0]_2$ into $[0]_4$ and $[2]_4$, and $[1]_2$ into $[1]_4$ and $[3]_4$. The integers mod 4 are “finer” than the integers mod 2 because we are more “specific” (there are more equivalence classes) about the relationships between those integers.
It is not hard to see that for any DFA $A$, the relation $\sim_A$ should have finite index.
Let $A$ be a DFA. The relation $\sim_A$ has finite index.
Let $A = (Q, \Sigma, \delta, q_0, F)$. There is an equivalence class for each state of $A$, therefore the index of $\sim_A$ is at most $|Q|$ and is finite.
Next, we’ll introduce an important definition, which moves us from talking about generic equivalence relations to thinking about relations on strings.
We say that a relation $\sim$ is right invariant if $\forall x,y,z, x \sim y \rightarrow xz \sim yz$.
One can define a notion of right-invariance on any operation where order matters. For example, you can say that $\equiv_m$ is right-invariant with respect to modular multiplication (i.e. if $a \equiv_m b$ then $ac \equiv_m bc$ for all $c \in \mathbb Z_m$). In this case, we are dealing with concatenation. This basically says that a relation is right-invariant if attaching the same suffix to two strings doesn’t change anything about their relationship under $\sim$.
Let $A$ be a DFA. The relation $\sim_A$ is right-invariant.
Let $A = (Q, \Sigma, \delta, q_0, F)$. Suppose $x \sim_A y$ and let $z \in \Sigma^*$. Then \begin{align*} \delta(q_0,xz) &= \delta(\delta(q_0,x), z) \\ &= \delta(\delta(q_0,y), z) & \text{since $x \sim_A y$}\\ &= \delta(q_0,yz) \end{align*} Therefore, $xz \sim_A yz$ and therefore $\sim_A$ is right-invariant.
Let’s think about what this is saying. It basically says that two strings that end up in the same state are equivalent and that attaching the same suffix to both strings, the resulting strings are also equivalent. Of course, this makes sense: once we start from the same state, if we read the same string, we will necessarily end up in the same state. In this sense, the two strings are equivalent: we have forgotten what distinguishes them at the point that they arrive in the same state.
One of the consequences of this idea of states as equivalence classes means that a language that is accepted by a DFA can be described as a union of entire equivalence classes. That is, there’s no situation where we only take some of the words of some equivalence class to be in the language.
Let $A$ be a DFA. The language $L(A)$ is a union of equivalence classes of $\sim_A$.
Let $A = (Q, \Sigma, \delta, q_0, F)$. We will show if $x \sim_A y$, then $x \in L \iff y \in L$. That is, if $x$ and $y$ are in the same equivalence class, they must both be in the language. This means that if $x \in L$, then the entire equivalence class $[x]$ is contained in $L$. Suppose $x \sim_A y$. Then $$x \in L \iff \delta(q_0,x) \in F \iff \delta(q_0,y) \in F \iff y \in L.$$
Now, so far, we have still been defining relations about DFAs, not the language itself. Recall that our motivation today is to try to get away from the machine. A big reason for this is that the language is really the object that we care about, but a language can have many machines that accept it. Our goal is to try to avoid having to say something about many machines and hope that we can say something specific about the language that will cover all of those machines.
To that end, let’s try to define a similar kind of equivalence relation for languages as $\sim_A$, based on the language instead of a DFA. That is, we want an equivalence relation about strings where, adding the same suffixes to the strings “does the same thing” to them. But because we’re dealing with languages, “the same thing” really boils down to membership in the language.
Let $L \subseteq \Sigma^*$ be a language. We define the relation $\equiv_L$ for words $x,y \in \Sigma^*$ by $x \equiv_L y$ if and only if $\forall z \in \Sigma^*, xz \in L \iff yz \in L$. The relation $\equiv_L$ is called the Myhill–Nerode relation.
Let $L = \{a^k b \mid k \geq 0\}$. Then $aaa \equiv_L aaaaaaa$, since for any string $x \in \{a,b\}^*$, we have $aaax \in L$ if and only if $aaaaaaax \in L$. On the other hand, $aab \not \equiv_L aaaaa$, since there is at least one string, $y = ab$, for which $aaby = aabab \not \in L$ and $aaaaay = aaaaaab \in L$.
Note that the relation $\equiv_L$ is defined for any language $L$, not just those that are regular. It’s also important to note that the relation applies to all words, not just those in $L$.
It turns out that Myhill–Nerode relations are right-invariant equivalence relations.
$\equiv_L$ is an equivalence relation.
$\equiv_L$ is right-invariant.
Suppose that $x \equiv_L y$. Then $xu \in L \iff yu \in L$ for all $u \in \Sigma^*$. Let $u = vz$. Then, $$x(vz) \in L \iff (xv)z \in L \iff (yv)z \in L \iff y(vz) \in L.$$
What this leads to is the major result that we’re interested in: a characterization of regular languages that directly invokes the finite property we’ve been looking for.
Let $L \subseteq \Sigma^*$ be a language. Then $L$ is regular if and only if the Myhill–Nerode relation $\equiv_L$ is of finite index. That is, $\equiv_L$ has finitely many equivalence classes.
Let’s put a few things together.
In order to prove the Myhill–Nerode theorem, we’ll first prove a lemma that gives us a way to relate the equivalence relation $\sim_A$ for a DFA $A$ (and other similar relations) with the Myhill–Nerode relation for the language.
Let $L \subseteq \Sigma^*$, and let $\approx$ be any right-invariant equivalence relation on $\Sigma^*$ such that $L$ is a union of equivalence classes of $\approx$. Then $\approx$ refines $\equiv_L$.
Suppose $x \approx y$. Then $xu \approx yu$ for all $u \in \Sigma^*$. Since $L$ is a union of equivalence classes of $\approx$, we have $xu \in L$ if and only if $yu \in L$. Therefore, $x \equiv_L y$.
Since $\sim_A$ is a right-invariant relation and $L$ is a union of equivalence classes of $\sim_A$, it satisfies the lemma. We’ll use this fact in the proof of the Myhill–Nerode theorem. We’ll restate the theorem so that it’s about three conditions rather than just two.
Let $L \subseteq \Sigma^*$ be a language. The following are equivalent:
$(1) \implies (2)$: If $L$ is regular, then there exists a DFA $A$ that recognizes $L$. Then $L$ is the union of equivalence classes of $\sim_A$, which is a right invariant equivalence relation of finite index.
$(2) \implies (3)$: Since $\sim_A$ is a right invariant equivalence relation such that $L$ is a union of equivalence classes of $\sim_A$, by Lemma 11.17, $\sim_A$ refines $\equiv_L$. Since $\sim_A$ is of finite index, the index of $\equiv_L$ is at most the index of $\sim_A$, and so $\equiv_L$ must also be of finite index.
$(3) \implies (1)$: We define a DFA $A_{\equiv_L} = (Q,\Sigma,\delta,q_0,F)$ that recognizes $L$, with
Here, the states of our DFA are exactly the equivalence classes of $\equiv_L$. Since $\equiv_L$ has finite index, this means the machine we defined has a finite number of states.
To show that $\delta$ is well-defined for our equivalence classes, suppose that $x \equiv_L y$. Since $\equiv_L$ is right-invariant, we have $xa \equiv_L ya$ for $a \in \Sigma$. Then, $$\delta([x],a) = [xa] = [ya] = \delta([y],a).$$
Then to show that $L(A_{\equiv_L}) = L$, we see that for $w \in \Sigma^*$, \begin{align*} w \in L(A_{\equiv_L}) & \iff \delta([\varepsilon],w) \in F \\ &\iff [w] \in F \\ &\iff w \in L \end{align*}
One thing to note about the last part is that this is not an algorithm for constructing a DFA—given a language (whatever that means), it’s not necessarily the case that we “know” its equivalence classes or use this information in an algorithm. Recall that a language is not really an object that we can “compute” with. Rather, we need some representation of this language in order to do anything with it. So what the last part of the proof really says is that given the structure of $L$ and the properties of $\equiv_L$, there must exist some machine out there that looks and works like the one we described.
The Myhill–Nerode theorem is called as such because it was independently proved by John Myhill and Anil Nerode in the late 50s. Myhill worked in logic and computability theory moving among philosophy and math departments in the 1950s and eventually settled at SUNY Buffalo in the 1960s. Nerode is currently a professor at Cornell, where he’s been a faculty member for 60 years. More importantly, he proved this theorem while he was a PhD student here, working with Saunders Mac Lane.
There are some significant consequences of the Myhill–Nerode theorem. One of the most immediate is that it gives us another way to determine whether a language is regular or not.
Let $L = \{a^n b^n \mid n \in \mathbb N\}$. We claim that the equivalence classes of $\equiv_L$ are $$\{[\varepsilon], [a], [aa], \dots\} \cup \{[b],[ab],[aab],\dots\}.$$ We need to show that each of these classes are actually different. To do this, we wwill show that they are pairwise distinct. That is, the strategy is to pick any two classes and show that there exists a word $w$ such that $uw \in L$ and $vw \not \in L$ (or vice-versa).
From our enumeration above, we’ve grouped the equivalence classes into two broad sets. To show that every class is pairwise distinct, we’ll need to argue that taking two from the same set will be different for each set, then taking one class from each will also be different.
This shows that there are at least as many equivalence classes as we’ve described, but to fully characterize the language, we want to make sure that we’re not missing any other possible classes. Consider a possible equivalence class defined by a word $w$. There are two possibilities.
It’s clear from this that $L$ has infinitely many equivalence classes and so it can’t be regular. However, we could imagine that if we could build an infinite state automaton, the Myhill–Nerode equivalence classes would tell us how to do it (we will not do this).