CMSC 28000 — Lecture 11

The fact that an automaton will naturally partition all the strings and group them into equivalence classes is interesting but it suggests that this should really be a property of the language. Why? Because a langauge can have many DFAs that accept it. But if you think about it, all of these DFAs should have similar kinds of partitions so there should be a way to think about how strings are related with respect to a language and that should be where the automaton gets its structure from. So we'll try to define a similar kind of relation based on the language rather than a particular automaton.

The index of a relation $R$ is the number of equivalence classes of $R$; $R$ is of finite index if it has finitely many equivalence clases. For two relations $R_1, R_2$ over $S$, we say $R_1$ is a refinement of, or refines $R_2$ if $R_1 \subseteq R_2$ (i.e. $\forall x,y \in S, x \mathrel{R_1} y \rightarrow x \mathrel{R_2} y$). If $R_1$ refines $R_2$, then each equivalence class of $R_2$ is a union of some equivalence classes of $R_1$.

Consider the equivalence relation is $\equiv_4$, equivalence modulo 4, which has four equivalence classes $[0], [1], [2], [3]$ (so $\equiv_4$ has index 4). Observe that $\equiv_4$ is a refinement of $\equiv_2$. Why? Suppose $u \equiv_4 v \equiv_4 3$. Then they're clearly odd numbers, so this must mean that $u \equiv_2 v \equiv_2 1$. Similarly, if $u \equiv_4 v \equiv_4 2$, then they're even. But $\equiv_3$, equivalence modulo 3, is not a refinement of $\equiv_2$—we have $3 \equiv_3 6$ but $3 \not\equiv_2 6$.

In other words, we say that $\equiv_4$ refines $\equiv_2$ because it "breaks up" the equivalence classes into more equivalence classes: $[0]_2$ into $[0]_4$ and $[2]_4$, and $[1]_2$ into $[1]_4$ and $[3]_4$. The integers mod 4 are "finer" than the integers mod 2 because we are more "specific" (there are more equivalence classes) about the relationships between those integers.

There is an equivalence class for each state of $A$, therefore the index of $\sim_A$ is at most $|Q|$ and is finite.

We say that a relation $\sim$ is right invariant if $\forall x,y,z, x \sim y \rightarrow xz \sim yz$.

Suppose $x \sim_A y$ and let $z \in \Sigma^*$. Then \begin{align*} \delta(q_0,xz) &= \delta(\delta(q_0,x), z) \\ &= \delta(\delta(q_0,y), z) \\ &= \delta(q_0,yz) \end{align*} Therefore, $xz \sim_A yz$ and therefore $\sim_A$ is right-invariant.

Let's think about what this is saying. It basically says that two strings that end up in the same state are equivalent and that attaching the same suffix to both strings, the resulting strings are also equivalent. Of course, this makes sense: once we start from the same state, if we read the same string, we will necessarily end up in the same state. In this sense, the two strings are equivalent: we have forgotten what distinguishes them at the point that they arrive in the same state.

One of the consequences of this idea of states as equivalence classes means that a language can be described as a union of entire equivalence classes. That is, there's no situation where we only take some of the words of some equivalence class to be in the language.

To show this, we show if $x \sim_A y$, then $x \in L \iff y \in L$. That is, if $x$ and $y$ are in the same equivalence class, they must both be in the language. This means that if $x \in L$, then the entire equivalence class $[x]$ is contained in $L$. Suppose $x \sim_A y$. Then $$x \in L \iff \delta(q_0,x) \in F \iff \delta(q_0,y) \in F \iff y \in L.$$

Okay, so let's try to define a similar kind of equivalence relation but based on the language instead of a DFA. That is, we want an equivalence relation about strings where, adding the same suffixes to the strings "does the same thing" to them. But because we're dealing with languages, "the same thing" really boils down to membership in the language.

Let $L \subseteq \Sigma^*$ be a language. We define the relation $\equiv_L$ for words $x,y \in \Sigma^*$ by $x \equiv_L y$ if and only if $\forall z \in \Sigma^*, xz \in L \iff yz \in L$. The relation $\equiv_L$ is the Myhill–Nerode equivalence relation.

Note that we can define $\equiv_L$ for any language, not just those that are regular. It's also important to note that the equivalence relation applies to all words, not just those in $L$.

Reflexivity: $xu \in L \iff xu \in L$ and therefore $x \equiv_L x$.
Symmetry: \begin{align*} x \equiv_L y &\iff \forall u \in \Sigma^*, (xu \in L \iff yu \in L) \\ &\iff \forall u \in \Sigma^*, (yu \in L \iff xu \in L) \\ &\iff y \equiv_L x \\ \end{align*}
Transitivity: Suppose $x \equiv_L y$ and $y \equiv_L z$. Then for all $u \in \Sigma^*$, $xu \in L \iff yu \in L$ and $yu \in L \iff zu \in L$. Thus, $xu \in L \iff zu \in L$ and therefore $x \equiv_L z$.

Suppose that $x \equiv_L y$. Then $xu \in L \iff yu \in L$ for all $u \in \Sigma^*$. Let $u = vz$. Then, $$x(vz) \in L \iff (xv)z \in L \iff (yv)z \in L \iff y(vz) \in L.$$

What this leads to is the major result that we're interested in: a characterization of regular languages that directly invokes the finite property we've been looking for.

Let $L \subseteq \Sigma^*$ be a language. Then $L$ is regular if and only if the Myhill–Nerode relation $\equiv_L$ is of finite index. That is, $\equiv_L$ has finitely many equivalence classes.

Let's put a few things together. DFAs define a right-invariant relation where strings that enter the same state are equivalent and that attaching a suffix to those strings makes them enter the same state. So this means that if $x \sim_A y$, then $xz \sim_A yz$ and $xz \in L$ iff $yz \in L$—that is, they either both enter a final state or they both don't. So there seems to be a connection between a DFA and the Myhill–Nerode relation.

So what's the finite property that the pumping lemma assumes of regular languages? It's really how many Myhill–Nerode equivalence classes they have. And so if a language has infinitely many Myhill—Nerode equivalence classes, it is not regular.

In order to prove the Myhill–Nerode theorem, we'll first prove a lemma relating the equivalence relation $\sim_A$ for a DFA $A$ with the Myhill–Nerode relation for the language.

Let $L \subseteq \Sigma^*$, and let $\approx$ be any right-invariant equivalence relation on $\Sigma^*$ such that $L$ is the union of equivalence classes of $\approx$. Then $\approx$ refines $\equiv_L$.

Suppose $x \approx y$. Then $xu \approx yu$ for all $u \in \Sigma^*$. Since $L$ is the union of equivalence classes of $\approx$, we have $xu \in L$ if and only if $yu \in L$. Therefore, $x \equiv_L y$.

Since $\sim_A$ is a right-invariant relation and $L$ is the union of equivalence classes of $\sim_A$, it satisfies the lemma. We'll use this fact in the proof of the Myhill–Nerode theorem. We'll restate the theorem so that it's about three conditions rather than just two.

$(1) \implies (2)$: If $L$ is regular, then there exists a DFA $A$ that recognizes $L$. Then $L$ is the union of equivalence classes of $\sim_A$, which is a right invariant equivalence relation of finite index.

$(2) \implies (3)$: By Lemma 12.1, $\sim_A$ refines $\equiv_L$. Since $\sim_A$ is of finite index, and the index of $\equiv_L$ is at most the index of $\sim_A$, $\equiv_L$ must also be of finite index.

$(3) \implies (1)$: We define a DFA $A_{\equiv_L} = (Q,\Sigma,\delta,q_0,F)$ that recognizes $L$, with

$Q = \{[x] \mid x \in \Sigma^*\}$,
$q_0 = [\varepsilon]$,
$F = \{[x] \mid x \in L\}$,
$\delta([x],a) = [xa]$.

Here, the states of our DFA are exactly the equivalence classes of $\equiv_L$. Since $\equiv_L$ has finite index, this means the machine we defined has a finite number of states.

To show that $\delta$ is well-defined for our equivalence classes, suppose that $x \equiv_L y$. Since $\equiv_L$ is right-invariant, we have $xa \equiv_L ya$ for $a \in \Sigma$. Then, $$\delta([x],a) = [xa] = [ya] = \delta([y],a).$$

Then to show that $L(A_{\equiv_L}) = L$, we see that for $w \in \Sigma^*$, \begin{align*} w \in L(A_{\equiv_L}) & \iff \delta([\varepsilon],w) \in F \\ &\iff [w] \in F \\ &\iff w \in L \end{align*}

The Myhill–Nerode theorem is called as such because it was independently proved by John Myhill and Anil Nerode in the late 50s. Myhill worked in logic and computability theory moving among philosophy and math departments in the 1950s and eventually settled at SUNY Buffalo in the 1960s. Nerode is currently a professor at Cornell, where he's been a faculty member for 60 years. More importantly, he proved this theorem while he was a PhD student here, working with Saunders Mac Lane.

There are some significant consequences of the Myhill–Nerode theorem. One of the most immediate is that it gives us another way to determine whether a language is regular or not.

Let $L = \{a^n b^n \mid n \in \mathbb N\}$. We claim that the equivalence classes of $\equiv_L$ are $$\{[\varepsilon], [a], [aa], \dots\} \cup \{[b],[ab],[aab],\dots\}.$$ We need to show that each of these classes are actually different and so we need to show that they are pairwise distinct. To show that two classes $[u]$ and $[v]$ are different, we show that there exists a word $w$ such that $uw \in L$ and $vw \not \in L$.

This shows that there are at least as many equivalence classes as we've described, but to fully characterize the language, we want to make sure that we're not missing any other possible classes. Consider a possible equivalence class defined by a word $w$. There are two possibilities.

It's clear from this that $L$ has infinitely many equivalence classes and so it can't be regular. However, if we could build an infinite automaton, the Myhill–Nerode equivalence classes would tell us how to do it (we will not do this).