Finite automata are arguably the simplest model of computation. Finite automata first appeared as far back as the 1940s in the context of early neural networks. The idea was that we wanted to represent an organism or robot of some kind responding to some stimulus, which we can also call events. Upon the occurrence of an event, the robot would change its internal state in response and there would be a finite number of possible such states. Today, the use of finite automata for modelling such systems is still around in a branch of control theory in engineering under the name discrete event systems. But all of this has nothing to do with how we're going to consider finite automata.
First, we'll introduce the idea of a "machine" as used in the theory of computation, which we'll be coming back to throughout the course. A machine has an input tape, divided into cells, with an input word written on it. The machine has a tape head that can scan the input tape, reading a symbol from each cell.
The machine also contains a number of states, and depending on the symbol that is read by the tape head, the machine's internal state will change. Upon reading the entire input, the machine accepts or rejects the input word based on the state that the machine is in. We call these finite automata because they have only a finite number of states. It wasn't stated explicitly, but based on the description, we also assume that the tape head can only move in one direction. Because of this restriction, it's not really necessary to think about the tape portion of the machine, at least for now.
We can represent finite automata by drawing the important part of the machine: the state diagram. We draw each state as a circle and label it as appropriate. Transitions are represented as arrows, leaving one state and entering another, labelled by the appropriate input symbol. We denote accepting states by a double circle and we denote the start state by an arrow that comes in from nowhere in particular.
Formally, a deterministic finite automaton (DFA) is defined as a 5-tuple $A = (Q,\Sigma,\delta,q_0,F)$, where
Note that by our definition, $\delta$ must be defined for every state and input symbol. We call this a deterministic finite automaton because the transition function $\delta$ is uniquely determined by the state and input symbol.
From our example above, we can list all the parts that we've specified in our formal definition:
and $\delta$ is given by \begin{align*} \delta(q_0,a) &= q_1 & \delta(q_0,b) &= q_1 & \delta(q_0,c) = q_0 \\ \delta(q_1,a) &= q_2 & \delta(q_1,b) &= q_0 & \delta(q_1,c) = q_1 \\ \delta(q_2,a) &= q_3 & \delta(q_2,b) &= q_2 & \delta(q_2,c) = q_2 \\ \delta(q_3,a) &= q_0 & \delta(q_3,b) &= q_3 & \delta(q_3,c) = q_0 \end{align*}
The transition function can also naturally be described as a table,
| $a$ | $b$ | $c$ | |
|---|---|---|---|
| $q_0$ | $q_1$ | $q_1$ | $q_0$ |
| $q_1$ | $q_2$ | $q_0$ | $q_1$ |
| $q_2$ | $q_3$ | $q_2$ | $q_2$ |
| $q_3$ | $q_0$ | $q_3$ | $q_0$ |
Intuitively, we have a sense of what acceptance and rejection means. We read a word and trace a path along the automaton and if we end up in an accepting state, the word is accepted and otherwise, the word is rejected. But how would we define all of these notions formally?
First of all, it makes sense that we should be able to define this formally by thinking about the conditions that we've placed on the transition function. Since on each state $q$ and on each symbol $a$, we know exactly what state we end up on, it makes sense that each word takes us to a specific state.
So, we can extend the definition of the transition function so that it is a function from a state and a word to a state, $\hat \delta : Q \times \Sigma^* \to Q$. We define it recursively. First, $\hat \delta(q,\varepsilon) = q$ for all states $q \in Q$. Then, for $w \in \Sigma^*$, if $|w| > 0$, let $w = xa$ with $|a| = 1$ and we have $$\hat \delta(q,w) = \delta(\hat \delta(q,x),a).$$ First, we want to make sure that our intuition about $\hat \delta$ matches with $\delta$, which we'll do with the following simple lemma.
Lemma. Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA with extended transition function $\hat \delta$. Then for any state $q \in Q$ and symbol $a \in \Sigma$, $\hat \delta(q,a) = \delta(q,a)$.
Proof. $\hat \delta(q,a) = \hat \delta(q,\varepsilon a) = \delta(\hat \delta(q,\varepsilon),a) = \delta(q,a)$. $$\tag*{$\Box$}$$
It might be helpful to note that we can also approach this definition from the other direction. That is, instead of writing $w = xa$, we consider the factorization $w = ax$. Then, our definition when $|w| > 0$, becomes $$\hat \delta(q,w) = \hat \delta(\delta(q,a),x).$$
Now, we can formally define the acceptance of a word by a DFA $A = (Q,\Sigma,\delta,q_0,F)$. We say $A$ accepts a word $w \in \Sigma^*$ when $\hat \delta(q_0,w) \in F$. Then the language of a DFA $A$ is the set of all words accepted by $A$, $$L(A) = \{w \in \Sigma^* \mid \hat \delta(q_0,w) \in F \}.$$ We also sometimes say that a word or language is recognized by $A$. It is important to note that when we say that a DFA accepts a language $L$, we are saying that the DFA accepts only those words in $L$ and rejects all words not in $L$.
At this point, I will mention that the class of languages that can be accepted by DFAs is called the class of regular languages, although at this point, it is probably not obvious why we call them regular.
Now, we can begin to think about the kinds of problems that finite automata can solve. As a first example, we'll consider the language $$ L = \{w \in \{0,1\} \mid \text{$w$ is the binary representation of a number that is divisible by 3} \}.$$ Let's build a DFA that recognizes this language. Let $A = (Q,\Sigma,\delta,q_0,F)$, where
| $0$ | $1$ | |
|---|---|---|
| $q_0$ | $0$ | $1$ |
| $0$ | $0$ | $1$ |
| $1$ | $2$ | $0$ |
| $2$ | $1$ | $2$ |
The state diagram for $A$ is shown below.
Let's think about how this machine should work. The DFA works by reading a string from left to right, which means that it'll be reading the most significant bit first. The natural way to think about this problem, then, is to consider what our input modulo 3 should be. Obviously, we want to accept a string if and only if it corresponds to an integer that's equivalent to $0 \bmod 3$. Keeping this in mind, having each state for the different possibilities modulo 3 seems like a reasonable idea.
So, suppose we've read a string $w$ and now we want to read either a $0$ or a $1$. We just have to remember a bit about binary math. Let $n(w)$ denote the integer that $w$ represents in binary. Then, $$n(w0) = 2 \cdot n(w) \quad \text{and} \quad n(w1) = 2 \cdot n(w) + 1.$$ With this in mind, we just need to think about how reading a new bit will change our result modulo 3. We can summarize the changes to $n(w) \bmod 3$ in the following table:
| $n(w) \bmod 3$ | $n(w0) \bmod 3$ | $n(w1) \bmod 3$ |
|---|---|---|
| $0$ | $0$ | $1$ |
| $1$ | $2$ | $0$ |
| $2$ | $1$ | $2$ |
Now, this looks suspiciously like a transition table. Let's prove this formally.
Proposition. $L = L(A)$.
Proof. Let $n(w)$ denote the integer represented by the binary string $w$. We claim that a string $w$ reaches state $i$ if and only if $n(w) = i \bmod 3$, for $i = 0,1,2$. We will show this by induction on the length of $w$.
First, as our base case, let $w = \varepsilon$. Then $\delta(q_0,\varepsilon) = q_0$. Since $n(\varepsilon)$ is not defined, it is not equivalent to $i \bmod 3$ for any $i = 0,1,2$. Thus, our condition holds.
Now suppose $|w| = k$ and for every string $w'$ with length less than $k$, $w'$ reaches state $i$ if and only if $n(w') = i \bmod 3$. Let $w = w' a$, where $w' \in \Sigma^*$ and $a \in \Sigma$. There are two cases to consider.
Thus, we have shown that $w$ reaches a state $i$ if and only if $n(w) = i \bmod 3$. Since $0$ is the only final state of $A$, we can conclude that $A$ accepts $w$ if and only if $n(w)$ is divisible by 3. $$\tag*{$\Box$}$$
From this, one might ask whether it's possible to construct a DFA that recognizes all of the binary strings that represent integers that are divisible by $n$ for any arbitrary integer $n$. The answer is yes, and it's not hard to see why. There are only finitely many equivalence classes modulo $n$ (or rather, exactly $n$) and all we need to do is keep track of how this number changes with each additional symbol read. You can even take this approach and make it work for strings representing integers in any base $k$.
The Hamming distance of two words $x$ and $y$, denoted $H(x,y)$, is defined for $|x| = |y|$ and is the number of positions in which $x$ and $y$ differ. If $|x| \neq |y|$ then $H(x,y)$ is undefined. So the Hamming distance between the words $stars$ and $story$ is 2, since the third and fifth positions of the two words differ.
Now, we want to consider the set of words that are a Hamming distance of at most 2 away from, say, the word $beef$. A straightforward thing to do would be to simply list all of the words and try to construct a DFA that recognizes all of those words. Of course, this is possible since there are only finitely many such words (or are there?). But this is pretty tedious and it's the kind of thing we invented computers to do for us, so there must be a simpler way.
How does this DFA work? We can see that reading the original word, takes us straight to the final state and our word is accepted. However, what happens when we encounter a symbol that differs from what we expect? In our DFA, we descend one level and move to the next state. In other words, we progress horizontally as we read symbols, but we keep track of how many positions have differed from our original word so far. In the end, we either reach rightmost set of states and can accept, or we either differ in too many positions and we end up in the sink state and our word is rejected. There are also other possibilities: our word is shorter or longer than $beef$ and it is not accepted either.
We can generalize this. If we have any word $w = a_1 a_2 \cdots a_n$, we can construct the $k$-neighbourhood if $w$ in the following way: let $A_{w,k} = (Q_{w,k}, \Sigma, \delta_{w,k}, q_0, F_{w,k})$ be a DFA defined by
and the transition function $\delta_{w,k}$ is defined by
This DFA works in roughly the same way as in the example described above. There were a few interesting comments about this construction in class. First, it is true that there are states in this construction and in the example above that are not reachable. We didn't need to draw them or define them and we can get rid of them (an interesting question to consider is how you might go about proving this). Secondly, despite the existence of these unreachable states, the size of this construction is $O(nk)$, measured in the number of states. Finally, a question to think about is how we might prove formally that our DFA construction is correct.