CMSC 28000 — Lecture 1

Welcome to CMSC 28000! Basic course information can be found on the home page of the course website.

What is this course about?

The title of this course refers to formal languages and you will definitely learn a lot about formal languages. However, for our purposes, formal language theory is really more of the vehicle through which we talk about computation. Many similar courses at other schools will call this course something like Theory of Computation.

By this time, you've probably spent some time "doing" computer science: writing code, designing algorithms, analyzing programs, and so on, so you might have an idea of what "computing" involves. All of this sort of touches on what the theory of computation is about.

Very broadly, the theory of computation is about problems and how to solve them mechanically. Here, I use the word mechanically rather than computationally to emphasize the idea that we're interested in how to solve problems via a well defined process that can be automated. The key idea in the theory of computation is that all of the objects that we study, problems and the processes by which we can solve these problems, are themselves mathematical objects.

The aim of this course, then, is to introduce and formalize these notions of computation. We will be exploring different models of computation, starting with some very simple ones, and studying their limitations. As we run into these limits, we will introduce concepts that will give our models more power. And so, the main question of the course is this: Are there limits to what kinds of problems computational devices can solve?

Why should you take this course?

What will we be covering in this course?

This course is divided roughly into three parts, corresponding to models of computation and the problems they can solve, beginning with the simplest and progressing from there. In each part, we consider a model of computation, various ways to represent it, and the limits on the kinds of problems it can solve.

This is a theoretical course, and we'll be discussing theorems and how to prove them. To succeed in this course, you should be comfortable with mathematical proof. Although it's not a prerequisite, many of the results in this course will depend on ideas from discrete math, particularly mathematical induction.

Problems

So what is a problem? In this course and for a lot of computability and complexity theory, we focus on decision problems, which can be answered by a YES or NO. In this sense, we can view a problem as a function: given some input, our function tells us YES or NO.

For instance, a well-known problem would be something like: Given a natural number $n$, is $n$ a prime number? This is a yes or no question and we can define a function, say $\operatorname{Prime}:\mathbb N \to \{\mathrm{Yes},\mathrm{No}\}$.
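To make this concrete, here is a minimal sketch of the $\operatorname{Prime}$ function in Python. The trial-division method is just an illustrative choice; any correct procedure would define the same function.

```python
# A sketch of the Prime decision problem as a function from
# naturals to {Yes, No}; trial division suffices for illustration.
def prime(n: int) -> str:
    if n < 2:
        return "No"
    d = 2
    while d * d <= n:
        if n % d == 0:
            return "No"
        d += 1
    return "Yes"
```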

We can get fancier, of course. What if we wanted to receive multiple inputs? Or more exotic inputs? For example, we could ask, given a graph $G$ and vertices $u$ and $v$ in that graph, whether $G$ contains a path between $u$ and $v$. Again, this is a yes or no question.

Viewing such problems as functions is a good start, but we want something more general to work with. In the above examples, there's no unification in terms of the number of inputs or the domains of the functions, which makes it a bit difficult to talk about the two questions in the same way.

One insight you have likely internalized already is that we can represent pretty much anything we would like with the proper encoding. Generally speaking, this means that we can treat any input as a string. We can even treat "multiple" inputs, as in the case of our graph problem, as a single string, by encoding the delimiters appropriately as well. In this way, we can now define decision problems as functions from strings to Yes or No.

Another observation is that such a function essentially defines a set of strings. If $f$ is our function for our decision problem, then we can define $A$ to be the set of strings for which $f$ answers Yes and the complement of $A$ is all of the other strings.
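The observation above can be sketched in a few lines. The particular function here (Yes exactly on even-length binary strings) is a toy choice for illustration:

```python
# A decision function f and the set A of strings it accepts,
# restricted to a small finite universe for demonstration.
def f(w: str) -> bool:
    return len(w) % 2 == 0  # Yes iff the length is even

universe = ["", "0", "1", "00", "01", "10", "11"]
A = {w for w in universe if f(w)}
complement = {w for w in universe if not f(w)}
```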

And so we have already established a basic connection between the study of problems and how to compute them and the study of the mathematical properties of strings and sets of strings.

Strings

We typically first run into strings when we learn to program. They also happen to be an interesting mathematical object.

Definition 1.1. An alphabet is a finite set of symbols, denoted by $\Sigma$.

When we think of an alphabet we usually think of something like the Latin alphabet $\{a,b,c,d,\dots,z\}$. However, we can take any non-empty finite set to be an alphabet. An alphabet that we're all probably very familiar with is the binary alphabet $\{0,1\}$. From this, we can make binary strings. We can also consider more exotic alphabets if we wish. For instance, we might consider the alphabet of DNA bases $\{A,C,G,T\}$. Or we can consider a very large, but still finite, alphabet like Unicode.

Definition 1.2. A string or word $w = a_1 a_2 \cdots a_n$ over an alphabet $\Sigma$ is a finite sequence of symbols where each $a_i \in \Sigma$. The length of a word $w$ is denoted by $|w|$ and is the number of symbols in $w$. We denote by $\varepsilon$ the empty word, and $|\varepsilon| = 0$. Sometimes, we also want to consider the number of a particular alphabet symbol in a word. We denote by $|w|_a$ the number of occurrences of the letter $a$ in $w$.
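These notations correspond directly to familiar operations on Python strings, which is a convenient way to sanity-check them:

```python
w = "abba"
assert len(w) == 4        # |w|, the length of w
assert w.count("a") == 2  # |w|_a, occurrences of the letter a
assert w.count("b") == 2  # |w|_b
assert len("") == 0       # the empty word has |ε| = 0
```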

Depending on the particular subfield of research, strings are sometimes called words. I can guarantee that even if I promise to try to use one consistently, I will slip and use the other, so I will make no such promises.

The most basic operation that we can perform on strings is concatenating them.

Definition 1.3. The concatenation of two strings $u = a_1 a_2 \cdots a_m$ and $v = b_1 b_2 \cdots b_n$ is $uv = a_1 \cdots a_m b_1 \cdots b_n$.

Observe that concatenation is associative, since $(uv)w = u(vw)$. However, it is not commutative (i.e. $uv$ is not necessarily equal to $vu$). Usually, we learn about concatenation when we first learn to program and we think of it as sticking two strings together. This works even for the empty string. If we have a string $w$, then $w \varepsilon = \varepsilon w = w$. This leads us to a very interesting way to consider strings.
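All three properties can be checked directly, since Python's `+` on strings is exactly concatenation:

```python
u, v, w = "ab", "ba", "c"
# associativity: (uv)w = u(vw)
assert (u + v) + w == u + (v + w)
# not commutative: uv need not equal vu
assert u + v != v + u
# the empty string is the identity: wε = εw = w
assert w + "" == "" + w == w
```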

Definition 1.4. A monoid is a structure $\langle M,\cdot,1 \rangle$, where $M$ is a set, $\cdot$ is an associative binary operation, and $1$ is an identity for $\cdot$.

From this, we can see that the set of strings, with the binary operation concatenation, is a monoid. We even have an identity element: the empty string. Of course, at this point, this is little more than a curiosity, but it does point to a few things. First of all, strings aren't just arbitrary objects; they have mathematical meaning and structure. Secondly, monoids are algebraic objects (just add inverses and you've got groups!), and this points to the possibility that studying formal language theory using algebraic tools may not be such a bad idea.

Let's go back to concatenation and define a few more notions.

Definition 1.5. The $k$th power of a string $w$ is $$w^k = \overbrace{ww \cdots w}^{k \text{ times}}$$ and we define $w^0 = \varepsilon$. Formally, we define this inductively by $$w^0 = \varepsilon \qquad \text{and} \qquad w^{k+1} = w^k w \text{ for all } k \in \mathbb N.$$
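The inductive definition translates almost word for word into a recursive function. This is a sketch mirroring the definition, not an efficient implementation (Python's `w * k` would do the same job):

```python
# w^0 = ε and w^(k+1) = w^k · w, written recursively.
def power(w: str, k: int) -> str:
    if k == 0:
        return ""          # base case: w^0 = ε
    return power(w, k - 1) + w  # inductive case: w^(k+1) = w^k · w
```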

Many of the notions in this course have a natural inductive definition. If you aren't already, you should get comfortable with inductive definitions because they will be useful when we begin proving things.

Definition 1.6. If $w = uxv$ is a string, we say $u$ is a prefix of $w$, $v$ is a suffix of $w$, and $x$ is a subword or infix or substring or factor of $w$.

Much like the usage of string or word, there are a plethora of terms that can be used interchangeably with what you're probably most familiar with as a substring. The term subword is the same idea, if you were to use word instead of string. Similarly, the term infix comes naturally from the analogy with prefix and suffix. The one which may be less obvious is factor; this comes from viewing the string as an algebraic object.

Definition 1.7. The reversal of a string $w = a_1 a_2 \cdots a_n$ is the string with the symbols written in reverse order, $w^R = a_n \cdots a_2 a_1$. Formally, $$\varepsilon^R = \varepsilon \qquad \text{and} \qquad (ua)^R = a u^R \text{ for all } u \in \Sigma^* \text{ and } a \in \Sigma.$$

If $w = w^R$, we say $w$ is a palindrome.
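As with powers, the inductive definition of reversal can be sketched recursively, and the palindrome condition follows directly (Python's `w[::-1]` is the idiomatic way to reverse, but the recursion below follows the definition):

```python
# ε^R = ε and (ua)^R = a · u^R: peel the last symbol off and
# move it to the front, then reverse the rest.
def reverse(w: str) -> str:
    if w == "":
        return ""
    return w[-1] + reverse(w[:-1])

def is_palindrome(w: str) -> bool:
    return w == reverse(w)
```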

We'll put this to use by proving a quick result about reversal, with some practice with induction (which you are hopefully familiar with) as a bonus.

Theorem 1.8. For all strings $x, y \in \Sigma^*$, $(xy)^R = y^R x^R$.

Proof. We will show this by induction on $|y|$.

For our base case, let $|y| = 0$. This means that $y = \varepsilon$. We have \begin{align*} (xy)^R &= (x\varepsilon)^R &\text{substitution}\\ &= x^R &\text{$\varepsilon$ is the identity for concatenation} \\ &= \varepsilon^R x^R &\text{$\varepsilon^R = \varepsilon$ and identity}\\ &= y^R x^R & \text{substitution} \end{align*}

Now, for our induction step, suppose that for some $k \in \mathbb N$, all words $z \in \Sigma^*$ with $|z| = k$ satisfy $(xz)^R = z^R x^R$. Consider a word $y$ of length $k+1$. We write $y = za$, where $z \in \Sigma^*$ is a word of length $k$ and $a \in \Sigma$ is a single letter. Then we have $xy = xza$ and \begin{align*} (xy)^R &= (xza)^R &\text{substitution}\\ &= a(xz)^R &\text{definition of reversal}\\ &= a(z^R x^R) &\text{inductive hypothesis}\\ &= (za)^R x^R &\text{definition of reversal}\\ &= y^R x^R &\text{substitution} \end{align*} $\tag*{$\Box$}$
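A proof is a proof, but it never hurts to spot-check a theorem mechanically. The following sketch exhaustively verifies $(xy)^R = y^R x^R$ over all binary strings of length at most 3:

```python
from itertools import product

def reverse(w: str) -> str:
    return w[::-1]

# All binary words of length 0 through 3.
words = [""] + ["".join(p) for n in range(1, 4)
                for p in product("01", repeat=n)]

# Check (xy)^R = y^R x^R for every pair.
for x in words:
    for y in words:
        assert reverse(x + y) == reverse(y) + reverse(x)
```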