CMSC 28000 — Lecture 1

The Theory of Computation

The title of this course refers to formal languages, and you will definitely learn a lot about formal languages. However, for our purposes, formal language theory is really more of a vehicle through which we talk about computation. At many other schools, a similar course would be called something like Theory of Computation. But what is the theory of computation?

As we know, computer science is mathematics. Generally speaking, one can view math as the study of structures. For example, algebra studies structures like groups and rings, and graph theory studies graphs.

So what are the structures that computer science is concerned with? Let's think back to what we do in computer science:

One obvious structure is the structure of the algorithms that solve problems. This is essentially what we study in CMSC 27200—algorithms are structures with certain properties that we can prove things about. But we can also study the structure of the problems.

At first, it might be a bit strange to think of a problem as a mathematical object, but if we read what we said above carefully, we realize that we have always implicitly assumed that problems have some kind of structure. After all, that's why algorithms even work.

So very broadly, the theory of computation is about problems and how to solve them mechanically. Here, I use the word mechanically rather than computationally to emphasize the idea that we're interested in how to solve problems via a well-defined process that can be automated rather than focusing on the machines called computers. The key idea in the theory of computation is that all of the objects that we study, problems and the processes by which we can solve these problems, are themselves mathematical objects.

So what is a problem? In a course like CS 141 or Theory of Algorithms, we think of problems as questions, which ultimately turn out to be functions mapping inputs to outputs. This is very clear in CS 141: despite the effort we put into writing up problem descriptions, at the end of the day, the correctness of your solution is judged against a test suite that checks whether your function produces the expected outputs. However, it's important to separate the mathematical notion of a function from the way we use the term function in programming.

A common misconception is that the definition of a function needs to be something that tells us how to get from an input to an output. After all, we're introduced to functions as lines and curves that are expressed as polynomials. But if we view functions from the perspective of set theory, then they are nothing more than sets of input-output pairs. Again, a common misconception about sets is that their definition needs to be something that tells us how to figure out whether an element is a member of the set or not, but this isn't a requirement.

For instance, we can define a set $X = \{(a,b,c) \in \mathbb Z^3 \mid \exists n \geq 3, a^n + b^n = c^n\}$. For a very long time, no one knew whether this set contained anything beyond trivial members like $(a, 0, a)$; that question was only settled by the proof of Fermat's Last Theorem. But despite this, we would acknowledge that there's nothing wrong with this definition. So set definitions, and by extension function definitions, don't actually require the "how".

This is really important because the "how" is the crucial link that brings us over into computer science. When we fill in the how, as we do in situations like in CS 141, what we typically call functions are really algorithms: the steps to computing the function.

This leads us to another assumption that we tend to make about functions: we assume that there's necessarily an algorithm that computes every function. This assumption gets to the heart of what we're going to be exploring in this course. So then solving problems amounts to describing a program or algorithm that correctly computes the function.

However, functions can have all sorts of inputs and outputs, which complicates things if we want to study them. So in this course, and for a lot of computability and complexity theory, we focus on decision problems, which are those questions (or functions) that can be answered by (have an output of) YES or NO.

For instance, given a natural number, we can ask for its prime factors. In this case, we'd have a function $\operatorname{Factor}: \mathbb N \to \operatorname{List}(\mathbb N)$ that maps a number to the list of its prime factors. However, we might instead ask whether a given natural number $n$ is prime. This is a yes or no question, say $\operatorname{Prime}:\mathbb N \to \{\mathrm{Yes},\mathrm{No}\}$.
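
To make the distinction concrete, here's a minimal Python sketch of both versions of the question. Trial division is chosen purely for simplicity, and the names mirror $\operatorname{Factor}$ and $\operatorname{Prime}$ above; the code itself is my own illustration, not part of the formal development.

```python
# A minimal sketch of the two versions of the question,
# using trial division purely for simplicity.

def factor(n: int) -> list[int]:
    """The function version: map n to its list of prime factors."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever remains is prime
    return factors

def prime(n: int) -> bool:
    """The decision version: answer Yes (True) or No (False)."""
    return n > 1 and factor(n) == [n]

# factor(12) == [2, 2, 3]; prime(12) is False; prime(13) is True
```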

We can get fancier, of course. What if we wanted to receive multiple inputs? Or more exotic inputs? For example, given a graph $G$ and vertices $u$ and $v$ in that graph, we could ask whether $G$ contains a path between $u$ and $v$. Again, this is a yes or no question. So in addition to restricting our output, we make another simplification and restrict our set of inputs.

To justify this step, let's think back to the programs we write. We can write functions that take all kinds of things as inputs: datasets, graphs, dogs. However, while mathematicians can think of these things literally, computer scientists don't: we always have some encoding of our data as strings. So we assume that all problems have string inputs.
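
As a concrete (and entirely hypothetical) illustration, here is one way we might encode an instance of the path problem from above, a graph $G$ together with vertices $u$ and $v$, as a single string. Nothing about this particular scheme is canonical; the point is only that some such encoding always exists.

```python
# One possible encoding of a path-problem instance (G, u, v) as a
# string over the small alphabet {0, 1, ',', ';', '#'}.

def encode_instance(edges: list[tuple[int, int]], u: int, v: int) -> str:
    def enc(n: int) -> str:
        return bin(n)[2:]          # write each vertex number in binary
    edge_part = ";".join(f"{enc(a)},{enc(b)}" for a, b in edges)
    return f"{edge_part}#{enc(u)}#{enc(v)}"

# e.g. encode_instance([(0, 1), (1, 2)], 0, 2) == "0,1;1,10#0#10"
```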

So we've taken arbitrary problems as functions \[P : \mathrm{IN} \to \mathrm{OUT}\] and restricted them to string functions with binary outputs \[P : \Sigma^* \to \{\mathrm{Yes}, \mathrm{No}\}.\] This is great because it gives us a very clear set of functions to work with. But we're not quite done just yet.

One observation we can make is that such a function essentially defines a set of strings. If $P$ is our function for our decision problem, then we can define a set \[\{x \in \Sigma^* \mid P(x) = \mathrm{Yes}\},\] the set of strings for which $P$ answers Yes. Then the complement of this set is all of the other strings.
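
Here's a toy Python sketch of this correspondence. The particular decision problem (strings with an even number of 1s) is an arbitrary choice of mine, and since $\Sigma^*$ is infinite we only materialize the strings of length at most 3.

```python
# A toy illustration: a decision function p and the set of strings it
# accepts carry the same information.
from itertools import product

SIGMA = "01"

def p(x: str) -> bool:
    """An arbitrary decision problem: does x contain an even number of 1s?"""
    return x.count("1") % 2 == 0

strings = ["".join(w) for n in range(4) for w in product(SIGMA, repeat=n)]
L = {x for x in strings if p(x)}          # the Yes-set of p

# Membership in L agrees with p everywhere we can check:
assert all(p(x) == (x in L) for x in strings)
```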

This is a very important sleight of hand: instead of thinking about functions, we can now reframe our questions to be about sets of strings. In essence, we've worked our way from thinking about problems to thinking about sets of strings. What's interesting is that this allows us to set aside the meaning of the problem (a function on some interesting domain) and instead focus on the structure of the strings in this set.

And so we have established the basic connection between the study of problems and how to solve them mechanically, and the study of the mathematical properties of strings and sets of strings. Sets of strings are called languages in formal language theory, and solving a problem then amounts to solving the membership problem for the corresponding language. It seems very strange to be able to reduce computation to thinking about strings, but this is the origin of some of the most fundamental results in our field.

Strings

Languages are sets of strings, so to talk about languages, we need to talk about strings first. We typically first run into strings when we learn to program, but they also happen to be interesting mathematical objects.

An alphabet is a non-empty finite set of symbols. We typically denote alphabets by $\Sigma$.

When we think of an alphabet, we usually think of something like the Latin alphabet $\{a,b,c,d,\dots,z\}$. However, we can take any non-empty finite set to be an alphabet. An alphabet that we're all probably very familiar with is the binary alphabet $\{0,1\}$. From this, we can make binary strings. We can also consider more exotic alphabets if we wish. For instance, we might consider the alphabet of DNA bases $\{A,C,G,T\}$. Or we can consider a very large, but still finite, alphabet like Unicode.

A string or word $w = a_1 a_2 \cdots a_n$ over an alphabet $\Sigma$ is a finite sequence of symbols where each $a_i \in \Sigma$. The set of all strings over $\Sigma$ is denoted $\Sigma^*$.

Depending on the particular subfield of research, strings are sometimes called words. And using word to mean string makes some sort of sense since we're talking about "languages". Anyhow, I can guarantee that even if I promise to try to use one consistently, I will slip and use the other, so I will make no such promises.

This definition for a string is a typical one, and it's probably the one you're used to. However, a slightly more insightful and formal way to define strings is to define them recursively.

A string or word $w$ over an alphabet $\Sigma$ is either the empty string, denoted $\varepsilon$, or $xa$, where $x$ is a string over $\Sigma$ and $a \in \Sigma$ is a symbol.

At first glance, the empty string seems like a strange object, but if you've ever programmed, then you're used to the fact that "" is a string. While in programming, we often treat the empty string almost as an annoyance, we'll see that the empty string plays an important role in the mathematical theory of strings.

As with other recursive definitions, this definition gives us a natural way to define other properties of strings and perform induction on them.

The length of a word $w$ is denoted by $|w|$ and is the number of symbols in $w$. The length of a string can be defined inductively by \[|\varepsilon| = 0 \qquad\text{and}\qquad |xa| = |x| + 1 \quad \text{for } x \in \Sigma^*, a \in \Sigma.\]
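
To see the recursion in action, here's a small Python sketch (my own modeling, not part of the formal definitions) that represents a string as either the empty string or a pair of a shorter string and a final symbol, with length computed exactly as in the inductive definition above.

```python
# A string is either the empty string (modeled as ()) or a pair (x, a)
# of a shorter string x and a last symbol a.
EMPTY = ()

def snoc(x, a):
    """The 'xa' case of the definition: append symbol a to string x."""
    return (x, a)

def length(w) -> int:
    """|epsilon| = 0 and |xa| = |x| + 1, mirroring the definition above."""
    if w == EMPTY:
        return 0
    x, _ = w
    return length(x) + 1

# "aba" built up recursively:
w = snoc(snoc(snoc(EMPTY, "a"), "b"), "a")
assert length(w) == 3
```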