CMSC 28000 — Lecture 1

The Theory of Computation

The title of this course refers to “formal languages”. You will definitely learn a lot about formal languages. However, for our purposes, formal language theory is really more of the vehicle through which we talk about computation. Many similar courses at other institutions will call this course something like “Theory of Computation”. But what is “theory of computation”? Is it any different from “computer science”?

As we know, computer science is a branch of mathematics. Generally speaking, one can view math as the study of abstract structures. For example,

So what are the structures that computer science is concerned about? Let’s think back to what we do in computer science:

One obvious structure is the structure of the algorithms that solve problems. This is essentially what we study in CMSC 27200—algorithms are structures with certain properties that we can prove things about. But near the end of CMSC 27200, the script gets flipped: in discussing the notion of intractability, we begin to study the structure of the problems themselves.

At first, it might be a bit strange to think of a problem as a mathematical object, but if we read what we said above carefully, we realize that we have always implicitly assumed that problems have some kind of structure. After all, that’s why algorithms even work.

So very broadly, the theory of computation is about problems and how to solve them mechanically. Here, I use the word mechanically rather than computationally to emphasize that we’re interested in solving problems via a well-defined process that can be automated, rather than focusing on the physical machines that we think of as computers. The key idea in the theory of computation is that all of the objects we study, problems and the processes by which we can solve them, are themselves mathematical objects.

Problems are just sets of strings

So what exactly is a problem? In a course like CS 141 or Theory of Algorithms, we think of problems as questions which ultimately turn out to be functions, mapping inputs to outputs. This is very clear in CS 141: despite the effort we put into writing up problem descriptions, at the end of the day, the correctness of your solution is judged against a test suite that checks whether your function produces the expected outputs. However, it’s important to separate the mathematical notion of a function from the way we use the term function in programming.

A common misconception is that the definition of a function needs to be something that tells us how to get from an input to an output. After all, we’re introduced to functions as lines and curves that are expressed as polynomials. But if we view functions from the perspective of set theory, then they are nothing more than sets of input-output pairs.

A similar misconception about sets is that their definition needs to tell us how to figure out whether an element is a member of the set or not, but this isn’t a requirement. For instance, we can define a set of triples of integers $$\{(a,b,c) \in \mathbb Z^3 \mid \exists n \in \mathbb N, a^n + b^n = c^n\}.$$ For a very long time, no one knew how to determine exactly which triples belong to this set, but despite this, there’s nothing wrong with the definition itself. So set definitions and, by extension, function definitions don’t actually require the “how”.

This is really important because the “how” is the crucial link that brings us over into computer science. When we fill in the how, as we do in courses like CS 141, what we typically call “functions” are really algorithms: the steps for computing the (abstract) function. Solving a problem then amounts to describing a program or algorithm that correctly computes the function.

It’s not hard to see how the belief that functions are just algorithms leads to the assumption that every problem can be solved via some algorithm. This assumption gets to the heart of what we’re going to be exploring in this course.

However, functions can have all sorts of inputs and outputs, which complicates things if we want to study them. So in this course, and for a lot of computability and complexity theory, we focus on decision problems, which are those questions (or functions) that can be answered by (have an output of) YES or NO.

For instance, given a natural number, we can ask for its prime factors. In this case, we’d have a function $\operatorname{\textsf{PrimeFactors}}: \mathbb N \to \operatorname{\textsf{Set}}(\mathbb N)$ that maps a number to its set of prime factors. However, we might ask instead whether a given natural number $n$ is prime. This is a yes or no question, say $\operatorname{\textsf{IsPrime?}}:\mathbb N \to \{\mathrm{Yes},\mathrm{No}\}$.
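To make the contrast concrete, here is a small Python sketch of both versions. The function names and the trial-division approach are illustrative choices, not part of the lecture:

```python
def prime_factors(n: int) -> set[int]:
    """The function problem: map n to its set of prime factors."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors.add(n)
    return factors

def is_prime(n: int) -> bool:
    """The decision problem: the same topic, but the answer is Yes/No."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n**0.5) + 1))
```

The decision version throws away information (it doesn’t tell you the factors), and that loss is exactly what makes the theory cleaner to set up.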

We can get fancier, of course. What if we wanted to receive multiple inputs? Or more exotic inputs? For example, we could ask, given a graph $G$ and vertices $u$ and $v$ in that graph, whether $G$ contains a path between $u$ and $v$. Again, this is a yes or no question. So in addition to restricting our output, we take another simplification and restrict our set of inputs.

To justify this step, let’s think again to the programs we write. We can write functions that take any number of things as inputs: lists, graphs, dogs. However, while mathematicians can think of these things literally, computer scientists don’t—we always have some encoding of our data into strings.

In programming, a string is just a sequence of characters. We often view strings as text data. Even at the level of defining a class, we have to express the description of the class in text: what are the names and types of the attributes and methods?

So we will assume that all problems have string inputs. Now we’ve taken arbitrary problems as functions \[P : \mathsf{Input} \to \mathsf{Output}\] and restricted them to string functions with binary outputs \[P : \mathsf{Strings} \to \{\mathrm{Yes}, \mathrm{No}\}.\] This is great because it gives us a very clear set of functions to work with. But we’re not quite done just yet.

One observation we can make is that such a function, with only two outputs, essentially defines a set of strings. If $P$ is our function for our decision problem, then we can define the set \[\{x \in \mathsf{Strings} \mid P(x) = \mathrm{Yes}\},\] the set of strings for which $P$ answers Yes. The strings for which $P$ answers No then form the complement of this set.
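As a toy Python illustration of this correspondence (the predicate chosen here, even length over the binary alphabet, is a made-up example):

```python
from itertools import product

# A decision problem P : Strings -> {Yes, No}, with True standing for Yes.
def P(x: str) -> bool:
    return len(x) % 2 == 0

# All binary strings of length at most 3, standing in for "Strings".
strings = ["".join(t) for k in range(4) for t in product("01", repeat=k)]

# The set {x | P(x) = Yes} and its complement partition the strings.
language = {x for x in strings if P(x)}
complement = {x for x in strings if not P(x)}
```

Every decision function determines such a set, and conversely every set of strings determines a decision function (its membership test), so nothing is lost in the translation.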

This is a very important sleight of hand because instead of thinking about functions, we can now reframe our questions to be about sets of strings:

What’s interesting about this is that it allows us to set aside the meaning of the problem (a function on some interesting domain) and focus instead on the structure of the strings in a set. In some sense, we have now abstracted our abstraction.

Perhaps this sounds a bit too high-level and is the result of theoreticians trying to avoid the consequences of the real world. But if we take a trip going down past the theory of algorithms, past the text of source code, down underneath the operating system and compiler, and past the assembly code, we find machine code and we find transistors and voltage passing through wire. All the way on the opposite side of theory, deep inside the real, cold machine, we find sequences of bits: 1s and 0s—binary strings.

This is the basic connection between the study of problems and how to compute them and the study of the mathematical properties of strings and sets of strings. Sets of strings are called languages in formal language theory. Then solving a problem amounts to solving the language membership problem for strings. It seems a bit strange to be able to reduce computation to thinking about strings, but this is the origin of some of the most fundamental results in our field.

Strings

Languages are sets of strings, so to talk about languages, we need to talk about strings first. We typically run into strings when we first learn to program. They also happen to be an interesting mathematical object.

An alphabet is a non-empty finite set of symbols. We typically denote alphabets by the uppercase Greek letter $\Sigma$.

When we think of an alphabet we usually think of alphabets for written languages like the Latin alphabet, $\{a,b,c,d,\dots,z\}$. However, we can take any non-empty finite set to be an alphabet.

Just as we fix a domain of discourse in mathematical logic, we fix the alphabet ahead of anything else, and all further discussion is with respect to that alphabet. We do not typically mix alphabets.

A string or word $w = a_1 a_2 \cdots a_n$ over an alphabet $\Sigma$ is a finite sequence of symbols where each $a_i \in \Sigma$. The set of all strings over $\Sigma$ is denoted $\Sigma^*$.
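Since an alphabet is just a finite set, we can enumerate $\Sigma^*$ length by length. A minimal Python sketch, using the toy alphabet $\{a, b\}$:

```python
from itertools import product

sigma = ["a", "b"]  # a toy alphabet

def strings_up_to(n):
    """Yield all strings over sigma of length at most n, shortest first."""
    for k in range(n + 1):
        for tup in product(sigma, repeat=k):
            yield "".join(tup)

# list(strings_up_to(2)) → ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Note that $\Sigma^*$ itself is infinite (we can only ever enumerate a finite prefix of it), but every individual string in it is finite.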

Depending on the particular geographic or research community, strings are sometimes called words. Using word to mean string does make sense, since we’re talking about “languages”. Anecdotally, Americans and computer scientists are more likely to call these strings, while Europeans and mathematicians are more likely to call them words.

For example, cat, hat, and tomato are all strings over the Latin alphabet we mentioned earlier.

This is the typical definition of a string and probably the one you’re used to. For instance, in C, a string is nothing but an array of chars. However, a slightly more insightful and formal way to define strings is as a recursive object.

A string or word $w$ over an alphabet $\Sigma$ is either the empty string, denoted $\varepsilon$, or $w = ax$, where $a \in \Sigma$ is a symbol and $x$ is a string over $\Sigma$.

At first glance, the empty string seems like a strange object, but if you’ve ever programmed, then you’re used to the fact that "" is a string. While in programming, we often treat the empty string almost as an annoyance, we’ll see that the empty string plays an important role in the mathematical theory of strings.

As with other recursive definitions, this definition gives us a natural way to define other properties of strings and perform induction on them.

The length of a word $w$ is denoted by $|w|$ and is the number of symbols in $w$. The length of a string can be defined inductively by \[|w| = \begin{cases} 0 & \text{if } w = \varepsilon, \\ 1 + |x| & \text{if } w = ax. \end{cases}\]
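The inductive idea ($|\varepsilon| = 0$, and $|ax| = 1 + |x|$) translates directly into code. A minimal Python sketch, treating Python strings as our strings:

```python
def length(w: str) -> int:
    """Length by the inductive definition: the empty string has
    length 0, and a string a·x has length 1 + |x|."""
    if w == "":          # base case: w = ε
        return 0
    return 1 + length(w[1:])  # inductive case: w = a·x
```

Of course, Python’s built-in `len` does the same job; the point is that the recursive definition is already an algorithm.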

The most basic operation that we can perform on strings is concatenating them.

The concatenation of two strings $u = a_1 a_2 \cdots a_m$ and $v = b_1 b_2 \cdots b_n$, denoted $u \cdot v$ or simply $uv$, is $uv = a_1 \cdots a_m b_1 \cdots b_n$.

The use of $\cdot$ here is suggestive of “product” or “multiplication”. Just like with multiplication, we can denote concatenation explicitly, using $\cdot$, but most of the time, we do so implicitly.

Observe that concatenation is associative (that is, $(uv)w = u(vw)$): \[(to \cdot ma) \cdot to = toma \cdot to = tomato = to \cdot mato = to \cdot (ma \cdot to).\]

However, it is not commutative (i.e. $uv$ is not necessarily equal to $vu$): \[to \cdot mato = tomato \neq matoto = mato \cdot to.\]

This makes sense: how we group the strings we combine doesn’t matter, but the order in which we place them does, because strings are read in a particular direction.

Usually, we learn about concatenation when learning to program. Depending on how you learned to program, you may think of it as sticking two strings together. But if you learned to program in a functional programming language, then you will have seen a version of the following alternative definition before.

The concatenation of two strings $u$ and $v$ is defined inductively: \[u \cdot v = \begin{cases} v & \text{if } u = \varepsilon, \\ a(x \cdot v) & \text{if } u = ax. \end{cases}\]

To see that this works, just follow the definition: \begin{align*} cat \cdot hat &= c(at \cdot hat) \\ &= c(a(t \cdot hat)) \\ &= c(a(t(\varepsilon \cdot hat))) \\ &= c(a(t(hat))) \\ &= c(a(that)) \\ &= c(athat) \\ &= cathat \\ \end{align*}

This definition has a few advantages over the earlier one. As with other recursive definitions, this one acts as an algorithm as well—it tells us how two strings are to be concatenated. And because it is more explicit, it allows us to actually prove that concatenation is associative (give it a shot), rather than just observe it.
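Translated directly into Python (again treating Python strings as our strings, a simplification), the inductive definition of concatenation might look like this:

```python
def concat(u: str, v: str) -> str:
    """Concatenation by the inductive definition:
    ε·v = v, and (a·x)·v = a·(x·v)."""
    if u == "":              # base case: u = ε
        return v
    a, x = u[0], u[1:]       # inductive case: u = a·x
    return a + concat(x, v)
```

Notice that the recursion peels symbols off the first argument one at a time, exactly as in the $cat \cdot hat$ derivation above.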

It also becomes more clear in this view of strings that alphabet symbols are not technically considered strings. This is similar to the way that some programming languages distinguish characters from strings. Sometimes, this distinction will be very helpful in proofs. Other times, like in Python, we can be lazy and not bother making this distinction.

Notice that the more explicit definition also makes it clear that we can concatenate with the empty string. If we have a string $w$, then $w \varepsilon = \varepsilon w = w$. (Technically, only $\varepsilon w = w$ holds by definition; $w \varepsilon = w$ must be proved from the definition, say by induction on $w$.)

An interesting consequence of these definitions is that they set us up to define an algebraic structure.

A monoid is a structure $\langle M,\cdot,1 \rangle$, where $M$ is a set, $\cdot$ is an associative binary operation, and $1$ is an identity for $\cdot$.

From this, we can see that the set of strings, with the binary operation concatenation and identity element $\varepsilon$, is a monoid, and this is where $\Sigma^*$ gets its notation from—it is the free monoid on the set $\Sigma$. This is an important connection to be aware of because it tells us that strings aren’t just arbitrary objects made up by programming language designers who needed a way to represent text. In fact, strings are real mathematical objects.

The other thing we get from this is that monoids are algebraic objects (just add inverses and you’ve got groups!), which points to the possibility of studying formal language theory using algebraic tools. For instance, one neat thing this gives us is the ability to define a structure-preserving map on strings.

Let $\Sigma$ and $\Gamma$ be two alphabets. A (string) homomorphism is a function $h:\Sigma^* \to \Gamma^*$ such that for all $u,v \in \Sigma^*$, $h(uv) = h(u) h(v)$.

This is just the definition of a homomorphism, stated specifically for strings. This is not surprising if you’ve done some algebra: based on our prior discussion, this is just a monoid homomorphism. Other kinds of structure-preserving transformations you may have encountered before include graph isomorphisms (a very specific kind of graph homomorphism) and linear transformations (vector space homomorphisms).
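Because $\Sigma^*$ is the free monoid on $\Sigma$, a string homomorphism is completely determined by where it sends each symbol. A small Python sketch, where the particular symbol map ($a \mapsto 01$, $b \mapsto 1$) is a made-up example:

```python
def make_hom(symbol_map: dict[str, str]):
    """Build the homomorphism h : Σ* → Γ* determined by a map on symbols."""
    def h(w: str) -> str:
        # Apply the symbol map to each symbol and concatenate the images.
        return "".join(symbol_map[a] for a in w)
    return h

h = make_hom({"a": "01", "b": "1"})
```

Since $h$ is defined symbol by symbol, the homomorphism property $h(uv) = h(u)h(v)$ (and $h(\varepsilon) = \varepsilon$) holds automatically.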