This course is about computation. By this time, you've probably been "doing" computer science for a few years: writing code, designing algorithms, analyzing programs, and so on, so you might have an idea of what "computing" involves. All of this sort of touches on what the theory of computation is about.
Very broadly, the theory of computation is about problems and how to solve them mechanically. Here, I use the word mechanically rather than computationally to emphasize the idea that we're interested in how to solve problems via a well-defined process that can be automated. The key idea in the theory of computation is that all of the objects we study, both the problems and the processes by which we solve them, are themselves mathematical objects.
The aim of this course, then, is to introduce and formalize these notions of computation. We will be exploring different models of computation, starting with some very simple ones, and studying their limitations. As we run into these limits, we will introduce concepts that will give our models more power. And so, the main question of the course is this: Are there limits to what kinds of problems computational devices can solve?
This course is divided roughly into three parts.
At this point, you might be wondering what, say, regular languages and finite automata have to do with computation. This has to do with how we will be modeling "computation" and "problems". But first, we will need to refresh ourselves on some basics of formal language theory.
An alphabet is a finite set of symbols, denoted by $\Sigma$. When we think of an alphabet, we usually think of something like the Latin alphabet $\{a,b,c,d,\dots,z\}$. An alphabet we're all probably very familiar with is the binary alphabet $\{0,1\}$. However, we can consider more exotic alphabets if we wish. For instance, we might consider the alphabet of DNA bases $\{A,C,G,T\}$. Or we can consider a very large, but still finite, alphabet like Unicode.
A string or word $w$ over an alphabet $\Sigma$ is a finite sequence of symbols from $\Sigma$. The length of a word $w$ is denoted by $|w|$ and is the number of symbols in $w$. We denote by $\varepsilon$ the empty word, and $|\varepsilon| = 0$. Sometimes, we also want to count the occurrences of a particular symbol in a word. We denote by $|w|_a$ the number of occurrences of the letter $a$ in $w$.
There are a number of operations that we can perform on strings. The concatenation of two strings $u = a_1 a_2 \cdots a_m$ and $v = b_1 b_2 \cdots b_n$ is $uv = a_1 \cdots a_m b_1 \cdots b_n$. The $k$th power of a string $w$ is $$w^k = \overbrace{ww \cdots w}^{k \text{ times}}$$ and we define $w^0 = \varepsilon$. Alternatively, we can define this recursively: let $w^0 = \varepsilon$ and define $w^k = w^{k-1} w$.
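These operations translate directly into code if we model words as Python strings. Here is a small sketch; the function name `power` is my own, chosen for illustration, and it follows the recursive definition above.

```python
def power(w: str, k: int) -> str:
    """Compute w^k by the recursive definition: w^0 = ε, w^k = w^(k-1) w."""
    if k == 0:
        return ""  # the empty word ε
    return power(w, k - 1) + w

u = "abc"
v = "de"
print(u + v)                # concatenation uv: "abcde"
print(power("ab", 3))       # ("ab")^3: "ababab"
print("banana".count("a"))  # |w|_a with w = banana and a = a: 3
```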
If $w = uxv$ is a string, we say $u$ is a prefix of $w$, $v$ is a suffix of $w$, and $x$ is a subword or infix or substring or factor of $w$.
The reversal of a string $w = a_1 a_2 \cdots a_n$ is the string with the symbols written in reverse order, $w^R = a_n \cdots a_2 a_1$. If $w = w^R$, we say $w$ is a palindrome. Now, here, we'll observe that reversal is another operation like power, where there is a nice recursive definition: let $\varepsilon^R = \varepsilon$ and define for a string $x \in \Sigma^*$ and a symbol $a \in \Sigma$, $(xa)^R = ax^R$.
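The recursive definition of reversal can be transcribed almost verbatim; a sketch (again with illustrative function names):

```python
def reverse(w: str) -> str:
    """Reversal via the recursive definition: ε^R = ε, (xa)^R = a x^R."""
    if w == "":
        return ""
    x, a = w[:-1], w[-1]  # split w = xa, where a is a single symbol
    return a + reverse(x)

def is_palindrome(w: str) -> bool:
    """w is a palindrome iff w = w^R."""
    return w == reverse(w)

print(reverse("abcd"))           # "dcba"
print(is_palindrome("racecar"))  # True
```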
We'll see how useful this turns out to be by proving a quick result about reversal, practicing induction as a bonus.
Theorem. For all words $x, y \in \Sigma^*$, $(xy)^R = y^R x^R$.
Proof. We will show this by induction on $|y|$.
For our base case, let $|y| = 0$. This means that $y = \varepsilon$. We have $$(xy)^R = (x\varepsilon)^R = x^R = \varepsilon^R x^R = y^R x^R.$$
Now, for our induction step, let $|y| = k > 0$ and suppose that we have $(xz)^R = z^R x^R$ for all words $z$ with $|z| < k$. We write $y = za$ for some word $z$ with $|z| = k - 1$ and a single letter $a$. Then we have $xy = xza$ and $$(xy)^R = (xza)^R = a(xz)^R = a(z^R x^R) = (za)^R x^R = y^R x^R.$$ $$\tag*{$\Box$}$$
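Proofs like this can also be sanity-checked by brute force on short words. Here is a quick sketch that verifies the identity $(xy)^R = y^R x^R$ for all pairs of words of length at most 3 over a two-letter alphabet (a finite check, not a substitute for the proof):

```python
from itertools import product

def reverse(w: str) -> str:
    """String reversal (Python's slice notation)."""
    return w[::-1]

# All words over {a, b} of length at most 3.
alphabet = "ab"
words = ["".join(p) for n in range(4) for p in product(alphabet, repeat=n)]

# Check (xy)^R = y^R x^R for every pair.
assert all(reverse(x + y) == reverse(y) + reverse(x)
           for x in words for y in words)
print("identity holds on all", len(words) ** 2, "pairs")
```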
We denote by $\Sigma^*$ the set of all finite words over $\Sigma$. A language is a subset of $\Sigma^*$. We can describe a language in any of the ways we'd describe other sets. Here are some examples of languages.
Since languages are sets, we can perform any set theoretic operations we're familiar with. For a language $L \subseteq \Sigma^*$, the complement of $L$ is $\overline L = \Sigma^* \setminus L$.
This also means there are a few things we have to be careful with. For example, there is the notion of the empty language $\emptyset$, which is the language with no strings in it. This is different from the language $\{\varepsilon\}$, which is the language containing the empty string.
We can also extend many operations on strings to operations on languages. For two languages $L_1$ and $L_2$, we define the concatenation of $L_1$ and $L_2$ by $$ L_1 L_2 = \{uv \mid u \in L_1, v \in L_2 \}.$$ We can define power in a similar fashion. We let $L^0 = \{\varepsilon\}$ and define the $k$th power of $L$ recursively by $L^k = L^{k-1} L$. The Kleene star or Kleene closure of $L$ is then defined by $$ L^* = \bigcup_{i \geq 0} L^i.$$ We can similarly define the positive Kleene closure of $L$ by $$ L^+ = \bigcup_{i \geq 1} L^i.$$ In other words, $L^* = L^+ \cup \{\varepsilon\}$. Note that $L^+ = L^* \setminus \{\varepsilon\}$ only when $\varepsilon \notin L$: if $\varepsilon \in L$, then $\varepsilon \in L^1 \subseteq L^+$.
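For finite languages, these operations are easy to compute directly; for the Kleene star, we can only ever compute a finite approximation, since $L^*$ is infinite whenever $L$ contains a nonempty word. A sketch, modeling languages as Python sets of strings (function names are my own):

```python
def concat(L1: set, L2: set) -> set:
    """Concatenation of two finite languages: {uv | u in L1, v in L2}."""
    return {u + v for u in L1 for v in L2}

def lang_power(L: set, k: int) -> set:
    """L^0 = {ε}; L^k = L^(k-1) L."""
    if k == 0:
        return {""}
    return concat(lang_power(L, k - 1), L)

def kleene_star_upto(L: set, n: int) -> set:
    """Finite approximation of L*: the union of L^0, ..., L^n."""
    result = set()
    for i in range(n + 1):
        result |= lang_power(L, i)
    return result

L = {"0", "1"}
print(sorted(kleene_star_upto(L, 2)))  # ['', '0', '00', '01', '1', '10', '11']
```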
Now, we'll take another opportunity to prove a simple theorem to see how to show that two languages are equal.
Theorem. $L^2 \subseteq L$ if and only if $L = L^+$.
Proof. First, we assume that $L = L^+$. Since $L^2 \subseteq L^+$ for every language $L$, if $L = L^+$, then we must have $L^2 \subseteq L$.
Now, we assume that $L^2 \subseteq L$. To show that $L = L^+$, we must show that $L \subseteq L^+$ and $L^+ \subseteq L$. The first direction is easy: $L \subseteq L^+$ for all languages $L$. So we just need to show $L^+ \subseteq L$.
Since $L^+ = \bigcup_{i \geq 1} L^i$, it suffices to show that $L^i \subseteq L$ for all $i \geq 1$, which we will do by induction on $i$. First, let $i = 1$. Then $L^1 = L \subseteq L$ trivially.

Now, let $i > 1$ and suppose that $L^k \subseteq L$ for all $k < i$. Consider a word $w \in L^i = L L^{i-1}$. We can write $w = uv$ where $u \in L$ and $v \in L^{i-1}$. By our induction hypothesis, $L^{i-1} \subseteq L$, so $v \in L$. But this means that $uv \in L^2$, which means that $uv = w \in L$ by our initial assumption. We have shown that $L^i \subseteq L$ for all $i \geq 1$. Thus, $L^+ \subseteq L$ and we have shown that $L = L^+$. $$\tag*{$\Box$}$$
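A concrete language satisfying the theorem's hypothesis is the set of even-length binary words: the concatenation of two even-length words has even length, so $L^2 \subseteq L$. Here is a bounded check, on all words of length at most 6, that $L$ and $L^+$ agree (a finite sanity check of the theorem's conclusion, not a proof):

```python
from itertools import product

def in_L(w: str) -> bool:
    """L = words over {0,1} of even length."""
    return len(w) % 2 == 0

def in_L_plus(w: str) -> bool:
    """w is in L^+ iff w = u1 u2 ... uk with k >= 1 and each ui in L.
    Checked by a simple recursive search over nonempty splits."""
    if in_L(w):
        return True  # w itself is in L = L^1
    return any(in_L(w[:i]) and in_L_plus(w[i:]) for i in range(1, len(w)))

words = ["".join(p) for n in range(7) for p in product("01", repeat=n)]

# L^2 ⊆ L: concatenating two members of L stays in L.
assert all(in_L(u + v) for u in words if in_L(u) for v in words if in_L(v))
# L and L^+ agree on every word of length <= 6.
assert all(in_L(w) == in_L_plus(w) for w in words)
print("L = L^+ on all", len(words), "short words")
```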
Now, we can consider the notion of a problem. In this course, and for much of computability and complexity theory, we focus on decision problems: problems whose answer is either YES or NO.
Let's consider another problem. Suppose you are given a word $w$ and a language $L$. The most basic question that we can ask is whether $w$ is a member of $L$. If we consider an example language from earlier, $$L = \{x \in \{0,1\}^* \mid \text{$x$ is the binary representation of a prime number} \},$$ then we get the following implication: if we can come up with a way to solve the language membership problem for this language, then we've come up with a way to determine whether a given integer is a prime number or not.
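To make this concrete, here is a sketch of a decider for this language: a procedure that, given any binary word $x$, answers YES or NO according to whether $x \in L$. The trial-division test is just the simplest correct choice, not an efficient one.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def in_prime_language(x: str) -> bool:
    """Decide membership in
    L = {x in {0,1}* | x is the binary representation of a prime}."""
    if x == "" or any(c not in "01" for c in x):
        return False
    return is_prime(int(x, 2))

print(in_prime_language("101"))   # 101 in binary is 5, which is prime: True
print(in_prime_language("1000"))  # 1000 in binary is 8, not prime: False
```

Deciding membership in $L$ and deciding primality really are the same task, packaged differently.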
Or, we could consider something a bit more challenging, $$L = \{x \in \{0,1\}^* \mid \text{$x$ is the binary representation of an odd perfect number} \}.$$ Again, the implication here is that if we could solve the language problem for this, we could figure out whether a given number is an odd perfect number. This would be very helpful, as we currently don't know whether an odd perfect number exists.
Of course, that seems like a pretty natural language to be able to define. After all, numbers have natural binary representations. But if we try hard enough, we can play the same game with a lot of other kinds of objects and problems. For instance, we can consider the following language, $$\{x \in \{0,1\}^* \mid \text{$x$ is the encoding of a graph $G$ that contains a path from a vertex $s$ of $G$ to a vertex $t$ of $G$} \}.$$ There are a few assumptions baked in here, chiefly that we've fixed a sensible way to encode a graph and a pair of vertices as a binary string, but they're all reasonable.
Again, we arrive at the conclusion that if we can decide whether a binary string belongs to this language, then we can solve the underlying reachability problem.
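As a sketch of what "solving this language" amounts to: once some encoding of graphs as binary strings is fixed, membership testing reduces to decoding the string and running an ordinary reachability check. The encoding details are left abstract here; this just shows the reachability part, via breadth-first search.

```python
from collections import deque

def has_path(adj: dict, s, t) -> bool:
    """BFS reachability: is there a path from s to t in the graph
    given by the adjacency lists adj?"""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# A decider for the language would first decode the input string into
# (G, s, t) under the chosen encoding, then call has_path.
adj = {0: [1], 1: [2], 2: [], 3: [0]}
print(has_path(adj, 0, 2))  # True: 0 -> 1 -> 2
print(has_path(adj, 2, 0))  # False: no edges leave 2
```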
At this point, you can probably see where we're going with this: decision problems and languages are one and the same. In fact, the literature typically treats decision problems and languages as interchangeable. This may still seem a bit unusual, but recall that one of the lessons of CS 241 is that everything we do on a computer ultimately boils down to 0s and 1s at the machine level. This has to do with the physical realities of computers and electronics and such, but it turns out things are not so different in the world of theory.
From this, we can begin to see how formal language theory plays a vital part in our study of computation. In this setting, computational models are nothing more than objects that we can use to describe languages and, by extension, solve problems.