CS 245 — Lecture 2

Well-Formed Formulas

So far, we have just introduced a bunch of symbols, but we haven't defined any particular way to use them. Of course, the way we defined the connectives seems to imply some sort of proper way we should use them to impart some meaning to them. Without any further direction, we can throw together a string of symbols.

Definition 2.1. We say an expression is a finite sequence of symbols. The length of an expression is the number of symbols it contains.

So an expression can be something like

$\alpha = (p \vee q)$
$\beta = ((())p\neg$
$\gamma = p \wedge \neg q \rightarrow r$

Note that we use something that's not a symbol to name an expression, typically lowercase Greek letters.

So how would we interpret these? The expression $\alpha$ is obvious and $\beta$ is obviously nonsense, but what about $\gamma$? It seems like there are multiple ways of interpreting this. Ideally, we would like to work with formulas that only have one possible interpretation. Is this possible?

Definition 2.2. Let $\mathcal P$ be a set of propositional variables. We define the set of well-formed formulas over $\mathcal P$ inductively as follows.

A propositional variable in $\mathcal P$ is well-formed.
If $\alpha$ is well-formed, then $(\neg \alpha)$ is well-formed.
If $\alpha$ and $\beta$ are well-formed, then each of $(\alpha \wedge \beta), (\alpha \vee \beta), (\alpha \rightarrow \beta)$ is well-formed.

Keep in mind that, like most other things that we're working with right now, this definition is purely syntactic.

What does a well-formed formula look like?

$p,q,r,s,\dots$; From the first rule of the definition, it's pretty clear that propositional variables alone, on their own, are a well-formed formula.
$(\neg q)$
$(p \vee r)$
$(r \rightarrow (\neg p))$
$((p \vee q) \rightarrow ((\neg p) \rightarrow r)$
$(\neg (q \rightarrow r))$

One of the nice things about well-formed formulas is that we can view them as trees, which are called parse trees. Recall from CS 136 that trees contain two types of nodes: internal nodes and leaf nodes. For parse trees, connectives are internal nodes, while propositional variables are leaf nodes. Then, each connective has as its children a subtree representing another well-formed formula. Of course, it's possible to make this all formal so we can prove interesting things about parse trees, but we won't do that yet. Instead, we'll have a look at some examples of parse trees for the list of well-formed formulas above.

Structural Induction on Formulas

One of the reasons we want to bother with formalizing all of these concepts is that we want to be precise. In that vein, it would be very good if each formula had exactly one way to read and interpret it and therefore, has exactly one meaning, particularly when we get to adding semantics to these constructs.

Our hope is that well-formed formulas will have this property. More precisely, we would like to show that there is a unique way to construct each well-formed formula. To do this, we need to somehow prove that this is true for every well-formed formula.

How can we do this? We will use everyone's favourite proof technique, induction. In fact, we will be using this a lot to prove many other things about well-formed formulas. For instance, if we have some property $P$ that we want to show that every formula has, we might want to prove something like:

Theorem. For every well-formed propositional formula $\varphi$, $P(\varphi)$ is true.

First, let's think back to MATH 135 and how we would prove some property $P$ for the natural numbers (which in computer science we believe to begin at 0) by induction:

Theorem. For all natural numbers $n \in \mathbb N$, $P(n)$ holds.

Proof.

Base case: Prove $P(0)$.
Induction step: Consider an arbitrary $k \geq 0$. Assume that $P(k)$ is true. Then prove that $P(k+1)$ is true.
By the principle of mathematical induction, $P(n)$ is true for all $n \in \mathbb N$.

$$\tag*{$\Box$}$$

We can apply the same kind of idea to other structures. For instance, what if we want to prove something about expressions?

Example 2.3. The reversal of an expression $\alpha = a_1 a_2 \cdots a_n$ is the expression with the symbols written in reverse order, $\alpha^R = a_n \cdots a_2 a_1$. However, we can also define reversal recursively: let $\varepsilon^R = \varepsilon$, where $\varepsilon$ represents the expression of length 0, and define for an expression $\beta$ and a single symbol $a$, the reversal $(\beta a)^R = a\beta^R$.

Theorem. Let $\alpha$ and $\beta$ be expressions. Then $(\alpha\beta)^R = \beta^R \alpha^R$.

Proof. We will show this by induction on $|\beta|$.

For our base case, let $|\beta| = 0$. This means that $\beta = \varepsilon$. We have \begin{align*} (\alpha\beta)^R &= (\alpha\varepsilon)^R & \text{(substitution of $\beta$)}\\ &= \alpha^R & \text{(definition of $\varepsilon$)}\\ &= \varepsilon^R \alpha^R & \text{(definition of $\varepsilon$)} \\ &= \beta^R \alpha^R & \text{(substitution of $\beta$)} \end{align*}

Now, for our induction step, let $|\beta| = k > 0$ and suppose that we have $(\alpha \gamma)^R = \gamma^R \alpha^R$ for all words $\gamma$ with $|\gamma| < k$. We write $\beta = \delta a$ for a single letter $a$. Then we have $\alpha\beta = \alpha \delta a$ and \begin{align*} (\alpha\beta)^R &= (\alpha \delta a)^R & \text{(substitution of $\beta$)} \\ &= a(\alpha \delta)^R & \text{(definition of reversal)} \\ &= a(\delta^R \alpha^R) & \text{(inductive hypothesis)} \\ &= (\delta a)^R \alpha^R & \text{(definition of reversal)} \\ &= \beta^R \alpha^R & \text{(substitution of $\beta$)} \end{align*} $$\tag*{$\Box$}$$

How does this work? First, we need to figure out what the structure in our problem is. In the above, it was the expression (not clear how? We'll see how in a bit). For mathematical induction, it was the natural numbers. Soon, we want to work with formulas.

Secondly, we need to figure out how the structure is recursive.

For the natural numbers, we have $0$, and $n+1$ for some natural number $n$.
For expressions, we have either the empty expression, or $\alpha a$ for some expression $\alpha$ and symbol $a$.
For well-formed formulas, we have propositional variables, and for well-formed formulas $\alpha$ and $\beta$, $(\neg \alpha), (\alpha \wedge \beta), (\alpha \vee \beta), (\alpha \rightarrow \beta)$ are well-formed.

In each of these cases, there is a non-recursive part of the definition and a recursive part. The non-recursive part forms our base cases, while the recursive parts become our inductive cases. Note that depending on how it's defined, our structure may have several inductive cases to take care of.

Now, we want to apply this technique to a structure like a formula. Again, let $P$ denote a property. Then our proof would look something like this:

Theorem. For all well-formed formulas $\varphi$, $P(\varphi)$ holds.

Proof.

Base case: $\varphi$ is a propositional variable $p$. Show that $P(q)$ holds.
Induction step:
1. $\varphi = (\neg \alpha)$, where $\alpha$ is a well-formed formula. Assume that $P(\alpha)$ holds. Prove that $P((\neg \alpha))$ holds.
2. $\varphi = (\alpha * \beta)$, where $\alpha$ and $\beta$ are well-formed formulas, and $*$ is a binary connective $\wedge,\vee,\rightarrow$. Assume that $P(\alpha)$ and $P(\beta)$ hold. Prove that $P((\alpha * \beta))$ holds.
By the principle of structural induction, $P(\varphi)$ holds for every well-formed formula $\varphi$.

$$\tag*{$\Box$}$$

Before moving on to the main result we want to prove, let's see how we would prove something a bit simpler using structural induction.

Lemma 2.4. Every well-formed formula $\varphi$ has an equal number of opening and closing parentheses.

Proof. We will use structural induction on the formula. Let $\varphi$ be a formula and let $R(\varphi)$ denote the property that $\varphi$ has an equal number of opening and closing parentheses.

Base case: $\varphi$ is an atom; that is, $\varphi$ is a propositional variable. Then $\varphi$ has 0 parentheses and therefore has an equal number of left and right parentheses. Thus, $R(\varphi)$ holds.

Inductive step: Let $op(\varphi)$ and $cl(\varphi)$ denote the number of opening and closing parentheses in a formula $\varphi$, respectively. We have two cases to consider.

$\varphi = (\neg \alpha)$, where $\alpha$ is a well-formed formula.
Induction hypothesis: Assume that $\alpha$ has an equal number of opening and closing parentheses.
Then we have the following: \begin{align*} op(\varphi) &= op((\neg \alpha)) \\ &= 1+op(\alpha) \\ &= 1+cl(\alpha) & \text{by induction hypothesis} \\ &= cl((\neg \alpha)) \\ &= cl(\varphi) \end{align*} Thus, $\varphi$ has an equal number of opening and closing parentheses and therefore $R(\varphi)$ holds.
$\varphi = (\alpha * \beta)$, where $\alpha$ and $\beta$ are well-formed formulas and $*$ is a binary connective $\wedge, \vee, \rightarrow$.
Induction hypothesis: Assume that each of $\alpha$ and $\beta$ have an equal number of opening and closing parentheses.
Then we have the following: \begin{align*} op(\varphi) &= op((\alpha * \beta)) \\ &= 1+op(\alpha)+op(\beta) \\ &= 1+cl(\alpha)+cl(\beta) & \text{by induction hypothesis} \\ &= cl((\alpha * \beta)) \\ &= cl(\varphi) \end{align*} Thus, $\varphi$ has an equal number of opening and closing parentheses and therefore $R(\varphi)$ holds.

By the principle of structural induction, every well-formed formula $\varphi$ has an equal number of opening and closing parentheses. Thus, $R(\varphi)$ holds for every well-formed formula $\varphi$. $$\tag*{$\Box$}$$

As it turns out, this property of well-formed formulas turns out to be the first of three properties that we will need to make use of in our attempt to prove that every well-formed formula has a unique interpretation.