CS 245 — Lecture 13

Substitution

Intuitively, a substitution of a variable $x$ in a formula $\varphi$ by a term $t$ replaces all free occurrences of $x$ by $t$ and is denoted $\varphi[t/x]$. Note we would like substitution to replace a variable by a term, which goes beyond simple renaming. Furthermore, it is very important that a substitution only replaces the free occurrences of a variable. The bound occurrences of a variable should not be changed by the substitution.

Example 13.1. Consider a formula $\varphi = (\exists x (P(x,y) \wedge Q(x,z)))$. Then, $$\varphi[f(u)/y] = (\exists x (P(x,f(u)) \wedge Q(x,z)))$$ as expected. However, $$\varphi[f(u)/x] = (\exists x(P(x,y) \wedge Q(x,z)))$$ since there are no free occurrences of $x$ in $\varphi$.

This seems straightforward, but there is a tricky scenario to deal with before we can go ahead and define a substitution formally.

Example 13.2. Consider again our formula $\varphi = (\exists x (P(x,y) \wedge Q(x,z)))$ from the previous example. Now, consider the substitution $$\varphi[x/y] = (\exists x(P(x,x) \wedge L(x,z))).$$ There is clearly a problem here. Here, $y$ was free, so we were free to substitute it, but in not being careful, we have now bound the variable to $\exists x$. Now, if we had a formula $\varphi' = (\exists u(P(u,y) \wedge Q(u,z)))$, we could have performed the substitution $$\varphi'[x/y] = (\exists u(P(u,x) \wedge Q(u,z)))$$ and avoided this problem.

The phenomenon that we encountered is called variable capture. Without any further qualification, we are in danger of having substitution potentially change the meaning of our formula based entirely on our choice of variable names. As we observed, this problem is easily fixed, since we know we can rename bound variables safely. That variable capture can happen means we have to define subtitution carefully, but luckily, we already know the fix and can easily formalize the process of doing so.

Definition 13.3. Let $x$ be a variable and $t$ a term. We define the substitution $[t/x]$ as follows.

For a term $u$, the term $u[t/x]$ is $u$ with each occurrence of $x$ replaced by $t$.
For a formula $\varphi$,
1. if $\varphi = P(t_1, \dots, t_n)$ then $\varphi[t/x] = P(t_1[t/x], \dots, t_n[t/x])$.
2. if $\varphi = (\neg \alpha)$, then $\varphi[t/x] = (\neg \alpha[t/x])$.
3. if $\varphi = (\alpha * \beta)$, then $\varphi[t/x] = (\alpha[t/x] * \beta[t/x])$.
4. if $\varphi = (\mathsf{Q}x \alpha)$, then $\varphi[t/x] = \varphi$.
5. if $\varphi = (\mathsf{Q}y \alpha)$, for $y \neq x$, then
  1. if $y$ does not occur in $t$, then $\varphi[t/x] = (\mathsf{Q} y \alpha[t/x])$.
  2. Otherwise, choose a variable $z$ that occurs in neither $\varphi$ or $t$. Then $\varphi[t/x] = (\mathsf{Q} z (\alpha[z/y])[t/x])$.

Note that we rename the variable that causes us problems in the formula, not the term.

Example 13.4. Let $\varphi = (\forall x (\exists y P(f(x,y),z)))$. What is $\varphi[g(y,c)/z]$? Since the term we wish to substitute contains a variable that is quantified in $\varphi$, we need to apply (5b). Let $\psi = P(f(x,y),z)$. Then $\varphi = (\forall x (\exists y \psi))$. We choose a new variable $w$ to obtain $$\psi[w/y] = P(f(x,w),z).$$ This gives us $$\psi[w/y][g(y,c)/z] = P(f(x,w),g(y,c)).$$ With this, we obtain $$\varphi[g(y,c)/z] = (\forall x (\exists w P(f(x,w),g(y,c)))).$$

Semantics of Predicate Logic

We are now ready to talk about what things really mean in predicate logic. If we think back to propositional logic, we worked with the notion of valuations and these valuations represented some interpretation of the propositions. We now need to extend our understanding of these notions to the machinery of predicate logic.

Where before, in propositional logic, we assigned truth values to propositional atoms, in predicate logic, the smallest unit to which we can assign a truth value is a predicate $P(t_1, \dots, t_n)$. However, we can't just assign truth values all willy-nilly to a predicate $P(t_1, \dots, t_n)$. We need to carefully define what terms, variables, and functions mean and make sure that our assignment of truth values is consistent.

This is something I've mentioned a lot ever since we were talking about Gordon Ramsay and his dog taking CS 245. We had a formula $(\forall x((P(x) \wedge (\neg Q(x))) \rightarrow R(x)))$ for the sentence "Every student who took CS 245, but did not pass CS 245, failed CS 245". So here, $x$ ranges over all students (and maybe their dogs). From our previous discussion, terms would denote students and the predicates $P,Q,R$ are predicate symbols representing properties of students concerning their status in CS 245. We can think of $P$ as the set of students who took CS 245 and interpret $P(x)$ as $x \in P$ and do the same with $Q$ and $R$.

But this stuff about students and CS 245 was just our interpretation from last week. We also wanted to talk about integers and we had the following statement just by replacing our definitions of the predicates:

$P(x) = $ "$x$ is a natural number",
$Q(x) = $ "$x$ is an even number",
$R(x) = $ "$x$ is an odd number".

All of a sudden, our formula about students taking CS 245 is reinterpreted as a statement about even numbers: every natural number that is not even is odd. What if we define $R(x)$ to be $x$ is a prime number? We would have a statement that reads very similarly but would not be true anymore. Of course, there's nothing inherently wrong with any of these interpretations.

We can keep extending this idea. Remember our predicate $L(x,y)$ about student $x$ loving course $y$. Just like the unary predicates above, we can reinterpret this as a set of pairs $(x,y)$. This is a relation in the mathematical sense. Again, we can reinterpret $L$ as a predicate about something else, maybe coordinates on a Cartesian plane? And of course, we can do this with $k$-ary predicate symbols and interpret them as $k$-ary relations.

Interpretations

I keep using the word interpretation because that is essentially what we are doing when we imbue our abstract symbols with meaning. We make concrete the values, functions, and relations that we want to talk about. As it turns out, we use the word interpretation (rather than valuation) to talk about the assignment of meaning in predicate logic.

Definition 13.5. Let $L$ be a set of constant symbols, function symbols, and predicate symbols. We say $L$ is a language. An interpretation (or model) $\mathcal I$ for $L$ consists of

a non-empty set $D^\mathcal I$ (or $\operatorname{dom}(\mathcal I)$), called the domain, or universe of $\mathcal I$;
for each constant symbol $c$, an element of the domain $c^\mathcal I \in D^\mathcal I$;
for each $n$-ary function symbol $f$, a total $n$-ary function $f^\mathcal I : (D^\mathcal I)^n \to D^\mathcal I$;
for each $n$-ary predicate symbol $P$, an $n$-ary predicate (or relation) $P^\mathcal I \subseteq (D^\mathcal I)^n$.

So how do we interpret terms and predicates within a particular interpretation?

Definition 13.6. Let $\mathcal I$ be an interpretation and $t$ a term without variables. Then $t^\mathcal I$, the value of $t$ under interpretation $\mathcal I$ is given by

if $t$ is a constant $c$, then $t^\mathcal I = c^\mathcal I$;
if $t = f(t_1, \dots, t_n)$, where $f$ is an $n$-ary function symbol, then $t^\mathcal I = f^\mathcal I (t_1^\mathcal I, \dots, t_n^\mathcal I)$.

Recall that since $f^\mathcal I$ is a function that takes values from $D^\mathcal I$ and maps them to an element of $D^\mathcal I$, we have by our definition that a term is always an element of the domain $D^\mathcal I$.

Definition 13.7. Let $\mathcal I$ be an interpretation and $\varphi$ a formula containing no variables. Then we define $\varphi^\mathcal I$, the value of $\varphi$ under interpretation $\mathcal I$ as follows.

If $\varphi = P(t_1, \dots, t_n)$, then $$\varphi^\mathcal I = (P(t_1, \dots, t_n))^\mathcal I = \begin{cases} T & \text{if $\langle t_1^\mathcal I, \dots, t_n^\mathcal I \rangle \in P^\mathcal I$} \\ F & \text{otherwise.} \end{cases}$$
If $\varphi = (\neg \alpha)$ or $(\alpha * \beta)$, then $\varphi^\mathcal I$ is determined by $\alpha^\mathcal I$ and $\beta^\mathcal I$ in the same way as for propositional logic.

Example 13.8. Let our language consist of the following symbols:

constant symbols 0 and 1,
a binary function symbol $+$,
a binary predicate symbols $=$.

With this, we can construct formulas like $((0+1)=1)$ or $(\forall x (\neg (0 = x + 1)))$ (yes, these are not well-formed by our definition, but we will allow ourselves the infix notation where it "makes sense"). Now, we'll consider an interpretation $\mathcal I$ for this language.

The domain $D^\mathcal I$ is the natural numbers, $\mathbb N$.
$0^\mathcal I = 0$ is zero.
$+^\mathcal I: \langle m,n \rangle \mapsto m + n$ is the addition of natural numbers.
$=^\mathcal I$ is the set $\{\langle n,n \rangle \mid n \in \mathbb N\}$, the equality relation.

The under the interpretation $\mathcal I$, the formula $((0+1)=1)^\mathcal I$ says "the addition of zero and one is equal to one" and $(\forall x (x+0 = x))^\mathcal I$ says "any number added to zero is the same number" (although we haven't talked about how to interpret variables yet so don't take this too seriously yet) and under $\mathcal I$, both of these formulas are true.

This language and interpretation is allows us to express the axioms for Presburger arithmetic. Like Peano arithmetic, it is meant to formalize arithmetic over the natural numbers. However, Presburger arithmetic only expresses addition, while Peano arithmetic can express multiplication. One can think of Peano arithmetic as Presburger arithmetic with multiplication, and so it is not too hard to see that Peano arithmetic is more powerful in what kinds of things can be expressed using it compared to Presburger arithmetic. However, this great power comes at a cost...

Example 13.9. Recall Example 12.1 from two classes ago, where we had a wacky looking bunch of symbols. Here they are again for completeness.

constants: $0,å,£,の$
unary predicates: $É,\Omega$
binary predicate: $\kappa$
unary function: $\heartsuit$
binary functions: $\square, \natural$
ternary function: $\leadsto$

Just like in the previous example, we can give meaning to this eclectic set of symbols by defining an interpretation $\mathcal J$. Let the domain of $\mathcal J$ be $D^\mathcal J = \mathbb R$. We define the constants $$ 0^\mathcal J = 1 \quad å^\mathcal J = 2 \quad £^\mathcal J = 0.6 \quad の^\mathcal J = 0.$$ We define the predicates by

$É^\mathcal J = \{x \in \mathbb Z \mid x \bmod 2 = 0\}$, $x$ is even,
$\Omega^\mathcal J = \{n \in \mathbb Z\}$, $n$ is an integer,
$\kappa^\mathcal J = \{\langle x,y \rangle \mid x = y\}$, $x$ and $y$ are equal.

And we will define the functions by

$\heartsuit^\mathcal J : x \mapsto x+1$,
$\square^\mathcal J : \langle x,y \rangle \mapsto x^2 + y^2$,
$\natural^\mathcal J : \langle x,y \rangle \mapsto |x + y|$,
$\leadsto^\mathcal J : \langle x,y,z \rangle \mapsto \min \{x,y,z\}$.

Now, we can construct formulas with this language and interpret it via the interpretation we just defined. For instance, $\Omega(£)^\mathcal I$ means "0.6 is an integer" and $(\neg \kappa(0,\leadsto(å,£,の)))^\mathcal I$ says "1 is not equal to the minimum of 2, 0.6, and 0".

Under the interpretation $\mathcal J$, the former formula is false while the latter is true. If we change the interpretation, we can change the truth of the formulas. Consider an interpretation $\mathcal J'$, where everything is the same as $\mathcal J$ except that we have $$\Omega^{\mathcal J'} = \{n \in \mathbb Q\}$$ so that $\Omega(x)$ is now the property that $x$ is a rational number. In this case, $\Omega(£)^{\mathcal J'}$ is true.