Finally, how do we know that what we've defined is correct? Well, we don't—we can define whatever we want. But what we might care about is whether what we've defined is interesting (what a mathematician might say) or useful (what a computer scientist might say). To do so, we might want to know some things about our definition: what properties does it have?
We don't really know anything about addition yet, but we might have some guesses. For example, we might want to know that adding zero to any number $n$ gives us $n$; formally, this makes $z$ an additive identity. This is very easy to see.
For all natural numbers $n$, $z+n = n$.
How do we know this is true? Well, this is exactly the definition of addition! We can formalize this with a proof.
Let $n$ be an arbitrary natural number. Then by Definition 1.6(1), $z + n = n$.
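Before moving on, it may help to see the definition running. The following is a sketch in Python with an encoding of the naturals of my own choosing; it models Definition 1.6 but is not part of the formal development.

```python
# A small Python model of Definition 1.6:
# z is zero and ("succ", n) is the successor of n.
z = "z"

def succ(n):
    return ("succ", n)

def add(m, n):
    """Addition by recursion on the first argument, as in Definition 1.6."""
    if m == z:
        return n               # 1.6(1): z + n = n
    _, k = m                   # otherwise m = succ(k)
    return succ(add(k, n))     # 1.6(2): succ(k) + n = succ(k + n)

three = succ(succ(succ(z)))
# z + n = n falls out of the definition in a single step...
assert add(z, three) == three
# ...while n + z = n only comes out after unwinding the whole recursion.
assert add(three, z) == three
```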
Okay, very simple. But what if we wanted to know something like the following?
For all $n \in \mathbb N$, $n+z = n$.
Notice that we can't appeal to our definition anymore because the numbers aren't in the right order. Now, you might say that it's obvious that $a+b = b+a$, but we don't actually know this from our definition!
However, this seems to be true. How can we convince ourselves that this is actually true? We might try a few examples to test it.
If $n = z$, we have $z + z = z$ by Definition 1.6(1).
If $n = \operatorname{succ}(z)$, we have \begin{align*} \operatorname{succ}(z) + z &= \operatorname{succ}(z + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(z) &\text{Definition 1.6(1)} \end{align*}
If $n = \operatorname{succ}(\operatorname{succ}(z))$, we have \begin{align*} \operatorname{succ}(\operatorname{succ}(z)) + z &= \operatorname{succ}(\operatorname{succ}(z) + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(z + z)) &\text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(z)) &\text{Definition 1.6(1)} \end{align*}
If $n = \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)))$, then we have \begin{align*} \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) + z &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)) + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z) + z)) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z + z))) &\text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) &\text{Definition 1.6(1)} \end{align*}
So, this looks a lot like our computation for the factorial, in that it is very repetitive. However, that repetition is the key to our proof: the computation of $n+z$ depends on the computation for the number that comes before $n$. By the first line of our computation, we have that \[\operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) + z = \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)) + z)\] but we've already proven what $\operatorname{succ}(\operatorname{succ}(z)) + z$ is! We could instead write something like \begin{align*} \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) + z &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z)) + z) & \text{Definition 1.6(2)} \\ &= \operatorname{succ}(\operatorname{succ}(\operatorname{succ}(z))) &\text{Proof of $\operatorname{succ}(\operatorname{succ}(z)) + z$} \end{align*}
What would be convenient is if there were a way to use this idea: that we know that if we can prove a property $P$ for a natural number $n$, we can use that proof to prove that $P$ also holds for $\operatorname{succ}(n)$. This is the idea behind the principle of mathematical induction.
We will prove this by induction on $n$.
Base case. If $n = z$, then \begin{align*} n + z &= z + z &\text{substitution} \\ &= z &\text{Definition 1.6(1)} \\ &= n &\text{substitution} \end{align*} Therefore, our claim holds for the base case.
Inductive case. If $n = \operatorname{succ}(k)$ for some natural number $k$, then assume that $k+z = k$. We will show that $\operatorname{succ}(k) + z = \operatorname{succ}(k)$. Consider the following: \begin{align*} n + z &= \operatorname{succ}(k) + z &\text{substitution} \\ &= \operatorname{succ}(k + z) &\text{Definition 1.6(2)} \\ &= \operatorname{succ}(k) &\text{inductive hypothesis} \end{align*} Therefore, our claim holds for the inductive case and therefore, $n + z = n$ for all natural numbers $n$.
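For readers who like to see proofs checked mechanically, here is a sketch of the same argument in Lean 4, with names (`N`, `add`, `add_z_right`) of my own choosing:

```lean
inductive N where
  | z
  | succ (n : N)

open N

-- Addition by recursion on the first argument, mirroring Definition 1.6.
def add : N → N → N
  | z,      n => n                -- 1.6(1): z + n = n
  | succ m, n => succ (add m n)   -- 1.6(2): succ(m) + n = succ(m + n)

-- The proof above, as an induction on n.
theorem add_z_right : ∀ n : N, add n z = n := by
  intro n
  induction n with
  | z => rfl                      -- base case: add z z = z by 1.6(1)
  | succ k ih => simp [add, ih]   -- inductive case: unfold 1.6(2), apply IH
```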
What does a proof by induction look like? First, observe that, like our discussions of recursive functions and recursive data types, the structure of the proof follows the structure of the recursive definition.
But why are we allowed to do this? This seems very arbitrary.
Mathematical induction happens to be an axiom of the Peano definitions for natural numbers. We can state this for a property $P$ in first-order logic as \[P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k))) \rightarrow \forall n P(n).\] What this tells us is that if we can show that the hypothesis, $P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k)))$, is true, then we can conclude that $\forall n P(n)$.
But how do we actually show this? In order to dissect this further, we need to talk about mathematical logic and proof.
Mathematical logic is the study of reasoning from the point of view of mathematics. It's a rich and deep field of study in its own right, but in the context of computer science, it's particularly important. The development of mathematical logic in the early 20th century led directly to the first notions of computability and ultimately the birth of computer science.
For the purposes of this course, the basics of mathematical logic allow us to develop a language to express mathematical ideas and a framework in which to prove their validity. There are two parts to this.
We first describe how to express things. That is, how we can say something like our induction axiom, \[P(z) \wedge \forall k (P(k) \rightarrow P(\operatorname{succ}(k))) \rightarrow \forall n P(n)\] as well as what it means.
We then follow up with how to construct proofs about those statements. That is, how do we prove the validity of the sentence above? What does its structure say about how we should approach this?
Notice that we used informal, natural language to say things:
For all natural numbers $n$, $n+z = n$.
We want to (and can) express this more formally using mathematical logic. In the strictest sense, the language of mathematical logic is really just a collection of symbols and some rules about how to arrange them. So, you're allowed to write something that looks like this, but not that, and so on and so forth. The magic comes when we define meaning for these symbols.
Generally, we assume that we are working in classical logic. Here, our statements take on truth values: true or false. However, this is not the only interpretation we can take; we will see an alternative point of view, with interesting connections to computer science, when we discuss proof. But generally, for this course and for most mathematics, we use classical logic.
The basic element of logic is the proposition. A proposition is a statement that is either true or false. All mathematical results are propositions—they are statements that are either true or false. Even something as simple as $2+2 = 4$ is a proposition: the statement asserts that the addition of 2 and 2 is equivalent to 4. This happens to be true. But $2+2 = 5$ is also a proposition—one which is false.
Since all these statements are propositions, the names that we give to them (theorem, lemma, corollary, and so on) really only describe their relative importance.
We can then combine several propositions together to create more interesting and complex propositions. This is done by using logical connectives, which modify the meaning of the propositions they combine in various ways.
There are four basic logical connectives, also called Boolean connectives, after George Boole who defined them in 1854. We will define and consider each one.
The connective $\wedge$ is called conjunction, and the proposition $p \wedge q$ is pronounced "$p$ and $q$". The proposition $p \wedge q$ is true if and only if both $p$ and $q$ are true.
This connective expresses the idea that both $p$ and $q$ are true. The usual way you've likely seen this defined is via truth table.
| $p$ | $q$ | $p \wedge q$ |
|---|---|---|
| $T$ | $T$ | $T$ |
| $T$ | $F$ | $F$ |
| $F$ | $T$ | $F$ |
| $F$ | $F$ | $F$ |
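As a quick sanity check, Python's `and` on booleans computes exactly this table; the following sketch enumerates all four rows in the same order as above:

```python
from itertools import product

# All assignments to (p, q), in the same order as the table above.
rows = list(product([True, False], repeat=2))
table = [(p, q, p and q) for p, q in rows]

# p ∧ q is true only in the single row where both inputs are true.
assert [r for _, _, r in table] == [True, False, False, False]
```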
Truth tables are a representation for classical logic, which makes it seem like logical formulas are just for computing truth values. However, things become a bit more interesting if we consider the question of what a proof of $p \wedge q$ should look like.
The connective $\vee$ is called disjunction. The proposition $p \vee q$ is pronounced "$p$ or $q$" and is true if and only if $p$ is true or $q$ is true or both.
| $p$ | $q$ | $p \vee q$ |
|---|---|---|
| $T$ | $T$ | $T$ |
| $T$ | $F$ | $T$ |
| $F$ | $T$ | $T$ |
| $F$ | $F$ | $F$ |
One tricky thing to note with English is that many times we use "or" to mean "exclusive or". For instance, when you are asked whether you prefer beef or chicken, the expectation is that you may only choose one and not both. This logical connective allows for both $p$ and $q$ to be true, which corresponds to something like "and/or" in English.
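In Python terms (a quick check, not part of the text's development): inclusive or is `or`, while the "beef or chicken" reading corresponds to `!=` on booleans:

```python
cases = [(True, True), (True, False), (False, True), (False, False)]

inclusive = [p or q for p, q in cases]   # disjunction as defined above
exclusive = [p != q for p, q in cases]   # exclusive or: exactly one holds

# The two readings differ only in the first row, where both p and q hold.
assert inclusive == [True, True, True, False]
assert exclusive == [False, True, True, False]
```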
The connective $\neg$ is called negation and the proposition $\neg p$ is pronounced "not $p$". The proposition $\neg p$ is true if and only if $p$ is false.
If $q$ is the proposition "I can take two weeks off", then the proposition $\neg q$ is "I cannot take two weeks off". In other words, when translating natural language to formal language, negation modifies the proposition. Notice that negation is a unary connective, in that it only applies to one proposition, while all the other connectives are binary. This also means that propositions stated negatively are considered to be positive propositions modified by negation.
These three connectives are ones you're probably most familiar with in the context of programming: they are the connectives that appear as boolean operators which you use to construct boolean expressions.
However, there is one more Boolean connective which is used in mathematical logic and is arguably more important.
The connective $\rightarrow$ is called implication. The proposition $p \rightarrow q$ is pronounced "if $p$, then $q$" and $p \rightarrow q$ is true if and only if whenever $p$ is true, $q$ is true also.
In the proposition $p \rightarrow q$, we call $p$ the hypothesis and $q$ the conclusion.
This connective is often a bit trickier to think about because it is not commutative like conjunction or disjunction. However, it's important because mathematical statements are phrased in this way: we state conditions as our hypothesis and we are guaranteed the conclusion as long as the hypothesis is true. We can see how this is captured in the truth table.
| $p$ | $q$ | $p \rightarrow q$ |
|---|---|---|
| $T$ | $T$ | $T$ |
| $T$ | $F$ | $F$ |
| $F$ | $T$ | $T$ |
| $F$ | $F$ | $T$ |
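Since implication rarely appears as a built-in boolean operator, a quick Python sketch (using the classical equivalence of $p \rightarrow q$ with $\neg p \vee q$) reproduces the table:

```python
def implies(p, q):
    # Classical implication: false only when p is true and q is false.
    return (not p) or q

cases = [(True, True), (True, False), (False, True), (False, False)]
assert [implies(p, q) for p, q in cases] == [True, False, True, True]
```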
It is worth reasoning through each scenario.
Consider the propositions $p$, "I have obtained a computed grade of at least 90%", and $q$, "I am assigned a grade of A".
In this case, $p \rightarrow q$ is "If I have obtained a computed grade of at least 90%, then I am assigned a grade of A."
Since $p \rightarrow q$ and $q \rightarrow p$ are not necessarily logically equivalent, we call $q \rightarrow p$ the converse of $p \rightarrow q$.
Note that the Boolean connectives are not the only logical connectives out there; there are a few more in common use in programming and computer engineering. The Boolean connectives are also not all necessary, in the sense that some can be rewritten in terms of the others: for instance, $p \rightarrow q$ is logically equivalent to $\neg p \vee q$. In fact, just two of the Boolean connectives suffice (negation together with either conjunction or disjunction), or even one bespoke connective, to express everything we could using the four Boolean connectives.
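To illustrate the claim about a single bespoke connective, here is a sketch (choosing NAND; NOR also works) that rebuilds all four Boolean connectives from one and checks them on every input:

```python
from itertools import product

def nand(p, q):
    # The "bespoke" connective: true unless both inputs are true.
    return not (p and q)

# The four Boolean connectives, each built only from nand.
def neg(p):        return nand(p, p)
def conj(p, q):    return nand(nand(p, q), nand(p, q))
def disj(p, q):    return nand(nand(p, p), nand(q, q))
def implies(p, q): return nand(p, nand(q, q))

# Check each definition against the intended connective on all inputs.
for p, q in product([True, False], repeat=2):
    assert neg(p) == (not p)
    assert conj(p, q) == (p and q)
    assert disj(p, q) == (p or q)
    assert implies(p, q) == ((not p) or q)
```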
One problem that we run into very quickly is that propositional logic is not quite powerful enough for our needs. For instance, if I wanted to say something like
$x$ is odd and $x$ is less than $y$
then I can define propositions for them. For instance, $p$ can be "$x$ is odd" and $q$ can be "$x$ is less than $y$". This would give us the statement $p \wedge q$.
But this is not very satisfying or helpful. We see that there's no way to express the fact that the two different propositions are really talking about the same $x$. Why is that? The problem lies in the fact that propositional logic is about propositions but what we really care about are not the propositions themselves but what we can say about things: numbers, vectors, functions, cars, and so on.
So if we want a language to talk about things, we need to expand our language a bit further. We will be discussing predicate or first-order logic.
First, we need to determine a domain of discourse: what are the things that we want to talk about? This can be something like the natural numbers $\mathbb N$ or the integers $\mathbb Z$ or matrices or functions or graphs and so on. There's also nothing stopping us from making our domain the set of all dogs, if what we really wanted to do was express statements about dogs, or other "real-world" types of objects. (This occurs less in math but more in places like database theory.)
Then, we want to express properties and relationships about the objects in our domain. For this, we need to define predicates. Predicates are written $P(x_1, x_2, \dots, x_k)$ and can be thought of as statements about things $x_1$ through $x_k$.
Let's try to express $x$ is odd and $x$ is less than $y$ as a sentence in predicate logic. We will define the following predicates: $P(x)$ for "$x$ is odd" and $Q(x,y)$ for "$x$ is less than $y$".
This is enough to get us something like $P(x) \wedge Q(x,y)$ for the example statement from above. Notice that we are able to say that $x$ is the same among the two statements and that we're able to relate $x$ and $y$. Observe also that many of the relations that we're used to expressing are really predicates. For instance $=$ is another example of a predicate.
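In programming terms, predicates are just boolean-valued functions; a quick sketch with the odd and less-than readings from above:

```python
def P(x):
    return x % 2 == 1    # "x is odd"

def Q(x, y):
    return x < y         # "x is less than y"

def statement(x, y):
    return P(x) and Q(x, y)   # P(x) ∧ Q(x, y): the same x in both conjuncts

assert statement(3, 5)        # 3 is odd and 3 < 5
assert not statement(4, 5)    # 4 is even, so the first conjunct fails
```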
But what are $x$ and $y$? At this point, they are both not anything at all and at the same time can be anything. This is because they are variables to which we have not given any meaning. They are still quite useful at this point because they give us a way to name things. One way we can use them is to fix their values (using other predicates), say, $x$ is 3 and $y$ is 5.
However, we often want to say more general things than that. Since we have a domain of discourse, we often want to make claims about various objects in the domain. We want to say that all of the things have some particular property or that there are objects that have various properties. Such statements quantify over our domain, so we define quantifiers to talk about them.
The symbol $\forall$ is called the universal quantifier and is read "for all". The statement $\forall x P(x)$ is true if and only if $P(x)$ is true for every element $x$ in the domain.
The symbol $\exists$ is called the existential quantifier and is read "there exists". The statement $\exists x P(x)$ is true if and only if $P(x)$ is true for at least one element $x$ in the domain.
Basically, we use $\forall$ when we want to make a statement about all objects in our domain ("every dog is a good boy") and we use $\exists$ when we want to say that there is some object out there in our domain that has a particular property ("there is a dog that can play Jenga").
Suppose our domain is the integers and $E(x)$ is the statement "$x$ is an even number". We can use predicates and quantifiers together with the logical connectives we introduced with propositional logic to say things like $\forall x (E(x) \rightarrow \neg E(x+1))$, which says that "for every integer $x$, if $x$ is even, then $x+1$ is not even".
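Over a finite slice of the integers, we can spot-check this sentence in Python (rendering $p \rightarrow q$ as `(not p) or q`); this is evidence, not a proof:

```python
def E(x):
    return x % 2 == 0    # "x is even"

# ∀x (E(x) → ¬E(x+1)), checked over a finite sample of integers.
assert all((not E(x)) or (not E(x + 1)) for x in range(-100, 100))
```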
We can define statements that relate more than one object. Suppose our domain is the natural numbers and $L(x,y)$ is the statement "$x$ is less than $y$". We can consider the following proposition: $$\forall x \exists y L(x,y).$$ This reads "for every natural number $x$, there exists a natural number $y$ such that $x \lt y$". This statement turns out to be true: every natural number does have some other natural number that it's less than. Now, let's consider the following similar statement, where the order of the quantifiers is reversed: $$\exists y \forall x L(x,y).$$ Here, we're saying that there is a natural number $y$ for which every natural number $x$ is less than it. In other words, this says that there is a "biggest" natural number, which is obviously not true. As we can see, the order of the quantifiers matters.
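Nested `all`/`any` in Python mirror nested $\forall$/$\exists$. Since every finite slice of the naturals has a largest element, the text's less-than example doesn't survive truncation, so this sketch uses a small domain and a parity relation of my own choosing to show that quantifier order matters:

```python
D = range(4)

def L(x, y):
    return (x + y) % 2 == 0   # a stand-in relation on a finite domain

# ∀x ∃y L(x, y): for each x we can pick y = x, so this holds.
assert all(any(L(x, y) for y in D) for x in D)

# ∃y ∀x L(x, y): no single y has an even sum with every x, since the
# parity of x varies. Swapping the quantifiers changes the truth value.
assert not any(all(L(x, y) for x in D) for y in D)
```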
A point about notation here. Notice that we write $\forall x P(x)$, which leaves the domain implicit. This is technically the correct way to write this—the domain of discourse is fixed beforehand. However, you may sometimes see such statements written as $\forall x \in \mathbb R, P(x)$ when the writer wants to use $\mathbb R$ as the domain. This is redundant if the domain already is $\mathbb R$; if it isn't, the notation is really shorthand for something like $\forall x (R(x) \rightarrow P(x))$, where $R(x)$ is a predicate for "$x$ is a real number".