Now that we've defined the language of first-order logic, we can move on to talking about set theory. Modern mathematics happens to be built on the idea that everything is a set. Everything. Functions? Sets. Numbers? Sets. Problems? Sets. Sets? Sets. So, it's important to familiarize ourselves with the basics of sets and how they work.
A set is an unordered collection of objects. If $S$ is a set and $a$ is a member of $S$, then we write $a \in S$. Similarly, if $a$ is not a member of $S$, then we write $a \not \in S$.
Here, "objects" is used loosely. Often, we'll just stick to numbers, but these can really be anything: books, trees, other sets, or anything else. One point of interest is that, in theory, datatypes (or simply "types") from programming languages are themselves just sets.
Note that since sets are unordered, this means that $\{1,2,3\}$ and $\{3,2,1\}$ are the same set. Furthermore, since an element is either a member or not a member of a set, each element occurs in a set at most once. This means that $\{2,1,3,2\}$ is really just $\{1,2,3\}$.
The set $\{\}$ containing no elements is the empty set and is denoted by $\emptyset = \{\}$.
In light of the above remark about objects in sets, note that $\{\emptyset\}$ is not the same as $\emptyset = \{\}$. The set $\{\emptyset\}$ is actually the set containing the empty set, which means it is not empty; it contains one element: the empty set.
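We can see this distinction concretely in Python. This is just an illustrative sketch; it uses `frozenset` because ordinary mutable `set`s cannot contain other sets:

```python
# The empty set versus the set containing the empty set.
empty = frozenset()           # {}: no elements at all
nested = frozenset([empty])   # {{}}: one element, namely the empty set

print(len(empty))       # 0
print(len(nested))      # 1
print(empty == nested)  # False: they are different sets
print(empty in nested)  # True: nested contains the empty set
```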
There are two common ways to define sets. First, we can just list the elements of the set, like $\{1, 2, 3\}$. This is easy to do for finite sets. For infinite sets, we can do something like $\{1, 2, 3, \dots\}$ and usually, that's enough for us to understand that this is the set containing all natural numbers.
For more complicated definitions, we define sets using comprehensions, also called set-builder notation.
Similar to what we did above, if we wanted to define the set of even natural numbers, we can write something like $\{2,4,6,\dots\}$. Using comprehensions, we can also write $$\{n \in \mathbb N \mid \exists m \in \mathbb N, n = 2 \cdot m \},$$ which says that we want the set of all natural numbers $n$ such that $n$ can be expressed as $2 \cdot m$ for some natural number $m$. That, of course, turns out to be the even natural numbers.
We can do more complicated things with this, since there aren't very many hard rules about what kinds of conditions we can or cannot place on our sets. For instance, we can define the set $$\{n \in \mathbb N \mid \exists m \in \mathbb N, (2 \cdot m = n) \wedge (\forall k \in \mathbb N, n \neq k^2) \}.$$ Sometimes, we don't even bother with that level of formality and just write something like $$\{n \in \mathbb N \mid \text{$n$ is even and not a square}\}.$$ This means that we have a lot of power in how we want to define our sets. Without any further restrictions on what we can say when defining sets, this puts us in the realm of naive set theory. This system was devised by Georg Cantor and later, Gottlob Frege as part of his quest to formalize a foundation for mathematics.
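Set-builder notation translates almost directly into Python's set comprehensions. A sketch, restricted to $n \lt 20$ since we can't build an infinite set in code:

```python
import math

# {n in N | exists m in N, n = 2*m}, restricted to n < 20
evens = {n for n in range(20) if any(n == 2 * m for m in range(20))}

# {n in N | n is even and not a square}, restricted to n < 20
evens_not_square = {n for n in range(20)
                    if n % 2 == 0 and math.isqrt(n) ** 2 != n}

print(sorted(evens))             # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(sorted(evens_not_square))  # [2, 6, 8, 10, 12, 14, 18]
```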
However, we can get into trouble if we're not careful about how we define sets. Bertrand Russell noticed this problem, leading to the following famous example: $$S = \{T \mid T \not \in T\}.$$ If we let something like this happen, then we get to conclude $S \in S \iff S \not \in S$, which is very bad if we want mathematics to be consistent (when Russell communicated this to Frege, Frege was quite spooked). This led to the development of axiomatic set theory. For our purposes, we won't need to delve into axiomatic set theory, but it is important to note that when we talk about set theory being the basis of all mathematics, what we really mean is axiomatic set theory.
Two sets $A$ and $B$ are equal if and only if they have the same elements. That is, $$A = B \iff (\forall x, x \in A \iff x \in B).$$
A set $S$ is a subset of $T$, written $S \subseteq T$, when every element of $S$ belongs to $T$. A set $S$ is a proper subset of a set $T$, written $S \subsetneq T$, when $S$ is a subset of $T$ and there exists an element of $T$ which does not belong to $S$.
You may notice that sometimes $\subset$ is used for proper subset. This works quite nicely with $\subseteq$ meaning subset. However, we'll avoid this particular notation because there are many other mathematicians who use $\subset$ to mean (not necessarily a proper) subset. Instead, we will use $\subseteq$ and $\subsetneq$ to keep things clear.
From the above, we note that by definition, we have $S \subseteq S$ and if $S = T$, we have $S \subseteq T$ and $T \subseteq S$. This second observation gives us an alternate characterization of when two sets are equal.
The cardinality of a set $S$ is the number of elements in $S$ and is denoted $|S|$. If $S$ is finite, then this will be a natural number. So, the size of the set $\{1,2,4,8\}$ would be $|\{1,2,4,8\}| = 4$.
If $S$ is an infinite set, then things are a bit trickier. The cardinality of the natural numbers is defined to be $|\mathbb N| = \aleph_0$, while the cardinality of the real numbers is $|\mathbb R| = 2^{\aleph_0}$. Here, we reach into the Hebrew alphabet for $\aleph$ (aleph). Anyhow, the cardinalities of $\mathbb N$ and $\mathbb R$ are not the same, a fact famously proved by Cantor in 1891. There are many infinite sets that have cardinality $\aleph_0$, such as the set of even natural numbers $\{n \mid \exists m \in \mathbb N, n = 2\cdot m\}$ or the set of rational numbers $\mathbb Q = \left\{\frac m n \mid m, n \in \mathbb Z, n \neq 0\right\}$. There are also many sets with cardinality $2^{\aleph_0}$. This brings us to a fun problem for you to think about in your spare time: are there any infinite sets that have cardinality strictly between $|\mathbb N| = \aleph_0$ and $|\mathbb R| = 2^{\aleph_0}$?
The power set of a set $A$ is the set containing all of the subsets of $A$, $$\mathcal P(A) = \{S \mid S \subseteq A\}.$$
If $A$ is a finite set, then $|\mathcal P(A)| = 2^{|A|}$.
We will prove this later, in a few ways. However, because of this fact, you'll sometimes see the power set of a set $A$ denoted by $2^A$. This also gives you some insight about why it might be that $|\mathbb R| = 2^{\aleph_0}$.
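A quick sketch in Python, enumerating subsets of each size with `itertools`, confirms the counting fact above (the function name `power_set` is ours, not a library function):

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, as a set of frozensets."""
    elems = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(elems, k) for k in range(len(elems) + 1))}

A = {1, 2, 3}
P = power_set(A)
print(len(P))            # 8, which is 2 ** |A|
print(frozenset() in P)  # True: the empty set is a subset of every set
```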
Recall that sets are unordered collections and don't contain multiples of elements. But what if order does matter? We need to define another structure to work with.
An $n$-tuple $(a_1, a_2, \dots, a_n)$ is an ordered collection that has $a_1$ as its first element, $a_2$ as its second element, $\dots$, and $a_n$ as its $n$th element. An ordered pair is a 2-tuple.
Observe that since tuples are ordered, we have $(a_1, \dots, a_n) = (b_1, \dots, b_n)$ if and only if $a_i = b_i$ for $i = 1, \dots, n$.
As a side note, we claimed earlier that everything in mathematics is just some kind of set, so how would we define tuples using sets? To keep things simple, we'll consider the definition of the pair first. We can define $(x,y)$ to be the set $\{\{x\}, \{x,y\}\}$. In this way, we can distinguish $x$ and $y$ and we have a way to determine the order in which they appear. This definition is due to Kuratowski in 1921. We can then generalize this definition for arity $n \gt 2$.
We'll begin going through operations that we can perform on sets with one that is related to our discussion of tuples.
The cartesian product of two sets $A$ and $B$ is $$A \times B = \{(a,b) \mid a \in A, b \in B\}.$$
We can generalize this to products of $n$ sets.
The cartesian product of $n$ sets $A_1, A_2, \dots, A_n$, denoted $A_1 \times A_2 \times \cdots \times A_n$ is defined $$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \dots, a_n) \mid a_i \in A_i, i = 1, 2, \dots, n\}.$$
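Python's `itertools.product` computes exactly these sets of tuples; a sketch:

```python
from itertools import product

A = {1, 2}
B = {'x', 'y'}

# A x B = {(a, b) | a in A, b in B}
pairs = set(product(A, B))
print(pairs == {(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')})  # True

# The n-fold product works the same way; its size is the product
# of the sizes of the factors.
triples = set(product(A, B, {0, 1, 2}))
print(len(triples))  # 12 == 2 * 2 * 3
```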
Everything is a set, and it turns out that the predicates I vaguely danced around earlier can be defined much more concretely once we know some set theory. Suppose I have an interpretation $\mathcal I$, which is just what we'll call how we define our domain and the meaning of our predicates. We want to distinguish the symbols from how we're interpreting them, since we can take the same formula made up of the same symbols and apply a different interpretation to them. Don't worry too much about this because we won't be using this again.
First, the domain is a set; let's call it $\mathcal D$. Then we can define a $k$-ary predicate $P$ under our interpretation $\mathcal I$ as a set of $k$-tuples $(a_1, \dots, a_k)$. In other words, $P^\mathcal I \subseteq \mathcal D^k$. Then we can define the following: $$P(x_1, \dots, x_k) = \begin{cases}T & \text{if $(x_1^\mathcal I, \dots, x_k^\mathcal I) \in P^\mathcal I$,} \\ F & \text{if $(x_1^\mathcal I, \dots, x_k^\mathcal I) \not \in P^\mathcal I$.} \end{cases} $$
Suppose we have predicates $E(x)$ for "$x$ is even" and $L(x,y)$ for "$x$ is less than $y$" and our domain is $\mathbb N$. We can define $E$ and $L$ in an interpretation $\mathcal I$ as follows: \begin{align*} E^\mathcal I &= \{n \in \mathbb N \mid \exists m \in \mathbb N, n = 2m\} \\ L^\mathcal I &= \{(x,y) \in \mathbb N^2 \mid x \lt y\}. \end{align*}
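Restricting the domain to a finite prefix of $\mathbb N$, the interpretations of $E$ and $L$ are literally sets (of elements and of pairs), and evaluating a predicate is just a membership test. A sketch:

```python
D = range(10)  # a finite stand-in for the domain N

# E: the unary predicate "x is even" as a set of elements of D
E = {n for n in D if n % 2 == 0}

# L: the binary predicate "x < y" as a set of pairs from D^2
L = {(x, y) for x in D for y in D if x < y}

# Evaluating a predicate is set membership
print(4 in E)       # True
print((3, 7) in L)  # True
print((7, 3) in L)  # False
```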
This definition actually points to a correspondence between what we did in logic and the set operations we'll now define. In some cases, the correspondence is very obvious because the definitions themselves are taken right from logic.
The union of two sets $S$ and $T$, denoted $S \cup T$ is defined $$S \cup T = \{x \mid x \in S \vee x \in T\}.$$ The intersection of two sets $S$ and $T$, denoted $S \cap T$, is defined $$S \cap T = \{x \mid x \in S \wedge x \in T\}.$$ The complement of a set $S$, written $\overline S$, is the set of all elements not in $S$, $$\overline S = \{x \mid x \not \in S\}.$$
We can also define $\overline S$ relative to a universal set $U$ by $\overline S = \{x \in U \mid x \not \in S\}$. If we wanted to get much more formal and delve into the world of axiomatic set theory, we would have to place some conditions on what $U$ can or can't be. However, for our purposes, this is fine and the notion of the universal set corresponds nicely with the notion of the domain from first-order logic.
So far, we have $\vee, \wedge, \neg$ covered (since $x \not \in S$ is $\neg (x \in S)$), but what about $\rightarrow$? This turns out to be something we defined last class: subset. Recall that $A \subseteq B$ if $\forall x (x \in A \rightarrow x \in B)$. The correspondence is right there in the definition, hiding in plain sight.
This means we can translate some equivalences of propositional logic to set theory.
For two sets $A$ and $B$, \begin{align} \overline{A \cup B} &= \overline A \cap \overline B, \\ \overline{A \cap B} &= \overline A \cup \overline B. \end{align}
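With a finite universal set, De Morgan's laws can be checked directly using Python's set operators, taking complements relative to the universe $U$ as described above. A sketch:

```python
U = set(range(10))  # a small universal set
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(S):
    """The complement of S relative to the universe U."""
    return U - S

# De Morgan's laws, with | for union and & for intersection
print(complement(A | B) == complement(A) & complement(B))  # True
print(complement(A & B) == complement(A) | complement(B))  # True
```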
Another common operation is set difference.
The difference of two sets $S$ and $T$, denoted $S \setminus T$, is defined $$S \setminus T = \{x \mid x \in S \wedge x \not \in T\}.$$
Using this, we can say things like $\overline S = U \setminus S$ where $U$ is a universe. This checks out, because by definition, we have $$U \setminus S = \{x \mid x \in U \wedge x \not \in S\}.$$
A final important consideration is the size of sets after performing some operations on them. The following is a basic but important property.
For all finite sets $A$ and $B$, $|A \cup B| = |A| + |B| - |A \cap B|$.
A simple proof of this goes like this: we count all of the elements in $A$ and then count all of the elements of $B$. However, if there are elements in both $A$ and $B$, we've counted them twice, so we have to take out $|A \cap B|$ from our count.
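The counting argument above is easy to check on concrete sets; a sketch:

```python
A = {1, 2, 4, 8}
B = {2, 3, 5, 8}

# Elements in both A and B are counted twice by |A| + |B|,
# so we subtract |A intersect B| once.
lhs = len(A | B)
rhs = len(A) + len(B) - len(A & B)
print(lhs, rhs)  # 6 6
```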
We say two sets $S$ and $T$ are disjoint if $S \cap T = \emptyset$.
If $A$ and $B$ are disjoint sets, then $|A \cup B| = |A| + |B|$.
This is easy to see, since the intersection of $A$ and $B$ is empty.
Here's an interesting application of set operations. Recall from your programming classes that there is this notion of a list. In particular, we would like to consider linked lists which are lists that can be of arbitrary length and this length can be easily changed (in contrast with, say, arrays). One typical understanding of a linked list is that it's made up of nodes containing a value and a pointer to the next item of the list, but this is a very specific implementation of the mathematical definition of a list, which you might find more common in functional programming languages.
A list is either
- empty, or
- a pair containing a value and a list.
Again, we have a recursive definition, but this time we can involve the notions that we've built up about sets to define a list datatype: a list is just the union of the set containing the empty list and the set of pairs of a value and a list. And recall that pairs are just elements of the products of sets—it's sets all the way down.
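We can model this list datatype in Python with `None` standing for the empty list and ordered pairs (tuples) for everything else. The helper names `cons` and `length` are ours, chosen in the functional-programming tradition; a sketch:

```python
# A list is either empty, or a pair (value, list).
EMPTY = None

def cons(value, rest):
    """Build the pair (value, rest): prepend a value to a list."""
    return (value, rest)

def length(lst):
    """Recurse on the structure: a list is empty or a pair."""
    if lst is EMPTY:
        return 0
    value, rest = lst
    return 1 + length(rest)

xs = cons(1, cons(2, cons(3, EMPTY)))  # the list [1, 2, 3]
print(length(xs))  # 3
```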
You may have learned various definitions of what is or isn't a function. Typically, we think of functions in a very calculus kind of way, where we assign a value $x$ to some other value based on some expression like $x^2$. Then, we come across more complicated functions, that are defined piecewise, or maybe weird functions like the Heaviside function or the parity function. Here, we'll make use of the set theory concepts we've just defined to work towards a general definition for functions.
A relation $R$ with domain $X$ and co-domain $Y$ is a subset of $X \times Y$.
We can see from the definition that a relation really is just a subset of the cartesian product of some sets. In other words, it's a set of tuples. This also resembles the set-theoretic definition of predicates from above and it's not entirely a coincidence that we think of $k$-ary predicates as relations.
A relation $R \subseteq A \times B$ is total if $\forall a \in A, \exists b \in B, R(a,b)$. A relation $R$ is single-valued if $\forall a \in A, \forall b_1, b_2 \in B, R(a, b_1) \wedge R(a,b_2) \rightarrow b_1 = b_2$.
These properties lead us right to a definition for functions.
A function $f : A \to B$ is a total, single-valued relation with domain $A$ and co-domain $B$. The set $B$ is also called the range of $f$.
Again, based on the definition, a function is really just a relation that satisfies the properties we just defined, totality and single-valued-ness. And if we go further, a function is just a set, like everything else in mathematics. The restrictions on the domain and co-domain of the function track with our intuitive understanding of functions that we may have learned earlier. For instance, totality guarantees that a function gives us something for every value we throw at it, while single-valuedness guarantees that our function will only give us one possible output for every input that we give it.
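Treating a relation as a literal set of pairs, totality and single-valuedness become properties we can test on finite sets. A sketch with hypothetical helper names:

```python
def is_total(R, A):
    """Every a in A is related to at least one b."""
    return all(any(x == a for (x, y) in R) for a in A)

def is_single_valued(R):
    """No element is related to two different values."""
    return all(y1 == y2 for (x1, y1) in R for (x2, y2) in R if x1 == x2)

def is_function(R, A):
    """A function is a total, single-valued relation."""
    return is_total(R, A) and is_single_valued(R)

A = {0, 1, 2}
f = {(0, 'a'), (1, 'b'), (2, 'a')}  # a function
g = {(0, 'a'), (0, 'b'), (1, 'a')}  # neither total nor single-valued
print(is_function(f, A))  # True
print(is_function(g, A))  # False
```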
We also have different kinds of functions that are important and worth distinguishing.
A function $f:A \to B$ is one-to-one or injective if $$\forall x, y \in A, f(x) = f(y) \rightarrow x = y.$$
The successor function $f:\mathbb N \to \mathbb N$ defined by $f(x) = x+1$ is one-to-one. On the other hand, the function $g:\mathbb N \to \mathbb N$ defined by $g(x) = \left\lfloor \frac x 2 \right\rfloor$ is not one-to-one, since for any even natural number $m$, we have $g(m) = g(m+1)$ but $m \neq m+1$.
The image of a function $f : A \to B$ is the set $\operatorname{im} f = \{f(x) \mid x \in A\}$.
Clearly, $\operatorname{im} f \subseteq \operatorname{rng} f$. However, it is not always the case that $\operatorname{im} f = \operatorname{rng} f$. If it is, we get another special kind of function.
A function $f:A \to B$ is onto or surjective if $\forall y \in B, \exists x \in A, f(x) = y$.
The function $f:\mathbb N \to \mathbb N$ defined by $f(x) = x^2$ is not surjective (for instance, there is no natural number $n$ such that $n^2 = 2$). On the other hand, the function $g(x) = \left\lfloor \frac x 2 \right\rfloor$ from the above example is surjective.
If we combine these two properties, we get an even more special kind of function.
A function $f:A \to B$ is a correspondence or bijection if $f$ is one-to-one and onto.
A correspondence between two sets $A$ and $B$ uniquely pairs each element of $A$ with an element in $B$. This turns out to be how we can determine something like $|\mathbb N| = |2 \mathbb N|$, where $2 \mathbb N$ is the set of even natural numbers. The correspondence from $\mathbb N$ to $2 \mathbb N$ is very simple: it's the function defined by $f(n) = 2n$. Since a correspondence pairs each element in the set, both sets must be the same size. If $B$ were smaller than $A$, then $f$ couldn't be one-to-one, and if $B$ were larger than $A$, then $f$ couldn't be onto. This turns out to be how Cantor shows that $|\mathbb R| \gt |\mathbb N|$, by showing that there are real numbers that would be "missed" by any function that one claimed to be a correspondence from $\mathbb N$ to $\mathbb R$.
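On finite sets, injectivity and surjectivity can be checked directly from their definitions; a sketch (the checker names are ours):

```python
def is_injective(f, A):
    """f(x) = f(y) -> x = y: no two inputs share an output."""
    images = [f(x) for x in A]
    return len(images) == len(set(images))

def is_surjective(f, A, B):
    """Every y in B is f(x) for some x in A."""
    return {f(x) for x in A} == set(B)

def is_bijective(f, A, B):
    return is_injective(f, A) and is_surjective(f, A, B)

A = range(5)
# f(n) = 2n pairs {0..4} with the first five even numbers
print(is_bijective(lambda n: 2 * n, A, {0, 2, 4, 6, 8}))  # True
# g(x) = x // 2 on {0..4} is onto {0, 1, 2} but not one-to-one
print(is_injective(lambda x: x // 2, A))                  # False
print(is_surjective(lambda x: x // 2, A, {0, 1, 2}))      # True
```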
Let's begin with some number theory. One of the first topics in elementary number theory is divisibility of integers. Division on integers is an interesting operation even though we've likely been dividing numbers since childhood without thinking about it too much. However, where it does come back to trip us up again is when we start to learn how to program. This usually comes in the form of trying to divide two numbers and then remembering that dividing two integers doesn't always give you an integer. This happens to be why division on integers is interesting.
Let $a,b$ be integers. We say that $a$ divides $b$, written $a \mid b$, if there exists an integer $n$ such that $a \cdot n = b$. We also say that $b$ is a multiple of $a$.
First, note that divisibility defines a relation in the sense that we defined last class. Of course, when we defined relations, we treated them as sets. Oftentimes, we treat them as operators, writing them infix like $a \lt b$ or $a \mid b$ instead of treating them as sets and writing $(a,b) \in \lt$ or $(a,b) \in \mid$.
Observe that by this definition, we have that for all integers $n$, $n \mid 0$ and in particular, $0 \mid 0$. At first this seems a bit odd because it feels like we are tempting fate by talking about dividing by 0, but if we take a more careful look at the definition of divisibility, we see that there actually isn't any division going on. What we're really talking about is multiplication. This is important to keep in mind because Rosen, for whatever reason, chooses to add the condition that $a \neq 0$.
For all integers $a, b, c$, if $a \mid b$ and $a \mid c$, then $a \mid (b + c)$.
We can use what we've learned last week and translate it into logic-language: $$\forall a, b, c, (a \mid b) \wedge (a \mid c) \rightarrow a \mid (b+c).$$ So in order to prove this statement is true, we need to consider every integer for $a,b,c$ and assume that our hypothesis, $a \mid b$ and $a \mid c$, is true. Remember that since this is an implication, we don't care about the case when we don't have $a,b,c$ such that $a \mid b$ and $a \mid c$.
Let $a,b,c$ be arbitrary integers and assume that $a \mid b$ and $a \mid c$. Since $a \mid b$, there exists an integer $m$ such that $b = m \cdot a$, and since $a \mid c$, there exists an integer $n$ such that $c = n \cdot a$. Now, consider the integer $b + c$. We have $$b + c = a \cdot m + a \cdot n = a \cdot (m + n).$$ Since $m + n$ is an integer, by definition of divisibility we have $a \mid (b+c)$.
Let's review what it is that we've done in this proof.
| Let $a,b,c$ be arbitrary integers | Since our claim about $a,b,c$ must hold for all possible integers, we simply say that they're arbitrarily chosen. We can think of $a,b,c$ as placeholders for any integers that satisfy our condition. |
| and assume that $a \mid b$ and $a \mid c$. | Here is our condition under which we chose $a,b,c$ and the assumption that we need to make to prove the theorem. |
| Since $a \mid b$, there exists an integer $m$ such that $b = m \cdot a$, | This is the definition of divisibility. Just like above, we don't know exactly which integer works, we only know that one of them works, so we give it a name (in this case $m$). |
| and since $a \mid c$, there exists an integer $n$ such that $c = n \cdot a$. | This is the definition of divisibility again. Here, we make sure to choose a different variable name that hasn't been used ($n$). |
| Now, consider the integer $b + c$. | This is because adding two integers gives us another integer. |
| We have $$b + c = a \cdot m + a \cdot n = a \cdot (m + n).$$ | This is some simple algebraic manipulation. |
| Since $m + n$ is an integer, by definition of divisibility we have $a \mid (b+c)$. | This follows from the above algebra, and we apply the definition of divisibility. |
| $$\tag*{$\Box$}$$ | We end proofs with a box. In typography, this is called a tombstone and you often see these in publications like magazines. In mathematics, some call this a halmos, after Paul Halmos, who first used it in the mathematical context. |
Since divisibility is a relation, we can think about what kinds of useful properties it might have.
A binary relation $R$ is transitive if $\forall a, b, c$, $a \mathrel R b \wedge b \mathrel R c \rightarrow a \mathrel R c$.
In other words, if I know that $a$ is related to $b$ in some way and $b$ is related to $c$, then I can conclude that $a$ is related to $c$. The subset relation is a transitive relation that we've seen already: if we know that $A \subseteq B$ and $B \subseteq C$, it's not too hard to argue that $A \subseteq C$. Similarly, implication is also transitive: if $p \rightarrow q$ and $q \rightarrow r$, then it's not too hard to see that $p \rightarrow r$. We will show that divisibility is transitive.
For all integers $a, b, c$, if $a \mid b$ and $b \mid c$, then $a \mid c$.
Let $a, b, c$ be arbitrary integers and assume that $a \mid b$ and $b \mid c$. Since $a \mid b$, there exists an integer $m$ such that $b = m \cdot a$. Since $b \mid c$, there exists an integer $n$ such that $c = n \cdot b$. Then we get $$c = n \cdot b = n \cdot (m \cdot a) = (n \cdot m) \cdot a.$$ But $n \cdot m$ is also an integer, so by the definition of divisibility, we have $a \mid c$.
Here is a useful theorem that we'll leave as an exercise.
For all integers $a,b,c,m,n$, if $a \mid b$ and $a \mid c$, then $a \mid (bm + cn)$.
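Though the proof is left as an exercise, a brute-force spot-check over small values is easy to run. This only checks instances, of course, not the general claim:

```python
# Spot-check: if a | b and a | c, then a | (b*m + c*n).
for a in range(1, 10):
    for b in range(0, 40, a):        # b ranges over multiples of a
        for c in range(0, 40, a):    # c ranges over multiples of a
            for m in (-2, 0, 1, 3):
                for n in (-1, 1, 4):
                    assert (b * m + c * n) % a == 0
print("no counterexamples found")
```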
Now, we'll take a trip back to grade school and think about division again. Recall that when we divide a number $a$ by $b$: if $a$ is a multiple of $b$ (if $b \mid a$), we get an integer $q$ which we call the quotient, but if $a$ isn't a multiple of $b$, we also get a remainder $r$. The following theorem, called the Division Theorem, formally states that whenever we divide two numbers $a$ and $b$, we will always be able to "get" the integers $q$ and $r$ (they exist) and that they're unique.
For all integers $n$ and $d \gt 0$, there exist unique integers $q$ and $r$ such that $n = d \cdot q + r$ and $0 \leq r \lt d$.
To prove this theorem, we have to show two things: that such integers $q$ and $r$ exist, and that they are unique.
Let's prove existence first as a lemma. This will be an example of a non-constructive existence proof. That is, we'll be showing that an appropriate $q$ and $r$ have to exist, but we don't actually give a way to find them. This is typical of existence proofs that rely on contradiction: we assume that the element we want can't exist and then run into a contradiction and conclude that the element must exist after all.
For all integers $n$ and $d \gt 0$, there exist integers $q$ and $r$ such that $n = d \cdot q + r$ and $0 \leq r \lt d$.
Let $n$ and $d \gt 0$ be arbitrary integers. Consider the set $S = \{n - d \cdot q \mid q \in \mathbb Z\}$. Since $d \gt 0$ and $q$ ranges over all of $\mathbb Z$, there must be some non-negative element in $S$ (for instance, taking $q = -|n|$ gives $n + d \cdot |n| \geq 0$). Let $r$ be the smallest non-negative member of $S$ and let $q$ be such that $r = n - d \cdot q$. We want to show that $r \lt d$.
Suppose that it isn't and we have $r \geq d$. Then we have $0 \leq r-d \lt r$, and since $r - d = n - d \cdot (q+1)$, $r-d$ is also a non-negative element of $S$. However, this means that there is a smaller non-negative element of $S$ than $r$, which contradicts our assumption that $r$ was the smallest non-negative element of $S$. Therefore, it can't be the case that $r \geq d$.
Since $q \in \mathbb Z$ is such that $r \in S$, we have that $r = n - d \cdot q$. This gives us $n = d \cdot q + r$ with $0 \leq r \lt d$ as desired.
So now that we know that our desired integers have to be out there somewhere, we need to show that they're the only ones. To see how we might do this, it's worth considering the definition of a bespoke quantifier, $\exists!$. This is called the unique existential quantifier and it's not really necessary, because we can express the definition of unique existence without it: \[\exists x (P(x) \wedge \forall y (P(y) \rightarrow x = y)).\]
This says that there exists an element $x$ such that $P(x)$ holds, and if $P(y)$ holds for any $y$, it must be the case that $x = y$. In other words, the only element for which $P$ can hold is $x$.
This second part of the statement gives us an idea of how to prove that our integers are unique: we assume that we have two solutions, and then arrive at the conclusion that these two solutions were the same to begin with.
Let $n$ and $d \gt 0$ be arbitrary integers. By Lemma 2.36, there exist integers $q$ and $r$ such that $n = d \cdot q + r$ and $0 \leq r \lt d$. We want to show that $q$ and $r$ are unique.
Suppose that there exist integers $q_1, r_1, q_2, r_2$ such that $n = d \cdot q_1 + r_1$ and $n = d \cdot q_2 + r_2$ with $0 \leq r_1, r_2 \lt d$. Without loss of generality, let $r_1 \geq r_2$ and consider the following system of equations \begin{align*} n &= q_1 \cdot d + r_1 \\ n &= q_2 \cdot d + r_2. \end{align*} Subtracting gives us $0 = (q_1 - q_2) \cdot d + (r_1 - r_2)$. Rearranging this equation gives us $(q_2 - q_1) \cdot d = r_1 - r_2$. Since $(q_2 - q_1)$ is an integer, we have $d \mid (r_1 - r_2)$ by definition of divisibility. Since $0 \leq r_1 - r_2 \lt d$ and the only multiple of $d$ in this range is 0, we have $r_1 - r_2 = 0$ and therefore $r_1 = r_2$. Then from above, this gives us $(q_2 - q_1) \cdot d = 0$. Since $d \gt 0$, we have $q_2 - q_1 = 0$ and therefore $q_1 = q_2$. Thus, we have shown that $q$ and $r$ are unique.
You might recall that all of the numbers we discussed in the Division Theorem actually have names (which is why we use certain letters to name them): $n$ is the dividend, $d$ is the divisor, $q$ is the quotient, and $r$ is the remainder.
We are particularly interested in divisors. Why? Because the divisors of a number are exactly those that divide it. We get the following definition.
The set of divisors of $n$ is defined by $\operatorname{Div}(n) = \{a \in \mathbb Z : a \mid n\}$.
One thing that is of particular interest is when certain numbers share divisors. This is one way that we can relate numbers with each other. This definition of divisors as a set is nice because it is obvious what the common divisors of $m$ and $n$ are supposed to be: the intersection of the sets $\operatorname{Div}(m)$ and $\operatorname{Div}(n)$. We then use this to get the following definition.
If $m$ and $n$ are integers with $m \neq 0$ or $n \neq 0$, then the greatest common divisor of $m$ and $n$, denoted $\gcd(m,n)$, is the largest element of $\operatorname{Div}(m) \cap \operatorname{Div}(n)$. If $m = n = 0$, we define $\gcd(m,n) = 0$.
From the way we've defined this, it's not entirely obvious that the gcd should exist. Without loss of generality, let $m \neq 0$. Then $\operatorname{Div}(m)$ has a largest element $|m|$ and a smallest element $-|m|$. This means that $\operatorname{Div}(m)$ is finite and that $\operatorname{Div}(m) \cap \operatorname{Div}(n)$ is also finite. Since 1 is a divisor of every integer, we know that $1 \in \operatorname{Div}(m) \cap \operatorname{Div}(n)$, which means that $\operatorname{Div}(m) \cap \operatorname{Div}(n)$ is non-empty. Since it's finite and non-empty, $\operatorname{Div}(m) \cap \operatorname{Div}(n)$ must contain a largest element, which is the gcd.
From the Division Theorem, we get the following result about the gcd by Étienne Bézout in the mid-1700s.
For all integers $m$ and $n$, there exist integers $a$ and $b$ such that $\gcd(m,n) = a \cdot m + b \cdot n$.
If $m = n = 0$, then any $a$ and $b$ works. So without loss of generality, we have non-zero $m$ or $n$. Consider the set $S = \{am + bn \mid a,b \in \mathbb Z\}$. Note that $S$ contains positive integers (for instance, $m \cdot m + n \cdot n \gt 0$). Let $d$ be the least positive integer in $S$ and let $a$ and $b$ be such that $d = am + bn$. First, we will show that $d$ is a common divisor of $m$ and $n$.
Suppose otherwise that $d$ isn't a common divisor of $m$ and $n$. Without loss of generality, say $d \not \mid m$. Then by the Division Theorem, there exist $q$ and $r$ such that $m = q \cdot d + r$ and $0 \lt r \lt d$ (we have $r \neq 0$ precisely because $d \not \mid m$). Then, using $d = am + bn$, we have \begin{align*} r &= m - q \cdot d \\ &= m - q \cdot (am + bn) \\ &= (1-qa)m + (-bq)n \end{align*} This gives us $r \in S$. But since $0 \lt r \lt d$, this contradicts the choice of $d$ as the least positive element of $S$. Therefore, $d$ must be a common divisor of $m$ and $n$.
Now, consider any common divisor $c$ of $m$ and $n$. Then there exist integers $s$ and $t$ such that $m = cs$ and $n = ct$. This gives us \begin{align*} d &= am + bn \\ &= acs + bct \\ &= c \cdot (as + bt) \end{align*} Therefore, $c \mid d$, and so $c \leq d$. Thus, $d = \gcd(m,n)$.
Bézout's lemma is also called Bézout's identity. Confusingly, Bézout has another famous result named after him in algebraic geometry called Bézout's theorem.
Bézout's lemma is interesting in that it provides a characterization for the gcd of $m$ and $n$. Suppose we have $m$, $n$, and what we think is $\gcd(m,n)$. Is there a way to check that the gcd is really correct without computing it? If we had the coefficients of Bézout's lemma for $m$ and $n$, then we could check very easily. Of course, we're getting a bit ahead of ourselves, but keep this idea in mind while we tackle a more immediate question: how does one compute the gcd?
The obvious way to do this is to compute the set of divisors for $m$ and $n$, compute their intersection, and take the largest element in the intersection. The big problem with this approach is that factoring integers is a notoriously hard problem to solve efficiently. There are some heuristics that we can apply and some tricks that we can pull, but in general we do not have a fast algorithm for factoring numbers. This fact happens to be the backbone of many public-key cryptographic systems we use today.
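The naive approach described above translates directly into code; it works, but it examines every candidate divisor of both numbers, which is exactly the expensive part. A sketch (the names are ours, and for simplicity we keep only positive divisors and assume both arguments are non-zero):

```python
def divisors(n):
    """Div(n) for n != 0, restricted to positive divisors."""
    n = abs(n)
    return {a for a in range(1, n + 1) if n % a == 0}

def naive_gcd(m, n):
    """Largest element of Div(m) & Div(n); assumes m, n both non-zero."""
    return max(divisors(m) & divisors(n))

print(naive_gcd(232, 72))  # 8
```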
However, if all we care about is the greatest common divisor and not any of the other divisors, then the following theorem will get us there efficiently.
First, a bit of notation.
Let $n,d,q,r$ be integers that satisfy $n = d \cdot q + r$ and $0 \leq r \lt d$, as in the Division Theorem. We define the function $n \mathop{\mathbf{mod}} d = r$.
You may be more familiar with $\mathop{\mathbf{mod}}$ as the modulus operator % from Python. It's a handy way to refer to the remainder without having to say "$r$ as in the Division Theorem" every time.
If $d = \gcd(a,b)$, $b \neq 0$, and $r = a \mathop{\mathbf{mod}} b$, then $d = \gcd(b,r)$.
Let $a$ and $b \gt 0$ be arbitrary integers and let $d = \gcd(a,b)$. Let $q$ be such that $a = qb + r$. Since $d \mid a$ and $d \mid b$ (and hence $d \mid qb$), we have $d \mid (a - qb)$, and therefore, $d \mid r$. In other words, $d$ is a common divisor of $b$ and $r$. Now consider an arbitrary common divisor $c$ of $b$ and $r$. If $c \mid b$ and $c \mid r$, we have $c \mid (qb+r) = a$, so $c$ is also a common divisor of $a$ and $b$. This means we must have $c \leq d$. Therefore, $\gcd(b,r) = d = \gcd(a,b)$.
This result gives us a way to compute the gcd: $$\gcd(a,b) = \begin{cases} a & \text{if $b = 0$,} \\ \gcd(b, a \mathop{\mathbf{mod}} b) & \text{otherwise,} \end{cases}$$
which becomes the following function fairly straightforwardly:
def euclid(a, b):
    if b == 0:
        return a
    else:
        return euclid(b, a % b)
This algorithm is called Euclid's algorithm, named after the Greek mathematician from around 300 BC who describes it in the Elements, a work that contains a lot of the elementary geometry, algebra, and number theory that we learn in school. Euclid's algorithm is considered to be one of the oldest known algorithms.
Suppose we want to compute the gcd of 232 and 72. We apply Euclid's algorithm as follows: \begin{align*} \gcd(232,72) &= \gcd(72,16) \\ &= \gcd(16,8) \\ &= \gcd(8,0) \\ &= 8 \end{align*}
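Euclid's algorithm can also be extended to produce the Bézout coefficients from the lemma above. This extended version is a small modification of euclid, sketched here (the name `extended_euclid` is ours): unwinding the recursion, each level rewrites the coefficients for $\gcd(b, a \bmod b)$ into coefficients for $a$ and $b$.

```python
def extended_euclid(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == a*x + b*y."""
    if b == 0:
        return (a, 1, 0)  # gcd(a, 0) = a = a*1 + 0*0
    g, x, y = extended_euclid(b, a % b)
    # g = b*x + (a mod b)*y = b*x + (a - (a//b)*b)*y = a*y + b*(x - (a//b)*y)
    return (g, y, x - (a // b) * y)

g, x, y = extended_euclid(232, 72)
print(g)                 # 8
print(232 * x + 72 * y)  # 8
```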
Note. This is extra material and was not covered in lecture. You are not responsible for it.
Here, we will develop a constructive existence proof for $q$ and $r$ from the Division Theorem. This is more complicated because we are essentially attempting to formulate an algorithm for obtaining $q$ and $r$ and proving that it's correct.
First, we'll need the following definition.
The function $\operatorname{divmod} : \mathbb N \times \mathbb N \rightarrow \mathbb N \times \mathbb N$ is defined inductively on $n$ by: \begin{align*} \operatorname{divmod}(0,d) &= (0,0) \\ \operatorname{divmod}(n+1,d) &= \begin{cases} (q', r' + 1) & \text{if $r'+1 \neq d$, where $(q',r') = \operatorname{divmod}(n,d)$,} \\ (q' + 1, 0) & \text{otherwise.} \end{cases} \end{align*}
Note that our definition again mirrors the recursive structure of the natural numbers. This looks complicated, but we can easily turn it into a fairly simple function that may be less intimidating when expressed as code:
def divmod(n, d):
    if n == 0:
        return (0, 0)
    else:
        (q1, r1) = divmod(n - 1, d)
        if r1 + 1 != d:
            return (q1, r1 + 1)
        else:
            return (q1 + 1, 0)
We can try to trace through the code here to see what is being computed. Observe that in the recursive case, $n$ only ever decreases by one, and at each step either $r$ increases by one, or $q$ increases by one while $r$ resets to 0. So what's going on here? Basically, this algorithm counts up to $n$, increasing the remainder counter $r$ at each step. Whenever $r$ hits $d$, we know that we've counted out a multiple of $d$, so we increase our quotient counter $q$ by one and reset $r$ to 0.
Note that divmod is actually a built-in function in Python (which our definition above shadows), so we don't need to write it ourselves. Helpfully, the Python documentation tells us that the result of divmod(n,d) is supposed to be (n // d, n % d), which is what we want. Of course, the difficult part is proving that this is the case. Like many algorithms you'll prove correctness for in the future, this will be done by induction.
For all natural numbers $n$ and $d$ with $d \neq 0$, if $\operatorname{divmod}(n,d) = (q,r)$, then $n = q \cdot d + r$ and $r \lt d$.
We will prove this by induction on $n$. Let $d$ be an arbitrary nonzero natural number.
Base case. We have $n = 0$. By definition, we have $\operatorname{divmod}(0,d) = (0,0)$. Then we have $0 = 0 \cdot d + 0$ and $r = 0 \lt d$.
Inductive case. Let $k$ be an arbitrary natural number and assume that if $\operatorname{divmod}(k,d) = (q',r')$, then $k = q' \cdot d + r'$ with $r' \lt d$. We will consider $n = k+1$. Note that we have two cases to deal with: when $r'+1 \neq d$ and when $r'+1 = d$.
First, we consider the case when $r'+1 \neq d$. In this case, we have $(q,r) = (q',r'+1)$ by definition. This gives us: \begin{align*} q \cdot d + r &= q' \cdot d +(r' + 1) &\text{definition} \\ &= k+1 &\text{ind. hyp.} \end{align*} so this case holds for $k+1$. Now, we need to show that $r \lt d$. Since $r = r' + 1 \neq d$ by our assumption and $r' \lt d$ by our inductive hypothesis, we can conclude that $r \lt d$.
Our second case is when $r'+1 = d$, so we have $(q,r) = (q'+1,0)$ by definition. We then have: \begin{align*} q \cdot d + r &= (q'+1) \cdot d + 0 &\text{definition} \\ &= q' \cdot d + d \\ &= q' \cdot d + (r' + 1) &\text{assumption} \\ &= k + 1 &\text{ind. hyp.} \end{align*} and so this case also holds for $k+1$. And since $r = 0$, we have $r \lt d$.
We've now shown that $k+1 = q \cdot d + r$ with $r \lt d$ if $(q,r) = \operatorname{divmod}(k+1,d)$ and therefore, this must hold for all $n \in \mathbb N$.
This gets us almost all the way there, except that we've only defined $\operatorname{divmod}$ on $\mathbb N$. Since the claim of the Division Theorem is for integers, we will need to somehow extend our definition to $\mathbb Z$, which is doable, but we'll leave that for now.