Though there is a constructive proof due to Āryabhaṭa found in the textbook, the proof that we'll go through is a non-constructive proof. When we proved the Division Theorem, we showed the existence of integers $q$ and $r$ by showing explicitly how to compute them. Here, we'll use a different tactic, a proof by contradiction.
Let's outline the general logical foundations for this approach. First, we have to ask ourselves what it means to prove that a statement is false. With other propositions we've seen, like $p \wedge q$ or $p \rightarrow q$, we construct their proofs using proofs of $p$ and $q$, the constituent parts of the proposition. However, for a proposition like $\neg p$, we only have the proposition $p$ and it's clear that a proof of $p$ would not let us prove $\neg p$.
Are there things that we know definitely can't be true? Yes! Consider the proposition $p \wedge \neg p$. This proposition can't be true because it can never be the case that both $p$ and $\neg p$ are true. Any proposition of this form is called a contradiction.
For any proposition $p$, the proposition $p \wedge \neg p$ is said to be a contradiction, and is denoted $\bot$.
The approach to proving $\neg p$ is to show that it cannot be the case that $p$ can be proven. What does that look like?
The inference rule for this is \[\frac{\boxed{\begin{matrix}p \\ \vdots \\ \bot \end{matrix}}}{\neg p} \neg i\] In other words, a proof of $\neg p$ is a proof of the implication $p \rightarrow \bot$. One way to think of this is that $\neg p$ is a refutation of $p$.
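To make this concrete, here is the rule rendered in a proof assistant; a minimal sketch in Lean 4, where $\neg p$ literally unfolds to the function type $p \to \mathsf{False}$:

```lean
-- ¬-introduction in Lean 4: to prove ¬(p ∧ ¬p), assume h : p ∧ ¬p
-- and produce a term of type False.
example (p : Prop) : ¬(p ∧ ¬p) :=
  fun h => h.2 h.1   -- h.1 : p, h.2 : p → False; apply one to the other
```

The proof term is exactly a function from the assumption to $\bot$, which is the boxed portion of the rule above.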
Let's proceed with the proof of Bézout's identity; that for all integers $a$ and $b$, there are integers $x$ and $y$ such that $\gcd(a,b) = ax + by$.
If $a = b = 0$, then any $x$ and $y$ works. So without loss of generality, we have non-zero $a$ or $b$. Consider all integers of the form $ax + by$ for all integers $x$ and $y$. Some of these are positive (for instance, $a \cdot a + b \cdot b = a^2 + b^2 \gt 0$), so by well-ordering there is a least positive integer of this form. Let $d = am + bn$ be this least positive integer, where $m$ and $n$ are integers.
First, we will show that $d$ is a common divisor of $a$ and $b$. Suppose it isn't. That is, we assume that $d$ isn't a common divisor of $a$ and $b$. Without loss of generality, say $d \nmid a$. Then by the Division Theorem, there exist integers $q$ and $r$ such that $a = q \cdot d + r$ and $0 \leq r \lt d$; since $d \nmid a$, we have $r \neq 0$, so $0 \lt r \lt d$. Then we have \begin{align*} r &= a - q \cdot d \\ &= a - q \cdot (am + bn) \\ &= (1-qm)a + (-qn)b \end{align*} This gives us that $r$ is a positive integer of the form $ax+by$. But since $0 \lt r \lt d$, this contradicts the choice of $d$ as the least positive integer of that form. Therefore, $d$ must be a common divisor of $a$ and $b$.
Now that we know $d$ is a common divisor, it remains to show that $d$ is the greatest common divisor of $a$ and $b$. So consider any common divisor $c$ of $a$ and $b$. Then there exist integers $s$ and $t$ such that $a = cs$ and $b = ct$. This gives us \begin{align*} d &= am + bn \\ &= csm + ctn \\ &= c \cdot (sm + tn) \end{align*} Therefore, $c \mid d$ and therefore $c \leq |d| = d$. Thus, $d = \gcd(a,b)$.
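The proof above is non-constructive, but the coefficients can in fact be computed. As an illustration, here is a sketch of the extended Euclidean algorithm in Python (this particular implementation is mine, not the textbook's constructive proof):

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) and g == a*x + b*y."""
    old_r, r = a, b      # remainders
    old_x, x = 1, 0      # coefficients of a
    old_y, y = 0, 1      # coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(240, 46)
assert g == 240 * x + 46 * y  # Bezout's identity holds
```

Each iteration maintains the invariant that both remainders are integer combinations of $a$ and $b$, which is exactly the set of integers the proof reasons about.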
Here, it's worth noting a subtle assumption that we made. Our assumption in the proof above was that $d$ was not a common divisor of $a$ and $b$. This is a proposition of the form $\neg p$. We then used this assumption to arrive at a contradiction. According to our rules, this means that we now have a proof of $\neg \neg p$.
Are $p$ and $\neg \neg p$ the same thing? In classical logic, this is the case. But recall our interpretation of inference rules: an inference rule says that if we have proofs of the statements on the top, this allows us to construct a proof of the statement on the bottom. This is the intuitionist, or constructivist, interpretation of logic.
If we think a few steps further, we see that in this interpretation, we can view a proof as a computation. An inference rule describes consuming proofs of certain types in order to construct a proof of a new type. Since these proofs are necessary ingredients for the construction of the new proof, we really do need to have them in hand.
So the question we need to think about is this: Is having a proof of $\neg \neg p$ good enough to use as a proof of $p$? What we really have is a proof that it can't be the case that there's a proof of $\neg p$; that is, a refutation of the claim that $p$ can't be proven. But we are still without a proof of $p$ itself.
Another way to see this is to consider the law of the excluded middle (LEM), which is the statement $p \vee \neg p$. In classical logic, this is obviously true; in fact, it's an axiom. It must be the case that $p$ is either true or false. But the intuitionist interpretation of this statement is that we must have either a proof of $p$ or a proof of $\neg p$. This is not true: there are many statements for which we have no proof either way.
So a proof system for classical logic is one that includes an inference rule for LEM, while intuitionists would reject it. \[\frac{}{p \vee \neg p} \mathrm{LEM}.\] Using LEM, one can then derive the rule \[\frac{\neg \neg p}{p} \neg\neg e,\] that one can prove $p$ from its double negation. This rule can then be used to derive the rule \[\frac{\boxed{\begin{matrix}\neg p \\ \vdots \\ \bot \end{matrix}}}{p} \mathrm{PBC},\] where PBC stands for "proof by contradiction".
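As a sanity check on that derivation, here is the rule $\neg\neg e$ obtained from LEM in Lean 4 (a sketch using the classical axioms that ship with Lean):

```lean
-- Double-negation elimination, derived from the law of the excluded middle.
example (p : Prop) (h : ¬¬p) : p :=
  match Classical.em p with
  | Or.inl hp  => hp            -- the case where p holds outright
  | Or.inr hnp => absurd hnp h  -- ¬p together with ¬¬p yields anything
```

Note that the derivation case-splits on $p \vee \neg p$ without saying which side actually holds; this is precisely the non-constructive step an intuitionist rejects.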
We briefly mentioned that the naive way of computing the gcd of $m$ and $n$ was inefficient. How do we measure the efficiency of algorithms? Informally, we can think of this as the number of steps that the algorithm needs to take, and we compare this to the size of the input of the algorithm. We would like the number of steps of our algorithm to grow slowly as we give it larger and larger inputs.
But what is the size of a number? Here, the size of the number is the size of its representation. This would be the number of digits it takes to write the number down. This means that the size of a number $n$ is $d = \log n$. Here, the exact base doesn't matter too much. But what this means then is that searching for divisors from 1 through $\sqrt n$ would require approximately $\sqrt n = 10^{\log \sqrt n} = 10^{d/2}$ steps, assuming your number was in decimal representation (swap with the appropriate base if desired). In other words, the number of steps we need to compute this grows exponentially as we increase the size of our number. Unfortunately, exponentially growing functions do not grow slowly.
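To see the growth concretely, here is a minimal sketch (in base 10) of how many candidate divisors the naive search examines as the number of digits grows:

```python
import math

def digits(n: int) -> int:
    """Size of n: the number of decimal digits in its representation."""
    return len(str(n))

def trial_division_candidates(n: int) -> int:
    """How many candidates the naive search over 2..floor(sqrt(n)) examines."""
    return max(math.isqrt(n) - 1, 0)

# Adding two digits to the number multiplies the work by about 10:
for n in (10**4, 10**6, 10**8):
    print(digits(n), trial_division_candidates(n))
```

Each extra pair of digits multiplies the work by roughly a factor of 10, which is exactly the $10^{d/2}$ behavior described above.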
So does Euclid's algorithm do any better than this? How would we know? We'll take a brief excursion into algorithm analysis and learn about strong induction and some properties of the Fibonacci numbers along the way.
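To get a feel for the answer before we analyze it, here is a sketch of Euclid's algorithm instrumented to count division steps (the instrumentation is mine; the algorithm itself is the one discussed earlier):

```python
def euclid_steps(a: int, b: int) -> tuple[int, int]:
    """Run Euclid's algorithm on a, b; return (gcd, number of division steps)."""
    steps = 0
    while b != 0:
        a, b = b, a % b   # one division step
        steps += 1
    return a, steps

# Even for 13-digit inputs, only a handful of steps are needed:
print(euclid_steps(10**12, 10**12 - 1))  # → (1, 2)
```

Compare this with the roughly $10^{d/2}$ steps of the naive search; the analysis to come will explain why so few divisions suffice.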
We will define this famous sequence of numbers in the following way.
The Fibonacci numbers are defined by $f_0 = 0$, $f_1 = 1$, and $f_n = f_{n-1} + f_{n-2}$ for all $n \geq 2$.
This sequence is attributed to Fibonacci, an Italian mathematician from the 13th Century in what was then the Republic of Pisa, though its origin can be traced further back to a number of Indian mathematicians as early as the 3rd Century BC. The Fibonacci numbers are one of those things from math that spookily show up in all sorts of different places, not only elsewhere in mathematics but also in all sorts of natural phenomena. You may have been asked to compute the Fibonacci numbers as a programming exercise before.
def fib(n: int) -> int:
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
Since this definition is recursive, the Fibonacci numbers seem to lend themselves to inductive proofs quite naturally. For instance, something we like to do with recursively defined sequences like this is to find a closed form for them. This would allow us to compute something like $f_{200}$ without having to compute $f_{2}$ through $f_{199}$ first.
It turns out that one can get an exact formula for $f_n$ in terms of the solutions to the polynomial equation $x^2 - x - 1 = 0$. There are a few ways to arrive here, none of which we'll be going through. But the roots of this equation are $\varphi = \frac{1 + \sqrt 5}{2}$, the Golden Ratio, and its conjugate root, $\widetilde \varphi = 1 - \varphi = \frac{1-\sqrt 5}{2}$. What we get is $$f_n = \frac{\varphi^n - \widetilde \varphi^n}{\sqrt 5}.$$ This is now known as Binet's formula, although it seems like it was known by a bunch of mathematicians like Euler, Bernoulli, and de Moivre over a century earlier.
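As a quick numerical sanity check of the formula, here is a sketch that compares Binet's formula (rounded to the nearest integer to absorb floating-point error) against the recurrence:

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # the Golden Ratio
PSI = 1 - PHI                  # its conjugate root

def fib_binet(n: int) -> int:
    """Fibonacci via Binet's formula, rounded to absorb float error."""
    return round((PHI**n - PSI**n) / math.sqrt(5))

def fib_recurrence(n: int) -> int:
    """Fibonacci via the defining recurrence, iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(fib_binet(n) == fib_recurrence(n) for n in range(40))
```

This is only a sanity check, not a proof: with 64-bit floats the rounding trick eventually fails once the accumulated error exceeds $1/2$, which is one reason we want the inductive proof below.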
For all $n \in \mathbb N$, $$f_n = \frac{\varphi^n - \widetilde \varphi^n}{\sqrt 5}.$$
We will show this by induction on $n$.
Base case. For $n = 0$, we have $f_0 = 0 = \frac{\varphi^0 - \widetilde \varphi^0}{\sqrt 5}$. For $n = 1$, we have \[ \frac{\varphi - \widetilde \varphi}{\sqrt 5} = \frac{\frac{1+\sqrt 5}{2} - \frac{1-\sqrt 5}{2}}{\sqrt 5} = \frac{\frac{2\sqrt 5}{2}}{\sqrt 5} = \frac{\sqrt 5}{\sqrt 5} = 1. \]
Inductive case. Let $k$ be an arbitrary natural number. Suppose that $f_k = \frac{\varphi^k - \widetilde \varphi^k}{\sqrt 5}$. Consider $f_{k+1}$. \begin{align*} f_{k+1} &= f_k + f_{k-1} & \text{Fibonacci numbers} \\ &= \frac{\varphi^k - \widetilde \varphi^k}{\sqrt 5} + f_{k-1} &\text{inductive hypothesis} \\ &\cdots \end{align*}
Unfortunately, it seems like we are stuck here. We need to say something about $f_{k-1}$ but our inductive hypothesis only covers $f_k$. What can we do?
One way to fix this is to change the statement we're trying to prove by including some additional information about terms that come before $n$. This is kind of awkward. It would be better if there were a way to add a bit more to our inductive hypothesis. However, the induction axiom doesn't have any room for this: \[P(0) \wedge \forall k (P(k) \rightarrow P(k+1)) \rightarrow \forall n P(n).\]
The inductive hypothesis here is $P(k)$. If we want to say more in our inductive hypothesis, we have to change our axiom and replace $P(k)$ with something else. In essence, when we add more to our inductive hypothesis, we make our inductive hypothesis stronger. Instead, we will consider the following strong induction axiom: \[P(0) \wedge \forall k (\forall j(j \leq k \rightarrow P(j)) \rightarrow P(k+1)) \rightarrow \forall n P(n).\] This is called strong induction since the assumption we make in the inductive case is stronger.