Now, what if we threw an additional twist in there and were presented with a system of linear congruences with different moduli? Consider the following: \begin{align*} x &\equiv 2 &\pmod 3 \\ x &\equiv 3 &\pmod 5 \\ x &\equiv 2 &\pmod 7 \end{align*}
This problem, solving for $x$, was posed by the Chinese mathematician Sunzi in the third century AD. The following theorem is due to him.
Consider integers $a_1, a_2, \dots, a_k$ and suppose that $m_1, m_2, \dots, m_k$ are $k$ pairwise coprime positive moduli. Then the system of equations \begin{align*} x & \equiv a_1 & \pmod{m_1} \\ x & \equiv a_2 & \pmod{m_2} \\ & \vdots \\ x & \equiv a_k & \pmod{m_k} \end{align*} has a unique solution modulo $m_1 \cdot m_2 \cdots m_k$.
Let $a_1, \dots, a_k$ be arbitrary integers and let $m_1, \dots, m_k$ be arbitrary positive integers with $\gcd(m_i, m_j) = 1$ for $1 \leq i \lt j \leq k$. Let $n = m_1 \cdot m_2 \cdots m_k$. For each $1 \leq i \leq k$, we define $n_i = \frac n {m_i}$. Here, we will rely on a claim that we have not proved yet:
If $a$ and $b$ are relatively prime and $a$ and $c$ are relatively prime, then $a$ and $bc$ are relatively prime.
From this claim, applied repeatedly, we have that $m_i$ is relatively prime to $n_i$, since $n_i$ is the product of all the moduli $m_j$ with $j \neq i$.
First, we will consider a solution $x_i$ to the simultaneous congruences \begin{align*} x_i & \equiv 0 & \pmod{m_1} \\ x_i & \equiv 0 & \pmod{m_2} \\ & \vdots \\ x_i & \equiv 0 & \pmod{m_{i-1}} \\ x_i & \equiv 1 & \pmod{m_i} \\ x_i & \equiv 0 & \pmod{m_{i+1}} \\ & \vdots \\ x_i & \equiv 0 & \pmod{m_k} \end{align*} In other words, we set $a_i = 1$ and $a_j = 0$ for $j \neq i$.
Let $b = n_i^{-1} \pmod{m_i}$. Since $n_i$ and $m_i$ are relatively prime, this inverse must exist. Now, consider $x_i = b \cdot n_i$. We claim $x_i$ is a solution to this simultaneous congruence.
Observe that $x_i = b \cdot n_i = n_i^{-1} \cdot n_i = 1 \pmod{m_i}$. Then, we recall that $m_j \mid n_i$ for $i \neq j$. Therefore, we have $m_j \mid b \cdot n_i = x_i$. In other words, $x_i = 0 \bmod m_j$.
To solve the original simultaneous congruences, we take each of the $x_i$'s and let $x = a_1 \cdot x_1 + \cdots + a_k \cdot x_k$. Then $x$ is a solution to the simultaneous congruence. To verify this we observe that for each $m_i$, we have \begin{align*} x &= a_1 \cdot x_1 + \cdots + a_k \cdot x_k & \pmod{m_i} \\ &= a_1 \cdot 0 + \cdots + a_{i-1} \cdot 0 + a_i \cdot 1 + a_{i+1} \cdot 0 + \cdots + a_k \cdot 0 & \pmod{m_i} \\ &= a_i & \pmod{m_i} \end{align*}
To see that our solution is unique, suppose we have two solutions $y$ and $y'$. Then $y - y' = 0 \pmod{m_i}$ for every $i$, so we must have $m_i \mid (y - y')$ for each $i$. Here, we will rely on another claim we have not proved yet:
If $a$ and $b$ are relatively prime, and $a \mid n$ and $b \mid n$, then $ab \mid n$.
Since all of our $m_i$ are pairwise relatively prime, applying this claim repeatedly shows that $n = m_1 \cdots m_k$ also divides $y - y'$. Therefore, we have $y = y' \pmod n$.
Here is the problem of Sunzi stated again. \begin{align*} x &\equiv 2 &\pmod 3 \\ x &\equiv 3 &\pmod 5 \\ x &\equiv 2 &\pmod 7 \end{align*}
To solve this, we follow the proof of the Chinese Remainder Theorem and begin by computing each $n_i = \frac{n}{m_i}$. We have $$n_1 = 5 \cdot 7 = 35, \quad n_2 = 3 \cdot 7 = 21, \quad n_3 = 3 \cdot 5 = 15.$$ Next, we compute the inverses of the $n_i$'s. We have $$35^{-1} = 2^{-1} = 2 \pmod 3, \quad 21^{-1} = 1^{-1} = 1 \pmod 5, \quad 15^{-1} = 1^{-1} = 1 \pmod 7.$$
Finally, we compute $x = a_1 x_1 + a_2 x_2 + a_3 x_3 \pmod{105}$: \begin{align*} x &= 2 \cdot 70 + 3 \cdot 21 + 2 \cdot 15 &\pmod{105} \\ &= 140 + 63 + 30 &\pmod{105} \\ &= 233 & \pmod{105}\\ &= 23 & \pmod{105} \end{align*}
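Since the proof of the theorem is constructive, it translates almost directly into code. Here is a minimal sketch in Python (the helper name `crt` is ours, and we assume Python 3.8+ so that `pow(b, -1, m)` computes modular inverses); it reproduces the computation above.

```python
# A minimal sketch of the constructive CRT proof.

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli m_i."""
    n = 1
    for m in moduli:
        n *= m
    x = 0
    for a_i, m_i in zip(residues, moduli):
        n_i = n // m_i                 # n_i = n / m_i
        b = pow(n_i, -1, m_i)          # b = n_i^{-1} mod m_i, exists by coprimality
        x += a_i * b * n_i             # x_i = b * n_i; accumulate a_i * x_i
    return x % n

print(crt([2, 3, 2], [3, 5, 7]))       # 23, matching Sunzi's problem
```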
Recall that if we consider the structure $\mathbb Z_p$ for prime $p$, we are guaranteed that every element in $\mathbb Z_p$ has a multiplicative inverse. We will now see a result of Pierre de Fermat about prime numbers and $\mathbb Z_p$, called Fermat's Little Theorem.
This is to distinguish it from the more famous Fermat theorem, Fermat's Last Theorem (there are no positive integers $a,b,c$ such that $a^n + b^n = c^n$ for $n \gt 2$). Much like Fermat's Last Theorem, Fermat stated the result in the mid-1600s but declined to give a proof of it because the proof was "too long".
While it would be about 350 years before Fermat's Last Theorem was proven (by Andrew Wiles in 1995), Fermat's Little Theorem was proved only about 100 years later, by Leonhard Euler. The following proof, using modular arithmetic, is due to James Ivory in the early 1800s and, a bit later, independently to Dirichlet.
For all prime numbers $p$ and integers $a$ such that $\gcd(a,p) = 1$, $a^{p-1} \equiv 1 \pmod p$.
Consider $n = (a \cdot 1) \cdot (a \cdot 2) \cdot \cdots \cdot (a \cdot (p-1)) \pmod p$. We can view this product in two ways.
First, we claim without proof that $$\{[a]_p, [2a]_p, \dots, [(p-1)a]_p\} = \{[1]_p,[2]_p,\dots,[p-1]_p\}.$$ If you want to prove this, there are two things to verify. First, that there is no $i$ with $1 \leq i \leq p-1$ such that $a \cdot i = 0 \pmod p$. Secondly, that for $1 \leq i \lt j \leq p-1$, $a \cdot i \neq a \cdot j \pmod p$.
Now, we will cite another useful result that we won't prove:
For all primes $p$, $(p-1)! = -1 \pmod p$. (This result is known as Wilson's Theorem.)
We can use this theorem together with our claim from above to conclude that $$n = (a \cdot 1) \cdot (a \cdot 2) \cdot \cdots \cdot (a \cdot (p-1)) = (p-1)! = -1 \pmod p.$$
Next, we can rearrange the terms of the product to give us $n = a^{p-1} \cdot (p-1)! \pmod p$. Again, by the cited theorem, we have $n = a^{p-1} \cdot (-1) \pmod p$.
Putting these two equations together, we get that $n = a^{p-1} \cdot (-1) = -1 \pmod p$. Therefore, $a^{p-1} = 1 \pmod p$.
Let's compute $7^{315} \bmod 11$. Instead of performing division on $7^{315}$, we manipulate the exponent. Observing that $10 = 11-1$, we see that $315 = 10 \cdot 31 + 5$ by division. This gives us \begin{align*} 7^{315} &= 7^{31 \cdot 10 + 5} &\pmod{11} \\ &= (7^{10})^{31} \cdot 7^5 &\pmod{11} \\ &= 1^{31} \cdot 49^2 \cdot 7 &\pmod{11} \\ &= 5^2 \cdot 7 &\pmod{11} \\ &= 25 \cdot 7 &\pmod{11} \\ &= 3 \cdot 7 &\pmod{11} \\ &= 21 &\pmod{11} \\ &= 10 &\pmod{11} \\ \end{align*}
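This exponent-reduction trick is mechanical enough to sketch in a few lines of Python; `fermat_pow` is our own name for it, and the sketch assumes $p$ is prime and $\gcd(a,p) = 1$ so that the theorem applies.

```python
# A small sketch of the exponent-reduction trick from Fermat's Little Theorem.

def fermat_pow(a, e, p):
    """Compute a^e mod p by first reducing the exponent modulo p - 1."""
    return pow(a, e % (p - 1), p)

print(fermat_pow(7, 315, 11))   # 10
print(pow(7, 315, 11))          # 10, agreeing with direct computation
```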
Note. This is extra material. You are not responsible for it.
Cryptography is the study of how to transform information and communication so that it's concealed and secure. The most basic form of cryptography dates back to the classical era, via the use of substitution ciphers. These involve shifts of letters: if your cipher was shifted by 13, then you would map $a \rightarrow n, b \rightarrow o, c \rightarrow p, \dots$ and so forth. Such ciphers are still in use today, but not for secure communications. If you wanted to talk about your favourite currently screening movies and TV shows on Twitter or something, current internet etiquette is to conceal such spoilers by running them through a ROT13 encoding, which is just the substitution cipher with shift 13 described above.
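As a quick illustration, here is a sketch of such a shift cipher in Python; with the default shift of 13 it is exactly the ROT13 encoding just described, and applying it twice recovers the original text.

```python
# A quick sketch of a shift cipher; shift = 13 gives ROT13.

def shift_cipher(text, shift=13):
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)              # leave non-letters alone
    return ''.join(out)

spoiler = shift_cipher("the butler did it")   # 'gur ohgyre qvq vg'
print(shift_cipher(spoiler))                  # shifting by 13 twice decodes it
```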
The key to making cryptography work is ensuring that the encoding of the information remains secret. This doesn't work very well for substitution ciphers because the encoding is very simple. In fact, if you read enough ROT13 spoilers on Twitter, you can begin to see patterns of how words form. So one aspect of cryptography is ensuring that the key is complex and remains secret. However, this is only half the battle. The other problem that remains is how to communicate the key.
In Canada, the banks collectively run a system called Interac, which, among other things, makes it very handy to transfer money via email and is why services like Venmo haven't really caught on. Suppose Alice and Bob, the two famous recurring characters of cryptography, want to transfer some Canadian money. So Alice logs onto her bank's website and initiates an email transfer that can be deposited by Bob into any bank account he wants once he gets the email. How does this stay secure? Alice will set a password that she communicates to Bob, who will then use the password to verify that he is allowed to accept the email transfer.
Hypothetically, what Alice needs to do is communicate the password to Bob in a secure way. What usually ends up happening is that Alice just emails Bob the password to the same account. This is bad, because if an eavesdropper, like Eve, gets access to Bob's email, then Bob is, as they say in Canada, hosed, because both the password and the email containing the transfer are sitting in Bob's email account. All Eve needs to do is use the password and then she can deposit the Canadian dollars into any bank account she wishes.
This highlights a common weakness: even though we may devise an unbreakable cryptographic scheme, if the key for that system is acquired, then we're screwed. This shifts the problem of secure communication from the message that we originally wanted to send to the problem of how to securely distribute our key.
One way to do this is to make it not a cryptographic problem anymore and do something like send the keys physically. Of course, this is what actually happened prior to communication with electronic computers. Note, however, that even if such a system is cryptographically secure, that doesn't mean I couldn't just hire someone to steal your mail or break your kneecaps to get your key. Beyond those caveats, the big problem with such a scheme is that no one wants to wait for some mail just so they can log onto a website securely.
The use of a secret key for encryption and decryption is called private-key cryptography. This is also called symmetric-key cryptography. This name hints at the root of our problem and a possible solution: what if we separate the encryption and decryption key?
This leads us to the idea of public-key cryptography, or asymmetric-key cryptography. Here, only the decryption keys are secret, while encryption keys are made public. This way, if Alice wants to send Bob a message that only Bob can read (perhaps containing Alice's credit card number or something), then Alice encrypts her message with Bob's public key, sends the encrypted message, and Bob would then decrypt the message.
The key to making this work is that the public and private keys need to be related in such a way that decryption is easy when the private key is known, but difficult when only the public key is known.
The most famous public-key cryptosystem (and definitely the most popular to teach in elementary number theory classes) is RSA, named for Rivest, Shamir, and Adleman who designed it in 1977 at MIT. RSA is commercially implemented in many Internet communication protocols currently in use and won Rivest, Shamir, and Adleman the Turing Award in 2002.
There are three parts to the system: key generation, encryption, and decryption. To generate keys, we choose two distinct primes $p$ and $q$, let $n = pq$, choose $e$ relatively prime to $(p-1)(q-1)$, and let $d = e^{-1} \bmod{(p-1)(q-1)}$. The public key is the pair $(n,e)$; the private key is $d$ (with $p$ and $q$ also kept secret). To encrypt a message $M \lt n$, we compute $C = M^e \bmod n$; to decrypt, we compute $R = C^d \bmod n$. We claim that $R = M$.
To see that this is true, we first recall that we have $$R = C^d = (M^e)^d = M^{ed} \bmod n.$$ Since $ed = 1 \bmod{(p-1)(q-1)}$, by the definitions of congruence and divisibility, we have $$ed = 1+k(p-1)(q-1)$$ for some integer $k$. This gives us $$R = M^{1+k(p-1)(q-1)} \bmod n.$$
Now, we show that $R = M \bmod p$. There are two cases: either $p \mid M$ or $p \nmid M$. In the first case, we have $M = 0 \bmod p$, which gives $$R = 0^{1+k(p-1)(q-1)} = 0 \bmod p$$ and therefore $R = 0 = M \bmod p$. In the second case, $p \nmid M$ implies $\gcd(p,M) = 1$ and therefore, by Fermat's Little Theorem, $M^{p-1} = 1 \bmod p$. Then we get $$R = M(M^{p-1})^{k(q-1)} = M\cdot 1^{k(q-1)} = M \bmod p.$$ Therefore, $R = M \bmod p$. A similar argument holds for $q$ and we can show $R = M \bmod q$. Since $p$ and $q$ are distinct primes, we must have $\gcd(p,q) = 1$ and therefore, by the Chinese Remainder Theorem, we can put together the two congruences to get $R = M \bmod{n}$.
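To see the three parts in action, here is a toy sketch in Python. The primes $p = 61$, $q = 53$ and the exponent $e = 17$ are illustrative choices of ours and far too small to be secure, since anyone could factor $n = 3233$ by hand.

```python
# A toy sketch of the three RSA operations with tiny, insecure parameters.

p, q = 61, 53
n = p * q                          # modulus, public
phi = (p - 1) * (q - 1)
e = 17                             # public exponent, coprime to phi
d = pow(e, -1, phi)                # private exponent: e*d = 1 mod (p-1)(q-1)

M = 1234                           # message, as an integer less than n
C = pow(M, e, n)                   # encryption with the public key (n, e)
R = pow(C, d, n)                   # decryption with the private key d
print(R == M)                      # True, as the argument above guarantees
```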
RSA was the first practical public-key cryptosystem, but it was inspired by an earlier system proposed by Diffie and Hellman in 1976. They give a protocol for secure key exchange, which, like RSA, is used all over the Internet today. Rather than factoring, their system is based on a related problem, the discrete logarithm problem.
Definition. A primitive root modulo a prime $p$ is an integer $r \in \mathbb Z_p$ such that every nonzero element of $\mathbb Z_p$ is a power of $r$.
Something that we won't prove is the fact that if $p$ is prime, then $\mathbb Z_p$ will contain a primitive root. We can think of this root as a generator, since all we need to do to generate every nonzero element of $\mathbb Z_p$ is multiply $r$ by itself a bunch of times.
Now, if we can perform exponentiation, we can also think of doing the "inverse" of exponentiation: taking a logarithm. Recall that a logarithm for base $b$ is $\log_b x = e$ if $x = b^e$. We want to "discretize" this notion.
Definition. Let $p$ be a prime, $r$ a primitive root modulo $p$, and $a$ an integer with $1 \leq a \leq p-1$. If $r^e \pmod p = a$ and $0 \leq e \leq p-1$, then $e$ is the discrete logarithm of $a$ modulo $p$ to the base $r$.
Then the discrete logarithm problem is: given a prime $p$, a primitive root $r$, and an integer $a \in \mathbb Z_p$, find the discrete logarithm of $a$ modulo $p$ to the base $r$. In other words, we want to find the $e$ such that $a = r^e \pmod p$.
The discrete logarithm problem turns out to be hard, for roughly the same reasons as factoring. Let's see how Diffie and Hellman make use of this.
As before, Alice and Bob want to share a private key using public pieces of information. They do the following: they publicly agree on a prime $p$ and a primitive root $a$ modulo $p$. Alice secretly chooses an exponent $k_1$ and sends $a^{k_1} \pmod p$ to Bob, while Bob secretly chooses an exponent $k_2$ and sends $a^{k_2} \pmod p$ to Alice. Alice then computes $(a^{k_2})^{k_1} = a^{k_1 k_2} \pmod p$, Bob computes $(a^{k_1})^{k_2} = a^{k_1 k_2} \pmod p$, and the two now share the secret key $a^{k_1 k_2} \pmod p$.
Here, we distinguish between information that is made "public" (i.e., sent over an insecure channel and likely to be exposed) and "private" (information that is kept secure). The public pieces of information are $p$, $a$, $a^{k_1} \pmod p$, and $a^{k_2} \pmod p$. The private pieces of information are $k_1$, $k_2$, and $a^{k_1 k_2} \pmod p$.
If our eavesdropper Eve wanted to figure out the secret key $a^{k_1 k_2} \pmod p$, then something she could do is get $a^{k_1} \pmod p$ and $a^{k_2} \pmod p$, both of which have been transmitted insecurely, and compute $a^{k_1 k_2} \pmod p$ if she can figure out what $k_1$ and $k_2$ are. But figuring out $k_1$ and $k_2$ is exactly the discrete logarithm problem, since she would also know $a$ and $p$.
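Here is a sketch of the exchange with toy numbers; $p = 23$ and the primitive root $a = 5$ are illustrative choices of ours, and in practice the secret exponents would be chosen at random from a much larger range.

```python
# A sketch of the Diffie-Hellman exchange with toy numbers.

p, a = 23, 5                   # public: a prime and a primitive root mod p

k1 = 6                         # Alice's secret exponent
k2 = 15                        # Bob's secret exponent

A = pow(a, k1, p)              # Alice sends a^{k1} mod p over the open channel
B = pow(a, k2, p)              # Bob sends a^{k2} mod p

key_alice = pow(B, k1, p)      # (a^{k2})^{k1} = a^{k1 k2} mod p
key_bob = pow(A, k2, p)        # (a^{k1})^{k2} = a^{k1 k2} mod p
print(key_alice == key_bob)    # True: both hold the shared secret
```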
Why is this hard? Well, the obvious way to compute this would be to just try computing all the $a^i$s, which gives you up to $p$ things to check. A simple modification lets us go a bit quicker.
Take $r = \left\lceil \sqrt p \right\rceil$. To find the discrete logarithm of $c$, we compute the following two lists: the giant steps $a^{r}, a^{2r}, \dots, a^{r \cdot r} \pmod p$ and the baby steps $c \cdot a^0, c \cdot a^1, \dots, c \cdot a^{r-1} \pmod p$.
If we found the same number in both lists, say $a^{i \cdot r} = c \cdot a^j \pmod p$, then this gives us $a^{ir-j} = c \pmod p$ and $i\cdot r - j$ is the discrete logarithm for $c$.
Computing these takes $O(\sqrt p \log p)$ time. Then to do the search, we can sort both lists, which takes $O(\sqrt p \log p)$ time, and for each of the $\sqrt p$ elements in one list, perform binary search on the other list, for a total of $O(\sqrt p \log p)$ time.
This approach is called baby-step, giant-step, because one list consists of baby steps ($0,1,2,\dots,r-1$) and the other list consists of giant steps ($r,2r, \dots, r^2$).
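Here is one way to sketch this in Python; `discrete_log` is our own name, and we use a dictionary for the lookup, which plays the same role as the sorting and binary search described above.

```python
# A sketch of baby-step giant-step following the two lists above: it finds
# an exponent e with a^e = c (mod p), assuming a is a primitive root mod p.
from math import isqrt

def discrete_log(a, c, p):
    r = isqrt(p - 1) + 1                       # r = ceil(sqrt(p)), give or take
    baby = {(c * pow(a, j, p)) % p: j          # baby steps: c * a^j mod p
            for j in range(r)}
    for i in range(1, r + 1):                  # giant steps: a^{i*r} mod p
        giant = pow(a, i * r, p)
        if giant in baby:
            return i * r - baby[giant]         # a^{ir} = c * a^j, so e = ir - j
    return None

e = discrete_log(5, 8, 23)
print(e, pow(5, e, 23))                        # 6 8, since 5^6 = 8 (mod 23)
```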
As with the analysis of Euclid's algorithm, we have to be careful about what $O(\sqrt p \log p)$ means. Here, we're counting operations like multiplications and comparisons. However, you'll learn in algorithms that these computations should really be viewed at the bit level, like we did with Euclid's algorithm. This means that we should be considering the complexity of this algorithm in terms of $\log p$, not $p$. So if we let $n = \log p$, we actually have something like $O(n 2^{n/2})$.
The viability of the public-key cryptography systems that we've looked at relies on the ease of multiplying numbers together and the hardness of factoring numbers. This makes it very difficult to design new cryptosystems: obviously, computing the secret key has to be difficult, but that alone would be easy to arrange if we didn't also need to make using the public key easy.
However, even the assumption that factoring is difficult is a significant one for a variety of reasons. First of all, it is not quite proven that factoring really is "hard". Currently, the best we can say is that there are no known efficient algorithms for factoring, but no one has been able to prove that it's impossible to come up with one. If someone were to come up with one, then we would be in a lot of trouble.
However, the other big problem is that someone has come up with an efficient factoring algorithm—assuming that you have access to a quantum computer. Peter Shor's quantum factoring algorithm from 1994 was one of the first algorithms to demonstrate that quantum computers were provably more powerful than classical computers.
While there are no viable large-scale quantum computers yet, the possibility that they will get built is driving current cryptography research, particularly since secure systems that are built now need to remain secure for the next decade or so. NIST has recently solicited proposals for new quantum-secure cryptographic methods. Many of these proposed systems were weeded out in the first round by being broken using classical methods.
This speaks to the difficulty of constructing new cryptosystems. One advantage of the elementary number-theoretic systems we're used to is that the problems underlying them have been well studied for a very long time. Factoring is a difficult problem, but we have spent hundreds of years studying it. Newer systems are built on much newer mathematics, often only a couple of decades old. We simply haven't had time to digest and completely understand them.
However, many current secure systems have already moved on from systems based on elementary number theory, like RSA, to systems based on problems from algebraic number theory, like elliptic curve cryptography. These are fairly well understood, but they are not resistant to quantum attacks either.
Combinatorics is the study of counting and enumerating objects and their arrangements. In computer science, many of the problems we study are combinatorial optimization problems, in the sense that we want to minimize or maximize some property of some discrete structure or search for a minimal or maximal such object. Combinatorics gives us the tools to analyze and solve these problems.
Suppose that you've been invited to an elaborate dinner party like for a wedding or something. Assuming you have no further dietary restrictions, you are presented with a choice of four starters, three mains, and three desserts. How many different three-course meals can you construct? If we first choose one of the four starters, then for each choice, we have another three choices of mains, and for each of those choices, we have a choice of three desserts. So this gives us $4 \times 3 \times 3 = 36$ different ways to construct our desired meal.
We can think of each course as a set: $S$, $M$, $D$, and so what we're really doing is asking for the number of tuples that we can get out of these sets, where the first element of the tuple is a starter, the second is a main, and the third is a dessert. In other words, we want to know $|S \times M \times D|$. This gives us a general formula for the size of a Cartesian product.
For finite sets $A_1, A_2, \dots, A_n$, $$|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \cdot |A_2| \cdots |A_n|.$$
Informally, the product rule describes the number of things you get when you're given several sets of things to choose from and you can pick one from each.
Let $\Sigma$ be an alphabet of size $k$. How many strings over $\Sigma$ of length $\ell$ are there? Here, we can think of strings in the C sense, as an array of characters, or a tuple. This means that an $\ell$-length string is an $\ell$-tuple belonging to the set $\Sigma \times \Sigma \times \cdots \times \Sigma = \Sigma^\ell$. Applying the product rule, we get that there are $k^\ell$ $\ell$-length strings over $\Sigma$.
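For a small alphabet we can sanity-check this count by brute force; the alphabet and length below are arbitrary choices.

```python
# A quick check of the product-rule count k^l by brute-force enumeration.
from itertools import product

sigma = "abc"                          # an alphabet of size k = 3
ell = 4
strings = list(product(sigma, repeat=ell))
print(len(strings), 3 ** 4)            # 81 81
```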
Consider the University of Chicago's CNetID password policy:
A standard password with a length of 12-18 characters can be used and must contain characters from three of the following categories:
- Uppercase letters
- Lowercase letters
- Numbers (1, 2, 3, etc)
- Symbols, including ! @ # $ % & * ( ) - + = _ | \ [ ] { } < > , . : ; /
Here, we have four distinct categories of symbols: the uppercase letters $U$, the lowercase letters $L$, the numbers $N$, and the special symbols $S$.
In other words, we want to make strings over the alphabet $U \cup L \cup N \cup S$. Then how many symbols do we have to make our password? Or, what is $|U \cup L \cup N \cup S|$? Luckily, our sets are disjoint, so we can immediately apply a lemma about the size of unions of disjoint sets back from our discussion about set theory:
If $A$ and $B$ are disjoint sets, then $|A \cup B| = |A| + |B|$.
This gives us $26+26+10+26 = 88$ symbols with which we can make passwords.
We can state this lemma more generally.
For finite disjoint sets $A_1, A_2, \dots, A_n$, $$|A_1 \cup A_2 \cup \cdots \cup A_n| = |A_1| + |A_2| + \cdots + |A_n|.$$
Informally, the sum rule describes a situation where you're given disjoint sets of things to choose from, but you may only make one choice in total.
For simplicity, let's consider only passwords of length 12. Following our discussion above, we know that we can make $$88^{12} = 215,671,155,821,681,003,462,656$$ possible passwords of length 12. This is about $2.1567 \times 10^{23}$, or somewhere around a third of a mole's worth of particles. However, this doesn't take into account any of the restrictions on what kinds of symbols must be included.
Consider passwords from the sets $(U \cup L)^{12}$ and $(U \cup N)^{12}$. We seem to be able to apply the product and sum rules here easily, but there's a catch. The passwords from $U^{12}$ appear in both sets, so if we simply added their sizes, we would be double counting this quantity. Thus we get \begin{align*} |(U \cup L)^{12} \cup (U \cup N)^{12}| &= |(U \cup L)^{12}| + |(U \cup N)^{12}| - |(U \cup L)^{12} \cap (U \cup N)^{12}| \\ &= |(U \cup L)^{12}| + |(U \cup N)^{12}| - |U^{12}| \\ &= 52^{12} + 36^{12} - 26^{12} \\ &= 395,519,958,867,910,127,616 \end{align*}
Here, we made the careful observation that if we take the union of two arbitrary sets, we may have a non-empty intersection. In this case, we need to take care not to count the intersection twice, which the sum rule doesn't account for. However, we have seen this result before, during our discussion on sets:
For all finite sets $A$ and $B$, $$|A \cup B| = |A| + |B| - |A \cap B|.$$
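Numbers of this size are unwieldy by hand, but Python's arbitrary-precision integers let us verify both computations above directly; the category sizes are the ones computed earlier.

```python
# A quick check of the password counts above: 88 symbols in total, with
# |U| = |L| = |S| = 26 and |N| = 10.

print(88 ** 12)                            # 215671155821681003462656

# Passwords drawn from (U u L)^12 or (U u N)^12, by inclusion-exclusion:
print(52 ** 12 + 36 ** 12 - 26 ** 12)      # 395519958867910127616
```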
Informally, permutations are ordered arrangements of a set of objects. This seems similar to what we'd do with strings or tuples, but the difference is that rather than selecting one element from a particular set, we are arranging elements from the same set. Let's define this formally first.
A permutation of a set $A$ is a bijective function $\pi : A \to A$.
Consider a set $A = \{1,2,3\}$. We can define the function $\pi$ by $\pi(1) = 3$, $\pi(2) = 1$, and $\pi(3) = 2$. Note that since a permutation is a bijection, we require that each element of $A$ get mapped to a unique element of $A$.
A natural question is how many such functions must exist. We can view such functions as filling in the following $$\begin{pmatrix} 1 & 2 & 3 \\ \_ & \_ & \_ \end{pmatrix}.$$ For something of this size, we can easily enumerate all of the possibilities: $$\begin{matrix} (1,2,3) & (1,3,2) & (2,1,3) & (2,3,1) & (3,1,2) & (3,2,1) \end{matrix}$$ This suggests that we can view this as an ordered arrangement of the elements of a set and we can apply some of the counting rules we've already seen.
If $A$ is a set containing $n$ elements, then the number of permutations of $A$ is $n!$.
We can think of this as an application of the product rule, where our first choice is from a set $S$ of size $n$, then we have a choice from $S \setminus \{a_1\}$ of size $n-1$, then a choice from $S \setminus \{a_1,a_2\}$ of size $n-2$ and so on.
Suppose you are an old-timey travelling salesperson, going around selling knives or encyclopedia sets or what-not. You have a list of $n$ cities you need to hit up and obviously you would like to minimize the cost of travelling to these cities. The problem you have to solve is: what is the minimal cost route that allows you to visit all of these cities once, including your trip back?
We can try to solve this problem by checking each possible route. How many routes are there? This is the same as asking in which order we would like to travel to each city and how many total choices we have. So for $n$ cities, this is just $n!$. This doesn't seem so bad if you've only got four cities to go to, since $4! = 24$. However, if you're a bit more ambitious and want to go to 8 cities, then you're looking at $8! = 40320$ different routes to think about. This is not something you'd want to do by hand, but easily handled by a computer. Double the number of cities again, though, and we get $16! = 20922789888000$, which Wolfram|Alpha tells me is about the number of red blood cells in the human body or about 70 times the number of stars in our galaxy. This number clearly grows very quickly—on the order of $\sqrt{2\pi n} \left( \frac n e \right)^n$, by Stirling's approximation.
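To get a feel for this growth, here is a quick comparison of $n!$ against Stirling's approximation for the city counts mentioned above.

```python
# Comparing n! with Stirling's approximation sqrt(2*pi*n) * (n/e)^n.
from math import factorial, sqrt, pi, e

for n in (4, 8, 16):
    stirling = sqrt(2 * pi * n) * (n / e) ** n
    print(n, factorial(n), round(stirling))
```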
This problem is the Travelling Salesman Problem, one of the most famous problems for which we do not have an efficient algorithm for solving. The problem dates back as far as the 1800s although it wasn't formally stated in its current form until around the 1930s. One point of interest is that I said cost and not necessarily distance. If we assume that the costs are distances, then we impose a condition on our cost function: that the distances satisfy the triangle inequality, $d(x,y) \leq d(x,z) + d(z,y)$. This assumption makes the problem slightly (but not significantly) easier. However, not all cost functions necessarily behave that way and we have a very relevant example of one: flight prices.
But what if we didn't want to go to every city, but instead wanted to go to some subset of them? How many ways are there to do this? We can think about this informally: we carry out the same process as before, but stop when we reach our desired number of choices. This leads to the following definition and result.
An $r$-permutation is an ordered arrangement of $r$ elements taken from a set of $n$ elements.
If $P(n,r)$ denotes the number of $r$-permutations of an $n$-element set, then $$P(n,r) = n \cdot (n-1) \cdot \cdots \cdot (n-r+1).$$
If $n$ is positive and $0 \leq r \leq n$, then $$P(n,r) = \frac{n!}{(n-r)!}.$$
Proof. Note that the total number of ways to order a set of $n$ elements is $n!$. From above, we have $P(n,r)$ ways to order $r$ out of $n$ elements. We are then left with $n-r$ elements to order, which we can do in $(n-r)!$ ways. This gives us \begin{align*} n! &= P(n,r) \cdot (n-r)! \\ P(n,r) &= \frac{n!}{(n-r)!} \end{align*} $\tag*{$\Box$}$
Note that we could have proved this algebraically—it's not too hard to manipulate the two factorials into our desired form. However, this doesn't really tell us much about how to obtain the value $P(n,r)$. The proof that we gave is an example of a combinatorial proof. This is a proof that relies on arguments about the construction and enumeration of the objects in question.
Notice that we relate the quantity that we're computing with the process by which we're constructing the associated object. This offers more insight into how we're enumerating or constructing the object than a direct algebraic proof. This kind of information is useful, particularly for computer scientists who are often interested in using this information to literally enumerate or construct solutions.
Now, suppose that order is not so important and we are only concerned about the selection of $r$ objects from a set of $n$ objects.
An $r$-combination of a set $A$ with $n$ elements is a subset of $r$ elements from $A$. The number of $r$-combinations of a set with $n$ elements is denoted $C(n,r)$ or $\binom n r$. This is read "$n$ choose $r$".
What you'll find is that everyone introduces the notation for combinations as some variant of $C(n,r)$, because $C$ is a nice mnemonic for "choose" or "combination" but then this is almost immediately dropped for the $\binom n r$ notation. The $\binom n r$ are called binomial coefficients for reasons that we'll get to next class (see, I said we'd drop $C(n,r)$ almost immediately).
So when considering the number of $r$-combinations, we are basically counting the number of subsets of size $r$. Recall that sets are unordered so this differs from permutations in that all we care about is whether an element gets chosen at all.
Thinking back to a three element set $A = \{1,2,3\}$, we observe that unlike permutations, there is only one 3-combination: $A$ itself. Then how many 2-combinations are there? Let's enumerate all of the subsets of $A$ of size 2: $$\begin{matrix} \{1,2\} & \{1,3\} & \{2,3\} \end{matrix}$$ Remember that since sets are not ordered, $\{1,2\}$ and $\{2,1\}$ are the same set.
So how many of these things are there?
If $n \gt 0$ and $0 \leq r \leq n$, then $$C(n,r) = \frac{n!}{r!(n-r)!}.$$
We can make use of the number of $r$-permutations of a set. An $r$-permutation of a set of size $n$ is simply an ordered arrangement of a subset of size $r$. So we can count as follows: first, choose a subset of size $r$, and then count the number of permutations of this subset. This gives us $$P(n,r) = C(n,r) \cdot P(r,r).$$ Then doing some algebra, we get $$C(n,r) = \frac{P(n,r)}{P(r,r)} = \frac{n!}{r!(n-r)!}.$$
Suppose you and six of your friends decide to go somewhere for dinner and like fools, you all drive your own cars. When you get there, there are only three parking spots left. How many possible parking situations are there? In this case, it doesn't really matter who arrives first, second, and third, you just need to make sure not to be fourth. In this case, it makes sense to count the number of 3-combinations rather than 3-permutations. So we get $$C(7,3) = \frac{7!}{3!(7-3)!} = \frac{7!}{3! \cdot 4!} = \frac{7 \cdot 6 \cdot 5}{3 \cdot 2 \cdot 1} = 35$$ different ways where three people are happy and four people are stuck circling the parking lot.
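The standard library can check this for us (Python 3.8+ provides `math.comb` and `math.perm`), and the comparison also illustrates the relation $P(n,r) = C(n,r) \cdot P(r,r)$ from the proof above.

```python
from math import comb, perm

print(perm(7, 3))                # 210 ordered ways three drivers could park
print(comb(7, 3))                # 35, since the order of arrival is irrelevant
print(perm(7, 3) == comb(7, 3) * perm(3, 3))   # True: P(n,r) = C(n,r) * P(r,r)
```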
Something you might have noticed when going through the previous example is that if we have $\binom 7 3 = \frac{7!}{3! \cdot 4!}$, then this looks the same as $\binom 7 4 = \frac{7!}{4! \cdot 3!}$. This is not a coincidence! First of all, this is very easy to verify.
For all $n \gt 0$ and $0 \leq r \leq n$, $C(n,r) = C(n,n-r)$.
We have $$C(n,n-r) = \frac{n!}{(n-r)! (n - (n-r))!} = \frac{n!}{(n-r)! \cdot r!} = C(n,r).$$
Now, intuitively, what does this mean? Suppose we have a set of $n$ elements. We want to choose $r$ of these elements to form a subset. Then there are $n-r$ elements that weren't chosen. Alternatively, we can think of this as choosing $n-r$ elements to exclude from our subset, so that the remaining $r$ elements happen to form our subset. In both cases, we get the same result.
We say a word over a binary alphabet, say $\{0,1\}$, is balanced if it contains exactly as many $0$s as it does $1$s. How many balanced words of length $n$ are there? First of all, if $n$ is odd, then there are no balanced words of length $n$. So $n$ has to be even. At first, we might approach this like previous string problems, where we place things sequentially. However, we know exactly how many $0$s and $1$s we need in our word: we want exactly $\frac n 2$ of each.
If we have $n$ spaces to fill, we first think about how to place our $0$s. We need to choose $\frac n 2$ of these spaces to fill. After we choose these spaces, we know the rest of the word must be filled with the $1$s. This gives us $C\left(n,\frac n 2\right) = \frac{n!}{\frac n 2! \left(n - \frac n 2\right)!} = \frac{n!}{\left(\frac n 2 !\right)^2}$ balanced words.
We can apply the same idea if we happen to be working in a larger alphabet. Suppose that we're working in a ternary alphabet $\{0,1,2\}$. Then a balanced word over this alphabet is one that has the same number of 0s, 1s, and 2s. Again, we would make sure that $3 \mid n$ but then our problem is solved in the following way:
First, we choose $\frac n 3$ spots for our 0s. However, we're left with $\frac 2 3 n$ spots for the 1s and 2s. We then choose half of these spots for the 1s and everything left over goes to the 2s. This gives us a total of $$C\left(n, \frac n 3\right) \cdot C\left(\frac{2n}3, \frac n 3\right) = \frac{n!}{\frac n 3! \cdot \left(n - \frac n 3\right)!} \cdot \frac{\frac{2n}3!}{\frac n 3! \cdot \left(\frac{2n}3 - \frac n 3\right)!} = \frac{n!}{\frac n 3! \cdot \frac{2n}3!} \cdot \frac{\frac{2n}3!}{\frac n 3! \cdot \frac n 3!} = \frac{n!}{\left(\frac n 3!\right)^3}.$$
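For small $n$, we can verify both formulas by brute-force enumeration; $n = 6$ below is an arbitrary choice divisible by both 2 and 3.

```python
# A brute-force check of the balanced-word formulas for small n.
from itertools import product
from math import factorial, comb

n = 6
binary = sum(1 for w in product("01", repeat=n)
             if w.count("0") == w.count("1"))
print(binary, comb(n, n // 2))                           # 20 20

ternary = sum(1 for w in product("012", repeat=n)
              if w.count("0") == w.count("1") == w.count("2"))
print(ternary, factorial(n) // factorial(n // 3) ** 3)   # 90 90
```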
Recall that combinations are really about counting subsets, while the examples above suggest a correspondence between subsets and selections of spots in a binary string. This leads to an intriguing idea.
Suppose that I have a set $A = \{x_1, x_2, \dots, x_n\}$ with $n$ objects in it. How many subsets of $A$ are there? From our previous discussions, we know that a set of $n$ objects has exactly $2^n$ subsets. Let's consider the following approach. For each subset $S \subseteq A$, let $w_S$ be a string over the binary alphabet $\{0,1\}$. Then we define $w_S$ by $w_S = a_1 a_2 \cdots a_n$, where for $1 \leq i \leq n$, \[ a_i = \begin{cases} 0 & \text{if $x_i \not \in S$,} \\ 1 &\text{if $x_i \in S$.} \end{cases} \] For instance, $w_\emptyset = 00 \cdots 0 = 0^n$, while $w_A = 11\cdots 1 = 1^n$. How many strings of the form $w_S$ are there? Our prior discussion about the number of strings makes this easy: over the binary alphabet, there are exactly $2^n$ strings of length $n$. The string $w_S$ is sometimes called the characteristic string of $S$.
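This correspondence is easy to make concrete; the three-element set and the helper `characteristic_string` below are our own illustrative choices.

```python
# A sketch of the subset-to-characteristic-string correspondence.
from itertools import product

A = ["x1", "x2", "x3"]

def characteristic_string(S):
    """Map a subset S of A to its characteristic string."""
    return "".join("1" if x in S else "0" for x in A)

print(characteristic_string(set()))              # 000
print(characteristic_string({"x1", "x3"}))       # 101
# Every length-3 binary string arises from exactly one subset: 2^3 = 8 of each.
print(len(list(product("01", repeat=len(A)))))   # 8
```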
One common question that comes up is when to count permutations and when to count combinations. It is very easy to turn a problem of one kind into a problem of the other, just like in our parking example. The key to look for is whether what you're counting has an element of ordering or arrangement or distinguishability.