So far, all of the models of computation we've looked at are deterministic, in the sense that a machine goes through the exact same computation on a particular input. Even a nondeterministic model is deterministic in this sense: it magically knows which choices to make to reach an accepting computation. Of course, we know that very few things in this world are certain (although this can be considered a point of philosophical or scientific debate), so it makes sense to incorporate some aspect of uncertainty, or probability, into our computational model.
But randomness isn't just something we want to model; we've also used it as a tool to create more efficient or elegant algorithms. This is the aspect of computation we're really interested in, in the context of complexity theory: the computational power of randomness.
First, let's review some basic probability. If we have some event $A$, then the probability of that event happening is denoted $Pr(A)$. The probability $Pr(A)$ is a real number between $0$ and $1$, and the probability of $A$ not happening is $1-Pr(A)$.
A probabilistic Turing machine $M$ is an NTM in which each nondeterministic step, called a coin-flip step, has exactly two choices, each taken with probability $\frac 1 2$. Then each branch $b$ of the computation of $M$ on an input word $w$ has probability $Pr(b) = 2^{-k}$, where $k$ is the number of coin-flip steps on $b$. Then the probability that $M$ accepts $w$ is $$Pr(\text{$M$ accepts $w$}) = \sum_{\text{$b$ is an accepting branch}} Pr(b).$$ Then $Pr(\text{$M$ rejects $w$}) = 1 - Pr(\text{$M$ accepts $w$})$.
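To make the branch-probability definition concrete, here's a small sketch in Python (the machine itself is made up for illustration) that computes $Pr(\text{$M$ accepts $w$})$ by enumerating every coin-flip sequence of length $k$ and summing $2^{-k}$ over the accepting branches:

```python
from itertools import product

def acceptance_probability(accepts, k):
    """Sum Pr(b) = 2^-k over all accepting branches, where each branch
    is identified with its sequence of k coin-flip outcomes."""
    return sum(2.0 ** -k for flips in product((0, 1), repeat=k) if accepts(flips))

# Hypothetical machine: accepts iff a majority of its 3 coin flips come up 1.
p = acceptance_probability(lambda flips: sum(flips) >= 2, k=3)
print(p)  # 0.5, since 4 of the 8 equally likely branches accept
```
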
When a PTM decides a language, it must accept all strings in the language and reject all strings not in the language, but we allow some probability of error. One quick and dirty way to define acceptance and rejection is to simply say that a PTM deciding a language $L$ accepts with probability at least $\frac 1 2$ if $w \in L$ and accepts with probability less than $\frac 1 2$ if $w \not \in L$. This is the notion captured by the complexity class $\mathbf{PP}$. It turns out that this isn't really a great idea: the acceptance probability can fall on either side of $\frac 1 2$ while being infinitesimally close to it, and then how would we be able to distinguish the two outcomes?
Instead, a more reasonable approach is to put a bound on the error probability. For $0 \leq \varepsilon < \frac 1 2$, we say that $M$ decides language $A$ with error probability $\varepsilon$ if \begin{align*} w \in A & \implies Pr(\text{$M$ accepts $w$}) \geq 1 - \varepsilon, \\ w \not\in A & \implies Pr(\text{$M$ rejects $w$}) \geq 1 - \varepsilon. \end{align*}
Then $\mathbf{BPP}$ is the class of languages that are decidable by a probabilistic polynomial time Turing machine with an error probability of $\frac 1 3$. In other words, if a PTM $M$ decides a language $L$ in $\mathbf{BPP}$, then $Pr(\text{$M$ accepts $w$}) \geq \frac 2 3$ if $w \in L$ and $Pr(\text{$M$ rejects $w$}) \geq \frac 2 3$ if $w \not\in L$.
As it turns out, the choice of $\frac 1 3$ is arbitrary: we could have gone with any constant, as long as $M$ is correct with probability at least $\frac 1 2 + \varepsilon$ for some $\varepsilon > 0$. Why? It turns out that if we have a TM that's correct with probability $\frac 1 2 + n^{-c}$ on inputs of size $n$, for a constant $c$, we can construct a TM that gives us the right answer with probability $1 - 2^{-n^d}$ for any constant $d$. In other words, we can drive the error down so far that the TM is correct with probability exponentially close to 1.
Theorem. Let $L$ be a language decided by a polynomial time PTM $M$ such that for every input word $w$, $M$ decides $L$ correctly with probability $\geq \frac 1 2 + |w|^{-c}$. Then for every $d > 0$, there exists a polynomial time PTM $M'$ such that for every input word $w$, $M'$ decides $L$ correctly with probability $\geq 1 - 2^{-|w|^d}$.
Proof (sketch). For every input word $w$, we simply have $M'$ run $M$ on $w$ a total of $k$ times, obtaining $k$ decisions, each either reject or accept. If the majority of the decisions are to accept, then accept; otherwise, reject.
The key to the proof is showing that there's a $k$ for which we can do this. By doing some analysis, it turns out that $k = 8n^{2c+d}$ suffices, where $n$ is the size of $w$.
For each of the $k$ decisions, we assign a random variable $X_i$ ($1 \leq i \leq k$), where $X_i = 1$ if the $i$th decision was correct (accept if $w \in L$, reject if $w \not\in L$) and $X_i = 0$ otherwise. The $X_i$ are independent Boolean random variables, and each $X_i$ has expected value $E[X_i] = Pr(X_i = 1) \geq \frac 1 2 + n^{-c}$.
Here, we'll use the Chernoff bound to finish the argument. Intuitively, the Chernoff bound expresses the notion of how surprised we should be if flipping a (supposedly) fair coin 1000 times turns up heads 900 times: the average of independent trials concentrates sharply around its expectation. One form of the bound is $$ Pr \left( \left| \frac 1 k \sum_{i=1}^k X_i - p \right| \gt \delta p \right) \lt e^{-\frac{\delta^2}4 pk}.$$ We set $p = \frac 1 2 + n^{-c}$ and $\delta = n^{-c}$, so that whenever $$\frac 1 k \sum_{i=1}^k X_i \geq p (1 - \delta) = \left( \frac 1 2 + n^{-c} \right) \left(1 - n^{-c}\right) \gt \frac 1 2$$ (the last inequality holds for sufficiently large $n$), the majority of the decisions are correct, so $M'$ gives the right answer. Since $p \geq \frac 1 2$ and $k = 8n^{2c+d}$, the probability that the answer is wrong is at most $$e^{-\frac{\delta^2}4 pk} \leq e^{-\frac 1 {4n^{2c}} \cdot \frac 1 2 \cdot 8n^{2c+d}} = e^{-n^d} \leq 2^{-n^d}.$$ $$\tag*{$\Box$}$$
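As a sanity check on the majority-vote construction, here's a small simulation in Python (the base decider and the parameters $n$, $c$, $d$ are made up for illustration): a decider that is correct with probability $\frac 1 2 + n^{-c}$ is run $k = 8n^{2c+d}$ times, and the majority decision is taken.

```python
import random

def majority_amplify(base_trial, k):
    """Run the base decider k times and return the majority decision.
    base_trial() returns True when the base decider answers correctly."""
    return sum(base_trial() for _ in range(k)) > k / 2

random.seed(0)
n, c, d = 4, 1, 1                # toy parameters for illustration
p = 0.5 + n ** -c                # base success probability: 0.75
k = 8 * n ** (2 * c + d)         # k = 8 n^{2c+d} = 512 repetitions

trials = 200
amplified_successes = sum(
    majority_amplify(lambda: random.random() < p, k) for _ in range(trials)
)
print(amplified_successes / trials)  # essentially always 1.0
```

Even though each individual run is wrong a quarter of the time, the majority over $k$ runs is virtually never wrong, matching the exponentially small error bound from the proof.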
So what is an example of a problem in $\mathbf{BPP}$? We'll be looking at primality testing, or $\mathrm{PRIMES}$: Given an integer $n$ in binary, is $n$ a prime number? We can immediately place $\mathrm{PRIMES}$ in $\mathbf{coNP}$, since testing whether a number is composite can be done in $\mathbf{NP}$: just nondeterministically guess a factor and test it.
Theorem. $\mathrm{PRIMES} \in \mathbf{BPP}$.
Proof (sketch). First, we define the Jacobi symbol $\left( \frac a n \right)$ for an integer $a$ and positive odd integer $n$ by $$ \left( \frac a n \right) = \left( \frac a {p_1} \right)^{m_1} \left( \frac a {p_2} \right)^{m_2} \cdots \left( \frac a {p_k} \right)^{m_k}$$ where $n = p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$ is the prime factorization of $n$. Here, each $\left( \frac a {p_i} \right)$ is a Legendre symbol, which is defined for all integers $a$ and odd primes $p$ by $$ \left( \frac a p \right) = \begin{cases} 0 & \text{if $a \equiv 0 \bmod p$}, \\ 1 & \text{if $a \not\equiv 0 \bmod p$ and $a \equiv x^2 \bmod p$ for some integer $x$}, \\ -1 & \text{otherwise.} \end{cases}$$
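Although the Jacobi symbol is defined via the factorization of $n$, it can be computed efficiently without factoring $n$, using reduction rules analogous to quadratic reciprocity. Here is a sketch in Python of the standard textbook algorithm (not tied to any particular source):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for a positive odd integer n, computed
    without factoring n, via reciprocity-style reduction rules."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):        # (2/n) = -1 iff n = 3, 5 (mod 8)
                result = -result
        a, n = n, a                    # reciprocity: the sign flips exactly
        if a % 4 == 3 and n % 4 == 3:  # when both arguments are 3 (mod 4)
            result = -result
        a %= n
    return result if n == 1 else 0     # (a/n) = 0 when gcd(a, n) > 1
```

For an odd prime $p$ the Jacobi symbol coincides with the Legendre symbol, so for instance `jacobi(3, 5)` is $-1$, since $3$ is not a square mod $5$.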
Then if $N$ is an odd composite, the number of integers $A$ that are relatively prime to $N$ and satisfy $\left( \frac A N \right) \equiv A^{\frac{N-1}2} \pmod N$ is less than half of the integers that are relatively prime to $N$. This means that if we choose $A$ at random, the probability that it is relatively prime to $N$ and satisfies this congruence is less than $\frac 1 2$.
The algorithm is as follows: given an odd integer $N$ as input, choose a random integer $A$ with $1 \leq A \lt N$. If $\gcd(N,A) \gt 1$ or $\left( \frac A N \right) \not\equiv A^{\frac{N-1}2} \pmod N$, then reject; otherwise, accept. If $N$ is prime, this algorithm always accepts, since by Euler's criterion the congruence holds for every such $A$. If $N$ is composite, the algorithm rejects with probability at least $\frac 1 2$. $$\tag*{$\Box$}$$
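This test (the Solovay–Strassen test) can be sketched in Python as follows, reusing the standard Jacobi-symbol algorithm; the default trial count is an arbitrary illustrative choice. Repeating the random choice of $A$ drives the error probability for composite $N$ below $2^{-\text{trials}}$:

```python
import math
import random

def jacobi(a, n):
    """Jacobi symbol (a/n) for a positive odd integer n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def solovay_strassen(N, trials=50):
    """One-sided primality test: always accepts a prime N; rejects a
    composite N with probability at least 1 - 2^-trials."""
    if N == 2:
        return True
    if N < 2 or N % 2 == 0:
        return False
    for _ in range(trials):
        A = random.randrange(1, N)
        if math.gcd(A, N) > 1:
            return False
        # Compare (A/N) with A^((N-1)/2) mod N; note -1 = N - 1 (mod N).
        if jacobi(A, N) % N != pow(A, (N - 1) // 2, N):
            return False
    return True
```

Note the one-sided error in action: a prime input passes every trial deterministically, while a composite input only survives if every random $A$ happens to land among the bad witnesses.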
Primality testing was originally thought to be difficult (it was known to be in $\mathbf{NP}$, but no efficient algorithm was known) until randomized algorithms were discovered in the late 70s by Solovay and Strassen and by Rabin. In fact, they showed something even stronger than $\mathrm{PRIMES} \in \mathbf{BPP}$: observe that in the proof we just did, if $N$ is prime, we are guaranteed to get the right answer. In this case, we have an algorithm with one-sided error, and we can define some more complexity classes based on this notion.
A language $L$ is in $\mathbf{RP}$ if it is decidable by a polynomial time PTM $M$ such that \begin{align*} w \in L & \implies Pr(\text{$M$ accepts $w$}) \geq \frac 2 3, \\ w \not\in L & \implies Pr(\text{$M$ accepts $w$}) = 0. \end{align*} Then $\mathbf{coRP} = \{L \mid \overline L \in \mathbf{RP}\}$. Equivalently, a language $L$ is in $\mathbf{coRP}$ if it is decidable by a polynomial time PTM $M$ such that \begin{align*} w \in L & \implies Pr(\text{$M$ accepts $w$}) = 1, \\ w \not\in L & \implies Pr(\text{$M$ accepts $w$}) \leq \frac 1 3. \end{align*} There's one more class, $\mathbf{ZPP}$, which is $\mathbf{RP} \cap \mathbf{coRP}$. A machine for a language in this class never gives a wrong answer: it outputs either the correct answer or "don't know", and it outputs "don't know" with probability at most $\frac 1 2$.
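One way to see why a language in $\mathbf{RP} \cap \mathbf{coRP}$ admits a zero-error decider: an $\mathbf{RP}$ machine's accept is always trustworthy, and a $\mathbf{coRP}$ machine's reject is always trustworthy. A small sketch in Python, with made-up stand-in machines:

```python
import random

def zpp_decide(rp_run, corp_run):
    """Combine one run of an RP machine and one run of a coRP machine
    for the same language into True, False, or None ("don't know")."""
    if rp_run():          # an RP machine never accepts a non-member
        return True
    if not corp_run():    # a coRP machine never rejects a member
        return False
    return None           # neither machine was conclusive this run

random.seed(0)
# Hypothetical member word: RP accepts w.p. 2/3, coRP always accepts.
member = {zpp_decide(lambda: random.random() < 2 / 3, lambda: True)
          for _ in range(100)}
# Hypothetical non-member: RP never accepts, coRP accepts w.p. 1/3.
non_member = {zpp_decide(lambda: False, lambda: random.random() < 1 / 3)
              for _ in range(100)}
print(member, non_member)  # a wrong answer never appears, only "don't know"
```

With the $\frac 2 3$ bounds above, each run answers "don't know" with probability at most $\frac 1 3$, and repeating until a definite answer appears gives expected polynomial running time.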
All of these classes are contained in $\mathbf{BPP}$, and the original randomized algorithms for primality testing placed it in $\mathbf{coRP}$. Even more surprisingly, in 2002, Agrawal, Kayal, and Saxena showed that primality testing is actually in $\mathbf P$. This leads to the question of whether $\mathbf P = \mathbf{BPP}$. In other words, just how powerful is randomization?