We'll begin by considering the notion of verifying a solution. In all of the problems we've studied so far, we were interested in solving them. But what if rather than solving a problem, someone claims to have a solution for the problem and you need to verify it? We will consider exactly this in the following problem.
A Hamiltonian path in a directed graph $G$ is a directed path that visits every node exactly once. Let $\mathrm{HAMPATH}$ be the problem of deciding, given a directed graph $G$ and nodes $s$ and $t$, whether $G$ contains a Hamiltonian path from $s$ to $t$.
It is not known whether $\mathrm{HAMPATH}$ is solvable in polynomial time. The brute force algorithm where we check every possible path is exponential. However, if someone claims that a graph has a Hamiltonian path, then we can check this very easily. All they'd have to do to convince us is give us the path they claim is Hamiltonian and we can check it.
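To make the brute force approach concrete, here is a sketch in Python. The adjacency-dictionary encoding of the graph and the function name are illustrative assumptions, not part of the formal definition; with $n$ nodes the loop below considers $n!$ orderings, so the running time is exponential.

```python
from itertools import permutations

def hampath_brute_force(adj, s, t):
    """Decide HAMPATH by checking every ordering of the nodes.

    `adj` is an assumed encoding: a dict mapping each node to the set
    of nodes it has an edge to.  With n nodes there are n! orderings,
    so this runs in exponential time.
    """
    nodes = list(adj)
    for path in permutations(nodes):
        if path[0] != s or path[-1] != t:
            continue
        if all(path[i + 1] in adj[path[i]] for i in range(len(path) - 1)):
            return True
    return False

# A 4-node directed graph with the Hamiltonian path s -> a -> b -> t.
g = {"s": {"a"}, "a": {"b"}, "b": {"t"}, "t": set()}
print(hampath_brute_force(g, "s", "t"))  # True
```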
A verifier for a language $A$ is an algorithm $V$, where $$A = \{ w \mid \text{$V$ accepts $\langle w,c \rangle$ for some string $c$}\}.$$ A polynomial time verifier is a verifier that runs in time polynomial in the length of $w$. We call the string $c$ the certificate for membership in $A$. The length of a certificate is arbitrary, but for polynomial time verifiers, $c$ will have polynomial length.
In the case of $\mathrm{HAMPATH}$, the certificate $c$ would be the Hamiltonian path in $G$ from $s$ to $t$.
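A verifier for $\mathrm{HAMPATH}$ only needs to make a few checks of the claimed path, each polynomial in the number of nodes. A minimal sketch, again under the assumed adjacency-dictionary encoding of $G$:

```python
def verify_hampath(adj, s, t, cert):
    """Polynomial-time verifier for HAMPATH.

    `cert` is the claimed Hamiltonian path, given as a list of nodes.
    `adj` is an assumed encoding mapping each node to its successor set.
    Every check below takes time polynomial in the number of nodes.
    """
    if len(cert) != len(adj) or len(set(cert)) != len(cert):
        return False                  # must visit every node exactly once
    if cert[0] != s or cert[-1] != t:
        return False                  # must start at s and end at t
    # every consecutive pair must be an edge of the graph
    return all(cert[i + 1] in adj[cert[i]] for i in range(len(cert) - 1))

g = {"s": {"a"}, "a": {"b"}, "b": {"t"}, "t": set()}
print(verify_hampath(g, "s", "t", ["s", "a", "b", "t"]))  # True
print(verify_hampath(g, "s", "t", ["s", "b", "a", "t"]))  # False
```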
However, note that if we wanted to show that a graph does not contain a Hamiltonian path, this is a different question. It is not clear how you would convince someone that a graph doesn't contain a Hamiltonian path without simply iterating through all of the possibilities. In fact, no polynomial time verifier for $\overline{\mathrm{HAMPATH}}$ is known.
$\mathbf{NP}$ is the class of languages that have polynomial time verifiers.
This nomenclature may seem strange at first, since we have characterized the class in terms of verifiers. It makes more sense once we show that an equivalent characterization is polynomial time solvability on nondeterministic Turing machines.
Let $N$ be a nondeterministic Turing machine that decides some language. The running time of $N$ is the function $T:\mathbb N \to \mathbb N$, where $T(n)$ is the maximum number of steps that $N$ uses on any branch of its computation on any input of length $n$.
We define the nondeterministic time complexity class $\mathbf{NTIME}(T(n))$ to be the collection of all languages that are decidable by an $O(T(n))$ time nondeterministic Turing machine.
This gives us the following definition. $$\mathbf{NP} = \bigcup_{k \geq 1} \mathbf{NTIME}(n^k).$$
Theorem. A language is in $\mathbf{NP}$ if and only if it is decided by some nondeterministic polynomial time Turing machine.
Proof. We show how to convert a polynomial time verifier into a polynomial time NTM and vice versa.
First, we show that we can simulate a polynomial time verifier with an NTM by guessing the certificate $c$. Let $A \in \mathbf{NP}$, let $V$ be a polynomial time verifier for $A$, and suppose $V$ runs in time $n^k$. We construct an NTM $N$ that decides $A$ as follows. On input $w$ of length $n$: nondeterministically select a string $c$ of length at most $n^k$; run $V$ on input $\langle w,c \rangle$; if $V$ accepts, accept, and otherwise reject. If $w \in A$, then some branch guesses a valid certificate and accepts; if $w \not\in A$, then no certificate exists and every branch rejects.
To show the other direction, we assume $A$ is decided by a polynomial time NTM $N$ and construct a verifier $V$ as follows. On input $\langle w,c \rangle$, interpret $c$ as an encoding of a sequence of nondeterministic choices; simulate $N$ on $w$, following the choices described by $c$; and accept if and only if this branch of the computation accepts. The certificate for $w \in A$ is an encoding of an accepting branch of $N$'s computation on $w$, which has polynomial length since $N$ runs in polynomial time.
$$\tag*{$\Box$}$$
Let's see an example of a nondeterministic polynomial time machine in action.
Theorem. $\mathrm{HAMPATH}$ is in $\mathbf{NP}$.
Proof. We construct the following nondeterministic TM $N$ to decide $\mathrm{HAMPATH}$. On input $\langle G, s, t \rangle$, where $G$ has $m$ nodes: nondeterministically write a sequence $p_1, p_2, \dots, p_m$ of nodes of $G$; check that $p_1 = s$ and $p_m = t$; check that no node appears more than once; and check that $(p_i, p_{i+1})$ is an edge of $G$ for each $1 \leq i \leq m-1$. If all checks pass, accept; otherwise, reject.
Each step runs in time polynomial in the number of vertices in $G$, so this algorithm runs in nondeterministic polynomial time. $$\tag*{$\Box$}$$
From our definitions, it seems pretty clear that $\mathbf P \subseteq \mathbf{NP}$. We can treat any deterministic Turing machine as a nondeterministic machine where the number of choices on each transition is 1. Then it's clear that $$\mathbf{TIME}(f(n)) \subseteq \mathbf{NTIME}(f(n))$$ for all functions $f : \mathbb N \to \mathbb N$. This gives us $$\mathbf{TIME}(n^k) \subseteq \mathbf{NTIME}(n^k)$$ for all $k \geq 1$ and therefore $\mathbf P \subseteq \mathbf{NP}$.
But is that containment strict? Well, we know that if a nondeterministic Turing machine can decide a problem, then a deterministic machine can as well. But we run into a snag: how much more time does it take on a deterministic machine?
Theorem. Let $T(n)$ be a function where $T(n) \geq n$. Then every $T(n)$ time nondeterministic single-tape Turing machine has an equivalent $2^{O(T(n))}$ time deterministic single-tape Turing machine.
Proof. Let $N$ be a nondeterministic TM running in $T(n)$ time. We construct a deterministic Turing machine $M$ that simulates $N$ as we did earlier.
On an input of length $n$, each computation branch of $N$ has length at most $T(n)$. Every node in the tree has at most $k$ children, where $k$ is the maximum number of nondeterministic choices given by the transition function of $N$. This gives us at most $k^{T(n)}$ leaves.
Recall that we explore the tree breadth first. Since there are at most $k^{T(n)}$ leaves in the tree, there are a total of $O(k^{T(n)})$ nodes. Travelling to each node requires at most $O(T(n))$ steps. Then $M$ runs in $O(T(n)k^{T(n)}) = 2^{O(T(n))}$ time.
Since the machine that we built is a 3-tape machine, we apply the previous theorem to get a running time of $(2^{O(T(n))})^2 = 2^{O(2T(n))} = 2^{O(T(n))}$. $$\tag*{$\Box$}$$
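The breadth-first simulation in the proof can be sketched abstractly in Python. Here `step` and `accepting` are assumed interfaces standing in for $N$'s transition function and accepting configurations, and the toy machine at the bottom guesses one bit per step; the point is that visiting the whole computation tree takes $O(k^{T(n)})$ node visits.

```python
from collections import deque

def simulate_ntm_bfs(start, step, accepting, max_depth):
    """Deterministically simulate a nondeterministic machine breadth first.

    `step(config)` is an assumed transition interface returning the list
    of successor configurations (at most k of them), and
    `accepting(config)` says whether a configuration accepts.  We visit
    every node of the computation tree down to depth `max_depth`, which
    is O(k^max_depth) nodes -- the exponential blowup in the proof.
    """
    queue = deque([(start, 0)])
    while queue:
        config, depth = queue.popleft()
        if accepting(config):
            return True
        if depth < max_depth:
            for succ in step(config):
                queue.append((succ, depth + 1))
    return False

# Toy machine: configurations are bit strings, each step guesses one
# more bit, and we accept exactly the string "101".
print(simulate_ntm_bfs("", lambda c: [c + "0", c + "1"],
                       lambda c: c == "101", max_depth=3))  # True
```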
This result gives us the following inclusion: $$ \mathbf{NP} = \bigcup_{k \geq 1} \mathbf{NTIME}(n^k) \subseteq \bigcup_{k \geq 1} \mathbf{TIME}(2^{n^k}) = \mathbf{EXPTIME}.$$
Now, by the time hierarchy theorem, we know that $$ \mathbf{P} = \bigcup_{k \geq 1} \mathbf{TIME}(n^k) \subseteq \mathbf{TIME}(2^n) \subsetneq \mathbf{TIME}(2^{2n}) \subseteq \mathbf{EXPTIME}.$$
Therefore, we know that $\mathbf P \subseteq \mathbf{NP} \subseteq \mathbf{EXPTIME}$ and that $\mathbf P \subsetneq \mathbf{EXPTIME}$. However, we don't know much more than that. Obviously, we must have at least one of $\mathbf P \subsetneq \mathbf{NP}$ or $\mathbf{NP} \subsetneq \mathbf{EXPTIME}$, but no one has been able to show this so far.
At first, it seems like it should be incredibly obvious that $\mathbf P \neq \mathbf{NP}$. Indeed, our intuition tells us that while certain problems should be very difficult to solve, verifying the solution to a problem should be much easier than solving it. It's for this reason that the overwhelming majority of computer scientists believe that $\mathbf P \neq \mathbf{NP}$. And yet, we're still unable to definitively prove this apparently obvious fact.
So why has it been so difficult? After all, all we really need to do to separate the two classes is to find a problem in $\mathbf{NP}$ and show that it isn't decidable in polynomial time. One challenge here is choosing the right problem: a number of problems that were thought to be difficult turned out, after some time, to be in $\mathbf P$. So we need a way to identify which problems in $\mathbf{NP}$ are genuinely hard, and not merely problems for which we haven't yet found a fast algorithm.
Showing the other possibility, that the two classes are the same, is not obviously any easier. One of the most important early developments in the pursuit of answering this question is the notion of NP-completeness. NP-completeness gives us an extremely useful tool for studying complexity, as well as a deceptively simple route to showing that $\mathbf P = \mathbf{NP}$: exhibit a polynomial time algorithm for a single NP-complete problem. Of course, the fact that no one has been able to do this yet is just another mystery on top of the already huge mystery.
This tool is reducibility, which we have already encountered earlier in the context of computability. Once again, we will use this concept to show that various problems are as complex as some other problem that we already know the complexity of.
A function $f: \Sigma^* \to \Sigma^*$ is a polynomial time computable function if some polynomial time TM $M$ exists that halts with $f(w)$ on its tape when started on any input $w$.
A language $A$ is polynomial time reducible to a language $B$, written $A \leq_P B$, if a polynomial time computable function $f:\Sigma^* \to \Sigma^*$ exists, where for every $w$, $$ w \in A \iff f(w) \in B.$$ The function $f$ is called a polynomial time reduction of $A$ to $B$.
Theorem. If $A \leq_P B$ and $B \in \mathbf P$, then $A \in \mathbf P$.
Proof. Let $M$ be the polynomial time algorithm deciding $B$ and $f$ be the polynomial time reduction from $A$ to $B$. We describe a polynomial time algorithm $N$ for deciding $A$: on input $w$, compute $f(w)$, run $M$ on $f(w)$, and output whatever $M$ outputs.
Note that $M$ accepts $f(w)$ if and only if $w \in A$, by the definition of $f$, so $N$ decides $A$. Then $N$ runs in polynomial time since the computations of $f$ and $M$ are both polynomial and the composition of polynomials is polynomial. $$\tag*{$\Box$}$$
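The decider in the proof is just function composition. A minimal sketch in Python, where the languages and the reduction are toy stand-ins chosen purely for illustration:

```python
def decide_via_reduction(f, decide_B):
    """Build a decider for A from a reduction f : A <=_P B and a
    decider for B, exactly as in the proof: on input w, compute f(w)
    and output whatever the decider for B outputs on it.
    """
    return lambda w: decide_B(f(w))

# Toy stand-ins (not real hard languages): A = strings of even length,
# B = even natural numbers, and the reduction f maps a string to its
# length, so w is in A iff f(w) is in B.
decide_B = lambda n: n % 2 == 0
decide_A = decide_via_reduction(len, decide_B)
print(decide_A("abcd"))  # True
print(decide_A("abc"))   # False
```

Since both `f` and `decide_B` run in polynomial time, so does the composed decider, mirroring the closing observation of the proof.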
A language $B$ is $\mathbf{NP}$-hard if $A \leq_P B$ for all $A \in \mathbf{NP}$. A language $B$ is $\mathbf{NP}$-complete if $B \in \mathbf{NP}$ and $B$ is $\mathbf{NP}$-hard.
These definitions extend to other complexity classes in general. For a complexity class $\mathcal C$, we say that a language $B$ is $\mathcal C$-hard if $A \leq_P B$ for all $A \in \mathcal C$, and $B$ is $\mathcal C$-complete if $B \in \mathcal C$ and $B$ is $\mathcal C$-hard.