CMSC 27200 — Lecture 8

Now, we'll formalize the idea of separating the explored vs unexplored vertices.

A cut is a partition of vertices into two nonempty subsets $(S,V-S)$. The cutset of a cut $S$ is the set of edges with exactly one endpoint in $S$.

Here, we have a graph given in purple. The vertices in green are those that form the cut and the edges in blue are the edges that belong to its cutset.

We will prove a property of edges in the cutset that will be useful in proving the correctness of both of the algorithms for minimum spanning trees.

For a graph $G = (V,E)$ let $S \subseteq V$ be a cut and let $e$ be a minimum weight edge in the cutset of $S$. Then every minimum spanning tree of $G$ contains $e$.

Suppose that $T$ is a spanning tree that doesn't contain $e = (u,v)$. Since $T$ is a tree, there is a unique path from $u$ to $v$. Since $(u,v)$ is in the cutset of $S$, there must be an edge $e'$ in the cutset of $S$ on the unique path from $u$ to $v$.

Consider the graph $T' = T - \{e'\} \cup \{e\}$, obtained by exchanging $e'$ for $e$. Then $T'$ is also a spanning tree. To see this, recall that the addition of an edge, in this case $e$, creates a unique cycle, but we have broken that cycle by removing $e'$, so $T'$ has no cycles. The graph $T'$ is also connected, since any pair of vertices connected by a path that contained $e'$ is now connected by a path that contains $e$.

However, we were given that $e$ is the minimum weight edge in the cutset of $S$. Since $e'$ was another edge in the cutset of $S$, we have $w(e) \lt w(e')$. Therefore, the total cost of $T'$ is strictly less than the cost of $T$.

The proof of the proposition we just proved is essentially an exchange argument. We will employ this in the following proof.

Essentially what we need to show is that the set of edges $T$ that was chosen by Prim's algorithm together with $S$ are a minimum spanning tree for the subgraph induced by $S$. This then proves the proposition when $S = V$. We will show this by induction on $|S|$.

When $|S| = 1$, $S = \{v\}$ and no edges have been selected. So $T$ is empty and $(S,T)$ is a minimum spanning tree for the subgraph induced by $S$.

Now suppose that for $|S| \geq 1$ that $(S,T)$ is a minimum spanning tree for the subgraph of $G$ induced by $S$. Let $v$ be the next vertex to be added to $S$ by Prim's algorithm. Then by definition there is a vertex $u \in S$ such that $(u,v) \in E$ is the minimum weight edge in the cutset of $S$. By Proposition 8.3, every minimum spanning tree of $G$ must contain this edge.

Now, let's discuss the implementation. Our strategy to implement this will be roughly the same as for Dijkstra's algorithm: keep the vertices that aren't explored yet in a priority queue, based on the minimum weight edge that connects them to a vertex that has been explored.

    \begin{algorithmic}
    \PROCEDURE{Prim}{$G$}
        \STATE Let $r$ be some vertex in $V$
        \FORALL{$v \in V$}
            \STATE $\pi[u] = \varnothing$, $d[u] = \infty$
        \ENDFOR
        \STATE $d[r] = 0$
        \STATE Create priority queue $Q$ with $Q = V$
        \WHILE{$Q \neq \emptyset$}
            \STATE $u = $ \CALL{extract-min}{$Q$}
            \FORALL{$(u,v) \in E$}
                \IF{$v \in Q$ and $w((u,v)) \lt d[v]$}
                    \STATE \CALL{update}{$Q,v,w((u,v))$}
                    \STATE $\pi[v] = u$
                    \STATE $d[v] = w(u,v)$
                \ENDIF
            \ENDFOR
        \ENDWHILE
    \ENDPROCEDURE
    \end{algorithmic}

We won't go through the proof, since it's basically the proof for Dijkstra's algorithm (another good exercise would be to adapt that proof for this algorithm). And just like Dijkstra's algorithm, we can get an improvement to $O(|E| + |V| \log |V|)$ time if we swap out a binary heap for a Fibonacci heap.

Kruskal's algorithm

Kruskal's algorithm is the other commonly-cited minimum spanning tree algorithm and is due to Joseph Kruskal in 1956, curiously also at Bell Labs. Kruskal was a student here in the College and had two brothers who also turned out to be mathematicians as well as a nephew who is currently a CS prof at UMD.

While Prim's algorithm can be thought of as growing a tree, we can think of Kruskal's algorithm as starting with a bunch of tiny trees and joining them together to form one big tree. Again, the rule that will guide us is by taking the minimal weight edge.

    \begin{algorithmic}
    \PROCEDURE{Kruskal}{$G$}
        \STATE Sort edges by cost and label so that $w(e_1) \leq w(e_2) \leq \cdots \leq w(e_m)$
        \STATE $T = \emptyset$
        \FORALL{$v \in V$}
            \STATE Make set $\{v\}$
        \ENDFOR
        \FOR{$i$ from 1 to $m$}
            \STATE Let $e_i$ be $(u,v)$
            \IF{$u$ and $v$ are not in the same set}
                \STATE $T = T \cup \{e_i\}$
                \STATE Take the union of the sets containing $u$ and $v$
            \ENDIF
        \ENDFOR
    \RETURN{$T$}
    \ENDPROCEDURE
    \end{algorithmic}

Let's examine what this algorithm is doing. We begin by sorting our edges in order of weight and putting every vertex into its own "tree". Then, we examine the minimum weight edge. If this edge joins two vertices that are not in the same tree, then we add it to our minimum spanning tree $T$ and take the union of the trees that $u$ and $v$ belong to. We repeatedly do this until we have examined every edge.

The following is a graph that is in the middle of executing Kruskal's algorithm. The larger components created by Kruskal's algorithm are highlighted in different colours.

the edge of weight 6 which is highlighted is the next edge to be added to the tree. Observe that this will join two components.

Using Proposition 8.3, it is not too difficult to prove the correctness of Kruskal's algorithm.

First, we will show that this algorithm produces a spanning tree. Suppose first that it isn't a tree, so there is a cycle in the graph. But we couldn't have added an edge that forms a cycle, because this would mean that we added an edge that joins two vertices in the same tree. Now suppose that it isn't connected. But then there would be an edge that connected two trees together that we should have added.

Now, we need to show that $T$ is a minimum spanning tree. Consider an edge of $T$, say $(u,v)$, at the time this edge was chosen by Kruskal's algorithm. Let $S$ be the component of $T$ containing $u$. Since $u \in S$ and $v \not \in S$, adding $(u,v)$ would not have created a cycle. Furthermore, $(u,v)$ must be the first edge considered by Kruskal's algorithm that is in the cutset of $S$. We know this because all such edges would not create a cycle when added, so any other edge in the cutset that would have been encountered earlier by the algorithm would have been added already. Therefore, $(u,v)$ is the least cost edge in the cutset of $S$. By Proposition 8.3, every minimum spanning tree must contain $(u,v)$. Since the choice of $(u,v)$ was arbitrary, every edge in $T$ is contained by every minimum spanning tree.

Thus, $T$ is a minimum spanning tree.