CS410, Summer 1998
Lecture 25 Outline
Dan Grossman

Goals:
* Finish up SCC
* Do MST

SCC wrap-up:

We were a bit rushed at the end of last class, so we reviewed the proof.
See last lecture's notes.

MST:

Today G is a weighted, undirected, connected graph (with positive weights,
n vertices, and m edges). We want to choose a subset of the edges such that:
* The graph using only those edges is connected.
* The sum of the weights of the edges is less than or equal to the sum of
  the weights of any other such subset of edges.

Easy Claim: Such a subset is a tree of n-1 edges.

Proof: A graph on n vertices can't be connected with fewer than n-1 edges.
If there are more than n-1 edges, then there must be a cycle -- remove any
edge of the cycle, and the graph is still connected with lower total weight
(the weights are positive). Another name for a connected graph with n
vertices and n-1 edges is a tree. Hence this is called the minimum spanning
tree (MST) problem.

Naively it might seem hard -- we're building up an MST and then we find some
super cheap edge and have to undo some of our other edges. However, we can
actually develop algorithms that don't make mistakes as they go along. This
is an example of a greedy algorithm. The general theory of greedy algorithms
is covered in 482.

Here is a generic MST algorithm:

    A, a set of edges, initially empty
    while A's size is less than n-1
        find an edge such that A plus that edge is part of an MST
        add the edge to A
    return A

So we just need to implement the find step, and it's not at all clear that
we can do it efficiently.

We need some definitions:
* A cut in a graph is a partition of the vertices into S and V-S.
* An edge (u,v) crosses a cut (S, V-S) if one endpoint is in S and the
  other is in V-S.
* An edge is a light edge for a cut if it crosses the cut and no other edge
  that crosses the cut has smaller weight.
* A set of edges respects a cut if no edge in the set crosses the cut.
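The definitions above can be made concrete with a small sketch. This is only an illustration under stated assumptions: edges are represented as (u, v, weight) tuples, a cut is given by its vertex set S alone, and the helper names crosses, light_edge, and respects are made up here, not taken from the lecture.

```python
from typing import Hashable, List, Optional, Set, Tuple

# Assumed representation: an undirected edge is (u, v, weight).
Edge = Tuple[Hashable, Hashable, float]


def crosses(edge: Edge, S: Set[Hashable]) -> bool:
    """True if exactly one endpoint of the edge lies in S."""
    u, v, _ = edge
    return (u in S) != (v in S)


def light_edge(edges: List[Edge], S: Set[Hashable]) -> Optional[Edge]:
    """Return a minimum-weight edge crossing the cut (S, V-S), or None."""
    crossing = [e for e in edges if crosses(e, S)]
    return min(crossing, key=lambda e: e[2]) if crossing else None


def respects(A: List[Edge], S: Set[Hashable]) -> bool:
    """True if no edge of A crosses the cut (S, V-S)."""
    return not any(crosses(e, S) for e in A)


# A small example graph (hypothetical data).
edges = [("a", "b", 4), ("b", "c", 1), ("a", "c", 3), ("c", "d", 2)]
S = {"a"}

chosen = light_edge(edges, S)         # the edges crossing the cut are a-b and a-c
ok = respects([("b", "c", 1)], S)     # b-c lies entirely inside V-S
```

With S = {a}, only the edges a-b (weight 4) and a-c (weight 3) cross the cut, so a-c is the light edge, and the set {b-c} respects the cut.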
The key theorem for proving various MST algorithms correct is the following:

Theorem: If A is a subset of an MST, A respects a cut (S, V-S), and (u,v) is
a light edge for the cut, then A plus (u,v) is a subset of an MST.

Proof: By assumption, A is included in an MST T. Either (u,v) is in T or it
isn't. If it is, we're done. If it isn't, then consider the simple path from
u to v in T. It must cross the cut at least once, say along edge (x,y).
Since (u,v) is a light edge for the cut, weight((x,y)) >= weight((u,v)). So
if we replace (x,y) with (u,v) our weight has not increased. Furthermore, we
are still connected because any path that used to use (x,y) can replace that
edge with a sequence of edges taking x to u, u-->v, v to y. Finally, since A
respects the cut, (x,y) is not in A, so the new tree still contains all of A
along with (u,v). So we must still have an MST. (It also must be the case
that weight((x,y)) == weight((u,v)), since T was already minimum.)

So we just need to find a light edge for some cut that A respects on every
iteration. That sounds easier.

Corollary: Think of A as a forest over G. (When we're done it will be a
tree, but in the middle it will be a multiple-tree forest.) If an edge is
the lightest between one tree and everything else, then we can add it.

Proof of the corollary: Make the cut be (the one tree, everything else).
A respects this cut because no edge of A leaves the tree.

Now we will present Kruskal's algorithm for finding an MST:

Kruskal:
    sort edges by weight
    A empty
    for edges in order:
        if edge connects different trees of A, add it
            (thus connecting two trees)
    return A

Correctness: Suppose the edge (u,v) under consideration connects two
different trees of A. Every lighter edge has already been considered, and
each one was either added to A or had both endpoints in the same tree. So no
lighter edge leaves u's tree (else it would have been added already), and
(u,v) is a lightest edge between that tree and everything else. So by the
corollary, we're safe.

But for an edge (u,v) we need to know if u and v are currently in the same
tree. Initially all vertices are in separate trees. When adding an edge, two
trees become one. This is a perfect application for Union-Find!

Kruskal:
    sort edges by weight
    A empty
    initialize union-find with each vertex in its own set
    for each edge (u,v) in order
        if (find(u) != find(v)) then
            add (u,v) to A and union(u,v)
    return A

Running time:
* sort: O(m log m)
* union-find with 2m finds and n-1 unions: O(m log* n)
So O(m log m) in total.

Notice the sorting is the asymptotic bottleneck, so constant factors
probably matter more there.

Notice how all the hard work in the analysis of the algorithm (correctness
and running time) was already done by more general arguments earlier in the
course!

A second algorithm is Prim's algorithm. We started discussing it, but the
full notes will appear in tomorrow's lecture.
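Returning to Kruskal's algorithm, the union-find version above can be sketched in Python. This is a minimal sketch under assumptions not fixed by the outline: vertices are numbered 0..n-1, edges are (u, v, weight) tuples, and the union-find uses union by rank with path compression. The names kruskal, find, and union are illustrative, and the early exit once A has n-1 edges is an optional shortcut.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # assumed format: (u, v, weight)


def kruskal(n: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning tree of a connected graph."""
    parent = list(range(n))  # each vertex starts in its own set
    rank = [0] * n

    def find(x: int) -> int:
        # Locate the representative of x's set.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    def union(a: int, b: int) -> None:
        # Union by rank: attach the shallower tree under the deeper one.
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1

    A: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # sort edges by weight
        if find(u) != find(v):       # u and v are in different trees of A
            A.append((u, v, w))
            union(u, v)
            if len(A) == n - 1:      # a spanning tree has exactly n-1 edges
                break
    return A


# Hypothetical example graph on 4 vertices.
example = [(0, 1, 4), (1, 2, 1), (0, 2, 3), (2, 3, 2), (1, 3, 5)]
mst = kruskal(4, example)
total = sum(w for _, _, w in mst)
```

On this example the edges are taken in the order 1-2, 2-3, 0-2, for a total weight of 6; the edges 0-1 and 1-3 are never added.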