TOPOLOGICAL SORT

Lemma: A graph is a dag iff its depth-first forest has no back edges.

Proof: Let (u,v) be a back edge. Tree edges make a path from v to u, so there is a cycle. Conversely, if there is no back edge, then postorder numbers can only decrease when traversing an edge, so there can be no cycle.

Dags are often used to represent finite partial orders. If G = (V,E) is a dag, let E* be the reflexive transitive closure of E. Thus uE*v iff there is a directed path from u to v in G. The relation E* is a partial order on V: reflexive, antisymmetric, transitive. Antisymmetry (if uE*v and vE*u then u=v) follows from the fact that there are no cycles.

A /topological sort/ of G is a numbering of the vertices t:V --> {0,1,2,...,n-1}, where n = |V| (one-to-one and onto), such that if uEv then t(u) < t(v). If we define a binary relation T on V by: uTv if t(u) <= t(v), then T is a total order, and T extends E* in the sense that if uE*v then uTv; i.e., E* is a subset of T as sets of ordered pairs. In other words, if there is an E-path from u to v, then t(u) <= t(v).

Recall from CS280 that every partial order extends to a total order. In fact, every partial order is the intersection of all total orders that extend it, considered as sets of ordered pairs. A topological sort of the vertices of a dag thus represents a total order extending the partial order E*.

Topological Sort algorithm:

One can get a topological sort of a dag from the DFS forest in linear time. Here is a slightly simpler algorithm. First, go through all the edges and construct an array indeg[] of the indegrees of all the vertices. Put all vertices v of indegree 0 into a bag (stack or queue). There must exist at least one; if all vertices had positive indegree, then we could find a cycle by following edges backwards.

    next = 0;  // next available topological sort number
    while (bag is nonempty) {
        u = remove the next vertex from the bag;
        t[u] = next++;
        delete u and all outgoing edges (u,v);
        for (each edge (u,v) just deleted) {
            indeg[v]--;
            if (indeg[v] == 0) put v in the bag;
        }
    }

An invariant of the program is that any vertex v in the bag has indeg[v] = 0. It follows that if uEv then t[u] < t[v], since v cannot receive its number before v comes out of the bag, which cannot occur before v goes into the bag, which cannot occur until indeg[v] = 0, which cannot happen before u is deleted, which cannot happen before u comes out of the bag, which happens at the same time that u receives its number. Thus t gives a topological sort. Also, it can be proved by well-founded induction on the dag that every vertex eventually enters the bag, and is eventually removed and receives its number.

Someone asked in class whether choosing the next element of the bag in all possible ways results in all possible topological sorts. This is true.

An application: 2CNF satisfiability.

Given a Boolean formula in conjunctive normal form (CNF), we would like to know whether there is a truth assignment to the variables making the formula true. A Boolean formula is in CNF if it is a conjunction of clauses. A /clause/ is a disjunction of literals. A /literal/ is a variable or negation of a variable. A formula is in kCNF if there are at most k literals per clause. For example, here are formulas in 3CNF and 2CNF, respectively:

    (~x | y | z) & (x | ~y | ~w) & (~y | ~z | w)
    (~x | y) & (x | ~z) & (~y | ~z)

Here | & ~ denote Boolean or, and, not, respectively.

Deciding satisfiability of 3CNF formulas (or any kCNF for k >= 3) is NP-complete, which means there is no known efficient algorithm; the best known algorithm runs in exponential time in the worst case. However, 2CNF satisfiability can be solved in linear time using a combination of strong components and topological sort.

Rewrite the formula as a conjunction of implications: each clause (u | v) is equivalent to both ~u -> v and ~v -> u.
The 2CNF formula above would become

    (x -> y) & (~y -> ~x) & (~x -> ~z) & (z -> x) & (y -> ~z) & (z -> ~y)

Intuitively, think of the implications -> as propagating truth; if u -> v and u is true, then v has to be true. If u is false, then the implication u -> v imposes no constraint on the truth value of v.

Now make a graph whose vertices are the literals and whose directed edges are the implications. For this example, we would have vertices x, y, z, ~x, ~y, ~z and edges (x,y), (~y,~x), (~x,~z), (z,x), (y,~z), (z,~y).

Claim: the formula is satisfiable iff for no variable x is it the case that x and ~x are in the same strong component.

Given the claim, satisfiability can be checked in linear time by doing DFS, finding the strongly connected components by the algorithm given last time, then checking for each variable x whether x and ~x are in the same strong component.

Proof of claim: Suppose x and ~x are in the same strong component. Then there is a path from x to ~x and a path from ~x to x, so if one of x or ~x is assigned true then the other must also be assigned true. But this is impossible, since x and ~x must have opposite truth values. Hence the formula is not satisfiable.

Conversely, suppose x and ~x are in different components for all x. Collapse the strong components of the graph to single vertices to form the quotient graph, as described in a previous lecture. As argued, this graph is a dag. Topologically sort it. Set x = true (and ~x = false) if ~x occurs before x in the topological ordering, x = false (and ~x = true) if x occurs before ~x. One of the two cases must hold, since x and ~x are in different components, which correspond to different vertices in the quotient graph.

We argue that this gives a satisfying assignment. If (u | v) is a clause of the original formula, where u and v are literals, then ~u -> v and ~v -> u are edges, so ~u occurs before v (or in the same component as v) and ~v occurs before u (or in the same component as u) in the topological order.
If u and v are both false, then ~u and ~v are true, so u occurs before ~u and v occurs before ~v in the topological order. Putting these conditions together gives

    t(~u) <= t(v) < t(~v) <= t(u) < t(~u),

where t numbers the strong components and equality can hold only when two literals joined by an edge lie in the same component. This would give t(~u) < t(~u), which is impossible. Thus at least one of u or v must be true, so the clause (u | v) is satisfied. This is true of all clauses, so the formula is satisfied.

In class I also mentioned that algorithms similar to topological sort can be used to test emptiness of context-free languages and satisfiability of Horn clauses in linear time. I went over the algorithm for emptiness of CFLs briefly. You will not be tested on this.
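
For readers who want to experiment, here is a minimal Python sketch of the indegree-based topological sort and the 2CNF procedure described in these notes. It is an illustration under stated assumptions, not the course's reference code. The function names (topological_sort, strong_components, solve_2cnf) are invented for this sketch. Variables are numbered 1..m and a literal is written +k or -k, a DIMACS-like encoding chosen here for convenience. The strong components are found with Kosaraju's two-pass algorithm, which may differ from the algorithm given in the earlier lecture. A one-literal clause (u) can be passed as (u, u).

```python
from collections import defaultdict

def topological_sort(n, edges):
    """Indegree-based sort of vertices 0..n-1; returns t, or None on a cycle."""
    succ, indeg = defaultdict(list), [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    bag = [v for v in range(n) if indeg[v] == 0]   # the bag, used as a stack
    t, nxt = [None] * n, 0
    while bag:
        u = bag.pop()
        t[u] = nxt
        nxt += 1
        for v in succ[u]:                          # "delete" edges (u, v)
            indeg[v] -= 1
            if indeg[v] == 0:
                bag.append(v)
    return t if nxt == n else None                 # unnumbered vertices => cycle

def strong_components(n, edges):
    """Kosaraju's algorithm (an assumed choice); returns (comp, count)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order, seen = [], [False] * n
    for s in range(n):                             # pass 1: DFS postorder
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(succ[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(succ[v])))
                    break
            else:                                  # all successors explored
                order.append(u)
                stack.pop()
    comp, c = [None] * n, 0
    for s in reversed(order):                      # pass 2: reversed graph
        if comp[s] is not None:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for v in pred[u]:
                if comp[v] is None:
                    comp[v] = c
                    stack.append(v)
        c += 1
    return comp, c

def solve_2cnf(num_vars, clauses):
    """Returns a list of truth values for variables 1..num_vars, or None."""
    def node(lit):                                 # x_k -> 2(k-1), ~x_k -> 2(k-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)
    edges = []
    for a, b in clauses:                           # (a | b): ~a -> b and ~b -> a
        edges.append((node(-a), node(b)))
        edges.append((node(-b), node(a)))
    comp, k = strong_components(2 * num_vars, edges)
    if any(comp[2 * i] == comp[2 * i + 1] for i in range(num_vars)):
        return None                                # x and ~x share a component
    quotient = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    t = topological_sort(k, quotient)
    # x is true exactly when ~x occurs before x in the topological order
    return [t[comp[2 * i + 1]] < t[comp[2 * i]] for i in range(num_vars)]
```

With x, y, z numbered 1, 2, 3, the 2CNF example above is solve_2cnf(3, [(-1, 2), (1, -3), (-2, -3)]). The particular assignment returned depends on the order in which vertices leave the bag, but every such order yields a satisfying assignment by the argument above.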