Lecture 36: Paths

Eulerian and Hamiltonian paths
Review exercises:
- Draw a large graph and find an Eulerian cycle in it (using the algorithm contained in the proof below).
- Justify some of the assertions in the proof of existence of an Eulerian cycle by doing inductive proofs. For example, "the number of untraversed edges must be even".
- Write down a formula with several clauses and variables; convert it to a graph using the Hamiltonian path reduction described below. If you can find a satisfying assignment, draw the corresponding Hamiltonian path.

Definitions

A path is a sequence of vertices \((v_0, v_1, \dots, v_n)\) with \((v_i,v_{i+1}) \in E\) for all \(0 \leq i \lt n\). We say that the path traverses those edges.

A path is a cycle if \(v_0 = v_n\).
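
As a small illustration of these two definitions, here is a sketch in Python. It assumes (this representation is not from the notes) that a graph is given as a set of ordered pairs of vertices; for an undirected graph, both orientations of each edge are included. The names `is_path` and `is_cycle` are made up for this example.

```python
def is_path(edges, vertices):
    """Return True if each consecutive pair of vertices is joined by an edge.

    `edges` is a set of (u, v) pairs. A sequence with a single vertex is a
    path of length 0, since there are no pairs to check.
    """
    if not vertices:
        return False
    return all((vertices[i], vertices[i + 1]) in edges
               for i in range(len(vertices) - 1))


def is_cycle(edges, vertices):
    """A cycle is a path whose first and last vertices coincide."""
    return is_path(edges, vertices) and vertices[0] == vertices[-1]


# A directed triangle a -> b -> c -> a (an assumed example graph).
triangle = {("a", "b"), ("b", "c"), ("c", "a")}
```

With this encoding, `["a", "b", "c"]` is a path in `triangle` but not a cycle, while `["a", "b", "c", "a"]` is both.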

Eulerian paths

A path is Eulerian if it traverses all edges of the graph exactly once.

Claim: A connected undirected graph \(G\) contains an Eulerian cycle if and only if the degrees of all vertices are even.

Proof: If \(G\) has an Eulerian cycle, consider any vertex \(v\). The cycle must leave \(v\) every time it enters; moreover, it must either enter or leave via each edge incident to \(v\) exactly once. Therefore, there are two edges incident to \(v\) for each time the cycle passes through \(v\), and so the degree of \(v\) is even.
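
The degree condition in the claim is easy to test mechanically. The following is a minimal sketch, assuming the graph is given as a list of undirected edges `(u, v)`; the function names are invented for this example. It checks only the parity condition, not connectivity.

```python
from collections import Counter


def degrees(edge_list):
    """Count how many edge endpoints touch each vertex."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1  # a self-loop (u == v) therefore contributes 2
    return deg


def all_degrees_even(edge_list):
    """True if every vertex that appears in an edge has even degree."""
    return all(d % 2 == 0 for d in degrees(edge_list).values())


square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # every vertex has degree 2
segment = [(0, 1), (1, 2)]                  # the endpoints have degree 1
```

Here `all_degrees_even(square)` holds, consistent with the square having an Eulerian cycle, while `all_degrees_even(segment)` fails.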

We prove the other direction by strong induction on the number of edges. Let \(P(n)\) be the statement "for any graph \(G\) with \(n\) edges, if \(G\) is connected and all vertices of \(G\) have even degree, then \(G\) contains an Eulerian cycle". We will show \(P(0)\), and also \(P(n)\) assuming \(P(k)\) for all \(k \lt n\).

To see \(P(0)\), note that since \(G\) has no edges and is connected, it can only contain a single vertex \(v\). The 0-length path \((v)\) is vacuously an Eulerian cycle, since there are no edges that it needs to traverse.

Now, assume \(P(k)\) for all \(k \lt n\); we wish to show \(P(n)\). First, I claim that \(G\) must contain a (possibly non-Eulerian) cycle; to find it, start at any vertex and keep traversing untraversed edges. (Since \(G\) is connected and has at least one edge, the starting vertex has an edge to leave by.) You can never "get stuck" at a vertex other than the starting vertex: each earlier pass through that vertex used two of its edges (one to enter and one to leave), so since its degree is even, an even number of its edges were untraversed just before you entered. You used one of them to enter, leaving an odd number, so there must be at least one left to leave by. The only vertex at which you can run out of untraversed edges is the starting vertex; but when you return to the starting vertex, you have traversed a cycle, so you can stop.

Since there are only a finite number of edges, you must eventually find a cycle. Call the cycle \(c\).

Now, remove the edges of \(c\) from \(G\); let \(G_1, G_2, \dots\) be the connected components of what remains. Note that there may be more than one, because removing edges may disconnect \(G\). Removing a cycle reduces the degree of each vertex by an even amount, so each vertex of each \(G_i\) still has even degree. Moreover, each \(G_i\) is connected by definition, and each has fewer edges than \(G\). Therefore, we can use the inductive hypothesis to conclude that there is an Eulerian cycle \(c_i\) in each \(G_i\).

Finally, we can piece together \(c\) and the \(c_i\) to form an Eulerian cycle \(e\). Because \(G\) is connected, each \(G_i\) shares at least one vertex with \(c\). To traverse \(e\), start at any vertex of \(c\), and start traversing \(c\). Whenever you reach a vertex of some \(G_i\) whose cycle \(c_i\) you have not yet traversed, take a detour around \(c_i\), returning to that vertex, and then continue along \(c\). Since every edge of \(G\) is either in \(c\) or in exactly one of the \(c_i\) (and not both), every edge is traversed exactly once, so \(e\) is an Eulerian cycle.
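
The proof is constructive, and it can be turned into a program. The sketch below is not a literal transcription of the induction; it uses the stack-based formulation usually called Hierholzer's algorithm, which walks along untraversed edges until it gets stuck and splices the detours together as it backtracks. It assumes an undirected graph given as a list of edges `(u, v)` with at least one edge; the function name and the choice to return `None` on failure are made up for this example.

```python
from collections import defaultdict


def eulerian_cycle(edge_list):
    """Return an Eulerian cycle as a list of vertices, or None if none exists."""
    if not edge_list:
        return []  # no edges: nothing to traverse (assumed convention)
    adj = defaultdict(list)  # vertex -> list of (neighbour, edge index)
    for idx, (u, v) in enumerate(edge_list):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return None  # some vertex has odd degree
    used = [False] * len(edge_list)
    stack = [edge_list[0][0]]  # current walk, starting at an arbitrary vertex
    cycle = []
    while stack:
        v = stack[-1]
        # Discard edges that were already traversed from their other endpoint.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)  # extend the walk along an untraversed edge
        else:
            cycle.append(stack.pop())  # stuck at v: emit it and backtrack
    if len(cycle) != len(edge_list) + 1:
        return None  # some edges were unreachable, so the graph is not connected
    return cycle


# Two triangles sharing vertex 0 (an assumed example).
bowtie = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
```

Calling `eulerian_cycle(bowtie)` returns a list of 7 vertices that starts and ends at the same vertex and uses each of the 6 edges once.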

Hamiltonian cycles

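A Hamiltonian path is like an Eulerian path, except that instead of traversing each edge exactly once, it must visit each vertex exactly once. A Hamiltonian cycle is a cycle that visits every vertex exactly once (apart from returning to its starting vertex at the end).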
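
To make the definition concrete, here is a brute-force sketch that tries every ordering of the vertices. It assumes a directed graph given as a set of `(u, v)` pairs with at least two vertices; the function name is invented for this example. It takes time proportional to the number of orderings, so it is only usable on very small graphs.

```python
from itertools import permutations


def hamiltonian_cycle(vertices, edges):
    """Return a cycle visiting every vertex exactly once, or None."""
    vertices = list(vertices)
    if not vertices:
        return None
    first, rest = vertices[0], vertices[1:]
    # Fix the starting vertex and try every order for the remaining ones.
    for order in permutations(rest):
        cycle = [first, *order, first]
        if all((u, v) in edges for u, v in zip(cycle, cycle[1:])):
            return cycle
    return None


# A directed square, and a two-way path 0 - 1 - 2 (assumed examples).
square = {(0, 1), (1, 2), (2, 3), (3, 0)}
line = {(0, 1), (1, 0), (1, 2), (2, 1)}
```

The square has a Hamiltonian cycle; the path graph does not, since there is no edge joining its two ends.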

We will show how to encode questions about logical formulas as questions about Hamiltonian cycles.

3CNF satisfiability

A boolean formula \(φ\) is formed according to the following grammar: \[φ ∈ F ::= x \mid \lnot φ \mid φ_1 \land φ_2 \mid φ_1 \lor φ_2\] where \(x \in Var\) is an element from a fixed set of variables.

A formula is satisfiable if there is an assignment of true or false to each variable such that the formula evaluates to true (\(φ_1 \lor φ_2\) is true if either \(φ_1\) or \(φ_2\) is true; \(φ_1 \land φ_2\) is true if both \(φ_1\) and \(φ_2\) are true, and \(\lnot φ\) is true if \(φ\) is false). For example, \(\lnot(x \lor y)\) is satisfiable because we can choose \(x\) and \(y\) to both be false, but \(\lnot(x \lor \lnot x)\) is unsatisfiable, because \(x\) is either true or false.
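
The grammar and the definition of satisfiability can be sketched directly in code. The encoding below is an assumption made for this example: a formula is a nested tuple `("var", name)`, `("not", f)`, `("and", f, g)`, or `("or", f, g)`. The search simply tries all \(2^k\) assignments to the \(k\) variables, so it is not efficient.

```python
from itertools import product


def evaluate(formula, assignment):
    """Evaluate a formula under a dict mapping variable names to booleans."""
    tag = formula[0]
    if tag == "var":
        return assignment[formula[1]]
    if tag == "not":
        return not evaluate(formula[1], assignment)
    if tag == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if tag == "or":
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    raise ValueError(f"unknown connective: {tag!r}")


def variables(formula):
    """Collect the set of variable names that occur in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))


def satisfying_assignment(formula):
    """Try every assignment; return one that works, or None if unsatisfiable."""
    names = sorted(variables(formula))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if evaluate(formula, assignment):
            return assignment
    return None


x, y = ("var", "x"), ("var", "y")
f1 = ("not", ("or", x, y))            # satisfiable: x and y both false
f2 = ("not", ("or", x, ("not", x)))   # unsatisfiable
```

On the two examples from the text, the search finds the assignment with both variables false for `f1` and reports that `f2` has none.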

A formula is 3-CNF if it is made by and-ing together some number of clauses, where each clause is made of three variables or negated variables or-ed together. For example: \[φ = (x_1 \lor \lnot x_2 \lor x_3) \land (\lnot x_4 \lor x_2 \lor x_3) \land (\lnot x_1 \lor x_5 \lor x_2) \land (\lnot x_5 \lor x_4 \lor x_1)\] has 4 clauses, each of which has 3 terms (either \(x\) or \(\lnot x\)).
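
A 3-CNF formula has a convenient flat representation. The sketch below assumes a signed-integer encoding (a convention chosen for this example, not one fixed by the notes): the integer \(j\) stands for \(x_j\) and \(-j\) stands for \(\lnot x_j\), so a formula is a list of 3-tuples.

```python
# The example formula from the text, one tuple per clause.
phi = [(1, -2, 3), (-4, 2, 3), (-1, 5, 2), (-5, 4, 1)]


def is_3cnf(cnf):
    """Every clause must contain exactly three terms."""
    return all(len(clause) == 3 for clause in cnf)


def term_is_true(term, assignment):
    """A positive term needs its variable true; a negative one needs it false."""
    return assignment[abs(term)] == (term > 0)


def satisfies(cnf, assignment):
    """An assignment satisfies the formula if every clause has a true term."""
    return all(any(term_is_true(t, assignment) for t in clause)
               for clause in cnf)


all_true = {j: True for j in range(1, 6)}
# x_2 true and everything else false makes the first clause false.
bad = {1: False, 2: True, 3: False, 4: False, 5: False}
```

Setting every variable to true satisfies `phi`, since each clause contains at least one un-negated variable; the assignment `bad` falsifies the first clause.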

Converting 3CNF formulas to graphs

We want to show how to convert a 3CNF formula \(φ\) into a directed graph \(G\) in such a way that a Hamiltonian cycle of \(G\) corresponds to a satisfying assignment to \(φ\).

Given \(φ\), we produce a graph as follows. For each variable, we create a row (a chain) of vertices with edges between adjacent vertices in both directions. There is one pair of edges for each clause (so that there are \(n+1\) vertices in the chain if there are \(n\) clauses).

Our intent is that a Hamiltonian cycle will traverse the chain from left to right if the corresponding variable is true in a satisfying assignment, and will traverse from right to left if the variable is false.

In addition, we create one vertex \(c_i\) for each clause. If \(x_j\) occurs in clause \(i\), we add an edge from the \(i\)th vertex in \(x_j\)'s chain to \(c_i\), and an edge from \(c_i\) to the \(i+1\)st vertex of \(x_j\)'s chain. If \(\lnot x_j\) occurs, we add those edges in the other direction. That way, if \(x_j\) is true, a Hamiltonian cycle traversing the chain from left to right can take a detour through \(c_i\) if necessary.

We then add edges from each end of the chain for \(x_i\) to each end of the chain for \(x_{i+1}\). We add a start and an end vertex, connecting the start vertex to each end of the first variable's chain and connecting each end of the last variable's chain to the end vertex. Finally, we add an edge from the end vertex back to the start vertex.
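
The construction just described can be written out as a short program. This is a sketch under several assumptions that are not in the notes: clauses are tuples of signed integers (\(j\) for \(x_j\), \(-j\) for \(\lnot x_j\)), variables are numbered from 1, clauses and chain positions are numbered from 0, and vertices are named `("x", j, p)`, `("c", i)`, `"start"`, and `"end"`.

```python
def build_graph(cnf, num_vars):
    """Return the directed edge set of the graph built from a 3CNF formula."""
    n = len(cnf)
    edges = set()
    # One chain per variable: n + 1 vertices, adjacent ones joined both ways.
    for j in range(1, num_vars + 1):
        for i in range(n):
            edges.add((("x", j, i), ("x", j, i + 1)))
            edges.add((("x", j, i + 1), ("x", j, i)))
    # One vertex per clause, reachable by a detour from each chain it mentions.
    for i, clause in enumerate(cnf):
        for term in clause:
            left, right = ("x", abs(term), i), ("x", abs(term), i + 1)
            if term > 0:   # x_j: detour while walking left to right
                edges.add((left, ("c", i)))
                edges.add((("c", i), right))
            else:          # not x_j: detour while walking right to left
                edges.add((right, ("c", i)))
                edges.add((("c", i), left))
    # Each end of one chain connects to each end of the next chain.
    for j in range(1, num_vars):
        for a in (0, n):
            for b in (0, n):
                edges.add((("x", j, a), ("x", j + 1, b)))
    # Start and end vertices, plus the edge that closes the cycle.
    for a in (0, n):
        edges.add(("start", ("x", 1, a)))
        edges.add((("x", num_vars, a), "end"))
    edges.add(("end", "start"))
    return edges


# The example formula from the text (5 variables, 4 clauses).
phi = [(1, -2, 3), (-4, 2, 3), (-1, 5, 2), (-5, 4, 1)]
graph = build_graph(phi, 5)
```

For the example formula this produces \(5 \cdot 5\) chain vertices, 4 clause vertices, and the start and end vertices, 31 in total.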

We must check the following:

If \(φ\) has a satisfying assignment, then \(G\) has a Hamiltonian cycle.
- Given a satisfying assignment, we construct a Hamiltonian cycle as follows. We start at the start vertex. We then move to the correct end of the first variable's chain: the left end if the variable is true, the right if it is false. We then walk through to the other end, then on to the correct end of the second variable's chain, and so on. Finally, we step to the end vertex, and then back to the start vertex to close the cycle. For each clause, we choose one of the variables or negations that evaluates to true, and take a detour from the corresponding chain to visit the vertex for the clause.
If \(G\) has a Hamiltonian cycle, then \(φ\) has a satisfying assignment.
- Any Hamiltonian cycle can be turned into a satisfying assignment. It must traverse each chain in one direction or the other; we set \(x_i\) to be true if the cycle traverses its chain to the right, and false otherwise. Then each clause must evaluate to true, because the Hamiltonian cycle passes through the clause vertex, so the direction of at least one of the clause's variables must be compatible.
The graph can be constructed efficiently.
- The size of the graph is proportional to the number of variables times the number of clauses; each of these is bounded by the length of the formula.
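
The first of these checks can be tested on small examples. The sketch below rebuilds the graph and then constructs the cycle from a satisfying assignment, following the recipe above. It rests on assumed conventions that are not in the notes: clauses are tuples of signed integers (\(j\) for \(x_j\), \(-j\) for \(\lnot x_j\)), variables are numbered from 1, clauses from 0, and all function and vertex names are invented for this example.

```python
def build_graph(cnf, num_vars):
    """Directed edge set of the reduction graph described in the text."""
    n, E = len(cnf), set()
    for j in range(1, num_vars + 1):             # two-way chain edges
        for i in range(n):
            E |= {(("x", j, i), ("x", j, i + 1)), (("x", j, i + 1), ("x", j, i))}
    for i, clause in enumerate(cnf):             # detours through clause vertices
        for term in clause:
            left, right = ("x", abs(term), i), ("x", abs(term), i + 1)
            src, dst = (left, right) if term > 0 else (right, left)
            E |= {(src, ("c", i)), (("c", i), dst)}
    for j in range(1, num_vars):                 # links between consecutive chains
        E |= {(("x", j, a), ("x", j + 1, b)) for a in (0, n) for b in (0, n)}
    E |= {("start", ("x", 1, a)) for a in (0, n)}
    E |= {(("x", num_vars, a), "end") for a in (0, n)}
    E.add(("end", "start"))
    return E


def cycle_from_assignment(cnf, num_vars, assignment):
    """Build the cycle for an assignment that is assumed to satisfy cnf."""
    n = len(cnf)
    detours = set()  # (variable, clause index) pairs where a chain detours
    for i, clause in enumerate(cnf):
        # Pick the first term of the clause that is true under the assignment.
        term = next(t for t in clause if assignment[abs(t)] == (t > 0))
        detours.add((abs(term), i))
    cycle = ["start"]
    for j in range(1, num_vars + 1):
        # True: walk left to right; false: walk right to left.
        positions = range(n + 1) if assignment[j] else range(n, -1, -1)
        for p in positions:
            cycle.append(("x", j, p))
            i = p if assignment[j] else p - 1    # clause of the next chain step
            if 0 <= i < n and (j, i) in detours:
                cycle.append(("c", i))
    return cycle + ["end", "start"]


def is_hamiltonian_cycle(edges, cycle):
    """Check that cycle is closed, follows edges, and visits each vertex once."""
    vertices = {v for e in edges for v in e}
    return (cycle[0] == cycle[-1]
            and len(cycle) - 1 == len(vertices)
            and set(cycle) == vertices
            and all(step in edges for step in zip(cycle, cycle[1:])))


phi = [(1, -2, 3), (-4, 2, 3), (-1, 5, 2), (-5, 4, 1)]
assignment = {1: True, 2: False, 3: False, 4: False, 5: True}
cycle = cycle_from_assignment(phi, 5, assignment)
```

For this formula and assignment, `is_hamiltonian_cycle(build_graph(phi, 5), cycle)` holds. Checking a proposed cycle takes time roughly proportional to the size of the graph, in contrast to searching for one.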

NP-completeness

It turns out that there is a large class of seemingly hard problems that could all be solved efficiently if the 3CNF satisfiability problem could be solved efficiently. Nobody has been able to prove that any of these problems is efficiently solvable or that it is not, but it is widely believed that they are not efficiently solvable.

One such problem is factoring large integers (for example, the product of two large primes), so if you can efficiently find Hamiltonian cycles, you can use your algorithm to break RSA encryption.

We don't have time to give detailed definitions and proofs (take 4820 for details), but here is a summary:

(informal) Definition: A problem is in NP if it can be solved by a non-deterministic Turing machine in polynomial time. In other words, if you are allowed to magically guess the answer, then you can check it in polynomial time.

(informal) Definition: A problem \(P\) is NP-hard if a polynomial-time algorithm for solving \(P\) can be used as a subroutine to solve any \(NP\) problem efficiently.

(informal) Definition: A problem \(P\) is NP-complete if it is NP-hard and it is in NP.

Fact: the 3CNF satisfiability problem is NP-complete.

Corollary: The problem of finding Hamiltonian cycles is NP-complete.