We have seen many useful algorithms and data structures for solving computational problems. However, some problems are intractably hard in the sense that they would require too much time or space. Some problems are not solvable at all, no matter how much time or space is available.

P, NP, and EXPTIME

The problems generally considered tractable are those that can be solved in polynomial time, that is, in time O(n^k) for some constant k. In practice, however, algorithms with k larger than 1 can scale poorly on large inputs: algorithms with k≥5 are almost never used, and even k=3 and k=4 are often impractically slow.

We define the complexity class P as the set of all problems that can be solved by an algorithm taking polynomial time, that is, time O(n^k) for some constant k.

Beyond P is the class EXPTIME, which includes all problems that can be solved by an algorithm taking time O(2^{n^k}) for some constant k. Exponential-time algorithms effectively hit a wall when the problem size n becomes even moderately large. For example, if an algorithm takes time 2ⁿ, increasing the problem size from n to n+1 requires twice as much time. Even if the algorithm is fast for small n, as n grows, we quickly reach a size where even a small increase is far too expensive. Contrast this with the case of an O(n^k) algorithm, where going from n to n+1 means increasing the time only by a factor of roughly 1+k/n, which gets smaller as n increases. (To see why, note that ((n+1)/n)^k ≈ 1 + k/n when n ≫ k.)
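
To make the contrast concrete, the following sketch computes the growth factor T(n+1)/T(n) for an exponential cost 2ⁿ and for a polynomial cost n^k. The class and method names are assumptions made for illustration, and the values are ratios of the cost functions themselves, not measured running times.

```java
final class GrowthDemo {
    /** Ratio (n+1)^k / n^k for a polynomial cost n^k. */
    static double polyRatio(int n, int k) {
        return Math.pow((n + 1) / (double) n, k);
    }

    /** Ratio 2^(n+1) / 2^n for an exponential cost 2^n (valid while 2^(n+1) fits in a double). */
    static double expRatio(int n) {
        return Math.pow(2, n + 1) / Math.pow(2, n);
    }

    public static void main(String[] args) {
        int k = 3;
        for (int n : new int[] {10, 100, 1000}) {
            // The polynomial ratio approaches 1 as n grows; the exponential ratio stays at 2.
            System.out.printf("n=%d  poly ratio=%.4f  approx 1+k/n=%.4f  exp ratio=%.1f%n",
                    n, polyRatio(n, k), 1 + k / (double) n, expRatio(n));
        }
    }
}
```

The polynomial ratio shrinks toward 1 as n grows, while the exponential ratio stays fixed at 2 no matter how large n is.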

Another important class is nondeterministic polynomial time, or NP. These are the problems for which a potential solution can be checked in polynomial time. It may be intractably difficult to discover a solution to a problem in NP, but once we are given a candidate solution, we can check in (deterministic) polynomial time whether it is indeed a solution.
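
As an illustration of polynomial-time checking, consider graph coloring: given a graph G and a number k, can each vertex be assigned one of k colors so that no edge joins two vertices of the same color? The sketch below checks a candidate coloring in time proportional to the number of vertices plus the number of edges. The edge-list representation and all names are assumptions made for the example.

```java
final class ColoringVerifier {
    /**
     * Checks whether colors is a valid k-coloring of the graph whose vertices
     * are 0..colors.length-1 and whose undirected edges are given as pairs.
     * Runs in O(n + m) time for n vertices and m edges.
     */
    static boolean isValidColoring(int[][] edges, int[] colors, int k) {
        for (int c : colors) {
            if (c < 0 || c >= k) return false;               // color out of range
        }
        for (int[] e : edges) {
            if (colors[e[0]] == colors[e[1]]) return false;  // both endpoints share a color
        }
        return true;
    }
}
```

Checking is cheap here; what may be expensive is finding a coloring that passes the check.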

These are three famous problems in NP. All of them can be solved by exponential-time algorithms that essentially try all exponentially many possible solutions. But this approach is infeasible for all but very small instances of the problem. It is not known whether polynomial-time algorithms for these problems exist, although by now most computer scientists believe they do not.
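
The exhaustive approach can be sketched for graph coloring (deciding whether the vertices of a graph can be colored with k colors so that adjacent vertices differ). The method below steps through all k^n assignments of colors to n vertices, treating the assignment as a base-k counter. The representation and names are assumptions made for illustration; this is a minimal sketch of brute-force search, not a practical solver.

```java
final class BruteForceColoring {
    /** Returns some valid k-coloring of vertices 0..n-1, or null if none exists. */
    static int[] findColoring(int n, int[][] edges, int k) {
        int[] colors = new int[n];            // start with every vertex colored 0
        if (k <= 0) return n == 0 ? colors : null;
        while (true) {
            if (isValid(edges, colors)) return colors;
            // Advance to the next assignment, like incrementing a base-k number.
            int i = 0;
            while (i < n && colors[i] == k - 1) {
                colors[i] = 0;
                i++;
            }
            if (i == n) return null;          // counter wrapped around: all k^n assignments tried
            colors[i]++;
        }
    }

    /** True if no edge joins two vertices of the same color. */
    private static boolean isValid(int[][] edges, int[] colors) {
        for (int[] e : edges) {
            if (colors[e[0]] == colors[e[1]]) return false;
        }
        return true;
    }
}
```

With k = 3 and only 40 vertices there are already 3^40, or about 10^19, assignments to try in the worst case.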

NP-completeness and the P=NP question

Each of the three problems above has the interesting property that any problem in NP can be reduced to it (encoded in it) in polynomial time. For example, if a computational problem A is in NP, then there is a polynomial-time translation that, given an instance x of A, transforms x into an instance (G,k) of the graph coloring problem such that a valid k-coloring of G could be transformed back to a solution of x.

Because they are themselves in NP and can express any problem in NP, these problems are said to be NP-complete. If we had a polynomial-time algorithm to solve an NP-complete problem, we could solve any problem in NP in polynomial time! This would mean that the complexity classes P and NP are the same.

Most computer scientists believe that P and NP are not the same, and hence that no NP-complete problem can be solved in worst-case polynomial time. However, no one has managed to prove that the two classes are different. The P=NP question has intrigued and stymied researchers for decades. It is widely regarded as the most important unsolved problem in computer science.

Space complexity

It is also possible to classify problems in terms of the memory space needed to solve them. Problems in PSPACE can be solved using a polynomial amount of space. Problems in L can be solved using only a logarithmic amount of space in addition to the read-only input data; in effect, an algorithm for such a problem can use a constant number of pointers into the input data. A surprising result, proved by Reingold in 2005, is that undirected graph reachability is in L.
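
As a sketch of what "a constant number of pointers into the input" can look like, the following method decides whether a read-only input string is a palindrome using only two indices. On an input of length n, each index needs only O(log n) bits, so the working space is logarithmic. (Java's int has a fixed width, so this illustrates the idea rather than the formal machine model.) The class and method names are assumptions made for the example.

```java
final class LogSpacePalindrome {
    /** Decides whether input reads the same forwards and backwards. */
    static boolean isPalindrome(String input) {
        int left = 0;                       // pointer moving right from the start
        int right = input.length() - 1;     // pointer moving left from the end
        while (left < right) {
            if (input.charAt(left) != input.charAt(right)) return false;
            left++;
            right--;
        }
        return true;                        // the pointers met without finding a mismatch
    }
}
```

The input is never copied or modified; the only working memory is the two indices.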

Space classes have nondeterministic versions too. A surprising result, proved by Savitch in 1970 and known as Savitch's theorem, implies that PSPACE = NPSPACE; that is, for a particular problem, if there is a polynomial-space algorithm to check whether a potential solution is indeed a solution, then there is a polynomial-space algorithm to find solutions.

Some relationships among complexity classes have been proved, such as the following inclusion relationships:

L ⊆ P ⊆ NP ⊆ PSPACE ⊆ EXPTIME

It is also known that L≠PSPACE and that P≠EXPTIME. However, many important things are not known. For example, because L≠PSPACE, we know that at least one of the inequalities L≠P, P≠NP, and NP≠PSPACE must hold, but we don't know which.

The complexity of some important problems is not known either. For example, even though the security of RSA encryption rests on the difficulty of factoring numbers, it is not known whether factoring is in P. (However, it is known that factoring can in principle be solved in polynomial time on a quantum computer, though no one has yet been able to build a quantum computer large enough to factor numbers of practical size.) On the other hand, testing primality, that is, deciding whether a given integer is prime or composite, is known to be in P.

Undecidability

Beyond the problems mentioned above, there are even computational problems that cannot be solved at all, even in principle, and even with unlimited time and space. We can prove mathematically that such problems exist. We will focus on decision problems where the goal is to decide whether a given property holds of some input, and we will show that some decision problems are undecidable by any algorithm.

Undecidability of the halting problem

An example of such a decision problem is the halting problem: Given a program p and input x to that program, does p terminate when run on input x? We will see that the halting problem is undecidable in general. That is, there is no algorithm that halts and answers this question correctly for all programs and inputs.

We have seen in previous lectures and from the programming assignments that a program can be represented as a data structure such as an abstract syntax tree (AST) or bytecode. Let us assume there is a class Program that can be used to represent programs.

We want to know whether we can implement a method with the following specification:
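
A sketch of that specification, written as a Java method stub. The class name Halting, the field source, and the parameter type Object are assumptions made for illustration. The body only throws an exception, since the rest of this section argues that no correct body can be written.

```java
/** Hypothetical representation of a program, e.g. its source text, AST, or bytecode. */
final class Program {
    final String source;
    Program(String source) { this.source = source; }
}

final class Halting {
    /**
     * Specification: returns true if the program represented by p halts when
     * run on input x, and false if it runs forever. The method itself must
     * halt and answer correctly for every p and every x.
     */
    static boolean terminates(Program p, Object x) {
        // Placeholder only: the argument in this section shows no body can meet the specification.
        throw new UnsupportedOperationException("no correct implementation exists");
    }
}
```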

That is, for every p and input x, it should successfully return either true or false depending on whether the program represented by p halts on input x. Note that although terminates is just one method, it is allowed to use as many other classes and methods as it likes. We have the full power of Java at our disposal.

For simplicity let us consider only Java programs that implement a decision problem and have the following form:
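
A sketch of that form, using a hypothetical program P that decides whether its input is a string of even length. Only the shape matters: a class with a single static method main that takes one input and returns a boolean. The input type Object is an assumption, chosen to match the cast described later in the argument.

```java
class P {
    /** The whole program: one input x, one boolean result, no other input or output. */
    static boolean main(Object x) {
        if (!(x instanceof String)) return false;   // reject inputs of the wrong kind
        String s = (String) x;
        return s.length() % 2 == 0;                  // the decision this program computes
    }
}
```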

If p is an instance of Program that represents the Java program P, then terminates(p,x) should return true if P.main(x) halts and false if it does not.

The method main is allowed to use other classes and methods. However, we will only consider programs that receive no input from and send no output to the outside environment. The only input to the program is x and the only output is the boolean result of main. If we can't determine whether such simple programs terminate, then of course we have no hope of determining whether more complex programs do.

We have seen that it is possible to write an interpreter for a programming language. An interpreter for programs like P (as coded by the Program object p) can be written with the following signature:
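
The sketch below gives one plausible signature for such an interpreter, together with the program H analyzed in the next paragraphs. H is built on the assumption, made for the sake of contradiction, that terminates can be implemented. This is a reconstruction from the line-by-line argument that follows, not a definitive listing: the class Oracle, the field source, the stub bodies, and the values returned in lines 3 and 5 are assumptions. The trailing comments give the line numbers of H that the argument refers to.

```java
/** Hypothetical representation of a program, e.g. its source text, AST, or bytecode. */
final class Program {
    final String source;
    Program(String source) { this.source = source; }
}

/** Stubs for the two methods the argument relies on. */
class Oracle {
    /** Assumed to exist, for contradiction: true if and only if program p halts on input x. */
    static boolean terminates(Program p, Object x) {
        throw new UnsupportedOperationException("assumed to exist for the argument");
    }

    /**
     * Simulates program p on input x and returns whatever p's main returns.
     * Does not terminate if p does not terminate on x.
     */
    static boolean interpret(Program p, Object x) {
        throw new UnsupportedOperationException("interpreter omitted in this sketch");
    }
}

class H extends Oracle {                                  // line 1
    static boolean main(Object x) {                       // line 2
        if (!(x instanceof Program)) return false;        // line 3
        Program p = (Program) x;                          // line 4
        if (!terminates(p, p)) return false;              // line 5
        return !interpret(p, p);                          // line 6
    }
}
```

H asks whether the program it is given halts when run on its own representation and, if so, returns the opposite of what that program returns.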

Note that H.main terminates on any input x, because the only possibility for nontermination is in line 6 in the call to interpret(p,p), but we have already guaranteed that this will terminate by the call to terminates(p,p) in line 5. Therefore H.main(x) always returns either true or false.

But now let h be an instance of Program representing H, and consider what happens when we run H.main(h). As argued above, this must terminate. The program does not return in line 3, and the cast succeeds in line 4, because h is an instance of Program. The program does not return in line 5 because that would only happen if terminates(h,h) returned false, which would only happen if H.main(h) did not terminate, but it does. Thus we get all the way to line 6. But now observe that at line 6, the code returns the value !interpret(h, h), which, if the interpreter works correctly, is equal to !H.main(h). In other words, the result of H.main(h) is equal to !H.main(h). This is a contradiction:

H.main(h) returns true ⟺ interpret(h,h) returns false ⟺ H.main(h) returns false.

This contradiction means that our original assumption, that we can test halting in line 5, must be wrong. More precisely, in a language expressive enough to implement interpret, we cannot implement terminates to work correctly on all input programs. We know how to implement interpreters for programming languages (including Java) in Java; we did it for a much simpler language in Assignment 5, but the principle is the same. So the erroneous assumption was that we could decide termination. The method terminates cannot exist.

The conclusion is that some useful results are simply not computable by any algorithm.

Implications for program analysis

This result may sound obscure but it has some far-reaching practical implications. In particular, many different program analyses besides termination are also undecidable. For example, we cannot reliably identify dead code (code that cannot be reached), or tell if a given program will ever generate a NullPointerException, or tell if a run-time type error will occur. These problems are all provably undecidable.

To see why they are undecidable, suppose that we had an analysis tool that could always tell at compile time whether a program could generate a NullPointerException. We could then use that tool to figure out whether arbitrary code terminates, by constructing code like the following:
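
A sketch of such a construction. The loop shown here, an iteration of the Collatz rule, is a hypothetical stand-in for arbitrary code whose termination we want to know; the class and method names are likewise assumptions.

```java
final class NpeReduction {
    static void run(long n) {
        // Stand-in for arbitrary code: it is not obvious for which n this loop terminates.
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        }
        // This point is reached only if the loop above terminates.
        Object o = null;
        o.hashCode();   // dereferencing null always throws NullPointerException
    }
}
```

For example, run(6) leaves the loop after a few iterations and then throws the exception, whereas a call on which the loop runs forever never reaches the dereference.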

Assuming the body of the while loop does not generate any NullPointerExceptions, this code generates a NullPointerException if and only if the while loop terminates. If we had an analysis tool that could predict NullPointerExceptions at compile time, we could use it to determine whether the while loop terminates. But we know that termination is undecidable, so predicting NullPointerExceptions is impossible in general.

As a result, all “interesting” program analyses must be conservative, giving answers “true”, “false” or “not sure”. Type checking is another example of such an analysis. A compiler type-checks programs conservatively by only allowing programs that (if the type system is sound) definitely have no run-time type errors. However, a type checker will sometimes reject programs as ill-typed even though they would not cause run-time type errors.

By similar arguments, we see that we have to be conservative about many other facts we'd like to know about programs, for example, whether they are correct, whether they are secure, or whether they leak memory. All automatic tools for analyzing programs will either be incomplete, meaning that they reject some safe programs as possibly unsafe (this is a false positive or a false alarm), or else unsound, meaning that they accept some unsafe programs as safe (this is a false negative). Despite these limitations, software developers use incomplete and even unsound automatic tools all the time, because such tools are still useful in practice.
