Topic: NP-Completeness Quick survey of last time: Linear Time reduction A problem P LT-reduces to problem Q, if we can write procedures F,G such that (a) given an instance x of P, we can construct an instance y of Q using procedure F, and given a solution to y we can construct from it a solution to x using procedure G (b) F,G run in linear time. Polytime reduction: Replace "linear" by "polytime". Reductions are interesting because: Scientifically: Reduction relates two problems which on the surface might look very different. The relation does not depend on whether P or Q has an algorithm. It is more like "translation of problems" rather than solving problems. Engineering: 1. P <=LT Q. Q has n^k algorithm => P has n^k algorithm 2. P <=LT Q. P has no n^k algorithm => Q has no n^k algorithm. ------ Remark: You have to understand science in at least two ways. First there is the algebraic or the logical or even the legal way. In this you must make sure that every little part in the definition holds. Second there is the intuitive or geometric or artistic way. In this you must "see" what is happening. BOTH ARE IMPORTANT. We will soon see that there is lot of creativity and insight in reductions. But we must first build up the logical machinery for it. ---- A note: we will use n to denote the problem size. We might often use n to denote the number of vertices in the graph. Note that the size of a graph should really be the number of edges. However, if we are only worried about polynomial time, anything that is polynomial in m is also polynomial in n^2>m; and hence in n, the degree will at most double if expressed in terms of n. But it will stay polynomial. So basically, even if we are sloppy, it will not matter. ---- For much of this chapter, it is convenient to consider decision problems: a decision problem is one which has a YES/NO answer. This is because often many problems have a decision version which retain the essence of the original problem, and decision problems are simpler to study. Example: DECISION PROBLEM IS: Does a given graph have an independent set of size k? INPUT: G,k. OUTPUT: YES/NO. MIS: find a maximum independent set given a graph G. How are these problems related? CLAIM 0: Decision IS <=P to MIS : PROOF: F(G,k){Return G} G(G,k,answer){Return true if answer >= k} A more interesting relationship: CLAIM 1: If the decision IS can be solved in polytime, then we can find the size of the IS in polytime. Proof: Solve the decision problem for k=1..|V|. Stop when the answer becomes NO. Binary search is also possible. CLAIM 2: If A solves decision IS in polytime (say n^k), then in polytime we can find the maximum IS. Proof: MIS(G){ k = size of max ind set, found as in claim 1. G' = graph remaining after some vertex v and its incident edges are removed from G. (Not the neigbours of v) If(IS(G',k)) Return MIS(G'). // IS of G' is also IS of G. Else G" = graph after removing v and its neighbours and their incident edges from G. return {v} U MIS(G"). // v must be in the MIS of G. So the rest of the MIS must be a MIS // yof G". } Time analysis: T(n) = n*D(n) + T(n-1) + O(n) If D(n) = O(n^k), then this is O(n^{k+2}). Actually we dont need to find k in each recursion, so this is really O(n^{k+1}). OBSERVATIONS: 1. Apparently the decision problem is easier, but not really from the point of view of finding a polytime algorithm. 2. What we have done above is also a "reduction" in spirit -- we showed that if decision IS has polytime algorithm then so does MIS. But for this we had to solve several -- polynomial number of instances of the decision version. This kind of a reduction is called a COOK reduction. DEFINITION: Suppose an arbitrary instance of problem Y of length n can be solved using a polynomial number (in n) of computational steps, plus a polynomial number (in n) of calls to a procedure A that solves X. Then Y <=c X : Y (cook) reduces to X. Stephen Cook was the inventor of this notion. This was a part of his PhD in the 70s! The other definition is due to Karp who was his advisor! Ideas like this came from Leonid Levin of the then Soviet Union at just about the same time. ------ A comment about the model: When we say "using a polynomial number of computational steps" we must include the time required to copy the input to the procedure A and copyback the results generated by the procedure A." ------ THEOREM: If Q <=C R then Q <=K R. PROOF: The simpler reduction calls a subrouting for R only once which is certainly polytime in n. Thus 1,2 hold also for cook definition. -------- For the most part we will use the Karp definition, i.e. we transform the given instance into exactly one instance of the new problem. For all future discussion, we will mostly only worry about decision problems. -----**********------ Thm: CNF formula satisfiability <=P Independent set CNF SAT: Input: CNF formula. Boolean formula in product of sums (and of ors form). i.e. conjunction of clauses, where each clause is a disjunction of literals, and each literal is either a boolean variable or its negation e.g. (a+b'+c)(c'+d+b+a)(d'+c') Output: Is there an assignment of values to the variables such that the formula evaluates to true Proof: We will describe what F does. G will simply take the answer for y and return it as the answer for x. For each clause: put down a group of as many vertices as the literals in it. We assign the literals as the labels of the vertices. Then we put a complete graph between all vertices in each clause. For each variable x: we add an edge from every vertex labelled x to every vertex labelled x'. We set k = number of clauses. Have we constructed an instance of IS? Have we done this in polynomial time? Most important: correctness. We will prove that the answer to x is the same as the answer to y. Suppose the answer to x is yes. Then there exists a truth assignment such that at least one literal from each clause is true. We pick one such literal from each clause, and pick the vertex corresponding to the literals that we picked. Clearly we pick exactly one vertex from each group. We claim that this set of vertices picked, S, is independent and of size k. Size k is immediate because we picked one per group. Any two vertices picked cannot have an intergroup edge because we picked only one vertex in each group. Intra group edges connect a literal and its complement. But a literal and its complement cannot be both true, and hence both cannot be picked. Hence there cannot be an intra group edge between the vertices either. Suppose the answer to x is no. We want to prove that this implies that the answer to y is also a no. It is convenient, in this case, to prove the contrapositive. What is the contrapositive? "If the answer to y is yes, then the answer to x must also be yes." Suppose the answer to y is yes. Then there is a set of k independent vertices. Since each of the k groups is a clique, we must have at exactly one vertex in the independent set from each group. Next, if a selected vertex has label x, then no vertex with label x' is in the set, because there are edges connecting all such pairs. Now if x is in the set we set it to true, and set x' to false and vice versa. If neither x no x' is in the set ... ------ Vertex cover/Policemen-intersection problem reduces to facility location. VC: G,k. Is there a set of k vertices such that every edge is incident on one of these vertices? FL: Bipartite G=(clients, locations,E), k. E gives which clients can be served by which locations. Is there a set of k locations such that all clients will be served if a facility is build there? Proof: General idea: think of roads to be watched as clients. They demand the service of a facility, which in this case is a policeman. F(G,k){ Return clients = E, locations = V, edge from e to both endpoints incident on e, k} G(G,k,answer){Return answer} --------- 1. VC is a special case of FL: each client can be served by exactly two locations. In general we will say that Q is a special case of R if every instance of Q is also an instance of R. So it is not a surprise that Q reduces to R. 2. FL as we have defined above: "Set cover problem". Input: collection S = s1,...,sm, sets over 1..n. Integer k. Output: subcollection of S having only k sets such that their union is the same as the union of the sets in S. I will let you figure out how clients and locations in the FL problem correspond to si and their elements. -------------------------------------------------------------------------------------