Topics: Reading: CLRS book. Also, these notes will be on-line. Traveling salesman problem Inapproximability result: general TSP Check minimization Approximability and reducibility ------ TSP: Input: a matrix d giving distances between n Points in a metric space, i.e. the distance satisfies symmetry, triangle inequality. Output: cycle passing through all points such that the total length of the edges in the cycle is smallest possible. NP-complete. Even for this special case. We will give a 2-approximation algorithm, i.e. give an algorithm that returns a tour of cost at most 2 times the optimal tour. Algorithm: 1. Find a minimum spanning tree T. 2. Start with a cycle given by the dfs traversal of T rooted at any vertex r. The cycle visits leaves once, and internal nodes several times. The cycle starts at the root and ends at the root. 3. Starting at the root scan the cycle. If you reach a vertex v which appeared earlier, short cut it, i.e. if the preceding vertex is u and the next w, then remove the edges u-v, v-w from the cycle and add u-w. Continue scanning from u. ---------- Example: Take points in the plane. Ensure that the mst is a two root cbt drawn naturally. Then the preorder cycle will cross, indicating it is not optimal. ---------- At the end each vertex remains on the cycle exactly once. So we have a proper cycle. We next prove that it has the right length. 1. The weight W of T <= weight of optimal cycle. 2. The cycle at the end of step 2 has weight 2W. 3. The cycle after step 3 has weight at most 2W. Proof of 1: Weight of an optimal cycle <= weight of the path obtained by removing any edge. <= weight of mst (because a the path is also a spanning tree) = T Proof of 2: Each edge is counted exactly twice. Proof of 3: Because of the triangle inequality, each short cutting step can only reduce the edge. Remark: The final cycle = vertices listed in preorder traversal of T. We go through the long process only to simplify the proof. So total time = m log n + m. ----------------------------------- Thm: If P != NP, then there does not exist an r-approximation algorithm for general (non-metric) TSP , for any number r >= 1. Proof: Define problem r-ATSP to take as input a weighted graph, and return a salesman tour of length at most r * OPT TSP. We will give a reduction from Hamiltonian Cycle to r-approximation. HC: G=(V,E). F(G){ Construct instance of the r-ATSP problem d(u,v) = 1 if (u,v) \in E. = rn+1 where n = number of vertices in G. return instance } G(G,c){ // c = cycle obtained by solving r-approx(d). if c consists of weight 1 edges, return true. else return false. } Proof of correctness: Suppose TSP has only weight 1 edges. Then the same cycle will be the HC solution. Else c has an heavy edge. So the weight of the cycle is at least rn+1. Since this is an r approximation, we know that rn+1 <= r * weight of optimal cycle. Thus weight of optimal cycle >= (rn+1)/r > n. But this means the weight 1 edges do not make an optimal cycle. Hence the original graph did not have a HC. Key idea: Gap inducement. If HC has no solution, then r-approx solution has very high value, as compared to when HC has a solution. ----- Exercise: use similar ideas and a reduction from dominating set to k center to show that metric k center cannot be approximated better than factor 2. ----- A fun problem: Check minimization Suppose n friends just received bills from some m vendors and want to settle them. Of course, each person could write a check that he/she owes to the each vendor. This might potentially cause nm checks to be written. So they try to see if this number can be reduced. Under certain circumstances this can clearly happen. For example, if person A,B both owe Rs 100 each to vendors P,Q, then A,B could write checks of Rs 200 respectively to P,Q, thereby settling the bills in 2 checks instead of 4. Clearly, there could be more complex situations where the number of checks can be reduced. (1) Show that the problem of minimizing the number of checks is NP-hard. (2) Give a simple 2 approximation. -- 2 approximation. First form the net amount $A_i$ that person $i$ owes, and $B_j$ the net amount that vendor j is owed. Now, each person must write at least one check and each vendor receive at least one check. Thus there are two lower bounds, m,n. Thus the combined lower bound is max(m,n). For this, consider any $i,j$ with $A_i, B_j>0$. Have i write a check of min(A_i,B_j) to j. As a result of this check, i pays up what he owes fully or j fully receives what he is owed. Thus in one check, at least one vendor or one person drops out. In otherwords, with each check, one of the two lower bounds drops by 1. Thus with m+n checks, both bounds will become 0, and the problem will have been solved. But m+n-1 <= 2 max(m,n). IMPORTANT IDEA: There may be multiple lower bounds. An optimal choice in a solution might reduce both bounds, so the optimal cost is at least max of the bounds. But it might be easy to find a choice that reduces at least one bound -- thus we will get cost = sum of the bounds. This might be a reasonable approximation algorithm. ------------------------------- Reducibility amongst problems vs. approximability. We know that IS, VC, Clique can be reduced to each other. VC has can be approximated within a factor of 2. Does it mean that the others can also be? The issue is similar to the following statements (1) Knowing how many marks I got in CS 601 is equivalent to knowing how many marks I lost. (2) Knowing my marks to within a factor 1.1 is equivalent to knowing how many I lost to within a factor 1.1 If I expect 90 marks with an uncertainty of 9, then I expect to lose 10 with an uncertainty of 9. But 9 over 90 is 10%, while 9 over 10 is nearly 100 %! So while the first statement is true, the second one is not. Thus while there maybe a relationship as far as exact values, there may be no relationship as far as (multiplicative) approximation. Similarly, we know that size of min VC = n - size of Max IS. But knowing one approximately does not help for the other. On the other hand, the size of a IS is exactly the same as the size of the clique for the complement graph. Hence to find an IS we could run a clique approximation on the complement, and would get a similar approximate answer. So reducibility by itself does not imply approximability. Some additional work will have to be done to exploit additional relationships. ------