due Oct 10 There is a reasonably clear strategy for designing a greedy algorithm, as follows. You could think of it as the strategy for dynamic programming + greedy choice. This can be seen to fit most greedy algorithms, except for the interval colouring algorithm. STATING IN MORE DETAIL: 1. We consider the job of the program to take a sequence of actions. 2. If we can arrange the actions in a certain order and suggest how the first decision is to be taken. 3. After this we must prove greedy choice, which means: there exists an optimal way of taking the decisions in which the first decision is taken in the way described in 2 above. 4. Then we must say how the remaining decisions should be taken, with proof. The standard way of doing this is to argue that the residual problem, i.e. the problem of taking the remaining decisions is of the same kind as the origingal problem. Then we can just recurse. The proof of greedy choice will tell us that our second decision is correct, and the third is correct, and so on. In what follows I will sketch the answers to the problems. 1. Problem 4 of KT. Let S be a sequence of length m and R a sequence of length n. Give an algorithm to decide if S is a subsequence of R that runs in time O(m+n). Note that S is a subsequence of R if S[i] appears at some R[j_i], where j_i < j_{i+1}, i.e. S[i+1] must appear in R after S[i], could be consecutive but need not be. SOLUTION: Although this is a decision problem, you cannot decide unless you match up each leter in S to some sequence of letters in R. So we treat the problem as requiring to decide the match, i.e. for i=1 to m, we should produce M(i) such that M(i) form an increasing sequence and R[M(i)] = S[i]. So our decisions are M(i). It is natural to make these decisions in order 1..m. Also, instead of asking for matching all m characters which may not be possible, we ask for maximizing the number Q such that characters 1..Q are matched. Greedy choice: M(1) = smallest i such that R[i] = S[1]. (You should state this very clearly. I will state why you should think of this, but you dont have to write this parenthesized part in your answer. If you choose to match S[1] with some later occurrence of it in R, then you are reducing the choice you have for matching subsequent letters in S. So go for this; of course the proof is needed.) Proof: Consider the longest possible match M', i.e. match with the largest Q. If M'(1) is also the index of the first match, i.e. M'(1)=M(1), then we are done, because M' itself is a solution in which the first match is according to our greedy strategy. But suppose M'(1) is the index of some later occurrence. In this case we will modify M'(1) and set it to the first occurrence. This still satisfies all conditions, i.e. M' is an increasing sequence, and R[M'(1)] = S[1]. So we are done. Optimal substructure/solution to the residual problem: The rest of solution contains M' values larger than M'(1), and these values specify matching of S[2..m]. So the residual problem is the instance S[2..m] and R[M'(1)+1..n]. You should prove this: if the new instance has a better solution, then we can get a better solution for the original instance... Time taken: T(m,n) = O(M'(1)) to find the first occurrence of S(1) + T(m-1,n-M'(1)). This solves to O(m+n). No code nor pseudocode need be given if you wrote as above. But be sure to define everything clearly. ----------- 2. Problem 6 KT. You are given times s_i, b_i, r_i required by participant i to swim a certain distance, bike a certain distance, run a certain distance in a triathlon race. It is required that there be at most one swimmer in the pool at anytime. Each participant must first swim, then bike, then run. In what order should you send the participants so that the race finishes as quickly as possible? SOLUTION: Note first that biking and riding are independent and can be done consecutively. So we can assume that there is only biking with time b_i+r_i. (Thinking of the greedy criterion: try just two contestants. Which is the better order? The pool use finishes at time s_1+s_2, after that it is better to have the smaller of b_1, b_2. We can conjecture that the same order will work in general.) GREEDY CHOICE: First schedule the participant having the largest b_i. PROOF: Suppose in some optimal solution there is participant j scheduled before i, with b_j <= b_i. Suppose the participants before j finish the pool use by time T. Then j finishes by T+s_j+b_j, and i finishes by T+s_j+s_i+b_i. If we exchange then i finishes by T+s_i+b_i and j by T+s_i+s_j+b_j. Now notice that both the new times are no larger than the old finishing time of i. So we can continue making the exchanges so that i appears first. RESIDUAL PROBLEM: If we just use the remaining contestants it does not work because we would schedule them to swim from time 0. So we add a starting time as a parameter (strenghthening the recursion). So the original problem has n participants who start at time 0. The residual problem has all except the contestant with the largest biking time, and start time is swim time of that contestant. Run time analysis: If you sort by decreasing biking time, just nlogn. Otherwise n^2 if you find max at each step. ------------ 3. Problem 15 KT. You are given intervals. Give an algorithm to pick the smallest number of intervals such as each unpicked interval overlaps with some picked interval. *************** Something is wrong in the solution I have given. Try to find it yourself before reading the correct solution at the end. *************** SOLUTION: (Thinking of the greedy criterion. Try examples. 2 intervals is not interesting. 3 intervals: you should not pick the leftmost one if it intersects the middle one. So clearly what you pick should be the one that intersects the interval ending first. middle one. Additionally it should intersect the maximum number of intervals?) GREEDY CHOICE: Let S be the set of intervals intersecting the earliest finishing interval F, including F. Of these, pick the interval I that has the largest finishing time. PROOF: Clearly some interval from S must be in the optimal solution, in order to intersect with F. Suppose in an optimal solution I is not picked, but some I' in S is. The finishing time of I' is <= finishing time of I. But this means that I will intersect every interval that I' intersects. So we can replace I' by I. RESIDUAL PROBLEM: All intervals starting to the left of I. RUNNING TIME: O(n^2) will suffice; with sorting and some care O(nlogn) should work. ---- ******************* The greedy choice is correct but the residual problem is not. As an example consider the interval set [1,3],[2,10],[9,20],[11,12],[13,14]. The solution above would give as answer [2,10],[11,12],[13,14], while [2,10],[9,20] suffices. When I stated what the residual problem was, I should have proved that the optimal solution to every problem would break in this manner. To solve this, we will generalize: the input consists of sets S,T where S denotes the intervals to intersect, and T the intervals that can be used to intersect. The initial call is S,S because the same set is to intersected by the same set. Greedy choice: I = interval from S having the earliest finishing time. Let T'=set of intervals from T intersecting S. Pick from T' the interval I which finishes last. Output I as being in the solution. S = S - {intervals intersecting with I}. Recurse on S' and T. Claims to be proved: 1. There exists an optimal solution containing interval I. (If not, exchange, as above). 2. I + solution to smaller problem has same number of intervals as any optmal solution Proof of 2: Consider an optimal solution O in which I is present. Let O'=O-I. O' must contain intervals from T, and O' must cover S'. In other words, the problem (S',T) is in fact the residual problem. But then because our greedy choice is correct, our algorithm is a correct recursive algorithm. -----