reset("0","0","1000","780","A Simple Example of Algorithm Design/Analysis") B = TextPanel("5","5","500","515","black","white","red") //B.setFontSize("H1, H2, H3, H4","25") //B.setFontLarge() S = TextPanel("5","525","500","160","black","white","red") B2 = TextPanel ("510","5","475","675","black","white","blue") B2s = TextPanel ("510","5","475","375","black","white","blue") \macro(Sum,∑) \macro(le,≤) \macro(ge,≥) \macro(ne,≠) \macro(times,∏) //\macro(Omega,ω) \macro(Sigma,Σ) \macro(cdot,·) \macro(times,×) \macro(infty,∞) \macro(cap,∩) \macro(sp,         ) B2s.hide() S! In this lecture our main goal will be to give one more example of algorithm analysis. B!

Outline: S! Here is the problem for which we will design an algorithm from specification: B!

Maximum subsequence: Input: Sequence X[1..n]
Output: Integers i,j such that X[i]+X[i+1]+...+X[j] is largest possible over all choices of i,j. \scene(Maximum Subsequence Problem) S! This problem is not particularly important in itself. However, it turns out that important problems (which are more complicated to state right now) have somewhat similar structure. S:: Also, there is a range of algorithms possible for this problem, starting with the most obvious and slowest to one which is very clever and fast. So later you will be able to compare how different algorithm design strategies will work on the same problem. S! We now describe "Design from specification". B2!

Design from specification \scene(Design from specification) S! The first step is to identify what it is we need to output. B2-
  1. Identify what is desired: S! In our case the specification clearly mentions that two integers, i,j need to be output. B.H("ij","red") S! The second step is to find what conditions the desired output must satisfy. B2-
  2. Identify the conditions to be satisfied by the output. S! This in our case is: B.U("ij") B.H("max","red") S! The final step is.. B2-
  3. Systematically consider all possible candidates and check if they satisfy the conditions.
S! Clearly, i,j must be between 1 and n, with i \le j. S:: So there are n choices for i, and n-i+1 choices for j for each choice for i. B.U("max") S:: So we have the following code. \scene(Algorithm) B:: Maxsub(X[1..n]){
result = Sum(X[1..n])
besti=1, bestj=n
for i= 1 to n
\sp for j=i to n
\sp \sp if result < Sum(X[i..j]) then besti=i, bestj=j
\sp end

end

return besti,bestj

}

Sum(Y[1..m]){
sum = 0
for k=1 to n
\sp sum = sum + Y[k]
end
return sum
} S! This code shows a few of the conventions we will use in writing code in this course. B2-

Code conventions: \scene(Algorithm writing conventions) S! Let us first start with the procedure Sum. This takes as argument an array Y. B.H(y1m,red) B2-
  • A procedure parameter of the form Y[1..m] denotes that Y is some kind of set, with m elements, accessed as Y[1], Y[2], ... Y[m]. S! Note that while in this program we are using Y as an array, in other programs we might use it to denote other data structures. What data structure is being described will become clear from the context. S:: Note that this level of sloppiness is allowed since an algorithm is intended to be read by a human being, and not a machine. S:: A human being might often prefer to think about Y as just a set, without worrying about how it is stored -- it might be preferable to worry about storage considerations sometime later. S! In fact for the same reason .. B2-
  • We will rarely have type declarations. Instead we will rely on the names of objects to suggest their type. S! The second important point is that the description Sum(Y[1..m]) not only defines Y as an argument, but defines m to be its length. S! B2-
  • An argument to a procedure call of the form X[i..j] -- B.H(xij,yellow) B2- indicates that a subset (or subarray) of X from the ith component to the jth component is being passed. S! Note however, that in the body of Sum, the ith element of X will become the first element of Y. Similarly, the array Y (or list, or whatever) will have length m=j-i+1 in the body of Sum. S! We will assume that the when an array is passed in this manner, no copying will have to be done. This is justified because arrays are allocated in a contiguous manner, and only the starting index of the array and the length is passed. S! If a list is being passed, then the analysis will have to be more complicated, since getting to the ith element of the list will take time. We will discuss this explicitly when needed. B.U("xij,y1m") B2-
  • A procedure might return multiple values, e.g. Maxsub returns besti,bestj. B.H("bestij",red) B2- This will of course increase the time required for executing the return statement -- we said earlier that returning one value takes 1 step, so returning k will take k steps. S! Again, if one of the values being returned is a part of a list, then the time will have to include the time to extract that part. B2- In the calling procedure multiple values might be retrieved as "a,b = Maxsub(X[1..n])". B.U("bestij") S! Usually, our conventions will be quite self-explanatory. They will often be based on ideas from standard languages, e.g. the multiple values idea is from the Matlab language. However, we will occasionally innovate. S! The important point to note is: algorithm \ne program. An algorithm will hide some details, and be a bit sloppy since that works better for human readers. S! In fact we may not always write down any code -- we might simply describe the algorithms in English, so long as what code to write and how much time it needs to execute is clear. S! Now let us take a few minutes to pursuade ourselves that this algorithm works correctly. B2! \scene(Correctness) S! Note first that Sum(array) returns the sum of the elements in the array. S:: Thus when we call it as this -- B.H("xij",red) S- the sum X[i]+X[i+1]+...+X[j] will be computed. S:: Maxsub will compute all the possible sums considering all possible values of i,j, and the maximum sum will be retained in the variable "result", and the best i,j values will be retained in besti, bestj. S! Clearly the algorithm is correct. B.U(xij) S! Now we analyze the time taken. B2! \scene(Analysis) S! In this analysis we will use upper case letters A,B,... to denote some unknown positive numbers which remain the same no matter what the length of X is. We will typically say "for some constant A" -- which will simply mean "for some number A which does not depend upon the problem size, (n in the case of procedure Maxsub and m for Sum)" S! First we analyze the procedure Sum. B2!

    Time taken by Sum For an array of length m, Sum requires time Am+B, for some positive constants A, B. S! Clearly, each iteration will take some constant number of steps. Since there will be m iterations, and a few additional instructions, the time should be of the form Am+B. S! Now we turn to the analysis of Maxsub(X[1..n]). Even in this simple algorithm, the presence of the if statement -- B.H(stmt,red) S- makes the time taken dependent on the precise values of X. B.U(stmt) S:: So getting a closed form expression which is just a function of n is not possible. S:: However, it is relatively easy to get upper and lower bounds separately. S! So we start by estimating an upper bound. B2-

    Upper bound on time of Maxsub(X[1..n]) \scene(Upper bound) S! We begin by estimating the time taken by the statement in the innermost loop. B.H(stmt,yellow) B2- Time for statement in inner loop = A(j-i+1)+B+B' S! The length of the array passed is j-i+1. Hence the time for the call Sum(X[i..j]) is A(j-i+1)+B. Some additional time B' is required to actually construct the arguments to Sum (without copying array X, but just forming the address of X[i]) check the condition, make the assignment to besti, bestj if needed. B2- \le Cn+D S! Clearly n \ge j-i+1, and hence this holds for some C,D. S:: It might seem that we are giving up a lot in this approximation, and hence our bound will be loose; however you will see shortly that such approximations, which we will continue to make, will not hurt us much. B2:: Time for inner loop S! B.U(stmt) B.H(innerloop,yellow) B2- for fixed i is: S! Each iteration takes at most Cn+D. There are n-i-1 iterations. So the time will be of the form (Cn+D)(n-i-1)+E. This can be upper bounded as Cn2+Dn+E. B2- Cn2+Dn+E. S! B2:: Time for the outer loop B.U(innerloop) B.H(outerloop,yellow) B2- \le Cn3+Dn2+E'n+F S! This is clearly true, since the loop with index i runs n times. In each iteration, there are some looping back instructions, so the term proportional to n will not be E but some E'; and there will be some additional overhead of checking which will cause the term F. B2:: Time for entire procedure: S! B.U(outerloop) B.H(all,yellow) S! The very first computation of result will also take time some Gn+H, and there are some additional instructions to initialize besti,bestj initially. Hence only the term proportional to n and the constant term will increase. B2- \le Cn3+Dn2+E''n+F' B.U(all) S! In the previous lecture, we said that the precise values of C,D,E'',F' will be different for different computers. However, for all computers (including our RAM model) we can claim that the time is at most cubic. B: Time is at most cubic in n. S! We next estimate a lower bound. B2:

    Lower bound on time of Maxsub(X[1..n]) \scene(Lower bound) S! We will start again with the statement in the innermost loop. B.H(stmt,yellow) B2- Time for statement in inner loop \ge A(j-i+1) S! Just the time taken for the body of the subroutine call Sum(X[i..j]) will be A(j-i+1)+B as mentioned earlier. S:: There will be some additional time to check whether the result is smaller and so on. But we are only asserting "\ge". S! B2:: Time for inner loop: B.U(stmt) B.H(innerloop,yellow) B2: \ge \SumjA(j-i+1) S! Here the sum is over j ranging from i to n. B2: = A + 2A + 3A + ... + (n-i+1)A S! We can evaluate this sum exactly, however since we only want an approximate answer, we can save ourselves some algebra. S! The first trick is to ignore the first (n-i+1)/2 terms in the sum. B2: \ge A(n-i+1)/2 + A((n-i+1)/2+1) + ... (n-i+1)A S! Here we are ignoring the possibility that n-i+1 could be odd -- but you can get something similar even if it is odd. This is left as an exercise. S! The second term is to observe that each of these terms is at least A(n-i+1)/2 B2: \ge A(n-i+1)/2 + A(n-i+1)/2 + ... A(n-i+1)/2 S- and there are (n-i+1)/2 of them. S:: So we have.. B2: \ge (n-i+1)2A/4 S! We will write A/4 = K to simplify algebra. B2: = (n-i+1)2K S! B2:: Time for outerloop: B.U(innerloop) B.H(outerloop,yellow) B2: \Sumi K(n-i+1)2 S! Here i goes from 1 to n. B2: = K(n)2 + K(n-1)2 + ... + K(1)2 S! This time we ignore the last n/2 terms. B2: \ge K(n)2 + K(n-1)2 + ... + K(n/2)2 S! But now we have n/2 terms each at least K(n/2)2. Thus we get: B2: \ge Kn3/4 S! In other words, we have proved that .. B: Time is at least cubic in n. S! This finishes the analysis. S! In the next lecture we will develop some notation which will simplify the algebra a little bit. S! Summarizing.. B2!

    Summary
    • Design by specification.
    • Another example of algorithm analysis. B2!

      Exercises
      1. Procedure Maxsub separately calculates both Sum(X[i..j]) and Sum(X[i..j+1]. But clearly, once you have calculated Sum(X[i..j]) you should be able to compute Sum(X[i..j+1]) just by adding X[j+1] to it -- which is just one extra addition rather than the j-i+2 additions needed to calculate Sum(X[i..j+1]) separately. Can you use this idea to get a faster algorithm? You should be able to get an algorithm that takes quadratic time rather than cubic. (By the way, even faster algorithms are possible.)
      2. Analyze the time taken by insertion sort.
      \scene(End)