Problem Statement
-----------------

        1. For the code given below, do the following:
                a. Identify the type of dependence in S1 (RAW, WAR, or
                   WAW) and state whether it is loop-carried or
                   loop-independent.
                b. Compute the vectorization factor.
                c. Predict whether the loop can be
                        i. vectorized
                        ii. parallelized
                
        2. Now vectorize and parallelize the code with GCC, and verify
           your predictions.
        
        3. In the code, for i=0, the value of A[2] is written into A[1].
           Once you work out the misalignment of both accesses, it is
           easy to see that neither access is aligned to the natural
           vector size boundary. Because this misalignment is known at
           compile time, GCC will try to align some of the data
           references. Inspect the generated dump and see how the
           misalignment is handled.
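
        The misalignment mentioned in step 3 can be worked out by hand
        or with a small helper such as the sketch below. It is only an
        illustration under stated assumptions: 16-byte vectors
        (VEC_BYTES), an explicitly aligned copy of A so that the
        offsets are reproducible, and the hypothetical helper names
        misalignment and peel_count_for_store. None of these come from
        GCC itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte (128-bit) vectors; adjust for your target. */
#define VEC_BYTES 16

/* Copy of the test array, forced onto a vector boundary (C11 _Alignas)
   so that the offsets computed below do not depend on the linker. */
_Alignas(VEC_BYTES) int A[256];

/* Byte offset of p from the previous VEC_BYTES boundary. */
static size_t misalignment(const void *p)
{
        assert((VEC_BYTES & (VEC_BYTES - 1)) == 0);  /* power of two */
        return (size_t)((uintptr_t)p % VEC_BYTES);
}

/* Number of leading scalar iterations after which the store to A[i+1]
   would start on a vector boundary. This models a peeling-style fix;
   whether GCC actually chooses it is for the dump to show. */
static size_t peel_count_for_store(void)
{
        size_t mis = misalignment(&A[1]);
        return mis ? (VEC_BYTES - mis) / sizeof(int) : 0;
}
```

        Calling misalignment(&A[1]) and misalignment(&A[2]) from main
        gives the byte offsets of the store and the load for i=0.
        Compare them, and the value of peel_count_for_store(), with
        what the vectorizer dump reports.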


Test case
---------

        int A[256];
        int main ()
        {
                int i;
                for (i=0; i<204; i++) {
                     A[i+1] = A[i+2];   /* S1 */
                }
                return 0;
        }
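
        To cross-check your answer to question 1, the sketch below runs
        the loop body in two orders on private copies of the data: the
        original scalar order, and a vector-style order in which a
        whole group of loads completes before the matching group of
        stores. The group size VF = 4 is an assumption (four ints per
        16-byte vector), and run_scalar, run_chunked and same_result
        are hypothetical helper names. A matching result on one input
        is evidence only, not a proof, and says nothing about
        parallelization.

```c
#include <assert.h>
#include <string.h>

#define N  256
#define VF 4    /* assumed vectorization factor */

/* Scalar reference: the loop from the test case, applied to array a. */
static void run_scalar(int *a, int trip)
{
        for (int i = 0; i < trip; i++)
                a[i + 1] = a[i + 2];    /* S1 */
}

/* Vector-style order: VF loads first, then VF stores.
   Iterations left over after the last full group run as scalar code. */
static void run_chunked(int *a, int trip)
{
        int i = 0;

        assert(trip + 2 <= N);          /* highest index used is trip+1 */
        for (; i + VF <= trip; i += VF) {
                int tmp[VF];
                memcpy(tmp, &a[i + 2], sizeof tmp);     /* "vector load"  */
                memcpy(&a[i + 1], tmp, sizeof tmp);     /* "vector store" */
        }
        for (; i < trip; i++)
                a[i + 1] = a[i + 2];
}

/* Returns 1 if both orders leave the array in the same final state. */
static int same_result(int trip)
{
        int x[N], y[N];

        for (int k = 0; k < N; k++)
                x[k] = y[k] = k;        /* distinct values expose reordering */
        run_scalar(x, trip);
        run_chunked(y, trip);
        return memcmp(x, y, sizeof x) == 0;
}
```

        Call same_result(204) from main and compare the outcome with
        your prediction for 1.c.i. Changing S1 (for example, swapping
        the two subscripts) and rerunning shows how the outcome depends
        on the direction of the dependence.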