Solution
--------

1. a. S1 : a[i] = a[i-3];

   Instantiating the statement instances for S1:

       when i is 3, a[3] = a[0];
       when i is 6, a[6] = a[3];

   The value is first written into a[3] and then read 3 iterations later. The dependence is therefore Read after Write (true dependence), and the dependence distance is 3. Since the array a is of type int (4 bytes) and the vector register is 16 bytes wide, the vectorization factor is 16/4 = 4.

   b. The code cannot be vectorized because the write to a[i] is the source of the value later read as a[i-3], and within S1 that write executes lexically after the read. With a dependence distance of 3, smaller than the vectorization factor of 4, a vector of 4 iterations would read a[i-3] before an earlier iteration in the same vector has written it. In the dump file, the vectorization decision is given as:

       test.c:5: note: dependence distance = 3.
       test.c:5: note: bad data dependence.
       test.c:2: note: vectorized 0 loops in function.

The dependence is loop carried, and therefore the code cannot be parallelized. This is reported in the dump file as:

       distance_vector: 3
       direction_vector: +
       FAILED: data dependencies exist across iterations

3. a. The current dependence distance in S1 is 3. If we increase the dependence distance beyond the vectorization factor, we can safely vectorize the loop. For example, if we change S1 to a[i] = a[i-5];, the code can be vectorized.

   b. Executing 4 iterations as a single vector operation makes the read vector and the write vector touch the same element, which prohibits vectorization. If the dependence distance is greater than the vectorization factor, no element is accessed by both the read vector and the write vector of the same vector iteration, and vectorization is then safe. The vector register is 16 bytes wide. If we change the 'int' data type to 'long int', which occupies 8 bytes, the vectorization factor drops to 16/8 = 2. The dependence distance of 3 now exceeds the vectorization factor, so the loop can be safely vectorized.
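
The argument in 3 can be checked with a small experiment. The sketch below is not what the compiler does internally. It is a hand-written model, under the assumption that one vector iteration loads all of its source elements before it stores any result. The names vf_for, run_scalar, run_vector_model and matches_scalar, as well as the constants N, MAX_VF and VECTOR_BYTES, are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 32             /* array length for the experiment (arbitrary) */
#define MAX_VF 8         /* largest vectorization factor this sketch supports */
#define VECTOR_BYTES 16  /* vector register width assumed in the solution */

/* Vectorization factor for a given element size: 16/4 = 4, 16/8 = 2. */
int vf_for(size_t elem_size)
{
    return (int)(VECTOR_BYTES / elem_size);
}

/* Reference: the scalar loop a[i] = a[i - dist], run in program order. */
static void run_scalar(int *a, int dist)
{
    for (int i = dist; i < N; i++)
        a[i] = a[i - dist];
}

/* Model of a vectorized loop: each chunk of vf iterations first loads all
   of its source elements, then stores all of its results. */
static void run_vector_model(int *a, int dist, int vf)
{
    int tmp[MAX_VF];
    int i = dist;
    for (; i + vf <= N; i += vf) {
        for (int k = 0; k < vf; k++)   /* models the vector load */
            tmp[k] = a[i + k - dist];
        for (int k = 0; k < vf; k++)   /* models the vector store */
            a[i + k] = tmp[k];
    }
    for (; i < N; i++)                 /* scalar remainder iterations */
        a[i] = a[i - dist];
}

/* Returns 1 if the vector model produces the same array as the scalar
   loop for this dependence distance and vectorization factor, else 0. */
int matches_scalar(int dist, int vf)
{
    int s[N], v[N];
    assert(dist >= 1 && dist < N);
    assert(vf >= 1 && vf <= MAX_VF);
    for (int i = 0; i < N; i++)
        s[i] = v[i] = i;               /* distinct values expose stale reads */
    run_scalar(s, dist);
    run_vector_model(v, dist, vf);
    return memcmp(s, v, sizeof s) == 0;
}
```

With this model, matches_scalar(3, 4) returns 0, because the first vector iteration loads the old a[3] before storing the new one. matches_scalar(5, 4) and matches_scalar(3, 2) both return 1, which corresponds to the two fixes proposed in 3.a and 3.b.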