Solution
--------

1. a. In a statement, the read precedes the write. Therefore, the dependence in S1 is a write after read (anti dependence). The dependence is from a statement instance of S1 to itself, and is therefore loop independent. In fact, the dependence can be safely ignored. The dependence distance is dumped as:

      test.c:5: note: dependence distance = 0.
      test.c:5: note: dependence distance == 0 between a[i_12] and a[i_12]

   b. We use the SSE instruction set to vectorize the code, so a vector register is 16 bytes wide. The array a is declared with type int, and an int occupies 4 bytes under GCC on this target. Therefore, the vectorization factor is 16/4 = 4, meaning that 4 ints fit in one vector register. The vectorization factor is dumped as:

      test.c:5: note: get vectype for scalar type: int
      test.c:5: note: vectorization factor = 4

   c. Since the source (the read access to array a) lexically precedes the sink (the write access), the code is safe to vectorize. Since there is only one dependence in the code, and it is loop independent, different iterations of loop i can be assigned to different processors. The code is thus also safe to parallelize.

2. Loop i iterates 203 times and the vectorization factor is 4. Executing 4 iterations at a time covers 4 x 50 = 200 iterations and leaves 203 - 200 = 3 iterations of i. These 3 iterations cannot be part of the vector code, so they are executed sequentially, as the epilogue. The relevant part of the dump file is:

      vect-par.c:5: note: vectorization_factor = 4, niters = 203
      epilogue iterations: 3
      vect-par.c:5: note: epilog loop required.

   Notice that no prologue is generated. This is because all the accesses in S1 are aligned to the natural vector-size boundary, so no iterations need to be peeled off before the vector loop.
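   The arithmetic above can be checked with a small C sketch. It is only an illustration, not GCC's output: the source loop is not reproduced in this solution, so the statement a[i] = a[i] + 2 is reconstructed from the dumped code, and the names VECTOR_BYTES, vectorization_factor, add_two_scalar and add_two_strip_mined are hypothetical. The inner lane loop stands in for a single SSE instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption from the text: SSE registers are 16 bytes wide. */
enum { VECTOR_BYTES = 16 };

/* Number of elements of the given size that fit in one vector register. */
static int vectorization_factor(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= VECTOR_BYTES);
    return (int)(VECTOR_BYTES / elem_size);
}

/* Scalar reference: S1 (reconstructed as a[i] = a[i] + 2) on every element. */
static void add_two_scalar(int *a, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] + 2;
}

/*
 * Strip-mined version: groups of vf iterations form the "vector" body,
 * the leftover n % vf iterations run one by one as the epilogue.
 * Returns the number of epilogue iterations executed.
 */
static int add_two_strip_mined(int *a, int n)
{
    const int vf = vectorization_factor(sizeof(int));
    const int vector_end = n - n % vf;   /* first index not covered by the vector body */
    int i;

    for (i = 0; i < vector_end; i += vf)
        for (int lane = 0; lane < vf; lane++)   /* one vector operation in real SSE code */
            a[i + lane] = a[i + lane] + 2;

    int epilogue_iterations = 0;
    for (; i < n; i++) {                 /* sequential epilogue */
        a[i] = a[i] + 2;
        epilogue_iterations++;
    }
    return epilogue_iterations;
}
```

   With 4-byte ints, calling add_two_strip_mined on an array of 203 elements runs 50 groups of 4 iterations and returns 3, and the array ends up identical to the one produced by add_two_scalar.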
   The sequential epilogue is shown below:

      :
        # i_14 = PHI
        # ivtmp.3_16 = PHI
        D.1701_17 = a[i_14];
        D.1702_18 = D.1701_17 + 2;
        a[i_14] = D.1702_18;
        i_20 = i_14 + 1;
        ivtmp.3_21 = ivtmp.3_16 - 1;
        if (ivtmp.3_21 != 0)
          goto ;
        else
          goto ;

      :
        goto ;

   Observe how ivtmp.3_16 is initialized with the value 3 and is then decremented after each iteration, so the epilogue body executes exactly 3 times.
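   The control flow of the epilogue can be mirrored in plain C. This is a hand-written sketch, not compiler output: run_epilogue and its parameters are hypothetical names, and the starting index is an assumption, since the initial values of the PHI nodes are not visible in the dump above. The early return for a zero count is also an addition of this sketch; the dumped loop tests its counter only at the bottom.

```c
#include <assert.h>

/*
 * Mirrors the dumped epilogue: the body runs first, then a separate
 * down-counter (ivtmp) is decremented and tested at the bottom.
 * Returns the value of i after the loop.
 */
static int run_epilogue(int *a, int first, unsigned remaining)
{
    int i = first;
    unsigned ivtmp = remaining;

    if (ivtmp == 0)          /* guard added in this sketch only */
        return i;

    do {
        int loaded = a[i];   /* D.1701_17 = a[i_14];          */
        int sum = loaded + 2;/* D.1702_18 = D.1701_17 + 2;    */
        a[i] = sum;          /* a[i_14] = D.1702_18;          */
        i = i + 1;           /* i_20 = i_14 + 1;              */
        ivtmp = ivtmp - 1;   /* ivtmp.3_21 = ivtmp.3_16 - 1;  */
    } while (ivtmp != 0);    /* if (ivtmp.3_21 != 0) loop again */

    return i;
}
```

   Assuming the vector loop has already handled indices 0 to 199, the call run_epilogue(a, 200, 3) updates a[200], a[201] and a[202] and returns 203. The loop exit depends only on the down-counter, not on a comparison of i against the loop bound.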