Solution
--------

1. a. S1 : A[i+1] = A[i+2];

      Instantiating the statement instances for S1:

          when i is 0, A[1] = A[2];
          when i is 1, A[2] = A[3];

      A[2] is read in iteration 0 and then written in iteration 1, so each
      value is first read and then overwritten in the next iteration. The
      dependence is therefore Write After Read (anti-dependence). It is loop
      carried, and the dependence distance is 1.

      Since the array A is of type int, the vectorization factor is 4
      (assuming 4-byte ints and 16-byte vectors, which is consistent with the
      alignment figures in part 3).

   b. The code can be vectorized because A[i+2] is the source for A[i+1], and
      the read lexically precedes the write. The vector load of a group of
      elements completes before the corresponding vector store, so every
      value is still read before it is overwritten and the anti-dependence is
      preserved.

      In the dump file, the vectorization decision is given as:

          test.c:5: note: LOOP VECTORIZED.
          test.c:2: note: vectorized 1 loops in function.

      The dependence is loop carried, and therefore the code cannot be
      parallelized. This is reported in the dump file as:

          distance_vector: 1
          direction_vector: +
          FAILED: data dependencies exist across iterations

3. Initially, the write access is misaligned by 4 bytes, and the read access
   is misaligned by 8 bytes. This information is present in the dump file as:

       test.c:5: note: vect_compute_data_ref_alignment:
       test.c:5: note: misalign = 8 bytes of ref a[D.1702_4]
       test.c:5: note: vect_compute_data_ref_alignment:
       test.c:5: note: misalign = 4 bytes of ref a[i_3]

   At least one compile-time misalignment can be corrected by peeling. Each
   peeled iteration advances an access by one 4-byte element, so an access
   that is misaligned by m bytes needs (16 - m) / 4 peeled iterations to
   reach the next 16-byte boundary:

       To align the reads  (a[D.1702_4]), we need to peel the loop 2 times.
       To align the writes (a[i_3]),      we need to peel the loop 3 times.

   The two accesses are 4 bytes apart, so peeling cannot align both at once.
   The compiler peels by 2, which aligns the reads. The peeling decision is
   given in the dump file as:

       test.c:5: note: Try peeling by 2
       test.c:5: note: Alignment of access forced using peeling.
       test.c:5: note: Peeling for alignment will be applied.
       test.c:5: note: known peeling = 2.
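
The two arguments above (a distance-1 anti-dependence survives vectorization, and the peel counts follow from the misalignment) can be checked with a small C sketch. This is an illustration only, not the code GCC generates: the names shift_scalar, shift_chunked, peel_count, VF and VEC_BYTES are hypothetical, and a 16-byte vector with 4-byte ints is assumed. The "vector" loop is emulated by loading four source elements into a temporary before storing them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16   /* assumed vector width in bytes */
#define VF 4           /* vectorization factor: 16-byte vector / 4-byte int */

/* Scalar reference: S1 executed for i = 0 .. n-1.
   The array must hold at least n + 2 elements. */
void shift_scalar(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i + 1] = a[i + 2];
}

/* Emulated vector execution: in each chunk, all VF sources are loaded
   before any of the VF destinations is stored. */
void shift_chunked(int *a, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        int tmp[VF];
        memcpy(tmp, &a[i + 2], sizeof tmp);   /* "vector load"  of a[i+2 .. i+5] */
        memcpy(&a[i + 1], tmp, sizeof tmp);   /* "vector store" to a[i+1 .. i+4] */
    }
    for (; i < n; i++)                        /* scalar epilogue for leftovers */
        a[i + 1] = a[i + 2];
}

/* Scalar iterations to peel so that an access misaligned by
   misalign_bytes reaches the next VEC_BYTES boundary. */
unsigned peel_count(unsigned misalign_bytes, unsigned elem_bytes)
{
    assert(elem_bytes != 0 && misalign_bytes < VEC_BYTES);
    return ((VEC_BYTES - misalign_bytes) % VEC_BYTES) / elem_bytes;
}
```

Running both loops on identical copies of an array should leave them equal, because every chunk reads elements that no earlier chunk has written. With these assumptions, peel_count(8, 4) gives 2 for the reads and peel_count(4, 4) gives 3 for the writes, matching the figures in part 3.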