Solution
--------
1. When i = 1, a[1] = a[2];
When i = 2, a[2] = a[4];
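We see that the dependence distance is not constant: the element a[2i]
read in iteration i is overwritten later, in iteration 2i, so the
distance is i and grows with every iteration. Hence GCC fails to
compute the dependence vector for the code, and marks the program for
a run-time alias check. The relevant part of the dump file highlights
it: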
test.c:5: note: versioning for alias required: bad dist vector for a[D.1701_3] and a[i_12]
test.c:5: note: mark for run-time aliasing test between a[D.1701_3] and a[i_12]
2. The statement S1 is a[i] = a[i*2].
In the vectorized code, with four ints per vector register, the first
vector iteration computes a[0,1,2,3] = a[0,2,4,6];
The reads are the even-indexed elements, but vector loads fetch
contiguous elements. Therefore, to form the vector a[0,2,4,6], we need
two vector loads, a[0,1,2,3] and a[4,5,6,7]. The four even-indexed
elements are then extracted from these two vectors into a single vector
register. GCC does this with the VEC_EXTRACTEVEN_EXPR and
VEC_EXTRACTODD_EXPR operations, which are GCC tree codes rather than
SSE instructions; the x86 back end expands them into SSE shuffle/unpack
instructions. Each operates on two vectors: VEC_EXTRACTEVEN_EXPR
collects the even-indexed elements of both into a new vector, and
VEC_EXTRACTODD_EXPR collects the odd-indexed ones. The odd vector is
discarded, and the even vector is retained and used as the RHS. The
relevant code from the vectorizer's CFG dump is:
vect_perm_even.19_40 = VEC_EXTRACTEVEN_EXPR ;
vect_perm_odd.20_41 = VEC_EXTRACTODD_EXPR ;
D.1702_4 = a[D.1701_3];
MEM[(int[245] *)vect_pa.21_43] = vect_perm_even.19_40;
Notice that a strided access needs more than one vector load to
assemble the correct vector. In general, for an access a[i*n], we need
n vector loads to form one vector of operands. This is why strided
accesses are expensive, and why they are restricted to reads.