CS314

Assignment 8

Problem 1 – Dependencies

Suppose we have the following program (a sequence of instructions), with each instruction labeled as

S<num:

S1 : a := 4;

S2 : b := 2;

S3 : c := 5;

S4 : read ( d );

S5 : a := b + 3;

S6 : b := a – 3;

S7 : c := d * b ;

S8 : e := a + 6;

S9 : print ( c );

S10 : print ( e );

(a) Give the statement-level dependence graph for the above program. A node in the statement-level

dependence graph represents a statement, an edge represents dependence between the statements

(i.e. the nodes). Label each edge as a true data dependence, an anti data dependence, or an

output data dependence.

(b) Assume that each statement takes 1 cycle to execute. What is the execution time of the sequential code? What is the fastest parallel execution time of the program (i.e. the critical path)?

You may assume that I/O operations (read and print) can be done in parallel.

Problem 2 – Dependence Analysis

Give the source and sink references, the type (whether a dependence is true, anti, or output), and

the distance vectors for all dependences in the following loops.

(a) do i = 3, 100

a(i) = a(i – 1) + a(i + 1) – a(i – 2)

enddo

(b) do i = 2, 100

a(3 * i) = a(3 * i – 3) + a(3 * i + 3)

enddo

(c) do i = 1, 10

a(i) = a(5) + a(i)

enddo

Use aW (i) and aR(i) to annotate the write access to a(i) and the read access to a(i) respectively.

Problem 3 – Loop Parallelization

Given the following nested loop:

1

do i = 2 , 100

do j = 2 , 100

S1 : a (i , j ) = b ( i – 1 , j – 1) + 2

S2 : b (i , j ) = i + j – 1

enddo

enddo

(a) Give the statement-level dependence graph. Show the dependence graph for statement instances

in a part of the iteration space: i = 2 … 5, j = 2 … 5.

(b) In its current form, can any loop be parallelized? If so, which loop(s)? If not, justify your

answer.

(c) Provide one valid affine schedule for statements S1 and S2 such that p(S1)= C11*i + C12*j +

d1 and p(S2)= C21*i + C22*j + d2 in order to achieve synchronization-free parallelism. There

could be many possible solutions for {C11, C12, C21, C22, d1, d2}. You only need to provide one

feasible solution. (Hint: You can let d1 = d2 = 0.)

(d) Generate two-level loop code for the affine schedule you provided. Please use p as outermost

loop and i as innermost loop in the transformed loop. Calculate the loop bounds for p and i

using Fourier-Motzkin elimination. You might need to calculate the overlapping polyhedron for

S1 and S2 in order to eliminate the j loop. Please refer to the techniques for code generation in

lecture 20 and lecture 21.

2

Sale!