
1 Parallelization, Compilation and Platforms 5LIM0
Loop parallelization and dependence checking. Henk Corporaal, March 2016

2 Theory and Practice of Dependence Testing
Data and control dependences; scalar data dependences (true, anti, and output dependences); loop dependences; loop parallelism; dependence distance and direction vectors; loop-carried dependences; loop-independent dependences; loop dependence and loop iteration reordering; dependence tests. Slides largely based on Van Engelen, Florida State University.

3 Data vs Control Dependences
Data dependence: data is produced and consumed in the correct order. Control dependence: a dependence that arises as a result of control flow. Data dependence example: S1 PI = 3.14159; S2 R = 5.0; S3 AREA = PI * R ** 2. Control dependence example: S1 IF (T.EQ.0.0) GOTO S3; S2 A = A / T; S3 CONTINUE.

4 Data Dependence Definition
There is a data dependence from statement S1 to statement S2 iff 1) both S1 and S2 access the same memory location and at least one of them stores into it, and 2) there is a feasible run-time execution path from S1 to S2

5 Data Dependence Classification
Dependence relations are similar to hardware hazards (which may cause pipeline stalls). True dependences (also called flow dependences), denoted S1 δ S2, are the same as RAW (Read After Write) hazards: S1 X = … S2 … = X. Anti dependences, denoted S1 δ⁻¹ S2, are the same as WAR (Write After Read) hazards: S1 … = X S2 X = …. Output dependences, denoted S1 δº S2, are the same as WAW (Write After Write) hazards: S1 X = … S2 X = …. Note: anti and output dependences are often called false dependences. Why? They are real dependences, but unlike true dependences they can be removed by renaming, i.e. by using extra storage.

6 Compiling for Loop Parallelism
Dependence properties are crucial for determining and exploiting loop parallelism Bernstein’s conditions: Iteration I1 does not write into a location that is read by iteration I2 Iteration I2 does not write into a location that is read by iteration I1 Iteration I1 does not write into a location that is written into by iteration I2
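A minimal sketch in C of a Bernstein check, assuming the read and write sets of two iterations are available as plain arrays of addresses; the set representation and the helper names (intersects, bernstein_parallel) are illustrative assumptions, not taken from the slides:

#include <stdbool.h>
#include <stddef.h>

/* Do two address sets share a memory location? */
static bool intersects(const void *const *a, size_t na,
                       const void *const *b, size_t nb) {
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j]) return true;
    return false;
}

/* Bernstein: iterations I1 and I2 may run in parallel iff W1 and R2 are
 * disjoint, W2 and R1 are disjoint, and W1 and W2 are disjoint. */
bool bernstein_parallel(const void *const *r1, size_t nr1,
                        const void *const *w1, size_t nw1,
                        const void *const *r2, size_t nr2,
                        const void *const *w2, size_t nw2) {
    return !intersects(w1, nw1, r2, nr2) &&
           !intersects(w2, nw2, r1, nr1) &&
           !intersects(w1, nw1, w2, nw2);
}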

7 Fortran’s PARALLEL DO The PARALLEL DO (Fortran) / FORALL guarantees that there are no scheduling constraints among its iterations. Not OK (loop-carried true dependence): PARALLEL DO I = 1, N A(I+1) = A(I) + B(I) ENDDO. OK (no loop-carried dependence): PARALLEL DO I = 1, N A(I) = A(I) + B(I) ENDDO. Not OK (loop-carried anti dependence): PARALLEL DO I = 1, N A(I-1) = A(I) + B(I) ENDDO. Not OK (reduction on the scalar S): PARALLEL DO I = 1, N S = S + B(I) ENDDO.

8 Loop Dependences While the PARALLEL DO requires the programmer to assure that the loop iterations can be executed in parallel, an optimizing compiler must analyze loop dependences itself for automatic loop parallelization and vectorization. Granularity of parallelism: instruction-level parallelism (ILP) is not loop-specific; synchronous (e.g. SIMD, also vector and SSE) parallelism is fine-grained and the compiler aims to optimize inner loops; asynchronous (e.g. MIMD, work sharing) parallelism is coarse-grained and the compiler aims to optimize outer loops.

9 Synchronous Parallel Machines
SIMD model, simpler and cheaper compared to MIMD. Processors operate in lockstep, executing the same instruction on different portions of the data space. The application should exhibit data parallelism. Control flow requires masking operations (similar to predicated instructions), which can be inefficient. (Figure: example 4x4 processor mesh p1,1 … p4,4.)

10 Asynchronous Parallel Computers
Multiprocessors: shared memory is directly accessible by each PE (processing element). Multicomputers: distributed memory requires message passing between PEs. SMP clusters: shared memory within small clusters of processors, message passing across clusters. (Figures: PEs p1..p4 sharing one memory over a bus; PEs p1..p4 with local memories m1..m4 connected by a bus, crossbar, or network.)

11 Advanced Compiler Technology
Powerful restructuring compilers transform loops into parallel or vector code. They must analyze data dependences in loop nests, not just straight-line code. Original matrix-multiply loop nest: DO I = 1, N DO J = 1, N C(J,I) = 0.0 DO K = 1, N C(J,I) = C(J,I)+A(J,K)*B(K,I) ENDDO ENDDO ENDDO. Vectorized (64-element strips): DO I = 1, N DO J = 1, N, 64 C(J:J+63,I) = 0.0 DO K = 1, N C(J:J+63,I) = C(J:J+63,I) + A(J:J+63,K)*B(K,I) ENDDO ENDDO ENDDO. Parallelized: PARALLEL DO I = 1, N DO J = 1, N C(J,I) = 0.0 DO K = 1, N C(J,I) = C(J,I)+A(J,K)*B(K,I) ENDDO ENDDO ENDDO.

12 Normalized Iteration Space
In most situations it is preferable to use normalized iteration spaces. For an arbitrary loop with loop index I, loop bounds L and U, and stride S, the normalized iteration number is i = (I-L+S)/S. Example: DO I = 100, 20, -10 A(I) = B(100-I) + C(I/5) ENDDO is normalized with i = (110-I)/10 into DO i = 1, 9 A(110-10*i) = B(10*i-10) + C(22-2*i) ENDDO.
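A small C sketch of the normalization formula applied to the example loop above; the helper name normalized_iter is an illustrative assumption:

#include <stdio.h>

/* Normalized iteration number i = (I - L + S) / S for a loop DO I = L, U, S. */
int normalized_iter(int I, int L, int S) {
    return (I - L + S) / S;
}

int main(void) {
    /* DO I = 100, 20, -10  maps to  i = (110 - I) / 10 = 1..9 */
    for (int I = 100; I >= 20; I -= 10)
        printf("I = %3d  ->  i = %d\n", I, normalized_iter(I, 100, -10));
    return 0;
}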

13 Dependence in Loops In this example loop, statement S1 “depends on itself” from the previous iteration. More precisely, S1 in iteration i+1 has a true dependence on S1 in iteration i: the value A(I+1) written in one iteration is read as A(I) in the next. DO I = 1, N S1 A(I+1) = A(I) + B(I) ENDDO is executed as S1 A(2) = A(1) + B(1), S1 A(3) = A(2) + B(2), S1 A(4) = A(3) + B(3), S1 A(5) = A(4) + B(4), …

14 Iteration Vector Definition
Given a nest of n loops, the iteration vector i of a particular iteration of the innermost loop is a vector of integers i = (i1, i2, …, in) where ik represents the iteration number for the loop at nesting level k The set of all possible iteration vectors is an iteration space

15 Iteration Vector Example
The iteration space of the statement at S1 is {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)}. At iteration i = (2,1) the value of A(1,2) is assigned to A(2,1). DO I = 1, 3 DO J = 1, I S1 A(I,J) = A(J,I) ENDDO ENDDO (Figure: triangular iteration space plotted over I and J.)

16 Iteration Vector Ordering
The iteration vectors are naturally ordered according to a lexicographic order, e.g. iteration (1,2) precedes (2,1) and (2,2) in the example on the previous slide. Definition, assuming normalized loops: iteration i precedes iteration j, denoted i < j, iff 1) i[1] < j[1], or 2) i[1:k-1] = j[1:k-1] and ik < jk for some k, i.e. the first k-1 indices are equal and the first index value that differs is smaller in i. The lexicographic ordering represents the ordering of iterations as defined by the original, un-transformed loop, i.e. the ordering specified by the programmer.
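A possible C rendering of the lexicographic comparison, assuming iteration vectors are stored as int arrays of equal length; the function name lex_compare is illustrative:

/* Lexicographic comparison of two iteration vectors of length n.
 * Returns <0 if i precedes j, 0 if they are equal, >0 if i follows j. */
int lex_compare(const int *i, const int *j, int n) {
    for (int k = 0; k < n; k++) {
        if (i[k] < j[k]) return -1;   /* first differing index decides */
        if (i[k] > j[k]) return  1;
    }
    return 0;                         /* identical iteration vectors */
}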

17 Loop Dependence Definition
There exists a dependence from S1 to S2 in a loop nest iff there exist two iteration vectors i and j such that 1) i < j and there is a path from S1 to S2, 2) S1 accesses memory location M on iteration i and S2 accesses memory location M on iteration j, and 3) one of these accesses is a write. What if both are writes? Then it is an output (WAW) dependence.

18 Loop Dependence Example
Show that the example loop has no loop dependences. Answer: there are no iteration vectors i and j in {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} such that i < j and either S1 in iteration i writes to the same element of A that is read by S1 in iteration j, or S1 in iteration i reads an element of A that is written in iteration j. DO I = 1, 3 DO J = 1, I S1 A(I,J) = A(J,I) ENDDO ENDDO. What about WAW? Each element A(I,J) with I ≥ J is written exactly once, so there is no output dependence either.

19 Loop Reordering Transformations
Definitions A reordering transformation is any program transformation that changes the execution order of the code, without adding or deleting any statement executions A reordering transformation preserves a dependence if it preserves the relative execution order of the source and sink of that dependence

20 Fundamental Theorem of Dependence
Any reordering transformation that preserves every dependence in a program preserves the meaning of that program

21 Valid Transformations
A transformation is said to be valid for the program to which it applies if it preserves all dependences in the program. Original: DO I = 1, 3 S1 A(I+1) = B(I) S2 B(I+1) = A(I) ENDDO. Valid: interchanging the two statements, DO I = 1, 3 B(I+1) = A(I) A(I+1) = B(I) ENDDO, preserves both loop-carried dependences. Invalid: reversing the loop, DO I = 3, 1, -1 A(I+1) = B(I) B(I+1) = A(I) ENDDO, reverses the order of the dependent iterations.

22 Dependences and Loop Transformations
Loop dependences are tested before a transformation is applied When a dependence test is inconclusive, dependence must be assumed In this example the value of K is unknown and loop dependence is assumed: DO I = 1, N S1 A(I+K) = A(I) + B(I) ENDDO

23 Dependence Distance Vector
Definition: suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence distance vector d(i, j) is defined as d(i, j) = j - i. Example: true dependence between S1 and itself on i = (1,1) and j = (2,1): d(i, j) = (1,0); i = (2,1) and j = (3,1): d(i, j) = (1,0); i = (2,2) and j = (3,2): d(i, j) = (1,0). DO I = 1, 3 DO J = 1, I S1 A(I+1,J) = A(I,J) ENDDO ENDDO

24 Dependence Direction Vector
Definition: suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence direction vector D(i, j) is defined as: D(i, j)k = “<” if d(i, j)k > 0, “=” if d(i, j)k = 0, “>” if d(i, j)k < 0. Watch the flipping of the condition: a positive distance gives direction “<”.
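A short C sketch computing distance and direction vectors from a source and sink iteration vector, following the definitions above; the function names are illustrative assumptions:

/* Distance vector d = j - i for source iteration i and sink iteration j. */
void distance_vector(const int *i, const int *j, int *d, int n) {
    for (int k = 0; k < n; k++)
        d[k] = j[k] - i[k];
}

/* Direction vector: note the flip, a positive distance gives '<'. */
void direction_vector(const int *d, char *D, int n) {
    for (int k = 0; k < n; k++)
        D[k] = (d[k] > 0) ? '<' : (d[k] == 0) ? '=' : '>';
}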

25 Example DO I = 1, 3 DO J = 1, I S1 A(I+1,J) = A(I,J) ENDDO ENDDO
True dependence between S1 and itself on: i = (1,1) and j = (2,1): d(i, j) = (1,0), D(i, j) = (<, =); i = (2,1) and j = (3,1): d(i, j) = (1,0), D(i, j) = (<, =); i = (2,2) and j = (3,2): d(i, j) = (1,0), D(i, j) = (<, =). (Figure: iteration space over I and J with arcs from each dependence source to its sink.)

26 Data Dependence Graphs
Data dependences arise between statement instances. It is generally infeasible to represent all data dependences that arise in a program, so usually only static data dependences are recorded, using the data dependence direction vector, and depicted in a data dependence graph to compactly represent the dependences in a loop nest. DO I = 1, N S1 A(I) = B(I) * 5 S2 C(I+1) = C(I) + A(I) ENDDO. Static data dependences for the accesses to A and C: S1 (=) S2 and S2 (<) S2. Data dependence graph: an edge S1 (=) S2 and a self-edge S2 (<) S2. Note the difference between static and dynamic dependences (compare DSA and SSA): a static dependence may not hold for all iterations, i.e. for all dynamic statement instances.

27 Loop-Independent Dependences
Definition Statement S1 has a loop-independent dependence on S2 iff there exist two iteration vectors i and j such that 1) S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and i = j 2) there is a control flow path from S1 to S2 within the iteration Also called INTRA loop dependence

28 Loop-Carried Dependences
Definitions Statement S1 has a loop-carried dependence on S2 iff there exist two iteration vectors i and j such that S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and d(i, j) > 0, (that is, D(i, j) contains a “<” as its leftmost non-“=” component) A loop-carried dependence from S1 to S2 is 1) forward if S2 appears after S1 in the loop body 2) backward if S2 appears before S1 (or if S1=S2) The level of a loop-carried dependence is the index of the leftmost non- “=” of D(i, j) for the dependence Also called INTER-loop iteration dependences

29 Example (1) The loop-carried dependence S1 (<) S2 is forward
The loop-carried dependence S2 (<) S1 is backward All loop-carried dependences are of level 1, because D(i, j) = (<) for every dependence DO I = 1, 3 S1 A(I+1) = F(I) S2 F(I+1) = A(I) ENDDO S1 (<) S2 and S2 (<) S1

30 Example (2) All loop-carried dependences are of level 3, because D(i, j) = (=, =, <). Note: level-k dependences are also denoted by Sx δk Sy. DO I = 1, 10 DO J = 1, 10 DO K = 1, 10 S1 A(I,J,K+1) = A(I,J,K) ENDDO ENDDO ENDDO. S1 (=,=,<) S1; alternative notation for this level-3 dependence: S1 δ3 S1.

31 Direction Vectors and Reordering Transformations (1)
Let T be a transformation on a loop nest that does not rearrange the statements in the body of the innermost loop; then T is valid if, after it is applied, none of the direction vectors for dependences with source and sink in the nest has a leftmost non-“=” component that is “>”

32 Direction Vectors and Reordering Transformations (2)
A reordering transformation preserves all level-k dependences if 1) it preserves the iteration order of the level-k loop, 2) if it does not interchange any loop at level < k to a position inside the level-k loop, and 3) if it does not interchange any loop at level > k to a position outside the level-k loop

33 Examples
Example 1: DO I = 1, 10 S1 F(I+1) = A(I) S2 A(I+1) = F(I) ENDDO. Find deps: both S1 (<) S2 and S2 (<) S1 are level-1 dependences. Choose transform: statement interchange, DO I = 1, 10 S2 A(I+1) = F(I) S1 F(I+1) = A(I) ENDDO, preserves both level-1 dependences. Example 2: DO I = 1, 10 DO J = 1, 10 DO K = 1, 10 S1 A(I+1,J+2,K+3) = A(I,J,K) + B ENDDO ENDDO ENDDO. Find deps: S1 (<,<,<) S1 is a level-1 dependence. Choose transform: interchanging and reversing the inner loops, DO I = 1, 10 DO K = 10, 1, -1 DO J = 1, 10 S1 A(I+1,J+2,K+3) = A(I,J,K) + B ENDDO ENDDO ENDDO, leaves the level-1 loop untouched and is therefore valid.

34 Direction Vectors and Reordering Transformations (3)
A reordering transformation preserves a loop- independent dependence between statements S1 and S2 if it does not move statement instances between iterations and preserves the relative order of S1 and S2

35 S1 (=) S1 is a loop-independent dependence
Examples DO I = 1, 10 S1 A(I) = … S2 … = A(I) ENDDO DO I = 1, DO J = 1, 10 S1 A(I,J) = … S2 … = A(I,J) ENDDO ENDDO Find deps. Find deps. S1 (=) S1 is a loop-independent dependence S1 (=,=) S1 is a loop-independent dependence Choose transform Choose transform DO J = 10, 1, DO I = 10, 1, -1 S1 A(I,J) = … S2 … = A(I,J) ENDDO ENDDO DO I = 10, 1, -1 S2 A(I) = … S1 … = A(I) ENDDO

36 Combining Direction Vectors
When a loop nest has multiple different directions at the same loop level k, we sometimes abbreviate the level-k direction vector component with “*”: S1 (*) S2 stands for the union of S1 (<) S2, S1 (=) S2, and S1 (>) S2. Examples: DO I = 1, 9 S1 A(I) = … S2 … = A(10-I) ENDDO gives S1 (*) S2; DO I = 1, 10 S1 S = S + A(I) ENDDO gives S1 (*) S1; DO I = 1, 10 DO J = 1, 10 S1 A(J) = A(J) ENDDO ENDDO gives S1 (*,=) S1.

37 Dependence vectors: example 1
DO I = 1, 3 DO J = 1, I S1 A(I+1,J) = A(I,J) ENDDO ENDDO. Distance vector is (1,0): S1 (<,=) S1. (Figure: iteration space; note that J is the inner-loop index, plotted vertically.)

38 Dependence vectors: example 2
DO I = 1, 4 DO J = 1, 4 S1 A(I,J+1) = A(I-1,J) ENDDO ENDDO. Distance vector is (1,1): S1 (<,<) S1. (Figure: 4x4 iteration space with diagonal dependence arcs.)

39 Dependence vectors: example 3
DO I = 1, 4 DO J = 1, 5-I S1 A(I+1,J+I-1) = A(I,J+I-1) ENDDO ENDDO. Distance vector is (1,-1): S1 (<,>) S1. (Figure: triangular iteration space with dependence arcs.)

40 Dependence vectors: example 4
DO I = 1, 4 DO J = 1, 4 S1 B(I) = B(I) + C(I) S2 A(I+1,J) = A(I,J) ENDDO ENDDO. Distance vectors are (0,1) and (1,0): S1 (=,<) S1 and S2 (<,=) S2. (Figure: 4x4 iteration space.)

41 Parallelization It is valid to convert a sequential loop to a parallel loop if the loop does not carry a dependence (no inter-iteration dependence). DO I = 1, 4 DO J = 1, 4 S1 A(I,J+1) = A(I,J) ENDDO ENDDO has only S1 (=,<) S1, carried by the J loop, so the I loop can be parallelized: PARALLEL DO I = 1, 4 DO J = 1, 4 S1 A(I,J+1) = A(I,J) ENDDO ENDDO. (Figure: 4x4 iteration space with vertical dependence arcs.)

42 Vectorization (1) A single-statement loop that carries no (inter-iteration) dependence can be vectorized. DO I = 1, 4 S1 X(I) = X(I) + C ENDDO vectorizes to the Fortran 90 array statement S1 X(1:4) = X(1:4) + C. (Figure: the vector operation adds C to X(1), X(2), X(3), X(4) element-wise.)

43 Vectorization (2) This loop carries the (inter-iteration) dependence S1 (<) S1, so vectorizing it is invalid: DO I = 1, N S1 X(I+1) = X(I) + C ENDDO is not equivalent to the Fortran 90 statement S1 X(2:N+1) = X(1:N) + C. (Figure: chain of dependences through iterations I=1..8.)

44 Vectorization (3) For some compilers only single-statement loops can be vectorized; loops with multiple statements must first be transformed using the loop distribution transformation. Distribution is applicable here because the loop has no loop-carried dependence other than a forward flow dependence: DO I = 1, N S1 A(I+1) = B(I) + C S2 D(I) = A(I) + E ENDDO. Loop distribution gives DO I = 1, N S1 A(I+1) = B(I) + C ENDDO and DO I = 1, N S2 D(I) = A(I) + E ENDDO, which vectorize to S1 A(2:N+1) = B(1:N) + C and S2 D(1:N) = A(1:N) + E.
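A C sketch of the same distribution, assuming float arrays and scalars c and e as stand-ins for the Fortran data; it is meant only to illustrate why the transformed code computes the same values:

/* Fused loop: S1 writes a[i+1], S2 reads the a[i] written one iteration earlier.
 * Array a is assumed to hold at least n+1 elements. */
void fused(float *a, const float *b, float *d, float c, float e, int n) {
    for (int i = 0; i < n; i++) {
        a[i + 1] = b[i] + c;          /* S1 */
        d[i]     = a[i] + e;          /* S2 */
    }
}

/* After loop distribution both loops are single-statement and vectorizable;
 * the forward flow dependence S1 -> S2 is preserved because the S1 loop
 * completes before the S2 loop starts. */
void distributed(float *a, const float *b, float *d, float c, float e, int n) {
    for (int i = 0; i < n; i++)
        a[i + 1] = b[i] + c;          /* S1 */
    for (int i = 0; i < n; i++)
        d[i]     = a[i] + e;          /* S2 */
}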

45 Vectorization (4) When a loop has backward flow dependences and no loop-independent dependences, interchange the statements to enable loop distribution DO I = 1, N S1 D(I) = A(I) + E S2 A(I+1) = B(I) + C ENDDO loop distribution DO I = 1, N S2 A(I+1) = B(I) + C ENDDO DO I = 1, N S1 D(I) = A(I) + E ENDDO vectorize S2 A(2:N+1) = B(1:N) + C S1 D(1:N) = A(1:N) + E

46 Vectorization (5) Can this loop be vectorized?
Can this loop be parallelized? DO I = 1, N S1 A(I) = A(I+1) ENDDO (Figure: iterations I=1..8 with dependence arcs.)

47 Preliminary Transformations to Support Dependence Testing
Determining distance and direction vectors with dependence tests requires array subscripts to be in a standard form. Preliminary transformations put more subscripts in affine form, where subscripts are linear integer functions of the loop induction variables: loop normalization, induction-variable substitution, constant propagation.

48 Loop Normalization
A normalized loop starts its iteration at 0 (or 1) and has increment 1: make the lower bound 1 (or 0) and the stride 1. Dependence tests assume iteration spaces are normalized. Normalization can distort dependences, but it is otherwise preferred to support loop analysis. Example: DO I = 1, M DO J = I, N S1 A(J,I) = A(J,I-1) ENDDO ENDDO with S1 (<,=) S1 is normalized to DO I = 1, M DO J = 1, N-I+1 S1 A(J+I-1,I) = A(J+I-1,I-1) ENDDO ENDDO with S1 (<,>) S1.

49 Data Flow Analysis Data flow analysis with reaching definitions or SSA form supports constant propagation and dead code elimination, which are needed to standardize array subscripts. Example: K = 2 DO I = 1, 100 S1 A(I+K) = A(I) ENDDO becomes, after constant propagation, DO I = 1, 100 S1 A(I+2) = A(I) ENDDO.

50 Forward Substitution Forward substitution and induction variable substitution (IVS) yield array subscripts that are amenable to dependence analysis. Example: DO I = 1, 100 K = I + 2 S1 A(K) = A(K) ENDDO becomes, after forward substitution, DO I = 1, 100 S1 A(I+2) = A(I+2) ENDDO.

51 Induction Variable (IV) Substitution
Example loop with IVs: I = 0; J = 1; while (I<N) { I = I + 1; … = A[J]; J = J + 2; K = 2*I; A[K] = … } endwhile. After IV substitution (IVS), note the affine indexes: for i=0 to N-1 S1: … = A[2*i+1] S2: A[2*i+2] = … endfor. Dependence test on A[]: the GCD test must solve the dependence equation 2*i_d - 2*i_u = -1 (write A[2*i_d+2], read A[2*i_u+1]); since 2 does not divide 1 there is no data dependence. After parallelization: forall (i=0,N-1) … = A[2*i+1]; A[2*i+2] = … endforall.
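A C sketch of the same transformation; the temporary t, the array src, and the function names are assumptions introduced only to make the elided “… =” and “= …” parts compile (arrays are assumed large enough):

/* Original loop with induction variables j and k. */
void before_ivs(float *a, const float *src, int n) {
    int i = 0, j = 1, k;
    while (i < n) {
        i = i + 1;
        float t = a[j];                /* ... = A[J]     */
        j = j + 2;
        k = 2 * i;
        a[k] = src[i] + t;             /* A[K] = ...     */
    }
}

/* After IVS every subscript is affine in the single loop counter. */
void after_ivs(float *a, const float *src, int n) {
    for (int i = 0; i < n; i++) {
        float t = a[2 * i + 1];        /* ... = A[2*i+1] */
        a[2 * i + 2] = src[i + 1] + t; /* A[2*i+2] = ... */
    }
}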

52 Dependence Testing After loop normalization, IVS, and constant propagation, array subscripts are standardized. Affine subscripts are important, since most dependence tests require the affine form a1 i1 + a2 i2 + … + an in + e. Non-linear subscripts contain unknowns (symbolic terms), function values, array values, shifts (>>, <<), etc.

53 Dependence Equation (1)
A dependence equation defines the access requirement for DO I = 1, N S1 A(f(I)) = A(g(I)) ENDDO. To prove a flow dependence: for which values of α < β is f(α) = g(β)? For DO I = 1, N S1 A(I+1) = A(I) ENDDO the equation α+1 = β has the solution β = α+1. For DO I = 1, N S1 A(2*I+1) = A(2*I) ENDDO the equation 2*α+1 = 2*β has no solution.

54 Dependence Equation (2)
A dependence equation defines the access requirement for DO I = 1, N S1 A(f(I)) = A(g(I)) ENDDO. To prove an anti dependence: for which values of α > β is f(α) = g(β)? For DO I = 1, N S1 A(I+1) = A(I) ENDDO the equation α+1 = β has no solution with α > β. For DO I = 1, N S1 A(2*I) = A(2*I+2) ENDDO the equation 2*α = 2*β+2 has the solution α = β+1.

55 Dependence Equation (3)
To prove a flow dependence: for which values of α < β is f(α) = g(β)? For DO I = 1, N DO J = 1, N S1 A(I-1) = A(J) ENDDO ENDDO the equation α1 - 1 = β2 has solutions; note that α = (α1, α2) and β = (β1, β2) are iteration vectors. For DO I = 1, N S1 A(5) = A(I) ENDDO the equation 5 = β has a solution (with α < β = 5) provided N ≥ 5. For DO I = 1, N S1 A(5) = A(6) ENDDO the equation 5 = 6 has no solution.
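For small, known loop bounds the dependence equation can simply be checked by enumeration. A brute-force C sketch, with f and g passed as function pointers (the names are illustrative):

#include <stdio.h>

typedef int (*subscript_fn)(int);

static int f_write(int i) { return i + 1; }   /* A(I+1) = ... */
static int g_read (int i) { return i;     }   /* ... = A(I)   */

/* Enumerate all iteration pairs and report flow and anti dependences. */
void check_deps(subscript_fn f, subscript_fn g, int n) {
    for (int alpha = 1; alpha <= n; alpha++)
        for (int beta = 1; beta <= n; beta++)
            if (f(alpha) == g(beta)) {
                if (alpha < beta)
                    printf("flow dep: write in %d, read in %d\n", alpha, beta);
                else if (alpha > beta)
                    printf("anti dep: read in %d, write in %d\n", beta, alpha);
            }
}

int main(void) { check_deps(f_write, g_read, 8); return 0; }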

56 ZIV, SIV, and MIV ZIV (zero index variable) subscript pairs
SIV (single index variable) subscript pairs. MIV (multiple index variable) subscript pairs. Example: DO I = … DO J = … DO K = … S A(5,I+1,J) = A(N,I,K) ENDDO ENDDO ENDDO, where (5, N) is a ZIV pair, (I+1, I) is an SIV pair, and (J, K) is an MIV pair.

57 Example First distance vector component is easy (SIV), but the second is not so easy (MIV). DO I = 1, 4 DO J = 1, 5-I S1 A(I+1,J+I-1) = A(I,J+I-1) ENDDO ENDDO: the pair (I+1, I) is SIV, the pair (J+I-1, J+I-1) is MIV. Distance vector is (1,-1): S1 (<,>) S1. (Figure: triangular iteration space.)

58 Separability and Coupled Subscripts
When testing multidimensional arrays, subscripts are separable if their indices do not occur in the other subscripts. ZIV and SIV subscript pairs are separable; MIV pairs may give rise to coupled subscript groups. DO I = … DO J = … DO K = … S A(I,J,J) = A(I,J,K) ENDDO ENDDO ENDDO: index J is used in both the 2nd and 3rd dimension of A, so the pair (I, I) is separable while the pairs (J, J) and (J, K) form a coupled group.

59 Dependence Testing Overview
Partition subscripts into separable and coupled groups Classify each subscript as ZIV, SIV, or MIV For each separable subscript, apply the applicable single subscript test (ZIV, SIV, or MIV) to prove independence or produce direction vectors for dependence For each coupled group, apply a multiple subscript test If any test yields independence, no further testing needed Otherwise, merge all dependence vectors into one set

60 Solving Data Dependences
Memory disambiguation: the example below is unsolvable at compile time, because n is only known at run time. Array indices should be affine, i.e. a linear expression of the loop indices. read(n) DO I = 1, 4 A(I) = A(n) + 3 ENDDO

61 Dependence Testing is Conservative or Exact
Dependence tests solve dependence equations. Conservative tests attempt to prove that there is no solution to a dependence equation: absence of a dependence is proven; proving dependence is sometimes possible and useful; but if the proofs fail, a dependence must be assumed. When a dependence has to be assumed, the translated code is correct but possibly suboptimal. Exact tests detect dependences iff they exist.

62 Test overview: comparing 2 index expr.
ZIV (Zero Induction Variables): compares 2 expressions containing no loop indices. SIV (Single Induction Variable). MIV (Multiple Induction Variables). GCD: tests (multi-dimensional) linear dependence equations; it performs no loop-bounds checking, so it can only prove independence. Banerjee: enhanced test that takes the loop bounds into account.

63 ZIV Test The ZIV test compares two subscripts
If the expressions are proven unequal, no dependence can exist. DO I = 1, N S1 A(5) = A(6) ENDDO: independent. K = … DO I = 1, N S1 A(5) = A(K) ENDDO: independent if K is known to differ from 5. K = … DO I = 1, N DO J = 1, N S1 A(I,5) = A(I,K) ENDDO ENDDO: the ZIV pair (5, K) can prove independence if K is known to differ from 5.

64 Strong SIV Test Requires subscript pairs of the form a*I+c1 and a*I'+c2 for the def and use, respectively. The dependence equation a*I+c1 = a*I'+c2 has a solution if the dependence distance d = (c1 - c2)/a is an integer and |d| ≤ U - L, with L and U the loop bounds. Examples: DO I = 1, N S1 A(I+1) = A(I) ENDDO; DO I = 1, N S1 A(2*I+2) = A(2*I) ENDDO; DO I = 1, N S1 A(I+N) = A(I) ENDDO.
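A C sketch of the strong SIV test as described above; the function name and the conservative handling of a = 0 are assumptions:

#include <stdlib.h>

/* Strong SIV test for the pair a*I + c1 (def) and a*I' + c2 (use),
 * loop bounds L..U.  Returns 1 if a dependence may exist (distance in *dist),
 * 0 if independence is proven. */
int strong_siv(int a, int c1, int c2, int L, int U, int *dist) {
    if (a == 0) return 1;              /* not a strong SIV pair: be conservative */
    if ((c1 - c2) % a != 0) return 0;  /* distance is not an integer */
    int d = (c1 - c2) / a;
    if (abs(d) > U - L) return 0;      /* distance exceeds the iteration range */
    *dist = d;
    return 1;
}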

65 Weak-Zero SIV Test Requires subscript pairs of the form a1*I+c1 and a2*I'+c2 with either a1 = 0 or a2 = 0. If a2 = 0, the dependence equation a1*I + c1 = c2 has a solution if I = (c2 - c1)/a1 is an integer and L ≤ (c2 - c1)/a1 ≤ U; if a1 = 0, the case is symmetric. Examples: DO I = 1, N S1 A(I) = A(1) ENDDO; DO I = 1, N S1 A(2*I+2) = A(2) ENDDO; DO I = 1, N S1 A(1) = A(I) + A(1) ENDDO.
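A matching C sketch of the weak-zero SIV test (case a2 = 0); function and parameter names are illustrative:

/* Weak-zero SIV test for the pair a1*I + c1 (def) and c2 (use, a2 = 0).
 * Returns 1 if a dependence may exist (solution iteration in *iter),
 * 0 if independence is proven. */
int weak_zero_siv(int a1, int c1, int c2, int L, int U, int *iter) {
    if (a1 == 0) return c1 == c2;      /* both constant: plain ZIV comparison */
    if ((c2 - c1) % a1 != 0) return 0; /* no integer solution */
    int I = (c2 - c1) / a1;
    if (I < L || I > U) return 0;      /* solution lies outside the loop bounds */
    *iter = I;
    return 1;
}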

66 GCD The greatest common divisor (GCD) of integers a1, a2, …, an, denoted gcd(a1, a2, …, an), is the largest integer that evenly divides all these integers. Theorem: the linear Diophantine equation a1 x1 + a2 x2 + … + an xn = c has an integer solution x1, x2, …, xn iff gcd(a1, a2, …, an) divides c.

67 GCD examples
Example 1: gcd(2,-2) = 2, so an equation 2x - 2y = c has no solutions when 2 does not divide c (e.g. c = 1). Example 2: gcd(24,36,54) = 6, so 24x + 36y + 54z = c has (many) solutions whenever 6 divides c.

68 GCD Test Assumes that f() and g() are affine: f(x1,x2,…,xn) = a0+a1x1+…+anxn and g(y1,y2,…,yn) = b0+b1y1+…+bnyn. Reordering gives the linear Diophantine equation a1x1-b1y1 +…+ anxn-bnyn = b0-a0, which has a solution iff gcd(a1,…,an,b1,…,bn) divides b0-a0. Examples: DO I = 1, N S1 A(2*I+1) = A(2*I) ENDDO (gcd 2 does not divide 1: independent); DO I = 1, N S1 A(4*I+1) = A(2*I+3) ENDDO (gcd 2 divides 2: possible dependence); DO I = 1, 10 DO J = 1, 10 S1 A(1+2*I+20*J) = A(2+20*I+2*J) ENDDO ENDDO (gcd 2 does not divide 1: independent).
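A C sketch of the GCD test for a one-dimensional subscript pair; extending it to more index variables means taking the gcd over all coefficients, as in the slide:

static int gcd(int a, int b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* GCD test for a1*I + a0 (def) versus b1*I' + b0 (use): the equation
 * a1*I - b1*I' = b0 - a0 has an integer solution iff gcd(a1, b1)
 * divides b0 - a0.  Loop bounds and directions are not checked.
 * Returns 1 if a dependence may exist, 0 if independence is proven. */
int gcd_test(int a1, int a0, int b1, int b0) {
    int g = gcd(a1, b1);
    if (g == 0) return a0 == b0;       /* both subscripts are constants */
    return (b0 - a0) % g == 0;
}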

69 GCD dependence test example 1
for(int i=0;i<100;i++) { s1 a[2*i]=b[i]; s2 c[i]=a[4*i+6]; } The dependence equation is 2*i1 = 4*i2 + 6. gcd(2,4) = 2, which does divide 6, so there is a solution, e.g. 2*3 = 4*0 + 6. Is this an actual dependence? The GCD test does not check the direction, nor whether the solution is within the loop bounds!

70 GCD dependence test example 2
for (i=0; i< Ni; i++) for (j=0; j<Nj; j++) A(4*i + 2*j + 1) = .. A(6*i + 2*j + 4). The dependence equation is 4*i - 6*i' + 2*j - 2*j' = 4 - 1 = 3. gcd(4,-6,2,-2) = 2; does 2 divide 3? No, so there is no dependence.

71 Banerjee Test for (i=L; i<=U; i++) { x[a_0 + a_1*i] = …; … = x[b_0 + b_1*i]; } Does a_0 + a_1*i = b_0 + b_1*i' for some integers i and i'? If so, then a_1*i - b_1*i' = b_0 - a_0. Determine upper and lower bounds on a_1*i - b_1*i'. Example: for (i=1; i<=5; i++) x[i+5] = x[i]; upper bound = a_1*max(i) - b_1*min(i') = 4; lower bound = a_1*min(i) - b_1*max(i') = -4; b_0 - a_0 = -5 is out of this range, so there is no dependence.
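A C sketch of this single-subscript Banerjee bounds check, restricted to non-negative coefficients a_1 and b_1 (enough for the example); handling general coefficients requires splitting them into positive and negative parts:

/* Banerjee bounds test for x[a0 + a1*i] (def) and x[b0 + b1*i'] (use),
 * i and i' in [L, U], assuming a1 >= 0 and b1 >= 0.
 * Returns 1 if a dependence is possible, 0 if it is ruled out. */
int banerjee_test(int a0, int a1, int b0, int b1, int L, int U) {
    int lo  = a1 * L - b1 * U;         /* minimum of a1*i - b1*i' */
    int hi  = a1 * U - b1 * L;         /* maximum of a1*i - b1*i' */
    int rhs = b0 - a0;
    return rhs >= lo && rhs <= hi;
}

/* Slide example: x[i+5] = x[i], i = 1..5:
 * banerjee_test(5, 1, 0, 1, 1, 5) -> rhs = -5, range [-4, 4], result 0. */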

72 Run-Time Dependence Testing
When a dependence equation includes unknowns, we can sometimes find the constraints on these unknowns under which the equation has no solution, and check these constraints at run time in multi-version code. DO I = 1, 10 S1 A(I) = A(I-K) + B(I) ENDDO becomes IF K = 0 OR K < -9 OR K > 9 THEN PARALLEL DO I = 1, 10 S1 A(I) = A(I-K) + B(I) ENDDO ELSE DO I = 1, 10 S1 A(I) = A(I-K) + B(I) ENDDO ENDIF.
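A C sketch of such multi-version code for the example above; the OpenMP pragma is just one possible way to express the parallel branch, and the array a is assumed to be valid for every index i - k that is touched:

/* Multi-version code: A(I) = A(I-K) + B(I), I = 1..10, is free of
 * loop-carried dependences when K == 0, K < -9 or K > 9. */
void multi_version(float *a, const float *b, int k) {
    if (k == 0 || k < -9 || k > 9) {
        #pragma omp parallel for       /* run-time test passed: parallel version */
        for (int i = 1; i <= 10; i++)
            a[i] = a[i - k] + b[i];
    } else {
        for (int i = 1; i <= 10; i++)  /* sequential fallback */
            a[i] = a[i - k] + b[i];
    }
}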

73 If nothing works: use profiling
Track dependences at run time: you can insert checks in the IR of your compiler and then 'easily' check all dependences, including e.g. A[B[i]] cases and inter-procedural dependences such as: for i=0,N A[i] = f(..), where function f(..) contains for j=0,M A[j] = A[..]. However: when you do not 'see' a dependence during profiling, independence is not guaranteed.

74 What did you learn?

75 Summary A loop can be parallelized or vectorized if it carries no inter-iteration dependences. The GCD test can check whether two affine index expressions can be equal; it does not check the dependence directions, and it does not check whether the solution is within the loop bounds. More advanced tests are needed; they can take more computation time, and a compiler can iteratively apply more elaborate tests to prove independence.

76 Backup slides

77 Banerjee Test (1) f() and g() are affine: f(x1,x2,…,xn) = a0+a1x1+…+anxn and g(y1,y2,…,yn) = b0+b1y1+…+bnyn. The dependence equation a0-b0+a1x1-b1y1 +…+anxn-bnyn = 0 has no solution if 0 does not lie between the lower bound L and upper bound U of the equation's left-hand side. Example: IF M > 0 THEN DO I = 1, 10 S1 A(I) = A(I+M) + B(I) ENDDO. The dependence equation α = β + M is rewritten as -M + α - β = 0, with constraints 1 ≤ M ≤ ∞, 0 ≤ α ≤ 9, 0 ≤ β ≤ 9.

78 Banerjee Test (2) Test for a (*) dependence in the example: -M + α - β = 0 with constraints 1 ≤ M ≤ ∞, 0 ≤ α ≤ 9, 0 ≤ β ≤ 9. Bounding the left-hand side step by step (L = lower bound, U = upper bound): original equation: -M + α - β; eliminate β: L = -M + α - 9, U = -M + α - 0; eliminate α: L = -M - 9, U = -M + 9; eliminate M: L = -∞, U = 8. Possible dependence, because 0 lies within [-∞, 8].

79 Banerjee Test (3) Test for a (<) dependence: assume α < β, i.e. α + 1 ≤ β. Constraints: 1 ≤ M ≤ ∞, 0 ≤ α ≤ β-1 ≤ 8, 1 ≤ α+1 ≤ β ≤ 9. Bounding -M + α - β: original equation: -M + α - β; eliminate β: L = -M + α - 9, U = -M + α - (α+1); eliminate α: L = -M - 9, U = -M - 1; eliminate M: L = -∞, U = -2. No dependence, because 0 does not lie within [-∞, -2].

80 Banerjee Test (4) Test for a (>) dependence: assume α > β, i.e. β + 1 ≤ α. Constraints: 1 ≤ M ≤ ∞, 1 ≤ β+1 ≤ α ≤ 9, 0 ≤ β ≤ α-1 ≤ 8. Bounding -M + α - β: original equation: -M + α - β; eliminate β: L = -M + α - (α-1), U = -M + α - 0; eliminate α: L = -M + 1, U = -M + 9; eliminate M: L = -∞, U = 8. Possible dependence, because 0 lies within [-∞, 8].

81 Banerjee Test (5) Test for an (=) dependence: assume α = β. Constraints: 1 ≤ M ≤ ∞, 0 ≤ α = β ≤ 9. Bounding -M + α - β: original equation: -M + α - β; eliminate β (set β = α): -M + α - α; eliminate α: -M; eliminate M: L = -∞, U = -1. No dependence, because 0 does not lie within [-∞, -1].

82 Testing for All Direction Vectors
Refine the dependence between a pair of statements by testing for the most general set of direction vectors, and upon failure refine each component recursively. Example hierarchy: (*,*,*) refines into (=,*,*), (<,*,*), (>,*,*); (<,*,*) refines into (<,=,*), (<,<,*), (<,>,*); (<,=,*) refines into (<,=,=), (<,=,<), (<,=,>). Each node is tested: an answer “yes (may depend)” leads to further refinement, “no (indep)” prunes that branch.

83 Exact analysis Most memory disambiguation problems are simple integer programs. Approach: solve them exactly (a solution exists, or it does not) using Fourier-Motzkin elimination plus branch and bound; see the Omega package from the University of Maryland.

