The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization Lawrence Rauchwerger and David A. Padua PLDI.

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization Lawrence Rauchwerger and David A. Padua PLDI 1995 Presented by Seung-Jai Min

Introduction
Motivation: Current parallelizing compilers cannot handle access patterns that are complex or insufficiently defined at compile time (input-dependent patterns, run-time-dependent conditions, subscripted subscripts, etc.).
LRPD Test
- speculatively executes the loop as a doall
- applies a fully parallel test for cross-iteration data dependences
- if the test fails, re-executes the loop serially

Inspector-Executor Method
Inspector/Executor
- the inspector extracts and analyzes the memory access pattern
- the executor transforms the loop if necessary and executes it
Disadvantages
- cost and side effects: the inspector is expensive, or has side effects, if the address computation of the array under test depends on the actual data computation
- parallel execution of the inspector loop is not always possible
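A minimal sketch of the inspector/executor idea, assuming a loop of the form A[L[i]] = A[K[i]] + 1 whose index arrays K and L are known before the loop runs. The function names and the loop body are hypothetical and chosen only for illustration; the reversed iteration order merely stands in for a parallel schedule.

```python
def inspector(K, L):
    """Return True if no element is written in one iteration and
    read or written in a different iteration."""
    writer = {}
    for i, j in enumerate(L):
        if j in writer:          # two iterations write the same element
            return False
        writer[j] = i
    # Every read must touch an element written by the same iteration or by none.
    return all(writer.get(j, i) == i for i, j in enumerate(K))


def inspector_executor(A, K, L):
    """Inspect the access pattern first, then pick an execution strategy."""
    if inspector(K, L):
        # Iterations are independent, so any order is valid.
        for i in reversed(range(len(K))):
            A[L[i]] = A[K[i]] + 1
        return "parallel"
    for i in range(len(K)):      # dependences found: keep the original order
        A[L[i]] = A[K[i]] + 1
    return "serial"
```

This only works because K and L do not depend on values computed inside the loop, which is exactly the limitation listed above.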

Speculative run-time parallelization
[Figure: flow diagram. Compile time (Polaris): static analysis, run-time transformations. Run time: checkpoint, speculative parallel execution, test (pass/fail), restore, sequential execution. Other labels: heuristic, reorder.]

Hazards (during the speculative execution)
Exceptions
- invalidate the parallel execution
- clear the exception flag, restore the values of any altered variables, and execute the loop serially
Cross-iteration dependences in the loop
- detected by the LRPD test
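The checkpoint, speculate, test, and restore steps can be sketched as a small driver. This is an illustration under stated assumptions, not the paper's implementation: the function name, the dict-based state, and the use of ArithmeticError as a stand-in for a run-time exception are all hypothetical, and the "parallel" body is simply called as an ordinary function.

```python
import copy


def run_speculatively(state, parallel_body, serial_body, passes_test):
    """Try the speculative doall; fall back to serial execution on failure."""
    checkpoint = copy.deepcopy(state)   # save every value the loop may alter
    try:
        parallel_body(state)            # speculative execution
        if passes_test(state):          # run-time dependence test
            return "parallel"
    except ArithmeticError:             # stand-in for an exception hazard
        pass
    state.clear()
    state.update(checkpoint)            # restore the altered variables
    serial_body(state)                  # re-execute the loop serially
    return "serial"
```

The cost of the checkpoint and of the serial re-execution is paid only when the speculation fails.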

LPD Test (The Lazy Privatizing doall Test)
1. Marking Phase (performed during the speculative execution)
- For each shared array A[1:s], declare read, write, and not-privatizable shadow arrays A_r[1:s], A_w[1:s], and A_np[1:s].
(a) Uses: if this array element has not been modified in this iteration, set the corresponding elements in A_r and A_np.
(b) Defs: set the corresponding element in A_w, and clear it in A_r if set.
(c) tw_i(A): count the total number of write accesses to A that are marked in this iteration (i: iteration number).
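The marking rules can be sketched as a small class. The names Shadow, mark_read, mark_write, and end_iteration are hypothetical. The sketch assumes iterations are simulated one at a time, that each iteration's marks are merged into the shadow arrays with a logical OR when the iteration ends, and that tw_i counts each element at most once per iteration.

```python
class Shadow:
    """Shadow arrays for one shared array A[1:s] (index 0 is unused)."""

    def __init__(self, s):
        self.w = [0] * (s + 1)    # A_w: written in some iteration
        self.r = [0] * (s + 1)    # A_r: read but not written in some iteration
        self.np = [0] * (s + 1)   # A_np: read before being written in some iteration
        self.tw = 0               # running sum of tw_i(A)
        self._written = set()     # elements written in the current iteration
        self._read_only = set()   # elements read, not yet written, in the current iteration

    def mark_read(self, j):
        # (a) Use: counts only if j has not been written in this iteration.
        if j not in self._written:
            self._read_only.add(j)
            self.np[j] = 1

    def mark_write(self, j):
        # (b) Def: mark the write and drop this iteration's read mark, if any.
        # (c) tw_i: count the element once per iteration (assumption).
        if j not in self._written:
            self._written.add(j)
            self.tw += 1
        self._read_only.discard(j)

    def end_iteration(self):
        # Merge this iteration's marks into the shadow arrays.
        for j in self._written:
            self.w[j] = 1
        for j in self._read_only:
            self.r[j] = 1
        self._written.clear()
        self._read_only.clear()
```

Keeping the per-iteration sets separate from the merged arrays is what prevents a write in one iteration from clearing a read mark left by another.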

LPD Test (The Lazy Privatizing doall Test)
2. Analysis Phase (performed after the speculative execution)
(a) Compute
(i) tw(A) = sum over all iterations i of tw_i(A)
(ii) tm(A) = sum(A_w[1:s])
(iii) if tm(A) != tw(A), there is a cross-iteration output dependence
(b) If any(A_w[:] & A_r[:]), then the loop is not a doall and the phase ends: a def and a use of the same location occur in different iterations (flow/anti dependence).

LPD Test (The Lazy Privatizing doall Test)
2. Analysis Phase (performed after the speculative execution)
(c) Else if tw(A) == tm(A), then the loop is a doall (without privatizing the array A).
(d) Else if any(A_w[:] & A_np[:]), then the array A is not privatizable (there is at least one iteration in which some element of A was used before it was modified).
(e) Otherwise, the loop was made into a doall by privatizing the shared array A.
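Steps (a) through (e) of the analysis phase can be written as one function over the shadow arrays. This is a sketch: the function name and the returned strings are hypothetical labels, and the shadow arrays are passed as 0/1 lists covering elements 1 to s.

```python
def lpd_analysis(a_w, a_r, a_np, tw):
    """Classify a speculatively executed loop from its shadow arrays."""
    tm = sum(a_w)                                   # (a)(ii) distinct elements written
    if any(w and r for w, r in zip(a_w, a_r)):      # (b) flow/anti dependence
        return "not doall: flow/anti dependence"
    if tw == tm:                                    # (c) no output dependence
        return "doall"
    if any(w and n for w, n in zip(a_w, a_np)):     # (d) used before modified
        return "not doall: not privatizable"
    return "doall with privatization"               # (e)


# Shadow-array values from the PD and LPD example slides (tw = 3, tm = 2).
pd_result = lpd_analysis([0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 1], tw=3)
lpd_result = lpd_analysis([0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0], tw=3)
```

With those values the PD marks are rejected at step (d), while the lazily marked LPD arrays reach step (e).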

Dynamic dead reference elimination
To avoid introducing false dependences, the marking of the read and not-privatizable shadow arrays, A_r and A_np, can be postponed until the value of the shared variable is actually used.
Definition: A dynamic dead read reference in a loop is a read access of a shared variable that does not contribute to the computation of any other shared variable which is live at loop end.
The "lazy" marking employed by the LPD test, i.e., the dynamic dead reference elimination technique, allows it to qualify more loops than the PD test.

PD Test

Source loop:
    do i = 1, 5
      z = A(K(i))
      if (B1(i) .eq. .true.) then
        A(L(i)) = z + C(i)
      endif
    enddo

Loop with marking:
    do i = 1, 5
      markread(K(i))
      z = A(K(i))
      if (B1(i) .eq. .true.) then
        markwrite(L(i))
        A(L(i)) = z + C(i)
      endif
    enddo

    B1(1:5) = (1 0 1 0 1)
    K(1:5)  = (1 2 3 4 1)
    L(1:5)  = (2 2 4 4 2)

    PD test             Shadow arrays    tw  tm
                        1  2  3  4
    A_w
    A_r                 1  1  1  1
    A_np                1  1  1  1
    A_w(:) & A_r(:)
    A_w(:) & A_np(:)

PD Test

Source loop:
    do i = 1, 5
      z = A(K(i))
      if (B1(i) .eq. .true.) then
        A(L(i)) = z + C(i)
      endif
    enddo

Loop with marking:
    do i = 1, 5
      markread(K(i))
      z = A(K(i))
      if (B1(i) .eq. .true.) then
        markwrite(L(i))
        A(L(i)) = z + C(i)
      endif
    enddo

    B1(1:5) = (1 0 1 0 1)
    K(1:5)  = (1 2 3 4 1)
    L(1:5)  = (2 2 4 4 2)

    PD test             Shadow arrays    tw  tm
                        1  2  3  4
    A_w                 0  1  0  1       3   2
    A_r                 1  0  1  0
    A_np                1  1  1  1
    A_w(:) & A_r(:)     0  0  0  0
    A_w(:) & A_np(:)    0  1  0  1

LPD Test

Source loop:
    do i = 1, 5
      z = A(K(i))
      if (B1(i) .eq. .true.) then
        A(L(i)) = z + C(i)
      endif
    enddo

Loop with lazy marking (the read is marked only where z is used):
    do i = 1, 5
      z = A(K(i))
      if (B1(i) .eq. .true.) then
        markread(K(i))
        markwrite(L(i))
        A(L(i)) = z + C(i)
      endif
    enddo

    B1(1:5) = (1 0 1 0 1)
    K(1:5)  = (1 2 3 4 1)
    L(1:5)  = (2 2 4 4 2)

    LPD test            Shadow arrays    tw  tm
                        1  2  3  4
    A_w                 0  1  0  1       3   2
    A_r                 1  0  1  0
    A_np                1  0  1  0
    A_w(:) & A_r(:)     0  0  0  0
    A_w(:) & A_np(:)    0  0  0  0
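The LPD example can be replayed in a few lines to check the table by hand. This is a sketch under assumptions: iterations are simulated sequentially, per-iteration marks are merged with a logical OR, tw_i counts each element once per iteration, and the function name and returned verdict strings are hypothetical.

```python
def run_lpd_example():
    """Replay the lazy marking of the example loop and analyze the result."""
    B1 = [None, 1, 0, 1, 0, 1]   # 1-based data from the slide, index 0 unused
    K = [None, 1, 2, 3, 4, 1]
    L = [None, 2, 2, 4, 4, 2]
    s = 4
    a_w, a_r, a_np = [0] * (s + 1), [0] * (s + 1), [0] * (s + 1)
    tw = 0
    for i in range(1, 6):
        written, read_only = set(), set()    # marks of this iteration only
        if B1[i]:
            # markread(K(i)): lazy, reached only when z is actually used
            if K[i] not in written:
                read_only.add(K[i])
                a_np[K[i]] = 1
            # markwrite(L(i))
            if L[i] not in written:
                written.add(L[i])
                tw += 1
            read_only.discard(L[i])
        for j in written:
            a_w[j] = 1
        for j in read_only:
            a_r[j] = 1
    tm = sum(a_w)
    if any(w and r for w, r in zip(a_w, a_r)):
        verdict = "not doall"
    elif tw == tm:
        verdict = "doall"
    elif any(w and n for w, n in zip(a_w, a_np)):
        verdict = "not doall"
    else:
        verdict = "doall with privatization"
    return {"A_w": a_w[1:], "A_r": a_r[1:], "A_np": a_np[1:],
            "tw": tw, "tm": tm, "verdict": verdict}


result = run_lpd_example()
print(result["verdict"])   # prints "doall with privatization"
```

The reads in iterations 2 and 4 are never marked because z is dead there, which is why A_np stays clear for elements 2 and 4 and the loop qualifies once A is privatized.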

Run-time Reduction Parallelization
Reduction parallelization = recognition of the reduction variable + parallelization of the reduction variable
Pattern-matching identification
- The data dependence (DD) test that qualifies a statement as a reduction statement cannot be performed statically in the presence of input-dependent access patterns.
- Syntactic pattern matching cannot identify all potential reduction variables (e.g., with subscripted subscripts).

The LRPD Test: Extending the LPD Test for Reduction Validation

(a) Source program
    do i = 1, n
    S1: A(K(i)) = ………
    S2: ……… = A(L(i))
    S3: A(R(i)) = A(R(i)) + exp()
    enddo

(b) Transformed program
    doall i = 1, n
        markwrite(K(i))
        markredux(K(i))
    S1: A(K(i)) = ………
        markread(L(i))
        markredux(L(i))
    S2: ……… = A(L(i))
        markwrite(R(i))
    S3: A(R(i)) = A(R(i)) + exp()
    enddo

The markredux operation sets the corresponding element of the shadow array A_nx to true.
A_nx records that an element was accessed outside the reduction statement, so the test only has to check that a reduction variable is not accessed outside the single reduction statement.

LRPD Test
Modified Analysis Pass
- 2(d') Else if any(A_w[:] & A_np[:] & A_nx[:]), then some element of A written in the loop is neither a reduction variable nor privatizable. Thus, the loop is not a doall and the phase ends.
- 2(e') Otherwise, the loop was made into a doall by parallelizing reductions and by privatization.
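A sketch of the modified analysis pass, assuming that steps 2(a) through 2(c) carry over unchanged from the LPD test and that only 2(d) and 2(e) are replaced. The function name and the returned strings are hypothetical; the shadow arrays are passed as 0/1 lists.

```python
def lrpd_analysis(a_w, a_r, a_np, a_nx, tw):
    """LPD analysis with the reduction check 2(d') and 2(e')."""
    tm = sum(a_w)
    if any(w and r for w, r in zip(a_w, a_r)):              # 2(b)
        return "not doall"
    if tw == tm:                                            # 2(c)
        return "doall"
    # 2(d'): written, not privatizable, and accessed outside the reduction
    if any(w and n and x for w, n, x in zip(a_w, a_np, a_nx)):
        return "not doall"
    return "doall with privatization and reduction"         # 2(e')


# One element written in three iterations and read before written,
# but never touched outside the reduction statement (A_nx clear).
valid_reduction = lrpd_analysis([1, 0], [0, 0], [1, 0], [0, 0], tw=3)
# Same marks, but the element is also accessed outside the reduction.
invalid_reduction = lrpd_analysis([1, 0], [0, 0], [1, 0], [1, 0], tw=3)
```

The only difference between the two calls is the A_nx bit, which is what turns a rejected loop into a valid parallel reduction.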

Performance (1)

Performance (2)

Experimental Results Summary

Other Run-time Parallelization Papers
"Techniques for Speculative Run-Time Parallelization of Loops", Manish Gupta and Rupesh Nim, SC'98.
- More efficient run-time array privatization
- No rollback of the entire loop computation: the loop is completed by generating synchronization
- Early hazard detection

Other Run-time Parallelization Papers
"Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors", Ye Zhang, Lawrence Rauchwerger, and Josep Torrellas, HPCA 1998.
- Run-time parallelization techniques are often computationally expensive and not general enough.
- Idea: execute the code in parallel speculatively and let extended cache coherence protocol hardware detect any dependence violations.
- Performance: speedup of 7.3 on 16 processors, and 50% faster than the software-only approach
