
Ohio State Univ Effective Automatic Parallelization of Stencil Computations * Sriram Krishnamoorthy 1 Muthu Baskaran 1, Uday Bondhugula 1, Atanas Rountev.


1 Ohio State Univ
Effective Automatic Parallelization of Stencil Computations*
Sriram Krishnamoorthy¹, Muthu Baskaran¹, Uday Bondhugula¹, Atanas Rountev¹, J. Ramanujam², P. Sadayappan¹
¹ The Ohio State University; ² Louisiana State University
* Work supported by NSF

2 Introduction
- Stencil computations
  - Sweep through a large data set
  - Multiple time iterations
  - Simple, load-balanced schedule
- Tiling: essential to improve data locality
  - Dependences between tiles force pipelined execution
  - Skewed iteration spaces cause load imbalance
- Solution: adjust the tiling to re-enable concurrent execution

3 Motivation
FOR t = 0 TO T-1
  FOR i = 1 TO N-1
    A[t+1,i] = (A[t,i-1] + A[t,i] + A[t,i+1]) / 3
(Figure: iteration space with time axis t and space axis i)

4 Notation
- Iteration space B: an n-dimensional polyhedron
- Dependences D: n-dimensional vectors
- Hyperplanes H: n-dimensional normal vectors
- Each tile is bounded by pairs of parallel hyperplanes

5 Approach
- Identify concurrent start in the non-tiled iteration space
- Identify the hyperplanes that inhibit concurrent start in the tiled space
- Replace one face of each inhibiting pair:
  - Overlapped tiling: replace the “back face”
  - Split tiling: replace the “front face”

6 Concurrent Start: Before Tiling
Condition: a boundary that does not carry any dependence

7 Inter-tile Dependences
- Shift vectors
  - Define the tile traversal order
  - Each is normal to all other hyperplanes
- A hyperplane carries a dependence when the dependence “pokes” through it
- If an inter-tile dependence vector is a shift vector, the corresponding hyperplane carries the dependence

8 Concurrent Start Inhibition
- The original iteration space has concurrent start along a boundary
- But in the tiled space, that boundary carries an inter-tile dependence
Condition: a boundary has concurrent start, and S_j is an inter-tile dependence, so that boundary carries an inter-tile dependence

9 Companion Hyperplane
- A hyperplane that destroys the inter-tile dependence
- Obtained by swiveling a hyperplane “backward”
- Dependences carried by the original hyperplane are “neutralized”:
  - Incoming dependences become non-incoming
  - Outgoing dependences become non-outgoing

10 Overlapped Tiling
- Replace the “back face” with the companion hyperplane
- The additional region is shared with the preceding tile: it is the region of the preceding tile that caused the dependence
- Each new tile is independent of the preceding tile (“do-all” parallelism)
- Cost: increased computation and communication volume

11 Split Tiling
- Replace the “front face” with the companion hyperplane
- Each tile is split into an independent region and a dependent region
- Execute the independent region, followed by the dependent region
- Cost: increased number of communications

12 Experimental Evaluation
- Cluster: 2.8 GHz dual-processor Opteron 254, 1 MB L2 cache, 4 GB RAM
- Linux 2.6.9; Intel compiler (icc) -O3
- Comparison: two pipelined schedules, along space and along time
- 1000 time steps; 1 to 32 processors

13 Pipelined Execution: Parameters
- Space tile size: 1000
- Time tile size: 16
- 64000 elements; 32 processors

14 Performance with Problem Size

15 Weak Scaling
- Problem size = number of processors × 20000
- Horizontal line: linear scaling

16 Conclusion
- Time tiling of stencils is crucial for data locality
- However, it might inhibit concurrent execution
- Presented: two approaches to enabling concurrent execution
- Ongoing work: modeling the relative benefits of the two approaches

17 Thank You!

