1 Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops Nikolaos Drosinos and Nectarios Koziris National Technical University of Athens Computing Systems Laboratory

2 Oslo, June 15, 2005 ICPP-HPSEC Motivation
- fully permutable loops are always a computational challenge for HPC
- hybrid parallelization is attractive for DSM architectures
- currently, popular free message passing libraries provide limited multi-threading support
- SPMD hybrid parallelization suffers from intrinsic load imbalance

3 Contribution
- two static thread load balancing schemes (constant and variable) for coarse-grain funneled hybrid parallelization of fully permutable loops
  - generic
  - simple to implement
- experimental evaluation against micro-kernel benchmarks of different programming models
  - message passing
  - fine-grain hybrid
  - coarse-grain hybrid (unbalanced, balanced)

4 Algorithmic model
foracross tile 1 do
  …
    foracross tile N do
      for tile n-1 do
        Receive(tile);
        Compute(A,tile);
        Send(tile);
Restrictions:
- fully permutable loops
- unitary inter-process dependencies

5 Message passing parallelization
- tiling transformation
- (overlapped?) computation and communication phases
- pipelined execution
- portable
- scalable
- highly optimized
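
The pipelined execution named on this slide can be illustrated with a small timing model. This is only a sketch under stated assumptions, not the authors' implementation: the function name `pipeline_schedule` and the parameters `t_comp` and `t_comm` are hypothetical, processes form a one-dimensional chain, every tile costs the same, and communication is modeled as a pure delay on the receiving side.

```python
def pipeline_schedule(num_procs, tiles_per_proc, t_comp=1.0, t_comm=0.0):
    """Return finish[p][k], the time at which process p finishes tile k.

    Assumed model (not taken from the slides): process p may start tile k
    only after it has finished its own tile k-1 and after process p-1 has
    finished tile k and its boundary data has arrived (t_comm later).
    """
    finish = [[0.0] * tiles_per_proc for _ in range(num_procs)]
    for p in range(num_procs):
        for k in range(tiles_per_proc):
            # Receive(tile): wait for the predecessor's data, if any.
            data_ready = finish[p - 1][k] + t_comm if p > 0 else 0.0
            # The process itself must be done with its previous tile.
            own_ready = finish[p][k - 1] if k > 0 else 0.0
            # Compute(A, tile); Send(tile) is folded into t_comm above.
            finish[p][k] = max(own_ready, data_ready) + t_comp
    return finish


if __name__ == "__main__":
    schedule = pipeline_schedule(num_procs=4, tiles_per_proc=10)
    print("makespan:", schedule[-1][-1])
```

With zero communication cost and unit tiles, the last process finishes after `num_procs + tiles_per_proc - 1` steps, which is the usual fill-and-drain behaviour of a pipeline.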

6 Hybrid parallelization
So… why bother?

7 Hybrid parallelization: why bother I
shared memory programming model vs. message passing programming model for shared memory architecture

8 Hybrid parallelization: why bother II
DSM architectures are popular!

9 Fine-grain hybrid parallelization
- incremental parallelization of loops
- relatively easy to implement
- popular
- Amdahl’s law restricts parallel efficiency
- overhead of thread structures re-initialization
- restrictive programming model for many applications
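
The Amdahl's law point can be made concrete with a short calculation. The sketch below uses the standard form of the law; the function names and the example fractions are illustrative assumptions, not figures from the talk. In a fine-grain hybrid code, whatever runs outside the threaded loops (for example communication issued by a single thread) counts toward the serial fraction.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def parallel_efficiency(parallel_fraction, threads):
    """Speedup divided by the number of threads (1.0 is ideal)."""
    return amdahl_speedup(parallel_fraction, threads) / threads


if __name__ == "__main__":
    # Hypothetical fractions, chosen only to show the trend.
    for fraction in (1.0, 0.9, 0.5):
        print(fraction, round(parallel_efficiency(fraction, threads=2), 3))
```

Efficiency drops below 1.0 as soon as any part of the process's work stays outside the parallel regions, which is the restriction the slide refers to.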

10 Coarse-grain hybrid parallelization
- generic SPMD programming style
- good parallelization efficiency
- no thread re-initialization overhead
- more difficult to implement
- intrinsic load imbalance assuming common funneled thread support level

11 MPI thread support levels
- single
- masteronly
- funneled
- serialized
- multiple
[Figure labels: fine-grain hybrid, coarse-grain hybrid; alternating comm and comp phases]

12 Load balancing
Idea
Consequence: master thread assumes a smaller fraction of the process tile computational load compared to other threads

13 Load balancing (2)
T: total number of threads
p: current process id
Assuming
It follows

14 Load balancing (3)
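
The formulas on the load balancing slides did not survive in this transcript, so the following is not the authors' derivation. It is a minimal sketch of the stated idea under an assumed linear cost model: the master thread alone pays the communication time `t_comm`, the process tile costs `t_comp` in total, and the master's share `f` is chosen so that all `T` threads finish together, i.e. `f * t_comp + t_comm == (1 - f) / (T - 1) * t_comp`. All names here are hypothetical.

```python
def balanced_master_fraction(t_comp, t_comm, threads):
    """Share of the process tile's computation given to the master thread.

    Solves f * t_comp + t_comm == (1 - f) / (threads - 1) * t_comp for f,
    which gives f = 1/T - (T - 1)/T * (t_comm / t_comp). The result is
    clamped at 0 when communication alone outweighs a worker's share.
    """
    if threads < 2:
        raise ValueError("balancing needs at least two threads")
    f = 1.0 / threads - (threads - 1) / threads * (t_comm / t_comp)
    return max(f, 0.0)


def worker_fraction(master_fraction, threads):
    """Share of the computation given to each non-master thread."""
    return (1.0 - master_fraction) / (threads - 1)


if __name__ == "__main__":
    f = balanced_master_fraction(t_comp=10.0, t_comm=2.0, threads=2)
    print("master:", f, "worker:", worker_fraction(f, 2))
```

With no communication the master gets the even share `1/T`; as `t_comm` grows, its share shrinks, matching the consequence stated on the slide. The slide's mention of the current process id `p` suggests that the variable scheme evaluates the balance per process; in this sketch that would amount to passing a per-process `t_comm`, which is again an assumption.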

15 Experimental Results
- 8-node dual SMP Linux Cluster (800 MHz PIII, 256 MB RAM, kernel )
- MPICH v ( --with-device=ch_p4, --with-comm=shared, P4_SOCKBUFSIZE=104KB )
- Intel C++ compiler 8.1 ( -O3 -static -mcpu=pentiumpro )
- Fast Ethernet interconnection network

16 Alternating Direction Implicit (ADI)
- Stencil computation used for solving partial differential equations
- Unitary data dependencies
- 3D iteration space (X × Y × Z)
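
To show what unitary data dependencies in a 3D iteration space imply, here is a toy loop nest. It is not the ADI kernel, whose update formula is not given in this transcript; the update rule, the function name `sweep`, and the zero halo are assumptions for illustration. Each point depends only on its predecessors at distance one along each axis, so the three loops can be run in any order with the same result, which is what makes the nest fully permutable.

```python
from itertools import permutations


def sweep(shape, order):
    """Run a 3D loop nest with unit dependencies in the given loop order.

    `order` is a permutation of (0, 1, 2) naming which axis is the
    outermost, middle, and innermost loop. Index 0 on each axis is a
    halo of zeros, so the loops start at 1.
    """
    X, Y, Z = shape
    A = [[[0] * (Z + 1) for _ in range(Y + 1)] for _ in range(X + 1)]
    extents = (X, Y, Z)
    outer, middle, inner = order
    idx = [0, 0, 0]
    for a in range(1, extents[outer] + 1):
        idx[outer] = a
        for b in range(1, extents[middle] + 1):
            idx[middle] = b
            for c in range(1, extents[inner] + 1):
                idx[inner] = c
                i, j, k = idx
                # Toy update: depends on (i-1, j, k), (i, j-1, k), (i, j, k-1).
                A[i][j][k] = A[i - 1][j][k] + A[i][j - 1][k] + A[i][j][k - 1] + 1
    return A


if __name__ == "__main__":
    results = [sweep((3, 4, 5), order) for order in permutations(range(3))]
    print(all(r == results[0] for r in results))
```

Integer arithmetic is used so that the comparison across loop orders is exact.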

17 ADI

18 Synthetic benchmark

19 Conclusions
- fine-grain hybrid parallelization inefficient
- unbalanced coarse-grain hybrid parallelization also inefficient
- balancing improves hybrid model performance
- variable balanced coarse-grain hybrid model most efficient approach overall
- relative performance improvement increases for higher communication vs. computation needs

20 Thank You! Questions?

21 ADI

22 Synthetic benchmark

