
1 Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops
Nikolaos Drosinos and Nectarios Koziris
National Technical University of Athens, Computing Systems Laboratory
{ndros,nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr

2 Motivation (Oslo, June 15, 2005, ICPP-HPSEC 2005)
- fully permutable loops: always a computational challenge for HPC
- hybrid parallelization is attractive for DSM architectures
- currently, popular free message-passing libraries provide limited multi-threading support
- SPMD hybrid parallelization suffers from intrinsic load imbalance

3 Contribution
- two static thread load balancing schemes (constant and variable) for coarse-grain funneled hybrid parallelization of fully permutable loops
  - generic
  - simple to implement
- experimental evaluation against micro-kernel benchmarks of different programming models
  - message passing
  - fine-grain hybrid
  - coarse-grain hybrid (unbalanced, balanced)

4 Algorithmic model
foracross tile_1 do
  …
    foracross tile_{N-1} do
      for tile_N do
        Receive(tile);
        Compute(A, tile);
        Send(tile);
Restrictions:
- fully permutable loops
- unitary inter-process dependencies
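To make the model concrete, the C sketch below simulates it for a 2D iteration space in a single address space. The kernel `compute_point`, the sizes `NX`, `NY`, `NPROC`, `TILE_Y`, and the function names are hypothetical; there is no MPI here, so Receive/Send appear only as comments and the pipeline ordering alone guarantees that each tile's inputs are ready. It checks that tiled, pipelined execution reproduces the untiled loop nest, which is what full permutability with unitary dependencies allows.

```c
#include <assert.h>
#include <string.h>

#define NX 8      /* hypothetical extent of the distributed dimension i */
#define NY 12     /* hypothetical extent of the pipelined dimension j */
#define NPROC 4   /* hypothetical number of processes along i */
#define TILE_Y 3  /* hypothetical tile height along j */

/* Hypothetical kernel with unitary dependencies (1,0) and (0,1). */
static void compute_point(long A[NX + 1][NY + 1], int i, int j) {
    A[i][j] = A[i - 1][j] + A[i][j - 1] + 1;
}

/* Reference: the untiled sequential loop nest. Row 0 and column 0 stay zero. */
void run_sequential(long A[NX + 1][NY + 1]) {
    memset(A, 0, sizeof(long) * (NX + 1) * (NY + 1));
    for (int i = 1; i <= NX; i++)
        for (int j = 1; j <= NY; j++)
            compute_point(A, i, j);
}

/* Tiled, pipelined order: process p owns rows [1 + p*rows, (p+1)*rows] and
   walks its tiles sequentially; at pipeline step s it executes tile t = s - p. */
void run_pipelined(long A[NX + 1][NY + 1]) {
    const int rows = NX / NPROC;
    const int tiles = NY / TILE_Y;
    assert(NX % NPROC == 0 && NY % TILE_Y == 0);
    memset(A, 0, sizeof(long) * (NX + 1) * (NY + 1));
    for (int s = 0; s < NPROC + tiles - 1; s++) {
        for (int p = 0; p < NPROC; p++) {  /* foracross: tiles of one step are independent */
            int t = s - p;
            if (t < 0 || t >= tiles)
                continue;                  /* process p idle during pipeline fill/drain */
            /* Receive(tile): tile (p-1, t) was finished at step s-1 */
            for (int i = 1 + p * rows; i <= (p + 1) * rows; i++)
                for (int j = 1 + t * TILE_Y; j <= (t + 1) * TILE_Y; j++)
                    compute_point(A, i, j);
            /* Send(tile): row (p+1)*rows is now available to process p+1 */
        }
    }
}
```

Running both functions on separate arrays and comparing them with `memcmp` should show identical contents.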

5 Message passing parallelization
- tiling transformation
- computation and communication phases (possibly overlapped)
- pipelined execution
Advantages: portable, scalable, highly optimized
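A rough cost model helps show why pipelining pays off and why overlap matters. The sketch below is an idealized assumption, not taken from the slides: with `P` processes each executing `K` tiles, the pipeline needs `P + K - 1` steps, and each step costs computation plus communication when the phases are serialized, or the larger of the two under perfect overlap. Function names are illustrative.

```c
#include <assert.h>

/* Pipeline steps for P processes that each execute K tiles in order, where
   tile k of process p can start only after tile k of process p-1. */
long pipeline_steps(long P, long K) {
    assert(P >= 1 && K >= 1);
    return P + K - 1;
}

/* Idealized completion time: each step costs t_comp + t_comm when the phases
   are serialized, or max(t_comp, t_comm) under perfect overlap (an assumption). */
double pipeline_time(long P, long K, double t_comp, double t_comm, int overlapped) {
    double step = overlapped ? (t_comp > t_comm ? t_comp : t_comm)
                             : t_comp + t_comm;
    return (double)pipeline_steps(P, K) * step;
}
```

For example, 4 processes with 10 tiles each need 13 steps; with 2 units of computation and 1 of communication per tile, that is 39 units serialized and 26 under perfect overlap.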

6 Hybrid parallelization
So… why bother?

7 Hybrid parallelization: why bother? (I)
- shared-memory programming model vs. message-passing programming model for shared-memory architectures

8 Hybrid parallelization: why bother? (II)
- DSM architectures are popular!

9 Fine-grain hybrid parallelization
- incremental parallelization of loops
- relatively easy to implement
- popular
- Amdahl's law restricts parallel efficiency
- overhead of thread structure re-initialization
- restrictive programming model for many applications
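Amdahl's law quantifies the first drawback: if only a fraction f of a tile's work runs inside thread-parallel loops (communication and other code outside them stays serial), the speedup on n threads is 1 / ((1 - f) + f / n). A minimal C sketch; the function names are illustrative.

```c
#include <assert.h>

/* Amdahl's law: speedup on n threads when a fraction f of the work is parallel. */
double amdahl_speedup(double f, int n) {
    assert(f >= 0.0 && f <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - f) + f / n);
}

/* Parallel efficiency: speedup divided by the number of threads. */
double amdahl_efficiency(double f, int n) {
    return amdahl_speedup(f, n) / n;
}
```

For example, with f = 0.9 on a dual-processor node (n = 2) the speedup is about 1.82, i.e. roughly 91% efficiency.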

10 Coarse-grain hybrid parallelization
- generic SPMD programming style
- good parallelization efficiency
- no thread re-initialization overhead
- more difficult to implement
- intrinsic load imbalance, assuming the common funneled thread support level
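The imbalance under the funneled level can be illustrated with a simple model (an assumption for illustration, not the authors' cost model): a process tile needs C time units of computation on one thread, the work is split evenly among T threads, and only the master thread performs the M time units of MPI communication, without overlap. The master then finishes last, and every other thread idles for M per tile.

```c
#include <assert.h>

/* Hypothetical funneled-model cost: computation C (single-thread time) split
   evenly over T threads; only the master performs communication M, with no
   overlap. The tile ends when the slowest thread (the master) finishes. */
double unbalanced_tile_time(int T, double C, double M) {
    assert(T >= 1 && C >= 0.0 && M >= 0.0);
    return C / T + M;
}

/* Total idle time of the T - 1 non-master threads during one tile. */
double unbalanced_idle_time(int T, double C, double M) {
    double worker_busy = C / T;
    double worker_idle = unbalanced_tile_time(T, C, M) - worker_busy;
    return (T - 1) * worker_idle;
}
```

With T = 2, C = 100 and M = 20, the tile takes 70 units instead of the 60 that a perfect split of the combined work would need, and the second thread idles for 20.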

11 MPI thread support levels
- single
- masteronly
- funneled
- serialized
- multiple
[Figure: fine-grain hybrid vs. coarse-grain hybrid; comm/comp phases]

12 Load balancing
Idea: the master thread assumes a smaller fraction of the process tile's computational load than the other threads
Consequence: [not preserved in transcript]

13 Load balancing (2)
T: total number of threads
p: current process id
Assuming: [equation not preserved in transcript]
It follows: [equation not preserved in transcript]

14 Load balancing (3)
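The slide equations were not preserved, so the C sketch below derives master and worker shares under the same simple cost model as a stand-in, not the authors' formulas: with C the single-thread computation time of a tile and M the master's communication time, the master's share x and each worker's share y satisfy x*C + M = y*C and x + (T - 1)*y = 1, giving x = (1 - (T - 1)*M/C) / T. Reading "constant" as one communication estimate M for all processes and "variable" as an M that depends on the process id p is an interpretation; `comm_cost` is a hypothetical per-process estimate for a 1D pipeline in which boundary processes communicate in only one direction.

```c
#include <assert.h>

/* Master thread's share of the tile so that all T threads finish together
   (x*C + M == y*C, x + (T-1)*y == 1). Clamped at 0 when communication alone
   exceeds a worker's computation time. */
double master_share(int T, double C, double M) {
    assert(T >= 2 && C > 0.0 && M >= 0.0);
    double x = (1.0 - (T - 1) * M / C) / T;
    return x > 0.0 ? x : 0.0;
}

/* Share of each non-master thread, given the master's share x.
   If x was clamped to 0, the master may still finish before the workers. */
double worker_share(int T, double x) {
    assert(T >= 2 && x >= 0.0 && x <= 1.0);
    return (1.0 - x) / (T - 1);
}

/* Hypothetical communication cost of process p in a 1D pipeline of P processes:
   every process except the first receives, every process except the last sends. */
double comm_cost(int p, int P, double recv_cost, double send_cost) {
    assert(P >= 1 && p >= 0 && p < P);
    return (p > 0 ? recv_cost : 0.0) + (p < P - 1 ? send_cost : 0.0);
}
```

With T = 2, C = 100 and M = 20, `master_share` returns 0.4 and each worker gets 0.6, so both threads finish after 60 units. Under the variable reading, a boundary process with M = 10 would give its master 0.45 of the tile.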

15 Experimental results
- 8-node dual-SMP Linux cluster (800 MHz Pentium III, 256 MB RAM, kernel 2.4.26)
- MPICH v1.2.6 (--with-device=ch_p4, --with-comm=shared, P4_SOCKBUFSIZE=104KB)
- Intel C++ compiler 8.1 (-O3 -static -mcpu=pentiumpro)
- Fast Ethernet interconnection network

16 Alternating Direction Implicit (ADI)
- stencil computation used for solving partial differential equations
- unitary data dependencies
- 3D iteration space (X × Y × Z)
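Full permutability with unitary dependencies can be checked directly: when every dependence distance is non-negative in each dimension, any loop order is legal and yields the same values. The sketch below uses a made-up 3D update with dependencies (1,0,0), (0,1,0) and (0,0,1) as a stand-in for the ADI kernel, which the slide does not spell out; the sizes `DIM_X`, `DIM_Y`, `DIM_Z` are arbitrary.

```c
#include <string.h>

#define DIM_X 6  /* arbitrary extents for the X x Y x Z iteration space */
#define DIM_Y 5
#define DIM_Z 4

/* Illustrative update with unitary dependencies; not the actual ADI kernel. */
static void update(double A[DIM_X + 1][DIM_Y + 1][DIM_Z + 1], int i, int j, int k) {
    A[i][j][k] = 0.5 * A[i - 1][j][k] + 0.25 * A[i][j - 1][k]
               + 0.25 * A[i][j][k - 1] + 1.0;
}

/* Loop order i, j, k. Boundary planes (index 0) stay zero. */
void sweep_ijk(double A[DIM_X + 1][DIM_Y + 1][DIM_Z + 1]) {
    memset(A, 0, sizeof(double) * (DIM_X + 1) * (DIM_Y + 1) * (DIM_Z + 1));
    for (int i = 1; i <= DIM_X; i++)
        for (int j = 1; j <= DIM_Y; j++)
            for (int k = 1; k <= DIM_Z; k++)
                update(A, i, j, k);
}

/* Loop order k, j, i: legal because all dependence distances are non-negative. */
void sweep_kji(double A[DIM_X + 1][DIM_Y + 1][DIM_Z + 1]) {
    memset(A, 0, sizeof(double) * (DIM_X + 1) * (DIM_Y + 1) * (DIM_Z + 1));
    for (int k = 1; k <= DIM_Z; k++)
        for (int j = 1; j <= DIM_Y; j++)
            for (int i = 1; i <= DIM_X; i++)
                update(A, i, j, k);
}
```

Comparing the arrays filled by `sweep_ijk` and `sweep_kji` with `memcmp` should report no difference, since each point is computed from the same inputs by the same expression in both orders.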

17 ADI

18 Synthetic benchmark

19 Conclusions
- fine-grain hybrid parallelization is inefficient
- unbalanced coarse-grain hybrid parallelization is also inefficient
- balancing improves hybrid model performance
- the variable balanced coarse-grain hybrid model is the most efficient approach overall
- the relative performance improvement grows as communication needs increase relative to computation

20 Thank you! Questions?

21 ADI

22 Synthetic benchmark

