Presentation transcript: "Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters" (Nikolaos Drosinos and Nectarios Koziris, IPDPS 2004)

1 Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters Nikolaos Drosinos and Nectarios Koziris National Technical University of Athens Computing Systems Laboratory {ndros,nkoziris}@cslab.ece.ntua.gr www.cslab.ece.ntua.gr

2 Overview: Introduction; Pure Message-passing Model; Hybrid Models (Hyperplane Scheduling, Fine-grain Model, Coarse-grain Model); Experimental Results; Conclusions – Future Work

3 Motivation: Active research interest in SMP clusters and hybrid programming models. However, existing work mostly targets fine-grain hybrid paradigms (masteronly model) and DOALL multi-threaded parallelization.

4 Contribution: Comparison of three programming models for the parallelization of tiled loop algorithms: pure message-passing, fine-grain hybrid, and coarse-grain hybrid. Advanced hyperplane scheduling that minimizes synchronization, overlaps computation with communication, and preserves data dependencies.

5 Algorithmic Model: Tiled nested loops with constant flow data dependencies.

FORACROSS tile_0 DO
   ...
   FORACROSS tile_{n-2} DO
      FOR tile_{n-1} DO
         Receive(tile);
         Compute(tile);
         Send(tile);
      END FOR
   END FORACROSS
   ...
END FORACROSS
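
As an illustration only (not from the slides), a minimal C sketch of this structure for a 3D iteration space: the outer FORACROSS coordinates are fixed per process, and the innermost tile dimension is swept sequentially with a receive/compute/send step per tile. All helper names are hypothetical.

   /* Hypothetical helpers: receive/send tile boundary data, compute one tile. */
   void receive_boundary(int t0, int t1, int t2);
   void send_boundary(int t0, int t1, int t2);
   void compute_tile(int t0, int t1, int t2);

   /* The FORACROSS coordinates (my_t0, my_t1) are fixed for this process;
      only the innermost tile dimension is traversed sequentially. */
   void sweep_owned_tiles(int my_t0, int my_t1, int ntiles2)
   {
       for (int t2 = 0; t2 < ntiles2; t2++) {
           receive_boundary(my_t0, my_t1, t2);   /* data from predecessor tiles */
           compute_tile(my_t0, my_t1, t2);
           send_boundary(my_t0, my_t1, t2);      /* data needed by successor tiles */
       }
   }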

6 Target Architecture: SMP clusters.

8 Pure Message-passing Model:

tile_0 = pr_0;
...
tile_{n-2} = pr_{n-2};
FOR tile_{n-1} = 0 TO ... DO
   Pack(snd_buf, tile_{n-1} - 1, pr);
   MPI_Isend(snd_buf, dest(pr));
   MPI_Irecv(recv_buf, src(pr));
   Compute(tile);
   MPI_Waitall;
   Unpack(recv_buf, tile_{n-1} + 1, pr);
END FOR
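
For concreteness, here is a minimal compilable C sketch of the same per-tile pipeline using full MPI signatures; it is not the authors' code, and the helper functions, message count, and neighbour ranks (dest, src) are hypothetical placeholders.

   #include <mpi.h>

   /* Hypothetical helpers standing in for the slide's Pack/Compute/Unpack. */
   void pack_boundary(double *snd_buf, int tile);
   void unpack_boundary(const double *recv_buf, int tile);
   void compute_tile(int tile);

   void pure_mpi_pipeline(int ntiles, int dest, int src,
                          double *snd_buf, double *recv_buf, int count)
   {
       for (int tile = 0; tile < ntiles; tile++) {
           MPI_Request req[2];
           /* Send data produced by the previous tile, prefetch data for the next. */
           pack_boundary(snd_buf, tile - 1);
           MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
           MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);
           /* Overlap the current tile's computation with the pending communication. */
           compute_tile(tile);
           MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
           unpack_boundary(recv_buf, tile + 1);
       }
   }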

9 Pure Message-passing Model

11 Hyperplane Scheduling: Implements coarse-grain parallelism while respecting inter-tile data dependencies. Tiles are organized into data-independent subsets (groups); tiles of the same group can be executed concurrently by multiple threads, with barrier synchronization between the threads.

12 Hyperplane Scheduling: mapping of each tile (mpi_rank, omp_tid, tile) to its group (diagram).

13 Hyperplane Scheduling:

#pragma omp parallel
{
   group_0 = pr_0;
   ...
   group_{n-2} = pr_{n-2};
   tile_0 = pr_0 * m_0 + th_0;
   ...
   tile_{n-2} = pr_{n-2} * m_{n-2} + th_{n-2};
   FOR (group_{n-1}) {
      tile_{n-1} = group_{n-1} - ...;
      if (0 <= tile_{n-1} <= ...)
         compute(tile);
      #pragma omp barrier
   }
}
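
The sketch below illustrates, in plain C with OpenMP, one way such a hyperplane sweep could look for a 3D iteration space; it is not the authors' implementation, and the thread layout and the group-to-tile mapping tile2 = group - tile0 - tile1 are simplified assumptions.

   #include <omp.h>

   /* Hypothetical per-tile kernel. */
   void compute_tile(int t0, int t1, int t2);

   void hyperplane_sweep(int pr0, int pr1, int m0, int m1,
                         int ntiles2, int ngroups)
   {
       #pragma omp parallel
       {
           int th = omp_get_thread_num();
           /* Hypothetical 2D thread layout inside the SMP node. */
           int th0 = th / 2, th1 = th % 2;
           int tile0 = pr0 * m0 + th0;
           int tile1 = pr1 * m1 + th1;

           for (int group = 0; group < ngroups; group++) {
               int tile2 = group - tile0 - tile1;   /* position on the hyperplane */
               if (tile2 >= 0 && tile2 < ntiles2)
                   compute_tile(tile0, tile1, tile2);
               #pragma omp barrier   /* all threads finish a group before the next */
           }
       }
   }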

15 Fine-grain Model: Incremental parallelization of the computationally intensive parts; pure MPI combined with hyperplane scheduling. Inter-node communication occurs outside of the multi-threaded part (masteronly style, MPI_THREAD_MASTERONLY). Thread synchronization through the implicit barrier of the omp parallel directive.

16 Fine-grain Model:

FOR (group_{n-1}) {
   Pack(snd_buf, tile_{n-1} - 1, pr);
   MPI_Isend(snd_buf, dest(pr));
   MPI_Irecv(recv_buf, src(pr));
   #pragma omp parallel
   {
      thread_id = omp_get_thread_num();
      if (valid(tile, thread_id, group_{n-1}))
         Compute(tile);
   }
   MPI_Waitall;
   Unpack(recv_buf, tile_{n-1} + 1, pr);
}
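
A minimal C sketch of the fine-grain (masteronly-style) scheme, assuming hypothetical pack/unpack/validity helpers and illustrative neighbour ranks and counts; all MPI calls stay outside the parallel region, so single-threaded MPI support suffices.

   #include <mpi.h>
   #include <omp.h>

   /* Hypothetical helpers mirroring the slide's Pack/valid/Compute/Unpack. */
   void pack_boundary(double *buf, int group);
   void unpack_boundary(const double *buf, int group);
   int  tile_is_valid(int thread_id, int group);
   void compute_tile(int thread_id, int group);

   void fine_grain_sweep(int ngroups, int dest, int src,
                         double *snd_buf, double *recv_buf, int count)
   {
       for (int group = 0; group < ngroups; group++) {
           MPI_Request req[2];
           pack_boundary(snd_buf, group - 1);
           MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
           MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);

           /* A parallel region is opened (and implicitly joined) for every group. */
           #pragma omp parallel
           {
               int tid = omp_get_thread_num();
               if (tile_is_valid(tid, group))
                   compute_tile(tid, group);
           }   /* the implicit barrier here synchronizes the threads */

           MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
           unpack_boundary(recv_buf, group + 1);
       }
   }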

18 Coarse-grain Model: Threads are initialized only once; SPMD paradigm (requires more programming effort). Inter-node communication occurs inside the multi-threaded part (requires MPI_THREAD_FUNNELED). Thread synchronization through an explicit barrier (omp barrier directive).

19 Coarse-grain Model:

#pragma omp parallel
{
   thread_id = omp_get_thread_num();
   FOR (group_{n-1}) {
      #pragma omp master
      {
         Pack(snd_buf, tile_{n-1} - 1, pr);
         MPI_Isend(snd_buf, dest(pr));
         MPI_Irecv(recv_buf, src(pr));
      }
      if (valid(tile, thread_id, group_{n-1}))
         Compute(tile);
      #pragma omp master
      {
         MPI_Waitall;
         Unpack(recv_buf, tile_{n-1} + 1, pr);
      }
      #pragma omp barrier
   }
}
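
A corresponding coarse-grain sketch in C, again with hypothetical helpers and illustrative parameters: the parallel region is opened once, the master thread funnels all MPI calls (hence MPI_THREAD_FUNNELED), and an explicit barrier separates the groups.

   #include <mpi.h>
   #include <omp.h>

   void pack_boundary(double *buf, int group);
   void unpack_boundary(const double *buf, int group);
   int  tile_is_valid(int thread_id, int group);
   void compute_tile(int thread_id, int group);

   void coarse_grain_sweep(int ngroups, int dest, int src,
                           double *snd_buf, double *recv_buf, int count)
   {
       #pragma omp parallel
       {
           int tid = omp_get_thread_num();
           MPI_Request req[2];   /* only the master thread's copy is used */

           for (int group = 0; group < ngroups; group++) {
               #pragma omp master
               {
                   pack_boundary(snd_buf, group - 1);
                   MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
                   MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);
               }
               if (tile_is_valid(tid, group))
                   compute_tile(tid, group);
               #pragma omp master
               {
                   MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
                   unpack_boundary(recv_buf, group + 1);
               }
               /* Explicit barrier: no thread starts the next group before the
                  master has completed this group's communication. */
               #pragma omp barrier
           }
       }
   }

In a complete program, MPI would be initialized with MPI_Init_thread requesting at least MPI_THREAD_FUNNELED support.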

21 Experimental Results: 8-node SMP Linux cluster (800 MHz Pentium III, 128 MB RAM, kernel 2.4.20); MPICH v1.2.5 (--with-device=ch_p4, --with-comm=shared); Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static); FastEthernet interconnection; ADI micro-kernel benchmark (3D).

22 Alternating Direction Implicit (ADI): Stencil computation used for solving partial differential equations; unitary data dependencies; 3D iteration space (X x Y x Z).
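
As a rough illustration only (not the benchmark's actual kernel), a sweep with unit flow dependencies in all three dimensions might look like the following; the update formula and boundary handling are placeholders, chosen only to show why hyperplane/tile scheduling applies.

   /* Each point depends on its predecessor along every dimension
      (unitary flow dependencies). Requires a C99 compiler (VLA parameter). */
   void adi_like_sweep(int X, int Y, int Z, double a[X][Y][Z])
   {
       for (int i = 1; i < X; i++)
           for (int j = 1; j < Y; j++)
               for (int k = 1; k < Z; k++)
                   a[i][j][k] = (a[i-1][j][k] + a[i][j-1][k] + a[i][j][k-1]) / 3.0;
   }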

23 ADI – 2 dual SMP nodes (results chart)

24 ADI X=128 Y=512 Z=8192 – 2 nodes (results chart)

25 ADI X=256 Y=512 Z=8192 – 2 nodes (results chart)

26 ADI X=512 Y=512 Z=8192 – 2 nodes (results chart)

27 ADI X=512 Y=256 Z=8192 – 2 nodes (results chart)

28 ADI X=512 Y=128 Z=8192 – 2 nodes (results chart)

29 ADI X=128 Y=512 Z=8192 – 2 nodes: computation vs. communication breakdown (chart)

30 ADI X=512 Y=128 Z=8192 – 2 nodes: computation vs. communication breakdown (chart)

32 Conclusions: Tiled loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm. Hybrid models can be competitive with the pure message-passing paradigm. The coarse-grain hybrid model can be more efficient than the fine-grain one, but is also more complicated. Programming efficiently in OpenMP is not easier than programming efficiently in MPI.

33 Future Work: Application of the methodology to real applications and standard benchmarks; work balancing for the coarse-grain model; investigation of alternative topologies and irregular communication patterns; performance evaluation on advanced interconnection networks (SCI, Myrinet).

34 Thank You! Questions?

