Presentation transcript: "Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters" (Nikolaos Drosinos and Nectarios Koziris, IPDPS 2004)

1 Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters Nikolaos Drosinos and Nectarios Koziris National Technical University of Athens Computing Systems Laboratory {ndros,nkoziris}@cslab.ece.ntua.gr www.cslab.ece.ntua.gr

2 Overview: Introduction; Pure Message-passing Model; Hybrid Models (Hyperplane Scheduling, Fine-grain Model, Coarse-grain Model); Experimental Results; Conclusions – Future Work

3 Motivation: Active research interest in SMP clusters and hybrid programming models. However, existing work mostly targets fine-grain hybrid paradigms (masteronly model) and DOALL multi-threaded parallelization.

4 Contribution: Comparison of three programming models for the parallelization of tiled loop algorithms: pure message-passing, fine-grain hybrid, and coarse-grain hybrid. Advanced hyperplane scheduling that minimizes synchronization, overlaps computation with communication, and preserves data dependencies.

5 Algorithmic Model: Tiled nested loops with constant flow data dependencies.

FORACROSS tile_0 DO
   ...
   FORACROSS tile_{n-2} DO
      FOR tile_{n-1} DO
         Receive(tile);
         Compute(tile);
         Send(tile);
      END FOR
   END FORACROSS
   ...
END FORACROSS
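
As an illustration only (not from the slides), a minimal C sketch of this structure for a 3D iteration space: the outer FORACROSS coordinates are fixed per process, and the innermost tile dimension is swept sequentially with a receive/compute/send step per tile. All helper names are hypothetical.

   /* Hypothetical helpers: receive/send tile boundary data, compute one tile. */
   void receive_boundary(int t0, int t1, int t2);
   void send_boundary(int t0, int t1, int t2);
   void compute_tile(int t0, int t1, int t2);

   /* The FORACROSS coordinates (my_t0, my_t1) are fixed for this process;
      only the innermost tile dimension is traversed sequentially. */
   void sweep_owned_tiles(int my_t0, int my_t1, int ntiles2)
   {
       for (int t2 = 0; t2 < ntiles2; t2++) {
           receive_boundary(my_t0, my_t1, t2);   /* data from predecessor tiles */
           compute_tile(my_t0, my_t1, t2);
           send_boundary(my_t0, my_t1, t2);      /* data needed by successor tiles */
       }
   }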

6 Target Architecture: SMP clusters.

8 Pure Message-passing Model:

tile_0 = pr_0;
...
tile_{n-2} = pr_{n-2};
FOR tile_{n-1} = 0 TO ... DO
   Pack(snd_buf, tile_{n-1} - 1, pr);
   MPI_Isend(snd_buf, dest(pr));
   MPI_Irecv(recv_buf, src(pr));
   Compute(tile);
   MPI_Waitall;
   Unpack(recv_buf, tile_{n-1} + 1, pr);
END FOR
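
For concreteness, here is a minimal compilable C sketch of the same per-tile pipeline using full MPI signatures; it is not the authors' code, and the helper functions, message count, and neighbour ranks (dest, src) are hypothetical placeholders.

   #include <mpi.h>

   /* Hypothetical helpers standing in for the slide's Pack/Compute/Unpack. */
   void pack_boundary(double *snd_buf, int tile);
   void unpack_boundary(const double *recv_buf, int tile);
   void compute_tile(int tile);

   void pure_mpi_pipeline(int ntiles, int dest, int src,
                          double *snd_buf, double *recv_buf, int count)
   {
       for (int tile = 0; tile < ntiles; tile++) {
           MPI_Request req[2];
           /* Send data produced by the previous tile, prefetch data for the next. */
           pack_boundary(snd_buf, tile - 1);
           MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
           MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);
           /* Overlap the current tile's computation with the pending communication. */
           compute_tile(tile);
           MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
           unpack_boundary(recv_buf, tile + 1);
       }
   }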

9 Pure Message-passing Model

11 Hyperplane Scheduling: Implements coarse-grain parallelism while respecting inter-tile data dependencies. Tiles are organized into data-independent subsets (groups); tiles of the same group can be executed concurrently by multiple threads, with barrier synchronization between the threads.

12 Hyperplane Scheduling: mapping of each tile (mpi_rank, omp_tid, tile) to its group (diagram).

13 Hyperplane Scheduling:

#pragma omp parallel
{
   group_0 = pr_0;
   ...
   group_{n-2} = pr_{n-2};
   tile_0 = pr_0 * m_0 + th_0;
   ...
   tile_{n-2} = pr_{n-2} * m_{n-2} + th_{n-2};
   FOR (group_{n-1}) {
      tile_{n-1} = group_{n-1} - ...;
      if (0 <= tile_{n-1} <= ...)
         compute(tile);
      #pragma omp barrier
   }
}
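
The sketch below illustrates, in plain C with OpenMP, one way such a hyperplane sweep could look for a 3D iteration space; it is not the authors' implementation, and the thread layout and the group-to-tile mapping tile2 = group - tile0 - tile1 are simplified assumptions.

   #include <omp.h>

   /* Hypothetical per-tile kernel. */
   void compute_tile(int t0, int t1, int t2);

   void hyperplane_sweep(int pr0, int pr1, int m0, int m1,
                         int ntiles2, int ngroups)
   {
       #pragma omp parallel
       {
           int th = omp_get_thread_num();
           /* Hypothetical 2D thread layout inside the SMP node. */
           int th0 = th / 2, th1 = th % 2;
           int tile0 = pr0 * m0 + th0;
           int tile1 = pr1 * m1 + th1;

           for (int group = 0; group < ngroups; group++) {
               int tile2 = group - tile0 - tile1;   /* position on the hyperplane */
               if (tile2 >= 0 && tile2 < ntiles2)
                   compute_tile(tile0, tile1, tile2);
               #pragma omp barrier   /* all threads finish a group before the next */
           }
       }
   }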

15 Fine-grain Model: Incremental parallelization of the computationally intensive parts; pure MPI combined with hyperplane scheduling. Inter-node communication occurs outside of the multi-threaded part (masteronly style, MPI_THREAD_MASTERONLY). Thread synchronization through the implicit barrier of the omp parallel directive.

16 Fine-grain Model:

FOR (group_{n-1}) {
   Pack(snd_buf, tile_{n-1} - 1, pr);
   MPI_Isend(snd_buf, dest(pr));
   MPI_Irecv(recv_buf, src(pr));
   #pragma omp parallel
   {
      thread_id = omp_get_thread_num();
      if (valid(tile, thread_id, group_{n-1}))
         Compute(tile);
   }
   MPI_Waitall;
   Unpack(recv_buf, tile_{n-1} + 1, pr);
}
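
A minimal C sketch of the fine-grain (masteronly-style) scheme, assuming hypothetical pack/unpack/validity helpers and illustrative neighbour ranks and counts; all MPI calls stay outside the parallel region, so single-threaded MPI support suffices.

   #include <mpi.h>
   #include <omp.h>

   /* Hypothetical helpers mirroring the slide's Pack/valid/Compute/Unpack. */
   void pack_boundary(double *buf, int group);
   void unpack_boundary(const double *buf, int group);
   int  tile_is_valid(int thread_id, int group);
   void compute_tile(int thread_id, int group);

   void fine_grain_sweep(int ngroups, int dest, int src,
                         double *snd_buf, double *recv_buf, int count)
   {
       for (int group = 0; group < ngroups; group++) {
           MPI_Request req[2];
           pack_boundary(snd_buf, group - 1);
           MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
           MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);

           /* A parallel region is opened (and implicitly joined) for every group. */
           #pragma omp parallel
           {
               int tid = omp_get_thread_num();
               if (tile_is_valid(tid, group))
                   compute_tile(tid, group);
           }   /* the implicit barrier here synchronizes the threads */

           MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
           unpack_boundary(recv_buf, group + 1);
       }
   }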

18 Coarse-grain Model: Threads are initialized only once; SPMD paradigm (requires more programming effort). Inter-node communication occurs inside the multi-threaded part (requires MPI_THREAD_FUNNELED). Thread synchronization through an explicit barrier (omp barrier directive).

19 Coarse-grain Model:

#pragma omp parallel
{
   thread_id = omp_get_thread_num();
   FOR (group_{n-1}) {
      #pragma omp master
      {
         Pack(snd_buf, tile_{n-1} - 1, pr);
         MPI_Isend(snd_buf, dest(pr));
         MPI_Irecv(recv_buf, src(pr));
      }
      if (valid(tile, thread_id, group_{n-1}))
         Compute(tile);
      #pragma omp master
      {
         MPI_Waitall;
         Unpack(recv_buf, tile_{n-1} + 1, pr);
      }
      #pragma omp barrier
   }
}
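
A corresponding coarse-grain sketch in C, again with hypothetical helpers and illustrative parameters: the parallel region is opened once, the master thread funnels all MPI calls (hence MPI_THREAD_FUNNELED), and an explicit barrier separates the groups.

   #include <mpi.h>
   #include <omp.h>

   void pack_boundary(double *buf, int group);
   void unpack_boundary(const double *buf, int group);
   int  tile_is_valid(int thread_id, int group);
   void compute_tile(int thread_id, int group);

   void coarse_grain_sweep(int ngroups, int dest, int src,
                           double *snd_buf, double *recv_buf, int count)
   {
       #pragma omp parallel
       {
           int tid = omp_get_thread_num();
           MPI_Request req[2];   /* only the master thread's copy is used */

           for (int group = 0; group < ngroups; group++) {
               #pragma omp master
               {
                   pack_boundary(snd_buf, group - 1);
                   MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req[0]);
                   MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req[1]);
               }
               if (tile_is_valid(tid, group))
                   compute_tile(tid, group);
               #pragma omp master
               {
                   MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
                   unpack_boundary(recv_buf, group + 1);
               }
               /* Explicit barrier: no thread starts the next group before the
                  master has completed this group's communication. */
               #pragma omp barrier
           }
       }
   }

In a complete program, MPI would be initialized with MPI_Init_thread requesting at least MPI_THREAD_FUNNELED support.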

21 Experimental Results: 8-node SMP Linux cluster (800 MHz Pentium III, 128 MB RAM, kernel 2.4.20); MPICH v1.2.5 (--with-device=ch_p4, --with-comm=shared); Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static); FastEthernet interconnection; ADI micro-kernel benchmark (3D).

22 Alternating Direction Implicit (ADI): Stencil computation used for solving partial differential equations; unitary data dependencies; 3D iteration space (X x Y x Z).
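
As a rough illustration only (not the benchmark's actual kernel), a sweep with unit flow dependencies in all three dimensions might look like the following; the update formula and boundary handling are placeholders, chosen only to show why hyperplane/tile scheduling applies.

   /* Each point depends on its predecessor along every dimension
      (unitary flow dependencies). Requires a C99 compiler (VLA parameter). */
   void adi_like_sweep(int X, int Y, int Z, double a[X][Y][Z])
   {
       for (int i = 1; i < X; i++)
           for (int j = 1; j < Y; j++)
               for (int k = 1; k < Z; k++)
                   a[i][j][k] = (a[i-1][j][k] + a[i][j-1][k] + a[i][j][k-1]) / 3.0;
   }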

23 ADI – 2 dual SMP nodes (results chart)

24 ADI X=128 Y=512 Z=8192 – 2 nodes (results chart)

25 ADI X=256 Y=512 Z=8192 – 2 nodes (results chart)

26 ADI X=512 Y=512 Z=8192 – 2 nodes (results chart)

27 ADI X=512 Y=256 Z=8192 – 2 nodes (results chart)

28 ADI X=512 Y=128 Z=8192 – 2 nodes (results chart)

29 ADI X=128 Y=512 Z=8192 – 2 nodes: computation vs. communication breakdown (chart)

30 ADI X=512 Y=128 Z=8192 – 2 nodes: computation vs. communication breakdown (chart)

32 Conclusions: Tiled loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm. Hybrid models can be competitive with the pure message-passing paradigm. The coarse-grain hybrid model can be more efficient than the fine-grain one, but is also more complicated. Programming efficiently in OpenMP is not easier than programming efficiently in MPI.

33 Future Work: Application of the methodology to real applications and standard benchmarks; work balancing for the coarse-grain model; investigation of alternative topologies and irregular communication patterns; performance evaluation on advanced interconnection networks (SCI, Myrinet).

34 Thank You! Questions?

