
1 Parallel Processing (CS 676), Lecture 7: Message Passing using MPI. Jeremy R. Johnson. Parts of this lecture were derived from chapters 3-5 and 11 of Pacheco.

2 Introduction
Objective: To introduce distributed memory parallel programming using message passing and the MPI standard for message passing.
Topics:
–Introduction to MPI (hello.c, hello.f)
–Example problem (numeric integration)
–Collective communication
–Performance model

3 MPI: Message Passing Interface
Distributed memory model:
–Single Program Multiple Data (SPMD)
–Communication using message passing (Send/Recv)
–Collective communication: Broadcast, Reduce (Allreduce), Gather (Allgather), Scatter, Alltoall

4 Benefits/Disadvantages
–No new language is required
–Portable
–Good performance
–Explicitly forces the programmer to deal with local/global access
–Harder to program than shared memory; requires larger program/algorithm changes

5 Further Information
–http://www-unix.mcs.anl.gov/mpi/
–en.wikipedia.org/wiki/Message_Passing_Interface
–www.mpi-forum.org
–www.open-mpi.org
–www.mcs.anl.gov/research/projects/mpich2
–Textbook: Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1997.

6 Basic MPI Functions
int MPI_Init(
    int* argc /* in/out */,
    char*** argv /* in/out */)

int MPI_Finalize(void)

int MPI_Comm_size(
    MPI_Comm communicator /* in */,
    int* number_of_processors /* out */)

int MPI_Comm_rank(
    MPI_Comm communicator /* in */,
    int* my_rank /* out */)
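A minimal self-contained sketch (not from the slides) showing how these four calls fit together; every MPI program brackets its other MPI calls between MPI_Init and MPI_Finalize:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int p, my_rank;

    MPI_Init(&argc, &argv);                   /* must come before any other MPI call */
    MPI_Comm_size(MPI_COMM_WORLD, &p);        /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* this process's rank, 0..p-1 */

    printf("process %d of %d\n", my_rank, p);

    MPI_Finalize();                           /* no MPI calls after this */
    return 0;
}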

7 Send
A message must be packaged in an envelope containing the destination, the size, an identifying tag, and the set of processes participating in the communication (the communicator).
int MPI_Send(
    void* message /* in */,
    int count /* in */,
    MPI_Datatype datatype /* in */,
    int dest /* in */,
    int tag /* in */,
    MPI_Comm communicator /* in */)

8 Receive
int MPI_Recv(
    void* message /* out */,
    int count /* in */,
    MPI_Datatype datatype /* in */,
    int source /* in */,
    int tag /* in */,
    MPI_Comm communicator /* in */,
    MPI_Status* status /* out */)

9 Status
The MPI_Status structure filled in by MPI_Recv records:
–status.MPI_SOURCE
–status.MPI_TAG
–status.MPI_ERROR
int MPI_Get_count(
    MPI_Status* status /* in */,
    MPI_Datatype datatype /* in */,
    int* count_ptr /* out */)
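A short sketch (not from the slides) of how the status fields and MPI_Get_count are typically used after a wildcard receive; it assumes the char message[100] buffer and MPI_Status status declared in hello.c on the following slides:

int count;

/* accept a message from any sender, with any tag */
MPI_Recv(message, 100, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* how many MPI_CHAR elements actually arrived, and from whom? */
MPI_Get_count(&status, MPI_CHAR, &count);
printf("received %d chars from process %d (tag %d)\n",
       count, status.MPI_SOURCE, status.MPI_TAG);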

10 hello.c
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int my_rank;           /* rank of process */
    int p;                 /* number of processes */
    int source;            /* rank of sender */
    int dest;              /* rank of receiver */
    int tag = 0;           /* tag for messages */
    char message[100];     /* storage for message */
    MPI_Status status;     /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);
    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

11 hello.c
    if (my_rank != 0) {
        /* create message */
        sprintf(message, "Greetings from process %d!\n", my_rank);
        dest = 0;
        /* use strlen + 1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}

12 AnySource
    if (my_rank != 0) {
        /* create message */
        sprintf(message, "Greetings from process %d!\n", my_rank);
        dest = 0;
        /* use strlen + 1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

13 Ring Communication
[Figure: eight processes, numbered 0-7, arranged in a ring.]

14 First Attempt
sprintf(message, "Greetings from process %d!\n", my_rank);
dest = (my_rank + 1) % p;
MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
source = (my_rank + p - 1) % p;
MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
printf("PE %d received: %s\n", my_rank, message);

15 Deadlock
sprintf(message, "Greetings from process %d!\n", my_rank);
dest = (my_rank + 1) % p;
source = (my_rank + p - 1) % p;
MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
printf("PE %d received: %s\n", my_rank, message);

16 Buffering Assumption
The previous code is not safe, since it depends on sufficient system buffering being available so that deadlock does not occur. MPI_Sendrecv can be used to guarantee that deadlock does not occur.

17 SendRecv
int MPI_Sendrecv(
    void* send_buf /* in */,
    int send_count /* in */,
    MPI_Datatype send_type /* in */,
    int dest /* in */,
    int send_tag /* in */,
    void* recv_buf /* out */,
    int recv_count /* in */,
    MPI_Datatype recv_type /* in */,
    int source /* in */,
    int recv_tag /* in */,
    MPI_Comm communicator /* in */,
    MPI_Status* status /* out */)

18 Correct Version with SendRecv
sprintf(omessage, "Greetings from process %d!\n", my_rank);
dest = (my_rank + 1) % p;
source = (my_rank + p - 1) % p;
MPI_Sendrecv(omessage, strlen(omessage)+1, MPI_CHAR, dest, tag,
             imessage, 100, MPI_CHAR, source, tag,
             MPI_COMM_WORLD, &status);
printf("PE %d received: %s\n", my_rank, imessage);

19 Lower Level Implementation
sprintf(smessage, "Greetings from process %d!\n", my_rank);
dest = (my_rank + 1) % p;
source = (my_rank + p - 1) % p;
if (my_rank % 2 == 0) {
    MPI_Send(smessage, strlen(smessage)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    MPI_Recv(dmessage, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
} else {
    MPI_Recv(dmessage, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
    MPI_Send(smessage, strlen(smessage)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
printf("PE %d received: %s\n", my_rank, dmessage);

20 Compiling and Executing MPI Programs with OpenMPI
–To compile a C program with MPI calls: mpicc hello.c -o hello
–To run an MPI program: mpirun -np PROCS hello
–You can provide a hostfile with -hostfile NAME (see the man page for details)
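For instance, with Open MPI the hostfile is a plain text file listing machine names and how many processes (slots) each may run; the host names below are made up:

# hostfile (hypothetical machines)
node0 slots=4
node1 slots=4

mpirun -np 8 -hostfile hostfile ./hello

This launches eight copies of hello, four on each listed node.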

21 dot.c
#include <stdio.h>

float Serial_dot(
        float x[] /* in */,
        float y[] /* in */,
        int   n   /* in */) {
    int i;
    float sum = 0.0;

    for (i = 0; i < n; i++)
        sum = sum + x[i]*y[i];
    return sum;
}

22 Parallel Dot
float Parallel_dot(
        float local_x[] /* in */,
        float local_y[] /* in */,
        int   n_bar     /* in */) {
    float local_dot;
    float dot = 0.0;

    local_dot = Serial_dot(local_x, local_y, n_bar);
    /* the summed result is valid only on process 0 (the root) */
    MPI_Reduce(&local_dot, &dot, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
    return dot;
}

23 Parallel All Dot
float Parallel_dot(
        float local_x[] /* in */,
        float local_y[] /* in */,
        int   n_bar     /* in */) {
    float local_dot;
    float dot = 0.0;

    local_dot = Serial_dot(local_x, local_y, n_bar);
    /* MPI_Allreduce takes no root argument; every process gets the result */
    MPI_Allreduce(&local_dot, &dot, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    return dot;
}

24 Reduce
int MPI_Reduce(
    void* operand /* in */,
    void* result /* out */,
    int count /* in */,
    MPI_Datatype datatype /* in */,
    MPI_Op operator /* in */,
    int root /* in */,
    MPI_Comm communicator /* in */)
Operators:
–MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC
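As an aside (not from the slides), MPI_MAXLOC and MPI_MINLOC operate on (value, index) pairs, which makes it easy to find a global extremum together with the rank that owns it; the sketch below assumes each process has already computed a float local_max:

struct { float value; int rank; } local, global;

local.value = local_max;   /* assumed per-process result */
local.rank  = my_rank;

/* global.value = largest local.value; global.rank = the rank that holds it */
MPI_Reduce(&local, &global, 1, MPI_FLOAT_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
if (my_rank == 0)
    printf("maximum %f found on process %d\n", global.value, global.rank);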

25 Reduce
[Figure: tree-based reduction of x0..x7 over eight processes; values are combined pairwise (x0+x4, x1+x5, x2+x6, x3+x7, then x0+x4+x2+x6 and x1+x5+x3+x7) until process 0 holds x0+x1+x2+x3+x4+x5+x6+x7.]

26 AllReduce
int MPI_Allreduce(
    void* operand /* in */,
    void* result /* out */,
    int count /* in */,
    MPI_Datatype datatype /* in */,
    MPI_Op operator /* in */,
    MPI_Comm communicator /* in */)
Operators:
–MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC

27 AllReduce
[Figure: all-reduce among eight processes 0-7, shown as a sequence of pairwise exchange stages; every process ends up holding the combined result.]

28 Broadcast
int MPI_Bcast(
    void* message /* in/out */,
    int count /* in */,
    MPI_Datatype datatype /* in */,
    int root /* in */,
    MPI_Comm communicator /* in */)

29 Broadcast
[Figure: tree-based broadcast; the data spreads from process 0 to process 1, then to 2-3, then to 4-7, doubling the number of holders at each stage.]
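A common use of MPI_Bcast (a sketch, not from the slides) is distributing an input value that only the root has, for example a problem size n obtained on process 0; the value 100 is made up:

int n;

if (my_rank == 0)
    n = 100;   /* e.g. read from a file or the command line on the root */

/* afterwards every process in MPI_COMM_WORLD holds the root's n */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);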

30 Gather
int MPI_Gather(
    void* send_data /* in */,
    int send_count /* in */,
    MPI_Datatype send_type /* in */,
    void* recv_data /* out */,
    int recv_count /* in */,
    MPI_Datatype recv_type /* in */,
    int root /* in */,
    MPI_Comm communicator /* in */)
[Figure: processes 0-3 each contribute one block x0..x3; the root ends up with all four blocks.]

31 Scatter
int MPI_Scatter(
    void* send_data /* in */,
    int send_count /* in */,
    MPI_Datatype send_type /* in */,
    void* recv_data /* out */,
    int recv_count /* in */,
    MPI_Datatype recv_type /* in */,
    int root /* in */,
    MPI_Comm communicator /* in */)
[Figure: the root holds blocks x0..x3 and sends block xi to process i.]
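A sketch (not from the slides) pairing Scatter and Gather: the root deals out one block of an array to each process, each process transforms its block, and the root collects the results; MAX_N and local_n (= n/p) are assumed to be defined elsewhere:

float data[MAX_N];        /* significant only on the root */
float result[MAX_N];      /* significant only on the root */
float local_data[MAX_N];
int i;

/* block i of data goes to process i */
MPI_Scatter(data, local_n, MPI_FLOAT,
            local_data, local_n, MPI_FLOAT, 0, MPI_COMM_WORLD);

for (i = 0; i < local_n; i++)
    local_data[i] = 2.0 * local_data[i];   /* some local work */

/* block i of result comes from process i */
MPI_Gather(local_data, local_n, MPI_FLOAT,
           result, local_n, MPI_FLOAT, 0, MPI_COMM_WORLD);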

32 AllGather
int MPI_Allgather(
    void* send_data /* in */,
    int send_count /* in */,
    MPI_Datatype send_type /* in */,
    void* recv_data /* out */,
    int recv_count /* in */,
    MPI_Datatype recv_type /* in */,
    MPI_Comm communicator /* in */)
[Figure: processes 0-3 each contribute one block x0..x3; every process ends up with all four blocks.]

33 Matrix-Vector Product (block cyclic storage)
y = Ax, where y_i = Σ_{0 ≤ j < n} A_ij * x_j for 0 ≤ i < m
–Store blocks of A, x, y in local memory
–Gather local blocks of x in each process
–Compute chunks of y in parallel
[Figure: A, x, and y partitioned into row blocks across processes 0-3.]

34 Parallel Matrix-Vector Product
void Parallel_matrix_vector_product(
        LOCAL_MATRIX_T local_A, int m, int n,
        float local_x[], float global_x[], float local_y[],
        int local_m, int local_n) {
    /* local_m = m/p, local_n = n/p */
    int i, j;

    MPI_Allgather(local_x, local_n, MPI_FLOAT,
                  global_x, local_n, MPI_FLOAT, MPI_COMM_WORLD);
    for (i = 0; i < local_m; i++) {
        local_y[i] = 0.0;
        for (j = 0; j < n; j++)
            local_y[i] = local_y[i] + local_A[i][j]*global_x[j];
    }
}

35 Embarrassingly Parallel Example
Numerical integration (trapezoid rule):
∫_0^1 f(t) dt ≈ (h/2)[f(x_0) + 2f(x_1) + … + 2f(x_{n-1}) + f(x_n)]
[Figure: the trapezoid rule on [a,b] with grid points x_0 = a, x_1, …, x_{n-1}, x_n = b.]
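A sketch of the parallel trapezoid rule (not from the slides; it reuses my_rank and p from hello.c, assumes a function f is defined elsewhere, and hard-codes the interval [0,1] with n = 1024 trapezoids, n divisible by p); each process applies the rule to its own subinterval and MPI_Reduce sums the pieces on process 0:

float a = 0.0, b = 1.0, h, local_a, local_b, integral, total;
int n = 1024, local_n, i;

h = (b - a) / n;                      /* width of one trapezoid */
local_n = n / p;                      /* trapezoids per process */
local_a = a + my_rank * local_n * h;  /* this process's subinterval */
local_b = local_a + local_n * h;

integral = (f(local_a) + f(local_b)) / 2.0;
for (i = 1; i < local_n; i++)
    integral = integral + f(local_a + i*h);
integral = integral * h;

/* sum the local pieces; total is valid on process 0 only */
MPI_Reduce(&integral, &total, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
if (my_rank == 0)
    printf("integral from %f to %f = %f\n", a, b, total);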

