1 1 CS4402 – Parallel Computing Lecture 2 MPI – Getting Started. MPI – Point to Point Communication.

2 2 What is MPI?
M P I = Message Passing Interface.
An interface specification: MPI is a specification for the developers and users of message passing libraries. By itself, it is NOT a library, but rather the specification of what such a library should be.
Simply stated, the goal of the Message Passing Interface is to provide a widely used standard for writing message passing programs. The interface attempts to be: practical, portable, efficient, flexible.
Interface specifications have been defined for Fortran and C/C++ programs, with unofficial bindings for other languages such as Java.

3 3 Some History
MPI resulted from the efforts of numerous individuals and groups over a two-year period between 1992 and 1994.
1980s - early 1990s: Recognition of the need for a standard arose.
April 1992: The basic features essential to a standard message passing interface were discussed, and a working group was established to continue the standardization process. A preliminary draft proposal was developed subsequently.
November 1992: The MPI draft proposal (MPI1) from ORNL was presented. The group adopted procedures and an organization to form the MPI Forum.
November 1993: Supercomputing 93 conference - the draft MPI standard was presented.
May 1994: The final version of MPI-1 was released.
1996-1998: MPI-2 was developed.

4 4 Programming Model: SPMD
MPI lends itself to most (if not all) distributed memory parallel programming models.
Distributed Memory: Originally, MPI was targeted for distributed memory systems.
Shared Memory: As shared memory systems became more popular, MPI implementations for these platforms appeared.
Hybrid: MPI is now used on just about any common parallel architecture, including massively parallel machines, SMP clusters, workstation clusters and heterogeneous networks.
All parallelism is explicit: the programmer is responsible for correctly identifying parallelism and implementing parallel algorithms using MPI constructs.
The number of tasks dedicated to run a parallel program is static. New tasks cannot be dynamically spawned during run time (MPI-2 addresses this issue).

5 5 C Coding – Recalling Some Facts
Structure of a C program:
1. Include all headers
2. Declare all functions
3. Define all functions, including main
Simple facts:
1. Declarations take the first part of a block
2. Same syntax for statements
3. Various important headers: stdio, stdlib, etc.
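A minimal illustrative sketch of this structure (not from the slides; the helper function average is hypothetical):

#include <stdio.h>      /* 1. include all headers */
#include <stdlib.h>

double average(double a, double b);            /* 2. declare all functions */

int main(void)                                 /* 3. define all functions, including main */
{
    double x = 2.0, y = 4.0;                   /* declarations take the first part of the block */
    printf("average = %f\n", average(x, y));
    return 0;
}

double average(double a, double b)
{
    return (a + b) / 2.0;
}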

6 6 MPI - Getting Started
MPI header: required for all programs/routines which make MPI library calls.
#include "mpi.h"
Sometimes other MPI-related headers have to be included as well.
MPI functions:
Format: rc = MPI_Xxxxx(parameter, ...)
Example: rc = MPI_Bsend(&buf, count, type, dest, tag, comm)
Error code: returned as "rc"; equal to MPI_SUCCESS if the call succeeded.
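A small sketch of checking such a return code (not from the slides; by default MPI aborts on errors anyway, so explicit checks like this are optional):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS) {                  /* compare the returned error code with MPI_SUCCESS */
        fprintf(stderr, "MPI_Init failed\n");
        MPI_Abort(MPI_COMM_WORLD, rc);        /* terminate all MPI processes */
    }
    MPI_Finalize();
    return 0;
}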

7 7 MPI - Getting Started
General MPI program structure:
- include declarations (MPI header)
- the main function:
  - initialise the MPI environment
  - get the MPI basic elements: size, rank, etc.
  - do the parallel work:
    - acquire the local data for processor rank
    - perform the computation on the data
  - terminate the MPI environment
- some other functions

8 8 MPI Programs

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[] )
{
    int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    // the parallel computation of processor rank
    //   - get the data from somewhere
    //   - process the data for processor rank

    MPI_Finalize();
    return 0;
}

9 9 Environment Management Routines
Initialize/terminate the MPI environment, find information about it, etc.
MPI_Init: Initializes the MPI execution environment. MPI_Init(&argc, &argv), where argc and argv are the arguments of main().
MPI_Abort: Terminates all MPI processes associated with the communicator. MPI_Abort(comm, errorcode)
MPI_Wtime: Returns an elapsed wall clock time in seconds on the calling processor. MPI_Wtime()
MPI_Finalize: Terminates the MPI execution environment. MPI_Finalize()
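A minimal sketch of timing a region of work with MPI_Wtime (the commented placeholder stands for whatever the program computes):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank;
    double t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t = MPI_Wtime();                 /* start the clock */
    /* ... do the parallel work here ... */
    t = MPI_Wtime() - t;             /* elapsed wall clock time in seconds */

    printf("Processor %d finished in %lf seconds\n", rank, t);
    MPI_Finalize();
    return 0;
}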

10 10 Environment Management Routines
MPI_Comm_size: Determines the number of processes in a communicator. MPI_Comm_size(comm, &size)
MPI_Comm_rank: Determines the rank of the calling process within the communicator. MPI_Comm_rank(comm, &rank)
MPI_Get_processor_name: Returns the processor name. MPI_Get_processor_name(name, &resultlength)

11 11 Communicators: MPI_COMM_WORLD
A communicator is a set of processes that can communicate with each other.
MPI routines require a communicator.
MPI_COMM_WORLD is the default communicator; it contains all the processes.
Within a communicator each process has a rank.

12 12 Hello World

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[] )
{
    int rank, size;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Get_processor_name( processor_name, &namelen );

    printf( "called on %s\n", processor_name );
    printf( "Hello World from process %d of %d\n", rank, size );

    MPI_Finalize();
    return 0;
}

13 13
[sabin@cuc100 hello]$ ls
hellos  hellos.c  hellos.o  Makefile
[sabin@cuc100 hello]$ mpirun -np 4 hellos
called on cuc100.ucc.ie
called on cuc104.ucc.ie
called on cuc106.ucc.ie
called on cuc108.ucc.ie
Hello world from process 0 of 4
Hello world from process 2 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4

14 14 Simple Structures – All Processors Work

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[] )
{
    int rank, size;
    double time;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    // describe the processing to be done by processor rank
    // 1. identify the local data by using rank
    // 2. process the local data

    MPI_Finalize();
    return 0;
}

15 15 Case Study: Count Prime Numbers
Some facts about the lab program:
- The first n odd numbers 2*i+1 are tested, for i = 0, 1, 2, ..., n-1
- With size processors, each gets n/size numbers to test
- Block partition of the odd numbers onto processors:
  - Proc 0: 0, 1, 2, 3, ..., n/size-1
  - Proc 1: n/size, n/size+1, ..., 2*n/size-1
  - Proc 2: 2*n/size, 2*n/size+1, ..., 3*n/size-1
  - Proc rank gets: rank*(n/size), ..., (rank+1)*n/size-1

16 16 Count Primes

#include <stdio.h>
#include "mpi.h"

int isPrime(int x);    /* prime test, assumed defined elsewhere (see the sketch below this slide) */

int main( int argc, char* argv[] )
{
    int rank, size, i, count = 0;
    int n = 1000;      /* number of odd candidates to test (placeholder value) */
    double time;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    time = MPI_Wtime();

    /* LOCAL DATA: the block of indices owned by processor rank */
    for (i = rank*n/size; i < (rank+1)*n/size; i++)
        if (isPrime(2*i+1))
            count++;

    time = MPI_Wtime() - time;
    printf("Processor %d finds %d primes in %lf\n", rank, count, time);

    MPI_Finalize();
    return 0;
}
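The slides do not show isPrime; a minimal trial-division sketch (one possible implementation, not necessarily the one used in the lab):

int isPrime(int x)
{
    int d;
    if (x < 2) return 0;
    if (x % 2 == 0) return x == 2;        /* the only even prime is 2 */
    for (d = 3; d * d <= x; d += 2)       /* test odd divisors up to sqrt(x) */
        if (x % d == 0) return 0;
    return 1;
}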

17 17 Count Primes
Cyclic partition of the odd numbers onto processors:
- Proc 0: 0, size, 2*size, ...
- Proc 1: 1, size+1, 2*size+1, ...
- Proc rank gets: rank, rank+size, rank+2*size, ...

for (i = rank; i < n; i += size)
    if (isPrime(2*i+1))
        count++;

18 18 Simple Structures – One Processor Works

#include <stdio.h>
#include "mpi.h"

int main( int argc, char* argv[] )
{
    int rank, size;
    double time;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0)
    {
        // SERIAL PART: describe the processing to be done by processor 0
    }

    MPI_Finalize();
    return 0;
}

19 19 P2P Communication
MPI P2P operations involve message passing between two different MPI processes.
The sender calls MPI_Send and the receiver calls MPI_Recv. The code should look like:

if (rank == sender || rank == receiver)
{
    if (rank == sender)
        MPI_Send(...);
    else if (rank == receiver)
        MPI_Recv(...);
}
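A concrete, self-contained sketch of this pattern with rank 0 as the sender and rank 1 as the receiver (run with at least two processes; the payload value is arbitrary):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, value, tag = 1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                                /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);      /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);   /* receive from rank 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}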

20 20 Types of P2P Communication
Different types of send and receive routines exist for different purposes:
- Blocking send / blocking receive
- Synchronous send
- Non-blocking send / non-blocking receive
- Buffered send
- Combined send/receive
- "Ready" send
Blocking: the routine returns only after the operation has completed successfully.
Synchronous: blocking plus handshaking with the receiver.
Non-blocking: the routine returns immediately but must be completed later with MPI_Wait or MPI_Test: make the non-blocking call, do some other computation, then wait for or test the call's completion.

21 21 Blocking

22 22 Non-Blocking

23 23 Synchronous

24 24 Envelope Details
The P2P operations carry envelope details.
Blocking send: MPI_Send(buffer, count, type, dest, tag, comm)
Blocking receive: MPI_Recv(buffer, count, type, source, tag, comm, status)
Non-blocking send: MPI_Isend(buffer, count, type, dest, tag, comm, request)
Non-blocking receive: MPI_Irecv(buffer, count, type, source, tag, comm, request)
buffer - the address of the message
count - the number of elements to be sent
type - the MPI datatype of the elements
dest - the destination process
source - the source process; wild card MPI_ANY_SOURCE
tag - the message tag/id; wild card MPI_ANY_TAG
comm - the communicator
status - general info about the received message
request - handle used later to complete the non-blocking operation

25 25 MPI Data Types
MPI_CHAR            signed char
MPI_SHORT           signed short int
MPI_INT             signed int
MPI_LONG            signed long int
MPI_UNSIGNED_CHAR   unsigned char
MPI_UNSIGNED_SHORT  unsigned short int
MPI_UNSIGNED        unsigned int
MPI_UNSIGNED_LONG   unsigned long int
MPI_FLOAT           float
MPI_DOUBLE          double
MPI_LONG_DOUBLE     long double
MPI_LOGICAL         logical

26 26 Basic Blocking Operations
MPI_Send – basic send routine; returns only after the application buffer in the sending task is free for reuse. MPI_Send(&buf, count, datatype, dest, tag, comm)
MPI_Recv – receives a message and blocks until the requested data is available. MPI_Recv(&buf, count, datatype, source, tag, comm, &status)
MPI_Ssend – synchronous blocking send. MPI_Ssend(&buf, count, datatype, dest, tag, comm)
MPI_Bsend – buffered blocking send. MPI_Bsend(&buf, count, datatype, dest, tag, comm)
MPI_Rsend – blocking ready send. MPI_Rsend(&buf, count, datatype, dest, tag, comm)
The corresponding receive routines can be looked up in the same way.
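MPI_Bsend only works if buffer space has been attached with MPI_Buffer_attach; a minimal sketch of that pattern (buffer size and payload chosen arbitrarily; run with at least two processes):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, value = 7;
    int bufsize = 1024 + MPI_BSEND_OVERHEAD;   /* generous space for one small message */
    char *buffer = malloc(bufsize);
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Buffer_attach(buffer, bufsize);        /* give MPI the buffer that MPI_Bsend will use */

    if (rank == 0)
        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);

    MPI_Buffer_detach(&buffer, &bufsize);      /* blocks until buffered messages are delivered */
    free(buffer);
    MPI_Finalize();
    return 0;
}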

27 27 Basic Non-Blocking Operations
MPI_Isend – immediate send; should be followed by MPI_Wait or MPI_Test. MPI_Isend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Irecv – immediate receive. MPI_Irecv(&buf, count, datatype, source, tag, comm, &request)
MPI_Issend – immediate synchronous send; MPI_Wait() or MPI_Test() indicates when the destination process has received the message. MPI_Issend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Ibsend – non-blocking buffered send. MPI_Ibsend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Irsend – non-blocking ready send. MPI_Irsend(&buf, count, datatype, dest, tag, comm, &request)
MPI_Test – checks the status of a specified non-blocking send or receive operation.
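A minimal sketch of the non-blocking call / compute / wait pattern with MPI_Isend and MPI_Irecv (run with at least two processes; the overlapped computation is just a placeholder comment):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, in = 0, out;
    MPI_Request sreq, rreq;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    out = rank;

    if (rank == 0) {
        MPI_Isend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &sreq);   /* start the send */
        /* ... do some other computation while the message is in flight ... */
        MPI_Wait(&sreq, &status);                                   /* complete the send */
    } else if (rank == 1) {
        MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &rreq);    /* start the receive */
        /* ... do some other computation ... */
        MPI_Wait(&rreq, &status);                                   /* complete the receive */
        printf("rank 1 received %d\n", in);
    }

    MPI_Finalize();
    return 0;
}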

28 28 Simple Ping-Pong Example

29 29 Simple Ping-Pong Example

30 30 Simple Ping-Pong Example
The ping-pong computation works with the following elements:
- Only two processors are involved; the rest are idle.
- Processor 0 does:
  1. Prepare the message.
  2. Send the message to Processor 1.
  3. Receive the message from Processor 1.
- Processor 1 does:
  1. Prepare the message.
  2. Receive the message from Processor 0.
  3. Send the message to Processor 0.

31 31
// MPI program to ping-pong between Processor 0 and Processor 1
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int numtasks, rank, dest, source, rc, count, tag = 1;
    char inmsg, outmsg;
    MPI_Status Stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        dest = source = 1; outmsg = 'x';
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    }
    else if (rank == 1)
    {
        dest = source = 0; outmsg = 'y';
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d\n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);

    MPI_Finalize();
    return 0;
}

32 32
[sabin@cuc100 pingpong]$ make
/usr/local/mpich/bin/mpicc -c pingpong.c
/usr/local/mpich/bin/mpicc -o pingpong pingpong.o -lm
[sabin@cuc100 pingpong]$ mpirun -np 2 pingpong
Task 0 received the char x
Task 0: Received 1 char(s) from task 1 with tag 1
Task 1 received the char y
Task 1: Received 1 char(s) from task 0 with tag 1

33 33 All-to-Root as P2P Communication

34 34 All-to-Root as P2P Communication
The all-to-root computation involves:
- Each processor rank sends its message to the root.
- Processor 0 (the root) then:
  - for size times do
    - receive a message from processor source.
How do we compute the overall execution time?

35 35
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char** argv)
{
    int rank;                   /* rank of process */
    int size;                   /* number of processes */
    int source, dest;
    int tag = 50;
    char message[100];
    MPI_Status status;          /* return status for receive */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sprintf(message, "Greetings from process %d!", rank);
    dest = 0;
    /* use strlen(message)+1 to include '\0' */
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

    if (rank == 0)
    {
        for (source = 0; source < size; source++)
        {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    MPI_Finalize();
    return 0;
}

36 36
[sabin@cuc100 fireP0]$ make
/usr/local/mpich/bin/mpicc -c fire.c
/usr/local/mpich/bin/mpicc -o fire fire.o -lm
[sabin@cuc100 fireP0]$ mpirun -np 5 fire
Here it is Process 0
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!

37 37 Ring Communication
How can all the processors get to know a variable held by every other processor?
How many values do they have to know? How can this be achieved?
A circular process: each processor repeats, size times:
- Send the value to the right
- Receive a value from the left
- Store or process the received value

38 38 Ring Communication (figure: a ring of six processors holding the values a, b, c, d, e, f)

39 39 Ring Communication (figure: after one shift each processor also holds the value received from its left neighbour: a,f  b,a  c,b  d,c  e,d  f,e)

40 40
#include <stdio.h>
#include "mpi.h"

#define tag 100

int main(int argc, char* argv[])
{
    int rank, size;
    int right, left;
    int ibuff, obuff, sum, i;
    MPI_Status recv_status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = rank + 1;
    if (right == size) right = 0;
    left = rank - 1;
    if (left == -1) left = size - 1;

    sum = 0;
    obuff = rank;
    for (i = 0; i < size; i++)
    {
        MPI_Send(&obuff, 1, MPI_INT, right, tag, MPI_COMM_WORLD);
        MPI_Recv(&ibuff, 1, MPI_INT, left, tag, MPI_COMM_WORLD, &recv_status);
        // storebuff[(rank-i)%n] = obuff;
        sum = sum + ibuff;
        obuff = ibuff;
    }

    printf("\t Processor %d: \t Sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}

41 41
[sabin@cuc100 ring]$ make
/usr/local/mpich/bin/mpicc -c ring.c
/usr/local/mpich/bin/mpicc -o ring ring.o -lm
[sabin@cuc100 ring]$ mpirun -np 5 ring
Processor 0: Sum = 10
Processor 1: Sum = 10
Processor 3: Sum = 10
Processor 4: Sum = 10
Processor 2: Sum = 10

42 42 References: 1. LLNL MPI Tutorial – Sections on P2P communication. 2. Wilkinson Book – Sections on P2P Communication.

