1 A Brief Look At MPI's Point To Point Communication

Brian T. Smith
Professor, Department of Computer Science
Director, Albuquerque High Performance Computing Center (AHPCC)

2 Point To Point Communication

- What is meant by this concept?
  - There is a sender and a receiver
    - The sender prepares a message in a package from the application storage area
    - The sender has a protocol for how it contacts and communicates with the receiver
      - The protocol is an agreement on how the communication is set up
      - The sender and receiver agree on whether and how to communicate
    - The receiver receives the message package per its agreement with the sender
    - The receiver processes the packet and installs the data in the application storage area

3 Communication Models

- Many models are feasible and have been implemented in various environments, past and current
- MPI's goal is to be portable across all of the reasonable models
  - This means that essentially NO assumptions can be made, either by the implementation or by the user, as to which model is or can be used
- Let's talk about two possible models
  - Models like these were actually used, informally and differently, by the individual "CPUs" in our recent trial communications amongst the three institutions

4 MPI's Conventions

- Messages have a format, or template:
  - A message container, called a buffer, frequently assumed to be specified in user space (the storage set up by the user's code)
  - A length, in number of objects of the message type
  - The type of the objects in the message (a basic type or a user-defined type)
  - A message tag: a user-specified integer id for the message
  - The destination (for the sender) or source (for the receiver) of the message
    - The destination is the rank of the process in the process group
  - A communication world or group: a named arrangement established by calls to MPI

5 MPI's Conventions Continued

- Kinds of communication
  - Blocking
    - The sender does not return from the MPI call until the message buffer (the user's container for the message) can be reused without corrupting the message being sent
    - The receiver does not return until the receiving message buffer contains all of the message
  - Non-blocking
    - The send call returns after sufficient processing has been performed to allow a separate, independent thread to complete sending the message; in particular, changes to the sending task's message buffer may change the message that is sent
    - The receive call returns after sufficient processing has been performed to allow a separate, independent thread to complete receiving the message; in particular, the receiving task's message buffer will likely change after the call returns to the user's code
    - Other MPI procedures (e.g., MPI_TEST and MPI_WAIT) test or wait for the completion of sends and receives
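
The blocking semantics above can be illustrated with a small Python sketch. This is a simulation using a thread and a queue, not MPI itself: the point is that the send returns only once the message has been copied out of the sender's buffer, so reusing the buffer afterwards cannot corrupt the message.

```python
import threading
import queue

channel = queue.Queue()            # stands in for the transport layer

def blocking_send(buf):
    # put() returns only after buf's contents are copied into the channel,
    # so the caller may then safely reuse buf (blocking-send semantics)
    channel.put(list(buf))

def blocking_recv():
    # get() does not return until a complete message is available
    # (blocking-receive semantics)
    return channel.get()

send_buf = [1.0, 2.0, 3.0]
sender = threading.Thread(target=blocking_send, args=(send_buf,))
sender.start()
recv_buf = blocking_recv()
sender.join()

send_buf[0] = 99.0                 # reuse the buffer after the send completes
print(recv_buf)                    # the received message is unaffected
```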

6 MPI Conventions Continued

- Modes of communication (contact protocols and assumptions)
  - These are assumptions the user may make; the implementation must honor them
  - The mode is determined by the name of the MPI send procedure used
    - E.g., MPI_BSEND specifies a buffered send
  - Standard (no letter)
    - Assumes no particular protocol is used; see the later modes for the typical protocols
      - Because no protocol is assumed, the programmer must assume the most restrictive behavior, namely that the message is not buffered (synchronous mode)
    - Non-local operation: another process may have to do something before this operation completes
  - Buffered (letter B)
    - Buffers used by the protocol are created and allocated in user space
    - The send can be started whether or not a receive has been posted
    - Local operation: another process does not have to do anything before this operation completes

7 Modes Continued

- Synchronous (letter S)
  - Rendezvous semantics: the send starts but does not complete until the receiver has posted a receive
    - A buffer may be created in the receiver's space, or the transfer may be direct
  - Non-local operation
- Ready (letter R)
  - The sender may start only if the matching receive has already been posted
  - Erroneous if the receive has not been posted; the result is undefined
  - Non-local operation
  - Highest performance, as it can be a direct transfer with no buffer

8 MPI Conventions Continued

- Communication "worlds", or communicators
  - A communicator specifies the domain of the processes within the group
  - A process may be in more than one process group
  - Each process has a rank in each group
    - The rank of a particular process may be different in each group
  - The purpose of the groups is to arrange the processes so that it is convenient to send/receive messages within a particular group while other processes do not see them:
    - Processes in a grid (north-south-east-west communication)
    - Processes distributed in a line, or in a row or column of a grid
    - Processes in a circle
    - Processes in a hypercube configuration
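
The grid arrangement can be sketched by mapping a rank to (row, col) coordinates and computing its N-S-E-W neighbor ranks. This is a plain Python illustration; `grid_neighbors` is a hypothetical helper, not an MPI call (MPI provides Cartesian communicators, e.g. MPI_CART_CREATE, for this purpose).

```python
def grid_neighbors(rank, rows, cols):
    """Return the N, S, E, W neighbor ranks of `rank` in a rows x cols grid
    (row-major ranks), or None where the neighbor would fall off the edge."""
    r, c = divmod(rank, cols)                 # rank -> (row, col)

    def to_rank(rr, cc):
        if 0 <= rr < rows and 0 <= cc < cols:
            return rr * cols + cc
        return None                           # off the edge of the grid

    return {"N": to_rank(r - 1, c), "S": to_rank(r + 1, c),
            "E": to_rank(r, c + 1), "W": to_rank(r, c - 1)}

print(grid_neighbors(4, 3, 3))   # {'N': 1, 'S': 7, 'E': 5, 'W': 3}
```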

9 Pictures of Implementation Models

(Figure: two diagrams, each showing a sender and a receiver with their user data and a buffer.
 Diagram 1: a send buffer is used, no receive buffer is used.
 Diagram 2: both a send buffer and a receive buffer are used.)

10 Pictures of Implementation Models

(Figure: two more diagrams of a sender and a receiver with their user data and a buffer.
 Diagram 3: no send buffer and no receive buffer are used.
 Diagram 4: no send buffer is used, a receive buffer is used.)

11 Blocking Communication Operations

- MPI_SEND and MPI_RECV
  - Let's look at 3 reasonable ways to perform communication between 2 processes which exchange messages:
    - One always works
    - One always deadlocks
      - That is, both processes hang, each waiting for the other to communicate
    - One may or may not work, depending on the actual protocols used by the MPI implementation

12 This One Always Works

- Steps:
  - Determine what rank the process is
  - If rank == 0:
    - Send a message from send_buffer to the process with rank 1
    - Receive a message into recv_buffer from the process with rank 1
  - Else if rank == 1:
    - Receive a message into recv_buffer from the process with rank 0
    - Send a message from send_buffer to the process with rank 0
- Pattern of communication (it doesn't matter who, 0 or 1, executes first):

      Processor 0       Processor 1
      Send first        Receive first
      Receive next      Send next

13 Example Code – Always Works

      call MPI_Comm_rank( comm, rank, ierr )
      if ( rank == 0 ) then
         call MPI_Send( sendbuf, count, MPI_REAL, &
                        1, tag, comm, ierr )
         call MPI_Recv( recvbuf, count, MPI_REAL, &
                        1, tag, comm, status, ierr )
      else if ( rank == 1 ) then
         call MPI_Recv( recvbuf, count, MPI_REAL, &
                        0, tag, comm, status, ierr )
         call MPI_Send( sendbuf, count, MPI_REAL, &
                        0, tag, comm, ierr )
      endif
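
The pattern can be simulated in Python with two threads and an unbuffered, rendezvous-style channel. This is a sketch of worst-case blocking-send semantics, not MPI itself: because each rank's first call finds a matching partner, both ranks complete the exchange even with no buffering at all.

```python
import threading
import queue

class Rendezvous:
    """Unbuffered channel: send() blocks until a recv() has taken the item."""
    def __init__(self):
        self.q = queue.Queue(maxsize=1)
        self.taken = threading.Event()
    def send(self, item):
        self.q.put(item)
        self.taken.wait()          # block until the receiver has the message
        self.taken.clear()
    def recv(self):
        item = self.q.get()
        self.taken.set()           # release the blocked sender
        return item

to0, to1 = Rendezvous(), Rendezvous()   # channels into rank 0 and rank 1
out = {}

def rank0():
    to1.send("from 0")            # send first ...
    out[0] = to0.recv()           # ... then receive

def rank1():
    out[1] = to1.recv()           # receive first ...
    to0.send("from 1")            # ... then send

threads = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
for t in threads: t.start()
for t in threads: t.join(timeout=2)
print(out)                        # both ranks completed the exchange
```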

14 This One Always Deadlocks

- Steps:
  - Determine what rank the process is
  - If rank == 0:
    - Receive a message into recv_buffer from the process with rank 1
    - Send a message from send_buffer to the process with rank 1
  - Else if rank == 1:
    - Receive a message into recv_buffer from the process with rank 0
    - Send a message from send_buffer to the process with rank 0
- Pattern of communication (it doesn't matter who, 0 or 1, executes first):

      Processor 0       Processor 1
      Receive first     Receive first
      Send next         Send next

15 Example Code – Always Deadlocks

      call MPI_Comm_rank( comm, rank, ierr )
      if ( rank == 0 ) then
         call MPI_Recv( recvbuf, count, MPI_REAL, &
                        1, tag, comm, status, ierr )
         call MPI_Send( sendbuf, count, MPI_REAL, &
                        1, tag, comm, ierr )
      else if ( rank == 1 ) then
         call MPI_Recv( recvbuf, count, MPI_REAL, &
                        0, tag, comm, status, ierr )
         call MPI_Send( sendbuf, count, MPI_REAL, &
                        0, tag, comm, ierr )
      endif
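
A Python sketch of the same ordering (again a simulation, not MPI) shows why it hangs: both "ranks" block in their receive and neither ever reaches its send. Daemon threads and a join timeout stand in for observing the deadlock.

```python
import threading
import queue

to0, to1 = queue.Queue(), queue.Queue()   # channels into rank 0 and rank 1

def rank0():
    to0.get()              # receive first: blocks forever, rank 1 never sends
    to1.put("from 0")      # never reached

def rank1():
    to1.get()              # receive first: blocks forever, rank 0 never sends
    to0.put("from 1")      # never reached

threads = [threading.Thread(target=rank0, daemon=True),
           threading.Thread(target=rank1, daemon=True)]
for t in threads: t.start()
for t in threads: t.join(timeout=1)       # give them a second, then check
deadlocked = all(t.is_alive() for t in threads)
print("deadlocked:", deadlocked)          # True
```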

16 This One May Or May Not Work – The Worst Of All Possibilities

- That is, it may work on one implementation and not work on another
  - Whether it works may depend on the size of the message or on other unknown features of the implementation
  - It relies on buffering of the messages that the code does not ask for: no MPI_BSEND is used and no buffer is attached with MPI_Buffer_attach
- Pattern of communication (it doesn't matter who, 0 or 1, executes first):

      Processor 0       Processor 1
      Send first        Send first
      Receive next      Receive next

17 Example Code – May Fail

      call MPI_Comm_rank( comm, rank, ierr )
      if ( rank == 0 ) then
         call MPI_Send( sendbuf, count, MPI_REAL, &
                        1, tag, comm, ierr )
         call MPI_Recv( recvbuf, count, MPI_REAL, &
                        1, tag, comm, status, ierr )
      else if ( rank == 1 ) then
         call MPI_Send( sendbuf, count, MPI_REAL, &
                        0, tag, comm, ierr )
         call MPI_Recv( recvbuf, count, MPI_REAL, &
                        0, tag, comm, status, ierr )
      endif
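
The implementation dependence can be made concrete with a Python simulation (not MPI): run the send-first/send-first exchange once over a channel that buffers the message (the send completes immediately) and once over a rendezvous channel (the send blocks until the receive). The `Buffered` and `Synchronous` classes here are toy models of the buffered and synchronous protocols described earlier.

```python
import threading
import queue

def exchange(make_channel, timeout=1.0):
    """Both ranks send first, then receive. Returns True if the exchange
    completed within the timeout, False if it hung."""
    to0, to1 = make_channel(), make_channel()
    done = []
    def rank0():
        to1.send("from 0"); to0.recv(); done.append(0)
    def rank1():
        to0.send("from 1"); to1.recv(); done.append(1)
    ts = [threading.Thread(target=rank0, daemon=True),
          threading.Thread(target=rank1, daemon=True)]
    for t in ts: t.start()
    for t in ts: t.join(timeout)
    return len(done) == 2

class Buffered:
    """Send completes immediately: the implementation buffers the message."""
    def __init__(self): self.q = queue.Queue()
    def send(self, item): self.q.put(item)
    def recv(self): return self.q.get()

class Synchronous:
    """Rendezvous: send blocks until the receiver has taken the message."""
    def __init__(self):
        self.q = queue.Queue(maxsize=1); self.taken = threading.Event()
    def send(self, item):
        self.q.put(item); self.taken.wait(); self.taken.clear()
    def recv(self):
        item = self.q.get(); self.taken.set(); return item

buffered_ok = exchange(Buffered)       # completes: buffering saves the day
sync_ok = exchange(Synchronous)        # hangs: both sends block each other
print(buffered_ok, sync_ok)
```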

18 An Application Showing These Issues – Very Close To Your Code

- Consider a 2-D Jacobi iteration on an n x n matrix using a 5-point stencil
  - The data structure used here is a 1-D partition
    - The coding illustrations are simpler this way
    - However, this code does not scale well when the ratio of the problem size n to the number of processors is large – the practical case
      - The communication overhead is too large in that case
  - The algorithm, or computation, is:
    - Given initial data for the matrix A, compute the average of the E-W-N-S neighbors of each point and assign it to the matrix B
    - Assign matrix B to A and repeat the process until it has converged

19 Serial Code

      real A(0:n+1,0:n+1), B(1:n,1:n)
      ! Main loop
      do while ( .NOT. Converged(A) )
         do j = 1, n
            b(1:n,j) = 0.25*( a(0:n-1,j) + a(2:n+1,j) + &
                              a(1:n,j-1) + a(1:n,j+1) )
         enddo
         a(1:n,1:n) = b(1:n,1:n)
      enddo
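
For comparison, the same serial iteration can be sketched in Python with NumPy. This is an illustration only; the convergence test and tolerance here are assumptions, not the author's `Converged` routine.

```python
import numpy as np

def jacobi_step(a):
    """One 5-point-stencil sweep over the interior of `a`
    (`a` carries one layer of boundary cells on each side)."""
    return 0.25 * (a[:-2, 1:-1] + a[2:, 1:-1] +    # north + south neighbors
                   a[1:-1, :-2] + a[1:-1, 2:])     # west + east neighbors

def jacobi(a, tol=1e-6, max_iter=10_000):
    a = a.copy()
    for _ in range(max_iter):
        b = jacobi_step(a)
        # Hypothetical convergence test: stop when the update is tiny
        if np.max(np.abs(b - a[1:-1, 1:-1])) < tol:
            break
        a[1:-1, 1:-1] = b                          # assign B to A and repeat
    return a

# Example: zero interior with boundary fixed at 1; the interior relaxes to 1
n = 8
a = np.ones((n + 2, n + 2))
a[1:-1, 1:-1] = 0.0
result = jacobi(a)
```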

20 Partitioning A And B Amongst The Processors

- For simplicity of explaining the SEND/RECV commands, we use a 1-D partition
(Figure: A and B are split column-wise across the processes. Each process owns m interior columns; its block of A spans rows 0..n+1 and columns 0..m+1, including a ghost column on each side, while its block of B spans only the interior, rows 1..n and columns 1..m.)
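
The column ranges implied by this partition can be sketched as a small helper. `local_columns` is a hypothetical name, and it assumes p divides n evenly, as the slides do.

```python
def local_columns(rank, n, p):
    """Global column range (lo, hi), inclusive and 1-based to match the
    Fortran indexing, owned by `rank` when n columns are split evenly
    over p processes (assumes p divides n)."""
    m = n // p                 # columns per process
    lo = rank * m + 1
    return lo, lo + m - 1

print(local_columns(0, 12, 3))   # (1, 4)
```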

21 Code For This – Unsafe

      real A(0:n+1,0:m+1), B(1:n,1:m)
      ! Call MPI to return p (number of processes) and myrank
      ! Assume n is an integral multiple of p, with m = n/p columns per process
      ! Main loop
      do while ( .NOT. Converged(A) )
         ! Compute with A and store in B, as in the serial code
         ...
         if ( myrank > 0 ) then
            ! Send first column of B to myrank-1 (the last ghost column of A there)
         endif
         if ( myrank < p-1 ) then
            ! Send last column of B to myrank+1 (the first ghost column of A there)
         endif
         if ( myrank > 0 ) then
            ! Receive from myrank-1 into the first (ghost) column of A
         endif
         if ( myrank < p-1 ) then
            ! Receive from myrank+1 into the last (ghost) column of A
         endif
      enddo

22 Unsafe – Why?

- All the sends are executed before any receive is posted
  - As before, this assumes the messages are buffered
    - This must not be assumed in standard mode
- Solution:
  - Divide the processes into two groups, even and odd ranks
    - The odd processes send to the even processes first
      - Then the odd processes receive from the even processes
    - The even processes receive from the odd processes first
      - Then the even processes send to the odd processes
  - The effect is to interleave the send and receive commands so that no buffers are required to complete the communication
    - They may, of course, still be used
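
The odd/even ordering can be sketched as a pure function that lists each rank's neighbor operations in order. `exchange_schedule` is a hypothetical helper, not an MPI call; it assumes the 1-D line of processes from the previous slides. Each ("send", k) on one rank pairs with a ("recv", ...) already posted, or about to be posted, on rank k, so no circular wait can arise.

```python
def exchange_schedule(rank, p):
    """Ordered neighbor operations for `rank` in the odd/even scheme
    on a 1-D line of p processes."""
    left, right = rank - 1, rank + 1
    ops = []
    if rank % 2 == 1:                        # odd ranks: send first, then receive
        ops.append(("send", left))           # left always exists for odd ranks
        if right < p: ops.append(("send", right))
        ops.append(("recv", left))
        if right < p: ops.append(("recv", right))
    else:                                    # even ranks: receive first, then send
        if left >= 0: ops.append(("recv", left))
        if right < p: ops.append(("recv", right))
        if left >= 0: ops.append(("send", left))
        if right < p: ops.append(("send", right))
    return ops

print(exchange_schedule(1, 4))
```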

23 Safe Communication

      do while ( .NOT. Converged(A) )
         ! Compute with A and store in B, as in the serial code
         ...
         if ( mod(myrank,2) == 1 ) then   ! Odd-ranked processes: send first
            ! Send first column of B to processor myrank-1
            ! If not the last processor, send last column of B to myrank+1
            ! Receive into first (ghost) column of A from myrank-1
            ! If not the last processor, receive into last (ghost) column of A
            ! from myrank+1
         else                             ! Even-ranked processes: receive first
            ! If not the first processor, receive into first (ghost) column of A
            ! from myrank-1
            ! If not the last processor, receive into last (ghost) column of A
            ! from myrank+1
            ! If not the first processor, send first column of B to myrank-1
            ! If not the last processor, send last column of B to myrank+1
         endif
      enddo

24 Safe And Simpler Communications

- Use the combined send/receive command (MPI_SENDRECV) for all but the first and last processes
- Use null processes (MPI_PROC_NULL) to avoid special-casing the first and last processes
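
The null-process idea can be sketched in Python: give the boundary ranks a null neighbor so that every rank runs identical exchange code, mirroring MPI_PROC_NULL semantics, where a send or receive addressed to the null process completes immediately and does nothing. The names here are illustrative, not MPI.

```python
PROC_NULL = -1                    # stands in for MPI_PROC_NULL

def neighbors(rank, p):
    """Left and right neighbors on a 1-D line of p processes; the end
    ranks get the null process, so no special-casing is needed."""
    left = rank - 1 if rank > 0 else PROC_NULL
    right = rank + 1 if rank < p - 1 else PROC_NULL
    return left, right

# Every rank runs the same code -- no if/else for the boundary ranks.
# A sendrecv addressed to PROC_NULL would simply be a no-op.
p = 4
for rank in range(p):
    left, right = neighbors(rank, p)
    print(rank, left, right)
```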

