A Brief Look At MPI’s Point To Point Communication
Brian T. Smith
Professor, Department of Computer Science
Director, Albuquerque High Performance Computing Center (AHPCC)

Point To Point Communication
- What is meant by this concept?
  - There is a sender and a receiver
    - The sender prepares a message in a package from the application storage area
    - The sender has a protocol for how it contacts and communicates with the receiver
      - The protocol is an agreement on how the communication is set up
      - The sender and receiver agree on whether and how to communicate
    - The receiver receives the message package per its agreement with the sender
    - The receiver processes the packet and installs the data in the application storage area

Communication Models
- Many models are feasible and have been implemented in various environments, past and current
- MPI’s goal is to be portable across all of the reasonable models
  - This means that essentially NO assumptions can be made, either by the implementation or by the user, as to which model is or can be used
- Let’s look at two possible models
  - Models like these were in fact used informally, and differently, by the individual “CPUs” in our recent trial communications among the three institutions

MPI’s Conventions
- Messages have a format or a template:
  - Message container, called a buffer, which is frequently assumed to be specified in user space (the storage set up by the user’s code)
  - Length, in terms of the number of objects of the message type
  - The type of the objects in the message (basic type or user-defined type)
  - A message tag: a user-specified integer id for the message
  - Destination (for the sender) or source (for the receiver) of the message
    - The destination is the rank of the process in the process group
  - Communication world or group: a named arrangement established by calls to MPI
- These pieces map directly onto the arguments of the send and receive calls, as sketched below
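The following minimal sketch is not from the original slides; the array a, the rank other, and tag are hypothetical names used only to label where each envelope piece appears in the Fortran bindings.

   ! Illustration of the envelope components in the Fortran bindings
   ! (buffer, count, type, destination/source, tag, communicator).
   use mpi
   real    :: a(100)
   integer :: other, tag, ierr, status(MPI_STATUS_SIZE)

   !              buffer count type      dest/src tag  communicator
   call MPI_Send( a,     100,  MPI_REAL, other,   tag, MPI_COMM_WORLD, ierr )
   call MPI_Recv( a,     100,  MPI_REAL, other,   tag, MPI_COMM_WORLD, status, ierr )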

MPI’s Conventions Continued
- Kinds of communication
  - Blocking
    - The sender does not return from the MPI call until the message buffer (the user’s container for the message) can be reused without corrupting the message that is being sent
    - The receiver does not return until the receiving message buffer contains all of the message
  - Non-blocking
    - The sender’s call returns after sufficient processing has been performed to allow the processor, in a separate and independent thread, to complete sending the message; in particular, changes to the sending task’s message buffer may change the message sent
    - The receiver’s call returns after sufficient processing has been performed to allow the processor, in a separate and independent thread, to complete receiving the message; in particular, the receiving task’s message buffer may still change after the call returns to the user’s code
    - Other MPI procedures test or wait for the completion of sends and receives (see the sketch below)
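For contrast with the blocking calls, here is a hedged sketch (not from the slides) of the non-blocking forms together with the completion call they require; the variable names are hypothetical.

   ! Non-blocking point-to-point communication: the buffers must not be
   ! reused (send) or read (receive) until completion is reported.
   use mpi
   real    :: sendbuf(100), recvbuf(100)
   integer :: req(2), statuses(MPI_STATUS_SIZE,2), other, tag, ierr

   call MPI_Isend( sendbuf, 100, MPI_REAL, other, tag, MPI_COMM_WORLD, req(1), ierr )
   call MPI_Irecv( recvbuf, 100, MPI_REAL, other, tag, MPI_COMM_WORLD, req(2), ierr )

   ! ... unrelated computation can proceed here ...

   call MPI_Waitall( 2, req, statuses, ierr )   ! both buffers are now safe to use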

MPI Conventions Continued
- Modes of communication (contact protocols and assumptions)
  - These are assumptions that may be made by the user, and the implementation must follow these assumptions
  - The mode is determined by the name of the MPI send procedure used
    - E.g.: MPI_BSEND specifies a buffered send
  - Standard (no letter)
    - Assumes no particular protocol; see the later modes for the typical protocols
      - Because no protocol is assumed, the programmer must program for the most restrictive case: the message may not be buffered, so the send may not complete until a matching receive is posted
    - Non-local operation: another process may have to do something before this operation completes
  - Buffered (B letter)
    - Buffers created and used by the protocol are allocated in user space (see the sketch below)
    - The send can be started whether or not a receive has been posted
    - Local operation: another process does not have to do anything before this operation completes
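As an illustration of the buffered mode, the sketch below shows the user-space buffer being attached before MPI_BSEND is used; the buffer size and the names are assumptions chosen only for the example, not code from the slides.

   ! Buffered-mode send: the user attaches buffer space in user space first.
   use mpi
   real      :: sendbuf(100)
   integer, parameter :: bufsize = 4096      ! assumed ample: data + MPI_BSEND_OVERHEAD
   character :: workspace(bufsize)
   integer   :: dest, tag, ierr, sz

   call MPI_Buffer_attach( workspace, bufsize, ierr )
   call MPI_Bsend( sendbuf, 100, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr )
   call MPI_Buffer_detach( workspace, sz, ierr )   ! waits until buffered messages are delivered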

Modes Continued
  - Synchronous (S letter)
    - Rendezvous semantics implemented
      - The sender may start, but does not complete, until the receiver has posted a receive
        - A buffer may be created in the receiver’s space, or it may be a direct transfer
      - Non-local operation
  - Ready (R letter)
    - The sender may start only if the matching receive has been posted
    - Erroneous if the receive has not been posted; the result is undefined
    - Non-local operation
    - Highest performance, as it can be a direct transfer with no buffer (calls sketched below)
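The synchronous and ready sends take exactly the same argument list as the standard send; only the procedure name changes, as in this hedged fragment (variable names hypothetical).

   ! Same arguments as MPI_Send; only the mode (procedure name) differs.
   call MPI_Ssend( sendbuf, 100, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr )  ! completes only after the receive has started
   call MPI_Rsend( sendbuf, 100, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr )  ! matching receive must already be posted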

MPI Conventions Continued
- Communication “worlds” or communicators
  - Specify the domain of the processes within the group
  - A processor may be in more than one process group
  - Each processor has a rank in each group
    - The rank of a particular process may be different in each group
  - The purpose of the groups is to arrange the processors so that it is convenient to send and receive messages within a particular group, while processors outside the group do not see the messages (a sketch of creating such groups follows); for example:
    - Processors in a grid (north-south-east-west communication)
    - Processors distributed in a line, or in a row or column of a grid
    - Processors in a circle
    - Processors in a hypercube configuration
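Although the slides do not show it, one common way to create such groups is MPI_COMM_SPLIT; this hedged sketch groups the processes of an assumed 4-wide logical grid by row, so that a process’s rank in the row communicator may differ from its rank in MPI_COMM_WORLD.

   ! Split MPI_COMM_WORLD into one communicator per grid row.
   use mpi
   integer :: world_rank, row, row_comm, row_rank, ierr
   integer, parameter :: ncols = 4      ! assumed grid width

   call MPI_Comm_rank( MPI_COMM_WORLD, world_rank, ierr )
   row = world_rank / ncols             ! processes with the same "color" share a group
   call MPI_Comm_split( MPI_COMM_WORLD, row, world_rank, row_comm, ierr )
   call MPI_Comm_rank( row_comm, row_rank, ierr )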

Pictures of Implementation Models
[Diagrams: a sender and a receiver, each with user data and a local buffer, showing the four combinations of buffering: send buffer used or not used, crossed with receive buffer used or not used.]

Blocking Communication Operations
- MPI_SEND and MPI_RECV
  - Let’s look at three reasonable ways to perform communication between two processors that exchange messages:
    - One always works
    - One always deadlocks
      - That is, both processors hang waiting for the other to communicate
    - One may or may not work, depending on the actual protocols used by the MPI implementation

This One Always Works
- Steps:
  - Determine what rank the process is
  - If rank == 0:
    - Send a message from send_buffer to the process with rank 1
    - Receive a message into recv_buffer from the process with rank 1
  - Else if rank == 1:
    - Receive a message into recv_buffer from the process with rank 0
    - Send a message from send_buffer to the process with rank 0
- Pattern of communication (it doesn’t matter which process, 0 or 1, executes first):
  - Processor 0: send first, receive next
  - Processor 1: receive first, send next

Example Code – Always Works

   call MPI_Comm_rank( comm, rank, ierr )
   if( rank == 0 ) then
      call MPI_Send( sendbuf, count, MPI_REAL, &
                     1, tag, comm, ierr )
      call MPI_Recv( recvbuf, count, MPI_REAL, &
                     1, tag, comm, status, ierr )
   else if( rank == 1 ) then
      call MPI_Recv( recvbuf, count, MPI_REAL, &
                     0, tag, comm, status, ierr )
      call MPI_Send( sendbuf, count, MPI_REAL, &
                     0, tag, comm, ierr )
   end if

This One Always Deadlocks
- Steps:
  - Determine what rank the process is
  - If rank == 0:
    - Receive a message into recv_buffer from the process with rank 1
    - Send a message from send_buffer to the process with rank 1
  - Else if rank == 1:
    - Receive a message into recv_buffer from the process with rank 0
    - Send a message from send_buffer to the process with rank 0
- Pattern of communication (it doesn’t matter which process, 0 or 1, executes first):
  - Processor 0: receive first, send next
  - Processor 1: receive first, send next

Example Code – Always Deadlocks

   call MPI_Comm_rank( comm, rank, ierr )
   if( rank == 0 ) then
      call MPI_Recv( recvbuf, count, MPI_REAL, &
                     1, tag, comm, status, ierr )
      call MPI_Send( sendbuf, count, MPI_REAL, &
                     1, tag, comm, ierr )
   else if( rank == 1 ) then
      call MPI_Recv( recvbuf, count, MPI_REAL, &
                     0, tag, comm, status, ierr )
      call MPI_Send( sendbuf, count, MPI_REAL, &
                     0, tag, comm, ierr )
   end if

This One May Or May Not Work – The Worst Of All Possibilities
- That is, it may work on one implementation and not work on another
  - Whether it works may depend on the size of the message or on other unknown features of the implementation
  - It relies on buffering of the messages that the code does not ask for: no MPI_BSEND is used and no MPI_Buffer_attach is called
- Pattern of communication (it doesn’t matter which process, 0 or 1, executes first):
  - Processor 0: send first, receive next
  - Processor 1: send first, receive next

Example Code – May Fail

   call MPI_Comm_rank( comm, rank, ierr )
   if( rank == 0 ) then
      call MPI_Send( sendbuf, count, MPI_REAL, &
                     1, tag, comm, ierr )
      call MPI_Recv( recvbuf, count, MPI_REAL, &
                     1, tag, comm, status, ierr )
   else if( rank == 1 ) then
      call MPI_Send( sendbuf, count, MPI_REAL, &
                     0, tag, comm, ierr )
      call MPI_Recv( recvbuf, count, MPI_REAL, &
                     0, tag, comm, status, ierr )
   end if

An Application Showing These Issues – Very Close To Your Code
- Consider a 2-D Jacobi iteration on an n x n matrix using a 5-point stencil
  - The decomposition used here is a 1-D partition of the matrix
    - The coding illustrations are simpler this way
    - However, this code does not scale well as the number of processors grows relative to the problem size n (the practical case); the communication overhead becomes too large
  - The algorithm, or computation, is:
    - Given initial data for the matrix A, compute the average of the E-W-N-S neighbors of each point and assign it to the matrix B
    - Assign matrix B to A and repeat the process until it has converged

Serial Code

   real A(0:n+1,0:n+1), B(1:n,1:n)
   ! Main loop
   do while( .NOT. Converged(A) )
      do j = 1, n
         b(1:n,j) = 0.25*( a(0:n-1,j) + a(2:n+1,j) + &
                           a(1:n,j-1) + a(1:n,j+1) )
      enddo
      a(1:n,1:n) = b(1:n,1:n)
   enddo

Partitioning A And B Amongst The Processors
- For simplicity in explaining the SEND/RECV commands, we use a 1-D (column-block) partition
[Diagram: the columns are divided into blocks of m columns per process; each process’s block of A spans columns 0 to m+1, including the two ghost columns 0 and m+1 for neighboring boundary data, while its block of B holds only columns 1 to m.]

Code For This -- Unsafe

   real A(0:n+1,0:m+1), B(1:n,1:m)
   ! Call MPI to return p (number of processors) and myrank
   ! Assume n is an integral multiple of p, with m = n/p columns per process
   ! Main loop
   do while( .NOT. Converged(A) )
      ! Compute with A and store in B as in the serial code ...
      if( myrank > 0 ) then
         ! Send first column of B to processor myrank-1 (its last ghost column of A)
      endif
      if( myrank < p-1 ) then
         ! Send last column of B to processor myrank+1 (its first ghost column of A)
      endif
      if( myrank > 0 ) then
         ! Receive from processor myrank-1 into the first ghost column of A
      endif
      if( myrank < p-1 ) then
         ! Receive from processor myrank+1 into the last ghost column of A
      endif
   enddo

Unsafe Why?
- All the sends are executed before any receive is posted
  - This assumes, as before, that the messages are buffered
    - This should not be assumed in standard mode
- Solution:
  - Divide the processors into two groups: even and odd processors
    - The odd processors send to the even processors first
      - Then the odd processors receive from the even processors
    - The even processors receive from the odd processors first
      - Then the even processors send to the odd processors
  - The effect is to interleave the send and receive commands so that no buffers are required to complete the communication
    - They may, of course, still be used

Safe Communication

   do while( .NOT. Converged(A) )
      ! Compute with A and store in B as in the serial code ...
      if( mod(myrank,2) == 1 ) then   ! Odd ranked processors
         ! Send first column of B to processor myrank-1 (its last ghost column of A)
         ! If not the last processor, send the last column of B to
         ! processor myrank+1
         ! Receive into first ghost column of A from processor myrank-1
         ! If not the last processor, receive into last ghost column of A
         ! from processor myrank+1
      else                            ! Even ranked processors
         ! If not the first processor, receive the last column of B of
         ! processor myrank-1 into the first ghost column of A
         ! If not the last processor, receive the first column of B of
         ! processor myrank+1 into the last ghost column of A
         ! If not the first processor, send the first column of B to
         ! processor myrank-1
         ! If not the last processor, send the last column of B
         ! to processor myrank+1
      endif
   enddo
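Filled in with explicit MPI calls, the odd/even exchange could look like the following sketch; the local array shapes A(0:n+1,0:m+1) and B(1:n,1:m), and the variables p, m, n, tag, comm, status, and ierr, are assumptions carried over from the partitioning discussion rather than code from the original slides.

   ! Sketch of the odd/even ordered exchange (standard-mode sends, no buffering needed).
   if( mod(myrank,2) == 1 ) then                       ! odd ranks: send first, then receive
      call MPI_Send( B(1,1), n, MPI_REAL, myrank-1, tag, comm, ierr )
      if( myrank < p-1 ) &
         call MPI_Send( B(1,m), n, MPI_REAL, myrank+1, tag, comm, ierr )
      call MPI_Recv( A(1,0), n, MPI_REAL, myrank-1, tag, comm, status, ierr )
      if( myrank < p-1 ) &
         call MPI_Recv( A(1,m+1), n, MPI_REAL, myrank+1, tag, comm, status, ierr )
   else                                                ! even ranks: receive first, then send
      if( myrank > 0 ) &
         call MPI_Recv( A(1,0), n, MPI_REAL, myrank-1, tag, comm, status, ierr )
      if( myrank < p-1 ) &
         call MPI_Recv( A(1,m+1), n, MPI_REAL, myrank+1, tag, comm, status, ierr )
      if( myrank > 0 ) &
         call MPI_Send( B(1,1), n, MPI_REAL, myrank-1, tag, comm, ierr )
      if( myrank < p-1 ) &
         call MPI_Send( B(1,m), n, MPI_REAL, myrank+1, tag, comm, ierr )
   endif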

Safe And Simpler Communications
- Use the combined send/receive command (MPI_SENDRECV) for all but the first and last processors
- Use null processes (MPI_PROC_NULL) to avoid the special cases of dealing with the first and last processors, as in the sketch below
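A hedged sketch of that idea for the halo exchange above: a send or receive whose partner is MPI_PROC_NULL completes immediately and does nothing, so the first and last processors need no special-case code (same assumed variable names as the previous sketch).

   ! Combined send/receive with null processes: ranks 0 and p-1 need no special code.
   integer :: left, right
   left  = myrank - 1
   right = myrank + 1
   if( myrank == 0   ) left  = MPI_PROC_NULL
   if( myrank == p-1 ) right = MPI_PROC_NULL

   ! Send my first column to the left neighbor while receiving the right
   ! neighbor's first column into my right ghost column, then the reverse.
   call MPI_Sendrecv( B(1,1),   n, MPI_REAL, left,  tag, &
                      A(1,m+1), n, MPI_REAL, right, tag, &
                      comm, status, ierr )
   call MPI_Sendrecv( B(1,m),   n, MPI_REAL, right, tag, &
                      A(1,0),   n, MPI_REAL, left,  tag, &
                      comm, status, ierr )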