1
Parallel Computing Message Passing Interface
EE Special Topics on Parallel Computing
Professor: Nagi Mekhiel
Presented by: Leili, Sanaz, Mojgan, Reza
2
Outline
Overview
Messages and Point-to-Point Communication
Non-blocking communication
Collective communication
Derived data types
Other MPI-1 features
Installing and Utilizing MPI
Experimental results
Conclusion
References
3
Overview
(Diagram of a typical cluster: mail, web, and other servers; a workstation; a head node; compute nodes on a private network; and an MPI interconnect ring.)
4
Overview Parallel Computing
A task is broken down into subtasks, performed by separate workers or processes.
Processes interact by exchanging information.
What do we basically need?
The ability to start the tasks.
A way for them to communicate.
5
Overview What is MPI?
A message passing library specification.
Message-passing model.
Not a compiler specification (i.e. not a language).
Not a specific product.
Designed for parallel computers, clusters, and heterogeneous networks.
6
Overview Synchronous Communication
A synchronous communication does not complete until the message has been received.
Analogy: a fax or registered mail.
7
Overview Asynchronous Communication
An asynchronous communication completes as soon as the message is on its way.
Analogy: a post card or e-mail.
8
Overview Collective Communications
Point-to-point communications involve pairs of processes. Many message-passing systems provide operations that allow larger numbers of processes to participate.
Types of collective transfers:
Barrier: synchronizes processes. No data is exchanged, but the barrier blocks until all processes have called the barrier routine.
Broadcast (sometimes multicast): a one-to-many communication. One process sends one message to several destinations.
Reduction: often useful in a many-to-one communication.
9
Overview What's in a Message?
An MPI message is an array of elements of a particular MPI datatype.
All MPI messages are typed: the type of the contents must be specified in both the send and the receive.
Basic C datatypes in MPI:
MPI datatype         C datatype
MPI_CHAR             signed char
MPI_SHORT            signed short int
MPI_INT              signed int
MPI_LONG             signed long int
MPI_UNSIGNED_CHAR    unsigned char
MPI_UNSIGNED_SHORT   unsigned short int
MPI_UNSIGNED         unsigned int
MPI_UNSIGNED_LONG    unsigned long int
MPI_FLOAT            float
MPI_DOUBLE           double
MPI_LONG_DOUBLE      long double
MPI_BYTE             (untyped bytes)
MPI_PACKED           (data packed with MPI_Pack)
10
Overview MPI Handles and MPI Errors
MPI handles:
MPI maintains internal data structures which are referenced by the user through handles.
Handles can be returned by and passed to MPI procedures.
Handles can be copied by the usual assignment operation.
MPI errors:
MPI routines return an int that can contain an error code.
The default action on detection of an error is to abort the parallel operation.
The default can be changed to return an error code instead.
11
Overview Initializing MPI
The first MPI routine called in any MPI program must be the initialization routine MPI_INIT.
MPI_INIT is called once by every process, before any other MPI routine.
C binding: int MPI_Init(int *argc, char ***argv);
12
Overview Skeleton MPI Program
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* main part of the program */
    MPI_Finalize();
    return 0;
}
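For reference, a minimal sketch (not from the original slides) that fills in the "main part" with the usual rank and size queries; MPI_Comm_rank, MPI_Comm_size, and MPI_COMM_WORLD are standard MPI calls:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}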
13
Overview Point-to-point Communication
Always involves exactly two processes.
The destination is identified by its rank within the communicator.
MPI provides four communication modes (these modes refer to sending, not receiving):
Standard
Synchronous
Buffered
Ready
14
Overview Standard Send
MPI_Send(buf, count, datatype, dest, tag, comm)
where:
buf is the address of the data to be sent
count is the number of elements of the MPI datatype which buf contains
datatype is the MPI datatype
dest is the destination process for the message, specified by the rank of the destination within the group associated with the communicator comm
tag is a marker used by the sender to distinguish between different types of messages
comm is the communicator shared by the sender and the receiver
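As an illustration (not from the slides), a minimal standard-mode send paired with a matching receive; the tag value 0 and the single-int payload are arbitrary choices, and the program assumes at least two processes:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}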
15
Overview Synchronous Send
MPI_Ssend(buf, count, datatype, dest, tag, comm)
Can be started whether or not a matching receive was posted.
Will complete successfully only if a matching receive is posted and the receive operation has started to receive the message sent by the synchronous send.
Provides synchronous communication semantics: a communication does not complete at either end before both processes rendezvous at the communication.
Has non-local completion semantics.
16
Overview Buffered Send
A buffered-mode send:
Can be started whether or not a matching receive has been posted; it may complete before a matching receive is posted.
Has local completion semantics: its completion does not depend on the occurrence of a matching receive.
In order to complete the operation, it may be necessary to buffer the outgoing message locally. For that purpose, buffer space is provided by the application (see the sketch below).
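A hedged sketch of how the application provides that buffer space; the helper name send_buffered is invented here, while MPI_Buffer_attach, MPI_Buffer_detach, MPI_Bsend, and MPI_BSEND_OVERHEAD are the standard MPI calls and constant:

#include <stdlib.h>
#include <mpi.h>

/* Sketch: attach an application buffer, then do a buffered send of n ints. */
int send_buffered(int *data, int n, int dest, MPI_Comm comm)
{
    int bufsize = n * sizeof(int) + MPI_BSEND_OVERHEAD;  /* room for one message */
    void *buf = malloc(bufsize);

    MPI_Buffer_attach(buf, bufsize);
    MPI_Bsend(data, n, MPI_INT, dest, 0, comm);          /* completes locally */
    MPI_Buffer_detach(&buf, &bufsize);                   /* waits for buffered messages to drain */
    free(buf);
    return 0;
}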
17
Overview Ready Mode Send
A ready-mode send:
May be started only if the matching receive has already been posted.
Otherwise has the same semantics as a standard-mode send.
Saves on overhead by avoiding handshaking and buffering, so it can complete immediately.
18
Messages and Point-to-Point Communication
19
Point-to-Point Communication
Communication between two processes.
The source process sends a message to the destination process.
Communication takes place within a communicator, e.g., MPI_COMM_WORLD.
Processes are identified by their ranks in the communicator.
(Diagram: numbered processes inside a communicator, with a message going from a source rank to a destination rank.)
20
Point-to-Point Communication
For a communication to succeed:
The sender must specify a valid destination rank.
The receiver must specify a valid source rank.
The communicator must be the same.
The tags must match.
The message datatypes must match.
The receiver's buffer must be large enough.
21
Point-to-Point Communication
Communication Modes
Send communication modes:
Synchronous send: MPI_SSEND
Buffered (asynchronous) send: MPI_BSEND
Standard send: MPI_SEND
Ready send: MPI_RSEND
Receiving (all modes): MPI_RECV
22
Point-to-Point Communication
Communication Modes: Definitions
Synchronous send (MPI_SSEND): completes only when the receive has started.
Buffered send (MPI_BSEND): always completes (unless an error occurs), irrespective of the receiver; needs an application-defined buffer, declared with MPI_BUFFER_ATTACH.
Standard send (MPI_SEND): either uses an internal buffer or is synchronous, depending on the implementation.
Ready send (MPI_RSEND): may be started only if the matching receive is already posted; highly dangerous!
Receive (MPI_RECV): completes when the message (data) has arrived.
23
Point-to-Point Communication
Message Order Preservation
Rule for messages on the same connection, i.e., same communicator, source, and destination rank:
Messages do not overtake each other. This is true even for non-synchronous sends.
If both receives match both messages, then the order is preserved.
24
Non-blocking Communication
25
Non-Blocking Communication
Meaning of blocking and non-blocking:
Blocking: the program does not return from the subroutine call until the copy to/from the system buffer has finished.
Non-blocking: the program returns immediately from the subroutine call. There is no guarantee that the copy to/from the system buffer has completed, so the user has to make sure the copy has finished before reusing the buffer.
26
Non-Blocking Communication
Characteristics: non-blocking communication separates a communication into three phases:
1. Initiate the non-blocking communication; the routine returns immediately (routine names start with MPI_I...).
2. Do some work ("latency hiding").
3. Wait for the non-blocking communication to complete.
27
Non-Blocking Communication
Non-Blocking Examples
Non-blocking send:
MPI_Isend(...)
... do some other work ...
MPI_Wait(...)
Non-blocking receive:
MPI_Irecv(...)
... do some other work ...
MPI_Wait(...)
MPI_Wait = wait until the operation is locally complete.
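A concrete C version of this pattern, assumed rather than taken from the slides: each of two processes posts a non-blocking receive and send, does other work, then waits on both requests at once with MPI_Waitall (a standard routine not shown on the slides). Run with exactly two processes:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, other, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other   = 1 - rank;                     /* partner rank (requires exactly 2 processes) */
    sendval = rank;

    MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do some other work here ("latency hiding") ... */

    MPI_Waitall(2, reqs, stats);            /* both operations locally complete */
    printf("rank %d received %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}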
28
Non-Blocking Communication
Non-blocking Synchronous Send:
C:
MPI_Issend(buf, count, datatype, dest, tag, comm, &request_handle);
MPI_Wait(&request_handle, &status);
Fortran:
CALL MPI_ISSEND(buf, count, datatype, dest, tag, comm, request_handle, ierror)
CALL MPI_WAIT(request_handle, status, ierror)
buf must not be used between Issend and Wait (in all programming languages).
"Issend + Wait directly after Issend" is equivalent to the blocking call (Ssend).
status is not used by Issend, only by Wait (with a send, nothing of interest is returned in status).
(request_handle is an OUT argument of MPI_Issend and an INOUT argument of MPI_Wait.)
29
Non-Blocking Communications
Non-blocking Receive:
C:
MPI_Irecv(buf, count, datatype, source, tag, comm, &request_handle);
MPI_Wait(&request_handle, &status);
Fortran:
CALL MPI_IRECV(buf, count, datatype, source, tag, comm, request_handle, ierror)
CALL MPI_WAIT(request_handle, status, ierror)
buf must not be used between Irecv and Wait (in all programming languages).
30
Collective Communication
31
Collective Communication
Characteristics:
Communication takes place within a group of processes.
Must be called by all processes in the communicator.
Synchronization may or may not occur.
All collective operations are blocking.
Receive buffers must have exactly the same size as send buffers.
32
Collective Communication
Barrier synchronization: a process waits for the other group members; when all of them have reached the barrier, they can continue.
C: int MPI_Barrier(MPI_Comm comm);
Fortran: MPI_BARRIER(COMM, IERROR)
INTEGER COMM, IERROR
In many programs, all necessary synchronization is done automatically by the data communication itself: a process cannot continue before it has the data that it needs.
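One common use of MPI_Barrier, added here as an assumed illustration (the helper name timed_section is invented), is to line processes up before timing a code section with MPI_Wtime:

#include <stdio.h>
#include <mpi.h>

/* Time a code section consistently across all processes (sketch). */
void timed_section(MPI_Comm comm)
{
    double t0, t1;
    int rank;

    MPI_Comm_rank(comm, &rank);

    MPI_Barrier(comm);          /* everyone starts the clock together */
    t0 = MPI_Wtime();

    /* ... work to be timed ... */

    MPI_Barrier(comm);          /* everyone has finished before the clock stops */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed: %f seconds\n", t1 - t0);
}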
33
Collective Communication
Broadcast: sends the data to all members of the group given by a communicator.
C: int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm);
Example: the first process that finds the solution in a competition informs everyone to stop.
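A minimal broadcast sketch (not from the slides; the variable n and the value 100 are arbitrary): every process calls MPI_Bcast, and afterwards all of them hold the root's value.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 100;                                  /* e.g., a value known only to the root */

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); /* all ranks call this */

    printf("rank %d: n = %d\n", rank, n);
    MPI_Finalize();
    return 0;
}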
34
Collective Communication
Gather: collects information from all participating processes.
C: int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
Example: each process computes some part of the solution, which shall now be assembled by one process.
35
Collective Communication
Scatter: distributes data among processes.
C: int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
Example: two vectors are distributed in order to prepare a parallel computation of their scalar product (see the sketch below).
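A hedged sketch of that scalar-product example (assumptions: vector length 8, a process count that divides the length, and MPI_Reduce with MPI_SUM, the standard reduction call not shown on the slides, to combine the partial products):

#include <stdio.h>
#include <mpi.h>

#define LEN 8                               /* must be divisible by the number of processes */

int main(int argc, char **argv)
{
    double x[LEN], y[LEN], xloc[LEN], yloc[LEN];
    double partial = 0.0, dot = 0.0;
    int rank, size, n, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    n = LEN / size;                         /* local chunk length */

    if (rank == 0)
        for (i = 0; i < LEN; i++) { x[i] = 1.0; y[i] = i; }

    MPI_Scatter(x, n, MPI_DOUBLE, xloc, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(y, n, MPI_DOUBLE, yloc, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < n; i++)
        partial += xloc[i] * yloc[i];       /* local part of the scalar product */

    MPI_Reduce(&partial, &dot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product = %f\n", dot);  /* 0+1+...+7 = 28 with the data above */

    MPI_Finalize();
    return 0;
}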
36
Derived Data Types
37
Derived Data Types MPI Data types
Datatypes describe the memory layout of the buffer, for sending and for receiving.
Basic types.
Derived types: vectors, structs, and others, built from existing datatypes.
38
Derived Data Types Data Layout and the Describing Datatype Handle
struct buff_layout {
    int    i_val[3];
    double d_val[5];
} buffer;

array_of_types[0] = MPI_INT;
array_of_blocklengths[0] = 3;
array_of_displacements[0] = 0;
array_of_types[1] = MPI_DOUBLE;
array_of_blocklengths[1] = 5;
array_of_displacements[1] = …;   /* offset of d_val within the struct */

MPI_Type_struct(2, array_of_blocklengths, array_of_displacements,
                array_of_types, &buff_datatype);
MPI_Type_commit(&buff_datatype);

MPI_Send(&buffer, 1, buff_datatype, …);

The datatype handle describes the data layout; &buffer is the start address of the data (3 ints followed by 5 doubles).
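A complete, hedged version of this example: the displacements are computed with offsetof, and the helper name build_buff_datatype is invented for the sketch. MPI_Type_struct is the MPI-1 call shown on the slide; it was deprecated in MPI-2 and removed in MPI-3, where MPI_Type_create_struct takes the same argument list.

#include <stddef.h>   /* offsetof */
#include <mpi.h>

struct buff_layout {
    int    i_val[3];
    double d_val[5];
};

static MPI_Datatype build_buff_datatype(void)
{
    MPI_Datatype buff_datatype;
    int          blocklengths[2]  = { 3, 5 };
    MPI_Aint     displacements[2] = { offsetof(struct buff_layout, i_val),
                                      offsetof(struct buff_layout, d_val) };
    MPI_Datatype types[2]         = { MPI_INT, MPI_DOUBLE };

    MPI_Type_struct(2, blocklengths, displacements, types, &buff_datatype);
    MPI_Type_commit(&buff_datatype);
    return buff_datatype;   /* usable as: MPI_Send(&buffer, 1, buff_datatype, ...) */
}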
39
Derived Data Types Type Maps
A derived data type is logically a pointer to a list of entries, each giving a basic datatype and its displacement:
basic datatype 0    displacement of datatype 0
basic datatype 1    displacement of datatype 1
...
basic datatype n-1  displacement of datatype n-1
40
Derived Data Types Example
(Diagram: a derived-datatype handle pointing to a type map with basic datatypes such as MPI_CHAR, MPI_INT, and MPI_DOUBLE listed next to their displacements, alongside the corresponding values in memory.)
A derived data type describes the memory layout of, e.g., structures, common blocks, subarrays, or some variables scattered in memory.
41
Derived Data Types Contiguous Data
The simplest derived data type: consists of a number of contiguous items of the same datatype.
C: int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);
Fortran: MPI_TYPE_CONTIGUOUS(COUNT, OLDTYPE, NEWTYPE, IERROR)
INTEGER COUNT, OLDTYPE, NEWTYPE, IERROR
(Diagram: count copies of oldtype laid out back to back form newtype.)
42
Derived Data Types Vector Datatype
C: int MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype);
Fortran: MPI_TYPE_VECTOR(COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE, IERROR)
INTEGER COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE, IERROR
(Diagram: newtype built from oldtype with blocklength = 3 elements per block, stride = 5 elements between block starts, count = 2 blocks; the holes between blocks are not transferred.)
43
Derived Data Types MPI_TYPE_VECTOR: An example
Example: sending the first row of an N x M matrix, and sending the first column of an N x M matrix (shown for C on the following slides).
44
Derived Data Types Sending a row using MPI_Type_vector
C:
MPI_Type_vector(1, 5, 1, MPI_INT, &MPI_ROW);
MPI_Type_commit(&MPI_ROW);
MPI_Send(&buf …, MPI_ROW …);
MPI_Recv(&buf …, MPI_ROW …);
45
Derived Data Types Sending a column using MPI_Type_vector
C:
MPI_Type_vector(4, 1, 5, MPI_INT, &MPI_COL);
MPI_Type_commit(&MPI_COL);
MPI_Send(buf …, MPI_COL …);
MPI_Recv(buf …, MPI_COL …);
46
Derived Data Types Sending a sub-matrix using MPI_Type_vector
C:
MPI_Type_vector(2, 3, 5, MPI_INT, &MPI_SUBMAT);
MPI_Type_commit(&MPI_SUBMAT);
MPI_Send(&buf …, MPI_SUBMAT …);
MPI_Recv(&buf …, MPI_SUBMAT …);
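A complete, hedged version of the column example: a 4 x 5 int matrix is assumed (consistent with the count of 4 and stride of 5 above), rank 0 sends its first column, and rank 1 receives it as 4 contiguous ints, which matches the type signature of the strided send. Run with at least two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i;
    int a[4][5];
    int col[4];
    MPI_Datatype MPI_COL;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* column type: 4 blocks of 1 int, stride of 5 ints between blocks */
    MPI_Type_vector(4, 1, 5, MPI_INT, &MPI_COL);
    MPI_Type_commit(&MPI_COL);

    if (rank == 0) {
        for (i = 0; i < 4 * 5; i++)
            ((int *)a)[i] = i;                               /* 0..19 row by row */
        MPI_Send(&a[0][0], 1, MPI_COL, 1, 0, MPI_COMM_WORLD); /* first column */
    } else if (rank == 1) {
        MPI_Recv(col, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < 4; i++)
            printf("col[%d] = %d\n", i, col[i]);             /* 0, 5, 10, 15 */
    }

    MPI_Type_free(&MPI_COL);
    MPI_Finalize();
    return 0;
}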
47
Other MPI-1 Features
48
Other MPI Features (1)
Point-to-point:
MPI_Sendrecv and MPI_Sendrecv_replace
Null processes, MPI_PROC_NULL (see the slide on MPI_Cart_shift)
MPI_Pack and MPI_Unpack
MPI_Probe: check the length (and tag, source rank) of a pending message before calling MPI_Recv
MPI_Iprobe: check whether a message is available
MPI_Request_free, MPI_Cancel
Persistent requests
MPI_BOTTOM (in point-to-point and collective communication)
Collective operations:
MPI_Allgather
MPI_Alltoall
MPI_Reduce_scatter
The ...v variants: MPI_Gatherv, MPI_Scatterv, MPI_Allgatherv, MPI_Alltoallv
Topologies:
MPI_DIMS_CREATE
(Diagram: data-movement patterns for Allgather and Alltoall.)
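As an illustration of the MPI_Probe idiom mentioned above (not from the slides; the helper name recv_any_length is invented): probe for an incoming message, size the receive buffer with MPI_Get_count, then receive.

#include <stdlib.h>
#include <mpi.h>

/* Receive an int message of unknown length from any source (sketch). */
int *recv_any_length(int *out_count, MPI_Comm comm)
{
    MPI_Status status;
    int count;
    int *buf;

    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);  /* blocks until a message is pending */
    MPI_Get_count(&status, MPI_INT, &count);                /* how many ints it carries */

    buf = malloc(count * sizeof(int));
    MPI_Recv(buf, count, MPI_INT, status.MPI_SOURCE, status.MPI_TAG, comm, &status);

    *out_count = count;
    return buf;
}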
49
Other MPI features (2) Error Handling
The communication itself should be reliable.
If the MPI program is erroneous: by default, abort if the error is detected by the MPI library; otherwise, behavior is unpredictable.
Fortran: CALL MPI_ERRHANDLER_SET(comm, MPI_ERRORS_RETURN, ierror)
C:       MPI_Errhandler_set(comm, MPI_ERRORS_RETURN);
After this, an error code is returned by each MPI routine (ierror in Fortran).
The program is in an undefined state after an erroneous MPI call has occurred; only MPI_ABORT(...) should still be callable.
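A hedged sketch of the resulting error-checking style (the helper name checked_send is invented; MPI_Error_string is the standard routine for turning an error code into text, and MPI_Errhandler_set is the MPI-1 call shown above, replaced by MPI_Comm_set_errhandler in later MPI versions):

#include <stdio.h>
#include <mpi.h>

/* Sketch: switch the communicator to MPI_ERRORS_RETURN and check a call's result. */
void checked_send(void *buf, int count, MPI_Datatype type, int dest, MPI_Comm comm)
{
    char msg[MPI_MAX_ERROR_STRING];
    int  err, len;

    MPI_Errhandler_set(comm, MPI_ERRORS_RETURN);   /* do not abort on error */

    err = MPI_Send(buf, count, type, dest, 0, comm);
    if (err != MPI_SUCCESS) {
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "MPI_Send failed: %s\n", msg);
        MPI_Abort(comm, err);                      /* state is undefined; bail out */
    }
}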
50
Installing and Utilizing MPI
51
Installing MPI
sequence:/home/grad/yourname> tar xfz mpich p1.tar.gz
sequence:/home/grad/yourname> mkdir mpich2-install
sequence:/home/grad/yourname> cd mpich
sequence:/home/grad/yourname/mpich > configure --prefix=/home/you/mpich2-install |& tee configure.log
52
Building MPICH2
sequence:/home/grad/yourname/mpich > make |& tee make.log
sequence:/home/grad/yourname/mpich > make install |& tee install.log
53
Building MPICH2
sequence:/home/grad/yourname/mpich > setenv PATH /home/you/mpich2-install/bin:$PATH
To check that everything is in order at this point, do:
which mpd
which mpicc
which mpiexec
which mpirun
54
MPD Security
MPICH2 uses an external process manager for scalable startup of large MPI jobs. The default process manager is MPD, a ring of daemons on the machines where you will run your MPI programs.
For security reasons, MPD looks in your home directory for a file named .mpd.conf containing the line secretword=<secretword>:
cd $HOME
touch .mpd.conf
chmod 600 .mpd.conf
Then put a line such as the following in .mpd.conf:
secretword=mr45-j9z
55
Bringing up a Ring of One MPD
The first sanity check consists of bringing up a ring of one MPD on the local machine:
mpd &
Testing the one-MPD ring:
mpdtrace
Bringing the "ring" down:
mpdallexit
56
Bringing up a Ring of MPDs
Now we will bring up a ring of mpd's on a set of machines. A manual way to do this:
On the first machine:
mpd &
mpdtrace -l
Then log into each of the other machines, put the install/bin directory in your path, and do:
mpd -h <hostname> -p <port> &
(where <hostname> and <port> are those reported by mpdtrace -l on the first machine)
57
Automatic Ring Startup
Avoiding the password prompt (set up passwordless ssh):
cd ~/.ssh
ssh-keygen -t rsa
cp id_rsa.pub authorized_keys
Then start a ring on the hosts listed in mpd.hosts:
mpdboot -n 4 -f mpd.hosts
58
Testing the Ring
Test the ring we have just created:
mpdtrace
mpdringtest
mpdringtest 100
mpdringtest 1000
59
Testing the Ring
Test that the ring can run a multiprocess job:
mpiexec -n <number> hostname
mpiexec -l -n 30 hostname
mpiexec -l -n 30 /bin/hostname
60
Compiling & Running an MPI Job
Compilation in C: mpicc -o prog prog.c
Compilation in C++: mpiCC -o prog prog.c (Bull), mpicxx -o prog prog.cpp (IBM cluster)
Compilation in Fortran: mpif77 -o prog prog.f, mpif90 -o prog prog.f90
Executing the program with num processes: mprun -n num prog (Bull), mpiexec -n num prog (standard MPI-2)
61
Automatic Parallelization
By creating a mapping from one language to the other, we can expose the capabilities of existing automatically parallelizing compilers to the C language.
Polaris is an automatically parallelizing source-to-source Fortran compiler: it accepts Fortran 77 input and produces Fortran output in a new dialect that supports explicit parallelism.
62
Parallel Computing Software
Many tools and utilities listed in the following link can be integrated with user programs as well as used as standalone problem-solving systems. Some tools are:
Adlib: a C++ library implementing a distributed array descriptor.
ARCH: an object-oriented library of tools for parallel programming.
Aztec: a parallel iterative library for solving linear systems.
AutoMap/AutoLink: tools to simplify the creation of MPI data types and to transfer dynamic data types.
BERT: an automatic and optimizing parallelizer for FORTRAN 77.
63
Experimental Results
64
Experimental results (LAN)
Benchmark: for (i = 0; i < 10; i++) compute A[600, 500] x B[500, 600] over the LAN
Active servers: 2, 3, 5, 8, 10, 15
Time: 249, 154, 140, 100, 115
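The slides do not show the benchmark program itself; the following is a plausible minimal sketch, not the original code, assuming a row-block distribution of A, a broadcast of B, and a process count that divides 600 (true for the measured 2, 3, 5, 8, 10, and 15 servers):

#include <mpi.h>

#define N 600   /* rows of A and C      */
#define K 500   /* cols of A, rows of B */
#define M 600   /* cols of B and C      */

static double A[N][K], B[K][M], C[N][M];     /* full matrices (meaningful on rank 0) */
static double Aloc[N][K], Cloc[N][M];        /* per-process row blocks, oversized for simplicity */

int main(int argc, char **argv)
{
    int rank, size, rows, i, j, k, rep;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    rows = N / size;                          /* assumes size divides 600 */

    if (rank == 0) {                          /* arbitrary test data */
        for (i = 0; i < N; i++) for (k = 0; k < K; k++) A[i][k] = 1.0;
        for (k = 0; k < K; k++) for (j = 0; j < M; j++) B[k][j] = 1.0;
    }

    for (rep = 0; rep < 10; rep++) {          /* the slides repeat the product 10 times */
        MPI_Bcast(&B[0][0], K * M, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatter(&A[0][0], rows * K, MPI_DOUBLE,
                    &Aloc[0][0], rows * K, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (i = 0; i < rows; i++)
            for (j = 0; j < M; j++) {
                Cloc[i][j] = 0.0;
                for (k = 0; k < K; k++)
                    Cloc[i][j] += Aloc[i][k] * B[k][j];
            }

        MPI_Gather(&Cloc[0][0], rows * M, MPI_DOUBLE,
                   &C[0][0], rows * M, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}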
65
Experimental results (Wireless)
Benchmark: for (i = 0; i < 10; i++) compute A[600, 500] x B[500, 600] over wireless
Active servers: 2, 3, 5, 8, 10, 15
Time: 413, 349, 337, 324, 321, 454
66
Conclusion
67
Conclusion
Many MPI implementations exist, with similar performance.
Multiple measurement criteria and multiple tools: latency, bandwidth; benchmarks and microbenchmarks; real applications.
High-performance networks make small performance details matter: network bandwidth approaches memory bandwidth, and latency is smaller than some OS operations.
Performance relies on good programming: results can vary a lot according to the type of communication employed; asynchronism is mandatory; bad programming results in bad performance; 0-copy can be mandatory.
68
References