1 MPI: Message Passing Interface
  ECE563 Presentation -- Zhelong Pan -- Feb. 10, 2003

2 Outline
  * Introduction to MPI
      - Computing model of MPI
      - An example
  * Details on MPI
      - Initialize and finalize MPI
      - Point to point communication
      - Datatypes
      - Communicators
      - Communication modes
      - Collective communication
      - Features not mentioned

3 Brief overview
  * What: a standard for a message-passing library (C, C++ and Fortran) to be used for message-passing parallel computing.
  * When: 92-94 MPI-1; 95-97 MPI-2
  * Size: MPI-1: 127 calls; MPI-2: ~150 calls.
      - Many parallel programs can be written with 6 basic functions.
      - Functions are orthogonal.
      - Support for many different communication paradigms.
      - Support for different communication modes.
      - Options offered via different function names, rather than parameters.
  * Where:
      - Parallel computers and clusters (distributed or shared memory)
      - NOWs (networks of workstations, heterogeneous systems)
  * Find more: http://www.mcs.anl.gov/Projects/MPI

4 Basic programming model
  * Communicating sequential processes
      - Each process runs in its own local address space.
      - Processes exchange data and synchronize via message passing. (Usually, but not always, the same code is executed by all processes.)
      - Used even where shared memory is available.
      - Locality must be managed to achieve performance -- message passing makes this explicit.
      - Naïve ports of sequential codes are harder with message passing than with shared memory; performance ports are as easy, or easier.

5 An example

6
#include <stdio.h>   // standard I/O (printf, sprintf)
#include <string.h>  // this allows us to manipulate text strings
#include "mpi.h"     // this adds the MPI header files to the program

int main(int argc, char* argv[]) {
    int my_rank;           // process rank
    int p;                 // number of processes
    int source;            // rank of sender
    int dest;              // rank of receiving process
    int tag = 0;           // tag for messages
    char message[100];     // storage for message
    MPI_Status status;     // stores status for MPI_Recv statements

    MPI_Init(&argc, &argv);                    // starts up MPI
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   // finds out rank of each process
    MPI_Comm_size(MPI_COMM_WORLD, &p);         // finds out number of processes

    if (my_rank != 0) {
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;  // sets destination for MPI_Send to process 0
        // sends the string to process 0
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            // receives a greeting from each process
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);  // prints out greeting to screen
        }
    }

    MPI_Finalize();  // shuts down MPI
    return 0;
}

7 Compiling and running
  * Header file
      - Fortran -- mpif.h
      - C -- mpi.h (we use C in this presentation)
  * Compile: implementation dependent. Typically requires specifying the header file directory and the MPI library.
      - SGI: cc source.c -lmpi
  * Run: mpirun -np <number of processes> <executable>

8 Result
  cc hello.c -lmpi
  mpirun -np 6 a.out
    Greetings from process 1!
    Greetings from process 2!
    Greetings from process 3!
    Greetings from process 4!
    Greetings from process 5!

9 Outline
  * Introduction to MPI
      - Computing model of MPI
      - An example
  * Details on MPI
      - Initialize and finalize MPI
      - Point to point communication
      - Datatypes
      - Communicators
      - Communication modes
      - Collective communication
      - Features not mentioned

10 Startup and shutdown
  * int MPI_Init(int *argc, char ***argv)
      - The first MPI call in any MPI process
      - Establishes the MPI environment
      - One and only one call to MPI_Init per process
  * int MPI_Finalize(void)
      - Exits from MPI
      - Cleans up the state of MPI
      - The last MPI call of an MPI process

11 Point to point communication
  * Basic communication in message passing libraries:
      - Send(dest, tag, addr, len), Recv(src, tag, addr, len)
      - src/dest: integer identifying the sending/receiving process
      - tag: integer identifying the message
      - (addr, len): communication buffer, a contiguous area
  * MPI extensions:
      - Messages are typed: supports heterogeneous computing.
      - Buffers need not be contiguous: supports scatter/gather.
      - Non-interfering communication domains: used for scoping of communication and the process name space.

12 P2P continued...
  * MPI_Send(start, count, datatype, dest, tag, comm)
  * MPI_Recv(start, count, datatype, source, tag, comm, status)
      - start: initial address of the buffer
      - count: (maximum) number of elements sent (received)
      - datatype: a descriptor for the type of the data items; can describe an arbitrary (noncontiguous) data layout in memory
      - source: rank within the communication group; can be MPI_ANY_SOURCE
      - tag: integer message identifier; can be MPI_ANY_TAG
      - communicator: specifies an ordered group of communicating processes and a distinct communication domain; a message sent with one communicator can be received only with the "same" communicator
      - status: provides information on the completed communication
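A minimal sketch (not from the original slides) of how these two calls fit together; it assumes at least two processes, and the array size and tag value are purely illustrative:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int rank;
    int data[4] = {1, 2, 3, 4};
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // send four ints to process 1 with tag 99
        MPI_Send(data, 4, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received;
        // accept at most four ints, from any source and with any tag
        MPI_Recv(data, 4, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        // the status object reports who sent the message, its tag, and its actual size
        MPI_Get_count(&status, MPI_INT, &received);
        printf("got %d ints from rank %d with tag %d\n",
               received, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}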

13 MPI message
  * Message = data + envelope
  * MPI_Send(startbuf, count, datatype, dest, tag, comm)
      - data: startbuf, count, datatype
      - envelope: dest, tag, comm

14 MPI data
  * startbuf (starting location of data)
  * count (number of elements)
      - receive count >= send count
  * datatype (basic or derived)
      - receive datatype = send datatype (unless MPI_PACKED)
      - Specification of elementary datatypes allows heterogeneous communication.

15 Datatype
  * MPI datatype          C datatype
      MPI_CHAR              signed char
      MPI_SHORT             signed short int
      MPI_INT               signed int
      MPI_LONG              signed long int
      MPI_UNSIGNED_CHAR     unsigned char
      MPI_UNSIGNED_SHORT    unsigned short int
      MPI_UNSIGNED          unsigned int
      MPI_UNSIGNED_LONG     unsigned long int
      MPI_FLOAT             float
      MPI_DOUBLE            double
      MPI_LONG_DOUBLE       long double
      MPI_BYTE              (no C equivalent)
      MPI_PACKED            (no C equivalent)
  * Derived datatypes
      - mixed datatypes
      - contiguous arrays of datatypes
      - strided blocks of datatypes
      - indexed arrays of blocks of datatypes
      - general structures
  * Datatypes are constructed recursively.

16 Functions to create new types
  * MPI_Type_contiguous(count, old, new)
      - defines a new MPI type comprising count contiguous values of type old
  * MPI_Type_commit(type)
      - commits the type -- must be called before the type can be used
  * Derived-type routines: MPI_Type_commit, MPI_Type_contiguous, MPI_Type_count, MPI_Type_extent, MPI_Type_free, MPI_Type_hindexed, MPI_Type_hvector, MPI_Type_indexed, MPI_Type_lb, MPI_Type_size, MPI_Type_struct, MPI_Type_ub, MPI_Type_vector
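A small sketch (not from the slides) that builds a strided "column" type with MPI_Type_vector and sends one column of a matrix in a single message; the 4x4 size and the two-process setup are illustrative assumptions:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    double a[4][4];          // small 4x4 matrix, stored row-major as usual in C
    MPI_Datatype column_t;
    int rank, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // one column of a 4x4 matrix: 4 blocks of 1 double, 4 doubles apart
    MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column_t);
    MPI_Type_commit(&column_t);   // must commit before the type can be used

    if (rank == 0) {
        for (i = 0; i < 4; i++)
            for (j = 0; j < 4; j++)
                a[i][j] = 4.0 * i + j;
        // send column 0 to process 1 as one message of the derived type
        MPI_Send(&a[0][0], 1, column_t, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        // receive it straight into column 0 of the local matrix
        MPI_Recv(&a[0][0], 1, column_t, 0, 0, MPI_COMM_WORLD, &status);
        for (i = 0; i < 4; i++) printf("a[%d][0] = %g\n", i, a[i][0]);
    }

    MPI_Type_free(&column_t);     // release the derived type
    MPI_Finalize();
    return 0;
}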

17 MPI envelope
  * destination or source
      - rank in a communicator
      - receive = sender's rank, or MPI_ANY_SOURCE
  * tag
      - integer chosen by the programmer
      - receive = sender's tag, or MPI_ANY_TAG
  * communicator
      - defines the communication "space"
      - group + context
      - receive = send

18 Envelope continued...
  * MPI provides groups of processes
      - an initial "all" group
      - group management routines (build, delete groups)
  * A context partitions the communication space.
  * A message sent in one context cannot be received in another context.
  * Contexts are managed by the system.
  * A group and a context are combined in a communicator.
  * Source/destination in send/receive operations refer to the rank in the group associated with a given communicator.

19 Group routines
  * MPI_Group_size -- returns the number of processes in a group
  * MPI_Group_rank -- returns the rank of the calling process in a group
  * MPI_Group_compare -- compares group members and group order
  * MPI_Group_translate_ranks -- translates ranks of processes in one group to those in another group
  * MPI_Comm_group -- returns the group associated with a communicator
  * MPI_Group_union -- creates a group by combining two groups
  * MPI_Group_intersection -- creates a group from the intersection of two groups

20 Group routines...
  * MPI_Group_difference -- creates a group from the difference between two groups
  * MPI_Group_incl -- creates a group from listed members of an existing group
  * MPI_Group_excl -- creates a group excluding listed members of an existing group
  * MPI_Group_range_incl -- creates a group according to first rank, stride, last rank
  * MPI_Group_range_excl -- creates a group by deleting according to first rank, stride, last rank
  * MPI_Group_free -- marks a group for deallocation

21 Communicator routines
  * MPI_Comm_size -- returns the number of processes in the communicator's group
  * MPI_Comm_rank -- returns the rank of the calling process in the communicator's group
  * MPI_Comm_compare -- compares two communicators
  * MPI_Comm_dup -- duplicates a communicator
  * MPI_Comm_create -- creates a new communicator for a group
  * MPI_Comm_split -- splits a communicator into multiple, non-overlapping communicators
  * MPI_Comm_free -- marks a communicator for deallocation
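A sketch (not from the slides) of MPI_Comm_split, probably the most commonly used routine in the list above; the "row" size of 4 is an arbitrary illustrative choice:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int world_rank, world_size, row_rank, row_size;
    MPI_Comm row_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // split MPI_COMM_WORLD into "rows" of up to 4 processes:
    // processes with the same color land in the same new communicator,
    // and the key (here the world rank) orders the ranks inside it
    int color = world_rank / 4;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &row_comm);

    MPI_Comm_rank(row_comm, &row_rank);
    MPI_Comm_size(row_comm, &row_size);
    printf("world rank %d of %d -> row %d, row rank %d of %d\n",
           world_rank, world_size, color, row_rank, row_size);

    MPI_Comm_free(&row_comm);   // mark the new communicator for deallocation
    MPI_Finalize();
    return 0;
}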

22 Outline
  * Introduction to MPI
      - Computing model of MPI
      - An example
  * Details on MPI
      - Initialize and finalize MPI
      - Point to point communication
      - Datatypes
      - Communicators
      - Communication modes
      - Collective communication
      - Features not mentioned

23 Communication modes
  * MPI defines four communication modes:
      - synchronous mode ("safest")
      - ready mode (lowest system overhead)
      - buffered mode (decouples sender from receiver)
      - standard mode (compromise)
  * The communication mode is selected with the send routine.
  * Calls are also blocking or non-blocking:
      - Blocking stops the program until the message buffer is safe to use.
      - Non-blocking separates communication from computation.

24 Communication modes...
  Communication mode   Blocking routines   Non-blocking routines
  Synchronous          MPI_SSEND           MPI_ISSEND
  Ready                MPI_RSEND           MPI_IRSEND
  Buffered             MPI_BSEND           MPI_IBSEND
  Standard             MPI_SEND            MPI_ISEND
                       MPI_RECV            MPI_IRECV
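A sketch (not from the slides) of the non-blocking routines in the last column: both transfers are posted, useful work could overlap them, and MPI_Waitall blocks until they complete. It assumes at least two processes; the tag and values are illustrative.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int rank, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {                       // only ranks 0 and 1 exchange
        int other = 1 - rank;
        sendval = 100 + rank;

        // post both operations; neither call blocks
        MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

        // ... computation that does not touch sendval/recvval could go here ...

        MPI_Waitall(2, reqs, stats);      // block until both transfers complete
        printf("rank %d received %d\n", rank, recvval);
    }

    MPI_Finalize();
    return 0;
}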

25 Communication modes...
  * Buffered mode: the send can be started whether or not a matching receive has been posted, and it may complete before a matching receive is posted.
  * Synchronous mode: the send can be started whether or not a matching receive has been posted; however, it completes successfully only once a matching receive is posted and the receive operation has started to receive the message sent by the synchronous send.
  * Ready mode: the send may be started only if the matching receive has already been posted.
  * Standard mode: it is up to MPI to decide whether outgoing messages will be buffered; the send can be started whether or not a matching receive has been posted.
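A sketch of buffered mode (an illustrative addition, not from the slides): rank 0 attaches its own buffer space and MPI_Bsend returns once the message is copied there, independent of the receiver. It assumes at least two processes, and the buffer sizing is only the minimal textbook recipe.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int rank, data = 42, bufsize;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // buffered mode needs user-supplied buffer space: room for the
        // message plus MPI's per-message bookkeeping overhead
        bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
        buf = (char*) malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);

        // MPI_Bsend returns as soon as the data is copied into the buffer,
        // whether or not rank 1 has posted its receive yet
        MPI_Bsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        MPI_Buffer_detach(&buf, &bufsize);  // waits for buffered sends to finish
        free(buf);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}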

26 Communication modes...
  * Synchronous
      - Advantages: safest, and therefore most portable; SEND/RECV order not critical; amount of buffer space irrelevant.
      - Disadvantages: can incur substantial synchronization overhead.
  * Ready
      - Advantages: lowest total overhead; SEND/RECV handshake not required.
      - Disadvantages: RECV must precede SEND.
  * Buffered
      - Advantages: decouples SEND from RECV; no sync overhead on SEND; order of SEND/RECV irrelevant; programmer can control the size of the buffer space.
      - Disadvantages: additional system overhead incurred by the copy to the buffer.
  * Standard
      - Advantages: good for many cases.
      - Disadvantages: your program may not be suitable.

27 Outline
  * Introduction to MPI
      - Computing model of MPI
      - An example
  * Details on MPI
      - Initialize and finalize MPI
      - Point to point communication
      - Datatypes
      - Communicators
      - Communication modes
      - Collective communication
      - Features not mentioned

28 Collective communication
  * MPI_Allgather -- all processes gather messages
  * MPI_Allreduce -- reduce to all processes
  * MPI_Alltoall -- all processes gather distinct messages
  * MPI_Bcast -- broadcast a message
  * MPI_Gather -- gather a message to the root
  * MPI_Reduce -- global reduce operation
  * MPI_Reduce_scatter -- reduce and scatter the results
  * MPI_Scatter -- scatter a message from the root
  * MPI_Scan -- global prefix reduction
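A brief sketch (not from the slides) showing MPI_Scatter and MPI_Gather from the list above; one int per process, and the values and the local work are illustrative:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int rank, p, i, mine;
    int *sendbuf = NULL, *gathered = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (rank == 0) {
        // only the root needs the full send/receive arrays
        sendbuf  = (int*) malloc(p * sizeof(int));
        gathered = (int*) malloc(p * sizeof(int));
        for (i = 0; i < p; i++) sendbuf[i] = 10 * i;
    }

    // every process (root included) receives one element of sendbuf
    MPI_Scatter(sendbuf, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    mine += rank;   // some local work on the scattered piece

    // the root collects one element back from every process, in rank order
    MPI_Gather(&mine, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < p; i++) printf("gathered[%d] = %d\n", i, gathered[i]);
        free(sendbuf);
        free(gathered);
    }

    MPI_Finalize();
    return 0;
}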

29 Collective communication...

30 Broadcast and reduce
  * MPI_Bcast(buffer, count, datatype, root, comm)
      - Broadcasts the message of length count in buffer from the process root to all other processes in the group. All processes must call it with the same arguments.
  * MPI_Reduce(sbuf, rbuf, count, stype, op, root, comm)
      - Applies the reduction function op to the data of each process in the group (type stype in sbuf) and stores the result in rbuf on the root process. op can be a pre-defined function, or defined by the user.
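A minimal sketch (not from the slides) combining the two calls: the root broadcasts a value, every process computes a local contribution, and the predefined MPI_SUM operation reduces the contributions back onto the root. The numbers are illustrative.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
    int rank, p, n = 0, local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (rank == 0) n = 100;   // the root chooses a value

    // every process calls MPI_Bcast with the same arguments;
    // afterwards n == 100 on all ranks
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    local = n + rank;         // each process computes a local contribution

    // sum the local values onto the root with the predefined MPI_SUM op
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %d\n", p, global);

    MPI_Finalize();
    return 0;
}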

31 Timing
  * MPI_Wtime() returns the wall-clock time.

    double start, finish, time;
    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();
    /* ... code being timed ... */
    MPI_Barrier(MPI_COMM_WORLD);
    finish = MPI_Wtime();
    time = finish - start;

32 More MPI?
  * Important MPI features and issues not covered:
      - Topology
      - Error handling
      - Blocking versus non-blocking communication
      - Probing for new messages
      - Environmental management
      - Profiling and debugging interface

33 MPI-2
  * MPI-2 new topics:
      - process creation and management, including client/server routines
      - one-sided communication (put/get, active messages)
      - extended collective operations
      - external interfaces
      - I/O

34 Thanks!!! (But I have more to say.)

35 Designing MPI programs
  * Partitioning
      - Done before tackling MPI
  * Communication
      - Many point to collective operations
  * Agglomeration
      - Needed to produce MPI processes
  * Mapping
      - Handled by MPI

36 MPI vs. OpenMP
  * Both use the SPMD model.
  * Message passing vs. shared data
  * Processes vs. threads
  * MPI has no work-sharing constructs.

37 MPI
  * Pros:
      - Very portable
      - Requires no special compiler
      - Requires no special hardware, but can make use of high-performance hardware
      - Very flexible -- can handle just about any model of parallelism
      - No shared data! (You don't have to worry about processes "treading on each other's data" by mistake.)
      - Free libraries are available for your Linux PC
      - Forces you to do things the "right way" in terms of decomposing your problem
  * Cons:
      - All-or-nothing parallelism (difficult to incrementally parallelize existing serial codes)
      - No shared data! Requires distributed data structures
      - Can be thought of as the assembler of parallel computing -- you generally have to write more code
      - Partitioning operations on distributed arrays can be messy

38 OpenMP
  * Pros:
      - Incremental parallelism -- can parallelize existing serial codes one bit at a time
      - Quite a simple set of directives
      - Shared data!
      - Partitioning operations on arrays is very simple
  * Cons:
      - Requires proprietary compilers
      - Requires shared-memory multiprocessors
      - Shared data!
      - Having to think about which data is shared and which is private
      - Cannot handle models like master/slave work allocation (yet)
      - Generally not as scalable (more synchronization points)
      - Not well suited to non-trivial data structures like linked lists, trees, etc.

