High Performance Computing Course Notes 2007-2008 Message Passing Programming I.


Slide 1: High Performance Computing Course Notes - Message Passing Programming I

Slide 2: Message Passing Programming
- Message passing is the most widely used parallel programming model
- Message passing works by creating a number of uniquely named tasks that interact by sending and receiving messages to and from one another (hence the name)
- Generally, processes communicate by sending data from the address space of one process to that of another
  - Communication between processes (via files, pipes, sockets)
  - Communication between threads within a process (via a global data area)
- Message-passing programs can be written in standard sequential languages (C/C++, Fortran), augmented with calls to library functions for sending and receiving messages

Slide 3: Message Passing Interface (MPI)
- MPI is a specification, not a particular implementation
  - It does not specify process startup, error codes, the amount of system buffering, etc.
- MPI is a library, not a language
- The goals of MPI: functionality, portability and efficiency
- Message passing model > MPI specification > MPI implementation

Slide 4: OpenMP vs MPI
In a nutshell:
- MPI is used on distributed-memory systems
- OpenMP is used for code parallelisation on shared-memory systems
- Both provide explicit parallelism
- OpenMP offers high-level control, MPI lower-level control

Slide 5: A little history
- Message-passing libraries were developed for a number of early distributed-memory computers
- By 1993 there were many vendor-specific implementations
- By 1994 MPI-1 came into being
- By 1996 MPI-2 was finalized

Slide 6: The MPI programming model
- MPI standards: MPI-1 (1.1, 1.2), MPI-2 (2.0)
  - Forwards compatibility is preserved between versions
- Standard bindings for C, C++ and Fortran; MPI bindings also exist for Python, Java etc. (all non-standard)
  - We will stick to the C binding for the lectures and coursework
- Implementations: for your laptop pick up MPICH, a free, portable implementation of MPI (gov/mpi/mpich/index.htm)
  - Coursework will use MPICH

Slide 7: MPI
- MPI is a complex system comprising 129 functions with numerous parameters and variants
- Six of them are indispensable; with just these six you can already write a large number of useful programs
- The other functions add flexibility (datatypes), robustness (non-blocking send/receive), efficiency (ready-mode communication), modularity (communicators, groups) or convenience (collective operations, topologies)
- In the lectures we are going to cover the most commonly encountered functions

Slide 8: The MPI programming model
- A computation comprises one or more processes that communicate by calling library routines to send and receive messages to and from other processes
- (Generally) a fixed set of processes is created at the outset, one process per processor
  - Different from PVM

Slide 9: Intuitive interfaces for sending and receiving messages
- Send(data, destination), Receive(data, source): the minimal interface
- Not enough in some situations; we also need message matching
  - Add a message_id at both the send and receive interfaces
  - They become Send(data, destination, msg_id), Receive(data, source, msg_id)
- The message_id is an integer, termed the message tag
  - It allows the programmer to deal with the arrival of messages in an orderly fashion (messages can be queued and then dealt with in turn)

Slide 10: How to express the data in the send/receive interfaces
- Early stages:
  - (address, length) for the send interface
  - (address, max_length) for the receive interface
- These are not always adequate:
  - The data to be sent may not be in contiguous memory locations
  - The storage format of the data may not be the same, or known in advance, on a heterogeneous platform
- Eventually, a triple (address, count, datatype) is used to express the data to be sent, and (address, max_count, datatype) for the data to be received
  - This reflects the fact that a message contains much more structure than just a string of bits; for example, (vector_A, 300, MPI_REAL)
  - Programmers can construct their own datatypes
- Now the interfaces become send(address, count, datatype, destination, msg_id) and receive(address, max_count, datatype, source, msg_id)
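As an illustration of constructing your own datatype, here is a minimal sketch in the C binding that builds a strided "column" type with MPI_Type_vector. The matrix size N, the variable name column_type, and the destination/tag used in the send are illustrative choices, not part of the slides.

    /* Sketch: a derived datatype describing one column of an N x N matrix
       stored row-major. N and column_type are illustrative names.        */
    #define N 4

    double matrix[N][N];
    MPI_Datatype column_type;

    /* N blocks of 1 double, each separated by a stride of N doubles */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column_type);
    MPI_Type_commit(&column_type);

    /* The new type can be used wherever a datatype is expected, e.g.
       sending column 2 to process 1 with tag 0:                      */
    MPI_Send(&matrix[0][2], 1, column_type, 1, 0, MPI_COMM_WORLD);

    MPI_Type_free(&column_type);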

Slide 11: How to distinguish messages
- The message tag is necessary, but not sufficient
- So the communicator is introduced ...

Slide 12: Communicators
- Messages are put into contexts
  - Contexts are allocated at run time by the system in response to programmer requests
  - The system can guarantee that each generated context is unique
- Processes belong to groups
- The notions of context and group are combined in a single object called a communicator
  - A communicator identifies a group of processes and a communication context
- The MPI library defines an initial communicator, MPI_COMM_WORLD, which contains all the processes running in the system
- Messages from different process groups can have the same tag
- So the send interface becomes send(address, count, datatype, destination, tag, comm)

Slide 13: Status of received messages
- A message status structure is added to the receive interface
- The status holds information about the source, the tag and the actual message size
  - In C, the source can be retrieved by accessing status.MPI_SOURCE
  - The tag can be retrieved from status.MPI_TAG
  - The actual message size can be retrieved by calling MPI_Get_count(&status, datatype, &count)
- The receive interface becomes receive(address, maxcount, datatype, source, tag, communicator, status)
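A minimal sketch of querying the status object after a receive; the buffer size (100), source (0) and tag (99) are illustrative values.

    /* Sketch: receive up to 100 ints from process 0 with tag 99, then
       inspect the status to see what actually arrived.                 */
    int buffer[100];
    int count;
    MPI_Status status;

    MPI_Recv(buffer, 100, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);

    MPI_Get_count(&status, MPI_INT, &count);   /* number of ints actually received */
    printf("Received %d ints from process %d with tag %d\n",
           count, status.MPI_SOURCE, status.MPI_TAG);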

Slide 14: How to express source and destination
- The processes in a communicator (group) are identified by ranks
  - If a communicator contains n processes, the process ranks are integers from 0 to n-1
- The source and destination processes in the send/receive interfaces are specified by their ranks

Slide 15: Some other issues
- In the receive interface, the tag can be a wildcard (MPI_ANY_TAG), which matches any message
- The source can also be a wildcard (MPI_ANY_SOURCE), which matches any sender
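A small sketch of how wildcards are typically used: rank 0 collects one message from every other process in whatever order they arrive. The tag value 0 and the payload are illustrative.

    /* Sketch: rank 0 accepts one int from each of the other nprocs-1
       processes, in arrival order, using wildcard matching.           */
    int value, i, nprocs, rank;
    MPI_Status status;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (i = 0; i < nprocs - 1; i++) {
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("Got %d from process %d\n", value, status.MPI_SOURCE);
        }
    } else {
        value = rank * rank;                       /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }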

Slide 16: MPI basics - first six functions (C bindings)

MPI_Send (buf, count, datatype, dest, tag, comm) - send a message
  buf       address of send buffer
  count     number of elements to send (>= 0)
  datatype  datatype of elements
  dest      process id (rank) of destination
  tag       message tag
  comm      communicator (handle)


Slide 19: MPI basics

MPI_Send (buf, count, datatype, dest, tag, comm) - calculating the size of the data to be sent:
  buf points to the start of the send buffer, and count * sizeof(datatype) bytes of data are sent


Slide 22: MPI basics - first six functions (C bindings)

MPI_Recv (buf, count, datatype, source, tag, comm, status) - receive a message
  buf       address of receive buffer (output parameter)
  count     maximum number of elements in receive buffer (>= 0)
  datatype  datatype of receive buffer elements
  source    process id (rank) of source process, or MPI_ANY_SOURCE
  tag       message tag, or MPI_ANY_TAG
  comm      communicator
  status    status object

Slide 23: MPI basics - first six functions (C bindings)

MPI_Init (int *argc, char ***argv) - initiate a computation
  argc (argument count) and argv (argument vector) are the main program's arguments
  Must be called first, and once per process

MPI_Finalize ( ) - shut down a computation
  The last MPI call in the program

Slide 24: MPI basics - first six functions (C bindings)

MPI_Comm_size (MPI_Comm comm, int *size) - determine the number of processes in comm
  comm is the communicator handle; MPI_COMM_WORLD is the default (includes all MPI processes)
  size holds the number of processes in the group

MPI_Comm_rank (MPI_Comm comm, int *pid) - determine the id of the current (calling) process
  pid holds the id (rank) of the current process

Slide 25: MPI basics - a basic example

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}

Example run:
  mpirun -np 4 myprog
  Hello, world. I am 1 of 4
  Hello, world. I am 3 of 4
  Hello, world. I am 0 of 4
  Hello, world. I am 2 of 4

Slide 26: MPI basics - send and recv example (1)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    int buffer[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2) {
        printf("Please run with two processes.\n");
        MPI_Finalize();
        return 0;
    }
    if (rank == 0) {
        for (i = 0; i < 10; i++)
            buffer[i] = i;
        MPI_Send(buffer, 10, MPI_INT, 1, 123, MPI_COMM_WORLD);
    }

Slide 27: MPI basics - send and recv example (2)

    if (rank == 1) {
        for (i = 0; i < 10; i++)
            buffer[i] = -1;
        MPI_Recv(buffer, 10, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
        for (i = 0; i < 10; i++) {
            if (buffer[i] != i)
                printf("Error: buffer[%d] = %d but is expected to be %d\n",
                       i, buffer[i], i);
        }
    }
    MPI_Finalize();
    return 0;
}

Slide 28: MPI language bindings
- Standard (accepted) bindings for Fortran, C and C++
- Java bindings are work in progress
  - JavaMPI: Java wrapper to native calls
  - mpiJava: JNI wrappers
  - jmpi: pure Java implementation of the MPI library
  - MPIJ: same idea
  - The Java Grande Forum is trying to sort it all out
- We will use the C bindings

Slide 29: High Performance Computing Course Notes - Message Passing Programming II

Slide 30: Modularity
- MPI supports modular programming via communicators
- Communicators provide information hiding by encapsulating local communications and providing local namespaces for processes
- All MPI communication operations specify a communicator (the process group that is engaged in the communication)

Slide 31: Forming new communicators - one approach

MPI_Comm world, workers;
MPI_Group world_group, worker_group;
int ranks[1];
int numprocs, myid, server;

MPI_Init(&argc, &argv);
world = MPI_COMM_WORLD;
MPI_Comm_size(world, &numprocs);
MPI_Comm_rank(world, &myid);
server = numprocs - 1;                  /* the last process acts as the server */

MPI_Comm_group(world, &world_group);
ranks[0] = server;
MPI_Group_excl(world_group, 1, ranks, &worker_group);   /* exclude the server  */
MPI_Comm_create(world, worker_group, &workers);
MPI_Group_free(&world_group);
MPI_Comm_free(&workers);

Slide 32: Forming new communicators - functions

int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)
int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)
int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)
int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)
int MPI_Group_free(MPI_Group *group)
int MPI_Comm_free(MPI_Comm *comm)
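For comparison with the exclusion example on the previous slide, a small sketch using MPI_Group_incl to build a communicator containing only the first two processes; the names pair_group and pair_comm are illustrative.

    /* Sketch: create a communicator containing only ranks 0 and 1 of
       MPI_COMM_WORLD. All processes must call MPI_Comm_create.        */
    MPI_Group world_group, pair_group;
    MPI_Comm  pair_comm;
    int first_two[2] = {0, 1};

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, 2, first_two, &pair_group);
    MPI_Comm_create(MPI_COMM_WORLD, pair_group, &pair_comm);

    /* pair_comm is MPI_COMM_NULL on processes outside the group */
    if (pair_comm != MPI_COMM_NULL) {
        /* ... communication restricted to ranks 0 and 1 ... */
        MPI_Comm_free(&pair_comm);
    }
    MPI_Group_free(&pair_group);
    MPI_Group_free(&world_group);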

Slide 33: Forming new communicators - another approach (1)

MPI_Comm_split (comm, colour, key, newcomm) - creates one or more new communicators from the original comm
  comm     communicator (handle)
  colour   controls subset assignment (processes with the same colour are placed in the same new communicator)
  key      controls rank assignment within the new communicator
  newcomm  the new communicator

- It is a collective communication operation (must be executed by all processes in the process group comm)
- It is used to (re-)allocate processes to communicators (groups)

Slide 34: Forming new communicators - another approach (2)

MPI_Comm_split (comm, colour, key, newcomm)

MPI_Comm comm, newcomm;
int myid, colour;
MPI_Comm_rank(comm, &myid);        /* id of current process   */
colour = myid % 3;                 /* three colours: 0, 1, 2  */
MPI_Comm_split(comm, colour, myid, &newcomm);

[Diagram: the processes are divided into three new communicators according to colour 0, 1 or 2]

Slide 35: Forming new communicators - another approach (3)

MPI_Comm_split (comm, colour, key, newcomm)
- A new communicator is created for each distinct value of colour
- Each new communicator (sub-group) comprises the processes that specified that value of colour
- These processes are assigned new identifiers (ranks, starting at zero), with the order determined by the value of key (or by their ranks in the old communicator in the event of ties)

Slide 36: Communications
- Point-to-point communications: involve exactly two processes, one sender and one receiver
  - For example, MPI_Send() and MPI_Recv()
- Collective communications: involve a group of processes

Slide 37: Collective operations
- Coordinated communication operations involving multiple processes
- The programmer could implement them by hand (tedious); MPI provides specialised collective communication routines:
  - barrier: synchronize all processes
  - broadcast: send data from one process to all processes
  - gather: gather data from all processes to one process
  - scatter: scatter data from one process to all processes
  - reduction operations: sum, multiply, etc. over distributed data
- All are executed collectively (by all processes in the group, at the same time, with the same parameters)

Slide 38: Collective operations

MPI_Barrier (comm) - global synchronization
  comm is the communicator handle
- No process returns from the function until all processes have called it
- A good way of separating one phase of a computation from another
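A short sketch of this phase-separation pattern, combined with timing via MPI_Wtime; compute_phase() is a hypothetical placeholder for real work.

    /* Sketch: use barriers to separate two phases and time the first one. */
    int    rank;
    double t0, t1;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);    /* make sure every process starts together */
    t0 = MPI_Wtime();

    compute_phase();                /* phase 1: local computation (placeholder) */

    MPI_Barrier(MPI_COMM_WORLD);    /* wait for the slowest process             */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("Phase 1 took %f seconds\n", t1 - t0);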

Slide 39: Barrier synchronizations
- You are only as quick as your slowest process
[Diagram: barrier synchronization - processes arrive at different times and all wait for the last one]

Slide 40: Collective operations

MPI_Bcast (buf, count, type, root, comm) - broadcast data from root to all processes
  buf    address of input buffer (at the root) or output buffer (on the other processes)
  count  number of entries in buffer (>= 0)
  type   datatype of buffer elements
  root   process id (rank) of root process
  comm   communicator

[Diagram: one-to-all broadcast - the data item A0 held by the root is copied to every process]

Slide 41: Example of MPI_Bcast

Broadcast 100 ints from process 0 to every process in the group:

MPI_Comm comm;
int array[100];
int root = 0;
...
MPI_Bcast(array, 100, MPI_INT, root, comm);

Slide 42: Collective operations

MPI_Gather (inbuf, incount, intype, outbuf, outcount, outtype, root, comm) - collective data movement function
  inbuf     address of input buffer
  incount   number of elements sent from each process (>= 0)
  intype    datatype of input buffer elements
  outbuf    address of output buffer (output parameter; significant only at the root)
  outcount  number of elements received from each process
  outtype   datatype of output buffer elements
  root      process id (rank) of root process
  comm      communicator

[Diagram: all-to-one gather - the items A0, A1, A2, A3 held by the processes are collected at the root]


Slide 46: MPI_Gather example

Gather 100 ints from every process in the group to the root:

MPI_Comm comm;
int gsize, sendarray[100];
int root, myrank, *rbuf;
...
MPI_Comm_rank(comm, &myrank);                       /* find process id         */
if (myrank == root) {
    MPI_Comm_size(comm, &gsize);                    /* find group size         */
    rbuf = (int *) malloc(gsize*100*sizeof(int));   /* allocate receive buffer */
}
MPI_Gather(sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

Slide 47: Collective operations

MPI_Scatter (inbuf, incount, intype, outbuf, outcount, outtype, root, comm) - collective data movement function
  inbuf     address of input buffer (significant only at the root)
  incount   number of elements sent to each process (>= 0)
  intype    datatype of input buffer elements
  outbuf    address of output buffer
  outcount  number of elements received by each process
  outtype   datatype of output buffer elements
  root      process id (rank) of root process
  comm      communicator

[Diagram: one-to-all scatter - the root distributes items A0, A1, A2, A3, one item to each process]

Slide 48: Example of MPI_Scatter

MPI_Scatter is the reverse of MPI_Gather. It is as if the root sends to each process i using
MPI_Send(inbuf + i*incount*sizeof(intype), incount, intype, i, ...)

MPI_Comm comm;
int gsize, *sendbuf;
int root, rbuf[100];
...
MPI_Comm_size(comm, &gsize);
sendbuf = (int *) malloc(gsize*100*sizeof(int));
...
MPI_Scatter(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

Slide 49: Collective operations

MPI_Reduce (inbuf, outbuf, count, type, op, root, comm) - collective reduction function
  inbuf   address of input buffer
  outbuf  address of output buffer (significant only at the root)
  count   number of elements in input buffer (>= 0)
  type    datatype of input buffer elements
  op      reduction operation
  root    process id (rank) of root process
  comm    communicator

[Diagram: MPI_REDUCE with op = MPI_MIN and root = 0 - the minimum of the values held by the processes is delivered to process 0]

Slide 50: Collective operations

[Diagram: the same MPI_Reduce call with op = MPI_SUM and root = 1 - the sum of the values held by the processes is delivered to process 1]

Slide 51: Collective operations

MPI_Allreduce (inbuf, outbuf, count, type, op, comm) - collective reduction function
  inbuf   address of input buffer
  outbuf  address of output buffer (output parameter)
  count   number of elements in input buffer (>= 0)
  type    datatype of input buffer elements
  op      reduction operation
  comm    communicator

[Diagram: MPI_ALLREDUCE with op = MPI_MIN - every process receives the minimum of the values held by all processes]
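A small sketch contrasting the two reductions above; using each process's rank as its local value is purely illustrative.

    /* Sketch: reduce each process's rank. With MPI_Reduce only the root
       gets the result; with MPI_Allreduce every process gets it.        */
    int rank, sum_at_root, sum_everywhere;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* sum of all ranks, delivered to process 0 only */
    MPI_Reduce(&rank, &sum_at_root, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* sum of all ranks, delivered to every process */
    MPI_Allreduce(&rank, &sum_everywhere, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of ranks = %d\n", sum_at_root);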

Slide 52: Buffering in MPI communications
- Application buffer: specified by the first parameter of the MPI_Send/Recv functions
- System buffer:
  - Hidden from the programmer and managed by the MPI library
  - Limited in size and easy to exhaust

Slide 53: Blocking and non-blocking communications
- Blocking send
  - The sender does not return until the application buffer can be re-used (which often means the data have been copied from the application buffer to the system buffer); it does not mean that the data have been received
  - MPI_Send (buf, count, datatype, dest, tag, comm)
- Blocking receive
  - The receiver does not return until the data are ready for use by the receiver (which often means the data have been copied from the system buffer to the application buffer)
- Non-blocking send/receive
  - The calling process returns immediately
  - It merely requests that the MPI library perform the operation; the user cannot predict when that will happen
  - It is unsafe to modify the application buffer until you can make sure the requested operation has completed (MPI provides routines to test this, shown in the sketch below and on the following slides)
  - Can be used to overlap computation with communication, with possible performance gains
  - MPI_Isend (buf, count, datatype, dest, tag, comm, request)
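A minimal sketch of the non-blocking pattern just described: start the send, do unrelated work, and only reuse the buffer after completion has been confirmed. The destination (1), tag (42) and buffer size are illustrative, and do_unrelated_work() is a hypothetical placeholder; MPI_Wait is covered on the following slides.

    /* Sketch: overlap communication with computation using a non-blocking send. */
    double data[1000];
    MPI_Request request;
    MPI_Status  status;

    /* start sending; returns immediately, data must not be modified yet */
    MPI_Isend(data, 1000, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD, &request);

    do_unrelated_work();            /* computation that does not touch data */

    MPI_Wait(&request, &status);    /* now the send has completed ...       */
    data[0] = 99.0;                 /* ... and the buffer is safe to reuse  */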

Slide 54: Testing non-blocking communications for completion
- Completion tests come in two types: WAIT type and TEST type
- WAIT type: the WAIT-type testing routines block until the communication has completed
  - A non-blocking communication immediately followed by a WAIT-type test is equivalent to the corresponding blocking communication
- TEST type: these routines return a TRUE or FALSE value immediately
  - The process can perform other tasks while the communication has not yet completed

Slide 55: Testing non-blocking communications for completion

The WAIT-type test is: MPI_Wait (request, status)
- This routine blocks until the communication specified by the handle request has completed. The request handle will have been returned by an earlier call to a non-blocking communication routine.

The TEST-type test is: MPI_Test (request, flag, status)
- In this case the communication specified by the handle request is simply queried to see if the communication has completed, and the result of the query (TRUE or FALSE) is returned immediately in flag.
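A sketch of the TEST-type pattern: poll for completion and keep doing other work until the receive has finished. do_some_work() is a hypothetical placeholder.

    /* Sketch: poll a non-blocking receive with MPI_Test while doing other work. */
    int value, flag = 0;
    MPI_Request request;
    MPI_Status  status;

    MPI_Irecv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &request);

    while (!flag) {
        do_some_work();                       /* useful work while waiting       */
        MPI_Test(&request, &flag, &status);   /* flag becomes true on completion */
    }
    printf("Received %d from process %d\n", value, status.MPI_SOURCE);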

Slide 56: Testing multiple non-blocking communications for completion

Wait for all communications to complete: MPI_Waitall (count, array_of_requests, array_of_statuses)
- This routine blocks until all the communications specified by the request handles, array_of_requests, have completed. The statuses of the communications are returned in the array array_of_statuses, and each can be queried in the usual way for the source and tag if required.

Test if all communications have completed: MPI_Testall (count, array_of_requests, flag, array_of_statuses)
- If all the communications have completed, flag is set to TRUE, and information about each of the communications is returned in array_of_statuses. Otherwise flag is set to FALSE and array_of_statuses is undefined.
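A sketch of the MPI_Waitall pattern: rank 0 posts one non-blocking receive per worker and then waits for all of them at once. MAX_PROCS and the tag 0 are illustrative choices.

    /* Sketch: on rank 0, post one Irecv per other process, then wait for all. */
    #define MAX_PROCS 64

    int         values[MAX_PROCS];
    MPI_Request requests[MAX_PROCS];
    MPI_Status  statuses[MAX_PROCS];
    int         i, nprocs;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (i = 1; i < nprocs; i++)
        MPI_Irecv(&values[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &requests[i-1]);

    MPI_Waitall(nprocs - 1, requests, statuses);   /* block until all arrive */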

Slide 57: Testing multiple non-blocking communications for completion

Query a number of communications at a time to find out if any of them have completed.

Wait: MPI_Waitany (count, array_of_requests, index, status)
- MPI_Waitany blocks until one or more of the communications associated with the array of request handles, array_of_requests, has completed.
- The index of the completed communication in the array_of_requests handles is returned in index, and its status is returned in status.
- Should more than one communication have completed, the choice of which is returned is arbitrary.

Test: MPI_Testany (count, array_of_requests, index, flag, status)
- The result of the test (TRUE or FALSE) is returned immediately in flag.
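Finally, a sketch of the MPI_Waitany pattern, continuing the Waitall sketch above but replacing the single MPI_Waitall call with a loop that services each receive as soon as it completes (requests, values and nprocs are the variables set up in that sketch).

    /* Sketch: alternative to the MPI_Waitall above - handle whichever
       outstanding receive completes first, repeating until all are done. */
    int completed, index;
    MPI_Status status;

    for (completed = 0; completed < nprocs - 1; completed++) {
        MPI_Waitany(nprocs - 1, requests, &index, &status);
        printf("Request %d completed: value %d from process %d\n",
               index, values[index + 1], status.MPI_SOURCE);
    }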