
1 1/44 MPI Programming Hamid Reza Tajozzakerin Sharif University of technology

2 2/44 Introduction Message-Passing Interface (MPI) A library of functions and macros Objectives: define an international long-term standard API for portable parallel applications and get all hardware vendors involved in implementations of this standard; define a target system for parallelizing compilers Can be used from C, C++, and Fortran The MPI Forum (http://www.mpi-forum.org/) brings together all contributing parties

3 3/44 The User’s View (diagram: several processors, each running a process, all connected through the communication system, MPI)

4 4/44 Programming with MPI Include the header file mpi.h (or its equivalent) in the source code Initialize the MPI environment: MPI_Init(&argc, &argv) Must be called once and only once, before any other MPI function At the end of the program: MPI_Finalize() Cleans up any unfinished business left by MPI General MPI Programs
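
A minimal sketch of such a program in C (the printf is only illustrative):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[]) {
      MPI_Init(&argc, &argv);        /* must be called before any other MPI function */
      printf("MPI environment is up\n");
      MPI_Finalize();                /* cleans up; no MPI calls are allowed afterwards */
      return 0;
  }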

5 5/44 Programming with MPI (cont.) Get your own process ID (rank): MPI_Comm_rank(MPI_Comm comm, int *rank) First argument is a communicator Communicator: a collection of processes that can send messages to each other Get the number of processes (including oneself): MPI_Comm_size(MPI_Comm comm, int *size) size: the number of processes in comm
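
In the C binding both calls return their result through a pointer argument; a short sketch:

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID within the communicator */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in the communicator */
  printf("Process %d of %d\n", rank, size);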

6 6/44 What is a message? Message: Data + Envelope Envelope: additional information needed for the message to be communicated successfully The envelope contains: Rank of the sender (who sends the message) Can be a wildcard: MPI_ANY_SOURCE Rank of the receiver (who receives the message) No wildcard for the destination A tag: used to distinguish messages received from a single process Can be a wildcard: MPI_ANY_TAG Communicator

7 7/44 Point-to-Point Communication A send command can be Blocking: continuation is possible only after the hand-over to the communication system has been completed (the buffer can be re-used) Non-blocking: immediate continuation is possible (check whether the message has been sent before re-using the buffer)

8 8/44 Point-to-Point Communication (Cont.) Four types of point-to-point send operations, each of them available in a blocking and a non-blocking variant Standard (regular) send: MPI_SEND or MPI_ISEND Asynchronous; the system decides whether or not to buffer messages to be sent Successful completion may depend on matching receive Buffered send: MPI_BSEND or MPI_IBSEND Asynchronous, but buffering of messages to be sent by the system is enforced Synchronous send: MPI_SSEND or MPI_ISSEND Synchronous, i.e. the send operation is not completed before the receiver has started to receive the message

9 9/44 Point-to-Point Communication (Cont.) Ready send: MPI_RSEND or MPI_IRSEND Send may be started only if the matching receive has already been posted: if no corresponding receive operation is available, the result is undefined Could be replaced by a standard send with no effect other than performance Meaning of blocking vs. non-blocking (the non-blocking variants carry the 'I' prefix): Blocking: the send operation does not complete until the send buffer can be reused Non-blocking: immediate continuation; the user has to make sure that the buffer is not corrupted

10 10/44 Point-to-Point Communication (Cont.) One receive function: Blocking MPI_Recv: the receive operation is completed when the message has been completely written into the receive buffer Non-blocking MPI_Irecv: continuation immediately after the receiving has begun Either can be combined with any of the four send modes

11 11/44 Point-to-Point Communication (Cont.) Syntax: MPI_SEND(buf, count, datatype, dest, tag, comm) MPI_RECV(buf, count, datatype, source, tag, comm, status) where
void *buf: pointer to the start of the buffer
int count: number of data objects
int source: process ID of the sending process
int dest: process ID of the destination process
int tag: ID of the message
MPI_Datatype datatype: data type of the data objects
MPI_Comm comm: communicator (see later)
MPI_Status *status: object containing message information
In the non-blocking versions, there is one additional request argument for checking the completion of the communication.
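
A sketch of a blocking exchange between ranks 0 and 1 (the tag value 99 and the buffer size are arbitrary; rank is assumed to come from MPI_Comm_rank as in the earlier sketch):

  int msg[10];
  MPI_Status status;
  if (rank == 0) {
      MPI_Send(msg, 10, MPI_INT, 1, 99, MPI_COMM_WORLD);            /* to rank 1, tag 99 */
  } else if (rank == 1) {
      MPI_Recv(msg, 10, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);   /* from rank 0, tag 99 */
  }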

12 12/44 Test Message Arrived MPI_Buffer_attach(...): attaches a user-provided buffer that MPI uses for buffered sends MPI_Probe(...) / MPI_Iprobe(...): blocking / non-blocking test whether a message has arrived, without actually receiving it MPI_Test(...): checks whether a send or receive operation has completed MPI_Wait(...): causes the process to wait until a send or receive operation has completed MPI_Get_count(...): provides the length of a received message
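
For example, a non-blocking receive can be combined with MPI_Wait and MPI_Get_count roughly as follows (a sketch; the buffer size is illustrative):

  int buf[100], count;
  MPI_Request request;
  MPI_Status status;
  MPI_Irecv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &request);
  /* ... do useful work while the message is in transit ... */
  MPI_Wait(&request, &status);                /* block until the receive has completed */
  MPI_Get_count(&status, MPI_INT, &count);    /* number of MPI_INT elements received */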

13 13/44 Data Types Standard MPI data types: MPI_CHAR MPI_SHORT MPI_INT MPI_LONG MPI_UNSIGNED MPI_FLOAT MPI_DOUBLE MPI_LONG_DOUBLE MPI_BYTE (8 binary digits) MPI_PACKED

14 14/44 Grouping Data Why? The fewer messages sent, the better the overall performance Three mechanisms: Count parameter: group data having the same basic type into an array Derived types Pack/Unpack

15 15/44 Building Derived Types Specify the types of the members of the derived type Number of elements of each type Calculate the addresses of the members Calculate displacements: relative locations Create the derived type: MPI_Type_struct(...) Commit it: MPI_Type_commit(...)
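
A sketch of these steps for a struct holding two floats and an int, written with MPI_Get_address and MPI_Type_create_struct (the current names for the MPI_Type_struct mechanism listed above):

  struct { float a, b; int n; } indata;
  int          blocklens[3] = {1, 1, 1};
  MPI_Aint     displs[3], base;
  MPI_Datatype types[3] = {MPI_FLOAT, MPI_FLOAT, MPI_INT};
  MPI_Datatype mystruct;

  MPI_Get_address(&indata,   &base);               /* addresses of the members ...      */
  MPI_Get_address(&indata.a, &displs[0]);
  MPI_Get_address(&indata.b, &displs[1]);
  MPI_Get_address(&indata.n, &displs[2]);
  for (int i = 0; i < 3; i++) displs[i] -= base;   /* ... turned into displacements */

  MPI_Type_create_struct(3, blocklens, displs, types, &mystruct);   /* create the type */
  MPI_Type_commit(&mystruct);                                       /* commit it */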

16 16/44 Other Derived Data Type Constructors MPI_Type_contiguous(...): constructs an array consisting of count elements of the old type laid out contiguously in memory MPI_Type_vector(...): constructs an MPI array with an element-to-element distance stride MPI_Type_indexed(...): constructs an MPI array with different block lengths
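
For instance, one column of a 10x10 row-major matrix of floats could be described with MPI_Type_vector (a sketch; the matrix size is only illustrative):

  MPI_Datatype column_t;
  /* 10 blocks of 1 float each, with a stride of 10 floats between block starts */
  MPI_Type_vector(10, 1, 10, MPI_FLOAT, &column_t);
  MPI_Type_commit(&column_t);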

17 17/44 Packing and Unpacking Elements of a complex data structure could be packed, sent, and unpacked again element by element: expensive and error-prone Pack: store noncontiguous data in contiguous memory locations Unpack: copy data from a contiguous buffer into noncontiguous memory locations MPI functions for explicit packing and unpacking: MPI_Pack(...): packs data into a buffer MPI_Unpack(...): unpacks data from the buffer
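
A sketch of packing an int n and a double x into one buffer on rank 0 and unpacking them in the same order on rank 1 (buffer size, tag, and ranks are illustrative; rank and status are as in the earlier sketches):

  char buffer[100];
  int  position = 0;
  if (rank == 0) {                     /* sender: pack, then send as MPI_PACKED */
      MPI_Pack(&n, 1, MPI_INT,    buffer, 100, &position, MPI_COMM_WORLD);
      MPI_Pack(&x, 1, MPI_DOUBLE, buffer, 100, &position, MPI_COMM_WORLD);
      MPI_Send(buffer, position, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {              /* receiver: receive, then unpack in the same order */
      MPI_Recv(buffer, 100, MPI_PACKED, 0, 0, MPI_COMM_WORLD, &status);
      MPI_Unpack(buffer, 100, &position, &n, 1, MPI_INT,    MPI_COMM_WORLD);
      MPI_Unpack(buffer, 100, &position, &x, 1, MPI_DOUBLE, MPI_COMM_WORLD);
  }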

18 18/44 Collective Communication Why? Many applications require not only point-to-point communication but also collective communication operations Collective communication includes: Broadcast Gather Scatter All-to-All Reduce

19 19/44 Broadcast (diagram: the send buffer of one process is copied into the receive buffers of all processes P0-P3)

20 20/44 Gather (diagram: the send buffers of processes P0-P3 are collected into the receive buffer of one process)

21 21/44 Scatter (diagram: the send buffer of one process is split into pieces that are distributed to the receive buffers of processes P0-P3)

22 22/44 All to All (diagram: each process sends one block (A, B, C, D) to every other process, so every receive buffer ends up with one block from each sender)

23 23/44 Reduce (diagram: the send buffers of P0-P3 are combined with a reduction operation into the receive buffer of one process)

24 24/44 All Reduce (diagram: the send buffers of P0-P3 are combined with a reduction operation and the result is placed in the receive buffers of all processes)

25 25/44 Collective Communication (Cont.) Important application scenario: distribute the elements of vectors or matrices among several processors Some functions offered by MPI MPI_Barrier(...): synchronization barrier: process waits for the other group members; when all of them have reached the barrier, they can continue MPI_Bcast(...): sends the data to all members of the group given by a communicator (hence more a multicast than a broadcast) MPI_Gather(...): collects data from the group members

26 26/44 Collective Communication (Cont.) MPI_Allgather(...): gather-to-all: data are collected from all processes, and all of them get the collection MPI_Scatter(...): classical scatter operation: distribution of data among processes MPI_Reduce(...): executes a reduce operation MPI_Allreduce(...): executes a reduce operation where all processes get the result MPI_Op_create(...) and MPI_Op_free(...): defines a new reduce operation or removes it, respectively Note that all of the functions above are with respect to a communicator (hence not necessarily a global communication)
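
A sketch of two of these calls: rank 0 broadcasts a parameter to everyone, then the local partial sums are reduced onto rank 0 (variable names are illustrative):

  double param, local_sum, global_sum;
  MPI_Bcast(&param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);       /* root 0 sends param to all */
  /* ... each process computes local_sum ... */
  MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);   /* sum ends up on rank 0 */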

27 27/44 Process Groups and Communicators Messages are tagged for identification – message tag is message ID! Again: process groups for restricted message exchange and restricted collective communication Process groups are ordered sets of processes Each process is locally uniquely identified via its local (group-related) process ID or rank Ordering starts with zero, successive numbering Global identification of a process via the pair (process group, rank)

28 28/44 Process Groups and Communicators MPI communicators: concept for working with contexts Communicator = process group + message context MPI offers intra-communicators for collective communication within a process group and inter-communicators for (point-to-point) communication between two process groups Default (including all processes): MPI_COMM_WORLD MPI provides many functions for working with process groups and communicators

29 29/44 Working with Communicators To create a new communicator: Make a list of the processes in the new communicator Get the group associated with an existing communicator: MPI_Comm_group(...) Create the new group: MPI_Group_incl(...) Create the actual communicator: MPI_Comm_create(...) Note: to create several communicators simultaneously: MPI_Comm_split(...)
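
A sketch of the three steps, building a communicator for the first three processes (the ranks array is only illustrative):

  MPI_Group world_group, new_group;
  MPI_Comm  new_comm;
  int       ranks[3] = {0, 1, 2};                         /* processes for the new communicator */

  MPI_Comm_group(MPI_COMM_WORLD, &world_group);           /* group behind MPI_COMM_WORLD */
  MPI_Group_incl(world_group, 3, ranks, &new_group);      /* new group from the listed ranks */
  MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);  /* collective over MPI_COMM_WORLD */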

30 30/44 Process Topologies Provide a convenient naming mechanism for the processes of a group Assist the runtime system in mapping processes onto hardware Only for intra-communicators Virtual topology: a set of processes represented by a graph Most common topologies: meshes and tori

31 31/44 Some useful functions MPI_Comm_rank(...): indicates the rank of the calling process MPI_Comm_size(...): returns the size of the group MPI_Comm_dup(...): creates a new communicator with the same attributes as the input communicator MPI_Comm_free(MPI_Comm *comm): frees the communicator and sets the handle to MPI_COMM_NULL

32 32/44 An example of a Cartesian grid (diagram: for each process, the upper number is its rank and the lower pair is its (row, col) coordinate)

33 33/44 Cartesian Topology Functions MPI_Cart_create(...): returns a handle to a new communicator to which the Cartesian topology information is attached MPI_Dims_create(...): selects a balanced distribution of processes MPI_Cartdim_get(...): returns the number of dimensions MPI_Cart_get(...): returns information on the topology MPI_Cart_sub(...): partitions a Cartesian topology into Cartesian subgrids of lower dimension MPI_Cart_coords(...), MPI_Cart_rank(...): convert between ranks and Cartesian coordinates
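
For example, a 2D periodic process grid might be created like this (a sketch; size is the value returned by MPI_Comm_size):

  MPI_Comm grid_comm;
  int dims[2]    = {0, 0};         /* 0 lets MPI_Dims_create choose a balanced split */
  int periods[2] = {1, 1};         /* wrap around in both dimensions */
  int coords[2], grid_rank;

  MPI_Dims_create(size, 2, dims);
  MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);   /* reordering allowed */
  MPI_Comm_rank(grid_comm, &grid_rank);
  MPI_Cart_coords(grid_comm, grid_rank, 2, coords);                   /* (row, col) of this process */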

34 34/44 DCT Parallelism

35 35/44 Preliminary DCT: Discrete Cosine Transform 2D DCT: apply a 1D DCT twice (along rows, then along columns) 2D-DCT equation: Y = C X C^T, where X is an N×N input matrix, C is an N×N DCT matrix (defined below), and Y contains the DCT coefficients The main operation is matrix multiplication
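
One common way to write the DCT matrix C referenced above is the orthonormal DCT-II form; a sketch (the slide's exact scaling and notation may differ):

  Y = C\,X\,C^{T}, \qquad
  C_{ij} =
  \begin{cases}
    \sqrt{1/N} & i = 0,\\
    \sqrt{2/N}\,\cos\dfrac{(2j+1)\,i\,\pi}{2N} & i = 1,\dots,N-1,
  \end{cases}
  \qquad 0 \le j \le N-1.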

36 36/44 FOX's Algorithm Multiply two square matrices Assume two matrices A = (a_ij) and B = (b_ij) The matrices are of order n Assume the number of processes p is a perfect square, so p = q^2 and n_bar = n/q is an integer Each process holds an (n/q)×(n/q) block of A and of B

37 37/44 FOX’s Algorithm (Cont.) For example: p=9 and n=6

38 38/44 FOX’s Algorithm (Cont.)

39 39/44 FOX's Algorithm (Cont.) The chosen submatrix in the r-th row is A_(r,u), where u = (r + step) mod q Example: at step = 0 these multiplications are done r=0: A_00 B_00, A_00 B_01, A_00 B_02 r=1: A_11 B_10, A_11 B_11, A_11 B_12 r=2: A_22 B_20, A_22 B_21, A_22 B_22 The other multiplications are done in the later steps The processes communicate with each other so that the product of the two matrices results

40 40/44 Implementation of the algorithm Treat each row of processes as a communicator Treat each column of processes as a communicator MPI_Cart_sub(Com, var_coor, row_com); MPI_Cart_sub(grid->Com, var_coor, col_com); More general communicator construction functions can be used instead: MPI_Group_incl(com, q, ranks, row_group) MPI_Comm_create(comm, row_group, &row_comm)
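
A sketch of creating the row and column communicators with MPI_Cart_sub, assuming grid_comm is the 2D Cartesian communicator created earlier:

  MPI_Comm row_comm, col_comm;
  int free_coords[2];

  free_coords[0] = 0; free_coords[1] = 1;   /* vary the column index: one communicator per row */
  MPI_Cart_sub(grid_comm, free_coords, &row_comm);

  free_coords[0] = 1; free_coords[1] = 0;   /* vary the row index: one communicator per column */
  MPI_Cart_sub(grid_comm, free_coords, &col_comm);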

41 41/44 Implementation of MPI An MPI implementation consists of a subroutine library with all MPI functions, include files for the calling application program, and some startup script (usually called mpirun, but not standardized) MPICH: supports both Linux and Microsoft Windows Other implementations of MPI: many different MPI implementations are available, e.g. LAM, which supports MPI programming on networks of Unix workstations See other implementations and their features: http://www.lam-mpi.org/mpi/implementations/fulllist.php

42 42/44 Implementation of MPI (Cont.) IMPI: Interoperable MPI A protocol specification to allow multiple MPI implementations to cooperate on a single MPI job. Any correct MPI program will run correctly under IMPI Divided into four parts: Startup/shutdown protocols Data transfer protocol Collective algorithm A centralized IMPI conformance testing methodology

43 43/44 Extensions to MPI External Interfaces One-sided Communication Dynamic Resource Management Extended Collective Bindings Real Time Some of these features are still subject to change

44 44/44 Questions?

