1/44 MPI Programming Hamid Reza Tajozzakerin, Sharif University of Technology

2/44 Introduction Message-Passing Interface (MPI): a library of functions and macros. Objectives: define an international long-term standard API for portable parallel applications, get all hardware vendors involved in implementations of this standard, and define a target system for parallelizing compilers. It can be used from C, C++, and Fortran. The MPI Forum brings together all contributing parties.

3/44 The User's View (diagram: several processors, each running a process, exchange messages through the MPI communication system)

4/44 Programming with MPI Include the library header mpi.h (or whatever it is called) in the source code. Initialize the MPI environment with MPI_Init(&argc, &argv); it must be called once, and only once, before any other MPI function. At the end of the program call MPI_Finalize(), which cleans up any unfinished business left by MPI. The general structure of an MPI program is sketched below.
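A minimal sketch of that skeleton, using only what the slide lists:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);    /* must come before any other MPI call */

    /* ... the actual parallel work goes here ... */

    MPI_Finalize();            /* cleans up; no MPI calls after this */
    return 0;
}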

5/44 Programming with MPI (cont.) Get your own process ID (rank): MPI_Comm_rank(MPI_Comm comm, int *rank). The first argument is a communicator; a communicator is a collection of processes that can send messages to each other. Get the number of processes (including oneself): MPI_Comm_size(MPI_Comm comm, int *size), where size is the number of processes in comm. A short usage sketch follows.
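A short sketch of the two calls on the default communicator MPI_COMM_WORLD; the printed message is only an illustration:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my ID within the communicator */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in it */
    printf("Process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}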

6/44 What is a message? Message = data + envelope. The envelope is the additional information needed for the message to be communicated successfully. The envelope contains: the rank of the sender (who sent the message), which on the receive side can be the wildcard MPI_ANY_SOURCE; the rank of the receiver (who receives the message), with no wildcard allowed for the destination; a tag, used to distinguish messages received from a single process, which on the receive side can be the wildcard MPI_ANY_TAG; and the communicator. A wildcard receive is sketched below.
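A sketch of the receive-side wildcards; this fragment is assumed to run between MPI_Init and MPI_Finalize, with a matching send posted by some other rank:

int value;
MPI_Status status;

/* accept one int from any sender, with any tag */
MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* the envelope of the message actually received is returned in status */
printf("got %d from rank %d, tag %d\n",
       value, status.MPI_SOURCE, status.MPI_TAG);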

7/44 Point-to-Point Communication A send command can be blocking: continuation is possible only after the hand-over to the communication system has been completed (the buffer can be re-used); or non-blocking: immediate continuation is possible (one must check whether the message has been sent and the buffer can be re-used).

8/44 Point-to-Point Communication (Cont.) There are four types of point-to-point send operations, each available in a blocking and a non-blocking variant. Standard (regular) send: MPI_SEND or MPI_ISEND; asynchronous, the system decides whether or not to buffer the message, and successful completion may depend on a matching receive. Buffered send: MPI_BSEND or MPI_IBSEND; asynchronous, but buffering of outgoing messages by the system is enforced. Synchronous send: MPI_SSEND or MPI_ISSEND; synchronous, i.e. the send operation is not completed before the receiver has started to receive the message.

9/44 Point-to-Point Communication (Cont.) Ready send: MPI_RSEND or MPI_IRSEND; the send may be started only if a matching receive has already been posted, and if no corresponding receive operation is available the result is undefined. It could be replaced by a standard send with no effect other than on performance. Meaning of blocking versus non-blocking (the non-blocking variants carry the 'I' prefix): blocking means the send operation does not return before the send buffer can be reused; non-blocking means immediate continuation, and the user has to make sure that the buffer is not touched before the communication completes.

10/44 Point-to-Point Communication (Cont.) There is one receive function in two variants: blocking MPI_Recv, where the receive operation is completed when the message has been completely written into the receive buffer; and non-blocking MPI_Irecv, which continues immediately after the receive has been initiated. Either variant can be combined with any of the four send modes. A non-blocking sketch follows.
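A non-blocking sketch, assuming exactly two ranks; the tag 0 and the payload are arbitrary choices (fragment between MPI_Init and MPI_Finalize):

int rank, sendval = 42, recvval;           /* 42 is an illustrative payload */
MPI_Request request;
MPI_Status  status;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0)
    MPI_Isend(&sendval, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
else
    MPI_Irecv(&recvval, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);

/* ... useful computation may overlap with the communication here ... */

MPI_Wait(&request, &status);   /* only now is the buffer safe to reuse or read */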

11/44 Point-to-Point Communication (Cont.) Syntax:
MPI_SEND(buf, count, datatype, dest, tag, comm)
MPI_RECV(buf, count, datatype, source, tag, comm, status)
where
void *buf            pointer to the beginning of the buffer
int count            number of data objects
int source           process ID of the sending process
int dest             process ID of the destination process
int tag              ID of the message
MPI_Datatype datatype  data type of the data objects
MPI_Comm comm        communicator (see later)
MPI_Status *status   object containing message information
In the non-blocking versions there is one additional argument, a request handle, for checking the completion of the communication. A complete blocking example is sketched below.
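A complete blocking sketch, assuming at least two ranks; the tag value 99 and the message text are arbitrary illustrations:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    char msg[32];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        strcpy(msg, "hello from rank 0");
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(msg, 32, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
        printf("rank 1 received: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}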

12/44 Testing Whether a Message Has Arrived MPI_Buffer_attach(...): attaches a user-supplied buffer that MPI uses for buffered sends. MPI_Probe(...) / MPI_Iprobe(...): blocking / non-blocking test of whether a message has arrived, without actually receiving it. MPI_Test(...): checks whether a send or receive operation has completed. MPI_Wait(...): causes the process to wait until a send or receive operation has completed. MPI_Get_count(...): provides the length of a received message. A probe sketch follows.
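A sketch of probing for a message of unknown length before receiving it; the sender rank 0 and tag 0 are assumptions, and the fragment needs <stdlib.h> for malloc/free:

MPI_Status status;
int count;

MPI_Probe(0, 0, MPI_COMM_WORLD, &status);    /* block until a message is pending */
MPI_Get_count(&status, MPI_INT, &count);     /* how many MPI_INTs does it hold? */

int *data = (int *)malloc(count * sizeof(int));
MPI_Recv(data, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* ... use data ... */
free(data);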

13/44 Data Types Standard MPI data types: MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_UNSIGNED, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_BYTE (8 binary digits), MPI_PACKED.

14/44 Grouping Data Why? The fewer messages sent, the better the overall performance. Three mechanisms: the count parameter, which groups data of the same basic type into an array; derived types; and pack/unpack.

15/44 Building Derived Types Specify the types of the members of the derived type and the number of elements of each type. Calculate the addresses of the members and from them the displacements (relative locations). Create the derived type with MPI_Type_struct(...). Commit it with MPI_Type_commit(...). A sketch follows.
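A sketch of this recipe for a struct with one int and one double; the field layout is purely illustrative, and MPI_Type_create_struct is the current name of the constructor the slide calls MPI_Type_struct (fragment between MPI_Init and MPI_Finalize):

/* Build an MPI datatype describing: struct { int n; double x; } */
struct particle { int n; double x; } p;

int          blocklens[2] = { 1, 1 };
MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
MPI_Aint     displs[2], base;

/* addresses of the members -> displacements relative to the struct start */
MPI_Get_address(&p,   &base);
MPI_Get_address(&p.n, &displs[0]);
MPI_Get_address(&p.x, &displs[1]);
displs[0] -= base;
displs[1] -= base;

MPI_Datatype particle_type;
MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
MPI_Type_commit(&particle_type);

/* the whole struct can now be sent as one unit */
MPI_Send(&p, 1, particle_type, 1, 0, MPI_COMM_WORLD);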

16/44 Other Derived Datatype Constructors MPI_Type_contiguous(...): constructs an array consisting of count elements of the old type lying in contiguous memory. MPI_Type_vector(...): constructs an MPI array with an element-to-element distance of stride. MPI_Type_indexed(...): constructs an MPI array with different block lengths. A vector sketch follows.
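A sketch of MPI_Type_vector describing one column of a 4x4 row-major matrix of doubles; the matrix size and the destination rank are arbitrary choices (fragment between MPI_Init and MPI_Finalize):

/* One column of a 4x4 row-major double matrix:
   4 blocks of 1 element each, stride of 4 elements between blocks. */
double A[4][4];
MPI_Datatype column_type;

MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &column_type);
MPI_Type_commit(&column_type);

/* send column 2 of A as a single message */
MPI_Send(&A[0][2], 1, column_type, 1, 0, MPI_COMM_WORLD);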

17/44 Packing and Unpacking Elements of a complex data structure could be sent element by element, which is expensive and error-prone; instead they can be packed, sent as one message, and unpacked again. Pack: store noncontiguous data in a contiguous memory location. Unpack: copy data from a contiguous buffer into noncontiguous memory locations. MPI functions for explicit packing and unpacking: MPI_Pack(...) packs data into a buffer, MPI_Unpack(...) unpacks data from the buffer. A sketch follows.
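A sketch of packing an int and a double on the sender and unpacking them on the receiver; the 100-byte buffer, tag 0, and values are arbitrary choices (fragment between MPI_Init and MPI_Finalize, at least two ranks assumed):

int rank, n = 0, position = 0;
double x = 0.0;
char buffer[100];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {                       /* sender: pack, then send the buffer */
    n = 42;  x = 3.14;                 /* illustrative values */
    MPI_Pack(&n, 1, MPI_INT,    buffer, 100, &position, MPI_COMM_WORLD);
    MPI_Pack(&x, 1, MPI_DOUBLE, buffer, 100, &position, MPI_COMM_WORLD);
    MPI_Send(buffer, position, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {                /* receiver: receive, then unpack in the same order */
    MPI_Recv(buffer, 100, MPI_PACKED, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Unpack(buffer, 100, &position, &n, 1, MPI_INT,    MPI_COMM_WORLD);
    MPI_Unpack(buffer, 100, &position, &x, 1, MPI_DOUBLE, MPI_COMM_WORLD);
}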

18/44 Collective Communication Why? Many applications require not only point-to-point communication but also collective communication operations. Collective operations: broadcast, gather, scatter, all-to-all, reduce.

19/44 Broadcast (diagram: the contents of the root's send buffer are copied into the receive buffers of all processes P0-P3)

20/44 Gather (diagram: the send buffers of processes P0-P3 are collected, in rank order, into the root's receive buffer)

21/44 Scatter (diagram: the root's send buffer is split into pieces, one delivered to the receive buffer of each process P0-P3)

22/44 All to All (diagram: every process sends one block to every other process; block i of each send buffer ends up in the receive buffer of process Pi)

23/44 Reduce (diagram: the send buffers of P0-P3 are combined element-wise with a reduction operation and the result is placed in the root's receive buffer)

24/44 All Reduce (diagram: the send buffers of P0-P3 are combined with a reduction operation and every process receives the result in its receive buffer)

25/44 Collective Communication (Cont.) An important application scenario is distributing the elements of vectors or matrices among several processors. Some functions offered by MPI: MPI_Barrier(...): synchronization barrier; a process waits for the other group members, and when all of them have reached the barrier they can continue. MPI_Bcast(...): sends the data to all members of the group given by a communicator (hence more a multicast than a broadcast). MPI_Gather(...): collects data from the group members.

26/44 Collective Communication (Cont.) MPI_Allgather(...): gather-to-all; data are collected from all processes, and all get the collection. MPI_Scatter(...): classical scatter operation, the distribution of data among processes. MPI_Reduce(...): executes a reduce operation. MPI_Allreduce(...): executes a reduce operation where all processes get the result. MPI_Op_create(...) and MPI_Op_free(...): define a new reduce operation or remove it, respectively. Note that all of the functions above operate with respect to a communicator (hence not necessarily a global communication). A broadcast/reduce sketch follows.
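A sketch combining MPI_Bcast and MPI_Reduce: rank 0 broadcasts a value, every rank computes a contribution, and rank 0 collects the sum (fragment between MPI_Init and MPI_Finalize; the values are illustrative):

int rank, n = 0, partial, total;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) n = 100;                          /* only the root knows n at first */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* now every rank has n */

partial = rank * n;                              /* some per-rank contribution */
MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

if (rank == 0) printf("sum = %d\n", total);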

27/44 Process Groups and Communicators Messages are tagged for identification: the message tag is the message ID. Process groups serve restricted message exchange and restricted collective communication. Process groups are ordered sets of processes; each process is locally and uniquely identified via its local (group-related) process ID or rank. Ordering starts with zero, with successive numbering. A process is identified globally via the pair (process group, rank).

28/44 Process Groups and Communicators MPI communicators are the concept for working with contexts: communicator = process group + message context. MPI offers intra-communicators for collective communication within a process group and inter-communicators for (point-to-point) communication between two process groups. The default communicator, including all processes, is MPI_COMM_WORLD. MPI provides many functions for working with process groups and communicators.

29/44 Working with Communicators To create a new communicator: make a list of the processes in the new communicator; get the group underlying the existing communicator with MPI_Comm_group(...); create the new group with MPI_Group_incl(...); create the actual communicator with MPI_Comm_create(...). Note: to create several communicators simultaneously, use MPI_Comm_split(...). A sketch follows.
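A sketch that builds a communicator containing the even-numbered ranks of MPI_COMM_WORLD; the even/odd split and the fixed-size rank array are illustrative assumptions (fragment between MPI_Init and MPI_Finalize):

int world_size, n_even = 0, i;
int ranks[128];                         /* assumes at most 256 processes */
MPI_Group world_group, even_group;
MPI_Comm  even_comm;

MPI_Comm_size(MPI_COMM_WORLD, &world_size);

for (i = 0; i < world_size; i += 2)     /* 1. list the members of the new communicator */
    ranks[n_even++] = i;

MPI_Comm_group(MPI_COMM_WORLD, &world_group);             /* 2. group of the old communicator */
MPI_Group_incl(world_group, n_even, ranks, &even_group);  /* 3. new (sub)group */
MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);  /* 4. the actual communicator */

/* even_comm is MPI_COMM_NULL on the ranks that are not in the group */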

30/44 Process Topologies Provide a convenient naming mechanism for the processes of a group and assist the runtime system in mapping processes onto hardware. Defined only for intra-communicators. A virtual topology is a set of processes represented by a graph. The most common topologies are meshes and tori.

31/44 Some Useful Functions MPI_Comm_rank(...): indicates the rank of the calling process. MPI_Comm_size(...): returns the size of the group. MPI_Comm_dup(...): creates a new communicator with the same attributes as the input communicator. MPI_Comm_free(MPI_Comm *comm): frees the communicator and sets the handle to MPI_COMM_NULL.

32/44 An Example of a Cartesian Graph (diagram of a process grid; the upper number in each cell is the rank, the lower pair is its (row, col) coordinates)

33/44 Cartesian Topology Functions MPI_Cart_create(...): returns a handle to a new communicator to which the Cartesian topology information is attached. MPI_Dims_create(...): selects a balanced distribution of processes. MPI_Cartdim_get(...): returns the number of dimensions. MPI_Cart_get(...): returns information on the topology. MPI_Cart_sub(...): partitions a Cartesian topology into Cartesian subgrids of lower dimension. MPI_Cart_coords(...) and MPI_Cart_rank(...): translate between ranks and Cartesian coordinates. A sketch follows.
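A sketch of creating a 2D process grid and finding this process's coordinates; the non-periodic 2D layout and the reorder flag are arbitrary choices (fragment between MPI_Init and MPI_Finalize):

int size, rank, dims[2] = {0, 0}, periods[2] = {0, 0}, coords[2];
MPI_Comm grid_comm;

MPI_Comm_size(MPI_COMM_WORLD, &size);

MPI_Dims_create(size, 2, dims);                       /* balanced rows x cols */
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);

MPI_Comm_rank(grid_comm, &rank);
MPI_Cart_coords(grid_comm, rank, 2, coords);          /* my (row, col) */
printf("rank %d sits at (%d, %d)\n", rank, coords[0], coords[1]);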

34/44 DCT Parallelism

35/44 Preliminary DCT: Discrete Cosine Transform. The 2D DCT applies a 1D DCT twice. 2D-DCT equation: Y = C X C^T, where X is an N*N matrix of samples, C is the N*N DCT matrix with C(i,j) = sqrt(1/N) for i = 0 and C(i,j) = sqrt(2/N) cos((2j+1)i*pi/(2N)) for i > 0, and Y contains the DCT coefficients. The main operation is matrix multiplication.

36/44 Fox's Algorithm Multiplies two square matrices. Assume two matrices A = (a_ij) and B = (b_ij), both of order n. Assume the number of processes p is a perfect square, p = q^2, and that n_bar = n/q is an integer. Each process holds one block of A and one block of B, each a matrix of order n/q = n_bar.

37/44 Fox's Algorithm (Cont.) For example, with p = 9 and n = 6 we have q = 3 and n_bar = 2, so the 6x6 matrices are split over a 3x3 process grid, one 2x2 block of A and one 2x2 block of B per process.

38/44 Fox's Algorithm (Cont.)

39/44 Fox's Algorithm (Cont.) The chosen submatrix in the r-th process row is A_{r,u}, where u = (r + step) mod q. Example: at step = 0 these multiplications are done: r=0: A_00*B_00, A_00*B_01, A_00*B_02; r=1: A_11*B_10, A_11*B_11, A_11*B_12; r=2: A_22*B_20, A_22*B_21, A_22*B_22. The other multiplications are done in the other steps. The processes communicate with each other so that the product of the two matrices is accumulated.

40/44 Implementation of the Algorithm Treat each row of processes as a communicator and each column of processes as a communicator, e.g. MPI_Cart_sub(grid_comm, vary_coords, &row_comm) and MPI_Cart_sub(grid_comm, vary_coords, &col_comm). More general communicator construction functions can be used instead, e.g. MPI_Comm_group / MPI_Group_incl followed by MPI_Comm_create. A sketch of the Cart_sub approach follows.
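A sketch of splitting a 2D Cartesian grid (such as the one from slide 33) into row and column communicators with MPI_Cart_sub; grid_comm is assumed to be a q x q Cartesian communicator created earlier:

MPI_Comm row_comm, col_comm;
int vary[2];

vary[0] = 0;  vary[1] = 1;                 /* keep the column dimension varying */
MPI_Cart_sub(grid_comm, vary, &row_comm);  /* one communicator per grid row */

vary[0] = 1;  vary[1] = 0;                 /* keep the row dimension varying */
MPI_Cart_sub(grid_comm, vary, &col_comm);  /* one communicator per grid column */

/* In Fox's algorithm, blocks of A are broadcast along row_comm and blocks of B
   are shifted along col_comm (e.g. with MPI_Sendrecv_replace) at each step. */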

41/44 Implementations of MPI An MPI implementation consists of a subroutine library with all MPI functions, include files for the calling application program, and some startup script (usually called mpirun, but not standardized). MPICH supports both Linux and Microsoft Windows. Many other MPI implementations are available, e.g. LAM, which supports MPI programming on networks of Unix workstations. See other implementations and their features.

42/44 Implementations of MPI (Cont.) IMPI: Interoperable MPI, a protocol specification that allows multiple MPI implementations to cooperate on a single MPI job. Any correct MPI program will run correctly under IMPI. It is divided into four parts: startup/shutdown protocols, a data transfer protocol, collective algorithms, and a centralized IMPI conformance testing methodology.

43/44 Extensions to MPI External interfaces, one-sided communication, dynamic resource management, extended collective operations, additional language bindings, real time. Some of these features are still subject to change.

44/44 Questions?