A Message Passing Standard for MPP and Workstations


A Message Passing Standard for MPP and Workstations
J.J. Dongarra, S.W. Otto, M. Snir, and D.W. Walker
Communications of the ACM, July 1996

Message Passing Interface (MPI)
- Message-passing library
- Can be added to sequential languages (C, Fortran)
- Designed by a consortium from industry, academia, and government
- Goal is a message-passing standard

MPI Programming Model
- Multiple Program Multiple Data (MPMD): processors may execute different programs (unlike SPMD)
- Number of processes is fixed (one per processor)
- No support for multi-threading
- Point-to-point and collective communication

MPI Basics
- MPI_INIT       initialize MPI
- MPI_FINALIZE   terminate computation
- MPI_COMM_SIZE  number of processes
- MPI_COMM_RANK  my process identifier
- MPI_SEND       send a message
- MPI_RECV       receive a message
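
A minimal, complete program exercising these six calls might look as follows (a sketch using the C binding names; assumes it is run with at least two processes):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);                      /* initialize MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);      /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* my process identifier */

    if (rank == 0 && nprocs > 1) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);           /* send a message */
    } else if (rank == 1) {
        int msg;
        MPI_Status status;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* receive a message */
        printf("process 1 of %d received %d from process 0\n", nprocs, msg);
    }

    MPI_Finalize();                              /* terminate computation */
    return 0;
}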

Language Bindings
- Describes, for a given base language:
  - concrete syntax
  - error-handling conventions
  - parameter modes
- Popular base languages: C, Fortran

Point-to-point message passing
- Messages sent from one processor to another are FIFO ordered
- Messages sent by different processors arrive non-deterministically
- Receiver may specify the source
  - source = sender's identity => symmetric naming
  - source = MPI_ANY_SOURCE => asymmetric naming
  - example: specify the sender of the next pivot row in ASP
- Receiver may also specify a tag
  - distinguishes different kinds of messages
  - similar to an operation name in SR

Examples (1/2)

int x;
float buf[10];
MPI_Status status;

MPI_SEND(buf, 10, MPI_FLOAT, 3, 0, MPI_COMM_WORLD);
/* send 10 floats to process 3; MPI_COMM_WORLD = all processes */

MPI_RECV(&x, 1, MPI_INT, 15, 0, MPI_COMM_WORLD, &status);
/* receive 1 integer from process 15 */

MPI_RECV(&x, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
/* receive 1 integer from any process */

Examples (2/2)

#define NEW_MINIMUM 1

int x;
MPI_Status status;

MPI_SEND(&x, 1, MPI_INT, 3, NEW_MINIMUM, MPI_COMM_WORLD);
/* send message with tag NEW_MINIMUM */

MPI_RECV(&x, 1, MPI_INT, 15, NEW_MINIMUM, MPI_COMM_WORLD, &status);
/* receive 1 integer with tag NEW_MINIMUM */

MPI_RECV(&x, 1, MPI_INT, MPI_ANY_SOURCE, NEW_MINIMUM, MPI_COMM_WORLD, &status);
/* receive tagged message from any source */

Forms of message passing
- MPI has a very wide and complex variety of primitives
- The programmer can decide whether to:
  - minimize copying overhead -> synchronous and ready-mode sends
  - minimize idle time, overlap communication and computation -> buffered and non-blocking sends
- Communication modes control the buffering
- (Non)blocking determines when a send completes

Communication modes
- Standard:
  - the programmer cannot assume whether or not the message is buffered; this is up to the system
- Buffered:
  - the programmer provides (bounded) buffer space
  - SEND completes when the message has been copied into the buffer (local)
  - erroneous if buffer space is insufficient
- Synchronous:
  - SEND waits for a matching receive
  - no buffering -> easy to get deadlocks
- Ready:
  - the programmer asserts that the receive has already been posted
  - erroneous if there is no matching receive yet
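
The following sketch (not from the slides) shows the four modes in the C binding: MPI_Send, MPI_Bsend, MPI_Ssend and MPI_Rsend. It assumes exactly two processes; rank 1 posts one receive per send, and the barrier is a crude way to guarantee that the receive matching the ready-mode send is posted in time.

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, data[10] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int bufsize = 10 * sizeof(int) + MPI_BSEND_OVERHEAD;
        void *buf = malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);                      /* buffer space for MPI_Bsend */

        MPI_Send (data, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* standard: buffering up to the system */
        MPI_Bsend(data, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);   /* buffered: completes once copied locally */
        MPI_Ssend(data, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);   /* synchronous: waits for the matching receive */

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Rsend(data, 10, MPI_INT, 1, 3, MPI_COMM_WORLD);   /* ready: receive is already posted */

        MPI_Buffer_detach(&buf, &bufsize);
        free(buf);
    } else {
        int recvbuf[10], tag;
        MPI_Status status;
        MPI_Request req;

        for (tag = 0; tag < 3; tag++)
            MPI_Recv(recvbuf, 10, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);

        MPI_Irecv(recvbuf, 10, MPI_INT, 0, 3, MPI_COMM_WORLD, &req);  /* post before the ready send */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, &status);
    }

    MPI_Finalize();
    return 0;
}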

Unsafe programs
- MPI does not guarantee any system buffering
- Programs that assume it are unsafe and may deadlock
- Example of such a deadlock (both processes send first, so without buffering neither ever reaches its receive):

Machine 0:
MPI_SEND(x1, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_RECV(x2, 10, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);

Machine 1:
MPI_SEND(y1, 10, MPI_INT, 0, 0, MPI_COMM_WORLD);
MPI_RECV(y2, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
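
A safe rewrite of this exchange, as a sketch in the C binding: either break the symmetry so that one side receives first, or use MPI_Sendrecv, which pairs the send and receive and therefore needs no buffering assumption. (Run with two processes.)

#include <mpi.h>

int main(int argc, char **argv)
{
    int my_rank, other;
    int x1[10] = {0}, x2[10];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    other = 1 - my_rank;

    /* Option 1: order the calls by rank -- rank 0 sends first, rank 1 receives first */
    if (my_rank == 0) {
        MPI_Send(x1, 10, MPI_INT, other, 0, MPI_COMM_WORLD);
        MPI_Recv(x2, 10, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
    } else {
        MPI_Recv(x2, 10, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
        MPI_Send(x1, 10, MPI_INT, other, 0, MPI_COMM_WORLD);
    }

    /* Option 2: combined send/receive in one call */
    MPI_Sendrecv(x1, 10, MPI_INT, other, 1,
                 x2, 10, MPI_INT, other, 1,
                 MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}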

(Non)blocking sends
- A blocking send returns when it is safe to modify its arguments
- A non-blocking ISEND returns immediately (dangerous)

float buf[10];
MPI_Request request;
MPI_Status status;

MPI_ISEND(buf, 10, MPI_FLOAT, 3, 0, MPI_COMM_WORLD, &request);
compute();   /* these computations can be overlapped with the transmission */
buf[2]++;    /* dangerous: may or may not affect the transmitted buf */
MPI_WAIT(&request, &status);   /* waits until the ISEND completes */

- Don't confuse this with synchronous vs. asynchronous!
  - synchronous = wait for the (remote) receiver
  - blocking = wait only until the arguments have been saved (locally)
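
For completeness, a sketch of a full non-blocking exchange in the C binding: the receive and the send are both posted, unrelated computation can overlap with the transfers, and MPI_Waitall makes it safe to touch the buffers again. (Run with two processes.)

#include <mpi.h>

int main(int argc, char **argv)
{
    int my_rank, other;
    float sendbuf[10] = {0}, recvbuf[10];
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    other = 1 - my_rank;

    MPI_Irecv(recvbuf, 10, MPI_FLOAT, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, 10, MPI_FLOAT, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... computation that touches neither sendbuf nor recvbuf
       can safely overlap with the transfers here ... */

    MPI_Waitall(2, reqs, stats);   /* after this, both buffers are safe to use again */

    MPI_Finalize();
    return 0;
}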

Non-blocking receive
- MPI_IPROBE      check for a pending message (non-blocking)
- MPI_PROBE       wait for a pending message (blocking)
- MPI_GET_COUNT   number of data elements in a message

MPI_PROBE(source, tag, comm, &status)      => status
MPI_GET_COUNT(&status, datatype, &count)   => message size
status.MPI_SOURCE                          => identity of sender
status.MPI_TAG                             => tag of message

Example: Check for Pending Message

int buf[1], flag, source, minimum;
MPI_Status status;

while (...) {
   MPI_IPROBE(MPI_ANY_SOURCE, NEW_MINIMUM, comm, &flag, &status);
   if (flag) {
      /* handle new minimum */
      source = status.MPI_SOURCE;
      MPI_RECV(buf, 1, MPI_INT, source, NEW_MINIMUM, comm, &status);
      minimum = buf[0];
   }
   ... /* compute */
}

Example: Receiving a Message with Unknown Size

int count, *buf, source;
MPI_Status status;

MPI_PROBE(MPI_ANY_SOURCE, 0, comm, &status);
source = status.MPI_SOURCE;
MPI_GET_COUNT(&status, MPI_INT, &count);
buf = malloc(count * sizeof(int));
MPI_RECV(buf, count, MPI_INT, source, 0, comm, &status);

Global Operations - Collective Communication
- Coordinated communication involving all processes
- Functions:
  - MPI_BARRIER    synchronize all processes
  - MPI_BCAST      send data to all processes
  - MPI_GATHER     gather data from all processes
  - MPI_SCATTER    scatter data to all processes
  - MPI_REDUCE     reduction operation
  - MPI_ALLREDUCE  reduction, all processes get the result
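
A small sketch (C binding names, not from the slides) that exercises these collectives: rank 0 broadcasts a parameter, scatters one work item per process, gathers the results, and reduces a global sum.

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, n = 0, local, result, sum;
    int *work = NULL, *results = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        n = 100;
        work = malloc(nprocs * sizeof(int));
        results = malloc(nprocs * sizeof(int));
        for (int i = 0; i < nprocs; i++) work[i] = i;
    }

    MPI_Barrier(MPI_COMM_WORLD);                          /* synchronize all processes */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);         /* every process now has n == 100 */
    MPI_Scatter(work, 1, MPI_INT, &local, 1, MPI_INT,
                0, MPI_COMM_WORLD);                       /* one element per process */
    result = local * n;                                   /* some local computation */
    MPI_Gather(&result, 1, MPI_INT, results, 1, MPI_INT,
               0, MPI_COMM_WORLD);                        /* collect results at the root */
    MPI_Reduce(&result, &sum, 1, MPI_INT, MPI_SUM,
               0, MPI_COMM_WORLD);                        /* global sum at the root */

    if (rank == 0) { free(work); free(results); }
    MPI_Finalize();
    return 0;
}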

Barrier
- MPI_BARRIER(comm)
- Synchronizes a group of processes
- All processes block until all have reached the barrier
- Often invoked at the end of a loop in iterative algorithms

Figure 8.3 from Foster's book

Reduction
- Combines values provided by different processes
- Result sent to one processor (MPI_REDUCE) or to all processors (MPI_ALLREDUCE)
- Used with commutative and associative operators: MAX, MIN, +, *, AND, OR

Example 1: Global Minimum Operation

MPI_REDUCE(inbuf, outbuf, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD)

outbuf[0] = minimum over all processes' inbuf[0]
outbuf[1] = minimum over all processes' inbuf[1]
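
Fleshed out as a complete sketch (C binding names), including the MPI_Allreduce variant that leaves the minima on every process:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, inbuf[2], outbuf[2], everywhere[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    inbuf[0] = rank;          /* minimum over these is 0 */
    inbuf[1] = 100 - rank;    /* minimum over these is 100 - (nprocs - 1) */

    MPI_Reduce(inbuf, outbuf, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);       /* result at rank 0 */
    MPI_Allreduce(inbuf, everywhere, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);   /* result at every rank */

    if (rank == 0)
        printf("minima: %d %d\n", outbuf[0], outbuf[1]);

    MPI_Finalize();
    return 0;
}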

Figure 8.4 from Foster's book

Example 2: SOR (Successive Over-Relaxation) in MPI

SOR communication scheme
- Each CPU communicates with its left and right neighbor (if they exist)
- Also needs to determine the convergence criterion

Expressing SOR in MPI
- Use a ring topology
- Each processor exchanges boundary rows with its left/right neighbors
- Use MPI_ALLREDUCE to determine whether the grid has changed by less than epsilon during the last iteration
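
One possible shape of the per-iteration communication, sketched in the C binding (this is not the code of Foster's Figure 8.5). It assumes each process owns rows 1..local_n of a (local_n+2) x M block of doubles named grid, with rows 0 and local_n+1 as halo rows; rank, nprocs, M, local_n and EPSILON come from the surrounding program.

int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
int right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;
MPI_Status status;
double maxdiff, globaldiff;

/* send my first real row left, receive the right neighbor's row into my bottom halo */
MPI_Sendrecv(grid[1],           M, MPI_DOUBLE, left,  0,
             grid[local_n + 1], M, MPI_DOUBLE, right, 0,
             MPI_COMM_WORLD, &status);

/* send my last real row right, receive the left neighbor's row into my top halo */
MPI_Sendrecv(grid[local_n],     M, MPI_DOUBLE, right, 1,
             grid[0],           M, MPI_DOUBLE, left,  1,
             MPI_COMM_WORLD, &status);

/* ... update local rows 1..local_n, recording the largest change in maxdiff ... */

MPI_Allreduce(&maxdiff, &globaldiff, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
if (globaldiff < EPSILON) { /* converged; all processes agree */ }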

Figure 8.5 from Foster's book

Semantics of collective operations
- Blocking operations: it is safe to reuse buffers after they return
- Standard mode only: completion of a call does not guarantee that other processes have completed the operation
- A collective operation may or may not have the effect of synchronizing all processes

Modularity
- MPI programs use libraries
- Library routines may send messages
- These messages should not interfere with application messages
- Tags do not solve this problem

Communicators
- A communicator denotes a group of processes (a communication context)
- MPI_SEND and MPI_RECV specify a communicator
- MPI_RECV can only receive messages sent within the same communicator
- Library routines should use separate communicators, passed as a parameter
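
A sketch of how a library can protect its messages (the lib_* function names are illustrative): duplicating the user's communicator with MPI_Comm_dup creates a new communication context over the same group of processes, so the library's traffic can never be matched by the application's own sends and receives.

#include <mpi.h>

static MPI_Comm libcomm;

void lib_init(MPI_Comm user_comm)
{
    MPI_Comm_dup(user_comm, &libcomm);   /* new context, same process group */
}

void lib_work(void)
{
    int rank, token = 0;
    MPI_Status status;

    MPI_Comm_rank(libcomm, &rank);
    if (rank == 0)
        MPI_Send(&token, 1, MPI_INT, 1, 0, libcomm);           /* cannot be received on user_comm */
    else if (rank == 1)
        MPI_Recv(&token, 1, MPI_INT, 0, 0, libcomm, &status);
}

void lib_finalize(void)
{
    MPI_Comm_free(&libcomm);
}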

Discussion
- Library-based:
  - no language modifications
  - no special compiler needed
  - syntax is awkward
- Message receipt is based on the identity of the sender and the operation tag, but not on the contents of the message
- Needs a separate mechanism for organizing the name space
- No type checking of messages

Syntax

SR:
   call slave.coordinates(2.4, 5.67);
   in coordinates(x, y);

MPI:
   #define COORDINATES_TAG 1
   #define SLAVE_ID 15

   float buf[2];
   buf[0] = 2.4;
   buf[1] = 5.67;
   MPI_SEND(buf, 2, MPI_FLOAT, SLAVE_ID, COORDINATES_TAG, MPI_COMM_WORLD);

   MPI_RECV(buf, 2, MPI_FLOAT, MPI_ANY_SOURCE, COORDINATES_TAG, MPI_COMM_WORLD, &status);