Lecture 4: Part 2: MPI Point-to-Point Communication


1 Lecture 4: Part 2: MPI Point-to-Point Communication

2 Realizing Message Passing
Separate the network from the processor
Separate user memory from system memory
[Figure: two nodes, each with user and system memory, a processing element (PE), and a network interface (NI), connected by the network.]

3 Communication Modes for “Send”
Blocking/Non-blocking: timing regarding the use of the user message buffer
Ready: timing regarding the invocation of send and receive
Buffered: user/system buffer allocation

4 Communication Modes for “Send”
Synchronous/Asynchronous: timing regarding the invocation of send and receive plus the execution of the receive operation
Local/Non-local: completion is independent of / dependent on the execution of another user process

5 Messaging Semantics
[Figure: sender and receiver, each with user-space and system-space buffers, illustrating blocking/non-blocking, synchronous/asynchronous, and ready/not-ready delivery.]

6 Blocking/Non-blocking Send
Blocking send: the call does not return until the message data have been safely stored away, so that the sender is free to access and overwrite the send buffer.
The message might be copied directly into the matching receive buffer, or it may be copied into a temporary system buffer, even if no matching receive has been posted.
Local (completion does not depend on the execution of another user process)

7 Blocking Receive -- MPI_Recv
Returns when the receive is locally complete
The message buffer can be read after the call returns
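A minimal sketch of the blocking pair described on the last two slides, assuming a two-process run (the value, tag, and ranks are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Returns once 'value' can safely be reused; the data may or may
           not have reached rank 1 yet. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Returns only when the receive is locally complete, i.e. the
           buffer can be read. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}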

8 Non-blocking Send -- MPI_Isend
Non-blocking, asynchronous
Does not block waiting for the receive (returns “immediately”)
Check for completion with MPI_Wait() before reusing the buffer
MPI_Wait() returns when the message has been safely sent, not when it has been received.

9 Non-blocking Receive -- MPI_Irecv
Returns “immediately”
The message buffer must not be read after the call returns until completion has been checked
MPI_Wait(..): blocks until the communication is complete
MPI_Waitall(..): blocks until all communication operations in a given list have completed

10 Non-blocking Receive -- MPI_Irecv
MPI_Irecv(buf, count, datatype, source, tag, comm, &request): the request handle can be used to query the status of the communication
MPI_Wait(&request, &status): returns only when the request is complete
MPI_Waitall(count, array_of_requests, ..): waits for the completion of all requests in the array
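A sketch of the non-blocking receive calls listed above, assuming a two-process run (the buffer size, tag, and use of wildcards are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf[4] = {0};
    MPI_Request request;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        /* Post the receive early; buf must not be read until MPI_Wait
           reports completion. */
        MPI_Irecv(buf, 4, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &request);
        /* ... computation that does not touch buf ... */
        MPI_Wait(&request, &status);   /* block until locally complete */
        printf("rank 1: received from rank %d with tag %d\n",
               status.MPI_SOURCE, status.MPI_TAG);
    } else if (rank == 0) {
        int data[4] = {1, 2, 3, 4};
        MPI_Send(data, 4, MPI_INT, 1, 7, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}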

11 Non-blocking Communication
Improves performance by overlapping communication and computation
Requires an intelligent communication interface (messaging co-processors, as used in the SP2, Paragon, CS-2, Myrinet, and ATM systems)
[Figure: timeline of startup and transfer phases, with computation overlapped during the transfers.]

12 Ready Send -- MPI_Rsend()
The receive must be posted before the message arrives; otherwise the operation is erroneous and its outcome is undefined.
Non-local (completion depends on the starting time of the receiving process)
Saves the overhead of synchronization.

13 Buffered Send -- MPI_Bsend()
Explicitly buffers messages on the sending side
The user allocates the buffer (MPI_Buffer_attach())
Useful when the programmer wants to control buffer usage, e.g. when writing new communication libraries.
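A sketch of buffered-mode sending, assuming a two-process run (the message contents and sizes are illustrative):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, value = 17;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* The user supplies the buffer space that MPI_Bsend copies into. */
        int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
        void *buffer = malloc(bufsize);
        MPI_Buffer_attach(buffer, bufsize);

        /* Completes locally: the message is staged in the attached buffer
           even if no matching receive has been posted yet. */
        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Detaching waits until buffered messages have been delivered. */
        MPI_Buffer_detach(&buffer, &bufsize);
        free(buffer);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}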

14 Buffered Send -- MPI_Bsend()
[Figure: a node with user and system memory, PE, and NI; the outgoing message is staged through a user-allocated buffer.]

15 Synchronous Send -- MPI_Ssend()
Does not return until the message is actually received
The send buffer can be reused once the send operation has completed
Non-local (the receiver must have received the message)

16 Standard Send -- MPI_Send()
Standard send: behavior depends on the implementation (usually blocking and non-local, and possibly synchronous)
Safe to reuse the buffer when MPI_Send() returns
May block until the message is received (depends on the implementation)

17 Standard Send -- MPI_Send()
A good implementation:
Short messages: send immediately; buffer if no receive is posted. The aim is to reduce latency, so the cost of buffering is unimportant.
Large messages: use a rendezvous protocol (request, reply, send -- wait for the matching receive, then send the data)

18 How to Exchange Data
Simple (code on node 0):
  sid = MPI_Isend(buf1, node1)
  rid = MPI_Irecv(buf2, node1)
  ..... computation
  call MPI_Wait(sid)
  call MPI_Wait(rid)
For maximum performance:
  ids(1) = MPI_Isend(buf1, node1)
  ids(2) = MPI_Irecv(buf2, node1)
  ..... computation
  call MPI_Waitall(2, ids)
(A C version of this pattern is sketched below.)
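A C sketch of the second variant above; the neighbor rank, buffer length, and datatype are illustrative:

#include <mpi.h>

#define N 1024

/* Exchange data with a neighbor: post both non-blocking calls first,
   overlap computation, then complete both with a single MPI_Waitall. */
void exchange(int neighbor, double *sendbuf, double *recvbuf)
{
    MPI_Request reqs[2];

    MPI_Isend(sendbuf, N, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... computation that touches neither buffer ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}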

19 Model and Measure p2p Communication in MPI
data transfer time = latency + message size / bandwidth
latency (T0) is the startup time, independent of the message size (but it depends on the communication mode/protocol)
bandwidth (B) is the number of bytes transferred per second (combining the memory access rate and the network transmission rate)

20 Latency and Bandwidth
For short messages, latency dominates the transfer time; for long messages, the bandwidth term dominates.
Critical message size: n_1/2 = latency x bandwidth (obtained by setting latency = message size / bandwidth)
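In symbols (a compact restatement of the model above; the numeric values are purely illustrative):

    T(n) = T_0 + n/B,        n_1/2 = T_0 * B    (from setting T_0 = n/B)

For example, assuming T_0 = 10 µs and B = 100 MB/s, n_1/2 = 10^-5 s * 10^8 bytes/s = 1000 bytes: messages much shorter than 1000 bytes are latency-dominated, longer ones are bandwidth-dominated.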

21 Measure p2p Performance
Round-trip time (ping-pong): the one-way time is the round-trip time divided by 2.
[Figure: timing diagram of the send, the matching receive, and the reply send.]
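A ping-pong sketch along these lines; the message size and repetition count are illustrative:

#include <mpi.h>
#include <stdio.h>

#define NBYTES 1024
#define REPS   1000

int main(int argc, char **argv)
{
    char buf[NBYTES] = {0};
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double one_way = (t1 - t0) / (2.0 * REPS);   /* half the round trip */
        printf("one-way time %g s, bandwidth %g MB/s\n",
               one_way, NBYTES / one_way / 1e6);
    }

    MPI_Finalize();
    return 0;
}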

22 Some MPI Performance Results

23 Protocols
Rendezvous
Eager
Mixed
Pull (get)

24 Rendezvous
Algorithm:
Sender sends a request-to-send
Receiver acknowledges
Sender sends the data
No buffering required
High latency (three steps)
High bandwidth (no extra buffer copy)

25 Eager
Algorithm:
Sender sends the data immediately
Usually must be buffered
May be transferred directly if the receive is already posted
Features:
Low latency
Low bandwidth (extra buffer copy)

26 Mixed
Algorithm:
Eager for short messages
Rendezvous for long messages
Switch protocols near n_1/2

27 Mixed
Features:
Low latency for latency-dominated (short) messages
High bandwidth for bandwidth-dominated (long) messages
Reasonable memory management
Non-ideal performance for some messages near n_1/2
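A hypothetical sketch of the switching logic described on the last two slides; it does not show any real MPI library's internals, and eager_threshold, send_eager, and send_rendezvous are invented names:

#include <stddef.h>

/* Invented helpers standing in for an implementation's two send paths. */
void send_eager(const void *buf, size_t n, int dest);       /* copy and go */
void send_rendezvous(const void *buf, size_t n, int dest);  /* handshake first */

/* In a real implementation the threshold would be tuned near n_1/2. */
static const size_t eager_threshold = 128 * 1024;

void protocol_send(const void *buf, size_t n, int dest)
{
    if (n <= eager_threshold)
        send_eager(buf, n, dest);        /* low latency, extra buffer copy */
    else
        send_rendezvous(buf, n, dest);   /* no copy, extra round trip */
}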

28 Pull (Get) Protocol
One-sided communication
Used on shared-memory machines

29 MPICH p2p on SGI
Default: messages up to MPID_PKT_MAX_DATA_SIZE (256 bytes) use the Short protocol (the data is packed into the packet header), larger messages up to 128 KB use Eager, and messages above 128 KB use Rendezvous.

30 [Figure: measured transfer time versus message size with MPID_PKT_MAX_DATA_SIZE = 256, showing the Short, Eager, and Rendezvous regimes.]

31 MPI-FM (HPVM: Fast Messages) Performance
[Figure: one-way latency (µs) and bandwidth (MB/s) compared for HPVM, Power Challenge, SP-2, T3E, Origin 2000, and a Beowulf cluster. Note: supercomputer measurements taken by NAS, JPL, and HLRS (Germany).]

32 MPI Collective Operations

MPI_Alltoall(v)
MPI_Alltoall is an extension of MPI_Allgather to the case where each process sends distinct data to each of the receivers. The j-th block of data sent from process i is received by process j and is placed in the i-th block of the receive buffer of process j.
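A small MPI_Alltoall sketch matching this description: each rank sends one int to every rank (the values are illustrative):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int *sendbuf = malloc(nprocs * sizeof(int));
    int *recvbuf = malloc(nprocs * sizeof(int));

    /* Block j of rank i's send buffer goes to rank j ... */
    for (i = 0; i < nprocs; i++)
        sendbuf[i] = 100 * rank + i;

    /* ... and lands in block i of rank j's receive buffer. */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < nprocs; i++)
        printf("rank %d received %d from rank %d\n", rank, recvbuf[i], i);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}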

MPI_Alltoall(v)
Define i_j to be the i-th block of data of process j.
[Figure: layout of the all-to-all data blocks across processes.]

MPI_Alltoall(v)
Current implementation: process j sends block i_j directly to process i.
[Figure: send buffer and receive buffer layouts.]
