
Slide 1: Parallel Processing (CS 667) Lecture 9: Advanced Point to Point Communication
Jeremy R. Johnson
*Parts of this lecture were derived from Chapter 13 in Pacheco.

Slide 2: Introduction
Objective: To further examine message passing communication patterns.
Topics
- Implementing Allgather
  - Ring
  - Hypercube
- Non-blocking send/recv
  - MPI_Isend
  - MPI_Wait
  - MPI_Test

Slide 3: Broadcast/Reduce Ring
[Figure: four snapshots of a ring of four processes, P0 to P3, showing the stages of a broadcast/reduce.]

Slide 4: Bi-directional Broadcast Ring
[Figure: three snapshots of a ring of four processes, P0 to P3, showing the stages of a bi-directional broadcast.]

Slide 5: Allgather Ring
[Figure: allgather on a ring of four processes, P0 to P3. Each process Pi starts with its own block xi. After each stage every process holds one more block (two, then three), until all four processes hold x0,x1,x2,x3.]

Slide 6: AllGather
    int MPI_Allgather(
        void*         send_data     /* in  */,
        int           send_count    /* in  */,
        MPI_Datatype  send_type     /* in  */,
        void*         recv_data     /* out */,
        int           recv_count    /* in  */,
        MPI_Datatype  recv_type     /* in  */,
        MPI_Comm      communicator  /* in  */)
[Figure: Process 0 to Process 3 contribute x0, x1, x2, x3; every process receives all four blocks.]

Slide 7: Allgather_ring
    void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
        int i, p, my_rank;
        int successor, predecessor;
        int send_offset, recv_offset;
        MPI_Status status;

        MPI_Comm_size(comm, &p);
        MPI_Comm_rank(comm, &my_rank);

        /* Copy the local block x into its slot in y */
        for (i = 0; i < blocksize; i++)
            y[i + my_rank*blocksize] = x[i];

        successor = (my_rank + 1) % p;
        predecessor = (my_rank - 1 + p) % p;

Slide 8: Allgather_ring
        /* At stage i, forward the block received in the previous stage */
        for (i = 0; i < p-1; i++) {
            send_offset = ((my_rank - i + p) % p)*blocksize;
            recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
            MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
            MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                     comm, &status);
        }
    } /* Allgather_ring */

Slide 9: Hypercube
- Graph (recursively defined)
- An n-dimensional cube has 2^n nodes, with each node connected to n other nodes
- Binary labels of adjacent nodes differ in exactly one bit

Slide 10: Broadcast/Reduce
[Figure not preserved in the transcript.]

Slide 11: Allgather
[Figure not preserved in the transcript.]

Slide 12: Allgather
[Figure not preserved in the transcript.]

Slide 13: Allgather_cube
    void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
        int i, d, p, my_rank;
        unsigned eor_bit, and_bits;
        int stage, partner;
        MPI_Datatype hole_type;
        int send_offset, recv_offset;
        MPI_Status status;
        int log_base2(int p);

        MPI_Comm_size(comm, &p);
        MPI_Comm_rank(comm, &my_rank);

        /* Copy the local block x into its slot in y */
        for (i = 0; i < blocksize; i++)
            y[i + my_rank*blocksize] = x[i];

        d = log_base2(p);          /* cube dimension */
        eor_bit = 1 << (d-1);      /* bit flipped to find the partner */
        and_bits = (1 << d) - 1;   /* mask selecting the block offset */

Slide 14: Allgather_cube
        for (stage = 0; stage < d; stage++) {
            partner = my_rank ^ eor_bit;
            send_offset = (my_rank & and_bits)*blocksize;
            recv_offset = (partner & and_bits)*blocksize;

            /* 2^stage blocks of blocksize floats, 2^(d-stage) blocks apart */
            MPI_Type_vector(1 << stage, blocksize, (1 << (d-stage))*blocksize,
                            MPI_FLOAT, &hole_type);
            MPI_Type_commit(&hole_type);

            MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
            MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);

            MPI_Type_free(&hole_type);
            eor_bit = eor_bit >> 1;
            and_bits = and_bits >> 1;
        }
    } /* Allgather_cube */

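Slide 15: Buffering Assumption
- The previous code is not safe: every process calls MPI_Send before MPI_Recv, so the code completes only if the system has enough buffer space to hold the outgoing messages.
- Without sufficient buffering, every process blocks in MPI_Send waiting for a receive that is never posted, and the program deadlocks.
- MPI_Sendrecv can be used to guarantee that deadlock does not occur.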

Slide 16: SendRecv
    int MPI_Sendrecv(
        void*         send_buf      /* in  */,
        int           send_count    /* in  */,
        MPI_Datatype  send_type     /* in  */,
        int           dest          /* in  */,
        int           send_tag      /* in  */,
        void*         recv_buf      /* out */,
        int           recv_count    /* in  */,
        MPI_Datatype  recv_type     /* in  */,
        int           source        /* in  */,
        int           recv_tag      /* in  */,
        MPI_Comm      communicator  /* in  */,
        MPI_Status*   status        /* out */)

Slide 17: SendRecvReplace
    int MPI_Sendrecv_replace(
        void*         buffer        /* in/out */,
        int           count         /* in  */,
        MPI_Datatype  datatype      /* in  */,
        int           dest          /* in  */,
        int           send_tag      /* in  */,
        int           source        /* in  */,
        int           recv_tag      /* in  */,
        MPI_Comm      communicator  /* in  */,
        MPI_Status*   status        /* out */)

Slide 18: Nonblocking Send/Recv
- Allows communication to overlap with computation.
- The call returns immediately: it does not wait for the buffer to be copied or for the matching receive to occur.
- The communication is posted and can be tested later for completion.
    int MPI_Isend(   /* I = Immediate */
        void*         buffer    /* in  */,
        int           count     /* in  */,
        MPI_Datatype  datatype  /* in  */,
        int           dest      /* in  */,
        int           tag       /* in  */,
        MPI_Comm      comm      /* in  */,
        MPI_Request*  request   /* out */)

Slide 19: Nonblocking Send/Recv
    int MPI_Irecv(
        void*         buffer    /* out */,
        int           count     /* in  */,
        MPI_Datatype  datatype  /* in  */,
        int           source    /* in  */,
        int           tag       /* in  */,
        MPI_Comm      comm      /* in  */,
        MPI_Request*  request   /* out */)

    int MPI_Wait(
        MPI_Request*  request   /* in/out */,
        MPI_Status*   status    /* out */)

    int MPI_Test(
        MPI_Request*  request   /* in/out */,
        int*          flag      /* out */,
        MPI_Status*   status    /* out */)

Slide 20: Allgather_ring (Overlapped)
    /* Initial offsets are the i = 0 values from the blocking version */
    send_offset = my_rank*blocksize;
    recv_offset = ((my_rank - 1 + p) % p)*blocksize;
    for (i = 0; i < p-1; i++) {
        MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor, 0,
                  comm, &send_request);
        MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                  comm, &recv_request);

        /* Compute the next stage's offsets while the messages are in flight */
        send_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 2 + p) % p)*blocksize;

        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }

Slide 21: AllGather
    int MPI_Allgather(
        void*         send_data     /* in  */,
        int           send_count    /* in  */,
        MPI_Datatype  send_type     /* in  */,
        void*         recv_data     /* out */,
        int           recv_count    /* in  */,
        MPI_Datatype  recv_type     /* in  */,
        MPI_Comm      communicator  /* in  */)
[Figure: Process 0 to Process 3 contribute x0, x1, x2, x3; every process receives all four blocks.]

Slide 22: Alltoall
    int MPI_Alltoall(
        void*         send_buffer   /* in  */,
        int           send_count    /* in  */,
        MPI_Datatype  send_type     /* in  */,
        void*         recv_buffer   /* out */,
        int           recv_count    /* in  */,
        MPI_Datatype  recv_type     /* in  */,
        MPI_Comm      communicator  /* in  */)
[Figure: all-to-all exchange among the processes (Process 0, Process 1, Process 2, ...); details not preserved in the transcript.]

Slide 23: AlltoAll
Sequence of permutations, each implemented with MPI_Sendrecv.

Slide 24: AlltoAll (2 way)
Sequence of permutations, each implemented with MPI_Sendrecv.

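Slide 25: Communication Modes
- Synchronous (MPI_Ssend): the send completes only after the matching receive has started
- Ready (MPI_Rsend): the matching receive must already be posted when the send is called
- Buffered (MPI_Bsend): the user provides the buffer space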