Parallel Programming – Process-Based Communication Operations
David Monismith, CS599
Based upon notes from Introduction to Parallel Programming, Second Edition by Grama, Gupta, Karypis, and Kumar, and from CS550

Last time
We reviewed the Scan pattern. We will continue with OpenMP scheduling operations later in the course. For now, we are going to move on to MPI so we can make use of multi-process programming.

Interprocess Communication
Communication between processes is often necessary. It may occur sporadically from one process to another, or in well-defined patterns, some of which are used collectively (by all processes). Collective patterns are frequently used in parallel algorithms.

Send and Receive (Abstract Operations)
Point-to-point (i.e., process-to-process) communication occurs as send and receive operations.
send – send data from this process to a process identified by rank. Example: send(myMessage, rank)
receive – receive data in this process from the process identified by rank. Example: receive(receivedMessage, rank)

MPI Message Passing
Send and receive are implemented concretely in MPI by the MPI_Send and MPI_Recv functions. MPI, the Message Passing Interface, allows for interprocess communication (IPC) between running processes, even those using the same source code.

Using MPI
Processes use MPI by including #include "mpi.h" or #include <mpi.h>, depending upon the system and MPI stack. MPI is started in a program using MPI_Init(&argc, &argv); and ended with MPI_Finalize();. These functions act almost like curly braces that open and close the parallel portion of the program.
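As a minimal sketch (the file name is hypothetical), a complete MPI "hello world" program looks like this:

  /* hello.c – every process executes the code between Init and Finalize */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);       /* open the parallel portion */
      printf("Hello from an MPI process!\n");
      MPI_Finalize();               /* close the parallel portion */
      return 0;
  }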

Using MPI on LittleFe
Anything between the MPI_Init and MPI_Finalize statements runs in as many processes as are requested by "mpirun" at the command line. For example, on LittleFe:
  mpirun -np 12 -machinefile machines-openmpi prog1.exe
runs 12 processes using the executable code from prog1.exe.

Try running MPI Hello World on LittleFe1 or LittleFe2

Using MPI on Stampede
On Stampede, one specifies the number of tasks in a batch script using the -n option. Example:
  #SBATCH -n 32
specifies 32 tasks (MPI processes, one per CPU core). After all options have been specified, an MPI program is started in the script using ibrun. Example:
  ibrun prog1.exe
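Putting these together, a complete batch script might look like the following sketch (the job name, output file, and time limit are hypothetical placeholders, not values from the course):

  #!/bin/bash
  #SBATCH -J hello          # job name (hypothetical)
  #SBATCH -o hello.%j.out   # stdout file, %j = job id (hypothetical)
  #SBATCH -n 32             # 32 MPI tasks, one per CPU core
  #SBATCH -t 00:05:00       # wall-clock time limit (hypothetical)

  ibrun prog1.exe           # launch the MPI executable on all 32 tasks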

Identifying Processes in MPI
The MPI_Comm_rank and MPI_Comm_size functions get the rank (process identifier) and the number of processes (the value 12 after -np, and the value 32, on the previous slides). These were previously reviewed in class.
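As a brief reminder, a minimal sketch (assuming this fragment runs inside main between MPI_Init and MPI_Finalize, with mpi.h and stdio.h included):

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id, 0..size-1 */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes    */
  printf("I am process %d of %d\n", rank, size);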

MPI Message Passing
Messages are passed in MPI using MPI_Send and MPI_Recv.
MPI_Send – sends a message of a given size with a given type to a process with a specific rank.
MPI_Recv – receives a message of a maximum size with a given type from a process with a specific rank.
MPI_COMM_WORLD – the "world" (communicator) in which the processes exist. This is a constant.

Sending and Receiving Messages
MPI_Send and MPI_Recv have the following parameters:
MPI_Send(pointer to message, message size, message type, rank of the destination process, message tag or id, MPI_COMM_WORLD)
MPI_Recv(pointer to the receive variable, maximum receive size, message type, rank of the source process, message tag or id, MPI_COMM_WORLD, MPI_STATUS_IGNORE)
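For example, a minimal sketch (assuming the program is run with at least two processes, inside main between MPI_Init and MPI_Finalize) in which rank 0 sends one integer to rank 1 with tag 0:

  int rank, value = 42;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
      MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
      MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("Rank 1 received %d\n", value);
  }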

MPI Types
MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE
Many other types exist. These types are analogous to C primitive types. See the MPI Reference Manual for more examples.

Blocking I/O
MPI_Send and MPI_Recv are blocking I/O operations. In blocking I/O, when a message is sent, the sending process waits until it has acknowledgement that the message has been received (or, at minimum, that the send buffer may safely be reused) before it continues processing. Similarly, when a message is requested (a receive method/function is called), the program waits until the message has been received before continuing.

Blocking I/O Example

  Process 1                          Process 2
  MPI_Send       --1. send msg-->    MPI_Recv
  wait for ack                       wait for msg
  ack received   <--2. send ack--    ack receipt
  3b. continue                       3a. continue

Before we continue…
Try #1 from worksheet 6, and DON'T PANIC!!! Most functional MPI programs can be implemented with only 6 functions:
– MPI_Init
– MPI_Finalize
– MPI_Send
– MPI_Recv
– MPI_Comm_rank
– MPI_Comm_size

Why are Send and Receive Important?
MPI is not the only framework in which send and receive operations are used. Send and receive exist in Java, Android Services, iOS, Web Services (e.g., GET and POST), etc. It is likely that you have used these operations before and that you will use them again.

Collective Message Patterns
We will investigate commonly used collective message communication patterns. Collective means that the functions representing these patterns must be called in ALL processes. These include:
– Broadcast
– Reduction
– All-to-all
– Scatter
– Gather
– Scan
– And more
Communication patterns on simple interconnect networks will also be covered for linear arrays, meshes, and hypercubes.

One-to-All Broadcast
Send identical data from one process to all other processes or a subset thereof. Initially, only the root process has the data (size m). After the operation completes, there are p copies of the data, where p is the number of processes to which the data was broadcast. Implemented by MPI_Bcast.
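A minimal MPI_Bcast sketch (a fragment assumed to run between MPI_Init and MPI_Finalize): rank 0 broadcasts an array of four integers, and every process, root included, makes the same call.

  int data[4] = {0};
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) { data[0] = 1; data[1] = 2; data[2] = 3; data[3] = 4; }
  MPI_Bcast(data, 4, MPI_INT, 0, MPI_COMM_WORLD);  /* root is rank 0 */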

All-to-One Reduction
Each of p processes starts with a buffer B of size m. Data from all processes is combined using an associative operator such as +, *, min, max, etc., and accumulated at a single process into one buffer B_reduce of size m. Element i of B_reduce is the sum, product, minimum, maximum, etc., of the ith elements of all the original buffers B. This reduction is implemented by MPI_Reduce.
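For example, a minimal sketch of an element-wise sum reduction onto rank 0 (the buffer contents are placeholders; the fragment assumes it runs between MPI_Init and MPI_Finalize):

  int rank, B[3], B_reduce[3];
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  B[0] = B[1] = B[2] = rank;       /* placeholder local values */
  MPI_Reduce(B, B_reduce, 3, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  /* B_reduce at rank 0 now holds the element-wise sum of all B buffers */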

Broadcasting
On a ring or linear array, the naïve way to broadcast data is to send p – 1 messages from the source to the other p – 1 processes. Instead, after the first message is sent, recursive doubling can be used: the message can be sent from both the original source and the first destination to two additional processes, then from those four processes to four more, and so on. This reduces the number of steps required to broadcast to log(p). Note that on a linear array, the initial message must be sent the farthest distance; thereafter the distances are halved.

Mesh
Communication on a mesh can be regarded as an extension of the linear array. A 2D mesh of p nodes consists of sqrt(p) linear arrays. Therefore, the first sqrt(p) – 1 messages can be sent from the root to the other sqrt(p) – 1 nodes in its row (one linear array). From there, messages may be sent in parallel down the sqrt(p) columns (the remaining linear arrays). A similar process can be carried out with a hypercube of size 2^d, as it can be modeled as a d-dimensional mesh with 2 nodes per dimension. Therefore, on a hypercube, a broadcast may be carried out in d steps.

Hypercube Broadcast Algorithm

  one_to_all_bc(d, my_id, X)
    mask = 2^d – 1                 // Set all d bits of mask to 1
    for i = d – 1 downto 0         // Loop over the dimensions
      mask = mask XOR 2^i          // Set bit i of mask to 0
      if (my_id AND mask) == 0     // If lower i bits of my_id are 0
        if (my_id AND 2^i) == 0
          dest = my_id XOR 2^i
          send X to dest
        else
          source = my_id XOR 2^i
          recv X from source
        endif
      endif
    endfor
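A minimal C/MPI sketch of this algorithm, assuming exactly 2^d processes and a broadcast rooted at rank 0 (the function name and int payload are illustrative; assumes mpi.h is included):

  void one_to_all_bc(int d, int my_id, int *x, int count) {
      int mask = (1 << d) - 1;                /* set all d bits of mask to 1 */
      for (int i = d - 1; i >= 0; i--) {
          mask ^= (1 << i);                   /* clear bit i of mask */
          if ((my_id & mask) == 0) {          /* lower i bits of my_id are 0 */
              int partner = my_id ^ (1 << i); /* neighbor across dimension i */
              if ((my_id & (1 << i)) == 0)
                  MPI_Send(x, count, MPI_INT, partner, 0, MPI_COMM_WORLD);
              else
                  MPI_Recv(x, count, MPI_INT, partner, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
          }
      }
  }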

All-to-All Broadcast and Reduction
In an all-to-all broadcast, every one of the p processes simultaneously initiates a broadcast. Each process sends the same message of size m to every other process, but different processes may broadcast different messages. This is useful in matrix multiplication and matrix-vector multiplication. Naïve implementations may take p times as long as the one-to-all broadcast. It is possible to implement the all-to-all algorithm in such a manner as to take advantage of the interconnect network, so that all messages traversing the same path at the same time are concatenated. The dual operation of such a broadcast is an all-to-all reduction, in which every node is the destination of an all-to-one reduction. These operations are implemented via MPI_Allgather (all-to-all broadcast) and MPI_Reduce_scatter (all-to-all reduction).

Ring All-to-All Broadcast
Consider a ring topology. All links can be kept busy until the all-to-all broadcast is complete. An algorithm for such a broadcast follows below.

  all_to_all_ring_bc(myId, myMsg, p, result)
    left  = (myId – 1 + p) % p
    right = (myId + 1) % p
    result = myMsg
    msg = result
    for i = 1 to p – 1
      send msg to right
      recv msg from left
      result = concat(result, msg)
    endfor
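A minimal C/MPI sketch of this loop for one int per process (the function name is illustrative; assumes mpi.h is included and result has room for p ints). MPI_Sendrecv is used so the simultaneous blocking send and receive on the ring cannot deadlock:

  void all_to_all_ring_bc(int myMsg, int p, int myId, int *result) {
      int left  = (myId - 1 + p) % p;   /* ring neighbors */
      int right = (myId + 1) % p;
      int msg = myMsg;
      result[0] = myMsg;                /* this process's own value */
      for (int i = 1; i <= p - 1; i++) {
          int incoming;
          /* pass the current message right while receiving from the left */
          MPI_Sendrecv(&msg, 1, MPI_INT, right, 0,
                       &incoming, 1, MPI_INT, left, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          msg = incoming;
          result[i] = incoming;         /* the "concat" step */
      }
  }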

Ring All-to-All Reduce Algorithm
Here every node is the destination of a reduction for its own block (note that j advances with the loop index i, so each process forwards the partial sum for the correct block at each step):

  all_to_all_ring_reduce(myId, myMsg, p, result)
    left  = (myId – 1 + p) % p
    right = (myId + 1) % p
    recvVal = 0
    for i = 1 to p – 1
      j = (myId + i) % p          // block whose partial sum is forwarded
      temp = myMsg[j] + recvVal
      send temp to left
      recv recvVal from right
    endfor
    result = myMsg[myId] + recvVal
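A minimal C/MPI sketch of this algorithm with one int per block (the function name is illustrative; assumes mpi.h is included and msg holds p ints), again using MPI_Sendrecv to avoid deadlock:

  int all_to_all_ring_reduce_sum(const int *msg, int p, int myId) {
      int left  = (myId - 1 + p) % p;
      int right = (myId + 1) % p;
      int recvVal = 0;
      for (int i = 1; i <= p - 1; i++) {
          int j = (myId + i) % p;       /* block whose partial sum we forward */
          int temp = msg[j] + recvVal;
          MPI_Sendrecv(&temp, 1, MPI_INT, left, 0,
                       &recvVal, 1, MPI_INT, right, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      }
      return msg[myId] + recvVal;       /* sum of element myId over all ranks */
  }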

Mesh and Hypercube Implementations
Mesh and hypercube implementations can be constructed by expanding upon the linear array and ring algorithms: on a mesh, the operation is carried out in two phases, first along the rows and then along the columns. The hypercube algorithm is a generalization of the mesh algorithm to log(p) dimensions. It is important to realize that such implementations are used to take advantage of the existing interconnect networks on large-scale systems.

Scatter and Gather
Scatter and gather are personalized operations.
Scatter – a single node sends a unique message of size m to every other node (one-to-all personalized communication).
Gather – a single node collects a unique message from each node.
Implemented using MPI_Scatter and MPI_Gather, respectively.
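A minimal sketch combining the two (assumes the program is run with exactly 4 processes and the fragment runs between MPI_Init and MPI_Finalize; the data values are placeholders):

  enum { P = 4 };                       /* assumed process count */
  int sendbuf[P] = {10, 20, 30, 40};    /* significant only at the root */
  int item, results[P];
  MPI_Scatter(sendbuf, 1, MPI_INT, &item, 1, MPI_INT, 0, MPI_COMM_WORLD);
  item *= 2;                            /* some local work on the piece */
  MPI_Gather(&item, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);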

MPI Operations
One-to-all broadcast – MPI_Bcast
All-to-one reduction – MPI_Reduce
All-to-all broadcast – MPI_Allgather
All-to-all reduction – MPI_Reduce_scatter
All-reduce – MPI_Allreduce
Gather – MPI_Gather, MPI_Gatherv
Scatter – MPI_Scatter, MPI_Scatterv
All-to-all personalized – MPI_Alltoall
Scan – MPI_Scan
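Since we reviewed the Scan pattern last time, a minimal sketch of its collective MPI form: an inclusive prefix sum, where process r receives the sum of the values from ranks 0 through r (the contributed value is a placeholder; assumes the fragment runs between MPI_Init and MPI_Finalize):

  int rank, prefix;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int myVal = rank + 1;                 /* placeholder contribution */
  MPI_Scan(&myVal, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  /* prefix at rank r is now (1 + 2 + ... + (r + 1)) */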

Next Time: All-to-All Personalized Communication
Also known as total exchange. Used in FFT, matrix transpose, sample sort, and parallel DB join operations. Different algorithms exist for:
– Linear Array
– Mesh
– Hypercube