Lecture 14: Inter-process Communication


CS 179: GPU Programming

The Problem
What if we want to use GPUs across a distributed system?
(Image: GPU cluster, CSIRO)

Distributed System
A collection of computers
Each computer has its own local memory!
Communication over a network
Communication suddenly becomes harder! (and slower!)
GPUs can’t be trivially used between computers

We need some way to communicate between processes…

Message Passing Interface (MPI)
A standard for message passing
Multiple implementations exist
Standard functions that allow easy communication of data between processes
Works on non-networked systems too! (equivalent to memory copying on the local system)

MPI Functions
There are seven basic functions:
MPI_Init         initialize the MPI environment
MPI_Finalize     terminate the MPI environment
MPI_Comm_size    how many processes we have running
MPI_Comm_rank    the ID of our process
MPI_Isend        send data (non-blocking)
MPI_Irecv        receive data (non-blocking)
MPI_Wait         wait for a request to complete
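As a minimal sketch (not from the slides; the program name and output format are illustrative), the environment-management calls fit together like this:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int size, rank;

    MPI_Init(&argc, &argv);                 /* initialize the MPI environment   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID, 0 .. size-1   */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* terminate the MPI environment    */
    return 0;
}

Typically launched with something like mpiexec -n 4 ./hello (the exact launcher depends on the MPI implementation).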

MPI Functions
Some additional functions:
MPI_Barrier      wait for all processes to reach a certain point
MPI_Bcast        send data to all other processes
MPI_Reduce       receive data from all processes and reduce to a value
MPI_Send         send data (blocking)
MPI_Recv         receive data (blocking)
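As a hedged illustration (not from the slides; the variable n and the value 1024 are made up), broadcasting a value from process 0 and then synchronizing:

int n = 0;
if (rank == 0)
    n = 1024;                                     /* only the root knows n initially  */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);     /* now every process has n == 1024  */
MPI_Barrier(MPI_COMM_WORLD);                      /* wait for everyone to reach here  */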

The workings of MPI
Big idea:
We have some process ID
We know how many processes there are
A send request/receive request pair is necessary to communicate data!

MPI Functions – Init
int MPI_Init( int *argc, char ***argv )
Takes in pointers to the command-line parameters (more on this later)

MPI Functions – Size, Rank
int MPI_Comm_size( MPI_Comm comm, int *size )
int MPI_Comm_rank( MPI_Comm comm, int *rank )
Take in a “communicator”
For us, this will be MPI_COMM_WORLD – essentially, our communication group contains all processes
Take in a pointer to where we will store our size (or rank)

MPI Functions – Send
int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
Non-blocking send
The program can proceed without waiting for the send to complete!
(Can use MPI_Send for a “blocking” send)

MPI Functions – Send
int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
Takes in…
A pointer to the data we send (buf)
Number of elements to send (count)
Type of data (special MPI designations: MPI_INT, MPI_FLOAT, etc.)
Destination process ID (dest)
…

MPI Functions – Send
int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
Takes in… (continued)
A “tag” – a nonnegative integer (values up to at least 32767 are guaranteed by the standard)
Identifies our operation – used to tell messages apart when many are sent simultaneously
MPI_ANY_TAG (on the receive side) matches all possible incoming messages
A “communicator” (MPI_COMM_WORLD)
A “request” object
Holds information about our operation
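A small hedged sketch with the arguments spelled out (the payload, destination, and tag values here are illustrative):

int value = 42;                 /* illustrative payload             */
MPI_Request request;
MPI_Isend(&value,               /* pointer to the data we send      */
          1,                    /* number of elements               */
          MPI_INT,              /* type of data                     */
          1,                    /* destination process ID (rank 1)  */
          1234,                 /* tag                              */
          MPI_COMM_WORLD,       /* communicator                     */
          &request);            /* request handle, used by MPI_Wait */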

MPI Functions – Recv
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
Non-blocking receive (similar idea)
(Can use MPI_Recv for a “blocking” receive)

MPI Functions – Recv
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
Takes in…
A buffer (pointer to where received data is placed)
Number of elements of incoming data (count)
Type of data (special MPI designations: MPI_INT, MPI_FLOAT, etc.)
Sender’s process ID (source)
…

MPI Functions – Recv
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
Takes in… (continued)
A “tag”
Should be the same tag as the one used to send the message! (assuming MPI_ANY_TAG is not in use)
A “communicator” (MPI_COMM_WORLD)
A “request” object
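The matching hedged receive for the send sketched above (again with illustrative values; MPI_ANY_SOURCE and MPI_ANY_TAG are mentioned only to show the wildcards):

int value;
MPI_Request request;
MPI_Irecv(&value,               /* where the received data is placed */
          1,                    /* number of elements expected       */
          MPI_INT,              /* type of data                      */
          0,                    /* sender's process ID (rank 0)      */
          1234,                 /* must match the sender's tag       */
          MPI_COMM_WORLD,       /* communicator                      */
          &request);            /* request handle, used by MPI_Wait  */
/* MPI_ANY_SOURCE / MPI_ANY_TAG could replace 0 / 1234 to accept a
   message from any sender or with any tag. */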

Blocking vs. Non-blocking
MPI_Isend and MPI_Irecv are non-blocking communications
Calling these functions returns immediately
The operation may not be finished!
Should use MPI_Wait to make sure operations are completed
Special “request” objects are used for tracking status
MPI_Send and MPI_Recv are blocking communications
These functions don’t return until the operation is complete
Can cause deadlock! (we won’t focus on these)
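A hedged illustration of that deadlock risk (not from the slides; big_out, big_in, N, and other are illustrative names): if two processes both call a blocking send to each other first, neither may reach its receive once the messages exceed MPI's internal buffering. Posting the non-blocking receive first avoids this:

/* Risky on both ranks (may deadlock for large N):
   MPI_Send(big_out, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
   MPI_Recv(big_in,  N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);   */

/* Safer non-blocking exchange: */
MPI_Request send_req, recv_req;
MPI_Status  status;
MPI_Irecv(big_in,  N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &recv_req);
MPI_Isend(big_out, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &send_req);
MPI_Wait(&recv_req, &status);
MPI_Wait(&send_req, MPI_STATUS_IGNORE);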

MPI Functions – Wait
int MPI_Wait(MPI_Request *request, MPI_Status *status)
Takes in…
A “request” object corresponding to a previous operation
Indicates what we’re waiting on
A “status” object
Basically, information about incoming data
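A hedged sketch of waiting on a previously posted receive and inspecting the status (MPI_Get_count is standard MPI but is not covered in the slides; request here refers to an earlier MPI_Irecv):

MPI_Status status;
MPI_Wait(&request, &status);                   /* block until the receive completes  */

int received;
MPI_Get_count(&status, MPI_INT, &received);    /* how many MPI_INTs actually arrived */
printf("got %d ints from rank %d (tag %d)\n",
       received, status.MPI_SOURCE, status.MPI_TAG);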

MPI Functions – Reduce
int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
Takes in…
A “send buffer” (data obtained from every process)
A “receive buffer” (where our final result will be placed)
Number of elements in the send buffer
Can reduce element-wise: array -> array
Type of data (MPI label, as before)
…

MPI Functions – Reduce
int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
Takes in… (continued)
Reducing operation (special MPI labels, e.g. MPI_SUM, MPI_MIN)
ID of the process that obtains the result (root)
MPI communicator (as before)
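A hedged sketch of summing one partial result per process into a total on process 0 (local_sum and global_sum are illustrative names; how local_sum is computed is left out):

float local_sum = 0.0f;    /* assume each process has computed its own partial sum */
float global_sum = 0.0f;

/* Every process contributes local_sum; only the root (rank 0) gets the total. */
MPI_Reduce(&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

if (rank == 0)
    printf("total = %f\n", global_sum);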

MPI Example
Two processes: sends a number from process 0 to process 1
Note: both processes are running this code!

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, numprocs;
    MPI_Status status;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int tag = 1234;
    int source = 0;
    int destination = 1;
    int count = 1;
    int send_buffer;
    int recv_buffer;

    if (rank == source) {
        send_buffer = 5678;
        MPI_Isend(&send_buffer, count, MPI_INT, destination, tag,
                  MPI_COMM_WORLD, &request);
    }
    if (rank == destination) {
        MPI_Irecv(&recv_buffer, count, MPI_INT, source, tag,
                  MPI_COMM_WORLD, &request);
    }

    MPI_Wait(&request, &status);    /* each rank waits on its own request */

    if (rank == source)
        printf("processor %d sent %d\n", rank, send_buffer);
    if (rank == destination)
        printf("processor %d got %d\n", rank, recv_buffer);

    MPI_Finalize();
    return 0;
}

Solving Problems
Goal: to use GPUs across multiple computers!
Two levels of division:
Divide work between processes (using MPI)
Divide work within a GPU (using CUDA, as usual)
Important note: single-program, multiple-data (SPMD) for our purposes
i.e. running multiple copies of the same program!

Multiple Computers, Multiple GPUs
Can combine MPI, CUDA streams/device selection, and single-GPU CUDA!
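A hedged sketch of the device-selection piece (not from the slides): one common approach is to map each MPI rank on a node to a GPU, assuming one process per GPU and ranks packed node by node:

int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);     /* GPUs visible on this node                    */
cudaSetDevice(rank % deviceCount);    /* crude rank-to-GPU mapping; assumes ranks on
                                         the same node are contiguous                 */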

MPI/CUDA Example
Suppose we have our “polynomial problem” again
Suppose we have n processes for n computers, each with one GPU
(Suppose the array of data is obtained through process 0)
Every process will run the following:

MPI/CUDA Example

MPI_Init
declare rank, numberOfProcesses
set rank with MPI_Comm_rank
set numberOfProcesses with MPI_Comm_size

if process rank is 0:
    array <- (obtain our raw data)
    for all processes idx = 0 through numberOfProcesses - 1:
        Send fragment idx of the array to process ID idx using MPI_Isend
        (this fragment will have size (size of array) / numberOfProcesses)

declare local_data
Read into local_data for our process using MPI_Irecv
MPI_Wait for our process's receive request to finish

local_sum <- ... (process local_data with CUDA and obtain a sum)

declare global_sum
set global_sum on process 0 with MPI_Reduce
(use global_sum somehow)

MPI_Finalize

(a C sketch of the same flow follows below)
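A hedged C sketch of this flow, under the assumptions above; N (the total array length), obtainRawData, and computeSumOnGPU are hypothetical placeholders not defined in the lecture, and N is assumed to divide evenly among processes:

int fragment = N / numberOfProcesses;                 /* assumes N divides evenly        */
float *local_data = (float *)malloc(fragment * sizeof(float));
float *array = NULL;
MPI_Request recv_req, *send_reqs = NULL;
MPI_Status status;

if (rank == 0) {
    array = obtainRawData(N);                         /* hypothetical data source        */
    send_reqs = (MPI_Request *)malloc(numberOfProcesses * sizeof(MPI_Request));
    for (int idx = 0; idx < numberOfProcesses; idx++)
        MPI_Isend(array + idx * fragment, fragment, MPI_FLOAT, idx, 0,
                  MPI_COMM_WORLD, &send_reqs[idx]);   /* rank 0 also sends to itself     */
}

/* Every process (including rank 0) receives its own fragment from rank 0. */
MPI_Irecv(local_data, fragment, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &recv_req);
MPI_Wait(&recv_req, &status);
if (rank == 0)
    for (int idx = 0; idx < numberOfProcesses; idx++)
        MPI_Wait(&send_reqs[idx], MPI_STATUS_IGNORE); /* complete the outstanding sends  */

float local_sum = computeSumOnGPU(local_data, fragment);  /* hypothetical CUDA step      */

float global_sum = 0.0f;
MPI_Reduce(&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("global sum = %f\n", global_sum);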

MPI/CUDA – Wave Equation
Recall our calculation:

$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$

MPI/CUDA – Wave Equation
Big idea: divide our data array between n processes!

MPI/CUDA – Wave Equation
In reality: we have three regions of data at a time (old, current, new)

MPI/CUDA – Wave Equation
The calculation for timestep t+1 uses the following data:

$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$

(figure: the three rows of data at t+1, t, and t-1)

MPI/CUDA – Wave Equation
Problem if we’re at the boundary of a process!

$y_{x,t+1} = 2\,y_{x,t} - y_{x,t-1} + \left(\frac{c\,\Delta t}{\Delta x}\right)^2 \left(y_{x+1,t} - 2\,y_{x,t} + y_{x-1,t}\right)$

Where do we get $y_{x-1,t}$? (It’s outside our process!)
(figure: rows t+1, t, t-1 with x at the left edge of this process’s fragment)

Wave Equation – Simple Solution
After every time-step, each process gives its leftmost and rightmost piece of “current” data to its neighbor processes!
(figure: the array split across Proc0 Proc1 Proc2 Proc3 Proc4)

Wave Equation – Simple Solution
Pieces of data to communicate:
(figure: the boundary values exchanged between neighboring processes, Proc0 through Proc4)

Wave Equation – Simple Solution
Can do this with MPI_Irecv, MPI_Isend, MPI_Wait (see the sketch after this list). Suppose our process has rank r:
If we’re not the rightmost process:
    Send data to process r+1
    Receive data from process r+1
If we’re not the leftmost process:
    Send data to process r-1
    Receive data from process r-1
Wait on the requests
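A hedged sketch of that exchange (current, local_n, left_ghost, right_ghost, rank, and numprocs are illustrative names; current holds this process's fragment of the “current” timestep):

MPI_Request reqs[4];
int nreqs = 0;

if (rank < numprocs - 1) {                            /* not the rightmost process */
    MPI_Isend(&current[local_n - 1], 1, MPI_FLOAT, rank + 1, 0,
              MPI_COMM_WORLD, &reqs[nreqs++]);
    MPI_Irecv(&right_ghost, 1, MPI_FLOAT, rank + 1, 0,
              MPI_COMM_WORLD, &reqs[nreqs++]);
}
if (rank > 0) {                                       /* not the leftmost process  */
    MPI_Isend(&current[0], 1, MPI_FLOAT, rank - 1, 0,
              MPI_COMM_WORLD, &reqs[nreqs++]);
    MPI_Irecv(&left_ghost, 1, MPI_FLOAT, rank - 1, 0,
              MPI_COMM_WORLD, &reqs[nreqs++]);
}
for (int i = 0; i < nreqs; i++)                       /* wait on all the requests  */
    MPI_Wait(&reqs[i], MPI_STATUS_IGNORE);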

Wave Equation – Simple Solution
Boundary conditions:
Use MPI_Comm_rank and MPI_Comm_size
The rank 0 process will set the leftmost condition
The rank (size-1) process will set the rightmost condition

Simple Solution – Problems
Communication can be expensive!
It is expensive to communicate every timestep just to send one value!
Better solution: send some m values every m timesteps!