12c.1 Collective Communication in MPI UNC-Wilmington, C. Ferner, 2008 Nov 4, 2008

12c.2 Barrier
A barrier is a way to synchronize all (or a subset) of the processors. When a processor reaches MPI_Barrier(), it blocks until all processors in the communicator have reached the same barrier. All processors must call the barrier function, or else you have a deadlock.
Syntax: MPI_Barrier(MPI_COMM_WORLD);

12c.3 Barrier
Example:

MPI_Barrier(MPI_COMM_WORLD);
if (mypid == 0) {
    gettimeofday(&tv1, NULL);
}

... // Do some work

MPI_Barrier(MPI_COMM_WORLD);
if (mypid == 0) {
    gettimeofday(&tv2, NULL);
}
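
A complete, runnable sketch of this timing pattern, assuming the standard MPI C bindings; the usleep call stands in for real work and is an illustrative choice, not from the slides:

#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int mypid;
    struct timeval tv1, tv2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    MPI_Barrier(MPI_COMM_WORLD);              /* everyone starts together */
    if (mypid == 0) gettimeofday(&tv1, NULL);

    usleep(100000 * (mypid + 1));             /* stand-in for real work */

    MPI_Barrier(MPI_COMM_WORLD);              /* wait for the slowest process */
    if (mypid == 0) {
        gettimeofday(&tv2, NULL);
        printf("Elapsed: %.6f seconds\n",
               (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec) / 1e6);
    }

    MPI_Finalize();
    return 0;
}

Compile with mpicc and run with, e.g., mpirun -np 4.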

12c.4 Broadcast
A broadcast is when one processor needs to send the same information to all (or a subset) of the other processors.
Syntax: MPI_Bcast(buffer, count, datatype, root, MPI_COMM_WORLD)
buffer, count, and datatype are the same as with MPI_Send().
root is the rank of the process initiating the broadcast.

12c.5 Broadcast
Example:

int N = ___;
float b = ____;
float a[N];

MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Bcast(a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);

12c.6 Broadcast
All processors participating in the broadcast (whether they are the source or a destination) must call the broadcast function with the same parameters, or else it won't work.
The runtime of a broadcast is O(log(p)), where p is the number of processors, instead of the O(p) it would be if the root sent the data to each processor in turn.
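
A minimal runnable sketch illustrating this rule; the value 3.14 and the variable name b are placeholders, not from the slides. Every process makes the identical MPI_Bcast call, and the root argument alone determines who supplies the data:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int mypid;
    float b = 0.0f;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    if (mypid == 0)
        b = 3.14f;                  /* only the root has the value initially */

    /* all processes call MPI_Bcast with the same arguments */
    MPI_Bcast(&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);

    printf("Process %d: b = %f\n", mypid, b);   /* every process now prints 3.140000 */

    MPI_Finalize();
    return 0;
}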

12c.7 Broadcast
[Figure: broadcast tree rooted at process 0; legend distinguishes communication steps from non-communication steps.]

12c.8 Reduction
A reduction is where an array of values is reduced to a single value by applying a binary (usually commutative) operator.

12c.9 Reduction
[Figure: reduction tree over processes P0 through P7, combining values pairwise down to process P0; legend distinguishes communication steps from non-communication steps.]

12c.10 Reduction
Syntax: MPI_Reduce(sendbuf, recvbuf, count, MPI_Datatype, MPI_Op, root, MPI_Comm)
sendbuf, count, datatype, and MPI_Comm are the same as with MPI_Send() and MPI_Bcast().
root is the rank of the process which will possess the final value.
MPI_Op is one of the predefined reduction operators, such as MPI_SUM, MPI_PROD, MPI_MAX, or MPI_MIN.

12c.11 Reduction
Example:

int x, y;
// Each processor has a different value for x
MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

The root process (0) has the sum of all x's in the variable y.

12c.12 Reduction
Example:

int x[N], y[N];
// Each processor has different values in the array x
MPI_Reduce(x, y, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

The root process (0) has the sum of all x[0]'s in y[0], the sum of all x[1]'s in y[1], ...
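
A self-contained sketch of the scalar example above, with a check of the result on the root; having each process contribute its own rank as the value of x is an assumption for illustration, not from the slides:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int mypid, nprocs, x, y;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    x = mypid;                      /* each process has a different value for x */

    /* sum the x's across all processes; only the root (0) receives the result */
    MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (mypid == 0)
        printf("Sum of ranks 0..%d is %d\n", nprocs - 1, y);

    MPI_Finalize();
    return 0;
}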

12c.13 Reduction
All processors participating in the reduction (whether they are the source or a destination) must call MPI_Reduce() with the same parameters, or else it won't work.
The runtime of a reduction is O(log(p)) instead of O(p), where p is the number of processors.

12c.14 Reduction
[Figure: reduction tree rooted at process 0; legend distinguishes communication steps from non-communication steps.]

12c.15 Scatter/Gather
Scatter sends parts of an array from the root to each processor.
Syntax: MPI_Scatter(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)
Gather brings together parts of an array from the different processors to the root.
Syntax: MPI_Gather(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)

12c.16 Scatter
[Figure: the root's array on P0 is divided into equal blocks, and one block is sent to each of P0 through P3.]

12c.17 Gather
[Figure: the blocks held by P0 through P3 are collected back into a single array on the root, P0.]

12c.18 Scatter/Gather

float a[N], localA[N];
...
if (mypid == 0) {
    printf("%d: a = ", mypid);
    for (i = 0; i < N; i++)
        printf("%f ", a[i]);
    printf("\n");
}

12c.19 Scatter/Gather

blksz = (int) ceil(((float) N) / P);
MPI_Scatter(a, blksz, MPI_FLOAT, &localA[0], blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

12c.20 Scatter/Gather

for (i = 0; i < blksz; i++)
    printf("%d: localA = %.2f\n", mypid, localA[i]);

for (i = 0; i < blksz; i++)
    localA[i] += mypid;

for (i = 0; i < blksz; i++)
    printf("%d: new localA = %.2f\n", mypid, localA[i]);

12c.21 Scatter/Gather

MPI_Gather(&localA[0], blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

if (mypid == 0) {
    printf("%d: A = ", mypid);
    for (i = 0; i < N; i++)
        printf("%f ", a[i]);
    printf("\n");
}
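
Putting slides 12c.18 through 12c.21 together, here is a self-contained sketch of the whole program, assuming N is given on the command line and is a multiple of the number of processes; the initialization a[i] = i * 1.1 is an illustrative assumption, not taken from the slides:

#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int mypid, P, i, N, blksz;
    float *a = NULL, *localA;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    N = (argc > 1) ? atoi(argv[1]) : 6;     /* array size */
    blksz = (int) ceil(((float) N) / P);    /* block per process, as on slide 12c.19 */

    localA = malloc(blksz * sizeof(float));
    if (mypid == 0) {                       /* only the root initializes the full array */
        a = malloc(N * sizeof(float));
        for (i = 0; i < N; i++)
            a[i] = i * 1.1f;
    }

    MPI_Scatter(a, blksz, MPI_FLOAT, localA, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (i = 0; i < blksz; i++)             /* each process adds its rank to its block */
        localA[i] += mypid;

    MPI_Gather(localA, blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (mypid == 0) {
        printf("%d: A = ", mypid);
        for (i = 0; i < N; i++)
            printf("%.2f ", a[i]);
        printf("\n");
        free(a);
    }
    free(localA);

    MPI_Finalize();
    return 0;
}

Compile with mpicc (linking with -lm for ceil) and run as on the next slide, e.g. mpirun -np 3 mpiGatherScatter 6.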

12c.22 Scatter/Gather
Sample run:

$ mpirun -nolocal -np 3 mpiGatherScatter 6

[Output: rank 0 prints the initial array, and each rank prints its localA values before and after adding its rank; numeric values omitted.]

12c.23 Scatter/Gather
[Output, continued: the remaining ranks' localA and new localA lines, followed by rank 0 printing the gathered array A; numeric values omitted.]

12c.24 For further reading
Man pages on MPI routines: – www3/
Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, Upper Saddle River, NJ, 1999.
Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1997.