1 12c.1 Collective Communication in MPI UNC-Wilmington, C. Ferner, 2008 Nov 4, 2008

2 12c.2 Barrier

A barrier is a way to synchronize all (or a subset) of the processors. When processors reach the MPI_Barrier(), they block until all processors have reached the same barrier. Every processor in the communicator must call the barrier function, or else you have a deadlock.

Syntax: MPI_Barrier(MPI_COMM_WORLD);

3 12c.3 Barrier Example:

MPI_Barrier(MPI_COMM_WORLD);
if (mypid == 0) {
    gettimeofday(&tv1, NULL);
}

... // Do some work

MPI_Barrier(MPI_COMM_WORLD);
if (mypid == 0) {
    gettimeofday(&tv2, NULL);
}
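A more complete version of the timing pattern above, as a minimal sketch (the work section is left as a placeholder; mypid, tv1, and tv2 follow the naming used in the slides):

#include <stdio.h>
#include <sys/time.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int mypid;
    struct timeval tv1, tv2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    MPI_Barrier(MPI_COMM_WORLD);        /* all processes start the timed region together */
    if (mypid == 0) gettimeofday(&tv1, NULL);

    /* ... do some work here ... */

    MPI_Barrier(MPI_COMM_WORLD);        /* all processes have finished before the clock stops */
    if (mypid == 0) {
        gettimeofday(&tv2, NULL);
        printf("Elapsed: %f seconds\n",
               (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec) / 1000000.0);
    }

    MPI_Finalize();
    return 0;
}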

4 12c.4 Broadcast

A broadcast is when one processor needs to send the same information to all (or a subset) of the other processors.

Syntax: MPI_Bcast (buffer, count, datatype, root, MPI_COMM_WORLD)
buffer, count, and datatype are the same as with MPI_Send()
root is the rank of the process initiating the broadcast

5 12c.5 Broadcast Example:

int N = ___;
float b = ____;
float a[N];

MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast (&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Bcast (a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);

6 12c.6 Broadcast

All processors participating in the broadcast (whether they are the source or a destination) must call the broadcast function with the same parameters, or else it won't work.

The runtime of a broadcast is O(log(p)), where p is the number of processors, rather than the O(p) it would take if the root sent the data to each processor in turn.
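Putting the broadcast pieces together, a minimal runnable sketch (the concrete values chosen for N, b, and the contents of a are arbitrary stand-ins for the blanks in the example):

#include <stdio.h>
#include <mpi.h>

#define MAXN 100                       /* arbitrary upper bound for the array */

int main(int argc, char *argv[]) {
    int mypid, N = 0, i;
    float b = 0.0f, a[MAXN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    if (mypid == 0) {                  /* only the root knows the values initially */
        N = MAXN;
        b = 3.14f;
        for (i = 0; i < N; i++) a[i] = (float) i;
    }

    /* every process, root and non-root alike, makes the same three calls */
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Bcast(a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    printf("%d: N = %d, b = %f, a[N-1] = %f\n", mypid, N, b, a[N-1]);

    MPI_Finalize();
    return 0;
}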

7 12c.7 Broadcast

[Figure: broadcast tree on 8 processors. Step 1: 0 sends to 4. Step 2: 0 sends to 2 and 4 sends to 6. Step 3: 0 sends to 1, 2 sends to 3, 4 sends to 5, and 6 sends to 7. The legend distinguishes communication steps from non-communication steps.]
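The tree in the figure can be spelled out with point-to-point calls. The following is an illustrative sketch only (in practice MPI_Bcast should be used); it assumes the number of processors p is a power of two and the root is process 0:

#include <mpi.h>

/* Tree broadcast of a single int.  At each step the set of processes
 * holding the data doubles, so there are log2(p) communication steps. */
void tree_bcast(int *buf, int mypid, int p) {
    int mask;
    for (mask = p / 2; mask >= 1; mask /= 2) {
        if (mypid % (2 * mask) == 0) {
            /* already has the data: forward it to the partner mask ranks away */
            MPI_Send(buf, 1, MPI_INT, mypid + mask, 0, MPI_COMM_WORLD);
        } else if (mypid % (2 * mask) == mask) {
            /* receives the data at this step */
            MPI_Recv(buf, 1, MPI_INT, mypid - mask, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
}

With p = 8 this reproduces the pattern in the figure: 0 sends to 4, then 0 and 4 send to 2 and 6, then all even ranks send to their odd neighbors.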

8 12c.8 Reduction

A reduction is where an array of values is reduced to a single value by applying a binary (usually commutative) operator.

9 12c.9 Reduction

[Figure: reduction tree across processors P0 through P7. Each processor starts with one value; at each step, pairs of values are combined with + until P0 holds the sum of all eight values. The legend distinguishes communication steps from non-communication steps.]

10 12c.10 Reduction

Syntax: MPI_Reduce(sendbuf, recvbuf, count, MPI_Datatype, MPI_Op, root, MPI_Comm)
sendbuf, count, datatype, and MPI_Comm are the same as with MPI_Send() and MPI_Bcast()
root is the rank of the process that will possess the final value
MPI_Op is one of the predefined reduction operators, such as MPI_SUM, MPI_PROD, MPI_MAX, or MPI_MIN

11 12c.11 Reduction Example:

int x, y;
// Each processor has a different value for x
MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

The root process (0) has the sum of all x's in the variable y.

12 12c.12 Reduction Example:

int x[N], y[N];
// Each processor has different values in the array x
MPI_Reduce(x, y, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

The root process (0) has the sum of all x[0]'s in y[0], the sum of all x[1]'s in y[1], ...
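As a self-contained illustration of the scalar example, a minimal sketch (using each process's rank as its private value of x, which is an arbitrary choice made here for demonstration):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int mypid, nprocs, x, y = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    x = mypid;                       /* each process contributes a different value */

    /* every process calls MPI_Reduce; only the root receives the result in y */
    MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (mypid == 0)
        printf("Sum of ranks 0..%d is %d\n", nprocs - 1, y);

    MPI_Finalize();
    return 0;
}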

13 12c.13 Reduction

All processors participating in the reduction (whether they are a source or the destination) must call the Reduce function with the same parameters, or else it won't work.

The runtime of a reduction is O(log(p)), where p is the number of processors, rather than the O(p) it would take if the root received and combined each processor's value in turn.

14 12c.14 Reduction

[Figure: the same 8-processor tree as for broadcast, traversed in reverse: values are combined pairwise up the tree in log2(8) = 3 communication steps until processor 0 holds the result. The legend distinguishes communication steps from non-communication steps.]

15 12c.15 Scatter/Gather

Scatter sends parts of an array from the root to each processor.
Syntax: MPI_Scatter(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)

Gather brings together parts of an array from the different processors to the root.
Syntax: MPI_Gather(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)

16 12c.16 Scatter

[Figure: an array held by P0 is divided into equal blocks, and each block is sent to one of P0 through P3.]

17 12c.17 Gather

[Figure: P0 through P3 each hold one block; the blocks are collected into a single array on P0.]

18 12c.18 Scatter/Gather

float a[N], localA[N];
...
if (mypid == 0) {
    printf ("%d: a = ", mypid);
    for (i = 0; i < N; i++)
        printf ("%f ", a[i]);
    printf ("\n");
}

19 12c.19 Scatter/Gather

blksz = (int) ceil (((float) N) / P);
MPI_Scatter(a, blksz, MPI_FLOAT, &localA[0], blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);
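One caveat about the blksz computation above: when N is not evenly divisible by P, blocks of size ceil(N/P) cover more than N elements, so the scatter would read past the end of a. A sketch of one common way to handle uneven sizes, reusing the slides' variable names but using MPI_Scatterv (this fragment is an addition for illustration, not part of the slides' example):

/* sendcounts[i] and displs[i] describe the block that process i receives. */
int *sendcounts = malloc(P * sizeof(int));
int *displs     = malloc(P * sizeof(int));
int offset = 0;
for (i = 0; i < P; i++) {
    sendcounts[i] = N / P + (i < N % P ? 1 : 0);   /* spread the remainder */
    displs[i] = offset;
    offset += sendcounts[i];
}
MPI_Scatterv(a, sendcounts, displs, MPI_FLOAT,
             localA, sendcounts[mypid], MPI_FLOAT, 0, MPI_COMM_WORLD);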

20 12c.20 Scatter/Gather

for (i = 0; i < blksz; i++)
    printf ("%d: localA = %.2f\n", mypid, localA[i]);

for (i = 0; i < blksz; i++)
    localA[i] += mypid;

for (i = 0; i < blksz; i++)
    printf ("%d: new localA = %.2f\n", mypid, localA[i]);

21 12c.21 Scatter/Gather

MPI_Gather(&localA[0], blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

if (mypid == 0) {
    printf ("%d: A = ", mypid);
    for (i = 0; i < N; i++)
        printf ("%f ", a[i]);
    printf ("\n");
}

22 12c.22 Scatter/Gather

$ mpirun -nolocal -np 3 mpiGatherScatter 6
0: A = 84.019997 39.439999 78.309998 79.839996 91.160004 19.760000
0: localA = 84.02
0: localA = 39.44
0: new localA = 84.02
0: new localA = 39.44

23 12c.23 Scatter/Gather

1: localA = 78.31
1: localA = 79.84
1: new localA = 79.31
1: new localA = 80.84
2: localA = 91.16
2: localA = 19.76
2: new localA = 93.16
2: new localA = 21.76
0: A = 84.019997 39.439999 79.309998 80.839996 93.160004 21.760000
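Collecting the fragments from slides 18-21, a self-contained sketch of the whole program (the random initialization of a and the command-line handling of N are assumptions made here; it also assumes N is evenly divisible by the number of processes):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int mypid, P, N, blksz, i;
    float *a = NULL, *localA;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    N = (argc > 1) ? atoi(argv[1]) : 6;   /* array size, as in the sample run */
    blksz = N / P;                        /* assumes N is divisible by P */
    localA = malloc(blksz * sizeof(float));

    if (mypid == 0) {                     /* only the root holds the full array */
        a = malloc(N * sizeof(float));
        for (i = 0; i < N; i++)
            a[i] = 100.0f * rand() / RAND_MAX;
        printf("%d: a = ", mypid);
        for (i = 0; i < N; i++) printf("%f ", a[i]);
        printf("\n");
    }

    MPI_Scatter(a, blksz, MPI_FLOAT, localA, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (i = 0; i < blksz; i++)           /* each process works only on its block */
        localA[i] += mypid;

    MPI_Gather(localA, blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (mypid == 0) {
        printf("%d: A = ", mypid);
        for (i = 0; i < N; i++) printf("%f ", a[i]);
        printf("\n");
        free(a);
    }
    free(localA);
    MPI_Finalize();
    return 0;
}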

24 12c.24 For further reading

Man pages on MPI Routines: http://www-unix.mcs.anl.gov/mpi/www/www3/

Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, Upper Saddle River, NJ, 1999, ISBN 0-13-671710-1.

Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1997, ISBN 1-55860-339-5.

