12c.1 Collective Communication in MPI, UNC-Wilmington, C. Ferner, Nov 4, 2008
12c.2 Barrier A barrier is a way to synchronize all (or a subset) of the processors. When processors reach MPI_Barrier(), they block until all processors in the communicator have reached the same barrier. Every processor must call the barrier function, or else you have a deadlock. Syntax: MPI_Barrier(MPI_COMM_WORLD);
12c.3 Barrier Example: MPI_Barrier(MPI_COMM_WORLD); if (mypid == 0) { gettimeofday(&tv1, NULL); }... // Do some work MPI_Barrier(MPI_COMM_WORLD); if (mypid == 0) { gettimeofday(&tv2, NULL); }
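A complete version of the timing pattern above might look like the following minimal sketch (the work section and the elapsed-time printout are placeholders added for illustration, not part of the original slides):

/* Minimal sketch: timing a parallel region with MPI_Barrier.
   The "do some work" body is a placeholder. */
#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    int mypid;
    struct timeval tv1, tv2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    MPI_Barrier(MPI_COMM_WORLD);          /* all ranks start the timed region together */
    if (mypid == 0) gettimeofday(&tv1, NULL);

    /* ... do some work ... */

    MPI_Barrier(MPI_COMM_WORLD);          /* wait until every rank has finished the work */
    if (mypid == 0) {
        gettimeofday(&tv2, NULL);
        double elapsed = (tv2.tv_sec - tv1.tv_sec)
                       + (tv2.tv_usec - tv1.tv_usec) / 1.0e6;
        printf("elapsed time: %f seconds\n", elapsed);
    }

    MPI_Finalize();
    return 0;
}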
12c.4 Broadcast A broadcast is when one processor needs to send the same information to all (or a subset) of the other processors. Syntax: MPI_Bcast (buffer, count, datatype, root, MPI_COMM_WORLD) buffer, count, and datatype are the same as with MPI_Send(); root is the rank of the process initiating the broadcast
12c.5 Broadcast Example: int N = ___; float b = ____; float a[N]; MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD); MPI_Bcast (&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD); MPI_Bcast (a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);
12c.6 Broadcast All processors participating in the broadcast (whether they are the source or a destination) must call the broadcast function with the same parameters, or else it will not work correctly. The runtime of a broadcast is O(log(p)), where p is the number of processors, instead of the O(p) it would take if the root sent the data to each processor in turn
12c.7 Broadcast [Figure: broadcast from root 0 propagating along a tree; communication vs. non-communication steps]
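Putting the broadcast calls from the example slide together, a minimal runnable sketch might look like this (MAX_N and the placeholder values assigned on the root are assumptions added so the program is self-contained):

/* Minimal sketch of MPI_Bcast, assuming the root (rank 0) has the data;
   the real input step is omitted and replaced by placeholder values. */
#include <mpi.h>

#define MAX_N 1000   /* assumed upper bound so every rank can allocate a[] */

int main(int argc, char *argv[])
{
    int mypid, N = 0;
    float b = 0.0f;
    float a[MAX_N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    if (mypid == 0) {
        N = 100;                           /* root initializes the data (placeholders) */
        b = 3.14f;
        for (int i = 0; i < N; i++) a[i] = (float) i;
    }

    /* Every rank calls MPI_Bcast with the same arguments;
       rank 0 sends, all other ranks receive. */
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&b, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Bcast(a, N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}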
12c.8 Reduction A Reduction is where an array of values is reduced to a single value by applying a binary (usually commutative) operator.
12c.9 Reduction [Figure: reduction across processors P0 through P7 combining values along a tree; communication vs. non-communication steps]
12c.10 Reduction Syntax: MPI_Reduce(sendbuf, recvbuf, count, MPI_Datatype, MPI_Op, root, MPI_Comm) sendbuf, count, datatype, and MPI_Comm are the same as with MPI_Send() and MPI_Bcast(); root is the rank of the process that will possess the final value; MPI_Op is one of the predefined reduction operation constants (e.g., MPI_SUM, MPI_MAX, MPI_MIN)
12c.11 Reduction Example: int x, y; // Each processor has a different value for x MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); The root process (0) now has the sum of all the x's in the variable y
12c.12 Reduction Example: int x[N], y[N]; // Each processor has different values in the array x MPI_Reduce(x, y, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); The root process (0) now has the sum of all x[0]'s in y[0], the sum of all x[1]'s in y[1], ...
12c.13 Reduction All processors participating in the reduction (whether they are the source or a destination) must call the Reduce function with the same parameters, or else it will not work correctly. The runtime of a reduction is O(log(p)) instead of O(p), where p is the number of processors
12c.14 Reduction [Figure: reduction tree with root 0 receiving the final value; communication vs. non-communication steps]
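A minimal runnable sketch of the scalar reduction example might look like the following (using each process's rank as its value of x is an assumption added so the program is self-contained; any other MPI_Op, such as MPI_MAX or MPI_PROD, could be substituted):

/* Minimal sketch of MPI_Reduce: each rank contributes one int,
   and rank 0 receives the sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int mypid, x, y = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);

    x = mypid;   /* each process has a different value for x */

    /* Every rank calls MPI_Reduce with the same arguments;
       only the root's y is defined afterward. */
    MPI_Reduce(&x, &y, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (mypid == 0)
        printf("sum of ranks = %d\n", y);

    MPI_Finalize();
    return 0;
}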
12c.15 Scatter/Gather Scatter sends parts of an array from the root to each processor. Syntax: MPI_Scatter(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm) Gather collects parts of an array from the different processors onto the root. Syntax: MPI_Gather(send_data, send_count, send_type, recv_data, recv_count, recv_type, root, MPI_Comm)
12c.16 Scatter [Figure: root P0's array divided into blocks, with one block sent to each of P0 through P3]
12c.17 Gather [Figure: one block from each of P0 through P3 collected back into a single array on root P0]
12c.18 Scatter/Gather float a[N], localA[N];... if (mypid == 0) { printf ("%d: a = ", mypid); for (i = 0; i < N; i++) printf ("%f ", a[i]); printf ("\n"); }
12c.19 Scatter/Gather blksz = (int) ceil (((float) N)/P); MPI_Scatter(a, blksz, MPI_FLOAT, &localA[0], blksz, MPI_FLOAT, 0, MPI_COMM_WORLD);
12c.20 Scatter/Gather for (i = 0; i < blksz; i++) printf ("%d: localA = %.2f\n", mypid, localA[i]); for (i = 0; i < blksz; i++) localA[i] += mypid; for (i = 0; i < blksz; i++) printf ("%d: new localA = %.2f\n", mypid, localA[i]);
12c.21 Scatter/Gather MPI_Gather(&localA[0], blksz, MPI_FLOAT, a, blksz, MPI_FLOAT, 0, MPI_COMM_WORLD); if (mypid == 0) { printf ("%d: A = ", mypid); for (i = 0; i < N; i++) printf ("%f ", a[i]); printf ("\n"); }
12c.22 Scatter/Gather Sample run: $ mpirun -nolocal -np 3 mpiGatherScatter 6 [Output: the root prints the initial array A, then each process prints its localA block followed by the new localA after adding its rank]
12c.23 Scatter/Gather [Output continued: the remaining processes print their localA and new localA blocks, and the root prints the gathered array A]
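The Scatter/Gather fragments above can be assembled into one program. A minimal sketch follows; for simplicity it assumes N is evenly divisible by the number of processes (the original used blksz = ceil(N/P) to handle the general case), and the array's initial values and the hard-coded N = 6 are placeholders added for illustration:

/* Minimal sketch assembling the Scatter/Gather fragments above. */
#include <mpi.h>
#include <stdio.h>

#define N 6

int main(int argc, char *argv[])
{
    int mypid, P, blksz, i;
    float a[N], localA[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    blksz = N / P;                         /* assumes P divides N evenly */

    if (mypid == 0)                        /* root fills the full array */
        for (i = 0; i < N; i++) a[i] = i * 1.11f;

    /* Distribute blksz elements of a[] to each process's localA[] */
    MPI_Scatter(a, blksz, MPI_FLOAT, localA, blksz, MPI_FLOAT,
                0, MPI_COMM_WORLD);

    for (i = 0; i < blksz; i++)            /* each process modifies its block */
        localA[i] += mypid;

    /* Collect the modified blocks back into a[] on the root */
    MPI_Gather(localA, blksz, MPI_FLOAT, a, blksz, MPI_FLOAT,
               0, MPI_COMM_WORLD);

    if (mypid == 0) {
        printf("%d: A = ", mypid);
        for (i = 0; i < N; i++) printf("%.2f ", a[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

With 3 processes and N = 6, each process receives a block of 2 elements, matching the run shown on the preceding slides.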
12c.24 For further reading Man pages on the MPI routines. Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall, Upper Saddle River, NJ, 1999. Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1997.