12e.1 More on Parallel Computing UNC-Wilmington, C. Ferner, 2007 Mar 21, 2007.

12e.2 Block Mapping (Review)
    blksz = (int)ceil((float)N / P);
    for (i = lb + my_rank * blksz; i < min(N, lb + (my_rank + 1) * blksz); i++) {
        ...
    }
(lb is the lower bound of the original loop)
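To make the block bounds concrete, here is a minimal sketch in C. The names `range_t`, `block_range`, and `print_block_ranges` are illustrative and not from the slides. The ceiling is computed in integer arithmetic instead of with `ceil`, and the start of the range is clamped to N as well (the slide clamps only the end), so ranks past the end of the loop get an empty range.

```c
#include <stdio.h>

typedef struct {
    int begin;  /* first iteration owned by this rank */
    int end;    /* one past the last iteration owned by this rank */
} range_t;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Block mapping of iterations lb .. n-1 over p ranks, following the slide:
 * blksz = ceiling(n / p), computed here with integer arithmetic. */
range_t block_range(int lb, int n, int p, int my_rank)
{
    int blksz = (n + p - 1) / p;
    range_t r;
    r.begin = min_int(n, lb + my_rank * blksz);        /* clamp: assumption */
    r.end   = min_int(n, lb + (my_rank + 1) * blksz);
    return r;
}

/* Print the half-open range owned by every rank. */
void print_block_ranges(int lb, int n, int p)
{
    for (int rank = 0; rank < p; rank++) {
        range_t r = block_range(lb, n, p, rank);
        printf("rank %d: [%d, %d)\n", rank, r.begin, r.end);
    }
}
```

Calling `print_block_ranges(0, 26, 6)` lists one half-open range per rank, which is a quick way to check a mapping by eye before parallelizing a loop.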

12e.3 Example
    for (i = 1; i < N; i++) {
        for (j = 0; j < N; j++) {
            a[i][j] += f(a[i-1][j]);
        }
    }

12e.4 Example
[Figure: N x N grid of iterations (i, j), from (0,0) to (N-1,N-1); i indexes the rows and j indexes the columns.]

12e.5 Example
If we mapped iterations of the i loop to processors, the dependencies would cross processor boundaries: iteration i reads a[i-1][j], which iteration i-1 writes.
Interprocessor communication would therefore be required.

12e.6 Example
[Figure: the same N x N iteration grid with the i loop mapped to processors; rows are labeled PE 0, PE 1, PE 2, ..., PE P.]

12e.7 Example
A better solution would be to map iterations of the j loop to processors, because each dependence (from a[i-1][j] to a[i][j]) stays within a single column.

12e.8 Example
[Figure: the same N x N iteration grid with the j loop mapped to processors; labels PE 0, PE 1, PE 2, PE 3.]

12e.9 Example
    for (i = 1; i < N; i++) {
        for (j = my_rank * blksz; j < min(N, (my_rank + 1) * blksz); j++) {
            a[i][j] += f(a[i-1][j]);
        }
    }
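The claim that the j-loop mapping needs no communication can be checked with a small sequential simulation. The sketch below is only an illustration: `f` is a hypothetical stand-in for the slide's function, N is fixed at 8, and the "processors" are simulated by calling each rank's share of the work one after another, in reverse rank order, on the same array. Because every column is independent, the result should match the sequential loop nest.

```c
#include <string.h>

#define N 8  /* small illustrative size, not from the slides */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical stand-in for the slide's f(). */
static double f(double x) { return 0.5 * x + 1.0; }

static void fill(double a[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (double)(i * N + j);
}

/* The original sequential loop nest. */
void update_sequential(double a[N][N])
{
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] += f(a[i - 1][j]);
}

/* The work of one rank when the j loop is block-mapped over p ranks. */
void update_columns(double a[N][N], int p, int my_rank)
{
    int blksz = (N + p - 1) / p;  /* integer ceiling of N / p */
    int jb = min_int(N, my_rank * blksz);
    int je = min_int(N, (my_rank + 1) * blksz);
    for (int i = 1; i < N; i++)
        for (int j = jb; j < je; j++)
            a[i][j] += f(a[i - 1][j]);
}

/* Returns 1 if running every rank's share, in reverse rank order,
 * produces the same array as the sequential loop nest. */
int columns_match_sequential(int p)
{
    double seq[N][N], par[N][N];
    fill(seq);
    fill(par);
    update_sequential(seq);
    for (int rank = p - 1; rank >= 0; rank--)
        update_columns(par, p, rank);
    return memcmp(seq, par, sizeof seq) == 0;
}
```

Running the ranks in reverse order is a deliberate choice: if any rank depended on data owned by another rank, the order would change the result.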

12e.10 Block Mapping (Review)
    blksz = (int)ceil((float)N / P);
    for (i = lb + my_rank * blksz; i < min(N, lb + (my_rank + 1) * blksz); i++) {
        ...
    }
(lb is the lower bound of the original loop)

12e.11 Block Mapping

12e.12 Block Mapping
The problem is that block mapping can lead to a load imbalance.
Example: let N = 26, P = 6, and lb = 0. Then blksz = ceiling(26/6) = 5.

12e.13 Block Mapping
Processors 0-4 have 5 iterations of work.
Processor 5 has 1 iteration.
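The per-processor loads for this example can be reproduced with a short sketch. The helper names `block_load` and `block_imbalance` are illustrative, and lb = 0 is assumed as in the example.

```c
/* Number of iterations rank my_rank owns under block mapping of 0 .. n-1. */
int block_load(int n, int p, int my_rank)
{
    int blksz = (n + p - 1) / p;  /* integer ceiling of n / p */
    int begin = my_rank * blksz;
    int end = (my_rank + 1) * blksz;
    if (begin > n) begin = n;
    if (end > n) end = n;
    return end - begin;
}

/* Difference between the most loaded and the least loaded rank. */
int block_imbalance(int n, int p)
{
    int lo = block_load(n, p, 0);
    int hi = lo;
    for (int rank = 1; rank < p; rank++) {
        int load = block_load(n, p, rank);
        if (load < lo) lo = load;
        if (load > hi) hi = load;
    }
    return hi - lo;
}
```

For N = 26 and P = 6, `block_load` gives 5 for ranks 0 through 4 and 1 for rank 5, so `block_imbalance(26, 6)` is 4.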

12e.14 Cyclic Mapping
An alternative to block mapping is cyclic mapping.
Iterations are assigned to processors one at a time in round-robin fashion.
This leads to a better load balance.

12e.15 Cyclic Mapping
With N = 26 and P = 6, processors 0-1 have 5 iterations of work.
Processors 2-5 have only 4, but that is only 1 iteration fewer!

12e.16 Cyclic Mapping
    for (i = lb + my_rank; i < N; i += P) {
        ...
    }
(lb is the lower bound of the original loop)
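As a rough check of the cyclic loop, the sketch below counts how many iterations each rank executes by running the loop itself, and gives the owner of a single iteration. The function names `cyclic_load` and `cyclic_owner` are illustrative, not from the slides.

```c
/* Number of iterations rank my_rank executes under cyclic mapping,
 * counted by running the slide's loop. */
int cyclic_load(int lb, int n, int p, int my_rank)
{
    int count = 0;
    for (int i = lb + my_rank; i < n; i += p)
        count++;
    return count;
}

/* Rank that owns iteration i (lb <= i) under cyclic mapping. */
int cyclic_owner(int lb, int p, int i)
{
    return (i - lb) % p;
}
```

For N = 26, P = 6, and lb = 0, ranks 0 and 1 execute 5 iterations each and ranks 2 through 5 execute 4 each, so no two ranks differ by more than one iteration.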

12e.17 Cyclic Mapping
Conceptually, this is an easier mapping to implement than block mapping.
It leads to better load balancing.
However, it can (and often does) lead to more communication.
Suppose that each iteration in the above example is dependent on the previous iteration.

12e.18 Cyclic Mapping
Consecutive iterations always belong to different processors, so a message is sent from iteration 0 to 1, from 1 to 2, from 2 to 3, from 3 to 4, from 4 to 5, from 5 to 6, ...

12e.19 Block Mapping
With block mapping (blksz = 5), messages are sent only across block boundaries: from iteration 4 to 5, from 9 to 10, from 14 to 15, from 19 to 20, and from 24 to 25.
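The difference in communication can be counted directly. The sketch below assumes the dependence chain described above (each iteration depends on the previous one), lb = 0, and one message whenever iterations i and i+1 are owned by different ranks. The names `mapping_t` and `count_messages` are illustrative.

```c
typedef enum { MAP_BLOCK, MAP_CYCLIC } mapping_t;

/* Owner of iteration i under block mapping of 0 .. n-1 over p ranks. */
static int block_owner(int n, int p, int i)
{
    int blksz = (n + p - 1) / p;  /* integer ceiling of n / p */
    return i / blksz;
}

/* Owner of iteration i under cyclic mapping over p ranks. */
static int cyclic_owner(int p, int i)
{
    return i % p;
}

/* Number of messages needed when iteration i+1 depends on iteration i:
 * one message for each consecutive pair owned by different ranks. */
int count_messages(int n, int p, mapping_t m)
{
    int messages = 0;
    for (int i = 0; i + 1 < n; i++) {
        int src = (m == MAP_BLOCK) ? block_owner(n, p, i) : cyclic_owner(p, i);
        int dst = (m == MAP_BLOCK) ? block_owner(n, p, i + 1) : cyclic_owner(p, i + 1);
        if (src != dst)
            messages++;
    }
    return messages;
}
```

Under these assumptions block mapping needs at most P - 1 messages, while cyclic mapping with P > 1 needs N - 1, one per consecutive pair of iterations.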

12e.20 Block vs Cyclic
Block mapping increases the granularity and reduces overall communication (O(P)). However, it can lead to load imbalances (O(N/P)).
Cyclic mapping decreases granularity and increases overall communication (O(N)). However, it improves load balance (O(1)).
Block-cyclic is a combination of the two.

12e.21 Block-Cyclic Mapping
Block-cyclic with N = 26, P = 6, and blksz = 2.
The load imbalance will be <= blksz.

12e.22 Block-Cyclic Mapping
(N, P, and blksz are given)
    nLayers = (int)ceil(((float)N) / (blksz * P));
    for (layer = 0; layer < nLayers; layer++) {
        beginBlk = layer * blksz * P;   /* each layer spans blksz * P iterations */
        for (i = beginBlk + mypid * blksz; i < min(N, beginBlk + (mypid + 1) * blksz); i++) {
            ...
        }
    }
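To see how the layers distribute work, the sketch below runs the block-cyclic loop nest and counts the iterations one rank executes. It assumes that each layer spans blksz * P iterations, so a layer begins at layer * blksz * P. The function name `block_cyclic_load` is illustrative.

```c
/* Number of iterations rank mypid executes under block-cyclic mapping
 * of 0 .. n-1, counted by running the two-level loop nest. */
int block_cyclic_load(int n, int p, int blksz, int mypid)
{
    /* integer ceiling of n / (blksz * p) */
    int n_layers = (n + blksz * p - 1) / (blksz * p);
    int count = 0;
    for (int layer = 0; layer < n_layers; layer++) {
        int begin_blk = layer * blksz * p;  /* first iteration of this layer */
        int end = begin_blk + (mypid + 1) * blksz;
        if (end > n)
            end = n;
        for (int i = begin_blk + mypid * blksz; i < end; i++)
            count++;
    }
    return count;
}
```

For N = 26, P = 6, and blksz = 2 there are three layers. Rank 0 executes iterations 0-1, 12-13, and 24-25 (6 in total) and every other rank executes 4, so the imbalance is 2, which is within the blksz bound.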

12e.23 Block vs Cyclic
Block-cyclic is in between block and cyclic in terms of granularity, communication, and load balancing.
Block and cyclic are special cases of block-cyclic:
- Block = block-cyclic with blksz = ceiling(N/P)
- Cyclic = block-cyclic with blksz = 1
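The two special cases can be checked by comparing owner functions. In the sketch below, the closed-form owner expressions and the name `special_cases_hold` are illustrative, and lb = 0 is assumed.

```c
/* Owner of iteration i under block-cyclic mapping with block size blksz. */
int block_cyclic_owner(int p, int blksz, int i)
{
    return (i / blksz) % p;
}

/* Owner of iteration i under block mapping of 0 .. n-1 over p ranks. */
int block_owner(int n, int p, int i)
{
    return i / ((n + p - 1) / p);  /* blksz = integer ceiling of n / p */
}

/* Owner of iteration i under cyclic mapping over p ranks. */
int cyclic_owner(int p, int i)
{
    return i % p;
}

/* Returns 1 if, for every iteration 0 .. n-1, block-cyclic with
 * blksz = ceiling(n / p) agrees with block mapping and block-cyclic
 * with blksz = 1 agrees with cyclic mapping. */
int special_cases_hold(int n, int p)
{
    int blk = (n + p - 1) / p;
    for (int i = 0; i < n; i++) {
        if (block_cyclic_owner(p, blk, i) != block_owner(n, p, i))
            return 0;
        if (block_cyclic_owner(p, 1, i) != cyclic_owner(p, i))
            return 0;
    }
    return 1;
}
```

With blksz = ceiling(N/P) a single layer covers all N iterations, so the modulo never wraps; with blksz = 1 every block is one iteration, which is exactly round-robin assignment.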
