1 Lecture 6 Objectives: Communication Complexity Analysis; Collective Operations (Reduction, Binomial Trees, Gather and Scatter Operations); Review Communication Analysis of Floyd's Algorithm

2 Parallel Reduction Evolution

3 Binomial Trees: subgraph of a hypercube

4 Finding Global Sum (figure: initial values, one per process)

5 Finding Global Sum (figure: partial sums after the first reduction step)

6 Finding Global Sum (figure: partial sums after the second reduction step)

7 Finding Global Sum (figure: partial sums after the third reduction step)

8 Finding Global Sum (figure: final result 25; the communication pattern forms a binomial tree)
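
The slides show this summation as a sequence of figures. As a concrete illustration (a minimal sketch, not code from the lecture), the same binomial-tree reduction can be written in MPI by having each process pair with the partner whose rank differs in one bit; in practice MPI_Reduce performs this kind of reduction for you.

    /* Binomial-tree global sum: a sketch, assuming each process
     * contributes one integer (here simply its own rank). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int sum = id;                      /* this process's contribution */

        /* At each step, the partner differing in the current bit sends its
         * partial sum to the lower-ranked process, which keeps reducing. */
        for (int mask = 1; mask < p; mask <<= 1) {
            if (id & mask) {
                MPI_Send(&sum, 1, MPI_INT, id - mask, 0, MPI_COMM_WORLD);
                break;                     /* this process has finished */
            } else if (id + mask < p) {
                int recv;
                MPI_Recv(&recv, 1, MPI_INT, id + mask, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                sum += recv;
            }
        }
        if (id == 0) printf("global sum = %d\n", sum);

        MPI_Finalize();
        return 0;
    }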

9 Agglomeration

10 sum (figure)

11 Gather

12 All-gather

13 Complete Graph for All-gather

14 Hypercube for All-gather
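
As a concrete illustration (a minimal sketch, not from the slides), MPI exposes this pattern directly as MPI_Allgather: every process contributes a block and every process ends up with the concatenation of all blocks.

    /* All-gather sketch: each process contributes one int and receives
     * all p contributions. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int mine = id * id;                 /* arbitrary local contribution */
        int *all = malloc(p * sizeof(int));

        /* After the call, every process holds all p contributions. */
        MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

        if (id == 0)
            for (int i = 0; i < p; i++) printf("all[%d] = %d\n", i, all[i]);

        free(all);
        MPI_Finalize();
        return 0;
    }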

15 Analysis of Communication: λ (lambda) is the latency, the message delay, i.e. the overhead to send one message; β (beta) is the bandwidth, the number of data items that can be sent per unit time. Sending a message with n data items therefore costs λ + n/β.
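
As a worked instance of this model (using the constants quoted later on the predicted-vs-actual slide, and assuming 4-byte items so that β = 10 MB/sec corresponds to 2.5 × 10^6 items per second, an assumption not stated on this slide):

    \lambda + \frac{n}{\beta}
      = 250\,\mu\text{s} + \frac{1000 \text{ items}}{2.5 \times 10^{6} \text{ items/s}}
      = 250\,\mu\text{s} + 400\,\mu\text{s} = 650\,\mu\text{s}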

16 Communication Time for All-Gather: hypercube; complete graph
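
The formulas themselves did not survive transcription. Under the λ + n/β model of the previous slide, and assuming n total items distributed n/p per process (a reconstruction of the standard derivation, not text recovered from the slide):

    Hypercube (⌈log p⌉ exchange steps, the data held doubling each step):
      \lceil \log p \rceil \, \lambda + \frac{n(p-1)}{\beta p}

    Complete graph (each process exchanges its n/p items with the other p-1 processes):
      (p-1)\left( \lambda + \frac{n}{\beta p} \right)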

17 Adding Data Input

18 Scatter
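
A minimal sketch (an assumption for illustration, not the slides' code) of scattering a rowwise block-striped matrix from the root to all processes with MPI_Scatter:

    #include <mpi.h>
    #include <stdlib.h>

    #define N 8                            /* hypothetical matrix order */

    int main(int argc, char *argv[])
    {
        int id, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &id);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        int rows = N / p;                  /* assumes p divides N evenly */
        double *a = NULL;
        if (id == 0) {                     /* full matrix lives on the root */
            a = malloc(N * N * sizeof(double));
            for (int i = 0; i < N * N; i++) a[i] = (double) i;
        }
        double *my_rows = malloc(rows * N * sizeof(double));

        /* Each process receives its contiguous block of rows. */
        MPI_Scatter(a, rows * N, MPI_DOUBLE,
                    my_rows, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        free(my_rows);
        if (id == 0) free(a);
        MPI_Finalize();
        return 0;
    }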

19 Scatter in log p Steps (figure: the root's items 1-8 are repeatedly halved and forwarded, e.g. 1234/5678, then 12/34/56/78)

20 Communication Time for Scatter: hypercube; complete graph
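
Again the formulas were lost in transcription. Under the same λ + n/β model, with the root initially holding all n items and each process ultimately receiving n/p of them (a reconstruction of the standard derivation):

    Hypercube (the sender forwards half of its remaining items at each of the ⌈log p⌉ steps):
      \lceil \log p \rceil \, \lambda + \frac{n(p-1)}{\beta p}

    Complete graph (the root sends each of the other p-1 processes its n/p items directly):
      (p-1)\left( \lambda + \frac{n}{\beta p} \right)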

21 Recall Parallel Floyd's Computational Complexity: innermost loop has complexity Θ(n); middle loop executed at most ⌈n/p⌉ times; outer loop executed n times; overall computation complexity Θ(n³/p)

22 Floyd's Communication Complexity: no communication in inner loop; no communication in middle loop; broadcast in outer loop, with complexity Θ(n log p), executed n times
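
To make the loop structure behind these counts concrete, here is a sketch (an illustration consistent with the analysis, not code taken from the lecture) of the rowwise block-striped parallel Floyd's algorithm. The helper owner(k), the parameter low (this process's first global row index), and local_rows are assumptions introduced for the sketch.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* a[] holds this process's local_rows rows, covering global rows
     * low .. low + local_rows - 1; owner(k) returns the rank that holds
     * global row k. */
    void parallel_floyd(int n, int local_rows, int low, int id,
                        double **a, int (*owner)(int k))
    {
        double *row_k = malloc(n * sizeof(double));

        for (int k = 0; k < n; k++) {                /* outer loop: n iterations      */
            int root = owner(k);
            if (id == root)                          /* owner copies row k ...        */
                memcpy(row_k, a[k - low], n * sizeof(double));
            MPI_Bcast(row_k, n, MPI_DOUBLE, root,    /* ... and broadcasts it         */
                      MPI_COMM_WORLD);

            for (int i = 0; i < local_rows; i++)     /* middle loop: at most ⌈n/p⌉    */
                for (int j = 0; j < n; j++)          /* inner loop: Θ(n)              */
                    if (a[i][k] + row_k[j] < a[i][j])
                        a[i][j] = a[i][k] + row_k[j];
        }
        free(row_k);
    }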

23 Execution Time Expression (1): terms annotated on the slide are iterations of outer loop, iterations of middle loop, cell update time; iterations of outer loop, messages per broadcast, message-passing time, bytes/msg, iterations of inner loop
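
Reading those annotations in order, the expression they label is, under the λ + n/β model with 4 bytes per matrix element accounting for the bytes/msg factor (a reconstruction, since the formula itself was lost), with χ the cell update time:

    n \left\lceil \frac{n}{p} \right\rceil n \, \chi
      \;+\; n \lceil \log p \rceil \left( \lambda + \frac{4n}{\beta} \right)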

24 Accounting for Computation/Communication Overlap: note that after the first broadcast, all of the wait times overlap the computation time of Process 0.

25 Execution Time Expression (2): terms annotated on the slide are iterations of outer loop, iterations of middle loop, cell update time; iterations of outer loop, messages per broadcast, message-passing time, iterations of inner loop, message transmission
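
With message transmission overlapped by computation, essentially only the per-message latency stays on the critical path, so the dominant terms consistent with the predicted column on the next slide are (again a reconstruction, not the slide's own formula):

    n \left\lceil \frac{n}{p} \right\rceil n \, \chi
      \;+\; n \lceil \log p \rceil \, \lambda

plus a comparatively small message-transmission contribution of order 4n/β per broadcast that is largely hidden behind computation.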

26 Predicted vs. Actual Performance (execution time in seconds; χ = 25.5 nsec, λ = 250 μsec, β = 10 MB/sec, n = 1000)

    Processes   Predicted   Actual
        1         25.54
        2         13.02      13.89
        3          9.01       9.60
        4          6.89       7.29
        5          5.86       5.99
        6          5.01       5.16
        7          4.40       4.50
        8          3.94       3.98
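
As a spot check of the reconstructed expression against the table, for p = 8:

    1000 \cdot \lceil 1000/8 \rceil \cdot 1000 \cdot 25.5\,\text{ns}
      + 1000 \cdot \lceil \log 8 \rceil \cdot 250\,\mu\text{s}
      \approx 3.19\,\text{s} + 0.75\,\text{s} \approx 3.94\,\text{s}

which matches the predicted value in the table.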

27 Summary: two matrix decompositions (rowwise block striped, columnwise block striped); blocking send/receive functions (MPI_Send, MPI_Recv); overlapping communications with computations
