Pipelined Broadcast on Ethernet Switched Clusters Pitch Patarasuk, Ahmad Faraj, Xin Yuan Department of Computer Science Florida State University Tallahassee,


1 Pipelined Broadcast on Ethernet Switched Clusters Pitch Patarasuk, Ahmad Faraj, Xin Yuan Department of Computer Science Florida State University Tallahassee, FL 32306

2 Broadcast communication (MPI_Bcast)
[Figure: before the broadcast, one of the nodes n0, n1, n2, n3 holds the message ABCD; after it, all four nodes hold ABCD.]
Let T(msize) = time to send a message of size msize.
Broadcast(msize) >= T(msize)

3 Ethernet Switched Cluster
[Figure: machines connected by Ethernet switches.]

4 Problem statement
How to efficiently realize the broadcast operation with large message sizes on Ethernet switched clusters.
Pipelined broadcast can achieve near-optimal results (T(msize) time for broadcasting a message of size msize). This requires:
Finding a contention-free broadcast tree
Finding a good segment size

5 Traditional broadcast algorithms
Linear tree: 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
Flat tree: root 0 sends directly to 1, 2, 3, 4, 5, 6, 7
Time = (P-1) x T(msize)

6 Binary tree (levels, top to bottom): 0 | 1 2 | 3 4 5 6 | 7
k-ary tree (levels, top to bottom): 0 | 1 2 3 | 4 5 6 7
Binary tree time = 2 x (log_2(P+1) - 1) x T(msize)

7 Binomial tree
[Figure: binomial tree on nodes 0 to 7, rooted at node 0.]
Time = log_2 P x T(msize)
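The log_2 P factor comes from the number of communication rounds. The sketch below lists the sends of a binomial-tree broadcast rooted at node 0, round by round. It assumes P is a power of two and that the largest stride is used first, which matches a tree where node 0 has children 4, 2, and 1; the function name `binomial_rounds` is illustrative and does not come from the slides.

```python
def binomial_rounds(p):
    """Return, for each round, the (sender, receiver) pairs of a
    binomial-tree broadcast rooted at node 0.
    Assumption of this sketch: p is a power of two."""
    assert p > 0 and p & (p - 1) == 0, "sketch assumes a power-of-two node count"
    rounds = []
    have = [0]              # nodes that already hold the message
    step = p // 2           # distance between sender and receiver
    while step >= 1:
        sends = [(src, src + step) for src in have]
        rounds.append(sends)
        have += [dst for _, dst in sends]
        step //= 2
    return rounds


ROUNDS = binomial_rounds(8)
for i, sends in enumerate(ROUNDS, start=1):
    print(f"round {i}: {sends}")
```

Every node that holds the message sends the whole message once per round, so the number of holders doubles each round and the total time is the number of rounds times T(msize).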

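8 Scatter/Allgather
[Figure: before, one of the nodes n0, n1, n2, n3 holds ABCD; the scatter step leaves each node with one piece (A, B, C, or D); the allgather step leaves every node with ABCD.]
Time = 2 x T(msize)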

9 Time complexity for large messages
Linear tree: (P-1) x T(msize)
Flat tree: (P-1) x T(msize)
Binary tree: 2 x (log_2(P+1) - 1) x T(msize), approx. 2 x log_2 P x T(msize)
Binomial tree: log_2 P x T(msize)
Scatter/allgather: 2 x T(msize)
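To get a feel for how far apart these entries are, the table can be evaluated for a concrete machine count. The sketch below returns each cost as a multiple of T(msize); the function name `broadcast_costs` and the choice of P = 16 are illustrative only.

```python
import math


def broadcast_costs(p):
    """Cost of each traditional algorithm as a multiple of T(msize),
    taken directly from the table for p machines."""
    return {
        "linear tree": p - 1,
        "flat tree": p - 1,
        "binary tree": 2 * (math.log2(p + 1) - 1),
        "binomial tree": math.log2(p),
        "scatter/allgather": 2,
    }


COSTS_16 = broadcast_costs(16)
for name, factor in COSTS_16.items():
    print(f"{name:18s} {factor:6.2f} x T(msize)")
```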

10 Pipelined broadcast algorithm
Linear pipeline: 0 -> 1 -> 2 -> 3

11 Performance of pipelined broadcast
Assume no network contention, and let a message of size msize be broken into X segments of size msize/X.
H: tree height; D: the number of children per node.
Duration of one pipeline stage: D * T(msize/X)
Total time: (X + H - 1) * (D * T(msize/X))
When X is much larger than H and T(msize/X) is close to T(msize)/X, the total approaches D * T(msize):
Linear tree: H = P, D = 1, total time = T(msize)
Binary tree: H = log_2(P), D = 2, total time = 2 x T(msize)
k-ary tree: H = log_k(P), D = k; in general not as efficient as the binary tree.
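The total-time formula can be evaluated numerically to see how the linear pipeline approaches T(msize) as the number of segments grows. The sketch below assumes a simple cost model T(m) = alpha + beta * m; that model and the values of `alpha` and `beta` are assumptions made for illustration and do not come from the slides.

```python
def t_send(nbytes, alpha=50e-6, beta=8e-8):
    """Assumed cost model (not from the slides): a fixed startup
    latency alpha plus beta seconds per byte."""
    return alpha + beta * nbytes


def pipelined_time(msize, x, height, children):
    """Total time (X + H - 1) * (D * T(msize / X)) from the slide."""
    return (x + height - 1) * (children * t_send(msize / x))


MSIZE = 1 << 20      # 1 MiB message
P = 16               # number of machines

# Ratio of linear-pipeline time to T(msize) for several segment counts.
RATIOS = {
    x: pipelined_time(MSIZE, x, height=P, children=1) / t_send(MSIZE)
    for x in (1, 16, 256)
}
for x, ratio in RATIOS.items():
    print(f"X = {x:4d}: {ratio:.2f} x T(msize)")
```

With X = 1 the linear pipeline degenerates to the unpipelined linear tree; the ratio falls toward 1 as X grows, until the per-segment startup cost starts to dominate.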

12 Time complexity for large messages
Pipelined (linear): T(msize)
Pipelined (binary): 2 x T(msize)
k-ary pipeline: k x T(msize)
Binomial tree: log_2 P x T(msize)
Scatter/allgather: 2 x T(msize)

13 Pipelined broadcast How to find a contention-free broadcast tree? How to select the best segment size?

14 Example of network contention
Binary tree (levels, top to bottom): 0 | 1 2 | 3 4 5 6 | 7
Machines n0, n1, n2, n3 are attached to one switch and n4, n5, n6, n7 to another.
There is link contention caused by the communications (1 -> 4), (2 -> 5), (2 -> 6), and (3 -> 7).

15 Linear tree
Machines n0, n1, n4, n5 are attached to one switch and n2, n3, n6, n7 to another.
The linear tree 0 -> 1 -> 2 -> 3 -> ... -> 7 has contention caused by (1 -> 2) and (5 -> 6).
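The contention in this example can be checked mechanically. The sketch below treats two communications as contending when they cross the same switch-to-switch link in the same direction; it ignores machine-to-switch links and assumes the switches form a tree. The switch names "A" and "B" and both function names are hypothetical.

```python
from collections import defaultdict, deque


def switch_path(links, src, dst):
    """Directed switch-to-switch hops from src to dst.
    Assumption: the switches form a connected tree."""
    adj = defaultdict(list)
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:                      # breadth-first search from src
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    hops, node = [], dst
    while parent[node] is not None:   # walk back from dst to src
        hops.append((parent[node], node))
        node = parent[node]
    return hops[::-1]


def contended_links(links, home, edges):
    """Map each directed inter-switch link used by more than one
    tree edge to the list of tree edges that use it."""
    use = defaultdict(list)
    for u, v in edges:
        for hop in switch_path(links, home[u], home[v]):
            use[hop].append((u, v))
    return {hop: es for hop, es in use.items() if len(es) > 1}


LINKS = [("A", "B")]                                   # hypothetical names
HOME = {0: "A", 1: "A", 4: "A", 5: "A", 2: "B", 3: "B", 6: "B", 7: "B"}

NAIVE = [(i, i + 1) for i in range(7)]                 # 0 -> 1 -> ... -> 7
ORDERED = [(0, 1), (1, 4), (4, 5), (5, 2), (2, 3), (3, 6), (6, 7)]

print(contended_links(LINKS, HOME, NAIVE))
print(contended_links(LINKS, HOME, ORDERED))
```

Under this definition the naive order reports (1 -> 2) and (5 -> 6) sharing the A-to-B direction, while an order that visits all machines of one switch before moving to the next crosses the link only once.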

16 Algorithm for constructing a contention-free linear tree
Step 1: Traverse all switches using depth-first search (DFS), and name the switches S0, S1, S2, ... in the order they are first reached in the DFS tree.
Step 2: The linear tree consists of all machines on switch S0, followed by all machines on S1, then S2, and so on.

17 Example of a contention-free linear tree
Switch S0: n0, n1, n4, n5
Switch S1: n2, n3, n6, n7
Switch S2: n8, n9, n10, n11
Switch S3: n12, n13, n14, n15
Linear tree: n0 -> n1 -> n4 -> n5 -> n2 -> n3 -> n6 -> n7 -> n8 -> n9 -> ... -> n15
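The two steps of the construction can be sketched as a DFS over the switches that emits each switch's machines when the switch is first reached. The machine groups below follow the example; the switch-to-switch links (a chain S0, S1, S2, S3) are an assumption, because the transcript does not preserve how the switches are wired. The function name is illustrative.

```python
def contention_free_linear_tree(switch_links, machines, root_switch):
    """Order machines switch by switch, visiting switches in DFS preorder.
    Assumption: the switches form a tree."""
    adj = {}
    for a, b in switch_links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    order, seen = [], set()
    stack = [root_switch]
    while stack:
        sw = stack.pop()
        if sw in seen:
            continue
        seen.add(sw)
        order.extend(machines.get(sw, []))
        # Push neighbours reversed so they are popped in listed order.
        stack.extend(reversed(adj.get(sw, [])))
    return order


# Assumed wiring: S0 - S1 - S2 - S3 (not recoverable from the slide).
SWITCH_LINKS = [("S0", "S1"), ("S1", "S2"), ("S2", "S3")]
MACHINES = {
    "S0": [0, 1, 4, 5],
    "S1": [2, 3, 6, 7],
    "S2": [8, 9, 10, 11],
    "S3": [12, 13, 14, 15],
}

ORDER = contention_free_linear_tree(SWITCH_LINKS, MACHINES, "S0")
print(" -> ".join(f"n{m}" for m in ORDER))
```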

18 Algorithm for constructing a contention-free binary tree
Start with a contention-free linear tree (for example, 0 -> 1 -> 2 -> ... -> 15).
Recursively divide the tree into 2 sub-trees.
Make sure that there cannot be any contention.
Choose the sub-trees so that the height of the whole tree is minimal.
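The recursive division can be illustrated with a deliberately simplified sketch. It assumes each sub-tree is a contiguous run of the linear order and always splits the remaining machines as evenly as possible. It does not perform the contention check, and it does not search among contention-free splits for the one with minimal height, so it shows only the shape of the recursion and is not the authors' algorithm. Both function names are illustrative.

```python
def split_into_binary_tree(order):
    """Return {node: [children]} built from a linear order by recursive
    halving. Simplification: no contention check is performed."""
    children = {node: [] for node in order}

    def build(segment):
        root, rest = segment[0], segment[1:]
        mid = (len(rest) + 1) // 2
        # Each non-empty half becomes a sub-tree rooted at its first node.
        for part in (rest[:mid], rest[mid:]):
            if part:
                children[root].append(part[0])
                build(part)

    if order:
        build(list(order))
    return children


def height(children, root):
    """Tree height counted in edges."""
    if not children[root]:
        return 0
    return 1 + max(height(children, c) for c in children[root])


TREE = split_into_binary_tree(range(16))
HEIGHT = height(TREE, 0)
print(TREE[0], HEIGHT)
```

A real implementation would reject splits whose sub-trees contend for a link and might therefore accept a less balanced division, which is why a contention-free tree can be somewhat taller than this idealized one.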

19 Binary tree height
The performance of binary pipelined broadcast depends on the height of the binary tree.
Even though a contention-free binary tree may not be a complete binary tree, its height is not much greater than that of a complete binary tree.

20 Average tree heights for 20 randomly generated topologies

21 Evaluation
Contention-free pipelined algorithms:
Routine generators take topology information as input.
The generated routines are based on MPICH point-to-point primitives.
Trees evaluated: linear tree, binary tree, 3-ary tree.
Targets for comparison:
MPICH: binomial tree, scatter/allgather
LAM: flat tree, binomial tree
Topology-unaware pipelined linear and binary algorithms

22 Evaluation

23 Performance of different pipelined trees (topology 1)

24 Comparing pipelined broadcast with other schemes

25 Topology unaware and contention-free pipelined broadcast

26 Segment size for pipelined broadcast
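The measured results for this slide are not preserved in the transcript. As a rough illustration only, the trade-off behind segment-size selection can be explored with the linear-pipeline formula (X + H - 1) * T(msize/X) with H = P, under an assumed cost model T(m) = alpha + beta * m. The model, the values of `alpha` and `beta`, and the candidate sizes are assumptions of this sketch, not data from the presentation.

```python
def t_send(nbytes, alpha=50e-6, beta=8e-8):
    """Assumed cost model (not from the slides): startup latency alpha
    plus beta seconds per byte."""
    return alpha + beta * nbytes


def linear_pipeline_time(msize, seg, p):
    """(X + H - 1) * T(segment) for a linear pipeline with H = P."""
    x = -(-msize // seg)          # number of segments, rounded up
    return (x + p - 1) * t_send(seg)


MSIZE, P = 1 << 20, 16
CANDIDATES = [1 << k for k in range(8, 21)]     # 256 B up to 1 MiB
TIMES = {seg: linear_pipeline_time(MSIZE, seg, P) for seg in CANDIDATES}
BEST = min(TIMES, key=TIMES.get)

for seg in CANDIDATES:
    marker = "  <- best" if seg == BEST else ""
    print(f"{seg:8d} bytes: {TIMES[seg] * 1e3:8.2f} ms{marker}")
```

Small segments pay the startup cost many times, while large segments lengthen the time needed to fill the pipeline, so under this model the best size lies between the two extremes.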

27 Conclusions
Pipelined broadcast is faster than the current broadcast algorithms for medium and large messages.
The linear pipeline has a completion time roughly equal to T(msize).
The binary pipelined broadcast is best for medium messages.
A contention-free broadcast tree is necessary for pipelined algorithms.
A good segment size for pipelined broadcast is not difficult to find.

28 Questions?

