Published by Hanna Mallard. Modified over 2 years ago.

1
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Chapter 4: Partitioning and Divide-and-Conquer Strategies

2
Divide and Conquer

Divide the problem into sub-problems of the same form as the larger problem. Further divide these into still smaller sub-problems (usually done by recursion). Recursive divide and conquer is amenable to parallelization because separate processes can be used for the divided parts.
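The recursion described above can be sketched with a small summation example. This is a minimal illustration only: the function name `dc_sum` and the choice of summation as the problem are assumptions made here, and the sketch runs sequentially. The two recursive calls are independent of each other, which is what would allow separate processes to handle them.

```python
def dc_sum(values):
    """Add a list of numbers by recursive divide and conquer (hypothetical example)."""
    if len(values) == 0:
        return 0
    if len(values) == 1:
        # Base case: a sub-problem small enough to solve directly.
        return values[0]
    mid = len(values) // 2
    # Divide: two sub-problems of the same form as the original.
    left = dc_sum(values[:mid])
    right = dc_sum(values[mid:])
    # Combine: merge the two partial results.
    return left + right


print(dc_sum([1, 2, 3, 4, 5]))  # prints 15
```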

3
Divide-and-Conquer Using a Hypercube

4
Partial summation

5
Bucket sort

One “bucket” is assigned to hold the numbers that fall within each region. The numbers in each bucket are sorted using a sequential sorting algorithm. Sequential sorting time complexity for n numbers and m buckets: $O(n \log(n/m))$. Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
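A minimal sketch of the sequential version follows. It assumes integer inputs in the range 0 to a - 1 and m equal-width regions. The function name `bucket_sort` and the use of Python's built-in `sorted` as the per-bucket sequential sort are choices made here for illustration, not taken from the slides.

```python
def bucket_sort(numbers, m, a):
    """Sort integers in the range 0..a-1 using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    for x in numbers:
        # Integer arithmetic maps x in [0, a) to a bucket index in [0, m).
        buckets[x * m // a].append(x)
    result = []
    for bucket in buckets:
        # Each bucket is sorted independently, then the buckets are concatenated.
        result.extend(sorted(bucket))
    return result


print(bucket_sort([7, 3, 9, 0, 5, 2], m=3, a=10))  # prints [0, 2, 3, 5, 7, 9]
```

Because bucket i only holds values smaller than those in bucket i + 1, concatenating the sorted buckets gives a fully sorted list.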

6
Parallel version of bucket sort

Simple approach: assign one processor to each bucket.

7
Parallel bucket sort - improved version

Requires an all-to-all scatter.
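The improved version can be simulated in a single process to show where the all-to-all scatter fits. The sketch below assumes integer inputs in the range 0 to a - 1 and p simulated processes, each of which first splits its own share of the input into p small buckets. The name `parallel_bucket_sort` and the list-based "processes" are illustrative assumptions. No real message passing takes place.

```python
def parallel_bucket_sort(numbers, p, a):
    """Simulate the improved parallel bucket sort with p processes (sketch)."""
    n = len(numbers)
    # Phase 1: give each simulated process a contiguous share of the input.
    segments = [numbers[i * n // p:(i + 1) * n // p] for i in range(p)]
    # Phase 2: each process separates its share into p small buckets.
    small = [[[] for _ in range(p)] for _ in range(p)]
    for proc, segment in enumerate(segments):
        for x in segment:
            small[proc][x * p // a].append(x)
    # Phase 3: all-to-all scatter. Process j receives small bucket j
    # from every process and merges them into its big bucket.
    big = [[x for proc in range(p) for x in small[proc][j]] for j in range(p)]
    # Phase 4: each process sorts its big bucket; concatenate the results.
    return [x for bucket in big for x in sorted(bucket)]


print(parallel_bucket_sort([7, 3, 9, 0, 5, 2, 8, 1], p=4, a=10))
# prints [0, 1, 2, 3, 5, 7, 8, 9]
```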

8
4.14 The “all-to-all” scatter routine actually transfers rows of an array to columns: it transposes a matrix.
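The transpose view can be made concrete with a short sketch. Here `send[src][dst]` is taken to be the item that process `src` wants to deliver to process `dst`. The function name `all_to_all` and this indexing convention are assumptions for illustration and do not refer to any particular message-passing library.

```python
def all_to_all(send):
    """Return recv where recv[dst][src] == send[src][dst] (a matrix transpose)."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


send = [["a0", "a1", "a2"],
        ["b0", "b1", "b2"],
        ["c0", "c1", "c2"]]
print(all_to_all(send)[0])  # prints ['a0', 'b0', 'c0']
```

Row `src` of `send` is what one process holds before the exchange, and row `dst` of the result is what one process holds afterwards.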

9
All-to-All scatter on a 3-dim Hypercube

$T_{par} = p \log p$

10
4.14 Tim’s version of All-to-All scatter on a 3-dim Hypercube

Step 1 (P0 exchanges with P1):
P0: x00, x10, x20, x30, x40, x50, x60, x70
P1: x01, x11, x21, x31, x41, x51, x61, x71

Step 2 (P0 exchanges with P2):
P0: x00, x01, x20, x21, x40, x41, x60, x61
P2: x02, x03, x22, x23, x42, x43, x62, x63

Step 3 (P0 exchanges with P4):
P0: x00, x01, x02, x03, x40, x41, x42, x43
P4: x04, x05, x06, x07, x44, x45, x46, x47

In each step a processor swaps half of its elements with its neighbor along one dimension.

$T_{par} = p \log p$
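The three exchange steps can be simulated as follows. This is a sketch under assumptions: messages are modeled as `(dest, src)` pairs, node `i` exchanges with node `i XOR 2^k` in step `k`, and a message moves whenever bit `k` of its destination differs from bit `k` of the node holding it. The function name `hypercube_all_to_all` is hypothetical.

```python
def hypercube_all_to_all(dim):
    """Simulate an all-to-all scatter on a hypercube with 2**dim nodes."""
    p = 1 << dim
    # Initially node src holds one message for every destination.
    held = [[(dest, src) for dest in range(p)] for src in range(p)]
    for k in range(dim):
        bit = 1 << k
        new_held = [[] for _ in range(p)]
        for node in range(p):
            partner = node ^ bit  # neighbor along dimension k
            for dest, src in held[node]:
                if (dest & bit) == (node & bit):
                    new_held[node].append((dest, src))     # keep
                else:
                    new_held[partner].append((dest, src))  # send across
        held = [sorted(messages) for messages in new_held]
    return held


final = hypercube_all_to_all(3)
print(final[0])
# prints [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)]
```

After `dim` steps every node holds exactly the messages addressed to it, one from each source. Each node sends p/2 messages in each of the log p steps.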

11
4.11 Time Complexity Analysis for Bucket Sort

Sequential Algorithm:
1. Each element is put into one of the m buckets: $O(n)$.
2. Sort the elements in each bucket: $m \cdot ((n/m) \log(n/m)) = n \log(n/m)$. If m is proportional to n, then n/m is a constant and this step takes $O(n)$ time.
Overall: $T_{seq} = O(n)$. (How is this possible? Answer: the uniform distribution assumption.)

Parallel Algorithm (m = p):
1. Each PE partitions its n/p elements into p = m sub-buckets: $O(n/p)$.
2. Using the all-to-all personalized broadcast algorithm, each PE sends its sub-buckets to the appropriate PEs. An all-to-all scatter of 1-unit messages can be done in $O(p \log p)$ time on a p-processor hypercube. Each sub-bucket holds about $n/p^2$ elements, so this step takes $(n/p^2) \cdot p \log p = (n/p) \log p$ time.
3. Each PE sorts its own bucket using the sequential bucket sort algorithm. If the distribution is uniform and the n/p elements can be put into n/p buckets, sorting can be done in $O(n/p)$ time. (Otherwise, it is $O((n/p) \log(n/p))$.)

Total time: $T_{par} = T_{(1)} + T_{(2)} + T_{(3)} = n/p + (n/p) \log p + n/p$
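To get a feel for the three terms in the parallel total, the cost model can be evaluated numerically. The sketch below is illustrative only: it drops all constant factors, assumes base-2 logarithms, and uses the hypothetical function name `parallel_bucket_sort_cost`. The values are abstract operation counts, not measured times.

```python
import math


def parallel_bucket_sort_cost(n, p):
    """Return (partition, exchange, local_sort, total) under the slide's cost model."""
    partition = n / p                           # step 1: O(n/p)
    exchange = (n / p**2) * p * math.log2(p)    # step 2: (n/p^2) * p log p
    local_sort = n / p                          # step 3: O(n/p), uniform case
    return partition, exchange, local_sort, partition + exchange + local_sort


print(parallel_bucket_sort_cost(1024, 8))  # prints (128.0, 384.0, 128.0, 640.0)
```

For n = 1024 and p = 8 the exchange term is three times larger than either of the other two, which matches the (n/p) log p term dominating the total as p grows.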

12
Next: Pipelining + Prime Number Sieve
