Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. 2004.


1 Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M Pearson Education Inc. All rights reserved. Partitioning and Divide-and-Conquer Strategies Chapter 4

2 Divide and Conquer
Divide the problem into sub-problems of the same form as the larger problem. Further divide these into still smaller sub-problems, usually by recursion. Recursive divide and conquer is amenable to parallelization because separate processes can be used for the divided parts.
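As a minimal sketch of the recursive idea, the following adds a list of numbers by splitting it in half until single elements remain. The function name `dc_sum` is hypothetical and the code runs sequentially; it only marks where separate processes could take over the divided parts.

```python
def dc_sum(values):
    """Recursively add a list by splitting it into two halves."""
    if len(values) == 0:
        return 0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    # The two halves are independent sub-problems of the same form,
    # so each recursive call could be handed to a separate process.
    return dc_sum(values[:mid]) + dc_sum(values[mid:])
```

For example, `dc_sum([1, 2, 3, 4, 5, 6, 7, 8])` splits into two lists of four, then four lists of two, and combines the results on the way back up.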

3 Divide-and-Conquer Using a Hypercube

4 Partial summation
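The slide's figure is not preserved in this transcript. As a rough sketch of what partial summation usually means in this context, the list is partitioned into p contiguous parts, each part is summed on its own, and the partial sums are then added. The names `partial_sums` and `partitioned_sum` are hypothetical, and the parts are processed in a plain loop rather than by real processes.

```python
def partial_sums(values, p):
    """Split values into p contiguous parts and sum each part separately."""
    size = (len(values) + p - 1) // p  # ceiling division: elements per part
    # Each slice stands in for the data one process would own.
    return [sum(values[i * size:(i + 1) * size]) for i in range(p)]


def partitioned_sum(values, p):
    """Combine the p partial sums into the final total."""
    return sum(partial_sums(values, p))
```

With eight numbers and p = 4, each simulated process adds two numbers and the final step adds four partial sums.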

5 Bucket sort
One “bucket” is assigned to hold the numbers that fall within each region. The numbers in each bucket are sorted using a sequential sorting algorithm. Sequential sorting time complexity: $O(n \log(n/m))$ for n numbers and m buckets. Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
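A minimal sequential sketch of the idea, assuming integer inputs in the interval 0 to a - 1 and m equal-width regions. The function name `bucket_sort` and the index formula `v * m // a` are illustrative choices, not taken from the slides.

```python
def bucket_sort(values, a, m):
    """Sort integers in [0, a) using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    for v in values:
        # Region index: which of the m equal slices of [0, a) holds v.
        buckets[v * m // a].append(v)
    result = []
    for b in buckets:
        # Any sequential sorting algorithm can be used per bucket.
        result.extend(sorted(b))
    return result
```

Because every number in bucket k is smaller than every number in bucket k + 1, concatenating the sorted buckets yields the sorted list.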

6 Parallel version of bucket sort
Simple approach: assign one processor to each bucket.

7 Parallel bucket sort - improved version
Requires an all-to-all scatter.
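The improved version can be sketched as a sequential simulation, under the assumption (consistent with the later complexity slide) that each of the p processors first separates its own n/p numbers into p small buckets, then the small buckets are exchanged so that processor j ends up with every small bucket j, and finally each processor sorts its large bucket. The function name and the list-of-lists layout are hypothetical.

```python
def parallel_bucket_sort_sim(values, a, p):
    """Sequentially simulate the improved parallel bucket sort on p processors."""
    size = (len(values) + p - 1) // p  # elements per simulated processor
    # Phase 1: each processor separates its slice into p small buckets.
    small = []
    for rank in range(p):
        local = values[rank * size:(rank + 1) * size]
        row = [[] for _ in range(p)]
        for v in local:
            row[v * p // a].append(v)
        small.append(row)
    # Phase 2: all-to-all exchange; processor j collects small bucket j
    # from every processor i.
    big = [[v for i in range(p) for v in small[i][j]] for j in range(p)]
    # Phase 3: each processor sorts its own large bucket.
    return [sorted(b) for b in big]
```

The result is one sorted list per simulated processor; reading them in processor order gives the fully sorted sequence.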

8 The “all-to-all” scatter routine actually transfers the rows of an array to columns: it transposes a matrix.
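To make the transpose view concrete, here is a small sketch in which `send[i][j]` is the block that processor i holds for processor j. The name `all_to_all` and this indexing convention are assumptions for illustration; no real message passing takes place.

```python
def all_to_all(send):
    """Return recv, where recv[j][i] is the block processor j got from i.

    send[i][j] is the block processor i holds for processor j, so the
    exchange is exactly a matrix transpose.
    """
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]
```

Row i of `send` (everything processor i sends) becomes column i of `recv` (what each processor received from i).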

9 All-to-All scatter on a 3-dim Hypercube
$T_{par} = p \log p$

10 Tim’s version of All-to-All scatter on a 3-dim Hypercube
Step 1 (P0 and P1 exchange; P0 sends x10, x30, x50, x70 and receives x01, x21, x41, x61):
  P0: x00, x10, x20, x30, x40, x50, x60, x70
  P1: x01, x11, x21, x31, x41, x51, x61, x71
Step 2 (P0 and P2 exchange; P0 sends x20, x21, x60, x61 and receives x02, x03, x42, x43):
  P0: x00, x01, x20, x21, x40, x41, x60, x61
  P2: x02, x03, x22, x23, x42, x43, x62, x63
Step 3 (P0 and P4 exchange; P0 sends x40, x41, x42, x43 and receives x04, x05, x06, x07):
  P0: x00, x01, x02, x03, x40, x41, x42, x43
  P4: x04, x05, x06, x07, x44, x45, x46, x47
$T_{par} = p \log p$
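The three exchange steps can be simulated with a short sketch. It assumes that an item written xij on the slide is the one processor j sends to processor i, stored here as the tuple `(dest, src)`, and that in step d every processor exchanges with the neighbor whose number differs in bit d. The function name `hypercube_all_to_all` is hypothetical.

```python
def hypercube_all_to_all(dim):
    """Simulate all-to-all scatter on a hypercube with 2**dim processors."""
    p = 1 << dim
    # held[r] lists the (dest, src) items currently on processor r.
    held = [[(dest, r) for dest in range(p)] for r in range(p)]
    for d in range(dim):
        bit = 1 << d
        new_held = [[] for _ in range(p)]
        for r in range(p):
            for dest, src in held[r]:
                # An item crosses dimension d when its destination
                # differs from its current holder in bit d.
                target = r ^ bit if (dest ^ r) & bit else r
                new_held[target].append((dest, src))
        held = new_held
    return held
```

After the first step of `hypercube_all_to_all(3)`, processor 0 holds the items with even destination from sources 0 and 1, which matches the P0 row shown for Step 2 on the slide. After all three steps, every processor holds exactly the items addressed to it.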

11 Time Complexity Analysis for Bucket Sort
Sequential Algorithm:
1. Each element is put into one of the m buckets: $O(n)$.
2. Sort the elements in each bucket: $m \cdot ((n/m) \log(n/m)) = n \log(n/m)$. If m is proportional to n, then n/m is a constant and this is $O(n)$.
Overall: $T_{seq} = O(n)$. (How is this possible? Answer: the uniform distribution assumption.)
Parallel Algorithm (m = p):
1. Each PE partitions its n/p elements into p = m sub-buckets: $O(n/p)$.
2. Using the all-to-all personalized broadcast algorithm, each PE sends its sub-buckets to the appropriate PEs. An all-to-all scatter of 1-unit messages can be done in $O(p \log p)$ time on a p-processor hypercube. With a uniform distribution each sub-bucket holds about $n/p^2$ elements, so this step takes $(n/p^2) \cdot p \log p = (n/p) \log p$ time.
3. Each PE sorts its own bucket using the sequential bucket sort algorithm. If the distribution is uniform and the n/p elements can be put in n/p buckets, sorting takes $O(n/p)$ time. (Otherwise, it takes $O((n/p) \log(n/p))$ time.)
Total time: $T_{par} = T_{(1)} + T_{(2)} + T_{(3)} = n/p + (n/p) \log p + n/p = O((n/p) \log p)$
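As a worked check of the parallel total, the three terms can be evaluated for concrete n and p. This sketch assumes base-2 logarithms and unit constant factors, neither of which the slide specifies, so the numbers are only relative step counts. The function name `bucket_sort_cost` is hypothetical.

```python
import math


def bucket_sort_cost(n, p):
    """Return the three terms of T_par for n elements on p processors.

    Assumes log base 2 and constant factors of 1 (illustrative only).
    """
    t1 = n / p                              # partition into sub-buckets
    t2 = (n / p**2) * p * math.log2(p)      # all-to-all scatter
    t3 = n / p                              # local bucket sort
    return t1, t2, t3
```

For n = 1024 and p = 4 the terms are 256, 512, and 256, so the all-to-all step dominates, in line with the $(n/p) \log p$ term of the total.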

12 Next: Pipelining + Prime Number Sieve

