
1 Friday, November 17, 2006 “In the confrontation between the stream and the rock, the stream always wins, not through strength but by perseverance.” -H. Jackson Brown

2 Bubble Sort: the sequential bubble sort algorithm.
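As a companion to the slide, here is a minimal Python sketch of sequential bubble sort (the function name is my own; the deck shows no code):

```python
# Sequential bubble sort: repeatedly sweep the list, swapping
# adjacent out-of-order pairs; each pass bubbles the largest
# remaining element to the end.
def bubble_sort(seq):
    a = list(seq)
    n = len(a)
    for i in range(n - 1):            # n - 1 passes
        for j in range(n - 1 - i):    # shrink the unsorted prefix
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```

For example, `bubble_sort([2, 5, 8, 1, 4, 6])` returns `[1, 2, 4, 5, 6, 8]`, the sequence traced on the next slides.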

3–8 Bubble Sort: the first pass on 2 5 8 1 4 6, one comparison per step

2 5 8 1 4 6   (start)
2 5 8 1 4 6   (compare 2, 5: no swap)
2 5 8 1 4 6   (compare 5, 8: no swap)
2 5 1 8 4 6   (compare 8, 1: swap)
2 5 1 4 8 6   (compare 8, 4: swap)
2 5 1 4 6 8   (compare 8, 6: swap)

Complexity?

9 Bubble Sort: O(n²). Inherently sequential.

10 Odd-Even Transposition: sorting n = 8 elements using the odd-even transposition sort algorithm.

11 Odd-Even Transposition: the sequential odd-even transposition sort algorithm. Complexity?
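A minimal Python sketch of the sequential algorithm (the function name is my own): odd phases compare pairs (1,2), (3,4), …; even phases compare pairs (2,3), (4,5), …; after n phases the list is sorted.

```python
# Sequential odd-even transposition sort: alternate odd and even
# phases of disjoint compare-exchanges; n phases suffice.
def odd_even_transposition_sort(seq):
    a = list(seq)
    n = len(a)
    for phase in range(n):
        start = 0 if phase % 2 == 0 else 1   # odd phase, then even phase
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Running it on the reversed sequence 8 7 6 5 4 3 2 1 reproduces the eight phases traced on slide 12.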

12
8 7 6 5 4 3 2 1   (start)
7 8 5 6 3 4 1 2   (odd)
7 5 8 3 6 1 4 2   (even)
5 7 3 8 1 6 2 4   (odd)
5 3 7 1 8 2 6 4   (even)
3 5 1 7 2 8 4 6   (odd)
3 1 5 2 7 4 8 6   (even)
1 3 2 5 4 7 6 8   (odd)
1 2 3 4 5 6 7 8   (even)

13 Odd-Even
- After n phases of odd-even exchanges, the sequence is sorted.
- Each phase of the algorithm (either odd or even) requires Θ(n) comparisons.
- Serial complexity is Θ(n²).

14 Parallel Odd-Even Transposition
- Consider the one-item-per-processor case.
- Assume processes are arranged in a one-dimensional array.
- There are n iterations; in each iteration, each processor does one compare-exchange.
- The parallel run time is ?

15 Parallel Odd-Even Transposition
- Consider the one-item-per-processor case.
- There are n iterations; in each iteration, each processor does one compare-exchange.
- The parallel run time of this formulation is Θ(n).
- Cost optimal?

16 Parallel Odd-Even Transposition
- Consider a block of n/p elements per processor.
- The first step is a local sort.
- In each subsequent step, the compare-exchange operation is replaced by the compare-split operation.
- How many odd-even phases will be executed?

17 Compare-Split Operation: each process sends its block of size n/p to the other process; each process merges the received block with its own block and retains only the appropriate half of the merged block.
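The compare-split operation described above can be sketched in a few lines of Python (function name mine; both blocks are assumed already sorted, as they are after the local sort):

```python
# Compare-split between two "processes": merge the two sorted blocks,
# then the low process keeps the smaller half and the high process
# keeps the larger half.
def compare_split(low_block, high_block):
    merged = sorted(low_block + high_block)  # simulates exchange + merge
    half = len(low_block)
    return merged[:half], merged[half:]
```

For example, `compare_split([1, 4, 7], [2, 3, 9])` yields `([1, 2, 3], [4, 7, 9])`: afterwards every element on the low side is no greater than every element on the high side.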

18 Parallel Odd-Even Transposition
- The first step is a local sort.
- There are p phases.
- The parallel run time of the formulation is Θ((n/p) log(n/p)) for the local sort, plus Θ(n) for comparisons and Θ(n) for communication across the p compare-split phases.

19–22 Odd-Even Sorting (figure slides)

23
2 3 4 5 6 7 8 1   (start)
2 3 4 5 6 7 1 8   (odd)
2 3 4 5 6 1 7 8   (even)
2 3 4 5 1 6 7 8   (odd)
2 3 4 1 5 6 7 8   (even)
2 3 1 4 5 6 7 8   (odd)
2 1 3 4 5 6 7 8   (even)
1 2 3 4 5 6 7 8   (odd)
1 2 3 4 5 6 7 8   (even: no change)

24 Shellsort
- Let n be the number of elements to be sorted and p be the number of processes.
- During the first phase, processes that are far away from each other in the array compare-split their elements.
- During the second phase, the algorithm switches to an odd-even transposition sort.
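The first phase can be simulated with one element per "process". This sketch assumes mirror pairing within successively halved groups (process i paired with its reflection in the group), which matches the eight-process example on slide 25; the function name and pairing scheme are illustrative assumptions.

```python
# First phase of parallel shellsort, simulated with one element per
# process: in each of log2(p) rounds, mirror-paired positions within
# each group do a compare-exchange, then the groups halve.
def shellsort_phase1(seq):
    a = list(seq)
    p = len(a)
    group = p
    while group > 1:
        for base in range(0, p, group):
            for i in range(group // 2):
                lo, hi = base + i, base + group - 1 - i
                if a[lo] > a[hi]:
                    a[lo], a[hi] = a[hi], a[lo]
        group //= 2
    return a
```

Starting from 0 3 4 5 6 7 2 1, three rounds produce 0 2 4 5 1 3 6 7, the input handed to the odd-even phase on slide 26.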

25 Parallel Shellsort: an example of the first phase on an eight-process array.
0 3 4 5 6 7 2 1
0 2 4 5 6 7 3 1
0 2 4 5 1 3 7 6
0 2 4 5 1 3 6 7

26
0 2 4 5 1 3 6 7   (start)
0 2 4 5 1 3 6 7   (odd: no change)
0 2 4 1 5 3 6 7   (even)
0 2 1 4 3 5 6 7   (odd)
0 1 2 3 4 5 6 7   (even)

27 Parallel Shellsort
- Each process performs d = log p compare-split operations.
- With O(p) bisection width, each communication can be performed in time Θ(n/p), for a total of Θ((n log p)/p).
- In the second phase, l odd and even phases are performed, each requiring time Θ(n/p).
- The parallel run time of the algorithm is Θ((n/p) log(n/p)) for the local sort, plus Θ((n/p) log p) for the first phase and Θ(l n/p) for the second phase.

28 Quicksort
- Quicksort selects one of the entries in the sequence to be the pivot and divides the sequence into two: one with all elements less than the pivot, the other with all elements greater.
- The process is recursively applied to each of the sublists.
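The description above maps directly onto a short Python sketch (function name mine; for simplicity the first element serves as pivot and ties go to the upper sublist):

```python
# Quicksort: pick a pivot, split into elements below and at-or-above
# the pivot, recurse on each sublist.
def quicksort(seq):
    if len(seq) <= 1:
        return list(seq)
    pivot = seq[0]
    lower = [x for x in seq[1:] if x < pivot]
    upper = [x for x in seq[1:] if x >= pivot]
    return quicksort(lower) + [pivot] + quicksort(upper)
```

As the next slide notes, the split is balanced only when the pivot is close to the median, which is why pivot quality matters.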


30 Quicksort
- The performance of quicksort depends critically on the quality of the pivot.

31 Quicksort
- Parallel formulation of quicksort.
- Do we start with a single process?

32 Parallel Quicksort
- Consider a list of size n equally divided across p processors.
- A pivot is selected by one of the processors and made known to all processors.
- Each processor partitions its list into two, say L_i and U_i, based on the selected pivot.

33 Parallel Quicksort
- All of the L_i lists are merged, and all of the U_i lists are merged separately.
- The set of processors is partitioned into two (in proportion to the sizes of lists L and U).
- The process is recursively applied to each of the lists.
- The recursion stops when a sub-block is assigned to a single process, at which point the list is sorted locally using serial quicksort.
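One partitioning step of this formulation can be simulated sequentially (function name mine; I assume elements equal to the pivot go to the lower side, a detail the slides leave open):

```python
# One step of parallel quicksort, simulated: each "processor" holds a
# local block; all partition around the shared pivot, then the L_i
# pieces and the U_i pieces are concatenated separately.
def parallel_partition_step(blocks, pivot):
    L, U = [], []
    for block in blocks:              # each block lives on one processor
        L.extend(x for x in block if x <= pivot)
        U.extend(x for x in block if x > pivot)
    return L, U
```

For example, with blocks `[5, 1, 8]` and `[3, 7, 2]` and pivot 4, the merged lists are `L = [1, 3, 2]` and `U = [5, 8, 7]`; the processor set would then be split in proportion 3:3 and each half recurses on its list.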


38
- Pivot selection

39 OpenMP and MPI
- Suitable for multiprocessors?
- Suitable for multicomputers?

40 Combining MPI and OpenMP
- Many commercial multicomputers are collections of centralized multiprocessors.

41 Combining MPI and OpenMP
- Hybrid parallel program

42 Combining MPI and OpenMP
- Suppose we are executing on a cluster of m multiprocessors, where each multiprocessor has k CPUs.
- To utilize every CPU, an MPI program has to create mk processes.
  - During communication, mk processes are active.
- Hybrid needs ?

43–44 Combining MPI and OpenMP
- Suppose we are executing on a cluster of m multiprocessors, where each multiprocessor has k CPUs.
- To utilize every CPU, an MPI program has to create mk processes.
  - During communication, mk processes are active.
- Hybrid needs m processes, and the workload is divided among k threads on each multiprocessor.
  - Lower communication overhead

45 Combining MPI and OpenMP
- Suppose a serial program executes in 100 seconds.
- 5 seconds are spent in inherently sequential operations.
- 90 seconds' worth of work is perfectly parallelizable.
- The remaining 5 percent (5 seconds) can be done in parallel but has a large communication overhead, so we replicate these operations instead.
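Under an assumed timing model for this scenario (the 5 s sequential part stays serial, the 90 s part divides evenly across all CPUs, and the replicated 5 s runs in full on every process), the expected run time and speedup work out as follows; the function names and the model itself are assumptions, not stated on the slide:

```python
# Back-of-the-envelope model: sequential part + evenly divided part
# + replicated part (executed in full everywhere).
def hybrid_time(m, k, seq=5.0, par=90.0, repl=5.0):
    cpus = m * k                  # m multiprocessors, k CPUs each
    return seq + par / cpus + repl

def speedup(m, k):
    return 100.0 / hybrid_time(m, k)
```

For example, with m = 8 and k = 4 (32 CPUs), the time is 5 + 90/32 + 5 = 12.8125 s, a speedup of about 7.8: the replicated 5 seconds caps the achievable speedup below 10 no matter how many CPUs are added.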

