Outline
- Introduction
- Sorting Networks
- Bubble Sort and its Variants

Introduction
- Sorting is one of the most common operations performed by a computer
- Internal or external
- Comparison-based, Θ(n log n), and non-comparison-based, Θ(n)

Background
- Where are the input and output sequences stored?
  - Stored on one process
  - Distributed among the processes
    ○ Useful when sorting is an intermediate step
- What is the order of the output sequence among the processes?
  - Global enumeration

How comparisons are performed
- Compare-exchange is not as simple in parallel sorting algorithms as it is sequentially
- One element per process
  - Each compare-exchange costs ts + tw of communication; since ts >> tw, this gives poor performance

How comparisons are performed (cont'd)
- More than one element per process
  - Each process holds n/p elements as a sorted block; after a compare-split between blocks Ai and Aj, every element of Ai is no greater than every element of Aj
  - Compare-split: (ts + tw·n/p) communication => Θ(n/p) time

Outline
- Introduction
- Sorting Networks
  - Bitonic sort
  - Mapping bitonic sort to hypercube and mesh
- Bubble Sort and its Variants

Sorting Networks: Θ(log² n)
- Key component: the comparator
  - Increasing comparator
  - Decreasing comparator
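
As an illustration only (not code from the slides), the two comparator types can be sketched in C as compare-exchange functions on a pair of values:

#include <stdio.h>

/* Increasing comparator: outputs (min, max). */
static void compare_exchange_inc(int *a, int *b) {
    if (*a > *b) { int t = *a; *a = *b; *b = t; }
}

/* Decreasing comparator: outputs (max, min). */
static void compare_exchange_dec(int *a, int *b) {
    if (*a < *b) { int t = *a; *a = *b; *b = t; }
}

int main(void) {
    int x = 7, y = 3;
    compare_exchange_inc(&x, &y);   /* x = 3, y = 7 */
    printf("%d %d\n", x, y);
    compare_exchange_dec(&x, &y);   /* x = 7, y = 3 */
    printf("%d %d\n", x, y);
    return 0;
}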

A typical sorting network
- Depth: the number of columns the network contains
  - The speed of the network is proportional to its depth

Bitonic sort: Θ(log² n)
- Bitonic sequence
  - Monotonically increasing, then monotonically decreasing
  - Or there exists a cyclic shift of the indices so that the above is satisfied
  - Example: ⟨1, 2, 4, 7, 6, 0⟩, or ⟨8, 9, 2, 1, 0, 4⟩ (a cyclic shift of an increasing-then-decreasing sequence)
- How to rearrange a bitonic sequence to obtain a monotonic sequence?
  - Let s = ⟨a0, a1, ..., an−1⟩ be a bitonic sequence
  - Split it into s1 = ⟨min(a0, an/2), min(a1, an/2+1), ..., min(an/2−1, an−1)⟩ and s2 = ⟨max(a0, an/2), max(a1, an/2+1), ..., max(an/2−1, an−1)⟩
  - s1 and s2 are both bitonic, and every element of s1 is smaller than every element of s2
  - Applying the bitonic split recursively (bitonic merge) sorts the sequence => bitonic-merging network
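
As a concrete, purely illustrative rendering of the split/merge idea, a sequential C sketch of bitonic merging might look like this; it assumes the input is bitonic and its length is a power of two.

#include <stdio.h>

/* Bitonic merge: sorts a bitonic sequence a[lo..lo+n-1] into ascending
   (dir = 1) or descending (dir = 0) order. n must be a power of two. */
static void bitonic_merge(int a[], int lo, int n, int dir) {
    if (n <= 1) return;
    int m = n / 2;
    for (int i = lo; i < lo + m; i++)            /* bitonic split */
        if (dir == (a[i] > a[i + m])) {
            int t = a[i]; a[i] = a[i + m]; a[i + m] = t;
        }
    bitonic_merge(a, lo, m, dir);                /* both halves are bitonic */
    bitonic_merge(a, lo + m, m, dir);
}

int main(void) {
    int s[8] = {1, 2, 4, 7, 6, 5, 3, 0};         /* a bitonic sequence */
    bitonic_merge(s, 0, 8, 1);
    for (int i = 0; i < 8; i++) printf("%d ", s[i]);
    printf("\n");                                /* 0 1 2 3 4 5 6 7 */
    return 0;
}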

Example of bitonic merging

Bitonic merging network
- log n columns

Sorting n unordered elements
- Bitonic sort, built as a bitonic-sorting network
- Depth recurrence: d(n) = d(n/2) + log n => d(n) = Θ(log² n)
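
To make the recursion concrete, here is a self-contained sequential C sketch of bitonic sort (my illustration, assuming the input length is a power of two): sort the two halves in opposite directions to form a bitonic sequence, then bitonic-merge it.

#include <stdio.h>

static void bitonic_merge(int a[], int lo, int n, int dir) {
    if (n <= 1) return;
    int m = n / 2;
    for (int i = lo; i < lo + m; i++)
        if (dir == (a[i] > a[i + m])) {
            int t = a[i]; a[i] = a[i + m]; a[i + m] = t;
        }
    bitonic_merge(a, lo, m, dir);
    bitonic_merge(a, lo + m, m, dir);
}

/* Bitonic sort: sorting the two halves in opposite directions makes
   a[lo..lo+n-1] bitonic; then a bitonic merge sorts it. The depth
   follows the recurrence d(n) = d(n/2) + log n. */
static void bitonic_sort(int a[], int lo, int n, int dir) {
    if (n <= 1) return;
    int m = n / 2;
    bitonic_sort(a, lo, m, 1);        /* ascending half  */
    bitonic_sort(a, lo + m, m, 0);    /* descending half */
    bitonic_merge(a, lo, n, dir);
}

int main(void) {
    int a[8] = {5, 7, 1, 0, 6, 2, 4, 3};
    bitonic_sort(a, 0, 8, 1);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}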

The first three stages

How to map bitonic sort to a hypercube?
- One element per process
- How do we map the bitonic sort algorithm onto a general-purpose parallel computer?
  - Each process plays the role of one wire
  - The compare-exchange function is performed by a pair of processes
  - Bitonic sort is communication intensive => the topology of the interconnection network must be considered
    ○ A poor mapping means elements travel long distances before being compared, degrading performance
- Observation: communication happens between pairs of wires whose labels differ in exactly one bit
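
The one-bit observation maps directly onto a hypercube, where labels that differ in one bit are neighbors. A small illustrative C sketch (mine, not the slides') enumerates the compare-exchange partner of each wire for each bit position; the order in which the bits are used depends on the stage of the algorithm.

#include <stdio.h>

int main(void) {
    int d = 3;                                   /* 2^d = 8 processes / wires */
    int p = 1 << d;
    for (int j = d - 1; j >= 0; j--) {           /* one set of exchanges per bit */
        printf("exchanges using bit %d:\n", j);
        for (int rank = 0; rank < p; rank++) {
            int partner = rank ^ (1 << j);       /* label differs in one bit */
            if (rank < partner)                  /* print each pair once */
                printf("  wires %d and %d compare-exchange\n", rank, partner);
        }
    }
    return 0;
}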

The last stage of bitonic sort

Communication characteristics

Bitonic sort algorithm on 2^d processes
- TP = Θ(log² n); cost-optimal with respect to the sequential bitonic sort algorithm

Mapping bitonic sort to a mesh

The last stage of the bitonic sort

A block of elements per process
- Each process holds n/p elements
  - Solution 1: think of each process as consisting of n/p smaller processes
    ○ Poor parallel implementation
  - Solution 2: replace compare-exchange with compare-split, Θ(n/p) communication + Θ(n/p) computation
  - The difference: in Solution 2 each block is sorted locally first
- Hypercube and mesh formulations
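
A sequential C sketch of the compare-split operation (my illustration; in the real algorithm the two blocks live on different processes and are exchanged over the network): the two sorted blocks are merged, and one side keeps the smaller half while the other keeps the larger half.

#include <stdio.h>

/* Compare-split: given two locally sorted blocks of size k, ai keeps the
   k smallest of the 2k elements and aj the k largest, both still sorted.
   This is the block analogue of compare-exchange. */
static void compare_split(int ai[], int aj[], int k) {
    int merged[2 * 64];              /* this sketch assumes k <= 64 */
    int x = 0, y = 0;
    for (int m = 0; m < 2 * k; m++)  /* standard two-way merge */
        if (y >= k || (x < k && ai[x] <= aj[y])) merged[m] = ai[x++];
        else                                     merged[m] = aj[y++];
    for (int m = 0; m < k; m++) {
        ai[m] = merged[m];           /* smaller half */
        aj[m] = merged[k + m];       /* larger half  */
    }
}

int main(void) {
    int a[4] = {1, 6, 8, 11};
    int b[4] = {2, 7, 9, 10};
    compare_split(a, b, 4);
    for (int i = 0; i < 4; i++) printf("%d ", a[i]);   /* 1 2 6 7    */
    printf("| ");
    for (int i = 0; i < 4; i++) printf("%d ", b[i]);   /* 8 9 10 11  */
    printf("\n");
    return 0;
}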

Performance on different architectures
- Neither very efficient nor very scalable, since the underlying sequential algorithm (bitonic sort) is suboptimal

Outline
- Introduction
- Sorting Networks
- Bubble Sort and its Variants

Bubble sort
- O(n²)
- Inherently sequential

Odd-even transposition
- n phases, each performing Θ(n) comparisons
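
A minimal sequential C sketch of odd-even transposition sort (mine, not the slides') makes the alternating phases explicit; in the parallel formulation, each pair comparison within a phase is done by a different pair of processes.

#include <stdio.h>

/* Odd-even transposition sort: n phases that alternate compare-exchanges
   on (even, odd) and (odd, even) index pairs. Each phase is Θ(n). */
static void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++) {
        int start = (phase % 2 == 0) ? 0 : 1;
        for (int i = start; i + 1 < n; i += 2)
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
            }
    }
}

int main(void) {
    int a[8] = {3, 2, 3, 8, 5, 6, 4, 1};
    odd_even_sort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}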

Odd-even transposition

Parallel formulation
- O(n)

Shellsort
- Drawback of odd-even sort
  - A sequence with only a few elements out of order still needs Θ(n²) work to sort
- Idea
  - Add a preprocessing phase that moves elements across long distances
  - This reduces the number of odd and even phases needed afterwards
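
For reference, the long-distance idea comes from the classic sequential shellsort; a minimal C sketch (my illustration, using the simple gap sequence n/2, n/4, ..., 1) is shown below. In the parallel variant described above, the preprocessing phase plays the role of the large gaps.

#include <stdio.h>

/* Sequential shellsort: insertion sort over decreasing gaps, so early
   passes move elements across long distances and the final pass
   (gap = 1) only has to fix nearly sorted data. */
static void shellsort(int a[], int n) {
    for (int gap = n / 2; gap > 0; gap /= 2)
        for (int i = gap; i < n; i++) {
            int v = a[i], j = i;
            while (j >= gap && a[j - gap] > v) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = v;
        }
}

int main(void) {
    int a[8] = {5, 7, 1, 0, 6, 2, 4, 3};
    shellsort(a, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}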

Shellsort

Conclusion
- Sorting Networks
  - Bitonic network
  - Mapping to hypercube and mesh
- Bubble Sort and its Variants
  - Odd-even sort
  - Shellsort

Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms

Quick Sort
- Features
  - Simple, low overhead
  - Θ(n log n) on average, Θ(n²) in the worst case
- Idea
  - Choose a pivot (how?)
  - Partition into two parts, Θ(n)
  - Recursively solve the two sub-problems
- Complexity
  - Unbalanced: T(n) = T(n−1) + Θ(n) => Θ(n²)
  - Balanced: T(n) = 2T(n/2) + Θ(n) => Θ(n log n)

The sequential algorithm
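
The listing on this slide is not in the transcript; as a stand-in, here is a standard sequential quicksort sketch in C (my reconstruction, using the last element as the pivot rather than whatever pivot rule the original slide used).

#include <stdio.h>

/* Partition a[lo..hi] around the pivot a[hi]; returns the pivot's final
   position. Θ(n) work over the subarray. */
static int partition(int a[], int lo, int hi) {
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

static void quicksort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quicksort(a, lo, p - 1);      /* elements smaller than the pivot */
    quicksort(a, p + 1, hi);      /* elements larger than the pivot  */
}

int main(void) {
    int a[8] = {3, 2, 1, 5, 8, 4, 3, 7};
    quicksort(a, 0, 7);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}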

Parallelizing quicksort
- Solution 1: recursive decomposition
  - Drawback: the partitioning step is handled by a single process and takes Ω(n) time, so with n processes the cost is Ω(n²) and the formulation is not cost optimal
- Solution 2: perform the partitioning step in parallel
  - An array of size n can be partitioned into two smaller arrays in Θ(1) time using Θ(n) processes
    ○ How?
    ○ On the CRCW PRAM, shared-address-space, and message-passing models

Parallel formulation for the CRCW PRAM (cost optimal)
- Assumptions
  - n elements, n processes
  - Write conflicts are resolved arbitrarily
- Executing quicksort can be visualized as constructing a binary tree of pivots

Example

Algorithm

procedure BUILD_TREE(A[1...n])
begin
    for each process i do
    begin
        root := i;
        parent_i := root;
        leftchild[i] := rightchild[i] := n + 1;
    end for
    repeat for each process i ≠ root do
    begin
        // concurrent writes to leftchild/rightchild are resolved arbitrarily;
        // a process exits once its write succeeds, otherwise it descends
        if (A[i] < A[parent_i]) or (A[i] = A[parent_i] and i < parent_i) then
        begin
            leftchild[parent_i] := i;
            if i = leftchild[parent_i] then exit
            else parent_i := leftchild[parent_i];
        end for
        else
        begin
            rightchild[parent_i] := i;
            if i = rightchild[parent_i] then exit
            else parent_i := rightchild[parent_i];
        end else
    end repeat
end BUILD_TREE

Assuming a balanced tree: distributing each partition's pivot to all of its processes is O(1) per level, and there are Θ(log n) levels, so the tree is built in Θ(log n) · Θ(1) time.

Parallel formulation for a shared-address-space architecture
- Assumptions
  - n elements, p processes
  - Shared memory
- How to parallelize?
- Idea of the algorithm
  - Each process is assigned a block of n/p elements
  - Select a pivot element and broadcast it
  - Local rearrangement within each block
  - Global rearrangement => a smaller block S and a larger block L
  - Redistribute the blocks to the processes
    ○ How many processes are assigned to each block?
  - Repeat until the array is broken into p parts
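
One common way to answer "where does each process's data go" is with prefix sums over per-process counts; the small sequential C sketch below (my illustration, with hypothetical counts) computes, for each process, its write offsets into the smaller (S) and larger (L) regions.

#include <stdio.h>

int main(void) {
    /* Per-process counts after local rearrangement (hypothetical values):
       small_cnt[i] = elements < pivot held by process i, large_cnt[i] = the rest. */
    int p = 4;
    int small_cnt[4] = {3, 1, 4, 2};
    int large_cnt[4] = {2, 4, 1, 3};

    int small_off[4], large_off[4];
    int s_total = 0, l_total = 0;
    for (int i = 0; i < p; i++) {            /* exclusive prefix sums */
        small_off[i] = s_total;  s_total += small_cnt[i];
        large_off[i] = l_total;  l_total += large_cnt[i];
    }

    /* Process i writes its small elements starting at global index
       small_off[i] and its large elements at s_total + large_off[i]. */
    for (int i = 0; i < p; i++)
        printf("process %d: S at %d, L at %d\n",
               i, small_off[i], s_total + large_off[i]);
    return 0;
}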

Example: how to compute the location?

Example (cont'd)

How to do the global rearrangement?

Analysis
- Assumption: pivot selection results in balanced partitions
- log p steps, each consisting of:
  - Broadcasting the pivot: Θ(log p)
  - Local rearrangement: Θ(n/p)
  - Prefix sum: Θ(log p)
  - Global rearrangement: Θ(n/p)

Parallel formulation for a message-passing architecture
- Similar to the shared-address-space formulation
- Difference: the array is distributed across the p processes

Pivot selection
- Random selection
  - Drawback: a bad pivot leads to significant performance degradation
- Median selection
  - Assumption: the initial distribution of elements across the processes is uniform

Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms

Bucket Sort
- Assumption: n elements distributed uniformly over the interval [a, b]
- Idea
  - Divide [a, b] into m equal-sized subintervals (buckets)
  - Place each element into its bucket
  - Sort each bucket
- Θ(n log(n/m)); with m = Θ(n), this is Θ(n)
- Compare with quicksort
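
A sequential C sketch of bucket sort (mine, with hard-coded values assumed uniform over [0, 1)) shows the three steps: scatter into equal-width buckets, sort each bucket, and concatenate.

#include <stdio.h>

#define N 8                       /* number of elements */
#define M 4                       /* number of buckets  */

int main(void) {
    double a = 0.0, b = 1.0;      /* elements assumed uniform over [a, b) */
    double x[N] = {0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12};
    double bucket[M][N];
    int count[M] = {0};

    for (int i = 0; i < N; i++) {                 /* scatter into buckets */
        int k = (int)((x[i] - a) / (b - a) * M);
        if (k == M) k = M - 1;                    /* guard the upper edge */
        bucket[k][count[k]++] = x[i];
    }
    for (int k = 0; k < M; k++)                   /* insertion-sort each bucket */
        for (int i = 1; i < count[k]; i++) {
            double v = bucket[k][i];
            int j = i - 1;
            while (j >= 0 && bucket[k][j] > v) { bucket[k][j + 1] = bucket[k][j]; j--; }
            bucket[k][j + 1] = v;
        }
    for (int k = 0; k < M; k++)                   /* concatenate */
        for (int i = 0; i < count[k]; i++) printf("%.2f ", bucket[k][i]);
    printf("\n");
    return 0;
}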

Parallelization on a message-passing architecture
- n elements, p processes => p buckets
- Preliminary idea
  - Distribute n/p elements to each process
  - Assign each process a subinterval and redistribute the elements accordingly
  - Sort locally
  - Drawback: the uniform-distribution assumption is often not realistic => performance degradation
- Solution: sample sort => choose splitters from a sample
  - Guarantees that each bucket receives fewer than 2n/m elements
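
The splitter-selection step can be sketched sequentially in C as follows (my illustration, with hypothetical data): each locally sorted block contributes p − 1 evenly spaced samples, the combined sample is sorted, and regularly spaced elements of it become the global splitters.

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

#define P 3                      /* number of processes / buckets */
#define B 6                      /* block size n/p                */

int main(void) {
    /* Locally sorted blocks, one per process (hypothetical data). */
    int block[P][B] = {
        { 1,  4,  9, 12, 20, 33},
        { 2,  5,  7, 15, 22, 40},
        { 3,  8, 10, 18, 25, 37},
    };

    /* Each process picks P-1 evenly spaced samples from its block. */
    int sample[P * (P - 1)], s = 0;
    for (int i = 0; i < P; i++)
        for (int j = 1; j < P; j++)
            sample[s++] = block[i][j * B / P];

    /* Sort the combined sample and take regularly spaced elements as splitters. */
    qsort(sample, s, sizeof(int), cmp_int);
    printf("splitters:");
    for (int k = 1; k < P; k++) printf(" %d", sample[k * (P - 1) - 1]);
    printf("\n");
    return 0;
}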

Example

Analysis
- Distributing the elements: n/p per process
- Local sort, plus sample selection Θ(p)
- Combining the samples Θ(p²), sorting them Θ(p² log p), selecting the global splitters Θ(p)
- Partitioning the local elements Θ(p log(n/p)), redistribution O(n) + O(p log p)
- Local sorting

Outline
- Issues in Sorting
- Sorting Networks
- Bubble Sort and its Variants
- Quick sort
- Bucket and Sample sort
- Other sorting algorithms

Enumeration Sort
- Assumptions: n² processes, n elements, CRCW PRAM
- Feature: based on computing the rank of each element
- Θ(1) time

Algorithm

procedure ENUM_SORT(n)
begin
    for each process P1,j do
        C[j] := 0;
    // all processes Pi,j with the same j write to C[j] concurrently; this
    // assumes a CRCW model in which the concurrent writes are combined by
    // addition, so C[j] ends up holding the rank of A[j]
    for each process Pi,j do
        if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
            C[j] := 1;
        else
            C[j] := 0;
    for each process P1,j do
        A[C[j]] := A[j];    // move each element to its ranked position
end ENUM_SORT

Shared structures: A[n], C[n]

Radix Sort
- Assumptions: n elements, n processes
- Features
  - Based on the binary representation of the elements
  - Leverages enumeration-style rank computation for each group of r bits

Algorithm

procedure RADIX_SORT(A, r)
// b is the number of bits per element; the keys are sorted r bits at a time
begin
    for i := 0 to b/r − 1 do
    begin
        offset := 0;
        for j := 0 to 2^r − 1 do
        begin
            flag := 0;
            if the ith least significant r-bit block of A[Pk] = j then
                flag := 1;
            index := prefix_sum(flag);              // Θ(log n)
            if flag = 1 then
                rank := offset + index;
            offset := offset + parallel_sum(flag);  // Θ(log n); accumulate the count of keys with digit <= j
        endfor
        each process Pk sends its element A[Pk] to process P_rank;  // Θ(n)
    endfor
end RADIX_SORT
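
For comparison with the parallel pseudocode, a compact sequential LSD radix sort in C (my sketch, processing 8 bits per pass) has the same structure: compute each key's rank within the current digit via counting and a prefix sum, then permute.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define R 8                       /* bits per pass */
#define BUCKETS (1 << R)

/* Sequential LSD radix sort on unsigned 32-bit keys: for each r-bit digit,
   count occurrences, turn the counts into starting offsets (a prefix sum),
   then scatter the keys stably into their ranked positions. */
static void radix_sort(uint32_t a[], int n) {
    uint32_t tmp[n];
    for (int shift = 0; shift < 32; shift += R) {
        int count[BUCKETS] = {0};
        for (int k = 0; k < n; k++)
            count[(a[k] >> shift) & (BUCKETS - 1)]++;
        int offset = 0;                           /* exclusive prefix sum */
        for (int j = 0; j < BUCKETS; j++) {
            int c = count[j];
            count[j] = offset;
            offset += c;
        }
        for (int k = 0; k < n; k++)               /* stable scatter */
            tmp[count[(a[k] >> shift) & (BUCKETS - 1)]++] = a[k];
        memcpy(a, tmp, n * sizeof a[0]);
    }
}

int main(void) {
    uint32_t a[8] = {170, 45, 75, 90, 802, 24, 2, 66};
    radix_sort(a, 8);
    for (int i = 0; i < 8; i++) printf("%u ", a[i]);
    printf("\n");
    return 0;
}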

Conclusion
- Sorting Networks
  - Bitonic network, mapping to hypercube and mesh
- Bubble Sort and its Variants
  - Odd-even sort, shellsort
- Quick sort
  - Parallel formulations on the CRCW PRAM and on shared-address-space / message-passing architectures
- Bucket and Sample sort
- Enumeration and radix sort