# CSC 213 – Large Scale Programming

CSC 213 – Large Scale Programming

Today’s Goals

- Review discussion of merge sort and quick sort
  - How do they work & why divide-and-conquer?
  - Are they fastest possible sorts?
- Another way to sort data presented
  - How can we sort data with single simple value?
  - What are limits on using buckets to sort our data?
  - If we want more buckets, can we expand these limits?
  - How does radix sort work? How long does it need?

Quick Sort v. Merge Sort

Quick Sort

- Divide data around pivot
  - Want pivot to be near middle
  - All comparisons occur here
- Conquer with recursion
  - Does not need extra space
- Merge usually done already
  - Data already sorted!

Merge Sort

- Divide data blindly in half
  - Always gets even split
  - No comparisons performed!
- Conquer with recursion
  - Needs * to use other arrays
- Merge combines solutions
  - Compares from (sorted) halves

Complexity of Sorting

- With n! external nodes, the binary decision tree’s height is at least log(n!), which is Ω(n log n)
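
Why a binary tree with n! external nodes must be this tall can be sketched in one line. This is the standard counting argument, assuming a comparison-based sort is modeled as a binary decision tree of height h with one external node per permutation of the input:

```latex
h \;\ge\; \log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i
  \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
  \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
  \;=\; \Omega(n \log n)
```

The second inequality keeps only the largest half of the terms, and the third uses the fact that each remaining term is at least log_2(n/2).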

Bucket-Sort

- Buckets, B, is array of Sequence
- Sorts Collection, C, in two phases:
  1. Remove each element v from C & add to B[v]
  2. Move elements from each bucket back to C

Bucket-Sort Algorithm

    Algorithm bucketSort(Sequence C)
      B = new Sequence[10]  // & instantiate each Sequence
      // Phase 1
      for each element v in C
        B[v].addLast(v)  // Assumes each number in C between 0 & 9
      endfor
      // Phase 2
      loc = 0
      for each Sequence b in B
        for each element v in b
          C.set(loc, v)
          loc += 1
        endfor
      endfor
      return C
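
The two phases of the pseudocode can be tried out directly. The following is a minimal Python sketch, not the course's implementation: plain lists stand in for `Sequence`, and the name `bucket_sort`, the `num_buckets` parameter, and the range check are assumptions added for illustration.

```python
def bucket_sort(c, num_buckets=10):
    """Sort a list of ints in the range 0..num_buckets-1, in place."""
    # One (initially empty) bucket per legal value
    buckets = [[] for _ in range(num_buckets)]

    # Phase 1: move each value into the bucket that it indexes
    for v in c:
        if not 0 <= v < num_buckets:
            # Python would silently wrap a negative index, so reject it
            raise ValueError(f"{v} is not a legal bucket index")
        buckets[v].append(v)

    # Phase 2: copy the buckets back into c in index order
    loc = 0
    for b in buckets:
        for v in b:
            c[loc] = v
            loc += 1
    return c


data = [7, 1, 9, 1, 0, 4, 7]
print(bucket_sort(data))  # [0, 1, 1, 4, 7, 7, 9]
```

No two elements are ever compared; the value itself decides where each element lands.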

Bucket Sort Properties

- For this to work, values must be legal indices
  - Non-negative integer indices needed to access arrays
- Sorting occurs without comparing objects
- Bucket sort is a stable sort
  - Preserves relative ordering of objects with same value
  - (Bubble-sort & merge-sort are other stable sorts)
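
Stability is easiest to see when elements carry more than just their sort value. The sketch below is an illustration under assumptions: the helper `bucket_sort_by_key`, its `key` parameter, and the sample records are hypothetical and do not come from the slides.

```python
def bucket_sort_by_key(items, key, num_buckets):
    """Return a new list of items ordered by key(item), an int in 0..num_buckets-1."""
    buckets = [[] for _ in range(num_buckets)]
    for item in items:
        # Appending keeps items with equal keys in their arrival order
        buckets[key(item)].append(item)
    # Concatenate the buckets in index order
    return [item for b in buckets for item in b]


# Each record is (sort value, label); the labels show the original order
records = [(2, "a"), (0, "b"), (2, "c"), (1, "d"), (0, "e")]
result = bucket_sort_by_key(records, key=lambda r: r[0], num_buckets=3)
print(result)  # [(0, 'b'), (0, 'e'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Records with the same value ("b" before "e", "a" before "c") come out in the same relative order they went in.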

Bucket Sort Extensions

- Use Comparator for bucket-sort
  - Get index for v using compare(v, null)
- Comparator for booleans could return
  - 0 when v is false
  - 1 when v is true
- Comparator for US states could return
  - Annual per capita consumption of Jello
  - Consumption of Jello overall, in cubic feet
  - State’s ranking by population

Bucket Sort Extensions

- State’s ranking by population
  1. California
  2. Texas
  3. New York
  4. Florida
  5. Illinois
  6. Pennsylvania
  7. Ohio
  8. Michigan
  9. Georgia

Bucket Sort Extensions

- Extended bucket-sort works with many types
  - Limited set of data needed for this to work
  - Need way to enumerate values of the set ("enumerate" is a subtle hint)
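
One reading of the "enumerate" hint is that an enumerated type already supplies a limited, ordered set of values, so it can provide the buckets. The sketch below assumes that reading; the `Size` enum and the function `bucket_sort_enum` are hypothetical names chosen for illustration.

```python
from enum import Enum


class Size(Enum):
    # Hypothetical enumerated type; definition order is the sort order
    SMALL = 0
    MEDIUM = 1
    LARGE = 2


def bucket_sort_enum(items, enum_type):
    """Return a new list of enum members ordered by their definition order."""
    # One bucket per enumerated value; iterating an Enum follows definition order
    buckets = {member: [] for member in enum_type}
    for item in items:
        buckets[item].append(item)
    return [item for member in enum_type for item in buckets[member]]


orders = [Size.LARGE, Size.SMALL, Size.MEDIUM, Size.SMALL]
print([s.name for s in bucket_sort_enum(orders, Size)])
# ['SMALL', 'SMALL', 'MEDIUM', 'LARGE']
```

The number of buckets equals the number of enumerated values, which is why the set of values has to be limited.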

d-Tuples

- Combination of d values such as $(k_1, k_2, \dots, k_d)$
  - $k_i$ is the i-th dimension of the tuple
- A point (x, y, z) is a 3-tuple
  - x is 1st dimension’s value
  - Value of 2nd dimension is y
  - z is 3rd dimension’s value

Lexicographic Order

- Assume a & b are both d-tuples
  - $a = (a_1, a_2, \dots, a_d)$
  - $b = (b_1, b_2, \dots, b_d)$
- Can say a < b if and only if
  - $a_1 < b_1$ OR
  - $a_1 = b_1$ && $(a_2, \dots, a_d) < (b_2, \dots, b_d)$
- Order these 2-tuples using previous definition
  - Unsorted: (3 4) (7 8) (3 2) (1 4) (4 8)
  - Sorted: (1 4) (3 2) (3 4) (4 8) (7 8)
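
The recursive definition translates almost word for word into code. This is a small sketch that assumes both tuples have the same length d; the function name `lex_less` is made up for illustration. Python's built-in tuple comparison uses the same lexicographic rule, so `sorted` reproduces the ordering of the 2-tuples from the slide.

```python
def lex_less(a, b):
    """Return True if d-tuple a comes strictly before d-tuple b."""
    if not a:
        # Every dimension matched, so the tuples are equal
        return False
    if a[0] != b[0]:
        # First differing dimension decides the order
        return a[0] < b[0]
    # First dimensions are equal: compare the remaining dimensions
    return lex_less(a[1:], b[1:])


pairs = [(3, 4), (7, 8), (3, 2), (1, 4), (4, 8)]
print(sorted(pairs))  # [(1, 4), (3, 2), (3, 4), (4, 8), (7, 8)]
print(lex_less((3, 2), (3, 4)))  # True
```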

Radix-Sort

- Very fast sort for data expressed as d-tuple
  - Cheats to win; faster than comparison sorting’s lower bound
- Sort performed using d calls to bucket sort
  - Sorts least to most important dimension of tuple
- Luckily lots of data are d-tuples
  - String is d-tuple of char
  - Digits of an int can be used for sorting, also

Radix-Sort For Integers

- Represent int as a d-tuple of digits: $62_{10} = 111110_2$, $04_{10} = 000100_2$
  - Decimal digits need 10 buckets to use for sorting
  - Ordering using their bits needs 2 buckets
- O(d∙n) time needed to run radix-sort
  - d is length of longest element in input
  - In most cases value of d is constant (d = 31 for int)
  - Radix sort takes O(n) time, ignoring constant
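
Viewing an int as a d-tuple of digits can be checked with a few lines of code. The helper below is a hypothetical illustration (the name `to_digits` and its parameters are assumptions); it pads with leading zeros so that every value becomes a tuple of the same length d.

```python
def to_digits(value, base, d):
    """Return a non-negative int as a d-tuple of digits, most significant first."""
    digits = []
    for _ in range(d):
        digits.append(value % base)  # peel off the least significant digit
        value //= base
    return tuple(reversed(digits))


print(to_digits(62, 2, 6))   # (1, 1, 1, 1, 1, 0)
print(to_digits(4, 2, 6))    # (0, 0, 0, 1, 0, 0)
print(to_digits(62, 10, 2))  # (6, 2)
```

The base sets the number of buckets needed per pass (2 for bits, 10 for decimal digits), while d sets the number of passes.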

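Radix-Sort In Action

- List of 4-bit integers sorted using radix-sort
  - Input: 1001 0010 1101 0001 1110
  - After sorting on bit 0 (least significant): 0010 1110 1001 1101 0001
  - After sorting on bit 1: 1001 1101 0001 0010 1110
  - After sorting on bit 2: 1001 0001 0010 1101 1110
  - After sorting on bit 3 (most significant): 0001 0010 1001 1101 1110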

Radix-Sort Algorithm

    radixSort(Sequence C)
      // Works from least to most significant value
      for bit = 0 to 30
        C = bucketSort(C, bit)  // Sort C using the specified bit
      endfor
      return C

- What is big-Oh complexity for Radix-Sort?
  - Call in loop uses each element twice: O(n)
  - Loop repeats once per digit to complete sort: × O(1)
  - Total: O(n)
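
The loop above can be run on the 4-bit list from the earlier slides. This is a minimal Python sketch under assumptions: it handles non-negative ints only, `bucket_sort_on_bit` is a made-up stand-in for the two-argument `bucketSort(C, bit)`, and the `num_bits` parameter is added so the 4-bit example does not need all 31 passes.

```python
def bucket_sort_on_bit(c, bit):
    """Stable two-bucket sort of non-negative ints on a single bit."""
    buckets = [[], []]
    for v in c:
        # Bucket index is the value of the chosen bit (0 or 1)
        buckets[(v >> bit) & 1].append(v)
    return buckets[0] + buckets[1]


def radix_sort(c, num_bits=31):
    """Sort non-negative ints, working from least to most significant bit."""
    for bit in range(num_bits):
        c = bucket_sort_on_bit(c, bit)
    return c


data = [0b1001, 0b0010, 0b1101, 0b0001, 0b1110]
result = radix_sort(data, num_bits=4)
print([format(v, "04b") for v in result])
# ['0001', '0010', '1001', '1101', '1110']
```

The result is only correct because each pass is stable: ties on the current bit keep the order established by the less significant bits.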