# Linear Sorts: Counting sort, Bucket sort, Radix sort

We will study algorithms that do not depend only on comparing whole keys to be sorted:

- Counting sort
- Bucket sort
- Radix sort

Counting sort

Assumptions:

- n records
- Each record contains keys and data
- All keys are in the range of 1 to k

Space:

- The unsorted list is stored in A; the sorted list will be stored in an additional array B
- Uses an additional array C of size k

Counting sort

Main idea:

1. For each key value i, i = 1, …, k, count the number of times the key occurs in the unsorted input array A. Store the results in an auxiliary array C.
2. Use these counts to compute the offsets. Offset i gives the location where a record with key value i will be stored in the sorted output list B: it holds the position of the last record with key i.

When would you use counting sort? How much memory is needed?

Counting sort

    Counting-Sort(A, B, k)
    1. for i ← 1 to k
    2.     do C[i] ← 0
    3. for j ← 1 to length[A]
    4.     do C[A[j]] ← C[A[j]] + 1
    5. for i ← 2 to k
    6.     do C[i] ← C[i] + C[i-1]
    7. for j ← length[A] downto 1
    8.     do B[C[A[j]]] ← A[j]
    9.        C[A[j]] ← C[A[j]] - 1

Analysis:

- Input: A[1..n], A[j] ∈ {1, 2, ..., k}
- Output: B[1..n], sorted
- Uses C[1..k], auxiliary storage

Adapted from Cormen, Leiserson, Rivest
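The Counting-Sort pseudocode can be sketched in Python roughly as follows. This is a minimal sketch, not the slides' own code: the function name `counting_sort` is an assumption, the output array is returned rather than passed in as B, and Python's 0-based lists are adjusted to the 1-based positions of the pseudocode.

```python
def counting_sort(a, k):
    """Return a sorted copy of `a`, whose keys are integers in 1..k."""
    n = len(a)
    c = [0] * (k + 1)      # c[0] is unused so that indices match keys 1..k
    b = [None] * n         # output array (0-based, unlike the pseudocode)

    for key in a:          # pseudocode lines 3-4: count each key
        c[key] += 1
    for i in range(2, k + 1):   # lines 5-6: running totals give the offsets
        c[i] += c[i - 1]
    for j in range(n - 1, -1, -1):   # lines 7-9: scan A from right to left
        key = a[j]
        b[c[key] - 1] = key          # -1 converts a 1-based offset to 0-based
        c[key] -= 1
    return b


print(counting_sort([4, 3, 1, 4, 4, 3], 4))  # [1, 3, 3, 4, 4, 4]
```

The right-to-left scan in the last loop is what keeps equal keys in their original order.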

Counting sort: example, lines 1-6 of Counting-Sort(A, B, k)

k = 4, length = 6

| Index | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A | 4 | 3 | 1 | 4 | 4 | 3 |

| Index | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| C after lines 1-2 | 0 | 0 | 0 | 0 |
| C after lines 3-4 | 1 | 0 | 2 | 3 |
| C after lines 5-6 | 1 | 1 | 3 | 6 |

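Counting sort: example, lines 7-9 of Counting-Sort(A, B, k)

State before line 7:

| Index | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A | 4 | 3 | 1 | 4 | 4 | 3 |
| B |  |  |  |  |  |  |

| Index | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| C | 1 | 1 | 3 | 6 |

Running lines 7-9 from j = 6 down to j = 1 fills B with 1 3 3 4 4 4.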

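Counting sort: example with records

Original list A:

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Key | 3 | 4 | 1 | 2 | 3 | 4 | 2 | 1 | 3 | 1 | 1 |
| Data | Clinton | Smith | Xu | Adams | Dunn | Yi | Baum | Fu | Gold | Lu | Land |

Array C:

| Key | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Initial | 0 | 0 | 0 | 0 |
| Final counts | 4 | 2 | 3 | 2 |
| "Offsets" | 4 | 6 | 9 | 11 |
| Offsets after placing Land, Lu, and Gold | 2 | 6 | 8 | 11 |

Sorted list B after the last three records of A have been placed (scanning A from right to left):

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Record |  |  | 1 Lu | 1 Land |  |  |  |  | 3 Gold |  |  |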
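The record example on this slide can be reproduced with a short sketch. It assumes each record is a `(key, data)` tuple; the name `counting_sort_records` is hypothetical. Only the key is inspected, and the data travels with it.

```python
def counting_sort_records(records, k):
    """Stable counting sort of (key, data) tuples with keys in 1..k."""
    counts = [0] * (k + 1)
    for key, _ in records:           # count each key
        counts[key] += 1
    for i in range(2, k + 1):        # counts[i] becomes the offset for key i
        counts[i] += counts[i - 1]

    out = [None] * len(records)
    for rec in reversed(records):    # right-to-left scan preserves stability
        key = rec[0]
        counts[key] -= 1             # decrement first: 1-based offset -> 0-based index
        out[counts[key]] = rec
    return out


people = [(3, "Clinton"), (4, "Smith"), (1, "Xu"), (2, "Adams"), (3, "Dunn"),
          (4, "Yi"), (2, "Baum"), (1, "Fu"), (3, "Gold"), (1, "Lu"), (1, "Land")]
for key, name in counting_sort_records(people, 4):
    print(key, name)
```

Records with the same key come out in the order they appeared in the original list, for example Xu, Fu, Lu, Land for key 1.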

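Counting sort: analysis

- O(k + n) time.
    - What if k = O(n)? Then counting sort runs in O(n) time.
    - But sorting takes Ω(n lg n)? That lower bound applies only to sorts based on comparing keys, and counting sort does not compare keys.
- Requires k + n extra storage.
- This is a stable sort: it preserves the original order of equal keys.
- Clearly no good for sorting 32-bit values, since C would need 2^32 entries.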

Bucket sort

- Keys are distributed uniformly in the interval [0, 1)
- The records are distributed into n buckets
- The buckets are sorted using one of the well-known sorts
- Finally, the buckets are combined
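The three steps (distribute, sort each bucket, combine) might look like this in Python. This is a sketch under assumptions not stated on the slide: the bucket for key x is chosen as floor(n · x), and Python's built-in `list.sort` stands in for "one of the well-known sorts". The name `bucket_sort` is illustrative.

```python
def bucket_sort(keys):
    """Sort floats in [0, 1) using n buckets for n keys."""
    n = len(keys)
    buckets = [[] for _ in range(n)]

    # Step 1: distribute. Key x goes to bucket floor(n * x) (assumed rule);
    # min() guards against floating-point rounding up to n.
    for x in keys:
        buckets[min(int(n * x), n - 1)].append(x)

    # Steps 2 and 3: sort each bucket, then combine in bucket order.
    result = []
    for bucket in buckets:
        bucket.sort()
        result.extend(bucket)
    return result


data = [.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]
print(bucket_sort(data))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

Because every key in bucket i is smaller than every key in bucket i + 1, concatenating the sorted buckets yields a sorted list.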

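Bucket sort: example

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Key | .78 | .17 | .39 | .26 | .72 | .94 | .21 | .12 | .23 | .68 |

| Bucket | Step 1: distribute | Step 2: sorted |
|---|---|---|
| 0 | / | / |
| 1 | .12 .17 | .12 .17 |
| 2 | .23 .21 .26 | .21 .23 .26 |
| 3 | .39 | .39 |
| 4 | / | / |
| 5 | / | / |
| 6 | .68 | .68 |
| 7 | .72 .78 | .72 .78 |
| 8 | / | / |
| 9 | .94 | .94 |

Step 3: combine the buckets in order: .12 .17 .21 .23 .26 .39 .68 .72 .78 .94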

Bucket sort: analysis

- $p = 1/n$ is the probability that a key goes to bucket i.
- The expected size of a bucket is $np = n \cdot \frac{1}{n} = 1$.
- The expected time to sort one bucket is $\Theta(1)$.
- The overall expected time is $\Theta(n)$.

How did IBM get rich originally?

- In the early 1900s IBM produced punched card readers for census tabulation.
- Cards are 80 columns with 12 places for punches per column. Only 10 places are needed for decimals.
    - Picture of punch card.
- Sorters had 12 bins.
- Key idea: sort the least significant digit first.

[Figure: a punched card]

[Figure: IBM card punching machine]

Hollerith's tabulating machines

- As the cards were fed through a "tabulating machine," pins passed through the positions where holes were punched, completing an electrical circuit and subsequently registering a value.
- The 1880 census in the U.S. took seven years to complete.
- With Hollerith's "tabulating machines," the 1890 census took the Census Bureau six weeks.

[Figure: IBM's card sorting machine]

Radix sort

Main idea:

- Break the key into a "digit" representation: key $= i_d, i_{d-1}, \ldots, i_2, i_1$
- A "digit" can be a number in any base, a character, etc.

Radix sort:

    for i = 1 to d
        sort "digit" i using a stable sort

Analysis: $\Theta(d \cdot (\text{stable sort time}))$, where d is the number of "digits".

Radix sort: which stable sort?

- Since the range of values of a digit is small, the best stable sort to use is counting sort.
- When counting sort is used, the time complexity is $\Theta(d \cdot (n + k))$, where k is the range of a "digit".
- When $k \in O(n)$, this is $\Theta(d \cdot n)$.
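Putting the two ideas together, a least-significant-digit radix sort with counting sort as the per-digit stable sort could be sketched as below. The function names, the `base` parameter, and the use of digit values 0..base-1 (rather than keys 1..k) are assumptions made for illustration; the sketch handles non-negative integers only.

```python
def counting_sort_by_digit(a, exp, base):
    """Stable counting sort of `a` on the digit selected by exp (1, base, base**2, ...)."""
    counts = [0] * base
    for x in a:
        counts[(x // exp) % base] += 1
    for d in range(1, base):            # running totals give the offsets
        counts[d] += counts[d - 1]

    out = [0] * len(a)
    for x in reversed(a):               # right-to-left scan keeps the sort stable
        d = (x // exp) % base
        counts[d] -= 1
        out[counts[d]] = x
    return out


def radix_sort(a, d, base=10):
    """Sort non-negative integers that have at most d digits in the given base."""
    exp = 1
    for _ in range(d):                  # least significant digit first
        a = counting_sort_by_digit(a, exp, base)
        exp *= base
    return a


print(radix_sort([178, 139, 326, 572, 294, 321, 910, 368], d=3))
# [139, 178, 294, 321, 326, 368, 572, 910]
```

Each of the d passes costs time proportional to n + k with k = `base`, which is where the $\Theta(d \cdot (n + k))$ bound comes from.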

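Radix sort with decimal digits

| Position | Input list | After sorting on ones digit | After sorting on tens digit | After sorting on hundreds digit (sorted list) |
|---|---|---|---|---|
| 1 | 178 | 910 | 910 | 139 |
| 2 | 139 | 321 | 321 | 178 |
| 3 | 326 | 572 | 326 | 294 |
| 4 | 572 | 294 | 139 | 321 |
| 5 | 294 | 326 | 368 | 326 |
| 6 | 321 | 178 | 572 | 368 |
| 7 | 910 | 368 | 178 | 572 |
| 8 | 368 | 139 | 294 | 910 |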
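The intermediate lists in this example can be checked with a few lines of Python. As a shortcut, this sketch uses Python's built-in `sorted`, which is guaranteed to be stable, as the per-digit sort instead of counting sort; the helper name `radix_passes` is hypothetical.

```python
def radix_passes(keys, d, base=10):
    """Return the list after each of the d digit passes (least significant first)."""
    passes = []
    for i in range(d):
        # sorted() is stable, so ties keep the order from the previous pass
        keys = sorted(keys, key=lambda x: (x // base**i) % base)
        passes.append(keys)
    return passes


for step in radix_passes([178, 139, 326, 572, 294, 321, 910, 368], d=3):
    print(step)
# [910, 321, 572, 294, 326, 178, 368, 139]
# [910, 321, 326, 139, 368, 572, 178, 294]
# [139, 178, 294, 321, 326, 368, 572, 910]
```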

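Radix sort with an unstable digit sort

| Position | 1 | 2 |
|---|---|---|
| Input list | 17 | 13 |
| After sorting on the ones digit | 13 | 17 |
| After an unstable sort on the tens digit | 17 | 13 |

The list is not sorted: the sort is unstable and both keys have tens digit 1, so their order can be reversed.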
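To see the failure concretely, the sketch below runs the same two-pass radix sort with a stable digit sort and with a deliberately unstable one. The "unstable" pass is an artificial construction for illustration (it sorts by digit but reverses the relative order of equal digits); all function names are hypothetical.

```python
def digit(x, i, base=10):
    """Return digit i of x (i = 0 is the least significant digit)."""
    return (x // base**i) % base


def stable_pass(keys, i):
    # Python's sorted() is stable: equal digits keep their current order.
    return sorted(keys, key=lambda x: digit(x, i))


def tie_reversing_pass(keys, i):
    # Still sorts by digit i, but equal digits come out in reversed order,
    # which is one way an unstable sort may behave.
    return sorted(reversed(keys), key=lambda x: digit(x, i))


def radix_sort_with(sort_pass, keys, d):
    for i in range(d):
        keys = sort_pass(keys, i)
    return keys


print(radix_sort_with(stable_pass, [17, 13], 2))         # [13, 17]
print(radix_sort_with(tie_reversing_pass, [17, 13], 2))  # [17, 13]
```

The second pass only looks at the tens digit, so it relies entirely on the digit sort to preserve the order established by the first pass.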

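Is Quicksort stable?

Each record is written as (key, data). Note that the data is sorted by key only.

| Position | 1 | 2 | 3 |
|---|---|---|---|
| Input | (5, 1) | (5, 5) | (4, 8) |
| After partition of 0 to 2 | (4, 8) | (5, 5) | (5, 1) |
| After partition of 1 to 2 | (4, 8) | (5, 5) | (5, 1) |

The two records with key 5 end up in the opposite order. Since the sort is unstable, it cannot be used for radix sort.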
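The instability can be reproduced with a small quicksort that compares keys only. This is a sketch that assumes a Lomuto-style partition with the last element as pivot; the slide does not say which partition scheme it uses, and the function name is hypothetical.

```python
def quicksort_by_key(recs, lo=0, hi=None):
    """In-place quicksort of (key, data) tuples, comparing keys only."""
    if hi is None:
        hi = len(recs) - 1
    if lo < hi:
        pivot = recs[hi][0]              # assumed choice: last element is the pivot
        i = lo - 1
        for j in range(lo, hi):
            if recs[j][0] <= pivot:
                i += 1
                recs[i], recs[j] = recs[j], recs[i]
        recs[i + 1], recs[hi] = recs[hi], recs[i + 1]   # move pivot into place
        p = i + 1
        quicksort_by_key(recs, lo, p - 1)
        quicksort_by_key(recs, p + 1, hi)
    return recs


print(quicksort_by_key([(5, 1), (5, 5), (4, 8)]))
# [(4, 8), (5, 5), (5, 1)]
```

The long-distance swap that moves the pivot into place is what carries (5, 1) past (5, 5).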

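Is Heapsort stable?

Each record is written as (key, data). Note that the data is sorted by key only.

| Position | 1 | 2 |
|---|---|---|
| Heap (complete binary tree, max heap) | (5, 1) | (5, 5) |
| Sorted, after swap | (5, 5) | (5, 1) |

The two records with key 5 end up in the opposite order. Since the sort is unstable, it cannot be used for radix sort.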
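The same experiment for heapsort, again comparing keys only. This is a minimal textbook-style sketch (array-based max heap with a sift-down helper), not code from the slides; the names are hypothetical.

```python
def heapsort_by_key(recs):
    """In-place heapsort of (key, data) tuples, comparing keys only."""
    def sift_down(root, size):
        while True:
            child = 2 * root + 1
            if child >= size:
                return
            # pick the child with the larger key
            if child + 1 < size and recs[child + 1][0] > recs[child][0]:
                child += 1
            if recs[root][0] >= recs[child][0]:
                return
            recs[root], recs[child] = recs[child], recs[root]
            root = child

    n = len(recs)
    for start in range(n // 2 - 1, -1, -1):   # build the max heap
        sift_down(start, n)
    for end in range(n - 1, 0, -1):           # repeatedly move the max to the end
        recs[0], recs[end] = recs[end], recs[0]
        sift_down(0, end)
    return recs


print(heapsort_by_key([(5, 1), (5, 5)]))
# [(5, 5), (5, 1)]
```

The input is already a max heap, so the only action is the swap of the root with the last element, which reverses the two equal keys.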

Example

Sort 1 million 64-bit numbers.

- We could use an in-place comparison sort, which would run in $\Theta(n \lg n)$ in the average case. $\lg 1{,}000{,}000 \approx 20$ passes over the data.
- We can treat a 64-bit number as a 4-digit, radix-$2^{16}$ number. So $d = 4$, $k = 2^{16}$, $n = 1{,}000{,}000$, and $\Theta(d(n + k)) = \Theta(4(2^{16} + n))$. This takes $4 \times 2$ passes over the data.

| 16 bits | 16 bits | 16 bits | 16 bits |
|---|---|---|---|
| $d_3$ | $d_2$ | $d_1$ | $d_0$ |

64-bit number $= d_3 (2^{16})^3 + d_2 (2^{16})^2 + d_1 (2^{16})^1 + d_0 (2^{16})^0$

Adapted from Cormen, Leiserson, Rivest
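A sketch of this scheme in Python, assuming unsigned 64-bit integers: each 16-bit digit is extracted with a shift and a mask, and each of the 4 digit passes makes two passes over the data (one to count, one to place). Unlike the Counting-Sort pseudocode, this variant computes starting offsets and scans left to right, which is an equivalent way to keep the sort stable. The function name is hypothetical.

```python
def radix_sort_u64(a):
    """Sort unsigned 64-bit integers as 4-digit numbers in radix 2**16."""
    radix = 1 << 16
    mask = radix - 1
    for shift in (0, 16, 32, 48):            # digits d0, d1, d2, d3
        counts = [0] * radix
        for x in a:                          # pass 1: count each digit value
            counts[(x >> shift) & mask] += 1

        total = 0
        for d in range(radix):               # turn counts into starting offsets
            counts[d], total = total, total + counts[d]

        out = [0] * len(a)
        for x in a:                          # pass 2: place, left to right (stable)
            d = (x >> shift) & mask
            out[counts[d]] = x
            counts[d] += 1
        a = out
    return a


print(radix_sort_u64([2**40, 3, 2**16, 7]))  # [3, 7, 65536, 1099511627776]
```

With $k = 2^{16}$ the count array has 65,536 entries, which is small next to $n = 1{,}000{,}000$.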