CSE 326: Data Structures Lecture 24 Spring Quarter 2001 Sorting, Part B David Kaplan


Outline Sorting: The Problem Space Revisited Sorting in Linear Time –Bin (aka Bucket) Sort –Radix Sort Memory Hierarchy –Other performance factors besides running time

Sorting: The Problem Space Revisited General problem Given a set of N orderable items, put them in order Without (significant) loss of generality, assume: –Items are integers –Ordering is ≤ Most sorting problems can be mapped to the above in linear time But what if we have more information available to us?

Sorting in Linear Time Sorting by Comparison –Only information available to us is the set of N items to be sorted –Only operation available to us is pairwise comparison between 2 items –Best possible worst-case running time under these constraints is Θ(N lg(N)) What if we relax the constraints? –Know something in advance about item values
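The N lg(N) bound for comparison sorting comes from the standard decision-tree argument, which the slide does not spell out. A brief sketch, assuming a deterministic comparison sort on N distinct items, with h (a symbol introduced here) denoting the worst-case number of comparisons:

```latex
% A comparison sort must distinguish all N! possible input orders,
% so its decision tree has at least N! leaves, i.e. 2^h >= N!.
\begin{align*}
h &\ge \lg(N!) = \sum_{i=1}^{N} \lg i \\
  &\ge \sum_{i=\lceil N/2 \rceil}^{N} \lg i
   \;\ge\; \frac{N}{2}\,\lg\frac{N}{2} \;=\; \Omega(N \lg N)
\end{align*}
```

The bound only applies when comparisons are the sole source of information, which is why knowing something about the key values can beat it.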

BinSort (a.k.a. BucketSort) If keys are known to be in {1, …, K} Have array of size K Put items into correct bin (cell) of array, based on key

BinSort example K=5 list=(5,1,3,4,3,2,1,1,5,4,5)
Bins in array:
key = 1: 1,1,1
key = 2: 2
key = 3: 3,3
key = 4: 4,4
key = 5: 5,5,5
Sorted list: 1,1,1,2,3,3,4,4,5,5,5

BinSort Pseudocode
procedure BinSort (List L, K)
  LinkedList bins[1..K]
  // Each element of array bins is a linked list.
  // Could also BinSort with array of arrays.
  For Each number x in L
    bins[x].Append(x)
  End For
  For i = 1..K
    For Each number x in bins[i]
      Print x
    End For
  End For
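The pseudocode above could be rendered in Python roughly as follows. This is a minimal sketch, not the lecture's own code: the name bin_sort is made up for illustration, Python lists stand in for the linked lists, and the sorted items are returned rather than printed.

```python
def bin_sort(items, k):
    """Sort integers known to lie in {1, ..., k} by dropping each into its bin."""
    # One list per key; index 0 is unused so that a key maps directly to its index.
    bins = [[] for _ in range(k + 1)]
    for x in items:
        bins[x].append(x)  # appending keeps equal keys in input order
    # Walk the bins in key order and collect their contents.
    result = []
    for i in range(1, k + 1):
        result.extend(bins[i])
    return result


print(bin_sort([5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5], 5))
# prints [1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5]
```

The first loop touches each of the N items once and the second visits each of the K bins once, so the work is proportional to N + K.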

BinSort Running Time In general O(N + K): one pass over the N items plus one pass over the K bins K is a constant –BinSort is linear time K is variable –Not simply linear time K is large (e.g ) –Impractical

BinSort is “stable” Stable Sorting algorithm –Items in input with the same key end up in the same order as when they began. –Important if keys have associated values –Critical for RadixSort
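Stability is easiest to see when keys carry associated values. The sketch below is illustrative only: bin_sort_by_key and the sample records are hypothetical, and records are assumed to be (key, value) pairs with keys in {1, ..., k}.

```python
def bin_sort_by_key(records, k, key):
    """BinSort arbitrary records whose key(record) lies in {1, ..., k}."""
    bins = [[] for _ in range(k + 1)]  # index 0 unused
    for rec in records:
        bins[key(rec)].append(rec)  # append preserves arrival order within a bin
    return [rec for b in bins[1:] for rec in b]


records = [(2, "b"), (1, "x"), (2, "a"), (1, "y")]
print(bin_sort_by_key(records, 2, key=lambda r: r[0]))
# prints [(1, 'x'), (1, 'y'), (2, 'b'), (2, 'a')]
```

Records with key 2 come out as "b" then "a", the order in which they appeared in the input, even though "a" would sort first alphabetically.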

Mr. Radix Herman Hollerith invented and developed a punch-card tabulation machine system that revolutionized statistical computation. Born in Buffalo, New York, the son of German immigrants, Hollerith enrolled in the City College of New York at age 15 and graduated from the Columbia School of Mines with distinction at the age of 19. His first job was with the U.S. Census effort of Hollerith successively taught mechanical engineering at the Massachusetts Institute of Technology and worked for the U.S. Patent Office. The young engineer developed an electrically actuated brake system for the railroads, but the Westinghouse steam-actuated brake prevailed. Hollerith began working on the tabulating system during his days at MIT, filing for the first patent in He developed a hand-fed 'press' that sensed the holes in punched cards; a wire would pass through the holes into a cup of mercury beneath the card, closing the electrical circuit. This process triggered mechanical counters and sorter bins and tabulated the appropriate data. Hollerith's system (punch, tabulator, and sorter) allowed the official 1890 population count to be tallied in six months, and in another two years all the census data was completed and defined; the cost was $5 million below the forecasts and saved more than two years' time. His later machines mechanized the card-feeding process, added numbers, and sorted cards, in addition to merely counting data. In 1896 Hollerith founded the Tabulating Machine Company, forerunner of the Computing-Tabulating-Recording Company (CTR). He served as a consulting engineer with CTR until retiring in In 1924 CTR changed its name to IBM, the International Business Machines Corporation. Herman Hollerith Born February 29, Died November 17, 1929 Art of Compiling Statistics; Apparatus for Compiling Statistics Patent Nos. 395,781; 395,782; 395,783 Inducted 1990 Source: National Institute of Standards and Technology (NIST) Virtual Museum -

RadixSort Radix = “The base of a number system” (Webster’s dictionary) –alternate terminology: radix is number of bits needed to represent 0 to base-1; can say “base 8” or “radix 3” Idea: BinSort on each digit, bottom up.

RadixSort – magic! It works. Input list: 126, 328, 636, 341, 416, 131, 328 BinSort on lower digit: 341, 131, 126, 636, 416, 328, 328 BinSort result on next-higher digit: 416, 126, 328, 328, 131, 636, 341 BinSort that result on highest digit: 126, 131, 328, 328, 341, 416, 636
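The three passes in this example can be reproduced with a short least-significant-digit radix sort. This is a sketch under stated assumptions: radix_sort is an illustrative name, keys are non-negative integers with at most num_digits digits, and each pass is a stable BinSort on one digit.

```python
def radix_sort(keys, num_digits, base=10):
    """LSD radix sort; returns the sorted list and the list after every pass."""
    passes = []
    for d in range(num_digits):
        bins = [[] for _ in range(base)]
        for x in keys:
            digit = (x // base**d) % base  # d-th digit from the right, starting at 0
            bins[digit].append(x)          # stable: arrival order kept within a bin
        keys = [x for b in bins for x in b]
        passes.append(keys)
    return keys, passes


result, passes = radix_sort([126, 328, 636, 341, 416, 131, 328], num_digits=3)
for p in passes:
    print(p)
# prints [341, 131, 126, 636, 416, 328, 328]
# prints [416, 126, 328, 328, 131, 636, 341]
# prints [126, 131, 328, 328, 341, 416, 636]
```

Each printed line matches the corresponding intermediate list on the slide.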

RadixSort – how it works

Not magic. It provably works. Keys –K-digit numbers –base B Claim: after the i-th BinSort, the list is sorted with respect to the least significant i digits. –e.g. B=10, i=3, keys are 1776 and comes before 1776 for last 3 digits.

RadixSort Proof by Induction Base case: –i=0. 0 digits are sorted (that wasn’t hard!) Induction step –assume for i, prove for i+1. –consider two numbers: X, Y. Say X_i is the i-th digit of X (from the right). If X_{i+1} < Y_{i+1}, then the (i+1)-th BinSort will put them in order. If X_{i+1} > Y_{i+1}, same thing. If X_{i+1} = Y_{i+1}, order depends on the last i digits. Induction hypothesis says already sorted for these digits, and a stable BinSort leaves that order untouched. (Careful about ensuring that your BinSort preserves order aka “stable”…)
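The induction claim can also be checked mechanically. The following is an illustrative sketch, not part of the lecture: radix_sort_checked is a hypothetical helper that asserts, after pass i, that the list is ordered by its last i digits, and the stable flag deliberately reverses each bin to show what goes wrong without stability.

```python
import random


def radix_sort_checked(keys, num_digits, base=10, stable=True):
    """LSD radix sort that asserts the induction invariant after every pass."""
    for i in range(1, num_digits + 1):
        bins = [[] for _ in range(base)]
        for x in keys:
            bins[(x // base**(i - 1)) % base].append(x)
        # An unstable BinSort is simulated by emitting each bin in reverse order.
        keys = [x for b in bins for x in (b if stable else reversed(b))]
        low = [x % base**i for x in keys]  # the least significant i digits
        assert low == sorted(low), f"invariant broken after pass {i}"
    return keys


random.seed(0)
data = [random.randrange(10**4) for _ in range(1000)]
assert radix_sort_checked(data, 4) == sorted(data)
```

With stable=False the assertion fails already on the input [12, 11]: the second pass puts both keys in bin 1 and reverses the order established by the first pass.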

What data types can you RadixSort? Any type T that can be BinSorted Any type T that can be broken into parts A and B, such that: –You can reconstruct T from A and B –A can be RadixSorted –B can be RadixSorted –A is always more significant than B, in ordering

Example: 1-digit numbers can be BinSorted 2 to 5-digit numbers can be BinSorted without using too much memory 6-digit numbers, broken up into A=first 3 digits, B=last 3 digits. –A and B can reconstruct original 6-digits –A and B each RadixSortable as above –A more significant than B
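The 6-digit example amounts to a radix sort in base 1000 with two passes. A minimal sketch, assuming non-negative integers below 1,000,000; the names bin_sort_by and sort_six_digit are made up for illustration.

```python
def bin_sort_by(items, k, key):
    """Stable BinSort of items whose key(item) lies in {0, ..., k-1}."""
    bins = [[] for _ in range(k)]
    for x in items:
        bins[key(x)].append(x)
    return [x for b in bins for x in b]


def sort_six_digit(numbers):
    """Split n into A = n // 1000 and B = n % 1000, so that n == A * 1000 + B."""
    by_b = bin_sort_by(numbers, 1000, key=lambda n: n % 1000)  # less significant part B first
    return bin_sort_by(by_b, 1000, key=lambda n: n // 1000)    # then more significant part A


print(sort_six_digit([123456, 123001, 999, 500000, 1000]))
# prints [999, 1000, 123001, 123456, 500000]
```

Each pass needs only 1000 bins, whereas a single BinSort on the full key would need 1,000,000.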

RadixSorting Strings 1 Character can be BinSorted Break strings into characters Need to know length of biggest string (or calculate this on the fly). Null-pad shorter strings Running time: –N is number of strings –L is length of longest string –RadixSort takes O(N*L)
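A sketch of the string case, under assumptions not stated on the slide: every character has a code point below 256, no input string contains the NUL character itself, and radix_sort_strings is an illustrative name.

```python
def radix_sort_strings(strings):
    """LSD radix sort on strings, null-padding shorter strings on the right."""
    if not strings:
        return []
    longest = max(len(s) for s in strings)  # L, computed on the fly
    padded = [s.ljust(longest, "\0") for s in strings]
    # One stable BinSort per character position, from last position to first.
    for pos in range(longest - 1, -1, -1):
        bins = [[] for _ in range(256)]
        for s in padded:
            bins[ord(s[pos])].append(s)
        padded = [s for b in bins for s in b]
    return [s.rstrip("\0") for s in padded]  # strip the padding again


print(radix_sort_strings(["banana", "apple", "app", "cherry", "b"]))
# prints ['app', 'apple', 'b', 'banana', 'cherry']
```

There are L passes, each touching all N strings plus a fixed 256 bins, which gives the O(N*L) running time when the alphabet size is treated as a constant.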

Evaluating Sorting Algorithms What factors other than asymptotic complexity could affect performance? Suppose two algorithms perform exactly the same number of instructions. Could one be better than the other?

Memory Hierarchy Stats (made up)
Name                 Extra CPU cycles used to access   Size
L1 (on chip) cache   0                                 32 KB
L2 cache             8                                 512 KB
RAM                  35                                256 MB
Hard Drive           500,000                           8 GB

Memory Hierarchy Exploits Locality of Reference Idea: small amount of fast memory Keep frequently used data in the fast memory LRU replacement policy –Keep recently used data in cache –To free space, remove Least Recently Used data

Cache Details (simplified) [Figure: main memory and cache; cache line size is 4 adjacent memory cells]

Traversing an Array One miss for every 4 accesses in a traversal [Figure: array cells marked as cache misses and cache hits]
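The one-miss-in-four figure can be reproduced with a toy cache model. This is a simplified sketch, not a model of real hardware: count_misses is a hypothetical helper that assumes a fully associative cache with LRU replacement, a line size of 4 cells, and addresses given as array indices.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines, line_size=4):
    """Count cache misses for a sequence of accesses in a tiny LRU cache model."""
    cache = OrderedDict()  # cache line number -> True, least recently used first
    misses = 0
    for addr in addresses:
        line = addr // line_size  # adjacent cells share one cache line
        if line in cache:
            cache.move_to_end(line)  # hit: mark the line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


n = 1024
print(count_misses(range(n), num_lines=8))        # prints 256
print(count_misses(range(0, n, 4), num_lines=8))  # prints 256
```

The sequential traversal makes 1024 accesses and misses on a quarter of them. The strided traversal makes only 256 accesses, but every one lands on a new line, so every access misses.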

Iterative MergeSort Poor Cache Performance: no temporal locality! [Figure label: Cache Size]

Recursive MergeSort Better Cache Performance [Figure label: Cache Size]

Quicksort Pretty Good Cache Performance Initial partition causes a lot of cache misses As subproblems become smaller, they fit into cache Good cache performance

Radix Sort Lousy Cache Performance On each BinSort –Sweep through input list – cache misses along the way (bad!) –Append to output list – indexed by pseudorandom digit (ouch!) Truly evil for large Radix (e.g ), which reduces # of passes

Sorting Instruction Count

Sorting Cache Misses

Sorting Execution Time Note: the “in place” (no extra memory) version of Mergesort used by the Java applet we saw in class is much slower than regular Mergesort!

Summary Linear-time sorting is possible if we know more about the input (e.g. that all input values fall within a known range) BinSort –Moderately useful O(n) categorizer –Most useful as building block for RadixSort RadixSort –O(n), but input must conform to fairly narrow profile –Poor cache performance Memory Hierarchy and Cache –Cache locality can have major impact on performance of sorting, and other, algorithms