Parallel Algorithms Continued Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2012.

Presentation transcript:

Parallel Algorithms Continued. Patrick Cozzi, University of Pennsylvania, CIS 565 - Spring 2012

Announcements: Presentation topics due tomorrow, 02/07. Homework 2 due next Monday, 02/13.

Agenda: Parallel Algorithms (review Stream Compaction, Radix Sort); CUDA Performance.

Radix Sort: Efficient for small sort keys; k-bit keys require k passes.

Radix Sort: Each radix sort pass partitions its input based on one bit. The first pass starts with the least significant bit (LSB). Subsequent passes move towards the most significant bit (MSB). The slide's figure labels the MSB and LSB of the example key 010.

Radix Sort: Example input (figure): the first key is 100. The remaining keys and the example's source are not preserved in this transcript.

Radix Sort: First pass: partition based on the LSB, with LSB == 0 keys before LSB == 1 keys.

Radix Sort: Second pass: partition based on the middle bit, with bit == 0 keys before bit == 1 keys.

Radix Sort: Final pass: partition based on the MSB, with MSB == 0 keys before MSB == 1 keys.

Radix Sort: Completed. The keys are now in sorted order.
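The three passes above can be sketched sequentially as follows. This is a minimal CPU illustration, not the lecture's code: the function name `radix_sort_lsb` is hypothetical, and since only the first example key (100) is given above, the eight 3-bit keys used here are an assumed input.

```python
def radix_sort_lsb(keys, num_bits):
    """Sort non-negative integers one bit per pass, starting at the LSB."""
    for bit in range(num_bits):
        # Stable partition: keys with this bit clear keep their relative
        # order and are placed before keys with this bit set.
        zeros = [k for k in keys if not (k >> bit) & 1]
        ones = [k for k in keys if (k >> bit) & 1]
        keys = zeros + ones
    return keys


# Assumed 3-bit example input (hypothetical beyond the first key, 100).
example = [0b100, 0b111, 0b010, 0b110, 0b011, 0b101, 0b001, 0b000]
result = radix_sort_lsb(example, 3)
print([format(k, "03b") for k in result])
```

Each pass must be stable: the ordering established by the lower bits has to survive the partition on the next bit, which is why the two groups are built by scanning the keys in order.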

Parallel Radix Sort: Where is the parallelism?

Parallel Radix Sort: 1. Break the input array into tiles; each tile fits into shared memory for an SM. 2. Sort tiles in parallel with radix sort. 3. Merge pairs of tiles using a parallel bitonic merge until all tiles are merged. Our focus is on Step 2.
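Step 3 can be sketched on the CPU as below. This is an illustrative emulation under stated assumptions, not the lecture's implementation: the names `bitonic_merge` and `merge_all` are hypothetical, tiles are assumed to be already sorted, all tiles have the same power-of-two length, and the number of tiles is a power of two.

```python
def bitonic_merge(left, right):
    """Merge two sorted lists of equal power-of-two length."""
    n = 2 * len(left)
    assert len(left) == len(right) and n & (n - 1) == 0
    # An ascending run followed by a descending run is a bitonic sequence.
    data = list(left) + list(reversed(right))
    stride = n // 2
    while stride >= 1:
        # Within one stage every compare-exchange touches a disjoint pair,
        # so a GPU could assign one thread to each pair.
        for i in range(n):
            if i & stride == 0 and data[i] > data[i + stride]:
                data[i], data[i + stride] = data[i + stride], data[i]
        stride //= 2
    return data


def merge_all(tiles):
    """Merge pairs of sorted tiles until a single sorted list remains."""
    while len(tiles) > 1:
        tiles = [bitonic_merge(tiles[i], tiles[i + 1])
                 for i in range(0, len(tiles), 2)]
    return tiles[0]


print(merge_all([[3, 9], [1, 4], [0, 7], [2, 5]]))
```

The inner loop is written sequentially only for readability; the point of the bitonic network is that its comparison pattern is fixed in advance and independent of the data.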


Parallel Radix Sort: Where is the parallelism? Each tile is sorted in parallel. Where is the parallelism within a tile? Each pass is done in sequence after the previous pass, so there is no parallelism across passes. Can we parallelize an individual pass? How? The merge also has parallelism.

Parallel Radix Sort: Implement split. Given: an array, i, at pass b, and an array, b, which is true/false for bit b of each key. Output: an array with the false keys before the true keys.

Parallel Radix Sort: Step 1: Compute the e array from the b array. e[i] is 1 where b[i] is false and 0 where b[i] is true.

Parallel Radix Sort: Step 2: Exclusive scan the e array to produce the f array.

Parallel Radix Sort: Step 3: Compute totalFalses. totalFalses = e[n – 1] + f[n – 1]. For the example, totalFalses = 4.
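Steps 1 to 3 can be sketched as follows. The input keys are an assumption (only the first key, 100, and the result totalFalses = 4 appear above), and the helper `exclusive_scan` is a hypothetical sequential stand-in for a parallel scan.

```python
def exclusive_scan(values):
    """Return the running sum of values, excluding the current element."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


# Assumed 3-bit input keys for the first pass (bit 0, the LSB).
keys = [0b100, 0b111, 0b010, 0b110, 0b011, 0b101, 0b001, 0b000]

b = [k & 1 for k in keys]       # b array: 1 where the LSB is set
e = [1 - flag for flag in b]    # Step 1: e is 1 where b is false
f = exclusive_scan(e)           # Step 2: f[i] counts false keys before i
total_falses = e[-1] + f[-1]    # Step 3: total number of false keys

print("b =", b)
print("e =", e)
print("f =", f)
print("totalFalses =", total_falses)
```

Because f is an exclusive scan, f[i] is exactly the output position of key i when that key is false, and adding e[n – 1] accounts for the last element, which the exclusive scan leaves out.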

Parallel Radix Sort: Step 4: Compute the t array. t[i] = i – f[i] + totalFalses, with totalFalses = 4.

t[0] = 0 – f[0] + totalFalses = 0 – 0 + 4 = 4

t[1] = 1 – f[1] + totalFalses = 1 – 1 + 4 = 4

t[2] = 2 – f[2] + totalFalses = 2 – 1 + 4 = 5

The remaining entries of t follow from the same formula.
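Step 4 can be checked with a short sketch. The f array below is an assumption: it is what an exclusive scan would give for the assumed input 100 111 010 110 011 101 001 000, and it agrees with the three values of t worked out above.

```python
# Assumed f array and totalFalses for the first pass of the example.
f = [0, 1, 1, 2, 3, 3, 3, 3]
total_falses = 4

# i - f[i] is the number of true keys before position i, so adding
# totalFalses places each true key after all of the false keys.
t = [i - f[i] + total_falses for i in range(len(f))]

print("t =", t)
```

Only the entries of t at positions where b is true are used later; the others are computed anyway so that every thread can do the same work without branching.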

Parallel Radix Sort: Step 5: Scatter based on address d. d[i] = b[i] ? t[i] : f[i]. For the example, the first three addresses are d[0] = 0, d[1] = 4, and d[2] = 1. Each element of the i array is then written to the output at its address in d.
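The scatter in Step 5 can be sketched as below. The arrays are assumptions derived from the assumed input 100 111 010 110 011 101 001 000; they reproduce the addresses 0, 4, 1 shown above.

```python
# Assumed arrays for the first pass (bit 0) of the example.
keys = [0b100, 0b111, 0b010, 0b110, 0b011, 0b101, 0b001, 0b000]
b = [0, 1, 0, 0, 1, 1, 1, 0]
f = [0, 1, 1, 2, 3, 3, 3, 3]
t = [4, 4, 5, 5, 5, 6, 7, 8]

# Destination address: true keys use t, false keys use f.
d = [t[i] if b[i] else f[i] for i in range(len(keys))]

# Scatter: every key is written to its own unique address, so on a GPU
# each thread could perform one write with no conflicts.
output = [None] * len(keys)
for i, key in enumerate(keys):
    output[d[i]] = key

print("d      =", d)
print("output =", [format(k, "03b") for k in output])
```

The addresses in d form a permutation of 0..n-1, which is what makes the scatter safe to run in parallel.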

Parallel Radix Sort: Given k-bit keys, how do we sort using our new split function? Once each tile is sorted, how do we merge tiles to provide the final sorted array?
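One possible answer to the first question is to call split once per bit, from the LSB to the MSB. The sketch below is a sequential illustration under that assumption; the function names `split` and `radix_sort` and the example keys are hypothetical, and the scan and scatter would each be parallel on a GPU.

```python
from itertools import accumulate


def split(keys, bit):
    """Stable partition: keys with `bit` clear come before keys with it set."""
    n = len(keys)
    if n == 0:
        return []
    b = [(k >> bit) & 1 for k in keys]
    e = [1 - flag for flag in b]
    f = [0] + list(accumulate(e))[:-1]      # exclusive scan of e
    total_falses = e[-1] + f[-1]
    t = [i - f[i] + total_falses for i in range(n)]
    d = [t[i] if b[i] else f[i] for i in range(n)]
    out = [None] * n
    for i in range(n):
        out[d[i]] = keys[i]                 # scatter to unique addresses
    return out


def radix_sort(keys, k):
    """Sort k-bit keys with k calls to split, LSB first."""
    for bit in range(k):
        keys = split(keys, bit)
    return keys


# Assumed 3-bit example keys.
example = [0b100, 0b111, 0b010, 0b110, 0b011, 0b101, 0b001, 0b000]
print(radix_sort(example, 3))
```

This works because split is stable, so each pass preserves the order produced by the lower bits. For the second question, the earlier slide names a parallel bitonic merge over pairs of tiles.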