Stupid Columnsort Tricks Geeta Chaudhry Tom Cormen Dartmouth College Department of Computer Science.

Presentation transcript:

Stupid Columnsort Tricks Geeta Chaudhry Tom Cormen Dartmouth College Department of Computer Science

What Do We Know About Columnsort?
Sorts N values on an r × s mesh
Uses 8 steps
–Each step either sorts each column or performs a fixed permutation
Divisibility restriction: s divides r
Height restriction: r ≥ 2s^2
Relaxed height restriction: r ≥ 4s^(3/2)
–Exponent of s goes from 2 to 3/2
–Mesh need not be quite so tall and skinny
–Cost: 2 additional steps
–Can simultaneously remove the divisibility restriction and relax the height restriction to r ≥ 6s^(3/2)

Why Relax the Conditions?
Columnsort applies in more circumstances
Our motivation: out-of-core sorting
Column height r is limited by the amount of memory
–Either per processor or in the entire system
–N = rs, r ≥ 2s^2 ==> N ≤ r^(3/2) / 2^(1/2)
–N = rs, r ≥ 4s^(3/2) ==> N ≤ r^(5/3) / 4^(2/3)
–Reducing the exponent of s in the bound for r allows us to sort more values with a given amount of memory
A similar technique works for applying columnsort to in-core sorting
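The effect of the two height restrictions on problem size can be checked with a few lines of Python. This is a minimal sketch, assuming r is the number of records that fit in one column; the function names are illustrative and not from the talk. Integer arithmetic is used throughout, so r ≥ 4s^(3/2) is tested in the equivalent form r^2 ≥ 16s^3. The sketch ignores the extra requirement that s be a perfect square in slabpose columnsort.

```python
def original_ok(r, s):
    """Height restriction of basic columnsort: r >= 2 s^2."""
    return r >= 2 * s * s

def relaxed_ok(r, s):
    """Relaxed restriction r >= 4 s^(3/2), tested as r^2 >= 16 s^3."""
    return r * r >= 16 * s ** 3

def max_columns(r, ok):
    """Largest s for which the given restriction holds at column height r."""
    s = 0
    while ok(r, s + 1):
        s += 1
    return s

def max_values(r, ok):
    """Largest N = r * s sortable at column height r under the restriction."""
    return r * max_columns(r, ok)

r = 2 ** 20  # hypothetical column height, in records
print(max_values(r, original_ok), max_values(r, relaxed_ok))
```

For any fixed r, the second number is at least as large as the first once r is moderately large, which is the point of lowering the exponent.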

This Talk
Slabpose columnsort
–r ≥ 4s^(3/2)
–Requires divisibility restriction
Also in the paper
–Subblock columnsort
  r ≥ 4s^(3/2) with divisibility restriction
  r ≥ 6s^(3/2) without divisibility restriction
–Proof that the divisibility restriction is unnecessary in the basic columnsort algorithm

Columnsort Steps
1. Sort each column
2. Transpose entire mesh
3. Sort each column
4. Untranspose entire mesh
5. Sort each column
6. Shift down by half a column
7. Sort each column
8. Shift up by half a column
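The eight steps can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes the mesh is stored as a list of s columns of r values, that "transpose" means reading the mesh in column-major order and writing it back in row-major order, and that the result is read in column-major order. Steps 6 to 8 are folded into one helper: shifting down by half a column, sorting, and shifting back is the same as sorting windows of the column-major sequence that straddle column boundaries.

```python
import random

def sort_columns(cols):
    """Steps 1, 3, 5: sort every column independently."""
    return [sorted(c) for c in cols]

def transpose(cols):
    """Step 2: read in column-major order, write back in row-major order."""
    r, s = len(cols[0]), len(cols)
    out = [[None] * r for _ in range(s)]
    k = 0
    for col in cols:
        for x in col:
            out[k % s][k // s] = x  # k-th element goes to row k // s, column k % s
            k += 1
    return out

def untranspose(cols):
    """Step 4: inverse of transpose (read row-major, write column-major)."""
    r, s = len(cols[0]), len(cols)
    flat = [cols[k % s][k // s] for k in range(r * s)]
    return [flat[j * r:(j + 1) * r] for j in range(s)]

def shift_sort_unshift(cols):
    """Steps 6-8: sort the s + 1 columns of the mesh shifted by half a column."""
    r, s = len(cols[0]), len(cols)
    h = r // 2
    flat = [x for col in cols for x in col]  # column-major order
    # Window boundaries of the shifted columns within the column-major sequence.
    bounds = [0] + [j * r - h for j in range(1, s + 1)] + [r * s]
    for lo, hi in zip(bounds, bounds[1:]):
        flat[lo:hi] = sorted(flat[lo:hi])
    return [flat[j * r:(j + 1) * r] for j in range(s)]

def columnsort(values, r, s):
    """Sort r * s values; the result is in column-major order."""
    assert len(values) == r * s and r >= 2 * s * s
    cols = [list(values[j * r:(j + 1) * r]) for j in range(s)]
    cols = sort_columns(cols)        # step 1
    cols = transpose(cols)           # step 2
    cols = sort_columns(cols)        # step 3
    cols = untranspose(cols)         # step 4
    cols = sort_columns(cols)        # step 5
    cols = shift_sort_unshift(cols)  # steps 6-8
    return [x for col in cols for x in col]

values = [random.randrange(1000) for _ in range(8 * 2)]
assert columnsort(values, 8, 2) == sorted(values)
```

Every step other than the column sorts moves data by a fixed pattern that does not depend on the values, which is what makes the algorithm oblivious.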

Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns
5. Slabpose
6. Sort each column
7. Untranspose entire mesh
8. Sort each column
9. Shift down by half a column
10. Sort each column
11. Shift up by half a column
Oblivious!

Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns + slabpose
5. Sort each column
6. Untranspose entire mesh
7. Sort each column
8. Shift down by half a column
9. Sort each column
10. Shift up by half a column
Oblivious!
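A sketch of slabpose columnsort in Python follows. Several details are assumptions made for illustration rather than taken from the talk: s is a perfect square, slabs are √s columns wide, slabpose applies the column-major-to-row-major transpose inside each slab, and the shuffle sends the column at (slab k, offset j) to (slab j, offset k) so that every new slab receives one column from each old slab. The paper's exact permutations may differ.

```python
import math
import random

def sort_columns(cols):
    return [sorted(c) for c in cols]

def transpose(cols):
    """Read in column-major order, write back in row-major order."""
    r, s = len(cols[0]), len(cols)
    out = [[None] * r for _ in range(s)]
    k = 0
    for col in cols:
        for x in col:
            out[k % s][k // s] = x
            k += 1
    return out

def untranspose(cols):
    """Inverse of transpose."""
    r, s = len(cols[0]), len(cols)
    flat = [cols[k % s][k // s] for k in range(r * s)]
    return [flat[j * r:(j + 1) * r] for j in range(s)]

def slabpose(cols, q):
    """Transpose independently within each vertical slab of q columns."""
    out = []
    for start in range(0, len(cols), q):
        out.extend(transpose(cols[start:start + q]))
    return out

def shuffle(cols, q):
    """Assumed shuffle: column (slab k, offset j) moves to (slab j, offset k)."""
    out = [None] * len(cols)
    for k in range(q):
        for j in range(q):
            out[j * q + k] = cols[k * q + j]
    return out

def shift_sort_unshift(cols):
    """Shift down by half a column, sort, shift up, as window sorts."""
    r, s = len(cols[0]), len(cols)
    h = r // 2
    flat = [x for col in cols for x in col]
    bounds = [0] + [j * r - h for j in range(1, s + 1)] + [r * s]
    for lo, hi in zip(bounds, bounds[1:]):
        flat[lo:hi] = sorted(flat[lo:hi])
    return [flat[j * r:(j + 1) * r] for j in range(s)]

def slabpose_columnsort(values, r, s):
    """Sort r * s values; the result is in column-major order."""
    q = math.isqrt(s)
    assert q * q == s, "this sketch assumes s is a perfect square"
    assert r * r >= 16 * s ** 3, "needs r >= 4 s^(3/2)"
    cols = [list(values[j * r:(j + 1) * r]) for j in range(s)]
    cols = sort_columns(cols)               # sort each column
    cols = slabpose(cols, q)                # slabpose
    cols = sort_columns(cols)               # sort each column
    cols = slabpose(shuffle(cols, q), q)    # shuffle columns + slabpose
    cols = sort_columns(cols)               # sort each column
    cols = untranspose(cols)                # untranspose entire mesh
    cols = sort_columns(cols)               # sort each column
    cols = shift_sort_unshift(cols)         # shift down, sort, shift up
    return [x for col in cols for x in col]

values = [random.randrange(1000) for _ in range(32 * 4)]
assert slabpose_columnsort(values, 32, 4) == sorted(values)
```

With s = 4 the smallest admissible height is r = 32, whereas the basic algorithm would need r ≥ 2s^2 = 32 as well; the gap opens for larger s, for example s = 9 needs r ≥ 108 here against r ≥ 162 for basic columnsort.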

Why Work With Vertical Slabs?
In regular columnsort, the matrix needs to be tall and skinny
Working with vertical slabs allows us to change the aspect ratio to use tall and skinny slabs
We’ll use slabs that are √s columns wide
The mesh will have √s slabs

0-1 Principle
If an oblivious algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values
Use the 0-1 Principle by looking at portions of the r × s mesh
–Clean: all 0s or all 1s
–Dirty: may be mixed 0s and 1s
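The 0-1 Principle turns a correctness proof into a finite check over 0-1 inputs. As a small illustration, separate from columnsort itself, the sketch below runs a fixed sequence of compare-exchange operations (an oblivious algorithm) on every 0-1 input of length 4. The network and function names are illustrative choices, not from the talk.

```python
from itertools import product

# A 4-input comparator network: each pair (i, j) puts the smaller value at i.
NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def run_network(network, values):
    """Apply the compare-exchange operations in a fixed, data-independent order."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def sorts_all_zero_one_inputs(network, n):
    """Check the network on all 2^n inputs consisting solely of 0s and 1s."""
    return all(run_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

# By the 0-1 Principle, passing this check implies the network sorts
# arbitrary values as well.
assert sorts_all_zero_one_inputs(NETWORK, 4)
assert run_network(NETWORK, [3, 1, 4, 2]) == [1, 2, 3, 4]
```

Dropping the last comparator makes the check fail, for example on the input 0, 1, 0, 1.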

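Step 1: Sort Each Column
[Figure: r × s mesh with regions labeled 0, dirty, and 1]

Step 2: Slabpose
[Figure: mesh divided into √s slabs, each √s columns wide; ≤ √s dirty rows per slab]

Step 3: Sort Each Column
[Figure: ≤ √s dirty rows in each slab]

Step 4: Shuffle
[Figure: the √s slabs before and after the shuffle; dirty areas ≤ √s rows high]

Step 5: Slabpose
[Figure: the √s slabs before and after slabpose; each column becomes r/√s rows; each dirty area covers ≤ 2 rows; √s sets of dirty rows]

Step 6: Sort Each Column
≤ 2√s dirty rows ==> ≤ 2s^(3/2) dirty elements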

Step 7: Untranspose Entire Mesh
Dirty area: ≤ 2s^(3/2) elements
r ≥ 4s^(3/2) ==> 2s^(3/2) ≤ r/2 ==> dirty area ≤ half a column
Once the size of the dirty area is at most half a column, the last four steps will finish up

Step 8: Sort Each Column
Dirty area resides in one column ==> done

Step 8: Sort Each Column
Dirty area resides in two columns ==> no change

Step 9: Shift Down by Half a Column
Dirty area resides in one column

Step 10: Sort Each Column
Dirty area resides in one column

Step 11: Shift Up by Half a Column
Sorted

Subblock Columnsort
Adds two steps to columnsort
–Sort each column
–A fixed permutation
The permutation is any one that distributes all elements of each √s × √s subblock to all s columns
Like slabpose columnsort, the size of the dirty area is ≤ 2s^(3/2) entering the last four steps
As long as 2s^(3/2) ≤ r/2 (half a column), the last four steps complete the sorting
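The property required of the permutation can be tested mechanically. The sketch below defines one hypothetical permutation with that property and a checker for it. It assumes s is a perfect square, √s divides r, and the mesh is a list of s columns of r values. The permutation shown is an illustrative choice and is not claimed to be the one used in the paper.

```python
import math

def subblock_permute(cols):
    """Send the element at local position (a, b) of every subblock to column a*q + b."""
    r, s = len(cols[0]), len(cols)
    q = math.isqrt(s)
    assert q * q == s and r % q == 0
    out = [[None] * r for _ in range(s)]
    for j in range(s):
        for i in range(r):
            a, b = i % q, j % q              # position inside the q x q subblock
            block = (i // q) * q + (j // q)  # index of the subblock, 0 .. r-1
            out[a * q + b][block] = cols[j][i]
    return out

def has_subblock_property(permute, r, s):
    """True if every q x q subblock ends up spread over all s columns."""
    q = math.isqrt(s)
    # Label each mesh position with the subblock it starts in.
    labels = [[(i // q, j // q) for i in range(r)] for j in range(s)]
    columns_hit = {}
    for col, column in enumerate(permute(labels)):
        for label in column:
            columns_hit.setdefault(label, set()).add(col)
    return all(len(hit) == s for hit in columns_hit.values())

assert has_subblock_property(subblock_permute, 8, 4)
assert not has_subblock_property(lambda cols: cols, 8, 4)  # identity fails
```

Each subblock holds exactly s elements, so landing in s distinct columns means one element per column.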

Removing the Divisibility Restriction from Columnsort
With the divisibility restriction, the dirty rows after the transpose step have only 0 → 1 transitions
Without the divisibility restriction, there may also be 1 → 0 transitions
The proof shows that even with the 1 → 0 transitions, the size of the dirty area entering the last four steps does not increase
Thus r ≥ 2s^2 suffices, even without the divisibility restriction
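The difference between the two cases is easy to see on a small 0-1 example. The sketch below assumes the transpose step reads the mesh in column-major order and writes it in row-major order, builds the rows that result from sorted 0-1 columns, and counts the rows containing a 1 → 0 transition. The function names and the sample zero counts are illustrative.

```python
def rows_after_transpose(zero_counts, r, s):
    """Rows of the mesh after transposing s sorted 0-1 columns of height r.

    zero_counts[j] is the number of 0s at the top of column j.
    """
    flat = []
    for z in zero_counts:
        flat.extend([0] * z + [1] * (r - z))  # column-major order
    return [flat[i * s:(i + 1) * s] for i in range(r)]  # laid down row-major

def rows_with_one_to_zero(rows):
    """Number of rows in which a 1 is immediately followed by a 0."""
    return sum(any(a == 1 and b == 0 for a, b in zip(row, row[1:]))
               for row in rows)

# s = 2 divides r = 8: column boundaries coincide with row boundaries.
print(rows_with_one_to_zero(rows_after_transpose([3, 5], 8, 2)))
# s = 2 does not divide r = 9: a column boundary falls inside a row.
print(rows_with_one_to_zero(rows_after_transpose([3, 5], 9, 2)))
```

When s divides r, each column ends exactly at the end of a row, so the 1s that close one column and the 0s that open the next never share a row.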

Conclusion
We can get around the restrictions of columnsort
Reduce the exponent in the height restriction from 2 to 3/2
–The mesh need not be quite so tall and skinny
–Cost: Two extra steps
–In out-of-core implementation, slabpose columnsort requires no additional I/O
The divisibility restriction is unnecessary
Open question: Can we reduce the exponent further within the columnsort framework?