Duality between Reading and Writing with Applications to Sorting. Jeff Vitter, Department of Computer Science, Center for Geometric & Biological Computing, Duke University. EEF Summer School on Massive Data Sets.

Duality between Reading and Writing

Read (Prefetching) Problem: process blocks from the disks in the order given by an access sequence Σ = {b1, b2, b3, …}.
Write Problem: process the reverse access sequence Σ^R in order. Each processed block must reside either in memory or on the disks.
Two main points:
- How to schedule I/Os exactly optimally, given a fixed memory size?
- There is a natural duality between the read and write problems: the reverse of a valid write schedule is a valid read schedule for the reverse access sequence.
OK, one more point: duality applies to sorting too! We know how to do distribution; now, by duality, merging too.

Greedy Read Schedule (used by the SRM method)

- Define a block's trigger value to be the smallest item in the block.
- The order in which a block must be used in the merge is defined by its trigger value.
- The trigger values in sorted order give the sequence Σ.
- At each I/O step, the greedy algorithm conceptually:
  - reads in from each disk the block with the smallest trigger value;
  - if there is not enough room in main memory (say, k blocks too many), kicks the k blocks with the largest trigger values back out to disk.
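The greedy rule above can be simulated in a few lines. This is a hypothetical sketch, not the SRM implementation: blocks are (trigger, name) pairs, each disk holds a list of them, and the function returns the blocks actually read at each I/O step (a kicked-out block goes back onto its disk and is re-read later).

```python
def greedy_read_schedule(disks, mem_size):
    # Each disk's blocks, sorted by trigger value and tagged with a disk index.
    queues = [sorted((t, i, n) for (t, n) in d) for i, d in enumerate(disks)]
    sigma = sorted(b for q in queues for b in q)     # access sequence, by trigger
    memory, steps, pos = set(), [], 0
    while pos < len(sigma):
        incoming = {q.pop(0) for q in queues if q}   # smallest trigger per disk
        memory |= incoming
        while len(memory) > mem_size:                # memory overflows: kick out
            worst = max(memory)                      # the largest trigger values
            memory.discard(worst)
            incoming.discard(worst)
            queues[worst[1]].insert(0, worst)        # back onto its own disk
        steps.append(sorted(n for (_, _, n) in incoming))
        while pos < len(sigma) and sigma[pos] in memory:
            memory.discard(sigma[pos])               # process next block of sigma
            pos += 1
    return steps
```

On the running example (triggers taken to be each block's rank in Σ), this reproduces the 6-step greedy schedule of the next slides.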

Optimal Schedule via Duality (Absolute Best!) vs. Greedy (Nonoptimal) Read Schedule

Disk contents:              Sorted runs:
Disk 1: A1 A2 A3 A4         Run 1: A1 B1 C1
Disk 2: B1 B2 B3 B4         Run 2: C2 B2 A2
Disk 3: C1 C2 C3 C4         Run 3: C3 B3 A3
                            Run 4: C4 B4 A4
(Internal) memory size = 5
Ordered read stream Σ:    C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1
Reverse write stream Σ^R: C1 B1 A4 A3 A2 A1 B4 B3 B2 C4 C3 C2

Optimal schedule via duality (read steps 1-5 = write steps 5-1):

  Read step:  1    2    3    4    5
  Disk 1:     -    A1   A2   A3   A4
  Disk 2:     -    B2   B3   B4   B1
  Disk 3:     C2   C3   C4   -    C1

Greedy (nonoptimal) read schedule (6 steps):

  Read step:  1    2    3    4    5    6
  Disk 1:     A1   A2   -    A2   A3   A4
  Disk 2:     B2   B3   B4   B1   -    -
  Disk 3:     C2   C3   C4   C1   -    -

Greedy (Nonoptimal) Read Schedule

Disk contents: Disk 1: A1 A2 A3 A4; Disk 2: B1 B2 B3 B4; Disk 3: C1 C2 C3 C4.
Sorted runs: Run 1: A1 B1 C1; Run 2: C2 B2 A2; Run 3: C3 B3 A3; Run 4: C4 B4 A4.
(Internal) memory size = 5.
Ordered read stream Σ: C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1.

The sequence Σ is defined by the order of trigger values. At each step of the algorithm, for each disk, the block still on that disk with the smallest trigger value is read into memory.
Step 1: A1, B2, C2 are read into memory. C2 (the first block in Σ) is processed and removed from memory.
Step 2: A2, B3, C3 are read into memory (smallest trigger value left on each disk). C3 is processed and removed from memory.
Step 3: A3, B4, C4 have the smallest trigger values left on each disk. However, there are already four blocks in memory. Of the seven, A2 and A3 have the largest trigger values, so only C4 and B4 are read in; A2 is kicked back out to disk, and A3 is not read in. C4, B2, B3, B4, and A1 are then processed and removed from memory.
Step 4: A2, B1, and C1 are read into memory (smallest trigger value left on each disk). A2 is processed and removed from memory.
Step 5: A3 is read into memory, processed, and removed.
Step 6: A4 is read into memory. A4, B1, and C1 are processed and removed from memory.
The greedy algorithm takes 6 read I/O steps:

  Read step:  1    2    3    4    5    6
  Disk 1:     A1   A2   -    A2   A3   A4
  Disk 2:     B2   B3   B4   B1   -    -
  Disk 3:     C2   C3   C4   C1   -    -

Optimal Write Schedule (Reverse of the Optimal Read Schedule)

(Internal) memory size = 5.
Reverse write stream Σ^R: C1 B1 A4 A3 A2 A1 B4 B3 B2 C4 C3 C2.

The reverse sequence is processed to find the optimal solution. The algorithm fills main memory in the order of the reverse sequence and, whenever main memory is full, writes the block at the head of each disk's queue.
Step 1: Memory is filled with C1, B1, A4, A3, and A2. A4, B1, and C1 are written.
Step 2: Memory is refilled with A1, B4, and B3. A3 and B4 are written and removed from memory.
Step 3: Memory is refilled with B2 and C4. A2, B3, and C4 are written and removed from memory.
Step 4: Memory is refilled with C3 and C2. A1, B2, and C3 are written and removed from memory.
Step 5: C2 is written and removed from memory.
The reverse of this sequence of writes is the optimal read schedule.

  Write step:  1    2    3    4    5
  Disk 1:      A4   A3   A2   A1   -
  Disk 2:      B1   B4   B3   B2   -
  Disk 3:      C1   -    C4   C3   C2
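The fill-and-flush procedure just described is easy to simulate. Below is a hedged sketch (function and parameter names are my own): memory is filled from the reverse sequence, and each write I/O step emits, for every disk, the earliest-arrived buffered block bound for that disk, i.e. the head of its queue.

```python
from collections import deque

def optimal_write_schedule(sigma_r, disk_of, mem_size):
    # Fill memory from the reverse sequence; when full (or input exhausted),
    # perform one write step: the head-of-queue block for each disk.
    pending = deque(sigma_r)
    memory, steps = [], []              # memory keeps blocks in arrival order
    while pending or memory:
        while pending and len(memory) < mem_size:
            memory.append(pending.popleft())
        written, used = [], set()
        for b in memory:                # earliest-arrived block per disk
            if disk_of[b] not in used:
                used.add(disk_of[b])
                written.append(b)
        for b in written:
            memory.remove(b)
        steps.append(sorted(written))
    return steps
```

Reversing the returned list of write steps yields, by duality, the optimal read schedule for Σ.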

Prudent Prefetching

- The optimal read schedule defined by duality reads blocks in as late as possible (since the reverse write schedule writes blocks out as early as possible). We call it lazy prefetching.
- This laziness can be a problem if there are timing issues, and it does not lend itself to online scheduling with finite lookahead.
- Solution: prudent prefetching pushes the reads of the optimal lazy schedule up as far as possible.
- The algorithm processes the optimal schedule starting from the first read step and moves reads earlier when possible, under the consistency constraint that if block x is read before block y in the original lazy schedule, then y cannot be read before x in the prudent schedule.
- This counteracts possible I/O delays by not waiting until the last minute to read blocks, so the algorithm is less sensitive to timing delays.
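One way to realize this idea is sketched below. It is a simplified eager variant, not the exact push-up procedure of the slides: at every I/O step it reads as many front blocks as memory allows (at most one per disk), preferring smaller trigger values, which preserves each disk's lazy read order. On the running example it matches the optimal 5-step count.

```python
def prudent_prefetch(disks, mem_size):
    # disks: one list of (trigger, block_name) pairs per disk.
    queues = [sorted((t, i, n) for (t, n) in d) for i, d in enumerate(disks)]
    sigma = sorted(b for q in queues for b in q)    # access sequence by trigger
    memory, steps, pos = set(), [], 0
    while pos < len(sigma):
        # Candidates: the front block of each non-empty disk, most urgent first.
        fronts = sorted(q[0] for q in queues if q)
        reads = fronts[: max(0, mem_size - len(memory))]   # fill spare memory
        for (t, i, n) in reads:
            queues[i].pop(0)                               # one read per disk
        memory |= set(reads)
        steps.append(sorted(n for (_, _, n) in reads))
        while pos < len(sigma) and sigma[pos] in memory:
            memory.discard(sigma[pos])                     # process sigma
            pos += 1
    return steps
```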

Prudent Prefetching Example

(Internal) memory size = 5.
Ordered read stream Σ: C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1.

Optimal (lazy) schedule via duality (read steps 1-5 = write steps 5-1):

  Read step:  1    2    3    4    5
  Disk 1:     -    A1   A2   A3   A4
  Disk 2:     -    B2   B3   B4   B1
  Disk 3:     C2   C3   C4   -    C1

Prudent prefetching schedule (the same reads, pushed as early as memory allows):

  Read step:  1    2    3    4    5
  Disk 1:     A1   A2   -    A3   A4
  Disk 2:     B2   B3   -    B4   B1
  Disk 3:     C2   C3   C4   C1   -

Comparison of Prudent Prefetching with Greedy

(Internal) memory size = 5.
Ordered read stream Σ: C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1.

Prudent prefetching schedule (5 steps):

  Read step:  1    2    3    4    5
  Disk 1:     A1   A2   -    A3   A4
  Disk 2:     B2   B3   -    B4   B1
  Disk 3:     C2   C3   C4   C1   -

Greedy schedule (6 steps):

  Read step:  1    2    3    4    5    6
  Disk 1:     A1   A2   -    A2   A3   A4
  Disk 2:     B2   B3   B4   B1   -    -
  Disk 3:     C2   C3   C4   C1   -    -

Exploiting Duality between Reading and Writing

- It is usually easier to demonstrate optimality for the writing problem.
- Result: exact optimal solutions to the corresponding read (prefetching) problems on the reverse sequence [Kallahalla-Varman, Hutchinson-Sanders-Vitter].

Duplicate Blocks in the Access Sequence

What if a block appears multiple times in Σ (and thus in Σ^R)?
- For the Read (Prefetching) Problem, a block does not need to be read from the disks if it is already cached in memory.
- Define a corresponding write problem: a block does not need to be written to disk if a more recent version of the block has replaced it in memory. Each disk queue in memory is managed using Belady's rule: at each I/O step, write to disk the block whose next access is furthest in the future.
- The result is an optimal write schedule, and thus, by duality, an optimal prefetching (read) schedule.
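The furthest-in-the-future rule for one disk's queue can be sketched as a small helper (a hypothetical function of my own; `future` is the remainder of the reverse access sequence):

```python
def belady_write_victim(buffered, future):
    # Write the block whose next occurrence in the remaining sequence is
    # furthest away; a block that never occurs again is the best victim.
    def next_use(block):
        try:
            return future.index(block)
        except ValueError:
            return len(future)        # never accessed again
    return max(buffered, key=next_use)
```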

Application to Sorting

- In the previous read and write problems, the layout of the blocks on the disks was specified beforehand.
- What if we can choose the layout of the blocks on the disks, as in sorting? Can we do better?
What's to come:
- Using a regular but randomized layout (randomized cycling), we showed how to do distribution (writing) very efficiently. The result is a fast and practical distribution sort.
- If there is a duality between distribution and merging, then we can get an optimal merge by computing the optimal distribution on the reverse sequence.
- But what is the sequence? The merge is defined by runs, not by a single access sequence Σ.
Our goal: compute in advance the access sequence Σ corresponding to the merge. (We saw a hint of this earlier.)

Duality between Reading and Writing

Duality between Distribution Sort and Merge Sort

Coming Full Circle: I/O-Efficient Mergesort via Duality

- In each pass of mergesort, sorted runs of data are merged together. A run consists of blocks of sorted data.
- Runs are typically striped across the parallel disks.
- In main memory, each run has a (partially filled) block participating in the current merge.
- The first (smallest) element in each block is its trigger value, which determines when the block must be in main memory.
- Σ is formed by ordering the blocks by trigger value.
- Duality can then be used to obtain an optimal read schedule by finding an optimal write schedule on the reverse sequence Σ^R.
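Computing Σ in advance from the runs is straightforward. A minimal sketch, assuming each run is given as a list of blocks and each block as a sorted list of keys (names are illustrative):

```python
def merge_access_sequence(runs):
    # Each block's trigger value is its first (smallest) key; Sigma lists
    # every block, identified by its (run, block) position, in trigger order.
    blocks = []
    for r, run in enumerate(runs):
        for b, block in enumerate(run):
            blocks.append((block[0], r, b))   # (trigger, run index, block index)
    blocks.sort()
    return [(r, b) for (_, r, b) in blocks]
```

The resulting sequence (and its reverse Σ^R) is exactly what the duality-based write scheduler above needs as input.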