Duality between Reading and Writing with Applications to Sorting Jeff Vitter Department of Computer Science Center for Geometric & Biological Computing.

1 Duality between Reading and Writing with Applications to Sorting Jeff Vitter Department of Computer Science Center for Geometric & Biological Computing Duke University EEF Summer School on Massive Data Sets

2 Duality between Reading and Writing

Read (Prefetching) Problem: Process blocks from the disks in the order given by an access sequence Σ = {b1, b2, b3, …}.
Write Problem: Process the reverse access sequence Σ^R in order. Each processed block must reside either in memory or on the disks.
Two main points:
 How to schedule I/Os exactly optimally, given a fixed memory size?
 There is a natural duality between the read and write problems: the reverse of a valid write schedule is a valid read schedule for the reverse access sequence.
OK, one more point: duality applies to sorting too! We know how to do distribution; now, by duality, merging too.

3

4 Greedy Read Schedule (used by the SRM method)

 Define a block's trigger value to be the smallest item in the block.
 The order in which a block must be used in the merge is determined by its trigger value.
 The trigger values in sorted order give the sequence Σ.
 At each I/O step, the greedy algorithm conceptually:
   reads in from each disk the block with the smallest trigger value;
   if there is not enough room in main memory (say, k blocks too many), kicks the k blocks with the largest trigger values back out to disk.
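The greedy schedule above can be simulated in a few lines. This is an illustrative sketch, not the SRM authors' code; it assumes memory holds at most M blocks, each I/O step reads at most one block per disk, and a block's trigger value is its position in Σ. All function and variable names are made up for the example.

```python
# Illustrative sketch of the greedy read schedule (not the authors' code).
# Assumptions: memory holds at most M blocks, each I/O step reads at most
# one block per disk, and a block's trigger value is its rank in Sigma.

def greedy_read_schedule(disks, sigma, M):
    """disks: lists of block names per disk; sigma: blocks in access order.
    Returns the number of read I/O steps the greedy schedule uses."""
    trigger = {b: i for i, b in enumerate(sigma)}
    home = {b: i for i, d in enumerate(disks) for b in d}
    on_disk = [set(d) for d in disks]
    memory, pos, steps = set(), 0, 0
    while pos < len(sigma):
        steps += 1
        # From each nonempty disk, the block with the smallest trigger value.
        cands = [min(d, key=trigger.get) for d in on_disk if d]
        pool = memory | set(cands)
        # Keep the M smallest-trigger blocks; the rest stay on (or return to) disk.
        keep = set(sorted(pool, key=trigger.get)[:M])
        for b in pool:
            if b in keep:
                on_disk[home[b]].discard(b)   # read in (or kept in memory)
            else:
                on_disk[home[b]].add(b)       # kicked back out, or never read
        memory = keep
        # Process the next blocks of Sigma while they are resident in memory.
        while pos < len(sigma) and sigma[pos] in memory:
            memory.discard(sigma[pos])
            pos += 1
    return steps

# The example from the following slides: greedy needs 6 steps (optimum is 5).
disks = [["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"], ["C1", "C2", "C3", "C4"]]
sigma = ["C2", "C3", "C4", "B2", "B3", "B4", "A1", "A2", "A3", "A4", "B1", "C1"]
print(greedy_read_schedule(disks, sigma, 5))  # 6
```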

5 Optimal Schedule via Duality (Absolute Best!)

Disk contents:              Sorted Runs:
Disk 1: A1 A2 A3 A4         1: A1 B1 C1
Disk 2: B1 B2 B3 B4         2: C2 B2 A2
Disk 3: C1 C2 C3 C4         3: C3 B3 A3
                            4: C4 B4 A4
(Internal) Memory size = 5
Ordered Read Stream Σ:    C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1
Reverse Write Stream Σ^R: C1 B1 A4 A3 A2 A1 B4 B3 B2 C4 C3 C2

Optimal Schedule via Duality (5 I/O steps):
Read I/O step:    1    2    3    4    5
(Write I/O step:  5    4    3    2    1)
Disk 1:           -    A1   A2   A3   A4
Disk 2:           -    B2   B3   B4   B1
Disk 3:           C2   C3   C4   -    C1

Greedy (Nonoptimal) Read Schedule (6 I/O steps):
Read I/O step:    1    2    3    4    5    6
Disk 1:           A1   A2   -    A2   A3   A4
Disk 2:           B2   B3   B4   B1   -    -
Disk 3:           C2   C3   C4   C1   -    -

6 Greedy (Nonoptimal) Read Schedule

The sequence Σ is defined by the order of trigger values. At each step of the algorithm, for each disk, the block still on that disk with the smallest trigger value is read into memory.

Step 1: A1, B2, and C2 are read into memory. C2 (the first block in Σ) is processed and removed from memory.
Step 2: A2, B3, and C3 are read in (the smallest trigger values left on each disk). C3 is processed and removed from memory.
Step 3: A3, B4, and C4 have the smallest trigger values left on each disk, but there are already four blocks in memory. Of the seven, A2 and A3 have the largest trigger values, so C4 and B4 are read in, A2 is kicked back out to disk, and A3 is not read in. C4, B2, B3, B4, and A1 are processed and removed from memory.
Step 4: A2, B1, and C1 are read into memory. A2 is processed and removed from memory.
Step 5: A3 is read into memory, processed, and removed.
Step 6: A4 is read into memory. A4, B1, and C1 are processed and removed from memory.

The greedy algorithm takes 6 read I/O steps.

7 Optimal Write Schedule (Reverse of the Optimal Read Schedule)

The reverse write sequence Σ^R is processed to find the optimal solution. The algorithm fills main memory in the order of the reverse sequence and, when main memory is full, writes the blocks at the head of the per-disk queues.

Step 1: Main memory is filled with C1, B1, A4, A3, and A2. A4, B1, and C1 are written.
Step 2: Main memory is filled with A1, B4, and B3. A3 and B4 are written and removed from memory.
Step 3: Main memory is filled with B2 and C4. A2, B3, and C4 are written and removed from memory.
Step 4: Main memory is filled with C3 and C2. A1, B2, and C3 are written and removed from memory.
Step 5: C2 is written and removed from memory.

Write I/O step:  1    2    3    4    5
Disk 1:          A4   A3   A2   A1   -
Disk 2:          B1   B4   B3   B2   -
Disk 3:          C1   -    C4   C3   C2

The reverse of this sequence of writes is the optimal read schedule.
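The write-side procedure just described is simple enough to sketch directly. This is a hypothetical reconstruction of the slide's model (not the paper's implementation): fill memory from Σ^R, and whenever memory is full or the stream is exhausted, perform one parallel write step that writes each disk's head block if it is resident.

```python
# Sketch of the greedy optimal write schedule from the slide (illustrative
# names, not the authors' code). Each disk's queue holds its blocks in the
# order Sigma^R delivers them; a block can be written only at queue head.

def optimal_write_schedule(sigma_rev, home, M):
    """sigma_rev: Sigma^R as a list of blocks; home: block -> disk index;
    M: memory capacity in blocks. Returns the number of write I/O steps."""
    ndisks = max(home.values()) + 1
    queues = [[b for b in sigma_rev if home[b] == d] for d in range(ndisks)]
    memory, i, steps = set(), 0, 0
    while any(queues):
        # Fill memory from the reverse stream until full (or stream empty).
        while i < len(sigma_rev) and len(memory) < M:
            memory.add(sigma_rev[i])
            i += 1
        steps += 1
        # One parallel write step: pop each disk's head block if resident.
        for q in queues:
            if q and q[0] in memory:
                memory.discard(q.pop(0))
    return steps

# The slide's example: 5 write steps; reversed, the optimal read schedule.
sigma_rev = ["C1", "B1", "A4", "A3", "A2", "A1", "B4", "B3", "B2", "C4", "C3", "C2"]
home = {f"{run}{i}": d for d, run in enumerate("ABC") for i in range(1, 5)}
print(optimal_write_schedule(sigma_rev, home, 5))  # 5
```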

8 Prudent Prefetching

 The optimal read schedule defined by duality reads blocks in as late as possible (since the reverse write schedule writes blocks as early as possible). We call it Lazy Prefetching.
 This laziness can be a problem if there are timing issues, and it does not lend itself to online scheduling with finite lookahead.
 Solution: Prudent Prefetching pushes the reads of the optimal lazy schedule to earlier steps as far as possible.
 The algorithm processes the optimal schedule starting from the first read step and reads blocks earlier when possible, under the consistency constraint that if block x is read before block y in the original lazy schedule, then y cannot be read before x in the prudent schedule.
 By not waiting until the last minute to read blocks, prudent prefetching counteracts possible I/O delays and is less sensitive to timing variations.
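One plausible way to realize the idea is a forward simulation that, at each step, admits the next block from each disk in the lazy schedule's per-disk order, lazily-earliest first, only while memory has room. This is an illustrative sketch of the principle, not the paper's algorithm; `lazy_pos` (each block's step in the lazy schedule) and the other names are assumptions made up for the example.

```python
# Hedged sketch of prudent prefetching: shift the lazy schedule's reads
# earlier, preserving the lazy cross-disk order and the memory bound M.

def prudent_schedule(lazy_per_disk, lazy_pos, sigma, M):
    """lazy_per_disk: per disk, its blocks in lazy read order;
    lazy_pos: block -> read step in the lazy schedule (for ordering);
    sigma: access sequence; M: memory capacity. Returns the step count."""
    heads = [0] * len(lazy_per_disk)
    memory, pos, steps = set(), 0, 0
    remaining = sum(len(d) for d in lazy_per_disk)
    while remaining > 0:
        steps += 1
        # Next candidate from each disk, lazily-earliest first, so a block
        # is never read strictly before one preceding it in the lazy order.
        cands = sorted(
            (lazy_pos[d[h]], i)
            for i, (d, h) in enumerate(zip(lazy_per_disk, heads))
            if h < len(d)
        )
        for _, i in cands:
            if len(memory) < M:           # read as early as memory permits
                memory.add(lazy_per_disk[i][heads[i]])
                heads[i] += 1
                remaining -= 1
        # Consume the available prefix of Sigma.
        while pos < len(sigma) and sigma[pos] in memory:
            memory.discard(sigma[pos])
            pos += 1
    return steps

# Lazy schedule from the running example: step 1 reads C2; step 2: A1,B2,C3;
# step 3: A2,B3,C4; step 4: A3,B4; step 5: A4,B1,C1.
lazy_per_disk = [["A1", "A2", "A3", "A4"], ["B2", "B3", "B4", "B1"], ["C2", "C3", "C4", "C1"]]
lazy_pos = {"C2": 1, "A1": 2, "B2": 2, "C3": 2, "A2": 3, "B3": 3, "C4": 3,
            "A3": 4, "B4": 4, "A4": 5, "B1": 5, "C1": 5}
sigma = ["C2", "C3", "C4", "B2", "B3", "B4", "A1", "A2", "A3", "A4", "B1", "C1"]
print(prudent_schedule(lazy_per_disk, lazy_pos, sigma, 5))  # 5, same step count as optimal
```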

9 Prudent Prefetching Example

(Internal) Memory size = 5
Ordered Read Stream Σ: C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1

Both the optimal (lazy) schedule and the prudent schedule take 5 read I/O steps and read the same per-disk block sequences:
Disk 1: A1 A2 A3 A4
Disk 2: B2 B3 B4 B1
Disk 3: C2 C3 C4 C1
The prudent schedule issues the same reads, shifted to earlier steps wherever memory permits.

10 Comparison of Prudent Prefetching with Greedy

(Internal) Memory size = 5
Ordered Read Stream Σ: C2 C3 C4 B2 B3 B4 A1 A2 A3 A4 B1 C1

Prudent Prefetching Schedule (5 I/O steps):
Disk 1: A1 A2 A3 A4
Disk 2: B2 B3 B4 B1
Disk 3: C2 C3 C4 C1

Greedy Schedule (6 I/O steps; note that A2 is read twice):
Read I/O step:  1    2    3    4    5    6
Disk 1:         A1   A2   -    A2   A3   A4
Disk 2:         B2   B3   B4   B1   -    -
Disk 3:         C2   C3   C4   C1   -    -

11 Exploiting Duality between Reading and Writing

 It is usually easier to demonstrate optimality for the Writing Problem.
 Result: exact optimal solutions to the corresponding Read (Prefetching) Problems on the reverse sequence [Kallahalla-Varman, Hutchinson-Sanders-Vitter].

12 Duplicate Blocks in the Access Sequence

What if a block appears multiple times in Σ (and thus in Σ^R)?
 For the Read (Prefetching) Problem, a block does not need to be read from the disks if it is already cached in memory.
 Define the corresponding write problem: a block does not need to be written to disk if a more recent version of the block has replaced it in memory. Each disk queue in memory is managed using Belady's algorithm: at each I/O step, write to disk the block whose next access is furthest in the future.
 The result is an optimal write schedule, and thus, by duality, an optimal prefetching (read) schedule.
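The Belady-style choice can be shown in isolation: among the blocks currently buffered for one disk, pick the one whose next occurrence in the access sequence is furthest away (or that never occurs again). The function and names below are illustrative, not the authors' code.

```python
# Sketch of the Belady-style victim selection described above.

def belady_victim(buffered, future):
    """buffered: blocks in memory queued for one disk; future: the remaining
    access sequence. Returns the block whose next access is furthest away."""
    def next_use(block):
        try:
            return future.index(block)
        except ValueError:
            return float("inf")   # never accessed again: write it out first
    return max(buffered, key=next_use)

# "B" is needed again immediately, "A" later, "C" never, so "C" is written.
print(belady_victim(["A", "B", "C"], ["B", "D", "A", "B"]))  # C
```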

13 Application to Sorting

 In the previous read and write problems, the layout of the blocks on the disks was specified beforehand.
 What if we can choose the layout of the blocks on the disks, as in sorting? Can we do better?
What's to come:
 Using a regular but randomized layout (randomized cycling), we showed how to do distribution (writing) very efficiently. The result is a fast and practical distribution sort.
 If there is a duality between distribution and merging, then we can get an optimal merge by computing the optimal distribution on the reverse sequence.
 But what is the sequence? The merge is defined by runs, not by a single access sequence Σ.
Our goal: Compute in advance the access sequence Σ corresponding to the merge. (We saw a hint of this earlier.)

14 Duality between Reading and Writing

15 Duality between Distribution Sort and Merge Sort

16 Coming Full Circle: I/O-Efficient Mergesort via Duality

 In each pass of mergesort, sorted runs of data are merged together. A run consists of blocks of sorted data.
 Runs are typically striped across the parallel disks.
 In main memory, each run has a (partially filled) block participating in the current merge.
 The first element in each block is its trigger value, which determines when the block must be in main memory.
 Σ is formed by ordering the blocks by trigger value.
 Duality can then be used to obtain an optimal read schedule by finding an optimal write schedule on the reverse sequence Σ^R.
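As a minimal illustration of forming Σ (with made-up numeric data; the blocks and values below are not from the slides): collect every block of every run and sort by trigger value, the block's first (smallest) element.

```python
# Toy sketch of computing the merge's access sequence Sigma in advance.

def merge_access_sequence(runs):
    """runs: each run is a list of blocks; each block is a sorted list of
    items. Returns the blocks ordered by trigger value (first element)."""
    blocks = [blk for run in runs for blk in run]
    return sorted(blocks, key=lambda blk: blk[0])

# Two toy runs; blocks must enter the merge in trigger order 1, 2, 7, 8.
runs = [[[1, 4], [7, 10]], [[2, 5], [8, 11]]]
print([blk[0] for blk in merge_access_sequence(runs)])  # [1, 2, 7, 8]
```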

