Memory Management during Run Generation in External Sorting – Larson & Graefe.

Need for External Merge Sort
- One of the most frequently used operations:
  - used in joins
  - used in duplicate elimination
  - used when the user requests a sort
- The algorithm must degrade gracefully: the query optimizer does not know in advance the exact number of records, or how it compares with the amount of memory available.

External Sorting
- External N-way merge sort, usually in two phases: run generation and merge
- During run generation, create one run at a time (load-sort-store):
  1. Load records
  2. Sort an array of pointers to the records (quicksort)
  3. Scan the pointer array while outputting records
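The load-sort-store loop and the subsequent N-way merge can be sketched as follows. This is a minimal illustration, not the paper's implementation: records are plain integers, the workspace size is a hypothetical `capacity` counted in records rather than bytes, runs are Python lists standing in for disk files, and Python's built-in `sorted` (Timsort) stands in for quicksort. All function names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List

def sort_and_store(workspace: List[int]) -> List[int]:
    """Steps 2 and 3: sort pointers to the records, then emit records in that order."""
    # Sort an array of "pointers" (indices), leaving the records where they are.
    pointers = sorted(range(len(workspace)), key=workspace.__getitem__)
    # Scan the pointer array and output the records; the list plays the role of a run file.
    return [workspace[p] for p in pointers]

def generate_runs(records: Iterable[int], capacity: int) -> List[List[int]]:
    """Load-sort-store: fill the workspace, sort it, write it out as a run, repeat."""
    runs: List[List[int]] = []
    workspace: List[int] = []
    for rec in records:
        workspace.append(rec)            # step 1: load records
        if len(workspace) == capacity:   # workspace full: sort and store one run
            runs.append(sort_and_store(workspace))
            workspace = []
    if workspace:                        # last, partially filled run
        runs.append(sort_and_store(workspace))
    return runs

def merge_runs(runs: List[List[int]]) -> Iterator[int]:
    """Merge phase: a single N-way merge over all runs."""
    return heapq.merge(*runs)
```

For example, `generate_runs([5, 3, 8, 1, 9, 2, 7], 3)` returns `[[3, 5, 8], [1, 2, 9], [7]]`, and merging those runs yields the fully sorted sequence. Every run is at most `capacity` records long; replacement selection is what lifts that limit.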

Run Generation
- Problem: load-sort-store makes the cost function discontinuous, so it is difficult to estimate the cost of a sort operator
  - Note: this is why, naively, hash join is preferred over sort-merge join
- Contribution: a sorting algorithm with a smooth cost function (graceful degradation)
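To make the discontinuity concrete, here is a deliberately simplified I/O model. It assumes that cost is counted in pages, that a single merge pass suffices, and that writing the final sorted output is excluded. `graceful_io` is a hypothetical, idealized target in which only the overflow beyond memory is spilled; it is not a formula taken from the paper.

```python
def load_sort_store_io(n_pages: int, m_pages: int) -> int:
    """Temporary I/O (pages) for load-sort-store plus one merge pass."""
    if n_pages <= m_pages:
        return 0                 # input fits: sorted entirely in memory, no temporary I/O
    return 2 * n_pages           # every page is written to a run and read back to merge

def graceful_io(n_pages: int, m_pages: int) -> int:
    """Idealized smooth cost: only the part that does not fit is written and re-read."""
    return 2 * max(0, n_pages - m_pages)
```

With 1000 pages of memory, growing the input from 1000 to 1001 pages makes `load_sort_store_io` jump from 0 to 2002 pages, while the idealized model rises only to 2 pages.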

This Paper
- Idea: similar to hybrid hash join, keep parts of the previous run in memory
- Problem: memory management during initial run generation

Assumptions
- Ascending sort order
- The sort must copy records into its own workspace
- No specialized disk page/file management
- Single thread of execution
- Variable-length records; no vertical partitioning of records possible

Assumptions (Contd.)
- A fixed amount of sort workspace (main memory) is available, divided into extents
- An extent is a sequence of segments:
  - Record segments and free segments, distinguished by a one-bit flag in the header
  - Each record segment holds one record
  - Segment type and length are stored in the segment header
  - Free segments store pointers for space management
  - Free segments never touch (adjacent free segments are merged)
- Separate output buffer (one or several pages, for asynchronous I/O)
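A minimal sketch of one extent as an ordered sequence of segments follows. It assumes hypothetical class names, models each segment as a Python object, and ignores the bytes a real segment header would occupy. Releasing a record coalesces it with any free neighbours, which is what maintains the "free segments never touch" invariant.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    is_free: bool                  # the one-bit type flag from the segment header
    length: int                    # segment length in bytes (header overhead ignored)
    record: Optional[bytes] = None # payload of a record segment

class Extent:
    def __init__(self, size: int) -> None:
        self.segments: List[Segment] = [Segment(True, size)]  # starts as one free segment

    def allocate(self, i: int, record: bytes) -> None:
        """Place a record at the start of free segment i, splitting off the remainder."""
        seg = self.segments[i]
        need = len(record)
        assert seg.is_free and seg.length >= need
        rest = seg.length - need
        self.segments[i] = Segment(False, need, record)
        if rest:  # the leftover stays free; its right neighbour cannot be free (invariant)
            self.segments.insert(i + 1, Segment(True, rest))

    def release(self, i: int) -> Optional[bytes]:
        """Free record segment i and coalesce it with free neighbours."""
        seg = self.segments[i]
        assert not seg.is_free
        record = seg.record
        seg.is_free, seg.record = True, None
        # Merge with the right neighbour first so that index i stays valid.
        if i + 1 < len(self.segments) and self.segments[i + 1].is_free:
            seg.length += self.segments.pop(i + 1).length
        if i > 0 and self.segments[i - 1].is_free:
            self.segments[i - 1].length += self.segments.pop(i).length
        return record

    def free_segments_never_touch(self) -> bool:
        return all(not (a.is_free and b.is_free)
                   for a, b in zip(self.segments, self.segments[1:]))
```

Allocating a 30-byte and a 20-byte record in a 100-byte extent and then releasing both returns the extent to a single 100-byte free segment.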

Variable-Length Fields and Records
- Human-readable text data (non-numeric)
- Even in numeric records, NULL values are usually not allocated space
- Non-traditional data (pictures, sound)
- Padding records to a fixed length is NOT a good idea!
- The difficulty of managing variable-length records deterred researchers from using this approach

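Memory Management Algorithms
- First fit: take the first free space that is large enough
- Next fit: same as first fit, but continue the search from where the previous one stopped, wrapping around
- Best fit: take the smallest free space that is large enough
- Collapse (coalesce) adjacent free spaces
- Move records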
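The three placement policies differ only in which free segment they pick. A minimal sketch, assuming the free list is simply a Python list of free-segment sizes in address order (a simplification of the paper's linked structures); the function and class names are hypothetical.

```python
from typing import List, Optional

def first_fit(free: List[int], need: int) -> Optional[int]:
    """Index of the first free segment large enough, scanning from the start."""
    for i, size in enumerate(free):
        if size >= need:
            return i
    return None

class NextFit:
    """First fit, but each search resumes where the previous one stopped."""

    def __init__(self) -> None:
        self.rover = 0  # position where the next search begins

    def find(self, free: List[int], need: int) -> Optional[int]:
        n = len(free)
        for step in range(n):
            i = (self.rover + step) % n   # wrap around the end of the list
            if free[i] >= need:
                self.rover = (i + 1) % n  # next search starts just after this segment
                return i
        return None

def best_fit(free: List[int], need: int) -> Optional[int]:
    """Index of the smallest free segment that is still large enough."""
    fits = [(size, i) for i, size in enumerate(free) if size >= need]
    return min(fits)[1] if fits else None
```

With free segments of 100, 60, 140 and 80 bytes and a 70-byte request, first fit picks index 0 (100 bytes) and best fit picks index 3 (80 bytes); repeated next-fit searches for 70 bytes visit indices 0, 2, 3, 0. The list is not shrunk after a hit, which keeps the example short.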

Run Formation
- First, do not write the run out of memory until needed: only output records from memory when space is needed for a new record
- These left-over records can be used in the first merge step of the external sort, which reduces I/O
- The benefit depends on the merge fan-in (number of input buffers): if the fan-in is maximal, few records remain in memory for this optimization

Replacement Selection
- If we keep track of the highest key written to the current run, the next record can be added to the current run provided its key is not lower
- A pointer array is no longer efficient; use a priority heap ("tree of losers")
- Runs are on average twice the size of the available memory (for random input)
- Result: steady I/O, and pre-sorted input is exploited
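Replacement selection can be sketched with Python's `heapq` module. Note that `heapq` is a binary min-heap, used here in place of the tree of losers named on the slide; records are integers and `capacity` (records, not bytes) is a hypothetical workspace size. Tagging each heap entry with a run number is one common way to keep records destined for the next run behind every record of the current run.

```python
import heapq
from typing import Iterable, List, Tuple

_END = object()  # sentinel marking exhausted input

def replacement_selection(records: Iterable[int], capacity: int) -> List[List[int]]:
    """Generate ascending sorted runs with replacement selection.

    Heap entries are (run_number, key): a record that cannot extend the current
    run is tagged with the next run number, so it sorts after the current run.
    """
    assert capacity >= 1
    it = iter(records)
    heap: List[Tuple[int, int]] = []
    for rec in it:                      # fill the workspace
        heap.append((0, rec))
        if len(heap) == capacity:
            break
    heapq.heapify(heap)

    runs: List[List[int]] = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no == len(runs):         # first record of a new run
            runs.append([])
        runs[run_no].append(key)        # key is now the highest key output to this run
        nxt = next(it, _END)
        if nxt is not _END:
            # The new record joins the current run only if its key is not lower.
            target = run_no if nxt >= key else run_no + 1
            heapq.heappush(heap, (target, nxt))
    return runs
```

On the input `[5, 1, 7, 3, 9, 2, 8, 4, 6]` with room for 3 records, this produces `[[1, 3, 5, 7, 8, 9], [2, 4, 6]]`: the first run is twice the workspace size. Already sorted input yields a single run, which is how replacement selection exploits pre-sorting.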

Merging
- Always merge as many runs as possible
- Keep the final run in memory to eliminate unnecessary I/O
- Possibly reverse the record order when writing to disk (not used)
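Two of the merge-phase points can be sketched briefly. `merge_passes` counts passes under the textbook simplification that every step merges as many runs as the fan-in allows (it does not model any finer merge planning), and `final_merge` treats the last run, which never went to disk, as one more merge input. Both names, and representing runs as Python lists, are hypothetical.

```python
import heapq
from typing import Iterator, List

def merge_passes(num_runs: int, fan_in: int) -> int:
    """Merge passes needed if every step merges as many runs as the fan-in allows."""
    assert fan_in >= 2
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_in)   # ceiling division: runs left after this pass
        passes += 1
    return passes

def final_merge(disk_runs: List[List[int]], in_memory_run: List[int]) -> Iterator[int]:
    """Final merge step in which the last run was kept in memory.

    The in-memory records are sorted in memory and fed to the merge as one more
    input, so they incur neither a write to disk nor a read back.
    """
    return heapq.merge(*disk_runs, sorted(in_memory_run))
```

With a fan-in of 10, ten runs need one pass and eleven runs need two, which is why a larger fan-in (and longer runs) pays off.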

Memory Management
- Next fit:
  - free segments implemented as a linked list
  - first, check the free list
  - second, consider moving records
  - finally, throw the current record out
- Example: inserting a 120-byte record into the layout 100 bytes free, N bytes occupied, 60 bytes free, 80 bytes occupied

[Figure: segment layout for the 120-byte insertion example; sizes shown: 100, 80, 140, 60, 120 and 60 bytes]
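The three-step strategy can be sketched on a flat list of segments. This is a simplification under stated assumptions: the layout is a Python list of hypothetical `(kind, size)` pairs in address order rather than a linked free list, the search always starts at the beginning instead of at a roving pointer, segment headers are ignored, and step 2 only considers relocating one record into a free segment that is not one of its own neighbours. For the slide's example, N is set to 50 bytes purely for illustration.

```python
from typing import List, Tuple

Layout = List[Tuple[str, int]]   # ("free" | "rec", size in bytes), in address order

def hole_if_moved(layout: Layout, i: int) -> int:
    """Size of the contiguous free space created if record i were relocated."""
    size = layout[i][1]
    for j in (i - 1, i + 1):
        if 0 <= j < len(layout) and layout[j][0] == "free":
            size += layout[j][1]
    return size

def plan_insert(layout: Layout, need: int) -> tuple:
    # Step 1: check the free segments for one that is large enough.
    for i, (kind, size) in enumerate(layout):
        if kind == "free" and size >= need:
            return ("use_free", i)
    # Step 2: consider moving a record so that it and its free neighbours form a big enough hole.
    for i, (kind, size) in enumerate(layout):
        if kind != "rec" or hole_if_moved(layout, i) < need:
            continue
        for j, (kind_j, size_j) in enumerate(layout):
            if kind_j == "free" and abs(j - i) > 1 and size_j >= size:
                return ("move", i, j)   # relocate record i into free segment j
    # Step 3: no placement found; throw a record out (write it to the run) instead.
    return ("evict",)
```

For `[("free", 100), ("rec", 50), ("free", 60), ("rec", 80)]` and a 120-byte record, no free segment is large enough and the 50-byte record cannot move (both free segments are its neighbours), so the sketch returns `("move", 3, 0)`: moving the 80-byte record into the 100-byte segment turns 60 + 80 bytes into a 140-byte hole.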

Next Fit Structure
- Record segment: type | length | pointer to reference map | actual record
- Free segment: type | length | forward pointer | backward pointer

Memory Management (cont.)
- Best fit:
  - free segments kept in an unbalanced binary tree
  - go down the tree until the right fit is found, or move one to the right
  - need to know neighboring segments for merging

Best Fit Structure
- Record segment: type | length | actual record
- Free segment: type | length | (forward pointer) | backward pointer (parent pointer) | left child pointer | right child pointer
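The best-fit search over an unbalanced binary tree keyed on free-segment length can be sketched as below. Assumptions: the names are hypothetical, only the length and child pointers are modelled (no parent, forward or backward pointers, and no deletion or neighbour merging), and equal lengths go to the right subtree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FreeNode:
    length: int                           # size of the free segment
    left: Optional["FreeNode"] = None     # smaller segments
    right: Optional["FreeNode"] = None    # equal or larger segments

def insert(root: Optional[FreeNode], length: int) -> FreeNode:
    """Plain (unbalanced) binary-search-tree insert keyed on segment length."""
    node = FreeNode(length)
    if root is None:
        return node
    cur = root
    while True:
        if length < cur.length:
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right

def best_fit(root: Optional[FreeNode], need: int) -> Optional[FreeNode]:
    """Smallest free segment whose length is at least `need`."""
    best = None
    cur = root
    while cur is not None:
        if cur.length == need:
            return cur           # exact fit, cannot do better
        if cur.length > need:
            best = cur           # fits; look for a tighter fit among smaller segments
            cur = cur.left
        else:
            cur = cur.right      # too small; only larger segments can fit
    return best
```

After inserting free segments of 100, 60, 140, 80 and 120 bytes, a 70-byte request descends 100, 60, 80 and returns the 80-byte segment, while a 150-byte request finds nothing.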

Results
- The best-fit memory management algorithm achieved 90% memory utilization
- Runs were 1.8x the size of the available memory
- Degrades gracefully!