Cosequential Processing, Chapter 8

Cosequential processing model
Two or more input files, sorted the same way on the same keys:
– set the current record to the first record in each file
– loop until there are no more current records:
  – compare all current records
  – copy the "smallest" current record to the output
  – read the next record from every file whose current record was the "smallest"
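A minimal Python sketch of this loop, assuming each "file" is any iterable of integer keys already sorted in ascending order (in-memory lists stand in for files here). The function name `cosequential_merge` is hypothetical. Because every input holding the smallest key is advanced together, a key present in several inputs is written only once, as the model describes.

```python
from typing import Iterable, Iterator, List, Optional


def cosequential_merge(inputs: List[Iterable[int]]) -> Iterator[int]:
    """Hypothetical helper: merge sorted inputs following the cosequential model."""
    iters = [iter(src) for src in inputs]
    # Set the current record to the first record in each input (None = exhausted).
    current: List[Optional[int]] = [next(it, None) for it in iters]
    # Loop while at least one input still has a current record.
    while any(c is not None for c in current):
        # Compare all current records to find the "smallest".
        smallest = min(c for c in current if c is not None)
        yield smallest  # copy it to the output
        # Read the next record from every input whose current record was the smallest.
        for i, c in enumerate(current):
            if c == smallest:
                current[i] = next(iters[i], None)
```

For example, `list(cosequential_merge([[1, 3, 5], [2, 3, 6], [3, 4]]))` yields `[1, 2, 3, 4, 5, 6]`: the key 3 appears in all three inputs but is written once.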

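8.3 K-way merge
Comparison loop: find the smallest key among the current records of the k input files. Maintaining a selection tree reduces the number of comparisons per output record from about k (scanning every current key) to about log₂ k (recomputing one leaf-to-root path of the tree).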
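For comparison, the sketch below swaps the selection tree for a different structure, a binary heap from Python's standard `heapq` module, which also finds the smallest of k current keys with O(log k) comparisons per output record. The runs are assumed to be in-memory sorted lists of integers, and `k_way_merge` is a hypothetical name. Unlike the cosequential model, this version keeps duplicate keys.

```python
import heapq
from typing import Iterator, List, Sequence


def k_way_merge(runs: List[Sequence[int]]) -> Iterator[int]:
    """Hypothetical helper: k-way merge of sorted runs using a binary heap."""
    # Heap entries are (key, run index, position in run); the run index breaks ties.
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    while heap:
        key, i, pos = heap[0]  # root = smallest current key
        yield key
        if pos + 1 < len(runs[i]):
            # Replace the root with the next key from the same run (one O(log k) sift).
            heapq.heapreplace(heap, (runs[i][pos + 1], i, pos + 1))
        else:
            heapq.heappop(heap)  # that run is exhausted
```

The standard library's `heapq.merge` performs the same merge lazily over iterables; the explicit loop is shown only to make the per-record heap operation visible.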

Selection tree example
(Figure: eight sorted input lists, list 1 to list 8, supply the leaves of the tree; each internal node holds the smaller key of its two children and the list that key came from. Node labels in the figure include 3 (list 1), 11 (list 4), 9 (list 7), 2 (list 6), 3 (list 1) and 2 (list 6).)
How will the tree look after the next record is processed?
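One way to explore the slide's question is to code the tree. The sketch below is a winner (selection) tree, assuming in-memory sorted lists of numbers; the class name `SelectionTree` and its methods are hypothetical. After the root's record is output, only the leaf for the winning list changes, so only the path from that leaf to the root is recomputed, about log₂ k comparisons. Leaves are padded to a power of two with infinity so the tree is complete.

```python
import math
from typing import Iterator, List, Sequence, Tuple

INF = math.inf


class SelectionTree:
    """Hypothetical winner tree over k sorted lists; the root holds the smallest key."""

    def __init__(self, lists: List[Sequence[float]]) -> None:
        self.lists = lists
        self.pos = [0] * len(lists)  # next unread position in each list
        self.size = 1
        while self.size < len(lists):  # pad the leaf count to a power of two
            self.size *= 2
        # Each node stores (key, list index); exhausted or padding leaves hold INF.
        self.tree: List[Tuple[float, int]] = [(INF, -1)] * (2 * self.size)
        for i in range(len(lists)):
            self.tree[self.size + i] = self._head(i)
        for node in range(self.size - 1, 0, -1):  # build internal nodes bottom-up
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])

    def _head(self, i: int) -> Tuple[float, int]:
        """Current record of list i, or INF once the list is exhausted."""
        p = self.pos[i]
        return (self.lists[i][p], i) if p < len(self.lists[i]) else (INF, i)

    def pop(self) -> Tuple[float, int]:
        key, i = self.tree[1]  # root = overall winner
        self.pos[i] += 1  # consume it from its list
        node = self.size + i
        self.tree[node] = self._head(i)
        node //= 2
        while node >= 1:  # replay only the leaf-to-root path
            self.tree[node] = min(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2
        return key, i

    def merge(self) -> Iterator[float]:
        while self.tree[1][0] != INF:
            yield self.pop()[0]
```

For example, `SelectionTree([[3, 5, 12], [8, 11], [2, 6, 7], [9]])` starts with root `(2, 2)`, and after one `pop()` only the nodes above list 2's leaf are recomputed.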

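8.4 Heapsort
Records can be inserted into the heap (which is heap-ordered, not fully sorted) as they are read, each insertion taking O(log₂ n) comparisons. Because the heap is built incrementally, this work can overlap I/O: the heap is extended with records already in memory while the next block is being read.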
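A minimal sketch of both heapsort phases, assuming records arrive one at a time from any Python iterable of integer keys (in a real external sort they would arrive block by block from disk). The helper names are hypothetical, and a min-heap stored in a list is used so records come out in ascending order. Each `heap_insert` sifts the new key up one level per comparison, so it needs at most about log₂ n comparisons.

```python
from typing import Iterable, List


def heap_insert(heap: List[int], key: int) -> None:
    """Hypothetical helper: append key, then sift it up toward the root."""
    heap.append(key)
    child = len(heap) - 1
    while child > 0:
        parent = (child - 1) // 2
        if heap[parent] <= heap[child]:
            break  # heap order restored
        heap[parent], heap[child] = heap[child], heap[parent]
        child = parent


def heap_remove_min(heap: List[int]) -> int:
    """Hypothetical helper: remove the root, move the last key up, and sift it down."""
    smallest = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        parent, n = 0, len(heap)
        while True:
            child = 2 * parent + 1
            if child >= n:
                break
            if child + 1 < n and heap[child + 1] < heap[child]:
                child += 1  # pick the smaller child
            if heap[parent] <= heap[child]:
                break
            heap[parent], heap[child] = heap[child], heap[parent]
            parent = child
    return smallest


def heapsort_stream(records: Iterable[int]) -> List[int]:
    heap: List[int] = []
    for key in records:  # phase 1: insert each record as soon as it is read
        heap_insert(heap, key)
    # Phase 2: repeatedly remove the root to emit records in sorted order.
    return [heap_remove_min(heap) for _ in range(len(heap))]
```

For example, `heapsort_stream([5, 2, 9, 1, 5, 6])` returns `[1, 2, 5, 5, 6, 9]`.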

8.5.6 Replacement Selection
Each time a record is written from the current heap, compare the next incoming record with it to see whether the new record can be included in the current run (i.e., whether it comes after the record just written in sorted order). New records that cannot go in the current run are added to a secondary heap.
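A minimal sketch of run generation by replacement selection, assuming integer keys from any iterable and a hypothetical `memory` parameter giving how many records fit in memory at once. The primary and secondary heaps together never hold more than `memory` records: each record written to the current run frees one slot, which the next input record fills. When the primary heap empties, the run is closed and the secondary heap becomes the new primary heap.

```python
import heapq
from typing import Iterable, List


def replacement_selection(records: Iterable[int], memory: int) -> List[List[int]]:
    """Hypothetical helper: split records into sorted runs with two heaps.

    `memory` (assumed parameter) is how many records fit in memory at once.
    """
    if memory < 1:
        raise ValueError("memory must hold at least one record")
    source = iter(records)
    primary: List[int] = []
    for key in source:  # fill memory with the first records
        primary.append(key)
        if len(primary) == memory:
            break
    heapq.heapify(primary)
    secondary: List[int] = []  # records waiting for the next run
    runs: List[List[int]] = []
    current: List[int] = []
    while primary:
        last = heapq.heappop(primary)  # write the root to the current run
        current.append(last)
        incoming = next(source, None)  # read one record into the freed slot
        if incoming is not None:
            if incoming >= last:
                heapq.heappush(primary, incoming)  # still fits the current run
            else:
                heapq.heappush(secondary, incoming)  # must wait for the next run
        if not primary:  # current run is complete
            runs.append(current)
            current = []
            primary, secondary = secondary, []  # secondary heap becomes primary
    return runs
```

For instance, `replacement_selection([5, 1, 8, 3, 9, 2, 7, 4, 6], 3)` returns `[[1, 3, 5, 8, 9], [2, 4, 6, 7]]`: the first run holds five records even though only three fit in memory at once.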

Replacement selection example
(Figure: primary heap 12 records, secondary heap 8 records, total memory capacity 20 records; labels mark the next record to be output and the next record in the input file.)

Replacement selection example
All records in the secondary heap are "smaller" than the last record already written to the current output file, and therefore cannot be included in the current run. Output the next (root) record from the primary heap, making room in memory for the next record to be read in.

(Figure: primary heap 12 records, secondary heap 8 records, total memory capacity 20 records; labels mark the next record output to the current run and the next record read into memory.)

If the new record's key is smaller than the key just output, add it to the secondary heap. (Figure: primary heap 12 records, secondary heap 8 records, total memory capacity 20 records; the heap sizes change to 11 and 9.)

...and readjust both heaps. (Figure: secondary heap 9 records, primary heap 11 records, total memory capacity 20 records.)

Otherwise, add it to the primary heap and readjust, so that it will be included in the current run. (Figure: secondary heap 8 records, primary heap 12 records, total memory capacity 20 records.)

Replacement selection example
The secondary heap grows as the primary heap shrinks. When the last record is written from the primary heap and the secondary heap fills memory:
– close the file for the current run
– start a new run, using the current secondary heap as the primary heap

Replacement selection question
Would the overall efficiency of the merge sort be improved by simply writing each new record that cannot join the current run to a separate file, instead of keeping it in memory in a secondary heap?
– How would this affect the size of the current and subsequent runs?
– How would this affect the number of times each record must be read and written?