Chapter 15 A External Methods. © 2004 Pearson Addison-Wesley. All rights reserved.

Presentation transcript:

Chapter 15 A External Methods

A Look At External Storage
External storage
  – Exists beyond the execution period of a program
  – Generally, there is more external storage than internal memory
Sequential access file
  – To access the data, you must advance the file window beyond all the intervening data
  – Resembles a linked list
Random access file
  – Data can be accessed at a given position directly
  – Resembles an array
  – Essential for external tables

A Look At External Storage
Figure 15.1 Internal and external memory

A Look At External Storage
A file consists of data records
  – Records are organized into one or more blocks
The number of records in a block is a function of the size of the records
  – #records = f(size_of_record) = size_of_block / size_of_record
Random access file
  – All input and output is at the block level
  – Blocks, not records or data items, are transferred between memory and disk
Buffer
  – A location that temporarily stores data as it makes its way from one process or location to another
  – Used while transferring data between internal and external memory
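The block arithmetic above can be sketched in a few lines of Python. The block and record sizes are assumed values chosen for illustration, and `locate` is a hypothetical helper, not something named in the slides. The sketch also assumes fixed-length records that never span two blocks, so the division rounds down.

```python
BLOCK_SIZE = 4096    # bytes per block (assumed value)
RECORD_SIZE = 128    # bytes per fixed-length record (assumed value)

# Whole records that fit in one block; integer division rounds down.
records_per_block = BLOCK_SIZE // RECORD_SIZE

def locate(record_number):
    """Return (block number, slot within the block) for a 0-based record number."""
    return divmod(record_number, records_per_block)
```

With these sizes each block holds 32 records, so record 70 lives in block 2 at slot 6. Reaching it still means transferring all of block 2.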

A Look At External Storage
Figure 15.2 A file partitioned into blocks of records

A Look At External Storage
Once the system has read a block into the buffer buf, the program can process the records in the block
If the program modifies the records in buf, it must write buf back out to dataFile
The number of block accesses should be reduced as much as possible
  – Block access time is the dominant factor when considering an algorithm’s efficiency
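A minimal sketch of the read, modify, write-back cycle described above. An in-memory `io.BytesIO` object stands in for dataFile so the example runs without touching a disk. The tiny block size and the `read_block` / `write_block` helpers are assumptions made for illustration.

```python
import io

BLOCK_SIZE = 16  # bytes per block; deliberately tiny (assumed value)

def read_block(data_file, block_number):
    """Read one whole block into a mutable buffer."""
    data_file.seek(block_number * BLOCK_SIZE)
    return bytearray(data_file.read(BLOCK_SIZE))

def write_block(data_file, block_number, buf):
    """Write the buffer back over the same block."""
    data_file.seek(block_number * BLOCK_SIZE)
    data_file.write(buf)

# A three-block "file" held in memory.
data_file = io.BytesIO(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE + b"c" * BLOCK_SIZE)

buf = read_block(data_file, 1)   # one block access
buf[0:4] = b"XXXX"               # modify records inside the buffer only
write_block(data_file, 1, buf)   # a second block access makes the change permanent
```

Until `write_block` runs, the change exists only in `buf`. Changing four bytes costs two full block accesses, which is why the slides count blocks rather than records.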

Sorting Data in An External File
What “data” are to be sorted?
  – Records will be sorted in search-key order
The challenge with sorting data in an external file
  – An external file is too large to fit into internal memory all at once
  – Sorting algorithms presented earlier in the book assume that all the data to be sorted is available at one time in internal memory
Solution
  – Use a modified version of mergesort

Sorting Data in An External File
External mergesort
  – Phase 1
      Read a block from F (the data file to be sorted) into internal memory, sort its records by using an internal sort, and write the sorted block out to F1 (a work file) before reading the next block from F
      Repeat the above step for all the blocks of F
  – Phase 2 (a sequence of merge steps)
      Each merge step
        – Merges pairs of sorted runs to form larger sorted runs
        – Doubles the number of blocks in each sorted run
        – Halves the total number of sorted runs
      At the end
        – F1 will contain all the records of the original file in sorted order
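The two phases can be simulated in memory as follows. This sketch shows only the run structure: files are modeled as Python lists of blocks, and each merge step holds whole runs in memory, which a real external sort would not do. The function name and the list-of-lists representation are assumptions.

```python
import heapq

def external_mergesort(blocks):
    """blocks: the file F as a list of blocks, each a list of keys.
    Returns the sorted file as a list of blocks of the same size."""
    if not blocks:
        return []
    block_size = len(blocks[0])

    # Phase 1: sort each block on its own; every block becomes a one-block run.
    runs = [sorted(block) for block in blocks]

    # Phase 2: each merge step merges pairs of runs, halving the number of runs.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]              # an unpaired last run passes through
            merged.append(list(heapq.merge(*pair)))
        runs = merged

    # Cut the single remaining run back into blocks.
    result = runs[0]
    return [result[i:i + block_size] for i in range(0, len(result), block_size)]
```

For a file of four two-record blocks, Phase 2 needs two merge steps: four runs become two, then one.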

Sorting Data in An External File
Figure 15.3 a) 16 sorted runs, 1 block each, in file F1; b) 8 sorted runs, 2 blocks each, in file F2; c) 4 sorted runs, 4 blocks each, in file F1; d) 2 sorted runs, 8 blocks each, in file F2

Sorting Data in An External File
Figure 15.4 a) Merging single blocks; b) merging long runs
(Figure labels: input buffers in1 and in2 and output buffer out. Read the next block of a run when either in1 or in2 becomes empty; write out when it becomes full.)
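The buffer discipline in Figure 15.4 can be sketched as below. Each run is modeled as a list of blocks, `bufs[0]` and `bufs[1]` play the roles of in1 and in2, and `out` is the output buffer. The function and variable names are assumptions, and the sketch assumes no run contains an empty block.

```python
def merge_runs(run1, run2, block_size):
    """Merge two sorted runs while holding only one block of each in memory."""
    output_run, out = [], []
    readers = [iter(run1), iter(run2)]
    bufs = [next(r, []) for r in readers]   # in1 and in2
    pos = [0, 0]                            # next unread slot in each buffer

    def refill(k):
        # Read the next block of run k when its buffer becomes empty.
        if pos[k] == len(bufs[k]):
            bufs[k] = next(readers[k], [])
            pos[k] = 0

    def emit(key):
        # Write out when it becomes full.
        out.append(key)
        if len(out) == block_size:
            output_run.append(list(out))
            out.clear()

    while bufs[0] or bufs[1]:
        if not bufs[1] or (bufs[0] and bufs[0][pos[0]] <= bufs[1][pos[1]]):
            k = 0
        else:
            k = 1
        emit(bufs[k][pos[k]])
        pos[k] += 1
        refill(k)

    if out:                                 # flush a partly filled last block
        output_run.append(list(out))
    return output_run
```

At most three blocks are in memory at any moment, regardless of how long the runs are.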

External Tables
External implementation of the ADT table
  – Records are stored in search-key order
  – The file can be traversed in sorted order
Main advantage
  – A binary search can be used to locate the block that contains a given search key
Main disadvantage
  – tableInsert and tableDelete operations can require many costly block accesses due to the need to shift records
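A sketch of the block-level binary search mentioned above. The sorted data file is modeled as a list of non-empty blocks of keys, and each `blocks[mid]` lookup stands in for one block read. `find_block` is a hypothetical name, and the access counter is there only to make the cost visible.

```python
def find_block(blocks, search_key):
    """Binary search over the blocks of a file stored in search-key order.
    Returns (block number or -1, number of block accesses)."""
    low, high = 0, len(blocks) - 1
    accesses = 0
    while low <= high:
        mid = (low + high) // 2
        buf = blocks[mid]          # one block access
        accesses += 1
        if search_key < buf[0]:
            high = mid - 1         # key can only be in an earlier block
        elif search_key > buf[-1]:
            low = mid + 1          # key can only be in a later block
        else:
            # Key falls within this block's range; finish the search in memory.
            return (mid if search_key in buf else -1), accesses
    return -1, accesses
```

The number of block accesses grows with the logarithm of the number of blocks, not the number of records.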

External Tables
Figure 15.5 Shifting across block boundaries

Indexing An External File
An index (or index file)
  – Used to locate items in an external data file
  – Contains an index record for each record in the data file
Figure 15.6 A data file with an index

Indexing An External File
An index record has two parts
  – A key contains the same value as the search key of its corresponding record in the data file
  – A pointer shows the number of the block in the data file that contains the data record
Advantages of an index file
  – An index file can often be manipulated with fewer block accesses than would be needed to manipulate the data file
  – Data records do not need to be shifted during insertions and deletions
  – Allows multiple indexing

Indexing An External File
A simple scheme for organizing the index file
  – Store index records sequentially
Figure 15.7 A data file with a sorted index file

Indexing An External File
Storing index records sequentially
  – tableRetrieve operation
      Can be performed by using a binary search on the index file
  – tableInsert and tableDelete operations
      Require only the shifting of index records, not data records
      – Benefits of shifting index records rather than data records
          » Reduction in the maximum number of block accesses required
          » Reduction in the time requirement for a single shift
  – More efficient than having a sorted data file
  – Not as efficient as using hashing or search trees to organize the index file
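A small in-memory sketch of a sorted index, using Python's standard `bisect` module for the binary search. The index is modeled as two parallel lists (keys and data-file block numbers). The sample contents and the snake_case function names are assumptions. A real index file would itself be stored in blocks.

```python
import bisect

# Sorted index: index_keys[i] is a search key, index_blocks[i] is the number
# of the data-file block that holds the matching record (sample values).
index_keys = [10, 20, 30]
index_blocks = [2, 0, 1]

def table_retrieve(search_key):
    """Binary search on the index; return the data block number or None."""
    i = bisect.bisect_left(index_keys, search_key)
    if i < len(index_keys) and index_keys[i] == search_key:
        return index_blocks[i]
    return None

def table_insert(search_key, block_number):
    """Shift index records only; the data record itself can go anywhere."""
    i = bisect.bisect_left(index_keys, search_key)
    index_keys.insert(i, search_key)
    index_blocks.insert(i, block_number)
```

`list.insert` shifts the later index entries, which mirrors shifting small index records instead of whole data records.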

External Hashing
The index file, not the data file, is hashed
  – Separate chaining
  – Each entry table[i] is associated with a linked list of blocks of the index file
  – Each block of table[i]’s linked list contains index records whose keys hash into location i
  – To form the linked list, space must be reserved in each block for a block pointer
      A block pointer is the integer block number of the next block in the chain
      A block pointer is NOT a reference. Null is represented by -1.
Figure 15.9 A single block with a pointer
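The chained blocks can be modeled as below. The index file is a Python list whose positions act as block numbers, and each block is a dictionary with its index records and a `next` block pointer. This layout, the sample contents, and `chain_keys` are assumptions for illustration.

```python
NULL_BLOCK = -1  # "null" block pointer: an integer, not a reference

# Index file: list position = block number.
# Each index record is (key, number of the data block holding the record).
index_file = [
    {"records": [(17, 4), (42, 0)], "next": 2},            # block 0
    {"records": [(8, 1)],           "next": NULL_BLOCK},   # block 1
    {"records": [(67, 3)],          "next": NULL_BLOCK},   # block 2
]

def chain_keys(first_block):
    """Follow block pointers from first_block and collect every key in the chain."""
    keys = []
    block_number = first_block
    while block_number != NULL_BLOCK:
        block = index_file[block_number]   # stands in for one block read
        keys.extend(key for key, _ in block["records"])
        block_number = block["next"]
    return keys
```

Because the pointer is a block number, following it always costs a block access. This differs from following an in-memory reference.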

External Hashing
Figure 15.8 A hashed index file
Note that:
  – Index file – index blocks – index records
  – Data file – data blocks – data records

External Hashing
Retrieval under external hashing of an index file
  – Apply the hash function to the search key, letting i = h(searchKey)
  – Find the first block in the chain of index blocks that table[i] points to (these blocks contain index records that hash into location i)
  – Search the chain for the block with the desired index record
  – Retrieve the data item, if present, from the data block that the index record points to

External Hashing
Insertion under external hashing of an index file
  – Step 1: Insert the data record into the data file
      New record can be inserted anywhere in the data file
  – Step 2: Insert a corresponding index record into the index file
      For an index record that has key value searchKey and reference value p
      – Apply the hash function to searchKey, letting
          » i = h(searchKey)
      – Insert the index record into the chain of blocks that the entry table[i] points to
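The insertion steps above, together with retrieval to check the result, might look like the following in-memory sketch. The table size, block capacity, modulo hash function, and the choice to add a new index block at the front of the chain are all assumptions. The slides do not specify them.

```python
NULL_BLOCK = -1
TABLE_SIZE = 5           # number of hash table entries (assumed)
RECORDS_PER_BLOCK = 2    # capacity of data and index blocks (assumed)

table = [NULL_BLOCK] * TABLE_SIZE   # table[i] = first index block of chain i
index_file = []                     # blocks: {"records": [(key, p)], "next": int}
data_file = []                      # blocks: lists of (key, value) data records

def h(search_key):
    return search_key % TABLE_SIZE  # assumed hash function

def table_insert(search_key, value):
    # Step 1: the data record can go anywhere; here, the last block with room.
    if not data_file or len(data_file[-1]) == RECORDS_PER_BLOCK:
        data_file.append([])
    data_file[-1].append((search_key, value))
    p = len(data_file) - 1

    # Step 2: insert index record (search_key, p) into the chain at table[i].
    i = h(search_key)
    block_number = table[i]
    while block_number != NULL_BLOCK:
        block = index_file[block_number]
        if len(block["records"]) < RECORDS_PER_BLOCK:
            block["records"].append((search_key, p))
            return
        block_number = block["next"]
    # Every block in the chain is full: link a new block at the front.
    index_file.append({"records": [(search_key, p)], "next": table[i]})
    table[i] = len(index_file) - 1

def table_retrieve(search_key):
    block_number = table[h(search_key)]
    while block_number != NULL_BLOCK:
        block = index_file[block_number]
        for key, p in block["records"]:
            if key == search_key:
                for data_key, value in data_file[p]:
                    if data_key == search_key:
                        return value
        block_number = block["next"]
    return None
```

Keys 3, 8, and 13 all hash to entry 3 under this hash function, so inserting them in that order fills one index block and then chains a second one in front of it.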

External Hashing
Deletion under external hashing of an index file
  – To delete the data record whose search key is searchKey
  – Step 1: Search the index file for the corresponding index record
      – Apply the hash function to searchKey, letting
          » i = h(searchKey)
      – Search the chain of index blocks pointed to by the entry table[i] for an index record whose key value equals searchKey
      – If an index record is found
          » Note the block number p
          » Delete the index record

External Hashing
Deletion under external hashing of an index file
  – To delete the data record whose search key is searchKey
  – Step 2: Delete the data record from the data file
      Access the block p
      Search the block for the record
      Delete the record
      Write the block back to the file
Discussions:
  – Insertion updates data file then index file
  – Deletion updates index file then data file
  – Why opposite steps?
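The two deletion steps can be sketched against a small, pre-built hashed index. The sample data, the modulo hash function, and the in-memory lists standing in for the two files are assumptions. As a simplification, an index block that becomes empty is left in its chain rather than unlinked.

```python
NULL_BLOCK = -1
TABLE_SIZE = 5

def h(search_key):
    return search_key % TABLE_SIZE   # assumed hash function

# Pre-built sample state: keys 3, 8 and 13 all hash to entry 3.
data_file = [[(3, "a"), (8, "b")], [(13, "c")]]
index_file = [
    {"records": [(3, 0), (8, 0)], "next": NULL_BLOCK},   # block 0
    {"records": [(13, 1)],        "next": 0},            # block 1
]
table = [NULL_BLOCK, NULL_BLOCK, NULL_BLOCK, 1, NULL_BLOCK]

def table_delete(search_key):
    # Step 1: search the chain at table[i] for the index record.
    block_number = table[h(search_key)]
    p = None
    while block_number != NULL_BLOCK and p is None:
        block = index_file[block_number]
        for record in block["records"]:
            if record[0] == search_key:
                p = record[1]                    # note the data block number p
                block["records"].remove(record)  # delete the index record
                break
        block_number = block["next"]
    if p is None:
        return False                             # no such key

    # Step 2: access data block p, delete the record, write the block back.
    buf = data_file[p]
    data_file[p] = [rec for rec in buf if rec[0] != search_key]
    return True
```

The index is consulted first because the index record is the only thing that says which data block holds the record.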

External Hashing
External hashing implementation
  – Should be chosen for performing the following operations on a large external table
      tableRetrieve
      tableInsert
      tableDelete
  – Not practical for some operations, such as
      Sorted traversal
      Retrieval of the smallest or largest item
      Range queries that require ordered data
  – Not practical for applications demanding guaranteed performance