Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.

Slides:



Advertisements
Similar presentations
Chapter 13: Query Processing
Advertisements

SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Query Processing.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Processing (overview)
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Dr. Kalpakis CMSC 461, Database Management Systems Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Query Processing Chapter 12
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
©Silberschatz, Korth and Sudarshan7.1 Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Chapter 13: Query Processing Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
CPT-S Topics in Computer Science Big Data 1 1 Yinghui Wu EME 49.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Chapter 13: Query Processing
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
1 B + -Trees: Search  If there are n search-key values in the file,  the path is no longer than  log  f/2  (n)  (worst case).
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
13.1 Chapter 13: Query Processing n Overview n Measures of Query Cost n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation.
Chapter 13: Query Processing. Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
©Silberschatz, Korth and Sudarshan1 Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Computing & Information Sciences Kansas State University Wednesday, 02 Apr 2008CIS 560: Database System Concepts Lecture 27 of 42 Wednesday, 02 April 2008.
Chapter 13: Query Processing
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Query Processing  Basic Steps in Query Processing – an overview  Measures of Query Cost  Query Processing- Several algorithms  Selection Operation.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
Chapter 4: Query Processing
Database Management System
Chapter 13: Query Processing
Chapter 12: Query Processing
Chapter 12: Query Processing
Query Processing.
Chapter 13: Query Processing
File Processing : Query Processing
Dynamic Hashing Good for database that grows and shrinks in size
Query Processing.
Chapter 13: Query Processing
Chapter 12: Query Processing
Chapter 13: Query Processing
Module 13: Query Processing
Chapter 13: Query Processing
Chapter 12: Query Processing
Lecture 2- Query Processing (continued)
Chapter 13: Query Processing
Chapter 12 Query Processing (1)
Chapter 13: Query Processing
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Chapter 13: Query Processing
Chapter 13: Query Processing
Chapter 13: Query Processing
Chapter 12: Query Processing
Presentation transcript:

Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park

Outline  Overview  Measures of Query Cost  Selection Operation  Sorting  Join Operation(will be covered in the next file)  Other Operations(will be covered in the next file)  Evaluation of Expressions(will be covered in the next file)

Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation

Optimization  A relational algebra expression can be evaluated in many ways  Example  salary  (  salary (instructor)) is equivalent to  salary (  salary  (instructor))  Annotated expression specifying detailed evaluation strategy is called an evaluation plan  Example (1) can use an index on instructor to find tuples with salary < (2) can perform complete relation scan and discard instructors with salary ≥  Query optimization: Amongst all equivalent evaluation plans choose the one with lowest cost (details in Chapter 13)

Measures of Query Cost (1/2)  Cost is generally measured as total elapsed time for answering query; many factors contribute the cost (disk accesses, CPU, …)  Typically disk access is the predominant cost, and is also relatively easy to estimate; measured by taking into account  Number of seeks  average-seek-cost  Number of blocks read  average-block-read-cost  Number of blocks written  average-block-write-cost  For simplicity we just use number of block transfers from disk as the cost measure

Measures of Query Cost (2/2)  Costs depend on the size of the buffer in main memory  Having more memory reduces need for disk access  Amount of real memory available to buffer depends on other concurrent OS processes, and hard to determine ahead of actual execution  We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available  Real systems take CPU cost, difference between sequential and random I/O, and buffer size into account  We do not include cost to writing output to disk in our cost formula

Selection Operation  In query processing, the file scan is the lowest-level operator to access data  File scans are search algorithms that locate and retrieve records that satisfy a selection condition  In relational systems, a file scan allows an entire relation to be read in those cases where the relation is stored in a single, dedicated file

Basic Algorithms: Linear Search (A1)  Scan each file block and test all records to see whether they satisfy the selection condition  Cost estimate (number of disk blocks scanned) = b r (b r denotes number of blocks containing records from relation r)  Selections on key attributes have an average cost b r /2, but still have a worst-case cost of b r  Linear search algorithm can be applied to any file, regardless of  Ordering of records in the file  Availability of indices  Nature of the selection operation

Basic Algorithms: Binary Search (A2)  Applicable if selection is an equality comparison on the attribute on which the file is ordered  Cost estimate (number of disk blocks to be scanned)   log 2 (b r )  - cost of locating the first tuple by a binary search on the blocks  Plus number of blocks containing records that satisfy selection condition

Selection Using Indices (1/2)  Search algorithms that use an index are referred to as index scans (selection condition must be on search-key of index)  A3 (primary index on candidate key, equality)  Retrieve a single record that satisfies the equality condition  If a B + -tree is used, the cost is equal to the height of the tree plus one I/O to fetch the record; Cost = HT i + 1

Selection Using Indices (2/2)  A4 (primary index on nonkey, equality)  Records will be on consecutive blocks  Cost = HT i + number of blocks containing retrieved records  A5(a) (secondary index on candidate key, equality)  Cost = HT i + 1 (ignoring the cost for bucket access)  A5(b) (secondary index on nonkey, equality)  Cost = HT i + number of records retrieved (ignoring the cost for bucket access) (each record may be on a different block, very expensive)

Selections Involving Comparisons (1/2)  Can implement selections of the form  A≤V (r) or  A≥V (r) by using  A file scan  Or by using indices in the following ways  A6 (primary index, comparison) (Relation is sorted on A)  For  A≥V (r), use index to find first tuple ≥ v and scan relation sequentially from there  For  A≤V (r), just scan relation sequentially till first tuple > v; do not use index

Selections Involving Comparisons (2/2)  A7 (secondary index, comparison)  For  A≥V (r), use index to find first index entry ≥ v and scan index sequentially from there, to find pointers to records  For  A≤V (r), just scan leaf pages of index finding pointers to records, till first entry > v  In either case, retrieve records that are pointed to  Requires an I/O for each record  Linear file scan may be cheaper if many records are to be fetched

Sorting  For relations that fit in memory, techniques like quicksort can be used  For relations that don’t fit in memory, external sort-merge is a good choice

External Sort-Merge (1/3) Let M denote memory size (in blocks)  Create sorted runs  Let i be 0 initially  Repeatedly do the following till the end of the relation:  Read M blocks of relation into memory  Sort the in-memory blocks  Write sorted data to run R i ; increment i  Let the final value of i be N

External Sort-Merge (2/3)  Merge the runs (N-way merge)  We assume (for now) that N < M  Use N blocks of memory as buffers for input runs, and 1 block as buffer for output. Read the first block of each run into its input buffer  Repeatedly do the following until all input buffers are empty:  Select the first record (in sort order) among all input buffers  Write the record to the output buffer; if the output buffer is full, write it to disk  Delete the record from its input buffer; if the input buffer becomes empty, then read the next block (if any) of the run into the input buffer

External Sort-Merge (3/3)  If N  M, several merge passes are required  In each pass, contiguous groups of M - 1 runs are merged  A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor  Repeated passes are performed till all runs have been merged into one

Example: External Sort-Merge 1 9