Chapter 12: Query Processing (1)
Yonsei University, 2nd Semester, 2013
Sanghyun Park


Outline
• Overview
• Measures of Query Cost
• Selection Operation
• Sorting
• Join Operation (will be covered in the next file)
• Other Operations (will be covered in the next file)
• Evaluation of Expressions (will be covered in the next file)

Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Optimization
• A relational algebra expression can be evaluated in many ways
  - Example: $\sigma_{salary<75000}(\Pi_{salary}(instructor))$ is equivalent to $\Pi_{salary}(\sigma_{salary<75000}(instructor))$
• An annotated expression specifying a detailed evaluation strategy is called an evaluation plan
  - Example: (1) use an index on the salary attribute of instructor to find tuples with salary < 75000, or (2) perform a complete relation scan and discard instructors with salary ≥ 75000
• Query optimization: among all equivalent evaluation plans, choose the one with the lowest cost (details in Chapter 13)
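
To make the equivalence concrete, here is a minimal Python sketch that models a relation as a list of dicts and implements σ and Π as plain functions. The names `select` and `project` and the sample `instructor` rows are hypothetical illustrations, not part of the course material; `project` uses set semantics, so duplicate rows are removed.

```python
# Hypothetical sample relation: each tuple is a dict.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "salary": 65000},
    {"ID": "12121", "name": "Wu", "salary": 90000},
    {"ID": "15151", "name": "Mozart", "salary": 40000},
    {"ID": "22222", "name": "Einstein", "salary": 95000},
]


def select(pred, rel):
    """sigma: keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """pi: keep only attrs; set semantics, so duplicate rows are dropped."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def below(t):
    return t["salary"] < 75000


plan1 = select(below, project(["salary"], instructor))  # sigma(pi(instructor))
plan2 = project(["salary"], select(below, instructor))  # pi(sigma(instructor))
```

Both plans return the same salaries; an optimizer's job is to pick whichever one is cheaper to evaluate.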

Measures of Query Cost (1/2)
• Cost is generally measured as the total elapsed time for answering a query; many factors contribute to the cost (disk accesses, CPU, …)
• Typically, disk access is the predominant cost and is also relatively easy to estimate; it is measured by taking into account
  - Number of seeks × average-seek-cost
  - Number of blocks read × average-block-read-cost
  - Number of blocks written × average-block-write-cost
• For simplicity, we just use the number of block transfers from disk as the cost measure
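
The three disk-cost terms above simply add up. The sketch below is an illustration only. The function names are hypothetical, and the per-operation costs in the usage line (4 ms per seek, 0.1 ms per block read, 0.2 ms per block write) are assumed values rather than measurements.

```python
def io_cost(seeks, blocks_read, blocks_written,
            avg_seek_cost, avg_block_read_cost, avg_block_write_cost):
    """Detailed disk cost: each count is weighted by its average per-operation cost."""
    return (seeks * avg_seek_cost
            + blocks_read * avg_block_read_cost
            + blocks_written * avg_block_write_cost)


def block_transfers(blocks_read, blocks_written):
    """Simplified measure used in this chapter: just count block transfers."""
    return blocks_read + blocks_written


# Assumed costs in milliseconds: 10 seeks, 100 blocks read, nothing written.
estimate_ms = io_cost(10, 100, 0, 4.0, 0.1, 0.2)
```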

Measures of Query Cost (2/2)
• Costs depend on the size of the buffer in main memory
  - Having more memory reduces the need for disk access
  - The amount of real memory available to the buffer depends on other concurrent OS processes and is hard to determine ahead of actual execution
  - We often use worst-case estimates, assuming only the minimum amount of memory needed for the operation is available
• Real systems take CPU cost, the difference between sequential and random I/O, and buffer size into account
• We do not include the cost of writing output to disk in our cost formulas

Selection Operation
• In query processing, the file scan is the lowest-level operator to access data
• File scans are search algorithms that locate and retrieve records that satisfy a selection condition
• In relational systems, a file scan allows an entire relation to be read in those cases where the relation is stored in a single, dedicated file

Basic Algorithms: Linear Search (A1)
• Scan each file block and test all records to see whether they satisfy the selection condition
• Cost estimate (number of disk blocks scanned) = $b_r$, where $b_r$ denotes the number of blocks containing records from relation r
• Equality selections on key attributes have an average cost of $b_r/2$, because the scan can stop as soon as the matching record is found, but they still have a worst-case cost of $b_r$
• Linear search can be applied to any file, regardless of
  - Ordering of records in the file
  - Availability of indices
  - Nature of the selection operation
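
A minimal simulation of A1 follows. It assumes a file is a list of blocks and that each block is a list of records (dicts). The function counts block reads so that the cost estimate can be checked. Setting `key_equality=True` models an equality selection on a key, where the scan stops at the first match. The function name and the sample file are hypothetical.

```python
def linear_search(blocks, pred, key_equality=False):
    """Algorithm A1: scan every block and test every record.

    blocks: list of blocks; each block is a list of records (dicts).
    pred: the selection condition, as a function record -> bool.
    key_equality: True for an equality test on a key attribute, where at
        most one record can match, so the scan may stop early.
    Returns (matching records, number of blocks read).
    """
    result = []
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one block transfer per block scanned
        for rec in block:
            if pred(rec):
                result.append(rec)
                if key_equality:
                    return result, blocks_read  # the only possible match
    return result, blocks_read


# Hypothetical file: 4 blocks of 3 records each, ids 0..11
file_blocks = [[{"id": i} for i in range(3 * b, 3 * b + 3)] for b in range(4)]
```

A key lookup for id 4 stops after 2 of the 4 blocks. A lookup for a missing key, or any non-key condition, reads all $b_r$ = 4 blocks.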

Basic Algorithms: Binary Search (A2)
• Applicable if the selection is an equality comparison on the attribute on which the file is ordered
• Cost estimate (number of disk blocks to be scanned):
  - $\lceil \log_2(b_r) \rceil$: the cost of locating the first tuple by a binary search on the blocks
  - Plus the number of blocks containing records that satisfy the selection condition
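
The following sketch of A2 uses the same list-of-blocks model. It assumes that the records are sorted on `attr` across the whole file and that no block is empty. The function returns the number of binary-search probes (about $\log_2(b_r)$) separately from the number of blocks scanned for matches, which mirrors the two terms of the estimate. In this simple model, the first scanned block is counted again even though a real buffer might still hold it. All names and data are hypothetical.

```python
def binary_search_select(blocks, attr, v):
    """Algorithm A2: equality selection attr == v on a file sorted on attr.

    Returns (matching records, binary-search probes, blocks scanned).
    """
    # Phase 1: binary search for the first block whose last record has attr >= v.
    probes = 0
    lo, hi = 0, len(blocks) - 1
    first = len(blocks)  # stays len(blocks) if every value is < v
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one block read per probe
        if blocks[mid][-1][attr] >= v:
            first = mid
            hi = mid - 1
        else:
            lo = mid + 1
    # Phase 2: scan forward over the consecutive blocks holding matches.
    result, scanned = [], 0
    for block in blocks[first:]:
        scanned += 1
        result.extend(r for r in block if r[attr] == v)
        if block[-1][attr] > v:  # later blocks hold only larger values
            break
    return result, probes, scanned


# Hypothetical file sorted on salary: 4 blocks of 3 records
values = [10, 20, 30, 40, 50, 50, 50, 50, 60, 70, 80, 90]
sorted_blocks = [[{"salary": s} for s in values[i:i + 3]] for i in range(0, 12, 3)]
```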

Selection Using Indices (1/2)
• Search algorithms that use an index are referred to as index scans (the selection condition must be on the search key of the index)
• A3 (primary index on candidate key, equality)
  - Retrieve a single record that satisfies the equality condition
  - If a B+-tree is used, the cost is the height of the tree plus one I/O to fetch the record: Cost = $HT_i + 1$

Selection Using Indices (2/2)
• A4 (primary index on nonkey, equality)
  - Records will be on consecutive blocks
  - Cost = $HT_i$ + number of blocks containing retrieved records
• A5(a) (secondary index on candidate key, equality)
  - Cost = $HT_i + 1$ (ignoring the cost of bucket access)
• A5(b) (secondary index on nonkey, equality)
  - Cost = $HT_i$ + number of records retrieved (ignoring the cost of bucket access)
  - Each record may be on a different block, which makes this very expensive
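
The index-based estimates A3 to A5(b) differ only in the term added to $HT_i$. The small calculator below makes the comparison explicit. The string labels such as "A5b" are hypothetical, and the usage numbers are assumed for illustration: a tree of height 3 and 500 matching records, which fill 20 consecutive blocks under A4.

```python
def index_selection_cost(algorithm, ht_i, matching_blocks=0, matching_records=0):
    """Block-transfer estimates for index selections A3 to A5(b).

    ht_i: height of the B+-tree index.
    Bucket-access costs are ignored, as in the formulas above.
    """
    if algorithm in ("A3", "A5a"):  # candidate key: descend the tree, fetch one record
        return ht_i + 1
    if algorithm == "A4":  # primary index, nonkey: matches lie in consecutive blocks
        return ht_i + matching_blocks
    if algorithm == "A5b":  # secondary index, nonkey: worst case one block per record
        return ht_i + matching_records
    raise ValueError(f"unknown algorithm: {algorithm}")
```

With these assumed numbers, A4 costs 3 + 20 = 23 block transfers. A5(b) costs 3 + 500 = 503 block transfers, because each record may require its own block read.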

Selections Involving Comparisons (1/2)
• Selections of the form $\sigma_{A \le v}(r)$ or $\sigma_{A \ge v}(r)$ can be implemented by using
  - A file scan
  - Or an index, in the following ways
• A6 (primary index, comparison; the relation is sorted on A)
  - For $\sigma_{A \ge v}(r)$, use the index to find the first tuple ≥ v and scan the relation sequentially from there
  - For $\sigma_{A \le v}(r)$, just scan the relation sequentially until the first tuple > v; do not use the index
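
Here is a record-level sketch of A6. It assumes that the relation is a Python list already sorted on A and that a parallel list of keys is available. `bisect_left` stands in for the primary-index lookup, and block I/O is not modelled. The function names and sample data are hypothetical.

```python
from bisect import bisect_left


def a6_greater_equal(sorted_rel, keys, v):
    """sigma_{A >= v}(r): use the index to find the first tuple >= v, then scan on."""
    start = bisect_left(keys, v)  # stands in for the primary-index lookup
    return sorted_rel[start:]  # sequential scan to the end of the relation


def a6_less_equal(sorted_rel, attr, v):
    """sigma_{A <= v}(r): scan from the start until the first tuple > v; no index."""
    out = []
    for rec in sorted_rel:
        if rec[attr] > v:
            break
        out.append(rec)
    return out


rel = [{"A": x} for x in [1, 3, 3, 5, 8]]  # hypothetical relation sorted on A
keys = [rec["A"] for rec in rel]
```

The `<=` case skips the index because the qualifying tuples already sit at the front of the sorted file.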

Selections Involving Comparisons (2/2)
• A7 (secondary index, comparison)
  - For $\sigma_{A \ge v}(r)$, use the index to find the first index entry ≥ v and scan the index sequentially from there to find pointers to records
  - For $\sigma_{A \le v}(r)$, just scan the leaf pages of the index, finding pointers to records, until the first entry > v
  - In either case, retrieve the records that are pointed to
    - This requires an I/O for each record
    - A linear file scan may be cheaper if many records are to be fetched
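
The sketch below models a secondary index as a sorted list of `(key, block_no, slot)` entries, which is a simplification of B+-tree leaf pages. It counts one I/O per pointer followed, the worst case described above. All names and data are hypothetical. The sample file holds 6 records in 3 blocks, so fetching 4 records through the index already costs more than the 3 block reads of a linear scan.

```python
from bisect import bisect_left


def build_secondary_index(blocks, attr):
    """Sorted (key, block_no, slot) entries: a stand-in for the index leaf pages."""
    entries = [(rec[attr], b, s)
               for b, block in enumerate(blocks)
               for s, rec in enumerate(block)]
    entries.sort()
    return entries


def a7_greater_equal(blocks, index, v):
    """sigma_{A >= v}(r): find the first entry >= v, then follow each pointer."""
    start = bisect_left(index, (v,))  # (v,) sorts before every (v, b, s)
    records, ios = [], 0
    for _, b, s in index[start:]:
        ios += 1  # worst case: one block read per record pointer
        records.append(blocks[b][s])
    return records, ios


def a7_less_equal(blocks, index, v):
    """sigma_{A <= v}(r): scan leaf entries from the start until the first key > v."""
    records, ios = [], 0
    for key, b, s in index:
        if key > v:
            break
        ios += 1
        records.append(blocks[b][s])
    return records, ios


# Hypothetical file, not sorted on A: 3 blocks of 2 records
unsorted_blocks = [[{"A": 7}, {"A": 2}], [{"A": 9}, {"A": 4}], [{"A": 5}, {"A": 1}]]
sec_index = build_secondary_index(unsorted_blocks, "A")
```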

Sorting
• For relations that fit in memory, techniques like quicksort can be used
• For relations that don't fit in memory, external sort-merge is a good choice

External Sort-Merge (1/3)
Let M denote the memory size (in blocks).
• Create sorted runs
  - Let i be 0 initially
  - Repeatedly do the following until the end of the relation:
    - Read M blocks of the relation into memory
    - Sort the in-memory blocks
    - Write the sorted data to run $R_i$; increment i
  - Let the final value of i be N
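
Run creation can be sketched as follows. The sketch assumes that the relation is a list of blocks (lists of records) and that M is the number of memory blocks. Runs are returned as flat sorted lists rather than split back into blocks. The function name and sample data are hypothetical.

```python
def create_runs(blocks, M, key=None):
    """Phase 1 of external sort-merge: produce N sorted runs R_0 .. R_{N-1}."""
    runs = []
    for i in range(0, len(blocks), M):
        # Read (up to) M blocks of the relation into memory.
        chunk = [rec for block in blocks[i:i + M] for rec in block]
        chunk.sort(key=key)  # sort the in-memory blocks
        runs.append(chunk)  # "write" the sorted data to run R_i
    return runs


# Hypothetical relation: 6 blocks of 2 records
rel_blocks = [[24, 19], [31, 33], [14, 16], [21, 3], [2, 7], [16, 14]]
runs = create_runs(rel_blocks, 3)
```

With M = 3, the 6 blocks produce N = 2 runs of 3 blocks each.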

External Sort-Merge (2/3)
• Merge the runs (N-way merge)
  - We assume (for now) that N < M
  - Use N blocks of memory as buffers for input runs and 1 block as a buffer for output; read the first block of each run into its input buffer
  - Repeatedly do the following until all input buffers are empty:
    - Select the first record (in sort order) among all input buffers
    - Write the record to the output buffer; if the output buffer is full, write it to disk
    - Delete the record from its input buffer; if the input buffer becomes empty, read the next block (if any) of the run into the input buffer
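
Here is a sketch of the N-way merge. It assumes that runs are flat sorted lists and that `block_size` records fill one block. To find the first record among all input buffers, it uses a min-heap (`heapq`) holding the head of each input buffer. The algorithm above does not prescribe a data structure, so the heap is a swapped-in choice that avoids comparing all N buffer heads on every step. Writing to disk is simulated by appending to a list. The function name is hypothetical.

```python
import heapq


def merge_runs(runs, block_size, key=lambda rec: rec):
    """N-way merge of sorted runs (assumes N < M: N input buffers + 1 output).

    runs: list of N sorted lists of records; block_size: records per block.
    Returns (merged records, number of output blocks written).
    """
    n = len(runs)
    next_pos = [0] * n  # offset of the next unread record in each run
    buffers = []  # one input buffer (one block) per run
    for i in range(n):  # read the first block of each run
        buffers.append(runs[i][:block_size])
        next_pos[i] = len(buffers[i])
    # Min-heap of (key of buffer head, run index): the top is the first record in sort order.
    heap = [(key(buf[0]), i) for i, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)
    out_buf, output, blocks_written = [], [], 0
    while heap:
        _, i = heapq.heappop(heap)
        out_buf.append(buffers[i].pop(0))  # move the record to the output buffer
        if len(out_buf) == block_size:  # output buffer full: "write" it to disk
            output.extend(out_buf)
            out_buf = []
            blocks_written += 1
        if not buffers[i]:  # input buffer empty: read the next block of run i
            buffers[i] = runs[i][next_pos[i]:next_pos[i] + block_size]
            next_pos[i] += len(buffers[i])
        if buffers[i]:  # run i still has records: its new head re-enters the heap
            heapq.heappush(heap, (key(buffers[i][0]), i))
    if out_buf:  # flush the last, partially filled output block
        output.extend(out_buf)
        blocks_written += 1
    return output, blocks_written
```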

External Sort-Merge (3/3)
• If N ≥ M, several merge passes are required
  - In each pass, contiguous groups of M - 1 runs are merged
  - A pass reduces the number of runs by a factor of M - 1 and creates runs longer by the same factor
  - Repeated passes are performed until all runs have been merged into one
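
Each pass divides the number of runs by M - 1, rounding up. The number of merge passes can therefore be counted by applying that division repeatedly, starting from the $\lceil b_r/M \rceil$ runs produced by run creation. For $b_r \ge M$, this matches the closed form $\lceil \log_{M-1}(b_r/M) \rceil$. The following is a minimal sketch, and the function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers (avoids float rounding)."""
    return -(-a // b)


def merge_passes(b_r, M):
    """Return (initial runs N, number of merge passes) for external sort-merge.

    b_r: number of blocks in the relation; M: memory size in blocks.
    """
    if M < 3:
        raise ValueError("need M >= 3: at least 2 input buffers and 1 output buffer")
    runs = ceil_div(b_r, M)  # run creation: one run per M blocks
    n_initial = runs
    passes = 0
    while runs > 1:
        runs = ceil_div(runs, M - 1)  # each pass merges groups of M - 1 runs
        passes += 1
    return n_initial, passes
```

For example, with M = 100, a 10,000-block relation yields 100 initial runs. Because 100 exceeds M - 1 = 99, a second pass is needed, so `merge_passes(10000, 100)` returns `(100, 2)`.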

[Figure: Example: External Sort-Merge]
