Database Administration


Database Administration Query Processing

Query Processing
SELECT A.Name, B.Grade FROM A, B WHERE A.Id = B.Id
In relational algebra: $\pi_{Name,\,Grade}(\sigma_{A.Id = B.Id}(A \times B))$
How to evaluate this query efficiently? What algorithms and access paths to use?

Query Processing in Oracle
You can view the query execution plan used in Oracle and other DBMSs.

Explain in MySQL
- SIMPLE means no UNION or subqueries
- ALL means a full scan of the table
- rows: expected number of rows to examine
- Using where: rows are filtered using the WHERE clause
- Using filesort: sorting is needed for ORDER BY

Explain in MySQL
- party_index is matched against a constant (const) to fetch the rows
- The expected number of rows to examine decreases

Explain in MySQL
- Matches at most 1 row
- Use primary key to retrieve the row

Explain in MySQL
- SELECT in the outer query
- Inner subquery
- Use primary index to fetch the row with id = 300

Sorting
Sorting is used to implement many relational operations (e.g., join, project, intersect, …):
- Ordering explicitly requested by users: SELECT … ORDER BY …
- Eliminating duplicate tuples: SELECT DISTINCT …
- Sort-merge processing (which is used for JOIN, UNION, and INTERSECTION operations)
Problem:
- Relations are typically large and do not fit in main memory
- So we cannot use traditional in-memory sorting algorithms (such as quicksort) on the whole relation

External Sorting
- Combines in-memory sorting with techniques for minimizing I/O
- Cost of sorting is usually measured in number of block transfers, because the cost of in-memory sorting << I/O cost of block transfers
- Two phases of external sorting:
  1. Partial sorting phase
  2. Merging phase

External Sorting
- nB = number of input buffers available in main memory
- b = number of disk blocks in the file to be sorted
Example: nB = 2, b = 7
- Unsorted file on external storage (disk): 5 3 2 6 1 10 15 7 20 11 8 4 7 5
- The 2 input buffers in main memory hold the first segment, 5 3 2 6, which becomes one run

Partial Sorting
Partial sorting phase:
1. Fetch a segment of the unsorted file from disk into the buffers
2. Sort the data in the buffers (e.g., using quicksort)
3. Write the sorted file segment (a run) back to disk
Example: nB = 2, b = 7
- Number of runs = $\lceil 7/2 \rceil = 4$
- Unsorted data file: 5 3 2 6 1 10 15 7 20 11 8 4 7 5
- Partially sorted file:
  Run 1: 2 3 5 6
  Run 2: 1 7 10 15
  Run 3: 4 8 11 20
  Run 4: 5 7
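The partial sorting phase can be sketched in a few lines of Python. This is only an illustration: the file is modeled as an in-memory list of blocks, the block size of 2 records is inferred from the example (14 records in 7 blocks), and the name `make_runs` is hypothetical.

```python
def make_runs(blocks, n_buffers):
    """Partial sorting phase: load n_buffers blocks at a time, sort them, emit one run."""
    runs = []
    for start in range(0, len(blocks), n_buffers):
        segment = blocks[start:start + n_buffers]          # "fetch" a segment into the buffers
        records = sorted(rec for block in segment for rec in block)  # in-memory sort
        runs.append(records)                               # "write" the sorted run back
    return runs

# The example file: b = 7 blocks (assumed 2 records per block), nB = 2 buffers.
unsorted_file = [[5, 3], [2, 6], [1, 10], [15, 7], [20, 11], [8, 4], [7, 5]]
runs = make_runs(unsorted_file, n_buffers=2)
for number, run in enumerate(runs, start=1):
    print(f"Run {number}: {run}")
```

With these inputs the sketch produces four runs, the same ones listed in the example.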

Merging
Merging phase:
- Merge all runs using k input buffers and 1 output buffer (k-way merging; k is the degree of merging)
- If the number of runs > k:
  - Divide the runs into groups of size k and merge each group into a single run
  - Repeat until all runs are merged into 1 run

Merging Example (2-way merging)
- Partially sorted input: Run 1 = 2 3 5 6, Run 2 = 1 7 10 15
- Output: 1 2 3 5 6 7 10 15
- 2 runs (of size 2) are merged into 1 run (of size 4), using the input buffers and 1 output buffer
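The merging phase can be sketched as repeated k-way merge passes. This is a minimal in-memory illustration, not a disk-based implementation: runs are plain Python lists, `heapq.merge` from the standard library stands in for the buffer-at-a-time merge, and the function names are hypothetical.

```python
import heapq

def merge_pass(runs, k):
    """One merging pass: merge each group of up to k runs into a single run."""
    return [list(heapq.merge(*runs[i:i + k])) for i in range(0, len(runs), k)]

def merge_all(runs, k):
    """Repeat merging passes until one run remains; returns (sorted_run, passes)."""
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, k)
        passes += 1
    return (runs[0] if runs else []), passes

# The four runs produced by partial sorting with nB = 2, b = 7.
runs = [[2, 3, 5, 6], [1, 7, 10, 15], [4, 8, 11, 20], [5, 7]]
first_pass = merge_pass(runs, k=2)
sorted_file, passes = merge_all(runs, k=2)
print(first_pass)
print(sorted_file, passes)
```

With 2-way merging, four runs need two passes: the first pass leaves two runs, and the second leaves the fully sorted file.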

Example of External Sorting: nB = 4, b = 10, k = 3 (3-way merging)
[Figure: the file after partial sorting and after 3-way merging]

Example of External Sorting: nB = 4, b = 10, k = 2 (2-way merging)
[Figure: the file after partial sorting, after the first 2-way merging pass, and after the second 2-way merging pass]

Cost of External Sorting
Suppose we have b = 1024 blocks and nB = 5 input buffers.
- Partial sorting: number of runs, $n_R = \lceil 1024/5 \rceil = 205$
- Merging: assume degree of merging k = 5
  - After pass 1: number of runs remaining = $\lceil 205/5 \rceil = 41$
  - After pass 2: number of runs remaining = $\lceil 41/5 \rceil = 9$
  - After pass 3: number of runs remaining = $\lceil 9/5 \rceil = 2$
  - After pass 4: number of runs remaining = $\lceil 2/5 \rceil = 1$
- Number of merging passes needed = $\lceil \log_k n_R \rceil = \lceil \log_k \lceil b/n_B \rceil \rceil = \lceil \log_5 205 \rceil = 4$

Cost of External Sorting
- Partial sorting: cost = $2b$ (each disk block is read once and written back once)
- Merge phase (k is the degree of merging):
  - $n_R = \lceil b/n_B \rceil$ runs to be merged
  - Number of passes needed = $\lceil \log_k n_R \rceil$
  - Each pass requires $2b$ block transfers
  - Cost of merging = $2b \lceil \log_k \lceil b/n_B \rceil \rceil = 2b \lceil \log_k n_R \rceil$
- Overall cost of external sorting = cost of partial sorting + cost of merging = $2b + 2b \lceil \log_k n_R \rceil$
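As a check on the arithmetic, the cost formula can be evaluated with a short Python sketch. The function name `external_sort_cost` is hypothetical, and the number of passes is counted with integer arithmetic rather than a floating-point logarithm to avoid rounding surprises.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def external_sort_cost(b, n_buffers, k):
    """Return (initial runs, merge passes, total block transfers)."""
    runs = ceil_div(b, n_buffers)           # partial sorting phase
    passes, remaining = 0, runs
    while remaining > 1:                    # each pass shrinks the run count by a factor of k
        remaining = ceil_div(remaining, k)
        passes += 1
    total = 2 * b + 2 * b * passes          # 2b for partial sorting + 2b per merge pass
    return runs, passes, total

runs, passes, cost = external_sort_cost(b=1024, n_buffers=5, k=5)
print(runs, passes, cost)
```

For b = 1024, nB = 5, and k = 5 this gives 205 initial runs, 4 merging passes, and 2·1024 + 2·1024·4 = 10240 block transfers.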

Query Processing
SELECT AttribList FROM relations R1, …, Rk WHERE condition
Translated to: $\pi_{AttribList}(\sigma_{condition}(R_1 \times R_2 \times \dots \times R_k))$
Example: SELECT Dno FROM Employee WHERE SSN = '1234567890'
Translated to: $\pi_{Dno}(\sigma_{SSN = '1234567890'}(Employee))$

Algorithms for Query Processing
Relational algebra operators:
- SELECT
- JOIN
- PROJECT
- SET (UNION, INTERSECT, SET DIFFERENCE)
- AGGREGATE

Algorithms for SELECT Operations
Schema:
- EMPLOYEE(SSN, Fname, Minit, Lname, Sex, Address, Salary, Dno)
- DEPARTMENT(Dno, Dname, Mgrssn, MgrStartDate)
- WORKS_ON(ESSN, Pno, Hours)
Examples of SELECT operations:
Simple selection:
- (OP1): $\sigma_{SSN='123456789'}(EMPLOYEE)$
- (OP2): $\sigma_{DNUMBER>5}(DEPARTMENT)$
- (OP3): $\sigma_{DNO=5}(EMPLOYEE)$
Complex selection:
- (OP4): $\sigma_{DNO=5 \text{ AND } SALARY>30000 \text{ AND } SEX='F'}(EMPLOYEE)$
- (OP5): $\sigma_{ESSN=123456789 \text{ AND } PNO=10}(\text{WORKS\_ON})$

Algorithms for SELECT Operation
Algorithms for simple selection:
- Linear search (brute force, full table scan): retrieve every record in the file and test whether its attribute values satisfy the selection condition
- Binary search: if the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search can be used
- Use a primary, clustering, or secondary index: use the index to find the record(s) satisfying the corresponding selection condition

Algorithms for SELECT Operation
Algorithms for complex selection, e.g. $\sigma_{DNO=5 \text{ AND } SALARY>30000 \text{ AND } SEX='F'}(EMPLOYEE)$:
- Linear search (brute force, full table scan)
- Use an individual index. E.g.: use the Dno index to retrieve the records and then check whether each retrieved record satisfies the remaining conditions
- Use a composite index. E.g.: use the (Dno, Sex) composite index to retrieve the records and check whether they satisfy the remaining condition
- Intersection of record pointers: this method is possible if secondary indexes are available on all (or some of) the fields involved. E.g.: intersect the record pointers returned by the indexes for Dno, Salary, and Sex
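The intersection-of-record-pointers method can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the sample EMPLOYEE rows are invented, a record pointer is modeled as a list position, and each secondary index is a plain dictionary from attribute value to a set of pointers (a real index would answer the Salary range condition with a range scan).

```python
from collections import defaultdict

# Hypothetical EMPLOYEE records; the list position plays the role of a record pointer.
employees = [
    {"ssn": "111", "dno": 5, "salary": 40000, "sex": "F"},
    {"ssn": "222", "dno": 5, "salary": 25000, "sex": "F"},
    {"ssn": "333", "dno": 4, "salary": 50000, "sex": "F"},
    {"ssn": "444", "dno": 5, "salary": 35000, "sex": "M"},
    {"ssn": "555", "dno": 5, "salary": 31000, "sex": "F"},
]

def build_index(records, attr):
    """Secondary index sketch: attribute value -> set of record pointers."""
    index = defaultdict(set)
    for pointer, rec in enumerate(records):
        index[rec[attr]].add(pointer)
    return index

dno_idx = build_index(employees, "dno")
salary_idx = build_index(employees, "salary")
sex_idx = build_index(employees, "sex")

# Pointers satisfying each simple condition of DNO=5 AND SALARY>30000 AND SEX='F'.
dno_ptrs = dno_idx.get(5, set())
sex_ptrs = sex_idx.get("F", set())
salary_ptrs = set().union(*(ptrs for value, ptrs in salary_idx.items() if value > 30000))

# Intersect the pointer sets, then fetch only the qualifying records.
matching = sorted(dno_ptrs & salary_ptrs & sex_ptrs)
result = [employees[p]["ssn"] for p in matching]
print(result)
```

Only the records whose pointers survive the intersection are fetched, so no retrieved record has to be re-checked against the remaining conditions.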

Algorithms for JOIN Operations
$R \bowtie_{A=B} S$
The cost of joining two relations makes the choice of a join algorithm crucial.
Examples:
- $EMPLOYEE \bowtie_{DNO=DNUMBER} DEPARTMENT$
- $DEPARTMENT \bowtie_{MGRSSN=SSN} EMPLOYEE$

Computing Joins: $R \bowtie_{A=B} S$
Suppose $b_R$ and $b_S$ are the numbers of blocks in R and S, and $r_R$ and $r_S$ are the numbers of tuples in R and S.
- R: $r_R$ rows, $b_R$ blocks
- S: $r_S$ rows, $b_S$ blocks

Algorithms for JOIN Operations
J1: Nested-loop join (brute force) for $R \bowtie_{R.A=S.B} S$
  foreach tuple t in R do
    foreach tuple t' in S do
      if t.A = t'.B then output (t, t')
- Cost = $b_R + r_R \times b_S$ + cost of writing the final result
- Very expensive
- The order of the loops matters
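The nested-loop join translates almost directly into Python. This sketch works on in-memory lists of dictionaries rather than disk blocks; the sample rows and the function name `nested_loop_join` are invented for illustration.

```python
def nested_loop_join(R, S, a, b):
    """Yield every pair (t, u) with t[a] == u[b], comparing all |R| x |S| pairs."""
    for t in R:              # outer loop over R
        for u in S:          # inner loop rescans S for every tuple of R
            if t[a] == u[b]:
                yield (t, u)

# Hypothetical sample data for EMPLOYEE and DEPARTMENT.
employee = [{"ssn": "111", "dno": 5}, {"ssn": "222", "dno": 4}, {"ssn": "333", "dno": 5}]
department = [{"dnumber": 4, "dname": "Admin"}, {"dnumber": 5, "dname": "Research"}]

joined = [(t["ssn"], u["dname"])
          for t, u in nested_loop_join(employee, department, "dno", "dnumber")]
print(joined)
```

The inner relation is scanned once per outer tuple, which is where the $r_R \times b_S$ term in the cost comes from.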

Algorithms for JOIN Operations
J1(b): Nested-block join
- Instead of joining 1 tuple at a time, join one block at a time
- Cost = $b_R + \lceil b_R/(n_B - 2) \rceil \times b_S$ + cost of writing the result
- $n_B$ is the number of buffers available
- Number of blocks ($b_R$) << number of records ($r_R$)
[Figure: blocks R1 and R2 of R are held in the memory buffers and joined with the blocks S1, S2, S3, S4 of S; results go to the output buffer]

Algorithms for JOIN Operations
J2: Single-loop join (using an access structure) for $R \bowtie_{A=B} S$
  foreach tuple t in R do {
    use index to find all tuples t' in S satisfying t.A = t'.B;
    output (t, t')
  }
- Cost = $b_R + r_R \times$ (cost of one index search) + cost of writing the output
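A single-loop join can be sketched in Python by letting a dictionary stand in for the index on S. This is an assumption for illustration only: a real DBMS would probe a B+-tree or hash index on disk, and the sample rows and function names are invented.

```python
from collections import defaultdict

def build_index(S, b):
    """Stand-in for an existing index on S.b: value -> list of matching tuples."""
    index = defaultdict(list)
    for u in S:
        index[u[b]].append(u)
    return index

def single_loop_join(R, S_index, a):
    """Scan R once; for each tuple, probe the index on S instead of scanning S."""
    for t in R:
        for u in S_index.get(t[a], []):
            yield (t, u)

# Hypothetical sample data: DEPARTMENT joined with EMPLOYEE on MGRSSN = SSN.
employee = [{"ssn": "111", "lname": "Smith"}, {"ssn": "222", "lname": "Wong"},
            {"ssn": "333", "lname": "Zelaya"}]
department = [{"dname": "Research", "mgrssn": "222"}, {"dname": "Admin", "mgrssn": "333"}]

ssn_index = build_index(employee, "ssn")
managers = [(t["dname"], u["lname"])
            for t, u in single_loop_join(department, ssn_index, "mgrssn")]
print(managers)
```

Only the outer relation is scanned in full; each outer tuple costs one index lookup.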

Algorithms for JOIN Operations
J3: Sort-merge join for $R \bowtie_{R.A=S.B} S$
  sort R on attribute A;
  sort S on attribute B;
  while !eof(R) and !eof(S) do {
    scan R and S concurrently until t.A = t'.B = c;
    output $\sigma_{A=c}(R) \times \sigma_{B=c}(S)$
  }
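A minimal in-memory sketch of the sort-merge join follows. It assumes both relations fit in memory (so Python's built-in `sorted` replaces external sorting), and the sample tuples are invented; for each common value c it outputs the product of the matching groups from R and S.

```python
def sort_merge_join(R, S, a, b):
    """Join R and S on R[a] == S[b] by sorting both and scanning them concurrently."""
    R = sorted(R, key=lambda t: t[a])
    S = sorted(S, key=lambda u: u[b])
    i = j = 0
    out = []
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1                       # advance R until it catches up with S
        elif R[i][a] > S[j][b]:
            j += 1                       # advance S until it catches up with R
        else:
            c = R[i][a]                  # common join value
            i_end = i
            while i_end < len(R) and R[i_end][a] == c:
                i_end += 1
            j_end = j
            while j_end < len(S) and S[j_end][b] == c:
                j_end += 1
            # Output the product of the two groups that share the value c.
            out.extend((t, u) for t in R[i:i_end] for u in S[j:j_end])
            i, j = i_end, j_end
    return out

# Hypothetical data: R(D, A) joined with S(B, E) on A = B (positions 1 and 0).
R = [(1, "p"), (3, "p"), (0, "q"), (7, "s")]
S = [("p", 4), ("r", 9), ("s", 7), ("s", 5)]
pairs = sort_merge_join(R, S, a=1, b=0)
print(pairs)
```

Each sorted input is scanned once, apart from the per-value group products.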

Sort-Merge Join
[Figure: worked example of $R \bowtie_{A=B} S$, with R(D, A) and S(B, E) sorted on the join attributes A and B]

Algorithms for JOIN Operations
J4: Hash-join
- Use the same hashing function on the join attributes A of R and B of S as hash keys
- Hash the file with fewer records (say, R) to the hash file buckets
- Hash the other file (S) to the appropriate bucket, where each record is combined with all matching records from R
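The two steps of the hash-join can be sketched in Python with a dictionary as the set of hash buckets. This assumes the smaller relation fits in memory; the sample rows and the name `hash_join` are invented for illustration.

```python
from collections import defaultdict

def hash_join(R, S, a, b):
    """Hash the smaller relation R on attribute a, then probe with each tuple of S."""
    buckets = defaultdict(list)
    for t in R:                              # build: hash R into buckets
        buckets[t[a]].append(t)
    out = []
    for u in S:                              # probe: same hash function on S's join attribute
        for t in buckets.get(u[b], []):
            out.append((t, u))
    return out

# Hypothetical sample data; DEPARTMENT is the smaller relation, so it is hashed first.
department = [{"dnumber": 4, "dname": "Admin"}, {"dnumber": 5, "dname": "Research"}]
employee = [{"ssn": "111", "dno": 5}, {"ssn": "222", "dno": 4}, {"ssn": "333", "dno": 5}]

pairs = [(d["dname"], e["ssn"]) for d, e in hash_join(department, employee, "dnumber", "dno")]
print(pairs)
```

Each relation is read once, which is why hashing the smaller relation first keeps the in-memory structure small.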

Example: $EMPLOYEE \bowtie_{DNO=DNUMBER} DEPARTMENT$
Metadata:
- DEPARTMENT: number of records, $r_D = 50$; number of disk blocks to store the records, $b_D = 10$
- EMPLOYEE: number of records, $r_E = 6000$; number of disk blocks to store the records, $b_E = 2000$
- Number of buffers available in memory, $n_B = 7$
- The size of each buffer is the same as the size of each block on disk

Example: Nested Loop Join
Simple nested-loop join:
  foreach tuple t in R do
    foreach tuple t' in S do
      if t.A = t'.B then output (t, t')
- Cost = $b_R + r_R \times b_S$ + cost of output
Nested-block join:
- Suppose there are $n_B$ buffers
- Use 1 buffer for output, 1 buffer for relation S, and $n_B - 2$ buffers for relation R
- Cost = $b_R + \lceil b_R/(n_B - 2) \rceil \times b_S$ + cost of output

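Example: Block Nested Loop Join
$EMPLOYEE \bowtie_{DNO=DNUMBER} DEPARTMENT$ (with $n_B = 7$, so $n_B - 2 = 5$ buffers hold the outer relation)
- If EMPLOYEE is the outer loop: cost = $2000 + \lceil 2000/5 \rceil \times 10$ + cost of output = 6000 + cost of output
- If DEPARTMENT is the outer loop: cost = $10 + \lceil 10/5 \rceil \times 2000$ + cost of output = 4010 + cost of output
- It is more efficient to use DEPARTMENT as the outer loop
- The order of the tables in the nested loop matters!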
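The two block nested loop costs can be checked with a short Python sketch. The function name is hypothetical, and the cost of writing the output is left out because it is the same for both loop orders.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def block_nested_loop_cost(b_outer, b_inner, n_buffers):
    """Block transfers, excluding output: read the outer relation once, and
    scan the inner relation once per chunk of (n_buffers - 2) outer blocks."""
    return b_outer + ceil_div(b_outer, n_buffers - 2) * b_inner

b_employee, b_department, n_buffers = 2000, 10, 7
cost_employee_outer = block_nested_loop_cost(b_employee, b_department, n_buffers)
cost_department_outer = block_nested_loop_cost(b_department, b_employee, n_buffers)
print(cost_employee_outer, cost_department_outer)
```

The smaller relation as the outer loop wins here because it needs fewer chunks, and therefore fewer full scans of the inner relation.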

MySQL Example

MySQL Example
- department table chosen for the outer loop of the nested-block join
- employee table chosen for the inner loop of the nested-block join

MySQL Example
STRAIGHT_JOIN forces MySQL to process the join in the order given (employee for the outer loop and department for the inner loop).

Example: Single Loop Join
$DEPARTMENT \bowtie_{MGRSSN=SSN} EMPLOYEE$
Suppose there are multi-level indexes on:
- SSN (for EMPLOYEE): number of index levels, $x_{SSN} = 4$
- MGRSSN (for DEPARTMENT): number of index levels, $x_{MGRSSN} = 2$
  foreach tuple t in R do
    use index to find all tuples t' in S satisfying t.A = t'.B;
Use EMPLOYEE as the outer loop:
- Max cost = $b_E + r_E \times (x_{MGRSSN} + 1)$ + cost of output = $2000 + 6000 \times 3$ + cost of output = 20000 + cost of output
Use DEPARTMENT as the outer loop:
- Max cost = $b_D + r_D \times (x_{SSN} + 1)$ + cost of output = $10 + 50 \times 5$ + cost of output = 260 + cost of output
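The same numbers can be reproduced with a small Python sketch of the single-loop cost formula. The function name is hypothetical, and the cost of writing the output is again left out.

```python
def single_loop_join_cost(b_outer, r_outer, index_levels):
    """Maximum block transfers, excluding output: scan the outer relation, then for
    each outer record descend the index (index_levels) and fetch one data block (+1)."""
    return b_outer + r_outer * (index_levels + 1)

# EMPLOYEE as outer loop probes the 2-level index on DEPARTMENT.MGRSSN.
cost_employee_outer = single_loop_join_cost(b_outer=2000, r_outer=6000, index_levels=2)
# DEPARTMENT as outer loop probes the 4-level index on EMPLOYEE.SSN.
cost_department_outer = single_loop_join_cost(b_outer=10, r_outer=50, index_levels=4)
print(cost_employee_outer, cost_department_outer)
```

Even though the SSN index is deeper, DEPARTMENT as the outer loop is far cheaper because only 50 index probes are needed instead of 6000.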

MySQL Example
- Scan all the rows in department (outer loop of the single-loop join)
- For each mgrId value, use the primary key index on the employee table to fetch its corresponding row

Explaining EXPLAIN in MySQL
- type: how the table is accessed; the most frequent values are:
  - "ALL": full table scan
  - "eq_ref": reference by primary or unique key (1 row)
  - "ref": reference by non-unique key (multiple rows)
- possible_keys: indexes MySQL could use for this table
- key: index MySQL selected to use
- ref: the column or constant this key is matched against
- rows: how many rows will be looked up in this table
- Extra: extra information
  - "Using filesort": an extra sorting pass is used
  - "Using where": the WHERE clause is used to filter rows

Algorithms for PROJECT Operations
Algorithm for PROJECT operations, $\pi_{\langle attribute\ list \rangle}(R)$:
- If <attribute list> includes a key of relation R, extract all tuples from R with only the values for the attributes in <attribute list> (no duplicates can arise)
- If <attribute list> does NOT include a key of relation R, duplicate tuples must be removed from the result
Methods to remove duplicate tuples:
- Sorting
- Hashing
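Both duplicate-elimination methods can be sketched in Python. This is an in-memory illustration with invented sample rows and hypothetical function names; a Python `set` plays the role of the hash table, and `sorted` replaces an external sort.

```python
def project_hash(R, attrs):
    """Project onto attrs and drop duplicates with a hash set (keeps first-seen order)."""
    seen, out = set(), []
    for t in R:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def project_sort(R, attrs):
    """Project onto attrs, sort, then drop duplicates, which are now adjacent."""
    out = []
    for row in sorted(tuple(t[a] for a in attrs) for t in R):
        if not out or out[-1] != row:
            out.append(row)
    return out

# Hypothetical EMPLOYEE rows; (dno, sex) does not include the key SSN.
employees = [
    {"ssn": "111", "dno": 5, "sex": "F"},
    {"ssn": "222", "dno": 5, "sex": "M"},
    {"ssn": "333", "dno": 5, "sex": "F"},
    {"ssn": "444", "dno": 4, "sex": "F"},
]
by_hash = project_hash(employees, ["dno", "sex"])
by_sort = project_sort(employees, ["dno", "sex"])
print(by_hash, by_sort)
```

Both methods return the same set of tuples; only the output order differs.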

Algorithms for SET Operations
UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT
CARTESIAN PRODUCT of relations R and S:
- Includes all possible combinations of records from R and S
- The attributes of the result include all attributes of R and S
- If R has n records and j attributes and S has m records and k attributes, the result relation will have n*m records and j+k attributes
- The CARTESIAN PRODUCT operation is very expensive and should be avoided if possible

Algorithms for SET Operations
- UNION (see Figure 15.3c): sort the two relations on the same attributes, then scan and merge both sorted files concurrently. Whenever the same tuple exists in both relations, only one copy is kept in the merged result.
- INTERSECTION (see Figure 15.3d): with both relations sorted as for UNION, scan and merge both sorted files concurrently, keeping in the merged result only those tuples that appear in both relations.
- SET DIFFERENCE R-S (see Figure 15.3e): with both relations sorted as for UNION, scan and merge both sorted files concurrently, keeping in the merged result only those tuples that appear in relation R but not in relation S.
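The three sort-merge set operations share one scan loop and differ only in which tuples they keep. The Python sketch below assumes each input relation is duplicate-free and fits in memory; the function name `merge_set_op` and the sample tuples are invented for illustration.

```python
def merge_set_op(R, S, op):
    """Sort-merge UNION, INTERSECTION, or DIFFERENCE (R - S) of two duplicate-free relations."""
    R, S = sorted(R), sorted(S)
    i = j = 0
    out = []
    while i < len(R) and j < len(S):
        if R[i] < S[j]:                  # tuple only in R
            if op in ("union", "difference"):
                out.append(R[i])
            i += 1
        elif R[i] > S[j]:                # tuple only in S
            if op == "union":
                out.append(S[j])
            j += 1
        else:                            # tuple in both: keep a single copy
            if op in ("union", "intersection"):
                out.append(R[i])
            i += 1
            j += 1
    if op in ("union", "difference"):
        out.extend(R[i:])                # leftover tuples exist only in R
    if op == "union":
        out.extend(S[j:])                # leftover tuples exist only in S
    return out

R = [(1, "a"), (3, "c"), (5, "e")]
S = [(3, "c"), (4, "d")]
union = merge_set_op(R, S, "union")
intersection = merge_set_op(R, S, "intersection")
difference = merge_set_op(R, S, "difference")
print(union, intersection, difference)
```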

Implementing Aggregate Operations
Aggregate operators: MIN, MAX, SUM, COUNT and AVG
Options to implement aggregate operators:
- Table scan
- Use an index
Example: SELECT MAX (SALARY) FROM EMPLOYEE;
If an (ascending) index on SALARY exists, then the optimizer could traverse the index for the largest value, which would entail following the rightmost pointer in each index node from the root to a leaf.
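The index-based option for MAX can be illustrated with a toy tree in Python. The `Node` class, the salary values, and the two-level tree are all invented for illustration and are much simpler than a real B+-tree; the point is only that the traversal follows the rightmost pointer at each level.

```python
class Node:
    """Toy index node: sorted keys, plus child pointers (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def index_max(root):
    """Follow the rightmost pointer from the root to a leaf; return (max key, nodes visited)."""
    node, visited = root, 1
    while node.children:
        node = node.children[-1]
        visited += 1
    return node.keys[-1], visited

# Hypothetical ascending index on SALARY with three leaves.
leaves = [Node([25000, 30000]), Node([38000, 40000]), Node([43000, 55000])]
root = Node([38000, 43000], leaves)

max_salary, nodes_visited = index_max(root)

# Table-scan alternative: examine every salary value.
all_salaries = [key for leaf in leaves for key in leaf.keys]
print(max_salary, nodes_visited, max(all_salaries))
```

The index traversal touches one node per level, whereas the table scan reads every record.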