CPS216: Advanced Database Systems. Notes 09: Query Optimization (Cost-based optimization). Shivnath Babu.


Query Optimization Problem
Pick the best plan from the space of physical plans

Cost-Based Optimization
– Prune the space of plans using heuristics
– Estimate cost for remaining plans
   – Be smart about how you iterate through plans
– Pick the plan with least cost
Focus on queries with joins

Heuristics for pruning plan space
– Apply predicates as early as possible
– Avoid plans with cross products
– Consider only left-deep join trees

Physical Plan Selection
Logical query plan → physical plans P1, P2, …, Pn → costs C1, C2, …, Cn → pick the minimum-cost one

Review of Notation
– T(R): number of tuples in R
– B(R): number of blocks in R

Simple Cost Model
– Cost(R ⋈ S) = T(R) + T(S)
– All other operators have 0 cost
Note: the simple cost model is used for illustration only

Cost Model Example
Plan: (R ⋈ S) ⋈ T, where X is the output of R ⋈ S
– Cost of R ⋈ S: T(R) + T(S)
– Cost of X ⋈ T: T(X) + T(T)
Total cost: T(R) + T(S) + T(T) + T(X)
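The simple cost model can be sketched in a few lines of Python. This is only an illustration: the class names `Relation` and `Join`, the helper names, and all tuple counts are made up for the example, and the size of each join output is assumed to be given rather than estimated.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    name: str
    tuples: int            # T(R)


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    out_tuples: int        # T of the join output (assumed known here)


Plan = Union[Relation, Join]


def T(plan):
    """Number of tuples produced by a plan node."""
    return plan.tuples if isinstance(plan, Relation) else plan.out_tuples


def cost(plan):
    """Simple cost model: a join costs T(left) + T(right); everything else is free."""
    if isinstance(plan, Relation):
        return 0
    return cost(plan.left) + cost(plan.right) + T(plan.left) + T(plan.right)


# Hypothetical sizes, chosen only for illustration.
R = Relation("R", 1000)
S = Relation("S", 200)
T_rel = Relation("T", 50)
X = Join(R, S, out_tuples=300)        # X = R join S
plan = Join(X, T_rel, out_tuples=120) # (R join S) join T

total = cost(plan)  # T(R) + T(S) + T(X) + T(T) = 1000 + 200 + 300 + 50
```

With these made-up numbers the total is 1550, which is exactly the four-term sum from the example slide.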

Selinger Algorithm
– Dynamic-programming based
– Dynamic programming:
   – General algorithmic paradigm
   – Exploits the “principle of optimality”
   – Useful reading: Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest

Principle of Optimality
The optimal solution for the “whole” is made up of optimal solutions for its “parts”

Principle of Optimality
Query: R1 ⋈ R2 ⋈ R3 ⋈ R4 ⋈ R5
Optimal plan (join order): R3, R2, R4, R1, R5
– The sub-plan that joins R3, R2, R4, R1 is the optimal plan for joining R3, R2, R4, R1
– The sub-plan that joins R3, R2, R4 is the optimal plan for joining R3, R2, R4

Exploiting Principle of Optimality
Query: R1 ⋈ R2 ⋈ … ⋈ Rn
[Figure: two plans that join R1, R2, R3 in different orders; one is optimal for joining R1, R2, R3, the other is sub-optimal for joining R1, R2, R3]
[Figure: the sub-optimal plan for R1, R2, R3 extended with Ri, …, Rj; the result is sub-optimal for joining R1, …, Rn]
A sub-optimal sub-plan cannot lead to an optimal plan

Selinger Algorithm: progress of algorithm
Query: R1 ⋈ R2 ⋈ R3 ⋈ R4
Subsets of size 1: { R1 } { R2 } { R3 } { R4 }
Subsets of size 2: { R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }
Subsets of size 3: { R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }
Subsets of size 4: { R1, R2, R3, R4 }

Notation
– OPT ( { R1, R2, R3 } ): cost of the optimal plan to join R1, R2, R3
– T ( { R1, R2, R3 } ): number of tuples in R1 ⋈ R2 ⋈ R3

Selinger Algorithm
OPT ( { R1, R2, R3 } ) = Min of:
– OPT ( { R2, R3 } ) + T ( { R2, R3 } ) + T (R1)
– OPT ( { R1, R2 } ) + T ( { R1, R2 } ) + T (R3)
– OPT ( { R1, R3 } ) + T ( { R1, R3 } ) + T (R2)
Note: valid only for the simple cost model
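The recurrence can be turned into a small dynamic program over subsets of increasing size. The sketch below is an illustration under the simple cost model only. The function name `selinger_simple`, the way T of a subset is estimated (product of base sizes times per-pair selectivities), and all numbers are assumptions made for this example. It also does not prune cross products: a pair without a selectivity entry is treated as a cross product.

```python
from itertools import combinations
from math import prod


def selinger_simple(sizes, selectivity):
    """sizes: {relation: T(R)}; selectivity: {frozenset({a, b}): factor}.

    Returns (OPT cost, left-deep join order) for joining all relations.
    """
    def T(subset):
        # Assumed estimate: product of base sizes times join selectivities.
        card = prod(sizes[r] for r in subset)
        for pair in combinations(sorted(subset), 2):
            card *= selectivity.get(frozenset(pair), 1.0)
        return card

    names = sorted(sizes)
    # OPT of a single relation is 0: only joins cost anything in this model.
    opt = {frozenset([r]): (0.0, (r,)) for r in names}

    for k in range(2, len(names) + 1):          # subsets of size 2, 3, ...
        for combo in combinations(names, k):
            subset = frozenset(combo)
            best = None
            for r in combo:                     # r is the relation joined last
                rest = subset - {r}
                sub_cost, order = opt[rest]
                c = sub_cost + T(rest) + sizes[r]
                if best is None or c < best[0]:
                    best = (c, order + (r,))
            opt[subset] = best
    return opt[frozenset(names)]


# Hypothetical statistics, for illustration only.
sizes = {"R1": 1000, "R2": 100, "R3": 10}
selectivity = {frozenset({"R1", "R2"}): 0.01, frozenset({"R2", "R3"}): 0.1}
best_cost, best_order = selinger_simple(sizes, selectivity)
```

With these numbers the three candidates for { R1, R2, R3 } cost roughly 1210, 2110 and 11110, so the plan that joins R2 and R3 first and R1 last is chosen.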


Selinger Algorithm
Query: R1 ⋈ R2 ⋈ R3 ⋈ R4
Optimal plan (join order as drawn): R2, R3, R4, R1

More Complex Cost Model
DB System:
– Two join algorithms:
   – Tuple-based nested loop join
   – Sort-Merge join
– Two access methods:
   – Table Scan
   – Index Scan (all indexes are in memory)
– Plans pipelined as much as possible
Cost: number of disk I/Os

Cost of Table Scan
– Table Scan of R: cost B(R)

Cost of Clustered Index Scan
– Index Scan of R: cost B(R)
– Index Scan of R with predicate R.A > 50, producing X: cost B(X)

Cost of Non-Clustered Index Scan
– Index Scan of R: cost T(R)
– Index Scan of R with predicate R.A > 50, producing X: cost T(X)

Cost of Tuple-Based NLJ
NLJ over an Outer sub-plan (producing X) and an Inner sub-plan
Cost for entire plan: Cost (Outer) + T(X) × Cost (Inner)
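A worked instance of the scan and nested-loop formulas may help. The following Python sketch simply encodes them; the function names and every statistic (block and tuple counts) are hypothetical values picked for the example.

```python
def table_scan_cost(b_r):
    """Table scan reads every block of R: B(R)."""
    return b_r


def clustered_index_scan_cost(b_x):
    """Clustered index scan reads only the blocks of the result X: B(X).
    Without a predicate, X is all of R and the cost is B(R)."""
    return b_x


def nonclustered_index_scan_cost(t_x):
    """Non-clustered index scan pays one I/O per retrieved tuple: T(X)."""
    return t_x


def nlj_cost(cost_outer, t_x, cost_inner):
    """Tuple-based NLJ: run the inner plan once per outer tuple."""
    return cost_outer + t_x * cost_inner


# Hypothetical statistics: B(R) = 100, T(R) = 1000, B(S) = 50.
# The predicate R.A > 50 is assumed to keep 100 tuples stored in 10 blocks.
inner = table_scan_cost(50)

full_outer = nlj_cost(table_scan_cost(100), 1000, inner)
clustered_outer = nlj_cost(clustered_index_scan_cost(10), 100, inner)
nonclustered_outer = nlj_cost(nonclustered_index_scan_cost(100), 100, inner)
```

Under these assumed numbers the three plans cost 50100, 5010 and 5100 I/Os. The T(X) × Cost (Inner) term dominates, so shrinking the outer input matters far more than the access method used to read it.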

Cost of Sort-Merge Join
Sort-merge join on R1.A = R2.A. The Left and Right sub-plans produce the two inputs: X (on the R1 side) and Y (on the R2 side).
– Neither input sorted (sort both, then merge): Cost (Right) + Cost (Left) + 2 (B (X) + B (Y) )
– X sorted on R1.A (sort only Y, then merge): Cost (Right) + Cost (Left) + 2 B (Y)
– X sorted on R1.A and Y sorted on R2.A (merge only): Cost (Right) + Cost (Left)
Bottom line: cost depends on sorted-ness of inputs
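The three sort-merge cases differ only in which inputs still need sorting, which a single function with two flags can capture. This is a sketch, not a definitive cost model: the function name `smj_cost` and the example numbers are made up, and the case where only Y is already sorted (cost 2 B (X)) is assumed by symmetry, since the slides do not show it.

```python
def smj_cost(cost_left, cost_right, b_x, b_y, x_sorted=False, y_sorted=False):
    """Sort-merge join cost in disk I/Os.

    An unsorted input costs 2 * B (write sorted runs, read them back);
    an input that arrives sorted on the join attribute costs nothing extra.
    """
    sort_io = 0
    if not x_sorted:
        sort_io += 2 * b_x
    if not y_sorted:
        sort_io += 2 * b_y   # only-Y-sorted case is an assumption by symmetry
    return cost_left + cost_right + sort_io


# Hypothetical numbers: Cost(Left) = 100, Cost(Right) = 50, B(X) = 20, B(Y) = 10.
neither_sorted = smj_cost(100, 50, 20, 10)
x_sorted_only = smj_cost(100, 50, 20, 10, x_sorted=True)
both_sorted = smj_cost(100, 50, 20, 10, x_sorted=True, y_sorted=True)
```

With these values the costs are 210, 170 and 150 I/Os.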

Principle of Optimality?
Query: R1 ⋈ R2 ⋈ R3 ⋈ R4 ⋈ R5
Optimal plan: SMJ (R1.A = R2.A) over Scan of R1 and Plan X
Is Plan X the optimal plan for joining R2, R3, R4, R5?

Violation of Principle of Optimality
– Plan X (sorted on R2.A): suboptimal plan for joining R2, R3, R4, R5
– Plan Y (unsorted on R2.A): optimal plan for joining R2, R3, R4, R5
Plan X can still be part of the optimal overall plan, because its sorted output saves the SMJ a sort.

Principle of Optimality?
Query: R1 ⋈ R2 ⋈ R3 ⋈ R4 ⋈ R5
Optimal plan: SMJ (R1.A = R2.A) over Scan of R1 and Plan X
Can we assert anything about Plan X?

Weaker Principle of Optimality
– If Plan X produces output sorted on R2.A, then Plan X is the optimal plan for joining R2, R3, R4, R5 that produces output sorted on R2.A
– If Plan X produces output unsorted on R2.A, then Plan X is the optimal plan for joining R2, R3, R4, R5
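In practice the weaker principle means keeping, for each set of relations, the cheapest plan overall plus the cheapest plan for each sort order. The sketch below illustrates that pruning rule. The `Plan` tuple, the `prune` helper and all costs are hypothetical, and R1 is assumed to arrive already sorted on R1.A, so only the other SMJ input may need a sort.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    cost: int              # disk I/Os (made-up numbers)
    order: Optional[str]   # attribute the output is sorted on, or None


def prune(candidates):
    """Keep the cheapest plan overall (key None) and the cheapest per sort order."""
    best = {}
    for plan in candidates:
        keys = [None] if plan.order is None else [None, plan.order]
        for key in keys:
            if key not in best or plan.cost < best[key].cost:
                best[key] = plan
    return best


plan_x = Plan("X", cost=120, order="R2.A")  # sorted, but not the cheapest
plan_y = Plan("Y", cost=100, order=None)    # cheapest, but unsorted
best = prune([plan_x, plan_y])

SCAN_R1 = 40        # assumed cost of reading R1 in R1.A order
SORT_SUBJOIN = 60   # assumed 2 * B of the R2..R5 join result


def smj_total(sub_plan):
    """Total cost of SMJ(R1.A = R2.A) on top of the given sub-plan."""
    extra_sort = 0 if sub_plan.order == "R2.A" else SORT_SUBJOIN
    return SCAN_R1 + sub_plan.cost + extra_sort


totals = {p.name: smj_total(p) for p in best.values()}
```

Both plans survive pruning: Y as the cheapest plan overall, X as the cheapest plan sorted on R2.A. With the assumed numbers the full plan costs 160 via X and 200 via Y, so the sub-plan that is suboptimal in isolation wins.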

Interesting Order
An attribute is an interesting order if it:
– Participates in a join predicate
– Occurs in the Group By clause
– Occurs in the Order By clause

Interesting Order: Example
Select *
From R1(A,B), R2(A,B), R3(B,C)
Where R1.A = R2.A and R2.B = R3.B
Interesting orders: R1.A, R2.A, R2.B, R3.B
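Collecting the interesting orders of a query is mechanical once its clauses are available. The sketch below assumes the query has already been parsed into join predicates and Group By / Order By attribute lists; the function name and that input representation are choices made for this illustration.

```python
def interesting_orders(join_predicates, group_by=(), order_by=()):
    """Attributes that appear in a join predicate, Group By, or Order By.

    join_predicates: iterable of (left_attr, right_attr) equality pairs.
    Duplicates are dropped while keeping first-seen order.
    """
    attrs = []
    for left, right in join_predicates:
        attrs.extend([left, right])
    attrs.extend(group_by)
    attrs.extend(order_by)
    return list(dict.fromkeys(attrs))


# The example query: R1.A = R2.A and R2.B = R3.B, no Group By or Order By.
orders = interesting_orders([("R1.A", "R2.A"), ("R2.B", "R3.B")])
```

For the example query this yields R1.A, R2.A, R2.B and R3.B, matching the list on the slide.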

Modified Selinger Algorithm
Subsets of size 1: {R1} {R1}(A) {R2} {R2}(A) {R2}(B) {R3} {R3}(B)
Subsets of size 2: {R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B)
Subsets of size 3: {R1,R2,R3}

Notation
{R1,R2} (C): optimal way of joining R1, R2 so that the output is sorted on attribute R2.C
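The entries of the modified algorithm's table for the three-relation example (join predicates R1.A = R2.A and R2.B = R3.B) can be enumerated programmatically. The sketch below is one plausible reading, not the definitive rule: it assumes that subsets needing a cross product are skipped, that every proper subset gets one unordered entry plus one entry per interesting attribute of its relations, and that the full set needs only the unordered entry because the example query has no Order By. The function name and input layout are hypothetical.

```python
from itertools import combinations


def dp_entries(relations, joins, interesting):
    """relations: list of names; joins: set of frozenset pairs (join graph edges);
    interesting: {relation: [attribute names]}.
    Returns (subset, order) pairs, where order None means "any order"."""
    def connected(subset):
        subset = set(subset)
        seen, stack = set(), [next(iter(subset))]
        while stack:
            r = stack.pop()
            if r in seen:
                continue
            seen.add(r)
            stack += [o for o in subset - seen if frozenset((r, o)) in joins]
        return seen == subset

    entries = []
    n = len(relations)
    for k in range(1, n + 1):
        for combo in combinations(relations, k):
            if not connected(combo):
                continue                      # joining these needs a cross product
            entries.append((combo, None))
            if k < n:                         # assumed: final result needs no order
                attrs = sorted({a for r in combo for a in interesting[r]})
                entries += [(combo, a) for a in attrs]
    return entries


entries = dp_entries(
    ["R1", "R2", "R3"],
    {frozenset(("R1", "R2")), frozenset(("R2", "R3"))},
    {"R1": ["A"], "R2": ["A", "B"], "R3": ["B"]},
)
```

Under these assumptions the function produces 14 entries: seven for single relations, six for the pairs {R1,R2} and {R2,R3}, and one for {R1,R2,R3}. The pair {R1,R3} is absent because it has no join predicate.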


Roadmap
Query Optimization
– Heuristics for pruning plan space
– Selinger Algorithm
– Estimating costs of plans