Data Engineering Query Optimization (Cost-based optimization)

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Lecture 13: Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data.
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
1 Query Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be?
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
1 Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be? Example:
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 14 – Join Processing.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
CPS216: Advanced Database Systems Notes 07:Query Execution Shivnath Babu.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Status “Lifetime of a Query” –Query Rewrite –Query Optimization –Query Execution Optimization –Use cost-estimation to iterate over all possible plans,
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Data Engineering SQL Query Processing Shivnath Babu.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 15 – Query Optimization.
Query Processing CS 405G Introduction to Database Systems.
Lecture 17: Query Execution Tuesday, February 28, 2001.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
CS 540 Database Management Systems
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Query Optimization Problem Pick the best plan from the space of physical plans.
CPS216: Advanced Database Systems Notes 02:Query Processing (Overview) Shivnath Babu.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
CS 440 Database Management Systems
15.1 – Introduction to physical-Query-plan operators
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Prepared by : Ankit Patel (226)
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Lecture 16: Relational Operators
Introduction to Query Optimization
Evaluation of Relational Operations
CPSC-608 Database Systems
Chapter 15 QUERY EXECUTION.
Query Execution Presented by Khadke, Suvarna CS 257
Introduction to Database Systems
File Processing : Query Processing
Relational Operations
Query Optimization Overview
CS143:Evaluation and Optimization
Performance Join Operator Select * from R, S where R.a = S.a;
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Database Management Systems (CS 564)
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Lecture 13: Query Execution
CS505: Intermediate Topics in Database Systems
Relational Query Optimization
Lecture 23: Query Execution
Evaluation of Relational Operations: Other Techniques
C. Faloutsos Query Optimization – part 2
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
CPS216: Data-Intensive Computing Systems Query Processing (contd.)
Yan Huang - CSCI5330 Database Implementation – Query Processing
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Lecture 20: Query Execution
Presentation transcript:

Data Engineering Query Optimization (Cost-based optimization) Shivnath Babu

Query Optimization Problem Pick the best plan from the space of physical plans

Cost-Based Optimization Prune the space of plans using heuristics Estimate cost for remaining plans Be smart about how you iterate through plans Pick the plan with least cost Focus on queries with joins

Heuristics for pruning plan space Predicates as early as possible Avoid plans with cross products Only left-deep join trees

Physical Plan Selection Logical Query Plan P1 P2 …. Pn C1 C2 …. Cn Pick minimum cost one Physical plans Costs

Review of Notation T (R) : Number of tuples in R B (R) : Number of blocks in R

Note: The simple cost model used for illustration only Cost (R S) = T(R) + T(S) All other operators have 0 cost Note: The simple cost model used for illustration only

Cost Model Example X T(X) + T(T) T T(R) + T(S) R S Total Cost: T(R) + T(S) + T(T) + T(X)

Selinger Algorithm Dynamic Programming based Dynamic Programming: General algorithmic paradigm Exploits “principle of optimality” Useful reading: Chapter 16, Introduction to Algorithms, Cormen, Leiserson, Rivest

Principle of Optimality Optimal for “whole” made up from optimal for “parts”

Principle of Optimality Query: R1 R2 R3 R4 R5 R3 R2 R4 R1 R5 Optimal Plan:

Principle of Optimality Query: R1 R2 R3 R4 R5 Optimal Plan: R5 R1 R4 R3 R2 Optimal plan for joining R3, R2, R4, R1

Principle of Optimality Query: R1 R2 R3 R4 R5 R3 R2 R4 R1 R5 Optimal Plan: Optimal plan for joining R3, R2, R4

Exploiting Principle of Optimality Query: R1 R2 … Rn R3 R1 R2 R3 R1 R2 Sub-Optimal for joining R1, R2, R3 Optimal for joining R1, R2, R3

Exploiting Principle of Optimality Rj Sub-Optimal for joining R1,…,Rn R2 R3 R1 A sub-optimal sub-plan cannot lead to an optimal plan

Selinger Algorithm: Query: R1 R2 R3 R4 Progress of algorithm

Notation OPT ( { R1, R2, R3 } ): Cost of optimal plan to join R1,R2,R3 Number of tuples in R1 R2 R3

Selinger Algorithm: Min OPT ( { R1, R2, R3 } ): OPT ( { R1, R2 } ) + T ( { R1, R2 } ) + T(R3) Min OPT ( { R2, R3 } ) + T ( { R2, R3 } ) + T(R1) OPT ( { R1, R3 } ) + T ( { R1, R3 } ) + T(R2) Note: Valid only for the simple cost model

Selinger Algorithm: Query: R1 R2 R3 R4 Progress of algorithm

Selinger Algorithm: Query: R1 R2 R3 R4 Progress of algorithm

Selinger Algorithm: Query: R1 R2 R3 R4 Optimal plan: R2 R4 R3 R1

More Complex Cost Model DB System: Two join algorithms: Tuple-based nested loop join Sort-Merge join Two access methods Table Scan Index Scan (all indexes are in memory) Plans pipelined as much as possible Cost: Number of disk I/O s

Cost of Table Scan Table Scan Cost: B (R) R

Cost of Clustered Index Scan Cost: B (R) R

Cost of Clustered Index Scan Cost: B (X) R.A > 50 R

Cost of Non-Clustered Index Scan Cost: T (R) R

Cost of Non-Clustered Index Scan Cost: T (X) R.A > 50 R

Cost of Tuple-Based NLJ Cost for entire plan: NLJ Cost (Outer) + T(X) x Cost (Inner) X Outer Inner

Cost of Sort-Merge Join Cost for entire plan: Sort Sort Cost (Right) + Cost (Left) + 2 (B (X) + B (Y) ) X R1.A = R2.A Y Left Right R1 R2

Cost of Sort-Merge Join Cost for entire plan: Sort Cost (Right) + Cost (Left) + 2 B (Y) X R1.A = R2.A Y Left Right Sorted on R1.A R1 R2

Cost of Sort-Merge Join Cost for entire plan: Cost (Right) + Cost (Left) X R1.A = R2.A Y Sorted on R2.A Left Right Sorted on R1.A R1 R2

Cost of Sort-Merge Join Bottom Line: Cost depends on sorted-ness of inputs

Principle of Optimality? Query: R1 R2 R3 R4 R5 Optimal plan: SMJ (R1.A = R2.A) Plan X Scan R1 Is Plan X the optimal plan for joining R2,R3,R4,R5?

Violation of Principle of Optimality (unsorted on R2.A) (sorted on R2.A) Plan X Plan Y Suboptimal plan for joining R2,R3,R4,R5 Optimal plan for joining R2,R3,R4,R4

Principle of Optimality? Query: R1 R2 R3 R4 R5 Optimal plan: SMJ (R1.A = R2.A) Plan X Scan R1 Can we assert anything about plan X?

Weaker Principle of Optimality If plan X produces output sorted on R2.A then plan X is the optimal plan for joining R2,R3,R4,R5 that produces output sorted on R2.A If plan X produces output unsorted on R2.A then plan X is the optimal plan for joining R2, R3, R4, R5

Interesting Order An attribute is an interesting order if: participates in a join predicate Occurs in the Group By clause Occurs in the Order By clause

Interesting Order: Example Select * From R1(A,B), R2(A,B), R3(B,C) Where R1.A = R2.A and R2.B = R3.B Interesting Orders: R1.A, R2.A, R2.B, R3.B

Modified Selinger Algorithm {R1,R2,R3} {R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B) {R1} {R1}(A) {R2} {R2}(A) {R2}(B) {R3} {R3}(B)

Notation {R1,R2} (C) Optimal way of joining R1, R2 so that output is sorted on attribute R2.C

Modified Selinger Algorithm {R1,R2,R3} {R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B) {R1} {R1}(A) {R2} {R2}(A) {R2}(B) {R3} {R3}(B)