UNIT 15 Query Optimization 15-1. Wei-Pang Yang, Information Management, NDHU Contents  15.1 Introduction to Query Optimization  15.2 The Optimization.

Slides:

Advertisements

Similar presentations

Query optimisation.

Advertisements

Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.

Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.

Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.

Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.

1 Relational Query Optimization Module 5, Lecture 2.

Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.

CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.

Relational Query Optimization (this time we really mean it)

Query Optimization Chapter 15. Query Evaluation Catalog Manager Query Optmizer Plan Generator Plan Cost Estimator Query Plan Evaluator Query Parser Query.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.

CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.

Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.

ICS (072)Query Processing and Optimization 1 Chapter 15 Algorithms for Query Processing and Optimization ICS 424 Advanced Database Systems Dr.

Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.

CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.

Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.

©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.

Overview of Query Optimization v Plan : Tree of R.A. ops, with choice of alg for each op. –Each operator typically implemented using a `pull’ interface:

Query Processing Presented by Aung S. Win.

Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

Overview of Implementing Relational Operators and Query Evaluation

Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.

Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.

Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.

Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.

Database Management 9. course. Execution of queries.

Logical Database Design ( 補 ) Unit 7 Logical Database Design ( 補 )

©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.

Department of Computer Science and Engineering, HKUST Slide Query Processing and Optimization Query Processing and Optimization.

Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.

Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION” Academic Year 2014 Spring.

Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.

CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12.

Copyright © Curt Hill Query Evaluation Translating a query into action.

1-1 Homework 3 Practical Implementation of A Simple Rational Database Management System.

SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.

Switch off your Mobiles Phones or Change Profile to Silent Mode.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.

Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.

Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.

Index in Database Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦博士國立東華大學資訊管理系教授.

CSCI Query Processing1 QUERY PROCESSING & OPTIMIZATION Dr. Awad Khalil Computer Science Department AUC.

Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty

Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.

Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.

1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.

Chapter 13: Query Processing

CS4432: Database Systems II Query Processing- Part 1 1.

Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.

CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.

UNIT 11 Query Optimization

Database Management System

CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.

Introduction to Query Optimization

Overview of Query Optimization

Chapter 15 QUERY EXECUTION.

Introduction to Database Systems

Relational Query Optimization

Relational Query Optimization (this time we really mean it)

Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦博士國立東華大學資訊管理系教授.

CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.

Relational Query Optimization

Relational Query Optimization

Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦博士國立東華大學資訊管理系教授.

Presentation transcript:

UNIT 15 Query Optimization 15-1

Wei-Pang Yang, Information Management, NDHU Contents  15.1 Introduction to Query Optimization  15.2 The Optimization Process: An Overview  15.3 Optimization in System R  15.4 Optimization in INGRES  15.5 Implementing the Join Operators 15-2

15.1 Introduction to Query Optimization 15-3

Wei-Pang Yang, Information Management, NDHU The Problem  How to choose an efficient strategy for evaluating a given expression (a query). Expression (a query): e.g. select distinct S.SNAME from S, SP where S.S# =SP.S# and SP.P#= 'p2' Evaluate: Efficient strategy: First class e.g. (A join B) where condition-on-B (A join (B where condition-on-B) ) e.g. SP.P# = 'p2' Second class e.g. from S, SP ==> S join SP How to implement join operation efficiently? “Improvement" may not be an "optimal" version §15.5 Implementing the Join Operators

Wei-Pang Yang, Information Management, NDHU Query Processing in the DBMS Language Processor Optimizer Operator Processor Access Method File System database Language Processor Access Method ? Query in SQL: SELECT CUSTOMER. NAME FROM CUSTOMER, INVOICE WHERE REGION = 'N.Y.' AND AMOUNT > AND CUTOMER.C#=INVOICE.C# Internal Form : P( σ (S SP) Operator : SCAN C using region index, create C SCAN I using amount index, create I SORT C?and I?on C# JOIN C?and I?on C# EXTRACT name field Calls to Access Method: OPEN SCAN on C with region index GET next tuple. Calls to file system: GET10th to 25th bytes from block #6 of file #5 15-5

Wei-Pang Yang, Information Management, NDHU An Example Suppose: |S| = 100, |SP| = 10,000, and there are 50 tuples in SP with p# = 'p2'? Results are placed in Main Memory. Query in SQL: SELECT S.* FROM S,SP WHERE S.S# = SP.S# AND SP.P# = 'p2‘  Method 1: iteration (Join + Restrict) SNAME STATUSCITY S# S2 S5. S S S# P# QTY S3 S1. S ,000  SP Cost = 100 * 10,000 = 1,000,000 tuple I/O's 15-6

Wei-Pang Yang, Information Management, NDHU An Example (cont.) Method 2: Restriction iteration Join S# P# QTY S3 S1. S2 P4 P2. P SP S# P# QTY S1 S3. S2 P2. P SP' restrict p#= 'p2' SNAME STATUSCITY S# S2 S5. S S S# P# QTY S1 S3. S2 P2. P SP' cost = 10, * 50 = 15,000 I/O 15-7

Wei-Pang Yang, Information Management, NDHU An Example (cont.) Method 3: Sort-Merge Join + Restrict Suppose S, SP are sorted on S#. S# SNAME STATUS CITY S1 S2. S S S# P# QTY S1. S , SP cost = ,000 = 10,100 I/O 15-8

15.2 The Optimization Process: An Overview (1) Query => internal form (2) Internal form => efficient form (3) Choose candidate low-level procedures (4) Generate query plans and choose the cheapest one 15-9

Wei-Pang Yang, Information Management, NDHU Step 1: Cast the query into some internal representation Query: "get names of suppliers who supply part p2" SQL: select distinct S.SNAME from S,SP where S.S# = SP.S# and SP.P# = 'p2' Query tree: result | project (SNAME) | restrict (SP.P# = 'p2') | join (S.S# = SP.S#) S SP Query => Algebra 'P2'SNAME ( (S join SP) where P#= 'P2') [SNAME] or ( ( S SP) ) S.S# = SP.S# Algebra: 15-10

Wei-Pang Yang, Information Management, NDHU Step 2: Convert to equivalent and efficient form  Def: Canonical Form Given a set Q of queries, for q1, q2 belong to Q, q1 are equivalent to q2 (q1 q2) iff they produce the same result, Subset C of Q is said to be a set of canonical forms for Q iff  Note: Sufficient to study the small set C  Transformation Rules C Q output of step1 trans. equivalent and more efficient form Step2 Algebra Efficient Algebra 15-11

Wei-Pang Yang, Information Management, NDHU Step 2: Convert to equivalent and efficient form (cont.) e.g.1 [restriction first] (A join B) where restriction_B A join ( B where restriction_B) e.g.2 [More general case] (A join B) where restriction_A and restriction_B (A where rest_on_A) join ( B where rest_on_B) e.g.3 [ Combine restriction] ( A where rest_1 ) where rest_2 A where rest_1 and rest_2 C scan 2 1 q1q1 q2q2 q 1 ≡q

Wei-Pang Yang, Information Management, NDHU Step 2: Convert to equivalent and efficient form (cont.) e.g.4 [projection] last attribute (A [attribute_list_1] ) [attri_2] A [attri_2] e.g.5 [restriction first] (A [attri_1]) where rest_1 (A where rest _1) [attri_1] n1<n n+n

Wei-Pang Yang, Information Management, NDHU Step 2: Convert to equivalent and efficient form (cont.) e.g.6 [Introduce extra restriction] SP JOIN (P WHERE P.P#= 'P2') (SP WHERE SP.P# = 'P2') JOIN (P WHERE P.P# = 'P2') e.g.7 [Semantic transformation] (SP join P ) [S#] SP[S#] Note: a very significant improvement. Ref.[17.27] P.571 J. J. King, VLDB81  sp.p# = p.p# if restriction on join attribute if SP.P# is a foreign key matching the primary term P.P# SP P P# P1 P2 P3 P4 P5 S# P#QTY S1 S2 sp.p# = p.p# 15-14

Wei-Pang Yang, Information Management, NDHU Step 3: Choose candidate low-level procedures  Low-level procedure e.g. Join, restriction are low-level operators there will be a set of procedures for implementing each operator, e.g. Join (ref p.11-31) Nested Loop (a brute force) Index lookup (if one relation is indexed on join attribute) Hash lookup (if one relation is hashed by join attribute) Merge (if both relations are indexed on join attribute)

Wei-Pang Yang, Information Management, NDHU Step 3: Choose candidate low-level procedures (cont.)  Data flow existence of indexes cardinalities of relations. Optimizer step3 : access path selection predefined low-level procedures Ref. p Canonical Form e.g. One or more candidate procedures for each operator Step4 e.g.,, p.554 ( (C I)) System catalog Lib SQL Algebra 6 choose

Wei-Pang Yang, Information Management, NDHU Step 4: Generate query plans and choose the cheapest  Query plan is built by combing together a set of candidate implementation procedures for any given query many many reasonable plans Note: may not be a good idea to generate all possible plans. heuristic technique "keep the set within bound" (reducing the search space) 15-17

Wei-Pang Yang, Information Management, NDHU Step 4: Generate query plans and choose the cheapest (cont.)  Data flow Step 4(a) Step 4(b) choose the cheapest cheapest query plans output of step 3 ( (C I))

Wei-Pang Yang, Information Management, NDHU Step 4: Generate query plans and choose the cheapest (cont.)  Choosing the cheapest require a method for assigning a cost to any given plan. factor of cost formula: (1) # of disk I/O (2) CPU utilization (3) size of intermediate results a difficult problem [Jarke 84, p.564 ACM computing surveys] [Yao 79, 17.8 TODS]

15.3 Optimization in System R 15-20

Wei-Pang Yang, Information Management, NDHU Optimization in System R  Only minor changes to DB2 and SQL/DS.  Query in System R (SQL) is a set of "select-from-where" block  System R optimizer step1: choosing block order first in case of nested => innermost block first step2: optimizing individual blocks Note: certain possible query plan will never be considered.  The statistical information for optimizer Where: from the system catalog What: 1. # of tuples on each relation 2. # of pages occupied by each relation. 3. percentage of pages occupied by each relation. 4. # of distinct data values for each index. 5. # of pages occupied by each index. Note: not updated every time the database is updated. (overhead??)

Wei-Pang Yang, Information Management, NDHU Optimization in System R (cont.) Given a query block case 1. involves just a restriction and/or projection 1. statistical information (in catalog) 2. formulas for size estimates of intermediate results. 3. formulas for cost of low-level operations (next section) choose a strategy for constructing the query operation. case 2. involves one or more join operations e.g. A join B join C join D ((A join B) join C) join D Never: (A join B) join (C join D) Why? See next page 15-22

Wei-Pang Yang, Information Management, NDHU Optimization in System R (cont.) Note : 1. "reducing the search space" 2. heuristics for choosing the sequence of joins are given in [17.34] P (A join B) join C not necessary to compute entirely before join C i.e. if any tuple has been produced It may never be necessary to finish relation "A B ", why ? pass to join C ((A join B) join C) join D Never: (A join B) join (C join D) ∵ C has run out ?? 15-23

Wei-Pang Yang, Information Management, NDHU Optimization in System R (cont.)  How to determine the order of join in System R ? consider only sequential execution of multiple join. ((A B) C) D (A B) (C D) × STEP1: Generate all possible sequences (1) ((A B) C) D (2) ((A B) D) C (3) ((A C) B) D (4) ((A C) D) B (5) ((A D) B) C (6) ((A D) C) B (7) ((B C) A) D (8) ((B C ) D) A (9) ((B D) A) C (10) ((B D) C) A (11) ((C D) A) B (12) ((C D) B) A Total # of sequences = ( 4! )/ 2 =

Wei-Pang Yang, Information Management, NDHU Optimization in System R (cont.) STEP 2: Eliminate those sequences that involve Cartesian Product if A and B have no attribute names in common, then A B = A x B STEP 3: For the remainder, estimate the cost and choose a cheapest

15.4 Optimization in INGRES 15-26

Wei-Pang Yang, Information Management, NDHU Query Decomposition  a general idea for processing queries in INGRES.  basic idea: break a query involving multiple tuple variables down into a sequence of smaller queries involving one such variable each, using detachment and tuple substitution. avoid to build Cartesian Product. keep the # of tuple to be scanned to a minimum. "Get names of London suppliers who supply some red part weighing less than 25 pounds in a quantity greater than 200" Initial query: Q0: RETRIEVE (S.SNAME) WHERE S.CITY= 'London' AND S.S# = SP.S# AND SP.QTY > 200 AND SP.P# = P.P# AND P.COLOR = Red AND P.WEIGHT < 2 5 detach P 15-27

Wei-Pang Yang, Information Management, NDHU Query Decomposition (cont.) Q1: RETRIVE (S.SNAME) WHERE S.CITY = 'London' AND S.S# = SP.S# AND SP.QTY > 200 AND SP.P# = P'.P# S join SP join P’ D1: RETRIEVE INTO P' (P.P#) WHERE P.COLOR= 'Red' AND P.WEIGHT < 25 D2: RETRIEVE INTO SP' (SP.S#, SP.P#) WHERE SP.QTY > 200 Q2: RETRIEVE (S.SNAME) WHERE S.CITY = 'London' AND S.S#=SP'.S# AND SP'.P#=P'.P# detach SPdetach S 15-28

Wei-Pang Yang, Information Management, NDHU Query Decomposition (cont.) D3: RETRIEVE INTO S' (S.S#, S.SNAME) WHERE S.CITY = 'LONDON' Q3: RETRIEVE (S'.SNAME) WHERE S'.S# =SP'.S# AND SP'.P# = P'.P# D4: RETRIEVE INTO SP"(SP'.S#) WHERE SP'.P# =P'.P# Q4: RETRIEVE (S'.SNAME) WHERE S'.S# = SP".S# D5: RETRIEVE INTO SP"(SP'.S#) WHERE SP'.P# = 'P1' OR SP'.P#= 'P3‘ Q5: RETRIEVE (S'.SNAME) WHERE S'.S# = 'S1' OR S'.S# = 'S2' OR S'.S# = 'S4' detach P' and SP' D4: two var. --> tuple substitution ( Suppose D1 evaluate to {P1, P3 } Q4 : two var. --> tuple substitution ( Suppose D5 evaluate to { S1, S2, S4}) 15-29

Wei-Pang Yang, Information Management, NDHU Query Decomposition (cont.)  Decomposition tree for query Q 0 : Overall result Q4 (Q5) D3 S' SP'' D4 (D5) S D1 D2 P' SP' P SP D1, D2, D3: queries involve only one variable => evaluate D4, Q4: queries involve tow variable => tuple substitution - Objectives : avoid to build Cartesian Product. keep the # of tuple to be scanned to a minimum

15.5 Implementing the Join Operators  Method 1: Nested Loop  Method 2: Index Lookup  Method 3: Hash Lookup  Method 4: Merge 15-31

Wei-Pang Yang, Information Management, NDHU Join Operation  Suppose R S is required, R.A and S.A are join attributes. R A abab..e..e m S A baba..a..a n

Wei-Pang Yang, Information Management, NDHU Method 1: Nested Loop A a b..e..e m A baba..a..a n R S - O (mn) - the worst case - assume that S is neither indexed nor hashed on A - will usually be improved by constructing index or hash on S.A dynamically and then proceeding with an index or hash lookup scan. Suppose R and S are not sorted on A

Wei-Pang Yang, Information Management, NDHU Method 2: Index Lookup  Suppose S in indexed on A S A baba..a..a A abab..e..e m R a a b S.A_index 1n1n

Wei-Pang Yang, Information Management, NDHU Method 3: Hash Lookup  Suppose S is hashed on A. R.A abzabz..e..e m R b S h(a) h(e) h(e) = 1 h(b) = 0 h(a) = 2 h(z) = z a eaea S.A -Calculate hash function is faster than search in index

Wei-Pang Yang, Information Management, NDHU Method 4: Merge  Suppose R and S are both sorted (for indexed) on A. = A baba zcbzcb A baba dbadba R S a a b a a A abab bczbcz m R A aaaa bbdbbd S 1n1n  Only index is retrieved for any unmatched tuple

Wei-Pang Yang, Information Management, NDHU end of unit