Query Optimization Chap. 19

Presentation transcript:

Query Optimization Chap. 19

Evaluation of SQL
Conceptual order of evaluation
– Cartesian product of all tables in the from clause
– Rows not satisfying the where clause are eliminated
– Rows are grouped (group by)
– Groups not satisfying having are eliminated
– Select clause target list is evaluated
– If distinct, duplicate rows are eliminated
– Union is applied after each subselect is evaluated
– Rows are sorted (order by)
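The conceptual order can be traced step by step on a small example. The following is a minimal Python sketch, not how a DBMS actually executes a query; the tables `employee` and `department`, their columns, and the query in the comment are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy tables (not taken from the slides).
employee = [
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 5},
    {"ssn": 3, "lname": "Zelaya", "dno": 4},
]
department = [
    {"dnumber": 4, "dname": "Administration"},
    {"dnumber": 5, "dname": "Research"},
]

def conceptual_eval():
    # Query being traced:
    #   select distinct dname, count(*) from employee, department
    #   where dno = dnumber group by dname having count(*) > 1 order by dname
    # 1. Cartesian product of all tables in the from clause.
    rows = [{**e, **d} for e, d in product(employee, department)]
    # 2. Eliminate rows not satisfying the where clause.
    rows = [r for r in rows if r["dno"] == r["dnumber"]]
    # 3. Group the remaining rows.
    groups = {}
    for r in rows:
        groups.setdefault(r["dname"], []).append(r)
    # 4. Eliminate groups not satisfying having.
    groups = {k: g for k, g in groups.items() if len(g) > 1}
    # 5. Evaluate the select clause target list.
    result = [(dname, len(g)) for dname, g in groups.items()]
    # 6. Distinct: drop duplicate rows (keeps first occurrence).
    result = list(dict.fromkeys(result))
    # 7. Sort the rows (order by).
    result.sort()
    return result

print(conceptual_eval())
```

Step 1 builds all 6 combinations even though only 3 survive step 2, which is why a real optimizer avoids following this order literally.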

Actual Order of Evaluation
Order chosen by the query optimizer
– Determines an efficient way to evaluate the query
Steps to optimization:
– Syntax checking phase
  scan: identify tokens of text
  parse: check syntax
  validate: check attribute and relation names
– Query optimization phase
  create internal representation: Query Tree
  identify execution strategies for the Query Plan
  – Maintains statistics for tables, columns, and indexes
  choose a suitable Query Plan to optimize the query, e.g. order of execution of ops, use of indexes, etc.

How to produce an execution plan Oracle’s example

Evaluation cont’d
– Execution phase
  query optimizer produces execution plan
  code generator generates code
  runtime db processor runs query code
– The goal is to minimize run time; the chosen strategy is NOT guaranteed to be optimal, but is reasonably efficient
For procedural languages there is limited need for query optimization

Strategy How would you optimize an SQL query?

Heuristics in Query Optimization
Apply select and project before join or other binary operations. Why?
– select and project reduce the size of the intermediate results
The strategy is obvious, but the challenge was to show that it could be done with rules
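The effect of applying a select before a join can be seen by counting the intermediate pairs that each order has to examine. This is a rough sketch under assumed data: the tables, their sizes (100 employees, 10 departments) and the predicate are hypothetical.

```python
from itertools import product

# Hypothetical data: 100 employees spread evenly over 10 departments.
employee = [{"ssn": i, "dno": i % 10} for i in range(100)]
department = [{"dnumber": d, "dname": f"D{d}"} for d in range(10)]

def join_then_select():
    # Pair every employee with every department, then filter.
    pairs = list(product(employee, department))
    joined = [(e, d) for e, d in pairs if e["dno"] == d["dnumber"]]
    result = [(e, d) for e, d in joined if d["dname"] == "D3"]
    return len(pairs), result

def select_then_join():
    # Apply the select to department first, then join the smaller input.
    small = [d for d in department if d["dname"] == "D3"]
    pairs = list(product(employee, small))
    result = [(e, d) for e, d in pairs if e["dno"] == d["dnumber"]]
    return len(pairs), result

late_pairs, late_result = join_then_select()
early_pairs, early_result = select_then_join()
print(late_pairs, early_pairs, late_result == early_result)
```

Both orders return the same rows, but the second examines a tenth of the pairs because the select shrinks one join input first.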

Create Internal Representation – Query Tree

Converting Query Trees into Query Plans
An execution plan for a query tree includes information about access methods for each relation and algorithms for operators
For example:
– For the σ operation: choose an index, or use a table scan
– For |X|: use nested loop, sort-merge or hash join
– For π: scan the result of the join, or combine with |X| when writing out the result from the join

Query Optimization
Canonical form (initial query tree, conceptual order of evaluation)
– Leaf nodes are tables
– Internal nodes are operations
– Begin by separating select conditions from joins (end up with X)
– Combine all selects, then all projects
Transform to the final query tree using the general transformation rules for relational algebra

Query tree for SQL query
Select lname
From employee, works_on, project
Where pname=‘Aquarius’ and pnumber=pno and essn=ssn and bdate > ‘ ’
Write as canonical query tree
Write as relational algebra expression
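One possible answer to the second exercise, following the canonical form (Cartesian product of the tables, then one select with all conditions, then the project), is sketched below in LaTeX. The date constant is missing from the transcript, so it appears here as the placeholder symbol d rather than a concrete value.

```latex
\pi_{\text{lname}}\Big(
  \sigma_{\text{pname}=\text{'Aquarius'} \,\wedge\, \text{pnumber}=\text{pno}
          \,\wedge\, \text{essn}=\text{ssn} \,\wedge\, \text{bdate} > d}
  \big(\text{employee} \times \text{works\_on} \times \text{project}\big)
\Big)
```

Read bottom-up, this is also the canonical query tree: the three tables are the leaves, the Cartesian products are the lower internal nodes, and the select and project sit above them.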

General Transformation Rules for Relational Algebra
1. Cascade of σ
2. Commutativity of σ
3. Cascade of π
4. Commuting σ with π
5. Commutativity of |X| (and X)
6. Commuting σ with |X| (or X)
7. Commuting π with |X| (or X)

General Transformation Rules for Relational Algebra
8. Commutativity of set operations U and ∩
9. Associativity of |X|, X, U and ∩
10. Commuting σ with set operations
11. The π operation commutes with U
12. Converting a (σ, X) sequence into |X|

Outline of a Heuristic Algebraic Optimization Algorithm
Use Rule 1 to break up conjunctive σ’s into cascades of σ’s
Use Rules 2, 4, 6, 10 for commutativity of σ to:
– move σ’s as far down the tree as possible
Use Rules 5 and 9 for commutativity and associativity of binary operations to:
– place the most restrictive σ (and |X|) so it is executed first: fewest tuples, smallest absolute size or smallest selectivity
But make sure no cartesian products result

Outline of a Heuristic Algebraic Optimization Algorithm
Use Rule 12 to combine a Cartesian product with a σ to:
– create |X|
Use Rules 3, 4, 7, 11 concerning cascade of π’s and commuting π with other ops to:
– move π’s down the tree as far as possible
Identify subtrees that represent groups of operations that can be executed by a single algorithm
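The first steps of the outline (split a conjunctive selection, then move each selection as far down the tree as possible) can be sketched as a small tree rewrite. This is an illustrative sketch only, not the textbook algorithm in full: the classes `Table`, `Product`, `Select` and the helper `cond` are hypothetical, projections are left out, and the date constant `d` is a placeholder because the value is missing from the transcript.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Table:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Product:          # Cartesian product of two subtrees
    left: object
    right: object

@dataclass(frozen=True)
class Select:           # conjunctive selection: every condition must hold
    conds: tuple        # each condition is (text, attributes it mentions)
    child: object

def attrs_of(node):
    if isinstance(node, Table):
        return node.attrs
    if isinstance(node, Product):
        return attrs_of(node.left) | attrs_of(node.right)
    return attrs_of(node.child)

def wrap(conds, node):
    # Only add a Select node when there is something to select on.
    return Select(tuple(conds), node) if conds else node

def push_down(node):
    if isinstance(node, Table):
        return node
    if isinstance(node, Product):
        return Product(push_down(node.left), push_down(node.right))
    child = node.child
    if not isinstance(child, Product):
        return Select(node.conds, push_down(child))
    left_c, right_c, stay = [], [], []
    for c in node.conds:                 # treat each conjunct separately
        _, used = c
        if used <= attrs_of(child.left):
            left_c.append(c)             # mentions only left-side attributes
        elif used <= attrs_of(child.right):
            right_c.append(c)            # mentions only right-side attributes
        else:
            stay.append(c)               # spans both sides: a join condition
    pushed = Product(push_down(wrap(left_c, child.left)),
                     push_down(wrap(right_c, child.right)))
    return wrap(stay, pushed)

def cond(text, *attrs):
    return (text, frozenset(attrs))

employee = Table("employee", frozenset({"ssn", "lname", "bdate"}))
works_on = Table("works_on", frozenset({"essn", "pno"}))
project = Table("project", frozenset({"pname", "pnumber"}))

canonical = Select(
    (cond("pname = 'Aquarius'", "pname"),
     cond("pnumber = pno", "pnumber", "pno"),
     cond("essn = ssn", "essn", "ssn"),
     cond("bdate > d", "bdate")),
    Product(Product(employee, works_on), project),
)
optimized = push_down(canonical)
print(optimized)
```

In the result, the single-table conditions sit directly above `employee` and `project`, and each remaining selection sits directly above a Cartesian product, which is the pattern that Rule 12 turns into a join.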

Summary of Heuristics
Apply first the operations that reduce the size of intermediate results
– Perform σ’s and π’s as early as possible (move down the tree as far as possible)
– Execute the most restrictive σ and |X| first (reorder leaf nodes but avoid cartesian products)

Multiple table joins – Query plan identifies most efficient ordering of joins – uses dynamic programming
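A dynamic-programming search over join orders can be sketched as follows. This is a simplified illustration, not any particular DBMS's optimizer: it considers only left-deep orders, uses the sum of estimated intermediate result sizes as the cost, and the function name `best_join_order`, the table sizes and the join selectivities are all assumed values.

```python
from fractions import Fraction
from itertools import combinations

def best_join_order(sizes, sel):
    """sizes: {table: row count}; sel: {frozenset({a, b}): join selectivity}.
    Returns (cost, order) of the cheapest left-deep join order, where cost
    is the sum of the estimated sizes of all intermediate results."""
    tables = sorted(sizes)

    def card(subset):
        # Estimated rows after joining every table in subset.
        rows = 1
        for t in sorted(subset):
            rows *= sizes[t]
        for a, b in combinations(sorted(subset), 2):
            rows *= sel.get(frozenset((a, b)), 1)  # no predicate: product
        return rows

    # best[S] holds the cheapest plan found for the set of tables S.
    best = {frozenset([t]): (0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            s = frozenset(combo)
            candidates = []
            for t in combo:                  # t is the last table joined
                cost, order = best[s - {t}]  # reuse the best smaller plan
                candidates.append((cost + card(s), order + (t,)))
            best[s] = min(candidates)
    return best[frozenset(tables)]

# Assumed statistics: project is already reduced to 1 row by a selection.
sizes = {"employee": 1000, "works_on": 5000, "project": 1}
sel = {
    frozenset(("employee", "works_on")): Fraction(1, 1000),
    frozenset(("works_on", "project")): Fraction(1, 50),
}
cost, order = best_join_order(sizes, sel)
print(cost, order)
```

The search reuses the best plan for every smaller set of tables instead of enumerating all orderings, and the missing employee/project predicate (selectivity 1) makes orders containing that Cartesian product expensive.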

Order of joins - Oracle
Much more complicated
– when determining the order of joins, keep track of the resulting sort order (interesting order)
Using dynamic programming, considers the order of the result
– can avoid a redundant sort operation later and/or speed up a subsequent join
Can flatten nested joins
– dynamic programming can escalate optimization time, so use rules instead
Estimating cost is difficult

Joins – May not have to materialize actual table resulting from join – Instead use pipelining - successive rows output from one step fed into next plan

Converting trees into Query plans
Pipelined evaluation
– Forward the result from an operation directly to the next operation
Result from σ is placed in a buffer
|X| consumes tuples from the buffer
Result from |X| is pipelined to π
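Pipelining maps naturally onto Python generators, where each operator pulls rows from the one below it and no intermediate table is materialized. The sketch below is only an analogy for how a DBMS forwards tuples; the operator functions and the toy tables `project_tbl` and `works_on` are hypothetical.

```python
def scan(table):
    # Leaf operator: produce the rows of a stored table one at a time.
    for row in table:
        yield row

def select(rows, pred):
    # Pass on only the rows that satisfy the predicate.
    for row in rows:
        if pred(row):
            yield row

def nested_loop_join(outer, inner_table, match):
    # Consume outer rows as they arrive; rescan the inner table for each.
    for o in outer:
        for i in inner_table:
            if match(o, i):
                yield {**o, **i}

def project(rows, cols):
    # Keep only the requested columns of each row.
    for row in rows:
        yield {c: row[c] for c in cols}

# Hypothetical toy data.
project_tbl = [
    {"pname": "Aquarius", "pnumber": 1},
    {"pname": "Pisces", "pnumber": 2},
]
works_on = [
    {"essn": 10, "pno": 1},
    {"essn": 11, "pno": 2},
    {"essn": 12, "pno": 1},
]

pipeline = project(
    nested_loop_join(
        select(scan(project_tbl), lambda r: r["pname"] == "Aquarius"),
        works_on,
        lambda p, w: p["pnumber"] == w["pno"],
    ),
    ["essn"],
)
rows_out = list(pipeline)
print(rows_out)
```

Nothing runs until `list(pipeline)` starts pulling rows, and each selected row flows through the join and the project before the next row of `project_tbl` is read.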

Query Tree Question
Should we do a π pname, pnumber, then σ pname = ‘Aquarius’, then π pnumber?
No, since the operations are done together
– the processor would read a row of project, see if pname = ‘Aquarius’, then use pnumber to perform the join.

Algorithms DBMS has general access algorithms to implement select, join, or combinations of ops Don't have separate access routines for each op – Creating temporary files is inefficient – Generate algorithms for combinations of operations – join, select, project - code is created dynamically to implement multiple operations

Materialized table Think about what operations require utilizing a materialized table – Input to select? – Input to project? – Input to join?

Identify Execution Strategies and Suitable Query Plans

Cost Optimizers combine: – Heuristic rules for ordering ops – Systematically estimate cost of different execution strategies - choose lowest cost E.g. nested loop or hash join? CPU cost usually similar for the Query Plans

Cost of Query Plans
Cost components for query execution
– Computation cost: in-memory ops on data buffers, e.g. sorting, searching, implementation/order of operations; CPU cost usually similar for the Query Plans
– Memory usage cost: number of memory buffers needed
– Access cost to secondary storage: cost of search for hashing, indexes; R/W of contiguous versus scattered storage on disk
– Storage cost if intermediate tables are stored
– Communication cost: cost to ship query and results from DB site to request site

Cost cont’d
For small DBs, minimize computation cost
For large DBs, minimize the cost of access to secondary storage, e.g. block transfers between disk and main memory
For distributed DBs, minimize communication cost
Workload
– Mix of queries and frequencies of queries
– Given workload and query execution plans, can determine CPU and I/O resource needs

Statistics
– Maintain statistics about tables: # rows, # columns, domains
  SYSCOLUMNS: col_name, table_name, # of values, High, Low
– Statistics on columns that deviate strongly from the uniform assumption
– Selectivity of values
– Execute a special command to gather info: RUNSTATS (DB2), ANALYZE (Oracle)

Cost function information
– Number of tuples or records (r)
– Record size (R)
– Number of blocks (b)
– Blocking factor, records per block (bfr)
– Number of distinct values (d)
– Selectivity (sl)
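These quantities are related to each other. The sketch below shows the usual relationships for fixed-length, unspanned records; the block size parameter is an assumption that does not appear on the slide, and the numbers are made up.

```python
import math

def blocking_factor(block_size, record_size):
    # bfr: whole records that fit in one block (unspanned records assumed).
    return block_size // record_size

def num_blocks(r, bfr):
    # b: blocks needed to hold r records at bfr records per block.
    return math.ceil(r / bfr)

def selection_cardinality(r, sl):
    # Expected number of records that satisfy a predicate with selectivity sl.
    return r * sl

# Assumed figures: 4096-byte blocks, 100-byte records, 10,000 records.
bfr = blocking_factor(4096, 100)
b = num_blocks(10_000, bfr)
print(bfr, b, selection_cardinality(10_000, 0.5))
```

A full scan of this table reads all `b` blocks, which is the baseline an optimizer compares index access against.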

Selectivity sl, a.k.a. Filter Factor FF
Fraction of rows with the specified value(s) for a specific attribute, i.e. the fraction that results from the predicate restriction
How many tuples satisfy the predicate
Hopefully only need to access those tuples + index

Selectivity sl, Filter Factor FF
sl = (# records satisfying condition c) / (total # of records in relation)
Estimate for an attribute with i distinct values:
– Assume |R| is the # of rows in table R
FF = sl = (|R|/i) / |R| = 1/i = 1/col_cardinality
e.g. with |R| = 10,000 and i = 2: FF = sl = (10,000/2)/10,000 = 1/2
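The equality estimate can be written as a short Python sketch. It assumes uniformly distributed values, and the function names are illustrative, not part of any DBMS.

```python
from fractions import Fraction

def equality_selectivity(col_cardinality):
    # Uniform assumption: each distinct value covers an equal share of rows.
    return Fraction(1, col_cardinality)

def estimated_rows(total_rows, sl):
    # Expected number of rows that satisfy the predicate.
    return total_rows * sl

sl = equality_selectivity(2)
print(sl, estimated_rows(10_000, sl))
```

With 10,000 rows and 2 distinct values this reproduces the slide's figure of 1/2, or an estimated 5,000 matching rows.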

Examples of sl
If the SQL statement specifies:
– col = const, DB2 assumes sl = 1/col_cardinality
– col between const1 and const2, DB2 assumes sl = (const2 - const1)/(High - Low)
For some predicates, sl is not predictable by a simple formula

Assumptions
– Uniform distribution of column values
– Attribute values are independent
– Independent distribution of values from any 2 columns C1 and C2: sl(C1 and C2) = sl(C1) * sl(C2)
  e.g. 1/2 (gender) * 1/4 (class) = 1/8 of undergrads are female freshmen
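The range formula and the independence assumption combine as in the sketch below. The column bounds in the range example (Low = 0, High = 100) are assumed values, and the function names are illustrative.

```python
from fractions import Fraction

def range_selectivity(const1, const2, low, high):
    # col between const1 and const2, under the uniform assumption.
    return Fraction(const2 - const1, high - low)

def conjunction_selectivity(*selectivities):
    # Independence assumption: multiply the individual selectivities.
    result = Fraction(1)
    for sl in selectivities:
        result *= sl
    return result

female_freshmen = conjunction_selectivity(Fraction(1, 2), Fraction(1, 4))
between_20_and_30 = range_selectivity(20, 30, 0, 100)
print(female_freshmen, between_20_and_30)
```

When the columns are correlated, the product can be far from the true fraction, which is one reason some predicates are not predictable by a simple formula.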

Cost of a Join
How to determine join selectivity (js)
– js = |(R |X| c S)| / |(R X S)| = |(R |X| c S)| / (|R| * |S|)
If there is no join condition, js = ? js = 1
If there are no matching tuples, js = ? js = 0

Cost of a Join
Assuming R.A = S.B is the join condition
– if A is a key of R, js = ? js ≤ (1/|R|)
– if, in addition, B is a foreign key referencing A and is NOT NULL, js = ? js = (1/|R|)
– if B is a key of S, js = ? js ≤ (1/|S|)
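These cases can be checked by brute force on small relations. The sketch below computes js directly from its definition; the relations `R` and `S` are hypothetical, with A a key of R and B a NOT NULL foreign key referencing A.

```python
from fractions import Fraction

def join_selectivity(r, s, cond):
    # js = |R join S| / (|R| * |S|), counted directly from the definition.
    matches = sum(1 for x in r for y in s if cond(x, y))
    return Fraction(matches, len(r) * len(s))

# Hypothetical relations: A is a key of R; every S.B refers to some R.A.
R = [{"A": i} for i in range(4)]
S = [{"B": i % 4} for i in range(6)]

js_fk = join_selectivity(R, S, lambda x, y: x["A"] == y["B"])
js_no_condition = join_selectivity(R, S, lambda x, y: True)
js_no_match = join_selectivity(R, S, lambda x, y: x["A"] == y["B"] + 100)
print(js_fk, js_no_condition, js_no_match)
```

Each of the 6 rows of S matches exactly one row of R, so the join has |S| rows and js = |S| / (|R| * |S|) = 1/|R|.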

Cost of different implementations of join:

You Can Retrieve Query Plan
Explain plan set queryno=1000 for select * from customers where city = 'Boston'
Select * from plan_table where queryno=1000;
Gives access type (index or not), columns, etc.
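The statements above use DB2-style syntax. For a version that can be run without a database server, the sketch below swaps in SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module; this is a different DBMS with a different plan format, and the `customers` schema and the index name `idx_city` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; the slide only names the table and the city column.
conn.execute(
    "CREATE TABLE customers (cid INTEGER PRIMARY KEY, cname TEXT, city TEXT)"
)
conn.execute("CREATE INDEX idx_city ON customers(city)")

# Ask SQLite how it would evaluate the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE city = 'Boston'"
).fetchall()

for row in plan:
    # The last column is a human-readable description of the plan step.
    print(row[-1])
```

The exact wording of the output depends on the SQLite version, but it indicates whether the table is scanned or searched through an index, which is the same kind of information the plan table provides.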

Information for Optimization
When using indexes
– Cluster Ratio: how well the clustering property holds for rows with respect to a given index
– If the cluster ratio is 80% or more, use sequential prefetch