Query Optimization Chap. 19

Evaluation of SQL
Conceptual order of evaluation
– Cartesian product of all tables in from clause
– Rows not satisfying where clause eliminated
– Rows grouped according to group by
– Groups not satisfying having eliminated
– Select clause target list evaluated
– If distinct, eliminate duplicate rows
– Union after each subselect evaluated
– Rows sorted according to order by
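
The conceptual order can be traced by hand on toy data. The sketch below is only an illustration: the tables, column names and the query in the comment are hypothetical, and the union step is left out because there is a single select.

```python
from itertools import product

# Hypothetical toy tables; names and values are illustrative only.
employee = [
    {"ssn": 1, "lname": "Smith", "dno": 5},
    {"ssn": 2, "lname": "Wong", "dno": 5},
    {"ssn": 3, "lname": "Zelaya", "dno": 4},
]
department = [
    {"dnumber": 4, "dname": "Administration"},
    {"dnumber": 5, "dname": "Research"},
]

def conceptual_eval():
    # select distinct dname, count(*) from employee, department
    # where dno = dnumber group by dname having count(*) >= 2 order by dname
    # 1. Cartesian product of all tables in the from clause
    rows = [{**e, **d} for e, d in product(employee, department)]
    # 2. Eliminate rows not satisfying where
    rows = [r for r in rows if r["dno"] == r["dnumber"]]
    # 3. Group rows by the group by column
    groups = {}
    for r in rows:
        groups.setdefault(r["dname"], []).append(r)
    # 4. Eliminate groups not satisfying having
    groups = {k: g for k, g in groups.items() if len(g) >= 2}
    # 5. Evaluate the select target list
    result = [(k, len(g)) for k, g in groups.items()]
    # 6. distinct: drop duplicate rows (keeps first occurrence)
    result = list(dict.fromkeys(result))
    # 7. Sort according to order by
    result.sort()
    return result

print(conceptual_eval())
```

A real optimizer never evaluates the query this way; the point is only that any plan it picks must return the same rows as this naive order.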

Actual Order of Evaluation
Order chosen by query optimizer
– Determines efficient way
Steps to optimization:
– Syntax checking phase
    scan - identify tokens of text
    parse - check syntax
    validate - check attributes and relation names
– Query optimization phase
    create internal representation - Query Tree
    identify execution strategies for Query Plan
      – Maintains statistics for tables, columns and indexes
    choose suitable Query Plan to optimize query, e.g. order of execution of ops, use of indexes, etc.

How to produce an execution plan: Oracle’s example

Evaluation cont’d
– Execution phase
    query optimizer produces execution plan
    code generator generates code
    runtime db processor runs query code
– Goal is to minimize run time
    chosen strategy is NOT necessarily optimal, but reasonably efficient
For procedural languages there is limited need for query optimization

Strategy
How would you optimize an SQL query?

Heuristics in Query Optimization
Apply select and project before join or other binary operations. Why?
– select and project reduce size
Strategy is obvious, but the challenge was to show it could be done with rules

Create Internal Representation – Query Tree

Converting Query Trees into Query Plans
An execution plan for a query tree includes information about access methods for each relation and algorithms for operators
For example:
– For σ operation
    Choose an index
    Use a tablescan
– For |X|
    Use nested loop, sort-merge or hash join
– For π
    Scan result of join, or combine with |X| when writing out result from join

Query Optimization
Canonical form (initial query tree - conceptual order of evaluation)
– Leaf nodes are tables
– Internal nodes are operations
– Begin by separating select conditions from joins (end up with X)
– Combine all selects, then all projects
Transform to final query tree using general transformation rules for relational algebra

Query tree for SQL query
    Select lname
    From employee, works_on, project
    Where pname='Aquarius' and pnumber=pno and essn=ssn and bdate > '1987-12-31'
Write as canonical query tree
Write as relational algebra expression
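
One way to write the canonical form of this query as a relational algebra expression, following the conceptual order (Cartesian product at the leaves, one combined selection above it, projection at the root):

```latex
\pi_{\text{lname}}\Bigl(
  \sigma_{\text{pname}=\text{'Aquarius'} \,\wedge\, \text{pnumber}=\text{pno}
          \,\wedge\, \text{essn}=\text{ssn} \,\wedge\, \text{bdate}>\text{'1987-12-31'}}
  \bigl(\text{employee} \times \text{works\_on} \times \text{project}\bigr)
\Bigr)
```

The canonical query tree has the same shape: the three tables as leaves, two X nodes above them, then the single σ, then the π.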

General Transformation Rules for Relational Algebra
1. Cascade of σ
2. Commutativity of σ
3. Cascade of π
4. Commuting σ with π
5. Commutativity of |X| (and X)
6. Commuting σ with |X| (or X)
7. Commuting π with |X| (or X)

General Transformation Rules for Relational Algebra
8. Commutativity of set operations ∪ and ∩
9. Associativity of |X|, X, ∪ and ∩
10. Commuting σ with set operations
11. The π operation commutes with ∪
12. Converting a (σ, X) sequence into |X|

Outline of a Heuristic Algebraic Optimization Algorithm
Use Rule 1 to break up conjunctive σ’s into cascades of σ’s
Use Rules 2, 4, 6, 10 for commutativity of σ to:
– move σ’s as far down tree as possible
Use Rules 5 and 9 for commutativity and associativity of binary operations to:
– Place most restrictive σ (and |X|) so it is executed first
    most restrictive = fewest tuples, smallest absolute size or smallest selectivity
    But make sure no cartesian products result

Outline of a Heuristic Algebraic Optimization Algorithm
Use Rule 12 to combine Cartesian product with σ to:
– create |X|
Use Rules 3, 4, 7, 11 concerning cascade of π’s and commuting π with other ops to:
– move π’s down tree as far as possible
Identify subtrees that represent groups of operations that can be executed by a single algorithm

Summary of Heuristics
Apply first the operations that reduce size of intermediate results
– Perform σ’s and π’s as early as possible (move down tree as far as possible)
– Execute most restrictive σ and |X| first (reorder leaf nodes but avoid cartesian products)
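
The effect of performing a selection early can be seen by counting intermediate rows. This is a minimal sketch on made-up data; the table contents and the function names `naive` and `pushed` are hypothetical.

```python
from itertools import product

def make_tables(n_emp=100, n_proj=10):
    # Hypothetical data: 10 projects, 100 works_on rows spread evenly over them.
    project = [{"pnumber": p, "pname": f"P{p}"} for p in range(n_proj)]
    works_on = [{"essn": e, "pno": e % n_proj} for e in range(n_emp)]
    return project, works_on

def naive(project, works_on, name):
    # Conceptual order: Cartesian product first, then the whole where clause.
    inter = list(product(project, works_on))
    out = [(p, w) for p, w in inter
           if p["pname"] == name and p["pnumber"] == w["pno"]]
    return out, len(inter)

def pushed(project, works_on, name):
    # Heuristic order: select on project first, then join on pnumber = pno.
    sel = [p for p in project if p["pname"] == name]
    out = [(p, w) for p in sel for w in works_on if p["pnumber"] == w["pno"]]
    return out, len(sel)

project, works_on = make_tables()
rows_naive, size_naive = naive(project, works_on, "P3")
rows_pushed, size_pushed = pushed(project, works_on, "P3")
print(size_naive, size_pushed, rows_naive == rows_pushed)
```

Both orders return the same rows; only the size of the intermediate result that feeds the next operation differs.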

Multiple table joins
– Query plan identifies most efficient ordering of joins
– uses dynamic programming

Order of joins - Oracle
Much more complicated
– when determining order of joins, keep track of resulting sort order (interesting order)
Dynamic programming considers the order of the result
– can avoid a redundant sort operation later and/or speed up a subsequent join
Can flatten nested joins
– dynamic programming can escalate optimization time, so use rules instead
Estimating cost is difficult
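
A rough sketch of dynamic programming over join orders follows. It is a simplification under stated assumptions, not how Oracle or any particular optimizer does it: only left-deep orders are considered, cost is taken to be the sum of estimated intermediate result sizes, interesting orders are ignored, and the cardinalities and selectivities in the example are invented.

```python
from itertools import combinations

def best_join_order(card, sel):
    """card: {table: row count}; sel: {frozenset({a, b}): join selectivity}.

    Returns (cost, estimated rows, join order) for joining all tables,
    or None if they cannot be joined without a Cartesian product.
    """
    names = sorted(card)
    # best[S] = cheapest (cost, rows, order) found for the set of tables S
    best = {frozenset([n]): (0.0, float(card[n]), (n,)) for n in names}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            s = frozenset(subset)
            for r in subset:  # r is the table joined last
                rest = s - {r}
                if rest not in best:
                    continue
                # Join predicates linking r to tables already joined
                preds = [sel[frozenset({r, o})] for o in rest
                         if frozenset({r, o}) in sel]
                if not preds:
                    continue  # would be a Cartesian product: skip
                cost, rows, order = best[rest]
                out_rows = rows * card[r]
                for p in preds:
                    out_rows *= p
                total = cost + out_rows
                if s not in best or total < best[s][0]:
                    best[s] = (total, out_rows, order + (r,))
    return best.get(frozenset(names))

# Invented statistics: project already reduced to 1 row by a selection.
card = {"employee": 10000, "works_on": 50000, "project": 1}
sel = {
    frozenset({"employee", "works_on"}): 1 / 10000,
    frozenset({"works_on", "project"}): 1 / 100,
}
print(best_join_order(card, sel))
```

With these numbers the search joins works_on with the filtered project first and employee last, because that keeps the intermediate result small.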

Joins
– May not have to materialize actual table resulting from join
– Instead use pipelining - successive rows output from one step are fed into the next step of the plan

Converting trees into Query plans
pipelined evaluation
– Forward result from an operation directly to next operation
    Result from σ placed in buffer
    |X| consumes tuples from buffer
    Result from |X| pipelined to π
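
Pipelining maps naturally onto Python generators, where each operator pulls rows from the one below it. The sketch below is an illustration only: the operator names, the toy tables and the choice of a hash join are assumptions, and the hash table built on the inner input is the one thing that is held in memory.

```python
def select(rows, pred):
    # σ: pass on only the rows satisfying the predicate, one at a time
    for r in rows:
        if pred(r):
            yield r

def hash_join(outer, inner, outer_key, inner_key):
    # |X|: build a hash table on the inner input, then stream the outer input
    table = {}
    for r in inner:
        table.setdefault(r[inner_key], []).append(r)
    for o in outer:
        for i in table.get(o[outer_key], []):
            yield {**o, **i}

def project_cols(rows, cols):
    # π: keep only the requested columns
    for r in rows:
        yield tuple(r[c] for c in cols)

# Hypothetical toy data
project = [{"pname": "Aquarius", "pnumber": 1}, {"pname": "Orion", "pnumber": 2}]
works_on = [{"essn": 10, "pno": 1}, {"essn": 20, "pno": 2}, {"essn": 30, "pno": 1}]

def run_plan():
    selected = select(project, lambda r: r["pname"] == "Aquarius")
    joined = hash_join(selected, works_on, "pnumber", "pno")
    return list(project_cols(joined, ["essn"]))

print(run_plan())
```

No intermediate table is written out: a row leaves the selection, is joined, and is projected before the next row is read.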

Query Tree Question
Should we do a π pname, pnumber, then σ pname = 'Aquarius', then π pnumber?
No, since the operations are done together
– the processor would read a row of project, see if pname = 'Aquarius', then use pnumber to perform the join.

Algorithms
DBMS has general access algorithms to implement select, join, or combinations of ops
Don't have separate access routines for each op
– Creating temporary files is inefficient
– Generate algorithms for combinations of operations
– join, select, project - code is created dynamically to implement multiple operations

Materialized table
Think about which operations require a materialized table as input
– Input to select?
– Input to project?
– Input to join?

Identify Execution Strategies and Suitable Query Plans

Cost
Optimizers combine:
– Heuristic rules for ordering ops
– Systematic estimates of the cost of different execution strategies - choose lowest cost
E.g. nested loop or hash join?
CPU cost usually similar for the Query Plans

Cost of Query Plans
Cost components for query execution
– Computation cost: in-memory ops on data buffers, e.g. sorting, searching, implementation/order of operations
    CPU cost usually similar for the Query Plans
– Memory usage cost: number of memory buffers needed
– Access cost to secondary storage: cost of search for hashing, indexes, R/W; contiguous versus scattered storage on disk
– Storage cost if intermediate tables are stored
– Communication cost: cost to ship query and results from DB site to request site

Cost cont’d
For small DB's, minimize computation cost
For large DB's, minimize access cost to secondary storage, e.g. block transfers between disk and main memory
For distributed DB's, minimize communication cost
Workload
– Mix of queries and frequencies of queries
– Given workload and query execution plans, can determine CPU and I/O resource needs

Statistics
– Maintain statistics about tables: # rows, # columns, domains
    SYSCOLUMNS: col_name, table_name, # of values, High, Low
– Statistics on columns that deviate strongly from the uniform assumption
– Selectivity of values
– Execute special command to gather info
    RUNSTATS (DB2)
    ANALYZE (Oracle)

Cost function information
– Number of tuples or records (r)
– Record size (R)
– Number of blocks (b)
– Blocking factor: records per block (bfr)
– Number of distinct values (d)
– Selectivity (sl)
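
These quantities are related: given a block size, bfr follows from R, and b follows from r and bfr. A small sketch, assuming unspanned fixed-length records; the block size parameter and the example numbers are not from the slides.

```python
import math

def blocking_factor(block_size, record_size):
    # bfr: whole records that fit in one block (unspanned records assumed)
    return block_size // record_size

def num_blocks(r, bfr):
    # b: blocks needed to hold r records at bfr records per block
    return math.ceil(r / bfr)

# Assumed example: 4096-byte blocks, R = 100 bytes, r = 10,000 records
bfr = blocking_factor(4096, 100)
b = num_blocks(10_000, bfr)
print(bfr, b)
```

A full table scan then costs roughly b block transfers, which is why b rather than r drives the access cost to secondary storage.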

Selectivity sl, a.k.a. Filter Factor FF
Fraction of rows with the specified value(s) for a specific attribute, i.e. the fraction of rows that results from the predicate restriction
How many tuples satisfy the predicate
Hopefully only need to access those tuples + index

Selectivity sl, Filter Factor FF
sl = (# records satisfying condition c) / (total # of records in relation)
Estimate for attribute with i distinct values:
– Assume |R| is # rows in table R
    FF = sl = (|R|/i) / |R| = 1/i = 1/col_cardinality
    e.g. |R| = 10,000 and i = 2: FF = sl = (10,000/2)/10,000 = 1/2

Examples of sl
If SQL statement specified:
– col = const
    DB2 assumes sl = 1/col_cardinality
– col between const1 and const2
    DB2 assumes sl = (const2 - const1)/(High - Low)
For some predicates, sl is not predictable by a simple formula
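
The two formulas are easy to turn into estimator functions. This is a sketch of the formulas as stated on the slide, not DB2's actual code; the function names are made up.

```python
def sl_equals(col_cardinality):
    # col = const: uniform assumption, 1 / number of distinct values
    return 1.0 / col_cardinality

def sl_between(const1, const2, low, high):
    # col between const1 and const2: fraction of the [Low, High] range covered
    return (const2 - const1) / (high - low)

def estimated_rows(total_rows, sl):
    # Expected number of rows passing the predicate
    return total_rows * sl

print(estimated_rows(10_000, sl_equals(2)))
print(sl_between(20, 30, 0, 100))
```

The first call reproduces the 10,000-row, two-value example: half the rows are expected to qualify.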

Assumptions
– Uniform distribution of column values
– Attribute values independent
    Independent distribution of values from any 2 columns C1 and C2: combined sl = sl(C1) * sl(C2)
    e.g. 1/2 (gender) * 1/4 (class) = 1/8 of undergrads are female freshmen
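
Under both assumptions the combined selectivity is just a product, which can be checked against a table in which every combination of values occurs equally often. The data below is synthetic and the names are hypothetical.

```python
from itertools import product
from math import prod

def sl_conjunction(*selectivities):
    # Independence assumption: selectivity of C1 AND C2 is the product
    return prod(selectivities)

# Synthetic table where gender and class are uniform and independent
genders = ["F", "M"]
classes = ["freshman", "sophomore", "junior", "senior"]
students = [{"gender": g, "class": c}
            for g, c in product(genders, classes) for _ in range(100)]

matching = [s for s in students
            if s["gender"] == "F" and s["class"] == "freshman"]
observed = len(matching) / len(students)
print(sl_conjunction(1 / 2, 1 / 4), observed)
```

On real data the columns are often correlated, in which case the product can be far from the observed fraction; that is why statistics are kept on columns that deviate from these assumptions.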

Cost of a Join
How to determine join selectivity (js)
– js = |R |X|c S| / |R X S| = |R |X|c S| / (|R| * |S|)
If no join condition, js=? js=1
If no matching tuples, js=? js=0

Cost of a Join
Assuming R.A=S.B is join condition
– if A is a key of R, js=? js ≤ 1/|R|
– if, in addition, B is a foreign key referencing A and NOT NULL, then js=? js = 1/|R|
– if B is a key of S, js=? js ≤ 1/|S|
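
The key and foreign key cases can be checked by brute force on small tables. The tables below are invented for the purpose; `join_selectivity` simply counts matching pairs.

```python
def join_selectivity(R, S, a, b):
    # js = |R join S on R.a = S.b| / (|R| * |S|)
    matches = sum(1 for r in R for s in S if r[a] == s[b])
    return matches / (len(R) * len(S))

# A is a key of R: four distinct values
R = [{"A": i} for i in range(4)]

# B is a NOT NULL foreign key referencing A: every S row matches exactly one R row
S_fk = [{"B": i % 4} for i in range(12)]

# B may be NULL: some S rows match nothing (None never equals an int here)
S_nullable = S_fk + [{"B": None} for _ in range(4)]

print(join_selectivity(R, S_fk, "A", "B"))        # equals 1/|R|
print(join_selectivity(R, S_nullable, "A", "B"))  # below 1/|R|
```

Multiplying js by |R| * |S| gives the estimated join size, so in the foreign key case the join is expected to produce exactly |S| rows.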

Cost of different implementations of join:

You Can Retrieve Query Plan
    Explain plan set queryno=1000 for
      select * from customers where city = 'Boston'
    Select * from plan_table where queryno=1000;
Gives access type (index or not), columns, etc.
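
The same idea can be tried without a DB2 installation. The sketch below swaps in SQLite, whose `EXPLAIN QUERY PLAN` statement plays a similar role; it is not the DB2 syntax above, and the table definition and index name `idx_city` are made up for the example.

```python
import sqlite3

def plan_for(conn, sql):
    # Each plan row ends with a human-readable description of one plan step
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the customers table
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_city ON customers(city)")

details = plan_for(conn, "SELECT * FROM customers WHERE city = 'Boston'")
for step in details:
    print(step)
```

The printed step names the access path chosen for customers; dropping the index and rerunning shows the plan falling back to a scan of the table.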

Information for Optimization
When using indexes
– Cluster Ratio
    how well clustering property holds for rows with respect to a given index
    if cluster ratio is 80% or more, use sequential prefetch
