1 Overview of Query Evaluation Chapter 12. 2 Outline  Query Optimization Overview  Algorithm for Relational Operations.

1 Overview of Query Evaluation Chapter 12

2 Outline  Query Optimization Overview  Algorithm for Relational Operations

3 Overview of Query Evaluation  DBMS keeps descriptive data in system catalogs.  SQL queries are translated into an extended form of relational algebra:  Query Plan Reasoning : Tree of operators with choice of one among several algorithms for each operator

Query Plan Evaluation Reserves Sailors sid=sid bid=100 rating > 5 sname  Query Plan Execution: Each operator typically implemented using a `pull’ interface when an operator is `pulled’ for next output tuples, it `pulls’ on its inputs and computes them.

5 Overview of Query Evaluation  Query Plan Optimization :  Ideally: Want to find best plan. Practically: Avoid worst plans!  Two main issues in query optimization:  For a given query, what plans are considered? Algorithm to search plan space for cheapest (estimated) plan.  How is the cost of a plan estimated? Cost models based on I/O estimates

6 System Catalogs  For each index:  structure (e.g., B+ tree) and search key fields  For each relation:  name, file name, file structure (e.g., Heap file)  attribute name and type, for each attribute  index name, for each index  integrity constraints  For each view:  view name and definition  Plus statistics, authorization, buffer pool size, etc. * Catalogs are themselves stored as relations !

7 Statistics and Catalogs  Need : Information about relations and indexes.  Catalogs typically contain at least:  # tuples (NTuples) # pages (NPages) for each relation.  # distinct key values (NKeys) and NPages for each index.  Index height, low/high key values (Low/High) for each tree index.  Catalogs updated periodically.  Updating whenever many data changes occurred;  Lots of approximation anyway, so slight inconsistency ok.

8 How Catalogs are stored Attr_Cat(attr_name, rel_name, type, position) System Catalog is itself a collection of tables. Catalog tables describe all tables in database, including catalog tables themselves. The code retrieves the catalog table must be handled specially.

9 Query Processing  Common Techniques for Query Processing Algorithms:  Indexing: Can use WHERE conditions to retrieve small set of tuples from large relation  Iteration: Examine all tuples in an input table, one after the other (like in sorting algorithm).  Partitioning: By using sorting or hashing, we partition input tuples and replace expensive operation by similar operations on smaller inputs. Watch for these techniques as we discuss query evaluation!

10 Access Paths  An access path  A method of retrieving tuples from a table  Method:  File scan,  Index that matches a selection (in the query)  Note :  Contributes significantly to cost of relational operator.

11 Matching an Access Path  A tree index matches (a conjunction of) terms that involve only attributes in a prefix of search key.  Example : Given tree index on  selection a=5 AND b=3 ?  selection a=5 AND b>6 ?  selection b=3 ?

12 Matching an Access Path  A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in search key of index.  Example : Given hash index on  selection a=5 AND b=3 and c =6 ?  selection c = 5 AND b = 6 ?  selection a = 5 ?  selection a > 5 AND b=3 and c =5 ?

13 Access Path  Most selective access path: An index or file scan that we estimate will require fewest page I/Os.

14 Query Evaluation of Selection

15 Selection  Example :  R.attr OP value (R)  Case 1: No Index, NOT sorted on R.attr  Must scan the entire relation. Most selective access path = file scan Cost: M

16 Selection  Example :  R.attr OP value (R)  Case 2: No Index, Sorted Data on R.attr  Binary search for first tuple. Scan R for all satisfied tuples. Cost: O(log 2 M)

17 Selection Using B+ tree index  Example :  R.attr OP value (R)  Case 3: B+ tree Index on R.attr  Costs :  cost I (finding qualifying data entries), and  cost II (retrieving records)  Cost I: 2-3 I/Os. (depth of B+ tree)  Cost II:  clustered index 1 I/O and few tuples that match  unclustered index up to one I/O per qualifying tuple.

18 Example : Using B+ Index for Selections  Example :  Assume uniform distribution of names,  Then say about ~ 10% of tuples qualify  Assume R is on 100 pages and has 10,000 tuples.  Clustered index: 2 IOs + little more than 10 I/Os  Unclustered index : 2 IOs + up to 1,000 I/Os SELECT * FROM Reserves R WHERE R.rname < ‘D%’

19 Selection --- B+ Index  Refinement for unclustered indexes : 1. Find qualifying data entries. 2. Sort rids of data records to be retrieved. 3. Fetch rids in order. Avoid retrieving the same page multiple times. However, # of such pages likely to be still higher than with clustering.  Use of unclustered index for a range selection could be expensive.  Often better (and simpler) to instead scan data file.

20 Selection – Hash Index  Hash index is good for equality selection.  Costs:  Cost I (retrieve index bucket page)  + Cost II (retrieving qualifying tuples from R)  Cost I is one I/O  Cost II could up to one I/O per satisfying tuple.

21 General Condition : Conjunction  A condition with several predicates combined by conjunction (AND):  Example : day<8/9/94 AND bid=5 AND sid=3.

22 General Selections (Conjunction) First approach: (utilizing single index)  Find the most selective access path, retrieve tuples using it.  To reduce the number of tuples retrieved  Apply any remaining terms that don’t match the index:  To discard some retrieved tuples  This does not affect number of tuples/pages fetched.  Example : Consider day<8/9/94 AND bid=5 AND sid=3.  A B+ tree index on day can be used; then bid=5 and sid=3 must be checked for each retrieved tuple.  Hash index on could be used day<8/9/94 must then be checked on fly.

23 General Selections Second approach (utilizing multiple indices)  Assuming 2 or more matching indexes that use Alternatives (2) or (3) for data entries.  Get sets of rids of data records using each matching index.  Then intersect these sets of rids  Retrieve records and apply any remaining terms.  Example : Consider day<8/9/94 AND bid=5 AND sid=3.  A B+ tree index I on day and an index II on sid, both Alternative (2).  - Retrieve rids of records satisfying day<8/9/94 using index I,  - Retrieve rids of recs satisfying sid=3 using Index II  - Intersect rids  - Retrieve records and check bid=5.

24 General Condition : Disjunction  Disjunction condition: one or more terms (R.attr op value) connected by OR (  ).  Example : ( day<8/9/94) OR (bid=5 AND sid=3)

25 General Selection (Disjunction)  Case 1: Index is not available for one of terms.  Then :  Need a file scan anyways.  Thus check other conditions in this file scan.  E.g., Consider day<8/9/94 OR rname ='Joe'  No index on day. Need a File scan.  Even index is available in rname does not help.

26 General Selection (Disjunction)  Case 2: Every term has a matching index.  Retrieve candidate tuples using respective index.  Then Union the results  Example : consider day<8/9/94 OR rname ='Joe'  Assume two B+ tree indexes on day and rname.  Retrieve tuples satisfying day < 8/9/94  Retrieve tuples satisfying rname = 'Joe'  Union the retrieved tuples.

27 Query Evaluation of Projection

28 Algorithms for Projection SELECT R.sid, R.bid FROM Reserves R

29 Algorithms for Projection  The expensive part is removing duplicates.  SQL systems don’t remove duplicates unless keyword DISTINCT is specified in query.  Sorting Approach:  Sort on and remove duplicates. (Can optimize this by dropping unwanted information while sorting.)  Hashing Approach:  Hash on to create partitions. Load partitions into memory one at a time, build in-memory hash structure, and eliminate duplicates.  Indexing Approach :  If there is an index with both R.sid and R.bid in the search key, may be cheaper than sorting data entries SELECT DISTINCT R.sid, R.bid FROM Reserves R

30 Summary  There are several alternative evaluation algorithms for each relational operator.

31 Conclusion Not one method wins. Optimizer must assess situation to select best possible candidate

1 Overview of Query Evaluation Chapter 12. 2 Outline  Query Optimization Overview  Algorithm for Relational Operations.

Similar presentations

Presentation on theme: "1 Overview of Query Evaluation Chapter 12. 2 Outline  Query Optimization Overview  Algorithm for Relational Operations."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Overview of Query Evaluation Chapter 12. 2 Outline  Query Optimization Overview  Algorithm for Relational Operations.

Similar presentations

Presentation on theme: "1 Overview of Query Evaluation Chapter 12. 2 Outline  Query Optimization Overview  Algorithm for Relational Operations."— Presentation transcript:

Similar presentations

About project

Feedback