Access Path Selection in a Relational Database Management System (summarized in section 2)


1 Access Path Selection in a Relational Database Management System (summarized in section 2)

2 Processing an SQL statement
parsing, optimization, code generation, execution
an SQL statement may have many query blocks (nesting)

3 Optimizer
validates the parsed query
collects statistics on referenced relations and columns
discovers available access paths for each relation
checks for type errors in expressions
Access path selection:
–determines the order of evaluation of query blocks
–creates a tree of alternative path choices for each query block with more than one relation
–chooses the minimum-cost access path from the tree
the optimizer's results are passed to the code generation and execution components

4 RSS (Research Storage System)
storage manager for System R
maintains physical storage, access paths, locking, logging, and recovery
relations are stored as collections of tuples
tuples are stored on 4K pages; pages are organized into segments
a segment completely contains one or more relations
tuples are accessed via a scan: sequential scan or index scan
indexes are B-trees with linked leaves
a sequential scan touches every page of the segment that contains the relation exactly once
an index scan touches each leaf page of the index once, and relation pages one or more times
if the index and the data tuples are in the same order, the data is “clustered”
a scan may take a set of predicates to apply to a tuple before returning it
–predicates are of the form (column op value)
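The idea of a scan that applies (column op value) predicates before returning a tuple can be sketched in a few lines of Python. This is only an illustration, not the RSS interface: the names `scan` and `OPS`, and the use of dictionaries as tuples, are assumptions made for the example.

```python
import operator

# Comparison operators a predicate may use (illustrative set, an assumption).
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def scan(tuples, predicates=()):
    """Yield each tuple (a dict) that satisfies every (column, op, value) predicate.

    Filtering happens inside the scan, so rejected tuples are never
    handed back to the caller.
    """
    for t in tuples:
        if all(OPS[op](t[col], val) for col, op, val in predicates):
            yield t

# Hypothetical EMP relation used only for this example.
emp = [
    {"name": "smith", "dno": 10, "sal": 50},
    {"name": "jones", "dno": 10, "sal": 200},
    {"name": "doe", "dno": 20, "sal": 300},
]
matches = list(scan(emp, [("dno", "=", 10), ("sal", ">", 100)]))
```

Pushing the predicates into the scan is what lets the cost formulas count only the tuples that actually cross the storage interface.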

5 Cost computation
cost = page fetches + W * (RSI calls)
–that is, cost = I/O cost + W * CPU cost, where W is an adjustable weight between I/O and CPU
an index that matches a boolean factor of the query is an efficient access path

6 Statistics
NCARD(T): cardinality of the relation T
TCARD(T): number of pages used for T
P(T): fraction of pages in a segment used for T
ICARD(I): number of distinct keys in index I
NINDX(I): number of pages in index I

7 Selectivity
column = value: F = 1/ICARD(column) if there is an index on the column; F = 1/10 otherwise
column1 = column2: F = 1/MAX(ICARD(column1), ICARD(column2)) if both columns have indexes; F = 1/ICARD(column i) if only column i has an index; F = 1/10 otherwise
column > value: F = (high key value - value) / (high key value - low key value)
column BETWEEN value1 AND value2: F = (value2 - value1) / (high key value - low key value)
column IN (list of values): F = (number of items in list) * (selectivity for column = value), capped at 1/2
columnA IN subquery: F = (cardinality of subquery) / (Π cardinalities of the subquery's relations)
(pred1) OR (pred2): F = F(pred1) + F(pred2) - F(pred1) * F(pred2)
(pred1) AND (pred2): F = F(pred1) * F(pred2)
NOT pred: F = 1 - F(pred)
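The selectivity rules on this slide translate almost directly into small functions. The sketch below is a minimal illustration under a few assumptions: the function names are invented for the example, a missing index is represented by `None`, and the range formulas assume numeric columns whose high and low key values are known.

```python
def eq_value(icard=None):
    """column = value: 1/ICARD if an index exists, else the default 1/10."""
    return 1.0 / icard if icard else 0.1

def eq_columns(icard1=None, icard2=None):
    """column1 = column2, depending on which columns have indexes."""
    if icard1 and icard2:
        return 1.0 / max(icard1, icard2)
    if icard1 or icard2:
        return 1.0 / (icard1 or icard2)
    return 0.1

def greater_than(value, low_key, high_key):
    """column > value, by linear interpolation over the key range."""
    return (high_key - value) / (high_key - low_key)

def between(value1, value2, low_key, high_key):
    """column BETWEEN value1 AND value2."""
    return (value2 - value1) / (high_key - low_key)

def in_list(n_items, f_eq):
    """column IN (list): n * F(column = value), capped at 1/2."""
    return min(n_items * f_eq, 0.5)

def f_or(f1, f2):
    return f1 + f2 - f1 * f2

def f_and(f1, f2):
    return f1 * f2

def f_not(f):
    return 1.0 - f

# Example: DNO = 50 with an index holding 50 distinct keys,
# combined with SAL > 75 on a column whose keys range over 0..100.
f = f_and(eq_value(50), greater_than(75, 0, 100))
```

The AND rule multiplies selectivities, which amounts to assuming the two predicates are independent of each other.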

8 QCARD
QCARD, the expected cardinality of the query result, is (Π cardinalities of all relations) * (Π F(pred i))
RSICARD, the expected number of calls to the RSI, is (Π cardinalities of all relations) * (Π F(sargable pred i))
An “interesting order” is an order specified by the GROUP BY or ORDER BY clause
Single relation cost: the cheapest access path that produces the “interesting order”, or the cheapest access path overall plus the cost of sorting the result
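Both estimates on this slide are products, so they can be computed with `math.prod`. The sketch below uses made-up cardinalities and selectivities; the function names are assumptions introduced for the example.

```python
from math import prod

def qcard(cardinalities, pred_selectivities):
    """Expected result size: product of relation cardinalities
    times the product of the selectivities of all predicates."""
    return prod(cardinalities) * prod(pred_selectivities)

def rsicard(cardinalities, sargable_selectivities):
    """Expected number of RSI calls: same product, but only the
    sargable predicates reduce the number of tuples returned."""
    return prod(cardinalities) * prod(sargable_selectivities)

# Hypothetical query over two relations with 1000 and 200 tuples.
# Two predicates in total (F = 0.01 and 0.1); only the first is sargable.
result_size = qcard([1000, 200], [0.01, 0.1])
rsi_calls = rsicard([1000, 200], [0.01])
```

RSICARD is never smaller than QCARD here, because the sargable predicates are a subset of all predicates and every selectivity is at most 1.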

9 Cost Table (p. 515)
cost = index pages fetched + data pages fetched + W * (RSI tuple retrieval calls)
unique index matching an equal predicate: 1 + 1 + W
clustered index I matching one or more boolean factors: F(preds) * (NINDX(I) + TCARD) + W * RSICARD
etc…
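The two cost-table rows shown on this slide can be written as functions of the statistics from slide 6. This is a sketch covering only those two rows; the function names and the sample numbers are assumptions for illustration.

```python
def unique_index_eq_cost(w):
    """Unique index matching an equal predicate:
    one index page + one data page + W for the single RSI call."""
    return 1 + 1 + w

def clustered_index_cost(f_preds, nindx, tcard, rsicard, w):
    """Clustered index matching one or more boolean factors:
    the matching fraction of index and data pages, plus W per RSI call."""
    return f_preds * (nindx + tcard) + w * rsicard

# Hypothetical relation: 1000 data pages, 50 index pages,
# predicates with combined selectivity 0.1, 200 expected RSI calls, W = 0.5.
cost = clustered_index_cost(0.1, nindx=50, tcard=1000, rsicard=200, w=0.5)
```

With these numbers the I/O term is 0.1 * 1050 = 105 page fetches and the CPU term is 0.5 * 200 = 100, for a total of about 205.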

10 Joins
two methods: nested loops and merging scans
merging scans require sorts on the join column, which is another “interesting order”
n-way joins can be done as a succession of 2-way joins, not necessarily using the same technique
results may be pipelined if a sort is not required
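The two join methods named on this slide can be illustrated on in-memory lists. The sketch below is a simplified equi-join over Python tuples, not the System R implementation; the function names and the positional key arguments are assumptions for the example.

```python
def nested_loops_join(outer, inner, outer_key, inner_key):
    """For every outer tuple, scan the whole inner relation for matches."""
    result = []
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                result.append(o + i)
    return result

def merge_scan_join(outer, inner, outer_key, inner_key):
    """Sort both inputs on the join column, then advance through them in step."""
    outer = sorted(outer, key=lambda t: t[outer_key])
    inner = sorted(inner, key=lambda t: t[inner_key])
    result = []
    start = 0  # first inner position whose key may still match
    for o in outer:
        # Skip inner tuples whose key is smaller than the current outer key.
        while start < len(inner) and inner[start][inner_key] < o[outer_key]:
            start += 1
        # Emit every inner tuple in the group with an equal key.
        j = start
        while j < len(inner) and inner[j][inner_key] == o[outer_key]:
            result.append(o + inner[j])
            j += 1
    return result

# Hypothetical EMP(id, name, dno) and DEPT(dno, dname) relations.
emp = [(1, "smith", 10), (2, "jones", 20), (3, "doe", 10)]
dept = [(10, "sales"), (20, "research"), (30, "legal")]
a = nested_loops_join(emp, dept, 2, 0)
b = merge_scan_join(emp, dept, 2, 0)
```

Both functions return the same set of joined tuples. The merge version pays for two sorts up front, which is why an input that already arrives in join-column order counts as an interesting order.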

11 Join ordering
n! permutations of relation join orders
joining the (k+1)-st relation to the previous k relations is independent of the order in which the first k were joined
avoid Cartesian products when possible; otherwise perform them as late as possible

12 Construct a tree
Construct a tree of possible join orderings, keeping the cheapest order that produces each interesting ordering.
First, find the best way to access each single relation, for each interesting ordering and for the unordered case.
Next, find the best way of joining any relation to each of these.
Repeat until all relations have been added to each branch.
Choose the cheapest strategy that has an interesting ordering, or the cheapest strategy plus a sort.
Total number of solutions to store: 2^n (one per subset of the n relations)
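The bottom-up search described here is a dynamic program over subsets of relations. The sketch below is a deliberately simplified version under several assumptions: it ignores interesting orders and keeps one plan per subset, it uses a made-up cost model (the sum of intermediate result cardinalities) instead of the page-fetch formulas, and the name `best_join_order` and its arguments are hypothetical.

```python
from itertools import combinations
from math import prod

def best_join_order(cards, join_f):
    """cards: relation name -> cardinality.
    join_f: frozenset({a, b}) -> selectivity of the join predicate between a and b.
    Returns (cost, join order, number of stored solutions)."""
    names = sorted(cards)
    # best[subset] = (cost so far, result cardinality, join order)
    best = {frozenset([n]): (0.0, float(cards[n]), (n,)) for n in names}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            candidates = []
            for last in combo:
                # Extend the best plan for the other relations by joining `last`.
                rest = subset - {last}
                cost, card, order = best[rest]
                preds = [join_f[frozenset((r, last))] for r in rest
                         if frozenset((r, last)) in join_f]
                out_card = card * cards[last] * prod(preds)
                # `not preds` flags a Cartesian product; those sort last.
                candidates.append((not preds, cost + out_card, out_card,
                                   order + (last,)))
            _, cost, card, order = min(candidates, key=lambda c: (c[0], c[1]))
            best[subset] = (cost, card, order)
    cost, _, order = best[frozenset(names)]
    return cost, order, len(best)

# Hypothetical relations A, B, C with join predicates A-B and B-C.
cards = {"A": 1000, "B": 100, "C": 10}
join_f = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, order, stored = best_join_order(cards, join_f)
```

Because the best plan for a subset does not depend on how that subset is later extended, each subset is solved once. The table therefore holds one entry per non-empty subset, 2^n - 1 in this sketch, rather than one per permutation.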
