# CS4432: Database Systems II, Logical Plan Rewriting

## Presentation transcript


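Slide 2: Query processing

1. parse: SQL query → parse tree
2. convert: parse tree → logical query plan
3. apply laws: logical query plan → “improved” l.q.p.
4. estimate result sizes (using statistics): “improved” l.q.p. → l.q.p. + sizes
5. consider physical plans: l.q.p. + sizes → {P1, P2, …}
6. estimate costs: {P1, P2, …} → {(P1,C1), (P2,C2), …}
7. pick best: {(P1,C1), (P2,C2), …} → Pi
8. execute: Pi → answer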

Slide 3: Query in SQL → Query Plan in Algebra (logical) → Other Query Plan in Algebra (logical)

Slide 4: Query plan 1 (in relational algebra)

$$\pi_{B,D}\Big[\sigma_{R.A=\text{"c"} \,\land\, S.E=2 \,\land\, R.C=S.C}(R \times S)\Big]$$

Slide 5: Query plan 2 (in relational algebra)

$$\pi_{B,D}\Big[\sigma_{R.A=\text{"c"}}(R) \bowtie \sigma_{S.E=2}(S)\Big]$$

Here $\bowtie$ is the natural join on R.C = S.C.
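The two plans can be compared on a toy instance. The sketch below is only an illustration: the schemas R(A, B, C) and S(C, D, E) and all sample tuples are assumptions, and relations are modeled as lists of dicts rather than anything a real engine would use.

```python
from itertools import product

# Hypothetical sample data; the schemas R(A, B, C) and S(C, D, E) are assumed.
R = [
    {"A": "c", "B": 1, "C": 10},
    {"A": "c", "B": 2, "C": 20},
    {"A": "x", "B": 3, "C": 10},
]
S = [
    {"C": 10, "D": "p", "E": 2},
    {"C": 20, "D": "q", "E": 5},
    {"C": 10, "D": "r", "E": 2},
]

def plan1(R, S):
    """Select over the full cross product R x S, then project B, D."""
    out = []
    for r, s in product(R, S):
        if r["A"] == "c" and s["E"] == 2 and r["C"] == s["C"]:
            out.append((r["B"], s["D"]))
    return out

def plan2(R, S):
    """Select on each input first, join on C, then project B, D."""
    r_sel = [r for r in R if r["A"] == "c"]
    s_sel = [s for s in S if s["E"] == 2]
    return [(r["B"], s["D"]) for r in r_sel for s in s_sel if r["C"] == s["C"]]

result1 = plan1(R, S)
result2 = plan2(R, S)
```

Both plans return the same pairs, but plan 1 examines all 3 × 3 = 9 tuple pairs, while plan 2 joins only the 2 × 2 = 4 pairs that survive the early selections.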

Slide 6: Relational algebra optimization

- What are transformation rules? They preserve equivalence.
- What are good transformations? They reduce query execution costs.

Slide 7: Rules: Natural join rewriting

- $R \bowtie S = S \bowtie R$
- $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$

Can also write as trees. [Figure: join trees over R, S, and T]

Slide 8: Rules: Other binary operators?

- $R \bowtie S = S \bowtie R$
- $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$

What about: cross product? Condition join? Union? Intersection? Difference?

Slide 9: Note: [Figure: join trees over R, S, and T]

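Slide 10: Rules: Natural joins & cross products & union

- $R \bowtie S = S \bowtie R$
- $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
- $R \times S = S \times R$
- $(R \times S) \times T = R \times (S \times T)$
- $R \cup S = S \cup R$
- $R \cup (S \cup T) = (R \cup S) \cup T$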
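Commutativity and associativity of the natural join can be checked on small inputs. This is a minimal sketch, assuming hypothetical relations R(A, B), S(B, C), and T(C, D) modeled as lists of dicts; because each tuple carries its attribute names, column order does not affect the comparison.

```python
def natural_join(R, S):
    """Natural join of two lists of dict tuples on their shared attribute names."""
    out = []
    for r in R:
        for s in S:
            if all(r[k] == s[k] for k in r.keys() & s.keys()):
                out.append({**r, **s})
    return out

def canon(rel):
    """Order-independent form of a bag of tuples, used only for comparison."""
    return sorted(tuple(sorted(t.items())) for t in rel)

# Hypothetical sample relations.
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5}, {"B": 2, "C": 6}, {"B": 4, "C": 5}]
T = [{"C": 5, "D": 7}, {"C": 6, "D": 8}]

commutes = canon(natural_join(R, S)) == canon(natural_join(S, R))
left_deep = canon(natural_join(natural_join(R, S), T))
right_deep = canon(natural_join(R, natural_join(S, T)))
```

A check on one instance does not prove the laws; it only shows what they claim.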

Slide 11: Rules: Selects

- $\sigma_{p_1 \land p_2}(R) = \sigma_{p_1}[\sigma_{p_2}(R)]$
- $\sigma_{p_1 \lor p_2}(R) = [\sigma_{p_1}(R)] \cup [\sigma_{p_2}(R)]$

Slide 12: Bags vs. Sets

R = {a,a,b,b,b,c}, S = {b,b,c,c,d}

What about union, R ∪ S = ?

- Option 1 (SUM): R ∪ S = {a,a,b,b,b,b,b,c,c,c,d}
- Option 2 (MAX): R ∪ S = {a,a,b,b,b,c,c,d}
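The two union options map directly onto Python's `collections.Counter`, where `+` adds multiplicities and `|` takes their maximum. A minimal sketch using the bags from the example above (the helper `as_bag` is introduced here only for display):

```python
from collections import Counter

R = Counter("aabbbc")   # {a,a,b,b,b,c}
S = Counter("bbccd")    # {b,b,c,c,d}

sum_union = R + S       # Option 1: multiplicities add
max_union = R | S       # Option 2: multiplicities take the maximum

def as_bag(counter):
    """Expand a Counter into a sorted string of its elements."""
    return "".join(sorted(counter.elements()))
```

Here `as_bag(sum_union)` gives `aabbbbbcccd` and `as_bag(max_union)` gives `aabbbccd`, matching the two options.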

Slide 13: Which option makes this rule work? Let us try MAX():

$$\sigma_{p_1 \lor p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$$

Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c.

- $\sigma_{p_1 \lor p_2}(R)$ = {a,a,b,b,b,c}
- $\sigma_{p_1}(R)$ = {a,a,b,b,b}
- $\sigma_{p_2}(R)$ = {b,b,b,c}
- $\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,c}

Slide 14: Which option makes this rule work? What about SUM()?

$$\sigma_{p_1 \lor p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$$

Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c.

- $\sigma_{p_1 \lor p_2}(R)$ = {a,a,b,b,b,c}
- $\sigma_{p_1}(R)$ = {a,a,b,b,b}
- $\sigma_{p_2}(R)$ = {b,b,b,c}
- $\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,b,b,b,c}

Slide 15: Which option makes this rule work?

$$\sigma_{p_1 \land p_2}(R) = \sigma_{p_1}[\sigma_{p_2}(R)]$$

Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c.

What about MAX versus SUM?

Slide 16: Option 2 (MAX) makes this rule work:

$$\sigma_{p_1 \lor p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$$

Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c.

- $\sigma_{p_1 \lor p_2}(R)$ = {a,a,b,b,b,c}
- $\sigma_{p_1}(R)$ = {a,a,b,b,b}
- $\sigma_{p_2}(R)$ = {b,b,b,c}
- $\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,c}
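The same example can be replayed mechanically. The sketch below models bags as `collections.Counter` objects and uses a hypothetical `select` helper; it checks the disjunctive rule under both union options, and also the conjunctive cascade, which involves no union at all.

```python
from collections import Counter

R = Counter("aabbbc")
p1 = lambda t: t in "ab"   # P1 satisfied by a, b
p2 = lambda t: t in "bc"   # P2 satisfied by b, c

def select(pred, bag):
    """Bag selection: keep every copy of each element that satisfies pred."""
    return Counter({t: n for t, n in bag.items() if pred(t)})

lhs = select(lambda t: p1(t) or p2(t), R)
max_rhs = select(p1, R) | select(p2, R)    # MAX union
sum_rhs = select(p1, R) + select(p2, R)    # SUM union

# The conjunctive rule never forms a union, so the union option is irrelevant.
cascade_lhs = select(lambda t: p1(t) and p2(t), R)
cascade_rhs = select(p1, select(p2, R))
```

Only `max_rhs` equals `lhs`; `sum_rhs` counts each b six times because b satisfies both predicates.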

Slide 17: Yet another example!

Senators(…), Reps(…)

$T_1 = \pi_{yr,state}(\text{Senators})$; $T_2 = \pi_{yr,state}(\text{Reps})$

| T1.Yr | T1.State | T2.Yr | T2.State |
|---|---|---|---|
| 97 | CA | 99 | CA |
| 99 | CA | 99 | CA |
| 98 | AZ | 98 | CA |

Union? The “Sum” option makes more sense!

Slide 18: Executive decision

- Use the “SUM” option for bag unions.
- CAREFUL! Some rules cannot be used for bags.

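Slide 19: Rules: Project

Let X = set of attributes, Y = set of attributes, XY = X ∪ Y.

$$\pi_{xy}(R) = \pi_{x}[\pi_{y}(R)]$$

Note that the right-hand side is only defined when X ⊆ Y, and it then reduces to $\pi_{x}(R)$, so the equation as written holds only when X = Y. The form that holds in general is $\pi_{x}(R) = \pi_{x}[\pi_{xy}(R)]$.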
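Cascaded projections are easy to probe on a toy relation. This is a minimal sketch assuming bag semantics (no duplicate elimination), a hypothetical relation R(A, B, C), and a hypothetical `project` helper. It shows that an outer projection onto X absorbs an inner projection onto a superset of X, and that the result differs from the wider projection onto XY.

```python
def project(attrs, rel):
    """Bag projection: keep only `attrs` from every tuple, duplicates included."""
    return [{k: t[k] for k in t if k in attrs} for t in rel]

# Hypothetical relation R(A, B, C).
R = [
    {"A": 1, "B": "u", "C": True},
    {"A": 1, "B": "v", "C": False},
    {"A": 2, "B": "u", "C": True},
]
X = {"A"}
Y = {"A", "B"}   # X is a subset of Y, so XY = Y

direct = project(X, R)                  # pi_X(R)
cascaded = project(X, project(Y, R))    # pi_X[pi_Y(R)]
wide = project(X | Y, R)                # pi_XY(R)
```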

Slide 20: Rules: σ + ⋈ combined

Let p = predicate with only R attributes, q = predicate with only S attributes, m = predicate with both R and S attributes.

- $\sigma_{p}(R \bowtie S) = [\sigma_{p}(R)] \bowtie S$
- $\sigma_{q}(R \bowtie S) = R \bowtie [\sigma_{q}(S)]$

Slide 21: Rules: σ + ⋈ combined

$$\sigma_{p \land q}(R \bowtie S) = \;?$$

Rule can be derived!

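Slide 22: Derivation for rule:

$$\begin{aligned}
\sigma_{p \land q}(R \bowtie S) &= \sigma_{p}[\sigma_{q}(R \bowtie S)] \\
&= \sigma_{p}[R \bowtie \sigma_{q}(S)] \\
&= [\sigma_{p}(R)] \bowtie [\sigma_{q}(S)]
\end{aligned}$$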

Slide 23: Rules: σ + ⋈ combined (continued)

More rules can be derived:

- $\sigma_{p \land q}(R \bowtie S) = \;?$
- $\sigma_{p \land q \land m}(R \bowtie S) = \;?$
- $\sigma_{p \lor q}(R \bowtie S) = \;?$

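Slide 24: We did one, do the others on your own:

- $\sigma_{p \land q}(R \bowtie S) = [\sigma_{p}(R)] \bowtie [\sigma_{q}(S)]$
- $\sigma_{p \land q \land m}(R \bowtie S) = \sigma_{m}[(\sigma_{p} R) \bowtie (\sigma_{q} S)]$
- $\sigma_{p \lor q}(R \bowtie S) = [(\sigma_{p} R) \bowtie S] \cup [R \bowtie (\sigma_{q} S)]$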
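The three derived rules can be sanity-checked on a small instance. The sketch below is an illustration under assumptions: the schemas R(A, B, C) and S(C, D, E), the sample tuples, and the concrete predicates p, q, and m are all hypothetical, and results are compared as sets.

```python
def natural_join(R, S):
    """Natural join of two lists of dict tuples on their shared attribute names."""
    out = []
    for r in R:
        for s in S:
            if all(r[k] == s[k] for k in r.keys() & s.keys()):
                out.append({**r, **s})
    return out

def select(pred, rel):
    return [t for t in rel if pred(t)]

def as_set(rel):
    """Set view of a relation, used to compare results without duplicates."""
    return {tuple(sorted(t.items())) for t in rel}

R = [{"A": "c", "B": 1, "C": 10}, {"A": "c", "B": 5, "C": 20}, {"A": "x", "B": 3, "C": 10}]
S = [{"C": 10, "D": "p", "E": 2}, {"C": 20, "D": "q", "E": 7}, {"C": 10, "D": "r", "E": 4}]

p = lambda t: t["A"] == "c"      # uses only R attributes
q = lambda t: t["E"] == 2        # uses only S attributes
m = lambda t: t["B"] < t["E"]    # uses attributes of both

RS = natural_join(R, S)
pushed = natural_join(select(p, R), select(q, S))

and_ok = as_set(select(lambda t: p(t) and q(t), RS)) == as_set(pushed)
and_m_ok = as_set(select(lambda t: p(t) and q(t) and m(t), RS)) == as_set(select(m, pushed))

or_lhs = select(lambda t: p(t) or q(t), RS)
or_rhs_parts = natural_join(select(p, R), S) + natural_join(R, select(q, S))
or_ok = as_set(or_lhs) == as_set(or_rhs_parts)
```

The disjunctive rule is one that needs care with bags: under the SUM union, `or_rhs_parts` holds 5 tuples while `or_lhs` holds 4, because a tuple satisfying both p and q is produced by both branches.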

Slide 25: Rules: π, σ combined

Let x = subset of R attributes, z = attributes in predicate P (a subset of R attributes).

$$\pi_{x}[\sigma_{p}(R)] = \pi_{x}\{\sigma_{p}[\pi_{xz}(R)]\}$$

Slide 26: Rules: π, ⋈ combined

Let x = subset of R attributes, y = subset of S attributes, z = intersection of R, S attributes.

$$\pi_{xy}(R \bowtie S) = \pi_{xy}\{[\pi_{xz}(R)] \bowtie [\pi_{yz}(S)]\}$$

Slide 27:

$$\pi_{xy}\{\sigma_{p}(R \bowtie S)\} = \pi_{xy}\{\sigma_{p}[\pi_{xz'}(R) \bowtie \pi_{yz'}(S)]\}$$

where z' = z ∪ {attributes used in P}.
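Pushing projections below a join, with and without a selection in between, can be sketched as follows. Everything concrete here is assumed for illustration: the schemas R(A, B, C, F) and S(C, D, E, G), the sample tuples, the predicate, and the helper names. The extra attributes F and G stand in for wide columns that the early projections discard.

```python
def natural_join(R, S):
    out = []
    for r in R:
        for s in S:
            if all(r[k] == s[k] for k in r.keys() & s.keys()):
                out.append({**r, **s})
    return out

def project(attrs, rel):
    """Bag projection; attributes a tuple does not have are ignored."""
    return [{k: t[k] for k in t if k in attrs} for t in rel]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def canon(rel):
    """Order-independent form of a bag of tuples, used only for comparison."""
    return sorted(tuple(sorted(t.items())) for t in rel)

R = [
    {"A": "c", "B": 1, "C": 10, "F": "wide-1"},
    {"A": "c", "B": 2, "C": 20, "F": "wide-2"},
    {"A": "x", "B": 3, "C": 10, "F": "wide-3"},
]
S = [
    {"C": 10, "D": "p", "E": 2, "G": "wide-4"},
    {"C": 20, "D": "q", "E": 5, "G": "wide-5"},
    {"C": 10, "D": "r", "E": 2, "G": "wide-6"},
]

x, y, z = {"B"}, {"D"}, {"C"}                 # z = shared join attribute
p = lambda t: t["A"] == "c" and t["E"] == 2   # predicate uses A and E
z_prime = z | {"A", "E"}                      # z' = z plus attributes used in p

no_sel_lhs = project(x | y, natural_join(R, S))
no_sel_rhs = project(x | y, natural_join(project(x | z, R), project(y | z, S)))

sel_lhs = project(x | y, select(p, natural_join(R, S)))
sel_rhs = project(
    x | y,
    select(p, natural_join(project(x | z_prime, R), project(y | z_prime, S))),
)
```

Dropping A or E too early would make the selection impossible to evaluate, which is why z' must include the predicate's attributes.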

Slide 28: Rules: σ combined with ∪ and −

- $\sigma_{p}(R \cup S) = \sigma_{p}(R) \cup \sigma_{p}(S)$
- $\sigma_{p}(R - S) = \sigma_{p}(R) - S = \sigma_{p}(R) - \sigma_{p}(S)$
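These rules also hold for bags with the SUM union. A minimal check using `collections.Counter`, where `+` is the SUM union and `-` is bag difference (counts that would drop to zero or below are removed); the bags and the predicate are hypothetical sample values.

```python
from collections import Counter

R = Counter("aabbbc")
S = Counter("bbccd")
p = lambda t: t in "ab"

def select(pred, bag):
    """Bag selection: keep every copy of each element that satisfies pred."""
    return Counter({t: n for t, n in bag.items() if pred(t)})

union_lhs = select(p, R + S)
union_rhs = select(p, R) + select(p, S)

diff_lhs = select(p, R - S)
diff_mid = select(p, R) - S
diff_rhs = select(p, R) - select(p, S)
```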

Slide 29: Which are “good” transformations?

Slide 30: Conventional wisdom: do projects early

Example: relation R(A,B,C,D,E), predicate P: (A=3) ∧ (B=“cat”)

$$\pi_{E}\{\sigma_{p}(R)\} \quad \text{vs.} \quad \pi_{E}\{\sigma_{p}\{\pi_{ABE}(R)\}\}$$

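Slide 31: What if we have A, B indexes?

Look up B = “cat” in the B index and A = 3 in the A index, then intersect the two pointer lists to get pointers to the matching tuples! But then it is better to do the projection later, since projecting first would bypass the indexes.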
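The pointer-intersection idea can be sketched with in-memory structures. This is only a model under assumptions: each index is a dict from attribute value to a set of row ids (standing in for tuple pointers), and the table contents are made up.

```python
from collections import defaultdict

# Hypothetical rows of R(A, B, C, D, E); the list position acts as the tuple pointer.
rows = [
    (3, "cat", 1, 1, "e0"),
    (3, "dog", 2, 2, "e1"),
    (4, "cat", 3, 3, "e2"),
    (3, "cat", 4, 4, "e3"),
]

def build_index(rows, col):
    """Map each value in column `col` to the set of row ids holding it."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[col]].add(rid)
    return index

index_a = build_index(rows, 0)
index_b = build_index(rows, 1)

# Intersect the pointer sets first; only then fetch tuples and project E.
matching = index_a.get(3, set()) & index_b.get("cat", set())
result = [rows[rid][4] for rid in sorted(matching)]
```

Only the rows whose ids survive the intersection are fetched, so no scan of R (and no early projection over all of R) is needed.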

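Slide 32: Which are “good” transformations?

- $\sigma_{p_1 \land p_2}(R) \rightarrow \sigma_{p_1}[\sigma_{p_2}(R)]$
- $\sigma_{p}(R \bowtie S) \rightarrow [\sigma_{p}(R)] \bowtie S$
- $R \bowtie S \rightarrow S \bowtie R$
- $\pi_{x}[\sigma_{p}(R)] \rightarrow \pi_{x}\{\sigma_{p}[\pi_{xz}(R)]\}$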

Slide 33: Bottom line

- Some heuristics: early selection is usually good.
- No transformation is always good.
- Rule application defines a search space: cost criteria are needed to make a decision.
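A toy illustration of "search space plus cost criteria": the join commutativity and associativity rules generate several equivalent join orders, and a cost estimate picks among them. Everything numeric below is hypothetical (the cardinalities, the match ratios, and the choice of intermediate result size as the cost), so this is a sketch of the idea rather than a real optimizer.

```python
from itertools import permutations

# Hypothetical cardinalities of three relations.
card = {"R": 1000, "S": 100, "T": 10}

# Hypothetical: one in `match_ratio[pair]` tuple pairs joins; 1 means a cross product.
match_ratio = {frozenset("RS"): 100, frozenset("ST"): 10, frozenset("RT"): 1}

def intermediate_size(order):
    """Estimated size of the first join in a left-deep plan ((X join Y) join Z)."""
    x, y, _ = order
    return card[x] * card[y] // match_ratio[frozenset((x, y))]

plans = list(permutations(card))   # the search space: 6 join orders
costs = {plan: intermediate_size(plan) for plan in plans}
best = min(costs, key=costs.get)
```

With these numbers, joining S and T first gives the smallest intermediate result (100 tuples), while starting with R and T yields a 10,000-tuple cross product.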

Slide 34: In textbook: more transformations

- Chapter 16.2, 16.3.3
- More rewrite rules
- Other operations, such as duplicate elimination, etc.