Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.

Example: Consider relation schemas R(A,B) and S(B,C) and the expression below:  (A=1 OR A=3) AND B<C (R  S) 1.Splitting AND  A=1 OR A=3 (  B < C (R  S)) 2.Push  to S  A=1 OR A=3 (R   B < C (S)) 3.Push  to R  A=1 OR A=3 (R)   B < C (S)

Pushing selections Usually selections are pushed down the expression tree. The following example shows that it is sometimes useful to pull selection up in the tree. StarsIn(title,year,starName) Movie(title,year,length,studioName) CREATE VIEW MoviesOf1996 AS SELECT * FROM MOVIE WHERE year=1996; Query: Which stars worked for which studios in 1996? SELECT starName,studioName FROM MoviesOf1996 NATURAL JOIN StarsIN;

pull selection up then push down

Laws for (bag) Projection A simple law: Project out attributes that are not needed later. –I.e. keep only the input attr. and any join attribute.

Examples for pushing projection Schema R(a,b,c), S(c,d,e)

Example: Pushing Projection Schema : StarsIn(title,year,starName) Query: SELECT starName FROM StarsIn WHERE year = 1996; Should we transform to  ? Depends! Is StarsIn stored or computed ?  starName  year=1996 StarsIn  starName  year=1996 StarsIn  starName,year

Reasons for not pushing the projection If StarsIn is stored, the for the projection we have to scan the relation. If the relation is pipelined from some previous computation, then yes, we better do the projection (on the fly). Also, if for example there is an index on year for StarsIn, such index is useless in the projected relation  starName,year (StarsIn) –While such an index is very useful for the selection on “year=1996”

Laws for duplicate elimination and grouping Try to move  in a position where it can be eliminated altogether E.g. when  is applied on A stored relation with a declared primary key A relation that is the result of a  operation, since grouping creates a relation with no duplicates.  absorbs  Also: What’s M?

Improving logical query plans Push  as far down as possible (sometimes pull them up first). Do splitting of complex conditions in  in order to push  even further. Push  as far down as possible, introduce new early  (but take care for exceptions) Combine  with  to produce  - joins or equi-joins Choose an order for joins

Example of improvement SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE ‘%1960’;  title  starname=name AND birthdate LIKE ‘%1960’ StarsIn  MovieStar  title StarsIn MovieStar starName=name  birthdate LIKE ‘%1960’

 title StarsIn MovieStar starName=name  birthdate LIKE ‘%1960’  name And a better plan introducing a projection to filter out useless attributes:

Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. What’s the cost? The number of I/O’s to needed to manage the intermediate relations. This number will be a function of the size of intermediate relations, –i.e. number of their tuples times the number of bytes per tuple How can we estimate the number of tuples in an intermediate relation? Rules about estimation formulas: 1.Give (somehow) accurate estimates 2.Easy to compute

Projection Projection  retains duplicates, so the number of tuples in the result is the same as in the input. Result tuples are usually shorter than the input tuples. The size of a projection is the only one we can compute exactly.

Selection Let S =  A=c (R) We can estimate T(S) = T(R) / V(R,A) Let S =  A<c (R) On average, T(S) would be T(R)/2, but more properly: T(R)/3 Let S =  A  c (R), Then, an estimate is: T(S) = T(R)*[(V(R,A)-1)/V(R,A)], or simply T(S) = T(R)

Selection... Let S =  C AND D (R) =  C (  D (R)) and U =  D (R). First estimate T(U) and then use this to estimate T(S). Example S =  a=10 AND b<20 (R) T(R) = 10,000, V(R,a) = 50 T(S) = (1/50)* (1/3) * T(R) = 67 Note: Watch for selections like:  a=10 AND a>20 (R)

Selection... Let S =  C OR D (R). Simple estimate: T(S) = T(  C (R)) + T(  D (R)). Problem: It is possible that T(S)  T(R)! A more accurate estimate Let: –T(R)=n, –m 1 = size of selection on C, and –m 2 = size of selection on D. Then T(S) = n(1-(1-m 1 /n)(1-m 2 /n)) Why? Example: S =  a=10 OR b<20 (R). T(R) = 10,000, V(R,a) =50 Simple estimation: T(S) = 3533 More accurate: T(S) = 3466

Natural Join R(X,Y)  S(Y,Z) Anything could happen! No tuples join: T(R  S) = 0 Y is the key in S and a foreign key in R (i.e., R.Y refers to S.Y): Then, T(R  S) = T(R) All tuples join: i.e. R.Y=S.Y = a. Then, T(R  S) = T(R)*T(S)

Two Assumptions Containment of value sets If V(R,Y) ≤ V(S,Y), then every Y-value in R is assumed to occur as a Y-value in S When such thing can happen? For example when: Y is foreign key in R, and key in S Preservation of set values If A is an attribute of R but not S, then is assumed that V(R  S, A)=V(R, A) This may be violated when there are dangling tuples in R There is no violation when: Y is foreign key in R, and key in S

Natural Join size estimation Let, R(X,Y) and S(Y,Z), where Y is a single attribute. What’s the size of T(R  S)? Let r be a tuple in R and s be a tuple in S. What’s the probability that r and s join? Suppose V(R,Y)  V(S,Y) By the containment of set values we infer that: Every Y’s value in R appears in S. So, the tuple r of R surely is going match with some tuples of S, but what’s the probability it matches with s? It’s 1/V(S,Y). Hence, T(R  S) = T(R)*T(S)/V(S,Y) –When V(R,Y)  V(S,Y) By a similar reasoning, for the case when V(S,Y)  V(R,Y), we get T(R  S) = T(R)*T(S)/V(S,Y). So, sumarizing we have as an estimate: T(R  S) = T(R)*T(S)/max{V(R,Y),V(S,Y)}

Remember: T(R  S) = T(R)*T(S)/max{V(R,Y),V(S,Y)} Example: R(a,b), T(R)=1000, V(R,b)=20 S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100 U(c,d), T(U)=5000, V(U,c)=500 Estimate the size of R  S  U T(R  S) = 40,000, T((R  S)  U) = 400,000 T(S  U) = 20,000, T(R  (S  U)) = 400,000 The equality of results is not a coincidence. Note 1: estimate of final result should not depend on the evaluation order Note 2: intermediate results could be of different sizes

Natural join with multiple join attrib. R(x,y 1,y 2 )  S(y 1,y 2,z) T(R  S) = T(R)*T(S)/m 1 *m 2, where m 1 = max{V(R,y 1 ),V(S,y 1 )} m 2 = max{V(R,y 2 ),V(S,y 2 )} Why? Let r be a tuple in R and s be a tuple in S. What’s the probability that r and s agree on y 1 ? From the previous reasoning, it’s 1/max{V(R,y 1 ),V(S,y 1 )} Similarly, what’s the probability that r and s agree on y 2 ? It’s 1/max{V(R,y 2 ),V(S,y 2 )} Assuming that aggrements on y1 and y2 are independent we estimate: T(R  S) = T(R)*T(S)/[max{V(R,y 1 ),V(S,y 1 )} * max{V(R,y 2 ),V(S,y 2 )}] Example: T(R)=1000, V(R,b)=20, V(R,c)=100 T(S)=2000, V(S,d)=50, V(S,e)=50 R(a,b,c)  R.b=S.d AND R.c=S.e S(d,e,f) T(R  S) = (1000*2000)/(50*100)=400

Another example: (one of the previous) R(a,b), T(R)=1000, V(R,b)=20 S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100 U(c,d), T(U)=5000, V(U,c)=500 Estimate the size of R  S  U Observe that R  S  U = (R  U)  S T(R  U) = 1000*5000 = 5,000,000 Note that the number of b’s in the product is 20 (V(R,b)), and the number of c’s is 500 (V(U,c)). T((R  U)  S) = 5,000,000 * 2000 / (50 * 500) = 400,000

Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.

Similar presentations

Presentation on theme: "Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.

Similar presentations

Presentation on theme: "Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R."— Presentation transcript:

Similar presentations

About project

Feedback