CS4432: Database Systems II, Lecture #16. Query Processing: Estimating Sizes of Results. Professor Elke A. Rundensteiner.


1 CS4432: Database Systems II, Lecture #16. Query Processing: Estimating Sizes of Results. Professor Elke A. Rundensteiner

2 Estimating cost of query plan
(1) Estimating size of results
(2) Estimating # of IOs

3 Estimating result statistics
Keep statistics for relation R:
– T(R): # of tuples in R
– S(R): # of bytes in each R tuple
– B(R): # of blocks to hold all R tuples
– V(R, A): # of distinct values in R for attribute A
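A minimal sketch of how these four statistics could be gathered for a small in-memory relation. The helper name `relation_stats`, the 4096-byte block size, and the fixed 100-byte tuple size are assumptions made for illustration; they do not come from the lecture.

```python
import math

def relation_stats(rows, tuple_bytes, block_bytes=4096):
    """Compute T, S, B and V for a relation given as a list of dicts.

    Assumptions (not from the slides): tuples have a fixed size of
    tuple_bytes and are never split across blocks.
    """
    t = len(rows)                                  # T(R): number of tuples
    tuples_per_block = block_bytes // tuple_bytes
    b = math.ceil(t / tuples_per_block)            # B(R): blocks needed
    attrs = rows[0].keys() if rows else []
    # V(R, A): number of distinct values per attribute
    v = {a: len({row[a] for row in rows}) for a in attrs}
    return {"T": t, "S": tuple_bytes, "B": b, "V": v}

# Hypothetical three-tuple relation
rows = [
    {"A": "cat", "B": 1},
    {"A": "cat", "B": 1},
    {"A": "dog", "B": 1},
]
stats = relation_stats(rows, tuple_bytes=100)
print(stats)
```

A real system would normally sample or maintain these values incrementally rather than scan the whole relation.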

4 Note: for complex expressions, need intermediate T, S, V results.
E.g. W = [σ_{A=a}(R1)] ⋈ R2
Treat the selection as relation U:
T(U) = T(R1)/V(R1,A)
S(U) = S(R1)
Also need V(U, *)!?

5 To estimate Vs
E.g., U = σ_{A=a}(R1)
Say R1 has attributes A, B, C, D
V(U, A) = ?
V(U, B) = ?
V(U, C) = ?
V(U, D) = ?

6 Example
R1:
A    B  C   D
cat  1  10  ?
cat  1  20  ?
dog  1  30  10
dog  1  40  30
bat  1  50  10
V(R1,A) = 3, V(R1,B) = 1, V(R1,C) = 5, V(R1,D) = 3
U = σ_{A=dog}(R1)
V(U,A) = 1
V(U,B) = 1
V(U,C) = T(R1)/V(R1,A)
V(U,D) ... somewhere in between

7 Possible Guess
U = σ_{A=a}(R)
V(U,A) = 1
V(U,B) = V(R,B)
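The selection estimates above can be sketched as a small function. It assumes the values of the selection attribute are uniformly distributed, and it applies the "possible guess" to every other attribute. The name `estimate_selection` and the dictionary layout are hypothetical; the input numbers are the R1 statistics from the example slide.

```python
def estimate_selection(t_r, v_r, attr):
    """Estimate statistics of U = sigma_{attr = const}(R).

    t_r: T(R); v_r: dict mapping attribute name -> V(R, attribute).
    Assumes a uniform distribution over the values of `attr`.
    """
    t_u = t_r / v_r[attr]   # T(U) = T(R) / V(R, A)
    v_u = dict(v_r)         # guess: V(U, B) = V(R, B) for the other attributes
    v_u[attr] = 1           # the selected attribute has exactly one value
    return t_u, v_u

# R1 from the example: T(R1) = 5, V(R1,A)=3, V(R1,B)=1, V(R1,C)=5, V(R1,D)=3
t_u, v_u = estimate_selection(5, {"A": 3, "B": 1, "C": 5, "D": 3}, "A")
print(t_u, v_u)
```

Note that this guess can produce a V(U, B) larger than T(U), which is one reason the example slide suggests other estimates for C and D.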

8 For Joins
U = R1(A,B) ⋈ R2(A,C)
What are the different V(U,*) values?
V(U,A) = min { V(R1,A), V(R2,A) }
V(U,B) = V(R1,B)
V(U,C) = V(R2,C)
Property: “preservation of value sets”

9 Example: Z = R1(A,B) ⋈ R2(B,C) ⋈ R3(C,D)
R1: T(R1) = 1000, V(R1,A) = 50, V(R1,B) = 100
R2: T(R2) = 2000, V(R2,B) = 200, V(R2,C) = 300
R3: T(R3) = 3000, V(R3,C) = 90, V(R3,D) = 500

10 Partial Result: U = R1 ⋈ R2
T(U) = (1000 × 2000) / 200
V(U,A) = 50
V(U,B) = 100
V(U,C) = 300

11 Z = U ⋈ R3
T(Z) = (1000 × 2000 × 3000) / (200 × 300)
V(Z,A) = 50
V(Z,B) = 100
V(Z,C) = 90
V(Z,D) = 500
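The three-way join example can be checked with a short sketch. It assumes the join size is estimated as T(R) × T(S) / max{V(R,Y), V(S,Y)} for the shared attribute Y; the slides show only the resulting numbers (denominators 200 and 300), so the max rule is an assumption that happens to match them. The function name `estimate_join` is hypothetical.

```python
def estimate_join(t1, v1, t2, v2, attr):
    """Estimate statistics for a natural join on a single shared attribute.

    t1, t2: tuple counts; v1, v2: dicts of attribute -> distinct-value count.
    Assumed size rule: T = t1 * t2 / max(V1[attr], V2[attr]).
    """
    t = t1 * t2 / max(v1[attr], v2[attr])
    v = {**v1, **v2}                        # non-join attributes keep their V
    v[attr] = min(v1[attr], v2[attr])       # join attribute: min of the two
    return t, v

# U = R1 join R2 on B
t_u, v_u = estimate_join(1000, {"A": 50, "B": 100},
                         2000, {"B": 200, "C": 300}, "B")
# Z = U join R3 on C
t_z, v_z = estimate_join(t_u, v_u, 3000, {"C": 90, "D": 500}, "C")
print(t_u, v_u)
print(t_z, v_z)
```

Under these assumptions T(U) evaluates to 10,000 and T(Z) to 100,000, and the V values agree with the ones listed for U and Z.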

12 More on Estimation
Assuming a uniform distribution is often inaccurate, since real data is rarely uniformly distributed.
Histogram:
– a data structure maintained by a DBMS to approximate a data distribution.
– divide the range of column values into subranges (buckets).
Assume the distribution within each histogram bucket is uniform.
[Figure: histograms of the number of tuples in R with A value in a given range, for A values 0 to 14]
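A minimal sketch of how a bucketed histogram could be used to estimate the result size of a range selection. The bucket boundaries and counts below are hypothetical (they are not read off the slide's figure), and the function name `estimate_range` is made up for illustration.

```python
def estimate_range(buckets, lo, hi):
    """Estimate the number of tuples with lo <= A <= hi (integer-valued A).

    buckets: list of (low, high, count) with inclusive integer bounds.
    Assumes values are uniformly distributed inside each bucket.
    """
    total = 0.0
    for b_lo, b_hi, count in buckets:
        # number of integer values shared by the query range and the bucket
        overlap = min(hi, b_hi) - max(lo, b_lo) + 1
        if overlap > 0:
            total += count * overlap / (b_hi - b_lo + 1)
    return total

# Hypothetical histogram: three buckets of width 5 over A values 0..14
buckets = [(0, 4, 10), (5, 9, 25), (10, 14, 10)]
est = estimate_range(buckets, 3, 6)

# Same query under a single uniform assumption over the whole range
uniform = sum(c for _, _, c in buckets) * 4 / 15
print(est, uniform)
```

With these made-up counts the histogram estimate is 14 tuples, while a single uniform assumption over all 15 values gives 12, which shows how buckets capture skew that the global average hides.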

13 Summary on Estimation of “Sizes”
Estimating size of results is an “art”
Don’t forget: statistics must be kept up to date... (cost?)

14 Summary
Estimating cost of query plan
– Estimating size of results: done!
– Estimating # of IOs: to do ...
Generate and compare plans

