CS 245: Database System Principles
Notes 6: Query Processing
Hector Garcia-Molina



Slides 2-3: Query Processing
  Q → Query Plan
  Focus: relational systems
  Others?

Slide 4: Example
  SELECT B, D
  FROM R, S
  WHERE R.A = "c" ∧ S.E = 2 ∧ R.C = S.C

Slides 5-6: Example relations and answer
  R:  A  B  C         S:  C   D  E
      a  1  10            10  x  2
      b  1  20            20  y  2
      c  2  10            30  z  2
      d  2  35            40  x  1
      e  3  45            50  y  3

  Answer:  B  D
           2  x

Slide 7: How do we execute the query? One idea:
  - Do Cartesian product
  - Select tuples
  - Do projection

Slides 8-9: R × S
  R.A  R.B  R.C  S.C  S.D  S.E
  a    1    10   10   x    2
  a    1    10   20   y    2
  ...
  c    2    10   10   x    2     Bingo! Got one...
  ...

Slides 10-11: Relational algebra can be used to describe plans
  Ex: Plan I, as a tree (root first):
    Π_{B,D}
      σ_{R.A="c" ∧ S.E=2 ∧ R.C=S.C}
        ×
          R    S
  OR: Π_{B,D} [ σ_{R.A="c" ∧ S.E=2 ∧ R.C=S.C} (R × S) ]
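Plan I can be sketched directly in Python on the example relations. This is only an illustration of the product, select, project order; the function name `plan_one` and the tuple encoding are assumptions, not part of the slides.

```python
from itertools import product

# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def plan_one(r_rows, s_rows):
    """Plan I: Cartesian product, then selection, then projection on (B, D)."""
    result = []
    for (a, b, r_c), (s_c, d, e) in product(r_rows, s_rows):  # R x S
        if a == "c" and e == 2 and r_c == s_c:                # selection
            result.append((b, d))                             # projection
    return result

answer = plan_one(R, S)
print(answer)  # [(2, 'x')]
```

The loop touches all 25 pairs of R × S to produce a single answer tuple, which is what motivates the cheaper plans that follow.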

Slide 12: Another idea: Plan II, as a tree (root first):
    Π_{B,D}
      ⋈ (natural join)
        σ_{R.A="c"} (R)    σ_{S.E=2} (S)

Slide 13: Plan II on the example data
  R:  A  B  C      σ(R):  A  B  C      σ(S):  C   D  E      S:  C   D  E
      a  1  10            c  2  10            10  x  2          10  x  2
      b  1  20                                20  y  2          20  y  2
      c  2  10                                30  z  2          30  z  2
      d  2  35                                                  40  x  1
      e  3  45                                                  50  y  3
  The natural join then runs on σ(R) and σ(S).

Slides 14-15: Plan III: use R.A and S.C indexes
  (1) Use R.A index to select R tuples with R.A = "c"
  (2) For each R.C value found, use S.C index to find matching tuples
  (3) Eliminate S tuples with S.E ≠ 2
  (4) Join matching R, S tuples, project B, D attributes and place in result
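The four steps of Plan III can be sketched with in-memory dictionaries standing in for the R.A and S.C indexes. The helper `build_index` and the dictionary-based index are assumptions made for illustration; a real system would use B-trees or hash indexes on disk.

```python
from collections import defaultdict

# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def build_index(rows, col):
    """Hypothetical index: maps each value of column `col` to its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[col]].append(row)
    return index

r_a_index = build_index(R, 0)  # stands in for the R.A index
s_c_index = build_index(S, 0)  # stands in for the S.C index

def plan_three(r_index, s_index):
    result = []
    for (_, b, r_c) in r_index.get("c", []):    # (1) R tuples with R.A = "c"
        for (_, d, e) in s_index.get(r_c, []):  # (2) S tuples with S.C = R.C
            if e != 2:                          # (3) drop S tuples with S.E != 2
                continue
            result.append((b, d))               # (4) join and project B, D
    return result

answer = plan_three(r_a_index, s_c_index)
print(answer)  # [(2, 'x')]
```

Only one R tuple and one S tuple are fetched here, compared with the 25 pairs a full Cartesian product would examine.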

Slides 16-20: Plan III on the example data
  R(A, B, C) has index I1 on A; S(C, D, E) has index I2 on C.
  - Look up A = "c" in I1: finds the R tuple (c, 2, 10).
  - Look up C = 10 in I2: finds the S tuple (10, x, 2).
  - Check E = 2? Yes, so output (2, x).
  - Continue with the next tuple.

Slide 21: Overview of Query Optimization

Slide 22: Query optimization pipeline
  SQL query
    → parse → parse tree
    → convert → logical query plan
    → apply laws → "improved" l.q.p.
    → estimate result sizes (using statistics) → l.q.p. + sizes
    → consider physical plans → {P1, P2, ...}
    → estimate costs → {(P1,C1), (P2,C2), ...}
    → pick best → Pi
    → execute → answer

Slide 23: Example: SQL query
  SELECT title
  FROM StarsIn
  WHERE starName IN (
      SELECT name
      FROM MovieStar
      WHERE birthdate LIKE '%1960'
  );
  (Find the movies with stars born in 1960)

Slide 24: Example: Parse Tree
  [Figure: parse tree. Leaves, left to right:]
  SELECT title FROM StarsIn WHERE starName IN ( subquery )
  subquery: SELECT name FROM MovieStar WHERE birthDate LIKE '%1960'

Slide 25: Example: Generating Relational Algebra
    Π_title
      σ (two-argument)
        StarsIn
        condition: starName IN
          Π_name
            σ_{birthdate LIKE '%1960'}
              MovieStar
  Fig. 7.15: An expression using a two-argument σ, midway between a parse tree and relational algebra

Slide 26: Example: Logical Query Plan
    Π_title
      σ_{starName=name}
        ×
          StarsIn
          Π_name
            σ_{birthdate LIKE '%1960'}
              MovieStar
  Fig. 7.18: Applying the rule for IN conditions

Slide 27: Example: Improved Logical Query Plan
    Π_title
      ⋈_{starName=name}
        StarsIn
        Π_name
          σ_{birthdate LIKE '%1960'}
            MovieStar
  Fig. 7.20: An improvement on fig
  Question: Push project to StarsIn?

Slide 28: Example: Estimate Result Sizes
  Need the expected size of the intermediate results in the plan
  StarsIn ⋈ Π(σ(MovieStar)).

Slide 29: Example: One Physical Plan
    Hash join              (parameters: join order, memory size, project attributes, ...)
      SEQ scan of StarsIn
      index scan of MovieStar   (parameters: select condition, ...)

Slide 30: Example: Estimate costs
  L.Q.P. → plans P1, P2, ..., Pn with costs C1, C2, ..., Cn
  Pick best!

Slide 31: Textbook outline
  Chapter 15
  5     [Ch 5]   Algebra for queries [bags vs sets]
                 - Select, project, join, ... [project list a, a+b->x, ...]
                 - Duplicate elimination, grouping, sorting
  15.1  [15.1]   Physical operators
                 - Scan, sort, ...
        [ ]      Implementing operators + estimating their cost

Slide 32: Textbook outline (continued)
  Chapter 16
  16.1  [16.1]    Parsing
  16.2  [16.2]    Algebraic laws
  16.3  [16.3]    Parse tree -> logical query plan
  16.4  [16.4]    Estimating result sizes
        [16.5-7]  Cost based optimization

Slide 33: Reading textbook: Chapters 15, 16
  Optional:
  - Sections 15.7, 15.8, 15.9 [15.7, 15.8]
  - Sections 16.6, 16.7 [16.6, 16.7]
  Optional:
  - Duplicate elimination operator
  - Grouping, aggregation operators

Slide 34: Query Optimization: in-class order
  - Relational algebra level (A)
  - Detailed query plan level
    - Estimate costs (B): without indexes, with indexes
    - Generate and compare plans (C)

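Slide 35: Query optimization pipeline, annotated with the class order
  SQL query → parse → parse tree → convert → logical query plan → apply laws → "improved" l.q.p. → estimate result sizes (using statistics) → l.q.p. + sizes → consider physical plans → {P1, P2, ...} → estimate costs → {(P1,C1), (P2,C2), ...} → pick best → Pi → execute → answer
  Stages are marked (A) relational algebra level, (B) cost estimation, (C) generating and comparing plans.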

Slide 36: Relational algebra optimization
  - Transformation rules (preserve equivalence)
  - What are good transformations?

Slide 37: Rules: natural joins & cross products & union
  R ⋈ S = S ⋈ R
  (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

Slide 38: Note
  - Carry attribute names in results, so order is not important
  - Can also write as trees, e.g., the tree for (R ⋈ S) ⋈ T equals the tree for R ⋈ (S ⋈ T)

Slide 39: Rules: natural joins & cross products & union
  R ⋈ S = S ⋈ R
  (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
  R × S = S × R
  (R × S) × T = R × (S × T)
  R ∪ S = S ∪ R
  R ∪ (S ∪ T) = (R ∪ S) ∪ T

Slides 40-41: Rules: selects
  σ_{p1∧p2} (R) = σ_{p1} [ σ_{p2} (R) ]
  σ_{p1∨p2} (R) = [ σ_{p1} (R) ] ∪ [ σ_{p2} (R) ]

Slides 42-43: Bags vs. sets
  R = {a,a,b,b,b,c}
  S = {b,b,c,c,d}
  R ∪ S = ?
  - Option 1 (SUM): R ∪ S = {a,a,b,b,b,b,b,c,c,c,d}
  - Option 2 (MAX): R ∪ S = {a,a,b,b,b,c,c,d}

Slides 44-45: Option 2 (MAX) makes this rule work:
  σ_{p1∨p2} (R) = σ_{p1} (R) ∪ σ_{p2} (R)
  Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c
  σ_{p1∨p2} (R) = {a,a,b,b,b,c}
  σ_{p1} (R) = {a,a,b,b,b}
  σ_{p2} (R) = {b,b,b,c}
  σ_{p1} (R) ∪ σ_{p2} (R) = {a,a,b,b,b,c}
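The two bag-union options map onto Python's `collections.Counter`, where `+` adds multiplicities (SUM) and `|` takes the larger multiplicity (MAX). The sketch below replays the R ∪ S example and then checks the select-splitting rule under both options; the helper `select` and the predicates `p1`, `p2` are illustrative names, not from the slides.

```python
from collections import Counter

R = Counter("aabbbc")  # bag {a,a,b,b,b,c}
S = Counter("bbccd")   # bag {b,b,c,c,d}

union_sum = "".join(sorted((R + S).elements()))  # Option 1 (SUM)
union_max = "".join(sorted((R | S).elements()))  # Option 2 (MAX)
print(union_sum)  # aabbbbbcccd
print(union_max)  # aabbbccd

def select(bag, predicate):
    """Bag selection: keep each qualifying value with its multiplicity."""
    return Counter({v: n for v, n in bag.items() if predicate(v)})

def p1(v):
    return v in "ab"   # P1 satisfied by a, b

def p2(v):
    return v in "bc"   # P2 satisfied by b, c

lhs = select(R, lambda v: p1(v) or p2(v))
rule_holds_max = lhs == (select(R, p1) | select(R, p2))
rule_holds_sum = lhs == (select(R, p1) + select(R, p2))
print(rule_holds_max, rule_holds_sum)  # True False
```

Under SUM the value b is counted six times on the right-hand side, which is why the rule fails for that option.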

Slide 46: "Sum" option makes more sense:
  Senators (......)    Reps (......)
  T1 = Π_{yr,state} Senators;  T2 = Π_{yr,state} Reps
  T1:  Yr  State      T2:  Yr  State
       14  CA              16  CA
       16  CA              16  CA
       15  AZ              15  CA
  Union?

Slide 47: Executive decision
  -> Use "SUM" option for bag unions
  -> Some rules cannot be used for bags

Slides 48-50: Rules: project
  Let: X = set of attributes
       Y = set of attributes
       XY = X ∪ Y
  Π_{xy} (R) = Π_x [ Π_y (R) ] ?
  This candidate rule does not hold in general: Π_x [ Π_y (R) ] is defined only when X ⊆ Y, and then it equals Π_x (R), not Π_{xy} (R).

Slides 51-52: Rules: σ + ⋈ combined
  Let p = predicate with only R attribs
      q = predicate with only S attribs
      m = predicate with R and S attribs
  σ_p (R ⋈ S) = [ σ_p (R) ] ⋈ S
  σ_q (R ⋈ S) = R ⋈ [ σ_q (S) ]

Slides 53-54: Rules: σ + ⋈ combined (continued)
  Some rules can be derived (do one, others for homework):
  σ_{p∧q} (R ⋈ S) = [ σ_p (R) ] ⋈ [ σ_q (S) ]
  σ_{p∧q∧m} (R ⋈ S) = σ_m [ (σ_p R) ⋈ (σ_q S) ]
  σ_{p∨q} (R ⋈ S) = [ (σ_p R) ⋈ S ] ∪ [ R ⋈ (σ_q S) ]

Slides 55-56: Derivation for the first one:
  σ_{p∧q} (R ⋈ S) = σ_p [ σ_q (R ⋈ S) ]
                  = σ_p [ R ⋈ σ_q (S) ]
                  = [ σ_p (R) ] ⋈ [ σ_q (S) ]
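The derived rule can be spot-checked on the R and S example relations used earlier in these notes. This is a sanity check on one data set, not a proof; `natural_join`, `select`, `p`, and `q` are hypothetical helpers written for this sketch.

```python
# Example relations as lists of dicts: R(A, B, C) and S(C, D, E).
R = [dict(A=a, B=b, C=c) for a, b, c in
     [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]]
S = [dict(C=c, D=d, E=e) for c, d, e in
     [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]]

def natural_join(left, right):
    """Join rows that agree on every attribute name they share."""
    out = []
    for x in left:
        for y in right:
            if all(x[k] == y[k] for k in x.keys() & y.keys()):
                out.append({**x, **y})
    return out

def select(rows, predicate):
    return [row for row in rows if predicate(row)]

def p(t):
    return t["A"] == "c"  # predicate with only R attributes

def q(t):
    return t["E"] == 2    # predicate with only S attributes

late = select(natural_join(R, S), lambda t: p(t) and q(t))  # select after join
early = natural_join(select(R, p), select(S, q))            # selects pushed down
print(late == early, early)
```

Pushing the selections down shrinks the join inputs from 5 and 5 rows to 1 and 3 rows while producing the same result.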

Slides 57-59: Rules: π, σ combined
  Let x = subset of R attributes
      z = attributes in predicate P (subset of R attributes)
  Π_x [ σ_p (R) ] = Π_x { σ_p [ Π_{xz} (R) ] }
  (The inner projection must keep xz, not just x, so that the attributes P needs are still available.)

Slides 60-61: Rules: π, ⋈ combined
  Let x = subset of R attributes
      y = subset of S attributes
      z = intersection of R, S attributes
  Π_{xy} (R ⋈ S) = Π_{xy} { [ Π_{xz} (R) ] ⋈ [ Π_{yz} (S) ] }

Slides 62-63: Rules: π, σ, ⋈ combined
  Π_{xy} { σ_p (R ⋈ S) } = Π_{xy} { σ_p [ Π_{xz'} (R) ⋈ Π_{yz'} (S) ] }
  z' = z ∪ { attributes used in P }

Slide 64: Rules for σ, π combined with × are similar...
  e.g., σ_p (R × S) = ?

Slide 65: Rules: σ, ∪ and σ, − combined
  σ_p (R ∪ S) = σ_p (R) ∪ σ_p (S)
  σ_p (R − S) = σ_p (R) − S = σ_p (R) − σ_p (S)

Slide 66: Which are "good" transformations?
  σ_{p1∧p2} (R) → σ_{p1} [ σ_{p2} (R) ]
  σ_p (R ⋈ S) → [ σ_p (R) ] ⋈ S
  R ⋈ S → S ⋈ R
  Π_x [ σ_p (R) ] → Π_x { σ_p [ Π_{xz} (R) ] }

Slide 67: Conventional wisdom: do projects early
  Example: R(A,B,C,D,E), x = {E}, P: (A=3) ∧ (B="cat")
  Π_x { σ_p (R) }  vs.  Π_E { σ_p { Π_{ABE} (R) } }

Slide 68: But: what if we have A, B indexes?
  Use the B = "cat" index and the A = 3 index, then intersect pointers to get pointers to matching tuples.

Slide 69: Bottom line
  - No transformation is always good
  - Usually good: early selections

Slide 70: In textbook: more transformations
  - Eliminate common sub-expressions
  - Other operations: duplicate elimination

Slide 71: Outline: Query Processing
  - Relational algebra level
    - transformations
    - good transformations
  - Detailed query plan level
    - estimate costs
    - generate and compare plans

Slide 72: Estimating cost of query plan
  (1) Estimating size of results
  (2) Estimating # of IOs

Slide 73: Estimating result size
  Keep statistics for relation R:
  - T(R): # tuples in R
  - S(R): # of bytes in each R tuple
  - B(R): # of blocks to hold all R tuples
  - V(R, A): # distinct values in R for attribute A

Slides 74-75: Example
  R:  A    B  C   D        A: 20 byte string
      cat  1  10  a        B: 4 byte integer
      cat  1  20  b        C: 8 byte date
      dog  1  30  a        D: 5 byte string
      dog  1  40  c
      bat  1  50  d
  T(R) = 5       S(R) = 37
  V(R,A) = 3     V(R,C) = 5
  V(R,B) = 1     V(R,D) = 4
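As a quick check, the statistics for the example relation can be recomputed from its rows. The variable names (`rows`, `field_bytes`, `V`) are chosen for this sketch only.

```python
# Example relation R(A, B, C, D) from the slide.
rows = [
    ("cat", 1, 10, "a"),
    ("cat", 1, 20, "b"),
    ("dog", 1, 30, "a"),
    ("dog", 1, 40, "c"),
    ("bat", 1, 50, "d"),
]
# Declared field widths in bytes, as given on the slide.
field_bytes = {"A": 20, "B": 4, "C": 8, "D": 5}

T = len(rows)                     # T(R): number of tuples
S = sum(field_bytes.values())     # S(R): bytes per tuple
V = {name: len({row[i] for row in rows})   # V(R, attr): distinct values
     for i, name in enumerate(field_bytes)}

print(T, S, V)  # 5 37 {'A': 3, 'B': 1, 'C': 5, 'D': 4}
```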

Slides 76-77: Size estimates for W = R1 × R2
  T(W) = T(R1) × T(R2)
  S(W) = S(R1) + S(R2)

Slide 78: Size estimate for W = σ_{A=a} (R)
  S(W) = S(R)
  T(W) = ?

Slides 79-81: Example
  R:  A    B  C   D        V(R,A) = 3
      cat  1  10  a        V(R,B) = 1
      cat  1  20  b        V(R,C) = 5
      dog  1  30  a        V(R,D) = 4
      dog  1  40  c
      bat  1  50  d
  W = σ_{Z=val} (R)
  What is the probability that a given tuple will be in the answer?
  T(W) = T(R) / V(R,Z)

Slide 82: Assumption: values in select expression Z = val are uniformly distributed over the possible V(R,Z) values.

Slide 83: Alternate assumption: values in select expression Z = val are uniformly distributed over a domain with DOM(R,Z) values.

Slides 84-86: Example, alternate assumption
  R:  A    B  C   D        V(R,A) = 3   DOM(R,A) = 10
      cat  1  10  a        V(R,B) = 1   DOM(R,B) = 10
      cat  1  20  b        V(R,C) = 5   DOM(R,C) = 10
      dog  1  30  a        V(R,D) = 4   DOM(R,D) = 10
      dog  1  40  c
      bat  1  50  d
  W = σ_{Z=val} (R)
  What is the probability that a given tuple will be in the answer?
  T(W) = T(R) / DOM(R,Z)

Slide 87: Selection cardinality
  SC(R,A) = average # records that satisfy an equality condition on R.A
  SC(R,A) = T(R) / V(R,A)      (values assumption)
  SC(R,A) = T(R) / DOM(R,A)    (domain assumption)
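Applied to the five-tuple example relation (T(R) = 5, DOM = 10 for every attribute), the two selection-cardinality estimates work out as below. `Fraction` is used only to keep the arithmetic exact; the dictionary names are assumptions of this sketch.

```python
from fractions import Fraction

T_R = 5                                 # T(R) for the example relation
V = {"A": 3, "B": 1, "C": 5, "D": 4}    # V(R, attr) from the slides
DOM = 10                                # DOM(R, attr), same for all attributes

# SC(R, attr) under the "uniform over existing values" assumption.
sc_by_values = {attr: Fraction(T_R, v) for attr, v in V.items()}
# SC(R, attr) under the "uniform over the domain" assumption.
sc_by_domain = Fraction(T_R, DOM)

for attr, sc in sc_by_values.items():
    print(attr, sc, sc_by_domain)
```

For attribute A, for instance, the values assumption predicts 5/3 matching tuples per lookup, while the domain assumption predicts 1/2.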

Slides 88-90: What about W = σ_{z ≥ val} (R)?
  T(W) = ?
  - Solution # 1: T(W) = T(R)/2
  - Solution # 2: T(W) = T(R)/3

Slides 91-92: Solution # 3: estimate values in range
  Example: R.Z has Min = 1, Max = 20, V(R,Z) = 10
  W = σ_{z ≥ 15} (R)
  f = (20 − 15 + 1) / (20 − 1 + 1) = 6/20   (fraction of range)
  T(W) = f × T(R)

Slide 93: Equivalently:
  f × V(R,Z) = number of distinct values in the range
  T(W) = [ f × V(R,Z) ] × T(R) / V(R,Z) = f × T(R)
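The range estimate can be written as a small function. The slide does not give T(R) for this example, so the value `T_R = 1000` below is purely hypothetical, as is the function name `range_fraction`.

```python
from fractions import Fraction

def range_fraction(val, lo, hi):
    """Fraction of the integer range [lo, hi] that satisfies z >= val."""
    return Fraction(hi - val + 1, hi - lo + 1)

f = range_fraction(15, 1, 20)   # Min = 1, Max = 20, condition z >= 15
T_R = 1000                      # hypothetical T(R); not given on the slide
V_RZ = 10                       # V(R,Z) from the slide

T_W = f * T_R
# Equivalent form: (distinct values in range) x (tuples per distinct value).
T_W_alt = (f * V_RZ) * Fraction(T_R, V_RZ)

print(f, T_W, T_W_alt)  # 3/10 300 300
```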

Slides 94-95: Size estimate for W = R1 ⋈ R2
  Let x = attributes of R1
      y = attributes of R2
  Case 1: X ∩ Y = ∅
    Same as R1 × R2

Slides 96-97: W = R1 ⋈ R2, Case 2: X ∩ Y = A
  R1(A, B, C)    R2(A, D)
  Assumption ("containment of value sets"):
    V(R1,A) ≤ V(R2,A) ⇒ every A value in R1 is in R2
    V(R2,A) ≤ V(R1,A) ⇒ every A value in R2 is in R1

Slides 98-99: Computing T(W) when V(R1,A) ≤ V(R2,A)
  R1(A, B, C)    R2(A, D)
  Take 1 tuple of R1 and match it against R2.
  1 tuple matches with T(R2) / V(R2,A) tuples...
  so T(W) = [ T(R2) / V(R2,A) ] × T(R1)

Slide 100: [A is the common attribute]
  V(R1,A) ≤ V(R2,A):  T(W) = T(R2) T(R1) / V(R2,A)
  V(R2,A) ≤ V(R1,A):  T(W) = T(R2) T(R1) / V(R1,A)

Slide 101: In general, for W = R1 ⋈ R2:
  T(W) = T(R2) T(R1) / max{ V(R1,A), V(R2,A) }

Slide 102: Case 2 with the alternate assumption: values uniformly distributed over the domain
  R1(A, B, C)    R2(A, D)
  One R1 tuple matches T(R2) / DOM(R2,A) tuples, so
  T(W) = T(R2) T(R1) / DOM(R2,A) = T(R2) T(R1) / DOM(R1,A)
  (assume DOM(R1,A) and DOM(R2,A) are the same)

Slide 103: In all cases:
  S(W) = S(R1) + S(R2) − S(A), where S(A) is the size of attribute A

Slide 104: Using similar ideas, we can estimate sizes of:
  - Π_{AB} (R) ..... Sec (same for either edition)
  - σ_{A=a ∧ B=b} (R) .... Sec
  - R ⋈ S with common attribs. A, B, C ..... Sec
  - Union, intersection, diff, .... Sec

Slide 105: Note: for complex expressions, need intermediate T, S, V results.
  E.g. W = [ σ_{A=a} (R1) ] ⋈ R2
  Treat σ_{A=a} (R1) as relation U:
    T(U) = T(R1) / V(R1,A)
    S(U) = S(R1)
  Also need V(U, *) !!

Slide 106: To estimate Vs
  E.g., U = σ_{A=a} (R1); say R1 has attribs A, B, C, D
  V(U, A) = ?
  V(U, B) = ?
  V(U, C) = ?
  V(U, D) = ?

Slides 107-108: Example
  R1:  A    B  C   D         V(R1,A) = 3
       cat  1  10  10        V(R1,B) = 1
       cat  1  20  20        V(R1,C) = 5
       dog  1  30  10        V(R1,D) = 3
       dog  1  40  30
       bat  1  50  10
  U = σ_{A=a} (R1)
  V(U,A) = 1
  V(U,B) = 1
  V(U,C) = T(R1) / V(R1,A)
  V(U,D) ... somewhere in between

Slide 109: Possible guess for U = σ_{A=a} (R)
  V(U,A) = 1
  V(U,B) = V(R,B)

Slide 110: For joins U = R1(A,B) ⋈ R2(A,C)
  V(U,A) = min { V(R1,A), V(R2,A) }
  V(U,B) = V(R1,B)
  V(U,C) = V(R2,C)
  [called "preservation of value sets" in section 7.4.4]

Slide 111: Example: Z = R1(A,B) ⋈ R2(B,C) ⋈ R3(C,D)
  R1:  T(R1) = 1000   V(R1,A) = 50    V(R1,B) = 100
  R2:  T(R2) = 2000   V(R2,B) = 200   V(R2,C) = 300
  R3:  T(R3) = 3000   V(R3,C) = 90    V(R3,D) = 500

Slide 112: Partial result: U = R1 ⋈ R2
  T(U) = 1000 × 2000 / 200
  V(U,A) = 50
  V(U,B) = 100
  V(U,C) = 300

Slide 113: Z = U ⋈ R3
  T(Z) = 1000 × 2000 × 3000 / (200 × 300)
  V(Z,A) = 50
  V(Z,B) = 100
  V(Z,C) = 90
  V(Z,D) = 500
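The three-way join example can be replayed by propagating T and V through each join with the rules from the preceding slides: divide by the larger V on the common attribute, take the smaller V for it in the result, and preserve the other V values. The `Stats` class and `estimate_join` are names invented for this sketch, and looping over several common attributes is an assumption beyond the single-attribute case shown on the slides.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    tuples: float    # T(R)
    distinct: dict   # attribute name -> V(R, attribute)

def estimate_join(left, right):
    """Estimate T and V for the natural join of two relations."""
    common = left.distinct.keys() & right.distinct.keys()
    tuples = left.tuples * right.tuples
    for attr in common:
        # T(W) = T(R1) T(R2) / max{V(R1,A), V(R2,A)}
        tuples /= max(left.distinct[attr], right.distinct[attr])
    # Non-join attributes keep their V ("preservation of value sets").
    distinct = {**left.distinct, **right.distinct}
    for attr in common:
        distinct[attr] = min(left.distinct[attr], right.distinct[attr])
    return Stats(tuples, distinct)

R1 = Stats(1000, {"A": 50, "B": 100})
R2 = Stats(2000, {"B": 200, "C": 300})
R3 = Stats(3000, {"C": 90, "D": 500})

U = estimate_join(R1, R2)
Z = estimate_join(U, R3)
print(U.tuples, U.distinct)
print(Z.tuples, Z.distinct)
```

With these inputs the estimate for U is 10,000 tuples and for Z is 100,000 tuples, matching 1000 × 2000 / 200 and 1000 × 2000 × 3000 / (200 × 300).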

Slide 114: A note on histograms
  [Figure: histogram of the number of tuples in R with A value in a given range]
  σ_{A=val} (R) = ?

Slide 115: Summary
  - Estimating size of results is an "art"
  - Don't forget: statistics must be kept up to date... (cost?)

Slide 116: Outline
  - Estimating cost of query plan
    - Estimating size of results: done!
    - Estimating # of IOs: next...
  - Generate and compare plans