CS 245: Database System Principles

Slides:



Advertisements
Similar presentations
CS4432: Database Systems II
Advertisements

CS CS4432: Database Systems II Logical Plan Rewriting.
CS4432: Database Systems II Query Operator & Algebraic Expressions 1.
Query Execution Since our SQL queries are very high level the query processor does a lot of processing to supply all the details. An SQL query is translated.
Query Compiler. The Query Compiler Parses SQL query into parse tree Transforms parse tree into expression tree (logical query plan) Transforms logical.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
COMP 451/651 Optimizing Performance
The Query Compiler Parses SQL query into parse tree Transforms parse tree into expression tree (logical query plan) Transforms logical query plan into.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Database Systems II Query Compiler.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
Oct 28, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
CS 4432query processing - lecture 141 CS4432: Database Systems II Lecture #14 Query Processing – Size Estimation Professor Elke A. Rundensteiner.
Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. How can we estimate.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
CS CS4432: Database Systems II Query Processing – Size Estimation.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
CS Spring 2002Notes 61 CS 277: Database System Implementation Notes 6: Query Processing Arthur Keller.
CS 4432query processing - lecture 131 CS4432: Database Systems II Lecture #13 Query Processing Professor Elke A. Rundensteiner.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
CS 4432query processing1 CS4432: Database Systems II.
CS 4432logical query rewriting - lecture 151 CS4432: Database Systems II Lecture #15 Logical Query Rewriting Professor Elke A. Rundensteiner.
16.2 ALGEBRAIC LAWS FOR IMPROVING QUERY PLANS Ramya Karri ID: 206.
Algebraic Laws. {P1,P2,…..} {P1,C1>...} parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute Pi answer.
CS 4432query processing - lecture 121 CS4432: Database Systems II Lecture #12 Query Processing Professor Elke A. Rundensteiner.
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:
CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
CSCE Database Systems Chapter 15: Query Execution 1.
Advanced Database Systems Notes:Query Processing (Overview) Shivnath Babu.
DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based.
CPS216: Advanced Database Systems Notes 02:Query Processing (Overview) Shivnath Babu.
CPS216: Advanced Database Systems Notes 08:Query Optimization (Plan Space, Query Rewrites) Shivnath Babu.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
QUERY PROCESSING AND OPTIMIZATION. Overview SQL is a declarative language:  Specifies the outcome of the computation without defining any flow of control.
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
CS 4432query processing1 CS4432: Database Systems II Lecture #11 Professor Elke A. Rundensteiner.
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
Estimating the Cost of Operations. Suppose we have parsed a query and transformed it into a logical query plan (lqp) Also suppose all possible transformations.
1 Algebra of Queries Classical Relational Algebra It is a collection of operations on relations. Each operation takes one or two relations as its operand(s)
CS 4432estimation - lecture 161 CS4432: Database Systems II Lecture #16 Query Processing : Estimating Sizes of Results Professor Elke A. Rundensteiner.
CS4432: Database Systems II Query Processing- Part 2.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
Data Engineering SQL Query Processing Shivnath Babu.
CS4432: Database Systems II Query Processing- Part 1 1.
CPS216: Advanced Database Systems Notes 02:Query Processing (Overview) Shivnath Babu.
1 Ullman et al. : Database System Principles Notes 6: Query Processing.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
Advanced Database Systems: DBS CB, 2nd Edition
CS4432: Database Systems II
CS 245: Database System Principles
Query Execution Presented by Khadke, Suvarna CS 257
Focus: Relational System
Chapters 15 and 16b: Query Optimization
Outline - Query Processing
Algebraic Laws.
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Relational Algebra Friday, 11/14/2003.
CPSC-608 Database Systems
CPS216: Data-Intensive Computing Systems Query Processing (contd.)
Yan Huang - CSCI5330 Database Implementation – Query Processing
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Query Compiler By:Payal Gupta Shirali Choksi Professor :Tsau Young Lin.
Outline - Query Processing
CPS216: Data-Intensive Computing Systems Query Processing (Overview)
CPSC-608 Database Systems
Presentation transcript:

CS 245: Database System Principles Notes 6: Query Processing Peter Bailis CS 245 Notes 6

Query Processing Q  Query Plan CS 245 Notes 6

Focus: Relational System Query Processing Q  Query Plan Focus: Relational System Others? CS 245 Notes 6

Example Select B,D From R,S Where R.A = “c”  S.E = 2  R.C=S.C CS 245 Notes 6

R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 CS 245 Notes 6

R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 Answer B D 2 x CS 245 Notes 6

How do we execute query? One idea - Do Cartesian product - Select tuples - Do projection One idea CS 245 Notes 6

RXS R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . C 2 10 10 x 2 CS 245 Notes 6

RXS R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . C 2 10 10 x 2 Bingo! Got one... CS 245 Notes 6

Relational Algebra - can be used to describe plans... Ex: Plan I B,D sR.A=“c” S.E=2  R.C=S.C X R S CS 245 Notes 6

Relational Algebra - can be used to describe plans... Ex: Plan I B,D sR.A=“c” S.E=2  R.C=S.C X R S OR: B,D [ sR.A=“c” S.E=2  R.C = S.C (RXS)] CS 245 Notes 6

Another idea: Plan II B,D sR.A = “c” sS.E = 2 R S natural join CS 245 Notes 6

R S A B C s (R) s(S) C D E a 1 10 A B C C D E 10 x 2 b 1 20 c 2 10 10 x 2 20 y 2 c 2 10 20 y 2 30 z 2 d 2 35 30 z 2 40 x 1 e 3 45 50 y 3 CS 245 Notes 6

Plan III Use R.A and S.C Indexes (1) Use R.A index to select R tuples with R.A = “c” (2) For each R.C value found, use S.C index to find matching tuples CS 245 Notes 6

Plan III Use R.A and S.C Indexes (1) Use R.A index to select R tuples with R.A = “c” (2) For each R.C value found, use S.C index to find matching tuples (3) Eliminate S tuples S.E  2 (4) Join matching R,S tuples, project B,D attributes and place in result CS 245 Notes 6

R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 CS 245 Notes 6

R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 =“c” <c,2,10> A C I1 I2 CS 245 Notes 6

R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 =“c” <c,2,10> A C I1 I2 <10,x,2> CS 245 Notes 6

R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 =“c” <c,2,10> A C I1 I2 <10,x,2> check=2? output: <2,x> CS 245 Notes 6

R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 =“c” <c,2,10> A C I1 I2 <10,x,2> check=2? output: <2,x> next tuple: <c,7,15> CS 245 Notes 6

Overview of Query Optimization CS 245 Notes 6

consider physical plans SQL query parse parse tree convert answer logical query plan execute apply rules statistics Pi “improved” l.q.p pick best estimate result sizes l.q.p. +sizes {(P1,C1),(P2,C2)...} estimate costs consider physical plans {P1,P2,…..} CS 245 Notes 6

Example: SQL query SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ ); (Find the movies with stars born in 1960) CS 245 Notes 6

SELECT <SelList> FROM <FromList> WHERE <Condition> Example: Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthDate ‘%1960’ CS 245 Notes 6

title  Example: Generating Relational Algebra StarsIn <condition> <tuple> IN name <attribute> birthdate LIKE ‘%1960’ starName MovieStar Fig. 7.15: An expression using a two-argument , midway between a parse tree and relational algebra CS 245 Notes 6

Fig. 7.18: Applying the rule for IN conditions Example: Logical Query Plan title starName=name  StarsIn name birthdate LIKE ‘%1960’ MovieStar Fig. 7.18: Applying the rule for IN conditions CS 245 Notes 6

Fig. 7.20: An improvement on fig. 7.18. Example: Improved Logical Query Plan title Question: Push project to StarsIn? starName=name StarsIn name birthdate LIKE ‘%1960’ MovieStar Fig. 7.20: An improvement on fig. 7.18. CS 245 Notes 6

Example: Estimate Result Sizes Need expected size StarsIn MovieStar P s CS 245 Notes 6

Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: Select Condition,... StarsIn MovieStar CS 245 Notes 6

Example: Estimate costs L.Q.P P1 P2 …. Pn C1 C2 …. Cn Pick best! CS 245 Notes 6

Textbook outline Chapter 15 5 Algebra for queries [bags vs sets] - Select, project, join, …. [project list a,a+b->x,…] - Duplicate elimination, grouping, sorting 15.1 Physical operators - Scan,sort, … 15.2 - 15.6 Implementing operators + estimating their cost [Ch 5] [15.1] [15.2-15.6] CS 245 Notes 6

16.3[16.3] Parse tree -> logical query plan Chapter 16 16.1[16.1] Parsing 16.2[16.2] Algebraic laws 16.3[16.3] Parse tree -> logical query plan 16.4[16.4] Estimating result sizes 16.5-7[16.5-7] Cost based optimization CS 245 Notes 6

Reading textbook - Chapters 15, 16 Optional: Sections 15.7, 15.8, 15.9 [15.7, 15.8] Sections 16.6, 16.7 [16.6, 16.7] Optional: Duplicate elimination operator grouping, aggregation operators CS 245 Notes 6

Query Optimization - In class order Relational algebra level (A) Detailed query plan level Estimate Costs (B) without indexes with indexes Generate and compare plans (C) CS 245 Notes 6

consider physical plans SQL query parse parse tree convert answer logical query plan execute (A) apply rules statistics Pi “improved” l.q.p pick best (B) estimate result sizes l.q.p. +sizes {(P1,C1),(P2,C2)...} (C) estimate costs consider physical plans (C) {P1,P2,…..} CS 245 Notes 6

Relational algebra optimization Transformation rules (preserve equivalence) What are good transformations? CS 245 Notes 6

Rules: Natural joins & cross products & union R S = S R (R S) T = R (S T) CS 245 Notes 6

Note: Carry attribute names in results, so order is not important Can also write as trees, e.g.: T R R S S T CS 245 Notes 6

Rules: Natural joins & cross products & union R S = S R (R S) T = R (S T) R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T CS 245 Notes 6

Rules: Selects sp1p2(R) = sp1vp2(R) = CS 245 Notes 6

sp1p2(R) = sp1 [ sp2 (R)] sp1vp2(R) = Rules: Selects [ sp1 (R)] U [ sp2 (R)] CS 245 Notes 6

Bags vs. Sets R = {a,a,b,b,b,c} S = {b,b,c,c,d} RUS = ? CS 245 Notes 6

Bags vs. Sets R = {a,a,b,b,b,c} S = {b,b,c,c,d} RUS = ? Option 1 SUM RUS = {a,a,b,b,b,b,b,c,c,c,d} Option 2 MAX RUS = {a,a,b,b,b,c,c,d} CS 245 Notes 6

Option 2 (MAX) makes this rule work: sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c CS 245 Notes 6

Option 2 (MAX) makes this rule work: sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c sp1vp2 (R) = {a,a,b,b,b,c} sp1(R) = {a,a,b,b,b} sp2(R) = {b,b,b,c} sp1(R) U sp2 (R) = {a,a,b,b,b,c} CS 245 Notes 6

“Sum” option makes more sense: Senators (……) Rep (……) T1 = pyr,state Senators; T2 = pyr,state Reps T1 Yr State T2 Yr State 14 CA 16 CA 16 CA 16 CA 15 AZ 15 CA Union? CS 245 Notes 6

Executive Decision -> Use “SUM” option for bag unions -> Some rules cannot be used for bags CS 245 Notes 6

pxy (R) = Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = CS 245 Notes 6

pxy (R) = px [py (R)] Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)] CS 245 Notes 6

pxy (R) = px [py (R)] Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)] CS 245 Notes 6

sp (R S) = sq (R S) = Rules: s + combined Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with only R,S attribs sp (R S) = sq (R S) = CS 245 Notes 6

sp (R S) = sq (R S) = Rules: s + combined Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with only R,S attribs sp (R S) = sq (R S) = [sp (R)] S R [sq (S)] CS 245 Notes 6

Rules: s + combined (continued) Some Rules can be Derived: spq (R S) = spqm (R S) = spvq (R S) = CS 245 Notes 6

Do one, others for homework: spq (R S) = [sp (R)] [sq (S)] spqm (R S) = sm [(sp R) (sq S)] spvq (R S) = [(sp R) S] U [R (sq S)] CS 245 Notes 6

--> Derivation for first one: spq (R S) = CS 245 Notes 6

--> Derivation for first one: spq (R S) = sp [sq (R S) ] = sp [ R sq (S) ] = [sp (R)] [sq (S)] CS 245 Notes 6

px[sp (R) ] = Rules: p,s combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = CS 245 Notes 6

px[sp (R) ] = Rules: p,s combined {sp [ px (R) ]} Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = {sp [ px (R) ]} CS 245 Notes 6

px[sp (R) ] = Rules: p,s combined px pxz {sp [ px (R) ]} Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = px pxz {sp [ px (R) ]} CS 245 Notes 6

pxy (R S) = Rules: p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S) = CS 245 Notes 6

pxy{[pxz (R) ] [pyz (S) ]} Rules: p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S) = pxy{[pxz (R) ] [pyz (S) ]} CS 245 Notes 6

pxy {sp (R S)} = CS 245 Notes 6

pxy {sp [pxz’ (R) pyz’ (S)]} pxy {sp (R S)} = pxy {sp [pxz’ (R) pyz’ (S)]} z’ = z U {attributes used in P } CS 245 Notes 6

Rules for s, p combined with X similar... e.g., sp (R X S) = ? CS 245 Notes 6

sp(R - S) = sp(R) - S = sp(R) - sp(S) Rules s, U combined: sp(R U S) = sp(R) U sp(S) sp(R - S) = sp(R) - S = sp(R) - sp(S) CS 245 Notes 6

Which are “good” transformations? sp1p2 (R)  sp1 [sp2 (R)] sp (R S)  [sp (R)] S R S  S R px [sp (R)]  px {sp [pxz (R)]} CS 245 Notes 6

Conventional wisdom: do projects early Example: R(A,B,C,D,E) x={E} P: (A=3)  (B=“cat”) px {sp (R)} vs. pE {sp{pABE(R)}} CS 245 Notes 6

What if we have A, B indexes? But B = “cat” A=3 Intersect pointers to get pointers to matching tuples CS 245 Notes 6

Bottom line: No transformation is always good Usually good: early selections CS 245 Notes 6

In textbook: more transformations Eliminate common sub-expressions Other operations: duplicate elimination CS 245 Notes 6

Outline - Query Processing Relational algebra level transformations good transformations Detailed query plan level estimate costs generate and compare plans CS 245 Notes 6

Estimating cost of query plan (1) Estimating size of results (2) Estimating # of IOs CS 245 Notes 6

Estimating result size Keep statistics for relation R T(R) : # tuples in R S(R) : # of bytes in each R tuple B(R): # of blocks to hold all R tuples V(R, A) : # distinct values in R for attribute A CS 245 Notes 6

Example R A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d CS 245 Notes 6

Example R A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d T(R) = 5 S(R) = 37 V(R,A) = 3 V(R,C) = 5 V(R,B) = 1 V(R,D) = 4 CS 245 Notes 6

Size estimates for W = R1 x R2 T(W) = S(W) = CS 245 Notes 6

Size estimates for W = R1 x R2 T(W) = S(W) = T(R1)  T(R2) S(R1) + S(R2) CS 245 Notes 6

Size estimate for W = sA=a (R) S(W) = S(R) T(W) = ? CS 245 Notes 6

Example R V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 W = sz=val(R) T(W) = A B cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d CS 245 Notes 6

Example R V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 W = sz=val(R) T(W) = A B cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d what is probability this tuple will be in answer? CS 245 Notes 6

Example R V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 W = sz=val(R) T(W) = cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d T(R) V(R,Z) CS 245 Notes 6

Assumption: Values in select expression Z = val are uniformly distributed over possible V(R,Z) values. CS 245 Notes 6

Alternate Assumption: Values in select expression Z = val are uniformly distributed over domain with DOM(R,Z) values. CS 245 Notes 6

R Alternate assumption Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d W = sz=val(R) T(W) = ? CS 245 Notes 6

R Alternate assumption Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d what is probability this tuple will be in answer? W = sz=val(R) T(W) = ? CS 245 Notes 6

R Alternate assumption Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d T(R) DOM(R,Z) W = sz=val(R) T(W) = CS 245 Notes 6

Selection cardinality SC(R,A) = average # records that satisfy equality condition on R.A T(R) V(R,A) SC(R,A) = DOM(R,A) CS 245 Notes 6

What about W = sz  val (R) ? CS 245 Notes 6

What about W = sz  val (R) ? Solution # 1: T(W) = T(R)/2 CS 245 Notes 6

What about W = sz  val (R) ? Solution # 1: T(W) = T(R)/2 Solution # 2: T(W) = T(R)/3 CS 245 Notes 6

Solution # 3: Estimate values in range Example R Z Min=1 V(R,Z)=10 W= sz  15 (R) Max=20 CS 245 Notes 6

Solution # 3: Estimate values in range Example R Z Min=1 V(R,Z)=10 W= sz  15 (R) Max=20 f = 20-15+1 = 6 (fraction of range) 20-1+1 20 T(W) = f  T(R) CS 245 Notes 6

fV(R,Z) = fraction of distinct values Equivalently: fV(R,Z) = fraction of distinct values T(W) = [f  V(Z,R)] T(R) = f  T(R) V(Z,R) CS 245 Notes 6

Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 CS 245 Notes 6

X  Y =  Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 X  Y =  Same as R1 x R2 Case 1 CS 245 Notes 6

W = R1 R2 X  Y = A Case 2 R1 A B C R2 A D CS 245 Notes 6

W = R1 R2 X  Y = A Case 2 R1 A B C R2 A D Assumption: V(R1,A)  V(R2,A)  Every A value in R1 is in R2 V(R2,A)  V(R1,A)  Every A value in R2 is in R1 “containment of value sets” Sec. 7.4.4 CS 245 Notes 6

Computing T(W) when V(R1,A)  V(R2,A) R1 A B C R2 A D Take 1 tuple Match CS 245 Notes 6

Computing T(W) when V(R1,A)  V(R2,A) R1 A B C R2 A D Take 1 tuple Match 1 tuple matches with T(R2) tuples... V(R2,A) so T(W) = T(R2)  T(R1) V(R2, A) CS 245 Notes 6

V(R1,A)  V(R2,A) T(W) = T(R2) T(R1) V(R2,A) [A is common attribute] CS 245 Notes 6

In general W = R1 R2 T(W) = T(R2) T(R1) max{ V(R1,A), V(R2,A) } CS 245 Notes 6

with alternate assumption Case 2 Values uniformly distributed over domain R1 A B C R2 A D This tuple matches T(R2)/DOM(R2,A) so T(W) = T(R2) T(R1) = T(R2) T(R1) DOM(R2, A) DOM(R1, A) Assume the same CS 245 Notes 6

In all cases: S(W) = S(R1) + S(R2) - S(A) size of attribute A CS 245 Notes 6

Using similar ideas, we can estimate sizes of: PAB (R) ….. Sec. 16.4.2 (same for either edition) sA=aB=b (R) …. Sec. 16.4.3 R S with common attribs. A,B,C Sec. 16.4.5 Union, intersection, diff, …. Sec. 16.4.7 CS 245 Notes 6

Note: for complex expressions, need intermediate T,S,V results. E.g. W = [sA=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) S(U) = S(R1) Also need V (U, *) !! CS 245 Notes 6

To estimate Vs E.g., U = sA=a (R1) Say R1 has attribs A,B,C,D V(U, A) = V(U, B) = V(U, C) = V(U, D) = CS 245 Notes 6

Example R 1 V(R1,A)=3 V(R1,B)=1 V(R1,C)=5 V(R1,D)=3 U = sA=a (R1) A B cat 1 10 20 dog 30 40 bat 50 CS 245 Notes 6

V(D,U) ... somewhere in between Example R 1 V(R1,A)=3 V(R1,B)=1 V(R1,C)=5 V(R1,D)=3 U = sA=a (R1) A B C D cat 1 10 20 dog 30 40 bat 50 V(U,A) =1 V(U,B) =1 V(U,C) = T(R1) V(R1,A) V(D,U) ... somewhere in between CS 245 Notes 6

Possible Guess U = sA=a (R) V(U,A) = 1 V(U,B) = V(R,B) CS 245 Notes 6

For Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [called “preservation of value sets” in section 7.4.4] CS 245 Notes 6

Example: Z = R1(A,B) R2(B,C) R3(C,D) T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 T(R3) = 3000 V(R3,C)=90 V(R3,D)=500 R1 R2 R3 CS 245 Notes 6

Partial Result: U = R1 R2 T(U) = 10002000 V(U,A) = 50 200 V(U,B) = 100 V(U,C) = 300 CS 245 Notes 6

Z = U R3 T(Z) = 100020003000 V(Z,A) = 50 200300 V(Z,B) = 100 V(Z,C) = 90 V(Z,D) = 500 CS 245 Notes 6

A Note on Histograms sA=val(R) = ? number of tuples in R with A value 40 30 number of tuples in R with A value in given range 20 10 10 20 30 40 sA=val(R) = ? CS 245 Notes 6

Summary Estimating size of results is an “art” Don’t forget: Statistics must be kept up to date… (cost?) CS 245 Notes 6

Outline Estimating cost of query plan Generate and compare plans Estimating size of results done! Estimating # of IOs next… Generate and compare plans CS 245 Notes 6