CPSC-608 Database Systems

1 CPSC-608 Database Systems
Fall 2018
Instructor: Jianer Chen
Office: HRBB 315C
Phone:
Notes #19

2 Query Optimization
An input database program P
→ parser → parse tree
→ preprocessing (view processing, semantic checking) → parse tree
→ parse tree-lqp convertor → logic query plan
→ apply logic laws (push selections, group joins) → logic query plan
→ reduce the size of intermediate results → logic query plan
→ lqp-pqp convertor → physical query plan
→ choices of algorithms, data structures, and computational modes → machine executable code
Stage labels: optimization via logic and size (logic query plan stages); optimization via algorithms and cost (physical query plan stage).
Side notes: prepare a collection C of efficient algorithms for operations in relational algebra; take care of issues in optimization and security.

3 Improving logic plan via relation size
Major Steps:
Collect size parameters for stored relations: T(R) (the # of tuples), B(R) (the # of blocks), V(R,A) (the # of different values on attribute A);
Set up estimation rules for size parameters on relational algebraic operators;
Use logic laws to convert a logic query plan into one that minimizes the (estimated) sizes of intermediate relations.
[Figure: a query plan tree over R and S with δ, annotated with the sizes 500, 150, 1500, 5000, 2000]
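The first step, collecting T(R), B(R), and V(R,A) for a stored relation, can be sketched as follows. This is a minimal illustration, not how a particular DBMS gathers statistics: the function name `collect_stats`, the in-memory list of tuples, and the `tuples_per_block` value are all assumptions made for the example.

```python
from math import ceil

def collect_stats(rows, attributes, tuples_per_block):
    """Return T(R), B(R), and V(R, A) for every attribute A of a relation.

    rows: list of tuples (hypothetical in-memory stand-in for a stored relation)
    attributes: attribute names, in the same order as the tuple fields
    tuples_per_block: assumed number of tuples that fit in one block
    """
    t = len(rows)                                  # T(R): number of tuples
    b = ceil(t / tuples_per_block)                 # B(R): number of blocks, rounded up
    v = {a: len({row[i] for row in rows})          # V(R, A): number of distinct A-values
         for i, a in enumerate(attributes)}
    return {"T": t, "B": b, "V": v}

# A tiny illustrative relation R(a, b).
rows = [(1, "x"), (1, "y"), (2, "x"), (3, "x"), (3, "z")]
stats = collect_stats(rows, ["a", "b"], tuples_per_block=2)
print(stats)
```

Real systems usually sample rather than scan the whole relation; the full scan here just keeps the definitions of T, B, and V visible.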

4 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
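The unary rules above translate directly into small functions. The sketch below is only an illustration: the function names are hypothetical, and the statistics T(R) = 5000, V(R,a) = 50, V(R,b) = 60 are borrowed from the worked example later in these notes.

```python
from math import prod

def est_delta(t, v):
    """T(delta(R)) = min{ T(R)/2, product over all attributes A of V(R,A) }."""
    return min(t / 2, prod(v.values()))

def est_gamma(t, v, grouping):
    """T(gamma(R)) = min{ T(R)/2, product over grouping attributes A of V(R,A) }."""
    return min(t / 2, prod(v[a] for a in grouping))

def est_select_eq(t, v, attr):
    """T(sigma_{A=c}(R)) = T(R)/V(R,A)."""
    return t / v[attr]

def est_select_lt(t):
    """T(sigma_{A<c}(R)) = T(R)/3."""
    return t / 3

t_r, v_r = 5000, {"a": 50, "b": 60}
print(est_delta(t_r, v_r))            # min(2500, 50*60)
print(est_gamma(t_r, v_r, ["a"]))     # min(2500, 50)
print(est_select_eq(t_r, v_r, "a"))   # 5000/50
```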

5 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is small)

6 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
Assuming T(R) ≥ T(S): T(R ∩ S) can be as large as T(S) and as small as 0. So on average, take T(R ∩ S) = T(S)/2?

10 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is small)

11 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = (T(R) + T(S))/2 (assume S is smaller)
Assuming T(R) ≥ T(S): T(R ∪ S) can be as large as T(R) + T(S) and as small as T(S). So on average, take T(R ∪ S) = (T(R) + T(S))/2?

12 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = (T(R) + T(S))/2 (assume S is smaller)
Assuming T(R) ≥ T(S): T(R ∪ S) can be as large as T(R) + T(S) and as small as T(R). So on average, take T(R ∪ S) = (T(R) + T(S))/2?

13 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = (T(R) + T(S))/2 (assume S is smaller)
Assuming T(R) ≥ T(S): T(R ∪ S) can be as large as T(R) + T(S) and as small as T(R). So on average, take T(R ∪ S) = T(R) + T(S)/2?

14 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
Assuming T(R) ≥ T(S): T(R ∪ S) can be as large as T(R) + T(S) and as small as T(R). So on average, take T(R ∪ S) = T(R) + T(S)/2?

15 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2

16 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
T(R − S) can be as large as T(R) and as small as 0. So take T(R − S) = T(R)/2? Then S has no impact! So take T(R − S) = T(R) − max{T(R)/2, T(S)/2}?

20 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − max{T(R)/2, T(S)/2}
T(R − S) can be as large as T(R) and as small as 0. So take T(R − S) = T(R)/2? Then S has no impact! So take T(R − S) = T(R) − max{T(R)/2, T(S)/2}?

21 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}

22 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}
Assumption. If V(R,A) > V(S,A), then every A-value in S will appear in R. Thus, for a tuple t in S with A-value a, the value a appears on average T(R)/V(R,A) times in R, so t can be joined with T(R)/V(R,A) tuples in R. This gives T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}
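The join estimate can be checked on a small synthetic case. The sketch below builds two columns of A-values that satisfy the assumption above (every A-value in S appears in R, and each value occurs equally often), so the estimate coincides with the true join size. The data and the helper names `est_join` and `actual_join_size` are made up for illustration.

```python
from collections import Counter

def est_join(t_r, t_s, v_r, v_s):
    """T(R join S) = T(R)T(S) / max{V(R,A), V(S,A)} for the shared attribute A."""
    return t_r * t_s / max(v_r, v_s)

def actual_join_size(r_vals, s_vals):
    """Count matching pairs: each S-tuple joins with every R-tuple sharing its A-value."""
    r_count = Counter(r_vals)
    return sum(r_count[a] for a in s_vals)

# A-values only: R has 4 distinct values, 3 tuples each; S has 2 distinct values, 2 tuples each.
r_vals = [a for a in range(4) for _ in range(3)]   # T(R) = 12, V(R,A) = 4
s_vals = [a for a in range(2) for _ in range(2)]   # T(S) = 4,  V(S,A) = 2

estimate = est_join(len(r_vals), len(s_vals), len(set(r_vals)), len(set(s_vals)))
actual = actual_join_size(r_vals, s_vals)
print(estimate, actual)
```

On skewed data, or when S contains A-values missing from R, the two numbers would differ; the formula is only as good as the containment and uniformity assumptions.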

26 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}
⋈_C: T(R ⋈_C S) = T(σ_C(R × S))

30 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) = min{ T(R)/2, ∏_A V(R,A) }
γ: T(γ(R)) = min{ T(R)/2, ∏_{grouping A} V(R,A) }
σ_{A=c}: T(σ_{A=c}(R)) = T(R)/V(R,A)
σ_{A<c}: T(σ_{A<c}(R)) = T(R)/3
∩: T(R ∩ S) = T(S)/2 (assume S is smaller)
∪: T(R ∪ S) = T(R) + T(S)/2 (assume S is smaller)
−: T(R − S) = T(R) − T(S)/2
⋈: T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}
⋈_C: T(R ⋈_C S) = T(σ_C(R × S))
Assume that the tables are stored in a clustered way. If we know the schemas of relations R and S, we will also know the schema of the relation W obtained by applying an operation on R and/or S, from which we know how much space a tuple in W will take. Therefore, the value B(W) can be computed from the value T(W).
B(W) = T(W)/#tuples-per-block
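The set-operation rules and the step from T(W) to B(W) can be sketched as below. The function names, the relation sizes, and the figure of 100 tuples per block are illustrative assumptions; rounding B(W) up to a whole block is also an assumption added here, since the slide gives only the plain quotient.

```python
from math import ceil

def est_intersection(t_r, t_s):
    """T(R ∩ S) = T(S)/2, where S is the smaller relation."""
    return min(t_r, t_s) / 2

def est_union(t_r, t_s):
    """T(R ∪ S) = T(R) + T(S)/2, where S is the smaller relation."""
    return max(t_r, t_s) + min(t_r, t_s) / 2

def est_difference(t_r, t_s):
    """T(R − S) = T(R) − T(S)/2, exactly as stated on the slide (no clamping)."""
    return t_r - t_s / 2

def est_blocks(t_w, tuples_per_block):
    """B(W) = T(W)/#tuples-per-block, rounded up to whole blocks (assumption)."""
    return ceil(t_w / tuples_per_block)

t_r, t_s = 5000, 2000
t_diff = est_difference(t_r, t_s)
print(est_intersection(t_r, t_s), est_union(t_r, t_s), t_diff)
print(est_blocks(t_diff, tuples_per_block=100))
```

Because the difference rule is not clamped, it can go negative when T(S) > 2T(R), which is one reason the notes also consider the variant with max{T(R)/2, T(S)/2}.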

33 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: T(δ(R)) ⟹ B(δ(R))
γ: T(γ(R)) ⟹ B(γ(R))
σ_{A=c}: T(σ_{A=c}(R)) ⟹ B(σ_{A=c}(R))
σ_{A<c}: T(σ_{A<c}(R)) ⟹ B(σ_{A<c}(R))
∩: T(R ∩ S) ⟹ B(R ∩ S)
∪: T(R ∪ S) ⟹ B(R ∪ S)
−: T(R − S) ⟹ B(R − S)
⋈: T(R ⋈ S) ⟹ B(R ⋈ S)
⋈_C: T(R ⋈_C S) ⟹ B(R ⋈_C S)
B(W) = T(W)/#tuples-per-block

35 Estimating size parameters (T,B,V)
Similar to that for the parameter T

36 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σ_{A=c}: V(σ_{A=c}(R),B) = V(R,B), V(σ_{A=c}(R),A) = 1
σ_{A<c}: V(σ_{A<c}(R),B) = V(R,B), V(σ_{A<c}(R),A) = V(R,A)/3
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A),V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = max{V(R,A),V(S,A)} (A is non-shared)
⋈_C: V(R ⋈_C S,A) = V(σ_C(R × S),A)
Similar to that for the parameter T

37 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σ_{A=c}: V(σ_{A=c}(R),B) = V(R,B), V(σ_{A=c}(R),A) = 1
σ_{A<c}: V(σ_{A<c}(R),B) = V(R,B), V(σ_{A<c}(R),A) = V(R,A)/3
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A),V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = max{V(R,A),V(S,A)} (A is non-shared)
⋈_C: V(R ⋈_C S,A) = V(σ_C(R × S),A)
Containment Law: if V(R,A) > V(S,A), then all A-values in S are in R.

38 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σ_{A=c}: V(σ_{A=c}(R),B) = V(R,B), V(σ_{A=c}(R),A) = 1
σ_{A<c}: V(σ_{A<c}(R),B) = V(R,B), V(σ_{A<c}(R),A) = V(R,A)/3
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A),V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = max{V(R,A),V(S,A)} (A is non-shared)
⋈_C: V(R ⋈_C S,A) = V(σ_C(R × S),A)
Preservation Law: if attribute A is not involved in the operation, then the # of A-values is unchanged.

39 Estimating size parameters (T,B,V)
π, τ, ×: size parameters can be calculated precisely
δ: V(δ(R),A) = V(R,A)
γ: V(γ(R),A) = V(R,A) (A is a grouping attribute)
σ_{A=c}: V(σ_{A=c}(R),B) = V(R,B), V(σ_{A=c}(R),A) = 1
σ_{A<c}: V(σ_{A<c}(R),B) = V(R,B), V(σ_{A<c}(R),A) = V(R,A)/3
∩: V(R ∩ S,A) = V(S,A)/2 (assume V(R,A) ≥ V(S,A))
∪: V(R ∪ S,A) = V(R,A) + V(S,A)/2 (assume V(R,A) ≥ V(S,A))
−: V(R − S,A) = V(R,A) − max{V(R,A)/2, V(S,A)/2}
⋈: V(R ⋈ S,A) = min{V(R,A),V(S,A)} (A is a shared attribute)
   V(R ⋈ S,A) = max{V(R,A),V(S,A)} (A is non-shared)
⋈_C: V(R ⋈_C S,A) = V(σ_C(R × S),A)
The formulas for set/bag operations may depend on applications.
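How the V parameters propagate through an equality selection and a natural join can be sketched as follows. The dictionary representation and the function names are assumptions for illustration. For a non-shared attribute only one input relation has a V value, so the sketch simply carries that value over, which is how the preservation law is read here.

```python
def v_select_eq(v, attr):
    """V after sigma_{attr=c}: the selected attribute has 1 value, the others are unchanged."""
    out = dict(v)
    out[attr] = 1
    return out

def v_join(v_r, v_s):
    """V after a natural join: min for shared attributes, carried over for non-shared ones."""
    out = {}
    for a in v_r.keys() | v_s.keys():
        if a in v_r and a in v_s:
            out[a] = min(v_r[a], v_s[a])      # shared attribute
        else:
            out[a] = v_r.get(a, v_s.get(a))   # non-shared: value from the relation that has it
    return out

v_r = {"a": 50, "b": 60}     # V(R,a), V(R,b)
v_s = {"b": 200, "c": 100}   # V(S,b), V(S,c)

v_sel = v_select_eq(v_r, "a")
print(v_sel)
print(v_join(v_sel, v_s))
```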

40 Improving logic plan via relation size
Major Steps:
Collect size parameters for stored relations: T(R), B(R), V(R,A) (the # of different values on attribute A);
Set up estimation rules for size parameters on relational algebraic operators;
Use logic laws to convert a logic query plan into one that minimizes the (estimated) sizes of intermediate relations.

42 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]

43 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]
σ_{a=10} annotation, from T(σ_{A=c}(R)) = T(R)/V(R,A): T(*) = 100, V(*,a) = 1, V(*,b) = 60

44 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]
σ_{a=10} annotation, from T(σ_{A=c}(R)) = T(R)/V(R,A): T(*) = 100, V(*,a) = 1, V(*,b) = 60
⋈ annotation, from T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}: T(*) = 1000

45 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]
σ_{a=10} annotation, from T(σ_{A=c}(R)) = T(R)/V(R,A): T(*) = 100, V(*,a) = 1, V(*,b) = 60
Second σ_{a=10} annotation: T = 100, V(R,a) = 1, V(R,b) = 60
⋈ annotation, from T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}: T(*) = 1000

46 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]
Rules used: T(σ_{A=c}(R)) = T(R)/V(R,A); T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}; T(δ(S)) = min{ T(S)/2, ∏_A V(S,A) }
σ_{a=10} annotations: T(*) = 100, V(*,a) = 1, V(*,b) = 60 and T = 100, V(R,a) = 1, V(R,b) = 60
Size annotations on the other intermediate nodes: T(*) = 50, T(*) = 1000, T(*) = 1000

47 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]
Rules used: T(σ_{A=c}(R)) = T(R)/V(R,A); T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}; T(δ(S)) = min{ T(S)/2, ∏_A V(S,A) }
σ_{a=10} annotations: T(*) = 100, V(*,a) = 1, V(*,b) = 60 and T = 100, V(R,a) = 1, V(R,b) = 60
Size annotations on the other intermediate nodes: T(*) = 50, T(*) = 1000, T(*) = 1000
Cost = 1100 for one plan and Cost = 1150 for the other.

49 Improving logic plan via relation size
R(a,b): T(R) = 5000, V(R,a) = 50, V(R,b) = 60
S(b,c): T(S) = 2000, V(S,b) = 200, V(S,c) = 100
[Figure: two candidate logic query plans over R and S, each using ⋈, δ, and σ_{a=10}; the leaves R and S are annotated with the statistics above.]
Rules used: T(σ_{A=c}(R)) = T(R)/V(R,A); T(R ⋈ S) = T(R)T(S)/max{V(R,A),V(S,A)}; T(δ(S)) = min{ T(S)/2, ∏_A V(S,A) }
σ_{a=10} annotations: T(*) = 100, V(*,a) = 1, V(*,b) = 60 and T = 100, V(R,a) = 1, V(R,b) = 60
Size annotations on the other intermediate nodes: T(*) = 50, T(*) = 1000, T(*) = 1000
Cost = 1100 for one plan and Cost = 1150 for the other.
To be more precise, we may also need to consider the # of blocks.
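The two costs can be reproduced with a few lines of arithmetic. The plan shapes are not recoverable with certainty from the flattened figure, so this sketch assumes they are δ(σ_{a=10}(R) ⋈ S) and δ(σ_{a=10}(R)) ⋈ δ(S), and that the cost of a plan is the sum of the estimated sizes of its intermediate relations, excluding the stored relations and the final result. Under those assumptions the numbers match the size annotations on the slide.

```python
from math import prod

# Statistics from the slide.
T_R, V_R = 5000, {"a": 50, "b": 60}
T_S, V_S = 2000, {"b": 200, "c": 100}

# Both assumed plans start with sigma_{a=10}(R).
t_sel = T_R / V_R["a"]                  # T(R)/V(R,a)
v_sel = {"a": 1, "b": V_R["b"]}         # V(*,a) = 1, V(*,b) unchanged

# Assumed plan 1: delta(sigma_{a=10}(R) join S); the final delta is not counted.
t_join = t_sel * T_S / max(v_sel["b"], V_S["b"])
cost_plan1 = t_sel + t_join

# Assumed plan 2: delta(sigma_{a=10}(R)) join delta(S); the final join is not counted.
t_delta_r = min(t_sel / 2, prod(v_sel.values()))
t_delta_s = min(T_S / 2, prod(V_S.values()))
cost_plan2 = t_sel + t_delta_r + t_delta_s

print(cost_plan1, cost_plan2)
```

The gap between the two plans is small here, which fits the closing remark that a more precise comparison may also need block counts.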

