Distributed Database Management Systems


1 Distributed Database Management Systems
Lecture 30

2 In the previous lecture
Locking-based concurrency control (CC)
Timestamp-ordering-based CC
Concluded transaction management (TM)

3 In this Lecture
Basic concepts of query optimization
Query processing (QP) in centralized and distributed DBs

4 Introduction
SQL is one of the success factors of RDBMSs
The query processor transforms complex high-level queries into simpler low-level operations

5 Query processing is a critical performance issue
QP is a complex problem, especially in a DDBS environment

6 The main function of QP is to transform an SQL query into an equivalent relational algebra query (a lower-level language)
The transformation must achieve both correctness and efficiency

7 Correctness is straightforward, since well-defined transformation rules exist
Efficiency is harder: an SQL query can have many equivalent relational algebra expressions

8 Considering the tables
EMP(eNo, eName, title)
ASG(eNo, pNo, resp, dur)
PROJ(pNo, pName, budget, loc)
Query: Get the names of employees who are managing a project

9 SELECT eName
FROM EMP, ASG
WHERE EMP.eNo = ASG.eNo
AND resp = 'Manager'

10 Π[eName](σ[resp = ‘Manager’ ∧ EMP.eNo = ASG.eNo](EMP × ASG))
Π[eName](EMP ⋈[eNo] σ[resp = ‘Manager’](ASG))
The second expression needs fewer computing resources, since it avoids the Cartesian product
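The equivalence of the two expressions can be checked on a toy instance. The following is a minimal sketch, not part of the lecture: the sample rows and the function names `plan_product_first` and `plan_select_first` are assumptions made for illustration, and the second value each function returns is simply a count of the intermediate tuples it materializes.

```python
from itertools import product

# Hypothetical sample rows; the slides give only the schemas.
EMP = [("E1", "Ann", "Engineer"), ("E2", "Bob", "Analyst"), ("E3", "Cho", "Programmer")]
ASG = [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24), ("E3", "P2", "Manager", 6)]

def plan_product_first(emp, asg):
    """Project(Select(EMP x ASG)): materializes the full Cartesian product."""
    pairs = list(product(emp, asg))
    names = {e[1] for e, a in pairs if a[2] == "Manager" and e[0] == a[0]}
    return names, len(pairs)

def plan_select_first(emp, asg):
    """Project(EMP join Select(ASG)): filters ASG before joining."""
    managers = [a for a in asg if a[2] == "Manager"]
    emp_by_eno = {e[0]: e for e in emp}  # index on the join attribute
    joined = [(emp_by_eno[a[0]], a) for a in managers if a[0] in emp_by_eno]
    names = {e[1] for e, _ in joined}
    return names, len(managers) + len(joined)

names_1, intermediate_1 = plan_product_first(EMP, ASG)
names_2, intermediate_2 = plan_select_first(EMP, ASG)
print(names_1 == names_2, intermediate_1, intermediate_2)
```

Both plans return the same set of names, but the first builds every EMP/ASG pair before filtering, while the second only ever handles the manager assignments and their matches.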

11 Centralized QP: choose the best query execution plan
Distributed QP is more complex: it also involves selecting the site at which to execute the query

12 Same query in DDBS
Suppose EMP and ASG are horizontally fragmented (HF) as
EMP1 = σ[eNo ≤ ‘E3’](EMP)
EMP2 = σ[eNo > ‘E3’](EMP)
ASG1 = σ[eNo ≤ ‘E3’](ASG)
ASG2 = σ[eNo > ‘E3’](ASG)
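A small sketch of this fragmentation, assuming hypothetical sample rows and a helper named `fragment_by_eno` (neither appears in the slides). It shows that the fragments are disjoint, that their union reconstructs the relation, and that because EMP and ASG are split on the same predicate, EMP1 can only join with ASG1 and EMP2 with ASG2.

```python
# Hypothetical sample rows; the slides give only the schemas.
EMP = [("E1", "Ann", "Engineer"), ("E2", "Bob", "Analyst"), ("E3", "Cho", "Programmer"),
       ("E4", "Dee", "Engineer"), ("E5", "Eli", "Analyst")]
ASG = [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
       ("E4", "P2", "Manager", 6), ("E5", "P2", "Engineer", 18)]

def fragment_by_eno(relation, boundary="E3"):
    """Horizontal fragmentation on eNo (first attribute).

    Plain string comparison is used, which is only adequate for
    single-digit ids such as 'E1'..'E9' (an assumption of this sketch).
    """
    low = [t for t in relation if t[0] <= boundary]
    high = [t for t in relation if t[0] > boundary]
    return low, high

def join_on_eno(emp, asg):
    """Nested-loop equijoin on eNo."""
    return [(e, a) for e in emp for a in asg if e[0] == a[0]]

EMP1, EMP2 = fragment_by_eno(EMP)
ASG1, ASG2 = fragment_by_eno(ASG)

# Union of the fragments reconstructs the original relation.
print(sorted(EMP1 + EMP2) == sorted(EMP))
# Cross-fragment joins are empty, so only EMP1 with ASG1 and EMP2 with ASG2 matter.
print(join_on_eno(EMP1, ASG2), join_on_eno(EMP2, ASG1))
```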

13 Further suppose these fragments are stored at sites 1, 2, 3 and 4, and the result is required at site 5

14 Strategy 1
Site 1: ASG1’ = σ[resp = ‘Manager’](ASG1), sent to site 3
Site 2: ASG2’ = σ[resp = ‘Manager’](ASG2), sent to site 4
Site 3: EMP1’ = EMP1 ⋈ ASG1’, sent to site 5
Site 4: EMP2’ = EMP2 ⋈ ASG2’, sent to site 5
Site 5: result = EMP1’ ∪ EMP2’

15 Strategy 2
Site 1 sends ASG1, site 2 sends ASG2, site 3 sends EMP1 and site 4 sends EMP2, all to site 5
Site 5: result = (EMP1 ∪ EMP2) ⋈[eNo] σ[resp = ‘Manager’](ASG1 ∪ ASG2)

16 Let’s assume
size(EMP) = 400 tuples
size(ASG) = 1000 tuples
tuple access cost = 1 unit
tuple transfer cost = 10 units
There are 20 managers
Data is distributed evenly across all sites

17 Strategy 1
Produce ASG’: 20 * 1 = 20
Transfer ASG’ to the sites of EMP: 20 * 10 = 200
Produce EMP’: (10 + 10) * 1 * 2 = 40
Transfer EMP’ to the result site: 20 * 10 = 200
Total: 460

18 Strategy 2
Transfer EMP to site 5: 400 * 10 = 4000
Transfer ASG to site 5: 1000 * 10 = 10000
Produce ASG’ by selecting ASG: 1000 * 1 = 1000
Join EMP and ASG’: 400 * 20 * 1 = 8000
Total: 23000
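The arithmetic for the two strategies can be reproduced with a short script. This is a sketch using only the figures stated in the slides (400 EMP tuples, 1000 ASG tuples, 20 managers, access cost 1, transfer cost 10); the constant and function names are illustrative choices, not part of the lecture.

```python
TUPLE_ACCESS = 1     # cost units per tuple accessed
TUPLE_TRANSFER = 10  # cost units per tuple transferred
SIZE_EMP = 400
SIZE_ASG = 1000
MANAGERS = 20        # ASG tuples with resp = 'Manager', split evenly (10 + 10)

def strategy_1_cost():
    """Select and join at the fragment sites, ship only results to site 5."""
    produce_asg = MANAGERS * TUPLE_ACCESS        # select the manager tuples
    transfer_asg = MANAGERS * TUPLE_TRANSFER     # ship ASG' to the EMP sites
    produce_emp = (10 + 10) * TUPLE_ACCESS * 2   # join at the two EMP sites
    transfer_emp = MANAGERS * TUPLE_TRANSFER     # ship EMP' to the result site
    return produce_asg + transfer_asg + produce_emp + transfer_emp

def strategy_2_cost():
    """Ship both relations to site 5 and do all the work there."""
    transfer_emp = SIZE_EMP * TUPLE_TRANSFER
    transfer_asg = SIZE_ASG * TUPLE_TRANSFER
    produce_asg = SIZE_ASG * TUPLE_ACCESS           # scan ASG for managers
    join = SIZE_EMP * MANAGERS * TUPLE_ACCESS       # join EMP with ASG'
    return transfer_emp + transfer_asg + produce_asg + join

print(strategy_1_cost(), strategy_2_cost())  # 460 23000
```

Under these assumptions strategy 2 costs 50 times as much as strategy 1, almost entirely because whole relations are transferred.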

19 Query Optimization
An important aspect of QP
Goal: minimize resource consumption
Total cost = I/O cost + CPU cost + communication cost
Only the first two apply in a centralized DB

20 Communication cost dominates in WANs
It is less dominant in LANs, so the total cost should be considered there
QO can also aim to maximize throughput

21 Operators’ Complexity
Select, Project (without duplicate elimination): O(n)
Project (with duplicate elimination), Group: O(n log n)
Join, Semi-Join, Division, Set Operators: O(n log n)
Cartesian Product: O(n²)
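To make the complexity classes concrete, here is a minimal sketch of three of the operators. It assumes a sort-based implementation of duplicate elimination, which is one common way to reach O(n log n); the slides do not say which algorithm is meant, and the function names and sample rows are illustrative.

```python
def project(relation, columns):
    """Projection without duplicate elimination: one pass, O(n)."""
    return [tuple(t[i] for i in columns) for t in relation]

def project_distinct(relation, columns):
    """Projection with duplicate elimination via sorting: O(n log n)."""
    rows = sorted(project(relation, columns))
    result = []
    for row in rows:
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not result or result[-1] != row:
            result.append(row)
    return result

def cartesian_product(r, s):
    """Every tuple of r paired with every tuple of s: O(n^2) for equal sizes."""
    return [a + b for a in r for b in s]

# Hypothetical sample rows (eNo, pNo).
ASG = [("E1", "P1"), ("E1", "P2"), ("E2", "P1")]
print(project(ASG, [0]))
print(project_distinct(ASG, [0]))
print(len(cartesian_product(ASG, ASG)))
```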

22 Characterization of Query Processors

23 Types of Optimization
Exhaustive search: cost every strategy to find the optimal one; may be very costly when there are many options and more fragments
Heuristics

24 Optimization Timing
Static: during compilation
Sizes of intermediate tables are not always known
Cost is justified by repeated execution
Dynamic: during execution
Sizes of intermediate tables are known
Re-optimization may be required

25 Statistics
Relation/Fragment: cardinality, size of a tuple, fraction of tuples participating in a join with another relation
Attribute: cardinality of domain, actual number of distinct values

26 Decision Sites
Centralized: simple, but needs knowledge about the entire distributed database
Distributed: cooperation among sites to determine the schedule; needs only local information
Hybrid: one site determines the global schedule, and each site optimizes its local subqueries

27 Other factors
Network topology
Replicated fragments
Use of semijoins
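As an illustration of why semijoins matter for communication cost, the sketch below compares shipping a whole relation with shipping only the tuples that survive a semijoin. The scenario (EMP at one site, the manager assignments at another, cost counted in attribute values transferred), the sample rows and the function names are all assumptions of this sketch, not taken from the lecture.

```python
# Hypothetical sample data: EMP lives at site A, the manager tuples at site B.
EMP = [("E1", "Ann", "Engineer"), ("E2", "Bob", "Analyst"),
       ("E3", "Cho", "Programmer"), ("E4", "Dee", "Engineer")]
ASG_MANAGERS = [("E1", "P1", "Manager", 12), ("E3", "P2", "Manager", 6)]

def semijoin(left, right, left_key=0, right_key=0):
    """left semijoin right: tuples of left that have a match in right."""
    keys = {r[right_key] for r in right}
    return [l for l in left if l[left_key] in keys]

def values_shipped_whole(emp):
    """Ship all of EMP from site A to site B."""
    return sum(len(t) for t in emp)

def values_shipped_with_semijoin(emp, asg):
    """Ship the join keys B -> A, reduce EMP at A, ship the reduced EMP A -> B."""
    keys_sent = {a[0] for a in asg}
    reduced = semijoin(emp, asg)
    return len(keys_sent) + sum(len(t) for t in reduced)

print(values_shipped_whole(EMP))
print(values_shipped_with_semijoin(EMP, ASG_MANAGERS))
```

The saving depends on how selective the join is: if most EMP tuples had a matching assignment, sending the keys first would add cost rather than remove it.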

28 SQL Query on Distributed Relations
→ QUERY DECOMPOSITION (uses GLOBAL SCHEMA)
→ Algebraic Query on Distributed Relations
→ DATA LOCALIZATION (uses FRAGMENT SCHEMA)
→ Fragment Query
→ GLOBAL OPTIMIZATION (uses STATISTICS ON FRAGMENTS)
→ Optimized Fragment Query with Communication Operations
→ LOCAL OPTIMIZATION
→ Optimized Local Query

