Presentation is loading. Please wait.

Presentation is loading. Please wait.

Distributed Database Management Systems

Similar presentations


Presentation on theme: "Distributed Database Management Systems"— Presentation transcript:

1 Distributed Database Management Systems
Lecture 35

2 In the previous lecture
Query Optimization Centralized QO Best access path Join Processing QO in Distributed Environment.

3 In this lecture Query Optimization Fragmented Queries
Joins replaced by Semijoins Three major QO algorithms.

4 Semijoin based Algorithms

5 Reduces cost of join queries
Semijoin is ……. Join of two relations can be replaced SJ of one or both relations.

6 Which one? Need to estimate costs. So R ⋈A S can be replaced:
(R ⋉A S) ⋈A S R ⋈A (S ⋉A R) (R ⋉A S) ⋈A (S ⋉A R) Which one? Need to estimate costs.

7 Same Assumptions: R at site 1, S at site 2 Size (R) < Size (S), so
A (S)  site 1 Site1 computes R’ = R ⋉A S’ R’  site 2 Site2 computes R’ ⋈A S

8 Ignoring Tmsg semijoin is better if
Size(A(S)) + size(R ⋉A S) < size(R) Join is better if …..- Semijoin is better if…..-.

9 SJ with more than two tables Will be more complex
Semijoin approach can be applied to each individual join, consider EMP ⋈ ASG ⋈ PROJ

10 EMP ⋈ ASG ⋈ PROJ = EMP’ ⋈ ASG’ ⋈ PROJ where EMP’ = EMP ⋉ ASG and ASG’ = ASG ⋉ PROJ rather EMP” = EMP ⋉ (ASG ⋉ PROJ)

11 Many SJ expressions possible for a relation
“Full reducer” a SJ expression that reduces R the maximum Not exists for cyclic queries.

12 Select eName From EMP, ASG, PROJ Where
EMP.eNo = ASG.eNo and ASG.eNo = PROJ.eNo and EMP.city = PROJ.city

13 Cyclic Query Tree Query city eNo pNo eNo, city pNo, city ASG EMP PROJ

14 Full Reducer may be hard to find.
Easy for a chained query Most systems use single SJs to reduce relation size.

15 Distributed Query Processing Algorithms

16 Three main representative algos are
Distributed INGRES Algorithm R* Algorithm SDD-1 Algorithm.

17 R* Algorithm Static, exhaustive
Algorithm supports fragmentation, actual implementation doesn’t Master, execution and apprentice sites.

18 Optimizer of Master site makes inter-site decisions
Apprentice sites make local decisions Optimizes local processing time & communication time.

19 Optimizer, based on stats of DB and size of iterm results, decides about
Join Ordering Join Algo (nested/mergeJoin) Access path (indexed/seq.).

20 Inter-site transfers Ship-whole Fetch-as-needed
Entire relation transferred Stored in a temp relation In case of merge-join approach, tuples can be processed as they arrive Fetch-as-needed nExternal relation is sequentially scanned Join attribute value is sent to other relation Relevant tuples scanned at other site and sent to first site.

21 Inter-site transfers: comparison
Ship-whole larger data transfer smaller number of messages better if relations are small Fetch-as-needed number of messages = O(cardinality of external relation) data transfer per message is minimal better if relations are large and the join selectivity is good.

22 Example, join of an external relation R with an internal relation S, there are four strategies.

23 1-Move outer relation tuples to the site of the inner relation
Can be joined as they arrive Total Cost = LT (retrieve card(R) tuples from R) + CT (size(R)) + LT (retrieve s tuples from S) * card (R)

24 2- Move inner relation to the site of outer relation
cannot join as they arrive; they need to be stored Total Cost = LT (retrieve card(S) tuples from S) + CT (size (S)) + LT (store card(S) tuples as T) + LT (retrieve card(R) tuples from R) + LT (retrieve s tuples from T) * card (R).

25 3- Fetch inner tuples as needed
For each tuple in R, send join attribute value to site of S Retrieve matching inner tuples at site S Send the matching S tuples to site of R Join as they arrive

26 Total Cost = LT (retrieve card(R) tuples from R)+ CT (length(A) * card (R)) + LT(retrieve s tuples from S) * card(R) + CT (s * length(S)) * card(R)

27 4- Move both inner and outer relations to another site
Example: A query consisting join of PROJ (ext) and ASG (int) on pNo Four strategies

28 1- Ship PROJ to site of ASG
2- Ship ASG to site of PROJ 3- Fetch ASG tuples as needed for each tuple of PROJ 4- Move both to a third site Optimization involves costing for each possibility.

29 That is it regarding R* algorithm for distributed query optimization
Lets review it.

30 SDD-1 Algorithm System for Distributed Databases A non-commercial database.

31 Based on the Hill Climbing Algorithm
No semijoins, No rep/frag Cost of transferring the result to the user site from the final result site is not considered Can minimize either total time or response time

32 Input include Query Graph Locations of relations
Relations’ statistics.

33 1- Do the initial local processing
2- Select the initial best plan (ES0) Calculate cost of moving all relations to a single site Plan with the least cost is ES0

34 3- Split ES0 into ES1 and ES2 ES1: Sending one of the relation to other site, relations joined there ES2:Sending the result back to site in ES0.

35 5- Recursively apply step 3 and 4 on ES1 and ES2, until no improvement
4- Replace ES0 with ES1 and ES2 when we should have cost(ES1) + cost(local join) + cost (ES2) < cost (ES0) 5- Recursively apply step 3 and 4 on ES1 and ES2, until no improvement

36 Example “Find the salaries of engineers working on CAD/CAM project” Involves EMP, PAY, PROJ and ASG sal(PAY ⋈title(EMP ⋈eNo(ASG ⋈pNo(pName = ‘CAD/CAM’ (PROJ)))))

37 Assume Tmsg = 0 and TTR = 1 Length of a tuple is 1
Relation Size Site EMP PAY PROJ ASG 8 4 1 10 2 3 Assume Tmsg = 0 and TTR = 1 Length of a tuple is 1 So size(R) = card(R)

38 Site 1 Considering only transfers costs PAY  site 1 = 4
PROJ  site 1 = 1 ASG  site 1 = 10 Total = 15

39 Assume Tmsg = 0 and TTR = 1 Length of a tuple is 1
Relation Size Site EMP PAY PROJ ASG 8 4 1 10 2 3 Assume Tmsg = 0 and TTR = 1 Length of a tuple is 1 So size(R) = card(R)

40 Site 1 Considering only transfers costs PAY  site 1 = 4
PROJ  site 1 = 1 ASG  site 1 = 10 Total = 15

41 Cost for site 2 = 19 Cost for site 3 = 22 Cost for site 4 = 13 So site 4 is our ES0 Move all relations to site 4.

42 Thanks


Download ppt "Distributed Database Management Systems"

Similar presentations


Ads by Google