PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.


1 PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University

2 Lecture 06 Distributed Database Design

3 Outline
Distributed Database Design
- Distributed Design Problem
- Distributed Design Issues
  o Fragmentation
  o Data Allocation

4 Distributed Design Problem
The design of a distributed computer system involves
- making decisions on the placement of data and programs across the sites of a computer network
- possibly designing the network itself
In the case of distributed DBMSs, the distribution of applications involves two things:
- distribution of the distributed DBMS software
- distribution of the application programs that run on it
Neither is a significant problem here:
- Assume a copy of the distributed DBMS software exists at each site where data are stored.
- Assume the network has already been designed.
We therefore concentrate on the distribution of data.

5 Distribution Design Issues
The following set of interrelated questions covers the entire issue:
- Why fragment at all?
- How should we fragment?
- How much should we fragment?
- How to test correctness?
- How should we allocate?
- What is the necessary information for fragmentation and allocation?

6 Reasons for Fragmentation
The important issue is the appropriate unit of distribution.
- A relation is not a suitable unit, for a number of reasons.
  o First, application views are usually subsets of relations.
  o Therefore, the locality of accesses of applications is defined not on entire relations but on their subsets.
  o Hence it is natural to consider subsets of relations as distribution units.

7 Reasons for Fragmentation
If a whole relation is the unit of distribution, two cases arise.
If the relation is not replicated and is stored at only one site,
- this results in an unnecessarily high volume of remote data accesses.
If the relation is replicated at all or some of the sites where the applications reside,
- there may be unnecessary replication, which causes problems in executing updates
- and may not be desirable if storage is limited.
Finally, the decomposition of a relation into fragments, each being treated as a unit, permits a number of transactions to execute concurrently.
- Thus fragmentation typically increases the level of concurrency and therefore the system throughput.

8 Fragmentation Alternatives
Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones. There are clearly two alternatives for this:
- dividing it horizontally, or
- dividing it vertically.

9 Fragmentation Alternatives – Horizontal
PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York

PROJ2
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

Example of Horizontal Partitioning
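As a rough illustration of the horizontal split above, the following Python sketch applies the two budget predicates to the PROJ tuples from the slide. The helper name `select` and the tuple layout are assumptions made for this example only.

```python
# PROJ tuples from the slide: (PNO, PNAME, BUDGET, LOC)
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Horizontal fragments defined by the budget threshold used on the slide.
PROJ1 = select(PROJ, lambda t: t[2] < 200000)
PROJ2 = select(PROJ, lambda t: t[2] >= 200000)

print([t[0] for t in PROJ1])  # ['P1', 'P2']
print([t[0] for t in PROJ2])  # ['P3', 'P4', 'P5']
```

Every tuple keeps all of its attributes; only the set of rows differs between the fragments.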

10 Fragmentation Alternatives – Vertical
PROJ1: information about project budgets
PROJ2: information about project names and locations

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston

Example of Vertical Partitioning
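The vertical split can be sketched the same way: each fragment is a projection that keeps the key PNO, and joining the fragments on that key gives back the original rows. The helper name `project` and the dictionary layout are assumptions of this sketch.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]

def project(relation, attrs):
    """Relational projection: keep only the named attributes of each row."""
    return [{a: row[a] for a in attrs} for row in relation]

# Both fragments repeat the primary key so the relation can be rebuilt.
PROJ1 = project(PROJ, ["PNO", "BUDGET"])
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])

# Reconstruction: join the two fragments on the shared key PNO.
by_key = {row["PNO"]: row for row in PROJ2}
rebuilt = [{**row, **by_key[row["PNO"]]} for row in PROJ1]

print(rebuilt == PROJ)  # True
```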

11 Correctness of Fragmentation
Completeness
- A decomposition of relation $R$ into fragments $R_1, R_2, \ldots, R_n$ is complete if and only if each data item in $R$ can also be found in some $R_i$.
- This property is identical to the lossless decomposition property of normalization.
- It ensures that the data in a global relation are mapped into fragments without any loss.

12 Correctness of Fragmentation
Reconstruction
- If relation $R$ is decomposed into fragments $F_R = \{R_1, R_2, \ldots, R_n\}$, then there should exist some relational operator $\nabla$ such that $R = \nabla R_i, \ \forall R_i \in F_R$.
- The reconstructability of the relation from its fragments ensures that constraints defined on the data in the form of dependencies are preserved.

13 Correctness of Fragmentation
Disjointness
- If relation $R$ is horizontally decomposed into fragments $F_R = \{R_1, R_2, \ldots, R_n\}$, and data item $d_j$ is in $R_j$, then $d_j$ should not be in any other fragment $R_k$ ($k \neq j$).
- This criterion ensures that the horizontal fragments are disjoint.
- If relation $R$ is vertically decomposed, its primary key attributes are typically repeated in all its fragments (for reconstruction).
- Therefore, in the case of vertical partitioning, disjointness is defined only on the non-primary key attributes of a relation.
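For horizontal fragments, the three correctness rules (completeness, reconstruction, disjointness) can be checked mechanically. The sketch below is one possible way to do so, assuming relations are lists of hashable tuples and that the reconstruction operator is set union; the function names are invented for this example.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in some fragment."""
    return all(any(t in frag for frag in fragments) for t in relation)

def is_reconstructible(relation, fragments):
    """Reconstruction (horizontal case): the union of the fragments equals the relation."""
    return set().union(*fragments) == set(relation)

def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    total = sum(len(set(frag)) for frag in fragments)
    return total == len(set().union(*fragments))

# PROJ tuples: (PNO, PNAME, BUDGET, LOC)
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]
low = [t for t in PROJ if t[2] < 200000]
high = [t for t in PROJ if t[2] >= 200000]

good = [low, high]
print(is_complete(PROJ, good), is_reconstructible(PROJ, good), is_disjoint(good))
# True True True

overlapping = [PROJ[:3], PROJ[2:]]   # P3 is stored twice
print(is_disjoint(overlapping))      # False

print(is_complete(PROJ, [low]))      # False: high-budget projects are lost
```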

14 Allocation Alternatives
Non-replicated
- partitioned: each fragment resides at only one site
Replicated
- fully replicated: each fragment at each site
- partially replicated: each fragment at some of the sites

15 Information Requirements
The information needed for distribution design can be divided into four categories:
- Database information
- Application information
- Communication network information
- Computer system information

16 Fragmentation
- Horizontal Fragmentation (HF)
- Vertical Fragmentation (VF)
- Hybrid Fragmentation

17 Horizontal Fragmentation (HF)
Horizontal fragmentation partitions a relation along its tuples.
- Each fragment has a subset of the tuples of the relation.
There are two versions of horizontal partitioning:
- Primary horizontal fragmentation
- Derived horizontal fragmentation
Primary horizontal fragmentation of a relation is performed
- using predicates that are defined on that relation.
Derived horizontal fragmentation is the partitioning of a relation
- that results from predicates being defined on another relation.

18 Information Requirements of HF
Database Information
- how the database relations are connected to one another, especially with joins
- in the relational model, these relationships are also depicted as relations
- cardinality of each relation: card(R)

Relations:
  SKILL(TITLE, SAL)
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET, LOC)
  ASG(ENO, PNO, RESP, DUR)
Links (owner -> member):
  L1: SKILL -> EMP
  L2: EMP -> ASG
  L3: PROJ -> ASG
Expression of Relationships Among Relations Using Links

19 Information Requirements of HF
Application Information
- simple predicates: Given $R[A_1, A_2, \ldots, A_n]$, a simple predicate $p_j$ is
  $p_j : A_i \ \theta \ Value$
  where $\theta \in \{=, <, \leq, >, \geq, \neq\}$, $Value \in D_i$, and $D_i$ is the domain of $A_i$.
  For relation $R$ we define $Pr = \{p_1, p_2, \ldots, p_m\}$.
Example:
  PNAME = "Maintenance"
  BUDGET ≤ 200000

20 Information Requirements of HF
Application Information
- minterm predicates: Given $R$ and $Pr = \{p_1, p_2, \ldots, p_m\}$, define $M = \{m_1, m_2, \ldots, m_z\}$ as
  $M = \{ m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^* \}, \ 1 \leq j \leq m, \ 1 \leq i \leq z$
  where $p_j^* = p_j$ or $p_j^* = \neg(p_j)$.
Example:
  $m_1$: PNAME = "Maintenance" ∧ BUDGET ≤ 200000
  $m_2$: NOT(PNAME = "Maintenance") ∧ BUDGET ≤ 200000
  $m_3$: PNAME = "Maintenance" ∧ NOT(BUDGET ≤ 200000)
  $m_4$: NOT(PNAME = "Maintenance") ∧ NOT(BUDGET ≤ 200000)
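The construction of minterm predicates can be sketched in Python: each simple predicate is taken either in its natural or its negated form, and all combinations are enumerated. The names `SIMPLE` and `minterm_predicates` are made up for this sketch, and the enumeration order is not meant to match the numbering of the minterms on the slide.

```python
from itertools import product

# Simple predicates from the slide, as (label, test) pairs over a dict-shaped tuple.
SIMPLE = [
    ('PNAME="Maintenance"', lambda t: t["PNAME"] == "Maintenance"),
    ("BUDGET<=200000", lambda t: t["BUDGET"] <= 200000),
]

def minterm_predicates(simple):
    """Return every conjunction in which each simple predicate is kept or negated."""
    minterms = []
    for signs in product([True, False], repeat=len(simple)):
        label = " AND ".join(
            name if keep else f"NOT({name})"
            for (name, _), keep in zip(simple, signs)
        )

        def holds(t, signs=signs):
            # The minterm holds when every simple predicate has the required truth value.
            return all(test(t) == keep for (_, test), keep in zip(simple, signs))

        minterms.append((label, holds))
    return minterms

M = minterm_predicates(SIMPLE)
for label, _ in M:
    print(label)

# Any given tuple satisfies exactly one minterm.
row = {"PNAME": "Maintenance", "BUDGET": 310000}
print([label for label, holds in M if holds(row)])
# ['PNAME="Maintenance" AND NOT(BUDGET<=200000)']
```

With m simple predicates this produces 2^m minterms, which is why contradictory ones are eliminated before fragments are defined.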

21 Information Requirements of HF
Application Information
In terms of quantitative information about user applications, we need to have two sets of data.
- minterm selectivities: sel($m_i$)
  o The number of tuples of the relation that would be accessed by a user query which is specified according to a given minterm predicate $m_i$.
- access frequencies: acc($q_i$)
  o The frequency with which a user application $q_i$ accesses data.
  o Access frequency for a minterm predicate can also be defined.

22 Primary Horizontal Fragmentation
Definition: A primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema. Therefore, given relation $R$, its horizontal fragments are given by
  $R_i = \sigma_{F_i}(R), \ 1 \leq i \leq w$
where $F_i$ is a selection formula used to obtain fragment $R_i$. If $F_i$ is in conjunctive normal form, it is a minterm predicate ($m_i$).
- A horizontal fragment $R_i$ of relation $R$ consists of all the tuples of $R$ which satisfy a minterm predicate $m_i$.
- Given a set of minterm predicates $M$, there are as many horizontal fragments of relation $R$ as there are minterm predicates.
- The set of horizontal fragments is also referred to as the set of minterm fragments.

23 Primary Horizontal Fragmentation
  $PROJ_1 = \sigma_{LOC = \text{"Montreal"}}(PROJ)$
  $PROJ_2 = \sigma_{LOC = \text{"New York"}}(PROJ)$
  $PROJ_3 = \sigma_{LOC = \text{"Paris"}}(PROJ)$
Primary Horizontal Fragmentation of Relation PROJ

24 PHF – Algorithm
Given: a relation $R$ and the set of simple predicates $Pr$
Output: the set of fragments of $R$, $\{R_1, R_2, \ldots, R_w\}$, which obey the fragmentation rules
Preliminaries:
- $Pr$ should be complete
- $Pr$ should be minimal

25 Completeness of Simple Predicates
A set of simple predicates $Pr$ is said to be complete if and only if there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to $Pr$.
Example:
- Assume PROJ[PNO, PNAME, BUDGET, LOC] has two applications defined on it.
  o (1) Find the budgets of projects at each location.
  o (2) Find projects with budgets less than $200000.

26 Completeness of Simple Predicates
According to (1),
  Pr = {LOC = "Montreal", LOC = "New York", LOC = "Paris"}
which is not complete with respect to (2).
Modify
  Pr = {LOC = "Montreal", LOC = "New York", LOC = "Paris", BUDGET ≤ 200000, BUDGET > 200000}
which is complete.

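27 Minimality of Simple Predicates
- If a predicate influences how fragmentation is performed (i.e., causes a fragment $f$ to be further fragmented into, say, $f_i$ and $f_j$), then there should be at least one application that accesses $f_i$ and $f_j$ differently.
- In other words, the simple predicate should be relevant in determining a fragmentation.
- If all the predicates of a set $Pr$ are relevant, then $Pr$ is minimal.
- Let $m_i$ and $m_j$ be two minterm predicates that are identical except that $m_i$ contains $p_i$ and $m_j$ contains $\neg p_i$, and let $f_i$ and $f_j$ be the fragments they define. Then $p_i$ is relevant if and only if
  $\frac{acc(m_i)}{card(f_i)} \neq \frac{acc(m_j)}{card(f_j)}$
  where $acc(m_i)$ is the access frequency of minterm $m_i$ and $card(f_i)$ is the cardinality of fragment $f_i$.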
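The relevance test can be sketched as a small function, assuming the usual textbook form of the criterion: a predicate is relevant when the access frequency per tuple, acc(m)/card(f), differs between the two fragments it separates. The function name and all numbers below are hypothetical.

```python
def is_relevant(acc_i, card_i, acc_j, card_j):
    """Return True when the access density of f_i differs from that of f_j.

    Compares acc(m_i)/card(f_i) with acc(m_j)/card(f_j) by cross-multiplying,
    which avoids floating-point division. Both cardinalities must be positive.
    """
    if card_i <= 0 or card_j <= 0:
        raise ValueError("fragment cardinalities must be positive")
    return acc_i * card_j != acc_j * card_i

# Hypothetical numbers: both halves are accessed at 2 accesses per tuple,
# so splitting on this predicate changes nothing for the applications.
print(is_relevant(acc_i=40, card_i=20, acc_j=60, card_j=30))   # False

# Hypothetical numbers: one half is accessed far more densely than the other.
print(is_relevant(acc_i=90, card_i=20, acc_j=10, card_j=30))   # True
```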

28 Minimality of Simple Predicates
Example:
  Pr = {LOC = "Montreal", LOC = "New York", LOC = "Paris", BUDGET ≤ 200000, BUDGET > 200000}
is minimal (in addition to being complete). However, if we add
  PNAME = "Instrumentation"
then Pr is not minimal, because no application accesses the resulting fragments any differently.

29 COM_MIN Algorithm
Given: a relation $R$ and a set of simple predicates $Pr$
Output: a complete and minimal set of simple predicates $Pr'$ for $Pr$
Rule 1: a relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.

30 COM_MIN Algorithm
- Initialization:
  o find a $p_i \in Pr$ such that $p_i$ partitions $R$ according to Rule 1
  o set $Pr' \leftarrow \{p_i\}$; $Pr \leftarrow Pr - \{p_i\}$; $F \leftarrow \{f_i\}$
- Iteratively add predicates to $Pr'$ until it is complete:
  o find a $p_j \in Pr$ such that $p_j$ partitions some $f_k$, defined according to a minterm predicate over $Pr'$, according to Rule 1
  o set $Pr' \leftarrow Pr' \cup \{p_j\}$; $Pr \leftarrow Pr - \{p_j\}$; $F \leftarrow F \cup \{f_j\}$
  o if $\exists p_k \in Pr'$ which is nonrelevant, then $Pr' \leftarrow Pr' - \{p_k\}$; $F \leftarrow F - \{f_k\}$

31 PHORIZONTAL Algorithm
Makes use of COM_MIN to perform fragmentation.
Input: a relation $R$ and a set of simple predicates $Pr$
Output: a set of minterm predicates $M$ according to which relation $R$ is to be fragmented
- $Pr' \leftarrow$ COM_MIN($R$, $Pr$)
- determine the set $M$ of minterm predicates
- determine the set $I$ of implications among $p_i \in Pr'$
- iteratively eliminate the contradictory minterms from $M$: $M \leftarrow M - \{m_i\}$

32 PHF – Example
Two candidate relations: PAY and PROJ.
Fragmentation of relation PAY
- Application: check the salary info and determine raise.
- Employee records are kept at two sites, so the application runs at two sites.
- Simple predicates
  $p_1$: SAL ≤ 30000
  $p_2$: SAL > 30000
  $Pr = \{p_1, p_2\}$, which is complete and minimal, so $Pr' = Pr$
- Minterm predicates
  $m_1$: (SAL ≤ 30000)
  $m_2$: NOT(SAL ≤ 30000) = (SAL > 30000)

33 PHF – Example
PAY1
TITLE        SAL
Mech. Eng.   27000
Programmer   24000

PAY2
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000

34 PHF – Example
Fragmentation of relation PROJ
- Applications:
  o (1) Find the name and budget of projects given their location. Issued at three sites.
  o (2) Access project information according to budget: one site accesses ≤ 200000, the other accesses > 200000.
- Simple predicates
  o For application (1):
    $p_1$: LOC = "Montreal"
    $p_2$: LOC = "New York"
    $p_3$: LOC = "Paris"
  o For application (2):
    $p_4$: BUDGET ≤ 200000
    $p_5$: BUDGET > 200000
- $Pr = Pr' = \{p_1, p_2, p_3, p_4, p_5\}$

35 PHF – Example
Fragmentation of relation PROJ (continued)
- Minterm fragments left after elimination
  $m_1$: (LOC = "Montreal") ∧ (BUDGET ≤ 200000)
  $m_2$: (LOC = "Montreal") ∧ (BUDGET > 200000)
  $m_3$: (LOC = "New York") ∧ (BUDGET ≤ 200000)
  $m_4$: (LOC = "New York") ∧ (BUDGET > 200000)
  $m_5$: (LOC = "Paris") ∧ (BUDGET ≤ 200000)
  $m_6$: (LOC = "Paris") ∧ (BUDGET > 200000)
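To see why only six of the 32 possible minterms over the five simple predicates survive, the sketch below enumerates every sign combination and keeps those that some tuple could satisfy. Note that this is a brute-force satisfiability check over representative values, not the implication-set procedure of the PHORIZONTAL algorithm; it also assumes that LOC can only take the three listed city values.

```python
from itertools import product

# The five simple predicates p1..p5, as (label, test) pairs over (loc, budget).
PREDICATES = [
    ("LOC=Montreal", lambda loc, budget: loc == "Montreal"),
    ("LOC=New York", lambda loc, budget: loc == "New York"),
    ("LOC=Paris", lambda loc, budget: loc == "Paris"),
    ("BUDGET<=200000", lambda loc, budget: budget <= 200000),
    ("BUDGET>200000", lambda loc, budget: budget > 200000),
]

# Assumed domains: three locations, and one representative budget on each
# side of the threshold (enough to exercise both budget predicates).
LOCS = ["Montreal", "New York", "Paris"]
BUDGETS = [200000, 200001]

def satisfiable(signs):
    """A minterm is kept if at least one (loc, budget) pair satisfies it."""
    return any(
        all(test(loc, b) == keep for (_, test), keep in zip(PREDICATES, signs))
        for loc, b in product(LOCS, BUDGETS)
    )

surviving = [
    signs
    for signs in product([True, False], repeat=len(PREDICATES))
    if satisfiable(signs)
]

print(len(surviving))  # 6
for signs in surviving:
    # Show only the predicates that appear in natural (non-negated) form.
    print([name for (name, _), keep in zip(PREDICATES, signs) if keep])
```

Each surviving minterm has exactly one location predicate and exactly one budget predicate in natural form, which matches the six conjunctions listed above once the redundant negated terms are dropped.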

36 PHF – Example
PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal

PROJ3
PNO  PNAME              BUDGET  LOC
P2   Database Develop.  135000  New York

PROJ4
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York

PROJ6
PNO  PNAME              BUDGET  LOC
P4   Maintenance        310000  Paris

The primary horizontal fragmentation of PROJ yields six fragments, FPROJ = {PROJ1, PROJ2, PROJ3, PROJ4, PROJ5, PROJ6}, according to the minterm predicates M. Since fragments PROJ2 and PROJ5 are empty, they are not depicted in the figure.

37 PHF – Correctness
Completeness
- Since $Pr'$ is complete and minimal, the selection predicates are complete.
Reconstruction
- If relation $R$ is fragmented into $F_R = \{R_1, R_2, \ldots, R_r\}$, then $R = \bigcup_{\forall R_i \in F_R} R_i$.
Disjointness
- Minterm predicates that form the basis of fragmentation should be mutually exclusive.

38 Derived Horizontal Fragmentation
Defined on a member relation of a link according to a selection operation specified on its owner.
- Each link is an equijoin.
- An equijoin can be implemented by means of semijoins.

Relations:
  SKILL(TITLE, SAL)
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET, LOC)
  ASG(ENO, PNO, RESP, DUR)
Links (owner -> member):
  L1: SKILL -> EMP
  L2: EMP -> ASG
  L3: PROJ -> ASG

39 DHF – Definition
Given a link $L$ where owner($L$) = $S$ and member($L$) = $R$, the derived horizontal fragments of $R$ are defined as
  $R_i = R \ltimes S_i, \ 1 \leq i \leq w$
where $w$ is the maximum number of fragments that will be defined on $R$, and
  $S_i = \sigma_{F_i}(S)$
where $F_i$ is the formula according to which the primary horizontal fragment $S_i$ is defined.

40 DHF – Example
Given link $L_1$ where owner($L_1$) = SKILL and member($L_1$) = EMP:
  $EMP_1 = EMP \ltimes SKILL_1$
  $EMP_2 = EMP \ltimes SKILL_2$
where
  $SKILL_1 = \sigma_{SAL \leq 30000}(SKILL)$
  $SKILL_2 = \sigma_{SAL > 30000}(SKILL)$

EMP1
ENO  ENAME      TITLE
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E7   R. Davis   Mech. Eng.

EMP2
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E8   J. Jones   Syst. Anal.
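The derived fragmentation of EMP can be reproduced with a small semijoin sketch. The EMP rows come from the fragments shown on this slide; the SKILL rows are an assumption, taken from the TITLE and SAL values of the PAY fragments in the earlier PHF example. The helper name `semijoin` is invented here.

```python
# Owner relation SKILL(TITLE, SAL); rows assumed to match the earlier PAY example.
SKILL = [
    ("Elect. Eng.", 40000),
    ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000),
    ("Programmer", 24000),
]

# Member relation EMP(ENO, ENAME, TITLE).
EMP = [
    ("E1", "J. Doe", "Elect. Eng."),
    ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."),
    ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."),
    ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."),
    ("E8", "J. Jones", "Syst. Anal."),
]

# Primary horizontal fragments of the owner.
SKILL1 = [s for s in SKILL if s[1] <= 30000]
SKILL2 = [s for s in SKILL if s[1] > 30000]

def semijoin(member, owner_fragment):
    """Keep the member tuples whose TITLE matches some tuple of the owner fragment."""
    titles = {title for title, _ in owner_fragment}
    return [e for e in member if e[2] in titles]

EMP1 = semijoin(EMP, SKILL1)
EMP2 = semijoin(EMP, SKILL2)

print([e[0] for e in EMP1])  # ['E3', 'E4', 'E7']
print([e[0] for e in EMP2])  # ['E1', 'E2', 'E5', 'E6', 'E8']
```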

41 DHF – Correctness
Completeness
- Referential integrity
  o ensures that the tuples of any fragment of the member relation are also in the owner relation.
  o Let $R$ be the member relation of a link whose owner is relation $S$, which is fragmented as $F_S = \{S_1, S_2, \ldots, S_n\}$. Furthermore, let $A$ be the join attribute between $R$ and $S$. Then, for each tuple $t$ of $R$, there should be a tuple $t'$ of $S$ such that $t[A] = t'[A]$.
Reconstruction
- For a relation $R$ with fragmentation $F_R = \{R_1, R_2, \ldots, R_w\}$, $R = \bigcup_{\forall R_i \in F_R} R_i$.
Disjointness
- Guaranteed when the join graphs between the owner and the member fragments are simple.

42 DHF – Correctness

43 Vertical Fragmentation
More difficult than horizontal, because more alternatives exist.
Example:
- In horizontal partitioning, if the total number of simple predicates in $Pr$ is $n$, there are $2^n$ possible minterm predicates that can be defined on it.
- Some of these will contradict the existing implications, further reducing the candidate fragments that need to be considered.
In the case of vertical partitioning, if a relation has $m$ non-primary key attributes, the number of possible fragments is equal to $B(m)$, the $m$th Bell number. For large values of $m$, $B(m) \approx m^m$:
- for $m = 10$, $B(m) \approx 115{,}000$
- for $m = 15$, $B(m) \approx 10^9$
- for $m = 30$, $B(m) \approx 10^{23}$
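The growth of the Bell numbers quoted above is easy to check. The sketch below computes B(m) with the Bell triangle; the function name `bell` is chosen for this example.

```python
def bell(m):
    """Return the m-th Bell number using the Bell triangle."""
    row = [1]
    for _ in range(m):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to the one before it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

print(bell(10))  # 115975
print(bell(15))  # 1382958545
print(bell(30))  # a 24-digit number, on the order of 10**23
```

Even at m = 15 there are more than a billion ways to group the attributes, which is why vertical fragmentation is usually approached with heuristics rather than exhaustive search.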

44 Hybrid Fragmentation
In most cases a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications. In this case a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree-structured partitioning. Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation. It has also been named mixed fragmentation or nested fragmentation.

45 Hybrid Fragmentation
Partitioning tree:
R
- HF -> R1
  o VF -> R11, R12
- HF -> R2
  o VF -> R21, R22, R23
It is also called mixed fragmentation or nested fragmentation.

46 Correctness of Hybrid Fragmentation
To reconstruct the original global relation in the case of hybrid fragmentation, one starts at the leaves of the partitioning tree and moves upward by performing joins and unions. The fragmentation is complete if the intermediate and leaf fragments are complete. Similarly, disjointness is guaranteed if intermediate and leaf fragments are disjoint.
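The bottom-up reconstruction can be illustrated with a two-level partitioning tree. In this sketch PROJ is first split horizontally by budget and each half is then split vertically; the particular attribute groupings and the helper names are assumptions of the example. Joins undo the vertical step and a union undoes the horizontal step.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]

def select(relation, predicate):
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    return [{a: row[a] for a in attrs} for row in relation]

def join_on_key(left, right, key):
    """Natural join of two vertical fragments that share the key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Level 1: horizontal fragmentation.
R1 = select(PROJ, lambda r: r["BUDGET"] < 200000)
R2 = select(PROJ, lambda r: r["BUDGET"] >= 200000)

# Level 2: vertical fragmentation of each horizontal fragment.
R11, R12 = project(R1, ["PNO", "BUDGET"]), project(R1, ["PNO", "PNAME", "LOC"])
R21, R22 = project(R2, ["PNO", "PNAME"]), project(R2, ["PNO", "BUDGET", "LOC"])

# Reconstruction: start at the leaves and move upward.
R1_rebuilt = join_on_key(R11, R12, "PNO")
R2_rebuilt = join_on_key(R21, R22, "PNO")
R_rebuilt = R1_rebuilt + R2_rebuilt   # union of disjoint horizontal fragments

print(sorted(R_rebuilt, key=lambda r: r["PNO"]) == PROJ)  # True
```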

47 Allocation
Allocation Problem
Given
- $F = \{F_1, F_2, \ldots, F_n\}$ fragments
- $S = \{S_1, S_2, \ldots, S_m\}$ network sites
- $Q = \{q_1, q_2, \ldots, q_q\}$ applications running on the sites
the allocation problem involves finding the "optimal" distribution of $F$ to $S$.
Optimality can be defined with respect to two measures:
- Minimal cost. The cost function consists of
  o the cost of storing each $F_i$ at a site $S_j$,
  o the cost of querying $F_i$ at site $S_j$,
  o the cost of updating $F_i$ at all sites where it is stored,
  o the cost of data communication.
- Performance
  o minimize the response time.
  o maximize the system throughput at each site.

48 Allocation Model
General Form
  min(Total Cost)
  subject to
  - response time constraint
  - storage constraint
  - processing constraint
Decision Variable
  $x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$

49 Allocation Model
Total Cost
  $\sum_{\text{all queries}} \text{query processing cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{cost of storing a fragment at a site}$
Storage Cost (of fragment $F_j$ at $S_k$)
  (unit storage cost at $S_k$) × (size of $F_j$) × $x_{jk}$
Query Processing Cost
  We choose a different approach in our model of the database allocation problem (DAP) and specify it as consisting of the processing cost (PC) and the transmission cost (TC). Thus the query processing cost (QPC) for application $q_i$ is:
  processing component + transmission component

50 Allocation Model
Query Processing Cost
- The processing component (PC) consists of three cost factors:
  access cost (AC) + integrity enforcement cost (IE) + concurrency control cost (CC)
- Access cost
  $\sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{no. of update accesses} + \text{no. of read accesses}) \times x_{ij} \times \text{local processing cost at a site}$
  o The first two terms calculate the number of accesses of user query $q_i$ to fragment $F_j$.
  o We assume that the local costs of processing them are identical.
  o The summation gives the total number of accesses for all the fragments referenced by $q_i$. Multiplication by $LPC_k$ gives the cost of this access at site $S_k$.
  o We again use $x_{ij}$ to select only those cost values for the sites where fragments are stored.
- Integrity enforcement and concurrency control costs
  o Can be calculated similarly.

51 Allocation Model
Query Processing Cost
Transmission component
  cost of processing updates + cost of processing retrievals
- In update queries it is necessary to inform all the sites where replicas exist, while in retrieval queries it is sufficient to access only one of the copies.
- In addition, at the end of an update request there is no data transmission back to the originating site other than a confirmation message, whereas retrieval-only queries may result in significant data transmission.
- Cost of updates
  $\sum_{\text{all sites}} \sum_{\text{all fragments}} \text{update message cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{acknowledgment cost}$
- Retrieval cost
  $\sum_{\text{all fragments}} \min_{\text{all sites}} (\text{cost of retrieval command} + \text{cost of sending back the result})$
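The asymmetry between updates and retrievals can be made concrete with a toy calculation for a single fragment. The link-cost matrix, message sizes, and function names below are all hypothetical; the only point is that updates sum over every site holding a copy, while retrievals take the minimum over those sites.

```python
# Hypothetical cost of moving one unit of data between sites 0, 1 and 2.
LINK = [
    [0, 4, 10],
    [4, 0, 3],
    [10, 3, 0],
]

def update_cost(origin, holders, update_size, ack_size=1):
    """An update must reach every site holding a copy, and each one acknowledges."""
    return sum(
        LINK[origin][site] * update_size + LINK[site][origin] * ack_size
        for site in holders
    )

def retrieval_cost(origin, holders, command_size, result_size):
    """A retrieval only needs the cheapest single copy."""
    return min(
        LINK[origin][site] * command_size + LINK[site][origin] * result_size
        for site in holders
    )

holders = [1, 2]  # sites storing the fragment; the query originates at site 0
print(update_cost(0, holders, update_size=5))                      # 84
print(retrieval_cost(0, holders, command_size=1, result_size=10))  # 44
```

Adding a replica can only lower or keep the retrieval cost, but it always adds a term to the update cost, which is the basic trade-off behind replication decisions.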

52 Allocation Model
Constraints
- Response time
  execution time of query ≤ max. allowable response time for that query
- Storage constraint (for a site)
  $\sum_{\text{all fragments}} \text{storage requirement of a fragment at that site} \leq \text{storage capacity at that site}$
- Processing constraint (for a site)
  $\sum_{\text{all queries}} \text{processing load of a query at that site} \leq \text{processing capacity of that site}$
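For very small instances the allocation model can be explored by brute force. The sketch below is a deliberately simplified stand-in for the model above: it keeps the storage cost and the storage constraint, replaces the full query processing cost with a flat local or remote cost per access, and ignores the response time and processing constraints. All numbers and names are hypothetical. A placement is represented as a tuple of holder sites per fragment, so x_ij = 1 exactly when site j appears in entry i.

```python
from itertools import product

SITES = [0, 1]
SIZE = [10, 20, 5]           # size of fragments F0, F1, F2 (hypothetical)
UNIT_STORAGE = [1.0, 2.0]    # unit storage cost at each site (hypothetical)
CAPACITY = [30, 30]          # storage capacity of each site (hypothetical)
LOCAL, REMOTE = 1.0, 3.0     # assumed cost of one local / one remote access

# Each query: (origin site, {fragment: reads}, {fragment: updates}).
QUERIES = [
    (0, {0: 5, 1: 2}, {0: 1}),
    (1, {1: 4, 2: 3}, {2: 2}),
]

def access(origin, site):
    return LOCAL if origin == site else REMOTE

def total_cost(x):
    """Storage cost plus a simplified query cost for placement x."""
    storage = sum(
        UNIT_STORAGE[k] * SIZE[j] for j, holders in enumerate(x) for k in holders
    )
    query = 0.0
    for origin, reads, updates in QUERIES:
        for j, n in reads.items():
            query += n * min(access(origin, k) for k in x[j])   # cheapest copy
        for j, n in updates.items():
            query += n * sum(access(origin, k) for k in x[j])   # every copy
    return storage + query

def feasible(x):
    """Storage constraint: fragments placed at a site must fit its capacity."""
    return all(
        sum(SIZE[j] for j, holders in enumerate(x) if k in holders) <= CAPACITY[k]
        for k in SITES
    )

PLACEMENTS = [(0,), (1,), (0, 1)]   # every non-empty set of sites for one fragment
candidates = [x for x in product(PLACEMENTS, repeat=len(SIZE)) if feasible(x)]
best = min(candidates, key=total_cost)
print(best, total_cost(best))
```

Exhaustive enumeration grows exponentially with the number of fragments and sites, so it is only useful as a way to understand the model on toy inputs.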

53 Thank You Slide 53

