1 Distributed Databases architecture, fragmentation, allocation Lecture 1.

1 Distributed Databases architecture, fragmentation, allocation Lecture 1

2 Centralized DBMS Software: Application SQL Front End Query Processor Transaction Proc. File Access P M... Simplifications: single front end one place to keep locks if processor fails, system fails, …..

3 Distributed DB Multiple processors, memories, and disks –Opportunity for parallelism (+) –Opportunity for enhanced reliability (+) –Synchronization issues (-) Heterogeneity and autonomy of “components” –Autonomy example: may not get statistics for query optimization from a site

4 Heterogeneity Select new investments Application RDBMS Files Stock ticker tape Portfolio History of dividends, ratios,...

5 Data management with multiple processors and possible autonomy, heterogeneity. Impacts: Data organization Query processing Access structures Concurrency control Recovery Big Picture

6 Outline Introductory topics –Database architectures –Distributed versus Parallel DB systems Distributed database design –Fragmentation –Allocation

7 Common DB architectures Shared memory PPP... M P M P M P M Shared disk

8 Common DB architectures P M... P M P M Shared nothing Number of other “hybrid” architectures are possible.

9 Selecting the “right” architecture Reliability Scalability Geographic distribution of data Performance Cost

10 Parallel vs. Distributed DB system Typically, parallel DBs: –Fast interconnect –Homogeneous software –Goals: High performance and Transparency Typically, distributed DBs: –Geographically distributed –Disconnected operation possible –Goal: Data sharing (heterogeneity, autonomy)

11 Typical query processing scenarios Parallel DB: –Distribute/partition/sort…. data to make certain DB operations (e.g., Join) fast Distributed DB: –Given data distribution, find query processing strategy to minimize cost (e.g. communication cost)

12 Distributed DB Design Top-down approach: have a database how to split and allocate to individual sites Multi-databases (or bottom-up): combine existing databases how to deal with heterogeneity & autonomy

13 Two issues in top-down design Fragmentation Allocation Note: issues not independent, but studied separately for simplicity.

14 Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E where loc=Sa where loc=Sb and… and... Motivation: Two sites: Sa, Sb Qa   Qb Sa Sb

15 # Name Loc Sal 5 7 8 Sa10 SallySb25 TomSa15 Joe 5 8 Sa10 TomSa15 Joe7Sb25Sally.. F = {F 1,F 2 } At Sa At Sb E F 1 =  loc=Sa (E)F 2 =  loc=Sb (E)  primary horizontal fragmentation

16 Fragmentation Horizontal Primary depends on local attributes R Derived depends on foreign relation Vertical R

17 Horizontal partitioning techniques Round robin Hash partitioning Range partitioning

18 Round robin RF 0 F 1 F 2t1 t2 t3t4...t5 Evenly distributes data Good for scanning full relation Not good for point or range queries

19 Hash partitioning RF 0 F 1 F 2 t1  h(k 1 )=2t1 t2  h(k 2 )=0t2 t3  h(k 3 )=0t3 t4  h(k 4 )=1t4... Good for point queries on key; also for joins Not good for range queries; point queries not on key Good hash function even distribution

20 Range partitioning RF 0 F 1 F 2 t1: A=5t1 t2: A=8t2 t3: A=2t3 t4: A=3t4... Good for some range queries on A Need to select good vector: else create imbalance  data skew  execution skew 47 partitionin g vector V 0 V 1

21 Which are good fragmentations? Example 2: F = {F 3,F 4 } F 1 =  sal 5 (E) Example 1: F = {F 1,F 2 } F 1 =  sal 20 (E) Problem: Some tuples lost! Tuples with 5 < sal < 10 are duplicated

22 Prefer to deal with replication explicitly Example: F = { F 5, F 6, F 7 } F 5 =  sal<=5 (E) F 6 =  5 < sal<10 (E) F 7 =  sal>=10 (E)  Then replicate F 6 if desired as part of allocation

23 Horizontal Fragmentation Desiderata R  F = {F 1,F 2,….} (1) Completeness  t  R,  F i  F such that t  F i (2) Disjointness F i  F j = Ø,  i,j such that i  j (3) Reconstruction   such that R =  F i i

24 Generating horizontal fragments Given simple predicates P r = {p 1, p 2,.. p m } and relation R. Generate “minterm” predicates M = {m | m =  p k *, 1  k  m}, where p k * is either p k or ¬ p k Eliminate useless minterms and simplify M to get M’. Generate fragments  m (R) for each m  M’.

25 Example: say queries use predicates A 5, Loc = S A, Loc = S B Eliminate and simplify minterms A 5  Loc = S A  Loc = S B A 5  Loc = S A  ¬(Loc = S B ) Final set of fragments (5 < A < 10)  (Loc = S A ) (5 < A < 10)  (Loc = S B ) (A  5)  (Loc = S A ) (A  5)  (Loc = S B ) (A  10)  (Loc = S A ) (A  10)  (Loc = S B ) Example Work out details for all minterms. 5 < A < 10

26 More on Horizontal Fragmentation Elimination of useless fragments/predicates depends on application semantics: –e.g.: if Loc  S A and  S B is possible, must retain fragments such as (5 <A < 10)  (Loc  S A )  (Loc  S B ) Minterm-based fragmentation generates complete, disjoint, and reconstructible fragments. Justify this statement.

27 Choosing simple predicates E (#,name,loc,sal,…) with common queries Qa: select * from E where loc = S A and… Qb: select * from E where loc = S B and… Qc: select * from E where sal < 10 and… Three choices for P r and hence F[P r ]: –P r = {} F 1 = F[P r ] = {E} –P r = {Loc = S A, Loc = S B } F 2 = F[P r ] = {  loc=SA (E),  loc=SB (E)} –P r = {Loc = S A, Loc = S B, Sal < 10} F 3 = F[P r ] = {  loc=SA  sal< 10 (E),  loc=SB  sal< 10 (E),  loc=SA  sal  10 (E),  loc=SA  sal  10 (E)}

28 Loc=S A  sal < 10 Loc=S A  sal  10 Loc=S B  sal < 10 Loc=S B  sal  10 F1F1 F3F3 F2F2 Q a : Select … loc = S A... Q b : Select … loc = S B... Prefer F 2 to F 1 and F 3

29 Desiderata for simple predicates Completeness Set of predicates P r is complete if for every F i  F[P r ], every t  F i has equal probability of access by every major application. Minimality Set of predicates P r is minimal if no P r ’  P r is complete. To get complete and minimal P r use predicates that are “relevant” in frequent queries Different from completeness of fragmentation

30 Derived horizontal fragmentation Example: Two relations Employee and Jobs E(#, NAME, SAL, LOC) J(#, DES,…) Fragment E into {E 1, E 2 } by LOC Common query: “Given employee name, list projects (s)he works in”

31 E1E1 (at S a ) (at S b ) E2E2 J

32 E1E1 (at S a ) (at S b ) E2E2 J1J1 J2J2 J 1 = J E 1 J 2 = J E 2

33 Derived horizontal fragmentation R, fragmented as F = {F 1, F 2, …, F n }  S, derive D = {D 1, D 2, …, D n } where D i =S F i Convention: R is called the owner relation S is called the member relation

34 Completeness of derived fragmentation J 1 U J 2  J (incomplete fragmentation) For completeness, enforce referential integrity constraint join attribute of member relation  joint attribute of owner relation Example: Say J is

35 E1E1 E2E2 J1J1 J J2J2 Fragmentation is not disjoint! Common way to enforce disjointness: make join attribute key of owner relation.

36 Vertical fragmentation E1E1 E E2E2 Example: R[T]  R 1 [T 1 ], R 2 [T 2 ],…, R n [T n ] T i  T  Just like normalization of relations

37 Properties R[T]  R i [T i ], i = 1.. n Completeness:  T i = T Reconstruction: R i = R (lossless join) –One way to guarantee lossless join: repeat key in each fragment, i.e., key  T i  i Disjointness: T i  T j = {key} –Check disjointness only on non-key attributes

38 E 1 (#,NM,LOC) E 2 (#,SAL) Example: E(#,NM,LOC,SAL)E 1 (#,NM) E 2 (#,LOC) E 3 (#,SAL) Which is the right vertical fragmentation? ….. Grouping Attributes

39 A 1 A 2 A 3 A 4 A 5 A 1 78 50 45 1 0 A 2 50 25 28 2 0 A 3 45 28 34 0 4 A 4 1 2 0 20 75 A 5 0 0 4 75 40 A i,j  a measure of how “often” A i and A j are accessed by the same query Hand constructed using knowledge of queries and their frequencies Attribute affinity matrix Cluster attributes based on affinity R 1 [K,A 1,A 2,A 3 ] R 2 [K,A 4,A 5 ]

40 Allocation Example: E  F 1 =  loc=Sa (E); F 2 =  loc=Sb (E) Site a Site b Fragment E Do we replicate fragments? Where do we place each copy of each fragment? Site c F1F1 F1F1 F2F2

41 Issues Origin of queries Communication cost and size of answers, relations, etc. Storage capacity, storage cost at sites, and size of fragments Processing power at the sites Query processing strategy –How are joins done? Where are answers collected? Fragment replication –Update cost, concurrency control overhead

42 Optimization problem What is the best placement of fragments and/or best number of copies to: –minimize query response time –maximize throughput –minimize “some cost” –... Subject to constraints –Available storage –Available bandwidth, processing power,… –Keep 90% of response time below X –... Very hard problem

1 Distributed Databases architecture, fragmentation, allocation Lecture 1.

Similar presentations

Presentation on theme: "1 Distributed Databases architecture, fragmentation, allocation Lecture 1."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Distributed Databases architecture, fragmentation, allocation Lecture 1.

Similar presentations

Presentation on theme: "1 Distributed Databases architecture, fragmentation, allocation Lecture 1."— Presentation transcript:

Similar presentations

About project

Feedback