V. Megalooikonomou Distributed Databases (based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.

1 V. Megalooikonomou Distributed Databases (based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS Dept. CIS616 – Principles of Data Management

2 Overview: Problem – motivation; Design issues; Query processing – semijoins; transactions (recovery, conc. control)

3 Problem – definition centralized DB: LA NY CHICAGO

4 Problem – definition distributed DB: DB stored in many places (sites)... connected LA NY

5 Problem – definition distributed DB: stored in many connected sites; DB typically geographically separated; separately administered; transactions are differentiated: Local vs. Global; possibly different DBMSs, DB schemas (heterogeneous). Sites: LA, NY

6 Problem – definition now: EMP at LA (DBMS1), EMPLOYEE at NY (DBMS2): connect to LA; exec sql select * from EMP; ... connect to NY; exec sql select * from EMPLOYEE; ...

7 Problem – definition ideally: a D-DBMS on top of DBMS1 (LA, EMP) and DBMS2 (NY, EMPLOYEE): connect to D-DBMS; exec sql select * from EMPL;

8 Pros + Cons ?

9 Pros + Cons Pros: data sharing; reliability & availability; autonomy (local); speed up of query processing. Cons: software development cost; more bugs; increased processing overhead (msg)

10 Overview: Problem – motivation; Design issues; Query processing – semijoins; transactions (recovery, conc. control)

11 Design of Distr. DBMS Homogeneous distr. DBs: identical DBMS; same DB schema; aware of one another; agree to cooperate in transaction processing. Heterogeneous distr. DBs: different DBMS; different DB schema; may not be aware of one another; may provide limited facilities for cooperation in transaction processing

12 Design of Distr. DBMS what are our choices of storing a table?

13 Design of Distr. DBMS replication (several copies of a table at different sites) fragmentation (horizontal; vertical; hybrid) or both…

14 Design of Distr. DBMS Replication: a copy of a relation is stored in two or more sites. Pros and cons: availability; increased parallelism (possible minimization of movement of data among sites); increased overhead on update (replicas should be consistent)

15 Design of Distr. DBMS
ssn | name | address
123 | smith | wall str.
...
234 | johnson | sunset blvd
horiz. fragm. (split by rows); vertical fragm. (split by columns)
Fragmentation: keep tuples/attributes at the sites where they are used the most; ensure that the table can be reconstructed
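A minimal sketch of the two fragmentation styles and of the "table can be reconstructed" requirement. The two rows come from the slide; the function names and the split predicate (ssn < 200) are assumptions made only for illustration.

```python
# Rows follow the slide's table; everything else here is hypothetical.
EMP = [
    {"ssn": 123, "name": "smith", "address": "wall str."},
    {"ssn": 234, "name": "johnson", "address": "sunset blvd"},
]

def horizontal_fragments(rows, predicate):
    """Split rows into the tuples that satisfy predicate and the rest."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest

def vertical_fragment(rows, key, attrs):
    """Keep the key plus the chosen attributes of every row."""
    return [{a: r[a] for a in (key, *attrs)} for r in rows]

def reconstruct_horizontal(*fragments):
    """Horizontal fragments are recombined with a union."""
    return [r for frag in fragments for r in frag]

def reconstruct_vertical(left, right, key):
    """Vertical fragments are recombined with a join on the shared key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]

# Horizontal: split by an (assumed) predicate on ssn.
h1, h2 = horizontal_fragments(EMP, lambda r: r["ssn"] < 200)
# Vertical: both fragments repeat the key so the join is lossless.
v_names = vertical_fragment(EMP, "ssn", ["name"])
v_addrs = vertical_fragment(EMP, "ssn", ["address"])

restored_h = sorted(reconstruct_horizontal(h1, h2), key=lambda r: r["ssn"])
restored_v = reconstruct_vertical(v_names, v_addrs, "ssn")
assert restored_h == EMP and restored_v == EMP
```

Repeating the key (ssn) in every vertical fragment is what makes the reconstruction join possible.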

16 Transparency & autonomy Issues/goals: naming and local autonomy; replication transparency; fragmentation transparency; location transparency; i.e.:

17 Problem – definition ideally: a D-DBMS on top of DBMS1 (LA, EMP) and DBMS2 (NY, EMPLOYEE): connect to D-DBMS; exec sql select * from EMPL;

18 Overview: Problem – motivation; Design issues; Query processing – semijoins; transactions (recovery, conc. control)

19 Distributed Query processing issues (additional to centralized q-opt): cost of transmission; parallelism / overlap of delays (cpu, disk, #bytes-transmitted, #messages-transmitted); minimize elapsed time? or minimize resource consumption?

20 Distributed Query processing Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: SUPPLIER Join SHIPMENT = ?

21 semijoins choice of plans? plan #1: ship SHIP -> S1; join; ship -> S3 plan #2: ship SHIP->S3; ship SUP->S3; join... others?

22 Distr. Q-opt – semijoins Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: SUPPLIER Join SHIPMENT = ?

23 Semijoins Idea: reduce the tables before shipping. Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: SUPPLIER Join SHIPMENT = ?

24 Semijoins How to do the reduction, cheaply? E.g., reduce ‘SHIPMENT’:

25 Semijoins Idea: reduce the tables before shipping. Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: SUPPLIER Join SHIPMENT = ? Shipped S1 -> S2: (s1, s2, s5, s11)

26 Semijoins Formally: SHIPMENT’ = SHIPMENT semijoin SUPPLIER; express semijoin w/ rel. algebra

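27 Semijoins Formally: SHIPMENT’ = SHIPMENT semijoin SUPPLIER; express semijoin w/ rel. algebra: the join projected back onto SHIPMENT’s attributes, i.e., SHIPMENT’ = project[s#, p#] (SHIPMENT join SUPPLIER)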
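The usual relational-algebra reading of a semijoin (take the join, then project back onto the left relation's attributes) can be sketched as below. The tuples follow the slides; the second SUPPLIER column stands in for the attributes the slide elides as "..." and its values are placeholders, and the function names are hypothetical.

```python
# (s#, placeholder) for SUPPLIER at S1; (s#, p#) for SHIPMENT at S2.
SUPPLIER = [("s1", "n1"), ("s2", "n2"), ("s5", "n5"), ("s11", "n11")]
SHIPMENT = [("s1", "p1"), ("s2", "p1"), ("s3", "p5"), ("s2", "p9")]

def join_on_first(left, right):
    """Natural join on the first attribute (s#) of both relations."""
    return [l + r[1:] for l in left for r in right if l[0] == r[0]]

def semijoin(left, right):
    """left semijoin right: the join, projected onto left's attributes."""
    if not left:
        return []
    width = len(left[0])
    projected = {row[:width] for row in join_on_first(left, right)}
    # Keep left's original order; drop tuples with no join partner.
    return [row for row in left if row in projected]

shipment_reduced = semijoin(SHIPMENT, SUPPLIER)
print(shipment_reduced)  # [('s1', 'p1'), ('s2', 'p1'), ('s2', 'p9')]
```

In a distributed setting only the s# column of SUPPLIER has to travel to S2 to compute this result, which is the point of the technique.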

28 Semijoins – e.g.: suppose each attr. is 4 bytes Q: transmission cost (#bytes) for semijoin SHIPMENT’ = SHIPMENT semijoin SUPPLIER

29 Semijoins Idea: reduce the tables before shipping. Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: SUPPLIER Join SHIPMENT = ? Shipped S1 -> S2: (s1, s2, s5, s11), 4 bytes each

30 Semijoins – e.g.: suppose each attr. is 4 bytes Q: transmission cost (#bytes) for semijoin SHIPMENT’ = SHIPMENT semijoin SUPPLIER A: 4*4 bytes

31 Semijoins – e.g.: suppose each attr. is 4 bytes Q1: give a plan, with semijoin(s) Q2: estimate its cost (#bytes shipped)

32 Semijoins – e.g.: A1: reduce SHIPMENT to SHIPMENT’; SHIPMENT’ -> S3; SUPPLIER -> S3; do join @ S3. Q2: cost?

33 Semijoins Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: join site. Shipped S1 -> S2: (s1, s2, s5, s11), 4 bytes each

34 Semijoins – e.g.: A2:
4*4 bytes - reduce SHIPMENT to SHIPMENT’
3*8 bytes - SHIPMENT’ -> S3
4*8 bytes - SUPPLIER -> S3
0 bytes - do join @ S3
72 bytes TOTAL
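The byte count above can be re-derived step by step. This is a small sketch under the slides' 4-bytes-per-attribute assumption; treating a SUPPLIER tuple as two attributes (8 bytes) is inferred from the "4*8 bytes" term, and the function and variable names are hypothetical.

```python
ATTR_BYTES = 4  # from the slides: each attribute is 4 bytes

# s# values of SUPPLIER (site S1) and the SHIPMENT tuples (site S2).
SUPPLIER_KEYS = ["s1", "s2", "s5", "s11"]
SHIPMENT = [("s1", "p1"), ("s2", "p1"), ("s3", "p5"), ("s2", "p9")]
SUPPLIER_ATTRS = 2  # assumption: s# plus one more attribute
SHIPMENT_ATTRS = 2  # (s#, p#)

def plan_a1_costs():
    """Return the bytes shipped by each step of plan A1."""
    keys = set(SUPPLIER_KEYS)
    # SHIPMENT' keeps only tuples whose s# appears in SUPPLIER.
    shipment_reduced = [t for t in SHIPMENT if t[0] in keys]
    return {
        "ship s# list S1 -> S2": len(SUPPLIER_KEYS) * ATTR_BYTES,
        "ship SHIPMENT' -> S3": len(shipment_reduced) * SHIPMENT_ATTRS * ATTR_BYTES,
        "ship SUPPLIER -> S3": len(SUPPLIER_KEYS) * SUPPLIER_ATTRS * ATTR_BYTES,
        "join at S3": 0,
    }

costs = plan_a1_costs()
for step, nbytes in costs.items():
    print(f"{nbytes:3d} bytes - {step}")
total = sum(costs.values())
print(f"{total:3d} bytes TOTAL")  # 72 bytes TOTAL
```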

35 Other plans?

36 P2: reduce SHIPMENT to SHIPMENT’; reduce SUPPLIER to SUPPLIER’; SHIPMENT’ -> S3; SUPPLIER’ -> S3

37 Other plans? P3: reduce SUPPLIER to SUPPLIER’; SUPPLIER’ -> S2; do join @ S2; ship results -> S3

38 A brilliant idea: two-way semijoins (not in book, not in final exam): reduce both relations with one more exchange [Kang, ’86]: ship back the list of keys that didn’t match. CAN NOT LOSE! (why?) Further improvement: or the list of ones that matched – whatever is shorter!

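39 Two-way Semijoins Site S1: SUPPLIER(s#, ...) = s1, s2, s5, s11. Site S2: SHIPMENT(s#, p#) = (s1, p1), (s2, p1), (s3, p5), (s2, p9). Site S3: join site. Shipped S1 -> S2: (s1, s2, s5, s11); shipped back S2 -> S1: (s5, s11)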
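A rough sketch of the two-way exchange, using the slide's data. The function name and the tagged reply format are assumptions made for illustration; this is not presented as the exact algorithm of the cited work.

```python
def two_way_semijoin(supplier_keys, shipment):
    """Reduce both relations with one forward and one backward message."""
    # Forward message (S1 -> S2): the s# values of SUPPLIER.
    sent = list(supplier_keys)
    sent_set = set(sent)
    shipment_reduced = [t for t in shipment if t[0] in sent_set]

    # S2 checks which of the received keys actually occur in SHIPMENT.
    shipment_keys = {t[0] for t in shipment_reduced}
    matched = [k for k in sent if k in shipment_keys]
    unmatched = [k for k in sent if k not in shipment_keys]

    # Backward message (S2 -> S1): whichever list is shorter, tagged so
    # S1 knows how to interpret it.
    if len(unmatched) <= len(matched):
        reply = ("unmatched", unmatched)
    else:
        reply = ("matched", matched)

    # Either reply lets S1 derive the same reduced key set.
    supplier_reduced = matched
    return shipment_reduced, supplier_reduced, reply

SUPPLIER_KEYS = ["s1", "s2", "s5", "s11"]
SHIPMENT = [("s1", "p1"), ("s2", "p1"), ("s3", "p5"), ("s2", "p9")]

shipment_r, supplier_r, reply = two_way_semijoin(SUPPLIER_KEYS, SHIPMENT)
print(reply)       # ('unmatched', ['s5', 's11'])
print(supplier_r)  # ['s1', 's2']
```

Because the shorter of the two lists is sent back, the return message never carries more than half of the keys in the forward message.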

40 Overview: Problem – motivation; Design issues; Query processing – semijoins; transactions (recovery, conc. control)

41 Transactions – recovery Problem: e.g., a transaction moves $100 from NY -> $50 to LA, $50 to Chicago; 3 sub-transactions, on 3 systems; how to guarantee atomicity (all-or-none)? Observation: additional types of failures (links, servers, delays, time-outs....)

42 Transactions – recovery Problem: e.g., a transaction moves $100 from NY -> $50 to LA, $50 to Chicago

43 Distributed recovery Sites: NY, CHICAGO, LA. Sub-transactions: T1,1: -$100 (at NY); T1,2: +$50; T1,3: +$50. How?

44 Distributed recovery Sites: NY, CHICAGO, LA. Sub-transactions: T1,1: -$100 (at NY); T1,2: +$50; T1,3: +$50. Step1: choose coordinator

45 Distributed recovery Step 2: execute a commit protocol, e.g., “2 phase commit”. When a transaction T completes execution (i.e., when all sites at which T has executed inform the transaction coordinator Ci that T has completed), Ci starts the 2PC protocol ->

46 2 phase commit timeline: T1,1 (coord.) -> T1,2, T1,3: prepare to commit

47 2 phase commit timeline: T1,1 (coord.) -> T1,2, T1,3: prepare to commit; replies to coord.: Y, Y

48 2 phase commit timeline: T1,1 (coord.) -> T1,2, T1,3: prepare to commit; replies to coord.: Y, Y; coord. -> T1,2, T1,3: commit

49 2 phase commit (e.g., failure) timeline: T1,1 (coord.) -> T1,2, T1,3: prepare to commit

50 2 phase commit timeline: T1,1 (coord.) -> T1,2, T1,3: prepare to commit; replies to coord.: Y, N

51 2 phase commit timeline: T1,1 (coord.) -> T1,2, T1,3: prepare to commit; replies to coord.: Y, N; coord. -> T1,2, T1,3: abort
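The two timelines (all votes Y leads to commit, any N leads to abort) can be simulated in a few lines. This is a minimal in-memory sketch: the class and function names are hypothetical, and it deliberately ignores logging, time-outs, and coordinator or link failures.

```python
from enum import Enum

class Vote(Enum):
    YES = "Y"
    NO = "N"

class Participant:
    """One sub-transaction site, e.g., T1,2 or T1,3 on the slides."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: answer the coordinator's "prepare to commit" message.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = decision

def two_phase_commit(participants):
    """Run both phases and return the global decision."""
    votes = [p.prepare() for p in participants]
    all_yes = all(v is Vote.YES for v in votes)
    decision = "committed" if all_yes else "aborted"
    for p in participants:
        p.finish(decision)
    return decision

ok_run = [Participant("T1,2"), Participant("T1,3")]
failed_run = [Participant("T1,2"), Participant("T1,3", can_commit=False)]
print(two_phase_commit(ok_run))      # committed
print(two_phase_commit(failed_run))  # aborted
```

A single N vote is enough to abort at every site, which is how the protocol preserves all-or-none behavior across the sub-transactions.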

52 Distributed recovery Many, many additional details (what if the coordinator fails? what if a link fails? etc.) and many other solutions (e.g., 3-phase commit)

53 Overview: Problem – motivation; Design issues; Query processing – semijoins; transactions (recovery, conc. control)

54 Distributed conc. control also more complicated: distributed deadlocks!

55 Distributed deadlocks Sites: NY, CHICAGO, LA. Sub-transactions: T 1,la; T 2,la; T 1,ny; T 2,ny

56 Distributed deadlocks Sites: LA, NY. Sub-transactions: T 1,la; T 2,la; T 1,ny; T 2,ny

57 Distributed deadlocks Sites: LA, NY. Sub-transactions: T 1,la; T 2,la; T 1,ny; T 2,ny

58 Distributed deadlocks Sites: LA, NY. Sub-transactions: T 1,la; T 2,la; T 1,ny; T 2,ny. Sites need to exchange wait-for graphs; clever algorithms, to reduce # messages
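Why sites must exchange wait-for graphs can be seen in a small sketch: each local graph is acyclic, yet their union contains a cycle. The specific edges (T1 waits for T2 at LA, T2 waits for T1 at NY) are an assumed scenario, since the slide's arrows did not survive in the transcript, and the function names are hypothetical.

```python
def merge_wait_for(*local_graphs):
    """Union the per-site wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged

def has_cycle(graph):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))

# Assumed scenario: an edge X -> Y means "X waits for Y".
la_graph = {"T1": {"T2"}}  # at LA, T1 waits for a lock held by T2
ny_graph = {"T2": {"T1"}}  # at NY, T2 waits for a lock held by T1

print(has_cycle(la_graph), has_cycle(ny_graph))        # False False
print(has_cycle(merge_wait_for(la_graph, ny_graph)))   # True
```

Shipping every local graph to one place is the naive approach; the "clever algorithms" mentioned on the slide aim to find such cycles with fewer messages.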

59 Conclusions Distr. DBMSs: not deployed. BUT: produced clever ideas: semijoins; distributed recovery / conc. control; which can be useful for: parallel db / clusters; ‘active disks’; replicated db (e-commerce servers)

