Parallel and distributed databases R & G Chapter 22.


1 Parallel and distributed databases R & G Chapter 22

2 What is a distributed database?

3 Why distribute a database? Scalability and performance: throughput, data size. Resilience to failures.

4 Why distribute a database? Data is already distributed, or needs to be distributed. Data is in multiple systems.

5 Why not distribute a database? You must earn your complexity! Communication needed: must build a complex infrastructure; unpredictable latencies must be masked. More types of failures: more components to fail; network failures (congestion, timeouts). More complex planning: communication cost plus I/O cost. May have to deal with heterogeneity: different types of systems; different schemas, possibly incompatible; different administrative domains.

6 Types of distributed databases

7 The old days: mainframes Definitely not distributed!

8 Client-server: user interaction, data processing, network.

9 Parallel database

10 Primary/secondary

11 Multidatabase

12 How do they work? What is shared? How to distribute the data? How to process the data? How to update the data?

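13 What is shared? Memory: the CPUs share RAM and disk (most modern DBMSs).

14 What is shared? Disk: each node has its own RAM, but the disks are shared (Oracle RAC).

15 What is shared? Nothing: each node has its own RAM and disk (search engines, Teradata).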

16 How to distribute the data? Rows spread across Server 1, Server 2, Server 3, Server 4:
Bike      $86     6/2/07    636353
Chair     $10     6/5/07    662113
Couch     $570    6/1/07    424252
Car       $1123   6/1/07    256623
Lamp      $19     6/7/07    121113
Bike      $56     6/9/07    887734
Scooter   $18     6/11/07   252111
Hammer    $8000   6/11/07   116458

17 How to distribute the data? Hash partitioning: Hash() of the key decides which server stores each (key,value). Range partitioning: a (key,value) goes to one server if key <= X and to another if key > X.
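
The two schemes on this slide can be sketched in a few lines of Python. This is an illustration only: the CRC32 hash, the function names, and the split points in `boundaries` are assumptions, not something the slides specify. The keys are the ID values from the example table.

```python
import bisect
import zlib

def hash_partition(key, num_servers):
    """Pick a server by hashing the key (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_servers

def range_partition(key, boundaries):
    """Pick a server by range: key <= boundaries[0] -> server 0,
    boundaries[0] < key <= boundaries[1] -> server 1, and so on."""
    return bisect.bisect_left(boundaries, key)

rows = [(636353, "Bike"), (662113, "Chair"), (424252, "Couch"), (256623, "Car")]
boundaries = [300000, 500000, 650000]  # assumed split points for 4 servers

for key, value in rows:
    print(key, value, hash_partition(key, 4), range_partition(key, boundaries))
```

Hash partitioning tends to spread keys evenly but scatters neighbouring keys; range partitioning keeps neighbouring keys together, which helps range queries but can create hot spots.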

18 How to distribute the data? Columns spread across Server 1, Server 2, Server 3, Server 4: (Bike, Chair, Couch, Car, Lamp, Bike, Scooter, Hammer); ($86, $10, $570, $1123, $19, $56, $18, $8000); (6/2/07, 6/5/07, 6/1/07, 6/1/07, 6/7/07, 6/9/07, 6/11/07, 6/11/07); (636353, 662113, 424252, 256623, 121113, 887734, 252111, 116458).

19 Query processing: intra-operator parallelism, inter-operator parallelism.

20 Parallel scanning filter Result
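
A minimal sketch of a parallel scan, assuming each server filters its own partition and the results are concatenated. Threads stand in for servers here, and the placement of rows in `partitions` is an assumption made for illustration (the rows themselves come from the example table).

```python
from concurrent.futures import ThreadPoolExecutor

def scan_partition(partition, predicate):
    """Runs on one server: filter only the local rows."""
    return [row for row in partition if predicate(row)]

def parallel_scan(partitions, predicate):
    """Scan all partitions concurrently and merge the filtered results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        chunks = pool.map(lambda p: scan_partition(p, predicate), partitions)
        return [row for chunk in chunks for row in chunk]

# (item, price, date, id); which rows sit on which server is assumed.
partitions = [
    [("Bike", 86, "6/2/07", 636353), ("Chair", 10, "6/5/07", 662113)],
    [("Couch", 570, "6/1/07", 424252), ("Car", 1123, "6/1/07", 256623)],
    [("Lamp", 19, "6/7/07", 121113), ("Bike", 56, "6/9/07", 887734)],
    [("Scooter", 18, "6/11/07", 252111), ("Hammer", 8000, "6/11/07", 116458)],
]

expensive = parallel_scan(partitions, lambda row: row[1] > 100)
print([row[0] for row in expensive])
```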

21 Sorting

22

23 Parallel hash join Hash()
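
One common way to read this slide: both inputs are repartitioned with the same Hash() on the join key, so matching rows land on the same server and each server can join its share independently. The sketch below assumes that reading; the function names and the `buyers` relation are hypothetical, and the per-server joins run in a plain loop rather than on real servers.

```python
from collections import defaultdict

def repartition(rows, key_index, num_servers):
    """Send each row to the server chosen by hashing its join key."""
    servers = [[] for _ in range(num_servers)]
    for row in rows:
        servers[hash(row[key_index]) % num_servers].append(row)
    return servers

def local_hash_join(left, right, left_key, right_key):
    """Classic build/probe hash join on one server."""
    table = defaultdict(list)
    for l in left:                       # build phase
        table[l[left_key]].append(l)
    return [l + r for r in right for l in table.get(r[right_key], [])]  # probe

def parallel_hash_join(left, right, left_key, right_key, num_servers=4):
    left_parts = repartition(left, left_key, num_servers)
    right_parts = repartition(right, right_key, num_servers)
    result = []
    for lp, rp in zip(left_parts, right_parts):  # each pair could run on its own server
        result.extend(local_hash_join(lp, rp, left_key, right_key))
    return result

items = [(636353, "Bike"), (662113, "Chair"), (424252, "Couch")]
buyers = [(636353, "alice"), (424252, "bob"), (999999, "carol")]  # hypothetical data
joined = sorted(parallel_hash_join(items, buyers, 0, 0))
print(joined)
```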

24 Join

25 Semi-join
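
A semi-join is typically used to cut communication cost: instead of shipping a whole remote table, the local site ships only its join keys, and the remote site returns just the rows that match. The sketch below assumes that usage; the names and the sample relations are hypothetical.

```python
def semi_join_reduce(remote_rows, remote_key, join_keys):
    """Runs at the remote site: keep only rows whose key appears in join_keys."""
    return [row for row in remote_rows if row[remote_key] in join_keys]

def distributed_join(local_rows, local_key, remote_rows, remote_key):
    # Step 1: ship only the distinct join keys to the remote site.
    join_keys = {row[local_key] for row in local_rows}
    # Step 2: the remote site ships back only the matching rows.
    shipped = semi_join_reduce(remote_rows, remote_key, join_keys)
    # Step 3: finish the join locally.
    index = {}
    for row in local_rows:
        index.setdefault(row[local_key], []).append(row)
    joined = [l + r for r in shipped for l in index[r[remote_key]]]
    return joined, len(shipped)

local = [(636353, "Bike"), (424252, "Couch")]
remote = [(636353, "alice"), (111111, "dave"), (222222, "erin"), (424252, "bob")]
joined_rows, rows_shipped = distributed_join(local, 0, remote, 0)
print(joined_rows, rows_shipped)
```

Here only two of the four remote rows cross the network, which is the point of the technique.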

26 Inter-operator parallelism

27 Updating distributed data. Synchronous: read-any-write-all. Reads are fast.
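
A minimal sketch of read-any-write-all, assuming in-memory dictionaries as replicas and a hypothetical `up` flag per replica to simulate disconnection. A read touches a single reachable replica, which is why reads are fast; a write must reach every replica.

```python
class ReadAnyWriteAll:
    def __init__(self, num_replicas):
        self.replicas = [{} for _ in range(num_replicas)]
        self.up = [True] * num_replicas  # simulated reachability

    def write(self, key, value):
        """A write must be applied at every replica, so one outage blocks it."""
        if not all(self.up):
            raise RuntimeError("write blocked: a replica is unreachable")
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        """A read can be served by any single reachable replica."""
        for replica, alive in zip(self.replicas, self.up):
            if alive:
                return replica.get(key)
        raise RuntimeError("no replica reachable")

store = ReadAnyWriteAll(3)
store.write("price", 86)
store.up[0] = False          # one replica disconnects
print(store.read("price"))   # still readable from another replica
```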

28 Updating distributed data. Synchronous: voting.

29 Updating distributed data. Synchronous: voting. Writes are tolerant to disconnection.
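
Voting is usually implemented with quorums: a write goes to w of n replicas and a read consults r of them, with r + w > n so every read quorum overlaps the latest write quorum. The sketch below assumes that scheme with version numbers; the class, its parameters, and the `up` flags are hypothetical.

```python
class QuorumStore:
    def __init__(self, n, w, r):
        if r + w <= n or 2 * w <= n:
            raise ValueError("need r + w > n and 2 * w > n so quorums overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]  # key -> (version, value)
        self.up = [True] * n                    # simulated reachability

    def _reachable(self):
        return [i for i in range(self.n) if self.up[i]]

    def write(self, key, value):
        alive = self._reachable()
        if len(alive) < self.w:
            raise RuntimeError("write quorum not available")
        # Any w reachable replicas include one that saw the previous write.
        version = 1 + max(self.replicas[i].get(key, (0, None))[0] for i in alive)
        for i in alive[: self.w]:
            self.replicas[i][key] = (version, value)

    def read(self, key):
        alive = self._reachable()
        if len(alive) < self.r:
            raise RuntimeError("read quorum not available")
        votes = [self.replicas[i].get(key, (0, None)) for i in alive[: self.r]]
        return max(votes, key=lambda v: v[0])[1]  # highest version wins

q = QuorumStore(n=3, w=2, r=2)
q.write("x", 1)
q.up[0] = False        # replica 0 disconnects; the write still succeeds
q.write("x", 2)
q.up[0], q.up[2] = True, False
print(q.read("x"))     # replica 0 is stale, but replica 1 has the new version
```

Compared with read-any-write-all, reads cost more (r replicas instead of one), but writes keep working while fewer than n - w + 1 replicas are disconnected.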

30 Consistency of distributed data: should provide ACID.

31 Primary/secondary

32 Two-phase commit PREPARE PREPARED COMMIT

33 Two-phase commit PREPARE PREPARED ABORT

34 Two-phase commit PREPARE PREPARED ABORT

35 Two-phase commit PREPARE PREPARED X
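
The message flow on the two-phase commit slides can be sketched as a coordinator loop. This is a simplified illustration: the `Participant` class and its fields are hypothetical, a participant that does not answer PREPARE is assumed to count as an ABORT vote, and coordinator failure (the harder case, where prepared participants are left waiting) is not modeled.

```python
class Participant:
    def __init__(self, name, vote="PREPARED", crashed=False):
        self.name, self.vote, self.crashed = name, vote, crashed
        self.state = "INIT"

    def prepare(self):
        """Reply to PREPARE with PREPARED or ABORT; a crashed node never replies."""
        if self.crashed:
            raise TimeoutError(f"{self.name} did not answer PREPARE")
        self.state = self.vote
        return self.vote

    def finish(self, decision):
        """Apply the coordinator's final COMMIT or ABORT."""
        if not self.crashed:
            self.state = decision

def two_phase_commit(participants):
    votes = []
    for p in participants:          # phase 1: send PREPARE, collect votes
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append("ABORT")   # assumption: silence counts as a no vote
    decision = "COMMIT" if all(v == "PREPARED" for v in votes) else "ABORT"
    for p in participants:          # phase 2: broadcast the decision
        p.finish(decision)
    return decision

print(two_phase_commit([Participant("A"), Participant("B")]))
print(two_phase_commit([Participant("A"), Participant("B", vote="ABORT")]))
print(two_phase_commit([Participant("A"), Participant("B", crashed=True)]))
```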

36 Conclusion. Parallelism and distribution are very useful: performance, fault tolerance, scale. But complex! Must rethink lots of aspects of the system. Must earn the complexity.

