
1 1 Advanced Database Systems: DBS CB, 2nd Edition Parallel and Distributed Databases, Ch. 20

2 2 Outline Parallel Databases Distributed Databases  Overview  Distributed Query Processing  Replicated Data Algorithms  Distributed Locking  Distributed Commit Protocol (2PC)  Reliable Distributed Database Management Peer-2-Peer Systems Summary

3 3 Parallel Databases

4 4 Models of Parallelism  Pipelining: one operator consumes the output of another as it is produced  Example: T1 ← SELECT * FROM A WHERE cond; T2 ← join of T1 and B (B has an index)  Concurrent (partitioned) operations: the same operator runs in parallel on different partitions of the data  Example: SELECT * FROM A WHERE cond, split into select A where A.x < 10, select A where 10 ≤ A.x < 20, select A where 20 ≤ A.x, followed by a merge of the results; data location is important... Parallel Databases
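A minimal sketch of the two models above, assuming in-memory Python lists stand in for partitions and operators (the helper names are illustrative only): a pipelined select→join built from generators, and a partitioned select whose per-partition results are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# --- Pipelining: the join consumes the select's output as it is produced ---
def select(rows, pred):
    for r in rows:                         # generator: tuples stream to the consumer
        if pred(r):
            yield r

def index_join(left, right_index, key):
    for l in left:                         # pulls lazily from the upstream select
        for r in right_index.get(l[key], []):
            yield (l, r)

# --- Partitioned parallelism: same select on each partition, then merge ---
def partitioned_select(partitions, pred, workers=3):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda p: [r for r in p if pred(r)], partitions))
    return [r for chunk in chunks for r in chunk]   # merge step
```

The pipelined operators overlap in time on the same data stream, while the partitioned select overlaps across data chunks.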

5 5 Goals:  Improve performance by executing multiple operations in parallel  Want to scale transactions per second  Want to scale large analytics queries (complexity) Key benefit:  Cheaper to scale than relying on a single increasingly more powerful processor Key challenge:  Ensure overhead and contention do not kill performance Parallel Databases

6 6 Speedup:  More processors → higher speed on the same problem Scale-up:  More processors → can process proportionally more data in the same time  Transaction scale-up vs. batch scale-up Challenges to speedup and scale-up:  Startup cost: cost of starting an operation on many processors  Interference: contention for shared resources between processors  Skew: the slowest step becomes the bottleneck Parallel Databases
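Written out as formulas (the standard textbook definitions, added here for reference):

```latex
\text{speedup} \;=\; \frac{\text{elapsed time on the small system}}{\text{elapsed time on the larger system (same problem)}}
\qquad
\text{scale-up} \;=\; \frac{\text{time of a small problem on the small system}}{\text{time of a $P\times$ larger problem on a $P\times$ larger system}}
```

Linear speedup means the first ratio equals the factor by which resources grew; linear scale-up means the second ratio stays at 1.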

7 7 Figures: linear vs. non-linear speedup; linear vs. non-linear scale-up

8 8 Architecture for Parallel Databases:  Shared memory  Shared disk (Oracle RAC)  Shared nothing (Teradata, HP, Vertica, etc.) Parallel Databases

9 9 Taxonomy for Parallel Query Evaluation  Inter-query parallelism Each query runs on one processor  Inter-operator parallelism A query runs on multiple processors An operator runs on one processor  Intra-operator parallelism An operator runs on multiple processors Parallel Databases Intra-operator parallelism is the most scalable

10 10 Vertical Data Partitioning: Relation R split into P projected chunks (each a subset of the attributes) R_0, …, R_{P-1}, stored at P nodes Reconstruct the relation by joining the chunks on tid, then evaluate the query Parallel Databases

11 11 Horizontal Data Partitioning: Relation R split into P chunks R_0, …, R_{P-1}, stored at P nodes  Round robin: tuple t_i goes to chunk (i mod P); good for scanning the whole relation, not good for range or point queries  Hash-based partitioning on attribute A: tuple t goes to chunk h(t.A) mod P; good for point queries on A and for equi-joins on A, not good for range queries or for queries on attributes other than A  Range-based partitioning on attribute A: tuple t goes to chunk i if v_{i-1} < t.A ≤ v_i; good for some range queries, but skewed data leads to skewed execution Parallel Databases
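A minimal sketch of the three placement rules, assuming Python stand-ins (the helper names and the MD5 hash choice are illustrative, not from the slides):

```python
import hashlib
from bisect import bisect_left

def round_robin_chunk(i, P):
    """Round robin: the tuple in position i goes to chunk i mod P."""
    return i % P

def hash_chunk(a_value, P):
    """Hash partitioning on attribute A: chunk = h(A) mod P."""
    h = int(hashlib.md5(str(a_value).encode()).hexdigest(), 16)
    return h % P

def range_chunk(a_value, boundaries):
    """Range partitioning: boundaries = [v_1, ..., v_{P-1}], sorted ascending;
    chunk i holds tuples with v_{i-1} < A <= v_i (chunk 0 takes the smallest values)."""
    return bisect_left(boundaries, a_value)
```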

12 12 Cost with Parallelism: Compute σ_{A=v}(R) or σ_{v1&lt;A&lt;v2}(R) On a conventional (single-node) database: cost = B(R) What is the cost on a parallel database with P processors?  Roughly B(R)/P in all cases However, different processors do the work:  Round-robin partitioning: all servers do the work for both queries  Hash partitioning on A: one server does the work for σ_{A=v}(R), but all servers for the range query σ_{v1&lt;A&lt;v2}(R)  Range partitioning on A: only the server(s) whose range overlaps the predicate do the work Parallel Databases
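A quick worked instance (the numbers are chosen for illustration): with B(R) = 10,000 blocks and P = 100 servers, each chunk holds 100 blocks, so

```latex
\frac{B(R)}{P} \;=\; \frac{10\,000}{100} \;=\; 100 \text{ blocks scanned per working server.}
```

Whether one server does this (a point query under hash partitioning) or all 100 do it in parallel (round robin), the elapsed cost is about 100 I/Os; the schemes differ in how much total work the cluster spends and in which servers are busy.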

13 13 Parallel Group-By: Compute γ_{A, sum(B)}(R)  Step 1: server i partitions its chunk R_i using a hash function h(t.A) mod P into R_{i,0}, R_{i,1}, …, R_{i,P-1}  Step 2: server i sends partition R_{i,j} to server j  Step 3: server j computes γ_{A, sum(B)} on R_{0,j}, R_{1,j}, …, R_{P-1,j}  Step 4: union the per-server results to obtain the final group-by Parallel Databases
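A runnable sketch of the shuffle-then-aggregate pattern above, assuming each server's chunk is a Python list of dicts and plain list copying stands in for the network (helper names are illustrative):

```python
from collections import defaultdict

def local_partition(chunk, P, key):
    """Step 1: split this server's chunk by h(t.A) mod P."""
    parts = [[] for _ in range(P)]
    for t in chunk:
        parts[hash(t[key]) % P].append(t)
    return parts

def parallel_groupby_sum(chunks, key, val):
    """Steps 2-4: route partition j to 'server' j, aggregate there, union the results."""
    P = len(chunks)
    shuffled = [[] for _ in range(P)]
    for chunk in chunks:                       # step 2: redistribute by hash of the grouping key
        for j, part in enumerate(local_partition(chunk, P, key)):
            shuffled[j].extend(part)
    result = {}
    for rows_at_server in shuffled:            # step 3: each server aggregates its own groups
        agg = defaultdict(int)
        for t in rows_at_server:
            agg[t[key]] += t[val]
        result.update(agg)                     # step 4: union of disjoint per-server results
    return result
```

Because the shuffle hashes on the grouping attribute A, every group lands entirely on one server, so the final step is a simple union of disjoint results.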

14 14 Parallel Join:  Step 1: for all servers i in [0, k], server i partitions its chunk R_i using a hash function h(t.A) mod P into R_{i,0}, R_{i,1}, …, R_{i,P-1}; for all servers j in [k+1, P-1], server j partitions its chunk S_j the same way into S_{j,0}, S_{j,1}, …, S_{j,P-1}  Step 2: server i sends partition R_{i,u} to server u, and server j sends partition S_{j,u} to server u  Step 3: server u computes the join of the R partitions it received with the S partitions it received Parallel Databases
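A compact sketch of this partitioned (shuffle) hash join, again with Python lists of dicts standing in for the servers' chunks and for the network routing:

```python
def parallel_hash_join(r_chunks, s_chunks, P, key):
    """Shuffle-join sketch: hash-partition both inputs on the join key, then
    join locally at each of the P 'servers'."""
    r_at = [[] for _ in range(P)]
    s_at = [[] for _ in range(P)]
    for chunk in r_chunks:                     # steps 1-2 for R: hash-partition and route
        for t in chunk:
            r_at[hash(t[key]) % P].append(t)
    for chunk in s_chunks:                     # steps 1-2 for S
        for t in chunk:
            s_at[hash(t[key]) % P].append(t)
    out = []
    for u in range(P):                         # step 3: local hash join at each server u
        index = {}
        for r in r_at[u]:
            index.setdefault(r[key], []).append(r)
        for s in s_at[u]:
            for r in index.get(s[key], []):
                out.append({**r, **s})
    return out
```

In a real system the routing happens over the network and each server would build its hash table on the smaller of its two inputs.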

15 15 Parallel Dataflow Implementation:  Use relational operators unchanged  Add special split and merge operators Handle data routing, buffering, and flow control  Example: exchange operator Inserted between consecutive operators in the query plan Can act as either a producer or consumer Producer pulls data from operator and sends to n consumers  Producer acts as driver for operators below it in query plan  Consumer buffers input data from n producers and makes it available to operator through getNext() interface Parallel Databases
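A toy version of the exchange operator described above, assuming Python queues stand in for the network buffers; the class and method names are illustrative, with get_next mirroring the getNext() interface on the slide.

```python
import queue

class Exchange:
    """Toy exchange operator: the producer side drives the operator below it and
    routes rows to per-consumer buffers; consumers read via get_next()."""
    def __init__(self, n_consumers, route):
        self.queues = [queue.Queue() for _ in range(n_consumers)]
        self.route = route                       # e.g. lambda row: hash(row["key"]) % n_consumers

    def run_producer(self, child_iter):
        for row in child_iter:                   # pull from the operator below (driver role)
            self.queues[self.route(row)].put(row)
        for q in self.queues:                    # end-of-stream marker for every consumer
            q.put(None)

    def get_next(self, consumer_id):
        """Consumer-side call: returns the next buffered row, or None at end of stream."""
        return self.queues[consumer_id].get()
```

The split/merge logic lives entirely in the exchange operator, so the relational operators above and below it remain unchanged.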

16 16 Distributed Databases

17 17 Approaches:  Top-down: start from one logical database, then fragment it and allocate the fragments to the sites – the fragmentation/allocation design itself can be an issue…  Multi-DBs or bottom-up: integrate pre-existing databases, so there is no such design step Overview:  Data is stored at several nodes; each node is managed by a DBMS that can run independently  Distributed Data Independence: users should not have to know where the relevant data is located  Distributed Transaction Atomicity: users should be able to write Xacts accessing multiple nodes just like local Xacts Distributed Databases: Overview

18 18 Types of Distributed Databases:  Homogeneous: all nodes run the same type of database system (Oracle, DB2, etc.)  Heterogeneous: different nodes run different types of database systems Architecture:  Client-Server: the client ships the query to a single node, and it is processed entirely at that node  Collaborating-Servers: a query can span multiple nodes Distributed Databases: Overview

19 19 Distributed Catalog Management:  Must keep track of how data is distributed across nodes  Must be able to name each replica of each fragment  Node catalog: describes all objects (fragments, replicas) at a node + keep track of replicas of relations created at this node Distributed Databases: Overview

20 20 Distributed Query Optimization:  Cost-based approach: consider all plans, pick the cheapest; similar to centralized optimization Difference 1: communication costs must be considered Difference 2: local site autonomy must be respected Difference 3: new distributed join methods  The query node constructs a global plan, with suggested local plans describing the processing at each node; a node can improve its own local plan execution Distributed Databases: Distributed Query Optimization

21 21 Distributed Join Problem:  Assume you want to compute R(A,B) ⋈ S(B,C) where R and S are on different nodes?  Obvious solution is to copy R to the node of S or vice versa, and compute the join  In some scenarios this is not a viable solution because of the communication overhead!  Use Semijoin Reductions! Distributed Databases: Distributed Query Optimization

22 22 Semijoin Reductions:  Idea: ship only the relevant part of each relation to the other node(s); say R(X,Y) is at one node and S(Y,Z) at another, with Y the common attribute  R ⋉ S = R ⋈ π_Y(S): project S onto the common attribute Y and ship π_Y(S) to the node of R  The node of R computes the semijoin R ⋉ S, which eliminates the dangling tuples of R, and ships the result back to the node of S  The node of S computes the true join Distributed Databases: Distributed Query Optimization
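A small sketch of the reduction, assuming the two relations are Python lists of dicts on "different nodes" and the shipping steps are just variable assignments (the function name is illustrative):

```python
def semijoin_reduce_join(r_rows, s_rows, y):
    """Semijoin reduction: R(X,Y) and S(Y,Z) live on different nodes; ship only
    pi_Y(S) one way and the reduced R the other way, then join at S's node."""
    shipped_keys = {s[y] for s in s_rows}                     # ship pi_Y(S) to R's node (small)
    r_reduced = [r for r in r_rows if r[y] in shipped_keys]   # R semijoin S: drop dangling R tuples
    index = {}                                                # ship r_reduced back; join at S's node
    for s in s_rows:
        index.setdefault(s[y], []).append(s)
    return [{**r, **s} for r in r_reduced for s in index[r[y]]]
```

The saving comes from shipping π_Y(S) and the reduced R instead of an entire relation.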

23 23 Replication:  Gives increased availability  Faster query evaluation  Synchronous vs. asynchronous replication vary in how current the copies are: Synchronous replication: all copies of a modified relation (fragment) must be updated before the modifying Xact commits Asynchronous replication: copies of a fragment are only periodically updated, so different copies may get out of sync in the meantime Distributed Databases: Replicated Data Algorithms

24 24 Synchronous Replication – Basic Solution for CC:  Treat each copy as an independent data item: object X has copies X1, X2, X3, each with its own lock manager, and transactions Tx_i, Tx_j, Tx_k may access any of them Distributed Databases: Replicated Data Algorithms

25 25 Synchronous Replication – Basic Solution for CC:  Read(X): get shared X1 lock; get shared X2 lock; get shared X3 lock; read any one of X1, X2, X3; at end of transaction, release the X1, X2, X3 locks Distributed Databases: Replicated Data Algorithms

26 26 Synchronous Replication – Basic Solution for CC:  Write(X): get exclusive X1 lock; get exclusive X2 lock; get exclusive X3 lock; write the new value into X1, X2, X3; at end of transaction, release the X1, X2, X3 locks Distributed Databases: Replicated Data Algorithms
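A minimal sketch of this "treat each copy independently" scheme; the class is hypothetical, lock modes and distribution are omitted, and plain Python sets stand in for the per-copy lock managers.

```python
class ReplicatedItem:
    """Object X with copies X1..Xn, each with its own (degenerate) lock manager."""
    def __init__(self, copies):
        self.copies = dict(copies)                            # e.g. {"X1": 5, "X2": 5, "X3": 5}
        self.locks = {name: set() for name in self.copies}    # txn ids holding a lock per copy

    def read(self, txn):
        for name in self.locks:                   # shared-lock every copy (strict 2PL)
            self.locks[name].add(txn)
        return next(iter(self.copies.values()))   # reading any one copy suffices

    def write(self, txn, value):
        for name in self.locks:                   # exclusive-lock and update every copy
            self.locks[name].add(txn)
            self.copies[name] = value

    def release(self, txn):
        for holders in self.locks.values():       # at end of transaction: release all locks
            holders.discard(txn)
```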

27 27 Synchronous Replication – Problem (Low Availability):  Correctness is OK: 2PL → serializability, 2PC → atomic transactions  But if the node holding even one copy (say X3) is down, X cannot be accessed at all Distributed Databases: Replicated Data Algorithms

28 28 Synchronous Replication – Enhanced Algorithms:  Voting: an Xact must write a majority of the copies to modify an object, and must read enough copies to be sure of seeing at least one most recent copy  Read-any Write-all: reads are fast (any single copy), but writes are slower than with voting (every copy must be written)  The choice between the two models affects which locks must be set  Before an update Xact can commit, it must obtain locks on all the copies it modified (an expensive protocol with many messages)  As a result, asynchronous replication is widely used Distributed Databases: Replicated Data Algorithms
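The voting scheme is usually stated as a quorum condition; a compact statement for N copies with read quorum R and write quorum W (the standard formulation, added here for reference):

```latex
R + W > N \quad\text{(every read quorum intersects every write quorum)}
\qquad
2W > N \quad\text{(any two write quorums intersect, so writes are serialized)}
```

Read-any/write-all is the special case R = 1, W = N; majority voting is R = W = ⌊N/2⌋ + 1.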

29 29 Asynchronous Replication:  Allows a modifying Xact to commit before all copies have been changed; readers look at just one copy  Two approaches: Primary site: exactly one copy of a relation is designated the primary/master copy; other replicas cannot be updated directly. How are updates propagated to the secondary copies? In two steps: (1) log-based capture of the changes (from the recovery log), (2) periodically apply the captured changes/snapshot at the secondary sites Peer-to-peer replication: more than one copy can be a master for the object; changes to a master must be propagated to the other copies somehow, and conflicting updates made at two master copies must be resolved somehow; best used for scenarios where conflicts do not arise frequently! Distributed Databases: Replicated Data Algorithms

30 30 Data Warehousing and Replication:  A major trend is to build a giant “warehouse” with data from many data sources, enabling complex decision-support queries (DSS and BI) over data from across the enterprise  The warehouse can be viewed as an instance of asynchronous replication from the different data sources (each typically controlled by an RDBMS) in the enterprise  ETL (Extract-Transform-Load) is typically used for updating the warehouse Distributed Databases: Replicated Data Algorithms

31 31 Distributed Locking: How do we manage locks for objects across many nodes?  Centralized: one node does all the locking; vulnerable to single node failure  Primary Copy: all locking for a given object is done at the primary copy node for this object  Fully Distributed: locking for a copy is done at the node where the copy is stored Distributed Databases: Distributed Locking

32 32 Distributed Deadlock Detection:  Each site maintains a local waits-for graph  A global deadlock might exist even if the local graphs contain no cycles:  Three solutions: Centralized: send all local graphs to one site Hierarchical: organize nodes into hierarchy and send local graphs to parent in the hierarchy Timeout: abort the Xact if it waits too long (most common approach) Distributed Databases: Distributed Locking

33 33 Distributed Recovery: New issues:  New kinds of failures, e.g., failed links and failed remote nodes  If a transaction's sub-transactions execute at different nodes, either all of them or none of them must commit!  A log is maintained at each site, and commit protocol actions are additionally logged We need 2PC! Distributed Databases: Distributed Commit Protocol

34 34 Two-Phase Commit (2PC):  The node at which the Xact originated is the coordinator; the other nodes at which it executes are subordinates  When an Xact wants to commit: The coordinator sends a prepare msg to each subordinate. Each subordinate force-writes an abort or prepare log record and then sends a No or Yes msg to the coordinator. If the coordinator gets a unanimous Yes vote, it force-writes a commit log record and sends a commit msg to all subordinates; else it force-writes an abort log record and sends an abort msg. Subordinates force-write an abort/commit log record based on the msg they receive, then send an ack msg back to the coordinator. The coordinator writes an end log record after getting all the acks Distributed Databases: Distributed Commit Protocol
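A toy, single-process sketch of the message flow above; the Sub class, the vote()/decide() method names, and the in-memory lists standing in for force-written logs are assumptions of this sketch, not part of any real 2PC implementation.

```python
class Sub:
    """Stand-in subordinate: its 'log' list mimics force-written log records."""
    def __init__(self, will_vote_yes=True):
        self.log, self.yes = [], will_vote_yes
    def vote(self):
        self.log.append("prepare" if self.yes else "abort")   # force-write before answering
        return "yes" if self.yes else "no"
    def decide(self, decision):
        self.log.append(decision)                             # force-write commit/abort
        return True                                           # ack

def two_phase_commit(coordinator_log, subordinates):
    # Phase 1: voting
    votes = [sub.vote() for sub in subordinates]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    coordinator_log.append(decision)                          # coordinator force-writes its decision
    # Phase 2: termination
    acks = [sub.decide(decision) for sub in subordinates]
    if all(acks):
        coordinator_log.append("end")                         # safe to forget the Xact
    return decision

# e.g. two_phase_commit([], [Sub(), Sub(will_vote_yes=False)]) -> "abort"
```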

35 35 Observations on 2PC: Two rounds of communication messages: first for voting, then to terminate Any node can decide to abort an Xact Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log All commit protocol log records for an Xact contain the Xact-id and the coordinator-id; the coordinator's abort/commit record also includes the ids of all subordinates Distributed Databases: Distributed Commit Protocol

36 36 Observations on 2PC (Contd.): Ack msgs used to let coordinator know when it can “forget” a Xact; until it receives all acks, it must keep T in the Xact Table If coordinator fails after sending prepare msgs but before writing commit/abort log records, when it comes back up it aborts the Xact If a subtransaction does no updates, its commit or abort status is irrelevant Distributed Databases: Distributed Commit Protocol

37 37 2PC: Restart After a Failure at a Node:  If we have a commit or abort log record for Xact T, but not an end record, must redo/undo T If this site is the coordinator for T, keep sending commit/abort msgs to subs until acks received  If we have a prepare log record for Xact T, but not commit/abort, this site is a subordinate for T Repeatedly contact the coordinator to find status of T, then write commit/abort log record; redo/undo T; and write end log record  If we don’t have even a prepare log record for T, unilaterally abort and undo T This site may be coordinator! If so, subs may send msgs Distributed Databases: Distributed Commit Protocol

38 38 2PC: Blocking  If coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers: T is blocked Even if all subordinates know each other (extra overhead in prepare msg) they are blocked unless one of them voted no Distributed Databases: Distributed Commit Protocol

39 39 2PC: Communication Link and Remote Node Failures  If a remote site does not respond during the commit protocol for Xact T, either because the site failed or the link failed: If the current site is the coordinator for T, should abort T If the current site is a subordinate, and has not yet voted yes, it should abort T If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds Distributed Databases: Distributed Commit Protocol

40 40 2PC with Presumed Abort:  When coordinator aborts T, it undoes T and removes it from the Xact Table immediately Doesn’t wait for acks; “presumes abort” if Xact not in Xact Table. Names of subs not recorded in abort log record  Subordinates do not send acks on abort  If subxact does not do updates, it responds to prepare msg with reader instead of yes/no  Coordinator subsequently ignores readers  If all subxacts are readers, 2nd phase is not needed Distributed Databases: Distributed Commit Protocol

41 41 Variants of 2PC:  Linear: nodes are arranged in a chain; the "ok" votes are forwarded along the chain and the commit decision is propagated back  Hierarchical: nodes form a tree rooted at the coordinator; votes flow up and commit msgs flow down Distributed Databases: Distributed Commit Protocol

42 42 Variants of 2PC:  Distributed  Nodes broadcast all messages  Every node knows when to commit Distributed Databases: Distributed Commit Protocol

43 43 3PC = non-blocking commit:  Assume: failed node is down forever  Key idea: before committing, coordinator tells participants everyone is ok Distributed Databases: Distributed Commit Protocol

44 44 3PC – Recovery Rules (Termination Protocol):  The surviving nodes try to complete the transaction, based on their current states  Goal: if the dead nodes committed or aborted, then the survivors should not contradict that! Else, the survivors can do as they please... Distributed Databases: Distributed Commit Protocol

45 45 3PC – Recovery Rules (Termination Protocol):  Let {S_1, S_2, …, S_n} be the states of the survivor sites  If one or more S_i = COMMIT → COMMIT T  If one or more S_i = ABORT → ABORT T  If one or more S_i = PREPARE → T could not have aborted → COMMIT T  If no S_i = PREPARE (or COMMIT) → T could not have committed → ABORT T Distributed Databases: Distributed Commit Protocol
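The same decision rule as a small function, assuming the survivors' protocol states are reported as plain strings (a sketch of the rules above, not a full 3PC implementation):

```python
def three_pc_termination(survivor_states):
    """Decide the fate of T from the survivors' local states after a failure."""
    states = set(survivor_states)
    if "COMMIT" in states:
        return "COMMIT"            # someone already committed, so commit
    if "ABORT" in states:
        return "ABORT"             # someone already aborted, so abort
    if "PREPARE" in states:
        return "COMMIT"            # a pre-commit exists, so T cannot have aborted
    return "ABORT"                 # nobody reached pre-commit, so T cannot have committed

# e.g. three_pc_termination(["WAIT", "PREPARE"]) -> "COMMIT"
```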

46 46 Reliability:  Correctness Serializability Atomicity Persistence  Availability Failure types: Processor failure, storage failure, and network failure – multiple failures! Scenarios Distributed Databases: Reliable Distributed Database Mgmt

47 47 Failure Models  Cannot protect against everything:  Unlikely failures (e.g., flooding in the Sahara)  Failures that are too expensive to protect against (e.g., earthquakes)  Failures we do know how to protect against (e.g., lost or reordered messages, via message sequence numbers; media loss, via stable storage)  Events can be classified as desired vs. undesired and expected vs. unexpected Distributed Databases: Reliable Distributed Database Mgmt

48 48 Node Models:  Fail-stop nodes: a failed node simply halts; its volatile memory is lost but its stable storage survives; over time a node cycles through perfect → halted → recovery → perfect Distributed Databases: Reliable Distributed Database Mgmt

49 49 Node Models:  Byzantine nodes: a failed node may keep running and behave arbitrarily (e.g., send conflicting messages to nodes A, B, C) instead of simply halting  At any given time, at most some fraction f of the nodes have failed (typically f &lt; 1/2 or f &lt; 1/3) Distributed Databases: Reliable Distributed Database Mgmt

50 50 Network Models:  Reliable Network: messages arrive in order; no spontaneous messages; no lost messages except when a node fails  Timeout TD: if no ack arrives within TD seconds, the destination is down (not merely paused) Distributed Databases: Reliable Distributed Database Mgmt

51 51 Network Models:  Variation of Reliable Network Persistent messages  If destination down, net will eventually deliver message  Simplifies node recovery, but leads to inefficiencies (hides too much)  Not considered here Distributed Databases: Reliable Distributed Database Mgmt

52 52 Network Models:  Partitioned Network: messages arrive in order; no spontaneous messages; but there is no timeout, so nodes can have different views of which nodes have failed Distributed Databases: Reliable Distributed Database Mgmt

53 53 Scenarios:  Reliable network Fail-stop nodes  No data replication (1)  Data replication (2)  Partitionable network Fail-stop nodes (3) Distributed Databases: Reliable Distributed Database Mgmt

54 54 Scenarios:  Reliable network / Fail-stop nodes / No data replication (1) Basic idea: a single node P_X controls item X A single control point simplifies concurrency control and recovery Not an availability hit: if P_X is down, X is unavailable anyway (there is no replica)! “P_X controls X” means  P_X does concurrency control for X  P_X does recovery for X Distributed Databases: Reliable Distributed Database Mgmt

55 55 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Say transaction T wants to access X: a process P_T, which represents T at X's node, sends the request to the local DBMS (lock manager, log, and data item X) Distributed Databases: Reliable Distributed Database Mgmt

56 56 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Process Model: Cohorts – the user's application spawns a cohort process (T1, T2, T3, …) at each node it needs; the cohorts communicate with each other and access data through the local DBMS at their node Distributed Databases: Reliable Distributed Database Mgmt

57 57 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Process Model: Transaction servers (transaction managers) – the user talks to one transaction manager; the transaction managers at the different nodes spawn work, communicate among themselves, and access data through their local DBMSs Distributed Databases: Reliable Distributed Database Mgmt

58 58 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Process Model Summary: Cohorts: application code responsible for remote access Transaction manager: “system” handles distribution, remote access Distributed Databases: Reliable Distributed Database Mgmt

59 59 Peer-2-Peer Systems

60 60 Distributed application where nodes are:  Autonomous  Very loosely coupled  Equal in role or functionality  Share & exchange resources with each other Distributed Databases: Peer-2-Peer Distributed Systems

61 61 Related Terms:  File Sharing Ex: Napster, Gnutella, Kazaa, eDonkey, BitTorrent, FreeNet, LimeWire, Morpheus  Grid Computing  Autonomic Computing Distributed Databases: Peer-2-Peer Distributed Systems

62 62 Search in a P2P System: each peer holds its own resources (R_1,1, R_1,2, …; R_2,1, R_2,2, …; R_3,1, R_3,2, …); a peer broadcasts the query “Who has X?” to the other peers Distributed Databases: Peer-2-Peer Distributed Systems

63 63 Search in a P2P System (contd.): the peers that hold X send answers back to the querying peer Distributed Databases: Peer-2-Peer Distributed Systems

64 64 Search in a P2P System (contd.): the querying peer then requests the resource from one of the answering peers and receives it directly from that peer Distributed Databases: Peer-2-Peer Distributed Systems

65 65 Distributed Lookup:  Have (key, value) pairs spread over the nodes  Given a key k, find the matching values  Example: one node holds the (K, V) pairs (1, a), (1, b), (7, c), (4, d) and another holds (1, a), (3, a), (4, a); then lookup(4) = {a, d} Distributed Databases: Peer-2-Peer Distributed Systems
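A naive sketch of the lookup, assuming each node's store is just a Python dict reachable locally; a real P2P system (e.g. a DHT) would route to the responsible node instead of asking everyone.

```python
def lookup(nodes, key):
    """Ask every node for the values matching key and union the answers."""
    values = set()
    for store in nodes:                 # store: dict mapping keys to lists of values
        values.update(store.get(key, []))
    return values

nodes = [{1: ["a", "b"], 7: ["c"], 4: ["d"]},
         {1: ["a"], 3: ["a"], 4: ["a"]}]
assert lookup(nodes, 4) == {"a", "d"}
```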

66 66 Data Distributed Over Nodes:  N nodes, each holding some data  Notation: X.func(params) means an RPC of procedure func(params) at node X; X.A means sending a message to X to get the value of A; if X is omitted, we refer to a local procedure or data structure  Example: at node Y, B := X.A fetches A from node X, and A := X.f(B) invokes f(B) remotely at node X Distributed Databases: Peer-2-Peer Distributed Systems

67 67 Summary

68 68 Parallel DBMSs are designed for scalable performance; relational operators are very well suited to parallel execution  Pipelined and partitioned parallelism Distributed DBMSs offer site autonomy and distributed administration; storage and catalog techniques, concurrency control, and recovery must all be revisited Summary

69 69 END

