
1 1 Advanced Database Systems: DBS CB, 2nd Edition Parallel and Distributed Databases, Ch. 20

2 2 Outline Parallel Databases Distributed Databases  Overview  Distributed Query Processing  Replicated Data Algorithms  Distributed Locking  Distributed Commit Protocol (2PC)  Reliable Distributed Database Management Peer-2-Peer Systems Summary

3 3 Parallel Databases

4 4 Models of Parallelism  Pipelining: one operator consumes the output of another as it is produced  Example: T1 ← SELECT * FROM A WHERE cond; T2 ← join of T1 and B (B has an index)  Concurrent (partitioned) operations: the same operator runs in parallel on different partitions of the data  Example: SELECT * FROM A WHERE cond, split into select A where A.x < 10, select A where 10 ≤ A.x < 20, select A where 20 ≤ A.x, followed by a merge of the results; data location is important... Parallel Databases
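A minimal sketch of the two models above, assuming in-memory Python lists stand in for partitions and operators (the helper names are illustrative only): a pipelined select→join built from generators, and a partitioned select whose per-partition results are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# --- Pipelining: the join consumes the select's output as it is produced ---
def select(rows, pred):
    for r in rows:                         # generator: tuples stream to the consumer
        if pred(r):
            yield r

def index_join(left, right_index, key):
    for l in left:                         # pulls lazily from the upstream select
        for r in right_index.get(l[key], []):
            yield (l, r)

# --- Partitioned parallelism: same select on each partition, then merge ---
def partitioned_select(partitions, pred, workers=3):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda p: [r for r in p if pred(r)], partitions))
    return [r for chunk in chunks for r in chunk]   # merge step
```

The pipelined operators overlap in time on the same data stream, while the partitioned select overlaps across data chunks.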

5 5 Goals:  Improve performance by executing multiple operations in parallel  Want to scale transactions per second  Want to scale large analytics queries (complexity) Key benefit:  Cheaper to scale than relying on a single increasingly more powerful processor Key challenge:  Ensure overhead and contention do not kill performance Parallel Databases

6 6 Speedup:  More processors → higher speed on the same problem Scale-up:  More processors → can process proportionally more data in the same time  Transaction scale-up vs. batch scale-up Challenges to speedup and scale-up:  Startup cost: cost of starting an operation on many processors  Interference: contention for shared resources between processors  Skew: the slowest step becomes the bottleneck Parallel Databases
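Written out as formulas (the standard textbook definitions, added here for reference):

```latex
\text{speedup} \;=\; \frac{\text{elapsed time on the small system}}{\text{elapsed time on the larger system (same problem)}}
\qquad
\text{scale-up} \;=\; \frac{\text{time of a small problem on the small system}}{\text{time of a $P\times$ larger problem on a $P\times$ larger system}}
```

Linear speedup means the first ratio equals the factor by which resources grew; linear scale-up means the second ratio stays at 1.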

7 7 Figures: linear vs. non-linear speedup; linear vs. non-linear scale-up

8 8 Architecture for Parallel Databases:  Shared memory  Shared disk (Oracle RAC)  Shared nothing (Teradata, HP, Vertica, etc.) Parallel Databases

9 9 Taxonomy for Parallel Query Evaluation  Inter-query parallelism Each query runs on one processor  Inter-operator parallelism A query runs on multiple processors An operator runs on one processor  Intra-operator parallelism An operator runs on multiple processors Parallel Databases Intra-operator parallelism is the most scalable

10 10 Vertical Data Partitioning: Relation R split into P projected chunks (each a subset of the attributes) R_0, …, R_{P-1}, stored at P nodes Reconstruct the relation by joining the chunks on tid, then evaluate the query Parallel Databases

11 11 Horizontal Data Partitioning: Relation R split into P chunks R_0, …, R_{P-1}, stored at P nodes  Round robin: tuple t_i goes to chunk (i mod P); good for scanning the whole relation, not good for range or point queries  Hash-based partitioning on attribute A: tuple t goes to chunk h(t.A) mod P; good for point queries on A and for equi-joins on A, not good for range queries or for queries on attributes other than A  Range-based partitioning on attribute A: tuple t goes to chunk i if v_{i-1} < t.A ≤ v_i; good for some range queries, but skewed data leads to skewed execution Parallel Databases
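A minimal sketch of the three placement rules, assuming Python stand-ins (the helper names and the MD5 hash choice are illustrative, not from the slides):

```python
import hashlib
from bisect import bisect_left

def round_robin_chunk(i, P):
    """Round robin: the tuple in position i goes to chunk i mod P."""
    return i % P

def hash_chunk(a_value, P):
    """Hash partitioning on attribute A: chunk = h(A) mod P."""
    h = int(hashlib.md5(str(a_value).encode()).hexdigest(), 16)
    return h % P

def range_chunk(a_value, boundaries):
    """Range partitioning: boundaries = [v_1, ..., v_{P-1}], sorted ascending;
    chunk i holds tuples with v_{i-1} < A <= v_i (chunk 0 takes the smallest values)."""
    return bisect_left(boundaries, a_value)
```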

12 12 Cost with Parallelism: Compute σ_{A=v}(R) or σ_{v1&lt;A&lt;v2}(R) On a conventional (single-node) database: cost = B(R) What is the cost on a parallel database with P processors?  Roughly B(R)/P in all cases However, different processors do the work:  Round-robin partitioning: all servers do the work for both queries  Hash partitioning on A: one server does the work for σ_{A=v}(R), but all servers for the range query σ_{v1&lt;A&lt;v2}(R)  Range partitioning on A: only the server(s) whose range overlaps the predicate do the work Parallel Databases
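A quick worked instance (the numbers are chosen for illustration): with B(R) = 10,000 blocks and P = 100 servers, each chunk holds 100 blocks, so

```latex
\frac{B(R)}{P} \;=\; \frac{10\,000}{100} \;=\; 100 \text{ blocks scanned per working server.}
```

Whether one server does this (a point query under hash partitioning) or all 100 do it in parallel (round robin), the elapsed cost is about 100 I/Os; the schemes differ in how much total work the cluster spends and in which servers are busy.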

13 13 Parallel Group-By: Compute γ_{A, sum(B)}(R)  Step 1: server i partitions its chunk R_i using a hash function h(t.A) mod P into R_{i,0}, R_{i,1}, …, R_{i,P-1}  Step 2: server i sends partition R_{i,j} to server j  Step 3: server j computes γ_{A, sum(B)} on R_{0,j}, R_{1,j}, …, R_{P-1,j}  Step 4: union the per-server results to obtain the final group-by Parallel Databases
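A runnable sketch of the shuffle-then-aggregate pattern above, assuming each server's chunk is a Python list of dicts and plain list copying stands in for the network (helper names are illustrative):

```python
from collections import defaultdict

def local_partition(chunk, P, key):
    """Step 1: split this server's chunk by h(t.A) mod P."""
    parts = [[] for _ in range(P)]
    for t in chunk:
        parts[hash(t[key]) % P].append(t)
    return parts

def parallel_groupby_sum(chunks, key, val):
    """Steps 2-4: route partition j to 'server' j, aggregate there, union the results."""
    P = len(chunks)
    shuffled = [[] for _ in range(P)]
    for chunk in chunks:                       # step 2: redistribute by hash of the grouping key
        for j, part in enumerate(local_partition(chunk, P, key)):
            shuffled[j].extend(part)
    result = {}
    for rows_at_server in shuffled:            # step 3: each server aggregates its own groups
        agg = defaultdict(int)
        for t in rows_at_server:
            agg[t[key]] += t[val]
        result.update(agg)                     # step 4: union of disjoint per-server results
    return result
```

Because the shuffle hashes on the grouping attribute A, every group lands entirely on one server, so the final step is a simple union of disjoint results.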

14 14 Parallel Join:  Step 1: for all servers i in [0, k], server i partitions its chunk R_i using a hash function h(t.A) mod P into R_{i,0}, R_{i,1}, …, R_{i,P-1}; for all servers j in [k+1, P-1], server j partitions its chunk S_j the same way into S_{j,0}, S_{j,1}, …, S_{j,P-1}  Step 2: server i sends partition R_{i,u} to server u, and server j sends partition S_{j,u} to server u  Step 3: server u computes the join of the R partitions it received with the S partitions it received Parallel Databases
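A compact sketch of this partitioned (shuffle) hash join, again with Python lists of dicts standing in for the servers' chunks and for the network routing:

```python
def parallel_hash_join(r_chunks, s_chunks, P, key):
    """Shuffle-join sketch: hash-partition both inputs on the join key, then
    join locally at each of the P 'servers'."""
    r_at = [[] for _ in range(P)]
    s_at = [[] for _ in range(P)]
    for chunk in r_chunks:                     # steps 1-2 for R: hash-partition and route
        for t in chunk:
            r_at[hash(t[key]) % P].append(t)
    for chunk in s_chunks:                     # steps 1-2 for S
        for t in chunk:
            s_at[hash(t[key]) % P].append(t)
    out = []
    for u in range(P):                         # step 3: local hash join at each server u
        index = {}
        for r in r_at[u]:
            index.setdefault(r[key], []).append(r)
        for s in s_at[u]:
            for r in index.get(s[key], []):
                out.append({**r, **s})
    return out
```

In a real system the routing happens over the network and each server would build its hash table on the smaller of its two inputs.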

15 15 Parallel Dataflow Implementation:  Use relational operators unchanged  Add special split and merge operators Handle data routing, buffering, and flow control  Example: exchange operator Inserted between consecutive operators in the query plan Can act as either a producer or consumer Producer pulls data from operator and sends to n consumers  Producer acts as driver for operators below it in query plan  Consumer buffers input data from n producers and makes it available to operator through getNext() interface Parallel Databases
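A toy version of the exchange operator described above, assuming Python queues stand in for the network buffers; the class and method names are illustrative, with get_next mirroring the getNext() interface on the slide.

```python
import queue

class Exchange:
    """Toy exchange operator: the producer side drives the operator below it and
    routes rows to per-consumer buffers; consumers read via get_next()."""
    def __init__(self, n_consumers, route):
        self.queues = [queue.Queue() for _ in range(n_consumers)]
        self.route = route                       # e.g. lambda row: hash(row["key"]) % n_consumers

    def run_producer(self, child_iter):
        for row in child_iter:                   # pull from the operator below (driver role)
            self.queues[self.route(row)].put(row)
        for q in self.queues:                    # end-of-stream marker for every consumer
            q.put(None)

    def get_next(self, consumer_id):
        """Consumer-side call: returns the next buffered row, or None at end of stream."""
        return self.queues[consumer_id].get()
```

The split/merge logic lives entirely in the exchange operator, so the relational operators above and below it remain unchanged.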

16 16 Distributed Databases

17 17 Approaches:  Top-down: start from one logical database, then fragment it and allocate the fragments to the sites – the fragmentation/allocation design itself can be an issue…  Multi-DBs or bottom-up: integrate pre-existing databases, so there is no such design step Overview:  Data is stored at several nodes; each node is managed by a DBMS that can run independently  Distributed Data Independence: users should not have to know where the relevant data is located  Distributed Transaction Atomicity: users should be able to write Xacts accessing multiple nodes just like local Xacts Distributed Databases: Overview

18 18 Types of Distributed Databases:  Homogeneous: all nodes run the same type of database system (Oracle, DB2, etc.)  Heterogeneous: different nodes run different types of database systems Architecture:  Client-Server: the client ships the query to a single node, and it is processed entirely at that node  Collaborating-Servers: a query can span multiple nodes Distributed Databases: Overview

19 19 Distributed Catalog Management:  Must keep track of how data is distributed across nodes  Must be able to name each replica of each fragment  Node catalog: describes all objects (fragments, replicas) at a node + keep track of replicas of relations created at this node Distributed Databases: Overview

20 20 Distributed Query Optimization:  Cost-based approach: consider all plans, pick the cheapest; similar to centralized optimization Difference 1: communication costs must be considered Difference 2: local site autonomy must be respected Difference 3: new distributed join methods  The query node constructs a global plan, with suggested local plans describing the processing at each node; a node can improve its own local plan execution Distributed Databases: Distributed Query Optimization

21 21 Distributed Join Problem:  Assume you want to compute R(A,B) ⋈ S(B,C) where R and S are on different nodes?  Obvious solution is to copy R to the node of S or vice versa, and compute the join  In some scenarios this is not a viable solution because of the communication overhead!  Use Semijoin Reductions! Distributed Databases: Distributed Query Optimization

22 22 Semijoin Reductions:  Idea: ship only the relevant part of each relation to the other node(s); say R(X,Y) is at one node and S(Y,Z) at another, with Y the common attribute  R ⋉ S = R ⋈ π_Y(S): project S onto the common attribute Y and ship π_Y(S) to the node of R  The node of R computes the semijoin R ⋉ S, which eliminates the dangling tuples of R, and ships the result back to the node of S  The node of S computes the true join Distributed Databases: Distributed Query Optimization
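A small sketch of the reduction, assuming the two relations are Python lists of dicts on "different nodes" and the shipping steps are just variable assignments (the function name is illustrative):

```python
def semijoin_reduce_join(r_rows, s_rows, y):
    """Semijoin reduction: R(X,Y) and S(Y,Z) live on different nodes; ship only
    pi_Y(S) one way and the reduced R the other way, then join at S's node."""
    shipped_keys = {s[y] for s in s_rows}                     # ship pi_Y(S) to R's node (small)
    r_reduced = [r for r in r_rows if r[y] in shipped_keys]   # R semijoin S: drop dangling R tuples
    index = {}                                                # ship r_reduced back; join at S's node
    for s in s_rows:
        index.setdefault(s[y], []).append(s)
    return [{**r, **s} for r in r_reduced for s in index[r[y]]]
```

The saving comes from shipping π_Y(S) and the reduced R instead of an entire relation.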

23 23 Replication:  Gives increased availability  Faster query evaluation  Synchronous vs. asynchronous replication vary in how current the copies are: Synchronous replication: all copies of a modified relation (fragment) must be updated before the modifying Xact commits Asynchronous replication: copies of a fragment are only periodically updated, so different copies may get out of sync in the meantime Distributed Databases: Replicated Data Algorithms

24 24 Synchronous Replication – Basic Solution for CC:  Treat each copy as an independent data item: object X has copies X1, X2, X3, each with its own lock manager, and transactions Tx_i, Tx_j, Tx_k may access any of them Distributed Databases: Replicated Data Algorithms

25 25 Synchronous Replication – Basic Solution for CC:  Read(X): get shared X1 lock; get shared X2 lock; get shared X3 lock; read any one of X1, X2, X3; at end of transaction, release the X1, X2, X3 locks Distributed Databases: Replicated Data Algorithms

26 26 Synchronous Replication – Basic Solution for CC:  Write(X): get exclusive X1 lock; get exclusive X2 lock; get exclusive X3 lock; write the new value into X1, X2, X3; at end of transaction, release the X1, X2, X3 locks Distributed Databases: Replicated Data Algorithms
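A minimal sketch of this "treat each copy independently" scheme; the class is hypothetical, lock modes and distribution are omitted, and plain Python sets stand in for the per-copy lock managers.

```python
class ReplicatedItem:
    """Object X with copies X1..Xn, each with its own (degenerate) lock manager."""
    def __init__(self, copies):
        self.copies = dict(copies)                            # e.g. {"X1": 5, "X2": 5, "X3": 5}
        self.locks = {name: set() for name in self.copies}    # txn ids holding a lock per copy

    def read(self, txn):
        for name in self.locks:                   # shared-lock every copy (strict 2PL)
            self.locks[name].add(txn)
        return next(iter(self.copies.values()))   # reading any one copy suffices

    def write(self, txn, value):
        for name in self.locks:                   # exclusive-lock and update every copy
            self.locks[name].add(txn)
            self.copies[name] = value

    def release(self, txn):
        for holders in self.locks.values():       # at end of transaction: release all locks
            holders.discard(txn)
```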

27 27 Synchronous Replication – Problem (Low Availability):  Correctness is OK: 2PL → serializability, 2PC → atomic transactions  But if the node holding even one copy (say X3) is down, X cannot be accessed at all Distributed Databases: Replicated Data Algorithms

28 28 Synchronous Replication – Enhanced Algorithms:  Voting: an Xact must write a majority of the copies to modify an object, and must read enough copies to be sure of seeing at least one most recent copy  Read-any Write-all: reads are fast (any single copy), but writes are slower than with voting (every copy must be written)  The choice between the two models affects which locks must be set  Before an update Xact can commit, it must obtain locks on all the copies it modified (an expensive protocol with many messages)  As a result, asynchronous replication is widely used Distributed Databases: Replicated Data Algorithms
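The voting scheme is usually stated as a quorum condition; a compact statement for N copies with read quorum R and write quorum W (the standard formulation, added here for reference):

```latex
R + W > N \quad\text{(every read quorum intersects every write quorum)}
\qquad
2W > N \quad\text{(any two write quorums intersect, so writes are serialized)}
```

Read-any/write-all is the special case R = 1, W = N; majority voting is R = W = ⌊N/2⌋ + 1.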

29 29 Asynchronous Replication:  Allows a modifying Xact to commit before all copies have been changed; readers look at just one copy  Two approaches: Primary site: exactly one copy of a relation is designated the primary/master copy; other replicas cannot be updated directly. How are updates propagated to the secondary copies? In two steps: (1) log-based capture of the changes (from the recovery log), (2) periodically apply the captured changes/snapshot at the secondary sites Peer-to-peer replication: more than one copy can be a master for the object; changes to a master must be propagated to the other copies somehow, and conflicting updates made at two master copies must be resolved somehow; best used for scenarios where conflicts do not arise frequently! Distributed Databases: Replicated Data Algorithms

30 30 Data Warehousing and Replication:  A major trend is to build a giant “warehouse” with data from many data sources, enabling complex decision-support queries (DSS and BI) over data from across the enterprise  The warehouse can be viewed as an instance of asynchronous replication from the different data sources (each typically controlled by an RDBMS) in the enterprise  ETL (Extract-Transform-Load) is typically used for updating the warehouse Distributed Databases: Replicated Data Algorithms

31 31 Distributed Locking: How do we manage locks for objects across many nodes?  Centralized: one node does all the locking; vulnerable to single node failure  Primary Copy: all locking for a given object is done at the primary copy node for this object  Fully Distributed: locking for a copy is done at the node where the copy is stored Distributed Databases: Distributed Locking

32 32 Distributed Deadlock Detection:  Each site maintains a local waits-for graph  A global deadlock might exist even if the local graphs contain no cycles:  Three solutions: Centralized: send all local graphs to one site Hierarchical: organize nodes into hierarchy and send local graphs to parent in the hierarchy Timeout: abort the Xact if it waits too long (most common approach) Distributed Databases: Distributed Locking

33 33 Distributed Recovery: New issues:  New kinds of failures, e.g., failed links and failed remote nodes  If a transaction's sub-transactions execute at different nodes, either all of them or none of them must commit!  A log is maintained at each site, and commit protocol actions are additionally logged We need 2PC! Distributed Databases: Distributed Commit Protocol

34 34 Two-Phase Commit (2PC):  The node at which the Xact originated is the coordinator; the other nodes at which it executes are subordinates  When an Xact wants to commit: The coordinator sends a prepare msg to each subordinate. Each subordinate force-writes an abort or prepare log record and then sends a No or Yes msg to the coordinator. If the coordinator gets a unanimous Yes vote, it force-writes a commit log record and sends a commit msg to all subordinates; else it force-writes an abort log record and sends an abort msg. Subordinates force-write an abort/commit log record based on the msg they receive, then send an ack msg back to the coordinator. The coordinator writes an end log record after getting all the acks Distributed Databases: Distributed Commit Protocol
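A toy, single-process sketch of the message flow above; the Sub class, the vote()/decide() method names, and the in-memory lists standing in for force-written logs are assumptions of this sketch, not part of any real 2PC implementation.

```python
class Sub:
    """Stand-in subordinate: its 'log' list mimics force-written log records."""
    def __init__(self, will_vote_yes=True):
        self.log, self.yes = [], will_vote_yes
    def vote(self):
        self.log.append("prepare" if self.yes else "abort")   # force-write before answering
        return "yes" if self.yes else "no"
    def decide(self, decision):
        self.log.append(decision)                             # force-write commit/abort
        return True                                           # ack

def two_phase_commit(coordinator_log, subordinates):
    # Phase 1: voting
    votes = [sub.vote() for sub in subordinates]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    coordinator_log.append(decision)                          # coordinator force-writes its decision
    # Phase 2: termination
    acks = [sub.decide(decision) for sub in subordinates]
    if all(acks):
        coordinator_log.append("end")                         # safe to forget the Xact
    return decision

# e.g. two_phase_commit([], [Sub(), Sub(will_vote_yes=False)]) -> "abort"
```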

35 35 Observations on 2PC: Two rounds of communication messages: first for voting, then to terminate Any node can decide to abort an Xact Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log All commit protocol log records for an Xact contain the Xact-id and the coordinator-id; the coordinator's abort/commit record also includes the ids of all subordinates Distributed Databases: Distributed Commit Protocol

36 36 Observations on 2PC (Contd.): Ack msgs used to let coordinator know when it can “forget” a Xact; until it receives all acks, it must keep T in the Xact Table If coordinator fails after sending prepare msgs but before writing commit/abort log records, when it comes back up it aborts the Xact If a subtransaction does no updates, its commit or abort status is irrelevant Distributed Databases: Distributed Commit Protocol

37 37 2PC: Restart After a Failure at a Node:  If we have a commit or abort log record for Xact T, but not an end record, must redo/undo T If this site is the coordinator for T, keep sending commit/abort msgs to subs until acks received  If we have a prepare log record for Xact T, but not commit/abort, this site is a subordinate for T Repeatedly contact the coordinator to find status of T, then write commit/abort log record; redo/undo T; and write end log record  If we don’t have even a prepare log record for T, unilaterally abort and undo T This site may be coordinator! If so, subs may send msgs Distributed Databases: Distributed Commit Protocol

38 38 2PC: Blocking  If coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers: T is blocked Even if all subordinates know each other (extra overhead in prepare msg) they are blocked unless one of them voted no Distributed Databases: Distributed Commit Protocol

39 39 2PC: Communication Link and Remote Node Failures  If a remote site does not respond during the commit protocol for Xact T, either because the site failed or the link failed: If the current site is the coordinator for T, should abort T If the current site is a subordinate, and has not yet voted yes, it should abort T If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds Distributed Databases: Distributed Commit Protocol

40 40 2PC with Presumed Abort:  When coordinator aborts T, it undoes T and removes it from the Xact Table immediately Doesn’t wait for acks; “presumes abort” if Xact not in Xact Table. Names of subs not recorded in abort log record  Subordinates do not send acks on abort  If subxact does not do updates, it responds to prepare msg with reader instead of yes/no  Coordinator subsequently ignores readers  If all subxacts are readers, 2nd phase is not needed Distributed Databases: Distributed Commit Protocol

41 41 Variants of 2PC:  Linear: nodes are arranged in a chain; the "ok" votes are forwarded along the chain and the commit decision is propagated back  Hierarchical: nodes form a tree rooted at the coordinator; votes flow up and commit msgs flow down Distributed Databases: Distributed Commit Protocol

42 42 Variants of 2PC:  Distributed  Nodes broadcast all messages  Every node knows when to commit Distributed Databases: Distributed Commit Protocol

43 43 3PC = non-blocking commit:  Assume: failed node is down forever  Key idea: before committing, coordinator tells participants everyone is ok Distributed Databases: Distributed Commit Protocol

44 44 3PC – Recovery Rules (Termination Protocol):  The surviving nodes try to complete the transaction, based on their current states  Goal: if the dead nodes committed or aborted, then the survivors should not contradict that! Else, the survivors can do as they please... Distributed Databases: Distributed Commit Protocol

45 45 3PC – Recovery Rules (Termination Protocol):  Let {S_1, S_2, …, S_n} be the states of the survivor sites  If one or more S_i = COMMIT → COMMIT T  If one or more S_i = ABORT → ABORT T  If one or more S_i = PREPARE → T could not have aborted → COMMIT T  If no S_i = PREPARE (or COMMIT) → T could not have committed → ABORT T Distributed Databases: Distributed Commit Protocol
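The same decision rule as a small function, assuming the survivors' protocol states are reported as plain strings (a sketch of the rules above, not a full 3PC implementation):

```python
def three_pc_termination(survivor_states):
    """Decide the fate of T from the survivors' local states after a failure."""
    states = set(survivor_states)
    if "COMMIT" in states:
        return "COMMIT"            # someone already committed, so commit
    if "ABORT" in states:
        return "ABORT"             # someone already aborted, so abort
    if "PREPARE" in states:
        return "COMMIT"            # a pre-commit exists, so T cannot have aborted
    return "ABORT"                 # nobody reached pre-commit, so T cannot have committed

# e.g. three_pc_termination(["WAIT", "PREPARE"]) -> "COMMIT"
```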

46 46 Reliability:  Correctness Serializability Atomicity Persistence  Availability Failure types: Processor failure, storage failure, and network failure – multiple failures! Scenarios Distributed Databases: Reliable Distributed Database Mgmt

47 47 Failure Models  Cannot protect against everything:  Unlikely failures (e.g., flooding in the Sahara)  Failures that are too expensive to protect against (e.g., earthquakes)  Failures we do know how to protect against (e.g., lost or reordered messages, via message sequence numbers; media loss, via stable storage)  Events can be classified as desired vs. undesired and expected vs. unexpected Distributed Databases: Reliable Distributed Database Mgmt

48 48 Node Models:  Fail-stop nodes: a failed node simply halts; its volatile memory is lost but its stable storage survives; over time a node cycles through perfect → halted → recovery → perfect Distributed Databases: Reliable Distributed Database Mgmt

49 49 Node Models:  Byzantine nodes: a failed node may keep running and behave arbitrarily (e.g., send conflicting messages to nodes A, B, C) instead of simply halting  At any given time, at most some fraction f of the nodes have failed (typically f &lt; 1/2 or f &lt; 1/3) Distributed Databases: Reliable Distributed Database Mgmt

50 50 Network Models:  Reliable Network: messages arrive in order; no spontaneous messages; no lost messages except when a node fails  Timeout TD: if no ack arrives within TD seconds, the destination is down (not merely paused) Distributed Databases: Reliable Distributed Database Mgmt

51 51 Network Models:  Variation of Reliable Network Persistent messages  If destination down, net will eventually deliver message  Simplifies node recovery, but leads to inefficiencies (hides too much)  Not considered here Distributed Databases: Reliable Distributed Database Mgmt

52 52 Network Models:  Partitioned Network: messages arrive in order; no spontaneous messages; but there is no timeout, so nodes can have different views of which nodes have failed Distributed Databases: Reliable Distributed Database Mgmt

53 53 Scenarios:  Reliable network Fail-stop nodes  No data replication (1)  Data replication (2)  Partitionable network Fail-stop nodes (3) Distributed Databases: Reliable Distributed Database Mgmt

54 54 Scenarios:  Reliable network / Fail-stop nodes / No data replication (1) Basic idea: a single node P_X controls item X A single control point simplifies concurrency control and recovery Not an availability hit: if P_X is down, X is unavailable anyway (there is no replica)! “P_X controls X” means  P_X does concurrency control for X  P_X does recovery for X Distributed Databases: Reliable Distributed Database Mgmt

55 55 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Say transaction T wants to access X: a process P_T, which represents T at X's node, sends the request to the local DBMS (lock manager, log, and data item X) Distributed Databases: Reliable Distributed Database Mgmt

56 56 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Process Model: Cohorts – the user's application spawns a cohort process (T1, T2, T3, …) at each node it needs; the cohorts communicate with each other and access data through the local DBMS at their node Distributed Databases: Reliable Distributed Database Mgmt

57 57 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Process Model: Transaction servers (transaction managers) – the user talks to one transaction manager; the transaction managers at the different nodes spawn work, communicate among themselves, and access data through their local DBMSs Distributed Databases: Reliable Distributed Database Mgmt

58 58 Scenarios:  Reliable network / Fail-stop nodes / No data replication (Contd.)  Process Model Summary: Cohorts: application code responsible for remote access Transaction manager: “system” handles distribution, remote access Distributed Databases: Reliable Distributed Database Mgmt

59 59 Peer-2-Peer Systems

60 60 Distributed application where nodes are:  Autonomous  Very loosely coupled  Equal in role or functionality  Share & exchange resources with each other Distributed Databases: Peer-2-Peer Distributed Systems

61 61 Related Terms:  File Sharing Ex: Napster, Gnutella, Kazaa, eDonkey, BitTorrent, FreeNet, LimeWire, Morpheus  Grid Computing  Autonomic Computing Distributed Databases: Peer-2-Peer Distributed Systems

62 62 Search in a P2P System: each peer holds its own resources (R_1,1, R_1,2, …; R_2,1, R_2,2, …; R_3,1, R_3,2, …); a peer broadcasts the query “Who has X?” to the other peers Distributed Databases: Peer-2-Peer Distributed Systems

63 63 Search in a P2P System (contd.): the peers that hold X send answers back to the querying peer Distributed Databases: Peer-2-Peer Distributed Systems

64 64 Search in a P2P System (contd.): the querying peer then requests the resource from one of the answering peers and receives it directly from that peer Distributed Databases: Peer-2-Peer Distributed Systems

65 65 Distributed Lookup:  Have (key, value) pairs spread over the nodes  Given a key k, find the matching values  Example: one node holds the (K, V) pairs (1, a), (1, b), (7, c), (4, d) and another holds (1, a), (3, a), (4, a); then lookup(4) = {a, d} Distributed Databases: Peer-2-Peer Distributed Systems
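A naive sketch of the lookup, assuming each node's store is just a Python dict reachable locally; a real P2P system (e.g. a DHT) would route to the responsible node instead of asking everyone.

```python
def lookup(nodes, key):
    """Ask every node for the values matching key and union the answers."""
    values = set()
    for store in nodes:                 # store: dict mapping keys to lists of values
        values.update(store.get(key, []))
    return values

nodes = [{1: ["a", "b"], 7: ["c"], 4: ["d"]},
         {1: ["a"], 3: ["a"], 4: ["a"]}]
assert lookup(nodes, 4) == {"a", "d"}
```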

66 66 Data Distributed Over Nodes:  N nodes, each holding some data  Notation: X.func(params) means an RPC of procedure func(params) at node X; X.A means sending a message to X to get the value of A; if X is omitted, we refer to a local procedure or data structure  Example: at node Y, B := X.A fetches A from node X, and A := X.f(B) invokes f(B) remotely at node X Distributed Databases: Peer-2-Peer Distributed Systems

67 67 Summary

68 68 Parallel DBMSs are designed for scalable performance; relational operators are very well suited to parallel execution  Pipelined and partitioned parallelism Distributed DBMSs offer site autonomy and distributed administration; storage and catalog techniques, concurrency control, and recovery must all be revisited Summary

69 69 END

