1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Synchronous Data Replication These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see http://www.openlineconsult.com/db

4 Copyright © Ellis Cohen, 2002-2005 Data Replication Put copies of the same data at multiple sites Achieves Decreased network latency by placing replicas near multiple high demand sites (access data at nearest replica) High availability & reliability in face of failure Parallel processing Scalability (single copy no longer bottleneck) Disconnected operation Query Processing Is even more complex, since the coordinator's query optimizer additionally needs to decide which replica to use

5 Copyright © Ellis Cohen, 2002-2005 Replication Groups Replication Group: Group of related data items which are replicated together at different sites. A database may have multiple replication groups. Two different replication groups may be replicated –at the same set of sites –at disjoint sets of sites –at overlapping sites. A replication group typically represents all the data used in an application. A transaction generally accesses data in a single replication group; things are much more complicated if a transaction accesses multiple replication groups. [Figure: replication groups Ra and Rb replicated across sites S1, S2 and S3]

6 Copyright © Ellis Cohen, 2002-2005 Consistency Timeframes Immediate Consistency All replicas in a group are consistent after each and every update is made. Transactional Consistency At the end of each transaction, the replica group has a consistent model of the committed data. Updates need only be made at a single replica, and then are consistently propagated to others Eventual Consistency Updates made at one replica will eventually be propagated to other replicas, but at any point in time, different replicas may have inconsistent committed data. The replica group will only have a consistent model of the committed data when the system quiesces (all updates are propagated)

7 Copyright © Ellis Cohen, 2002-2005 1SR - 1 Copy Serializability Replicated form of serializability: Interleaved execution of transactions on a replicated database is equivalent to Serial execution of those transactions on a database where there is only one copy of each data item Eventual approaches –may not be able to ensure 1SR, but –may be able to satisfy other weak consistency guarantees

8 Copyright © Ellis Cohen, 2002-2005 Basic Replication Topologies Master/Snapshot: The replication group has a primary copy held at the master. Its committed state represents the current committed state of the entire replication group. Group: The replication group has no single primary copy. It has a consistent (committed) state only if the replicas are consistent with one another. [Figure labels: Master, Snapshots]

9 Copyright © Ellis Cohen, 2002-2005 Master vs Snapshot Sites Master (Primary Copy) Site Holds all the data in the replication group Holds most consistent / up-to-date version of the data Snapshot Site May hold a partial (instead of a complete) subset of the data in a replication group i.e. a subset of its tables, or even horizontal and/or vertical fragments of tables Often not as up-to-date as Master sites May be completely or partly read-only, in particular, if it contains materialized views (involving multiple tables) of the data in the replication group

10 Copyright © Ellis Cohen, 2002-2005 Partial Snapshots Partial Snapshots further complicate query processing Coordinator may need to get part of the queried data from one snapshot, and part from another Especially complicated if an operation needs to update different partial snapshots (we'll ignore this case!) Static vs Dynamic Partial Snapshots Static Partial Snapshots Specified declaratively, made known to coordinator who uses it for query processing Dynamic Partial Snapshots Replica appears complete, but processes queries by obtaining missing data from primary copy Leads to complex query processing at snapshot site, since its request to the primary copy must take into account data already replicated (which might have already been modified during the transaction!)

11 Copyright © Ellis Cohen, 2002-2005 Snapshots with Materialized Views A materialized view is a view where the results of the view are actually maintained persistently When the underlying data is modified, the materialized view must generally be modified as well (e.g. by triggers, sometimes set up automatically by the replication manager) or deleted. If a query optimizer knows about materialized views, it can rewrite queries on the underlying data to efficiently use the materialized view instead ("query rewriting") Different snapshots may contain different materialized views of the data. If the coordinator's query optimizer knows the location of materialized views, this can affect the replica it asks to process a query. Materialized views can be static or dynamic. Dynamic materialized views often result from remembering the result sets of executed queries.

13 Copyright © Ellis Cohen, 2002-2005 Consistency Models Total Consistency All replicas (unless they are crashed or disconnected) are consistent with one another. Master Consistency There is a master replica (the primary copy). Transactions committed at the master site reflect the intended state of the data. MultiMaster Consistency Like master consistency, but there are multiple master replicas, all consistent with one another

14 Copyright © Ellis Cohen, 2002-2005 Update Propagation Models Synchronous Write-Synchronous (ROWA) Coordinator's write operation is not completed until every replica is updated Commit-Synchronous (EAGER) All replicas commit (using 2PC) as part of coordinator's transaction Asynchronous As-Needed-Propagation (LAZY) After transaction ends, updates are propagated to other replicas as needed Eventual-Propagation (EVENTUAL) After transaction ends, updates are eventually propagated to other replicas What is the consistency timeframe & model for each one?

15 Copyright © Ellis Cohen, 2002-2005 Update & Consistency Models Write-Synchronous (ROWA) Coordinator's write operation is not completed until every replica is updated Immediate Total Consistency Commit-Synchronous (EAGER) All replicas commit (using 2PC) as part of coordinator's transaction Transactional Total Consistency As-Needed-Propagation (LAZY) After transaction ends, updates are propagated to other replicas as needed Transactional (Multi-)Master Consistency Eventual-Propagation (EVENTUAL) After transaction ends, updates are eventually propagated to other replicas Eventual Consistency

18 Copyright © Ellis Cohen, 2002-2005 ROWA Update Model Synchronous Write-Synchronous (ROWA) Coordinator's write operation is not completed until every replica is updated Commit-Synchronous (EAGER) All replicas commit (using 2PC) as part of coordinator's transaction Asynchronous As-Needed-Propagation (LAZY) After transaction ends, updates are propagated to other replicas as needed Eventual-Propagation (EVENTUAL) After transaction ends, updates are eventually propagated to other replicas

19 Copyright © Ellis Cohen, 2002-2005 ROWA Consistency Model Immediate Consistency Write-Synchronous (ROWA) Coordinator's write operation is not completed until every replica is updated Transactional Consistency Commit-Synchronous (EAGER) All replicas commit (using 2PC) as part of coordinator's transaction As-Needed-Propagation (LAZY) After transaction ends, updates are propagated to other replicas as needed Eventual Consistency Eventual-Propagation (EVENTUAL) After transaction ends, updates are eventually propagated to other replicas

20 Copyright © Ellis Cohen, 2002-2005 ROWA Overview Read One, Write All All replicas are updated immediately (without waiting until the transaction doing the update commits) Data can be read from any replica Immediate Total Consistency All replicas (unless crashed or disconnected) are always consistent with one another Topology Group-based. No need for a special primary copy. Concurrency Usually lock-based. Other concurrency models can be used as well. When might the ROWA model be used?

21 Copyright © Ellis Cohen, 2002-2005 ROWA Uses Hot Standby Upon failure, another replica can be switched in immediately Mobile Use Suppose every cell has a nearby replica. A mobile coordinator can switch from replica to replica during a transaction, using whichever one is nearest Reliability Read from multiple replicas simultaneously to – avoid waiting in case of site/link failure – ensure that data is correct Updates can be very expensive. Either they're done infrequently, or they must be worth the cost

22 Copyright © Ellis Cohen, 2002-2005 ROWA On write, the coordinator must acquire X locks at all replicas, and writes to all of them. On read, the coordinator acquires an S lock at the one replica it will actually read. This ensures 1SR. Non-locking concurrency can also be used. ROWA Advantage: Can read from any replica. ROWA Disadvantage: Every write requires a communication round trip involving the farthest & slowest replicas. Serious ROWA Problem: If a replica site crashes, the coordinator and all competing transactions must wait until it recovers. Solutions: All Available Writes; Quorum Consensus (Read Some, Write Some).
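
The lock-based ROWA rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names `Replica`, `LockConflict`, `rowa_write` and `rowa_read` are hypothetical, the lock table lives in memory, and a lock conflict simply raises instead of blocking.

```python
from dataclasses import dataclass, field


class LockConflict(Exception):
    """Raised when a lock cannot be granted (a real system would block or abort)."""


@dataclass
class Replica:
    """One copy of the replication group's data, with a simple lock table."""
    name: str
    data: dict = field(default_factory=dict)
    x_locks: dict = field(default_factory=dict)  # item -> txn holding the X lock
    s_locks: dict = field(default_factory=dict)  # item -> set of txns holding S locks

    def lock_shared(self, txn, item):
        holder = self.x_locks.get(item)
        if holder is not None and holder != txn:
            raise LockConflict(f"{item} is X-locked by {holder} at {self.name}")
        self.s_locks.setdefault(item, set()).add(txn)

    def lock_exclusive(self, txn, item):
        holder = self.x_locks.get(item)
        other_readers = self.s_locks.get(item, set()) - {txn}
        if (holder is not None and holder != txn) or other_readers:
            raise LockConflict(f"{item} is locked by another txn at {self.name}")
        self.x_locks[item] = txn


def rowa_write(txn, replicas, item, value):
    """Write All: X-lock the item at every replica, then write every copy."""
    for r in replicas:
        r.lock_exclusive(txn, item)  # locks already granted stay held until txn ends
    for r in replicas:
        r.data[item] = value


def rowa_read(txn, replica, item):
    """Read One: S-lock the item only at the replica actually read."""
    replica.lock_shared(txn, item)
    return replica.data.get(item)
```

Because a writer holds X locks at every replica, a reader at any single replica either sees the write or conflicts with it, which is what lets reads go to one copy only.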

23 Copyright © Ellis Cohen, 2002-2005 All Available Writes (AAW) At transaction start, assume all replicas are available. Coordinator writes by writing to all known available replicas; those which do not ACK within the timeout period are marked as unavailable but otherwise ignored. Coordinator reads by reading from a chosen (e.g. nearest) replica; if it times out, mark it as unavailable, and read from a different replica. Coordinator augments 2PC with: Missing Writes Validation: Make sure that all replicas that were not written to are still unavailable. Access Validation: Make sure that all replicas read or written are still available. This is necessary for 1SR.
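
A rough sketch of the AAW bookkeeping described above, under simplifying assumptions: a site's `up` flag stands in for "ACKs within the timeout", the first site in the list stands in for "nearest", and the class and method names (`Site`, `AAWTransaction`, `validate`) are illustrative only.

```python
class Site:
    """A replica site; `up=False` models a site that times out."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, item, value):
        """Return True as an ACK, or False to model a timeout."""
        if not self.up:
            return False
        self.data[item] = value
        return True


class AAWTransaction:
    def __init__(self, sites):
        self.available = list(sites)  # at transaction start, assume all are available
        self.unavailable = []         # sites that timed out and were skipped
        self.accessed = []            # sites actually read or written

    def _mark_unavailable(self, site):
        self.available.remove(site)
        self.unavailable.append(site)

    def _mark_accessed(self, site):
        if site not in self.accessed:
            self.accessed.append(site)

    def write(self, item, value):
        for site in list(self.available):
            if site.write(item, value):
                self._mark_accessed(site)
            else:
                self._mark_unavailable(site)  # no ACK: ignore it from now on

    def read(self, item):
        for site in list(self.available):
            if site.up:
                self._mark_accessed(site)
                return site.data.get(item)
            self._mark_unavailable(site)      # timed out: try a different replica
        raise RuntimeError("no available replica")

    def validate(self):
        """Extra checks run alongside 2PC before the transaction may commit."""
        missing_writes_ok = all(not s.up for s in self.unavailable)
        access_ok = all(s.up for s in self.accessed)
        return missing_writes_ok and access_ok
```

If a skipped site has come back by commit time, missing writes validation fails, since that site may already be serving stale data to other transactions.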

24 Copyright © Ellis Cohen, 2002-2005 Partitioning Assume a set of replicas are partitioned. Can each partition continue executing read-write transactions that update its set of replicas? Majority Partition Approach: Only if the partition contains a (weighted) majority of the replicas. Disconnected Operation: Each can continue. Requires reconciliation when the network recovers (discussed later). [Figure: two partitions, with labels C1 and C2]
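
The majority partition test is simple arithmetic over replica weights. A minimal sketch, assuming each replica has a fixed integer weight known to every site (the function name and the weights below are made up for illustration):

```python
def has_weighted_majority(partition, weights):
    """True if the replicas in `partition` hold a strict majority of total weight."""
    total = sum(weights.values())
    mine = sum(weights[site] for site in partition)
    return 2 * mine > total


weights = {"S1": 2, "S2": 1, "S3": 1}
can_update_a = has_weighted_majority({"S1", "S2"}, weights)  # weight 3 of 4
can_update_b = has_weighted_majority({"S3"}, weights)        # weight 1 of 4
```

Requiring a strict majority means at most one partition can pass the test, so two partitions can never both accept updates.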

25 Copyright © Ellis Cohen, 2002-2005 Site Recovery On restart –Site contacts sibling replica –Obtains & processes [relevant portion of] log of all (sub)transactions committed while site was down, carefully in case [a] new transaction completes while processing the log –Makes itself available again (i.e. responds to reads and writes). There are many variations of this protocol, esp. to accommodate dynamic creation, removal and relocation of replica sites.

26 Copyright © Ellis Cohen, 2002-2005 Multiple Reads Read from n replicas in parallel: Allows the fastest one to respond. Avoids taking time to read another replica if the first one is unavailable. Use Voting: Detect/correct errors/sabotage by comparing the results of multiple reads. Guarantees getting the latest value even if not all replicas were updated (Quorum Consensus Protocol: requires that the write set contains a weighted majority of replicas).
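
To show why reading several replicas can return the latest value even when not every replica was written, here is a small quorum sketch. It assumes unweighted replicas (every weight is 1), a version number stored with each value, and hypothetical helper names; the slides do not prescribe this representation.

```python
def quorum_sizes_ok(n, r, w):
    """Read and write sets must overlap, and any two write sets must overlap."""
    return r + w > n and 2 * w > n


def quorum_write(replicas, write_set, item, value):
    """Write to a quorum, stamping the value with a version above any seen there."""
    latest = max(replicas[s].get(item, (0, None))[0] for s in write_set)
    for s in write_set:
        replicas[s][item] = (latest + 1, value)


def quorum_read(replicas, read_set, item):
    """Read from a quorum and keep the value with the highest version."""
    version, value = max(
        (replicas[s].get(item, (0, None)) for s in read_set),
        key=lambda versioned: versioned[0],
    )
    return value


replicas = {"S1": {}, "S2": {}, "S3": {}}
quorum_write(replicas, ["S1", "S2"], "x", "a")
quorum_write(replicas, ["S2", "S3"], "x", "b")   # S1 still holds the old value
latest = quorum_read(replicas, ["S1", "S3"], "x")
```

Any read set of size 2 must include at least one replica from the last write set, so the stale copy at S1 is always outvoted by a higher version.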

27 Copyright © Ellis Cohen, 2002-2005 ROWA Summary ROWA Advantages Global Consistency & 1SR Can read from any replica ROWA Disadvantages Every write requires writing all (available) replicas High overhead for every write (Can trade off write all for quorum read, though it is generally more expensive)

29 Copyright © Ellis Cohen, 2002-2005 Eager Update Model Synchronous Write-Synchronous (ROWA) Coordinator's write operation is not completed until every replica is updated Commit-Synchronous (EAGER) All replicas commit (using 2PC) as part of coordinator's transaction Asynchronous As-Needed-Propagation (LAZY) After transaction ends, updates are propagated to other replicas as needed Eventual-Propagation (EVENTUAL) After transaction ends, updates are eventually propagated to other replicas

30 Copyright © Ellis Cohen, 2002-2005 Eager Consistency Model Immediate Consistency Write-Synchronous (ROWA) Coordinator's write operation is not completed until every replica is updated Transactional Consistency Commit-Synchronous (EAGER) All replicas commit (using 2PC) as part of coordinator's transaction As-Needed-Propagation (LAZY) After transaction ends, updates are propagated to other replicas as needed Eventual Consistency Eventual-Propagation (EVENTUAL) After transaction ends, updates are eventually propagated to other replicas

31 Copyright © Ellis Cohen, 2002-2005 Eager Overview Read & Write Coordinator uses a single replica for all reads & writes of replication group data. [If replicas hold partial snapshot, may need to read/write from multiple ones] There are variants that just write to the master. Transactional Total Consistency At the end of each transaction, all replicas in the group have a consistent model of the committed data. Updates made at a single replica are consistently propagated to others at/by commit-time. Topology Either Group-based or Master/Snapshot Concurrency All concurrency mechanisms can be used When might the Eager model be used?

32 Copyright © Ellis Cohen, 2002-2005 Eager Uses Hot Standby: Upon failure, another replica can be switched in immediately, although transactions which updated the failed replica will need to be aborted. Disconnected Operation: If the network is partitioned and a partition contains a replica, operations can continue there. Read-only transactions will have access to an up-to-date version of the data. Read/write operations can continue if reconciliation is supported. Serializability: Ensuring transactional consistency ensures that concurrent transactions which use different replicas are serializable. Commits can be expensive, since they require 2PC involving every replica.

33 Copyright © Ellis Cohen, 2002-2005 Eager Master/Snapshot Read & write from a single replica: the coordinator interacts with a single replica (e.g. nearest one) chosen from the replica group. During 2PC –Coordinator requests PREPARE from that replica –(Unless the chosen replica is the master), the chosen replica requests PREPARE from the Master, propagating all updates along with the request –The master requests PREPARE from all the other snapshot replicas, propagating all updates along with the request. What happens in a hierarchical master topology? If the transaction uses data from two replication groups, which have replicas on the same machine, how does that affect 2PC?

34 Copyright © Ellis Cohen, 2002-2005 Eager Master/Snapshot Concurrency Can use either locking or non-locking concurrency. Lock-Based: Data is locked at the primary copy. Either the coordinator or the chosen replica requests those locks. Non-Lock-Based: Validation/checking is done at the master, which acts as a commit gateway.

35 Copyright © Ellis Cohen, 2002-2005 Eager Propagation Models When and how are updates propagated –from chosen replica to master –from master to other snapshot replicas? Transactional Batch: Send batched information about writes to replicas along with the PREPARE message. Continuous on Write: Propagate each update when it occurs (don't wait for the end of the transaction). Immediate Confirmation: Propagate each update when it occurs, and wait for an ACK. Similar to ROWA, but propagation is managed by the replication group, not by the coordinator.

36 Copyright © Ellis Cohen, 2002-2005 Propagation Capture & Apply In what format are updates "captured" where they are made, and how are they applied by the other replicas? Log-Based –Operations (logical log format; operation may need to be modified for partial replicas) –Deltas (physiological log format: "before" & "after" values of rows) Procedural –Suppose each transaction is implemented by a stored DB procedure. Just propagate the identity of the procedure and the parameters to it –May require that replicas be complete
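
As an illustration of the delta style of capture and apply, the sketch below records "before" and "after" row values where a change is made and replays them at another replica. It assumes a table is a plain dict keyed by primary key; `Delta`, `capture` and `apply_deltas` are hypothetical names, not an actual replication API.

```python
from typing import NamedTuple, Optional


class Delta(NamedTuple):
    """One captured row change: before is None for an insert, after is None for a delete."""
    key: str
    before: Optional[dict]
    after: Optional[dict]


def capture(table, key, new_row):
    """Apply a change where it is made and return the delta that describes it."""
    before = table.get(key)
    if new_row is None:
        table.pop(key, None)
        return Delta(key, before, None)
    table[key] = dict(new_row)
    return Delta(key, before, dict(new_row))


def apply_deltas(table, deltas):
    """Replay captured deltas, in order, at another replica."""
    for d in deltas:
        if d.after is None:
            table.pop(d.key, None)
        else:
            table[d.key] = dict(d.after)


master, snapshot = {}, {}
log = [
    capture(master, "e1", {"sal": 100}),
    capture(master, "e1", {"sal": 120}),
    capture(master, "e2", {"sal": 90}),
    capture(master, "e2", None),
]
apply_deltas(snapshot, log)
```

This sketch only uses the "after" values when applying; the "before" values become useful when a receiving site needs to check whether an incoming change conflicts with its current state.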

37 Copyright © Ellis Cohen, 2002-2005 Eager Group Read & write from a single replica: the coordinator interacts with a single replica (e.g. nearest one) chosen from the replica group. During 2PC –Coordinator requests PREPARE from that replica –That replica requests PREPARE from all the other snapshot replicas, propagating all updates along with the request. No primary copy, so –Must use a non-locking protocol –Validation/checking must be done at every replica.

38 Copyright © Ellis Cohen, 2002-2005 Eager Variants All reads and writes are to master only –Other replicas used for hot standbys, or to support disconnected operation –Used to implement 2-safe backup. All writes are to master only –Queries of data unchanged by current transaction can be directed to any replica –How about querying data affected by the transaction's updates? Either such queries must be directed to the master, or the coordinator maintains a client-side cache with all changes, and queries use the cache + any replica.

39 Copyright © Ellis Cohen, 2002-2005 Eager Summary EAGER Advantages Global Consistency & 1SR Need not immediately propagate each write Can read/write from any single replica (except for variants) EAGER Disadvantages Every commit requires propagating to all (available) replicas High overhead for every commit

40 Copyright © Ellis Cohen, 2002-2005 MultiMaster Model MultiMaster (Combines Group & Master) What kind of update model should be used among the master sites? What kind of update model should be used among a master and its snapshots?

42 Copyright © Ellis Cohen, 2002-2005 Failure and Partitioning When eager replication is used, the replicas all need to be able to communicate with one another. Failure prevents communication. –Site failure -- a site crashes –Network failure -- a link or links fail, partitioning the network. A live replica can't tell which of these is responsible for its inability to communicate. It can generally assume that it is in a partition with just the replicas it can communicate with.

43 Copyright © Ellis Cohen, 2002-2005 Partitioning Assume a set of replicas are partitioned. Can each partition continue executing read-write transactions that update its set of replicas? Majority Partition Approach: Only if the partition contains a (weighted) majority of the replicas. Disconnected Operation: Each can continue. Requires merging (a.k.a. reconciliation) when the network recovers. [Figure: two partitions, with labels C1 and C2]

44 Copyright © Ellis Cohen, 2002-2005 Primary Copy Election In a master/snapshot topology, each partition needs a primary copy. What if a partition doesn’t have one? Majority Partition Approach Use weights to ensure that the majority partition contains the primary copy. [But what if the primary copy itself crashed?] Elect a Primary Copy Elect a primary copy using an election protocol [similar to 3PC protocol to elect a new coordinator] What should be done in a multimaster environment?

45 Copyright © Ellis Cohen, 2002-2005 Discovering Transaction Conflicts As part of healing (i.e. recovery from) a network partition, conflicts may be discovered between committed transactions that were in disconnected partitions. Modification (W/W) Conflicts: Transactions in different partitions modified the same data item (inconsistently). Can lead to lost updates. R/W Conflicts: A transaction in one partition read data that was modified in the other partition. Can lead to non-serializable results; however, because the results in each partition are consistent (w.r.t. the partition), it is sometimes acceptable to ignore pure R/W conflicts.

46 Copyright © Ellis Cohen, 2002-2005 Eager Reconciliation Approaches Compensation "Undo" conflicting committed transactions by executing compensating transactions. Tentative Commit When disconnected, transactions only commit tentatively. During reconciliation, these are either fully committed or aborted. Conflict Resolution Conflicting modifications are resolved by "merging" the changes.

47 Copyright © Ellis Cohen, 2002-2005 Primary vs Group Reconciliation Eager Primary Reconciliation During healing, the elected primary provides a description (typically the log) of transactions committed during partition to the original primary The original primary identifies and reconciles all conflicts, which are (in the normal course of things) propagated to all the replicas Eager Group Reconciliation During healing, a replica provides its changes to some or all replicas it was partitioned from. Each replica identifies, reconciles and propagates changes independently. To maintain consistency, this implies –Symmetric reconciliation: The results of reconciliation must be identical at each replica, independent of the order in which changes and propagated updates are received –A replica must be able to ignore changes and propagated updates it has already processed

48 Copyright © Ellis Cohen, 2002-2005 Compensation Every transaction that might need to be "undone" has a compensating transaction associated with it. A committed transaction that has a conflict is "undone" by executing its compensating transaction (often followed by re-executing the original transaction) This can lead to cascading compensation. Any committed transaction which read data written by the original transaction may need to have its compensating transaction run as well.

49 Copyright © Ellis Cohen, 2002-2005 Motivating Tentative Commit If we can delay all commitments during partition, we can simply abort conflicting transactions during healing. However, commitment is necessary for reducing resource conflicts Long-running transactions that don't commit –If lock-based: Can block other transactions for long periods –If validation-based: Are more likely to fail validation

50 Copyright © Ellis Cohen, 2002-2005 Tentative Commitment During network partition, commits are tentative –A tentatively committed transaction is not yet durable and may subsequently be aborted. –However, other transactions may see its updates. This can lead to cascaded aborts, so they must be tentative as well. Reconciliation resolves tentative commits –Transactions without conflicts will be fully committed –Transactions with conflicts will be aborted Usually uses primary reconciliation –All resolution is done at the primary copy –If group reconciliation is used, it must be symmetric, otherwise transactions will be committed at some sites and aborted at others A system might also allow a transaction to explicitly commit tentatively (even without using replicas), and then be either committed or aborted at a later time (forcing cascaded aborts)
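
The way reconciliation resolves tentative commits, including cascaded aborts, can be sketched as a small fixed-point computation. This assumes the set of conflicting transactions and the reads-from relation are already known; the function and parameter names are illustrative.

```python
def resolve_tentative(tentative, conflicting, read_from):
    """Split tentatively committed transactions into committed and aborted.

    tentative:   transaction ids in tentative-commit order
    conflicting: ids found to conflict during reconciliation
    read_from:   id -> set of ids whose tentative updates it read
    """
    aborted = set(conflicting)
    changed = True
    while changed:  # cascade: abort anything that read from an aborted transaction
        changed = False
        for txn in tentative:
            if txn not in aborted and read_from.get(txn, set()) & aborted:
                aborted.add(txn)
                changed = True
    committed = [txn for txn in tentative if txn not in aborted]
    return committed, aborted


committed, aborted = resolve_tentative(
    tentative=["T1", "T2", "T3", "T4"],
    conflicting={"T1"},
    read_from={"T2": {"T1"}, "T3": {"T2"}},
)
```

Here T2 read from T1 and T3 read from T2, so a single conflict on T1 aborts all three, while T4 is fully committed.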

51 Copyright © Ellis Cohen, 2002-2005 Clients & Tentative Commitment Explicit Abort A client may be able to explicitly abort a transaction that is still only tentatively committed Triggering & Notification A client may be able to arrange to –execute a procedure when a tentatively committed transaction is about to be committed (and which could actually abort the transaction) –to notify the user or (more generally) execute a procedure after a tentatively committed transaction is committed or aborted.

52 Copyright © Ellis Cohen, 2002-2005 Identifying Modification Conflicts A site may receive an unprocessed update (transaction log entry) which conflicts with its current state Update Conflict Old value of log entry <> Current record state Insert Conflict Primary key of record to be inserted is already in the table Delete Conflict Primary key of record to be updated or deleted not present in table
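
The three checks above translate directly into a small classifier. This is a sketch under assumptions: a table is a dict keyed by primary key, an incoming log entry carries an operation name and (for updates) the old row value, and `classify_conflict` is a made-up name.

```python
def classify_conflict(table, op, key, old_row=None):
    """Return the kind of conflict an incoming log entry causes, or None."""
    if op not in ("insert", "update", "delete"):
        raise ValueError(f"unknown operation: {op}")
    if op == "insert":
        # Insert conflict: the primary key is already in the table.
        return "insert conflict" if key in table else None
    if key not in table:
        # Delete conflict: the row to be updated or deleted is not present.
        return "delete conflict"
    if op == "update" and table[key] != old_row:
        # Update conflict: the entry's old value differs from the current state.
        return "update conflict"
    return None


table = {"e1": {"sal": 120}}
stale_update = classify_conflict(table, "update", "e1", old_row={"sal": 100})
clean_update = classify_conflict(table, "update", "e1", old_row={"sal": 120})
```

An update whose old value matches the current row applies cleanly; one based on a stale value is reported as an update conflict and handed to a resolution technique.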

53 Copyright © Ellis Cohen, 2002-2005 Resolution Techniques for Modification Conflicts Latest Timestamp: If update timestamp > data timestamp, do update, else discard. Example: Address change. Max: If new value > current data value, do update, else discard. Example: Max daily temperature. Additive: Data value := current data value + update's new value - update's old value. Example: Bank account balance. These are built into Oracle; others may be defined by the DBA. These conflict resolution techniques can be used as a prelude to either compensation or tentative commit resolution.
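
Each of the three techniques above is a one-line rule. The functions below are a plain-Python sketch of those rules with invented names and example figures; they are not Oracle's API.

```python
def resolve_latest_timestamp(current, current_ts, new, new_ts):
    """Keep whichever value carries the later timestamp."""
    return (new, new_ts) if new_ts > current_ts else (current, current_ts)


def resolve_max(current, new):
    """Keep the larger value (e.g. max daily temperature)."""
    return new if new > current else current


def resolve_additive(current, old, new):
    """Apply the update's net change on top of the current value."""
    return current + new - old


# A balance of 100 is changed in two partitions: +50 in one, -30 in the other.
deposit = {"old": 100, "new": 150}
withdrawal = {"old": 100, "new": 70}

# Site A applied the deposit locally (balance 150), then receives the withdrawal.
at_a = resolve_additive(150, withdrawal["old"], withdrawal["new"])
# Site B applied the withdrawal locally (balance 70), then receives the deposit.
at_b = resolve_additive(70, deposit["old"], deposit["new"])
```

Both sites end at 120 regardless of the order in which the updates arrive, which is the symmetry that group reconciliation needs.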
