CS 603 Data Replication February 25, 2002

Data Replication: Why?
Fault Tolerance
–Hot backup
–Catastrophic failure
Performance
–Parallelism
–Decreased reliance on network
This is a two-edged sword

Data Replication: What?
Correctness criterion: Replication invisible
–Results indistinguishable from one-copy database
–One-copy serializability (1SR)
Alternatives
–Bounded inconsistency
–User selection of real/copy
More discussion Friday

Data Replication: How?
Goal: Ensure one-copy serializability
Write-all solution: All copies identical
–Write goes to every site
–Read from any site
–Standard single-copy concurrency control
–Guarantees 1SR
Single-copy concurrency control gives serializable execution
–Equivalent to serial execution where all writes happen in one transaction
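The write-all, read-any rule can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the class name `ReplicatedItem` and the site labels are hypothetical, and concurrency control and failures are left out entirely.

```python
class ReplicatedItem:
    """One logical item stored as identical copies at several sites (hypothetical model)."""

    def __init__(self, sites, initial):
        # Every site starts with the same value.
        self.copies = {site: initial for site in sites}

    def write(self, value):
        # Write-all: the logical write is applied at every site.
        for site in self.copies:
            self.copies[site] = value

    def read(self, site):
        # Read-any: a single copy, at any site, serves the read.
        return self.copies[site]


x = ReplicatedItem(["A", "B", "C"], initial=3)
x.write(5)
values = [x.read(site) for site in ["A", "B", "C"]]
print(values)  # [5, 5, 5]
```

Because every copy receives every write, a read at any site sees the same value a one-copy database would return.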

Write All Approach
[Figure: a writer changes the value from 3 to 5 at every copy; a reader reads from any one copy]

Problem: Site Failure
Failure causes write to block
–Must maintain locks
–Clogs up entire system
Is this fault tolerance?
What about “write all available”?
–$T_0$: $w_0[x_A]\ w_0[x_B]\ w_0[y_C]\ c_0$
–B fails
–$T_1$: $r_1[y_C]\ w_1[x_A]\ c_1$
–B recovers
–$T_2$: $r_2[x_B]\ w_2[y_C]\ c_2$
What is the serial equivalent order?
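One way to answer the question is to try every serial order of the three transactions against a one-copy database and compare reads-from relationships. The sketch below does that by brute force. It assumes that the copy of x at B still holds T0's value when T2 reads it, since B was down during T1's write; the names `txns`, `observed`, and `reads_from` are illustrative only.

```python
from itertools import permutations

# Logical (one-copy) view of each transaction: ordered (operation, item) pairs.
txns = {
    "T0": [("w", "x"), ("w", "y")],
    "T1": [("r", "y"), ("w", "x")],
    "T2": [("r", "x"), ("w", "y")],
}

# Reads-from observed in the replicated history:
# T1 read the copy of y at C written by T0; T2 read the copy of x at B,
# last written by T0 because B missed T1's write (assumption stated above).
observed = {("T1", "y"): "T0", ("T2", "x"): "T0"}


def reads_from(order):
    """Reads-from relation of a serial one-copy execution in the given order."""
    last_writer = {}
    result = {}
    for t in order:
        for op, item in txns[t]:
            if op == "r":
                result[(t, item)] = last_writer.get(item)
            else:
                last_writer[item] = t
    return result


matches = [order for order in permutations(txns) if reads_from(order) == observed]
print(matches)  # []
```

No serial order reproduces the observed reads: T1 must precede T2 (T1 reads y before T2 overwrites it), yet T2 must precede T1 (T2 reads x before T1 overwrites it). Under these assumptions the history has no serial equivalent.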

[Figure: writer (3 to 5), reader, and the values held by each copy]

Model for Replicated Data
Data and Transaction Managers at each site
–Data Manager: local concurrency control to guarantee local serializability
–Transaction Manager: distributed actions
 Turns reads/writes into multi-site reads/writes
 Runs commit protocol
Directory to get sites of each copy

Failure Assumptions
Communications failure: Site A does not receive reads/writes on $x_A$ issued by B
Site failure: Site A is unable to process reads/writes on $x_A$ issued by B
Communications failure: Site A processes but does not acknowledge reads/writes on $x_A$ issued by B
Fail-stop model, detectable by timeout

Types of Write
Write(x): All copies of x will eventually be written
Immediate write
–Send write to all sites on request
–Quick detection of conflict
Delayed write
–Delays non-local writes until commit
–Minimizes message traffic
–Abort is cheap
Primary copy write
–Quick detection of conflict
–Lower message traffic than immediate write

Distributed Serializability
A complete replicated data (RD) history $H$ over $T = \{T_0, \ldots, T_n\}$ is a partial order with ordering relation $<$ where
–$H = h\left(\bigcup_{i=0}^{n} T_i\right)$ for some translation function $h$
–for each $T_i$ and all operations $p_i, q_i$ in $T_i$, if $p_i <_i q_i$, then every operation in $h(p_i)$ is related by $<$ to every operation in $h(q_i)$
–for every $r_j[x_A]$, there is at least one $w_i[x_A] < r_j[x_A]$
–if $w_i[x] \in H$ and $r_j[x] \in H$, then $w_i[x] < r_j[x]$ or $r_j[x] < w_i[x]$
–if $w_i[x] <_i r_i[x]$ and $h(r_i[x]) = r_i[x_A]$, then $w_i[x_A] \in h(w_i[x])$
Theorem: If the reads-from relationships are the same as in a serial history, the RD history is one-copy serializable

Write All Available Fails
Even if no recovery!

Solutions
Validate availability on commit
–Check whether any copy whose write failed is now available
–Check that all sites read or written are still available
–Enforces serializability for site failures
Doesn’t work with communication failures!
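The two commit-time checks can be sketched as a single predicate. This is an illustration under assumptions: the function name `validate_on_commit` and the `is_up` availability probe are hypothetical, and a real system would learn availability through timeouts rather than a lookup table.

```python
def validate_on_commit(missed_writes, accessed_sites, is_up):
    """Return True if the transaction may commit under the two checks.

    missed_writes:  sites whose copies could not be written because they were down
    accessed_sites: sites the transaction actually read from or wrote to
    is_up:          callable site -> bool giving current availability (hypothetical probe)
    """
    # Check 1: a site that missed a write must still be unavailable;
    # if it has come back, it holds a stale copy that others may read.
    if any(is_up(site) for site in missed_writes):
        return False
    # Check 2: every site that was read or written must still be available.
    return all(is_up(site) for site in accessed_sites)


# T1 from the earlier history: it wrote x at A, read y at C, and missed B.
status = {"A": True, "B": False, "C": True}
ok_while_b_down = validate_on_commit({"B"}, {"A", "C"}, status.get)

status["B"] = True  # B recovers before T1 commits
ok_after_b_recovers = validate_on_commit({"B"}, {"A", "C"}, status.get)

print(ok_while_b_down, ok_after_b_recovers)  # True False
```

With B still down, T1 commits. If B has recovered, T1 fails validation, which prevents a later transaction from reading the stale copy at B after T1 has committed.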

Communication Failures
Available copies fails on network partition
–Each side succeeds in validation
Write all blocks
Write n-k, read k+1
–Generalization of the “write all” approach (k = 0 gives write all, read any)
–Writes tolerate up to k failed sites, reads up to n-k-1
–Tradeoff read vs. write performance
–Partition effect based on size of the smaller partition:
 Fewer than k+1 sites: small partition acts as if all its sites failed, large partition continues
 Otherwise: entire system becomes read-only
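The partition behaviour follows from simple counting: a group of sites can read if it holds at least k+1 copies and write if it holds at least n-k. The sketch below works this out for a two-way partition; the function names and the choice n = 5, k = 1 are illustrative assumptions.

```python
def quorum_status(n, k, reachable):
    """Operations a group of `reachable` sites can perform under write n-k, read k+1."""
    return {"read": reachable >= k + 1, "write": reachable >= n - k}


def partition_effect(n, k, small):
    """Status of both sides when n sites split into groups of `small` and n - small."""
    return {
        "small": quorum_status(n, k, small),
        "large": quorum_status(n, k, n - small),
    }


# n = 5 copies, k = 1: writes need 4 sites, reads need 2.
one_site_cut_off = partition_effect(5, 1, small=1)
two_sites_cut_off = partition_effect(5, 1, small=2)

print(one_site_cut_off)
# {'small': {'read': False, 'write': False}, 'large': {'read': True, 'write': True}}
print(two_sites_cut_off)
# {'small': {'read': True, 'write': False}, 'large': {'read': True, 'write': False}}
```

When the small side has fewer than k+1 sites it can do nothing and the large side carries on. Once the small side reaches k+1 sites, the large side has at most n-k-1 and can no longer write, so both sides are read-only.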

Other approaches: Don’t enforce Serializability!
Master copy
–Writes must update master copy
–Reads can be consistent or inconsistent
Bounded inconsistency
–Time bound on update of copies
–Value bound: write all if difference too great
Dumps consistency on the application
–Added complexity
–Better performance
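A value bound might look like the sketch below, which combines it with a master copy. This is one possible reading, not a design given in the slides: the class `ValueBoundedItem`, the numeric bound, and the rule of refreshing every copy at once are all assumptions made for illustration.

```python
class ValueBoundedItem:
    """Master copy plus replicas that may lag by at most `bound` (hypothetical design)."""

    def __init__(self, sites, initial, bound):
        self.master = initial
        self.copies = {site: initial for site in sites}
        self.bound = bound

    def write(self, value):
        # Every write updates the master copy.
        self.master = value
        # Write all copies only if some copy would differ from the master by more than the bound.
        if any(abs(value - old) > self.bound for old in self.copies.values()):
            for site in self.copies:
                self.copies[site] = value

    def read(self, site, consistent=False):
        # A consistent read goes to the master; an inconsistent read uses the local copy.
        return self.master if consistent else self.copies[site]


price = ValueBoundedItem(["A", "B"], initial=100, bound=5)
price.write(103)                       # within the bound: copies stay at 100
stale = price.read("A")
fresh = price.read("A", consistent=True)
price.write(107)                       # 7 away from the copies: write all
after = price.read("B")
print(stale, fresh, after)  # 100 103 107
```

The application chooses per read whether a slightly stale local value is acceptable, which is where the added complexity and the performance gain both come from.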
