1 Dynamic Atomic Storage Without Consensus
Alex Shraer (Technion)
Joint work with: Marcos K. Aguilera (MSR), Idit Keidar (Technion), Dahlia Malkhi (MSR)

2 The Goal
Reliable replicated storage, accessed by clients and hosted on servers (processes), built from unreliable components
Asynchrony: tolerate unpredictable network delays

3 Designing an Asynchronous Replicated System
State machine replication (e.g., Paxos)
– Can implement any object
– Impossible in asynchronous systems with crash failures (it requires consensus)
Atomic R/W register [Attiya, Bar-Noy, Dolev 95]
– Simple object: read(), write(v)
– Possible in an asynchronous system
– Atomic (linearizable)
– Liveness: if #failures < #servers/2, then every operation invoked on a correct server eventually completes
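The majority-quorum technique behind static atomic registers can be sketched as follows. This is a minimal, single-threaded simulation of the multi-writer variant (two phases per operation: query a majority, then propagate to a majority); servers are in-memory objects, quorums are chosen explicitly by the caller, and messaging, concurrency and crashes are not modeled. All names (`Replica`, `query`, `propagate`, `write`, `read`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    """One simulated server holding a timestamped value.
    Timestamps are (counter, writer_id) pairs compared lexicographically."""
    ts: tuple = (0, "")
    value: object = None

    def update(self, ts, value):
        # Adopt the incoming pair only if it is newer than what we hold.
        if ts > self.ts:
            self.ts, self.value = ts, value

def is_majority(quorum, n):
    return len(set(quorum)) > n / 2

def query(replicas, quorum):
    """Phase 1: return the (ts, value) pair with the highest timestamp in a majority."""
    assert is_majority(quorum, len(replicas))
    return max(((replicas[i].ts, replicas[i].value) for i in quorum),
               key=lambda pair: pair[0])

def propagate(replicas, quorum, ts, value):
    """Phase 2: store (ts, value) at a majority."""
    assert is_majority(quorum, len(replicas))
    for i in quorum:
        replicas[i].update(ts, value)

def write(replicas, writer_id, value, q1, q2):
    (counter, _), _ = query(replicas, q1)   # learn the highest timestamp
    propagate(replicas, q2, (counter + 1, writer_id), value)

def read(replicas, q1, q2):
    ts, value = query(replicas, q1)
    propagate(replicas, q2, ts, value)      # write-back before returning
    return value

servers = [Replica() for _ in range(5)]
write(servers, "w1", "x", q1=[0, 1, 2], q2=[0, 1, 2])
print(read(servers, q1=[2, 3, 4], q2=[2, 3, 4]))  # x
```

Because any two majorities of the same 5 servers intersect, the read's query phase always sees the latest completed write; the write-back phase keeps later reads from returning an older value.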

4 Breaking the Minority Barrier
Over a long period of time, #failures < #servers/2 is not good enough
Reconfiguration!
– Increase resilience by changing the set of servers
– Example: 3 failures out of 5 servers (A, B, C, D, E)
Semantics of a reconfigurable R/W register:
– Atomic (linearizable)
– Liveness: ?
Our first contribution: the first "black box" definition (in terms of the user interface)

5 Reconfigurable Register: User Interface
read() (returns a value)
write(value) (returns OK)
reconfig(c) (returns OK)
– c is a set of changes (relative to the current configuration)
– Each change is either (Add, pid) or (Remove, pid)
– Example: c = {+C, +E, –D}
Only processes that were successfully added can invoke operations
Universe of processes (servers):
– Unknown, unbounded, possibly infinite
– At any given time, only a finite number have been added
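One way to picture a change set is to treat a configuration as the accumulated set of (op, pid) changes, where a pid is a member if it was added and never removed. The sketch below assumes that representation and an initial configuration {A, B, D}, neither of which the slide specifies; `members` and `reconfig` are hypothetical helpers.

```python
def members(changes):
    """Servers in a configuration represented as an accumulated set of changes:
    a pid is a member if it was added and never removed (assumed representation)."""
    added = {pid for op, pid in changes if op == "+"}
    removed = {pid for op, pid in changes if op == "-"}
    return added - removed

def reconfig(config, c):
    """Return the configuration after applying the change set c (relative to config)."""
    return frozenset(config) | frozenset(c)

initial = {("+", "A"), ("+", "B"), ("+", "D")}   # assumed starting configuration
c = {("+", "C"), ("+", "E"), ("-", "D")}         # c = {+C, +E, -D}
new = reconfig(initial, c)
print(sorted(members(new)))  # ['A', 'B', 'C', 'E']
```

A side effect of this representation is that a removed pid stays removed even if it is added again, so each pid should be treated as single-use.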

6 Definitions
Current(t): servers in the system at time t (the "current configuration")
AddPending(t): servers whose Add is pending at t
RemovePending(t): servers whose Remove is pending at t
Faulty(t): servers that have crashed by t
p_i is active in an execution if:
– p_i does not crash during the execution
– Some process invokes reconfig adding p_i
– No process invokes reconfig removing p_i

7 Dynamic System Liveness
Static system: operations complete if #failures < #servers/2
What should this be in a dynamic system?
Try #1: for every t, a minority of Current(t) is in Faulty(t)
– What if processes crash while others are being removed? Then no operation is guaranteed to complete in the new configuration!
– Figure: reconfig({–A}) returns OK in a system of servers A, B, C
Try #2: for every t, a minority of Current(t) is in Faulty(t) ∪ RemovePending(t)
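The gap between Try #1 and Try #2 shows up in a small scenario. The sketch assumes, for illustration, that B crashes while the removal of A is pending in {A, B, C}; the helper `minority` is hypothetical.

```python
def minority(part, whole):
    """True if fewer than half of `whole` belongs to `part`."""
    return len(part & whole) < len(whole) / 2

current = {"A", "B", "C"}
remove_pending = {"A"}   # reconfig({-A}) is in progress
faulty = {"B"}           # assumed crash, for illustration

try1 = minority(faulty, current)                   # 1 of 3: condition holds
try2 = minority(faulty | remove_pending, current)  # 2 of 3: condition violated
new_config = current - remove_pending              # {"B", "C"} once A is removed
live = new_config - faulty
print(try1, try2, len(live) > len(new_config) / 2)  # True False False
```

Try #1 admits this execution even though the new configuration {B, C} has no live majority; Try #2 rules it out by counting A, which is being removed, against the failure budget.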

8 Adding Servers
Q: At time t0, which servers in {A, B, ..., G} can crash?
A: A minority of {A, B, ..., E}, and in addition:
– in this scenario, G can crash
– in a different scenario, F can crash
Simple condition: any 2 servers can fail (fewer than |Current(t)|/2)
Figure: timeline with reconfig({+F}), reconfig({+G}), an OK response and time t0; servers A, B, C, D, E, F, G
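The "simple condition" can be checked by brute force. This sketch assumes that at t0 Current = {A, ..., E}, both adds (+F, +G) are still pending, and nothing is being removed; the slide's figure does not fix these details. It counts crashed servers among Current ∪ AddPending and compares against |Current(t)|/2.

```python
from itertools import combinations

current = set("ABCDE")          # assumed Current(t0)
add_pending = {"F", "G"}        # assumed AddPending(t0)
remove_pending = set()          # assumed RemovePending(t0)

def allowed(faulty):
    """Fewer than |Current|/2 of Current ∪ AddPending are in Faulty ∪ RemovePending."""
    at_risk = (current | add_pending) & (faulty | remove_pending)
    return len(at_risk) < len(current) / 2

candidates = sorted(current | add_pending)
max_ok = max(k for k in range(len(candidates) + 1)
             if all(allowed(set(f)) for f in combinations(candidates, k)))
print(max_ok)  # 2
```

With |Current| = 5 the bound is "fewer than 2.5", so every pair of crashes among A through G is tolerated, including pairs that involve a server still being added.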

9 Dynamic Service Liveness
If the number of reconfig operations invoked in the execution is finite, and at every time t in the execution fewer than |Current(t)|/2 processes out of Current(t) ∪ AddPending(t) are in Faulty(t) ∪ RemovePending(t),
then:
– Eventually, every active process that was successfully added can invoke operations
– Every operation invoked by an active process eventually completes
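The failure premise above can be transcribed as a single inequality (our notation, following the slide's wording), required for every time t of the execution:

```latex
\[
\Bigl|\,\bigl(\mathit{Current}(t) \cup \mathit{AddPending}(t)\bigr)
\cap \bigl(\mathit{Faulty}(t) \cup \mathit{RemovePending}(t)\bigr)\Bigr|
\;<\; \frac{|\mathit{Current}(t)|}{2}
\]
```

With no pending changes (AddPending(t) = RemovePending(t) = ∅) it reduces to |Current(t) ∩ Faulty(t)| < |Current(t)|/2, which is the static condition #failures < #servers/2.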

10 Reconfigurable Solutions
Many previous solutions, all using consensus (or something similar):
– State machine replication (Paxos): use the state machine to agree on the set of servers
– Virtual Synchrony based solutions, e.g., [Yeger-Lotem, Keidar, Dolev 97]
– R/W register + reconfiguration service: [Lynch, Shvartsman 97], [Englert, Shvartsman 00], Rambo [Lynch, Shvartsman 02], Rambo II [Gilbert, Lynch, Shvartsman 03], Long Lived Rambo [Georgiou, Musial, Shvartsman 04]
– Mechanisms used: consensus to agree on the next configuration; one designated "reconfigurer"; a membership service stronger than consensus (equivalent to ◇P)
Is consensus really necessary?
Our second contribution: consensus is NOT needed!
– DynaStore: an algorithm for a completely asynchronous system

11 "Old" and "New" Configurations
A reconfiguration transfers the state from a majority of the old configuration to a majority of the new configuration
What if there are concurrent reconfigurations?
Suppose that the initial configuration is {A, B, C, D}
– A invokes reconfig({+E}); C invokes reconfig({–D})
– A writes to {A, D, E}, a majority of {A, B, C, D, E}
– C reads from {B, C}, a majority of {A, B, C}
– No intersection ⇒ atomicity is violated!
Simple solution: consensus on the sequence of configurations
But how can we do this without consensus?
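The counterexample above is easy to verify mechanically: each quorum is a genuine majority of the configuration its process believes in, yet the two quorums are disjoint. The helper `is_majority` is hypothetical.

```python
def is_majority(quorum, config):
    """True if quorum is a subset of config holding more than half its members."""
    return quorum <= config and len(quorum) > len(config) / 2

old = set("ABCD")
config_a = old | {"E"}        # A's view after reconfig({+E})
config_c = old - {"D"}        # C's view after reconfig({-D})
write_quorum = {"A", "D", "E"}
read_quorum = {"B", "C"}

print(is_majority(write_quorum, config_a))  # True (3 of 5)
print(is_majority(read_quorum, config_c))   # True (2 of 3)
print(write_quorum & read_quorum)           # set()
```

Majority quorums only intersect when they are drawn from the same configuration, which is why concurrent reconfigurations must somehow be related to each other.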

12 The Approach in DynaStore
For each configuration c, a (weak) snapshot object nextConfig(c) stores the next configuration
– (Weak) snapshot objects are (easily) implemented in an asynchronous environment
Processes update nextConfig(c) to propose the next configuration after c (concurrent updates are possible)
Sequence of established configurations (simplified):
– The initial configuration is established
– If c is established, then the first snapshot update to nextConfig(c) is the next established configuration after c; it is included in every scan of nextConfig(c)
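The "first update wins" rule can be sketched with a sequential stand-in for nextConfig(c). A real weak snapshot is a concurrent, message-passing object; here updates simply take effect in program order, so this only illustrates how the established sequence is read off. `NextConfig` and `established_sequence` are hypothetical names, and the configurations follow the example slides later in the deck.

```python
from collections import defaultdict

class NextConfig:
    """Sequential stand-in for the weak snapshot nextConfig(c): updates take
    effect one at a time, and a scan returns every configuration proposed so far."""
    def __init__(self):
        self._proposals = []            # in the order the updates took effect

    def update(self, config):
        if config not in self._proposals:
            self._proposals.append(config)

    def scan(self):
        return set(self._proposals)

    def first(self):
        return self._proposals[0] if self._proposals else None

def established_sequence(initial, next_config):
    """Follow first-update links, starting from the (established) initial configuration."""
    seq = [initial]
    while seq[-1] in next_config and next_config[seq[-1]].first() is not None:
        seq.append(next_config[seq[-1]].first())
    return seq

C0, C1, C2, C3 = (frozenset(s) for s in ("ABCD", "ABCDE", "ABC", "ABCE"))
next_config = defaultdict(NextConfig)
next_config[C0].update(C1)   # A's proposal takes effect first
next_config[C0].update(C2)   # C's concurrent proposal
next_config[C1].update(C3)
next_config[C2].update(C3)
print(established_sequence(C0, next_config) == [C0, C1, C3])  # True
```

Processes never need to know which update was first: it is enough that the first one appears in every scan they perform, so following all scan results covers it.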

13 Transferring the State
A scan of nextConfig(c) returns a set of configurations that follow c
– If c is established, one configuration in the returned set is the next established configuration after c
Scanning nextConfig for each returned configuration returns a further set, and so on
– This creates a DAG of configurations
– This DAG contains the sequence of established configurations
A reconfiguration transfers the state along all paths in the DAG
– This guarantees that the state is transferred along the sequence of established configurations
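The repeated scanning described above amounts to a graph search. This breadth-first sketch assumes a callback `scan(c)` that returns the configurations found in nextConfig(c); both `discover_dag` and the scan results below are hypothetical, chosen to match the example on the following slides.

```python
from collections import deque

def discover_dag(start, scan):
    """Breadth-first discovery of the configuration DAG reachable from `start`.
    `scan(c)` stands in for a scan of nextConfig(c) and returns a set of configs."""
    edges = {}
    frontier = deque([start])
    while frontier:
        c = frontier.popleft()
        if c in edges:          # already scanned
            continue
        edges[c] = set(scan(c))
        for nxt in edges[c]:
            if nxt not in edges:
                frontier.append(nxt)
    return edges

C0, C1, C2, C3 = (frozenset(s) for s in ("ABCD", "ABCDE", "ABC", "ABCE"))
scan_results = {C0: {C1, C2}, C1: {C3}, C2: {C3}}   # hypothetical scan outcomes
dag = discover_dag(C0, lambda c: scan_results.get(c, set()))
print(len(dag), dag[C0] == {C1, C2}, dag[C3])  # 4 True set()
```

Each configuration is scanned once even when it is reachable along several paths (C3 here), and the traversal stops at configurations whose scan returns nothing.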

14 Example
Suppose that the initial configuration is C0 = {A, B, C, D}
A invokes reconfig({+E}); C invokes reconfig({–D})
A updates nextConfig(C0) to C1 = {A, B, C, D, E}
A scans nextConfig(C0) to check for concurrent updates. The scan returns {C1}, i.e., no concurrent updates are detected
– C1 is the next established configuration after C0
A's state transfer:
– Read from a majority of C0 and a majority of C1
– Write the latest value found to a majority of C1

15 Example
Suppose that the initial configuration is C0 = {A, B, C, D}
A invokes reconfig({+E}); C invokes reconfig({–D})
Concurrently, C updates nextConfig(C0) to C2 = {A, B, C} and scans it. The scan returns {C1, C2}, implying that A's update was concurrent
C updates nextConfig(C1) and nextConfig(C2) to C3 = {A, B, C, E}. No concurrent updates are detected
– C3 is an established configuration
C's state transfer:
– Read from a majority of each configuration on every path found from C0 to C3
– Write the latest value found to a majority of C3
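For this example, C's state transfer has to cover every path from C0 to C3. The sketch enumerates those paths with a depth-first search and lists the majority size of each configuration C reads from; `all_paths` and the edge map are hypothetical, with edges taken from the scans described above.

```python
def all_paths(edges, src, dst):
    """Depth-first enumeration of every path from src to dst in a DAG."""
    if src == dst:
        return [[dst]]
    return [[src] + rest
            for nxt in edges.get(src, ())
            for rest in all_paths(edges, nxt, dst)]

C0, C1, C2, C3 = (frozenset(s) for s in ("ABCD", "ABCDE", "ABC", "ABCE"))
names = {C0: "C0", C1: "C1", C2: "C2", C3: "C3"}
edges = {C0: {C1, C2}, C1: {C3}, C2: {C3}}

paths = all_paths(edges, C0, C3)
to_read = {c for p in paths for c in p}                  # configs C reads from
quorum_size = {names[c]: len(c) // 2 + 1 for c in to_read}
print(sorted("-".join(names[c] for c in p) for p in paths))  # ['C0-C1-C3', 'C0-C2-C3']
print(sorted(quorum_size.items()))  # [('C0', 3), ('C1', 3), ('C2', 2), ('C3', 3)]
```

Reading from C1 as well as C2 is what lets C pick up a value that A may have written only to a majority of C1, even though C itself proposed C2.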

16 Example
Suppose that the initial configuration is C0 = {A, B, C, D}
A invokes reconfig({+E}); C invokes reconfig({–D}), with C1 = {A, B, C, D, E}, C2 = {A, B, C} and C3 = {A, B, C, E} as before
A invokes a write(newValue) operation in C1
In this scenario, DynaStore guarantees:
1. Either C's state transfer finds newValue in C1, or
2. A's write operation discovers C3 and completes after writing newValue to a majority of C3
3. Read operations also traverse the DAG and will find newValue on the path of established configurations, which intersects the write

17 Conclusions
First "black box" definition of a dynamic R/W register
– In terms of events visible to the user
– A natural failure model: resilience changes dynamically
– Possibly useful for specifying other dynamic problems
DynaStore: the first asynchronous dynamic storage protocol
– Implements a reconfigurable atomic MWMR (multi-writer, multi-reader) register
– In a completely asynchronous system (where consensus is impossible)
– Proves that R/W storage is really easier than consensus (not only in a static system)

