OPODIS 05 Reconfigurable Distributed Storage for Dynamic Networks Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman.


1 OPODIS 05 Reconfigurable Distributed Storage for Dynamic Networks Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman

2 OPODIS 05 Goals Reconfigurable Distributed Storage (RDS) Atomic consistency (read/write) Fault Tolerance …in Dynamic and Asynchronous Systems.

3 OPODIS 05 Distributed Storage

4 OPODIS 05 Distributed Storage Data is replicated at several network locations

5 OPODIS 05 Distributed Storage Write Read Operation policy

6 OPODIS 05 …in Dynamic Networks

7 OPODIS 05 Distributed Storage in Dynamic Networks

8 OPODIS 05 Distributed Storage in Dynamic Networks leaving nodes joining nodes

9 OPODIS 05 Distributed Storage in Dynamic Networks

10 OPODIS 05 Distributed Storage in Dynamic Networks …requires a reconfiguration process.

11 OPODIS 05 Distributed Storage in Dynamic Networks …by achieving agreement.

12 OPODIS 05 Model Distributed –Connected set of processors –Each processor has a unique id i ∈ I –MWMR, any processor is a potential client Asynchronous –Asynchronous processors –Point-to-point asynchronous unreliable channels Dynamic –Processors join and leave the system –Processors may crash

13 OPODIS 05 What is a configuration? Configuration –members is a set of processors, –read-quorums and write-quorums are two sets of quorums –∀ RQ ∈ read-quorums, ∀ WQ ∈ write-quorums: RQ ⊆ members, WQ ⊆ members, RQ ∩ WQ ≠ ∅ (only for a given configuration) Every client maintains a set of configurations, initially containing the default one.
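
The configuration definition above can be illustrated with a small sketch. The class name `Configuration`, the method `is_valid`, and the helper `majority_configuration` are hypothetical names chosen for illustration and do not come from the slides; the majority construction is only one possible quorum system (the one the experiments mention).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Configuration:
    """Hypothetical model of a configuration: members plus two quorum sets."""
    members: frozenset
    read_quorums: frozenset   # a set of frozensets of processor ids
    write_quorums: frozenset  # a set of frozensets of processor ids

    def is_valid(self) -> bool:
        # Every quorum must be a subset of the members.
        inside = all(q <= self.members
                     for q in self.read_quorums | self.write_quorums)
        # Every read quorum must intersect every write quorum.
        intersecting = all(rq & wq
                           for rq in self.read_quorums
                           for wq in self.write_quorums)
        return inside and intersecting


def majority_configuration(members):
    """Build a configuration whose read and write quorums are all majorities."""
    members = frozenset(members)
    size = len(members) // 2 + 1
    quorums = frozenset(frozenset(q)
                        for q in combinations(sorted(members), size))
    return Configuration(members, quorums, quorums)


config = majority_configuration({1, 2, 3})
print(config.is_valid())
```

Any two majorities of the same member set share at least one processor, which is why the intersection check passes for this construction.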

14 OPODIS 05 Single Object Operations Overview After [ABD95]. tag ∈ N × I, val a possible value. val = Read()_i: (tag,val) = query(); [prop(tag,val);] Write(val)_i: (tag,val’) = query(); prop(tag’,val), where tag’ is a new tag larger than tag. 1. (tag,val) query(NULL): gathers (tag,val) pairs of all processors of a RQ and returns the one with the largest tag. 2. NULL prop(tag,val): updates (tag,val) pairs at all processors of a WQ. [Figure labels: Write tag, Read tag]
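
The two-phase read and write pattern described above can be sketched as follows. This is a simplified, single-process illustration in the style of ABD, not the RDS implementation: there are no messages, failures, or concurrency, and the names `Replica`, `query`, `prop`, `read`, and `write` are assumptions made for the example. Tags are modelled as (sequence number, writer id) pairs compared lexicographically.

```python
class Replica:
    """Hypothetical in-memory replica holding one (tag, val) pair."""
    def __init__(self):
        self.tag = (0, 0)   # (sequence number, writer id)
        self.val = None


def query(read_quorum):
    # Gather (tag, val) from every replica of a read quorum, keep the largest tag.
    best = max(read_quorum, key=lambda r: r.tag)
    return best.tag, best.val


def prop(write_quorum, tag, val):
    # Update every replica of a write quorum that holds an older tag.
    for replica in write_quorum:
        if tag > replica.tag:
            replica.tag, replica.val = tag, val


def write(client_id, val, read_quorum, write_quorum):
    (seq, _), _ = query(read_quorum)
    # Choose a tag strictly larger than the largest one seen.
    prop(write_quorum, (seq + 1, client_id), val)


def read(read_quorum, write_quorum):
    tag, val = query(read_quorum)
    # Propagate the value before returning so that later reads see it too.
    prop(write_quorum, tag, val)
    return val


replicas = [Replica() for _ in range(3)]
write(1, "a", replicas[:2], replicas[1:])
print(read(replicas[:2], replicas[:2]))
```

Because every read quorum intersects every write quorum, the query phase of the read always reaches at least one replica updated by the preceding write.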

15 OPODIS 05 Reconfiguration Design Goals Sound –Totally ordered configurations Flexible –No dependencies between configurations Non-intrusive –Allows concurrent read/write operations Fast –Strengthening fault tolerance

16 OPODIS 05 Decoupling Reconfiguration Reconfiguration = Replacing Configurations –{I} Installing a new configuration –{R} Removing old configuration(s) If {R} ≺ {I} ⇒ Operations are delayed If {I} ≺ {R} ⇒ Stronger configuration viability assumption is required

17 OPODIS 05 Solution Instead of {R} ≺ {I} or {I} ≺ {R}: {I} // {R} (installation and removal run in parallel). Tighter coupling between removal and installation

18 OPODIS 05 RDS Reconfiguration Reconfiguration is based on Paxos (a three-phase, leader-based consensus algorithm). l is the leader, c is the current configuration, configs is the set of active configurations. A ballot has a unique identifier b and a value v, which is a configuration. Paxos phases: –Prepare: l creates a new ballot and chooses/gets the value to propose. –Propose: l proposes and gathers votes from a majority. –Propagate: l propagates the decision.

19 OPODIS 05 RDS Reconfiguration Recon(c,c’) [Diagram: leader l, RQ, WQ]

20 OPODIS 05 RDS Reconfiguration Prepare phase, Recon(c,c’): Creates a new larger ballot b [Diagram: leader l, RQ, WQ]

21 OPODIS 05 RDS Reconfiguration Prepare phase, Recon(c,c’) [Diagram: leader l, RQ, WQ]

22 OPODIS 05 RDS Reconfiguration Prepare phase, Recon(c,c’): Updates its ballot’s value v with the one received; updates its configs set [Diagram: leader l, RQ, WQ]

23 OPODIS 05 RDS Reconfiguration Propose phase, Recon(c,c’) [Diagram: leader l, RQ, WQ]

24 OPODIS 05 RDS Reconfiguration Propose phase, Recon(c,c’): Update their tag and val; add v to their configs set [Diagram: leader l, RQ, WQ]

25 OPODIS 05 RDS Reconfiguration Propagation phase, Recon(c,c’): Update their tag and val; remove configuration c from their configs set [Diagram: leader l, RQ, WQ]
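
The effect of the propose and propagation phases on a node's local state, as listed on the slides above, can be sketched roughly as below. This deliberately omits ballots, voting, quorums, and messaging, so it is not a Paxos implementation; `Node`, `adopt`, `propose_phase`, and `propagate_phase` are hypothetical names, and configurations are represented by plain strings.

```python
class Node:
    """Hypothetical node state: active configurations plus a (tag, val) pair."""
    def __init__(self, initial_config):
        self.configs = {initial_config}
        self.tag = (0, 0)
        self.val = None

    def adopt(self, tag, val):
        # Keep the (tag, val) pair with the larger tag.
        if tag > self.tag:
            self.tag, self.val = tag, val


def propose_phase(nodes, new_config, tag, val):
    # Receivers update their tag and val and add the proposed value v
    # (a configuration) to their configs set.
    for node in nodes:
        node.adopt(tag, val)
        node.configs.add(new_config)


def propagate_phase(nodes, old_config, tag, val):
    # Receivers update their tag and val and remove the old configuration c.
    for node in nodes:
        node.adopt(tag, val)
        node.configs.discard(old_config)


nodes = [Node("c") for _ in range(3)]
propose_phase(nodes, "c'", (1, 1), "x")
print(sorted(nodes[0].configs))   # both configurations are active here
propagate_phase(nodes, "c", (1, 1), "x")
print(sorted(nodes[0].configs))   # only the new configuration remains
```

Between the two phases both configurations are in the configs set, which is the window in which operations have to consult every active configuration.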

26 OPODIS 05 Proving Atomicity Ordering configurations. Ordering operations. Theorem 1: The set of installed configurations in the system is totally ordered. Theorem 2: If operation π1 precedes operation π2, then π1’s tag is not larger than π2’s tag.

27 OPODIS 05 Additional Assumptions Eventual stabilization with –Unique leader l –Message delay bound d (unknown to the algorithm) –Gossip with frequency d –Restricted reconfiguration rate –Some quorums remain alive in active configurations. t_s: system stabilization time. t_l: algorithm stabilization time. Let t_r be the request time. [Timeline figure: t_s, t_l, interval labelled 2d]

28 OPODIS 05 Reconfiguration Latency Worst case scenario: the last reconfiguration was done by a different leader. [Timeline figure: starting at max(t_l, t_r), Prepare, Propose, Propagate; interval labels 2d, d; ends at t_e] t_e: end time, reconfiguration is complete. Total: 5d

29 OPODIS 05 Reconfiguration Latency Other cases: the leader made the previous reconfiguration. [Timeline figure: starting at max(t_l, t_r), Propose (2d), Propagate (d); ends at t_e] t_e: end time, reconfiguration is complete. Total: 3d
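
The two latency bounds above are consistent with a simple sum over phases. The sketch below assumes the Propose phase takes 2d and the Propagate phase takes d, as the timeline labels suggest, and infers a 2d Prepare phase from the difference between the 5d and 3d totals; that last figure is an assumption, and the function name is hypothetical.

```python
def reconfiguration_latency(d, same_leader):
    """Bound on reconfiguration time after max(t_l, t_r), in units of d."""
    # Assumption: Prepare costs 2d and is skipped when the same leader
    # performed the previous reconfiguration.
    prepare = 0 if same_leader else 2 * d
    propose = 2 * d    # one round trip
    propagate = d      # one-way propagation of the decision
    return prepare + propose + propagate


d = 10  # message delay bound, in arbitrary time units
print(reconfiguration_latency(d, same_leader=False))  # worst case, 5d
print(reconfiguration_latency(d, same_leader=True))   # other cases, 3d
```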

30 OPODIS 05 Operation Latency Phase latency: 2d is sufficient for the phase round trip. In some cases (pending reconfiguration), the phase might be delayed twice. Operation latency: Operations are bounded by 8d. In some cases, the propagation phase of the read operation can be ignored, leading to a possible bound of 2d. [Timeline figure: 1st round trip (2d), new configuration discovered, 2nd round trip]

31 OPODIS 05 Experimental Results IOA specifications translated to Java code following a set of rules. Implementation of the Attiya, Bar-Noy, and Dolev algorithm “ABD” (without reconfiguration) and of RDS, which shares parts of the ABD code. Using majority-based configurations. Measuring operation latency 1. While varying configuration size 2. While varying algorithm instances

32 OPODIS 05 Experimental Results Operation latency of RDS is competitive with ABD, confirming the theory. Reconfiguration messages contain operation information which might accelerate operations in RDS.

33 OPODIS 05 Conclusion RDS, Reconfigurable Distributed Storage. With sound, flexible, non-intrusive and fast reconfiguration. It solves two problems in one: Configuration replacement and Consensus. Reconfiguration is inexpensive (time). Fault tolerance is strengthened. RAMBO can become more aggressive: it is exactly what we did here!

