RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks Nancy Lynch, MIT Alex Shvartsman, U. Conn. Boston University April 23, 2003.


1 RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks Nancy Lynch, MIT Alex Shvartsman, U. Conn. Boston University April 23, 2003

2 Goal of the Research
An algorithm to implement atomic read/write shared memory in a dynamic network setting.
  – Participants may join, leave, fail during computation.
  – Mobile networks, peer-to-peer networks.
High availability, low latency.
Atomicity for all patterns of asynchrony and change.
Good performance under reasonable limits on asynchrony and change.
Applications:
  – Battle data for teams of soldiers in military operation.
  – Game data for players in a multiplayer game.

3 Approach: Dynamic Quorums
Objects are replicated at several network locations.
To accommodate small, transient changes:
  – Uses quorum configurations: members, read-quorums, write-quorums.
  – Maintains atomicity during stable situations.
  – Allows concurrency.
To handle larger, more permanent changes:
  – Reconfigure
  – Maintains atomicity across configuration changes.
  – Any configuration can be installed at any time.
  – Reconfigure concurrently with reads/writes; no heavyweight view change.

4 RAMBO
RAMBO: Reconfigurable Atomic Memory for Basic Objects (dynamic atomic read/write shared memory).
Global service specification:
  [Figure: the RAMBO service.]
Algorithm:
  – Reads and writes objects.
  – Chooses new configurations, notifies members.
  – Identifies, removes (“garbage-collects”) obsolete configurations.
  – All concurrently.

5 RAMBO algorithm structure
Main algorithm + reconfiguration service
Loosely coupled
Recon service:
  – Provides the main algorithm with a consistent sequence of configurations.
Main algorithm:
  – Handles reading, writing.
  – Receives, disseminates new configuration information; no formal installation.
  – Garbage-collects old configurations.
  – Reads/writes may use several configurations.
[Figure: RAMBO algorithm structure, with Recon and the main algorithm over Net.]

6 Main algorithm: Reads/writes
Uses two-phase strategy:
  – Phase 1: Collect object values from read-quorums of active configurations.
  – Phase 2: Propagate latest value to write-quorums of active configurations.
Operations may execute concurrently.
Quorum intersection properties guarantee atomicity.
Our communication mechanism:
  – Background gossiping
  – Terminate by fixed-point condition, involving a quorum from each active configuration.

7 Removing old configurations
Main algorithm removes old configurations by garbage-collecting them in the background.
Two-phase garbage-collection procedure:
  – First phase: Inform write-quorum of old configuration about the new configuration. Collect object values from read-quorum of the old configuration.
  – Second phase: Propagate the latest value to a write-quorum of the new configuration.
Garbage-collection concurrent with reads/writes.
Implemented using gossiping and fixed points.

8 Implementation of Recon
[Figure: Recon built on Consensus and Net.]
Uses distributed consensus to determine successive configurations 1,2,3,…
Members of old configuration propose new configuration.
Proposals reconciled using consensus.
Consensus is a heavyweight mechanism, but:
  – Used only for reconfigurations, infrequent.
  – Does not delay Read/Write operations.

9 Implementation of consensus
Use a version of the Paxos algorithm [Lamport 89, 98, 02].
Agreement, validity guaranteed absolutely.
Termination guaranteed if/when underlying system stabilizes.
Leader chosen using failure detectors.
Leader conducts two-phase algorithm with retries.
[Figure: Consensus service with init(v) inputs and decide(v) outputs.]

10 Models and analysis
Timed I/O automata models.
Prove atomicity for arbitrary patterns of asynchrony and change.
Analyze performance conditionally, based on failure and timing assumptions.
  – Reads and writes take time at most 8d, under reasonable “steady-state” assumptions.

11 Other approaches
Use consensus to agree on total ordering of operations: [Lamport 89…]
  – Not resilient to transient failures.
  – Termination of r/w depends on termination of consensus.
Totally-ordered broadcast over group communication: [Amir, Dolev, Melliar-Smith, Moser 94], [Keidar, Dolev 96]
  – View formation takes a long time, delays reads/writes.
  – One change may trigger view formation.
Dynamic quorums over GC: [De Prisco, et al, 99]
  – View formation delays reads/writes.
  – New view must satisfy intersection requirements.
Single reconfigurer: [Lynch, Shvartsman 97], [Englert, Shvartsman 00]

12 Outline of talk
1. Introduction ✓
2. Reconfigurable Atomic Memory (RAMBO) specification
3. Reconfiguration service (Recon) specification
4. Implementation of RAMBO using Recon
5. Proof of atomicity
6. Implementation of Recon
7. Conditional performance results
8. Conclusions

13 2. RAMBO Service Specification
I, infinite set of participants’ locations
X, set of objects
C, configuration identifiers
External actions for each i and x:
  – Inputs: join_{x,i}, read_{x,i}, write(v)_{x,i}, recon(c,c’)_{x,i}
  – Outputs: join-ack_{x,i}, read-ack(v)_{x,i}, …, report(c)_{x,i}
Ignore joins in this talk.
Behavior:
  – Assuming basic well-formedness conditions, RAMBO guarantees atomicity.
  – Liveness replaced by latency bounds.

14 Atomicity
AKA linearizability
Definition: Each operation appears to occur at some point between its invocation and response.
Sufficient condition: For each object x, all the read and write operations for x can be partially ordered by ≺, so that:
  – ≺ is consistent with the order of invocations and responses: there are no operations π1, π2 such that π1 completes before π2 starts, yet π2 ≺ π1.
  – All write operations are ordered with respect to each other and with respect to all the reads.
  – Every read returns the value of the last write preceding it in ≺.

15 Implementing RAMBO
Composition of separate service for each x.
RAMBO (for x) uses separate Recon service (for x):
[Figure: RAMBO takes read, write, and recon inputs; Recon, running over Net, supplies new-config.]

16 3. Recon Service Specification
External actions for each i:
  – Inputs: recon(c,c’)_i
  – Outputs: recon-ack_i, report(c)_i, new-config(c,k)_i
  – And some joining actions (ignore)
Behavior:
  – Assuming well-formedness, Recon produces consistent configuration identifiers at participating locations:
      Agreement: Two configs never assigned to same k.
      Validity: Any announced new-config was previously requested by someone.
      No duplication: No configuration is assigned to more than one k.
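The three Recon properties can be checked mechanically over a recorded trace. The following is a minimal sketch, not part of the talk: the function name `check_recon_trace` and the trace encoding are assumptions made for illustration, and the "previously" in Validity is simplified to plain membership in the set of requested configurations (event order is ignored).

```python
def check_recon_trace(requested, announced):
    """Return the sorted names of the Recon properties violated by a trace.

    requested: set of configuration ids that appeared as c' in some recon(c, c')
               (hypothetical encoding; event order is ignored in this sketch).
    announced: list of (c, k) pairs taken from new-config(c, k) outputs
               at any location.
    """
    violations = set()
    by_index = {}   # k -> configuration announced for index k
    by_config = {}  # configuration -> index k it was announced for
    for c, k in announced:
        # Agreement: two different configurations are never assigned to the same k.
        if by_index.setdefault(k, c) != c:
            violations.add("agreement")
        # No duplication: a configuration is never assigned to more than one k.
        if by_config.setdefault(c, k) != k:
            violations.add("no duplication")
        # Validity (simplified): every announced configuration was requested.
        if c not in requested:
            violations.add("validity")
    return sorted(violations)
```

The same new-config(c, k) may legitimately be reported at several locations, which is why repeated identical pairs are accepted.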

17 Outline
1. Introduction ✓
2. Reconfigurable Atomic Memory (RAMBO) specification ✓
3. Reconfiguration service (Recon) specification ✓
4. Implementation of RAMBO using Recon
5. Correctness (atomicity)
6. Implementation of Recon
7. Conditional performance results
8. Conclusions

18 4. Implementing RAMBO using Recon
Recon
  – Chooses configurations
  – Tells members of the previous and new configuration.
  – Informs Reader-Writer components (new-config).
Reader-Writer
  – Conducts read and write operations
      Two-phased quorum-based algorithm.
      Uses all current configurations.
  – Garbage-collects obsolete configurations.

19 Static Reader-Writer protocol
Quorum configuration for I:
  – read-quorums, write-quorums, collections of subsets of I
  – For any R in read-quorums, W in write-quorums: R ∩ W ≠ ∅.
Replicate the object x at all locations in I.
At each i in I, keep:
  – value
  – tag, consisting of (sequence number, location)
Read, Write use two phases:
  – Phase 1: Read (value, tag) from a read-quorum
  – Phase 2: Write (value,tag) to a write-quorum
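The intersection requirement on a quorum configuration is easy to state as code. This is a small illustrative sketch; the helper names `is_valid_configuration` and `majority_quorums` are assumptions, and majority quorums are just one familiar way to satisfy the requirement, not something the talk prescribes.

```python
from itertools import combinations


def is_valid_configuration(read_quorums, write_quorums):
    """True if every read-quorum intersects every write-quorum."""
    return all(r & w for r in read_quorums for w in write_quorums)


def majority_quorums(members):
    """All subsets of `members` that contain a strict majority of them."""
    members = sorted(members)
    size = len(members) // 2 + 1
    return [frozenset(q) for q in combinations(members, size)]
```

For example, with five locations any two majorities share at least one location, so using `majority_quorums({1, 2, 3, 4, 5})` for both reads and writes passes the check, while the disjoint pair `{1, 2}` and `{3, 4}` does not.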

20 Static Reader-Writer protocol
Write at location i:
  – Phase 1: Read (value, tag) from a read-quorum. Determine largest seq-number among the tags that are read. Choose new-tag := (larger sequence-number, i).
  – Phase 2: Propagate (new-value, new-tag) to a write-quorum.
Read at location i:
  – Phase 1: Read (value, tag) from a read-quorum. Determine largest (value,tag) among those read.
  – Phase 2: Propagate this (value,tag) to a write-quorum. Return value.
Highly concurrent.
Quorum intersection implies atomicity
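The two phases can be sketched as a sequential, in-memory simulation. This is only an illustration under strong assumptions: there is no network, no concurrency, and no failure; the `Replica` class and function names are hypothetical; and the caller picks which read-quorum and write-quorum respond.

```python
class Replica:
    """Per-location state: a value and a tag (sequence number, location)."""

    def __init__(self, value=None):
        self.value = value
        self.tag = (0, 0)


def query(replicas, read_quorum):
    """Phase 1: return the (tag, value) pair with the largest tag in the quorum."""
    pairs = ((replicas[j].tag, replicas[j].value) for j in read_quorum)
    return max(pairs, key=lambda pair: pair[0])


def propagate(replicas, write_quorum, tag, value):
    """Phase 2: install (value, tag) at every quorum member holding an older tag."""
    for j in write_quorum:
        if tag > replicas[j].tag:
            replicas[j].tag, replicas[j].value = tag, value


def write(replicas, i, new_value, read_quorum, write_quorum):
    (seq, _), _ = query(replicas, read_quorum)
    new_tag = (seq + 1, i)  # larger sequence number, ties broken by location i
    propagate(replicas, write_quorum, new_tag, new_value)
    return new_tag


def read(replicas, i, read_quorum, write_quorum):
    tag, value = query(replicas, read_quorum)
    propagate(replicas, write_quorum, tag, value)  # readers propagate too
    return value
```

Tags are compared lexicographically as Python tuples, so two writers that pick the same sequence number are still ordered by their location.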

21 Why Readers propagate
If not, then atomicity can be violated:
[Figure: a slow Write(1), replicas holding 0 and 1, and two Reads; one Read returns 1.]
Note: The value after phase 1 is reliable---could use it “optimistically”.

22 Extend to dynamic setting
Any member of current configuration can propose a new configuration.
Recon produces consistent configurations.
Reader-Writer processes run two-phase static quorum-based algorithm, using all current configurations.
Uses gossip and fixed-point tests.
When Recon provides new configuration, Reader-Writer doesn’t abort reads/writes in progress, but does extra work to access additional processes needed for new quorums.

23 Configurations and Config Maps
Configuration c
  – members(c) -- “owners” of the data in configuration c
  – read-quorums(c)
  – write-quorums(c)
Configuration map cm
  – Sequence of configurations cm(k)
  – Can be defined, undefined (⊥), garbage-collected (±)
[Figure: a configuration map (±, …, ±, c, c, …, c, ⊥, c, …, ⊥, ⊥, …) with regions labeled GC’d, Defined, Mixed, Undefined.]
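One way to represent a configuration map in code is as a list whose entries are a configuration identifier, an undefined marker, or a garbage-collected marker. The sketch below is an assumption-laden illustration: the constants `BOTTOM` and `GCED`, the finite-list encoding (everything past the end is treated as undefined), and the shape check (a reading of the figure's GC'd, Defined, Mixed, Undefined regions) are not taken from the talk.

```python
BOTTOM = None  # undefined entry, written ⊥ on the slide
GCED = "±"     # garbage-collected entry (assumes no configuration id equals "±")


def status(entry):
    """Classify a single cmap entry."""
    if entry is BOTTOM:
        return "undefined"
    if entry == GCED:
        return "gc'd"
    return "defined"


def has_figure_shape(cmap):
    """Check for a GC'd prefix, then a defined entry, then only defined or undefined entries."""
    k = 0
    while k < len(cmap) and cmap[k] == GCED:
        k += 1
    rest = cmap[k:]
    return (
        bool(rest)
        and status(rest[0]) == "defined"
        and all(status(e) != "gc'd" for e in rest)
    )
```

Under this reading, `[GCED, GCED, "c2", "c3", BOTTOM, "c5"]` has the expected shape, while a map with a ± after a defined entry does not.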

24 Configuration maps
[Figure: example configuration maps as configurations are added and garbage-collected: (c0, ⊥, …); (c0, c1, ⊥, …); (c0, c1, c2, …, ck, ⊥, …); (±, c1, c2, …, ck, ⊥, …); (±, ±, c2, …, ck, ⊥, …); (±, ±, ±, c3, …, ck, ⊥, …).]

25 Reader-Writer state
world
value, tag
cmap
pnum1, counts phases of locally-initiated operations
pnum2, records latest known phase numbers for all locations
op-record, keeps track of the status of a current locally-initiated read/write operation
  – Includes op.cmap, consisting of consecutive configs.
gc-record, keeps track of the status of a current locally-initiated garbage-collection operation

26 Reader-Writer protocol
One kind of message, gossiped nondeterministically.
Message ⟨W, v, t, cm, ns, nr⟩ from i to j, where:
  – W is i’s world
  – v,t are i’s value and tag
  – cm is i’s cmap
  – ns is i’s phase number, pnum1
  – nr is the latest phase number i knows for j, pnum2(j)
(ns,nr) used to identify “fresh” messages.
Key actions are taken when “enough” information has been gathered (fixed-point).

27 When ⟨W,v,t,cm,ns,nr⟩ arrives from j:
world := world ∪ W
if t > tag then (value,tag) := (v,t)
cmap := update(cmap,cm)
  – Updates cmap with newer information in cm.
pnum2(j) := max(pnum2(j), ns)
gc-record: If the message is “fresh”, then record the sender.
op-record: If message is “fresh” then:
  – Record the sender.
  – Extend op.cmap with newly-discovered configurations.
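The state updates on message receipt translate almost line for line into code. The sketch below is illustrative only: the `State` class and helper names are hypothetical, the freshness test on (ns, nr) and the op-record and gc-record bookkeeping are omitted, and the merge rule for cmap entries (garbage-collected overrides defined, which overrides undefined) is an assumption about what "newer information" means.

```python
from dataclasses import dataclass, field

BOTTOM, GCED = None, "±"  # undefined and garbage-collected cmap entries


def merge_entry(mine, theirs):
    """Assumed rule: ± beats a configuration, which beats ⊥."""
    if GCED in (mine, theirs):
        return GCED
    return mine if mine is not BOTTOM else theirs


def update(cmap, cm):
    """Pointwise merge of two finite cmaps; missing entries count as ⊥."""
    n = max(len(cmap), len(cm))
    a = list(cmap) + [BOTTOM] * (n - len(cmap))
    b = list(cm) + [BOTTOM] * (n - len(cm))
    return [merge_entry(x, y) for x, y in zip(a, b)]


@dataclass
class State:
    world: set = field(default_factory=set)
    value: object = None
    tag: tuple = (0, 0)
    cmap: list = field(default_factory=list)
    pnum2: dict = field(default_factory=dict)


def receive(state, j, W, v, t, cm, ns):
    """Apply the slide's updates for a message <W, v, t, cm, ns, nr> from j."""
    state.world |= W
    if t > state.tag:
        state.value, state.tag = v, t
    state.cmap = update(state.cmap, cm)
    state.pnum2[j] = max(state.pnum2.get(j, 0), ns)
```

Because every update is monotone (sets only grow, tags and phase numbers only increase, cmap entries only move forward), stale or reordered gossip messages cannot undo newer information.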

28 Processing reads and writes
Reads and Writes perform Query and Propagation phases using known configurations, stored in op.cmap.
  – Query phase: Obtains fresh value, tag, cmap information from read-quorums.
  – Propagation phase: Propagates up-to-date (value,tag) to write-quorums; obtains fresh cmap information from write-quorums.
  – Both phases: Extend op.cmap with newly-discovered configurations; new configurations are also used in the phase.
Each phase ends with a fixed point, after hearing from quorums of all the configurations currently in op.cmap.
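The fixed-point test that ends a phase can be sketched as a single predicate. This is an illustration, not the talk's definition: the function name and the `quorums_of` callback are hypothetical, and which quorum family the callback returns for each phase (read-quorums for the query phase, write-quorums for the propagation phase) follows the wording of the slide.

```python
def phase_fixed_point(op_cmap, responders, quorums_of):
    """True once some quorum of every configuration in op_cmap has responded.

    op_cmap:    configurations currently used by the operation.
    responders: set of locations that sent a fresh message in this phase.
    quorums_of: function mapping a configuration to its list of quorums
                (hypothetical helper supplied by the caller).
    """
    return all(
        any(q <= responders for q in quorums_of(c))
        for c in op_cmap
    )
```

If op.cmap is extended with a newly discovered configuration during the phase, the predicate has to be evaluated again, since the new configuration's quorums must also be heard from.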

29 Reader-Writer: Fixed points

30 Garbage collection
A process can try to GC config k when its cmap looks like: (±, …, ±, c_k, c_{k+1}, …)
Phase 1:
  – Informs a write-quorum of c_k about c_{k+1}.
  – Collects latest (value, tag) from a read-quorum of c_k.
Phase 2:
  – Propagates (value, tag) to a write-quorum of c_{k+1}.
  – Set cmap(k) to ±.
GC operates concurrently with reads and writes.
Uses gossiping and fixed points.
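The two garbage-collection phases can be sketched as a sequential simulation. This is a simplified illustration under stated assumptions: no concurrency or failures, replicas are plain dictionaries with hypothetical keys `tag`, `value`, and `known` (the configurations a location has been told about), and the caller chooses which quorums respond.

```python
GCED = "±"  # marker for a garbage-collected cmap entry


def garbage_collect(k, cmap, replicas, old_read_q, old_write_q, new_write_q):
    """Garbage-collect configuration k, given that configuration k+1 is known."""
    # Precondition from the slide: everything before k is already GC'd,
    # and both c_k and c_{k+1} are defined.
    assert all(entry == GCED for entry in cmap[:k])
    assert cmap[k] not in (None, GCED) and cmap[k + 1] not in (None, GCED)

    # Phase 1: tell a write-quorum of c_k about c_{k+1} ...
    for j in old_write_q:
        replicas[j]["known"][k + 1] = cmap[k + 1]
    # ... and collect the latest (tag, value) from a read-quorum of c_k.
    pairs = ((replicas[j]["tag"], replicas[j]["value"]) for j in old_read_q)
    tag, value = max(pairs, key=lambda pair: pair[0])

    # Phase 2: propagate to a write-quorum of c_{k+1}, then mark c_k as GC'd.
    for j in new_write_q:
        if tag > replicas[j]["tag"]:
            replicas[j]["tag"], replicas[j]["value"] = tag, value
    cmap[k] = GCED
    return tag
```

After the call, any later operation that uses only c_{k+1} still sees the latest value, because a write-quorum of c_{k+1} holds it.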

31 Outline
1. Introduction ✓
2. Reconfigurable Atomic Memory (RAMBO) specification ✓
3. Reconfiguration service (Recon) specification ✓
4. Implementation of RAMBO using Recon ✓
5. Correctness (atomicity)
6. Implementation of Recon
7. Conditional performance results
8. Conclusions

32 5. Proof of Atomicity
Atomicity holds for:
  – arbitrary patterns of asynchrony,
  – arbitrary crash-failures and message loss,
  – arbitrary joins.
Proof: Construct partial order ≺ of read and write operations satisfying:
  – ≺ is consistent with the order of invocations and responses.
  – All write operations are ordered with respect to each other and with respect to all the reads.
  – Every read returns the value of the last write preceding it in ≺.
Let ≺ be the lexicographic order on the operations’ tags, and order write with tag t before all reads with tag t.
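The ordering used in the proof can be written down directly. In this minimal sketch an operation is encoded as a pair (kind, tag), with kind either "write" or "read" and tag a (sequence number, location) tuple; the encoding and the function name `precedes` are assumptions made for illustration.

```python
def precedes(a, b):
    """True if operation a comes before operation b in the partial order.

    Operations are (kind, tag) pairs. Tags are compared lexicographically,
    and the write with tag t precedes every read with tag t.
    """
    kind_a, tag_a = a
    kind_b, tag_b = b
    if tag_a != tag_b:
        return tag_a < tag_b
    return kind_a == "write" and kind_b == "read"
```

Two reads that return the same tag are left unordered, which is why the result is a partial order rather than a total one; every write is still ordered against every other operation, assuming each write has a distinct tag.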

33 Showing consistency
Lemma 1: Tags of GC operations are nondecreasing with respect to the configuration index.
Proof: GC is done sequentially.
Lemma 2: If the first GC of config k completes before a read/write operation π begins, then the tag of the GC is less than or equal to the tag of π (< if π is a write).
Lemma 3: If π1 and π2 are two read/write operations and π1 completes before π2 begins, then the tag of π1 is less than or equal to the tag of π2 (strictly less if π2 is a write).

34 Proof of Lemma 3
Assume π1 and π2 are two read/write operations and π1 completes before π2 begins.
Each phase uses consecutive configurations.
Case 1: prop-cmap(π1) and query-cmap(π2) share a configuration c.
  – Quorum intersection for c yields the tag inequality.
Case 2: All configurations in prop-cmap(π1) are less than all those in query-cmap(π2).
  – Then the tag inequality follows from a chain of tag inequalities, following a chain of GC operations for the intervening configurations. Uses Lemmas 1 and 2.
Case 3: All configs in prop-cmap(π1) are greater than all those in query-cmap(π2).
  – Impossible.

35 Outline
1. Introduction ✓
2. Reconfigurable Atomic Memory (RAMBO) specification ✓
3. Reconfiguration service (Recon) specification ✓
4. Implementation of RAMBO using Recon ✓
5. Correctness (atomicity) ✓
6. Implementation of Recon
7. Conditional performance results
8. Conclusions

36 6. Implementing Recon
Recon algorithm uses (static) consensus services to determine configurations 1, 2, 3,…
Cons(k,c): Used to determine config k, if config k-1 is c.
Consensus is used only for reconfigurations, does not delay read or write operations.
[Figure: Recon, with recon inputs and recon-ack outputs, built on Consensus and Net.]

37 Implementing Recon
Simple---no atomicity issues.
Members of old configuration may propose a new configuration; proposals reconciled using consensus.
  – recon(c,c’): Request for reconfiguration from c to c’. If c is the (k-1)st configuration (and is current), then send init message to members; invoke Cons(k,c) with initial value c’.
  – Receive an init message: Participate in consensus.
  – decide(c’): Tell Reader-Writer the new configuration; send config message to members of c’.
  – Receipt of config message: Tell Reader-Writer the new configuration.
Consensus implemented using Paxos Synod algorithm.

38 7. Latency Analysis
Consider a subset of timed executions:
Gossip occurs:
  – Periodically, and
  – At certain key times:
      At beginning of operation phase.
      Just after receiving a message from someone with a new phase number.
      Just after certain join and reconfiguration events.
Perform local steps immediately.
Reliable message delivery, bounded delay.
Normal timing for consensus services.

39 Additional assumptions
e-Configuration-viability for time parameter e
  – A read-quorum and a write-quorum of configuration k remain alive, until at least time e after configuration k+1 is “installed” (decided upon by all non-failed members of configuration k).
e-Reconfiguration-spacing
  – recon(c,*)_i occurs at least e time after report(c)_i
e-Join-connectivity
  – If i and j join by time t then they learn about each other by time t+e

40 Latency results
Reconfiguration:
  – 13d, if recon(c,c’)_i occurs and no members of c subsequently fail.
Garbage-collection of c_k by process i:
  – 4d, if process i, a read-quorum and a write-quorum of c_k, and a write-quorum of c_{k+1}, do not fail.
Read or write operation by process i in a “stable” system:
  – 4d, if no reconfigurations occur, and process i’s cmap is “up-to-date”.
Learning about configurations:
  – If i and j are “old enough” and don’t fail, then information from i is conveyed to j within time 2d.

41 Latency results
Garbage-collection, in executions with 6d-reconfiguration-spacing and 5d-configuration-viability:
  – If report(c) occurs at i and i does not fail then any non-failed process that is old enough learns about c and garbage-collects any older configuration within time 6d.
Read and write operations, in executions with 12d-reconfiguration-spacing and 11d-configuration-viability:
  – 8d, for an operation managed by a process that is old enough and does not fail.

42 8. Conclusions
RAMBO algorithm
Composed of R/W algorithm, Recon service, Consensus
Atomicity in all executions.
Good latency bounds:
  – For reading, writing, garbage-collection.
  – Under assumptions about timing, joins, failures, and rate of reconfiguration.

43 Algorithmic innovations
Dynamic configurations:
  – Members can be changed dynamically.
  – Any current member may request reconfiguration.
  – Arbitrary configurations can be installed; no intersection requirements.
Loosely-coupled reconfiguration:
  – Concurrent reading, writing, reconfiguration.
  – Reads/writes can use several configurations; can complete during reconfiguration.
Efficient “steady-state”:
  – Assuming bounded delays, infrequent reconfiguration, and periodic gossip, read and write operations complete in time 8d.
  – Each phase involves at most 2 configurations.

44 Comparison with other approaches
Using consensus to agree on a total ordering of operations:
  – We use consensus only for the configurations.
  – Consensus termination impacts only reconfiguration latency, not read and write latency.
Group communication:
  – Our reads/writes work during “new view” establishment.
Dynamic quorum configurations over GC:
  – We allow arbitrary new configurations.
Single reconfigurer approaches:
  – We allow multiple reconfigurers.
  – We uncouple introduction of new configurations and garbage-collection of old configurations.

45 Current and future work
LAN implementations [Musial, Shvartsman 03] [Bachmann 03]
  – Experiments, toy applications
More analysis:
  – “Normal behavior” starting from some point
Algorithmic improvements and additions:
  – Concurrent garbage-collection [Gilbert, Lynch, Shvartsman 03]
  – Limiting communication.
  – Eliminating second phase of reads, or first phase of writes, in special cases.
  – Choosing good configurations.
  – Better join protocol, explicit “leave” protocol.
  – Early return of read values.
  – Backup strategies for when configuration-viability fails.
  – Extensions to other data types?
Corresponding lower bounds, impossibility results?
Implementations in mobile networks, peer-to-peer systems

