Communication and Data Sharing for Dynamic Distributed Systems Nancy Lynch MIT Alex Shvartsman UConn.

1 Communication and Data Sharing for Dynamic Distributed Systems Nancy Lynch MIT Alex Shvartsman UConn

2 Motivation and Focus
Constructing distributed applications for highly dynamic environments is difficult
In practice, considerable effort is required to make applications resilient to
  –changes in client requirements
  –evolution of the underlying computing medium
Focus of our work
  –design and analysis of distributed services
  –that provide useful guarantees and
  –that make the construction of sophisticated distributed applications easier.

3 Our Approach
Traditionally
  –research on distributed services emphasized specification and correctness, while
  –research on distributed algorithms emphasized complexity and performance
We combine these concerns leading to
  –algorithms that perform efficiently and degrade gracefully in dynamic distributed settings, and
  –whose correctness, performance, and fault-tolerance guarantees are expressed by precisely-defined global services.

4 Research Direction Summary
Develop and analyze algorithms to solve problems of communication and data sharing in highly dynamic distributed environments
“Dynamic” encompasses
  –Changes in network topology
  –Processor mobility
  –Changing sets of participants
  –Wide range of failures
  –Timing variations

5 Research Direction (cont’d)
The properties we study include
  –ordering and reliability guarantees for communication
  –coherence guarantees for data sharing
The algorithmic results will be accompanied by
  –lower bound and impossibility results,
  –which describe inherent limitations on what problems can be solved, and at what cost.

6 RAMBO Reconfigurable Atomic Memory for Read/Write Objects Nancy Lynch Alex Shvartsman

7 Design Goals
RAMBO
  –Reconfigurable Atomic Memory for Basic Objects (Read/Write) for message-passing systems
Dynamic replication for availability and survivability
Loosely-coupled on-the-fly reconfiguration
High concurrency
Low latency
Safety for any patterns of asynchrony and failures
Good performance under partial asynchrony and for moderate failures

8 Algorithmic Ideas
Reconfigurable quorum systems
  –Quorums maintain consistency during modest and transient changes
  –Reconfigurations accommodate more drastic and permanent changes
Read/write operations are frequent
  –Use quorum access and allow concurrency
  –Isolate from reconfiguration
Reconfigurations are infrequent
  –Use consensus to impose total order (Paxos)
  –Optimistic dissemination without formal installation
  –Conservative garbage collection of obsolete configurations

9 Related Prior Work
Atomic read/write memory in message-passing models
  –Upfal Wigderson 86
  –Attiya Bar-Noy Dolev 91, 95
  –Lynch Shvartsman 97
  –Englert Shvartsman 01
  –Lamport 89, 98
Quorums
  –Gifford 79, Thomas 79
  –and many many others

10 Methodology
Specify algorithm
  –Interacting state machines
  –Using non-deterministic “gossip”
Show correctness/safety for
  –arbitrary patterns of asynchrony
  –assuming arbitrary crash-failures and message loss
Analyze performance for a subset of timed executions
  –Bounded message delay, 0-time local processing
  –Some “gossip” becomes deliberate, some periodic
  –Non-failure of certain quorums for certain periods
  –Reason about operation latency
  –(Of course none of this impacts safety)

11 Showing Read/Write Atomicity
We show atomicity using a partial order
Atomicity of a sequence β of reads/writes
  –Let ≺ be an irreflexive partial order of all operations in β. Show:
  –For any π, there are only finitely many φ with φ ≺ π
  –If π precedes φ in β, then not φ ≺ π
  –If π is a write, then for any φ either π ≺ φ or φ ≺ π
  –Any read returns the value written by the last preceding write, per ≺
[Lynch, Lemma 13.16]

12 Approach: Values and Tags
Each value v has an associated tag t
  –Tag is made up of the sequence-processor pair
Reads:
  –a set of value-tag pairs is obtained
  –the result is the value with the maximum tag
Writes:
  –a set of value-tag pairs is obtained
  –new-value is propagated with a new-tag that is a lexicographic increment of tag: new-tag := ⟨tag.seq + 1, pid⟩
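
The tag rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Tag`, `read_result`, and `next_tag` are hypothetical, and the sketch assumes that the "tag" being incremented by a write is the maximum tag among the pairs it obtained.

```python
from typing import Any, List, NamedTuple, Tuple


class Tag(NamedTuple):
    """A (sequence number, processor id) pair; tuples compare lexicographically."""
    seq: int
    pid: int


def read_result(pairs: List[Tuple[Any, Tag]]) -> Any:
    """A read returns the value that carries the maximum tag."""
    value, _tag = max(pairs, key=lambda pair: pair[1])
    return value


def next_tag(pairs: List[Tuple[Any, Tag]], pid: int) -> Tag:
    """A write picks <max_tag.seq + 1, pid> (assumption: max over the obtained pairs)."""
    highest = max(tag for _value, tag in pairs)
    return Tag(highest.seq + 1, pid)


# Value-tag pairs as they might be collected from a quorum (made-up data).
pairs = [("a", Tag(3, 1)), ("b", Tag(3, 2)), ("c", Tag(2, 9))]
latest = read_result(pairs)
new_tag = next_tag(pairs, pid=1)
```

Because the processor id breaks ties between equal sequence numbers, two writers that increment the same tag still produce distinct, totally ordered tags.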

13 Using Quorum Systems
Given a set I (a set of processor ids)
A quorum system is a pair ⟨read-quorums, write-quorums⟩
Where
  –read-quorums is a collection of subsets of I
  –write-quorums is a collection of subsets of I
Such that
  –For any R in read-quorums and W in write-quorums, R ∩ W ≠ ∅
  –For any W1 and W2 in write-quorums, W1 ∩ W2 ≠ ∅
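
The two intersection conditions can be checked mechanically. The following is a minimal sketch, not part of the presentation; `is_quorum_system` and `majorities` are hypothetical helper names, and majority quorums are used only as a familiar example.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterable, List


def is_quorum_system(read_quorums: List[FrozenSet[int]],
                     write_quorums: List[FrozenSet[int]]) -> bool:
    """Check the read/write and write/write intersection conditions."""
    read_write_ok = all(r & w for r, w in product(read_quorums, write_quorums))
    write_write_ok = all(w1 & w2 for w1, w2 in product(write_quorums, repeat=2))
    return read_write_ok and write_write_ok


def majorities(ids: Iterable[int]) -> List[FrozenSet[int]]:
    """All smallest-majority subsets of ids (an example construction)."""
    ids = sorted(ids)
    size = len(ids) // 2 + 1
    return [frozenset(c) for c in combinations(ids, size)]


quorums = majorities({1, 2, 3})
majority_ok = is_quorum_system(quorums, quorums)
disjoint_ok = is_quorum_system([frozenset({1})], [frozenset({2, 3})])
```

Any two majorities of the same set overlap, so the first check succeeds; the second fails because the read quorum {1} misses the write quorum {2, 3}.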

14 High-Level Functions
Joiner
  –Introduces new participants to the system
Reader-Writer
  –Routine read and write operations
  –Two-phased algorithm using all “known” configurations
  –Using tags
Reconfiguration
  –Chooses new next configuration
  –Informs members of the previous configuration
Garbage collection (“packaged” with Reader-Writer)
  –Identify and remove obsolete configurations

15 RAMBO
[Figure: the RAMBO System, with components Joiner, Reader-Writer, Recon, Cons, and Network.]

16 Architectural View
Each component is formally specified
  –Input/Output Automata [Tuttle Lynch]
Joiners are specified as Joiner_i for i in I
Reader-Writers are Reader-Writer_i for i in I
Reconfigurers are Recon_i for i in I
Consensus instances are Cons(k,c) for k in N, c in C
  –Where the members of configuration c decide on the configuration number k
Network is specified in terms of Channel_i,j for i, j in I
  –Assumed only to be “honest”
The System is then the composition of all automata

17 Configurations and Config Maps
Configuration c
  –members(c) -- set of members of configuration c
  –read-quorums(c) -- set of read quorums
  –write-quorums(c) -- set of write quorums
Configuration map cm
  –mapping from naturals to configurations
  –cm(k) is the configuration k, and it can be
  –defined, undefined (⊥), garbage-collected (±)
[Figure: a configuration map laid out by index: a prefix of garbage-collected entries (±), then defined configurations (c), then a “mixed” region of defined and undefined entries, then undefined entries (⊥).]
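
A configuration map with its three kinds of entries could be represented as follows. This is a hedged sketch: `Slot`, `rank`, and `ConfigMap` are hypothetical names, configurations are stood in for by plain strings, and the merge rule assumes an information ordering undefined < defined < garbage-collected, which the slide does not state explicitly.

```python
from enum import Enum


class Slot(Enum):
    UNDEFINED = "⊥"
    GARBAGE_COLLECTED = "±"


def rank(entry) -> int:
    """Assumed information order: undefined < defined < garbage-collected."""
    if entry is Slot.UNDEFINED:
        return 0
    if entry is Slot.GARBAGE_COLLECTED:
        return 2
    return 1


class ConfigMap:
    def __init__(self, entries=None):
        # index -> configuration or Slot.GARBAGE_COLLECTED; a missing index is undefined
        self._entries = dict(entries or {})

    def get(self, k):
        return self._entries.get(k, Slot.UNDEFINED)

    def merge(self, other: "ConfigMap") -> None:
        """Keep, per index, whichever entry carries more information."""
        for k in set(self._entries) | set(other._entries):
            theirs = other.get(k)
            if rank(theirs) > rank(self.get(k)):
                self._entries[k] = theirs

    def defined(self):
        """Indices that currently hold a live (defined) configuration."""
        return sorted(k for k, e in self._entries.items() if rank(e) == 1)


mine = ConfigMap({0: "c0", 1: "c1"})
theirs = ConfigMap({0: Slot.GARBAGE_COLLECTED, 1: "c1", 2: "c2"})
mine.merge(theirs)
```

After the merge, index 0 is marked garbage-collected and index 2 has been learned, so `mine.defined()` lists indices 1 and 2.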

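18 Configuration Maps
[Figure: a configuration map evolving over time. New configurations c0, c1, c2, ..., ck are appended as they are decided, and the oldest entries are replaced by ± as they are garbage-collected; entries beyond the newest known configuration are undefined (⊥).]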

19 Reader-Writer Protocol
One “gossip” message
  –⟨World, value, tag, cmap, ns, nr⟩
Message from a sender s to a receiver r is such that
  –World is s’s set of participants, and r ∈ World
  –value and tag are the object value and its tag at s
  –cmap is the configuration map at s
  –ns and nr are sender’s and best known receiver’s phase numbers used to identify “fresh” messages
These messages are
  –Sent non-deterministically
  –For performance analysis we impose an additional deterministic send policy
Certain actions are taken when “enough” info is gathered

20 Read/Write Protocol
[Figure: nodes RAMBO_i, RAMBO_j, ..., RAMBO_n, each containing a Reader-Writer and a Recon component. Labels: read_i, read-ack(v)_i, write(v)_i, write-ack_i, new-config(c,k)_i, and gossip between Reader-Writers.]

21 Reader-Writer Code
[Figure: code listing with sections labeled Start read, Start write, End read, End write, New cfg, Receive, Send, Query fix, Prop fix.]

22 The Phase Pattern
Send to a collection of processes in “known” configs
Collect responses and update configuration information
Continue until a certain predicate is satisfied
[Figure: flowchart: Start, Send, Recv (collect responses), “Fixpoint reached?”; if no, continue sending; if yes, End.]
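
The phase pattern can be illustrated with a small synchronous simulation. This is a sketch under simplifying assumptions, not the I/O-automaton code from the talk: `run_phase`, `quorums_of`, and `ask` are hypothetical names, a reply carries only the set of configurations the responder knows, and the fixpoint predicate is taken to be "some quorum of every known configuration has responded".

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple


def run_phase(known: Set[str],
              quorums_of: Callable[[str], List[FrozenSet[int]]],
              ask: Callable[[int], Optional[Set[str]]]) -> Tuple[Set[str], Set[int]]:
    """Contact members of all known configurations until a fixpoint is reached."""
    known = set(known)          # work on a copy
    responded: Set[int] = set()
    while True:
        members = set().union(*(q for c in known for q in quorums_of(c)))
        progress = False
        for p in sorted(members - responded):
            reply = ask(p)      # None models a crashed or unreachable process
            if reply is None:
                continue
            responded.add(p)
            known |= reply      # learning a new configuration widens the target set
            progress = True
        if all(any(q <= responded for q in quorums_of(c)) for c in known):
            return known, responded
        if not progress:
            # The real protocol would simply keep waiting; the sketch gives up.
            raise RuntimeError("no quorum reachable")


# Made-up scenario: c0 uses majorities of {1,2,3}; c1 needs both 4 and 5.
configs = {
    "c0": [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})],
    "c1": [frozenset({4, 5})],
}
views = {1: {"c0"}, 2: {"c0", "c1"}, 4: {"c0", "c1"}, 5: {"c0", "c1"}}  # process 3 is down

known, responded = run_phase({"c0"}, configs.__getitem__, views.get)
```

In this run the initiator starts out knowing only c0, learns about c1 from process 2, and therefore has to reach a quorum of c1 as well before the phase ends.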

23 Read and Write Operations
Reads and Writes use Query and Propagation phases involving known quorum configurations
  –Query obtains information about “latest” operations from read quorums & updates configurations
  –Propagation disseminates the results of “latest” operation to write quorums & updates configurations
Fixed point must be reached -- discovery of new configurations requires new quorums to be reached
[Figure: timeline of a read or write: Start Query, End Query, Start Prop., End Prop.]

24 Reader-Writer: Send/Recv

25 Reader-Writer: Fixed Points

26 Why Readers Propagate
If the readers do not propagate, atomicity can be easily violated:
[Figure: a slow write of v1 has reached only one replica while the others still hold v0; one read returns v1 and another read returns v0.]
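
The violation can be reproduced with a toy simulation. This sketch makes several assumptions that are not in the slide: three replicas named a, b, c with majority quorums, integer tags, a slow write of v1 that has so far reached only replica a, and two reads executed one after the other. The function names are hypothetical.

```python
def read(store, query_quorum, prop_quorum=None):
    """Query a quorum for the max-tag value; optionally propagate it before returning."""
    tag, value = max((store[p] for p in query_quorum), key=lambda tv: tv[0])
    if prop_quorum is not None:
        for p in prop_quorum:
            if store[p][0] < tag:
                store[p] = (tag, value)
    return value


def scenario(propagate: bool):
    # replica -> (tag, value); the slow write of v1 has reached only replica "a"
    store = {"a": (1, "v1"), "b": (0, "v0"), "c": (0, "v0")}
    first = read(store, {"a", "b"}, {"a", "b"} if propagate else None)
    second = read(store, {"b", "c"}, {"b", "c"} if propagate else None)
    return first, second


without_propagation = scenario(propagate=False)
with_propagation = scenario(propagate=True)
```

Without propagation the second read, which starts after the first has finished, returns the older value v0. With propagation the first reader pushes v1 to a quorum, so every later quorum contains at least one replica holding v1.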

27 Joining Protocol
[Figure: nodes RAMBO_i and RAMBO_j, each with Joiner, Reader-Writer, and Recon components. Labels: join(J)_i, join, join ack, gossip.]

28 Garbage Collection
When a process has a configuration map cmap of the form ±, ..., ±, c_k, c_k+1, ..., it can garbage-collect configuration cmap(k) = c_k
Two-phase protocol using the “gossip” messages
  –Update own tag & value by obtaining the “best” tag and value from a read- and write-quorum of cmap(k)
  –Propagate tag & value to a write-quorum of cmap(k+1)
  –Set cmap(k) to ±
This “bootstraps” configuration k in case it is “too new”
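
The three garbage-collection steps can be sketched over an in-memory store. This is an illustration only: `garbage_collect`, the dictionary layout, and the choice of one responsive read quorum and one responsive write quorum per configuration are all assumptions, and real RAMBO performs these steps with asynchronous gossip rather than direct access to replica state.

```python
GC = "±"  # marker for a garbage-collected entry


def garbage_collect(cmap, k, store, me):
    """Collect configuration k, assuming the chosen quorums respond.

    cmap:  index -> {"read_quorum": set, "write_quorum": set} or GC
    store: process id -> (tag, value), with tag = (seq, pid)
    """
    old, new = cmap[k], cmap[k + 1]

    # Phase 1: learn the best tag/value from a read- and a write-quorum of cmap(k).
    contacted = old["read_quorum"] | old["write_quorum"]
    best = max([store[me]] + [store[p] for p in contacted], key=lambda tv: tv[0])
    store[me] = best

    # Phase 2: propagate that tag/value to a write-quorum of cmap(k+1).
    for p in new["write_quorum"]:
        if store[p][0] < best[0]:
            store[p] = best

    # Only now is it safe to drop the old configuration.
    cmap[k] = GC


cmap = {
    0: {"read_quorum": {1, 2}, "write_quorum": {2, 3}},
    1: {"read_quorum": {4, 5}, "write_quorum": {4, 5}},
}
store = {
    1: ((1, 1), "x"), 2: ((2, 3), "y"), 3: ((1, 1), "x"),
    4: ((0, 0), None), 5: ((0, 0), None), 9: ((0, 0), None),
}
garbage_collect(cmap, 0, store, me=9)
```

The ordering matters: the newer configuration receives the latest value before the older one is discarded, so later operations that use only cmap(k+1) cannot miss it.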

29 Reconfiguration
Very simple protocol for Recon_i
  –Reconfiguration is free of atomicity concerns
Initiator i (multiple initiators are allowed)
  –Accepts reconfiguration request recon(c,c’)_i from environment: reconfigure from c to c’
  –If c is the locally-known “latest” configuration k-1, informs members of c of the reconfiguration
  –Calls Paxos for k to decide on “next” configuration c’
  –Informs Reader-Writer_i of the new configuration
Participants i
  –Learn about the initiation of reconfiguration
  –Participate in Paxos
  –Inform Reader-Writer_i of the new configuration

30 Latency Analysis
Certain gossip and messages become “important”
  –Messages to members of “active” configurations when read or write is performed
  –Messages to configurations k and k+1 when garbage collection is performed
  –Specific messages when joining and reconfiguring
  –Responses to such messages
Consider “good” timed executions
  –Bounded message delay d
  –0 local processing time
  –Environment is well-formed

31 Additional Assumptions
These assumptions are used in some results
Configuration-viability for time parameter e
  –If c becomes “known” as configuration k anywhere
  –Then either one read- and one write-quorum of c stays alive forever
  –Or if by time t another configuration is decided upon by non-faulty members of c, then one read- and one write-quorum of c stays alive until t+e
Reconfiguration-spacing for time parameter e
  –recon(c, *)_i occurs at least e time after report(c)_i
Join-connectivity for time parameter e
  –If i and j join by time t then they learn about each other by time t+e

32 Latency Bounds (selected)
Joining:
  –2d, provided “joiner” and “joinee” do not fail
Reconfiguration:
  –In 0-configuration-viable executions
  –If recon(c,c’)_i action occurs by time t and no members of c fail after t, then recon-ack_i occurs at t+12d+ε
Garbage-collection of c_k at non-faulty i:
  –4d, if R in read-quorums(c_k), W1 in write-quorums(c_k), and W2 in write-quorums(c_k+1) do not fail
Read and write operations in “stable” systems
  –If no reconfig-s in progress, then process with “up-to-date” config map completes its operation in 4d
(These do not depend on “gossip”)

33 More Latency (1)
These bounds depend on periodic gossip
Learning new configurations
  –If i and j are “old enough” and do not fail, then information from i is conveyed to j within time 2d
Garbage-collection when reconfigurations are 6d-spaced and executions are 6d-configuration-viable
  –If recon(c, *) occurs before t and c is “known” by t-6d then any non-faulty process that is “old enough” learns about c and garbage-collects any older configuration by time t+6d
  –All non-faulty “old enough” processes have one or two defined configurations in their configuration maps

34 More Latency (2)
Read and write operations (with periodic gossip)
  –Complete in time 8d for non-faulty processes that are “old enough”, provided the execution satisfies 12.1d-recon-spacing and 6d-configuration-viability
Learning in failure-free executions
  –Let J be the set of processes that joined by time t
  1. Then by time t + log|J|, J ⊆ world_i for any i in J
  2. If i in J “knows” a configuration at time t’, then any j in J learns about it by max(t + log|J|, t’) + 2d

35 Algorithmic Innovations
Dynamic owners of data:
  –Any and all owners may request reconfiguration
  –the set of owners can be changed dynamically
Dynamic configurations:
  –Arbitrary configurations can be installed
  –no constraints on intersection of quorum sets or member sets in distinct configurations.
Loosely-coupled reconfiguration:
  –Concurrent reads, writes and reconfiguration
  –If finite reconfigurations occur during a read or write operation, then its completion does not depend on whether any reconfigurations complete

36 Algorithmics (cont’d)
Efficient “steady-state”:
  –Assuming bounded delays, infrequent reconfigurations, and periodic gossip, reads and writes complete in time that is a constant multiple of the message delay
  –Assuming periodic garbage collection, readers/writers only deal with 1 or 2 configurations
Fast “catch-up”:
  –New “joiners” with out-of-date configurations can catch up after a logarithmic number of message exchanges provided the “joiners graph” is connected

37 Comparison with Other Approaches
Paxos or a similar consensus service can be used to agree on global order of operations
  –We only agree on the sequence of configurations
  –Consensus termination impacts only Recon
  –Reads/writes are not affected by consensus
Group communication systems can also be used
  –Our algorithm is “from scratch”: low-level send-receive, no hidden/relative costs
  –Reads/writes work during “new view” establishment
Dynamic quorums / dynamic configurations work
  –We allow arbitrary new configurations - no static intersection constraints
Our earlier work also solves this problem
  –New work: concurrent recon-s and garbage-collect

38 Work in Progress and Futures
Full-fledged implementation is under development
Additional analysis in progress
  –“Normal timing” starts at some point
  –Trade-off between configuration-viability and garbage collection
  –Analysis of “join-connectivity” graphs
Algorithmic refinements
  –Elimination of unnecessary communication
  –Explicit “leave” protocol
  –Gossip: “owners” vs. “users” of objects

