Presentation is loading. Please wait.

Presentation is loading. Please wait.

D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.

Similar presentations


Presentation on theme: "D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University."— Presentation transcript:

1 D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University

2 A Classic Paper Time, Clocks, and the Ordering of Events in Distributed Systems Leslie Lamport, CACM, July 1978 Introduced: – logical clocks (“Lamport clocks”) – state-machine replication model – Causal multicast – physical clocks that respect causality – (Almost but not) vector clocks

3 Concurrency and time A B C C What do these words mean? after? last? subsequent? eventually?

4 Same world, different timelines Which of these happened first? A B W(x)=v R(x) Message send Message receive “Event e1a wrote W(x)=v” e1a e1b e2 e3be4 e3a e1a is concurrent with e1b e3a is concurrent with e3b This is a partial order of events.

5 Concurrency and time Premise: nodes communicate only by messages. Nodes can observe a remote event only through some chain of messages. – Message patterns define the observability of events. Event e 1 can affect or cause event e 2 iff the node initiating e 2 could have already observed e 1. – Message patterns define a potential causality relation. – Also known as happened-before (  ). Event e 1 precedes e 2 iff e 1 could have caused e 2. – We can view the potential causality relation as logical time. Events are concurrent if neither precedes the other.

6 Axiom 1: happened-before (  ) C A B C 1.If e1, e2 are in the same process/node, and e1 comes before e2, then e1  e2. - Also called program order Time, Clocks, and the Ordering of Events in Distributed systems, by Leslie Lamport, CACM 21(7), July 1978

7 Axiom 2: happened-before (  ) C A B C 2. If e1 is a message send, and e2 is the corresponding receive, then e1  e2. Time, Clocks, and the Ordering of Events in Distributed systems, by Leslie Lamport, CACM 21(7), July 1978

8 Axiom 3: happened-before (  ) C A B C 3.  is transitive happened-before is the transitive closure of the relation defined by #1 and #2 Time, Clocks, and the Ordering of Events in Distributed systems, by Leslie Lamport, CACM 21(7), July 1978

9 Potential causality: example A B C A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 A1 < B2 < C2 B3 < A3 C2 < A4

10 Logical clocks We need clocks that reflect logical time. Start with logical clocks: 1. Each node maintains a monotonically increasing logical clock value LC. 2. Events are timestamped with the current LC value at the generating node. – Increment LC on each new event: LC = LC + 1 3. Piggyback current clock value on all messages. – On receive, advance receiver’s LC to sender’s LC. – If LC s > LC then LC = LC s + 1

11 The Clock Condition e 1  e 2 implies that LC(e 1 ) < LC(e 2 ) LC ordering respects potential causality. Is the converse also true? i.e., Does LC(e 1 ) < LC(e 2 ) imply that e 1  e 2 ? What if e 1 and e 2 are concurrent?

12 Logical clocks: example 0 0 0 1 2 39 6 1 5 8 2 3 4 6 7 45 6 78 10 7 5 A B C C5: LC update advances receiver’s clock if it is “running slow” relative to sender. A6-A10: receiver’s clock is unaffected because it is “running fast” relative to sender.

13 Potential causality A SS C W(x)=v R(x) v? OK “Try it now” “Event e1a wrote W(x)=v” If nodes observe events in an order that violates causality, they may perceive an inconsistency.

14 Causal consistency A SS C W(x)=v R(x) 0???? This ordering violates causal consistency. “Try it now” “Event e1a wrote W(x)=v”

15 Same world, unified timelines? A B W(x)=v R(x) e1b e2 e4 e3a This is a total order of events. Also called a sequential schedule. It is consistent with the partial order induced by happened-before (causal order). External witness e1a e5 X e3b

16 Same world, unified timelines? A B W(x)=v R(x) e1b e2 e3b e4 e3a Here is another total order of the same events. Like the last one, it is consistent with the partial order: it does not change any existing orderings; it only assigns orderings to events that are concurrent in the partial order. External witness e1a X X

17 Replicated State Machines, revisited Replicas are “consistent” if all replicas apply the same (deterministic) updates in the same total order. Suppose a client can read from any replica while writes are propagating. Does the order matter? op A op B op A op B op B op A

18 Challenge: A Decentralized Mutex Implement a distributed mutex, respecting these conditions: 1.Resource holder must release before algorithm grants the resource to another Process; 2.Different acquire requests must be granted “in the order in which they are made”; 3.If every holder eventually releases, then every request is eventually granted. Note: ordering is arbitrary for concurrent requests.

19 Definition of a lock (mutex) Acquire + release ops on L are strictly paired. – After acquire completes, the caller holds (owns) the lock L until the matching release. Acquire + release pairs on each L are ordered. – Total order: each lock L has at most one holder. – That property is mutual exclusion; L is a mutex.

20 Lamport’s Mutex Problem The trick here is to implement a mutex service in a decentralized way, without a central lock server. Nodes/processes exchange messages and execute requests locally in the same order. – Replicated State Machine Lamport’s condition II says the order must be causal. – (why?).

21 Lamport’s Proposed Solution Process group of N processes/nodes/peers with unique IDs. Each process is a state machine: – Two operations: request (acquire) and release – Internal state per-process: request queue Basic idea: – Requester sends request to all peers (multicast) – Peers have some means to order the requests: receive code may defer or reorder messages before delivery. – All peers agree on the same order of delivery. – Therefore, all peers agree on the sequence of acquisitions.

22 Causal Multicast (“cbcast”) 1.Assume FIFO delivery at transport: messages sent by P are received by others in their send order (program order). 2.Sender timestamps each messages with its logical clock. 3.Receiver acknowledges a received request with a message back to the sender, timestamped with logical clock of receiver (ack sender). Propagates knowledge of peer timestamp values. 4.The logical clocks induce a partial order on the messages. To make it a total order: if two messages have the same timestamp, use the unique process ID to break the tie. This total order is “arbitrary”, but it respects potential causality. 5.A node delivers a request R to the local application when R is stable: it has the earliest timestamp T of all undelivered requests, and there can be no preceding request still in the network (e.g., the node has received a later-timestamped message from every peer).

23 The Lamport Mutex Algorithm 1.Use causal multicast. 2.Processes cache the acquire requests they have received on their request queue, in timestamp order. Including their own acquire requests 3.If a release message is received, remove the corresponding request from the queue. 4.Take the lock when your request is at the front of the queue. This request is next, and it is stable.

24 Enhanced causal multicast: Isis Later refinements to causal multicast (“cbcast”) improved concurrency (e.g., in Birman’s Isis group system). – Lamport’s approach delivers messages in a total order. – Lamport’s use of logical clocks imposes an ordering in some instances where causality does not require it. – Isis advanced cbcast (causal broadcast): deliver concurrent messages in any order; never delay your own messages to self; use various batching optimizations for message bursts. – How to know if messages are concurrent? Lamport’s state-machine service doesn’t handle failures: addressed in later work (e.g., Isis again). – Requires failure detection, uniform atomicity, and views.

25

26

27

28

29

30


Download ppt "D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University."

Similar presentations


Ads by Google