EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering.

EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org

2 Outline (Spring 2007, EEC693: Secure & Dependable Computing, Wenbing Zhao)
Group communication systems
– Vector timestamps
– Ordered multicast
– Membership protocols
– Agreed and safe delivery
– Virtual synchrony
Checkpointing and roll-forward recovery
Reference:
– Reliable Distributed Systems, by K. P. Birman, Springer; Chapters 14-16
Reminder: wiki page review due midnight 4/12

3 Vector Timestamps
Lamport timestamps do not guarantee that if C(a) < C(b), then a indeed happened before b
We need vector timestamps for that
– Each process P_i has an array V_i[1..n], where V_i[j] denotes the number of events that process P_i knows have taken place at process P_j
– When P_i sends a message m, it adds 1 to V_i[i] and sends V_i along with m as the vector timestamp vt(m)
– When P_j receives m from P_i with vt(m), it updates each V_j[k] to max(V_j[k], vt(m)[k]) and then increments V_j[j] by 1
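
The send and receive rules above can be sketched in a few lines of Java. This is a minimal illustration, not code from the lecture: the class name `VectorClock`, its methods, and the 0-based process indices are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical helper class; the name and API are illustrative, not from the lecture.
final class VectorClock {
    private final int[] v;   // v[k] = number of events at process k that this process knows of
    private final int self;  // index of the owning process (0-based here, 1-based on the slide)

    VectorClock(int n, int self) {
        this.v = new int[n];
        this.self = self;
    }

    // Send rule: add 1 to our own entry, then attach a copy of the vector as vt(m).
    int[] onSend() {
        v[self]++;
        return Arrays.copyOf(v, v.length);
    }

    // Receive rule: take the entry-wise maximum with vt(m), then count the receive event.
    void onReceive(int[] vt) {
        for (int k = 0; k < v.length; k++) {
            v[k] = Math.max(v[k], vt[k]);
        }
        v[self]++;
    }

    // Copy of the current vector, so callers cannot modify the clock.
    int[] snapshot() {
        return Arrays.copyOf(v, v.length);
    }

    public static void main(String[] args) {
        VectorClock p0 = new VectorClock(3, 0);
        VectorClock p1 = new VectorClock(3, 1);
        int[] vt = p0.onSend();   // P0 sends m with timestamp vt
        p1.onReceive(vt);         // P1 merges vt and counts its own receive event
        System.out.println(Arrays.toString(p1.snapshot()));
    }
}
```

Returning a copy from `onSend` matters: the timestamp attached to a message must not change when the sender's clock advances later.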

4 Vector Timestamps
V_i(a) < V_j(b) if and only if
– V_i(a)[k] ≤ V_j(b)[k] for every k, and
– V_i(a)[k] < V_j(b)[k] for at least one k
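
A comparison of two vector timestamps might look like the sketch below. It uses the standard strict ordering on vectors (every entry less than or equal, and at least one entry strictly smaller); the class and method names are hypothetical, and both arrays are assumed to have the same length.

```java
import java.util.Arrays;

// Hypothetical utility; names are illustrative.
final class VectorOrder {
    // True if a <= b entry-wise and a < b in at least one entry,
    // i.e. the event stamped a happened before the event stamped b.
    static boolean happenedBefore(int[] a, int[] b) {
        boolean strictlySmallerSomewhere = false;
        for (int k = 0; k < a.length; k++) {
            if (a[k] > b[k]) return false;
            if (a[k] < b[k]) strictlySmallerSomewhere = true;
        }
        return strictlySmallerSomewhere;
    }

    // Two distinct timestamps are concurrent when neither precedes the other.
    static boolean concurrent(int[] a, int[] b) {
        return !Arrays.equals(a, b) && !happenedBefore(a, b) && !happenedBefore(b, a);
    }
}
```

For example, [1,0,0] precedes [1,1,0], while [1,0,0] and [0,1,0] are concurrent, a distinction that a single Lamport timestamp cannot express.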

5 Ordered Reliable Multicast
Reliable multicast: the message is targeted to multiple receivers, and all receivers receive the message reliably
Ordered reliable multicast: if many messages are multicast by many senders, in what order are the messages delivered at the receivers?
– First in, first out (FIFO)
– Causal: the causal relationship among messages is preserved
– Total: all messages are delivered at all receivers in the same order

6 FIFO Ordered Multicast
FIFO or sender-ordered multicast: messages are delivered in the order they were sent (by any single sender)
[Figure: diagram of processes p, q, r, s with multicasts a, b, c, d, e. Delivery of c to p is delayed until after b is delivered.]
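
One common way to obtain FIFO order is for each sender to number its messages and for each receiver to hold back anything that arrives early. The slide does not say how FIFO order is implemented, so the per-sender sequence numbers (starting at 0) and the `FifoReceiver` class below are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical receiver-side sketch; per-sender sequence numbers are an assumption.
final class FifoReceiver {
    private final Map<String, Integer> nextSeq = new HashMap<>();           // next expected seq per sender
    private final Map<String, Map<Integer, String>> held = new HashMap<>(); // early messages per sender
    private final List<String> delivered = new ArrayList<>();               // delivery order at this process

    // Called when the network hands over a message from the given sender.
    void receive(String sender, int seq, String payload) {
        int expected = nextSeq.getOrDefault(sender, 0);
        if (seq < expected) return; // duplicate of an already delivered message
        Map<Integer, String> pending = held.computeIfAbsent(sender, s -> new HashMap<>());
        pending.put(seq, payload);
        // Deliver for as long as the next expected message from this sender is available.
        while (pending.containsKey(expected)) {
            delivered.add(pending.remove(expected));
            expected++;
        }
        nextSeq.put(sender, expected);
    }

    List<String> delivered() {
        return delivered;
    }
}
```

If c (sequence number 1) arrives before b (sequence number 0) from the same sender, c stays in the hold-back map until b has been delivered. Messages from different senders never block one another, which is why FIFO order alone does not give causal order.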

7 Causally Ordered Multicast
Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure: diagram of processes p, q, r, s with multicasts a and b.]

8 Causally Ordered Multicast
Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure: diagram of processes p, q, r, s with multicasts a, b, c. Delivery of c to p is delayed until after b is delivered.]

9 Causally Ordered Multicast
Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure: diagram of processes p, q, r, s with multicasts a, b, c, e. Delivery of c to p is delayed until after b is delivered. e is sent (causally) after b.]

10 Causally Ordered Multicast
Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure: diagram of processes p, q, r, s with multicasts a, b, c, d, e. Delivery of c to p is delayed until after b is delivered. Delivery of e to r is delayed until after b and c are delivered.]

11 Implementing Causal Ordering
Start with a FIFO multicast
We can strengthen a FIFO multicast into a causal multicast by adding vector timestamps
Assumptions
– The only types of events are message sending and receiving; however, the clock is incremented only on sending
– All messages sent are multicast to all other processes
– The multicast is reliable and FIFO

12 Implementing Causal Ordering Using Vector Timestamps
When a process P_j receives a message m from P_i, it delivers the message only if the following two conditions are met:
– It has delivered all earlier messages sent by P_i
– It has delivered every message that P_i had delivered at the time it multicast m
Delivery condition at P_j for m from P_i:
– vt(m)[i] = V_j[i] + 1
– vt(m)[k] ≤ V_j[k] for k ≠ i
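
The delivery condition translates almost directly into code. The sketch below assumes, as on the previous slide, that clocks count only multicasts; the class name, the method names, and the 0-based indices are illustrative.

```java
// Hypothetical sketch of the delivery test at P_j; names are illustrative.
final class CausalDelivery {
    // vt is the timestamp of message m, sender is the index i of P_i,
    // and vj is the receiver's vector V_j of messages delivered so far.
    static boolean canDeliver(int[] vt, int sender, int[] vj) {
        for (int k = 0; k < vt.length; k++) {
            if (k == sender) {
                if (vt[k] != vj[k] + 1) return false; // m must be the next message from P_i
            } else if (vt[k] > vj[k]) {
                return false;                         // P_i had delivered a message that P_j has not
            }
        }
        return true;
    }

    // After delivering m, P_j records that it has seen one more message from the sender.
    static void recordDelivery(int[] vt, int sender, int[] vj) {
        vj[sender] = vt[sender];
    }
}
```

A message that fails the test is not discarded: it is held back and the test is repeated after each later delivery, since delivering another message may satisfy the missing condition.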

13 Totally Ordered Multicast
Total ordering: messages are delivered in the same order to all recipients (including the sender)
[Figure: diagram of processes p, q, r, s with multicasts a, b, c, d, e. All deliver a, b, c, d, then e.]

14 Implementing Total Ordering
Use a token that moves around
– The token carries a sequence number
– While you hold the token, you can send the next burst of multicasts
Use a sequencer to order all multicasts
– A message is first multicast to all, including the sequencer; the sequencer then determines the order for the message and informs all
– Or the message is sent to the sequencer, and the sequencer multicasts it with total order information
– Each sender can take a turn serving as the sequencer
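
The sequencer variant can be sketched as follows, under the assumption that the sequencer hands out consecutive global sequence numbers starting at 0 and that receivers hold back messages that arrive ahead of their turn. `Sequencer` and `TotalOrderReceiver` are hypothetical names, and networking and failure handling are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sequencer: stamps each message with the next global sequence number.
final class Sequencer {
    private long next = 0;

    long assign() {
        return next++;
    }
}

// Hypothetical receiver: delivers strictly in global sequence order.
final class TotalOrderReceiver {
    private final TreeMap<Long, String> holdBack = new TreeMap<>(); // received but not yet deliverable
    private final List<String> delivered = new ArrayList<>();
    private long expected = 0;                                      // next sequence number to deliver

    void receive(long seq, String payload) {
        holdBack.put(seq, payload);
        // Drain the hold-back queue while its smallest entry is the one we are waiting for.
        while (!holdBack.isEmpty() && holdBack.firstKey() == expected) {
            delivered.add(holdBack.pollFirstEntry().getValue());
            expected++;
        }
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Two receivers that get the same stamped messages in different network orders end up with identical `delivered()` lists, which is the total order property. The single sequencer is also a single point of failure, which is one reason the role may rotate among senders.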

15 Group Membership Service
Input:
– Process “join” events
– Process “leave” events
– Apparent failures
Output:
– Membership views for the group(s) to which those processes belong

16 Issues?
The service itself needs to be fault-tolerant
– Otherwise our entire system could be crippled by a single failure!
– Hence the Group Membership Service (GMS) must run some form of protocol, the Group Membership Protocol (GMP)

17 Approach
We’ll assume that the GMS has members {p,q,r} at time t
Designate the “oldest” of these as the protocol “leader”
– To initiate a change in GMS membership, the leader will run the GMP
– Others can’t run the GMP; they report events to the leader

18 GMP Example
Example:
– Initially, the GMS consists of {p,q,r}
– Then q is believed to have crashed
[Figure: timelines of processes p, q, r.]

19 Unreliable Failure Detection
Recall that failures are hard to distinguish from network delay
– So we accept the risk of a mistake
– If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q; this avoids “messages from the dead”
– q must rejoin to participate in the GMS again

20 Basic GMP
Someone reports that “q has failed”
The leader (process p) runs a 2-phase commit protocol
– Announces a “proposed new GMS view”, which excludes q, or might add some members who are joining, or could do both at once
– Waits until a majority of the members of the current view have voted “ok”
– Then commits the change

21 GMP Example
Proposes new view: {p,r} [-q]
Needs majority consent: p itself, plus one more (the “current” view had 3 members)
Can add members at the same time
[Figure: timelines of p, q, r. Starting from V0 = {p,q,r}, p proposes V1 = {p,r}, receives OK, then commits V1 = {p,r}.]
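
The commit rule in this example (a majority of the current view, not of the proposed one) can be written as a small check. This is a sketch with hypothetical names, and it covers only the vote count, not the messaging of the two phases.

```java
import java.util.Set;

// Hypothetical helper for the leader's commit decision; names are illustrative.
final class ViewChange {
    // A proposed view may be committed once a strict majority of the CURRENT view
    // has voted "ok". Votes from processes outside the current view do not count.
    static boolean hasMajority(Set<String> currentView, Set<String> okVotes) {
        long votes = okVotes.stream().filter(currentView::contains).count();
        return votes > currentView.size() / 2;
    }
}
```

With the current view {p,q,r}, the votes of p and r are enough to commit {p,r}, while p alone is not, and the vote of a joining process s would not help because s is not yet a member.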

22 Special Concerns?
What if someone doesn’t respond?
– p can tolerate failures of a minority of the members of the current view
A new first round “overlaps” the commit of the previous one:
– “Commit that q has left. Propose add s and drop r”
– p must wait if it can’t contact a majority; this avoids the risk of partitioning

23 What If the Leader Fails?
Here we do a 3-phase protocol
– The new leader identifies itself based on age ranking (oldest surviving process)
– It runs an inquiry phase: “The adored leader has died. Did he say anything to you before passing away?” Note that this causes participants to cut connections to the adored previous leader
– It then runs the normal 2-phase protocol, but “terminates” any interrupted view changes the previous leader had initiated

24 GMP Example
The new leader first sends an inquiry
Then proposes new view: {r,s} [-p]
Needs majority consent: q itself, plus one more (the “current” view had 3 members)
Again, can add members at the same time
[Figure: timelines of p, q, r. Starting from V0 = {p,q,r}: Inquire [-p]; OK: nothing was pending; Proposed V1 = {r,s}; OK; Commit V1 = {r,s}.]

25 Safe and Agreed Delivery
For totally ordered reliable multicast, there are two delivery policies
– Safe delivery: a message is delivered only when all correct processes have received it
– Agreed delivery: a message is delivered as soon as it is the next message in the total order

26 Safe and Agreed Delivery
Safe delivery guarantees the uniformity of multicast:
– If a message is delivered to any process, it is delivered to all correct processes
Agreed delivery does not:
– It is possible that a message is delivered at one (or more) processes but is not delivered at some correct process
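
The difference between the two policies comes down to one extra condition at delivery time. The sketch below assumes that receipt is tracked through acknowledgements from the members of the current view, which is one possible mechanism and not something the slides specify; all names are hypothetical.

```java
import java.util.Set;

// Hypothetical delivery predicates; the acknowledgement set is an assumed mechanism.
final class DeliveryPolicy {
    // Agreed delivery: deliver as soon as the message is the next one in the total order.
    static boolean agreedDeliverable(long seq, long nextExpected) {
        return seq == nextExpected;
    }

    // Safe delivery: additionally require that every member of the current view
    // is known to have received the message.
    static boolean safeDeliverable(long seq, long nextExpected,
                                   Set<String> view, Set<String> ackedBy) {
        return seq == nextExpected && ackedBy.containsAll(view);
    }
}
```

Safe delivery therefore costs at least one extra round of acknowledgements per message, which is the price paid for uniformity.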

27 Virtual Synchrony Model
Proposed by Ken Birman
The virtual synchrony model ensures that processes perceive process failures and other configuration changes as occurring at the same logical time
When network partitioning occurs, only the primary component is allowed to make progress

28 Why “Virtual” Synchrony?
What would a synchronous execution look like?
In what ways is a “virtually” synchronous execution not the same thing?

29 A Synchronous Execution
With true synchrony, executions run in genuine lock-step
[Figure: timelines of processes p, q, r, s, t, u.]

30 Virtual Synchrony at a Glance
With virtual synchrony, executions only look “lock-step” to the application
[Figure: timelines of processes p, q, r, s, t, u.]

31 Virtual Synchrony at a Glance
We use the weakest (hence fastest) form of communication possible
[Figure: timelines of processes p, q, r, s, t, u.]

32 In General?
Replace “safe” with “agreed” totally ordered multicast when possible
Replace totally ordered multicast with causally ordered multicast
Replace causally ordered multicast with FIFO ordered multicast
Unless replies are needed, don’t wait for replies to a multicast

33 Why “Virtual” Synchrony?
The user sees what looks like a synchronous execution
– Simplifies the developer’s task
But the actual execution is rather concurrent and asynchronous
– Maximizes performance
– Reduces the risk that lock-step execution triggers correlated failures

34 Correlated Failures
Why do we claim that virtual synchrony makes these less likely?
– Recall that many programs are buggy
– Often these are Heisenbugs (order-sensitive)
With lock-step execution, each group member sees group events in identical order
– So all die in unison
With virtual synchrony, orders differ
– So an order-sensitive bug might only kill one group member!

35 Roll-Forward Recovery
With replication in space, it is possible to recover from a fault while the system keeps making progress
Roll-forward recovery is made possible by
– Checkpointing of replica state
– Logging of incoming messages
– A reliable, totally ordered group communication system

36 Checkpointing
Checkpointing: the act of taking a snapshot of an entity so that we can restore it later
A replica is a process running in an operating system. The state of a process includes
– The process's memory, stack and registers
– Threads
– Open or mmap'ed files
– Current working directory
– Interprocess communication: semaphores, shared memory, pipes, sockets
– Dynamically loaded libraries
– …

37 Checkpointing
Many tools are available to perform checkpointing transparently or semi-transparently
– http://www.checkpointing.org/
– Condor, libckpt, etc.
– Checkpoints taken are in general not portable
– Checkpoint size might be big

38 Checkpointing of Application State
Sometimes it is more efficient to save and store the application state only
– Checkpoints can be very portable and compact in size
    class Counter {
        int counter;                                  // the entire application state
        Counter(int initVal) { counter = initVal; }
        void increment() { counter++; }
        void decrement() { counter--; }
        void setState(int c) { counter = c; }         // restore the state from a checkpoint
        int getState() { return counter; }            // take a checkpoint of the state
    }

39 Logging
Logging of messages
– Checkpointing in general is expensive
– Logging of messages is cheaper, so we can checkpoint periodically or on demand and log all messages in between
Logging of other non-deterministic activities
– Access order to shared data
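
Combining a checkpoint with a message log might look like the sketch below. The `Counter` class is reproduced from the earlier slide so that the example is self-contained; `LoggedCounter`, its methods, and the "inc"/"dec" message format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Counter as on the earlier slide, reproduced so the sketch is self-contained.
class Counter {
    int counter;
    Counter(int initVal) { counter = initVal; }
    void increment() { counter++; }
    void decrement() { counter--; }
    void setState(int c) { counter = c; }
    int getState() { return counter; }
}

// Hypothetical wrapper: logs every message applied since the last checkpoint.
final class LoggedCounter {
    private final Counter counter = new Counter(0);
    private int checkpoint = 0;                           // last saved application state
    private final List<String> log = new ArrayList<>();   // messages applied since the checkpoint

    // Log the message first, then apply it to the live state.
    void apply(String msg) {
        log.add(msg);
        execute(counter, msg);
    }

    void takeCheckpoint() {
        checkpoint = counter.getState();
        log.clear(); // older messages are already reflected in the checkpoint
    }

    // Rebuild the state on a fresh replica: restore the checkpoint, then replay the log in order.
    Counter recover() {
        Counter fresh = new Counter(0);
        fresh.setState(checkpoint);
        for (String msg : log) {
            execute(fresh, msg);
        }
        return fresh;
    }

    int state() {
        return counter.getState();
    }

    private static void execute(Counter c, String msg) {
        if (msg.equals("inc")) c.increment();
        else if (msg.equals("dec")) c.decrement();
        else throw new IllegalArgumentException("unknown message: " + msg);
    }
}
```

Replay reproduces the live state only if message processing is deterministic, which is why other sources of non-determinism, such as the access order to shared data, have to be logged as well.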

40 Why Do We Need a GCS?
We want to ensure that a newly admitted replica has a state consistent with the others when it starts
Steps of adding a new replica to a group (with on-demand checkpointing)
– A recovered (or new) replica joins the group
– A join message is multicast in total order
– On receipt, the join message is put into the incoming message queue to wait for processing
– When the join message reaches the head of the queue, a checkpoint is taken and transferred to the new replica

41 Why Do We Need a GCS?
– The new replica starts queueing messages after it receives the join message (sent by itself)
– When the new replica receives the checkpoint, it restores its state from it (the checkpoint is delivered out of order!)
– The queued messages are then delivered in order at the new replica
– Other replicas do not stop and wait for the new replica
The steps of adding a new replica to a group with periodic checkpointing are similar
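
The behaviour of the joining replica can be sketched as a small state machine. The example assumes that the application state is a single integer counter and that application messages are the strings "inc" and "dec"; the class and its callbacks are hypothetical stand-ins for what a group communication system would invoke.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical joining replica; the callbacks stand in for GCS upcalls.
final class JoiningReplica {
    private final Queue<String> queued = new ArrayDeque<>(); // messages ordered after the join
    private boolean joinSeen = false;   // own join message has been delivered in total order
    private boolean restored = false;   // checkpoint has been applied
    private int state = 0;

    // Called when the replica's own join message is delivered.
    void onJoinDelivered() {
        joinSeen = true;
    }

    // Called for each totally ordered application message.
    void onMessage(String msg) {
        if (!joinSeen) return;   // ordered before the join: already reflected in the checkpoint
        if (!restored) {
            queued.add(msg);     // ordered after the join, but the checkpoint is not here yet
            return;
        }
        apply(msg);
    }

    // Called when the checkpoint arrives, outside the total order.
    void onCheckpoint(int checkpointState) {
        state = checkpointState;
        restored = true;
        while (!queued.isEmpty()) {
            apply(queued.remove()); // catch up in the original order
        }
    }

    private void apply(String msg) {
        if (msg.equals("inc")) state++;
        else if (msg.equals("dec")) state--;
    }

    int state() {
        return state;
    }
}
```

The total order is what makes this consistent: the existing replica takes its checkpoint exactly when the join message reaches the head of its queue, so the checkpoint covers every message ordered before the join and the new replica's queue covers every message ordered after it.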

42 Steps of Roll-Forward Recovery
