EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.

EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org

Spring 2007, EEC693: Secure & Dependable Computing, Wenbing Zhao

Outline
Two-phase commit
Group communication systems
– Event ordering
– Ordered multicast
– Techniques to implement ordered multicast
– Membership protocols
Reference:
– Reliable Distributed Systems, by K. P. Birman, Springer; Chapters 14-16
Reminder: wiki page review due midnight 4/12

Two-Phase Commit
Model: The client that initiated the computation acts as the coordinator; the processes required to commit are the participants
Phase 1a: Coordinator sends VOTE_REQUEST to participants (also called a pre-write)
Phase 1b: When a participant receives VOTE_REQUEST, it returns either YES or NO to the coordinator. If it sends NO, it aborts its local computation
Phase 2a: Coordinator collects all votes; if all are YES, it sends COMMIT to all participants, otherwise it sends ABORT
Phase 2b: Each participant waits for COMMIT or ABORT and handles it accordingly
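
The four phases above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a networked implementation: the names `Participant`, `can_commit`, and `run_two_phase_commit` are hypothetical, and `can_commit` simply stands in for whatever decides a participant's local outcome.

```python
from enum import Enum


class Vote(Enum):
    YES = "YES"
    NO = "NO"


class Participant:
    """Hypothetical participant; `can_commit` models its local outcome."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def on_vote_request(self):
        # Phase 1b: vote YES and wait, or vote NO and abort locally.
        if self.can_commit:
            self.state = "READY"
            return Vote.YES
        self.state = "ABORT"
        return Vote.NO

    def on_decision(self, decision):
        # Phase 2b: apply the coordinator's decision.
        # A participant that already aborted locally stays aborted.
        if self.state == "READY":
            self.state = decision


def run_two_phase_commit(participants):
    # Phase 1a: send VOTE_REQUEST to every participant and collect the votes.
    votes = [p.on_vote_request() for p in participants]
    # Phase 2a: COMMIT only if every vote is YES, otherwise ABORT.
    decision = "COMMIT" if all(v is Vote.YES for v in votes) else "ABORT"
    for p in participants:
        p.on_decision(decision)
    return decision
```

A single NO vote is enough to abort the whole transaction, which is what makes the outcome atomic across participants.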

2PC: First Phase
[Figure: coordinator C sends VOTE_REQUEST to each participant P; each P replies YES]

2PC: Second Phase
[Figure: coordinator C sends COMMIT to each participant P; each P replies COMMITTED]

Two-Phase Commit
[Figure: the finite state machine for the coordinator in 2PC]
[Figure: the finite state machine for a participant]

2PC – Failing Participant
Consider a participant crash in one of its states, and the subsequent recovery to that state:
Initial state: No problem, as the participant was unaware of the protocol
Ready state: Participant is waiting to either commit or abort. After recovery, the participant needs to know which state transition it should make => log the coordinator's decision
Abort state: Need to make entry into the abort state idempotent
Commit state: Also make entry into the commit state idempotent

2PC – Failing Coordinator
If the coordinator fails, the final decision is not available until the coordinator recovers
Alternative: Let a participant P in the ready state time out when it hasn't received the coordinator's decision. Then P tries to find out what the other participants know
Question: Can P not succeed in getting the required information?

2PC – Failing Coordinator
Question: Can P not succeed in getting the required information?
Observation: The essence of the problem is that a recovering participant cannot make a local decision: it depends on other (possibly failed) processes
– There might exist one participant that received a COMMIT decision from the coordinator and then failed (more or less concurrently with the coordinator)
– The rest of the participants cannot unilaterally decide to abort the transaction
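
To make the blocking case concrete, here is a small sketch of how a timed-out participant P might decide from what its peers report. The rule set is an assumption (a commonly used cooperative termination rule, not something stated on the slides), and the function name `resolve_after_timeout` is hypothetical.

```python
def resolve_after_timeout(peer_states):
    """Decide what a READY participant can do from the states its peers report.

    `peer_states` maps a peer name to "INIT", "READY", "COMMIT", "ABORT",
    or None when the peer could not be reached (assumed encoding).
    """
    known = [s for s in peer_states.values() if s is not None]
    if "COMMIT" in known:
        # Some peer saw the coordinator's COMMIT, so the decision was COMMIT.
        return "COMMIT"
    if "ABORT" in known or "INIT" in known:
        # A peer aborted, or never voted, so COMMIT cannot have been decided.
        return "ABORT"
    # Every reachable peer is READY: an unreachable peer may have seen COMMIT,
    # so P cannot abort unilaterally and has to keep waiting.
    return "BLOCKED"
```

The last branch is exactly the situation described in the observation: if the only process that knew the decision has failed together with the coordinator, the survivors are stuck.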

Two-Phase Commit
Why can 2PC help achieve replica consistency?
– The processing of every input is carried out atomically
  Either all replicas succeed, or none
  E.g., one replica might not be able to complete the task due to deadlock avoidance (one form of nondeterminism). The whole operation will be aborted and retried
2PC is more appropriate for replicated applications that use read/write operations on structured data, such as replicated database systems

Group Communication System
Services provided by the GCS
– Membership service: who is up and who is down
  Deals with failure detection and more
– Reliable, ordered multicast service
  FIFO, causal, total
– Virtual synchrony service
  Virtual synchrony synchronizes membership changes with multicasts
Much easier to build a fault-tolerant system using a GCS

Event Ordering
“Time, Clocks, and the Ordering of Events in a Distributed System”, by Leslie Lamport, Communications of the ACM, July 1978, Volume 21, Number 7, pp. 558-565
– What usually matters is not that all processes agree on exactly what time it is, but rather that they agree on the order in which events occur

Happens-Before Relation
Assumptions:
– The system is composed of a collection of processes; each process consists of a sequence of events
– The events of a process form a sequence, where a occurs before b in this sequence if a happens before b
– The sending or receiving of a message is an event in a process

Happens-Before Relation
The happens-before relation “→” on the set of events of a system is the relation satisfying the following three conditions:
– If a and b are events in the same process, and a comes before b, then a → b
– If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b
– If a → b and b → c, then a → c

Partial Ordering
Not all events are related by happens-before
Two distinct events a and b are said to be concurrent if neither a → b nor b → a holds (a ↛ b and b ↛ a)
– Neither event can causally affect the other
– This introduces a partial ordering of events in a system with concurrently operating processes
“a happens before b” means that information can flow from a to b
“a is concurrent with b” means that there is no information flow between a and b

How to Capture the Partial Ordering?
Use logical clocks to capture the partial ordering
– Define a clock C_i for each process P_i. Assign a number C_i(a) to any event a in that process
– The entire system of clocks is represented by the function C, which assigns to any event b the number C(b), where C(b) = C_j(b) if b is an event in process P_j
– The clocks C_i are logical clocks rather than physical clocks

Lamport Clock
A Lamport logical clock is a monotonically increasing software counter
Each process P_i keeps its own logical clock C_i to apply Lamport timestamps to events
To capture the happens-before relation →, processes must do the following:
– Before each event at P_i: C_i := C_i + 1
– When P_i sends a message m, it piggybacks t = C_i
– When P_j receives (m, t): C_j := max(C_j, t) + 1
e → e' ⇒ C(e) < C(e')
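
The three clock rules translate almost directly into code. The sketch below is illustrative only; the class name `LamportClock` and its method names are hypothetical, and message transport is left out (the timestamp returned by `send` is simply handed to `receive`).

```python
class LamportClock:
    """Logical clock for one process, following the three rules above."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: increment before each local event.
        self.time += 1
        return self.time

    def send(self):
        # Rule 2: sending is an event; the returned value is the
        # timestamp t to piggyback on the message.
        return self.tick()

    def receive(self, t):
        # Rule 3: jump past both the local clock and the sender's timestamp.
        self.time = max(self.time, t) + 1
        return self.time
```

For example, if process p ticks once and then sends, its message carries t = 2, and a receiver whose clock is still 0 moves to 3, so the receive event is timestamped later than the send.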

Lamport Clock: An Example

Vector Timestamps
Lamport timestamps do not guarantee that if C(a) < C(b) then a indeed happened before b
We need vector timestamps for that
– Each process P_i has an array V_i[1..n], where V_i[j] denotes the number of events that process P_i knows have taken place at process P_j
– When P_i sends a message m, it adds 1 to V_i[i] and sends V_i along with m as the vector timestamp vt(m)
– When P_j receives m from P_i with vt(m), it updates each V_j[k] to max(V_j[k], vt(m)[k]) and increments V_j[j] by 1

Vector Timestamps
V_i(a) < V_j(b) if and only if
– V_i(a)[k] ≤ V_j(b)[k] for every k, and
– V_i(a)[k] < V_j(b)[k] for at least one k
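
A short sketch of the vector timestamp rules and the comparison may help. It uses 0-based process indices and the usual componentwise comparison (every component less than or equal, and the vectors not identical); the names `VectorClock`, `happened_before`, and `concurrent` are hypothetical.

```python
class VectorClock:
    """Vector timestamp for process `i` in a system of `n` processes."""

    def __init__(self, n, i):
        self.v = [0] * n
        self.i = i

    def send(self):
        # Add 1 to our own entry and return a copy as vt(m).
        self.v[self.i] += 1
        return list(self.v)

    def receive(self, vt):
        # Take the componentwise maximum, then count the receive event.
        self.v = [max(a, b) for a, b in zip(self.v, vt)]
        self.v[self.i] += 1
        return list(self.v)


def happened_before(va, vb):
    # va < vb: no component is larger, and the vectors differ somewhere.
    return all(x <= y for x, y in zip(va, vb)) and va != vb


def concurrent(va, vb):
    # Distinct events that are unordered in both directions.
    return va != vb and not happened_before(va, vb) and not happened_before(vb, va)
```

Unlike Lamport timestamps, the comparison here works in both directions: two timestamps that are incomparable identify concurrent events.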

Ordered Reliable Multicast
Reliable multicast – the message is targeted to multiple receivers, and all receivers receive the message reliably
Ordered reliable multicast – if many messages are multicast by many senders, in what order are the messages delivered at the receivers?
– First in first out (FIFO)
– Causal – the causal relationship among messages is preserved
– Total – all messages are delivered at all receivers in the same order

FIFO Ordered Multicast
FIFO or sender-ordered multicast: Messages are delivered in the order they were sent (by any single sender)
[Figure: processes p, q, r, s exchanging multicasts a, b, c, d, e; delivery of c to p is delayed until after b is delivered]
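
One common way to get FIFO delivery is to number each sender's messages and hold back anything that arrives early. The slides do not prescribe this mechanism, so treat the sketch below as an assumption; `FifoReceiver` and its fields are hypothetical names, and sequence numbers are assumed to start at 0 for each sender.

```python
class FifoReceiver:
    """Delivers each sender's messages in the order that sender sent them."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number expected
        self.held = {}       # (sender, seq) -> payload waiting for a gap to fill
        self.delivered = []  # payloads in delivery order

    def receive(self, sender, seq, payload):
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of something already delivered
        self.held[(sender, seq)] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while (sender, expected) in self.held:
            self.delivered.append(self.held.pop((sender, expected)))
            expected += 1
        self.next_seq[sender] = expected
```

Messages from different senders are not ordered relative to each other here; that is the job of causal or total ordering.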

Causally Ordered Multicast
Causal or happens-before ordering: If send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure: processes p, q, r, s exchanging multicasts a, b, c, d, e]
– Delivery of c to p is delayed until after b is delivered
– e is sent (causally) after b
– Delivery of e to r is delayed until after b and c are delivered

Implementing Causal Ordering
Start with a FIFO multicast
We can strengthen a FIFO multicast into a causal multicast by adding vector timestamps
Assumptions
– The only types of events are message sending and receiving. However, the clock is incremented only on sending
– All messages sent are multicast to all other processes, and
– The multicast is reliable and FIFO

Implementing Causal Ordering Using Vector Timestamps
When a process p_j receives a message m from p_i, it delivers the message only if the following two conditions are met:
– It has delivered all earlier messages sent by p_i
– It has delivered every message that process p_i had delivered at the time it multicast the message
Delivery condition at P_j for m from P_i:
– vt(m)[i] = V_j[i] + 1
– vt(m)[k] ≤ V_j[k] for k ≠ i
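
The delivery condition can be checked mechanically with a hold-back queue. The sketch below assumes 0-based process indices and that V_j counts messages delivered from each process (clocks are incremented only on sending); `CausalReceiver` and its members are hypothetical names.

```python
class CausalReceiver:
    """Holds back messages until the causal delivery condition is met."""

    def __init__(self, n):
        self.v = [0] * n     # v[k]: messages from process k delivered so far
        self.pending = []    # (sender, vt, payload) not yet deliverable
        self.delivered = []  # payloads in delivery order

    def _deliverable(self, sender, vt):
        # vt[sender] must be the next message from that sender, and we must
        # already have delivered everything the sender had delivered.
        return vt[sender] == self.v[sender] + 1 and all(
            vt[k] <= self.v[k] for k in range(len(vt)) if k != sender
        )

    def receive(self, sender, vt, payload):
        self.pending.append((sender, vt, payload))
        progress = True
        while progress:  # one delivery may unblock other held messages
            progress = False
            for msg in list(self.pending):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.pending.remove(msg)
                    self.v[s] = t[s]
                    self.delivered.append(p)
                    progress = True
```

For instance, if process 1 multicasts b after delivering a from process 0, then b carries [1, 1, 0]; a receiver that sees b first holds it until a, stamped [1, 0, 0], has been delivered.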

Totally Ordered Multicast
Total ordering: Messages are delivered in the same order to all recipients (including the sender)
[Figure: processes p, q, r, s exchanging multicasts a, b, c, d, e; all deliver a, b, c, d, then e]

Implementing Total Ordering
Use a token that moves around
– Token has a sequence number
– When you hold the token you can send the next burst of multicasts
Use a sequencer to order all multicasts
– Message is first multicast to all, including the sequencer; then the sequencer determines the order for the message and informs all
– Or send to the sequencer, and the sequencer multicasts with total order information
– Each sender can take a turn serving as the sequencer
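
As an illustration of the second sequencer variant (send to the sequencer, which multicasts with order information), here is a minimal sketch. It ignores sequencer failure and rotation, and the names `Sequencer` and `TotalOrderReceiver` are hypothetical.

```python
class Sequencer:
    """Assigns a global sequence number to every message it is given."""

    def __init__(self):
        self.next_seq = 0

    def order(self, payload):
        stamped = (self.next_seq, payload)
        self.next_seq += 1
        return stamped


class TotalOrderReceiver:
    """Delivers messages strictly in global sequence-number order."""

    def __init__(self):
        self.expected = 0
        self.held = {}       # seq -> payload that arrived ahead of its turn
        self.delivered = []

    def receive(self, seq, payload):
        self.held[seq] = payload
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
```

Because every receiver follows the same sequence numbers, they all deliver in the same order even when the network reorders the stamped messages differently for each of them.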

Group membership service
Input:
– Process “join” events
– Process “leave” events
– Apparent failures
Output:
– Membership views for group(s) to which those processes belong

Issues?
The service itself needs to be fault-tolerant
– Otherwise our entire system could be crippled by a single failure!
– Hence the Group Membership Service (GMS) must run some form of protocol (GMP)

Approach
We’ll assume that the GMS has members {p,q,r} at time t
Designate the “oldest” of these as the protocol “leader”
– To initiate a change in GMS membership, the leader will run the GMP
– Others can’t run the GMP; they report events to the leader

GMP Example
Example:
– Initially, the GMS consists of {p,q,r}
– Then q is believed to have crashed
[Figure: timelines for p, q, r]

Unreliable Failure Detection
Recall that failures are hard to distinguish from network delay
– So we accept the risk of a mistake
– If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q
  Avoids “messages from the dead”
– q must rejoin to participate in the GMS again

Basic GMP
Someone reports that “q has failed”
Leader (process p) runs a 2-phase commit protocol
– Announces a “proposed new GMS view”
  Excludes q, or might add some members who are joining, or could do both at once
– Waits until a majority of members of the current view have voted “ok”
– Then commits the change

GMP Example
Proposes new view: {p,r} [-q]
Needs majority consent: p itself, plus one more (“current” view had 3 members)
Can add members at the same time
[Figure: timelines for p, q, r: V_0 = {p,q,r}; Proposed V_1 = {p,r}; OK; Commit V_1; V_1 = {p,r}]
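
The majority rule in this example can be written down as a small function. This is only a sketch of the vote counting, not of the message exchange; `propose_view` and its parameters are hypothetical names, and the leader is assumed to be a member of the current view and to count as one consenting vote.

```python
def propose_view(current_view, leader, drop=(), add=(), acks=()):
    """Return the committed new view, or None if no majority consented.

    `acks` lists the members that answered "ok" to the proposal.
    """
    proposed = [m for m in current_view if m not in drop]
    proposed += [m for m in add if m not in current_view]
    # Only votes from members of the *current* view count; the leader
    # consents to its own proposal.
    voters = {leader} | (set(acks) & set(current_view))
    if 2 * len(voters) > len(current_view):
        return proposed
    return None
```

With a current view of three members, p plus one "ok" from r gives 2 of 3, so the view {p,r} can be committed without hearing from q.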

Special Concerns?
What if someone doesn’t respond?
– P can tolerate failures of a minority of members of the current view
  New first-round “overlaps” its commit:
– “Commit that q has left. Propose add s and drop r”
– P must wait if it can’t contact a majority
  Avoids risk of partitioning

What If Leader Fails?
Here we do a 3-phase protocol
– New leader identifies itself based on age ranking (oldest surviving process)
– It runs an inquiry phase
  “The adored leader has died. Did he say anything to you before passing away?”
  Note that this causes participants to cut connections to the adored previous leader
– Then it runs the normal 2-phase protocol but “terminates” any interrupted view changes the leader had initiated
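
The first two steps of the takeover (age-based leader selection and the inquiry) might look like the sketch below. How the replies are encoded is an assumption: `start_takeover` is a hypothetical helper, the view is assumed to be listed oldest member first, and each reply is either None or the proposed view the old leader had announced to that member.

```python
def start_takeover(view_by_age, failed_leader, inquiry_replies):
    """Pick the new leader and collect view changes left unfinished.

    view_by_age: current view, oldest member first (assumed ranking).
    inquiry_replies: member -> pending proposal (tuple of names) or None.
    """
    survivors = [m for m in view_by_age if m != failed_leader]
    if not survivors:
        raise ValueError("no surviving member can lead")
    new_leader = survivors[0]  # oldest surviving process
    pending = []
    for reply in inquiry_replies.values():
        if reply is not None and reply not in pending:
            pending.append(reply)
    return new_leader, pending
```

The new leader would then run the normal 2-phase protocol, first finishing whatever proposals appear in `pending`; when the list is empty, nothing was interrupted.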

GMP Example
New leader first sends an inquiry
Then proposes new view: {r,s} [-p]
Needs majority consent: q itself, plus one more (“current” view had 3 members)
Again, can add members at the same time
[Figure: timelines for p, q, r: V_0 = {p,q,r}; Inquire [-p]; OK: nothing was pending; Proposed V_1 = {r,s}; OK; Commit V_1; V_1 = {r,s}]
