Presentation is loading. Please wait.

Presentation is loading. Please wait.

Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 1 Principles of Reliable Distributed Systems Lecture 3: Fault-Tolerant.

Similar presentations


Presentation on theme: "Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 1 Principles of Reliable Distributed Systems Lecture 3: Fault-Tolerant."— Presentation transcript:

1 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 1 Principles of Reliable Distributed Systems Lecture 3: Fault-Tolerant Broadcasts Spring 2008 Prof. Idit Keidar

2 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 2 Today’s Material Distributed Systems 2nd edition, Sape Mullender (Editor) –Causal order, Ch. 4 –Fault-tolerant broadcasts, Ch. 5 –Vector clocks, Ch. 4 –State machine replication, from Ch. 7 Attiya-Welch –Vector clocks, Ch. 6

3 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 3 Broadcast Sending a message to all the nodes in the system –E.g., our course mailing list What’s it good for? Allows for redundancy –In storage, processing Broadcast Service - building block for replication

4 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 4 Motivation: Replication 2 Paradigms: Primary-Backup - Passive State Machine Replication (SMR) - Active

5 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 5 Primary-Backup Replication Primary Backup(s) Client broadcast

6 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 6 Primary-Backup (Passive) Replication “Hot” standby Client talks to primary server Primary updates backup(s) Client detects server failure using timeout –Performs “fail-over” to backup server –May need to repeat last operation(s) Can be a problem with “false suspicions” Works with benign servers only

7 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 7 Primary-Backup Variants Backups can detect primary failure Client can be oblivious to failure –Using dispatcher –Using IP takeover Picture from IBM Web site: Highly available Web server cluster on HTTP Server Primary/backup with a network dispatcher model

8 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 8 State Machine Replication (SMR): The Idea aaa bb c Replicas are identical deterministic state machines Process operations in the same order  remain consistent

9 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 9 SMR Architecture Client broadcast

10 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 10 SMR Architecture: Option II Client A Client B broadcast

11 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 11 State Machine Replication Send updates to all servers All servers are identical deterministic state machines –Servers begin in the same initial state –Perform operations in the same order to remain consistent May be slower than primary backup, but provides quicker, smoother fail-over Can overcome false suspicions and tolerate malicious servers

12 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 12 Replica Coordination Requirements Agreement: all correct replicas receive all client requests Order: replicas process requests in the same order We want a Broadcast Service satisfying these We’ll start with Agreement

13 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 13 Before We Begin … Define the model where we want to solve the problem Define the service interface Specify the service –Using properties

14 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 14 Model: Correct vs. Faulty Processes Look at a complete run (execution) –External observer’s view A process that does not fail in a run is correct in that run Otherwise, the process is faulty in the run –A process that fails any time in the run is faulty throughout the entire run

15 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 15 Threshold Failure Model t out of n processes may fail t is usually given as a function of n, e.g., –t < n –2t < n –3t < n

16 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 16 Model: Synchronous vs. Asynchronous Synchrony: –Bounded latency, clock drift, processing time –Process crash failures can be accurately detected Asynchrony: non-assumption –Process crash failures cannot be accurately detected Failstop –Time-free, but crash failures accurately detected Unreliable failure detectors – later in the course

17 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 17 Service Interface: Separating Reception from Delivery Application (e.g., state machine) deliver: update state Network receive Delivery Layer: wait for messages that should be acted on first

18 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 18 Broadcast Service for Replication Primitives: broadcast(m), deliver(m). –For simplicity, assume m is unique. Network Broadcast Algorithm Application deliver broadcast receivesend Broadcast Algorithm Application deliver broadcast receivesend

19 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 19 Reliable Broadcast Specification Validity: if a correct process broadcasts m then all correct processes eventually deliver m Agreement: if a correct process delivers m then all correct processes eventually deliver m –Uniform agreement: if any process delivers m then all correct processes eventually deliver m Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast

20 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 20 Model for Implementation Asynchronous Process crash failures –Note: cannot be detected Pair-wise point-to-point reliable links between correct processes –If a correct process p sends a message to a correct process q, then q receives the message –Safety or liveness?

21 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 21 Reliable Broadcast Implementation Simple implementation … Does it solve Uniform Reliable Broadcast? What if there are some link failures? –Under what condition does the protocol still work?

22 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 22 Replica Coordination Requirements Agreement: all correct replicas receive all client requests Order: replicas process requests in the same order

23 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 23 Possible Reception Orders? Process 1 send “1” receive message send “2” receive message Process 2 send “a” send “b” receive message Space-time diagram P1 P2 1 a 2 b

24 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 24 Possible Reception Orders? Process 1 send “1” receive message send “2” receive message Process 2 send “a” send “b” receive message Space-time diagram P1 P2 1 a 2 b

25 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 25 FIFO Order If some process broadcasts message m before message m’, then every correct process that delivers m’ delivers m beforehand. Trivial to implement (How?)

26 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 26 Possible Delivery Orders? Process 1 bcast “1” dlvr msg (“a”) dlvr msg bcast “2” dlvr msg Process 2 bcast “a” dlvr msg (“2”) What delivery orders make sense? (“a”)

27 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 27 Cause and Effect When observing a distributed system, there no common clock to order events of different processes We instead use the (weaker) notion of cause and effect If one event e caused another event f to happen, then e and f could never have happened simultaneously –e happened before f When we do not know a given program’s logic but can only observe its communication, –We cannot tell whether one event causes another –We can tell whether it could have caused another

28 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 28 Happened Before or Causal Order [Lamport 78] Event e happens before (causally precedes) event f, denoted e → f if: –The same process executes e before f or –e is send(m) and f is receive(m) or –Exists h so that e → h and h → f We define concurrent, e || f, as: ¬(e → f  f → e) In a broadcast service, application-level causality is between broadcast and deliver

29 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 29 Causal Order For two messages, m and m’, m →m’ if send(m) causally precedes send(m’) –Causal order: transitive closure of FIFO + some process delivers m before broadcasting m’ Causal Delivery: If m →m’ then every correct process that delivers m’ delivers m beforehand

30 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 30 Does FIFO Between all Processes Guarantee Causal Order?

31 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 31 Total Order If two correct processes deliver both m and m’, they deliver them in the same order

32 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 32 FIFO Broadcast Reliable Broadcast + FIFO Simple implementation on top of reliable broadcast (using sequence numbers) Network Reliable Broadcast Application receivesend FIFO Broadcast deliverbroadcast deliverbroadcast Reliable Broadcast Application receivesend FIFO Broadcast deliverbroadcast deliver broadcast Reliable Broadcast FIFO Deliver

33 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 33 Causal Broadcast Reliable Broadcast + Causal Implementation on top of Reliable Broadcast

34 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 34 If we Could Use Clocks …. Assume: processes have access to a global real- time clock RC, message delays are bounded by D –Every message m contains a timestamp TS(m) = RC –DR1: At time t, deliver all received messages with timestamps up to t –D in increasing timestamp order –If two messages have the same timestamp, break ties by process id (deliver the one with the lower id first) Clock Condition: if e → f then RC(e) < RC(f) Hence, if m →m’ then TS(m) < TS(m’)

35 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 35 Lamport’s Timestamps – LTS Logical Clocks Invent a clock that satisfies the clock condition Each process maintains a logical clock – Local positive-integer variable Each message is tagged with the source’s logical clock at the time the message is sent

36 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 36 LTS Broadcast Algorithm Part I: Logical Clock Assignment code for process p i TS[j] ← 0,  j=0,…,n pending ← empty broadcast (m) TS[i] ← TS[i] + 1 /* p i ’s logical clock respects FIFO */ send (m,  TS[i], i  ) to all upon receive (m,  t, j  ) TS[j] ← t add (m,  t, j  ) to pending TS[i] ← max (TS[i], t) + 1 /* p i ’s logical clock respects causality */

37 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 37 Example: Logical Clocks LTS=  1,p1  LTS=  1,p2  LTS=  3,p3 

38 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 38 Does This Solve The Problem? When do we deliver a message? Deliver a message with TS = t when no message with TS < t can be received A message m received by p i is stable at p i if no future messages with timestamps smaller than TS(m) can be received by p i DR2: Deliver all received messages that are stable at p i in increasing timestamp order

39 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 39 Detecting Stability Assume FIFO communication If p i receives m from p j with TS(m), p i cannot later receive a message m’ from p j with TS(m’) < TS(m) Stability of m at p i is guaranteed when –p i has received a message with timestamp greater than TS(m) from all processes Note: the timestamp is a pair  LC, pid 

40 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 40 LTS Broadcast Algorithm Part II: Delivery Rule code for process p i let (m,  t, j  ) be the entry in pending with the smallest  t, j  if  t, j    TS[k],k   k=0,…n then /* DR2 */ deliver (m) remove (m,  t, j  ) from pending

41 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 41 The LTS Algorithm Implements Causal Broadcast Causal Delivery: If m →m’ then every correct process that delivers m’ delivers m beforehand Timestamps respect the Clock Condition: –If m →m’ then TS(m) < TS(m’) DR2 + FIFO links ensure that if m is in pending, all messages with lower TS were received –Were either delivered or are pending Delivery from pending is by TS order

42 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 42 Fault Tolerance?

43 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 43 Vector Clocks Process p i has a vector clock VC[1…n] –VC[i] is the local message sequence number of the last message broadcast by p i –For j≠i, VC[j] is the latest message p i delivered from p j VC is sent with each message m –For j≠i, m.VC[j] is p j ’s latest message that causally precedes m

44 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 44 Vector Clocks: Sending At process p i, on broadcast(m) –VC[i]  VC[i]+1 –Use Reliable Broadcast to send (m,VC) to all No need to send to myself –Deliver m locally

45 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 45 Vector Clocks: Delivery Rule Upon receive m –Place in message buffer Deliver m from p j from buffer if –VC[j] = m.VC[j] -1 and –Forall k≠j : VC[k] ≥ m.VC[k] Upon deliver –VC[j] := VC[j] + 1 VC[j] is the number of messages of p j that causally precede p i ’s subsequent messages FIFO Messages causally preceding m were delivered

46 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 46 Vector Clocks Example

47 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 47 Atomic Broadcast Services Atomic Broadcast: – Reliable Broadcast + Total Order FIFO Atomic Broadcast –FIFO + Reliable Broadcast + Total Order Causal Atomic Broadcast –Causal + Reliable Broadcast + Total Order HW question: are FIFO Atomic Broadcast and Causal Atomic Broadcasts equivalent?

48 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 48 Causal Atomic Broadcast in Failstop Model Order messages by logical timestamp (LTS) –Break ties by process id Use FIFO Broadcast When is a message with LTS t delivered? –Reminder: failstop = failures are accurately detected –Assume further that no message from a faulty process arrives after its failure is detected

49 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 49 Atomic Broadcast in Asynchronous Systems??? Alas, impossible if even one process can crash

50 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 50 Now, Back to State Machines We can build state machines using Atomic Broadcast A client can –Broadcast to all servers; or –Forward its request to one of the servers to broadcast to the others Resend on timeout if the server fails What about client failures?

51 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 51 Multicast Problems Processes organized in groups –Groups have names –Messages sent to groups –Like broadcast, but for group members –Processes can join and leave groups Processes may care about who else is a member of the group (group membership)


Download ppt "Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008 1 Principles of Reliable Distributed Systems Lecture 3: Fault-Tolerant."

Similar presentations


Ads by Google