Consensus Krzysztof Ostrowski


1 Consensus Krzysztof Ostrowski krzys@cs.cornell.edu

2 Part #1 A motivating example

3 "Coordinated Attack"

4 What are they trying to achieve? Consensus on whether to attack: –Both make the same decision. –Common knowledge: "A" knows that "B" knows that "A" knows... etc. THIS IS IMPOSSIBLE! See, e.g., Joe Halpern’s work. It’s because the system is asynchronous. –Messages may take arbitrarily long to get delivered. –Impossible to tell if a process has failed (or is just slow).

5 Why are we here? We’re here because... –Real systems are not asynchronous? We wouldn’t wait 1000 years for a message. Equipment gets repaired or replaced. –We don’t need any absolute guarantees? An asteroid may hit anybody at any time... so what guarantees are absolute anyway?! –For the stubborn... is the problem ill-posed?

6 What are they trying to achieve? Generals agree on whether to attack: –Both make (at least one) decision. –Both make only one (irreversible) decision. –Both make the same decision. –If both initially intended to do the same, then that’s what they decide. –Both make their decision at the same time.

7 What are they trying to achieve? Liveness: a decision is made. Correctness: exactly one decision. Non-triviality: the decision isn’t arbitrary. Simultaneity: decided at the same time. A new hope: We could drop some assumptions!

8 Now we have a simple solution! –Send a message with your proposal. –Upon receiving the other proposal, make a decision based on a certain agreed-upon function F(v_A, v_B) of all inputs (e.g. give "A" priority over "B"). –Until the proposal is received, keep asking for it. –Ultimately the other party will get your proposal as well and will do the same.

9 Our solution again... [Figure: A and B exchange proposals; A decides F(v_A, v_B) and B decides F(v_A, v_B).]

10 But does it really work? What if we have failures? –If the other process is dead, we’ll never make progress. –And if we try making progress anyway, it may turn out the other process wasn’t dead at all... Unsafe: the decision could have been wrong!
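The simple exchange and its failure mode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`F`, `run_exchange`), the lossy-channel model, and the `loss_rate`, `seed`, and `max_rounds` parameters are hypothetical and do not come from the slides.

```python
import random

def F(v_a, v_b):
    # Hypothetical agreed-upon function: "A" has priority over "B".
    return v_a

def run_exchange(v_a, v_b, loss_rate=0.5, seed=0, max_rounds=10_000):
    """Both parties keep resending their proposal until the peer has it.

    Returns (decision of A, decision of B, rounds used). A decision is None
    if the peer's proposal never arrived within max_rounds.
    """
    rng = random.Random(seed)
    got_at_a = None  # B's proposal as seen by A
    got_at_b = None  # A's proposal as seen by B
    rounds = 0
    while (got_at_a is None or got_at_b is None) and rounds < max_rounds:
        rounds += 1
        # Each retransmission is lost with probability loss_rate.
        if got_at_b is None and rng.random() >= loss_rate:
            got_at_b = v_a
        if got_at_a is None and rng.random() >= loss_rate:
            got_at_a = v_b
    decision_a = F(v_a, got_at_a) if got_at_a is not None else None
    decision_b = F(got_at_b, v_b) if got_at_b is not None else None
    return decision_a, decision_b, rounds
```

With `loss_rate=1.0` the channel behaves like a dead peer: neither side ever decides, which is the liveness problem described above. Deciding anyway with a made-up value would risk the safety problem instead.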

11 Conclusions Liveness seems very hard to achieve –Can’t just "make up" for a missing value... –Can we achieve it via a smarter scheme? –But isn’t it the very thing we may give up? We could rely on probabilistic guarantees! Can’t do the same for correctness/nontriviality.

12 Part #2 Definitions

13 What is „distributed consensus”? Reaching an agreement among a set of distributed processes. –What is "agreement" ? –What does it mean to "reach" it ? –What "set of processes" ?

14 What is „distributed consensus”? Agreement: –All processes think „X”. X =... Let’s do something: commit, rollback transaction. The value of parameter „A” is now „50”. Process „200” is now the new group coordinator. –Handle failures: We don’t want one dead guy to hang the system A majority of processes needs to „think” so? –(think of overlapping)

15 What is „distributed consensus”? Agreement: –Seems to imply that no individual opinion should be „critical” to the final outcome. Something is „critical” ⇒ progress is in danger. –Seems to imply that we need to rely on a form of majority voting... –Does it imply any „common knowledge”? Can processes „change their minds”? Can processes give up on agreement?

16 What is „distributed consensus”? Reaching agreement: –So do all need to „think” at the same time ? What does it mean „at the same time” ? Consider consistent cuts and atomicity: –All would have to think „X” before some other „Y” –Leads us to the virtual synchrony model Maybe: all processes will eventually think „X” ?

17 What is „distributed consensus”? What is the set of processes involved: –A static, fixed set: All processes know each other’s names. –A form of common knowledge. –A fixed point of reference. –Built into the system or updated „consistently” (everywhere „at the same time” = „atomically”).

18 What is „distributed consensus”? Set of processes involved: –A dynamic set (out of some superset): For example: all alive nodes, a set of nearby nodes etc. Agreeing processes need to first agree on who’s there to agree with. With whom to agree on that, though? Consensus within a consensus: Group Membership. –Fixing membership upfront is a way to solve this recursive dependency.

19 What is „distributed consensus”? A static set of processes: –Processes may fail. Do we include failed processes in the set? Yes: Everybody becomes a single point of failure. No: What about the actions of faulty processes? –Could affect environment: e.g. a teller machine. –May require everybody to do the same! No: Need a consistent way of reporting failures. –Failure Detectors, Oracles.

20 What is „distributed consensus”? Agreeing on failures. –Network partitioning. [Figure: a network partition separating A and B.]

21 What is „distributed consensus”? A method to collectively: –Make a decision according to majority will –Ensure that actions can be based on it, that conflicting decisions cannot be taken Almost by „definition” it’s a 2PC –Need to declare and learn intentions –Need to „secure” the decision made!

22 Do we really need „consensus”? Approach a bit „religious” –Why bother about liveness: Probabilistic guarantees are perfectly enough Could be quite good even without much effort –Why simultaneously: A „promise” to agree would often be enough... Ordering may be all that we care about –Common knowledge may not be important –Why solve all problems at once: Rely on oracles, failure detectors –What does „consensus” really „need” to be ”solved”?

23 Part #3 The Impossibility Result

24 What is „impossibility”? As defined, the problem is unsolvable... but has it been defined in the „right” way? –Are all the assumptions reasonable? Does the model sound right? Are the „required” properties really required? –Aren’t they too strong? –Are they intuitive, do they have an interpretation? –Is the conclusion something I care about? What is „impossibility”, after all? Does this apply to any realistic scenarios?

25 What did they really prove? Every protocol must necessarily have a „window of vulnerability” –A failure during this period may be fatal: –...may cause the protocol to get stuck –...may keep the protocol running forever Conclusion: –Accept non-liveness as a given –Change the approach: terminate any old protocols if no progress is observed, then initiate new, clean rounds

26 System model Assumptions (weak): –Processes are modeled as automata: Can have infinitely many states. Can have unbounded internal storage. –Processes operate in steps: Receive, work, atomically send multiple msgs. –Communication via messages: Asynchronous, nondeterministic... but „fair” (messages eventually delivered). [Slide callouts: necessary; inevitable; this is the weakening assumption.]

27 System model Participants: –N processes, N ≥ 2 –Cooperative (a non-Byzantine setting) –State: Distinguished input/output registers. Unbounded internal storage, program counter. –Behavior determined by input + transition function. –One-bit input x_p (fixed at the beginning) –Write-once output y_p with values in {b, 0, 1}: b = undecided, 0 and 1 = decision states; writing = making a decision.

28 System model Communication model: –A single message is a pair (p, m), m ∈ M, where p is the name of the destination process and M is some fixed universe of all known messages. –Message buffer: a multi-set. Contains all messages sent & not yet delivered. –Operations supported: Send(p, m) – place message (p, m) in the buffer. Receive(p) – delete some message (p, m) from the buffer and return m (the message gets delivered), or... –...just return ∅ (buffer stays unchanged).

29 System model Communication guarantees: –Communication is reliable: msgs are not corrupted. –Communication is nondeterministic: Don’t know when a message gets delivered. Can be delayed for a finite number of rounds: –Other messages may be delivered first. –Nothing may be delivered. Messages can be reordered. –Communication is „fair”: If receive is performed infinitely many times... every message eventually gets delivered.
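The message buffer described above can be sketched as a multiset with a nondeterministic receive. This is a minimal illustration, assuming a coin flip stands in for the environment's choice; the class name `MessageBuffer`, the `pending` helper, and the `seed` parameter are hypothetical, and `None` stands in for the empty marker ∅.

```python
import random
from collections import Counter

NULL = None  # stands in for the empty marker (no message delivered)

class MessageBuffer:
    """Multiset of (destination, message) pairs sent but not yet delivered."""

    def __init__(self, seed=0):
        self._msgs = Counter()
        self._rng = random.Random(seed)  # models the environment's choices

    def send(self, p, m):
        # Place message m, addressed to process p, in the buffer.
        self._msgs[(p, m)] += 1

    def receive(self, p):
        # Collect every copy of every message currently addressed to p.
        waiting = [m for (q, m), n in self._msgs.items() if q == p
                   for _ in range(n)]
        # The environment may decline to deliver anything at this step.
        if not waiting or self._rng.random() < 0.5:
            return NULL
        # Any pending message may be chosen, so reordering is possible.
        m = self._rng.choice(waiting)
        self._msgs[(p, m)] -= 1
        if self._msgs[(p, m)] == 0:
            del self._msgs[(p, m)]
        return m

    def pending(self):
        # Total number of undelivered messages.
        return sum(self._msgs.values())
```

A random choice is fair only with probability 1; the fairness assumption in the model is stronger, since it requires every message to be delivered if receive is performed infinitely many times.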

30 What really is deterministic here? Processes ARE deterministic. The environment is NOT. –Environment can choose the event sequence. Like moving needles at different speeds! –Environment can feed a process either with events or with a „non-event” (call it ∅). Deterministic automata with ∅-transitions. The system as a whole is nondeterministic.

31 Part #4 The Proof

32 Our general strategy A typical proof by contradiction. Show that we can’t have all properties: ¬(C ∧ N ∧ L) ⇔ ((C ∧ N) ⇒ ¬L). Assume correctness and nontriviality... show that liveness isn’t guaranteed! –We therefore want to show that: Any protocol can be made forever indecisive.

33 Our general strategy A little confusing... the proof is indirect. Use a "games with the Devil" approach: –Exploit the inherent uncertainty –Construct sneaky (yet possible) scenarios: Communication is maliciously delayed. The "red button" – we can "blow up" a process. –An irresistible analogy to the pumping lemma

34 Our general strategy [Figure labels: "ALERT!!! The danger of consensus!"; "The danger eliminated (can deliver)"; "the red button applied".]

35 A quick refresher on notation Configuration: –Internal states of each process –Contents of the message buffer Initial configuration: –Each process starts at an initial state –Message buffer is empty Initial state: –All values but those of input registers are fixed –In particular, output registers have value „b” Some configurations „have decision value” –A certain process is in a decision state [Figure: configurations C1 to C4, labeled undecided, 0, or 1.]

36 A quick refresher on notation Step –Configuration → Configuration –A primitive step by a single process „p”: Perform receive(p), obtain m ∈ M ∪ {∅}. Depending on p’s internal state and m: –„p” enters a new state –„p” sends a finite number of messages –Determinism: For a given configuration C, the step is uniquely determined by the message delivered. [Figure: a step from C1 to C2.]

37 A quick refresher on notation Event: –A pair e = (p, m) –Can be „applicable” to a configuration: (p, ∅) is always applicable; (p, m) is applicable if message m for p is in the buffer. –A function e(·) on configurations: uniquely determines a step in every C to which it is applicable: e(C) = C’

38 A quick refresher on notation Schedule: –A sequence of events σ. Can be finite or infinite. Can be applied to C, producing C’ = σ(C). –We say that such C’ is reachable from C. –A configuration reachable from an initial configuration is accessible. Run: –A sequence of configurations, determined by C and σ = (e_1, e_2, ...) as (C, e_1(C), e_2(e_1(C)), ...)
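The notation above (configuration, event, step, schedule, run) maps directly onto a small amount of code. The following is a sketch under assumptions: a configuration is modeled as a pair of a state dictionary and a `Counter` buffer, `None` stands in for ∅, and `toy_transition` is a made-up two-process protocol used only to have something to run.

```python
from collections import Counter

NULL = None  # stands in for the empty marker

def applicable(config, event):
    # (p, NULL) is always applicable; (p, m) needs m for p in the buffer.
    _, buffer = config
    p, m = event
    return m is NULL or buffer[(p, m)] > 0

def apply_event(config, event, transition):
    """One primitive step: deliver m to p, update p's state, send messages."""
    if not applicable(config, event):
        raise ValueError(f"event {event!r} is not applicable")
    states, buffer = config
    p, m = event
    new_state, sent = transition(p, states[p], m)
    new_states = dict(states)
    new_states[p] = new_state
    new_buffer = Counter(buffer)
    if m is not NULL:
        new_buffer[(p, m)] -= 1
    for dest, msg in sent:
        new_buffer[(dest, msg)] += 1
    return new_states, +new_buffer  # unary plus drops zero counts

def apply_schedule(config, schedule, transition):
    """Return the run (C, e1(C), e2(e1(C)), ...) for a finite schedule."""
    run = [config]
    for event in schedule:
        config = apply_event(config, event, transition)
        run.append(config)
    return run

def toy_transition(p, state, m):
    # Hypothetical protocol: greet the peer on the first step and count
    # how many real messages have been received so far.
    other = "q" if p == "p" else "p"
    started, received = state
    sent = [] if started else [(other, "hello")]
    return (True, received + (m is not NULL)), sent

initial = ({"p": (False, 0), "q": (False, 0)}, Counter())
schedule = [("p", NULL), ("q", "hello"), ("p", "hello")]
run = apply_schedule(initial, schedule, toy_transition)
```

Because processes are deterministic, the run is fully determined by the initial configuration and the schedule; all nondeterminism lives in the choice of events.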

39 Configurations and events [Figure: configurations C1, C2, C3 connected by events e1 to e4; the applicable events form the range of choice; some configurations carry decision values 0 or 1.]

40 A quick refresher on notation Consensus protocol is: –„Partially correct” if: [correctness] No accessible configuration has more than one decision value. [nontriviality] Accessible configurations with both „0” and „1” decision values exist.

41 A quick refresher on notation Process is nonfaulty: takes infinitely many steps –Eventually receives every message sent! Run is admissible if: –At most one process is faulty –All messages sent to nonfaulty ones are eventually received (the „fairness”) Run is deciding if: –Some process reaches a decision state

42 A quick refresher on notation Consensus protocol is: –„Totally correct in spite of one fault” if: Partially correct [liveness] Every admissible run is deciding (every path in the „configurations tree” has a finite prefix that ends with some process in a deciding state)

43 Our general strategy Partially correct = correct + nontrivial. Take a partially correct protocol. Construct an infinite path that never enters configurations with decision values –Via choosing the right sequence of events This will mean that the given protocol is not „totally correct in spite of one fault” –Such a path represents an admissible, nondeciding run Not totally correct = not live

44 Bivalent configurations Configuration in which, in a given protocol, the outcome is not determined –The protocol might lead to accepting „A”... –...but it might as well lead to accepting „B” Our proof by induction: –A) Show that initial configuration is bivalent –B) Show that we can force the protocol to produce bivalent configurations indefinitely
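Valence can be made concrete by computing the set of decision values reachable from a configuration. The sketch below assumes a small, finite, explicitly given graph of configurations; the names `successors`, `decided`, and the configurations `C1` to `C6` are hypothetical examples, not taken from the slides.

```python
def reachable_decisions(config, successors, decided):
    """Set of decision values reachable from config (including config itself)."""
    seen, stack, values = set(), [config], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        if c in decided:
            values.add(decided[c])
        stack.extend(successors.get(c, ()))
    return values

def valence(config, successors, decided):
    values = reachable_decisions(config, successors, decided)
    if values == {0, 1}:
        return "bivalent"      # the outcome is not yet determined
    if len(values) == 1:
        return f"{next(iter(values))}-valent"  # univalent
    return "no decision reachable"

# Hypothetical configuration graph: edges are single steps.
successors = {"C1": ["C2", "C3"], "C2": ["C4", "C5"], "C3": ["C6"]}
decided = {"C4": 0, "C5": 1, "C6": 1}
```

In this toy graph `C1` and `C2` are bivalent, while `C3` is 1-valent: once the step from `C1` to `C3` is taken, the outcome is fixed even though no process has decided yet.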

45 Bivalent configurations [Figure: a tree of configurations C1 to C8; each edge is a single step taken (state transition); nodes are marked as bivalent or univalent configurations, with decision values 0 and 1 at the decided ones.]

46 Analogy to the Pumping Lemma [Figure: two bivalent configurations C1 and C2. At C1 we could not deliver the message e1... but at C2 it’s okay to deliver e1: we are still bivalent.]

47 Proof decomposed 1. Showing the existence of some bivalent initial configuration. 2. Showing that we can get from one bivalent configuration into another... 3. ...in a way that every message gets delivered after a finite time.

48 Initial bivalent configuration Proof by contradiction:  Assume init. biv. config. doesn’t exist What would it mean, though? –Every set of inputs determines the outcome of the consensus algorithm –There exists a function that given the inputs, produces the decision –Our algorithm essentially „computes” this function –But one process may fail... –...so we might miss one of the input arguments! –Our algorithm sort of „tolerates” a loss of one bit –Note the analogy to error correcting codes!

49 Initial bivalent configuration Assume it doesn’t exist; then... there must exist both 0-valent and 1-valent initial configurations (by partial correctness). Recall: this corresponds to „nontriviality”.

50 Initial bivalent configuration Adjacent configurations: –Differ by the value of a single input register. Example with processes p1 to p8: C1 = 11111000, C2 = 11101000.

51 Initial bivalent configuration Every two initial configurations are connected by a chain of adjacent ones. Example: C1 = 11111000, C2 = 11101000, 11101100, 01101100.

52 Initial bivalent configuration There must exist an adjacent pair of initial configurations with different valences!!! What does it mean? –A single process „determines” the output! –In a sense, what this guy does is „critical”
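The chain argument can be traced with a short sketch. Initial configurations are modeled as bit strings of input values, one bit per process. The `toy_valence` function is purely hypothetical: it plays the role of the assumed situation in which every initial configuration is univalent, and says nothing about any real protocol.

```python
def adjacent_chain(start, end):
    """Chain of initial configurations from start to end, flipping one bit per link."""
    chain = [start]
    current = list(start)
    for i, (a, b) in enumerate(zip(start, end)):
        if a != b:
            current[i] = b
            chain.append("".join(current))
    return chain

def critical_pair(chain, valence):
    """First adjacent pair in the chain whose valences differ, or None."""
    for c0, c1 in zip(chain, chain[1:]):
        if valence(c0) != valence(c1):
            return c0, c1
    return None

def critical_process(c0, c1):
    """Index of the single process whose input differs between c0 and c1."""
    (index,) = [i for i, (a, b) in enumerate(zip(c0, c1)) if a != b]
    return index

def toy_valence(config):
    # Hypothetical assignment of valences: 1-valent iff more than four 1-inputs.
    return 1 if config.count("1") > 4 else 0

chain = adjacent_chain("11111000", "01101100")
pair = critical_pair(chain, toy_valence)
```

Since the chain starts at a 1-valent configuration and ends at a 0-valent one, the valence has to flip across some link, and that link differs in the input of exactly one process.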

53 Initial bivalent configuration Let C_0, C_1 be the univalent adjacent pair. Let the „critical” process be P. Take an admissible run from C_0 where P takes no steps (must exist... why?). Take the corresponding schedule σ. Apply σ to C_1 – must lead to almost identical configurations (differences only in P’s state). [Figure: the same events e_1, e_2, ..., e_n applied to C_0 and to C_1.] Must reach the same decisions!!!

54 The Intuition How could the initial value of a process that didn’t communicate at all affect the outcome of the protocol?

55 Commutativity of schedules Assumption: –σ_1, σ_2 (both applicable to C) involve disjoint sets of processes Conclusion: –σ_1 applicable to σ_2(C) –σ_2 applicable to σ_1(C) –σ_1(σ_2(C)) = σ_2(σ_1(C)) Argument: –σ_1, σ_2 don’t „interact”
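Commutativity can be checked on a toy example. This is a sketch under assumptions: the two-process system, the `PEER` table, and the transition rule inside `step` (log what was received, then send a token to the peer) are all hypothetical; `None` stands in for ∅.

```python
from collections import Counter

PEER = {"a": "b", "b": "a"}  # hypothetical two-process system

def step(config, event):
    """Apply event (p, m): p logs m and sends one token to its peer."""
    states, buffer = config
    p, m = event
    if m is not None and buffer[(p, m)] == 0:
        raise ValueError(f"event {event!r} is not applicable")
    states = dict(states)
    buffer = Counter(buffer)
    if m is not None:
        buffer[(p, m)] -= 1
    states[p] = states[p] + (m,)
    buffer[(PEER[p], "from-" + p)] += 1
    return states, +buffer  # unary plus drops zero counts

def apply_schedule(config, schedule):
    for event in schedule:
        config = step(config, event)
    return config

C = ({"a": (), "b": ()}, Counter({("a", "x"): 1, ("b", "y"): 1}))
sigma1 = [("a", "x"), ("a", None)]  # only process a takes steps
sigma2 = [("b", "y")]               # only process b takes steps

one_then_two = apply_schedule(apply_schedule(C, sigma1), sigma2)
two_then_one = apply_schedule(apply_schedule(C, sigma2), sigma1)
```

Both orders end in the same configuration. The messages that one schedule adds to the buffer are addressed to processes whose steps in the other schedule consume only messages that were already present in C, so neither schedule can disable or alter the other.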

56 Commutativity of schedules [Figure: processes X take steps in σ_1 and processes Y take steps in σ_2. C has states (X_1, Y_1); C_1 = σ_1(C) has states (X_2, Y_1); σ_2(C_1) has states (X_2, Y_2).]

57 The inductive step Intuition: –We want to apply some event „e”... but we need to avoid univalent configs. –How far can we get via delaying „e”... so that we can safely apply „e” later? –We want to show we can apply „e”, as every event must eventually be applied.

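58 The inductive step [Figure: from bivalent C, applying e right away leads to a univalent configuration (trouble). Where else can we get without applying e (adding a delay)? The yellow guys are the configurations reached this way, and e is applicable to each of them. We want to show that some of the pink guys, obtained by then applying e, are bivalent!]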

59 The inductive step Intuition: –If C was bivalent, there must be a way to delay e so as to get into another univalent state, different from e(C) –If things weren’t pre-determined in C, then there must exist an alternative scenario

60 The inductive step Since C is bivalent, there must be some 0-valent E_0 and some 1-valent E_1 reachable from C. Among the pink guys, define F_i accordingly (cases 1 to 3 in the figure show how F_i relates to E_i). The F_i guys are i-valent: –they are not bivalent –they have a path to E_i. So there exist both 0-valent and 1-valent pink guys.

61 The inductive step Intuition: –When at C, event e leads to some univalent configuration –When delayed to some C’, event e leads to another univalent configuration –Well... this change happens at some point in time, during a certain primitive step e’ !

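62 The inductive step There must exist neighbors C_0, C_1 such that D_i = e(C_i) are i-valent. Say it looks like this: C_1 = e’(C_0) with e’ = (p’, m’); D_0 = e(C_0) is 0-valent and D_1 = e(C_1) is 1-valent. [Figure: C leads to the neighbors C_0 and C_1 lying between the regions around E_0, F_0 and E_1, F_1.]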

63 The inductive step Intuition: –Can it be that some process different from p is making this critical step e’ ? –No, since then we could delay his step and apply e first... but once we apply e, we are in a univalent configuration and applying e’ would make no difference (commutativity).

64 The inductive step Case #1: p ≠ p’. With C_1 = e’(C_0), e’ = (p’, m’), D_0 = e(C_0) and D_1 = e(C_1): D_1 = e’(D_0) by commutativity... but this is wrong, 1-valent cannot follow 0-valent.

65 The inductive step Intuition: –So both e, e’ are delivered to p –Apparently p is now the „critical” guy –Let’s kill the critical guy then! –The protocol must make some progress –Actions of other processes must now cause a decision to be made –But then, what if we revive p ??? –Still, he was the critical guy, so delivering messages to him now should matter! –We will again refer to commutativity

66 The inductive step Case #2: p = p’. –Consider any finite deciding run from C_0 in which p takes no steps; let σ be the corresponding schedule and A = σ(C_0). –By commutativity, σ is applicable to D_i, thus giving i-valent E_i = σ(D_i). –Again by commutativity, we get E_0 = e(A) and E_1 = e(e’(A)). –But that means that A is bivalent... and A is deciding! –We reached a contradiction.

67 Final construction A queue of processes maintained Buffer organized as FIFO queues In each step (roughly what happens): –Take a process from the process queue –Give him his earliest undelivered message –Put him at the end of the process queue Guarantees admissibility: –Every process takes infinitely many steps –Every message is eventually delivered
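The fairness skeleton of this construction is a round-robin over processes with FIFO delivery. The sketch below shows only that skeleton, under assumptions: the name `fair_schedule` and the `inboxes` layout are hypothetical, `None` stands in for ∅, and the step that keeps the configuration bivalent before each delivery is deliberately left out.

```python
from collections import deque

def fair_schedule(processes, inboxes, steps):
    """Round-robin over processes, delivering each one's earliest message.

    Returns the list of events (p, m); m is None when p's inbox is empty.
    Note: inboxes are consumed in place.
    """
    queue = deque(processes)
    events = []
    for _ in range(steps):
        p = queue.popleft()                                # head of process queue
        m = inboxes[p].popleft() if inboxes[p] else None   # earliest message
        events.append((p, m))
        queue.append(p)                                    # back to the end
    return events

inboxes = {
    "p1": deque(["m1", "m2"]),
    "p2": deque(),
    "p3": deque(["m3"]),
}
events = fair_schedule(["p1", "p2", "p3"], inboxes, steps=6)
```

Every process appears once per round, and each queued message moves one position closer to delivery whenever its destination is scheduled, which is what makes the resulting run admissible.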

68 Final construction Okay, now what exactly happens: [Figure: process queue p1, p2, ..., p6; message queue of p1: m1, m2, ..., m5. From bivalent configuration C, apply a sequence σ guaranteeing that applying m1 later will put us again in a bivalent configuration C’.]

69 Part #5 The Consensus Protocol

70 Consensus protocol Assumptions: –A majority of processes aren’t faulty (before the protocol starts) –No process dies during the protocol

71 Consensus Protocol: Phase #1 N=9 L=5

72 Consensus Protocol: Phase #2a N=9 L=5

73 Consensus Protocol: Phase #2b N=9 L=5

74 Consensus Protocol: Phase #2c N=9 L=5

75 Part #6 Paxos

76 What is Paxos? A practical algorithm (one of many) –Arguably most prominent –An underlying mechanism in real systems –Dynamic membership: Processes may fail or restart at any time –Achieves simultaneous agreement –Does not even try to guarantee liveness: Simply start a new protocol if not sure

77 Part #7 Conclusions

78 What’s possible or impossible... first we need to ask the right question. Consensus manifests in many ways and has many flavors to choose from. We can only make probabilistic progress... and that’s fine, we accept it as given. As a consequence, actual protocols like Paxos used in practice keep aborting and restarting. Overhead is always high, consensus is costly. Consistency is not sacred, either...

