Introduction
- Coping with failures in computer systems
- A failed component may send conflicting information to different parts of the system.
- Agreement in the presence of faults
- P2P networks? Good nodes have to “agree to do the same thing”.
  - Faulty nodes generate corrupted and misleading messages.
  - Non-malicious causes: software bugs, hardware failures, power failures
  - Malicious causes: a compromised machine
- Generals = computer components
- The abstract problem…
  - Each division of the Byzantine army is directed by its own general.
  - There are n generals, some of whom are traitors.
  - All armies are camped outside the enemy castle, observing the enemy.
  - The generals communicate with each other only by messenger.
  - Requirements:
    - G1: All loyal generals decide upon the same plan of action.
    - G2: A small number of traitors cannot cause the loyal generals to adopt a bad plan.
  - Note: We do not have to identify the traitors.
Reduction of the General Problem
- Byzantine Generals Problem (BGP):
  - A commanding general (commander) must send an order to his n-1 lieutenants.
- Interactive Consistency Conditions:
  - IC1: All loyal lieutenants obey the same order.
  - IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
- Note: If the commander is loyal, IC2 implies IC1.
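3-General Impossibility Example
- 3 generals, 1 traitor among them.
- Two messages: Attack (A) or Retreat (R)
- In the figures, the traitor is shaded.
- L1 sees (A, R): the commander said “attack”, and L2 reports that the commander said “retreat”. Who is the traitor, C or L2?
- Fig 1 (C loyal and orders attack, L2 is the traitor): L1 has to attack to satisfy IC2.
- Fig 2 (C is the traitor, sends A to L1 and R to L2): L1 sees the same (A, R), so it attacks; by symmetry, L2 retreats. IC1 is violated.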
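A toy encoding of the two figures, to show concretely why L1 is stuck. The names `l1_view` and `decide` are hypothetical; this only illustrates the argument and is not the proof itself.

```python
ATTACK, RETREAT = "A", "R"


def l1_view(commander_to_l1, l2_report):
    # L1's evidence: what C told L1, and what L2 claims C told L2.
    return (commander_to_l1, l2_report)


# Fig 1: C is loyal and orders ATTACK; the traitor L2 falsely reports RETREAT.
fig1 = l1_view(commander_to_l1=ATTACK, l2_report=RETREAT)
# Fig 2: the traitor C tells L1 ATTACK and L2 RETREAT; loyal L2 reports RETREAT truthfully.
fig2 = l1_view(commander_to_l1=ATTACK, l2_report=RETREAT)
assert fig1 == fig2  # L1 cannot tell the two scenarios apart


def decide(view):
    # IC2 in Fig 1 forces L1 to obey the commander when it sees (A, R).
    # Because the views are identical, a lieutenant must obey the commander
    # whenever it sees (order, contradicting report).
    return view[0]


# In Fig 2, L1 sees (A, R); by symmetry, loyal L2 sees (R, A).
print(decide(fig2), decide((RETREAT, ATTACK)))  # A R
```

The two loyal lieutenants end up with different orders, which is exactly the IC1 violation in Fig 2.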
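General Impossibility
- In general, no solution with fewer than 3m+1 generals can cope with m traitors.
- Proof by contradiction:
  - Assume there is a solution for 3m generals with m traitors.
  - Reduce to the 3-general problem: each of the 3 generals simulates m of the 3m generals, so a single traitor simulates at most m traitors. This would solve the 3-general problem with 1 traitor, which is impossible.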
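The bookkeeping behind the reduction can be sketched as follows. The functions `simulate_groups` and `simulated_traitors` are hypothetical helpers; the sketch only checks that three simulators, each covering m of the 3m generals, turn one traitorous simulator into exactly m simulated traitors.

```python
def simulate_groups(m):
    """Split 3m generals (ids 0..3m-1) into three groups of m.

    Group 0 (containing the commander, id 0) is simulated by the commander of
    the 3-general problem; groups 1 and 2 by the two lieutenants.
    """
    generals = list(range(3 * m))
    return [generals[0:m], generals[m:2 * m], generals[2 * m:3 * m]]


def simulated_traitors(m, traitor_simulator):
    # Every general simulated by a traitorous simulator may behave as a
    # traitor, giving m traitors among the 3m simulated generals.
    return len(simulate_groups(m)[traitor_simulator])


for m in range(1, 5):
    groups = simulate_groups(m)
    assert all(len(g) == m for g in groups)
    assert sum(len(g) for g in groups) == 3 * m
    assert simulated_traitors(m, traitor_simulator=1) == m
```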
Solution I – Oral Messages
- If there are 3m+1 generals, the solution tolerates up to m traitors.
- Oral messages: the content of a message is entirely under the control of its sender.
- Assumptions on oral messages:
  - A1: Every message that is sent is delivered correctly.
  - A2: The receiver of a message knows who sent it.
  - A3: The absence of a message can be detected.
- Implications:
  - Traitors cannot interfere with communication as a third party (A1).
  - Traitors cannot send messages under another general's name (A2).
  - Traitors cannot interfere by staying silent (A3): a silent traitor is treated as sending the default order “retreat”.
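A minimal sketch of how a receiver might apply A1 to A3, assuming a hypothetical `inbox` keyed by the (authenticated) sender. In a real system, A3 would typically come from a timeout in a synchronous setting; here a missing entry simply stands in for "no message arrived".

```python
from dataclasses import dataclass
from typing import Dict, Optional

ATTACK, RETREAT = "ATTACK", "RETREAT"


@dataclass(frozen=True)
class OralMessage:
    sender: str  # A2: the receiver knows who sent it
    order: str   # the content is entirely under the sender's control


def receive(inbox: Dict[str, OralMessage], sender: str) -> str:
    """Return the order received from `sender`, or RETREAT if none arrived.

    A1: delivered messages arrive unaltered, so they are read as-is.
    A3: absence is detectable, so a silent traitor yields the default RETREAT.
    """
    msg: Optional[OralMessage] = inbox.get(sender)
    return msg.order if msg is not None else RETREAT


inbox = {"C": OralMessage(sender="C", order=ATTACK)}
print(receive(inbox, "C"))   # ATTACK
print(receive(inbox, "L2"))  # RETREAT (L2 stayed silent)
```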
Oral Messages (Cont)
- Algorithm OM(0)
  - The commander sends his value to every lieutenant.
  - Each lieutenant uses the value received from the commander, or RETREAT if no value is received.
- Algorithm OM(m), m > 0
  1. The commander sends his value to every lieutenant.
  2. For each i, let v_i be the value lieutenant i receives from the commander (or RETREAT if none). Lieutenant i acts as commander in OM(m-1) and sends v_i to each of the other n-2 lieutenants.
  3. For each i and each j ≠ i, let v_j be the value lieutenant i receives from lieutenant j in step 2 using OM(m-1) (or RETREAT if none). Lieutenant i uses majority(v_1, …, v_{n-1}).
- Why j ≠ i? “Trust myself more than what others said I said.”
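The recursion above can be sketched in Python. This is an illustrative simulation, not a networked implementation: generals are integer ids, and the traitor strategy `lie(sender, receiver, value)` is a hypothetical stand-in for arbitrary traitor behaviour (a silent traitor corresponds to `lie` returning RETREAT). The `majority` helper assumes, as on these slides, that RETREAT is the default when no value has a strict majority.

```python
from collections import Counter

ATTACK, RETREAT = "ATTACK", "RETREAT"


def majority(values):
    """Strict majority of `values`, or RETREAT if no value has a majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else RETREAT


def om(m, commander, lieutenants, value, traitors, lie):
    """Run OM(m) and return {lieutenant: value it uses}."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {
        lt: lie(commander, lt, value) if commander in traitors else value
        for lt in lieutenants
    }
    if m == 0:
        # OM(0): each lieutenant simply uses the value from the commander.
        return received

    # Step 2: each lieutenant j acts as commander in OM(m-1), relaying v_j.
    # relayed[j][i] is the value lieutenant i ends up with for lieutenant j.
    relayed = {
        j: om(m - 1, j, [i for i in lieutenants if i != j],
              received[j], traitors, lie)
        for j in lieutenants
    }

    # Step 3: lieutenant i takes the majority of its own v_i and every v_j, j != i.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


def lie(sender, receiver, value):
    # Hypothetical traitor: ignore the true value; tell even ids ATTACK, odd ids RETREAT.
    return ATTACK if receiver % 2 == 0 else RETREAT


# n = 4, m = 1: general 0 is the commander, lieutenant 3 is the traitor.
decisions = om(1, 0, [1, 2, 3], ATTACK, traitors={3}, lie=lie)
print(decisions[1], decisions[2])  # ATTACK ATTACK
```

The traitor's own entry in the returned dict is meaningless and can be ignored; IC1 and IC2 only constrain the loyal lieutenants.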
Restate Algorithm
- OM(m):
  - The commander sends out a command.
  - Each lieutenant acts as commander in OM(m-1), sending the command it received to the other lieutenants.
  - Each lieutenant takes the majority of the command it received directly and the commands it received from the other lieutenants in OM(m-1).
- Revisit the Interactive Consistency goals:
  - IC1: All loyal lieutenants obey the same command.
  - IC2: If the commanding general is loyal, then every loyal lieutenant obeys the command he sends.
Example (n=4, m=1, L3 is traitor)
- In OM(1), commander C sends command v to L1, L2, L3.
- In OM(0), L1 relays v to L2, L3.
- In OM(0), L2 relays v to L1, L3.
- In OM(0), L3 (the traitor) sends commands to L1, L2: v to L1 but x to L2.
Example (n=4, m=1, L3 is faulty)
- L1
  - receives “v” from the commander, “v” from L2, “v” from L3
  - majority(v, v, v) is v
- L2
  - receives “v” from the commander, “v” from L1, “x” from L3
  - majority(v, v, x) is v
- Both loyal lieutenants obey v, so IC1 and IC2 hold.
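The two tallies above can be checked mechanically. This is a small sketch assuming the traitor L3 relays v to L1 and x to L2, as in the tallies; plain `Counter.most_common` is enough here because each tally has a clear winner.

```python
from collections import Counter

# Values each loyal lieutenant holds at the end of OM(1), keyed by the
# general that supplied them. The traitor L3 relayed "v" to L1 but "x" to L2.
received = {
    "L1": {"C": "v", "L2": "v", "L3": "v"},
    "L2": {"C": "v", "L1": "v", "L3": "x"},
}

decisions = {}
for lt, msgs in received.items():
    value, votes = Counter(msgs.values()).most_common(1)[0]
    decisions[lt] = value
    print(lt, value, votes)
# L1 v 3
# L2 v 2
```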
Example (n=4, m=1, C is traitor)
- In OM(1), commander C sends different commands: x to L1, y to L2, z to L3.
- In OM(0), L1 relays x to L2, L3.
- In OM(0), L2 relays y to L1, L3.
- In OM(0), L3 relays z to L1, L2.
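Example (n=4, m=1, C is faulty)
- L1
  - receives “x” from the commander, “y” from L2, “z” from L3
  - majority(x, y, z) is the default value (RETREAT)
- L2
  - receives “y” from the commander, “x” from L1, “z” from L3
  - majority(y, x, z) is the default value (RETREAT)
- L3
  - receives “z” from the commander, “x” from L1, “y” from L2
  - majority(z, x, y) is the default value (RETREAT)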
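When no value has a strict majority, every loyal lieutenant falls back to the same default, which is what keeps IC1 intact here. A minimal sketch, assuming RETREAT as the default (matching the OM(0) default) and a hypothetical `majority_or_default` helper:

```python
from collections import Counter

DEFAULT = "RETREAT"


def majority_or_default(values):
    # Strict majority if one exists; otherwise the default order.
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else DEFAULT


# The traitor commander sent x, y, z to L1, L2, L3; loyal lieutenants relay honestly.
views = {
    "L1": ["x", "y", "z"],  # from C, L2, L3
    "L2": ["y", "x", "z"],  # from C, L1, L3
    "L3": ["z", "x", "y"],  # from C, L1, L2
}
decisions = {lt: majority_or_default(v) for lt, v in views.items()}
print(decisions)  # {'L1': 'RETREAT', 'L2': 'RETREAT', 'L3': 'RETREAT'}
assert len(set(decisions.values())) == 1  # IC1: all loyal lieutenants agree
```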
Example (n=4, m=1, C is faulty)
- L1, L2, L3 all decide the same default value, so IC1 is satisfied.
- IC2 does not apply, since the commander is a traitor.
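Expensive Communication
- OM(m) invokes n-1 instances of OM(m-1)
- each OM(m-1) invokes n-2 instances of OM(m-2)
- each OM(m-2) invokes n-3 instances of OM(m-3)
- …
- OM(m-k) is called (n-1)(n-2)…(n-k) times
- So OM(0) is called O(n^m) times, and each call sends n-m-1 messages: about (n-1)(n-2)…(n-m-1) = O(n^(m+1)) messages in total, exponential in m. Expensive!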
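The growth can be made concrete by counting messages. This sketch counts every send in a run of OM(m), assuming each general (loyal or not) sends exactly one message per recipient; `om_messages` and `last_level` are hypothetical helper names.

```python
from math import prod


def om_messages(n, m):
    """Number of messages sent by OM(m) with n generals (1 commander + n-1 lieutenants)."""
    def count(k, c):
        # OM(k) with c lieutenants: the commander sends c messages (step 1) and,
        # if k > 0, each of the c lieutenants runs OM(k-1) over the other c-1.
        if k == 0:
            return c
        return c + c * count(k - 1, c - 1)
    return count(m, n - 1)


def last_level(n, m):
    # Messages sent in the innermost OM(0) calls: (n-1)(n-2)...(n-m-1).
    return prod(range(n - m - 1, n))


for n, m in [(4, 1), (7, 2), (10, 3)]:
    print(n, m, om_messages(n, m), last_level(n, m))
# 4 1 9 6
# 7 2 156 120
# 10 3 3609 3024
```

The innermost level dominates the total, which is why the cost is usually quoted as (n-1)(n-2)…(n-m-1).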
Problem
- Many messages are required to handle even 1 faulty process.
- At least 4 processes are needed to handle 1 fault, 7 to handle 2 faults, and in general 3f+1 to handle f faults.
  - But as the system gets larger, the probability of a fault also increases.
- With signed messages instead of oral messages, f faults can be handled with 2f+1 processes.
  - Simple majority requirement
  - Still many messages sent, plus the cost of signing
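The two bounds side by side, as a small sketch. The 3f+1 figure is the oral-message bound above; the 2f+1 figure for signed messages is the slide's own claim. `min_processes` and `max_faults` are hypothetical names.

```python
def min_processes(f, signed=False):
    # Oral messages need n >= 3f + 1; the slide's figure for signed messages is 2f + 1.
    return 2 * f + 1 if signed else 3 * f + 1


def max_faults(n, signed=False):
    # Inverse: the largest f that n processes can tolerate.
    return (n - 1) // 2 if signed else (n - 1) // 3


for f in range(1, 4):
    print(f, min_processes(f), min_processes(f, signed=True))
# 1 4 3
# 2 7 5
# 3 10 7
```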
Summary
- BGP solutions are expensive (communication overhead and signatures).
- They use redundancy and voting to achieve reliability. What if more than 1/3 of the nodes (processors) are faulty?
- 3m+1 replicas for m failures. Is that expensive? There are tradeoffs between reliability and performance (e.g. OceanStore’s primary and secondary replicas).
- How would you determine m in a practical system?
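One possible way to approach the last question, as a sketch only: assume each replica fails independently with probability p, and pick the smallest m for which the chance of more than m faulty replicas among 3m+1 stays below a target. Both the independence assumption and the `target` policy are hypothetical; real failures are often correlated.

```python
from math import comb


def p_too_many_faults(m, p):
    """P(more than m of 3m+1 replicas are faulty), assuming independent faults."""
    n = 3 * m + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1, n + 1))


def choose_m(p, target, max_m=50):
    # Smallest m whose failure probability is at most `target` (hypothetical policy).
    for m in range(1, max_m + 1):
        if p_too_many_faults(m, p) <= target:
            return m
    return None  # e.g. when p is too close to 1/3 for any m up to max_m


print(choose_m(0.01, 1e-3), choose_m(0.01, 1e-4))
```

Raising m lowers the risk only while p stays well below 1/3; it also multiplies the message cost counted earlier, which is the reliability/performance tradeoff in practice.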