Presentation is loading. Please wait.

Presentation is loading. Please wait.

Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

Similar presentations


Presentation on theme: "Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya."— Presentation transcript:

1 Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya

2 Reading List L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem,” ACM ToPLaS 1982. M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” OSDI 1999. 2

3 Byzantine Generals Problem A sender wants to send message to n-1 other peers Fault-free nodes must agree Sender fault-free  agree on its message Up to f failures

4 Byzantine Generals Problem A sender wants to send message to n-1 other peers Fault-free nodes must agree Sender fault-free  agree on its message Up to f failures

5 S 3 2 1 v v v Byzantine Generals Algorithm 5 value v Faulty peer

6 S 3 2 1 v v v 6 value v v v Byzantine Generals Algorithm

7 S 3 2 1 v v v 7 value v v v ? ? Byzantine Generals Algorithm

8 S 3 2 1 v v v 8 value v v v v ? v ? Byzantine Generals Algorithm

9 3 2 v v v 9 value v x v v ? v ? [v,v,?] S 1 Byzantine Generals Algorithm

10 3 2 v v v 10 value v x v v ? v ? v v S 1 Majority vote results in correct result at good peers Byzantine Generals Algorithm

11 S 3 2 1 v w x 11 Faulty source Byzantine Generals Algorithm

12 S 3 2 v w x 12 w w 1 Byzantine Generals Algorithm

13 S 3 2 v w x 13 x w w v x v 1 Byzantine Generals Algorithm

14 S 3 2 v w x 14 x w w v x v 1 [v,w,x] Byzantine Generals Algorithm

15 S 3 2 v w x 15 x w w v x v 1 [v,w,x] Vote result identical at good peers Byzantine Generals Algorithm

16 Known Results Need 3f + 1 nodes to tolerate f failures Need Ω(n 2 ) messages in general 16

17 Ω(n 2 ) Message Complexity Each message at least 1 bit Ω(n 2 ) bits “communication complexity” to agree on just 1 bit value 17

18 Practical Byzantine Fault Tolerance Computer systems provide crucial services Computer systems fail – Crash-stop failure – Crash-recovery failure – Byzantine failure Example: natural disaster, malicious attack, hardware failure, software bug, etc. Need highly available service 18 Replicate to increase availability

19 Challenges 19 Request A Request B Client

20 Requirements All replicas must handle same requests despite failure. Replicas must handle requests in identical order despite failure. 20

21 Challenges 21 2: Request B 1: Request A Client

22 State Machine Replication 22 2: Request B 1: Request A 2: Request B 1: Request A 2: Request B 1: Request A 2: Request B 1: Request A Client How to assign sequence number to requests?

23 Primary Backup Mechanism 23 Client 2: Request B 1: Request A What if the primary is faulty? Agreeing on sequence number Agreeing on changing the primary (view change) What if the primary is faulty? Agreeing on sequence number Agreeing on changing the primary (view change) View 0

24 Normal Case Operation Three phase algorithm: – PRE-PREPARE picks order of requests – PREPARE ensures order within views – COMMIT ensures order across views Replicas remember messages in log Messages are authenticated – {.} σk denotes a message sent by k 24

25 Pre-prepare Phase Primary: Replica 0 Replica 1 Replica 2 Replica 3 Request: m {PRE-PREPARE, v, n, m} σ0 Fail 25

26 Prepare Phase Request: m PRE-PREPARE Primary: Replica 0 Replica 1 Replica 2 Replica 3 Fail Accepted PRE-PREPARE 26

27 Prepare Phase Request: m PRE-PREPARE Primary: Replica 0 Replica 1 Replica 2 Replica 3 Fail {PREPARE, v, n, D(m), 1} σ1 Accepted PRE-PREPARE 27

28 Prepare Phase Request: m PRE-PREPARE Primary: Replica 0 Replica 1 Replica 2 Replica 3 Fail {PREPARE, v, n, D(m), 1} σ1 Accepted PRE-PREPARE Collect PRE-PREPARE + 2f matching PREPARE 28

29 Commit Phase Request: m PRE-PREPARE Primary: Replica 0 Replica 1 Replica 2 Replica 3 Fail PREPARE {COMMIT, v, n, D(m)} σ2 29

30 Commit Phase (2) Request: m PRE-PREPARE Primary: Replica 0 Replica 1 Replica 2 Replica 3 Fail PREPARE COMMIT Collect 2f+1 matching COMMIT: execute and reply 30

31 View Change Provide liveness when primary fails – Timeouts trigger view changes – Select new primary (= view number mod 3f+1) Brief protocol – Replicas send VIEW-CHANGE message along with the requests they prepared so far – New primary collects 2f+1 VIEW-CHANGE messages – Constructs information about committed requests in previous views 31

32 View Change Safety Goal: No two different committed request with same sequence number across views Quorum for Committed Certificate (m, v, n) At least one correct replica has Prepared Certificate (m, v, n) At least one correct replica has Prepared Certificate (m, v, n) View Change Quorum View Change Quorum 32

33 Related Works Fault Tolerance Fail Stop Fault Tolerance Paxos 1989 (TR) VS Replication PODC 1988 Byzantine Fault Tolerance Byzantine Agreement Rampart TPDS 1995 SecureRing HICSS 1998 PBFT OSDI ‘99 BASE TOCS ‘03 Byzantine Quorums Malkhi-Reiter JDC 1998 Phalanx SRDS 1998 Fleet ToKDI ‘00 Q/U SOSP ‘05 Hybrid Quorum HQ Replication OSDI ‘06 33


Download ppt "Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya."

Similar presentations


Ads by Google