
1 HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba Shrira 3 1 MIT CSAIL 2 INESC-ID and Instituto Superior Técnico 3 Brandeis University

2 Byzantine Fault Tolerance ›Reliable client-server distributed systems » Server replicated across group of replica machines ›General operations ›Bounded number f of Byzantine replicas ›Must ensure correct system state » Consistent ordering of client operations

3 State of the Art ›Approaches: » State Machine Replication – BFT  3f+1 replicas » Byzantine Quorums – Q/U  5f+1 replicas  Increased performance  Degradation when writes contend
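
As a rough illustration of the replica counts on this slide, the sketch below computes the group size each approach needs to tolerate f Byzantine replicas. The function names are hypothetical; only the 3f+1 and 5f+1 formulas come from the slides.

```python
def bft_group_size(f: int) -> int:
    """Replicas needed by state machine replication (BFT): 3f+1."""
    return 3 * f + 1


def qu_group_size(f: int) -> int:
    """Replicas needed by the Q/U Byzantine quorum protocol: 5f+1."""
    return 5 * f + 1


# Print the group sizes side by side for small f.
for f in range(1, 6):
    print(f"f={f}: BFT needs {bft_group_size(f)}, Q/U needs {qu_group_size(f)}")
```

For f=1 this gives 4 replicas for BFT and 6 for Q/U, which matches the replica counts drawn in the protocol diagrams on the following slides.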

4 Contributions ›Low overhead Byzantine Fault Tolerance » Performance of Byzantine Quorums without 5f+1 replicas or contention degradation ›Hybrid Quorum scheme for Byzantine Fault Tolerance » Quorum approach in normal-case » Use Byzantine agreement to resolve write contention

5 Outline ›Current Approaches ›HQ Replication ›BFT Improvements ›Performance Evaluation ›Conclusions

6 State Machine Replication ›BFT - Castro and Liskov TOCS ’02 » Operations ordered by primary » Agreed upon by replicas ›Figure: Client, Primary, Replica 2, Replica 3, Replica 4; phases: Request, Pre-Prepare, Prepare, Commit, Reply

7 Byzantine Quorums ›Q/U - Abd-El-Malek et al. SOSP ’05 ›Client controlled protocol » Replicas order operations independently ›Optimistic » Best case one-phase protocol » Worst case unbounded: randomized backoff ›Figure: Client, Replica 1 to Replica 6; phases: Update, Reply

8 Advantages/Disadvantages BFT ›Good » 3f+1 replicas » Bounded number of phases ›Bad » Higher latency » Quadratic communication Q/U ›Good » Best-case performance  One-phase write  Low replica load ›Bad » 5f+1 replicas » Degraded performance when writes contend

9 Outline ›Current Approaches ›HQ Replication » Normal-case Protocol » Contention Resolution ›BFT Improvements ›Performance Evaluation ›Conclusions

10 HQ Replication ›3f+1 replicas ›Supports general operations ›No all-to-all communication in normal case ›BFT used to resolve contention

11 HQ Replication ›One-phase read ›Two-phase write ›Figure: Client, Replica 1 to Replica 4; messages: Write1, Write1 OK, Write2, Write2 OK

12 System Architecture (remove this?)

13 High-level Write Protocol ›Two-phase write protocol ›Phase 1: » Client obtains timestamp grant from each replica ›Phase 2: » Client forms certificate from 2f+1 matching grants » Sends to replicas to complete write

14 Grants ›Promise to execute operation at given sequence number » Assuming agreement from quorum ›Grant » Client ID » Object ID » Hash over requested operation » Sequence Number (timestamp) » Replica signature
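
The grant fields listed above could be represented roughly as follows. This is a minimal sketch, not the authors' implementation: the `Grant` class, `make_grant` helper, and the `replica_id` field are hypothetical, SHA-256 is an assumed hash, and an HMAC stands in for the replica signature purely to keep the example within the standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    client_id: int
    object_id: int
    op_hash: bytes    # hash over the requested operation
    seq_no: int       # sequence number (timestamp)
    replica_id: int   # assumption: identifies which replica issued the grant
    signature: bytes  # stand-in for the replica signature


def make_grant(client_id: int, object_id: int, operation: bytes,
               seq_no: int, replica_id: int, replica_key: bytes) -> Grant:
    """Build a grant promising to execute `operation` at `seq_no`."""
    op_hash = hashlib.sha256(operation).digest()
    payload = f"{client_id}|{object_id}|{seq_no}|".encode() + op_hash
    # HMAC used here only as a placeholder for a real digital signature.
    signature = hmac.new(replica_key, payload, hashlib.sha256).digest()
    return Grant(client_id, object_id, op_hash, seq_no, replica_id, signature)
```

Because the grant carries only a hash of the operation, two grants can be compared for equality without shipping the operation payload itself.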

15 Certificates ›Certificate » Quorum (2f+1) matching grants ›Proves quorum of replicas agree to ordering of operation » Uniquely identify client, operation and sequential ordering » Existence of certificate precludes existence of conflicting certificate
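
One way to picture certificate formation is as a search for 2f+1 matching grants from distinct replicas. The sketch below assumes a simplified `Grant` tuple and a hypothetical `form_certificate` helper; signature checking is omitted.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Grant(NamedTuple):
    replica_id: int
    client_id: int
    object_id: int
    op_hash: str
    seq_no: int


def form_certificate(grants: List[Grant], f: int) -> Optional[List[Grant]]:
    """Return 2f+1 or more matching grants, or None if no quorum matches."""
    by_content = defaultdict(dict)
    for g in grants:
        key = (g.client_id, g.object_id, g.op_hash, g.seq_no)
        by_content[key][g.replica_id] = g  # count each replica at most once
    for matching in by_content.values():
        if len(matching) >= 2 * f + 1:
            return sorted(matching.values())
    return None
```

Since each correct replica keeps at most one outstanding grant per object, two different sets of 2f+1 matching grants for the same sequence number cannot both exist, which is the property the slide describes.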

16 Replica State ›Multiple independent objects ›State per-object » Certificate supporting most recent write » Operation status  Active –Write in progress, outstanding grant  Quiescent –No current write operation

17 Write Phase 1 ›Client sends write request to replicas » If quiescent, replica assigns new grant to client » If active, replica sends currently outstanding grant ›Several Possibilities » All grants match » Grants for different client » Grants conflict
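
The phase 1 behaviour described above can be sketched as a replica-side handler plus a client-side classification of the returned grants. All names here are hypothetical, grants are reduced to a plain tuple, and the sketch assumes the client has already collected a non-empty quorum of responses; slow replicas with old timestamps are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grant = Tuple[int, int, str]  # (client_id, seq_no, operation), simplified


@dataclass
class ObjectState:
    seq_no: int = 0                      # sequence number of last completed write
    outstanding: Optional[Grant] = None  # None means the object is quiescent


def handle_write_request(state: ObjectState, client_id: int, op: str) -> Grant:
    """Replica side: assign a new grant if quiescent, else return the current one."""
    if state.outstanding is None:
        state.outstanding = (client_id, state.seq_no + 1, op)
    return state.outstanding


def classify(grants: List[Grant], my_client_id: int) -> str:
    """Client side: decide the next step from a quorum of grants."""
    if len(set(grants)) != 1:
        return "request resolution"  # grants conflict
    if grants[0][0] != my_client_id:
        return "writeback"           # grants match, but for a different client
    return "phase 2"                 # all grants match for this client
```

The three return values correspond to the three possibilities listed on the slide, which the following walkthroughs (isolated write, incomplete write, write contention) illustrate in turn.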

18 Isolated Write

19 ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ?

20 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ? ›Message: Write A

21 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Write A

22 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Messages: Grant 1, Grant 2, Grant 3

23 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Messages: Grant 1, Grant 2, Grant 3 ›Matching grants: Phase 2 write

24 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Cert {G1, G2, G3} ›Matching grants: Phase 2 write

25 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Message: Cert {G1, G2, G3} ›Replicas: execute A

26 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Result A

27 Isolated Write ›Participants: client 1, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Result A ›Client 1: result ›Write Complete

28 Incomplete Write

29 ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ?

30 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ? ›Message: Write A

31 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Write A

32 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Messages: Grant 1, Grant 2, Grant 3

33 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Messages: Grant 1, Grant 2, Grant 3 ›Client 1 slow or failed

34 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Write B

35 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Messages: Grant 1, Grant 2, Grant 3 ›Replicas active: Return current grant

36 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Messages: Grant 1, Grant 2, Grant 3 ›Grants for different client: Perform Writeback

37 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Cert {G1, G2, G3}, Write B ›Grants for different client: Perform Writeback

38 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Message: Cert {G1, G2, G3}, Write B ›Replicas: execute A

39 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 1, Seq No: 1, Operation: A ›Message: Cert {G1, G2, G3}, Write B

40 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 2, Seq No: 2, Operation: B ›Messages: Grant 1, Grant 2, Grant 3

41 Incomplete Write ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Active; Grant: Client: 2, Seq No: 2, Operation: B ›Messages: Grant 1, Grant 2, Grant 3 ›Matching grants: Phase 2 write

42 Write Contention

43 ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ? ›Message: Write A

44 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replica 1: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replicas 2-3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ? ›Message: Write A

45 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-2: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replica 3: State: Quiescent; Grant: Client: ?, Seq No: 0, Operation: ? ›Messages: Write A, Write B, Write A

46 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-2: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replica 3: State: Active; Grant: Client: 2, Seq No: 1, Operation: B ›Messages: Write A, Write B, Write A

47 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-2: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replica 3: State: Active; Grant: Client: 2, Seq No: 1, Operation: B ›Messages: Grant 1, Grant 2, Grant 3

48 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-2: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replica 3: State: Active; Grant: Client: 2, Seq No: 1, Operation: B ›Messages: Grant 1, Grant 2, Grant 3 ›Conflicting grants: Request resolution

49 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-2: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replica 3: State: Active; Grant: Client: 2, Seq No: 1, Operation: B ›Message: Resolve Request with Cert {G1, G2, G3} ›Conflicting grants: Request resolution

50 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-2: State: Active; Grant: Client: 1, Seq No: 1, Operation: A ›Replica 3: State: Active; Grant: Client: 2, Seq No: 1, Operation: B ›Message: Resolve Request with Cert {G1, G2, G3} ›Contention Resolution

51 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Message: Resolve Request with Cert {G1, G2, G3} ›Replicas: execute A

52 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Message: Resolve Request with Cert {G1, G2, G3} ›Replicas: execute B

53 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 2, Seq No: 2, Operation: B ›Message: Result A

54 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 2, Seq No: 2, Operation: B ›Message: Result A ›Client: result

55 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 2, Seq No: 2, Operation: B ›Message: Result B

56 Write Contention ›Participants: client 1, client 2, replica 1, replica 2, replica 3 ›Replicas 1-3: State: Quiescent; Grant: Client: 2, Seq No: 2, Operation: B ›Message: Result B ›Client: result

57 Contention Resolution ›BFT module used to resolve contention » Establish sequential order on contending ops ›On receiving resolve request: » Freeze local object state » Send state to primary ›Primary runs BFT on combined state ›Replicas execute contending operations

58 Read Protocol ›Client sends read request to replicas ›Replica returns current object state » Supported by previous write certificate ›Read complete if quorum of matching responses » Writeback used to retry if responses inconsistent
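
The read completion rule can be sketched as a check for 2f+1 matching responses. The `read_result` helper and the `(seq_no, value)` response shape are assumptions for illustration; validating the write certificate that supports each response is left out.

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple

Response = Tuple[int, Hashable]  # (seq_no, value) reported by one replica


def read_result(responses: List[Response], f: int) -> Optional[Response]:
    """Return the agreed response if a quorum (2f+1) matches, else None."""
    if not responses:
        return None
    candidate, count = Counter(responses).most_common(1)[0]
    return candidate if count >= 2 * f + 1 else None
```

A `None` result corresponds to the inconsistent case on the slide, where the client falls back to a writeback and retries.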

59 Additional Details ›Read protocol ›State transfer ›Multi-object transactions ›Performance enhancements

60 Performance Enhancements ›Preferred quorums » Core protocol run by only 2f+1 replicas ›Symmetric-key cryptography » Authenticators instead of signatures  Collection of 3f+1 MACs » Lower CPU overhead
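
To make the authenticator idea concrete, the sketch below builds a collection of MACs, one per replica, using Python's standard `hmac` module. The helper names, the per-replica shared keys, and the choice of SHA-256 are assumptions for illustration, not details from the slides.

```python
import hashlib
import hmac
from typing import Dict


def make_authenticator(message: bytes, keys: Dict[int, bytes]) -> Dict[int, bytes]:
    """Compute one MAC per replica, each under that replica's shared key."""
    return {rid: hmac.new(key, message, hashlib.sha256).digest()
            for rid, key in keys.items()}


def verify_entry(message: bytes, replica_id: int, key: bytes,
                 authenticator: Dict[int, bytes]) -> bool:
    """A replica checks only its own entry in the authenticator."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, authenticator.get(replica_id, b""))


# Example with f=1, so 3f+1 = 4 replicas and 4 MACs per message.
keys = {rid: f"shared-key-{rid}".encode() for rid in range(4)}
auth = make_authenticator(b"grant bytes", keys)
print(len(auth), verify_entry(b"grant bytes", 2, keys[2], auth))
```

Each MAC is cheap to compute compared with a public-key signature, which is the source of the lower CPU overhead the slide mentions; the cost is that the message carries 3f+1 MACs rather than a single signature.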

61 BFT Improvements ›Preferred quorums » Reduces degree of quadratic communication ›Single MAC per message » Significant improvements over authenticators

62 Outline ›Current Approaches ›HQ Replication ›BFT Improvements ›Performance Evaluation » Analysis » Experiments ›Conclusions

63 Non-Contention Message Overhead Messages sent/received at each replica per write request

64 Non-Contention Bandwidth Use Total bandwidth at each replica per write request

65 Experimental Setup ›HQ and BFT prototypes deployed on Emulab » Up to 16 replicas (f=5), 200 clients (4 per machine) ›New BFT codebase ›Implement counter service » Negligible operation payload » Multiple objects  Private non-contention objects  Shared contention object

66 Non-contention Throughput Maximum operation throughput

67 Resilience to Contention Throughput degradation with increasing write-contention

68 Resilience to Contention Throughput degradation with increasing write-contention

69 BFT Batching ›BFT allows batching at primary ›Greatly reduces internal protocol communication ›Increased delay ›Figure: Client, Primary, Replica 1, Replica 2, Replica 3; phases: Request, Pre-Prepare, Prepare, Commit, Reply; annotation: once per batch

70 Batched Performance Effect of BFT batching on maximum write throughput

71 Recommendations ›Use Q/U when » Latency critical » Contention low » 5f+1 replicas acceptable ›Use HQ when » Low latency important » Moderate contention ›Use BFT when » Contention high » Throughput more important than latency

72 Conclusions ›First Byzantine Quorum protocol with 3f+1 replicas » Supports general operations » Resilient to Byzantine clients ›Introduced Hybrid technique » Resolve contention without performance degradation » Applicable to general quorum systems ›Found optimized BFT to perform well under high load

73 Questions?

74 Further Details ›HQ Replication: Properties and optimizations » James Cowling, Daniel Myers, Barbara Liskov, Rodrigo Rodrigues and Liuba Shrira. Technical Memo In Prep., MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, 2006. ›Contact: » cowling@csail.mit.edu » http://people.csail.mit.edu/cowling/

75 Write-back Operation ›Write certificate paired with a subsequent request ›Used to ensure progress with slow replicas or clients » Completes phase 2 for a slow client » Advances state of slow replicas ›Replica processes write phase 2 based on certificate, then the paired request
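
A replica's handling of a writeback might look like the sketch below: it first completes phase 2 for the certified write, then treats the paired request as a fresh phase 1 request. The `Replica` class is hypothetical, and the certificate is reduced to a `(client_id, seq_no, op)` tuple with no validation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Grant = Tuple[int, int, str]  # (client_id, seq_no, operation), simplified


@dataclass
class Replica:
    seq_no: int = 0
    outstanding: Optional[Grant] = None  # None means quiescent
    log: List[str] = field(default_factory=list)

    def phase1(self, client_id: int, op: str) -> Grant:
        """Assign a new grant if quiescent, else return the current grant."""
        if self.outstanding is None:
            self.outstanding = (client_id, self.seq_no + 1, op)
        return self.outstanding

    def phase2(self, cert: Grant) -> None:
        """Execute the certified operation if it is the next in sequence."""
        _client_id, seq_no, op = cert
        if seq_no == self.seq_no + 1:
            self.log.append(op)
            self.seq_no = seq_no
            self.outstanding = None  # back to quiescent

    def writeback(self, cert: Grant, client_id: int, op: str) -> Grant:
        """Process the certificate, then the paired request."""
        self.phase2(cert)
        return self.phase1(client_id, op)
```

Applied to the incomplete-write walkthrough, a replica holding client 1's outstanding grant for operation A executes A and then issues client 2 a grant for B at sequence number 2.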

77 Backups…

78 Slow Replicas ›Some grants in quorum have old timestamp ›Perform writeback to slow replicas, using certificate provided with highest grant » Brings replicas up to date and solicits new grants
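
Detecting slow replicas from a set of phase 1 responses can be sketched as finding the highest timestamp and listing the replicas that reported something older. The function name and the dictionary input are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def find_slow_replicas(grant_timestamps: Dict[int, int]) -> Tuple[int, List[int]]:
    """Return the highest timestamp seen and the replicas lagging behind it.

    grant_timestamps maps replica_id to the sequence number in its grant.
    """
    highest = max(grant_timestamps.values())
    slow = sorted(rid for rid, ts in grant_timestamps.items() if ts < highest)
    return highest, slow
```

The client would then send a writeback to the replicas in the returned list, using the certificate that accompanied the highest grant.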

79 Why 3f+1? ›3f+1 replicas » f of which can be faulty ›2f+1 agree on any ordering » f of these may be Byzantine » The remaining f may be slow ›Maximum of 2f can respond with old system state, but not 2f+1
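
The counting argument on this slide can be checked numerically. The sketch below assumes quorums of size 2f+1 in a group of 3f+1 and uses a hypothetical `quorum_facts` helper; it computes the minimum overlap of any two quorums and the largest number of replicas that could report old state after a write completes.

```python
from typing import Tuple


def quorum_facts(f: int) -> Tuple[int, int, int, int]:
    n = 3 * f + 1             # total replicas
    q = 2 * f + 1             # quorum size
    min_overlap = 2 * q - n   # any two quorums share at least this many replicas
    # Replicas outside the write quorum (n - q = f, possibly just slow)
    # plus up to f Byzantine replicas inside it that may lie.
    max_stale = (n - q) + f
    return n, q, min_overlap, max_stale


for f in range(1, 4):
    n, q, overlap, stale = quorum_facts(f)
    print(f"f={f}: n={n}, quorum={q}, overlap>={overlap}, stale<={stale}")
```

The overlap works out to f+1, so any two quorums share at least one correct replica, and the stale count works out to 2f, which is one short of a quorum.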

80 ›Won’t HQ have a higher rate of contention than Q/U, since it is two-phase (higher latency)? » No: the contention window lasts only from when the first replica receives the phase 1 request until the last replica receives it. It is therefore independent of the two-phase design, and is actually smaller than in Q/U

