Practical Byzantine Fault Tolerance

Slides:



Advertisements
Similar presentations
CS 542: Topics in Distributed Systems Diganta Goswami.
Advertisements

1 The Case for Byzantine Fault Detection. 2 Challenge: Byzantine faults Distributed systems are subject to a variety of failures and attacks Hacker break-in.
P. Kouznetsov, 2006 Abstracting out Byzantine Behavior Peter Druschel Andreas Haeberlen Petr Kouznetsov Max Planck Institute for Software Systems.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Byzantine Fault Tolerance Steve Ko Computer Sciences and Engineering University at Buffalo.
DISTRIBUTED SYSTEMS II REPLICATION CNT. II Prof Philippas Tsigas Distributed Computing and Systems Research Group.
1 Attested Append-Only Memory: Making Adversaries Stick to their Word Byung-Gon Chun (ICSI) October 15, 2007 Joint work with Petros Maniatis (Intel Research,
1 Principles of Reliable Distributed Systems Lectures 11: Authenticated Byzantine Consensus Spring 2005 Dr. Idit Keidar.
Yee Jiun Song Cornell University. CS5410 Fall 2008.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
SRG PeerReview: Practical Accountability for Distributed Systems Andreas Heaberlen, Petr Kouznetsov, and Peter Druschel SOSP’07.
CS 582 / CMPE 481 Distributed Systems
Computer Science Lecture 17, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 16 Wenbing Zhao Department of Electrical and Computer Engineering.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 15 Wenbing Zhao Department of Electrical and Computer Engineering.
2/23/2009CS50901 Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial Fred B. Schneider Presenter: Aly Farahat.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 9: SMR with Paxos.
Attested Append-only Memory: Making Adversaries Stick to their Word Distributed Storage Systems CS presented by: Hussam Abu-Libdeh.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 16 Wenbing Zhao Department of Electrical and Computer Engineering.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 15 Wenbing Zhao Department of Electrical and Computer Engineering.
Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)
EEC 688 Secure and Dependable Computing Lecture 16 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Byzantine fault tolerance
Byzantine Fault Tolerance CS 425: Distributed Systems Fall Material drived from slides by I. Gupta and N.Vaidya.
1 The Design of a Robust Peer-to-Peer System Gisik Kwon Dept. of Computer Science and Engineering Arizona State University Reference: SIGOPS European Workshop.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
EEC 688/788 Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Practical Byzantine Fault Tolerance Jayesh V. Salvi
Byzantine fault-tolerance COMP 413 Fall Overview Models –Synchronous vs. asynchronous systems –Byzantine failure model Secure storage with self-certifying.
From Viewstamped Replication to BFT Barbara Liskov MIT CSAIL November 2007.
1 ZYZZYVA: SPECULATIVE BYZANTINE FAULT TOLERANCE R.Kotla, L. Alvisi, M. Dahlin, A. Clement and E. Wong U. T. Austin Best Paper Award at SOSP 2007.
Byzantine fault tolerance
Practical Byzantine Fault Tolerance and Proactive Recovery
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2012 Lecture 26 November 29, 2012 Presented By: Imranul Hoque 1.
EECS 262a Advanced Topics in Computer Systems Lecture 25 Byzantine Agreement November 28 th, 2012 John Kubiatowicz and Anthony D. Joseph Electrical Engineering.
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
EEC 688/788 Secure and Dependable Computing Lecture 15 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Chapter 4 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University Building Dependable Distributed Systems.
Byzantine Fault Tolerance
Systems Research Barbara Liskov October Replication Goal: provide reliability and availability by storing information at several nodes.
Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create.
Fail-Stop Processors UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau One paper: Byzantine.
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.
BChain: High-Throughput BFT Protocols
Byzantine Fault Tolerance
Byzantine Fault Tolerance
Outline Distributed Mutual Exclusion Distributed Deadlock Detection
Principles of Computer Security
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Active replication for fault tolerance
EEC 688/788 Secure and Dependable Computing
From Viewstamped Replication to BFT
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Building Dependable Distributed Systems, Copyright Wenbing Zhao
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Implementing Consistency -- Paxos
Presentation transcript:

Practical Byzantine Fault Tolerance Castro and Liskov, OSDI 1999 Nathan Baker, presenting on 23 September 2005

Practical Byzantine Fault Tolerance What is a Byzantine fault? Rationale for Byzantine Fault Tolerance BFT Algorithm Conclusion

What is a Byzantine fault? Arbitrary node behavior Failure to return a result Return of an incorrect result Return of a deliberately misleading result Return of a differing result to different parts of the system Source: Byzantine Generals Problem, Lamport, Shostak, and Pease (1982)

Rationale for BFT Guard against malicious attacks Prevent faulty code at a single node from corrupting the system Ultimate goal: provide system consistency even when nodes may be inconsistent Useful in distributed areas like file servers or automated control systems where state is very important

Overview of Solution n generals need to achieve consensus f generals may be traitors Consider a voting algorithm If a general sees f + 1 identical responses, that response must be correct

Simple Example Consider four replicas trying to agree on the value of a single bit (attack/don't attack)

Simple Example All replicas send their value to the other replicas

Simple Example Now, all replicas send their entire vector to all other replicas 2 sends values for <2,3,4> to 1 and <1,2,4> to 3

Simple Example Result is the most frequent value in the vector

Simple Example Question: in this example we had 4 replicas, one of which was faulty. Would this work with 3 replicas, one of which was faulty? Hint: this is an asynchronous environment

BFT Algorithm Algorithm discussion Overview Details BFS Evaluation

BFT Algorithm Overview Previous work was slow-running or relied on synchrony for safety. This algorithm (BFT) provides safety and liveness over an asynchronous network. Safety: the system maintains state and looks to the client like a non-replicated remote service. Safety includes a total ordering of requests. Liveness: clients will eventually receive a reply to every request sent, provided the network is functioning.

BFT Algorithm Overview Based on state machine replication Messages signed by public key cryptography Message digests created using collision- resistant hash functions Uses consensus and propagation of system views: state is only modified when the functioning replicas agree on the change

BFT Algorithm Overview For n clients, there are n 'views', {0..n-1}. In view i, node i is the primary node View change is increment mod n View change occurs when 2f nodes believe the primary has failed Guaranteed safety and liveness provided less than = f replicas have failed.

BFT Algorithm: Normal Operation The client sends a request to the primary. The primary assigns the request a sequence number and broadcasts this to all replicas (pre-prepare). The replicas acknowledge this sequence number (prepare). Once 2f prepares have been received, a client broadcasts acceptance of the request (commit).

BFT Algorithm: Normal Operation Once 2f +1 commits have been received, a client places the request in the queue. 1.In a non-faulty client, the request queue will be totally ordered by sequence number. Once all prior requests have been completed, the request will be executed and the result sent directly to the client. All these messages are logged.

BFT Algorithm: Normal Operation Phase 1: Client sends a request to the primary. The primary can then validate the message and propose a sequence number for it.

BFT Algorithm: Normal Operation Phase 2: Primary sends pre-prepare message to all backups. This allows the backups to validate the message and receive the sequence number.

BFT Algorithm: Normal Operation Phase 3: All functioning backups send prepare message to all other backups. This allows replicas to agree on a total ordering.

BFT Algorithm: Normal Operation Phase 4: All replicas multicast a commit. The replicas have agreed on an ordering and have acknowledged the receipt of the request.

BFT Algorithm: Normal Operation Phase 5: Each functioning replica sends a reply directly to the client. This bypasses the case where the primary fails between request and reply.

BFT Algorithm: View Changes What if the primary is faulty? The client uses a timeout. When this timeout expires, the request is sent to all replicas. If a replica already knows about the request, the rebroadcast is ignored. If the replica does not know about the request, it will start a timer. On timeout of this second timer, the replica starts the view change process.

BFT Algorithm: View Changes If a replica's timer expires, it sends a view change message. This message contains the system state (in the form of archived messages) so that other nodes will know that the replica has not failed. If the current view is v, node v+1 (mod n) waits for 2f valid view-change messages.

BFT Algorithm: View Changes Once v+1 has seen 2f view-change messages, it multicasts a new-view message This message contains all the valid view change messages received by v+1 as well as a set O of all requests that may not have been completed yet (due to primary failure). After a replica receives a valid view-change message, it enters view v+1 and processes O While view change is occurring, no new requests are accepted.

BFT Algorithm: Client's perspective The client must be BFT-aware: Must implement timeout for view-change Must wait for replies directly from replicas The client waits for f+1 replies, then accepts the result. Seamless integration requires three-tier approach.

BFT Algorithm: Client's Perspective In order to provide seamless interaction with the calling code, there should be a client layer between the replicas and the program.

BFT Algorithm: Evaluation 3f+1 replicas required--expensive! # Replicas f

BFT Algorithm: Evaluation Problems Not scalable Significant overhead However Provides Byzantine fault tolerance that can be used in real-world applications

Questions?

BFT Algorithm: BFS The authors implemented a Byzantine Fault- Tolerant NFS system called BFS. BFS is comparable in performance to NFS on average, while providing tolerance for Byzantine faults. View changes not implemented in BFS.

BFT Optimizations Optimization Checkpoints/Garbage Collection Reducing communication Message Authentication Codes

BFT Optimizations: Checkpoints Pre-prepare, prepare, and commit messages are stored to provide proof of correctness. Storing these messages can be expensive. Instead, a checkpoint system is used A checkpoint size c is set A proof of correctness is generated when s mod c = 0 This is called a checkpoint.

BFT Optimizations: Checkpoints After a checkpoint is produced, a checkpoint message is multicast Once 2f+1 checkpoint messages have been collected, that checkpoint is considered stable and all archived messages with s less than the checkpoint number are discarded. All earlier checkpoints are also discarded.

Optimizations: Reducing Messages This protocol is very message intensive, but there are three ways it can be altered to limit traffic: Single result The client request designates only one replica to send the result, and others just send digests If the correct result is not received, the client requests that all nodes send the replies. Only useful for large replies

Optimizations: Reducing Messages Tentative replies If a replica's queue is empty, it can compute the result upon the receipt of 2f prepare messages. The replicas then send these tentative replies to the client. If the client receives 2f+1 matching tentative replies, this is equivalent to a commit. If not, it retransmits the request and waits for f+1 committed replies.

Optimizations: Reducing Messages Read-only requests The client can transmit read-only requests directly to all replicas. After verifying the request, the reply can be processed and sent directly to the client. If the client receives 2f+1 identical replies, it accepts the result. If not, it retransmits the request as a normal (read/write) operation.