Distributed Systems, Consensus and Replicated State Machines


Distributed Systems, Consensus and Replicated State Machines Ali Ghodsi – UC Berkeley alig(at)cs.berkeley.edu

What’s a distributed system? “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” – Leslie Lamport

Two Generals’ Problem: Two generals need to coordinate an attack. They must agree on a time to attack, and they win only if they attack simultaneously. They communicate through messengers, and messengers may be killed on their way.

Two Generals’ Problem: Let’s try to solve it for generals g1 and g2. g1 sends the time of attack to g2. Problem: how can g1 ensure g2 received the message? Solution: let g2 ack receipt of the message, then g1 attacks. Problem: how can g2 ensure g1 received the ack? Solution: let g1 ack the receipt of the ack… and so on. This problem is impossible to solve!

Teaser: Two Generals’ Problem. Applicability to distributed systems: two nodes need to agree on a value and communicate by messages over an unreliable channel. Agreement is a core problem…

How it all started: commit protocols for databases, needed to implement Atomicity in ACID. Every node needs to agree on COMMIT/ABORT. Two-phase commit has been known since 1976. The problem? It is centralized and blocks if the coordinator fails.
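
As a reference point for the discussion above, here is a minimal in-memory sketch of the two-phase commit idea. The class and method names (Coordinator, Participant, vote, finish) are illustrative only; there is no real networking or persistence, and the blocking problem shows up when the coordinator disappears between the two phases.

```python
# Minimal in-memory sketch of two-phase commit (no networking, no persistence).
# All names here are illustrative, not from any real library.

class Participant:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit, self.decision = name, will_commit, None

    def vote(self):                      # phase 1: vote COMMIT or ABORT
        return "COMMIT" if self.will_commit else "ABORT"

    def finish(self, decision):          # phase 2: apply coordinator's decision
        self.decision = decision


class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run(self):
        votes = [p.vote() for p in self.participants]              # phase 1
        decision = "COMMIT" if all(v == "COMMIT" for v in votes) else "ABORT"
        for p in self.participants:                                # phase 2
            p.finish(decision)
        return decision


if __name__ == "__main__":
    nodes = [Participant("a"), Participant("b"), Participant("c", will_commit=False)]
    print(Coordinator(nodes).run())      # -> ABORT, because one node voted no
```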

Bad news theorem: after years of attempts to solve the problem, Fischer, Lynch, and Paterson proved that a basic version of it is impossible in most circumstances.

Consensus: agreeing on a number. The consensus problem: all nodes propose a value, and some nodes might crash and stop responding. The algorithm must ensure that all correct nodes eventually decide, that every node decides the same value, and that nodes only decide on proposed values.
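
To make the three requirements concrete, here is a hypothetical single-process stand-in for a consensus object. The interface (a propose call that returns the decision) is illustrative; a real implementation needs an algorithm such as Paxos behind it.

```python
# Hypothetical consensus interface; a trivial single-process stand-in that
# "decides" the first proposed value. Real systems need Paxos/Raft behind this.

class Consensus:
    def __init__(self):
        self._decided = None

    def propose(self, value):
        # Termination: every correct node eventually obtains a decision.
        # Agreement:   every call returns the same decided value.
        # Validity:    the decision is one of the proposed values.
        if self._decided is None:
            self._decided = value
        return self._decided
```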

Bad news theorem (FLP85): in an asynchronous system, Consensus cannot be solved even with only one crash failure. In asynchronous systems, message delays can be arbitrary, i.e. not bounded. If it cannot be solved with 1 failure, it certainly cannot be solved with n > 1 failures. The Internet is essentially an asynchronous system. But Consensus != Atomic Commit.

Consensus is important: Atomic Commit. There are only two proposal values, {commit, abort}, and we may only decide commit if all nodes vote commit. This related problem is even harder to solve than consensus, and it is also impossible in asynchronous systems.

Reliable Broadcast Problem: a node broadcasts a message; if the sender is correct, all correct nodes deliver the message, and all correct nodes deliver the same messages. There is a very simple solution that works in any environment: every node re-broadcasts every message it delivers, at a cost of O(N²) messages.
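
A minimal sketch of that relay algorithm, simulated in memory (the Node class and the shared network list are illustrative assumptions): because every node forwards every message it delivers, even if the sender crashes halfway through its broadcast, any node that received the message passes it on to everyone else.

```python
# Sketch of eager reliable broadcast: every node relays every message it
# delivers, giving O(N^2) messages per broadcast. In-memory simulation.

class Node:
    def __init__(self, node_id, network):
        self.id, self.network, self.delivered = node_id, network, set()

    def broadcast(self, msg):
        for peer in self.network:
            peer.receive(msg)

    def receive(self, msg):
        if msg not in self.delivered:
            self.delivered.add(msg)      # deliver at most once
            self.broadcast(msg)          # relay to everyone else


network = []
network.extend(Node(i, network) for i in range(4))
network[0].broadcast("hello")
assert all(n.delivered == {"hello"} for n in network)
```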

Atomic Broadcast Problem: a node broadcasts a message; if the sender is correct, all correct nodes deliver the message; all correct nodes deliver the same messages, and messages are delivered in the same order.

Atomic Broadcast = Consensus. Given Atomic Broadcast, we can use it to solve Consensus. How? Every node broadcasts its proposal and decides on the first proposal it delivers; since messages are delivered in the same order everywhere, all nodes decide the same value. Given Consensus, we can likewise use it to solve Atomic Broadcast. Hence Atomic Broadcast is equivalent to Consensus.
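
A sketch of the first direction of the reduction, under the simplifying assumption that atomic broadcast is simulated by a single shared, ordered log (the names SimulatedAtomicBroadcast and consensus_via_abcast are hypothetical):

```python
# Sketch of "Atomic Broadcast -> Consensus": each node a-broadcasts its
# proposal and decides the first delivered value. The same delivery order at
# every node guarantees that every node decides the same value.

class SimulatedAtomicBroadcast:
    """Stand-in for atomic broadcast: one shared log fixes the delivery order."""
    def __init__(self):
        self.log = []

    def abroadcast(self, msg):
        self.log.append(msg)

    def deliveries(self):
        return list(self.log)


def consensus_via_abcast(abcast, proposals):
    for node, value in proposals.items():        # every node broadcasts its proposal
        abcast.abroadcast((node, value))
    # every node decides on the first proposal it delivers
    return {node: abcast.deliveries()[0][1] for node in proposals}


abc = SimulatedAtomicBroadcast()
print(consensus_via_abcast(abc, {"n1": 7, "n2": 42, "n3": 7}))
# every node decides 7, the first value in the shared delivery order
```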

Possibility of Consensus: Consensus is solvable in a synchronous system with up to N/2 crashes. A synchronous system has a bound on message delay. The intuition behind the solution is accurate crash detection: every node sends a message to every other node, and if no message arrives from a node within the bound, that node has crashed. This is not useful for the Internet, so how do we proceed?

Modeling the Internet: the Internet is mostly synchronous; bounds are respected most of the time but occasionally violated (congestion, failures). How do we model this? As a partially synchronous system: initially the system is asynchronous, but eventually it becomes synchronous.

Failure detectors: let each node use a failure detector that detects crashes, implemented by heartbeats and waiting. It might be wrong initially, but it is eventually correct. Consensus and Atomic Broadcast are solvable with such failure detectors. Obviously, these failure detectors are impossible in a purely asynchronous system too, but they are useful for encapsulating all the asynchrony assumptions inside the failure detector algorithm.
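
A minimal sketch of the heartbeats-and-waiting idea, using a logical clock instead of wall-clock time so the example is deterministic (HeartbeatDetector and its methods are illustrative names, not from any library):

```python
# Sketch of a heartbeat-based failure detector: a peer is suspected if no
# heartbeat has been received from it within `timeout` ticks.

class HeartbeatDetector:
    def __init__(self, peers, timeout):
        self.timeout = timeout
        self.last_heard = {p: 0 for p in peers}

    def heartbeat(self, peer, now):
        self.last_heard[peer] = now

    def suspected(self, now):
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}


fd = HeartbeatDetector(["n1", "n2", "n3"], timeout=3)
fd.heartbeat("n1", now=7)
fd.heartbeat("n2", now=7)
print(fd.suspected(now=9))   # {'n3'}: nothing heard from n3 within the timeout
```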

Useless failure detectors: how do we create a failure detector with no false negatives, i.e., one that never says a failed node is correct? (Trivially: always suspect every node.) How do we create one with no false positives, i.e., one that never says a correct node has failed? (Trivially: never suspect anyone.) Either property alone is trivial to achieve and useless.

Eventually Perfect FD. Properties: every failed node is eventually detected as failed, and eventually no correct node is detected as failed. Initially all bets are off and the FD might output anything, but eventually it stops giving false positives. How do we implement it?
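
One common way to get eventually perfect behaviour in a partially synchronous system is to grow the timeout every time a suspicion turns out to be wrong; once the system becomes synchronous, the timeouts stop growing and false suspicions stop. A sketch under that assumption, extending the heartbeat idea above (all names illustrative):

```python
# Sketch of an eventually perfect failure detector: whenever a suspected node
# later sends a heartbeat (a false positive), its timeout is doubled.

class EventuallyPerfectDetector:
    def __init__(self, peers, initial_timeout):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0 for p in peers}
        self.suspects = set()

    def heartbeat(self, peer, now):
        if peer in self.suspects:        # we were wrong about this node: back off
            self.suspects.discard(peer)
            self.timeout[peer] *= 2
        self.last_heard[peer] = now

    def tick(self, now):
        for p, t in self.last_heard.items():
            if now - t > self.timeout[p]:
                self.suspects.add(p)
        return set(self.suspects)
```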

Failure detection and Leader Election. Leader election (LE) is a special case of failure detection: always suspect every node as failed except one correct node. We can implement LE with an eventually perfect FD. How? Pick the highest-ID correct (non-suspected) node as leader.
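
A sketch of that rule on top of an eventually perfect failure detector (the suspected set would come from a detector like the one above; the function name is illustrative): once the detector stabilises, all nodes suspect the same crashed nodes and therefore pick the same leader.

```python
# Sketch of leader election from an eventually perfect failure detector:
# every node elects the highest-ID node it does not currently suspect.

def elect_leader(all_nodes, suspected):
    alive = [n for n in all_nodes if n not in suspected]
    return max(alive) if alive else None


print(elect_leader({1, 2, 3, 4}, suspected={4}))   # -> 3
```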

All problems solved: with LE we can solve Atomic Commit, Atomic Broadcast, Eventually Perfect Failure Detection, and Consensus. In particular, the Paxos consensus algorithm is implemented with LE.

One special failure detector: the Omega failure detector. Every failed node is eventually detected as failed, and eventually at least one correct node is never suspected as failed by any node. Initially all bets are off and the FD might output anything, but eventually no node gives false positives with respect to at least one node. How do we implement it?

Failure Detection and Consensus: Omega is the weakest failure detector needed to solve Consensus. This is arguably the second most important result in distributed systems.

RSM? So many problems, all interrelated. How should we build distributed systems? The Replicated State Machine (RSM) approach: model your application as an RSM, replicate the RSM to N servers, have clients/users submit their inputs to all servers, and have the servers agree on the order of inputs. All servers will then have the same state and output.
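
A minimal sketch of that idea: a deterministic state machine plus an agreed-upon log of inputs. Here agreed_log simply stands in for the output of consensus or atomic broadcast, and the key-value machine is an illustrative example application; any replica that applies the same log in the same order ends in the same state.

```python
# Sketch of a replicated state machine: a deterministic machine applied to an
# agreed sequence of inputs. `agreed_log` stands in for consensus/atomic
# broadcast; applying it in order makes every replica reach the same state.

class KeyValueStateMachine:
    def __init__(self):
        self.store = {}

    def apply(self, command):
        op, key, *rest = command
        if op == "put":
            self.store[key] = rest[0]
        return self.store.get(key)


agreed_log = [("put", "x", 1), ("put", "y", 2), ("get", "x")]

replicas = [KeyValueStateMachine() for _ in range(3)]
for command in agreed_log:
    for r in replicas:
        r.apply(command)

assert all(r.store == replicas[0].store for r in replicas)   # identical state
```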

RSM (2). The advantage of RSM? It makes any application trivially fault tolerant. Distributed file system example: each server implements a filesystem, and each input (read/write) is run through consensus. Voilà: a fault-tolerant FS.

Paxos vs RSM. Adapting Paxos for RSM: use Paxos to agree on the input order to the RSM. Paxos takes 2 round-trips, but RSM optimizations make it 1 round trip. How? The Prepare phase doesn't need actual values, so run Prepare once for thousands of inputs at a time. How do you add/remove nodes to the RSM (reconfiguration)? Use Paxos to agree on the set of nodes in the system.
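
A heavily simplified sketch of that one-round-trip optimization, assuming a single stable leader and no failures (the Acceptor/Leader classes and the in-memory "messaging" are illustrative, not a complete Multi-Paxos): the leader runs the Prepare phase once for its ballot, covering all future log slots, and afterwards each command needs only one Accept round to a majority.

```python
# Sketch of the Multi-Paxos "1 round trip" optimization: Prepare once per
# leadership for all slots, then one Accept round per command. In-memory
# simulation with a single stable leader and no failures.

class Acceptor:
    def __init__(self):
        self.promised = -1
        self.accepted = {}                   # slot -> (ballot, value)

    def prepare(self, ballot):               # phase 1, done once per leadership
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, ballot, slot, value):   # phase 2, once per command
        if ballot >= self.promised:
            self.accepted[slot] = (ballot, value)
            return True
        return False


class Leader:
    def __init__(self, acceptors, ballot):
        self.acceptors, self.ballot, self.next_slot = acceptors, ballot, 0
        # One Prepare round covering every future slot (the expensive part).
        assert sum(a.prepare(ballot) for a in acceptors) > len(acceptors) // 2

    def replicate(self, command):
        slot, self.next_slot = self.next_slot, self.next_slot + 1
        acks = sum(a.accept(self.ballot, slot, command) for a in self.acceptors)
        return acks > len(self.acceptors) // 2   # chosen after one round trip


acceptors = [Acceptor() for _ in range(3)]
leader = Leader(acceptors, ballot=1)
print(leader.replicate(("put", "x", 1)))   # True: one Accept round per command
```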

Raft vs Paxos. Paxos was initially designed for Consensus: its correctness is easy to understand, but it is harder to know how to implement the algorithm. You need all the optimizations to make it suitable for RSM, you need to implement reconfiguration on top, and you end up with a family of Paxos algorithms.

Raft vs Paxos. Raft is a purpose-made algorithm for RSM: less declarative, more imperative. It builds in leader election, the leader replicates the log, it supports reconfiguration, and it has many implementations.

Summary: Distributed systems are hard to build because of failures and parallelism. Many problems are interrelated: Consensus, Atomic Commit, Atomic Broadcast, Leader Election, and Failure Detection. The Replicated State Machine is a generic approach to make any distributed system fault-tolerant; it can be implemented with Paxos or Raft, and Raft is more straightforward to implement.