Distributed Systems Overview
Ali Ghodsi

Replicated State Machine (RSM)

Distributed Systems 101:
– Fault-tolerance (partial, Byzantine, recovery, ...)
– Concurrency (ordering, asynchrony, timing, ...)

Generic solution for distributed systems: the Replicated State Machine approach
– Represent your system as a deterministic state machine
– Replicate the state machine
– Feed inputs to all replicas in the same order
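
To make this concrete, here is a minimal Python sketch (the key-value state machine and all names in it are illustrative assumptions, not from the slides): because the machine is deterministic, replicas that apply the same commands in the same order end up in identical states.

    # Minimal RSM sketch; the KV state machine is an illustrative assumption.
    class KVStateMachine:
        """Deterministic: the same commands in the same order
        always produce the same state."""
        def __init__(self):
            self.store = {}

        def apply(self, command):
            op, key, value = command
            if op == "put":
                self.store[key] = value

    # Replicate the state machine and feed the same ordered log to every replica.
    log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]
    replicas = [KVStateMachine() for _ in range(3)]
    for cmd in log:
        for replica in replicas:
            replica.apply(cmd)

    assert all(r.store == replicas[0].store for r in replicas)  # identical state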

Total Order Reliable Broadcast (aka Atomic Broadcast)

Reliable broadcast
– All or none of the correct nodes get the message (even if the source fails)

Atomic broadcast
– Reliable broadcast that additionally guarantees that all messages are delivered in the same order

A replicated state machine is trivial to build given atomic broadcast (see the sketch below).
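
Given atomic broadcast, each replica needs only two hooks: submit commands by broadcasting them, and apply them in delivery order. A sketch assuming a hypothetical abcast object with broadcast/on_deliver methods:

    # Sketch: RSM on top of atomic broadcast (the abcast API is hypothetical).
    class Replica:
        def __init__(self, state_machine, abcast):
            self.sm = state_machine
            self.abcast = abcast
            abcast.on_deliver(self.deliver)   # abcast delivers in total order

        def submit(self, command):
            self.abcast.broadcast(command)    # delivered at all correct nodes or none

        def deliver(self, command):
            self.sm.apply(command)            # same order everywhere => same state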

Consensus?

The consensus problem
– All nodes propose a value
– All correct nodes must agree on one of the proposed values
– A decision must eventually be reached (availability)

Atomic Broadcast → Consensus
– Broadcast your proposal; decide on the first value delivered

Consensus → Atomic Broadcast
– Unreliably broadcast each message to all nodes
– Run one consensus instance per round, proposing the set of messages seen but not yet delivered
– Each round, deliver the decided messages

Hence Atomic Broadcast is equivalent to Consensus.
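
A sketch of the Consensus → Atomic Broadcast direction, with the per-round consensus instance as an assumed black box; delivering the decided batch in a deterministic order subsumes delivering one decided message at a time:

    # Sketch of the Consensus -> Atomic Broadcast reduction.
    # consensus(r, proposal) is an assumed black box: the round-r instance
    # returns the same decided batch at every correct node.
    def atomic_broadcast_loop(consensus, seen, delivered, deliver):
        # seen(): messages unreliably broadcast/received so far (grows over time)
        r = 0
        while True:
            r += 1
            pending = sorted(seen() - delivered)   # seen but not yet delivered
            batch = consensus(r, pending)          # one consensus instance per round
            for m in sorted(batch):                # deterministic delivery order
                if m not in delivered:
                    delivered.add(m)
                    deliver(m)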

Consensus impossible (FLP)

No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model.

1-crash-robust
– Up to one node may crash

Asynchronous model
– No global clock
– No bound on message delay

Life after the impossibility of consensus: what can we do?

Solving Consensus with Failure Detectors

A failure detector (FD) is a black box that tells us whether a node has failed.

Perfect failure detector
– Completeness: it will eventually tell us if a node has failed
– Accuracy (no lying): it will never tell us a node has failed if it hasn't

Perfect FD → Consensus (rotating coordinator):

    xi := input
    for r := 1 to N do
        if r = p then
            forall j do send xi to j
        if collect x' from r then    (wait until x' arrives or the FD reports r failed)
            xi := x'
    end
    decide xi
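
The same algorithm in Python, with the perfect failure detector and the messaging primitives (send_to_all, recv_from, fd.has_failed) as assumed black boxes:

    # Rotating-coordinator consensus with a perfect FD (primitives are assumed).
    def consensus(p, N, x, send_to_all, recv_from, fd):
        for r in range(1, N + 1):
            if r == p:
                send_to_all(x)        # round-r coordinator broadcasts its value
                continue              # coordinator keeps its own value this round
            while True:
                msg = recv_from(r)    # non-blocking: coordinator r's value or None
                if msg is not None:
                    x = msg           # adopt the coordinator's value
                    break
                if fd.has_failed(r):  # accuracy: r really failed; completeness:
                    break             # we won't wait forever on a crashed node
        return x                      # after N rounds all correct nodes hold the same x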

Solving Consensus

Consensus → Perfect FD?
– No. We don't know whether a node actually failed or not!

What is the weakest FD for solving consensus?
– The one making the least assumptions on top of the asynchronous model!

Enter Omega (Ω)

Leader election
– Eventually, every correct node trusts some correct node
– Eventually, no two correct nodes trust different correct nodes

Failure detection and leader election are the same
– Failure detection captures failure behavior: detect failed nodes
– Leader election also captures failure behavior: detect correct nodes (a single one, the same for all)

Formally, leader election is an FD
– It always suspects all nodes except one (the leader)
– It ensures certain properties regarding that node
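
In practice Ω is often approximated with heartbeats: trust the lowest-id node heard from recently. A sketch with hypothetical names, correct only under partial synchrony (eventually, timeouts stop expiring spuriously):

    # Sketch of an Omega implementation from heartbeats (names are assumptions).
    import time

    def omega_leader(node_ids, last_heartbeat, timeout=3.0):
        """Trust the smallest-id node heard from within `timeout` seconds;
        under partial synchrony, all correct nodes eventually agree on it."""
        now = time.time()
        alive = [n for n in sorted(node_ids)
                 if now - last_heartbeat.get(n, 0.0) < timeout]
        return alive[0] if alive else None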

Weakest Failure Detector for Consensus

Ω is the weakest failure detector for consensus
– How do we prove it?
– It is easy to implement in practice

High Level View of Paxos

Elect a single proposer using Ω
– The proposer imposes its proposal on everyone
– Everyone decides
– Done!

Problem with Ω
– Several nodes might initially be proposers (contention)

The solution is abortable consensus
– A proposer attempts to enforce a decision
– It might abort if there is contention (safety)
– Ω ensures that eventually one proposer succeeds (liveness)
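
A sketch of one such attempt from the proposer's side, shaped like single-decree Paxos (the acceptor API and reply fields are assumptions): the call returns None, i.e. aborts, on contention, and the proposer retries with a higher ballot once Ω has settled on it.

    # Single-decree Paxos, proposer side (acceptor API is an assumption).
    def propose(ballot, value, acceptors):
        quorum = len(acceptors) // 2 + 1

        # Phase 1: prepare. Acceptors promise to ignore lower ballots.
        promises = [a.prepare(ballot) for a in acceptors]
        granted = [p for p in promises if p.ok]
        if len(granted) < quorum:
            return None                      # abort: a higher ballot is active

        # Safety rule: adopt the value of the highest-ballot accepted proposal.
        accepted = [p for p in granted if p.accepted_value is not None]
        if accepted:
            value = max(accepted, key=lambda p: p.accepted_ballot).accepted_value

        # Phase 2: accept. The value is chosen once a quorum accepts it.
        acks = [a.accept(ballot, value) for a in acceptors]
        if sum(1 for ack in acks if ack.ok) < quorum:
            return None                      # abort: contention, retry higher ballot
        return value                         # decided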

Replicated State Machine

Paxos approach (Lamport)
– A client sends its input to the leader
– The leader executes a Paxos instance to agree on the command
– Well understood: many papers and optimizations

Viewstamped Replication approach (Liskov)
– One leader writes commands directly to a quorum (no Paxos in the failure-free case)
– When failures happen, use Paxos to agree
– Less well understood (see Mazieres' tutorial)
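
A sketch of the Paxos-approach leader loop, running one Paxos instance per log slot (it reuses the propose sketch above; the slot and client helpers are assumptions):

    # Leader loop: agree on one command per slot, feed decided commands to the RSM.
    def leader_loop(next_command, acceptors_for_slot, apply, ballot):
        slot = 0
        pending = next_command()                 # client input arrives at the leader
        while pending is not None:
            chosen = propose(ballot, pending, acceptors_for_slot(slot))
            if chosen is None:
                ballot += 1                      # aborted: retry with a higher ballot
                continue
            apply(slot, chosen)                  # decided command for this slot
            slot += 1
            if chosen == pending:                # our command made it into the log
                pending = next_command()
            # else: another proposer's command won this slot; retry ours next slot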

Paxos Siblings

Cheap Paxos (LM'04)
– Fewer messages
– Directly contact a quorum (e.g. 3 nodes out of 5)
– If you fail to get responses from those 3, expand to all 5

Fast Paxos (L'06)
– Reduces from 3 message delays to 2 message delays
– Clients optimistically write directly to a quorum
– Requires recovery when concurrent writes collide

Paxos Siblings

Gaios/SMARTER (Bolosky'11)
– Makes logging to disk efficient for crash-recovery
– Uses pipelining and batching

Generalized Paxos (LM'05)
– Exploits commutative operations in the replicated state machine

Atomic Commit

The atomic commit problem
– Commit iff there are no failures and everyone votes commit
– Else abort

Consensus on Transaction Commit (LG'04)
– One Paxos instance per resource manager (RM)
– Commit only if every instance decided Commit
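
A sketch of the resulting decision rule, assuming each per-RM Paxos instance decides either "prepared" or "abort":

    # Paxos Commit decision rule: commit iff every per-RM instance decided "prepared".
    def transaction_outcome(decided_votes):
        if all(v == "prepared" for v in decided_votes):
            return "commit"
        return "abort"      # any "abort" decision (e.g. a failed RM) aborts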

Reconfigurable Paxos

Changing the set of nodes
– Replace failed nodes
– Add/remove nodes (changes the size of a quorum)

Lamport's idea
– Make the set of nodes part of the state of the state machine

SMART (EuroSys'06)
– Addresses many problems (e.g. moving from {A,B,C} to {A,B,D} when A fails)
– Basic idea: run multiple Paxos instances side by side
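
A sketch of Lamport's idea, with the configuration stored in the replicated state and changed by ordinary commands (names are illustrative; real designs only activate a new configuration some fixed number of slots later):

    # The node set as part of the replicated state (reconfiguration by command).
    class ConfigStateMachine:
        def __init__(self, nodes):
            self.nodes = set(nodes)           # current configuration

        def apply(self, command):
            op, node = command
            if op == "add":
                self.nodes.add(node)          # later slots use the new quorums
            elif op == "remove":
                self.nodes.discard(node)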