EEC 688/788 Secure and Dependable Computing

EEC 688/788 Secure and Dependable Computing Lecture 12 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org

EEC688/788: Secure & Dependable Computing
Outline
- Reminder: time to work on the project!
  - Project outline due: Nov 11th in class (hardcopy, no extension!)
  - Topic, title, list of 5 papers
- Distributed consensus and Paxos algorithms
  - Multi-Paxos
  - Dynamic Paxos
  - Cheap Paxos
4/17/2019

Multi-Paxos: Paxos for State Machine Replication
- Client: partially assumes the role of a proposer
- Server replicas: acceptors
- Value to be agreed on: the total ordering of requests sent by clients
- Total ordering is accomplished by running a sequence of instances of Paxos
  - Each instance is assigned a sequence number, representing the position in the total order of the request that is chosen
  - Value to be chosen: the request chosen for the instance

Multi-Paxos: Paxos for State Machine Replication
- Client: partially assumes the role of a proposer
  - Only proposes a value (i.e., the request it sends), without the corresponding proposal number
  - A server replica, the primary, decides on the proposal number
- Primary: essentially the proposer in Paxos
  - Proposes a binding of a sequence number to a request
  - Propagates the value chosen (i.e., total ordering info) to the other replicas (i.e., learners)
- Initial membership is known, with a sole primary
  - The first phase can be omitted during normal operation
- When the primary is suspected, a new primary is elected (view change)

Multi-Paxos: Paxos for State Machine Replication
[Figure: normal operation of Multi-Paxos]
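The normal-operation path described on the preceding slides can be sketched roughly as follows. This is a minimal illustration, not the lecture's implementation: the class names `Replica` and `Primary`, the `up` flag, and the in-process method calls standing in for network messages are all assumptions. The replica list is assumed to contain all 2f+1 acceptors, including the one co-located with the primary.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Acceptor side (hypothetical): remembers what it accepted per sequence number."""
    replica_id: int
    view: int = 0
    accepted: dict = field(default_factory=dict)  # seq -> (view, request)
    up: bool = True  # toy failure switch, not part of the protocol

    def on_accept(self, view, seq, request):
        """Accept the binding unless this replica is down or the view is stale."""
        if not self.up or view < self.view:
            return False
        self.accepted[seq] = (view, request)
        return True


class Primary:
    """Proposer side (hypothetical): binds each request to the next sequence number."""

    def __init__(self, replicas, view=0):
        self.replicas = replicas
        self.view = view
        self.next_seq = 1
        self.chosen = {}  # seq -> request

    def order(self, request):
        """Run phase 2 only; phase 1 is skipped during normal operation."""
        seq = self.next_seq
        self.next_seq += 1
        acks = sum(r.on_accept(self.view, seq, request) for r in self.replicas)
        if acks > len(self.replicas) // 2:  # majority accepted: value is chosen
            self.chosen[seq] = request
            return seq
        return None
```

With three replicas (f = 1), a request is still chosen after one replica goes down, but not after two.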

Multi-Paxos: Checkpointing and Garbage Collection
- Paxos is open-ended: it never terminates
  - A proposer is allowed to initiate a new proposal even if every acceptor has accepted a proposal
  - An acceptor must remember the last proposal that it has accepted and the latest proposal number it has acknowledged (promised)
- In Multi-Paxos, every replica must remember such info for every instance of Paxos: this needs unbounded memory
- Solution: periodic checkpointing, e.g., once every n requests
  - Garbage collect logs after taking each checkpoint
  - A request or control msg needed by a slow replica may no longer be available after a checkpoint => state transfer
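A rough sketch of periodic checkpointing with log garbage collection is given below. The class `ReplicaLog`, its fields, and the tuple shapes returned by `fetch` are hypothetical names chosen for illustration; the application state is modelled as a plain list of executed requests.

```python
class ReplicaLog:
    """Toy replica log that checkpoints every `checkpoint_interval` requests."""

    def __init__(self, checkpoint_interval):
        self.checkpoint_interval = checkpoint_interval
        self.state = []             # toy application state: executed requests
        self.log = {}               # seq -> request, kept until garbage collected
        self.stable_checkpoint = 0  # seq covered by the last checkpoint
        self.snapshot = []          # copy of the state at the last checkpoint

    def execute(self, seq, request):
        """Log and execute a chosen request, checkpointing on every n-th one."""
        self.log[seq] = request
        self.state.append(request)
        if seq % self.checkpoint_interval == 0:
            self.take_checkpoint(seq)

    def take_checkpoint(self, seq):
        """Snapshot the state, then drop log entries the snapshot already covers."""
        self.snapshot = list(self.state)
        self.stable_checkpoint = seq
        self.log = {s: r for s, r in self.log.items() if s > seq}

    def fetch(self, seq):
        """Serve a slow replica: the logged request if it survives, else a state transfer."""
        if seq in self.log:
            return ("request", self.log[seq])
        if seq <= self.stable_checkpoint:
            return ("state_transfer", self.stable_checkpoint, list(self.snapshot))
        return None
```

A replica asking for a request older than the stable checkpoint receives the snapshot instead, which is the state-transfer case mentioned on the slide.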

Multi-Paxos: Leader Election and View Change
- Leader election: can be done using a full Paxos instance
- The new primary needs to determine whether a value has been chosen in each incomplete instance of Paxos
- Leader election and history determination can be done together in a single full Paxos instance: the view change

Multi-Paxos: View Change
- A set of 2f+1 replicas, replica ids: 0, 1, …, 2f
- History of the system: a sequence of views
  - Each view: one and only one primary
  - Initially replica 0 assumes the primary role for v=0
  - Subsequently, replicas take the primary role in a round-robin fashion
- To ensure liveness
  - A replica starts a view change timer on the initiation of each instance of Paxos
  - If the replica does not learn the request chosen before the timer expires => suspect the primary
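The round-robin primary assignment and the per-instance timer can be sketched as follows. The function `primary_for_view` and the class `ViewChangeTimer` are illustrative names, and the clock is passed in explicitly as `now` (an assumption made to keep the example deterministic).

```python
def primary_for_view(view, f):
    """Round-robin over replica ids 0..2f: replica 0 is the primary of view 0."""
    return view % (2 * f + 1)


class ViewChangeTimer:
    """Tracks one deadline per Paxos instance that has started but is not yet learned."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadlines = {}  # seq -> time at which the primary becomes suspect

    def start(self, seq, now):
        """Called when an instance of Paxos is initiated."""
        self.deadlines[seq] = now + self.timeout

    def learned(self, seq):
        """Called when the request chosen for the instance is learned."""
        self.deadlines.pop(seq, None)

    def suspects_primary(self, now):
        """True if any pending instance has outlived its timer."""
        return any(now >= deadline for deadline in self.deadlines.values())
```

For f = 1 the primaries of views 0, 1, 2, 3 are replicas 0, 1, 2, 0.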

View Change
- On suspecting the primary, a replica broadcasts a view change message to all replicas
- The current primary, if it is wrongly suspected, joins the view change anyway (i.e., it steps down from the primary role)
- A replica joins the view change even if its view change timer has not expired yet
- On joining the view change, a replica stops accepting normal control msgs and responds only to checkpointing and view change msgs

View Change
- A view change msg contains
  - New view #
  - Seq# of the last stable checkpoint
  - A set of accepted records since the last stable checkpoint
    - Each record consists of view#, seq#, request msg
- On receiving f+1 view change msgs, the new primary sends a new view msg
  - Includes a set of accept msgs
    - Covers all accepted records carried in the view change msgs
    - When a gap in seq# is detected, create an accept request with a no-op
- A replica accepts the new view msg if it has not installed a newer view
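How a new primary might assemble the accept msgs for its new view msg is sketched below. The dictionary keys (`new_view`, `checkpoint_seq`, `accepted`) and the function name `build_new_view` are hypothetical. The tie-break between conflicting records for the same seq# (keep the record from the highest view) follows the classic Paxos rule and is an assumption here, since the slide does not spell it out.

```python
NO_OP = "no-op"


def build_new_view(view_change_msgs, f):
    """Return the (seq, request) accept msgs for the new view, or None if too few msgs.

    Each view change msg is a dict with keys:
      new_view, checkpoint_seq, accepted (a list of (view, seq, request) records).
    """
    if len(view_change_msgs) < f + 1:
        return None  # the new primary must wait for f+1 view change msgs
    start = max(m["checkpoint_seq"] for m in view_change_msgs)
    best = {}  # seq -> (view, request), keeping the record from the highest view
    for msg in view_change_msgs:
        for view, seq, request in msg["accepted"]:
            if seq > start and (seq not in best or view > best[seq][0]):
                best[seq] = (view, request)
    if not best:
        return []
    # Fill any gap in the sequence numbers with a no-op accept request.
    return [
        (seq, best[seq][1] if seq in best else NO_OP)
        for seq in range(start + 1, max(best) + 1)
    ]
```

For example, if the collected records cover seq# 11 and 13 after a checkpoint at 10, the result binds seq# 12 to a no-op so that execution is not blocked by the gap.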

Dynamic Paxos
- Designed to accommodate reconfiguration
- Extends the majority concept to that of a quorum
  - Classic Paxos => uses a static quorum
  - Dynamic quorum: the quorum size may change dynamically
- Cheap Paxos uses a dynamic quorum

Dynamic Paxos
- Fewer replicas are required by using spare replicas and reconfiguration, provided no other fault occurs during reconfiguration
- Without reconfiguration, 3f+1 replicas can only tolerate up to 3f/2 faulty replicas
- 2f+1 active replicas plus f spares can tolerate up to 3f-1 faulty replicas via substitution and reconfiguration
  - As long as 1 active replica and 1 spare are operating
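The arithmetic behind these two bounds can be checked with a small sketch. The function names are hypothetical, and the dynamic bound reads the slide's condition as "faults arrive one at a time, and at least one active replica and one spare must remain operating".

```python
def static_tolerance(n):
    """Faults tolerated by n replicas with a fixed majority quorum."""
    return (n - 1) // 2


def dynamic_tolerance(active, spares):
    """Faults tolerated with reconfiguration, assuming faults occur one at a time
    and one active replica plus one spare must survive."""
    return active + spares - 2


for f in (1, 2, 3):
    total = 3 * f + 1
    # Static quorum over 3f+1 replicas: floor(3f/2) faults.
    print(f, static_tolerance(total), dynamic_tolerance(2 * f + 1, f))
```

For f = 2 there are 7 replicas in total: a static majority quorum survives 3 faults, while 5 active replicas plus 2 spares can, under the stated assumption, survive 5.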

Dynamic Paxos
- A reconfiguration request must be totally ordered with respect to regular application requests
- A reconf request includes both the new membership and the quorum definition
- Replicas in the new membership should not accept msgs unrelated to reconf from replicas that have been excluded from the membership
  - External replicas should not be allowed to participate in the consensus step
  - A replica mistakenly excluded can rejoin via reconfiguration
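One way to picture these rules is a configuration object that changes only when a reconf entry is executed from the totally ordered log. The class `Configuration`, the entry layout, and the message kind strings are assumptions made for this sketch.

```python
class Configuration:
    """Current membership and quorum size; changed only by ordered reconf entries."""

    def __init__(self, members, quorum_size):
        self.members = set(members)
        self.quorum_size = quorum_size

    def execute(self, entry):
        """Apply one totally ordered log entry; only reconf entries change the config."""
        if entry["kind"] == "reconf":
            self.members = set(entry["members"])
            self.quorum_size = entry["quorum_size"]

    def should_accept(self, sender, kind):
        """Excluded replicas are heard only on reconfiguration-related msgs,
        which is how a mistakenly excluded replica can ask to rejoin."""
        return sender in self.members or kind == "reconf"
```

Because the reconf entry sits in the same log as application requests, every replica switches membership at the same point in the total order.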

Cheap Paxos
- Cheap Paxos is a special instance of Dynamic Paxos
- Aims to minimize the involvement of spare replicas
- Enables the use of f+1 active replicas to tolerate f faults, provided that sufficient spares are available (f or more)
  - Active replicas are referred to as main replicas
  - Spare replicas are referred to as auxiliary replicas

Cheap Paxos
- Primary quorum
  - Consists of all active replicas
- Secondary quorum
  - Must be formed by a majority of the combined (main and auxiliary) replicas
  - Contains at least one main replica => ensures intersection between the primary and secondary quorums
- Question: what if only one active replica is left?
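The two quorum definitions can be expressed as simple set checks. The function names and the use of integer replica ids are illustrative assumptions; `main` is the current set of active replicas and `aux` the auxiliary ones.

```python
def is_primary_quorum(members, main):
    """A primary quorum is exactly the set of all active (main) replicas."""
    return set(members) == set(main)


def is_secondary_quorum(members, main, aux):
    """A secondary quorum is a majority of all replicas with at least one main replica."""
    members, main, aux = set(members), set(main), set(aux)
    everyone = main | aux
    if not members <= everyone:
        return False  # external replicas cannot be part of a quorum
    return len(members) > len(everyone) // 2 and bool(members & main)
```

With main replicas {0, 1} and auxiliary replica {2} (f = 1), the set {0, 2} is a valid secondary quorum and shares replica 0 with the primary quorum {0, 1}.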

Cheap Paxos
- Upon detection of the failure of an active replica, a reconfiguration request is issued
  - The new primary quorum still consists of all surviving active replicas
  - When the reconfig request is executed, switch to the new configuration
- Auxiliary replicas are not bothered unless a reconfiguration is necessary
- What if the primary fails? => view change

Cheap Paxos: View Change
- For history information: the new primary must collect info from every active replica except the old primary
- For approval of the primary role: the new primary must collect votes from all surviving active replicas plus one or more auxiliary replicas => a secondary quorum
- The secondary quorum is used to complete all Paxos instances started by the old primary but not yet completed

Cheap Paxos
- Replicas in the secondary quorum must propagate their knowledge to all replicas prior to moving back to the primary quorum
  - So that auxiliary replicas do not have to keep all requests and control msgs
- How to achieve this
  - The primary notifies all replicas not in the secondary quorum of its latest state
  - A main replica (if it is not in the secondary quorum) requests retransmission of missing msgs
  - An auxiliary replica keeps the new info from the primary, purges old data, and acks to the primary
  - The primary resumes ordering requests after it receives an ack from every replica
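The hand-back from the secondary quorum to the primary quorum can be sketched as a simple ack barrier. The classes `AuxiliaryReplica` and `ResumeBarrier` are hypothetical, and the sketch assumes the primary waits for acks only from the replicas it notified (those outside the secondary quorum), since the members of the secondary quorum already hold the information.

```python
class AuxiliaryReplica:
    """Auxiliary replica that keeps only the latest info pushed by the primary."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.records = {}  # seq -> request

    def on_state_notification(self, latest):
        """Keep the new info, purge everything older, and return the ack."""
        self.records = dict(latest)
        return self.replica_id


class ResumeBarrier:
    """Primary-side bookkeeping: ordering resumes once every notified replica acks."""

    def __init__(self, all_replicas, secondary_quorum):
        self.pending = set(all_replicas) - set(secondary_quorum)

    def on_ack(self, replica_id):
        self.pending.discard(replica_id)

    def can_resume(self):
        return not self.pending
```

Until `can_resume()` returns true, the primary holds back new requests, so no auxiliary replica is left responsible for data that the main replicas lack.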

Example