EEC 688/788 Secure and Dependable Computing Lecture 9 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University


Data and Service Replication
- Replication uses space redundancy to achieve high availability
  - Instead of running a single copy of the service, multiple copies are used
  - The copies are usually deployed across a group of physical nodes for fault isolation
- Data replication and service replication usually use different approaches
  - Transactional data replication
  - Optimistic replication (omitted)
  - Balancing consistency and performance: CAP theorem (omitted)

Data and Service Replication
- Service replication: state machine replication
  - Each replica is modeled as a state machine: state, an interface, and deterministic state changes via the interface
  - Replica consistency requires coordination:
    - Total ordering of requests to the server replicas
    - Sequential execution of requests
- Data replication
  - Clients access the data directly
  - Operations on data: read or write
  - Context: transaction processing, so concurrent access to replicated data is essential
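As a sketch of the state machine model (the class and operation names are illustrative, not from the lecture), a replica exposes an interface and changes state deterministically, so identical request streams produce identical states:

```python
class CounterReplica:
    """A replica modeled as a state machine: encapsulated state, an
    interface, and deterministic state changes via that interface."""

    def __init__(self):
        self.state = 0  # encapsulated state

    def apply(self, request):
        # Deterministic: the new state depends only on the old state
        # and the request, never on local time, randomness, etc.
        if request == "increment":
            self.state += 1
        elif request == "decrement":
            self.state -= 1
        return self.state

# If every replica applies the same totally ordered request stream,
# all replicas end in the same state.
requests = ["increment", "increment", "decrement"]
r1, r2 = CounterReplica(), CounterReplica()
for req in requests:
    r1.apply(req)
    r2.apply(req)
assert r1.state == r2.state == 1
```

This is why the coordination requirement above matters: the guarantee holds only if all replicas see the same requests in the same order.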

Service Replication
- State is encapsulated
- Clients interact with exported interfaces (APIs)
- A replication algorithm is used to coordinate the replicas (for consistency)
- Fault tolerance middleware

2/18/2016 EEC688/788: Secure & Dependable Computing Wenbing Zhao

Replication Styles
- Active replication
  - Every input (request) is executed by every replica
  - Every replica generates the outputs (replies)
  - Voting is needed to cope with non-fail-stop faults
- Passive replication
  - One of the replicas is designated as the primary replica
  - Only the primary replica executes requests
  - The state of the primary replica is transferred to the backups periodically or after every request is processed
- Semi-active replication
  - One of the replicas is designated as the leader (or primary)
  - The leader determines the order of execution
  - Every input is executed by every replica per the leader's instruction

Active Replication
(diagram: an actively replicated client object A invokes an actively replicated server object B through the RM; duplicate invocations and duplicate responses are suppressed)

Active Replication with Voting
Question: to cope with f faults (non-malicious), how many replicas are needed?
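A common answer is 2f+1 replicas: with at most f faulty (but non-malicious) replicas producing wrong values, the f+1 identical correct replies always form a majority. A minimal sketch of the voter (the function name is illustrative):

```python
from collections import Counter

def vote(replies):
    """Return the majority reply among the replicas' outputs.

    With 2f+1 replicas and at most f faulty ones, the f+1
    identical correct replies always win the majority vote."""
    value, count = Counter(replies).most_common(1)[0]
    if count <= len(replies) // 2:
        raise ValueError("no majority: too many conflicting replies")
    return value

# f = 1 fault tolerated with 2f + 1 = 3 replicas:
assert vote([42, 42, 99]) == 42   # the single faulty reply is outvoted
```

Note the contrast with fail-stop faults, where a faulty replica produces no output at all and f+1 replicas would suffice without voting.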

Passive Replication
(diagram: a passively replicated client object A invokes a passively replicated server object B; each side's primary replica handles the invocation and response, and performs state transfer to the backups via the RM)
Question: can passive replication tolerate non-fail-stop faults?

Semi-Active Replication
(diagram: a semi-actively replicated client object A invokes a semi-actively replicated server object B; each side's primary replica sends ordering info to the other replicas via the RM)

Implementation of Service Replication: Ensuring Strong Replica Consistency
- For active replication, use a group communication system or a consensus algorithm that guarantees total ordering of all messages (plus deterministic processing in each replica)
- Passive replication with systematic checkpointing
- Semi-active replication
- Use two-phase commit

Total Ordering of Messages
- What is total ordering of messages?
  - All replicas receive the same set of messages in the same order
  - Atomic multicast: if a message is delivered to one replica, it is also delivered to all non-faulty replicas
- With replication, we need to ensure total ordering of messages sent by a group of replicas to another group of replicas
  - FIFO ordering between one sender and a group is not sufficient
(diagram: messages m1 and m2 arriving in different orders at different replicas)
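One classic way to obtain total ordering (a sketch of the general sequencer technique, not necessarily the protocol covered in the lecture; the class names are illustrative) is to route every multicast through a sequencer that stamps a global sequence number, while replicas buffer out-of-order messages and deliver strictly in stamp order:

```python
import heapq
import itertools

class Sequencer:
    """Stamps each multicast message with a global sequence number."""
    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, msg):
        return (next(self._counter), msg)

class Replica:
    """Buffers out-of-order messages; delivers them in stamp order."""
    def __init__(self):
        self._expected = 0
        self._buffer = []      # min-heap of (seq, msg)
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self._buffer, stamped)
        # Deliver every buffered message whose turn has come.
        while self._buffer and self._buffer[0][0] == self._expected:
            _, msg = heapq.heappop(self._buffer)
            self.delivered.append(msg)
            self._expected += 1

seq = Sequencer()
m1, m2 = seq.stamp("m1"), seq.stamp("m2")

r1, r2 = Replica(), Replica()
r1.receive(m1); r1.receive(m2)     # arrives in order
r2.receive(m2); r2.receive(m1)     # network reordered the messages
assert r1.delivered == r2.delivered == ["m1", "m2"]
```

Even though the network delivered the messages to r2 in a different order, both replicas deliver the same sequence, which is exactly the total-ordering property defined above.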

Potential Sources of Non-determinism
- Multithreading
  - The order of accesses to shared data by different threads might not be the same at different replicas
- System calls/library calls
  - A call at one replica might succeed while the same call fails at another, e.g., memory allocation, file access
- Host/process-specific information
  - Host name, process id, etc.
  - Local clocks: gettimeofday()
- Interrupts
  - Delivered and handled asynchronously, which is a big problem
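To illustrate the local-clock case: if each replica calls the clock itself, their states diverge. One common remedy (sketched here with illustrative names) is to have the primary sample the clock once and ship the value inside the ordered request, so every replica records the same timestamp:

```python
import time

class BankReplica:
    def __init__(self):
        self.log = []

    def execute(self, request, primary_timestamp):
        # Use the timestamp chosen by the primary instead of calling
        # time.time() locally, so every replica records the same value.
        self.log.append((request, primary_timestamp))

ts = time.time()                 # sampled once, by the primary
b1, b2 = BankReplica(), BankReplica()
for b in (b1, b2):
    b.execute("deposit", ts)
assert b1.log == b2.log          # replicas stay identical
```

The same pattern applies to other non-deterministic inputs (random numbers, process ids): let one replica make the choice and propagate it with the ordered request.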

Data Replication
- Transactional data replication
  - Read/write operations on a set of data items within the scope of a transaction
  - At the transaction level, executions appear to be sequential (one-copy serializable)
  - The actual operations on each data item are often concurrent
- Optimistic data replication
  - Eventual consistency: eventually, all updates will be propagated to all replicas

Transactional Data Replication
- One-copy serializable
  - A transactional data replication algorithm should ensure that the replicated data appear to the clients as a single copy
  - The interleaving of the execution of the transactions must be equivalent to a sequential execution of those transactions on a single copy of the data
- Make read operations cheaper than updates: read operations are more prevalent
- It is challenging to design sound replication algorithms

Wrong Data Replication Algorithms
- Write-all
  - A read operation on a data item x can be mapped to any replica of x
  - A write on x must be applied to all replicas of x
- Problem: what if a replica becomes faulty? The write blocks! Any single replica fault brings down the entire system

Wrong Data Replication Algorithms
- Write-all-available
  - A read operation on a data item x can be mapped to any replica of x
  - A write on x is applied to all available replicas of x
- Problem: cannot ensure one-copy serializable execution!

Attempting to Fix Write-All-Available
- The problem is caused by accessing a not-fully-recovered replica, so how about preventing that?
- This still won't work. Consider:
  - Ti: R(x), W(y)
  - Tj: R(y), W(x)
  - Ti does not precede Tj, because Tj reads y before Ti writes to y
  - Tj does not precede Ti, because Ti reads x before Tj writes to x
  - Hence, Ti and Tj are not serializable!

Insight into the Problem
- The problem is caused by conflicting operations being performed at different replicas; we must prevent this from happening
- A solution: use quorum-based consensus
- What is a quorum?
  - Given a system with n processes, a quorum is formed by a subset of the processes in the system
  - Any two quorums must intersect in at least one process
  - Read quorum: a quorum formed for read operations
  - Write quorum: a quorum formed for write operations

A Quorum-Based Replication Algorithm
- Basic idea
  - Write operations are applied to a write quorum
  - Read operations are applied to a read quorum
  - Fault tolerance: given N total replicas and a write quorum of size W (where W >= the read quorum size R), the system can tolerate up to N-W failures
- Quorum rule
  - Each replica is assigned a positive weight, e.g., 1
  - A read quorum has a minimum total weight RT
  - A write quorum has a minimum total weight WT
  - RT + WT > total weight, and 2WT > total weight
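The two inequalities in the quorum rule can be checked mechanically. A small sketch, assuming unit weights so the total weight equals the number of replicas n (the function name is illustrative):

```python
def valid_quorum_sizes(n, rt, wt):
    """Check the quorum rule for n replicas of weight 1 each.

    rt + wt > n guarantees every read quorum intersects every write
    quorum (so a read always sees the latest write); 2 * wt > n
    guarantees any two write quorums intersect (so two concurrent
    writes cannot both succeed on disjoint replica sets)."""
    return rt + wt > n and 2 * wt > n

# n = 5 replicas:
assert valid_quorum_sizes(5, 3, 3)        # majority quorums work
assert valid_quorum_sizes(5, 2, 4)        # cheap reads, expensive writes
assert not valid_quorum_sizes(5, 2, 3)    # 2 + 3 = 5: a read quorum can miss a write quorum
```

The second valid configuration (RT = 2, WT = 4) shows how the rule supports the earlier goal of making reads cheaper than writes when reads are more prevalent.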

A Quorum-Based Replication Algorithm
- Since an update is applied only to a quorum of replicas, we need to track which replica has the latest value; use version numbers
  - The version number is incremented after each update
- Read rule
  - A read on data item x is mapped to a read quorum of replicas of x
  - Each replica returns both the value of x and its version number
  - The client selects the value that has the highest version number

A Quorum-Based Replication Algorithm
- Write rule
  - A write operation on data item x is mapped to a write quorum of replicas of x
  - First, retrieve the version numbers from the replicas and set v = v_max + 1 for this write operation
  - Then write the new value with version number v to the replicas in the write quorum; each replica overwrites both the value and the version number
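Putting the read and write rules together, a minimal sketch (class and method names are illustrative; replica selection and failure handling are omitted, and the quorums here are simply the first R or W replicas):

```python
class VersionedReplica:
    def __init__(self):
        self.value = None
        self.version = 0

class QuorumClient:
    def __init__(self, replicas, read_quorum, write_quorum):
        self.replicas = replicas
        self.rq = read_quorum     # number of replicas to read from
        self.wq = write_quorum    # number of replicas to write to

    def read(self):
        # Read rule: query a read quorum and return the value
        # carrying the highest version number.
        quorum = self.replicas[:self.rq]
        best = max(quorum, key=lambda r: r.version)
        return best.value, best.version

    def write(self, value):
        # Write rule: fetch versions from a write quorum, set
        # v = v_max + 1, and overwrite value and version at every
        # replica in the quorum.
        quorum = self.replicas[:self.wq]
        v = max(r.version for r in quorum) + 1
        for r in quorum:
            r.value, r.version = value, v

replicas = [VersionedReplica() for _ in range(5)]
client = QuorumClient(replicas, read_quorum=2, write_quorum=4)
client.write("a")                      # version 1 on 4 of the 5 replicas
value, version = client.read()
assert (value, version) == ("a", 1)    # read quorum overlaps the write quorum
```

Because RT + WT = 2 + 4 > 5, the read quorum is guaranteed to contain at least one replica from the latest write quorum, and the version number identifies it.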

Quorum-Based Replication Algorithm: Example