Distributed Systems Fall 2010 Replication

Fall 2010, 5DV020

Outline
Group communication
Fault-tolerant services
  – Passive and active replication
Highly available services

Group communication
Static vs. dynamic groups
Primary-partition vs. partitionable groups
Group management
  – Interface for membership changes
  – Failure detection
  – Notification upon membership changes
  – Group address expansion

Group views
Views contain the set of members at a given point in time
  – Processes identified as failed are not in the view
Events occur in views
View-synchronous group communication
  – Based on view delivery, we can know which messages must have been delivered to other members

View-synchronous group communication
Correct processes deliver the same set of messages in any given view
Messages are delivered at most once
Correct processes always deliver the messages they send
  – If delivering to q fails, the next view excludes q

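Why replication?
Many algorithms require a working server node
Performance (load balancing)
Increased availability
  – 1 – P(all replicas crashed) = 1 – p^n, where p is the probability that a single replica has crashed, n is the number of replicas, and crashes are assumed independent
Fault tolerance
  – Correct servers in majority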
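The availability formula can be tried out with a few numbers. The sketch below is only an illustration: the function name `availability` and the sample crash probability of 0.05 are assumptions, not values from the lecture, and the formula holds only if replicas crash independently.

```python
def availability(p: float, n: int) -> float:
    """Probability that at least one of n replicas is up.

    Assumes each replica is crashed independently with probability p,
    so all n are crashed at once with probability p ** n.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - p ** n


# Hypothetical crash probability of 5% per replica.
for n in (1, 2, 3):
    print(f"{n} replica(s): availability = {availability(0.05, n):.6f}")
```

With these assumed numbers, going from one replica to two already reduces the unavailability from 0.05 to 0.05 squared, that is 0.0025.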

Replication
Replication transparency
  – Client unaware of replication
Problem with >1 client
  – Concurrent access, rather than exclusive
  – Operations are interleaved
How do we ensure correctness?

Correctness of interleavings
Always
  – The interleaved sequence of operations must meet the specification of a single correct copy of the object(s)
Sequential consistency property
  – Order of operations is consistent with the program order in which each individual process executed them
Linearizability property
  – Order of operations is consistent with the real times at which the operations occurred during execution

Example (interleaved operations)
C1: A, B, C
C2: d, e, f
Order during execution: A, B, d, C, e, f
An interleaving with sequential consistency: A, B, d, e, f, C
An interleaving with linearizability: A, B, d, C, e, f
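The ordering conditions in this example can be checked mechanically. The sketch below covers only the ordering part of each property; it does not check that the interleaving meets the specification of a single correct copy. It also assumes that operations do not overlap in time, so the real-time order is simply the execution order. The function names are made up for illustration.

```python
from typing import Dict, List, Sequence


def respects_program_order(interleaving: Sequence[str],
                           programs: Dict[str, List[str]]) -> bool:
    """True if each client's operations keep their own program order."""
    for ops in programs.values():
        wanted = set(ops)
        # Project the interleaving onto this client's operations.
        seen = [op for op in interleaving if op in wanted]
        if seen != ops:
            return False
    # Every operation must appear exactly once overall.
    return len(interleaving) == sum(len(ops) for ops in programs.values())


def matches_real_time_order(interleaving: Sequence[str],
                            executed: Sequence[str]) -> bool:
    """Ordering condition for linearizability, assuming non-overlapping
    operations so that real-time order equals execution order."""
    return list(interleaving) == list(executed)


programs = {"C1": ["A", "B", "C"], "C2": ["d", "e", "f"]}
executed = ["A", "B", "d", "C", "e", "f"]
candidate = ["A", "B", "d", "e", "f", "C"]

print(respects_program_order(candidate, programs))   # program order kept
print(matches_real_time_order(candidate, executed))  # real-time order not kept
```

The candidate interleaving moves C after e and f, which is allowed by program order (C still follows A and B) but contradicts the real-time order in which C ran before e.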

Generalized replication
1. Request: client makes a request
2. Coordination: replica managers decide upon the order of requests
3. Execution: the request is executed
4. Agreement: replica managers agree on the result of execution
5. Response: the response is sent back to the client

Passive replication
One primary replica manager, many backups
If the primary fails, a backup can take its place (election!)
Implements linearizability if:
  – A failing primary is replaced by a unique backup
  – Backups agree on which operations had been performed when the primary crashed
View-synchronous group communication!

Passive replication
1. Request: front end issues the request with a unique ID
2. Coordination: primary checks whether the request has already been carried out; if so, it returns the cached response
3. Execution: primary performs the operation and caches the result
4. Agreement: primary sends the updated state to the backups
5. Response: primary sends the result to the front end, which forwards it to the client
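The five steps can be sketched in a few lines. This is a minimal single-process illustration, not a real implementation: the classes `Primary` and `Backup`, the method names, and the counter-style state are all assumptions, and direct method calls stand in for the view-synchronous group communication a real system would use.

```python
from typing import Dict, List


class Backup:
    """Passive copy: only stores what the primary sends."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.responses: Dict[str, int] = {}

    def apply_update(self, state: Dict[str, int],
                     responses: Dict[str, int]) -> None:
        self.state = dict(state)
        self.responses = dict(responses)


class Primary:
    def __init__(self, backups: List[Backup]) -> None:
        self.state: Dict[str, int] = {}
        self.responses: Dict[str, int] = {}  # request ID -> cached result
        self.backups = backups

    def handle(self, request_id: str, key: str, delta: int) -> int:
        # 2. Coordination: a repeated request gets the cached response.
        if request_id in self.responses:
            return self.responses[request_id]
        # 3. Execution: perform the operation and cache the result.
        self.state[key] = self.state.get(key, 0) + delta
        result = self.state[key]
        self.responses[request_id] = result
        # 4. Agreement: push the updated state to every backup.
        for backup in self.backups:
            backup.apply_update(self.state, self.responses)
        # 5. Response: the result goes back to the front end.
        return result


backups = [Backup(), Backup()]
primary = Primary(backups)
first = primary.handle("req-1", "x", 5)   # 1. Request with a unique ID
retry = primary.handle("req-1", "x", 5)   # resent request, not re-executed
print(first, retry, backups[0].state)
```

Because the backups also receive the response cache, a backup that takes over as primary can answer a resent request without executing it a second time.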

Active replication
More distributed
All replica managers carry out all operations
Requests to RMs are totally ordered
Front ends issue one request at a time (FIFO)
Implements sequential consistency

Active replication
1. Request: front end adds a unique identifier to the request and multicasts it to the RMs
2. Coordination: totally ordered request delivery to the RMs
3. Execution: each RM executes the request
4. Agreement: not needed
5. Response: all RMs respond to the front end; the front end interprets the responses and forwards its interpretation to the client
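A rough sketch of these steps follows. It makes several simplifying assumptions that are not in the slides: a single front end hands out sequence numbers as a stand-in for totally ordered multicast, the front end "interprets" the responses by majority vote, and a faulty replica manager is modelled as one that always replies with a wrong value. All class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


class ReplicaManager:
    def __init__(self, faulty: bool = False) -> None:
        self.state: Dict[str, int] = {}
        self.next_seq = 0
        self.faulty = faulty  # assumption: a faulty RM replies with -1

    def deliver(self, seq: int, key: str, delta: int) -> int:
        # 2. Coordination: requests must arrive in the same total order.
        if seq != self.next_seq:
            raise RuntimeError("request delivered out of order")
        self.next_seq += 1
        # 3. Execution: every RM executes every request.
        self.state[key] = self.state.get(key, 0) + delta
        return -1 if self.faulty else self.state[key]


class FrontEnd:
    def __init__(self, replicas: List[ReplicaManager]) -> None:
        self.replicas = replicas
        self.seq = 0  # stand-in for a totally ordered multicast service

    def request(self, key: str, delta: int) -> int:
        # 1. Request: tag the request and send it to all RMs.
        seq = self.seq
        self.seq += 1
        replies = [rm.deliver(seq, key, delta) for rm in self.replicas]
        # 5. Response: interpret the replies, here by majority vote.
        value, votes = Counter(replies).most_common(1)[0]
        if votes <= len(self.replicas) // 2:
            raise RuntimeError("no majority among replica responses")
        return value


front_end = FrontEnd([ReplicaManager(), ReplicaManager(),
                      ReplicaManager(faulty=True)])
print(front_end.request("x", 3))
print(front_end.request("x", 4))
```

No agreement step appears because every correct RM processes the same requests in the same order and therefore reaches the same state on its own.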

Comparison (active/passive)
Handling of crash failures?
  – Both: yes (but differently)
Handling of arbitrary failures?
  – Active: yes; passive: no
Complexity?
Optimizations?
  – Send “reads” to backups in passive: lose the linearizability property!
  – Send “reads” to a single replica manager in active: lose fault tolerance

Highly available services
Goal is to allow clients to use the service for as long as possible
  – Even if network connections are lost
  – Even if results may be inconsistent

Gossip
Guarantees by gossip
  – Each client gets a consistent service over time
      Replicas provide data that is at least as recent as what the client has seen so far
  – Relaxed consistency between replicas
      Generally weaker than sequential consistency
      Eventually, all updates are applied (in order), but clients may observe stale data
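One common way to obtain the first guarantee is with vector timestamps: the client remembers a timestamp for the newest data it has seen, and a replica answers a query only if its own timestamp is at least that large. The sketch below illustrates that idea under heavy simplification. The class `GossipReplica`, its methods, and the string-valued updates are assumptions, and the naive log merge does not enforce the ordering of updates that a real gossip architecture would.

```python
from typing import List, Optional, Tuple

Timestamp = List[int]  # one counter per replica manager


def dominates(a: Timestamp, b: Timestamp) -> bool:
    """True if timestamp a is at least as recent as b in every entry."""
    return all(x >= y for x, y in zip(a, b))


def merge(a: Timestamp, b: Timestamp) -> Timestamp:
    """Entry-wise maximum of two timestamps."""
    return [max(x, y) for x, y in zip(a, b)]


class GossipReplica:
    def __init__(self, index: int, n_replicas: int) -> None:
        self.index = index
        self.value_ts: Timestamp = [0] * n_replicas
        self.log: List[str] = []

    def update(self, op: str) -> Timestamp:
        self.log.append(op)
        self.value_ts[self.index] += 1
        return list(self.value_ts)

    def query(self, prev: Timestamp) -> Optional[Tuple[List[str], Timestamp]]:
        # Refuse to answer with data older than what the client has seen.
        if not dominates(self.value_ts, prev):
            return None
        return list(self.log), list(self.value_ts)

    def receive_gossip(self, other: "GossipReplica") -> None:
        # Simplified gossip: copy over any updates not yet in the log.
        for op in other.log:
            if op not in self.log:
                self.log.append(op)
        self.value_ts = merge(self.value_ts, other.value_ts)


a, b = GossipReplica(0, 2), GossipReplica(1, 2)
client_ts: Timestamp = [0, 0]
client_ts = merge(client_ts, a.update("post"))  # client writes via replica a
print(b.query(client_ts))   # b is behind the client, so it cannot answer yet
b.receive_gossip(a)
print(b.query(client_ts))   # after gossip, b can answer
```

In this sketch a replica that is behind simply returns `None`; a real replica would hold the query until gossip brings it up to date, or the front end would try another replica.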

Gossip contd.
Covered in more depth later by Daniel
Highly relevant for today’s distributed systems
Used by, e.g., Facebook for Cassandra (source)

Summary
Group communication
  – Views
  – View-synchronous group communication
Replication
  – Correctness
      Linearizability: time
      Sequential consistency: program order
  – Passive and active replication schemes

Next lecture
Transactions
  – Nested transactions
Concurrency control
  – Locks
  – Optimistic concurrency control