Reliable Communication in the Presence of Failures
Based on the paper by Kenneth P. Birman and Thomas A. Joseph
Presented by Cesar Talledo, COEN 317, Fall 2005

Agenda
- Introduction
- Challenges in Fault-Tolerant Distributed Systems
- Consistent Event Ordering in a Distributed System
- Key Aspects of the Proposed Approach
- Logical vs. Physical Failures
- Proposed Broadcast Primitives: GBCAST, ABCAST, CBCAST
- Advantages of the Proposed Approach
- Sample Application: Updating Replicated Data
- Final Thoughts

Introduction
- Paper published in 1987; the work was funded by DoD
- Purpose: present a set of communication primitives that facilitate distributed processing in the presence of failures
- System assumptions:
  - One computation, executed by multiple processes in a distributed system (DS)
  - Each process has a local state
  - Processes communicate with one another via broadcasts
  - Processes may "halt" at any time
  - The paper does not address Byzantine failures

Challenges in Fault-Tolerant DS
- How does the system handle exit and re-entry of processes?
  - A process may leave the computation (due to failure)
  - A process may re-enter the computation (recovery)
- How do processes communicate with each other when:
  - Messages may be lost by the communication subsystem
  - Messages may be reordered while in transit
  - Some receiver processes may halt
  - The sender process may halt
- How are failures handled when the system is asynchronous?
- Goal: continue the computation in the presence of failures

Consistent Event Ordering in DS
- Key aspects of distributed processing:
  - Processes must have a consistent view of the ordering of events (i.e., messages) during the computation
  - The system must provide ordering while still allowing concurrency
  - Failures can disrupt the consistent view of event ordering
- Example: process A sends a broadcast message and then fails; the remaining processes may observe the message and the failure in different orders:
  - Process B receives the message, then notices the failure
  - Process C notices the failure, then receives the message
  - Process D never receives the message, but notices the failure
  - Process F receives the message, but never notices the failure
  - Process G neither receives the message nor notices the failure
- Key: all processes must agree on which events occurred and on the order of those events

... Consistent Event Ordering in DS
- Approaches for keeping a consistent event ordering in the presence of failures:
  1. Run an agreement protocol after a failure is detected
     - Problems: slow, and requires synchronous communication
  2. Apply the rule: a process should discard messages received from a process that is known to have failed
     - Problem: processes learn of failures at different times, so the system may still become inconsistent (illustrated in the sketch below)
- Proposed idea: "Construct a broadcast protocol that orders messages relative to failure and recovery events"
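The following short Python sketch (my illustration, not from the paper) shows why the discard rule alone is insufficient: two replicas replay the same two events, a broadcast from process A and the notice of A's failure, in different orders and end up with different states.

```python
# Minimal sketch: the "discard messages from failed processes" rule applied
# by two replicas that learn of the failure at different times.

def replay(events):
    """Apply a sequence of events to a toy replica and return its state."""
    failed, applied = set(), []
    for kind, sender, *payload in events:
        if kind == "msg" and sender not in failed:
            applied.append(payload[0])        # accept messages from live senders
        elif kind == "fail":
            failed.add(sender)                # afterwards, discard that sender
    return applied

# Process B sees the message before the failure notice; process C sees the
# failure notice first and therefore discards the very same message.
view_B = replay([("msg", "A", "update-x"), ("fail", "A")])
view_C = replay([("fail", "A"), ("msg", "A", "update-x")])
print(view_B, view_C)   # ['update-x'] []  -> the replicas have diverged
```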

Key Aspects of the Proposed Approach
- Failure and recovery are treated as system events, just like local processing steps and messages
  - Thus, failures and recoveries have an ordering with respect to messages and local processing
- The paper proposes communication primitives that maintain a consistent ordering among processes
  - All processes observe the same sequence of events, including failures
- Advantages:
  - When a process notices a failure, it can assume that the rest of the system has observed that failure in the same relative order
  - Therefore, the process can react to the failure immediately (no agreement protocol required)

Logical vs. Physical Failures
- Failures (e.g., lost messages, process halts) are physical events that occur in real time
  - Processes cannot control when a failure occurs
- Recall that processes use logical clocks to track the order of events in a distributed computation
- To treat failures as ordered events, physical failures must be mapped to logical failures
- How? Introduce the "process group view": a logical snapshot of the processes involved in the distributed computation
  - Changes to the group (e.g., failures, recoveries) are ordered with respect to other events
  - These changes are communicated among processes using the proposed broadcast primitives (a sketch of such a view follows)
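As a rough illustration of the idea, and not the paper's data structure, a process group view can be pictured as a numbered membership snapshot that is replaced whenever a failure or recovery occurs; the sketch below uses hypothetical Python names.

```python
# Minimal sketch: a "process group view" as each member might keep it locally.
# Every view change (failure or recovery) produces the next numbered view, so
# membership changes acquire a place in the logical event order.
from dataclasses import dataclass

@dataclass
class GroupView:
    view_id: int            # increases with every membership change
    members: frozenset      # processes currently in the computation

def next_view(view, joined=(), failed=()):
    """Derive the successor view after recoveries and/or failures."""
    members = (view.members | frozenset(joined)) - frozenset(failed)
    return GroupView(view.view_id + 1, members)

v0 = GroupView(0, frozenset({"A", "B", "C"}))
v1 = next_view(v0, failed={"A"})    # A's failure becomes view change #1
print(v1)                           # GroupView(view_id=1, members=frozenset({'B', 'C'}))
```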

Proposed Broadcast Primitives
- Three broadcast communication primitives:
  - Group Broadcast (GBCAST)
  - Atomic Broadcast (ABCAST)
  - Causal Broadcast (CBCAST)
- All three are atomic: either all destination processes receive the message, or none do
- Emphasis on lightweight primitives: low processing cost is desired for good performance

GBCAST
- GBCAST = Group Broadcast
- Used to keep the "process group view" consistent
- Call: GBCAST(action, G)
  - action: the type of event that has occurred
  - G: the process group whose view is affected
- GBCAST satisfies the following ordering constraints:
  - Delivered in the same order with respect to all other broadcasts at each destination
  - A GBCAST announcing a failure is delivered after any messages sent by the failed process

... GBCAST
- GBCAST is used to inform group members that the process group view has changed
- Each process keeps a local copy of the process group view
  - Receiving a GBCAST updates the local copy
  - A process can therefore assume that its local copy is consistent with the rest of the group
- Upon a failure or recovery, the GBCAST is issued by:
  - The supervisory process executing on the machine where the failure or recovery occurred (if that machine is alive), or
  - Failure-detection software executing on another machine
- Using GBCAST avoids executing an agreement protocol (a sketch of view maintenance follows)
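A minimal sketch of how a member might apply GBCAST deliveries to its local view, using hypothetical names and leaving out the delivery machinery; because GBCASTs arrive in the same relative order everywhere, every member derives the same sequence of views without a separate agreement protocol.

```python
# Minimal sketch: maintaining the local copy of the process group view
# from delivered GBCAST(action, process) notifications.

class Member:
    def __init__(self, name, initial_members):
        self.name = name
        self.view_id = 0
        self.members = set(initial_members)

    def on_gbcast(self, action, process):
        """Handle a delivered GBCAST for this group."""
        if action == "fail":
            self.members.discard(process)
        elif action == "recover":
            self.members.add(process)
        self.view_id += 1   # each membership change yields the next view

b = Member("B", {"A", "B", "C"})
b.on_gbcast("fail", "A")       # delivered after any of A's earlier messages
print(b.view_id, b.members)    # 1 {'B', 'C'}
```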

ABCAST
- ABCAST = Atomic Broadcast
- Provides sequentially consistent ordering of updates to replicated data
- Applications use ABCAST to enforce a single order on updates to shared data structures in the distributed system
- Call: ABCAST(msg, label, dests)
  - msg: the message to be broadcast
  - label: identifies ABCASTs that are related to each other
  - dests: the set of processes to which the broadcast is sent
- ABCASTs with the same label that have destinations in common are delivered in the same order (some order, not determined in advance) at all such destinations (a call sketch follows)
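A sketch of what an application-level ABCAST call might look like; the function below is only a stub with an assumed signature matching the slide, and the real ordering work would be done by the underlying broadcast layer.

```python
# Minimal sketch: two concurrent enqueue operations on a replicated queue.
# Because both carry the same label, every replica applies them in one agreed
# (though otherwise arbitrary) order, so all copies of the queue stay identical.

def abcast(msg, label, dests):
    """Stub for the atomic-broadcast primitive; ordering is done by the runtime."""
    print(f"ABCAST label={label} dests={sorted(dests)} msg={msg}")

replicas = {"S1", "S2", "S3"}

abcast({"op": "enqueue", "item": "x"}, label="queue-ops", dests=replicas)
abcast({"op": "enqueue", "item": "y"}, label="queue-ops", dests=replicas)
```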

CBCAST
- CBCAST = Causal Broadcast
- Provides causally consistent ordering of updates to replicated data
- Applications use CBCAST to enforce causal order on updates in the distributed system
- Call: CBCAST(msg, clabel, dests)
  - msg: the message to be broadcast
  - clabel: identifies related CBCASTs and the type of ordering required
  - dests: the set of processes to which the broadcast is sent
- CBCASTs with the same clabel that have destinations in common are delivered in an order consistent with their clabels at all such destinations

... CBCAST
- Broadcast A causally precedes broadcast B if:
  - A and B are sent by the same process, and A is sent before B, or
  - A and B are sent by different processes, and A was received by the process that sent B before B was sent
- Causal ordering is determined by the values of the clabels
  - If broadcast A causally precedes broadcast B, then clabel(A) < clabel(B)
- clabels give applications the power to decide which events are causally related
  - Not all CBCASTs are causally related; ordering unrelated ones would limit system concurrency (a sketch of the precedence rule follows)
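One common way to realize the clabel rule, shown here as an assumption rather than the paper's exact mechanism, is to derive clabels from Lamport-style logical clocks, so that causal precedence implies a smaller clabel (the converse need not hold).

```python
# Minimal sketch: Lamport-style logical clocks as clabels. If broadcast A
# causally precedes broadcast B, then clabel(A) < clabel(B).

class Clock:
    def __init__(self):
        self.t = 0

    def stamp_send(self):
        """Advance the clock and return a clabel for an outgoing CBCAST."""
        self.t += 1
        return self.t

    def on_receive(self, clabel):
        """Fold a received clabel into the local clock (Lamport rule)."""
        self.t = max(self.t, clabel)

p, q = Clock(), Clock()
a = p.stamp_send()     # process P sends broadcast A with clabel 1
q.on_receive(a)        # Q receives A before sending B ...
b = q.stamp_send()     # ... so B's clabel is larger
print(a < b)           # True: A causally precedes B, and clabel(A) < clabel(B)
```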

Advantages of the Proposed Approach
- Simpler applications
  - Eliminates the need for application-level "ordering protocols" that would otherwise be required to prevent inconsistencies due to potential failures
  - Such protocols would be needed if communication were done only via simple atomic broadcasts
- Better system performance
  - Application-level ordering protocols restrict concurrency by imposing synchronization rules
- Note: the assumption is that GBCAST, ABCAST, and CBCAST are implemented below the application level (e.g., in the kernel)

Sample Application: Updating Replicated Data
- All copies of the replicated data must be updated in the same order
- Without the proposed broadcast primitives, a process would need to synchronize explicitly:
  - Send a basic atomic broadcast to the remote copies
  - Wait for the remote copies to reply with confirmation of the update
  - Update the local copy, then perform the next update
  - Note: this is similar to two-phase commit
- Using CBCAST, a process can behave as if all copies have been updated once the CBCAST is issued and the local copy is updated
  - CBCAST guarantees that all copies receive the update in the required order with respect to previous CBCASTs that update the same data
  - CBCASTs are also ordered with respect to failures (announced via GBCASTs)
- Note: using CBCAST improves performance (a comparison sketch follows)
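The sketch below contrasts the two update styles using hypothetical stub functions (not the paper's code): the explicit-synchronization version blocks for confirmations, while the CBCAST version issues the broadcast and continues immediately, relying on causal ordering to keep the copies consistent.

```python
# Minimal sketch: updating replicated data with and without CBCAST.
# The broadcast stubs only print; a real system would supply them.

def atomic_bcast(msg, dests):
    print(f"atomic broadcast {msg} -> {sorted(dests)}")

def wait_for_acks(dests):
    print(f"blocking until {sorted(dests)} confirm")   # synchronous round trip

def cbcast(msg, clabel, dests):
    print(f"CBCAST clabel={clabel} {msg} -> {sorted(dests)}")

replicas = {"R1", "R2"}
local = {}
clock = 0

# Without the primitives: explicit synchronization, similar to 2-phase commit.
def update_with_acks(key, value):
    atomic_bcast({"set": (key, value)}, replicas)
    wait_for_acks(replicas)              # cannot continue until all confirm
    local[key] = value

# With CBCAST: issue the broadcast and continue at once; causal ordering
# (and ordering relative to GBCAST failure notices) keeps the copies consistent.
def update_with_cbcast(key, value):
    global clock
    clock += 1
    cbcast({"set": (key, value)}, clabel=clock, dests=replicas)
    local[key] = value                   # no waiting for remote confirmations

update_with_acks("x", 1)
update_with_cbcast("x", 2)
update_with_cbcast("x", 3)
```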

Final Thoughts
- The proposed broadcast primitives provide:
  - Implicit ordering of messages
    - Applications need not perform explicit synchronization to prevent ordering problems when failures are possible
  - Message ordering with respect to failures and recoveries
    - Failures and recoveries are treated as logical events, subject to ordering with respect to messages
    - This provides consistency among the processes in the distributed system: all processes observe the same set of events
  - Improved performance
    - Eliminating explicit application-level ordering protocols allows more concurrency in the computation