Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008

Principles of Reliable Distributed Systems
Lecture 3: Fault-Tolerant Broadcasts
Spring 2008
Prof. Idit Keidar

Today’s Material
Distributed Systems, 2nd edition, Sape Mullender (Editor)
– Causal order, Ch. 4
– Fault-tolerant broadcasts, Ch. 5
– Vector clocks, Ch. 4
– State machine replication, from Ch. 7
Attiya-Welch
– Vector clocks, Ch. 6

Broadcast
Sending a message to all the nodes in the system
– E.g., our course mailing list
What is it good for? It allows for redundancy
– In storage and in processing
A broadcast service is a building block for replication

Motivation: Replication
Two paradigms:
– Primary-backup: passive
– State machine replication (SMR): active

Primary-Backup Replication
[Figure: a client, the primary, and the backup(s), with the primary broadcasting to the backups]

Primary-Backup (Passive) Replication
"Hot" standby
The client talks to the primary server
The primary updates the backup(s)
The client detects server failure using a timeout
– Performs "fail-over" to a backup server
– May need to repeat the last operation(s)
This can be a problem with "false suspicions", where a slow but live primary is wrongly suspected to have failed
Works with benign servers only

Primary-Backup Variants
Backups can detect primary failure
The client can be oblivious to failure
– Using a dispatcher
– Using IP takeover
[Figure from the IBM Web site: highly available Web server cluster on HTTP Server, primary/backup with a network dispatcher model]

State Machine Replication (SMR): The Idea
Replicas are identical deterministic state machines
They process operations in the same order ⇒ they remain consistent

SMR Architecture
[Figure: a client and the replicas, connected by broadcast]

SMR Architecture: Option II
[Figure: clients A and B and the replicas, connected by broadcast]

State Machine Replication
Send updates to all servers
All servers are identical deterministic state machines
– Servers begin in the same initial state
– Servers perform operations in the same order to remain consistent
May be slower than primary-backup, but provides quicker, smoother fail-over
Can overcome false suspicions and tolerate malicious servers
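
A minimal Python sketch of the SMR idea, assuming a toy key-value store as the deterministic state machine (the operations `set` and `add` and the helper names are illustrative, not from the lecture). Replicas that start in the same state and apply the same operations in the same order end in the same state; applying the same operations in a different order can diverge.

```python
def apply(state: dict, op: tuple) -> None:
    """Deterministically apply one operation to a replica's state (toy key-value store)."""
    kind, key, arg = op
    if kind == "set":
        state[key] = arg
    elif kind == "add":
        state[key] = state.get(key, 0) + arg
    else:
        raise ValueError(f"unknown operation {kind!r}")


def run_replica(ops: list) -> dict:
    state = {}                 # every replica begins in the same initial state
    for op in ops:             # ... and processes the operations in the given order
        apply(state, op)
    return state


log = [("set", "x", 1), ("add", "x", 5), ("set", "y", 2)]
replicas = [run_replica(log) for _ in range(3)]
print(all(r == replicas[0] for r in replicas))   # True: same order, same state
print(run_replica([log[1], log[0], log[2]]))     # {'x': 1, 'y': 2}: another order diverges
```

The "same order" requirement is exactly what the broadcast service discussed next has to provide.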

Replica Coordination Requirements
Agreement: all correct replicas receive all client requests
Order: replicas process requests in the same order
We want a broadcast service satisfying these
We’ll start with Agreement

Before We Begin …
Define the model in which we want to solve the problem
Define the service interface
Specify the service
– Using properties

Model: Correct vs. Faulty Processes
Look at a complete run (execution)
– An external observer’s view
A process that does not fail in a run is correct in that run
Otherwise, the process is faulty in the run
– A process that fails at any time in the run is faulty throughout the entire run

Threshold Failure Model
t out of n processes may fail
t is usually given as a function of n, e.g.,
– t < n
– 2t < n
– 3t < n
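
Each of these bounds has the form k·t < n for k = 1, 2, 3, so the largest tolerable t is ⌊(n − 1)/k⌋. A small Python helper makes the arithmetic concrete (the name `max_faults` is hypothetical):

```python
def max_faults(n: int, k: int) -> int:
    """Largest integer t with k * t < n, i.e. t = (n - 1) // k."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    return (n - 1) // k


for n in (4, 7, 10):
    # bounds t < n, 2t < n, 3t < n
    print(n, [max_faults(n, k) for k in (1, 2, 3)])
# 4 [3, 1, 1]
# 7 [6, 3, 2]
# 10 [9, 4, 3]
```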

Model: Synchronous vs. Asynchronous
Synchrony:
– Bounded latency, clock drift, and processing time
– Process crash failures can be accurately detected
Asynchrony: no timing assumptions at all
– Process crash failures cannot be accurately detected
Failstop:
– Time-free, but crash failures are accurately detected
Unreliable failure detectors: later in the course

Service Interface: Separating Reception from Delivery
Application (e.g., state machine): on deliver, update the state
Delivery layer: on receive from the network, wait for messages that should be acted on first
Network

Broadcast Service for Replication
Primitives: broadcast(m), deliver(m)
– For simplicity, assume m is unique
[Figure: at each process, the application calls broadcast on the broadcast algorithm and gets deliver from it; the broadcast algorithm uses send and receive on the network]

Reliable Broadcast Specification
Validity: if a correct process broadcasts m, then all correct processes eventually deliver m
Agreement: if a correct process delivers m, then all correct processes eventually deliver m
– Uniform agreement: if any process delivers m, then all correct processes eventually deliver m
Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast
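
One way to make these properties concrete is to check them against a finished, finite run. The sketch below is only an illustration under stated assumptions: "eventually" is read as "by the end of the run", and the function name `check_run` and the run encoding (a list of (sender, m) broadcasts, each process's delivery sequence, and the set of correct processes) are hypothetical.

```python
from collections import Counter


def check_run(broadcasts, deliveries, correct, uniform=False):
    """Check Validity, (uniform) Agreement and Integrity on a finite run.

    broadcasts: list of (sender, m); deliveries: {process: [m, ...]} in delivery order;
    correct: set of processes that never fail in the run.
    """
    sent = {m for _, m in broadcasts}
    delivered = {p: set(ms) for p, ms in deliveries.items()}

    def all_correct_deliver(m):
        return all(m in delivered.get(q, set()) for q in correct)

    validity = all(all_correct_deliver(m) for p, m in broadcasts if p in correct)
    # Agreement looks at deliveries by correct processes; uniform agreement at any process.
    deliverers = delivered.keys() if uniform else correct
    agreement = all(all_correct_deliver(m) for p in deliverers for m in delivered.get(p, set()))
    integrity = all(
        max(Counter(deliveries.get(p, [])).values(), default=0) <= 1
        and delivered.get(p, set()) <= sent
        for p in correct
    )
    return {"validity": validity, "agreement": agreement, "integrity": integrity}


# p3 delivers its own "x" and then crashes before anyone else receives it.
broadcasts = [("p1", "a"), ("p3", "x")]
deliveries = {"p1": ["a"], "p2": ["a"], "p3": ["a", "x"]}
correct = {"p1", "p2"}
print(check_run(broadcasts, deliveries, correct))
# {'validity': True, 'agreement': True, 'integrity': True}
print(check_run(broadcasts, deliveries, correct, uniform=True)["agreement"])  # False
```

The example run satisfies (non-uniform) Agreement but not Uniform Agreement, because a faulty process delivered a message that no correct process delivers.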

Model for Implementation
Asynchronous
Process crash failures
– Note: they cannot be detected
Pair-wise, point-to-point reliable links between correct processes
– If a correct process p sends a message to a correct process q, then q receives the message
– Safety or liveness?

Reliable Broadcast Implementation
Simple implementation …
Does it solve Uniform Reliable Broadcast?
What if there are some link failures?
– Under what condition does the protocol still work?
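
The slide’s own code is not reproduced here. As a hedged illustration only, the Python sketch below simulates the relay-based reliable broadcast described by Hadzilacos and Toueg (Ch. 5 of the Mullender book listed above): on first receiving m from another process, a process relays m to everyone and then delivers it. Whether this matches the slide’s "simple implementation" is an assumption, and the `Network` and `RBProcess` classes and the `crash_after` parameter are hypothetical simulation devices.

```python
from collections import deque


class Network:
    """Simulated links: one FIFO queue; messages to crashed processes are dropped."""

    def __init__(self):
        self.queue = deque()
        self.procs = {}

    def send(self, src, dst, m):
        self.queue.append((src, dst, m))

    def run(self):
        while self.queue:
            src, dst, m = self.queue.popleft()
            if not self.procs[dst].crashed:
                self.procs[dst].receive(src, m)


class RBProcess:
    def __init__(self, pid, net):
        self.pid, self.net = pid, net
        self.delivered = []
        self.crashed = False
        net.procs[pid] = self

    def broadcast(self, m, crash_after=None):
        # Send m to all processes, itself included. crash_after=k simulates a
        # sender that crashes after only k of these sends.
        for k, q in enumerate(sorted(self.net.procs)):
            if k == crash_after:
                self.crashed = True
                return
            self.net.send(self.pid, q, m)

    def receive(self, src, m):
        if m in self.delivered:          # Integrity: deliver each m at most once
            return
        if src != self.pid:
            # Relay before delivering: if this process is correct, every correct
            # process also receives m (Agreement).
            for q in self.net.procs:
                if q != self.pid:
                    self.net.send(self.pid, q, m)
        self.delivered.append(m)


net = Network()
p1, p2, p3 = (RBProcess(pid, net) for pid in ("p1", "p2", "p3"))
p1.broadcast("m", crash_after=2)   # reaches p1 and p2 only, then p1 crashes
net.run()
print(p1.delivered, p2.delivered, p3.delivered)   # [] ['m'] ['m']
```

Without the relay step, p3 would never deliver m even though p2 did, which would violate Agreement.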

Replica Coordination Requirements
Agreement: all correct replicas receive all client requests
Order: replicas process requests in the same order

Possible Reception Orders?
Process 1: send "1", receive message, send "2", receive message
Process 2: send "a", send "b", receive message
[Space-time diagram: processes P1 and P2 exchanging messages 1, a, 2, b]

FIFO Order
If some process broadcasts message m before message m’, then every correct process that delivers m’ delivers m beforehand.
Trivial to implement (How?)

Possible Delivery Orders?
Process 1: bcast "1", dlvr msg ("a"), dlvr msg, bcast "2", dlvr msg
Process 2: bcast "a", dlvr msg ("2")
What delivery orders make sense? ("a")

Cause and Effect
When observing a distributed system, there is no common clock to order events of different processes
We instead use the (weaker) notion of cause and effect
If one event e caused another event f to happen, then e and f could never have happened simultaneously
– e happened before f
When we do not know a given program’s logic but can only observe its communication,
– We cannot tell whether one event causes another
– We can tell whether it could have caused another

Happened Before or Causal Order [Lamport 78]
Event e happens before (causally precedes) event f, denoted e → f, if:
– The same process executes e before f, or
– e is send(m) and f is receive(m), or
– There exists h such that e → h and h → f
We define concurrent, e || f, as: ¬(e → f ∨ f → e)
In a broadcast service, application-level causality is between broadcast and deliver events
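
To make the three rules concrete, here is a small Python sketch that computes the happened-before relation of a finite run by brute force, using Warshall’s transitive-closure algorithm for the third rule. The event names and the run encoding (per-process event lists, plus each message as a (send event, receive event) pair) are hypothetical.

```python
def happened_before(process_events, messages):
    """Return all pairs (e, f) with e -> f.

    process_events: {process: [event, ...]} in local execution order
    messages: set of (send_event, receive_event) pairs
    """
    hb = set()
    for events in process_events.values():       # rule 1: same process, e before f
        for i, e in enumerate(events):
            hb.update((e, f) for f in events[i + 1:])
    hb.update(messages)                           # rule 2: send(m) -> receive(m)
    all_events = [e for evs in process_events.values() for e in evs]
    for h in all_events:                          # rule 3: transitivity (Warshall)
        for e in all_events:
            for f in all_events:
                if (e, h) in hb and (h, f) in hb:
                    hb.add((e, f))
    return hb


def concurrent(e, f, hb):
    """e || f  iff  not (e -> f or f -> e)."""
    return not ((e, f) in hb or (f, e) in hb)


events = {"P1": ["a1", "a2"], "P2": ["b1", "b2"], "P3": ["c1"]}
msgs = {("a1", "b1")}          # a1 = send(m), b1 = receive(m)
hb = happened_before(events, msgs)
print(("a1", "b2") in hb)      # True, via a1 -> b1 -> b2
print(concurrent("a2", "b1", hb), concurrent("c1", "a1", hb))   # True True
```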

Causal Order
For two messages m and m’, m → m’ if send(m) causally precedes send(m’)
– Causal order: the transitive closure of FIFO order + "some process delivers m before broadcasting m’"
Causal Delivery: if m → m’, then every correct process that delivers m’ delivers m beforehand

Does FIFO Between all Processes Guarantee Causal Order?

Total Order
If two correct processes deliver both m and m’, they deliver them in the same order
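
Total Order only constrains messages that both processes deliver. A minimal Python check of this condition for two delivery sequences, assuming each message is unique and delivered at most once (the function name is hypothetical):

```python
def same_relative_order(seq_p, seq_q):
    """True iff the messages delivered by both processes appear in the same order in both."""
    common = set(seq_p) & set(seq_q)
    return [m for m in seq_p if m in common] == [m for m in seq_q if m in common]


print(same_relative_order(["m1", "m2", "m3"], ["m1", "m3"]))   # True: m2 is not common
print(same_relative_order(["m1", "m2"], ["m2", "m1"]))          # False
```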

FIFO Broadcast
Reliable Broadcast + FIFO
Simple implementation on top of reliable broadcast (using sequence numbers)
[Figure: protocol stack at each process (Application / FIFO Broadcast / Reliable Broadcast / Network), with broadcast calls going down and deliver calls going up; FIFO Broadcast performs the FIFO deliver]
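
A minimal Python sketch of the sequence-number idea, assuming the underlying reliable broadcast may hand messages up out of order and carries the (sender, sequence number, m) triple unchanged; the class and method names are hypothetical. Each process holds back a message until all earlier messages from the same sender have been delivered.

```python
from collections import defaultdict


class FifoLayer:
    """FIFO delivery on top of a reliable broadcast that may deliver out of order."""

    def __init__(self, pid):
        self.pid = pid
        self.sent = 0                             # my own sequence number
        self.next_seq = defaultdict(lambda: 1)    # next expected seq per sender
        self.holdback = defaultdict(dict)         # sender -> {seq: m}
        self.delivered = []

    def broadcast(self, m):
        self.sent += 1
        return (self.pid, self.sent, m)           # hand this to reliable broadcast

    def rb_deliver(self, sender, seq, m):
        self.holdback[sender][seq] = m
        # FIFO-deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in self.holdback[sender]:
            s = self.next_seq[sender]
            self.delivered.append(self.holdback[sender].pop(s))
            self.next_seq[sender] = s + 1


p = FifoLayer("p1")
q = FifoLayer("q")
msgs = [p.broadcast(x) for x in ("a", "b", "c")]
for sender, seq, m in (msgs[2], msgs[0], msgs[1]):   # reliable broadcast reorders them
    q.rb_deliver(sender, seq, m)
print(q.delivered)   # ['a', 'b', 'c']
```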

Causal Broadcast
Reliable Broadcast + Causal Delivery
Implementation on top of Reliable Broadcast

If We Could Use Clocks …
Assume: processes have access to a global real-time clock RC, and message delays are bounded by D
– Every message m contains a timestamp TS(m) = RC at the time m is sent
– DR1: at time t, deliver all received messages with timestamps up to t − D, in increasing timestamp order
– If two messages have the same timestamp, break ties by process id (deliver the one with the lower id first)
Clock Condition: if e → f then RC(e) < RC(f)
Hence, if m → m’ then TS(m) < TS(m’)
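
DR1 can be sketched in a few lines of Python, assuming pending messages are kept as (timestamp, process id, payload) tuples and that the caller invokes the hypothetical `dr1_step` function at real time t:

```python
def dr1_step(pending, t, D):
    """Split pending (timestamp, pid, m) entries at real time t under DR1.

    A message with TS <= t - D was sent by time t - D and, since delays are at most D,
    every such message has already been received by time t.
    """
    ready = sorted((x for x in pending if x[0] <= t - D), key=lambda x: (x[0], x[1]))
    waiting = [x for x in pending if x[0] > t - D]
    return [m for _, _, m in ready], waiting


pending = [(5, 2, "b"), (3, 1, "a"), (5, 1, "c"), (9, 3, "d")]
delivered, pending = dr1_step(pending, t=10, D=4)
print(delivered)   # ['a', 'c', 'b']: ties on timestamp 5 broken by lower process id
print(pending)     # [(9, 3, 'd')]
```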

Lamport’s Timestamps (LTS): Logical Clocks
Invent a clock that satisfies the clock condition
Each process maintains a logical clock
– A local positive-integer variable
Each message is tagged with the source’s logical clock at the time the message is sent

LTS Broadcast Algorithm Part I: Logical Clock Assignment
code for process p_i:
    TS[j] ← 0, ∀ j = 0,…,n
    pending ← empty
    broadcast(m):
        TS[i] ← TS[i] + 1              /* p_i’s logical clock respects FIFO */
        send (m, ⟨TS[i], i⟩) to all
    upon receive (m, ⟨t, j⟩):
        TS[j] ← t
        add (m, ⟨t, j⟩) to pending
        TS[i] ← max(TS[i], t) + 1      /* p_i’s logical clock respects causality */

Example: Logical Clocks
[Figure: messages with LTS = ⟨1, p1⟩, LTS = ⟨1, p2⟩, and LTS = ⟨3, p3⟩]

Does This Solve the Problem?
When do we deliver a message?
Deliver a message with TS = t when no message with TS < t can be received
A message m received by p_i is stable at p_i if no future message with a timestamp smaller than TS(m) can be received by p_i
DR2: deliver all received messages that are stable at p_i, in increasing timestamp order

Detecting Stability
Assume FIFO communication
If p_i receives m from p_j with TS(m), p_i cannot later receive a message m’ from p_j with TS(m’) < TS(m)
Stability of m at p_i is guaranteed when
– p_i has received a message with a timestamp greater than TS(m) from all processes
Note: the timestamp is a pair ⟨LC, pid⟩

LTS Broadcast Algorithm Part II: Delivery Rule
code for process p_i:
    let (m, ⟨t, j⟩) be the entry in pending with the smallest ⟨t, j⟩
    if ⟨t, j⟩ ≤ ⟨TS[k], k⟩ ∀ k = 0,…,n then    /* DR2 */
        deliver(m)
        remove (m, ⟨t, j⟩) from pending
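
Parts I and II can be combined into a small runnable Python sketch. Assumptions beyond the slides: processes are indexed 0, …, n−1; a broadcaster puts its own message straight into its pending set instead of sending it to itself; and the caller plays the role of the FIFO network by passing the tuple returned by `broadcast` to `receive` at the other processes. The class name `LTSProcess` is hypothetical. Pairs ⟨t, j⟩ are compared as Python tuples, i.e., lexicographically.

```python
import heapq


class LTSProcess:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.TS = [0] * n      # TS[j]: last timestamp received from p_j; TS[i]: own clock
        self.pending = []      # min-heap of (t, j, m), ordered by the pair <t, j>
        self.delivered = []

    def broadcast(self, m):
        self.TS[self.i] += 1
        heapq.heappush(self.pending, (self.TS[self.i], self.i, m))   # own copy
        self._deliver_stable()
        return (m, self.TS[self.i], self.i)    # to be sent to all other processes

    def receive(self, m, t, j):
        self.TS[j] = t
        heapq.heappush(self.pending, (t, j, m))
        self.TS[self.i] = max(self.TS[self.i], t) + 1
        self._deliver_stable()

    def _deliver_stable(self):
        # DR2: deliver the smallest pending <t, j> while <t, j> <= <TS[k], k> for all k.
        while self.pending:
            t, j, m = self.pending[0]
            if not all((t, j) <= (self.TS[k], k) for k in range(self.n)):
                break
            heapq.heappop(self.pending)
            self.delivered.append(m)


procs = [LTSProcess(i, 3) for i in range(3)]
a = procs[0].broadcast("a")
procs[1].receive(*a)
b = procs[1].broadcast("b")   # p1 received a before broadcasting b, so TS(a) < TS(b)
procs[2].receive(*b)          # b overtakes a on its way to p2
print(procs[2].delivered)     # []: b is not stable, p2 has heard nothing from p0
procs[2].receive(*a)
print(procs[2].delivered)     # ['a']: b still waits for a later message from p0
```

The trace also shows the liveness cost of DR2: a message becomes deliverable only after every process has sent something with a larger timestamp.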

The LTS Algorithm Implements Causal Broadcast
Causal Delivery: if m → m’, then every correct process that delivers m’ delivers m beforehand
Timestamps respect the Clock Condition:
– If m → m’ then TS(m) < TS(m’)
DR2 + FIFO links ensure that if m is in pending, all messages with lower TS were received
– They were either delivered or are pending
Delivery from pending is in TS order

Fault Tolerance?

Vector Clocks
Process p_i has a vector clock VC[1…n]
– VC[i] is the local sequence number of the last message broadcast by p_i
– For j ≠ i, VC[j] is the sequence number of the latest message p_i delivered from p_j
VC is sent with each message m
– For j ≠ i, m.VC[j] is the sequence number of p_j’s latest message that causally precedes m

Vector Clocks: Sending
At process p_i, on broadcast(m):
– VC[i] ← VC[i] + 1
– Use Reliable Broadcast to send (m, VC) to all
   No need to send to myself
– Deliver m locally

Vector Clocks: Delivery Rule
Upon receive m:
– Place m in the message buffer
Deliver m from p_j from the buffer if
– VC[j] = m.VC[j] − 1 (FIFO: m is the next message from p_j), and
– ∀ k ≠ j: VC[k] ≥ m.VC[k] (messages causally preceding m were delivered)
Upon deliver:
– VC[j] := VC[j] + 1
VC[j] is the number of messages of p_j that causally precede p_i’s subsequent messages
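
A hedged Python sketch of the vector-clock broadcast from the last two slides, assuming processes are indexed 0, …, n−1 and that the caller acts as the reliable broadcast layer by passing the tuple returned by `broadcast` to `receive` at every other process. `VCProcess` is a hypothetical name.

```python
class VCProcess:
    def __init__(self, i, n):
        self.i = i
        self.VC = [0] * n
        self.buffer = []        # received but not yet deliverable: (m, j, m.VC)
        self.delivered = []

    def broadcast(self, m):
        self.VC[self.i] += 1
        self.delivered.append(m)              # deliver m locally
        return (m, self.i, list(self.VC))     # send (m, VC) to all others

    def _deliverable(self, j, mvc):
        fifo = self.VC[j] == mvc[j] - 1       # m is the next message from p_j
        causal = all(self.VC[k] >= mvc[k] for k in range(len(mvc)) if k != j)
        return fifo and causal

    def receive(self, m, j, mvc):
        self.buffer.append((m, j, mvc))
        progress = True
        while progress:                       # one delivery may enable others
            progress = False
            for entry in self.buffer:
                if self._deliverable(entry[1], entry[2]):
                    self.buffer.remove(entry)
                    self.VC[entry[1]] += 1
                    self.delivered.append(entry[0])
                    progress = True
                    break                     # VC changed: rescan the buffer


procs = [VCProcess(i, 3) for i in range(3)]
a = procs[0].broadcast("a")
procs[1].receive(*a)          # p1 delivers a ...
b = procs[1].broadcast("b")   # ... and then broadcasts b, so a -> b
procs[2].receive(*b)          # b arrives at p2 first and is buffered
print(procs[2].delivered)     # []
procs[2].receive(*a)          # delivering a unblocks b
print(procs[2].delivered, procs[2].VC)   # ['a', 'b'] [1, 1, 0]
```

Unlike DR2 of the LTS algorithm, which waits until a message is stable, this rule delivers b as soon as a has been delivered, without waiting to hear from every process.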

Vector Clocks Example

Atomic Broadcast Services
Atomic Broadcast:
– Reliable Broadcast + Total Order
FIFO Atomic Broadcast:
– FIFO + Reliable Broadcast + Total Order
Causal Atomic Broadcast:
– Causal + Reliable Broadcast + Total Order
HW question: are FIFO Atomic Broadcast and Causal Atomic Broadcast equivalent?

Causal Atomic Broadcast in the Failstop Model
Order messages by logical timestamp (LTS)
– Break ties by process id
Use FIFO Broadcast
When is a message with LTS t delivered?
– Reminder: failstop = failures are accurately detected
– Assume further that no message from a faulty process arrives after its failure is detected

Atomic Broadcast in Asynchronous Systems???
Alas, it is impossible if even one process can crash

Now, Back to State Machines
We can build state machines using Atomic Broadcast
A client can
– Broadcast to all servers; or
– Forward its request to one of the servers, which broadcasts it to the others
   Resend on timeout if the server fails
What about client failures?

Multicast Problems
Processes are organized in groups
– Groups have names
– Messages are sent to groups
– Like broadcast, but only to group members
– Processes can join and leave groups
Processes may care about who else is a member of the group (group membership)