CS614: Time Instead of Timeout
Ken Birman
February 6, 2001

What we're after
- A general means for distributed communication: letting n processes coordinate an action, such as managing a resource or even replicating a database.
- The paper was the first to tackle this issue.
- It includes quite a few ideas, only some of which are adequately elaborated.

Earlier we saw…
- Distributed consensus is impossible with even one faulty process: it is impossible to determine whether a process has failed or is merely "slow".
- Solution 1: timeouts. They can easily be added to asynchronous algorithms to provide guarantees about slowness.
- Assumption: timeout implies failure.

Asynchronous → Synchronous
- Start with an asynchronous algorithm that isn't fault-tolerant.
- Add a timeout to each message receipt. This assumes bounds on message transmission time and processing time; exceeding the bound implies failure.
- Easy to "bullet-proof" a protocol. Practical if the bounds are very conservative. (A minimal sketch follows.)
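As a hedged illustration of "add a timeout to each receive," here is a minimal Python sketch; the `queue.Queue` channel and the `two_delta` bound are invented stand-ins for a real transport and its assumed worst-case round trip:

```python
import queue

def recv_or_fail(channel: queue.Queue, two_delta: float):
    """Receive with a timeout; exceeding the bound is treated as failure.

    Encodes the slide's assumption "timeout implies failure": if no
    reply arrives within two_delta, we declare the peer faulty.
    """
    try:
        return channel.get(timeout=two_delta)  # blocks at most two_delta seconds
    except queue.Empty:
        raise RuntimeError("peer presumed failed (timeout)")
```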

Example: Resource Allocation
[Figure: process P sends "I want Resource X" to Q; Q replies "Yes" or "No, in use"; P waits with Timeout = 2δ.]

Null messages
- Notice that if a message doesn't contain real data, we can sometimes skip sending it.
- For example: if the resource isn't in use, I could skip sending the reply, and after δ time you could interpret my "inaction" as a NULL message.
- Lamport is very excited by this option: a system might send billions of NULL messages per second, and do nothing on receiving them! Billions and billions…
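A minimal sketch of the idea, assuming a `queue.Queue` channel and an illustrative delay bound (all names here are invented for illustration):

```python
import queue

DELTA = 0.05  # assumed bound delta on one-way message delay (illustrative)

def reply(channel: queue.Queue, resource_in_use: bool) -> None:
    # Sender side: only real data is transmitted; the NULL reply
    # ("resource is free") is conveyed by sending nothing at all.
    if resource_in_use:
        channel.put("IN_USE")

def await_reply(channel: queue.Queue) -> str:
    # Receiver side: if nothing arrives within delta, the sender's
    # inaction is read as the NULL message meaning "free".
    try:
        return channel.get(timeout=DELTA)
    except queue.Empty:
        return "FREE"  # NULL message, reconstructed from silence
```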

Another Synchronous System
- Round-based: each round is characterized by the time needed to receive and process all messages.

Lamport's version: Use Physical Clocks
- Also yields fault-tolerant real-time atomic broadcast.
- Assumptions about time lead to conclusions other than failure: the passage of time can also carry "positive" information.
- Provides generality for distributed computing problems: state machines, resource acquisition and locking.
- Expense?

Assumptions
- Bounded message delay δ. Requires bandwidth guarantees; a message delayed by more than δ is treated as a failure.
- Clock synchronization: clock times differ by less than ε. Use clock synchronization algorithms (could be costly; revisited in the next lecture).
- Any process can determine a message's origin (e.g., using HMAC signatures).
- The network cannot be partitioned.

An Algorithm…
At local clock time T_i:
  If the send message queue is not empty:
    send m with timestamp T_i
  If the receive message queue is not empty:
    If the queue contains exactly one message m from j with timestamp T_i − (δ + ε):
      received message = m
    Else:
      received message = NULL
This implies Δ = δ + ε.
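A hedged Python sketch of this rule, with messages as (timestamp, sender, payload) triples; all names below are invented for illustration, not the paper's notation:

```python
DELTA = 0.05     # delta: bound on message delay (illustrative)
EPSILON = 0.01   # epsilon: bound on clock skew (illustrative)
CAP_DELTA = DELTA + EPSILON  # the slide's capital Delta

def step(now, send_queue, recv_queue, network):
    """One step of the point-to-point algorithm at local clock time `now`."""
    # Send anything pending, stamped with the local clock T_i.
    while send_queue:
        network.append((now, "me", send_queue.pop(0)))

    # Deliver the message timestamped now - (delta + epsilon); anything other
    # than exactly one match is delivered as NULL.
    due = [m for m in recv_queue if m[0] == now - CAP_DELTA]
    return due[0] if len(due) == 1 else None  # None stands in for NULL
```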

Example
[Figure: timeline for processes i, j, j′. A message M timestamped T_i is delivered at T_i + Δ; correct receivers j and j′ deliver by T_j + Δ and T_j′ + Δ, with clocks skewed by at most ε.]

More
- This can be expressed more elegantly as a broadcast algorithm (more later).
- The definition can be extended inductively to allow "routing" across a path of length n: Δ = n·δ + ε.
- To tolerate f failstop failures, we will need f + 1 disjoint paths.
- To tolerate f Byzantine failures, we will need 2·f + 1 disjoint paths.
- Transmitting a NULL message is easy: do nothing.

Even More
- For good guarantees, we need close synchronization.
- A message timestamped T_message arrives somewhere in the interval [T_message − ε, T_message + δ + ε].
- Thus, the receiver needs to wait δ + ε.

Synchronization required?
- We need a means to reliably broadcast to all other processes.
- For a process p broadcasting message M at time T_p, every correct process must receive the message at time T_p + Δ.
- For correct processes j and j′: either both receive the message, by T_j + Δ and T_j′ + Δ respectively, or neither does.

= Atomic Broadcast
- Atomicity: all correct processors receive the same messages.
- Same order: all messages are delivered in the same order to all processors.
- Termination: all updates are delivered by T + Δ.

Lamport's Assumption
- Somebody implements the Atomic Broadcast black box (the next slide summarizes options).
- Lamport briefly explains that the previous point-to-point algorithm is strong enough: it only assumes the ability to send along a path correctly.

Atomic Broadcast: [CASD]*
Describes 3 atomic broadcast algorithms, all based on diffusion (flooding), with varying degrees of protection:
1. Tolerant of omission failures: Δ = π·δ + d·δ + ε
2. Works in the presence of clock failures: Δ = π·(δ + ε) + d·δ + ε
3. Works in the presence of Byzantine failures: Δ = π·(δ + ε) + d·δ + ε, with δ much larger than before to pay for message authentication
* F. Cristian, H. Aghili, R. Strong and D. Dolev, "Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement", Proc. 15th Int. Symp. on Fault-Tolerant Computing, June 1985.
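A small arithmetic sketch of how the tolerated failure class inflates Δ; π and d are parameters of the CASD analysis that the slide leaves undefined, and the numbers below are purely illustrative:

```python
def omission(pi, d, delta, eps):
    return pi * delta + d * delta + eps

def clock_or_byzantine(pi, d, delta, eps):
    # Same formula for variants 2 and 3; variant 3 just needs a much
    # larger delta to cover message authentication.
    return pi * (delta + eps) + d * delta + eps

pi, d, delta, eps = 3, 4, 0.050, 0.010  # made-up values
print(omission(pi, d, delta, eps))            # ≈ 0.36
print(clock_or_byzantine(pi, d, delta, eps))  # ≈ 0.39
```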

State Machine
- A general model for computation (state machine = computer!).
- Describe a computation in terms of state plus transformations on the state.

State Machines
- Multiple replicas run in lock-step.
- The number of replicas is bounded below by the fault-tolerance objectives:
- Failstop model: failover, with ≥ f + 1 replicas to tolerate f failures.
- Byzantine model: voting, with ≥ 2·f + 1 replicas to tolerate f failures.

State Machine: Implementation
Let CLOCK = current time
While (TRUE):
  Execute messages timestamped CLOCK − Δ
  Execute local processing(CLOCK)
  Generate and send messages timestamped CLOCK
If multiple messages have the same timestamp, create an ordering based on the sending process.
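A minimal runnable sketch of this loop, assuming discrete clock ticks and a broadcast layer that fills `inbox`; every name here (`inbox`, `local_work`, `DELTA_TICKS`) is an illustrative stand-in, not the paper's notation:

```python
import itertools

DELTA_TICKS = 3  # Delta expressed in whole clock ticks (illustrative)

def run_replica(state, inbox, send, local_work, my_id):
    """One replica of the lock-step state machine.

    inbox[t] holds the (sender_id, command) pairs timestamped t that the
    broadcast layer guarantees are all present by tick t + DELTA_TICKS.
    """
    for clock in itertools.count():
        # Execute messages timestamped clock - Delta; sorting by sender id
        # breaks timestamp ties the same way on every replica.
        for sender_id, command in sorted(inbox.pop(clock - DELTA_TICKS, []),
                                         key=lambda m: m[0]):
            state = command(state)
        state = local_work(state, clock)   # local processing at time `clock`
        send((clock, my_id, state))        # outgoing messages stamped `clock`
```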

State Machine (Cont.)
- If we use our broadcast algorithm, all processes will get each message by T_sender + Δ.
- Using the sending process id to break ties ensures everyone executes messages in the same order.

State Machines for Distributed Applications
Resource allocation:
- All processes maintain a list of which process has the resource "locked"; a lock expires after Δ′ seconds.
- Requests for the resource are broadcast to all.
- Rules, followed by all correct processes, govern who is granted the lock (one such rule is sketched below); they must ensure no starvation and maintain the consistency of resource locking.
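A minimal sketch of one possible grant rule, assuming the broadcast layer has already delivered the same request set to every correct process; granting the earliest timestamp with id tie-breaking is an illustrative choice, not necessarily the paper's exact rule:

```python
def grant(requests):
    """Deterministic grant rule run identically by every correct process.

    `requests` is the full set of (timestamp, process_id) lock requests
    that every correct process is guaranteed to have seen by now.
    Earliest timestamp wins; process id breaks ties, so the choice is
    the same everywhere and, with bounded request rates, starvation-free.
    """
    return min(requests)  # tuple order: earliest timestamp, then lowest id

# Every correct process sees the same request set by T + Delta, so all
# grant the same winner with no acknowledgements exchanged.
print(grant({(10, "j"), (10, "i"), (12, "k")}))  # -> (10, 'i')
```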

Example: Resource Allocation
[Figure: processes i, j, j′ broadcast requests R and R′ at local times T_i, T_j, T_j′; each waits Δ before acting on its own request.]

Comparison
- No explicit acknowledgement is needed (one would be needed in a traditional asynchronous algorithm).
- Here, the requesting process knows that any conflicting request will have arrived within the T + Δ window.

Key: the non-occurrence of an event (a non-request) conveys information: we can safely lock the resource! The cost is the delay, as each message sits in a "holding pen."
Concern about scalability in n, not addressed by the paper: we always see up to n requests in each Δ time period, so Δ will grow with n. Request processing time must be bounded so that all requests can be satisfied (else a process with a higher id, and hence lower priority, could starve).

More on Comparison: Resource Allocation

                 Timeout                       Time [Lamport]
Max delay        2·δ                           Δ = δ + ε
Average delay    2·δ_exp                       Δ = δ + ε
Messages         n + (depends on failure mode) depends on failure mode

But is request processing time the "real" issue?

Characterizing ε
- ε is proportional to δ_var, the variance in message delay.
- δ_var is small for low-level algorithms, so low-level algorithms can achieve good clock synchronization.
- δ_var is large for high-level algorithms: variance is added by traversing the low levels of the protocol stack.

Summary…
- An application expressed as state machine transitions can easily be transferred to a distributed algorithm.
- An event-based implementation can easily be created from the transitions.

Other State Machine uses
- Distributed semaphores.
- Transaction commit.
- A state machine synchronization core on top of distributed applications: the entire application need not be a distributed state machine.

Ideas in this paper
- Coordination and the passing of time modeled as synchronous execution of the steps of a state machine.
- The absence of a message becomes a NULL message after delay Δ.
- A notion of dynamic membership (vague).
- Broadcast to drive the state machine (vague).
- State transfer for restart (vague).
- Scalability in n (not addressed).
- Fault-tolerance (ignores application semantics).
- Δ-T behavior (a real-time mechanism).

Discussion
- How far can we take the state machine model? Can it be made to scale well?
- Is the extreme dependence on clock synchronization practical? Worth it?
- There is a possibly large waiting time for each message, dependent upon the worst-case message delivery latency.