Consistent Cuts Ken Birman

Idea
- We would like to take a snapshot of the state of a distributed computation
- We’ll do this by asking participants to jot down their states
- Under what conditions can the resulting “puzzle pieces” be assembled into a consistent whole?

An instant in real-time
- Imagine that we could photograph the system in real-time at some instant
- Process state: a set of variables and values
- Channel state: messages in transit through the network
- In principle, the system is fully defined by the set of such states

Problems?
- Real systems don’t have real-time snapshot facilities
- In fact, real systems may not have channels in this sense, either
- How can we approximate the real-time concept of a cut using purely “logical time”?

Deadlock detection
- Need to detect cycles
[Figure: graph over processes A, B, C, D]
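Detecting a cycle can be sketched as a depth-first search over a wait-for graph. This is a minimal illustration, not code from the lecture: the `find_cycle` name and the dictionary representation (each process maps to the processes it is waiting on) are assumptions made for the example.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of processes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {p: WHITE for p in wait_for}
    for targets in wait_for.values():
        for t in targets:
            colour.setdefault(t, WHITE)
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        colour[p] = GREY
        path.append(p)
        for q in wait_for.get(p, []):
            if colour[q] == GREY:
                # q is already on the current path, so path[q..] closes a cycle
                return path[path.index(q):]
            if colour[q] == WHITE:
                found = visit(q)
                if found is not None:
                    return found
        path.pop()
        colour[p] = BLACK
        return None

    for p in list(colour):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle is not None:
                return cycle
    return None


# A waits on B, B on C, C on D, D on A: a deadlock.
deadlocked = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
print(find_cycle(deadlocked))
```

The search itself is the easy part. The rest of the lecture is about obtaining a graph that is safe to run it on.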

Deadlock is a “stable” property
- Once a deadlock occurs, it holds in all future states of the system
- Easy to prove that if a snapshot is computed correctly, a stable condition detected in the snapshot will continue to hold
  - Insight is that adding events can’t “undo” the condition

Leads us to define consistent cut and snapshot
- Think of the execution of a process as a history of events, Lamport-style
  - Events can be local, msg-send, msg-rcv
- A consistent snapshot is a set of history prefixes and messages closed under causality
- A consistent cut is the frontier of a consistent snapshot: the “process states”
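The definition can be checked mechanically. The sketch below assumes a hypothetical encoding: each history is a list of `(kind, message_id)` events, and a cut gives the length of the prefix kept for each process. Because a cut is already a set of prefixes, closure under causality reduces to "no message is received inside the cut unless it was also sent inside the cut".

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (kind, message id); kind is "local", "send" or "recv"


def _messages(histories: Dict[str, List[Event]], cut: Dict[str, int], kind: str) -> Set[str]:
    """Message ids of all events of the given kind that lie inside the cut."""
    return {msg
            for proc, events in histories.items()
            for k, msg in events[:cut[proc]]
            if k == kind}


def is_consistent(histories: Dict[str, List[Event]], cut: Dict[str, int]) -> bool:
    """True if every receive inside the cut has its matching send inside the cut."""
    return _messages(histories, cut, "recv") <= _messages(histories, cut, "send")


def channel_state(histories: Dict[str, List[Event]], cut: Dict[str, int]) -> Set[str]:
    """Messages sent inside the cut but not yet received inside it (in transit)."""
    return _messages(histories, cut, "send") - _messages(histories, cut, "recv")


# Hypothetical run: p sends m1 to q, then q sends m2 to r.
histories = {
    "p": [("send", "m1"), ("local", "")],
    "q": [("recv", "m1"), ("send", "m2")],
    "r": [("recv", "m2")],
}

print(is_consistent(histories, {"p": 1, "q": 1, "r": 0}))  # send and receive of m1 both inside
print(is_consistent(histories, {"p": 0, "q": 1, "r": 0}))  # m1 received but never sent
print(channel_state(histories, {"p": 1, "q": 0, "r": 0}))  # m1 is in transit
```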

Deadlock detection
- A “ghost” or “false” cycle!
[Figure: graph over processes A, B, C, D]

A ghost deadlock
- Occurs when we accidentally snapshot process states so as to include some events while omitting events that causally precede them
- Can’t occur if the cut is computed consistently, since this violates the causal closure requirement

A ghost deadlock
[Figure: diagram labeled with processes A, B, C, D]

Algorithms for computing consistent cuts
- Paper focuses on a flooding algorithm
- We’ll consider several other methods too
  - Logical timestamps
  - Flooding algorithm without blocking
  - Two-phase commit with blocking
- Each pattern arises commonly in distributed systems we’ll look at in coming weeks

Cuts using logical clocks
- Suppose that we have Lamport’s basic logical clocks
- But we add a new operation called “snap”
  - Write down your process state
  - Create empty “channel state” structure
  - Set your logical clock to some big value
    - Think of clock as (epoch-number, counter)?
  - Record channel state until rcv message with big incoming clock value

How does this work?
- Recall that with Lamport’s clocks, if e is causally prior to e’ then LT(e) < LT(e’)
- Our scheme creates a snapshot for each process at the instant it reaches logical time t
- Easy to see that these events are concurrent: a possible instant in real-time
- Depends upon FIFO channels, and we can’t easily tell when the cut is complete; a sort of lazy version of the flooding algorithm
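A rough sketch of the “snap” operation on top of Lamport clocks follows. Everything here is an illustrative assumption rather than the lecture's code: the `Process` class, the `balance` field standing in for process state, and `SNAP_TIME` as the “big value”. Channels are assumed FIFO, and ordinary clock values are assumed to stay below `SNAP_TIME`.

```python
SNAP_TIME = 1_000_000  # hypothetical "big value"; normal timestamps stay below it


class Process:
    def __init__(self, name, incoming):
        self.name = name
        self.clock = 0
        self.balance = 0            # stand-in for the application state
        self.saved_state = None     # process state written down at the cut
        self.channel_state = {src: [] for src in incoming}
        self.recording = set()      # incoming channels still being recorded

    def snap(self):
        """Write down the state, start recording channels, jump the clock."""
        if self.saved_state is None:
            self.saved_state = self.balance
            self.recording = set(self.channel_state)
            self.clock = max(self.clock, SNAP_TIME)

    def send(self, amount):
        self.clock += 1
        self.balance -= amount
        return (self.clock, amount)  # the message carries the sender's timestamp

    def receive(self, src, message):
        timestamp, amount = message
        if timestamp >= SNAP_TIME:
            # Sender is already past the cut: snapshot before applying the message.
            self.snap()
            # FIFO: nothing sent before the cut can still arrive on this channel.
            self.recording.discard(src)
        elif src in self.recording:
            # Sent before the cut, received after it: part of the channel state.
            self.channel_state[src].append(amount)
        self.clock = max(self.clock, timestamp) + 1
        self.balance += amount


p, q = Process("p", ["q"]), Process("q", ["p"])
p.balance, q.balance = 10, 5

q.snap()               # q crosses the cut first
m = p.send(3)          # sent before p's cut
q.receive("p", m)      # arrives after q's cut, so it is recorded on the channel
p.snap()

total = p.saved_state + q.saved_state + sum(q.channel_state["p"])
print(total)           # the snapshot accounts for all 15 units
```

Note that nothing in this sketch tells a process when its channel recording is finished unless a high-timestamp message happens to arrive, which is the "lazy" aspect mentioned above.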

Flooding algorithm
- To make a cut, observer sends out messages “snap”
- On receiving “snap” the first time, A
  - Writes down its state, creates empty channel state record for all incoming channels
  - Sends “snap” to all neighbor processes
  - Waits for “snap” on all incoming channels
- A’s piece of the snapshot is its state and the channel contents once it receives “snap” from all neighbors
- Note: also assumes FIFO channels
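The steps above can be sketched as a small single-threaded simulation. The `Network` and `Node` classes, the token-count state, and the choice to let node A act as its own observer are all assumptions made for the example. FIFO delivery is modeled with one queue per directed channel.

```python
from collections import deque

SNAP = "snap"  # marker message; every other message is a token count (int)


class Network:
    """FIFO channels: one deque per ordered (src, dst) pair."""

    def __init__(self):
        self.nodes = {}
        self.channels = {}

    def add(self, node):
        self.nodes[node.name] = node

    def send(self, src, dst, message):
        self.channels.setdefault((src, dst), deque()).append(message)

    def run(self):
        """Deliver the head of each non-empty channel until all are drained."""
        progressed = True
        while progressed:
            progressed = False
            for src, dst in list(self.channels):
                if self.channels[(src, dst)]:
                    message = self.channels[(src, dst)].popleft()
                    self.nodes[dst].deliver(src, message, self)
                    progressed = True


class Node:
    def __init__(self, name, neighbors, tokens):
        self.name, self.neighbors, self.tokens = name, list(neighbors), tokens
        self.saved_state = None
        self.channel_state = {}
        self.awaiting = set()  # incoming channels on which "snap" has not arrived

    def transfer(self, dst, amount, network):
        self.tokens -= amount
        network.send(self.name, dst, amount)

    def start_snapshot(self, network):
        self.saved_state = self.tokens
        self.channel_state = {n: [] for n in self.neighbors}
        self.awaiting = set(self.neighbors)
        for n in self.neighbors:
            network.send(self.name, n, SNAP)

    def deliver(self, src, message, network):
        if message == SNAP:
            if self.saved_state is None:      # first "snap" seen
                self.start_snapshot(network)
            self.awaiting.discard(src)        # this channel's contents are final
        else:
            if src in self.awaiting:          # arrived after our state was saved
                self.channel_state[src].append(message)
            self.tokens += message

    def done(self):
        return self.saved_state is not None and not self.awaiting


net = Network()
a = Node("A", ["B", "C"], 5)
b = Node("B", ["A", "C"], 3)
c = Node("C", ["A", "B"], 2)
for node in (a, b, c):
    net.add(node)

a.transfer("B", 2, net)   # in flight when the snapshot starts
a.start_snapshot(net)
b.transfer("A", 1, net)   # sent before B has seen "snap"
net.run()

recorded = sum(n.saved_state + sum(sum(msgs) for msgs in n.channel_state.values())
               for n in (a, b, c))
print(recorded)           # all 10 tokens are accounted for
```

In this run the token B sends to A ends up in A's recorded channel state rather than in either process state, so the pieces still add up to the original total.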

With 2-phase commit
- In this, the initiator sends to all neighbors: “Please halt”
  - A halts computation, sends “please halt” to all downstream neighbors
  - Waits for “halted” from all of them
  - Replies “halted” to upstream caller
- Now initiator sends “snap”
  - A forwards “snap” downstream
  - Waits for replies
  - Collects them into its own state
  - Sends its own state to upstream caller and resumes

Why does this work?
- Forces the system into an idle state
- In this situation, nothing is changing…
- Usually, sender in this scheme records unacknowledged outgoing channel state
- Alternative: upstream process tells receiver how many incoming messages to await, receiver does so and includes them in its state
- So a snapshot can be safely computed and there is nothing unaccounted for in the channels
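The halt-then-snap pattern can be sketched with synchronous calls standing in for the “please halt”, “halted” and “snap” messages. This is a simplified illustration under stated assumptions: the processes form a tree (so there is exactly one upstream caller per process), and the `Worker` class, its `state` field and its `unacked` list are hypothetical names.

```python
class Worker:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # downstream neighbors
        self.state = 0                  # stand-in for application state
        self.unacked = []               # messages sent but not yet acknowledged
        self.halted = False

    def halt(self):
        """Phase 1: stop computing, halt everything downstream, then reply."""
        self.halted = True
        for child in self.children:
            child.halt()                # returns once the child's subtree is halted
        return "halted"

    def snap(self):
        """Phase 2: collect downstream snapshots, add our own piece, resume."""
        if not self.halted:
            raise RuntimeError("snap requested before halt")
        snapshot = {}
        for child in self.children:
            snapshot.update(child.snap())
        # The sender records unacknowledged outgoing messages as channel state.
        snapshot[self.name] = {"state": self.state, "in_flight": list(self.unacked)}
        self.halted = False             # resume computation
        return snapshot


leaf_b, leaf_c = Worker("B"), Worker("C")
root = Worker("A", [leaf_b, leaf_c])
root.state, leaf_b.state = 1, 2
leaf_b.unacked = ["m7"]                 # hypothetical message still in the channel

root.halt()
result = root.snap()
print(result)
```

Because nothing runs between the two phases, the collected states and the recorded unacknowledged messages describe one idle instant of the whole tree.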

Observation
- Suppose we use a two-phase property detection algorithm
- In the first phase, it asks (for example), “what is your current state?”
- You reply “waiting for a reply from B” and give a “wait counter”
- If a second round of the same algorithm detects the same condition with the same wait-counter values, the condition is stable…

A ghost deadlock
[Figure: diagram labeled with processes A, B, C, D]

Look twice and it goes away…
- But we could see “new” wait edges mimicking the old ones
- This is why we need some form of counter to distinguish same-old condition from new edges on the same channels…
- Easily extended to other conditions
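The two-round check with wait counters might look like the following sketch. The representation is an assumption for illustration: each round is a dictionary mapping a wait edge `(waiter, waited_on)` to the waiter's counter, which the waiter is assumed to increment each time it starts a new wait.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (waiting process, process it waits on)


def confirmed_edges(first: Dict[Edge, int], second: Dict[Edge, int]) -> Set[Edge]:
    """Wait edges reported in both rounds with the same wait-counter value."""
    return {edge for edge, counter in first.items() if second.get(edge) == counter}


def cycle_confirmed(cycle: List[str], first: Dict[Edge, int], second: Dict[Edge, int]) -> bool:
    """True if every edge of the suspected cycle was unchanged across both rounds."""
    edges = set(zip(cycle, cycle[1:] + cycle[:1]))
    return edges <= confirmed_edges(first, second)


round1 = {("A", "B"): 1, ("B", "C"): 4, ("C", "A"): 2}

# Same edges, same counters: the waits never ended, so the cycle is real.
round2_same = {("A", "B"): 1, ("B", "C"): 4, ("C", "A"): 2}

# B's counter moved from 4 to 5: its old wait finished and a new one began,
# so the cycle seen in round 1 was a ghost.
round2_new_wait = {("A", "B"): 1, ("B", "C"): 5, ("C", "A"): 2}

print(cycle_confirmed(["A", "B", "C"], round1, round2_same))
print(cycle_confirmed(["A", "B", "C"], round1, round2_new_wait))
```

Without the counter, the second case would look identical to the first, which is exactly the "new edges mimicking the old ones" problem.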

Consistent cuts
- Offer the illusion that you took a picture of the system at an instant in real-time
- A powerful notion widely used in real systems
  - Especially valuable after a failure
  - Allows us to reconstruct the state so that we can repair it, e.g. recreate missing tokens
- But has awkward hidden assumptions

Hidden assumptions
- Use of FIFO channels is a problem
  - Many systems use some form of datagram
  - Many systems have multiple concurrent senders on same paths
- These algorithms assume knowledge of system membership
  - Hard to make them fault-tolerant
  - Recall that a slow process can seem “faulty”

High costs
- With flooding algorithm, n^2 messages
- With 2-phase commit algorithm, system pauses for a long time
- We’ll see some tricky ways to hide these costs, either by continuing to run but somehow delaying delivery of messages to the application, or by treating the cut algorithm as a background task
- Could have concurrent activities that view same messages in different ways…

Fault-tolerance
- Many issues here
  - Who should run the algorithm?
  - If we decide that a process is faulty, what happens if a message from it then turns up?
  - What if failures leave a “hole” in the system state: missing messages or missing process state?
- Problems are overcome in virtual synchrony implementations of group communication tools

Systems issues
- Suppose that I want to add notions such as real-time, logical time, consistent cuts, etc. to a complex real-world operating system (list goes on)
- How should these abstractions be integrated with the usual O/S interfaces, like the file system, the process subsystem, etc.?
- Only virtual synchrony has really tackled these kinds of questions, but one could imagine much better solutions. A possible research topic for a PhD in software engineering

Theory issues
- Lamport’s ideas are fundamentally rooted in static notions of system membership
- Later, with his work on Paxos, he adds the idea of dynamically changing subsets of a static maximum set
- Does true dynamism, of the sort used when we look at virtual synchrony, have fundamental implications?

Example of a theory question
- Suppose that I want to add a “location” type to a language like Java:
  - Object o is at process p at computer x
  - Objects {a,b,c} are replicas of
- Now notions of system membership and location are very fundamental to the type system
- Need a logic of locations. How should it look?
- Extend to a logic of replication and self-defined membership? But FLP lurks in the shadows…

FLP

Other questions
- Checkpoint/rollback
  - Processes make checkpoints, probably when convenient
  - Some systems try to tell a process “when” to make them, using some form of signal or interrupt
  - But this tends to result in awkward, large checkpoints
  - Later if a fault occurs we can restart from the most recent checkpoint

So, where’s the question?
- The issue arises when systems use message passing and want to checkpoint/restart
- Few applications are deterministic
  - Clocks, signals, threads & scheduling, interrupts, multiple I/O channels, order in which messages arrived, user input…
- When rolling forward from a checkpoint, actions might not be identical
  - Hence anyone who “saw” my actions may be in a state that won’t be recreated!

Technical question
- Suppose we make checkpoints in an uncoordinated manner
- Now process p fails
- Which other processes should roll back?
- And how far might this rollback cascade?
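One way to explore these questions is to compute how far a rollback propagates. The sketch below uses a hypothetical encoding: events on each process are numbered from 1, a checkpoint value `c` means "state after the first `c` events" (with 0 as the initial state), and each message is `(sender, send_index, receiver, recv_index)`. A message is an orphan if its send has been undone but its receive has not. The receiver of an orphan must then roll back too.

```python
import math
from typing import Dict, List, Tuple

Message = Tuple[str, int, str, int]  # (sender, send index, receiver, recv index)


def rollback_line(checkpoints: Dict[str, List[int]],
                  messages: List[Message],
                  failed: str) -> Dict[str, float]:
    """Return, per process, how many events survive (math.inf = no rollback).

    Assumes every checkpoint list contains 0, the initial state.
    """
    keep: Dict[str, float] = {p: math.inf for p in checkpoints}
    keep[failed] = max(checkpoints[failed])  # restart from most recent checkpoint
    changed = True
    while changed:
        changed = False
        for sender, s, receiver, r in messages:
            # Orphan: the send was rolled back but the receive is still in place.
            if s > keep[sender] and r <= keep[receiver]:
                keep[receiver] = max(c for c in checkpoints[receiver] if c < r)
                changed = True
    return keep


# p checkpoints after 2 events, then sends m1; q receives m1 as its 3rd event
# (after its own checkpoint at 2), then sends m2, which r receives first thing.
chain = rollback_line(
    checkpoints={"p": [0, 2], "q": [0, 2], "r": [0]},
    messages=[("p", 3, "q", 3), ("q", 4, "r", 1)],
    failed="p",
)
print(chain)

# Zig-zag pattern: each checkpoint "saw" a message the other side must undo,
# so the rollback cascades all the way to the initial states.
domino = rollback_line(
    checkpoints={"p": [0, 2], "q": [0, 2]},
    messages=[("q", 1, "p", 1), ("p", 3, "q", 2)],
    failed="p",
)
print(domino)
```

The second scenario shows that with uncoordinated checkpoints the cascade has no useful bound: both processes end up back at their initial states.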

Rollback scenario

Avoiding cascaded rollback?
- Both making checkpoints, and rolling back, should happen along consistent cuts
- In the mid-1980s, several papers developed this into simple 2-phase protocols
- Today we would recognize them as algorithms that simply run on consistent cuts
- For those who are interested: sender-based logging is the best algorithm in this area (Alvisi’s work)