CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.

Slides:



Advertisements
Similar presentations
Global States.
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 6 Instructor: Haifeng YU.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
1 CS 194: Elections, Exclusion and Transactions Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Lock-Based Concurrency Control
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
CS542 Topics in Distributed Systems Diganta Goswami.
Consistent Cuts Ken Birman. Idea  We would like to take a snapshot of the state of a distributed computation  We’ll do this by asking participants to.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
Ken Birman Cornell University. CS5410 Fall
CS 582 / CMPE 481 Distributed Systems
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
Computer Science Lecture 11, page 1 CS677: Distributed OS Last Class: Clock Synchronization Logical clocks Vector clocks Global state.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
Distributed Systems Overview
Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman.
Clock Synchronization and algorithm
Cloud Computing Concepts
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
© 1997 UW CSE 11/13/97N-1 Concurrency Control Chapter 18.1, 18.2, 18.5, 18.7.
Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport (1978) Presented by: Yoav Kantor.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
CS5412: REPLICATION, CONSISTENCY AND CLOCKS Ken Birman 1 CS5412 Spring 2014 (Cloud Computing: Birman) Lecture X.
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
CSC 536 Lecture 1. Outline Synchronization Clock synchronization Logical clocks Lamport timestamps Vector timestamps Global states and distributed snapshot.
Distributed Systems Fall 2010 Logical time, global states, and debugging.
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
Lecture 10 – Mutual Exclusion Distributed Systems.
Real-Time & MultiMedia Lab Synchronization Distributed System Jin-Seung,KIM.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU.
Logical Clocks. Topics r Logical clocks r Totally-Ordered Multicasting.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
1 Chapter 11 Global Properties (Distributed Termination)
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
CS3771 Today: Distributed Coordination  Previous class: Distributed File Systems Issues: Naming Strategies: Absolute Names, Mount Points (logical connection.
Distributed Systems Lecture 6 Global states and snapshots 1.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Consistent cut A cut is a set of events.
Vector Clocks and Distributed Snapshots
CSE 486/586 Distributed Systems Global States
Distributed Snapshots & Termination detection
Distributed Snapshot.
Distributed Systems: Atomicity, decision making, snapshots
COT 5611 Operating Systems Design Principles Spring 2012
EECS 498 Introduction to Distributed Systems Fall 2017
Distributed Snapshot.
湖南大学-信息科学与工程学院-计算机与科学系
Time And Global Clocks CMPT 431.
ITEC452 Distributed Computing Lecture 8 Distributed Snapshot
Distributed Snapshot.
CSE 486/586 Distributed Systems Global States
Consistent cut If this is not true, then the cut is inconsistent
COT 5611 Operating Systems Design Principles Spring 2014
Distributed Snapshot.
Presentation transcript:

CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA

Recall our discussion of time Logical clocks: represent part of  relation, small overhead Vector clocks: accurately represent  but more costly Wall clocks: tradeoff between precision and accuracy. Rarely precise enough for use in protocols Hence often view time as an “add on”

Today: “Simultaneous” actions There are many situations in which we want to talk about some form of simultaneous event Our missile interceptor is one case But think about updating replicated data Perhaps we have multiple conflicting updates The need is to ensure that they will happen in the same order at all copies This “looks” like a kind of simultaneous action

Temporal distortions Things can be complicated because we can’t predict Message delays (they vary constantly) Execution speeds (often a process shares a machine with many other tasks) Timing of external events Lamport looked at this question too

Temporal distortions What does “now” mean? p 0 a f e p 3 b p 2 p 1 c d

Temporal distortions What does “now” mean? p 0 a f e p 3 b p 2 p 1 c d

Temporal distortions Timelines can “stretch”… … caused by scheduling effects, message delays, message loss… p 0 a f e p 3 b p 2 p 1 c d

Temporal distortions Timelines can “shrink” E.g. something lets a machine speed up p 0 a f e p 3 b p 2 p 1 c d

Temporal distortions Cuts represent instants of time. But not every “cut” makes sense Black cuts could occur but not gray ones. p 0 a f e p 3 b p 2 p 1 c d

Consistent cuts and snapshots Idea is to identify system states that “might” have occurred in real-life Need to avoid capturing states in which a message is received but nobody is shown as having sent it This the problem with the gray cuts

Temporal distortions Red messages cross gray cuts “backwards” p 0 a f e p 3 b p 2 p 1 c d

Temporal distortions Red messages cross gray cuts “backwards” In a nutshell: the cut includes a message that “was never sent” p 0 a e p 3 b p 2 p 1 c

Who cares? Suppose, for example, that we want to do distributed deadlock detection System lets processes “wait” for actions by other processes A process can only do one thing at a time A deadlock occurs if there is a circular wait

Deadlock detection “algorithm” p worries: perhaps we have a deadlock p is waiting for q, so sends “what’s your state?” q, on receipt, is waiting for r, so sends the same question… and r for s…. And s is waiting on p.

Suppose we detect this state We see a cycle… … but is it a deadlock? pq s r Waiting for

Phantom deadlocks! Suppose system has a very high rate of locking. Then perhaps a lock release message “passed” a query message i.e. we see “q waiting for r” and “r waiting for s” but in fact, by the time we checked r, q was no longer waiting! In effect: we checked for deadlock on a gray cut – an inconsistent cut.

Consistent cuts and snapshots Goal is to draw a line across the system state such that Every message “received” by a process is shown as having been sent by some other process Some pending messages might still be in communication channels A “cut” is the frontier of a “snapshot”

Chandy/Lamport Algorithm Assume that if p i can talk to p j they do so using a lossless, FIFO connection Now think about logical clocks Suppose someone sets his clock way ahead and triggers a “flood” of messages As these reach each process, it advances its own time… eventually all do so. The point where time jumps forward is a consistent cut across the system

Using logical clocks to make cuts p 0 a f e p 3 b p 2 p 1 c d Message sets the time forward by a “lot” Algorithm requires FIFO channels: must delay e until b has been delivered!

Using logical clocks to make cuts p 0 a f e p 3 b p 2 p 1 c d “Cut” occurs at point where time advanced

Turn idea into an algorithm To start a new snapshot, p i … Builds a message: “P i is initiating snapshot k”. The tuple (p i, k) uniquely identifies the snapshot In general, on first learning about snapshot (p i, k), p x Writes down its state: p x ’s contribution to the snapshot Starts “tape recorders” for all communication channels Forwards the message on all outgoing channels Stops “tape recorder” for a channel when a snapshot message for (p i, k) is received on it Snapshot consists of all the local state contributions and all the tape-recordings for the channels

Chandy/Lamport This algorithm, but implemented with an outgoing flood, followed by an incoming wave of snapshot contributions Snapshot ends up accumulating at the initiator, p i Algorithm doesn’t tolerate process failures or message failures.

Chandy/Lamport p q r s t u v w x y z A network

Chandy/Lamport p q r s t u v w x y z A network I want to start a snapshot

Chandy/Lamport p q r s t u v w x y z A network p records local state

Chandy/Lamport p q r s t u v w x y z A network p starts monitoring incoming channels

Chandy/Lamport p q r s t u v w x y z A network “contents of channel p- y”

Chandy/Lamport p q r s t u v w x y z A network p floods message on outgoing channels…

Chandy/Lamport p q r s t u v w x y z A network

Chandy/Lamport p q r s t u v w x y z A network q is done

Chandy/Lamport p q r s t u v w x y z A network q

Chandy/Lamport p q r s t u v w x y z A network q

Chandy/Lamport p q r s t u v w x y z A network q z s

Chandy/Lamport p q r s t u v w x y z A network q v z x u s

Chandy/Lamport p q r s t u v w x y z A network q v w z x u s y r

Chandy/Lamport p q r s t u v w x y z A snapshot of a network q x u s v r t w p y z Done!

What’s in the “state”? In practice we only record things important to the application running the algorithm, not the “whole” state E.g. “locks currently held”, “lock release messages” Idea is that the snapshot will be Easy to analyze, letting us build a picture of the system state And will have everything that matters for our real purpose, like deadlock detection

Other algorithms? Many algorithms have a consistent cut mechanism hidden within More broadly we’ll see that notions of time are sometimes explicit in algorithms But are often used as the insight that motivated the developer By thinking about time, he or she was able to reason about a protocol We’ll often use this approach