Ordering and Consistent Cuts

Ordering and Consistent Cuts
Nicole Caruso Cornell University Dept. of Computer Science Time, Clocks, and the Ordering of Events in a Distributed System Distributed Snapshots: Determining Global States of Distributed Systems

Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport Stanford Research Institute Time, Clocks, and the Ordering of Events in a Distributed System

About the Author Leslie Lamport Stanford Research Institute Leslie Lamport born 1941 in New York City, Bronx High School of Science, B.S. in mathematics from MIT in 1960, M.A. and Ph.D. in mathematics from Brandeis University, respectively in 1963 and 1972, His dissertation was about singularities in analytic partial differential equations. Lamport is best known for his seminal work in distributed systems and as the initial developer of the document preparation system LaTeX. Lamport’s research contributions have laid the foundations of the theory of distributed systems. * “Time, Clocks, and the Ordering of Events in a Distributed System” 5578 * “The Byzantine Generals Problem” 2209 * “Distributed Snapshots: Determining Global States of a Distributed System” 1966 Describe algorithms to solve many fundamental problems in distributed systems, including: * threads in a computer system that require the same resources at the same time * the snapshot algorithm for the determination of consistent global states. Time, Clocks, and the Ordering of Events in a Distributed System

Introduction Our concept of time Distributed system’s concept of time Read news Cross street See green light Person Cars Observer Read news Cross street See green light Person Cars Observer ? Our fundamental concept of time is derived from the way that events are ordered in relation to one another. However, our instinctive understanding of time does not necessarily hold when considering events in a distributed system. This is partly because we claim that event A “happened before” event B, based on the observation that A happened at an earlier time than B, while a distributed system thinks that event A “happened before” event B, based on events observable within the system. But the most significant cause is that events in a distributed system inherently follow partial ordering, which means that identifying which event preceded the other is often impossible. Problems often arise because people are not fully aware of this. Time, Clocks, and the Ordering of Events in a Distributed System

Introduction Coordination of Distributed Systems Lack of Understanding Partial Ordering of Events Total Ordering of Events A distributed system consists of a collection of autonomous computers, connected through a network. The goal of a distributed system is to coordinate processes so that so that they behave as a single, integrated computing facility. Events in a distributed system inherently follow a partial ordering, which include concurrency and is non-deterministic. System designers did not fully understand the temporal characteristics involved in having multiple processes. To solve this problem, we need a high-level distributed algorithm to provide a consistent total ordering of all events. In order to understand this algorithm, I will first describe partial ordering as defined by the “happened before” relation, and then extend this concept to total ordering. Time, Clocks, and the Ordering of Events in a Distributed System

Outline Partial Ordering of Events Logical Clocks Total Ordering of Events Anomalous Behavior Physical Clocks Time, Clocks, and the Ordering of Events in a Distributed System

Partial Ordering of Events
System Definition System contains spatially separated processes Process contains a sequence of events Event manifestation is arbitrary, but must include message sending and message receiving System definition System is composed of a collection of spatially separated processes Each process consists of a sequence of events. Events of a single process form a sequence, or a priori total ordering Event: Granularity of an event is arbitrary: subprogram or single instruction Assume that events include inter-process communication: sending messages and receiving messages Time, Clocks, and the Ordering of Events in a Distributed System

Mathematical Properties Asymmetric If a\<b, then b<a If a\<b and b\<a, then a=b Transitive If a\<b and b\<c, then a\<c Reflexive a\<a Time, Clocks, and the Ordering of Events in a Distributed System

“Happened Before” Relation () Asymmetric: If ab, then b\a If a and b are in same process ab if a occurs before b If a and b are in different processes ab if a is sending of message and b is receipt of message If a is concurrent with b a\b and b\a Transitive: If ab and bc, then ac Reflexive: a\a “happened before” relation () is based on three conditions (1) within a single process: if a and b are within the same process, and if a occurs before b, then ab (2) across multiple processes: if a is within one process and b is within another process, and if a is the sending of a message and b is the receipt of the message, then ab one additional characteristic: concurrent: when a-/->b and b-/->a, implies that a and b are in different processes (3) transitive: if ab & bc, then ac one exception: although the mathematical “happened before” relation has a reflexive property, we ignore this since this is not physically meaningful Time, Clocks, and the Ordering of Events in a Distributed System

Typical Space-Time Diagram for a Distributed System Person Cars Observer See green light Cross street This definition is illustrated by a space time diagram. Horizontal direction represents spatially separated processes (lines). Vertical direction represents temporally separated events (dots). Green arrows indicate which events are messages. Observation: Possible for events (say p and q) to be concurrent even when one event occurs at an earlier physical time than the other. This is because process P cannot know about q until P receives a message about q. We could very well swap the two dots for P3, since there is no guarantee regarding which message P3 observes first. Read news Time, Clocks, and the Ordering of Events in a Distributed System

Logical Clocks Process Clock Ci Clock assigns number to event to represent time Assigns Ci(a) to each event a within Pi Belongs to one process Pi System Clock C Clock C(a) = Ci(a) number is thought of as the time at which the event occurred. Each process has a logical clock, which assigns a number to each event within the process More precisely, we define a clock “Ci” for each process “Pi” which assigns a number to each event “a” in “Pi”. Time, Clocks, and the Ordering of Events in a Distributed System

Logical Clocks Clock Condition: If ab, then C(a) < C(b) If events a and b are in the same process Pi Ci(a)<Ci(b) if a occurs before b If events a and b are in processes Pi and Pj Ci(a)<Cj(b) a is the sending of a message b is the receipt of the message The converse does not hold, since concurrent events may occur at different times. There must be a tick line between any two events on a process line There must be a tick line across any message line Time, Clocks, and the Ordering of Events in a Distributed System

Total Ordering of Events
Total ordering eliminates concurrency Identify message with event sending it Following Example Multiple processes compete for resource Told from point of view of one process Pi Total ordering is a consistent, deterministic way to eliminate concurrency. To create a total ordering, we identify the message with the event sending it. Time, Clocks, and the Ordering of Events in a Distributed System

Process Pi is granted resource Request Tm:Pi In Pi’s request queue Time-stamped before other requests in the queue Acknowledge Tm:Pi Received from Pj Time-stamped later than Request Tm:Pi Condition: Pi is granted the resource when two conditions are satisfied: There is a request resource message in Pi’s request queue. This message is totally ordered before other messages in the queue. Pi has received a message from every other process timestamped later than Tm Time, Clocks, and the Ordering of Events in a Distributed System

Step 1: Pi Sends Request Resource Pi sends Request Tm:Pi to Pj Pi puts Request Tm:Pi on its request queue T1:P1 T0:P1 P1 Step 1: Process Pi requests the resource. Pi puts a request message (with the current timestamp and its ID) on its request queue. P2 P3 Time, Clocks, and the Ordering of Events in a Distributed System

Step 1: Pi Sends Request Resource Pi sends Request Tm:Pi to Pj Pi puts Request Tm:Pi on its request queue T1:P1 T0:P1 P1 Step 1: Process Pi requests the resource. Pi sends the request to every other process Pj. request request T0:P1 T1:P1 P2 P3 Time, Clocks, and the Ordering of Events in a Distributed System

Step 2: Pj Adds Message Pj puts Request Tm:Pi on its request queue Pj sends Acknowledgement Tm:Pj to Pi T1:P1 T0:P1 P1 Step 2: Each process Pj sends acknowledgement to Pi. Each Pj puts the request message on its request queue P2 P3 T0:P1 T1:P1 Time, Clocks, and the Ordering of Events in a Distributed System

Step 2: Pj Adds Message Pj puts Request Tm:Pi on its request queue Pj sends Acknowledgement Tm:Pj to Pi T1:P1 T0:P1 P1 Step 2: Each process Pj sends acknowledgement to Pi. Pj sends an acknowledgement message (with the current timestamp and its ID) to Pi ack ack T2:P2 T3:P3 P2 P3 T0:P1 T1:P1 Time, Clocks, and the Ordering of Events in a Distributed System

Step 3: Pi Sends Release Resource Pi removes Request Tm:Pi from request queue Pi sends Release Tm:Pi to each Pj P1 Step 3: Release Resource Pi removes any of its own request message from its request queue P2 P3 T0:P1 T1:P1 Time, Clocks, and the Ordering of Events in a Distributed System

Step 3: Pi Sends Release Resource Pi removes Request Tm:Pi from request queue Pi sends Release Tm:Pi to each Pj P1 Step 3: Release Resource Pi sends a release resource message to each Pj release release T4:P1 T5:P1 P2 P3 T0:P1 T1:P1 Time, Clocks, and the Ordering of Events in a Distributed System

Step 4: Pj Removes Message Pj receives Release Tm:Pi from Pi Pj removes Request Tm:Pi from request queue P1 Step 4: Remove Message Pj removes any request resource message from its request queue. P2 P3 Time, Clocks, and the Ordering of Events in a Distributed System

Anomalous Behavior Discrepancy between universe/system Event sets and “happens before” relations PA PB PC b a PA PB PC b Universal Event Set S: ab System Event Set S: a\b and b\a a Time, Clocks, and the Ordering of Events in a Distributed System

Anomalous Behavior Strong Clock Condition For events a and b in system event set S If ab, Then C(a)<C(b) Attainable via physical clocks Time, Clocks, and the Ordering of Events in a Distributed System

Physical Clocks Physical Clock Ci(t) PC1 к << 1 for all i: | dCi(t)/dt˗̶̵ 1 | < к PC2 ϵ for all i,j: | Ci(t)˗̶̵ Cj(t) | < ϵ Also µ < smallest transmission time Physical Clock Ci(t) is a function of time, rather than in-system events PC1: There exists a constant k << 1 such that for all i: | dCi(t)/dt - 1 | < k PC2: There exists a constant e (sufficiently small) such that for all i,j: | Ci(t) - Cj(t) | < e Also: There exists a constant m such that: m < smallest transmission time for interprocess communication Time, Clocks, and the Ordering of Events in a Distributed System

Physical Clocks Prevent anomalous behavior Must ensure that Cj(t) < Ci(t+µ) How small must к and ϵ be? ϵ/(1- к) < µ Time, Clocks, and the Ordering of Events in a Distributed System

Discussion Partial ordering Total ordering Anomalous Behavior Physical clocks Extend partial ordering to total ordering via arbitrary tie-breaking. Constant value constraints for physical clocks prevents anomalous behavior. Time, Clocks, and the Ordering of Events in a Distributed System

Conclusions Coordination of Distributed Systems Partial Ordering of Events Total Ordering of Events Background necessary to understand concepts introduced in this paper is complex, but the concepts themselves are quite simple The goal of a distributed system is to coordinate processes so that so that they behave as a single, integrated computing facility. Events in a distributed system inherently follow partial ordering, which means that identifying which event preceded the other is often impossible. Designers did not fully understand the temporal characteristics involved in having multiple processes. To solve this problem, Lamport developed an algorithm to provide a consistent total ordering of all events. Time, Clocks, and the Ordering of Events in a Distributed System

K. Mani Chandy University of Texas at Austin
Distributed Snapshots: Determining Global States of Distributed Systems K. Mani Chandy University of Texas at Austin Leslie Lamport Stanford Research Institute Distributed Snapshots: Determining Global States of Distributed Systems

K. Mani Chandy University of Texas at Austin
About the Authors K. Mani Chandy University of Texas at Austin Leslie Lamport Stanford Research Institute Distributed Snapshots: Determining Global States of Distributed Systems

Introduction Panoramic dynamic scene Questions
Cannot capture with single snapshot Must piece together multiple snapshots Questions How should snapshots be taken? What criteria must overall picture satisfy? Global-state-detection algorithm Example: A photographer cannot capture a panoramic dynamic scene with a single snapshot. A group of photographers must take and piece together multiple snapshots to form a picture of the overall scene. Photographers cannot take snapshots simultaneously and should not disturb the subjects. Likewise: Processes take and piece together records to form a recorded global state Assume that processes do not share clocks or memory, so that Processes cannot take records simultaneously, and finally, processes should not disturb the computation. Goals: Determine how the records should be taken Determine the criteria that the recorded global state must satisfy in order to serve a given purpose Distributed Snapshots: Determining Global States of Distributed Systems

Introduction Process can record its own state
States of all processes form global state Record valid global system state Detect stable properties y(S) = true implies y(all states reachable from S) = true Process in a distributed system can determine a global system state by Record its own state and the messages it sends and receives Have other processes record their own states and communicating them by sending messages to p Set of recorded process and channel states form a global system state. Many distributed algorithms are structured as a sequence of phases. Each phase contains a transient part in which useful work is done, and a stable part in which successive iterations produce no change. Objective is to form a valid recorded global system state from the set of recorded states. Algorithm is applicable to detecting the termination of a phase, or equivalently, detecting stable properties. Once some stable property is true, then that stable property for all states reachable from S is also true Distributed Snapshots: Determining Global States of Distributed Systems

Outline Distributed System Model Global State Detection Algorithm Recorded Global State Properties Stability Detection Time, Clocks, and the Ordering of Events in a Distributed System

Distributed System Model
Process State(t) Event(t) Channel MessagesSent(t) Event Process P State S before State S’ after Channel C (incoming or outgoing from P) Messages (received by P or sent from P) distributed system consists of a finite set of processes and a finite set of channels. process or a channel is defined by a set of states and a set of events. state of a channel is the sequence of messages sent along the channel. event consists of process p to which it belongs, state s of p immediately before the event state s’ of p immediate after the event incident channel whose contents have changed contents of the channel in relation to the process Distributed Snapshots: Determining Global States of Distributed Systems

Example 1: Single Token System States Global: in-P P : sI C : empty C’ : empty Q : sO Global: in-C P : sO C : token Global: in-Q Q : sI Global: in-C’ C’ : token Event P sends Q receives Q sends Q P Distributed Snapshots: Determining Global States of Distributed Systems

Example 2: Nondeterministic System States Global: S0 P : sI C : empty C’ : empty Q : sI Q P Q P States Global: S3 P : sI C : M C’ : empty Q : sO Q P Q P Event P sends M States Global: S1 P : sO C : M C’ : empty Q : sI Event Q sends M’ States Global: S2 P : sO C : M C’ : M’ Q : sO Event P receives M’ Distributed Snapshots: Determining Global States of Distributed Systems

Algorithm Marker Sending Rule Marker Receiving Rule
P records its state P sends marker along each outgoing channel C Marker Receiving Rule Q receives a marker along incoming channel C If Q has not recorded its state Q records its state Else Q records C’s state as a sequence of messages If the graph is strongly connected and at least one process spontaneously records its state, then all processes will record their states in finite time. Distributed Snapshots: Determining Global States of Distributed Systems

Algorithm Example: Process P Obtains Global State from Process Q
Q receives P’s marker along channel C Q records its state Computation Q records C’s state Example: Process P Obtains Global State from Process Q Q receives P’s marker along channel C Q records its state Some Computation Q records C’s state as sequence of messages received along C during Computation Distributed Snapshots: Determining Global States of Distributed Systems

Recorded Global State Properties
Markers produce concurrent subsequence S* may not actually exist S* from combination of concurrent events No effect on preceding/following events S* reachable from Si So reachable from S* The markers break up events into concurrent subsequences Possible for the recorded global state S* to not be identical to any global states that actually occurred. It is okay if S* did not actually exist S* is just as valid as other concurrently permuted states S* is legitimately attainable from a different combination of concurrent events Reordering concurrent events has no effect on the preceding or following events As long as S* is reachable from Si, and So is reachable from S* Time, Clocks, and the Ordering of Events in a Distributed System

Recorded Global State Properties
Theorem 1: Exists Computation {e0’...en’} Events { e0’ ... ei -1’ } is equivalent to { e0 ... ei -1 } { ei’ ... eo -1’ } is a permutation of { ei ... eo -1 } { eo’ ... en’ } is equivalent to { eo ... en } States { S0’ ... Si’ } is equivalent to { S0 ... Si } For some k, where i<k<o, Sk’ = S* { So’ ... Sn’ } is equivalent to { So ... Sn } Theorem 1: There exists a computation { e0’ ... en’ } such that for Events { e0’ ... ei -1’ } is equivalent to { e0 ... ei -1 } { ei’ ... eo -1’ } is a permutation of { ei ... eo -1 } { eo’ ... en’ } is equivalent to { eo ... en } States { S0’ ... Si’ } is equivalent to { S0 ... Si } For some k, where i<k<o, Sk’ = S* { So’ ... Sn’ } is equivalent to { So ... Sn } Time, Clocks, and the Ordering of Events in a Distributed System

Stability Detection Algorithm Initialize: definite=false, y(Si)=definite Repeat: record S*, definite=y(S*) Implications of “definite” definite == false: no stable property at start definite == true: stable property at termination Correctness Si can lead to S*, S* can lead to So for all j: y(Sj) = y(Sj+1) Time, Clocks, and the Ordering of Events in a Distributed System

Discussion Partial Ordering Recorded Global State Global State Detection Algorithm Stable Property Detection Identifies partial ordering via token passing Employs partial ordering by allowing the recording of nonexistent global states Ensures that the recorded global state is one possible correct solution In fact, algorithm produces recorded global states that are potentially inaccurate Global State Detection algorithm Renders recorded global states that are concurrently equivalent to actual states but still meets requirements to detect stability and can still lead to the same final state Stable Property Detection Deadlock detection Termination detection Time, Clocks, and the Ordering of Events in a Distributed System

Conclusions Processes form recorded global state Record its own state Piece together multiple records Questions addressed How should the snapshots be taken? What criteria must overall picture satisfy? Global-state-detection algorithm A process can record its own state and the messages it sends and receives, but A process cannot capture a global state with a single record Processes in a distributed system communicate by sending and receiving messages, which allows Processes to take and piece together multiple records to form a recorded global state How the records should be taken The criteria that the recorded global state must satisfy in order to provide stability detection Time, Clocks, and the Ordering of Events in a Distributed System

Ordering and Consistent Cuts

Similar presentations

Presentation on theme: "Ordering and Consistent Cuts"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Ordering and Consistent Cuts

Similar presentations

Presentation on theme: "Ordering and Consistent Cuts"— Presentation transcript:

Similar presentations

About project

Feedback