1 Chapter 11 Global Properties (Distributed Termination)


2 Global properties of DS
Two problems to be tackled:
- Determine whether the computations at each node have terminated
- Construct a snapshot of the system (where every message is at a “certain time”)

3 Termination Rules
- A sequential program terminates when it has executed its last statement.
- A concurrent program terminates when all its sequential processes have terminated.
- Concurrent programs usually execute in infinite loops.
- A process is stopped by making it wait for some event (e.g., a signal) when no work is available.
- If this blocking is not intentional, it is a deadlock.

4 How to Detect Termination or Deadlock
- Centralized system: check the ready queue. If it is empty, the program has terminated; if there are still blocked processes, it is usually a deadlock.
- Distributed systems: not easy, because there is no way to take an instantaneous “snapshot” of the global state.
- Termination algorithms collect information from a set of processes over a period of time and then decide whether termination has occurred or not.
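The centralized check described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the two list arguments are hypothetical, and the "probably" in the result reflects the slide's "usually", since a blocked process may simply be waiting for an external event.

```python
def classify(ready_queue, blocked):
    """Classify a centralized system from its scheduler queues.

    ready_queue: processes that can run now (hypothetical representation)
    blocked:     processes waiting for some event
    """
    if ready_queue:
        return "running"              # work is still available
    if blocked:
        return "probably deadlocked"  # nobody can run, but some are waiting
    return "terminated"               # nothing ready and nothing waiting


print(classify(["p1"], ["p2"]))
print(classify([], ["p2", "p3"]))
print(classify([], []))
```

No such single pair of queues exists in a distributed system, which is why the algorithms on the following slides gather the information over time.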

5 Process and Channel Graph
[figure: process and channel graph with a source node]
- A set of processes is connected by directed communication channels to form a graph.
- This graph is not necessarily acyclic, and a bi-directional channel can be modeled by two channels.

6 Process and Channel Graph (Cont.)
- We assume that there is a source process which has no incoming edges, as in the previous slide.
- The computation is started when the source process sends a message along each of its outgoing edges.
- Processes send messages on their outgoing edges and receive messages from their incoming edges.
- A process may terminate when its computation is finished, or restart if it receives new messages.

7 Signalling Scheme
- A signalling scheme is superimposed on top of message communication so that when all processes have terminated, the source process will eventually be informed.
- Signalling is done on special channels, one for each message channel, but pointed in the opposite direction, as shown on the next slide.

8 Channels and Spanning Tree
- A spanning tree is a subset of the edges of the graph that forms a tree such that every node is incident with an edge in the tree.
- The tree will be directed, with the source process at its root.

9 Dijkstra-Scholten Algorithm
[figure: four-node system with the source node]
- Consider the four-node system above.
- The source starts a computation by sending messages on all outgoing edges (to nodes 2 and 3).
- It waits for termination signals from these nodes and then terminates.

10 DS Algorithm for the Source Node
- The source sends messages on all outgoing edges to start the computation (the number of messages sent is recorded in an outDeficit variable).
- When all the termination signals have been received from the outgoing edges, the source announces termination.

11 DS Algorithm in Computation Nodes
- A node starts a computation in another node by sending it a message.
- Parent = -1 means that the node is not active (no message received).
- The first message to arrive sets the parent field to the id of the node that sent it; these parent fields create a spanning tree.

12 A Distributed System and its Spanning Tree
[figure: a distributed system and its spanning tree]
- The spanning tree is constructed as messages are received.
- The tree structure is not held in any single data structure; the parent field in each node holds the parent id, so the spanning tree is formed virtually.
- The termination algorithm is executed on the spanning tree.
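How the parent fields form a tree can be sketched as follows. This is a simplified illustration, not the slides' code: the function name `build_parents`, the `(sender, receiver)` delivery list and the example edges are assumptions, and a real DS node resets its parent to -1 again once it has signalled its parent, which is not modelled here.

```python
def build_parents(source, deliveries):
    """Return each node's parent given messages in arrival order.

    deliveries: list of (sender, receiver) pairs (hypothetical input format).
    The source is recorded with parent None to mark it as the root.
    """
    parent = {source: None}
    for sender, receiver in deliveries:
        if receiver not in parent:      # only the first message sets the parent
            parent[receiver] = sender
    return parent


# Same edges (1->2, 1->3, 2->3, 2->4, 3->4), two different arrival orders.
order_a = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
order_b = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]

print(build_parents(1, order_a))  # {1: None, 2: 1, 3: 1, 4: 2}
print(build_parents(1, order_b))  # {1: None, 2: 1, 3: 2, 4: 3}
```

The two runs use the same graph but produce different trees, because only the arrival order decides which sender becomes the parent.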

13 InDeficit and OutDeficit Variables
- The difference between the number of messages received on an incoming edge E of node i and the number of signals sent on the corresponding back edge is denoted inDeficit_i[E].
- The difference between the number of messages sent on the outgoing edges of node i and the number of signals received on the back edges is denoted outDeficit_i.

14 Termination in Computation Nodes
- Send back termination signals: when a node has terminated, it starts to send termination signals back on its incoming edges, except the parent edge.
- Send back to the parent: the node first has to wait for all termination signals from its outgoing edges.
- The signal to the parent is sent when the computations in the node and in its descendants have terminated.

15 Termination in DS Algorithm
1. Send signals on all incoming edges except the parent (first_edge)
2. Wait for signals from all outgoing edges
3. Send signal to the parent (first_edge)
Note: This algorithm is executed in a node when there is no work to be done. New messages may arrive to restart the node process. The algorithm stops when the source gets the signal to announce that all is finished.
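The three steps above can be tried out in a small single-threaded simulation. It is a sketch under stated assumptions rather than the algorithm as given on the slides: all names are hypothetical, one global FIFO queue stands in for the channels, the source has no incoming edges, and the "computation" of a node is simply to forward one message on each outgoing edge the first time it is activated.

```python
from collections import deque


def dijkstra_scholten(edges, source):
    nodes = {n for edge in edges for n in edge}
    outgoing = {n: [b for a, b in edges if a == n] for n in nodes}
    parent = {n: -1 for n in nodes}        # -1 means "not active"
    in_deficit = {n: {} for n in nodes}    # in_deficit[n][sender]
    out_deficit = {n: 0 for n in nodes}
    worked = set()                         # nodes that already did their work
    queue = deque()                        # (kind, sender, receiver)
    counts = {"msg": 0, "sig": 0}
    announced = False

    def send(kind, sender, receiver):
        counts[kind] += 1
        queue.append((kind, sender, receiver))

    def do_work(n):
        worked.add(n)
        for m in outgoing[n]:
            out_deficit[n] += 1
            send("msg", n, m)

    def try_to_terminate(n):
        if parent[n] == -1:
            return
        # Step 1: signal every incoming edge, keeping one signal owed to the parent.
        for s in in_deficit[n]:
            keep = 1 if s == parent[n] else 0
            while in_deficit[n][s] > keep:
                in_deficit[n][s] -= 1
                send("sig", n, s)
        # Steps 2 and 3: once all outgoing edges have signalled, signal the parent.
        if out_deficit[n] == 0:
            in_deficit[n][parent[n]] -= 1
            send("sig", n, parent[n])
            parent[n] = -1

    do_work(source)
    while queue:
        kind, sender, receiver = queue.popleft()
        if kind == "msg":
            if parent[receiver] == -1:
                parent[receiver] = sender  # first message sets the parent
            in_deficit[receiver][sender] = in_deficit[receiver].get(sender, 0) + 1
            if receiver not in worked:
                do_work(receiver)
            try_to_terminate(receiver)
        else:
            out_deficit[receiver] -= 1
            if receiver == source:
                announced = out_deficit[source] == 0
            else:
                try_to_terminate(receiver)
    return announced, counts["msg"], counts["sig"]


# Hypothetical graph with a cycle 2 -> 4 -> 2.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 2)]
print(dijkstra_scholten(EDGES, source=1))  # (True, 6, 6)
```

Every message is eventually answered by exactly one signal, so the message and signal counts agree and the source's outDeficit returns to zero only after every other node has detached from the tree.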

16 Notes on DS
- The spanning tree is not unique; a different sequence of messages would build a different tree.
- The DS algorithm on the slides is simplified (e.g., mutual exclusion on global variables is not shown).
- A node has no way of knowing whether the entire system has terminated or whether only this node is temporarily without work until a new message arrives. Thus it must continually check for incoming messages.

17 Termination using Markers
- In the DS algorithm a node cannot terminate, because it has no way of knowing that the last message it received is actually the last message that will arrive.
- The termination with marking (TM) algorithm enables all processes to terminate.

18 TM Algorithm
- The TM algorithm uses special marker messages (a negative integer) to mark the end of message transmission.
- The source process sends a marker message on all outgoing edges. When a process decides to terminate, it first waits for markers on all incoming edges and then propagates a copy of the marker message on all outgoing edges.
- When markers have been received from all incoming edges, the node terminates using the same algorithm as in DS (signal all incoming edges except the first, wait for signals from outgoing edges before signalling the first).
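The marker wave alone can be sketched as below. This models only the forwarding rule (forward the marker once markers have arrived on all incoming edges), not the DS-style signalling that follows it. The function name, the edge list and the single global queue are assumptions, and the sketch assumes an acyclic graph, since on a cycle the nodes would wait for each other's markers.

```python
from collections import deque


def marker_wave(edges, source):
    """Return the order in which nodes have seen markers on all incoming edges."""
    nodes = {n for edge in edges for n in edge}
    outgoing = {n: [b for a, b in edges if a == n] for n in nodes}
    incoming = {n: {a for a, b in edges if b == n} for n in nodes}
    got_marker_from = {n: set() for n in nodes}

    order = [source]                               # the source starts the wave
    queue = deque((source, m) for m in outgoing[source])
    while queue:
        sender, receiver = queue.popleft()
        got_marker_from[receiver].add(sender)
        if got_marker_from[receiver] == incoming[receiver]:
            order.append(receiver)                 # all markers in: forward a copy
            queue.extend((receiver, m) for m in outgoing[receiver])
    return order


print(marker_wave([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)], source=1))  # [1, 2, 3, 4]
```

Once a node appears in the returned order, no further messages can reach it, which is the knowledge the plain DS algorithm lacks.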

19 Global Variables for TM
   type edge is
      record
         Exists: Boolean := false;
         Active: Boolean := false;
         Marker_Received: Boolean := false;
      end record;
   Incoming: array(1..N) of edge;
   Outgoing: array(1..N) of edge;
   Received_ID: Integer;
   Received_Data: Integer;
   First_Edge: Integer;

20 Global Variables for TM (Cont.)
- Active: Boolean := false;
  Used instead of the deficit field in DS to mark active channels.
- Marker_Received: Boolean := false;
  Field denoting whether a marker has been received or not.

21 Receiving Messages in TM
   -- message processing
   if Received_Data < 0 then
      {marker received}
      Incoming(Received_ID).Marker_Received := true;
   else
      {construct the spanning tree}
      if First_Edge = 0 then
         First_Edge := Received_ID;
      end if;
   end if;
   {mark incoming edge as active}
   if not Incoming(Received_ID).Active then
      Incoming(Received_ID).Active := true;
   end if;

22 Receiving Signals in TM
   -- signal processing
   {signal received, mark outgoing edge as inactive}
   Outgoing(Received_ID).Active := false;
   N_Signals := N_Signals - 1;

23 Sending Messages in TM
   procedure Send_Message(Data: Integer; ID: Integer) is
   begin
      if not Outgoing(ID).Active then
         Outgoing(ID).Active := true;
         N_Signals := N_Signals + 1;
      end if;
      Node(ID).Message(Data, I);
   end Send_Message;
When the main process wants to send a message, the above procedure is used.

24 Main Process in TM
   loop
      {wait for a marker from the spanning tree’s first edge – not before}
      exit when Incoming(First_Edge).Marker_Received;
   end loop;
   -- send markers to all outgoing edges {markers for outgoing edges}
   loop
      exit when Markers_Received; {markers from incoming edges}
   end loop;
   loop
      {as in DS: send signals, wait for signals, send signal to first_edge}
      exit when Decide_to_Terminate;
   end loop;

25 Snapshots
- The TM algorithm is a special case of a more general algorithm that can capture the global state of a system: a distributed snapshot (a consistent recording of the states of all nodes and channels).
- State of a node: the sequence of messages that have been sent and received along all channels incident with the node.
- State of an edge (channel): the sequence of messages sent on the edge but not yet delivered to the receiving node (in transmission).
- For a snapshot to be consistent, each message must be in exactly one of the states: sent, in transit on an edge, or already received.

26 Chandy-Lamport Algorithm for Snapshots
- This algorithm works only if the channels are FIFO, that is, if messages are delivered in the order they were sent.
- Consider two nodes and the stream of messages sent from node1 to node2:
  [figure: stream of messages on the edge from node1 to node2]
- Suppose a snapshot of these two nodes is taken. Node1 has sent 14 messages (m1,...,m14), node2 has received 9 messages (m1,...,m9), and messages m10 to m14 are still in transmission. Node1 has no idea which messages have been received by node2 and which are still on the edge; similarly, node2 knows only which messages it has received, not which messages are still on the edge.

27 Marker Messages
- A marker message is sent on each edge to start a snapshot.
- Let us assume that node1 received a “take a snapshot” marker message after sending message m11. It will send the marker to node2 immediately, so the messages on the edge will be as follows:
  [figure: messages on the edge, with the marker following m11]
- Node1 records its state as messages m1 to m11. Node2 records its state as messages m1 to m9. The messages m10 and m11, which are still in transit, are the state of the edge.
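The role of the FIFO assumption in this example can be made concrete with a short sketch. The list `stream`, the constant `MARKER` and the function `split_at_marker` are hypothetical names. The stream is what node1 puts on the edge in order, and `received_at_record` is the number of messages node2 had received when it recorded its own state (9 in the slide's example).

```python
MARKER = "marker"


def split_at_marker(stream, received_at_record):
    """Split a FIFO stream into receiver state, edge state and later messages.

    Assumes the marker is in the stream and that the receiver recorded its
    state before the marker reached it.
    """
    marker_pos = stream.index(MARKER)
    in_receiver_state = stream[:received_at_record]
    edge_state = stream[received_at_record:marker_pos]
    after_snapshot = stream[marker_pos + 1:]
    return in_receiver_state, edge_state, after_snapshot


# node1 sends m1..m11, then the marker, then m12..m14.
stream = [f"m{i}" for i in range(1, 12)] + [MARKER] + [f"m{i}" for i in range(12, 15)]
receiver_state, edge_state, later = split_at_marker(stream, received_at_record=9)
print(edge_state)  # ['m10', 'm11']
print(later)       # ['m12', 'm13', 'm14']
```

Because the channel is FIFO, everything that arrives before the marker was sent before node1 recorded its state, so the slice between the receiver's recording point and the marker is exactly the state of the edge.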

28 Snapshot Initiation
- The source starts a snapshot by recording its state and sending a marker on each of its outgoing edges:
     for all outgoing edges E
        send(marker, E, myID)
- A receiving node, upon receiving the first marker, records its state and propagates the marker to all outgoing edges.
- When a receiving process has received markers from all incoming edges, it records
  - its state as the last message received from all incoming edges and the last message sent to the outgoing edges, and
  - the state of each incoming edge as the sequence of messages received between the state it recorded and the receipt of the marker.

29 Chandy-Lamport Algorithm for Global Snapshots
- When a message is sent, the message is recorded in the lastSent array.
- A message received from some incoming edge is stored in the lastReceived array.

30 Chandy-Lamport Algorithm for Global Snapshots
When all markers are received, the state is recorded in the following way:
- stateAtRecord[E]: the last message sent on each outgoing edge E (array assignment)
- messageAtRecord[E]: the last message received on each incoming edge E (array assignment)
- State of edge E: messageAtRecord[E]+1 to messageAtMarker[E].
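The edge-state rule can be expressed as a small function. This is an illustrative sketch: the function name `edge_state`, the use of 0-based list indices for edges and the sample counter values are assumptions, and messages are identified by their sequence numbers on the edge.

```python
def edge_state(message_at_record, message_at_marker, edge):
    """Sequence numbers of the messages that make up the state of one incoming edge."""
    first = message_at_record[edge] + 1   # first message after the node recorded
    last = message_at_marker[edge]        # last message before the marker arrived
    return list(range(first, last + 1))


# Hypothetical node with two incoming edges.
message_at_record = [4, 7]
message_at_marker = [6, 7]
print(edge_state(message_at_record, message_at_marker, 0))  # [5, 6]
print(edge_state(message_at_record, message_at_marker, 1))  # []
```

An empty result means the marker arrived on that edge before any further message did, so nothing was in transit on it.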

31 An Example
- Abbreviations: ls for lastSent; lr for lastReceived; sr for stateAtRecord; mr for messageAtRecord; mm for messageAtMarker
- Node1 sends three messages and then a marker to node2 and node3
- Node2 sends three messages and then a marker to node3

32 Scenario for the Example
1. Node1 sends 3 messages to node2, where they are received. Node1 sends 3 messages to node3 and node2 sends 3 messages to node3, but they are not yet received (node3 is not shown since no messages have been received).
2. Node1 sends a marker to node2 and records its state (sr: 3 messages have been sent on each outgoing edge).
3. Node1 sends a marker to node3 (node3 has not received any messages yet).
4. Node2 receives the marker and records its state (sr: 3 messages sent; mr: 3 messages received).
5. Node2 sends a marker to node3.

Step  Action  node1 and node2 values (column order per node: ls lr sr mr mm; empty cells omitted)
1             [3,3] [3] [3] [3] [3] [3]
2     1M=>2   [3,3] [3,3] [3] [3]
3     1M=>3   [3,3] [3,3] [3] [3]
4     2<=1M   [3,3] [3,3] [3] [3]
5     2M=>3   [3,3] [3,3] [3] [3] [3] [3] [3]

33 Scenario Continued
Values for node3 (ls and sr stay empty because node3 has no outgoing edges):

Step  Action  ls  lr     sr  mr     mm
6     3<=2
7     3<=2        [0,1]
8     3<=2        [0,2]
9     3<=2M       [0,3]
10    3<=1        [0,3]      [0,3]  [0,3]
11    3<=1        [1,3]      [0,3]  [0,3]
12    3<=1        [2,3]      [0,3]  [0,3]
13    3<=1M       [3,3]      [0,3]  [0,3]
14                [3,3]      [0,3]  [3,3]

- 6, 7, 8: Node3 receives 3 messages from node2 and updates the lr variable.
- 9: The marker sent by node2 is read and the state is updated.
- 10, 11, 12: Node3 receives 3 messages from node1 and updates the lr variable.
- 13: A marker is received from node1. Since a marker has already been received (9) by this node, mr is not updated (p10), but mm is updated.
The difference between the first components of these two variables indicates that the 3 messages sent from node1 to node3 are considered to be the state of the edge when the snapshot was taken.
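The node3 part of the scenario can be replayed with a short sketch that tracks only the lr, mr and mm counters. The function name `trace_receiver` and the event encoding are hypothetical: edge index 0 stands for the edge from node1 and index 1 for the edge from node2. Starting mm at zero when the first marker arrives is an assumption chosen to match the values shown for steps 10 to 13.

```python
def trace_receiver(events, n_incoming=2):
    """Replay (edge, is_marker) events and return (lr, mr, mm) after each one."""
    lr = [0] * n_incoming      # lastReceived
    mr = None                  # messageAtRecord, set by the first marker
    mm = None                  # messageAtMarker
    rows = []
    for edge, is_marker in events:
        if is_marker:
            if mr is None:     # first marker: record the node's state
                mr = lr.copy()
                mm = [0] * n_incoming
            mm[edge] = lr[edge]
        else:
            lr[edge] += 1      # ordinary message
        rows.append((lr.copy(),
                     None if mr is None else mr.copy(),
                     None if mm is None else mm.copy()))
    return rows


FROM_NODE1, FROM_NODE2 = 0, 1
events = ([(FROM_NODE2, False)] * 3 + [(FROM_NODE2, True)]
          + [(FROM_NODE1, False)] * 3 + [(FROM_NODE1, True)])
rows = trace_receiver(events)
lr, mr, mm = rows[-1]
print(lr, mr, mm)                          # [3, 3] [0, 3] [3, 3]
print(mm[FROM_NODE1] - mr[FROM_NODE1])     # 3 messages in transit from node1
print(mm[FROM_NODE2] - mr[FROM_NODE2])     # 0 messages in transit from node2
```

The final counters agree with step 14: three messages form the state of the edge from node1, and the edge from node2 is empty.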