1 Chapter 11 Global Properties (Distributed Termination)

2 Global Properties of DS
Two problems to be tackled:
Determine whether the computations at each node have terminated.
Construct a snapshot of the system (a record of where every message is at a certain time).

3 Termination Rules
A sequential program terminates when it has executed its last statement.
A concurrent program terminates when all its sequential processes have terminated.
Concurrent programs usually execute in infinite loops.
A process is stopped by making it wait for some event (e.g., a signal) when no work is available.
If this blocking is not intentional, it is a deadlock.

4 How to Detect Termination or Deadlock
Centralized system: check the ready queue. If it is empty and no process is blocked, the program has terminated. If it is empty but there are blocked processes, it is usually a deadlock.
Distributed systems: detection is not easy because there is no way to take an instantaneous "snapshot" of the global state.
Termination algorithms collect information from a set of processes over a period of time and then decide whether termination has occurred.
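
As a rough illustration of the centralized check, the sketch below classifies a system from its scheduler queues. The names `classify`, `ready_queue`, and `blocked` are hypothetical and not from the slides, and treating any blocked process as a deadlock is only the "usually" heuristic stated above, not a proof of deadlock.

```python
def classify(ready_queue, blocked):
    """Classify a centralized system from its scheduler queues.

    ready_queue and blocked are hypothetical lists of process names.
    """
    if ready_queue:
        return "running"        # some process can still make progress
    if blocked:
        return "deadlock"       # heuristic: nothing ready, something waiting
    return "terminated"         # nothing ready and nothing waiting


print(classify(["p1"], ["p2"]))
print(classify([], ["p2"]))
print(classify([], []))
```

No such pair of queues exists in a distributed system, which is why the algorithms that follow gather evidence over time instead.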

5 Process and Channel Graph
[Figure: graph of processes and channels; one node is labelled Source]
A set of processes is connected by directional communication channels to form a graph.
This graph is not necessarily acyclic, and a bi-directional channel can be modeled by two channels.

6 Process and Channel Graph (Cont.)
We assume that there is a source process which has no incoming edges, as in the previous slide.
The computation is started when the source process sends a message along each of its outgoing edges.
Processes send messages on their outgoing edges and receive messages on their incoming edges.
A process may terminate when its computation is finished, or re-start if it receives new messages.

7 Signalling Scheme
A signalling scheme is superimposed on top of message communication so that, when all processes have terminated, the source process will eventually be informed.
Signalling is done on special channels, one for each message channel but pointing in the opposite direction, as shown on the next slide.

8 Channels and Spanning Tree
A spanning tree is a subset of the edges of the graph that forms a tree such that every node is incident with an edge in the tree.
The tree will be directed, with the source process at its root.

9 Dijkstra-Scholten Algorithm
[Figure: four-node system; one node is labelled Source]
Consider the above four-node system.
The source starts a computation by sending messages on all outgoing edges (to nodes 2 and 3).
It waits for termination signals from these nodes and then terminates.

10 DS Algorithm for the Source Node
The source sends messages on all outgoing edges to start the computation (the number of messages sent is recorded in an outDeficit variable).
When all the termination signals have been received from the outgoing edges, the source announces termination.

11 DS Algorithm in Computation Nodes
A node starts a computation in another node by sending it a message.
Parent = -1 means that the node is not active (no message received).
The first message to arrive sets the parent to the id of the node that sent it; these parent links create a spanning tree.

12 A Distributed System and Its Spanning Tree
[Figure: two panels, "A Distributed System" and "Spanning Tree"]
The spanning tree is constructed as messages are received.
The tree is not held in any single data structure; the parent field in each node holds the id of its parent, so the spanning tree is formed virtually.
The termination algorithm is executed on the spanning tree.

13 InDeficit and OutDeficit Variables
inDeficit_i[E] denotes the difference between the number of messages received on incoming edge E of node i and the number of signals sent on the corresponding back edge.
outDeficit_i denotes the difference between the number of messages sent on the outgoing edges of node i and the number of signals received on the back edges. It is a single counter per node, as in the outDeficit variable of the source node.
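
The bookkeeping behind the two deficits can be sketched as a small Python class. This is only an illustration: the class name `DeficitCounters` and its method names are made up here, and it assumes one inDeficit counter per incoming edge and a single outDeficit total per node (the "outDeficit variable" mentioned for the source node).

```python
class DeficitCounters:
    """Per-node counters for the signalling scheme (hypothetical names)."""

    def __init__(self, incoming_edges):
        # One counter per incoming edge: messages received minus signals sent.
        self.in_deficit = {edge: 0 for edge in incoming_edges}
        # One total for the node: messages sent minus signals received.
        self.out_deficit = 0

    def message_received(self, edge):
        self.in_deficit[edge] += 1

    def signal_sent(self, edge):
        self.in_deficit[edge] -= 1

    def message_sent(self):
        self.out_deficit += 1

    def signal_received(self):
        self.out_deficit -= 1


node = DeficitCounters(incoming_edges=[1, 2])
node.message_received(1)
node.message_received(1)
node.signal_sent(1)          # one of the two messages from node 1 is answered
node.message_sent()
print(node.in_deficit, node.out_deficit)
```

A node owes nothing and is owed nothing exactly when every counter is back at zero.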

14 Termination in Computation Nodes
Send back termination signals: when a node has terminated, it sends termination signals back along its incoming edges, except the parent edge.
Send back to the parent: the node must first wait for the termination signals from all of its outgoing edges.
The signal to the parent is sent once the computations in the node and in its descendants in the spanning tree have terminated.

15 Termination in DS Algorithm
1. Send signals on all incoming edges except the parent (first_edge)
2. Wait for signals from all outgoing edges
3. Send signal to the parent (first_edge)
Note: This algorithm is executed in a node when there is no work to be done. New messages may arrive and re-start the node process.
The algorithm stops when the source gets the signals that let it announce that all is finished.
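
The three steps can be tried out in a small single-threaded simulation. The sketch below is an assumption-laden illustration, not the algorithm as given in the slides: all messages and signals travel through one global FIFO event queue, each node's "computation" is simply to send one message on every outgoing edge the first time it is activated, no edge leads into the source, and the four-node graph and all names (`dijkstra_scholten`, `try_to_terminate`, and so on) are hypothetical.

```python
from collections import deque


def dijkstra_scholten(graph, source):
    """Simulate termination detection; returns (tree, messages, signals)."""
    parent = {n: None for n in graph}      # current parent, None = not active
    in_deficit = {n: {} for n in graph}    # in_deficit[n][sender]
    out_deficit = {n: 0 for n in graph}
    tree = {}                              # first parent chosen by each node
    started = {source}
    events = deque()                       # (kind, sender, receiver)
    counts = {"message": 0, "signal": 0}

    def post(kind, sender, receiver):
        counts[kind] += 1
        events.append((kind, sender, receiver))

    def do_work(n):
        # Toy computation: one message on every outgoing edge.
        for m in graph[n]:
            out_deficit[n] += 1
            post("message", n, m)

    def try_to_terminate(n):
        # Step 1: signal every incoming edge except the parent edge,
        # which keeps one unanswered message until step 3.
        for edge in list(in_deficit[n]):
            keep = 1 if edge == parent[n] else 0
            while in_deficit[n][edge] > keep:
                in_deficit[n][edge] -= 1
                post("signal", n, edge)
        # Steps 2 and 3: once every outgoing edge has signalled,
        # signal the parent and become inactive.
        if parent[n] is not None and out_deficit[n] == 0:
            in_deficit[n][parent[n]] -= 1
            post("signal", n, parent[n])
            parent[n] = None

    do_work(source)
    while events:
        kind, sender, receiver = events.popleft()
        if kind == "message":
            in_deficit[receiver][sender] = in_deficit[receiver].get(sender, 0) + 1
            if parent[receiver] is None:
                parent[receiver] = sender          # first message picks the parent
                tree.setdefault(receiver, sender)  # remember the first choice only
            if receiver not in started:
                started.add(receiver)
                do_work(receiver)
            try_to_terminate(receiver)
        else:
            out_deficit[receiver] -= 1
            if receiver == source:
                if out_deficit[source] == 0:       # the source announces termination
                    return tree, counts["message"], counts["signal"]
            else:
                try_to_terminate(receiver)
    raise RuntimeError("source never collected all its signals")


graph = {1: [2, 3], 2: [3, 4], 3: [4], 4: []}      # hypothetical four-node system
tree, messages, signals = dijkstra_scholten(graph, source=1)
print(tree, messages, signals)
```

In this run every message is eventually answered by exactly one signal, and node 4 is re-started by a late message from node 3 after it has already signalled its first parent, which is the re-start case mentioned in the note above.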

16 Notes on DS
The spanning tree is not unique; a different sequence of messages would build a different tree.
The DS algorithm on the slides is simplified (e.g., mutual exclusion on global variables is not shown).
A node has no way of knowing whether the entire system has terminated or whether only this node is temporarily without work until a new message arrives. Thus it must continually check for incoming messages.

17 Termination using Markers
In the DS algorithm a node cannot terminate, because it has no way of knowing that the last message to have arrived is actually the last message that will arrive.
The termination with marking (TM) algorithm enables all processes to terminate.

18 TM Algorithm
The TM algorithm uses special marker messages (a negative integer) to mark the end of message transmission.
The source process sends a marker message on all outgoing edges. When a process decides to terminate, it first waits for markers on all incoming edges and then propagates a copy of the marker message on all outgoing edges.
When markers have been received from all incoming edges, the node terminates using the same algorithm as in DS (signal all incoming edges except the first, wait for signals from the outgoing edges, then signal the first).

19 Global Variables for TM
type edge is
   record
      Exists: Boolean := false;
      Active: Boolean := false;
      Marker_Received: Boolean := false;
   end record;
Incoming: array(1..N) of edge;
Outgoing: array(1..N) of edge;
Received_ID: Integer;
Received_Data: Integer;
First_Edge: Integer;

20 Global Variables for TM (Cont.)
Active: Boolean := false;
Used instead of the deficit field in DS to mark active channels.
Marker_Received: Boolean := false;
Field denoting whether a marker has been received or not.

21 Receiving Messages in TM
-- message processing
if Received_Data < 0 then
   -- marker received
   Incoming(Received_ID).Marker_Received := true;
else
   -- construct the spanning tree
   if First_Edge = 0 then
      First_Edge := Received_ID;
   end if;
end if;
-- mark incoming edge as active
if not Incoming(Received_ID).Active then
   Incoming(Received_ID).Active := true;
end if;

22 Receiving Signals in TM
-- signal processing
-- signal received: mark the outgoing edge as inactive
Outgoing(Received_ID).Active := false;
N_Signals := N_Signals - 1;

23 Sending Messages in TM
procedure Send_Message(Data: Integer; ID: Integer) is
begin
   if not Outgoing(ID).Active then
      Outgoing(ID).Active := true;
      N_Signals := N_Signals + 1;
   end if;
   Node(ID).Message(Data, I);
end Send_Message;
The main process uses the above procedure whenever it wants to send a message.

24 Main Process in TM
loop
   -- wait for a marker from the spanning tree's first edge, not before
   exit when Incoming(First_Edge).Marker_Received;
end loop;
-- send markers to all outgoing edges
loop
   -- wait for markers from all incoming edges
   exit when Markers_Received;
end loop;
loop
   -- as in DS: send signals, wait for signals, send signal to First_Edge
   exit when Decide_to_Terminate;
end loop;

25 Snapshots
The TM algorithm is a special case of a more general algorithm that can capture the global state of a system: a distributed snapshot (a consistent recording of the states of all nodes and channels).
State of a node: the sequence of messages that have been sent and received along all channels incident with the node.
State of an edge (channel): the sequence of messages sent on the edge but not yet delivered to the receiving node (in transmission).
For a snapshot to be consistent, each message must be in exactly one of the states: sent, in transit on an edge, or already received.

26 Chandy-Lamport Algorithm for Snapshots
This algorithm works only if the channels are FIFO, that is, if messages are delivered in the order they were sent.
Consider two nodes and the stream of messages sent from node1 to node2.
Suppose a snapshot of these two nodes is taken. Node1 has sent 14 messages (m1,...,m14), node2 has received 9 messages (m1,...,m9), and messages m10 to m14 are still in transmission. Node1 has no idea which messages have been received by node2 and which are still on the edge; similarly, node2 knows only which messages it has received, not which messages are still on the edge.

27 Marker Messages
A marker message is sent on each edge to start a snapshot.
Let us assume that node1 receives a "take a snapshot" marker message after sending message m11. It sends the marker to node2 immediately, so the edge now holds m10 and m11, followed by the marker.
Node1 records its state as messages m1 to m11. Node2 records its state as messages m1 to m9. The messages m10 and m11, which are still in transit, are the state of the edge.
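
The role of the FIFO assumption can be seen in a few lines of Python. This is a sketch under stated assumptions: the channel is modeled as a `collections.deque`, node2 is assumed to record its own state after taking m1 to m9 off the channel (as in the slide), and the names `channel`, `node2_state`, and `edge_state` are illustrative only.

```python
from collections import deque

MARKER = "marker"

# FIFO channel from node1 to node2: node1 sends m1..m11, then the marker.
channel = deque(f"m{i}" for i in range(1, 12))
channel.append(MARKER)

# Assumption: node2 records its own state after receiving m1..m9.
node2_state = [channel.popleft() for _ in range(9)]

# Because the channel is FIFO, everything that arrives after that point and
# before the marker must have been in transit when the snapshot was taken.
edge_state = []
for item in iter(channel.popleft, MARKER):   # stops when the marker is read
    edge_state.append(item)

print(node2_state[-1], edge_state)
```

Without FIFO delivery the marker could overtake m10 or m11, and the receiver could no longer tell which messages belong to the edge state.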

28 Snapshot Initiation
The source starts a snapshot by recording its state and sending a marker on each of its outgoing edges:
   for all outgoing edges E
      send(marker, E, myID)
A receiving node, upon receiving the first marker, records its state and propagates the marker to all outgoing edges.
When a receiving process has received markers from all incoming edges, it records
its state, as the last message received from all incoming edges and the last message sent to the outgoing edges, and
the state of the edge, as the sequence of messages received between the state it recorded and the receipt of the marker.

29 Chandy-Lamport Algorithm for Global Snapshots
When a message is sent, it is recorded in the lastSent array.
A message received on an incoming edge is recorded in the lastReceived array.

30 Chandy-Lamport Algorithm for Global Snapshots
When all markers are received, the state is recorded in the following way:
stateAtRecord[E]: the last message sent on each outgoing edge E (array assignment)
messageAtRecord[E]: the last message received on each incoming edge E (array assignment)
State of edge E: messages messageAtRecord[E]+1 to messageAtMarker[E].

31 An Example
Abbreviations: ls for lastSent; lr for lastReceived; sr for stateAtRecord; mr for messageAtRecord; mm for messageAtMarker.
Node1 sends three messages and then a marker to node2 and node3.
Node2 sends three messages and then a marker to node3.

32 Scenario for the Example
1. Node1 sends 3 messages to node2, where they are received. Node1 sends 3 messages to node3 and node2 sends 3 messages to node3, but they are not yet received (node3 is not shown since it has received no messages).
2. Node1 sends a marker to node2 and records its state (sr: 3 messages sent on each outgoing edge).
3. Node1 sends a marker to node3 (node3 has not received any messages yet).
4. Node2 receives the marker and records its state (sr: 3 messages sent; mr: 3 messages received).
5. Node2 sends a marker to node3.

Step  Action  node1                        node2
              ls     lr  sr     mr  mm     ls   lr   sr   mr   mm
1             [3,3]                        [3]  [3]
2     1M=>2   [3,3]      [3,3]             [3]  [3]
3     1M=>3   [3,3]      [3,3]             [3]  [3]
4     2<=1M   [3,3]      [3,3]             [3]  [3]
5     2M=>3   [3,3]      [3,3]             [3]  [3]  [3]  [3]  [3]

33 Scenario Continued

Step  Action  node3
              ls  lr     sr  mr     mm
6     3<=2
7     3<=2        [0,1]
8     3<=2        [0,2]
9     3<=2M       [0,3]
10    3<=1        [0,3]      [0,3]  [0,3]
11    3<=1        [1,3]      [0,3]  [0,3]
12    3<=1        [2,3]      [0,3]  [0,3]
13    3<=1M       [3,3]      [0,3]  [0,3]
14                [3,3]      [0,3]  [3,3]

6, 7, 8: Node3 receives 3 messages from node2 and updates its lr variable.
9: The marker sent by node2 is read and the state is updated.
10, 11, 12: Node3 receives 3 messages from node1 and updates its lr variable.
13: A marker is received from node1. Since a marker has already been received by this node (step 9), mr is not updated (p10), but mm is updated.
The difference between the first components of mr and mm indicates that the 3 messages sent from node1 to node3 are considered to be the state of the edge when the snapshot was taken.
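
The node3 part of the scenario can be replayed with a small Python sketch of the recording rules. It is an illustration under assumptions, not the slides' own code: messages are identified by their sequence number on an edge, the class name `SnapshotNode` and its method names are invented here, and the fields mirror lastSent, lastReceived, stateAtRecord, messageAtRecord, and messageAtMarker.

```python
class SnapshotNode:
    """Recording rules of one node; edges are named by the neighbour's id."""

    def __init__(self, incoming, outgoing):
        self.last_sent = {e: 0 for e in outgoing}       # ls
        self.last_received = {e: 0 for e in incoming}   # lr
        self.state_at_record = None                     # sr
        self.message_at_record = None                   # mr
        self.message_at_marker = {}                     # mm

    def send(self, edge):
        self.last_sent[edge] += 1
        return self.last_sent[edge]

    def receive(self, edge, number):
        self.last_received[edge] = number

    def receive_marker(self, edge):
        """Handle a marker; returns the edges the marker must be forwarded on."""
        self.message_at_marker[edge] = self.last_received[edge]
        if self.state_at_record is None:                # first marker only
            self.state_at_record = dict(self.last_sent)
            self.message_at_record = dict(self.last_received)
            return list(self.last_sent)
        return []

    def edge_state(self, edge):
        """Messages messageAtRecord[E]+1 .. messageAtMarker[E]."""
        first = self.message_at_record[edge] + 1
        last = self.message_at_marker[edge]
        return list(range(first, last + 1))


# Replay steps 6 to 13 for node3 (incoming edges from node1 and node2).
node3 = SnapshotNode(incoming=[1, 2], outgoing=[])
for number in (1, 2, 3):
    node3.receive(2, number)        # steps 6, 7, 8
node3.receive_marker(2)             # step 9: first marker, state recorded
for number in (1, 2, 3):
    node3.receive(1, number)        # steps 10, 11, 12
node3.receive_marker(1)             # step 13: only mm is updated

print(node3.message_at_record, node3.message_at_marker)
print(node3.edge_state(1), node3.edge_state(2))
```

The recorded values correspond to mr = [0,3] and mm = [3,3] in the table, so the edge from node1 holds three messages and the edge from node2 holds none.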
