
Distributed Snapshot Distributed Systems.


1 Distributed Snapshot Distributed Systems

2 Introduction: What is a Distributed System?
A network of processes. The nodes are processes, and the edges are communication channels.

3 Introduction A computation is a sequence of atomic actions that transform a given initial state to the final state. While such actions are totally ordered in a sequential process, they are only partially ordered in a distributed system.

4 Introduction In this context, the state (also known as global state) of a distributed system is the set of local states of all the component processes, as well as the states of every channel through which messages flow.

5 Introduction So the important question is: when or how do we record the states of the processes and the channels? Depending on when the states of the individual components are recorded, the value of the global state can vary widely.

6 Difficulties Recording the global state may look simple to an external observer who looks at the system from outside. The same problem is surprisingly challenging when the snapshot has to be taken from inside the system.

7 Difficulties Consider a system of three processes numbered 0, 1, and 2 connected by FIFO channels, and assume that an unknown number of indistinguishable tokens are circulating indefinitely through this network. We want the processes to cooperate with one another to count the exact number of tokens circulating in the system (without ever stopping the system).
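To see why the counting problem is harder than it looks, here is a minimal sketch of a naive observer that records the three processes one after another while a single token keeps moving. The ring direction (0 to 1 to 2 to 0), the one-hop-per-recording timing, and the name `naive_count` are assumptions made for illustration only.

```python
def naive_count(start_holder, n=3):
    """Record processes 0..n-1 in turn, one at a time, while a single
    token hops to the next process between two recordings.
    Returns how many times the token was counted."""
    holder = start_holder          # process currently holding the token
    count = 0
    for p in range(n):
        if holder == p:            # the observer sees the token at p
            count += 1
        holder = (holder + 1) % n  # token moves on before the next recording
    return count


# The system always contains exactly one token, yet:
print(naive_count(0))  # token is counted at every process it visits
print(naive_count(1))  # token always stays ahead of the observer
```

Depending on where the token starts, the same token is counted several times or not at all, which is why the recording moments of the processes and channels have to be coordinated.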

8 Difficulties
Deadlock detection. Any process that does not have an eligible action for a prolonged period would like to find out if the system has reached a deadlock configuration.
Termination detection. When a computation runs in phases, a process must know whether every other process has finished its computation in the previous phase before it begins the next one.
Network reset. In case of a malfunction or a loss of coordination, a distributed system will need to roll back to a consistent global state and initiate a recovery. Previous snapshots may be helpful.

9 Properties of Consistent Snapshots
A snapshot state (SSS) consists of a set of local states, where each local state is the outcome of a recording event that follows a send, or a receive, or an internal action. The important notion here is that of a consistent cut.

10 Properties of Consistent Snapshots
A cut is a set of events that contains at least one event per process. A cut is called consistent if, for each event that it contains, it also includes all events causally ordered before it.
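The definition can be checked mechanically. The sketch below assumes a hypothetical representation: events are strings, and `preds` maps each event to its direct causal predecessors (the previous event on the same process and, for a receive, the matching send). Checking direct predecessors is enough, because a set closed under direct predecessors is also closed under the transitive ones.

```python
def is_consistent(cut, preds):
    """Return True if every event in `cut` has all of its direct
    causal predecessors in `cut` as well."""
    return all(p in cut for e in cut for p in preds.get(e, ()))


# Hypothetical run: process 0 executes a1, then a2 (a send);
# process 1 executes b1, then b2 (the matching receive).
preds = {
    "a2": {"a1"},
    "b2": {"b1", "a2"},
}

good_cut = {"a1", "b1"}
bad_cut = {"a1", "b1", "b2"}  # contains the receive b2 but not the send a2

print(is_consistent(good_cut, preds))
print(is_consistent(bad_cut, preds))
```

The second cut is inconsistent because it records a message as received without recording that it was sent.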

11 Properties of Consistent Snapshots
The set of local states that follow the most recent events of a consistent cut forms a consistent snapshot. In a distributed system, many consistent snapshots can be recorded. A snapshot that is often of practical interest is the one that is most recent.

12 The Chandy-Lamport Algorithm
Let the topology of a distributed system be represented by a strongly connected graph. Each node represents a process and each directed edge represents a FIFO channel. A process called the initiator initiates the distributed snapshot algorithm. Any process can be an initiator. The initiator process sends a special message, called a marker (*) that prompts other processes in the system to record their states. The global state consists of the states of the processes as well as the channels. However, channels are passive entities — so the responsibility of recording the state of a channel lies with the process on which the channel is incident.

13 The Chandy-Lamport Algorithm
DS1. The initiator process, in one atomic action, does the following:
  - Turns red
  - Records its own state
  - Sends a marker along all its outgoing channels
DS2. Every process, upon receiving a marker for the first time and before doing anything else, does the following in one atomic action:
  - Turns red
  - Records its state
  - Sends markers along all its outgoing channels

14 The Chandy-Lamport Algorithm
The snapshot algorithm terminates when:
  - Every process has turned red
  - Every process has received a marker through each of its incoming channels
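The rules DS1 and DS2 and the termination condition can be tried out on the token-counting problem from the earlier slide. The following is a small simulation sketch, not a reference implementation. Its assumptions: a fully connected network of three processes, a random scheduler, a local state that is just the number of tokens held, and hypothetical names (`Process`, `run_snapshot`, `chan_state`). It also uses the standard Chandy-Lamport rule for channels, which the slides do not spell out: the recorded state of an incoming channel is the set of messages that arrive on it after the receiver has recorded its state and before the marker arrives on that channel.

```python
import random
from collections import deque

MARKER = "*"


class Process:
    def __init__(self, pid, tokens):
        self.pid = pid
        self.tokens = tokens   # local state: tokens currently held
        self.red = False       # True once the local state is recorded
        self.recorded = None   # recorded local state
        self.chan_state = {}   # tokens recorded per incoming channel
        self.closed = set()    # incoming channels whose marker has arrived


def run_snapshot(tokens=(2, 0, 1), steps_before=5, seed=0):
    """Run one snapshot while tokens keep moving; return the token
    total seen in the snapshot. Assumes at least one token exists."""
    rng = random.Random(seed)
    n = len(tokens)
    procs = [Process(i, tokens[i]) for i in range(n)]
    # chan[(i, j)] is the FIFO channel from process i to process j.
    chan = {(i, j): deque() for i in range(n) for j in range(n) if i != j}

    def record(p):
        # DS1/DS2: turn red, record state, send markers, all in one step.
        p.red = True
        p.recorded = p.tokens
        for (src, dst), queue in chan.items():
            if src == p.pid:
                queue.append(MARKER)
            if dst == p.pid:
                p.chan_state[src] = 0  # start recording this channel

    def step():
        # Pick one enabled action: send a token or deliver a message.
        actions = [("send", p.pid) for p in procs if p.tokens > 0]
        actions += [("recv", key) for key, queue in chan.items() if queue]
        kind, arg = rng.choice(actions)
        if kind == "send":
            dst = rng.choice([j for j in range(n) if j != arg])
            procs[arg].tokens -= 1
            chan[(arg, dst)].append("token")
            return
        src, dst = arg
        msg = chan[arg].popleft()
        p = procs[dst]
        if msg == MARKER:
            if not p.red:
                record(p)          # first marker: DS2
            p.closed.add(src)      # stop recording this channel
        else:
            p.tokens += 1
            if p.red and src not in p.closed:
                p.chan_state[src] += 1  # token was in transit

    def terminated():
        return all(p.red and len(p.closed) == n - 1 for p in procs)

    for _ in range(steps_before):
        step()
    record(procs[0])               # DS1: process 0 is the initiator
    while not terminated():
        step()
    in_processes = sum(p.recorded for p in procs)
    in_channels = sum(sum(p.chan_state.values()) for p in procs)
    return in_processes + in_channels


print(run_snapshot())  # matches the number of tokens put into the system
```

Whatever interleaving the scheduler picks, the recorded process states plus the recorded channel states account for every token exactly once, without the system ever being paused.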

15 The Chandy-Lamport Algorithm

16 The Chandy-Lamport Algorithm
The individual processes only record the fragments of a snapshot state SSS. It requires another phase of activity to collect these fragments and form a composite view of SSS. Global state collection is not a part of the snapshot algorithm.

17 The Lai-Yang Algorithm
Lai and Yang proposed an algorithm for distributed snapshot on a network of processes where the channels need not be FIFO. A message is white if it is sent by a process that has not recorded its state, and a message is red if the sender has already recorded its state. However, there are no markers: processes are allowed to record their local states spontaneously.

18 The Lai-Yang Algorithm
LY1. The initiator records its own state. When it needs to send a message m to another process, it sends (m, red).
LY2. When a process receives a message (m, red), it records its state if it has not already done so, and then accepts the message m.
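A minimal sketch of the two rules follows. It assumes, for illustration only, that the local state is an integer and that accepting a message m adds m to it; the class name `LYProcess` and its methods are hypothetical. Recording the channel states, which the full algorithm also has to handle, is left out.

```python
class LYProcess:
    def __init__(self, state):
        self.state = state     # local state (an integer in this sketch)
        self.color = "white"   # white until the local state is recorded
        self.recorded = None   # recorded local state

    def record(self):
        # Record the local state once and turn red.
        if self.color == "white":
            self.recorded = self.state
            self.color = "red"

    def send(self, m):
        # Every message carries the sender's current color (LY1).
        return (m, self.color)

    def receive(self, tagged):
        m, color = tagged
        if color == "red":
            self.record()      # LY2: record before accepting m
        self.state += m        # accept the message


p, q = LYProcess(10), LYProcess(5)
early = p.send(1)              # sent before p records: (1, "white")
p.record()                     # LY1: p acts as the initiator
late = p.send(3)               # sent after p records: (3, "red")

q.receive(late)                # the red message overtakes the white one
q.receive(early)               # non-FIFO delivery is allowed
print(q.recorded, q.state)
```

Because q records its state before accepting the red message, the effect of a message sent after the sender's recording never leaks into q's recorded state, even though delivery is out of order.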

19 The Lai-Yang Algorithm
The approach is “lazy” inasmuch as processes do not send or use any control message for the sake of recording a consistent snapshot. The good thing is that if a complete snapshot is taken, then it will be consistent. However, there is no guarantee that a complete snapshot will eventually be taken: if a process i wants to detect termination, then i will record its own state following its last action but send no message, so the other processes may never record their states. Dummy control messages may be needed to prompt them.

