8.6. Recovery By Hemanth Kumar Reddy.


1 8.6. Recovery By Hemanth Kumar Reddy

2 Recovery An error is a part of the system's state that may lead to a failure.
When a failure has occurred, the process in which it happened has to be brought back to a correct state. The idea is to replace the erroneous state with an error-free state. Error recovery can be done in two ways: forward recovery and backward recovery.

3 Backward Recovery In this type of recovery, the system is brought from its current erroneous state back to a previously correct state. To achieve this, the state of the system has to be recorded from time to time, and the recorded state has to be restored when a failure occurs. Each time the state of the system is recorded, a checkpoint is said to be made. Example: To recover a lost packet, the sender retransmits that packet. In effect, the system goes back to a previous correct state, namely the one in which the lost packet is being sent.

4 Contd.. Backward recovery mechanisms are widely used for recovering from failures in distributed systems. Advantage: It is generally applicable, independent of any specific system or process. Disadvantages: Restoring a previous state is a costly operation. After recovery, there is no guarantee that the same or a similar failure will not occur again, which can lead to a loop of recovery. Despite checkpointing, some states can never be rolled back to.
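
The checkpoint-and-rollback idea behind backward recovery can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class CheckpointedProcess and its methods are hypothetical names, the checkpoint is kept in memory rather than on stable storage, and the "failure" is a simulated half-finished update.

```python
import copy


class CheckpointedProcess:
    """Toy process whose whole state is one dictionary (illustrative only)."""

    def __init__(self):
        self.state = {"total": 0, "applied": []}
        self._checkpoint = copy.deepcopy(self.state)

    def checkpoint(self):
        # Record the current state; a real system would write it to stable storage.
        self._checkpoint = copy.deepcopy(self.state)

    def apply(self, value):
        # Two-step update: a failure between the steps leaves an erroneous state.
        self.state["applied"].append(value)
        self.state["total"] += value  # raises TypeError if value is not a number

    def rollback(self):
        # Backward recovery: replace the erroneous state with the recorded one.
        self.state = copy.deepcopy(self._checkpoint)


proc = CheckpointedProcess()
proc.apply(5)
proc.checkpoint()
proc.apply(7)              # work done after the checkpoint
try:
    proc.apply("oops")     # fails halfway: "applied" changed, "total" did not
except TypeError:
    proc.rollback()
print(proc.state)          # {'total': 5, 'applied': [5]}
```

Note that the update with 7 is lost as well: everything done after the last checkpoint has to be redone, which is one reason why restoring a previous state is costly.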

5 Forward Recovery In this form of recovery, instead of returning to a previously correct state, the system is moved from the erroneous state to a new correct state. Disadvantage: The errors that might occur in the system have to be known in advance. Example: A missing packet is constructed from other, successfully delivered packets. If the delivered packets are not sufficient to reconstruct the lost packet, the sender has to continue sending packets until the lost packet can be constructed.
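
One simple way to construct a missing packet from delivered ones is a single XOR parity packet. The slides do not say which code is used, so the sketch below is an assumption: the function names are hypothetical, all packets are assumed to have the same length, and only one lost packet per group can be rebuilt.

```python
from functools import reduce


def xor_bytes(a, b):
    # Byte-wise XOR of two equal-length packets.
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    # The sender adds one redundant packet: the XOR of all data packets.
    if len({len(p) for p in packets}) != 1:
        raise ValueError("all packets must have the same length")
    return reduce(xor_bytes, packets)


def reconstruct(received, parity):
    # `received` holds the data packets in order, with None for a lost one.
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("a single parity packet can rebuild exactly one loss")
    present = [p for p in received if p is not None]
    # XOR of the parity with every delivered packet leaves the lost packet.
    recovered = reduce(xor_bytes, present, parity)
    result = list(received)
    result[missing[0]] = recovered
    return result


packets = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(packets)
received = [b"abcd", None, b"ijkl"]   # the second packet was lost
print(reconstruct(received, parity))  # [b'abcd', b'efgh', b'ijkl']
```

If two packets of the group are lost, reconstruct raises an error; this corresponds to the case where the delivered packets are not sufficient and the sender has to keep sending.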

6 Stable Storage The information needed for recovery has to be stored safely, i.e. it should survive process crashes, site failures and storage media failures. Storage comes in three categories: RAM - data is lost when power fails or a machine crashes. Disk storage - survives CPU failures but can be lost in disk head crashes. Stable storage - survives anything except major calamities. Stable storage plays an important role in recovery in distributed systems.

7 Contd.. Stable storage can be implemented with a pair of ordinary disks. Every block on drive 2 is a copy of the corresponding block on drive 1. When a block is updated, the block on drive 1 is updated first, and then the same block on drive 2 is updated.
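
The two-drive scheme can be sketched as follows. The drives are simulated by dictionaries, and the class StableStore, the crash_between flag and the per-block checksum are assumptions made for illustration. The recovery rule (a valid block on drive 1 wins because drive 1 is always written first; a damaged block is repaired from the other drive) is the usual textbook approach, not something spelled out on the slide.

```python
import zlib


class StableStore:
    """Two simulated drives; each block is stored as (data, checksum)."""

    def __init__(self):
        self.drive1 = {}
        self.drive2 = {}

    @staticmethod
    def _seal(data):
        return (data, zlib.crc32(data))

    @staticmethod
    def _valid(entry):
        return entry is not None and zlib.crc32(entry[0]) == entry[1]

    def write(self, block, data, crash_between=False):
        self.drive1[block] = self._seal(data)   # drive 1 is updated first
        if crash_between:
            return                              # simulated crash before drive 2
        self.drive2[block] = self._seal(data)   # then the copy on drive 2

    def read(self, block):
        for drive in (self.drive1, self.drive2):
            entry = drive.get(block)
            if self._valid(entry):
                return entry[0]
        raise IOError("block is unreadable on both drives")

    def recover(self):
        # Compare the drives block by block and make them identical again.
        for block in set(self.drive1) | set(self.drive2):
            e1, e2 = self.drive1.get(block), self.drive2.get(block)
            if self._valid(e1):
                if e1 != e2:
                    self.drive2[block] = e1     # drive 1 holds the newer copy
            elif self._valid(e2):
                self.drive1[block] = e2         # repair drive 1 from drive 2


store = StableStore()
store.write(0, b"old")
store.write(0, b"new", crash_between=True)      # drive 2 still holds b"old"
store.recover()
print(store.drive2[0][0])                       # b'new'
```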

8 Message Logging Recovery to a previous correct state is achieved through checkpointing, which is a costly operation and can lead to performance issues. To reduce this cost, many fault-tolerant distributed systems combine checkpointing with message logging. Sender-based logging: After a checkpoint has been taken, a process logs each message before sending it. Receiver-based logging: The receiving process logs an incoming message before delivering it to the application.

9 Advantages of Message Logging
If only checkpointing is used, processes are restored to a checkpointed state, and from there on the system may behave differently than it did before the failure. For example, messages may be delivered in a different order. If message logging is also used, the events that took place since the last checkpoint are actually replayed.
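
A minimal sketch of receiver-based logging combined with checkpointing is given below. The class LoggedReceiver and the ("add", n) / ("mul", n) message format are hypothetical, and the log and checkpoint are kept in memory where a real system would use stable storage. The messages are chosen so that their order matters, which makes the effect of the replay visible.

```python
class LoggedReceiver:
    """Toy receiver: logs each message before delivering it."""

    def __init__(self):
        self.balance = 1
        self._checkpoint = self.balance
        self._log = []                 # messages delivered since the checkpoint

    def checkpoint(self):
        self._checkpoint = self.balance
        self._log.clear()              # older messages are no longer needed

    def receive(self, msg):
        self._log.append(msg)          # log first ...
        self._deliver(msg)             # ... then deliver to the application

    def _deliver(self, msg):
        op, n = msg
        if op == "add":
            self.balance += n
        elif op == "mul":
            self.balance *= n

    def crash(self):
        self.balance = None            # volatile state is lost

    def recover(self):
        # Restore the checkpoint, then replay the logged messages in order.
        self.balance = self._checkpoint
        for msg in self._log:
            self._deliver(msg)


r = LoggedReceiver()
r.receive(("add", 2))                  # balance: 3
r.checkpoint()
r.receive(("mul", 5))                  # balance: 15
r.receive(("add", 1))                  # balance: 16
r.crash()
r.recover()
print(r.balance)                       # 16
```

Without the log, recovery would stop at the checkpointed value 3, and redelivering the two messages in the opposite order would give 20 instead of 16.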

10 Recovery-Oriented Computing
It is cheaper to optimize for recovery than to aim for systems that are free from failures. One method is to simply reboot only the faulty part of the system; for this, the fault has to be properly localized.
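
Rebooting only the faulty part can be pictured with the toy sketch below. It assumes that fault localization is already solved and is exposed as a per-component healthy flag; the names Component and reboot_faulty are hypothetical.

```python
class Component:
    """Toy component with a health flag set by some fault detector."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def restart(self):
        self.healthy = True
        self.restarts += 1


def reboot_faulty(components):
    # Restart only the components in which a fault was localized.
    restarted = []
    for component in components:
        if not component.healthy:
            component.restart()
            restarted.append(component.name)
    return restarted


system = [Component("parser"), Component("cache"), Component("network")]
system[1].healthy = False              # fault localized in the cache
print(reboot_faulty(system))           # ['cache']
```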

11 Future Work Efforts are being made to combine forward recovery with backward recovery in order to get the best possible result. Non-blocking roll-forward recovery for message passing is being considered for systems with no built-in fault detection methods.

12 References Behzad Chitsaz and Mohammadreza Razzazi. Non-blocking roll-forward recovery for message passing systems (2012). Massimiliano Fasi, Yves Robert and Bora Ucar. Combining backward and forward recovery to cope with silent errors in iterative solvers (2015).

