Movement-Based Check-pointing and Logging for Recovery in Mobile Computing Systems Sapna E. George, Ing-Ray Chen, Ying Jin Dept. of Computer Science Virginia.


1 Movement-Based Check-pointing and Logging for Recovery in Mobile Computing Systems
Sapna E. George, Ing-Ray Chen, Ying Jin
Dept. of Computer Science, Virginia Polytechnic Institute and State University

2 Outline
• Background
• Problem Definition – Failure Recovery in the Mobile Computing Environment
• Proposed Solution – Movement-Based Check-pointing and Logging
• Performance Analysis
  - Analytic Model of the System
  - Analysis Results and Conclusions
• Future Work

3 Background

4 Mobile Computing
• Advances in wireless networking and portable device technologies are revolutionizing computing.
• Mobile computing is a type of distributed computing:
  - It involves hosts that may be mobile.
  - Host network connectivity is maintained through wireless communication.

5 Fault Tolerance in Distributed Systems
• Check-pointing, logging, and rollback recovery
• Check-pointing (during failure-free operation)
  - The system state is saved to stable storage; this snapshot is called a checkpoint.
• Logging (during failure-free operation, in addition to checkpoints)
  - All non-deterministic events, and the information needed to replay them, are logged to stable storage.

6 Fault Tolerance in Distributed Systems
• Failure recovery
  - The failed process rolls back to the latest checkpoint.
  - It replays all logged events in their original order.
  - It thereby recreates its pre-failure state independently.
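The check-point, log, and replay cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `StableStorage` and `Process` classes and the (key, value) event format are hypothetical, and an in-memory dictionary stands in for stable storage.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Event = Tuple[str, Any]  # hypothetical event format: a (key, value) write


@dataclass
class StableStorage:
    """Stand-in for stable storage: the latest checkpoint plus the event log."""
    checkpoint: Dict[str, Any] = field(default_factory=dict)
    log: List[Event] = field(default_factory=list)


class Process:
    def __init__(self, storage: StableStorage) -> None:
        self.state: Dict[str, Any] = {}  # volatile state, lost on failure
        self.storage = storage

    def _apply(self, event: Event) -> None:
        key, value = event
        self.state[key] = value

    def handle(self, event: Event) -> None:
        # Log the event to stable storage before applying it, so it can be replayed.
        self.storage.log.append(event)
        self._apply(event)

    def take_checkpoint(self) -> None:
        # Save a snapshot of the state; log entries older than it are discarded.
        self.storage.checkpoint = copy.deepcopy(self.state)
        self.storage.log.clear()

    def recover(self) -> None:
        # Roll back to the latest checkpoint, then replay logged events in original order.
        self.state = copy.deepcopy(self.storage.checkpoint)
        for event in self.storage.log:
            self._apply(event)


if __name__ == "__main__":
    p = Process(StableStorage())
    p.handle(("x", 1))
    p.take_checkpoint()
    p.handle(("x", 2))
    p.handle(("y", 3))
    p.state = {}    # simulate a failure that wipes volatile state
    p.recover()
    print(p.state)  # {'x': 2, 'y': 3}
```

Clearing the log at each checkpoint reflects the fact that only events after the latest checkpoint are needed to rebuild the pre-failure state.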

7 Problem Definition – Failure Recovery in the Mobile Computing Environment

8 Effects of Properties of MC Env.
• Mobility of hosts
  - If checkpointing requires coordination, the mobile host (MH) must first be located before control messages can be delivered; this increases communication delay.
  - Recovery data such as checkpoints and logs may be distributed over many mobile support stations (MSSs); a mechanism is required for efficient storage, retrieval, and management of this dispersed information.

9 Effects of Properties of MC Env.
• Low bandwidth and unreliable network connectivity
  - A recovery mechanism that requires many messages, or large messages, places an undue burden on wireless resources and increases the cost of providing fault tolerance.

10 Effects of Properties of MC Env.
• Limited battery life of host devices
  - Communication is energy intensive.
  - A recovery mechanism must keep communication (the number and the size of messages) to a minimum.

11 Effects of Properties of MC Env.
• Lack of stable storage on host devices
  - Devices are vulnerable to physical damage.
  - Devices are small and have limited memory.
  - An MH's disk cannot reliably serve as the stable storage required for recovery information.

12 Effects of Properties of MC Env.
• Different types of 'failures'
  - Voluntary disconnection and hardware failure must be handled differently.
• A disconnected host may reconnect after a while and expect to resume operations.
• An MH that is currently unreachable cannot be expected to participate in a checkpointing or recovery operation. A scheme that requires synchronization or coordination with other MHs would either block until the MH reconnects or fail.

13 The Problem…
• Traditional recovery schemes suffer from many shortcomings when applied to the mobile computing environment.
• The failure-prone nature of the environment makes some form of explicit recovery mechanism essential.

14 The Problem…
• In general, application recovery mechanisms try to balance:
  - Recovery cost (failure-free operational cost)
  - Recovery time
  - Storage requirements for recovery-related information

15 The Problem…
• Adaptations of traditional recovery schemes for the mobile computing environment:
  - Do not consider mobility when selecting the checkpointing interval
  - Use periodic checkpointing
  - Subsequently control the proliferation of recovery information using techniques that merge logs and move the information closer to the MH

16 Proposed Solution – Movement-Based Check-pointing and Logging

17 Assumed Mobile Computing System
• A set of mobile hosts (MHs)
  - They maintain network connectivity through a wireless link to a static mobile support station (MSS).
• An MSS handles all communication to and from the MHs within its area of influence, known as a cell.
• Each MSS has enough stable storage to store state and log information.

18 Assumed Mobile Computing System
• Interactions between the MH and the network infrastructure relevant to failure recovery:
  - Handoff – cell boundary crossing
  - Disconnection – for power conservation
  - Reconnection – possibly in a cell different from the one in which the MH disconnected

19 Assumed Mobile Computation
• A distributed computation: a number of processes executing concurrently on multiple hosts
• Process states:
  - Normal – executing application-related computations, receiving user inputs, or sending and receiving messages; between checkpoints, the process also logs all events while in this state
  - Save – saving its state as a checkpoint to stable storage
  - Recovery – loading the checkpoint and applying the logs
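The three process states can be sketched as a small guarded state machine. The slide names the states but not the transitions between them, so the `ALLOWED` table below is an assumption (a failure in either Normal or Save is taken to lead to Recovery), and `ProcState` and `transition` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set


class ProcState(Enum):
    NORMAL = "normal"      # computing, handling input/messages, logging events
    SAVE = "save"          # writing a checkpoint to stable storage
    RECOVERY = "recovery"  # loading the checkpoint and applying logs


# Assumed transition table (not given on the slide).
ALLOWED: Dict[ProcState, Set[ProcState]] = {
    ProcState.NORMAL: {ProcState.SAVE, ProcState.RECOVERY},
    ProcState.SAVE: {ProcState.NORMAL, ProcState.RECOVERY},
    ProcState.RECOVERY: {ProcState.NORMAL},
}


def transition(current: ProcState, target: ProcState) -> ProcState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```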

20 Movement-Based Checkpointing and Logging
• The interval between checkpoints is governed by the number of handoffs experienced by the MH and is not fixed.
• The MH maintains a handoff counter, which is incremented by 1 every time a handoff occurs.
• When the counter value becomes greater than a threshold M, a checkpoint is taken (and the counter is reset).
• Between checkpoints, all write events related to an MH are also logged to the local MSS.
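The handoff-counter rule can be sketched as follows. This assumes, hypothetically, that each checkpoint is stored at the MSS of the cell in which it is taken and that the counter is reset after every checkpoint; `MobileHost`, `handoff`, and `write` are illustrative names, and the comparison follows the slide's "greater than a threshold M" wording.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MobileHost:
    M: int                  # movement threshold
    cell: str               # id of the MSS serving the current cell
    handoff_count: int = 0
    checkpoint_mss: str = ""
    # MSS id -> log entries stored there since the last checkpoint
    log_locations: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.take_checkpoint()  # start from an initial checkpoint

    def take_checkpoint(self) -> None:
        # Assumption: the checkpoint goes to the current MSS and older logs become obsolete.
        self.checkpoint_mss = self.cell
        self.log_locations = {}
        self.handoff_count = 0

    def write(self, entry: str) -> None:
        # Every write event is logged at the local MSS.
        self.log_locations.setdefault(self.cell, []).append(entry)

    def handoff(self, new_cell: str) -> None:
        self.cell = new_cell
        self.handoff_count += 1
        if self.handoff_count > self.M:  # "greater than a threshold M"
            self.take_checkpoint()
```

Because the counter is reset at every checkpoint, the logs written since the last checkpoint can sit at no more than M + 1 MSSs (fewer if the MH revisits cells), which is what keeps log retrieval bounded.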

21 Movement-Based Checkpointing and Logging
• The threshold M is a configurable parameter that depends on:
  - the user's mobility rate
  - the failure rate
  - the application's log arrival rate

22 Movement-Based Checkpointing and Logging
• Thus, depending on the variability in the MH's mobility, the time interval between successive checkpoints varies.
• Recovery – the MH recovers independently, without coordination with other MHs:
  - Upon reconnection, the MH informs the local MSS.
  - The local MSS contacts the MSS holding the latest checkpoint.
  - The local MSS contacts all MSSs storing logs.
  - All data is transferred to the local MSS via the wired network, and then to the MH via the wireless link.
  - The MH rolls back and applies the logs.

23 Movement-Based Checkpointing and Logging
• The performance of this scheme depends on identifying the optimal movement threshold M for each user and application.
• Checkpoints and logs remain within an acceptable range of the MH's current location, which eliminates the need for information consolidation.
• Recovery time stays acceptable, since M bounds the number of MSSs from which logs must be retrieved.

24 Performance Analysis – Analytic Model

25 Stochastic Petri-Net (SPN) Model

26 SPN Model Parameters
Parameter | Description
σ | MH mobility rate, i.e., the rate at which the MH crosses cell boundaries
μ | Log arrival rate, i.e., the rate at which logs are created
λ_f | MH failure rate, i.e., the rate at which the MH fails
M | Movement threshold, i.e., the number of handoffs after which the MH takes a checkpoint
r | Ratio of the bandwidth of the wireless network to that of the wired network
T_ckp_w | Time required to transmit a checkpoint through the wireless link
T_log_w | Time required to transmit a log entry through the wireless link
T_elog | Time required to execute a log entry at the MH

27 SPN Model Parameters
• Θ_k – checkpoint rate of the MH
• Θ_i – recovery rate of the MH, i.e., the inverse of the recovery time
• i – number of handoffs experienced by the MH since the last checkpoint and before the failure
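The quantity i can be explored with a small Monte Carlo sketch, offered as an illustration rather than a solution of the SPN model. It assumes that handoffs, log arrivals, and failures are independent Poisson processes with rates σ, μ, and λ_f, so that the next event is a handoff, a log, or a failure with probability proportional to its rate. It also assumes that recovery restores the pre-failure state, so the counters carry over after a failure. `simulate_failures` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def simulate_failures(sigma: float, mu: float, lam_f: float, M: int,
                      n_failures: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Return (i, logs) at each simulated failure: handoffs and log entries since the last checkpoint."""
    if lam_f <= 0:
        raise ValueError("lam_f must be positive, otherwise no failure ever occurs")
    rng = random.Random(seed)
    total = sigma + mu + lam_f
    handoffs, logs = 0, 0
    samples: List[Tuple[int, int]] = []
    while len(samples) < n_failures:
        u = rng.random() * total  # next event chosen with probability proportional to its rate
        if u < sigma:             # handoff
            handoffs += 1
            if handoffs > M:      # movement-based checkpoint resets both counters
                handoffs, logs = 0, 0
        elif u < sigma + mu:      # a new log entry
            logs += 1
        else:                     # failure: record i and the number of logs to replay
            samples.append((handoffs, logs))
    return samples
```

Tabulating how often each i occurs in the samples gives a rough empirical view of how far from its last checkpoint the MH typically is when it fails.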

28 Analytic Model – Recovery Time

29
• T_req_rec – time spent on recovery information requests
• N_mss_logs – number of MSSs storing logs
• D_mss – average hop count between MSS_cp (the MSS holding the latest checkpoint) and MSS_rec (the MSS at which the MH recovers)

30 Analytic Model – Recovery Time
• T_ckp_tx – time spent transmitting the latest checkpoint to the MH
• T_log_tx – time spent transmitting the logs to the MH
• T_rec – time spent rolling back to the last checkpoint and applying the logs
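The exact recovery-time expressions are not given in the text, so the following is only one plausible decomposition of the recovery time for a single failure, under loudly stated assumptions: the total is T_req_rec + T_ckp_tx + T_log_tx + T_rec; one request message goes to each MSS contacted, each crossing D_mss wired hops at a hypothetical per-hop delay `t_req_hop`; the checkpoint and every log entry also cross D_mss wired hops before the final wireless hop; wired transfer time is r times the wireless transfer time; and rollback costs nothing beyond applying the logs. Default values come from the evaluation parameters on slide 32.

```python
def recovery_time(n_logs: int, n_mss_logs: int, d_mss: float, t_req_hop: float,
                  t_ckp_w: float = 0.08, t_log_w: float = 0.002,
                  t_elog: float = 0.0001, r: float = 0.1) -> float:
    """Hypothetical recovery-time model; not the paper's equations."""
    # Assumed: one request to the checkpoint MSS plus one to each MSS storing logs.
    t_req_rec = (n_mss_logs + 1) * d_mss * t_req_hop
    # Wired time is r times wireless time, because time scales with 1 / bandwidth.
    t_ckp_tx = d_mss * r * t_ckp_w + t_ckp_w
    t_log_tx = n_logs * (d_mss * r * t_log_w + t_log_w)
    # Replaying the logs at the MH; the rollback itself is assumed negligible.
    t_rec = n_logs * t_elog
    return t_req_rec + t_ckp_tx + t_log_tx + t_rec
```

For example, with 10 logs spread over 3 MSSs, D_mss = 2, and a 1 ms per-hop request delay, the terms are 0.008 s, 0.096 s, 0.024 s, and 0.001 s, for a total of about 0.129 s.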

31 Analytic Model – Cost of Recovery
• T_r – average recovery time per failure
• F_r – recovery probability
• T_c – cost of recovery
• Number of checkpoints before failure
• Number of logs before failure

32 SPN Evaluation Parameters
• Size of a log entry: 50 B
• Size of a checkpoint: 2000 B
• Bandwidth of the wired network: 2 Mbps
• Ratio of wireless to wired bandwidth (r): 0.1
• Time required to apply a log entry (T_elog): 0.0001 s
• Time required to transmit a log entry through the wireless channel (T_log_w): 0.002 s
• Time required to transmit a checkpoint through the wireless channel (T_ckp_w): 0.08 s
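The two wireless transmission times in this list follow directly from the sizes and bandwidths, as a quick check shows. The sketch assumes 1 B = 8 bit and 2 Mbps = 2 × 10^6 bit/s, and ignores propagation delay and protocol overhead; `tx_time` is a hypothetical helper.

```python
def tx_time(size_bytes: float, bandwidth_bps: float) -> float:
    """Transmission time in seconds, ignoring propagation delay and protocol overhead."""
    return size_bytes * 8 / bandwidth_bps


WIRED_BPS = 2_000_000         # 2 Mbps, taken as 2 x 10^6 bit/s
R = 0.1                       # wireless/wired bandwidth ratio
WIRELESS_BPS = R * WIRED_BPS  # 200 kbit/s

T_CKP_W = tx_time(2000, WIRELESS_BPS)   # 16 000 bit / 200 kbit/s
T_LOG_W = tx_time(50, WIRELESS_BPS)     # 400 bit / 200 kbit/s
T_CKP_WIRED = tx_time(2000, WIRED_BPS)  # same checkpoint over the wired link, R * T_CKP_W

if __name__ == "__main__":
    print(f"T_ckp_w = {T_CKP_W:.3f} s, T_log_w = {T_LOG_W:.3f} s, wired checkpoint = {T_CKP_WIRED:.3f} s")
    # T_ckp_w = 0.080 s, T_log_w = 0.002 s, wired checkpoint = 0.008 s
```

The last line also shows why r can stand in for the wired bandwidth in the model: any transfer over the wired network takes r times as long as the same transfer over the wireless link.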

33 Performance Analysis – Results and Conclusions

34 Recovery Probability vs. Recovery Time

35 Recovery Probability vs. Log Arrival Rate

36 Recovery Probability vs. Failure Rate

37 Recovery Probability & Recovery Time vs. Movement Threshold

38 Determining Optimal Movement Threshold that Minimizes Recovery Cost Per Failure

39 Conclusion – Proposed Scheme
• An efficient failure recovery scheme for mobile computing systems based on movement-based checkpointing and logging
• The movement-based checkpointing and logging scheme takes a checkpoint only after the mobile node has made M movements (mobility handoffs).
• The value of M is governed by the failure rate, the log arrival rate, and the mobility rate of the application and MH.
• Given the failure, mobility, and log arrival rates, the optimal movement threshold M can be identified to minimize the cost of recovery per failure.

40 Conclusion – Practical Application
• At configuration time, build a table that covers possible values of the MH's mobility rate and failure rate and of the mobile applications' log arrival rate, listing for each combination the optimal M that minimizes the recovery cost per failure.
• At runtime, the optimal M can be selected dynamically from this table based on the measured rates.
• When an application specifies a deadline for recovering from a failure, the selected M must also satisfy the required recovery probability.
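The configuration-time table and the runtime lookup can be sketched as below. The cost and recovery-probability models are passed in as functions, since the text does not give them in closed form; `build_table`, `select_m`, and the nearest-grid-point lookup by relative distance are illustrative choices, not the authors' method. An entry of `None` marks rate combinations for which no candidate M meets the required recovery probability.

```python
import itertools
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Rates = Tuple[float, float, float]  # (sigma: mobility, lambda_f: failure, mu: log arrival)
CostFn = Callable[[Rates, int], float]
ProbFn = Callable[[Rates, int], float]


def build_table(sigmas: Iterable[float], lambdas: Iterable[float], mus: Iterable[float],
                m_values: Sequence[int], cost: CostFn,
                recovery_prob: Optional[ProbFn] = None,
                min_prob: float = 0.0) -> Dict[Rates, Optional[int]]:
    """For each grid point, store the feasible M with the lowest recovery cost per failure."""
    table: Dict[Rates, Optional[int]] = {}
    for rates in itertools.product(sigmas, lambdas, mus):
        # Keep only thresholds that meet the recovery-probability requirement, if one is given.
        feasible = [m for m in m_values
                    if recovery_prob is None or recovery_prob(rates, m) >= min_prob]
        table[rates] = min(feasible, key=lambda m: cost(rates, m)) if feasible else None
    return table


def select_m(table: Dict[Rates, Optional[int]], measured: Rates) -> Optional[int]:
    """Runtime lookup: nearest grid point by relative distance (grid rates assumed > 0)."""
    nearest = min(table, key=lambda k: sum(((g - x) / g) ** 2 for g, x in zip(k, measured)))
    return table[nearest]
```

Relative rather than absolute distance is used because the three rates can differ by orders of magnitude; a finer grid, or interpolation between grid points, would be natural refinements.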

