Performance and Effectiveness Analysis of Checkpointing in Mobile Environments
Chen Xinyu
Dept. of Computer Science & Engineering, CUHK
Outline
• Introduction
• Mobile Environment – Wireless CORBA
• Performance and Effectiveness Analysis of Checkpointing
• Conclusions and Future Work
Introduction
• Mobile Computing
• Permanent failures
  - Physical damage
• Transient failures
  - Mobile hosts
  - Wireless links
  - Environmental conditions
Checkpointing and Rollback Recovery
• Checkpoint: the program's state, saved during failure-free execution
• Repair: brings the failed device back to normal operation
• Rollback: reloads the program's state saved at the most recent checkpoint
• Recovery: reprocessing of the program, starting from the most recent checkpoint and applying the logged messages, up to the point just before the failure
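The four terms above can be illustrated with a minimal sketch. It is not the scheme analysed in these slides: the class name `CheckpointedProgram`, the running-sum state, and the assumption that the checkpoint and the message log survive a failure (as if kept on stable storage) are all hypothetical choices made for illustration.

```python
import copy


class CheckpointedProgram:
    """Toy program whose state is a running sum of the messages it received.

    Assumption: the checkpoint and the message log survive a failure,
    while the live state does not.
    """

    def __init__(self):
        self.state = {"total": 0, "received": 0}
        self._checkpoint = copy.deepcopy(self.state)  # initial checkpoint
        self._log = []  # messages received since the last checkpoint

    def _apply(self, value):
        self.state["total"] += value
        self.state["received"] += 1

    def receive(self, value):
        self._log.append(value)  # log the message, then process it
        self._apply(value)

    def checkpoint(self):
        """Save the current state and discard the now-redundant log."""
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def fail(self):
        """Simulate a transient failure: the live state is lost."""
        self.state = None

    def rollback(self):
        """Reload the state saved at the most recent checkpoint."""
        self.state = copy.deepcopy(self._checkpoint)

    def recover(self):
        """Replay the logged messages up to the point just before the failure."""
        for value in self._log:
            self._apply(value)
```

Repair (bringing the device back to normal operation) has no counterpart in the sketch; it would happen between `fail()` and `rollback()`.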
Wireless CORBA Architecture
[Figure: Wireless CORBA architecture. Labels: Visited Domain, Home Domain, Terminal Domain, Access Bridge (ab1, ab2), Static Host, Terminal Bridge (mh1), GIOP Tunnel, GTP Messages.]
Wireless CORBA Architecture
[Figure: Wireless CORBA architecture, with mh1 and its GIOP Tunnel drawn at several successive positions. Labels: Visited Domain, Access Bridge (ab1, ab2), Static Host, Home Domain, Home Location Agent, Terminal Domain, Terminal Bridge (mh1), GIOP Tunnel.]
Program's Termination Condition
• A program terminates successfully once it has received N computational messages continuously
Assumptions
• Failure occurrences, message arrivals and handoff events are homogeneous Poisson processes, each with its own rate parameter
• Failures do not occur while the program is in the repair or rollback process
• A failure is detected as soon as it occurs
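The Poisson assumption can be made concrete with a small sampler. This is a sketch only: the function name `sample_events`, the event labels, and the example rates are hypothetical, and the three processes are assumed to be independent.

```python
import random


def sample_events(rates, horizon, seed=0):
    """Return a time-ordered list of (time, kind) events on [0, horizon).

    `rates` maps an event kind to the rate of its homogeneous Poisson
    process; inter-event times of each process are exponential.
    """
    rng = random.Random(seed)
    events = []
    for kind, rate in rates.items():
        t = rng.expovariate(rate)
        while t < horizon:
            events.append((t, kind))
            t += rng.expovariate(rate)
    events.sort()
    return events
```

For example, `sample_events({"failure": 0.01, "message": 1.0, "handoff": 0.1}, horizon=100.0)` yields one possible interleaving of failures, message arrivals and handoffs over 100 time units.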
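Execution without Checkpointing
[Figure: timeline of an execution without checkpointing, from time 0 along axis t. Labels: messages m_0(N), m_1(n_1), m_j(1), m_j(N); failures F_1, F_j; handoffs H_1, H_k; intervals X_0, Y_0, Z_0; repair time R; handoff time H; total execution time X(N).]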
Conditional Execution Time without checkpointing
LST without checkpointing
LST and Expectation of Program Execution Time
Bounded Situations u Without handoff u Without handoff and failure
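For the second bounded situation (no handoff, no failure), the Poisson arrival assumption gives a simple check: the time to receive N messages is a sum of N independent exponential inter-arrival times, so its mean is N divided by the message arrival rate. The sketch below compares that closed form with a Monte Carlo estimate. It assumes the execution time is just the time to receive the N messages; the function and parameter names are hypothetical.

```python
import random


def expected_time_no_failure(n_messages, message_rate):
    """Mean time to receive n_messages Poisson arrivals (Erlang mean)."""
    return n_messages / message_rate


def simulate_time_no_failure(n_messages, message_rate, runs=20000, seed=1):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # One execution: sum of n_messages exponential inter-arrival times.
        total += sum(rng.expovariate(message_rate) for _ in range(n_messages))
    return total / runs
```

With `n_messages=50` and `message_rate=2.0` the closed form gives 25.0 time units, and the simulated mean should fall close to that value.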
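Execution with Equi-number Checkpointing
[Figure: timeline of the i-th checkpointing interval, from time 0 along axis t, between checkpoints C_{i-1} and C_i. Labels: messages m_i0(a), m_i1(n_i1), m_ij(1), m_ij(a); failures F_i(1), F_i(j); handoffs H_i(1), H_i(k); intervals X_i(0), Y_i(0), Z_i(0); repair plus rollback time R+C; handoff time H; checkpointing time C; interval execution time X_i(N,a).]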
Conditional Execution Time & LST with Checkpointing
LST and Expectation of Program Execution Time
Average Effectiveness
• Effective interval: the program produces useful work towards its completion
• Wasted interval:
  - Repair and rollback
  - Handoff
  - Checkpoint creation
  - Wasted computation
• Average effectiveness: the fraction of an execution during which an MH is in an effective interval
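As a rough illustration of this definition, the fraction can be computed for a single observed execution from a list of labelled intervals. The slides derive an analytical expression instead; the helper below, its name, and its interval labels are hypothetical.

```python
WASTED_KINDS = {"repair_rollback", "handoff", "checkpoint", "wasted_computation"}


def average_effectiveness(intervals):
    """Fraction of total time spent in effective intervals.

    `intervals` is a list of (kind, duration) pairs, where kind is
    "effective" or one of WASTED_KINDS.
    """
    effective = 0.0
    total = 0.0
    for kind, duration in intervals:
        if kind != "effective" and kind not in WASTED_KINDS:
            raise ValueError(f"unknown interval kind: {kind!r}")
        total += duration
        if kind == "effective":
            effective += duration
    if total == 0:
        raise ValueError("execution has zero total duration")
    return effective / total
```

An execution with 6 time units of useful work and 1 unit each of checkpoint creation, handoff, wasted computation, and repair plus rollback has an effectiveness of 6/10.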
Optimal Checkpointing Interval
Beneficial Condition
Equi-number Checkpointing
• With respect to message number: the number of messages in each checkpointing interval is held fixed
• With respect to checkpoint number: the number of checkpoints is held fixed
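The two views differ in what stays constant as the total message count N grows. A minimal sketch, assuming the checkpoint number can be read as the number of checkpointing intervals and that a final partial interval counts as a full one (both are assumptions, as are the function names):

```python
import math


def intervals_for_fixed_message_number(n_messages, a):
    """Number of checkpointing intervals when each holds `a` messages."""
    return math.ceil(n_messages / a)


def message_number_for_fixed_interval_count(n_messages, k):
    """Messages per interval when the number of intervals is fixed at `k`."""
    return math.ceil(n_messages / k)
```

With 50 messages per interval, doubling N from 1000 to 2000 doubles the interval count from 20 to 40; with 20 intervals held fixed, the same doubling raises the messages per interval from 50 to 100.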
Equi-number Checkpointing with respect to Checkpoint Number
Equi-number Checkpointing with respect to Message Number
Comparison Between Checkpointing and Without Checkpointing
Average Effectiveness vs. Message Arrival Rate and Handoff Rate
Conclusions
• Introduced an equi-number checkpointing strategy
• Derived the LST and expectation of program execution time
• Derived the average effectiveness
• Derived the optimal checkpointing interval
• Identified the beneficial condition
Future Work
• Analytical model
  - Message queuing effect during repair and recovery
  - General event distributions
• Fault tolerance in ad hoc networks
  - Without infrastructure support
  - Self-organizing and adaptive
Thank You