Fault Tolerance Mechanisms ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg.


1 Fault Tolerance Mechanisms. ITV Model-based Analysis and Design of Embedded Software: Techniques and methods for Critical Software. Anders P. Ravn, Aalborg University, August 2011

2 Fault Tolerance: Means to isolate component faults and mask them. Prevents system failures. May increase system dependability.

3 Fault Tolerance

4 FT - levels (BW p. 107): Full tolerance; Graceful degradation; Fail safe

5 FT basis: Redundancy. In time: Try, Retry, ... In space: Try, ...

6 Fault Tolerance

7 Basic Strategies

8 Dynamic Redundancy (BW p. 114)
1. Error detection
2. Damage confinement and assessment
3. Error recovery
4. Fault treatment and continued service

9 Error Detection
f: State × Input → State × Output
Detected by: Environment (exception); Application
Assertion: precondition(input); postcondition(input, output); invariant(state, state')
Timing: WCET(f, input); Deadline(f, input)
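The application-level checks on this slide can be sketched in Java as follows. This is a minimal illustration, not code from the lecture: the names DetectionError, CheckedCall and isqrtChecked are hypothetical, the function under test (an integer square root) is chosen only because its postcondition is easy to state, and the example is stateless, so the invariant(state, state') check is not shown. The deadline is checked after the call returns; it detects an overrun but does not pre-empt it.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical exception type used to signal a detected error. */
final class DetectionError extends RuntimeException {
    DetectionError(String message) { super(message); }
}

final class CheckedCall {
    /**
     * Runs f (assumed to compute floor(sqrt(input))) and checks the
     * precondition, the postcondition and a deadline given in nanoseconds.
     */
    static int isqrtChecked(IntUnaryOperator f, int input, long deadlineNanos) {
        // precondition(input)
        if (input < 0) throw new DetectionError("precondition violated");

        long start = System.nanoTime();
        int output = f.applyAsInt(input);
        long elapsed = System.nanoTime() - start;

        // postcondition(input, output): output^2 <= input < (output+1)^2
        long r = output;
        if (r < 0 || r * r > input || (r + 1) * (r + 1) <= input) {
            throw new DetectionError("postcondition violated");
        }
        // timing check, performed after the fact
        if (elapsed > deadlineNanos) throw new DetectionError("deadline missed");
        return output;
    }

    public static void main(String[] args) {
        long oneSecond = 1_000_000_000L;
        System.out.println(isqrtChecked(n -> (int) Math.sqrt(n), 17, oneSecond));
        try {
            isqrtChecked(n -> n / 2, 17, oneSecond); // deliberately faulty version
        } catch (DetectionError e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Here the faulty version returns 8 for input 17, which fails the postcondition, so the error is detected before the wrong value reaches the caller.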

10 Damage Confinement: Static structure (object); Dynamic structure (transaction)

11 Error Recovery
Forward: repair the state, if you can!
Backward: define recovery points; checkpoint state at each recovery point; roll back; retry
Risk: domino effect
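Backward recovery (checkpoint, roll back, retry) can be sketched in Java as below. This is an illustrative sketch under stated assumptions: the class BackwardRecovery and the method runWithRollback are hypothetical names, the state is a simple list that can be copied cheaply, and only a single process is involved, so the domino effect between communicating processes does not arise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class BackwardRecovery {
    /**
     * Applies op to state. If op throws, the state is rolled back to the
     * checkpoint taken at the recovery point and op is retried, at most
     * maxTries times in total. Returns true if some attempt succeeded.
     */
    static boolean runWithRollback(List<Integer> state,
                                   Consumer<List<Integer>> op,
                                   int maxTries) {
        List<Integer> checkpoint = new ArrayList<>(state); // recovery point
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                op.accept(state);
                return true;
            } catch (RuntimeException e) {
                // roll back: discard partial updates, restore the checkpoint
                state.clear();
                state.addAll(checkpoint);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> state = new ArrayList<>(Arrays.asList(1, 2, 3));
        int[] calls = {0};
        boolean ok = runWithRollback(state, s -> {
            s.add(4); // partial update
            if (calls[0]++ == 0) throw new IllegalStateException("transient fault");
            s.add(5);
        }, 3);
        System.out.println(ok + " " + state); // prints "true [1, 2, 3, 4, 5]"
    }
}
```

The first attempt leaves a partial update behind; the rollback removes it, so the retry starts from the checkpointed state rather than from a damaged one.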

12 Recovery blocks (BW p. 120)
ENSURE acceptance_test
BY { module_1 }
ELSE BY { module_2 }
...
ELSE BY { module_m }
ELSE ERROR

13 Implementation of Recovery Blocks

14 Abstract class RecoveryBlock
public abstract class RecoveryBlock {
    abstract boolean acceptanceTest();
    /** method to produce the result, it must be implemented by the application.
     * @param module 0,..., MaxBlocks-1 */
    abstract void block(int module);
    /* MaxBlocks must be set by the application to the number of blocks */
    protected int MaxBlocks;

15 RecoveryBlock execution
/** method to execute recovery module 0, 1,... MaxBlocks-1 until one succeeds
 * @throws NoAccept if no module passes acceptanceTest. */
public final void do_it() throws NoAccept, CloneNotSupportedException {
    save();
    int i = 0;
    do {
        try {
            block(i++);
            if (acceptanceTest()) return;
        } catch (Exception e) {
            /* if the block fails, we continue - not acceptance */
        }
        restore(copy);
    } while (i < MaxBlocks);
    throw new NoAccept();
}

16 RecoveryBlock cache
public abstract class RecoveryBlock {
    /** The recovery cache is implemented by a clone of the original object */
    RecoveryBlock copy;
    /** save object to recovery cache; uses Java clone, which must be a deep clone
     * (Object.clone() only works if the class implements Cloneable). */
    private final void save() throws CloneNotSupportedException {
        copy = (RecoveryBlock) this.clone();
    }
    /** method to restore data from recovery cache, it must be implemented by the application
     * @param copy value of the object to be restored */
    abstract void restore(RecoveryBlock copy);

17 Application
/** Extends the basic abstract RecoveryBlock with faulty sorting
 * algorithms and logs calls, returns etc. to a TextArea. */
public class RecoveringSort extends RecoveryBlock {
    /** checksum for acceptance test */
    private int checksum;
    /** data to be saved in recovery cache */
    private int[] argument;
    public RecoveringSort(TextArea t) { MaxBlocks = 3; log = t; }

18 Acceptance criteria
/* Acceptance test for sorting; it shall verify:
 * 1) the return value is an ordered list,
 * 2) the return value is a permutation of the initial values */
boolean acceptanceTest() {
    boolean result = true;
    // check ordering
    int i = argument.length - 1;
    while (i > 0)
        if (argument[i] < argument[--i]) { result = false; break; }
    // check permutation, this is a partial check through a checksum
    // A full check is as expensive computationally as sorting,
    // thus, we use a partial check.
    i = argument.length;
    int sum = 0;
    while (i > 0) sum += argument[--i];
    return result && (sum == checksum);
}

19 Application - modules
/** Starts sorting using the recovery block mechanism.
 * @param data integer array containing elements to be sorted. */
public int[] sort(int[] data) {
    argument = (int[]) data.clone(); // copy needed for recovery to work
    checksum = 0;
    int i = argument.length;
    while (i > 0) checksum += argument[--i];
    try {
        do_it();
    } catch (NoAccept e) {
        log.append("All blocks failed\n");
    } catch (CloneNotSupportedException e) {
        // do_it() also declares this checked exception, so it must be handled here
        log.append("Recovery cache could not be saved\n");
    }
    return argument;
}
void block(int i) {
    switch (i) {
        case 0: BucketSort(argument); break;
        case 1: BadSort(argument); break;
        case 2: AlmostGoodSort(argument); break;
        default:
    }
}

20 Fault classes (scope of R-B)
Origin: physical (internal/external); logical (design/interaction)
Kind: omission; value; timing; byzantine
Property: duration (permanent, transient); consistency (determinate, nondeterminate); autonomy (spontaneous, event-dependent)
Coverage marks (assignment to rows lost in extraction): + (+) ++ (-) + / (+) + / +

21 The ideal FT-component. Figure labels: Normal mode; Exception Handler; Request/response; Interface exception (shown twice); Failure exception (shown twice)

22 N-version programming. Figure labels: V1, V2, V3; Driver (comparator); Comparison vectors (votes); Comparison status indicators; Comparison points
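The driver (comparator) role can be sketched in Java as a majority voter over independently written versions. This is only an illustration under assumptions: NVersionDriver and vote are hypothetical names, the versions run sequentially rather than concurrently, there is a single comparison point at the end of each version, and votes are compared by exact equality, which would not suit inexact results such as floating-point values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.IntUnaryOperator;

final class NVersionDriver {
    /**
     * Runs every version on the same input, collects the votes and returns
     * the value produced by a strict majority of all versions, if any.
     */
    static Optional<Integer> vote(List<IntUnaryOperator> versions, int input) {
        int n = versions.size();
        int[] votes = new int[n];
        boolean[] valid = new boolean[n];
        for (int i = 0; i < n; i++) {
            try {
                votes[i] = versions.get(i).applyAsInt(input);
                valid[i] = true;
            } catch (RuntimeException e) {
                // a crashed version casts no vote
            }
        }
        for (int i = 0; i < n; i++) {
            if (!valid[i]) continue;
            int agree = 0;
            for (int j = 0; j < n; j++) {
                if (valid[j] && votes[j] == votes[i]) agree++;
            }
            if (2 * agree > n) return Optional.of(votes[i]);
        }
        return Optional.empty(); // no majority: the driver reports failure
    }

    public static void main(String[] args) {
        List<IntUnaryOperator> versions = Arrays.asList(
            Math::abs,               // V1
            x -> x < 0 ? -x : x,     // V2
            x -> x                   // V3, deliberately faulty for negatives
        );
        System.out.println(vote(versions, -5)); // prints "Optional[5]"
    }
}
```

With three versions, one faulty result is outvoted; if all three disagree, no majority exists and the driver returns an empty result instead of guessing.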

23 Fault classes (scope of N-VP)
Origin: physical (internal/external); logical (design/interaction)
Kind: omission; value; timing; byzantine
Property: duration (permanent, transient); consistency (determinate, nondeterminate); autonomy (spontaneous, event-dependent)
Coverage marks (assignment to rows lost in extraction): + (+) ++ + + / (+) + / +

