Software Fault Tolerance – The big Picture RTS April 2008 Anders P. Ravn Aalborg University.

1 Software Fault Tolerance – The big Picture RTS April 2008 Anders P. Ravn Aalborg University

2 Fault Tolerance Means to isolate component faults Prevents system failures May increase system dependability

3 Dependability - attributes Availability Reliability Safety Confidentiality Integrity Maintainability BW p. 129

4 Dependability - impairments Faults Errors Failures BW p. 103,...,130 Fault → Error → Failure ... Fault

5 System and Component

6 Dependability - means Fault prevention Fault tolerance Error Removal Failure Forecasting BW p. 106,..., 130

7 Fault classification Origin Kind Property physical (internal/external) logical (design/interaction) omission value timing byzantine duration (permanent, transient) consistency (determinate, nondeterminate) autonomy (spontaneous, event-dependent)

8 Error Classification (Fault → Error) Effect Extent latent effective local distributed

9 Failure Classification (Fault → Error → Failure) Consequence benign malign (a mishap) BW (Failure modes) p. 105

10 Fault Avoidance Careful Design Conservative Design process (activities) notations tools robust functionality testability traceability

11 Error Removal Verification (analysis of design) Test (analysis of implementation)

12 Failure Forecasting Calculation – analysis of design Simulation – measurement on design Test -- measurement on implementation

13 Fault Tolerance Means to isolate component faults Prevents system failures May increase system dependability... And mask them

14 Dependability - means Fault prevention Fault tolerance Error Removal Failure Forecasting BW p. 106,...

15 Fault Tolerance

16 FT - levels Full tolerance Graceful Degradation Fail safe BW p. 107

17 FT basis: Redundancy Time Space Try Retry ... Try ... BW p. 109

18 N-version programming V1 V2 V3 Driver (comparator) Comparison vectors (votes) Comparison status indicators BW p. 109 Comparison points
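
The driver (comparator) on this slide can be sketched as a majority voter. This is a minimal illustration, not the scheme from BW: the function `vote` and the three versions `v1`, `v2`, `v3` are hypothetical names, and a simple majority is assumed as the decision rule.

```python
from collections import Counter
from typing import Callable, Sequence


def vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A version that crashes simply casts no vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just the survivors.
    if count <= len(versions) // 2:
        raise RuntimeError("no majority among versions")
    return value


# Three hypothetical, independently written versions of "square a number".
def v1(x: int) -> int:
    return x * x


def v2(x: int) -> int:
    return x ** 2


def v3(x: int) -> int:
    return x * x + 1  # deliberately faulty version


result = vote([v1, v2, v3], 4)  # v1 and v2 outvote the faulty v3
```

The voter masks the value fault in `v3` only because the other two versions agree; if all three disagree, it reports an error instead of guessing.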

19 Fault classification (scope of N-VP) Origin Kind Property physical (internal/external) logical (design/interaction) omission value timing byzantine duration (permanent, transient) consistency (determinate, nondeterminate) autonomy (spontaneous, event-dependent) + (+) ++ (+) + / (+) + / +

20 Dynamic Redundancy 1. Error detection 2. Damage confinement and assessment 3. Error recovery 4. Fault treatment and continued service BW p. 114

21 Error Detection f: State × Input → State × Output Environment (exception) Application BW p. 115 Assertion: precondition (input) postcondition (input, output) invariant (state, state’) Timing: WCET(f, input) Deadline (f, input) D
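
Application-level error detection by assertions and a timing check might look like the following sketch. The names `DetectedError` and `checked_sqrt` and the default deadline of 0.1 s are assumptions made for illustration. The timing check here only detects a missed deadline after the fact; it does not enforce one.

```python
import math
import time


class DetectedError(Exception):
    """Raised when an assertion or timing check detects an error."""


def checked_sqrt(x: float, deadline_s: float = 0.1) -> float:
    # Precondition on the input.
    if x < 0:
        raise DetectedError("precondition violated: x must be non-negative")

    start = time.monotonic()
    y = math.sqrt(x)
    elapsed = time.monotonic() - start

    # Postcondition relating input and output.
    if not math.isclose(y * y, x, rel_tol=1e-9, abs_tol=1e-12):
        raise DetectedError("postcondition violated: y*y differs from x")

    # Timing check: the computation should have finished within deadline D.
    if elapsed > deadline_s:
        raise DetectedError("deadline missed")
    return y
```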

22 Damage Confinement Static structure Dynamic structure BW p. 117 object I I

23 Error Recovery Forward Backward BW p. 118 Repair the state – if you can ! define recovery points checkpoint state at r. p. roll back retry Domino effect
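
Backward recovery (checkpoint the state at a recovery point, roll back, retry) can be sketched as below. The `Account` class, the `withdraw` operation and the list of retry arguments are hypothetical. The sketch covers a single process only, so the domino effect between communicating processes does not arise.

```python
import copy


class Account:
    def __init__(self, balance):
        self.state = {"balance": balance, "log": []}

    def checkpoint(self):
        # Save a deep copy so later mutations cannot corrupt the recovery point.
        return copy.deepcopy(self.state)

    def rollback(self, saved):
        self.state = copy.deepcopy(saved)


def withdraw(account, amount):
    account.state["log"].append(("withdraw", amount))
    account.state["balance"] -= amount
    if account.state["balance"] < 0:
        raise ValueError("invariant violated: negative balance")


def with_backward_recovery(account, operation, arguments):
    """Try the operation with each argument in turn, rolling back after a failure."""
    saved = account.checkpoint()  # recovery point
    for arg in arguments:
        try:
            operation(account, arg)
            return arg
        except ValueError:
            account.rollback(saved)  # restore the checkpointed state, then retry
    raise RuntimeError("recovery failed: no attempt succeeded")


acct = Account(100)
used = with_backward_recovery(acct, withdraw, [150, 80])
```

After the failed attempt with 150, both the balance and the log are restored, so the state holds no trace of the erroneous attempt.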

24 Recovery blocks ENSURE acceptance_test BY { module_1 } ELSE BY { module_2 }... ELSE BY { module_m } ELSE ERROR BW p. 120
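
The ENSURE ... BY ... ELSE BY ... ELSE ERROR construct can be approximated in Python as follows. This is a sketch under assumptions: `recovery_block`, `fast_sort`, `safe_sort` and `is_sorted` are hypothetical names, and the acceptance test is deliberately partial (it checks ordering only, not that the output is a permutation of the input).

```python
import copy


def recovery_block(state, acceptance_test, modules):
    """ENSURE acceptance_test BY modules[0] ELSE BY modules[1] ... ELSE ERROR."""
    for module in modules:
        # Each alternative starts from a fresh copy of the recovery point.
        candidate = copy.deepcopy(state)
        try:
            result = module(candidate)
        except Exception:
            continue  # a crashing module counts as a failed alternative
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternatives failed the acceptance test")


def fast_sort(xs):
    xs.reverse()  # hypothetical faulty primary module
    return xs


def safe_sort(xs):
    return sorted(xs)  # simpler secondary module


def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


result = recovery_block([3, 1, 2], is_sorted, [fast_sort, safe_sort])
```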

25 The ideal FT-component Exception Handler Normal mode Request/response Interface exception Interface exception Failure exception Failure exception BW p. 126
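
One way to read the diagram: a component rejects bad requests with an interface exception, handles failures of the components it uses in its own exception handler, and signals a failure exception only when that handling does not succeed. The sketch below assumes this reading; `TemperatureService` and its `primary` and `backup` callables are hypothetical.

```python
class InterfaceException(Exception):
    """The request itself was invalid; the caller is at fault."""


class FailureException(Exception):
    """The component could not deliver the service despite its own recovery."""


class TemperatureService:
    def __init__(self, primary, backup):
        self._primary = primary  # lower-level component used in normal mode
        self._backup = backup    # alternative used by the exception handler

    def read(self, sensor_id):
        # Validate the request before doing any work.
        if sensor_id < 0:
            raise InterfaceException("sensor_id must be non-negative")
        try:
            return self._primary(sensor_id)  # normal mode
        except Exception:
            pass  # enter the exception handler
        try:
            return self._backup(sensor_id)  # local recovery attempt
        except Exception as exc:
            raise FailureException("no sensor reading available") from exc


def working(sensor_id):
    return 21.5


def broken(sensor_id):
    raise IOError("sensor offline")
```

A caller then sees only the two exception types of the interface, regardless of what went wrong inside: `TemperatureService(broken, working).read(1)` returns a reading, while two broken sensors yield a `FailureException`.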

26 Safety Assessment Find faults that may lead to mishaps, analyze their relations, and estimate their consequences. May involve probabilistic reasoning (Reliability Engineering)

27 Fault Tree - Events Primary Events: Basic event – fault in atomic component Undeveloped Event – fault in composite component (may be analyzed later) External event – expected event from environment Intermediate event: Nodes inside a fault-tree

28 Fault Tree - Gates... condition Inhibit gate

29 Example – ”Wake too late” Wake too late Alarm clock fails Phone fails ”Inner clock” fails

30 Example ”Alarm clock fails” Beeper fails Button fails Alarm clock fails electronics fail SW fails Power fails Button read fails Beeper not set

31 Cut Set A cut set is a set of events that causes a top level event A singleton cut set is a single point of failure
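
Cut sets can be computed mechanically from a fault tree built from AND and OR gates. The sketch below is an illustration only: the transcript does not preserve the gate types of the "Wake too late" example, so the tree used here (an AND of three events, with "alarm clock fails" expanded as an OR of two basic events) is an assumption, as are the function names `cut_sets` and `minimal`.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a fault tree as a set of frozensets of basic events.

    A node is either a basic event (a string) or a pair (gate, children)
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child causes the event.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal(sets):
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(t < s for t in sets)}


# Assumed structure, for illustration only.
alarm_clock_fails = ("OR", ["beeper fails", "power fails"])
wake_too_late = ("AND", [alarm_clock_fails, "phone fails", "inner clock fails"])

mcs = minimal(cut_sets(wake_too_late))
single_points_of_failure = [s for s in mcs if len(s) == 1]
```

Under the assumed AND gate at the top, every minimal cut set has three events, so the list of single points of failure is empty. With an OR gate at the top, each basic event would be a singleton cut set.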

32 Example – ”Wake too late” Wake too late Alarm clock fails Phone fails ”Inner clock” fails

33 Example ”Alarm clock fails” Beeper fails Button fails Alarm clock fails electronics fail SW fails Power fails Button read fails Beeper not set

34 Extensions etc. Probabilities on edges Event tree (forward analysis from initiating event) Combinations (cause-consequence diagrams) Many tools Kirsten M. Hansen, Anders P. Ravn and Victoria Stavridou, From Safety Analysis to Formal Specification, IEEE Trans. Softw. Eng. 24, pp. 573-584, July 1998

35 Example

36 Fault Hypotheses

37 Fault-Tolerant System

38 Impulse Generator

39 CU

40 Voter and Arbiter

41 Parameters

42 Properties

43 Procedure 1. Model the correct component and check that it has the desired properties. 2. Model relevant faults and introduce them as internal transitions to error states. Check that this fault-affected. 3. Introduce into the model the mechanisms for fault detection, error recovery and masking and check that the desired properties are valid for this design.

