Fault Tolerance Mechanisms ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg.

Fault Tolerance Mechanisms ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg University August 2011

Fault Tolerance
Means to isolate component faults and mask them, so that they do not lead to system failures. Fault tolerance may increase system dependability.

Fault Tolerance

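FT levels (BW p. 107)
- Full fault tolerance: the system continues to operate without significant loss of functionality or performance.
- Graceful degradation (fail soft): the system continues to operate with a partial loss of functionality or performance.
- Fail safe: the system maintains its integrity while accepting a temporary halt in its operation.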

FT basis: redundancy
- Time: try, retry, ... (repeat the computation)
- Space: try ... in several replicated components
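Time redundancy can be sketched as a retry loop. This is a minimal sketch, not code from the slides. The class `TimeRedundancy`, the helper `retry` and its `maxAttempts` parameter are hypothetical. Retrying masks transient faults only. A permanent fault fails on every attempt and is passed on to the caller.

```java
import java.util.function.Supplier;

/** Time redundancy: repeat the same computation (try, retry, ...). Hypothetical helper. */
final class TimeRedundancy {
    private TimeRedundancy() {}

    /** Runs op up to maxAttempts times and returns the first result obtained
     *  without an exception. This masks transient faults only. */
    static <T> T retry(Supplier<T> op, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed: the fault is probably not transient
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated transient fault: the first two calls fail, the third succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient glitch");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

Space redundancy would instead run replicas side by side, as in the N-version scheme later in the slides.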

Fault Tolerance

Basic Strategies

Dynamic Redundancy (BW p. 114)
1. Error detection
2. Damage confinement and assessment
3. Error recovery
4. Fault treatment and continued service

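Error Detection
A component is viewed as a function $f : \mathit{State} \times \mathit{Input} \to \mathit{State} \times \mathit{Output}$. Errors can be detected by:
- the environment, which signals an exception;
- the application, through
  - assertions: precondition(input), postcondition(input, output), invariant(state, state');
  - timing checks: WCET(f, input), deadline(f, input).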
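The application checks above might look like this for an integer square root. This is a minimal sketch under stated assumptions. `ErrorDetection`, `checkedCall`, `DetectedError` and the nanosecond budget are hypothetical, and the budget stands in for a WCET bound. Measuring elapsed time after the call only detects an overrun; it cannot interrupt one.

```java
import java.util.function.IntUnaryOperator;

final class ErrorDetection {
    /** Error detected by a failed assertion or a timing overrun. */
    static final class DetectedError extends RuntimeException {
        DetectedError(String msg) { super(msg); }
    }

    /** Calls f(input), checking a precondition, a postcondition and a time budget. */
    static int checkedCall(IntUnaryOperator f, int input, long budgetNanos) {
        // Precondition on the input: the integer square root needs input >= 0.
        if (input < 0) throw new DetectedError("precondition violated: input < 0");
        long start = System.nanoTime();
        int output = f.applyAsInt(input);
        long elapsed = System.nanoTime() - start;
        // Postcondition relating input and output: output^2 <= input < (output+1)^2.
        long r = output;
        if (!(r >= 0 && r * r <= input && (r + 1) * (r + 1) > input))
            throw new DetectedError("postcondition violated for input " + input);
        // Timing check (hypothetical budget): detects an overrun after the fact.
        if (elapsed > budgetNanos)
            throw new DetectedError("time budget exceeded: " + elapsed + " ns");
        return output;
    }

    public static void main(String[] args) {
        IntUnaryOperator isqrt = x -> (int) Math.sqrt(x);
        IntUnaryOperator faulty = x -> x / 2; // a design fault
        System.out.println(checkedCall(isqrt, 17, 1_000_000_000L)); // 4
        try {
            checkedCall(faulty, 17, 1_000_000_000L);
        } catch (DetectedError e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```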

Damage Confinement
- Static structure (e.g. objects)
- Dynamic structure (transactions)
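A transaction confines damage dynamically. The update works on a private copy, and the shared state changes only if the whole update completes. This is a minimal single-threaded sketch: `Transactions` and `atomically` are hypothetical names, and a real transaction mechanism must also handle concurrent access.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Dynamic damage confinement with a transaction-like update (hypothetical, single-threaded). */
final class Transactions {
    private Map<String, Integer> committed = new HashMap<>();

    /** Runs update on a private working copy and publishes the copy only if update
     *  completes normally. Returns true on commit, false on abort. */
    boolean atomically(Consumer<Map<String, Integer>> update) {
        Map<String, Integer> working = new HashMap<>(committed); // errors stay in this copy
        try {
            update.accept(working);
        } catch (RuntimeException e) {
            return false; // abort: the working copy, with any damage, is discarded
        }
        committed = working; // commit: the new state replaces the old one in one assignment
        return true;
    }

    /** Read-only view of the committed state. */
    Map<String, Integer> state() {
        return Collections.unmodifiableMap(committed);
    }

    public static void main(String[] args) {
        Transactions accounts = new Transactions();
        accounts.atomically(s -> { s.put("a", 100); s.put("b", 0); });
        // A transfer that fails half-way: "a" has been debited when the fault occurs.
        boolean ok = accounts.atomically(s -> {
            s.put("a", s.get("a") - 30);
            throw new IllegalStateException("fault before crediting b");
        });
        System.out.println(ok + ", a = " + accounts.state().get("a")); // false, a = 100
    }
}
```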

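Error Recovery
- Forward: repair the state, if you can!
- Backward: define recovery points, checkpoint the state at each recovery point, and on an error roll back and retry.
- Domino effect: when communicating processes roll back independently, rolling back one process can force the others back to earlier recovery points, possibly all the way to the start.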
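The two recovery styles can be contrasted in a small sketch. All names (`RecoveryStyles`, `SensorFilter`, `Checkpointed`) and the plausibility range are hypothetical. `SensorFilter` repairs the state by substituting the last good value, which is forward recovery. `Checkpointed` keeps a recovery point and rolls back to it, which is backward recovery. The domino effect arises only between communicating processes and is not shown.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class RecoveryStyles {
    /** Forward recovery: an implausible reading is replaced by the last good value,
     *  and execution continues from this repaired state. */
    static final class SensorFilter {
        private final double min, max; // hypothetical plausibility range
        private double lastGood;

        SensorFilter(double min, double max, double initial) {
            this.min = min;
            this.max = max;
            this.lastGood = initial;
        }

        double accept(double reading) {
            if (Double.isNaN(reading) || reading < min || reading > max) {
                return lastGood; // repair: substitute the last plausible value
            }
            lastGood = reading;
            return reading;
        }
    }

    /** Backward recovery: checkpoint() saves a recovery point, rollBack() restores it. */
    static final class Checkpointed {
        private int[] state;
        private final Deque<int[]> recoveryPoints = new ArrayDeque<>();

        Checkpointed(int[] initial) { state = initial.clone(); }

        void checkpoint() { recoveryPoints.push(state.clone()); }

        /** Restores the latest recovery point (requires a prior checkpoint()).
         *  The point is kept, so a retry can roll back again. */
        void rollBack() { state = recoveryPoints.peek().clone(); }

        /** Discards the latest recovery point once the work since it is accepted. */
        void commit() { recoveryPoints.pop(); }

        int[] state() { return state; }
    }

    public static void main(String[] args) {
        SensorFilter temperature = new SensorFilter(-40.0, 125.0, 20.0);
        System.out.println(temperature.accept(21.5) + " " + temperature.accept(999.0)); // 21.5 21.5

        Checkpointed c = new Checkpointed(new int[] {1, 2, 3});
        c.checkpoint();
        c.state()[0] = -1; // an erroneous update is detected ...
        c.rollBack();      // ... so the state is rolled back before a retry
        System.out.println(Arrays.toString(c.state())); // [1, 2, 3]
    }
}
```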

Recovery blocks (BW p. 120)

    ENSURE acceptance_test
    BY
        { module_1 }
    ELSE BY
        { module_2 }
    ...
    ELSE BY
        { module_m }
    ELSE ERROR
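The recovery block structure can also be sketched with Java lambdas. This is a hypothetical variant, not the slides' design. `RecoveryBlocks.ensure` takes the acceptance test as a `Predicate` and the modules as `UnaryOperator`s. Instead of calling an application-supplied `restore`, it gives every module a fresh copy of the recovery point.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical lambda-based recovery block:
 *  ENSURE accept BY modules[0] ELSE BY modules[1] ... ELSE ERROR. */
final class RecoveryBlocks {
    /** Signals the ELSE ERROR branch: no module passed the acceptance test. */
    static final class NoAccept extends Exception {
        NoAccept() { super("no module passed the acceptance test"); }
    }

    static int[] ensure(Predicate<int[]> accept, int[] input, List<UnaryOperator<int[]>> modules)
            throws NoAccept {
        final int[] recoveryPoint = input.clone(); // state saved on entry to the block
        for (UnaryOperator<int[]> module : modules) {
            try {
                // Each module works on a fresh copy: this is the roll back to the recovery point.
                int[] result = module.apply(recoveryPoint.clone());
                if (accept.test(result)) return result;
            } catch (RuntimeException e) {
                // An exception in a module (or in the test) counts as a failed acceptance test.
            }
        }
        throw new NoAccept();
    }

    public static void main(String[] args) throws NoAccept {
        int[] data = {5, 3, 9, 1};
        long expectedSum = Arrays.stream(data).asLongStream().sum();
        // Acceptance test as on the slides: ordered, plus a checksum as a partial permutation check.
        Predicate<int[]> acceptable = a -> {
            for (int i = 1; i < a.length; i++) if (a[i - 1] > a[i]) return false;
            return Arrays.stream(a).asLongStream().sum() == expectedSum;
        };
        UnaryOperator<int[]> losesData = a -> { a[0] = 0; Arrays.sort(a); return a; };
        UnaryOperator<int[]> crashes = a -> { throw new IllegalStateException("simulated crash"); };
        UnaryOperator<int[]> correct = a -> { Arrays.sort(a); return a; };
        int[] sorted = ensure(acceptable, data, List.of(losesData, crashes, correct));
        System.out.println(Arrays.toString(sorted)); // [1, 3, 5, 9]
    }
}
```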

Implementation of Recovery Blocks

Abstract class RecoveryBlock

    public abstract class RecoveryBlock implements Cloneable {
        /** Acceptance test applied to the result of each module. */
        abstract boolean acceptanceTest();
        /** Method to produce the result; it must be implemented by the application.
            Modules are numbered 0, ..., MaxBlocks-1. */
        abstract void block(int module);
        /** MaxBlocks must be set by the application to the number of modules. */
        protected int MaxBlocks;

RecoveryBlock execution

    /** Executes recovery modules 0, 1, ..., MaxBlocks-1 until one succeeds.
        Throws NoAccept if no module passes acceptanceTest. */
    public final void do_it() throws NoAccept, CloneNotSupportedException {
        save();                  // establish the recovery point
        int i = 0;
        do {
            try {
                block(i++);
                if (acceptanceTest()) return;
            } catch (Exception e) {
                /* if the module fails, we continue: this counts as not accepted */
            }
            restore(copy);       // roll back before trying the next module
        } while (i < MaxBlocks);
        throw new NoAccept();
    }

RecoveryBlock cache

    public abstract class RecoveryBlock implements Cloneable {
        /** The recovery cache is implemented by a clone of the original object. */
        RecoveryBlock copy;
        /** Saves the object to the recovery cache. Uses Java clone(), which must be
            overridden to give a deep clone: Object.clone() only copies references. */
        private final void save() throws CloneNotSupportedException {
            copy = (RecoveryBlock) this.clone();
        }
        /** Restores data from the recovery cache; it must be implemented by the application.
            copy: the saved value of the object to be restored. */
        abstract void restore(RecoveryBlock copy);

Application

    /** Extends the basic abstract RecoveryBlock with faulty sorting
     *  algorithms and logs calls, returns etc. to a TextArea. */
    public class RecoveringSort extends RecoveryBlock {
        /** checksum for acceptance test */
        private int checksum;
        /** data to be saved in recovery cache */
        private int[] argument;
        /** log output (java.awt.TextArea) */
        private TextArea log;
        public RecoveringSort(TextArea t) {
            MaxBlocks = 3;
            log = t;
        }

Acceptance criteria

    /* Acceptance test for sorting; it shall verify:
     * 1) the return value is an ordered list,
     * 2) the return value is a permutation of the initial values. */
    boolean acceptanceTest() {
        boolean result = true;
        // check ordering: compare each element with its predecessor, from the end
        int i = argument.length - 1;
        while (i > 0) {
            if (argument[i] < argument[--i]) { result = false; break; }
        }
        // check permutation; this is only a partial check through a checksum.
        // A full check is as expensive computationally as sorting,
        // thus, we use a partial check.
        i = argument.length;
        int sum = 0;
        while (i > 0) sum += argument[--i];
        return result && (sum == checksum);
    }

Application - modules

    /** Starts sorting using the recovery block mechanism.
     *  data: integer array containing the elements to be sorted. */
    public int[] sort(int[] data) {
        argument = data.clone();  // copy needed for recovery to work
        checksum = 0;
        int i = argument.length;
        while (i > 0) checksum += argument[--i];
        try {
            do_it();
        } catch (NoAccept e) {
            log.append("All blocks failed\n");
        } catch (CloneNotSupportedException e) {
            log.append("Recovery cache could not be created\n");
        }
        return argument;
    }
    void block(int i) {
        switch (i) {
            case 0: BucketSort(argument); break;
            case 1: BadSort(argument); break;
            case 2: AlmostGoodSort(argument); break;
            default: // no further modules
        }
    }
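The recovery cache works only if `clone()` is deep. `Object.clone()` copies field values, so a shallow clone of `RecoveringSort` would share the `argument` array with the original. `restore(copy)` would then bring back data that is already damaged. The slides do not show `clone()` or `restore()` for `RecoveringSort`. The class `SortState` below is a hypothetical, cut-down stand-in that shows one way to write them.

```java
import java.util.Arrays;

/** Hypothetical cut-down stand-in for RecoveringSort's recovery cache. */
class SortState implements Cloneable {
    int[] argument;

    SortState(int[] data) { argument = data.clone(); }

    /** Deep clone: super.clone() copies only the reference to argument, so copy the array too. */
    @Override
    protected SortState clone() throws CloneNotSupportedException {
        SortState copy = (SortState) super.clone();
        copy.argument = argument.clone();
        return copy;
    }

    /** Restores from the cache, copying again so the cache survives further retries. */
    void restore(SortState copy) {
        argument = copy.argument.clone();
    }
}

final class DeepCloneDemo {
    public static void main(String[] args) throws CloneNotSupportedException {
        SortState s = new SortState(new int[] {3, 1, 2});
        SortState cache = s.clone(); // save(): establish the recovery point
        s.argument[0] = 99;          // a failing module corrupts the data in place
        s.restore(cache);            // restore(copy) before the next module runs
        System.out.println(Arrays.toString(s.argument)); // [3, 1, 2]
    }
}
```

With the default shallow clone, the same run would print `[99, 1, 2]`.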

Fault classes (scope of R-B, recovery blocks)
- Origin: physical (internal/external); logical (design/interaction)
- Kind: omission; value; timing; byzantine
- Property: duration (permanent, transient); consistency (determinate, nondeterminate); autonomy (spontaneous, event-dependent)
- Coverage marks as extracted from the slide (their alignment with the rows above was lost): + (+) ++ (-) + / (+) + / +

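The ideal FT-component
[Figure: a component with a normal mode and an exception handler. It receives requests and returns responses. It signals interface exceptions when a request is invalid, and failure exceptions when it cannot provide the service despite its own fault tolerance.]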
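Here is a minimal sketch of an ideal fault-tolerant component, under assumptions. The component `AveragingSensor`, its primary and backup sensors, and the exception class names are hypothetical. An invalid request raises an interface exception. A fault in normal mode raises an internal exception, and the component's own handler tries to mask it with the backup. Only if that also fails does the component signal a failure exception.

```java
import java.util.function.DoubleSupplier;

/** Interface exception: the request itself is invalid (the client's fault). */
class InterfaceException extends Exception {
    InterfaceException(String msg) { super(msg); }
}

/** Failure exception: the component cannot provide the service despite its own fault tolerance. */
class FailureException extends Exception {
    FailureException(String msg) { super(msg); }
}

/** Hypothetical component: averages n samples from a primary sensor, with a backup sensor. */
final class AveragingSensor {
    private final DoubleSupplier primary;
    private final DoubleSupplier backup;

    AveragingSensor(DoubleSupplier primary, DoubleSupplier backup) {
        this.primary = primary;
        this.backup = backup;
    }

    /** Request/response interface of the component. */
    double average(int n) throws InterfaceException, FailureException {
        if (n <= 0) throw new InterfaceException("n must be positive, was " + n);
        try {
            return sample(primary, n);        // normal mode
        } catch (RuntimeException internal) { // internal exception: enter the handler
            try {
                return sample(backup, n);     // the handler masks the fault if it can
            } catch (RuntimeException e) {
                throw new FailureException("primary and backup sensors failed");
            }
        }
    }

    private static double sample(DoubleSupplier sensor, int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double v = sensor.getAsDouble();
            if (Double.isNaN(v)) throw new IllegalStateException("NaN reading"); // error detection
            sum += v;
        }
        return sum / n;
    }

    public static void main(String[] args) throws InterfaceException, FailureException {
        AveragingSensor s = new AveragingSensor(() -> Double.NaN, () -> 2.5);
        System.out.println(s.average(4)); // 2.5: the primary's fault is masked by the handler
    }
}
```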

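N-version programming
[Figure: versions V1, V2, V3 run under a driver (comparator).] At agreed comparison points, the versions deliver comparison vectors (votes) to the driver. The driver compares them and returns comparison status indicators that tell each version how to proceed.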
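The driver's voting step can be sketched for a single comparison point. All names are hypothetical (`NVersion`, `vote`, `Status`, `Outcome`). Votes are compared for exact equality, which suits integer results. Versions that produce floating-point results would need inexact voting with a tolerance.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical N-version driver for one comparison point with majority voting. */
final class NVersion {
    /** Comparison status, as the driver might report it back to the versions. */
    enum Status { UNANIMOUS, MAJORITY, NO_MAJORITY }

    static final class Outcome {
        final Status status;
        final Integer value; // null when there is no majority

        Outcome(Status status, Integer value) {
            this.status = status;
            this.value = value;
        }
    }

    /** Runs every version on the same input, collects the comparison vector (one vote per
     *  version) and returns the value agreed on by a strict majority, compared exactly. */
    static Outcome vote(List<IntUnaryOperator> versions, int input) {
        Map<Integer, Integer> tally = new HashMap<>();
        for (IntUnaryOperator version : versions) {
            try {
                tally.merge(version.applyAsInt(input), 1, Integer::sum);
            } catch (RuntimeException e) {
                // a crashed version casts no vote but still counts towards N
            }
        }
        int n = versions.size();
        for (Map.Entry<Integer, Integer> e : tally.entrySet()) {
            int count = e.getValue();
            if (count == n) return new Outcome(Status.UNANIMOUS, e.getKey());
            if (2 * count > n) return new Outcome(Status.MAJORITY, e.getKey());
        }
        return new Outcome(Status.NO_MAJORITY, null);
    }

    public static void main(String[] args) {
        IntUnaryOperator v1 = x -> x * x;
        IntUnaryOperator v2 = x -> (int) Math.pow(x, 2);
        IntUnaryOperator v3 = x -> x == 7 ? 50 : x * x; // design fault for input 7
        Outcome o = vote(List.of(v1, v2, v3), 7);
        System.out.println(o.status + " " + o.value); // MAJORITY 49
    }
}
```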

Fault classes (scope of N-VP, N-version programming)
- Origin: physical (internal/external); logical (design/interaction)
- Kind: omission; value; timing; byzantine
- Property: duration (permanent, transient); consistency (determinate, nondeterminate); autonomy (spontaneous, event-dependent)
- Coverage marks as extracted from the slide (their alignment with the rows above was lost): + (+) / (+) + / +