Dependability ITV Real-Time Systems Anders P. Ravn Aalborg University February 2006.

Presentation transcript:

Dependability ITV Real-Time Systems Anders P. Ravn Aalborg University February 2006

Characteristics of a RTS: timing constraints; dependability requirements; concurrent control of separate components; facilities to interact with special-purpose hardware.

Dependability - attributes: Availability; Reliability; Safety; Confidentiality; Integrity; Maintainability (BW p. 139)

Dependability - means: Fault prevention; Fault tolerance; Error removal; Failure forecasting (BW p. 106,...)

Dependability - impediments: Faults; Errors; Failures (BW p. 103,...)
Chain: Fault → Error → Failure → ... → Fault

System and Component

Fault classification
Origin: physical (internal/external); logical (design/interaction)
Kind: omission; value; timing; byzantine
Property: duration (permanent, transient); consistency (determinate, nondeterminate); autonomy (spontaneous, event-dependent)

Error Classification (Fault → Error)
Effect: latent; effective
Extent: local; distributed

Failure Classification (Fault → Failure)
Consequence: benign; malign (a mishap)
BW (Failure modes) p. 105

Fault Prevention
Careful design: process (procedures); notations; tools
Conservative design: robust functionality; testability; traceability

Error Removal: Verification (analysis of design); Test (analysis of implementation)

Failure Forecasting
Calculation: analysis of design
Simulation: measurement on design
Test: measurement on implementation

Fault Tolerance: means to isolate component faults and mask them; prevents system failures; may increase system dependability.

Fault Tolerance

FT - levels: Full tolerance; Graceful degradation; Fail safe (BW p. 107)

FT basis: Redundancy in Time or in Space (BW p. 109)
Diagram labels: Try, Retry, ...; Try, ...
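Redundancy in time can be illustrated with a simple retry loop. The following is a minimal sketch, not taken from the slides: the names `retry` and `FlakyRead` are hypothetical, and it assumes the fault is transient, so that repeating the same operation can succeed.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3) -> T:
    """Time redundancy: run the same operation again after a failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Remember the error and try again (the "Retry" step).
            last_error = exc
    assert last_error is not None
    raise last_error


class FlakyRead:
    """Hypothetical operation that fails on its first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise IOError("transient fault")
        return 42
```

Retrying only helps against transient faults; a permanent fault fails on every attempt, which is why the last error is re-raised once the attempts are used up.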

N-version programming (BW p. 109)
Versions: V1, V2, V3
Driver (comparator)
Exchanged at comparison points: comparison vectors (votes); comparison status indicators
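The driver's role can be sketched as a majority vote over the results of independently written versions. This is an illustrative sketch under simplifying assumptions: results are compared for exact equality, a version that raises an exception casts no vote, and the names `n_version_vote` and `NoMajorityError` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


class NoMajorityError(Exception):
    """Raised when no result is returned by a majority of the versions."""


def n_version_vote(
    versions: Sequence[Callable[[int], Hashable]], x: int
) -> Hashable:
    """Run every version on the same input and return the majority result."""
    votes = []
    for version in versions:
        try:
            votes.append(version(x))
        except Exception:
            # A crashed version simply contributes no vote.
            continue
    if not votes:
        raise NoMajorityError("all versions failed")
    value, count = Counter(votes).most_common(1)[0]
    # Require a strict majority of all versions, not only of those that voted.
    if 2 * count <= len(versions):
        raise NoMajorityError("versions disagree")
    return value


def square_v1(x: int) -> int:
    return x * x


def square_v2(x: int) -> int:
    return x ** 2


def square_v3(x: int) -> int:
    return x + x  # deliberately faulty version for the demonstration
```

With three versions, one faulty result is outvoted: `n_version_vote([square_v1, square_v2, square_v3], 3)` returns 9. Exact comparison is a simplification; numerical results usually need an inexact comparison.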

Fault classification (scope of N-VP)
Origin: physical (internal/external); logical (design/interaction)
Kind: omission; value; timing; byzantine
Property: duration (permanent, transient); consistency (determinate, nondeterminate); autonomy (spontaneous, event-dependent)
Coverage marks (positions lost in extraction): + (+) ++ (+) + / (+) + / +

Dynamic Redundancy (BW p. 114)
1. Error detection
2. Damage confinement and assessment
3. Error recovery
4. Fault treatment and continued service

Error Detection (BW p. 115)
f: State × Input → State × Output
Detected by: Environment (exception); Application
Assertion: precondition(input); postcondition(input, output); invariant(state, state’)
Timing: WCET(f, input); Deadline(f, input) D
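Application-level detection can be sketched as a wrapper around f that checks the assertions and the timing. This is a minimal illustration, not the slides' own code: `checked_call`, `DetectedError`, and `accumulate` are hypothetical names, and the deadline is checked only after f returns, so a miss is detected but not prevented.

```python
import time
from typing import Callable, Tuple


class DetectedError(Exception):
    """Raised when an assertion or the timing check fails."""


def checked_call(
    f: Callable[[int, int], Tuple[int, int]],
    state: int,
    inp: int,
    *,
    pre: Callable[[int], bool],
    post: Callable[[int, int], bool],
    invariant: Callable[[int, int], bool],
    deadline_s: float,
) -> Tuple[int, int]:
    """Apply f: State x Input -> State x Output with run-time checks."""
    if not pre(inp):
        raise DetectedError("precondition violated")
    start = time.monotonic()
    new_state, out = f(state, inp)
    elapsed = time.monotonic() - start
    if elapsed > deadline_s:
        raise DetectedError("deadline missed")
    if not post(inp, out):
        raise DetectedError("postcondition violated")
    if not invariant(state, new_state):
        raise DetectedError("invariant violated")
    return new_state, out


def accumulate(state: int, inp: int) -> Tuple[int, int]:
    """Example f: add the input to a running total and output the total."""
    total = state + inp
    return total, total
```

A call such as `checked_call(accumulate, 10, 5, pre=lambda i: i >= 0, post=lambda i, o: o >= i, invariant=lambda s, s2: s2 >= s, deadline_s=1.0)` returns `(15, 15)`, while a negative input is rejected by the precondition.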

Damage Confinement: static structure; dynamic structure (BW p. 117)
Figure labels: object, I, I

Error Recovery (BW p. 118)
Forward: repair the state, if you can!
Backward: define recovery points; checkpoint state at the recovery point; roll back; retry
Risk: domino effect
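Backward recovery for a single process can be sketched as checkpoint, roll back, retry. The sketch below makes simplifying assumptions: the state is a plain dictionary that `copy.deepcopy` can copy, errors show up as exceptions, and the names `run_with_rollback` and `failing_update` are hypothetical. It does not model the domino effect, which arises only when several interacting processes roll back.

```python
import copy
from typing import Callable, Dict


def run_with_rollback(
    state: Dict[str, int],
    action: Callable[[Dict[str, int]], None],
    retries: int = 1,
) -> bool:
    """Checkpoint the state, run the action, roll back and retry on error."""
    checkpoint = copy.deepcopy(state)  # state saved at the recovery point
    for _ in range(retries + 1):
        try:
            action(state)
            return True
        except Exception:
            # Roll back: discard the damaged state, restore the checkpoint.
            state.clear()
            state.update(copy.deepcopy(checkpoint))
    return False


def failing_update(state: Dict[str, int]) -> None:
    """Example action that corrupts the state and then fails."""
    state["balance"] = -999
    raise RuntimeError("fault during update")
```

After `run_with_rollback(account, failing_update)` returns `False`, the dictionary `account` holds the values it had at the recovery point.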

Recovery blocks (BW p. 120)
ENSURE acceptance_test
BY { module_1 }
ELSE BY { module_2 }
...
ELSE BY { module_m }
ELSE ERROR
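The recovery block scheme can be approximated in an ordinary language as a loop over alternative modules. This is a sketch under stated assumptions: each module receives a fresh copy of the state taken at the recovery point, an exception counts as a failed alternative, and `recovery_block`, `RecoveryBlockError`, and the sorting modules are hypothetical names chosen for the example.

```python
import copy
from typing import Callable, List, Sequence


class RecoveryBlockError(Exception):
    """Corresponds to ELSE ERROR: every alternative failed."""


def recovery_block(
    acceptance_test: Callable[[List[int]], bool],
    modules: Sequence[Callable[[List[int]], List[int]]],
    state: List[int],
) -> List[int]:
    """ENSURE acceptance_test BY modules[0] ELSE BY modules[1] ... ELSE ERROR."""
    for module in modules:
        candidate = copy.deepcopy(state)  # restart from the recovery point
        try:
            result = module(candidate)
        except Exception:
            continue  # treat an exception as a failed alternative
        if acceptance_test(result):
            return result
    raise RecoveryBlockError("no module passed the acceptance test")


def is_sorted(values: List[int]) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


def buggy_sort(values: List[int]) -> List[int]:
    values.reverse()  # deliberately wrong primary module
    return values


def fallback_sort(values: List[int]) -> List[int]:
    return sorted(values)
```

For the input `[3, 1, 2]`, the primary module returns `[2, 1, 3]`, which fails the acceptance test, so the alternative runs and the block returns `[1, 2, 3]`. The test used here checks only ordering, which keeps it cheap but weaker than a full specification check.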

The ideal FT-component (BW p. 126)
Figure labels: Normal mode; Exception handler; Request/response; Interface exception (twice); Failure exception (twice)
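The distinction between interface exceptions and failure exceptions can be sketched with a small component that uses a lower-level server. Everything here is illustrative: `CachedReader`, the server callable, and the fallback to the last good value are assumptions made for the example, not part of the slides.

```python
from typing import Callable, Optional


class InterfaceException(Exception):
    """The request itself was illegal; the caller is at fault."""


class FailureException(Exception):
    """The component could not deliver its service despite recovery."""


class CachedReader:
    """Hypothetical component: reads a channel through a lower-level server."""

    def __init__(self, server: Callable[[int], float]) -> None:
        self._server = server
        self._last_good: Optional[float] = None

    def request(self, channel: int) -> float:
        # Reject illegal requests before doing any work.
        if not isinstance(channel, int) or channel < 0:
            raise InterfaceException("channel must be a non-negative integer")
        try:
            value = self._server(channel)  # normal mode
        except FailureException:
            return self._handle_failure()  # exception handler
        self._last_good = value
        return value

    def _handle_failure(self) -> float:
        # Tolerate the server failure if a previous good value exists;
        # otherwise signal a failure exception to our own caller.
        if self._last_good is None:
            raise FailureException("server failed and no fallback value exists")
        return self._last_good
```

The component masks a server failure when it can, and otherwise propagates a failure exception upward, so each layer sees only the two exception kinds at its interface.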