Introduction to Dependability. Overview Dependability: "the trustworthiness of a computing system which allows reliance to be justifiably placed on the.

Slides:



Advertisements
Similar presentations
Chapter 8 Fault Tolerance
Advertisements

Copyright 2007 Koren & Krishna, Morgan-Kaufman Part.1.1 FAULT TOLERANT SYSTEMS Part 1 - Introduction.
EEC 688/788 Secure and Dependable Computing Lecture 2 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Dependability ITV Model-based Analysis and Design of Embedded Software Techniques and methods for Critical Software Anders P. Ravn Aalborg University August.
Term Paper OLOMOLA,Afolabi( ). Dependability Modellling.
Dependability Evaluation through Markovian model.
EEC 688/788 Secure and Dependable Computing Lecture 2 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
SWE Introduction to Software Engineering
1 Software Testing and Quality Assurance Lecture 37 – Software Quality Assurance.
EEC 688/788 Secure and Dependable Computing Lecture 11 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
CS 603 Failure Models April 12, Fault Tolerance in Distributed Systems Perfect world: No Failures –W–We don’t live in a perfect world Non-distributed.
1 Software Testing and Quality Assurance Lecture 34 – Software Quality Assurance.
Last Class: Weak Consistency
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 2 Wenbing Zhao Department of Electrical and Computer Engineering.
Page 1 Copyright © Alexander Allister Shvartsman CSE 6510 (461) Fall 2010 Selected Notes on Fault-Tolerance (12) Alexander A. Shvartsman Computer.
Introduction to Dependability slides made with the collaboration of: Laprie, Kanoon, Romano.
EEC 688/788 Secure and Dependable Computing Lecture 11 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Lucas Phillips Anurag Nanajipuram FAILURE MODE AND EFFECT ANALYSIS.
©Ian Sommerville 2006Critical Systems Slide 1 Critical Systems Engineering l Processes and techniques for developing critical systems.
CIS 376 Bruce R. Maxim UM-Dearborn
Maintenance Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill.
Software Dependability CIS 376 Bruce R. Maxim UM-Dearborn.
System Testing There are several steps in testing the system: –Function testing –Performance testing –Acceptance testing –Installation testing.
Reliability and Fault Tolerance Setha Pan-ngum. Introduction From the survey by American Society for Quality Control [1]. Ten most important product attributes.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 9 Slide 1 Critical Systems Specification 2.
SOFTWARE ENGINEERING1 Introduction. Software Software (IEEE): collection of programs, procedures, rules, and associated documentation and data SOFTWARE.
1 Software Testing and Quality Assurance Lecture 33 – Software Quality Assurance.
1 Chapter 3 Critical Systems. 2 Objectives To explain what is meant by a critical system where system failure can have severe human or economic consequence.
Secure Systems Research Group - FAU 1 A survey of dependability patterns Ingrid Buckley and Eduardo B. Fernandez Dept. of Computer Science and Engineering.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 3 Slide 1 Critical Systems 1.
Part.1.1 In The Name of GOD Welcome to Babol (Nooshirvani) University of Technology Electrical & Computer Engineering Department.
Fault-Tolerant Systems Design Part 1.
Software Reliability in Nuclear Systems Arsen Papisyan Anthony Gwyn.
Building Dependable Distributed Systems Chapter 1 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Software Testing and Quality Assurance Software Quality Assurance 1.
CprE 545Iowa State University CprE 558: Real-Time Systems Lectures 15-16: Dependability Concepts & Faul-Tolerance.
On the Definition of Survivability J. C. Knight and K. J. Sullivan, Department of Computer Science, University of Virginia, December 2000.
CS551 - Lecture 5 1 CS551 Lecture 5: Quality Attributes Yugi Lee FH #555 (816)
CS 505: Thu D. Nguyen Rutgers University, Spring CS 505: Computer Structures Fault Tolerance Thu D. Nguyen Spring 2005 Computer Science Rutgers.
Fault-Tolerant Systems Design Part 1.
Fault Tolerance Benchmarking. 2 Owerview What is Benchmarking? What is Dependability? What is Dependability Benchmarking? What is the relation between.
Software Testing and Quality Assurance Software Quality Assurance 1.
Basic Concepts of Dependability Jean-Claude Laprie DeSIRE and DeFINE Workshop — Pisa, November 2002.
Mixed Criticality Systems: Beyond Transient Faults Abhilash Thekkilakattil, Alan Burns, Radu Dobrin and Sasikumar Punnekkat.
1 Fault-Tolerant Computing Systems #1 Introduction Pattara Leelaprute Computer Engineering Department Kasetsart University
1 INTRUSION TOLERANT SYSTEMS WORKSHOP Phoenix, AZ 4 August 1999 Jaynarayan H. Lala ITS Program Manager.
Maintainance and Reliability Pertemuan 26 Mata kuliah: J Manajemen Operasional Tahun: 2010.
Introduction to Fault Tolerance By Sahithi Podila.
For a good summary, visit:
©Ian Sommerville 2000Dependability Slide 1 Chapter 16 Dependability.
Faults and fault-tolerance One of the selling points of a distributed system is that the system will continue to perform even if some components / processes.
Stracener_EMIS 7305/5305_Spr08_ Systems Availability Modeling & Analysis Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7305/5305.
1 Software Testing and Quality Assurance Lecture 38 – Software Quality Assurance.
Chapter 8 Fault Tolerance. Outline Introductions –Concepts –Failure models –Redundancy Process resilience –Groups and failure masking –Distributed agreement.
1 Introduction to Engineering Spring 2007 Lecture 16: Reliability & Probability.
Faults and fault-tolerance
Software Testing An Introduction.
Fault Tolerance & Reliability CDA 5140 Spring 2006
Fault Tolerance In Operating System
Dependability Dependability is the ability to avoid service failures that are more frequent or severe than desired. It is an important goal of distributed.
Reliability and Fault Tolerance
Fault Tolerance Distributed Web-based Systems
Faults and fault-tolerance
Introduction to Fault Tolerance
Overview Dependability: "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]"
Presentation transcript:

Introduction to Dependability

Overview Dependability: "the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers " IFIP 10.4 Working Group on Dependable Computing and Fault Tolerance Introduction Dependability attributes Application with dependability requirements Impairments Techniques to improve dependability Fault tolerant techniques

Introduction

Dependability attributes Reliability R(t): continuity of correct service Availability: readiness for correct service A(t) (transient value), A /steady state value) Safety S(t): absence of catastrophic consequences on the user(s) and the environment Performability P(L,t): ability to perform a given performance level Maintainability: ability for a system to undergo modifications and repairs Testability: attitude of a given system to be tested Security: degree of protection against danger, damage, loss, and criminal activity.

Reliability R(t), Availability A(t) & A Reliability, R(t): the conditional probability that a system performs correctly throughtou the interval (t 0,t), given that the system was performing correctly at time t 0. Istantaneous Availability, A(t): the probability that a system is operating corretly and is available to perform its functions at the instant of time t Limiting or steady state Availability, A: the probability that a system is operating correctly and is available to perform its functions.

Reliability versus Availability Availability differ from reliability in that reliability involves an interval of time, while availability at an istant of time. A system can be highly available yet experience frequent periods of inoperability. The availability of a system depends not only on how frequently it becomes inoperable but also how quickly it can be repaired.

Safety S(t) Safety, S(t): the probability that a system will either perform its functions correctly or will discontinue its fucntions in a manner that does not disrupt the operation of other systems or compromise the safety of any people associated directly or inderectly with the syste. The Safety is a measure of the fail-safe capability of a system, i.e, if the system does not operate correctly, it fails in a safe manner. Safety and availability differ because availability is the probability that a system will perform its function corretly, while Safety is the probability that a system will either perform its functions corrctly or will discontinue the functions in a manner that causes no harm.

Performability P(L,t) Performability, P(L.t): the probability that a system performance will be at, or above, some level L, at instant t (Fortes 1984). It is a measure of the system ability to achieve a given performance level, despite the occurrence of failures. Performability differs from reliability in that reliability is a measure of the likehooh that all of the functions are performed correctly, while performability is a measure of likehood that some subset of the functins is performed correctly.

Security Security is the degree of protection against danger, damage, loss, and criminal activity. Security as a form of protection are structures and processes that provide or improve security as a condition. The key difference between security and reliability-availability- safety is that security must take into account the actions of people attempting to cause destruction.

Maintainability Maintainability is the probability M(t) that a malfunctioning system can be restored to a correct state within time t. It is a measure of the speed of repairing a system after the occurrence of a failure. It is closely correlated with availability: - The shortest the interval to restore a correct behavior, the highest the likelihood that the system is correct at any time t. - As an extreme, if M(0) = 1.0, the system will always be available.

Testability Testability is simply a measure of how easy it is for an operator to verify the attributes of a system. It is clearly related to maintainability: the easiest it is to test a malfunctioning system, the fastest will be to identify a faulty component, the shortest will be the time to repair the system.

Applications with dependability requirements from Pradhan’s book Long life applications Critical-computation applications Hardly maintainable applications (Maintenance postponement applications) High availability applications Long life applications: applications whose operational life is of the order of some year. The most common examples are the unmanned space flights and satellites. Typical requirements are to have a 0.95 or greater probability of being operational at the end of ten year period. This kind of system should or not have maintenance capability

Applications with dependability requirements (2/3) Critical-computation applications: applications that should cause safety problem to the people and to the business. Examples: aircraft, airtrafic flight control system, military systems, infrastructures for the control of industrial plants like nuclear or chemical plants. Typical requirements are to have a or greater probability of being operational at the end of three hour period. In this period normally it is not possible a human maintenance. Hardly Maintainable Applications : applications in which the maintenance is costly or difficult to perform. Examples: remote processing systems in not human region (like antartide continent). The maintenance can be scheduled independently by the presence of failure

Applications with dependability requirements (3/3) High availability applications: applications in which the availability is the key parameter. Users expect that the service is operational with high probability whenever it is requested. Examples: banking computing infrastructures. The maintenance can be done immediately and “ easily ”.

Impairments to dependability 15

Causes and effects 16

Example of human causes at design phase 17

Example of physical cause (permanent) 18

Example of human cause at operational phase 19

Example of physical cause (transient) 20 FAULT IMPAIRED MEMORY DATA ELECTROMAGNETIC PERTURBATION FAULT ACTIVATION FAULTY COMPONENT AND INPUTS ERROR FAILURE PROPAGATION WHEN DELIVERED SERVICE DEVIATES (VALUE, DELIVERY INSTANT) FROM FUNCTION FULFILLING

Failure modes: taxonomy 21 FAILURE MODES FAILURES DOMAIN PERCEPTION BY SEVERAL USERS CONSISTENT FAILURES INCONSISTENT (BYZANTINE) FAILURES CONSEQUENCES ON ENVIRONMENT … BENIGN FAILURES CATASTROPHIC FAILURES FAIL-SAFE SYSTEM VALUE FAILURES TIMING FAILURES STOPPING (HALTING) FAILURES FAIL- SILENT SYSTEM OUTPUT VALUE FROZEN SILENCE (ABSENCE OF EVENT) FAIL-HALT ("FAIL-STOP") SYSTEM FAIL- PASSIVE SYSTEM

Fault classification 22

Fault classification (1/2)

Fault classification (2/2) 24

Human-made faults 25