

Dependability Evaluation

Techniques for Dependability Evaluation
The dependability evaluation of a system can be carried out either:
– experimentally (heuristic): a system prototype is built and empirical statistical data are used to evaluate the system’s metrics:
  – far more expensive and complex than the analytical approach
  – building a system prototype may be impossible
  – experimental evaluation of dependability requires long observation periods
– analytically: dependability metrics are obtained from a mathematical model of the system:
  – mathematical models may not adequately represent the real system’s structure or the behavior of its components
  – simulation models can be a helpful complementary tool

Fundamental Definitions
Failure function $Q(t)$:
– probability that a component fails for the first time in the time interval $(0, t)$
– it is a cumulative distribution function:
  $Q(t) = 0$ for $t = 0$
  $0 \le Q(t) \le Q(t + \Delta t)$ for $\Delta t \ge 0$
  $Q(t) = 1$ for $t \to +\infty$

Fundamental Definitions (cont’d)
Reliability function $R(t)$:
– probability that a component functions correctly in the time interval $(0, t)$
  $R(t) = 1$ for $t = 0$
  $1 \ge R(t) \ge R(t + \Delta t)$ for $\Delta t \ge 0$
  $R(t) = 0$ for $t \to +\infty$
  $R(t) = 1 - Q(t)$

Fundamental Definitions (cont’d)
Failure probability density function $q(t)$: the derivative of $Q(t)$ when $Q(t)$ is a continuous function:
  $q(t) = \frac{dQ(t)}{dt}$
$R(t)$ is then continuous too, and its derivative over time $r(t)$ is equal to:
  $r(t) = \frac{dR(t)}{dt} = -q(t)$
$R(t)$ and $Q(t)$ are evaluated experimentally by analyzing the behavior of a sufficiently large population and determining the failure rate:
  $R(t) = \frac{n(t)}{N}$, $Q(t) = 1 - \frac{n(t)}{N}$
where $N$ is the population at time $t = 0$ and $n(t)$ is the number of correct components at time $t$.
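The experimental evaluation described on this slide can be sketched in a few lines. This is only an illustration: the function name `empirical_reliability` and the survivor counts are hypothetical, and the estimate assumes $R(t) \approx n(t)/N$ for a sufficiently large population.

```python
def empirical_reliability(survivors, population):
    """Estimate R(t) = n(t)/N and Q(t) = 1 - R(t) at each observation time."""
    if population <= 0:
        raise ValueError("population must be positive")
    reliability = [n / population for n in survivors]
    failure = [1.0 - r for r in reliability]
    return reliability, failure


# Hypothetical observation: N = 1000 components at t = 0,
# n(t) correct components counted at four successive instants.
N = 1000
n_t = [1000, 950, 900, 800]

R, Q = empirical_reliability(n_t, N)
print(R)  # estimates of R(t), non-increasing over time
print(Q)  # estimates of Q(t), non-decreasing over time
```

The two lists mirror the properties stated for $R(t)$ and $Q(t)$: the first starts at 1 and never increases, the second starts at 0 and never decreases.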

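Average Failure Frequency
Average failure frequency during the time interval $(t, t + \Delta t)$:
  $\frac{n(t) - n(t + \Delta t)}{\Delta t}$
Average failure frequency of a single unit in the time interval $(t, t + \Delta t)$:
  $\frac{n(t) - n(t + \Delta t)}{n(t)\,\Delta t}$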

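Instantaneous Failure Frequency
If $\Delta t$ tends to zero, each entity at time $t$ is characterized by an instantaneous failure frequency given by:
  $h(t) = \lim_{\Delta t \to 0} \frac{n(t) - n(t + \Delta t)}{n(t)\,\Delta t} = -\frac{1}{n(t)} \frac{dn(t)}{dt}$
Since $n(t) = N \cdot R(t)$:
  $h(t) = -\frac{1}{R(t)} \frac{dR(t)}{dt} = \frac{q(t)}{R(t)}$
After integration, we obtain the reliability function:
  $R(t) = \exp\left(-\int_0^t h(x)\,dx\right)$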
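The integration step can be spelled out as follows. This is a short derivation sketch that uses only the definitions of this deck and assumes $R(t)$ is differentiable with $R(0) = 1$; the integration variable $x$ is an arbitrary choice.

```latex
\begin{align*}
h(t) &= -\frac{1}{R(t)}\,\frac{dR(t)}{dt} = -\frac{d}{dt}\ln R(t) \\
\int_0^t h(x)\,dx &= -\ln R(t) + \ln R(0) = -\ln R(t) \qquad (R(0) = 1) \\
R(t) &= \exp\!\left(-\int_0^t h(x)\,dx\right)
\end{align*}
```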

MTTF (Mean Time To Failure)
Index used to evaluate reliability and other dependability metrics. The MTTF is the average time before a failure, i.e. the expected operational time of a system before the occurrence of the first failure.
Expected time for a failure occurrence:
  $\mathrm{MTTF} = \int_0^{\infty} t\,q(t)\,dt$
which can also be expressed (expanding $q(t)$) as:
  $\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt$
since
  $\lim_{t \to \infty} t\,R(t) = 0$
given that $h(t)$ is constant or increases over time.
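The passage from the first expression of the MTTF to the second is an integration by parts. A sketch of the steps, assuming $q(t) = -dR(t)/dt$ as defined earlier and that $t\,R(t)$ vanishes for $t \to \infty$:

```latex
\begin{align*}
\mathrm{MTTF} &= \int_0^{\infty} t\,q(t)\,dt = -\int_0^{\infty} t\,\frac{dR(t)}{dt}\,dt \\
&= \Big[-t\,R(t)\Big]_0^{\infty} + \int_0^{\infty} R(t)\,dt \\
&= \int_0^{\infty} R(t)\,dt \qquad \text{if } \lim_{t \to \infty} t\,R(t) = 0
\end{align*}
```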

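Bathtub Curve
[Figure: instantaneous failure frequency $h(t)$ versus time, with three regions: an initial period of decreasing failure frequency, a central period of roughly constant failure frequency, and a final aging period of increasing failure frequency.]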

Failure Frequency Function
The first and third regions can be excluded by assuming that entities are used after the initial testing period and before they start aging. Hence, the instantaneous failure frequency function can be assumed constant:
  $h(t) = \lambda$
This determines the following values for the previously introduced expressions:
  $R(t) = e^{-\lambda t}$
  $Q(t) = 1 - e^{-\lambda t}$
  $q(t) = \lambda e^{-\lambda t}$
  $\mathrm{MTTF} = \frac{1}{\lambda}$
[Figure: fault probability density over time, $q(t)$]
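Under the constant failure frequency assumption, the quantities above reduce to the exponential model. The sketch below evaluates them for a hypothetical rate `lam`; the function names and the numeric values are illustrative assumptions, not taken from the slides.

```python
import math


def reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure frequency lam."""
    return math.exp(-lam * t)


def failure_function(t, lam):
    """Q(t) = 1 - R(t)."""
    return 1.0 - math.exp(-lam * t)


def failure_density(t, lam):
    """q(t) = dQ/dt = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def mttf(lam):
    """MTTF = 1 / lam, the integral of R(t) over [0, infinity)."""
    return 1.0 / lam


lam = 1e-4   # hypothetical failure frequency, failures per hour
t = 1000.0   # hypothetical mission time, hours

print(reliability(t, lam))       # exp(-0.1)
print(failure_function(t, lam))  # 1 - exp(-0.1)
print(failure_density(t, lam))   # 1e-4 * exp(-0.1)
print(mttf(lam))                 # about 10000 hours
```

Note that at $t = \mathrm{MTTF}$ the reliability is $e^{-1} \approx 0.37$, so the MTTF is not a time by which most units are still working.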

Repairable Systems
In the case of repairable systems, besides the “fault occurrence” event, the event “repair” or “replacement” of the faulty components has to be considered:
– MTTF (Mean Time To Failure): the average time before a fault occurs
– MTTR (Mean Time To Repair): the average time to repair or replace a faulty entity
System availability:
  $A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$
MTBF (Mean Time Between Faults) is the average time between two faults, given by the sum of MTTF and MTTR.
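As a small worked example, the availability of a repairable entity can be computed from its MTTF and MTTR. The figures below are hypothetical, and the formula assumed is the usual steady-state ratio $A = \mathrm{MTTF} / (\mathrm{MTTF} + \mathrm{MTTR})$.

```python
def availability(mttf, mttr):
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


mttf_hours = 990.0  # hypothetical mean time to failure
mttr_hours = 10.0   # hypothetical mean time to repair

mtbf_hours = mttf_hours + mttr_hours  # MTBF as defined on the slide
a = availability(mttf_hours, mttr_hours)

print(mtbf_hours)  # 1000.0
print(a)           # 0.99
```

Halving the MTTR in this example raises the availability about as much as doubling the MTTF would, which is why maintenance policies matter as much as component quality.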

Cover Factor
Conditional probability that, after the occurrence of a fault, the system returns to correct functioning. It measures the system’s ability to detect a fault, localize it, contain it and restore a consistent, error-free state.
To estimate it, one must identify every possible fault and, for each fault, forecast its frequency and the corresponding cover factor.
Limits:
– it is hard to determine the probability of every possible fault
– it is often unrealistic to take every possible fault into account
– the cover factor is determined considering one fault at a time, whereas one should take into account the possibility of multiple concurrent faults

Dependability Evaluation
Dependability evaluation of a complex system can be performed via either:
– combinatorial models (combinatorial methods)
– Markovian models (Markov processes)
Metrics listed on the slide:
1. reliability
2. availability
3. security
4. performability

Combinatorial Models
The evaluation of the availability and reliability of a computing system considers the system as composed of a set of interconnected entities.
– First step: identify the availability and reliability of each component entity;
– Second step: identify the configurations that allow the analyzed system to operate according to the project’s specifications;
– Third step: identify the relation between the faults of each entity and those of the whole system.
Entities, in turn, are made up of components whose dependability metrics depend on:
– components’ quality,
– maintenance policies,
– mutual interconnections

Interconnections
Typical interconnections are:
– Serial
– Parallel
– TMR (Triple Modular Redundancy)
– Hybrid M out of N

Serial Interconnection
K entities are serially interconnected when the functioning of the system depends on the correct functioning of all the K entities.
Given:
– $R_i(t)$ = reliability of each entity
– $A_i$ = availability of each entity
one can derive the following system-wide metrics:
  $R(t) = \prod_{i=1}^{K} R_i(t)$
  $A = \prod_{i=1}^{K} A_i$
[Figure: entities $C_1, C_2, \dots, C_K$ connected in series]
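A minimal sketch of the serial case, assuming the entities fail independently so that the system metric is the product of the entity metrics. The helper name `series` and the sample values are hypothetical.

```python
import math


def series(values):
    """Reliability (or availability) of independent entities connected in series."""
    return math.prod(values)


# Hypothetical reliabilities of three entities at some fixed time t.
r = [0.9, 0.95, 0.99]

r_system = series(r)
print(r_system)            # about 0.84645
print(r_system <= min(r))  # True: a series system is never better than its weakest entity
```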

Parallel Interconnection
K entities are interconnected in parallel when the functioning of the system is guaranteed even if just a single entity works.
Given:
– $R_i(t)$ = reliability of each entity
– $A_i$ = availability of each entity
one can derive the following system-wide metrics:
  $R(t) = 1 - \prod_{i=1}^{K} \left(1 - R_i(t)\right)$
  $A = 1 - \prod_{i=1}^{K} \left(1 - A_i\right)$
The system stops working (is unavailable) only if all of its K entities fail (are unavailable).
[Figure: entities $C_1, C_2, \dots, C_K$ connected in parallel]

Parallel Interconnection (cont’d)
In the case of entities having the same reliability $R_C(t)$ or availability $A_C$, we get:
  $R(t) = 1 - \left(1 - R_C(t)\right)^K$
  $A = 1 - \left(1 - A_C\right)^K$
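The parallel case can be sketched in the same way, again assuming independent entities: the system fails only when every entity fails. The helper name `parallel` and the value 0.9 are illustrative assumptions.

```python
import math


def parallel(values):
    """Reliability (or availability) of independent entities connected in parallel."""
    return 1.0 - math.prod(1.0 - v for v in values)


r_c = 0.9  # hypothetical reliability shared by all entities

for k in (1, 2, 3):
    general = parallel([r_c] * k)        # general formula, one term per entity
    closed_form = 1.0 - (1.0 - r_c) ** k # identical-entity formula
    print(k, round(general, 6), round(closed_form, 6))
```

Each added entity multiplies the failure probability by $1 - R_C$, so the gain from one more replica shrinks quickly as $K$ grows.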

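TMR Interconnection
The system fails or is not available when at least two entities are simultaneously faulty/unavailable or when the voter is faulty/unavailable. With $R_V$ denoting the reliability of the voter and assuming independent entities:
  $R_{TMR} = R_V \left[ R_1 R_2 R_3 + R_1 R_2 (1 - R_3) + R_1 (1 - R_2) R_3 + (1 - R_1) R_2 R_3 \right]$
[Figure: input I feeds entities $C_1$, $C_2$, $C_3$, whose outputs go to a voter (labeled r/n) that produces output O]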
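For three identical modules the TMR condition (voter working and at most one module faulty) has a compact closed form. The sketch below assumes independent, identical modules with reliability `r_module` and a voter with reliability `r_voter`; both names and all numeric values are hypothetical.

```python
def tmr(r_module, r_voter=1.0):
    """Reliability of a TMR system with three identical, independent modules.

    The majority is correct when all three modules work (r^3) or exactly
    two work (3 * r^2 * (1 - r)), which sums to 3r^2 - 2r^3.
    """
    at_least_two = 3 * r_module**2 - 2 * r_module**3
    return r_voter * at_least_two


print(tmr(0.9))        # about 0.972, better than a single module
print(tmr(0.9, 0.99))  # about 0.96228, the voter is a single point of failure
print(tmr(0.4))        # about 0.352, worse than a single module
```

With a perfect voter, TMR improves on a single module only when the module reliability exceeds 0.5; below that threshold the redundancy makes things worse.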

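Parallel/Serial Interconnections
[Figure: between input I and output O, block $C_1$ is in series with block $C_2$. $C_1$ is the parallel of $C_{11}$ and $C_{12}$, where $C_{11}$ is the series of $C_{111}$ and $C_{112}$. $C_2$ is the parallel of $C_{21}$, $C_{22}$ and $C_{23}$.]
  $R_{11} = R_{111} \cdot R_{112}$
  $R_1 = 1 - (1 - R_{11}) \cdot (1 - R_{12})$
  $R_2 = 1 - (1 - R_{21}) \cdot (1 - R_{22}) \cdot (1 - R_{23})$
  $R = R_1 \cdot R_2$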
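Nested configurations like this one can be evaluated mechanically by reducing the innermost blocks first. The sketch below is one possible way to do it: the tuple-based representation, the function name `evaluate` and the entity reliabilities are all assumptions made for illustration, and entities are taken to be independent.

```python
import math


def evaluate(block):
    """Evaluate a nested series/parallel block diagram.

    A block is either a number (reliability of a single entity) or a tuple
    (kind, children) with kind equal to "series" or "parallel".
    """
    if isinstance(block, (int, float)):
        return float(block)
    kind, children = block
    values = [evaluate(child) for child in children]
    if kind == "series":
        return math.prod(values)
    if kind == "parallel":
        return 1.0 - math.prod(1.0 - v for v in values)
    raise ValueError(f"unknown block type: {kind}")


# Hypothetical reliabilities for the entities named on the slide.
R111, R112, R12 = 0.9, 0.9, 0.8
R21, R22, R23 = 0.7, 0.7, 0.7

C11 = ("series", [R111, R112])      # R11 = R111 * R112
C1 = ("parallel", [C11, R12])       # R1 = 1 - (1 - R11)(1 - R12)
C2 = ("parallel", [R21, R22, R23])  # R2 = 1 - (1 - R21)(1 - R22)(1 - R23)
system = ("series", [C1, C2])       # R = R1 * R2

print(evaluate(C1))      # about 0.962
print(evaluate(C2))      # about 0.973
print(evaluate(system))  # about 0.936026
```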

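Hybrid M out of N Interconnection
The system remains operational as long as there are at least M correct entities, namely as long as at most K = N – M entities fail.
Given:
– $R_i(t)$ = reliability of each entity
– $A_i$ = availability of each entity
one can derive the following system-wide metrics (written here for entities with the same reliability $R$; the same expression holds for availability with $A$ in place of $R$):
  $R_{M/N} = \sum_{i=0}^{K} \binom{N}{i} R^{N-i} (1 - R)^{i}$
In fact, the probability that:
– N entities are correct is: $R^N$
– N-1 entities are correct is: $N R^{N-1} (1 - R)$
– N-2 entities are correct is: $\binom{N}{2} R^{N-2} (1 - R)^2$
– N-K entities are correct is: $\binom{N}{K} R^{N-K} (1 - R)^K$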
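The M-out-of-N sum can be checked numerically. The sketch below assumes identical, independent entities with reliability `r` (an assumption, since the slide allows a distinct $R_i$ per entity); the function name `m_out_of_n` is hypothetical.

```python
from math import comb


def m_out_of_n(m, n, r):
    """Probability that at least m of n identical, independent entities are correct.

    The index i counts the failed entities, from 0 up to K = n - m.
    """
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    return sum(comb(n, i) * r ** (n - i) * (1 - r) ** i for i in range(n - m + 1))


r = 0.9
print(m_out_of_n(2, 3, r))  # about 0.972, the TMR majority with a perfect voter
print(m_out_of_n(3, 3, r))  # about 0.729, same as three entities in series
print(m_out_of_n(1, 3, r))  # about 0.999, same as three entities in parallel
```

The serial and parallel interconnections are the two extremes of this scheme: M = N gives the series product and M = 1 gives the parallel formula.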

Evaluation Examples
Let us consider a non-redundant system composed of 4 serially connected entities:
  $A = \prod_{i=1}^{4} A_i$
[Figure: I → $S_1$ → $S_2$ → $S_3$ → $S_4$ → O]
How can the system’s dependability be increased?

Pair with a Duplicated System
  $A_{d1} = 1 - (1 - A)^2$
[Figure: two copies of the chain $S_1$ → $S_2$ → $S_3$ → $S_4$ connected in parallel between I and O]

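Duplicate Each Component
  $A_{d2} = \prod_{i=1}^{4} A_{di}$
where:
  $A_{di} = 1 - (1 - A_i)^2$
[Figure: four stages in series between I and O, each stage made of two copies of $S_i$ in parallel]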

Quantifying the Dependability of the Considered Configurations
Assuming, e.g., that each $A_i = 0.9$, the system’s availability in the three cases is, respectively:
– $A = 0.6561$ (non-redundant system)
– $A_{d1} = 0.8817$ (duplicated system)
– $A_{d2} = 0.9606$ (each component duplicated)
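The three figures can be reproduced with a few lines of arithmetic. The variable names are hypothetical; the formulas assume independent entities and are the serial and parallel expressions introduced earlier.

```python
a_i = 0.9  # availability of each entity, as assumed on the slide
n = 4      # number of serially connected entities

a_series = a_i ** n                          # non-redundant chain
a_dup_system = 1 - (1 - a_series) ** 2       # two whole chains in parallel
a_dup_component = (1 - (1 - a_i) ** 2) ** n  # each entity duplicated, pairs in series

print(round(a_series, 4), round(a_dup_system, 4), round(a_dup_component, 4))
# 0.6561 0.8817 0.9606
```

Both redundant configurations use eight entities, yet duplicating each component gives the higher availability: a single fault then disables only one replica of one stage instead of an entire chain.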