Reliable System Design 2011 by: Amir M. Rahmani


7. Markov Models

Markov Models
– The primary difficulty with combinatorial models is that many complex systems cannot be modeled easily in a combinatorial fashion.
– Fault coverage is sometimes difficult to incorporate into the reliability expression of a combinatorial model.
– The process of repair is very difficult to model in a combinatorial model.
– Alternative: Markov models
matlab1.ir

Markov Process
– In 1907 A.A. Markov published a paper in which he defined and investigated the properties of what are now known as Markov processes.
– A Markov process with a discrete state space is referred to as a Markov chain.
– A set of random variables forms a Markov chain if the probability that the next state is $S_{n+1}$ depends only on the current state $S_n$, and not on any previous states.
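The Markov chain condition above can be written as a single equation. The following is a sketch in notation that is assumed here rather than taken from the slides: $X_n$ is the random variable for the state at step n and $s_n$ is one particular state value.

```latex
P(X_{n+1} = s_{n+1} \mid X_n = s_n, X_{n-1} = s_{n-1}, \dots, X_0 = s_0)
  = P(X_{n+1} = s_{n+1} \mid X_n = s_n)
```

The left-hand side conditions on the whole history, the right-hand side only on the current state; the Markov property says the two are equal.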

Markov Process
– A stochastic process is a function whose values are random variables.
– The classification of a random process depends on different quantities:
  – state space
  – index (time) parameter
  – statistical dependencies among the random variables X(t) for different values of the index parameter t

Markov Process
Categories of Markov state-space models:
1. Discrete space and discrete time
2. Discrete space and continuous time
3. Continuous space and discrete time
4. Continuous space and continuous time
The first two categories involve a discrete space; that is, the states of the system can be numbered with integers. In the first and the third categories, the system changes in discrete time steps. The second category is the one most useful for modeling fault-tolerant systems.

Markov Process
States must be
– mutually exclusive
– collectively exhaustive
Let $P_i(t)$ = probability of being in state $S_i$ at time t.
Markov properties
– the future state probability depends only on the current state
  – independent of the time spent in the state
  – independent of the path to the state

State Transition Diagrams
A Markov state transition diagram graphically represents:
1- All system states and their initial conditions.
2- The transitions between system states and the corresponding transition rates.
The transition rates are replaced with equivalent transition probabilities by considering a very small time step Δt. This leads to two cases:
1- The system remains in the current state after Δt with some probability.
2- The system moves to a next state after Δt with a probability of approximately the transition rate times Δt.
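The replacement of rates by probabilities over a small step Δt can be sketched numerically. This is a minimal illustration, not the slides' own method: the function name `step_matrix`, the two-state example and the numeric values are all assumptions chosen for the example. It builds the one-step matrix I + Q·Δt from a transition rate matrix Q.

```python
def step_matrix(Q, dt):
    """Return I + Q*dt, the approximate transition probabilities over a small step dt."""
    n = len(Q)
    return [[(1.0 if i == j else 0.0) + Q[i][j] * dt for j in range(n)]
            for i in range(n)]

lam = 1e-3   # hypothetical failure rate (per hour)
dt = 0.1     # small time step (hours)

# Hypothetical two-state model: index 0 = working, index 1 = failed (absorbing).
Q = [[-lam, lam],
     [0.0, 0.0]]

P = step_matrix(Q, dt)
# P[0][0] is the probability of staying in the working state during dt,
# P[0][1] is the probability of moving to the failed state during dt.
```

Each row of `P` sums to one, which matches the two cases listed above: either the system stays where it is or it moves along one of the outgoing transitions.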

Construction of State Transition Diagram
The basic steps in constructing state transition diagrams are:
1- Define the failure criteria of the system.
2- Enumerate all of the possible states of the system and classify them into good or failed states.
3- Determine the transition rates between the various states and draw the state transition diagram.

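Example
State diagram for one component
Let X denote the lifetime of a component. The Markov property is defined as follows:
– The probability that a component fails in the small interval Δt is proportional to the length of the interval; λ is the proportionality constant.
– This probability does not depend on the time t.

Markov Process
Assume an exponential failure law with failure rate λ. The probability that the system has failed at t+Δt, given that it was working at time t, is given by
$P(X \le t + \Delta t \mid X > t) = 1 - e^{-\lambda \Delta t} \approx \lambda \Delta t$

Reliability for one component
Let $P_1(t)$ denote the probability that the component works at time t and $P_0(t)$ the probability that it has failed (a state labeling assumed here). The probability that the component works at the time t+Δt is
$P_1(t + \Delta t) = (1 - \lambda \Delta t)\, P_1(t)$
Rearranging and dividing by Δt:
$\frac{P_1(t + \Delta t) - P_1(t)}{\Delta t} = -\lambda P_1(t)$
Let Δt → 0, and we get
$P_1'(t) = -\lambda P_1(t)$

Reliability for one component
The solution to this differential equation is
$P_1(t) = C e^{-\lambda t}$
Assuming that the component works at the time t = 0, so that $P_1(0) = 1$, we get C = 1. The reliability of the component is:
$R(t) = P_1(t) = e^{-\lambda t}$

Failure probability for one component
The probability that the component does not work at the time t+Δt is
$P_0(t + \Delta t) = \lambda \Delta t\, P_1(t) + P_0(t)$
Rearranging and dividing by Δt:
$\frac{P_0(t + \Delta t) - P_0(t)}{\Delta t} = \lambda P_1(t)$
Let Δt → 0, and we get
$P_0'(t) = \lambda P_1(t)$

Failure probability for one component
Solving the differential equation with $P_0(0) = 0$ yields
$P_0(t) = 1 - e^{-\lambda t}$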
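The limiting argument for one component can be checked numerically. The sketch below assumes a hypothetical failure rate and mission time. In each small step the probability of still working is multiplied by (1 − λΔt), and the result is compared with the exponential reliability e^{-λt} that the limit Δt → 0 produces.

```python
import math

def reliability_by_steps(lam, t_end, dt):
    """Apply P_work(t + dt) = (1 - lam*dt) * P_work(t) repeatedly, starting from 1."""
    p_work = 1.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        p_work *= (1.0 - lam * dt)
    return p_work

lam = 0.01      # hypothetical failure rate (per hour)
t_end = 100.0   # hypothetical mission time (hours)

approx = reliability_by_steps(lam, t_end, dt=0.001)
exact = math.exp(-lam * t_end)
print(f"stepwise: {approx:.6f}  exponential: {exact:.6f}")
```

Shrinking `dt` brings the stepwise value closer to the exponential, which is the content of the step "Let Δt → 0".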

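Markov chain model
The equation system can be written using matrices:
$P'(t) = P(t)\, Q$
where
$P(t) = [\,P_1(t) \;\; P_0(t)\,]$ and $Q = \begin{bmatrix} -\lambda & \lambda \\ 0 & 0 \end{bmatrix}$
Here $P_1(t)$ and $P_0(t)$ are the probabilities that the component works and has failed, respectively (row-vector convention and state labeling assumed here). Q is called the transition rate matrix.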
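The matrix form can be integrated numerically for any transition rate matrix. The following is a minimal sketch under stated assumptions: the row-vector convention P'(t) = P(t)·Q, a classical fourth-order Runge-Kutta integrator (a swapped-in numerical technique, not something the slides use), and hypothetical helper names `derivative` and `integrate`.

```python
import math

def derivative(p, Q):
    """Right-hand side of P'(t) = P(t) Q for a row vector p."""
    n = len(p)
    return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

def integrate(Q, p0, t_end, steps=1000):
    """Integrate P'(t) = P(t) Q from 0 to t_end with classical RK4."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = derivative(p, Q)
        k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)], Q)
        k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)], Q)
        k4 = derivative([x + h * k for x, k in zip(p, k3)], Q)
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

lam = 0.01            # hypothetical failure rate (per hour)
Q = [[-lam, lam],     # from "works": leave at rate lam
     [0.0, 0.0]]      # "failed" is absorbing

# Start in the working state with probability 1.
p_work, p_failed = integrate(Q, [1.0, 0.0], t_end=100.0)
print(f"P_work(100) = {p_work:.6f}, e^(-1) = {math.exp(-1.0):.6f}")
```

The same `integrate` function accepts larger Q matrices, so it can be reused for models with more states.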

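Cold stand-by system with one spare
State diagram: 2 → 1 → 0, each transition with rate λ (the active module, primary or spare, is assumed to fail at rate λ)
State labeling:
2 Primary module works
1 Spare module works (primary module does not work)
0 No module works, system failure
Assumption: The failure rate for the spare is zero while it is in stand-by.

Cold stand-by system with one spare
We calculate the reliability of the system by solving the equation system
$P'(t) = P(t)\, Q$
where
$P(t) = [\,P_2(t) \;\; P_1(t) \;\; P_0(t)\,]$ and $Q = \begin{bmatrix} -\lambda & \lambda & 0 \\ 0 & -\lambda & \lambda \\ 0 & 0 & 0 \end{bmatrix}$

The Equation System
We solve this by Laplace transform using the following relation:
$\mathcal{L}\{f'(t)\} = s F(s) - f(0)$
Laplace transforms:
Time function      Laplace transform
$e^{-at}$          $\frac{1}{s + a}$
$t\, e^{-at}$      $\frac{1}{(s + a)^2}$

Solving the Equation System
Applying the Laplace transform, we get
$s \tilde{P}(s) - P(0) = \tilde{P}(s)\, Q$
where
$P(0) = [\,1 \;\; 0 \;\; 0\,]$
which gives us
$s \tilde{P}_2(s) - 1 = -\lambda \tilde{P}_2(s)$
$s \tilde{P}_1(s) = \lambda \tilde{P}_2(s) - \lambda \tilde{P}_1(s)$
$s \tilde{P}_0(s) = \lambda \tilde{P}_1(s)$

Solving the Equation System
1- We compute
$\tilde{P}_2(s) = \frac{1}{s + \lambda}$
which gives the following time function
$P_2(t) = e^{-\lambda t}$
2- We compute
$\tilde{P}_1(s) = \frac{\lambda}{s + \lambda} \tilde{P}_2(s) = \frac{\lambda}{(s + \lambda)^2}$
which gives
$P_1(t) = \lambda t\, e^{-\lambda t}$
The reliability of the system can be written as:
$R(t) = P_2(t) + P_1(t) = e^{-\lambda t} (1 + \lambda t)$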
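The cold stand-by result can be cross-checked by simulation instead of Laplace transforms. The sketch below rests on assumptions that go beyond the transcript: both modules fail at the same rate λ when active, switching to the spare always succeeds, and the closed form being checked is R(t) = e^{-λt}(1 + λt). The function name, trial count and seed are hypothetical.

```python
import math
import random

def simulate_cold_standby(lam, t, trials=200_000, seed=1):
    """Estimate P(system lifetime > t) for a primary module plus one cold spare."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # The spare cannot fail while idle, so the system lifetime is the
        # primary's lifetime followed by the spare's lifetime.
        lifetime = rng.expovariate(lam) + rng.expovariate(lam)
        if lifetime > t:
            survived += 1
    return survived / trials

lam = 0.01   # hypothetical failure rate (per hour)
t = 100.0    # hypothetical mission time (hours)

estimate = simulate_cold_standby(lam, t)
closed_form = math.exp(-lam * t) * (1.0 + lam * t)
print(f"simulated: {estimate:.4f}  closed form: {closed_form:.4f}")
```

The two values should agree to within the sampling error of the simulation, which shrinks as `trials` grows.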

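Calculating MTTF
Let $X_1$ and $X_2$ denote the time spent in state 2 and state 1, respectively. With λ the failure rate of the active module, the MTTF for the system can then be written as
$MTTF = E[X_1 + X_2] = E[X_1] + E[X_2] = \frac{1}{\lambda} + \frac{1}{\lambda} = \frac{2}{\lambda}$
Alternatively, the MTTF can be computed as
$MTTF = \int_0^\infty R(t)\, dt = \int_0^\infty e^{-\lambda t} (1 + \lambda t)\, dt = \frac{2}{\lambda}$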
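The two ways of computing the MTTF can be compared numerically. This sketch assumes the cold stand-by reliability R(t) = e^{-λt}(1 + λt) with equal failure rates for both modules, and integrates it with the trapezoidal rule up to a finite cut-off; the cut-off factor, step count and function names are illustrative choices.

```python
import math

def reliability(t, lam):
    """Assumed cold stand-by reliability with one spare and equal failure rates."""
    return math.exp(-lam * t) * (1.0 + lam * t)

def mttf_numeric(lam, t_max_factor=50.0, steps=200_000):
    """Trapezoidal approximation of the integral of R(t) from 0 to t_max_factor/lam."""
    t_max = t_max_factor / lam
    h = t_max / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(t_max, lam))
    for k in range(1, steps):
        total += reliability(k * h, lam)
    return total * h

lam = 0.01   # hypothetical failure rate (per hour)
mttf_integral = mttf_numeric(lam)
mttf_sum_of_means = 1.0 / lam + 1.0 / lam
print(f"integral: {mttf_integral:.3f}  sum of means: {mttf_sum_of_means:.3f}")
```

The integral is truncated at 50/λ, where R(t) is already negligible, so the remaining tail does not affect the comparison at the printed precision.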

Reliability [figure not preserved]

Coverage
– Designing a fault-tolerant system that will correctly detect, mask or recover from every conceivable fault or error is not possible in practice.
– Even if a system can be designed to tolerate a very large number of faults or errors, there is for most systems a non-zero probability that a single fault will not be handled correctly. Such faults are known as “non-covered” faults.
– The probability that a fault is covered (i.e., correctly handled by the fault-tolerance mechanisms) is known as the coverage factor, and is denoted c.
– The probability that a fault is non-covered can then be written as 1 - c.

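Cold Stand-by system with Coverage factor
State diagram (states as for the cold stand-by system: 2 primary works, 1 spare works, 0 system failure; the active module is assumed to fail at rate λ): 2 → 1 with rate cλ, 2 → 0 with rate (1 - c)λ, 1 → 0 with rate λ.
We can write up the Q-matrix directly by inspecting the state diagram:
$Q = \begin{bmatrix} -\lambda & c\lambda & (1 - c)\lambda \\ 0 & -\lambda & \lambda \\ 0 & 0 & 0 \end{bmatrix}$

Solving the Equation System
We have the following equation system
$P'(t) = P(t)\, Q$, with $P(t) = [\,P_2(t) \;\; P_1(t) \;\; P_0(t)\,]$ and $P(0) = [\,1 \;\; 0 \;\; 0\,]$
After applying the Laplace transform, we get
$s \tilde{P}_2(s) - 1 = -\lambda \tilde{P}_2(s)$
$s \tilde{P}_1(s) = c\lambda \tilde{P}_2(s) - \lambda \tilde{P}_1(s)$
$s \tilde{P}_0(s) = (1 - c)\lambda \tilde{P}_2(s) + \lambda \tilde{P}_1(s)$

Solving the Equation System
$\tilde{P}_2(s)$ can be computed directly from the first equation:
$\tilde{P}_2(s) = \frac{1}{s + \lambda}$, so $P_2(t) = e^{-\lambda t}$
We then compute
$\tilde{P}_1(s) = \frac{c\lambda}{(s + \lambda)^2}$, so $P_1(t) = c \lambda t\, e^{-\lambda t}$
Reliability for the system is
$R(t) = P_2(t) + P_1(t) = e^{-\lambda t} (1 + c \lambda t)$

The Reliability with Coverage factor [figure not preserved]

Calculating MTTF
$MTTF = \int_0^\infty R(t)\, dt = \frac{1}{\lambda} + \frac{c}{\lambda} = \frac{1 + c}{\lambda}$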
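The effect of the coverage factor can be tabulated with a few lines of code. The sketch assumes the closed forms R(t) = e^{-λt}(1 + cλt) and MTTF = (1 + c)/λ for a cold stand-by system whose two modules share the failure rate λ. These formulas follow from the standard model and are not printed in the transcript; the numeric values are hypothetical.

```python
import math

def reliability_with_coverage(t, lam, c):
    """Assumed reliability of a cold stand-by pair with coverage factor c."""
    return math.exp(-lam * t) * (1.0 + c * lam * t)

def mttf_with_coverage(lam, c):
    """Assumed MTTF of the same system: (1 + c) / lam."""
    return (1.0 + c) / lam

lam, t = 0.01, 100.0   # hypothetical failure rate (per hour) and mission time (hours)
for c in (0.0, 0.9, 0.99, 1.0):
    r = reliability_with_coverage(t, lam, c)
    m = mttf_with_coverage(lam, c)
    print(f"c={c:4.2f}  R({t:g})={r:.4f}  MTTF={m:.1f}")
```

With c = 0 the spare is never used and the system behaves like a single module; with c = 1 the result matches the ideal cold stand-by system.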

Availability
Definition: the probability that a system is functioning properly at a given time t.
– When calculating the availability we consider both failures and repairs.
– We must make assumptions about the function time (up time) and the repair time (down time).
– The repair time consists of the time between the system failure and the start of the repair, the time it takes to perform the repair, and the time it takes to restart the system after the repair is completed.

Steady-state Availability
– $E[X_0]$ = MTTFF (Mean Time To First Failure)
– $E[X_i]$ = MTTF (Mean Time To Failure)
– $E[Y_i]$ = MTTR (Mean Time To Repair)
– MTTR + MTTF = MTBF (Mean Time Between Failures)

Design Tradeoffs
How to make availability approach 100%?
– MTTF → infinity (high reliability)
– MTTR → zero (fast recovery)

Availability vs. Reliability
– Reliability is measured by mean time to failure (MTTF).
– There is no repair in the state of system failure for modeling reliability.
– Availability is a function of MTTF and mean time to repair (MTTR): MTTF/(MTTF+MTTR)
– A system may have a high MTBF, but low availability.
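The last point can be illustrated with the formula MTTF/(MTTF+MTTR) from the slide. The two systems below are hypothetical: one fails more often but is repaired quickly, the other fails rarely but takes a long time to repair.

```python
def steady_state_availability(mttf, mttr):
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# Hypothetical system A: fails every 1000 h on average, repaired in 1 h.
a_fast = steady_state_availability(mttf=1000.0, mttr=1.0)

# Hypothetical system B: fails every 10000 h on average, repaired in 5000 h.
a_slow = steady_state_availability(mttf=10_000.0, mttr=5_000.0)

print(f"fast repair: {a_fast:.4f}  slow repair: {a_slow:.4f}")
```

System B has the larger MTTF and MTBF, yet its availability is lower because the repair time dominates.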

Markov chain model for a simplex system
State 0: System OK
State 1: System failure
Failure rate: λ
Repair rate: μ
Availability: $A(t) = P_0(t)$
Reliability: $R(t) = e^{-\lambda t}$
Maintainability: $M(t) = 1 - e^{-\mu t}$

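The availability for a simplex system
With state 0 = system OK, state 1 = system failure, failure rate λ and repair rate μ, the state equations are
$P_0'(t) = -\lambda P_0(t) + \mu P_1(t)$
$P_1'(t) = \lambda P_0(t) - \mu P_1(t)$
with $P_0(0) = 1$ and $P_0(t) + P_1(t) = 1$.

The availability for a simplex system
Solving the equations gives
$A(t) = P_0(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t}$

Steady-state Availability
Assuming exponentially distributed function times and repair times, with MTTF = 1/λ and MTTR = 1/μ, we get
$A = \lim_{t \to \infty} A(t) = \frac{\mu}{\lambda + \mu} = \frac{MTTF}{MTTF + MTTR}$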
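The time-dependent availability of the simplex system can be checked against its own differential equation. The sketch assumes the closed form A(t) = μ/(λ+μ) + λ/(λ+μ)·e^{-(λ+μ)t}, which is the standard solution for a two-state model that starts in the OK state and is not printed in the transcript. It verifies the formula with a central finite difference; the rates are hypothetical.

```python
import math

def availability(t, lam, mu):
    """Assumed availability of a simplex system with failure rate lam and repair rate mu."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

lam, mu = 0.001, 0.1   # hypothetical failure and repair rates (per hour)

a_start = availability(0.0, lam, mu)    # system starts in the OK state
a_steady = mu / (lam + mu)              # limit for large t

# Check dP0/dt = -lam*P0 + mu*(1 - P0) at one time point.
t, h = 10.0, 1e-4
lhs = (availability(t + h, lam, mu) - availability(t - h, lam, mu)) / (2 * h)
p0 = availability(t, lam, mu)
rhs = -lam * p0 + mu * (1.0 - p0)
print(f"A(0)={a_start:.4f}  A(inf)={a_steady:.4f}  dA/dt: {lhs:.3e} vs {rhs:.3e}")
```

The steady-state value equals MTTF/(MTTF+MTTR) when MTTF = 1/λ and MTTR = 1/μ.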

Markov chain for a hot stand-by system
States 0, 1: System OK
State 2: System failure
Failure rate: λ
Repair rate: μ
Availability: $A(t) = P_0(t) + P_1(t)$
Assumption: Only one repair-person works with the system when a failure has occurred.
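The steady-state availability of the hot stand-by system can be sketched from balance equations. The state diagram is not preserved in the transcript, so the transition rates below are assumptions: both modules are powered, giving 0 → 1 at rate 2λ and 1 → 2 at rate λ, and the single repair-person gives 1 → 0 and 2 → 1 at rate μ. The helper name `birth_death_steady_state` and the numeric values are hypothetical.

```python
def birth_death_steady_state(up_rates, down_rates):
    """Steady-state probabilities of a birth-death chain.

    up_rates[k] is the rate from state k to k+1,
    down_rates[k] is the rate from state k+1 back to k.
    """
    weights = [1.0]
    for up, down in zip(up_rates, down_rates):
        # Balance across the cut between state k and k+1: pi_k * up = pi_{k+1} * down.
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]

lam, mu = 0.001, 0.1   # hypothetical failure and repair rates (per hour)

# Assumed states: 0 = both modules OK, 1 = one module failed, 2 = both failed.
pi = birth_death_steady_state(up_rates=[2 * lam, lam], down_rates=[mu, mu])
availability = pi[0] + pi[1]

closed_form = (mu**2 + 2 * lam * mu) / (mu**2 + 2 * lam * mu + 2 * lam**2)
print(f"steady-state availability: {availability:.6f}")
```

Under these assumed rates the closed form is A = (μ² + 2λμ)/(μ² + 2λμ + 2λ²); a different repair policy, such as two repair-persons, would change the rate out of state 2 and therefore the result.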

Safety
Definition: The probability that a system is either functioning properly or is in a safe failed state.
– Calculating safety is similar to calculating reliability.
– In a reliability model there is usually only one absorbing state, while in a safety model there are at least two absorbing states.
– Among the absorbing states in a safety model, at least one represents that the system is in a safe shut-down state, and at least one represents that a catastrophic failure has occurred.

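Safety for a simplex system with coverage factor
We obtain the following Markov chain model and the corresponding transition-rate matrix. With state 0 = system OK, state SF = safe shut-down and state UF = catastrophic failure (labels assumed here), a fault occurs at rate λ and is covered with probability c:
0 → SF with rate cλ, 0 → UF with rate (1 - c)λ
$Q = \begin{bmatrix} -\lambda & c\lambda & (1 - c)\lambda \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ (state order 0, SF, UF)

Safety for a simplex system with coverage factor
The solutions of the differential equations, with the system OK at t = 0, are:
$P_0(t) = e^{-\lambda t}$
$P_{SF}(t) = c\,(1 - e^{-\lambda t})$
$P_{UF}(t) = (1 - c)(1 - e^{-\lambda t})$
The safety of the system is:
$S(t) = P_0(t) + P_{SF}(t) = c + (1 - c)\, e^{-\lambda t}$
The steady-state safety is:
$S(\infty) = c$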
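The safety model can be evaluated directly once the state probabilities are known. The sketch below assumes a three-state model that the transcript does not spell out: the system leaves the OK state at rate λ, ends in a safe shut-down state with probability c and in a catastrophic-failure state with probability 1 − c. Function names and numeric values are hypothetical.

```python
import math

def state_probabilities(t, lam, c):
    """Assumed probabilities of (OK, safe shut-down, catastrophic failure) at time t."""
    p_ok = math.exp(-lam * t)
    p_safe = c * (1.0 - p_ok)
    p_unsafe = (1.0 - c) * (1.0 - p_ok)
    return p_ok, p_safe, p_unsafe

def safety(t, lam, c):
    """Safety = probability of being OK or in the safe shut-down state."""
    p_ok, p_safe, _ = state_probabilities(t, lam, c)
    return p_ok + p_safe

lam, c = 0.01, 0.95   # hypothetical failure rate (per hour) and coverage factor
for t in (0.0, 100.0, 1000.0, 1e6):
    print(f"t={t:>9g}  S(t)={safety(t, lam, c):.4f}")
```

Under these assumptions the safety starts at 1 and decays toward the coverage factor c, so improving coverage is the only way to raise the steady-state safety of this model.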