
1 A. Bobbio, Reggio Emilia, June 17-18, 2003
Dependability & Maintainability: Theory and Methods
5. Markov Models
Andrea Bobbio, Dipartimento di Informatica, Università del Piemonte Orientale “A. Avogadro”, Alessandria (Italy)
IFOA, Reggio Emilia, June 17-18, 2003

2 State-Space-Based Models
States and labeled state transitions.
A state can keep track of:
– Number of functioning resources of each type
– States of recovery for each failed resource
– Number of tasks of each type waiting at each resource
– Allocation of resources to tasks
A transition:
– Can occur from any state to any other state
– Can represent a simple or a compound event

3 State-Space-Based Models (Continued)
Transitions between states represent the change of the system state due to the occurrence of an event. The model is drawn as a directed graph.
Transition label:
– Probability: homogeneous discrete-time Markov chain (DTMC)
– Rate: homogeneous continuous-time Markov chain (CTMC)
– Time-dependent rate: non-homogeneous CTMC
– Distribution function: semi-Markov process (SMP)

4 Modeler’s Options: Should I Use Markov Models?
State-space-based methods:
+ Model dependencies
+ Model fault tolerance and recovery/repair
+ Model contention for resources
+ Model concurrency and timeliness
+ Generalize to Markov reward models for modeling degradable performance

5 Modeler’s Options: Should I Use Markov Models?
+ Generalize to Markov regenerative models to allow generally distributed event times
+ Generalize to non-homogeneous Markov chains to allow Weibull failure distributions
+ Performance, availability and performability modeling possible
- Large (exponential) state space

6 In order to fulfil our goals:
– Modeling performance, availability and performability
– Modeling complex systems
we need automatic generation and solution of large Markov reward models.

7 Model-based evaluation
The choice of the model type is dictated by:
– Measures of interest
– Level of detailed system behavior to be represented
– Ease of model specification and solution
– Representation power of the model type
– Access to suitable tools or toolkits

8 State space models
A transition represents the change of state of a single component $x_i$, taking the system from state $s$ to state $s'$:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\}$
$Z(t)$ is the stochastic process; $\Pr\{Z(t) = s\}$ is the probability of finding $Z(t)$ in state $s$ at time $t$.

9 State space models
If $s \to s'$ represents a failure event of component $x_i$:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\} = \lambda_i \Delta t$
If $s \to s'$ represents a repair event:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\} = \mu_i \Delta t$
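The first-order relation above can be checked numerically. The sketch below compares the product of rate and interval length with the exact probability that an exponentially distributed event occurs within the interval; the rate value and the function names are illustrative assumptions, not taken from the slides.

```python
import math

def first_order_prob(rate, dt):
    """First-order transition probability: rate * dt."""
    return rate * dt

def exact_prob(rate, dt):
    """Exact probability that an exponential event with this rate occurs within dt."""
    return 1.0 - math.exp(-rate * dt)

if __name__ == "__main__":
    lam = 1e-3  # assumed failure rate per hour, illustrative only
    for dt in (100.0, 10.0, 1.0, 0.1):
        approx = first_order_prob(lam, dt)
        exact = exact_prob(lam, dt)
        print(f"dt={dt:6.1f}  rate*dt={approx:.6f}  exact={exact:.6f}  "
              f"error={approx - exact:.2e}")
```

The gap between the two values shrinks quadratically as the interval shrinks, which is why the rate label fully characterizes a transition in the limit of small intervals.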

10 Markov Process: definition

11 Transition Probability Matrix initial

12 State Probability Vector

13 Chapman-Kolmogorov Equations

14 Time-homogeneous CTMC

15

16 The transition rate matrix

17 C-K Equations for CTMC

18 Solution equations

19 Transient analysis
Given the initial state of the Markov chain, the system of differential equations is written, for each state, from the continuity equation: rate of buildup = rate of flow in - rate of flow out.
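As a minimal sketch of the continuity equation, consider a single repairable component with two states (up, down), failure rate `lam` and repair rate `mu`. This two-state example, its parameter values and the function names are assumptions chosen for illustration. The code integrates the flow equations with a forward Euler step and compares the result with the well-known closed-form availability of the two-state model.

```python
import math

def transient_availability(lam, mu, t_end, steps=100_000):
    """Integrate the continuity equations with forward Euler, starting in 'up'."""
    p_up, p_down = 1.0, 0.0
    dt = t_end / steps
    for _ in range(steps):
        flow_up_to_down = lam * p_up    # probability flow leaving 'up'
        flow_down_to_up = mu * p_down   # probability flow entering 'up'
        # rate of buildup = rate of flow in - rate of flow out
        p_up += dt * (flow_down_to_up - flow_up_to_down)
        p_down += dt * (flow_up_to_down - flow_down_to_up)
    return p_up

def closed_form_availability(lam, mu, t):
    """A(t) = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu) t) for the two-state model."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

if __name__ == "__main__":
    lam, mu = 0.01, 1.0  # assumed rates per hour
    for t in (0.5, 2.0, 5.0):
        print(t, transient_availability(lam, mu, t), closed_form_availability(lam, mu, t))
```

Forward Euler is used only because it mirrors the flow-balance wording directly; a production solver would more likely use uniformization or a stiff ODE method.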

20 Steady-state condition
If the process reaches a steady-state condition, then:

21 Steady-state analysis (balance equation)
The steady-state equation can be written as a flow balance equation with a normalization condition on the state probabilities. In steady state the rate of buildup is zero:
(rate of buildup) = rate of flow in - rate of flow out = 0
so that, for each state, rate of flow in = rate of flow out (balance equation).
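Written out for a CTMC, the continuity and balance relations take the following standard form. The symbols $q_{ij}$ (transition rate from state $i$ to state $j$), $P_j(t)$ and $\pi_j$ are notation assumed here, since the slide equations did not survive in the transcript.

```latex
% Continuity equation (transient analysis), one equation per state j:
\frac{dP_j(t)}{dt} \;=\; \sum_{i \neq j} P_i(t)\, q_{ij} \;-\; P_j(t) \sum_{k \neq j} q_{jk}

% Balance equation (steady state), obtained by setting the left-hand side to zero:
\sum_{i \neq j} \pi_i\, q_{ij} \;=\; \pi_j \sum_{k \neq j} q_{jk},
\qquad \sum_{j} \pi_j = 1
```

With the diagonal entries defined as $q_{jj} = -\sum_{k \neq j} q_{jk}$, the balance equations are equivalent to the matrix form $\pi Q = 0$ together with the normalization condition.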

22 State Classification

23 2-component system

24 2-component system

25 2-component system

26 2-component series system (block diagram with components A1 and A2); 2-component parallel system (block diagram with components A1 and A2)

27 2-component stand-by system (components A and B)

28 Markov Models Repairable systems - Availability

29 Repairable system: Availability

30 Repairable system: 2 identical components

31 Repairable system: 2 identical components

32 2-component Markov availability model
– Assume we have a two-component parallel redundant system with repair rate $\mu$.
– Assume that the failure rate of both components is $\lambda$.
– When both components have failed, the system is considered to have failed.

33 Markov availability model
– Let the number of properly functioning components be the state of the system.
– The state space is {0, 1, 2}, where 0 is the system down state.
– We wish to examine the effects of shared vs. non-shared repair.

34 Markov availability model: non-shared (independent) repair vs. shared repair

35 Markov availability model
Note: the non-shared case can be modeled and solved using a reliability block diagram (RBD) or a fault tree (FTREE), but the shared case requires a Markov chain.

36 Steady-state balance equations
For any state: rate of flow in = rate of flow out.
Considering the shared case, with $\pi_i$ the steady-state probability that the system is in state $i$:

37 Steady-state balance equations Hence Since We have Or

38 Steady-state balance equations (Continued)
Steady-state unavailability for the shared case: $\pi_0 = 1 - A_{\text{shared}}$.
Similarly, for the non-shared case: steady-state unavailability $= 1 - A_{\text{non-shared}}$.
Downtime in minutes per year $= (1 - A) \times 8760 \times 60$.
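The balance equations for the 2-component model can be solved in a few lines. The sketch below assumes the usual transition structure for this model, since the state diagram is not in the transcript: failures at rate 2*lam from state 2 to 1 and lam from state 1 to 0, repair at rate mu from state 1 to 2, and repair from state 0 to 1 at rate mu (shared repairer) or 2*mu (one repairer per failed unit). The numerical rates and function names are also illustrative assumptions.

```python
MINUTES_PER_YEAR = 8760 * 60

def steady_state(lam, mu, shared):
    """Return (pi_2, pi_1, pi_0); the state is the number of working components."""
    # Assumed repair rate out of state 0: one repairer (shared) or two (non-shared).
    repair_from_0 = mu if shared else 2 * mu
    # Balance equations: 2*lam*pi_2 = mu*pi_1 and lam*pi_1 = repair_from_0*pi_0
    r1 = 2 * lam / mu                # ratio pi_1 / pi_2
    r0 = r1 * lam / repair_from_0    # ratio pi_0 / pi_2
    pi_2 = 1.0 / (1.0 + r1 + r0)     # normalization: probabilities sum to 1
    return pi_2, r1 * pi_2, r0 * pi_2

def downtime_minutes_per_year(lam, mu, shared):
    """Downtime = (1 - A) * 8760 * 60, with 1 - A = pi_0."""
    unavailability = steady_state(lam, mu, shared)[2]
    return unavailability * MINUTES_PER_YEAR

if __name__ == "__main__":
    lam, mu = 0.001, 1.0  # assumed failure and repair rates per hour
    for shared in (True, False):
        label = "shared" if shared else "non-shared"
        print(label, downtime_minutes_per_year(lam, mu, shared))
```

In the non-shared case the result coincides with the product form for two independent components, which is consistent with the remark that this case can also be solved with an RBD or a fault tree.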

39 Steady-state balance equations

40 Absorbing states MTTF

41 Absorbing states - MTTF

42 Markov Reliability Model with Imperfect Coverage

43 Markov model with imperfect coverage
Next consider a modification of the 2-component parallel system, proposed by Arnold as a model of duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor, the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture:

44 Now allow for imperfect coverage c

45 Markov model with imperfect coverage
Assume that the initial state is 2, so that: Then the system of differential equations is:

46 Markov model with imperfect coverage
After solving the differential equations we obtain: $R(t) = P_2(t) + P_1(t)$
From $R(t)$, we can obtain the system MTTF:
It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.
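The sensitivity of the MTTF to the coverage factor can be illustrated with a short sketch. Because the state diagram and equations are missing from the transcript, the transition structure below is an assumption: state 2 moves to state 1 at rate 2*lam*c (covered fault) and directly to the failure state 0 at rate 2*lam*(1-c) (uncovered fault); state 1 fails at rate lam and is repaired back to state 2 at rate mu. The MTTF is obtained as the sum of the mean times spent in the two transient states before absorption.

```python
def mttf_imperfect_coverage(lam, mu, c):
    """MTTF from state 2 under the assumed structure described in the lead-in.

    Mean times (z2, z1) in the transient states solve z * Q_B = -(1, 0), with
    Q_B = [[-2*lam, 2*lam*c], [mu, -(lam + mu)]] over the states (2, 1).
    """
    det = 2 * lam * (lam + mu) - 2 * lam * c * mu   # determinant of Q_B
    z2 = (lam + mu) / det                           # mean time spent in state 2
    z1 = 2 * lam * c / det                          # mean time spent in state 1
    return z2 + z1

if __name__ == "__main__":
    lam, mu = 1e-4, 1.0  # assumed rates per hour
    for c in (0.9, 0.99, 0.999, 1.0):
        print(f"c={c}: MTTF={mttf_imperfect_coverage(lam, mu, c):.1f} hours")
```

Two limiting cases give a sanity check: with c = 0 every first fault is fatal and the MTTF reduces to 1/(2*lam), and with c = 1 and no repair it reduces to the familiar 3/(2*lam) of a 2-unit parallel system.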

47 Source of fault coverage data
Measurement data from an operational system
– Large amount of data needed
– Improved instrumentation needed
Fault-injection experiments
– Expensive but badly needed
– Tools from CMU, Illinois, LAAS (Toulouse)
A fault/error handling submodel (FEHM)
– Phases: detection, location, retry, reconfiguration, reboot
– Estimate duration and probability of success of each phase

48 Redundant System with Finite Detection Switchover Time
– Modify the Markov model with imperfect coverage to allow for finite time to detect as well as imperfect detection.
– You will need to add an extra state, say D.
– The rate at which detection occurs is $\delta$.
– Draw the state diagram and investigate the effects of detection delay on system reliability and mean time to failure.

49 Redundant System with Finite Detection Switchover Time
Assumptions:
– The two units have the same MTTF and MTTR;
– Single shared repair person;
– Average detection/switchover time $t_{sw} = 1/\delta$;
– We need to use a Markov model.

50 Redundant System with Finite Detection Switchover Time (state diagram with states 2, 1D, 1 and 0)

51 Redundant System with Finite Detection Switchover Time
After solving the Markov model, we obtain steady-state probabilities:

52 Closed-form

53 WFS Example

54 A Workstations-Fileserver Example
Computing system consisting of:
– A file-server
– Two workstations
– A computing network connecting them
The system is operational as long as:
– One of the workstations and
– The file-server
are operational. The computer network is assumed to be fault-free.

55 The WFS Example

56 Markov Chain for WFS Example
Assuming exponentially distributed times to failure:
– $\lambda_w$: failure rate of workstation
– $\lambda_f$: failure rate of file-server
Assume that components are repairable:
– $\mu_w$: repair rate of workstation
– $\mu_f$: repair rate of file-server
The file-server has priority for repair over the workstations (such repair priority cannot be captured by non-state-space models).

57 Markov Availability Model for WFS
(State diagram with states (2,1), (1,1), (0,1), (2,0), (1,0), (0,0); arcs labeled with workstation and file-server failure and repair rates.)
Since every state is reachable from every other state, the CTMC is irreducible. Furthermore, all states are positive recurrent.

58 Markov Availability Model for WFS (Continued)
In the figure, the label (i,j) of each state is interpreted as follows:
– i is the number of workstations that are still functioning;
– j is 1 or 0 depending on whether the file-server is up or down, respectively.

59 Markov Availability Model for WFS (Continued)
For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the Q matrix is given by: Q =

60 Markov Model (steady-state)
$\pi$: steady-state probability vector.
These are called steady-state balance equations: rate of flow in = rate of flow out. After solving for $\pi$, we obtain the steady-state availability.

61 Markov Availability Model
We compute the availability of the system. The system is available as long as it is in state (2,1) or (1,1). Instantaneous availability of the system:
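A minimal sketch of the steady-state availability computation for the WFS model follows. The generator is rebuilt from the verbal description (workstations fail independently, a single repair facility, file-server repaired with priority), so the exact transition set is an assumption, as are the numerical rates and all function names. The steady-state vector is found by solving the balance equations with one equation replaced by the normalization condition.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def wfs_generator(lw, lf, mw, mf):
    """Build the generator matrix Q; state (i, j) = (working workstations, server up)."""
    states = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
    index = {s: k for k, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for (i, j) in states:
        moves = []
        if i > 0:
            moves.append(((i - 1, j), i * lw))   # one of i workstations fails
        if j == 1:
            moves.append(((i, 0), lf))           # file-server fails
        if j == 0:
            moves.append(((i, 1), mf))           # file-server is repaired first (priority)
        elif i < 2:
            moves.append(((i + 1, 1), mw))       # workstation repaired only if server is up
        for target, rate in moves:
            q[index[(i, j)]][index[target]] += rate
            q[index[(i, j)]][index[(i, j)]] -= rate
    return states, q

def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 (last balance equation replaced)."""
    n = len(q)
    a = [[q[i][j] for i in range(n)] for j in range(n)]  # transpose of Q
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)

def availability(lw, lf, mw, mf):
    """Steady-state availability: probability of being in state (2,1) or (1,1)."""
    states, q = wfs_generator(lw, lf, mw, mf)
    pi = steady_state(q)
    return sum(p for s, p in zip(states, pi) if s in ((2, 1), (1, 1)))

if __name__ == "__main__":
    # Assumed rates per hour, for illustration only.
    print(availability(lw=1e-4, lf=5e-5, mw=1.0, mf=0.5))
```

Building Q from a rule per state rather than typing the matrix by hand keeps the rows summing to zero by construction and makes the repair-priority assumption explicit in one place.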

62 Markov Availability Model (Continued)

63 Markov Reliability Model with Repair
Assume that the computer system does not recover if both workstations fail, or if the file-server fails.

64 Markov Reliability Model with Repair
States (0,1), (1,0) and (2,0) become absorbing states, while (2,1) and (1,1) are transient states. Note: we have made the simplification that, once the CTMC reaches a system failure state, no further transitions are allowed.

65 Markov Model with Absorbing States
If we solve for $P_{2,1}(t)$ and $P_{1,1}(t)$, then $R(t) = P_{2,1}(t) + P_{1,1}(t)$.
For a Markov chain with absorbing states:
– $A$: the set of absorbing states
– $B = \Omega - A$: the set of remaining states
– $z_{i,j}$: mean time spent in state (i,j) until absorption

66 Markov Model with Absorbing States (Continued)
The mean time to absorption (MTTA) is given as:
$Q_B$ is derived from $Q$ by restricting it to only the states in $B$.
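A minimal sketch of the mean-time-to-absorption computation for the WFS reliability model with repair follows. It relies on the standard result that the vector z of mean times spent in the transient states solves z Q_B = -p_B(0), where p_B(0) is the initial probability vector restricted to B, and that the MTTA is the sum of the entries of z. The restricted generator is assembled under the assumption that workstation repair from (1,1) back to (2,1) remains possible; the rate values and function names are illustrative assumptions as well.

```python
def mean_times_2x2(qb, p0):
    """Solve z * Q_B = -p0 for two transient states using Cramer's rule."""
    (a, b), (c, d) = qb
    det = a * d - b * c
    # Written out: z1*a + z2*c = -p0[0]  and  z1*b + z2*d = -p0[1]
    z1 = (-p0[0] * d + p0[1] * c) / det
    z2 = (-p0[1] * a + p0[0] * b) / det
    return z1, z2

def wfs_mtta_with_repair(lw, lf, mw):
    """MTTA of the WFS reliability model, transient states ordered (2,1), (1,1)."""
    qb = [
        [-(2 * lw + lf), 2 * lw],      # from (2,1): leave at 2*lw + lf, go to (1,1) at 2*lw
        [mw, -(lw + lf + mw)],         # from (1,1): repair back to (2,1) at mw
    ]
    z = mean_times_2x2(qb, p0=(1.0, 0.0))  # system starts in (2,1)
    return sum(z)

if __name__ == "__main__":
    # Assumed rates per hour, for illustration only.
    print(wfs_mtta_with_repair(lw=1e-4, lf=5e-5, mw=1.0))
```

Cramer's rule is enough here because only two transient states remain; a larger set B would call for a general linear solver.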

67 Markov Reliability Model with Repair (Continued) [ ]

68 Markov Reliability Model with Repair (Continued)
Mean time to failure is hours.

69 Markov Reliability Model without Repair
Assume that neither the workstations nor the file-server is repairable.

70 Markov Reliability Model without Repair (Continued)
States (0,1), (1,0) and (2,0) become absorbing states.

71 Markov Reliability Model without Repair (Continued)
Mean time to failure is 9333 hours.
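The 9333-hour figure can be reproduced with a short calculation. The sketch below assumes failure rates of 0.0001 per hour for a workstation and 0.00005 per hour for the file-server; these values are not stated in the transcript and are chosen here because they are consistent with the quoted result. Without repair, the MTTF is the mean time spent in (2,1) plus the mean time spent in (1,1), weighted by the probability that (1,1) is visited at all.

```python
def mttf_no_repair(lw, lf):
    """MTTF of the WFS system without repair, starting in state (2,1)."""
    exit_rate_21 = 2 * lw + lf                 # total rate out of (2,1)
    z21 = 1.0 / exit_rate_21                   # mean time spent in (2,1)
    prob_visit_11 = 2 * lw / exit_rate_21      # first event is a workstation failure
    z11 = prob_visit_11 / (lw + lf)            # mean time in (1,1), if visited
    return z21 + z11

if __name__ == "__main__":
    # Assumed rates per hour (not given in the transcript).
    print(round(mttf_no_repair(lw=1e-4, lf=5e-5)))
```

The same value follows from integrating the reliability function of a 2-workstation parallel block in series with the file-server, which gives 2/(lw + lf) - 1/(2*lw + lf).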

