
1 A. Bobbio, Bertinoro, March 10-14, 2003
Dependability Theory and Methods
5. Markov Models
Andrea Bobbio
Dipartimento di Informatica, Università del Piemonte Orientale “A. Avogadro”, 15100 Alessandria (Italy)
bobbio@unipmn.it - http://www.mfn.unipmn.it/~bobbio
Bertinoro, March 10-14, 2003

2 State-Space-Based Models: States and labeled state transitions
A state can keep track of:
– the number of functioning resources of each type;
– the state of recovery of each failed resource;
– the number of tasks of each type waiting at each resource;
– the allocation of resources to tasks.
A transition:
– can occur from any state to any other state;
– can represent a simple or a compound event.

3 State-Space-Based Models (Continued)
Transitions between states represent the change of the system state due to the occurrence of an event. The model is drawn as a directed graph.
Transition label:
– Probability: homogeneous discrete-time Markov chain (DTMC)
– Rate: homogeneous continuous-time Markov chain (CTMC)
– Time-dependent rate: non-homogeneous CTMC
– Distribution function: semi-Markov process (SMP)

4 Modeler’s Options: Should I Use Markov Models?
State-Space-Based Methods
+ Model dependencies
+ Model fault tolerance and recovery/repair
+ Model contention for resources
+ Model concurrency and timeliness
+ Generalize to Markov reward models for modeling degradable performance

5 Modeler’s Options: Should I Use Markov Models?
+ Generalize to Markov regenerative models for allowing generally distributed event times
+ Generalize to non-homogeneous Markov chains for allowing Weibull failure distributions
+ Performance, availability and performability modeling possible
- Large (exponential) state space

6 In order to fulfill our goals of modeling performance, availability and performability, and of modeling complex systems, we need automatic generation and solution of large Markov reward models.

7 Model-based evaluation
The choice of the model type is dictated by:
– the measures of interest;
– the level of detail of the system behavior to be represented;
– the ease of model specification and solution;
– the representation power of the model type;
– access to suitable tools or toolkits.

8 State space models
A transition represents the change of state of a single component x_i, from state s to state s':
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t+\Delta t) = s' \mid Z(t) = s\}$
where Z(t) is the stochastic process and $\Pr\{Z(t) = s\}$ is the probability of finding Z(t) in state s at time t.

9 State space models
If s → s' represents a failure event of component x_i:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t+\Delta t) = s' \mid Z(t) = s\} = \lambda_i \Delta t + o(\Delta t)$
If s → s' represents a repair event of component x_i:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t+\Delta t) = s' \mid Z(t) = s\} = \mu_i \Delta t + o(\Delta t)$
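A quick numerical check of the first-order term, assuming an exponentially distributed time to failure with a hypothetical rate λ_i = 10⁻³ per hour: the exact probability that the transition fires within Δt is 1 − e^{−λ_i Δt}, and its relative gap to λ_i Δt shrinks proportionally to Δt.

```python
import math

def prob_within(rate: float, dt: float) -> float:
    """Exact probability that an exponential event with this rate fires within dt.

    Uses expm1 so that 1 - exp(-rate*dt) stays accurate when rate*dt is tiny.
    """
    return -math.expm1(-rate * dt)

lam_i = 1e-3  # hypothetical failure rate of component x_i, per hour

for dt in (1.0, 0.1, 0.01):
    exact = prob_within(lam_i, dt)
    first_order = lam_i * dt          # the lambda_i * dt term of the slide
    rel_err = (first_order - exact) / exact
    print(f"dt={dt:<5} exact={exact:.6e} lambda*dt={first_order:.6e} rel.err={rel_err:.1e}")
```

The relative error is roughly λ_i Δt / 2, which is why the linear term is enough when writing the differential equations.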

10 Markov Process: definition

11 Transition Probability Matrix

12 State Probability Vector

13 Chapman-Kolmogorov Equations

14 Time-homogeneous CTMC

15

16 The transition rate matrix

17 C-K Equations for CTMC

18 Solution equations
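The formulas of the slides on Chapman-Kolmogorov equations, the transition rate matrix and the solution equations did not survive the transcript. For reference, a standard statement for a time-homogeneous CTMC, assuming P(t) denotes the row vector of state probabilities and Q = [q_ij] the transition rate matrix (notation chosen here, not recovered from the slides):

```latex
% Q = [q_{ij}]: q_{ij} >= 0 is the rate of the transition i -> j (i != j),
% and the diagonal makes every row of Q sum to zero
q_{ii} = -\sum_{j \neq i} q_{ij}

% Forward (Chapman-Kolmogorov) equations, one per state j:
% rate of buildup = rate of flow in - rate of flow out
\frac{dP_j(t)}{dt} = \sum_{i \neq j} P_i(t)\, q_{ij} - P_j(t) \sum_{k \neq j} q_{jk}
\quad\Longleftrightarrow\quad
\frac{d\mathbf{P}(t)}{dt} = \mathbf{P}(t)\,\mathbf{Q}

% Solution for a given initial probability vector P(0)
\mathbf{P}(t) = \mathbf{P}(0)\, e^{\mathbf{Q}t}
  = \mathbf{P}(0) \sum_{k=0}^{\infty} \frac{(\mathbf{Q}t)^k}{k!}
```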

19 Transient analysis
Given the initial state of the Markov chain, the system of differential equations is written, for each state, from the continuity equation: rate of buildup = rate of flow in - rate of flow out.
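The continuity equations form the linear system dP(t)/dt = P(t)Q. A minimal sketch that integrates it numerically with the classic fourth-order Runge-Kutta method (a generic ODE scheme chosen for illustration, not necessarily the method used in the course), checked on a hypothetical two-state up/down component with failure rate lam and repair rate mu, whose closed-form availability is known.

```python
import math

def forward_rhs(p, Q):
    """dP/dt = P Q: entry j equals sum_{i != j} p_i q_ij (flow in)
    plus p_j q_jj = -(flow out of j), i.e. the continuity equation."""
    n = len(p)
    return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

def transient_rk4(p0, Q, t_end, steps=1000):
    """Integrate the forward equations from P(0) = p0 to t_end with classic RK4."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = forward_rhs(p, Q)
        k2 = forward_rhs([x + h / 2 * k for x, k in zip(p, k1)], Q)
        k3 = forward_rhs([x + h / 2 * k for x, k in zip(p, k2)], Q)
        k4 = forward_rhs([x + h * k for x, k in zip(p, k3)], Q)
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

# Hypothetical repairable component: state 0 = up, state 1 = down.
lam, mu = 0.01, 0.5          # failure and repair rates per hour (illustrative)
Q = [[-lam, lam],
     [mu, -mu]]
t = 10.0
p = transient_rk4([1.0, 0.0], Q, t)
exact_up = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
print(p[0], exact_up)        # RK4 result vs. closed-form A(t)
```

Because every row of Q sums to zero, each RK4 stage preserves the total probability, so sum(p) stays at 1 up to rounding.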

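20 Steady-state condition
If the process reaches a steady-state condition, then the state probabilities no longer change, $\frac{d\mathbf{P}(t)}{dt} = 0$, and the steady-state probability vector π satisfies $\pi \mathbf{Q} = 0$ with $\sum_i \pi_i = 1$.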

21 Steady-state analysis (balance equation)
The steady-state equation can be written as a flow balance equation together with a normalization condition on the state probabilities. Since in steady state (rate of buildup) = rate of flow in - rate of flow out = 0, we obtain, for each state: rate of flow in = rate of flow out (balance equation).
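A minimal sketch of solving the balance equations together with the normalization condition, using plain Gaussian elimination (an illustrative solver; production tools typically use sparse iterative methods). The example chain is a hypothetical two-state up/down component.

```python
def steady_state(Q):
    """Solve pi Q = 0 subject to sum(pi) = 1.

    Written column-wise, pi Q = 0 is Q^T pi^T = 0: one balance equation
    per state. These equations are linearly dependent, so the last one is
    replaced by the normalization condition before Gaussian elimination.
    """
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]   # A = Q^T
    b = [0.0] * n
    A[-1] = [1.0] * n      # sum_i pi_i = 1
    b[-1] = 1.0
    for col in range(n):   # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    pi = [0.0] * n         # back substitution
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][k] * pi[k] for k in range(r + 1, n))) / A[r][r]
    return pi

lam, mu = 0.01, 0.5        # hypothetical up/down component, rates per hour
pi = steady_state([[-lam, lam],
                   [mu, -mu]])
print(pi)                  # equals [mu/(lam+mu), lam/(lam+mu)]
```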

22 2-component system

23 2-component system

24 2-component system

25 2-component series system [block diagram: A1 and A2 in series]
2-component parallel system [block diagram: A1 and A2 in parallel]
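A sketch assuming two independent, non-repairable components A1 and A2 with constant failure rates λ1 and λ2 (hypothetical values below). The 4-state CTMC over (x1, x2), with xi = 1 if Ai is up, then has a product-form solution; the series system is up only in state (1,1), the parallel system in every state except (0,0).

```python
import math

def state_probs(l1, l2, t):
    """P(t) of the 4-state CTMC (x1, x2), starting from (1, 1).

    With independent, non-repairable components the forward equations
    have the product-form solution used below.
    """
    u1, u2 = math.exp(-l1 * t), math.exp(-l2 * t)
    return {(1, 1): u1 * u2, (1, 0): u1 * (1 - u2),
            (0, 1): (1 - u1) * u2, (0, 0): (1 - u1) * (1 - u2)}

def reliability_series(l1, l2, t):
    return state_probs(l1, l2, t)[(1, 1)]          # both components must be up

def reliability_parallel(l1, l2, t):
    return 1.0 - state_probs(l1, l2, t)[(0, 0)]    # at least one component up

l1, l2 = 1e-3, 2e-3   # hypothetical failure rates per hour
for t in (100.0, 1000.0):
    print(t, reliability_series(l1, l2, t), reliability_parallel(l1, l2, t))
```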

26 2-component stand-by system [diagram: units A and B]

27 Repairable system: Availability
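Assuming the repairable system here is the usual single component with two states, up and down, failure rate λ and repair rate μ, starting in the up state, the continuity equation and its solution give the instantaneous and steady-state availability:

```latex
% Assumed two-state chain: up -> down at rate lambda, down -> up at rate mu,
% with P_{up}(0) = 1
\frac{dP_{up}(t)}{dt} = -\lambda P_{up}(t) + \mu P_{down}(t),
\qquad P_{up}(t) + P_{down}(t) = 1

A(t) = P_{up}(t) = \frac{\mu}{\lambda+\mu}
       + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t}

A = \lim_{t \to \infty} A(t) = \frac{\mu}{\lambda+\mu}
  = \frac{MTTF}{MTTF + MTTR}, \qquad MTTF = \frac{1}{\lambda},\; MTTR = \frac{1}{\mu}
```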

28 Repairable system: 2 identical components

29 Repairable system: 2 identical components

30 2-component Markov availability model
– Assume we have a two-component parallel redundant system with repair rate μ.
– Assume that the failure rate of both components is λ.
– When both components have failed, the system is considered to have failed.

31 Markov availability model
– Let the number of properly functioning components be the state of the system.
– The state space is {0, 1, 2}, where 0 is the system down state.
– We wish to examine the effects of shared vs. non-shared repair.

32 Markov availability model
[State diagrams over states 2, 1, 0: non-shared (independent) repair and shared repair]

33 Markov availability model
Note: the non-shared case can be modeled and solved using an RBD or a fault tree (FTREE), but the shared case needs the use of Markov chains.

34 Steady-state balance equations
For any state: rate of flow in = rate of flow out. Considering the shared case, let π_i be the steady-state probability that the system is in state i.

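35 Steady-state balance equations
Shared repair: 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 and 0 → 1 at rate μ. Hence, from the balance equations of states 2 and 0:
$2\lambda\pi_2 = \mu\pi_1 \Rightarrow \pi_1 = \frac{2\lambda}{\mu}\pi_2, \qquad \mu\pi_0 = \lambda\pi_1 \Rightarrow \pi_0 = \frac{2\lambda^2}{\mu^2}\pi_2$
Since $\pi_0 + \pi_1 + \pi_2 = 1$, we have
$\pi_2 = \frac{\mu^2}{\mu^2 + 2\lambda\mu + 2\lambda^2}$
or, for the steady-state availability,
$A_{shared} = \pi_1 + \pi_2 = 1 - \pi_0 = \frac{\mu^2 + 2\lambda\mu}{\mu^2 + 2\lambda\mu + 2\lambda^2}$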

36 Steady-state balance equations (Continued)
Steady-state unavailability for the shared case: $U_{shared} = \pi_0 = 1 - A_{shared}$.
Similarly, for the non-shared case: $U_{non\text{-}shared} = 1 - A_{non\text{-}shared}$.
Downtime in minutes per year $= (1 - A) \times 8760 \times 60$.
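A sketch comparing the two repair policies, assuming shared repair means a single repair facility (0 → 1 at rate μ) and non-shared repair means each failed component is repaired independently (0 → 1 at rate 2μ). Solving the balance equations gives π0 = 2λ²/(μ² + 2λμ + 2λ²) and π0 = (λ/(λ+μ))² respectively; the rates below are hypothetical.

```python
def unavailability_shared(lam, mu):
    # One repair facility: 2->1 at 2*lam, 1->0 at lam, 1->2 and 0->1 at mu.
    return 2 * lam**2 / (mu**2 + 2 * lam * mu + 2 * lam**2)      # pi_0

def unavailability_nonshared(lam, mu):
    # Independent repair: 0->1 at 2*mu; pi_0 = (lam / (lam + mu))**2,
    # i.e. both independently repaired components down at the same time.
    return (lam / (lam + mu)) ** 2

def downtime_minutes_per_year(availability):
    return (1.0 - availability) * 8760 * 60

lam, mu = 1e-3, 0.1        # hypothetical failure/repair rates per hour
for name, u in (("shared", unavailability_shared(lam, mu)),
                ("non-shared", unavailability_nonshared(lam, mu))):
    a = 1.0 - u
    print(f"{name:>10}: A = {a:.8f}, downtime = {downtime_minutes_per_year(a):.1f} min/year")
```

With these rates the shared-repair unavailability is roughly twice the non-shared one, which is the effect the model is meant to expose.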

37 Steady-state balance equations

38 Absorbing states: MTTF

39 Absorbing states: MTTF
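The MTTF slides lost their formulas. A sketch assuming the example is the 2-component parallel system with shared repair, with the system-failure state 0 made absorbing: the mean times z_2, z_1 spent in the transient states solve z_B Q_B = −P_B(0), and MTTF = z_2 + z_1. For a 2×2 Q_B, Cramer's rule gives the closed form (3λ + μ)/(2λ²).

```python
def mttf_parallel_shared_repair(lam, mu):
    """MTTF of 2 identical units in parallel with one shared repair facility.

    Assumed chain: 2 -> 1 at 2*lam, 1 -> 0 at lam, 1 -> 2 at mu; state 0 is
    absorbing. Transient states B = (2, 1), so
        Q_B = [[-2*lam, 2*lam],
               [mu,    -(lam + mu)]]
    and z_B Q_B = -P_B(0) with P_B(0) = (1, 0) is a 2x2 linear system.
    """
    q22, q21 = -2 * lam, 2 * lam
    q12, q11 = mu, -(lam + mu)
    # Column equations: q22*z2 + q12*z1 = -1 and q21*z2 + q11*z1 = 0.
    det = q22 * q11 - q12 * q21
    z2 = -q11 / det     # mean time spent in state 2 before absorption
    z1 = q21 / det      # mean time spent in state 1 before absorption
    return z2 + z1

lam, mu = 1e-3, 0.1    # hypothetical rates per hour
print(mttf_parallel_shared_repair(lam, mu), (3 * lam + mu) / (2 * lam**2))
```

Setting mu = 0 recovers the non-repairable parallel system, MTTF = 1/(2λ) + 1/λ.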

40 Markov Reliability Model with Imperfect Coverage

41 Markov model with imperfect coverage
Next, consider a modification of the 2-component parallel system proposed by Arnold as a model of the duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor, i.e. the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture:

42 Now allow for imperfect coverage c

43 Markov model with imperfect coverage
Assume that the initial state is 2, so that $P_2(0) = 1$ and $P_1(0) = P_0(0) = 0$. Then the system of differential equations is:

44 Markov model with imperfect coverage
After solving the differential equations we obtain $R(t) = P_2(t) + P_1(t)$. From R(t) we can obtain the system MTTF, $MTTF = \int_0^\infty R(t)\,dt$. It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.
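To see how strongly the MTTF depends on c, here is a sketch under assumed transition rates (the state diagram is not in the transcript): from state 2 a fault is covered with probability c (2 → 1 at 2λc) or uncovered (2 → 0 at 2λ(1 − c)); from state 1 the failed unit is repaired (1 → 2 at μ) or the survivor fails (1 → 0 at λ). The rates are hypothetical.

```python
def mttf_imperfect_coverage(lam, mu, c):
    """MTTF of the duplex system with coverage factor c (assumed chain):
      2 -> 1 at 2*lam*c        (covered fault, system reconfigures)
      2 -> 0 at 2*lam*(1 - c)  (uncovered fault, immediate system failure)
      1 -> 2 at mu,  1 -> 0 at lam
    State 0 is absorbing; MTTF = z_2 + z_1 with z_B Q_B = -(1, 0).
    """
    q22, q21 = -2 * lam, 2 * lam * c
    q12, q11 = mu, -(lam + mu)
    det = q22 * q11 - q12 * q21       # = 2*lam*(lam + mu*(1 - c))
    z2 = -q11 / det
    z1 = q21 / det
    return z2 + z1

lam, mu = 1e-3, 0.1   # hypothetical rates per hour
for c in (1.0, 0.999, 0.99, 0.95):
    print(f"c = {c:<6} MTTF = {mttf_imperfect_coverage(lam, mu, c):10.1f} h")
```

With these rates, lowering c from 1 to 0.99 roughly halves the MTTF, because uncovered faults bypass the redundancy entirely.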

45 Source of fault coverage data
– Measurement data from an operational system: a large amount of data is needed; improved instrumentation is needed.
– Fault-injection experiments: expensive but badly needed; tools from CMU, Illinois, LAAS (Toulouse).
– A fault/error handling submodel (FEHM): phases are detection, location, retry, reconfiguration, reboot; estimate the duration and probability of success of each phase.

46 Redundant System with Finite Detection Switchover Time
– Modify the Markov model with imperfect coverage to allow for finite time to detect as well as imperfect detection.
– You will need to add an extra state, say D.
– The rate at which detection occurs is δ.
– Draw the state diagram and investigate the effects of detection delay on system reliability and mean time to failure.

47 Redundant System with Finite Detection Switchover Time
Assumptions:
– the two units have the same MTTF and MTTR;
– a single shared repair person;
– average detection/switchover time $t_{sw} = 1/\delta$;
– we need to use a Markov model.

48 Redundant System with Finite Detection Switchover Time
[State diagram with states 2, 1D, 1 and 0]

49 Redundant System with Finite Detection Switchover Time
After solving the Markov model, we obtain the steady-state probabilities:
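The steady-state probabilities are not preserved in the transcript. A sketch under an assumed transition structure for states 2, 1D, 1, 0 (hypothetical, consistent with the stated assumptions of one shared repair person and mean switchover time 1/δ): 2 → 1D at 2λ, 1D → 1 at δ, 1D → 0 at λ, 1 → 2 at μ, 1 → 0 at λ, 0 → 1 at μ. The system is assumed down while in 1D. The balance equations of states 2, 1D and 0 give the probabilities directly relative to π_2.

```python
def finite_detection_steady_state(lam, mu, delta):
    """Steady-state probabilities of the assumed 4-state model (2, 1D, 1, 0).

    Assumed transitions (hypothetical, not recovered from the slide):
      2  -> 1D at 2*lam   (a unit fails; detection/switchover starts)
      1D -> 1  at delta   (switchover completes)
      1D -> 0  at lam     (second unit fails during switchover)
      1  -> 2  at mu,  1 -> 0 at lam,  0 -> 1 at mu  (single shared repair)
    """
    p2 = 1.0                                  # unnormalized, pi_2 = 1
    p1d = 2 * lam / (delta + lam) * p2        # balance of state 1D
    p1 = 2 * lam / mu * p2                    # balance of state 2
    p0 = lam / mu * (p1d + p1)                # balance of state 0
    total = p2 + p1d + p1 + p0
    return {"2": p2 / total, "1D": p1d / total, "1": p1 / total, "0": p0 / total}

lam, mu = 1e-3, 0.1                           # hypothetical rates per hour
for t_sw in (1e-6, 0.01, 0.1, 1.0):           # mean switchover time 1/delta, hours
    pi = finite_detection_steady_state(lam, mu, 1.0 / t_sw)
    availability = pi["2"] + pi["1"]          # assumption: down in 1D and 0
    print(f"t_sw={t_sw:<6} A={availability:.9f}")
```

As t_sw tends to 0, the result approaches the shared-repair availability of the model without detection delay.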

50 Closed-form

51 WFS Example

52 A Workstations-Fileserver Example
Computing system consisting of:
– a file-server;
– two workstations;
– a computer network connecting them.
The system is operational as long as:
– one of the workstations and
– the file-server are operational.
The computer network is assumed to be fault-free.

53 The WFS Example

54 Markov Chain for WFS Example
Assuming exponentially distributed times to failure:
– λ_w: failure rate of a workstation
– λ_f: failure rate of the file-server
Assume that components are repairable:
– μ_w: repair rate of a workstation
– μ_f: repair rate of the file-server
The file-server has priority for repair over the workstations (such a repair priority cannot be captured by non-state-space models).

55 Markov Availability Model for WFS
[State diagram over states (2,1), (1,1), (0,1), (2,0), (1,0), (0,0), with transitions labeled 2λ_w, λ_w, μ_w, λ_f and μ_f]
Since every state is reachable from every other state, the CTMC is irreducible. Furthermore, since the chain is finite and irreducible, all states are positive recurrent.

56 Markov Availability Model for WFS (Continued)
In the figure, the label (i,j) of each state is interpreted as follows:
– i represents the number of workstations that are still functioning;
– j is 1 or 0 depending on whether the file-server is up or down, respectively.

57 Markov Availability Model for WFS (Continued)
For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the Q matrix is given by:
Q =
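The Q matrix itself is missing from the transcript. A sketch that builds it from the verbal description, under two assumptions worth flagging: the file-server is always repaired first (workstations are repaired only while the file-server is up), and working workstations can still fail while the file-server is down. The rates are hypothetical; the steady-state solver is the same elimination scheme as for the balance equations.

```python
def wfs_generator(lw, lf, mw, mf):
    """Generator Q of the WFS availability CTMC; state (i, j) = (working
    workstations, file-server up?), ordered as on the slide."""
    states = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
    idx = {s: k for k, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]

    def add(src, dst, rate):
        Q[idx[src]][idx[dst]] += rate
        Q[idx[src]][idx[src]] -= rate          # rows of Q sum to zero

    for i, j in states:
        if i > 0:
            add((i, j), (i - 1, j), i * lw)    # one of i workstations fails
        if j == 1:
            add((i, j), (i, 0), lf)            # file-server fails
            if i < 2:
                add((i, j), (i + 1, j), mw)    # one workstation repaired
        else:
            add((i, j), (i, 1), mf)            # file-server repaired first
    return states, Q

def steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 (last balance equation replaced)."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]   # A = Q^T
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][k] * pi[k] for k in range(r + 1, n))) / A[r][r]
    return pi

states, Q = wfs_generator(lw=1e-4, lf=5e-5, mw=1.0, mf=0.5)  # hypothetical rates/hour
pi = steady_state(Q)
availability = pi[states.index((2, 1))] + pi[states.index((1, 1))]
print(f"steady-state availability A = {availability:.8f}")
```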

58 Markov Model (steady-state)
π: steady-state probability vector, satisfying $\pi \mathbf{Q} = 0$ and $\sum_i \pi_i = 1$.
These are called the steady-state balance equations: rate of flow in = rate of flow out.
After solving for π, we obtain the steady-state availability.

59 Markov Availability Model
We compute the availability of the system: the system is available as long as it is in state (2,1) or (1,1).
Instantaneous availability of the system: $A(t) = P_{2,1}(t) + P_{1,1}(t)$; steady-state availability: $A = \pi_{2,1} + \pi_{1,1}$.

60 Markov Availability Model (Continued)

61 Markov Reliability Model with Repair
Assume that the computer system does not recover if both workstations fail or if the file-server fails.

62 Markov Reliability Model with Repair
States (0,1), (1,0) and (2,0) become absorbing states, while (2,1) and (1,1) are transient states.
Note: we have made the simplification that, once the CTMC reaches a system failure state, we do not allow any more transitions.

63 Markov Model with Absorbing States
If we solve for $P_{2,1}(t)$ and $P_{1,1}(t)$, then $R(t) = P_{2,1}(t) + P_{1,1}(t)$.
For a Markov chain with absorbing states:
– A: the set of absorbing states;
– B = Ω − A: the set of remaining (transient) states, Ω being the whole state space;
– z_{i,j}: mean time spent in state (i,j) until absorption.

64 Markov Model with Absorbing States (Continued)
The mean time to absorption MTTA is given as:
$MTTA = \sum_{(i,j) \in B} z_{i,j}$, where the row vector $z_B$ solves $z_B \mathbf{Q}_B = -\mathbf{P}_B(0)$
and Q_B is derived from Q by restricting it to only the states in B.
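A minimal sketch of the MTTA computation: solve z_B Q_B = −P_B(0) as the transposed linear system, then sum z_B. It is applied to the WFS reliability model with transient states (2,1) and (1,1). The rates λ_w = 10⁻⁴/h, λ_f = 5·10⁻⁵/h and μ_w = 1/h are an assumption (the transcript does not list them), chosen because they reproduce the MTTF values reported on the following slides; setting μ_w = 0 gives the model without repair.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]   # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def mtta(Q_B, p0_B):
    """Mean time to absorption: z_B Q_B = -P_B(0), MTTA = sum(z_B).

    The row-vector equation is solved as Q_B^T z^T = -P_B(0)^T.
    """
    n = len(Q_B)
    QT = [[Q_B[j][i] for j in range(n)] for i in range(n)]
    z = solve(QT, [-p for p in p0_B])
    return sum(z), z

def wfs_reliability_QB(lw, lf, mw):
    # Transient states (2,1), (1,1); (2,0), (1,0), (0,1) are absorbing.
    return [[-(lf + 2 * lw), 2 * lw],
            [mw, -(mw + lf + lw)]]

lw, lf, mw = 1e-4, 5e-5, 1.0  # assumed rates per hour (see lead-in)
with_repair, _ = mtta(wfs_reliability_QB(lw, lf, mw), [1.0, 0.0])
without_repair, _ = mtta(wfs_reliability_QB(lw, lf, 0.0), [1.0, 0.0])
print(round(with_repair), round(without_repair))   # prints: 19992 9333
```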

65 Markov Reliability Model with Repair (Continued)

66 Markov Reliability Model with Repair (Continued)
The mean time to failure is 19992 hours.

67 Markov Reliability Model without Repair
Assume that neither the workstations nor the file-server is repairable.

68 Markov Reliability Model without Repair (Continued)
States (0,1), (1,0) and (2,0) become absorbing states.

69 Markov Reliability Model without Repair (Continued)
The mean time to failure is 9333 hours.

