RELIABILITY, MAINTAINABILITY & AVAILABILITY INTRODUCTION


1 RELIABILITY, MAINTAINABILITY & AVAILABILITY INTRODUCTION
International Society of Logistics (SOLE) slides provided by Frank Vellella, C.P.L.; Ken East, C.P.L. & Bernard Price, C.P.L. Slides for this overview of Reliability, Availability and Maintainability (RAM) were initially provided as a Society of Logistics (SOLE) presentation by Frank Vellella and Ken East and later updated by Bernie Price. This SOLE presentation provides an introduction to Reliability, Maintainability and Availability terminology, in that order. Some reliability and availability mathematical concepts are also explained, which can deepen understanding of the terminology for those who follow the math.

2 System Reliability
The probability of performing a mission action without a mission failure within a specified mission time t. A system with a 90% reliability has a 90% probability that the system will operate the mission duration without a critical failure. The failure rate, Lambda, provides the frequency of failure occurrences over time. The random variable in Reliability is time-to-failure. The reliability of a system used in Requirement Documents represents the probability that the system will perform a mission action without a mission failure within a specified mission time, represented as t. A system with a 90% reliability has a 90% probability that the system will operate the mission duration t without a critical failure. The failure rate, represented as Lambda (λ), provides the frequency of failure occurrences over time. The reciprocal of the failure rate, 1/λ, is the Mean Time To Failure (MTTF), typically in units of hours per failure. When the failure rate remains constant over the system's life, the time to failure is exponentially distributed, and the system's reliability R as a function of the mission time t is the exponent of minus λ times t:

R(t) = e^(-λt)   (λ = failure rate, MTTF = 1/λ)
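A minimal numerical sketch of this relationship in Python (the MTTF and mission time below are assumed illustrative values, not from the slides):

    import math

    mttf = 500.0       # assumed mean time to failure, hours
    lam = 1.0 / mttf   # failure rate lambda is the reciprocal of MTTF
    t = 72.0           # assumed mission time, hours

    reliability = math.exp(-lam * t)   # R(t) = e^(-lambda * t)
    print(f"R({t:.0f} h) = {reliability:.4f}")   # about 0.8659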

3 Additional Time to Failure Terminology
Mean Time Between Operational Mission Failure (MTBOMF) – System mission reliability often associated with an operating mission requirement, where the failure causes a mission abort or mission degradation. Mean Time Between Failure (MTBF) – System reliability typically associated with a design specification based on operating use. Per the failure definition, the failure may be to any item causing a logistics demand or just critical items within the system. Mean Calendar Time Between Failure (MCTBF) – System reliability typically associated with system operational availability, based on calendar time per failure. Failure Factor (FF) – Component logistics reliability typically used for logistics support, expressed in terms of failures or demands per 100 systems per year. Additional Mean Time to Failure terminology frequently used is listed on this slide. The Mean Time Between Operational Mission Failure (MTBOMF) is often applied to a system mission reliability requirement where the system's failure causes a mission abort or mission degradation. Mean Time Between Failures (MTBF) is often applied in a system reliability design specification and is based on system operating time only. Depending on the failure definition used, the failure may be any item failure causing a logistics demand or just a critical item failure within the system. The Mean Calendar Time Between Failures (MCTBF) is applied as a system reliability term tied to Operational Availability (Ao), which is based on the calendar time per failure. The Failure Factor is typically a component logistics reliability term tied to the calendar time frequency of logistics support. A Failure Factor is expressed in terms of failures or demands per 100 systems per year.
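Since a Failure Factor is expressed per 100 systems per year, it can be related to an operating-time metric such as MTBF once a usage assumption is made. A sketch of one such conversion, where the annual operating hours per system is an assumed planning figure (programs define this mapping their own way):

    # Convert a Failure Factor (failures per 100 systems per year) to an
    # approximate MTBF, given an assumed annual usage per system.
    annual_op_hours = 2000.0   # assumed operating hours per system per year
    ff = 25.0                  # assumed FF: 25 failures per 100 systems per year

    failures_per_system_year = ff / 100.0
    mtbf = annual_op_hours / failures_per_system_year
    print(f"MTBF = {mtbf:.0f} operating hours per failure")   # 8000 h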

4 System Requirement Example
What is the MTBOMF of a system required to have a 91% reliability over a 72 hour mission pulse? As a system reliability requirement example, what is the Mean Time Between Operational Mission Failure (MTBOMF) of a system if the system is required to have a 91% reliability over a 72 hour mission pulse? The first equation repeats the reliability formula previously shown and then substitutes 1 divided by MTBOMF as the mission failure rate: R(t) = e^(-t/MTBOMF). The value of Reliability R(t) was provided as 0.91 for a mission time t of 72 hours. Applying the natural logarithm (ln) to each side of the equation removes the exponent, leaving -72 divided by MTBOMF on one side, while the ln of 0.91 is -0.0943. Multiplying each side of the equation by -MTBOMF yields 0.0943 times MTBOMF equals 72. Therefore, the MTBOMF in this example is approximately 763 operating hours per mission failure.
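The same computation, sketched in Python using the values from this example:

    import math

    r = 0.91   # required mission reliability
    t = 72.0   # mission time, hours

    # Solve R = exp(-t / MTBOMF) for MTBOMF; ln(0.91) is about -0.0943.
    mtbomf = -t / math.log(r)
    print(f"MTBOMF = {mtbomf:.0f} operating hours per mission failure")   # about 763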

5 System Reliability Terminology
System - Collection of components, subsystems and/or assemblies arranged to a specific design in order to achieve desired functions with acceptable performance and reliability The types of components, their quantities, their qualities and the manner in which they are arranged within the system have a direct effect on the system's reliability The reliability relationship between a system and its components is sometimes misunderstood or oversimplified An example non-valid statement is: If all components in a system have a 90% reliability at a given time, the reliability of the system is 90% for that time. System Reliability Terminology will now continue by breaking down the system. A system is a collection of components, subsystems or assemblies arranged to a specific design in order to achieve its desired functions with acceptable performance and reliability. The types of components, their quantities, their qualities and the manner in which they are arranged within the system have a direct effect on the system’s reliability. The reliability relationship between a system and its components is sometimes misunderstood or oversimplified. For example, the following statement is not valid. If all components in a system have a 90% reliability at a given time, the reliability of the system is 90% for that time. Unfortunately, a poor understanding of the relationship between a system and its constituent components can result in statements like this being accepted as factual, when in reality they are false.

6 System Reliability Terminology
Block Diagrams are widely used in engineering and science and exist in many different forms. Reliability Block Diagram (RBD) Describes the interrelation between the components to define the system Graphical representation of the system components and how they are reliability-wise related (connected) RBD may differ from how the components are physically connected After defining properties of each block in a system, the blocks can be connected in a reliability-wise manner to create a RBD for the system Block diagrams are widely used in engineering and science and exist in many different forms. A Reliability Block Diagram (RBD) can be used to describe the interrelation between the components in defining the system. A RBD is typically a graphical representation of the system components and how they are related or connected reliability-wise. The RBD may differ from how the components are physically connected. After defining the properties of each block in a system, the blocks can be connected in a reliability-wise manner to create a RBD for the system. The RBD provides a visual representation of the way the reliability blocks are arranged.

7 Example Reliability Block Diagram
RBD of a simplified computer system with a redundant fan configuration A Reliability Block Diagram of a simplified computer system with a redundant fan configuration is shown in this illustration. If the power supply, processor or hard drive fails, the computer system will fail. If both fans in the computer system fail to operate, the computer will eventually overheat and fail. However, if one of the redundant fans fails with the other fan still operating, the computer system will continue to operate.
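A sketch of how this RBD could be evaluated numerically; the component reliabilities below are assumed for illustration and do not come from the slide:

    # Series chain: power supply, processor, hard drive.
    # Parallel pair: two fans; the system survives if at least one fan works.
    r_power, r_cpu, r_disk = 0.99, 0.995, 0.98   # assumed mission reliabilities
    r_fan = 0.90                                 # assumed per-fan reliability

    r_fans = 1 - (1 - r_fan) ** 2     # redundant pair fails only if both fans fail
    r_system = r_power * r_cpu * r_disk * r_fans
    print(f"System reliability = {r_system:.4f}")   # about 0.9557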

8 System Reliability Block Diagram
The System Reliability Function The RBD represents the system’s functioning state (i.e. success or failure) in terms of the functioning states of its components The RBD demonstrates the effect of the success or failure of a component on the success or failure of the system If all components in a system must succeed for the system to succeed, the components are arranged reliability-wise in series If one of two components must succeed in order for the system to succeed, those two components are arranged reliability-wise in parallel   The reliability-wise arrangement of components is directly related to the derived mathematical description of the system The system's reliability function uses probabilistic methods for defining the system reliability from the component reliabilities System reliability is often described as a function of time The system Reliability Block Diagram (RBD) created represents the system’s functioning state for success or failure in terms of the functioning states of its components. In other words, this diagram demonstrates the effect of the success or failure of a component on the success or failure of the system. For example, if all components in a system must succeed for the system to succeed, the components are arranged reliability-wise in series. If one of two components must succeed in order for the system to succeed, those two components are arranged reliability-wise in parallel. The reliability-wise arrangement of components is directly related to the derived mathematical description of the system. With a RBD, the system’s reliability function uses probabilistic methods for defining the system reliability from the component reliabilities. System reliability is often described as a success function of mission time.

9 Series Configuration A failure of any component results in failure for the entire system When considering a system at the subsystem level, subsystems are often arranged reliability-wise in a series configuration Example: a PC may consist of four basic subsystems: the motherboard, hard drive, power supply and the processor A failure to any of these subsystems will cause a system failure All units in a series system must succeed for the system to succeed In a series configuration, a failure of any component results in a failure for the entire system. In most cases, when considering a system at the subsystem level, subsystems are arranged reliability-wise in a series configuration. For example, a Personal Computer may consist of four basic subsystems: the motherboard, the hard drive, the power supply and the processor. These are reliability-wise in series because a failure of any of these subsystems will cause a system failure. In other words, all units in a series system must succeed for the system to succeed.

10 Series Configuration System Reliability
The reliability of the system is the probability that unit 1 succeeds and unit 2 succeeds and all of the other units in the system succeed All n units must succeed for the system to succeed The reliability of the system is then given by: For a series configuration system, the reliability of the system is the probability that unit 1 succeeds and unit 2 succeeds and all of the other units in the system succeed. So, all n units must succeed for the system to succeed. The reliability of the system is then the probability that component X1 succeeds, times the probability that component X2 succeeds given that X1 succeeded, and so on up to the probability that component Xn succeeds given that X1 through Xn-1 succeeded:

Rs = P(X1) · P(X2 | X1) · ... · P(Xn | X1, X2, ..., Xn-1)

When the failures of components are not dependent on the failures of other components in the system, the system reliability simply becomes the multiplication of the probabilities of each component succeeding. In the case of independent components, this becomes:

Rs = R1 · R2 · ... · Rn

11 Series System Reliability Example
Three subsystems are reliability-wise in series & make up a system Subsystem 1 has a reliability of 99.5% for a 100 hour mission Subsystem 2 has a reliability of 98.7% for a 100 hour mission Subsystem 3 has a reliability of 97.3% for a 100 hour mission What is the overall reliability of the system for a 100 hour mission? Since the reliabilities of the subsystems are specified for 100 hours, the reliability of the system for a 100 hour mission is simply:

Rs = 0.995 · 0.987 · 0.973 = 0.9555

As a system reliability example, three subsystems are reliability-wise in series to make up a system. Subsystem 1 has a reliability of 99.5% for a 100 hour mission, Subsystem 2 has a reliability of 98.7% for a 100 hour mission, and Subsystem 3 has a reliability of 97.3% for a 100 hour mission. What is the overall reliability of the system for a 100 hour mission? Since the reliabilities of all the subsystems are specified for 100 hours, the reliability of the system for a 100 hour mission is simply the reliability of subsystem 1 times the reliability of subsystem 2 times the reliability of subsystem 3. The reliability of the system for a 100 hour mission is computed to be 95.55%.
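A one-line check of this example in Python:

    from math import prod

    r_series = prod([0.995, 0.987, 0.973])   # series: multiply subsystem reliabilities
    print(f"{r_series:.4f}")                 # 0.9555, i.e. 95.55%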

12 Basic System Reliability
Effect of Component Reliability in a Series System In a series configuration, the component with the smallest reliability has the biggest effect on the system's reliability Saying: A chain is only as strong as its weakest link Good example of the effect of a component in a series system In a chain, all the rings are in series and if any of the rings break, the system fails The weakest link in the chain is the one that will break first The weakest link dictates the strength of the chain in the same way that the weakest component/subsystem dictates the reliability of a series system As a result, the reliability of a series system is always less than the reliability of the least reliable component. In a series configuration, the component with the smallest reliability has the biggest effect on the system's reliability. There is a saying that a chain is only as strong as its weakest link. This is a good example of the effect of a component in a series system. In a chain, all the rings are in series and if any of the rings break, the system fails. In addition, the weakest link in a chain is the one that will likely break first. The weakest link dictates the strength of the chain in the same way that the weakest component or subsystem dictates the reliability of a series system. As a result, the reliability of a series system is always less than the reliability of the least reliable component. Therefore, if one of the components has a 90% reliability, the reliability of the system with series components will be less than 90% for that time.

13 Redundant Configuration
Simple Parallel Systems This slide illustrates a redundant configuration. In a redundant configuration, the items are viewed in parallel rather than series. n simple parallel systems are shown in the picture.

14 Redundant System Configuration
In a simple parallel system, at least one of the units must succeed for the system to succeed Units in parallel are also referred to as redundant units Redundancy is a very important aspect of system design & reliability because adding redundancy is one of several methods to improve system reliability Redundancy is widely used in the aerospace industry and generally used in mission critical systems In a simple parallel system, at least one of the units must succeed for the system to succeed. Units in parallel are also referred to as redundant units. Redundancy is a very important aspect of system design and reliability because adding redundancy is one of several methods to improve system reliability. Redundancy is often used at the weakest reliability point when designing reliability into the system. Redundancy is widely used in the aerospace industry and generally used in mission critical systems, especially where failure of the system could be catastrophic.

15 Parallel Configuration System Reliability
The probability of failure, or unreliability, for a system with n statistically independent parallel components is the probability that unit 1 fails and unit 2 fails and all of the other units in the system fail In a parallel system, all n units must fail for the system to fail If unit 1 succeeds or unit 2 succeeds or any of the n units succeeds, then the system succeeds The unreliability of the system is then given by:

Qs = P(X1 fails) · P(X2 fails | X1 failed) · ... · P(Xn fails | X1, X2, ..., Xn-1 failed)

The probability of failure, or unreliability, for a system with n statistically independent parallel components is the probability that unit 1 fails and unit 2 fails and all of the other units in the system fail. So in a parallel system, all n units must fail for the system to fail. To put it another way, if unit 1 succeeds or unit 2 succeeds or any of the n units succeeds, then the system succeeds. The unreliability of the parallel, redundant system is then given by the probability that component X1 fails, times the probability that X2 fails given that X1 failed, and so on up to the probability that Xn fails given that X1 through Xn-1 failed.

16 Redundant System Unreliability
In the case of independent components, the system unreliability is:

Qs = Q1 · Q2 · ... · Qn

Or, in terms of component reliability:

Qs = (1 - R1) · (1 - R2) · ... · (1 - Rn)

When the failures of components are not dependent on the failures of other components in the system, the system unreliability, shown as Qs, becomes the multiplication of the probabilities of each component not succeeding for the mission duration. In terms of component reliability, each component's unreliability is 1 minus its reliability, the probability of component non-success during the mission.

17 Redundant System Reliability
With the series system, the system reliability is the product of the component reliabilities With the parallel system, the overall system unreliability is the product of the component unreliabilities The reliability of the parallel system is then given by:

Rs = 1 - Qs = 1 - (1 - R1) · (1 - R2) · ... · (1 - Rn)

Observe the contrast of redundant system reliability compared to series system reliability. With the series system, the system reliability is the product of the component reliabilities. With the parallel system, the overall system unreliability is the product of each component's unreliability. The reliability of the parallel system is then given by 1 minus the unreliability of the system. In turn, this is equal to 1 minus the product of each parallel component's unreliability, where each unreliability is 1 minus that component's reliability.
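The contrast can be captured in two small helper functions (a minimal sketch; the component reliabilities passed in would come from the RBD):

    from math import prod

    def series_reliability(rs):
        # All units must succeed: Rs = R1 * R2 * ... * Rn
        return prod(rs)

    def parallel_reliability(rs):
        # All units must fail for the system to fail:
        # Rs = 1 - (1 - R1)(1 - R2)...(1 - Rn)
        return 1 - prod(1 - r for r in rs)

    print(series_reliability([0.9, 0.9]))    # about 0.81 - series lowers reliability
    print(parallel_reliability([0.9, 0.9]))  # about 0.99 - redundancy raises it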

18 Redundant System Reqt. Example
What is the MTBOMF of each system when it is required to have 91% probability that 1 of 2 systems operate failure free over a 72 hour mission pulse? As a redundant system requirement example, what is the MTBOMF of each system when it is required to have a 91% probability that 1 of 2 systems operates failure free over a 72 hour mission pulse? The first equation represents the reliability of the redundant pair as 1 minus its unreliability. The unreliability of the redundant system is 9% when its reliability is 91%. The unreliability of both systems failing, shown as Q², is therefore equal to 0.09. Taking the square root of each side of the equation yields the unreliability of one system to be 30%, which provides a 0.7 reliability per system. Using the reliability of 0.7 for each system in the mission reliability equation, the MTBOMF value can be calculated: MTBOMF = -72 / ln(0.7). After going through each mathematical step, the MTBOMF of each system is approximately 202 operating hours per mission failure. Note that if 2 systems perform the mission and only 1 system must not fail over the 72 hour mission pulse, an MTBOMF of about 202 hours per system will satisfy the requirement. In a previous example when only 1 system was used to accomplish the mission, the system MTBOMF was required to be about 763 operating hours per mission failure.
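A sketch of the full calculation in Python:

    import math

    q_pair = 1 - 0.91            # allowed unreliability of the 1-of-2 pair: 0.09
    q_each = math.sqrt(q_pair)   # per-system unreliability: 0.3
    r_each = 1 - q_each          # per-system mission reliability: 0.7

    mtbomf = -72.0 / math.log(r_each)   # solve r_each = exp(-72 / MTBOMF)
    print(f"Per-system MTBOMF = {mtbomf:.0f} hours")   # about 202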

19 Redundant System Reliability Example
Three subsystems are reliability-wise in parallel & make up a system Subsystem 1 has a reliability of 99.5% for a 100 hour mission Subsystem 2 has a reliability of 98.7% for a 100 hour mission Subsystem 3 has a reliability of 97.3% for a 100 hour mission What is the overall reliability of the system for a 100 hour mission? Since the reliabilities of the subsystems are specified for 100 hours, the reliability of the system for a 100 hour mission is simply:

Rs = 1 - (1 - 0.995) · (1 - 0.987) · (1 - 0.973) = 1 - 0.000001755 ≈ 0.999998

In another redundant system reliability example, consider a system consisting of three subsystems arranged reliability-wise in parallel. Subsystem 1 has a reliability of 99.5% for a 100 hour mission, subsystem 2 has a reliability of 98.7% for a 100 hour mission, and subsystem 3 has a reliability of 97.3% for a 100 hour mission. What is the overall reliability of the system for a 100 hour mission? Since the reliabilities of the subsystems are specified for 100 hours, the reliability of the system for a 100 hour mission is simply 1 minus the product of the 3 subsystems' unreliabilities. When the unreliabilities are multiplied together, they yield an unreliability close to 2 millionths for the system. With a redundant block diagram, the system becomes extremely reliable at about 99.9998%. In a previous example with the same subsystems in series, the reliability was only 95.55%.
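The same check in Python:

    from math import prod

    q_system = prod([1 - 0.995, 1 - 0.987, 1 - 0.973])   # about 1.755e-06
    print(f"R_system = {1 - q_system:.4%}")              # about 99.9998%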

20 Series Reliability Block Diagram
All elements (A, B, C, …, N) must work for equipment T to work. The reliability of T is:

RT = RA · RB · RC · … · RN

This is a summary of an equipment reliability that applies a series reliability block diagram. The series reliability for equipment T needs all elements, A, B, C through N, to work for equipment T to work. The reliability of equipment T is the reliability of A times the reliability of B times the reliability of C and so on, until multiplied by the reliability of N.

21 Block Diagrams with Parallel Reliability and Series Reliability
At least one of the elements (A, B) and element C must work for equipment T to work. The reliability of T is:

RT = [1 - (1 - RA) · (1 - RB)] · RC = (RA + RB - RA·RB) · RC

This is a summary of an equipment reliability that applies both a parallel reliability block diagram and a series reliability block diagram. At least one of the elements A or B, and element C, must work for equipment T to work. The reliability of equipment T is 1 minus the product of the unreliabilities of A and B, multiplied by the reliability of C. After manipulating the mathematics associated with the unreliability of A and B, the result becomes the reliability of A plus the reliability of B minus the product of the reliability of A and the reliability of B. In a Venn diagram, this redundancy represents the full areas of A and B minus the intersection of A and B.
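A sketch of this mixed configuration with assumed element reliabilities (the values are illustrative, not from the slide):

    r_a, r_b, r_c = 0.9, 0.8, 0.95   # assumed element reliabilities

    r_ab = 1 - (1 - r_a) * (1 - r_b)   # parallel pair: equals r_a + r_b - r_a*r_b
    r_t = r_ab * r_c                   # then in series with element C
    print(f"R_T = {r_t:.3f}")          # 0.931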

22 Non-Repairable Systems
Non-repairable systems do not get repaired when they fail Specifically, components of the system are not removed or replaced when the system fails because it does not make economic sense to repair the system Repairing a four-year-old microwave oven is economically unreasonable when the repair costs approximately as much as purchasing a new unit Non-repairable systems do not get repaired when they fail. Specifically, components of the system are not removed or replaced when the system fails because it does not make economic sense to repair the system. A non-repairable product does not necessarily mean that it could not be repaired. It just makes economic sense to not repair the product. Repairing a four-year-old microwave oven is economically unreasonable when the repair would cost approximately as much as purchasing a new unit.

23 Repairable Systems Repairable systems get repaired when they fail
Repairs are done by replacing the failed components in the system Example: An automobile is a repairable system; when rendered inoperative by a component or subsystem failure, it is typically restored by removing & replacing the failed components rather than purchasing a new automobile Failure distributions and repair distributions apply to repairable systems A failure distribution describes the time it takes for a component to fail A repair distribution describes the time it takes to repair a component (time-to-repair instead of time-to-failure) For repairable systems, the failure distribution itself is not a sufficient measure of system performance because it does not account for the repair distribution A performance criterion called availability is calculated to account for both the failure and repair distributions On the other hand, repairable systems get repaired when they fail. The repairs are done by repairing or replacing the failed components in the system. For example, an automobile is a repairable system; when rendered inoperative by a component or subsystem failure, it is typically restored by removing and replacing the failed components rather than purchasing a new automobile. Both failure distributions and repair distributions apply to repairable systems. A failure distribution describes the time it takes for a component to fail. It is based on a time-to-failure variable. A repair distribution describes the time it takes to repair a component after failure. It is based on a time-to-repair variable. For repairable systems, the failure distribution itself is not a sufficient measure of system performance because it does not account for the repair distribution. A performance criterion called availability is calculated to account for both the failure and repair distributions.

24 System Maintainability/Maintenance
Deals with repairable system maintenance System Maintainability involves the time it takes to restore a system to a specified condition when maintenance is performed by personnel having specified skills using prescribed procedures and resources In general, maintenance is defined as any action that restores failed units to an operational condition or retains non-failed units in an operational state Maintenance plays a vital role in the life of a system affecting the system's overall reliability, availability, downtime, cost of operation, etc. Types of system maintenance actions: corrective maintenance, preventive maintenance & inspections System maintainability and maintenance deals with repairable system maintenance. System maintainability involves the time it takes to restore a system to a specified condition when maintenance is performed by forward support personnel having specified skills using prescribed procedures and resources to restore the system. In general, maintenance is defined as any action that restores failed units to an operational condition or retains non-failed units in an operational state. For repairable systems, maintenance plays a vital role in the life of a system. It affects the system’s overall reliability, availability, downtime and cost of operation. Generally, there are three types of system maintenance actions. They are corrective maintenance, preventive maintenance and inspections.

25 Corrective Maintenance
Actions taken to restore a failed system to operational status Usually involves replacing or repairing the component that is responsible for the failure of the overall system Corrective maintenance is performed at unpredictable intervals because a component's failure time is not known a priori The objective of corrective maintenance is to restore the system to satisfactory operation within the shortest possible time Corrective maintenance consists of actions taken to restore a failed system to operational status. This usually involves replacing or repairing the component that is responsible for the failure of the overall system. Corrective maintenance is performed at unpredictable intervals because a component's failure time is not known a priori, or beforehand. The objective of corrective maintenance is to restore the system to satisfactory operation within the shortest possible time.

26 Corrective Maintenance Steps
Diagnosis of the problem Maintenance technician takes time to locate the failed parts or otherwise satisfactorily assess the cause of the system failure Repair and/or replacement of faulty component Action is taken to address the cause, usually by replacing or repairing the components that caused the system to fail Verification of the repair action Once components have been repaired or replaced, the maintenance technician must verify that the system is again successfully operating Corrective maintenance is typically carried out in three steps. The first step is diagnosis of the problem. In diagnosing the problem, the maintenance technician takes time to locate the failed parts or otherwise satisfactorily assess the cause of the system failure. The second step is to repair and/or replace the faulty component. Once the cause of system failure has been determined, action is taken to address the cause. This action is usually replacing or repairing the components that caused the system to fail. The final step is verification of the repair action. Once the components have been repaired or replaced, the maintenance technician must verify that the system is again successfully operating.

27 Preventive Maintenance
The practice of replacing components or subsystems before they fail to promote continuous system operation The preventive maintenance schedule is based on: Observation of past system behavior Component wear-out mechanisms Knowledge of components vital to continued system operation Cost is always a factor in the scheduling of preventive maintenance Reliability may be a factor, but cost is a more general term because reliability & risk can be expressed in terms of cost In many circumstances, it may be financially better to replace parts or components that have not failed at predetermined intervals rather than wait for a system failure that may result in a costly disruption in operations Preventive maintenance, unlike corrective maintenance, is the practice of replacing components or subsystems before they fail in order to promote continuous system operation. The schedule for preventive maintenance is based on observation of past system behavior, component wear-out mechanisms, and knowledge of components vital to continued system operation. Cost is always a factor in the scheduling of preventive maintenance. Improving reliability may also be a factor, but cost is a more general term because reliability and risk can be expressed in terms of cost. In many circumstances, it may be financially better to replace parts or components that have not failed at predetermined intervals rather than wait for a system failure that may result in a costly disruption in operations.

28 Inspections Used to uncover hidden failures (also called dormant failures) In general, no maintenance action is performed on the component during an inspection unless the component is found failed, causing a corrective maintenance action to be initiated Sometimes there may be a partial restoration of the inspected item performed during an inspection For example, when checking the motor oil in a car between scheduled oil changes, one might occasionally add some oil in order to keep it at a constant level Inspections are used to uncover hidden failures, which are also called dormant failures. In general, no maintenance action is performed on the component during an inspection unless the component is found failed, in which case a corrective maintenance action is initiated. However, there might be cases where a partial restoration of the inspected item would be performed during an inspection. For example, when checking the motor oil in a car between scheduled oil changes, one might occasionally add some oil in order to keep it at a constant level.

29 Maintenance Downtime There is time associated with each maintenance action, i.e. amount of time it takes to complete the action This time is referred to as downtime & defined as the length of time an item is not operational There are a number of different factors that can affect the length of downtime Physical characteristics of the system Repair crew availability Spare part availability & other ILS factors Human factors & Environmental factors There are two Downtime categories for these factors: Waiting Downtime & Active Downtime Maintenance actions that are preventive or corrective are not performed instantaneously. There is time associated with each maintenance action including the amount of time it takes to complete the action. This time is usually referred to as downtime and it is defined as the length of time an item is not operational. There are a number of different factors that can affect the length of downtime. These factors may be the physical characteristics of the system, repair crew availability, spare part availability and other Integrated Logistics Support factors. Human factors and environmental factors may also play a role in extending downtime. There are two downtime categories based on these factors. They are Waiting Downtime and Active Downtime.

30 Maintenance Downtime Waiting Downtime Active Downtime
The time during which the equipment is inoperable, but not yet undergoing repair For example, the time it takes for replacement parts to be shipped, administrative processing time, etc. Active Downtime The time during which the equipment is inoperable and actually undergoing repair The active downtime is the time it takes repair personnel to perform a repair or replacement The length of the active downtime is greatly dependent on human factors and the design of the equipment For example, the ease of accessibility of components in a system has a direct effect on the active downtime Waiting downtime is the time during which the equipment is inoperable, but not yet undergoing repair. This could be due to the time it takes for replacement parts to be shipped, administrative processing time, and so on. Active downtime is the time during which the equipment is inoperable and actually undergoing repair. In other words, the active downtime is the time it takes repair personnel to perform a repair or replacement. The length of the active downtime is greatly dependent on human factors as well as the design of the equipment. For example, the ease of accessibility of components in a system has a direct effect on the active downtime.

31 System Maintainability
The time it takes to repair/restore a specific item is a random variable implying an underlying probabilistic distribution Distributions describing the time-to-repair are repair or downtime distributions, distinguishing them from failure distributions Methods to quantify these distributions are similar, but differ in how they are employed, i.e. the events they describe and metrics utilized In failure distributions, unreliability provides the probability the event (failure) will occur by that time, while reliability provides the probability the event (failure) will not occur In downtime distributions, the times-to-repair data provides the probability of the event (repairing the component) occurring The probability of repairing the component by a given time, t, is also called the component's maintainability System Maintainability is based on the time it takes to repair or restore a specific item. The time to repair is a random variable implying an underlying probabilistic distribution around a mean or average amount of time. Distributions that describe the time-to-repair are called repair or downtime distributions in order to distinguish them from failure distributions. The methods to quantify these distributions are not mathematically different, but they do differ in how they are employed, such as the events they describe and the metrics utilized. In failure distributions, unreliability provides the probability the event or failure will occur by that time, while reliability provides the probability the event or failure will not occur. In the case of downtime distributions, the data set consists of times-to-repair data. Therefore, what was termed unreliability now becomes the probability of the repair event occurring. The probability of repairing the component by a given time, t, is also called the component's maintainability.

32 System Maintainability
Maintainability is sometimes defined as a probability of performing a successful repair action within a given time Measures the ease & speed with which a system can be restored to operational status after a failure occurs For example, a component with a 90% maintainability in one hour has a 90% probability the component will be repaired in one hour Maintainability M(t) for a system with the repair times distributed exponentially is given by:

M(t) = 1 - e^(-μt)   where μ = repair rate and Mean Time To Repair (MTTR) = 1/μ

Maintainability is sometimes defined as the probability of performing a successful repair action within a given time. System maintainability measures the ease and speed with which a system can be restored to an operational status after a failure occurs. For example, a component with a 90% maintainability in one hour has a 90% probability that the component will be repaired in one hour. Maintainability as a function of time t for a system with its repair times distributed exponentially is given by the equation above. In this equation, μ represents the repair rate, and the reciprocal of μ, 1 divided by μ, represents the mean time to repair.
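A numeric sketch of the exponential maintainability function (the MTTR is an assumed value):

    import math

    mttr = 2.0        # assumed mean time to repair, hours
    mu = 1.0 / mttr   # repair rate
    t = 1.0           # repair time window of interest, hours

    m = 1 - math.exp(-mu * t)          # M(t) = 1 - e^(-mu * t)
    print(f"M({t:.0f} h) = {m:.3f}")   # about 0.393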

33 Maintainability/Time to Repair Terms
Mean Corrective Maintenance Time for Operational Mission Failure Repairs (MCMTOMF) is based on the average time to repair operational mission failures Mean Corrective Maintenance Time (MCMT) is based on the average corrective time for all failures Maximum (e.g. 90th percentile time) Corrective Maintenance Time (MaxCMT) for all incidents may be applied to maintainability testing Maintenance Ratio (MR) is a full maintenance burden requirement expressed in terms of the Mean Maintenance Man-Hours per Operating Hour, Mile, etc. The cumulative number of maintenance man-hours during a given period divided by the cumulative number of operating hours Additional Maintainability or Time to Repair terminology frequently used is listed on this slide. Mean Corrective Maintenance Time of Operational Mission Failure repairs is based on the average time to repair operational mission failures. Mean Corrective Maintenance Time is based on the average corrective time for all failures. Maximum Corrective Maintenance Time, such as the 90th percentile time for all incidents, may be applied to maintainability testing. The Maintenance Ratio of a system is a full maintenance burden requirement expressed in terms of the Mean Maintenance Man-Hours per Operating Hour or Mile. The cumulative number of maintenance man-hours during a given period divided by the cumulative number of operating hours yields a Maintenance Ratio.

34 Availability Considers both reliability (probability the item will not fail) and maintainability (probability the item is successfully restored after failure) Reliability, Availability, and Maintainability (RAM) are always associated with time Availability is the probability that the system/component is operational at a given time, t (i.e. has not failed or it has been restored after failure) May be defined as the probability an item is operable & can be committed at the start of a mission when the mission is called for at any unknown (random) point in time. Example: For a lamp with a 99.9% availability, there will be one time out of a thousand that someone needs to use the lamp and finds it is not operating Availability considers reliability, which covers the probability that the item will not fail and considers maintainability, which covers the probability that the item is successfully restored after failure. Reliability, Availability, and Maintainability known as the acronym RAM, are always associated with time. Availability is a probability that the system or component is operational at a given time, t. If an item has not failed or if the item has been restored after failure, it is considered operational. Availability may also be defined as the probability an item is operable and can be committed at the start of a mission when the mission is called for at any unknown (random) point in time. For example, for a lamp with a 99.9% availability, there will be one time out of a thousand that someone needs to use the lamp and finds it is not operating.

35 RAM Relationships Availability alone tells us nothing about how many times the lamp has been replaced Reliability and Maintainability metrics are still important. The table illustrates RAM relationships:

Reliability | Maintainability | Availability
constant | decreases (larger time to repair) | decreases
constant | increases (smaller time to repair) | increases
increases (larger time to failure) | constant | increases
decreases (smaller time to failure) | constant | decreases

Availability alone tells us nothing about how many times the lamp has been replaced. The lamp could have been replaced every day or the lamp may never have been replaced. Therefore, reliability and maintainability metrics are still important and needed. The table illustrates RAM relationships. If reliability remains constant and maintainability decreases (a larger time to repair), the availability will decrease. Conversely, if reliability remains constant and maintainability increases (a smaller time to repair), then availability will increase. If reliability increases (a larger time to failure) and maintainability remains constant, then availability will increase. Conversely, if reliability decreases (a smaller time to failure) and maintainability remains constant, then availability will decrease.

36 Inherent Availability
The steady state availability when considering only the corrective downtime of the system For a single component, this can be computed by:

Ai = MTTF / (MTTF + MTTR)

For a system, the Mean Time Between Failures, or MTBF, is used to compute inherent availability:

Ai = MTBF / (MTBF + MTTR)

Inherent Availability is the steady state availability when considering only the corrective downtime of the system. For a single component, this can be computed as the Mean Time To Failure divided by the quantity of the Mean Time To Failure plus the Mean Time To Repair. For a system, the Mean Time Between Failures, or MTBF, is used to compute inherent availability. Inherent Availability is the responsibility of the system designer and equipment manufacturer.
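Both forms reduce to the same small computation; a sketch with assumed values:

    mtbf = 800.0   # assumed mean time between failures, hours
    mttr = 4.0     # assumed mean time to repair, hours

    a_inherent = mtbf / (mtbf + mttr)   # Ai = MTBF / (MTBF + MTTR)
    print(f"Ai = {a_inherent:.4f}")     # about 0.9950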

37 Achieved Availability
Achieved Availability is similar to Inherent Availability except Preventive Maintenance (PM) is also included The steady state availability when considering the corrective and preventive downtime of the system Computed by looking at the Mean Time Between Maintenance actions, MTBM, and the Mean Maintenance Downtime, MDT:

Aa = MTBM / (MTBM + MDT)

Achieved Availability is very similar to Inherent Availability with the exception that Preventive Maintenance downtimes are also included. Specifically, it is the steady state availability when considering the corrective and preventive downtime of the system. Achieved Availability is computed as the Mean Time Between Maintenance actions, or MTBM, divided by the quantity of the MTBM plus the Mean Maintenance Downtime. Achieved Availability is also the responsibility of the system designer and equipment manufacturer.

38 Operational Availability
Operational Availability is the percentage of calendar time to which one can expect a system to work properly when it is required Expression of User Need rather than just Design Need Operational Availability is the ratio of the system Uptime and Total Time. Mathematically, it is:

Ao = Uptime / (Uptime + Downtime) = Uptime / Total Time

Operational Availability is the percentage of calendar time over which one can expect a system to work properly when it is required. Ao is an expression of User Need rather than just Design Need. Operational Availability is the ratio of the system Uptime to Total Time. Mathematically, it is Uptime divided by the quantity of Uptime plus Downtime, which represents Total Time. Ao includes all experienced sources of downtime, such as administrative downtime and logistics downtime to restore the system. Some of the downtime factors are beyond the responsibility of the system designer, making logistics planning and analysis very important when Ao is required or desired by the user of the system.

39 Basic System Availability
Previous availability definitions can be a priori estimations based on models of the system failure and downtime distributions Inherent Availability and Achieved Availability are controlled by the system designer/manufacturer Operational Availability is not solely controlled by the manufacturer due to variations in location, resources and logistics factors under the province of the end user of the product When recorded, an Operational Readiness Rate is the Operational Availability that the customer actually experiences. It is the a posteriori availability based on actual events that happened to the system All the previous availability definitions can be a priori estimations based on models of the system failure and downtime distributions. Inherent Availability and Achieved Availability are controlled by the system designer and manufacturer. Operational Availability is not solely controlled by the manufacturer due to variations in location, resources and logistics factors under the province of the end user of the product. When recorded properly, an Operational Readiness Rate is essentially the Operational Availability that the customer actually experiences. It is the a posteriori availability based on actual events that happened to the system.

40 Ao / Operational Readiness Example
A diesel power generator is supplying electricity at a research site in Antarctica & personnel are not satisfied with the generator In the past six months, they estimate being without electricity due to generator failure for an accumulated time of 1.5 months Therefore, the operational availability of the diesel generator experienced by personnel of the station is:

Ao = Uptime / Total Time = (6 - 1.5) / 6 = 4.5 / 6 = 75%

As an Operational Availability (Ao) or Operational Readiness Rate example, consider the following scenario. A diesel power generator is supplying electricity at a research site in Antarctica and the personnel there are not satisfied with the generator. In the past six months, they estimate being without electricity due to generator failure for an accumulated time of 1.5 months. Therefore, the Ao of the diesel generator experienced by personnel of the station is 75%, since the generator was up only 4.5 months over a 6 month timeframe.
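The arithmetic from this example, as a check:

    total_months = 6.0
    downtime_months = 1.5

    ao = (total_months - downtime_months) / total_months   # uptime / total time
    print(f"Ao = {ao:.0%}")   # 75%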

41 Redundant Configurations
Hot Standby Redundancy Operates all systems or subassemblies simultaneously Accrues more failures by operating all items Switchover time to the redundant item is near instantaneous Uses the Binomial Distribution to determine the Operational Availability (Ao) of the redundant configuration Cold Standby Redundancy Redundant systems or subassemblies are treated like spares stored in the system configuration Accrues fewer failures by operating only the items needed Switchover time to the redundant item is needed Uses the Poisson Distribution to determine the Ao of the redundant configuration When considering the Operational Availability (Ao) of redundant configurations, the type of redundancy applied can yield different results. If hot standby redundancy is applied, all systems or subassemblies are operating simultaneously. Hot standby redundancy accrues more failures by operating all items, but switchover time to the redundant item is nearly instantaneous. Hot standby redundancy uses the Binomial Distribution to determine the Ao of the redundant configuration. If cold standby redundancy is applied, the redundant systems or subassemblies are treated like spares stored in the system configuration. Cold standby redundancy accrues fewer failures by operating only the items needed. However, switchover downtime to the redundant item now occurs. Cold standby redundancy uses the Poisson Distribution to determine the Ao of the redundant configuration.

42 Binomial Distribution
R out of N of the Same System Need To Be Up:

Fleet Ao = Σ (k = R to N) [N! / ((N-k)! k!)] · Ao^k · (1 - Ao)^(N-k)

Series configuration where R = N, as all common items need to be up:

Fleet Ao = Ao^N    (because only the first Binomial term is used)

Redundant configuration where R = 1, as only 1 of the items needs to be up:

Fleet Ao = 1 - (1 - Ao)^N    (Note: all terms of a Binomial Distribution sum to 1, and here all but the last term is used)

Hot standby redundancy applies the Binomial Distribution to determine the Ao of a redundant configuration. Sometimes multiple systems in a fleet of systems need to be up rather than just one system. In general, if there is a fleet of N systems and R systems need to be up for the fleet to be considered fully available, then the Fleet Ao is determined by the Binomial Distribution equation above. There is only 1 combination where all N items are up, and N combinations where N-1 items are up and 1 item is down. There are N times (N-1) divided by 2 combinatorial possibilities where N-2 items are up and 2 items are down. In general, the number of combinatorial possibilities where R items are up and N-R items are down is N factorial divided by the quantity of (N-R) factorial times R factorial. For a series configuration where R is equal to N, because all common items need to be up, the Fleet Ao of the configuration is Ao to the N power, because only the first Binomial term is used. For a redundant configuration where R is equal to 1, because only one of the items needs to be up, all terms of the Binomial Distribution apply except for the last term, the probability that all N items are down. Since all terms of a probability distribution sum to 1, the Ao of this configuration is equal to 1 minus the quantity (1 - Ao) to the N power.
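A sketch of the fleet availability sum in Python, using math.comb for the binomial coefficients (the per-system Ao value is assumed):

    from math import comb

    def fleet_ao(ao, n, r):
        # Probability that at least r of n identical systems are up:
        # sum of C(n,k) * Ao^k * (1-Ao)^(n-k) for k = r..n.
        return sum(comb(n, k) * ao**k * (1 - ao)**(n - k) for k in range(r, n + 1))

    ao = 0.90                   # assumed per-system operational availability
    print(fleet_ao(ao, 4, 4))   # series case, R = N: equals ao**4, about 0.6561
    print(fleet_ao(ao, 4, 1))   # redundant case, R = 1: equals 1 - 0.1**4, about 0.9999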

