DATA ANALYSIS: THEORY AND PRACTICE by Vicki M. Bier.


1 DATA ANALYSIS: THEORY AND PRACTICE by Vicki M. Bier

2 PRA DATA REQUIREMENTS FOR QUANTIFICATION
Initiating event frequencies (internal and external)
Component failure rates
Component maintenance frequencies and durations
Human error rates
Common cause failure parameters
Component fragilities (thermal, seismic, fire, flood, and missile)

3 DATA ANALYSIS TOPICS
Assembling failure and success data
Bayes’ Theorem and Bayesian updating
Data analysis for common cause failures

4 ASSEMBLING FAILURE AND SUCCESS DATA
Data collection
Data interpretation
Categorization of components and failure modes
Assembling success (i.e., exposure) data

5 PRELIMINARY STEPS
Determine level of detail
– E.g., controller boards versus individual electronic components
Identify components of interest
– E.g., fuel system isolation valve
Determine failure modes
– E.g., failure to open on demand, failure to close on demand, plugging during system operation
Define data base study period
– E.g., all flight data, or only after the initial design evolution period

6 DATA COLLECTION
Correlate available data with risk assessment requirements
Collect all relevant information
– Corrective action reports
– Anomaly reports
– Operational histories (e.g., mission logs)
– Results of check-out, qualification, and acceptance tests

7 TYPES OF INFORMATION AVAILABLE
1. General engineering knowledge about the design, manufacture, and operation of the equipment in question
2. The historical performance of the equipment under conditions other than those under consideration (e.g., test results; generic reliability data from MIL-HDBK; Rome Air Force Base reliability data for non-electronic parts)
3. The historical performance of the equipment under the specific conditions being analyzed

8 TYPES OF INFORMATION NEEDED FOR FAILURE FREQUENCY EVALUATION
Generic information
– Expert opinion
– Case histories
– Failure frequencies for similar components
Component-specific information
– Component type
– Usage (e.g., cycling, standby, or continuous operation)
– Testing, monitoring, and refurbishment policies
– Performance history

9 DATA INTERPRETATION
Need to determine:
– Appropriate level of detail (e.g., resistors versus circuit boards)
– Applicability of test data (e.g., bench tests)
– Treatment of incipient failures (e.g., turbine blade cracking) and out-of-spec operation (e.g., erratic turbine speed)
– Treatment of non-critical components (e.g., temperature or pressure transducers, redundant equipment)
– Treatment of corrective actions (e.g., improved cleanliness procedures, system design changes)

10 APPROPRIATE LEVEL OF DETAIL
In general, components should be modeled at the highest level of detail for which good data is available
Going beyond this level can significantly decrease accuracy
Example – aggregating reliability data for individual electronic components may not match the observed failure history of a controller board

11 APPLICABILITY OF TEST DATA
Tests of installed components under normal operating conditions (e.g., hot-firing tests) are likely to give valid results
Bench tests or tests outside the normal operating envelope may not give representative results

12 TREATMENT OF INCIPIENT FAILURES AND OUT-OF-SPECIFICATION OPERATION
Examples – turbine blade cracking, erratic turbine speed
In general, partial failures should not be included in the data base if the component in question could still perform its intended function
Alternative approaches:
– Explicitly model the chance of an incipient failure progressing to an actual failure; e.g., turbine blade cracking versus turbine failure
– Take partial failures into account in establishing prior distributions; e.g., a higher estimated prior probability of turbine over-speed for turbines with a history of erratic speed control

13 TREATMENT OF NONCRITICAL COMPONENTS
Examples – pressure transducers, redundant valves in parallel
Non-critical components can still be important:
– If another component fails
– If some action needs to be taken
– Etc.
In general, components should be ignored only if they have no safety-related function

14 TREATMENT OF CORRECTIVE ACTIONS
Some types of corrective actions (e.g., addition of a new safety system) may make previous failures inapplicable
However, the effectiveness of actions such as improved maintenance or better cleanliness procedures is difficult to predict without either explicit data or detailed risk models
The effectiveness of corrective actions may be disproved by subsequent failures!

15 CATEGORIZATION OF COMPONENTS AND FAILURE MODES
Should similar components be grouped?
What is the appropriate level of detail in identifying failure modes?

16 GROUPING OF SIMILAR COMPONENTS
If components are effectively identical, grouping will result in smaller uncertainties by increasing the amount of data available
This may not be appropriate due to:
– Different operating environments; e.g., identical valves in gas, fuel, and water systems
– Different operating modes; e.g., cyclic, continuous, or standby (continuously pulsing valves versus isolation valves)
– Different failure modes; e.g., fail open versus fail closed (primary versus secondary fuel control valves on loss of power supply)
– Different inspection intervals or maintenance policies; e.g., inspected after every flight or not (fuel filters)
– Different failure histories
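The claim that grouping effectively identical components shrinks uncertainty can be illustrated with a conjugate Beta-binomial model. This is not from the slide itself: the Beta(1, 1) prior and the failure counts below are assumptions chosen for demonstration.

```python
import math

def beta_posterior_sd(k, n, a=1.0, b=1.0):
    """Posterior standard deviation of a per-demand failure probability
    under a Beta(a, b) prior after observing k failures in n demands."""
    a2, b2 = a + k, b + (n - k)
    return math.sqrt(a2 * b2 / ((a2 + b2) ** 2 * (a2 + b2 + 1)))

sd_single = beta_posterior_sd(1, 50)    # one valve alone: 1 failure in 50 demands
sd_pooled = beta_posterior_sd(5, 250)   # five effectively identical valves pooled

assert sd_pooled < sd_single   # more data, smaller uncertainty
```

The pooled estimate is tighter only because the components are assumed exchangeable; the bullet list above gives exactly the conditions under which that assumption breaks down.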

17 CATEGORIZATION OF FAILURE MODES
Treat startup failures differently from run-time failures; e.g., seizing of a gearbox
Different failure modes must be identified if they have different effects on the system; e.g., valve leaks versus valve stuck in the wide-open position
Different failure causes (e.g., corrosion, fatigue) do not need to be identified except to aid in prioritization of corrective actions

18 NEED FOR SUCCESS OR EXPOSURE DATA
Failure frequency = (number of failures) / (number of tests or operating hours)
Failure of operating systems or components – e.g., loss of flow in a lube oil system (3 events in 110 hours of operation)
Failure of passive systems or components – e.g., nitrogen leakage from a fuel tank (0 events in 11,000 hours of experience)
Failure of standby or cyclic components – e.g., heater switches (0 events in 72 demands)
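The ratio above is trivial to compute; a minimal Python sketch using the slide's three example data sets. Note that the two zero-failure cases yield a point estimate of exactly zero, which is one motivation for the Bayesian treatment in the following slides.

```python
def failure_frequency(failures, exposure):
    """Point estimate: number of failures / number of tests or operating hours."""
    return failures / exposure

lube_oil = failure_frequency(3, 110)      # ~0.027 failures per operating hour
nitrogen = failure_frequency(0, 11_000)   # 0.0 per hour of experience
switches = failure_frequency(0, 72)       # 0.0 per demand
```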

19 QUANTIFYING THE RELEVANT EXPOSURE DATA
Identify the population of similar components
– E.g., how many similar valves or switches are potentially vulnerable to this failure mode?
Determine the appropriate units
– Does the failure mode occur over time, or only with the imposition of a demand?
– Can the failure mode occur at any time, or only when the system is running?
– Are there “latent” failures? (e.g., failures occurring during orbit that are only detected at descent)

20 BAYES’ THEOREM AND BAYESIAN UPDATING

21 PROPERTIES OF THE BAYESIAN APPROACH
1. All types of information can be used
2. The use of judgment is visible and explicit
3. It provides a way of quantifying uncertainty

22 BAYESIAN SPECIALIZATION (UPDATE)
Update a generic distribution of information (the “prior distribution”) with more specific or more detailed data (the “evidence”) to obtain a weighted distribution, which generally contains less uncertainty than the generic information (the “specialized posterior distribution”)
[Figure: prior and posterior distributions plotted against failure rate, over a range of 0.01% to 1%]

23 PROPERTIES OF BAYESIAN UPDATING
Posterior probability is proportional to:
– Prior probability
– Likelihood of the evidence
Bayes’ Theorem assigns the appropriate weights in all cases:
– With weak evidence, the prior dominates the results
– With strong evidence, the evidence dominates the results
Successive updating with several pieces of evidence gives the same result as one-step updating with all the evidence
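The last property (successive updating matching one-step updating) can be checked numerically. A minimal sketch, assuming a discrete set of candidate failure rates with a Poisson likelihood; the rates, prior weights, and evidence counts below are made up for demonstration.

```python
import math

def poisson_like(lam, k, t):
    """Likelihood of K failures in T time units at rate lam (Poisson)."""
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

def update(prior, likelihood):
    """Discrete Bayes: posterior is proportional to prior times likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

rates = [0.01, 0.02, 0.03]   # hypothetical candidate failure rates (per hour)
prior = [0.5, 0.3, 0.2]      # illustrative prior weights

e1 = (1, 50)   # first piece of evidence: 1 failure in 50 hours
e2 = (2, 60)   # second piece of evidence: 2 failures in 60 hours

# Successive updating: apply e1, then e2
step = update(update(prior, [poisson_like(r, *e1) for r in rates]),
              [poisson_like(r, *e2) for r in rates])

# One-step updating: likelihood of all the evidence at once
once = update(prior, [poisson_like(r, *e1) * poisson_like(r, *e2) for r in rates])

assert all(abs(a - b) < 1e-12 for a, b in zip(step, once))
```

The equality holds because the posterior is proportional to prior times likelihood, and the combined likelihood factors into the product of the individual likelihoods.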

24 BAYES’ THEOREM – APPLICATION TO DATA ANALYSIS
λ = a specific value for the frequency of some event
E = evidence related to that frequency
P(λ) = prior probability of the value λ before observing evidence E
P(λ|E) = posterior probability of the value λ after observing evidence E
P(E|λ) = likelihood of observing evidence E if the true frequency is λ
Bayes’ Theorem then gives P(λ|E) = P(E|λ) P(λ) / Σλ′ P(E|λ′) P(λ′)

25 LIKELIHOOD FUNCTION FOR DEMAND-BASED FAILURE RATES
λ is the probability that a system or component will fail on each demand
The likelihood of observing K failures in N demands is binomial; i.e.,
P(E|λ) = [N! / (K! (N − K)!)] λ^K (1 − λ)^(N − K)
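A sketch of this likelihood in code (Python here; `math.comb` supplies the binomial coefficient, and the example value of λ is arbitrary):

```python
from math import comb

def binomial_likelihood(lam, k, n):
    """P(E | lam): probability of exactly K failures in N demands,
    where lam is the per-demand failure probability."""
    return comb(n, k) * lam ** k * (1 - lam) ** (n - k)

# Illustrative: chance of seeing no failures in 72 demands if lam = 0.01
p = binomial_likelihood(0.01, 0, 72)   # equals 0.99 ** 72, about 0.485
```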

26 LIKELIHOOD FUNCTION FOR TIME-BASED FAILURE RATES
λ is the frequency of failures per unit time
The likelihood of observing K failures in T time units is Poisson; i.e.,
P(E|λ) = (λT)^K e^(−λT) / K!
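The same kind of sketch for the time-based case (the example rate and counts are arbitrary illustrations):

```python
import math

def poisson_likelihood(lam, k, t):
    """P(E | lam): probability of exactly K failures in T time units,
    where lam is the failure frequency per unit time."""
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

# Illustrative: chance of seeing 3 failures in 100 hours if lam = 0.03 per hour
p = poisson_likelihood(0.03, 3, 100)   # = 3**3 * exp(-3) / 3!, about 0.224
```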

27 EXAMPLE OF BAYESIAN UPDATING
PROBLEM 1: Use Bayes’ Theorem to calculate an updated (posterior) distribution for the component failure rate, given evidence of 3 failures in 100 hours of operation
PROBLEM 2: Repeat with evidence of 30 failures in 1,000 hours of operation
[Figure: prior distribution of the failure rate, plotted over 1, 2, and 3 failures per 100 hours]

28 SOLUTION APPROACH
1. Use the discrete form of Bayes’ Theorem
2. Assume the true failure rate λ is constant
3. The likelihood function is Poisson
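These three steps can be sketched in Python. The candidate rates come from the prior figure (1, 2, and 3 failures per 100 hours); the prior weights below are illustrative assumptions, since the exact bar heights are not recoverable from the slide.

```python
import math

def poisson_like(lam, k, t):
    """Step 3: Poisson likelihood of K failures in T hours at rate lam."""
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

def bayes_update(rates, prior, k, t):
    """Steps 1-2: discrete Bayes' Theorem over fixed candidate rates,
    each assumed constant over time."""
    unnorm = [p * poisson_like(r, k, t) for r, p in zip(rates, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

rates = [0.01, 0.02, 0.03]   # 1, 2, 3 failures per 100 hours
prior = [0.5, 0.3, 0.2]      # assumed prior weights (illustrative only)

post1 = bayes_update(rates, prior, k=3, t=100)     # Problem 1
post2 = bayes_update(rates, prior, k=30, t=1000)   # Problem 2
```

With these assumed weights, the weak evidence of Problem 1 leaves a posterior that still reflects the prior, while the ten-fold larger sample of Problem 2 concentrates most of the probability on 3 failures per 100 hours, matching the weak-versus-strong-evidence behavior described on slide 23.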

29 PROBLEM 1 RESULTS – COMPARISON OF PRIOR AND POSTERIOR DISTRIBUTIONS
[Figure: prior distribution and posterior distribution P(λj) given the Problem 1 evidence (3 failures in 100 hours), each plotted over 1, 2, and 3 failures per 100 hours]

30 PROBLEM 2 RESULTS – COMPARISON OF PRIOR AND POSTERIOR DISTRIBUTIONS
[Figure: prior distribution and posterior distribution P(λj) given the Problem 2 evidence (30 failures in 1,000 hours), each plotted over 1, 2, and 3 failures per 100 hours]

31 SOURCES OF UNCERTAINTY
Interpretation and classification of events
– Treatment of incipient failures
– Extrapolation of data from other situations
Determination of success data (number of demands and exposure/mission time)
– Ability to detect failures
– Run time versus latent time
Sample size/statistical uncertainty
– Possibility of missing data
– Variability between components, vehicles, or missions
Mathematical modeling
– Assumption of constant failure rates

32 THE KNOWLEDGE ELICITATION PROCESS
Motivate the experts – explain the importance of the assessment and the fact that information (not a firm prediction) is the goal
Structure the interview – define the question to be answered, the units or scale for answering the question, and any inherent assumptions
Exploratory discussions – discuss the parameter of interest to detect biases and induce the expert to reveal his or her true judgments
Encoding – ask questions to encode the expert’s judgments, beginning with extreme percentiles
Verification – construct the probability distribution and verify that the expert believes it is valid

33 ISSUES IN THE ELICITATION OF EXPERT OPINION
Expert calibration
Problem decomposition
Structured versus unstructured group processes
Mathematical aggregation of opinions versus the consensus approach

34 DEPENDENT FAILURES
Functional dependencies
– One system/component cannot operate because it depends on another system/component for its function
Examples:
– Lack of electric power fails a system
– A false instrumentation signal causes a controller to fail
Treated in:
– Event tree models
– Split fraction models

35 DEPENDENT FAILURES
Common cause failures
– A shared cause leading to failure of two or more components within a short time interval (e.g., an inherent defect in design, or a subtle environmental condition that is not explicitly identified)
– Not the failure of another component that can be explicitly modeled; e.g., a breaker supplying power to both pumps
Example:
– Lube oil filter plugging because of a common defect and inattentive inspections
Treated in:
– Split fraction models

36 DEPENDENT FAILURES
Spatial dependencies
– One system/component fails by virtue of close proximity to another system/component that has failed in a way that causes a cascading effect
Examples:
– Shrapnel
– Detonations
– Fire
Treated in:
– Event trees
– Simulation models

