Accelerated Stress Testing and Reliability Workshop
October 9-11, 2013, San Diego, CA
Accelerating Reliability into the 21st Century
Keynote Presenter Day 1: Vice Admiral Walter Massenburg
Keynote Presenter Day 2: Alain Bensoussan, Thales Avionics
CALL FOR PRESENTATIONS: We are now accepting abstracts. Guidelines are on the website.
LinkedIn Group: IEEE/CPMT Workshop on Accelerated Stress Testing and Reliability
This is the 3rd in a series of four webinars presented by Ops A La Carte, ASTR, and the ASQ Reliability Division. Each webinar will also be presented as a full 2-hour tutorial at our ASTR Workshop, Oct 9-11, San Diego. Abstracts for presentations are due Apr 30.
Agenda
– Introduction: 5 min
– Accelerated Reliability Growth Testing: 45 min
– Questions: 10 min
Upcoming Reliability Webinars
Title: 40 Years of HALT: What Have We Learned
Author: Mike Silverman
Date: Sept 12, 2013, 12pm EST
have-we-learned/
Location: Webinar
HALT began 40 years ago with a simple idea of testing beyond specifications in order to better understand design margins. Over the past 40 years, thousands of engineers around the world have been exposed to the concepts of HALT and have tried the techniques. This tutorial will explore what we have learned in the past 40 years and what the future of HALT could be.
Registration Demographics
For this webinar, the following have signed up:
– 250 registrants
– 17 countries
– 28 US states
Registration Question #1
Have you ever performed a Reliability Growth Test?
– Never: 45%
– All the time: 25%
– Tried once: 20%
Registration Question #2
For your last RGT, did you have a chance to plan the duration and stresses?
– Neither: 50%
– Both: 25%
– Duration only: 10%
– Stresses only: 10%
Tutorial Objectives
Identify shortcomings of traditional reliability growth testing and offer alternatives
– Reliability Growth Test objectives
– Explain the traditional Reliability Growth test methodology along with its assumptions
– Show shortfalls of the traditional methods:
  – Entire item failure rate not calculated and presented in results
  – Test duration too long for modern high-reliability items
  – Little or no relationship between reliability and the stresses on the tested item
– Show principles of the Physics of Failure (PoF) test methodology
– Show how the Reliability Growth test based on PoF is constructed
– Show how the expected stresses are applied and accelerated
– Show how to account for total final failure rates
– Show the considerable test cost reduction achieved
Traditional RG Test Methodology
– Overall test duration is determined based on the initial and goal reliability measures: failure rates or Mean Time Between Failures, MTBF (or MTTF)
– Initial failure rate is estimated for the entire item and then used for calculations of reliability growth
– Reliability growth parameters and test duration are determined mathematically, based on the goal reliability
– Magnitude (level) of the applied operational and environmental stresses is equal to that in use, but their duration is not
– Applied stress duration is determined by engineering judgment, and stress level by assumption of some mean stress
– Overall test duration and stress application are unrelated to use profiles or the required life or mission of the product, only to mathematics
– Additional errors: mathematical
Principles and Assumptions
– Goal: increase the current (existing) reliability, measured in mean time between failures
– Goal magnitude is guided by a requirement or by commercial logic
– The item as designed contains design errors:
  – Those are going to appear in test reasonably within the determined test time
  – The errors found in test are going to be eliminated by design corrections (type B failure modes)
  – Continuation of the test will evaluate the success of the fix
  – Design errors that cannot be fixed (type A failure modes) will continuously be counted
  – Failures determined to be random will not be counted
– Reliability growth will be measured
Principles and Assumptions, cont.
– Failure rate during the test is constant when there are no changes to the tested item
– Failure rate decreases in steps with introduced design corrections, and remains constant until the next change
– The step curve is fitted with a curve representing a Non-Homogeneous Poisson Process (NHPP)
– The process definition: failure rate is constant until changes occur
The facts not considered in the application of that theory:
– The initial failure rate is just the total failure rate
– There is no rationale for how much of it is attributed to:
  – Design problems that can be corrected
  – Random events (failure modes whose origin is unknown; they just happen)
  – Design problems that cannot be corrected for one of these reasons:
    – Technically impossible
    – Economically not justifiable
    – Time-to-market constraints
Mathematical Model - Refresher
The expected accumulated number of failures up to test time $T$ is given by:
$$E[N(T)] = \lambda T^{\beta}$$
where $\lambda$ is the scale parameter and $\beta$ is the shape parameter, a function of the general effectiveness of the improvements: $0 < \beta < 1$ corresponds to reliability growth, $\beta = 1$ to no reliability growth, and $\beta > 1$ to negative reliability growth (reliability degradation).
The failure intensity, when it is changing as a result of design improvements, after $T$ hours of testing is given by:
$$z(T) = \frac{dE[N(T)]}{dT} = \lambda \beta T^{\beta - 1}$$
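The behaviour of the power-law model can be checked numerically. The following is a minimal Python sketch, assuming the standard power-law (NHPP) form in which the expected number of failures is `lam * T**beta` and the failure intensity is its derivative. The function names and the sample values of `lam` and `beta` are illustrative assumptions, not values from the presentation.

```python
def expected_failures(T, lam, beta):
    """Expected cumulative number of failures up to test time T (power law)."""
    return lam * T ** beta


def failure_intensity(T, lam, beta):
    """Instantaneous failure intensity after T hours of testing."""
    return lam * beta * T ** (beta - 1.0)


# Hypothetical parameters, chosen only to illustrate the behaviour.
lam, beta = 0.5, 0.6

for T in (100.0, 1000.0, 10000.0):
    n = expected_failures(T, lam, beta)
    z = failure_intensity(T, lam, beta)
    # With beta < 1 the intensity falls as test time accumulates (growth).
    print(f"T = {T:8.0f} h   E[N] = {n:7.2f}   z = {z:.5f} f/h   1/z = {1.0 / z:8.1f} h")
```

With a shape parameter below 1 the printed intensity decreases and its reciprocal, the instantaneous MTBF, increases with test time, which is the growth behaviour the model is meant to describe.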
Mathematics of Traditional Reliability Growth
Failure mode types in test:
– Systematic, corrected in test (Type B): the only failure modes with decreasing failure rates (power law)
– Systematic, not corrected (Type A): constant
– Random: constant
Only the failure rates of type B failure modes are accounted for in a reliability test program, that is, those that show growth expressed by the power law model; the type A and random failure rates remain constant.
Planning Reliability Growth
To plan reliability growth, the initial value of the failure rate, $\lambda_I$, or the initial mean time between failures, $\theta_I$, was assumed to be known at some time $t_I$. This initial failure rate would have a value known by experience for that item or by similarity with another like item: $\lambda_I(t_I) = \text{constant}$.
The thought process was then that this initial failure rate would decrease under the rules of the power law and, at the end of the test with the corrections, would assume a final value (a constant again), $\lambda_F(t_F)$.
The Crow/AMSAA/Duane planning model is simple and easy to implement:
$$\lambda_F(t_F) = \lambda_I(t_I) \left( \frac{t_F}{t_I} \right)^{\beta - 1}$$
But the initial failure rate has three components, and only one of them, the failure rate of the B failure modes, can be improved and fitted with the power law. The remaining components are constant.
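Planning Reliability Growth, cont.
The remaining two components are constant. The final failure rate as a function of time also contains three components, two constant and only one that can be fitted with the power law:
$$\lambda(t) = \lambda_B(t) + \lambda_A + \lambda_R$$
where $\lambda_B(t)$ is the failure rate of the B failure modes, $\lambda_A$ that of the A failure modes, and $\lambda_R$ the random failure rate.
The final B-modes failure rate is then the improved failure rate of the B-type failure modes, and the total final item or system failure rate also contains the two additional constant components:
$$\lambda_{FB}(t_F) = \lambda_{IB}(t_I) \left( \frac{t_F}{t_I} \right)^{\beta - 1}, \qquad \lambda_F(t_F) = \lambda_{FB}(t_F) + \lambda_A + \lambda_R$$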
A Failure Modes
– The random failure rates are not recorded or taken into account. The A-type failures are considered in the number of failures, and it is said that they are included in the shape parameter calculations, but no example in current handbooks shows how that is done.
– It is also stated that type A failure modes are counted every time they show up, repetitions included; no example of that statement could be found.
– Given that no improvement is applied, type A failure modes should be treated in the same manner as the random failure rates. They could be accounted for separately, but numerically their failure rate will be added to the random failure rate.
– This means that during the test, the A-type failure modes should be counted as another group of constant failure rates.
– In that case, the methodology of fixed duration testing should be applied to determine failure rates for both:
  – The A-type failure modes
  – All other random failure modes whose origin is not identifiable
Present Method to Determine Test Duration
Test duration is mathematically determined from the reciprocal of the failure rate as:
$$t_F = t_I \left( \frac{\theta_F}{\theta_I} \right)^{\frac{1}{1 - \beta}}$$
where:
– $\theta_F$ = final product MTBF (for mitigated, fixed failure modes only), the given goal
– $\theta_I$ = initial product MTBF (for failure modes that will be mitigated), assumed
– $t_F$ = test duration needed to achieve the final MTBF for fixed failure modes
– $t_I$ = initial test time (has various explanations), assumed. What is it?
Example, old school: $\theta_I$ = 4,000 hours, $\theta_F$ = 10,000 hours, $\beta$ = 0.6
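The old-school example can be worked through numerically. The sketch below is a minimal Python illustration under two stated assumptions: the value 0.6 on this slide is read as the power-law shape parameter, so the growth rate is 1 - 0.6 = 0.4, and the initial test time is the 200 hours assumed on the comparison slide later in the deck. The function name `required_test_time` is hypothetical.

```python
def required_test_time(theta_initial, theta_final, t_initial, beta):
    """Test time at which MTBF grows from theta_initial to theta_final.

    Assumes MTBF grows as (t / t_initial) ** (1 - beta), i.e. the power law
    with shape parameter beta.
    """
    growth_rate = 1.0 - beta
    return t_initial * (theta_final / theta_initial) ** (1.0 / growth_rate)


# Slide values: initial MTBF 4,000 h, goal MTBF 10,000 h, 0.6 read as beta.
# The 200 h initial test time is taken from the later comparison slide.
t_final = required_test_time(4000.0, 10000.0, 200.0, 0.6)
print(f"Required test time: {t_final:.0f} h")  # about 1976 h
```

Under this reading the result agrees with the 1,976 hours quoted in the comparison, which suggests the interpretation of 0.6 as the shape parameter is at least consistent with the deck's own numbers.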
Initial MTBF – What is It?
In the traditional test design, the initial test MTBF is the MTBF assumed for the product, but:
– The reciprocal of this initial MTBF is the initial failure rate, made up of three components, two of which are constant, not power law:
  – Design, correctable
  – Design, non-correctable
  – Random failure rates or failure modes
– Only the design failure modes that can be corrected (B type) can be fitted by the power law (Weibull intensity function), thus:
  – What part of the assumed, estimated initial failure rate of the entire item could those correctable failure modes be?
  – Analytical prediction contains only the random failure rates
    – If Design Engineering is reasonably competent, type A or B failure modes could be at most 40% of the assumed initial failure rate
    – The B failure rate could be only a small fraction of the estimated product failure rate before the test
Parameters and Results
– Recorded in test are the cumulative times of occurrence of A and B failure modes. A modes are not addressed, so they should not be part of the power law. The handbook text suggests they are counted; if they were, that would be in error.
– From the test data, the shape and scale parameters are determined
– The reported failure rate and MTBF are:
$$z(T) = \lambda \beta T^{\beta - 1}, \qquad \theta(T) = \frac{1}{z(T)}$$
where $\lambda$ is the scale parameter, $\beta$ the shape parameter, and $T$ the accumulated test time.
– Random and A modes do not seem to be part of the achieved growth. They are, unfortunately, forgotten.
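How the shape and scale parameters are obtained from recorded failure times is not shown in the transcript. One common choice, sketched below as an assumption rather than as the presenter's method, is the maximum likelihood estimate for a time-terminated power-law test; handbooks and standards also use bias-corrected variants. The failure times are hypothetical, and in line with the slide's argument they would be B-mode failures only.

```python
import math


def fit_power_law(failure_times, total_time):
    """Maximum likelihood fit of the power law for a time-terminated test.

    failure_times: cumulative test times of the failures (hours), all > 0
    total_time:    accumulated test time at the end of the test (hours)
    Returns (scale lam, shape beta).
    """
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    return lam, beta


def reported_results(lam, beta, total_time):
    """Failure intensity and its reciprocal (MTBF) at the end of the test."""
    z = lam * beta * total_time ** (beta - 1.0)
    return z, 1.0 / z


# Hypothetical cumulative failure times (hours) for B failure modes.
times = [35.0, 110.0, 260.0, 540.0, 980.0, 1650.0]
T = 2000.0

lam, beta = fit_power_law(times, T)
z, mtbf = reported_results(lam, beta, T)
print(f"beta = {beta:.3f}, lambda = {lam:.4f}")
print(f"failure intensity = {z:.6f} f/h, MTBF = {mtbf:.0f} h")
```

The result covers only the failure modes that went into the fit. The constant A-mode and random failure rates would still have to be added to it to obtain a total item failure rate.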
Comparison
If the initial test time is assumed to be 200 hours:
Traditional test (all failure rates follow the power law):
– Initial failure rate: $\lambda_I$ = 2.5×10^-4 f/hr
– Initial MTBF: $\theta_I$ = 4,000 hours
– Final MTBF: $\theta_F$ = 10,000 hours
– Final test time: 1,976 hours (from the initial time)
True status, only B-type failure modes improved (e.g. at most 40% of the old initial failure rate $\lambda_I$ = 2.5×10^-4 f/hr):
– Initial failure rate for B modes: $\lambda_{IB}$ = 0.4 × 2.5×10^-4 f/hr = 1×10^-4 f/hr
– Initial MTBF for B modes: $\theta_{IB}$ = 10,000 hours
– Possible final MTBF for B modes: $\theta_{FB}$ = 30,000 hours
– Overall final failure rate, B modes + random and A modes: 1.833×10^-4 f/hr
– Final overall MTBF: $\theta_F$ = 5,455 hours
– Final test time: 3,118 hours (from the initial time)
– The forgotten, unreported failure rate: 1.5×10^-4 f/hr
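The arithmetic behind this comparison can be retraced in a few lines. The sketch below is a hedged Python illustration that assumes the power-law planning relation with a shape parameter of 0.6 (growth rate 0.4), the reading that reproduces the quoted test times. Variable and function names are hypothetical.

```python
def required_test_time(theta_initial, theta_final, t_initial, beta):
    """Test time needed for MTBF to grow from theta_initial to theta_final."""
    return t_initial * (theta_final / theta_initial) ** (1.0 / (1.0 - beta))


beta = 0.6                  # assumed shape parameter (growth rate 0.4)
t_initial = 200.0           # initial test time, hours
lam_total_initial = 2.5e-4  # assumed initial item failure rate, f/hr

# Traditional view: the whole failure rate follows the power law.
t_traditional = required_test_time(1.0 / lam_total_initial, 10000.0, t_initial, beta)

# True status: only the B modes (40% of the initial rate) can improve.
lam_b_initial = 0.4 * lam_total_initial              # 1.0e-4 f/hr
lam_constant = lam_total_initial - lam_b_initial     # A + random, 1.5e-4 f/hr
theta_b_final = 30000.0                              # possible final B-mode MTBF
t_b_only = required_test_time(1.0 / lam_b_initial, theta_b_final, t_initial, beta)

# Total final failure rate adds the constant part back in.
lam_final = 1.0 / theta_b_final + lam_constant
mtbf_final = 1.0 / lam_final

print(f"Traditional test time:       {t_traditional:.0f} h")   # about 1976 h
print(f"B-modes-only test time:      {t_b_only:.0f} h")        # about 3118 h
print(f"Overall final failure rate:  {lam_final:.3e} f/hr")    # about 1.833e-04
print(f"Overall final MTBF:          {mtbf_final:.0f} h")      # about 5455 h
```

The point of the comparison is visible in the last two lines: even after a longer test, the overall MTBF stays well below the 10,000-hour goal because the constant 1.5×10^-4 f/hr portion is never improved.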
The Solution – Way Forward
The possible correct solution:
– Prepare a reliability growth test for B failure modes only
– Count A-type failure modes as if they were random
– Count random failures
– Calculate the final failure rate and MTBF of the B failure modes
– Add the constant A and random failure rates to get the results
Possible problems and difficulties:
– The calculated mathematical test duration is unrelated to use stresses or use profile
– The traditionally determined test duration is too short to account for the random failures; normally the required test duration for reasonable confidence is about 10 MTBFs (in our example, about 70,000 hours)
– The traditional RG test duration does not support this test time
– A short reliability growth test does not disclose cumulative damage or failures with small failure rates that would start showing only after the test is complete, while the useful life of the item could be 10 or 20 years
The proposed viable solution: an accelerated reliability growth test.
Physics of Failure and Reliability
– Failures occur when an item is not strong enough to withstand one or more attributes of a stress: its level, duration, or number of repetitions
– The higher the level, the shorter the duration or the fewer the repetitions needed to induce a failure
– The mean of strength is taken as k times the mean of stress (load), and the standard deviations of each as a and b times their respective mean values; these determine the reliability of the item regarding each use stress (i) and the total reliability
– The area of overlap of the strength and stress distributions represents the probability of failure for each of the stresses
Notation:
– $\mu_L$, $\sigma_L$ = mean and standard deviation of the load distribution, with $\sigma_L = b \times \mu_L$
– $\mu_S$, $\sigma_S$ = mean and standard deviation of the strength distribution, with $\sigma_S = a \times \mu_S$
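The stress-strength relationship on this slide can be illustrated numerically. The transcript does not preserve the slide's formula, so the sketch below rests on assumptions: stress and strength are taken as independent and normally distributed, which gives a reliability of Φ((k - 1) / sqrt(a²k² + b²)) per stress, and the total reliability is taken as the product over independent stresses. The default values a = 0.05 and b = 0.2 are the "most common values" quoted elsewhere in the deck; the function names and the list of margins are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reliability_for_stress(k, a=0.05, b=0.2):
    """Probability that strength exceeds stress for one stress type.

    Assumes normal distributions with mean strength = k * mean stress,
    strength std dev = a * mean strength, stress std dev = b * mean stress.
    """
    return normal_cdf((k - 1.0) / math.sqrt((a * k) ** 2 + b ** 2))


def total_reliability(margins, a=0.05, b=0.2):
    """Product of per-stress reliabilities (stresses assumed independent)."""
    result = 1.0
    for k in margins:
        result *= reliability_for_stress(k, a, b)
    return result


# Hypothetical margins for three stresses; 1.5 is the value used in the
# deck's test example, the others are for illustration only.
margins = [1.5, 1.5, 1.6]
for k in margins:
    print(f"k = {k}: R = {reliability_for_stress(k):.4f}")
print(f"Total reliability: {total_reliability(margins):.4f}")
```

A margin of k = 1 means equal mean stress and strength, for which this sketch gives a reliability of 0.5; larger margins shrink the overlap of the two distributions and raise the reliability.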
Physics of Failure Reliability – Margin k Selection
– Allocate reliability regarding each of the expected stresses in use
– The cumulative damage, and ultimately failure, due to a stress is proportional to the stress level and its duration. For a stress applied at the same level as in life, cumulative damage therefore scales with duration.
– For the allocated reliability regarding each stress, select the value of the margin k, which multiplies its duration in use to give the duration applied in test
– Apply stresses simultaneously whenever possible
– If the same stress type is applied at different levels in use, recalculate their durations to the highest level (using acceleration factors)
– The most common values for a and b are a = 0.05 and b = 0.2
Test Acceleration
– Each of the stresses is accelerated in test to allow for a shorter test duration
– The total item failure rate is the sum of its failure rates regarding each individual stress, where $\lambda_0$ is the item total failure rate in use conditions and $\lambda_A$ is the accelerated item total failure rate
– The acceleration factors are multiplied (product over j) when stresses 1 to j produce the same failure mode
– Stress acceleration models for different stresses, for example:
  – Inverse power law model (usually applicable to thermal cycling, vibration, shock, humidity)
  – Arrhenius model (used for temperature acceleration using absolute temperature)
  – Eyring model (used also when the thermal stress is a factor in process acceleration)
  – Step stress model, where the stress increases in steps
  – Fatigue model, representing the degradation due to repetitious stress
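Two of the acceleration models named on this slide can be sketched in their textbook forms. This is a minimal Python illustration, not the presenter's calculation. The activation energy, the inverse power law exponent, the temperatures, and the per-stress failure rates are hypothetical placeholders, and summing rate times factor per stress is a simplified reading of the slide's statement that the item failure rate is a sum over stresses. The vibration levels 1.7 and 3.2 g rms are the use and test levels quoted in the deck's test example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_factor(temp_use_c, temp_test_c, activation_energy_ev):
    """Arrhenius acceleration factor between use and test temperatures (deg C)."""
    t_use = temp_use_c + 273.15    # absolute temperature in use
    t_test = temp_test_c + 273.15  # absolute temperature in test
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_test)
    return math.exp(exponent)


def inverse_power_law_factor(stress_test, stress_use, exponent):
    """Inverse power law acceleration factor for a stress level ratio."""
    return (stress_test / stress_use) ** exponent


def accelerated_failure_rate(use_rates, factors):
    """Sum of per-stress use failure rates, each scaled by its acceleration factor."""
    return sum(rate * factor for rate, factor in zip(use_rates, factors))


# Hypothetical inputs: 0.7 eV activation energy, 55 C use vs 85 C test,
# and an exponent of 4 for vibration. None of these come from the slides.
af_thermal = arrhenius_factor(55.0, 85.0, 0.7)
af_vibration = inverse_power_law_factor(3.2, 1.7, 4.0)

# Hypothetical per-stress failure rates in use, f/hr.
lam_accelerated = accelerated_failure_rate([1.0e-5, 2.0e-5], [af_thermal, af_vibration])

print(f"Thermal acceleration factor:   {af_thermal:.1f}")
print(f"Vibration acceleration factor: {af_vibration:.1f}")
print(f"Accelerated failure rate:      {lam_accelerated:.3e} f/hr")
```

Failure times observed in the accelerated test would be multiplied by the relevant acceleration factor to project them to use conditions.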
Test Example
B failure modes: duration k × life
Stresses:
– Thermal cycling
– Thermal exposure (thermal dwell)
– Humidity
– Vibration
– Operational cycling
Thermal cycling: one thermal cycle in test = 24 hours in life.
Determination of factor k for major stresses: k = 1.5
Thermal dwell: exposure when OFF is normalized to duration at the ON temperature, which gives the duration of accelerated exposure.
Test Example, Cont.
– The thermal exposure is combined with the thermal cycling, distributed over the high temperature
– [Figure: test cycle profile]
– Humidity: test at 95% RH and temperature T_RH = 85 °C (65 °C chamber + 20 °C internal temperature rise)
– Vibration: 150,000 miles, 150 hours per axis of vibration at 1.7 g rms. Test level: 3.2 g rms
– To project test time to life, multiply the test time by the acceleration factor
Data for reliability plotting:
– Initial B failure modes MTBF: 100,000 hours; final: 10^6 hours
– Initial test time: 100 hours
– Total traditional test time: 4.6×10^3 hours
Final test reliability (B failure modes):
– Final MTBF (improved failure modes): 1,431,964 hours
– Total accelerated test time: 526 hours
[Table: time to failure (h) and cumulative time to failure for n = 24 failures, with log values for plotting]
Why Accelerated Reliability Growth?
– The test duration covers the product's entire life
– It allows detection of all design problems, not only those that appear in a small fraction of product life
– It enables an estimate of the failure rate due to random events, disregarded in traditional RG testing
– The failure rate achieved by design improvement, together with the random failure rate, provides a realistic estimate of total product reliability
– Test duration is determined based on the required total reliability in view of the product's physical cumulative damage from life stresses in use
– Test acceleration allows a very reasonable test duration, shorter than traditional mathematically derived testing
– Reliability improvement through test is no longer cost prohibitive
– Test failure times are projected to their appearance in real life, and the analysis uses this data
– Even though it covers the product's expected life (durability information), the test is still considerably shorter than the traditional reliability growth test
Biography
Milena Krasich is a Senior Principal Systems Engineer in Raytheon Integrated Defense Systems, Whole Life Engineering, in the RAM Engineering Group, Sudbury, MA. Prior to joining Raytheon, she was a Senior Technical Lead of Reliability Engineering in Design Quality Engineering of Bose Corporation, Automotive Systems Division. Before joining Bose, she was a Member of Technical Staff in the Reliability Engineering Group of General Dynamics Advanced Technology Systems, formerly Lucent Technologies, after a five-year tenure at the Jet Propulsion Laboratory in Pasadena, California. While in California, she was a part-time professor at the California State University Dominguez Hills, where she taught graduate courses in System Reliability, Advanced Reliability and Maintainability, and Statistical Process Control. At that time, she was also a part-time professor at the California State Polytechnic University, Pomona, teaching undergraduate courses in Engineering Statistics, Reliability, SPC, Environmental Testing, and Production Systems Design. She holds a BS and MS in Electrical Engineering from the University of Belgrade, Yugoslavia, and is a California registered Professional Electrical Engineer. She is also a member of the IEEE and ASQC Reliability Society, and a Fellow and the President Emeritus of the Institute of Environmental Sciences and Technology. Currently, she is the Technical Advisor (Chair) to the US Technical Advisory Group (TAG) to the International Electrotechnical Commission (IEC) Technical Committee TC56, Dependability.
As a part of the TC56 working groups, she is working on dependability and reliability standards as a project leader for the revision of many released and current international standards, such as IEC/IEEE/ANSI Reliability Growth, IEC and IEC 61164; Fault Tree Analysis, IEC/ANSI 61025; Testing for the constant failure rate and failure intensity (reliability compliance/demonstration tests), IEC/ANSI; and FMEA, IEC/ANSI 60812; and for preparation of the new IEC standard on Accelerated Testing, IEC.