Direct Cause vs Root Cause: A Problem Solving Concept
INCOSE Enchantment Chapter Meeting, March 14, 2007
Dr. David E. Peercy, Sandia National Laboratories, Department 12341, Weapon System and Software Quality
Presentation Objective
Events have many potential causes. We tend to think of causes as related mostly to unwanted events, but in effect every event that occurs has a cause, that is, a reason that the event occurs. The objective of this short presentation/discussion is to gain a better understanding of why it is important to distinguish direct causes from root causes of events. In so doing, we enhance our capability to influence a much larger class of events, both in preventing unwanted events and in ensuring that wanted events actually do occur.
An Example of a Problem
USAF F-22A jets grounded by software glitch (Fri, 23 Feb)
Navigational systems failed, and the planes were forced to return to Hawaii [visually having to follow their tankers to safety]. The problem turns out to be software (no surprise there). Fix created, "verified", installed, and they're off again. [Direct or Root Cause addressed?]
A spokesman for Lockheed Martin this week insisted that the navigation software problem was minor. 'The issue was quickly identified in a matter of days and a fix installed in the airplanes, which were flown successfully to Japan,' he said. 'There are 87 of these exceptional fighters and they are out there performing exceptionally well, and their pilots continue to fly them in new and greater ways.'
Examples to Test Our Understanding
– Army Training Accident, June 2002
– Friendly Fire Deaths, March 2002
– Medical Direct/Root Cause Determinations
RESOURCE: Peter Neumann, Stanford University Professor. The RISK site provides a voluminous list of risks, many of which are computer/software related. It is primarily concerned with security and safety risks; summaries are provided with links to more detail.
A Simple Example
Assume each of these factors is as described below:
e: car will not start
d: battery is dead
c: alternator does not function
b: alternator is well beyond its designed service life
a: car is not being maintained according to recommended service schedule
Direct Cause? Intermediary Causes? Root Cause?
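One possible reading of the chain above can be sketched in Python. This is a minimal illustration, not part of the original presentation: it assumes the factors form a single linear chain (each effect has exactly one cause), that the direct cause is the one immediately behind the event, and that the root cause is the last one reached. The names `FACTORS`, `CAUSED_BY`, and `classify_causes` are hypothetical.

```python
# Factor labels follow the slide; the dictionary layout is an assumption
# made for illustration only.
FACTORS = {
    "e": "car will not start",
    "d": "battery is dead",
    "c": "alternator does not function",
    "b": "alternator is well beyond its designed service life",
    "a": "car is not being maintained according to recommended service schedule",
}

# Each effect maps to the cause immediately behind it (assumed linear chain).
CAUSED_BY = {"e": "d", "d": "c", "c": "b", "b": "a"}


def classify_causes(event, caused_by):
    """Walk back from an event and label the causes found along the chain."""
    chain = []
    seen = {event}
    current = event
    while current in caused_by:
        current = caused_by[current]
        if current in seen:
            raise ValueError(f"causal loop detected at {current!r}")
        seen.add(current)
        chain.append(current)
    if not chain:
        return {"direct": None, "intermediate": [], "root": None}
    return {
        "direct": chain[0],            # leads immediately to the event
        "intermediate": chain[1:-1],   # everything between direct and root
        "root": chain[-1],             # initiating cause of the chain
    }


result = classify_causes("e", CAUSED_BY)
for role, value in result.items():
    print(role, value)
```

Under these assumptions the battery (d) is the direct cause and the maintenance lapse (a) is the root cause. Real causal structures are usually trees with several branches rather than a single chain.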
Error, Fault/Defect, Failure
Error
– a human action or lack of action that results in the inclusion of a fault in a product or the way it is used
– the variance between expected and actual results
Fault/Defect
– an accidental condition that causes a product to fail to perform its required function if encountered during operational use
Failure
– an event in which a product does not perform a required function within its specified limits during operational use
[Diagram: ERROR may lead to FAULT/DEFECT, which may lead to FAILURE or, with FAULT TOLERANCE, to NO FAILURE or REDUCED EFFECT]
Direct Cause
Causes of events may be natural or man-made, active or passive, initiating or permitting, obvious or hidden. Those causes that lead immediately to the effect are often called direct or proximate causes.
Examples of direct/proximate causes:
Equipment: arced, leaked, over-loaded, over-heated
Human: pushed incorrect button, fell, dropped tool, connected wires
Root Cause
Direct causes often result from another set of causes, which could be called intermediate causes, and these may be the result of still other causes. When a chain of cause and effect is followed from a known end-state back to an origin or starting point, root causes are found.
The process used to find root causes is called root cause analysis: systematic problem solving.
A root cause is an initiating cause of a causal chain which leads to an outcome or effect of interest.
The Benefits of Problem Solving!
The usual purpose of attempting to find root causes is to solve a problem that has actually occurred, or to prevent a less serious problem from escalating to an unacceptable level (e.g., near-miss safety for aircraft).
The basic concept is that solving a problem by addressing root causes is ultimately more effective than merely addressing symptoms or direct causes. That is, a class of problems may be solved/prevented by addressing root causes rather than just direct causes.
Basic Process - Continue to Ask Why!
Continue to ask why until you have reached:
1. Direct, intermediate, and root cause(s), including all organizational factors that exert control over the design, fabrication, development, maintenance, operation, and disposal of the system.
2. A problem/cause that is not correctable by your organization, which may be promoted to a higher responsible organization.
3. Insufficient data to continue.
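The three stopping conditions can be sketched as a small loop. This is an illustrative sketch only: the `Finding` record, the `StopReason` names, and the idea that an analyst marks a finding as a root cause or as not correctable locally are all assumptions introduced here, not something the presentation prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StopReason(Enum):
    ROOT_CAUSE_REACHED = "root cause reached"
    NOT_CORRECTABLE_HERE = "promote to higher responsible organization"
    INSUFFICIENT_DATA = "insufficient data to continue"


@dataclass
class Finding:
    description: str
    why: Optional[str] = None       # key of the next cause, if known
    is_root: bool = False           # analyst's judgment (assumption)
    correctable_here: bool = True   # within this organization's control?


def ask_why(start, findings):
    """Keep asking why until one of the three stopping conditions is met."""
    path = []
    visited = set()
    key = start
    while True:
        if key is None or key not in findings:
            return path, StopReason.INSUFFICIENT_DATA       # condition 3
        if key in visited:
            raise ValueError(f"causal loop detected at {key!r}")
        visited.add(key)
        finding = findings[key]
        path.append(finding.description)
        if finding.is_root:
            return path, StopReason.ROOT_CAUSE_REACHED      # condition 1
        if not finding.correctable_here:
            return path, StopReason.NOT_CORRECTABLE_HERE    # condition 2
        key = finding.why


# Hypothetical data for illustration.
findings = {
    "e": Finding("car will not start", why="d"),
    "d": Finding("battery is dead", why="c"),
    "c": Finding("alternator does not function"),  # nothing further recorded
}
path, reason = ask_why("e", findings)
print(path, reason.value)
```

Here the walk stops for lack of data after three findings, which is a prompt to collect more information rather than a conclusion about the root cause.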
Example
Why-Causal Tree
Example
Potential Problem Analysis Tools
Failure Modes and Effects Analysis (FMEA)
– an inductive engineering technique used at the component level to define, identify, and eliminate known and/or potential failures, problems, and errors from the system, design, process, and/or service before they reach the customer
Fault Tree Analysis (FTA)
– a deductive analytical technique for reliability and safety analyses, generally used for complex dynamic systems
Probabilistic Risk Assessment (PRA)
– a systematic, logical, and comprehensive discipline that uses tools like FMEA, FTA, Event Tree Analysis (ETA), Event Sequence Diagrams (ESD), Master Logic Diagrams (MLD), Reliability Block Diagrams (RBD), and so forth to quantify risk
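To give a feel for how a fault tree is quantified, here is a minimal Python sketch of the two basic gates. It assumes the basic events are statistically independent, and every probability in it is made up for illustration; the event names and the helpers `p_and` and `p_or` are hypothetical and do not come from the presentation.

```python
from math import prod


def p_and(*probs):
    """AND gate: the output occurs only if all independent inputs occur."""
    return prod(probs)


def p_or(*probs):
    """OR gate: the output occurs if at least one independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities (illustrative values only).
p_alternator_fails = 0.02
p_battery_not_recharged = 0.5
p_starter_fails = 0.001

# Intermediate event: the battery is dead only if both inputs occur.
p_battery_dead = p_and(p_alternator_fails, p_battery_not_recharged)

# Top event: the car will not start if either branch occurs.
p_car_will_not_start = p_or(p_battery_dead, p_starter_fails)
print(round(p_car_will_not_start, 5))
```

The tree is read deductively, from the top event down to the basic events, which is the opposite direction from the component-up reasoning of FMEA.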
Summary
Direct Cause vs Root Cause
– Issue: level of problem solving
Problem Solving
– Direct Cause: objective is to solve an instance of a potential class of problems
– Root Cause: objective is to solve a class of problems
– Both are useful
Analysis Methods
– Methods exist to analyze events; the goal is to eliminate occurrence of unwanted events and ensure wanted events do occur
– FMEA, FTA, PRA
Q&A?
Army Training Accident
Incident
– Thu, 13 Jun 2002: two soldiers were killed in training at Ft Drum. They were firing artillery shells and were relying on the output of the Advanced Field Artillery Tactical Data System. When they forgot to enter the target altitude, the system assumed an altitude of zero. (Ft Drum is at 676 ft.)
Direct Cause
– Soldiers forgot to enter the target altitude
Potential Root Cause(s)
– Software should not default to a valid altitude
– Software/system analysis and modeling/testing inadequate
– Software requirements not adequately specified
– System CONOPS not adequate
– Soldier training inadequate
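The first potential root cause, a missing input silently becoming a valid value, is a general software design issue. The sketch below is a generic Python illustration of the alternative (refuse to proceed when a required field was never entered). It is not a description of how the fielded system works; `MissingFieldError` and `parse_altitude_ft` are hypothetical names.

```python
from typing import Optional


class MissingFieldError(ValueError):
    """Raised when a required input was never entered."""


def parse_altitude_ft(raw: Optional[str]) -> float:
    """Return the entered altitude, or fail loudly if nothing was entered.

    An explicit "0" is accepted because zero is a legitimate altitude;
    only an absent or blank entry is rejected.
    """
    if raw is None or not raw.strip():
        raise MissingFieldError("target altitude was not entered")
    return float(raw)


print(parse_altitude_ft("676"))
try:
    parse_altitude_ft("")
except MissingFieldError as err:
    print("rejected:", err)
```

The design choice is to keep "not entered" distinguishable from every valid value, so that the omission surfaces at data entry instead of in the computed result.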
Friendly Fire Deaths
Incident
– A U.S. Special Forces air controller was calling in GPS positioning from some sort of battery-powered device. He had used the GPS receiver to calculate the latitude and longitude of the Taliban position in minutes and seconds for an airstrike by a Navy F/A-18. The bomber crew "required" a second calculation in degree decimals. The crew did not have equipment to perform the minutes-seconds conversion themselves.
– The air controller had recorded the correct value in the GPS receiver when the battery died. Upon replacing the battery, he called in the degree-decimal position the unit was showing, without realizing that the unit is set up to reset to its *own* position when the battery is replaced.
– The 2,000-pound bomb landed on the air controller position, killing three Special Forces soldiers and injuring 20 others.
Direct Cause
– Taliban position was incorrectly transmitted to the Navy F/A-18 bomber crew
Potential Root Cause(s)
– GPS system default was a valid position rather than an invalid one
– Lack of battery backup to hold values in memory during battery replacement
– Not equipping users to translate one coordinate system to another (reminiscent of the Mars Climate Orbiter slamming into the planet when ground crews confused English with metric)
– Using a device with such flaws in a combat situation without adequate testing
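The coordinate translation mentioned above is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal Python sketch follows; the function name `dms_to_decimal` and the sample coordinates are hypothetical and are not taken from the incident.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if degrees < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError("degrees, minutes, or seconds out of range")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by the usual convention.
    return -value if hemisphere in ("S", "W") else value


# Made-up coordinates for illustration.
print(dms_to_decimal(44, 3, 0, "N"))
print(dms_to_decimal(75, 45, 36, "W"))
```

Range checks are included so that a malformed entry is rejected rather than converted into a plausible-looking position.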
Medical Direct/Root Cause Example 1 - Questions?
Sentinel event: A patient was given the wrong medication and experienced an adverse reaction. As a result, the patient's length of stay was extended by an additional 10 days.
Direct cause: The nurse who administered the medication did not compare the name on the patient's armband to the name on the medication order.
Root cause - thoughts? The nurse did not follow the patient identification policy. Registration staff placed the wrong armband on the patient's arm to begin with.
Medical Direct/Root Cause Example 2 - Questions?
Sentinel event: Doctor prescribes an anti-seizure drug (phenytoin) and the patient develops a severe allergic reaction known as anaphylaxis. The symptoms were itching, hives, swelling in the throat, wheezing, light-headedness from low blood pressure, nausea, and abdominal cramping.
Direct cause: Patient is allergic to phenytoin.
Root cause - thoughts? The doctor did not do a thorough check of the patient's medical history, or the patient did not inform the doctor of his/her previous medical history.
Medical Direct/Root Cause Example 3 - Questions?
Sentinel event: A Lasix drip was hung for the wrong patient. The two patients had the same last name.
Direct cause: Interruption during medication administration. The nurse had a very heavy patient assignment and skipped the medication administration double check with another RN.
Root cause - thoughts? The double-check process on patient identification and medication administration was missed. All hospital medication should be double checked by two nurses.
Medical Direct/Root Cause Example 4 - Questions?
Sentinel event: A patient slips and falls on a slippery floor that had been mopped earlier after another patient had an upset stomach.
Direct cause: The janitor was not able to put caution signs down before the patient walked down the hall, because he was interrupted by a cafeteria worker who needed him to clean up a spill.
Root cause - thoughts? The caution sign was not down.