The Application of Causal Analysis Techniques for Computer-Related Mishaps. Chris Johnson, University of Glasgow, Scotland. http://www.dcs.gla.ac.uk/~johnson SAFECOMP: 26th September 2003
Acknowledgements
HSE: Mark Bowell, Ray Ward.
Adelard: George Clelland, Peter Bishop, Luke Emmett, Sofia Guerra, Robin Bloomfield.
Blacksafe Consulting: Bill Black.
Glasgow University: Chris Johnson.
“Look, I’m not blaming you, I’m just suing you…”
Bias
Author bias: individuals are reluctant to accept findings they did not produce.
Confidence bias: people trust those with most confidence in their techniques.
Hindsight bias: investigators use information that was unavailable to the people in the incident.
Judgement bias: investigators must reach a decision within a constrained time period.
Political bias: a high-status member has influence through status, not through judgement itself…
“At this point in the meeting, I’d like to shift the blame from me onto someone else…”
The Sunday Telegraph, September 7th, 2003, page 33. Does this really look like me? Fish accidents?
“The NASA Accident Investigation Team investigated the accident using “fault trees,” a common organizational tool in systems engineering. Fault trees are graphical representations of every conceivable sequence of events that could cause a system to fail.” (CAIB, p.85)
But… Fault Trees:
- not good for event sequences (poor notion of time);
- few engineers would agree with “every conceivable”?
* work with Clif Ericson at Boeing on Accident Fault Trees *
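The "poor notion of time" point can be made concrete with a minimal sketch of a fault tree as AND/OR gates over basic events. The class names, the `occurs` function, and the event names are hypothetical and chosen only for illustration; they are not taken from any published fault tree. Note that evaluation takes an unordered set of events, so the tree cannot tell one ordering of the same events from another.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Event:
    """A basic event (leaf) in the fault tree."""
    name: str


@dataclass
class Gate:
    """An AND/OR gate combining child events or gates."""
    kind: str  # "AND" or "OR"
    children: List[Union["Gate", Event]] = field(default_factory=list)


def occurs(node: Union[Gate, Event], observed: Set[str]) -> bool:
    """Return True if the (sub)tree's top event follows from the observed events."""
    if isinstance(node, Event):
        return node.name in observed
    results = [occurs(child, observed) for child in node.children]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: the top event needs a stuck valve AND some failure of feedback.
top = Gate("AND", [
    Event("valve_stuck_closed"),
    Gate("OR", [Event("display_shows_valve_open"), Event("alarm_flood")]),
])

print(occurs(top, {"valve_stuck_closed", "alarm_flood"}))  # True
print(occurs(top, {"alarm_flood"}))                        # False
```

Because `observed` is a set, "valve stuck, then alarm flood" and "alarm flood, then valve stuck" are the same input. Capturing sequence would need a different representation, such as a timeline.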
Control system closes valve A, starves debutanizer. Also closes valve B, heating raises debutanizer pressure. Opens valve A, debutanizer flow restored. Valve B should open to splitter. Operators see misleading signals, valve B shown open. Debutanizer fills while naphtha splitter empties.
Motivation: Milford Haven
Separate displays. Operators didn't check the status of valve B and opened valve C. Debutanizer vents to flare, wet gas compressor restarts. This should increase flow but instead increases debutanizer pressure. Material vents to flare drum, corroded discharge line breaks. 20 tonnes of hydrocarbon ignites, damage > £50 million.
Motivation: Milford Haven
Human ‘Error’ and Plant Design/Operation
“Operators were not provided with information systems configured to help them identify the root cause of such problems. Secondly, the preparation of shift operators and supervisors for dealing with a sustained upset and therefore stressful situation was inadequate.”
Safety Management Systems
“… the company’s crucial safety management systems were not adequately performing their function. Examples are the systems for modification and inspection. Company was unaware of defects in safety management systems because its monitoring of their performance did not effectively highlight problems.”
Risk Assessment
“…3 years before a modification was carried out so automated high-capacity discharge pumps no longer automatically started to move excess to slops from flare discharge tank. Instead, low capacity pumps recycle material back to production process. Valves had to be operated manually if high-capacity pumping to slops needed but this was seldom (never?) practiced”.
Control Flaws
1. Inadequate Enforcement of Constraints (Control Actions)
  –1.1 Unidentified hazards
  –1.2 Inappropriate, ineffective or missing control actions for identified hazards
    1.2.1 Design of control algorithm (process) does not enforce constraints
      –Flaws in creation process
      –Process changes without appropriate change in control algorithm (asynchronous evolution)
      –Incorrect modification or adaptation
    1.2.2 Process models inconsistent, incomplete or incorrect (lack of linkup)
      –Flaws in creation process
      –Flaws in updating process (asynchronous evolution)
      –Time lags and measurement inaccuracies not accounted for
    1.2.3 Inadequate coordination among controllers and decision makers
2. Inadequate Execution of Control Action
  –2.1 Communication flaw
  –2.2 Inadequate actuator operation
  –2.3 Time lag
3. Inadequate or Missing Feedback
  –3.1 Not provided in system design
  –3.2 Communication flaw
  –3.3 Time lag
  –3.4 Inadequate sensor operation (incorrect or no information provided)
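One way to apply a classification like this is to hold it as a lookup table and tag each investigation finding with a flaw code. The sketch below is illustrative only: the `CONTROL_FLAWS` table abbreviates the slide's entries, the `ancestry` helper is hypothetical, and tagging the Milford Haven "valve B shown open" signal as 3.4 is an assumption made for the example, not a classification taken from the original analysis.

```python
from typing import Dict, List

# Codes and labels follow the slide's numbering; the third-level bullets are omitted.
CONTROL_FLAWS: Dict[str, str] = {
    "1": "Inadequate enforcement of constraints (control actions)",
    "1.1": "Unidentified hazards",
    "1.2": "Inappropriate, ineffective or missing control actions for identified hazards",
    "1.2.1": "Design of control algorithm (process) does not enforce constraints",
    "1.2.2": "Process models inconsistent, incomplete or incorrect",
    "1.2.3": "Inadequate coordination among controllers and decision makers",
    "2": "Inadequate execution of control action",
    "2.1": "Communication flaw",
    "2.2": "Inadequate actuator operation",
    "2.3": "Time lag",
    "3": "Inadequate or missing feedback",
    "3.1": "Not provided in system design",
    "3.2": "Communication flaw",
    "3.3": "Time lag",
    "3.4": "Inadequate sensor operation (incorrect or no information provided)",
}


def ancestry(code: str) -> List[str]:
    """Return the labels from the top-level class down to `code`."""
    if code not in CONTROL_FLAWS:
        raise KeyError(f"unknown control flaw code: {code}")
    parts = code.split(".")
    return [CONTROL_FLAWS[".".join(parts[:i])] for i in range(1, len(parts) + 1)]


# Assumed tagging, for illustration only.
finding = "Operators see misleading signals, valve B shown open"
for depth, label in enumerate(ancestry("3.4")):
    print("  " * depth + label)
```

Keeping the codes hierarchical means a finding tagged at a leaf can still be counted under its top-level class when findings are summarised.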
Conclusions
Several classes of causal analysis techniques for E/E/PES:
–Elicitation Techniques (e.g., Barrier Analysis);
–Event-based techniques (e.g., Accident fault trees);
–Flow Charts (e.g., PRISMA);
–Accident Models (e.g., control theory models in STAMP);
–Argumentation Techniques (e.g., counterfactual WBA).
How do we assess them?
–investment (i.e., training and time required to apply them);
–consistency of individuals applying approach to same incident;
–degree of support for recommendations/redesign?
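The "consistency" criterion could be quantified in several ways. As a minimal sketch, the code below uses the mean pairwise Jaccard overlap between the sets of causal factors that different analysts identify for the same incident. The choice of measure, the function names, and the analyst data are all assumptions made for illustration; they are not the evaluation method or results of the study described here.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets of causal factors: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty analyses agree trivially
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(findings: Dict[str, Set[str]]) -> float:
    """Average Jaccard overlap across every pair of analysts."""
    pairs = list(combinations(findings.values(), 2))
    if not pairs:
        raise ValueError("need findings from at least two analysts")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical causal factors named by three analysts for one incident.
findings = {
    "analyst_1": {"display_design", "training"},
    "analyst_2": {"display_design", "modification_control"},
    "analyst_3": {"display_design", "training"},
}

print(round(mean_pairwise_agreement(findings), 3))  # 0.556
```

A score near 1 would suggest that a technique leads different analysts to similar causes. Comparing such scores across techniques only makes sense if the analysts share a vocabulary of causal factors.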
Conclusions
Can a technique analyze failures at every stage of E/E/PES development?
–Need to identify all candidate stages of development…
–Assess techniques against IEC 61508 development model.
–Other standards/models might have been used.
Begin with subjective assessments + peer review (NTSB and NASA).
Currently validating against industrial experience.
Methodological problems (who has used more than 2 techniques?).