
1 Safety in large technology systems, October 1999

2 Technology failure
- Why do large, complex systems sometimes fail so spectacularly?
- Do the easy explanations of “operator error,” “faulty technology,” or “complexity” suffice?
- Are there managerial causes of technology failure?
- Are there design principles and engineering protocols that can enhance large system safety?
- What is the role of software in safety and failure?

3 Decision-making about complex technology systems
- the scientific basis of technology systems
- How do managers make intelligent decisions about complex technologies?
- managers, scientists, citizens
- example: Star Wars anti-missile systems
- Note: the technical specialist often does not make the decision; so the persuasive power of good scientific communication is critical.

4 Goal for technology management
- A central problem for designers, policy makers, and citizens, then, is how to avoid large-scale failures when possible, through appropriate design, and how to plan for minimizing the consequences of those failures which will inevitably occur.

5 Information and decision-making
- Information flow and management of complex technology systems
- complex organizations pursue multiple objectives simultaneously
- complex organizations pursue the same objective along different and conflicting paths

6 Surprising failures
- Franco-Prussian war; Israeli intelligence failure in the Yom Kippur war
- The Mercedes A-Class sedan and the moose test
- Chernobyl nuclear power meltdown

7 Therac-25
- high energies
- computer control rather than electromechanical control
- positioning the turntable: x-ray beam flattener
- 15,000 rad administered rather than 200 rad

8 Technology failure
- sources of failure
  - management failures
  - design failures
  - proliferating random failures
  - “storming” the system
- design for “soft landings”
- crisis management

9 Causes of failure
- complexity and multiple causal pathways and relations
- defective procedures
- defective training systems
- “human” error
- faulty design

10 Causes of failure
- “The causes of accidents are frequently, if not almost always, rooted in the organization -- its culture, management, and structure. These factors are all critical to the eventual safety of the engineered system” (Leveson, 47).

11 Varieties of failure
- routine failures, stochastic failures, design failures, systemic failures, interactive failures, “horseshoe nail” failures
- vulnerability of modern technologies to software failure -- the Euro, the Year 2000 bug, air traffic control failures

12 Sources of potential failure
- hardware interlocks replaced with software checks on turntable position (see the sketch below)
- cryptic malfunction codes; frequent messages
- excessive operator confidence in safety systems
- lack of an effective mechanism for reporting and investigating failures
- poor software engineering practices
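A minimal sketch of why the first item mattered (hypothetical function and names, not the actual Therac-25 code): with an independent hardware position switch, beam-on requires agreement between the software's commanded state and a physically sensed state, so a single software fault cannot by itself enable the beam in the wrong configuration.

    # Hypothetical sketch (not Therac-25 code): the turntable position is
    # confirmed by an independent hardware switch, not only by software state.
    ELECTRON_MODE = "electron"
    XRAY_MODE = "xray"

    def beam_on_permitted(commanded_mode, software_position, hardware_switch_position):
        """Permit the beam only when both the software's view of the turntable
        and an independently sensed hardware position match the commanded mode."""
        if software_position != commanded_mode:
            return False          # software check (the only check once interlocks were removed)
        if hardware_switch_position != commanded_mode:
            return False          # independent hardware interlock
        return True

    # Example: software believes the turntable is set for x-ray mode, but the
    # hardware switch still reports electron mode -- the beam stays off.
    print(beam_on_permitted(XRAY_MODE, XRAY_MODE, ELECTRON_MODE))   # False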

13 Organizational factors
- “Large-scale engineered systems are more than just a collection of technological artifacts: They are a reflection of the structure, management, procedures, and culture of the engineering organization that created them, and they are also, usually, a reflection of the society in which they were created” (Leveson, 47).

14 Design for safety
- hazard elimination
- hazard reduction
- hazard control
- damage reduction

15 Aspects of design
- the technology -- machine, vehicle, software system, airport
- the management structure -- locus of decision-making
- the communications system -- transmission of critical and routine information within the organization
- training of workers for the task -- performance skills, safety procedures

16 Information and decision-making
- Information flow and management of complex technology systems
- complex organizations pursue multiple objectives simultaneously
- complex organizations pursue the same objective along different and conflicting paths

17 System safety
- builds safety in rather than simply adding it on to a completed design
- deals with systems as a whole rather than subsystems or components
- takes a larger view of hazards than just failures
- emphasizes analysis rather than past experience and standards

18 System safety (2)
- emphasizes qualitative rather than quantitative approaches
- recognizes the importance of tradeoffs and conflicts in system design
- more than just system engineering

19 Hazard analysis
- development: identify and assess potential hazards
- operations: examine an existing system to improve its safety
- licensing: examine a planned system to demonstrate acceptable safety to a regulatory authority

20 Hazard analysis (2)
- construct an exhaustive inventory of hazards early in design
- classify by severity and probability (see the sketch below)
- construct causal pathways that lead to hazards
- design so as to eliminate, reduce, control, or ameliorate
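A minimal sketch of such an inventory, with invented hazards and ratings (not from the presentation): each hazard is classified by severity and probability, ranked, and mapped to one of the four responses.

    # Hypothetical hazard inventory: classify each hazard by severity and
    # probability, then record the chosen design response.
    hazards = [
        # (hazard, severity 1-4, probability 1-4, response)
        ("beam delivered with turntable out of position", 4, 2, "eliminate"),
        ("operator misreads malfunction code",            3, 3, "reduce"),
        ("sensor returns stale position value",           4, 1, "control"),
        ("overdose occurs despite interlocks",            4, 1, "ameliorate"),
    ]

    def risk_index(severity, probability):
        """Simple risk index: higher values demand earlier design attention."""
        return severity * probability

    # Rank hazards so the most critical are addressed first in design.
    for hazard, sev, prob, response in sorted(hazards, key=lambda h: -risk_index(h[1], h[2])):
        print(f"risk={risk_index(sev, prob):2d}  {response:10s}  {hazard}")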

21 Better software design
- design for the worst case
- avoid “single point of failure” designs
- design “defensively”
- investigate failures carefully and extensively
- look for “root cause,” not symptom or specific transient cause
- embed audit trails; design for simplicity (a sketch follows)
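A minimal sketch of two of these ideas, defensive checks and an embedded audit trail, using hypothetical names and limits: every input is range-checked before use, and every request and rejection leaves a record that a later failure investigation can follow.

    import logging

    # Audit trail: every dose request and refusal is written to a log file.
    logging.basicConfig(filename="audit.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")

    MAX_DOSE_RAD = 200.0   # hypothetical worst-case limit, designed in up front

    def request_dose(requested_rad):
        """Defensive check: never trust the caller; refuse out-of-range input."""
        logging.info("dose requested: %.1f rad", requested_rad)
        if not (0.0 < requested_rad <= MAX_DOSE_RAD):
            logging.info("dose REFUSED: outside 0-%.1f rad", MAX_DOSE_RAD)
            raise ValueError(f"requested dose {requested_rad} rad out of range")
        logging.info("dose accepted: %.1f rad", requested_rad)
        return requested_rad

    request_dose(150.0)        # accepted and logged
    # request_dose(15000.0)    # refused and logged -- the bad request stays on the audit trail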

22 Safe software design
- control software should be designed with maximum simplicity (408)
- design should be testable; limited number of states
- avoid multitasking; use polling rather than interrupts
- design should be easily readable and understood (see the sketch below)
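A minimal sketch of what “a limited number of states” and “polling rather than interrupts” can look like in practice (hypothetical device functions, not from the text): a single-threaded loop polls its input on a fixed cycle and moves through a small, explicitly listed set of states.

    import time

    # Explicit, enumerable states -- easy to test exhaustively.
    IDLE, ARMING, ACTIVE, FAULT = "IDLE", "ARMING", "ACTIVE", "FAULT"

    def read_sensor_ok():
        """Placeholder for a polled hardware read (hypothetical)."""
        return True

    def control_loop(cycles=5, period_s=0.1):
        state = IDLE
        for _ in range(cycles):              # single thread, fixed polling period:
            sensor_ok = read_sensor_ok()     # no interrupts, no multitasking
            if not sensor_ok:
                state = FAULT                # any anomaly drives the system to FAULT
            elif state == IDLE:
                state = ARMING
            elif state == ARMING:
                state = ACTIVE
            print(state)
            time.sleep(period_s)             # worst-case timing is readable from the code

    control_loop()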

23 Safe software (2)
- interactions between components should be limited and straightforward
- worst-case timing should be determinable by review of the code
- code should include only the minimum features and capabilities required by the system; no unnecessary or undocumented features

24 Safe software (3)
- critical decisions (launch a missile) should not be made on values often taken by failed components -- 0 or 1 (see the sketch below)
- messages should be designed to eliminate the possibility of computer hardware failures having hazardous consequences (missile launch example)
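A minimal sketch of that idea with an invented command format: the “go” decision is keyed to a specific multi-bit pattern, so a register that fails to all-zeros or all-ones, or a garbled message, reads as “do not launch.”

    # Hypothetical: a stuck-at-0 or stuck-at-1 word must never look like a valid command.
    ARM_PATTERN = 0xA5C3    # deliberate pattern; 0x0000 and 0xFFFF are never valid

    def launch_authorized(command_word):
        """Authorize only on an exact match; any failed or garbled value means 'no'."""
        return command_word == ARM_PATTERN

    print(launch_authorized(0x0000))   # False -- component failed low
    print(launch_authorized(0xFFFF))   # False -- component failed high
    print(launch_authorized(0xA5C3))   # True  -- only the deliberate pattern authorizes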

25 Safe software (4)
- strive for maximal decoupling of parts of a software control system
- accidents in tightly coupled systems are a result of unplanned interactions
- the flexibility of software encourages coupling and multiple functions; important to resist this impulse

26 Safe software (5)
- “Adding computers to potentially dangerous systems is likely to increase accidents unless extra care is put into system design” (411).

27 Scope and limits of simulations
- Computer simulations permit “experiments” on different scenarios presented to complex systems
- Simulations are not reality
- Simulations represent some factors and exclude others
- Simulations rely on a mathematization of the process that may be approximate or even false

28 Human interface considerations
- unambiguous error messages (Therac-25) -- a sketch follows below
- operator needs extensive knowledge about the “theory” of the system
- alarms need to be comprehensible (Three Mile Island); spurious alarms minimized
- operator needs knowledge about the timing and sequencing of events
- design of the control board is critical
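A minimal, hypothetical contrast (the replacement wording below is invented; “Malfunction 54” was one of the Therac-25's actual cryptic codes): a bare numeric code forces the operator to guess, while a message that names the condition, its consequence, and the required action is unambiguous.

    # Hypothetical: translate internal fault codes into messages that state the
    # condition, the consequence, and what the operator must do next.
    FAULT_MESSAGES = {
        54: "Dose delivered differs from dose prescribed. Treatment halted. "
            "Do NOT resume; contact physics staff before any further beam-on.",
    }

    def operator_message(code):
        return FAULT_MESSAGES.get(code, f"Unknown fault {code}. Treatment halted; do not resume.")

    print("MALFUNCTION 54")          # cryptic: the operator cannot tell how serious it is
    print(operator_message(54))      # explicit: condition, consequence, required action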

29 Control panel anomalies

30 Risk assessment and prediction
- What is involved in assessing risk?
  - probability of failure
  - prediction of consequences of failure
  - failure pathways

31 Reasoning about risk
- How should we reason about risk?
- Expected utility: probability of outcome x utility of outcome (see the sketch below)
- Probability and science
- How to anticipate failure scenarios?
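A minimal worked sketch of the expected-utility rule with invented numbers: each option's expected value is the probability-weighted sum of the utilities of its outcomes.

    # Hypothetical outcomes for one decision: (probability, utility).
    option_a = [(0.999, 10.0), (0.001, -5000.0)]   # usually fine, rare large loss
    option_b = [(0.90,   8.0), (0.10,   -20.0)]    # slightly worse, small frequent losses

    def expected_utility(outcomes):
        """Expected utility = sum over outcomes of probability x utility."""
        return sum(p * u for p, u in outcomes)

    print(expected_utility(option_a))   # 9.99 - 5.0 = 4.99
    print(expected_utility(option_b))   # 7.2  - 2.0 = 5.2  -> higher expected utility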

32 Compare scenarios
- nuclear power vs. coal power
- automated highway system vs. routine traffic accidents

33 Ordinary reasoning and judgment
- well-known “fallacies” of ordinary reasoning:
  - time preference
  - framing
  - risk aversion

34 Large risks and small risks
- the decision-theory approach: minimize expected harms
- the decision-making reality: large harms are more difficult to absorb, even if smaller in overall consequence (a numeric sketch follows)
- example: JR West railway
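A small illustration with invented numbers of why the two approaches can diverge: the rare-catastrophe profile has the lower expected harm per year, yet a single event of that size may be far harder for an organization or society to absorb than a steady stream of small accidents.

    # Hypothetical annual-harm profiles.
    many_small = {"events_per_year": 1000, "deaths_per_event": 1}       # 1000 expected deaths/yr
    rare_large = {"events_per_year": 0.01, "deaths_per_event": 50000}   #  500 expected deaths/yr

    def expected_deaths_per_year(profile):
        return profile["events_per_year"] * profile["deaths_per_event"]

    print(expected_deaths_per_year(many_small))   # 1000.0
    print(expected_deaths_per_year(rare_large))   #  500.0 -- lower in expectation,
                                                  # but concentrated in one catastrophe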

35 The end

