
1 Safety in large technology systems, October 1999

2 Technology failure
- Why do large, complex systems sometimes fail so spectacularly?
- Do the easy explanations of “operator error,” “faulty technology,” or “complexity” suffice?
- Are there managerial causes of technology failure?
- Are there design principles and engineering protocols that can enhance large system safety?
- What is the role of software in safety and failure?

3 Decision-making about complex technology systems
- the scientific basis of technology systems
- How do managers make intelligent decisions about complex technologies?
- managers, scientists, citizens
- example: Star Wars anti-missile systems
- Note: the technical specialist often does not make the decision; so the persuasive power of good scientific communication is critical.

4 Goal for technology management
- A central problem for designers, policy makers, and citizens, then, is how to avoid large-scale failures when possible, through appropriate design, and how to plan for minimizing the consequences of those failures which will inevitably occur.

5 Information and decision-making
- Information flow and management of complex technology systems
- complex organizations pursue multiple objectives simultaneously
- complex organizations pursue the same objective along different and conflicting paths

6 Surprising failures
- Franco-Prussian war; Israeli intelligence failure in the Yom Kippur war
- The Mercedes A-Class sedan and the moose test
- Chernobyl nuclear power meltdown

7 Therac-25
- high energies
- computer control rather than electromechanical control
- positioning the turntable: x-ray beam flattener
- 15,000 rad administered rather than 200 rad

8 Technology failure
- sources of failure
  - management failures
  - design failures
  - proliferating random failures
  - “storming” the system
- design for “soft landings”
- crisis management

9 Causes of failure
- complexity and multiple causal pathways and relations
- defective procedures
- defective training systems
- “human” error
- faulty design

10 Causes of failure
- “The causes of accidents are frequently, if not almost always, rooted in the organization -- its culture, management, and structure. These factors are all critical to the eventual safety of the engineered system” (Leveson, 47).

11 Varieties of failure
- routine failures, stochastic failures, design failures, systemic failures, interactive failures, “horseshoe nail” failures
- vulnerability of modern technologies to software failure -- the Euro, the Year 2000 bug, air traffic control failures

12 Sources of potential failure
- hardware interlocks replaced with software checks on turntable position (see the sketch below)
- cryptic malfunction codes; frequent messages
- excessive operator confidence in safety systems
- lack of an effective mechanism for reporting and investigating failures
- poor software engineering practices
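A minimal sketch of why the first item mattered (hypothetical function and names, not the actual Therac-25 code): with an independent hardware position switch, beam-on requires agreement between the software's commanded state and a physically sensed state, so a single software fault cannot by itself enable the beam in the wrong configuration.

    # Hypothetical sketch (not Therac-25 code): the turntable position is
    # confirmed by an independent hardware switch, not only by software state.
    ELECTRON_MODE = "electron"
    XRAY_MODE = "xray"

    def beam_on_permitted(commanded_mode, software_position, hardware_switch_position):
        """Permit the beam only when both the software's view of the turntable
        and an independently sensed hardware position match the commanded mode."""
        if software_position != commanded_mode:
            return False          # software check (the only check once interlocks were removed)
        if hardware_switch_position != commanded_mode:
            return False          # independent hardware interlock
        return True

    # Example: software believes the turntable is set for x-ray mode, but the
    # hardware switch still reports electron mode -- the beam stays off.
    print(beam_on_permitted(XRAY_MODE, XRAY_MODE, ELECTRON_MODE))   # False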

13 Organizational factors
- “Large-scale engineered systems are more than just a collection of technological artifacts: They are a reflection of the structure, management, procedures, and culture of the engineering organization that created them, and they are also, usually, a reflection of the society in which they were created” (Leveson, 47).

14 Design for safety
- hazard elimination
- hazard reduction
- hazard control
- damage reduction

15 Aspects of design
- the technology -- machine, vehicle, software system, airport
- the management structure -- locus of decision-making
- the communications system -- transmission of critical and routine information within the organization
- training of workers for the task -- performance skills, safety procedures

16 Information and decision-making
- Information flow and management of complex technology systems
- complex organizations pursue multiple objectives simultaneously
- complex organizations pursue the same objective along different and conflicting paths

17 System safety
- builds safety in rather than simply adding it on to a completed design
- deals with systems as a whole rather than subsystems or components
- takes a larger view of hazards than just failures
- emphasizes analysis rather than past experience and standards

18 System safety (2)
- emphasizes qualitative rather than quantitative approaches
- recognizes the importance of tradeoffs and conflicts in system design
- more than just system engineering

19 Hazard analysis
- development: identify and assess potential hazards
- operations: examine an existing system to improve its safety
- licensing: examine a planned system to demonstrate acceptable safety to a regulatory authority

20 Hazard analysis (2)
- construct an exhaustive inventory of hazards early in design
- classify by severity and probability (see the sketch below)
- construct causal pathways that lead to hazards
- design so as to eliminate, reduce, control, or ameliorate
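A minimal sketch of such an inventory, with invented hazards and ratings (not from the presentation): each hazard is classified by severity and probability, ranked, and mapped to one of the four responses.

    # Hypothetical hazard inventory: classify each hazard by severity and
    # probability, then record the chosen design response.
    hazards = [
        # (hazard, severity 1-4, probability 1-4, response)
        ("beam delivered with turntable out of position", 4, 2, "eliminate"),
        ("operator misreads malfunction code",            3, 3, "reduce"),
        ("sensor returns stale position value",           4, 1, "control"),
        ("overdose occurs despite interlocks",            4, 1, "ameliorate"),
    ]

    def risk_index(severity, probability):
        """Simple risk index: higher values demand earlier design attention."""
        return severity * probability

    # Rank hazards so the most critical are addressed first in design.
    for hazard, sev, prob, response in sorted(hazards, key=lambda h: -risk_index(h[1], h[2])):
        print(f"risk={risk_index(sev, prob):2d}  {response:10s}  {hazard}")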

21 Better software design
- design for the worst case
- avoid “single point of failure” designs
- design “defensively”
- investigate failures carefully and extensively
- look for “root cause,” not symptom or specific transient cause
- embed audit trails; design for simplicity (a sketch follows)
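A minimal sketch of two of these ideas, defensive checks and an embedded audit trail, using hypothetical names and limits: every input is range-checked before use, and every request and rejection leaves a record that a later failure investigation can follow.

    import logging

    # Audit trail: every dose request and refusal is written to a log file.
    logging.basicConfig(filename="audit.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")

    MAX_DOSE_RAD = 200.0   # hypothetical worst-case limit, designed in up front

    def request_dose(requested_rad):
        """Defensive check: never trust the caller; refuse out-of-range input."""
        logging.info("dose requested: %.1f rad", requested_rad)
        if not (0.0 < requested_rad <= MAX_DOSE_RAD):
            logging.info("dose REFUSED: outside 0-%.1f rad", MAX_DOSE_RAD)
            raise ValueError(f"requested dose {requested_rad} rad out of range")
        logging.info("dose accepted: %.1f rad", requested_rad)
        return requested_rad

    request_dose(150.0)        # accepted and logged
    # request_dose(15000.0)    # refused and logged -- the bad request stays on the audit trail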

22 Safe software design
- control software should be designed with maximum simplicity (408)
- design should be testable; limited number of states
- avoid multitasking; use polling rather than interrupts
- design should be easily readable and understood (see the sketch below)
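A minimal sketch of what “a limited number of states” and “polling rather than interrupts” can look like in practice (hypothetical device functions, not from the text): a single-threaded loop polls its input on a fixed cycle and moves through a small, explicitly listed set of states.

    import time

    # Explicit, enumerable states -- easy to test exhaustively.
    IDLE, ARMING, ACTIVE, FAULT = "IDLE", "ARMING", "ACTIVE", "FAULT"

    def read_sensor_ok():
        """Placeholder for a polled hardware read (hypothetical)."""
        return True

    def control_loop(cycles=5, period_s=0.1):
        state = IDLE
        for _ in range(cycles):              # single thread, fixed polling period:
            sensor_ok = read_sensor_ok()     # no interrupts, no multitasking
            if not sensor_ok:
                state = FAULT                # any anomaly drives the system to FAULT
            elif state == IDLE:
                state = ARMING
            elif state == ARMING:
                state = ACTIVE
            print(state)
            time.sleep(period_s)             # worst-case timing is readable from the code

    control_loop()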

23 Safe software (2)
- interactions between components should be limited and straightforward
- worst-case timing should be determinable by review of the code
- code should include only the minimum features and capabilities required by the system; no unnecessary or undocumented features

24 Safe software (3)
- critical decisions (launch a missile) should not be made on values often taken by failed components -- 0 or 1 (see the sketch below)
- messages should be designed to eliminate the possibility of computer hardware failures having hazardous consequences (missile launch example)
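A minimal sketch of that idea with an invented command format: the “go” decision is keyed to a specific multi-bit pattern, so a register that fails to all-zeros or all-ones, or a garbled message, reads as “do not launch.”

    # Hypothetical: a stuck-at-0 or stuck-at-1 word must never look like a valid command.
    ARM_PATTERN = 0xA5C3    # deliberate pattern; 0x0000 and 0xFFFF are never valid

    def launch_authorized(command_word):
        """Authorize only on an exact match; any failed or garbled value means 'no'."""
        return command_word == ARM_PATTERN

    print(launch_authorized(0x0000))   # False -- component failed low
    print(launch_authorized(0xFFFF))   # False -- component failed high
    print(launch_authorized(0xA5C3))   # True  -- only the deliberate pattern authorizes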

25 Safe software (4)
- strive for maximal decoupling of parts of a software control system
- accidents in tightly coupled systems are a result of unplanned interactions
- the flexibility of software encourages coupling and multiple functions; important to resist this impulse

26 Safe software (5)
- “Adding computers to potentially dangerous systems is likely to increase accidents unless extra care is put into system design” (411).

27 Scope and limits of simulations
- Computer simulations permit “experiments” on different scenarios presented to complex systems
- Simulations are not reality
- Simulations represent some factors and exclude others
- Simulations rely on a mathematization of the process that may be approximate or even false

28 Human interface considerations
- unambiguous error messages (Therac-25) -- a sketch follows below
- operator needs extensive knowledge about the “theory” of the system
- alarms need to be comprehensible (Three Mile Island); spurious alarms minimized
- operator needs knowledge about the timing and sequencing of events
- design of the control board is critical
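A minimal, hypothetical contrast (the replacement wording below is invented; “Malfunction 54” was one of the Therac-25's actual cryptic codes): a bare numeric code forces the operator to guess, while a message that names the condition, its consequence, and the required action is unambiguous.

    # Hypothetical: translate internal fault codes into messages that state the
    # condition, the consequence, and what the operator must do next.
    FAULT_MESSAGES = {
        54: "Dose delivered differs from dose prescribed. Treatment halted. "
            "Do NOT resume; contact physics staff before any further beam-on.",
    }

    def operator_message(code):
        return FAULT_MESSAGES.get(code, f"Unknown fault {code}. Treatment halted; do not resume.")

    print("MALFUNCTION 54")          # cryptic: the operator cannot tell how serious it is
    print(operator_message(54))      # explicit: condition, consequence, required action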

29 Control panel anomalies

30 Risk assessment and prediction
- What is involved in assessing risk?
  - probability of failure
  - prediction of consequences of failure
  - failure pathways

31 Reasoning about risk
- How should we reason about risk?
- Expected utility: probability of outcome x utility of outcome (see the sketch below)
- Probability and science
- How to anticipate failure scenarios?
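A minimal worked sketch of the expected-utility rule with invented numbers: each option's expected value is the probability-weighted sum of the utilities of its outcomes.

    # Hypothetical outcomes for one decision: (probability, utility).
    option_a = [(0.999, 10.0), (0.001, -5000.0)]   # usually fine, rare large loss
    option_b = [(0.90,   8.0), (0.10,   -20.0)]    # slightly worse, small frequent losses

    def expected_utility(outcomes):
        """Expected utility = sum over outcomes of probability x utility."""
        return sum(p * u for p, u in outcomes)

    print(expected_utility(option_a))   # 9.99 - 5.0 = 4.99
    print(expected_utility(option_b))   # 7.2  - 2.0 = 5.2  -> higher expected utility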

32 Compare scenarios
- nuclear power vs. coal power
- automated highway system vs. routine traffic accidents

33 Ordinary reasoning and judgment
- well-known “fallacies” of ordinary reasoning:
  - time preference
  - framing
  - risk aversion

34 Large risks and small risks
- the decision-theory approach: minimize expected harms
- the decision-making reality: large harms are more difficult to absorb, even if smaller in overall consequence (a numeric sketch follows)
- example: JR West railway
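A small illustration with invented numbers of why the two approaches can diverge: the rare-catastrophe profile has the lower expected harm per year, yet a single event of that size may be far harder for an organization or society to absorb than a steady stream of small accidents.

    # Hypothetical annual-harm profiles.
    many_small = {"events_per_year": 1000, "deaths_per_event": 1}       # 1000 expected deaths/yr
    rare_large = {"events_per_year": 0.01, "deaths_per_event": 50000}   #  500 expected deaths/yr

    def expected_deaths_per_year(profile):
        return profile["events_per_year"] * profile["deaths_per_event"]

    print(expected_deaths_per_year(many_small))   # 1000.0
    print(expected_deaths_per_year(rare_large))   #  500.0 -- lower in expectation,
                                                  # but concentrated in one catastrophe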

35 The end

