1
Future Trends in Process Safety
Prof. Nancy Leveson
Engineering Systems, Aeronautics and Astronautics, MIT
This is my personal view of the accident and of accidents in general. Although the Baker Panel report findings represent a consensus view, each member has their own prioritization of the individual findings and their own view of accidents in general. So let me tell you a little about my background to set the context for what you will hear.
2
You’ve carefully thought out all the angles.
You’ve done it a thousand times.
It comes naturally to you.
You know what you’re doing; it’s what you’ve been trained to do your whole life.
Nothing could possibly go wrong, right?
3
Think Again
4
Topics
Lessons from Texas City
New factors in process accidents
Safety as a control problem
Conclusions
2. System accidents, HMI and computers in control, human errors 4. Hazard analysis and STPA (batch reactor), design for safety (precedence), risk analysis and management
5
Leadership
Safety requires passionate and effective leadership
Tone is set at the top of the organization
Not just sloganeering but real commitment
  Setting priorities
  Adequate resources assigned
  A designated, high-ranking leader
Safety and productivity are not conflicting if you take a long-term view
Instead of focusing on deficiencies at BP, I want to look at what can be learned from our findings and what general principles were violated. Thus the focus is more on what we need to do to get it right than on what was wrong at BP (although they clearly are just two sides of the same coin). Sincere management concern is the top factor in organizations that have fewer accidents and losses.
6
Managing and Controlling Safety
Need clear definition of expectations, responsibilities, authority, and accountability at all levels of the safety control structure
The entire control structure must together enforce the system safety property
Unsafe changes must be eliminated or controlled through system design, or detected and fixed before they lead to an accident
  Planned changes (MOC process)
  Unplanned changes
7
Visibility and Communication
Downward and upward communication
Requires a positive, open, trusting environment
Need effective measurement and monitoring of process safety performance (e.g., injury rates are not useful and are misleading)
Avoid a “culture of denial”
If managers do not want to hear, people stop talking
8
Information and Appropriate Feedback
Good accident/incident investigation and follow-through
  Identification and correction of systemic causal factors
  Ensuring thorough reporting of incidents and near misses
Thorough hazard identification, analysis, and control
Effective process safety audit system to ensure adequate process safety performance
A management information system is the second most important factor in organizations that have fewer accidents. It has three aspects: collection, analysis, and dissemination and use. Management by goals in high-pressure industries (such as offshore oil drilling) encourages an image of super-performance and creates a tendency to cover up past mistakes. Corporate learning requires formal or informal mechanisms to observe, record, and retrieve past collective experience, including mistakes. It requires delegation of responsibility for capturing information; rewards, or at least no punishment, for reporting; a system for creating and handling incident/accident reports; comprehensive procedures for analyzing incidents and identifying causal factors; and procedures for using reports and generating corrective actions.
9
Oversight and Control
Results of operating experience, process hazard analyses, audits, near misses, or accident investigations must be used to improve process operations and the process safety management system
Address promptly and track to completion the deficiencies found during assessments, audits, inspections, and incident investigations
This is not always done. Tremendous backlogs can develop, and that becomes standard operating procedure.
10
Fumbling for his recline button Ted unwittingly instigates a disaster
We often treat accidents as a chain of events and end up blaming operators or other humans close to the actual events. But human behavior always occurs in a context, and humans will always make mistakes. We need to create a context in which humans are less likely to do the wrong thing: a systems approach to safety (vs. a reliability approach).
11
Process Safety vs. Personal Safety
All behavior is influenced by the context in which it occurs
  Both physical and social context
Personal safety focuses on changing individual behavior
Process (system) safety focuses on the design of the system in which behavior occurs
To understand why process accidents occur and to prevent them, need to:
  Understand current context (system design)
  Create a design that effectively ensures safety
Confusion between personal and process safety was a major cause of the accident, in my view. The wrong thing was being measured and controlled; e.g., days without an accident is not a process safety measurement. We couldn’t find a culture survey with process safety questions (and data for comparison), so we had to write our own.
12
The Enemies of Safety
Complacency
Arrogance
Ignorance
SUBSAFE
Complacency factors:
Discounting risk: a human tendency. When people attempt to predict risk, they explicitly or implicitly multiply events with low probability, assuming independence, and come out with impossibly low numbers, when in fact the events are dependent. This is called the Titanic Coincidence.
Titanic Effect: explains the fact that major accidents are often preceded by a belief that they cannot happen. The magnitude of disasters decreases to the extent that people believe that disasters are possible and plan to prevent them or to minimize their effects. The costs of taking action in advance to prevent them are inconsequential when measured against the losses that may ensue if no action is taken.
Over-relying on redundancy: many accidents result from common-cause failures in redundant systems. Paradox: providing redundancy may lead to the complacency that defeats the redundancy.
Unrealistic risk assessment: ignores factors that cannot be quantified, or just makes up numbers.
Ignoring high-consequence, low-probability events
Assuming risk decreases over time
Ignoring warning signs
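The point about discounting risk can be illustrated with a small calculation. The sketch below compares the naive "multiply and assume independence" estimate for two redundant channels with a simple beta-factor style approximation in which some fraction of failures share a common cause. The numbers (`p_single`, `beta`) and function names are assumptions chosen for illustration; they do not come from the talk or from any real plant.

```python
# Illustrative numbers only; none of these values come from the talk.
p_single = 1e-3  # assumed failure probability of one channel
beta = 0.1       # assumed fraction of failures that share a common cause


def p_both_fail_independent(p: float) -> float:
    """Naive estimate: multiply probabilities as if failures were independent."""
    return p * p


def p_both_fail_common_cause(p: float, beta: float) -> float:
    """Beta-factor style estimate: a fraction beta of failures disables both channels."""
    common = beta * p                           # one cause takes out both channels
    independent_part = ((1.0 - beta) * p) ** 2  # remaining failures coincide by chance
    return common + independent_part


naive = p_both_fail_independent(p_single)
dependent = p_both_fail_common_cause(p_single, beta)

print(f"assuming independence: {naive:.3e}")
print(f"with common cause:     {dependent:.3e}")
print(f"underestimate factor:  {dependent / naive:.1f}")
```

With these assumed values the independence estimate is roughly a hundred times too optimistic, which is the kind of "impossibly low number" the slide warns about.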
13
Factors in Complacency
Discounting risk
Over-relying on redundancy
Unrealistic risk assessment
Ignoring low-probability, high-consequence events
Assuming risk decreases over time
Ignoring warning signs
14
Topics
Lessons from Texas City
New factors in process accidents
  New technology
  System accidents
  New types of human error
Safety as a control problem
Conclusions
New technology, particularly digital technology
15
Accident with No Component Failures
16
Types of Accidents
Component Failure Accidents
  Single or multiple component failures
  Usually assume random failure
System Accidents
  Arise in interactions among components
  Related to interactive complexity and tight coupling
  Exacerbated by introduction of computers and software
17
Safety vs. Reliability
Safety and reliability are NOT the same
  Sometimes increasing one can even decrease the other
  Making all the components highly reliable will have no impact on system accidents
For relatively simple, electro-mechanical systems with primarily component failure accidents, reliability engineering can increase safety
For complex systems, need something more
I’ll talk about what that “something more” can be a little later.
18
Humans in Process Safety
Usually define human error as deviation from normative procedures, but operators always deviate from standard procedures
  Normative vs. effective procedures
  Sometimes violation of rules has prevented accidents
Cannot effectively model human behavior by decomposing it into individual decisions and acts and studying it in isolation from
  Physical and social context
  Value system in which it takes place
  Dynamic work process
19
Less successful actions are a natural part of the search by operators for optimal performance
20
New Operator Roles and Errors
High-tech automation is changing cognitive demands on operators
  Supervising rather than directly monitoring
  Doing more cognitively complex decision-making
  Dealing with complex, mode-rich systems
  Increasing need for cooperation and communication
Human-factors experts complaining about technology-centered automation
  Designers focus on technical issues, not on supporting operator tasks
  Leads to “clumsy” automation
Errors are changing, e.g., errors of omission vs. commission
21
Impacts on System Design
Design for error tolerance
Alarm management (managing by exception)
Matching tasks to human characteristics
Design to reduce human errors
Providing information and feedback
Training and maintaining skills
22
Topics
Lessons from Texas City
New factors in process accidents
Safety as a control problem
  New approaches to hazard analysis
  Design for safety
  Risk analysis and management
Conclusions
23
STAMP: A Systems Model of Accident Causality
Systems-Theoretic Accident Model and Processes
Safety treated as a control problem, not a “failure” problem
Accidents are not simply an event or chain of events
  Involve a complex, dynamic process
  Arise from interactions among humans, machines, and the environment
24
A Broad View of “Control”
Does not imply need for a “controller”
  Component failures and dysfunctional interactions may be “controlled” through design (e.g., redundancy, interlocks, fail-safe design)
  or through process
    Manufacturing processes and procedures
    Maintenance processes
    Operations
Does imply the need to enforce safety constraints in some way
25
STAMP (2)
Safety is an emergent property that arises when system components interact with each other within a larger environment
  A set of safety constraints related to the behavior of system components enforces that property
  Accidents occur when interactions among system components violate those constraints
Goal of process (system) safety engineering is to identify the safety constraints and enforce them in the system design
26
Example Safety Constraints
Build safety in by enforcing constraints on behavior
Controller contributes to accidents not by “failing” but by:
  Not enforcing safety-related constraints on behavior
  Commanding behavior that violates safety constraints
System Safety Constraint: Water must be flowing into reflux condenser whenever catalyst is added to reactor
Software (Controller) Safety Constraint: Software must always open water valve before catalyst valve
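The software safety constraint on this slide can be sketched as a guard in the controller. The class and method names below (`ReactorController`, `ConstraintViolation`, and so on) are hypothetical and exist only for illustration. The sketch also tracks commanded valve state only; the system-level constraint is about water actually flowing, which a real design would have to confirm through feedback from the process.

```python
class ConstraintViolation(Exception):
    """Raised when a command would violate the safety constraint."""


class ReactorController:
    """Hypothetical batch-reactor controller that enforces valve ordering."""

    def __init__(self) -> None:
        self.water_valve_open = False
        self.catalyst_valve_open = False

    def open_water_valve(self) -> None:
        self.water_valve_open = True

    def open_catalyst_valve(self) -> None:
        # Constraint: water valve must be open before the catalyst valve.
        if not self.water_valve_open:
            raise ConstraintViolation("open water valve before catalyst valve")
        self.catalyst_valve_open = True

    def close_catalyst_valve(self) -> None:
        self.catalyst_valve_open = False

    def close_water_valve(self) -> None:
        # Stopping water while catalyst is still being added is also unsafe.
        if self.catalyst_valve_open:
            raise ConstraintViolation("close catalyst valve before water valve")
        self.water_valve_open = False


controller = ReactorController()
try:
    controller.open_catalyst_valve()  # wrong order: rejected
except ConstraintViolation as err:
    print("rejected:", err)

controller.open_water_valve()
controller.open_catalyst_valve()      # correct order: accepted
print("catalyst valve open:", controller.catalyst_valve_open)
```

Note that nothing in this sketch "fails" in the reliability sense; the guard exists to stop a correctly functioning controller from commanding an unsafe sequence.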
27
STAMP (3)
Systems are not static
  A socio-technical system is a dynamic process continually adapting to achieve its ends and to react to changes in itself and its environment
  Systems and organizations migrate toward accidents (states of high risk) under cost and productivity pressures in an aggressive, competitive environment
Preventing accidents requires designing a control structure to enforce constraints on system behavior and adaptation that ensures safety
Not just management of change for planned changes but also migration (changes) in the system due to natural factors
28
Example Control Structure
29
Controlling and managing dynamic systems requires visibility and feedback
[Figure: control loop. A Controller, containing a Model of Process, issues Control Actions to the Controlled Process and receives Feedback from it.]
30
Relationship Between Safety and Process Models
Accidents occur when models do not match the process and
  Incorrect control commands given
  Correct ones not given
  Correct commands given at wrong time (too early, too late)
  Control stops too soon
(Note the relationship to system accidents)
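The four kinds of inadequate control listed on this slide can serve as a checklist when reviewing each control action. The sketch below simply crosses a list of control actions with those four categories to produce prompts for an analyst. The control actions are hypothetical, borrowed from the batch-reactor example, and this is only a checklist generator, not the STPA procedure itself.

```python
from itertools import product

# Hypothetical control actions for a batch reactor (illustration only).
CONTROL_ACTIONS = ["open water valve", "open catalyst valve"]

# The four categories from the slide, lightly reworded.
INADEQUATE_CONTROL_TYPES = [
    "incorrect command given",
    "required command not given",
    "command given too early or too late",
    "control stopped too soon",
]


def candidate_prompts(actions: list[str], kinds: list[str]) -> list[str]:
    """Pair every control action with every category of inadequate control."""
    return [f"{action}: {kind}" for action, kind in product(actions, kinds)]


prompts = candidate_prompts(CONTROL_ACTIONS, INADEQUATE_CONTROL_TYPES)
for prompt in prompts:
    print(prompt)
```

Each printed line is a question for the analyst ("could this happen, and would it be hazardous?"), not a finding.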
31
Relationship Between Safety and Process Models (2)
How do they become inconsistent?
  Wrong from beginning
  Missing or incorrect feedback
  Not updated correctly
  Time lags not accounted for
Resulting in
  Uncontrolled disturbances
  Unhandled process states
  Inadvertently commanding system into a hazardous state
  Unhandled or incorrectly handled system component failures
32
Modeling Accidents Using STAMP
Two types of models are used:
  Static safety control structure
  Behavioral dynamics (system dynamics)
    Dynamic processes behind change in the safety control structure, i.e., why it may change (e.g., degrade) over time
Starting from this view of accidents as a control problem, we can model and analyze safety.
33
Simplified System Dynamics Model of Columbia Accident
34
Uses for STAMP
Basis for new, more powerful hazard analysis techniques (STPA)
Safety-driven design
More comprehensive accident/incident investigation and root cause analysis
Organizational and cultural risk analysis
Defining safety metrics and performance audits
Designing and evaluating potential policy and structural improvements
Identifying leading indicators of increasing risk (“canary in the coal mine”)
New risk management tools
New holistic approaches to security
35
STAMP-Based Hazard Analysis (STPA)
Supports a safety-driven design process where
  Hazard analysis influences and shapes early design decisions
  Hazard analysis is iterated and refined as design evolves
Goals (same as any hazard analysis)
  Identification of system hazards and related safety constraints necessary to ensure acceptable risk
  Accumulation of information about how those constraints can be violated, which is used to eliminate, reduce, and control hazards in system design, development, manufacturing, and operations
36
STPA (2)
STPA process
  Starts with identifying system requirements and design constraints necessary to maintain safety
  Then STPA assists in
    Top-down refinement into requirements and safety constraints on individual components
    Identifying scenarios in which safety constraints can be violated
    Using results to eliminate or control hazards in design, operations, etc.
38
© Copyright Nancy Leveson, Aug. 2006
39
Comparison of STPA with Traditional HA Techniques
Top-down (vs. bottom-up like FMECA)
Considers more than just component failure and failure events (includes these but more general)
Guidance in doing analysis (vs. FTA)
Handles dysfunctional interactions and system accidents, software, management, etc.
40
Comparisons (2)
Concrete model (not just in head)
  Not physical structure (HAZOP) but control (functional) structure
General model of inadequate control (based on control theory)
  HAZOP guidewords based on model of accidents being caused by deviations in system variables
  Includes HAZOP model but more general
Fault trees concentrate on component failures, miss system accidents
41
Risk Analysis and Risk Management
[Plot: Effectiveness and Credibility of ITA; horizontal axis: Time]
42
[Plot: System Technical Risk; horizontal axis: Time]
43
Identifying Lagging vs. Leading Indicators
The number of waivers issued is a good indicator of risk in Space Shuttle operations, but it lags the rapid increase in risk
[Plot; horizontal axis: Time]
44
The number of incidents under investigation is a better leading indicator
[Plot; horizontal axis: Time]
45
Managing Tradeoffs Among Risks
Good risk management requires understanding tradeoffs among
  Schedule
  Cost
  Performance
  Safety
46
Example: Schedule Pressure and Safety Priority
[Plot axes: Schedule Pressure (Low to High), Safety Priority (Low to High)]
Takeaways:
1. Overly aggressive schedule enforcement has little effect on completion time (<2%) and cost, but has a large negative impact on safety
2. Priority of safety activities has a large positive impact on safety, and can even have a positive cost impact (less rework)
47
Conclusions
Future needs for safety in the process industry:
  Differentiation between process safety and personal (occupational) safety
  Improved safety culture management
  New approaches to handle
    Advanced technology (particularly digital technology)
    System accidents and complexity
    New types of human error
Using a control-based (vs. failure-based) model of causality expands our power to prevent process accidents