Engineering a Safer and More Secure World
Prof. Nancy Leveson
Aeronautics and Astronautics Dept., MIT (Massachusetts Institute of Technology)


Problem Statement
Current flight-critical systems are remarkably safe due to:
– Conservative adoption of new technologies
– Careful introduction of automation to augment human capabilities
– Reliance on experience and learning from the past
– Extensive decoupling of system components
NextGen and new avionics systems violate these assumptions:
– Increased coupling and inter-connectivity among airborne, ground, and satellite systems
– Control shifting from ground to aircraft, and shared responsibilities
– Use of new technologies with little prior experience in this environment
– Increasing reliance on software, allowing greater system complexity
– Humans assuming more supervisory roles over automation, requiring more cognitively complex human decision making

Hypothesis
Complexity is reaching a new level (a tipping point):
– Old approaches are becoming less effective
– New causes of accidents are appearing
We need a paradigm change: shift the focus from component reliability (reductionism) to systems thinking (holistic).
Does it work? Evaluations and experiences so far show it works much better than what we are doing today.

Accident with No Component Failures: Mars Polar Lander
– The spacecraft has to slow down to land safely
– It uses the Martian atmosphere, a parachute, and descent engines (controlled by software)
– The software determines that the spacecraft has landed from sensitive sensors on the landing legs, and cuts off the engines at touchdown
– But the sensors generate "noise" (false signals) when the parachute opens; this was not in the software requirements
– The software was not supposed to be operating at that time, but the software engineers decided to start it early to even out the load on the processor
– The software concluded the spacecraft had landed and shut down the descent engines
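
The failure mechanism can be made concrete with a small sketch. This is not the actual flight software; the class, method, and variable names are assumptions used only to illustrate how a spurious, latched touchdown indication can shut the engines down prematurely.

    # Illustrative sketch only -- not the Mars Polar Lander flight code.
    # Shows how a touchdown signal latched during the parachute-deployment
    # transient can cut the engines once the touchdown check is enabled.

    class DescentController:
        def __init__(self):
            self.touchdown_latched = False   # set once any leg sensor reports contact
            self.engines_on = True

        def read_leg_sensors(self, leg_contact_signals):
            # Requirements gap: transient "contact" signals occur when the
            # legs deploy at parachute opening, long before touchdown.
            if any(leg_contact_signals):
                self.touchdown_latched = True

        def update(self, touchdown_check_enabled):
            # Engines are cut as soon as the (possibly spurious) latched
            # signal is examined, even though the lander is still descending.
            if touchdown_check_enabled and self.touchdown_latched:
                self.engines_on = False

    ctrl = DescentController()
    ctrl.read_leg_sensors([False, True, False])   # noise at parachute deployment
    ctrl.update(touchdown_check_enabled=True)     # check enabled near the surface
    print(ctrl.engines_on)                        # -> False: engines cut during descent

The point of the example is that every component behaves as specified; the loss comes from the requirements gap and the interaction between the sensing, the early start of the software, and the engine cutoff logic.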

Types of Accidents
Component failure accidents:
– Single or multiple component failures
– Usually assumed to be random failures
Component interaction accidents:
– Arise in interactions among components
– Related to interactive and dynamic complexity
– Behavior can no longer be planned, understood, anticipated, or guarded against
– Exacerbated by the introduction of computers and software

Software-Related Accidents
Are usually caused by flawed requirements:
– Incomplete or wrong assumptions about the operation of the controlled system or the required operation of the computer
– Unhandled controlled-system states and environmental conditions
Merely trying to get the software "correct" or to make it reliable (satisfy its requirements) will not make it safer under these conditions.

Confusing Safety and Reliability
[Venn diagram: scenarios involving failures vs. unsafe scenarios. Some scenarios are unreliable but not unsafe, some are unsafe but not unreliable, and some are both unreliable and unsafe.]
Preventing component or functional failures is NOT enough.

Traditional Ways to Cope with Complexity
1. Analytic reduction
2. Statistics

Analytic Reduction
Divide the system into distinct parts for analysis:
– Physical aspects: separate physical components or functions
– Behavior: events over time
Examine the parts separately.
Assumes such separation does not distort the phenomenon:
– Each component or subsystem operates independently
– Analysis results are not distorted when components are considered separately
– Components act the same when examined singly as when playing their part in the whole
– Events are not subject to feedback loops and non-linear interactions

Chain-of-Events Accident Causality Model
Explains accidents in terms of multiple events, sequenced as a forward chain over time:
– Simple, direct relationships between events in the chain
Events almost always involve component failure, human error, or an energy-related event.
Forms the basis for most safety and reliability engineering:
– Analysis: FTA, PRA, FMEA/FMECA, Event Trees, etc.
– Design: redundancy, overdesign, safety margins, ...

Human factors concentrates on the "screen out"; engineering concentrates on the "screen in".

Not enough attention is paid to the integrated system as a whole.

Analytic Reduction Does Not Handle:
– Component interaction accidents
– Systemic factors (affecting all components and barriers)
– Software
– Human behavior (in a non-superficial way)
– System design errors
– Indirect or non-linear interactions and complexity
– Migration of systems toward greater risk over time

Summary
The world of engineering is changing. If safety and security engineering do not change with it, they will become more and more irrelevant.
Trying to shoehorn new technology and new levels of complexity into old methods does not work.

Systems Theory
Developed for systems that are:
– Too complex for complete analysis: separation into (interacting) subsystems distorts the results, and the most important properties are emergent
– Too organized for statistics: too much underlying structure distorts the statistics
Developed for biology (von Bertalanffy) and engineering (Norbert Wiener).
Basis of systems engineering and System Safety:
– ICBM systems of the 1950s/1960s
– MIL-STD-882

Systems Theory (2)
Focuses on systems taken as a whole, not on parts taken separately.
Emergent properties:
– Some properties can only be treated adequately in their entirety, taking into account all social and technical aspects ("the whole is greater than the sum of the parts")
– These properties arise from relationships among the parts of the system: how they interact and fit together

Safety and security are emergent properties: they arise from complex interactions among process components, which interact in direct and indirect ways.

[Diagram: a controller controls emergent properties (e.g., enforces safety constraints) by issuing control actions to the process and receiving feedback, thereby governing individual component behavior and component interactions; process components interact in direct and indirect ways.]

[Same control-loop diagram, applied to Air Traffic Control: the controller enforces constraints on the emergent properties of safety and throughput.]
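
A minimal sketch of this control loop, with assumed names (only the minimum-separation constraint comes from the next slide): the controller updates its model of the process from feedback and chooses control actions that keep the safety constraint enforced.

    # Illustrative control-loop sketch (assumed names). The controller observes
    # the process through feedback, updates its process model, and issues
    # control actions so that a safety constraint stays enforced.

    MIN_SEPARATION_NM = 5.0   # example safety constraint: minimum aircraft separation

    class Controller:
        def __init__(self):
            self.process_model = {}            # controller's beliefs about the process

        def update_model(self, feedback):
            self.process_model.update(feedback)

        def control_action(self):
            # The constraint is enforced using the (possibly incorrect) model.
            separation = self.process_model.get("separation_nm", float("inf"))
            if separation < MIN_SEPARATION_NM:
                return "issue_resolution_advisory"
            return "no_action"

    controller = Controller()
    controller.update_model({"separation_nm": 4.2})   # feedback from the process
    print(controller.control_action())                # -> issue_resolution_advisory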

Controls/Controllers Enforce Safety Constraints
– Power must never be on when the access door is open
– Two aircraft must not violate minimum separation
– Aircraft must maintain sufficient lift to remain airborne
– A bomb must not detonate without positive action by an authorized person

Role of Process Models in Control
[Diagram: a controller, containing a control algorithm and a process model, sends control actions to the controlled process and receives feedback from it.]
Controllers use a process model to determine control actions.
Accidents often occur when the process model is incorrect. How could this happen?
Four types of unsafe control actions (Leveson, 2003; Leveson, 2011):
1. Control commands required for safety are not given
2. Unsafe control commands are given
3. Potentially safe commands are given too early or too late
4. Control stops too soon or is applied too long
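
As an illustration only (the names are assumptions, not the notation used in the presentation), the four unsafe control action types can be written down directly, and an incorrect process model then produces one of them:

    # Illustrative sketch (assumed names): the four unsafe control action (UCA)
    # types, and a controller whose stale process model leads to UCA type 1
    # ("a control command required for safety is not given").

    from enum import Enum

    class UCAType(Enum):
        NOT_PROVIDED = 1      # required for safety but not given
        UNSAFE_PROVIDED = 2   # given, but unsafe in this context
        WRONG_TIMING = 3      # given too early or too late
        WRONG_DURATION = 4    # stopped too soon or applied too long

    def braking_command(process_model):
        # The decision is based on the controller's *model*, not on reality.
        return "brake" if process_model["on_runway"] else "no_brake"

    process_model = {"on_runway": False}   # stale belief
    actual_state  = {"on_runway": True}    # the aircraft has actually touched down

    action = braking_command(process_model)
    if actual_state["on_runway"] and action != "brake":
        print("Unsafe control action:", UCAType.NOT_PROVIDED)

The controller behaves exactly as designed; the hazardous behavior comes from the mismatch between its process model and the actual state of the controlled process.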

STAMP (System-Theoretic Accident Model and Processes)
– Defines safety as a control problem (vs. a failure problem)
– Works on very complex systems
– Includes software, humans, and new technology
– Based on systems theory and systems engineering
– Designs safety (and security) into the system from the beginning
– Expands the traditional model of the cause of losses

Safety as a Dynamic Control Problem
Accidents result from a lack of enforcement of safety constraints in system design and operations.
Goal: control the behavior of the components and of the system as a whole to ensure the safety constraints are enforced in the operating system.
A change in emphasis: from "prevent failures" to "enforce safety/security constraints on system behavior".

Changes to Analysis Goals
Hazard/security analysis:
– Identify the ways safety/security constraints might not be enforced (vs. chains of failure events/threats leading to an accident)
Accident analysis (investigation):
– Determine why the safety control structure was not adequate to prevent the loss (vs. what failures led to the loss and who was responsible)

Systems Thinking

STAMP: Theoretical Causality Model
[Diagram: tools and processes built on STAMP]
– Accident/event analysis: CAST
– Hazard analysis: STPA
– Security analysis: STPA-Sec
– System engineering (e.g., specification, safety-guided design, design principles)
– Specification tools: SpecTRM
– Risk management
– Operations
– Management principles / organizational design
– Identifying leading indicators
– Organizational/cultural risk analysis
– Regulation

STPA (System-Theoretic Process Analysis)
A top-down, system engineering technique that:
– Identifies safety constraints (system and component safety requirements)
– Identifies scenarios leading to violation of the safety constraints; the results are used to design or redesign the system to be safer
– Can be used on technical design and organizational design
Supports a safety-driven design process in which:
– Hazard analysis influences and shapes early design decisions
– Hazard analysis is iterated and refined as the design evolves

Aircraft Braking System
System Hazard H4: An aircraft on the ground comes too close to moving or stationary objects or inadvertently leaves the taxiway.
Deceleration-related hazards:
– H4-1: Inadequate aircraft deceleration upon landing, rejected takeoff, or taxiing
– H4-2: Deceleration after the V1 point during takeoff
– H4-3: Aircraft motion when the aircraft is parked
– H4-4: Unintentional aircraft directional control (differential braking)
– H4-5: Aircraft maneuvers out of safe regions (taxiways, runways, terminal gates, ramps, etc.)
– H4-6: Main gear rotation is not stopped when (continues after) the gear is retracted

Example System Requirements Generated
– SC1: Forward motion must be retarded within TBD seconds of a braking command upon landing, rejected takeoff, or taxiing
– SC2: The aircraft must not decelerate after V1
– SC3: Uncommanded movement must not occur when the aircraft is parked
– SC4: Differential braking must not lead to loss of or unintended aircraft directional control
– SC5: The aircraft must not unintentionally maneuver out of safe regions (taxiways, runways, terminal gates, ramps, etc.)
– SC6: Main gear rotation must stop when the gear is retracted
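
A small traceability sketch; the structure is assumed, but the hazard and constraint text is taken from the two slides above. Keeping the links explicit makes a simple completeness check possible.

    # Traceability sketch (assumed structure; hazard/constraint text from the
    # slides above). Each safety constraint is linked to the hazard it controls.

    hazards = {
        "H4-1": "Inadequate aircraft deceleration upon landing, rejected takeoff, or taxiing",
        "H4-2": "Deceleration after the V1 point during takeoff",
        "H4-3": "Aircraft motion when the aircraft is parked",
        "H4-4": "Unintentional aircraft directional control (differential braking)",
        "H4-5": "Aircraft maneuvers out of safe regions",
        "H4-6": "Main gear rotation is not stopped when the gear is retracted",
    }

    constraints = {
        "SC1": ("H4-1", "Forward motion must be retarded within TBD seconds of a braking command"),
        "SC2": ("H4-2", "The aircraft must not decelerate after V1"),
        "SC3": ("H4-3", "Uncommanded movement must not occur when the aircraft is parked"),
        "SC4": ("H4-4", "Differential braking must not lead to loss of directional control"),
        "SC5": ("H4-5", "The aircraft must not unintentionally maneuver out of safe regions"),
        "SC6": ("H4-6", "Main gear rotation must stop when the gear is retracted"),
    }

    # Simple completeness check: every hazard is covered by at least one constraint.
    uncovered = set(hazards) - {h for h, _ in constraints.values()}
    assert not uncovered, f"Hazards without constraints: {uncovered}"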

Safety Control Structure

Generate More Detailed Requirements
Identify causal scenarios for how the requirements could be violated (using the safety control model).
Determine how to eliminate or mitigate the causes:
– Additional crew requirements and procedures
– Additional feedback
– Changes to software or hardware
– Etc.

Automating STPA (John Thomas)
Requirements can be derived automatically (with some user guidance) using a mathematical foundation.
Allows automated completeness/consistency checking.
[Diagram: hazards and hazardous control actions are given a discrete mathematical representation (predicate calculus / state-machine structure), from which a formal (model-based) requirements specification is generated.]
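
A toy sketch of the underlying idea, reusing the braking example above. The context variables and hazard rules are assumptions chosen for illustration, not Thomas's actual formalism or tooling: every combination of process-model values is enumerated, and the provide / do-not-provide decisions that are hazardous in each context are flagged.

    # Toy sketch of context-table generation (assumed variables and rules).
    # Enumerate process-model contexts for the "apply wheel braking" control
    # action and flag hazardous (action, context) combinations, tracing each
    # back to a hazard from the aircraft braking slides.

    from itertools import product

    variables = {
        "phase": ["landing_rollout", "takeoff_after_V1", "parked"],
        "brake_commanded_by_crew": [True, False],
    }

    def hazard_for(provided, ctx):
        if provided and ctx["phase"] == "takeoff_after_V1":
            return "H4-2"   # deceleration after V1
        if not provided and ctx["phase"] == "landing_rollout" and ctx["brake_commanded_by_crew"]:
            return "H4-1"   # inadequate deceleration upon landing
        return None

    for values in product(*variables.values()):
        ctx = dict(zip(variables, values))
        for provided in (True, False):
            h = hazard_for(provided, ctx)
            if h:
                print(f"Hazardous: provide_braking={provided}, context={ctx} -> {h}")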

Is it Practical?
STPA has been or is being used in a large variety of industries:
– Spacecraft
– Aircraft
– Air traffic control
– UAVs (RPAs)
– Defense
– Automobiles (GM, Ford, Nissan)
– Medical devices and hospital safety
– Chemical plants
– Oil and gas
– Nuclear and electrical power
– CO2 capture, transport, and storage
– Etc.

Does it Work?
Most of these systems are very complex (e.g., the new U.S. missile defense system).
In all cases where a comparison was made (to FTA, HAZOP, FMEA, ETA, etc.):
– STPA found the same hazard causes as the old methods
– Plus it found more causes than the traditional methods
– In some evaluations it found accidents that had occurred that the other methods missed (e.g., EPRI)
– Cost was orders of magnitude less than the traditional hazard analysis methods

Applies to Security Too (AF Col. Bill Young)
The current focus is primarily on tactics:
– Cyber security is often framed as a battle between adversaries and defenders (tactics)
– Requires correctly identifying attackers' motives, capabilities, and targets
We can reframe the problem in terms of strategy:
– Identify and control system vulnerabilities (vs. reacting to potential threats)
– Top-down strategy vs. bottom-up tactics approach; tactics are tackled later
– The goal is to ensure that the critical functions and services provided by networks and services are maintained (loss prevention)

STPA-Sec Allows us to Address Security "Left of Design"
[Diagram: across the system engineering phases (Concept, Requirements, Design, Build, Operate), the cost of a fix rises from low to high. "Bolt-on" cyber security responds to attacks on the physical system late in the lifecycle; secure systems thinking and secure systems engineering derive system security requirements from the abstract system early on.]
Build security into the system, like safety.

Evaluation of STPA-Sec
Several experiments by the U.S. Air Force with real missions and cyber-security experts:
– Results much better than with standard approaches
– U.S. Cyber Command now issuing policy to adopt STPA-Sec throughout the DoD
– Training began this summer
Dissertation to be completed by Fall 2014.

Current Aviation Safety Research
Air Traffic Control (NextGen): Cody Fleming, John Thomas
– Interval Management (IMS)
– ITP
UAS (unmanned aircraft systems) in the national airspace: Major Kip Johnson
– Identifying certification requirements
Safety in Air Force flight test: Major Dan Montes
Safety and security in Army/Navy systems: Lieut. Blake Albrecht
Security in aircraft networks: Jonas Helfer

Other Safety Research
– Airline operations
– Workplace (occupational) safety (Boeing assembly plants)
Non-aviation:
– Hospital patient safety
– Hazard analysis and feature interaction in automobiles
– Leading indicators of increasing risk
– Manned spacecraft development (JAXA)
– Accident causal analysis
– Adding sophisticated human factors to hazard analysis

Human Factors in Hazard Analysis

Role of Process Models in Control (recap)
[Repeat of the earlier slide: controllers use a process model to determine control actions; accidents often occur when the process model is incorrect, leading to the four types of unsafe control actions listed above (Leveson, 2003; Leveson, 2011).]

Model for Human Controllers

Cody Fleming: Analyzing Safety in ConOps
Identify, from the concept of operations:
– Missing, inconsistent, or conflicting information required for safety
– Vulnerabilities, risks, and tradeoffs
– Potential design or architectural solutions to hazards
Demonstrating on TBO (Trajectory Based Operations).

Comparing Potential Architectures: Control Model for Trajectory Negotiation

Modified Control Model for Trajectory Negotiation

Alternative Control Model for Trajectory Negotiation

It’s still hungry … and I’ve been stuffing worms into it all day.

Nancy Leveson, Engineering a Safer World: Systems Thinking Applied to Safety, MIT Press, January 2012.