Knowing When to Stop: An Examination of Methods to Minimize the False Negative Risk of Automated Abort Triggers RAM XI Training Summit October 2018 Patrick.

Slides:



Advertisements
Similar presentations
VSE Corporation Proprietary Information
Advertisements

EECE499 Computers and Nuclear Energy Electrical and Computer Eng Howard University Dr. Charles Kim Fall 2013 Webpage:
Sensitivity Analysis In deterministic analysis, single fixed values (typically, mean values) of representative samples or strength parameters or slope.
Case Tools Trisha Cummings. Our Definition of CASE  CASE is the use of computer-based support in the software development process.  A CASE tool is a.
Reliability and Safety Lessons Learned. Ways to Prevent Problems Good computer systems Good computer systems Good training Good training Accountability.
Systems Analysis and Design 8th Edition
These courseware materials are to be used in conjunction with Software Engineering: A Practitioner’s Approach, 6/e and are provided with permission by.
SQM - 1DCS - ANULECTURE Software Quality Management Software Quality Management Processes V & V of Critical Software & Systems Ian Hirst.
Trade Study Training Need and Goals Need Consistent methodologies and practices performing trade studies Pros/cons, advantages/disadvantages, customer/management.
The Need for an Integrated View of Water Quality Modeling and Monitoring Bruce Kiselica USEPA, Region 2 Second Workshop on Advanced Technologies in Real.
1 Software Testing and Quality Assurance Lecture 35 – Software Quality Assurance.
Vegard Joa Moseng BI - BL Student meeting Reliability analysis summary for the BLEDP.
LSU 10/09/2007System Design1 Project Management Unit #2.
Risk management in Software Engineering T erm Paper By By Praveenkumar Sammita Praveenkumar Sammita CSC532 CSC532.
WHAT IS SYSTEM SAFETY? The field of safety analysis in which systems are evaluated using a number of different techniques to improve safety. There are.
ERT 312 SAFETY & LOSS PREVENTION IN BIOPROCESS RISK ASSESSMENT Prepared by: Miss Hairul Nazirah Abdul Halim.
Protecting the Public, Astronauts and Pilots, the NASA Workforce, and High-Value Equipment and Property Mission Success Starts With Safety Believe it or.
Looking at systems. What is a SYSTEM? The radiator makes the room warmer by turning ON when the temperature of the room is lower than required The THERMOSTAT.
Planning and Analysis Tools to Evaluate Distribution Automation Implementation and Benefits Anil Pahwa Kansas State University Power Systems Conference.
FAULT TREE ANALYSIS (FTA). QUANTITATIVE RISK ANALYSIS Some of the commonly used quantitative risk assessment methods are; 1.Fault tree analysis (FTA)
Slide 1V&V 10/2002 Software Quality Assurance Dr. Linda H. Rosenberg Assistant Director For Information Sciences Goddard Space Flight Center, NASA
NSTX Centerstack Upgrade: initial discussions of the Machine Protection System (MPS) Robert Woolley 4 November 2009.
Building Dependable Distributed Systems Chapter 1 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Chapter(3) Qualitative Risk Analysis. Risk Model.
Idaho RISE System Reliability and Designing to Reduce Failure ENGR Sept 2005.
March 2004 At A Glance autoProducts is an automated flight dynamics product generation system. It provides a mission flight operations team with the capability.
RLV Reliability Analysis Guidelines Terry Hardy AST-300/Systems Engineering and Training Division October 26, 2004.
SAFEWARE System Safety and Computers Chap18:Verification of Safety Author : Nancy G. Leveson University of Washington 1995 by Addison-Wesley Publishing.
"... To design the control system that effectively matches the plant requires an understanding of the plant rivaling that of the plant's designers, operators,
Using functional analysis to determine the requirements for changes to critical systems: Railway level crossing case study Joe Silmon, Clive Roberts Centre.
Probabilistic Risk Assessment and Conceptual Design Bryan C Fuqua – SAIC Diana DeMott – SAIC
Toward a New ATM Software Safety Assessment Methodology dott. Francesca Matarese.
Failure Modes, Effects and Criticality Analysis
Lecturer: Eng. Mohamed Adam Isak PH.D Researcher in CS M.Sc. and B.Sc. of Information Technology Engineering, Lecturer in University of Somalia and Mogadishu.
Mingze Zhang, Mun Choon Chan and A. L. Ananda School of Computing
Landing in CONF 3 – Use of reversers
Part II AUTOMATION AND CONTROL TECHNOLOGIES
Prepared by: Fatih Kızkun
Analysis Manager Training Module
Information Systems Development
NERC Published Lessons Learned Summary
Chapter 14 Introduction to Multiple Regression
Robert Anderson SAS JMP
Outcome of BI.DIS Fast Interlocks Peer Review
Saint- Petersburg State University of Aerospace Instrumentation
Information Technology Controls
Analysis Using Spreadsheets
Albert M. K. Cheng Embedded Real-Time Systems
Software Testing An Introduction.
Chapter 1- Introduction
Engine Control Systems
FAULT TOLERANCE TECHNIQUE USED IN SEAWOLF SUBMARINE
Chapter 4 MATLAB Programming
PROJECT RISK MANAGEMENT
CS 325: Software Engineering
Software Engineering (CSI 321)
of Heritage and New Hardware For Launch Vehicle Reliability Models
Frequently asked questions about software engineering
Information Systems Development
Planetary Protection Category V Restricted Earth Return
In this tutorial you will:
Designed-in Logic to Ensure Safety of Integration and Field Engineering of Large Scale CBTC Systems Author: Fenggang Shi.
A. Mancusoa,b, M. Compareb, A. Saloa, E. Ziob,c
RAM XI Training Summit October 2018
HMI Reliability Dale Wolfe Reliability Engineer LMSSC*ATC*LMSAL
Welcome to Corporate Training -1
Project Management Unit #2
Definitions Cumulative time to failure (T): Mean life:
Agile Project Management and Quantitative Risk Analysis
ERP and Related Technologies
Presentation transcript:

Knowing When to Stop: An Examination of Methods to Minimize the False Negative Risk of Automated Abort Triggers RAM XI Training Summit October 2018 Patrick Fussell, QD35, Bastion Technologies, Inc. Esha Rahaman, QD35, Bastion Technologies, Inc.

Introduction Crewed launch vehicles contain a series of abort options Ground Control and Range Safety are able to manual initiate aborts and the flight safety system However, due to the very fast response time required to ensure safety of the crew in many abort scenarios, the flight computers are able to initiate automatic aborts An automated abort will safe the vehicle by cutting off the engines, get the crew capsule away from the vehicle, and if necessary activate the flight safety system General abort information

Automated Aborts Flight computers monitor sensor data to determine if abort actions should be initiated based on current flight conditions These range from simply warnings to immediately aborting and initiating the flight safety system Failure to identify an out of range condition it is considered a false negative failure If the flight computer takes an abort action when conditions are nominal it is considered a false positive failure Chart 1 shows a simulated sensor signal nominally functioning in range. Chart 2 shows a simulated sensor signal nominally functioning, showing the measured condition out of range at 45s

Sensor Failure Modes Sensor failure modes include ‘fail high’ and ‘fail low’ Fail High - upon failure the signal will be at the upper range Fail Low - upon failure the signal be at the lower range Either sensor failure can potentially lead to a false positive OR a false negative failure depending on the limits for that parameter Chart 1 shows a simulated failed high sensor signal at 45s going out of range Chart 2 shows a simulated failed low sensor signal at 45s

Sensor False Negative Mitigation To protect against false negative failures a variety of methods are used Sensor Redundancy Sensor failure logic resiliency Sensors for multiple independent flight conditions Unfortunately, most of these will inadvertently increase the false positive risk This presentation will review PRA analysis of several common methods to mitigate false negative risks and present sensitivities showing how the methods affect false negative and positive risks

Additional Considerations Software used to interpret the signals will be modified when using any of these methodologies This increase in complexity, will in turn decrease software reliability This can be a driver in decreasing over-all reliability when you are trying to mitigate it Once the lognormal parameters of the distributions are established, those parameters are used to estimate the failure probability of the component Then it is simple to calculate the probability of failure using Excel functions rather than by numerical integration

Quantitative Analysis Using Notional Sensor Hardware In the following examples we will be looking at two kinds of sensors The first has a failure mode distribution of 15% Fails High, and 85% Fails Low with a reliability of 0.99999 The second sensor has a failure mode evenly split with a reliability of 0.999995 These are notional, but realistic failure probabilities A false negative will be defined as a failed low signal from all sensors In our examples this would cause a failure to abort when the situation would require an abort A false positive will be defined as any failed high sensor In our example this causes a spurious abort Once the lognormal parameters of the distributions are established, those parameters are used to estimate the failure probability of the component Then it is simple to calculate the probability of failure using Excel functions rather than by numerical integration

Sensor Redundancy Scenario Fails to Abort Spurious Abort Total Total+SW One Sensor 1 in 120,000 1 in 670,000 1 in 100,000 1 in 91,000 Two Sensors 1 in 1,700,000 1 in 330,000 1 in 280,000 1 in 210,000 Failure to Abort Adding another sensor reduces the false negative risk exposure This however increases exposure to false positive failures Depending on the sensor configuration’s failure mode distribution this can end up leading to a higher over-all loss of mission risk Increase in software complexity decreases software reliability False negative is not lowered as much as may be expected due to Common Cause Spurious Abort

Sensor Failure Logic Resiliency Scenario Fails to Abort Spurious Abort Total Total+SW One Sensor 1 in 120,000 1 in 670,000 1 in 100,000 1 in 91,000 3/4 Sensor logic 1 in 980,000 1 in 5,600,000 1 in 830,000 1 in 310,000 Adding a three of four required sensor logic reduces both false negative and positive This will again decrease software reliability, but generally not enough to outweigh the benefits While this lowers the overall risk, it also increases flight sensor hardware, and may not be feasible when modifying a current design due to cost and schedule constraints Failure to Abort Spurious Abort logic will be identical, but with fails high.

Applying Independent Sensor Check Scenario Fails to Abort Spurious Abort Total Total+SW One Sensor 1 in 120,000 1 in 670,000 1 in 100,000 1 in 91,000 Independent Sensor added < 1 in 1,000,000,000 1 in 250,000 1 in 190,000 Using multiple parameters, such as temperature and pressure to self check 3/3 low signals leads to a false negative 1/3 high signals triggers the abort Essentially reduces all risk due to a false negative, due to independent sensor failure allowing common cause to be less of an issue Adds significant software complexity, decreasing reliability of software Failure to Abort Once the lognormal parameters of the distributions are established, those parameters are used to estimate the failure probability of the component Then it is simple to calculate the probability of failure using Excel functions rather than by numerical integration Spurious Abort

Independent Sensor added Design Change Scenario Fails to Abort Spurious Abort Total Total+SW Two Sensors 1 in 1,700,000 1 in 330,000 1 in 280,000 1 in 210,000 Independent Sensor added 1 in 250,000 < 1 in 1,000,000,000 1 in 190,000 In a hypothetical design change, a redundant sensor design was found to be too susceptible to false positive failure, driven by common cause failures This leads to an effort to redesign the sensors to make use of an independent sensor already available The change will essentially remove the chance of a false negative failure However, due to the software changes required and the additional risk of false positive from the added sensor, PRA results show that the change will lead to an overall decrease in reliability

Summary /Conclusion The failure of sensors used in automatic aborts can lead to two very different failure scenarios based on the failure mode Components, such as sensors, that have multiple failure modes and effects, can lead to unintentional risk increases if design changes are focused on improving only one of these effects. When performing a design change, risk-based analysis with a PRA model is a critical input for risk- based decision making This gives decision makers a full picture of the risk that will be incurred/removed from the system based on the proposed change When analyzing sensors, or any component with varying failure modes and effects, using a fully integrated system model to ensure all risk changes are captured is vital The best option reliability wise is resiliency using a voting logic. This has a higher reliability for both failure modes. Cost, vehicle weight, and schedule often require a compromise, using a PRA in tandem with design will help balance risk vs. costs