
National Aeronautics and Space Administration
Estimating Flight Software Risk for a Space Launch Vehicle
2014 RAMS VII Workshop, November 4, 2014
S. Novack¹, S. Hatfield¹, M. Al Hassan¹, J. Bales², P. Britton², F. Hark¹, J. Stott²
¹Bastion Technologies, Inc.  ²NASA Marshall Space Flight Center

Why is Understanding Software Risk Important to NASA?
“Software had a critical role in a large share (~60% in quantitative terms) of the failures suffered by NASA in high-stakes missions during the 1998–2007 decade.” (NASA PRA Procedures Guide, 2010)
 Mariner 1 (incorrect formula coded)
 Phobos (deactivated thrusters)
 Ariane 5 (reused Ariane 4 software)
 Mars Polar Lander (mistook turbulence for landing)
 Mars Climate Orbiter (unit conversion)
 Spirit (flash memory full)
 CryoSat-1 (missing shutdown command)
 Mars Global Surveyor (assumption of motor failure)
 Express-AM4 (faulty parameter)

Software versus Hardware Characteristics
 Failure cause
 Wear-out
 Repairable system concept
 Environmental factors
 Reliability prediction
 Redundancy
 Interfaces
 Failure rate motivators
 Standard components
The NASA definition of software failure is “Flight Computer Software (FSW) performs the wrong action, fails to perform the intended action, or performs the intended action at the wrong time.” (NASA Shuttle Probabilistic Risk Assessment (PRA), SSA rev B)

Potential Methods Used to Obtain Software Risk
 Mathematical model using Space Launch Vehicle software data from flights
   Best possible data (assuming large data sets)
   Unavailable for first flight
 Mathematical model using historical data on a “like” system
   Low/moderate uncertainty
   May require modifications to fit the Space Launch Vehicle
   Will provide a path to incorporate Space Launch Vehicle-specific software data (test and flight)
 Heuristic method using generic statistical data
   Moderate uncertainty
   Different approaches, different results
 Context-based Software Risk Model (CSRM)
   Off-nominal scenarios
   Can be based on specific failure modes
 SOFTREL (Ann Marie Neufelder)
 Expert judgment
   Highest uncertainty

Software Reliability Metrics
 Product: code complexity, which is directly related to reliability
 Project Management: organizational and management influence on the code outcome and reliability
 Process: function of upgrades and software improvements to reach some steady state
 Fault and Failure: reliability trending estimated from software fault and failure data
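The fault-and-failure trending metric above can be sketched with a simple reliability-growth fit. This is an illustrative example, not from the presentation: a Crow-AMSAA (Duane-type) model, cumulative faults N(t) ≈ a·t^b, fitted by least squares on log-log data. The test times and fault counts are hypothetical.

```python
# Hypothetical sketch: trend software reliability growth from
# cumulative fault data with a Crow-AMSAA model, N(t) = a * t^b.
import math

def fit_crow_amsaa(times, cum_faults):
    """Least-squares fit of log N = log a + b log t."""
    xs = [math.log(t) for t in times]
    ys = [math.log(n) for n in cum_faults]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

def instantaneous_mtbf(a, b, t):
    """Failure intensity is a*b*t^(b-1); MTBF is its reciprocal."""
    return 1.0 / (a * b * t ** (b - 1))

# Invented cumulative fault counts at successive test hours
times = [100, 250, 500, 1000, 2000]
faults = [12, 20, 28, 38, 50]
a, b = fit_crow_amsaa(times, faults)
print(f"growth exponent b = {b:.2f}")  # b < 1 indicates reliability growth
print(f"MTBF at 2000 h   = {instantaneous_mtbf(a, b, 2000):.0f} h")
```

A growth exponent below one means the fault discovery rate is slowing, i.e., the software is approaching the steady state mentioned under the Process metric.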

Desired Software Reliability Characteristics
 Vetted approach
 Quantitative and complete
 Tractable: easily understood and traceable
 Assumptions
 Sensitive to environmental (context) failures
 Uses available test results and operational experience
 Quantifies uncertainty
 Can account for Common Cause Failures

Considerations for Applying a Historical Data Approach
 Software process
   Coding practices
   Testing schemes
 Programming language and operating system
   Comparable language or equivalent basis
 Flight time
   Mission versus ascent or reentry
   Software errors during different phases of flight
 Accounting
   Reliability basis: Source Lines of Code (SLOC)
 Computer system structure
   Redundancy
   Independence or backup systems

Different Categories of Software Failures
 Coding errors
   Errors introduced into the code by human design or error
   Are not detected during testing prior to first flight
   Have the potential to manifest during nominal mission conditions
 Context Latent Errors (LEs)
   Errors that manifest during off-nominal conditions
   May need very specific conditions to manifest
   Would typically not be caught during testing
   “A segment of code that fulfills its requirements except under certain off-nominal and probably unanticipated conditions.”
 Operating system errors
   Includes response to application errors
   Memory allocation problems
   Background processes
Three software failure types: Coding Errors, Latent Context Errors, Operating System Errors

Approach for Coding and Latent Errors
 Identify selected scenarios for off-nominal conditions (e.g., spacecraft navigation errors)
 Determine how software contributes to the selected scenarios (i.e., the software failure mode), such as software causing a hardover
 Determine what portion of the code could cause the software error (e.g., modules, partitions, functions)
 Calculate total Source Lines of Code (SLOC) for each failure mode
 Adjust reliability if needed to match historical data
 Estimate risk for each software failure mode based on total SLOC per software failure mode
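The SLOC-apportionment step above can be sketched in a few lines. This is a hypothetical illustration: the per-KSLOC residual-error rate and the SLOC totals are invented for the example, not program data.

```python
# Hypothetical sketch of apportioning risk by SLOC: a historical
# per-KSLOC residual-error rate is applied to the total SLOC mapped
# to each software failure mode. All numbers are invented.
LAMBDA_PER_KSLOC = 2.0e-4   # assumed residual failures per flight per KSLOC

failure_mode_sloc = {        # total SLOC mapped to each failure mode
    "booster_hardover":  18_000,
    "stale_data_on_bus": 42_000,
    "no_sep_command":    25_000,
}

risk = {mode: LAMBDA_PER_KSLOC * sloc / 1000
        for mode, sloc in failure_mode_sloc.items()}

# Rank failure modes by estimated probability of failure per flight
for mode, p in sorted(risk.items(), key=lambda kv: -kv[1]):
    print(f"{mode:20s} P(fail) = {p:.2e}")
```

The per-mode estimates scale linearly with SLOC, which is what makes the ranking useful for prioritizing testing even when the absolute constant is uncertain.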

Example of Historical Reliability Data
 Uncertainty increases with higher levels of testing
 Quantify historic system differences (redundancy)
 Operational errors making it through testing onto flight
 Establishes a reliability constant for SLOC that can be corrected for the new system
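One way to read the last bullet is as a two-step calculation: derive a per-SLOC constant from a historical system's operational errors, then correct it for differences (e.g., redundancy) in the new system. The figures and the correction factor below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical derivation of a per-SLOC reliability constant from a
# historical system, corrected for the new system. All values invented.
hist_errors_in_flight = 3      # operational errors that escaped testing
hist_flights = 100
hist_ksloc = 400               # size of the historical flight software

# Historical constant: errors per flight per KSLOC
lam_hist = hist_errors_in_flight / (hist_flights * hist_ksloc)

# Assumed penalty because the new system is, say, less redundant
redundancy_correction = 2.5
lam_new = lam_hist * redundancy_correction

print(f"historical constant:      {lam_hist:.2e} per flight per KSLOC")
print(f"corrected for new system: {lam_new:.2e} per flight per KSLOC")
```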

Example of Software Failure Modes by Code Segments

Software Failure Mode                      | Associated Software
Conversion/corruption of navigation data   | Data Conversion; Engine Code
Last Good Data on BUS (Stale Data)         | Input Data Exchange; Output Data Exchange; Application Code; Data Conversion; Data In Infrastructure; Command Code
Loss of Computer Communication             | Communication Code; System Code
Booster hardover (left)                    | Booster Code
Booster hardover (right)                   | Booster Code
Navigation Whole Vehicle Hardover          | Navigation Code; Booster Code
No Booster Sep Command                     | Application Code; System Code
Command Early or Late                      | System Code
Inadvertent Command                        | Data Conversion; Application Code
No Command                                 | Application Code

A Nominal Software Risk Example and Ranges
 Reliability growth curves can provide ranges
 Uncertainty may be affected by extrapolation of historical data

Special Considerations
 Treatment of Software Common Cause Failures (CCFs)
   Software CCF basic events can parallel hardware events (NUREG/CR-6268)
 Uncertainty
   Historical or statistical failure rates can have associated distributions
   Variability can be propagated via the distributions and standard uncertainty analysis methods
   Software basic events can be correlated for Monte Carlo routines
 Software support
   Provides prioritization of software testing via risk-informed data
   Model estimates can be provided for software reliability requirements per system
   Focuses software testing schemes
 Firmware
   Many of the major systems contain independently developed firmware or software (e.g., navigation system, engine)
   A similar approach coupled with hardware reliability data can be used to estimate firmware reliability
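The uncertainty bullets above can be sketched as a small Monte Carlo routine: two software basic events with lognormal failure-probability distributions, correlated by sharing part of their underlying standard-normal draw. The medians, error factors, and correlation are assumed for illustration only.

```python
# Illustrative Monte Carlo propagation of uncertainty for two
# correlated software basic events. All parameters are assumed.
import math
import random

random.seed(1)

def lognormal(median, err_factor, z):
    """Lognormal draw from standard-normal z; EF spans the 90% band."""
    sigma = math.log(err_factor) / 1.645
    return median * math.exp(sigma * z)

rho = 0.6            # assumed correlation between the two basic events
samples = []
for _ in range(20_000):
    z_common = random.gauss(0, 1)
    z1 = z_common
    z2 = rho * z_common + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    p1 = lognormal(1e-4, 3.0, z1)   # basic event 1: median 1e-4, EF 3
    p2 = lognormal(5e-5, 5.0, z2)   # basic event 2: median 5e-5, EF 5
    samples.append(p1 + p2)         # rare-event OR-gate approximation

samples.sort()
mean = sum(samples) / len(samples)
print(f"mean   = {mean:.2e}")
print(f"95th % = {samples[int(0.95 * len(samples))]:.2e}")
```

Correlating the draws, rather than sampling the two events independently, widens the tails of the combined distribution, which is why the slide calls it out for Monte Carlo routines.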

Summary and Conclusions
 Software reliability is very difficult to predict, and standard hardware reliability methods may be inappropriate
 Software groupings (e.g., modules, functions) can be used with a base unit of complexity (SLOC) to determine reliability
 This approach allows testing information to be rolled into a module failure scheme at the functional level
   May be directly linked to physical partitions and processes
   Can be broken down to a component level to support software development needs
   Provides meaningful resolution in the reliability estimate (how and why a system fails)
   May have a direct correlation with vehicle specifications and requirements

Questions? POC: Steven Novack, Bastion Technologies, Inc.