Presentation on theme: "An Approach to the Software Aspects of Safety Management"— Presentation transcript:
1 An Approach to the Software Aspects of Safety Management
Ron Stroup
FAA, Office of Information Services
Process Engineering Division, AIO-200
Software Safety and Certification Lead
PH. (202)

Good afternoon. My name is Ron Stroup, from the Office of Information Services, Process Engineering Division, AIO. My specific responsibilities include developing and implementing software safety and certification processes and standards within the FAA, and harmonizing those standards with the international aviation community.

The focus of my presentation today is the issues the FAA safety and certification communities are currently working on, as documented in the 1997 GAO report.
2 National Airspace System (NAS)

The NAS is not defined by a single component or system. Rather, it is a complex collection of systems, procedures, facilities, aircraft, and of course, at the base of it all, making it work, people. The NAS represents the overall environment for the safe operation of aircraft.

The FAA has responsibility for civil aviation safety. The FAA’s mission is to ensure the safety, security, and system efficiency of the National Airspace System. Ever-increasing system complexity, interdependencies, and dependence on software-intensive mission-critical and safety-critical systems have heightened the need to ensure end-to-end system safety.

The NAS is a highly technical system and includes some 36,000 pieces of equipment operating in hundreds of locations throughout the United States. At present there are approximately 45 million flights operating throughout the United States per year. The system provides communications, navigation, surveillance, display, flight planning, and weather data to controllers, traffic managers, and pilots.

The FAA has recognized the need for, and taken a proactive approach to, ensuring that software safety engineering is applied effectively and consistently throughout the National Airspace System.
3 FAA Experience (1/2)
What were our concerns?
- Ineffective Risk Management.
- Immature software acquisition processes.
GAO Report - Air Traffic Control: Immature Software Acquisition Processes Increase FAA’s System Acquisition Risks. AIMD-97-47, March 1997

What were our concerns? I talked around risk management on my previous slide; now I would like to identify the concern more directly. Previously our risk management often resulted in unsatisfied stakeholders, poor performance, and/or cost and schedule overruns. Each program was treated as an island, with system interdependence issues and safety issues not discovered and resolved, or in the worst case ignored, until formal system testing or operational evaluation in the field. Obviously, discovering issues at the culmination of the design and development efforts greatly contributed to stakeholder dissatisfaction, poor system performance, cost and schedule overruns, and ultimately shortfalls against the targeted safety of the system. As I’m sure everyone present recognizes, the most effective and successful safety programs design safety in up front, as backfitting safety design features usually falls short of the desired results. Changing the culture to one of early detection and reduction of risk was required.

We also discovered that we were using immature software acquisition processes, which honestly were ad hoc and chaotic. Software is the most costly and complex component of an Air Traffic System, and we had no standardized means of evaluating and improving our processes. I shall discuss our improvement initiatives later on in the presentation.
4 FAA Experience (2/2)
How are we improving?
Ineffective Risk Management
- Develop safety risk management policy. (FAA Order Safety Risk Management) (Software Safety and Certification Initiative)
- Improve knowledge of systems engineering. (Systems Engineering Council)
Immature software acquisition processes
- Improve knowledge of software engineering. (Software Engineering Body of Knowledge)
- Develop software policy, practices, and technologies. (FAA integrated Capability Maturity Model)

How are we improving? This slide identifies a number of initiatives that the FAA has undertaken to address the concerns.

The Order establishes the safety risk management policy within the FAA. To comply with the Order, the FAA launched a Software Safety and Certification program to improve certification/approval practices for the software aspects of CNS/ATM ground-based systems and airborne systems. The Systems Engineering Council was also launched to develop common systems engineering activities across the National Airspace System.

The Software Engineering Body of Knowledge provides a systematic, concise, and complete description of the software engineering discipline (methodologies, sources, anticipated use, etc.). This is supplemented by the Software Engineering Curriculum Framework for determining, assessing, and improving software engineering competencies.

The FAA-iCMM is a model that describes the essential elements of an organization’s process that must exist to ensure good acquisition of software-intensive systems. The model combines the features of the software acquisition, software, and systems engineering CMM models.
5 Order 8040.4 Safety Risk Management
Purpose
- Established safety risk management policy
- Formalized process for all high-consequence decisions
- Prescribes procedures for implementing safety risk management and decision-making tool
  - Plan, Identify, Analysis, Assess, Decision
- Establishes Safety Risk Management Committee
  - Provides advice and counsel to the organizations
Safety Risk Management Committee
- Provides supplemental support to assist in the overall risk analysis capability and efficiency of key FAA organizations
- Maintains a risk management resource directory
  - Risk methodologies employed
  - Resource assistance
- Identifying suitable risk analysis tools and training
FORMALIZE A COMMON SENSE APPROACH

The Order, Safety Risk Management, was signed by the Administrator in June 1998.

This Order formalizes the safety risk policy for all high-consequence decisions. A high-consequence decision is defined as one that either creates or could be reasonably estimated to result in a statistical increase or decrease in personal injuries and/or loss of life and health, a change in property value, loss of or damage to property, cost or savings, or other economic impacts valued at 100 million or more per annum.

A Safety Risk Management Committee was formed to assist the various FAA organizations in developing a comprehensive and effective plan for the management of safety risk. The SRMC meets periodically to exchange risk management ideas and information and provides advice and counsel to the Office of System Safety and other management officials upon request.
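The five-step sequence listed on this slide can be sketched in code. This is only an illustration, not text from the Order: the `Step` names paraphrase the slide's "Plan, Identify, Analysis, Assess, Decision", the helper functions are hypothetical, and the assumption that the steps are completed strictly in order is ours rather than something the talk states.

```python
from enum import IntEnum
from typing import Optional, Set


class Step(IntEnum):
    # Names paraphrase the slide's "Plan, Identify, Analysis, Assess, Decision".
    PLAN = 1
    IDENTIFY = 2
    ANALYZE = 3
    ASSESS = 4
    DECIDE = 5


def next_step(completed: Set[Step]) -> Optional[Step]:
    """Return the earliest step not yet completed, or None if all are done."""
    for step in Step:
        if step not in completed:
            return step
    return None


def ready_to_decide(completed: Set[Step]) -> bool:
    """Assumed rule: a decision is ready only after every earlier step is done."""
    return all(step in completed for step in Step if step < Step.DECIDE)


done = {Step.PLAN, Step.IDENTIFY}
print(next_step(done).name)   # ANALYZE
print(ready_to_decide(done))  # False
```

Modeling the steps as an ordered enumeration keeps the sequence explicit, so a tool built on it could refuse to record a decision before the analysis and assessment exist.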
6 System Engineering Council
Purpose
- Orchestrates common systems engineering activities across the NAS
- Responsibility, authority, and accountability for the development, documentation, deployment, control, and monitoring of the systems engineering process.
Products
- System Engineering Management Plan
- System Engineering Manual

A systems engineering council was implemented to assist in the consistent and efficient application of system engineering throughout the various NAS system components.

The System Engineering Council has four primary functions:
1. Systems engineering leadership.
2. Development of processes and tools using government and industry standards.
3. Facilitating problem definition and resolution.
4. Advocacy for resources to accomplish system engineering.

Products currently being developed by this council are the System Engineering Management Plan and the System Engineering Manual.

The SEMP provides an organizational focus to discuss roles and responsibilities for systems engineering as a process and discipline applied across the FAA.

The SEM identifies the technical and programmatic activities and products as the program moves from the initial idea through disposal and elimination of the system.
7 System Safety Working Group
Purpose
- Working arm of the System Engineering Council
- Assists in supporting and evaluating Comparative and Operational Safety Assessments
Products
- System Safety Management Plan
- System Safety Handbook

The System Safety Working Group is an advisory body of FAA system safety professionals. The near-term purpose is to establish guidance for conducting safety risk management processes in accordance with the Order. Our long-term purpose is to control and implement these processes.

Products currently being developed by the SSWG are the System Safety Management Plan and the System Safety Handbook.

The SSMP establishes and defines the FAA plan for ensuring that system safety is effectively integrated into the NAS modernization in accordance with FAA orders and AMS policy.

The SSH provides instructions on how to perform system safety engineering and management (best practices).
8 Acquisition Management System
The FAA’s Acquisition Management System (AMS)/Life-cycle Management System (LMS) consists of:
- Mission Needs
- Investment Analysis
- Solution Implementation
- In-Service Management
- Service-life Extension

The AMS phases are:
- Mission Analysis enables the Joint Resource Council to determine and prioritize its most critical capability shortfalls and best technology opportunities for improving the FAA’s overall safety, security, capacity, efficiency, and effectiveness in providing services to its customers.
- Investment Analysis defines the functional and performance strategy to satisfy the agency’s mission needs and baselines the best overall solution for satisfying critical capability shortfalls.
- Solution Implementation begins after the JRC selects a solution and ensures that products are shown to meet user requirements, be operationally suitable, and be compatible with other operational systems prior to an in-service decision.
- In-Service Management establishes a framework for evolutionary product development and for identifying operational problems early enough to upgrade or replace products prior to their obsolescence.

System Safety Management shall be conducted and documented throughout the acquisition management system.
9 System Safety Process
[Diagram: safety analyses mapped onto the acquisition life cycle. Phases: Mission Needs, Investment Analysis, Solution Implementation, In-Service Management, Service-life Extension. Decision points: JRC1, JRC2, ISD. Activities: Concept of Operation, Option 1, Option 2, Option 3, Option Selection, Operations and Maintenance, Upgrade or Retire. Safety products: OSA, NAS SSMP, PHA, CRA, SSPP, SHA/SSHA, SSAR, HTRR, CRA. Bands: System Safety Program; NAS System Safety Management (Hazard Tracking).]

This slide shows the various safety analyses and activities to be accomplished through a combined effort by the System Safety Working Group and the Integrated Product Team throughout the systems acquisition life cycle.

Before the Order was implemented, the safety analyses and activities were not being accomplished until the Solution Implementation phase. There was also inconsistency among the Integrated Product Teams as to the analyses and activities to be performed. This resulted in programs busting their cost and schedule baselines.

Today each line of business involved in acquisition management must institute a system safety management process that includes, at minimum: hazard identification, hazard classification, measures to mitigate the hazards to an acceptable level, verification that mitigation measures are incorporated into product design, and assessment of residual risk.

We are also establishing a NAS-wide Hazard Tracking and Risk Resolution database to ensure a closed-loop process of managing safety hazards and risks.
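The five minimum elements and the closed-loop idea described for this slide can be illustrated with a small record type. This is a minimal sketch under stated assumptions, not the FAA's Hazard Tracking and Risk Resolution database: the `Hazard` class, its field names, the `Severity` scale, and the sample hazard are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    # Hypothetical classification scale; the talk does not name the categories.
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4


@dataclass
class Hazard:
    description: str                                       # hazard identification
    severity: Optional[Severity] = None                    # hazard classification
    mitigations: List[str] = field(default_factory=list)   # mitigation measures
    mitigations_verified: bool = False                     # verified in the design
    residual_risk_assessed: bool = False                   # residual risk assessment

    def open_items(self) -> List[str]:
        """List the process elements still missing for this hazard."""
        missing = []
        if self.severity is None:
            missing.append("classification")
        if not self.mitigations:
            missing.append("mitigation")
        if not self.mitigations_verified:
            missing.append("verification")
        if not self.residual_risk_assessed:
            missing.append("residual risk assessment")
        return missing

    def can_close(self) -> bool:
        """Closed loop: a hazard closes only when nothing is outstanding."""
        return not self.open_items()


# Hypothetical example hazard, for illustration only.
hazard = Hazard("Loss of surveillance data to a controller display")
hazard.severity = Severity.HAZARDOUS
hazard.mitigations.append("Redundant data path")
print(hazard.open_items())  # ['verification', 'residual risk assessment']
print(hazard.can_close())   # False
```

The point of the sketch is the closing rule: a hazard stays open until every element of the process has been recorded, which is one way to read "closed loop" in the notes above.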
10 FAA CNS/ATM Software
- FAA-iCMM
- Software development
- Software assurance
Implement and integrate software engineering processes into systems engineering.

As I stated earlier, software safety engineering cannot perform effectively outside the boundaries of the total system engineering effort. As I discuss the specific components, it must be clear that there is interaction with the systems engineering effort even though it may not be clearly identified.

The structure of our software quality model is based on:
- Strategic (FAA-iCMM)
- Enablers and tools (IEEE12207, DO-178B)
- Tailored practices (FAA-STD-026, Software Assurance Guidelines)
11 Software Quality Triangle
- FAA-iCMM: Establishes essential elements of an organization’s software acquisition, engineering, and management process
- FAA-STD-026 (IEEE12207): Establishes a process and documentation guidance for software development
- Software Assurance Guidance: Establishes a level of confidence for software that is consistent with its environment
QUALITY SW FOR NAS SYSTEMS

This slide provides a graphical view of the software quality triangle.

FAA-iCMM elements include the following processes:
- Engineering (Requirements, SW Development, System Test),
- Project (Project Management, Risk Management, Contracts Management),
- Supporting (QA, CM, Measurement), and
- Organization (Implementation, Training).

FAA-STD-026 establishes the requirements for software development associated with NAS acquisitions. Formerly this standard required Mil-Std-498, now IEEE, and we are developing an implementation document to standardize with the FAA-iCMM, AMS, and software assurance guidelines.

Software Development Assurance provides a level of confidence for the software in safety-critical systems that is consistent with other components of the NAS and will meet the safety requirements of the system.

I will concentrate the remainder of my presentation on the use of software assurance as a vehicle for achieving the desired targeted level of safety and security integrity within the NAS.
12 Software Assurance
What do we want to achieve?
Identify the objectives necessary, throughout the life cycle process, to provide confidence that a product and process satisfy given safety and security integrity level requirements.

ICAO has established a targeted Global Risk Factor of extremely remote, or 10^-7.

As systems become more complex and software-intensive, establishing and maintaining acceptable safety and security integrity level requirements has become increasingly difficult.

Software safety and security integrity level requirements are satisfied by applying rigorous design analysis to the system. This analysis includes, but is not limited to: requirements validation and verification, requirements-based testing, system testing, and structural coverage analysis.

Other communities may discuss safety and security separately. However, the FAA, based on the NAS infrastructure, must consider that an overt security breach could result in a mishap.
13 Safety and Security Similarities

            ANALYSIS                          REQUIREMENTS             VERIFICATION
SECURITY    Vulnerability/Threat Assessment   Security Requirements    Penetration testing
            Risk Determination
SAFETY      Operational Safety Assessment     Safety Requirements      Requirements-based testing
            Risk Determination

As you can see on this slide, the qualification processes for safety and security are similar. You analyze the vulnerabilities, develop mitigating requirements, and verify their effectiveness.
14 Preliminary Safety/Security Model
[Diagram: three parallel tracks (System Development Process, System Security Process, System Safety Process) with Assurance Milestones. Labels: Requirements Specification, Protection Profiles, Operational Safety Assessment, Mission Needs/Investment Analysis, Threat Analysis, Preliminary Hazard Analysis, Preliminary Vulnerability Assessment, Requirements Analysis, Safety Requirements, Security Requirements, Security Target, Solution Implementation, System Specification, Refined Vulnerability Assessment, SW Spec., HW Spec., Procedures, System/SubSystem Hazard Analysis, SW Design, Continued Analysis, SW Code, SW Integration, Operating & Support Hazard Analysis, System Integration & Test, Certification, In-Service Decision, In-Service Management, Hazard Tracking & Monitor Residual Risk, Service Life Extension, Monitor Vulnerability, Sustainment & Retirement.]

This model shows the various activities and their relationship to the system development process, system safety processes, and system security processes within the Acquisition Management System.

Our goal is to have complete, well-defined requirements by the completion of the investment analysis phase, to establish the proper baseline and reduce the risk, cost, and schedule of programs.

We are attempting to take more of a systems approach to safety and security.

In the past, we were uncovering deficiencies late in the design, and risk assessments were too narrowly focused. We have found that acceptable risks in independent systems could contribute to a mishap when fully integrated within the NAS. We have now refocused our safety/security programs to evaluate the NAS as a whole, to ensure total end-to-end system safety and security.

Our goal over the next year will be to evaluate and refine this model and to identify those activities, products, and assurance points that are necessary to assure the design of a safe and effective system.
15 Summary
- The FAA continues to refine its systems and software engineering processes.
- We are focusing on the technical and programmatic efficiencies that can be achieved by integrating safety and security into the system life cycle processes.
- The FAA is present to gain knowledge and understanding from other industries on their approach to mitigating safety issues.

I would like to thank Dr. Leveson for the opportunity to discuss our issues before this prestigious body.
17 Acronyms (1/2)
AIO Office of Information Services
AMS Acquisition Management System
ATM Air Traffic Management
CNS Communications, Navigation and Surveillance
CRA Comparative Risk Analysis
FAA Federal Aviation Administration
FMEA Failure Modes Effects Analysis
HTRR Hazard Tracking and Risk Resolution
ICAO International Civil Aviation Organization
ICMM Integrated Capability Maturity Model
ISD In-Service Decision
JRC Joint Resource Council
18 Acronyms (2/2)
LMS Life-cycle Management System
NAS National Airspace System
OSA Operational Safety Assessment
PHA Preliminary Hazard Assessment
SEMP System Engineering Management Plan
SEM System Engineering Manual
SHA System Hazard Analysis
SSH System Safety Handbook
SSHA SubSystem Hazard Analysis
SSMP System Safety Management Plan
SSAR System Safety Assessment Report