Published by Javier Slay
Modified about 1 year ago
Space System Development: Lessons Learned (Excerpts) Conference on Quality in the Space and Defense Industries March 14, 15, 2011 Joe Nieberding
© 2006 All Rights Reserved. Aerospace Engineering Associates LLC
Presenter
Joe Nieberding: Mr. Nieberding has over 40 years of management and technical experience in leading and participating in NASA independent review teams, and in evaluating NASA advanced space mission planning. Before retiring from NASA GRC in 2000, he directed numerous studies during 35 years at GRC to help select transportation, propulsion, power, and communications systems for advanced NASA mission applications. His Advanced Space Analysis Office led all exploration advanced concept studies for GRC. In addition, he was a launch team member on over 65 NASA Atlas/Centaur and Titan/Centaur launches, and is a widely recognized expert in launch vehicles and advanced transportation architecture planning for space missions. Mr. Nieberding is co-founder and President of Aerospace Engineering Associates.
Introduction
Excerpted from a two-day presentation aimed at assisting today’s space system developers
– Explore overarching fundamental lessons derived from:
    ‒ Many specific mishap case histories from multiple programs
    ‒ “Root” causes not unique to times/programs
Will cover some material from the two-day presentation:
– A few of the detailed case histories
– A summary of causes for all case histories
– Example countermeasure “Rules of Practice”
References given for all resource information
– Lessons learned charts (yellow background) were either developed independently by Aerospace Engineering Associates (AEA) or extracted from resource information
It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.
Mark Twain
2 Day Outline
Introduction
The Practice of Failure Analysis
Space Mission Record of Success
General Management Lessons
Lessons Learned from Specific Case Histories
– Screening Out Design Errors
– Impact of Weak Testing Practices
– Screening Out Procedural Errors
– System Engineering Lapses
– Mishaps Associated With Software
– When Processes Break Down
– Adverse Program Management Factors Can Produce Bad Outcomes
– A Piece Part Failure
– Not Everyone May Want the Project to Succeed
– Experienced Teams Make Mistakes
– Normalizing Deviance
– When Advanced Warnings are Missed
– The Perils of Heritage
2 Day Outline (concluded)
Summary of Causes for the Foregoing Case Histories
The Unsuccessful Failure Investigation of Atlas Centaur 70
Common Cause Failures
The Human Element
Applying the Lessons: Sample “Rules of Practice”
One Strike and You’re Out! – Flight Termination
Conclusions
Politicians are like diapers; they need to be changed often and for the same reason.
Mark Twain
Historical Perspective
The Practice of Failure Analysis
Case | Event
The Milan Cathedral | Wall collapse
The Tay Rail Bridge | Bridge collapse – 75 fatalities
Kansas City Hyatt Regency Skyway | Skyway collapse – 114 fatalities
American Airlines Flight 96 | Separation of DC-10 aft cargo door – no fatalities
Turkish Air Flight 981 | Separation of DC-10 aft cargo door – 346 fatalities
Tacoma Narrows Bridge | Bridge collapse
Russian R-16 ICBM | Pad explosion – >120 fatalities
Historical Perspective: Prominent Failures from Across the Spectrum of Engineering Endeavors
Baikonur Cosmodrome, USSR (present-day Kazakhstan), 10/24/1960
Preps for first test flight of R-16 ICBM
Program rushed to launch on anniversary of Bolshevik revolution (as a present for Premier Khrushchev)
Led by head of the Soviet Ballistic Missile Forces, Marshal Mitrofan Nedelin
250 people on and around pad
– Viewing stand for visiting dignitaries
Unsafe design and undisciplined procedures caused 2nd stage ignition
More than 120 people were killed, including Nedelin
Possibly the largest disaster in the history of rocketry!
[Images: Mitrofan Nedelin; R-16 ICBM; destroyed pad and memorial at Baikonur (Tyuratam). Video.]
For additional information see “Rockets and People: Creating a Rocket Industry, Volume II”, Boris Chertok, NASA History Series SP
Design Screens
A Quick Aside About Design Error “Screens”
GIVEN: Our design “machine” (humans) WILL produce errors at some >0 rate
[Diagram labels: Design Error; Design Error “Screens” (Design Review, Test); Unexpected Behavior]
“Engineers today, like Galileo three and a half centuries ago, are not superhuman. They make mistakes in their assumptions, in their calculations, in their conclusions. That they make mistakes is forgivable; that they catch them is imperative.” (1)
(1) “To Engineer is Human”; Henry Petroski, Vintage Books, 1992
Selected Mishaps
Genesis
Underlying Issue: Omitted test combined with flawed adaptation of heritage design
Problem: Spacecraft failed to properly deploy drogue chute (9/8/2004)
Impact: Loss of some scientific data
Source: Genesis Mishap Report, Dr. M. Ryschkewitsch, Chairperson, 11/30/2005; Presentation: Genesis Mishap Investigation and Stardust Entry, Dr. Mike Ryschkewitsch and Pete Spidaliere
[Video]
Genesis G-Switch Orientation
[Diagram labels: Acceleration to Activate Switch; Aerobraking Acceleration; As Installed; Velocity; Heatshield; Pyros]
Genesis (cont’d)
WHY: Improperly oriented gravity switch sensors (inverted). Deficiencies in the following processes resulted in the mishap:
− Design that inverted the G-switch sensor (a heritage design)
− Design reviews did not detect the error
− Verification processes did not detect the design error
    ‒ No tests were conducted that would reveal the problem
− Red Team review did not uncover the failure in the verification process
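The inversion described above can be illustrated with a minimal sketch. It assumes a deliberately simplified switch model (the switch closes when the acceleration sensed along its sensitive axis reaches a threshold). The function names, the 3 g threshold, and the deceleration profile are hypothetical values chosen for illustration, not data from the Mishap Report. The sketch shows why a functional test driven by a flight-like deceleration would separate a correctly oriented switch from an inverted one.

```python
# Illustrative sketch only: the switch model, threshold, and profile are
# assumptions made for this example, not data from the Genesis Mishap Report.

def g_switch_closes(decel_g: float, axis_sign: int, threshold_g: float = 3.0) -> bool:
    """Return True if the switch closes.

    axis_sign is +1 when the sensitive axis is aligned with the entry
    deceleration and -1 when the sensor is installed inverted.
    """
    return axis_sign * decel_g >= threshold_g


def triggers_during_entry(profile_g, axis_sign: int) -> bool:
    """Test-like-you-fly check: does the switch ever close over the profile?"""
    return any(g_switch_closes(g, axis_sign) for g in profile_g)


# Hypothetical entry deceleration profile, in g.
ENTRY_PROFILE_G = [0.0, 1.0, 5.0, 20.0, 27.0, 10.0, 2.0]

if __name__ == "__main__":
    print("as designed:", triggers_during_entry(ENTRY_PROFILE_G, +1))
    print("inverted:   ", triggers_during_entry(ENTRY_PROFILE_G, -1))
```

In this model the inverted switch never sees a positive acceleration along its sensitive axis, so no amount of entry deceleration closes it; only a test that applies the load in the flight direction exposes that.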
Genesis (cont’d)
The Board further identified ineffective systems engineering as a root cause:
– Inadequate project and systems engineering management
– Inadequate systems engineering processes
– Inadequate review process
– Unfounded confidence in heritage designs
– Failure to “Test like you fly”
– Better/Faster/Cheaper philosophy - quote from MIB Report:
“Root Cause 6.1: Faster, Better, Cheaper (FBC) philosophy: Cost-capped mission with threat of cancellation if overrun… Findings: The project maintained the cost-cap, in part at the expense of adequate technical oversight by JPL into LMSS Flight System and at the expense of a complete and robust Systems Engineering function. The Agency was at fault for encouraging and accepting the FBC philosophy as described above.”
Genesis (concluded)
LESSONS:
Imposition of a concept (Better/Faster/Cheaper) absent sensible, practical, and reliable implementation guidance is a recipe for serious trouble
Treat changed heritage designs as new designs
Make it very difficult to change baselined* test plans
Test like you fly – and pay attention to when you don’t
Don’t let system reviews get superficial (checking the block)
*Those adopted after appropriate vetting activities
CONTOUR
Underlying Issue: Erroneous prediction of spacecraft thermal environment
Problem: Spacecraft broke up following SRM firing (8/15/2002)
Impact: Loss of mission
CONTOUR (cont’d)
Why: Spacecraft overheating caused by improper installation of a “heritage” SRM
– Inadequate systems engineering process
– Inappropriate reliance on analysis by similarity
– Inadequate review function
– Dubious decision to omit telemetry coverage of motor firing event
– Inadequate oversight, insight, and review of subcontractors
– Inadequate communications between APL and ATK
– ATK models not specific to CONTOUR
– Limited understanding of the SRM plume heating environments in space
– Limited understanding of CONTOUR SRM operating conditions
Source: CONTOUR Mishap Investigation Board Report, May 31, 2003
CONTOUR (concluded)
LESSONS:
Heritage designs must be re-qualified for new applications
Systems engineering is absolutely vital to mission success – in this case it should have:
– Challenged the flawed heritage assumption
– Objected to the use of invalid models
– Insisted on a more complete understanding of SRM plume heating
Involve subcontractors early in the design process
– They need to understand and “buy in” to how their product is integrated
Ariane 5
Underlying Issue: Unwarranted reliance on heritage software
Problem: Forty seconds into maiden Ariane-5 flight (6/4/1996), vehicle veered off course and broke up
Impact: Loss of mission
Why: Flight software error
– The flight software was programmed for Ariane-4 launch and trajectory conditions
    ‒ Didn’t account for higher horizontal velocity of Ariane-5
    ‒ Caused IRU software overflow error resulting in loss of guidance information
    ‒ Never tested in conditions that simulated the Ariane-5 trajectory
Source: I-Shih Chang, Space Launch Reliability-
[Video]
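The overflow mechanism named above can be sketched in a few lines. This is a minimal illustration, assuming the pattern commonly described in public accounts of the failure (a floating-point value narrowed to a 16-bit signed integer without range protection). The function names and all numeric values are hypothetical and are not taken from the flight software. The sketch shows how a conversion that is safe over a heritage trajectory fails once the input range grows, and how a test run over the new trajectory catches it.

```python
# Illustrative sketch only: names and numeric values are hypothetical and
# chosen to show the failure pattern, not taken from the flight software.

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Narrow a float to a 16-bit signed range, raising if it does not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def first_overflow(profile):
    """Return the first value in the profile that overflows, or None."""
    for value in profile:
        try:
            to_int16(value)
        except OverflowError:
            return value
    return None


# Hypothetical horizontal-velocity-related values for two trajectories.
HERITAGE_PROFILE = [1_000.0, 12_000.0, 30_000.0]  # stays within range
NEW_PROFILE = [1_000.0, 25_000.0, 48_000.0]       # exceeds the range

if __name__ == "__main__":
    print("heritage trajectory:", first_overflow(HERITAGE_PROFILE))
    print("new trajectory:     ", first_overflow(NEW_PROFILE))
```

The code is unchanged between the two runs; only the environment differs, which is the sense in which heritage software has to be re-verified against the adopting vehicle's conditions.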
Ariane 5 (concluded)
LESSONS:
Technical experts need to push back against baseless management directives
Be very thorough in justifying dependence on previous “heritage” hardware or software development/testing
Have the decision to accept “heritage” verifications examined in an IV&V mode
Test like you fly and fly like you test
Lewis Spacecraft
Underlying Issue: Misapplication of heritage system
Problem: Spacecraft tumbled out of control 8/26/1997
Impact: Loss of spacecraft
Lewis Spacecraft (cont’d)
Why: Proximate Cause - Inoperable ACS safe mode
– Spacecraft had multiple anomalies during initial operations
    ‒ Contact lost for two orbits
    ‒ Reappeared in uncontrolled attitude mode
    ‒ Commanded to “safe mode”
– “Safe mode” adopted from the Total Ozone Mapping Spectrometer (TOMS) spacecraft
    ‒ Inherently unstable in Lewis application (no X-axis gyro)
– In spite of serious “cause unknown” anomalies, operations crew entered rest period
    ‒ X-axis rates due to thruster imbalances
    ‒ Rates transferred to Y and Z axes (Polhode Motion)
    ‒ Computer shuts down excessive thruster firings
    ‒ Spacecraft rates transferred to principal moment of inertia axis
    ‒ Edge on to Sun - battery discharged ~ 72%
    ‒ Attempt to recover was flawed and failed
    ‒ Spacecraft went out of contact and was never reacquired
– Only one crew conducted all on-orbit operations (one 12-hour shift/day)
    ‒ No crew on duty during significant periods when spacecraft in view of ground station
Source: Lewis Spacecraft Mission Failure Investigation Board Final Report, February 12,
[Diagram labels: Polhode Motion; Safe Mode; X Axis Spin]
Lewis Spacecraft (cont’d)
Root Causes:
– No mutual contractor/government understanding as to what is meant by “Better/Faster/Cheaper” leading to:
    ‒ Requirements changes without adequate resource adjustment
    ‒ Undue cost and schedule pressures
    ‒ Inadequate ground station availability for initial operations
    ‒ Frequent key personnel changes
    ‒ Inadequate engineering discipline
    ‒ Inadequate management discipline
    ‒ Active NASA oversight and management absent
– Senior management imposition of an ill-defined concept (Better/Faster/Cheaper)*
*While the BFC thrust was abandoned after multiple disappointing outcomes, vestiges (both good and bad) remain.
Lewis Spacecraft (cont’d)
LESSONS:
With respect to the proximate cause:
“Heritage” hardware/software is often a trap
– Flag any proposed use of heritage designs for special attention
– Challenge applicability and understand its qualification history
– Make certain that the true heritage (especially the limitations) is fully understood
– Even presumably qualified heritage items need to be functionally tested in the way they will fly!
Lewis Spacecraft (concluded)
LESSONS: (concluded)
With respect to the root causes:
Imposition of a concept (Better/Faster/Cheaper) absent sensible, practical, and reliable implementation guidance is a recipe for serious trouble
Take great care to select qualified people to run a program - when it’s clear they’re not right for the job, replace them
Causation Summary
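Causation Analysis – Breakdown by Category
[Charts: first breakdown by Design, Prod/Ops, and Pgm Mgt (values shown: 23%, 69%, 8%); second breakdown by Sys Engr, Prod/Ops, and Pgm Mgt (values shown: 51%, 41%, 8%); the pairing of values to categories is not preserved in the extracted text]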
Observations
Only one of the 39 cases analyzed (Atlas Centaur 24) had failure of a proper part as the cause!
– Programs doing good job of acceptance testing
The other 38 were associated with human error: management weaknesses, systems engineering shortcomings, etc.
Therefore, it is necessary that risk assessments be based on data that somehow reflects human error
Facts are stubborn things, but statistics are pliable.
Mark Twain
Observations (concluded)
Programs that adopt a zero-based approach to testing are betting on the ability of the engineering community to foresee all aspects of system performance under all conditions
– This is a very risky bet!
History demonstrates that tests frequently, if not usually, produce unexpected (and unwanted) results
Applying the Lessons: A Sample Set of “Rules of Practice”
Issue: Many lessons learned have common themes. The issue is to systematically infuse this knowledge into programs so they’re not lessons forgotten
One approach: For large and complex programs, impose a Program specific set of overarching “Rules of Practice” that govern how certain things are to be done (i.e. to codify some of the lessons)
− Any deviation from these “Rules” would be cause for special attention (risk management) by Program Management
− These ad hoc “Rules” would not take the place of existing design standards or similar tools, but rather provide an additional mechanism to flag when special action is warranted
Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d)
Advance Warning: (Causal in 17 of 39 cases)
− An effective system for facilitating communication between those concerned about a potential safety-of-flight problem and those in a position to reconcile it is to be designed and embedded in the Program culture (easier said than done - but surely it’s doable!). It must be:
    ‒ Formal and visible.
    ‒ Reliable (if not foolproof).
    ‒ Simple to use with quick feedback.
    ‒ Plugged into real authority to stop the action.
    ‒ Culturally valued and respected.
Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d)
Analytical Modeling: (Causal in 12 of 39 cases)
− All analytical modeling on which designs are based will be test-validated and acquired from at least two independent sources.
− An independently validated plume heating analysis is required of all systems employing a new propulsion arrangement.
Heritage Items: (Contributing cause in 12 of 39 cases)
− Any item adopted for use based on successful flight performance in another program will be deemed unqualified in the adopting application until a thorough analysis has been performed to confirm that the adopting application is identical (or less demanding) in all relevant features to the prior successful application.
− Any deviations must be qualified by test.
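The Heritage Items rule above lends itself to a simple mechanical check. The following is a minimal sketch, assuming each relevant feature can be expressed as a single number where larger means more demanding. The function name, feature names, and values are hypothetical. An item counts as unqualified for any feature where the adopting application exceeds the heritage qualification level or where no heritage data exists at all.

```python
# Illustrative sketch only: feature names and numbers are hypothetical.

def heritage_deviations(qualified: dict, adopting: dict) -> list:
    """List features where the adopting application is not covered.

    A feature is a deviation when the adopting application demands more than
    the heritage qualification level, or when no heritage data exists for it.
    Larger numbers are assumed to mean a more demanding environment.
    """
    deviations = []
    for feature, demand in adopting.items():
        level = qualified.get(feature)
        if level is None or demand > level:
            deviations.append(feature)
    return sorted(deviations)


QUALIFIED = {"peak_temp_c": 120.0, "vibration_grms": 10.0}
ADOPTING = {"peak_temp_c": 150.0, "vibration_grms": 8.0, "plume_heating_w_cm2": 2.0}

if __name__ == "__main__":
    found = heritage_deviations(QUALIFIED, ADOPTING)
    status = "unqualified until tested" if found else "heritage applies"
    print(status, found)
```

Treating a missing heritage value as a deviation is a deliberate choice in this sketch: an environment the prior program never characterized cannot be shown to be identical or less demanding.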
Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d)
Software: (Causal in 6 of 39 cases: Ariane 501, Titan IVB-32, SOHO, MCO, MPL, DART)
− All software development, testing, and application processes will be controlled by a single, formal, configuration-managed Software Management Plan for which a single individual is responsible. Testing provided for in this plan will specifically include:
    – Demonstration of proper flight software operation in nominal and off-nominal flight simulation functional testing; this will be done with flight hardware to the greatest extent possible.
    – Formal “qualification” and “acceptance” testing of flight critical software “end items” prior to controlled “release” for use.
  The plan will also provide for periodic, independent verification that the original requirements remain valid.
Applying the Lessons: A Sample Set of “Rules of Practice” (concluded)
General Engineering Management Practices: Certain practices will constitute required standard operating procedures:
− Rationale Documentation: It will be mandatory to systematically record the rationale associated with all engineering products such as design and operational requirements, procedures, test parameters, processes, design choices, specifications, etc., and to place the rationale as close to the item it relates to as possible.
− Assumptions: All assumptions that form the foundation for engineering activities (analyses, test or not-to-test decisions, trade studies, design approaches, etc.) will be explicitly stated and documented. A process for validating, and periodically revalidating, the assumptions will be initiated.
Etc. (This is a sampling – not an all inclusive list. Certainly, Project specific “Rules” are also appropriate.)
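The causal counts quoted in the rules above (17, 12, 12, and 6 of the 39 cases) can be put on a common scale with a short calculation. The counts come from the slides; the variable and function names are illustrative. Because a single mishap can fall under more than one heading, the shares are not expected to sum to 100%.

```python
# Counts are taken from the slides; names are illustrative. Categories overlap,
# so the resulting shares need not sum to 100%.

TOTAL_CASES = 39
CAUSAL_COUNTS = {
    "Advance warning": 17,
    "Analytical modeling": 12,
    "Heritage items": 12,
    "Software": 6,
}


def share_percent(count: int, total: int = TOTAL_CASES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


SHARES = {name: share_percent(n) for name, n in CAUSAL_COUNTS.items()}

if __name__ == "__main__":
    for name, pct in SHARES.items():
        print(f"{name}: {pct}% of {TOTAL_CASES} cases")
```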
The Message
Some may say that the foregoing rules are rather boring
– Nothing earthshaking - all pretty routine
Rigorous implementation and infusion of quality into all aspects of routine, common sense practices will prevent most mission failures
It’s really not rocket science!
But that’s exactly the point!
Conclusions
Conclusions – Stuff Happens
Most mishaps can be broadly attributed to human error, not rocket science
– Lack of complete understanding of how complex systems interact with each other
– Inadequate attention to every detail
– Flawed analyses or tests
– Improper use of “heritage” systems
– Flawed processes
– Flawed understanding of how software fails
– Reaction to budget or schedule pressure
– Imperfect management
Often, a complex, subtle sequence of events is needed
– If just one event in the chain were prevented, the failure would not have happened
Must ensure quality in all the above areas
– Essential for mission success
Over decades, the same root causes of failures appear repeatedly
– There are few new ones!
Conclusions – About Learning From Past Incidents
Sometimes we do, but the process is haphazard
Those involved learn what to do and/or what not to do
– But eventually they disappear, taking with them:
    ‒ The nuances of causation
    ‒ Factors omitted from the official record
    ‒ The lessons themselves (often) and their underlying rationale
– Mishap Reports and Lessons Learned Data Bases (which have come a long way) are what’s left but:
    ‒ Relevant information may be missing
    ‒ They lack the live element (the passion) and,
    ‒ Nothing beats talking to those who “were there”
Conclusions (cont’d)
Basically, there is no universally successful approach to learning the lessons from the past
What’s needed is a dependable process that:
– Uncovers root causation from those involved and/or the documentation
– Develops and promulgates “Rules of Practice” as countermeasures
Organizations desiring to profit from applying lessons previously learned should develop their own tailored approaches
– Should be included in the Project Plan
In the end, lessons are still best learned as a “contact sport”
MISSION
AEA’s mission is to leverage the vital lessons learned by NASA’s spacefaring pioneers to strengthen the skills of today’s aerospace explorers.
P. O. Box Bay Village OH
Joe Nieberding, President Cell:
Larry Ross, CEO Cell: