
1 Space System Development: Lessons Learned (Excerpts) Conference on Quality in the Space and Defense Industries March 14, 15, 2011 Joe Nieberding

2 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Presenter: Joe Nieberding. Mr. Nieberding has over 40 years of management and technical experience in leading and participating in NASA independent review teams, and in evaluating NASA advanced space mission planning. During his 35 years at GRC before retiring in 2000, he directed numerous studies to help select transportation, propulsion, power, and communications systems for advanced NASA mission applications. His Advanced Space Analysis Office led all exploration advanced concept studies for GRC. In addition, he was a launch team member on over 65 NASA Atlas/Centaur and Titan/Centaur launches, and is a widely recognized expert in launch vehicles and advanced transportation architecture planning for space missions. Mr. Nieberding is co-founder and President of Aerospace Engineering Associates.

3 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Introduction Excerpted from a two-day presentation aimed at assisting today's space system developers –Explores overarching fundamental lessons derived from many specific mishap case histories from multiple programs; "root" causes are not unique to particular times or programs Will cover some material from the two-day presentation: –A few of the detailed case histories –A summary of causes for all case histories –Example countermeasure "Rules of Practice" References are given for all resource information –Lessons-learned charts (yellow background) were either developed independently by Aerospace Engineering Associates (AEA) or extracted from resource information "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." Mark Twain

4 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Two-Day Outline Introduction The Practice of Failure Analysis Space Mission Record of Success General Management Lessons Lessons Learned from Specific Case Histories –Screening Out Design Errors –Impact of Weak Testing Practices –Screening Out Procedural Errors –Systems Engineering Lapses –Mishaps Associated With Software –When Processes Break Down –Adverse Program Management Factors Can Produce Bad Outcomes –A Piece-Part Failure –Not Everyone May Want the Project to Succeed –Experienced Teams Make Mistakes –Normalizing Deviance –When Advance Warnings Are Missed –The Perils of Heritage

5 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Two-Day Outline (concluded) Summary of Causes for the Foregoing Case Histories The Unsuccessful Failure Investigation of Atlas Centaur 70 Common Cause Failures The Human Element Applying the Lessons: Sample "Rules of Practice" One Strike and You're Out! – Flight Termination Conclusions "Politicians are like diapers; they need to be changed often, and for the same reason." Mark Twain

6 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 6 Historical Perspective

7 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC The Practice of Failure Analysis
Case – Event
The Milan Cathedral – Wall collapse
The Tay Rail Bridge – Bridge collapse, 75 fatalities
Kansas City Hyatt Regency Skyway – Skyway collapse, 114 fatalities
American Airlines Flight 96 – Separation of DC-10 aft cargo door, no fatalities
Turkish Air Flight 981 – Separation of DC-10 aft cargo door, 346 fatalities
Tacoma Narrows Bridge – Bridge collapse
Russian R-16 ICBM – Pad explosion, >120 fatalities

8 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Historical Perspective: Prominent Failures from Across the Spectrum of Engineering Endeavors Baikonur Cosmodrome, Russia, 10/24/1960 Preps for first test flight of the R-16 ICBM Program rushed to launch on the anniversary of the Bolshevik revolution (as a present for Premier Khrushchev) Led by the head of the Soviet Ballistic Missile Forces, Marshal Mitrofan Nedelin 250 people on and around the pad –Viewing stand for visiting dignitaries Unsafe design and undisciplined procedures caused 2nd-stage ignition More than 120 people were killed, including Nedelin Possibly the largest disaster in the history of rocketry! [Images: Mitrofan Nedelin; R-16 ICBM; destroyed pad and memorial at Baikonur (Tyuratam); video] For additional information see "Rockets and People: Creating a Rocket Industry, Volume II", Boris Chertok, NASA History Series SP-2006-4110

9 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 9 Design Screens

10 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC A Quick Aside About Design Error "Screens" [Diagram: design errors produced upstream pass through successive "screens" – design review, test, and unexpected-behavior investigation.] GIVEN: Our design "machine" (humans) WILL produce errors at some >0 rate. "Engineers today, like Galileo three and a half centuries ago, are not superhuman. They make mistakes in their assumptions, in their calculations, in their conclusions. That they make mistakes is forgivable; that they catch them is imperative." (1) (1) "To Engineer is Human"; Henry Petroski, Vintage Books, 1992
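
The "screens" idea lends itself to a quick back-of-the-envelope illustration. The sketch below is purely notional: the error count and per-screen catch rates are assumed values, not figures from the presentation, but they show why stacking imperfect screens matters and why removing any one of them (for example, by skipping tests) sharply raises the number of errors that reach flight.

```python
# Illustrative arithmetic only: the error count and per-screen catch rates
# below are assumptions, not figures from the presentation.
introduced_errors = 100.0        # design errors produced per program (assumed)
screens = {
    "design review": 0.80,
    "test": 0.90,
    "unexpected-behavior follow-up": 0.50,
}

surviving = introduced_errors
for name, catch_rate in screens.items():
    surviving *= (1.0 - catch_rate)  # each screen removes a fraction of what reaches it
    print(f"after {name:30s}: {surviving:6.2f} errors remain")

# Dropping the test screen entirely (a "zero-based" test program) multiplies the
# residual error count by 1 / (1 - 0.90) = 10x under these assumed numbers.
```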

11 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 11 Selected Mishaps

12 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Genesis 12 Underlying Issue: Omitted test combined with flawed adaptation of heritage design Problem: Spacecraft failed to properly deploy drogue chute (9/8/2004) Impact: Loss of some scientific data Source: http://www.nasa.gov/pdf/149414main_Genesis_MIB.pdf; Genesis Mishap Report, Dr. M. Ryschkewitsch Chairperson, 11/30/2005; Presentation: Genesis Mishap Investigation and Stardust Entry, Dr. Mike Ryschkewitsch and Pete Spidaliere Video

13 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Genesis G-Switch Orientation [Diagram: G-switch orientation – the acceleration direction required to activate the switch shown against the aerobraking acceleration direction as installed; velocity, heatshield, and pyros labeled.]

14 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 14 Genesis (cont’d) WHY: Improperly oriented gravity switch sensors (inverted). Deficiencies in the following processes resulted in the mishap: −Design that inverted the G-switch sensor (a heritage design) −Design reviews did not detect the error −Verification processes did not detect the design error No tests were conducted that would reveal the problem −Red Team review did not uncover the failure in the verification process
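
To make the failure mode concrete, here is a minimal, hypothetical sketch (the function name, threshold, and deceleration profile are invented for illustration and are not the Genesis design values). It shows how an inverted sensor orientation, i.e., a flipped sign convention, can survive a bench check of the switch logic yet be caught immediately by a "test like you fly" run of the entry deceleration profile.

```python
# Hypothetical illustration of a sign-convention (orientation) error in a
# g-switch arming check; threshold and profile values are invented.

ARM_THRESHOLD_G = 3.0  # assumed arming level, in g

def g_switch_armed(sensed_accel_g: float) -> bool:
    """Arm the drogue pyros when the sensed deceleration exceeds the threshold."""
    return sensed_accel_g > ARM_THRESHOLD_G

# As designed, entry deceleration is sensed as positive; with the sensor
# installed inverted, the same deceleration is sensed with the opposite sign.
entry_deceleration_profile_g = [0.5, 2.0, 7.0, 25.0, 12.0, 4.0, 1.0]  # invented

for orientation, sign in (("as designed", +1), ("inverted (as installed)", -1)):
    armed = any(g_switch_armed(sign * a) for a in entry_deceleration_profile_g)
    print(f"{orientation:25s} -> pyros armed: {armed}")

# A bench check that feeds g_switch_armed(5.0) passes either way; only an
# end-to-end simulation of the flight profile, with the sensor in its installed
# orientation, exposes the inversion.
```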

15 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Genesis (cont’d) 15 The Board further identified ineffective systems engineering as a root cause: –Inadequate project and systems engineering management –Inadequate systems engineering processes –Inadequate review process –Unfounded confidence in heritage designs –Failure to “Test like you fly” –Better/Faster/Cheaper philosophy - quote from MIB Report: “Root Cause 6.1: Faster, Better, Cheaper (FBC) philosophy: Cost-capped mission with threat of cancellation if overrun… Findings: The project maintained the cost-cap, in part at the expense of adequate technical oversight by JPL into LMSS Flight System and at the expense of a complete and robust Systems Engineering function. The Agency was at fault for encouraging and accepting the FBC philosophy as described above.”

16 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Genesis (concluded) 16 LESSONS: Imposition of a concept (Better/Faster/Cheaper) absent sensible, practical, and reliable implementation guidance is a recipe for serious trouble Treat changed heritage designs as new designs Make it very difficult to change baselined* test plans Test like you fly – and pay attention to when you don’t Don’t let system reviews get superficial (checking the block) *Those adopted after appropriate vetting activities

17 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 17 CONTOUR Underlying Issue: Erroneous prediction of spacecraft thermal environment Problem: Spacecraft broke up following SRM firing (8/15/2002) Impact: Loss of mission

18 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC CONTOUR (cont’d) 18 Why: Spacecraft overheating caused by improper installation of a “heritage” SRM –Inadequate systems engineering process –Inappropriate reliance on analysis by similarity –Inadequate review function –Dubious decision to omit telemetry coverage of motor firing event –Inadequate oversight, insight, and review of subcontractors –Inadequate communications between APL and ATK –ATK models not specific to CONTOUR –Limited understanding of the SRM plume heating environments in space –Limited understanding of CONTOUR SRM operating conditions Source: Contour Mishap Investigation Board Report, May 31, 2003; http://klabs.org/richcontent/Reports/Failure_Reports/contour/contour.pdf

19 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 19 CONTOUR (concluded) LESSONS: Heritage designs must be re-qualified for new applications Systems engineering is absolutely vital to mission success – in this case it should have: Challenged the flawed heritage assumption Objected to the use of invalid models Insisted on a more complete understanding of SRM plume heating Involve subcontractors early in the design process They need to understand and “buy in” to how their product is integrated

20 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Ariane 5 Underlying Issue: Unwarranted reliance on heritage software Problem: Forty seconds into the maiden Ariane-5 flight (6/4/1996), the vehicle veered off course and broke up Impact: Loss of mission Why: Flight software error –The flight software was programmed for Ariane-4 launch and trajectory conditions Didn't account for the higher horizontal velocity of Ariane-5 Caused an IRU software overflow error, resulting in loss of guidance information Never tested in conditions that simulated the Ariane-5 trajectory Source: I-Shih Chang, Space Launch Reliability; http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html [Video]
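
The class of error involved, a heritage conversion routine fed values outside the range it was qualified for, can be sketched in a few lines. This is a simplified, hypothetical illustration: the 16-bit packing function and the input values are stand-ins, not the actual Ariane inertial-reference code, but the pattern is the same. An input that was always safe on the old trajectory overflows on the new one, and the exception goes unhandled.

```python
# Simplified, hypothetical illustration of the heritage-software overflow class.
# The function and the input values below are invented stand-ins, not flight code.

INT16_MAX = 32767

def pack_to_int16(value: float) -> int:
    """Convert a sensed value to a signed 16-bit word with no range protection."""
    word = int(value)
    if word > INT16_MAX or word < -INT16_MAX - 1:
        # In the real failure class this kind of error goes unhandled and the
        # guidance unit stops producing valid data.
        raise OverflowError(f"{value} does not fit in a signed 16-bit word")
    return word

for label, value in (("old-trajectory-like input", 28_000.0),
                     ("new-trajectory-like input", 48_000.0)):
    try:
        print(f"{label}: packed as {pack_to_int16(value)}")
    except OverflowError as exc:
        print(f"{label}: OVERFLOW ({exc}) - fatal if unhandled in flight")
```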

21 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 21 Ariane 5 (concluded) LESSONS: Technical experts need to push back against baseless management directives Be very thorough in justifying dependence on previous “heritage” hardware or software development/testing Have the decision to accept “heritage” verifications examined in an IV&V mode Test like you fly and fly like you test

22 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Lewis Spacecraft 22 Underlying Issue: Misapplication of heritage system Problem: Spacecraft tumbled out of control 8/26/1997 Impact: Loss of spacecraft

23 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Lewis Spacecraft (cont'd) Why: Proximate Cause - Inoperable ACS safe mode –Spacecraft had multiple anomalies during initial operations: contact lost for two orbits, reappeared in an uncontrolled attitude mode, commanded to "safe mode" –"Safe mode" adopted from the Total Ozone Mapping Spacecraft: inherently unstable in the Lewis application (no X-axis gyro) –In spite of serious "cause unknown" anomalies, the operations crew entered a rest period: X-axis rates built up due to thruster imbalances ‒ rates transferred to Y and Z axes (polhode motion) ‒ computer shut down excessive thruster firings ‒ spacecraft rates transferred to the principal moment of inertia axis ‒ edge-on to Sun, battery discharged ~72% ‒ attempt to recover was flawed and failed ‒ spacecraft went out of contact and was never reacquired –Only one crew conducted all on-orbit operations (one 12-hour shift/day): no crew on duty during significant periods when the spacecraft was in view of the ground station Source: Lewis Spacecraft Mission Failure Investigation Board Final Report, February 12, 1998; http://www.lr.tudelft.nl/live/pagina.jsp?id=a8b6dca2-92dc-4965-a64c-298189e5b58e&lang=en&binary=/doc/lewis_document.pdf [Diagram: polhode motion, safe mode, X-axis spin]

24 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 24 Lewis Spacecraft (cont’d) Root Causes: –No mutual contractor/government understanding as to what is meant by “Better/Faster/Cheaper” leading to: Requirements changes without adequate resource adjustment Undue cost and schedule pressures Inadequate ground station availability for initial operations Frequent key personnel changes Inadequate engineering discipline Inadequate management discipline Active NASA oversight and management absent –Senior management imposition of an ill-defined concept (Better/Faster/Cheaper)* *While the BFC thrust was abandoned after multiple disappointing outcomes, vestiges (both good and bad) remain.

25 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Lewis Spacecraft (cont’d) 25 LESSONS: With respect to the proximate cause: “Heritage” hardware/software is often a trap Flag any proposed use of heritage designs for special attention Challenge applicability and understand its qualification history Make certain that the true heritage (especially the limitations) is fully understood Even presumably qualified heritage items need to be functionally tested in the way they will fly!

26 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Lewis Spacecraft (concluded) 26 LESSONS: (concluded) With respect to the root causes: Imposition of a concept (Better/Faster/Cheaper) absent sensible, practical, and reliable implementation guidance is a recipe for serious trouble Take great care to select qualified people to run a program - when it’s clear they’re not right for the job, replace them

27 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 27 Causation Summary

28 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Causation Analysis – Breakdown by Category [Pie charts: one breakdown shows Design 23%, Prod/Ops 69%, Pgm Mgt 8%; a second shows Sys Engr 51%, Prod/Ops 41%, Pgm Mgt 8%]

29 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 29 Observations Only one of the 39 cases analyzed (Atlas Centaur 24) had failure of a proper part as the cause! –Programs doing good job of acceptance testing The other 38 were associated with human error: management weaknesses, systems engineering shortcomings, etc. Therefore, it is necessary that risk assessments be based on data that somehow reflects human error Facts are stubborn things, but statistics are pliable. Mark Twain

30 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 30 Observations (concluded) Programs that adopt a zero-based approach to testing are betting on the ability of the engineering community to foresee all aspects of system performance under all conditions –This is a very risky bet! History demonstrates that tests frequently, if not usually, produce unexpected (and unwanted) results

31 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Applying the Lessons: A Sample Set of “Rules of Practice” 31

32 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Applying the Lessons: A Sample Set of “Rules of Practice” Issue: Many lessons learned have common themes. The issue is to systematically infuse this knowledge into programs so they’re not lessons forgotten One approach: For large and complex programs, impose a Program specific set of overarching “Rules of Practice” that govern how certain things are to be done (i.e. to codify some of the lessons) −Any deviation from these “Rules” would be cause for special attention (risk management) by Program Management −These ad hoc “Rules” would not take the place of existing design standards or similar tools, but rather provide an additional mechanism to flag when special action is warranted 32

33 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 33 Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d) Advance Warning: (Causal in 17 of 39 cases) −An effective system for facilitating communication between those concerned about a potential safety-of-flight problem and those in a position to reconcile it is to be designed and embedded in the Program culture (easier said than done - but surely it’s doable!). It must be: Formal and visible. Reliable (if not foolproof). Simple to use with quick feedback. Plugged into real authority to stop the action. Culturally valued and respected.

34 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Applying the Lessons: A Sample Set of "Rules of Practice" (cont'd) Analytical Modeling: (Causal in 12 of 39 cases) −All analytical modeling on which designs are based will be test-validated and acquired from at least two independent sources. −An independently validated plume heating analysis is required of all systems employing a new propulsion arrangement. Heritage Items: (Contributing cause in 12 of 39 cases) −Any item adopted for use based on successful flight performance in another program will be deemed unqualified in the adopting application until a thorough analysis has been performed to confirm that the adopting application is identical (or less demanding) in all relevant features to the prior successful application. −Any deviations must be qualified by test.
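
As one way of picturing the "identical or less demanding in all relevant features" test, the sketch below compares a heritage qualification envelope against the adopting application's envelope and flags any feature that is more demanding. The feature names and limit values are invented for illustration; a real comparison would cover every relevant environment and interface.

```python
# Hypothetical sketch of the "identical or less demanding" heritage check:
# feature names and envelope values are invented for illustration.

heritage_envelope = {            # environment the item was qualified against
    "peak_acceleration_g": 25.0,
    "max_temperature_C": 60.0,
    "vibration_grms": 8.0,
}
adopting_envelope = {            # environment in the new application
    "peak_acceleration_g": 25.0,
    "max_temperature_C": 85.0,   # more demanding -> must be qualified by test
    "vibration_grms": 6.0,
}

deviations = [
    feature
    for feature, heritage_limit in heritage_envelope.items()
    if adopting_envelope.get(feature, float("inf")) > heritage_limit
]

if deviations:
    print("Deemed UNQUALIFIED pending test for:", ", ".join(deviations))
else:
    print("Adopting application is identical or less demanding in all listed features.")
```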

35 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Applying the Lessons: A Sample Set of "Rules of Practice" (cont'd) Software: (Causal in 6 of 39 cases: Ariane 501, Titan IVB-32, SOHO, MCO, MPL, DART) −All software development, testing, and application processes will be controlled by a single formal, configuration-managed Software Management Plan for which a single individual is responsible. Testing provided for in this plan will specifically include: –Demonstration of proper flight software operation in nominal and off-nominal flight simulation functional testing; this will be done with flight hardware to the greatest extent possible. –Formal "qualification" and "acceptance" testing of flight-critical software "end items" prior to controlled "release" for use. The plan will also provide for periodic, independent verification that the original requirements remain valid.
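
A notional reading of the nominal/off-nominal functional-testing requirement is sketched below as a pytest-style parameterized test. The guidance_input_word() routine and the trajectory envelope values are hypothetical stand-ins (echoing the earlier overflow sketch), not any program's actual flight software; the point is that sweeping the new vehicle's envelope on the ground is what exposes a range violation before flight.

```python
# Notional flight-simulation functional test (pytest style); the routine and
# the trajectory values are hypothetical stand-ins, invented for illustration.
import pytest

INT16_MAX = 32767

def guidance_input_word(value: float) -> int:
    """Hypothetical stand-in for a flight routine that packs a value into 16 bits."""
    return int(value)  # no range protection, as in the heritage code path

# Invented trajectory sweeps: the heritage vehicle's envelope (nominal for the
# heritage code) and the new vehicle's wider envelope (off-nominal for it).
HERITAGE_ENVELOPE = range(0, 30_001, 5_000)
NEW_VEHICLE_ENVELOPE = range(0, 60_001, 5_000)

@pytest.mark.parametrize("value", [*HERITAGE_ENVELOPE, *NEW_VEHICLE_ENVELOPE])
def test_guidance_word_fits_in_int16(value):
    # Fails for new-vehicle values above 32767, exposing the overflow on the
    # ground instead of in flight.
    assert -INT16_MAX - 1 <= guidance_input_word(value) <= INT16_MAX
```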

36 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Applying the Lessons: A Sample Set of "Rules of Practice" (concluded) General Engineering Management Practices: Certain practices will constitute required standard operating procedures: −Rationale Documentation: It will be mandatory to systematically record the rationale associated with all engineering products such as design and operational requirements, procedures, test parameters, processes, design choices, specifications, etc., and to place the rationale as close to the item it relates to as possible. −Assumptions: All assumptions that form the foundation for engineering activities (analyses, test or not-to-test decisions, trade studies, design approaches, etc.) will be explicitly stated and documented. A process for validating, and periodically revalidating, the assumptions will be initiated. Etc. (This is a sampling – not an all-inclusive list. Certainly, Project-specific "Rules" are also appropriate.)
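
One lightweight way to keep rationale and assumptions "as close to the item as possible" is to carry them in the same record as the engineering item itself and to stamp each revalidation. The sketch below is generic and hypothetical: the field names and the sample entry are invented, not a prescribed format.

```python
# Generic illustration of co-locating rationale and assumptions with an
# engineering item; field names and the sample entry are invented.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class EngineeringItem:
    identifier: str
    statement: str                          # the requirement, test parameter, etc.
    rationale: str                          # why this value or choice was made
    assumptions: list[str] = field(default_factory=list)
    last_revalidated: date | None = None    # supports periodic revalidation

item = EngineeringItem(
    identifier="TST-041",
    statement="Qualify the g-switch assembly on a flight-like deceleration profile.",
    rationale="Heritage orientation differs from the adopting application.",
    assumptions=["Entry deceleration peaks below 30 g (revalidate each design cycle)."],
    last_revalidated=date(2011, 3, 1),
)
print(item.identifier, "-", item.rationale)
```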

37 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC The Message 37 Some may say that the foregoing rules are rather boring - Nothing earthshaking - all pretty routine Rigorous implementation and infusion of quality into all aspects of routine, common sense practices will prevent most mission failures It’s really not rocket science! But that’s exactly the point!

38 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Conclusions 38

39 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 39 Conclusions – Stuff Happens Most mishaps can be broadly attributed to human error, not rocket science –Lack of complete understanding of how complex systems interact with each other –Inadequate attention to every detail –Flawed analyses or tests –Improper use of “heritage” systems –Flawed processes –Flawed understanding of how software fails –Reaction to budget or schedule pressure –Imperfect management Often, a complex, subtle, sequence of events is needed –If just one event in the chain were prevented, the failure would not have happened Must ensure quality in all the above areas Essential for mission success Over decades, the same root causes of failures appear repeatedly There are few new ones!

40 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC Conclusions – About Learning From Past Incidents Sometimes we do, but the process is haphazard Those involved learn what to do and/or what not to do –But eventually they disappear, taking with them: The nuances of causation Factors omitted from the official record The lessons themselves (often) and their underlying rationale –Mishap Reports and Lessons Learned databases (which have come a long way) are what's left, but: Relevant information may be missing They lack the live element (the passion), and Nothing beats talking to those who "were there"

41 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC 41 Conclusions (cont’d) Basically, there is no universally successful approach to learning the lessons from the past What’s needed is a dependable process that: –Uncovers root causation from those involved and/or the documentation –Develops and promulgates “Rules of Practice” as countermeasures Organizations desiring to profit from applying lessons previously learned should develop their own tailored approaches –Should be included in the Project Plan In the end, lessons are still best learned as a “contact sport”

42 © 2006 All Rights Reserved. Aerospace Engineering Associates LLC MISSION AEA’s mission is to leverage the vital lessons learned by NASA’s spacefaring pioneers to strengthen the skills of today’s aerospace explorers. P. O. Box 40448 Bay Village OH 44140 www.aea-llc.com Joe Nieberding, President Email: joenieber@sbcglobal.net Cell: 440-503-4758 Larry Ross, CEO Email: ljross1@att.net Cell: 440-227-7240

