3 Dietrich Dorner’s Statement of the Logic of Failure
What is the logic of failure?
Thought patterns that, while appropriate to a simpler world, are disastrous for a complex one.
“An individual’s reality model ... as a rule will be both incomplete and wrong. People are most inclined to insist that they are right when they are wrong and when they are beset by uncertainty. The ability to admit ignorance or mistaken assumptions is indeed a sign of wisdom, and most individuals in the thick of complex situations are not, or not yet, wise.” (Dorner p. 42)
Decision Making Errors
–Goal Definition Failures
  Not recognizing conflicts in goals
  Not understanding priorities of goals
–Problem Recognition Failures
  Over-simplification of the situation
  Assuming that a lack of obvious problems means all is well
–Problem Solution Failures
  Reliance on “proven solutions” that don’t apply to the current problem
  Focusing on solving an immediate crisis at the expense of long-term goals
  Not considering what to keep when deciding on what to fix
  Focusing on pet projects and forgetting what is important
4 Dorner Error Avoidance Guidelines Summary
State goals clearly.
Establish priorities, but realize they may change as the situation does.
Form a model of the system.
Gather information... but not too much.
Don’t excessively abstract.
Remember common sense.
Analyze errors and draw conclusions from them.
Change your thinking and behavior according to that feedback.
Based on summary by Clive Rowe, http://www.cliverowe.com/blog/2010/09/07/the-logic-of-failure/
5 Mapping of Decision Making Errors to Avoidance Guidelines
6 Unsuccessful IT Project Causes
Failure to meet requirements (product)
–Organizational changes needed to realize benefits
–Inadequately specified deliverables
–Overly ambitious scope
–Technology risks
Late or over budget (process)
–Impact of changes not recognized (technology, requirements, business case, actors)
–Overly optimistic predictions of activity durations
–Resources not available when required
–Inadequate accountability and status tracking
Brenda Whittaker, “What went wrong? Unsuccessful information technology projects”, Information Management & Computer Security, 7/1, 23-29
7 DoD Reliability and Maintainability (R&M) Problems
GAO: Persistently low readiness rates and costly maintenance contribute to increases in the total ownership cost of DoD systems
DoD: 80% of systems acquired between 1995 and 2000 failed to meet R&M objectives
–Requirements focused on technical performance, with little attention to operations and support (O&S) costs and readiness [Defining Goals]
–Immature technologies hindered the ability to design weapon systems with high reliability [Immediate vs. long term]
–Limited collaboration among organizations charged with requirements setting, product development, and maintenance
–Inadequate attention to data collection and analysis
DOD GUIDE FOR ACHIEVING RELIABILITY, AVAILABILITY, AND MAINTAINABILITY, AUGUST 3, 2005, http://www.acq.osd.mil/dte/docs/RAM_Guide_080305.pdf
8 Mapping of DoD R&M Problems to Dorner Decision Making Errors
9 Systems Reliability Approach (GEIA-STD-0009)
Understand Customer/User Requirements and Constraints
Design and Redesign for Reliability
Verification: Produce Reliable Systems and Products
Fielding: Monitor and Assess User Reliability
ANSI/GEIA-STD-0009, “Reliability Program Standard for Systems Design, Development, and Manufacturing”, available from Information Technology Association of America (ITAA), http://www.techstreet.com/itaagate.tmpl or http://webstore.ansi.org
12 Mapping Unsuccessful IT Projects to Dorner Logic of Failures
13 Mapping of Dorner Logic of Failure to Systems Reliability Approach (GEIA-STD-0009)
14 Mapping of Unsuccessful IT Projects to Systems Reliability Approach (GEIA-STD-0009)
15 Question
IF Unsuccessful IT Projects can be mapped to Dorner Logic of Failures
AND Reliability Systems Approach (GEIA-STD-0009) can be mapped to Dorner Logic of Failures
THEN why is it so hard to prevent the Logic of Failure?
16 Errors in Defining Goals
Defining basic parameters
–Example: failure rate (failures per unit time): MIL-STD-721C has 11 definitions related to failures and more than 15 different definitions of time
Choosing attributes
–Reliability (probability of success over a definite time interval) – important to an airplane traveler
–Availability (probability of being operational) – important to a network or service user
–Probability of success upon demand (e.g., actuation) – important to a safety system
Formulating verifiable requirements (counterexamples below)
–No single point of failure
–No failure propagation
–No hazardous failure modes
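The three attributes listed on this slide are different quantities, and a short numerical sketch can make the distinction concrete. The example below assumes a constant failure rate (exponential model) for reliability and the steady-state formula MTBF / (MTBF + MTTR) for availability. These modeling choices, the function names, and every input number are illustrative assumptions; none of them come from the briefing or from MIL-STD-721C.

```python
import math


def reliability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of no failure over the mission interval.

    Assumes a constant failure rate (exponential model), which is an
    illustrative simplification and not a definition from the briefing.
    """
    return math.exp(-failure_rate_per_hour * mission_hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def success_on_demand(successes: int, demands: int) -> float:
    """Observed fraction of demands (e.g., actuations) that succeeded."""
    if demands <= 0:
        raise ValueError("demands must be positive")
    return successes / demands


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show that the three attributes
    # answer different questions about the same system.
    print("Reliability over a 10 h flight:", reliability(1e-4, 10.0))
    print("Availability of a service:", steady_state_availability(10_000.0, 10.0))
    print("Success on demand:", success_on_demand(998, 1000))
```

Because each function needs different inputs (a time interval, a repair time, a demand count), a requirement that names only "failure rate" leaves open which of these attributes is actually being specified.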
17 Errors in Problem Recognition*
System failures attributable to oversimplification or spuriously assuming all was well
–Switchover failures
  Ariane 5 (June 1996, ref. 1)
  Phobos Grunt (2012)
–Capacity
  RIM (October 2011 – service interruption)
  Walmart on-line web site (November 2006 – Black Friday)
–Maintenance
  Juniper Networks (November 2011)
–Data breaches
  18 million health records breached in 2011
*See references chart at end of this briefing for more details
18 Errors in Problem Solution
Reliance on “proven solutions”
–Hardware-only solutions that assume software does not fail
Focusing on solving an immediate problem at the expense of long-term goals
–Achieving greater reliability or reducing sustainment costs incurs immediate costs (problems), but the benefit comes over many years
Not considering what to keep when deciding on what to fix
–Discarding test, verification, and analysis to solve a budget or schedule problem
Focusing on pet projects and forgetting what is important
–Consequences of ignoring reliability and dependability: market failure, legal liability, mission failure, organizational failure
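The first point on this slide (hardware-only solutions that assume software does not fail) can be illustrated with a small reliability calculation. The sketch below assumes independent failures, two redundant hardware units, and one software load shared by both units; the function names and all numbers are hypothetical and are not taken from the briefing.

```python
def parallel(reliabilities):
    """Reliability of redundant units: the group fails only if every unit fails.

    Assumes the units fail independently (an illustrative assumption).
    """
    prob_all_fail = 1.0
    for r in reliabilities:
        prob_all_fail *= (1.0 - r)
    return 1.0 - prob_all_fail


def series(reliabilities):
    """Reliability of a chain in which every element must work."""
    product = 1.0
    for r in reliabilities:
        product *= r
    return product


if __name__ == "__main__":
    # Hypothetical values: each hardware unit works with probability 0.99,
    # and the software common to both units works with probability 0.95.
    hardware_only = parallel([0.99, 0.99])
    with_software = series([hardware_only, 0.95])
    print("Hardware-only estimate:", hardware_only)
    print("Including shared software:", with_software)
```

Under these assumed numbers the redundant hardware looks far more reliable than the system as a whole, because the shared software sits in series with both hardware units and the hardware redundancy does nothing to protect against its failure.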
19 References
ARIANE 5, Flight 501 Failure, Report by the Inquiry Board, Prof. J. L. Lions, Chairman, 1996, available online at www.di.unito.it/~damiani/ariane5rep.html
Russia: Computer crash doomed Phobos-Grunt, Spaceflight Now, Feb. 6, 2012, available online at http://www.spaceflightnow.com/news/n1202/06phobosgrunt/
Evan Suman, “Black Friday Turns Servers Dark at Wal-Mart, Macy's”, e-Week, Nov. 25, 2006, http://www.eweek.com/print_article2/0,1217,a=194801,00.asp
Jim Duffy, Juniper at the root of Internet outage?, Network World, November 07, 2011, available online at http://www.networkworld.com/news/2011/110711-internet-outage-252851.html
Nicholas Kolakowski, RIM BlackBerry Outage Hits Users Around the World, e-Week, October 10, 2011, available online at http://www.eweek.com/c/a/Mobile-and-Wireless/RIM-BlackBerry-Outage-Hits-Users-Around-the-World-781395/
U.S. Dept. of Health and Human Services, Health Information Privacy web site, http://www.hhs.gov/ocr/privacy/hipaa/administrative/breachnotificationrule/postedbreaches.html