CIS 573 Computer Aided Verification
Carl A. Gunter
Fall 1999
Part 3
London Ambulance Service
• Between October 26 and November 4, 1992, a Computer Aided Dispatch system for the London area failed.
• Key system problems included:
  » the need for near-perfect input information
  » poor interfaces between the ambulance crews and the system
  » unacceptable reliability and performance of the software
• The consequences are difficult to measure, but were severe in some cases.
Human Factors in Safety
• Case studies
  » USS Vincennes
  » Three Mile Island
• Leveson, Chapters 5 and 6
• Automotive guidelines
• Leveson, Chapter 17
USS Vincennes
• On July 3, 1988, a US Navy Aegis cruiser shot down an Airbus on a regularly scheduled commercial flight.
• Aegis is one of the Navy's most sophisticated weapon systems.
• Aegis itself performed well. Human error was blamed: the captain received false reports from the tactical information coordinator.
• Carlucci's suggestion for the user interface: put "an arrow on [showing] whether it's ascending or descending."
Three Mile Island
• On the morning of 28 March 1979, a cascading sequence of failures caused extensive damage to the nuclear power plant at Three Mile Island near Harrisburg, Pennsylvania.
• Although the radiation release was small, the repairs and clean-up cost 1 to 1.8 billion dollars.
• Confidence in the safety of US nuclear facilities was also significantly damaged.
• Operator error was seen as a major contributing factor.
[Diagram, panels 2 to 5: the Three Mile Island failure sequence, with labels including "failed open", "failed closed", "water pump", "blocks backup", "boiled dry", "operator cuts back water flow", "high pressure injection pumps", "saturation", "let down activated", "alarms", "cooling activated", "shut off pumps", "high level of neutrons", "fuel rods rupture", "closed", "water injected", and "hydrogen explosion".]
Level 2 Conditions
• No training of operators for saturation in the core.
• Inadequate operating procedures in place:
  » Failure to follow rules for the PORV (pilot-operated relief valve).
  » Surveillance tests not adequately verified.
• Control room ill-designed:
  » 100 alarms in 10 seconds.
  » Key indicators poorly placed and key information not displayed clearly (for example, cooling water converting to steam had to be inferred from temperature and pressure).
  » Instruments off scale.
  » Printer not able to keep up.
Level 3 Root Causes
• Design for controllability.
• Lack of attention to human factors.
• Quality assurance limited to safety-critical components.
• Inadequate training.
• Limited licensing procedures.
MISRA Guidelines
• Requirements are very domain-specific.
• Given a sufficiently narrow domain, it is possible to provide more detailed assistance in determining requirements.
• We look at a set of guidelines for establishing user requirements for automotive software and translating these into software requirements.
• The guidelines are those of the Motor Industry Software Reliability Association (MISRA) in the UK.
Need for Integrity Levels
• An automotive system must satisfy requirements that it not cause:
  » Harm to humans
  » Legislation to be broken
  » Undue traffic disruption
  » Damage to property or the environment (e.g., emissions)
  » Undue financial loss to either the manufacturer or the owner
Controllability Levels
• Uncontrollable: Failures whose effects are not controllable by the vehicle occupants, and which are likely to lead to extremely severe outcomes. The outcome cannot be influenced by a human response.
• Difficult to Control: This relates to failures whose effects are not normally controllable by the vehicle occupants but could, under favorable circumstances, be influenced by a mature human response.
Controllability Levels Continued
• Debilitating: This relates to failures whose effects are usually controllable by a sensible human response and, whilst there is a reduction in safety margin, can usually be expected to lead to outcomes which are at worst severe.
• Distracting: This relates to failures which produce operational limitations, but a normal human response will limit the outcome to no worse than minor.
• Nuisance Only: This relates to failures where safety is not normally considered to be affected, and where customer satisfaction is the main consideration.
Initial Integrity Level
• To determine an initial integrity level:
  » List all hazards that result from all the failures of the system.
  » Assess each failure mode identified in the first step to determine the controllability category.
  » The failure mode with the highest associated controllability category determines the integrity level of the system.
Example
• Here is an attempt at an analysis of a design defect in the 1983 Nissan Stanza I used to own. (It wasn't a computer error, but a computer error might display similar behavior.)
• Hazard: Powertrain drive, loss of power.
• Severity factor: Powertrain performance affected.
• Controllability category: Debilitating.
• Integrity level: 2.
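The procedure for the initial integrity level amounts to taking a maximum over failure modes. The sketch below assumes the five controllability categories map to integrity levels 0 through 4 in order of severity, which agrees with the Stanza example (Debilitating gives level 2) and with the recommendations slide treating level 0 as ISO 9001. The names `Controllability` and `initial_integrity_level`, and the second failure mode in the usage, are hypothetical and only for illustration.

```python
from enum import IntEnum


class Controllability(IntEnum):
    """Controllability categories, ordered from least to most severe.

    Assumption: each value is the integrity level implied by the category
    (Nuisance Only = 0 up to Uncontrollable = 4).
    """
    NUISANCE_ONLY = 0
    DISTRACTING = 1
    DEBILITATING = 2
    DIFFICULT_TO_CONTROL = 3
    UNCONTROLLABLE = 4


def initial_integrity_level(failure_modes: dict[str, Controllability]) -> int:
    """Return the initial integrity level of a system.

    `failure_modes` maps each hazard-producing failure mode to its assessed
    controllability category; the most severe category decides the level.
    """
    if not failure_modes:
        raise ValueError("at least one failure mode must be assessed")
    # IntEnum members compare by value, so max() picks the most severe one.
    return int(max(failure_modes.values()))


# The Stanza hazard from the slide, plus a hypothetical minor failure mode.
stanza = {
    "powertrain drive: loss of power": Controllability.DEBILITATING,
    "hypothetical: dashboard lamp failure": Controllability.NUISANCE_ONLY,
}
print(initial_integrity_level(stanza))
```

Using an ordered enumeration keeps the "highest category wins" rule explicit: adding a failure mode can only raise the level, never lower it.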
Human Error Probabilities
• Extraordinary errors (10^-5): Errors for which it is difficult to conceive how they could occur. Stress-free, with powerful cues pointing to success.
• Regular errors (10^-4): Errors in regularly performed, commonplace simple tasks with minimum stress.
• Errors of commission (10^-3): Errors such as pressing the wrong button or reading the wrong display. Reasonably complex tasks, little time available, some cues necessary.
Human Errors Continued
• Errors of omission (10^-2): Errors where dependence is placed on situation and memory. Complex, unfamiliar task with little feedback and some distraction.
• Complex task errors (10^-1): Errors in performing highly complex tasks under considerable stress with little time available.
• Creative task errors (1 to 10^-1): Errors in processes that involve creative thinking, or unfamiliar, complex operations where time is short and stress is high.
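To make these orders of magnitude concrete, the sketch below computes the chance of at least one error when a task is repeated many times, using the per-task probabilities from the slides. The independence assumption and the names `ERROR_PROBABILITY` and `p_at_least_one_error` are illustrative assumptions, not part of the slides; real errors may be correlated (for example under stress), so treat this as back-of-envelope arithmetic.

```python
# Nominal per-task human error probabilities from the slides.
# "Creative task" errors are omitted because the slides give a range
# (1 to 10^-1) rather than a single value.
ERROR_PROBABILITY = {
    "extraordinary": 1e-5,
    "regular": 1e-4,
    "commission": 1e-3,
    "omission": 1e-2,
    "complex task": 1e-1,
}


def p_at_least_one_error(p: float, n: int) -> float:
    """Probability of at least one error in n performances of a task.

    Assumes each performance fails independently with probability p,
    so the chance of n error-free performances is (1 - p) ** n.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


for kind in ("regular", "commission", "omission"):
    p = ERROR_PROBABILITY[kind]
    print(f"{kind:>10}: {p_at_least_one_error(p, 1000):.3f}")
```

Over 1000 repetitions this gives roughly 0.095 for regular errors, 0.632 for errors of commission, and essentially 1 for errors of omission, which is why designs that rely on operators never slipping are fragile.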
Recommendations
• Level 0 is ISO 9001.
• Each of the remaining 4 levels carries a recommendation for process activities on software with hazards at that level.
• Areas covered:
  » Specification and design
  » Languages and compilers
  » Configuration management
  » Testing
  » Verification and validation
  » Access for assessment
Specification and Design
• Level 1: Structured method.
• Level 2: Structured method supported by CASE tool.
• Level 3: Formal specification for the functions at this level.
• Level 4: Formal specification of complete system. Automated code generation (when available).
Testing
• Level 1: Show fitness for purpose. Test all safety requirements. Repeatable test plan.
• Level 2: Black box testing.
• Level 3: White box module testing with defined coverage. Stress testing against deadlock. Syntactic static analysis.
• Level 4: 100% white box module testing. 100% requirements testing. 100% integration testing. Semantic static analysis.
Verification and Validation
• Level 1: Show tests are suitable, have been performed, are acceptable, and exercise safety features. Traceable correction.
• Level 2: Structured program review. Show no new faults after corrections.
• Level 3: Automated static analysis. Proof (argument) of safety properties. Analysis for lack of deadlock. Justify test coverage. Show tests have been suitable.
• Level 4: All tools to be formally validated (when available). Proof (argument) of code against specification. Proof (argument) for lack of deadlock. Show object code reflects source code.
Access for Assessment
• Level 1: Requirements and acceptance criteria. QA and product plans. Training policy. System test results.
• Level 2: Design documents. Software test results. Training structure.
• Level 3: Techniques, processes, tools. Witness testing. Adequate training. Code.
• Level 4: Full access to all stages and processes.
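The recommendation slides form a table indexed by integrity level and area. The sketch below encodes two of the areas, assuming the four entries on each slide apply to levels 1 to 4 in that order and that level 0 is covered by ISO 9001 alone. `RECOMMENDATIONS` and `recommendations_for` are hypothetical names for illustration, not part of the MISRA guidelines.

```python
# Hypothetical encoding of the "Specification and Design" and "Testing"
# slides. Assumption: entry i on each slide applies to integrity level i + 1.
RECOMMENDATIONS: dict[str, list[str]] = {
    "specification and design": [
        "Structured method.",
        "Structured method supported by CASE tool.",
        "Formal specification for the functions at this level.",
        "Formal specification of complete system. "
        "Automated code generation (when available).",
    ],
    "testing": [
        "Show fitness for purpose. Test all safety requirements. "
        "Repeatable test plan.",
        "Black box testing.",
        "White box module testing with defined coverage. "
        "Stress testing against deadlock. Syntactic static analysis.",
        "100% white box module testing. 100% requirements testing. "
        "100% integration testing. Semantic static analysis.",
    ],
}


def recommendations_for(level: int) -> dict[str, str]:
    """Return the recommendation for each encoded area at one integrity level."""
    if level == 0:
        # Level 0 has no per-area entries; ISO 9001 applies.
        return {area: "ISO 9001" for area in RECOMMENDATIONS}
    if not 1 <= level <= 4:
        raise ValueError("integrity level must be between 0 and 4")
    # Levels start at 1, list indices start at 0.
    return {area: items[level - 1] for area, items in RECOMMENDATIONS.items()}


for area, rec in recommendations_for(2).items():
    print(f"{area}: {rec}")
```

Encoding the remaining areas (languages and compilers, configuration management, and so on) would only mean adding keys; the lookup itself stays the same.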
Aristocracy, Democracy, and System Design
• Conceptual integrity is the most important consideration in system design.
• The ratio of function to conceptual complexity is the ultimate test of system design.
• To achieve conceptual integrity, a design must proceed from one mind or from a small group of agreeing minds.
• A conceptually integrated system is faster to build and to test.
• Frederick P. Brooks, Jr., The Mythical Man-Month.
Principles of Design
• Norman offers the following two principles of good design:
  » Provide a good conceptual model.
  » Make things visible. Two important techniques are:
    – Provide natural mappings
    – Provide feedback
• Donald A. Norman, The Psychology of Everyday Things.
Examples of Bad Designs
• Elegant doors that give no hint about whether or where to push or pull.
• VCRs that provide inadequate feedback to indicate whether actions succeeded.
• Telephones using too many unmemorable numerical instructions.
Examples of Good Designs
• Original push-button telephones.
• Certain kinds of single-handle faucets, which provide a natural mapping to the desired parameters (temperature and flow).
• The Apple "desktop" computer interface.
Do Humans Cause Most Accidents?
• From Leveson, Chapter 5:
  » 85% of work accidents are due to unsafe acts by humans rather than unsafe conditions.
  » 88% of all accidents are caused primarily by dangerous acts of individual workers.
  » 60 to 80% of accidents are caused by loss of control of energies in the system.
Caveats
• Data may be biased or incomplete.
• Positive actions are not usually recorded.
• Blame may be based on assuming that operators can overcome all difficulties.
• Operators intervene at the limits.
• Hindsight is 20/20.
• It is hard to separate operator errors from design errors.
The Human as Monitor
• The task may be impossible.
• The operator is dependent on the information provided.
• Information about the process reaches the operator more indirectly.
• Failures may be silent or masked.
• Little activity may result in lowered attention or overreliance.
The Human as Backup
• A poorly designed interface may leave operators with diminished proficiency and increased reluctance to intervene.
• Fault-intolerant systems may lead to even larger errors.
• The design of the system may make it harder to manage in a crisis.
The Human as Partner
• The operator may simply be assigned the tasks that the designer cannot figure out how to automate.
• The remaining tasks may be complex, and new tasks such as maintenance and monitoring may be added.