Safety-Critical Systems 2 Requirement Engineering T Spring 2008 Ilkka Herttua
Critical Applications
Computer-based systems used in transportation, chemical processing and nuclear power plants. A failure in such a system endangers human lives, either directly or through environmental pollution. The same approach is also preferable for systems with a large-scale economic influence (e.g. telecom, space).
Examples of computer failures in critical systems
Safety Context Diagram
[Figure: interaction between HUMAN, PROCESS and SYSTEM. Elements shown: hardware, software, technology, operating rules, physical facts, designing, operating.]
Current situation / critical systems
Based on data from recent failures of critical systems, the following conclusions can be drawn:
a) Failures are becoming more and more distributed and are often nationwide (e.g. air traffic control, and commercial systems such as credit card authorisation being denied).
b) The source of failure lies less often in hardware (physical faults) and more often in system design or in end-user operation/interaction (software).
c) The harm caused by failures is mostly economic, but health and safety are sometimes also involved.
d) Failures can affect many different aspects of dependability (dependability = the ability to deliver a service that can justifiably be trusted).
Safety Definition
Safety: the property of a system that it will not endanger human life or the environment.
Safety-critical system: a system that is intended to achieve, on its own, the necessary level of safety integrity for the implementation of the required safety functions.
V-Lifecycle Model
[Figure: V-lifecycle. Stages: Requirements Analysis; Systems Analysis & Design; Software Design; Software Implementation & Unit Test; Module Integration & Test; System Integration & Test; System Acceptance. Work products: Requirements Document, Requirements Model, Test Scenarios, Functional/Architectural Model, Specification Document.]
Knowledge base: configuration-controlled knowledge whose understanding increases until completion of the system:
- Requirements documentation
- Requirements traceability
- Model data/parameters
- Test definitions/vectors
Overall safety lifecycle
Developing safety-related systems
To achieve safety:
1. safety requirements (avoid possible hazards and risks)
2. quality management (follow-up process)
3. design / system architecture (reliability)
4. defined design/manufacturing processes
5. certification and approval processes (testing, proving)
6. known behaviour of the system in all conditions (modelling, formal verification)
1. Define the Problem Context
Understanding the whole context:
- the problem context, and
- the problem
Setting the boundary:
- the application domain
- the system
- their boundary
Describing the context:
- traditional context diagrams
- the importance of showing the whole domain
[Figure: EURO-INTERLOCKING context diagram, working draft. The interlocking is shown with Boundary 2 and Boundary 3 and the surrounding elements: track vacancy proving, ATP/ATO, points, level crossings, radio, signals, user-specific objects, train, ERTMS/ETCS, line block, RBC, power supply, route setting, control, locking, object controller, data preparation system, installation rules and track layout, maintenance, environment, human, national rules, power source, traffic control system, diagnostics system.]
Safety Requirements
Requirements are the stakeholders' (customer's) demands: what they want the system to do. They do not define how; that is the job of the specification.
Safety requirements define what the system must do and must not do in order to ensure safety: both positive and negative functionality.
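To make the difference between positive and negative safety requirements concrete, here is a minimal sketch that writes one of each as a predicate and checks a candidate control law against both over every state of a small finite model. The tank-heater model, the temperatures `TARGET_C` and `LIMIT_C`, and all function names are hypothetical assumptions for illustration, not taken from the slides.

```python
# Hypothetical tank-heater model: values and names are illustrative assumptions.
TARGET_C = 60  # required liquid temperature (assumed)
LIMIT_C = 70   # temperature regarded as hazardous (assumed)


def controller(temp_c: int) -> bool:
    """Candidate control law: heater on only while below the target."""
    return temp_c < TARGET_C


def must_heat(temp_c: int, heater_on: bool) -> bool:
    """Positive requirement: below target, the heater must be on."""
    return temp_c >= TARGET_C or heater_on


def must_not_overheat(temp_c: int, heater_on: bool) -> bool:
    """Negative requirement: at or above the limit, the heater must never be on."""
    return not (temp_c >= LIMIT_C and heater_on)


REQUIREMENTS = {"must_heat": must_heat, "must_not_overheat": must_not_overheat}


def violations(control, temps):
    """Check every modelled temperature against every requirement."""
    found = []
    for temp_c in temps:
        heater_on = control(temp_c)
        for name, requirement in REQUIREMENTS.items():
            if not requirement(temp_c, heater_on):
                found.append((name, temp_c))
    return found


if __name__ == "__main__":
    print(violations(controller, range(0, 101)))          # correct law: no violations
    print(violations(lambda t: True, range(0, 101))[:3])  # heater stuck on
```

A heater that is stuck on still satisfies the positive requirement but violates the negative one, which is why both kinds have to be stated.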
Specification
The supplier's instructions on how to build the system, derived from the required functionality (the requirements).
Requirements R + Domain knowledge D => Specification S
Where do we go wrong?
Many system failures are not failures to understand the requirements R; they are mistakes in the domain knowledge D.
- A New York City subway train crashed into the rear end of another train on 5 June. The motorman ran through a red light. The safety system did apply the emergency brakes. However, the ... signal spacing had been set in 1918, when trains were shorter, lighter and slower, and the emergency brake system could not stop the train in time.
Are you sure?
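The subway example can be sketched as a check of a specification (signal spacing) against domain knowledge (train speed and braking). This minimal sketch assumes simple constant-deceleration kinematics plus a fixed brake response time; the spacing, speeds, decelerations and function names are hypothetical and are not the actual figures from the accident.

```python
def stopping_distance(speed_mps: float, decel_mps2: float, response_s: float) -> float:
    """Distance covered during the brake response time plus braking distance v^2 / (2a)."""
    return speed_mps * response_s + speed_mps ** 2 / (2 * decel_mps2)


def spacing_is_safe(spacing_m, speed_mps, decel_mps2, response_s) -> bool:
    """The specification is only valid if the train can stop within one signal spacing."""
    return stopping_distance(speed_mps, decel_mps2, response_s) <= spacing_m


SPACING_M = 200.0  # hypothetical signal spacing fixed by the original design

if __name__ == "__main__":
    # Domain knowledge assumed at design time (hypothetical): slower, lighter trains.
    print(spacing_is_safe(SPACING_M, speed_mps=15.0, decel_mps2=1.0, response_s=2.0))
    # Domain knowledge today (hypothetical): faster, heavier trains brake less hard.
    print(spacing_is_safe(SPACING_M, speed_mps=20.0, decel_mps2=0.5, response_s=2.0))
```

The requirement ("stop before the train ahead") never changed; only D changed (142.5 m versus 440 m stopping distance in this illustration), and the unchanged specification silently became unsafe.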
Requirement Engineering: Right Requirements
Ways to refine requirements:
- complete: linking to hazards (possible dangerous events)
- correct: testing and modelling
- consistent: semi-formal/formal language
- unambiguous: text in clear, natural English
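Linking requirements to hazards makes some completeness and consistency checks mechanical. The minimal sketch below, using a hypothetical hazard log and requirement set (identifiers and texts are invented for illustration), reports hazards that no requirement mitigates, requirements with no hazard link, and links to hazards that do not exist.

```python
# Hypothetical hazard log and requirement set, for illustration only.
HAZARDS = {
    "H1": "Train passes a signal at danger",
    "H2": "Points move under a train",
    "H3": "Level crossing open while a train approaches",
}

REQUIREMENTS = {
    "R1": {"text": "The system shall apply the emergency brakes when a train passes a signal at danger.",
           "hazards": {"H1"}},
    "R2": {"text": "The system shall not move points while the track section is occupied.",
           "hazards": {"H2"}},
    "R3": {"text": "The system shall log all operator commands.",
           "hazards": set()},
}


def uncovered_hazards(hazards, requirements):
    """Hazards that no requirement mitigates: a sign of an incomplete requirement set."""
    covered = set().union(*(r["hazards"] for r in requirements.values()))
    return sorted(set(hazards) - covered)


def unlinked_requirements(requirements):
    """Requirements with no hazard link: either not safety-related or missing a trace."""
    return sorted(rid for rid, r in requirements.items() if not r["hazards"])


def dangling_links(hazards, requirements):
    """Links to hazards that are not in the hazard log: an inconsistency."""
    return sorted((rid, h) for rid, r in requirements.items()
                  for h in r["hazards"] if h not in hazards)


if __name__ == "__main__":
    print(uncovered_hazards(HAZARDS, REQUIREMENTS))
    print(unlinked_requirements(REQUIREMENTS))
    print(dangling_links(HAZARDS, REQUIREMENTS))
```

Here H3 is flagged as uncovered and R3 as unlinked; an unlinked requirement is not necessarily wrong, but it should be reviewed.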
Requirement Engineering Tools: DOORS (Telelogic)
- Database and configuration management
- History, traceability and linking
[Figure: Euro-Interlocking requirements process. Participants: railway domain experts, the Euro-Interlocking core team and consultants (KnowGravity). Activities and artefacts: furnish railway requirements, capture requirements, DOORS requirements database, requirements modelling, requirements simulation, requirements validation via simulation, project development process.]
Traceability in DOORS
[Figure: trace links between Requirement, Specification, Architectural Design and Test Plan.]
Follow customer amendments through all the documentation.
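Following an amendment through the documentation amounts to walking the trace links downstream from the changed item. The sketch below does this with a breadth-first search over a hypothetical link table; the artefact identifiers are invented, and this is not the DOORS API, only the idea behind a tool's impact analysis.

```python
from collections import deque

# Hypothetical trace links: each artefact lists the artefacts derived from it.
LINKS = {
    "REQ-1": ["SPEC-1", "SPEC-2"],
    "SPEC-1": ["ARCH-1", "TEST-1"],
    "SPEC-2": ["ARCH-1", "TEST-2"],
    "ARCH-1": ["TEST-3"],
}


def impacted_by(changed, links):
    """Return every artefact reachable from the changed one (breadth-first search)."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in links.get(node, []):
            if child not in seen:  # shared artefacts (e.g. ARCH-1) are visited once
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    print(impacted_by("REQ-1", LINKS))
    print(impacted_by("SPEC-2", LINKS))
```

A customer amendment to REQ-1 touches both specifications, the architecture and all three test items, which is exactly the set that must be re-reviewed.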
Traceability: Requirements from Scenarios
[Figure: goal hierarchy with user requirements traced to goals. Top goal: To have sailed and survived. Other goals shown: Ready to sail, Sailed, Returned home, Boat loaded, Boat lifted, Boat on car, Boat unloaded, Boat rigged, Mast rigged, Center-plate rigged, Rudder rigged, Boat manoeuvred, Tacked, Gibed, Cruised, Boat capsized, Boat righted, Coast guard contacted, Gone ashore.]
User requirements:
- Two people shall be able to lift the boat onto the roof of the average saloon car.
- The sailor shall be able to contact the coastguard when the boat is capsized.
- The sailor shall be able to perform a tacking manoeuvre.
Risk Analysis
Risk is a combination of the severity (class) and the frequency (probability) of a hazardous event.
Risk analysis is the process of evaluating the probability of hazardous events.
The value of life?
The value of a life is estimated at between EUR 0.75M and EUR 2.5M; US figures are higher.
Risk Analysis
Severity classes:
- Catastrophic: multiple deaths (>10)
- Critical: a death or severe injuries
- Marginal: a severe injury
- Insignificant: a minor injury
Frequency categories (events/year):
- Frequent: 0.1
- Probable: 0.01
- Occasional: 0.001
- Remote: 0.0001
- Improbable: 0.00001
- Incredible: 0.000001
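Combining a frequency category with the number of fatalities per event gives the expected number of casualties per year (Potential Loss of Life, PLL), and multiplying by the quoted value-of-life range gives an expected yearly loss. This is a minimal worked sketch; the hazard (an "Occasional" event costing 10 lives) and the function names are hypothetical.

```python
# Frequency categories from the slide, in events per year.
FREQUENCY_PER_YEAR = {
    "Frequent": 1e-1,
    "Probable": 1e-2,
    "Occasional": 1e-3,
    "Remote": 1e-4,
    "Improbable": 1e-5,
    "Incredible": 1e-6,
}

# Estimated value of a life in EUR (range quoted on the Risk Analysis slide).
VALUE_OF_LIFE_EUR = (0.75e6, 2.5e6)


def potential_loss_of_life(category: str, fatalities_per_event: float) -> float:
    """PLL: expected fatalities per year = event frequency x fatalities per event."""
    return FREQUENCY_PER_YEAR[category] * fatalities_per_event


def expected_annual_cost_eur(category: str, fatalities_per_event: float) -> tuple:
    """Low and high estimate of the yearly loss, valuing each life with the quoted range."""
    pll = potential_loss_of_life(category, fatalities_per_event)
    low, high = VALUE_OF_LIFE_EUR
    return pll * low, pll * high


if __name__ == "__main__":
    # Hypothetical hazard: an "Occasional" event that would cost 10 lives.
    low, high = expected_annual_cost_eur("Occasional", 10)
    print(f"PLL = {potential_loss_of_life('Occasional', 10):.3g} per year")
    print(f"expected loss = {low:,.0f} to {high:,.0f} EUR per year")
```

For this hypothetical case PLL is 0.01 fatalities per year, i.e. an expected loss of roughly EUR 7,500 to 25,000 per year, a figure that can be weighed against the cost of a risk-reduction measure.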
Hazard Analysis
A hazard is a situation in which there is actual or potential danger to people or to the environment.
Analytical techniques:
- Failure modes and effects analysis (FMEA)
- Failure modes, effects and criticality analysis (FMECA)
- Hazard and operability studies (HAZOP)
- Event tree analysis (ETA)
- Fault tree analysis (FTA)
Fault Tree Analysis 1
The diagram shows a heater controller for a tank of toxic liquid. The computer controls the heater using a power switch on the basis of information obtained from a temperature sensor. The sensor is connected to the computer via an electronic interface that supplies a binary signal indicating when the liquid is up to its required temperature. The top event of the fault tree is the liquid being heated above its required temperature.
Fault tree symbols:
- Fault event not fully traced to its source
- Basic event (input)
- Fault event resulting from other events
- OR connection
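Once a fault tree is drawn, the probability of the top event can be computed from the basic events. The minimal sketch below assumes the basic events are independent and share the same exposure period; the decomposition of the heater example into four basic events and all probabilities are hypothetical. An AND gate is included for completeness even though the legend above shows only the OR connection.

```python
from math import prod


def p_or(*probs: float) -> float:
    """OR gate: at least one of the independent input events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def p_and(*probs: float) -> float:
    """AND gate: all of the independent input events occur."""
    return prod(probs)


# Hypothetical basic-event probabilities (same exposure period for every event).
P_SWITCH_FAILS_CLOSED = 1e-4  # power switch stuck on
P_COMPUTER_FAILS = 1e-3       # computer does not switch the heater off
P_SENSOR_FAILS = 1e-3         # temperature sensor reads low
P_INTERFACE_FAILS = 1e-4      # interface never signals "up to temperature"

# Intermediate event: the computer never learns the liquid is hot enough.
p_no_temperature_signal = p_or(P_SENSOR_FAILS, P_INTERFACE_FAILS)

# Top event: liquid heated above its required temperature.
p_top = p_or(P_SWITCH_FAILS_CLOSED, P_COMPUTER_FAILS, p_no_temperature_signal)

if __name__ == "__main__":
    print(f"P(top event) = {p_top:.4e}")
```

With these numbers the top-event probability is just under the rare-event approximation (the plain sum, 2.2e-3), because the OR gate does not double-count simultaneous failures.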
Risk acceptability
The level of an acceptable loss is a national/international decision (ethical, political and economic).
Risk analysis evaluation:
- ALARP: as low as reasonably practicable (UK, USA). “Societal risk has to be examined when there is a possibility of a catastrophe involving a large number of casualties”
- GAMAB: Globalement Au Moins Aussi Bon, i.e. not greater than before (France). “All new systems must offer a level of risk globally at least as good as the one offered by any equivalent existing system”
- MEM: minimum endogenous mortality. “Hazard due to a new system would not significantly augment the figure of the minimum endogenous mortality for an individual”
Risk acceptability
Tolerable hazard rate (THR): a hazard rate which guarantees that the resulting risk does not exceed a target individual risk.
SIL 4: 10^-9 ≤ THR < 10^-8 per hour and per function
SIL 3: 10^-8 ≤ THR < 10^-7
SIL 2: 10^-7 ≤ THR < 10^-6
SIL 1: 10^-6 ≤ THR < 10^-5
Potential Loss of Life (PLL): expected number of casualties per year.
SIL = safety integrity level
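Allocating a SIL to a safety function is a lookup from its tolerable hazard rate. The sketch below uses the THR bands given in CENELEC EN 50129 (10^-9 to 10^-5 per hour and per function, half-open intervals); the function name is hypothetical, and returning `None` for rates outside the bands is a design choice of this sketch, not something the standard prescribes.

```python
from typing import Optional

# (SIL, lower bound inclusive, upper bound exclusive), THR per hour and per function.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_thr(thr_per_hour: float) -> Optional[int]:
    """Return the SIL whose band contains the THR, or None if it lies outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= thr_per_hour < high:
            return sil
    return None


if __name__ == "__main__":
    for thr in (5e-9, 3e-7, 1e-6, 1e-4):
        print(thr, sil_for_thr(thr))
```

Note that the bands are half-open, so a THR of exactly 10^-6 falls into SIL 1, not SIL 2.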
Additional home assignments
From Neil Storey's book Safety-Critical Computer Systems:
- 1.12 (Please define primary, functional and indirect safety.)
- 2.4 (Please define unavailability.)
Return by 14 February to