Safety-Critical Systems 2 T 79.232 Risk analysis and design for safety Ilkka Herttua.

V-Lifecycle model
Specification and design (left side, top down): Requirements Analysis, Systems Analysis & Design, Software Design, Software Implementation & Unit Test.
Integration and test (right side, bottom up): Module Integration & Test, System Integration & Test, System Acceptance.
Work products: Requirements Document, Requirements Model, Test Scenarios, Functional/Architectural Model, Specification Document.
Knowledge base (configuration-controlled knowledge that increases in understanding until completion of the system):
- Requirements Documentation
- Requirements Traceability
- Model Data/Parameters
- Test Definition/Vectors
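
The knowledge base lists requirements traceability as a configuration-controlled item. As a minimal sketch, assuming requirements and tests are identified by plain string IDs (the `TraceMatrix` class and all IDs below are hypothetical, not taken from any named tool), a trace matrix can flag requirements that no test covers and tests that cite requirements missing from the baseline:

```python
from dataclasses import dataclass, field


@dataclass
class TraceMatrix:
    """Hypothetical requirements-to-test trace matrix (illustration only)."""

    requirements: set[str]
    # test ID -> set of requirement IDs that the test claims to verify
    links: dict[str, set[str]] = field(default_factory=dict)

    def link(self, test_id: str, *req_ids: str) -> None:
        """Record that test_id verifies the given requirements."""
        self.links.setdefault(test_id, set()).update(req_ids)

    def untested(self) -> set[str]:
        """Requirements not referenced by any test."""
        covered = set().union(*self.links.values())
        return self.requirements - covered

    def dangling(self) -> dict[str, set[str]]:
        """Tests that reference requirement IDs missing from the baseline."""
        return {
            test: refs - self.requirements
            for test, refs in self.links.items()
            if refs - self.requirements
        }


tm = TraceMatrix({"REQ-1", "REQ-2", "REQ-3"})
tm.link("T-1", "REQ-1")
tm.link("T-2", "REQ-2", "REQ-9")
print(sorted(tm.untested()))  # ['REQ-3']
print(tm.dangling())          # {'T-2': {'REQ-9'}}
```

Both checks matter on the right-hand side of the V: an untested requirement has no acceptance evidence, and a dangling link usually means the test was written against an outdated requirements baseline.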

Overall safety lifecycle

Risk Analysis
Risk is a combination of the severity (class) and the frequency (probability) of a hazardous event. Risk analysis is the process of evaluating the probability of hazardous events.
The value of life? The value of a life is estimated at between 0.75M and 2M GBP; US figures are higher.
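
To show how frequency and severity combine, here is a simple expected-value sketch that prices a hazard using the value-of-life range above. The hazard list and per-event fatality counts are hypothetical, and treating risk as a plain product of rate and consequence is a simplifying assumption:

```python
# Value-of-life range quoted on the slide (GBP); US figures would be higher.
VALUE_OF_LIFE_GBP = (0.75e6, 2.0e6)


def expected_fatalities_per_year(hazards):
    """hazards: iterable of (events_per_year, fatalities_per_event) pairs."""
    return sum(rate * deaths for rate, deaths in hazards)


def monetised_risk_gbp(hazards, value_of_life=VALUE_OF_LIFE_GBP):
    """Return (low, high) expected annual loss in GBP for the given hazards."""
    fatalities = expected_fatalities_per_year(hazards)
    low, high = value_of_life
    return fatalities * low, fatalities * high


# Hypothetical hazard: 0.001 events/year, 3 fatalities per event.
low, high = monetised_risk_gbp([(0.001, 3)])
print(f"GBP {low:,.0f} to {high:,.0f} per year")
```

For the hypothetical hazard this gives about GBP 2,250 to 6,000 per year, a figure that can be weighed against the cost of a risk-reduction measure.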

Risk Analysis
Classes:
- Catastrophic: multiple deaths (>10)
- Critical: a death or severe injuries
- Marginal: a severe injury
- Insignificant: a minor injury
Frequency categories (events/year):
- Frequent: 0.1
- Probable: 0.01
- Occasional: 0.001
- Remote: 0.0001
- Improbable: 0.00001
- Incredible: 0.000001
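
The classes and frequency categories above define the axes of a risk matrix. A minimal lookup sketch follows; it assumes (this is not stated on the slide) that each category covers the decade starting at its nominal rate, that Frequent also covers everything above 0.1/year, and that rates below 0.000001/year count as Incredible. The function names are hypothetical:

```python
# Frequency categories and nominal rates (events/year) from the slide.
FREQUENCY_CATEGORIES = [
    ("Frequent", 1e-1),
    ("Probable", 1e-2),
    ("Occasional", 1e-3),
    ("Remote", 1e-4),
    ("Improbable", 1e-5),
    ("Incredible", 1e-6),
]
SEVERITY_CLASSES = ("Insignificant", "Marginal", "Critical", "Catastrophic")


def frequency_category(events_per_year: float) -> str:
    """Map an estimated rate to a category.

    Assumption: each category covers the decade starting at its nominal
    value; Frequent also covers all higher rates, and anything below
    1e-6/year is treated as Incredible.
    """
    if events_per_year < 0:
        raise ValueError("rate must be non-negative")
    for name, nominal in FREQUENCY_CATEGORIES:
        if events_per_year >= nominal:
            return name
    return "Incredible"


def risk_cell(events_per_year: float, severity: str) -> tuple[str, str]:
    """Return the (frequency, severity) cell of a risk matrix for a hazard."""
    if severity not in SEVERITY_CLASSES:
        raise ValueError(f"unknown severity class: {severity!r}")
    return frequency_category(events_per_year), severity


print(risk_cell(0.005, "Critical"))  # ('Occasional', 'Critical')
```

The sketch stops at locating the cell; which cells count as intolerable or tolerable is a risk-acceptability decision and is deliberately left out.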

Hazard Analysis
A hazard is a situation in which there is actual or potential danger to people or to the environment.
Analytical techniques:
- Failure modes and effects analysis (FMEA)
- Failure modes, effects and criticality analysis (FMECA)
- Hazard and operability studies (HAZOP)
- Event tree analysis (ETA)
- Fault tree analysis (FTA)
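
Event tree analysis works forward from an initiating event through the barriers that should stop it, and each outcome's frequency is the initiating frequency multiplied by the branch probabilities along its path. A minimal sketch, assuming independent barriers with known probabilities of failure on demand (PFD); the function name and all numbers are hypothetical:

```python
from itertools import product


def event_tree(initiating_freq, barriers):
    """Enumerate event-tree sequences.

    initiating_freq: frequency of the initiating event (events/year).
    barriers: list of (name, probability_of_failure_on_demand), in the order
    they are challenged. Barriers are assumed independent.
    Returns a list of (path, frequency) pairs, one per branch combination.
    """
    sequences = []
    for failed_flags in product((False, True), repeat=len(barriers)):
        freq = initiating_freq
        path = []
        for (name, pfd), failed in zip(barriers, failed_flags):
            # Branch probability: pfd if the barrier fails, 1 - pfd if it works.
            freq *= pfd if failed else 1.0 - pfd
            path.append(f"{name} {'fails' if failed else 'works'}")
        sequences.append((tuple(path), freq))
    return sequences


# Hypothetical numbers: over-temperature 0.1/year, alarm PFD 0.01,
# operator response PFD 0.1.
for path, freq in event_tree(0.1, [("alarm", 0.01), ("operator", 0.1)]):
    print(path, f"{freq:.1e}")
```

The worst sequence (alarm fails, operator fails) occurs 0.1 x 0.01 x 0.1 = 1e-4 times per year, and the frequencies of all sequences sum back to the initiating frequency. A real event tree usually prunes branches that no longer matter once an earlier barrier has succeeded; this sketch enumerates every combination for simplicity.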

Fault Tree Analysis 1
The diagram shows a heater controller for a tank of toxic liquid. The computer controls the heater using a power switch on the basis of information obtained from a temperature sensor. The sensor is connected to the computer via an electronic interface that supplies a binary signal indicating when the liquid is up to its required temperature. The top event of the fault tree is the liquid being heated above its required temperature.

Fault tree symbols (from the figure):
- Fault event not fully traced to its source
- Basic event, input
- Fault event resulting from other events
- OR connection
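
A fault tree can be evaluated bottom-up: an OR gate occurs if at least one input occurs, an AND gate only if all inputs occur. The sketch below assumes independent basic events; the decomposition of the heater-controller top event and every probability in it are invented for illustration and are not the tree from the original figure:

```python
from dataclasses import dataclass
from math import prod


@dataclass
class BasicEvent:
    name: str
    probability: float


@dataclass
class Gate:
    kind: str      # "AND" or "OR"
    inputs: list   # BasicEvent or Gate instances


def probability(node) -> float:
    """Top-event probability, assuming all basic events are independent."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(child_ps)  # all inputs must occur
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in child_ps)  # at least one occurs
    raise ValueError(f"unknown gate type: {node.kind!r}")


# Hypothetical decomposition of "liquid heated above required temperature".
top = Gate("OR", [
    BasicEvent("temperature sensor fails", 1e-3),
    BasicEvent("interface fails", 1e-4),
    Gate("OR", [
        BasicEvent("computer hardware fault", 1e-4),
        BasicEvent("software fault", 1e-3),
    ]),
    BasicEvent("power switch stuck on", 1e-4),
])
print(f"P(top event) = {probability(top):.3e}")
```

With rare basic events the OR result lands just below the simple sum of the inputs (here slightly under 2.3e-3), which is why the sum is often used as a quick upper-bound approximation.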

Risk acceptability
A national/international decision on the level of acceptable loss (ethical, political and economic).
Risk analysis evaluation:
- ALARP, as low as reasonably practicable (UK, USA): "Societal risk has to be examined when there is a possibility of a catastrophe involving a large number of casualties."
- GAMAB, Globalement Au Moins Aussi Bon, "globally at least as good", i.e. risk not greater than before (France): "All new systems must offer a level of risk globally at least as good as the one offered by any equivalent existing system."
- MEM, minimum endogenous mortality: "Hazard due to a new system would not significantly augment the figure of the minimum endogenous mortality for an individual."
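
The three principles can be read as simple acceptance checks. The sketch below is hypothetical: all thresholds are inputs, because the slide gives no numbers. The example values 1e-3 and 1e-6 per year are only of the order often cited in UK HSE guidance for individual risk, and the MEM figure and allowed fraction must come from the applicable standard:

```python
def alarp_region(individual_risk, intolerable_above, broadly_acceptable_below):
    """Place an individual risk (per year) into the three ALARP regions."""
    if not broadly_acceptable_below < intolerable_above:
        raise ValueError("thresholds are in the wrong order")
    if individual_risk > intolerable_above:
        return "intolerable"
    if individual_risk < broadly_acceptable_below:
        return "broadly acceptable"
    # Must be reduced further unless the cost is grossly disproportionate.
    return "tolerable only if ALARP"


def gamab_ok(new_system_risk, existing_system_risk):
    """GAMAB: the new system must be globally at least as good as the existing one."""
    return new_system_risk <= existing_system_risk


def mem_ok(added_individual_risk, mem, allowed_fraction):
    """MEM: the added risk must stay within a small fraction of the MEM figure.

    Both mem and allowed_fraction are caller-supplied assumptions.
    """
    return added_individual_risk <= allowed_fraction * mem


# Illustrative thresholds only.
print(alarp_region(1e-4, intolerable_above=1e-3, broadly_acceptable_below=1e-6))
```

Note the different reference points: ALARP compares against fixed bands, GAMAB against an existing system, and MEM against the background mortality of an individual.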

Risk acceptability
Tolerable hazard rate (THR): a hazard rate which guarantees that the resulting risk does not exceed a target individual risk. THR per hour and per function:
- SIL 4: $10^{-9} \le \mathrm{THR} < 10^{-8}$
- SIL 3: $10^{-8} \le \mathrm{THR} < 10^{-7}$
- SIL 2: $10^{-7} \le \mathrm{THR} < 10^{-6}$
- SIL 1: $10^{-6} \le \mathrm{THR} < 10^{-5}$
Potential Loss of Life (PLL): the expected number of casualties per year.
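
Allocating a SIL from a THR is a band lookup. This sketch assumes the decade bands commonly tabulated in EN 50129 (SIL 4 from 1e-9 to 1e-8, SIL 3 from 1e-8 to 1e-7, SIL 2 from 1e-7 to 1e-6, SIL 1 from 1e-6 to 1e-5, per hour and per function, lower bound inclusive); the function name is hypothetical:

```python
# Assumed THR bands (per hour and per function), lower bound inclusive.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_thr(thr_per_hour: float):
    """Return the SIL whose THR band contains thr_per_hour, or None if outside."""
    for sil, low, high in SIL_BANDS:
        if low <= thr_per_hour < high:
            return sil
    return None  # outside the tabulated range: needs separate justification


print(sil_for_thr(5e-8))  # 3
```

Returning `None` rather than clamping is a deliberate choice: a THR below 1e-9/h cannot be claimed through SIL alone, and one above 1e-5/h falls outside the table.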

Current situation / critical systems
Based on data on recent failures of critical systems, the following can be concluded:
a) Failures are increasingly distributed and often nationwide (e.g. commercial systems such as denial of credit card authorisation).
b) The source of failure lies less often in hardware (physical faults) and more often in system design or in end-user operation/interaction (software).
c) The harm caused by failures is mostly economic, but health and safety concerns are sometimes also involved.
d) Failures can impact many different aspects of dependability (dependability = the ability to deliver service that can justifiably be trusted).

Examples of computer failures in critical systems

Driving force: federation
Safety-related systems have traditionally been based on the idea of federation: a failure of any piece of equipment should be confined and should not cause the collapse of the entire system. When computers were introduced into safety-critical systems, the principle of federation was in most cases kept in force. Applying federation means that the Boeing 757/767 flight management control system has 80 distinct microprocessors (300 if redundancy is taken into account). Although this number of microprocessors is no longer prohibitively expensive, the principle of federation causes other problems.

Designing for Safety
Fault groups:
- requirement/specification errors
- random component failures
- systematic faults in design (software)
Approaches to tackle the problems:
- right system architecture (fault-tolerant)
- reliability engineering (component, system)
- quality management (design and production processes)

Designing for Safety
Hierarchical design:
- simple modules, encapsulated functionality
- separate safety kernel for safety-critical functions
Maintainability:
- preventive versus corrective maintenance
- scheduled maintenance routines for the whole lifecycle
- faults easy to find and repair: short MTTR (mean time to repair)
Human error:
- proper HMI (human-machine interface)
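
The benefit of a short MTTR can be quantified with the standard steady-state availability formula A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures. A minimal sketch, assuming steady-state operation and a hypothetical unit with an MTBF of 10,000 hours:

```python
HOURS_PER_YEAR = 8760.0  # 365 days x 24 h


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_h <= 0 or mttr_h < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_h / (mtbf_h + mttr_h)


def downtime_hours_per_year(mtbf_h: float, mttr_h: float) -> float:
    """Expected unavailable hours per year at steady state."""
    return (1.0 - availability(mtbf_h, mttr_h)) * HOURS_PER_YEAR


# Hypothetical unit with MTBF 10,000 h: compare slow and fast repair.
for mttr in (24.0, 2.0):
    print(f"MTTR {mttr:>4} h -> downtime {downtime_hours_per_year(10_000, mttr):.1f} h/year")
```

Cutting MTTR from 24 h to 2 h reduces expected downtime from roughly 21 h to under 2 h per year without touching the hardware's failure rate, which is the argument for designing in easy fault finding and repair.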

Hardware Faults
Intermittent faults:
- fault occurs and recurs over time (e.g. loose connector)
Transient faults:
- fault occurs and may not recur (e.g. lightning)
- electromagnetic interference
Permanent faults:
- fault persists, e.g. physical processor failure (design fault: over-current)

Fault Tolerance
Fault-tolerant hardware:
- achieved mainly by redundancy
Redundancy:
- adds cost, weight, power consumption and complexity
Other means:
- improved maintenance
- a single system built with better materials (higher MTBF, mean time between failures)

Redundancy types
Active redundancy:
- redundant units are always operating
Dynamic redundancy (standby):
- failure has to be detected
- changeover to another module

Hardware redundancy techniques
Active techniques:
- parallel (k of N)
- voting (majority/simple)
Standby:
- operating: hot standby
- non-operating: cold standby
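
For active k-of-N redundancy with identical, independent units, system reliability is the binomial probability that at least k units survive. The sketch below assumes a perfect voter and hypothetical function names; the voter raises an error when no strict majority exists, so that a caller could drive the system to a safe state:

```python
from collections import Counter
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent units work.

    r is the reliability of a single unit over the mission time. The voter /
    switch is assumed to be perfect.
    """
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def majority_vote(outputs):
    """Return the value reported by a strict majority of channels.

    Raises ValueError when no majority exists, so the caller can drive the
    system to a safe state instead of guessing.
    """
    if not outputs:
        raise ValueError("no channel outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count > len(outputs):
        return value
    raise ValueError("no majority among redundant channels")


r = 0.9
print(k_of_n_reliability(1, 1, r))  # single unit
print(k_of_n_reliability(2, 3, r))  # triple modular redundancy (2-out-of-3)
print(majority_vote([42, 42, 17]))  # 42
```

For 2-out-of-3 voting the formula reduces to 3r^2 - 2r^3, giving 0.972 against 0.9 for a single unit. It only beats a single unit while r > 0.5, so voting cannot compensate for unreliable channels, and common-mode failures break the independence assumption entirely.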

Reliability prediction
Electronic components:
- based on probability and statistics
- MIL-HDBK-217: experimental data on actual device behaviour
- manufacturer information and allocated circuit types
- bathtub curve: burn-in, useful life, wear-out
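
In the flat useful-life part of the bathtub curve the failure rate is roughly constant, so reliability is R(t) = exp(-lambda t) and MTBF = 1/lambda. The sketch follows the spirit of a parts-count prediction for a series system but omits the quality and environment factors that MIL-HDBK-217 applies; the part list and rates are hypothetical:

```python
import math


def system_failure_rate(parts):
    """Parts-count style estimate for a series system.

    parts: iterable of (quantity, failure_rate_per_hour). In a series system
    any part failure fails the system, so constant failure rates add.
    """
    return sum(qty * lam for qty, lam in parts)


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), valid only in the flat useful-life region."""
    return math.exp(-failure_rate_per_hour * hours)


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """With a constant failure rate, MTBF = 1 / lambda."""
    return 1.0 / failure_rate_per_hour


# Hypothetical board: 10 parts at 5e-8/h and 2 parts at 2e-7/h.
lam = system_failure_rate([(10, 5e-8), (2, 2e-7)])
print(f"lambda = {lam:.1e}/h, MTBF = {mtbf_hours(lam):,.0f} h, "
      f"R(1 year) = {reliability(lam, 8760):.4f}")
```

The hypothetical board comes out at 9e-7 failures per hour, an MTBF of roughly 1.1 million hours and a one-year reliability of about 0.992. Burn-in and wear-out behaviour are not captured by a constant rate.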

Safety-Critical Hardware
Fault detection:
- routines to check that the hardware works
- signal comparisons
- information redundancy: parity checks, etc.
- watchdog timers
- bus monitoring: check that the processor is alive
- power monitoring
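
Two of these mechanisms are easy to model in software. The sketch below shows even parity (information redundancy) and a watchdog that expires unless it is kicked within its timeout. It is an illustration under stated assumptions: a real watchdog is an independent hardware timer that resets the processor, whereas this hypothetical `Watchdog` class only reports expiry, and its clock is injectable so it can be tested:

```python
import time


def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1 bits (word + parity) even."""
    if word < 0:
        raise ValueError("word must be non-negative")
    return bin(word).count("1") % 2


def parity_ok(word: int, parity_bit: int) -> bool:
    """Detects any odd number of flipped bits (in particular single-bit errors)."""
    return even_parity_bit(word) == parity_bit


class Watchdog:
    """Software model of a watchdog timer (reports expiry only)."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_kick = clock()

    def kick(self) -> None:
        """Called periodically by healthy software to show it is alive."""
        self._last_kick = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_kick > self.timeout_s


stored = 0b1011_0010
p = even_parity_bit(stored)
print(parity_ok(stored, p))            # True
print(parity_ok(stored ^ 0b100, p))    # False: one bit flipped
```

Parity misses an even number of flipped bits, which is why stronger codes (checksums, CRCs, ECC) are used where multi-bit errors are credible.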

Safety-Critical Hardware
Possible hardware: COTS microprocessors
- no safety firmware, least assurance
- redundancy improves matters, but common-mode failures are possible
- fabrication failures, microcode and documentation errors
- use components that have a service history and statistics

Safety-Critical Hardware
Special microprocessors:
- Collins Avionics/Rockwell AAMP2
- used in Boeing aircraft (30+ pieces)
- high cost: bench testing, documentation, formal verification
- other models: SparcV7, TSC695E, ERC32 (ESA radiation-tolerant), 68HC908GP32 (airbag)

Safety-Critical Hardware
Programmable logic controllers (PLCs):
- contain a power supply, interfaces and one or more processors
- designed for high MTBFs
- firmware; program stored in EEPROMs
- programmed with ladder or function block diagrams

Safety management
Safety culture/policy of the organisation:
- task for management (targets)
Safety planning:
- task for the safety manager (how to)
Safety reporting:
- all personnel
- safety log / validation reports

Home assignments: 4.18 (tolerable risk) and 5.10 (incompleteness within specification), before 2 March to