Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua.


1 Safety-Critical Systems 3 Hardware/Software T 79.232 Ilkka Herttua

2 Current situation / critical systems
Based on data about recent failures of critical systems, the following can be concluded:
a) Failures are increasingly distributed and often nation-wide (e.g. commercial systems such as credit-card authorisation denial).
b) The source of failure lies more rarely in hardware (physical faults) and more frequently in system design or end-user operation/interaction (software).
c) The harm caused by failures is mostly economic, but health and safety concerns are sometimes also involved.
d) Failures can impact many different aspects of dependability (dependability = the ability to deliver service that can justifiably be trusted).

3 Examples of computer failures in critical systems

4 Driving force: federation
Safety-related systems have traditionally been based on the idea of federation: a failure of any single piece of equipment should be confined and should not cause the collapse of the entire system. When computers were introduced into safety-critical systems, the principle of federation was in most cases kept in force. Applying federation means, for example, that the Boeing 757/767 flight management control system has 80 distinct microprocessors (300, if redundancy is taken into account). Although this number of microprocessors is no longer prohibitively expensive, the principle of federation causes other problems.

5 Hardware Faults
Intermittent faults - the fault occurs and recurs over time (e.g. a loose connector)
Transient faults - the fault occurs and may not recur (e.g. lightning, electromagnetic interference)
Permanent faults - the fault persists / physical processor failure (e.g. a design fault such as overcurrent)

6 Fault Tolerance
Fault-tolerant hardware is achieved mainly by redundancy.
Redundancy adds cost, weight, power consumption and complexity.
Other means: improved maintenance, or a single system built from better materials (higher MTBF).

7 Redundancy types
Active redundancy: redundant units are always operating.
Dynamic redundancy (standby): the failure has to be detected, followed by a changeover to the other module.

8 Hardware redundancy techniques
Active techniques: parallel (k of N), voting (majority/simple) - see the voter sketch below
Standby techniques: operating (hot standby), non-operating (cold standby)
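
To make majority voting concrete, here is a minimal 2-out-of-3 voter sketch in C; the function name and the fail-safe fallback value are illustrative assumptions, not from the course material:

```c
/* Minimal 2-out-of-3 majority voter over three redundant channels.
 * Returns the value agreed on by at least two channels, or a
 * designated safe value when all three disagree. */
int vote_2oo3(int a, int b, int c, int safe_value)
{
    if (a == b || a == c)
        return a;          /* a agrees with at least one other channel */
    if (b == c)
        return b;          /* a is the odd one out */
    return safe_value;     /* total disagreement: fall back to fail-safe */
}
```

Note that the voter itself becomes a single point of failure unless it too is replicated or implemented in simple, well-verified hardware.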

9 Reliability prediction
Electronic components:
- Based on probability and statistics
- MIL-HDBK-217 - empirical data on actual device behaviour
- Manufacturer information and the allocated circuit types
- Bathtub curve: burn-in - useful life - wear-out

10 Reliability calculation for a system
MTTF (mean time to failure) - the average time for which the system operates before the first failure
MTTR (mean time to repair) - the time needed to get the system back in service again
MTBF (mean time between failures): MTBF = MTTF + MTTR
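
A worked instance of this relationship, with purely illustrative figures (the numbers are assumptions, not course data):

```latex
\mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR},
\qquad \text{e.g.}\quad
\mathrm{MTTF} = 2000\,\text{h},\;
\mathrm{MTTR} = 4\,\text{h}
\;\Rightarrow\;
\mathrm{MTBF} = 2004\,\text{h}.
```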

11 Safety-Critical Hardware
Fault detection:
- Routines to check that the hardware works
- Signal comparisons
- Information redundancy - parity checks etc. (see the sketch below)
- Watchdog timers
- Bus monitoring - check that the processor is alive
- Power monitoring
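
As an illustration of information redundancy, a minimal even-parity check in C; the function names are illustrative assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

/* Even-parity scheme: the sender picks the parity bit so that the data
 * bits plus the parity bit contain an even number of 1s; the receiver
 * recomputes it, detecting any single-bit corruption. */
static bool odd_ones(uint8_t byte)
{
    bool odd = false;
    while (byte) {
        odd = !odd;                  /* toggle for every 1-bit */
        byte &= (uint8_t)(byte - 1); /* clear the lowest set bit */
    }
    return odd;
}

/* Parity bit transmitted alongside the data byte. */
bool parity_bit(uint8_t data)
{
    return odd_ones(data);
}

/* Receiver-side check: the frame passes when the recomputed parity
 * matches the received parity bit. */
bool frame_is_intact(uint8_t data, bool received_parity)
{
    return odd_ones(data) == received_parity;
}
```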

12 Safety-Critical Hardware
Possible hardware: COTS microprocessors
- No safety firmware, least assurance
- Redundancy improves matters, but common-mode failures remain possible
- Fabrication failures, microcode and documentation errors
- Use components that have a history and statistics

13 Safety-Critical Hardware
Specialist microprocessors:
- Collins Avionics/Rockwell AAMP2
- Used in the Boeing 747-400 (30+ units)
- High cost - bench testing, documentation, formal verification
- Other models: SPARC V7, TSC695E, ERC32 (ESA radiation-tolerant), 68HC908GP32 (airbag)

14 Safety-Critical Hardware
Programmable logic controllers (PLCs)
- Contain a power supply, interfaces and one or more processors
- Designed for high MTBFs
- Firmware program stored in EEPROMs
- Programmed with ladder or function block diagrams (a scan-cycle sketch follows below)
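
The cyclic scan such a controller runs can be sketched as follows; the I/O and watchdog routines are hypothetical stand-ins, not a real PLC API:

```c
#include <stdint.h>

extern uint16_t read_inputs(void);      /* hypothetical: sample the input image */
extern void write_outputs(uint16_t o);  /* hypothetical: drive the output image */
extern void kick_watchdog(void);        /* hypothetical: reset the scan watchdog */

static uint16_t run_ladder_logic(uint16_t inputs)
{
    /* Stand-in for the ladder/function-block program: output bit 0
     * follows input bit 0 AND NOT input bit 1 (a classic interlock rung). */
    uint16_t out = 0;
    if ((inputs & 0x0001) && !(inputs & 0x0002))
        out |= 0x0001;
    return out;
}

void plc_main_loop(void)
{
    for (;;) {
        uint16_t in  = read_inputs();          /* 1: read the input image   */
        uint16_t out = run_ladder_logic(in);   /* 2: solve the logic        */
        write_outputs(out);                    /* 3: update the outputs     */
        kick_watchdog();                       /* 4: prove the scan finished */
    }
}
```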

15 Safety-Critical Software
Correct programs:
- Normally iteration is needed to develop a working solution (writing code, testing and modification).
- In a non-critical environment, code is accepted once the tests pass.
- Testing is not enough for a safety-critical application - an assessment process is needed: dynamic/static testing, simulation, code analysis and formal verification.

16 Safety-Critical Software
Dependable software:
- A defined development process
- Work discipline
- Well documented
- Quality management
- Validated and verified

17 Safety-Critical Software
Safety-critical programming languages:
- Logical soundness: an unambiguous definition of the language - no dialects, as with C++
- Simple definition: complexity can lead to errors in compilers or other support tools
- Expressive power: the language shall make it possible to express domain features efficiently and easily
- Security of definition: violations of the language definition shall be detected
- Verification: the language supports verification, i.e. proving that the produced code is consistent with the specification
- Memory/time constraints: stack, register and memory usage are controlled

18 Safety-Critical Software
Software faults:
- Requirements defects: failure of the software requirements to specify the environment in which the software will be used, or ambiguous requirements
- Design defects: not satisfying the requirements, or documentation defects
- Code defects: failure of the code to conform to the software design

19 Safety-Critical Software
Software faults:
- Subprogram side effects: the value of a variable may be changed by a called subprogram
- Aliasing: different names refer to the same storage location
- Initialisation failures: variables are used before being assigned values
- Memory management: buffer, stack and memory overflows
- Expression evaluation errors: divide-by-zero, arithmetic overflow (see the sketch below)
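
A small C sketch showing how the last fault class can be defended against, by checking both divide-by-zero and the one signed division that overflows; the helper name is an illustrative assumption:

```c
#include <limits.h>
#include <stdbool.h>

/* Defensive integer division: refuses the two cases where n / d is
 * undefined in C, instead of trapping or overflowing at run time. */
bool safe_div(int n, int d, int *result)
{
    if (d == 0)
        return false;               /* divide-by-zero */
    if (n == INT_MIN && d == -1)
        return false;               /* overflow: -INT_MIN exceeds INT_MAX */
    *result = n / d;
    return true;
}
```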

20 Safety-Critical Software
Language comparison:
- Structured assembler: prone to wild jumps and exhaustion of memory, but well understood
- Ada: protection against wild jumps, strong data typing, exception handling, separate compilation
- Subset languages: CORAL, SPADE and Ada subsets (e.g. the Alsys CSMART Ada kernel)
- Validated compilers are available for Pascal and Ada
- Available expertise: common languages bring higher productivity and fewer mistakes, but C is still not appropriate


22 Safety-Critical Software
Languages used:
- Boeing uses mostly Ada, but about 75 languages are still in use on the 747-400.
- ESA mandated Ada for mission-critical systems.
- The NASA space station is written in Ada, with some systems in C and assembler.
- Car ABS systems: assembler.
- Train control systems: Ada.
- Medical systems: Ada and assembler.
- Nuclear reactor core and shutdown systems: assembler, migrating to Ada.

23 Safety-Critical Software
Tools:
- Highly reliable, validated tools are required: faults in a tool can result in faults in the safety-critical software.
- Widespread tools are better tested.
- Use a confirmed process for the usage of the tool.
- Analyse the output of the tool: static analysis of the object code.
- Use alternative products and compare the results.
- Use different tools (diversity) to reduce the likelihood of wrong test results.

24 Safety-Critical Software
Design principles:
- Use hardware interlocks ahead of the computer/software.
- New software features add complexity; try to keep the software simple.
- Plan for avoiding human error - an unambiguous human-computer interface.
- Remove hazardous modules (cf. the unused code on Ariane 5).

25 Safety-Critical Software
Design principles:
- Add barriers: hardware/software locks for critical parts.
- Minimise single-point failures: increase safety margins, exploit redundancy and allow recovery.
- Isolate failures: don't let things get worse.
- Fail safe: panic shutdowns, watchdog code.
- Avoid common-mode failures: use diversity - different programmers, N-version programming.

26 Safety-Critical Software
Design principles:
- Fault tolerance: recovery blocks - if one module fails, execute an alternative module (see the sketch below).
- Don't rely on run-time systems.
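
A minimal C sketch of the recovery-block pattern; the primary, alternate and acceptance-test routines are hypothetical placeholders:

```c
#include <stdbool.h>

extern bool primary_module(double in, double *out);    /* preferred algorithm */
extern bool alternate_module(double in, double *out);  /* simpler, diverse fallback */
extern bool acceptance_test(double in, double out);    /* plausibility check on the result */

/* Recovery block: run the primary; if it fails or its result is
 * rejected by the acceptance test, execute the alternate instead. */
bool recovery_block(double in, double *out)
{
    if (primary_module(in, out) && acceptance_test(in, *out))
        return true;
    if (alternate_module(in, out) && acceptance_test(in, *out))
        return true;
    return false;   /* both variants failed: trigger fail-safe handling */
}
```

The acceptance test only checks plausibility, so it can be much simpler (and thus more trustworthy) than the modules it judges.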

27 Safety-Critical Software
Techniques/tools:
- Fault prevention: preventing the introduction or occurrence of faults by using design-support tools (e.g. UML with a CASE tool)
- Fault removal: testing, debugging and code modification

28 Safety-Critical Software
Software faults:
- Faults in software tools (development/modelling) can result in system faults.
- The techniques used for software development (language/design notation) can have a great impact on the performance of the people involved and also determine the likelihood of faults.
- The characteristics of the programming systems and their runtimes determine how great the impact of possible faults on the overall software subsystem can be.

29 Safety-Critical Software
Architectural design: layered structure
1 - High-level command and control functions
2 - Intermediate-level routines
3 - I/O routines and device drivers
(A small sketch of the layering follows below.)
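
One way to picture the layering is a C sketch in which each layer calls only the layer directly below it; all names and values are illustrative assumptions:

```c
#include <stdbool.h>

/* Layer 3: I/O routine / device driver */
static int read_sensor_register(void)
{
    return 42;                                /* stand-in for a hardware register read */
}

/* Layer 2: intermediate-level routine built on layer 3 */
static double read_temperature_celsius(void)
{
    return read_sensor_register() * 0.5;      /* convert raw counts to degrees */
}

/* Layer 1: high-level command and control function built on layer 2 */
bool overheat_shutdown_demanded(void)
{
    return read_temperature_celsius() > 80.0; /* threshold is illustrative */
}
```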

30 Safety-Critical Software
Architectural design:
- Design is done after partitioning the required functions between hardware and software.
- Complete specification of the architecture, with its components, data structures and interfaces (messages/protocols).

31 Safety-Critical Software
Architectural design:
- A test plan for each module (testability)
- Human-computer interface
- A change control system is needed for handling inconsistencies and inadequacies within the specification.
- Verification of the architectural design against the specification
- Software partitioning: modularity aids comprehension and isolation (fault limiting)

32 Safety-Critical Software
Reduction of hazardous conditions - summary:
- Simplify: the code contains only the minimum features and no unnecessary or undocumented features or unused executable code.
- Diversity: data and control redundancy.
- Multi-version programming: a shared specification can lead to common-mode failures, and the synchronisation code increases complexity.

33 Safety-Critical Software
Home assignments 3:
- 6.42 (fault-tolerant system)
- 7.15 (reliability model)
- 9.17 (reuse of software)
Please email to herttua@eurolock.org by 24 February 2004.

34 Home assignments 1 & 2
1.12 (primary, functional and indirect safety)
2.4 (unavailability)
3.23 (fault tree)
4.18 (tolerable risk)
5.10 (incompleteness within the specification)
Email before 24 February to herttua@eurolock.org
11 and 18 February: Case Studies / Teemu Tynjälä

