Safety in Automotive Systems

Class 10 Safety in Automotive Systems

Index
- Introduction
- Motivation
- Product Liability
- ISO 26262
- Technical Challenges
  - Robustness
  - Freedom from interference
  - Serial AND structure
- Safety Mechanism (examples)
  - EEPROM Access
  - Grey Channel
  - Multi-IO
  - Multi execution
  - Multi execution: cyclically writing of an output
  - Outlook on further measures

Introduction The automotive industry reacts sensitively to technical problems:
- Problem with accelerator pedal and other topics: Toyota recalls 8.5 million cars.
- Ignition switch: The switches of 2.6 million small cars have insufficient torque and are prone to slipping out of “run,” shutting off power steering, power brakes and airbags. GM has linked 54 crashes and 13 deaths to the defect.
- Loss of power steering (Mazda 3): 33 reports have been submitted to the transport safety agency. 3 accidents are known and are currently being investigated.

Motivation Example: Problem with sudden acceleration
Investigation of the SW code showed:
- 67 functions were rated as “untestable”: their cyclomatic complexity metric scored more than 50.
- The throttle angle function was rated as unmaintainable: its cyclomatic complexity metric scored more than 100.
- Only 11 of 93 MISRA-C:1998 coding rules were checked, and 5 of those were violated in the actual code.
- The intended functionality and all five fail-safe modes were implemented together in a single task.
- The watchdog was triggered by a timer-tick interrupt service routine (instead of monitoring the safety relevant tasks); a RESET was performed after 1.5 s due to CPU overload.
- A second CPU contains the ADC that digitizes the accelerator pedal position and communicates it to the main CPU -> this single ADC feeds both CPUs their vehicle state information -> Single Point Fault!
- Toyota claimed that 41% of the allocated stack was being used; the analysis showed it is about 94%!
- Global variables and faults like buffer overflows or unsafe casting were found in the code.
- Toyota paid 1 billion US dollars.

Motivation Example: Wrong unit conversion
1999: A disaster investigation board reports that NASA’s Mars Climate Orbiter burned up in the Martian atmosphere because engineers failed to convert units from English to metric.
The $125 million satellite was supposed to be the first weather observer on another world. But as it approached the red planet to slip into a stable orbit on Sept. 23, the orbiter vanished. Scientists realized quickly it was gone for good.
A NASA review board found that the problem was in the software controlling the orbiter’s thrusters. The software calculated the force the thrusters needed to exert in pounds of force. A separate piece of software took in the data assuming it was in the metric unit: newtons.
But: It was correctly specified in the contract between NASA and Lockheed!

Motivation Example: Maiden flight of the ARIANE 5
4 June 1996, Kourou, French Guiana
- Fault: typecast of a 64-bit float to a 16-bit integer, resulting in an overflow!
- The exception was suppressed explicitly! For Ariane 4 this was OK -> evidence of no overflow was given. NO check on this for Ariane 5!
- Reuse of SW routines from the former rocket -> the requirements were not reworked properly for the new rocket.
- Costs: 1.7 billion Deutsche Mark

Code example (Ada):

    declare
       vertical_veloc_sensor   : float;
       horizontal_veloc_sensor : float;
       vertical_veloc_bias     : integer;
       horizontal_veloc_bias   : integer;
       ...
    begin
       declare
          -- the overflow check is switched off for the horizontal bias
          pragma suppress(numeric_error, horizontal_veloc_bias);
       begin
          sensor_get(vertical_veloc_sensor);
          sensor_get(horizontal_veloc_sensor);
          vertical_veloc_bias   := integer(vertical_veloc_sensor);
          horizontal_veloc_bias := integer(horizontal_veloc_sensor);
       exception
          when numeric_error => calculate_vertical_veloc();
          when others        => use_irs1();
       end;
    end irs2;
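The conversion fault described above can be contrasted with a range-checked conversion. The following is a minimal Python sketch, not the flight software: the function name `to_int16_checked` and the saturating fallback are assumptions chosen for illustration. The point is that an out-of-range value is reported to the caller instead of being suppressed.

```python
from typing import Tuple

INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> Tuple[int, bool]:
    """Convert a float into the 16-bit signed integer range.

    Returns (result, ok). If the value does not fit, the result is
    saturated to the nearest limit and ok is False, so the caller has
    to react explicitly (hypothetical policy, for illustration only).
    """
    if value != value:  # NaN never compares equal to itself
        return 0, False
    if value > INT16_MAX:
        return INT16_MAX, False
    if value < INT16_MIN:
        return INT16_MIN, False
    return int(value), True
```

A caller would evaluate the `ok` flag and switch to a defined fault reaction when it is False.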

Product Liability Age of product liability law
Age of product liability law? About 4000 years (~1750 BC)
Codex Hammurabi (~1750 BC):
§229 If a builder build a house for some one, and does not construct it properly, and the house which he built fall in and kill its owner, then that builder shall be put to death.
§230 If it kill the son of the owner, the son of that builder shall be put to death.
§231 If it kill a slave of the owner, then he shall pay slave for slave to the owner of the house.
§232 If it ruin goods, he shall make compensation for all that has been ruined, and inasmuch as he did not construct properly this house which he built and it fell, he shall re-erect the house from his own means.
§233 If a builder build a house for some one, even though he has not yet completed it; if then the walls seem toppling, the builder must make the walls solid from his own means.
Principles: Personal Liability, Compensation, Rework

Product Liability

Product Liability Laws in Germany
Product Safety Law (ProdSG)
According to § 4, a product should only be placed on the market if it is constructed such that, under normal use or foreseeable misuse, the safety and health of users or any third party are not endangered.
Product Liability Law § 1 (ProdHaftG)
If a defect in a product causes a person's death, injury to body or health, or damage to an item, the manufacturer of the product is obliged to compensate the injured party for the resulting damage. (…)
BGB § 823 Liability for damages (Schadensersatzpflicht)
Anyone who intentionally or negligently unlawfully injures the life, body, health, liberty, property or other rights of others is obliged to compensate for the resulting damage.
What is the benchmark for proving that adequate measures have been implemented?
The benchmark is the current state of the art: “BMW Airbag Judgement”, BGH judgement (VI ZR 107/08)

Product Liability
Product liability law: -> This represents the interests of the consumer / society.
In case of a purely commercial damage:
- There is not necessarily an impact on the engineer.
- Money flows to the injured party (damage to the image of the company, expensive recall actions).
In case of endangered life and limb:
- In case of gross negligence: possible criminal consequences for the engineer.

Product Liability Gross negligence
Examples of possible gross negligence:
- Not following your defined activities according to your role.
- Signing a review report without a review being performed.
- Deliberately looking away instead of facing, communicating and escalating problems.
- Filling out a test report without performing the tests.
N.B.: Since humans are involved, errors can happen. As long as you followed the defined process with the necessary diligence (carefulness), this is not gross negligence!

ISO 26262 Functional Safety definition
ISO 26262-1 (Vocabulary): Absence of unreasonable risk due to hazards caused by malfunctioning behaviour of E/E systems.
E/E systems: electrical and/or electronic systems

ISO 26262 Scope
- Safety-relevant systems which include one or several E/E systems and are installed in production passenger cars (up to 3500 kg).
- Deals with possible risks emanating from the malfunction of E/E systems, which are caused by the respective E/E system itself.
- Commercial vehicles and motorcycles have not (yet) been included in the scope but have not been explicitly excluded either. (Will be included in 2nd edition 2018)

ISO 26262 Measurement for the risk potential of vehicle functions
ASIL = Automotive Safety Integrity Level
- 5-level scale (QM, A, B, C, D)
- QM means “standard Quality Assurance is sufficient” (oriented to application of ISO/TS 16949)
- From ASIL A onwards, additional risk reduction actions must be taken
- ASIL D describes the highest risk potential
- Each ASIL has requirements allocated to it
- The defined safety goals are the top-level safety requirements (on vehicle level!)

ISO 26262 Determination of risk potential
[Figure: determination of risk potential on vehicle level. Vertical axis: probability that damage caused by a faulty E/E function will occur (from "extremely improbable" to "always"). Horizontal axis: potential extent of damage caused by a faulty E/E function (from low to high). The risk acceptance limit separates the acceptable region (accepted residual risk) from the unacceptable region (unaccepted risk); safety measures according to ASIL A through ASIL D reduce the risk to below this limit.]

Contribution of the customer: Hazard Analysis and Risk Assessment
The safety classification of a function is based on its:
- Severity (S): Potential damage to persons, equipment and environment
- Exposure (E): Frequency or duration of situations in which an accident can happen
- Controllability (C): Possibility to control, avoid or reduce the damage in case of an accident

Contribution of the customer: Hazard Analysis and Risk Assessment
Risk Graph

H&R – Example for low beam
Malfunction: Low beam is switched OFF although ON is required.
Description of scenario: #930 Vehicle is driving at night on an unlit highway at high speed, low beam switch is in ON position and low beam is ON. Suddenly low beam fails.
Reason for scenario: Typical driving situation.
S: 3, E: 3, C: 2
Reason for S: Collision with delta-v > 40 km/h. Life-threatening injuries (survival uncertain) or fatal injuries cannot be excluded. More than 10% probability of AIS 5-6.
Reason for E: Driving at night = E3
Reason for C: 90% or more of all drivers are able to avoid harm, e.g. by braking.
ASIL: B
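The step from S, E and C to the ASIL can be sketched in a few lines. This is an illustrative sketch only: the function name `determine_asil` is an assumption, and the code uses the commonly quoted sum shortcut, which reproduces the ASIL determination table of ISO 26262-3 for the classes S1..S3, E1..E4 and C1..C3 (the classes S0, E0 and C0 are not handled). For the low beam example, the severity reasoning corresponds to S3, which together with E3 and C2 yields ASIL B.

```python
def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return the ASIL for classes S1..S3, E1..E4, C1..C3.

    Sum shortcut: S + E + C = 10 -> ASIL D, 9 -> ASIL C, 8 -> ASIL B,
    7 -> ASIL A, anything lower -> QM.
    """
    if not (1 <= severity <= 3):
        raise ValueError("severity must be 1..3")
    if not (1 <= exposure <= 4):
        raise ValueError("exposure must be 1..4")
    if not (1 <= controllability <= 3):
        raise ValueError("controllability must be 1..3")
    table = {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}
    return table.get(severity + exposure + controllability, "QM")


# Low beam example: S3, E3, C2
low_beam_asil = determine_asil(3, 3, 2)
```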

Output of the H&R: ASIL, Safety Goal, Safe State
- ASIL: Automotive Safety Integrity Level (QM, ASIL A, ASIL B, ASIL C, ASIL D)
- Safety Goal: the top-level safety requirement resulting from the hazard analysis and risk assessment. Example: The turn indicator shall only be active as long as requested by the driver.
- Safe State: an operating mode of an item without an unreasonable level of risk. Example: Turn indicator is OFF and the driver is aware that it is OFF.
- Fault Tolerant Time Interval (FTTI): the time span in which a fault or faults can be present in a system before a hazardous event occurs. Example: A faulty behaviour of an indicator is acceptable for 500 ms.

Fault Tolerant Time Interval (FTTI)
The FTTI is the time span on vehicle level in which a fault or faults can be present in a system before a hazardous event occurs:
- e.g. 50 ms for outage of the steering mechanism
- e.g. 300 ms for outage of brake lights
- e.g. 500 ms for unintended acceleration / low beam
- e.g. 10 minutes for outage of tire pressure monitoring
[Figure: timeline from "fault occurs" via "fault detected" and "fault reaction" to "safe state". The FTTI runs from the occurrence of the fault to the point where a hazard can occur and is divided into diagnostic test interval, fault reaction time and buffer time. If the fault reaction time is too long, a hazard is possible.]
A share of the FTTI for our scope of delivery shall be defined by the OEM!
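The relation between diagnostic test interval, fault reaction time and buffer time can be checked with simple arithmetic. The sketch below is an assumption-based illustration: the function name `ftti_buffer_ms` and the example numbers are hypothetical, and the worst case is taken to be a fault that occurs directly after a diagnostic test, so that it is detected one full test interval later.

```python
def ftti_buffer_ms(diagnostic_test_interval_ms: int,
                   fault_reaction_time_ms: int,
                   ftti_share_ms: int) -> int:
    """Return the remaining buffer time in ms.

    Worst case assumed: the fault occurs right after a diagnostic test,
    so detection takes one full test interval, followed by the reaction.
    A negative result means the safe state is reached too late.
    """
    worst_case_ms = diagnostic_test_interval_ms + fault_reaction_time_ms
    return ftti_share_ms - worst_case_ms


# Hypothetical budget: 500 ms share, 100 ms test interval, 150 ms reaction
buffer_ok = ftti_buffer_ms(100, 150, 500)
# Same share, but test interval and reaction are too slow
buffer_too_long = ftti_buffer_ms(300, 250, 500)
```

A positive result is the buffer time from the timeline; a negative result corresponds to the "fault reaction time too long" case.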

H&R – Example for low beam
Safety Goal: Ensure sufficient illumination of the road while driving if low beam is requested by the driver.
FTTI: 500 ms
Derivation of the Fault Tolerant Time Interval: Low beam has a range of 50 m in the middle of the road up to 100 m at the side of the road in case of asymmetric headlamps. When driving at 200 km/h (55 m/s), the 50 m light range corresponds to about 900 ms of travel. It is assumed that the driver cannot remember the course of the road perfectly for the whole distance.
Safe State: Low Beam is ON
Emergency operation: High Beam is ON (if activation of Low Beam is no longer possible)
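The 900 ms figure in the derivation follows from dividing the light range by the vehicle speed. A short sketch of that arithmetic, with the helper name `travel_time_ms` chosen here for illustration:

```python
def travel_time_ms(distance_m: float, speed_kmh: float) -> float:
    """Time in ms needed to cover distance_m at speed_kmh."""
    speed_m_per_s = speed_kmh / 3.6  # 200 km/h is about 55.6 m/s
    return distance_m / speed_m_per_s * 1000.0


# 50 m light range in the middle of the road at 200 km/h
illuminated_time_ms = travel_time_ms(50.0, 200.0)
```

The result is roughly 900 ms, so the chosen FTTI of 500 ms stays below the time the vehicle needs to cover the illuminated stretch.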

ISO 26262 The Spirit Engage your brain and keep it engaged!
Don’t just follow a process: start thinking.
Demonstrate your deep knowledge of the product!
- You know exactly what is going on in the product -> system knowledge
- You know about the abilities of the product
- You know about the constraints of the product
In other words:
-> You can tell which failures can occur and how you react to them.
-> You have a rationale documenting why you implemented which mechanism to detect and control failures.
Document your decisions:
- Write down what you decided and why.
- What is not documented has not been performed.

Technical Challenges The ISO Challenge:
Safety related software has to show robustness against potential influences from systematic errors and hardware faults.
Safety related software also has to show Freedom From Interference (FFI) against lower-ASIL and QM software.

Technical Challenges Robustness
Robustness: The ability of SW to fulfill the safety requirements under adverse conditions.
Robustness includes aspects like:
- influence on memory (RAM including stack, EEPROM, manipulation, overflow, …)
- influence on execution timing (available runtime, correct order, SW hangs)
- correct execution (ALU errors, bus errors, systematic errors)
- communication (between internal entities, stuck, manipulation, …)
- operation mode changes (bootstrap, sleep mode)
- responding properly to abnormal inputs and conditions
Robustness has to be considered even if the whole SW is developed according to the highest ASIL.

Technical Challenges What is “Freedom from Interference”?
When looking at a safety relevant ECU, initially all software has to be developed according to the highest ASIL!
If some parts of the SW are intended to be developed according to a lower ASIL or QM, the following steps have to be performed:
- Identify the safety relevant parts of the software.
- Show that the non-safety relevant software cannot affect the software that implements the safety requirements (Safety Strategy + Safety Mechanisms).
ISO 26262 gives a list of criteria to evaluate “Freedom from Interference”, which is taken into consideration in the SW Safety Concept.
EXAMPLE for FFI: Element 1 is free of interference from element 2 if failures of element 2 cannot cause element 1 to fail.
The criteria overlap to a great extent with the aspects for robustness, so that most measures will help to achieve FFI as well as support robustness.

Safety Mechanism Example: EEPROM Access
The challenge: Safety relevant parameters shall be stored in a non-volatile memory using a SWP module.
-> Example for writing a safety relevant value:
- EEProm as safety relevant SW: The application writes the value through the EEProm module (write / ACK, read + status). The EEProm module is safety relevant!
- EEProm as “Grey Channel” SW: The application writes the data twice (Write A, Write B), reads both copies back (Read A, Read B) and compares them. The EEProm module is NOT safety relevant!
Same data stored twice, inverted (A, B).
Data may be additionally secured by CRC, counter, inversion or other mechanisms depending on the required diagnosis.
[Figure: both variants side by side; safety relevant SW is marked in colour.]
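The grey channel variant above can be sketched as follows. This is a minimal illustration under stated assumptions: `GreyChannelEeprom` is a hypothetical stand-in for the non-safety-relevant EEPROM module, the addresses `ADDR_A` and `ADDR_B` and the 16-bit value width are invented for the example, and only the write-twice-inverted and compare-on-read logic in `write_safe` and `read_safe` represents the safety relevant part.

```python
from typing import Dict, Tuple


class GreyChannelEeprom:
    """Hypothetical stand-in for an EEPROM module that is NOT safety relevant."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}

    def write(self, address: int, value: int) -> None:
        self.cells[address] = value & 0xFFFF

    def read(self, address: int) -> int:
        return self.cells.get(address, 0)


ADDR_A = 0x10  # hypothetical address of copy A (plain value)
ADDR_B = 0x12  # hypothetical address of copy B (inverted value)


def write_safe(eeprom: GreyChannelEeprom, value: int) -> None:
    """Store the value twice: once plain (A) and once bit-inverted (B)."""
    eeprom.write(ADDR_A, value & 0xFFFF)
    eeprom.write(ADDR_B, ~value & 0xFFFF)


def read_safe(eeprom: GreyChannelEeprom) -> Tuple[int, bool]:
    """Read both copies and compare them.

    Returns (value, ok). ok is True only if B is the exact bitwise
    inverse of A; any corruption of one copy is detected.
    """
    copy_a = eeprom.read(ADDR_A)
    copy_b = eeprom.read(ADDR_B)
    ok = (copy_a ^ copy_b) == 0xFFFF
    return copy_a, ok
```

Because the comparison happens in the application, faults introduced anywhere inside the EEPROM module show up as a mismatch between A and B.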

Safety Mechanism Example: Grey Channel (CAN)
Grey channel: A “Grey Channel” is a channel which can introduce faults on a signal or information, but ALL of them can be detected and controlled!
[Figure: the ABS_Speed signal is transmitted via CAN. The speed information is safety relevant!]

Safety Mechanism Example: Grey Channel (CAN)
Example for a CAN message (fault modes taken from ISO 26262 Table D1), also known as E2E (end-to-end protection):
- Failure of communication peer -> timeout monitoring
- Message corruption -> CRC, here CAN-MSG_CRC
- Message delay -> timeout monitoring
- Message loss -> message counter, here CAN-MSG_CNT
- Unintended message repetition -> message counter, here CAN-MSG_CNT
- Resequencing -> message counter, here CAN-MSG_CNT
- Insertion of message -> message counter, here CAN-MSG_CNT
- Masquerading -> data CRC, here CRC_ABS_Speed
The CAN message “as is” (including CRCs, MSG_CNT) is routed through to the safety mechanism. In the safety mechanism the checks for timeout, corruption, … are performed.
-> No matter which error is introduced within the “grey channel”, it will be detected and controlled (use of substitute value, error reaction, …) by the corresponding safety mechanism.
This mechanism also works for the SW-internal exchange of information.
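The checks listed above can be sketched on the receiver side as follows. This is an illustrative sketch, not a specific E2E profile: the message layout (4-bit counter, 16-bit speed, 8-bit CRC), the CRC polynomial 0x1D, the value of `DATA_ID_ABS_SPEED` and all function and class names are assumptions. The data ID is included in the CRC calculation but not transmitted, so a message from another sender fails the CRC check (masquerading).

```python
from typing import Optional, Tuple

DATA_ID_ABS_SPEED = 0x2A  # hypothetical data ID, known to sender and receiver


def crc8(data: bytes, poly: int = 0x1D, init: int = 0xFF) -> int:
    """Bitwise CRC-8 (polynomial and start value are assumptions)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def protect(speed: int, counter: int, data_id: int = DATA_ID_ABS_SPEED) -> bytes:
    """Sender side: build counter + speed and append the CRC."""
    payload = bytes([counter & 0x0F, (speed >> 8) & 0xFF, speed & 0xFF])
    return payload + bytes([crc8(bytes([data_id]) + payload)])


class E2EReceiver:
    """Receiver-side checks performed inside the safety mechanism."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.last_counter: Optional[int] = None
        self.last_rx_ms: Optional[int] = None

    def timed_out(self, now_ms: int) -> bool:
        """Timeout monitoring: peer failure, message delay."""
        return self.last_rx_ms is None or now_ms - self.last_rx_ms > self.timeout_ms

    def check(self, message: bytes, now_ms: int) -> Tuple[str, Optional[int]]:
        # Corruption and masquerading are caught by the CRC over data ID + payload.
        if len(message) != 4 or crc8(bytes([DATA_ID_ABS_SPEED]) + message[:3]) != message[3]:
            return "CRC_ERROR", None
        counter = message[0] & 0x0F
        previous = self.last_counter
        self.last_counter = counter  # resynchronise on the received counter
        # Loss, repetition, resequencing and insertion break the counter sequence.
        if previous is not None and counter != (previous + 1) & 0x0F:
            return "COUNTER_ERROR", None
        self.last_rx_ms = now_ms
        return "OK", (message[1] << 8) | message[2]
```

On any status other than "OK", or when `timed_out` reports True, the safety mechanism would apply its error reaction, for example a substitute value.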

Safety Mechanism Example: Grey Channel (ADC)
Example for ADC data: Consider SW with an AUTOSAR architecture (MCAL, Basic SW, RTE, application SW). In the MCAL layer, redundancy (CRC, MSG CNT) is added to the plain ADC data:
-> This ADC “message” is routed to the safety mechanism without changes.
-> Debouncing and signal conditioning are done in the safety mechanism.
-> The ADC “message” is checked for CRC and MSG CNT in the safety mechanism.
N.B.: In this example the MCAL is safety relevant!

Safety Mechanism Example: Multi-IO
The two speed signals (ABS_Speed, ESP_Speed) come from two different sources (PWM and analog value). Inside the safety mechanism the two signals are compared by value. -> Keep timing effects in mind!
Different fault modes can be detected. Fault mode examples:
- A wrong ADC value (faulty V_Ref) can be detected by the comparison. -> A faulty V_Ref has no impact on PWM!
- An accidentally overwritten ADC value (from QM SW) can be detected by the comparison. -> An overwritten ADC value has no impact on PWM.
- If both modules (PWM and ADC) are stuck, this might not be detected!
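The comparison of the two speed signals can be sketched as a plausibility check with a tolerance. The function name `compare_speeds`, the tolerance of 2 km/h and the use of the mean value are assumptions for illustration; in practice the tolerance has to cover the timing offset between the two acquisitions.

```python
from typing import Optional, Tuple


def compare_speeds(abs_speed_kmh: float,
                   esp_speed_kmh: float,
                   tolerance_kmh: float = 2.0) -> Tuple[Optional[float], bool]:
    """Compare two independently acquired speed values.

    Returns (speed, plausible). If the values differ by more than the
    tolerance, no speed is returned and the caller has to react.
    Note: if both sources are stuck at the same value, the comparison
    still passes, so this fault is not detected here.
    """
    if abs(abs_speed_kmh - esp_speed_kmh) <= tolerance_kmh:
        return (abs_speed_kmh + esp_speed_kmh) / 2.0, True
    return None, False
```

For example, a faulty V_Ref that shifts only the ADC-based value produces a difference above the tolerance and is flagged as implausible.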

Safety Mechanism Example: Multi execution
There is a safety function to read in a safety relevant IO signal.
Multi execution: The function is called multiple times to detect transient faults or faults coming from different conditions of the environment.
By comparing the two results one can gain higher confidence in the result of the IO signal.
N.B.: Consider timing / expected change in signal / FTTI, and analyze the fault modes! Do not confuse this with debouncing a signal.
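Multi execution of a read function can be sketched as below. The helper name `read_twice` and the `max_delta` parameter are assumptions; `max_delta` stands for the signal change that is expected between the two calls and has to be derived from the timing analysis.

```python
from typing import Callable, Optional, Tuple


def read_twice(read_input: Callable[[], int],
               max_delta: int = 0) -> Tuple[Optional[int], bool]:
    """Call the read function twice and compare the results.

    Returns (value, ok). A transient fault that affects only one of the
    two executions shows up as a difference larger than max_delta.
    """
    first = read_input()
    second = read_input()
    if abs(first - second) <= max_delta:
        return second, True
    return None, False
```

Usage: `read_twice(read_pedal_position, max_delta=2)`, where `read_pedal_position` is a hypothetical driver function.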

Safety Mechanism Example: Multi execution – cyclically writing of an output
There is a safety function writing a safety relevant IO signal. The signal could be influenced / overwritten by faulty tasks / functionalities.
Cyclically* writing the IO signal increases the confidence of having the right signal on the IO.
* N.B.: The cycle time is closely related to the corresponding FTTI! Possibly an LP (low pass) filter, or a comparable measure, could be necessary as well!
Multi execution can also be used to detect and / or to control other internal faults, e.g. transient ALU errors.

Safety Mechanism Outlook on further measures
Commonly used measures for SW robustness and freedom from interference, as examples:
- ECC --> bit flips in RAM, ROM
- Double redundant (inverse) storage --> FFI, lost pointer
- Task monitoring --> livelocks, deadlocks
- Program flow monitoring --> livelocks, deadlocks, errors in execution order
- Check input variables for expected values, e.g. if a month is an input and is coded 1 -> January, 2 -> February, there is no month = 13!
Some of the mentioned measures have the potential for being improved in terms of safety. As an example, double redundant storage can be improved by storing the value and its one’s complement.
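Two of the measures above, program flow monitoring and the check of input variables, can be sketched as follows. The checkpoint names, the class `ProgramFlowMonitor` and the function `is_valid_month` are hypothetical and only illustrate the principle.

```python
from typing import List, Tuple


class ProgramFlowMonitor:
    """Checks that the checkpoints of one cycle were passed in order."""

    # Hypothetical checkpoint sequence of a safety relevant task
    EXPECTED: Tuple[str, ...] = ("READ_INPUT", "COMPUTE", "WRITE_OUTPUT")

    def __init__(self) -> None:
        self.seen: List[str] = []

    def checkpoint(self, name: str) -> None:
        self.seen.append(name)

    def end_of_cycle(self) -> bool:
        """Return True if exactly the expected sequence was executed."""
        ok = tuple(self.seen) == self.EXPECTED
        self.seen = []  # start the next cycle with an empty record
        return ok


def is_valid_month(month: int) -> bool:
    """Input check: months are coded 1 (January) to 12 (December)."""
    return 1 <= month <= 12
```

A skipped, repeated or reordered checkpoint makes `end_of_cycle` return False, which would then trigger the defined error reaction.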
