Bus Architectures for Safety-Critical Embedded Systems, by Harit Desai
Introduction
Safety-critical systems are federated.
–Each function has its own fault-tolerant embedded control system, with only minor interconnections to the others.
–This provides a strong barrier to fault propagation.
–The federated approach is expensive because of the replication it requires.
Buses
Time-triggered buses
–All activities are driven by the passage of time.
–The system interacts with the environment according to an internal schedule.
Event-triggered buses
–All activities are driven by the occurrence of events.
–The system is under the control of the environment and responds to stimuli as they occur.
Why not an event-triggered system?
In a safety-critical system it is necessary to guarantee some basic quality of service, even in the presence of faults.
Guaranteed low latency is required.
–Events arriving at different nodes may have to contend for access to the bus.
–So some form of media access control is required.
Ethernet resolves contention probabilistically.
Contention can be resolved deterministically by letting the lowest-numbered message win the arbitration, but latency then increases as the load increases.
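The effect of load on latency under deterministic arbitration can be illustrated with a small simulation. This is a minimal sketch, assuming a scheme in which the pending message with the lowest identifier wins each arbitration and each transmission occupies one bus slot; the function name and the message identifiers are hypothetical.

```python
def arbitration_order(pending_ids):
    """Map each pending message id to the slot in which it gets the bus.

    Assumptions: all messages are pending at time 0, the lowest id always
    wins arbitration, and each transmission takes exactly one slot.
    """
    return {msg_id: slot for slot, msg_id in enumerate(sorted(pending_ids))}


light_load = arbitration_order([7, 3])
heavy_load = arbitration_order([7, 3, 1, 4, 2, 5])

print(light_load[7])  # slot in which message 7 is sent under light load
print(heavy_load[7])  # slot in which message 7 is sent under heavy load
```

Message 7 is sent in slot 1 under light load but has to wait until slot 5 under heavy load, while the highest-priority message always goes first.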
In the presence of faults, a message may be retransmitted, thereby delaying the next message even if it has higher priority.
Furthermore, faulty nodes may make excessive demands for service.
ARINC 629 uses a technique called ‘minislotting’.
–Each node has to wait a certain period after sending a message before it can contend to send another.
–But here also, latency is a function of load.
Byteflight (BMW) extends this mechanism with guaranteed, preallocated slots for critical messages.
–This provides no protection against a faulty node that fails to recognize those slots. This kind of fault is called the ‘babbling idiot’ failure.
Time-triggered bus
Communication bandwidth is statically preallocated in the form of a global schedule.
Thus, contention is resolved at design time rather than at run time.
But what about the babbling idiot failure?
–Each node has an independent component, called a bus guardian, that lets the node transmit only when it is allowed to do so.
–The guardian has an independent clock and independent knowledge of the schedule, and lets its node broadcast only when the schedule indicates.
No source or destination address is needed in the message.
–This reduces the size of the message.
–It increases the message bandwidth of the bus.
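The static schedule and the guardian check can be sketched as follows. This is an illustrative model only: the node names, the 250-microsecond slot length, and the `BusGuardian` class are assumptions, not details of any particular bus.

```python
from dataclasses import dataclass


@dataclass
class BusGuardian:
    """Gates one node's transmitter using the guardian's own schedule copy."""
    node: str
    schedule: tuple        # guardian's independent copy: slot index -> node
    slot_length_us: int    # assumed fixed slot duration in microseconds

    def slot_owner(self, guardian_clock_us):
        """Return the node that owns the bus at the guardian's clock time."""
        slot = (guardian_clock_us // self.slot_length_us) % len(self.schedule)
        return self.schedule[slot]

    def may_transmit(self, guardian_clock_us):
        """Open the transmitter only during the node's own slot."""
        return self.slot_owner(guardian_clock_us) == self.node


guardian_b = BusGuardian(node="B", schedule=("A", "B", "C", "D"), slot_length_us=250)

print(guardian_b.may_transmit(300))  # inside B's slot
print(guardian_b.may_transmit(600))  # inside C's slot, so B is blocked
```

Because the slot owner is implied by the time of transmission, receivers can identify the sender from the schedule alone, which is why no address field is needed.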
Continued…
Fault-tolerant clock synchronization is a fundamental requirement for a time-triggered bus architecture.
The abstraction of a global clock is realized by each node having a local clock that is closely synchronized with the clocks of all other nodes.
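One way to keep local clocks close together despite faulty clocks is a convergence function that discards extreme readings. The sketch below uses the fault-tolerant midpoint, a well-known technique from the clock synchronization literature. The slides do not say which algorithm any particular bus uses, so the function name and the sample readings are illustrative assumptions.

```python
def fault_tolerant_midpoint(readings, max_faulty):
    """Discard the max_faulty largest and max_faulty smallest readings,
    then return the midpoint of the remaining extremes."""
    if len(readings) <= 2 * max_faulty:
        raise ValueError("need more than 2 * max_faulty readings")
    ordered = sorted(readings)
    kept = ordered[max_faulty:len(ordered) - max_faulty]
    return (kept[0] + kept[-1]) / 2


# Hypothetical clock offsets (microseconds) seen by one node.
# The reading 900 comes from a faulty clock.
offsets = [-2, 1, 3, 900]
correction = fault_tolerant_midpoint(offsets, max_faulty=1)
print(correction)
```

With one faulty reading allowed, the outlier 900 is discarded and the correction is computed from the remaining readings only.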
Fault hypotheses and Fault Containment Units
A fault hypothesis must describe
–the modes of faults that are to be tolerated,
–their maximum number,
–and their arrival rates.
It must also identify the different fault containment units (FCUs).
–There must be no propagation of faults from one FCU to another.
–There must be no “common mode failures”, in which a single physical event produces faults in multiple FCUs.
A fault may exhibit different modes at different levels of the protocol hierarchy.
–Example: an intermediate voltage at the electrical level can appear as a Byzantine failure at the message level.
–Such faults must be controlled at an underlying or intermediate level.
Basic Dimensions of faults
Faults can affect value, time, or space.
Value fault: causes an incorrect value to be computed, transmitted, or received.
Timing fault: causes a value to be computed, transmitted, or received at the wrong time.
Spatial proximity fault: all the matter in some specified volume is destroyed.
–Redundant buses come into close proximity at each node, so a single such fault can affect all of them.
–A central hub topology is more resilient.
Fault Classification
Manifest: the fault can be reliably detected.
–Example: a fault that causes an FCU to cease transmitting.
Symmetric: whatever the effect, it is the same for all observers.
Arbitrary: may be asymmetric or Byzantine, meaning that its effect is perceived differently by different observers.
–Example: a slightly-out-of-specification (SOS) fault, such as an intermediate electrical voltage or a weak edge.
The redundancy required for fault tolerance depends on the type of fault considered.
Number of FCUs required for clock synchronization:
n > 3a + 2s + m
where
n = number of FCUs
a = number of arbitrary faults
s = number of symmetric faults
m = number of manifest faults
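The bound is easy to evaluate for a given fault hypothesis. The helper names below are hypothetical; the arithmetic follows directly from n > 3a + 2s + m.

```python
def min_fcus(arbitrary, symmetric, manifest):
    """Smallest n that satisfies n > 3a + 2s + m."""
    return 3 * arbitrary + 2 * symmetric + manifest + 1


def tolerates(n, arbitrary, symmetric, manifest):
    """True if n FCUs are enough for the given numbers of faults."""
    return n > 3 * arbitrary + 2 * symmetric + manifest


print(min_fcus(1, 0, 0))       # 4: one arbitrary fault needs four FCUs
print(min_fcus(0, 1, 1))       # 4: one symmetric plus one manifest fault
print(tolerates(4, 1, 1, 0))   # False: 4 is not greater than 3 + 2
```

An arbitrary fault is the most expensive to tolerate, costing three FCUs of redundancy, while a manifest fault costs only one.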
Some architectures can tolerate only one fault at a time; they then reconfigure and are able to tolerate additional faults.
In such an architecture, the fault arrival rate is very important.
–Faults must not arrive faster than the architecture can reconfigure.
–These architectures operate according to static schedules, which consist of “rounds” or “frames” that are executed repeatedly.
–The acceptable fault arrival rate is expressed in faults per round.
Sometimes the system may experience many simultaneous faults (for example, due to high-intensity radiated fields, HIRF).
–A restart is usually initiated.
–Detection of such a failure, and the restart, must be very fast.
–An estimate for a steer-by-wire automobile application is 50 ms.
Services
The basic purpose of these architectures is to build reliable distributed applications.
Basic services
–clock synchronization
–time-triggered activation
–reliable message delivery
The problem of distributing data consistently in the presence of faults is known, among other names, as interactive consistency. It requires:
–Agreement: all nonfaulty receivers obtain the same message.
–Validity: if the transmitter is nonfaulty, then nonfaulty receivers obtain the message actually sent.
Failure notification, or ‘membership’, service
–The service must produce consistent knowledge.
–If one nonfaulty node thinks that a particular node has failed, then all the nonfaulty nodes must hold the same opinion.
Each node maintains a private membership list.
–Agreement: the membership lists of all nonfaulty nodes are the same.
–Validity: the membership lists of all nonfaulty nodes contain all nonfaulty nodes and at most one faulty node.
When accurate membership cannot be maintained, the best recourse is to maintain agreement but sacrifice validity. This weakened requirement is called “clique avoidance”.
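The two membership properties can be written as simple predicates over the nodes' private lists. This is a minimal sketch for checking a snapshot, not a membership protocol; the node names and function names are hypothetical, and the checker assumes it knows which nodes are actually nonfaulty.

```python
def membership_agreement(lists, nonfaulty):
    """Agreement: all nonfaulty nodes hold identical membership lists."""
    views = {frozenset(lists[node]) for node in nonfaulty}
    return len(views) <= 1


def membership_validity(lists, nonfaulty):
    """Validity: every nonfaulty node's list contains all nonfaulty nodes
    and at most one faulty node."""
    good = set(nonfaulty)
    for node in nonfaulty:
        members = set(lists[node])
        if not good <= members:
            return False          # a nonfaulty node was dropped
        if len(members - good) > 1:
            return False          # more than one faulty node retained
    return True


# Hypothetical snapshot: D is faulty; A, B and C are nonfaulty.
nonfaulty = ["A", "B", "C"]
lists = {
    "A": {"A", "B", "C"},
    "B": {"A", "B", "C"},
    "C": {"A", "B", "C", "D"},  # C has not yet noticed that D failed
}

print(membership_agreement(lists, nonfaulty))  # False: C's view differs
print(membership_validity(lists, nonfaulty))   # True: at most one faulty member
```

The snapshot shows that the two properties are independent: validity holds here while agreement is violated.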
Practical Implementations
SAFEbus: developed by Honeywell for cockpit displays.
–The interfaces, or bus interface units (BIUs), are duplicated. BIUs perform clock synchronization, message scheduling, and transmission functions.
–Each BIU of a pair is a different FCU.
–The interconnect bus is quad-redundant.
–Each BIU of a pair drives a different pair of interconnect buses but is able to read all four.
–Each interconnect bus comprises two data lines and a clock line, and operates at 30 MHz.
–It can handle arbitrary faults and a high rate of fault arrivals. It also tolerates spatial proximity faults.
–It is considered to be the best of these buses and is used in the Boeing 777 passenger aircraft.
SPIDER: Scalable Processor-Independent Design for Electromagnetic Resilience
–Developed at NASA Langley Research Center.
–It is a research platform for exploring recovery strategies for radiation-induced (HIRF) faults.
–Uses a star configuration, in which the interfaces may be located either with their hosts or in a centralized hub.
–Services include interactively consistent message broadcast and identification of failed nodes (membership service).
FlexRay: developed for powertrain and chassis control in cars.
–More flexible than the other buses.
–Supports both ‘static’ time-triggered operation and ‘dynamic’ event-triggered operation.