Presentation is loading. Please wait.

Presentation is loading. Please wait.

The Time-Triggered Architecture

Similar presentations


Presentation on theme: "The Time-Triggered Architecture"— Presentation transcript:

1 The Time-Triggered Architecture
TU Wien The Time-Triggered Architecture H.Kopetz November 2014

2 Outline Introduction Global Time TTA Components Determinism
Communication in the TTA Error Handling Mechanisms Conclusion

3 Automation of Sinter Plant at Posco, Korea
Around 1975, my employer, VOEST, constructed a computerized sinter plant in Pohang, South Korea. With state of the art real-time computer technology much time and patience was needed in the integration phase to tune and commission the plant on site. Precise specification of the temporal properties of the interfaces of the components is missing. Need for a solid engineering methodology for the design of dependable distributed real-time systems.

4 Start of TTTech Problem Awareness Academic Industrial Emeritus
TU Wien TU Berlin Voest Alpine, Linz Start of TTTech Problem Awareness Ass. Prof USA PhD in Physics, Vienna CV of Hermann Kopetz

5 The Objectives of the TTA
The TTA is based on more than thirty years of research on the topic of dependable real-time systems. To establish a platform for the design of dependable component-based cyber-physical systems (CPS) that can be adapted to the needs of different application domains. To provide global coordination in a distributed real- time system without a central point of failure. In order to understand the design decisions taken in, and the dependability mechanisms provided by, the TTA we have to understand the characteristics of CPS.

6 CPS: Physical World meets Cyber World
P-System C-System j Real Time Com. System m k n l Event in the P System Interface Node of the C-System

7 Physical (P) System versus Cyber (C) System
P-System Controlled by the laws of physics Physical time Time base dense C-System Controlled by program execution Execution time Time-base sparse The model of the P-system that is used by the C-system must be aware of the progression of Physical time.

8 Focus on Safety Critical Systems--No Backup
Fly-by-wire Airplane: There is no mechanical or hydraulic connection between the pilot controls and the control surfaces. Drive-by-wire Car: There is no mechanical or hydraulic connection between the steering wheel and the wheels.

9 No Backup--the 10-9 Challenge
The system as a whole must be more reliable than any one of its components: e.g., System Dependability 1 FIT--Component dependability 1000 FIT (1 FIT: 1 failure in 109 hours) Architecture must be distributed and support fault-tolerance to mask component failures. System as a whole is not testable to the required level of dependability. The safety argument is based on a combination of experimental evidence about the expected failure modes and failures rates of fault-containment units (FCU) and a formal dependability model that depicts the system structure from the point of view of dependability. Independence of the FCUs is a critical issue.

10 Mitigation of Soft Hardware Errors
Architectural means to mitigate the consequences of component failures might become a necessity when using the upcoming submicron devices, as stipulated in the International Roadmap of Semiconductors: Relaxing the requirement of 100% correctness for devices and interconnects may dramatically reduce the costs of manufacturing, verification and test. Such a paradigm shift is likely forced in any case by technology scaling, which leads to more transient and permanent failures of signals, logic values, devices and interconnects.

11 Certification as a Design Driver
Certification is only possible if a system has been designed for certification: Independence of Fault-Containment Units (FCU) Elimination of temporal error propagation from one FCU to the network and in consequence to the other FCUs. Deterministic Operation Formal Analysis of Critical Algorithms Modular Composition of Correctness Argument

12 Control System Requirements--Periodicity
Periodicity is not mandatory, but often assumed as it leads to simpler algorithms and more stable and secure systems. Most of the algorithms developed with this assumption are very sensitive to period duration variations, jitter at the starting instant. This is especially the case of motor controllers iin precision machines. Simultaneous sampling of inputs is also an important stability factor. From : Decotignie, J., D., Which Network for Which Application, in The Industrial Communication Technology Handbook, R. Zuwarski, Editor. 2005, Taylor and Francis: Boca Raton. p. 19/1-19/ p. 19-4

13 Architectural Principles of the TTA
Component Orientation—A component is a hardware/software unit. All components are time-aware—A global time is provided by the platform. Separation of Computation from Communication— Components and Communication systems can be developed independently. Core Services are deterministic—Modular Certification is supported by the architecture. Different Integration Levels—IP-Cores form a Chip, Chips form a Device, Devices form systems Being faulty is normal.

14 Need for a Global Time A valuable lesson from the August 14 blackout is the importance of having time-synchronized system data recorders. The Task Force’s investigators labored over thousands of data items to determine the sequence of events, much like putting together small pieces of a very large puzzle. That process would have been significantly faster and easier if there had been wider use of synchronized data recording devices. From Final Report on US-Canadian August 14, 2003 Power Blackout, p.164.

15 Need for Timeliness On November 21, 2012 it has been announced that the delivery of the eight trains will not take place as originally planned ahead of the major train schedule change of December 9, 2012. According to [Spi12] the reason for this delay is in the complexity of the electronics. One problem is the time delay of one second between the execution of the control command by the driver and the activation of the brakes, which extends the braking distance by up to 70 meters. [Spi12] Der Spiegel on line. [accessed 19. December 2012].

16 RT Information has limited temporal validity
An appropriate model of RT communication must consider timeliness as important as correctness.

17 Communication—The Choices
Competition versus Cooperation Event Triggered Time Triggered Probabilistic Deterministic

18 Event-Triggered Communication
A new message is sent, whenever a significant event occurs. Delay in the queues is unpredictable. What happens under Peak Load?

19 The Alternative: Time Triggered Messages
Event Triggered: Spontaneous Uncoordinated Best Effort Trashing under Peak load Time Triggered: Time Schedule Coordinated Planned No Trashing

20 Timeliness: Event-Triggered vs Time-Triggered
Probability Density ET: dmax unbounded TT: dmax = Precision of Clock Synchronization Timeout for Error Detection TT ET ? Real Time

21 Time-Awareness Characteristic for the the TTA is the availability of a global physical time of known precision at every component of the architecture. This global time is used to Synchronize actions in the physical world and in the cyber world Limit the validity of real-time information Control access to shared resources Strengthen security protocols Detect failures of fail-silent components

22 The Central Research Question:
The development of a clock synchronization algorithm and an associated communication protocols with the following properties: Fault tolerance: No single point of failure. Vigilant: Any violation of an algorithmic assumption must be detected immediately. Efficiency: Small data overhead and few extra messages. Correctness: Simple in order that its correctness can be established by formal methods. Precise Interface Specification to the Components.

23 Cyclic Representation of Time
Real-Time 1 A 2 B C D 5 E 1 Start of Cycle A Observation of Sensor Input 2 Start of Transmission of Sensor Data B Transmission of Input Data 3 Start of Processing of Control Algorithm C Processing of Control Algorithm Termination of Processing D Transmission of Output Data Start of Output to Actuators E Output Operation at the Actuator Termination of Output Operation Real Time ground- state 1 2 A B 6 3 E C D 5 4

24 Dense Physics Discrete Central Computer Sparse Distributed Computer
Models of Time in a CPS Dense Physics Discrete Central Computer Sparse Distributed Computer B A Real Time Real Time Real Time

25 Dense Physics Discrete Central Computer Sparse Distributed Computer
Models of Time in a CPS Dense Physics Discrete Central Computer Sparse Distributed Computer B A Real Time Real Time Real Time Precision of the Global Time

26 Clock Synchronization Condition

27 One Tick Difference: What Does it Mean?
Because of the accumulation of the synchronization error and the digitalization error, it is not possible to reconstruct the temporal order of two events from the knowledge that the global timestamps differ by one.

28 Reasonableness Condition
The global time t is called reasonable, if all local implementations of the global time satisfy the following reasonableness condition for the global granularity g of a macrotick: g >  This reasonableness condition ensures that the synchronization error is bounded to less than one macrogranule, i.e., the duration between two macroticks.

29 Fundamental Limits to Time Measurement
Given a distributed system with a reasonable global timebase with granularity g. Then the following fundamental limits to time measurement must be observed: If a single event is observed by two nodes, there is always the possibility that the timestamps will differ by one tick Let us assume that dobs is the observed duration of an interval. Then the true duration dtrue is ( dobs - 2g) < dtrue <( dobs + 2g) The temporal order of events can only be recovered, if the observed time difference dobs  2g The temporal order of events can always be recovered, if the event set is 0/3g precedent.

30 Sparse Time Model in the TTA
Whenever we use the term time we mean physical time as defined by the international standard of time TAI. If the occurrence of events is restricted to some active intervals on the timeline with duration  with an interval of silence of duration  between any two active intervals, then we call the time base /-sparse, or sparse for short, and events that occur during the active intervals sparse events.

31 The Intervals  and  in a Sparse Timebase
Depend on the precision P of the clock synchronization. In reality, the precision is always larger than zero—in a distributed system clocks can never be fully synchronized. The precision depends on the stability of the oscillator, the length of the resynchronization interval and the accuracy of interval measurement. On a discrete time-base, there is always the possibility that the same external event will be observed by a tick difference.  

32 Complexity Management in the TTA
The architectural style of the TTA deploys the following simplification strategies to reduce the complexity of a design: Partitioning: The partitioning of a system into nearly autonomous subsystems (components).--Physical Structure Abstraction: The introduction of abstraction layers whereby only the relevant properties of a lower layer are exposed to the upper layer--Structure and Behavior Segmentation: The temporal decomposition of complex behavior into small parts that can be processed sequentially (“step-by-step”)--determinism helps! Recursion: Use of the same general concepts and mechanisms at different levels of abstraction

33 TTA Architecture Services Overview
Application Specific Services, Including Middleware Application MW Domain Specific Services DSC e.g., AUTOSAR DS Optional Services Core Services e.g., Message Transport Clock Synchronization Reconfiguration Robustness Security Core Different Implementation Choices

34 What is a TTA Component? Hardware/software unit that accepts input messages, provides a useful service, maintains internal state, and produces after some elapsed time output messages containing the results. It is aware of the progression of physical time Unit of abstraction, the behavior of which is captured in a high-level concept that is used to capture the services of a subsystem. Fault-Containment-Unit (FCU) that maintains the abstraction in case of fault occurrence and contains the immediate effects of a fault (a fault can propagate from a faulty component to a component that has not been affected by the fault only by erroneous messages). Unit of restart, replication and reconfiguration in order to enable the implementation of robustness and fault-tolerance.

35 Model Driven Design of a Component
Domain Specific Application Model (e.g., expressed in UML) Platform Independent Model(PIM) focuses on functionality and time Platform Independent Model (PIM) expressed in a High-Level Language (e.g., System C). Platform Specific Model (PSM) (nonfunctional properties)

36 Performance Trends--Power
Gops/Watt 1000 ASIC 100 FPGA 10 CPU 1 Cell 0.1 0.01 1990 1995 2000 2005 2010 Ref: Lauwereins, Imec, MSOP 2006

37 The Interfaces of a TTA Component
Connection to local sensors, actuators, man-machine interface, other systems Local Interface-- to the Environment (unspecified) View Inside Control Technology Dependent Interface (TDI) Hardware/Software Unit Technology Independent Interface (TII) Configuration Restart Reset Power level For the (remote) Maintenance Expert Linking Interface LIF-- Provides the service to the User Relevant for the integration of components into a cluster of components

38 Software within a Component
Downloading of the component software, the Job, into the component hardware via the TII interface. Communicate with other System Components to establish ports and dynamic links, to reintegrate components after a transient fault etc. Global time management Provision of API Services (e.g., send and receive of a message) Scheduling of the tasks within a component Service of the TII Interface to reset, start, and terminate the operation of a component Provision of Generic Middleware Services (GEM) Application SW Handles LIF GEM handles TII Mini RTOS handles TDI and Local interf. Hardware

39 Operating System in the TTA
In the TTA, the functions of a monolithic operating system are partitioned into A Mini-RTOS in each node including generic middleware, and An open-ended set of autonomous OS Components that provide operating system services such as Device Controllers (Gateway Components) Integrated Resource Management Security Diagnosis and Robustness Shared Memory Component

40 Look at the Inside of a Component
Local Interfaces The TII (technology independent interface) gives access to the services of the generic middleware (GEM). TII LIF

41 Linking Interface Specification
The Linking interface is a message-based interface. Its specification consists of three parts: Transport Specification: contains the information needed to transport a message from the sender to the receiver. Temporal properties are part of the transport specification. Operational Specification: specifies the syntactic structure of the bit stream contained in the message and establishes the message variable names that point to the concepts at the meta level. Meta-Level Specification: assigns meaning to the message variable names established by the operational specification.

42 Example of a Cluster of Components
I/O Driver Interface Assistant System Gateway Body Host Computer LIF LIF LIF LIF LIF LIF LIF Brake Manager Engine Control Steering Manager Vehicle Comm Vehicle to Vehicle Communication I/O I/O I/O (( ))

43 Example of a Cluster--Recursion
Driver Interface Assistant System Gateway Body Host Computer LIF LIF LIF LIF LIF LIF LIF Brake Manager Engine Control Steering Manager Vehicle Comm LIF Vehicle to Vehicle Communication I/O I/O I/O (( ))

44 Gateway Components Gateway Component
Gateway components can connect two clusters that may be based on different architectural styles: T.Luft = 0 Gateway Component T.Air = 32 Green LIF Red LIF The representation of the information in the two clusters will be different, but the semantic content of the message variables must be the same.

45 RT-Protocol is at the Center of an Architecture
The properties of the protocol determine timeliness, composability, error containment, etc. Many of today’s RT Protocols have been designed bottom up with limited architectural vision. (Sometimes with the intention to lock a customer to a supplier). Technology trends force a consolidation.

46 Unidirectional Deterministic Multi-cast Message
Uni-directionality is required to decouple communication from computation decouple the sender behavior from the receiver behavior Determinism is required to establish timeliness simplify the reasoning about the behavior (modus pones) simplify testing (repeatable test cases) be able to implement active replication (TMR) support the certification Multi-cast is required to support the independent observation of the component behavior the support of diagnosis replication of state at multiple components Triple Modular Redundancy

47 Determinism of a Communication Channel
The behavior of a communication channel is called deterministic if (as seen from an omniscient external observer): A message is delivered before an a priori known instant (timeliness). The receive order of the messages is same as the send order. The send order among all messages is established by the temporal order of the send instants of the messages as observed by an omniscient observer. If the send instants of n (n>1) messages are the same, then an order of the n messages will be established in an a priori known manner. Two independent deterministic channels will deliver messages always in the same order before an a priori known instant.

48 Determinism: Timeliness and Order
Timeliness Consistent Temporal Order Probability of Message Arrival Tom 1 Switch TT Tom Ann ET Ann Tom Switch Ann Tom Ann Temporal Delay of ET Time

49 Temporal Order is Obvious
Real Time Red Channel Green Channel

50 Determinism: Simultaneity--Who Wins?
B A B Determinisn: If A wins on the red channel then A must also win on the green channel Real Time Red Channel Green Channel

51 Handling of Simultaneity—A Fundamental Problem
In the hardware : meta-stability In operating systems: mutual exclusion In communication systems: order of messages A two-step solution: (i) Provide consistent view of simultaneity—distinguish between events that are in the sphere of control (SoC) of the system and events that are outside the SoC--difficult (ii) Order simultaneous events according to some a priori established criterion--easy

52 Events outside the SoC: Agreement Protocols
If the occurrence of events is restricted to some active intervals with duration  with an interval of silence of duration  between any two active intervals, then we call the time base /-sparse, or sparse for short. Non-sparse event

53 Agreement Protocol to Generate Sparse Events
To arrive at a consistent view of the temporal order of non-sparse events within a distributed computer system (which does not necessarily reflect the temporal order of event occurrence), the nodes must execute an agreement protocol. (i) exchange information about the observations among all nodes, such that all nodes have the same data set. (ii) every node executes the same algorithm on this data set to arrive at a consistent value and at a sparse interval of the observation.

54 An Impossibility Results
It is impossible to represent the temporal properties of the dynamic analog physical world with true fidelity in digital cyberspace. The conflict between fidelity and consistency can be reduced, but can never be fully resolved. The better the precision of the clock synchronization, the smaller the error introduced by digitalization and synchronization.

55 Core: Time-Triggered Communication
In a time-triggered communication system, the sender and receiver(s) agree a priori on a cyclic time-controlled conflict-free communication schedule for the sending of time-triggered messages. In every period, a message is sent at exactly the same phase. The control and data flow is unidirectional. Error detection is in the sphere of control of the receiver, based on his a priori knowledge of the arrival instants of messages.

56 Fault Containment vs. Error Containment
We do not need an error detector in the value domain if we assume fail-silence. Error Propagation Error Detection Error Detector must be in a separate FCU

57 Temporal Error Containment by the CS
It is impossible to maintain the communication among the correct components of a RT- cluster if the temporal errors caused by a faulty component are not contained. Error containment of an arbitrary temporal node failure requires that the shared Comm. System is a self-contained FCU that has temporal information about the allowed behavior of the nodes– It must contain application- specific state. Temporal Error Containment Boundary Shared Communication System Babbling idiot High priority message in a CAN System?

58 TTP and TT-Ethernet TTP (Time-Triggered Protocol) and TTE (Time- Triggered Ethernet) have been developed to link devices of the TTA. TTP is very data efficient and operates up to a bandwidth of 20 Mbits/second. TTE is fully compatible with standard Ethernet and operates up to 1 Gigabit/second. It has been selected as the standard protocol for NASA. TTP and TTE have been standardized by the SAE. Both protocols are time-triggered deterministic protocols that provide temporal error containment.

59 The Swiss-Cheese Model
Subsystem Failure Normal Operation On-Chip TMR From Reason, J Managing the Risk of Organizational Accidents 1997 Off-Chip TMR NGU Strategy Catastrophic System Event Multiple Layers of Defenses

60 Levels of Fault Mitigation in the TTA
Normal Operation Swift Component Recovery after a transient fault On-Chip TMR: to handle transient and permanent faults within a chip Off-Chip TMR: to handle a transient and permanent fault of a total chip. NGU (Never-Give-Up) Strategy: to handle multiple correlated transient faults.

61 Fault Containment Unit (FCU)
A Fault-Containment Unit (FCU) is an independent subsystem with well-defined interfaces such that the immediate consequences of a single fault (e.g., hardware, software) are contained within this subsystem. Examples: A fail-silent component A micro-processor including the software with a well-defined message interface.

62 Fault-Tolerant Unit (FTU)
A fault-tolerant unit (FTU) is a set of actively redundant FCUs (components) that provide a fault tolerant service to its environment: FTUs have to receive identical input messages in the same order FTUs have to operate in replica determinism The output messages of FTUs should be idempotent As long as a defined subset of the components of the FTU is operational, the FTU is considered operational FTUs provide the continuous service by fault masking.

63 Fault Masking: How Many FCUs in an FTU?
B assumption fail-silent FCUs k+1 assumption Synchronized FCUs 2k + 1 no assumption about FCU (arbitrary) 3k + 1 What is the assumption coverage in cases A and B?

64 What is Needed to Implement TMR?
What architectural services are needed to implement Triple Modular Redundancy (TMR) at the architecture level? Provision of an Independent Fault-Containment Region for each one of the replicated components Synchronization Infrastructure for the components Predictable Multicast Communication Replicated Communication Channels Support for Voting Replica Deterministic (which includes timely) Operation Identical state in the distributed components

65 Replica Determinism: Airplane on Takeoff
Consider an airplane that is taking off from a runway with a flight control system consisting of three independent channels without a global time. Consider the system at the critical instant before takeoff: Channel 1 Take off Accelerate Engine Channel 2 Abort Stop Engine

66 The Critical Role of Time
Speed Critical Takeoff Speed Timeout Channel 1 Timeout Channel 2 Real Time

67 Replica Determinism: Airplane on Takeoff
Consider an airplane that is taking off from a runway with a flight control system consisting of three independent channels. Consider the system at the critical instant before takeoff: Channel 1 Take off Accelerate Engine Channel 2 Abort Stop Engine Channel 3 Take off Stop Engine (Fault)

68 Replica Determinism: Airplane on Takeoff
Consider an airplane that is taking off from a runway with a flight control system consisting of three independent channels. Consider the system at the critical instant before takeoff: Channel 1 Take off Accelerate Engine Channel 2 Abort Stop Engine Channel 3 Take off Stop Engine (Fault) Majority Take off Stop Engine (Fault)

69 Triple Modular Redundancy (TMR)
Triple Modular Redundancy (TMR) is the generally accepted technique for the mitigation of component failures at the system level: V O T E R A/1 V O T E R B/1 A B V O T E R A/2 V O T E R B/2 V O T E R A/3 V O T E R B/3

70 On-Chip TMR 4/27/2018 FP7 GENESYS

71 Off-Chip TMR Green DAS Red DAS TT Ethernet TT Ethernet Voting Actuator
Switch Blue Switch Brown

72 Example: TMR Configuration
Voting Actuator Voting Actuator TT Ethernet Switch Blue Switch Red TT Ethernet Standard Ethernet

73 Example: TMR Configuration
Voting Actuator Voting Actuator TT Ethernet Switch Blue Switch Red TT Ethernet Standard Ethernet

74 Conclusion The TTA provides a framework for the component- based design of dependable distributed real-time systems. The framework can be adapted to different application domains by a hierarchical composition of services. The fault-tolerance mechanism of the TTA mask transient and permanent failures. The conceptual simplicity of the TTA supports the implementation of fast recovery actions in cyber- physical systems.

75 More Information Background information can be found in the second revised edition of my book Real-Time Systems—Design Principles for Distributed Embedded Applications published by Springer Verlag on April 27 ,

76 TTTech at a Glance Today
TTTech Computertechnik AG founded in 1998 Core Business: Reliable Computer Systems based on the time- triggered technology TTP TTEthernet TTTech owns about 140 patents in the field of TT technology. Light House Customers Audi –A8, A6 Boeing—787 Dreamliner, Embraer NASA--Orion Space Program Vestas—Wind Turbines About 350 employees and a turnover of about 50 Mio €. Subsidiaries in DE, US, JP, CZ, RO, IT Cumulative RD Budget: More than 100 Mio € over the years.

77 Anytime Algorithms Help
Start of Processing of Algorithm Time- Triggered Deadline Core segment delivers satisfycing results Optional segment Improves Iteratively the quality of result Real-Time


Download ppt "The Time-Triggered Architecture"

Similar presentations


Ads by Google