

Slide 1/14: Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems
Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo Peng
Embedded Systems Lab (ESLAB), Linköping University, Sweden

Slide 2/14: Motivation
- Hard real-time applications
  - Timing constraints
  - Cost constraints
- Scheduling: online preemptive (flexible) vs. off-line non-preemptive (predictable)
- Faults: transient and intermittent
- Hardware solutions vs. software solutions
  - Hardware (MARS, TTA, X-by-Wire): addresses permanent faults, but is costly for transient faults
  - Software: re-execution/rollback recovery, checkpointing/rollback recovery, replication, primary-backup, ...

Slide 3/14: Outline
- Motivation
- System architecture and fault model
- Fault-tolerance techniques
- Problem formulation
- Motivational examples
- Tabu-search optimization strategy
- Experimental results
- Contributions and message

Slide 4/14: Fault-Tolerant Time-Triggered Systems
- Faults: transient faults
- Processes: static cyclic scheduling; fault tolerance through re-execution and replication
- Messages: static schedule table; fault-tolerant protocol
- Time-Triggered Protocol (TTP)
  - Bus access scheme: time-division multiple access (TDMA)
  - Schedule table located in each TTP controller: the message descriptor list (MEDL)
[Figure: TDMA bus with slots S1 to S4 forming one TDMA round; a cycle consists of two rounds]
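
The slot timing of a TDMA round can be illustrated with a small calculation. The Python sketch below is a simplified model, not the TTP specification or the authors' tool: it assumes equal-length slots, one slot per node in every round, and a message that fits in a single slot. `next_slot_start` and `message_arrival` are hypothetical helpers, and a real MEDL may assign different messages to the two rounds of a cycle, which this sketch ignores.

```python
def next_slot_start(ready_time: int, node_index: int, num_slots: int, slot_len: int) -> int:
    """Earliest start of the node's TDMA slot at or after ready_time.

    Hypothetical model: a round has num_slots equal slots, and node_index
    (0-based, so 0 corresponds to S1) owns the same slot in every round.
    """
    round_len = num_slots * slot_len
    offset = node_index * slot_len  # where this node's slot starts inside a round
    # Smallest round index r with r * round_len + offset >= ready_time (ceiling division).
    r = max(0, -((offset - ready_time) // round_len))
    return r * round_len + offset


def message_arrival(ready_time: int, node_index: int, num_slots: int, slot_len: int) -> int:
    """Time at which a message that fits in one slot has been fully transmitted."""
    return next_slot_start(ready_time, node_index, num_slots, slot_len) + slot_len


if __name__ == "__main__":
    # Four slots (S1..S4) of 10 time units; the node owning S3 has a message ready at t = 25.
    print(next_slot_start(25, node_index=2, num_slots=4, slot_len=10))  # 60
    print(message_arrival(25, node_index=2, num_slots=4, slot_len=10))  # 70
```

In the example, the S3 slot of the first round (starting at 20) has already begun when the message becomes ready at 25, so the message waits for the next round. This waiting time is what makes message scheduling part of the design problem on a TDMA bus.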

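Slide 5/14: Fault-Tolerance Techniques
[Figure, k = 2 transient faults: re-execution runs P1 up to three times on node N1; replication places three replicas of P1 on N1, N2 and N3; re-executed replicas combine both, with P1 on N1 (re-executed once) and a replica on N2]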
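
The worst-case cost of the three techniques can be compared with a back-of-the-envelope model. This Python sketch assumes that the process has the same WCET on every node, that all replicas start together, and that `mu` is a hypothetical recovery overhead per re-execution; the function name and parameters are illustrative, not taken from the presentation.

```python
def worst_case_completion(wcet: int, executions_per_node: list[int], mu: int = 0) -> tuple[int, int]:
    """Return (faults tolerated, worst-case completion time) for one process.

    executions_per_node[i] is the number of executions budgeted on the i-th node
    hosting a replica: [k + 1] is pure re-execution, [1] * (k + 1) is pure
    replication, anything in between is re-executed replicas.
    Assumptions: same WCET on every node, all replicas start at time 0,
    mu is a hypothetical recovery overhead paid before each re-execution.
    """
    if not executions_per_node or min(executions_per_node) < 1:
        raise ValueError("each replica needs at least one execution")
    k = sum(executions_per_node) - 1  # at least one execution must survive
    e_max = max(executions_per_node)
    # Worst case: the k faults hit every execution except the last one of the
    # replica with the most executions, so that execution decides completion.
    return k, e_max * wcet + (e_max - 1) * mu
```

With k = 2 and a WCET of 40, `[3]` (re-execution on N1) gives 120, `[1, 1, 1]` (replicas on N1, N2, N3) gives 40 at the cost of three nodes, and `[2, 1]` (re-executed replicas on N1 and N2) gives 80 on two nodes: completion time is traded against hardware.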

Slide 6/14: Problem Formulation
Given:
- Fault model: transient faults, with the number k of transient faults that can occur in the system period
- System architecture: a time-triggered system
- Application: a set of process graphs, with WCETs, message sizes, periods and deadlines
Determine a schedulable and fault-tolerant design implementation:
- Fault-tolerance policy assignment
- Mapping of processes and messages
- Schedule tables for processes and messages
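
The inputs and outputs of the problem can be written down as plain data types. This Python sketch is one possible encoding, assumed for illustration: the class names, the `check_design` helper, and the rule that re-executed replicas use between 2 and k nodes are hypothetical, and schedule tables are omitted.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    REEXECUTION = "re-execution"
    REPLICATION = "replication"
    REEXECUTED_REPLICAS = "re-executed replicas"


@dataclass
class Process:
    name: str
    wcet: dict[str, int]  # node -> WCET; a missing node means "cannot be mapped there"


@dataclass
class Message:
    name: str
    src: str  # sending process
    dst: str  # receiving process
    size: int


@dataclass
class ProcessGraph:
    processes: list[Process]
    messages: list[Message]
    period: int
    deadline: int


@dataclass
class Problem:
    nodes: list[str]            # time-triggered architecture (nodes on a TTP bus)
    graphs: list[ProcessGraph]  # the application
    k: int                      # transient faults per system period


@dataclass
class Design:
    policy: dict[str, Policy]      # process -> fault-tolerance policy
    mapping: dict[str, list[str]]  # process -> node of each replica
    # Schedule tables for processes and messages are left out of this sketch.


def check_design(problem: Problem, design: Design) -> list[str]:
    """Return a list of consistency errors (empty if the design is consistent)."""
    errors = []
    allowed = {  # number of replicas allowed per policy (assumed rule)
        Policy.REEXECUTION: range(1, 2),
        Policy.REPLICATION: range(problem.k + 1, problem.k + 2),
        Policy.REEXECUTED_REPLICAS: range(2, problem.k + 1),
    }
    for graph in problem.graphs:
        for p in graph.processes:
            nodes = design.mapping.get(p.name, [])
            policy = design.policy.get(p.name)
            if not nodes or policy is None:
                errors.append(f"{p.name}: missing mapping or policy")
                continue
            if len(set(nodes)) != len(nodes):
                errors.append(f"{p.name}: replicas must be on distinct nodes")
            errors += [f"{p.name}: cannot run on {n}" for n in nodes if n not in p.wcet]
            if len(nodes) not in allowed[policy]:
                errors.append(f"{p.name}: {len(nodes)} replica(s) invalid for {policy.value}")
    return errors
```

Keeping the policy and the mapping in one `Design` object reflects the slide's point that both are outputs of the same optimization rather than separate steps.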

Slide 7/14: Static Scheduling [Kandasamy et al. 03]
- Transparent re-execution
- Recovery slack
- Root schedules and contingency schedules
[Figure: process graph P1 to P5 with messages m1 and m2, mapped on nodes N1, N2 and N3. The root schedules (N1: S1, N2: S11, N3: S14) are complemented by contingency schedules (numbered up to S18, for example N1: S2, N2: S12, N3: S14) that take over when faults force P1, P2, ... to be re-executed.]
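
The relation between contingency schedules and recovery slack can be checked on a single node. This Python sketch assumes that processes run back to back on one node, that a faulty process is re-executed immediately with zero recovery overhead, and that one slack is shared by all processes on the node. It enumerates every fault scenario (each one standing in for a contingency schedule in this simplified view) and compares the worst finishing time with the root schedule plus a shared slack of k times the longest WCET. The helper names are hypothetical; this is not the contingency-schedule generation of Kandasamy et al. or of the authors.

```python
from itertools import combinations_with_replacement


def contingency_finish_times(wcets: list[int], k: int) -> dict[tuple[int, ...], int]:
    """Finish time of the last process on one node for every fault scenario.

    A scenario is a sorted tuple of indices of the processes hit by a fault;
    up to k faults, and the same process may be hit more than once.
    Simplified model: processes run back to back, a faulty process is
    re-executed immediately, and recovery overhead is zero.
    """
    root = sum(wcets)  # the root schedule: no faults
    scenarios = {(): root}
    for f in range(1, k + 1):
        for hits in combinations_with_replacement(range(len(wcets)), f):
            scenarios[hits] = root + sum(wcets[i] for i in hits)
    return scenarios


def shared_recovery_slack(wcets: list[int], k: int) -> int:
    """Slack shared by all processes on the node: room for k re-executions
    of the longest process."""
    return k * max(wcets)


if __name__ == "__main__":
    wcets, k = [30, 20, 40], 2  # hypothetical WCETs on one node
    scenarios = contingency_finish_times(wcets, k)
    print(len(scenarios))                                # 10
    print(max(scenarios.values()))                       # 170
    print(sum(wcets) + shared_recovery_slack(wcets, k))  # 170
```

Because no scenario finishes later than the root schedule plus the slack, a message sent after that point leaves at the same time whatever faults occurred; this is roughly the intuition behind keeping re-execution transparent to the other nodes.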

Slide 8/14: Re-execution vs. Replication
[Figure, k = 1: applications A1 and A2 built from processes P1, P2, P3 (messages m1, m2) on nodes N1 and N2, which share a TTP bus with slots S1 and S2. WCETs on N1/N2: P1 40/50, P2 40/60, P3 50/70. For A1, re-execution meets the deadline and replication misses it (re-execution is better); for A2, replication with replicated messages meets the deadline and re-execution misses it (replication is better).]

Slide 9/14: Fault-Tolerant Policy Assignment
[Figure, k = 1: application with processes P1 to P4 and messages m1 to m3 on nodes N1 and N2 (TTP bus, slots S1 and S2), with a WCET table that is only partially recoverable. Without fault tolerance, a fault crashes the application; re-execution alone and replication alone both miss the deadline; an optimized assignment that replicates P1 and re-executes the other processes meets it.]
- Optimization of the fault-tolerance policy assignment is needed to meet the deadline.

Slide 10/14: Mapping and Fault-Tolerance
[Figure, k = 1: application with processes P1 to P4 and messages m1 to m4 on nodes N1 and N2 (TTP bus, slots S1 and S2); the WCET table marks with X the nodes on which a process cannot run. The best mapping found without considering fault tolerance misses the deadline once fault tolerance is applied; deciding mapping and fault tolerance simultaneously meets it.]

Slide 11/14: Optimization Strategy
- Design optimization:
  - Fault-tolerance policy assignment
  - Mapping of processes and messages
  - Root schedules
- Three tabu-search optimization algorithms:
  1. Mapping and fault-tolerance policy assignment (MRX): re-execution, replication, or both
  2. Mapping and only re-execution (MX)
  3. Mapping and only replication (MR)
[Figure: tabu search layered on top of list scheduling]
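
The slide pairs tabu search with list scheduling. As a point of reference, here is a textbook list scheduler for a fixed mapping, sketched in Python. It is a generic heuristic, not the authors' scheduler: it ignores fault tolerance, replaces TTP slot timing with an assumed fixed `comm_delay` for messages between nodes, and uses the longest WCET path to a sink (the bottom level) as the priority. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def list_schedule(wcet, mapping, edges, comm_delay=0):
    """Non-preemptive list scheduling of one process graph.

    wcet:       process -> WCET on the node it is mapped to
    mapping:    process -> node
    edges:      (src, dst) precedence pairs forming a DAG
    comm_delay: assumed fixed delay for messages between different nodes
                (stands in for TTP slot timing)
    Returns process -> (start, finish).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Priority: length of the longest WCET path from the process to a sink.
    prio = {}

    def bottom_level(p):
        if p not in prio:
            prio[p] = wcet[p] + max((bottom_level(s) for s in succs[p]), default=0)
        return prio[p]

    for p in wcet:
        bottom_level(p)

    node_free = defaultdict(int)                # time at which each node becomes idle
    waiting = {p: len(preds[p]) for p in wcet}  # unscheduled predecessors per process
    ready = [p for p in wcet if waiting[p] == 0]
    schedule = {}
    while ready:
        p = max(ready, key=lambda q: prio[q])  # highest-priority ready process
        ready.remove(p)
        data_ready = max(
            (schedule[u][1] + (comm_delay if mapping[u] != mapping[p] else 0) for u in preds[p]),
            default=0,
        )
        start = max(data_ready, node_free[mapping[p]])
        schedule[p] = (start, start + wcet[p])
        node_free[mapping[p]] = start + wcet[p]
        for s in succs[p]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return schedule
```

For instance, with hypothetical WCETs P1 = 30, P2 = 20, P3 = 40, P4 = 10, edges P1→P2, P1→P3, P2→P4, P3→P4, P3 alone on N2 and `comm_delay=5`, P3 runs from 35 to 75 and P4 from 80 to 90, because P4 must wait for the message from N2. A search algorithm would call such a scheduler once per candidate design to obtain its schedule length.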

Slide 12/14: MRX Tabu-Search Example
[Figure, k = 1: application with processes P1 to P4 and messages m1 to m3 on nodes N1 and N2, with a WCET table that is only partially recoverable, illustrating the design transformations explored by MRX]
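
The idea of searching over design transformations can be illustrated with a toy tabu search in Python. Everything in it is an assumption for illustration: three independent processes with hypothetical WCETs on two nodes, k = 1, a cost equal to the busiest node's load plus k times its longest re-executed process, and two kinds of transformations (switch a process between re-execution and replication, or remap a re-executed process). A moved process stays tabu for `tenure` iterations unless the move beats the best design found so far (aspiration). The MRX algorithm on the slides evaluates candidates with list scheduling on real process graphs; this sketch does not.

```python
K = 1                  # faults to tolerate (assumed)
NODES = ["N1", "N2"]   # with K = 1, a replicated process needs both nodes
WCET = {               # hypothetical WCETs: process -> node -> time
    "P1": {"N1": 20, "N2": 30},
    "P2": {"N1": 40, "N2": 40},
    "P3": {"N1": 30, "N2": 20},
}


def cost(design):
    """Toy cost: the busiest node's load plus its recovery slack.

    design maps each process to ("X", node) for re-execution or ("R", None)
    for replication on all nodes. Precedences and the bus are ignored.
    """
    load = {n: 0 for n in NODES}
    longest = {n: 0 for n in NODES}  # longest re-executed process per node
    for p, (policy, node) in design.items():
        if policy == "R":
            for n in NODES:
                load[n] += WCET[p][n]
        else:
            load[node] += WCET[p][node]
            longest[node] = max(longest[node], WCET[p][node])
    return max(load[n] + K * longest[n] for n in NODES)


def neighbours(design):
    """Design transformations: switch a process's policy or remap it."""
    for p, (policy, node) in design.items():
        if policy == "X":
            yield p, {**design, p: ("R", None)}
        for n in NODES:
            if n != node:
                yield p, {**design, p: ("X", n)}


def tabu_search(initial, iterations=30, tenure=2):
    current, best = initial, initial
    best_cost = cost(initial)
    tabu_until = {}  # process -> last iteration in which moving it is tabu
    for it in range(iterations):
        moves = [(cost(d), p, d) for p, d in neighbours(current)]
        # Keep non-tabu moves, plus tabu moves that beat the best (aspiration).
        allowed = [m for m in moves if tabu_until.get(m[1], -1) < it or m[0] < best_cost]
        if not allowed:
            continue
        c, p, current = min(allowed, key=lambda m: m[0])  # best move, even if worse
        tabu_until[p] = it + tenure
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost
```

Starting with all three processes re-executed on N1 (cost 130), the first iteration moves P2 to N2 and reaches a cost of 80. No design in this toy instance does better, because a re-executed P2 alone occupies 80 on its node, and replicating it leaves 40 on each node before P1 and P3 are placed.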

Slide 13/14: Experimental Results
- Schedulability improvement under resource constraints
[Figure: average % deviation from MRX (y axis, 0 to 100) versus number of processes (x axis, 20 to 100) for mapping and replication (MR), mapping and re-execution (MX), and mapping and policy assignment (MRX)]
- Case study: vehicle cruise controller
  - MRX: schedulable fault-tolerant application with 65% overhead

Slide 14/14: Contributions and Message
Contributions:
- Combined re-execution and replication
- Optimization algorithms for fault-tolerance policy assignment
- Efficient contingency schedule generation
Message: optimization of the fault-tolerance policy assignment is needed for cost-effective fault tolerance.

