Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping.

Similar presentations


Presentation on theme: "1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping."— Presentation transcript:

1 1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping University, Sweden Paul Pop Computer Science & Engineering Group Technical University of Denmark, Denmark

2 2 of 14 2  Hard real-time applications  Time-constrained  Cost-constrained  Fault-tolerant  etc. Motivation  Focus on transient faults and intermittent faults

3 3 of 14 3 Outline  Motivation  Background and limitations of previous work  Contributions:  Scheduling with fault tolerance requirements  Trading-off transparency for performance  Fault tolerance policy assignment  Mapping optimization with transparency  Checkpoint optimization  Current work

4 4 of 14 4 P1P1 0 20 40 60 N1N1 P1P1 2 P1P1 P1P1 1 2 P1P1 P1P1 P 1/1 Fault Tolerance Techniques Error-detection overhead  N1N1 Re-execution Checkpointing overhead  P1P1 P1P1 1 2 N1N1 Rollback recovery with checkpointing Recovery overhead  P 1(1) P 1(2) N1N1 N2N2 P 1(1) P 1(2) N1N1 N2N2 Active replication P 1/2 P1P1 1 P 1/1 2 P 1/2 2 N1N1 1

5 5 of 14 5 Fault-Tolerant Time-Triggered Systems Processes: Re-execution, Active Replication, Rollback Recovery with Checkpointing Messages: Fault-tolerant predictable protocol … Transient faults P2P2 P4P4 P3P3 P5P5 P1P1 m1m1 m2m2 Maximum k transient faults within each application run (system period) Static Scheduling

6 6 of 14 6 Limitations of Previous Work  Design optimization with fault tolerance is limited  Process mapping is not considered together with fault tolerance issues  Multiple faults are not addressed in the framework of static cyclic scheduling  Transparency, if at all addressed, is restricted to a whole computation node

7 7 of 14 7 Scheduling with Fault Tolerance Reqirements

8 8 of 14 8 P1P1P1P1 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 k = 2 020406080100120140160180200

9 9 of 14 9 P1P1P1P1 P2P2P2P2 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 40 k = 2 020406080100120140160180200

10 10 of 14 10 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 45 40 k = 2 P1P1P1P1 P 1/1 P 1/2 020406080100120140160180200

11 11 of 14 11 P2P2P2P2 020406080100120140 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 45 40 k = 2 160180200 90 130 P 1/1 P 1/2 P 1/3

12 12 of 14 12 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 45 40 k = 2 90 13014085 020406080100120140160180200 P 1/1 P 1/2 P2P2P2P2 P 2/1 P 2/2

13 13 of 14 13 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 45 40 k = 2 90 130150 95 14085 020406080100120140160180200 P1P1P1P1 P2P2P2P2 P 2/1 P 2/2 P 2/3

14 14 of 14 14 Conditional Schedule Table true P1P1 0 m1m1 P2P2 45 40 50 90 130 140160 105 150 85 95 P1P1P1P1 P2P2P2P2 m1m1 k = 2 N1N1 N2N2

15 15 of 14 15 Conditional Scheduling  Conditional scheduling:  Generates short schedules – Scheduling algorithm is very slow – Requires a lot of memory to store schedule tables  Alternative: shifting-based scheduling  Messages sent over the bus should be scheduled at one time  Faults on one computation node must not affect other nodes  Requires less memory  Schedule generation is very fast – Schedules are longer

16 16 of 14 16 Root Schedules P2P2 m1m1 P1P1 m2m2 m3m3 P4P4 P3P3 N1N1 N2N2 Bus P1P1 P1P1 Worst-case scenario for P 1 Recovery slack for P 1 and P 2

17 17 of 14 17 Extracting Execution Scenarios P2P2 m1m1 P1P1 m2m2 m3m3 P 4/1 P3P3 N1N1 N2N2 Bus P 4/2 P 4/3

18 18 of 14 18 Trading-off Transparency for Performance

19 19 of 14 19 Good for debugging and testing FT Implementations with Transparency P2P2P2P2 P4P4P4P4 P3P3P3P3 P5P5P5P5 P1P1P1P1 m1m1 m2m2 – regular processes/messages – frozen processes/messages P3P3P3P3 Frozen Transparency is achieved with frozen processes and messages

20 20 of 14 20 N1N1 N2N2 P1P1 P2P2 P3P3 N1N1 30X 20 X X N2N2 P4P4 X30  = 5 ms k = 2 No Transparency Deadline P2P2P2P2 P1P1P1P1 P4P4P4P4 m2m2 m1m1 m3m3 P3P3P3P3 P2P2P2P2 m1m1 P1P1P1P1 m2m2 m3m3 P4P4P4P4 P3P3P3P3 no fault scenario N1N1 N2N2 bus P2P2P2P2 m1m1 P1P1P1P1 m2m2 P4P4P4P4 P3P3P3P3 P1P1P1P1 P4P4P4P4 m3m3 the worst-case fault scenario N1N1 N2N2 bus processes start at different times messages are sent at different times

21 21 of 14 21 Full Transparency Customized Transparency P2P2P2P2 P1P1P1P1 m2m2 m3m3 P3P3P3P3 P3P3P3P3 m1m1 P4P4P4P4 P3P3P3P3 Customized transparency P2P2P2P2 m1m1 P1P1P1P1 P4P4P4P4 m2m2 m3m3 P3P3P3P3 no fault scenario P2P2P2P2 m1m1 P1P1P1P1 P1P1P1P1 P1P1P1P1 P4P4P4P4 m2m2 m3m3 P3P3P3P3 P2P2P2P2 m1m1 P1P1P1P1 m2m2 P4P4P4P4 P3P3P3P3 P1P1P1P1 P4P4P4P4 m3m3 No transparency Deadline P2P2P2P2 m1m1 P1P1P1P1 P4P4P4P4 m2m2 m3m3 P3P3P3P3 P3P3P3P3 P3P3P3P3 Full transparency

22 22 of 14 22 Trading-Off Transparency for Performance 0% 25% 50% 75% 100% k=1k=2k=3k=1k=2k=3k=1k=2k=3k=1k=2k=3k=1k=2k=3 20 244463326092397411548831334886139 40 172943204058284972346090396697 60 122434133043193958285479325886 80 81622101829142739244166274373 Four (4) computation nodes Recovery time 5 ms  Trading transparency for performance is essential 2940496066  How longer is the schedule length with fault tolerance? increasing transparency

23 23 of 14 23 Fault Tolerance Policy Assignment

24 24 of 14 24 Fault Tolerance Policy Assignment P 1/1 P 1/2 P 1/3 Re-execution N1N1 P 1(1) P 1(2) P 1(3) Replication N1N1 N2N2 N3N3 P 1(1)/1 P 1(2) N1N1 N2N2 P 1(1)/2 Re-executed replicas 2

25 25 of 14 25 Re-execution vs. Replication N1N1 N2N2 P1P1 P3P3 P2P2 m1m1 1 P1P1 P2P2 P3P3 N1N1 N2N2 4050 40 60 50 70 A1A1 P1P1 P3P3 P2P2 m1m1 m2m2 A2A2 N1N1 N2N2 bus P1P1 P2P2 P3P3 Missed Deadline P 1(1) N1N1 N2N2 bus P 1(2) P 2(1) P 2(2) P 3(1) P 3(2) Met m 1(2) m 1(1) m 2(2) m 2(1) Replication is better N1N1 N2N2 bus P1P1 P2P2 P3P3 Met Deadline P 1(1) N1N1 N2N2 bus P 1(2) P 2(1) P 2(2) P 3(1) P 3(2) Missed m 1(2) m 1(1) Re-execution is better

26 26 of 14 26 P1P1 N1N1 N2N2 bus P2P2 P3P3 P4P4 m2m2 Missed N1N1 N2N2 P 1(1) P 3(2) P 4(1) P 2(1) P 1(2) bus P 2(2) P 3(1) P 4(2) Missed m 1(2) m 1(1) m 2(1) m 2(2) m 3(1) m 3(2) N1N1 N2N2 P 1(1) P3P3 P4P4 P2P2 P 1(2) m 2(1) m 1(2) bus Met Optimization of fault tolerance policy assignment Fault Tolerance Policy Assignment P1P1 P2P2 P3P3 N1N1 N2N2 4050 60 80 P4P4 4050 1 N1N1 N2N2 P1P1 P4P4 P2P2 P3P3 m1m1 m2m2 m3m3 Deadline

27 27 of 14 27 Optimization Strategy  Design optimization:  Fault tolerance policy assignment  Mapping of processes and messages  Root schedules  Three tabu-search optimization algorithms: 1.Mapping and Fault Tolerance Policy assignment (MRX)  Re-execution, replication or both 2.Mapping and only Re-Execution (MX) 3.Mapping and only Replication (MR) Tabu-search Shifting-based scheduling

28 28 of 14 28 80 20 Experimental Results 0 10 30 40 50 60 70 90 100 20406080100 80 Mapping and replication (MR) 20 Mapping and re-execution (MX) Mapping and policy assignment (MRX) Number of processes Avgerage % deviation from MRX Schedulability improvement under resource constraints

29 29 of 14 29 Checkpoint Optimization

30 30 of 14 30 3 Locally Optimal Number of Checkpoints  1 = 15 ms k = 2  1 = 5 ms  1 = 10 ms P1P1 C 1 = 50 ms No. of checkpoints 2 P1P1 P1P1 12 P1P1 P1P1 P1P1 123 4 P1P1 P1P1 P1P1 P1P1 1234 1 P1P1 5 P1P1 P1P1 P1P1 P1P1 P1P1 12345 3

31 31 of 14 31 Globally Optimal Number of Checkpoints P2P2 P1P1 m1m1   105 5 P1P1 P2P2 P1P1 C 1 = 50 ms P2P2 C 2 =60 ms k = 2 265 P1P1 P1P1 P1P1 123 P2P2 P2P2 P2P2 123 P1P1 P2P2 P2P2 P1P1 1212 255

32 32 of 14 32 0% 10% 20% 30% 40% 406080100 Global Optimization of Checkpoint Distribution (MC) % deviation from MC0 ( how smaller the fault tolerance overhead ) Application size (the number of tasks) 4 nodes, 3 faults Local Optimization of Checkpoint Distribution (MC0) Global Optimization vs. Local Optimization Does the optimization reduce the fault tolerance overheads on the schedule length?

33 33 of 14 33 Conclusions  Scheduling with fault tolerance requirements  Two novel scheduling techniques  Handling customized transparency requirements  Design optimization with fault tolerance  Policy assignment optimization strategies  Checkpoint optimization

34 34 of 14 34 To be continued… Design Optimization and Scheduling of Mixed Soft-Hard Real-Time Systems Soft Processes Soft Deadlines Hard Processes Hard Deadlines 

35 35 of 14 35 Design Optimization of Embedded Systems with Fault Tolerance is Essential


Download ppt "1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping."

Similar presentations


Ads by Google