1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping.

Slides:



Advertisements
Similar presentations
Courseware Scheduling of Distributed Real-Time Systems Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
Advertisements

1 of 14 1 /23 Flexibility Driven Scheduling and Mapping for Distributed Real-Time Systems Paul Pop, Petru Eles, Zebo Peng Department of Computer and Information.
Carnegie Mellon R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems Junsung Kim, Karthik Lakshmanan and Raj Rajkumar Electrical.
Real-Time Scheduling CIS700 Insup Lee October 3, 2005 CIS 700.
Martha Garcia.  Goals of Static Process Scheduling  Types of Static Process Scheduling  Future Research  References.
1 of 14 1 Analysis of Mixed Time-/Event-Triggered Distributed Embedded Systems Paul Pop, Traian Pop, Petru Eles, Zebo Peng Embedded Systems Laboratory.
Meeting Service Level Objectives of Pig Programs Zhuoyao Zhang, Ludmila Cherkasova, Abhishek Verma, Boon Thau Loo University of Pennsylvania Hewlett-Packard.
Transparent Environment for Replicated Ravenscar Applications Luís Miguel Pinho Francisco Vasques Ada-Europe 2002 Vienna, Austria June 2002.
1 of 16 April 25, 2006 Minimizing System Modification in an Incremental Design Approach Paul Pop, Petru Eles, Traian Pop, Zebo Peng Department of Computer.
P. Albertos* & A. Crespo + Universidad Politécnica de Valencia * Dept. of Systems Engineering and Control, + Dept. of Computer Engineering POB E
1 of 14 1/14 Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo.
Communication Scheduling for Time-Triggered SystemsSlide 1 Communication Scheduling for Time-Triggered Systems Paul Pop, Petru Eles and Zebo Peng Dept.
Bogdan Tanasa, Unmesh D. Bordoloi, Petru Eles, Zebo Peng Department of Computer and Information Science, Linkoping University, Sweden December 3, 2010.
1 of 30 June 14, 2000 Scheduling and Communication Synthesis for Distributed Real-Time Systems Paul Pop Department of Computer and Information Science.
1 of 10 April 25, 2001 Minimizing System Modification in an Incremental Design Approach Paul Pop, Petru Eles, Traian Pop, Zebo Peng Department of Computer.
Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 1 Process Scheduling for Performance Estimation and Synthesis.
1 of 14 1/15 Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems Paul Pop, Petru Eles, Zebo Peng Embedded.
Reliability-Aware Frame Packing for the Static Segment of FlexRay Bogdan Tanasa, Unmesh Bordoloi, Petru Eles, Zebo Peng Linkoping University, Sweden 1.
Scheduling with Optimized Communication for Time-Triggered Embedded Systems Slide 1 Scheduling with Optimized Communication for Time-Triggered Embedded.
1 of 16 March 30, 2000 Bus Access Optimization for Distributed Embedded Systems Based on Schedulability Analysis Paul Pop, Petru Eles, Zebo Peng Department.
1 of 12 May 3, 2000 Performance Estimation for Embedded Systems with Data and Control Dependencies Paul Pop, Petru Eles, Zebo Peng Department of Computer.
Real-Time Distributed Databases By: Chris Scardino CSC536 Monday, May 2, 2005.
Present by Chen, Ting-Wei Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids Maria Chtepen, Filip H.A. Claeys, Bart Dhoedt,
1 Oct 2, 2003 Design Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems Traian Pop, Petru Eles, Zebo Peng Embedded Systems Laboratory.
1 of 14 1 Analysis and Synthesis of Communication-Intensive Heterogeneous Real-Time Systems Paul Pop Computer and Information Science Dept. Linköpings.
1 of 14 1/15 Design Optimization of Multi-Cluster Embedded Systems for Real-Time Applications Paul Pop, Petru Eles, Zebo Peng, Viaceslav Izosimov Embedded.
Strategic Directions in Real- Time & Embedded Systems Aatash Patel 18 th September, 2001.
Holistic Scheduling and Analysis of Mixed Time/Event-Triggered Distributed Embedded System Traian Pop, Petru Eles, Zebo Peng EE249 Discussion Paper Review.
1 of 14 1 Scheduling and Optimization of Fault- Tolerant Embedded Systems Viacheslav Izosimov Embedded Systems Lab (ESLAB) Linköping University, Sweden.
1 of 16 June 21, 2000 Schedulability Analysis for Systems with Data and Control Dependencies Paul Pop, Petru Eles, Zebo Peng Department of Computer and.
CprE 458/558: Real-Time Systems
1 of 14 1 / 18 An Approach to Incremental Design of Distributed Embedded Systems Paul Pop, Petru Eles, Traian Pop, Zebo Peng Department of Computer and.
A Progressive Fault Tolerant Mechanism in Mobile Agent Systems Michael R. Lyu and Tsz Yeung Wong July 27, 2003 SCI Conference Computer Science Department.
DATE Optimizations of an Application- Level Protocol for Enhanced Dependability in FlexRay Wenchao Li 1, Marco Di Natale 2, Wei Zheng 1, Paolo Giusto.
Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei, Petru Eles, Zebo Peng, Jakob Rosen Presented By:
Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.
1 Memory and Time-Efficient Schedulability Analysis of Task Sets with Stochastic Execution Times Sorin Manolache, Petru Eles, Zebo Peng Department of Computer.
Low Contention Mapping of RT Tasks onto a TilePro 64 Core Processor 1 Background Introduction = why 2 Goal 3 What 4 How 5 Experimental Result 6 Advantage.
1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.
An efficient active replication scheme that tolerate failures in distributed embedded real-time systems Alain Girault, Hamoudi Kalla and Yves Sorel Pop.
Young Suk Moon Chair: Dr. Hans-Peter Bischof Reader: Dr. Gregor von Laszewski Observer: Dr. Minseok Kwon 1.
© Oxford University Press 2011 DISTRIBUTED COMPUTING Sunita Mahajan Sunita Mahajan, Principal, Institute of Computer Science, MET League of Colleges, Mumbai.
1 of 14 1/15 Synthesis-driven Derivation of Process Graphs from Functional Blocks for Time-Triggered Embedded Systems Master thesis Student: Ghennadii.
Integrated Scheduling and Synthesis of Control Applications on Distributed Embedded Systems Soheil Samii 1, Anton Cervin 2, Petru Eles 1, Zebo Peng 1 1.
Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems Junhe Gan, Flavius Gruian, Paul Pop, Jan Madsen.
Immune Genetic Algorithms for Optimization of Task Priorities and FlexRay Frame Identifiers Soheil Samii 1, Yanfei Yin 1,2, Zebo Peng 1, Petru Eles 1,
Hard Real-Time Scheduling for Low- Energy Using Stochastic Data and DVS Processors Flavius Gruian Department of Computer Science, Lund University Box 118.
5 May CmpE 516 Fault Tolerant Scheduling in Multiprocessor Systems Betül Demiröz.
Dynamic Scheduling and Control-Quality Optimization of Self-Triggered Control Applications Soheil Samii, Petru Eles, Zebo Peng Department of Computer and.
1 Scheduling Processes with Release Times, Deadlines, Precedence and Exclusion Relations J. Xu and D. L. Parnas IEEE Transactions on Software Engineering,
Research of P2P Architecture based on Cloud Computing Speaker : 吳靖緯 MA0G0101.
Tolerating Communication and Processor Failures in Distributed Real-Time Systems Hamoudi Kalla, Alain Girault and Yves Sorel Grenoble, November 13, 2003.
1 of 14 1/34 Embedded Systems Design: Optimization Challenges Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden.
A Fault-Tolerant Scheduling Algorithm for Real-Time Periodic Tasks with Possible Software Faults Ching-Chih Han, Kang G. Shin, and Jian Wu.
1 of 14 1/15 Schedulability-Driven Frame Packing for Multi-Cluster Distributed Embedded Systems Paul Pop, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB)
Spring EE 437 Lillevik 437s06-l22 University of Portland School of Engineering Advanced Computer Architecture Lecture 22 Distributed computer Interconnection.
Task Mapping and Partition Allocation for Mixed-Criticality Real-Time Systems Domițian Tămaș-Selicean and Paul Pop Technical University of Denmark.
Euro-Par, HASTE: An Adaptive Middleware for Supporting Time-Critical Event Handling in Distributed Environments ICAC 2008 Conference June 2 nd,
FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Aniruddha S. Gokhale
Fault-Tolerant Rate- Monotonic Scheduling Sunondo Ghosh, Rami Melhem, Daniel Mosse and Joydeep Sen Sarma.
1 Schedulability Analysis of Multiprocessor Real-Time Applications with Stochastic Task Execution Times Sorin Manolache, Petru Eles, Zebo Peng {sorma,
Optimization of Time-Partitions for Mixed-Criticality Real-Time Distributed Embedded Systems Domițian Tămaș-Selicean and Paul Pop Technical University.
Reliable energy management System reliability is affected by use of energy management The use of DVS increases the probability of faults, thus damaging.
Prabhat Kumar Saraswat Paul Pop Jan Madsen
Paul Pop, Petru Eles, Zebo Peng
Lecture 24: Process Scheduling Examples and for Real-time Systems
Fault and Energy Aware Communication Mapping with Guaranteed Latency for Applications Implemented on NoC Sorin Manolache, Petru Eles, Zebo Peng {sorma,
Buffer Space Optimisation with Communication Mapping and Traffic Shaping for NoCs Sorin Manolache, Petru Eles, Zebo Peng Linköping University, Sweden.
Replication-based Fault-tolerance for Large-scale Graph Processing
Sorin Manolache, Petru Eles, Zebo Peng {sorma, petel,
Presentation transcript:

1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping University, Sweden Paul Pop Computer Science & Engineering Group Technical University of Denmark, Denmark

2 of 14 2  Hard real-time applications  Time-constrained  Cost-constrained  Fault-tolerant  etc. Motivation  Focus on transient faults and intermittent faults

3 of 14 3 Outline  Motivation  Background and limitations of previous work  Contributions:  Scheduling with fault tolerance requirements  Trading-off transparency for performance  Fault tolerance policy assignment  Mapping optimization with transparency  Checkpoint optimization  Current work

4 of 14 4 P1P N1N1 P1P1 2 P1P1 P1P1 1 2 P1P1 P1P1 P 1/1 Fault Tolerance Techniques Error-detection overhead  N1N1 Re-execution Checkpointing overhead  P1P1 P1P1 1 2 N1N1 Rollback recovery with checkpointing Recovery overhead  P 1(1) P 1(2) N1N1 N2N2 P 1(1) P 1(2) N1N1 N2N2 Active replication P 1/2 P1P1 1 P 1/1 2 P 1/2 2 N1N1 1

5 of 14 5 Fault-Tolerant Time-Triggered Systems Processes: Re-execution, Active Replication, Rollback Recovery with Checkpointing Messages: Fault-tolerant predictable protocol … Transient faults P2P2 P4P4 P3P3 P5P5 P1P1 m1m1 m2m2 Maximum k transient faults within each application run (system period) Static Scheduling

6 of 14 6 Limitations of Previous Work  Design optimization with fault tolerance is limited  Process mapping is not considered together with fault tolerance issues  Multiple faults are not addressed in the framework of static cyclic scheduling  Transparency, if at all addressed, is restricted to a whole computation node

7 of 14 7 Scheduling with Fault Tolerance Reqirements

8 of 14 8 P1P1P1P1 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 k =

9 of 14 9 P1P1P1P1 P2P2P2P2 P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P2 40 k =

10 of P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P k = 2 P1P1P1P1 P 1/1 P 1/

11 of P2P2P2P P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P k = P 1/1 P 1/2 P 1/3

12 of P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P k = P 1/1 P 1/2 P2P2P2P2 P 2/1 P 2/2

13 of P1P1P1P1 P2P2P2P2 m1m1 Conditional Scheduling true P1P1 0 P2P k = P1P1P1P1 P2P2P2P2 P 2/1 P 2/2 P 2/3

14 of Conditional Schedule Table true P1P1 0 m1m1 P2P P1P1P1P1 P2P2P2P2 m1m1 k = 2 N1N1 N2N2

15 of Conditional Scheduling  Conditional scheduling:  Generates short schedules – Scheduling algorithm is very slow – Requires a lot of memory to store schedule tables  Alternative: shifting-based scheduling  Messages sent over the bus should be scheduled at one time  Faults on one computation node must not affect other nodes  Requires less memory  Schedule generation is very fast – Schedules are longer

16 of Root Schedules P2P2 m1m1 P1P1 m2m2 m3m3 P4P4 P3P3 N1N1 N2N2 Bus P1P1 P1P1 Worst-case scenario for P 1 Recovery slack for P 1 and P 2

17 of Extracting Execution Scenarios P2P2 m1m1 P1P1 m2m2 m3m3 P 4/1 P3P3 N1N1 N2N2 Bus P 4/2 P 4/3

18 of Trading-off Transparency for Performance

19 of Good for debugging and testing FT Implementations with Transparency P2P2P2P2 P4P4P4P4 P3P3P3P3 P5P5P5P5 P1P1P1P1 m1m1 m2m2 – regular processes/messages – frozen processes/messages P3P3P3P3 Frozen Transparency is achieved with frozen processes and messages

20 of N1N1 N2N2 P1P1 P2P2 P3P3 N1N1 30X 20 X X N2N2 P4P4 X30  = 5 ms k = 2 No Transparency Deadline P2P2P2P2 P1P1P1P1 P4P4P4P4 m2m2 m1m1 m3m3 P3P3P3P3 P2P2P2P2 m1m1 P1P1P1P1 m2m2 m3m3 P4P4P4P4 P3P3P3P3 no fault scenario N1N1 N2N2 bus P2P2P2P2 m1m1 P1P1P1P1 m2m2 P4P4P4P4 P3P3P3P3 P1P1P1P1 P4P4P4P4 m3m3 the worst-case fault scenario N1N1 N2N2 bus processes start at different times messages are sent at different times

21 of Full Transparency Customized Transparency P2P2P2P2 P1P1P1P1 m2m2 m3m3 P3P3P3P3 P3P3P3P3 m1m1 P4P4P4P4 P3P3P3P3 Customized transparency P2P2P2P2 m1m1 P1P1P1P1 P4P4P4P4 m2m2 m3m3 P3P3P3P3 no fault scenario P2P2P2P2 m1m1 P1P1P1P1 P1P1P1P1 P1P1P1P1 P4P4P4P4 m2m2 m3m3 P3P3P3P3 P2P2P2P2 m1m1 P1P1P1P1 m2m2 P4P4P4P4 P3P3P3P3 P1P1P1P1 P4P4P4P4 m3m3 No transparency Deadline P2P2P2P2 m1m1 P1P1P1P1 P4P4P4P4 m2m2 m3m3 P3P3P3P3 P3P3P3P3 P3P3P3P3 Full transparency

22 of Trading-Off Transparency for Performance 0% 25% 50% 75% 100% k=1k=2k=3k=1k=2k=3k=1k=2k=3k=1k=2k=3k=1k=2k= Four (4) computation nodes Recovery time 5 ms  Trading transparency for performance is essential  How longer is the schedule length with fault tolerance? increasing transparency

23 of Fault Tolerance Policy Assignment

24 of Fault Tolerance Policy Assignment P 1/1 P 1/2 P 1/3 Re-execution N1N1 P 1(1) P 1(2) P 1(3) Replication N1N1 N2N2 N3N3 P 1(1)/1 P 1(2) N1N1 N2N2 P 1(1)/2 Re-executed replicas 2

25 of Re-execution vs. Replication N1N1 N2N2 P1P1 P3P3 P2P2 m1m1 1 P1P1 P2P2 P3P3 N1N1 N2N A1A1 P1P1 P3P3 P2P2 m1m1 m2m2 A2A2 N1N1 N2N2 bus P1P1 P2P2 P3P3 Missed Deadline P 1(1) N1N1 N2N2 bus P 1(2) P 2(1) P 2(2) P 3(1) P 3(2) Met m 1(2) m 1(1) m 2(2) m 2(1) Replication is better N1N1 N2N2 bus P1P1 P2P2 P3P3 Met Deadline P 1(1) N1N1 N2N2 bus P 1(2) P 2(1) P 2(2) P 3(1) P 3(2) Missed m 1(2) m 1(1) Re-execution is better

26 of P1P1 N1N1 N2N2 bus P2P2 P3P3 P4P4 m2m2 Missed N1N1 N2N2 P 1(1) P 3(2) P 4(1) P 2(1) P 1(2) bus P 2(2) P 3(1) P 4(2) Missed m 1(2) m 1(1) m 2(1) m 2(2) m 3(1) m 3(2) N1N1 N2N2 P 1(1) P3P3 P4P4 P2P2 P 1(2) m 2(1) m 1(2) bus Met Optimization of fault tolerance policy assignment Fault Tolerance Policy Assignment P1P1 P2P2 P3P3 N1N1 N2N P4P N1N1 N2N2 P1P1 P4P4 P2P2 P3P3 m1m1 m2m2 m3m3 Deadline

27 of Optimization Strategy  Design optimization:  Fault tolerance policy assignment  Mapping of processes and messages  Root schedules  Three tabu-search optimization algorithms: 1.Mapping and Fault Tolerance Policy assignment (MRX)  Re-execution, replication or both 2.Mapping and only Re-Execution (MX) 3.Mapping and only Replication (MR) Tabu-search Shifting-based scheduling

28 of Experimental Results Mapping and replication (MR) 20 Mapping and re-execution (MX) Mapping and policy assignment (MRX) Number of processes Avgerage % deviation from MRX Schedulability improvement under resource constraints

29 of Checkpoint Optimization

30 of Locally Optimal Number of Checkpoints  1 = 15 ms k = 2  1 = 5 ms  1 = 10 ms P1P1 C 1 = 50 ms No. of checkpoints 2 P1P1 P1P1 12 P1P1 P1P1 P1P P1P1 P1P1 P1P1 P1P P1P1 5 P1P1 P1P1 P1P1 P1P1 P1P

31 of Globally Optimal Number of Checkpoints P2P2 P1P1 m1m1   P1P1 P2P2 P1P1 C 1 = 50 ms P2P2 C 2 =60 ms k = P1P1 P1P1 P1P1 123 P2P2 P2P2 P2P2 123 P1P1 P2P2 P2P2 P1P

32 of % 10% 20% 30% 40% Global Optimization of Checkpoint Distribution (MC) % deviation from MC0 ( how smaller the fault tolerance overhead ) Application size (the number of tasks) 4 nodes, 3 faults Local Optimization of Checkpoint Distribution (MC0) Global Optimization vs. Local Optimization Does the optimization reduce the fault tolerance overheads on the schedule length?

33 of Conclusions  Scheduling with fault tolerance requirements  Two novel scheduling techniques  Handling customized transparency requirements  Design optimization with fault tolerance  Policy assignment optimization strategies  Checkpoint optimization

34 of To be continued… Design Optimization and Scheduling of Mixed Soft-Hard Real-Time Systems Soft Processes Soft Deadlines Hard Processes Hard Deadlines 

35 of Design Optimization of Embedded Systems with Fault Tolerance is Essential