Published by Camron Phillips. Modified over 8 years ago.
1
An Algorithm for Automatically Obtaining Distributed and Fault-Tolerant Static Schedules
Alain Girault - Hamoudi Kalla - Yves Sorel - Mihaela Sighireanu
POP ART team & OSTRE team
San Francisco, USA, June 23, 2003
2
Outline
  Introduction
  Modeling distributed real-time systems
  Problem: How to introduce fault-tolerance?
  The proposed solution for fault-tolerance
  Principles and example
  Simulations
  Conclusion and future work
3
1. Introduction
[Diagram: design flow. Labels: High-level program; Compiler; Model of the algorithm; Architecture specification; Distribution constraints; Execution times; Real-time constraints; Failure specification; Distribution and scheduling fault-tolerant heuristic; Fault-tolerant distributed static schedule; Code generator; Fault-tolerant distributed code]
4
2. Modeling distributed real-time systems
a. Algorithm model
  [Diagram: algorithm graph with operations I1, I2, A, B, C, O]
  "I1 and I2" are input operations
  "O" is the output operation
  "A, B and C" are computation operations
b. Architecture model
  [Diagram: architecture graph with processors P1, P2, P3 and links m1, m2, m3]
  "P1, P2 and P3" are processors
  "m1, m2 and m3" are communication links
5
3. Problem: How to introduce fault-tolerance?
Problem: Find a distributed schedule of the algorithm on the architecture that is tolerant to processor failures.
[Diagram: the algorithm graph (I1, I2, A, B, C, O) is scheduled onto the architecture graph (P1, P2, P3, m1, m2, m3)]
6
4. The proposed solution for fault-tolerance
Solution: A list scheduling heuristic which uses the active software replication of operations and communications.
Assumption: Processors are assumed to be fail-silent.
Tolerate a number of processor failures Npf ≥ 1.
7
4. The proposed solution for fault-tolerance
Principles (1)
Each operation/communication is replicated at least Npf+1 times on different processors/links of the architecture graph.
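The reason Npf+1 replicas on distinct processors suffice can be checked by brute force. The following is a minimal Python sketch, not part of the original slides: the function name `tolerates` and the two sample placements are illustrative assumptions. It enumerates every set of Npf failed processors and verifies that each operation still has at least one live replica.

```python
from itertools import combinations

def tolerates(placement, processors, npf):
    """Return True if every operation keeps a live replica under any npf failures.

    placement maps an operation name to the set of processors hosting its replicas.
    Processors are assumed fail-silent: a failed processor simply produces nothing.
    """
    for failed in combinations(processors, npf):
        failed = set(failed)
        for replicas in placement.values():
            if not (set(replicas) - failed):
                return False  # all replicas of this operation are on failed processors
    return True

processors = ["P1", "P2", "P3"]
# Hypothetical placements, for illustration only.
replicated = {"I1": {"P1", "P2"}, "A": {"P1", "P3"}}
unreplicated = {"I1": {"P1"}, "A": {"P1", "P3"}}

print(tolerates(replicated, processors, npf=1))    # True
print(tolerates(unreplicated, processors, npf=1))  # False
```

With two replicas per operation, any single failure leaves one replica alive; with a single replica of I1, the failure of P1 loses it.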
8
4. The proposed solution for fault-tolerance
Principles (2)
9
4. The proposed solution for fault-tolerance
Principles (3)
10
4. The proposed solution for fault-tolerance
Principles (4)
The schedule pressure is used as a cost function to select the best processor p for each operation o.
[Formula and definitions of its terms not preserved in this transcript]
11
5. Heuristic
1. Initialise the list of candidate operations to { o | o is an input operation } and the list of scheduled operations to the empty set;
2. While the list of candidate operations is not empty do
   a. Compute the schedule pressure for each candidate operation o on each processor p and keep the smallest Npf+1 results;
   b. Select the best candidate operation o_best, which has the greatest schedule pressure (o_best, p);
   c. Schedule o_best on each processor p computed at step a; the communications implied by this schedule are replicated Npf+1 times and scheduled on parallel links;
   d. Try to minimise the start time of o_best on each processor p computed at step a by replicating its predecessors on p [Ahmad et al.];
   e. Update the lists: add o_best to the scheduled operations, remove it from the candidate operations, and add to the candidate operations each successor o of o_best whose predecessors are all scheduled;
   end while;
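The loop above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the schedule pressure is passed in as a caller-supplied function `pressure(o, p)` because its formula is not given here, step b is assumed to compare candidates by the largest of their kept pressures, and steps c (communication scheduling) and d (predecessor replication) are left out. The graph edges and the pressure values in the usage lines are made up.

```python
def ft_list_schedule(preds, processors, npf, pressure):
    """Return {operation: [npf+1 processors]} in the order operations were scheduled.

    preds    : dict mapping each operation to the list of its predecessors
    pressure : callable (operation, processor) -> number, a stand-in cost function
    """
    succs = {o: [] for o in preds}
    for o, ps in preds.items():
        for q in ps:
            succs[q].append(o)

    scheduled = {}  # dicts preserve insertion order
    candidates = {o for o, ps in preds.items() if not ps}  # step 1: input operations

    while candidates:  # step 2
        # a. for each candidate, keep the npf+1 processors with the smallest pressure
        kept = {
            o: sorted(processors, key=lambda p, o=o: (pressure(o, p), p))[: npf + 1]
            for o in candidates
        }
        # b. select the candidate with the greatest kept pressure (assumed rule)
        o_best = max(
            sorted(candidates),
            key=lambda o: max(pressure(o, p) for p in kept[o]),
        )
        # c. place one replica of o_best on each kept processor
        scheduled[o_best] = kept[o_best]
        # e. update the candidate list with successors whose predecessors are all scheduled
        candidates.remove(o_best)
        for s in succs[o_best]:
            if all(q in scheduled for q in preds[s]):
                candidates.add(s)
    return scheduled

# Hypothetical graph edges and pressure values, for illustration only.
preds = {"I1": [], "I2": [], "A": ["I1", "I2"], "B": ["I1"], "C": ["A", "B"], "O": ["C"]}
made_up_pressure = {"P1": 1, "P2": 2, "P3": 3}
result = ft_list_schedule(preds, ["P1", "P2", "P3"], npf=1,
                          pressure=lambda o, p: made_up_pressure[p])
for op, procs in result.items():
    print(op, procs)
```

Every operation ends up with Npf+1 replicas on distinct processors, and an operation is only scheduled after all its predecessors.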
12
5. Example
Algorithm graph: operations I1, I2, A, B, C, O.
Architecture graph: processors P1, P2, P3; links m1, m2, m3.
Failures: number of fail-silent processor failures that the system must tolerate, Npf = 1.
14
5. Example
Step 1. (1)
Npf = 1
Scheduled operations = { }
Candidate operations = { I1, I2 }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; empty]
15
5. Example
Step 2. (1)
Npf = 1
Schedule I1 on P1 and P2
Scheduled operations: { } → { I1 }
Candidate operations: { I1, I2 } → { I2, B }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 on P1 and P2]
16
5. Example
Step 2. (2)
Npf = 1
Schedule I2 on P1 and P2
Scheduled operations: { I1 } → { I1, I2 }
Candidate operations: { I2, B } → { A, B }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2]
17
5. Example
Step 2. (3)
Npf = 1
Scheduled operations = { I1, I2 }
Candidate operations = { A, B }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2]
19
5. Example
Step 2.a. (3)
Npf = 1
Scheduled operations = { I1, I2 }
Candidate operations = { A, B }
Schedule pressures:
  ( A, { P1, P2, P3 } ) = { 7, 10, 9 }
  ( B, { P1, P2, P3 } ) = { 9, 6, 8 }
Min (keep the smallest Npf+1 results):
  ( A, { P1, P3 } ) = { 7, 9 }
  ( B, { P2, P3 } ) = { 6, 8 }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2]
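Step 2.a on these numbers can be reproduced with a few lines of Python. The pressure values are those shown on the slide; the function name `keep_smallest` and the dictionary layout are illustrative choices.

```python
import heapq

def keep_smallest(pressures, npf):
    """Keep the npf+1 (processor, pressure) pairs with the smallest pressure."""
    return heapq.nsmallest(npf + 1, pressures.items(), key=lambda item: item[1])

# Schedule pressures of A and B on P1, P2, P3, as listed on the slide.
pressures = {
    "A": {"P1": 7, "P2": 10, "P3": 9},
    "B": {"P1": 9, "P2": 6, "P3": 8},
}
kept = {op: keep_smallest(table, npf=1) for op, table in pressures.items()}
print(kept)
# {'A': [('P1', 7), ('P3', 9)], 'B': [('P2', 6), ('P3', 8)]}
```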
21
5. Example
Step 2.b. (3)
Npf = 1
Scheduled operations = { I1, I2 }
Candidate operations = { A, B }
Kept schedule pressures:
  ( A, { P1, P3 } ) = { 7, 9 }
  ( B, { P2, P3 } ) = { 6, 8 }
Max (select the best candidate):
  ( A, { P1, P3 } ) = { 7, 9 }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2]
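The selection in step 2.b can be written as a one-line maximum. This sketch assumes candidates are compared by the largest of their kept pressures (9 for A against 8 for B); comparing by the smallest kept value (7 against 6) would select A as well on these numbers. The name `select_best` is illustrative.

```python
# Kept schedule pressures from step 2.a, as listed on the slide.
kept = {"A": {"P1": 7, "P3": 9}, "B": {"P2": 6, "P3": 8}}

def select_best(kept):
    """Return the candidate whose kept pressures contain the greatest value."""
    return max(kept, key=lambda op: max(kept[op].values()))

best = select_best(kept)
print(best, sorted(kept[best]))  # A ['P1', 'P3']
```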
23
5. Example
Step 2.c. (3)
Npf = 1
( A, { P1, P3 } ) = { 7, 9 }
Schedule A on P1 and P3
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2; A on P1 and P3]
25
5. Example
Step 2.d. (3)
Npf = 1
Replicating I2 on P3
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2; A on P1 and P3; an additional replica of I2 on P3]
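The trade-off behind step 2.d can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a single predecessor, whose own inputs are available locally, and made-up timing values (the slide gives none). The operation starts either when the remote result arrives over a link or after the predecessor has been re-executed locally, whichever is earlier.

```python
def start_time(proc_free_at, pred_done_remote, comm_time, pred_exec_time):
    """Return (earliest start time, replicate_locally) for one operation on one processor.

    proc_free_at     : time at which the processor becomes free
    pred_done_remote : completion time of the predecessor's replica on another processor
    comm_time        : time to transfer the predecessor's result over a link
    pred_exec_time   : execution time of the predecessor on this processor
    """
    wait_for_message = max(proc_free_at, pred_done_remote + comm_time)
    recompute_locally = proc_free_at + pred_exec_time
    if recompute_locally < wait_for_message:
        return recompute_locally, True
    return wait_for_message, False

# Hypothetical timings: a slow link makes local replication worthwhile.
print(start_time(proc_free_at=0, pred_done_remote=2, comm_time=3, pred_exec_time=2))    # (2, True)
# Hypothetical timings: a fast link makes waiting for the message cheaper.
print(start_time(proc_free_at=0, pred_done_remote=2, comm_time=0.5, pred_exec_time=4))  # (2.5, False)
```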
27
5. Example
Step 2.e. (3)
Npf = 1
Scheduled operations: { I1, I2 } → { I1, I2, A }
Candidate operations: { A, B } → { B }
[Schedule chart with columns P1, m2, P3, m3, P2, m1; I1 and I2 on P1 and P2; A on P1 and P3; an additional replica of I2 on P3]
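The list update of step 2.e can be sketched as follows. The edges of the algorithm graph are not spelled out in this transcript, so the `preds` and `succs` tables below are assumptions chosen to be consistent with the lists shown on the slides (in particular, C is assumed to depend on both A and B, which is why it does not become a candidate yet).

```python
def update_lists(o_best, scheduled, candidates, preds, succs):
    """Move o_best to the scheduled set and add its newly ready successors."""
    scheduled = scheduled | {o_best}
    candidates = candidates - {o_best}
    ready = {o for o in succs.get(o_best, ()) if set(preds[o]) <= scheduled}
    return scheduled, candidates | ready

# Assumed edges, for illustration only.
preds = {"A": ["I2"], "B": ["I1"], "C": ["A", "B"]}
succs = {"A": ["C"], "B": ["C"]}

scheduled, candidates = update_lists("A", {"I1", "I2"}, {"A", "B"}, preds, succs)
print(sorted(scheduled), sorted(candidates))  # ['A', 'I1', 'I2'] ['B']
```

Scheduling B next would make C a candidate, since both of its assumed predecessors would then be scheduled.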
28
6. Simulations
Aim: Compare the proposed heuristic with the HBP heuristic [Hashimoto et al. 2002].
Assumptions:
  Architecture with fully connected processors,
  Number of fail-silent processors Npf = 1.
Simulation parameters:
  Communication-to-computation ratio, defined as the average communication time divided by the average computation time: CCR = 0.1, 0.5, 1, 2, 5 and 10,
  Number of operations: N = 10, 20, …, 80.
Comparison parameter:
  Overhead = ( length (HTBR or HBP) - length (HTBR without fault-tolerance) ) / length (HTBR without fault-tolerance) x 100 %
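The two quantities defined on this slide are simple ratios. The Python sketch below computes them; the function names and the sample numbers are illustrative assumptions, not simulation results from the presentation.

```python
def overhead_percent(length_ft, length_non_ft):
    """Overhead of a fault-tolerant schedule relative to the non-fault-tolerant one, in %."""
    return (length_ft - length_non_ft) / length_non_ft * 100.0

def ccr(comm_times, comp_times):
    """Communication-to-computation ratio: average communication time / average computation time."""
    avg_comm = sum(comm_times) / len(comm_times)
    avg_comp = sum(comp_times) / len(comp_times)
    return avg_comm / avg_comp

# Hypothetical schedule lengths and durations, for illustration only.
print(overhead_percent(length_ft=15, length_non_ft=10))  # 50.0
print(ccr(comm_times=[1, 3], comp_times=[4, 4]))         # 0.5
```

An overhead of 50 % means the fault-tolerant schedule is one and a half times as long as the schedule produced without fault-tolerance.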
29
Impact of the number of operations
[Two plots: "No processor failure" and "One processor fails"]
30
Impact of the communication-to-computation ratio
[Two plots: "No processor failure" and "One processor fails"]
31
7. Conclusion and future work
Result:
  A new scheduling heuristic based on the active replication strategy.
  It produces a static distributed schedule of a given algorithm on a given distributed architecture, tolerant to Npf processor failures.
Future work:
  A new fault-tolerant scheduling heuristic that:
    tolerates processor and communication link failures,
    maximises the system's reliability.