

1 Enhancing the Fault-Tolerance of Nonmasking Programs
Sandeep S. Kulkarni and Ali Ebnenasir
Software Engineering and Network Systems Laboratory, Computer Science and Engineering Department, Michigan State University

2 Acknowledgement: This work is partially sponsored by NSF, DARPA NEST, ONR URI, and Michigan State University.

3 Motivation
- Programs are subject to unanticipated faults. When we encounter new classes of faults, we must add the corresponding fault-tolerance.
- How do we add fault-tolerance?
  - Develop the program from scratch (an expensive approach).
  - Incrementally add fault-tolerance. This reuses the behaviors of the fault-intolerant program and can preserve properties that are hard to specify (e.g., efficiency).
- How do we ensure correctness?
  - After-the-fact verification.
  - Automatic addition of fault-tolerance (correct by construction).

4 Motivation (Continued)
- Problem: the complexity of automatic addition. Automatically adding fault-tolerance to distributed programs is NP-hard [FTRTFT00], [ICDCS02].
- How do we deal with this complexity?
  - Develop heuristics.
  - Identify the boundary of polynomial-time addition.
  - Add fault-tolerance step-wise, through weaker forms of fault-tolerance.
- The goal of this paper:
  - Enhance the fault-tolerance of nonmasking programs.
  - Partially automate the design of fault-tolerant programs.

5 Outline
- Preliminary concepts
- Enhancement problem
- Enhancement in the high atomicity model
- Enhancement for distributed programs
- Example: Byzantine agreement program
- Conclusion and future work

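6 Preliminary Concepts: Programs and Faults
- Finite state space $S_p$.
- Invariant $S$ and fault-span $T$, with $S \subseteq T \subseteq S_p$.
- Program $p$ and fault $f$: sets of transitions over $S_p$.
- Safety: a set of transitions $\subseteq \{(s_0, s_1) \mid (s_0, s_1) \in S_p \times S_p\}$.
- Levels of fault-tolerance: failsafe, nonmasking, masking.
[Figure: invariant $S$ inside fault-span $T$ inside $S_p$, with program ($p$), fault ($f$), and program-plus-fault ($p[]f$) transitions.]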
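To make these definitions concrete, here is a minimal Python sketch. It is our illustration, not code from the talk. States are hashable values, a program or a fault is a set of (source, target) transitions, and safety is modeled as a set of bad transitions (an assumption about how the safety set is used). The helper names `is_closed` and `violates_safety` and the toy system are hypothetical.

```python
from typing import Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def is_closed(pred: Set[State], transitions: Set[Transition]) -> bool:
    """True if no transition that starts in `pred` ends outside it."""
    return all(s1 in pred for (s0, s1) in transitions if s0 in pred)


def violates_safety(transitions: Set[Transition], bad: Set[Transition]) -> bool:
    """Safety is modeled (assumption) as a set of bad transitions that must not occur."""
    return bool(transitions & bad)


# Hypothetical toy system: p alternates 0 <-> 1 and recovers 2 -> 0;
# the fault f perturbs state 1 to state 2.
S_p = {0, 1, 2, 3}
p = {(0, 1), (1, 0), (2, 0)}
f = {(1, 2)}
S = {0, 1}       # invariant
T = {0, 1, 2}    # fault-span
bad = {(1, 3)}   # hypothetical bad transition

print(is_closed(S, p))              # True: p alone keeps the program in S
print(is_closed(S, p | f))          # False: the fault leaves S
print(is_closed(T, p | f))          # True: T is closed in p and f together
print(violates_safety(p | f, bad))  # False
```

The invariant must be closed in $p$. The fault-span must be closed in $p$ and $f$ together, which is the closure property that $T'$ must satisfy on later slides.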

7 Step-Wise Addition
[Figure: fault-intolerant program, failsafe fault-tolerant, nonmasking fault-tolerant, and masking fault-tolerant programs, with transformations labeled [FTRTFT00], [ICDCS02], and "This paper" (nonmasking to masking).]

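8 Enhancement Problem
- Input to the synthesis algorithm: a nonmasking program $p$ with invariant $S$, the specification $\mathit{Spec}$, and faults $f$.
- Output: a masking program $p'$ with invariant $S'$ and fault-span $T'$, where $S' = T' \cap S$.
- Requirement: only fault-tolerance is added; no new functional behavior is added.
[Figure: $S$ inside $T$ inside $S_p$ with fault transitions $f$, and the new fault-span $T'$ and invariant $S'$.]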

9 Enhancement in High Atomicity Model

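10 High Atomicity Model
- Each process can read and write all program variables.
- $ms$: the states from which execution of fault transitions ($f$) alone can violate safety.
[Figure: the region $ms$ relative to the fault-span $T$ and the invariant $S$.]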
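$ms$ can be computed as a least fixpoint over the fault transitions. The sketch below assumes safety is given as a set of bad transitions. Its `compute_mt` collects the transitions to avoid, those that violate safety or reach $ms$, following the "Remove bad transitions" step of the TMR example (slide 30). The slides do not define $mt$, so that reading is an assumption, and both function names are hypothetical.

```python
from typing import Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def compute_ms(faults: Set[Transition], bad: Set[Transition]) -> Set[State]:
    """States from which fault transitions alone can violate safety.

    Least fixpoint: start with the sources of bad fault transitions, then
    keep adding any state with a fault transition into a state already in ms.
    """
    ms = {s0 for (s0, s1) in faults if (s0, s1) in bad}
    changed = True
    while changed:
        changed = False
        for (s0, s1) in faults:
            if s1 in ms and s0 not in ms:
                ms.add(s0)
                changed = True
    return ms


def compute_mt(transitions: Set[Transition], bad: Set[Transition],
               ms: Set[State]) -> Set[Transition]:
    """Transitions to avoid (assumed meaning of mt): bad ones, or ones entering ms."""
    return {(s0, s1) for (s0, s1) in transitions if (s0, s1) in bad or s1 in ms}


# Hypothetical example: the fault chain 1 -> 2 -> 3 ends in a bad fault transition.
faults = {(1, 2), (2, 3), (0, 4)}
bad = {(2, 3)}
ms = compute_ms(faults, bad)
print(sorted(ms))                                     # [1, 2]
print(compute_mt({(0, 1), (1, 0), (4, 0)}, bad, ms))  # {(0, 1)}
```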

11 Enhancement in High Atomicity Model (Continued)
- Deadlock states may appear because some transitions are removed.
- Find a state predicate $T'$ such that:
  - $T'$ is closed in the computations of the program in the presence of faults;
  - the specification is satisfied from every state of $T'$ (i.e., there are no deadlocks).
- Construct $p'$ such that every $(s_0, s_1) \in p'$ does not violate safety and satisfies $s_0 \in T' \Rightarrow s_1 \in T'$.
[Figure: $T'$ and $S'$ within $T$ and $S$, with $ms$ excluded.]

12 Enhancement vs. Addition
Enhancement:
    HighAtomicityEnhancement(p, f: transitions, T: StatePredicate, spec: specification) {
      1. Calculate ms; calculate mt;
      2. T' := ConstructFaultSpan( );
      3. if (T' = {}) declare no masking f-tolerant program exists; exit;
         else construct the transitions of p';
    }
Addition [FTRTFT00]:
    AddMasking(p, f: transitions, S: StatePredicate, spec: specification) {
      1. Calculate ms; calculate mt;
      2. ...
      3. ...
      4. repeat
           4-1) ...
           4-2) ...
           4-3) T := ConstructFaultSpan( );
           4-4) ...
           4-5) if (S = {} \/ T = {}) declare no masking f-tolerant program exists; exit;
         until (ExitConditionHolds);
      5. Remove cycles outside the invariant in T;
      6. Construct the transitions of p';
    }
[Figure: fault-intolerant program to nonmasking program (manual), then nonmasking program to masking program (automatic: enhancement), giving partial automation; compared with fully automatic addition [FTRTFT00].]
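The `HighAtomicityEnhancement` outline can be sketched as a runnable Python function under stated assumptions. $p'$ reuses only transitions of the nonmasking program $p$, so removing transitions can create deadlocks but not new cycles. This is why, unlike `AddMasking`, no cycle-removal step appears. The body of `ConstructFaultSpan` is not shown on the slide. Here it repeatedly drops states from which a fault leaves the candidate $T'$, along with deadlock states. This is our reading, not the paper's exact procedure. All names are hypothetical, and legitimate terminal states are assumed to carry self-loops in $p$.

```python
from typing import Hashable, Optional, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def compute_ms(faults: Set[Transition], bad: Set[Transition]) -> Set[State]:
    """States from which fault transitions alone can violate safety."""
    ms = {s0 for (s0, s1) in faults if (s0, s1) in bad}
    while True:
        new = {s0 for (s0, s1) in faults if s1 in ms} - ms
        if not new:
            return ms
        ms |= new


def high_atomicity_enhancement(
    p: Set[Transition], f: Set[Transition], S: Set[State],
    T: Set[State], bad: Set[Transition],
) -> Optional[Tuple[Set[State], Set[State], Set[Transition]]]:
    """Return (T', S', p'), or None if no masking program is found."""
    ms = compute_ms(f, bad)
    mt = {(s0, s1) for (s0, s1) in p if (s0, s1) in bad or s1 in ms}
    T_new = set(T) - ms
    while True:  # assumed ConstructFaultSpan: shrink until stable
        # Program transitions that are safe and stay inside the candidate T'.
        allowed = {(s0, s1) for (s0, s1) in p - mt if s0 in T_new and s1 in T_new}
        # Faults cannot be prevented, so drop states from which a fault leaves T'.
        fault_exits = {s0 for (s0, s1) in f if s0 in T_new and s1 not in T_new}
        # Deadlocks: states of T' with no remaining outgoing program transition.
        deadlocks = T_new - {s0 for (s0, _) in allowed}
        removed = fault_exits | deadlocks
        if not removed:
            break
        T_new -= removed
    S_new = S & T_new  # S' = T' ∩ S
    if not T_new or not S_new:
        return None  # declare: no masking f-tolerant program exists
    return T_new, S_new, allowed


# Hypothetical nonmasking program: recovery 2 -> 3 and 4 -> 0 violate safety.
S = {0, 1}
T = {0, 1, 2, 3, 4}
p = {(0, 1), (1, 0), (2, 0), (2, 3), (3, 0), (4, 0)}
f = {(1, 2)}
bad = {(2, 3), (4, 0)}
# State 4 deadlocks once (4, 0) is removed, so T' = {0, 1, 2, 3}.
print(high_atomicity_enhancement(p, f, S, T, bad))
```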

13 Enhancement For Distributed Programs

14 Difficulties with Distribution
- Read/write restrictions (low atomicity model). Consider a program p with two processes j and k and two Boolean variables a and b, where process j cannot read b.
- Can we include the transition (a=0, b=0) → (a=1, b=0)? Only if we also include the transition (a=0, b=1) → (a=1, b=1), because j cannot distinguish the two source states.
- Hence, groups of transitions (instead of individual transitions) must be chosen.
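The grouping rule can be computed mechanically. The sketch below replicates a transition for every valuation of the variables the process cannot read. It assumes a process cannot write a variable it cannot read. The state encoding and the names `freeze` and `group` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

State = FrozenSet[Tuple[str, int]]  # a state as a set of (variable, value) pairs
Transition = Tuple[State, State]


def freeze(values: Dict[str, int]) -> State:
    return frozenset(values.items())


def group(s0: State, s1: State, readable: Set[str],
          domains: Dict[str, List[int]]) -> Set[Transition]:
    """All transitions that must be included together with (s0, s1).

    The process cannot distinguish states that differ only in unreadable
    variables, and (assumption) cannot write them, so the transition is
    replicated for every valuation of the unreadable variables.
    """
    d0, d1 = dict(s0), dict(s1)
    hidden = [v for v in domains if v not in readable]
    if any(d0[v] != d1[v] for v in hidden):
        raise ValueError("a process cannot change variables it cannot read")
    result = set()
    for vals in product(*(domains[v] for v in hidden)):
        new0, new1 = dict(d0), dict(d1)
        for v, x in zip(hidden, vals):
            new0[v] = new1[v] = x
        result.add((freeze(new0), freeze(new1)))
    return result


# The slide's example: process j reads and writes a but cannot read b.
domains = {"a": [0, 1], "b": [0, 1]}
g = group(freeze({"a": 0, "b": 0}), freeze({"a": 1, "b": 0}), {"a"}, domains)
for s0, s1 in sorted(g, key=lambda t: sorted(t[0])):
    print(dict(sorted(s0)), "->", dict(sorted(s1)))
# {'a': 0, 'b': 0} -> {'a': 1, 'b': 0}
# {'a': 0, 'b': 1} -> {'a': 1, 'b': 1}
```

A synthesis algorithm then has to accept or reject each group as a whole. This is the source of the extra difficulty in the distributed setting.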

15 Enhancement of Nonmasking Distributed Programs
1. Start: calculate $T'_{high}$.
2. Calculate $S'_{init}$ and set $S'_{low} = S'_{init}$.
3. Calculate $S_{reachable}$, the states reachable from $S'_{low}$ by fault/program transitions.
4. If $S_{reachable} = \{\}$: set $T' = S'_{low}$, calculate the transitions of $p'$, and stop.
5. Otherwise, calculate $S_{recovery}$, the states from which recovery to $S'_{low}$ is possible, by searching in $(T'_{high} - S'_{low})$ under distribution restrictions.
6. If $S_{recovery} = \{\}$: declare failure. Otherwise, set $S'_{low} = S'_{low} \cup S_{recovery}$ and return to step 3.
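Here is a minimal Python sketch of this loop, under several assumptions not stated on the slide. Program transitions come as precomputed groups. A group is usable only if it contains no bad transition and never leaves $T'_{high}$ from inside it. Every group that provides single-step recovery is added at once. Deadlocks inside $S'_{low}$ are not checked. The function and parameter names are hypothetical. Each iteration either stops or adds at least one new state, so the loop terminates on a finite state space.

```python
from typing import FrozenSet, Hashable, List, Optional, Set, Tuple

State = Hashable
Transition = Tuple[State, State]
Group = FrozenSet[Transition]  # transitions that must be included together


def low_atomicity_enhancement(
    groups: List[Group], faults: Set[Transition], bad: Set[Transition],
    T_high: Set[State], S_init: Set[State],
) -> Optional[Tuple[Set[State], List[Group]]]:
    """Grow S'_low by single-step recovery until nothing new is reachable."""

    def usable(g: Group) -> bool:
        # No bad transition, and no transition leaves T'_high from inside it.
        return all(t not in bad and (t[0] not in T_high or t[1] in T_high)
                   for t in g)

    candidates = [g for g in groups if usable(g)]
    S_low = set(S_init)  # S'_low = S'_init at the first iteration
    chosen = [g for g in candidates if any(s0 in S_low for (s0, _) in g)]

    while True:
        program = {t for g in chosen for t in g}
        # S_reachable: states outside S'_low reached in one fault/program step.
        reachable = {s1 for (s0, s1) in faults | program
                     if s0 in S_low and s1 not in S_low}
        if not reachable:
            return S_low, chosen  # T' = S'_low; p' = union of chosen groups
        # S_recovery: reachable states in T'_high with one safe step into S'_low.
        recovery: Set[State] = set()
        for g in candidates:
            hits = {s0 for (s0, s1) in g
                    if s0 in reachable and s0 in T_high and s1 in S_low}
            if hits:
                recovery |= hits
                if g not in chosen:
                    chosen.append(g)
        if not recovery:
            return None  # declare failure
        S_low |= recovery


# Hypothetical example: faults push 1 -> 2 and 0 -> 3; groups g2, g3 recover.
g1 = frozenset({(0, 1), (1, 0)})
g2 = frozenset({(2, 0)})
g3 = frozenset({(3, 1)})
result = low_atomicity_enhancement([g1, g2, g3], {(1, 2), (0, 3)}, set(),
                                   {0, 1, 2, 3}, {0, 1})
print(sorted(result[0]))  # [0, 1, 2, 3]
```

If a fault reaches a state outside $T'_{high}$, that state can never enter $S_{recovery}$, so the loop eventually declares failure.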

16 A High Atomicity Fault-Span
- $T'_{high}$: the largest possible domain for the states that can be included in the fault-span of the distributed program.
- $S'_{high} = S \cap T'_{high}$.
[Figure: $T'_{high}$ inside $T$ with $ms$ excluded; $S'_{high}$ inside $S$.]

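17 The Initial Low Atomicity Invariant
- Remove states from which an outgoing transition crosses the boundary of $S'_{high}$ (e.g., $s_0$). The result is $S'_{init}$.
- Removal involves a non-deterministic choice when there is more than one state to remove.
[Figure: $S'_{init}$ inside $S'_{high}$ inside $T'_{high}$, with state $s_0$ on a boundary-crossing transition.]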
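Removing a state can expose new boundary-crossing transitions, so the removal is repeated until a fixpoint is reached. The sketch below resolves the non-deterministic choice in one particular way: it removes every offending state at once. The function name is hypothetical.

```python
from typing import Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def initial_invariant(S_high: Set[State], program: Set[Transition]) -> Set[State]:
    """Shrink S'_high until no program transition leaves it (yields S'_init)."""
    S_init = set(S_high)
    while True:
        # Sources of transitions that cross the boundary of the current set.
        exits = {s0 for (s0, s1) in program if s0 in S_init and s1 not in S_init}
        if not exits:
            return S_init
        S_init -= exits


# Hypothetical example: 3 -> 4 crosses the boundary first, then 2 -> 3 does.
print(sorted(initial_invariant({0, 1, 2, 3}, {(0, 1), (1, 0), (2, 3), (3, 4)})))
# [0, 1]
```

Other resolutions, such as removing one state at a time in some chosen order, can yield a larger $S'_{init}$. This sketch does not explore that choice.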

18 Single-Step Reachable States
- States reachable from $S'_{low}$ by a single fault or program transition (denoted $S_{reachable}$).
[Figure: $S'_{init}$, $S'_{low}$, and $S_{reachable}$ within $T'_{high}$; states $s_0$ to $s_3$ and fault transitions $f$.]

19 Single-Step Recovery States
- States from which safe recovery to $S'_{low}$ is possible in a single step (denoted $S_{recovery}$).
- Goal: infinite computations are possible from all states in $S'_{low}$.
- $s_0$ represents a typical recovery state.
[Figure: $S_{recovery}$ within $T'_{high}$, next to $S'_{init}$ and $S'_{low}$; states $s_0$, $s_2$, $s_3$.]


21 Example: Byzantine Agreement
- Why this example?
  - It was used to illustrate the addition of masking fault-tolerance in [SRDS01].
  - Manual enhancement has already been applied to it [TSE98].
- Processes: the general g and three non-generals j, k, and l.
- Variables:
  - d.g : {0, 1}
  - d.j, d.k, d.l : {0, 1, ⊥}
  - b.g, b.j, b.k, b.l : {0, 1}
  - f.j, f.k, f.l : {0, 1}
- Safety specification:
  - Agreement: no two non-Byzantine non-generals can finalize with different decisions.
  - Validity: if g is not Byzantine, no process can finalize with a decision different from that of g.
  - A finalized process should not execute any transition.
[Figure: the general g and the non-generals j, k, and l.]

22 Example: Byzantine Agreement
- Read/write restrictions: process j can read b.j, d.j, f.j, d.g, d.k, and d.l, and can write d.j and f.j.
- Dijkstra's guarded commands: Guard → Statement denotes { (s0, s1) | Guard holds at s0 and atomic execution of Statement yields s1 }.
- Nonmasking fault-tolerant program transitions (process j):
  d.j = ⊥ ∧ f.j = 0 → d.j := d.g
  d.j ≠ ⊥ ∧ f.j = 0 → f.j := 1
  d.j = 1 ∧ d.k = 0 ∧ d.l = 0 → d.j := 0
  d.j = 0 ∧ d.k = 1 ∧ d.l = 1 → d.j := 1
- Fault transitions:
  ¬b.g ∧ ¬b.j ∧ ¬b.k ∧ ¬b.l → b.j := true
  b.j → d.j := 0|1
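The guarded-command semantics maps directly to code. Each action is a (name, guard, statement) triple, and the transitions it contributes from state s are those whose guard holds at s. The sketch below encodes process j's nonmasking actions this way, with `None` standing for ⊥. The action names and helpers are hypothetical, and the example state fills in unlisted variables (b.*, f.j, f.k) as 0 by assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

BOT = None  # stands for the undecided value ⊥
State = Dict[str, Optional[int]]
Action = Tuple[str, Callable[[State], bool], Callable[[State], State]]


def nonmasking_actions(j: str, k: str, l: str) -> List[Action]:
    """Process j's guarded commands from the slide, as (name, guard, statement)."""
    dj, fj, dk, dl = f"d.{j}", f"f.{j}", f"d.{k}", f"d.{l}"
    return [
        ("decide", lambda s: s[dj] is BOT and s[fj] == 0,
         lambda s: {**s, dj: s["d.g"]}),
        ("finalize", lambda s: s[dj] is not BOT and s[fj] == 0,
         lambda s: {**s, fj: 1}),
        ("correct to 0", lambda s: s[dj] == 1 and s[dk] == 0 and s[dl] == 0,
         lambda s: {**s, dj: 0}),
        ("correct to 1", lambda s: s[dj] == 0 and s[dk] == 1 and s[dl] == 1,
         lambda s: {**s, dj: 1}),
    ]


def successors(s: State, actions: List[Action]) -> List[Tuple[str, State]]:
    """Guard -> Statement contributes (s, s1) when the guard holds at s."""
    return [(name, stmt(s)) for (name, guard, stmt) in actions if guard(s)]


# State S0 from slide 23 (b.*, f.j, f.k assumed to be 0).
s0: State = {"d.g": 1, "d.j": BOT, "d.k": BOT, "d.l": 1,
             "f.j": 0, "f.k": 0, "f.l": 0,
             "b.g": 0, "b.j": 0, "b.k": 0, "b.l": 0}
for name, s1 in successors(s0, nonmasking_actions("j", "k", "l")):
    print("j:", name, "-> d.j =", s1["d.j"])   # j: decide -> d.j = 1
for name, s1 in successors(s0, nonmasking_actions("l", "j", "k")):
    print("l:", name, "-> f.l =", s1["f.l"])   # l: finalize -> f.l = 1
```

Statements return a new dictionary rather than mutating s, so each action's effect is one atomic step from a fixed source state.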

23 Example: Byzantine Agreement (Continued)
- S0: d.j = d.k = ⊥, d.g = 1, d.l = 1, f.l = 0.
- S0 → S1, a good transition inside the invariant: d.j = d.k = ⊥, d.g = 1, d.l = 1, f.l = 1.
- Fault transition (b.g = 1), leading to d.j = d.k = ⊥, d.g = 0, d.l = 1, f.l = 1 (S2, S3).
- S4: d.j = d.k = 0, d.g = 0, d.l = 1, f.l = 1 is a deadlock state, caused by premature finalization.
- Why is enhancement easier?

24 Example: Byzantine Agreement (Continued)
Masking fault-tolerant program (process j), obtained by high atomicity reasoning:
  d.j = ⊥ ∧ f.j = 0 → d.j := d.g
  d.j ≠ ⊥ ∧ f.j = 0 ∧ ((d.j = d.k) ∨ (d.j = d.l)) → f.j := 1
  d.j = 1 ∧ d.k = 0 ∧ d.l = 0 ∧ (f.j = 0) → d.j := 0
  d.j = 0 ∧ d.k = 1 ∧ d.l = 1 ∧ (f.j = 0) → d.j := 1
- Approach: synthesize a masking program in the high atomicity model, then refine it to a distributed program.
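The effect of the strengthened guards can be checked on the scenario from slide 23. We read the original slide as adding ((d.j = d.k) ∨ (d.j = d.l)) to the finalize guard and (f.j = 0) to the correction guards. That placement is our interpretation. The sketch shows that l's premature finalization at S0 is enabled under the nonmasking guard but disabled under the masking one. The function names are hypothetical.

```python
from typing import Dict, Optional

BOT = None  # the undecided value ⊥
State = Dict[str, Optional[int]]


def can_finalize_nonmasking(s: State, j: str) -> bool:
    return s[f"d.{j}"] is not BOT and s[f"f.{j}"] == 0


def can_finalize_masking(s: State, j: str, k: str, l: str) -> bool:
    # Strengthened guard: j's decision must agree with another non-general.
    dj = s[f"d.{j}"]
    return can_finalize_nonmasking(s, j) and (dj == s[f"d.{k}"] or dj == s[f"d.{l}"])


def can_correct_masking(s: State, j: str, k: str, l: str) -> bool:
    # Correction actions additionally require f.j = 0, so finalized processes never change.
    dj, dk, dl = s[f"d.{j}"], s[f"d.{k}"], s[f"d.{l}"]
    outvoted = (dj == 1 and dk == 0 and dl == 0) or (dj == 0 and dk == 1 and dl == 1)
    return outvoted and s[f"f.{j}"] == 0


# S0 from slide 23: l has decided 1 while j and k are still undecided.
s0: State = {"d.g": 1, "d.j": BOT, "d.k": BOT, "d.l": 1,
             "f.j": 0, "f.k": 0, "f.l": 0}
print(can_finalize_nonmasking(s0, "l"))              # True: premature finalization
print(can_finalize_masking(s0, "l", "j", "k"))       # False
s_after = {**s0, "d.j": 1}                           # j copies d.g = 1
print(can_finalize_masking(s_after, "l", "j", "k"))  # True
```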

25 Enhancement vs. Addition
- Enhancement reuses the computations of the nonmasking program.
- Reasoning in the high atomicity model has the potential to reduce the complexity of addition.

26 Synthesis Framework
- Development of a synthesis framework in which developers of fault-tolerance can interactively add fault-tolerance to fault-intolerant programs.
- Partial automation lets us reap the benefits of automation as much as possible; enhancement identifies programs where partial automation is possible.
- Implementation of the enhancement algorithms in the synthesis framework: http://www.cse.msu.edu/~sandeep/software/Code/synthesis-framework/

27 Conclusion and Future Work
- Enhancement simplifies the automated design of masking programs:
  - lower asymptotic complexity;
  - polynomial-time enhancement in the low atomicity model (in the state space of the nonmasking program), which is sound but not complete.
- Reasoning in the high atomicity model simplifies the synthesis of masking distributed programs.
- Future work: a polynomial-time, sound and complete enhancement algorithm for a restricted class of programs and specifications.

28 Thank You! Questions?

29 Example: Triple Modular Redundancy
- Processes: three processes j, k, and l.
- Variables and their domains: in.j, in.k, and in.l are Boolean; out ∈ {0, 1, ⊥}.
- Nonmasking program (+ denotes addition modulo 3):
  N1: (out = ⊥) → out := in.j
  N2: (out != ⊥) /\ (out != in.j) /\ ((in.j = in.k) \/ (in.j = in.l)) → out := in.j
- Faults:
  F: (in.j = in.k) /\ (in.j = in.l) → in.j := 0|1
- Safety specification:
  - Do not reach states where out differs from the majority of the inputs.
  - out should not change after it is assigned a value.

30 Example: Triple Modular Redundancy
- Invariant: S = ((out = ⊥) /\ (in.j = in.k = in.l)) \/ (out = in.j = in.k) \/ (out = in.j = in.l) \/ (out = in.k = in.l)
- Fault-span: T = ((in.j = in.k = in.l) => ((out = ⊥) \/ (out = in.j = in.k = in.l)))
- Enhancement algorithm:
  - Compute ms: ms = { }.
  - Remove bad transitions: {t : t violates safety} and {t : t reaches ms}.
  - Construct a new fault-span T': T' = T - { s : (out != ⊥) /\ (out is not equal to the majority of the inputs) }.
- Masking program:
  M1: (out = ⊥) /\ ((in.j = in.k) \/ (in.j = in.l)) → out := in.j
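The TMR state space has only 24 states, so the masking program can be checked exhaustively. The sketch below encodes N2, M1, and F for all three processes. It assumes that process indices j, k, l map to 0, 1, 2, with the other two processes found by adding 1 and 2 modulo 3. It then confirms that no M1 or fault transition from T' violates safety or leaves T'. It also shows that N2, which M1 drops, changes an already-assigned out from a state in T outside T'. The function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

BOT = None  # the unassigned output ⊥
State = Tuple[Tuple[int, ...], Optional[int]]  # ((in.j, in.k, in.l), out)


def others(j: int) -> Tuple[int, int]:
    # Indices j+1 and j+2 modulo 3 (assumed mapping j=0, k=1, l=2).
    return (j + 1) % 3, (j + 2) % 3


def majority(ins: Tuple[int, ...]) -> int:
    return 1 if sum(ins) >= 2 else 0


def masking_moves(s: State) -> List[State]:
    ins, out = s
    moves = []
    for j in range(3):
        k, l = others(j)
        # M1: (out = ⊥) /\ ((in.j = in.k) \/ (in.j = in.l)) -> out := in.j
        if out is BOT and (ins[j] == ins[k] or ins[j] == ins[l]):
            moves.append((ins, ins[j]))
    return moves


def n2_moves(s: State) -> List[State]:
    ins, out = s
    moves = []
    for j in range(3):
        k, l = others(j)
        # N2: (out != ⊥) /\ (out != in.j) /\ ((in.j = in.k) \/ (in.j = in.l)) -> out := in.j
        if out is not BOT and out != ins[j] and (ins[j] == ins[k] or ins[j] == ins[l]):
            moves.append((ins, ins[j]))
    return moves


def fault_moves(s: State) -> List[State]:
    ins, out = s
    moves = []
    for j in range(3):
        k, l = others(j)
        # F: (in.j = in.k) /\ (in.j = in.l) -> in.j := 0|1
        if ins[j] == ins[k] == ins[l]:
            for v in (0, 1):
                new = list(ins)
                new[j] = v
                moves.append((tuple(new), out))
    return moves


def in_T_prime(s: State) -> bool:
    ins, out = s
    in_T = not (ins[0] == ins[1] == ins[2]) or out is BOT or out == ins[0]
    return in_T and (out is BOT or out == majority(ins))


def violates_safety(s0: State, s1: State) -> bool:
    out_changed = s0[1] is not BOT and s1[1] != s0[1]
    out_wrong = s1[1] is not BOT and s1[1] != majority(s1[0])
    return out_changed or out_wrong


ALL = [(ins, out) for ins in product((0, 1), repeat=3) for out in (0, 1, BOT)]
T_PRIME = [s for s in ALL if in_T_prime(s)]
masking_ok = all(not violates_safety(s, t) and in_T_prime(t)
                 for s in T_PRIME for t in masking_moves(s) + fault_moves(s))
print(len(T_PRIME), masking_ok)   # 16 True
bad_state = ((1, 1, 0), 0)        # in T but not in T'
print(any(violates_safety(bad_state, t) for t in n2_moves(bad_state)))  # True
```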


