Anish Arora Ohio State University Mikhail Nesterenko Kent State University Local Tolerance to Unbounded Byzantine Faults.


1 Local Tolerance to Unbounded Byzantine Faults
Anish Arora, Ohio State University
Mikhail Nesterenko, Kent State University

2 Faults in System of Large Scale
large system size presents unique challenges and opportunities for ensuring dependability
problem
• faults:
  – occur often
  – affect multiple components
  – interact unpredictably
• asynchronous execution model
• faults are spatially/temporally unbounded, complex & undetectable
opportunity
• a fault directly affects a region rather than the whole system
• if faults are contained, the rest of the system continues to function
(figure legend: affected, faulty, unaffected)

3 Difficulties Containing Unbounded Faults
lack of spatial bound
• an arbitrary number of processes can be faulty
• cannot rely on a limited scope of the fault or a limited number of faulty processes
lack of temporal bound
• a faulty process may behave incorrectly for an arbitrarily long time
• cannot wait until the fault stops
• contain correctness and tolerance instead of faults
• use execution models that simplify such containment

4 Outline
containing correctness and tolerance: strict fault containment and strict stabilization
execution models and example programs
• reactive program: dining philosophers
• transformational execution models and programs
  – output dependent: k-independent set selection
  – output independent: lightweight spanner construction

5 Containing Correctness
address specification first
what does it mean for a system to be correct when an arbitrary portion of it is faulty?
spec defines correct sequences for each process P
a sequence involves states of P and possibly of other processes
a program is locally containing of faults of class F if ∃ constant l (containment radius) such that every P conforms to its spec if faulty processes are at least l hops away from P
problem: correctness of P depends on every process in the system conforming to spec or F
(figure labels: fault of class F, containment radius l, containment locality)
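The containment-radius definition above can be illustrated with a small sketch. It assumes the system is an undirected graph given as an adjacency dictionary; the function names and the 7-process line topology are hypothetical, chosen only for illustration. Given a set of faulty processes and a radius l, it lists the processes that a locally containing program must keep correct.

```python
from collections import deque

def hop_distances(adj, sources):
    """Multi-source BFS: hop distance from each reachable node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def protected(adj, faulty, radius):
    """Non-faulty processes whose nearest faulty process is at least
    `radius` hops away (or unreachable); these must conform to their spec."""
    dist = hop_distances(adj, faulty)
    return {p for p in adj
            if p not in faulty and dist.get(p, float("inf")) >= radius}

# Hypothetical line of 7 processes 0-1-2-3-4-5-6 with process 3 faulty.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(sorted(protected(line, {3}, radius=2)))
```

Processes 2 and 4 sit inside the containment radius of the fault, so the definition places no obligation on them.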

6 Strict Fault Containment
a strict fault containing (SFC) program is locally containing of unbounded Byzantine faults
• a process satisfies its spec regardless of the actions of processes outside its locality
• an SFC program is containing of bounded and unbounded faults of any class
• for each P the spec can only mention processes inside the locality
• a problem lacking such specs (e.g. routing) does not have SFC solutions
(figure label: Byzantine fault)

7 Strict Stabilization
additional tolerance properties to faults within locality for a strictly fault-containing program
• strict stabilization – stabilization from transient faults: regardless of actions outside locality, each P eventually satisfies its spec

8 Outline
containing correctness and tolerance: strict fault containment and strict stabilization
execution models and example programs
• reactive program: dining philosophers
• transformational execution models and programs
  – output dependent: k-independent set selection
  – output independent: lightweight spanner construction

9 Dining Philosophers Problem
definition
• network of processes, each may request to eat
• properties
  – mutual exclusion – no two neighbors eat together
  – liveness – each requesting process eats eventually
execution model
• interleaving
• communication via shared registers
• high-atomicity
(figure: thinking (T), hungry (H), eating (E) cycle for a requesting process)

10 Solution to Dining Philosophers
priority based
actions
• if T & higher priority neighbors thinking → become hungry
• if H & no neighbors are eating → eat (ensures MX)
• if E & done → think & give priority to neighbors (ensures liveness)
• waiting chain ≤ 3
• optimal containment radius of 2
(figure labels: E, T, H, any; decreasing priority)
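The three guarded actions above can be tried out in a minimal simulation sketch. Several details are assumptions not given on the slide: every process requests to eat whenever it is thinking, eating lasts one step, priority is stored per edge with the smaller id holding it initially, and the interleaving is a fixed round-robin. The names `simulate`, `holder`, and `ring` are hypothetical.

```python
def simulate(adj, rounds):
    """Round-robin interleaving of the three guarded actions on graph `adj`.
    Assumption: a thinking process always wants to eat."""
    state = {p: "T" for p in adj}
    # holder[edge] is the endpoint that currently has priority on that edge;
    # the initial choice (smaller id wins) is an arbitrary assumption.
    holder = {frozenset((p, q)): min(p, q) for p in adj for q in adj[p]}
    meals = {p: 0 for p in adj}
    for _ in range(rounds):
        for p in sorted(adj):  # one atomic (high-atomicity) step per process
            higher = [q for q in adj[p] if holder[frozenset((p, q))] == q]
            if state[p] == "T" and all(state[q] == "T" for q in higher):
                state[p] = "H"
            elif state[p] == "H" and all(state[q] != "E" for q in adj[p]):
                state[p] = "E"
                meals[p] += 1
            elif state[p] == "E":
                state[p] = "T"
                for q in adj[p]:  # give priority to every neighbor
                    holder[frozenset((p, q))] = q
            # mutual exclusion check: no two neighbors eat together
            assert not any(state[a] == "E" and state[b] == "E"
                           for a in adj for b in adj[a])
    return meals

# Hypothetical 4-process ring 0-1-2-3-0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
meals = simulate(ring, rounds=12)
print(meals)
```

The sketch has no faulty processes; it only exercises the fault-free behavior of the actions, checking mutual exclusion after every step and counting meals to observe liveness.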

11 Fault Containment and Information Propagation
fault containment leverages the limit on information propagation
idea: abstract from the process of information propagation and highlight the result
process (chain a, b, c, d): a sends info to b; b sends a’s info to c; c sends a’s info to d
result: d reads from a

12 Execution Models
transformation program – given input computes output (e.g. leader election)
models for transformation programs – each process reads from processes within range (finite distance)
• output dependent – each process reads all information within range: input and (atomically) output
• output independent – each process reads only input within range
  – every program in this model is strictly fault containing
(figure labels: P reads input & output within range; P reads input only)

13 k-Independent Set Selection (cf. [HHJS01])
problem: select a maximal subset of processes S such that for each process in S every other process of S is at least k hops away
solution
actions
• if no member of S is less than k hops away → join S
• if there exists a member of S less than k hops away → leave S
observe:
• only a faulty node P can make another process Q leave S
• if Q leaves S, it can make another process R join S
• containment radius is 2k
(figure labels: 1-independent set; P joins S, Q leaves S, R joins S; distances k, k)
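The join/leave actions above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the graph is an adjacency dictionary, processes take atomic steps one at a time in id order, and no process is faulty. The names `within`, `select`, and `path` are hypothetical.

```python
from collections import deque

def within(adj, src, hops):
    """All nodes other than src that are at most `hops` hops from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist) - {src}

def select(adj, k, members=()):
    """Run the join/leave actions one process at a time until none is enabled.
    `members` is the (possibly arbitrary) initial content of S."""
    s = set(members)
    changed = True
    while changed:
        changed = False
        for p in sorted(adj):
            crowded = bool(within(adj, p, k - 1) & s)  # a member < k hops away
            if p in s and crowded:
                s.remove(p)
                changed = True
            elif p not in s and not crowded:
                s.add(p)
                changed = True
    return s

# Hypothetical path of 6 processes 0-1-2-3-4-5.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(sorted(select(path, k=2)))                      # start from empty S
print(sorted(select(path, k=2, members=range(6))))    # start from a corrupted S
```

Starting from an arbitrary initial S, as in the second call, mimics recovery from a transient fault: the same two actions first evict conflicting members and then refill the set until it is maximal.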

14 Outline
containing correctness and tolerance: strict fault containment and strict stabilization
execution models and example programs
• reactive program: dining philosophers
• transformational execution models and programs
  – output dependent: k-independent set selection
  – output independent: lightweight spanner construction
    practical problem: fast routing tree construction in sensor networks
    spanner construction with double range
    spanner optimization with larger ranges

15 Experimental Platform: Wireless Sensors
4 MHz Atmel processor
8 KB of program memory
512 B of data memory
916 MHz single-channel, low-power radio
10 Kbps of raw bandwidth
uniform antenna length & orientation
TinyOS as the runtime system
fresh AA batteries

16 Experiment: Fast Routing Tree Construction By Flooding [G+02]
156 nodes are arranged in a 13x12 grid on an open parking lot, with a grid spacing of 2 feet
the base station is placed in the middle of the base of the grid and starts the flooding
each receiving node rebroadcasts the flood message immediately upon receipt and then squelches further broadcasts
• the sender is selected as parent, thus a routing tree to the base station is formed
expectation: a routing tree with relatively regular structure:
• # of children, link length, path size, etc.
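The expected regular tree can be reproduced with a short sketch of flooding over an idealized grid. The radio model here is an assumption made purely for illustration: a broadcast reaches exactly the four adjacent grid cells, with no loss and FIFO delivery, which is precisely what a real deployment does not guarantee. The layout (12 rows of 13 columns, base station at the middle of row 0) and the name `flood_tree` are also assumptions.

```python
from collections import deque

def flood_tree(rows, cols, base):
    """Parent pointers produced by one flood over an idealized grid.
    Assumption: each broadcast is heard only by the 4 adjacent cells,
    with no message loss and FIFO delivery."""
    parent = {base: None}
    queue = deque([base])
    while queue:
        r, c = queue.popleft()           # this node rebroadcasts exactly once
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nbr = (r + dr, c + dc)
            in_grid = 0 <= nbr[0] < rows and 0 <= nbr[1] < cols
            if in_grid and nbr not in parent:
                parent[nbr] = (r, c)     # first sender heard becomes the parent
                queue.append(nbr)
    return parent

# Assumed layout: 12 rows x 13 columns, base station in the middle of row 0.
tree = flood_tree(rows=12, cols=13, base=(0, 6))
print(len(tree), "nodes joined the tree")
```

Under these assumptions every node ends up at a tree depth equal to its grid distance from the base station, which is the regular structure the experiment was expected to show.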

17 (figure panels: 1 hop, 2 hops, 3 hops, final; annotations: Backward Link, Long Link, Straggler, Clustering)

18 Problems and Solution Approach
problem: a routing tree constructed fast over the “raw” topology is inadequate
• uneven clustering (some nodes have too many neighbors)
• long links (possibly unreliable)
• suboptimal paths (backward links)
idea: pre-process the topology to mitigate the problem
• weigh links (by length, error rate, node degree, etc.)
• locally construct a connected but lightweight spanner
  – link weight may be reflexive (depend on the spanner, e.g. node degree)

19 Lightweight Spanner Construction Using 2k-Range
spanner – connected subgraph that includes all nodes (e.g. spanning tree)
k-local spanner – there is a path within distance ≤ k to each neighbor
problem: given a weighted graph (all weights unique) and 2k-range, build a lightweight k-local spanner
solution: each process P computes the minimum spanning tree for each process Q at distance no more than k and selects the union of incident edges
(figure labels: P, Q, radii k and k; P can compute MST for each process Q in this region; MST for Q’s region)
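The solution above can be sketched centrally as follows. Two points are assumptions rather than statements from the slide: "the minimum spanning tree for Q" is taken to mean the MST of the subgraph induced by all nodes within k hops of Q, and Kruskal's algorithm is used to compute it. The names `ball`, `mst`, `local_spanner`, and the example `weights` are hypothetical.

```python
from collections import deque

def ball(adj, src, k):
    """Nodes within k hops of src, including src (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def mst(nodes, weights):
    """Kruskal's algorithm on the subgraph induced by `nodes`."""
    root = {n: n for n in nodes}
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x
    tree = set()
    for (u, v), _ in sorted(weights.items(), key=lambda item: item[1]):
        if u in nodes and v in nodes and find(u) != find(v):
            root[find(u)] = find(v)
            tree.add((u, v))
    return tree

def local_spanner(weights, k):
    """Union, over every process P, of the edges incident to P in the MST
    of the k-hop region of each Q within k hops of P."""
    adj = {}
    for u, v in weights:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    spanner = set()
    for p in adj:
        for q in ball(adj, p, k):     # needs knowledge out to 2k hops from P
            for edge in mst(ball(adj, q, k), weights):
                if p in edge:
                    spanner.add(edge)
    return spanner

# Hypothetical graph: a 5-cycle plus one chord, all weights unique.
weights = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (3, 4): 4, (0, 4): 5, (1, 3): 6}
print(sorted(local_spanner(weights, k=1)))
```

Because weights are unique, every edge of the global MST is also in the MST of any region that contains it, so the resulting spanner always includes the global MST and is therefore connected; a larger k prunes more of the remaining edges.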

20 Spanner Optimization Using Ranges > 2k
each P computes the spanner’s topology in the neighborhood with radius (range - k)
• P knows the complete spanner in this region
P iteratively repeats the procedure on the resultant spanner
(figure labels: P, Q, radii k, k, k; P can compute MST for each process Q in this region)

21 Conclusion
the complexity and scale of large systems force unorthodox approaches to faults
we explored the spatial dimension of fault tolerance to complex unbounded faults, using the lack of global info propagation
• stated necessary conditions and impossibility results
• gave first examples of programs
question: how to solve problems that do have global info propagation? is it possible to contain problems before they spread?
