Evaluating Practical Tolerance Properties of Stabilizing Programs Through Simulation: The Case of Propagation of Information with Feedback
Jordan Adamek, Mikhail Nesterenko, Sébastien Tixeuil
Symposium on Stabilization, Safety, and Security of Distributed Systems
Toronto, Canada, October 2012

Why Simulate Stabilization
- A stabilizing program has to recover from an arbitrary system state.
- To prove the algorithm correct, the designer has to focus on stabilization from degenerate states that are rarely reached in practice. Such an exercise tells little about the algorithm's practical performance.
- Performance evaluations in the area of stabilization are relatively rare, and they present unique challenges. What to consider?
  - states: randomization is a common answer. Yet uniformly randomized states may be "mild": they distribute process states evenly and may not represent systemic faults.
  - execution models: the model needs to be realistic, yet the results should pertain to the algorithm rather than be an artifact of the model.
  - parameters: stabilization time is common, yet it often hides the complexity of failure recovery. Other parameters need to be considered.
- We simulate stabilizing PIF and analyze its performance using realistic initial states, three classic execution models, and a number of stabilization parameters.

Outline
- PIF algorithm
- parameter selection and experiment setup
- results
- analysis
- conclusion

PIF Algorithm
- propagation of information with feedback (PIF)
  - used to deliver information on rooted trees from root to leaves and get an acknowledgment
  - often considered in stabilization literature; proven ideally- and self- [8,9] as well as snap-stabilizing [1]
- description
  - each process can be in one of three states: idle (i), requesting (rq), replying (rp)
  - root initiates a wave by switching from idle to requesting
  - each intermediate process p propagates the request to its children Ch.p
  - each leaf reflects the wave back by switching from idle to replying
  - intermediate processes propagate the reply back to the root
  - root waits for a reply from all children and repeats the cycle
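The wave described above can be sketched as a small fault-free simulation. This is a simplified illustration, not the algorithm from [1]: the guards, the cleanup rule that returns a replying process to idle once its parent is no longer requesting, and the names `next_state` and `run_waves` are assumptions made for the example, and faulty states are not handled.

```python
import random

IDLE, REQ, REP = "i", "rq", "rp"

def children_of(parent):
    """Build child lists from a parent array (parent[root] is None)."""
    ch = {p: [] for p in range(len(parent))}
    for p, q in enumerate(parent):
        if q is not None:
            ch[q].append(p)
    return ch

def next_state(p, state, parent, ch):
    """Return the state p would move to, or None if p has no enabled action.
    Guards are assumed for illustration; they only cover fault-free runs."""
    kids = [state[c] for c in ch[p]]
    s, q = state[p], parent[p]
    if q is None:  # root
        if s == IDLE and all(k == IDLE for k in kids):
            return REQ          # root initiates a wave
        if s == REQ and all(k == REP for k in kids):
            return IDLE         # all children replied: wave complete
        return None
    if s == IDLE and state[q] == REQ and all(k == IDLE for k in kids):
        return REQ if ch[p] else REP   # propagate, or reflect at a leaf
    if s == REQ and ch[p] and all(k == REP for k in kids):
        return REP              # pass the reply towards the root
    if s == REP and state[q] != REQ:
        return IDLE             # assumed cleanup once the parent moved on
    return None

def run_waves(parent, steps, seed=0):
    """Run under interleaving from the all-idle state; count completed waves."""
    rng = random.Random(seed)
    ch = children_of(parent)
    state = [IDLE] * len(parent)
    root = parent.index(None)
    waves = 0
    for _ in range(steps):
        enabled = []
        for p in range(len(parent)):
            t = next_state(p, state, parent, ch)
            if t is not None:
                enabled.append((p, t))
        if not enabled:
            break
        p, t = rng.choice(enabled)
        if p == root and t == IDLE:
            waves += 1
        state[p] = t
    return waves
```

For example, `run_waves([None, 0, 0, 1, 1], 200)` runs a five-process tree rooted at process 0 and returns how many times the root completed a request/reply cycle.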

Initial State Selection
- tree selection
  - problem: how to select trees that do not favor a particular topology or shape
  - solution: Prüfer sequence. A sequence of n-2 labels uniquely defines one of all possible labeled trees on n nodes, so a random sequence chooses a labeled tree with equal probability
- initial state: need to select an initial state, then perturb it by a fault of varied extent
  - problem: not all states occur with equal probability (e.g. the root is seldom idle)
  - solution: start from the idle state, randomly pick a number from a range significantly larger than the system size, run the algorithm fault-free for that number of steps, then induce the fault
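The tree selection step can be sketched with the standard Prüfer decoding procedure. The function names and the use of labels 0..n-1 are assumptions for the example; the slides do not say how the authors implemented it or how the root was chosen.

```python
import heapq
import random

def prufer_to_edges(seq):
    """Decode a Prüfer sequence over labels 0..n-1 into the n-1 tree edges."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1          # each occurrence adds one incident edge
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)       # always attach the smallest current leaf
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:      # v has become a leaf of the remaining tree
            heapq.heappush(leaves, v)
    # exactly two leaves remain; they form the last edge
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def random_tree(n, rng):
    """Draw a labeled tree on n >= 2 nodes uniformly at random."""
    seq = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_edges(seq)
```

Because every sequence of length n-2 corresponds to exactly one labeled tree, drawing each entry independently and uniformly gives every labeled tree the same probability.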

Execution Models & Faults
- problem: the execution model should not appear to favor a particular system or architecture
- solution: selected 3 classic, well-studied execution semantics
  - interleaving: randomly execute one enabled action
  - power-set: randomly pick the number X of actions to execute, randomly pick the first, exclude its enabled neighbors; continue until X or all enabled actions are selected; execute the selected actions
  - synchronous: same as power-set, only continue randomly selecting actions until none remains
- faults
  - randomly pick a process and randomly select its state. Note: this may have no observable effect if the fault state is the same as the correct state
  - all processes are faulty: arbitrary initial state, classic stabilization
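The three semantics and the fault model above can be sketched as follows. Several details are assumptions, not taken from the slides: X is drawn uniformly from 1 to the number of enabled processes, each process contributes at most one action per step, faulty processes are distinct, and the names `select_actions` and `inject_faults` are hypothetical.

```python
import random

def select_actions(enabled, neighbors, model, rng):
    """Pick the processes that execute in one step.
    enabled: list of processes with an enabled action
    neighbors: dict mapping each process to the set of its tree neighbors"""
    if not enabled:
        return []
    if model == "interleaving":
        return [rng.choice(enabled)]
    if model == "power-set":
        limit = rng.randint(1, len(enabled))   # assumed distribution of X
    elif model == "synchronous":
        limit = len(enabled)                   # keep going until none remains
    else:
        raise ValueError(model)
    candidates = set(enabled)
    chosen = []
    while candidates and len(chosen) < limit:
        p = rng.choice(sorted(candidates))
        chosen.append(p)
        candidates.discard(p)
        candidates -= neighbors[p]             # exclude enabled neighbors of p
    return chosen

def inject_faults(state, k, rng, values=("i", "rq", "rp")):
    """Overwrite the state of k distinct random processes with a random value.
    A fault may pick the value the process already holds."""
    for p in rng.sample(range(len(state)), k):
        state[p] = rng.choice(values)
```

With `k` equal to the number of processes, `inject_faults` produces a fully arbitrary state, which corresponds to the classic stabilization setting.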

Experiment Setup
- 100 processes
- avg. tree height 21.6 ± 4.9
- avg. number of leaves 37.5 ± 3.1
- faults varied from one to 100
- ran 1,000 experiments for each fault number

Metrics
- stabilization time: number of execution steps for the algorithm to achieve a legitimate state (a single wave)
- number of actions until stabilization*
- overhead: number of action executions outside the propagation of the correct wave (wait time for interleaving semantics [1])
- longest causality chain of actions*: actions are causally related if executed on the same or a neighbor process
- scale: number of processes in the system

* metrics were not included in published proceedings
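The longest causality chain can be computed from an execution log in one pass. This sketch assumes a log format of one list of executing processes per step and reads "causally related" as the transitive closure of "an earlier action on the same or a neighbor process"; both are assumptions for the example, as is the name `longest_causal_chain`.

```python
def longest_causal_chain(log, neighbors):
    """log: list of steps, each a list of processes that executed an action.
    neighbors: dict mapping each process to the set of its neighbors.
    Returns the number of actions in the longest causality chain."""
    best = {}      # process -> longest chain ending at its latest action
    longest = 0
    for step in log:
        updates = {}
        for p in step:
            # extend the longest chain ending at p or at one of its neighbors
            prev = max([best.get(p, 0)] + [best.get(q, 0) for q in neighbors[p]])
            updates[p] = prev + 1
        # actions within one step are simultaneous, so apply updates afterwards
        best.update(updates)
        longest = max(longest, max(updates.values(), default=0))
    return longest
```

Tracking only the latest action per process is enough because successive actions on the same process are causally related, so the chain length stored for a process never decreases.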

Stabilization Time

Overhead

Longest Causality Chain

Scale
- interleaving semantics
- varied the system size from 100 to 1000 processes
- fixed % of faults (100% is arbitrary state, classic stabilization)

Analysis
- simulation results present a detailed picture of algorithm behavior
- notes
  - effort (overhead, actions, time) rises, then diminishes with fault extent. In a legitimate state, a single fault may launch a spurious wave in the opposite direction, and stabilization is then proportional to system size. Further faults tend to break up this wave and accelerate stabilization
  - parallel execution semantics (synchronous, power-set) result in greater overhead

Future Research & Conclusion
- the study is not exhaustive: the fault location affects the system differently. We believe that a fault closer to the root has a greater ability to perturb the system state
- engagement with practice provides feedback for stabilization research: designers are induced to consider and address problems of practical import
  - in our case, the sparse-fault spurious "counter-wave" was wholly unexpected; algorithmic measures may be needed to handle it

Thank You Questions?
