© 2005 Dorian C. Arnold Reliability in Tree-based Overlay Networks Dorian C. Arnold University of Wisconsin Paradyn/Condor Week March 14-18, 2005 Madison,


1 © 2005 Dorian C. Arnold Reliability in Tree-based Overlay Networks Dorian C. Arnold University of Wisconsin Paradyn/Condor Week March 14-18, 2005 Madison, WI

2 Preview
• Focus on tree-based overlay networks (T-BŌN)
  – Leverage characteristics of hierarchical topologies
• MRNet overview
• Reliability background
• Our approach to T-BŌN reliability
  – Main-memory implicit checkpointing protocol

3 Research Domain
• Target distributed system monitors, tools, profilers and debuggers
  – Paradyn, Tau, etc.
• Fault model: crash-stop failures
• TCP-like reliability for multicast and stateful reduction operations
• Tolerate all internal node failures
  – Graceful degradation to flat topology

4 This Year in HPC
• Processor statistics from the Top500 list:
  – 7974: Top ten average
  – 18%: ≥ %: clusters
  – 8192: largest cluster
  – 32,768: largest system
• In 2005: 65,536 processor system
Clusters and MPPs w/ processors will soon be commonplace.

5 Large Scale Challenge #1: Performance
• MRNet: Multicast/Reduction Overlay Network
  – T-BŌN for scalable, efficient group communications and data analyses
    – Scalable multicast
    – Scalable reduction
    – In-network data aggregation

6 MRNet Example: Running Average Filter
[Figure: MRNet tree with the front-end at the root and back-ends (BE) at the leaves. Node labels, read as count,average pairs, from the leaves upward: 1,18 1,8 1,27 1,5 1,11 1,22 1,32; then 2,13 1,27 2,8 2,27; then 3,18 and 4,18; and 7,18 at the front-end.]
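The node labels in this example are consistent with each node holding a (count, average) pair and combining its children's pairs by a count-weighted mean. A minimal Python sketch of such a filter follows. The function name `running_average_filter` and the tuple representation are assumptions made for illustration; this is not MRNet's actual filter API.

```python
from typing import Iterable, Tuple

State = Tuple[int, float]  # (number of samples, average of those samples)


def running_average_filter(children: Iterable[State]) -> State:
    """Combine child (count, average) pairs into one pair by weighted mean."""
    total_count = 0
    total_sum = 0.0
    for count, average in children:
        total_count += count
        total_sum += count * average
    if total_count == 0:
        raise ValueError("cannot average zero samples")
    return total_count, total_sum / total_count


# Two leaves reporting 18 and 8 aggregate to 2 samples with average 13.
left = running_average_filter([(1, 18), (1, 8)])
# A sibling subtree holding the single sample 27 joins one level higher.
upper = running_average_filter([left, (1, 27)])
print(left, upper)
```

Because every node forwards only one pair, the amount of data sent upward per node stays constant regardless of how many back-ends sit below it.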

7 Large Scale Challenge #2: Reliability
A system with 10,000 nodes is 10^4 times more likely to fail than one with 100 nodes.

8 Large Scale Challenge #2: Reliability
• Leverage characteristics of T-BŌNs to provide highly scalable reliability protocols
  – Logarithmic properties
  – Regularity and predictability
    – Structure
    – Communication
  – Inherent data redundancy

9 Approaches to Distributed Reliability
• Reliable group communications
• Distributed transactions
• Rollback-recovery protocols

10 Rollback-Recovery Protocols
• Checkpoint/Restart
• Challenges:
  – Time overhead
    – Checkpointing latency
      – Commit latency (stable storage access)
      – Coordination (coordinated checkpointing)
    – Recovery latency
      – Calculating recovery point (uncoordinated checkpointing)
  – Space overhead
    – Checkpoint storage
      – Multiple/useless checkpoints (uncoordinated checkpointing)
      – Forced checkpoints (communication-induced checkpointing)
    – Protocol messages
  – Complexity
    – Heterogeneity
    – Recovery semantics

11 Approach to T-BŌN Reliability
• Framework for studying various recovery protocols in T-BŌNs
  – Specify different recovery protocols for experimentation or customization
  – Cost-benefit analyses of various recovery schemes
• Three new rollback-recovery protocols
  1. Main-memory implicit checkpoints (MMIC) and state regeneration
  2. Uncoordinated checkpoints w/ fast recovery
  3. Pure communication-induced checkpoints

12 MMIC Idea
• Leverage inherent redundancy of stateful reduction networks
  – Eliminate explicit checkpoints
• Use volatile storage
  – Reduces checkpoint latency
    – Checkpointed state used to regenerate the state of other failed processes
• Establish recovery clique
  – Enable efficient recovery

13 Filter State Operations
• Input: set of states from a complete set of sibling nodes
  Output: regenerated state of the parent node
• Input: states from a parent node and an incomplete set of child nodes
  Output: regenerated state of the failed node(s)
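For a running-average filter whose state is a (count, average) pair, the two operations on this slide might look like the sketch below. `decompose` is the name the later MMIC example uses for the second operation; `compose` is a hypothetical name chosen here for the first, since the slide text does not name it. The tuple representation is likewise an assumption.

```python
from typing import Sequence, Tuple

State = Tuple[int, float]  # (sample count, average)


def compose(children: Sequence[State]) -> State:
    """Regenerate a parent's state from the states of all of its children.

    'compose' is a hypothetical name used only in this sketch.
    """
    count = sum(c for c, _ in children)
    if count == 0:
        raise ValueError("children hold no samples")
    total = sum(c * avg for c, avg in children)
    return count, total / count


def decompose(parent: State, surviving_children: Sequence[State]) -> State:
    """Regenerate a failed child's state from its parent and its siblings."""
    parent_count, parent_avg = parent
    count = parent_count - sum(c for c, _ in surviving_children)
    if count <= 0:
        raise ValueError("parent state does not cover a missing child")
    total = parent_count * parent_avg - sum(c * a for c, a in surviving_children)
    return count, total / count


children = [(2, 8), (2, 27), (2, 16), (2, 5)]
parent = compose(children)
# Pretend the second child failed and rebuild its state from the rest.
lost = decompose(parent, [(2, 8), (2, 16), (2, 5)])
print(parent, lost)
```

If more than one child has failed, this sketch can recover only the combined state of the failed children, not each one separately.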

14 Filter State Operations (cont’d)
• Input: state from a node
  Output: two states to be assumed by two new sibling nodes jointly responsible for the task of the original node
• Input: two states from nodes in the network
  Output: state to be assumed by a new node responsible for the tasks of the two original nodes
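The two operations on this slide can be sketched for a running-average state in the same way. `merge` is the name used in the later MMIC example; `split` is a hypothetical name for the first operation, and dividing the sample count roughly in half is only one of many valid ways to split.

```python
from typing import Tuple

State = Tuple[int, float]  # (sample count, average)


def split(state: State) -> Tuple[State, State]:
    """Divide one state into two that together represent the original.

    'split' is a hypothetical name; halving the count is an arbitrary choice.
    """
    count, average = state
    first = count // 2
    return (first, average), (count - first, average)


def merge(a: State, b: State) -> State:
    """Combine two states into the state of a node doing both jobs."""
    count = a[0] + b[0]
    if count == 0:
        raise ValueError("cannot merge two empty states")
    return count, (a[0] * a[1] + b[0] * b[1]) / count


merged = merge((2, 27), (2, 16))
# Splitting and then merging again should give back the same state.
round_trip = merge(*split(merged))
print(merged, round_trip)
```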

15 MMIC: Recovery Semantics
1. Detect failure
2. Establish a recovery clique
   – Set of processes whose persistent state can be used to regenerate that of the failed node
3. Identify take-over node
   – Assumes role of failed node
4. Regenerate persistent state of failed node
5. Reintegrate regenerated state into take-over node
6. Resume
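Steps 2 and 3 can be illustrated with a small sketch. It assumes, as in the running-average example on the following slides, that the recovery clique is the failed node's parent plus its surviving siblings and that the take-over node is one of those siblings. The function name `recovery_clique`, the dictionary representation of the tree, and the rule for picking the take-over node are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def recovery_clique(
    parent_of: Dict[str, Optional[str]], failed: str
) -> Tuple[List[str], str]:
    """Return (clique members, take-over node) for one failed node."""
    parent = parent_of[failed]
    if parent is None:
        raise ValueError("root failure is outside the scope of this sketch")
    # Surviving siblings: nodes with the same parent, excluding the failed one.
    siblings = sorted(
        node for node, p in parent_of.items() if p == parent and node != failed
    )
    if not siblings:
        raise ValueError("no surviving sibling available to take over")
    clique = [parent] + siblings
    # Arbitrary policy for the sketch: the first sibling in name order.
    take_over = siblings[0]
    return clique, take_over


# One parent 'p' with four children; child 'c1' has failed.
tree = {"p": None, "c0": "p", "c1": "p", "c2": "p", "c3": "p"}
clique, take_over = recovery_clique(tree, "c1")
print(clique, take_over)
```

A real system would choose the take-over node by some policy such as load or locality; the name-order rule here only keeps the sketch deterministic.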

16 MMIC Example: Running Average Filter
[Figure: a parent node with state S_p: 8,14 and four children with states S_c0: 2,8, S_c1: 2,27, S_c2: 2,16 and S_c3: 2,5; back-ends (BE) at the leaves.]

17 MMIC Example: Running Average Filter
1. Detect Failure
[Figure: a parent node with state S_p: 8,14 and four children with states S_c0: 2,8, S_c1: 2,27, S_c2: 2,16 and S_c3: 2,5; back-ends (BE) at the leaves.]

18 MMIC Example: Running Average Filter
1. Detect Failure
2. Calculate Recovery Clique
[Figure: a parent node with state S_p: 8,14 and four children with states S_c0: 2,8, S_c1: 2,27, S_c2: 2,16 and S_c3: 2,5; back-ends (BE) at the leaves.]

19 MMIC Example: Running Average Filter
1. Detect Failure
2. Calculate Recovery Clique
3. Assign a “take-over” node.
[Figure: a parent node with state S_p: 8,14 and four children with states S_c0: 2,8, S_c1: 2,27, S_c2: 2,16 and S_c3: 2,5; back-ends (BE) at the leaves.]

20 MMIC Example: Running Average Filter
1. Detect Failure
2. Calculate Recovery Clique
3. Assign a “take-over” node.
4. Regenerate lost state into “take-over” node:
   4.1 read(S_p, S_c0, S_c3)
   4.2 decompose(S_p, S_c0, S_c2, S_c3) → S_c1’
   4.3 merge(S_c1’, S_c2) → S_c2’
   4.4 write(S_c2’) → S_c2
[Figure: states S_p: 8,14, S_c0: 2,8, S_c1: 2,27, S_c2: 2,16 and S_c3: 2,5; the values 2,8, 8,14 and 2,5 are read to regenerate S_c1’: 2,27, which is merged with S_c2 into S_c2’: 4,21; back-ends (BE) at the leaves.]

21 MMIC Example: Running Average Filter
1. Detect Failure
2. Calculate Recovery Clique
3. Assign a “take-over” node.
4. Regenerate lost state into “take-over” node:
   4.1 read(S_p, S_c0, S_c3)
   4.2 decompose(S_p, S_c0, S_c2, S_c3) → S_c1’
   4.3 merge(S_c1’, S_c2) → S_c2’
   4.4 write(S_c2’) → S_c2
5. Update and resume
[Figure: after recovery, the parent holds S_p: 8,14 and the remaining children hold S_c0: 2,8, S_c2: 4,21 and S_c3: 2,5; back-ends (BE) at the leaves.]
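Steps 4.1 to 4.4 can be replayed with the numbers from this example. The sketch below treats each state as a (count, average) pair and keeps the states in a plain dictionary. Both choices are assumptions for illustration, and the helper bodies are one plausible reading of `decompose` and `merge` rather than the author's implementation.

```python
from typing import Dict, Sequence, Tuple

State = Tuple[int, float]  # (sample count, average)


def decompose(parent: State, survivors: Sequence[State]) -> State:
    """Recover the failed child's state from the parent and its siblings."""
    count = parent[0] - sum(c for c, _ in survivors)
    total = parent[0] * parent[1] - sum(c * a for c, a in survivors)
    return count, total / count


def merge(a: State, b: State) -> State:
    """Combine two states by a count-weighted mean."""
    count = a[0] + b[0]
    return count, (a[0] * a[1] + b[0] * b[1]) / count


# States still held in memory after child c1 (state 2,27) has failed.
states: Dict[str, State] = {
    "p": (8, 14.0), "c0": (2, 8.0), "c2": (2, 16.0), "c3": (2, 5.0),
}

# 4.1 read the clique's states (c2 is the take-over node in the example)
s_p, s_c0, s_c2, s_c3 = (states[k] for k in ("p", "c0", "c2", "c3"))
# 4.2 regenerate the lost state of c1
s_c1_regen = decompose(s_p, [s_c0, s_c2, s_c3])
# 4.3 fold the regenerated state into the take-over node's state
s_c2_new = merge(s_c1_regen, s_c2)
# 4.4 write the result back as the take-over node's state
states["c2"] = s_c2_new
print(s_c1_regen, s_c2_new)
```

The regenerated state matches the lost S_c1 of 2,27. The merged average comes out as 21.5, while the slide shows S_c2’ as 4,21, which suggests the slide displays averages as integers.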

22 Outstanding Issues and Other Research
• Evaluation of new rollback recovery protocols
• Preemptive vs. non-preemptive recovery
• Failure zone identification
• Non-trivial filters
• Failure detection
• Topology reconfiguration
• Modeling
• Transmission layer reliability
• Efficient data loss repair

23 References
• Roth, Arnold, and Miller, “MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools”, in SC2003.
• Roth, Arnold, and Miller, “Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned”, in 2004 High-Performance Grid Computing Workshop.
• More to come … see you next year!

24 Filter State Operations (cont’d)
• Input: none
  Output: current state of filter object
• Input: state of a filter object
  Output: none
  Side effect: checkpoint to volatile/stable storage

