Positive Feedback Loops in DHTs or Be Careful How You Simulate January 13, 2004 Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz From “Handling.

1 Positive Feedback Loops in DHTs or Be Careful How You Simulate
January 13, 2004
Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz
From “Handling Churn in a DHT”, available at http://bamboo-dht.org/pubs.html (and in the back of the room)

2 Background
A year ago, we started benchmarking DHTs
Usual goals:
  –Improve the state of the art
  –Provide a metric of success
Interested in real implementations
  –Not simulations
  –Want to use the DHTs in real applications
Need a solid experimental framework

3 PlanetLab
Our first testbed
A “real” network
  –Some machines bandwidth- or CPU-limited
  –Lots of cross traffic
But problems
  –Too hard to get reproducible results
  –Too little scale (~250 machines)

4 ModelNet
Run several virtual hosts per CPU
  –Override system calls like sendto, recvfrom
Route all packets through a single host
  –Applies delay, queuing, loss
  –Uses 10,000-node AS-level topology
Allows for reasonable scale
  –Have run with up to 4,000 DHT nodes
Reproducible results
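The emulator's job of applying delay, queuing, and loss to every packet can be pictured with a single emulated link. This is a minimal sketch under stated assumptions, not ModelNet's code: the class `EmulatedLink`, a fixed bandwidth, a fixed propagation delay, and a drop-tail queue bounded in packets are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EmulatedLink:
    """Hypothetical single emulated link: fixed bandwidth, fixed propagation
    delay, and a drop-tail queue bounded in packets."""
    bandwidth_bps: float          # link capacity, bits per second
    prop_delay_s: float           # one-way propagation delay, seconds
    queue_limit: int              # max packets queued or in transmission
    busy_until: float = 0.0       # time the link finishes its current backlog
    in_flight: List[float] = field(default_factory=list)  # finish times of queued packets

    def send(self, now: float, size_bytes: int) -> Optional[float]:
        """Return the arrival time of a packet sent at `now`, or None if dropped."""
        # Packets whose transmission has finished no longer occupy the queue.
        self.in_flight = [t for t in self.in_flight if t > now]
        if len(self.in_flight) >= self.queue_limit:
            return None                              # queue full: drop-tail loss
        start = max(now, self.busy_until)            # queuing delay behind backlog
        finish = start + size_bytes * 8 / self.bandwidth_bps
        self.busy_until = finish
        self.in_flight.append(finish)
        return finish + self.prop_delay_s            # plus propagation delay
```

With a 1 Mbit/s link, 50 ms delay, and room for 4 packets, a burst of ten 1250-byte packets sent at once delivers the first four at roughly 60, 70, 80, and 90 ms and drops the other six. A burst of probes on a bandwidth-limited host therefore turns into loss without any node failing.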

5 A Simple Experiment
Start 1000 nodes in a DHT (FreePastry)
Let network stabilize
Start 200 more
What happens?

6 FreePastry under Massive Join
Does the bandwidth explosion have something to do with the DHT’s collapse?

7 Talk Overview
Background
Teaser
Pastry review
Pastry’s problem and a fix
Conclusions and future work

8 Pastry Review
Each DHT node has
  –An identifier in [0, 2^160)
  –A leaf set: predecessors and successors
  –A routing table: nodes with similar prefixes, choosing the node for each prefix by proximity (in network latency)
Each node is responsible for the keys closest to its ID
(Figure: ID-space prefixes 0…, 10…, 110…, 111…)
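The two ID-space ideas on this slide, shared prefixes and "closest ID owns the key", can be sketched as below. The function names are hypothetical. The digit size is an assumption: `bits_per_digit=1` matches the binary prefixes in the figure, and the default of 4 (hex digits) is only an illustrative choice. The sketch measures closeness as distance around the circular 160-bit space and breaks ties by list order.

```python
from typing import List

ID_BITS = 160  # identifiers live in [0, 2**160)

def digits(node_id: int, bits_per_digit: int = 4) -> List[int]:
    """Split an ID into base-2**bits_per_digit digits, most significant first.
    Assumes ID_BITS is divisible by bits_per_digit."""
    n = ID_BITS // bits_per_digit
    mask = (1 << bits_per_digit) - 1
    return [(node_id >> (bits_per_digit * (n - 1 - i))) & mask for i in range(n)]

def shared_prefix_len(a: int, b: int, bits_per_digit: int = 4) -> int:
    """Number of leading digits a and b have in common (selects the routing-table row)."""
    count = 0
    for x, y in zip(digits(a, bits_per_digit), digits(b, bits_per_digit)):
        if x != y:
            break
        count += 1
    return count

def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on the circular identifier space."""
    d = abs(a - b)
    return min(d, (1 << ID_BITS) - d)

def responsible_node(key: int, node_ids: List[int]) -> int:
    """The node whose ID is closest to the key owns it."""
    return min(node_ids, key=lambda n: ring_distance(n, key))
```

For example, IDs beginning with binary 110… and 111… share a 2-digit prefix when `bits_per_digit=1`. Key 2 is owned by the node just below zero on the ring, not by node 10, because wraparound makes it closer.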

9 Pastry Join Algorithm
function join (A, G) =
  G' = nearest_neighbor (A, G);
  (B, P) = lookup (G', ID_A);
  L_A = get_leaf_set (B);
  for i from 0 to |P| - 1 do
    k = len_longest_matching_pfx (ID_A, ID_{P_i});
    R_i = get_routing_table_level (P_i, k);
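To make the join pseudocode concrete, here is a runnable sketch against a toy network. It is not FreePastry's implementation, and several simplifications are assumptions:
  –The hypothetical `ToyNet` has a global view of all members.
  –Latency is distance between random 1-D coordinates.
  –IDs are 16 bits with hex digits rather than 160 bits.
  –`nearest_neighbor` scans every member instead of probing.
  –The final routing hop jumps straight to the numerically closest node instead of using leaf sets.

```python
import random

BITS, B = 16, 4                      # toy: 16-bit IDs, hex digits (Pastry uses 160-bit IDs)
NDIG = BITS // B                     # digits per ID

def digit(x, i):
    """i-th base-2**B digit of x, most significant first."""
    return (x >> (B * (NDIG - 1 - i))) & ((1 << B) - 1)

def pfx(a, b):
    """len_longest_matching_pfx: number of leading digits a and b share."""
    k = 0
    while k < NDIG and digit(a, k) == digit(b, k):
        k += 1
    return k

def ring_dist(a, b):
    d = abs(a - b)
    return min(d, (1 << BITS) - d)

class ToyNet:
    """Hypothetical global view of a network: node ID -> 1-D coordinate."""
    def __init__(self, coords, members):
        self.coords = coords            # latency(a, b) = |coord(a) - coord(b)|
        self.members = set(members)     # nodes already in the DHT

    def latency(self, a, b):
        return abs(self.coords[a] - self.coords[b])

    def nearest_neighbor(self, a, g):
        # Real Pastry reaches this by probing outward from g; the toy just scans.
        return min(self.members, key=lambda n: self.latency(a, n))

    def lookup(self, start, key):
        """Prefix routing; returns (owner of key, path of nodes visited)."""
        cur, path = start, [start]
        while True:
            k = pfx(cur, key)
            better = [n for n in self.members if pfx(n, key) > k]
            if not better:
                break
            cur = min(better, key=lambda n: self.latency(cur, n))  # proximity choice
            path.append(cur)
        owner = min(self.members, key=lambda n: ring_dist(n, key))
        if owner != cur:                # final hop (real Pastry uses the leaf set)
            path.append(owner)
        return owner, path

    def get_leaf_set(self, b, half=2):
        """The `half` members on each side of b in ID order."""
        ring = sorted(self.members)
        i = ring.index(b)
        return [ring[(i + j) % len(ring)] for j in range(-half, half + 1) if j != 0]

    def get_routing_table_level(self, p, k):
        """Row k of p's table: the closest node for each next digit after a k-digit match."""
        row = {}
        for n in self.members:
            if n != p and pfx(n, p) == k:
                d = digit(n, k)
                if d not in row or self.latency(p, n) < self.latency(p, row[d]):
                    row[d] = n
        return row

def join(net, a, g):
    """The slide's join(A, G), run against the toy model."""
    g2 = net.nearest_neighbor(a, g)                                   # G'
    b, path = net.lookup(g2, a)                                       # (B, P)
    leaf_set = net.get_leaf_set(b)                                    # L_A
    rows = [net.get_routing_table_level(p, pfx(a, p)) for p in path]  # R_i
    net.members.add(a)
    return leaf_set, rows, path

# Usage: 49 existing members; the last sampled ID joins via the first.
random.seed(1)
ids = random.sample(range(1 << BITS), 50)
coords = {n: random.random() for n in ids}
net = ToyNet(coords, ids[:-1])
leaf_set, rows, path = join(net, ids[-1], ids[0])
```

Each row copied from P_i is usable by A because A and P_i share the first k digits. Every hop of the lookup lengthens that shared prefix, so successive P_i fill successively deeper rows.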

10 Probes in Pastry’s Join
To compute nearest_neighbor, must probe
  –Looking for nearest node in some set
Existing nodes also probe joining node
Castro et al. estimate ~150 probes/join
  –Independent of congestion for correctness
On failure, must probe to find replacement
  –May need many probes to find closest one
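Probing for the nearest node in a set can be sketched as "probe every candidate once and keep the fastest responder". The functions below are hypothetical. They assume independent random loss per probe and a fixed timeout, and they make explicit that a lost probe and a dead node produce the same observation.

```python
import random
from typing import Dict, List, Optional, Tuple

def probe(rtt_s: float, loss_rate: float, rng: random.Random,
          timeout_s: float = 1.0) -> Optional[float]:
    """Hypothetical probe of one node: the measured RTT, or None on timeout.
    The caller cannot tell a dropped packet from a dead node."""
    if rng.random() < loss_rate or rtt_s > timeout_s:
        return None
    return rtt_s

def nearest_by_probing(candidates: Dict[str, float], loss_rate: float,
                       rng: random.Random) -> Tuple[Optional[str], List[str]]:
    """Probe each candidate once; return (closest responder, presumed-failed nodes)."""
    replies = {n: probe(rtt, loss_rate, rng) for n, rtt in candidates.items()}
    alive = {n: r for n, r in replies.items() if r is not None}
    presumed_failed = [n for n, r in replies.items() if r is None]
    best = min(alive, key=alive.get) if alive else None
    return best, presumed_failed
```

With no loss, candidates at 20 ms, 80 ms, and 1.5 s yield the 20 ms node, and only the 1.5 s node is presumed failed. With heavy loss, live nodes also land in the presumed-failed list and may trigger replacement probing.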

11 Talk Overview
Background
Teaser
Pastry review
Pastry’s problem and a fix
Conclusions and future work

12 Teaser Explanation
In a network under stress, there are many probes
  –If bandwidth-limited, they interfere with each other
  –Many probes are dropped, and each drop looks like a failure
Pastry responds to failure by sending more probes
  –Probability of drop goes up
  –We have a positive feedback cycle (squelch)
Easy to confirm
  –Increasing available bandwidth solves the problem
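The squelch cycle can be illustrated with a toy fluid model. It is not taken from the paper, and its parameters are hypothetical:
  –Each round, nodes offer a baseline number of probes.
  –A bottleneck carries at most `capacity` probes.
  –Every probe beyond capacity is lost, read as a failure, and triggers `retries_per_loss` new probes in the next round.

```python
from typing import List

def reactive_load(base: float, capacity: float,
                  retries_per_loss: float, rounds: int) -> List[float]:
    """Toy model: probes offered per round when every lost probe is treated
    as a failure and answered with `retries_per_loss` new probes."""
    load, history = float(base), []
    for _ in range(rounds):
        dropped = max(0.0, load - capacity)   # excess over the bottleneck is lost
        history.append(load)
        load = base + retries_per_loss * dropped
    return history
```

With base load 100 and capacity 120, the load stays at 100 forever. With base load 150 and two retries per loss, it grows 150, 210, 330, 570, 1050, …. Raising capacity to 200 makes the same base load stable again, which mirrors the slide's observation that more bandwidth removes the problem.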

13 What Went Wrong?
Pastry publications show it working fine
Existing Pastry results are of two types:
  1. Simulations of 10,000-100,000 nodes
     –Don’t model queuing, delay, or cross traffic
  2. PlanetLab tests using tens of nodes
     –Low scale, ample bandwidth on chosen hosts

14 A Simple Fix
Idea: fix broken links periodically
  –Instead of recovering in reaction to failure
  –Breaks the feedback loop
Also, back off (lengthen the period) in response to loss
  –Now it’s a negative feedback cycle (damping)
Still have a probe problem:
  –How to probe independently of congestion?
  –Good probes are important for neighbor proximity
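Periodic repair with backoff can be sketched as a timer whose period grows when a round loses probes and shrinks back when it does not. Doubling, halving, and the 1 s and 64 s bounds are hypothetical choices; the slide says only that the period is scaled in response to loss. The class name `PeriodicRepair` is also illustrative.

```python
class PeriodicRepair:
    """Hypothetical maintenance timer: repair broken links on a period rather
    than on every failure, and back off when probes are lost."""
    def __init__(self, base_period_s: float = 1.0, max_period_s: float = 64.0,
                 backoff: float = 2.0):
        self.base = base_period_s
        self.max = max_period_s
        self.backoff = backoff
        self.period = base_period_s

    def next_delay(self, probes_lost: bool) -> float:
        """Delay before the next repair round, given how the last round went."""
        if probes_lost:
            self.period = min(self.period * self.backoff, self.max)   # damp under loss
        else:
            self.period = max(self.period / self.backoff, self.base)  # recover when clear
        return self.period
```

Eight lossy rounds in a row stretch the period to 2, 4, 8, 16, 32, 64, 64, 64 seconds, and clean rounds then halve it back toward 1 s. Because more loss means fewer probes per second, the controller damps the load instead of amplifying it.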

15 Restoring Proximity
Finding the closest neighbor takes time
  –Meanwhile, routing is no longer O(log n)
Fix: fill holes with first appropriate node
  –Can find such a node using a lookup
  –Immediately restores O(log n) routing
Later, can look for close nodes
  –Again, periodically, with backoff on failure
  –Use several techniques not covered here
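The "fill holes first, optimize proximity later" split can be sketched as follows. The names are hypothetical, and several details are assumptions:
  –The sketch uses 160-bit IDs with hex digits.
  –A routing-table slot (level, digit) is filled by looking up a key that has the required prefix.
  –`lookup` is a stand-in that returns the key's owner.
The owner is kept only if it really has that prefix, since the numerically closest node may not. The separate closer-node techniques the slide mentions are not modeled; `improve_slot` only compares latencies of candidates supplied by the caller.

```python
from typing import Callable, Dict, Tuple

BITS, B = 160, 4            # 160-bit IDs; 4-bit (hex) digits assumed
NDIG = BITS // B            # 40 digits per ID
Table = Dict[Tuple[int, int], int]   # (level, digit) -> node ID

def key_for_slot(my_id: int, level: int, dgt: int) -> int:
    """A key that shares `level` digits with my_id and has `dgt` as its next digit."""
    shift = B * (NDIG - 1 - level)
    prefix = (my_id >> (shift + B)) << (shift + B)   # keep the first `level` digits
    return prefix | (dgt << shift)                   # set digit `level`; rest zero

def matches_slot(node_id: int, my_id: int, level: int, dgt: int) -> bool:
    """True if node_id's first level+1 digits are the ones the slot requires."""
    shift = B * (NDIG - 1 - level)
    return node_id >> shift == key_for_slot(my_id, level, dgt) >> shift

def fill_hole(table: Table, my_id: int, level: int, dgt: int,
              lookup: Callable[[int], int]) -> None:
    """Fill an empty slot with the first suitable node, found by a lookup."""
    if (level, dgt) in table:
        return
    owner = lookup(key_for_slot(my_id, level, dgt))
    if matches_slot(owner, my_id, level, dgt):   # owner may lack the prefix
        table[(level, dgt)] = owner

def improve_slot(table: Table, level: int, dgt: int, candidate: int,
                 latency: Callable[[int], float]) -> None:
    """Later, periodic pass: replace the entry only with a closer node.
    The caller is assumed to pass a candidate that already matches the slot."""
    current = table.get((level, dgt))
    if current is None or latency(candidate) < latency(current):
        table[(level, dgt)] = candidate
```

The first step needs one lookup per hole and no latency measurements, which is why it restores O(log n) routing quickly. The proximity pass can then run slowly in the background under the same periodic, backed-off schedule.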

16 Related Work
Chord’s stabilization is proactive, periodic
  –Not clear what motivated this decision
Mahajan et al.
  –Simulation-based study of Pastry under churn
  –Automatic tuning of maintenance rate
  –Suggest increasing rate on failures!
Liben-Nowell et al.
  –Analytical lower bound on maintenance costs

17 Conclusions
Simplifying the network model is dangerous
  –May lead to bad design, false sense of correctness
Separate concerns in DHT routing
  1. Correctness: comes from the leaf set
  2. Efficiency: comes from a filled routing table
  3. Proximity: only a concern after 1 and 2
Can we do better in simulation?
  –And still scale to 10,000s of nodes?
  –ModelNet requires a whole cluster…

18 Thanks for Listening!
More information available at http://bamboo-dht.org

