1
Exploring Tradeoffs in Failure Detection in P2P Networks
Shelley Zhuang, Ion Stoica, Randy Katz
HIIT Short Course, August 18-20, 2003
2
Problem Statement
One of the key challenges in achieving robustness in overlay networks: quickly detecting a node failure
Canonical solution: each node periodically pings its neighbors
Propose keep-alive techniques
Study the fundamental limitations and tradeoffs between detection time, control overhead, and probability of false positives
3
Outline
Motivation
Network Model and Assumptions
Keep-alive Techniques
Performance Evaluation
Conclusion
4
Network Model and Assumptions
P2P system with n nodes
Each node A knows d other nodes
Average path length = l
Node up-time T ~ i.i.d. exponential($\lambda_f$)
Fail-stop failures
If a neighbor is lost, a node can use another neighbor to route the packet without affecting the path length
5
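Packet Loss Probability
$\delta$ = average time it takes a node to detect that a neighbor has failed
Probability that a node forwards a packet to a neighbor that has failed is $1 - e^{-\lambda_f \delta} \approx \delta \lambda_f$
– Since up-times are exponential (memoryless): $P(T - t \le \delta \mid T \ge t) = P(T \le \delta)$
Probability that the packet is lost is $p_l \approx l \delta \lambda_f$
[Figure: pdf of T, with $\delta$ marked]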
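The approximation on this slide can be checked numerically. The following is a minimal sketch, not part of the talk: the function names and sample values (one-hour mean up-time, 5-second detection time, 6 hops) are assumptions chosen for illustration, and the exact path formula assumes each hop independently hits a failed neighbor.

```python
import math


def hop_loss_probability(failure_rate: float, detection_time: float) -> float:
    """Exact probability 1 - exp(-lambda_f * delta) that a neighbor has
    failed within the detection window."""
    return 1.0 - math.exp(-failure_rate * detection_time)


def path_loss_probability(failure_rate: float, detection_time: float,
                          path_length: int) -> float:
    """Probability that at least one of `path_length` hops is lost,
    assuming (hypothetically) that hops fail independently."""
    p_hop = hop_loss_probability(failure_rate, detection_time)
    return 1.0 - (1.0 - p_hop) ** path_length


def path_loss_approx(failure_rate: float, detection_time: float,
                     path_length: int) -> float:
    """First-order approximation l * delta * lambda_f from the slide."""
    return path_length * detection_time * failure_rate


if __name__ == "__main__":
    lam = 1.0 / 3600.0   # assumed: mean up-time of one hour
    delta = 5.0          # assumed: 5 s average detection time
    hops = 6             # assumed: average path length
    print("exact :", path_loss_probability(lam, delta, hops))
    print("approx:", path_loss_approx(lam, delta, hops))
```

The two values agree closely when $\delta \lambda_f$ is small, which is the regime the approximation targets.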
7
Aliveness Techniques
Baseline
– Each node sends a ping message to each of its neighbors every Δ seconds
[Figure: nodes A, B, C, D]
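To see how the ping period Δ drives detection time in the baseline scheme, the sketch below computes the delay between a failure and the next scheduled ping. It is illustrative only: the function names are hypothetical, pings are assumed to fire at exact multiples of the period, and probe timeouts and round-trip times are ignored. Under those assumptions a failure that occurs at a uniformly random point in the interval is noticed after Δ/2 on average.

```python
import math


def detection_delay(fail_time: float, period: float) -> float:
    """Time from a failure until the next scheduled ping.

    Assumes pings are sent at 0, period, 2*period, ... and that the first
    ping sent at or after the failure reveals it immediately.
    """
    next_ping = math.ceil(fail_time / period) * period
    return next_ping - fail_time


def mean_detection_delay(period: float, samples: int = 1000) -> float:
    """Average delay over failure times spread evenly across one interval."""
    times = [(i + 0.5) * period / samples for i in range(samples)]
    return sum(detection_delay(t, period) for t in times) / samples


if __name__ == "__main__":
    print(mean_detection_delay(2.0))  # close to period / 2
```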
8
Aliveness Techniques
Information Sharing
– Piggyback failures of neighbors in acknowledgement messages
– Best case: completely connected graph of degree d
[Figure: nodes A, B, C, D]
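Piggybacking can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the dictionary message format, and the choice to ignore reports about non-neighbors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    neighbors: Set[str]
    suspected: Set[str] = field(default_factory=set)  # neighbors believed failed

    def make_ack(self) -> Dict[str, object]:
        """Build a ping acknowledgement that piggybacks known failures."""
        failed: List[str] = sorted(self.suspected)
        return {"from": self.name, "failed": failed}

    def handle_ack(self, ack: Dict[str, object]) -> None:
        """Merge failures reported in an ack, keeping only our own neighbors."""
        reported = set(ack["failed"])
        self.suspected |= reported & self.neighbors


if __name__ == "__main__":
    a = Node("A", {"B", "C", "D"})
    b = Node("B", {"A", "C", "D"})
    b.suspected.add("D")          # B has already detected that D failed
    a.handle_ack(b.make_ack())    # A pings B and learns about D from the ack
    print(a.suspected)
```

Here A learns of D's failure from B's acknowledgement without waiting for its own ping to D to go unanswered.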
9
Aliveness Techniques
Boosting
– When a node detects the failure of a neighbor, D, it announces the failure to all other nodes that have D as their neighbor
– Best case: completely connected graph of degree d
[Figure: nodes A, B, C, D]
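A rough sketch of the boosting step follows. It assumes a hypothetical `Overlay` class holding a global backpointer table (for each node, the set of nodes that list it as a neighbor). That global table is a simplification for illustration; in a real overlay each node would have to learn these backpointers itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class Overlay:
    def __init__(self, neighbors: Dict[str, Iterable[str]]) -> None:
        self.neighbors: Dict[str, Set[str]] = {
            node: set(nbrs) for node, nbrs in neighbors.items()
        }
        # backpointers[x] = nodes that have x in their neighbor set
        self.backpointers: Dict[str, Set[str]] = defaultdict(set)
        for node, nbrs in self.neighbors.items():
            for nbr in nbrs:
                self.backpointers[nbr].add(node)

    def boost_targets(self, detector: str, failed: str) -> List[str]:
        """Nodes the detector would notify once it decides `failed` is down."""
        interested = self.backpointers.get(failed, set())
        return sorted(interested - {detector})


if __name__ == "__main__":
    # Best case from the slide: a completely connected graph.
    nodes = ["A", "B", "C", "D"]
    overlay = Overlay({n: [m for m in nodes if m != n] for n in nodes})
    print(overlay.boost_targets("A", "D"))
```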
11
Performance Evaluation
Case studies
– d-regular network
– Chord lookup protocol
Chord event-driven simulator
– Gnutella join/leave trace
– Packet loss rate
– Control overhead
Planetlab experiments
– Planetlab event-driven simulator
– False positives
12
Loss Rate – Gnutella
Loss rate = (number of lookup timeouts) / (number of lookups)
20 lookups per second
Boosting (simple): no additional state
13
Loss Rate – Gnutella
Wait $T_{to}$ seconds before deciding that a probe is lost
Require multiple losses before deciding that a neighbor has failed
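The two rules on this slide can be combined into a small per-neighbor state machine. The sketch below is an assumption-laden illustration: the `NeighborMonitor` class is hypothetical, and it interprets "multiple losses" as consecutive losses, which the slide does not specify. Once a neighbor is declared failed it stays failed, matching the fail-stop assumption.

```python
from typing import Optional


class NeighborMonitor:
    def __init__(self, loss_threshold: int) -> None:
        if loss_threshold < 1:
            raise ValueError("loss_threshold must be at least 1")
        self.loss_threshold = loss_threshold
        self.consecutive_losses = 0
        self.failed = False

    def record_probe(self, rtt: Optional[float], timeout: float) -> bool:
        """Record one probe result and return whether the neighbor is
        now considered failed.

        `rtt` is the measured round-trip time in seconds, or None if no
        acknowledgement arrived at all. A probe counts as lost when no
        ack arrives within `timeout` seconds.
        """
        lost = rtt is None or rtt > timeout
        if lost:
            self.consecutive_losses += 1
        else:
            self.consecutive_losses = 0
        if self.consecutive_losses >= self.loss_threshold:
            self.failed = True
        return self.failed


if __name__ == "__main__":
    monitor = NeighborMonitor(loss_threshold=2)
    for rtt in (None, 0.2, None, 1.5):
        print(monitor.record_probe(rtt, timeout=1.0))
```

Raising either the timeout or the loss threshold trades slower detection for fewer false positives.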
14
Overhead (count) – Gnutella
Constant probing overhead (1 probe/second)
Small difference due to boost messages
15
Overhead (bps) – Gnutella
Boosting with backpointers: 1.29 times the baseline
16
Overhead (bps) – Gnutella
Send backpointers every 10 probe acks
17
False Positive – Planetlab
Propagation of positive information
Most false positives are of TO = 0, 1; increase the probe timeout threshold
18
Overhead (bps) – Planetlab
Overhead from boost messages and positive information correlates with the loss rate
20
Conclusion
Examined three keep-alive techniques in Chord with a Gnutella join/leave trace
By carefully designing keep-alive algorithms, it is possible to significantly reduce packet loss probability
Probability of false positive for boosting with backpointers is < 0.01 at a loss rate of ~8.6%, achieved by propagating positive information and increasing the probe timeout threshold
21
Future Work
Evaluate keep-alive schemes under massive failures and churn
Optimal control resource allocation strategy for a given network topology, failure rate, and load distribution
Other applications of keep-alive techniques?