
1 On Unbiased Sampling for Unstructured Peer-to-Peer Networks
Daniel Stutzbach – University of Oregon
Reza Rejaie – University of Oregon
Nick Duffield – AT&T Labs-Research
Subhabrata Sen – AT&T Labs-Research
Walter Willinger – AT&T Labs-Research
Internet Measurement Conference, Rio de Janeiro, Brazil, October 25th, 2006

2 Motivation
Daniel Stutzbach, The ION P2P Project, http://mirage.cs.uoregon.edu/P2P, Slide 2/19
- P2P systems are very popular in practice.
  - Several million simultaneous users collectively.
  - 60% of all Internet traffic [CacheLogic Research 2005]
- Measurement studies aid in understanding existing systems and user behavior.
- Capturing an accurate global picture is often infeasible.
  - P2P systems are distributed, large, and rapidly changing.
  - Capturing a global picture is time-consuming, resulting in a blurry picture.
- Sampling is a natural approach, and has been used implicitly in most earlier P2P measurement studies.
- But how do we know the samples are representative?

3 The Problem
- We focus on sampling peer properties:
  - Number of neighbors (degree)
  - Link bandwidth
  - Number of shared files
  - Remaining uptime
- Sampling peer properties occurs in two steps:
  - Discover and select peers
  - Collect measurements from the selected peers
- Selecting peers uniformly at random is hard.
  - Temporal: Peer dynamics can introduce bias.
  - Topological: The graph topology can introduce bias.
- We first examine these two problems in isolation. We then examine them together.

4 Sampling with Dynamics
- Define V_t as the set of peers present at time t.
- We gather samples over a measurement window of length Δ.
- The most common approach is to gather peers from the set of all peers present at any time during the window.
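The set used by the common approach can be written as a union of the per-instant peer sets. This is a sketch consistent with the definitions on this slide; the window start t_0 and the subscript notation on the left-hand side are names assumed here for illustration.

```latex
V_{t_0,\,t_0+\Delta} \;=\; \bigcup_{t = t_0}^{t_0+\Delta} V_t
```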

5 Bias towards Short-Lived Peers
[Figure: timeline of one long-lived peer alongside a succession of short-lived peers; horizontal axis: Time]
- Consider a simple two-peer system, containing:
  - One long-lived peer
  - One rapidly-changing short-lived peer
- The common approach over-selects short-lived peers.

6 Handling Temporal Causes of Bias
- The common approach is intuitive but incorrect.
- Sampling peers is the wrong goal. We want to sample peer properties.
- Two samples from the same peer, but at different times, are distinct.
- Allow sampling the same peer more than once, at different points in time.

7 Example of avoiding bias towards Short-Lived Peers
[Figure: timeline of one long-lived peer alongside a succession of short-lived peers; horizontal axis: Time]
- Allowing a peer to be re-selected solves the problem.
- The long-lived peer will be selected half the time, reflecting the actual state of the system.
- How do we select a peer uniformly at random at a particular moment?
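The two-peer example can be checked with a small simulation. The sketch below is illustrative only: it assumes time is divided into discrete slots, that the short-lived peer is replaced in every slot, and that the window spans 10 slots. The function names and peer labels are made up for this example.

```python
import random

def peers_at(t):
    """Peers present in slot t: the long-lived peer plus the t-th short-lived peer."""
    return ["long", f"short-{t}"]

def sample_common(window, rng):
    """Common approach: pool every peer seen during the window, pick one uniformly."""
    pool = sorted({p for t in range(window) for p in peers_at(t)})
    return rng.choice(pool)

def sample_instant(window, rng):
    """Alternative: pick an instant, then pick uniformly among peers present then.

    The same peer may be returned by many calls, at different instants.
    """
    t = rng.randrange(window)
    return rng.choice(peers_at(t))

def long_fraction(sampler, window=10, trials=20000, seed=1):
    """Fraction of samples that land on the long-lived peer."""
    rng = random.Random(seed)
    hits = sum(sampler(window, rng) == "long" for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print("common approach:", long_fraction(sample_common))
    print("per-instant    :", long_fraction(sample_instant))
```

With a 10-slot window the pooled set holds 11 distinct peers, so the common approach should pick the long-lived peer roughly 1 time in 11, while per-instant sampling should pick it about half the time.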

8 Sampling from Static Graphs
- Assume for the moment a static graph.
- Goal: Select a peer uniformly from the graph.
- Discover: Begin with one peer. Query peers to discover neighbors.
  - Classic algorithms: Breadth-First Search, Depth-First Search
- Select: Choose a subset of discovered peers.
- Gather samples from the selected peers.
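The discover-then-select pattern above can be sketched with a breadth-first search. This is a minimal illustration, not the authors' crawler: the overlay is assumed to be an in-memory dictionary mapping each peer to its neighbor list, and `bfs_discover` and `select` are hypothetical names.

```python
from collections import deque
import random

def bfs_discover(neighbors, start, limit=None):
    """Discover peers breadth-first from `start`, stopping after `limit` peers if given."""
    seen = [start]
    seen_set = {start}
    queue = deque([start])
    while queue and (limit is None or len(seen) < limit):
        peer = queue.popleft()
        for nxt in neighbors[peer]:          # stands in for querying `peer` over the network
            if nxt not in seen_set:
                seen_set.add(nxt)
                seen.append(nxt)
                queue.append(nxt)
                if limit is not None and len(seen) >= limit:
                    break
    return seen

def select(discovered, k, rng):
    """Choose k distinct discovered peers; each peer can be selected at most once."""
    return rng.sample(discovered, k)

if __name__ == "__main__":
    overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    found = bfs_discover(overlay, "a")
    print(found, select(found, 2, random.Random(0)))
```

When the search is cut off early by `limit`, the discovered set depends on the starting peer and on which neighbors are reached first, so it is not a uniform sample of the graph.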

9 Advantages of Random Walks
- Problems with classic approaches:
  - Peers are correlated by their neighbor relationship.
  - Peers with higher degree are discovered more often.
  - A peer can only be selected once.
- Random walks are a promising alternative:
  - The information in the starting location is "lost" by repeatedly injecting randomness at each step.
  - The results are biased, but the bias is precisely known.
  - Random walks can implicitly visit the same peer twice.

10 Random walks, formally
- Random walks can be described with a transition matrix, P(x,y).
  - P(x,y) is the probability of moving from x to y in one move.
  - P^r(x,y) is the probability of moving from x to y after r moves.
- Random walks converge to a stationary distribution, which for a simple random walk is proportional to peer degree.
- Problem: we want a uniform distribution.
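For a simple random walk on a connected, non-bipartite, undirected graph, the quantities on this slide take the standard forms below. The symbols π for the stationary distribution, |E| for the number of edges, and |V| for the number of peers are notation assumed here; μ is the desired uniform distribution used on the next slide.

```latex
P(x,y) =
\begin{cases}
  \dfrac{1}{\deg(x)} & \text{if } y \text{ is a neighbor of } x,\\[4pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
\pi(x) = \frac{\deg(x)}{2\,|E|},
\qquad
\mu(x) = \frac{1}{|V|}.
```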

11 The Metropolis-Hastings Method
- The Metropolis-Hastings method modifies the transition matrix to yield the desired distribution.
  - Proven for static graphs
- Plugging in our P(x,y) and μ(x):
  - Select a neighbor y of x uniformly at random.
  - Transition to y with probability min(1, deg(x) / deg(y)).
  - Otherwise, self-transition to x.
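The general Metropolis-Hastings rule accepts a proposed move from x to y with probability min(1, μ(y)P(y,x) / (μ(x)P(x,y))). With P(x,y) = 1/deg(x) and a uniform μ this reduces to min(1, deg(x)/deg(y)), which is the rule on this slide. The sketch below compares a plain walk with the Metropolized one on a static graph. The five-peer graph, the function names, and the choice of 25 hops per walk are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy overlay: hub "h" has degree 4, "a" and "b" degree 2, "c" and "d" degree 1.
GRAPH = {
    "h": ["a", "b", "c", "d"],
    "a": ["h", "b"],
    "b": ["h", "a"],
    "c": ["h"],
    "d": ["h"],
}

def plain_step(neighbors, x, rng):
    """Simple random walk: always move to a uniformly chosen neighbor."""
    return rng.choice(neighbors[x])

def metropolized_step(neighbors, x, rng):
    """Pick a neighbor y uniformly, accept with probability min(1, deg(x)/deg(y))."""
    y = rng.choice(neighbors[x])
    accept = min(1.0, len(neighbors[x]) / len(neighbors[y]))
    return y if rng.random() < accept else x   # a rejection is a self-transition

def walk(step, neighbors, start, hops, rng):
    """Run `hops` steps from `start` and return the peer where the walk ends."""
    x = start
    for _ in range(hops):
        x = step(neighbors, x, rng)
    return x

def end_point_frequencies(step, hops=25, walks=20000, seed=7):
    """Fraction of independent walks that end at each peer."""
    rng = random.Random(seed)
    counts = Counter(walk(step, GRAPH, "h", hops, rng) for _ in range(walks))
    return {peer: counts[peer] / walks for peer in GRAPH}

if __name__ == "__main__":
    print("plain       :", end_point_frequencies(plain_step))
    print("metropolized:", end_point_frequencies(metropolized_step))
```

On this graph the plain walk should end at the hub about 40% of the time (degree 4 out of a degree sum of 10), while the Metropolized walk should end at each of the five peers about 20% of the time.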

12 Sampling from Dynamic Graphs
- Adapting to vanishing peers:
  - We maintain a stack of visited peers.
  - If a query times out, go back in the stack.
- Hypothesis: A Metropolized random walk will yield approximately unbiased samples in practice.
  - Trivially valid for extremely slowly changing graphs
  - Trivially false for extremely rapidly changing graphs
  - Where is the transition?
- Methodology:
  - Session-level simulations of a wide variety of situations
  - Determine what conditions lead to biased samples
  - Do those conditions arise in practice?
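The stack-based handling of vanishing peers might look roughly like the sketch below. It is a simplified reading of the two bullets above, not the authors' implementation. The `query` callback, the use of `TimeoutError` to signal a departed peer, the decision not to count a timed-out query as a hop, and the decision to keep stale neighbors in a peer's reported degree are all assumptions.

```python
import random

def mrwb_sample(query, start, hops, rng):
    """Metropolized walk that backs up when a peer stops answering.

    `query(peer)` is a hypothetical callback: it returns the peer's neighbor
    list, or raises TimeoutError if the peer has left the network.
    """
    first = list(query(start))
    stack = [(start, len(first), first)]       # (peer, reported degree, neighbors still worth trying)
    taken = 0
    while taken < hops:
        if not stack:
            raise RuntimeError("every visited peer became unreachable")
        x, deg_x, candidates = stack[-1]
        if not candidates:
            stack.pop()                        # dead end: go back one peer in the stack
            continue
        y = rng.choice(candidates)
        try:
            y_neighbors = list(query(y))
        except TimeoutError:
            candidates.remove(y)               # y vanished: stay on x and try another neighbor
            continue
        taken += 1
        accept = min(1.0, deg_x / len(y_neighbors)) if y_neighbors else 0.0
        if rng.random() < accept:
            stack.append((y, len(y_neighbors), y_neighbors))
        # otherwise: self-transition, x stays on top of the stack
    return stack[-1][0]

def make_query(neighbors, departed):
    """Build a toy query function over a static dict, with some peers marked as gone."""
    def query(peer):
        if peer in departed:
            raise TimeoutError(peer)
        return neighbors[peer]                 # may still list departed peers (stale view)
    return query

if __name__ == "__main__":
    overlay = {"h": ["a", "b", "c", "d"], "a": ["h", "b"], "b": ["h", "a"],
               "c": ["h"], "d": ["h"]}
    query = make_query(overlay, departed={"c", "d"})
    print(mrwb_sample(query, "a", 25, random.Random(3)))
```

Because a peer is pushed only after it answers a query, the walk in this sketch can only end on a peer that responded at some point during the walk.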

13 Metrics: Fundamental properties
- We focus on three fundamental properties that affect the walk:
  - Degree
  - Session length
  - Query latency (in paper only)
- We compute the Kolmogorov-Smirnov (KS) statistic (D) for each distribution versus a snapshot from an oracle.
- We evaluate these metrics under a variety of conditions:
  - Several models of churn
  - Several models of degree distribution
  - Four different peer discovery mechanisms
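The KS statistic D is the largest vertical gap between two cumulative distribution functions. A minimal two-sample version, comparing sampled values against an oracle snapshot, could be computed as below. The function name and the example degree values are illustrative and not taken from the paper.

```python
from bisect import bisect_right

def ks_statistic(sample, reference):
    """Two-sample KS statistic: max |F_sample(v) - F_reference(v)| over all observed values."""
    a = sorted(sample)
    b = sorted(reference)
    points = sorted(set(a) | set(b))
    # bisect_right(a, v) counts the values <= v, so the ratio is the empirical CDF at v.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in points
    )

if __name__ == "__main__":
    sampled_degrees = [3, 5, 5, 8, 13, 15, 15, 20]     # made-up values
    oracle_degrees = [2, 5, 6, 8, 12, 15, 16, 21, 30]  # made-up values
    print(ks_statistic(sampled_degrees, oracle_degrees))
```

A value of D near 0 means the sampled distribution closely tracks the oracle snapshot; D = 1 means the two sets of values do not overlap at all.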

14 Base case
- Base case:
  - Session length distribution is Weibull (k=0.59, λ=40)
  - Maximum degree: 30
  - Target degree: 15
  - Peer discovery mechanism: FIFO rendezvous point
- Sampled and expected distributions are visually indistinguishable.
- Very low KS statistic: D < 0.004
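To get a feel for the base-case churn model, the median of a Weibull distribution has the closed form λ(ln 2)^(1/k). The sketch below evaluates it for k=0.59 and λ=40 and compares it with simulated session lengths. The time unit of λ is not stated on the slide, so the result is in whatever unit λ uses; the function names and the sample size are assumptions.

```python
import math
import random
import statistics

SHAPE_K = 0.59       # Weibull shape from the base case
SCALE_LAMBDA = 40    # Weibull scale from the base case (unit as on the slide, not stated)

def weibull_median(shape, scale):
    """Closed-form median of a Weibull distribution: scale * (ln 2)^(1/shape)."""
    return scale * math.log(2) ** (1.0 / shape)

def draw_session_lengths(n, shape, scale, seed=0):
    """Draw n session lengths from the Weibull model."""
    rng = random.Random(seed)
    # random.weibullvariate takes (scale, shape), in that order.
    return [rng.weibullvariate(scale, shape) for _ in range(n)]

if __name__ == "__main__":
    sessions = draw_session_lengths(20000, SHAPE_K, SCALE_LAMBDA)
    print("closed-form median:", weibull_median(SHAPE_K, SCALE_LAMBDA))
    print("simulated median  :", statistics.median(sessions))
```

The closed form gives a median of roughly 21.5, well below the scale λ=40, because a shape k below 1 produces many short sessions and a long tail of long ones.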

15 Varying churn
- Each point represents a simulation; the y-axis shows the KS statistic (D).
- Error is low over a wide range of session lengths.
  - Becomes significant for median < 2 min
  - High for median < 30 s
- Type of distribution does not have a large impact.

16 Varying topology
- Little bias when target degree > 2
  - Degree ≤ 2 means network fragmentation
- History mechanism bias is due to ~2% of peers with no neighbors.
- More simulation results in the paper

17 Empirical results
- We developed the technique into a tool called ion-sampler.
- Available from our website

    bash$ ./ion-sampler gnutella --hops 25 -n 10
    10.8.65.171:6348
    10.199.20.183:5260
    10.8.45.103:34717
    10.21.0.29:6346
    10.32.170.200:6346
    10.201.162.49:30274
    10.222.183.129:47272
    10.245.64.85:6348
    10.79.198.44:36520
    10.216.54.169:44380

18 Empirical validation
- Empirical validation is tricky because there is no perfect baseline for comparison.
- Full crawling performed by Cruiser [Stutzbach 05 IMC]
  - The full crawl may be slightly biased towards higher degree.
- Ion-sampler records slightly fewer higher-degree peers than a full crawl.
- Conclusion: ion-sampler is close to a full crawl in accuracy, and may even be more accurate!

19 Conclusions and Future Work
- Summary
  - Temporal and topological bias can lead to sampling error.
  - We present the Metropolized Random Walk with Backtracking (MRWB) technique.
  - Extensive simulations show that it gathers nearly unbiased samples in a wide variety of circumstances.
  - Ion-sampler is a tool for gathering nearly unbiased samples from real P2P systems.
- Future work
  - Explore improving sampling efficiency for uncommon events.
  - Evaluate MRWB under flash crowd scenarios.
  - Develop additional plug-ins for ion-sampler.

