A Measurement Study of Peer-to-Peer File Sharing Systems Presented by Cristina Abad.

1 A Measurement Study of Peer-to-Peer File Sharing Systems Presented by Cristina Abad

2 Motivation
In a P2P file sharing system, peers are usually at the edge of the network
Does this affect or limit the quality of the infrastructure?
What are the characteristics of hosts that choose to participate?
Solution: measure Gnutella and Napster traffic to help understand these issues

3 Napster

4 Gnutella

5 Methodology
Crawler periodically takes a snapshot of Napster/Gnutella
–capture basic info (peers, files shared, …)
For peers discovered
–measure bottleneck bandwidth
–measure latency
–track content and degree of sharing
Measure lifetime
–track availability of peers (at P2P and IP level)

6 Crawling Napster
Peers can only be discovered by querying the index
Crawler issues queries with names of popular song artists
Query responses contain
–IP, reported bandwidth, files shared (number, names and sizes)
Results:
–Captured 40-60% of Napster hosts (contributing 80-95% of total files)
–Could not capture peers that do not share files

7 Crawling Gnutella
Crawler uses ping/pong to discover peers
Each crawl captured approx. [count missing in transcript] peers
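Discovering peers by repeatedly asking known peers for their neighbors is, in essence, a breadth-first traversal of the overlay. The following is a minimal sketch under stated assumptions: `neighbors_of` is a hypothetical stand-in for sending a ping and collecting the pongs, the topology is toy data, and protocol details such as TTLs and timeouts are ignored. It is not the crawler used in the study.

```python
from collections import deque

def crawl(seed_peer, neighbors_of):
    """Breadth-first discovery of every peer reachable from seed_peer.

    neighbors_of(peer) is a hypothetical callback that stands in for
    pinging a peer and reading the addresses returned in its pongs.
    """
    discovered = {seed_peer}
    queue = deque([seed_peer])
    while queue:
        peer = queue.popleft()
        for neighbor in neighbors_of(peer):
            if neighbor not in discovered:
                discovered.add(neighbor)
                queue.append(neighbor)
    return discovered

# Toy overlay (assumption): E and F form a separate component.
topology = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
    "E": ["F"],
    "F": ["E"],
}
snapshot = crawl("A", lambda peer: topology.get(peer, []))
print(sorted(snapshot))
```

Peers in a component that is not connected to the seed are never found, which is one reason a single snapshot may undercount the population.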

8 Measuring bandwidth
Reported bandwidth may not be accurate (ignorance or lies)
Use bottleneck bandwidth as an approximation of available bandwidth
–capacity of the slowest link along the path between two hosts
Used SProbe to actively measure both upstream and downstream bottleneck bandwidth
–Similar to the packet pair technique

9 Packet Pair Technique
Two packets queued next to each other at the bottleneck link exit the link $t$ seconds apart: $t = s_2 / b_{bnl}$
Then, $b_{bnl} = s_2 / t$
$s_2$: size of second packet
$b_{bnl}$: bottleneck bandwidth
Kevin Lai and Mary Baker. Measuring bandwidth. In Proceedings of IEEE INFOCOM.
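The packet pair relation (bottleneck bandwidth equals the size of the second packet divided by the time dispersion between the two packets) can be illustrated with a small calculation. This is a sketch only: the function name, the units (bytes and seconds in, bits per second out), and the sample numbers are assumptions chosen for illustration, not measurements from the study.

```python
def bottleneck_bandwidth(s2_bytes: float, t_seconds: float) -> float:
    """Estimate bottleneck bandwidth in bits per second.

    s2_bytes:  size of the second packet of the pair
    t_seconds: time dispersion between the two packets at the receiver
    """
    if t_seconds <= 0:
        raise ValueError("dispersion must be positive")
    return s2_bytes * 8 / t_seconds

# Hypothetical sample: a 1500-byte second packet arrives 1.2 ms after the first.
estimate = bottleneck_bandwidth(1500, 0.0012)
print(f"{estimate / 1e6:.1f} Mbps")
```

In practice a single pair is noisy, since cross traffic can widen or narrow the gap, so estimators typically combine many pairs.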

10 How many peers are server-like?
High bandwidth, low latency, high availability
8% have upstream bottleneck bandwidth (bb) of at least 10 Mbps

11 Availability – Host uptimes

12 Availability – Session duration

13 Free-riders

14 Is Gnutella robust?

15 Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Presented by Cristina Abad

16 Three-tiered approach
1. Analyze 200-day trace of Kazaa traffic
–Considered only traffic going from U. Washington to the outside
2. Develop a model of multimedia workloads
–Analyze and confirm hypothesis
3. Explore potential impact of locality-awareness in Kazaa

17 Contributions
Obtained some useful characterizations of Kazaa's traffic
Showed that Kazaa's workload is not Zipf
–Showed that other workloads (multimedia) may not be Zipf either
Presented a model of P2P file-sharing workloads based on their trace results
–Validated the model through simulations that yielded results very similar to those from traces
Demonstrated the usefulness of exploiting locality-aware request routing

18 Measurement results
Users are patient
Users slow down as they age
Kazaa is not one workload
Kazaa clients fetch objects at most once
Popularity of objects is often short-lived
Kazaa is not Zipf

19 User characteristics (1) Users are patient

20 User characteristics (2)
Users slow down as they age
–clients die
–older clients ask for less each time they use the system

21 User characteristics (3)
Client activity
–The tracing used could only detect users when their clients transfer data
–Thus, the authors only report statistics on client activity, which is a lower bound on availability
–Average session lengths are typically small (median: 2.4 mins)
Many transactions fail
Periods of inactivity may occur during a request if the client cannot find an available server with the object

22 Object characteristics (1) Kazaa is not one workload

23 Object characteristics (2)
Kazaa object dynamics
–Kazaa clients fetch objects at most once
–Popularity of objects is often short-lived
–Most popular objects tend to be recently born objects
–Most requests are for old objects

24 Object characteristics (3)
Kazaa is not Zipf
Web access patterns are Zipf: a small number of objects are extremely popular, but there is a long tail of unpopular requests
Zipf's law: popularity of the $i$th most popular object is proportional to $i^{-\alpha}$ ($\alpha$: Zipf coefficient)
A Zipf distribution looks linear on a log-log scale
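The claim that a Zipf distribution looks linear on a log-log scale can be checked numerically: if popularity is proportional to rank raised to the power -α, a least-squares line fitted to log(popularity) against log(rank) has slope -α. The sketch below uses synthetic popularities, not the Kazaa trace, and the helper names are assumptions.

```python
import math

def zipf_popularity(n_objects, alpha):
    """Normalized popularity of ranks 1..n_objects under Zipf with coefficient alpha."""
    weights = [rank ** -alpha for rank in range(1, n_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

popularity = zipf_popularity(1000, alpha=1.0)
slope = loglog_slope(popularity)
print(f"fitted slope: {slope:.3f}")  # close to -alpha for an exact Zipf curve
```

A measured workload whose log-log plot bends away from a straight line, for example one that flattens at the most popular ranks, would not fit a single slope in this way.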

25 Model of P2P file-sharing workloads
On average, a client requests 2 objects/day
P(x): probability that a user requests an object of popularity rank x; follows Zipf(1)
–Adjusted so that objects are requested at most once
A(x): probability that a newly arrived object is inserted at popularity rank x; follows Zipf(1)
All objects are assumed to have the same size
Use caching to observe performance changes (effectiveness = hit rate)
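The effect of the fetch-at-most-once adjustment on cache hit rate can be explored with a toy simulation. This is a heavily simplified sketch, not the authors' simulator: it assumes a fixed object population (no new arrivals), an unbounded shared cache, a fixed number of requests per client, and rejection sampling to enforce at-most-once. All parameter values and names are illustrative assumptions.

```python
import random
from itertools import accumulate

def simulate_hit_rate(n_objects, n_clients, requests_per_client,
                      fetch_at_most_once, seed=0):
    """Hit rate of an unbounded shared cache under Zipf(1) requests."""
    if fetch_at_most_once and requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    ranks = list(range(1, n_objects + 1))
    # Cumulative Zipf(1) weights: object at rank r has weight 1/r.
    cum_weights = list(accumulate(1.0 / r for r in ranks))
    fetched = [set() for _ in range(n_clients)]  # per-client history
    cached = set()                               # shared cache contents
    hits = total = 0
    for _ in range(requests_per_client):
        for client in range(n_clients):
            obj = rng.choices(ranks, cum_weights=cum_weights)[0]
            # Redraw until the client picks something it has not fetched yet.
            while fetch_at_most_once and obj in fetched[client]:
                obj = rng.choices(ranks, cum_weights=cum_weights)[0]
            fetched[client].add(obj)
            total += 1
            if obj in cached:
                hits += 1
            else:
                cached.add(obj)
    return hits / total

once = simulate_hit_rate(500, 20, 100, fetch_at_most_once=True)
repeat = simulate_hit_rate(500, 20, 100, fetch_at_most_once=False)
print(f"at-most-once: {once:.2f}, repeated fetches allowed: {repeat:.2f}")
```

Under these assumptions, clients that cannot re-request popular objects are pushed toward less popular ones, so the at-most-once run should show the lower hit rate.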

26 Model – Simulation results
File-sharing effectiveness diminishes with client age
–System evolves towards one with no locality and objects chosen at random from a large space
New object arrivals improve performance
–Arrivals replenish the supply of popular objects
New clients cannot stabilize performance
–Can't compensate for the increasing number of old clients
–Overall bandwidth increases in proportion to population size

27 Model validation
By tweaking the arrival rate of new objects, the authors were able to match the trace results (with 5475 new arrivals per year, i.e., 15 per day)

28 Exploring locality-awareness
Currently organizations shape or filter P2P traffic
Alternative strategy: exploit locality in the file-sharing workload
–Caching; or,
–Use content available within the organization to substantially decrease external bandwidth usage
–Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy
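The idea behind the byte-savings result can be sketched as a pass over a download trace: an external download counts as avoidable if the same object was already brought into the organization earlier. The trace format, the function name, and the toy data below are assumptions for illustration. The sketch idealizes the proxy as having unlimited storage and perfect availability, and it does not reproduce the 86% figure.

```python
def fraction_of_bytes_avoidable(trace):
    """Fraction of externally downloaded bytes an ideal organizational proxy could avoid.

    trace: iterable of (object_id, size_bytes) tuples, one per external
    download, in chronological order (hypothetical format).
    """
    seen = set()
    total_bytes = 0
    avoidable_bytes = 0
    for object_id, size_bytes in trace:
        total_bytes += size_bytes
        if object_id in seen:
            # A copy already exists inside the organization.
            avoidable_bytes += size_bytes
        else:
            seen.add(object_id)
    return avoidable_bytes / total_bytes if total_bytes else 0.0

# Toy trace (assumption): object "a" is downloaded three times.
toy_trace = [("a", 100), ("b", 50), ("a", 100), ("c", 50), ("a", 100)]
savings = fraction_of_bytes_avoidable(toy_trace)
print(f"{savings:.0%} of external bytes avoidable")
```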

29 Questions?

30 Analysis
How can the results obtained be used when evaluating P2P schemes?
Are any of the measurements obtained biased?
Peers are heterogeneous
–Incentives
–Enforcement (e.g. super-peers in Kazaa)

31 SProbe
Works in uncooperative environments
Works on asymmetric network paths
Exploits properties of the TCP protocol
–Send SYN packets with large payloads; then measure the time dispersion of the returned RST packets

32 Zipf
Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n
"Zipf's law" is also sometimes used to refer to the corresponding probability distribution
It is an instance of a power law
Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.
