Measurements of Peer-to-Peer Systems. Pradnya Karbhari, Nov 25th, 2003. CS 8803: Network Measurements Seminar.

1 Measurements of Peer-to-Peer Systems Pradnya Karbhari Nov 25 th, 2003 CS 8803: Network Measurements Seminar

2 Introduction to Peer-to-Peer (P2P) systems
• End-systems (peers) can act as both clients and servers of data, so the system is scalable and reliable
• Peer participation is voluntary and membership is dynamic, so the topology keeps changing
• Most widely used for file sharing, so peer-to-peer systems have become synonymous with peer-to-peer file-sharing networks

3 Classification of P2P systems
• P2P computation (e.g. seti@home)
• P2P communication (instant messaging)
• P2P file-sharing networks
  • Centralized (e.g. Napster)
  • Decentralized
    • Structured (e.g. Chord, CAN, Pastry, Tapestry)
    • Unstructured (e.g. Gnutella, Kazaa, Freenet, eDonkey, eMule, Direct Connect, …)

4 Popularity of unstructured decentralized P2P networks
• Gnutella host count, maintained by Limewire (http://www.limewire.com)
• good scope for measurement studies because:
  • deployed and widely used
  • use a lot of bandwidth during data transfer, hence a concern for network operators
• quite a few measurement studies have been done on these systems, some of which we will discuss in this seminar

5 Outline
• Characterization of users of P2P systems
  • Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002.
• Effect of P2P traffic on the underlying network
  • Sen et al., “Analyzing peer-to-peer traffic across large networks”, IMW’02
• Peer-to-Peer Topologies
  • Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
• Searching on the P2P network
  • Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
• Deciphering proprietary P2P systems (like Kazaa)
  • Leibowitz et al., “Deconstructing the Kazaa Network”, WIAPP, 2003.

6 Gnutella protocol overview
• Connecting to the Gnutella network
  • bootstrap using GWebCache system and locally cached hostlist
  • Ping/Pong messages are exchanged with potential neighbors
• Searching on the network
  • Query messages are flooded on the network
  • QueryHit messages are received (back-propagated along Query path) from peers having the requested content
• Downloading the content
  • peers download files directly from peers having the requested content
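The search step above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the overlay is a hypothetical adjacency dictionary, the default TTL of 7 is an assumed value, and real Gnutella clients track message IDs rather than a global `parent` map. It shows a Query being flooded hop by hop and each QueryHit being routed back along the reverse of the path the Query took.

```python
from collections import deque

def flood_query(graph, source, has_content, ttl=7):
    """Flood a query from `source`; return {responder: path back to source}."""
    parent = {source: None}            # peer each node first heard the query from
    frontier = deque([(source, ttl)])
    hits = {}
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != source and peer in has_content:
            # The QueryHit is back-propagated along the Query path.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = parent[node]
            hits[peer] = path
        if remaining == 0:
            continue                   # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor not in parent: # duplicate copies of the query are dropped
                parent[neighbor] = peer
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical five-peer overlay, used only for illustration.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits = flood_query(overlay, "A", has_content={"D", "E"}, ttl=2)
```

With a TTL of 2 the query reaches D but not E, which is why flooding only covers the part of the overlay within the TTL horizon.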

7 Characterization of Users of P2P systems
S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN’02.
• first paper to characterize p2p file sharing systems
• Goal: To analyze the following user characteristics
  • latency
  • lifetime of peers
  • bottleneck bandwidth
  • number of files shared and downloaded
  • degree of cooperation
• methodology: active crawling
• systems studied: Napster and Gnutella
• data collection: May 2001

8 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Measurement Methodology
• active crawling of the Napster and Gnutella systems
  • Napster: issued queries for popular content, and then queried central server for peer information
  • Gnutella: used ping/pong messages in protocol to get metadata about peers, and then their neighbors and so on
• parallel measurement for:
  • peer lifetime: periodic probing of peers obtained from crawlers
    • offline if no response to TCP SYN
    • inactive if response to TCP SYN is a TCP RST
    • active if accepts the incoming TCP connection on that port
  • latency: RTT measurements from one host
  • bottleneck link bandwidth: active probing using Sprobe, a tool they developed based on packet-pair dispersion technique
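The three peer states used for lifetime probing map naturally onto the outcomes of a TCP connection attempt. The sketch below is an approximation, not the authors' tool: it uses an ordinary `connect()` through Python's standard `socket` module rather than crafting raw SYN packets, the function name `probe_peer` and the 3-second timeout are assumptions, and any error other than a refusal (timeout, unreachable host) is treated as "offline".

```python
import socket

def probe_peer(host, port, timeout=3.0):
    """Classify a peer as 'active', 'inactive' or 'offline' from one TCP probe."""
    try:
        # A completed handshake means the application is accepting connections.
        with socket.create_connection((host, port), timeout=timeout):
            return "active"
    except ConnectionRefusedError:
        # The host answered the SYN with a RST: machine is up, application is not.
        return "inactive"
    except OSError:
        # No answer within the timeout (or host unreachable): treat as offline.
        return "offline"
```

Probing each crawled address periodically with such a function yields a per-peer time series of states, from which uptime and session duration can be derived.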

9 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Host Lifetime analysis
• 20% of peers in both Napster and Gnutella have an IP-level uptime of 93% or more
• Napster peers have higher application uptimes than Gnutella peers
• the best 20% of Napster peers have uptime of 83% or more and the best 20% of Gnutella peers have uptime of 45% or more
• median session duration is 60 minutes for both Napster and Gnutella

10 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Latency analysis (Gnutella)
• 20% of peers have a latency of at most 70ms and 20% have a latency of at least 280ms
• correlation between downstream bottleneck bandwidth and latency: two clusters for modems (20-60Kbps, 100-1000ms) and broadband (1Mbps, 60-300ms)

11 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Bottleneck Bandwidth Analysis (Gnutella)
• 92% of Gnutella peers have downstream bottleneck bandwidth of at least 100Kbps
• 22% of peers have upstream bottleneck bandwidth of 100Kbps or less
• these low-upstream peers are unsuitable to serve content
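The bottleneck bandwidths above come from a packet-pair tool. The underlying arithmetic is simple: two back-to-back packets are spread apart by the slowest link, so bandwidth is approximately packet size divided by the arrival gap (dispersion). The sketch below illustrates only that formula; the function names, the use of a median over several pairs, and the sample values are assumptions, not a description of how Sprobe itself works.

```python
import statistics

def packet_pair_bandwidth_bps(packet_size_bytes, dispersion_seconds):
    """Bottleneck estimate from one packet pair: bits of one packet / arrival gap."""
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_seconds

def estimate_bottleneck_bps(packet_size_bytes, dispersions):
    """Combine several pairs; the median damps gaps inflated by cross traffic."""
    return packet_pair_bandwidth_bps(packet_size_bytes,
                                     statistics.median(dispersions))

# Illustrative values: 1500-byte packets arriving 0.12 s apart.
example_bps = packet_pair_bandwidth_bps(1500, 0.12)
```

A 1500-byte packet is 12,000 bits, so a 0.12 s gap corresponds to roughly 100 Kbps, the threshold quoted on this slide.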

12 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Downloads, Uploads and Shared Files
• relative number of downloads and uploads varies significantly across bandwidth classes
• clear client/server behavior of different classes

13 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Shared files vs. Shared Data (Napster and Gnutella)
• Strong correlation between the number of files shared and the amount of data shared (in MB)
• slope of both lines is 3.7MB per file, the size of a typical MP3 audio file

14 Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Degree of Cooperation (Napster)
• 30% of the peers report bandwidth as 64Kbps or less, but actually have significantly higher bandwidths
• 10% of the peers reporting higher bandwidths (3Mbps or higher) actually have significantly lower bandwidth

15 Effect of P2P traffic on underlying network
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW 2002.
• Goal: To characterize p2p traffic at three aggregation levels: IP, prefix and AS
  • host distribution and host connectivity
  • traffic volume and mean bandwidth usage
  • traffic patterns over time
  • connection duration and on-time
• methodology: passive measurements at routers (port based)
• systems studied: FastTrack (Kazaa), Gnutella, Direct Connect
• analysis of flow-level data collected from multiple border routers across a large tier-1 ISP’s backbone

16 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Measurement Methodology
• flow records from multiple border routers matching ports:
  • 6346/6347: Gnutella
  • 1214: FastTrack
  • 411/412: Direct Connect
• processed data to eliminate
  • private IP addresses
  • invalid AS numbers
• final data set contained 800 million flow records
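The port-based filtering described on this slide can be sketched as follows. The flow-record layout (a dictionary with `src_ip`, `dst_ip`, `src_port`, `dst_port`) and the function name are assumptions for illustration; the port table uses the well-known defaults (6346/6347 are the default Gnutella ports, 1214 is FastTrack, 411/412 are Direct Connect). The AS-number validity check is omitted because it needs routing data not modeled here.

```python
import ipaddress

# Default ports of the three systems studied.
P2P_PORTS = {
    6346: "Gnutella", 6347: "Gnutella",
    1214: "FastTrack",
    411: "DirectConnect", 412: "DirectConnect",
}

def classify_flow(flow):
    """Return the P2P system a flow belongs to, or None if it is filtered out."""
    # Drop flows involving private (non-routable) addresses.
    for ip in (flow["src_ip"], flow["dst_ip"]):
        if ipaddress.ip_address(ip).is_private:
            return None
    # A flow matches if either endpoint uses a known P2P port.
    for port in (flow["src_port"], flow["dst_port"]):
        if port in P2P_PORTS:
            return P2P_PORTS[port]
    return None

example = classify_flow({"src_ip": "8.8.8.8", "src_port": 40000,
                         "dst_ip": "1.1.1.1", "dst_port": 1214})
```

Port matching is cheap enough for backbone-scale flow logs, but it misses peers running on non-default ports, a limitation the Kazaa study later in this deck quantifies.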

17 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Datasets used for analysis
• FastTrack is most popular in terms of number of hosts participating and average traffic volume per day
• rapid growth of P2P traffic is mainly caused by increasing number of hosts in the system
• Direct Connect systems have higher traffic volume per IP address

18 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Host distribution analysis
• number of IP addresses in FastTrack ranges from 0.5 to 2 million
• ratio of the number of IP addresses in FastTrack:Gnutella:DirectConnect is 150:30:1
• Density of a prefix is the number of unique active IP addresses belonging to it
• Density of an AS is the number of unique prefixes belonging to it
• FastTrack hosts are distributed more densely than Gnutella and Direct Connect hosts (64:16:4)
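The prefix-density definition above amounts to counting distinct active addresses per routing prefix. The following sketch assumes the prefix list is available as CIDR strings and assigns each address to its longest matching prefix; both the input format and the longest-match rule are illustrative assumptions, and the linear scan is for clarity rather than speed.

```python
import ipaddress
from collections import defaultdict

def prefix_density(active_ips, prefixes):
    """Map each prefix to the number of unique active IPs that fall inside it."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    members = defaultdict(set)
    for ip in active_ips:
        addr = ipaddress.ip_address(ip)
        matching = [n for n in networks if addr in n]
        if matching:
            # Assign the address to the most specific (longest) matching prefix.
            best = max(matching, key=lambda n: n.prefixlen)
            members[str(best)].add(addr)
    return {prefix: len(ips) for prefix, ips in members.items()}

# Illustrative input: one address appears twice and is counted once.
density = prefix_density(
    ["8.8.8.8", "8.8.8.9", "8.8.8.8", "8.8.4.4", "1.1.1.1"],
    ["8.8.8.0/24", "8.8.0.0/16"],
)
```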

19 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Host connectivity analysis (FastTrack)
• 48% of individual IPs communicate with at most one IP and 89% with at most 10 IPs
• 75% of prefixes and ASes communicate with at least 2 prefixes or ASes
• very few hosts have very high connectivity and most hosts have very low connectivity

20 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Traffic volume analysis
• CDF of traffic volume per IP/prefix/AS for FastTrack (one day)
• distribution of P2P upstream traffic volume across three months

21 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Mean bandwidth usage (FastTrack and Direct Connect)
• FastTrack: 33% of IP addresses have mean downstream bandwidth of 56Kbps or less; 50% have mean upstream bandwidth of 56Kbps or less
• Direct Connect: 20% of IP addresses have mean downstream bandwidth of 56Kbps or less; 33% have mean upstream bandwidth of 56Kbps or less

22 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Traffic patterns over time (FastTrack)
• traffic volume transferred every hour among FastTrack hosts
• number of unique IP addresses, prefixes, ASes active every hour
• number of active unique IP addresses in each bin of various sizes
• system is very dynamic: hosts join and leave frequently

23 S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Connection duration and On-time (FastTrack)
• 50% of the IPs are online for less than one minute per day
• 60% of IPs, 40% of prefixes and 30% of ASes stay for less than 10 minutes per day
• 65% of the IPs join only once
• at the AS and prefix level, membership is not very transient

24 Peer-to-Peer Topologies
M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing Journal, 2002.
• Goal: To discover and analyze the Gnutella overlay topology and evaluate generated traffic
• methodology: active crawling
• datasets: Nov 2000, March 2001 and May 2001

25 Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002
Gnutella Network Growth
• number of nodes in the largest connected component in the Gnutella network
• significantly larger network found during Memorial Day and Thanksgiving
• 50-fold increase within 6 months

26 Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002
Distribution of node-to-node shortest paths
• more than 95% of node pairs are at most 7 hops away
• longest node-to-node path is 12 hops
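A shortest-path distribution like the one summarized above can be computed from a crawled topology with breadth-first search from every node. The sketch below assumes an undirected overlay given as an adjacency dictionary with comparable node labels; the four-node chain is a toy example, not measured data.

```python
from collections import Counter, deque

def hop_distribution(graph):
    """Count node pairs by shortest-path length in an undirected graph."""
    counts = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # standard BFS from `source`
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for target, hops in dist.items():
            if source < target:           # count each unordered pair once
                counts[hops] += 1
    return counts

def fraction_within(counts, max_hops):
    """Fraction of connected node pairs at most `max_hops` apart."""
    total = sum(counts.values())
    return sum(c for h, c in counts.items() if h <= max_hops) / total

# Toy chain A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
dist_counts = hop_distribution(chain)
```

On a real crawl, `fraction_within(counts, 7)` is the quantity behind the "95% within 7 hops" observation, and the largest key in `counts` is the longest shortest path.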

27 Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002
Average node connectivity
• average number of connections per node remains constant at 3.4

28 Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002
Node connectivity distribution
• Nov 2000: Gnutella node connectivity follows a power law
• March 2001: connectivity does not look like a power law for all nodes; the power law distribution is preserved for nodes with more than 10 links; for fewer than 10 links, the distribution is almost constant
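Both the average connectivity and the connectivity distribution are simple functions of node degree. As a minimal sketch, assuming the crawled overlay is held as an undirected adjacency dictionary (the star-shaped example is hypothetical):

```python
from collections import Counter

def degree_distribution(graph):
    """Histogram mapping degree -> number of nodes with that degree."""
    return Counter(len(neighbors) for neighbors in graph.values())

def average_degree(graph):
    """Mean number of connections per node."""
    return sum(len(neighbors) for neighbors in graph.values()) / len(graph)

# Hypothetical star overlay: one hub connected to three leaves.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
star_hist = degree_distribution(star)
star_avg = average_degree(star)
```

Plotting the histogram on log-log axes is the usual way to check for a power law: a straight line suggests one, while a flat region at low degrees matches the March 2001 observation.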

29 Searching on the P2P network
K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001, http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
• methodology: passive measurements at one or two peers, made part of the Gnutella network, to log queries and query messages routed through them
• data sets: Dec 2000, Jan 2001

30 K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.
Top 20 most popular query types
• 17% of queries contained non-ASCII strings and were filtered out
• most queries were for artists, adult content and file extensions (audio)
• some queries were for books, software etc.

31 K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.
Query popularity distribution
• two distinct distributions of document popularity, with a break at query rank 100
• most popular documents are equally popular
• less popular documents follow a Zipf-like distribution, with alpha between 0.63 and 1.24
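A Zipf-like distribution means frequency is proportional to 1/rank^alpha, so alpha can be estimated as the negative slope of a straight-line fit on log-log axes. The sketch below uses an ordinary least-squares fit, which is one common choice and not necessarily the method used in the study; the input frequencies are synthetic, generated with a known exponent purely to exercise the function.

```python
import math

def zipf_alpha(frequencies):
    """Estimate alpha in freq ~ C / rank**alpha via a log-log least-squares fit."""
    freqs = sorted(frequencies, reverse=True)     # rank 1 = most popular
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic frequencies with a known exponent of 0.8 (not measured data).
synthetic = [1000 / rank ** 0.8 for rank in range(1, 51)]
alpha = zipf_alpha(synthetic)
```

Given the break at rank 100 noted above, such a fit would be applied only to the ranks beyond the break, since the flat head would otherwise bias the slope.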

32 Deciphering proprietary P2P systems
N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003.
• methodology: passive content-based data collection at a caching server installed at the border of a large ISP
  • L4 switch inspects first few packets of each TCP connection to detect Kazaa download traffic
  • redirects Kazaa download traffic through caching server
• focus on download traffic only, not control traffic (since it is encrypted)

33 N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
Characteristics of Collected Traces
• 38% of all download sessions do not use the standard Kazaa port (1214)

34 N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
File download distribution by bytes
• CDF of byte popularity distribution for 10%, 1% most popular files
• 0.8% of all files account for 80% of the generated traffic
• 0.1% of the most bandwidth hungry files (top 1% of all files) generate 50% traffic
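Statements of the form "x% of files account for y% of traffic" are read off a cumulative byte-popularity curve. A minimal sketch of one point on that curve, assuming the trace has already been reduced to total bytes transferred per file (the five values below are made up for illustration):

```python
def traffic_share_of_top(bytes_per_file, top_fraction):
    """Share of total bytes generated by the top `top_fraction` of files."""
    ranked = sorted(bytes_per_file, reverse=True)   # most bandwidth-hungry first
    k = max(1, round(len(ranked) * top_fraction))   # always include at least one file
    return sum(ranked[:k]) / sum(ranked)

# Made-up per-file byte totals: one heavy file and four light ones.
share = traffic_share_of_top([80, 5, 5, 5, 5], top_fraction=0.2)
```

Evaluating the function over a range of `top_fraction` values traces out the CDF described on this slide.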

35 N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
File size distribution
• note the log-scale on X-axis
• 3 distinct modes
  • 100KB for pictures
  • 2-5MB for music files
  • 700MB for movies

36 N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
Quantity and Rate of Distinct Files
• new files seen at different time scales: every day, hour, minute
• 150,000 distinct files during a 17-day period
• daily graph: the number of new files seen continued to decrease, but no steady-state value (rate of injection of files into the network) was reached
• hourly graph: time-of-day effect
• per-minute graph: 50 new files seen every minute on average

37 N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
Rate of change of popularity of files
• percentage of files that make it to the N most popular files list: (a) in consecutive intervals and (b) after T intervals, compared with the first list
• measurement interval is 24 hours
• 15% of the highly popular files remain popular throughout the experiment, and the rest are popular only for short time intervals
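The two popularity-persistence measures above, (a) overlap between consecutive top-N lists and (b) overlap with the first list, can be sketched as set intersections. The input format (one list of downloaded file identifiers per 24-hour interval) and the three-day example are assumptions for illustration only.

```python
from collections import Counter

def top_n(downloads, n):
    """Set of the n most frequently downloaded file identifiers in one interval."""
    return {file_id for file_id, _ in Counter(downloads).most_common(n)}

def popularity_persistence(intervals, n):
    """Return (overlap with previous interval, overlap with first interval)."""
    tops = [top_n(downloads, n) for downloads in intervals]
    consecutive = [len(a & b) / n for a, b in zip(tops, tops[1:])]
    versus_first = [len(tops[0] & t) / n for t in tops[1:]]
    return consecutive, versus_first

# Hypothetical download logs for three consecutive days.
days = [
    ["a", "a", "a", "b", "b", "c"],
    ["b", "b", "b", "c", "c", "a"],
    ["c", "c", "c", "d", "d", "b"],
]
consecutive, versus_first = popularity_persistence(days, n=2)
```

In this toy trace half of each top-2 list carries over to the next day, while nothing from the first day's list survives to the third, mirroring the contrast between short-term and long-term popularity.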

38 Open Questions
• Mapping a global snapshot of the entire Gnutella topology
• Bootstrapping of peers in unstructured peer-to-peer systems (work in progress)
• More efficient searching on P2P networks: efforts in this direction include random walks, Bloom-filter based techniques etc.
• End-point privacy/anonymity is absent in most of these peer-to-peer networks

39 References
• Papers covered in the seminar:
  • S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN 2002.
  • S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW 2002.
  • M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002.
  • K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.
  • N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP 2003.
• Papers not covered in the seminar:
  • J. Chu, K. Labonte and B. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems”, SPIE, July 2002.
  • F. Bustamante and Y. Qiao, “Friendships that last: Peer lifespan and its role in P2P protocols”, WCW 2003.
  • R. Bhagwan, S. Savage and G. Voelker, “Understanding Availability”, IPTPS 2003.
  • Saroiu et al., “An Analysis of Internet Content Delivery Systems”, OSDI 2002.
  • Markatos et al., “Tracing a large-scale Peer-to-Peer System: An hour in the life of Gnutella”, CCGrid 2002.

