1 Multimedia Computing & Networking 2006
Characterizing Files in the Modern Gnutella Network: A Measurement Study
Shanyu Zhao, Daniel Stutzbach, Reza Rejaie
Multimedia & Internetworking Research Group (Mirage)
Computer & Information Science Department, University of Oregon
http://mirage.cs.uoregon.edu

2 Introduction
P2P applications are very popular on the Internet:
  File sharing: Gnutella, KaZaA, eDonkey
  Content distribution: BitTorrent
  IP telephony: Skype
P2P applications remain popular because of:
  Ease of deployment, self-scaling, no infrastructure required
They have a significant impact on the Internet
Characterizing P2P applications is essential for:
  Evaluating their performance and improving their designs
  Conducting meaningful simulations and analytical studies
  Examining their impact on the network
=> Characteristics of large-scale P2P applications are not well understood!

3 P2P Systems: An Overview (I)
Theme: enabling a group of peers (computers) to share their resources (e.g. files, bandwidth, storage, CPU)
As participating peers arbitrarily join & leave, they form an (application-level) overlay topology.
  The overlay is inherently dynamic
  No special support from the network (e.g. multicast)
  The overlay is used for resource discovery and management

4 P2P Systems: An Overview (II)
Inherent properties:
  Scalability: available resources grow organically with the number of peers
  Churn: peers voluntarily join/leave
  Heterogeneity: peers have different capabilities
Two basic architectures:
  1) Unstructured: peers form a randomly connected overlay
  2) Structured: peers form an overlay with certain properties (ring, tree)

5 Effect on the Internet
P2P accounts for 60% of all Internet traffic [CacheLogic Research 2005]
Some P2P apps have millions of simultaneous users, geographically distributed.
[Figures: Gnutella population (Oct 04 to Jan 06); Gnutella overlay in 2002]

6 Research on P2P Networking
Active area of research since 2001
  Mostly focused on new architectures and new resource discovery/management techniques
  Evaluation is only feasible through simulation or small-scale experiments with synthetic workloads.
Few empirical studies on P2P systems
=> Characteristics of widely deployed P2P systems are not well understood:
  Peer dynamics: e.g. distribution of peer uptime
  Overlay properties: e.g. distribution of peer degree
  Resource properties: e.g. popularity distribution of files

7 Methodology
Characterizing P2P applications requires capturing system "snapshots".
  A snapshot is a graph that represents the state of the system at a given point in time (peers = nodes, connections = edges).
  Individual snapshots reveal instantaneous properties.
  Consecutive snapshots reveal dynamics.
Ideally, a snapshot is captured instantaneously.
In practice, a snapshot is iteratively discovered by a P2P crawler.
P2P apps should provide support for crawlers, e.g. querying a peer for its list of neighbors and files.
=> It is difficult to characterize proprietary P2P applications.
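The iterative discovery described on this slide can be sketched as a breadth-first crawl. This is a minimal illustration, not the authors' crawler: `crawl_snapshot`, `get_neighbors`, and the toy `overlay` dictionary are hypothetical names, and an in-memory lookup stands in for querying a real peer over the network.

```python
from collections import deque


def crawl_snapshot(seed_peers, get_neighbors):
    """Iteratively discover a snapshot: peers become nodes, connections become edges.

    get_neighbors(peer) is a hypothetical callback that returns the peer's
    neighbor list, or None if the peer cannot be reached.
    """
    nodes = set(seed_peers)
    edges = set()
    frontier = deque(seed_peers)
    while frontier:
        peer = frontier.popleft()
        neighbors = get_neighbors(peer)
        if neighbors is None:
            # Unreachable peer: it stays in the node set, but its edges are unknown.
            continue
        for neighbor in neighbors:
            edges.add(frozenset((peer, neighbor)))  # undirected connection
            if neighbor not in nodes:
                nodes.add(neighbor)
                frontier.append(neighbor)
    return nodes, edges


# Toy overlay standing in for the live network (assumption for illustration).
overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"]}
nodes, edges = crawl_snapshot(["a"], overlay.get)
print(sorted(nodes), len(edges))
```

Peer "e" is reported by a neighbor but never answers, so it appears as a node without outgoing information. A real crawl also takes time, so peers can join or leave while it runs, and the captured graph is only an approximation of an instantaneous snapshot.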

8 Cruiser: a Fast P2P Crawler
We developed a parallel crawler called Cruiser. Features:
  Master-slave architecture: the master coordinates the slaves, and each slave crawls hundreds of peers simultaneously
  Dynamic adaptation to bandwidth & CPU constraints
  Generic crawler that accommodates plug-ins
Orders of magnitude faster than other P2P crawlers:
  Captures one million Gnutella nodes in around 7 minutes
  140K peers/min (visiting 22K peers/min) >> 2.5 peers/min
Many important implementation issues:
  Setting timeouts, number of file descriptors per process, dealing with the local NAT box
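One way to picture a master-slave crawl is a coordinator that hands batches of peers to a pool of workers. The sketch below is only loosely modeled on that idea, under stated assumptions: `parallel_crawl` and `query_peer` are hypothetical names, Python threads stand in for the slave processes, and none of Cruiser's bandwidth/CPU adaptation or timeout handling is included.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_crawl(seed_peers, query_peer, workers=4):
    """Crawl an overlay in waves; the loop plays the master, the pool plays the slaves.

    query_peer(peer) is a hypothetical callback returning a neighbor list,
    or None when the peer is unreachable.
    """
    discovered = set(seed_peers)
    frontier = list(seed_peers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Master: dispatch the whole wave; workers contact peers concurrently.
            results = pool.map(query_peer, frontier)
            next_frontier = []
            for neighbors in results:
                for neighbor in neighbors or ():
                    if neighbor not in discovered:
                        discovered.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
    return discovered


# Toy overlay standing in for the live network (assumption for illustration).
toy_overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
found_peers = parallel_crawl(["a"], toy_overlay.get)
print(sorted(found_peers))
```

Only the master touches the `discovered` set, so the workers need no locking. That is a design choice of this sketch, not a statement about how Cruiser is built.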

9 Evaluating Snapshot Accuracy
No reference snapshot to compare against
Completeness of captured snapshots: edges, nodes
Tradeoff between granularity & completeness of snapshots
  Node distortion > 4%
  Edge distortion > 15%
30% of peers are unreachable:
  3% departed peers
  17% behind a firewall (NAT)
  10% overloaded !!
[Figure: Cruiser; axis label: Peers discovered (*10,000)]

10 Previous Studies
Captured a small population of peers
  Partial snapshot through a short crawl
  Periodic probes of a fixed group of peers
  => Have not verified whether the captured population is representative
Conducted more than 3 years ago (outdated)
  The population of these apps has grown significantly
  New features & a two-tier architecture were incorporated

11 Measurement Methodology
Characterizing files requires file snapshots.
  Obtaining the list of shared files & neighbor info from individual peers => a content crawl + a topology crawl
  Individual snapshots support static & topological analysis.
  Consecutive snapshots support dynamic analysis.
Topology crawl is much faster than content crawl (minutes vs. hours)
Other challenges: NAT, DHCP, fileID, ... (see paper).
Minimizing the distortion in file snapshots by:
  Capturing a complete snapshot with a high-speed crawler
  Decoupling the topology crawl from the content crawl
[Figure: crawl timeline labeled Topology crawl, Content crawl, Topology crawl, with durations 5.5 hours and 15 min; two-tier overlay diagram labeled Top-level overlay, Ultrapeer, Leaf]

12 Dataset
Captured around 50 snapshots
  Average log size per snapshot: 10 GByte
Each snapshot represents:
  800 Terabytes of content
  100 million unique files
  0.5 million reachable peers, 20% of identified peers
  => Available content in Gnutella = 4,000 Terabytes
Reported results were consistent across multiple snapshots
Post-processing, e.g. removed duplicate files reported by individual peers (9% of all captured files)
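The 4,000 Terabyte figure follows from scaling the captured content by the share of peers that were reachable. The short calculation below reproduces it. It assumes, as the extrapolation implicitly does, that unreachable peers share about as much content on average as reachable ones. The per-peer average is an extra derived number, not one stated on the slide, and the variable names are illustrative.

```python
# Figures taken from the slide.
captured_content_tb = 800     # content represented by one snapshot
reachable_percent = 20        # reachable peers as a share of identified peers
reachable_peers = 500_000     # 0.5 million reachable peers

# Scale up from the reachable 20% to the whole identified population.
estimated_total_tb = captured_content_tb * 100 / reachable_percent

# Derived for illustration only (decimal units, 1 TB = 1000 GB).
avg_gb_per_reachable_peer = captured_content_tb * 1000 / reachable_peers

print(estimated_total_tb, avg_gb_per_reachable_peer)
```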

13 Summary of Characterizations
1) Static analysis: characteristics of files at a given point in time
2) Topological analysis: correlation between file distribution and overlay topology
3) Dynamic analysis: changes in file characteristics over time

14 Free Riding (Static Analysis)
Snapshot of June 13, 2005 [rounded numbers]

Peer group           Peers   Files
Ultra                159K    352
Leaf                 235K    332
Long-lived Ultra     125K    349
Short-lived Ultra     34K    363
Long-lived Leaf      156K    350
Short-lived Leaf      79K    297
Total                394K    340

Free riders ("None" column, values as they appear on the slide): 12%, 15%, 12%, 16%, 14%

% of free riders reported in previous studies:
  66% in 2000 [Adar]
  25% in 2002 [Saroiu]
=> The percentage of free riders has dropped
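A free rider here is a peer that shares no files, so the percentage can be computed directly from per-peer file counts in a snapshot. The sketch below shows that calculation. The function name and the sample counts are made up for illustration and are not taken from the dataset.

```python
def free_rider_percent(files_per_peer):
    """Percentage of peers that share zero files."""
    counts = list(files_per_peer)
    if not counts:
        raise ValueError("need at least one peer")
    free_riders = sum(1 for count in counts if count == 0)
    return 100 * free_riders / len(counts)


# Hypothetical per-peer shared-file counts (not measured data).
sample_counts = [0, 12, 340, 0, 7, 95, 1, 410]
print(free_rider_percent(sample_counts))
```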

15 Resource Sharing (Static Analysis)
How many resources (files, storage) do peers contribute?
The distribution of peers contributing
  x files follows a power law
  x MByte follows a power law
Most peers contribute little, but a few contribute a lot
Shared files vs. storage
  The correlation is not as strong as reported by Saroiu et al. 2002

16 File Popularity (Static Analysis)
Popularity represents the availability of individual files.
Follows a Zipf distribution
Popularity distribution remains stable over time
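A common way to check for Zipf-like popularity is to rank files by how many peers share them and fit a straight line to the log-log rank/count curve. The sketch below does that with a plain least-squares fit. It assumes popularity means the number of peers holding a copy. The function names are hypothetical, and this simple fit is an illustration rather than the analysis used in the study.

```python
import math
from collections import Counter


def popularity_by_rank(peer_file_lists):
    """Count how many peers share each file, sorted from most to least popular."""
    counts = Counter()
    for files in peer_file_lists:
        counts.update(set(files))  # count each file once per peer
    return sorted(counts.values(), reverse=True)


def zipf_exponent(ranked_counts):
    """Least-squares slope of log(count) vs log(rank), negated; needs >= 2 ranks."""
    xs = [math.log(rank) for rank in range(1, len(ranked_counts) + 1)]
    ys = [math.log(count) for count in ranked_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Toy data (not measured): three peers and their shared files.
ranked = popularity_by_rank([["a", "b"], ["a"], ["a", "c", "c"]])
# Synthetic ideal Zipf curve, count proportional to 1/rank.
ideal = [1000.0 / rank for rank in range(1, 51)]
print(ranked, zipf_exponent(ideal))
```

An exponent near 1 on the synthetic curve confirms the fit behaves as expected. Real snapshots would need care with the long tail of files held by a single peer.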

17 File Types (Static Analysis)
In 2001, Chu et al. reported:
  Audio: 67% of files, 79% of bytes
  Video: 2% of files, 19% of bytes
mp3 files are very popular!
Multimedia (mm) files make up 73% of files, 93% of bytes
Non-mm: jpg, gif, htm, exe, txt
Video files have become more popular

Major Audio Types
Type     File%    Byte%
mp3      61%      37%
wma      2.7%     1.3%
wave     1.9%     0.7%
m4a      1.4%     0.7%
total    67%      40%

Major Video Types
Type     File%    Byte%
wmv      2.3%     3.4%
mpg      2.4%     23.3%
avi      0.8%     24.5%
asf      0.14%    0.64%
total    5.6%     52%

18 Topological Analysis
Is there any correlation between the locations of a file and the overlay topology?
  i.e. are copies of a file topologically clustered?
File locations are affected by two factors:
  1) Scoped search => topological clustering
  2) Churn => random distribution
=> Which factor is dominant?
Examining from two angles:
  Per-file perspective
  Per-peer perspective

19 Topological Analysis
Simulate flood-based queries from 100 random peers
  Number of messages needed to find 5 copies
  Files with different popularity
  Random vs. realistic file distribution
Average similarity of content between 100 random peers and their one/two/three-hop neighbors
=> No topological clustering exists
=> Churn is the dominant factor
=> Use a random file distribution for simulations
=> Select random peers to characterize files (non-trivial)
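The flood-based experiment can be illustrated with a small simulation that counts messages until enough copies are found. This is a simplified sketch under stated assumptions: every peer forwards the query to all of its neighbors (including the one it heard from), the flood proceeds hop by hop and stops after the hop in which the target is met, and `flood_messages`, `demo_overlay`, and `demo_holders` are hypothetical names with toy data.

```python
def flood_messages(overlay, holders, source, copies_needed=5, max_hops=7):
    """Return (messages sent, copies found) for a hop-by-hop flooded query."""
    visited = {source}
    frontier = [source]
    messages = 0
    found = 0
    for _ in range(max_hops):
        if found >= copies_needed or not frontier:
            break
        next_frontier = []
        for peer in frontier:
            for neighbor in overlay[peer]:
                messages += 1  # one query message per overlay link used
                if neighbor in visited:
                    continue  # duplicate query, dropped by the receiver
                visited.add(neighbor)
                next_frontier.append(neighbor)
                if neighbor in holders:
                    found += 1
        frontier = next_frontier
    return messages, found


# Toy overlay and file placement (not measured data).
demo_overlay = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
demo_holders = {3, 4}
print(flood_messages(demo_overlay, demo_holders, source=0, copies_needed=2))
```

To compare random and realistic file distributions, the same function can be run twice on one overlay: once with `holders` taken from a captured snapshot and once with the same number of holders drawn uniformly at random. Similar message counts would point to the absence of topological clustering.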

20 Dynamic Analysis
How do various characteristics of available files change over different timescales?
  Peers add/download or remove files
  Peers join/leave the system
1) Variations in shared files by individual peers
  Dynamic IP addresses introduce error
2) Variations in popularity of individual files
3) Trend in popularity changes

21 Variations of Files at Individual Peers (Dynamic Analysis)
Ratio of added/removed files to total files (degree of change)
  3000 random peers
  Timescales: 2 hr, 6 hr, 1 day, 1 wk
  More change over longer timescales, which seems intuitive
Change in popularity of 50K files over a one-day interval
  More changes for more popular files
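The degree of change for one peer can be computed from its file lists in two snapshots. The slide does not define "total files" precisely, so the sketch below assumes it means the union of the files seen in either snapshot. `degree_of_change` and the sample sets are hypothetical.

```python
def degree_of_change(files_before, files_after):
    """Return (added ratio, removed ratio) for one peer between two snapshots.

    Assumption: "total files" is the union of both snapshots' file sets.
    """
    before, after = set(files_before), set(files_after)
    total = len(before | after)
    if total == 0:
        return 0.0, 0.0  # the peer shared nothing in either snapshot
    added = len(after - before)
    removed = len(before - after)
    return added / total, removed / total


# Toy file sets for a single peer (not measured data).
added_ratio, removed_ratio = degree_of_change(
    {"a", "b", "c", "d"}, {"b", "c", "d", "e", "f"}
)
print(added_ratio, removed_ratio)
```

Matching a peer across snapshots is the hard part in practice: as the previous slide notes, dynamic IP addresses introduce error, so the identifier used to pair the two file lists matters.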

22 Change in File Popularity (Dynamic Analysis)
Change in popularity
  For the top 100 and top 1000 files
  Over different timescales
For any timescale, more popular files exhibit larger changes
Changes occur more rapidly
=> Caching references is useful
These all seem intuitive, but one needs to quantify the rate of change
[Figure panels: Top 100 files; Top 1000 files]

23 Trends in Popularity Changes (Dynamic Analysis)
Goal: to predict the popularity of a file in the future
No major change in popularity over several days
Larger changes over a few months
The key is to quantify the rate and pattern of changes.
Significantly more snapshots are required to draw any reliable conclusion

