A Measurement Study of Peer-to-Peer File Sharing Systems Presented by Cristina Abad.



Motivation
In a P2P file-sharing system, peers are usually at the edge of the network
Does this affect or limit the quality of the infrastructure?
What are the characteristics of hosts that choose to participate?
Solution: measure Gnutella and Napster traffic to help understand these issues

Napster

Gnutella

Methodology
Crawler periodically takes a snapshot of Napster/Gnutella
–capture basic info (peers, files shared, …)
For peers discovered
–measure bottleneck bandwidth
–measure latency
–track content and degree of sharing
Measure lifetime
–track availability of peers (at P2P and IP level)

Crawling Napster
Peers can only be discovered by querying the index
Crawler issues queries with names of popular song artists
Query responses contain
–IP, reported bandwidth, files shared (number, names and sizes)
Results:
–Captured 40-60% of Napster hosts (contributing 80-95% of total files)
–Could not capture peers that do not share files

Crawling Gnutella
Crawler uses ping/pong messages to discover peers
Each crawl captured approx. … peers
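The ping/pong discovery above amounts to a breadth-first walk over the overlay. The following is a minimal sketch of that idea only, not the crawler used in the study: `get_neighbors` is a hypothetical callback standing in for "send a ping, collect the peers named in the pongs", and `max_peers` is an assumed safety cap.

```python
from collections import deque

def crawl(seed_peers, get_neighbors, max_peers=10_000):
    """Breadth-first discovery of peers reachable from seed_peers.

    get_neighbors(peer) is a hypothetical stand-in for a ping/pong
    exchange: it returns the peers advertised in the pong replies.
    """
    discovered = set(seed_peers)
    queue = deque(seed_peers)
    while queue:
        peer = queue.popleft()
        for neighbor in get_neighbors(peer):
            if len(discovered) >= max_peers:
                return discovered  # stop once the cap is reached
            if neighbor not in discovered:
                discovered.add(neighbor)
                queue.append(neighbor)
    return discovered

# Toy overlay used only for illustration.
overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"], "e": []}
snapshot = crawl(["a"], lambda peer: overlay.get(peer, []))
```

A real crawl would also need timeouts and parallel connections, since peers join and leave while the snapshot is being taken.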

Measuring bandwidth
Reported bandwidth may not be accurate (ignorance or lies)
Use bottleneck bandwidth as an approximation to available bandwidth
–capacity of the slowest hop along the path between two hosts
Used SProbe to actively measure both upstream and downstream bottleneck bandwidth
–Similar to the packet pair technique

Packet Pair Technique
Two packets queued next to each other at the bottleneck link exit the link $t$ seconds apart, with $t = s_2 / b_{bnl}$. Then, $b_{bnl} = s_2 / t$.
$s_2$: size of second packet
$b_{bnl}$: bottleneck bandwidth
Source: Kevin Lai and Mary Baker. Measuring bandwidth. In Proceedings of IEEE INFOCOM.
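As a worked illustration of the packet pair relation, the sketch below computes the bottleneck bandwidth from the size of the second packet and the measured dispersion. The function names are hypothetical, and taking the median over several pairs is an assumption added here to damp noise from cross traffic; it is not described in the slides.

```python
from statistics import median

def bottleneck_bandwidth_bps(second_packet_bytes, dispersion_s):
    """Packet pair estimate: b_bnl = s_2 / t, returned in bits per second."""
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return second_packet_bytes * 8 / dispersion_s

def estimate_from_samples(second_packet_bytes, dispersions_s):
    """Median of per-pair estimates (assumed noise filter, not from the slides)."""
    return median(
        bottleneck_bandwidth_bps(second_packet_bytes, t) for t in dispersions_s
    )

# A 1500-byte packet arriving 1.2 ms after its predecessor
# suggests a bottleneck of about 10 Mbps.
estimate = bottleneck_bandwidth_bps(1500, 0.0012)
```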

How many peers are server-like?
Server-like: high bandwidth, low latency, high availability
8% have upstream bottleneck bandwidth of at least 10 Mbps

Availability – Host uptimes

Availability – Session duration

Free-riders

Is Gnutella robust?

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Presented by Cristina Abad

Three-tiered approach
1. Analyze 200-day trace of Kazaa traffic
–Considered only traffic going from U. Washington to the outside
2. Develop a model of multimedia workloads
–Analyze and confirm hypothesis
3. Explore potential impact of locality-awareness in Kazaa

Contributions
Obtained some useful characterizations of Kazaa's traffic
Showed that Kazaa's workload is not Zipf
–Showed that other workloads (multimedia) may not be Zipf either
Presented a model of P2P file-sharing workloads based on their trace results
–Validated the model through simulations that yielded results very similar to those from traces
Showed the usefulness of exploiting locality-aware request routing

Measurement results
Users are patient
Users slow down as they age
Kazaa is not one workload
Kazaa clients fetch objects at most once
Popularity of objects is often short-lived
Kazaa is not Zipf

User characteristics (1) Users are patient

User characteristics (2) Users slow down as they age –clients die –older clients ask for less each time they use system

User characteristics (3)
Client activity
–The tracing used could only detect users when their clients transfer data
–Thus, they only report statistics on client activity, which is a lower bound on availability
–Avg session lengths are typically small (median: 2.4 mins)
Many transactions fail
Periods of inactivity may occur during a request if the client cannot find an available server with the object

Object characteristics (1) Kazaa is not one workload

Object characteristics (2)
Kazaa object dynamics
–Kazaa clients fetch objects at most once
–Popularity of objects is often short-lived
–Most popular objects tend to be recently born objects
–Most requests are for old objects

Object characteristics (3)
Kazaa is not Zipf
Web access patterns are Zipf: a small number of objects are extremely popular, but there is a long tail of unpopular requests.
Zipf's law: the popularity of the $i$-th most popular object is proportional to $i^{-\alpha}$ ($\alpha$: Zipf coefficient)
A Zipf distribution looks linear on a log-log scale
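The log-log linearity can be checked numerically: for frequencies proportional to $i^{-\alpha}$, the least-squares slope of log(frequency) against log(rank) is $-\alpha$. The sketch below uses hypothetical helper names and synthetic data, not the Kazaa trace.

```python
import math

def zipf_frequencies(num_objects, alpha=1.0):
    """Normalized popularity of ranks 1..num_objects under Zipf(alpha)."""
    weights = [rank ** -alpha for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) versus log(rank).

    Expects positive frequencies sorted from most to least popular.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# For synthetic Zipf(1) data the slope is close to -1.
slope = loglog_slope(zipf_frequencies(1000, alpha=1.0))
```

A workload that flattens at the most popular ranks, as the slides report for Kazaa, would bend away from a straight line on the same plot.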

Model of P2P file-sharing workloads
On average, a client requests 2 objects/day
P(x): probability that a user requests an object of popularity rank x; follows Zipf(1)
–Adjusted so that objects are requested at most once
A(x): probability that a newly arrived object is inserted at popularity rank x; follows Zipf(1)
All objects are assumed to have the same size
Use caching to observe performance changes (effectiveness = hit rate)
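A minimal sketch of the request side of this model follows. It is an illustration under stated assumptions, not the authors' simulator: it omits new object arrivals A(x), uses an idealized shared cache of unlimited size, and enforces fetch-at-most-once by rejection sampling. All parameter values are made up.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def simulate(num_clients=50, num_objects=2000, days=100,
             requests_per_day=2, alpha=1.0, seed=0):
    """Return (daily hit rates, per-client sets of fetched objects)."""
    rng = random.Random(seed)
    # Cumulative Zipf(alpha) weights; index 0 is the most popular object.
    cumulative = list(accumulate(r ** -alpha for r in range(1, num_objects + 1)))
    fetched = [set() for _ in range(num_clients)]
    cached = set()  # objects already fetched by some client
    daily_hit_rate = []
    for _day in range(days):
        hits = requests = 0
        for client in range(num_clients):
            for _request in range(requests_per_day):
                if len(fetched[client]) == num_objects:
                    break  # this client already has everything
                # Fetch-at-most-once: redraw until the object is new to the client.
                while True:
                    obj = bisect_left(cumulative, rng.random() * cumulative[-1])
                    if obj not in fetched[client]:
                        break
                fetched[client].add(obj)
                requests += 1
                if obj in cached:
                    hits += 1
                else:
                    cached.add(obj)
        daily_hit_rate.append(hits / requests if requests else 0.0)
    return daily_hit_rate, fetched

rates, per_client = simulate(num_clients=10, num_objects=500, days=20)
```

Because each client never repeats a request, popular objects are used up over time and clients drift toward the unpopular tail.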

Model – Simulation results
File-sharing effectiveness diminishes with client age
–System evolves towards one with no locality and objects chosen at random from a large space
New object arrivals improve performance
–Arrivals replenish the supply of popular objects
New clients cannot stabilize performance
–Can't compensate for the increasing number of old clients
–Overall bandwidth increases in proportion to population size

Model validation
By tweaking the arrival rate of new objects, the authors were able to match trace results (with 5475 new arrivals per year)

Exploring locality-awareness
Currently, organizations shape or filter P2P traffic
Alternative strategy: exploit locality in the file-sharing workload
–Caching; or,
–Use content available within the organization to substantially decrease external bandwidth usage
–Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy
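The "avoidable bytes" figure is a byte hit rate. The sketch below shows how such a fraction could be computed from a request trace, assuming an idealized proxy with unlimited storage that can serve any object after its first external download. The function name and the toy trace are hypothetical.

```python
def avoidable_byte_fraction(requests):
    """Fraction of requested bytes an ideal organizational proxy could serve.

    requests: iterable of (object_id, size_bytes) in arrival order.
    The first download of each object is external; repeats are counted
    as avoidable.
    """
    seen = set()
    total_bytes = 0
    avoided_bytes = 0
    for object_id, size_bytes in requests:
        total_bytes += size_bytes
        if object_id in seen:
            avoided_bytes += size_bytes
        else:
            seen.add(object_id)
    return avoided_bytes / total_bytes if total_bytes else 0.0

# Toy trace: object "a" is requested three times, "b" once.
toy_trace = [("a", 100), ("b", 50), ("a", 100), ("a", 100)]
fraction = avoidable_byte_fraction(toy_trace)
```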

Questions?

Analysis
How can the results obtained be used when evaluating P2P schemes?
Are any of the measurements obtained biased?
Peers are heterogeneous
–Incentives
–Enforcement (e.g. super-peers in Kazaa)

SProbe
Works in uncooperative environments
Works on asymmetric network paths
Exploits properties of the TCP protocol
–Send SYN packets with large payloads; then, measure the time dispersion of the received RST packets

Zipf
Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n
"Zipf's law" is also sometimes used to refer to the corresponding probability distribution
It is an instance of a power law
Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.