P2P Architecture Case Study: Gnutella Network


P2P Architecture Case Study: Gnutella Network. Matei Rîpeanu, The University of Chicago. I am … and I'm going to talk about the Gnutella network, more specifically about the macroscopic characteristics of this large-scale, distributed system. The Gnutella network is one of the many recently introduced P2P systems that allow users to exchange files. It is special because it is completely decentralized: (at least until recently) all nodes perform exactly the same tasks and make decisions based only on local information.

Why analyze the Gnutella network? Unprecedented scale: up to 100k nodes, 100 TB of data, 10M files today. Self-organizing network. Staggering growth: more than 50 times during the first half of 2001. Open architecture; simple and flexible protocol. Interesting mix of social and technical issues.

Overview. Gnutella protocol. Tools for exploring the network. Network growth. Structural graph analysis: is Gnutella a power-law network? Generated (overhead) network traffic: traffic estimates; overlay network topology mapping.
I'm going to briefly present the protocol and the tools developed to explore the network. We used those tools to track the network over a 7-month period, November 2000 to May 2001. We analyzed the data gathered and tried to explain the network's growth, performed structural analysis on the network topology graph, discovered growth invariants, and analyzed Gnutella's similarities with other large-scale systems. Finally, we analyzed the generated traffic and the match between …

Gnutella protocol overview. P2P file sharing application on top of an overlay network. Nodes maintain open TCP connections. Messages are broadcast (flooded) or back-propagated.
Protocol        Broadcast (flooding)   Back-propagated   Node to node
Membership      PING                   PONG
Query           QUERY                  QUERY HIT
File download                                            GET, PUSH

Gnutella search mechanism. Steps:
1. Node 2 initiates a search for file A.
2. It sends the query message to all its neighbors.
3. Neighbors forward the message.
4. Nodes that have file A initiate a reply message.
5. The query reply message is back-propagated.
6. File download.
(Figure: an overlay of seven nodes, numbered 1 to 7. The query for A spreads out from node 2; reply messages labeled A:5 and A:7 travel back to node 2, which then downloads A.)
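The steps above can be sketched as a small simulation. This is a minimal illustration, not the protocol implementation: the seven-node topology, the `files` placement (nodes 5 and 7 holding A, suggested by the A:5 and A:7 labels), and the function name `flood_query` are assumptions, since the slide's actual wiring is not recoverable from the transcript. Each node remembers the neighbor it first heard the query from, and a reply retraces that chain back to the origin.

```python
from collections import deque

def flood_query(graph, files, origin, wanted, ttl=7):
    """Flood a query from `origin`; return {responder: reply path back to origin}."""
    # parent[n] is the neighbor from which n first received the query.
    # Nodes already in `parent` drop duplicate copies of the query.
    parent = {origin: None}
    frontier = deque([(origin, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node != origin and wanted in files.get(node, ()):
            # Back-propagate: follow the recorded reverse path hop by hop.
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            hits[node] = path
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical overlay (undirected adjacency lists) and file placement.
graph = {1: [2, 7], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6, 7],
         5: [3, 6], 6: [4, 5], 7: [1, 4]}
files = {5: {"A"}, 7: {"A"}}

replies = flood_query(graph, files, origin=2, wanted="A")
```

With this assumed wiring, node 2 receives two replies, one routed back through node 1 and one through node 3. Lowering `ttl` to 1 limits the query to node 2's direct neighbors, none of which hold the file.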

Tools for network exploration. Eavesdropper: modified nodes inserted into the network to eavesdrop on traffic. Crawler: connects to all active nodes and uses the membership protocol to discover the graph topology; client-server approach. Graph analysis tools: high-volume offline computations.
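The crawler idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' tool: `ping` is a hypothetical callback standing in for one PING/PONG exchange with a node, and `toy_overlay` replaces real network I/O.

```python
def crawl(seed, ping):
    """Discover the overlay reachable from `seed`.

    `ping(node)` is a hypothetical stand-in for a membership exchange;
    it returns the peers that `node` reports.
    """
    to_visit = [seed]
    seen = {seed}
    links = set()
    while to_visit:
        node = to_visit.pop()
        for peer in ping(node):
            links.add(frozenset((node, peer)))  # store each undirected link once
            if peer not in seen:
                seen.add(peer)
                to_visit.append(peer)
    return seen, links

# Toy overlay used in place of live nodes (an assumption for the example).
toy_overlay = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
nodes, links = crawl("a", lambda n: toy_overlay.get(n, []))
```

The resulting node and link sets are the raw material for the offline graph analysis mentioned on the slide.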

Network growth. High user interest: users tolerate high latency and low-quality results. Better resources: DSL and cable modem nodes grew from 24% to 41% over the first 6 months; today >50%. Open architecture / open-source environment. Competing implementations: lower overhead network traffic, improved resource utilization, better structure.
Although the protocol looks almost too simple, and although the failure of the Gnutella network to scale has been predicted time and again, the network managed to grow 100x in about a year (50x during the 6-month period in which we ran our crawler). … Graph explanations … This growth deserves some explanation; the points above are the main candidates.

Growth invariants (1): average node connectivity. 3.4 links per node on average. With the data gathered over these 6 months we performed some structural analysis on the topology graph. A first interesting growth invariant is that the average number of links per node stayed constant. In the graph, each point is one network snapshot: the X axis shows the size of the network and the Y axis the total number of links.
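As a quick check of the invariant, the ratio can be computed from a crawled link list. The helper name `links_per_node` and the toy link list are illustrative assumptions; the 50k-node, 170k-link figures are the ones quoted on the traffic slide later in this talk. Note that this counts each link once, so the mean node degree would be twice this value.

```python
def links_per_node(links):
    """Ratio of the number of links to the number of distinct nodes."""
    nodes = {endpoint for link in links for endpoint in link}
    return len(links) / len(nodes)

# Toy snapshot: 4 links among 4 nodes.
toy_links = [(1, 2), (2, 3), (3, 1), (3, 4)]
toy_ratio = links_per_node(toy_links)

# Figures quoted in the talk: about 170k links in a 50k-node network.
quoted_ratio = 170_000 / 50_000
```

The quoted figures reproduce the 3.4 links per node shown on this slide.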

Growth invariants (2): network diameter. Node-to-node distance maintains a similar distribution. Average node-to-node distance grew 25% while the network grew 50 times over 6 months. A more interesting invariant is related to the distribution and average values of node-to-node shortest paths for all the topology graphs we've obtained. In the figure each line represents a graph … The darker ones represent earlier network measurements while the lighter ones represent later network measurements. As you can see, the distributions remain pretty stable: the curves have the same shape and only shift a bit to the right over time. This shift is reflected in a 25% increase in the average node-to-node shortest path, all while the network grew 50 times. Note that this is better than a random graph would do!
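The average node-to-node distance can be computed with one breadth-first search per node. The sketch below is a generic all-pairs computation, assuming an undirected graph given as adjacency lists; the function name and the four-node `chain` are made up for illustration, and this is not claimed to be the tool used for the measurements.

```python
from collections import deque

def average_path_length(graph):
    """Mean shortest-path length in hops over all connected ordered pairs."""
    total = 0
    pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs

# Toy example: a chain 1 - 2 - 3 - 4.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
chain_avg = average_path_length(chain)
```

The cost is one BFS per node, which is why this kind of analysis is typically run offline on crawled snapshots.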

Is Gnutella a power-law network? Power-law networks: the number of links per node follows a power-law distribution. Examples: the Internet, in/out links to/from HTML pages, citation networks, the US power grid, social networks. Implications: high tolerance to random node failures, but low reliability when facing an 'intelligent' adversary. (Figure: November 2000.) An interesting analysis is prompted by the question of whether the Gnutella network is a power-law network.
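One simple way to test for power-law behavior is to fit a straight line to the degree histogram on log-log axes; the negated slope estimates the exponent. The sketch below uses an ordinary least-squares fit, which is a crude estimator chosen for brevity, and the function name and synthetic data are assumptions, not the analysis from the talk.

```python
import math
from collections import Counter

def fit_power_law_exponent(degrees):
    """Estimate k in count(d) ~ d^(-k) from a least-squares log-log fit.

    Needs at least two distinct positive degree values.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic degree sequence with count(d) = 3600 / d^2 for d = 1..6.
synthetic = [d for d in range(1, 7) for _ in range(3600 // d ** 2)]
estimated_k = fit_power_law_exponent(synthetic)
```

On a distribution that is not a power law, such as a bimodal one, the points no longer fall on a line and the fitted slope stops being meaningful.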

Is Gnutella a power-law network? Later, larger networks display a bimodal distribution. Implications: high tolerance to random node failures is preserved; increased reliability when facing an attack. (Figure: May 2001.)

Overview. Gnutella protocol. Network growth. Structural graph analysis. Generated network traffic: traffic estimates; does the Gnutella overlay network topology match the underlying resources?

Traffic analysis. 6-8 kbps per link over all connections. Traffic structure changed over time.

Total generated traffic: 1 Gbps (or 330 TB/month)! Compare to 15,000 TB/month in the US Internet backbone (Dec. 2000). Note that this estimate excludes actual file transfers. Q: Does it matter? Reasoning: QUERY and PING messages are flooded, and they form more than 90% of generated traffic; the predominant TTL is 7; more than 95% of nodes are less than 7 hops away; measured traffic at each link is about 6 kbps; the network has 50k nodes and 170k links.
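The estimate follows from the figures on the slide. The arithmetic below is a back-of-the-envelope reconstruction that assumes 6 kbps on every link, a 30-day month, and decimal terabytes; the variable names are made up for the example.

```python
LINKS = 170_000                     # links in the network (from the slide)
BITS_PER_SECOND_PER_LINK = 6_000    # about 6 kbps measured per link
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

# Aggregate overhead traffic across all links.
total_bps = LINKS * BITS_PER_SECOND_PER_LINK

# Convert bits per second to decimal terabytes per month.
tb_per_month = total_bps / 8 * SECONDS_PER_MONTH / 1e12

# Share of the quoted US backbone volume (15,000 TB/month, Dec. 2000).
backbone_share = tb_per_month / 15_000
```

Under these assumptions the total comes to about 1.02 Gbps and roughly 330 TB/month, a little over 2% of the quoted backbone volume.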

Topology mismatch. The overlay network topology doesn't match the underlying Internet infrastructure topology! 40% of all nodes are in the 10 largest Autonomous Systems (AS). Only 2-4% of all TCP connections link nodes within the same AS. Largely 'random wiring'. An entropy experiment gives similar results.
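The intra-AS figure is a simple ratio over the crawled links. The sketch below assumes a node-to-AS mapping is available (for example from an IP-to-AS lookup); the function name, `as_of` table, and toy data are hypothetical.

```python
def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints are in the same AS."""
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)

# Toy data: four nodes in two ASes, four overlay links.
as_of = {"n1": "AS1", "n2": "AS1", "n3": "AS2", "n4": "AS2"}
toy_links = [("n1", "n3"), ("n1", "n4"), ("n2", "n3"), ("n1", "n2")]
toy_fraction = intra_as_fraction(toy_links, as_of)
```

A value near the 2-4% reported on the slide, in a network where 40% of nodes sit in just 10 ASes, is what suggests the wiring ignores the underlying topology.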

Conclusions. Gnutella: a self-organizing, large-scale P2P application based on an overlay network. It works! Growth is hindered by the volume of generated traffic and inefficient resource use. We discovered growth invariants specific to large-scale systems that help predict resource usage and give hints for better search and resource organization techniques. Some solutions to help the network scale: organize the overlay network to match the underlying infrastructure topology; investigate methods for reducing traffic (query routing/filtering, better information organization); exploit locality in user interest (small-world network) (talk about our project at Chicago); exploit caches. All this while maintaining the self-organizing characteristics.

Thank you! Questions?

What's next? Organize the overlay network to match the underlying infrastructure topology. Investigate methods for reducing traffic (query routing/filtering, better information organization). Is the Gnutella network a small-world network? What are the implications? (Note: I think this slide can be dropped!)

Statistical laws of large-scale systems. Zipf's law: the size of the $r$-th largest occurrence of the event is inversely proportional to its rank: $y \sim r^{-b}$, with $b$ close to unity. Power-law distributions: the probability distribution of event $X$ is $P[X = x] \sim x^{-k}$. Pareto distribution: the cumulative probability distribution is $P[X > x] \sim x^{-(k-1)}$. Zipf, Pareto and power-law distributions are basically different ways to express the same phenomenon.
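A short derivation shows why the three forms describe the same phenomenon. It is a sketch under a continuous approximation with $k > 1$, where $N$ (the number of samples) is a symbol introduced here for illustration.

```latex
% Power law to Pareto: integrate the density tail (assumes k > 1).
P[X = x] \sim x^{-k}
\;\Longrightarrow\;
P[X > x] \sim \int_x^{\infty} t^{-k}\,dt
         = \frac{x^{-(k-1)}}{k-1}
         \sim x^{-(k-1)}

% Pareto to Zipf: the rank r of a value y among N samples is
% roughly the number of samples larger than y.
r \approx N \cdot P[X > y] \sim y^{-(k-1)}
\;\Longrightarrow\;
y \sim r^{-1/(k-1)},
\qquad b = \frac{1}{k-1}
```

So a Zipf exponent $b$ close to unity corresponds to a power-law exponent $k$ close to 2.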


Overview. Gnutella protocol. Network growth. Statistical properties of large-scale systems: power-law distributions; power-law networks. Generated (overhead) network traffic.

Power-law distributions. The probability distribution of event $X$ is $P[X = x] \sim x^{-k}$. Present all over the WWW and Internet space: the number of HTML pages within a site, visits to a site, links to a page, cache document popularity, etc.

Power-law distributions in Gnutella. Number of shared files per node. Query popularity follows a power-law distribution [Kas01]. Implications: caching is an effective solution to reduce traffic and query latency; new search and node organizing mechanisms!