GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

INF 123 SW ARCH, DIST SYS & INTEROP LECTURE 12 Prof. Crista Lopes.
Scalable Content-Addressable Network Lintao Liu
GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Intel Research Seattle Sylvia Ratnasamy, Lee Breslau, Scott Shenker, and Nick Lanham.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
Resilient Peer-to-Peer Streaming Paper by: Venkata N. Padmanabhan Helen J. Wang Philip A. Chou Discussion Leader: Manfred Georg Presented by: Christoph.
Improving Gnutella Willy Henrique Säuberli Seminar in Distributed Computing, 16. November 2005 Papers: I.Making Gnutella-like P2P Systems Scalable; SIGCOMM.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Peer-to-Peer Networks as a Distribution and Publishing Model Jorn De Boever (june 14, 2007)
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
Technion –Israel Institute of Technology Software Systems Laboratory A Comparison of Peer-to-Peer systems by Gomon Dmitri and Kritsmer Ilya under Roi Melamed.
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
1 Denial-of-Service Resilience in P2P File Sharing Systems Dan Dumitriu (EPFL) Ed Knightly (Rice) Aleksandar Kuzmanovic (Northwestern) Ion Stoica (Berkeley)
A Trust Based Assess Control Framework for P2P File-Sharing System Speaker : Jia-Hui Huang Adviser : Kai-Wei Ke Date : 2004 / 3 / 15.
Locality-Aware Request Distribution in Cluster-based Network Servers 1. Introduction and Motivation --- Why have this idea? 2. Strategies --- How to implement?
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
1 Maximizing Remote Work in Flooding-based P2P Systems Qixiang Sun Neil Daswani Hector Garcia-Molina Stanford University.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)
Making Gnutella-like P2P Systems Scalable Presented by: Karthik Lakshminarayanan Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, and Scott.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Searching in Unstructured Networks Joining Theory with P-P2P.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
P2P Group Meeting (ICS/FORTH) Monday, 21 February, 2005 Making Gnutella-like P2P Systems Scalable (Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick.
09/07/2004Peer-to-Peer Systems in Mobile Ad-hoc Networks 1 Lookup Service for Peer-to-Peer Systems in Mobile Ad-hoc Networks M. Tech Project Presentation.
1 - CS7701 – Fall 2004 Review of: Making Gnutella-like P2P Systems Scalable Paper by: – Yatin Chawathe (AT&T) –Sylvia Ratnasamy (Intel) –Lee Breslau (AT&T)
1 Pertemuan 20 Teknik Routing Matakuliah: H0174/Jaringan Komputer Tahun: 2006 Versi: 1/0.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Thesis Proposal Data Consistency in DHTs. Background Peer-to-peer systems have become increasingly popular Lots of P2P applications around us –File sharing,
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
1 BitHoc: BitTorrent for wireless ad hoc networks Jointly with: Chadi Barakat Jayeoung Choi Anwar Al Hamra Thierry Turletti EPI PLANETE 28/02/2008 MAESTRO/PLANETE.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Structuring P2P networks for efficient searching Rishi Kant and Abderrahim Laabid Abderrahim Laabid.
Super-peer Network. Motivation: Search in P2P Centralised (Napster) Flooding (Gnutella)  Essentially a breadth-first search using TTLs Distributed Hash.
Quantitative Evaluation of Unstructured Peer-to-Peer Architectures Fabrício Benevenuto José Ismael Jr. Jussara M. Almeida Department of Computer Science.
GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau (Several slides have been taken.
Peer Pressure: Distributed Recovery in Gnutella Pedram Keyani Brian Larson Muthukumar Senthil Computer Science Department Stanford University.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
A Utility-based Approach to Scheduling Multimedia Streams in P2P Systems Fang Chen Computer Science Dept. University of California, Riverside
"A Measurement Study of Peer-to-Peer File Sharing Systems" Stefan Saroiu, P. Krishna Gummadi Steven D. Gribble, "A Measurement Study of Peer-to-Peer File.
By Jonathan Drake.  The Gnutella protocol is simply not scalable  This is due to the flooding approach it currently utilizes  As the nodes increase.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
An overview of Gnutella
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
Aug 22, 2002Sigcomm 2002 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
Two Peer-to-Peer Networking Approaches Ken Calvert Net Seminar, 23 October 2001 Note: Many slides “borrowed” from S. Ratnasamy’s Qualifying Exam talk.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
CS Spring 2012 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
School of Electrical Engineering &Telecommunications UNSW Cost-effective Broadcast for Fully Decentralized Peer-to-peer Networks Marius Portmann & Aruna.
Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Unstructured Networks: Search Márk Jelasity. 2 Outline ● Emergence of decentralized networks ● The Gnutella network: how it worked and looked like ● Search.
Peer-to-Peer Data Management
Peer-to-Peer and Social Networks
Early Measurements of a Cluster-based Architecture for P2P Systems
EE 122: Peer-to-Peer (P2P) Networks
GIA: Making Gnutella-like P2P Systems Scalable
Elders know best Lifespan-based ideas in P2P systems
Joydeep Chandra, Santosh Shaw and Niloy Ganguly
Presentation transcript:

GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from the authors’ original presentation

The Peer-to-peer Phenomenon Internet-scale distributed system –Distributed file-sharing applications e.g., Napster, Gnutella, KaZaA File sharing is the dominant P2P app Mass-market –Mostly music, some video, software

The Problem Large scale system: millions of users –Wide range of heterogeneity –Large transient user population (in a system with 100,000 nodes, 1600 nodes join and leave per minute) Existing search solutions cannot scale –Flooding-based solutions limit capacity –Distributed Hash Tables (DHTs) not necessarily appropriate

A Solution: GIA Scalable Gnutella-like P2P system Design principles: –Explicitly account for node heterogeneity –Query load proportional to node capacity Results: –GIA outperforms Gnutella by 3–5 orders of magnitude

Outline Existing approaches GIA: Scalable Gnutella Results: Simulations & Experiments Conclusion

Gnutella Distributed search and download Unstructured: ad-hoc topology –Peers connect to random nodes –Supports keyword–based search Random search –Flood queries across network Scaling problems –As network grows, search overhead increases P1P1 P2P2 P4P4 P3P3 who has “madonna” P 4 has “madonna- american-life.mp3” P5P5 P6P6 P 2 has “madonna-ray-of light.mp3”

Distributed Hash Tables (DHTs) Structured solution –Given the exact filename, find its location Can DHTs do file sharing? –Yes, but with lots of extra work needed for keyword searching Do we need DHTs? Not necessarily: Great at finding rare files, but most queries are for popular files Poor handling of churn – why?

Other Solutions Supernodes [KaZaA] –Classify nodes as low- or high-capacity –Only pushes the problem to a bigger scale Random Walks [Lv et al] –Forwarding is blind –Queries can get stuck in overloaded nodes Biased Random Walks [Adamic et al] –Right idea, but exacerbates overloaded-node problem

Outline Existing approaches GIA: Scalable Gnutella Results: Simulations & Experiments Conclusion

GIA: High-level view Unstructured, but take node capacity into account –High-capacity nodes have room for more queries: so, send most queries to them Will work only if high-capacity nodes: –Have correspondingly more answers, and –Are easily reachable from other nodes

Make high-capacity nodes easily reachable –Dynamic topology adaptation converts them into high- degree nodes Make high-capacity nodes have more answers –One-hop replication Search efficiently –Biased random walks Prevent overloaded nodes –Active flow control GIA Design Query

Dynamic Topology Adaptation Make high-capacity nodes have high degree (i.e., more neighbors), and keep low capacity nodes within short reach from them. Per-node level of satisfaction, S: –0  no neighbors, 1  enough neighbors Satisfaction S is a function of: ● Node’s capacity ● Neighbors’ capacities ● Neighbors’ degrees When S << 1, look for neighbors aggressively

Dynamic Topology Adaptation Each GIA node maintains a host cache containing a list of other GIA nodes. The host cache is populated using a variety of methods (like contacting well- known web-based hosts, and exchanging host information using PING-PONG messages. A node X with S < 1 randomly picks a node Y from its host cache, and examines if it can be added as a neighbor.

Topology adaptation steps Life of Node X: it picks node Y from its host cache Case 1 {Y can be added as a new neighbor} (Let C i represent capacity of node i) if num nbrsX + 1 < max nbrs then we have room ACCEPT Y ; return Case 2 { Node X decides if to replace an existing neighbor in favor of Y } subset := i U  i  nbrsX : C i ≤ C Y if no such neighbors exist then REJECT Y ; return else candidate Z := highest-degree neighbor from subset If Y has higher capacity than Z or (num nbrsZ > num nbrsY + H) {Y has fewer nbrs } thenDROP Z ; ACCEPT Y elseREJECT Y { Do not drop poorly connected nodes in favor of well-connected ones }

Active Flow Control Accept queries based on capacity –Actively allocation “tokens” to neighbors –Send query to neighbor only if we have received token from it Incentives for advertising true capacity –High capacity neighbors get more tokens to send outgoing queries –Allocate tokens with start-time fair queuing. Nodes not using their tokens are marked inactive and this capacity id redistributed among its neighbors.

Practical Considerations Query resilience: node death –Periodic keep-alive messages –Query responses are implicit keep-alive messages Determining node capacity –Function of bandwidth and storage Each query is assigned a unique GUID by its originator. –This helps with reverse-path forwarding of responses. Returning queries are forwarded to a different neighbor, so that they don’t traverse the same path twice

Outline Existing approaches GIA: Scalable Gnutella Results: Simulations & Experiments Conclusion

Simulation Results Compare four systems –FLOOD: TTL-scoped, random topologies –RWRT: Random walks, random topologies –SUPER: Supernode-based search –GIA: search using GIA protocol suite Metric: –Collapse point: aggregate throughput that the system can sustain (per node query rate beyond which the success rate drops below 90%)

Questions What is the relative performance of the four algorithms? Which of the GIA components matters the most? How does the system behave in the face of transient nodes?

System Performance GIA outperforms SUPER, RWRT & FLOOD by many orders of magnitude in terms of aggregate query load % % % population of the object

Factor Analysis AlgorithmCollapse point RWRT RWRT+OHR RWRT+BIAS RWRT+TADAPT RWRT+FLWCTL AlgorithmCollapse point GIA 7 GIA – OHR GIA – BIAS 6 GIA – TADAPT 0.2 GIA – FLWCTL 2 Topology adaptation Flow control No single component is useful by itself; the combination of them all is what makes GIA scalable

Transient Behavior Static SUPER Static RWRT (1% repl) Even under heavy churn, GIA outperforms the other algorithms by many orders of magnitude

Deployment Prototype client implementation using C++ Deployed on PlanetLab: –100 machines spread across 4 continents Measured the progress of topology adaptation…

Progress of Topology Adaptation Nodes quickly discover each other and soon reach their target “satisfaction level”

Outline Existing approaches GIA: Scalable Gnutella Results: Simulations & Experiments Conclusion

Summary GIA: scalable Gnutella –3–5 orders of magnitude improvement in system capacity Unstructured approach is good enough! –DHTs may be overkill –Incremental changes to deployed systems Status: Prototype implementation deployed on PlanetLab