An overview of Gnutella


What is Gnutella?
Gnutella is a protocol for distributed search, built on peer-to-peer communication and a decentralized model, with no third-party lookup.
Gnutella networks are dumb: just a bunch of nodes connected together, with 3 or 4 connections each. What is the only way they can communicate? They have to ask their neighbours, who in turn ask their own neighbours.
Peers provide client-side interfaces for issuing queries and viewing search results, and they also accept search queries from other peers.
Due to its distributed nature, a network of Gnutella nodes is highly fault-tolerant, i.e. operation of the network will not be interrupted if a subset of nodes goes offline.
Two stages: first join the network (covered later), then use the network, i.e. discover and search other peers.

Gnutella Jargon
Servent: a Gnutella node. Each servent is both a server and a client.
Hops: a hop is a pass through an intermediate node.
TTL: how many hops a packet can travel before it dies (the default setting is 7 in Gnutella).
[Figure: a client with nodes 1 hop and 2 hops away.]
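The interplay between TTL and hops can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that a servent decrements the TTL and increments the hop count before relaying, and drops the packet once the TTL reaches zero.

```python
def forward(ttl, hops):
    """Return the (ttl, hops) pair to relay with, or None if the packet dies here.

    Assumption: the TTL is decremented and the hop count incremented
    before the packet is passed on.
    """
    ttl -= 1
    hops += 1
    if ttl <= 0:
        return None
    return ttl, hops


def reach(initial_ttl):
    """How many hops away from the sender a packet with this TTL can arrive."""
    state = (initial_ttl, 0)
    distance = 0
    while state is not None:
        distance += 1            # the packet arrives at a node one hop further out
        state = forward(*state)  # that node decides whether to relay it
    return distance
```

Under these assumptions, `reach(7)` gives the horizon of a packet sent with the default TTL.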

Gnutella Scenario
Step 0: Join the network.
Step 1: Determining who is on the network.
- A "Ping" packet is used to announce your presence on the network.
- Other peers respond with a "Pong" packet and also forward your Ping to their connected peers.
- A Pong packet also contains an IP address, a port number, and the amount of data that peer is sharing.
- Pong packets come back via the same route.
Step 2: Searching.
- A Gnutella "Query" asks other peers if they have the file you desire. A Query packet might ask, "Do you have any content that matches the string 'Homer'?"
- Peers check whether they have matches, respond if they have any, and send the packet on to their connected peers.
- This continues for TTL hops.
Step 3: Downloading.
- Peers respond with a "QueryHit" (contains contact info).
- File transfers use a direct connection with the HTTP protocol's GET method.
- When there is a firewall, a "Push" packet is used; it is rerouted via the Push path.

Step 1: Determining who is on the network
A "Ping" packet is used to announce your presence on the network. When another computer hears your Ping it will respond with a "Pong" packet. It will also forward your Ping packet to other computers to which it is connected and, in response, they too will send back Pong packets. Each Ping and Pong packet contains a Globally Unique Identifier (GUID). A Pong packet also contains an IP address, port number, and information about how much data is being shared by the computer that sent the Pong. Pong packets are not necessarily returned directly to the point of origin; instead they are sent from computer to computer via the same route as the initial Ping. After sending a Ping to one computer you will start receiving many Pong responses via that one computer. Now that the Pong packets have told you who your active peers are, you can start making searches.

Step 2: Searching
Gnutella "Query" packets allow you to search by asking other computers if they are sharing specific content (and have an acceptably fast network connection). A Query packet might ask, "Do you have any content that matches the string 'Homer'?" This question is sent to all the computers that sent you Pong packets. Each of these computers does two things. First, each computer checks to see if it has any content that matches the search string. In this case it looks to see if there are any files in a specified directory marked "sharable to the outside world" that have the letters "Homer" in their complete file path. Second, each computer sends your Query packet on to all the computers to which it is connected. These computers check their directories and send your Query packet to all their connected computers. This process continues until you run out of computers to ask or until the Query packet gets too old and times out. This last detail is important because without a pre-defined Time To Live (TTL) the Query packet could get bounced around for a very long time, potentially forever. Most servents, including Toadnode, allow you to adjust the TTL. GUIDs in each packet are used to make sure that the same message does not get passed to the same computer again and again, creating a loop.

Step 3: Downloading
By the time you are ready to download, the question you asked in your Query packet has been distributed to a huge number of computers. Each computer has checked its shared information and determined if it is sharing anything that matches "Homer". Let us say that three computers that received your Query packet have a match for "Homer". The last two packet descriptors, called "QueryHit" and "Push", are responsible for content delivery. Each of the three computers will send you a QueryHit packet via the same delivery route, computer-to-computer, that the Query packet originally traveled.
The QueryHit packet contains the IP address and GUID of the computer that has the data as well as information about the file that matched your query. When you receive a QueryHit packet your servent software will display the name of the file for you and give you the option to download. File transfers use the HTTP protocol’s GET method directly between your computer and the computer that has the file you want. Normally, your computer will initiate the HTTP connection to the computer that has the file. Occasionally, due to a firewall, you will be unable to initiate a connection directly to the computer that has the file you want. In these cases the "Push" packet is used. The Push packet allows a message to be delivered to the computer that has the file you would like to download via the route that the QueryHit packet originally traveled, except in reverse. The Push packet tells this computer that you would like to download a file but cannot manage to initiate an HTTP connection. This computer then becomes the initiator, attempting to connect directly to you, which often is possible because the firewall between the machines is only limiting connections initiated from outside the firewall.
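The Query and QueryHit steps can be illustrated with a small in-memory simulation. This is a minimal sketch, not servent code: the graph, the `shared` file lists and the function name are hypothetical, the set of already-reached nodes stands in for GUID-based duplicate suppression, and matching is a plain case-sensitive substring test.

```python
from collections import deque


def flood_query(graph, shared, origin, keyword, ttl=7):
    """Flood a Query from `origin`; return {responder: path its QueryHit takes back}."""
    came_from = {origin: None}   # neighbour from which each node first heard the Query
    hits = {}
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != origin and any(keyword in name for name in shared.get(node, [])):
            # The QueryHit retraces, hop by hop, the route the Query arrived on.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            hits[node] = path
        if remaining == 0:
            continue             # TTL exhausted: process locally but do not forward
        for neighbour in graph[node]:
            if neighbour not in came_from:   # already saw this Query: drop the duplicate
                came_from[neighbour] = node
                queue.append((neighbour, remaining - 1))
    return hits


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
shared = {"D": ["Homer_notes.txt"], "E": ["homer.mp3", "Homer.avi"]}
```

With this toy topology, a Query from `A` with `ttl=2` finds the match on `D` but never reaches `E`, which illustrates how a limited TTL can miss existing content; raising the TTL to 3 also finds `E`.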

Remarks
A simple idea, but it lacks scalability, since flooding wastes bandwidth. Sometimes existing objects may not be located because of the limited TTL. Various improved search strategies have been proposed.

Searching in Gnutella
The topology is dynamic, i.e. constantly changing. How do we model a constantly changing topology? Usually, we begin with a static topology, and later account for the effect of churn.
Modeling a static topology (measurements provide useful inputs):
- Random graph
- Power-law graph (Q: Is the Gnutella topology a random graph?)
Search strategies:
- Flooding
- Random walk / biased random walk / multiple walkers
- One-hop replication / two-hop replication

Gnutella topology
The Gnutella topology is a power-law graph (also called a scale-free graph).
What is a power-law graph? The number of nodes with degree $k$ is $c \cdot k^{-r}$. Contrast this with an exponentially decaying distribution, where the number of nodes with degree $k$ is $c \cdot 2^{-k}$.
Many graphs in nature exhibit power-law characteristics. Examples: the world-wide web (the number of pages that have $k$ in-links is proportional to $k^{-2}$), and scientific papers (the fraction of papers that receive $k$ citations is proportional to $k^{-3}$).
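A power law $c \cdot k^{-r}$ becomes a straight line with slope $-r$ when both axes are logarithmic, which suggests a simple way to estimate $r$ from degree counts. The sketch below uses an ordinary least-squares fit in log-log space; the function name and the synthetic data are illustrative assumptions, not measurements.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate r in count = c * k**(-r) from a {degree: count} mapping.

    Fits a least-squares line to log(count) against log(degree)
    and returns the negated slope.
    """
    xs = [math.log(k) for k in degree_counts]
    ys = [math.log(n) for n in degree_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Synthetic, noise-free counts following 1000 * k**-2 (made-up data).
synthetic = {k: 1000.0 * k ** -2.0 for k in range(1, 21)}
estimated_r = fit_power_law_exponent(synthetic)
```

On noisy real data a plain least-squares fit of this kind is only a rough estimate.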

AT&T Call Graph
How many telephone numbers receive calls from k different telephone numbers?
[Figure: number of telephone numbers called, plotted against the number of telephone numbers from which calls were made.]

Gnutella network: power-law link distribution
[Figure: proportion of nodes versus number of neighbors, showing the data and a power-law fit with τ = 2.07. Summer 2000, data provided by Clip2.]

A possible explanation
Nodes join at different times. The more connections a node has, the more likely it is to acquire new connections (the rich get richer). Popular webpages attract new pointers. Such a growth process produces a power-law network.
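The rich-get-richer growth process can be simulated directly. The sketch below is one simple preferential-attachment model, chosen for illustration: the starting core, the number of links per new node and the function names are all assumptions, not details taken from the slides.

```python
import random


def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a graph in which each new node prefers well-connected targets."""
    rng = random.Random(seed)
    core = links_per_node + 1
    # Start from a small fully connected core.
    edges = [(i, j) for i in range(core) for j in range(i)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degree_of(edges):
    """Map each node to its number of incident edges."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```

Counting how many nodes have each degree in `degree_of(grow_network(...))` shows a few high-degree hubs alongside many nodes with the minimum degree.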

Search via random walk
Existence of a path does not necessarily mean that such a path can be discovered.

Search via Random Walk
Search metrics:
- Discovery time in hops (also called delay)
- Distance covered by the walker (overhead)
Both should be as small as possible. For a single random walker, these are equal. For search by flooding, the delay is $h$ but the distance is $d + d^2 + \dots + d^h$, where $d$ is the degree of a node. Using K random walkers (fixed K) is a compromise.
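To get a feel for the gap between the two metrics, the flooding distance can be computed directly from the sum above. The helper below is a small illustration; its name and the sample values (degree 4, 7 hops) are assumptions chosen to match figures mentioned earlier in the slides.

```python
def flooding_distance(d, h):
    """Messages generated by flooding h hops deep when every node has degree d."""
    return sum(d ** i for i in range(1, h + 1))


# A single random walker that finds the object after h hops covers distance h,
# whereas flooding to the same depth covers d + d^2 + ... + d^h.
walker_distance = 7
flood_distance = flooding_distance(4, 7)
```

The sum counts every forwarded copy as in the slide's formula, so it is an upper bound on real traffic once duplicate suppression is taken into account.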

A simple analysis of random walk
Let p = population of the object, i.e. the fraction of nodes hosting the object, and T = TTL (time to live).

Hop count h    Probability
1              $p$
2              $(1-p)\,p$
3              $(1-p)^2\,p$
T              $(1-p)^{T-1}\,p$

A simple analysis of random walk
Expected hop count:
$E(h) = 1 \cdot p + 2(1-p)\,p + 3(1-p)^2\,p + \dots + T(1-p)^{T-1}\,p = \frac{1}{p}\left(1-(1-p)^T\right) - T(1-p)^T$
With a large TTL, $E(h) \approx 1/p$, which is intuitive. With a small TTL, there is a risk that the search will time out before an existing object is located.
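The closed form can be checked numerically against the term-by-term sum. The sketch below does this in Python; the function names are illustrative. Note that, as in the formula above, searches that time out contribute nothing to the sum.

```python
def expected_hops_sum(p, T):
    """E(h) as the explicit sum of h * (1-p)^(h-1) * p for h = 1..T."""
    return sum(h * (1 - p) ** (h - 1) * p for h in range(1, T + 1))


def expected_hops_closed(p, T):
    """E(h) from the closed form (1 - (1-p)^T) / p - T * (1-p)^T."""
    q = 1 - p
    return (1 - q ** T) / p - T * q ** T
```

For example, comparing `expected_hops_closed(0.1, 7)` with `expected_hops_closed(0.1, 1000)` shows how far a TTL of 7 falls short of the large-TTL value of about 1/p when only one node in ten hosts the object.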

K random walkers
Assume all k walkers start in unison. The probability that none finds the object after one hop is $(1-p)^k$. The probability that none has succeeded after T hops is $(1-p)^{kT}$. So the probability that at least one walker succeeds is $1-(1-p)^{kT}$.
A typical assumption is that the search is abandoned as soon as at least one walker succeeds.
Expected overhead = 1-(1-p)T-1 - T. (1-p)T-1 . k p
As k increases, the probability of success increases and the overhead increases, but the delay decreases. There is a tradeoff here.
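The tradeoff can be explored numerically. The sketch below computes the success probability from the formula above, and estimates the total number of steps by direct summation. The step count is not the slide's expected-overhead expression: it is a separate calculation that assumes each step independently hits the object with probability p, that walkers move in synchronized rounds, and that every walker stops at the end of the round in which any walker first succeeds. The function names are hypothetical.

```python
def success_probability(p, k, T):
    """Probability that at least one of k walkers finds the object within T hops."""
    return 1 - (1 - p) ** (k * T)


def expected_total_steps(p, k, T):
    """Expected steps taken by all walkers together, under the stated assumptions."""
    round_fails = (1 - p) ** k
    # Round r is played only if the previous r - 1 rounds all failed.
    expected_rounds = sum(round_fails ** (r - 1) for r in range(1, T + 1))
    return k * expected_rounds
```

Evaluating both functions for increasing k, for example with p = 0.01 and T = 7, shows the success probability and the total step count rising together.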

Increasing search efficiency
Major strategies:
- Biased walk utilizing node degree heterogeneity
- Utilizing structural properties such as random-graph, power-law, or small-world properties
- Topology adaptation for faster search
- Introducing two layers in the graph structure using supernodes

Dynamic Topology Adaptation
Make high-capacity nodes have high degree (i.e., more neighbors).
Per-node level of satisfaction, S: S = 0 means no neighbors, S = 1 means enough neighbors.
S is a function of:
- The node's capacity
- The neighbors' capacities
- The neighbors' degrees
- Their age
When S << 1, look for neighbors aggressively.
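The slide lists the inputs to S but not the formula. The sketch below shows one plausible form, given purely as an assumption: each neighbor contributes its capacity divided by its degree, the total is normalized by the node's own capacity, and the result is capped at 1. Neighbor age is ignored here, and the function and parameter names are hypothetical.

```python
def satisfaction_level(capacity, neighbors, max_neighbors):
    """Return S in [0, 1] for a node with the given capacity.

    `neighbors` is a list of (neighbor_capacity, neighbor_degree) pairs;
    every neighbor_degree is at least 1, since the neighbor is linked to us.
    """
    if not neighbors:
        return 0.0                      # no neighbors at all
    if len(neighbors) >= max_neighbors:
        return 1.0                      # no room for more neighbors
    # A neighbor's capacity is shared among all of its own neighbors.
    total = sum(cap / degree for cap, degree in neighbors)
    return min(1.0, total / capacity)
```

A node could then poll for new neighbors more often while `satisfaction_level(...)` is far below 1, which matches the "look for neighbors aggressively" rule.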

One hop replication
Each node keeps track of the indices of the files belonging to its immediate neighbors. As a result, high-capacity / high-degree nodes can provide useful clues to a large number of search queries.
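A one-hop index can be pictured as a dictionary from file names to the neighbors that hold them. The sketch below is a minimal illustration; the graph, the file lists and the function names are hypothetical, and matching is a plain substring test.

```python
def build_one_hop_index(node, graph, shared):
    """Index the files of `node`'s immediate neighbors: {file name: set of owners}."""
    index = {}
    for neighbor in graph[node]:
        for name in shared.get(neighbor, []):
            index.setdefault(name, set()).add(neighbor)
    return index


def lookup(index, keyword):
    """Neighbors holding at least one file whose name contains `keyword`."""
    return sorted({owner
                   for name, owners in index.items() if keyword in name
                   for owner in owners})


graph = {"X": ["A", "B"], "A": ["X"], "B": ["X"]}
shared = {"A": ["Homer.txt"], "B": ["Bart.txt", "Homer.mp3"], "X": ["Lisa.txt"]}
index_at_x = build_one_hop_index("X", graph, shared)
```

A query that reaches `X` can be answered on behalf of `A` and `B` without being forwarded to them, which is why well-connected nodes hold so many clues.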

Biased random walk
Each node records the degree of the neighboring nodes. Search easily gravitates towards high-degree nodes that hold more clues.
[Figure: neighbors labeled with selection probabilities P = 5/10, P = 2/10 and P = 3/10.]
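One way to bias the walk is to pick the next hop with probability proportional to each neighbor's degree. The sketch below assumes this rule, and assumes that the probabilities 5/10, 2/10 and 3/10 in the figure correspond to neighbors of degree 5, 2 and 3; the neighbor names are made up.

```python
import random
from fractions import Fraction


def step_probabilities(neighbor_degrees):
    """Probability of moving to each neighbor, proportional to its degree."""
    total = sum(neighbor_degrees.values())
    return {name: Fraction(degree, total)
            for name, degree in neighbor_degrees.items()}


def biased_step(neighbor_degrees, rng=random):
    """Draw the next hop at random, weighted by neighbor degree."""
    names = list(neighbor_degrees)
    weights = [neighbor_degrees[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


neighbors = {"a": 5, "b": 2, "c": 3}
probabilities = step_probabilities(neighbors)
```

A deterministic variant would always move to the highest-degree neighbor instead of drawing at random.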

Deterministic biased walk
[Figure: deterministic biased walk on a power-law graph, showing the number of nodes found.]

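The KaZaA approach
Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes. The structure is two-layered.
[Figure: a client asks its supernode "Where is ABC?", the query is passed between supernodes, and the client then downloads ABC.]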

Case Study: Gia (Chawathe et al., Making Gnutella-like P2P systems scalable, SIGCOMM '03)
Three major decisions:
- Flooding is replaced by a (biased) random walk.
- Topology adaptation: high-capacity nodes maintain more clues via one-hop replication. They should be easily reachable.
- Token-based flow control prevents overloading of high-degree nodes.

Topology Control
Steps for a node X when Y wants to join:
{X has spare capacity}
If # of neighbors + 1 < max then accept Y as a neighbor
{When X has no spare capacity, it has to drop a neighbor}
If C(Y) > C(i) for every current neighbor i of X then accept Y as a neighbor and drop any one of the existing neighbors
{Otherwise, X has to pick a Z that can be dropped in favor of Y}
Pick the highest-degree current neighbor Z
If C(Y) > C(Z) or Y has fewer neighbors than Z then drop Z and accept Y, else reject Y
{This last step ensures that we do not drop poorly connected neighbors}
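The steps above can be sketched as a single decision function. This follows the slide's wording rather than any reference implementation: the function name, the dictionaries and the return convention are hypothetical, and where the slide says "drop any one" neighbor, the sketch (as an assumption) drops the highest-degree one.

```python
def handle_join(capacity, degree, neighbors_of_x, y, max_neighbors):
    """Decide whether X accepts Y; return (accepted, neighbor_to_drop_or_None).

    `capacity` and `degree` map node names to numbers.
    Assumes max_neighbors >= 2, so the neighbor list is non-empty when full.
    """
    # X has spare capacity: accept without dropping anyone.
    if len(neighbors_of_x) + 1 < max_neighbors:
        return True, None

    # Candidate to drop: the best-connected neighbor, which is hurt least.
    z = max(neighbors_of_x, key=lambda i: degree[i])

    # Y beats every current neighbor on capacity: accept and drop one neighbor
    # (the slide allows any; choosing Z here is an assumption).
    if all(capacity[y] > capacity[i] for i in neighbors_of_x):
        return True, z

    # Otherwise Y replaces Z only if it has more capacity or fewer neighbors.
    if capacity[y] > capacity[z] or degree[y] < degree[z]:
        return True, z
    return False, None


capacity = {"Y": 10, "W": 5, "a": 1, "b": 100}
degree = {"Y": 0, "W": 9, "a": 3, "b": 5}
```

In this toy data, `Y` displaces the well-connected neighbor `b` because `Y` has fewer neighbors, while `W` is rejected because it has neither more capacity nor fewer neighbors than `b`.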

Simulation Results
Compare four systems:
- FLOOD: TTL-scoped flooding, random topologies
- RWRT: random walks, random topologies
- SUPER: supernode-based search
- GIA: search using the GIA protocol suite
Metric: collapse point, i.e. the aggregate throughput that the system can sustain before the success rate drops to 90%.

System Performance
[Figure: results plotted against the population of the object (%).]
GIA outperforms SUPER, RWRT and FLOOD by many orders of magnitude in terms of aggregate query load.

Factor Analysis

Algorithm        Collapse point
RWRT             0.0005
RWRT+OHR         0.005
RWRT+BIAS        0.0015
RWRT+TADAPT      0.001
RWRT+FLWCTL      0.0006

Algorithm        Collapse point
GIA              7
GIA – OHR        0.004
GIA – BIAS       6
GIA – TADAPT     0.2
GIA – FLWCTL     2

TADAPT: topology adaptation. FLWCTL: flow control.