Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science.


1 Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science Hong Kong Baptist University

2 2 Outline: Overview; Napster; Unstructured systems (Gnutella, Freenet, FastTrack, GIA); Structured systems (DHT: Distributed Hash Table; Chord, CAN, Pastry, Tapestry, …); Some research issues; References

3 3 What is Peer-to-Peer? Every node is designed to provide some service that helps other nodes in the network to get service. Each node potentially has the same responsibility. Resource sharing can take different forms: CPU cycles (peer-to-peer computing); storage (Napster, Gnutella, Freenet, …). In this talk, we only consider file sharing. Step 1: searching (today’s focus). Step 2: file downloading. Peer-to-peer vs. Grid computing: see [IPTPS03_IF].

4 4 Main Design Goals of P2P Systems. Ability to operate in a dynamic environment: peers join and leave the system randomly. Performance: fast resource discovery, fast file retrieval. Scalability: support millions or more peers. Reliability: peers could leave the system silently. Anonymity: privacy of the publishers and downloaders. Security issues.

5 5 Killer application: Napster. Mid-1999. A hybrid system, not real P2P. Centralized search: very simple. Distributed download: P2P. Operations: 1. Connect to the Napster server. 2. Report the list of local files to the server. 3. Send keywords to the server for searching. 4. Select the “best” of the correct answers for downloading.

6 6 Napster (diagram, step 1): users upload their file lists to napster.com.

7 7 Napster (diagram, step 2): the user requests a search at the napster.com server and receives the results.

8 8 Napster (diagram, step 3): the user pings the hosts that have the requested data, looking for the best transfer rate.

9 9 Napster (diagram, step 4): the user retrieves the file.
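
The four Napster operations can be sketched roughly as follows. This is a minimal single-process illustration, not Napster's actual protocol: the class name CentralIndex, the helper pick_best, the peer addresses, and the transfer-rate figures are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical stand-in for the central search server."""

    def __init__(self):
        # keyword -> set of (peer, filename) pairs
        self._index = defaultdict(set)

    def report_files(self, peer, filenames):
        """Steps 1-2: a peer connects and reports its local file list."""
        for name in filenames:
            for word in name.lower().replace(".", " ").split():
                self._index[word].add((peer, name))

    def search(self, keywords):
        """Step 3: return (peer, filename) pairs matching every keyword."""
        sets = [self._index.get(k.lower(), set()) for k in keywords.split()]
        return set.intersection(*sets) if sets else set()


def pick_best(results, transfer_rate):
    """Step 4: choose the hit whose peer has the highest measured rate."""
    return max(results, key=lambda hit: transfer_rate[hit[0]], default=None)


server = CentralIndex()
server.report_files("peer-a:6699", ["starwars.divx", "song.mp3"])
server.report_files("peer-b:6699", ["starwars.divx"])

hits = server.search("starwars")
# Assumed ping results (KB/s); the download itself would go peer to peer.
best = pick_best(hits, {"peer-a:6699": 40.0, "peer-b:6699": 120.0})
print(best)  # ('peer-b:6699', 'starwars.divx')
```

Only the index lives on the server; the file transfer in step 4 happens directly between peers, which is why the slides call the system a hybrid.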

10 10 P2P Systems with Distributed Search. What are the problems with centralized search? Limited scalability; vulnerable to censorship and copyright-law problems; single point of failure; vulnerable to DDoS attacks. Distributed search: the file location information is distributed among all the peers. Two categories of distributed search systems: Unstructured P2P systems (a number of real-life systems; random search) and Structured P2P systems (lots of academic research work; depend on DHT: distributed hash table).

11 11 Unstructured P2P Systems The nodes join and leave the P2P system freely to construct an overlay network (need a bootstrapping scheme). Choosing some neighbors Each node knows some file location information (local, or from neighbors). No coupling between network topology and file location information. Random search Purely decentralized systems All the nodes perform the same tasks Example systems: Original Gnutella, Freenet Partially centralized systems Using SuperNode or SuperPeer Example systems: KaZaA, Morpheus, Latest Gnutella

12 12 Gnutella. Early 2000, by Nullsoft (Winamp). Purely decentralized, unstructured. Each node is a client and a server, and is referred to as a servent. Random search: TTL-limited flooding, a breadth-first traversal. Each node forwards the received query messages to all of its neighbors and decreases the TTL field. The search could fail even if the requested file exists in the P2P system.

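13 13 Gnutella (diagram, step 1; labels A, B, X): A issues Search: “Starwars”.

14 14 Gnutella (diagram, step 2; labels A, B, X): Resp: B has “Starwars.divx”.

15 15 Gnutella (diagram, step 3; labels A, B, X): A sends Get: “Starwars” and receives Resp: “Starwars.divx”.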
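
TTL-limited flooding can be sketched as a breadth-first traversal that stops forwarding once the TTL reaches zero. The following is a simplified model rather than the Gnutella wire protocol: the function name flood_search, the four-node overlay, and the file placement are hypothetical, and each node is assumed to forward to all neighbors, including the one the query came from.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """Breadth-first, TTL-limited flooding; returns (hits, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        node, remaining = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if remaining == 0:
            continue  # TTL expired: do not forward any further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:  # duplicate queries are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical line-shaped overlay: A - B - C - D, with the file stored on D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"Starwars.divx"}}

print(flood_search(graph, files, "A", "Starwars.divx", ttl=2))  # ([], 3)
print(flood_search(graph, files, "A", "Starwars.divx", ttl=3))  # (['D'], 5)
```

With ttl=2 the query never reaches D, so the search fails even though the file exists in the system.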

16 16 Some observations of Gnutella. “Why Gnutella cannot scale. No, Really.” http://www.darkridge.com/~jpr5/doc/gnutella.html Flooding is not a scalable design: a single query will generate a huge amount of traffic, so the TTL cannot be large. Some measurement studies: http://www.ececs.uc.edu/~mjovanov/Research/paper.html Small-world characteristics; power-law distribution of node degrees [UC01_MJ], [IPTPS02_MP]
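
To see why the TTL cannot be large, a back-of-the-envelope estimate helps. The sketch below assumes an idealized overlay with no cycles in which every node has the same number of neighbors: the originator sends to all of them, and each recipient forwards to all neighbors except the sender. The function name flood_messages and the degree and TTL values are illustrative assumptions, not measurements.

```python
def flood_messages(degree, ttl):
    """Query messages generated by one flood in a cycle-free overlay.

    Hop 1 costs `degree` messages; each later hop multiplies the
    previous hop's count by (degree - 1).
    """
    return sum(degree * (degree - 1) ** (hop - 1) for hop in range(1, ttl + 1))


print(flood_messages(4, 7))  # 4372
print(flood_messages(8, 7))  # 1098056
```

Doubling the node degree at the same TTL raises the cost of a single query from thousands of messages to over a million, which is the scaling problem the slide points at.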

17 17 Freenet http://freenet.sourceforge.net [IEEE02_IC] For publication, replication, and retrieval of data The network is purely decentralized, and publishers and consumers of information are anonymous. Depth-first traversal Each node forwards the query to a single neighbor, and waits for the response before forwarding the query to another neighbor.
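
The one-neighbor-at-a-time traversal can be sketched as a recursive depth-first search with a hop limit. This is only a model of the traversal order: real Freenet decides which neighbor to try next based on keys, whereas this sketch simply uses the order of the neighbor list. The function name, the overlay, and the parameter name hops_to_live are assumptions made for illustration.

```python
def depth_first_search(graph, files, node, wanted, hops_to_live, visited=None):
    """Return the path to a node holding `wanted`, or None if the search fails."""
    visited = set() if visited is None else visited
    visited.add(node)
    if wanted in files.get(node, ()):
        return [node]
    if hops_to_live == 0:
        return None
    for neighbor in graph[node]:
        if neighbor in visited:
            continue
        # Try one neighbor and wait for its answer before trying the next.
        path = depth_first_search(graph, files, neighbor, wanted,
                                  hops_to_live - 1, visited)
        if path is not None:
            return [node] + path
    return None


# Hypothetical overlay: A has neighbors B and C; the document sits on E.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "D": ["B"], "E": ["C"]}
files = {"E": {"doc"}}

print(depth_first_search(graph, files, "A", "doc", hops_to_live=3))  # ['A', 'C', 'E']
print(depth_first_search(graph, files, "A", "doc", hops_to_live=1))  # None
```

In the first call the query explores B and D, backtracks, and only then tries C, so latency is higher than with flooding but far fewer messages are sent.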

18 18 FastTrack & Latest Gnutella. FastTrack is a proprietary protocol. Partially centralized, unstructured. Two real systems: Kazaa, Morpheus. Improves scalability and searching performance by using supernodes. Supernode: gathers the file location information from the attached normal nodes. Searching is still by flooding among the supernodes. Today’s Gnutella also uses this technique.

19 19 Some research issues. How to improve the search? Searching latency vs. traffic overhead [ICDCS02_BY]: iterative deepening; directed BFS (send the query to a subset of neighbors); local indices (replication). Random walk [ICS02_LV], [INFOCOM04_CG]. How to construct the overlay network? How can caching & replication help? [ICS02_LV], [SIGCOMM02_EC]
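
Iterative deepening trades latency for traffic: it floods with a small depth first and only widens the search if nothing is found. The sketch below is a simplified illustration under assumed names (bfs_within, iterative_deepening), an assumed depth policy of (1, 2, 4), and a hypothetical four-node overlay; it is not the exact procedure from [ICDCS02_BY].

```python
from collections import deque


def bfs_within(graph, files, start, wanted, depth):
    """Return nodes holding `wanted` within `depth` hops of `start`."""
    seen, queue, hits = {start}, deque([(start, 0)]), []
    while queue:
        node, dist = queue.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if dist < depth:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, dist + 1))
    return hits


def iterative_deepening(graph, files, start, wanted, policy=(1, 2, 4)):
    """Flood with successively larger depths; stop at the first that hits."""
    for depth in policy:
        hits = bfs_within(graph, files, start, wanted, depth)
        if hits:
            return depth, hits
    return None, []


# Hypothetical overlay A - B - C - D; a popular file is close, a rare one is far.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"B": {"popular.mp3"}, "D": {"rare.mp3"}}

print(iterative_deepening(graph, files, "A", "popular.mp3"))  # (1, ['B'])
print(iterative_deepening(graph, files, "A", "rare.mp3"))     # (4, ['D'])
```

Popular files are found by the cheap shallow flood, and only rare files pay for the deeper one.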

20 20 GIA [SIGCOMM03_YC] “Making Gnutella-like P2P Systems Scalable”. 1. Dynamic topology adaptation: ensures that high-capacity nodes have high degree and that low-capacity nodes are close to high-capacity nodes. 2. Active flow control scheme: avoids overloaded hot-spots and explicitly handles heterogeneity. 3. One-hop replication of pointers to content: allows high-capacity nodes to answer more queries. 4. Search protocol: biased random walk towards high-capacity nodes.
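
Two of the four GIA components, one-hop replication and the capacity-biased walk, can be illustrated together. The sketch below is a deliberately simplified model: it always steps to the highest-capacity unvisited neighbor and ignores flow control and topology adaptation entirely. The function names, node names, and capacity values are hypothetical.

```python
def one_hop_index(graph, files):
    """Each node indexes its own files plus pointers to its neighbors' files."""
    index = {}
    for node, neighbors in graph.items():
        entries = {name: node for name in files.get(node, ())}
        for neighbor in neighbors:
            for name in files.get(neighbor, ()):
                entries.setdefault(name, neighbor)
        index[node] = entries
    return index


def biased_walk(graph, capacity, index, start, wanted, max_steps):
    """Walk towards high-capacity nodes; return (holder or None, path taken)."""
    node, path, visited = start, [start], {start}
    while True:
        if wanted in index[node]:
            return index[node][wanted], path
        if len(path) > max_steps:
            return None, path  # step budget exhausted
        unvisited = [n for n in graph[node] if n not in visited]
        candidates = unvisited or graph[node]
        node = max(candidates, key=lambda n: capacity[n])
        visited.add(node)
        path.append(node)


# Hypothetical overlay: a high-capacity hub with low-capacity nodes around it.
graph = {"leaf1": ["hub", "small"], "small": ["leaf1"],
         "hub": ["leaf1", "leaf2"], "leaf2": ["hub"]}
capacity = {"hub": 100, "leaf1": 1, "leaf2": 1, "small": 1}
files = {"leaf2": {"f"}}
index = one_hop_index(graph, files)

print(biased_walk(graph, capacity, index, "leaf1", "f", max_steps=3))
# ('leaf2', ['leaf1', 'hub'])
```

The query is answered at the hub after a single step because the hub holds a pointer to its neighbor's content, which is how one-hop replication lets high-capacity nodes answer more queries.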

21 21 Structured P2P Systems. The overlay network topology is tightly controlled. Nodes can join and leave the system, but the topology needs to be reconstructed. There is close coupling between the network topology and file location information. They provide a mapping between the file identifier and location by using a Distributed Hash Table (DHT). Only for exact-match queries (as compared to keyword queries). Example: search for file “Starwars.divx”. Convert “Starwars.divx” to a key, say “123456789”. Look up “123456789” in the DHT to find the file location. Download the file.
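
The three-step example above (convert the name to a key, look the key up, download) can be sketched in a single process. This is a toy model under stated assumptions: SHA-1 is used as the hash, each key is stored on the first node whose identifier is at or after the key on a ring, and the class name TinyDHT, the node names, and the file location are hypothetical.

```python
import hashlib
from bisect import bisect_left


def to_key(name):
    """Hash a file identifier to a 160-bit integer key."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class TinyDHT:
    """Hypothetical single-process model: each node stores a slice of the table."""

    def __init__(self, node_names):
        self.ring = sorted((to_key(n), n) for n in node_names)
        self.tables = {n: {} for n in node_names}

    def responsible_node(self, key):
        ids = [node_id for node_id, _ in self.ring]
        # First node at or after the key, wrapping around the ring.
        return self.ring[bisect_left(ids, key) % len(self.ring)][1]

    def publish(self, filename, location):
        key = to_key(filename)
        self.tables[self.responsible_node(key)][key] = location

    def lookup(self, filename):
        key = to_key(filename)
        return self.tables[self.responsible_node(key)].get(key)


dht = TinyDHT(["node-1", "node-2", "node-3", "node-4"])
dht.publish("Starwars.divx", "10.0.0.7:6346")

print(dht.lookup("Starwars.divx"))  # 10.0.0.7:6346
print(dht.lookup("Starwars"))       # None: only exact matches hash to the same key
```

The second lookup fails because a partial name hashes to an unrelated key, which is why a plain DHT supports exact-match queries only.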

22 22 Structured P2P Systems Chord: [SIGCOMM01_IS] MIT CAN: [SIGCOMM01_SR] UC Berkeley Pastry: [MIDDLEWARE01_AR] Microsoft Research Tapestry: [UCB01_BZ] UC Berkeley

23 23 Chord [SIGCOMM01_IS]. Provides a peer-to-peer hash lookup service: Lookup(key) → file location (IP & port). (Key, Value) pairs are distributed among all the nodes. Each node maintains a routing table (to accelerate the search). Scalability: O(log N) routing entries per node. To look up a key, queries are routed to the node with the desired key, according to the routing table. Efficiency: O(log N) hops per lookup.

24 24 Chord (Cont.) (diagram: ring of nodes N5, N10, N20, N32, N60, N80, N99, N110; Lookup(K19) is routed around the ring to key K19)
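
A Chord-style lookup over the node identifiers shown on the ring (N5 through N110) can be sketched as follows. The sketch assumes a 7-bit identifier space, computes finger tables from a global node list instead of maintaining them through joins and stabilization, and starts the lookup at N32; all of these are illustrative assumptions, and the function names are hypothetical.

```python
M = 7  # identifier bits, so ids run 0..127 (assumption)
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])  # node ids from the ring


def successor(ident):
    """First node clockwise from ident (inclusive), wrapping past zero."""
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]


def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def fingers(n):
    """Routing table: finger i points to successor(n + 2**i)."""
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]


def lookup(start, key):
    """Route from `start` to the node responsible for `key`."""
    node, path = start, [start]
    while True:
        succ = fingers(node)[0]
        if in_interval(key, node, succ, inclusive_right=True):
            return succ, path + [succ]
        # Jump to the finger that most closely precedes the key.
        node = next(f for f in reversed(fingers(node))
                    if in_interval(f, node, key))
        path.append(node)


print(lookup(32, 19))  # (20, [32, 99, 5, 10, 20])
```

Each node keeps only M routing entries, and each jump moves the query closer to the key, so the responsible node N20 is reached without any node knowing the whole ring.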

25 25 Research Issues Performance issues How to decrease the searching time? Given the node degree, how to minimize the network diameter: a traditional graph-theoretic problem [SIGCOMM03_DL] [INFOCOM03_JX] Tradeoff between the node degree and the network diameter Different topology designs de Bruijn: [SIGCOMM03_DL] Trie, Butterfly, Random graph, etc. However, hop-count != searching time, still some room for future research.

26 26 Research Issues Results from [SIGCOMM03_DL]:

27 27 Research Issues. Load balancing: how to distribute the (key, value) pairs to all the nodes evenly (or in a capacity-aware way)? [IPTPS03_AR], [IPTPS03_JB], [IPTPS04_DK] Searching issues: the DHT is designed for exact-match search. How to support keyword search? Inverted index [MIT02_ODG], [MIDDLEWARE03_PR], [IPTPS04_SS]. How to support more complex database queries? [IPTPS02_MH] [ICDCS04_GE]
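
One way to layer keyword search on an exact-match DHT is an inverted index: each keyword is hashed to a key, and the value stored under that key is the set of files containing the keyword. The sketch below is a generic illustration of that idea, not the design of any of the cited papers; a plain dictionary stands in for the DHT's put/get interface, and the class name, tokenizer, and file names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def key_of(word):
    """Exact-match DHT key for a single keyword."""
    return hashlib.sha1(word.encode("utf-8")).hexdigest()


def tokenize(text):
    """Split a file name or query into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    def __init__(self):
        # Stand-in for the DHT: key -> set of file names (the posting list).
        self.dht = defaultdict(set)

    def publish(self, filename):
        for word in tokenize(filename):
            self.dht[key_of(word)].add(filename)

    def search(self, query):
        """One exact-match lookup per keyword, then intersect the results."""
        postings = [self.dht.get(key_of(w), set()) for w in tokenize(query)]
        return set.intersection(*postings) if postings else set()


idx = KeywordIndex()
idx.publish("Starwars Episode1.divx")
idx.publish("Starwars Trailer.mpg")

print(sorted(idx.search("starwars")))
# ['Starwars Episode1.divx', 'Starwars Trailer.mpg']
print(idx.search("starwars trailer"))  # {'Starwars Trailer.mpg'}
```

In a real deployment the posting lists for popular keywords live on single nodes and must be shipped across the network for intersection, which ties this problem back to the load-balancing question on the same slide.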

28 28 Research Issues. Security issues [IPTPS02_ES]: not too many papers yet. Downloading issues (content distribution): the retrieval (file downloading) part is also very important. [SIGCOMM02_JB] [IPTPS03_PM]: download big files from multiple sources. How to select the peer(s) to download from? [IPTPS03_DB]: uses machine-learning techniques. BitTorrent.

29 29 Relevant Conferences & Workshops ACM Annual conference of the Special Interest Group on Data Communication (SIGCOMM) IEEE Conference on Computer Communications (INFOCOM) IEEE International Conference on Distributed Computing Systems (ICDCS) ACM Symposium on Principles of Distributed Computing (PODC) International Workshop on Peer-to-Peer Systems (IPTPS) International Workshop on Global and Peer-to-Peer Computing (in conjunction with IEEE/ACM CCGRID 2004) International Workshop on Peer-to-Peer Computing and Databases (in conjunction with EDBT 2004)

30 30 References [NAPSTER] http://www.napster.com [GNUTELLA] http://www.gnutella.com [KAZAA] http://www.kazaa.com [MOREPHEUS] http://www.morepheus.com [FREENET] http://freenet.sourceforge.net [BITTORRENT] http://bitconjurer.org/BitTorrent/ [MIDDLEWARE01_AR] Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems [SIGCOMM01_IS] A scalable peer-to-peer lookup service for Internet applications [SIGCOMM01_SR] A scalable content-addressable network [UCB01_BZ] Tapestry: an infrastructure for fault-resilient wide-area location and routing [UC01_MJ] Modeling large-scale peer-to-peer networks and a case study of gnutella [ICDCS02_BY] Efficient search in peer-to-peer networks

31 31 References [ICS02_LV] Search and replication in unstructured peer-to-peer networks [IEEE02_IC] Protecting free expression online with Freenet [IPTPS02_ES] Security considerations for peer-to-peer Distributed Hash Tables [IPTPS02_MH] Complex queries in DHT-based peer-to-peer networks [IPTPS02_MP] Mapping the Gnutella network: macroscopic properties of large-scale peer-to-peer systems [MIT02_ODG] A keyword-set search system for peer-to-peer networks [SIGCOMM02_EC] Replication strategies in unstructured peer-to-peer networks [SIGCOMM02_JB] Informed content delivery across adaptive overlay networks [INFOCOM03_JX] On the fundamental tradeoffs between routing table size and network diameter in peer-to-peer networks [IPTPS03_AR] Load balancing in structured P2P systems

32 32 References [IPTPS03_DB] Adaptive peer selection [IPTPS03_IF] On death, taxes, and the convergence of peer-to-peer and Grid computing [IPTPS03_JB] Simple load balancing for distributed hash tables [IPTPS03_PM] Rateless codes and big downloads [MIDDLEWARE03_PR] Efficient peer-to-peer keyword searching [SIGCOMM03_YC] Making Gnutella-like P2P systems scalable [SIGCOMM03_DL] Graph-theoretic analysis of structured peer-to-peer systems: routing distances and fault resilience [ICDCS04_GE] Data indexing in peer-to-peer DHT networks [INFOCOM04_CG] Random walks in peer-to-peer networks [IPTPS04_DK] Simple efficient load balancing algorithms for peer-to-peer systems [IPTPS04_SS] Making peer-to-peer keyword searching feasible using multi-level partitioning

