Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science.


Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science Hong Kong Baptist University

2 Outline
- Overview
- Napster
- Unstructured systems
  - Gnutella, Freenet
  - FastTrack
  - GIA
- Structured systems
  - DHT: Distributed Hash Table
  - Chord, CAN, Pastry, Tapestry …
- Some research issues
- References

3 What is Peer-to-Peer?
- Every node is designed to provide some service that helps other nodes in the network obtain service.
- Each node potentially has the same responsibility.
- Resources can be shared in different ways:
  - CPU cycles (peer-to-peer computing)
  - Storage: Napster, Gnutella, Freenet…
- In this talk, we consider only file sharing:
  - Step 1: searching (today’s focus)
  - Step 2: file downloading
- Peer-to-peer vs. Grid computing [IPTPS03_IF]

4 Main Design Goals of P2P Systems
- Ability to operate in a dynamic environment: peers join and leave the system randomly
- Performance: fast resource discovery, fast file retrieval
- Scalability: support millions of peers or more
- Reliability: peers could leave the system silently
- Anonymity: privacy of the publishers and downloaders
- Security issues

5 Killer application: Napster
- Mid-1999
- A hybrid system, not true P2P
  - Centralized search: very simple
  - Distributed download: P2P
- Operations:
  1. Connect to the Napster server
  2. Report the list of local files to the server
  3. Send keywords to the server for searching
  4. Select the “best” of the matching answers for downloading

6 Napster
[Figure: users and the napster.com server] Step 1: File list is uploaded.

7 Napster
[Figure: user and the napster.com server; request and results] Step 2: User requests search at server.

8 Napster
[Figure: user, the napster.com server, and hosts; pings] Step 3: User pings hosts that have the requested data and looks for the best transfer rate.

9 Napster
[Figure: user retrieves file] Step 4: User retrieves file.
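
The four Napster steps above can be sketched in a few lines. This is only an illustrative toy, not Napster's actual protocol: the class `CentralIndex`, the helper `pick_best`, and the `transfer_rate` table (standing in for the ping measurements of step 3) are all hypothetical names introduced here.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a central search server (hypothetical, not Napster's API)."""

    def __init__(self):
        # keyword -> set of (peer, filename) pairs
        self._index = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Split a file name or query into lowercase keywords.
        return text.lower().replace(".", " ").replace("_", " ").split()

    def report_files(self, peer, filenames):
        """A peer reports its list of local files to the server."""
        for name in filenames:
            for word in self._tokenize(name):
                self._index[word].add((peer, name))

    def search(self, keywords):
        """Return the (peer, filename) pairs that match every keyword."""
        words = [w for k in keywords for w in self._tokenize(k)]
        if not words:
            return set()
        results = set(self._index.get(words[0], set()))
        for word in words[1:]:
            results &= self._index.get(word, set())
        return results


def pick_best(results, transfer_rate):
    """Choose the match whose peer has the highest measured transfer rate."""
    return max(results, key=lambda r: transfer_rate[r[0]], default=None)
```

Only the search goes through the server; the download itself would then happen directly between the two peers.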

10 P2P Systems with Distributed Search
- What are the problems with centralized search?
  - Limited scalability
  - Vulnerable to censorship and copyright-law problems
  - Single point of failure
  - Vulnerable to DDoS attacks
- Distributed search: the file location information is distributed among all the peers.
- Two categories of distributed search systems:
  - Unstructured P2P systems
    - A number of real-life systems
    - Random search
  - Structured P2P systems
    - Lots of academic research work
    - Depend on a DHT (distributed hash table)

11 Unstructured P2P Systems
- Nodes join and leave the P2P system freely to construct an overlay network (a bootstrapping scheme is needed).
  - Each node chooses some neighbors
- Each node knows some file location information (local, or from neighbors).
- No coupling between network topology and file location information.
  - Random search
- Purely decentralized systems
  - All the nodes perform the same tasks
  - Example systems: original Gnutella, Freenet
- Partially centralized systems
  - Use SuperNodes or SuperPeers
  - Example systems: KaZaA, Morpheus, latest Gnutella

12 Gnutella
- Late 1999, by Nullsoft (Winamp)
- Purely decentralized, unstructured
- Each node is both a client and a server, and is referred to as a servent.
- Random search: TTL-limited flooding
  - Breadth-first traversal
  - Each node forwards the received query messages to all of its neighbors and decrements the TTL field.
- The search can fail even if the requested file exists in the P2P system.

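13 Gnutella
[Figure: overlay with nodes A, B, and X] Search: “Starwars”

14 Gnutella
[Figure: overlay with nodes A, B, and X] Resp: B has “Starwars.divx”

15 Gnutella
[Figure: overlay with nodes A, B, and X] Resp: “Starwars.divx”; Get: “Starwars”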
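
TTL-limited flooding can be illustrated with a small breadth-first sketch. It is a simplification under stated assumptions, not the Gnutella wire protocol: the overlay is a plain adjacency dictionary, `flood_search` is a hypothetical name, and a node forwards to all of its neighbors (including the one the query came from), with duplicates dropped on arrival.

```python
from collections import deque


def flood_search(graph, files, start, filename, ttl):
    """Flood a query from `start`; return (nodes holding the file, messages sent).

    graph: node -> list of neighbor nodes (the overlay, assumed static here)
    files: node -> collection of file names stored at that node
    """
    seen = {start}
    hits = set()
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if filename in files.get(node, ()):
            hits.add(node)
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:  # duplicate copies are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages
```

On the chain A, B, C, D with the file stored only at D, a query from A with TTL 2 stops at C and fails even though the file exists, while TTL 3 finds it at the cost of more messages.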

16 Some observations of Gnutella
- “Why Gnutella cannot scale. No, Really.”
- Flooding is not a scalable design.
  - A single query generates a huge amount of traffic
  - The TTL cannot be large.
- Some measurement studies [UC01_MJ], [IPTPS02_MP]
  - Small-world characteristics
  - Power-law distribution of node degrees

17 Freenet [IEEE02_IC]
- For publication, replication, and retrieval of data
- The network is purely decentralized, and publishers and consumers of information are anonymous.
- Depth-first traversal
  - Each node forwards the query to a single neighbor and waits for the response before forwarding the query to another neighbor.
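
The depth-first forwarding described above can be sketched as a recursive search with backtracking. This is a rough illustration only: real Freenet chooses the next neighbor by key closeness and adds anonymity and caching, whereas this hypothetical `depth_first_search` simply tries neighbors in list order and uses an assumed `hops_to_live` counter.

```python
def depth_first_search(graph, files, node, filename, hops_to_live, visited=None):
    """Return the path from `node` to a node holding `filename`, or None.

    The query is handed to one neighbor at a time; only when that branch
    reports failure is the next neighbor tried.
    """
    if visited is None:
        visited = set()
    visited.add(node)
    if filename in files.get(node, ()):
        return [node]
    if hops_to_live == 0:
        return None
    for neighbor in graph[node]:
        if neighbor in visited:
            continue  # avoid sending the query around a loop
        path = depth_first_search(
            graph, files, neighbor, filename, hops_to_live - 1, visited
        )
        if path is not None:
            return [node] + path
    return None
```

Compared with flooding, only one copy of the query is in flight at a time, so traffic is lower but latency can be higher.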

18 FastTrack & Latest Gnutella
- FastTrack is a proprietary protocol
  - Partially centralized, unstructured
  - Two real systems: Kazaa, Morpheus
- Improves scalability and search performance by using supernodes
  - A supernode gathers the file location information from its attached normal nodes
  - Searching is still done by flooding among the supernodes.
- Today’s Gnutella also uses this technique.

19 Some research issues
- How to improve the search?
  - Search latency vs. traffic overhead
  - [ICDCS02_BY]
    - Iterative deepening
    - Directed BFS: send the query to a subset of neighbors
    - Local indices: replication
  - Random walk [ICS02_LV], [INFOCOM04_CG]
- How to construct the overlay network?
- How can caching and replication help? [ICS02_LV], [SIGCOMM02_EC]
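
As a contrast to flooding, a random-walk search might look like the sketch below. It is not taken from the cited papers: `random_walk_search`, the walker count, and the step limit are assumptions, and the walkers run one after another here rather than in parallel.

```python
import random


def random_walk_search(graph, files, start, filename,
                       walkers=4, max_steps=16, rng=None):
    """Send `walkers` random walks from `start`; return (hit node or None, messages)."""
    rng = rng or random.Random()
    if filename in files.get(start, ()):
        return start, 0
    messages = 0
    for _ in range(walkers):
        node = start
        for _ in range(max_steps):
            if not graph[node]:
                break  # dead end: this walker gives up
            node = rng.choice(graph[node])  # forward to ONE random neighbor
            messages += 1
            if filename in files.get(node, ()):
                return node, messages
    return None, messages
```

The message count is bounded by `walkers * max_steps` regardless of node degree, which is the traffic side of the latency vs. traffic tradeoff.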

20 GIA [SIGCOMM03_YC]
“Making Gnutella-like P2P Systems Scalable”
1. Dynamic topology adaptation
   - Ensures that high-capacity nodes have high degree
   - Low-capacity nodes are close to high-capacity nodes
2. Active flow control scheme
   - Avoids overloaded hot-spots
   - Explicitly handles heterogeneity
3. One-hop replication of pointers to content
   - Allows high-capacity nodes to answer more queries
4. Search protocol
   - Biased random walk: towards high-capacity nodes
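
Points 3 and 4 can be combined in a very rough sketch. It is not GIA's actual protocol (there is no topology adaptation or flow control here): `gia_style_search` and the `capacity` table are hypothetical, and the walk is made deterministic by always stepping to the highest-capacity unvisited neighbor.

```python
def gia_style_search(graph, files, capacity, start, filename, max_steps=16):
    """Walk toward high-capacity nodes; return (holder or None, hops taken).

    One-hop replication is modeled by letting each visited node answer
    for its own files and for the files of its direct neighbors.
    """
    node, visited, hops = start, {start}, 0
    while True:
        for holder in [node] + list(graph[node]):
            if filename in files.get(holder, ()):
                return holder, hops
        candidates = [n for n in graph[node] if n not in visited]
        if hops == max_steps or not candidates:
            return None, hops
        # Bias the walk: prefer the neighbor with the highest capacity.
        node = max(candidates, key=lambda n: capacity[n])
        visited.add(node)
        hops += 1
```

Because a high-capacity node holds pointers for many neighbors, steering the walk toward such nodes lets each hop cover more of the network.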

21 Structured P2P Systems
- The overlay network topology is tightly controlled.
  - Nodes can join and leave the system, but the topology needs to be reconstructed.
- There is close coupling between the network topology and the file location information.
- They provide a mapping between the file identifier and its location by using a Distributed Hash Table (DHT).
  - Only for exact-match queries (as compared to keyword queries)
- Example: search for the file “Starwars.divx”
  - Convert “Starwars.divx” to a key, say “ ”
  - Look up “ ” in the DHT to find the file location
  - Download the file
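
The example on this slide (convert the name to a key, look the key up, then download) can be sketched as follows. This is a minimal illustration, assuming SHA-1 as the hash and a sorted ring of node identifiers; `make_key`, `ToyDHT`, and the sample location are hypothetical, and all state lives in one process rather than on separate peers.

```python
import hashlib
from bisect import bisect_left


def make_key(name, bits=160):
    """Hash a file or node name to an integer key (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """All nodes simulated in one process; each key lives on exactly one node."""

    def __init__(self, node_names, bits=160):
        self.bits = bits
        self.ring = sorted((make_key(n, bits), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    def responsible_node(self, key):
        # The first node whose identifier is >= key, wrapping around the ring.
        ids = [node_id for node_id, _ in self.ring]
        return self.ring[bisect_left(ids, key) % len(self.ring)][1]

    def publish(self, filename, location):
        key = make_key(filename, self.bits)
        self.store[self.responsible_node(key)][key] = location

    def lookup(self, filename):
        key = make_key(filename, self.bits)
        return self.store[self.responsible_node(key)].get(key)
```

Because the key is a hash of the exact name, a lookup for “starwars.divx” (lowercase) lands on an unrelated key and finds nothing, which is why a DHT supports only exact-match queries.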

22 Structured P2P Systems
- Chord: [SIGCOMM01_IS] MIT
- CAN: [SIGCOMM01_SR] UC Berkeley
- Pastry: [MIDDLEWARE01_AR] Microsoft Research
- Tapestry: [UCB01_BZ] UC Berkeley

23 Chord [SIGCOMM01_IS]
- Provides a peer-to-peer hash lookup service
  - Lookup(key) → file location (IP & port)
- (Key, Value) pairs are distributed among all the nodes
- Each node maintains a routing table (to accelerate the search)
  - Scalability: O(log N) routing entries per node
- To look up a key
  - Queries are routed to the node with the desired key, according to the routing table
  - Efficiency: O(log N) hops per lookup

24 Chord (Cont.)
[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, and N110, showing Lookup(K19) for key K19]
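
The ring on this slide can be used for a small routing sketch. The node identifiers come from the slide; the 7-bit identifier space, the starting node N80, and all function names are assumptions made for illustration. Finger tables are recomputed from a global node list here, whereas real Chord nodes maintain them through a join and stabilization protocol.

```python
M = 7                     # assumed number of identifier bits
RING = 2 ** M             # identifier space 0..127
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])


def successor(ident):
    """First node at or clockwise after `ident` on the ring."""
    ident %= RING
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]       # wrap around past the largest identifier


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def fingers(n):
    """Routing table of node n: successor(n + 2^i) for i = 0..M-1."""
    return [successor(n + 2 ** i) for i in range(M)]


def closest_preceding(n, key):
    """Farthest finger of n that still lies strictly before `key`."""
    for f in reversed(fingers(n)):
        if in_open(f, n, key):
            return f
    return n


def lookup(start, key):
    """Return (node responsible for key, nodes visited while routing)."""
    node, path = start, [start]
    while True:
        succ = successor(node + 1)
        if key == succ or in_open(key, node, succ):
            return succ, path   # key falls in (node, succ]
        node = closest_preceding(node, key)
        path.append(node)
```

With these assumptions, `lookup(80, 19)` routes through N80, N5, and N10 and returns N20, the first node clockwise from K19. Each node keeps M routing entries, and each hop jumps toward the key using the farthest finger that does not overshoot it.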

25 Research Issues
- Performance issues
  - How to decrease the search time?
    - Given the node degree, how to minimize the network diameter: a traditional graph-theoretic problem [SIGCOMM03_DL] [INFOCOM03_JX]
    - Tradeoff between the node degree and the network diameter
  - Different topology designs
    - de Bruijn: [SIGCOMM03_DL]
    - Trie, Butterfly, Random graph, etc.
  - However, hop count != search time, so there is still some room for future research.
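
The degree vs. diameter tradeoff can be made concrete with the classical Moore bound: a node with out-degree k can reach at most 1 + k + k^2 + ... + k^D nodes within D hops, so a network of N nodes needs a diameter of at least the smallest D for which that sum reaches N. The helper below, with the hypothetical name `min_diameter`, only evaluates this lower bound; it says nothing about whether a particular topology achieves it.

```python
def min_diameter(num_nodes, degree):
    """Smallest D such that 1 + k + k^2 + ... + k^D >= N (Moore bound)."""
    if num_nodes < 1 or degree < 1:
        raise ValueError("num_nodes and degree must be positive")
    reachable, layer, diameter = 1, 1, 0
    while reachable < num_nodes:
        layer *= degree        # nodes first reachable at the next hop
        reachable += layer
        diameter += 1
    return diameter
```

For one million nodes, degree 2 cannot do better than 19 hops, while degree 20 allows as few as 5; a larger routing table buys a shorter worst-case path.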

26 Research Issues Results from [SIGCOMM03_DL]:

27 Research Issues
- Load balancing
  - How to distribute the (key, value) pairs across all the nodes evenly (or in a capacity-aware way)?
  - [IPTPS03_AR], [IPTPS03_JB], [IPTPS04_DK]
- Searching issues
  - A DHT is designed for exact-match search.
  - How to support keyword search?
    - Inverted index
    - [MIT02_ODG], [MIDDLEWARE03_PR], [IPTPS04_SS]
  - How to support more complex database queries?
    - [IPTPS02_MH] [ICDCS04_GE]
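
One way to picture the inverted-index idea is to store, under the hash of each keyword, the set of file names that contain it, and to intersect those sets for a multi-keyword query. The sketch below is an assumption-laden toy, not the scheme of any cited paper: `KeywordIndex`, `tokenize`, and the modulo placement of keywords on nodes are all hypothetical.

```python
import hashlib
import re


def tokenize(text):
    """Split a file name or query into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Inverted index spread over simulated DHT nodes, one dict per node."""

    def __init__(self, num_nodes=8):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, keyword):
        # The keyword's hash decides which node stores its posting list.
        digest = hashlib.sha1(keyword.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def publish(self, filename):
        for word in tokenize(filename):
            self._node_for(word).setdefault(word, set()).add(filename)

    def search(self, query):
        """Return the file names that contain every keyword in the query."""
        postings = [self._node_for(w).get(w, set()) for w in tokenize(query)]
        return set.intersection(*postings) if postings else set()
```

Each keyword costs one exact-match DHT lookup, and the intersection step is where the traffic cost of multi-keyword queries arises, since posting lists for popular words can be large.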

28 Research Issues
- Security issues [IPTPS02_ES]
  - Not too many papers yet
- Downloading issues: content distribution
  - The retrieval (file downloading) part is also very important.
  - [SIGCOMM02_JB] [IPTPS03_PM]: download big files from multiple sources
  - How to select the peer(s) to download from?
    - [IPTPS03_DB]: uses a machine-learning technique
  - BitTorrent

29 Relevant Conferences & Workshops
- ACM Annual Conference of the Special Interest Group on Data Communication (SIGCOMM)
- IEEE Conference on Computer Communications (INFOCOM)
- IEEE International Conference on Distributed Computing Systems (ICDCS)
- ACM Symposium on Principles of Distributed Computing (PODC)
- International Workshop on Peer-to-Peer Systems (IPTPS)
- International Workshop on Global and Peer-to-Peer Computing (in conjunction with IEEE/ACM CCGRID 2004)
- International Workshop on Peer-to-Peer Computing and Databases (in conjunction with EDBT 2004)

30 References
[NAPSTER] [GNUTELLA] [KAZAA] [MORPHEUS] [FREENET] [BITTORRENT]
[MIDDLEWARE01_AR] Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems
[SIGCOMM01_IS] A scalable peer-to-peer lookup service for Internet applications
[SIGCOMM01_SR] A scalable content-addressable network
[UCB01_BZ] Tapestry: an infrastructure for fault-resilient wide-area location and routing
[UC01_MJ] Modeling large-scale peer-to-peer networks and a case study of Gnutella
[ICDCS02_BY] Efficient search in peer-to-peer networks

31 References
[ICS02_LV] Search and replication in unstructured peer-to-peer networks
[IEEE02_IC] Protecting free expression online with Freenet
[IPTPS02_ES] Security considerations for peer-to-peer Distributed Hash Tables
[IPTPS02_MH] Complex queries in DHT-based peer-to-peer networks
[IPTPS02_MP] Mapping the Gnutella network: macroscopic properties of large-scale peer-to-peer systems
[MIT02_ODG] A keyword-set search system for peer-to-peer networks
[SIGCOMM02_EC] Replication strategies in unstructured peer-to-peer networks
[SIGCOMM02_JB] Informed content delivery across adaptive overlay networks
[INFOCOM03_JX] On the fundamental tradeoffs between routing table size and network diameter in peer-to-peer networks
[IPTPS03_AR] Load balancing in structured P2P systems

32 References
[IPTPS03_DB] Adaptive peer selection
[IPTPS03_IF] On death, taxes, and the convergence of peer-to-peer and Grid computing
[IPTPS03_JB] Simple load balancing for distributed hash tables
[IPTPS03_PM] Rateless codes and big downloads
[MIDDLEWARE03_PR] Efficient peer-to-peer keyword searching
[SIGCOMM03_YC] Making Gnutella-like P2P systems scalable
[SIGCOMM03_DL] Graph-theoretic analysis of structured peer-to-peer systems: routing distances and fault resilience
[ICDCS04_GE] Data indexing in peer-to-peer DHT networks
[INFOCOM04_CG] Random walks in peer-to-peer networks
[IPTPS04_DK] Simple efficient load balancing algorithms for peer-to-peer systems
[IPTPS04_SS] Making peer-to-peer keyword searching feasible using multi-level partitioning