An Introduction to Peer-to-Peer System

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Peer to Peer and Distributed Hash Tables
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
CHORD – peer to peer lookup protocol Shankar Karthik Vaithianathan & Aravind Sivaraman University of Central Florida.
Chord A Scalable Peer-to-peer Lookup Service for Internet Applications Prepared by Ali Yildiz (with minor modifications by Dennis Shasha)
CHORD: A Peer-to-Peer Lookup Service CHORD: A Peer-to-Peer Lookup Service Ion StoicaRobert Morris David R. Karger M. Frans Kaashoek Hari Balakrishnan Presented.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Speaker: Cathrin Weiß 11/23/2004 Proseminar Peer-to-Peer Information Systems.
Robert Morris, M. Frans Kaashoek, David Karger, Hari Balakrishnan, Ion Stoica, David Liben-Nowell, Frank Dabek Chord: A scalable peer-to-peer look-up protocol.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Introduction to Peer-to-Peer (P2P) Systems Gabi Kliot - Computer Science Department, Technion Concurrent and Distributed Computing Course 28/06/2006 The.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 2: Peer-to-Peer.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
P2P File Sharing Systems
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Peer-to-Peer Computing CS587x Lecture Department of Computer Science Iowa State University.
Chord A Scalable Peer-to-peer Lookup Service for Internet Applications Lecture 3 1.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Introduction Widespread unstructured P2P network
Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
1 Telematica di Base Applicazioni P2P. 2 The Peer-to-Peer System Architecture  peer-to-peer is a network architecture where computer resources and services.
1 P2P Computing. 2 What is P2P? Server-Client model.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
Introduction of P2P systems
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Chapter 2: Application layer
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
An Introduction to Peer-to-Peer Networks Presentation for MIE456 - Information Systems Infrastructure II Vinod Muthusamy October 30, 2003.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
2: Application Layer1 Chapter 2: Application layer r 2.1 Principles of network applications  app architectures  app requirements r 2.2 Web and HTTP r.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan Presented.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Lecture 2 Distributed Hash Table
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Chord Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google,
1. Outline  Introduction  Different Mechanisms Broadcasting Multicasting Forward Pointers Home-based approach Distributed Hash Tables Hierarchical approaches.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 2: Distributed Hash.
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Peer to Peer Network Design Discovery and Routing algorithms
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
Malugo – a scalable peer-to-peer storage system..
CS694 - DHT1 Distributed Hash Table Systems Hui Zhang University of Southern California.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Peer-to-Peer Data Management
Peer-to-Peer and Social Networks
5.2 FLAT NAMING.
A Scalable Peer-to-peer Lookup Service for Internet Applications
Presentation transcript:

An Introduction to Peer-to-Peer System Diganta Goswami IIT Guwahati

Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks Structured overlay networks Unstructured overlay networks Overlay multicast networks 2

Overview of P2P overlay networks What is P2P systems? P2P refers to applications that take advantage of resources (storage, cycles, content, …) available at the end systems of the internet. What is overlay networks? Overlay networks refer to networks that are constructed on top of another network (e.g. IP). What is P2P overlay network? Any overlay network that is constructed by the Internet peers in the application layer on top of the IP network. 3

What is P2P systems? Multiple sites (at edge) Distributed resources Sites are autonomous (different owners) Sites are both clients and servers Sites have equal functionality 4

Internet P2P Traffic Statistics Between 50 and 65 percent of all download traffic is P2P related. Between 75 and 90 percent of all upload traffic is P2P related. And it seems that more people are using p2p today So what do people download? 61.4 % video 11.3 % audio 27.2 % games/software/etc. Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/ 5

P2P overlay networks properties Efficient use of resources Self-organizing All peers organize themselves into an application layer network on top of IP. Scalability Consumers of resources also donate resources Aggregate resources grow naturally with utilization 6

P2P overlay networks properties Reliability No single point of failure Redundant overlay links between the peers Redundant data source Ease of deployment and administration The nodes are self-organized No need to deploy servers to satisfy demand. Built-in fault tolerance, replication, and load balancing No need any change in underlay IP networks 7

P2P Applications P2P File Sharing P2P Communications Napster, Gnutella, Kazaa, eDonkey, BitTorrent Chord, CAN, Pastry/Tapestry, Kademlia P2P Communications Skype, Social Networking Apps P2P Distributed Computing Seti@home 8

Popular file sharing P2P Systems Napster, Gnutella, Kazaa, Freenet Large scale sharing of files. User A makes files (music, video, etc.) on their computer available to others User B connects to the network, searches for files and downloads files directly from user A Issues of copyright infringement 9

P2P/Grid Distributed Processing seti@home Search for ET intelligence Central site collects radio telescope data Data is divided into work chunks of 300 Kbytes User obtains client, which runs in background Peer sets up TCP connection to central computer, downloads chunk Peer does FFT on chunk, uploads results, gets new chunk Not P2P communication, but exploit Peer computing power 10

Key Issues Management Lookup Throughput How to maintain the P2P system under high rate of churn efficiently Application reliability is difficult to guarantee Lookup How to find out the appropriate content/resource that a user wants Throughput Content distribution/dissemination applications How to copy content fast, efficiently, reliably 11

Management Issue A P2P network must be self-organizing. Load balancing Join and leave operations must be self-managed. The infrastructure is untrusted and the components are unreliable. The number of faulty nodes grows linearly with system size. Tolerance to failures and churn Content replication, multiple paths Leverage knowledge of executing application Load balancing Dealing with free riders Freerider : rational or selfish users who consume more than their fair share of a public resource, or shoulder less than a fair share of the costs of its production. 12

Lookup Issue How do you locate data/files/objects in a large P2P system built around a dynamic set of nodes in a scalable manner without any centralized server or hierarchy? Efficient routing even if the structure of the network is unpredictable. Unstructured P2P : Napster, Gnutella, Kazaa Structured P2P : Chord, CAN, Pastry/Tapestry, Kademlia 13

Classification of overlay networks Structured overlay networks Are based on Distributed Hash Tables (DHT) the overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer. Unstructured overlay networks The overlay networks organize peers in a random graph in flat or hierarchical manners. Overlay multicast networks The peers organize themselves into an overlay tree for multicasting. 14

Structured overlay networks Overlay topology construction is based on NodeID’s that are generated by using Distributed Hash Tables (DHT). The overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer. This structured graph enables efficient discovery of data items using the given keys. It Guarantees object detection in O(log n) hops. 15

Unstructured P2P overlay networks An Unstructured system composed of peers joining the network with some loose rules, without any prior knowledge of the topology. Network uses flooding or random walks as the mechanism to send queries across the overlay with a limited scope. 16

Unstructured P2P File Sharing Networks Centralized Directory based P2P systems Pure P2P systems Hybrid P2P systems 17

Unstructured P2P File Sharing Networks Centralized Directory based P2P systems All peers are connected to central entity Peers establish connections between each other on demand to exchange user data (e.g. mp3 compressed data) Central entity is necessary to provide the service Central entity is some kind of index/group database Central entity is lookup/routing table Examples: Napster, Bittorent 18

Napster was used primarily for file sharing NOT a pure P2P network=> hybrid system Ways of action: Client sends server the query, server ask everyone and responds to client Client gets list of clients from server All Clients send ID’s of the data they hold to the server and when client asks for data, server responds with specific addresses peer downloads directly from other peer(s) 19

Centralized Network Napster model Nodes register their contents with server Centralized server for searches File access done on a peer to peer basis – Poor scalability – Single point of failure Napster model Client Client Server Reply Query File Transfer 20

Napster Further services: Centralized system Chat program, instant messaging service, tracking program,… Centralized system Single point of failure => limited fault tolerance Limited scalability (server farms with load balancing) Query is fast and upper bound for duration can be given 21

Gnutella pure peer-to-peer very simple protocol no routing "intelligence" Constrained broadcast Life-time of packets limited by TTL (typically set to 7) Packets have unique ids to detect loops 22

Query flooding: Gnutella fully distributed no central server public domain protocol many Gnutella clients implementing protocol overlay network: graph edge between peer X and Y if there’s a TCP connection all active peers and edges is overlay net Edge is not a physical link Given peer will typically be connected with < 10 overlay neighbors 23

Gnutella: protocol Query message sent over existing TCP connections File transfer: HTTP Query message sent over existing TCP connections peers forward Query message QueryHit sent over reverse path Query QueryHit Scalability: limited scope flooding 24

Gnutella : Scenario Step 0: Join the network Step 1: Determining who is on the network "Ping" packet is used to announce your presence on the network. Other peers respond with a "Pong" packet. Also forwards your Ping to other connected peers A Pong packet also contains: an IP address port number amount of data that peer is sharing Pong packets come back via same route Step 2: Searching Gnutella "Query" ask other peers (usually 7) if they have the file you desire A Query packet might ask, "Do you have any content that matches the string ‘Hey Jude"? Peers check to see if they have matches & respond (if they have any matches) & send packet to connected peers if not (usually 7) Continues for TTL (how many hops a packet can go before it dies, typically 10 ) Step 3: Downloading Peers respond with a “QueryHit” (contains contact info) File transfers use direct connection using HTTP protocol’s GET method 25

Gnutella: Peer joining Joining peer X must find some other peer in Gnutella network: use list of candidate peers X sequentially attempts to make TCP with peers on list until connection setup with Y X sends Ping message to Y; Y forwards Ping message. All peers receiving Ping message respond with Pong message X receives many Pong messages. It can then setup additional TCP connections 26

Query/Response analogous Gnutella - PING/PONG 3 6 Ping 1 Ping 1 Pong 3 Pong 6 Pong 6,7,8 Pong 6,7,8 Ping 1 7 Pong 3,4,5 Pong 5 5 1 2 Pong 7 Ping 1 Ping 1 Ping 1 Pong 2 Known Hosts: 2 Pong 8 Pong 4 8 Ping 1 3,4,5 6,7,8 Query/Response analogous 4 27

Unstructured Blind - Gnutella Breadth-First Search (BFS) = source = forward query = processed = found result = forward response 28

Unstructured Blind - Gnutella A node/peer connects to a set of Gnutella neighbors Forward queries to neighbors Client which has the Information responds. Flood network with TTL for termination + Results are complete – Bandwidth wastage 29

Gnutella : Reachable Users (analytical estimate) T : TTL, N : Neighbors for Query 30

Gnutella : Search Issue Flooding based search is extremely wasteful with bandwidth A large (linear) part of the network is covered irrespective of hits found Enormous number of redundant messages All users do this in parallel: local load grows linearly with size What can be done? Controlling topology to allow for better search Random walk, Degree-biased Random Walk Controlling placement of objects Replication 31

Gnutella : Random Walk Basic strategy Random walk In scale-free graph: high degree nodes are easy to find by (biased) random walk Scale-free graph is a graph whose degree distribution follows a power law And high degree nodes can store the index about a large portion of the network Random walk avoiding the visit of last visited node Degree-biased random walk Select highest degree node, that has not been visited This first climbs to highest degree node, then climbs down on the degree sequence Provably optimal coverage 32

Gnutella : Replication Spread copies of objects to peers: more popular objects can be found easier Replication strategies Owner replication Path replication Random replication But there is still the difficulty with rare objects. 33

Random Walkers Improved Unstructured Blind Similar structure to Gnutella Forward the query (called walker) to random subset of its neighbors + Reduced bandwidth requirements – Incomplete results Peer nodes 34

Unstructured Informed Networks Zero in on target based on information about the query and the neighbors. Intelligent routing + Reduces number of messages + Not complete, but more accurate – COST: Must thus flood in order to get initial information 35

Informed Searches: Local Indices Node keeps track of information available within a radius of r hops around it. Queries are made to neighbors just beyond the r radius. + Flooding limited to bounded part of network 36

Routing Indices For each query, calculate goodness of each neighbor. Calculating goodness: Categorize or separate query into themes Rank best neighbors for a given theme based on number of matching documents Follows chain of neighbors that are expected to yield the best results Backtracking possible 37

Free riding File sharing networks rely on users sharing data Two types of free riding Downloading but not sharing any data Not sharing any interesting data On Gnutella 15% of users contribute 94% of content 63% of users never responded to a query Didn’t have “interesting” data 38

Gnutella:summary Hit rates are high High fault tolerance Adopts well and dynamically to changing peer populations High network traffic No estimates on duration of queries No probability for successful queries Topology is unknown => algorithm cannot exploit it Free riding is a problem A significant portion of Gnutella peers are free riders Free riders are distributed evenly across domains Often hosts share files nobody is interested in 39

Gnutella discussion Search types: Scalability Robustness Autonomy: Any possible string comparison Scalability Search very poor with respect to number of messages Updates excellent: nothing to do Routing information: low cost Robustness High, since many paths are explored Autonomy: Storage: no restriction, peers store the keys of their files Routing: peers are target of all kind of requests Global knowledge None required 40

Exploiting heterogeneity: KaZaA Each peer is either a group leader or assigned to a group leader. TCP connection between peer and its group leader. TCP connections between some pairs of group leaders. Group leader tracks the content in all its children. 41

iMesh, Kazaa Hybrid of centralized Napster and decentralized Gnutella Super-peers act as local search hubs Each super-peer is similar to a Napster server for a small portion of the network Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time) Users upload their list of files to a super-peer Super-peers periodically exchange file lists Queries are sent to a super-peer for files of interest 42

Overlay Multicasting IP multicast has not been deployed over the Internet due to some fundamental problems in congestion control, flow control, security, group management and etc. For the new emerging applications such as multimedia streaming, internet multicast service is required. Solution: Overlay Multicasting Overlay multicasting (or Application layer multicasting) is increasingly being used to overcome the problem of non-ubiquitous deployment of IP multicast across heterogeneous networks. 43

Overlay Multicasting Main idea Internet peers organize themselves into an overlay tree on top of the Internet. Packet replication and forwarding are performed by peers in the application layer by using IP unicast service. 44

Overlay Multicasting Overlay multicasting benefits Easy deployment It is self-organized it is based on IP unicast service There is not any protocol support requirement by the Internet routers. Scalability It is scalable with multicast groups and the number of members in each group. Efficient resource usage Uplink resources of the Internet peers is used for multicast data distribution. It is not necessary to use dedicated infrastructure and bandwidths for massive data distribution in the Internet. 45

Overlay Multicasting Overlay multicast approaches DHT based Tree based Mesh-tree based 46

Overlay Multicasting DHT based Overlay tree is constructed on top of the DHT based P2P routing infrastructure such as pastry, CAN, Chord, etc. Example: Scribe in which the overlay tree is constructed on a Pastry networks by using a multicast routing algorithm 47

Structured Overlay Networks / DHTs Chord, Pastry, Tapestry, CAN, Kademlia, P-Grid, Viceroy Set of Nodes Keys of Nodes Common Identifier Space Hashing Connect The nodes Smartly Keys of Values Keys of Values Hashing … Node Identifier Value Identifier 48

The Principle Of Distributed Hash Tables A dynamic distribution of a hash table onto a set of cooperating nodes node A node D node B node C Key Value 1 Algorithms 9 Routing 11 DS 12 Peer-to-Peer 21 Networks 22 Grids Each node has a routing table Pointers to some other nodes Typically, a constant or a logarithmic number of pointers →Node D : lookup(9) Basic service: lookup operation Key resolution from any node 49

DHT Desirable Properties Keys mapped evenly to all nodes in the network Each node maintains information about only a few other nodes Messages can be routed to a node efficiently Node arrival/departures only affect a few nodes 50

Chord [MIT] Problem adressed: efficient node localization Distributed lookup protocol Simplicity, provable performance, proven correctness Support of just one operation: given a key, Chord maps the key onto a node 51

The Chord algorithm – Construction of the Chord ring the consistent hash function assigns each node and each key an m-bit identifier using SHA 1 (Secure Hash Standard). m = any number big enough to make collisions improbable Key identifier = SHA-1(key) Node identifier = SHA-1(IP address) Both are uniformly distributed Both exist in the same ID space 52

Chord consistent hashing (SHA-1) assigns each node and object an m-bit ID IDs are ordered in an ID circle ranging from 0 – (2m-1). New nodes assume slots in ID circle according to their ID Key k is assigned to first node whose ID ≥ k  successor(k) 53

Consistent Hashing - Successor Nodes identifier node X key 6 4 2 6 5 1 3 7 1 successor(1) = 1 identifier circle successor(6) = 0 6 2 successor(2) = 3 2 54

Consistent Hashing – Join and Departure When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n’s successor. 55

Consistent Hashing – Node Join keys 5 7 4 2 6 5 1 3 7 keys 1 keys keys 2 56

Consistent Hashing – Node Dep. keys 7 4 2 6 5 1 3 7 keys 1 keys 6 keys 2 57

Simple node localization // ask node n to find the successor of id n.find_successor(id) if (id (n; successor]) return successor; else // forward the query around the circle return successor.find_successor(id); => Number of messages linear in the number of nodes ! 58

Scalable Key Location – Finger Tables To accelerate lookups, Chord maintains additional routing information. This additional information is not essential for correctness, which is achieved as long as each node knows its correct successor. Each node n, maintains a routing table with up to m entries (which is in fact the number of bits in identifiers), called finger table. The ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2i-1 on the identifier circle. s = successor(n+2i-1). s is called the ith finger of node n, denoted by n.finger(i) 59