Presentation is loading. Please wait.

Presentation is loading. Please wait.

P2P Systems - 1 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Peer-to-Peer Systems Karl Aberer, Manfred.

Similar presentations


Presentation on theme: "P2P Systems - 1 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Peer-to-Peer Systems Karl Aberer, Manfred."— Presentation transcript:

1 P2P Systems - 1 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Peer-to-Peer Systems Karl Aberer, Manfred Hauswirth EPFL-IC, Laboratoire de systèmes d'informations répartis

2 P2P Systems - 2 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Overview 1.P2P Systems and Resource Location 2.Unstructured P2P Overlay Networks 3.Hierarchical P2P Overlay Networks 4.Structured P2P Overlay Networks 5.Small World Graphs

3 P2P Systems - 3 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Web search engine –Global scale application Example: Google –150 Mio searches/day –1-2 Terabytes of data (April 2001) Centralized Information Systems Google: servers 1 Find "aberer" 2 Result home page of Karl Aberer … Google Server Client Strengths –Global ranking –Fast response time Weaknesses –Infrastructure, administration, cost –A new company for every global application ?

4 P2P Systems - 4 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis (Semi-)Decentralized Information Systems P2P Music file sharing –Global scale application Example: Napster –1.57 Mio. Users –10 TeraByte of data (2 Mio songs, 220 songs per user) (February 2001) 1 Find "brick in the wall" "pink floyd" "1 MB" "rock" schema 2 Result you find f.mp3 at peer x 3 Request and transfer file f.mp3 from peer X directly Napster Server Peer PeerX Napster: 100 servers

5 P2P Systems - 5 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Lessons Learned from Napster Strengths: Resource Sharing –Every node “pays” its participation by providing access to its resources physical resources (disk, network), knowledge (annotations), ownership (files) –Every participating node acts as both a client and a server (“servent”): P2P –global information system without huge investment –decentralization of cost and administration = avoiding resource bottlenecks Weaknesses: Centralization –server is single point of failure –unique entity required for controlling the system = design bottleneck –copying copyrighted material made Napster target of legal attack Centralized System Decentralized System increasing degree of resource sharing and decentralization

6 P2P Systems - 6 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Fully Decentralized Information Systems P2P file sharing –Global scale application Example: Gnutella – nodes, 3 Mio files (August 2000) Gnutella: no servers Strengths –Good response time, scalable –No infrastructure, no administration –No single point of failure Weaknesses –High network traffic –No structured search –Free-riding Find "brick in the wall" I have "brick_in_the_wall.mp3" …. Self-organizing System

7 P2P Systems - 7 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Self-Organization Self-organized systems well known from physics, biology, cybernetics –distribution of control ( = decentralization = symmetry in roles = P2P) –local interactions, information and decisions –emergence of global structures –failure resilience Self-organization in information systems –appear to be costly (if done wrong) –Example: search efficiency in Gnutella

8 P2P Systems - 8 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Principle of self-organization can be applied at different system layers Original Internet designed as decentralized system: –P2P overlay networks ~ application-level Internet on top of the Internet –support application-specific addresses P2P Architectures Networking LayerInternetRoutingTCP/IP, DNS Data Access LayerOverlay NetworksResource LocationGnutella, FreeNet Service LayerP2P applicationsMessaging, Distributed Processing Napster, Seti, Groove User LayerUser CommunitiesCollaborationeBay, Ciao

9 P2P Systems - 9 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Resource Location in P2P Systems Problem: Peers need to locate distributed information –Peers with address p store data items d that are identified by a key k d –Given a key k d (or a predicate on k d ) locate a peer that stores d, i.e. locate the index information (k d, p) –Thus, the data we have to manage consists of the key-value pairs (k d, p) Can such a distributed database be maintained and accessed by a set of peers without central control ? P1 P2P3 P4 P5 P6P7 P8 k d ="jingle-bells" p="P8" d="jingle-bells.mp3""jingle-bells.mp3 k d ="jingle" ? ("jingle",P8)

10 P2P Systems - 10 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Resource Location Problem Operations –search for a key at a peer: p->search(k) –update a key at a peer: p->update(k,p') –peers joining and leaving the network: p->join(p’) Performance Criteria (for search) –search latency: e.g. searchtime(query)  Log(size(database)) –message bandwidth, e.g. messages(query)  Log(size(database)) messages(update)  Log(size(database)) –storage space used, e.g. storagespace(peer)  Log(size(database)) –resilience to failures (network, peers) Qualitative Criteria –complex search predicates: equality, prefix, containment, similarity search –use of global knowledge –peer autonomy –peer anonymity and trust –security (e.g. denial of service attacks)

11 P2P Systems - 11 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Summary What is a P2P System ? What is emergence ? At which layers can the P2P architecture occur ? How do we define efficiency for a P2P resource location system ?

12 P2P Systems - 12 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis 2. Unstructured P2P Overlay Networks No index information is used –i.e. the information (k, p) is only available directly from p Simplest approach: Message Flooding (Gossiping) –send query message to C neighbors –messages have limited time-to-live TTL –messages have IDs to eliminate cycles k="jingle-bells" Example: C=3, TTL=2

13 P2P Systems - 13 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Gnutella Developed in a 14 days “quick hack” by Nullsoft (winamp) –Originally intended for exchange of recipes Evolution of Gnutella –Published under GNU General Public License on the Nullsoft web server –Taken off after a couple of hours by AOL (owner of Nullsoft) –This was enough to “infect” the Internet –Gnutella protocol was reverse engineered from downloaded versions of the original Gnutella software –Third-party clients were published and Gnutella started to spread Based on message flooding –Typical values C=4, TTL=7 –One request leads tomessages –Hooking up to the Gnutella systems requires that a new peer knows at least one Gnutella host (gnutellahosts.com:6346; outside the Gnutella protocol specification) –Neighbors are found using a basic discovery protocol (ping-pong messages)

14 P2P Systems - 14 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Gnutella: Protocol Message Types

15 P2P Systems - 15 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Gnutella: Meeting Peers (Ping/Pong) C A BD E A’s ping B’s pong C’s pong D’s pong E’s pong

16 P2P Systems - 16 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Gnutella: Searching (Query/QueryHit/GET) C A BD E A’s query (e.g., X.mp3) C’s query hit E’s query hit X.mp3 GET X.mp3 X.mp3

17 P2P Systems - 17 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Popularity of Queries [ Sripanidkulchai01 ] Very popular documents are approximately equally popular Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i th most popular query is proportional to 1/(i alpha ) Access frequency of web documents also follows Zipf-like distributions  caching might work for Gnutella

18 P2P Systems - 18 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Free-riding Statistics [Adar00] Most Gnutella users are free riders Of 33,335 hosts: 22,084 (66%) of the peers share no files 24,347 (73%) share ten or less files Top 1 percent (333) hosts share 37% (1,142,645) of total files shared Top 5 percent (1,667) hosts share 70% (1,142,645) of total files shared Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared Many servents share files nobody downloads Of 11,585 sharing hosts: Top 1% of sites provide nearly 47% of all answers Top 25% of sites provide 98% of all answers 7,349 (63%) never provide a query response

19 P2P Systems - 19 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Connectivity in Gnutella Follows a power-law distribution: P(k) ~ k -g –k number of links a node is connected to, g constant (e.g. g=2) –distribution independent of number of nodes N –preferential attachment –low connected nodes follow a constant distribution (?): more robust

20 P2P Systems - 20 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Path Length in Gnutella Average distance between nodes approx. 7

21 P2P Systems - 21 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Improvements of Message Flooding Expanding Ring –start search with small TTL (e.g. TTL = 1) –if no success iteratively increase TTL (e.g. TTL = TTL +2) k-Random Walkers –forward query to one randomly chosen neighbor only, with large TTL –start k random walkers –random walker periodically checks with requester whether to continue

22 P2P Systems - 22 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Discussion Unstructured Networks Performance –Search latency: low (graph properties) –Message Bandwidth: high improvements through random walkers, but essentially the whole network needs to be explored –Storage cost: low (only local neighborhood) –Update and maintenance cost: low (only local updates) –Resilience to failures good: multiple paths are explored and data is replicated Qualitative Criteria –search predicates: very flexible, any predicate is possible –global knowledge: none required –peer autonomy: high

23 P2P Systems - 23 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Summary How are unstructured P2P networks characterized ? What is the purpose of the ping/pong messages in Gnutella ? Why is search latency in Gnutella low ? Which are methods to reduce message bandwidth in unstructured networks ?

24 P2P Systems - 24 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis 3. Hierarchical P2P Overlay Networks Servers provide index information, i.e. the information (k, p) is available from dedicated servers Simplest Approach –one central server –user register files –service (file exchange) is organized as P2P architecture k="jingle-bells" index server

25 P2P Systems - 25 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Napster Central (virtual) database which holds an index of offered MP3/WMA files Clients connect to this server, identify themselves (account) and send a list of MP3/WMA files they are sharing (C/S) Other clients can search the index and learn from which clients they can retrieve the file (P2P) Additional services at server (chat etc.) AB Napster Server register (user, files) “A has X.mp3” Download X.mp3

26 P2P Systems - 26 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Superpeer Networks Improvement of Central Index Server (Morpheus, Kaaza) –multiple index servers build a P2P network –clients are associated with one (or more) superpeers –superpeers use message flooding to forward search requests Experiences –redundant superpeers are good –superpeers should have high outdegree (>20) –TTL should be minimized

27 P2P Systems - 27 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Discussion Performance –Search latency: very low (index) –Message Bandwidth: low with superpeers flooding occurs, but the number of superpeers is comparatively small –Storage cost: low at client, high at index server –Update cost: low (no replication) –Resilience to failures: bad (system has single-point of failure) Qualitative Criteria –search predicates: very flexible, any predicate is possible –global knowledge: server –peer autonomy: low But: complete decentralization is not an asset per se, everything depends upon how well the overall system works !

28 P2P Systems - 28 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Summary Which are the two levels of P2P networks in superpeer networks, and to which functional layers are they related ? Which problem of distribution is avoided in superpeer networks and addressed in structured network ? What is the impact on the relation between nodes and functional layers ?

29 P2P Systems - 29 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis 4. Structured P2P Overlay Networks Unstructured overlay networks – what we learned –simplicity (simple protocol) –robustness (almost impossible to “kill” – no central authority) Performance –search latency O(log n), n number of peers –update and maintenance cost low Drawbacks –tremendous bandwidth consumption for search –free riding Can we do better?

30 P2P Systems - 30 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Efficient Resource Location search cost maximal bandwidth update cost low high UNSTRUCTURED P2P OVERLAY NETWORKS (e.g. Gnutella) SERVER (e.g. Napster) FULL REPLICATION STRUCTURED P2P OVERLAY NETWORKS (e.g. prefix routing)

31 P2P Systems - 31 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Distribution of Index Information Goal: provide efficient search using few messages without using designated servers Easy: distribution of index information over all peers, i.e. every peer maintains and provides part of the index information (k, p) Difficult: distributing the data access structure to support efficient search index information I server data access structure peers (storing data) peers (storing data and index information) I1I2I3I4 ? Search starts here Where to start the search?

32 P2P Systems - 32 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Approaches Different strategies –P-Grid: distributing a binary search tree –Chord: constructing a distributed hash table –CAN: Routing in a d-dimensional space –Freenet: caching index information along search paths Commonalities –each peer maintains a small part of the index information (routing table) –searches performed by directed message forwarding Differences –performance and qualitative criteria

33 P2P Systems - 33 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Example 1: Scalable Distributed Tries (P-Grid) Search trie: search keys are binary keys ?01?10?11? 0??1?? ??? index 101 ? ? ? !

34 P2P Systems - 34 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Non-scalable Distribution of Search Tree Distribute search tree over peers ?01?10?11? 0??1?? ??? peer 1peer 2peer 3peer 4 bottleneck

35 P2P Systems - 35 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis "Napster" bottleneck Scalable Distribution of Search Tree ?01?10?11? 0??1?? ??? peer 1peer 2peer 3peer 4 Associate each peer with a complete path

36 P2P Systems - 36 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Routing Information ? 1?? ??? peer 1 peer 2 peer 3 peer 4 know more about this part of the tree knows more about this part of the tree

37 P2P Systems - 37 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Prefix Routing 11? 1?? ??? peer 4 peer 1 peer 2 peer ? 1?? ??? peer 1peer 2 peer 3 peer ? ? ? ? ! Message to peer ? prefixpeer 0??peer1 peer2 10?peer3 routing table of peer4 search(p. k) find in routing table peer i with longest prefix matching k if last entry then found else search(peer i, k)

38 P2P Systems - 38 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Construction Splitting Approach (P-Grid) –peers meet and decide whether to extend search tree by splitting the data space –peers can perform load balancing considering their storage load –networks with different origins can merge, like Gnutella, Freenet (loose coupling) Node Insertion Approach (Chord, CAN, …) –peers determine their "leaf position" based on their IP address –nodes route from a gateway node to their node-id to populate the routing table –network has to start from single origin (strong coupling) Replication of data items and routing table entries is used to increase failure resilience

39 P2P Systems - 39 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis P-Grid Construction Algorithm (Bootstrap) When peers meet (randomly, e.g. random walk) –Compare the current search paths p and q Case 1: p and q are the same –If split condition satisfied extend the paths, i.e. to p0 and q1 else replicate data Case 2: p is a subpath of q, i.e. q = p0… –If split condition satisfied extend the path p by the inverse, i.e. p1, Case 3: only a common prefix exists –Forward to one of the referenced peers –Limit forwarding by recmax The peers remember each other and exchange in addition references at all levels Split conditions –below a maximal path length –storing a minimal number of data items

40 P2P Systems - 40 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Load Balancing in P-Grid Split criterion: minimal number of data items –Each node has same storage load –Algorithm still converges quickly All binary strings of length 5 sorted by frequency

41 P2P Systems - 41 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis P-Grid Discussion Performance –Search latency: O(log n) (with high probability, provable) –Message Bandwidth: O(log n) (selective routing) –Storage cost: O(log n) (routing table) –Update cost: low (like search) Qualitative Criteria –search predicates: prefix searches –global knowledge: key hashing –peer autonomy: peers can locally decide on their role (splitting decision)

42 P2P Systems - 42 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Example 2: Distributed Hash Tables (Chord) Hashing of search keys AND peer addresses on binary keys of length m –e.g. m=8, key("jingle-bells.mp3")=17, key( )=3 Data keys are stored at next larger node key peer with hashed identifier p, data with hashed identifier k, then k  ] predecessor(p), p ] m=8 32 keys p p2 p3 k stored at predecessor Search possibilities 1. every peer knows every other O(n) routing table size 2. peers know successor O(n) search cost

43 P2P Systems - 43 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Routing Tables Every peer knows m peers with exponentially increasing distance Each peer p stores a routing table First peer with hashed identifier s i such that s i =successor(p+2 i-1 ) for i=1,..,m We write also s i = finger(i, p) p p+2 p+4 p+1 p+8 p+16 s 1, s 2, s 3 s4s4 s5s5 Search O(log n) routing table size p2 p3 p4 isisi 1p p3 5p4

44 P2P Systems - 44 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Search search(p, k) find in routing table largest (i, p*) such that p*  [p,k[ /* largest peer key smaller than the searched data key */ if such a p* exists then search(p*, k) else return (successor(p)) // found p p+2 p+4 p+1 p+8 p+16 s 1, s 2, s 3 s4s4 s5s5 k1 k2 Search O(log n) search cost RT with exp. increasing distance  O(log n) with high probability p2 p3 p4

45 P2P Systems - 45 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Node Insertion New node q joining the network –q asks existing node p to find predecessor and fingers –cost: O(log 2 n) p p+2 p+4 p+1 p+8 p+16 p2 p3 p4 q isisi 1q 2q 3p2 4p3 5p4 isisi 1p2 2 3p3 4 5p4 routing table of p routing table of q

46 P2P Systems - 46 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Load Balancing in Chord Network size n=10^4 5 10^5 keys uniform data distribution  50 keys per node? NO, as IP addresses do not map uniformly into data key space.

47 P2P Systems - 47 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Length of Search Paths Network size n=2^ ^12 keys Path length ½ Log 2 (n) RTs can be seen as an embedding of search trees into the network and thus search starts at a randomly selected tree depth

48 P2P Systems - 48 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Chord Discussion Performance –Search: like P-Grid –Node join/leave cost: O(log 2 n) –Resilience to failures: replication to successor nodes Qualitative Criteria –search predicates: equality of keys only –global knowledge: key hashing, network origin –peer autonomy: nodes have by virtue of their address a specific role in the network

49 P2P Systems - 49 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Example 3: Topological Routing (CAN) Based on hashing of keys into a d-dimensional space (a torus) –Each peer is responsible for keys of a subvolume of the space (a zone) –Each peer stores the adresses of peers responsible for the neighboring zones for routing –Search requests are greedily forwarded to the peers in the closest zones Assignment of peers to zones depends on a random selection made by the peer

50 P2P Systems - 50 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Network Search and Join Node 7 joins the network by choosing a coordinate in the volume of 1 => O(d) updates or RTs

51 P2P Systems - 51 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis CAN Refinements Multiple Realities –We can have r different coordinate spaces –Nodes hold a zone in each of them –Creates r replicas of the (key, value) pairs –Increases robustness –Reduces path length as search can be continued in the reality where the target is closest Overloading zones –Different peers are responsible for the same zone –Splits are only performed if a maximum occupancy (e.g. 4) is reached –Nodes know all other nodes in the same zone –But only one of the neighbors

52 P2P Systems - 52 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis CAN Path Length

53 P2P Systems - 53 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Increasing Dimensions and Realities

54 P2P Systems - 54 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis CAN Discussion Performance –Search latency: O(d n 1/d ), depends on choice of d (with high probability, provable) –Message Bandwidth: O(d n 1/d ), (selective routing) –Storage cost: O(d) (routing table) –Update cost: low (like search) –Node join/leave cost: O(d n 1/d ) –Resilience to failures: realities and overloading Qualitative Criteria –search predicates: spatial distance of multidimensional keys –global knowledge: key hashing, network origin –peer autonomy: nodes can decide on their position in the key space Examples of more complex search predicates –Encode "semantic proximity" by properly encoding the data keys into the hashed key space –Map a physical network topology into a CAN key space such that neighboring nodes in the CAN space are also physically close

55 P2P Systems - 55 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Example 4: Dynamical Clustering (Freenet) Freenet Background –P2P system which supports publication, replication, and retrieval of data –Protects anonymity of authors and readers: infeasible to determine the origin or destination of data –Nodes are not aware of what they store (keys and files are sent and stored encrypted) –Uses an adaptive routing and caching strategy Index information maintained at each peer (limited cache size)

56 P2P Systems - 56 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Freenet Routing If a search request arrives –Either the data is in the table –Or the request is forwarded to the addresses with the most similar keys (lexicographic similarity, edit distance) till an answer is found or TTL reached (e.g. TTL = 500) If an answer arrives –The key, address and data of the answer are inserted into the table –The least recently used key and data is evicted Quality of routing should improve over time –Node is listed under certain key in routing tables –Therefore gets more requests for similar keys –Therefore tends to store more entries with similar keys (clustering) when receiving results and caching them –Dynamic replication of data

57 P2P Systems - 57 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis FreeNet Routing search k response (k,p) peer p has k new link established search k', k' similar to k peer p' has k'

58 P2P Systems - 58 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Freenet: Inserting Files First a the key of the file is calculated An insert message with this proposed key and a hops-to-live value is sent to the neighbor with the most similar key Then every peer checks whether the proposed key is already present in its local store –yes  return stored file (original requester must propose new key) –no  route to next peer for further checking (routing uses the same key similarity measure as searching) –continue until hops-to-live are 0 or failure Hops-to-live is 0 and no collision was detected  insert file along the path established by insert message

59 P2P Systems - 59 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Freenet: Evolution of Path Length median path length 500  identical nodes max 50 data items/node max 200 references/node Initial references: (i-1, i-2, i+1, i+2) mod n each time-step: - randomly insert - TTL=20 every 100 time-steps: 300 requests (TTL=500) from random nodes and measure actual path length (failure=500).

60 P2P Systems - 60 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Freenet Discussion Performance –Search latency: low (small world property) –Message Bandwidth: low (selective routing) –Storage cost: relatively low (experimentally not validated !) –Update cost: low (like search) but a bootstrapping phase is required –Resilience to failures: good (high degree of replication of data and keys) Qualitative Criteria –search predicates: with encryption only equality of keys –global knowledge: none –peer autonomy: high (with encryption risk of storing undesired data)

61 P2P Systems - 61 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Comparison

62 P2P Systems - 62 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Summary The following questions apply to Chord, P-Grid, CAN and Freenet: What is the expected search cost ? How are other nodes affected when new nodes enter the network? What is the cost of node insertion ? Which replication strategies are used ? What global knowledge or central coordination is required ?

63 P2P Systems - 63 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis 5. Small World Graphs Each P2P system can be interpreted as a directed graph (overlay network) –peers correspond to nodes –routing table entries as directed links Task –Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph –Requires the existence of short paths in the graph

64 P2P Systems - 64 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Milgram’s Experiment Finding short chains of acquaintances linking pairs of people in USA who didn’t know each other; –Source person in Nebraska –Sends message with first name and location –Target person in Massachusetts. Average length of the chains that were completed was between 5 and 6 steps “Six degrees of separation” principle BIG QUESTION: –WHY there should be short chains of acquaintances linking together arbitrary pairs of strangers???

65 P2P Systems - 65 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Random Graphs For many years typical explanation was - random graphs –Low diameter: expected distance between two nodes is log k N, where k is the outdegree and N the number of nodes –When pairs or vertices are selected uniformly at random they are connected by a short path with high probability But there are some inaccuracies –If A and B have a common friend C it is more likely that they themselves will be friends! (clustering) –Many real world networks (social networks, biological networks in nature, artificial networks – power grid, WWW) exhibit this clustering property –Random networks are NOT clustered.

66 P2P Systems - 66 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Clustering Clustering measures the fraction of neighbors of a node that are connected themselves Regular Graphs have a high clustering coefficient –but also a high diameter Random Graphs have a low clustering coefficient –but a low diameter Both models do match the properties expected from real networks! Random Graph (k=4) Short path length –L~log k N Almost no clustering –C~k/n Regular Graph (k=4) Long paths –L ~ n/(2k) Highly clustered –C~3/4

67 P2P Systems - 67 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Small-World Networks Random rewiring of regular graph (by Watts and Strogatz) –With probability p rewire each link in a regular graph to a randomly selected node –Resulting graph has properties, both of regular and random graphs High clustering and short path length –Freenet has been shown to result in small world graphs

68 P2P Systems - 68 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Flashback: Freenet Search Performance Modifying routing tables in Freenet through caching has a "rewiring effect" Studies show that Freenet graphs have small-world properties Explains improving search performance Regular graph: n nodes, k nearest neighbors  path length ~ n/2k 4096/16 = 256 Random graph: path length ~ log (n)/log(k) ~ 4 Rewired graph (1% of nodes): path length ~ random graph clustering ~ regular graph Small World Graph

69 P2P Systems - 69 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Search in Small World Graphs BUT! Watts-Strogatz can provide a model for the structure of the graph –existence of short paths –high clustering It does not explain how the shortest paths are found –also Gnutella networks are small-world graphs –why can search be efficient in Freenet?

70 P2P Systems - 70 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis P2P Overlay Networks as Graphs Each P2P system can be interpreted as a directed graph … –peers correspond to nodes –routing table entries as directed links … embedded in some space –P-Grid: interval [0,1] –Chord: ring [0,1) –CAN: d-dimensional torus –Freenet: strings + lexicographical distance Task –Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph

71 P2P Systems - 71 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Kleinberg’s Small-World Model Kleinberg’s Small-World’s model –Embed the graph into an r-dimensional grid –constant number p of short range links (neighborhood) –q long range links: choose long-range links such that the probability to have a long range contact is proportional to 1/d r Importance of r ! –Decentralized (greedy) routing performs best iff. r = dimension of space r = 2

72 P2P Systems - 72 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Influence of “r” (1) Each peer u has link to the peer v with probability proportional to where d(u,v) is the distance between u and v. Optimal value: r = dim = dimension of the space If r < dim we tend to choose more far away neighbors (decentralized algorithm can quickly approach the neighborhood of target, but then slows down till finally reaches target itself). If r > dim we tend to choose more close neighbors (algorithm finds quickly target in it’s neighborhood, but reaches it slowly if it is far away). When r = 0 – long range contacts are chosen uniformly. Random graph theory proves that there exist short paths between every pair of vertices, BUT there is no decentralized algorithm capable finding these paths

73 P2P Systems - 73 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Influence of “r” (2) Given node u if we can partition the remaining peers into sets A 1, A 2, A 3, …, A logN, where A i, consists of all nodes whose distance from u is between 2 i and 2 i+1, i=0..log(N-1). –Then given r = dim each long range contact of u is nearly equally likely to belong to any of the sets A i –When q = log N – on average each node will have a link in each set of A i

74 P2P Systems - 74 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Traditional DHTs and Kleinberg model P-Grid’s model Kleinberg’s model

75 P2P Systems - 75 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Conclusions from Kleinberg's Model With respect to the Watts and Strogatz model –there is no decentralized algorithm capable performing effective search in the class of SW networks constructed according to Watts and Strogatz –J. Kleinberg presented the infinite family of Small World networks that generalizes the Watts and Strogatz model and shows that decentralized search algorithms can find short paths with high probability –there exist only one unique model within that family for which decentralized algorithms are effective. With respect to overlay networks –Many of the structured P2P overlay networks are similar to Kleinberg’s model (e.g. Chord, randomized version, q=log N, r=1) –Unstructured overlay networks also fit into the model (e.g. Gnutella q=5, r=0) –Some variants of structured P2P overlay networks are having no neighborhood lattice (e.g. P-Grid, p=0) –Extensions to spaces beyond regular grids are possible (e.g. arbitrary metric spaces)

76 P2P Systems - 76 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Summary How can we characterize P2P overlay networks such that we can study them using graph-theoretic approaches? What is the main difference between a random graph and a SW graph? What is the main difference between the Watts/Strogatz and the Kleinberg model? What is the relationship between structured overlay networks and small world graphs? What are possible variations of the small world graph model?

77 P2P Systems - 77 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis References Andy Oram, editor. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, March K. Aberer, M. Hauswirth: “P2P Systems”, Practical Handbook of Internet Computing, CRC press, 2004 K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, R. Schmidt, J. Wu: " Advanced Peer-to-Peer Networking: The P-Grid System and its Applications ", PIK - Praxis der Informationsverarbeitung und Kommunikation, Special Issue on P2P Systems, 26(3), Beverly Yang, Hector Garcia-Molina: Improving Search in Peer-to-Peer Networks. ICDCS 2002: 5-14 Clip2. The Gnutella Protocol Specification v0.4 (Document Revision 1.2). June 15, M.A. Jovanovic, F.S. Annexstein, and K.A.Berman. Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella. University of Cincinnati, Laboratory for Networks and Applied Graph Theory, Eytan Adar and Bernardo A. Huberman. Free Riding on Gnutella. Technical report. Xerox PARC. September 9, Kunwadee Sripanidkulchai. The popularity of Gnutella queries and its implications on scalability. February Matei Ripeanu, Adriana Iamnitchi, Ian T. Foster: Mapping the Gnutella Network. IEEE Internet Computing 6(1): (2002)Adriana IamnitchiIan T. FosterIEEE Internet Computing 6

78 P2P Systems - 78 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis References Qin LvQin Lv, Pei Cao, Edith Cohen, Kai Li, Scott Shenker: Search and replication in unstructured peer- to-peer networks. SIGMETRICS 2002: Pei CaoEdith CohenKai LiSIGMETRICS 2002 Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LLNCS Springer Verlag revisedhttp://www.freenetproject.org/index.php?page=icsi- revised Theodore Hong. Performance in Decentralized Filesharing Networks. Presentation given by at the O'Reilly Peer-to-Peer Conference, San Francisco. February 14-16, Jon Kleinberg. The Small-World Phenomenon: An Algorithmic Perspective. Technical report Cornell Computer Science, October Heylighen F. (1999): The Science of Self-organization and Adaptivity, in: The Encyclopedia of Life Support Systems Heylighen F. (1999): Collective Intelligence and its Implementation on the Web: algorithms to develop a collective mental map, Computational and Mathematical Theory of Organizations 5(3), A. Barabasi, R. Albert: Emergence of Scaling in random Networks, Science, Vol 286, Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content- Addressable Network. Proceedings of the ACM SIGCOMM, 2001.

79 P2P Systems - 79 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Are you still awake?


Download ppt "P2P Systems - 1 ©2006, Karl Aberer, Manfred Hauswirth - EPFL-IC, Laboratoire de systèmes d'informations répartis Peer-to-Peer Systems Karl Aberer, Manfred."

Similar presentations


Ads by Google