1 CS 162: P2P Networks Computer Science Division
Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA

2 Main Challenge Find where a particular file is stored
Note: the problem is similar to finding a particular page in web caching (see last lecture; what are the differences?). (Diagram: files A–F stored on different machines; one machine asks "E?", i.e., where is file E stored?)

3 Other Challenges Scale: up to hundreds of thousands or millions of machines. Dynamicity: machines can come and go at any time.

4 Napster Assume a centralized index system that maps files (songs) to the machines that are alive and store them. How to find a file (song): query the index system → it returns a machine that stores the required file (ideally the closest / least-loaded machine); then FTP the file from that machine. Advantages: simplicity; easy to implement sophisticated search engines on top of the index system. Disadvantages: robustness, scalability (?).
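A minimal sketch of the centralized-index idea described above (class and method names are illustrative, not the actual Napster protocol): an in-memory map from file name to the set of live machines that advertise it.

```python
class CentralIndex:
    def __init__(self):
        self.index = {}          # file name -> set of machine addresses

    def register(self, machine, files):
        """Called when a machine comes online and advertises its files."""
        for f in files:
            self.index.setdefault(f, set()).add(machine)

    def unregister(self, machine):
        """Called when a machine leaves; drop all of its entries."""
        for peers in self.index.values():
            peers.discard(machine)

    def lookup(self, file_name):
        """Return some machine that stores the file (ideally the
        closest / least-loaded one; here we just pick arbitrarily)."""
        peers = self.index.get(file_name)
        return next(iter(peers)) if peers else None

# Usage: peers register, a client asks the index, then fetches the file
# directly (e.g., over FTP) from the returned machine.
index = CentralIndex()
index.register("m5", ["E"])
index.register("m6", ["F"])
print(index.lookup("E"))   # -> "m5"
```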

5 Napster: Example (Diagram: the central index maps files A–F to the machines m1–m6 that store them; machine m4 asks the index "E?", is directed to m5, and downloads E directly from m5.)

6 Gnutella Distribute the file location information. Idea: broadcast the request.
How to find a file: send the request to all neighbors; neighbors recursively forward (multicast) the request; eventually a machine that has the file receives the request and sends back the answer. Advantages: totally decentralized, highly robust. Disadvantages: not scalable; the entire network can be swamped with requests (to alleviate this problem, each request carries a TTL).
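A minimal sketch of the flooding scheme just described, with a TTL to bound propagation (function and variable names are illustrative; the real protocol also uses message IDs to suppress duplicate forwarding, approximated here with a shared `seen` set).

```python
def flood_query(node, file_name, ttl, neighbors, files, seen=None):
    """Return a node that has `file_name`, or None if the TTL runs out.

    neighbors: dict node -> list of neighbor nodes
    files:     dict node -> set of file names stored at that node
    """
    if seen is None:
        seen = set()
    if node in seen or ttl < 0:
        return None
    seen.add(node)
    if file_name in files.get(node, set()):
        return node                      # the answer travels back to the requester
    for nb in neighbors.get(node, []):
        hit = flood_query(nb, file_name, ttl - 1, neighbors, files, seen)
        if hit is not None:
            return hit
    return None

# Topology from the next slide: m1's neighbors are m2 and m3; m3's are m4, m5.
neighbors = {"m1": ["m2", "m3"], "m3": ["m4", "m5"]}
files = {"m5": {"E"}}
print(flood_query("m1", "E", ttl=3, neighbors=neighbors, files=files))  # -> "m5"
```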

7 Gnutella: Example Assume: m1's neighbors are m2 and m3; m3's neighbors are m4 and m5; … (Diagram: m1 floods the query "E?" to its neighbors; the request reaches m5, which stores E and sends back the answer.)

8 Two-Level Hierarchy (Figure caption: Oct 2003 crawl of Gnutella)
Used by current Gnutella implementations and KaZaa. Leaf nodes connect to a small number of ultrapeers (supernodes). Query: a leaf sends the query to its ultrapeers; if the ultrapeers don't know the answer, they flood the query to other ultrapeers. More scalable: flooding happens only among ultrapeers. (Diagram: a two-level topology of ultrapeer nodes and leaf nodes.)
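A sketch of the two-level lookup just described, assuming each ultrapeer keeps an index of the files advertised by its own leaves (the data structures and function name are hypothetical, not the Gnutella/KaZaa wire protocol).

```python
def leaf_query(leaf, file_name, leaf_ultrapeers, ultrapeer_index, ultrapeer_links):
    # 1. The leaf asks its ultrapeers.
    for up in leaf_ultrapeers[leaf]:
        if file_name in ultrapeer_index[up]:
            return ultrapeer_index[up][file_name]
        # 2. If an ultrapeer does not know the answer, the query is flooded
        #    to other ultrapeers (flooding happens only at this level).
        for other in ultrapeer_links[up]:
            if file_name in ultrapeer_index[other]:
                return ultrapeer_index[other][file_name]
    return None

# ultrapeer_index maps ultrapeer -> {file name: leaf that stores it}
idx = {"u1": {}, "u2": {"E": "leaf7"}}
links = {"u1": ["u2"], "u2": ["u1"]}
print(leaf_query("leaf1", "E", {"leaf1": ["u1"]}, idx, links))  # -> "leaf7"
```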

9 Skype Peer-to-peer Internet telephony. Two-level hierarchy like KaZaa: ultrapeers are used mainly to route traffic between NATed end-hosts (see next slide)… … plus a login server to authenticate users and ensure that names are unique across the network. (Diagram: hosts A and B exchange messages with the login server and data traffic with each other. Note: this is the probable protocol; the Skype protocol is not published.)

10 Detour: NAT (1/3) Network Address Translation:
Motivation: the address scarcity problem in IPv4. Allows addresses to be allocated to hosts behind a NAT independently of the rest of the Internet: two hosts behind two different NATs can have the same (private) address. (Diagram: two NAT boxes connect private networks to the Internet; one host behind each NAT has the same address.)

11 Detour: NAT (2/3) Main idea: use port numbers to multiplex/demultiplex the connections of NATed end-hosts. Map (IPaddr, Port) of a NATed host to (IPaddrNAT, PortNAT). (Diagram: a NAT table entry maps the internal (address, port 1005) pair to external port 78; outgoing packets have their source address/port rewritten to the NAT's, and replies arriving on port 78 are rewritten back to the internal host.)
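A sketch of the port-based translation idea on this slide: map (internal address, internal port) to a NAT-chosen external port and rewrite packet headers in both directions. Class, method names, and addresses are illustrative, not a real NAT implementation.

```python
import itertools

class Nat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = itertools.count(5000)   # pool of external ports
        self.out = {}    # (internal_ip, internal_port) -> external_port
        self.back = {}   # external_port -> (internal_ip, internal_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Packet leaving the private network: rewrite the source."""
        key = (src_ip, src_port)
        if key not in self.out:
            ext = next(self.next_port)
            self.out[key] = ext
            self.back[ext] = key
        return (self.public_ip, self.out[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_port):
        """Reply arriving at the NAT: rewrite the destination if a mapping
        exists; otherwise the packet is dropped."""
        if dst_port not in self.back:
            return None
        in_ip, in_port = self.back[dst_port]
        return (src_ip, src_port, in_ip, in_port)

nat = Nat("203.0.113.1")
print(nat.outbound("10.0.0.5", 1005, "198.51.100.7", 80))
print(nat.inbound("198.51.100.7", 80, 5000))
```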

12 Detour: NAT (3/3) Limitations: (1) the number of machines that can simultaneously communicate through one NAT address is limited by the number of port numbers (at most 2^16; why?); (2) a host outside the NAT cannot initiate a connection to a host behind the NAT. Skype and other P2P systems use login servers and ultrapeers to work around limitation (2). How? (Hint: ultrapeers have globally unique, Internet-routable IP addresses.)
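A sketch of the relay idea the hint points to (names are hypothetical, not Skype's actual protocol): two NATed hosts cannot accept inbound connections, but each can open an outbound connection to an ultrapeer with a public, Internet-routable address, and the ultrapeer forwards traffic between those connections.

```python
class Ultrapeer:
    def __init__(self):
        self.sessions = {}           # user name -> connection object

    def register(self, user, connection):
        # Each NATed host connects *out* to the ultrapeer and registers;
        # its NAT creates the mapping, so replies over this connection work.
        self.sessions[user] = connection

    def relay(self, src_user, dst_user, payload):
        conn = self.sessions.get(dst_user)
        if conn is None:
            return False             # destination not online
        conn.send(payload)           # forwarded over dst's own outbound connection
        return True

class FakeConnection:
    def send(self, payload):
        print("delivering:", payload)

up = Ultrapeer()
up.register("alice", FakeConnection())
up.register("bob", FakeConnection())
up.relay("alice", "bob", "hello")    # -> delivering: hello
```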

13 BitTorrent (1/2) Allows fast downloads even when sources have low connectivity. How does it work? Split each file into pieces (~256 KB each), and each piece into sub-pieces (~16 KB each). The downloader fetches one piece at a time; within one piece, it can fetch up to five sub-pieces in parallel.
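A minimal sketch of the piece / sub-piece split just described, using the slide's approximate sizes (in real BitTorrent the piece size is chosen per torrent, so treat these constants as examples).

```python
PIECE_SIZE = 256 * 1024
SUBPIECE_SIZE = 16 * 1024

def split(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]

def split_file(data):
    """Return a list of pieces, each piece a list of sub-pieces."""
    return [split(piece, SUBPIECE_SIZE) for piece in split(data, PIECE_SIZE)]

pieces = split_file(b"x" * (1024 * 1024))    # a 1 MB file
print(len(pieces), len(pieces[0]))           # -> 4 pieces, 16 sub-pieces each
```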

14 BitTorrent (2/2) A download consists of three phases:
Start: get a first piece as soon as possible → select a random piece. Middle: spread all pieces as quickly as possible → select the rarest piece next. End: avoid getting stuck with a slow source when downloading the last sub-pieces → request the same sub-piece from several sources in parallel and cancel the slowest downloads once the sub-piece has been received. (For details see: …)
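A sketch of the three-phase piece selection described above (random first, rarest first, end game). The function and the `availability` map, which counts how many peers advertise each piece, are illustrative and not taken from the reference client.

```python
import random

def choose_piece(have, availability, total_pieces):
    missing = [p for p in range(total_pieces) if p not in have]
    if not missing:
        return None
    if len(have) == 0:
        # Start: grab any piece quickly so we have something to trade.
        return random.choice(missing)
    if len(missing) > 1:
        # Middle: rarest first, to spread scarce pieces through the swarm.
        return min(missing, key=lambda p: availability.get(p, 0))
    # End game (approximated here as "one piece left"): request its remaining
    # sub-pieces from several peers in parallel and cancel the slower
    # downloads once a sub-piece arrives.
    return missing[0]
```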

15 Distributed Hash Tables
Problem: given an ID, map it to a host. Challenges: Scalability: hundreds of thousands or millions of machines. Instability: changes in routes, congestion, availability of machines. Heterogeneity: latency from 1 ms to 1000 ms; bandwidth from 32 Kb/s to 100 Mb/s; nodes stay in the system from 10 s to a year. Trust: selfish users, malicious users.
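A toy sketch of the DHT abstraction the following slides build on: hash a key to an ID, map the ID to the responsible node, and do put/get there. The class, the ID-space size, and the "first node ID ≥ item ID" ownership rule are placeholder assumptions; CAN and Chord differ exactly in how this owner mapping and its routing are implemented.

```python
import hashlib

def key_to_id(key, id_space=2**16):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % id_space

class ToyDHT:
    def __init__(self, node_ids, id_space=2**16):
        self.node_ids = sorted(node_ids)
        self.id_space = id_space
        self.store = {n: {} for n in self.node_ids}

    def owner(self, item_id):
        # Simplest rule: first node ID >= item_id, wrapping around.
        for n in self.node_ids:
            if n >= item_id:
                return n
        return self.node_ids[0]

    def put(self, key, value):
        self.store[self.owner(key_to_id(key, self.id_space))][key] = value

    def get(self, key):
        return self.store[self.owner(key_to_id(key, self.id_space))].get(key)

dht = ToyDHT([100, 2000, 40000])
dht.put("song.mp3", "m5")
print(dht.get("song.mp3"))   # -> "m5"
```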

16 Content Addressable Network (CAN)
Associate to each node and each item a unique ID in a d-dimensional space. Properties: routing table size is O(d); guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes.

17 CAN Example: Two Dimensional Space
The space is divided between the nodes; together the nodes cover the entire space. Each node covers either a square or a rectangular area with ratio 1:2 or 2:1. Example: assume a space of size 8 × 8. Node n1:(1, 2) is the first node that joins → it covers the entire space. (Diagram: an 8 × 8 grid entirely owned by n1.)

18 CAN Example: Two Dimensional Space
Node n2:(4, 2) joins → the space is divided between n1 and n2. (Diagram: the grid split between n1 and n2.)

19 CAN Example: Two Dimensional Space
Node n3:(3, 5) joins → the space is divided further. (Diagram: the grid now split among n1, n2, and n3.)

20 CAN Example: Two Dimensional Space
Nodes n4:(5, 5) and n5:(6, 6) join. (Diagram: the grid now divided among n1–n5.)

21 CAN Example: Two Dimensional Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6). Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5). (Diagram: the nodes and items placed at their coordinates in the 8 × 8 space.)

22 CAN Example: Two Dimensional Space
Each item is stored by the node that owns the zone containing the item's point in the space. (Diagram: each item fi sits inside the zone of the node that stores it.)

23 CAN: Query Example Each node knows its neighbors in the d-dimensional space.
Forward the query to the neighbor that is closest to the query ID (the item's point). Example: assume n1 queries f4. (Diagram: the query is routed greedily from n1 across neighboring zones toward the node whose zone contains f4.)
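A sketch of the greedy routing rule just stated for a 2-D CAN: each node knows only its neighbors and forwards the query to the neighbor whose zone center is closest to the item's point. The coordinates, zone centers, and neighbor lists below are illustrative assumptions, not the exact zones of the example figure.

```python
def can_route(start, target_point, neighbors, centers):
    """neighbors: node -> list of neighbor nodes; centers: node -> (x, y)."""
    def dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    current, path = start, [start]
    while True:
        best = min(neighbors[current] + [current],
                   key=lambda n: dist(centers[n], target_point))
        if best == current:          # no neighbor is closer: current owns the zone
            return path
        current = best
        path.append(current)

centers = {"n1": (1, 2), "n2": (4, 2), "n4": (5, 5)}
neighbors = {"n1": ["n2"], "n2": ["n1", "n4"], "n4": ["n2"]}
print(can_route("n1", (7, 5), neighbors, centers))   # -> ['n1', 'n2', 'n4']
```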

24 Chord Associate to each node and each item a unique ID in a one-dimensional (circular) space. Properties: routing table size is O(log N), where N is the total number of nodes; guarantees that a file is found in O(log N) steps.

25 Data Structure Assume the identifier space is 0..2^m − 1. Each node maintains: a finger table, where entry i for node n is the first node that succeeds or equals n + 2^i (mod 2^m), and a pointer to its predecessor node. An item identified by id is stored on the successor node of id.
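A minimal sketch of building the finger table just defined: entry i of node n points to successor(n + 2^i) in a 2^m identifier space. The 0-based finger index follows this slide's convention; function names are illustrative.

```python
def successor(node_ids, ident, id_space):
    """First live node ID that succeeds or equals `ident` (with wrap-around)."""
    ident %= id_space
    candidates = sorted(node_ids)
    for n in candidates:
        if n >= ident:
            return n
    return candidates[0]

def finger_table(n, node_ids, m):
    id_space = 2 ** m
    return [successor(node_ids, n + 2 ** i, id_space) for i in range(m)]

# Example ring from the following slides: m = 3, nodes at 0, 1, 3, 6.
print(finger_table(1, [0, 1, 3, 6], 3))   # fingers for n = 1: [3, 3, 6]
```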

26 Chord Example Assume an identifier space of 8 IDs (0..7, m = 3).
Node n1:(1) joins → all entries in its finger table are initialized to point to itself. (Diagram: a ring of identifiers 0–7 with n1 at position 1; n1's successor table, with columns i, id + 2^i, succ, lists n1 in every entry.)

27 Chord Example Node n2:(3) joins. (Diagram: the ring now has n1 at 1 and n2 at 3; each node's successor table, with columns i, id + 2^i, succ, is updated accordingly.)

28 Chord Example Nodes n3:(0) and n4:(6) join. (Diagram: the ring now has nodes at 0, 1, 3, and 6, each with its own successor table of columns i, id + 2^i, succ.)

29 Chord Example Nodes: n1:(1), n2:(3), n3:(0), n4:(6). Items: f1:(7), f2:(2). (Diagram: each item is stored on the successor of its ID, so f1 (ID 7) is stored on n3 at 0 and f2 (ID 2) on n2 at 3; each node keeps its successor table and its list of items.)

30 Query Upon receiving a query for item id, a node checks whether it stores the item locally; if not, it forwards the query to the largest node in its successor (finger) table that does not exceed id. (Diagram: query(7) issued at n1 is forwarded along the ring until it reaches n3 at 0, the successor of ID 7, which stores f1.)
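A sketch of this forwarding rule: a node that does not own the item forwards the query to the closest finger-table entry that precedes the item's ID on the ring (the slide's "largest entry that does not exceed id", generalized to handle wrap-around). The dictionaries and function names are illustrative, not the full Chord protocol.

```python
def between(x, a, b, id_space):
    """True if x lies in the circular interval (a, b] of size id_space."""
    x, a, b = x % id_space, a % id_space, b % id_space
    if a < b:
        return a < x <= b
    return x > a or x <= b

def chord_lookup(start, item_id, fingers, successor, id_space, max_hops=32):
    """fingers: node -> finger IDs ordered farthest-to-nearest;
    successor: node -> next node on the ring."""
    node, path = start, [start]
    for _ in range(max_hops):
        # The item lives on successor(node) if id falls in (node, successor(node)].
        if between(item_id, node, successor[node], id_space):
            path.append(successor[node])
            return successor[node], path
        # Otherwise forward to the closest preceding finger.
        nxt = next((f for f in fingers[node]
                    if between(f, node, item_id - 1, id_space)), successor[node])
        node = nxt
        path.append(node)
    return None, path

# The slide's example: ring 0..7, nodes 0, 1, 3, 6, query(7) issued at n1.
succ = {0: 1, 1: 3, 3: 6, 6: 0}
fingers = {0: [6, 3, 1], 1: [6, 3, 3], 3: [0, 6, 6], 6: [3, 0, 0]}
print(chord_lookup(1, 7, fingers, succ, id_space=8))   # -> (0, [1, 6, 0])
```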

31 Discussion The query can be implemented iteratively or recursively. Performance: routing in the overlay network can be more expensive than in the underlying network, because there is usually no correlation between node IDs and their locality; a query can repeatedly jump between Europe and North America even though both the initiator and the node that stores the item are in Europe! Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance.

32 Discussion Robustness: maintain multiple copies associated with each entry in the routing tables; replicate an item on nodes with close IDs in the identifier space. Security: can be built on top of CAN, Chord, Tapestry, and Pastry.

33 Discussion The key challenge of building wide-area P2P systems is a scalable and robust location service. Napster: a centralized solution; guarantees correctness and supports approximate matching… …but is neither scalable nor robust. Gnutella, KaZaa: support approximate queries, scalable, and robust… …but don't guarantee correctness (i.e., they may fail to locate an existing file). Distributed Hash Tables: guarantee correctness, highly scalable and robust… …but it is difficult to implement approximate matching.

