1 School of Computing Science Simon Fraser University CMPT 880: Peer-to-Peer Systems Mohamed Hefeeda 17 January 2005.


2. Announcements
- Initial round: four slots (Jan 24, Jan 26, Jan 31, and Feb 2)
- Ashok will present the paper on Jan 26 (Wednesday)
- Need a volunteer to present the Pastry paper next Monday (an easy one)
- Did you read the survey paper? Please do.

3. Last Lectures
- P2P is an active research area with many potential applications in industry and academia
- In the P2P computing paradigm, peers cooperate to achieve desired functions
- Simple model for P2P systems:
  - Peers form an abstract layer called the overlay
  - The peer software architecture model may have three components

4. P2P Systems: Simple Model
[Figure: software architecture model on a peer, a layer stack of Hardware, Operating System, P2P Substrate, Middleware, and P2P Application (bottom to top); system architecture: peers form an overlay according to the P2P Substrate]

5. Peer Software Architecture Model
- A software client is installed on each peer
- Three components:
  - P2P Substrate
  - Middleware
  - P2P Application
[Figure: software architecture model on a peer, a layer stack of Hardware, Operating System, P2P Substrate, Middleware, and P2P Application (bottom to top)]

6. P2P Substrate
- A key component, which:
  - Manages the overlay
  - Allocates and discovers objects
- Based on the flexibility of placing objects at peers, P2P substrates can be:
  - Structured
  - Unstructured

7. Structured P2P Substrates
- Objects are rigidly assigned to peers
  - Objects and peers have IDs (usually obtained by hashing some attributes)
  - Objects are assigned to peers based on their IDs
- Peers in the overlay form a specific geometrical shape, e.g., a tree, ring, hypercube, or butterfly network
- The shape (to some extent) determines:
  - How neighbors are chosen
  - How messages are routed

8. Structured P2P Substrates (cont'd)
- The substrate provides a Distributed Hash Table (DHT)-like interface
  - insertObject(key, value), findObject(key), …
  - In the literature, many authors refer to structured P2P substrates as DHTs
- It also provides peer management operations (join, leave, fail)
- Most of these operations take O(log n) steps, where n is the number of peers
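A minimal single-process sketch of the DHT-like interface, assuming SHA-1 hashing into a 16-bit ID space and a Chord-style "successor" rule (a key goes to the first peer ID at or after the key's ID, wrapping around the ring). The names `ToyDHT`, `make_id`, `insert_object`, and `find_object` are hypothetical, and the placement rule is only one option: CAN, Pastry, and the other substrates each assign keys differently. There is no network here, so every lookup is a single local step rather than O(log n) hops.

```python
import bisect
import hashlib


def make_id(text, bits=16):
    """Hash an attribute (peer address or object name) into a bits-wide ID space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical in-memory stand-in for a structured substrate."""

    def __init__(self, peer_names, bits=16):
        self.bits = bits
        # Sorted peer IDs form a ring; each peer keeps its own key/value store.
        self.ring = sorted({make_id(p, bits) for p in peer_names})
        self.stores = {pid: {} for pid in self.ring}

    def responsible_peer(self, key):
        """Assumed rule: the first peer ID >= the key ID, wrapping around."""
        kid = make_id(key, self.bits)
        idx = bisect.bisect_left(self.ring, kid) % len(self.ring)
        return self.ring[idx]

    def insert_object(self, key, value):
        self.stores[self.responsible_peer(key)][key] = value

    def find_object(self, key):
        return self.stores[self.responsible_peer(key)].get(key)


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.insert_object("song.mp3", "10.0.0.7")
print(dht.find_object("song.mp3"))  # 10.0.0.7
print(dht.find_object("missing"))   # None
```

Because the object and peer IDs share one ID space, any node can compute which peer is responsible for a key without a search, which is where the "guarantee of finding" comes from.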

9. Structured P2P Substrates (cont'd)
- DHTs: efficient search and a guarantee of finding an object (if it exists)
- However:
  - No support for partial-name and keyword queries
  - Maintenance overhead: even O(log n) may be too much in very dynamic environments
- Examples: Chord, CAN, Pastry, Tapestry, Kademlia (Overnet)

10. Example: Content Addressable Network (CAN) [Ratnasamy 01]
- Nodes form an overlay in a d-dimensional space
  - Node IDs are chosen randomly from the d-space
  - Object IDs (keys) are chosen from the same d-space
- The space is dynamically partitioned into zones
- Each node owns a zone
- Zones are split and merged as nodes join and leave
- Each node stores:
  - The portion of the hash table that belongs to its zone
  - Information about its immediate neighbors in the d-space
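Keys live in the same d-space as the nodes. A sketch of how a key could be mapped to a point and matched to the zone that owns it, assuming SHA-256 as the hash and a coordinate range of [0, 8) per axis (a hypothetical choice); `key_to_point`, `Zone`, and `SIDE` are illustrative names, not part of CAN's specification.

```python
import hashlib
from dataclasses import dataclass

SIDE = 8  # hypothetical: each coordinate ranges over [0, 8)


def key_to_point(key, d=2, side=SIDE):
    """Map a key to a point in d-space using 4-byte slices of one SHA-256 digest."""
    if d > 8:
        raise ValueError("one SHA-256 digest supplies at most 8 coordinates here")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    coords = []
    for i in range(d):
        frac = int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2 ** 32  # in [0, 1)
        coords.append(frac * side)
    return tuple(coords)


@dataclass
class Zone:
    lo: tuple  # inclusive lower corner
    hi: tuple  # exclusive upper corner

    def contains(self, point):
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))


# Two nodes splitting a 2-d space into left and right halves.
left, right = Zone((0, 0), (4, SIDE)), Zone((4, 0), (SIDE, SIDE))
p = key_to_point("song.mp3")
print(p, "left" if left.contains(p) else "right")
```

The (key, value) pair is stored at whichever node's zone contains the point, so the same hash gives every node the same answer.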

11. 2-d CAN: Dynamic Space Division
[Figure: a 2-d coordinate space with both axes running from 0 to 7, partitioned into zones owned by nodes n1 to n5]

12. 2-d CAN: Key Assignment
[Figure: the same space with keys K1 to K4 placed at points inside the zones of nodes n1 to n5]

13. 2-d CAN: Routing (Lookup)
[Figure: a lookup for key K4 routed across the zones of nodes n1 to n5]

14. CAN: Routing
- Each node keeps 2d = O(d) state information (neighbor coordinates and IPs)
  - Constant: does not depend on the number of nodes n
- Greedy routing
  - Forward to the neighbor that is closest to the destination point
  - The average path length is (d/4) · n^(1/d) hops, i.e., O(d · n^(1/d)), which is O(log n) when d = (log n)/2
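A sketch that checks the path-length formula on an idealized CAN, assuming the space is a d-torus (coordinates wrap around) split into n = k^d equal zones; real CAN zones have unequal sizes, and the function names are hypothetical. Each hop goes to an adjacent zone that is one step closer to the destination, which is what greedy forwarding does on this grid.

```python
import itertools


def torus_step(a, b, k):
    """Direction (-1, 0, +1) that moves coordinate a toward b on a ring of k cells."""
    if a == b:
        return 0
    forward = (b - a) % k
    return 1 if forward <= k - forward else -1


def greedy_route(src, dst, k):
    """Return the list of zones visited from src to dst.

    Zones are the cells of a d-dimensional k x ... x k torus (n = k**d equal
    zones). Each hop goes to an adjacent zone one step closer to dst.
    """
    path = [tuple(src)]
    cur = list(src)
    while tuple(cur) != tuple(dst):
        for i in range(len(cur)):
            step = torus_step(cur[i], dst[i], k)
            if step:
                cur[i] = (cur[i] + step) % k
                break
        path.append(tuple(cur))
    return path


# Average over all source/destination pairs for d = 2, k = 8 (n = 64 zones).
d, k = 2, 8
cells = list(itertools.product(range(k), repeat=d))
total_hops = sum(len(greedy_route(s, t, k)) - 1 for s in cells for t in cells)
average_hops = total_hops / len(cells) ** 2
print(average_hops)  # 4.0, which equals (d/4) * n**(1/d) = (2/4) * 8
```

With d = (log n)/2 the per-dimension side shrinks to n^(1/d) = 4 cells, so the average path is d hops, i.e., O(log n), at the cost of keeping 2d = O(log n) neighbors.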

15. CAN: Node Join
- The new node finds a node already in the CAN
  - Bootstrap: one (or a few) dedicated nodes outside the CAN maintain a partial list of active nodes
- It then finds a node whose zone will be split:
  - Choose a random point P
  - Forward a JOIN request to P through the existing node
- The node that owns P splits its zone in half and hands one half, with the (key, value) pairs that fall in it and its neighbor (routing) information, to the new node
- Neighbors of the split zone are notified
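A sketch of the zone split performed by the node that owns P. It assumes the zone is halved along its longest side (ties go to the lowest dimension); the CAN paper follows a fixed order of dimensions, so treat this rule, and the names `Zone` and `split_zone`, as illustrative. Neighbor notification is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    lo: list                                  # inclusive lower corner
    hi: list                                  # exclusive upper corner
    keys: dict = field(default_factory=dict)  # point (tuple) -> value

    def contains(self, point):
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))


def split_zone(zone):
    """Halve zone along its longest side and return the half for the new node.

    The (key, value) pairs whose points fall in the new half move with it;
    the old owner keeps the other half.
    """
    dim = max(range(len(zone.lo)), key=lambda i: zone.hi[i] - zone.lo[i])
    mid = (zone.lo[dim] + zone.hi[dim]) / 2
    new_lo = list(zone.lo)
    new_lo[dim] = mid
    new_zone = Zone(new_lo, list(zone.hi))
    zone.hi = list(zone.hi)
    zone.hi[dim] = mid
    for point in [p for p in zone.keys if new_zone.contains(p)]:
        new_zone.keys[point] = zone.keys.pop(point)
    return new_zone


old = Zone([0, 0], [8, 8], {(1.0, 2.0): "a", (6.0, 3.0): "b"})
new = split_zone(old)
print(old.hi, old.keys)  # [4.0, 8] {(1.0, 2.0): 'a'}
print(new.lo, new.keys)  # [4.0, 0] {(6.0, 3.0): 'b'}
```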

16. CAN: Node Leave, Fail
- Graceful departure
  - The leaving node hands over its zone to one of its neighbors
- Failure
  - Detected by the absence of heartbeat messages, which are sent periodically in regular operation
  - Neighbors start takeover timers proportional to the volume of their zones
  - The neighbor whose timer expires first (the one with the smallest zone) takes over the zone of the dead node and notifies the other neighbors so they cancel their timers (some negotiation between neighbors may occur)
  - Note: the (key, value) entries stored at the failed node are lost
  - Nodes that insert (key, value) pairs periodically refresh (re-insert) them
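The takeover rule can be sketched as picking the neighbor with the shortest timer. This assumes the timer is simply `base_delay * volume` and ignores the negotiation step; `takeover` and `base_delay` are hypothetical names.

```python
def takeover(neighbors, base_delay=1.0):
    """Decide which neighbor takes over a failed node's zone.

    neighbors maps a node name to the volume of the zone it currently owns.
    Each timer is proportional to that volume, so the neighbor with the
    smallest zone fires first; the others cancel their timers.
    """
    timers = {node: base_delay * volume for node, volume in neighbors.items()}
    winner = min(timers, key=timers.get)
    cancelled = sorted(node for node in timers if node != winner)
    return winner, cancelled


print(takeover({"n2": 16, "n3": 8, "n4": 4}))  # ('n4', ['n2', 'n3'])
```

Favoring the smallest zone keeps zone sizes, and therefore per-node load, roughly balanced after failures.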

17. CAN: Discussion
- Scalable
  - O(log n) steps for operations
  - State information is O(d) at each node
- Locality
  - Nodes that are neighbors in the overlay are not necessarily neighbors in the physical network
  - Suggestion (for better routing):
    - Each node measures the RTT between itself and each of its neighbors
    - Forward the request to the neighbor with the maximum ratio of progress to RTT
- Maintenance cost
  - Logarithmic
  - But it may still be too much for very dynamic P2P systems
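A sketch of the RTT-aware forwarding suggestion, assuming progress is measured as the reduction in Euclidean distance from the neighbor's zone center to the target; `pick_next_hop` and the neighbor table layout are hypothetical.

```python
import math


def pick_next_hop(current, target, neighbors):
    """Choose the neighbor with the largest progress-to-RTT ratio.

    neighbors maps a name to (zone_center, rtt_ms). Progress is how much
    closer the neighbor's zone center is to the target than the current
    point. Neighbors that make no progress are skipped; returns None if
    no neighbor makes progress.
    """
    here = math.dist(current, target)
    best, best_ratio = None, 0.0
    for name, (center, rtt_ms) in neighbors.items():
        progress = here - math.dist(center, target)
        if progress <= 0:
            continue
        ratio = progress / rtt_ms
        if ratio > best_ratio:
            best, best_ratio = name, ratio
    return best


table = {"a": ((3, 1), 80.0), "b": ((1, 3), 10.0), "c": ((0, 1), 5.0)}
print(pick_next_hop((1, 1), (7, 7), table))  # b
```

Here `a` and `b` make exactly the same progress, so plain greedy routing could pick either; dividing by RTT prefers the physically closer `b`, while `c` moves away from the target and is never chosen despite its low RTT.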

18. P2P Substrate (same as slide 6)

19. Unstructured P2P Substrates
- Objects can be anywhere: loosely controlled overlays
- The loose control:
  - Makes the overlay tolerate the transient behavior of nodes
    - For example, when a peer leaves, nothing needs to be done because there is no structure to restore
  - Enables the system to support flexible search queries
    - Queries are sent in plain text and every node runs a mini-database engine
- But we lose on searching:
  - Usually done by flooding, which is inefficient
  - Some heuristics exist to enhance performance
  - No guarantee of locating a requested object (e.g., rarely requested objects)
- Examples: Gnutella, Kazaa (super nodes), GIA [Chawathe et al. 03]

20. Example: Gnutella
- Peers are called servents
- All peers form an unstructured overlay
- Peer join
  - Find an active peer already in Gnutella (e.g., by contacting known Gnutella hosts)
  - Send a Ping message through the active peer
  - Peers willing to accept new neighbors reply with a Pong
- Peer leave, fail
  - Just drop out of the network!
- To search for a file
  - Send a Query message to all neighbors with a TTL (= 7)
  - Upon receiving a Query message, a peer:
    - Checks its local database and replies with a QueryHit to the requester
    - Decrements the TTL and forwards the Query to all its neighbors if the TTL is nonzero
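A simulation of the search procedure on an in-memory graph. Besides the steps listed above, it assumes that a peer drops a Query it has already processed and does not send it back to the peer it came from, which is how duplicates are kept from circulating forever; `flood_query` and `has_file` are hypothetical names. Counting messages makes the scalability problem visible.

```python
from collections import deque


def flood_query(graph, source, has_file, ttl=7):
    """Simulate flooding one Query from source.

    graph maps each servent to its list of neighbors; has_file(peer) stands in
    for the local database check. Returns (peers that send a QueryHit,
    number of Query messages sent over overlay links).
    """
    hits = []
    seen = {source}  # assumption: duplicates of the same query are dropped
    messages = 0
    queue = deque()
    for nbr in graph[source]:
        messages += 1
        queue.append((nbr, source, ttl))
    while queue:
        peer, sender, t = queue.popleft()
        if peer in seen:
            continue  # already processed this query: drop the duplicate
        seen.add(peer)
        if has_file(peer):
            hits.append(peer)  # reply with a QueryHit toward the requester
        t -= 1  # decrement the TTL
        if t > 0:  # forward to all neighbors (except the sender) if nonzero
            for nbr in graph[peer]:
                if nbr != sender:
                    messages += 1
                    queue.append((nbr, peer, t))
    return hits, messages


# A triangle A-B-C: each peer processes the query once, but 4 messages are sent.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(flood_query(triangle, "A", lambda p: p == "C"))  # (['C'], 4)
```

In a richly connected overlay the duplicate messages grow with the number of links, not the number of peers, which is why flooding scales poorly.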

21. Flooding in Gnutella: Scalability Problem
[Figure: a query flooding the Gnutella overlay]

22. Heuristics for Searching [Yang and Garcia-Molina 02]
- Iterative deepening
  - Multiple BFS rounds with increasing TTLs
  - Reduces traffic but increases response time
- Directed BFS
  - Send to "good" neighbors (the subset of your neighbors that returned many results in the past): need to keep history
- Local indices
  - Keep a small index over files stored on neighbors (within a certain number of hops)
  - May answer queries on their behalf
  - Saves the cost of sending queries over the network
  - Open issue: keeping the index current
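A simplified sketch of iterative deepening: flood with TTL 1, then 3, then 5, then 7, stopping as soon as enough results arrive. The TTL schedule, the `want` threshold, and the function names are assumptions, and each round re-floods from scratch, so the later rounds are costlier here than in a scheme that avoids re-processing at peers already reached. Cost is counted as the number of peers that process the query.

```python
from collections import deque


def flood(graph, source, has_file, ttl):
    """BFS limited to ttl hops; returns (hits, number of peers reached)."""
    depth = {source: 0}
    queue = deque([source])
    hits = []
    while queue:
        peer = queue.popleft()
        if peer != source and has_file(peer):
            hits.append(peer)
        if depth[peer] < ttl:
            for nbr in graph[peer]:
                if nbr not in depth:
                    depth[nbr] = depth[peer] + 1
                    queue.append(nbr)
    return hits, len(depth) - 1


def iterative_deepening(graph, source, has_file, ttls=(1, 3, 5, 7), want=1):
    """Flood with increasing TTLs until `want` results are found.

    Returns (hits, total peers contacted over all rounds, TTL of the last round).
    """
    total_cost = 0
    hits = []
    for ttl in ttls:
        hits, reached = flood(graph, source, has_file, ttl)
        total_cost += reached
        if len(hits) >= want:
            break
    return hits, total_cost, ttl


# A chain p0 - p1 - ... - p9; the file sits two hops from the requester.
line = {f"p{i}": [f"p{j}" for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(iterative_deepening(line, "p0", lambda p: p == "p2"))  # (['p2'], 4, 3)
print(flood(line, "p0", lambda p: p == "p2", 7)[1])          # 7
```

The popular-file case is where the saving shows: 4 peers contacted instead of 7 for a single TTL=7 flood. For a rare file, every round runs, and the requester waits for each one, which is the response-time penalty mentioned on the slide.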

23. Heuristics for Searching: Super Node
- Used in Kazaa (signaling protocols are encrypted)
- Studied in [Chawathe 03]
- Relatively powerful nodes play a special role
  - They maintain indexes over other peers

24. Unstructured Substrates with Super Nodes
[Figure: two-tier overlay; legend: Super Node (SN), Ordinary Node (ON)]

25. Example: FastTrack Networks (Kazaa)
- Most of the information and plots in the following slides are from "Understanding Kazaa" by Liang et al.
- By far the most popular (~3 million active users on a typical day), sharing 5,000 terabytes
- Kazaa traffic exceeds Web traffic
- Two-tier architecture (with Super Nodes and Ordinary Nodes)
- Each SN maintains an index of the files stored at the ONs attached to it
  - An ON reports the following metadata on each file to its SN: file name, file size, ContentHash, and file descriptors (artist name, album name, …)

26. FastTrack Networks (cont'd)
- Mainly two types of traffic
  - Signaling
    - Handshaking, connection establishment, uploading metadata, …
    - Encrypted! (there have been some reverse-engineering efforts)
    - Over TCP connections between SN-SN and SN-ON
    - Analyzed in [Liang et al. 04]
  - Content traffic
    - Files exchanged; not encrypted
    - All through HTTP between ON-ON
    - Detailed analysis in [Gummadi et al. 03]

27. Kazaa (cont'd)
- File search
  - An ON sends a query to its SN
  - The SN replies with a list of IPs of ONs that have the file
  - The SN may forward the query to other SNs
- Parallel downloads take place between the supplying ONs and the receiving ON
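A toy model of the two-tier search. It assumes the SN indexes only the words of each file name (real ONs also upload file size, ContentHash, and descriptors) and forwards a query one hop to its neighboring SNs only when it has no local match; that forwarding policy, the class name `SuperNode`, and the addresses are all hypothetical, since the actual FastTrack signaling is encrypted.

```python
from collections import defaultdict


class SuperNode:
    """Toy SN: indexes metadata uploaded by its ONs and answers their queries."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # keyword -> addresses of ONs holding a match
        self.neighbors = []            # other SNs this SN may forward queries to

    def register(self, on_addr, file_name):
        """An attached ON reports a file; index every word of its name."""
        for word in file_name.lower().split():
            self.index[word].add(on_addr)

    def query(self, keywords, forward=True):
        """Return addresses of ONs whose files match all keywords."""
        words = keywords.lower().split()
        if not words:
            return set()
        matches = set.intersection(*(self.index.get(w, set()) for w in words))
        if not matches and forward:
            # Assumption: ask neighboring SNs (one hop only) on a local miss.
            for sn in self.neighbors:
                matches |= sn.query(keywords, forward=False)
        return matches


sn1, sn2 = SuperNode("sn1"), SuperNode("sn2")
sn1.neighbors.append(sn2)
sn1.register("10.0.0.5:3001", "Artist Song Live")
sn2.register("10.0.1.9:4410", "Other Track")
print(sn1.query("song live"))    # {'10.0.0.5:3001'}
print(sn1.query("other track"))  # {'10.0.1.9:4410'} (found via sn2)
```

The ON then downloads directly from the returned addresses, possibly from several of them in parallel; the SNs never carry file content.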

28. FastTrack Networks (cont'd)
- Measurement study of Liang et al.
  - Hook three machines to Kazaa and wait until one of them is promoted to SN
  - Connect the other two (ONs) to that SN
  - Study several properties:
    - Topology structure and dynamics
    - Neighbor selection
    - Super node lifetime
    - …

29. Kazaa: Topology Structure [Liang et al. 04]
- ON to SN: each SN has 100 to 160 ON connections
  - Since there are ~3M nodes, there are ~30,000 SNs
- SN to SN: each SN has 30 to 50 SN connections
  - Each SN connects to ~0.1% of the total number of SNs
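The back-of-the-envelope numbers above, worked out. The estimate uses the lower end of the 100 to 160 range; the last line shows how the SN count would change at the upper end.

```python
TOTAL_NODES = 3_000_000  # ~3M active nodes
ONS_PER_SN = 100         # lower end of 100 to 160 ON connections per SN
SN_LINKS = (30, 50)      # SN-to-SN connections per SN

num_sns = TOTAL_NODES // ONS_PER_SN
share = [100 * links / num_sns for links in SN_LINKS]  # percent of all SNs

print(num_sns)                                  # 30000
print(f"{share[0]:.2f}% to {share[1]:.2f}%")    # 0.10% to 0.17%
print(TOTAL_NODES // 160)                       # 18750 SNs if every SN had 160 ONs
```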

30. Kazaa: Topology Dynamics [Liang et al. 04]
- Average ON-SN connection duration
  - ~1 hour, after removing very short-lived connections (30 sec) used for shopping for SNs
- Average SN-SN connection duration
  - 23 min, which is short because:
    - SNs shuffle connections among themselves to allow ONs to reach a larger set of objects
    - SNs search for other SNs with smaller loads
    - SNs connect to each other from time to time to exchange SN lists (each SN stores 200 other SNs in its cache)

31. Kazaa: Neighbor Selection [Liang et al. 04]
- When an ON first joins, it gets a list of 200 SNs
  - The ON considers locality and SN workload in selecting its future SN
- Locality
  - 40% of ON-SN connections have RTT < 5 msec
  - 60% of ON-SN connections have RTT < 50 msec
  - For comparison, the RTT from the East US to Europe is ~100 msec

32. Kazaa: Lifetime and Signaling Overhead [Liang et al. 04]
- Super node average lifetime is ~2.5 hours
- Overhead:
  - 161 Kb/s upstream
  - 191 Kb/s downstream
  - So most SNs have high-speed connections (campus network or cable)

33. Kazaa vs. Firewalls, NAT [Liang et al. 04]
- The default port WAS 1214
  - Easy for firewalls to filter out Kazaa traffic
- Now, Kazaa uses dynamic ports
  - Each peer chooses its own random port
  - An ON reports its port to its SN
  - Ports of SNs are part of the SN refresh list exchanged among peers
  - Too bad for firewalls!
- Network Address Translator (NAT)
  - A requesting peer cannot establish a direct connection with a serving peer behind a NAT
  - Solution: connection reversal
    - Send the request to the SN of the NATed peer, which already has a connection with it
    - The SN tells the NATed peer to establish a connection with the requesting peer!
    - The transfer occurs happily through the NAT

34. Kazaa: Lessons [Liang et al. 04]
- Distributed design
- Exploit heterogeneity
- Load balancing
- Locality in neighbor selection
- Connection shuffling
  - If a peer searches for a file and does not find it, it may try later and get it!
- Efficient gossiping algorithms
  - To learn about other SNs and perform shuffling
  - Kazaa uses a "freshness" field in the SN refresh list, so a peer can ignore stale data
- Consider peers behind NATs and firewalls
  - They are everywhere!
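The slides do not say how Kazaa encodes its "freshness" field (the signaling is encrypted), so the following is a hypothetical stand-in rather than Kazaa's actual format. Each cache entry maps an SN address to the time it was last known to be alive; stale entries are ignored, the fresher copy of a duplicate wins, and the cache keeps at most 200 entries, the cache size reported for Kazaa SNs. The `max_age` threshold and the function name are assumptions.

```python
def merge_refresh_lists(cache, received, now, max_age=600.0, capacity=200):
    """Merge a received SN refresh list into the local SN cache.

    Both dicts map an SN address (ip, port) to the time (seconds) it was last
    known to be alive. Entries older than max_age are treated as stale and
    ignored; for duplicates the fresher timestamp wins; only the `capacity`
    freshest entries are kept.
    """
    merged = dict(cache)
    for addr, seen_at in received.items():
        if now - seen_at > max_age:
            continue  # stale entry: ignore it
        if seen_at > merged.get(addr, float("-inf")):
            merged[addr] = seen_at
    fresh = {a: t for a, t in merged.items() if now - t <= max_age}
    newest = sorted(fresh.items(), key=lambda item: item[1], reverse=True)
    return dict(newest[:capacity])


cache = {("1.1.1.1", 1214): 900.0}
received = {("1.1.1.1", 1214): 950.0, ("2.2.2.2", 3333): 100.0, ("3.3.3.3", 4444): 990.0}
print(merge_refresh_lists(cache, received, now=1000.0))
```

Carrying a timestamp (or age) with every gossiped entry is what lets a peer discard addresses of SNs that have likely already left, instead of wasting connection attempts on them.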

35. Project Discussion
- Refer to the handout

36. Papers Flash Overview
- Refer to the Course Reading List web page

