Peer-to-Peer Networks: Unstructured and Structured (presentation transcript, Winter 2008)

1 Peer-to-Peer Networks: Unstructured and Structured
What is a peer-to-peer network?
Unstructured Peer-to-Peer Networks
–Napster
–Gnutella
–KaZaA
–BitTorrent
Distributed Hash Tables (DHT) and Structured Networks
–Chord
–Pros and Cons
Readings: do the required readings, and the optional readings if interested

2 Peer-to-Peer Networks: How Did It Start?
A killer application: Napster
–Free music over the Internet
Key idea: share the content, storage, and bandwidth of individual (home) users

3 Model
Each user stores a subset of files
Each user can access (download) files from all users in the system

4 Main Challenge
Find where a particular file is stored
(figure: six peers A–F, one of them asking "where is E?")

5 Other Challenges
Scale: up to hundreds of thousands or millions of machines
Dynamicity: machines can come and go at any time

6 Peer-to-Peer Networks: Napster
Napster history: the rise
–January 1999: Napster version 1.0
–May 1999: company founded
–September 1999: first lawsuits
–2000: 80 million users
Napster history: the fall
–Mid 2001: out of business due to lawsuits
–Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained
–2003: growth of pay services like iTunes
Napster history: the resurrection
–2003: Napster reconstituted as a pay service
–2006: still lots of file sharing going on
(photo: Shawn Fanning, Northeastern freshman)

7 Napster Technology: Directory Service
Users install the software
–Download the client program
–Register name, password, local directory, etc.
Client contacts Napster (via TCP)
–Provides a list of music files it will share
–… and Napster's central server updates the directory
Client searches on a title or performer
–Napster identifies online clients with the file
–… and provides their IP addresses
Client requests the file from the chosen supplier
–Supplier transmits the file to the client
–Both client and supplier report status to Napster

8 Napster
Assume a centralized index system that maps files (songs) to machines that are alive
How to find a file (song), as sketched below:
–Query the index system → returns a machine that stores the required file (ideally the closest/least-loaded machine)
–FTP the file from that machine
Advantages
–Simplicity; easy to implement sophisticated search engines on top of the index system
Disadvantages
–Robustness, scalability (?)
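
A minimal sketch of the centralized-index idea described above. The class, method names, and machine labels (m5, etc.) are made up for illustration; this is not Napster's actual protocol.

```python
# Hypothetical central index: title -> set of online machines sharing it.
class CentralIndex:
    def __init__(self):
        self.index = {}          # title -> set of machine addresses
        self.online = set()      # machines currently connected

    def register(self, machine, titles):
        """Called when a client connects and reports its shared files."""
        self.online.add(machine)
        for title in titles:
            self.index.setdefault(title, set()).add(machine)

    def unregister(self, machine):
        """Called when a client disconnects; its files become unavailable."""
        self.online.discard(machine)
        for holders in self.index.values():
            holders.discard(machine)

    def query(self, title):
        """Return the machines that are online and share the title."""
        return [m for m in self.index.get(title, ()) if m in self.online]

# Example: machine m5 shares song "E"; a query for "E" returns ["m5"],
# and the requester then fetches the file directly from m5 (peer to peer).
idx = CentralIndex()
idx.register("m5", ["E", "F"])
print(idx.query("E"))   # ['m5']
```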

9 Napster: Example
(figure: a central index maps files A–F to machines m1–m6; a query for E returns m5, and E is then fetched directly from m5)

10 Napster Technology: Properties
Server's directory continually updated
–Always know what music is currently available
–Point of vulnerability for legal action
Peer-to-peer file transfer
–No load on the server
–Plausible deniability for legal action (but not enough)
Proprietary protocol
–Login, search, upload, download, and status operations
–No security: cleartext passwords and other vulnerabilities
Bandwidth issues
–Suppliers ranked by apparent bandwidth & response time

11 Napster: Limitations of a Central Directory
File transfer is decentralized, but locating content is highly centralized
–Single point of failure
–Performance bottleneck
–Copyright infringement
So later P2P systems were more distributed

12 Peer-to-Peer Networks: Gnutella
Gnutella history
–2000: J. Frankel & T. Pepper released Gnutella
–Soon after: many other clients (e.g., Morpheus, LimeWire, BearShare)
–2001: protocol enhancements, e.g., "ultrapeers"
Query flooding
–Join: contact a few nodes to become neighbors
–Publish: no need!
–Search: ask neighbors, who ask their neighbors
–Fetch: get the file directly from another node

13 Gnutella
Distributed file location
Idea: flood the request (a small sketch follows below)
How to find a file:
–Send the request to all neighbors
–Neighbors recursively multicast the request
–Eventually a machine that has the file receives the request and sends back the answer
Advantages
–Totally decentralized, highly robust
Disadvantages
–Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request carries a TTL)
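
The sketch referenced above: TTL-limited flooding over an in-memory overlay, with the answer travelling back along the reverse path. Node names, the qid parameter, and the in-process method calls are hypothetical stand-ins for real Gnutella messages over TCP.

```python
# Toy flooding: each node remembers which neighbor a query id came from,
# forwards the query with a decremented TTL, and routes hits backwards.
class Node:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.came_from = {}          # qid -> neighbor the query arrived from

    def query(self, qid, filename, ttl, sender):
        if qid in self.came_from or ttl == 0:
            return                             # drop duplicates / expired TTL
        self.came_from[qid] = sender
        if filename in self.files:
            self.query_hit(qid, filename, self.name)
        for n in self.neighbors:
            if n is not sender:
                n.query(qid, filename, ttl - 1, self)     # flood to neighbors

    def query_hit(self, qid, filename, holder):
        prev = self.came_from.get(qid)
        if prev is None or prev is self:
            print(f"{self.name}: {filename} is at {holder}; fetch it directly")
        else:
            prev.query_hit(qid, filename, holder)  # hit follows the reverse path

a, b, c = Node("A"), Node("B"), Node("C", files=["xyz"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.query(qid=1, filename="xyz", ttl=3, sender=a)    # A floods; hit travels C -> B -> A
```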

14 Gnutella
Ad-hoc topology
Queries are flooded for a bounded number of hops
No guarantees on recall
(figure: query "xyz" flooded across the overlay until a node holding xyz is reached)

15 Gnutella: Query Flooding
Fully distributed
–No central server
Public domain protocol
Many Gnutella clients implement the protocol
Overlay network: a graph
–Edge between peers X and Y if there is a TCP connection between them
–All active peers and edges form the overlay network
–A given peer will typically be connected to fewer than 10 overlay neighbors

16 Gnutella: Protocol
Query messages are sent over existing TCP connections
Peers forward Query messages
QueryHit messages are sent back over the reverse path
File transfer: HTTP
Scalability: limited-scope flooding
(figure: a Query propagating peer to peer, with the QueryHit returning along the reverse path)

17 Gnutella: Peer Joining
A joining peer X must find some other peer already in the Gnutella network: it uses a list of candidate peers
X sequentially attempts to open TCP connections to peers on the list until a connection is set up with some peer Y
X sends a Ping message to Y; Y forwards the Ping message, and all peers receiving it respond with a Pong message
X receives many Pong messages and can then set up additional TCP connections (sketched below)
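
A rough in-memory sketch of the join sequence above. Real Gnutella peers exchange Ping/Pong messages over TCP; here connect() and ping() are hypothetical stand-ins that only illustrate the flow.

```python
# Toy peer: connect() models a TCP connection attempt, ping() gathers Pongs.
class Peer:
    def __init__(self, name):
        self.name = name
        self.neighbors = []

    def connect(self, other):
        # Stand-in for a TCP connection attempt; in this toy model it always succeeds.
        self.neighbors.append(other)
        other.neighbors.append(self)
        return True

    def ping(self, ttl=2, seen=None):
        """Collect the names of peers reachable within `ttl` hops (their Pongs)."""
        seen = {self.name} if seen is None else seen
        pongs = []
        for n in self.neighbors:
            if n.name in seen or ttl == 0:
                continue
            seen.add(n.name)
            pongs.append(n.name)
            pongs += n.ping(ttl - 1, seen)
        return pongs

y, z = Peer("Y"), Peer("Z")
y.connect(z)                      # Y and Z are already in the overlay
x = Peer("X")
for candidate in (y, z):          # X works through its candidate list
    if x.connect(candidate):
        break
print(x.ping())                   # ['Y', 'Z']: Pongs tell X about more peers to connect to
```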

18 Gnutella: Pros and Cons
Advantages
–Fully decentralized
–Search cost distributed
–Processing per node permits powerful search semantics
Disadvantages
–Search scope may be quite large
–Search time may be quite long
–High overhead, and nodes come and go often

19 Peer-to-Peer Networks: KaZaA
KaZaA history
–2001: created by a Dutch company (Kazaa BV)
–Single network called FastTrack, used by other clients as well
–Eventually the protocol changed so other clients could no longer talk to it
Smart query flooding (sketched below)
–Join: on start, the client contacts a super-node (and may later become one)
–Publish: client sends the list of its files to its super-node
–Search: send the query to the super-node; super-nodes flood queries among themselves
–Fetch: get the file directly from peer(s); can fetch from multiple peers at once
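
An illustrative two-tier sketch of the super-node scheme described above. The class and field names are hypothetical; this is not the FastTrack protocol, only the publish/search division of labor it describes.

```python
# Super-nodes index their children's files and flood searches among themselves.
class SuperNode:
    def __init__(self):
        self.index = {}              # filename -> set of child peers holding it
        self.peers = []              # other super-nodes

    def publish(self, child, filenames):
        """An ordinary client reports its file list to its super-node."""
        for f in filenames:
            self.index.setdefault(f, set()).add(child)

    def search(self, filename, ttl=2):
        """Answer from the local index, then flood only among super-nodes."""
        holders = set(self.index.get(filename, ()))
        if ttl > 0:                  # ttl keeps the toy flood from recursing forever
            for sn in self.peers:
                holders |= sn.search(filename, ttl - 1)
        return holders

sn1, sn2 = SuperNode(), SuperNode()
sn1.peers, sn2.peers = [sn2], [sn1]
sn2.publish("client-42", ["song.mp3"])
print(sn1.search("song.mp3"))        # {'client-42'}; the file is then fetched from that peer
```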

20 KaZaA: Exploiting Heterogeneity
Each peer is either a group leader or assigned to a group leader
–TCP connection between a peer and its group leader
–TCP connections between some pairs of group leaders
A group leader tracks the content in all its children

21 KaZaA: Motivation for Super-Nodes
Query consolidation
–Many connected nodes may have only a few files
–Propagating a query to such a sub-node may take more time than for the super-node to answer it itself
Stability
–Super-node selection favors nodes with high uptime
–How long you have been on is a good predictor of how long you will be around in the future

22 Peer-to-Peer Networks: BitTorrent
BitTorrent history and motivation
–2002: B. Cohen debuted BitTorrent
–Key motivation: popular content; popularity exhibits temporal locality (flash crowds), e.g., the Slashdot effect, the CNN web site on 9/11, the release of a new movie or game
–Focused on efficient fetching, not searching: distribute the same file to many peers; single publisher, many downloaders
–Preventing free-loading

23 BitTorrent: Simultaneous Downloading
Divide a large file into many pieces
–Replicate different pieces on different peers
–A peer with a complete piece can trade it with other peers
–A peer can (hopefully) assemble the entire file
Allows simultaneous downloading
–Retrieving different parts of the file from different peers at the same time

24 BitTorrent Components
Seed
–Peer with the entire file
–The file is fragmented into pieces
Leecher
–Peer with an incomplete copy of the file
Torrent file
–Passive component
–Stores summaries (hashes) of the pieces to allow peers to verify their integrity (see the sketch below)
Tracker
–Allows peers to find each other
–Returns a list of random peers
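
A sketch of how the piece-integrity check can work: the metadata (like a .torrent file) carries one hash per piece, and a leecher verifies each downloaded piece against it before accepting it. The piece size and data here are toy values chosen for the example.

```python
import hashlib

PIECE_SIZE = 4                                  # toy size; real torrents use e.g. 256 KiB

def make_piece_hashes(data: bytes):
    """What the publisher stores in the metadata: one SHA-1 digest per piece."""
    pieces = [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]
    return [hashlib.sha1(p).digest() for p in pieces]

def verify_piece(index: int, piece: bytes, piece_hashes) -> bool:
    """A leecher accepts a piece only if its hash matches the metadata."""
    return hashlib.sha1(piece).digest() == piece_hashes[index]

data = b"hello world!"
hashes = make_piece_hashes(data)
print(verify_piece(0, b"hell", hashes))         # True: matches the published hash
print(verify_piece(0, b"evil", hashes))         # False: corrupted or forged piece
```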

25–31 BitTorrent: Overall Architecture
(figure sequence: a web server hosts a page linking to the .torrent file; the downloader "US" fetches the .torrent, sends a get-announce to the tracker, receives a response with a peer list, shakes hands with the listed peers (leechers and a seed), and then exchanges pieces with them)

32 Free-Riding Problem in P2P Networks
The vast majority of users are free-riders
–Most share no files and answer no queries
–Others limit the number of connections or the upload speed
A few "peers" essentially act as servers
–A few individuals contribute to the public good
–This makes them hubs that basically act as servers
BitTorrent prevents free-riding
–Allow the fastest peers to download from you
–Occasionally let some free-loaders download (sketched below)
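
A hedged sketch of the incentive idea above: upload to the peers that have recently uploaded fastest to you ("unchoke" them), plus one randomly chosen peer so newcomers and free-loaders occasionally get data too. The slot count and field names are illustrative, not BitTorrent's exact choking algorithm.

```python
import random

def choose_unchoked(download_rates, regular_slots=3):
    """download_rates: peer -> observed bytes/sec received from that peer."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:regular_slots]             # reciprocate with the fastest peers
    rest = ranked[regular_slots:]
    if rest:
        unchoked.append(random.choice(rest))      # "optimistic" slot for everyone else
    return unchoked

rates = {"peer-a": 900, "peer-b": 120, "peer-c": 40, "peer-d": 0, "peer-e": 0}
print(choose_unchoked(rates))                     # 3 fastest peers + 1 random other peer
```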

33 Distributed Hash Tables (DHTs)
Abstraction: a distributed hash-table data structure
–insert(id, item);
–item = query(id); (or lookup(id);)
–Note: the item can be anything: a data object, document, file, pointer to a file, …
Proposals
–CAN, Chord, Kademlia, Pastry, Tapestry, etc.

34 DHT Design Goals
Guarantee that any item (file) stored in the system can always be found
Scale to hundreds of thousands of nodes
Handle rapid arrival and failure of nodes

35 Distributed Hash Tables (DHTs)
Hash table interface: put(key, item), get(key)
O(log n) hops
Guarantees on recall
Structured networks
(figure: put(K1, I1) stores item I1 under key K1 at some node; a later get(K1) retrieves I1; a toy stand-in follows)
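
The put/get contract above, written out as a minimal single-process stand-in: keys are hashed into the identifier space and each id is owned by some node. The identifier-space size, node ids, and helper names are made up; a real DHT would route these calls across the network.

```python
import hashlib

M = 2 ** 16                                       # size of the toy identifier space

def node_for(key: str, nodes) -> int:
    """Map a key to the node whose id is the first at or after hash(key)."""
    kid = int(hashlib.sha1(key.encode()).hexdigest(), 16) % M
    ids = sorted(nodes)                           # nodes: dict node_id -> local storage dict
    return next((n for n in ids if n >= kid), ids[0])   # wrap around the ring

def put(nodes, key, item):
    nodes[node_for(key, nodes)][key] = item

def get(nodes, key):
    return nodes[node_for(key, nodes)].get(key)

nodes = {1000: {}, 20000: {}, 45000: {}}          # three hypothetical node ids
put(nodes, "song.mp3", "stored-at-peer-X")
print(get(nodes, "song.mp3"))                     # 'stored-at-peer-X'
```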

36 Chord
Associate with each node and each item a unique id in a one-dimensional space 0..2^m - 1
Key design decision
–Decouple correctness from efficiency
Properties
–Routing table size is O(log N), where N is the total number of nodes
–Guarantees that a file is found in O(log N) steps

37 Identifier to Node Mapping Example
(figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58)
Node 8 maps ids [5, 8]
Node 15 maps ids [9, 15]
Node 20 maps ids [16, 20]
…
Node 4 maps ids [59, 4] (wrapping around)
Each node maintains a pointer to its successor (checked in the sketch below)
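
A small check of the mapping on the example ring above, assuming a 6-bit id space (0..63), which is consistent with the wrap from 59 back to node 4; the helper name is illustrative.

```python
# Each id is owned by its successor: the first node clockwise at or after it.
RING = [4, 8, 15, 20, 32, 35, 44, 58]

def successor(ident, ring=RING):
    for n in sorted(ring):
        if n >= ident:
            return n
    return min(ring)            # wrap: ids 59..63 and 0..4 map to node 4

assert successor(7) == 8        # node 8 maps [5, 8]
assert successor(12) == 15      # node 15 maps [9, 15]
assert successor(61) == 4       # node 4 maps [59, 4] across the wrap
print([(i, successor(i)) for i in (7, 12, 37, 61)])
```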

38 Lookup
Each node maintains its successor
Route a packet (ID, data) to the node responsible for ID using the successor pointers (sketched below)
(figure: lookup(37) walks the ring and resolves to node 44)
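
A sketch of this O(N) baseline lookup: with only successor pointers, the request is handed around the ring until it reaches the node responsible for the id (the finger tables introduced later cut this to O(log N) hops). The class, the 6-bit modulus, and the interval helper are assumptions for the example.

```python
def in_range(x, a, b, modulus=64):
    """True if x lies in the half-open ring interval (a, b], mod 2^m."""
    return (x - a) % modulus <= (b - a) % modulus and x != a

class RingNode:
    def __init__(self, ident):
        self.id = ident
        self.successor = None

    def lookup(self, key):
        if in_range(key, self.id, self.successor.id):
            return self.successor           # my successor is responsible for key
        return self.successor.lookup(key)   # otherwise hand the request along the ring

ids = [4, 8, 15, 20, 32, 35, 44, 58]
nodes = {i: RingNode(i) for i in ids}
for a, b in zip(ids, ids[1:] + ids[:1]):    # wire up the successor pointers
    nodes[a].successor = nodes[b]
print(nodes[4].lookup(37).id)               # 44, as in the figure
```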

39 Joining Operation
Each node A periodically sends a stabilize() message to its successor B
Upon receiving a stabilize() message, node B
–returns its predecessor B' = pred(B) to A by sending a notify(B') message
Upon receiving notify(B') from B,
–if B' is between A and B, A updates its successor to B'
–otherwise, A does nothing

40 Joining Operation
Node with id = 50 joins the ring
Node 50 needs to know at least one node already in the system
–Assume the known node is 15
(figure: node 50 starts with succ=nil, pred=nil; node 58 has succ=4, pred=44; node 44 has succ=58, pred=35)

41 Joining Operation
Node 50: sends join(50) to node 15
The join request reaches node 44, which returns its successor, node 58
Node 50 updates its successor to 58

42 Joining Operation
Node 50: sends stabilize() to node 58
Node 58:
–updates its predecessor to 50
–sends notify() back
(figure: node 58 now has pred=50; node 50 still has succ=58, pred=nil)

43 Joining Operation (cont'd)
Node 44 sends a stabilize message to its successor, node 58
Node 58 replies with a notify message carrying its predecessor, node 50
Node 44 updates its successor to 50

44 Joining Operation (cont'd)
Node 44 sends a stabilize message to its new successor, node 50
Node 50 sets its predecessor to node 44

45 Joining Operation (cont'd)
This completes the joining operation! (replayed in the sketch below)
(figure: final state: node 44 has succ=50; node 50 has succ=58 and pred=44; node 58 has pred=50)
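
The whole join can be replayed with a small sketch that transcribes the stabilize/notify rules from slide 39. The class and helper names are made up, and notify() is modeled as a return value rather than a separate message, so this is only an illustration of the rules under those assumptions.

```python
def between(x, a, b, modulus=64):
    """True if x is strictly inside the ring interval (a, b), mod 2^m."""
    return 0 < (x - a) % modulus < (b - a) % modulus

class ChordNode:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # A sends stabilize() to its successor B and processes B's notify reply.
        b_prime = self.successor.handle_stabilize(self)
        if between(b_prime.id, self.id, self.successor.id):
            self.successor = b_prime          # a closer node has joined in between

    def handle_stabilize(self, sender):
        # B adopts the sender as predecessor if it is closer, then replies
        # with notify(pred(B)), modeled here as the return value.
        if self.predecessor is None or between(sender.id, self.predecessor.id, self.id):
            self.predecessor = sender
        return self.predecessor

# Replaying slides 40-45: node 50 joins between 44 and 58.
n44, n50, n58 = ChordNode(44), ChordNode(50), ChordNode(58)
n44.successor, n44.predecessor = n58, ChordNode(35)
n58.successor, n58.predecessor = ChordNode(4), n44
n50.successor = n58                    # learned via join(50), answered by node 44
n50.stabilize()                        # node 58 sets pred=50; node 50 keeps succ=58
n44.stabilize()                        # 58 replies pred=50; node 44 sets succ=50
n44.stabilize()                        # now node 50 sets pred=44
print(n44.successor.id, n50.predecessor.id, n58.predecessor.id)   # 50 44 50
```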

46 Achieving Efficiency: Finger Tables
Say m = 7, so ids live in 0..127; the figure shows a ring with nodes 20, 32, 45, 80, 96, 112
The i-th finger table entry at the node with id n is the first peer with id >= (n + 2^i) mod 2^m
Example: (80 + 2^6) mod 2^7 = 16, and the first node at or after 16 is 20
Finger table at node 80 (recomputed in the sketch below):
i: 0 1 2 3 4 5 6
ft[i]: 96 96 96 96 96 112 20
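
The finger table on the slide can be recomputed directly from the rule above; the node list is taken from the figure and the helper names are illustrative.

```python
M = 7
NODES = sorted([20, 32, 45, 80, 96, 112])     # live node ids from the figure

def successor(ident):
    """First node at or after ident, wrapping around the 2^M ring."""
    return next((n for n in NODES if n >= ident), NODES[0])

def finger_table(n):
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

print(finger_table(80))   # [96, 96, 96, 96, 96, 112, 20], as on the slide
```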

47 Achieving Robustness
To improve robustness, each node maintains its k (k > 1) immediate successors instead of only one successor
In the notify() message, node A can send its k - 1 successors to its predecessor B
Upon receiving the notify() message, B can update its successor list by concatenating the successor list received from A with A itself (see the sketch below)
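
A tiny sketch of that successor-list update rule: B's new list is its successor A followed by A's own successors, truncated to k entries. The function name, k, and the node ids are illustrative.

```python
K = 3

def updated_successor_list(successor_id, successors_of_successor):
    """B's new list: its immediate successor, then that node's list, kept to K entries."""
    return ([successor_id] + list(successors_of_successor))[:K]

# Node 44's successor is 50, and 50 reports its own successors [58, 4]:
print(updated_successor_list(50, [58, 4]))     # [50, 58, 4]
```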

48 Chord Optimizations
Reduce latency
–Choose the finger that reduces the expected time to reach the destination
–Choose the closest node from the range [N + 2^(i-1), N + 2^i) as the successor for that finger
Accommodate heterogeneous systems
–Multiple virtual nodes per physical node

49 DHT Conclusions
Distributed hash tables are a key component of scalable and robust overlay networks
Chord: O(log n) state, O(log n) lookup distance; stretch < 2 is achievable
Simplicity is key
Services built on top of distributed hash tables
–persistent storage (OpenDHT, OceanStore)
–p2p file storage, i3 (Chord)
–multicast (CAN, Tapestry)

