
1 OVERVIEW Lecture 8 Distributed Hash Tables

2 Hash Table
- Name-value pairs (or key-value pairs)
  - E.g., "Mehmet Hadi Gunes" and mgunes@cse.unr.edu
  - E.g., "http://cse.unr.edu/" and the Web page
  - E.g., "HitSong.mp3" and "12.78.183.2"
- Hash table
  - Data structure that associates keys with values
- Diagram: key → lookup(key) → value

3 Distributed Hash Table
- Hash table spread over many nodes
  - Distributed over a wide area
- Main design goals
  - Decentralization: no central coordinator
  - Scalability: efficient with a large number of nodes
  - Fault tolerance: tolerate nodes joining/leaving
- Two key design decisions
  - How do we map names onto nodes?
  - How do we route a request to that node?

4 Hash Functions
- Hashing
  - Transform the key into a number
  - And use the number to index an array
- Example hash function (see the sketch below)
  - Hash(x) = x mod 101, mapping to 0, 1, ..., 100
- Challenges
  - What if there are more than 101 nodes? Fewer?
  - Which nodes correspond to each hash value?
  - What if nodes come and go over time?
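A minimal sketch of the modular hash function above; the bucket count (101) matches the slide, while the string-to-integer step and the helper names are assumptions for illustration.

```python
# Sketch of the slide's example hash function, Hash(x) = x mod 101.
# The string-to-integer conversion below is an assumed illustration detail.

NUM_BUCKETS = 101  # maps every key to one of 0, 1, ..., 100

def hash_key(key: str) -> int:
    """Turn the key into a number, then index into the array with mod."""
    as_number = sum(ord(ch) for ch in key)   # crude string-to-int step (assumed)
    return as_number % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def put(key: str, value: str) -> None:
    buckets[hash_key(key)].append((key, value))

def lookup(key: str):
    for k, v in buckets[hash_key(key)]:
        if k == key:
            return v
    return None

put("HitSong.mp3", "12.78.183.2")
print(lookup("HitSong.mp3"))   # -> 12.78.183.2
```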

5 Consistent Hashing
- Large, sparse identifier space (e.g., 128 bits)
  - Hash the set of keys uniformly into the large id space
  - Hash nodes into the same id space as well
- Diagram: Hash(name) → object_id, Hash(IP_address) → node_id; the id space 0 ... 2^128 - 1 is represented as a ring
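A minimal sketch of hashing both names and node addresses onto the same ring; truncating SHA-256 to 128 bits and treating the first node clockwise as the "responsible" node are assumptions for illustration.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 128                       # identifier space of 2^128 points

def ring_id(name: str) -> int:
    """Hash a name or IP address to a point on the 0 .. 2^128 - 1 ring."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:16], "big")   # keep 128 bits

# Nodes are hashed into the id space as well (example addresses, assumed).
node_ids = sorted(ring_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def responsible_node(key: str) -> int:
    """First node id clockwise from the key's id, wrapping around the ring."""
    i = bisect_left(node_ids, ring_id(key))
    return node_ids[i % len(node_ids)]

print(hex(responsible_node("HitSong.mp3")))
```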

6 Where to Store a (Key, Value) Pair?
- Mapping keys in a load-balanced way
  - Store the key at one or more nodes
  - Nodes with identifiers "close" to the key, where distance is measured in the id space
- Advantages
  - Even distribution
  - Few changes as nodes come and go
- Diagram: Hash(name) → object_id, Hash(IP_address) → node_id

7 Joins and Leaves of Nodes
- Maintain a circularly linked list around the ring
  - Every node has a predecessor and a successor
- Diagram: a node on the ring with pred and succ pointers

8 Joins and Leaves of Nodes
- When an existing node leaves
  - The node copies its (key, value) pairs to its predecessor
  - The predecessor points to the node's successor in the ring
- When a node joins (sketched below)
  - The node does a lookup on its own id
  - And learns the node responsible for that id
  - This node becomes the new node's successor
  - And the new node learns that node's predecessor, which becomes the new node's predecessor
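A minimal sketch of that join procedure on an in-memory ring; the Node class, the walk-based lookup, and the example ids are assumptions, since a real DHT would perform the lookup with messages between hosts.

```python
# Sketch of the join step: lookup on the new node's own id, then splice
# between the responsible node (successor) and that node's predecessor.

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.pred = self
        self.succ = self          # a lone node is its own predecessor/successor

def between(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], handling wrap-around."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(start: Node, target_id: int) -> Node:
    """Walk the ring until reaching the node responsible for target_id."""
    n = start
    while not between(target_id, n.pred.id, n.id):
        n = n.succ
    return n

def join(new_node: Node, existing: Node) -> None:
    succ = find_successor(existing, new_node.id)   # lookup on its own id
    pred = succ.pred                               # learn the successor's predecessor
    new_node.succ, new_node.pred = succ, pred      # splice the new node into the ring
    pred.succ = new_node
    succ.pred = new_node

ring = Node(10)
join(Node(50), ring)
join(Node(30), ring)
print(find_successor(ring, 42).id)   # -> 50
```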

9 Nodes Coming and Going
- Small changes when nodes come and go
  - Only the keys mapped to the node that comes or goes are affected
- Diagram: Hash(name) → object_id, Hash(IP_address) → node_id

10 How to Find the Nearest Node?
- Need to find the closest node
  - To determine who should store a (key, value) pair
  - To direct a future lookup(key) query to that node
- Strawman solution: walk through the linked list
  - Circular linked list of nodes in the ring
  - O(n) lookup time with n nodes in the ring
- Alternative solution
  - Jump further around the ring
  - "Finger" table of additional overlay links

11 Links in the Overlay Topology
- Trade-off between number of hops and number of neighbors
  - E.g., log(n) for both, where n is the number of nodes
  - E.g., overlay links that span 1/2, 1/4, 1/8, ... of the way around the ring (see the sketch below)
  - Each hop traverses at least half of the remaining distance
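A minimal sketch of such overlay links as a finger table. The slide does not name a protocol, so the Chord-style construction (finger i points to the successor of id + 2^i), the small ring size, and the example node ids are assumptions for illustration.

```python
from bisect import bisect_left

M = 7                                       # small ring of size 2^7 = 128 (assumed)
RING = 1 << M
node_ids = sorted([3, 20, 45, 70, 99, 115])  # example node ids (assumed)

def successor(point: int) -> int:
    """First node at or after the given ring point, wrapping around the ring."""
    i = bisect_left(node_ids, point % RING)
    return node_ids[i % len(node_ids)]

def finger_table(node_id: int) -> list[int]:
    """Finger i points to the successor of node_id + 2^i: overlay links that
    jump 1/128, 1/64, ..., 1/2 of the way around the ring."""
    return [successor(node_id + (1 << i)) for i in range(M)]

print(finger_table(3))   # each hop via the largest useful finger halves the distance
```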

12 Lecture 9: Peer-to-Peer (P2P)
CPE 401/601 Computer Network Systems
Slides are modified from Jennifer Rexford

13 Goals of Today's Lecture
- Scalability in distributing a large file
  - Single server and N clients
  - Peer-to-peer system with N peers
- Searching for the right peer
  - Central directory (Napster)
  - Query flooding (Gnutella)
  - Hierarchical overlay (Kazaa)
- BitTorrent
  - Transferring large files
  - Preventing free-riding

14 Clients and Servers
- Client program
  - Running on an end host
  - Requests service
  - E.g., Web browser
- Server program
  - Running on an end host
  - Provides service
  - E.g., Web server
- Diagram: GET /index.html → "Site under construction"

15 Client-Server Communication
- Client is "sometimes on"
  - Initiates a request to the server when interested
  - E.g., Web browser on your laptop or cell phone
  - Doesn't communicate directly with other clients
  - Needs to know the server's address
- Server is "always on"
  - Services requests from many client hosts
  - E.g., Web server for the www.unr.edu Web site
  - Doesn't initiate contact with the clients
  - Needs a fixed, well-known address

16 Server Distributing a Large File
- Diagram: a server with upload rate u_s sends a file of F bits over the Internet to receivers with download rates d_1, d_2, d_3, d_4

17 Server Distributing a Large File
- Server sending a large file to N receivers
  - Large file with F bits
  - Single server with upload rate u_s
  - Download rate d_i for receiver i
- Server transmission to N receivers
  - Server needs to transmit NF bits
  - Takes at least NF/u_s time
- Receiving the data
  - Slowest receiver receives at rate d_min = min_i{d_i}
  - Takes at least F/d_min time
- Download time: max{NF/u_s, F/d_min} (a worked example follows)
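A quick numeric check of the client-server bound above; the file size, server rate, and receiver rates are made-up values for illustration.

```python
# Client-server lower bound on distribution time: max{N*F/u_s, F/d_min}.
# All numbers below are made-up illustration values.

F = 4e9           # file size: 4 Gbit
u_s = 100e6       # server upload rate: 100 Mbit/s
d = [20e6] * 10   # 10 receivers, each downloading at 20 Mbit/s
N = len(d)

server_limited = N * F / u_s    # server must push N*F bits
client_limited = F / min(d)     # slowest receiver needs all F bits
print(max(server_limited, client_limited))   # -> 400.0 seconds
```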

18 Speeding Up the File Distribution
- Increase the upload rate from the server
  - Higher link bandwidth at the one server
  - Multiple servers, each with their own link
  - Requires deploying more infrastructure
- Alternative: have the receivers help
  - Receivers get a copy of the data
  - And then redistribute the data to other receivers
  - To reduce the burden on the server

19 Peers Help Distributing a Large File
- Diagram: the server (upload rate u_s) and peers exchange the F-bit file over the Internet; each peer i has download rate d_i and upload rate u_i

20 Peers Help Distributing a Large File
- Start with a single copy of a large file
  - Large file with F bits and server upload rate u_s
  - Peer i with download rate d_i and upload rate u_i
- Two components of distribution latency
  - Server must send each bit: minimum time F/u_s
  - Slowest peer receives each bit: minimum time F/d_min
- Total upload time using all upload resources
  - Total number of bits: NF
  - Total upload bandwidth: u_s + Σ_i u_i
- Total: max{F/u_s, F/d_min, NF/(u_s + Σ_i u_i)}

21 Comparing the Two Models
- Download time (compared numerically below)
  - Client-server: max{NF/u_s, F/d_min}
  - Peer-to-peer: max{F/u_s, F/d_min, NF/(u_s + Σ_i u_i)}
- Peer-to-peer is self-scaling
  - Much lower demands on server bandwidth
  - Distribution time grows only slowly with N
- But...
  - Peers may come and go
  - Peers need to find each other
  - Peers need to be willing to help each other
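The two lower bounds side by side as N grows; the file size and rates are made-up illustration values, with every peer assumed to have the same download and upload rate.

```python
# Compare the client-server and peer-to-peer lower bounds as N grows.
# File size and rates are made-up illustration values.

F, u_s, d_i, u_i = 4e9, 100e6, 20e6, 5e6   # bits, bit/s

for N in (10, 100, 1000):
    cs = max(N * F / u_s, F / d_i)
    p2p = max(F / u_s, F / d_i, N * F / (u_s + N * u_i))
    print(f"N={N:5d}  client-server={cs:8.0f}s  peer-to-peer={p2p:8.0f}s")
# Client-server time grows linearly with N; the peer-to-peer time levels off
# because every new downloader also adds upload capacity.
```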

22 Challenges of Peer-to-Peer
- Peers come and go
  - Peers are intermittently connected
  - May come and go at any time
  - Or come back with a different IP address
- How to locate the relevant peers?
  - Peers that are online right now
  - Peers that have the content you want
- How to motivate peers to stay in the system?
  - Why not leave as soon as the download ends?
  - Why bother uploading content to anyone else?

23 Locating the Relevant Peers
- Three main approaches
  - Central directory (Napster)
  - Query flooding (Gnutella)
  - Hierarchical overlay (KaZaA, modern Gnutella)
- Design goals
  - Scalability
  - Simplicity
  - Robustness
  - Plausible deniability

24 Peer-to-Peer Networks: Napster
- Napster history: the rise
  - January 1999: Napster version 1.0
  - May 1999: company founded
  - December 1999: first lawsuits
  - 2000: 80 million users
- Napster history: the fall
  - Mid 2001: out of business due to lawsuits
  - Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained
  - 2003: growth of pay services like iTunes
- Napster history: the resurrection
  - 2003: Napster name and logo reconstituted as a pay service
- Shawn Fanning, Northeastern freshman

25 Napster Technology: Directory Service
- User installing the software
  - Download the client program
  - Register name, password, local directory, etc.
- Client contacts Napster (via TCP)
  - Provides a list of music files it will share
  - ... and Napster's central server updates the directory
- Client searches on a title or performer
  - Napster identifies online clients with the file
  - ... and provides their IP addresses
- Client requests the file from the chosen supplier
  - Supplier transmits the file to the client
  - Both client and supplier report status to Napster
- A minimal sketch of the directory's role follows
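A minimal in-memory sketch of a Napster-style central directory: peers register what they share and searches return peer addresses, with the file itself fetched peer-to-peer. The class and method names are assumptions; the actual protocol was proprietary.

```python
class CentralDirectory:
    """Central server knows who is online and what each peer shares."""

    def __init__(self):
        self.shared = {}          # peer address -> set of shared titles

    def register(self, peer_addr: str, titles: list[str]) -> None:
        """Peer reports the list of files it is willing to share."""
        self.shared[peer_addr] = set(titles)

    def unregister(self, peer_addr: str) -> None:
        self.shared.pop(peer_addr, None)

    def search(self, title: str) -> list[str]:
        """Return addresses of online peers that share the title;
        the transfer itself happens directly between peers."""
        return [addr for addr, titles in self.shared.items() if title in titles]

directory = CentralDirectory()
directory.register("192.0.2.10:6699", ["HitSong.mp3"])
print(directory.search("HitSong.mp3"))   # -> ['192.0.2.10:6699']
```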

26 Napster Technology: Properties
- Server's directory continually updated
  - Always know what music is currently available
  - Point of vulnerability for legal action
- Peer-to-peer file transfer
  - No load on the server
  - Plausible deniability for legal action (but not enough)
- Proprietary protocol
  - Login, search, upload, download, and status operations
  - No security: cleartext passwords and other vulnerabilities
- Bandwidth issues
  - Suppliers ranked by apparent bandwidth and response time

27 Napster: Limitations of Central Directory
- Single point of failure
- Performance bottleneck
- Copyright infringement
- So, later P2P systems were more distributed
  - Gnutella went to the other extreme...
- File transfer is decentralized, but locating content is highly centralized

28 Peer-to-Peer Networks: Gnutella
- Gnutella history
  - 2000: J. Frankel and T. Pepper released Gnutella
  - Soon after: many other clients, e.g., Morpheus, Limewire, Bearshare
  - 2001: protocol enhancements, e.g., "ultrapeers"
- Query flooding
  - Join: contact a few nodes to become neighbors
  - Publish: no need!
  - Search: ask neighbors, who ask their neighbors
  - Fetch: get the file directly from another node

29 Gnutella: Query Flooding
- Fully distributed
  - No central server
- Public domain protocol
- Many Gnutella clients implementing the protocol
- Overlay network: a graph
  - Edge between peers X and Y if there is a TCP connection between them
  - All active peers and edges form the overlay network
  - A given peer will typically be connected to fewer than 10 overlay neighbors

30 Gnutella: Protocol
- Query message sent over existing TCP connections
- Peers forward the Query message
- QueryHit sent back over the reverse path
- File transfer: HTTP
- Scalability: limited-scope flooding (sketched below)
- Diagram: Query messages fan out across the overlay; QueryHits return along the reverse path
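A minimal sketch of limited-scope query flooding over an overlay graph. The graph, the TTL value, and the message handling are assumptions for illustration; a real client would send Query and QueryHit messages over its TCP connections rather than traverse an in-memory graph.

```python
overlay = {                     # adjacency list: peer -> overlay neighbors (assumed)
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
files = {"E": {"HitSong.mp3"}}  # which peer stores which files (assumed)

def flood_query(origin: str, title: str, ttl: int = 3) -> list[str]:
    """Forward the query to neighbors, decrementing the TTL at each hop;
    peers remember seen queries so each query is forwarded only once."""
    hits, seen = [], {origin}
    frontier = [(origin, ttl)]
    while frontier:
        peer, remaining = frontier.pop()
        if title in files.get(peer, set()):
            hits.append(peer)            # a QueryHit would travel back on the reverse path
        if remaining == 0:
            continue                     # limited scope: stop flooding here
        for neighbor in overlay[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits

print(flood_query("A", "HitSong.mp3"))   # -> ['E'] with a TTL of 3
```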

31 Gnutella: Peer Joining
- Joining peer X must find some other peers
  - Start with a list of candidate peers
  - X sequentially attempts TCP connections with peers on the list until a connection is set up with some peer Y
- X sends a Ping message to Y
  - Y forwards the Ping message
  - All peers receiving the Ping message respond with a Pong message
- X receives many Pong messages
  - X can then set up additional TCP connections

32 Gnutella: Pros and Cons
- Advantages
  - Fully decentralized
  - Search cost distributed
  - Processing per node permits powerful search semantics
- Disadvantages
  - Search scope may be quite large
  - Search time may be quite long
  - High overhead, and nodes come and go often

33 Peer-to-Peer Networks: KaZaA
- KaZaA history
  - 2001: created by a Dutch company (Kazaa BV)
  - Single network called FastTrack, used by other clients as well
  - Eventually the protocol changed so other clients could no longer talk to it
- Smart query flooding
  - Join: on start, the client contacts a super-node and may later become one
  - Publish: client sends a list of its files to its super-node
  - Search: send the query to the super-node, and the super-nodes flood queries among themselves
  - Fetch: get the file directly from peer(s); can fetch from multiple peers at once

34 KaZaA: Exploiting Heterogeneity
- Each peer is either a group leader or assigned to a group leader
  - TCP connection between a peer and its group leader
  - TCP connections between some pairs of group leaders
- Group leader tracks the content of all its children

35 KaZaA: Motivation for Super-Nodes
- Query consolidation
  - Many connected nodes may have only a few files
  - Propagating a query to such a node may take more time than for the super-node to answer it itself
- Stability
  - Super-node selection favors nodes with high uptime
  - How long you've been on is a good predictor of how long you'll be around in the future

36 Peer-to-Peer Networks: BitTorrent
- BitTorrent history and motivation
  - 2002: B. Cohen debuted BitTorrent
  - Key motivation: popular content
    - Popularity exhibits temporal locality (flash crowds)
    - E.g., Slashdot/Digg effect, CNN Web site on 9/11, release of a new movie or game
  - Focused on efficient fetching, not searching
    - Distribute the same file to many peers
    - Single publisher, many downloaders
  - Preventing free-loading

37 BitTorrent: Simultaneous Downloading
- Divide the large file into many pieces
  - Replicate different pieces on different peers
  - A peer with a complete piece can trade with other peers
  - A peer can (hopefully) assemble the entire file
- Allows simultaneous downloading
  - Retrieving different parts of the file from different peers at the same time
  - And uploading parts of the file to peers
  - Important for very large files

38 BitTorrent: Tracker
- Infrastructure node
  - Keeps track of the peers participating in the torrent
- Peers register with the tracker
  - A peer registers when it arrives
  - A peer periodically informs the tracker it is still there
- Tracker selects peers for downloading
  - Returns a random set of peers (sketched below)
  - Including their IP addresses
  - So the new peer knows who to contact for data
- Can have a "trackerless" system using a DHT
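A minimal in-memory sketch of the tracker's register-and-sample behavior described above; the class and method names, the timeout, and the response size are assumptions, not the real announce protocol.

```python
import random
import time

PEER_TIMEOUT = 30 * 60        # drop peers that have not re-announced recently (assumed)

class Tracker:
    def __init__(self):
        self.last_seen = {}    # peer address -> time of last announce

    def announce(self, peer_addr: str, num_want: int = 50) -> list[str]:
        """Peer registers (or refreshes) itself and gets back a random peer set."""
        now = time.time()
        self.last_seen[peer_addr] = now
        alive = [p for p, t in self.last_seen.items()
                 if now - t < PEER_TIMEOUT and p != peer_addr]
        return random.sample(alive, min(num_want, len(alive)))

tracker = Tracker()
tracker.announce("192.0.2.1:6881")
print(tracker.announce("192.0.2.2:6881"))   # -> ['192.0.2.1:6881']
```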

39 BitTorrent: Chunks
- Large file divided into smaller pieces
  - Fixed-size chunks
  - Typical chunk size of 256 Kbytes
- Allows simultaneous transfers
  - Downloading chunks from different neighbors
  - Uploading chunks to other neighbors
- Learning what chunks your neighbors have
  - Periodically asking them for a list
- File is done when all chunks are downloaded
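A quick check of how many 256 KB chunks a file breaks into; the file size is a made-up value for illustration.

```python
import math

CHUNK_SIZE = 256 * 1024            # 256 Kbytes per chunk, as on the slide
file_size = 700 * 1024 * 1024      # example: a 700 MB file (made-up value)

num_chunks = math.ceil(file_size / CHUNK_SIZE)
print(num_chunks)                  # -> 2800 chunks to download and share
```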

40-46 BitTorrent: Overall Architecture
- Diagram sequence: a Web server hosts a Web page with a link to the .torrent file; the downloading peer ("us", a leech) fetches the .torrent, sends a get-announce to the tracker, receives a response with a peer list, shakes hands with the listed peers (seed and leech peers A, B, C), and then exchanges pieces with them

47 BitTorrent: Chunk Request Order
- Which chunks to request?
  - Could download in order
  - Like an HTTP client does
- Problem: many peers have the early chunks
  - Peers have little to share with each other
  - Limiting the scalability of the system
- Problem: eventually nobody has the rare chunks
  - E.g., the chunks near the end of the file
  - Limiting the ability to complete a download
- Solutions: random selection and rarest first

48 BitTorrent: Rarest Chunk First
- Which chunks to request first? (sketched below)
  - The chunk with the fewest available copies
  - I.e., the rarest chunk first
- Benefits to the peer
  - Avoid starvation when some peers depart
- Benefits to the system
  - Avoid starvation across all peers wanting a file
  - Balance load by equalizing the number of copies of each chunk
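A minimal sketch of rarest-first chunk selection; the neighbor bitfields and chunk indices are assumed example data, and ties are broken arbitrarily.

```python
from collections import Counter

def pick_rarest_chunk(missing: set[int],
                      neighbor_bitfields: dict[str, set[int]]) -> int:
    """Among the chunks we still need and some neighbor can provide,
    pick the one with the fewest copies across our neighbors."""
    copies = Counter(chunk for chunks in neighbor_bitfields.values()
                     for chunk in chunks)
    candidates = [c for c in missing if copies[c] > 0]
    return min(candidates, key=lambda c: copies[c])

# Example: neighbor -> set of chunk indices it already holds (assumed data).
neighbors = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 3}}
print(pick_rarest_chunk({1, 2, 3}, neighbors))   # -> 2 or 3 (each has only one copy)
```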

49 Free-Riding Problem in P2P Networks
- Vast majority of users are free-riders
  - Most share no files and answer no queries
  - Others limit their number of connections or their upload speed
- A few "peers" essentially act as servers
  - A few individuals contributing to the public good
  - Making them hubs that basically act as servers
- BitTorrent prevents free-riding
  - Allow the fastest peers to download from you
  - Occasionally let some free-loaders download

50 BitTorrent: Preventing Free-Riding
- A peer has limited upload bandwidth
  - And must share it among multiple peers
- Prioritizing the upload bandwidth: tit for tat (sketched below)
  - Favor neighbors that are uploading at the highest rate
- Rewarding the top four neighbors
  - Measure the download bit rate from each neighbor
  - Reciprocate by sending to the top four peers
  - Recompute and reallocate every 10 seconds
- Optimistic unchoking
  - Randomly try a new neighbor every 30 seconds
  - So a new neighbor has a chance to be a better partner
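A minimal sketch of the tit-for-tat unchoke decision described above; the function name and the example rates are assumptions, and a real client would rerun the ranking every 10 seconds and rotate the optimistic slot every 30 seconds.

```python
import random

def choose_unchoked(download_rates: dict[str, float], top_n: int = 4) -> set[str]:
    """Unchoke the top_n neighbors we download fastest from,
    plus one randomly chosen other neighbor (optimistic unchoke)."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:top_n])
    choked = [p for p in ranked if p not in unchoked]
    if choked:
        unchoked.add(random.choice(choked))   # give a newcomer a chance to trade
    return unchoked

# Example measured download rates from each neighbor (assumed data).
rates = {"A": 15, "B": 12, "C": 10, "D": 9, "E": 8, "F": 3}
print(choose_unchoked(rates))   # -> {'A', 'B', 'C', 'D'} plus one of E or F
```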

51 BitTyrant: Gaming BitTorrent
- BitTorrent can be gamed, too
  - A peer uploads to its top N peers at rate 1/N
  - E.g., if N = 4 and peers upload at 15, 12, 10, 9, 8, 3
  - ... then the peer uploading at rate 9 gets treated quite well
- Best to be the Nth peer in the list, rather than the 1st
  - Offer just a bit more bandwidth than the low-rate peers
  - But not as much as the higher-rate peers
  - And you'll still be treated well by others
- BitTyrant software
  - Uploads at higher rates to higher-bandwidth peers
  - http://bittyrant.cs.washington.edu/

52 BitTorrent Today
- Significant fraction of Internet traffic
  - Estimated at 30%
  - Though this is hard to measure
- Problem of incomplete downloads
  - Peers leave the system when done
  - Many file downloads never complete
  - Especially a problem for less popular content
- Lots of legal questions still remain
- Further need for incentives

53 Conclusions
- Peer-to-peer networks
  - Nodes are end hosts
  - Primarily for file sharing, and more recently telephony
- Finding the appropriate peers
  - Centralized directory (Napster)
  - Query flooding (Gnutella)
  - Super-nodes (KaZaA)
- BitTorrent
  - Distributed download of large files
  - Anti-free-riding techniques
- Great example of how quickly change can happen in application-level protocols

