Peer-to-Peer Networks. Hongli Luo, CEIT, IPFW. Topics: application architecture; P2P file sharing; P2P networks: Napster, Gnutella, KaZaA, BitTorrent.

1 Peer-to-Peer Networks
Hongli Luo, CEIT, IPFW

2 Topics
- Application architecture
- P2P file sharing
- P2P networks: Napster, Gnutella, KaZaA, BitTorrent

3 Application architectures
- Client-server
- Peer-to-peer (P2P)
- Hybrid of client-server and P2P

4 Client-server architecture
- Server:
  - always-on host
  - permanent IP address
  - server farms for scaling
- Clients:
  - communicate with the server
  - may be intermittently connected
  - may have dynamic IP addresses
  - do not communicate directly with each other

5 Client-server architecture
- Applications based on the client-server architecture are infrastructure intensive.
- The service provider must pay connection and bandwidth costs for transmitting data over the Internet.
- Applications:
  - Search engines: Google
  - Web-based e-mail: Yahoo Mail
  - Social networking: MySpace and Facebook
  - Video sharing: YouTube
  - Amazon? (uses cloud computing)
- Server farm:
  - used when a single server cannot keep up with all the requests from clients
  - a cluster of hosts is used to create a powerful virtual server

6 P2P architecture
- No always-on server
- Arbitrary end systems communicate directly
- Peers are intermittently connected and change IP addresses
- Peers are not owned by the service provider
- Highly scalable but difficult to manage

7 P2P architecture
- Applications:
  - File distribution: BitTorrent
  - File search/sharing: eMule and LimeWire
  - Internet telephony: Skype
  - IPTV: PPLive
- Advantages:
  - Self-scalability
  - Cost effective: requires no significant server infrastructure or server bandwidth
- Disadvantages:
  - Difficult to manage
  - Security

8 Hybrid of client-server and P2P
- Skype
  - voice-over-IP P2P application
  - centralized server: finding the address of the remote party
  - client-client connection: direct (not through a server)
- Instant messaging
  - chatting between two users is P2P: user-to-user messages are sent directly between user hosts without passing through intermediate servers
  - centralized service: client presence detection/location
    - a user registers its IP address with the central server when it comes online
    - a user contacts the central server to find the IP addresses of buddies
  - servers are used to track the IP addresses of users

9 Processes communicating
Process: a program running within a host.
- Within the same host, two processes communicate using inter-process communication (defined by the OS).
- Processes in different hosts communicate by exchanging messages.
P2P: applications with P2P architectures have both client processes and server processes.
- In P2P file sharing, the peer downloading the file is the client, and the peer uploading the file is the server.
- Client process: the process that initiates communication
- Server process: the process that waits to be contacted

10 P2P file sharing
- Problem: how to distribute a large file from a single server to a large number of hosts?
- Solutions:
  - Client-server file distribution: the server sends a copy of the file to each of the hosts, which burdens the server and its bandwidth
  - P2P file distribution: each peer can redistribute any portion of the file it has received to any other peers
    - The most popular protocol is BitTorrent, which accounted for about 30% of Internet backbone traffic

11 Server Distributing a Large File
- Assumptions:
  - The Internet core has abundant bandwidth
  - All of the bottlenecks are in network access
  - The server and clients are not participating in any other network applications
- Distribution time D: the time it takes to get a copy of the file to all N peers

12 Server Distributing a Large File
- Server sending a large file to N receivers
  - Large file with $F$ bits
  - Single server with upload rate $u_s$
  - Download rate $d_i$ for receiver $i$
- Server transmission to N receivers
  - Server needs to transmit $NF$ bits
  - Takes at least $NF/u_s$ time
- Receiving the data
  - Slowest receiver receives at rate $d_{\min} = \min_i \{d_i\}$
  - Takes at least $F/d_{\min}$ time
- Distribution time for client-server: $D_{cs} \ge \max\{NF/u_s,\ F/d_{\min}\}$
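The client-server bound can be evaluated directly. Below is a minimal sketch; the function name `client_server_time` and the example rates are hypothetical, with sizes in bits and rates in bits per second.

```python
def client_server_time(F, u_s, download_rates):
    """Lower bound on client-server distribution time, in seconds.

    F: file size in bits; u_s: server upload rate in bits/s;
    download_rates: download rate d_i of each of the N receivers, in bits/s.
    """
    N = len(download_rates)
    d_min = min(download_rates)
    # The server must push N copies (NF bits) through its uplink, and the
    # slowest receiver needs at least F/d_min to take in one copy.
    return max(N * F / u_s, F / d_min)


# Hypothetical numbers: 1 Gbit file, 100 Mbps server, 10 receivers at 20 Mbps.
print(client_server_time(1e9, 100e6, [20e6] * 10))  # 100.0
```

Doubling N to 20 receivers doubles the $NF/u_s$ term to 200 seconds, which is the linear growth in N that motivates letting the receivers help.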

13 Speeding Up the File Distribution
- With a single server, the distribution time increases linearly with the number of peers N.
- Increase the upload rate from the server:
  - higher link bandwidth at the one server
  - multiple servers, each with its own link
  - requires deploying more infrastructure
- Alternative: have the receivers help
  - receivers get a copy of the data
  - and then redistribute the data to other receivers
  - to reduce the burden on the server

14 Peers Help Distributing a Large File
[Figure: a server holding an F-bit file with upload rate $u_s$ sends through the Internet to peers 1 to 4, each with download rate $d_i$ and upload rate $u_i$.]

15 Peers Help Distributing a Large File
- Start with a single copy of a large file
  - Large file with $F$ bits and server upload rate $u_s$
  - Peer $i$ with download rate $d_i$ and upload rate $u_i$
- Two components of distribution latency
  - Server must send each bit: minimum time $F/u_s$
  - Slowest peer receives each bit: minimum time $F/d_{\min}$
- Total upload time using all upload resources
  - Total number of bits: $NF$
  - Total upload bandwidth: $u_s + \sum_i u_i$
- Distribution time for peer-to-peer: $D_{p2p} \ge \max\{F/u_s,\ F/d_{\min},\ NF/(u_s + \sum_i u_i)\}$
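The three terms of the P2P bound map one-to-one onto code. This is a minimal sketch; `p2p_time` and the example rates are hypothetical, and it assumes the lists of download and upload rates have one entry per peer.

```python
def p2p_time(F, u_s, download_rates, upload_rates):
    """Lower bound on P2P distribution time, in seconds.

    download_rates[i] and upload_rates[i] are d_i and u_i for peer i (bits/s).
    """
    N = len(download_rates)
    d_min = min(download_rates)
    total_upload = u_s + sum(upload_rates)
    return max(
        F / u_s,               # the server must send every bit at least once
        F / d_min,             # the slowest peer must receive every bit
        N * F / total_upload,  # NF bits must be uploaded by someone
    )


# Hypothetical numbers: 1 Gbit file, 100 Mbps server,
# 10 peers downloading at 20 Mbps and uploading at 10 Mbps.
print(p2p_time(1e9, 100e6, [20e6] * 10, [10e6] * 10))  # 50.0
```

Because every new peer adds its $u_i$ to the denominator of the last term, that term grows far more slowly with N than the client-server $NF/u_s$.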

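16 Comparing Client-server, P2P architectures
[Figure: minimum distribution time versus N for client-server and P2P.]
Assumes that all peers have the same upload rate $u$, with $F/u = 1$ hour, $u_s = 10u$, and $d_{\min} \ge u_s$. Under these assumptions the P2P minimum distribution time is less than 1 hour for any number of peers N, because $NF/(u_s + Nu) = N/(10+N)$ hours, which is always below 1 hour; the client-server time $NF/u_s = N/10$ hours grows linearly with N.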
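The two curves in the figure can be reproduced from the formulas. The sketch below plugs in the slide's values ($F/u = 1$ hour, $u_s = 10u$, $d_{\min} \ge u_s$); the function names `d_cs` and `d_p2p` are hypothetical.

```python
# Units follow the slide: time in hours, rates as multiples of u, so F/u = 1.
F = 1.0          # file size, chosen so that F/u = 1 hour
u = 1.0          # upload rate of every peer
u_s = 10 * u     # server upload rate

# Because d_min >= u_s, the F/d_min term never exceeds F/u_s, so it is dropped.


def d_cs(N):
    return N * F / u_s                          # equals max{NF/u_s, F/d_min}


def d_p2p(N):
    return max(F / u_s, N * F / (u_s + N * u))  # F/d_min term dropped as above


for N in (1, 5, 10, 20, 30):
    print(f"N={N:2d}  client-server={d_cs(N):.2f} h  P2P={d_p2p(N):.2f} h")
```

At N = 30 the client-server time is 3 hours, while the P2P time is 0.75 hours and approaches, but never reaches, 1 hour as N grows.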

17 Comparing the Two Models
- Minimum distribution time
  - Client-server: $D_{cs} = \max\{NF/u_s,\ F/d_{\min}\}$
  - Peer-to-peer: $D_{p2p} = \max\{F/u_s,\ F/d_{\min},\ NF/(u_s + \sum_i u_i)\}$, assuming each peer can redistribute a bit as soon as it receives it
- Peer-to-peer is self-scaling
  - much lower demands on server bandwidth
  - distribution time grows only slowly with N
- But…
  - peers may come and go
  - peers need to find each other
  - peers need to be willing to help each other

18 P2P file sharing Example
- Alice runs a P2P client application on her notebook computer.
- She intermittently connects to the Internet and gets a new IP address for each connection.
- She asks for “Hey Jude”.
- The application displays other peers that have a copy of Hey Jude.
- Alice chooses one of the peers, Bob.
- The file is copied from Bob’s PC to Alice’s notebook.
- While Alice downloads, other users upload from Alice.
- Alice’s peer is both a client and a transient server.
All peers are servers = highly scalable!

19 Challenges of Peer-to-Peer
- Peers come and go
  - peers are intermittently connected
  - may come and go at any time
  - or come back with a different IP address
- How to locate the relevant peers?
  - peers that are online right now
  - peers that have the content you want
- How to motivate peers to stay in the system?
  - Why not leave as soon as the download ends?
  - Why bother uploading content to anyone else?

20 Locating the Relevant Peers
- Mapping of information to host locations
  - File-sharing system: file tracking, mapping a file to the IP address of a peer
  - Instant messaging: presence tracking, mapping a username to an IP address
- Three main approaches for indexing and searching files
  - Central directory (Napster)
  - Query flooding (Gnutella)
  - Hierarchical overlay (KaZaA, modern Gnutella)
- Design goals
  - Scalability
  - Simplicity
  - Robustness
  - Plausible deniability

21 Peer-to-Peer Networks: Napster
- Napster history: the rise
  - January 1999: Napster version 1.0
  - May 1999: company founded
  - December 1999: first lawsuits
  - 2000: 80 million users
- Napster history: the fall
  - Mid 2001: out of business due to lawsuits
  - Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained
  - 2003: growth of pay services like iTunes
- Napster history: the resurrection
  - 2003: Napster name/logo reconstituted as a pay service

22 P2P: centralized directory
Original “Napster” design:
1) When a peer connects, it informs the central server of its IP address and content.
2) Alice queries the server for “Hey Jude”.
3) Alice requests the file from Bob.
[Figure: a centralized directory server and peers including Alice and Bob, with steps 1 to 3 labeled on the connections.]

23 Napster Technology: Directory Service
- User installing the software
  - downloads the client program
  - registers name, password, local directory, etc.
- Client contacts Napster (via TCP)
  - provides a list of music files it will share
  - Napster’s central server updates the directory
- Client searches on a title or performer
  - Napster identifies online clients with the file
  - and provides their IP addresses
- Client requests the file from the chosen supplier
  - supplier transmits the file to the client
  - both client and supplier report status to Napster
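A central directory of this kind is essentially a pair of lookup tables that the server updates as clients come and go. The class below is a hypothetical toy model: the names `CentralDirectory`, `register`, `unregister`, and `search` are illustrative and do not describe Napster's proprietary protocol, and only the index is modeled, since the file transfer itself happens directly between peers.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy Napster-style index: which online client shares which title."""

    def __init__(self):
        self.files_by_peer = {}                 # peer IP -> set of titles
        self.peers_by_title = defaultdict(set)  # title -> set of peer IPs

    def register(self, ip, titles):
        """A client logs in and sends the list of files it will share."""
        self.unregister(ip)                     # replace any stale entry
        self.files_by_peer[ip] = set(titles)
        for title in self.files_by_peer[ip]:
            self.peers_by_title[title].add(ip)

    def unregister(self, ip):
        """A client goes offline; drop it so the directory stays current."""
        for title in self.files_by_peer.pop(ip, set()):
            self.peers_by_title[title].discard(ip)
            if not self.peers_by_title[title]:
                del self.peers_by_title[title]

    def search(self, title):
        """Return the IP addresses of online clients sharing the title."""
        return sorted(self.peers_by_title.get(title, ()))


directory = CentralDirectory()
directory.register("10.0.0.2", ["Hey Jude", "Let It Be"])   # Bob
directory.register("10.0.0.3", ["Hey Jude"])
print(directory.search("Hey Jude"))   # ['10.0.0.2', '10.0.0.3']
directory.unregister("10.0.0.2")
print(directory.search("Hey Jude"))   # ['10.0.0.3']
```

Keeping both directions of the mapping makes a search a single lookup and lets the server remove a departed client in one step; it also makes plain why this one server is a single point of failure and a performance bottleneck.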

24 Napster Technology: Properties
- Server’s directory continually updated
  - always knows what music is currently available
  - point of vulnerability for legal action
- Peer-to-peer file transfer
  - no file-transfer load on the server
  - plausible deniability for legal action (but not enough)
- Proprietary protocol
  - login, search, upload, download, and status operations
  - no security: cleartext passwords and other vulnerabilities
- Bandwidth issues
  - suppliers ranked by apparent bandwidth and response time

25 Napster: Limitations of a Centralized Directory
- Single point of failure
- Performance bottleneck
- Copyright infringement: the “target” of a lawsuit is obvious
- File transfer is decentralized, but locating content is highly centralized
- Later P2P systems were more distributed
  - Gnutella went to the other extreme: both searching and file transfer are fully decentralized

26 Peer-to-Peer Networks: Gnutella
- Gnutella history
  - 2000: J. Frankel & T. Pepper released Gnutella
  - Soon after: many other clients (e.g., LimeWire)
  - 2001: protocol enhancements, e.g., “ultrapeers”
- Query flooding
  - Join: contact a few nodes to become neighbors
  - Search: ask neighbors, who ask their neighbors
  - Fetch: get the file directly from another node

27 Gnutella: Query Flooding
- Fully distributed
  - no central server
- Many Gnutella clients implement the protocol
- Peers form an abstract, logical network called an overlay network
Overlay network: graph
- Edge between peers X and Y if there is a TCP connection between them
- All active peers and edges form the overlay network
- An edge is not a physical communication link, but an abstract link
- A given peer is typically connected to fewer than 10 overlay neighbors

28 Gnutella: Protocol
- Query messages are sent over existing TCP connections
- Peers forward Query messages
- QueryHit is sent over the reverse path
- File transfer: HTTP
- Scalability: limited-scope flooding
[Figure: Query messages flooding outward over overlay edges and QueryHit messages returning along the reverse path.]

29 Gnutella: limited-scope query flooding
- When an initial query message is sent out, a peer-count field in the message is set to a specific limit, e.g., 7.
- Each time the query message reaches a new peer, the peer decrements the peer-count field before forwarding the query to its neighbors.
- A peer stops forwarding a query once the peer-count field reaches zero.
- As a result, a peer may fail to locate the content even though some peer outside the query's scope has it.
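Limited-scope flooding behaves like a breadth-first walk of the overlay graph that stops when the peer-count reaches zero. The simulation below is a minimal sketch under stated assumptions: the overlay is an in-memory adjacency dict rather than real TCP connections, duplicate copies of a query are simply dropped (Gnutella messages carry a unique ID for this purpose), and QueryHit routing along the reverse path is not modeled. `flood_query` is a hypothetical name.

```python
from collections import deque


def flood_query(overlay, files, source, wanted, limit=7):
    """Simulate limited-scope query flooding; return the peers with a hit.

    overlay: dict peer -> list of overlay neighbors (one per TCP connection)
    files:   dict peer -> set of file names the peer shares
    """
    hits = set()
    seen = {source}                  # peers that already handled this query
    # Pending deliveries: (receiving peer, peer-count in the message, sender).
    # FIFO order means the first copy to reach a peer has the largest count.
    queue = deque((n, limit, source) for n in overlay.get(source, ()))
    while queue:
        peer, count, sender = queue.popleft()
        if peer in seen:             # duplicate copy: drop it
            continue
        seen.add(peer)
        if wanted in files.get(peer, ()):
            hits.add(peer)           # this peer would send a QueryHit back
        count -= 1                   # decrement before forwarding
        if count == 0:
            continue                 # scope exhausted: do not forward
        for n in overlay.get(peer, ()):
            if n != sender:
                queue.append((n, count, peer))
    return hits


overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"Hey Jude"}}
print(flood_query(overlay, files, "A", "Hey Jude", limit=2))  # set()
print(flood_query(overlay, files, "A", "Hey Jude", limit=3))  # {'D'}
```

With a limit of 2 the query dies at C, so A cannot find the copy held by D, which is exactly the failure case the last bullet describes.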

30 Gnutella: Peer Joining
- Joining peer X must find some other peers
  - start with a list of candidate peers
  - X sequentially attempts TCP connections with candidate peers on the list until a connection is set up with some peer Y
- Flooding: X sends a Ping message to Y
  - the Ping message includes a peer-count field
  - Y forwards the Ping message to its overlay neighbors, which keep forwarding it until the peer-count field reaches zero
  - all peers receiving the Ping message respond with a Pong message
  - the Pong message includes the peer’s IP address
- X receives many Pong messages
  - X can then set up additional TCP connections with those peers
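The join procedure can be simulated the same way. This is a hypothetical sketch, not real socket code: `join` is an illustrative name, a "successful TCP connection" is modeled as membership in an assumed `online` set, and the Ping uses the same decrement-then-forward rule as a Query.

```python
from collections import deque


def join(candidates, online, overlay, limit=7):
    """Simulate a Gnutella-style join: connect to one peer Y, then Ping.

    candidates: addresses X will try, in order
    online:     addresses currently accepting TCP connections (assumption)
    overlay:    dict peer -> list of its current overlay neighbors
    Returns (Y, addresses learned from Pong messages), or (None, set()).
    """
    # X attempts TCP connections one candidate at a time until one succeeds.
    y = next((c for c in candidates if c in online), None)
    if y is None:
        return None, set()
    pongs = set()
    seen = set()
    queue = deque([(y, limit, None)])    # X sends the Ping to Y
    while queue:
        peer, count, sender = queue.popleft()
        if peer in seen:
            continue
        seen.add(peer)
        pongs.add(peer)                  # every peer reached answers with a Pong
        count -= 1                       # decrement before forwarding
        if count == 0:
            continue
        for n in overlay.get(peer, ()):
            if n != sender:
                queue.append((n, count, peer))
    return y, pongs


overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
y, pongs = join(["Z", "B"], {"A", "B", "C", "D"}, overlay, limit=2)
print(y, sorted(pongs))   # B ['A', 'B', 'C']
```

X skips the unreachable candidate Z, connects to B, and learns three addresses it can use for additional TCP connections.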

31 Gnutella: Pros and Cons
- Advantages
  - fully decentralized
  - search cost distributed
- Disadvantages
  - search scope may be quite large
  - search time may be quite long
  - high overhead, and nodes come and go often
- Employed by the popular P2P client LimeWire

32 Peer-to-Peer Networks: KaZaA
- KaZaA history
  - 2001: created by a Dutch company (Kazaa BV)
  - Used FastTrack, a P2P file-sharing protocol that other clients, including Morpheus, also used

33 Hierarchical Overlay
- Lies between the centralized-index and query-flooding approaches
- Each peer is either a group leader (super peer) or assigned to a group leader as an ordinary peer
  - TCP connection between each peer and its group leader
  - TCP connections between some pairs of group leaders
- Super peers have higher bandwidth, higher availability, and greater responsibilities
- A super peer may have a few hundred ordinary peers as children
- A group leader tracks the content in its children

34 Peer-to-Peer Networks: KaZaA
- Smart query flooding
  - Join: on start, the client contacts a super-node (and may later become one)
  - Publish: the client sends its list of files to its super-node
  - Search: the client sends a query to its super-node, and the super-nodes flood the query among themselves (limited-scope flooding in the overlay network of super peers)
  - Fetch: get the file directly from peer(s); a client can fetch from multiple peers at once
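The publish and search steps can be sketched with a toy super-peer class. This is a hypothetical model, not the FastTrack protocol: `SuperPeer`, `publish`, and `search` are illustrative names, super peers are linked in memory instead of over TCP, and the fetch step is not modeled. Only super peers take part in the flood, and each one answers for all of its children from its own index.

```python
from collections import deque


class SuperPeer:
    """Toy group leader in a KaZaA-style hierarchical overlay."""

    def __init__(self, name):
        self.name = name
        self.index = {}       # child peer -> set of files it published
        self.neighbors = []   # other super peers it has TCP connections to

    def publish(self, child, files):
        """Publish: an ordinary peer sends its file list to its super peer."""
        self.index[child] = set(files)

    def local_hits(self, wanted):
        return {child for child, files in self.index.items() if wanted in files}


def search(home, wanted, limit=2):
    """Search: ask the home super peer, which floods among super peers only."""
    hits = home.local_hits(wanted)
    seen = {home}
    queue = deque((sp, limit) for sp in home.neighbors)
    while queue:
        sp, count = queue.popleft()
        if sp in seen:
            continue
        seen.add(sp)
        hits |= sp.local_hits(wanted)
        if count > 1:         # forward only while the decremented count is > 0
            queue.extend((n, count - 1) for n in sp.neighbors)
    return hits


a, b, c = SuperPeer("a"), SuperPeer("b"), SuperPeer("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.publish("carol-pc", ["Hey Jude"])
c.publish("bob-pc", ["Hey Jude"])
print(sorted(search(a, "Hey Jude", limit=1)))  # ['carol-pc']
print(sorted(search(a, "Hey Jude", limit=2)))  # ['bob-pc', 'carol-pc']
```

Because a super peer may index a few hundred children, each flooded message covers far more content than a Gnutella query visiting the same number of nodes.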

35 P2P Case Study: BitTorrent (1)
- BitTorrent history and motivation
  - 2002: B. Cohen debuted BitTorrent
  - Key motivation: popular content
  - Focused on efficient fetching, not searching
    - distribute the same file to many peers
    - single publisher, many downloaders
  - Accounted for roughly 27% to 55% of all Internet traffic (depending on geographical location) as of February 2009 (Wiki)
  - Preventing free-riding
- The collection of all peers participating in the distribution of a particular file is called a torrent

36 BitTorrent: Simultaneous Downloading
- Divide a large file into many pieces
  - replicate different pieces on different peers
  - a peer with a complete piece can trade with other peers
  - a peer can (hopefully) assemble the entire file
- Allows simultaneous downloading
  - retrieving different parts of the file from different peers at the same time
  - and uploading parts of the file to peers
  - important for very large files

37 P2P Case Study: BitTorrent (1)
[Figure: P2P file distribution. A tracker tracks the peers participating in a torrent; a torrent is a group of peers exchanging chunks of a file. A joining peer obtains a list of peers from the tracker and then trades chunks with them.]

38 BitTorrent: Tracker
- Infrastructure node
  - keeps track of peers participating in the torrent
- Peers register with the tracker
  - a peer registers when it arrives
  - a peer periodically informs the tracker it is still there
- Tracker selects peers for downloading
  - returns a random set of peers
  - including their IP addresses
  - so the new peer knows whom to contact for data
- A peer connects to a subset of peers (“neighbors”)
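The tracker's bookkeeping amounts to a table of peers with a last-heard-from time. Below is a minimal in-memory sketch for a single torrent; it does not implement the real tracker wire protocol, and `Tracker`, `announce`, `get_peers`, the 120-second timeout, and the default of 50 returned peers are all hypothetical choices.

```python
import random


class Tracker:
    """In-memory sketch of a BitTorrent tracker for a single torrent."""

    def __init__(self, timeout=120.0, rng=None):
        self.timeout = timeout        # hypothetical: seconds before a silent peer is dropped
        self.last_seen = {}           # (ip, port) -> time of last report
        self.rng = rng or random.Random()

    def announce(self, peer, now):
        """Register a newly arrived peer, or refresh one that is still there."""
        self.last_seen[peer] = now

    def get_peers(self, requester, now, k=50):
        """Forget stale peers, then return up to k randomly chosen others."""
        self.last_seen = {p: t for p, t in self.last_seen.items()
                          if now - t <= self.timeout}
        others = [p for p in self.last_seen if p != requester]
        return self.rng.sample(others, min(k, len(others)))


tracker = Tracker(timeout=60)
tracker.announce(("10.0.0.1", 6881), now=0)
tracker.announce(("10.0.0.2", 6881), now=50)
tracker.announce(("10.0.0.3", 6881), now=100)
print(tracker.get_peers(("10.0.0.3", 6881), now=100))  # [('10.0.0.2', 6881)]
```

Peer 10.0.0.1 has not reported for 100 seconds, longer than the assumed 60-second timeout, so the tracker no longer hands it out; the requester is also excluded from its own list.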

39 BitTorrent: Chunks
- Large file divided into smaller pieces
  - fixed-size chunks
  - typical chunk size of 256 KB
- When a peer joins the torrent, it has no chunks, but accumulates them over time
- Allows simultaneous transfers
  - downloading chunks from different neighbors
  - uploading chunks to other neighbors
- Learning what chunks your neighbors have
  - periodically ask them for a list
- The file is done when all chunks are downloaded
- Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain as a seed
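Splitting a file into fixed-size chunks and tracking which ones a peer holds takes only a few lines. This sketch assumes the 256 KB size from the slide; `split_chunks` and `ChunkState` are hypothetical names, and real clients also verify each chunk, which is not shown here.

```python
CHUNK_SIZE = 256 * 1024   # 256 KB, the typical chunk size from the slide


def split_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut a file's bytes into fixed-size chunks; the last may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ChunkState:
    """Tracks which chunks a peer holds; a newly joined peer holds none."""

    def __init__(self, num_chunks):
        self.have = [False] * num_chunks

    def add(self, index):
        self.have[index] = True

    def missing(self):
        return [i for i, has in enumerate(self.have) if not has]

    def complete(self):
        return all(self.have)


chunks = split_chunks(bytes(600_000))
print(len(chunks), len(chunks[-1]))        # 3 75712
state = ChunkState(len(chunks))
state.add(1)                               # chunks may arrive in any order
print(state.missing(), state.complete())   # [0, 2] False
```

A peer whose `complete()` returns True can leave or stay on as a seed.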

40 BitTorrent: Pulling Chunks
- At any given time, different peers have different subsets of file chunks
- Periodically, a peer (Alice) asks each neighbor for the list of chunks that it has
- Alice issues requests for her missing chunks
  - rarest first

41 BitTorrent: Chunk Request Order
- Which chunks to request?
  - could download in order
  - like an HTTP client does
- Problem: many peers have the early chunks
  - peers have little to share with each other
  - limiting the scalability of the system
- Problem: eventually nobody has the rare chunks
  - e.g., the chunks near the end of the file
  - limiting the ability to complete a download
- Solutions: random selection and rarest first

42 BitTorrent: Rarest Chunk First
- Which chunks to request first?
  - the chunk with the fewest available copies
  - i.e., the rarest chunk first
- Benefits to the peer
  - avoids starvation when some peers depart
- Benefits to the system
  - avoids starvation across all peers wanting a file
  - balances load by equalizing the number of copies of chunks
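Rarest-first selection needs only the chunk lists Alice collects from her neighbors. The function below is a hypothetical sketch: `pick_rarest` is an illustrative name, ties among equally rare chunks and the choice of which holder to ask are broken at random (an assumption that combines the two strategies on the previous slide), and details such as requests already in flight are ignored.

```python
import random
from collections import Counter


def pick_rarest(missing, neighbor_chunks, rng=random):
    """Choose which missing chunk to request next, rarest first.

    missing:         set of chunk indices this peer still needs
    neighbor_chunks: dict neighbor -> set of chunk indices it advertises
    Returns (chunk, neighbor to ask), or None if no neighbor can help.
    """
    # Count how many neighbors hold each chunk we still need.
    counts = Counter(c for chunks in neighbor_chunks.values()
                     for c in chunks if c in missing)
    if not counts:
        return None
    fewest = min(counts.values())
    # Random tie-break among the equally rare chunks.
    chunk = rng.choice(sorted(c for c, n in counts.items() if n == fewest))
    holders = sorted(n for n, chunks in neighbor_chunks.items() if chunk in chunks)
    return chunk, rng.choice(holders)


neighbor_chunks = {"bob": {0, 1, 2}, "carol": {0, 1}, "dave": {0}}
print(pick_rarest({0, 1, 2}, neighbor_chunks))  # (2, 'bob')
```

Chunk 0 is held by all three neighbors, but chunk 2 exists only at Bob, so Alice fetches it first; if Bob departed before anyone else copied it, the chunk would be lost to this neighborhood.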

43 Free-Riding Problem in P2P Networks
- The vast majority of users are free-riders
  - most share no files and answer no queries
  - others limit the number of connections or upload speed
- A few “peers” essentially act as servers
  - a few individuals contributing to the public good
  - making them hubs that basically act as servers
- BitTorrent prevents free-riding
  - allow the fastest peers to download from you
  - occasionally let some free riders download

44 BitTorrent: Preventing Free-Riding
- A peer has limited upload bandwidth
  - and must share it among multiple peers
- Prioritizing the upload bandwidth
  - a peer (Alice) gives priority to the neighbors that are currently supplying her data at the highest rate
- Rewarding the top four neighbors
  - Alice continuously measures the rate at which she receives bits from each neighbor
  - she reciprocates by sending to the top four peers
  - she recomputes and reallocates the top four peers every 10 seconds
- Optimistic unchoking
  - randomly try a new neighbor every 30 seconds
  - so the new neighbor has a chance to be a better partner
  - the newly chosen peer may join the top four
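The reciprocation rule can be sketched as a function called once per 10-second round. This is a simplified model under assumptions: `choose_unchoked` and its parameters are hypothetical names, "unchoked" means "Alice is currently willing to upload to this neighbor", and the optimistic pick is redrawn every third round to match the 30-second period.

```python
import random


def choose_unchoked(download_rates, current_optimistic, tick, rng=random):
    """Pick the neighbors Alice uploads to for the next 10-second round.

    download_rates:     dict neighbor -> rate at which it recently sent to Alice
    current_optimistic: neighbor picked by the last optimistic unchoke, or None
    tick:               count of 10-second rounds so far
    Returns (set of unchoked neighbors, optimistic neighbor).
    """
    # Tit-for-tat: reciprocate to the four neighbors feeding Alice fastest.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    top4 = set(ranked[:4])
    # Every third round (30 s), or if the old pick left, draw a new neighbor
    # at random from outside the top four.
    if tick % 3 == 0 or current_optimistic not in download_rates:
        others = [n for n in download_rates if n not in top4]
        current_optimistic = rng.choice(others) if others else None
    extra = {current_optimistic} if current_optimistic is not None else set()
    return top4 | extra, current_optimistic


rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 5}
unchoked, optimistic = choose_unchoked(rates, None, tick=0)
print(sorted(unchoked - {optimistic}))  # ['a', 'b', 'c', 'd']
print(optimistic in {"e", "f"})         # True
```

A free rider that uploads nothing never enters the top four, so it receives data only during the occasional optimistic round; a newcomer that turns out to upload quickly will climb into the top four at the next recomputation.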

