
P2P File Sharing Systems


1 P2P File Sharing Systems
Johnny Wong
Note: The materials in these slides are based on those from the textbook “Computer Networking: A Top-Down Approach Featuring the Internet” by J.F. Kurose and K.W. Ross.

2 P2P file sharing
Example: Alice runs a P2P client application on her notebook computer. She connects to the Internet intermittently and gets a new IP address for each connection.
She asks for “Hey Jude”; the application displays other peers that have a copy of “Hey Jude”.
Alice chooses one of the peers, Bob.
The file is copied from Bob’s PC to Alice’s notebook via HTTP.
While Alice downloads, other users upload from Alice.
Alice’s peer is both a Web client and a transient Web server.
All peers are servers, so the system is highly scalable!

3 P2P: centralized directory
Original “Napster” design (a central directory server plus peers such as Alice and Bob):
1) When a peer connects, it informs the central server of its IP address and its content.
2) Alice queries the server for “Hey Jude”.
3) Alice requests the file directly from Bob.
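The directory in this design is essentially a map from content to the IP addresses of the peers that hold it. Below is a minimal Python sketch, assuming exact-title lookups and an in-memory index; the class and method names (`DirectoryServer`, `register`, `unregister`, `query`) and the IP addresses are hypothetical, not Napster's actual protocol.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: content title -> IP addresses of peers holding it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer_ip, titles):
        # Step 1: a connecting peer reports its IP address and its content.
        for title in titles:
            self._index[title].add(peer_ip)

    def unregister(self, peer_ip):
        # A peer that disconnects is dropped; it may return with a new IP address.
        for holders in self._index.values():
            holders.discard(peer_ip)

    def query(self, title):
        # Step 2: answer a query with the peers that currently hold the title.
        return sorted(self._index.get(title, ()))


server = DirectoryServer()
server.register("10.0.0.2", ["Hey Jude", "Let It Be"])  # Bob (illustrative address)
server.register("10.0.0.3", ["Yesterday"])
print(server.query("Hey Jude"))  # ['10.0.0.2']
# Step 3, the file transfer itself, goes directly from Bob to Alice, bypassing the server.
```

Every lookup passes through this one object, which is why the centralized design has a single point of failure and a performance bottleneck.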

4 P2P: problems with centralized directory
Single point of failure
Performance bottleneck
Copyright infringement
File transfer is decentralized, but locating content is highly centralized.

5 P2P: query flooding
Gnutella: no hierarchy.
A peer uses a bootstrap node to learn about other peers, then sends them join messages.
A peer sends a query to its neighbors; the neighbors forward the query.
If a queried peer has the object, it sends a message back to the querying peer.
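Flooding amounts to a breadth-first traversal of the overlay in which each peer handles a given query only once. Below is a minimal Python sketch, assuming the overlay is a dict of neighbor lists and that peers drop duplicate copies of a query; `flood_query` and the five-peer overlay are hypothetical, and no hop limit is applied.

```python
from collections import deque


def flood_query(overlay, start, has_file):
    """Flood a query from `start`; return (peers holding the object, messages sent).

    overlay:  dict mapping each peer to the list of its overlay neighbors
    has_file: predicate telling whether a peer stores the requested object
    """
    seen = {start}                   # each peer processes a given query only once
    queue = deque([(start, None)])   # (peer, neighbor it received the query from)
    hits, messages = [], 0
    while queue:
        peer, came_from = queue.popleft()
        if peer != start and has_file(peer):
            hits.append(peer)        # this peer sends a reply back to the querier
        for neighbor in overlay[peer]:
            if neighbor == came_from:
                continue             # never echo the query back to its sender
            messages += 1            # sent even if the neighbor has already seen it
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, peer))
    return hits, messages


overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
print(flood_query(overlay, "A", lambda p: p in {"D", "E"}))  # (['D', 'E'], 6)
```

Two of the six messages (C to D and D to C) reach peers that already have the query and are discarded; on larger overlays this waste grows into the excessive query traffic listed among the cons of flooding.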

6 P2P: more on query flooding
Pros: peers have similar responsibilities (no group leaders); highly decentralized; no peer maintains directory information.
Cons: excessive query traffic; limited query radius, so a search may not find content even when it is present; a bootstrap node is needed; the overlay network must be maintained.

7 P2P: decentralized directory
Each peer is either a group leader or assigned to a group leader. The group leader tracks the content of all its children. A peer queries its group leader; the group leader may query other group leaders.

8 More about decentralized directory
Overlay network: peers are nodes; edges connect peers to their group leaders, and some pairs of group leaders (virtual neighbors).
Bootstrap node: a connecting peer is either assigned to a group leader or designated as a leader.
Advantages of the approach: no centralized directory server; the location service is distributed over the peers; the system is more difficult to shut down.
Disadvantages of the approach: a bootstrap node is needed; group leaders can get overloaded.
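The join and lookup rules of the decentralized directory can be sketched in Python as follows. Everything concrete here is an assumption for illustration: a fixed `capacity` per group leader, a new peer becoming a leader only when every existing leader is full, each new leader being linked to all existing leaders, and a leader consulting its neighbor leaders only when its own index has no match. `GroupLeader` and `join` are hypothetical names.

```python
class GroupLeader:
    """Hypothetical group leader that indexes the content of its children."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # assumed limit on children (including itself)
        self.children = {}         # child peer -> set of titles it stores
        self.other_leaders = []    # virtual neighbors in the leader overlay

    def local_lookup(self, title):
        return sorted(p for p, titles in self.children.items() if title in titles)

    def query(self, title):
        # Answer from the local index; if nothing matches, ask neighboring leaders.
        hits = self.local_lookup(title)
        if not hits:
            for leader in self.other_leaders:
                hits.extend(leader.local_lookup(title))
        return hits


def join(leaders, peer, titles, capacity=3):
    """Bootstrap: assign `peer` to a leader with spare room, or make it a new leader."""
    for leader in leaders:
        if len(leader.children) < leader.capacity:
            leader.children[peer] = set(titles)
            return leader
    new_leader = GroupLeader(peer, capacity)
    new_leader.children[peer] = set(titles)   # a leader also indexes its own files
    for leader in leaders:                    # link it into the leader overlay
        leader.other_leaders.append(new_leader)
        new_leader.other_leaders.append(leader)
    leaders.append(new_leader)
    return new_leader


leaders = []
for peer, titles in [("p1", ["a"]), ("p2", ["b"]), ("p3", ["c"]), ("p4", ["a"])]:
    join(leaders, peer, titles, capacity=2)
print([l.name for l in leaders])   # ['p1', 'p3']
print(leaders[0].query("c"))       # ['p3']
```

With `capacity=2`, p1 and p3 become leaders; a query for "c" at p1 misses in p1's own index and is answered by the neighboring leader p3.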

9 Unstructured P2P File Sharing
Centralized: Napster
Distributed: Gnutella, KaZaA

10 Napster: how does it work?
Application-level, client-server protocol over point-to-point TCP.
Centralized directory server.
Steps:
1) Connect to the Napster server.
2) Upload your list of files to the server.
3) Give the server keywords to search the full list with.
4) Select the “best” of the matching answers: ping each candidate peer and choose by RTT.
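The keyword search and RTT-based selection steps can be sketched in Python, assuming case-insensitive substring matching of every keyword against the uploaded titles and an `rtt` function standing in for the ping (here a lookup table of made-up measurements). The function names and data are hypothetical.

```python
def search(directory, keywords):
    """Return (peer, title) pairs whose title contains every keyword.

    directory: dict mapping a peer address to the titles it uploaded
    """
    wanted = [k.lower() for k in keywords]
    return [(peer, title)
            for peer, titles in directory.items()
            for title in titles
            if all(k in title.lower() for k in wanted)]


def best_match(matches, rtt):
    """Pick the match whose peer answers the ping fastest; None if no matches."""
    return min(matches, key=lambda m: rtt(m[0]), default=None)


directory = {"10.0.0.2": ["Hey Jude", "Let It Be"],
             "10.0.0.5": ["hey jude (live)"],
             "10.0.0.9": ["Yesterday"]}
measured_rtt = {"10.0.0.2": 0.080, "10.0.0.5": 0.012}   # seconds, made up
matches = search(directory, ["Hey", "Jude"])
print(best_match(matches, measured_rtt.get))   # ('10.0.0.5', 'hey jude (live)')
```

Only peers that actually matched are pinged, so the selection costs one round trip per candidate rather than one per registered peer.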

11 Gnutella
Open source; links are TCP connections.
A peer sends a query to each of its neighbors; each receiving peer forwards the search query to its own neighbors; once the file is found, the response follows the search path backward to the querier; the file transfer itself is point-to-point.
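Routing a response back along the search path only requires each peer to remember which neighbor delivered a query. Below is a minimal Python sketch, assuming each query carries an identifier and that messages are delivered by direct method calls (a depth-first simplification of real asynchronous forwarding); `Peer`, `upstream`, `route_hit`, `link`, and the peer Carol are hypothetical.

```python
class Peer:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.upstream = {}   # query id -> neighbor that delivered it (None at the querier)
        self.results = []    # (holder, path) pairs that reached this peer as querier

    def receive_query(self, query_id, title, sender):
        if query_id in self.upstream:
            return                                   # duplicate copy: drop it
        self.upstream[query_id] = sender
        if title in self.files:
            self.route_hit(query_id, self.name, [self.name])
        for n in self.neighbors:
            if n is not sender:
                n.receive_query(query_id, title, self)

    def route_hit(self, query_id, holder, path):
        back = self.upstream[query_id]
        if back is None:
            self.results.append((holder, path))      # the hit reached the querier
        else:
            back.route_hit(query_id, holder, path + [back.name])


def link(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


alice, bob, carol = Peer("Alice"), Peer("Bob"), Peer("Carol", ["Hey Jude"])
link(alice, bob)
link(bob, carol)
alice.receive_query(1, "Hey Jude", None)
print(alice.results)   # [('Carol', ['Carol', 'Bob', 'Alice'])]
```

The hit message travels Carol to Bob to Alice using only per-query state, while the file itself would then be fetched point-to-point from Carol.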

12 Gnutella (cont’d)
Decentralized searching for files: the central directory server is no longer the bottleneck, and it is more difficult to “pull the plug”.
Each application instance serves to:
store selected files;
route queries from and to its neighboring peers;
respond to queries if the file is stored locally;
serve files.

13 Gnutella history
3/14/00: released by AOL, almost immediately withdrawn.
Became open source.
Many iterations to fix the poor initial design (the poor design turned many people off).
Issues:
How much traffic does one query generate?
How many hosts can it support at once?
What is the latency associated with querying?
Is there a bottleneck?

14 Gnutella: limited scope query
Searching by flooding: if you don’t have the file you want, query 7 of your neighbors. If they don’t have it, they contact 7 of their neighbors, up to a maximum hop count of 10.
Responses (not files) use reverse-path forwarding, which is useful for saving TCP connections.
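The fan-out of 7 and hop limit of 10 bound how much traffic a single query can generate, one of the open issues raised in the Gnutella history. Assuming, as a worst case, that every peer forwards to 7 peers that have not yet seen the query (so the flood forms a tree with no overlap), hop k reaches 7^k peers, and the total number of query messages is a geometric sum:

```latex
N_{\max} \;=\; \sum_{k=1}^{10} 7^{k} \;=\; \frac{7\,(7^{10}-1)}{7-1} \;=\; \frac{7^{11}-7}{6} \;=\; 329{,}554{,}456
```

This is an upper bound rather than an estimate: in a real overlay, neighbors overlap, so many of these messages reach peers that have already seen the query and are discarded.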

15 Gnutella in practice
Gnutella traffic << KaZaA traffic.
Anecdotal: couldn’t find anything; downloads wouldn’t complete.
Fixes: do what KaZaA is doing: hierarchy, queue management, parallel download, …
A good source for technical information and open questions about Gnutella:

16 KaZaA: Technology
Software: proprietary.
Control data (including queries and responses) is encrypted; file transfer is not encrypted.
The KaZaA Web site gives a few hints; some studies are described on the Web.
Everything is carried in HTTP request and response messages.
Architecture: hierarchical, a cross between Napster and Gnutella.

17 KaZaA: The service (2)
The user can configure the maximum number of simultaneous uploads and the maximum number of simultaneous downloads.
Queue management at server and client: frequent uploaders can get priority in the server queue.
Keyword search: the user can configure “up to x” responses to keywords.
Responses to keyword queries come in waves; the search stops when x responses are found.
From the user’s perspective, the service resembles Google, but provides links to MP3s and videos rather than Web pages.
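Queue management with priority for frequent uploaders can be sketched with a heap. Below is a minimal Python sketch under an assumed policy (requesters with more past uploads are served first, ties broken by arrival order); KaZaA's actual policy is proprietary, and `UploadQueue` and its methods are hypothetical.

```python
import heapq
import itertools


class UploadQueue:
    """Server-side upload queue that favors requesters who upload a lot (assumed policy)."""

    def __init__(self, max_uploads):
        self.max_uploads = max_uploads   # user-configured limit on simultaneous uploads
        self.active = set()              # requesters currently being served
        self._waiting = []               # heap of (-uploads_done, arrival_no, requester)
        self._arrival = itertools.count()

    def request(self, requester, uploads_done):
        # heapq is a min-heap, so negating the count serves frequent uploaders first;
        # the arrival counter breaks ties in first-come, first-served order.
        heapq.heappush(self._waiting, (-uploads_done, next(self._arrival), requester))
        self._admit()

    def finish(self, requester):
        self.active.discard(requester)
        self._admit()

    def _admit(self):
        while self._waiting and len(self.active) < self.max_uploads:
            _, _, requester = heapq.heappop(self._waiting)
            self.active.add(requester)


q = UploadQueue(max_uploads=1)
q.request("alice", uploads_done=0)    # a slot is free, so alice starts immediately
q.request("bob", uploads_done=2)
q.request("carol", uploads_done=50)
q.finish("alice")
print(q.active)   # {'carol'}
```

Although bob asked before carol, carol's larger upload history moves her to the front of the queue when alice's slot frees up.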

18 KaZaA: Architecture
Each peer is either a supernode or an ordinary node (assigned to one supernode).
Each supernode is connected to many other supernodes (the supernode overlay).
Nodes that have more connection bandwidth and are more available are designated as supernodes.
Ordinary nodes to supernodes: m:1. Supernode to supernode: m:n.
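The supernode designation rule can be sketched in Python. The bandwidth and uptime thresholds, the least-loaded assignment of ordinary nodes, and the full mesh among supernodes are all assumptions chosen for a small example (the slide says only that each supernode connects to many other supernodes); `build_two_tier` is a hypothetical name.

```python
def build_two_tier(peers, min_bandwidth_kbps, min_uptime):
    """Split peers into supernodes and ordinary nodes; attach each ordinary node
    to exactly one supernode (m:1).

    peers: dict name -> (bandwidth_kbps, uptime_fraction)
    """
    supers = sorted(name for name, (bw, up) in peers.items()
                    if bw >= min_bandwidth_kbps and up >= min_uptime)
    if not supers:
        raise ValueError("no peer qualifies as a supernode")
    ordinary = sorted(set(peers) - set(supers))
    load = {s: 0 for s in supers}
    assignment = {}
    for node in ordinary:
        # Assumed balancing rule: attach to the least-loaded supernode.
        target = min(supers, key=lambda s: (load[s], s))
        assignment[node] = target
        load[target] += 1
    # Supernode overlay (m:n); a full mesh is used here only because the example is tiny.
    overlay = {s: [t for t in supers if t != s] for s in supers}
    return assignment, overlay


peers = {"s1": (1000, 0.90), "s2": (800, 0.95),
         "n1": (56, 0.30), "n2": (128, 0.90), "n3": (2000, 0.10)}
assignment, overlay = build_two_tier(peers, min_bandwidth_kbps=500, min_uptime=0.8)
print(assignment)   # {'n1': 's1', 'n2': 's2', 'n3': 's1'}
```

Peer n3 has the most bandwidth but is rarely online, so under this rule it stays an ordinary node, reflecting the requirement that supernodes be both well connected and available.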

19 KaZaA supernode
Each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants.
Guess: a supernode has (on average) descendants; roughly 10,000 supernodes.
Low-bandwidth or intermittent connections.

20 KaZaA: Overlay maintenance
A list of potential supernodes is included with the software download.
A new peer goes through the list until it finds an operational supernode, connects, and obtains a more up-to-date list.
The node then pings the nodes on the list and connects to the one with the smallest RTT.
If its supernode goes down, the node obtains an updated list and chooses a new supernode.
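The joining procedure can be sketched as a Python function. The callbacks `is_up`, `fetch_list`, and `rtt` are hypothetical stand-ins for network operations (a reachability check, downloading the fresh list, and a ping), injected so the sketch runs without a network; the addresses are made up.

```python
def choose_supernode(bootstrap_list, is_up, fetch_list, rtt):
    """Return the supernode an ordinary node should connect to.

    bootstrap_list:   supernode addresses shipped with the software
    is_up(addr):      True if that supernode is operational
    fetch_list(addr): a more up-to-date supernode list from a live supernode
    rtt(addr):        measured round-trip time, or None if the ping fails
    """
    for addr in bootstrap_list:
        if is_up(addr):
            candidates = fetch_list(addr)
            break
    else:
        raise ConnectionError("no operational supernode on the bootstrap list")
    timings = [(rtt(a), a) for a in candidates]
    reachable = [(t, a) for t, a in timings if t is not None]
    if not reachable:
        raise ConnectionError("no supernode on the updated list answered")
    return min(reachable)[1]          # smallest RTT wins


up = {"old2", "s4", "s5"}
rtts = {"s4": 0.090, "s5": 0.020}
print(choose_supernode(["old1", "old2"], lambda a: a in up,
                       lambda a: ["s4", "s5", "s6"], rtts.get))   # s5
```

For the failover case, the node would call the same function again, passing its last known supernode list as `bootstrap_list`.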

21 KaZaA queries
A node first sends a query (keyword) to its supernode, and the supernode responds with matches.
If x matches are found, done. Otherwise, the supernode forwards the query to a subset of supernodes.
If a total of x matches are found, done. Otherwise, the query is forwarded further.
Probably by the original supernode rather than recursively? It is not clear whether the original supernode queries other supernodes until x matches are found, or whether it uses a recursive query.
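One way to realize the iterative reading suggested on the slide (the original supernode does all the forwarding, in waves, until it has x matches) is sketched below in Python. The wave size `fanout`, the assumption that each queried supernode also reports its neighbor list, and all names and data are hypothetical; the recursive alternative is not shown.

```python
from collections import deque


def iterative_query(home, index, neighbors, keyword, x, fanout=1):
    """Collect up to x matches for `keyword`, starting at supernode `home`.

    index:     supernode -> {title: [ordinary nodes holding it]}
    neighbors: supernode -> list of neighboring supernodes
    Returns (matches, number of supernodes asked).
    """
    def matches_at(s):
        return [(title, holder)
                for title, holders in index[s].items()
                if keyword.lower() in title.lower()
                for holder in holders]

    results = matches_at(home)          # the node's own supernode answers first
    asked = {home}
    pending = deque(neighbors[home])
    while len(results) < x and pending:
        wave = []                       # next wave: up to `fanout` new supernodes
        while pending and len(wave) < fanout:
            s = pending.popleft()
            if s not in asked:
                asked.add(s)
                wave.append(s)
        for s in wave:
            results.extend(matches_at(s))
            pending.extend(neighbors[s])   # assumed: each reply lists s's neighbors
    return results[:x], len(asked)


index = {"S1": {"Hey Jude": ["n1"]},
         "S2": {"Hey Jude (live)": ["n2"], "Yesterday": ["n3"]},
         "S3": {"hey jude remix": ["n4"]},
         "S4": {"Hey Jude": ["n5"]}}
neighbors = {"S1": ["S2", "S3"], "S2": ["S1", "S4"], "S3": ["S1"], "S4": ["S2"]}
found, asked = iterative_query("S1", index, neighbors, "hey jude", x=3)
print(asked)   # 3
```

With x=3 the search stops after three supernodes and S4 is never contacted, matching the "responses come in waves; stops when x responses are found" behavior described for the service.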

