Peer to Peer File Sharing: A Survey, by Ismail Guvenc and Juan Jose Urdaneta


1 Peer to Peer File Sharing: A Survey Ismail Guvenc and Juan Jose Urdaneta

2 Outline
Peer-to-Peer concept
Overview of P2P systems:
–Napster, Gnutella, Freenet, Free Haven, OceanStore, PAST, Farsite, Publius, CFS, Tapestry, Pastry, Chord, CAN, and others
Comparison of the systems
Conclusion

3 What is Peer-to-Peer?
Every node is designed to (but may not, by user choice) provide some service that helps other nodes in the network get service
Each node potentially has the same responsibility
Sharing can take different forms:
–CPU cycles:
–Storage space: Napster, Gnutella, Freenet…

4 P2P: Why so attractive?
Peer-to-peer applications have seen explosive growth in recent years, owing to:
–Low cost and high availability of large numbers of computing and storage resources
–Increased network connectivity
»As long as these factors keep their importance, peer-to-peer applications will continue to gain importance

5 Main Design Goals of P2P systems
Ability to operate in a dynamic environment
Performance and scalability
Reliability
Anonymity: Freenet, Free Haven, Publius
Accountability: Free Haven, Farsite

6 First generation P2P routing and location schemes
Napster, Gnutella, Freenet…
Intended for large-scale sharing of data files
Reliable content location was not guaranteed
Self-organization and scalability: two issues to be addressed

7 Second generation P2P systems
Pastry, Tapestry, Chord, CAN…
They guarantee a definite answer to a query in a bounded number of network hops.
They form a self-organizing overlay network.
They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.

8 Napster
Application-level, client-server protocol over point-to-point TCP; a centralized system
Retrieval: four steps
1. Connect to the Napster server
2. Upload your list of files (push) to the server
3. Give the server keywords to search the full list with
4. Select the “best” of the correct answers (via pings)
Centralized server: single logical point of failure; can load-balance among servers using DNS rotation; potential for congestion; Napster “in control” (freedom is an illusion)
No security: passwords in plain text, no authentication, no anonymity
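
The four retrieval steps above can be simulated with a toy centralized index. Class and method names here are illustrative, not the actual Napster protocol messages.

```python
# Toy simulation of Napster's centralized index (illustrative names only).

class NapsterIndex:
    """Central server: maps filenames to the peers that hold them."""
    def __init__(self):
        self.catalog = {}  # filename -> set of peer addresses

    def upload_list(self, peer, files):
        # Step 2: a peer pushes its file list to the server.
        for f in files:
            self.catalog.setdefault(f, set()).add(peer)

    def search(self, keyword):
        # Step 3: the server matches the keyword against the full list.
        return {f: sorted(p) for f, p in self.catalog.items() if keyword in f}

def best_peer(candidates, ping_ms):
    # Step 4: the client "pings" candidates and picks the fastest.
    return min(candidates, key=lambda peer: ping_ms[peer])

index = NapsterIndex()
index.upload_list("peer_a", ["song.mp3", "talk.mp3"])
index.upload_list("peer_b", ["song.mp3"])
```

The single `NapsterIndex` instance is exactly the single logical point of failure the slide warns about: lose it and no search works, even though files remain on the peers.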

9 Napster: How it works? (1)
[Diagram] The user uploads a file list to the napster.com server.

10 Napster: How it works? (2)
[Diagram] The user requests a search at the server; the server returns the results.

11 Napster: How it works? (3)
[Diagram] The user pings hosts that apparently have the data, looking for the best transfer rate.

12 Napster: How it works? (4)
[Diagram] The user retrieves the file.

13 Gnutella
Peer-to-peer networking: applications connect to peer applications
Focus: a decentralized method of searching for files
Each application instance serves to:
–store selected files
–route queries (file searches) from and to its neighboring peers
–respond to queries (serve the file) if the file is stored locally
How it works: searching by flooding
–If you don’t have the file you want, query 7 of your partners.
–If they don’t have it, they contact 7 of their partners, up to a maximum hop count of 10.
–Requests are flooded, but there is no tree structure.
–No looping, but packets may be received twice.
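
The flooding search described above can be sketched as a breadth-first walk with a hop limit (TTL). One simplification, flagged in the comments: this sketch drops duplicate copies of a query outright, whereas in real Gnutella a peer may receive the same packet twice.

```python
# Minimal flooding search over a peer graph with a hop limit;
# topology and fan-out are illustrative, not protocol-accurate.
from collections import deque

def flood_search(graph, files, start, wanted, ttl=10):
    """Return the peers within `ttl` hops of `start` that hold `wanted`."""
    seen = {start}              # duplicate suppression: never re-flood a peer
    queue = deque([(start, 0)])
    hits = set()
    while queue:
        peer, hops = queue.popleft()
        if wanted in files.get(peer, ()):
            hits.add(peer)      # this peer would answer the query
        if hops == ttl:
            continue            # TTL exhausted: do not forward further
        for nbr in graph.get(peer, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return hits
```

Raising `ttl` widens the horizon of the search but multiplies traffic, which is why flooding scales poorly.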

14 Freenet (discussed in class before)
Completely anonymous for producers and consumers of information
Resistance to attempts by third parties to deny access to information
Goals:
–Anonymity for producers and consumers
–Deniability for information storers
–Resistance to denial attacks
–Efficient storing and routing
Does NOT provide:
»Permanent file storage
»Load balancing
»Anonymity for general network usage

15 Free Haven
Anonymous
Resists attempts by powerful adversaries to find or destroy data
Goals:
–Anonymity: publishers, readers, servers
–Persistence: lifetime determined by the publisher
–Flexibility: add/remove nodes
–Accountability: reputation
A server gives up space => gets space on other servers

16 Free Haven: Publication
Split the doc into n shares, any k of which can rebuild the file (k < n)
–Large k => smaller shares, brittle file
–Small k => larger shares, more duplication
Generate (SK_doc, PK_doc) and encrypt each share with SK_doc
Store on a server:
–Encrypted share, timestamp, expiration date, hash(PK_doc)
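
The any-k-of-n property above can be illustrated with toy Shamir-style secret sharing over a prime field. This sketch shares a single integer; Free Haven disperses whole documents, and the exact encoding it uses is not specified here.

```python
# Toy k-of-n secret sharing (Shamir-style) over a prime field.
import random

P = 2**61 - 1  # prime modulus; shares are points on a degree-(k-1) polynomial

def split(secret, n, k, rng=None):
    """Produce n shares; any k of them reconstruct `secret`."""
    rng = rng or random.Random(0)  # seeded for a deterministic illustration
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    def f(x):  # evaluate the polynomial at x (Horner's rule)
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def rebuild(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

The k-vs-duplication trade-off on the slide falls out directly: each share here is as large as the secret, and lowering k only adds redundant copies.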

17 Free Haven: Retrieval
Documents are indexed by hash(PK_doc)
The reader generates (PK_client, SK_client) and a one-time remailer reply block
The reader broadcasts hash(PK_doc), PK_client, and the remailer reply block
–To all servers it knows about
–Broadcasts may be queued and bulk-sent
Servers holding shares with hash(PK_doc):
–Encrypt the share with PK_client
–Send it using the remailer reply block

18 Free Haven: Share expiration
Absolute date
“Price” of a file: size × lifetime
Freenet and Mojo Nation favor popular documents
Unsolved problems:
–Large corrupt servers, a list of “discouraged” documents, DoS
Not ready for wide deployment:
–Inefficient communication => few users => weak anonymity

19 PAST & Pastry
Pastry:
–Completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure, and failure of nodes.
–Seeks to minimize the distance messages travel, according to a scalar proximity metric such as the number of IP routing hops.
–In a Pastry network:
»Each node has a unique id, nodeId.
»Presented with a message and a key, a Pastry node efficiently routes the message to the node with the nodeId that is numerically closest to the key.

20 Pastry: NodeId
Leaf set: stores the numerically closest nodeIds
Routing table: entries organized by the length of the prefix shared with the local nodeId and by the next digit of the rest of the nodeId
Neighborhood set: stores the closest nodes according to the proximity metric

21 Pastry: Routing
Given a message, check:
–If the key falls within the range of nodeIds covered by the leaf set, forward directly to the numerically closest node.
–If not, use the routing table to forward the message to a node that shares a longer common prefix with the key.
–If the routing table entry is empty or the node cannot be reached, forward to a node that is numerically closer to the key and shares a prefix at least as long.
Performance:
–Key within the leaf set: O(1)
–Via the routing table: O(log N)
–Worst case: O(N) (under failures)
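
A minimal sketch of the prefix-routing rule above, under simplifying assumptions: each node's routing table holds one entry per (prefix length, next digit) pair, ids are fixed-length hex strings, and the leaf and neighborhood sets are ignored.

```python
# Pastry-style prefix routing over fixed-length hex nodeIds (simplified).

def shared_prefix_len(a, b):
    """Number of leading digits a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def build_table(node, nodes):
    """One routing-table entry per (shared prefix length, next digit)."""
    table = {}
    for other in nodes:
        if other != node:
            l = shared_prefix_len(node, other)
            table.setdefault((l, other[l]), other)
    return table

def route(nodes, start, key):
    """Hop node to node, extending the prefix shared with the key each time."""
    tables = {n: build_table(n, nodes) for n in nodes}
    cur, path = start, [start]
    while shared_prefix_len(cur, key) < len(key):
        l = shared_prefix_len(cur, key)
        nxt = tables[cur].get((l, key[l]))
        if nxt is None:
            break  # real Pastry would fall back to the leaf set here
        cur = nxt
        path.append(cur)
    return path
```

Each hop fixes at least one more digit of the key, which is where the O(log N) bound comes from: with b-bit digits there are only about log_{2^b} N digits to fix.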

22 PAST
PAST: an archival, cooperative file storage and distribution facility
Uses Pastry as its routing scheme
Offers persistent storage services for replicated read-only files
Owners can insert or reclaim files; clients can only look up
A collection of PAST nodes forms an overlay network. A PAST node is at least an access point, but it can optionally also contribute storage and participate in routing.
Security: each node and each user in the system holds a smartcard with an associated private/public key pair
Three operations: insert, lookup, and reclaim

23 Farsite
Farsite is a symbiotic, serverless, distributed file system.
Symbiotic: it works among clients that cooperate but do not completely trust one another.
Main design goals:
–To provide high availability and reliability for file storage
–To provide security and resistance to Byzantine threats
–To have the system automatically configure and tune itself adaptively
Farsite first encrypts the contents of files, which prevents an unauthorized user from reading them; such a user cannot read a file even on his own desktop computer, because of the encryption.
Digital signatures are used to prevent an unauthorized user from writing a file.
After encryption, multiple replicas of the file are made and distributed to several other client machines.

24 Publius
Publius focuses mainly on availability and anonymity. It maintains availability by distributing files as shares over n web servers; j of these shares are enough to reconstruct a file.
To publish a file, first encrypt the document with a key K. Then K is split into n shares, any j of which can rebuild K. K(doc) and one share are sent to each of the n servers. The “name” of the document is the set of addresses of the n servers.
The query operation is basically running a local web proxy, contacting j servers, and rebuilding K.
Since the identities of the servers are not anonymized, an attacker can remove information by forcing the closure of n-j+1 servers.
Publius lacks accountability (DoS with garbage) and smooth join/leave for servers.
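
The publish/query flow above can be sketched with two deliberate simplifications, flagged in the comments: XOR stream encryption stands in for a real cipher, and the key is split n-of-n by XOR rather than any-j-of-n.

```python
# Toy Publius publish/retrieve: encrypt the document with K, split K
# across servers. Simplifications: XOR "encryption" and n-of-n key
# splitting (real Publius rebuilds K from any j of n shares).
import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def publish(doc, n):
    """Return the encrypted document and n key shares, one per server."""
    key = os.urandom(len(doc))
    encrypted = xor_bytes(doc, key)     # K(doc), replicated on every server
    shares = [os.urandom(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = xor_bytes(last, s)
    shares.append(last)                 # XOR of all n shares equals K
    return encrypted, shares

def retrieve(encrypted, shares):
    """Rebuild K from the shares, then decrypt the document."""
    key = shares[0]
    for s in shares[1:]:
        key = xor_bytes(key, s)
    return xor_bytes(encrypted, key)
```

The availability point carries over: in this n-of-n toy, destroying even one server's share loses the key, which is exactly why real Publius uses j-of-n splitting.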

25 Chord
Provides a peer-to-peer hash lookup service: Lookup(key) → IP address
Chord does not store the data itself
Efficient: O(log N) messages per lookup, where N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive changes in membership

26 Chord: Lookup Mechanism
Lookups take O(log N) hops
[Diagram: an identifier ring of nodes N5, N10, N20, N32, N60, N80, N99, N110; a Lookup(K19) query is routed around the ring to the node responsible for key K19.]
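
The key-to-node mapping behind the lookup can be sketched as successor placement on the identifier ring. This linear scan only shows where a key lives; real Chord reaches the successor in O(log N) hops using per-node finger tables.

```python
# Toy Chord ring: nodes and keys share an m-bit identifier space,
# and a key is stored at its successor (first node clockwise).
import hashlib

M = 2**8  # tiny 8-bit identifier space for illustration

def chord_id(name):
    """Hash a name (node address or file name) onto the ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

def successor(node_ids, key_id):
    """First node id >= key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    for n in ring:
        if n >= key_id:
            return n
    return ring[0]  # wrap past the top of the identifier space
```

Because both nodes and keys are hashed into the same space, a joining or leaving node only moves the keys between it and its neighbors, which is what makes membership churn cheap.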

27 Tapestry
Self-administered, self-organized, location-independent, scalable, fault-tolerant
Each node has a neighbor map table with neighbor information.

28 Tapestry Cont.
The system is able to adapt to network changes because its algorithms are dynamic.
This also provides for fault handling.

29 CAN
The network is created in a tree-like form.
Each node is associated with one node in the upper level and with a group in the lower level.
A query travels from the uppermost level down through the network until a match is found or until it reaches the lowermost level.
For this query model, scalability is an issue.

30 CAN Cont.
[Diagram: the tree-like network.]

31 OceanStore
A decentralized but monitored system
Built with untrusted peers (for data storage) and nomadic data in mind
Monitoring allows proactive movement of data
Uses replication and caching
Two lookup methods are used: a fast probabilistic one and a slow deterministic one

32 MojoNation
Centralized: a central service broker plus many peers
When a file is inserted, it is hashed; this hash is the file’s unique identifier
Uses fragmentation and replication (50%)
Load balancing is an issue, since the system is market-based
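
Hashing a file's bytes to get its unique identifier, as in the second bullet, is a one-liner; SHA-256 is an arbitrary choice of hash function here, not a claim about what MojoNation used.

```python
# Content addressing: a file's identifier is the hash of its bytes,
# so identical content always maps to the same ID.
import hashlib

def file_id(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()
```

A side effect worth noting: duplicate insertions of the same content collide on the same ID, so the system can deduplicate storage for free.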

33 Panther
Based on the Chord lookup algorithms
Chord’s capabilities are used for load balancing and replication
Files and file chunks are identified by keys generated through Chord’s hash system
Replication, fragmentation, and caching are used
Authentication is provided through the use of public and private keys

34 Eternity
Provides “eternal” storage capabilities
Protects data even from its publisher and from systems administrators
Fragmentation and replication are proposed
Anonymity is used to protect the data

35 Others
–The Ohaha system uses a consistent-hashing-like algorithm to map documents to nodes. Query routing is like that in Freenet, which brings along some of Freenet’s weaknesses.
–The Rewebber maintains a measure of anonymity for producers of web information by means of an encrypted URL service. TAZ extends Rewebber by using chains of nested encrypted URLs that successively point to different Rewebber services to be contacted.
–Intermemory and INDIA are two cooperative systems in which files are divided into redundant shares and distributed among many servers. They are intended for long-term archival storage along the lines of Eternity.
–The xFS file system focuses on supporting distributed applications on workstations interconnected by a very high-performance network, providing high availability and reliability.
–Frangipani is a file system built on the Petal distributed virtual disk, providing high availability and reliability like xFS through distributed RAID semantics. Unlike xFS, Petal provides support for transparently adding, deleting, or reconfiguring servers.
–GNUnet is free software, available to the general public under the GNU General Public License (GPL). As opposed to Napster and Gnutella, GNUnet was designed with security in mind as the highest priority.

36 Comparison

37 File Storage Systems Comparison
[Table: CFS, MojoNation, OceanStore, and Panther compared on replication, decentralization, fragmentation, and caching.]

38 Conclusion
Issues that still need to be addressed: caching, versioning, fragmentation, replication
Copyright laws!
The technology is very promising; it will probably be commonplace in the near future.

