
1 Peer to Peer Technologies Roy Werber Idan Gelbourt prof. Sagiv’s Seminar The Hebrew University of Jerusalem, 2001

2 Lecture Overview  1st Part:  The P2P communication model, architecture and applications  2nd Part:  Chord and CFS

3 Peer to Peer - Overview  A class of applications that takes advantage of resources:  Storage, CPU cycles, content, human presence  Available at the edges of the Internet  A decentralized system that must cope with the unstable nature of computers located at the network edge

4 Client/Server Architecture  An architecture in which each process is a client or a server  Servers are powerful computers dedicated to providing services – storage, traffic, etc.  Clients rely on servers for resources

5 Client/Server Properties  Big, strong server  Well known port/address of the server  Many to one relationship  Different software runs on the client/server  Client can be dumb (lacks functionality), server performs for the client  Client usually initiates connection

6 Client/Server Architecture  (Diagram: many clients connected to a single server over the Internet)

7 Client/Server Architecture  (Diagram: the client sends “GET /index.html HTTP/1.0”; the server replies “HTTP/1.1 200 OK...”)

8 Disadvantages of C/S Architecture  Single point of failure  Strong expensive server  Dedicated maintenance (a sysadmin)  Not scalable - more users, more servers

9 Solutions  Replication of data (several servers)  Problems: redundancy, synchronization, expensive  Brute force (a bigger, faster server)  Problems: not scalable, expensive, single point of failure

10 The Client Side  Although the model hasn’t changed over the years, the entities in it have  Today’s clients can perform more roles than just forwarding users’ requests  Today’s clients have:  More computing power  Storage space

11 Thin Client  Performs simple tasks:  I/O  Properties:  Cheap  Limited processing power  Limited storage

12 Fat Client  Can perform complex tasks:  Graphics  Data manipulation  Etc…  Properties:  Strong computation power  Bigger storage  More expensive than thin

13 Evolution at the Client Side  ’70s: DEC’s VT100 – no storage  ’80s: IBM PC @ 4.77MHz – 360KB diskettes  2001: a PC @ 2GHz – 40GB HD

14 What Else Has Changed?  The number of home PCs is increasing rapidly  PCs have dynamic IPs  Most of the PCs are “fat clients”  Software cannot keep up with hardware development  As Internet usage grows, more and more PCs are connecting to the global net  Most of the time PCs are idle  How can we use all this?

15 Sharing  Definition: 1. To divide and distribute in shares 2. To partake of, use, experience, occupy, or enjoy with others 3. To grant or give a share in (intransitive senses) – Merriam-Webster’s online dictionary (www.m-w.com)  A co-operative network has a direct advantage over a single computer

16 Resources Sharing  What can we share?  Computer resources  Shareable computer resources:  “CPU cycles” - seti@home  Storage - CFS  Information - Napster / Gnutella  Bandwidth sharing - Crowds

17 SETI@Home  SETI – Search for ExtraTerrestrial Intelligence  @Home – On your own computer  A radio telescope in Puerto Rico scans the sky for radio signals  Fills a DAT tape of 35GB in 15 hours  That data has to be analyzed

18 SETI@Home (cont.)  The problem – analyzing the data requires a huge amount of computation  Even a supercomputer cannot finish the task on its own  Accessing a supercomputer is expensive  What can be done?

19 SETI@Home (cont.)  Can we use distributed computing?  YEAH  Fortunately, the problem can be solved in parallel - examples:  Analyzing different parts of the sky  Analyzing different frequencies  Analyzing different time slices

20 SETI@Home (cont.)  The data can be divided into small segments  A PC is capable of analyzing a segment in a reasonable amount of time  An enthusiastic UFO searcher will lend his spare CPU cycles for the computation  When? Screensavers

21 SETI@Home - Example

22 SETI@Home - Summary  SETI reverses the C/S model  Clients can also provide services  Servers can be weaker, used mainly for storage  Distributed peers serving the center  Not yet P2P but we’re close  Outcome - great results:  Thousands of unused CPU hours tamed for the mission  3+ million users

23 What Exactly is P2P?  A distributed communication model with the properties:  All nodes have identical responsibilities  All communication is symmetric

24 P2P Properties  Cooperative, direct sharing of resources  No central servers  Symmetric clients  (Diagram: clients connected directly to each other over the Internet)

25 P2P Advantages  Harnesses client resources  Scales with new clients  Provides robustness under failures  Redundancy and fault-tolerance  Immune to DoS  Load balance

26 P2P Disadvantages -- A Tough Design Problem  How do you handle a dynamic network (nodes join and leave frequently)?  A number of constraints and uncontrolled variables:  No central servers  Clients are unreliable  Clients vary widely in the resources they provide  Heterogeneous network (different platforms)

27 Two Main Architectures  Hybrid Peer-to-Peer  Preserves some of the traditional C/S architecture. A central server links between clients, stores indices tables, etc  Pure Peer-to-Peer  All nodes are equal and no functionality is centralized

28 Hybrid P2P  A main server is responsible for various administrative operations:  Users’ login and logout  Storing metadata  Directing queries  Example: Napster

29 Examples - Napster  Napster is a program for sharing information (mp3 music files) over the Internet  Created by Shawn Fanning in 1999 although similar services were already present (but lacked popularity and functionality)

30 Napster Sharing Style: hybrid center+edge
User libraries: “beastieboy” – song1.mp3, song2.mp3, song3.mp3; “kingrook” – song4.mp3, song5.mp3, song6.mp3; “slashdot” – song5.mp3, song6.mp3, song7.mp3
1. Users launch Napster and connect to the Napster server
2. Napster creates a dynamic directory from users’ personal .mp3 libraries:
Title      User        Speed
song1.mp3  beastieboy  DSL
song2.mp3  beastieboy  DSL
song3.mp3  beastieboy  DSL
song4.mp3  kingrook    T1
song5.mp3  kingrook    T1
song5.mp3  slashdot    28.8
song6.mp3  kingrook    T1
song6.mp3  slashdot    28.8
song7.mp3  slashdot    28.8
3. beastieboy enters search criteria (song5)
4. Napster displays matches to beastieboy
5. beastieboy makes a direct connection to kingrook for the file transfer

31 What About Communication Between Servers?  Each Napster server creates its own mp3 exchange community:  rock.napster.com, dance.napster.com, etc…  This creates a separation between communities, which is bad  We would like multiple servers to share a common ground  That reduces the centralized nature of each server and expands searchability

32 Various HP2P Models – 1. Chained Architecture  Chained architecture – a linear chain of servers  Clients login to a random server  Queries are submitted to the server  If the server satisfies the query – Done  Otherwise – Forward the query to the next server  Results are forwarded back to the first server  The server merges the results  The server returns the results to the client  Used by OpenNap network
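
A minimal sketch of the chained query flow described above, in Python; the class and method names are illustrative and not OpenNap’s actual protocol.

```python
# Illustrative sketch of a chained-architecture lookup: each server searches
# its local index; on a miss, the query moves to the next server in the chain,
# and the result flows back to the server the client first contacted.

class ChainedServer:
    def __init__(self, name, local_index, next_server=None):
        self.name = name
        self.local_index = local_index      # e.g. {"song5.mp3": "kingrook"}
        self.next_server = next_server      # next link in the chain, or None

    def search(self, query):
        if query in self.local_index:       # satisfied locally: done
            return {query: self.local_index[query]}
        if self.next_server is not None:    # otherwise forward down the chain
            return self.next_server.search(query)
        return {}                           # end of chain, nothing found

# Three servers chained together; the client logs in to s1.
s3 = ChainedServer("s3", {"song7.mp3": "slashdot"})
s2 = ChainedServer("s2", {"song5.mp3": "kingrook"}, next_server=s3)
s1 = ChainedServer("s1", {"song1.mp3": "beastieboy"}, next_server=s2)

print(s1.search("song7.mp3"))   # forwarded twice, answered by s3
```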

33 2. Full Replication Architecture  Replication of constantly updated metadata  A client logs on to a random server  The server sends the updated metadata to all servers  Result:  All servers can answer queries immediately

34 3. Hash Architecture  Each server holds a portion of the metadata  Each server holds the complete inverted list for a subset of all words  Client directs a query to a server that is responsible for at least one of the keywords  That server gets the inverted lists for all the keywords from the other servers  The server returns the relevant results to the client
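
A toy sketch of the hash architecture, assuming keywords are assigned to servers by hashing; the SHA-1-modulo assignment and the in-memory “servers” are illustrative only.

```python
# Toy sketch of the hash architecture: each "server" owns the complete inverted
# list for the keywords that hash to it. Keywords are assigned to servers with
# SHA-1 modulo the number of servers; in-memory dicts stand in for the servers.
import hashlib

NUM_SERVERS = 4
inverted_lists = [dict() for _ in range(NUM_SERVERS)]   # one index per server

def server_for(word):
    digest = hashlib.sha1(word.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_SERVERS

def publish(filename, keywords):
    for word in keywords:
        inverted_lists[server_for(word)].setdefault(word, set()).add(filename)

def query(keywords):
    # The server handling the query gathers the inverted lists for the other
    # keywords (here by reading the other servers' dicts) and intersects them.
    result = None
    for word in keywords:
        files = inverted_lists[server_for(word)].get(word, set())
        result = files if result is None else result & files
    return result or set()

publish("song5.mp3", ["rock", "guitar"])
publish("song6.mp3", ["rock", "dance"])
print(query(["rock", "guitar"]))   # {'song5.mp3'}
```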

35 4. Unchained Architecture  Independent servers which do not communicate with each other  A client who logs on to one server can only see the files of other users at the same local server  A clear disadvantage of separating users into distinct domains  Used by Napster

36 Pure P2P  All nodes are equal  No centralized server  Example: Gnutella

37 Gnutella  A completely distributed P2P network  The Gnutella network is composed of clients  Client software is made of two parts:  A mini search engine – the client  A file serving system – the “server”  Relies on broadcast search

38 Gnutella - Operations  Connect – establishing a logical connection  PingPong – discovering new nodes (my friend’s friends)  Query – look for something  Download – download files (simple HTTP)

39 Gnutella – Form an Overlay  (Diagram: a new node sends Connect and gets OK, then exchanges Ping and Pong messages to discover further nodes)

40 How to find a node?  Initially, ad hoc ways  Email, online chat, news groups…  Bottom line: you have to know someone!  Set up some long-lived nodes  A newcomer contacts the well-known nodes  Useful for building a better overlay topology

41 Gnutella – Search  (Diagram: a node broadcasts a query for a green toad; neighbors that have one answer “I have”; Toad A looks nice, Toad B is too far, so the requester picks A)

42 On a larger scale, things get more complicated

43 Gnutella – Scalability Issue  Can the system withstand flooding from every node?  Use TTL to limit the range of propagation  With ~5 neighbors and TTL 5, 5^5 = 3125 – how many nodes can you reach?  This creates a “horizon” of computers  The premise is that your horizon changes each time you log in
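
A back-of-the-envelope check of the 5^5 figure, assuming each node forwards to roughly five neighbours; overlap between neighbourhoods is ignored, so these are upper bounds.

```python
# Reach of a TTL-limited flood, ignoring overlap between neighbourhoods: with
# `degree` neighbours per node and a TTL of `ttl`, a query can touch at most
# degree + degree^2 + ... + degree^ttl other nodes.
def flood_reach(degree=5, ttl=5):
    return sum(degree ** hop for hop in range(1, ttl + 1))

print(5 ** 5)             # 3125  - nodes reached at the last hop alone
print(flood_reach(5, 5))  # 3905  - total upper bound: still a limited "horizon"
```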

44 The Differences  While the pure P2P model is completely symmetric, in the hybrid model elements of both PP2P and C/S coexist  Each model has its disadvantages  PP2P still has problems locating information  HP2P has scalability problems, as with ordinary server-oriented models

45 P2P – Summary  The current environment has allowed P2P to enter the world of PCs  It dominates the niche of resource sharing  The model is being studied from both the academic and the commercial point of view  There are still problems out there…

46 End Of Part I

47 Part II  Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications  Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan  MIT and Berkeley  Presented by Roy Werber and Idan Gelbourt

48 A P2P Problem  Every application in a P2P environment must handle an important problem: The lookup problem  What is the problem?

49 A Peer-to-peer Storage Problem  1000 scattered music enthusiasts  Willing to store and serve replicas  How do you find the data?

50 The Lookup Problem  (Diagram: nodes N1…N6 connected over the Internet; a publisher inserts Key=“title”, Value=MP3 data…; a client calls Lookup(“title”))  In a dynamic network with N nodes, how can the data be found?

51 Centralized Lookup (Napster)  (Diagram: the publisher at N4 registers SetLoc(“title”, N4) with a central DB; the client asks the DB with Lookup(“title”) and fetches Key=“title”, Value=MP3 data… directly from N4)  Simple, but O(N) state at the server and a single point of failure  Hard to keep the data in the server updated

52 Flooded Queries (Gnutella)  (Diagram: the client’s Lookup(“title”) is flooded from node to node until it reaches the publisher at N4, which holds Key=“title”, Value=MP3 data…)  Robust, but worst case O(N) messages per lookup  Not scalable

53 So Far  Centralized: - Table size – O(n) - Number of hops – O(1)  Flooded queries: - Table size – O(1) - Number of hops – O(n)

54 We Want  Efficiency: O(log(N)) messages per lookup  N is the total number of servers  Scalability: O(log(N)) state per node  Robustness: surviving massive failures

55 How Can It Be Done?  How do you search in O(log(n)) time?  Binary search  You need an ordered array  How can you order nodes in a network and data items?  Hash function!

56 Chord: Namespace  The namespace is a fixed-length bit string  Each object is identified by a unique ID  How to get the ID? SHA-1  (Example: SHA-1(“Shark”) → Object ID DE11AC; SHA-1(“194.90.1.5:8080”) → Object ID AABBCC)

57 Chord Overview  Provides just one operation:  A peer-to-peer hash lookup:  Lookup(key) → IP address  Chord does not store the data  Chord is a lookup service, not a search service  It is a building block for P2P applications

58 Chord IDs  Uses Hash function:  Key identifier = SHA-1(key)  Node identifier = SHA-1(IP address)  Both are uniformly distributed  Both exist in the same ID space  How to map key IDs to node IDs?
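
A minimal sketch of how such identifiers can be computed with SHA-1; the ID space is truncated to 7 bits here purely for readability, whereas Chord uses the full 160-bit digest.

```python
# Minimal sketch of Chord-style IDs: keys and node addresses are hashed with
# SHA-1 into one circular ID space.
import hashlib

M = 7                       # bits in the toy identifier space (2^7 = 128 IDs)

def chord_id(text, m=M):
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

print(chord_id("Shark"))            # key identifier  = SHA-1(key)
print(chord_id("194.90.1.5:8080"))  # node identifier = SHA-1(IP address)
```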

59 Mapping Keys To Nodes  (Diagram: a circular ID space from 0 to M, with items and nodes placed on the same ring)

60 Consistent Hashing [Karger 97]  (Diagram: a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80)  A key is stored at its successor: the node with the next higher ID
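
A small sketch of the successor rule on a toy 7-bit ring, using the node IDs from this slide.

```python
# The successor rule: a key is stored at the first node whose ID is greater
# than or equal to the key ID, wrapping past the top of the ring.
import bisect

def successor(node_ids, key_id, id_space=2 ** 7):
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, key_id % id_space)
    return nodes[i % len(nodes)]        # wrap around to the smallest node ID

nodes = [32, 90, 105]                   # the N32 / N90 / N105 ring above
for key in (5, 20, 80):
    print(f"K{key} -> N{successor(nodes, key)}")
# K5 -> N32, K20 -> N32, K80 -> N90
```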

61 Basic Lookup  (Diagram: nodes N10, N32, N60, N90, N105, N120 on the ring; the question “Where is key 80?” is passed around the ring until the answer “N90 has K80” is returned)

62 “Finger Table” Allows Log(n)-time Lookups  (Diagram: N80’s fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the circular 7-bit ID space)  N80 knows of only seven other nodes.

63 Finger i Points to the Successor of N+2^i  (Diagram: N80’s finger for 80 + 2^5 = 112 points to N120, the successor of 112)

64 Lookups Take O(log(n)) Hops  (Diagram: on a ring with N5, N10, N20, N32, N60, N80, N99, N110, a Lookup(K19) follows progressively closer fingers until it reaches K19’s successor, N20)
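
A toy simulation of the last few slides: finger i of node n is the successor of n + 2^i, and each hop jumps to the closest finger preceding the key. The ring and node IDs are taken from the slides; routing is done in memory (no RPCs), which is a simplification.

```python
# Finger tables and the O(log n) lookup rule on a toy 7-bit ring.
import bisect

M = 7
RING = 2 ** M
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])   # the ring from the slide

def successor(key):
    """First node whose ID is >= key, wrapping around the ring."""
    i = bisect.bisect_left(NODES, key % RING)
    return NODES[i % len(NODES)]

def finger_table(n):
    """finger[i] of node n is the successor of (n + 2^i)."""
    return [successor(n + 2 ** i) for i in range(M)]

def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    return (a < x < b) if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(start, key):
    """Route a lookup for `key` from node `start`; return (owner, hops)."""
    key %= RING
    n, hops = start, 0
    while not in_half_open(key, n, successor(n + 1)):
        # jump to the closest finger that precedes the key
        nxt = next((f for f in reversed(finger_table(n)) if in_open(f, n, key)), n)
        if nxt == n:               # no closer finger: the successor is the owner
            break
        n, hops = nxt, hops + 1
    return successor(n + 1), hops

print(finger_table(80))            # [99, 99, 99, 99, 99, 5, 20]
print(lookup(80, 19))              # (20, 2): K19 is owned by N20
```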

65 Joining: Linked List Insert  (Diagram: ring segment N25 → N40, where N40 holds K30 and K38)  1. N36 wants to join: it performs Lookup(36) and finds its successor, N40

66 Join (2)  2. N36 sets its own successor pointer to N40

67 Join (3)  3. Keys in the range 26..36 (here K30) are copied from N40 to N36

68 Join (4)  4. N25’s successor pointer is set to N36  Finger pointers are updated in the background  Correct successors produce correct lookups
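
A toy sketch of the four join steps above on an in-memory ring; real Chord finds the successor with a lookup from any known node and handles wrap-around, which this sketch skips.

```python
# N36 finds its successor (N40), points to it, takes over the keys in (25, 36]
# from N40, and finally N25's successor pointer is switched to N36.
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = None
        self.keys = {}                               # key ID -> stored value

def join(new, predecessor):
    succ = predecessor.successor                     # 1. find the successor
    new.successor = succ                             # 2. set the new node's pointer
    moved = {k: v for k, v in succ.keys.items()      # 3. copy the keys it now owns
             if predecessor.id < k <= new.id}        #    (wrap-around ignored here)
    new.keys.update(moved)
    for k in moved:
        del succ.keys[k]
    predecessor.successor = new                      # 4. update the predecessor

n25, n40 = Node(25), Node(40)
n25.successor, n40.successor = n40, n25
n40.keys = {30: "K30", 38: "K38"}

n36 = Node(36)
join(n36, n25)
print(n36.keys, n40.keys)    # {30: 'K30'} {38: 'K38'}
```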

69 Join: Lazy Finger Update Is OK  (Diagram: N2’s finger should now point to N36, not N40)  A Lookup(K30) that follows the stale finger visits only nodes < 30 and will undershoot; correct successor pointers then complete the lookup

70 Failures Might Cause Incorrect Lookup  (Diagram: N10 issues Lookup(90) toward N80, whose immediate successors have failed)  N80 doesn’t know its correct successor, so the lookup is incorrect

71 Solution: Successor Lists  Each node knows its r immediate successors  After a failure, it will know the first live successor  Correct successors guarantee correct lookups  The guarantee holds with some probability

72 Choosing the Successor List Length  Assume 1/2 of the nodes fail  P(successor list all dead) = (1/2)^r  i.e. P(this node breaks the Chord ring)  Depends on independent failures  P(no broken nodes) = (1 – (1/2)^r)^N  Choosing r = 2·log(N) makes this probability 1 – 1/N
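
A quick numerical check of this calculation: with r = 2·log2(N), one node’s successor list is all dead with probability (1/2)^r = 1/N², and the ring survives with probability roughly 1 − 1/N.

```python
# Verify that (1 - (1/2)^r)^N is about 1 - 1/N when r = 2*log2(N) and nodes
# fail independently with probability 1/2.
import math

for N in (100, 1000, 10000):
    r = 2 * math.log2(N)
    p_list_dead = 0.5 ** r                # one node's whole successor list dead
    p_ring_ok = (1 - p_list_dead) ** N    # no node breaks the ring
    print(N, round(p_ring_ok, 5), round(1 - 1 / N, 5))
```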

73 Chord Properties  Log(n) lookup messages and table space.  Well-defined location for each ID.  No search required.  Natural load balance.  No name structure imposed.  Minimal join/leave disruption.  Does not store documents…

74 Experimental Overview  Quick lookups in large systems  Low variation in lookup costs  Robust despite massive failures  See the paper for more results  Experiments confirm the theoretical results

75 Chord Lookup Cost Is O(log N)  (Plot: average messages per lookup vs. number of nodes; the constant is about 1/2)

76 Failure Experimental Setup  Start 1,000 CFS/Chord servers  Successor list has 20 entries  Wait until they stabilize  Insert 1,000 key/value pairs  Five replicas of each  Stop X% of the servers  Immediately perform 1,000 lookups

77 Massive Failures Have Little Impact  (Plot: failed lookups (percent) vs. failed nodes (percent); (1/2)^6 is 1.6%)

78 Chord Summary  Chord provides peer-to-peer hash lookup  Efficient: O(log(n)) messages per lookup  Robust as nodes fail and join  Good primitive for peer-to-peer systems http://www.pdos.lcs.mit.edu/chord


80 Wide-area Cooperative Storage With CFS Robert Morris Frank Dabek, M. Frans Kaashoek, David Karger, Ion Stoica MIT and Berkeley

81 What Can Be Done With Chord  Cooperative Mirroring  Time-Shared Storage  Makes data available when offline  Distributed Indexes  Support Napster keyword search

82 How to Mirror Open-source Distributions?  Multiple independent distributions  Each has high peak load, low average  Individual servers are wasteful  Solution: aggregate  Option 1: single powerful server  Option 2: distributed service  But how do you find the data?

83 Design Challenges  Avoid hot spots  Spread storage burden evenly  Tolerate unreliable participants  Fetch speed comparable to whole-file TCP  Avoid O(#participants) algorithms  Centralized mechanisms [Napster], broadcasts [Gnutella]  CFS solves these challenges

84 CFS Overview  CFS – Cooperative File System:  P2P read-only storage system  Read-only – only the owner can modify files  Completely decentralized  (Diagram: each node acts as both client and server, connected over the Internet)

85 CFS - File System  A set of blocks distributed over the CFS servers  3 layers:  FS – interprets blocks as files (Unix V7)  Dhash – performs block management  Chord – maintains routing tables used to find blocks

86 Chord  Uses a 160-bit identifier space  Assigns an identifier to each node and block  Maps block IDs to node IDs  Performs key lookups (as we saw earlier)

87 DHash – Distributed Hashing  Performs block management on top of Chord:  Block retrieval, storage and caching  Provides load balancing for popular files  Replicates each block at a small number of places (for fault-tolerance)

88 CFS - Properties  Tested on prototype :  Efficient  Robust  Load-balanced  Scalable  Download as fast as FTP  Drawbacks  No anonymity  Assumes no malicious participants

89 Design Overview  (Diagram: the client stack is FS over DHash over Chord; each server runs DHash over Chord)  DHash stores, balances, replicates and caches blocks  DHash uses Chord [SIGCOMM 2001] to locate blocks

90 Client-server Interface  Files have unique names  Files are read-only (single writer, many readers)  Publishers split files into blocks  Clients check files for authenticity  (Diagram: the client’s FS issues “insert file f” / “lookup file f”, which become “insert block” / “lookup block” requests to server nodes)

91 Naming and Authentication  1. Name could be a hash of the file content  Easy for the client to verify  But an update requires a new file name  2. Name could be a public key  The document contains a digital signature  Allows verified updates with the same name
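
A minimal sketch of the content-hash case using SHA-1; the public-key case is only indicated in a comment, since it needs a real signature scheme.

```python
# Content-hash naming: the block's name is the SHA-1 of its content, so the
# client can verify a fetched block by re-hashing it, but any update changes
# the name.
import hashlib

def block_name(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()

def verify(name: str, content: bytes) -> bool:
    return block_name(content) == name

data = b"some file block"
name = block_name(data)
print(verify(name, data))                # True  - easy for the client to check
print(verify(name, b"tampered block"))   # False - content no longer matches name

# Public-key naming (conceptual): the root block is published under a name
# derived from a public key and carries a signature over its content, so the
# publisher can release verified updates under the same name.
```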

92 CFS File Structure  (Diagram: a root block, signed with the publisher’s key, points via H(D) to directory block D, which points via H(F) to inode block F, which points via H(B1) and H(B2) to data blocks B1 and B2)

93 File Storage  Data is stored for an agreed-upon finite interval  Extensions can be requested  No specific delete command  After expiration – the blocks fade

94 Storing Blocks  Long-term blocks are stored for a fixed time  Publishers need to refresh them periodically  Cache uses LRU (Least Recently Used)  (Diagram: the disk is split between the cache and long-term block storage)
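
A minimal LRU cache in the spirit of the DHash cache described here, built on Python’s OrderedDict; block IDs, capacity and the get/put API are illustrative.

```python
# Least-recently-used block cache: hits move a block to the "recent" end,
# and inserting past capacity evicts the oldest entry.
from collections import OrderedDict

class LRUBlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block_id -> data, oldest first

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)    # a hit makes the block "recent"
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

cache = LRUBlockCache(capacity=2)
cache.put("b1", b"...")
cache.put("b2", b"...")
cache.get("b1")                  # touching b1 makes b2 the eviction candidate
cache.put("b3", b"...")
print(list(cache.blocks))        # ['b1', 'b3']
```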

95 Replicate Blocks at k Successors  (Diagram: Block 17 is replicated on the k nodes that succeed it on the ring)  Replica failure is independent
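
A small sketch of choosing the k replica holders for a block on a toy ring; the node IDs and the value of k are illustrative.

```python
# Replica placement: a block's successor on the ring plus the next k-1 nodes.
import bisect

def replica_nodes(node_ids, block_id, k, id_space=2 ** 7):
    nodes = sorted(node_ids)
    start = bisect.bisect_left(nodes, block_id % id_space)
    return [nodes[(start + i) % len(nodes)] for i in range(k)]

ring = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
print(replica_nodes(ring, 17, k=3))   # [20, 40, 50]: the successor and two more
```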

96 Lookups Find Replicas  (Diagram: Lookup(BlockID=17) on the ring)  RPCs: 1. Lookup step  2. Get successor list  3. Failed block fetch  4. Block fetch

97 First Live Successor Manages Replicas  (Diagram: Block 17 is held by its first live successor, and the following nodes keep copies of 17)

98 DHash Copies to Caches Along the Lookup Path  (Diagram: Lookup(BlockID=45))  RPCs: 1. Chord lookup  2. Chord lookup  3. Block fetch  4. Send to cache

99 Naming and Caching  (Diagram: block D30 stored at N32 is looked up by Client 1 and Client 2)  Hops get smaller as a lookup approaches its target, so different clients’ lookup paths are likely to collide near it  This makes caching along the path efficient

100 Caching Doesn’t Worsen Load  Only O(log N) nodes have fingers pointing to N32  This limits the single-block load on N32

101 Virtual Nodes Allow Heterogeneity – Load Balancing  Hosts may differ in disk/net capacity  Hosts may advertise multiple IDs  Chosen as SHA-1(IP Address, index)  Each ID represents a “virtual node”  Host load is proportional to the number of virtual nodes  Manually controlled  (Diagram: Node A runs virtual nodes N10, N60, N101; Node B runs only N5)
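
A sketch of deriving several virtual-node IDs for one host by hashing its IP address together with an index; the exact (IP, index) encoding below is an assumption, and a 7-bit space is used only to keep the numbers small.

```python
# One host, several identifiers: a stronger host can run more virtual nodes
# and therefore own a larger share of the ring.
import hashlib

def virtual_node_ids(ip, count, m=7):
    ids = []
    for index in range(count):
        digest = hashlib.sha1(f"{ip}-{index}".encode()).digest()
        ids.append(int.from_bytes(digest, "big") % (2 ** m))
    return ids

print(virtual_node_ids("194.90.1.5", count=3))   # three virtual nodes, one host
```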

102 Server Selection By Chord  (Diagram: a Lookup(47) can proceed through fingers with different round-trip times – 10ms, 12ms, 50ms, 100ms)  Each node monitors RTTs to its own fingers  Tradeoff: ID-space progress vs. delay

103 Why Blocks Instead of Files?  Cost: one lookup per block  Can tailor cost by choosing good block size  Benefit: load balance is simple  For large files  Storage cost of large files is spread out  Popular files are served in parallel

104 CFS Project Status  Working prototype software  Some abuse prevention mechanisms  Guarantees authenticity of files, updates, etc.  Napster-like interface in the works  Decentralized indexing system  Some measurements on RON testbed  Simulation results to test scalability

105 Experimental Setup (12 nodes)  One virtual node per host  8 KByte blocks  RPCs use UDP  Caching turned off  Proximity routing turned off

106 CFS Fetch Time for a 1MB File  (Plot: fetch time in seconds vs. prefetch window in KBytes)  Average over the 12 hosts  No replication, no caching; 8 KByte blocks

107 Distribution of Fetch Times for a 1MB File  (Plot: fraction of fetches vs. time in seconds, for 8, 24 and 40 KByte prefetch windows)

108 CFS Fetch Time vs. Whole File TCP  (Plot: fraction of fetches vs. time in seconds, comparing a 40 KByte prefetch window with whole-file TCP)

109 Robustness vs. Failures  (Plot: fraction of failed lookups vs. fraction of failed nodes)  Six replicas per block; (1/2)^6 is 0.016

110 Future work  Test load balancing with real workloads  Deal better with malicious nodes  Indexing  Other applications

111 CFS Summary  CFS provides peer-to-peer r/o storage  Structure: DHash and Chord  It is efficient, robust, and load-balanced  It uses block-level distribution  The prototype is as fast as whole-file TCP  http://www.pdos.lcs.mit.edu/chord

112 The End

