
1 P2P Systems Dan Rubenstein Columbia University Thanks to: B. Bhattacharjee, K. Ross, A. Rowston,


1 1 P2P Systems Dan Rubenstein Columbia University Thanks to: B. Bhattacharjee, K. Ross, A. Rowston, Don Towsley © Dan Rubenstein

2 2 Definition of P2P
1) Significant autonomy from central servers
2) Exploits resources at the edges of the Internet
  - storage and content
  - CPU cycles
  - human presence
3) Resources at the edge have intermittent connectivity, being added & removed

3 3 It's a broad definition:
- P2P file sharing
  - Napster, Gnutella, KaZaA, etc.
- P2P communication
  - Instant messaging
- P2P computation
- DHTs & their apps
  - Chord, CAN, Pastry, Tapestry
- P2P apps built over emerging overlays
  - PlanetLab
Wireless ad-hoc networking not covered here

4 4 Tutorial Outline (1)
- 1. Overview: overlay networks, P2P applications, copyright issues, worldwide computer vision
- 2. Unstructured P2P file sharing: Napster, Gnutella, KaZaA, search theory, flash crowds
- 3. Structured DHT systems: Chord, CAN

5 5 Tutorial Outline (cont.)
- 4. Experimental observations: measurement studies
- 5. Wrap up

6 6 1. Overview of P2P
- overlay networks
- P2P applications
- worldwide computer vision

7 7 Overlay networks
[Figure: overlay network with an overlay edge highlighted]

8 8 Overlay graph
Virtual edge
- TCP connection
- or simply a pointer to an IP address
Overlay maintenance
- Periodically ping to make sure neighbor is still alive
- Or verify liveness while messaging
- If neighbor goes down, may want to establish new edge
- New node needs to bootstrap

9 9 More about overlays
Unstructured overlays
- e.g., new node randomly chooses three existing nodes as neighbors
Structured overlays
- e.g., edges arranged in restrictive structure
Proximity
- Not necessarily taken into account

10 10 Overlays: all in the application layer
Tremendous design flexibility
  - Topology, maintenance
  - Message types
  - Protocol
  - Messaging over TCP or UDP
Underlying physical net is transparent to developer
  - But some overlays exploit proximity
[Figure: three hosts, each with the application / transport / network / data link / physical stack]

11 11 Examples of overlays
- DNS
- BGP routers and their peering relationships
- Content distribution networks (CDNs)
- Application-level multicast
  - economical way around barriers to IP multicast
- And P2P apps!

12 12 1. Overview of P2P
- overlay networks
- current P2P applications
  - P2P file sharing & copyright issues
  - Instant messaging
  - P2P distributed computing
- worldwide computer vision

13 13 Millions of content servers
[Figure: example content: Hey Jude, Magic Flute, Star Wars, ER, NPR, Blue]

14 14 Killer deployments
- Napster
  - disruptive; proof of concept
- Gnutella
  - open source
- KaZaA/FastTrack
  - Today more KaZaA traffic than Web traffic!
Is success due to massive number of servers, or simply because content is free?

15 15 P2P file sharing software
- Allows Alice to open up a directory in her file system
  - Anyone can retrieve a file from directory
  - Like a Web server
- Allows Alice to copy files from other users' open directories:
  - Like a Web client
- Allows users to search the peers for content based on keyword matches:
  - Like Google
Seems harmless to me!

16 16 1. Overview of P2P
- overlay networks
- P2P applications
- worldwide computer vision

17 17 Worldwide Computer Vision
Alice's home computer:
- Working for biotech, matching gene sequences
- DSL connection downloading telescope data
- Contains encrypted fragments of thousands of non-Alice files
- Occasionally a fragment is read; it's part of a movie someone is watching in Paris
- Her laptop is off, but it's backing up others' files
- Alice's computer is moonlighting
- Payments come from biotech company, movie system and backup service
Your PC is only a component in the computer
Pedagogy: just as computer arch has displaced digital logic, computer networking will displace comp arch

18 18 Worldwide Computer (2)
Anderson & Kubiatowicz: Internet-scale OS
- Thin software layer running on each host & central coordinating system running on ISOS server complex
- allocating resources, coordinating currency transfer
- Supports data processing & online services
Challenges
- heterogeneous hosts
- security
- payments
Central server complex
- needed to ensure privacy of sensitive data
- ISOS server complex maintains databases of resource descriptions, usage policies, and task descriptions

19 19 2. Unstructured P2P File Sharing
- Napster
- Gnutella
- KaZaA
- search theory
- dealing with flash crowds

20 20 Napster
- the most (in)famous
- not the first (cf. probably Eternity, from Ross Anderson in Cambridge)
- but instructive for what it gets right, and
- also wrong…
- also has a political message… and economic and legal…

21 21 Napster
- program for sharing files over the Internet
- a disruptive application/technology?
- history:
  - 5/99: Shawn Fanning (freshman, Northeastern U.) founds Napster Online music service
  - 12/99: first lawsuit
  - 3/00: 25% of UWisc traffic is Napster
  - 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
  - 7/01: # simultaneous online users: Napster 160K, Gnutella 40K, Morpheus (KaZaA) 300K

22 22 Napster
- judge orders Napster to pull plug in July 01
- other file sharing apps take over!
[Plot: traffic in bits per sec (0.0 to 8M) for gnutella, napster, fastrack (KaZaA)]

23 23 Napster: how does it work
- Application-level, client-server protocol over point-to-point TCP
- Centralized directory server
Steps:
- connect to Napster server
- upload your list of files to server
- give server keywords to search the full list with
- select best of correct answers (pings)

24 24 Napster
1. File list and IP address is uploaded to the centralized directory.

25 25 Napster
2. Query and results: user requests search at server (centralized directory).

26 26 Napster
3. Pings: user pings hosts that apparently have data. Looks for best transfer rate.

27 27 Napster
4. Retrieves file: user chooses server.
Napster's centralized server farm had a difficult time keeping up with traffic

28 28 2. Unstructured P2P File Sharing
- Napster
- Gnutella
- KaZaA
- search theory
- dealing with flash crowds

29 29 Distributed Search/Flooding

30 30 Distributed Search/Flooding

31 31 Gnutella
- focus: decentralized method of searching for files
  - central directory server no longer the bottleneck
  - more difficult to pull plug
- each application instance serves to:
  - store selected files
  - route queries from and to its neighboring peers
  - respond to queries if file stored locally
  - serve files

32 32 Gnutella
- Gnutella history:
  - 3/14/00: release by AOL, almost immediately withdrawn
  - became open source
  - many iterations to fix poor initial design (poor design turned many people off)
- issues:
  - how much traffic does one query generate?
  - how many hosts can it support at once?
  - what is the latency associated with querying?
  - is there a bottleneck?

33 33 Gnutella: limited scope query
Searching by flooding:
- if you don't have the file you want, query 7 of your neighbors.
- if they don't have it, they contact 7 of their neighbors, for a maximum hop count of 10.
- reverse path forwarding for responses (not files)
Note: Play gnutella animation at:

34 34 Gnutella overlay management
- New node uses bootstrap node to get IP addresses of existing Gnutella nodes
- New node establishes neighboring relations by sending join messages
[Figure: new node sending join messages]

35 35 Gnutella in practice
- Gnutella traffic << KaZaA traffic
- Fixes: do things KaZaA is doing: hierarchy, queue management, parallel download, …

36 36 Gnutella Discussion:
- researchers like it because it's open source
  - but is it truly representative?
- architectural lessons learned?
- good source for technical info/open questions:

37 37 2. Unstructured P2P File Sharing
- Napster
- Gnutella
- KaZaA
- search theory
- dealing with flash crowds

38 38 KaZaA: The service
- more than 3 million up peers sharing over 3,000 terabytes of content
- more popular than Napster ever was
- more than 50% of Internet traffic?
- MP3s & entire albums, videos, games
- optional parallel downloading of files
- automatically switches to new download server when current server becomes unavailable
- provides estimated download times

39 39 KaZaA: The service (2)
- User can configure max number of simultaneous uploads and max number of simultaneous downloads
- queue management at server and client
  - Frequent uploaders can get priority in server queue
- Keyword search
  - User can configure up to x responses to keywords
- Responses to keyword queries come in waves; stops when x responses are found
- From the user's perspective, service resembles Google, but provides links to MP3s and videos rather than Web pages

40 40 KaZaA: Technology
Software
- Proprietary
- files and control data encrypted
- Hints:
  - KaZaA Web site gives a few
  - Some reverse engineering attempts described on the Web
- Everything in HTTP request and response messages
Architecture
- hierarchical
- cross between Napster and Gnutella

41 41 KaZaA: Architecture
- Each peer is either a supernode or is assigned to a supernode
- Each supernode knows about many other supernodes (almost mesh overlay)
[Figure: supernodes]

42 42 KaZaA: Architecture (2)
- Nodes that have more connection bandwidth and are more available are designated as supernodes
- Each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants
- Guess: supernode has (on average) descendants; roughly 10,000 supernodes
- There is also dedicated user authentication server and supernode list server

43 43 KaZaA: Overlay maintenance
- List of potential supernodes included within software download
- New peer goes through list until it finds operational supernode
  - Connects, obtains more up-to-date list
  - Node then pings 5 nodes on list and connects with the one with smallest RTT
- If supernode goes down, node obtains updated list and chooses new supernode

44 44 KaZaA Queries
- Node first sends query to supernode
  - Supernode responds with matches
  - If x matches found, done.
- Otherwise, supernode forwards query to subset of supernodes
  - If total of x matches found, done.
- Otherwise, query further forwarded
  - Probably by original supernode rather than recursively

45 45 Parallel Downloading; Recovery
- If file is found in multiple nodes, user can select parallel downloading
- Most likely HTTP byte-range header used to request different portions of the file from different nodes
- Automatic recovery when server peer stops sending file
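
The slide only guesses that byte-range requests are used, so the following is a minimal sketch of that idea rather than KaZaA's actual mechanism. The function names `split_byte_ranges` and `range_header` are hypothetical; the `bytes=first-last` form is the standard HTTP `Range` header value with inclusive offsets.

```python
def split_byte_ranges(file_size, num_peers):
    """Split a file of file_size bytes into contiguous, inclusive byte ranges,
    one per peer, so each peer can serve a different portion."""
    if file_size <= 0 or num_peers <= 0:
        raise ValueError("file_size and num_peers must be positive")
    num_peers = min(num_peers, file_size)      # never create empty ranges
    base, extra = divmod(file_size, num_peers)
    ranges = []
    start = 0
    for k in range(num_peers):
        length = base + (1 if k < extra else 0)  # spread the remainder
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(first, last):
    """Value of an HTTP Range header for the inclusive byte span first..last."""
    return f"bytes={first}-{last}"


if __name__ == "__main__":
    for first, last in split_byte_ranges(10, 3):
        print("Range:", range_header(first, last))
```

If one peer stops sending, the same helper can be reused to re-split that peer's unfinished range across the remaining peers, which is one plausible way to get the automatic recovery the slide mentions.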

46 46 Lessons learned from KaZaA
KaZaA provides powerful file search and transfer service without server infrastructure
- Exploit heterogeneity
- Provide automatic recovery for interrupted downloads
- Powerful, intuitive user interface
Copyright infringement
- International cat-and-mouse game
- With distributed, serverless architecture, can the plug be pulled?
- Prosecute users?
- Launch DoS attack on supernodes?
- Pollute?

47 47 2. Unstructured P2P File Sharing
- Napster
- Gnutella
- KaZaA
- search theory
- dealing with flash crowds

48 48 Modeling Unstructured P2P Networks
- In comparison to DHT-based searches, unstructured searches are
  - simple to build
  - simple to understand algorithmically
- Little concrete is known about their performance
- Q: what is the expected overhead of a search?
- Q: how does caching pointers help?

49 49 Replication
- Scenario
  - Nodes cache copies of (or pointers to) content
    - object info can be pushed from nodes that have copies
    - more copies lead to shorter searches
  - Caches have limited size: can't hold everything
  - Objects have different popularities: different content requested at different rates
- Q: How should the cache be shared among the different content?
  - Favor items under heavy demand too much, and lightly demanded items will drive up search costs
  - Favor a flatter caching (i.e., independent of popularity), and frequent searches for heavily-requested items will drive up costs
- Is there an optimal strategy?

50 50 Model
- Given
  - $m$ objects, $n$ nodes, each node can hold $c$ objects, total system capacity $= cn$
  - $q_i$ is the request rate for the $i$-th object, $q_1 \ge q_2 \ge \dots \ge q_m$
  - $p_i$ is the fraction of total system capacity used to store object $i$, $\sum_i p_i = 1$
- Then
  - Expected length of search for object $i$ is $K / p_i$ for some constant $K$ (note: assumes search selects node w/ replacement, search stops as soon as object found)
  - Network bandwidth used to search for all objects: $B = \sum_i q_i K / p_i$
- Goal: Find allocation for $\{p_i\}$ (as a function of $\{q_i\}$) to minimize $B$
- Goal 2: Find distributed method to implement this allocation of $\{p_i\}$

51 51 Some possible choices for $\{p_i\}$
- Consider some typical allocations used in practice
  - Uniform: $p_1 = p_2 = \dots = p_m = 1/m$ (easy to implement: whoever creates the object sends out $cn/m$ copies)
  - Proportional: $p_i = a q_i$ where $a = 1 / \sum_i q_i$ is a normalization constant (also easy to implement: keep the received copy cached)
- What is $B = \sum_i q_i K / p_i$ for these two policies?
  - Uniform: $B = \sum_i q_i K / (1/m) = Km/a$
  - Proportional: $B = \sum_i q_i K / (a q_i) = Km/a$
- $B$ is the same for the Proportional and Uniform policies!

52 52 In between Proportional and Uniform
- Uniform: $p_i / p_{i+1} = 1$; Proportional: $p_i / p_{i+1} = q_i / q_{i+1} \ge 1$
- In between: $1 \le p_i / p_{i+1} \le q_i / q_{i+1}$
- Claim: any in-between allocation has lower $B$ than $B$ for Uniform / Proportional
- Proof: Omitted here
- Consider Square-Root allocation: $p_i = \sqrt{q_i} / \sum_j \sqrt{q_j}$
- Thm: Square-Root is optimal
- Proof (sketch):
  - Noting $p_m = 1 - (p_1 + \dots + p_{m-1})$
  - write $B = F(p_1, \dots, p_{m-1}) = \sum_{i=1}^{m-1} q_i / p_i + q_m / (1 - \sum_{i=1}^{m-1} p_i)$
  - Solving $dF/dp_i = 0$ gives $p_i = p_m \sqrt{q_i / q_m}$
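
The comparison of the three allocations can be checked numerically. The sketch below evaluates B = sum of q_i K / p_i for Uniform, Proportional, and Square-Root allocations; the request rates in `rates` are made-up illustration values, K is set to 1, and the function names are hypothetical.

```python
import math


def search_cost(q, p, K=1.0):
    """Total search bandwidth B = sum_i q_i * K / p_i."""
    return sum(K * qi / pi for qi, pi in zip(q, p))


def uniform(q):
    """p_i = 1/m for every object."""
    m = len(q)
    return [1.0 / m] * m


def proportional(q):
    """p_i = q_i / sum_j q_j."""
    total = sum(q)
    return [qi / total for qi in q]


def square_root(q):
    """p_i = sqrt(q_i) / sum_j sqrt(q_j)."""
    roots = [math.sqrt(qi) for qi in q]
    total = sum(roots)
    return [r / total for r in roots]


rates = [8.0, 4.0, 2.0, 1.0, 1.0]   # assumed request rates, q_1 >= ... >= q_m
b_uniform = search_cost(rates, uniform(rates))
b_proportional = search_cost(rates, proportional(rates))
b_square_root = search_cost(rates, square_root(rates))

if __name__ == "__main__":
    print(b_uniform, b_proportional, b_square_root)
```

With these rates, Uniform and Proportional both cost K·m·(sum of q_i) = 5 × 16 = 80, while Square-Root costs K·(sum of sqrt(q_i))², which is smaller.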

53 53 Distributed Method for Square-Root Allocation
- Assumption: each copy in the cache disappears from the cache at some rate independent of the object cached (e.g., object lifetime is i.i.d.)
- Algorithm Sqrt-Cache: cache a copy of object $i$ (once found) at each node visited while searching for object $i$
- Claim: Algorithm implements Square-Root Allocation

54 54 Proof of Claim
- Sketch of Proof of Correctness:
  - Let $f_i(t)$ be the fraction of locations holding object $i$ at time $t$
  - $p_i = \lim_{t \to \infty} f_i(t)$
  - At time $t$, using Sqrt-Cache, object $i$ populates the cache at avg rate $r_i(t) = q_i / f_i(t)$
  - When $f_i(t) / f_j(t) < \sqrt{q_i} / \sqrt{q_j}$, then $r_i(t) / r_j(t) = q_i f_j(t) / (q_j f_i(t)) > \sqrt{q_i} / \sqrt{q_j}$; hence, ratio $f_i(t) / f_j(t)$ will increase
  - When $f_i(t) / f_j(t) > \sqrt{q_i} / \sqrt{q_j}$, then $r_i(t) / r_j(t) = q_i f_j(t) / (q_j f_i(t)) < \sqrt{q_i} / \sqrt{q_j}$; hence, ratio $f_i(t) / f_j(t)$ will decrease
  - Steady state is therefore when $f_i(t) / f_j(t) = \sqrt{q_i} / \sqrt{q_j}$

55 55 2. Unstructured P2P File Sharing
- Napster
- Gnutella
- KaZaA
- search theory
- dealing with flash crowds

56 56 Flash Crowd
- Def: A sudden, unanticipated growth in demand of a particular object
- Assumption: content was previously cold and hence an insufficient number of copies is loaded into the cache
- How long will it take (on average) for a user to locate the content of interest?
- How many messages can a node expect to receive due to other nodes' searches?

57 57 Generic Search Protocol
Randomized TTL-scoped search
- Initiator sends queries to f randomly chosen neighbors
- Node receiving query
  - with object: forwards object directly (via IP) to the initiator
  - w/o object and TTL not exceeded: forwards query to f neighbors
  - w/o object and TTL exceeded: does nothing
- If object not found, increase TTL and try again (to some maximum TTL)
- Note: dumb protocol, nodes do not suppress repeat queries
[Figure: example with f = 3]
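
The protocol above can be sketched as a small round-based simulation. This is an illustrative sketch under simplifying assumptions that are not all in the slide: the overlay is fully connected, queries are memoryless (a node may be queried repeatedly and may even query itself), node 0 is taken as the initiator, and every branch runs until the TTL expires because nothing suppresses it. The function names are hypothetical.

```python
import random


def ttl_scoped_search(num_nodes, holders, f, ttl, rng):
    """Run one search with a fixed TTL.

    Returns (found, transmissions). Nodes in `holders` own the object and
    answer the initiator directly; all other nodes forward to f random nodes.
    """
    frontier = [0]            # node 0 is the initiator (assumption)
    found = False
    transmissions = 0
    for _ in range(ttl):
        next_frontier = []
        for _sender in frontier:
            for _ in range(f):
                target = rng.randrange(num_nodes)   # chosen with replacement
                transmissions += 1
                if target in holders:
                    found = True                    # object sent to initiator
                else:
                    next_frontier.append(target)    # forwards next round
        frontier = next_frontier
    return found, transmissions


def expanding_search(num_nodes, holders, f, max_ttl, rng):
    """Retry with TTL = 1, 2, ... until the object is found or max_ttl is hit.

    Returns (found, last_ttl_used, total_transmissions).
    """
    total = 0
    for ttl in range(1, max_ttl + 1):
        found, sent = ttl_scoped_search(num_nodes, holders, f, ttl, rng)
        total += sent
        if found:
            return True, ttl, total
    return False, max_ttl, total


if __name__ == "__main__":
    rng = random.Random(1)
    print(expanding_search(1000, set(range(10)), f=3, max_ttl=6, rng=rng))
```

When no node holds the object, the cost is deterministic: a TTL of k costs f + f² + … + f^k transmissions, which is why cold content is expensive to search for.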

58 58 Analysis of the Search Protocol
- Modeling assumptions:
  - Neighbor overlay is fully-connected
    - queries are memoryless: if a node is queried multiple times, it acts each time as if it's the first time (and a node may even query itself)
    - Accuracy of analysis verified via comparison to simulation on neighbor overlays that are sparsely connected
  - Protocol is round-based: query received by participant in round $i$ is forwarded to $f$ neighbors in round $i+1$
  - Time searchers start their searches: will evaluate 2 extremes
    - sequential: one user searches at a time
    - simultaneous: all users search simultaneously

59 59 Search Model: Preliminaries
- Parameters
  - $N$ = # nodes in the overlay (fully connected)
  - $H$ = # nodes that have a copy of the desired object (varies w/ time)
- Performance Measures
  - $R$ = # rounds needed to locate the object
  - $T$ = # query transmissions
- $p = P(\text{randomly chosen node does not have object}) = 1 - H/N$
- Recall: $f$ = # neighbors each node forwards query to
- $P(R > i) = p^{f + f^2 + f^3 + \dots + f^i} = p^{(f^{i+1} - f)/(f - 1)}$
- $E[R] = \sum_{i \ge 0} P(R > i)$
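
As a worked illustration of the two formulas above, the sketch below sums P(R > i) = p^(f + f² + … + f^i) over i ≥ 0 until the terms become negligible. The function name `expected_rounds` and the cutoff `tol` are assumptions introduced for the example; the exponent is accumulated term by term so the code also works for f = 1, where the closed form (f^(i+1) − f)/(f − 1) is undefined.

```python
def expected_rounds(num_nodes, num_holders, f, tol=1e-15):
    """E[R] = sum over i >= 0 of p ** (f + f^2 + ... + f^i), p = 1 - H/N."""
    if not 0 < num_holders <= num_nodes:
        raise ValueError("need 0 < num_holders <= num_nodes")
    if f < 1:
        raise ValueError("f must be at least 1")
    p = 1.0 - num_holders / num_nodes
    total = 0.0
    exponent = 0      # empty sum for i = 0, so P(R > 0) = 1
    power = f         # next power of f to add to the exponent
    while True:
        term = p ** exponent
        if term < tol:          # remaining terms are negligible
            break
        total += term
        exponent += power
        power *= f
    return total


if __name__ == "__main__":
    for holders in (1, 10, 100, 500):
        print(holders, round(expected_rounds(1000, holders, f=3), 3))
```

Because the exponent grows geometrically in i, the expected number of rounds stays small even when only a tiny fraction of nodes hold the object.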

60 60 Search Model cont'd
- To compute $E[T]$:
  - Create a schedule: each node determines in advance who to query if a query is necessary
  - $N_{i,j}$ is the $j$-th node at depth $i$ in the schedule
  - $X_{i,j} = 1$ if the query scheduled at $N_{i,j}$ is executed and is 0 otherwise
  - $X_{i,j} = 1$ if and only if both
    - $X_{i',j'} = 1$ for the $i-1$ entries $N_{i',j'}$ along the path from $N_{1,1}$ to $N_{i,j}$
    - $N_{i,j}$ does not have a copy of the object
  - $P(X_{i,j} = 1) = p^{i-1}$
  - $E[T] = \sum_{i,j} P(X_{i,j} = 1) = \sum_i p^{0 + f + f^2 + \dots + f^{i-1}} = \sum_i p^{(f^i - 1)/(f - 1)}$
[Figure: query schedule tree with nodes $N_{i,j}$ at depths 1 and 2]

61 61 Analyzing undirected searches during a flash crowd
- Scenario: A large majority of users suddenly want the same object
- Numerous independent searches for the same object are initiated throughout the network
  - Nodes cannot suppress one user's search for an object with the other. Each search has a different location where the object should be delivered
- What is the cost of using an unstructured search protocol?

62 62 One-after-the-other Searches
- $N_h$ = # nodes that initially have object, $I$ = max TTL
- Sequential searches, $f = 10$, terminates when all nodes have object
- Analytical Results (confirmed with simulation):
  - Expected transmissions sent and received per node is small (max is manageable)
  - Expected # of rounds small (unless max TTL kept small)
  - Simulation results use overlay graphs where # of neighbors bounded by 100. Note: error using full connectivity is negligible

63 63 Flash Crowd Scalability: Intuitive Explanation
- Gnutella scales poorly when different users search for different objects: high transmission overhead
- Q: Why will expanding-ring TTL search achieve better scalability?
- A:
  - Popular objects propagate through overlay via successful searches
  - Subsequent searches often succeed with smaller TTL: require less overhead

64 64 Simultaneous Searches
- Model: Start measuring at a point in time where $N_h$ nodes have copies and $N_d$ nodes have been actively searching for a long time
- Compute upper bound on expected # transmissions and rounds
- Details omitted here…

65 65 Simultaneous Search Results
- Simulation results show upper bounds to be extremely conservative (using branching process model of search starts)
- Conclusion (conservative):
  - less than 400 transmissions on average received and sent per node to handle delivery to millions of participants
  - less than 15 query rounds on average

66 66 Simultaneous Search Intuition
- Let $h(t)$ be the number of nodes that have the object after the $t$-th round, where $d$ of $N$ nodes are searching
- Each searching node contacts $s$ nodes on average per round
- Approximation: $h(t) = h(t-1) + (d - h(t-1)) \cdot s \cdot h(t-1)/N$, $h(0) > 0$
- Even when $h(t)/N$ is small, some node has high likelihood of finding object
- $h(t)$ grows quickly even when small when many users search simultaneously
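
The recurrence for h(t) is easy to iterate. The sketch below does so for made-up parameter values (N = 10,000 nodes, d = 9,000 searchers, s = 1 contact per round, one initial holder); the function name `holders_over_time` is hypothetical.

```python
def holders_over_time(num_nodes, num_searching, contacts_per_round,
                      initial_holders, rounds):
    """Iterate h(t) = h(t-1) + (d - h(t-1)) * s * h(t-1) / N.

    Returns the list [h(0), h(1), ..., h(rounds)].
    """
    h = float(initial_holders)
    history = [h]
    for _ in range(rounds):
        h = h + (num_searching - h) * contacts_per_round * h / num_nodes
        history.append(h)
    return history


if __name__ == "__main__":
    trace = holders_over_time(10_000, 9_000, 1, 1, 40)
    for t in (0, 5, 10, 15, 20):
        print(t, round(trace[t], 1))
```

Early on, each round multiplies h by roughly 1 + d·s/N, so growth is geometric as long as many nodes are searching; it flattens only as h approaches d.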

67 67 3. Structured P2P: DHT Approaches
- DHT service and issues
- CARP
- Consistent Hashing
- Chord
- CAN

68 68 Challenge: Locating Content
- Simplest strategy: expanding ring search
- If K of N nodes have copy, expected search cost at least N/K, i.e., O(N)
- Need many cached copies to keep search overhead small
[Figure: one node asks "I'm looking for NGC02 Tutorial Notes"; another answers "Here you go!"]

69 69 Directed Searches
- Idea:
  - assign particular nodes to hold particular content (or pointers to it, like an information booth)
  - when a node wants that content, go to the node that is supposed to have or know about it
- Challenges:
  - Distributed: want to distribute responsibilities among existing nodes in the overlay
  - Adaptive: nodes join and leave the P2P overlay
    - distribute knowledge responsibility to joining nodes
    - redistribute responsibility knowledge from leaving nodes

70 70 DHT Step 1: The Hash
- Introduce a hash function to map the object being searched for to a unique identifier:
  - e.g., h("NGC02 Tutorial Notes") → 8045
- Distribute the range of the hash function among all nodes in the network
- Each node must know about at least one copy of each object that hashes within its range (when one exists)

71 71 Knowing about objects
- Two alternatives
  - Node can cache each (existing) object that hashes within its range
  - Pointer-based: level of indirection; node caches pointer to location(s) of object

72 72 DHT Step 2: Routing
- For each object, node(s) whose range(s) cover that object must be reachable via a short path
  - by the querier node (assumed can be chosen arbitrarily)
  - by nodes that have copies of the object (when pointer-based approach is used)
- The different approaches (CAN, Chord, Pastry, Tapestry) differ fundamentally only in the routing approach
  - any good random hash function will suffice

73 73 DHT Routing: Other Challenges
- # neighbors for each node should scale with growth in overlay participation (e.g., should not be O(N))
- DHT mechanism should be fully distributed (no centralized point that bottlenecks throughput or can act as single point of failure)
- DHT mechanism should gracefully handle nodes joining/leaving the overlay
  - need to repartition the range space over existing nodes
  - need to reorganize neighbor set
  - need bootstrap mechanism to connect new nodes into the existing DHT infrastructure

74 74 DHT Layered Architecture
[Figure: layered architecture, bottom to top: Internet (TCP/IP); P2P substrate, a self-organizing overlay network (DHT); P2P application layer (network storage, event notification, ?)]

75 75 3. Structured P2P: DHT Approaches
- DHT service and issues
- CARP
- Consistent Hashing
- Chord
- CAN
- Pastry/Tapestry
- Hierarchical lookup services
- Topology-centric lookup service

76 76 CARP
DHT for cache clusters
- Each proxy has unique name
- key = URL = u
- calc h(proxy_n, u) for all proxies
- assign u to proxy with highest h(proxy_n, u)
- if proxy added or removed, u is likely still in correct proxy
[Figure: clients in an institutional network reach the Internet through a set of proxies]
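
The "highest h(proxy_n, u)" rule can be sketched in a few lines. This is an illustration of the idea only: the real CARP draft specifies its own hash function, whereas the `weight` function below uses SHA-1 as a stand-in, and the proxy names and URLs are invented.

```python
import hashlib


def weight(proxy, url):
    """Stand-in for h(proxy_n, u): a 64-bit score from SHA-1 (assumption)."""
    digest = hashlib.sha1(f"{proxy}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_proxy(url, proxies):
    """Assign url to the proxy with the highest combined hash."""
    return max(proxies, key=lambda proxy: weight(proxy, url))


if __name__ == "__main__":
    proxies = ["proxy-a", "proxy-b", "proxy-c"]
    urls = [f"http://example.org/page{i}" for i in range(10)]
    before = {u: assign_proxy(u, proxies) for u in urls}
    after = {u: assign_proxy(u, proxies + ["proxy-d"]) for u in urls}
    moved = [u for u in urls if before[u] != after[u]]
    print(f"{len(moved)} of {len(urls)} URLs moved, all to proxy-d")
```

Adding a proxy can only move a URL to the new proxy, since every other proxy's score for that URL is unchanged; this is why u is likely still in the correct proxy after a membership change.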

77 77 CARP (2)
- circa 1997
  - Internet draft: Valloppillil and Ross
- Implemented in Microsoft & Netscape products
- Browsers obtain script for hashing from proxy automatic configuration file (loads automatically)
Not good for P2P:
- Each node needs to know name of all other up nodes
- i.e., need to know O(N) neighbors
- But only O(1) hops in lookup

78 78 3. Structured P2P: DHT Approaches
- DHT service and issues
- CARP
- Consistent Hashing
- Chord
- CAN

79 79 Consistent hashing (1)
- Overlay network is a circle
- Each node has randomly chosen id
  - Keys in same id space
- Node's successor in circle is node with next largest id
  - Each node knows IP address of its successor
- Key is stored in closest successor

80 80 Consistent hashing (2)
[Figure: ring of nodes; the query "Who's resp for file 1110?" is passed around the circle until the node storing file 1110 answers "I am"]
- O(N) messages on avg to resolve query
- Note: no locality among neighbors

81 81 Consistent hashing (3)
Node departures
- Each node must track s ≥ 2 successors
- If your successor leaves, take next one
- Ask your new successor for list of its successors; update your s successors
Node joins
- You're new, node id k
- ask any node n to find the node n' that is the successor for id k
- Get successor list from n'
- Tell your predecessors to update their successor lists
- Thus, each node must track its predecessor

82 82 Consistent hashing (4)
- Overlay is actually a circle with small chords for tracking predecessor and s successors
- # of neighbors = s+1: O(1)
  - The ids of your neighbors along with their IP addresses is your routing table
- average # of messages to find key is O(N)
Can we do better?
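
To make the O(N) lookup cost concrete, the sketch below walks the ring one successor at a time until it reaches the node responsible for a key. The ID space of 100 and the node IDs are assumed values chosen for illustration, and the function names are hypothetical; each node uses only its own ID and its successor's ID.

```python
ID_SPACE = 100   # assumed size of the circular ID space


def in_range(key, pred, node):
    """True if key lies in the clockwise interval (pred, node]."""
    if pred == node:             # single-node ring owns everything
        return True
    return 0 < (key - pred) % ID_SPACE <= (node - pred) % ID_SPACE


def lookup_by_successor_walk(start, key, node_ids):
    """Forward the query around the circle until the key's successor is found.

    Returns (responsible_node, messages_sent).
    """
    ring = sorted(node_ids)
    successor_of = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
    current = start
    messages = 0
    while True:
        nxt = successor_of[current]
        messages += 1                       # one message per hop
        if in_range(key, current, nxt):     # nxt is the closest successor
            return nxt, messages
        current = nxt


if __name__ == "__main__":
    nodes = [1, 8, 32, 67, 72, 86]
    print(lookup_by_successor_walk(67, 14, nodes))
```

On average the query travels about half the ring, so the message count grows linearly with the number of nodes.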

83 83 3. Structured P2P: DHT Approaches
- DHT service and issues
- CARP
- Consistent Hashing
- Chord
- CAN
- Pastry/Tapestry
- Hierarchical lookup services
- Topology-centric lookup service

84 84 Chord
- Nodes assigned 1-dimensional IDs in hash space at random (e.g., hash on IP address)
- Consistent hashing: Range covered by node is from previous ID up to its own ID (modulo the ID space)

85 85 Chord Routing
- A node $s$'s $i$-th neighbor has the ID that is equal to $s + 2^i$ or is the next largest ID (mod ID space), $i \ge 0$
- To reach the node handling ID $t$, send the message to neighbor #$\log_2(t - s)$
- Requirement: each node $s$ must know about the next node that exists clockwise on the Chord (0-th neighbor)
- Set of known neighbors called a finger table

86 86 Chord Routing (cont'd)
- A node $s$ is node $t$'s neighbor if $s$ is the closest node to $t + 2^i \bmod H$ for some $i$. Thus,
  - each node has at most $\log_2 N$ neighbors
  - for any object, the node whose range contains the object is reachable from any node in no more than $\log_2 N$ overlay hops (each step can always traverse at least half the distance to the ID)
- Given $K$ objects, with high probability each node has at most $(1 + \log_2 N) K / N$ in its range
- When a new node joins or leaves the overlay, $O(K / N)$ objects move between nodes
[Figure: finger table for node 67; entry $i$ is the closest node clockwise to $67 + 2^i \bmod 100$]
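
A small sketch of finger tables and routing in a mod-100 ID space, as in the slides' example. The ring membership `NODES` is an assumption (chosen to be consistent with the lookup results quoted for inserting node 82), and `successor`, `finger_table`, and `route` are hypothetical helper names. Routing follows the rule of sending to neighbor number floor(log2(t − s)).

```python
from bisect import bisect_left

ID_SPACE = 100                     # IDs are taken mod 100
NODES = [1, 8, 32, 67, 72, 86]     # assumed ring membership, kept sorted


def successor(ident, nodes=NODES):
    """First node at or clockwise after ident, wrapping around the ring."""
    idx = bisect_left(nodes, ident % ID_SPACE)
    return nodes[idx % len(nodes)]


def finger_table(s, nodes=NODES):
    """Finger i is the node responsible for ID s + 2^i (mod ID_SPACE)."""
    num_fingers = (ID_SPACE - 1).bit_length()
    return [successor(s + 2 ** i, nodes) for i in range(num_fingers)]


def route(start, target, nodes=NODES):
    """IDs of the nodes visited when node `start` looks up ID `target`.

    `start` must be a member of `nodes`.
    """
    destination = successor(target, nodes)
    path = [start]
    current = start
    while current != destination:
        distance = (target - current) % ID_SPACE
        i = distance.bit_length() - 1          # floor(log2(distance))
        current = successor(current + 2 ** i, nodes)
        path.append(current)
    return path


if __name__ == "__main__":
    print("fingers of 67:", finger_table(67))
    print("route 67 -> ID 14:", route(67, 14))
```

Each hop either lands on the destination or covers at least half of the remaining clockwise distance, which is where the logarithmic hop bound comes from.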

87 87 Chord Node Insertion
- One protocol addition: each node knows its closest counter-clockwise neighbor
- A node selects its unique (pseudo-random) ID and uses a bootstrapping process to find some node in the Chord
- Using Chord, the node identifies its successor in the clockwise direction
- A newly inserted node's predecessor is its successor's former predecessor
Example: insert 82; pred(86) = 72

88 88 Chord Node Insertion (cont'd)
- First: set added node $s$'s fingers correctly
  - $s$'s predecessor $t$ does the lookup for each distance of $2^i$ from $s$
Lookups from node 72 (finger table for node 82):
- Lookup(83) = 86
- Lookup(84) = 86
- Lookup(86) = 86
- Lookup(90) = 1
- Lookup(98) = 1
- Lookup(14) = 32
- Lookup(46) = 67

89 89 Chord Node Insertion (cont'd)
- Next, update other nodes' fingers about the entrance of $s$ (when relevant). For each $i$:
  - Locate the closest node to $s$ (counter-clockwise) whose $2^i$-finger can point to $s$: largest possible is $s - 2^i$
  - Use Chord to go (clockwise) to largest node $t$ before or at $s - 2^i$
    - route to $s - 2^i$; if arrived at a larger node, select its predecessor as $t$
  - If $t$'s $2^i$-finger routes to a node larger than $s$
    - change $t$'s $2^i$-finger to $s$
    - set $t$ = predecessor of $t$ and repeat
  - Else $i$++, repeat from top
- $O(\log^2 N)$ time to find and update nodes
[Figure: finger updates on the ring, e.g., for $i = 3$]

90 90 Chord Node Deletion
- Similar process can perform deletion
[Figure: finger updates on the ring after a node is deleted]

91 91 3. Structured P2P: DHT Approaches
- DHT service and issues
- CARP
- Consistent Hashing
- Chord
- CAN
- Pastry/Tapestry
- Hierarchical lookup services
- Topology-centric lookup service

92 92 CAN
- hash value is viewed as a point in a D-dimensional Cartesian space
- each node responsible for a D-dimensional cube in the space
- nodes are neighbors if their cubes touch at more than just a point (more formally, nodes s & t are neighbors when
  - s contains some [, ]
  - and t contains [, ])
Example: D=2
- 1's neighbors: 2,3,4,6
- 6's neighbors: 1,2,4,5
- Squares wrap around, e.g., 7 and 8 are neighbors
- expected # neighbors: O(D)

93 93 CAN routing
- To route a message toward a destination point:
  - choose a neighbor with smallest Cartesian distance from the destination (e.g., measured from neighbor's center)
- e.g., region 1 needs to send to node covering X
  - checks all neighbors, node 2 is closest
  - forwards message to node 2
- Cartesian distance monotonically decreases with each transmission
- expected # overlay hops: $(D N^{1/D})/4$
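
The hop estimate can be checked on an idealized CAN. The sketch below assumes something the slides do not state: all N = k^D zones are equal-sized cells of a D-dimensional torus with side k, so each node has one neighbor in each direction along each dimension. Under that assumption greedy forwarding to the neighbor closest to the destination takes D·k/4 = (D·N^(1/D))/4 hops on average for even k. Function names are hypothetical.

```python
from itertools import product


def torus_offset(a, b, k):
    """Shortest distance between coordinates a and b on a ring of size k."""
    d = abs(a - b)
    return min(d, k - d)


def squared_distance(u, v, k):
    """Squared Cartesian distance between cells u and v with wrap-around."""
    return sum(torus_offset(a, b, k) ** 2 for a, b in zip(u, v))


def greedy_hops(src, dst, k):
    """Hops used when each node forwards to its neighbor closest to dst."""
    current = tuple(src)
    dst = tuple(dst)
    hops = 0
    while current != dst:
        neighbors = []
        for dim in range(len(current)):
            for step in (-1, 1):
                cell = list(current)
                cell[dim] = (cell[dim] + step) % k
                neighbors.append(tuple(cell))
        current = min(neighbors, key=lambda cell: squared_distance(cell, dst, k))
        hops += 1
    return hops


def average_hops(k, dims):
    """Average greedy hop count from the origin to every cell (itself included)."""
    origin = (0,) * dims
    cells = list(product(range(k), repeat=dims))
    return sum(greedy_hops(origin, cell, k) for cell in cells) / len(cells)


if __name__ == "__main__":
    for k, dims in ((4, 2), (6, 2), (4, 3)):
        n = k ** dims
        print(f"N={n}, D={dims}: simulated {average_hops(k, dims)}, "
              f"formula {dims * k / 4}")
```

In a real CAN the zones have unequal sizes after random joins, so the formula is an expectation rather than an exact count.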

94 94 CAN node insertion
- To join the CAN:
  - find some node in the CAN (via bootstrap process)
  - choose a point in the space uniformly at random
  - using CAN, inform the node that currently covers the space
  - that node splits its space in half
    - 1st split along 1st dimension
    - if last split along dimension i < D, next split along (i+1)-st dimension
    - e.g., for 2-d case, split on x-axis, then y-axis
  - keeps half the space and gives other half to joining node
Observation: the likelihood of a rectangle being selected is proportional to its size, i.e., big rectangles are chosen more frequently
[Figure: joining node picks point X; the covering zone is split between nodes 9 and 10]

95 95 CAN node removal
- Underlying cube structure should remain intact
  - i.e., if the spaces covered by s & t were not formed by splitting a cube, then they should not be merged together
- Sometimes, can simply collapse removed node's portion to form bigger rectangle
  - e.g., if 6 leaves, its portion goes back to 1
- Other times, requires juxtaposition of nodes' areas of coverage
  - e.g., if 3 leaves, should merge back into square formed by 2,4,5
  - cannot simply collapse 3's space into 4 and/or 5
  - one solution: 5's old space collapses into 2's space, 5 takes over 3's space

96 96 CAN (recovery from) removal process
- View partitioning as a binary tree:
  - leaves represent regions covered by overlay nodes (labeled by node that covers the region)
  - intermediate nodes represent split regions that could be reformed, i.e., a leaf can appear at that position
  - siblings are regions that can be merged together (forming the region that is covered by their parent)

97 97 CAN (recovery from) removal process
- Repair algorithm when leaf s is removed
  - Find a leaf node t that is either
    - s's sibling, or
    - a descendant of s's sibling where t's sibling is also a leaf node
  - t takes over s's region (moves to s's position on the tree)
  - t's sibling takes over t's previous region
- Distributed process in CAN to find appropriate t w/ sibling:
  - current (inappropriate) t sends msg into area that would be covered by a sibling
  - if sibling (same size region) is there, then done. Else receiving node becomes t & repeat

98 98 5. Security in Structured P2P Systems r Structured Systems described thusfar assume all nodes behave m Position themselves in forwarding structure to where they belong (based on ID) m Forward queries to appropriate next hop m Store and return content they are assigned when asked to do so r How can attackers hinder operation of these systems? r What can be done to hinder attacks?

99 99 Attacker Assumptions r The attacker(s) participate in the P2P group r Cannot view/modify packets not sent to them r Can collude

100 100 Classes of Attacks r Routing Attacks: re-route traffic in a bad direction r Storage/Retrieval Attacks: prevent delivery of requested data r Miscellaneous m DoS (overload) nodes m Rapid joins/leaves

101 101 Identity Spoofing r Problem: m Node claims to have an identity that belongs to other node m Node delivers bogus content r Solution: m Nodes have certificates signed by trusted authority m Preventing spoofed identity: base identity on IP address, send query to verify the address.

102 102 Routing Attacks 1: redirection r Malicious node redirects queries in wrong direction or to non-existent nodes (drops) Y X locate Y

103 103 Suggested Solution: Part I r Use iterative approach to reach destination. m verify that each hop moves closer (in ID space) to destination Y X locate Y ?

104 104 Suggested Solution: Part II r Provide multiple paths to re-route around attackers Y X

105 105 Choosing the Alternate paths: e.g., a CAN enhancement r Use a butterfly network of virtual nodes w/ depth log n – log log n r Use: m Each real node maps to a set of virtual nodes m If edge (A,B) exists in Butterfly network, then form (A,B) in actual P2P overlay m Flood requests across the edges that form the butterfly r Results: For any ε, there are constants such that m search time is O(log n) m insertion is O(log n) m # search messages is O(log 2 n) m each node stores O(log 3 n) pointers to other nodes and O(log n) data items m All but a fraction ε of peers can access all but a fraction ε of content

106 106 Routing Attack 2: Misleading updates r An attacker could trick nodes into thinking other nodes have left the system r Chord example: node kicks out other node r Similarly, could claim another (non-existent) node has joined r Proposed solution: random checks of nodes in P2P overlay, exchange of info among trusted nodes (Figure: Chord ring with a finger entry "finger=82"; malicious node 86 kicks out node 82)

107 107 Routing Attack 3: Partition r A malicious bootstrap node sends newcomers to a P2P system that is disjoint from (no edges to) the main P2P system r Solutions: m Use a trusted bootstrap server m Cross-check routing via random queries, compare with trusted neighbors (found outside the P2P ring)

108 108 Storage/Retrieval Attacks r Node is responsible for holding data item D. Does not store or deliver it as required r Proposed solution: replicate object and make available from multiple sites

109 109 Miscellaneous Attacks r Problem: Inconsistent Behavior - Node sometimes behaves, sometimes does not r Solution: force nodes to sign all messages. Can build body of evidence over time r Problem: Overload, i.e., DoS attack r Solution: replicate content and spread out over network r Problem: Rapid Joins/Leaves r Solutions: ?

110 110 SOS: Using DHTs to Prevent DoS Attacks r To perform a DoS attack: 1. Select target to attack 2. Break into accounts (around the network) 3. Have these accounts send packets toward the target 4. Optional: attacker spoofs source address (origin of attacking packets)

111 111 Goals of SOS r Allow moderate number of legitimate users to communicate with a target destination, where m DoS attackers will attempt to stop communication to the target m target difficult to replicate (e.g., info highly dynamic) m legitimate users may be mobile (source IP address may change) r Example scenarios m FBI/Police/Fire personnel in the field communicating with their agency's database m Bank users' access to their banking records m On-line customer completing a transaction

112 112 SOS: The Players r Target: the node/end-system/server to be protected from DoS attacks r Legitimate (Good) User: node/end-system/user that is authenticated (in advance) to communicate with the target r Attacker (Bad User): node/end-system/user that wishes to prevent legitimate users' access to targets

113 113 SOS: The Basic Idea r DoS Attacks are effective because of their many-to-one nature: many attack one r SOS Idea: Send traffic across an overlay: m Force attackers to attack many overlay points to mount successful attack m Allow network to adapt quickly: the many that must be attacked can be changed

114 114 Goal r Allow pre-approved legitimate users to communicate with a target r Prevent illegitimate attackers' packets from reaching the target r Want a solution that m is easy to distribute: doesn't require mods in all network routers m does not require high complexity (e.g., crypto) ops at/near the target r Assumption: Attacker cannot deny service to core network routers and can only simultaneously attack a bounded number of distributed end-systems

115 115 SOS: Step 1 - Filtering r Routers near the target apply simple packet filter based on IP address m legitimate users' IP addresses allowed through m illegitimate users' IP addresses aren't r Problems: What if m good and bad users have same IP address? m bad user knows good user's IP address and spoofs it? m good IP address changes frequently (mobility)? (frequent filter updates)
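A source-address filter of this kind can be sketched as follows. This is an illustrative Python model only, not router configuration; `make_filter` is a hypothetical name and the addresses come from the documentation ranges.

```python
import ipaddress


def make_filter(allowed_sources):
    """Build a filter that passes a packet only if its claimed source
    address is on the allow list (hypothetical helper for illustration)."""
    allowed = {ipaddress.ip_address(a) for a in allowed_sources}

    def passes(source_ip):
        return ipaddress.ip_address(source_ip) in allowed

    return passes


passes = make_filter(["192.0.2.10", "192.0.2.11"])
print(passes("192.0.2.10"))    # True: legitimate user's address
print(passes("198.51.100.7"))  # False: unknown address is dropped
```

The filter sees only the claimed source address, so a packet that spoofs `192.0.2.10` passes exactly as a genuine one does, and every change of a mobile user's address requires a filter update. These are the problems listed on the slide.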

116 116 SOS: Step 2 - Proxies r Step 2: Install Proxies outside the filter whose IP addresses are permitted through the filter m proxy only lets verified packets from legitimate sources through the filter (Figure: proxy with address w.x.y.z outside the filter) r Not done yet…

117 117 Problems with a known Proxy r Proxies introduce other problems: m Attacker can breach filter by attacking with spoofed proxy address m Attacker can DoS attack the proxy, again preventing legitimate user communication (Figure: proxy at w.x.y.z; attacker's packets claim "I'm w.x.y.z")

118 118 SOS: Step 3 - Secret Servlets r Step 3: Keep the identity of the proxy hidden m hidden proxy called a Secret Servlet m only target, the secret servlet itself, and a few other points in the network know the secret servlet's identity (IP address)

119 119 SOS: Steps 4&5 - Overlays r Step 4: Send traffic to the secret servlet via a network overlay m nodes in virtual network are often end-systems m verification/authentication of legitimacy of traffic can be performed at each overlay end-system hop (if/when desired) r Step 5: Advertise a set of nodes that can be used by the legitimate user to access the overlay m these access nodes participate within the overlay m are called Secure Overlay Access Points (SOAPs) (Figure: User → SOAP → across overlay → Secret Servlet → through filter → target)

120 120 SOS with Random routing r With filters, multiple SOAPs, and hidden secret servlets, attacker cannot focus attack (Figure: SOAP → ? → secret servlet)

121 121 Better than Random Routing r Must get from SOAP to Secret Servlet in a hard-to-predict manner, but random routing routes are long (O(n)) r Routes should not break as nodes join and leave the overlay (i.e., nodes may leave if attacked) r Current proposed version uses DHT routing (e.g., Chord, CAN, Pastry, Tapestry). We consider Chord: m Recall: A distributed protocol, nodes are used in homogeneous fashion m identifier, I, (e.g., filename) mapped to a unique node h(I) = B in the overlay m Implements a route from any node to B containing O(log N) overlay hops, where N = # overlay nodes (Figure: route labeled "to h(I)" ending at node h(I))

122 122 Step 5A: SOS with Chord r Utilizes a Beacon to go from overlay to secret servlet r Using target IP address A, Chord will deliver packet to a Beacon, B, where h(A) = B r Secret Servlet chosen by target (arbitrarily) r Servlet informs Beacon of its identity via Chord (Figure: target at IP address A tells a node "Be my secret servlet"; that node sends "I'm a secret servlet for A" to h(A), reaching the Beacon at IP address B; the SOAP also routes to h(A)) r SOS protected data packet forwarding 1. Legitimate user forwards packet to SOAP 2. SOAP forwards verified packet to Beacon (via Chord) 3. Beacon forwards verified packet to secret servlet 4. Secret Servlet forwards verified packet to target
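The four forwarding steps can be sketched in Python under some loud assumptions: the hash `h` is SHA-1 truncated to a small ring, the Beacon is taken to be the Chord successor of h(A), and Chord's O(log N) hop-by-hop routing is not simulated (the successor is computed directly). All function and variable names are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed ID-space size, for illustration only


def h(value):
    """Map a string (e.g., the target's IP address) to a ring ID."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(node_ids, key):
    """First node ID at or after key, wrapping around the ring."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key) % len(ring)]


def beacon_for(target_ip, node_ids):
    """The Beacon B for target address A, i.e. the node responsible for h(A)."""
    return successor(node_ids, h(target_ip))


def sos_route(soap, target_ip, node_ids, servlet_known_to_beacon):
    """Hops a verified packet takes after the user hands it to a SOAP.

    servlet_known_to_beacon maps (beacon ID, target IP) to the secret
    servlet that previously announced itself to that Beacon.
    """
    beacon = beacon_for(target_ip, node_ids)
    servlet = servlet_known_to_beacon[(beacon, target_ip)]
    return [soap, beacon, servlet, target_ip]
```

A usage example: with overlay IDs `[10, 50, 200]`, the target picks a servlet, the servlet registers at `beacon_for(target_ip, ids)`, and `sos_route` then returns the SOAP, the Beacon, the servlet, and the target in order. Neither the user nor the SOAP needs to know which node is the servlet.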

123 123 Adding Redundancy in SOS r Each special role can be duplicated if desired m Any overlay node can be a SOAP m The target can select multiple secret servlets m Multiple Beacons can be deployed by using multiple hash functions r An attacker that successfully attacks a SOAP, secret servlet or beacon brings down only a subset of connections, and only while the overlay detects and adapts to the attacks
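One way to picture "multiple hash functions" is to salt a single hash with an index, which is an assumption made here for illustration and not necessarily how SOS derives them. The sketch reuses the same simplified ring model (Chord successor computed directly); `hash_i` and `beacons_for` are hypothetical names.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed ID-space size


def hash_i(value, i):
    """The i-th hash function, modeled as SHA-1 over an index-salted input."""
    digest = hashlib.sha1(f"{i}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(node_ids, key):
    """First node ID at or after key, wrapping around the ring."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key) % len(ring)]


def beacons_for(target_ip, node_ids, k):
    """One Beacon per hash function; duplicates are possible on small rings."""
    return [successor(node_ids, hash_i(target_ip, i)) for i in range(k)]
```

If one Beacon is attacked and leaves, a SOAP can retry with the next hash function, so only the connections that were using the failed Beacon are interrupted.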

124 124 Why attacking SOS is difficult r Attack the target directly (without knowing secret servlet ID): filter protects the target r Attack secret servlets: m Well, they're hidden… m Attacked servlets shut down and target selects new servlets r Attack beacons: beacons shut down (leave the overlay) and new nodes become beacons m attacker must continue to attack a shut down node or it will return to the overlay r Attack other overlay nodes: nodes shut down or leave the overlay, routing self-repairs (Figure: SOAP, beacon, and secret servlet on a Chord ring)

125 125 Attack Success Analysis r N nodes in the overlay r For a given target m S = # of secret servlet nodes m B = # of beacon nodes m A = # of SOAPs r Node jobs are assigned independently (same node can perform multiple jobs) r Static attack: Attacker chooses M of N nodes at random and focuses attack on these nodes, shutting them down r What is $P_{static}(N,M,S,B,A)$ = P(attack prevents communication with target)? r $P(n,b,c)$ = P(set of b nodes chosen at random (uniform w/o replacement) from n nodes contains a specific set of c nodes) r $P(n,b,c) = \binom{n-c}{b-c} \big/ \binom{n}{b} = \binom{b}{c} \big/ \binom{n}{c}$

126 126 Attack Success Analysis cont'd r $P_{static}(N,M,S,B,A) = 1 - (1 - P(N,M,S))(1 - P(N,M,B))(1 - P(N,M,A))$ r Almost all overlay nodes must be attacked to achieve a high likelihood of DoS
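The static-attack formula is easy to evaluate numerically. The sketch below is a direct transcription of P(n,b,c) and P_static into Python; the function names `p_contains` and `p_static` are hypothetical.

```python
from math import comb


def p_contains(n, b, c):
    """P(n,b,c): probability that b nodes drawn uniformly without
    replacement from n nodes contain a specific set of c nodes."""
    if c > b:
        return 0.0  # too few attacked nodes to cover the whole set
    return comb(n - c, b - c) / comb(n, b)


def p_static(n, m, s, b, a):
    """P_static(N,M,S,B,A): the attack succeeds if it takes out every
    secret servlet, or every beacon, or every SOAP."""
    return 1 - (1 - p_contains(n, m, s)) * (1 - p_contains(n, m, b)) * (1 - p_contains(n, m, a))


# Worked check of the two equivalent forms for n=10, b=5, c=2:
# C(8,3)/C(10,5) = 56/252 and C(5,2)/C(10,2) = 10/45, both 2/9.
print(p_contains(10, 5, 2))
```

Sweeping `m` from 0 to `n` for fixed role counts shows how the success probability stays low until the attacked fraction is large, which is the qualitative claim on the slide.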

127 127 Dynamic Attacks r Ongoing attack/repair battle: m SOS detects & removes attacked nodes from overlay, repairs take time $T_R$ m Attacker shifts from removed node to active node, detection/shift takes time $T_A$ (freed node rejoins overlay) r Assuming $T_A$ and $T_R$ are exponentially distributed R.V.s, can be modeled as a birth-death process on states $0, 1, \ldots, M$ with attack rates $\lambda_i$ and repair rates $\mu_i$ m M = max # nodes simultaneously attacked m $\pi_i$ = P(i attacked nodes currently in overlay) m $P_{dynamic} = \sum_{0 \le i \le M} \pi_i \, P_{static}(N-M+i,\, i,\, S,\, B,\, A)$ m Centralized attack: $\lambda_i = \lambda$ m Distributed attack: $\lambda_i = (M-i)\lambda$ m Centralized repair: $\mu_i = \mu$ m Distributed repair: $\mu_i = i\mu$
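The dynamic model can be evaluated with the standard stationary distribution of a finite birth-death chain. The sketch below makes two assumptions that the slide does not spell out: rates are indexed by the current state i (attack moves i to i+1, repair moves i to i-1), and the base attack rate is written `lam`. All function names are hypothetical.

```python
from math import comb


def p_contains(n, b, c):
    """P(n,b,c) from the static analysis."""
    return 0.0 if c > b else comb(n - c, b - c) / comb(n, b)


def p_static(n, m, s, b, a):
    """P_static(N,M,S,B,A) from the static analysis."""
    return 1 - (1 - p_contains(n, m, s)) * (1 - p_contains(n, m, b)) * (1 - p_contains(n, m, a))


def stationary(up, down):
    """Stationary distribution of a birth-death chain.

    up[i] is the rate from state i to i+1 (i = 0..M-1);
    down[i] is the rate from state i+1 back to i.
    Uses pi[i+1] = pi[i] * up[i] / down[i], then normalizes.
    """
    weights = [1.0]
    for u, d in zip(up, down):
        weights.append(weights[-1] * u / d)
    total = sum(weights)
    return [w / total for w in weights]


def p_dynamic(n, m, s, b, a, lam, mu, distributed_attack, distributed_repair):
    """P_dynamic = sum over i of pi_i * P_static(N-M+i, i, S, B, A)."""
    up = [(m - i) * lam if distributed_attack else lam for i in range(m)]
    down = [i * mu if distributed_repair else mu for i in range(1, m + 1)]
    pi = stationary(up, down)
    return sum(pi[i] * p_static(n - m + i, i, s, b, a) for i in range(m + 1))
```

For example, comparing `p_dynamic(1000, 500, 10, 10, 10, 1.0, mu, False, False)` for a large and a small `mu` shows how the repair speed shifts probability mass toward states with few attacked nodes still in the overlay.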

128 128 Dynamic Attack Results r 1000 overlay nodes, 10 SOAPs, 10 secret servlets, 10 beacons r If repair faster than attack, SOS is robust even against large attacks (especially in centralized case) (Figure panels: centralized attack and repair; distributed attack and repair)

129 129 SOS Summary r SOS protects a target from DoS attacks m lets legitimate (authenticated) users through r Approach m Filter around the target m Allow hidden proxies to pass through the filter m Use network overlays to allow legitimate users to reach the hidden proxies r Preliminary Analysis Results m An attacker without overlay insider knowledge must attack majority of overlay nodes to deny service to target

130 Anonymity r Suppose clients want to perform anonymous communication m requestor wishes to keep its identity secret m deliverer wishes to also keep identity secret

131 131 Onion Routing r A node N that wishes to send a message to a node M selects a path $(N, V_1, V_2, \ldots, V_k, M)$ m Each node forwards message received from previous node m N can encrypt both the message and the next hop information recursively using public keys: a node only knows who sent it the message and who it should send to r N's identity as originator is not revealed
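The recursive wrapping can be illustrated with a toy Python model. Important caveat: the slide specifies public-key encryption, whereas this sketch substitutes a SHA-256-based XOR keystream with one shared key per node, purely so the example runs with the standard library. It is not secure and not the real onion routing construction; `wrap`, `peel`, and `toy_cipher` are hypothetical names.

```python
import hashlib
import json


def toy_cipher(key, data):
    """XOR data with a SHA-256 counter keystream (toy stand-in for
    public-key encryption; applying it twice with the same key decrypts)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def wrap(path, keys, message):
    """Build the onion for path [V1, ..., Vk, M], innermost layer first."""
    onion = toy_cipher(keys[path[-1]], json.dumps({"next": None, "body": message}).encode())
    for node, nxt in zip(reversed(path[:-1]), reversed(path[1:])):
        layer = json.dumps({"next": nxt, "body": onion.hex()}).encode()
        onion = toy_cipher(keys[node], layer)
    return onion


def peel(node, keys, onion):
    """Remove one layer at `node`: returns (next hop, inner onion),
    or (None, message) at the final destination."""
    layer = json.loads(toy_cipher(keys[node], onion).decode())
    if layer["next"] is None:
        return None, layer["body"]
    return layer["next"], bytes.fromhex(layer["body"])


keys = {"V1": b"key-1", "V2": b"key-2", "M": b"key-m"}  # illustrative keys
onion = wrap(["V1", "V2", "M"], keys, "hello")
hop, inner = peel("V1", keys, onion)
print(hop)  # V2: V1 learns only its next hop, not the message or M
```

Each layer exposes only the next hop to the node that peels it, which mirrors the slide's point that a node knows only who sent it the message and who it should send to.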

132 132 Anonymity on both sides r A requestor of an object receives the object from the deliverer without these two entities exchanging identities r Utilizes a proxy m Using onion routing, deliverer reports to proxy the info it can deliver, but does not reveal its identity m Nodes along this onion-routed path, A, memorize their previous hop m Requestor places request to proxy via onion routing; each node on this path, B, memorizes its previous hop m Proxy → Deliverer: request follows memorized path A m Deliverer sends article back to proxy via onion routing m Proxy → Requestor: article returns via memorized path B (Figure: Requestor, Proxy, Deliverer)

133 Measurement of Existing P2P Systems r Systems observed m Gnutella m Kazaa m Overnet (DHT-based) r Measurements described m fraction of time hosts are available (availability) m popularity distribution of files requested m # of files shared by host

134 134 Results from 3 studies r [Sar02] m Sampled Gnutella and Napster clients for 8- and 4-day periods m measured availability, bandwidths, propagation delays, file sizes, file popularities r [Chu02] m Sampled Gnutella and Napster clients for a month-long period m measured availability, file sizes and popularities r [Bha03] m Sampled Overnet clients for a week-long period m Measured availability, error due to use of IP address as identifier

135 135 Methods used r Identifying clients m Napster: ask central server for clients that provide popular names of files m Gnutella: send pings to well-known (bootstrap) peers and obtain their peer lists m Overnet: search for random IDs r Probing: m Bandwidth/latency: tools that take advantage of TCP's reliability and congestion control mechanism m Availability/Files offered, etc: pinging host (by whatever means is necessary for the particular protocol, usually by mechanism provided in protocol)

136 136 Availability r [Sar02] results: application uptime CDF is concave r [Chu02]: short studies overestimate uptime percentage m Implication: clients' use of P2P tool is performed in bursty fashion over long timescales

137 137 Availability cont'd r [Bha03]: using IP address to identify P2P client can be inaccurate m nodes behind NAT box share IP address m address can change when using DHCP m [Chu02] results about availability as function of period similar even when clients are not grouped by IP address

138 138 Session Duration r [Sar02]:

139 139 File popularity r Popular files are more popular in Gnutella than in Napster r Gnutella clients more likely to share more files

140 140 Bottleneck Bandwidths of Clients

141 Future Research r Specific m Locality in DHT-based systems: how to guarantee copies of objects in the local area r General m Using DHTs: To hash or not to hash (are DHTs a good thing)? m Trust: Building a trusted system from autonomous, untrusted / semi-trusted collections m Dynamicity: Building systems that operate in environments where nodes join/leave/fail at high rates

142 142 Selected References r A. Oram (ed), Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O'Reilly & Associates, 2001 r David P. Anderson and John Kubiatowicz, The Worldwide Computer, Scientific American, March 2002 r Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, Proceedings of ACM SIGCOMM'01, San Diego, CA, August r Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker, A Scalable Content-Addressable Network, Proceedings of ACM SIGCOMM'01, San Diego, CA, August r John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao, OceanStore: An Architecture for Global-Scale Persistent Storage, Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), November 2000 r W. J. Bolosky, J. R. Douceur, D. Ely, M. Theimer, Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs, Proceedings of the International Conference on Measurement and Modeling of Computer Systems, 2000, pp r J. Chu, K. Labonte, B. Levine, Availability and Locality Measurements of Peer-to-Peer File Systems, Proceedings of SPIE ITCOM, Boston, MA, July r R. Bhagwan, S. Savage, G. Voelker, Understanding Availability, in Proc. 2nd International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, Feb r S. Saroiu, P. Gummadi, S. Gribble, A Measurement Study of Peer-to-Peer File Sharing Systems, in Proceedings of Multimedia Computing and Networking 2002 (MMCN'02), San Jose, CA, January r Edith Cohen and Scott Shenker, Replication Strategies in Unstructured Peer-to-Peer Networks, in Proceedings of ACM SIGCOMM'02, Pittsburgh, PA, August 2002 r Dan Rubenstein and Sambit Sahu, An Analysis of a Simple P2P Protocol for Flash Crowd Document Retrieval, Columbia University Technical Report r A. Keromytis, V. Misra, D. Rubenstein, SOS: Secure Overlay Services, in Proceedings of ACM SIGCOMM'02, Pittsburgh, PA, August 2002 r M. Reed, P. P. Syverson, D. Goldschlag, Anonymous Connections and Onion Routing, IEEE Journal on Selected Areas of Communications, Volume 16, No. 4, r V. Scarlata, B. Levine, C. Shields, Responder Anonymity and Anonymous Peer-to-Peer File Sharing, in Proc. IEEE Intl. Conference on Network Protocols (ICNP), Riverside, CA, November r E. Sit, R. Morris, Security Considerations for Peer-to-Peer Distributed Hash Tables, in Proc. 1st International Workshop on Peer-to-Peer Systems (IPTPS), Cambridge, MA, March r J. Saia, A. Fiat, S. Gribble, A. Karlin, S. Saroiu, Dynamically Fault-Tolerant Content Addressable Networks, in Proc. 1st International Workshop on Peer-to-Peer Systems (IPTPS), Cambridge, MA, March r M. Castro, P. Druschel, A. Ganesh, A. Rowstron, D. Wallach, Secure Routing for Structured Peer-to-Peer Overlay Networks, in Proceedings of the Fifth Symposium on Operating Systems Design and Implementation (OSDI'02), Boston, MA, December 2002.

143 143 Additional references r Antony Rowstron and Peter Druschel, Pastry: Scalable, Decentralized, Object Location and Routing for Large-scale Peer-to-peer Systems, Proceedings of IFIP/ACM International Conference on Distributed Systems Platforms (Middleware)'02 r Ben Y. Zhao, John Kubiatowicz, Anthony Joseph, Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing, Technical Report, UC Berkeley r A. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility", 18th ACM SOSP'01, Lake Louise, Alberta, Canada, October r S. Iyer, A. Rowstron and P. Druschel, "SQUIRREL: A decentralized, peer-to-peer web cache", appeared in Principles of Distributed Computing (PODC 2002), Monterey, CA r Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica, Wide-area cooperative storage with CFS, ACM SOSP 2001, Banff, October 2001 r Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, and Sonesh Surana, Internet Indirection Infrastructure, in Proceedings of ACM SIGCOMM'02, Pittsburgh, PA, August 2002, pp r L. Garces-Erice, E. Biersack, P. Felber, K.W. Ross, G. Urvoy-Keller, Hierarchical Peer-to-Peer Systems, 2003, r Kangasharju, K.W. Ross, D. Turner, Adaptive Content Management in Structured P2P Communities, 2002, r K.W. Ross, E. Biersack, P. Felber, L. Garces-Erice, G. Urvoy-Keller, TOPLUS: Topology Centric Lookup Service, 2002, r P. Felber, E. Biersack, L. Garces-Erice, K.W. Ross, G. Urvoy-Keller, Data Indexing and Querying in P2P DHT Networks, r K.W. Ross, Hash-Routing for Collections of Shared Web Caches, IEEE Network Magazine, Nov-Dec 1997
