Structured P2P Networks Guo Shuqiao Yao Zhen Rakesh Kumar Gupta CS6203 Advanced Topics in Database Systems.


Introduction-P2P Network A peer-to-peer (P2P) network is a distributed system in which peers employ distributed resources to perform a critical function in a decentralized fashion [LW2004] Classification of P2P networks  Unstructured and Structured  Centralized and Decentralized  Hierarchical and Non-Hierarchical

Structured P2P network Distributed hash table (DHT)  DHT is a structured overlay that offers extreme scalability and hash-table-like lookup interface CAN, Chord, Pastry Other techniques  Skip list Skipgraph, SkipNet

Outline: Hash-based techniques in P2P: hash-based structured P2P systems (Pastry, P-Grid); two important issues (load balancing, neighbor table consistency preserving); comparison of DHT techniques. Skip-list based system: SkipNet. Conclusion.

Pastry [RD2001] Pastry is a P2P object location and routing scheme  Hash-based Properties  Completely decentralized  Scalable  Self-organized  Fault-resilient  Efficient search

Design of Pastry: nodeID: each node has a unique 128-bit numeric identifier. It is assigned randomly, so nodes with adjacent nodeIDs are diverse in geography, ownership, etc. Assumption: nodeIDs are uniformly distributed in the ID space. A nodeID is presented as a sequence of digits with base 2^b, where b is a configuration parameter (typical value 4).

Design of Pastry (cont'): A message/query has a numeric key of the same length as the nodeIDs. The key is presented as a sequence of digits with base 2^b. Route: a message is routed to the node whose nodeID is numerically closest to the key.

Destination of Routing: [Figure: a message with key = 10 and the destination node it is routed to.]

Pastry Schema Given a message of key k, a node A forwards the message to a node whose ID is numerically closest to k among all nodes known to A Each node maintains some routing state

Pastry Node State: a leaf set L, a routing table, and a neighborhood set M. [Figure: state of the Pastry node with nodeID 10233102, showing its leaf set (SMALLER and LARGER halves), routing table, and neighborhood set.]

Meanings of 'Close': (1) Closest according to the proximity metric (real distance): the nearest neighbor. (2) Closest in the numerical sense: the node with the closest nodeID. [Figure: example nodes illustrating the two notions.]

Pastry Node State A leaf set  |L| nodes with closest nodeIDs |L|/2 larger ones and |L|/2 smaller ones  Useful in message routing A neighborhood set  |M| nearest neighbors  Useful in maintaining locality properties

Leaf Set and Neighborhood Set: In this example b = 2 and l = 8, so |L| = 2 × 2^b = 8 and |M| = 2 × 2^b = 8. [Figure: state of node A (nodeID 10233102), with the SMALLER and LARGER halves of the leaf set and the neighborhood set highlighted.]

Routing Table: l rows and 2^b columns. Entries in the i-th row share the first i digits (the i-prefix) with the node's own nodeID; the entry in the j-th column has j as the next digit after the prefix. With b = 2 and l = 8 the table has 8 rows and 4 columns. Example for node A (nodeID 10233102), 2nd row: 10-0-31203 (j = 0), 10-1-32102 (j = 1), 10-3-23302 (j = 3); column j = 2 corresponds to A's own next digit. [Figure: state of node A with the 2nd row of the routing table highlighted.]

Routing, Step 1: If k falls within the range of nodeIDs covered by A's leaf set, forward the message to the node in the leaf set whose nodeID is closest to k. E.g., k = 10233022 falls in the range (10233000, 10233232), so it is forwarded to node 10233021. If k is not covered by the leaf set, go to step 2. [Figure: state of node A (nodeID 10233102).]

Routing, Step 2: The routing table is used, and the message is forwarded to a node whose nodeID shares a longer prefix with k than A's nodeID does. E.g., k = 10223220 is forwarded to node 10222302 (routing table entry 102-2-2302). If the appropriate entry in the routing table is empty, go to step 3. [Figure: state of node A (nodeID 10233102).]

Routing, Step 3: The message is forwarded to a node in the leaf set whose nodeID shares a prefix with k that is as long as A's but is numerically closer to k than A's nodeID. E.g., k = 10233320 is forwarded to node 10233232. If no such node exists, A is the destination node. [Figure: state of node A (nodeID 10233102).]

Routing: The routing procedure always converges, since each step chooses a node that either shares a longer prefix with the key, or shares an equally long prefix but is numerically closer to the key. Routing performance: the expected number of routing steps is log_{2^b} N, assuming accurate routing tables and no recent node failures.
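The three routing steps can be sketched for a single node as follows. This is a minimal illustration, not Pastry's implementation: the function names, the dictionary layout of the routing table (keyed by prefix length and next digit), and the restriction to a handful of table entries are assumptions made for the example. IDs are base-4 digit strings (b = 2), and the node state is taken from the example node 10233102.

```python
# Illustrative sketch of the three routing steps at one Pastry node.
# IDs are base-4 digit strings (b = 2); all names here are hypothetical.

def shared_prefix_len(a, b):
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def dist(a, b):
    """Numerical distance between two base-4 IDs."""
    return abs(int(a, 4) - int(b, 4))

def next_hop(node_id, key, leaf_set, routing_table):
    """Return the ID to forward to; returning node_id means 'deliver here'."""
    if key == node_id:
        return node_id
    values = [int(n, 4) for n in leaf_set]
    # Step 1: key is covered by the leaf set -> numerically closest node.
    if min(values) <= int(key, 4) <= max(values):
        return min(leaf_set + [node_id], key=lambda n: dist(n, key))
    # Step 2: routing table entry that extends the shared prefix by one digit.
    p = shared_prefix_len(node_id, key)
    entry = routing_table.get((p, key[p]))
    if entry is not None:
        return entry
    # Step 3: any known node with an equally long prefix that is closer.
    known = leaf_set + list(routing_table.values())
    better = [n for n in known
              if shared_prefix_len(n, key) >= p
              and dist(n, key) < dist(node_id, key)]
    return min(better, key=lambda n: dist(n, key)) if better else node_id

NODE = "10233102"
LEAF_SET = ["10233000", "10233001", "10233021", "10233033",
            "10233120", "10233122", "10233230", "10233232"]
# Only a few routing table entries, keyed by (shared prefix length, next digit).
ROUTING_TABLE = {(2, "0"): "10031203", (3, "2"): "10222302",
                 (5, "0"): "10233001", (5, "2"): "10233232"}

for k in ("10233022", "10223220", "10233320"):
    print(k, "->", next_hop(NODE, k, LEAF_SET, ROUTING_TABLE))
```

The three keys are the ones used in the step-by-step examples, so the printed next hops can be compared with the slides.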

Performance Average number of routing hops versus number of Pastry nodes b = 4, |L| = 16, |M| =32 and 200,000 lookups.

Discussion of Pastry: The parameters make Pastry flexible. b is the most important parameter and determines the power of the system: it trades off routing efficiency (log_{2^b} N hops) against routing table size (log_{2^b} N × 2^b entries). Each node can choose its own |L| and |M| based on its situation.
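To make the trade-off concrete, the two expressions can be evaluated for a few values of b. The helper below is a hypothetical sketch: the function name is invented, and rounding the number of rows up to a whole number is an assumption of this example.

```python
import math

def pastry_costs(n_nodes, b):
    """Return (routing hops, routing table entries) for a network of n_nodes.

    hops    ~ log_{2^b} N, rounded up to whole rows (assumption of this sketch)
    entries ~ hops * 2^b, the table-size expression used in the slides
    """
    rows = math.ceil(math.log2(n_nodes) / b)
    return rows, rows * 2 ** b

for b in (1, 2, 4):
    hops, entries = pastry_costs(10 ** 6, b)
    print(f"b={b}: about {hops} hops, about {entries} routing table entries")
```

A larger b shortens routes but widens every row of the routing table, which is the flexibility the slide refers to.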

Discussion of Pastry (routing scheme): Is the next hop locally optimal? E.g., k = 10233200, with X's nodeID = 10233232 and Y's nodeID = 10233133. Dis(k, X) = |10233200 − 10233232| = 32 and Dis(k, Y) = |10233200 − 10233133| = 1 (both in base 4). The locally optimal node is Y, but Pastry forwards the message to node X. [Figure: state of node A (nodeID 10233102), with X in the routing table and Y in the leaf set.]

P-Grid [Aberer2001]: P-Grid is a scalable access structure for P2P. It is hash-based and organized as a virtual binary search tree. Randomized algorithms are used for constructing the access structure. [Figure: virtual binary tree over six peers with their routing references, and an example query k = 100.]

P-Grid (cont'): Properties: completely decentralized; scalable in the total number of nodes and data items; fault-resilient (search is robust against node failures); efficient search.

Discussion of Pastry and P-Grid: Both systems make a uniformity assumption. Pastry: the ID space. P-Grid: the data distribution and peer behavior. If the data/message/query distribution is skewed, Pastry and P-Grid are not able to balance the load.

Load Balancing Consider a DHT P2P system with N nodes  Θ(logN) imbalance factor if items IDs are uniformly distributed [SMKKB2001]  Even worse if applications associate semantics with the item IDs IDs would no longer be uniformly distributed How to  Minimize the load imbalance?  Minimize the amount of load moved?

Load Balancing Challenges  Data items are continuously inserted/deleted  Nodes join and depart continuously  The distribution of data item IDs and item sizes can be skewed Solution—[GLSKS2004]

Load Balancing: Virtual server: represents a peer in the DHT rather than a physical node. A physical node hosts one or more virtual servers. Total load of a node's virtual servers = load of the node. [Figure: example in Chord, with a physical node hosting virtual servers on the ring and their tables FT 1 and FT 3.]

Load Balancing: Basic idea: directories store load information of the peer nodes and periodically schedule reassignments of virtual servers. The distributed load balancing problem is thereby reduced to a centralized problem at each directory.

Load Balancing: Load balancing algorithm (flowchart). Node: when a new cycle starts or its utilization exceeds K_e, the node randomly chooses a directory (directory IDs are known to all nodes) and sends it (1) the loads of all virtual servers that it is responsible for and (2) its capacity. Directory: receives information from nodes and, after a delay of T, computes a schedule of virtual server transfers among the nodes contacting it in order to reduce their maximal utilization. A report caused by utilization > K_e triggers emergency load balancing.

Load Balancing: Load balancing algorithm (cont.). Computing an optimal reassignment is NP-complete, so a greedy algorithm running in O(m log m) is used: for each heavily loaded node, move the least loaded virtual server to a pool; then, for each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load.
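The greedy step run at a directory can be sketched as below. This is an illustration under stated assumptions, not the algorithm of [GLSKS2004] verbatim: the data layout, the `threshold` parameter that defines "heavily loaded", and the choice to keep moving the lightest virtual servers until a node drops below the threshold are all assumptions of this example.

```python
def greedy_rebalance(nodes, threshold):
    """Greedy reassignment of virtual servers at one directory (sketch).

    nodes: dict mapping node name -> {"capacity": number, "servers": [loads]}
    threshold: utilization above which a node counts as heavily loaded
               (hypothetical parameter of this sketch)
    """
    pool = []
    # Phase 1: heavy nodes shed their lightest virtual servers into the pool.
    for node in nodes.values():
        node["servers"].sort()
        while node["servers"] and sum(node["servers"]) / node["capacity"] > threshold:
            pool.append(node["servers"].pop(0))
    # Phase 2: place pooled servers, heaviest first, on the node whose
    # utilization after the assignment would be smallest.
    for load in sorted(pool, reverse=True):
        target = min(nodes.values(),
                     key=lambda n: (sum(n["servers"]) + load) / n["capacity"])
        target["servers"].append(load)
    return nodes

nodes = {
    "a": {"capacity": 10, "servers": [2, 3, 6]},  # utilization 1.1
    "b": {"capacity": 10, "servers": [1]},        # utilization 0.1
}
greedy_rebalance(nodes, threshold=0.8)
print(nodes)
```

In this small run node "a" keeps only its heaviest virtual server and the two lighter ones move to node "b", so both nodes end at utilization 0.6.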

Load Balancing Performance  Tradeoff: Load movement vs. Load balancing Load balancing: max node utilization When T decreases  Max node utilization decreases  Load movement increases  Effective in achieving load balancing for System utilization as high as 90% Only transfer 8% of the load that arrives in the system  Emergency load balancing is necessary

Consistency Preserving Neighbor table  A table of neighbor pointers  For efficient routing in a P2P system Challenge  How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave and fail concurrently and frequently?

Consistency Preserving Consistent network  For every entry in neighbor tables, if there exists at least one qualified node in the network, then the entry stores at least one qualified node Qualified node for an entry of a node’s neighbor table: the node whose ID has suffix same as the required suffix of that entry  Otherwise, the entry is empty

Consistency Preserving K-consistent network  For every entry in neighbor tables, if there exist H qualified nodes in the network, then the entry stores at least min(K,H) qualified nodes  Otherwise, the entry is empty For K>0, K-consistency => consistency 1-consistency = consistency
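The definition can be turned into a small checker. The sketch below assumes a simplified representation that is not from the papers: each node's neighbor table is a dictionary from the required suffix of an entry to the set of node IDs stored in that entry, and a node counts as qualified for its own entries.

```python
def is_k_consistent(tables, node_ids, k):
    """Check K-consistency of neighbor tables (illustrative sketch).

    tables:   dict node ID -> {required suffix: set of stored node IDs}
    node_ids: set of all node IDs currently in the network
    """
    for table in tables.values():
        for suffix, stored in table.items():
            # Qualified nodes: IDs that end with the entry's required suffix.
            qualified = {n for n in node_ids if n.endswith(suffix)}
            if not stored <= qualified:
                return False  # entry holds a node that is not qualified
            if len(stored) < min(k, len(qualified)):
                return False  # entry stores fewer than min(K, H) nodes
    return True

ids = {"00", "10", "01", "11"}
tables = {"00": {"0": {"10"}, "1": {"01"}}}
print(is_k_consistent(tables, ids, 1))  # each entry holds one qualified node
print(is_k_consistent(tables, ids, 2))  # two qualified nodes exist per entry
```

With K = 1 the check coincides with plain consistency, which matches the statement that 1-consistency = consistency.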

Consistency Preserving General strategy  Identify a consistent subnet as large as possible  Only replace a neighbor with a closer one if both of them belong to the subnet  Expand the consistent subnet after new nodes join  Maintain consistency of the subnet when nodes fail

Consistency Preserving: Approach of [LL2004b]. Design a join protocol such that an initially K-consistent network remains K-consistent after the join processes of a set of nodes terminate; the termination of a join implies that the joined node belongs to the consistent subnet. Design a failure recovery protocol that recovers K-consistency of the subnet by repairing holes left by failed neighbors with qualified nodes in the subnet. The protocol is presented in [LL2004a], but is integrated with join in the experiments of this paper.

Consistency Preserving Join protocol  Each node has a status copying, waiting, notifying, cset_waiting, in_system S-node: node in status in_system T-node: otherwise  All S-nodes form a consistent subnet

Consistency Preserving: Status transitions of a joining node x: copying → waiting → notifying → cset_waiting → in_system. copying: copy neighbor information from S-nodes to fill in most entries of its table, level by level; move on when no qualified S-node can be found for a level i >= 1. waiting: try to find an S-node that shares at least the rightmost i-1 digits with x and stores x as a neighbor; move on when such a node, say y, is found. notifying: seek and notify nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y's table; move on when notifying is finished. cset_waiting: wait for the nodes that are joining concurrently and are likely to be in the same consistent subnet; move to in_system once all of them are confirmed to have exited the notifying status.

Consistency Preserving: Performance. p-ratio: if the primary neighbor stored in an entry of x's table is y, while the true primary neighbor should be z, then p-ratio = (delay from x to y) / (delay from x to z). K-consistency is always maintained in all experiments. When K increases, the p-ratio decreases: more neighbor information is stored, at the cost of more messages. Even with massive joins and failures, tables are still optimized greatly.

Comparing DHTs [DGPR2003]: Each DHT algorithm has many details, which makes the algorithms difficult to compare. We use a component-based analysis approach: break the DHT design into independent components and analyze the impact of each component choice separately. Two types of components: routing-level (neighbor and route selection) and system-level (caching, replication, querying policy, latency).

Metrics Used: Metrics used in the comparison: flexibility (options in choosing neighbors and routes); resilience (does it still route when nodes go down?); load balancing (is the content distributed?); proximity & latency (is the content stored nearby?). Aspects of a DHT: geometry (the structure that inspires a DHT design); distance function (the distance between two nodes); algorithm (the rules for selecting neighbors and routes using the distance function).

Algorithm & Geometry: What are routing algorithm and geometry? Routing algorithm: the exact rules for selecting neighbors and routes (e.g., Chord, CAN, PRR, Tapestry, Pastry). Geometry: the algorithm's underlying structure, derived from the way in which neighbors and routes are chosen (e.g., Chord routes on a ring). Why is geometry important? Geometry captures the flexibility in the selection of neighbors and routes. Neighbor selection: can neighbors be chosen based on proximity? This leads to shorter paths. Route selection: the number of options for selecting the next hop. More options lead to shorter, more reliable paths.

DHT Algorithms Analysis: The table summarizes the geometries and algorithms. We will examine the flexibility metric in two aspects: flexibility in neighbor selection and flexibility in route selection.
Geometry | Algorithm
Tree | PRR
Hypercube | CAN
Butterfly | Viceroy
Ring | Chord
XOR | Kademlia
Hybrid | Pastry

Tree Geometry: PRR uses the tree geometry. The distance between two nodes is the height of their smallest common subtree (at most log N in a well-balanced tree). Neighbor selection flexibility: a node has 2^(i-1) options for choosing a neighbor at distance i. No routing flexibility. [Figure: binary tree with the leaf set and subtrees of height 1 and height 2 marked.]

Hypercube Geometry: CAN uses a d-torus (hypercube) geometry. Each node has log n neighbors, and neighbors differ in exactly one bit, so there is no flexibility in choosing neighbors. Routing proceeds greedily by correcting bits in any order. When routing from a source to a destination at distance log n, the first node has log n next-hop choices, the second has (log n) − 1 choices, and so on; hence there are (log n)! possible routes. [Figure: hypercube with 3-bit node labels.]

Butterfly Geometry Viceroy uses butterfly geometry. Nodes organized in a series of log n “stages” where all the nodes at stage i are capable of correcting the i th bit. Routing consists of 3 phases. Done in O(log N) hops No flexibility in route selection and neighbor selection.

Ring Geometry: Chord uses the ring geometry. A node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops. Flexibility in neighbor selection: a node has 2^(i-1) possible options for picking its i-th neighbor, giving approximately n^((log n)/2) possible routing tables for each node. Flexibility in route selection: there are (log n)! possible routes from a source to a destination at distance log n. [Figure: ring with nodes 0 to 7.]

Ring Geometry: To route from 000 to 110, we have two routes: route to 100 and then to 110, or route to 010 and then to 110. [Figure: ring of the eight identifiers 000 to 111.]
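The two routes in this example can be enumerated mechanically. The sketch below assumes Chord-like neighbors at power-of-two offsets on the ring and considers only optimal routes, where each required offset is used exactly once in some order; the function name is hypothetical.

```python
from itertools import permutations

def ring_routes(src, dst, bits):
    """List all optimal routes from src to dst on a ring of 2**bits IDs.

    Assumes neighbors at offsets 1, 2, 4, ... (Chord-like fingers); a route
    applies the power-of-two offsets in the binary expansion of the
    clockwise gap, in any order.
    """
    size = 1 << bits
    gap = (dst - src) % size
    hops = [1 << i for i in range(bits) if (gap >> i) & 1]
    routes = set()
    for order in permutations(hops):
        pos, path = src, [src]
        for h in order:
            pos = (pos + h) % size
            path.append(pos)
        routes.add(tuple(path))
    return sorted(routes)

for route in ring_routes(0b000, 0b110, bits=3):
    print(" -> ".join(format(n, "03b") for n in route))
```

With k offsets to apply there are k! orderings, which is where the (log n)! route count for a destination at distance log n comes from.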

XOR: Kademlia uses the XOR geometry. The distance between two nodes is the XOR of their identifiers. A node has 2^(i-1) options for choosing a neighbor at distance i, yielding approximately n^((log n)/2) possible routing tables per node. Route flexibility: if an optimal path is not available, lower-order bits can be fixed before the higher-order bits. This may result in longer paths, as the lower-order bits that were fixed need not be preserved by later routing.

Hybrid: Pastry is a hybrid. Its nodes are regarded both as leaves of a binary tree and as points on a one-dimensional circle. The distance between two nodes is either the tree distance or the cyclic distance between them. A node has 2^(i-1) options for choosing a neighbor at distance i, yielding approximately n^((log n)/2) possible routing tables per node. Route selection freedom: a message is allowed to take hops on the ring, but such paths might not retain the O(log n) bound on route length. [Figure: binary tree over the node identifiers.]

Flexibility Overview:
Property | Tree | Hypercube | Ring | Butterfly | XOR | Hybrid
Neighbor selection | n^((log n)/2) | 1 | n^((log n)/2) | 1 | n^((log n)/2) | n^((log n)/2)
Route selection (optimal) | 1 | c1(log n) | c1(log n) | 1 | 1 | 1
Natural support for sequential neighbors? | no | no | yes | no | no | default: no; fallback: yes
Ring and Hypercube have twice the routing flexibility of the Hybrid and XOR geometries.

Resilience: Two aspects of robust routing. Static resilience measures how well the algorithm can route after nodes fail but before the recovery algorithms take effect. Dynamic recovery measures how quickly state is recovered after failures. With 30% of the nodes failed: Tree: 90% of routes failed (no route selection flexibility). Ring, Hypercube: 7% of routes failed (most route selection flexibility). Hybrid, XOR: 20% of routes failed (half the flexibility of the ring). Route selection flexibility affects static resilience.

Path Latency Goal is to minimise end-to-end latency of overlay networks. Two proximity methods are considered.  Proximity Neighbor Selection (PNS) Neighbors are chosen on their proximity.  Proximity Route Selection (PRS) Routes are selected depending on the proximity of the neighbors PNS achieves improvement over PRS which achieves improvement over Plain version. Geometry does not affect performance of PNS / PRS.  Thus it is important to choose a routing algorithm that has a geometry that accommodates PNS.

Local Convergence: Do messages sent from two nodes to the same destination converge at a node near the two sources? Local convergence leads to low latencies in overlay multicast, caching, and server selection. It is measured by the number of exit points in the network; in the best case, only one node sends a message off-domain.

Limitations & Findings: Limitations: the authors have not considered all geometries, nor other factors and performance metrics. Findings: routing geometry is important; flexibility improves resilience and proximity. Why not the ring? It offers great flexibility in choosing neighbors and routes, can implement both proximity methods (PNS and PRS), and shows the highest performance in the resilience tests while being as good as the other geometries in path length and local convergence.

Skip List [PSL1990]: Skip lists are data structures that can be used in place of balanced trees. They use probabilistic balancing, hence the algorithms are simpler and faster. A skip list can be described as a sorted linked list in which some nodes are supplemented with pointers that skip over many list elements. [Figure: example skip list running from HDR to NIL.]

Perfect Skip List: A perfect skip list is one where the height of the i-th node is the exponent of the largest power of two that divides i. Pointers at level h have length 2^h; for example, a level-2 pointer skips over 2^2 nodes. A perfect skip list supports searches in O(log N). Because insertions and deletions are expensive in a perfect skip list, a probabilistically balanced skip list is proposed, in which node heights are chosen by consulting a random number generator. [Figure: example skip list from HDR to NIL with nodes of height 2 and height 3 marked.]
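The height rule of a perfect skip list is easy to check numerically. The helper below is a small illustration (the function name is hypothetical) that computes the exponent of the largest power of two dividing i with a bit trick.

```python
def perfect_height(i):
    """Height of the i-th node (i >= 1) in a perfect skip list.

    i & -i isolates the lowest set bit of i, i.e. the largest power of
    two that divides i; its bit length minus one is the exponent.
    """
    return (i & -i).bit_length() - 1

print([perfect_height(i) for i in range(1, 9)])
```

Every second node reaches level 1, every fourth node reaches level 2, and so on, which is why a level-h pointer spans 2^h nodes.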

Examples: Starting from an empty list (HDR → NIL), heights are chosen randomly. Add node 10 (height 1): HDR → 10 → NIL. Add node 5 (height 0): HDR → 5 → 10 → NIL. Add node 8 (height 2): HDR → 5 → 8 → 10 → NIL. Add node 12 (height 0): HDR → 5 → 8 → 10 → 12 → NIL. Add node 2 (height 0): HDR → 2 → 5 → 8 → 10 → 12 → NIL.

Search Skip List: Search for node 30: from HDR to node 29, then stop; the search fails (illustrated). Search for node 23: from HDR to node 16, drop two levels, from node 16 to node 23; found. Search for node 27: from HDR to node 16, drop one level, from node 16 to node 25, drop one level, from node 25 to node 27; found. [Figure: example skip list from HDR to NIL containing nodes 16, 23, 25, 27, and 29, among others.]
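Insertion and search can be sketched in a few lines. This is a minimal probabilistic skip list written for illustration, not Pugh's reference code: the class and method names, the maximum height of 16, and the option to pass an explicit height (used here to replay the insertion example with heights 1, 0, 2, 0, 0) are assumptions of this sketch.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * (height + 1)  # one forward pointer per level

class SkipList:
    MAX_HEIGHT = 16  # assumed cap for this sketch

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)  # HDR
        self.rng = random.Random(seed)

    def _random_height(self):
        # Grow the height by one level with probability 1/2 each time.
        h = 0
        while h < self.MAX_HEIGHT and self.rng.random() < 0.5:
            h += 1
        return h

    def insert(self, key, height=None):
        if height is None:
            height = self._random_height()
        node = Node(key, height)
        cur = self.head
        for lvl in range(self.MAX_HEIGHT, -1, -1):
            while cur.next[lvl] is not None and cur.next[lvl].key < key:
                cur = cur.next[lvl]
            if lvl <= height:  # splice the node into this level
                node.next[lvl] = cur.next[lvl]
                cur.next[lvl] = node

    def search(self, key):
        cur = self.head
        # Start at the top level; drop a level whenever the next key
        # would overshoot the target.
        for lvl in range(self.MAX_HEIGHT, -1, -1):
            while cur.next[lvl] is not None and cur.next[lvl].key < key:
                cur = cur.next[lvl]
        cur = cur.next[0]
        return cur is not None and cur.key == key

    def keys(self):
        out, cur = [], self.head.next[0]
        while cur is not None:
            out.append(cur.key)
            cur = cur.next[0]
        return out

sl = SkipList()
for key, height in [(10, 1), (5, 0), (8, 2), (12, 0), (2, 0)]:
    sl.insert(key, height)
print(sl.keys(), sl.search(8), sl.search(30))
```

The search never moves backwards: it advances along a level while the next key is smaller than the target and otherwise drops one level, as in the walk-throughs above.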

Skip List: Worst-case performance occurs when the list is significantly unbalanced. Space efficient: can use 1.33 pointers per element on average. Searches take O(log N) with high probability. Comparison with AVL trees, recursive 2-3 trees, and self-adjusting trees: skip lists perform more comparisons than the other methods; skip lists are slightly slower than AVL trees in searches, but insertions and deletions in a skip list are faster; skip lists are faster than self-adjusting trees under a uniform distribution, but slower for highly skewed distributions.

SkipNet Introduction [SNL2003]: In DHTs, we cannot control where data is stored. Data might be stored far away from its administrative domain, which makes privileges hard to administer. Can we adapt? This also gives rise to denial-of-service attacks and traffic analysis. Solution: use SkipNet, a scalable overlay network that provides controlled data placement and guarantees routing locality by organizing data by string names. Content can be placed on a pre-defined node or distributed uniformly across the nodes of a hierarchical naming subtree.

Motivation: Disadvantages of Chord, CAN, Tapestry, and Pastry. No content locality: data cannot be explicitly placed on specific overlay nodes or distributed across the nodes of a specified domain, which leaves it prone to traffic analysis and denial-of-service attacks. No path locality: there is no guarantee that the routing path between two overlay nodes in a domain stays within that domain. Path locality adds security, because traffic is not passed through another domain, which could belong to a competitor. SkipNet provides both content and path locality.

How does SkipNet do it? Employs a string name and numeric ID space.  Node names and content identifier string mapped into name ID  Hashes of the node names and content identifiers mapped into the numeric ID. By arranging content in name ID order rather than dispersing it, we can achieve content & path locality.

Advantages of locality: Improved availability: data is stored within the organisation and can be searched even if the network is partitioned. Resilience against Internet failures: nodes within a cluster gracefully survive failures that disconnect the cluster from the rest of the Internet (a useful property of SkipNet). Performance: searches are faster, as data is stored near the nodes that use it. Manageability: facilitates control and maintenance within an administrative domain. Security: can deal with traffic analysis and denial-of-service attacks.

SkipNet Structure: SkipNet adapts the skip list structure so that traversals can start from any node and state and processing costs are the same for all nodes. It uses a ring and doubly linked lists. Other enhancements: each node stores 2 log N pointers rather than a highly variable number of pointers. Two variants: Perfect: pointers at level h point to the nodes that are exactly 2^h nodes to the left and right. Probabilistic: a node at level h probabilistically determines which ring it belongs to.

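SkipNet Structure: SkipNet nodes are ordered by name ID. Routing tables of nodes A and V (clockwise neighbor, counterclockwise neighbor): A: level 2: T, T; level 1: M, X; level 0: D, Z. V: level 2: D, D; level 1: Z, O; level 0: X, T. [Figure: ring of the eight nodes A, D, M, O, T, V, X, Z with numeric ID labels 000 to 111.]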

SkipNet Structure: The full SkipNet routing infrastructure for an 8-node system, including the ring labels.
Level L = 0 | Root ring: A D M O T V X Z
Level L = 1 | Ring 0: A M T X | Ring 1: D O V Z
Level L = 2 | Ring 00: A T | Ring 01: M X | Ring 10: O Z | Ring 11: D V
Level L = 3 | Ring 000: A | Ring 001: T | Ring 010: M | Ring 011: X | Ring 100: Z | Ring 101: O | Ring 110: D | Ring 111: V

Routing By Name ID: Similar to search in skip lists. A message is routed along the highest-level pointer, in either the clockwise or the counterclockwise direction, that leads to a node whose name ID is not past the destination value. Routing terminates when the message arrives at the node whose name ID is closest to the destination. Because nodes are doubly linked, the scheme routes along either left or right pointers, depending on the name IDs. The number of hops is O(log N).

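Example: Routing a message from node A to node V. Path: A (level 2, clockwise) → T, since "T" < "V"; T (level 2, clockwise) → failed; T (level 1, clockwise) → failed; T (level 0, clockwise) → V (destination). Routing tables (clockwise neighbor, counterclockwise neighbor): A: level 2: T, T; level 1: M, X; level 0: D, Z. V: level 2: D, D; level 1: Z, O; level 0: X, T. T: level 2: A, A; level 1: X, M; level 0: V, O. [Figure: ring of the eight nodes with numeric ID labels 000 to 111.]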

Routing Algorithm:
SendMsg(nameID, msg) {
    if (LongestPrefix(nameID, localNode.nameID) == 0)
        msg.dir = RandomDirection();
    else if (nameID < localNode.nameID)
        msg.dir = counterClockwise;
    else
        msg.dir = clockwise;
    msg.nameID = nameID;
    RouteByNameID(msg);
}
// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
    // Forward along the longest pointer
    // that is between us and msg.nameID.
    h = localNode.maxHeight;
    while (h >= 0) {
        nbr = localNode.RouteTable[msg.dir][h];
        if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
            SendToNode(msg, nbr);
            return;
        }
        h = h - 1;
    }
    // h<0 implies we are the closest node.
    DeliverMessage(msg.msg);
}
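The pseudocode can be exercised on the eight-node example. The Python sketch below is a simplified, single-process illustration and not SkipNet's implementation: the numeric IDs (A=000, T=001, M=010, X=011, Z=100, O=101, D=110, V=111) are an assumption read off the example ring figure, the helper names are hypothetical, and only the clockwise direction is modelled.

```python
# Numeric IDs assumed from the eight-node example figure.
NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "Z": "100", "O": "101", "D": "110", "V": "111"}

def build_tables(numeric_id):
    """tables[name][h] = (clockwise, counterclockwise) neighbor at level h.

    A level-h ring holds the nodes sharing the first h numeric ID digits,
    and every ring is sorted by name ID.
    """
    levels = len(next(iter(numeric_id.values())))
    tables = {name: [] for name in numeric_id}
    for h in range(levels):
        rings = {}
        for name, num in numeric_id.items():
            rings.setdefault(num[:h], []).append(name)
        for members in rings.values():
            members.sort()
            for i, name in enumerate(members):
                cw = members[(i + 1) % len(members)]
                ccw = members[(i - 1) % len(members)]
                tables[name].append((cw, ccw))
    return tables

def lies_between(cur, nbr, dest):
    """True if nbr is reached clockwise from cur without passing dest."""
    if cur < dest:
        return cur < nbr <= dest
    return nbr > cur or nbr <= dest  # the clockwise arc wraps around

def route_by_name(tables, src, dest):
    """Clockwise-only routing by name ID; returns the list of visited nodes."""
    path, cur = [src], src
    while cur != dest:
        for h in range(len(tables[cur]) - 1, -1, -1):  # highest level first
            nbr = tables[cur][h][0]
            if lies_between(cur, nbr, dest):
                cur = nbr
                path.append(cur)
                break
        else:
            break  # no pointer lies between us and dest: we are closest
    return path

TABLES = build_tables(NUMERIC_ID)
print(TABLES["A"])
print(route_by_name(TABLES, "A", "V"))
```

Under these assumptions the derived table for A and the route from A to V can be compared with the routing example for nodes A, T, and V.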

Routing By Numeric ID: Routing begins in the level 0 ring and proceeds until a node is found whose numeric ID matches the destination numeric ID in the first digit. Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID. Routing terminates when the message is delivered to the node with numeric ID = key or, if none of the nodes in R_h share h+1 digits with the destination numeric ID, at the node in R_h whose numeric ID is closest to the destination's numeric ID. The number of message hops is O(log N).

Routing By Numeric ID: E.g., let Z = 1000 and O = 1001, and route from A to 1011. Path: A (0000) → D (1100, move up a level) → O (1001, move up a level) → Z (1000) → O (1001, closest match for 1011; deliver). [Figure: SkipNet rings for 4-digit numeric IDs, from the root ring through Ring 0 and Ring 1 and Ring 00 to Ring 11 up to rings such as Ring 1000 and Ring 1001.]

Routing Algorithm:
// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null & msg.finalDestination = false
RouteByNumericID(msg) {
    if (msg.numID == localNode.numID || msg.finalDestination) {
        DeliverMessage(msg.msg);
        return;
    }
    if (localNode == msg.startNode) {
        // Done traversing current ring.
        msg.finalDestination = true;
        SendToNode(msg, msg.bestNode);
        return;
    }
    h = CommonPrefixLen(msg.numID, localNode.numID);
    if (h > msg.ringLvl) {
        // Found a higher ring.
        msg.ringLvl = h;
        msg.startNode = msg.bestNode = localNode;
    } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
        // Found a better candidate for current ring.
        msg.bestNode = localNode;
    }
    // Forward along current ring.
    nbr = localNode.RouteTable[clockWise][msg.ringLvl];
    SendToNode(msg, nbr);
}

Benefits: SkipNet supports routing by both name ID and numeric ID with the same data structure. The bottom ring is sorted by name ID and the top rings are sorted by numeric ID. For a given node, the SkipNet rings to which it belongs form precisely a skip list that is a ring and doubly linked.

Node Joins & Departure: Node joins: a new node finds the top-level ring that matches its numeric ID. It finds a neighbor in the top ring using a name ID search. Starting from one of these neighbors, it searches for its name ID at the next lower level and thus finds its neighbors at that level. This is repeated until it reaches the root ring. Existing nodes point to the new node only after it has joined the root ring. An insertion traverses O(log N) hops with high probability. Node departure: SkipNet can route correctly as long as the root-level ring is maintained. Other levels are regarded as optimization hints, and upper-ring membership is maintained through a background repair process.

Example Join - Insert node O (101)  Search by numeric ID 101 Highest attainable level is 2 O joins ring containing Z at level 2 Z forwards join message to D at next lower level 1  Proceed by searching by name ID in next lower levels D, V are neighbors in level 1 M, T are neighbors in level 0

Properties of SkipNet: Content & path locality: nodes are named like DNS entries. Path locality holds for groups in which nodes share a single DNS suffix, e.g., by reversing DNS names: john.microsoft.com becomes com.microsoft.john. Incorporating a node's name ID into a content name guarantees that the content will be hosted on that node, e.g., com.microsoft.john/doc-name. Constrained load balancing (CLB): content names have two parts, a CLB domain and a CLB suffix, for example a document named msn.com/DataCenter!TopStories.html. Searching: first search for a node in the CLB domain using a name ID search, then search by numeric ID for the hash of the CLB suffix, constrained to the domain. Because the search is constrained by a name ID prefix, the doubly linked list is used; this type of search affects performance by a factor of 2. CLB is performed over a naming subtree, not over an arbitrary subset of nodes.
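The naming convention can be illustrated with two small helpers. The function names and the extra host names (mary.microsoft.com, www.example.org) are hypothetical and used only for this sketch; the reversal itself and the node-name/doc-name format follow the examples on the slide.

```python
def to_name_id(dns_name):
    """Reverse the DNS components so the organisation becomes a prefix."""
    return ".".join(reversed(dns_name.split(".")))

def content_name(node_name_id, doc_name):
    """Content name that pins a document to a specific node."""
    return f"{node_name_id}/{doc_name}"

hosts = ["john.microsoft.com", "www.example.org", "mary.microsoft.com"]
name_ids = sorted(to_name_id(h) for h in hosts)
print(name_ids)
print(content_name(to_name_id("john.microsoft.com"), "doc-name"))
```

After reversal, hosts of the same organisation share a name ID prefix and are therefore adjacent in sorted name ID order, which is what lets a ring sorted by name ID keep an organisation's nodes and content together.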

Properties of SkipNet: Fault tolerance: only the neighbors at level 0 need to be maintained correctly. Each node has 16 neighbors at level 0. Level 0 is repaired easily by contacting live nodes. Background stabilization mechanisms are employed after failures. A failure across organizational boundaries only segments the overlay, which survives gracefully. Security: nodes cannot create global names containing the suffix of registered domains. Path locality avoids traffic analysis; however, outbound traffic is still prone to analysis. Range queries: queries can be performed over contiguous ring segments.

Enhancements: Sparse & dense routing tables: use a density parameter k and non-binary random digits to the base k for the numeric ID. Duplicate pointer elimination: remove duplicate pointers in the routing table; a 25% improvement can be achieved. Network proximity for routing by name ID: introduce a P-table for proximity routing. The goal of the P-table is to maintain routing in O(log N) hops while ensuring that each hop has low latency. It keeps track of nodes that are close to the node itself in network distance.

Enhancements Incorporate Network proximity for routing by numeric id  Add a C-table to incorporate network proximity when searching by numeric ID.  Keeps track of nodes that are close and within CLB domain.

Design Alternative IP routing & DNS o Content placement by routing using IP and DNS lookup. Single Overlay Network o Content locality, we name node with the hash of the data’s object’s name. Requires separate routing table for each object o Use 2 part naming scheme –content name consist of node addresses concatenated with node-relative names. Does not support guaranteed path locality o Add constraints to message to limit path locality. However prevents routing from being consistent. o Use a 2 part segments, use numeric ID and name ID like SkipNet. Result is a static form of constrained load balancing.

Design Alternative Multiple overlay network o Multiple overlays with membership could be considered. o Requires that access to other overlays are by gateways. o Access to data is constrained and load balanced within a single overlay not accessible to clients outside except via gateways. SkipNet provides explicit content placement, allows clients to dynamically define new DHTs over any name prefix scope and guarantees path locality within shared name prefix within a single infrastructure.

Experiments
The authors ran experiments against the following systems: Basic SkipNet, using only the R-Table; Full SkipNet, using the R-Table, P-Table, and C-Table; Pastry; and Chord.
The following lookup performance metrics are used: Relative Delay Penalty (RDP), the latency of the overlay path compared to the IP path; physical network hops, the length of the overlay path measured in IP hops; and the number of failed lookups.
Other metrics (refer to the paper): format of node names; organisation size; models for the distribution of nodes and data; host-generated or organisation-generated node names; simulation of domain isolation by failing an organisation's link.
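
The first two metrics can be sketched as follows. This sketch assumes RDP is computed as the ratio of the summed per-hop overlay latency to the latency of the direct IP path between the same endpoints. The function names and the sample numbers are hypothetical and are not taken from the paper's experiments.

```python
def relative_delay_penalty(overlay_hop_latencies_ms, direct_ip_latency_ms):
    """Ratio of total overlay path latency to direct IP path latency."""
    if direct_ip_latency_ms <= 0:
        raise ValueError("direct IP latency must be positive")
    return sum(overlay_hop_latencies_ms) / direct_ip_latency_ms


def physical_network_hops(ip_hops_per_overlay_hop):
    """Length of the overlay path measured in IP hops."""
    return sum(ip_hops_per_overlay_hop)


# Hypothetical lookup that takes three overlay hops.
rdp = relative_delay_penalty([10.0, 20.0, 30.0], direct_ip_latency_ms=20.0)
hops = physical_network_hops([4, 7, 5])
print(rdp, hops)
```

An RDP of 1 would mean the overlay route is as fast as the direct IP route, and larger values indicate a greater penalty for routing through the overlay.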

Experiment Results
Basic routing costs: Full SkipNet and Pastry are locality-aware, while Basic SkipNet and Chord are not; hence the locality-aware systems performed better. A non-uniform distribution of data does not affect performance.
Routing entries per node: Chord 16.3; Basic SkipNet 41.7; Full SkipNet 102.2; Pastry 63.2.
Locality of placement: Measured in physical network hops. Chord and Pastry show a constant number of physical hops because they diffuse data throughout the network and are therefore oblivious to the locality of data. SkipNet's performance improves as the locality of data references increases.

Experiment Results (cont’)
Fault tolerance when an organisation is disconnected: Locality improves fault tolerance. Chord and Pastry fail completely for local lookups because data is diffused throughout the network. SkipNet keeps functioning and serves local lookups.
Constrained load balancing (CLB, within a domain): Studies the Relative Delay Penalty (RDP) as the number of nodes increases. Basic CLB, using only the R-Table, causes higher delay penalties; Full CLB causes intermediate delay penalties; Pastry has low delay penalties.
Network proximity: Studies the effect on RDP of the density parameter k, which controls the P-Table entries. RDP levels off after k=8 because of the increase in pointers in the P-Table.

SkipNet Summary
SkipNet is the first P2P system that achieves both path locality and content locality. It provides content locality at the desired degree and granularity. Clustering node names allows SkipNet to perform gracefully in the face of link failures. Under uniform access patterns, its performance is similar to that of other P2P systems such as Chord and Pastry. Under access patterns where intra-organisation traffic predominates, SkipNet performs better. SkipNet is also more resilient to network partitions than other P2P systems.

Conclusion
Looked at hash-based techniques in P2P: Pastry and P-Grid.
Two important issues: load balancing and neighbor table consistency preserving.
Comparison of DHT techniques.
SkipNet: a skip list adaptation.

References
[CAN2001] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. SIGCOMM’01, August 27-31, 2001.
[CPLS2001] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM’01, August 27-31, 2001.
[CSWH2000] I. Clarke, O. Sandberg, B. Wiley, T. W. Hong. Freenet: A distributed anonymous information storage and retrieval system. Proc. of ICSI Workshop on Design Issues in Anonymity and Unobservability, 2000.
[DRGR2003] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. The Impact of DHT Routing Geometry on Resilience and Proximity. SIGCOMM’03, August 25–29, 2003.
[LL2004a] S. S. Lam, H. Liu. Failure recovery for structured P2P networks: Protocol design and performance evaluation. Proc. of ACM SIGMETRICS, June 2004.
[LL2004b] Consistency-preserving Neighbor Table Optimization for P2P Networks. Technical Report TR-04-01, Dept. of CS, Univ. of Texas at Austin, January 2004.

References (cont.)
[GLSKS2004] Load Balancing in Dynamic Structured P2P Systems. Proc. of IEEE INFOCOM, Portland, Oregon, USA, 2004.
[PSL1990] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, June 1990.
[RD2001] A. Rowstron, P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. Proc. of the 18th IFIP/ACM International Conf. on Distributed Systems Platforms, November 2001.
[SMKKB2001] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Proc. of SIGCOMM ’01, San Diego, California, USA.
[SML+2004] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. Proc. of the 2001 ACM Annual Conference of the Special Interest Group on Data Communication (ACM SIGCOMM’01), 2001.
[SNL2003] Nicholas J.A. Harvey, Michael B. Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Proceedings of the Fourth USENIX Symposium on Internet Technologies and Systems (USITS '03), Seattle, WA, March 2003.
