2 Unstructured vs Structured
- Unstructured P2P networks allow resources to be placed at any node. The network topology is arbitrary, and growth is spontaneous.
- Structured P2P networks simplify resource location and load balancing by defining a topology and rules for resource placement.
- They guarantee efficient search for rare objects.
- What are the rules? A Distributed Hash Table (DHT).
3 Hash Tables
- Store arbitrary keys and satellite data (values).
- put(key, value)
- value = get(key)
- Lookup must be fast.
- Calculate a hash function h() on the key that returns a storage cell.
- Chained hash table: store the key (and optional value) there.
4 What Is a DHT?
- Single-node hash table: key = Hash(name); put(key, value); get(key) -> value
- How do I do this across millions of hosts on the Internet?
- Distributed Hash Table
5 Distributed hash table (DHT)
[Figure: layered architecture. The distributed application calls put(key, data) and get(key) -> data on the distributed hash table (DHash); DHash calls lookup(key) -> node IP address on the lookup service (Chord), which runs across the nodes.]
- The application may be distributed over many nodes.
- The DHT distributes data storage over many nodes.
6 Why the put()/get() interface?
- The API supports a wide range of applications.
- The DHT imposes no structure or meaning on keys.
7 Distributed Hash Table
- Hash table functionality in a P2P network: lookup of data indexed by keys.
- Key-hash to node mapping: assign a unique live node to each key, and find this node in the overlay network quickly and cheaply.
- Maintenance and optimization:
  - Load balancing: possibly even change the key-hash to node mapping on the fly.
  - Replicate entries on more nodes to increase robustness.
9 The lookup problem
[Figure: nodes N1 to N6 connected through the Internet. A publisher holds Key=“title”, Value=MP3 data…; a client issues Lookup(“title”) and must find the node that stores it.]
- 1000s of nodes.
- The set of nodes may change…
10 Centralized lookup (Napster)
[Figure: N4 holds Key=“title”, Value=MP3 data… and registers SetLoc(“title”, N4) with a central DB; the client sends Lookup(“title”) to the DB.]
- Simple, but O(N) state and a single point of failure.
- O(N) state means it is hard to keep the state up to date.
11 Flooded queries (Gnutella)
[Figure: the client's Lookup(“title”) is flooded from node to node until it reaches N4, which holds Key=“title”, Value=MP3 data….]
- Robust, but a large number of messages per lookup.
12 Routed queries (Chord, Pastry, etc.)
[Figure: nodes N1 to N9; the publisher's Key=“title”, Value=MP3 data… and the client's Lookup(“title”) are both routed through the overlay.]
- Consistent rendezvous point between publisher and client.
- Challenge: can we make it robust? Keep state small? Actually find stuff in a changing system?
13 Routing challenges
- Define a useful key nearness metric.
- Keep the hop count small.
- Keep the tables small.
- Stay robust despite rapid change.
- Freenet: emphasizes anonymity.
- Chord: emphasizes efficiency and simplicity.
14 What is Chord? What does it do?
- In short: a peer-to-peer lookup service.
- Solves the problem of locating a data item in a collection of distributed nodes, under frequent node arrivals and departures.
- The core operation in most P2P systems is efficient location of data items.
- Supports just one operation: given a key, it maps the key onto a node.
15 Chord Characteristics
- Simplicity, provable correctness, and provable performance.
- Each Chord node needs routing information about only a few other nodes.
- Resolves lookups via messages to other nodes (iteratively or recursively).
- Maintains routing information as nodes join and leave the system.
16 Mapping onto Nodes vs. Values
- Traditional name and location services provide a direct mapping between keys and values.
- What are examples of values? A value can be an address, a document, or an arbitrary data item.
- Chord can easily implement a mapping onto values by storing each key/value pair at the node to which that key maps.
17 Napster, Gnutella etc. vs. Chord
- Compared to Napster and its centralized servers, Chord avoids single points of control or failure through decentralization.
- Compared to Gnutella and its widespread use of broadcasts, Chord avoids the scalability problem because each node needs only a small amount of routing information.
18 Addressed Difficult Problems (1)
- Load balance: a distributed hash function spreads keys evenly over nodes.
- Decentralization: Chord is fully distributed; no node is more important than any other, which improves robustness.
- Scalability: lookup cost grows logarithmically with the number of nodes in the network, so even very large systems are feasible.
19 Addressed Difficult Problems (2)
- Availability: Chord automatically adjusts its internal tables to ensure that the node responsible for a key can always be found.
- Flexible naming: no constraints on the structure of keys. The key space is flat, which gives flexibility in how names are mapped to Chord keys.
20 Chord properties
- Efficient: O(log N) messages per lookup, where N is the total number of servers.
- Scalable: O(log N) state per node.
- Robust: survives massive failures.
21 The Base Chord Protocol (1)
- Specifies how to find the locations of keys.
- Specifies how new nodes join the system.
- Specifies how to recover from the failure or planned departure of existing nodes.
22 Consistent Hashing
- A hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1.
- ID(node) = hash(IP, Port)
- ID(key) = hash(key)
- Properties of consistent hashing:
  - The function balances load: all nodes receive roughly the same number of keys.
  - When an Nth node joins (or leaves) the network, only an O(1/N) fraction of the keys are moved to a different location.
23 Chord IDs
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP, Port)
- Both are uniformly distributed.
- Both exist in the same ID space.
- How to map key IDs to node IDs?
- Note: “key” here usually means “key identifier”.
24 Chord IDs
- Consistent hashing (SHA-1) assigns each node and object an m-bit ID.
- IDs are ordered on an ID circle ranging from 0 to $2^m - 1$.
- New nodes assume slots on the ID circle according to their ID.
- Key k is assigned to the first node whose ID ≥ k, called successor(k).
25 Consistent hashing
[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80. “K5” denotes key 5; “N105” denotes node 105.]
- IDs live in a single circular space.
- A key is stored at its successor: the node with the next-higher ID.
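The successor rule can be sketched in a few lines. This is a minimal illustration, not Chord's implementation: the helper names `chord_id` and `successor` are made up here, and the node and key IDs are taken directly from the 7-bit figure rather than computed from real addresses.

```python
import hashlib
from bisect import bisect_left

M = 7  # bits in the identifier space, matching the 7-bit example

def chord_id(name: str, m: int = M) -> int:
    """Hash a name with SHA-1 and reduce it to an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(key_id: int, node_ids: list[int]) -> int:
    """Return the first node ID >= key_id, wrapping around the circle."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]  # idx == len(ring) wraps to the smallest node

nodes = [32, 90, 105]
placement = {k: successor(k, nodes) for k in (5, 20, 80)}
print(placement)  # {5: 32, 20: 32, 80: 90}
```

A key above the largest node ID (for example 110) wraps around and lands on N32.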
27 Node Joins and Departures
[Figure: identifier circle; labels include successor(6) = 7 and successor(1) = 3.]
28 Consistent Hashing – Join and Departure
- When a node n joins the network, certain keys previously assigned to n’s successor become assigned to n.
- When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
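To see which keys change hands, one can compare key ownership before and after a membership change. The sketch below reuses the 7-bit example ring; node 15 is a hypothetical joiner chosen for illustration, and `moved_keys` is an invented helper name.

```python
from bisect import bisect_left

def successor(key_id, node_ids):
    """First node ID >= key_id on the circle (wraps around)."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key_id) % len(ring)]

def moved_keys(keys, old_nodes, new_nodes):
    """Map each key whose owner changes to (old_owner, new_owner)."""
    moved = {}
    for k in keys:
        before, after = successor(k, old_nodes), successor(k, new_nodes)
        if before != after:
            moved[k] = (before, after)
    return moved

keys = [5, 20, 80]
# Hypothetical node 15 joins: it takes key 5 from its successor N32.
on_join = moved_keys(keys, [32, 90, 105], [15, 32, 90, 105])
# Node 90 leaves: its key 80 goes to its successor N105.
on_leave = moved_keys(keys, [32, 90, 105], [32, 105])
print(on_join, on_leave)
```

In both cases only keys adjacent to the joining or leaving node move; every other key stays where it was.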
31 Scalable Key Location
- A very small amount of routing information suffices to implement consistent hashing in a distributed environment.
- Each node need only be aware of its successor node on the circle.
- Queries for a given identifier can be passed around the circle via these successor pointers.
32 A Simple Key Lookup
Pseudocode for finding the successor:
    // ask node n to find the successor of id
    n.find_successor(id)
        if (id ∈ (n, successor])
            return successor;
        else
            // forward the query around the circle
            return successor.find_successor(id);
33 Scalable Key Location
- The resolution scheme is correct, BUT inefficient: it may require traversing all N nodes!
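The linear cost is easy to observe in a small simulation. The sketch below is an illustrative, single-process model (the `Node` class, `build_ring`, and the hop counter are assumptions for this example); the ring uses the node IDs of a 10-node, 6-bit example.

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self  # a lone node is its own successor

def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero (a == b covers the whole circle)

def find_successor(n, key_id):
    """Follow successor pointers until key_id falls in (n, n.successor]."""
    hops = 0
    while not in_half_open(key_id, n.id, n.successor.id):
        n = n.successor
        hops += 1
    return n.successor, hops

def build_ring(ids):
    nodes = [Node(i) for i in sorted(ids)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
    return {node.id: node for node in nodes}

ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
owner, hops = find_successor(ring[8], 54)
print(owner.id, hops)  # 56 7
```

Starting at N8, the query for key 54 is forwarded through seven nodes before N51 recognizes N56 as the successor.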
34 Acceleration of Lookups
- Lookups are accelerated by maintaining additional routing information.
- Each node maintains a routing table with (at most) m entries, called the finger table, where m is the number of identifier bits (the circle has $2^m$ positions).
- The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least $2^{i-1}$ on the identifier circle: $s = \mathrm{successor}(n + 2^{i-1})$ (all arithmetic mod $2^m$).
- s is called the i-th finger of node n, denoted by n.finger(i).node.
37 Finger Tables - characteristics
- Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away.
- A node’s finger table generally does not contain enough information to determine the successor of an arbitrary key k.
- Repeated queries to nodes that immediately precede the given key will eventually lead to the key’s successor.
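38 Node Joins – with Finger Tables
[Figure: 3-bit identifier circle with nodes 0, 1, 3 and joining node 6. Finger tables and keys after the join, following the finger definition:]
node  keys  start    interval             succ.
0     -     1, 2, 4  [1,2) [2,4) [4,0)    1, 3, 6
1     1     2, 3, 5  [2,3) [3,5) [5,1)    3, 3, 6
3     2     4, 5, 7  [4,5) [5,7) [7,3)    6, 6, 0
6     6     7, 0, 2  [7,0) [0,2) [2,6)    0, 0, 3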
39 Node Departures – with Finger Tables
[Figure: the same 3-bit identifier circle after a node departs, with the finger tables (columns: start, interval, successor) and stored keys of the nodes updated.]
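41 Scalable Lookup Scheme
[Figure: identifier circle with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56, showing N8’s fingers.]
Finger table for N8:
    N8+1  -> N14  (finger 1)
    N8+2  -> N14  (finger 2)
    N8+4  -> N14  (finger 3)
    N8+8  -> N21  (finger 4)
    N8+16 -> N32  (finger 5)
    N8+32 -> N42  (finger 6)
finger[k] = first node that succeeds $(n + 2^{k-1}) \bmod 2^m$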
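The finger definition can be checked directly against this example. The sketch below assumes global knowledge of the ring (a real node learns its fingers through lookups); `finger_table` is an illustrative helper name.

```python
from bisect import bisect_left

def successor(key_id, ring):
    """First node ID >= key_id in the sorted ring, wrapping around."""
    return ring[bisect_left(ring, key_id) % len(ring)]

def finger_table(n, ring, m):
    """Return [(start, node)] with start = (n + 2^(i-1)) mod 2^m for i = 1..m."""
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)
        table.append((start, successor(start, ring)))
    return table

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # already sorted
table_n8 = finger_table(8, ring, m=6)
for start, node in table_n8:
    print(f"start {start:2d} -> N{node}")
```

For N8 this yields N14, N14, N14, N21, N32, N42, matching the finger table on the slide.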
42 Chord key location
- Look up in the finger table the furthest node that precedes the key.
- Result: O(log N) hops.
43 Scalable Lookup Scheme
    // ask node n to find the successor of id
    n.find_successor(id)
        n0 = find_predecessor(id);
        return n0.successor;

    // ask node n to find the predecessor of id
    n.find_predecessor(id)
        n0 = n;
        while (id not in (n0, n0.successor])
            n0 = n0.closest_preceding_finger(id);
        return n0;

    // return closest finger preceding id
    n.closest_preceding_finger(id)
        for i = m downto 1
            if (finger[i].node belongs to (n, id))
                return finger[i].node;
        return n;
45 Scalable Lookup Scheme
- Each node forwards a query at least halfway along the remaining distance to the target.
- Theorem: With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N).
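A runnable version of the finger-based lookup, under simplifying assumptions: node state is recomputed from a global ring instead of being stored per node, and the ring is the 10-node, 6-bit example. The function names mirror the pseudocode; everything else is illustrative.

```python
from bisect import bisect_left

M = 6
RING = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

def successor(key_id):
    return RING[bisect_left(RING, key_id) % len(RING)]

def fingers(n):
    """Finger i (0-based here) points at successor(n + 2^i mod 2^M)."""
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

def between(x, a, b, inclusive_right=False):
    """True if x is in the circular interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b

def closest_preceding_finger(n, key_id):
    for f in reversed(fingers(n)):  # try the farthest finger first
        if between(f, n, key_id):
            return f
    return n

def find_successor(n, key_id):
    """Return (successor of key_id, list of nodes visited)."""
    path = [n]
    while not between(key_id, n, fingers(n)[0], inclusive_right=True):
        n = closest_preceding_finger(n, key_id)
        path.append(n)
    return fingers(n)[0], path

owner, path = find_successor(8, 54)
print(owner, path)  # 56 [8, 42, 51]
```

The same lookup that needs seven forwards with successor pointers alone visits only N8, N42, and N51 here.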
46 “Finger table” allows log(N)-time lookups
[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring.]
- Small tables, but multi-hop lookup.
- Table entries: IP address and Chord ID.
- Navigate in ID space, routing queries closer to the successor.
- log(N)-size tables, log(N) hops.
- Route to a document between ¼ and ½ …
47 Dynamic Operations and Failures
Need to deal with:
- Node joins and stabilization
- Impact of node joins on lookups
- Failure and replication
- Voluntary node departures
48 Node Joins and Stabilization
- A node’s successor pointer should be up to date so that lookups execute correctly.
- Each node periodically runs a “stabilization” protocol, which updates finger tables and successor pointers.
49 Node Joins and Stabilization
The protocol consists of 6 functions:
- create()
- join()
- stabilize()
- notify()
- fix_fingers()
- check_predecessor()
50 Create()
- Creates a new Chord ring.
    n.create()
        predecessor = nil;
        successor = n;
51 Join()
- Asks m to find the immediate successor of n.
- Doesn’t make the rest of the network aware of n.
    n.join(m)
        predecessor = nil;
        successor = m.find_successor(n);
52 Stabilize()
- Called periodically to learn about new nodes.
- Asks n’s immediate successor about the successor’s predecessor p (x in the pseudocode).
- Checks whether p should be n’s successor instead.
- Also notifies n’s successor about n’s existence, so that the successor may change its predecessor to n if necessary.
    n.stabilize()
        x = successor.predecessor;
        if (x ∈ (n, successor))
            successor = x;
        successor.notify(n);
53 Notify()
- m thinks it might be n’s predecessor.
    n.notify(m)
        if (predecessor is nil or m ∈ (predecessor, n))
            predecessor = m;
54 Fix_fingers()
- Called periodically to make sure that finger table entries are correct.
- New nodes initialize their finger tables.
- Existing nodes incorporate new nodes into their finger tables.
    n.fix_fingers()
        next = next + 1;
        if (next > m)
            next = 1;
        finger[next] = find_successor(n + 2^(next-1));
55 Check_predecessor()
- Called periodically to check whether the predecessor has failed.
- If so, it clears the predecessor pointer, which can then be modified by notify().
    n.check_predecessor()
        if (predecessor has failed)
            predecessor = nil;
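The interplay of join(), stabilize(), and notify() can be traced in a small single-process simulation. This is a sketch under stated assumptions: nodes are plain Python objects, "periodic" calls are explicit rounds, find_successor walks successor pointers only (no fingers), and failures are not modelled. The IDs 21, 32, and 26 are chosen for illustration.

```python
def between(x, a, b):
    """x in the circular open interval (a, b); (a, a) is the circle minus a."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.predecessor = None
        self.successor = self  # create(): a ring containing only this node

    def find_successor(self, key_id):
        n = self
        while not (key_id == n.successor.id or between(key_id, n.id, n.successor.id)):
            n = n.successor
        return n.successor

    def join(self, known):
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

def run_rounds(nodes, rounds=3):
    for _ in range(rounds):
        for n in nodes:
            n.stabilize()

n21, n32, n26 = Node(21), Node(32), Node(26)
n32.join(n21)
run_rounds([n21, n32])
n26.join(n21)                 # N26 learns its successor N32 via N21
run_rounds([n21, n32, n26])
print(n21.successor.id, n26.successor.id, n32.successor.id)  # 26 32 21
```

After a few rounds the successor pointers form the cycle 21 -> 26 -> 32 -> 21, and the predecessor pointers agree with it.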
56 Theorem
If any sequence of join operations is executed interleaved with stabilizations, then at some time after the last join the successor pointers will form a cycle on all nodes in the network.
57 Stabilization Protocol
- Guarantees that nodes are added in a way that preserves reachability.
58 Impact of Node Joins on Lookups
- If finger table entries are reasonably current: the lookup finds the correct successor in O(log N) steps.
- If successor pointers are correct but finger tables are incorrect: the lookup is correct but slower.
- If successor pointers are incorrect: the lookup may fail.
59 Impact of Node Joins on Lookups
Performance:
- If stabilization is complete: lookups can be done in O(log N) time.
- If stabilization is not complete:
  - Existing nodes’ finger tables may not reflect the new nodes. This doesn’t significantly affect lookup speed.
  - Newly joined nodes can affect lookup speed if their IDs lie between the target and the target’s predecessor. The lookup then has to be forwarded through the intervening nodes, one at a time.
60 Theorem
If we take a stable network with N nodes with correct finger pointers, and another set of up to N nodes joins the network, and all successor pointers (but perhaps not all finger pointers) are correct, then lookups will still take O(log N) time with high probability.
61 Source of Inconsistencies: Concurrent Operations and Failures
- The basic “stabilization” protocol keeps nodes’ successor pointers up to date, which is sufficient to guarantee correctness of lookups.
- Those successor pointers can then be used to verify the finger table entries.
- Every node runs stabilize periodically to find newly joined nodes.
62 Stabilization after Join
[Figure: nodes np, n, ns. Before: succ(np) = ns, pred(ns) = np, and n’s predecessor is nil. After: succ(np) = n, pred(ns) = n.]
- n joins:
  - predecessor = nil
  - n acquires ns as its successor via some n’
  - n notifies ns that it is the new predecessor
  - ns acquires n as its predecessor
- np runs stabilize:
  - np asks ns for its predecessor (now n)
  - np acquires n as its successor
  - np notifies n
  - n acquires np as its predecessor
- All predecessor and successor pointers are now correct.
- Fingers still need to be fixed, but old fingers will still work.
64 Node joins and stabilization
- N26 joins the system.
- N26 acquires N32 as its successor.
- N26 notifies N32.
- N32 acquires N26 as its predecessor.
65 Node joins and stabilization
- N26 copies keys.
- N21 runs stabilize() and asks its successor N32 for its predecessor, which is N26.
66 Node joins and stabilization
- N21 acquires N26 as its successor.
- N21 notifies N26 of its existence.
- N26 acquires N21 as its predecessor.
67 Failure Recovery
- The key step in failure recovery is maintaining correct successor pointers.
- To help achieve this, each node maintains a successor list of its r nearest successors on the ring.
- If node n notices that its successor has failed, it replaces it with the first live entry in the list.
- stabilize will correct finger table entries and successor-list entries pointing to the failed node.
- Performance is sensitive to the frequency of node joins and leaves versus the frequency at which the stabilization protocol is invoked.
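The successor-list fallback is simple to sketch. The example assumes a 10-node ring with IDs 1, 8, 14, 21, 32, 38, 42, 48, 51, 56 and a successor list of length r = 4 (r is a tunable parameter; 4 is chosen here only so that the list reaches past three consecutive failures). Helper names are illustrative.

```python
def successor_list(n, ring, r):
    """The r nodes that follow n on the ring, in order."""
    ring = sorted(ring)
    i = ring.index(n)
    return [ring[(i + k) % len(ring)] for k in range(1, r + 1)]

def first_live(successors, failed):
    """First entry in the successor list that has not failed, or None."""
    for s in successors:
        if s not in failed:
            return s
    return None

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
failed = {14, 21, 32}                      # simultaneous failures

succs = successor_list(8, ring, r=4)       # [14, 21, 32, 38]
new_successor = first_live(succs, failed)  # 38
too_short = first_live(successor_list(8, ring, r=3), failed)  # None
print(succs, new_successor, too_short)
```

With r = 4, N8 skips the three failed entries and adopts N38; with r = 3 every entry has failed and the ring is broken at N8, which is why r must cover the expected number of simultaneous failures.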
68 Impact of node joins on lookups
- All finger table entries are correct => O(log N) lookups.
- Successor pointers correct, but fingers inaccurate => correct but slower lookups.
69 Impact of node joins on lookups
- Stabilization completed => no influence on performance.
- Only in the negligible case that a large number of nodes join between the target’s predecessor and the target is the lookup slightly slower.
- No influence on performance as long as fingers are adjusted faster than the network doubles in size.
70 Failure of nodes
- Correctness relies on correct successor pointers.
- What happens if N14, N21, and N32 fail simultaneously?
- How can N8 acquire N38 as its successor?
72 Voluntary Node Departures
- Can be treated as node failures.
- Two possible enhancements:
  - The leaving node may transfer all its keys to its successor.
  - The leaving node may notify its predecessor and successor about each other so that they can update their links.
73 Chord – facts
- Every node is responsible for about K/N keys (N nodes, K keys).
- When a node joins or leaves an N-node network, only O(K/N) keys change hands (and only to and from the joining or leaving node).
- Lookups need O(log N) messages.
- To reestablish routing invariants and finger tables after a node joins or leaves, only $O(\log^2 N)$ messages are required.
74 Pastry
- Self-organizing overlay network of nodes.
- nodeIds are assigned randomly from a uniform distribution.
- With high probability, nodes with adjacent nodeIds are diverse in geography, ownership, jurisdiction, network attachment, etc.
- Pastry takes network locality into account (e.g. IP routing hops).
75 Pastry
- Instead of organizing the ID space as a Chord-like ring, routing is based on numeric closeness of identifiers.
- The focus is not only on the number of routing hops but also on network locality, as factors in routing efficiency.
76 Pastry
- Identifier space: nodes and data items are uniquely associated with m-bit IDs, integers in the range 0 to $2^m - 1$; m is typically 128.
- Pastry views IDs as strings of digits to the base $2^b$, where b is typically chosen to be 4.
- A key is located on the node whose node ID is numerically closest to it.
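Viewing an ID as a digit string is a matter of slicing it into b-bit groups. The sketch below uses small 16-bit IDs for readability (Pastry itself typically uses m = 128); `to_digits` and `shl` are illustrative helper names, with `shl` standing for the shared prefix length in digits.

```python
def to_digits(node_id: int, m: int = 128, b: int = 4) -> list[int]:
    """Split an m-bit ID into m/b digits of base 2^b, most significant first."""
    mask = (1 << b) - 1
    n_digits = m // b
    return [(node_id >> (b * (n_digits - 1 - i))) & mask for i in range(n_digits)]

def shl(a: int, c: int, m: int = 128, b: int = 4) -> int:
    """Length (in digits) of the prefix shared by IDs a and c."""
    count = 0
    for da, dc in zip(to_digits(a, m, b), to_digits(c, m, b)):
        if da != dc:
            break
        count += 1
    return count

print(to_digits(0x65A1, m=16, b=4))    # [6, 5, 10, 1]
print(shl(0x65A1, 0x65B2, m=16, b=4))  # 2
```

With b = 4 each digit is one hexadecimal character, so 0x65A1 and 0x65B2 share the two-digit prefix "65".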
77 Routing Goal
- Pastry routes messages to the node whose nodeId is numerically closest to the given key in fewer than $\log_{2^b} N$ steps.
- “A heuristic ensures that among a set of nodes with the k closest nodeIds to the key, the message is likely to first reach a node near the node from which the message originates, in terms of the proximity metric.”
78 Routing Information
Pastry’s node state is divided into 3 main elements:
- The routing table, similar to Chord’s finger table, stores links into the ID space.
- The leaf set contains nodes which are close in the ID space.
- Nodes that are close together in terms of network locality are listed in the neighbourhood set.
79 Routing Table
- A Pastry node’s routing table has up to m/b rows (about $\log_{2^b} N$ of them populated) with $2^b - 1$ entries per row.
- On node n, entries in row i hold the identities of Pastry nodes whose node IDs share an i-digit prefix with n but differ from n in the next digit.
- For example, the first row is populated with nodes that have no prefix in common with n.
- When there is no node with an appropriate prefix, the corresponding entry is left empty.
- The single-digit entry in each row shows the corresponding digit of the present node’s ID, i.e. the prefix matches the current ID up to the given value of p; for that digit, the next row down or the leaf set should be examined to find a route.
80 Routing Table
- Routing tables (RT) built this way achieve an effect similar to Chord’s finger table.
- The detail of the routing information increases with the proximity of other nodes in the ID space.
- Without a large number of nearby nodes, the last rows of the RT are only sparsely populated. Intuitively, the ID space would need to be fully exhausted with node IDs for complete RTs on all nodes.
- In populating the RT, there is a choice from the set of nodes with the appropriate ID prefix.
- During the routing process, network locality can be exploited by selecting nodes which are close in terms of the network proximity metric.
81 Leaf Set
- The routing tables sort node IDs by prefix. To increase lookup efficiency, the leaf set L holds the |L| nodes numerically closest to n (|L|/2 smaller and |L|/2 larger; normally |L| = $2^b$ or $2 \times 2^b$).
- The RT and the leaf set are the two sources of information relevant for routing.
- The leaf set also plays a role similar to Chord’s successor list in recovering from failures of adjacent nodes.
82 Neighbourhood Set
- Instead of numeric closeness, the neighbourhood set M is concerned with nodes that are close to the current node with regard to the network proximity metric.
- Thus, it is not involved in routing itself but in maintaining network locality in the routing information.
83 Pastry Node State (Base 4)
- L (leaf set): nodes that are numerically closest to the present node ($2^b$ or $2 \times 2^b$ entries). It holds the |L|/2 numerically closest larger nodeIds and the |L|/2 numerically closest smaller nodeIds.
- R (routing table): each entry shows the common prefix with the present node, the next digit, and the rest of the nodeId ($\log_{2^b} N$ rows, $2^b - 1$ columns).
- M (neighborhood set): the |M| nodes that are closest to the local node according to the proximity metric ($2^b$ or $2 \times 2^b$ entries).
84 Routing
Notation, for a key D arriving at the node with nodeId A:
- $R_l^i$: entry in the routing table at column i and row l
- $L_i$: i-th closest nodeId in the leaf set
- $D_l$: value of the l-th digit in the key D
- shl(A, B): length of the prefix shared by A and B, in digits
85 Routing
Routing is divided into two main steps:
- First, a node checks whether the key K is within the range of its leaf set.
- If so, K is located on one of the nearby nodes of the leaf set. The node forwards the query to the leaf set node numerically closest to K. If this is the node itself, the routing process is finished.
86 Routing
- If K does not fall within the range of the leaf set, the query needs to be forwarded over a larger distance using the routing table.
- In this case, a node n tries to pass the query on to a node which shares a longer common prefix with K than n itself.
- If there is no such entry in the RT, the query is forwarded to a node which shares a prefix with K of the same length as n but which is numerically closer to K than n.
87 Routing
- This scheme ensures that routing loops do not occur, because the query is routed strictly to a node with a longer common identifier prefix than the current node, or to a numerically closer node with the same prefix.
88 Routing performance
The routing procedure converges: each step takes the message to a node that either
- shares a longer prefix with the key than the local node, or
- shares as long a prefix with the key, but is numerically closer to the key than the local node.
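The routing decision at a single node can be sketched as follows. Several simplifications are assumptions of this example, not part of Pastry: 16-bit IDs with b = 4, numeric distance without wraparound, a routing table stored as a dictionary keyed by (row, digit), and picking the numerically closest candidate in the rare third case. The node ID, leaf set, and table entries are made-up values.

```python
M, B = 16, 4                       # small illustrative ID size; Pastry typically uses m = 128
N_DIGITS = M // B

def digits(x):
    return [(x >> (B * (N_DIGITS - 1 - i))) & ((1 << B) - 1) for i in range(N_DIGITS)]

def shl(a, c):
    """Shared prefix length of a and c, in digits."""
    n = 0
    for da, dc in zip(digits(a), digits(c)):
        if da != dc:
            break
        n += 1
    return n

def next_hop(node, key, leaf_set, routing_table):
    if key == node:
        return node
    # Step 1: key within the leaf set range -> numerically closest leaf (or self).
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        return min(leaf_set + [node], key=lambda t: abs(t - key))
    # Step 2: routing table entry sharing a longer prefix with the key.
    l = shl(node, key)
    entry = routing_table.get((l, digits(key)[l]))
    if entry is not None:
        return entry
    # Step 3 (rare): same prefix length, but numerically closer than this node.
    known = leaf_set + list(routing_table.values())
    closer = [t for t in known if shl(t, key) >= l and abs(t - key) < abs(node - key)]
    return min(closer, key=lambda t: abs(t - key)) if closer else node

node = 0x65A1
leaf_set = [0x6580, 0x65A0, 0x65B0, 0x65F0]
routing_table = {(0, 0xD): 0xD13D, (1, 0x4): 0x64FF, (2, 0x3): 0x6530}

print(hex(next_hop(node, 0x65AF, leaf_set, routing_table)))  # 0x65b0 (leaf set)
print(hex(next_hop(node, 0xD46A, leaf_set, routing_table)))  # 0xd13d (routing table, row 0)
print(hex(next_hop(node, 0x6200, leaf_set, routing_table)))  # 0x64ff (rare case)
```

Each returned hop either shares a longer prefix with the key than the current node or is numerically closer with an equally long prefix, which is the convergence argument stated above.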
89 Routing performance
- Assumption: routing tables are accurate and there are no recent node failures.
- There are 3 cases in the Pastry routing scheme.
- Case 1: Forward the query (according to the RT) to a node with a longer prefix match than the current node. The number of nodes with longer prefix matches is reduced by at least a factor of $2^b$ in each step, so the destination is reached in $\log_{2^b} N$ steps.
90 Routing performance
There are 3 cases:
- Case 2: The query is routed via the leaf set (one step). This increases the number of hops by one.
91 Routing performance
There are 3 cases:
- Case 3: The key is neither covered by the leaf set nor does the RT contain an entry with a longer matching prefix than the current node.
- Consequently, the query is forwarded to a node with the same prefix length, adding an additional routing hop.
- For a moderate leaf set size (|L| = $2 \times 2^b$), the probability of this case is less than 0.6%, so it is very unlikely that more than one additional hop is incurred.
92 Routing performance
- As a result, the complexity of routing remains at $O(\log_{2^b} N)$ on average.
- Higher values of b lead to faster routing but also increase the amount of state that needs to be managed at each node.
- Thus, b is typically 4, but a Pastry implementation can choose an appropriate trade-off for a specific application.
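The trade-off in b can be made concrete with a little arithmetic. The sketch below assumes a network of one million nodes (an illustrative figure) and counts only populated routing-table rows, taken as $\lceil \log_{2^b} N \rceil$ with $2^b - 1$ entries each; leaf and neighborhood sets are ignored.

```python
import math

def pastry_rows(n_nodes: int, b: int) -> int:
    """Populated routing-table rows, which also bounds the expected hop count."""
    return math.ceil(math.log2(n_nodes) / b)

def routing_table_entries(n_nodes: int, b: int) -> int:
    """Rows times (2^b - 1) entries per row."""
    return pastry_rows(n_nodes, b) * (2 ** b - 1)

N = 10 ** 6
for b in (1, 2, 4):
    print(f"b={b}: about {pastry_rows(N, b)} hops, {routing_table_entries(N, b)} table entries")
```

Going from b = 2 to b = 4 halves the hop count (10 to 5) while the routing table grows from 30 to 75 entries.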
93 Join and Failure
Join:
- Use routing to find the numerically closest node already in the network.
- Ask for state from all nodes on the route and initialize own state.
Error correction:
- Failed leaf node: contact a leaf node on the side of the failed node and add an appropriate new neighbor.
- Failed table entry: contact a live entry with the same prefix as the failed entry until a new live entry is found; if none is found, keep trying with longer-prefix table entries.
94 Self Organization: Node Arrival
- The new node n is assumed to know a nearby Pastry node k, based on the network proximity metric.
- Now n needs to initialize its RT, leaf set, and neighbourhood set.
- Since k is assumed to be close to n, the nodes in k’s neighbourhood set are reasonably good choices for n, too.
- Thus, n copies the neighbourhood set from k.
95 Self Organization: Node Arrival
- To build its RT and leaf set, n routes a special join message via k to a key equal to n.
- According to the standard routing rules, the query is forwarded to the node c with the numerically closest ID. Hence the leaf set of c is suitable for n, so n retrieves c’s leaf set for itself.
- The join request triggers all nodes that forwarded the query towards c to provide n with their routing information.
96 Self Organization: Node Arrival
- Node n’s RT is constructed from the routing information of these nodes, starting at row 0.
- As this row is independent of the local node ID, n can use the entries in row 0 of k’s routing table.
- In particular, it is assumed that n and k are close in terms of the network proximity metric.
- Since k stores nearby nodes in its RT, these entries are also close to n.
- In the general case of n and k not sharing a common prefix, n cannot reuse entries from any other row in k’s RT.
97 Self Organization: Node Arrival
- The route of the join message from n to c leads via nodes v1, v2, …, vn, with increasingly longer common prefixes of n and vi.
- Thus, row 1 from the RT of v1 is also a good choice for the same row of the RT of n.
- The same is true for row 2 on node v2, and so on.
- Based on this information, the RT of n can be constructed.
98 Self Organization: Node Arrival
- Finally, the new node sends its node state to all nodes in its routing data so that these nodes can update their own routing information accordingly.
- In contrast to lazy updates in Chord, this mechanism actively updates the state in all affected nodes when a new node joins the system.
- At this stage, the new node is fully present and reachable in the Pastry network.
99 Node Failure
- Node failure is detected when a communication attempt with another node fails. Routing requires contacting nodes from the RT and leaf set, resulting in lazy detection of failures.
- During routing, the failure of a single node in the RT does not significantly delay the routing process. The local node can choose to forward the query to a different node from the same row in the RT. (Alternatively, a node could store backup nodes with each entry in the RT.)
100 Node Failure
- To replace the failed node at entry i in row j of its RT ($R_j^i$), a node contacts another node referenced in row j.
- Entries in the same row j of the remote node are valid for the local node, and hence it can copy entry $R_j^i$ from the remote node to its own RT.
- In case that entry has failed as well, it can probe another node in row j for entry $R_j^i$.
- If no live node with an appropriate nodeId prefix can be obtained in this way, the local node queries nodes from the preceding row $R_{j-1}$.
101 Node Failure
- Repairing a failed entry in the leaf set of a node is straightforward: it uses the leaf sets of other nodes referenced in the local leaf set.
- The node contacts the leaf set node with the largest index on the side of the failed node.
- If this node is unavailable, the local node can fall back to leaf set entries with smaller indices.
102 Node Departure
- Neighborhood set: the node asks other members for their M, checks the distance of each newly discovered node, and updates its own neighborhood set accordingly.
103 Locality
“Route chosen for a message is likely to be good with respect to the proximity metric”
Discussion:
- Locality in the routing table
- Route locality
- Locating the nearest among k nodes
104 Locality in the routing table
- Node A is near X.
- A’s R0 entries are close to A, A is close to X, and the triangle inequality holds, so these entries are also relatively near X.
- Likewise, obtaining X’s neighborhood set from A is appropriate.
- B’s R1 entries are a reasonable choice for R1 of X.
- Entries in each successive row are chosen from a set whose size decreases exponentially.
- The expected distance from B to any of its R1 entries is much larger than the expected distance traveled from node A to B.
- Second stage: X requests the state from each of the nodes in its routing table and neighborhood set to update its entries to closer nodes.
105 Routing locality
- Each routing step moves the message closer to the destination in the nodeId space, while traveling the least possible distance in the proximity space.
- Given that:
  - A message routed from A to B at distance d cannot subsequently be routed to a node with a distance of less than d from A.
  - The expected distance traveled by a message during each successive routing step is exponentially increasing.
- Since a message tends to make larger and larger strides with no possibility of returning to a node within $d_i$ of any node i encountered on the route, the message has nowhere to go but towards its destination.
106 Locating the nearest among k nodes
- Goal: among the k numerically closest nodes to a key, a message tends to first reach a node near the client.
- Problem: since Pastry routes primarily based on nodeId prefixes, it may miss nearby nodes with a different prefix than the key.
- Solution (using a heuristic): based on estimating the density of nodeIds, it detects when a message approaches the set of k nodes and then switches to routing based on the numerically nearest address to locate the nearest replica.
107 Arbitrary node failures and network partitions
- A node continues to be responsive, but behaves incorrectly or even maliciously.
- Repeated queries fail each time since they normally take the same route.
- Solution: routing can be randomized. The choice among multiple nodes that satisfy the routing criteria should be made randomly.
- IP routing anomalies cause some IP hosts to be unreachable from certain IP hosts but not others.
108 CAN : Content Addressable Network
- A hash value is viewed as a point <n1, n2, …, nD> in a D-dimensional Cartesian space.
- Each node is responsible for a D-dimensional “cube” in the space.
- Nodes are neighbors if their cubes “touch” at more than just a point.
- Example: D=2
  - 1’s neighbors: 2, 3, 4, 6
  - 6’s neighbors: 1, 2, 4, 5
  - Squares “wrap around”, e.g., 7 and 8 are neighbors.
- Expected # neighbors: O(D)
109 CAN : Routing
- To get to <n1, n2, …, nD> from <m1, m2, …, mD>: choose the neighbor with the smallest Cartesian distance to the destination <n1, n2, …, nD> (e.g., measured from the neighbor’s center).
- E.g., region 1 needs to send to the node covering X:
  - It checks all neighbors; node 2 is closest.
  - It forwards the message to node 2.
- The Cartesian distance decreases monotonically with each transmission.
- Expected # overlay hops: $(D N^{1/D})/4$
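Greedy routing and the hop estimate can be checked on an idealized CAN. The sketch assumes that all zones are equal-sized cells of a k × k grid with wraparound (real CAN zones are irregular), so a node is identified by its grid coordinates and N = k^D; the function names are illustrative.

```python
import itertools
import math

def torus_delta(a, b, k):
    d = abs(a - b)
    return min(d, k - d)  # the space wraps around in every dimension

def distance(p, q, k):
    return math.sqrt(sum(torus_delta(a, b, k) ** 2 for a, b in zip(p, q)))

def neighbors(p, k):
    """Cells that share a face with p: one step up or down in one dimension."""
    for dim in range(len(p)):
        for step in (-1, 1):
            q = list(p)
            q[dim] = (q[dim] + step) % k
            yield tuple(q)

def route(src, dst, k):
    """Greedy routing: always forward to the neighbor closest to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(min(neighbors(path[-1], k), key=lambda q: distance(q, dst, k)))
    return path

K, D = 8, 2  # 8 x 8 grid, so N = 64 nodes
hops = [len(route((0,) * D, dst, K)) - 1
        for dst in itertools.product(range(K), repeat=D)]
average = sum(hops) / len(hops)
predicted = (D / 4) * (K ** D) ** (1 / D)
print(average, predicted)  # 4.0 4.0
```

On this regular grid the measured average path length equals $(D/4) N^{1/D}$; with irregular zones the formula is only an expectation.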
110 CAN : Join
To join the CAN:
- Find some node in the CAN (via a bootstrap process).
- Choose a point in the space uniformly at random.
- Using CAN routing, inform the node that currently covers that point.
- That node splits its space in half:
  - The 1st split is along the 1st dimension.
  - If the last split was along dimension i < D, the next split is along dimension i+1.
  - E.g., for the 2-d case, split on the x-axis, then the y-axis.
- It keeps half the space and gives the other half to the joining node.
- The likelihood of a rectangle being selected is proportional to its size, i.e., big rectangles are chosen more frequently.
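112 CAN: construction
[Figure: a new node and a bootstrap node; node I is already in the CAN.]
1) Discover some node “I” already in the CAN.
113 CAN: Join
[Figure: the new node picks a point (x,y) in the space.]
2) Pick a random point (x,y) in the space.
114 CAN: Join
[Figure: I routes towards (x,y), which lies in J’s zone.]
3) I routes to (x,y) and discovers node J.
115 CAN: Join
[Figure: J’s zone is split between J and the new node.]
4) Split J’s zone in half; the new node owns one half.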
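The four join steps can be sketched with zones as axis-aligned boxes. Assumptions in this example: a 2-d unit square, a linear scan standing in for CAN routing to the chosen point, split dimensions cycling x, y, x, …, and the existing owner keeping the lower half (the slides do not say which half is kept). The class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    lo: tuple          # lower corner, inclusive
    hi: tuple          # upper corner, exclusive
    owner: str
    next_dim: int = 0  # dimension along which this zone splits next

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def area(self):
        result = 1.0
        for l, h in zip(self.lo, self.hi):
            result *= h - l
        return result

    def split(self, new_owner):
        """Halve this zone along next_dim; return the half given to new_owner."""
        d = self.next_dim
        mid = (self.lo[d] + self.hi[d]) / 2
        following = (d + 1) % len(self.lo)
        upper = Zone(self.lo[:d] + (mid,) + self.lo[d + 1:], self.hi, new_owner, following)
        self.hi = self.hi[:d] + (mid,) + self.hi[d + 1:]   # existing owner keeps the lower half
        self.next_dim = following
        return upper

def join(zones, new_owner, point):
    # Stand-in for "route to (x,y) and discover node J": scan for the covering zone.
    target = next(z for z in zones if z.contains(point))
    zones.append(target.split(new_owner))

def owner_of(zones, point):
    return next(z.owner for z in zones if z.contains(point))

zones = [Zone((0.0, 0.0), (1.0, 1.0), "I")]
join(zones, "A", (0.7, 0.2))   # splits I's zone along x
join(zones, "B", (0.8, 0.9))   # splits A's zone along y
print([(z.owner, z.lo, z.hi) for z in zones])
```

After the two joins, I owns the left half of the square, A the lower-right quarter, and B the upper-right quarter.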
116 CAN Failure recovery
View the partitioning as a binary tree:
- Leaves represent regions covered by overlay nodes.
- Intermediate nodes represent “split” regions that could be “reformed”.
- Siblings are regions that can be merged together (forming the region that is covered by their parent).
117 CAN Failure Recovery
Failure recovery when leaf S is removed:
- Find a leaf node T that is either
  - S’s sibling, or
  - a descendant of S’s sibling where T’s sibling is also a leaf node.
- T takes over S’s region (moves to S’s position in the tree).
- T’s sibling takes over T’s previous region.
118 Maintenance
- Use zone takeover when a node fails or leaves.
- Send your neighbor table to your neighbors at a discrete time interval t to signal that you are alive.
- If a neighbor does not send an alive message within time t, take over its zone.
- Zone reassignment is needed.