
Slide 1: Tapestry: A Resilient Global-scale Overlay for Service Deployment
Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John Kubiatowicz
IEEE Journal on Selected Areas in Communications, January 2004

Slide 2: Limitations of Plaxton Routing
1. Need for global knowledge to construct the neighbor table.
2. Static architecture: no provision for node addition or deletion.
3. The single root for an object is a bottleneck and a single point of failure.
Tapestry provides innovative solutions to some of these bottlenecks of classical Plaxton routing.

Slide 3: Tapestry
Tapestry is a self-organizing, robust, scalable wide-area infrastructure for efficient location and delivery of content in the presence of heavy load and node or link failure. It is the backbone of OceanStore, a persistent wide-area storage system.

Slide 4: Major Features of Tapestry
Most contemporary P2P systems did not take network distances into account when constructing their routing overlays; thus, a single overlay hop may span the diameter of the network. In contrast, Tapestry constructs locally optimal routing tables from initialization and maintains them thereafter.

Slide 5: Major Features of Tapestry
DHT-based systems fix the number and location of object replicas. In contrast, Tapestry allows applications to place objects according to their needs. Tapestry "publishes" location pointers throughout the network to facilitate efficient routing to those objects with low network stretch.

Slide 6: Major Features of Tapestry
Unlike classical Plaxton routing, Tapestry allows node insertion and deletion to support a dynamic environment. These algorithms are fairly efficient.

Slide 7: Routing and Location
- Namespace (both nodes and objects): 160 bits, using the hash function SHA-1.
- Each object has one or more roots: H(ObjectID) = RootID.
- Suffix routing from A to B: at the h-th hop, arrive at the nearest node N_h that shares a suffix of length h digits with B.
  - Example: 5324 routes to 0629 via 5324 --> 2349 --> 1429 --> 7629 --> 0629.
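A minimal sketch of how one suffix-routing hop might be resolved, assuming equal-length IDs and a hypothetical routing table keyed by (level, digit):

```python
def next_hop(local_id: str, dest_id: str, routing_table: dict):
    """Pick the next hop toward dest_id, matching one more suffix
    digit per hop (hypothetical (level, digit) table layout)."""
    # Count how many trailing digits already match.
    h = 0
    while h < len(dest_id) and local_id[-(h + 1)] == dest_id[-(h + 1)]:
        h += 1
    if h == len(dest_id):
        return None  # this node is the destination
    # Resolve the next digit to the left of the shared suffix.
    wanted = dest_id[-(h + 1)]
    return routing_table.get((h, wanted))  # nearest node with that suffix
```

With the slide's example, node 5324 resolving destination 0629 shares no suffix digits, so it looks up a level-0 entry for digit 9 and forwards to a node such as 2349.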

Slide 8: Tapestry API
- PUBLISH OBJECT: publish (make available) an object on the local node.
- UNPUBLISH OBJECT: best-effort attempt to remove location mappings for an object.
- ROUTE TO OBJECT: route a message to the location of an object with a given GUID (globally unique ID).
- ROUTE TO NODE: route a message to the application on an exact node.
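The four calls, sketched as an abstract interface; the method names follow the slide, but the signatures are assumptions:

```python
from abc import ABC, abstractmethod

class TapestryAPI(ABC):
    """Hedged sketch of the four-call Tapestry interface."""

    @abstractmethod
    def publish_object(self, guid: bytes) -> None:
        """Make an object stored on the local node reachable by GUID."""

    @abstractmethod
    def unpublish_object(self, guid: bytes) -> None:
        """Best-effort removal of the object's location mappings."""

    @abstractmethod
    def route_to_object(self, guid: bytes, msg: bytes) -> None:
        """Deliver msg to a location of the object with this GUID."""

    @abstractmethod
    def route_to_node(self, node_id: bytes, msg: bytes) -> None:
        """Deliver msg to the application on exactly this node."""
```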

Slide 9: Requirements for Insert and Delete
- No central directory can be used:
  - no hot spot or single point of failure;
  - reduced danger of DoS attacks.
- Must be fast (should contact only a few nodes).
- Must keep objects available.

Slide 10: Node Insertion
- Need-to-know nodes are notified, since the inserted node fills a null entry in their routing tables.
- The new node might become the root for some existing objects; references to those objects must be moved to maintain object availability.
- The algorithms must construct a near-optimal routing table for the inserted node.
- Nodes near the inserted node are notified and may consider using it in their routing tables as an optimization.

Slide 11: Choosing a Root in Tapestry
- Compute H(ObjectID) = RootID.
- Attempt to route to this node (which should be the root) without knowing whether it exists. If it exists, it becomes the root. Otherwise:
  - whenever a null entry is encountered, choose the next "higher" non-null pointer entry (thus, if xx53 does not exist, try xx63), or a secondary entry;
  - if the current node S holds the only non-null pointer in the rest of the map, terminate the route and choose S as the root.
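A sketch of the digit-bumping rule, assuming a hypothetical (level, digit) routing-table layout and decimal digits for readability (Tapestry itself uses a hexadecimal namespace):

```python
DIGITS = "0123456789"

def surrogate_entry(routing_table: dict, level: int, wanted: str):
    """Return the next non-null entry at this level, starting at the
    wanted digit and wrapping to 'higher' digits (e.g. if the entry
    for xx53 is null, try xx63, then xx73, ...)."""
    start = DIGITS.index(wanted)
    for i in range(len(DIGITS)):
        entry = routing_table.get((level, DIGITS[(start + i) % len(DIGITS)]))
        if entry is not None:
            return entry
    return None  # rest of the map is empty: the current node is the root
```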

Slide 12: Acknowledged Multicast Algorithm
- Locates and contacts all nodes with a given suffix.
- A popular tool, useful in node insertion.
- Creates a tree based on IDs; the starting node knows when all nodes have been reached.
- Example (from the slide's figure): a node matching ??345 sends to any ?0345, any ?1345, any ?2345, etc., if possible; ?4345 in turn sends to 04345 and 54345, if they exist.
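A recursive sketch of the tree walk, assuming a hypothetical node.contact_for(suffix) lookup that returns one live node with the given suffix (or None):

```python
def acked_multicast(node, suffix: str, visit) -> bool:
    """Contact every node whose ID ends in `suffix`; returns once the
    whole subtree rooted at this suffix has acknowledged."""
    if len(suffix) == len(node.node_id):
        return visit(node)  # leaf: the suffix names exactly one node
    acks = []
    for d in "0123456789":
        child = node.contact_for(d + suffix)  # one live node per extension
        if child is not None:
            acks.append(acked_multicast(child, d + suffix, visit))
    return all(acks)  # acknowledgements propagate back up the tree
```

Because each recursive call returns only after its whole subtree has acknowledged, the starting node knows when every matching node has been reached.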

Slide 13: Three Parts of Insertion
1. Establish pointers from the surrogates to the new node.
2. Notify the need-to-know nodes.
3. Create routing tables and notify other nodes.

Slide 14: Finding the Surrogates
- The new node sends a join message to a surrogate via a gateway.
- The primary surrogate multicasts to all other surrogates with a similar suffix.
- Each surrogate establishes a pointer to the new node.
- When all pointers are established, continue.
[Figure: a join from new node 01234 travels through a gateway to surrogates matching ????4 and then ???34, e.g. nodes 01334, 79334, and 39334.]

Slide 15: Need-to-know Nodes
- "Need-to-know" = a node with a hole in its neighbor table that the new node fills.
- If 01234 is the new node, and no nodes ending in 234 existed before, the ???34 nodes must be notified.
- Use acknowledged multicast to reach all matching nodes.
- During this time, object requests may go either to the new node or to the former surrogate, but that's okay.
- Once done, delete the pointers from the surrogates.

Slide 16: Constructing the Neighbor Table via Nearest-Neighbor Search
Suppose we have an algorithm A for finding the three nearest neighbors of a given node. To fill in a slot, apply A to the subnetwork of nodes that could fill that slot. (For slot ????1, run A on the network of nodes ending in 1 and pick the nearest neighbor.)

Slide 17: Finding the Nearest Neighbor
Let G be the surrogate, which matches the new node in the last j digits of the node ID.
A. G sends its j-list to the new node; the new node pings all nodes on the j-list.
B. If one of them is closer, go to that node and repeat. If not, this level is done: let j = j - 1 and go to A.
The j-list is the closest k = O(log n) nodes matching in j digits.
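A sketch of the level-by-level descent; ping(a, b) (a measured round-trip distance) and j_list(n, j) (node n's closest k = O(log n) nodes matching in j digits) are hypothetical helpers:

```python
def find_nearest(new_node, surrogate, j: int, ping, j_list):
    """Starting from the surrogate, repeatedly move to any j-list node
    closer to new_node; when no improvement remains, relax the
    suffix-match requirement by one digit and repeat."""
    current = surrogate
    while j >= 0:
        while True:
            closer = [c for c in j_list(current, j)
                      if ping(new_node, c) < ping(new_node, current)]
            if not closer:
                break  # no j-list node is closer: done with this level
            current = min(closer, key=lambda c: ping(new_node, c))
        j -= 1  # let j = j - 1 and go back to step A
    return current
```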

Slide 18: Is This the Nearest Node?
Yes, with high probability under an assumption.
- Pink circle = the ball around the new node of radius d(G, new node).
- Progress = finding any node in the pink circle.
- Consider the ball around G containing all of its j-list (the black ball). Two cases:
  - the black ball contains the pink ball, so the closest node has been found; or
  - there is high overlap between the pink ball and the G-ball, so it is unlikely that the pink ball is empty while the G-ball holds k nodes.

Slide 19: Deleting a Node
[Figure: exiting node 12345 with its in-neighbors (e.g. 54321 and 11115, which point to it through their xxx45 and xxxx5 entries) and its out-neighbors (e.g. 11111, reached through its xxxx1 entry).]

Slide 20: Planned Delete
- Notify its out-neighbors: the exiting node says, "I'm no longer pointing to you."
- To its in-neighbors: the exiting node says it is leaving and proposes at least one replacement.
- The exiting node republishes all object pointers it stores.
- Objects rooted at the exiting node get new roots.
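A sketch of the graceful-exit steps; the node attributes (out_neighbors, in_neighbors, object_pointers) and the messaging helpers are assumptions:

```python
def planned_delete(exiting):
    """Graceful exit: unlink from out-neighbors, offer replacements to
    in-neighbors, and republish stored object pointers."""
    for n in exiting.out_neighbors:
        n.send("unlink", exiting.node_id)  # "I'm no longer pointing to you"
    for n in exiting.in_neighbors:
        # Propose at least one replacement for the hole left behind.
        n.send("replace", exiting.node_id, exiting.replacement_for(n))
    for guid, location in exiting.object_pointers.items():
        exiting.republish(guid, location)  # objects rooted here get new roots
```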

Slide 21: Republish-on-Delete
[Figure illustrating republish-on-delete.]

Slide 22: Unplanned Delete
Planned delete relied on the exiting node's neighbor table:
- its list of out-neighbors;
- its list of in-neighbors;
- the closest matching node for each level.
Can we reconstruct this information? Not easily. Fortunately, we probably don't need to.

Slide 23: Lazy Handling of Unplanned Delete
When A notices that B is dead, A fixes its own state:
- A removes B from its routing tables.
- If removing B produces a hole, A must fill the hole or be sure that the hole cannot be filled; use acknowledged multicast.
- A republishes all objects whose next hop was B, using republish-on-delete as before.
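A sketch of the repair node A runs on itself after detecting that B is dead; the routing-table layout and the helpers (acked_multicast_find, next_hop_was, republish) are assumptions:

```python
def handle_dead_neighbor(a, b_id):
    """A fixes only its own state: drop B, patch any resulting hole,
    and republish pointers whose next hop was B."""
    for slot, entry in list(a.routing_table.items()):
        if entry == b_id:
            # Fill the hole, or confirm it cannot be filled, via
            # acknowledged multicast to all nodes matching the slot.
            a.routing_table[slot] = a.acked_multicast_find(slot)
    for guid, location in a.object_pointers.items():
        if a.next_hop_was(guid, b_id):
            a.republish(guid, location)  # republish-on-delete as before
```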

Slide 24: Tapestry Mesh
Incremental suffix-based routing (slide borrowed from the original authors).
[Figure: a mesh of nodes with hexadecimal IDs (0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, 0x73FE, 0x423E, 0x79FE, 0x23FE, 0x73FF, 0x555E, 0x035E, 0x44FE, 0x9990, 0xF990, 0x993E, 0x04FE) linked by routing edges labeled with levels 1-4.]

Slide 25: Fault Detection
- Soft-state vs. explicit fault recovery: soft-state periodic republishing is more attractive, since the expected additional traffic for periodic republishing is low.
- Redundant roots for better resilience: object names are hashed with small salts, i.e., multiple names/roots; queries and publishing utilize all roots in parallel.
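A small sketch of salted root derivation, using the SHA-1 namespace from slide 7; the salt scheme shown here (appending one byte per salt) is an assumption:

```python
import hashlib

def root_ids(object_name: bytes, num_roots: int = 4) -> list:
    """Derive several independent root IDs by hashing the object name
    with small salts; queries and publishes use all of them in parallel."""
    return [hashlib.sha1(object_name + bytes([salt])).digest()
            for salt in range(num_roots)]
```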

Slide 26: Summary of Insertion Steps
- Step 1: Build up N's routing maps.
  - Send messages to each hop along the path from the gateway to the current node N.
  - The i-th hop along the path sends its i-th level route table to N.
  - N optimizes those tables where necessary.
- Step 2: Move appropriate data from N' (the surrogate) to N.
- Step 3: Use back pointers from N' to find nodes that have null entries for N's ID, and tell them to add a new entry for N.
- Step 4: Notify local neighbors to modify their paths to route through N where appropriate.

Slide 27: Dynamic Insertion Example
(Borrowed from the original slides.)
[Figure: new node 0x143FE joins through gateway 0xD73FF into a mesh of nodes 0x243FE, 0x913FE, 0x0ABFE, 0x71290, 0x5239E, 0x973FE, 0x779FE, 0xA23FE, 0xB555E, 0xC035E, 0x244FE, 0x09990, 0x4F990, 0x6993E, and 0x704FE, with routing edges labeled with levels 1-4.]

