1 Tapestry: A Resilient Global-scale Overlay for Service Deployment Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John Kubiatowicz IEEE Journal on Selected Areas in Communications, January 2004

2 Limitations of Plaxton routing 1. Need for global knowledge to construct neighbor tables. 2. Static architecture: no provision for node addition or deletion. 3. The single root for an object is a bottleneck and a single point of failure. Tapestry provides innovative solutions to some of these bottlenecks of classical Plaxton routing.

3 Tapestry Tapestry is a self-organizing, robust, scalable wide-area infrastructure for efficient location and delivery of content in the presence of heavy load and node or link failures. It is the backbone of OceanStore, a persistent wide-area storage system.

4 Major features of Tapestry Most contemporary P2P systems did not take network distances into account when constructing their routing overlay; thus, a single overlay hop may span the diameter of the network. In contrast, Tapestry constructs locally optimal routing tables from initialization and subsequently maintains them.

5 Major features of Tapestry DHT-based systems fix the number and location of object replicas. In contrast, Tapestry allows applications to place objects according to their needs. Tapestry “publishes” location pointers throughout the network to facilitate efficient routing to those objects with low network stretch.

6 Major features of Tapestry Unlike classical Plaxton routing, Tapestry allows node insertion and deletion to support a dynamic environment. These algorithms are fairly efficient.

7 Routing and Location Namespace (both nodes and objects): 160 bits, using the hash function SHA-1. Each object has one or more roots: H (ObjectID) = RootID. Suffix routing from A to B: at the h-th hop, arrive at the nearest node N_h that shares a suffix of length h digits with B –Example: 5324 routes to 0629 via nodes whose IDs end in 9, then 29, then 629, and finally 0629 itself
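A minimal sketch of this suffix-matching rule (not the authors' code; the routing_table accessor and helper names are hypothetical): each hop simply extends the length of the suffix shared with the destination by one digit.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits two fixed-length IDs have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, dest: str, routing_table) -> str:
    """At hop h the message sits at a node sharing an h-digit suffix with dest;
    the next hop must share at least h+1 digits.
    routing_table(level, digit) returns a neighbor ID or None (hypothetical accessor)."""
    h = shared_suffix_len(current, dest)
    wanted_digit = dest[-1 - h]  # the digit that extends the shared suffix
    return routing_table(h, wanted_digit)

# Pattern of the example above: 5324 -> xxx9 -> xx29 -> x629 -> 0629
```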

8 Tapestry API PUBLISH OBJECT: Publish (make available) an object on the local node. UNPUBLISH OBJECT: Best-effort attempt to remove the location mappings for an object. ROUTE TO OBJECT: Routes a message to a location of the object with the given GUID (globally unique ID). ROUTE TO NODE: Routes a message to the application on the exact node.
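A sketch of how this four-call interface might look to an application. The method names mirror the slide, but the Python signatures and types are illustrative assumptions, not Tapestry's actual API.

```python
from abc import ABC, abstractmethod

class TapestryAPI(ABC):
    """Illustrative interface for the four operations listed above."""

    @abstractmethod
    def publish_object(self, guid: bytes) -> None:
        """Make an object stored on the local node reachable via its GUID."""

    @abstractmethod
    def unpublish_object(self, guid: bytes) -> None:
        """Best-effort removal of the location mappings for the GUID."""

    @abstractmethod
    def route_to_object(self, guid: bytes, msg: bytes) -> None:
        """Deliver msg to some node holding a replica of the object."""

    @abstractmethod
    def route_to_node(self, node_id: bytes, msg: bytes) -> None:
        """Deliver msg to the application on the exact node node_id."""
```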

9 Requirements for Insert and Delete No central directory can be used –No hot spot or single point of failure –Reduced danger/threat of DoS. Must be fast (should contact only a few nodes) Keep objects available

10 Node Insertion Need-to-know nodes must be notified, since the inserted node fills a null entry in their routing tables. The new node might become the root for some existing objects; references to those objects must be moved to maintain object availability. The algorithms must construct a near-optimal routing table for the inserted node. Nodes near the inserted node are notified, and such nodes may consider using it in their routing tables as an optimization.

11 Choosing Root in Tapestry Compute H (ObjectID) = RootID. Attempt to route to this node (which should be the root) without knowing if it exists. If it exists, then it becomes the root. Otherwise –Whenever a null entry is encountered, choose the next “higher” non-null pointer entry (thus, if xx53 does not exist, try xx63), or a secondary entry –If the current node S holds the only non-null pointer in the rest of the map, terminate the route and choose S as the root
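A sketch of this surrogate rule, assuming one routing-table row per level indexed by digit; the wrap-around when scanning for a "higher" digit is an assumption, and the function name is hypothetical.

```python
def surrogate_entry(table_row, wanted_digit: int, base: int = 10):
    """If the desired entry is null, try successively 'higher' digits
    (e.g. if xx53 is missing, try xx63), wrapping around (assumed).
    table_row[d] is a node ID or None.  Returns None when the whole row
    is empty, i.e. the current node is the root (surrogate) for this ID."""
    for offset in range(base):
        candidate = table_row[(wanted_digit + offset) % base]
        if candidate is not None:
            return candidate
    return None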

12 Acknowledged Multicast Algorithm Locates and contacts all nodes with a given suffix; a popular tool, useful in node insertion. Creates a tree based on IDs, so the starting node knows when all nodes have been reached. A node matching ??345 sends to any ?0345, any ?1345, any ?2345, etc., where possible; ?4345 in turn sends to 04345, 54345, … if they exist
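A sketch of this multicast tree, with a hypothetical node.lookup(level, digit) that returns any neighbor matching the one-digit-longer suffix; returning from the recursive call stands in for the acknowledgement flowing back up the tree.

```python
def acked_multicast(node, suffix: str, level: int, visit, base: int = 10):
    """Contact every node whose ID ends in 'suffix' (node itself matches).
    For each possible next digit, forward to one matching neighbor, which
    recurses on the longer suffix; acknowledgements propagate back up,
    so the starting node knows when all matching nodes have been reached."""
    visit(node)                                # e.g. establish a pointer
    for digit in range(base):
        child = node.lookup(level + 1, digit)  # any node matching str(digit)+suffix
        if child is not None and child is not node:
            acked_multicast(child, str(digit) + suffix, level + 1, visit, base)
    # returning here plays the role of the acknowledgement to the parent
```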

13 Three Parts of Insertion 1. Establish pointers from the surrogates to the new node. 2. Notify the need-to-know nodes. 3. Create routing tables and notify other nodes.

14 Finding the surrogates The new node sends a join message to a surrogate. The primary surrogate multicasts to all other surrogates with a similar suffix. Each surrogate establishes a pointer to the new node. When all pointers are established, continue. (Diagram: gateway, surrogates matching ????4 and ???34, and the new node.)

15 Need-to-know nodes “Need-to-know” = a node with a hole in its neighbor table filled by the new node. If the new node’s ID ends in 234 and no nodes ending in 234 existed, the ???34 nodes must be notified. Acknowledged multicast to all matching nodes. During this time, object requests may go either to the new node or to the former surrogate, but that’s okay. Once done, delete the pointers from the surrogates.

16 Constructing the Neighbor Table via a nearest neighbor search Suppose we have an algorithm A for finding the three nearest neighbors of a given node. To fill in a slot, apply A to the subnetwork of nodes that could fill that slot. (For ????1, run A on the network of nodes ending in 1, and pick the nearest neighbor.)

17 Finding Nearest Neighbor Let G be the surrogate, which matches the new node in the last j digits of its node ID. A. G sends its j-list to the new node; the new node pings all nodes on the j-list. B. If one is closer, go to that node and repeat A. If not, this level is done; let j = j-1 and go to A. (The j-list is the closest k = O(log n) nodes matching in j digits.)
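A sketch of this A/B loop, assuming a hypothetical G.j_list(j) accessor and a ping(x) function that returns the measured distance from the new node to x.

```python
def find_nearest(surrogate, j: int, ping):
    """Descend level by level: fetch G's j-list (closest k = O(log n) nodes
    matching in j digits), ping them, move to any strictly closer node, and
    when no node is closer, drop to level j-1.  Returns the final G."""
    G = surrogate
    while j >= 0:
        improved = True
        while improved:
            improved = False
            for candidate in G.j_list(j):
                if ping(candidate) < ping(G):  # candidate is closer to the new node
                    G = candidate
                    improved = True
                    break                      # restart with the new G's j-list
        j -= 1                                 # done with this level
    return G
```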

18 Is this the nearest node? Yes, with high probability, under an assumption. Pink circle = the ball around the new node of radius d(G, new node). Progress = finding any node in the pink circle. Consider the ball around G containing all of its j-list. Two cases: –The black ball (around G) contains the pink ball; the closest node has been found –There is high overlap between the pink ball and the G-ball, so it is unlikely that the pink ball is empty while the G-ball has k nodes. (Diagram: G matches the new node in j digits.)

19 Deleting a node (Diagram: the exiting node with its in-neighbors and out-neighbors.)

20 Planned Delete Notify its out-neighbors: the exiting node says “I’m no longer pointing to you.” To its in-neighbors: the exiting node says it is leaving and proposes at least one replacement. –The exiting node republishes all object pointers it stores. Objects rooted at the exiting node get new roots.
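A sketch of the exiting node's side of a planned delete; every node method here is a hypothetical placeholder for the messages described above.

```python
def planned_delete(exiting):
    """The exiting node tells its out-neighbors to drop their back-pointers,
    offers each in-neighbor at least one replacement for the routing entry
    that pointed at it, and republishes the object pointers it stores so
    that objects rooted here acquire new roots."""
    for n in exiting.out_neighbors():
        n.remove_backpointer(exiting)          # "I'm no longer pointing to you"
    for n in exiting.in_neighbors():
        n.replace_entry(old=exiting, new=exiting.suggest_replacement(n))
    for ptr in exiting.stored_object_pointers():
        exiting.republish(ptr)                 # objects rooted here get new roots
```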

21 Republish-On-Delete (diagram)

22 Unplanned Delete Planned delete relied on the exiting node’s neighbor table: –List of out-neighbors –List of in-neighbors –Closest matching node for each level. Can we reconstruct this information? –Not easily –Fortunately, we probably don’t need to.

23 Lazy Handling of Unplanned Delete When A notices that B is dead, A fixes its own state: –A removes B from its routing tables. If removing B produces a hole, A must fill the hole, or be sure that the hole cannot be filled (use acknowledged multicast). –A republishes all objects with next hop = B, using republish-on-delete as before.
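A sketch of this lazy repair from A's point of view, again with hypothetical helpers; find_fill_for_hole stands in for the acknowledged-multicast search described above.

```python
def handle_dead_neighbor(A, B):
    """A only fixes its own state: drop B from the routing table, try to
    refill the hole (or confirm that no node can fill it), and republish
    every object whose next hop through A was B, relying on
    republish-on-delete to re-route the pointers."""
    level, digit = A.remove_from_routing_table(B)
    replacement = A.find_fill_for_hole(level, digit)   # may use acked multicast
    if replacement is not None:
        A.set_entry(level, digit, replacement)
    for ptr in A.object_pointers_with_next_hop(B):
        A.republish(ptr)
```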

24 Tapestry Mesh: incremental suffix-based routing (diagram of a mesh of NodeIDs, borrowed from the original authors’ slides)

25 Fault detection Soft state vs. explicit fault recovery: soft-state periodic republishing is more attractive –The expected additional traffic for periodic republishing is low. Redundant roots for better resilience –Object names are hashed with small salts, i.e., multiple names/roots –Queries and publishing utilize all roots in parallel
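A sketch of the salted-roots idea; the salt encoding and the number of roots are assumptions for illustration, not Tapestry's actual parameters.

```python
import hashlib

def root_ids(object_name: bytes, num_roots: int = 4) -> list:
    """Hash the object name with a few small salts to obtain several
    independent 160-bit root IDs; queries and publishing can then use
    all of the roots in parallel."""
    return [hashlib.sha1(object_name + bytes([salt])).hexdigest()
            for salt in range(num_roots)]

# Example: root_ids(b"my-object") yields four distinct root identifiers.
```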

26 Summary of Insertion steps Step 1: Build up N’s routing maps –Send messages to each hop along the path from the gateway to N’s surrogate N’ –The i-th hop along the path sends its i-th level route table to N –N optimizes those tables where necessary Step 2: Move appropriate data from the surrogate N’ to N Step 3: Use back pointers from N’ to find nodes that have null entries for N’s ID, and tell them to add a new entry for N Step 4: Notify local neighbors to modify paths to route through N where appropriate

27 Dynamic Insertion Example (diagram borrowed from the original authors’ slides: new node 0x143FE joins via gateway 0xD73FF)