Tapestry: Architecture and Status
UCB ROC / Sahara Retreat, January 2002
Ben Y. Zhao

Why Tapestry?
Today's Internet:
- Route failures are not uncommon; BGP is too slow to recover, and redundant routes go unexploited
- IPv4 constrains deployment of new protocols: IP multicast, security protocols (DDoS traceback), …
- Wide-area applications are straining existing systems; scalable management of large-scale resources is needed
Our goals:
- A scalable wide-area network overlay
- Highly fault-tolerant routing / location
- An introspective / self-tuning platform
- Support for application-specific protocols
- Efficient (bandwidth, latency) data delivery
- Pass wide-area solutions on to the application layer

What is Tapestry?
- A prototype of a decentralized, fault-tolerant, adaptive overlay infrastructure (Zhao, Kubiatowicz, Joseph et al. 2000)
- Network substrate of OceanStore
- Routing: suffix-based hypercube, similar to Plaxton, Rajaraman, Richa (SPAA '97)
- Decentralized location: a virtual hierarchy per object with cached location references
- Dynamic algorithms using only local information
- Core API:
  - publishObject(ObjectID)
  - routeMsgToObject(ObjectID)
  - routeMsgToNode(NodeID)
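The three calls above are the whole public surface named on this slide. As a minimal sketch only, here is what that API could look like as a Java interface; the stub types and method shapes are assumptions for illustration, not the actual OceanStore-era signatures:

```java
// Hypothetical stub types; Tapestry IDs are 160 bits (see next slide).
final class GUID { final byte[] bits = new byte[20]; }
final class NodeID { final byte[] bits = new byte[20]; }
final class TapestryMessage { byte[] payload; }

// Sketch of the core API named on this slide (signatures assumed).
interface TapestryAPI {
    // Advertise a locally stored object so others can route to it.
    void publishObject(GUID objectID);

    // Route a message toward (a nearby copy of) the named object.
    void routeMsgToObject(GUID objectID, TapestryMessage msg);

    // Route a message to the live node closest to the given NodeID.
    void routeMsgToNode(NodeID nodeID, TapestryMessage msg);
}
```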

Routing and Location
- Namespace (nodes and objects): 160 bits long, so roughly 2^80 names can be drawn before an expected name collision
- Each object has its own hierarchy rooted at its root node: f(ObjectID) = RootID, via a dynamic mapping function
- Suffix routing from A to B: at the h-th hop, arrive at the nearest node hop(h) such that hop(h) shares a suffix of length h digits with B
  - Example: 5324 routes to 0629 via hops matching suffixes ...9, ...29, ...629, and finally 0629
- Object location: the root is responsible for storing the object's location; publish and search both route incrementally toward the root
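As a tiny illustration of the suffix rule (a sketch with hex-string IDs; the helper name is mine, not Tapestry's):

```java
// Suffix-digit matching behind Tapestry routing, with IDs as hex strings.
// sharedSuffixLen("5324", "0629") == 0; after one hop the next node shares
// 1 trailing digit ("...9"), then "..29", then ".629", and finally "0629".
final class SuffixRouting {
    static int sharedSuffixLen(String a, String b) {
        int n = Math.min(a.length(), b.length()), k = 0;
        while (k < n && a.charAt(a.length() - 1 - k) == b.charAt(b.length() - 1 - k))
            k++;
        return k;
    }
}
```

At hop h the router therefore needs a next hop whose ID extends the shared suffix by one more digit, which is exactly what a [digit position][digit value] routing table provides.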

Tapestry Mesh
- Incremental suffix-based routing
- [Figure: mesh of overlay NodeIDs (0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, …) connected by incremental suffix-matching links]

Object Location: Randomization and Locality
- [Figure only]

Fault-tolerant Routing
- Strategy:
  - Detect failures via soft-state probe packets
  - Route around a problematic hop via backup pointers
- Handling:
  - 3 forward pointers per outgoing route (2 backups); see the sketch after this list
  - A second-chance algorithm for intermittent failures
  - Upgrade backup pointers and replace failed ones
- Protocols:
  - First Reachable Link Selection (FRLS)
  - Proactive duplicate packet routing
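A minimal sketch of the second-chance idea (class, field, and method names are hypothetical): one lost probe only marks the primary as suspect; a second consecutive loss demotes it behind the two backups.

```java
// Route entry with 3 forward pointers: hops[0] = primary, hops[1..2] = backups.
final class FaultTolerantRouteEntry {
    private final String[] hops;
    private boolean suspect = false;

    FaultTolerantRouteEntry(String primary, String backup1, String backup2) {
        hops = new String[] { primary, backup1, backup2 };
    }

    String nextHop() { return hops[0]; }

    void onProbeLost() {
        if (!suspect) { suspect = true; return; }  // first strike: second chance
        // Second strike: rotate the failed primary to the last backup slot.
        String failed = hops[0];
        hops[0] = hops[1];
        hops[1] = hops[2];
        hops[2] = failed;
        suspect = false;
    }

    void onProbeOk() { suspect = false; }  // intermittent failure forgiven
}
```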

Talk Outline
- Tapestry overview
- Architecture
- Evaluation
- Brocade
- Conclude

Architecture Background
- OceanStore implementation: Java with asynchronous I/O
- Event-based, stage-driven architecture (Sandstorm, M. Welsh); a minimal stage sketch follows
- Layer stack, bottom to top: Operating System → Java Virtual Machine → Sandstorm (async I/O, event architecture) → Tapestry → OceanStore → Applications
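Sandstorm stages are essentially event queues with dedicated handlers. A minimal SEDA-style sketch, built on plain java.util.concurrent rather than the actual Sandstorm API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// A stage owns an event queue and a handler; stages communicate by
// enqueueing events onto each other, never by direct blocking calls.
final class Stage<E> implements Runnable {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final Consumer<E> handler;

    Stage(Consumer<E> handler) { this.handler = handler; }

    void enqueue(E event) { queue.add(event); }  // called by upstream stages

    @Override
    public void run() {
        try {
            while (true) handler.accept(queue.take());  // one event at a time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A Router or Patchwork stage would then be a `Stage<TapestryMessage>` whose handler dispatches on message type.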

Key Stages
- StaticTClient / Federation: uses config files to bootstrap the initial Tapestry
- DynamicTClient: integrates new nodes into the static Tapestry
- Router: primary handler of routing and location
- Patchwork: introspective monitoring and fault detection
- [Figure: these stages (Router, StaticTClient, DynamicTClient, Patchwork) sit on Sandstorm (async I/O, event architecture) beneath OceanStore and applications]

Static TClient
- Federation used as a rendezvous point
- Pair-wise pings used to generate route tables
- Federation used as a global barrier to begin (modeled in the sketch below):
  1. S_i says hello to F
  2. F informs the group of S_i
  3. Nodes do pair-wise pings
  4. Nodes signal readiness
  5. Barrier reached at F, which signals start
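Step 5 is a classic barrier. As an illustration only (a java.util.concurrent stand-in, not the Federation's actual code), the rendezvous can be modeled with a CountDownLatch:

```java
import java.util.concurrent.CountDownLatch;

// F waits until every S_i has signaled readiness (step 4), then all
// nodes blocked in awaitStart() are released together (step 5).
final class FederationBarrier {
    private final CountDownLatch barrier;

    FederationBarrier(int numStaticNodes) {
        barrier = new CountDownLatch(numStaticNodes);
    }

    // Called by S_i once its pair-wise pings and route tables are done.
    void signalReady() { barrier.countDown(); }

    // Each S_i blocks here; returns only when the barrier is reached at F.
    void awaitStart() throws InterruptedException { barrier.await(); }
}
```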

Dynamic TClient: Node Integration
1. Hill-climb to find the nearest gateway G
2. Route to the surrogate / copy routes
3. Move relevant objects to the new root
4. Directed multicast notifies nearby nodes
[Figure: gateway G and surrogate S exchanging a routes request/response, moving object pointers, and the directed multicast]

Routing / Location: the Router class
- Maintains (sketched below):
  - RoutingTable: a 2-D array [digit position][digit value] of RouteEntries
  - ObjectPointers: Hash(GUID) → PublishInfo, Hash(GUID) → LastHop
- Handles:
  - Object publication / unpublication / mobile objects
  - Route / location message handling
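A sketch of the state this slide lists; the index meanings and the Java shapes are assumptions based on the routing scheme described earlier, not the actual class:

```java
import java.util.HashMap;
import java.util.Map;

final class RouteEntry {
    String primary;                    // best next hop
    String[] backups = new String[2];  // 2 backup pointers (see slide 7)
}

final class Router {
    static final int DIGITS = 40;  // 160-bit IDs as 40 hex digits
    static final int BASE = 16;

    // RoutingTable: [digit position][digit value] -> RouteEntry.
    final RouteEntry[][] routingTable = new RouteEntry[DIGITS][BASE];

    // ObjectPointers: hash of a GUID -> publish info / last hop on the
    // publish path (placeholder value types for illustration).
    final Map<String, Object> publishInfo = new HashMap<>();
    final Map<String, String> lastHop = new HashMap<>();
}
```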

Patchwork
- Fault-handling / introspective stage
- Granulated periodic beacons measure loss and network latency to entries in the routing table
- Promotes / demotes routes within a single RouteEntry (see the sketch below)
- [Figure: beacons from the router across the network to links A, B, C; after a failure on A the entry's order is demoted from A, B, C to B, C, A]
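A sketch of how beacon results could feed promote/demote decisions, using an exponentially weighted moving average of loss per route; the smoothing constant and names are mine, not Patchwork's:

```java
// Per-route beacon accounting: an EWMA of loss. Patchwork-style logic
// would demote a route whose loss estimate exceeds that of a backup in
// the same RouteEntry, and promote the backup in its place.
final class BeaconStats {
    private static final double ALPHA = 0.1;  // smoothing factor (assumed)
    private double lossEwma = 0.0;

    void onBeacon(boolean received) {
        lossEwma = ALPHA * (received ? 0.0 : 1.0) + (1 - ALPHA) * lossEwma;
    }

    double estimatedLoss() { return lossEwma; }
}
```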

Deployment Status
- Object location:
  - Publish / unpublish / route to object
  - Mobile objects (backtracking unpublish)
  - Active deletes, confirmation of non-existence
- General routing:
  - Route to node, redundant routes
  - Soft-state fault detection, limited optimization
  - Advanced policies for fault recovery
- Dynamic integration:
  - Integration with limited optimizations
  - Best-effort fault-resilient integration mechanisms
  - Background threads for optimization / refresh

Talk Outline
- Tapestry overview
- Architecture
- Evaluation
- Brocade
- Conclude

Generalized Results
- Cached object pointers:
  - Efficient lookup for nearby objects
  - Reasonable storage overhead
- Multiple object roots:
  - Improve availability under attack
  - Improve performance and performance stability
- Reliable packet delivery:
  - Redundant pointers approximate optimal reachability
  - FRLS, a simple fault-tolerant UDP protocol

First Reachable Link Selection
- Use periodic UDP packets to gauge link condition
- Packets are routed to the shortest good link
- Assumes IP cannot correct its routing table in time for packet delivery
- [Figure: a route entry with links A–E, contrasting the IP choice with the Tapestry choice when no path exists to the destination]
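A minimal FRLS sketch under the slide's assumptions (links kept sorted by latency, each marked good or bad by the periodic UDP probes); the types are illustrative:

```java
import java.util.List;

final class Link {
    final String nextHop;
    final boolean probeOk;  // latest UDP probe succeeded
    Link(String nextHop, boolean probeOk) {
        this.nextHop = nextHop;
        this.probeOk = probeOk;
    }
}

final class Frls {
    // 'linksByLatency' is assumed sorted by ascending latency; forward on
    // the first (i.e. shortest) link currently believed reachable.
    static Link select(List<Link> linksByLatency) {
        for (Link l : linksByLatency)
            if (l.probeOk) return l;
        return null;  // no path currently exists to the destination
    }
}
```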

Some Numbers
- Measurement setup: PIII 800, Linux 2.2.18, IBM JDK 1.3
- Simulating 6 nodes (4 StaticTClients, 1 Federation, 1 DynamicTClient)
- Publishing / locating ~10 objects
- PublishMsg, RouteMsg: ~0–2 ms
- Integration: ~2600 ms (with pings)
- Integration messages, assuming latency data is available: 2 × n (routing and objects) + 16 × M (directed multicast notification), with M ≈ 3

Talk Outline
- Tapestry overview
- Architecture
- Evaluation
- Brocade
- Conclude

Brocade: Landmark Routing on P2P
- Exploit non-uniformity
- Minimize wide-area routing hops / bandwidth
- Secondary overlay on top of Tapestry:
  - Select super-nodes by administrative domain
  - Divide the network into cover sets
  - Super-nodes form a secondary Tapestry
  - Advertise each cover set as local objects
- Routing (A → B) uses Brocade to route directly into B's local network

Brocade Mechanisms
- Selective utilization (sketched below):
  - Nodes cache their local cover set
  - Only utilize Brocade if the destination is not in the cache
- Forwarding messages to supernodes:
  1. The super-node does IP snooping
  2. Direct: the cover set caches its supernode
- Inter-domain routing, A → B:
  1. A → SN(A) via IP
  2. SN(A) finds SN(B) via Tapestry location
  3. SN(B) → B via Tapestry / Chord / Pastry / CAN
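The selective-utilization check is just a cache lookup before involving the supernode. A sketch with hypothetical names:

```java
import java.util.Set;

final class BrocadeForwarder {
    private final Set<String> localCoverSet;  // cached node IDs in our domain
    private final String supernode;           // SN(A) for this domain

    BrocadeForwarder(Set<String> localCoverSet, String supernode) {
        this.localCoverSet = localCoverSet;
        this.supernode = supernode;
    }

    String nextHop(String destNodeId) {
        // Local destinations stay on the ordinary Tapestry path.
        if (localCoverSet.contains(destNodeId)) return destNodeId;
        // Otherwise hand off to SN(A), which locates SN(B) via Tapestry.
        return supernode;
    }
}
```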

Brocade Routing RDP
- [Graph: relative delay penalty with the local cover set cache on; interdomain:intradomain = 3:1]
- Packet simulator, transit-stub topology, 4096 Tapestry nodes, 16 supernodes

Brocade Bandwidth Usage
- [Graph: bandwidth usage with the local cover set cache on]
- Bandwidth unit: sizeof(Msg) × hops

Ongoing / Future Work
- Fill in full functionality: fault-handling policies, introspection, self-repair
- More realistic experiments:
  - Artificial topologies on the SOSS simulator
  - Larger-scale dynamic integration experiments
  - Code development
- External deployment / code release:
  - Sprint programmable routers
  - Academic networks
  - Introspective measurement platform
- Implementing applications (Bayeux, Brocade, …)

For More Information
- Tapestry and related projects (and these slides):
- OceanStore:
- Related papers: