Presentation is loading. Please wait.

Presentation is loading. Please wait.

Tapestry Architecture and status UCB ROC / Sahara Retreat January 2002 Ben Y. Zhao

Similar presentations

Presentation on theme: "Tapestry Architecture and status UCB ROC / Sahara Retreat January 2002 Ben Y. Zhao"— Presentation transcript:

1 Tapestry Architecture and status UCB ROC / Sahara Retreat January 2002 Ben Y. Zhao

2 ROC/Sahara Retreats, 1/20022 Why Tapestry Todays Internet Route failures not uncommon BGP too slow to recover, redundant routes unexploited IPv4 constrains deployment of new protocols IP multicast, security protocols (DDoS traceback), … Wide-area applications straining existing systems Scalable management of large scale resources Our goals Wide-area scalable network overlay Highly fault-tolerant routing / location Introspective / self-tuning platform Support application-specific protocols Efficient (b/w, latency) data delivery Pass on wide-area solutions to application layer

3 ROC/Sahara Retreats, 1/20023 What is Tapestry? A prototype of a decentralized, fault-tolerant, adaptive overlay infrastructure (Zhao, Kubiatowicz, Joseph et al. 2000) Network substrate of OceanStore Routing: Suffix-based hypercube Similar to Plaxton, Rajamaran, Richa (SPAA97) Decentralized location: Virtual hierarchy per object with cached location references Dynamic algorithms using local information Core API: publishObject(ObjectID) routeMsgToObject(ObjectID) routeMsgToNode(NodeID)

4 ROC/Sahara Retreats, 1/20024 Routing and Location Namespace (nodes and objects) 160 bits length 2 80 names before name collision Each object has its own hierarchy rooted at Root f (ObjectID) = RootID, via a dynamic mapping function Suffix routing from A to B At h th hop, arrive at nearest node hop(h) such that: hop(h) shares suffix with B of length h digits Example: 5324 routes to 0629 via 5324 2349 1429 7629 0629 Object location: Root responsible for storing objects location Publish / search both route incrementally to root

5 ROC/Sahara Retreats, 1/20025 4 2 3 3 3 2 2 1 2 4 1 2 3 3 1 3 4 1 1 43 2 4 Tapestry Mesh Incremental suffix-based routing NodeID 0x43FE NodeID 0x13FE NodeID 0xABFE NodeID 0x1290 NodeID 0x239E NodeID 0x73FE NodeID 0x423E NodeID 0x79FE NodeID 0x23FE NodeID 0x73FF NodeID 0x555E NodeID 0x035E NodeID 0x44FE NodeID 0x9990 NodeID 0xF990 NodeID 0x993E NodeID 0x04FE NodeID 0x43FE

6 ROC/Sahara Retreats, 1/20026 Object Location Randomization and Locality

7 ROC/Sahara Retreats, 1/20027 Fault-tolerant Routing Strategy: Detect failures via soft-state probe packets Route around problematic hop via backup pointers Handling: 3 forward pointers per outgoing route (2 backups) 2 nd chance algorithm for intermittent failures Upgrade backup pointers and replace Protocols: First Reachable Link Selection (FRLS) Proactive Duplicate Packet Routing

8 ROC/Sahara Retreats, 1/20028 Talk Outline Tapestry overview Architecture Evaluation Brocade Conclude

9 ROC/Sahara Retreats, 1/20029 Architecture Background OceanStore implementation Java with asynchronous I/O Event-based, stage driven architecture (Sandstorm – M. Welsh) Operating System Java Virtual Machine Sandstorm (async I/O, event arch.) Tapestry OceanStore Applications

10 ROC/Sahara Retreats, 1/200210 Key Stages StaticTClient / Federation Uses config files to bootstrap initial Tapestry DynamicTClient Integrates new nodes into static Tapestry Router Primary handler of routing and location Patchwork Introspective monitoring and fault-detection Sandstorm (async I/O, event arch.) OceanStore Applications Router Static TClient Dynamic TClient Patch work

11 ROC/Sahara Retreats, 1/200211 Static TClient Federation used as rendezvous point Pair-wise pings to generate route tables Federation used as global barrier to begin F S S S S 1. S i says hello to F 2. F informs group of S i 3. Nodes do pair-wise pings 4. Nodes signal readiness 5. Barrier reached at F, signals start

12 ROC/Sahara Retreats, 1/200212 Dynamic TClient Node Integration 1. Hill-climb to find nearest Gateway 2. Route to surrogate / copy routes 3. Move relevant objects to new root 4. Directed multicast notifies nearby nodes GS Routes Request Routes Response Moving Object Pointers Directed Multicast ? F

13 ROC/Sahara Retreats, 1/200213 Routing / Location Router class Maintains: RoutingTable: [ ][ ] of RouteEntries ObjectPointers: Hash(Guid) PublishInfo Hash(Guid) LastHop Handles: Object publication / unpublication / mobile objects Route / location message handling

14 ROC/Sahara Retreats, 1/200214 Patchwork Fault-handling / introspective stage Granulated periodic beacons measure loss and network latency to entries in routing table Promote/demote routes in single RouteEntry Router network X A B C ABCBCA

15 ROC/Sahara Retreats, 1/200215 Deployment Status Object Location Publish / unpublish / route to object Mobile objects (backtracking unpublish) Active deletes, confirmation of non-existence General Routing Route to node, redundant routes Soft-state fault-detection, limited optimization Advanced policies for fault recovery Dynamic Integration Integration w/ limited optimizations Best effort fault-resilient integration mechanisms Background threads for optimization / refresh

16 ROC/Sahara Retreats, 1/200216 Talk Outline Tapestry overview Architecture Evaluation Brocade Conclude

17 ROC/Sahara Retreats, 1/200217 Generalized Results Cached object pointers Efficient lookup for nearby objects Reasonable storage overhead Multiple object roots Improves availability under attack Improves performance and perf. stability Reliable packet delivery Redundant pointers approximate optimal reachability FRLS, a simple fault-tolerant UDP protocol

18 ROC/Sahara Retreats, 1/200218 First Reachable Link Selection Use periodic UDP packets to gauge link condition Packets routed to shortest good link Assumes IP cannot correct routing table in time for packet delivery ABCDEABCDE IP Tapestry No path exists to dest.

19 ROC/Sahara Retreats, 1/200219 Some Numbers Measurements PIII 800, L2.2.18, IBM JDK 1.3 Simulating 6 nodes (4 staticTC, 1 federation, 1 dynamicTC) Publishing / locating ~10 objects PublishMsg, RouteMsg: ~ 0-2 ms Integration: ~2600ms (w/ pings) Integration messages: Assuming latency data available 2 x n (routing and objects) 16 M (directed multicast notification) (M 3)

20 ROC/Sahara Retreats, 1/200220 Talk Outline Tapestry overview Architecture Evaluation Brocade Conclude

21 ROC/Sahara Retreats, 1/200221 Landmark Routing on P2P Brocade Exploit non-uniformity Minimize wide-area routing hops / bandwidth Secondary overlay on top of Tapestry Select super-nodes by admin. domain Divide network into cover sets Super-nodes form secondary Tapestry Advertise cover set as local objects Routing (A B) uses brocade to route directly into Bs local network

22 ROC/Sahara Retreats, 1/200222 Brocade Mechanisms Selective utilization Nodes cache local cover set Only utilize brocade if dest. not in cache Forwarding messages to supernodes 1. Super-node does IP-snooping 2. Direct: cover set caches supernode Inter-domain routing: A B 1. A SN(A) via IP 2. SN(A) finds SN(B) via Tapestry location 3. SN(B) B via Tapestry/Chord/Pastry/CAN

23 ROC/Sahara Retreats, 1/200223 Brocade Routing RDP Local cover set cache on; interdomain:intradomain = 3:1 Packet simulator, Transit-stub 4096 T nodes, 16 SuperN

24 ROC/Sahara Retreats, 1/200224 Brocade Bandwidth Usage Local cover set cache on B/W unit: (sizeof (Msg) * Hops)

25 ROC/Sahara Retreats, 1/200225 Ongoing / Future Work Fill in full functionality Fault-handling policies, introspection, self-repair More realistic experiments Artificial topologies on SOSS simulator Larger scale dynamic integration experiments Code development External deployment / Code release Sprint programmable routers Academic networks Introspective measurement platform Implementing applications (Bayeux, Brocade … )

26 ROC/Sahara Retreats, 1/200226 For More Information Tapestry and related projects (and these slides): OceanStore: Related papers:

Download ppt "Tapestry Architecture and status UCB ROC / Sahara Retreat January 2002 Ben Y. Zhao"

Similar presentations

Ads by Google