
1 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2007 1 Principles of Reliable Distributed Systems Tutorial 3: SkipNet Spring 2007 Alex Shraer

2 Reading Material
SkipNet: A Scalable Overlay Network with Practical Locality Properties
Harvey, Jones, Saroiu, Theimer, Wolman
Microsoft Research

3 Reminder: DHT Advantages
Peer-to-peer: no centralized control or infrastructure
Scalability: O(log N) routing hops, routing table size, and join time
Load-balancing
Overlay robustness

4 DHT Disadvantages: SkipNet Motivation
No control over where data is stored
  – Data may be stored far from its users
  – Data may be stored outside its domain
  – Local accesses may leave the local organization
In practice, organizations want:
  – Content locality: explicitly place data where we want (inside the organization)
  – Path locality: guarantee that local traffic (a user in the organization looking for a file of the organization) remains local
No prefix search
  – Search(key) returns the file whose name is the closest prefix to key

5 Practical Requirements
Data controllability:
  – Organizations want control over their own data
  – Even if local data is globally available
Manageability:
  – Data control allows for data administration, provisioning and manageability

6 Practical Requirements (cont'd)
Security:
  – Content and path locality are key building blocks for dealing with certain external attacks (DoS, traffic analysis)
Data availability:
  – Local data survives network partitions
Performance:
  – Data can be stored near the clients that use it

7 SkipNet Content Locality
Place files at nodes according to names
Name ID space (DNS-like)
  – for files and nodes
  – node name = reverse DNS name of the host (com.microsoft.host1)
  – file names have the same prefix
Problem?

8 Constrained Load-Balancing
Data is uniformly distributed over a designated subset of nodes
  – e.g., inside an organization

9 SkipNet's Two Name Spaces

Name ID Space          Numerical ID Space
com.microsoft.host1    h(com.microsoft.host1)
non-uniform            uniform
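A minimal sketch of the mapping h from name IDs to numeric IDs. The slide only says that the numeric ID is h(name); the choice of SHA-1, the 8-digit length and the function name `numeric_id` are assumptions made for illustration.

```python
import hashlib


def numeric_id(name_id: str, bits: int = 8) -> str:
    """Return the first `bits` binary digits of a hash of the name ID.

    SHA-1 and the default of 8 digits are illustrative assumptions,
    not something the slides specify.
    """
    digest = hashlib.sha1(name_id.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)
    return f"{value:0{bits}b}"


# Name IDs sort by organization (non-uniform); numeric IDs look random (uniform).
for name in sorted(["com.microsoft.host1", "com.microsoft.host2",
                    "com.sun.host1", "edu.ucb.host1"]):
    print(name, numeric_id(name))
```

Hosts of one organization are adjacent in name order, while their numeric IDs are unrelated to each other, which is what makes constrained load balancing possible.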

10 Skip Lists - Reminder
In-memory dictionary data structure
  – Sorted linked list in which a subset of nodes has additional links that skip over many list elements
Perfect (deterministic) skip list:
  – Pointer at level h skips over 2^h elements
  – Search: O(log N), where N is the number of nodes in the list
  – Insertion/deletion: expensive/awkward

11 Skip Lists - Reminder
Probabilistic skip list:
  – A node reaches level h with probability 1/2^h
  – Search, insert, delete: O(log N) w.h.p.
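The 1/2^h rule can be sketched with repeated coin flips: a node is promoted one level for every consecutive "heads". The function name `random_level`, the level cap and the seed are assumptions for illustration.

```python
import random


def random_level(rng: random.Random, max_level: int = 32) -> int:
    """Flip a fair coin until tails; the number of heads is the node's level."""
    level = 0
    while level < max_level and rng.random() < 0.5:
        level += 1
    return level


rng = random.Random(2007)          # fixed seed so the run is repeatable
levels = [random_level(rng) for _ in range(100_000)]

# Fraction of nodes that reach at least level h; should be close to 1/2^h.
fractions = [sum(lv >= h for lv in levels) / len(levels) for h in range(4)]
print(fractions)
```

Because each level holds about half the nodes of the level below, about log N levels are populated, which is where the O(log N) bounds come from.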

12 Skip List: Good for Us?
The Good:
  – Sorted list: path locality for name-based search
  – O(log N) search with skip pointers
  – Up to log(N) skip pointers: O(log N) insertion
The Bad:
  – Lookup starts from the root only
  – Unequal load: nodes on the top levels have a high chance of being on the routing path

13 SkipNet Global View
Level L = 0   Root Ring: A D M O T V X Z
Level L = 1   Ring 0: A M T X | Ring 1: D O V Z
Level L = 2   Ring 00: A T | Ring 01: M X | Ring 10: O Z | Ring 11: D V
Level L = 3   Ring 000: A | Ring 001: T | Ring 010: M | Ring 011: X | Ring 100: Z | Ring 101: O | Ring 110: D | Ring 111: V
The full SkipNet routing infrastructure for an 8 node system, including the ring labels.

14 SkipNet Structure
Skip Graph = Distributed Skip List
  – Every node belongs to rings at all levels
  – Search can start at any node
  – Doubly linked lists at each level account for the absence of head and tail nodes
Perfect vs. Probabilistic
  – Perfect: pointers at level h point to the nodes that are exactly 2^h nodes to the left and right
  – Probabilistic: a node at level h probabilistically determines which ring it belongs to
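The ring structure can be sketched by grouping nodes on prefixes of their numeric IDs and keeping every ring sorted by name ID. The numeric IDs below are read off the ring labels of the 8 node example on slide 13; the function names, and the convention that "clockwise" means the next node in name order, are assumptions for illustration.

```python
from collections import defaultdict

# Numeric IDs of the 8 example nodes, taken from the level 3 ring labels.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "Z": "100", "O": "101", "D": "110", "V": "111",
}


def build_rings(numeric_id):
    """Map each numeric-ID prefix to its ring, sorted by name ID."""
    rings = defaultdict(list)
    for name in sorted(numeric_id):            # name order = order on the ring
        digits = numeric_id[name]
        for level in range(len(digits) + 1):   # level 0 (empty prefix) is the root
            rings[digits[:level]].append(name)
    return dict(rings)


def routing_table(name, numeric_id, rings):
    """Return (clockwise, counterclockwise) neighbors for levels 0, 1, ..."""
    table = []
    for level in range(len(numeric_id[name]) + 1):
        ring = rings[numeric_id[name][:level]]
        if len(ring) == 1:                     # alone in this ring: no pointers
            break
        i = ring.index(name)
        table.append((ring[(i + 1) % len(ring)], ring[(i - 1) % len(ring)]))
    return table


rings = build_rings(NUMERIC_ID)
print(rings["0"], rings["11"])
print(routing_table("A", NUMERIC_ID, rings))
```

Every node appears in exactly one ring per level, so a routing table needs only two pointers per level.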

15 SkipNet Routing Tables
[Figure: the SkipNet ring diagram for nodes A, D, M, O, T, V, X, Z, levels L = 0 to L = 3, captioned "Node A's Routing Table".]

16 An Alternative View
SkipNet nodes ordered by name ID (A, D, M, O, T, V, X, Z). Routing tables of nodes A and V shown.

Node A              Node V
Level 2:  T  T      Level 2:  D  D
Level 1:  M  X      Level 1:  Z  O
Level 0:  D  Z      Level 0:  X  T

[The figure also shows the numeric ID labels 000 through 111.]

17 Routing By Name ID
Routing in a Skip Graph = search in Skip Lists
Simple rule:
  – Forward the message to the node that is closest to the destination, without going too far
Route either clockwise or counterclockwise
Terminates when the message arrives at the node whose name ID is closest to the destination
Number of hops is O(log N) w.h.p.

18 Example: Routing from A to V
[Figure: the SkipNet ring diagram for nodes A, D, M, O, T, V, X, Z, levels L = 0 to L = 3.]

19 Example: Routing from A to V
[Figure: the ring diagram for nodes A, D, M, O, T, V, X, Z, levels L = 0 to L = 3, captioned "Node T's Routing Table".]

20 Example: Routing from A to V
[Figure: the ring diagram for nodes A, D, M, O, T, V, X, Z, levels L = 0 to L = 3.]

21 Example: Routing to Object
Route from A to F -> Terminates at E
[Figure: ring diagram for nodes A, D, E, O, T, V, X, Z, levels L = 0 to L = 3.]

22 Name ID Routing Algorithm

SendMsg(nameID, msg) {
    if (LongestPrefix(nameID, localNode.nameID) == 0)
        msg.dir = RandomDirection();
    else if (nameID < localNode.nameID)
        msg.dir = counterClockwise;
    else
        msg.dir = clockwise;
    msg.nameID = nameID;
    RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
    // Forward along the longest pointer
    // that is between us and msg.nameID.
    h = localNode.maxHeight;
    while (h >= 0) {
        nbr = localNode.RouteTable[msg.dir][h];
        if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
            SendToNode(msg, nbr);
            return;
        }
        h = h - 1;
    }
    // h < 0 implies we are the closest node.
    DeliverMessage(msg.msg);
}

Slide callouts: Load Balancing, Path Locality
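The pseudocode can be tried out in a small single-process simulation. This is a sketch under assumptions: the whole network is held in one dictionary (name ID to numeric ID, as on slide 13), neighbors are recomputed on demand instead of being stored in per-node tables, and the helper names are hypothetical. The direction can be passed explicitly because single-letter names never share a prefix, so the slide's rule would pick it at random.

```python
import random

CLOCKWISE, COUNTERCLOCKWISE = +1, -1


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def choose_direction(dest, local, rng=random):
    """Direction rule from SendMsg: random when no name prefix is shared."""
    if common_prefix_len(dest, local) == 0:
        return rng.choice((CLOCKWISE, COUNTERCLOCKWISE))
    return COUNTERCLOCKWISE if dest < local else CLOCKWISE


def neighbor(node, level, direction, numeric_id):
    """Neighbor of `node` in its ring at `level` (rings are sorted by name)."""
    prefix = numeric_id[node][:level]
    ring = sorted(n for n in numeric_id if numeric_id[n].startswith(prefix))
    return ring[(ring.index(node) + direction) % len(ring)]


def lies_between(start, nbr, dest, direction):
    """True if `nbr` comes after `start` and not past `dest` in `direction`."""
    if start == dest:
        return False                       # already at the destination
    if direction == CLOCKWISE:
        if start < dest:
            return start < nbr <= dest
        return nbr > start or nbr <= dest  # interval wraps past the last name
    if dest < start:
        return dest <= nbr < start
    return nbr < start or nbr >= dest      # interval wraps past the first name


def route_by_name_id(source, dest, numeric_id, direction=None):
    """Return the list of nodes visited, ending at the delivering node."""
    if direction is None:
        direction = choose_direction(dest, source)
    node, path = source, [source]
    while True:
        # Try the longest pointer first, as RouteByNameID does.
        for h in range(len(numeric_id[node]), -1, -1):
            nbr = neighbor(node, h, direction, numeric_id)
            if lies_between(node, nbr, dest, direction):
                node = nbr
                path.append(node)
                break
        else:
            return path                    # no pointer helps: deliver here


NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "Z": "100", "O": "101", "D": "110", "V": "111"}
print(route_by_name_id("A", "V", NUMERIC_ID, CLOCKWISE))
```

In this simulation the clockwise route from A to V uses A's level 2 pointer to T and then T's level 0 pointer to V.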

23 Routing By Numeric ID
Numeric IDs are random, so no ring is sorted by them
  – We can't route top-down!
Bottom-up routing
  – Routing begins in the level 0 ring and continues until a node is found whose numeric ID matches the destination numeric ID in the first digit
  – Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID
  – Terminates when the message is delivered, or when none of the nodes in R_h share h+1 digits with the destination numeric ID

24 Example: Routing by Numeric ID
  – Hash("Foo.c") = 101
[Figure: ring diagram for nodes A, D, M, O, T, V, X, Z, levels L = 0 to L = 3, with Foo.c and node A marked.]

25 Routing by Numeric ID
The same routing tables are used for routing by name ID and by numeric ID
The number of message hops is O(log N) w.h.p.
What sequential data structure does this search resemble?

26 Routing Algorithm

// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null & msg.finalDestination = false
RouteByNumericID(msg) {
    if (msg.numID == localNode.numID || msg.finalDestination) {
        DeliverMessage(msg.msg);
        return;
    }
    if (localNode == msg.startNode) {
        // Done traversing current ring.
        msg.finalDestination = true;
        SendToNode(msg, msg.bestNode);
        return;
    }
    h = CommonPrefixLen(msg.numID, localNode.numID);
    if (h > msg.ringLvl) {
        // Found a higher ring.
        msg.ringLvl = h;
        msg.startNode = msg.bestNode = localNode;
    } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
        // Found a better candidate for current ring.
        msg.bestNode = localNode;
    }
    // Forward along current ring.
    nbr = localNode.RouteTable[clockWise][msg.ringLvl];
    SendToNode(msg, nbr);
}
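A single-process simulation of the same bottom-up search, as a sketch under assumptions: the network is one dictionary from name ID to numeric ID (values from slide 13), ring neighbors are recomputed on demand, and a hop that would send the message to the node already holding it is not recorded in the path.

```python
def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def route_by_numeric_id(source, dest_num, numeric_id):
    """Return the nodes visited while routing to numeric ID `dest_num`."""

    def clockwise(node, level):
        # Next node in name order within `node`'s ring at `level`.
        prefix = numeric_id[node][:level]
        ring = sorted(n for n in numeric_id if numeric_id[n].startswith(prefix))
        return ring[(ring.index(node) + 1) % len(ring)]

    def distance(node):
        return abs(int(numeric_id[node], 2) - int(dest_num, 2))

    ring_lvl, start_node, best_node, final = -1, None, None, False
    node, path = source, [source]
    while True:
        if numeric_id[node] == dest_num or final:
            return path                        # DeliverMessage
        if node == start_node:
            # Done traversing the current ring: hand over to the best node.
            final, nxt = True, best_node
        else:
            h = common_prefix_len(dest_num, numeric_id[node])
            if h > ring_lvl:                   # found a higher ring
                ring_lvl, start_node, best_node = h, node, node
            elif distance(node) < distance(best_node):
                best_node = node               # better candidate in this ring
            nxt = clockwise(node, ring_lvl)
        if nxt != node:                        # skip self-sends in the trace
            path.append(nxt)
        node = nxt


NUMERIC_ID = {"A": "000", "T": "001", "M": "010", "X": "011",
              "Z": "100", "O": "101", "D": "110", "V": "111"}
print(route_by_numeric_id("A", "101", NUMERIC_ID))   # lookup of Hash("Foo.c")
```

With the example data the search climbs one ring per hop. When no node owns the exact numeric ID, it ends at the numerically closest node found in the highest ring reached.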

27 Routing Summary
It all depends on how we look at the routing tables ...
What is the data structure consisting of all the pointers in the rings that a specific node's name ID belongs to?
  – A Skip List! Search is top-down.
What is the data structure consisting of all the rings, with respect to searching by numeric ID?
  – A Trie! Search is bottom-up.
The search in both directions takes O(log N) messages w.h.p.
Ready for join/departure procedures?

28 Node Join
Two-stage process: (1) bottom-up + (2) top-down
Bottom-up: find the top-level ring that matches the node's numeric ID
Top-down: build the new node's routing table
  – Find a neighbor in the top ring using name ID search
  – Starting from this neighbor, search for the name ID at the next lower level, and thus find the neighbors at that level
  – Repeat until the search reaches the root ring
Update of the existing nodes' routing tables:
  – after the new node has joined the root ring

29 Node Join Illustrated
[Figure: rings P, P0 and P1 with the joining node; callout: "Only a few in expectation".]

30 Node Join - Analysis
Key ideas:
  – Climb to a weakly populated ring
  – Search for the node's neighbors at the lower levels only after finding the neighbors at the higher levels
  – The range of nodes traversed at a level = the range of the neighbors at the next higher level
Insertion traverses O(log N) hops w.h.p.
  – Expected O(log N) levels, constant number of neighbors at each level

31 Node Departure/Failure
Graceful (notified) vs. crash departure
Key issue: updating the routing tables
Key idea: separate vital information from optimizations
  – Routing is correct as long as the root-level ring is maintained
  – Other levels are regarded as optimization hints
  – Does this remind you of something?
Upper-ring membership is maintained through a background repair process

32 Leaf Sets
Idea: use redundant pointers at level 0
  – Protect against independent failures
  – Improve search performance
Store L/2 pointers in each direction
SkipNet uses L = 16
  – Not an original SkipNet idea; also used in Pastry
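A sketch of how a leaf set could be computed from the name-sorted root ring, and how it lets a node step over a failed neighbor. The function names and the `failed` set are hypothetical; real nodes learn of failures through timeouts or notifications, which is not modeled here.

```python
def leaf_set(name, sorted_names, size=16):
    """Return (clockwise, counterclockwise) lists of size/2 root-ring neighbors."""
    n = len(sorted_names)
    k = min(size // 2, n - 1)      # never list the node itself
    i = sorted_names.index(name)
    clockwise = [sorted_names[(i + d) % n] for d in range(1, k + 1)]
    counterclockwise = [sorted_names[(i - d) % n] for d in range(1, k + 1)]
    return clockwise, counterclockwise


def first_live(pointers, failed):
    """Closest leaf-set entry that has not failed, or None if all have."""
    for candidate in pointers:
        if candidate not in failed:
            return candidate
    return None


names = ["A", "D", "M", "O", "T", "V", "X", "Z"]
cw, ccw = leaf_set("A", names, size=4)
print(cw, ccw)
print(first_live(cw, failed={"D"}))   # D is down, so A falls back to M
```

With L/2 pointers per direction, the root ring stays connected unless L/2 consecutive neighbors fail at the same time.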

33 Constrained Load Balancing (CLB)
Multiple DHTs with differing scopes using a single SkipNet structure
  – A result of the ability to route in both address spaces
Divide data object names into two parts with "!"

CLB Domain         CLB Suffix
microsoft.com   !  skipnet.html
Name Routing       Numeric Routing

microsoft.com/skipnet.html! : controlled placement
!microsoft.com/skipnet.html : global DHT

34 CLB Example
File ID = "com.microsoft ! skipnet.html"
  – Route by name ID to com.microsoft
  – Inside com.microsoft, route by numeric ID to hash("skipnet.html")
[Figure: ring segments com.sun, edu.ucb, gov.irs and com.microsoft, with skipnet.html inside com.microsoft.]
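A sketch of CLB placement computed centrally rather than by routing. Assumptions made for illustration: the hash is truncated SHA-1, the node responsible for a numeric ID is the numerically closest one, a name-routed ID is owned by the largest node name that does not exceed it, and the host names are invented.

```python
import bisect
import hashlib


def h(text, bits=16):
    """Illustrative numeric ID: the first `bits` bits of SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def split_clb(file_id):
    """Split 'domain ! suffix' at the first '!'."""
    domain, sep, suffix = file_id.partition("!")
    if not sep:
        raise ValueError("file ID has no '!' separator")
    return domain.strip(), suffix.strip()


def place(file_id, node_names):
    """Pick the node that would store `file_id` under the assumptions above."""
    domain, suffix = split_clb(file_id)
    ordered = sorted(node_names)
    if not suffix:
        # 'name!': pure name ID routing (controlled placement).
        return ordered[bisect.bisect_right(ordered, domain) - 1]
    # '!name' has an empty domain, so every node is a candidate (global DHT).
    candidates = [n for n in ordered
                  if not domain or n == domain or n.startswith(domain + ".")]
    if not candidates:
        raise LookupError("no node in CLB domain " + domain)
    target = h(suffix)
    return min(candidates, key=lambda n: (abs(h(n) - target), n))


NODES = ["com.microsoft.host1", "com.microsoft.host2",
         "com.sun.host1", "edu.ucb.host1", "gov.irs.host1"]
print(place("com.microsoft ! skipnet.html", NODES))
print(place("!com.microsoft/skipnet.html", NODES))
```

The first call can only return a com.microsoft host, while the second may land on any node.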

35 SkipNet Path Locality
Organizations correspond to contiguous SkipNet segments
  – Internal routing by name ID remains internal
Nodes have left / right pointers
[Figure: ring segments com.sun, edu.ucb, gov.irs, com.microsoft and com.microsoft.research.]

