Alex Shraer, Principles of Reliable Distributed Systems, Technion EE, Spring 2008. Tutorial 4: SkipNet.


Principles of Reliable Distributed Systems
Tutorial 4: SkipNet
Spring 2008, Alex Shraer

Reading Material
- SkipNet: A Scalable Overlay Network with Practical Locality Properties. Harvey, Jones, Saroiu, Theimer, Wolman (Microsoft Research)

Reminder: DHT Advantages
- Peer-to-peer: no centralized control or infrastructure
- Scalability: O(log N) routing hops, routing-table size, and join time
- Load balancing
- Overlay robustness

DHT Disadvantages: SkipNet Motivation
- No control over where data is stored
  - Data may be stored far from its users
  - Data may be stored outside its administrative domain: privileges are hard to administer, and this invites various security attacks
  - Local accesses leave the local organization
- In practice, organizations want:
  - Content locality: explicitly place data where we want it (inside the organization)
  - Path locality: guarantee that local traffic (a user in the organization looking for a file of the organization) remains local
- No prefix search
  - A prefix search, Search(key), would return the files whose names have key as a prefix

Practical Requirements
- Data controllability:
  - Organizations want control over their own data
  - Even if local data is globally available
- Manageability:
  - Data control allows for data administration, provisioning and manageability

Practical Requirements (cont'd)
- Security:
  - Content and path locality are key building blocks for dealing with certain external attacks (DoS, traffic analysis)
- Data availability:
  - Local data survives network partitions
- Performance:
  - Data can be stored near the clients that use it

SkipNet Content Locality
- Place files at nodes according to names
- Name ID space (DNS-like)
  - for files and nodes
  - node name = reverse DNS name of the host (com.microsoft.host1)
  - file names have the same prefix
- Problem?

Constrained Load-Balancing
- Data is uniformly distributed over a designated subset of nodes
  - e.g., inside an organization
- How can this be achieved? A numeric ID space!
  - similar to Chord, Pastry and others
  - nodes are randomly distributed
  - hashes of the node names and content identifiers are mapped into the numeric ID space
  - content is stored on the node whose numeric ID is closest to the hash of the content's name
- Key property of SkipNet: two address spaces
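The two address spaces can be sketched in a few lines of Python. This is an illustration only: the function names, the use of SHA-1, and the 8-bit ID width are assumptions made for the example, not details given on the slides.

```python
import hashlib


def name_id(dns_name):
    """Reverse the DNS labels: host1.microsoft.com -> com.microsoft.host1."""
    return ".".join(reversed(dns_name.split(".")))


def numeric_id(name, bits=8):
    """Hash a name to a fixed-width binary string.

    SHA-1 and the default width are assumptions for this sketch.
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (160 - bits)
    return format(value, "0{}b".format(bits))


def closest_node(content_name, node_names, bits=8):
    """Return the node whose numeric ID is numerically closest to the content hash."""
    target = int(numeric_id(content_name, bits), 2)
    return min(
        node_names,
        key=lambda n: (abs(int(numeric_id(n, bits), 2) - target), n),
    )


nodes = [name_id(h) for h in ("host1.microsoft.com", "host2.microsoft.com", "www.sun.com")]
owner = closest_node("skipnet.html", nodes)
```

Name IDs keep hosts of one organization adjacent in sorted order, while numeric IDs spread content pseudo-randomly over whichever node set is searched.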

Skip Lists - Reminder
- In-memory dictionary data structure
  - Sorted linked list in which a subset of nodes has additional links that skip over many list elements
- Perfect (deterministic) skip list:
  - A pointer at level h skips over 2^h elements
  - Search: O(log N), where N is the number of nodes in the list
  - Insertion/deletion: expensive/awkward

Skip Lists - Reminder
- Probabilistic skip list:
  - A node is at level h with probability 1/2^h
  - Search, insert, delete: O(log N) w.h.p.
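A minimal sketch of a probabilistic skip list, assuming a fair coin per level (so a node reaches level h with probability 1/2^h) and a fixed cap on the number of levels. The class and method names are made up for this example.

```python
import random


class _Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[h] = next node at level h


class SkipList:
    MAX_LEVEL = 16  # assumed cap on the number of levels

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = _Node(None, self.MAX_LEVEL)  # sentinel, present at all levels
        self.level = 1  # number of levels currently in use

    def _random_height(self):
        # Each extra level is kept with probability 1/2.
        h = 1
        while h < self.MAX_LEVEL and self.rng.random() < 0.5:
            h += 1
        return h

    def _predecessors(self, key):
        # Walk from the top level down, recording the last node before key per level.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for h in range(self.level - 1, -1, -1):
            while node.forward[h] is not None and node.forward[h].key < key:
                node = node.forward[h]
            update[h] = node
        return update

    def contains(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # already present
        height = self._random_height()
        self.level = max(self.level, height)
        new = _Node(key, height)
        for h in range(height):
            new.forward[h] = update[h].forward[h]
            update[h].forward[h] = new

    def keys(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Every search starts at the head sentinel, which is the "lookup starts from root only" limitation that SkipNet removes.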

Skip List: Good for Us?
- The Good:
  - Sorted list: path locality for name-based search
  - O(log N) search with skip pointers
  - Up to log(N) skip pointers: O(log N) insertion
- The Bad:
  - Lookup starts from the root only
  - Unequal load: nodes on the top levels have a high chance of being on the routing path

SkipNet Global View
Figure: the full SkipNet routing infrastructure for an 8-node system, including the ring labels.
- Level L = 0: Root Ring = A, D, M, O, T, V, X, Z
- Level L = 1: Ring 0 = A, M, T, X; Ring 1 = D, O, V, Z
- Level L = 2: Ring 00 = A, T; Ring 01 = M, X; Ring 10 = O, Z; Ring 11 = D, V
- Level L = 3: Ring 000 through Ring 111, one node per ring

SkipNet Structure
- Skip Graph = distributed skip list
  - Every node belongs to rings at all levels
  - Search can start at any node
  - Doubly linked lists are used at each level to account for the absence of head and tail nodes
- Perfect vs. probabilistic
  - Perfect: pointers at level h point to nodes that are exactly 2^h nodes to the left and right
  - Probabilistic: a node at level h probabilistically determines which ring it belongs to
- All rings are sorted according to name IDs
- Ring membership is according to numeric IDs
  - All nodes sharing the same numeric ID prefix of length h are members of the same ring at level h

SkipNet Routing Tables
Figure: the ring hierarchy of the 8-node system (levels L = 0 to L = 3), highlighting node A's routing table.

An Alternative View
SkipNet nodes ordered by name ID: A D M O T V X Z. Routing tables of nodes A and V are shown (first column: next node by name ID in the ring, second column: previous node).

Node A
  Level 2:  T  T
  Level 1:  M  X
  Level 0:  D  Z

Node V
  Level 2:  D  D
  Level 1:  Z  O
  Level 0:  X  T
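The routing tables can be recomputed from the two ID spaces. In the sketch below the ring memberships at levels 1 and 2 follow the 8-node example; the exact 3-bit numeric ID given to each node is an assumption chosen to be consistent with those rings, and the function names are made up for illustration.

```python
# Assumed 3-bit numeric IDs, consistent with the level 1 and level 2 rings of the example.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "O": "100", "Z": "101", "D": "110", "V": "111",
}


def ring_members(node, level, numeric_id=NUMERIC_ID):
    """Nodes sharing the first `level` numeric ID digits with `node`, sorted by name ID."""
    prefix = numeric_id[node][:level]
    return sorted(n for n, nid in numeric_id.items() if nid.startswith(prefix))


def routing_table(node, levels=3, numeric_id=NUMERIC_ID):
    """Return [(clockwise, counterclockwise)] neighbours for levels 0 .. levels-1."""
    table = []
    for h in range(levels):
        ring = ring_members(node, h, numeric_id)
        i = ring.index(node)
        table.append((ring[(i + 1) % len(ring)], ring[(i - 1) % len(ring)]))
    return table


for level, pair in reversed(list(enumerate(routing_table("A")))):
    print("Level", level, pair)
```

Under these assumed IDs the computed tables for A and V match the ones listed above, level by level.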

Routing By Name ID
- Routing in a Skip Graph = search in skip lists
- Simple rule:
  - Forward the message to the node that is closest to the destination, without going too far
- Route either clockwise or counterclockwise
- Terminates when the message arrives at a node whose name ID is closest to the destination
- Number of hops is O(log N) w.h.p.

Example: Routing from A to V
Figure (sequence of three slides): the ring hierarchy of the 8-node system (levels L = 0 to L = 3); the second slide highlights node T's routing table.

Example: Routing to Object
Route from A to F -> terminates at E
Figure: the ring hierarchy (levels L = 0 to L = 3) with node E in place of node M; Root Ring = A, D, E, O, T, V, X, Z and Ring 0 = A, E, T, X.

Name ID Routing Algorithm

    SendMsg(nameID, msg) {
      if (LongestPrefix(nameID, localNode.nameID) == 0)
        msg.dir = RandomDirection();    // load balancing
      else if (nameID < localNode.nameID)
        msg.dir = counterClockwise;     // path locality
      else
        msg.dir = clockwise;            // path locality
      msg.nameID = nameID;
      RouteByNameID(msg);
    }

    // Invoked at all nodes (including the source and
    // destination nodes) along the routing path.
    RouteByNameID(msg) {
      // Forward along the longest pointer
      // that is between us and msg.nameID.
      h = localNode.maxHeight;
      while (h >= 0) {
        nbr = localNode.RouteTable[msg.dir][h];
        if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
          SendToNode(msg, nbr);
          return;
        }
        h = h - 1;
      }
      // h < 0 implies we are the closest node.
      DeliverMessage(msg.msg);
    }
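The pseudocode can be exercised on the 8-node example with a small centralized simulation. This is a sketch under assumptions: the per-node numeric IDs are chosen to match the example's rings, routing tables are recomputed on the fly rather than stored, the direction is passed in explicitly instead of being chosen by SendMsg, and all names are hypothetical.

```python
# Assumed 3-bit numeric IDs, consistent with the rings of the 8-node example.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "O": "100", "Z": "101", "D": "110", "V": "111",
}
LEVELS = 3  # levels 0..2 carry useful pointers; level 3 rings are singletons
CW, CCW = "clockwise", "counterclockwise"


def neighbor(node, level, direction):
    """Pointer of `node` at `level`: next (CW) or previous (CCW) ring member by name."""
    prefix = NUMERIC_ID[node][:level]
    ring = sorted(n for n in NUMERIC_ID if NUMERIC_ID[n].startswith(prefix))
    step = 1 if direction == CW else -1
    return ring[(ring.index(node) + step) % len(ring)]


def lies_between(local, nbr, dest, direction):
    """True if walking from `local` in `direction` reaches `nbr` no later than `dest`."""
    if direction == CW:
        if local < dest:
            return local < nbr <= dest
        return nbr > local or nbr <= dest  # walk wraps past the largest name
    if dest < local:
        return dest <= nbr < local
    return nbr < local or nbr >= dest      # walk wraps past the smallest name


def route_by_name_id(source, dest_name, direction):
    """Return the list of nodes visited, ending at the node that delivers the message."""
    path, node = [source], source
    while node != dest_name:
        for h in range(LEVELS - 1, -1, -1):  # try the longest pointer first
            nbr = neighbor(node, h, direction)
            if lies_between(node, nbr, dest_name, direction):
                node = nbr
                path.append(node)
                break
        else:
            break  # no pointer lies between us and the destination: deliver here
    return path


print(route_by_name_id("A", "V", CW))
```

With these assumed IDs, the clockwise route from A to V visits T and then V. A destination name that no node carries, such as F, ends at the last node before it in the chosen direction.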

Routing By Numeric ID
- Numeric IDs are random, and no ring is sorted by them
  - We can't route top-down!
- Bottom-up routing
  - Routing begins in the level 0 ring and continues until a node is found whose numeric ID matches the destination numeric ID in the first digit.
  - Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that nodes in R_{h+1} share h+1 digits with the destination numeric ID.
  - Terminates when the message is delivered, or when none of the nodes in R_h share h+1 digits with the destination numeric ID; in that case it ends at the node in R_h with the closest numeric ID.

Example: Routing by Numeric ID
- Hash("Foo.c") = 101
Figure: the ring hierarchy of the 8-node system (levels L = 0 to L = 3), showing the lookup for Foo.c with node A highlighted.

Routing by Numeric ID
- The same routing tables are used for routing by name ID and by numeric ID
- When numeric IDs are binary: in each ring R_h, in expectation only 2 nodes are visited before encountering one that belongs to the next ring R_{h+1}
  - The number of message hops is O(log N) w.h.p.

Routing Algorithm

    // Invoked at all nodes (including the source and destination nodes)
    // along the routing path.
    // Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null
    // and msg.finalDestination = false
    RouteByNumericID(msg) {
      if (msg.numID == localNode.numID || msg.finalDestination) {
        DeliverMessage(msg.msg);
        return;
      }
      if (localNode == msg.startNode) {
        // Done traversing current ring.
        msg.finalDestination = true;
        SendToNode(msg, msg.bestNode);
        return;
      }
      h = CommonPrefixLen(msg.numID, localNode.numID);
      if (h > msg.ringLvl) {
        // Found a higher ring.
        msg.ringLvl = h;
        msg.startNode = msg.bestNode = localNode;
      } else if (abs(localNode.numID - msg.numID) <
                 abs(msg.bestNode.numID - msg.numID)) {
        // Found a better candidate for current ring.
        msg.bestNode = localNode;
      }
      // Forward along current ring.
      nbr = localNode.RouteTable[clockWise][msg.ringLvl];
      SendToNode(msg, nbr);
    }
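The same algorithm can be simulated centrally in Python. As before, this is only a sketch: the message fields become local variables, pointers are recomputed from the ID map instead of being stored, and the numeric IDs of the eight example nodes are an assumption consistent with the example's rings.

```python
# Assumed 3-bit numeric IDs, consistent with the rings of the 8-node example.
NUMERIC_ID = {
    "A": "000", "T": "001", "M": "010", "X": "011",
    "O": "100", "Z": "101", "D": "110", "V": "111",
}


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route_by_numeric_id(source, dest_id, numeric_id=NUMERIC_ID):
    """Return (delivering node, list of nodes the message visited)."""

    def cw_neighbor(node, level):
        prefix = numeric_id[node][:level]
        ring = sorted(n for n in numeric_id if numeric_id[n].startswith(prefix))
        return ring[(ring.index(node) + 1) % len(ring)]

    def distance(node):
        return abs(int(numeric_id[node], 2) - int(dest_id, 2))

    ring_lvl, start, best, final = -1, None, None, False  # message state
    node, path = source, [source]
    while True:
        if numeric_id[node] == dest_id or final:
            return node, path                  # DeliverMessage
        if node == start:
            final = True                       # done traversing the current ring
            node = best
            path.append(node)
            continue
        h = common_prefix_len(dest_id, numeric_id[node])
        if h > ring_lvl:
            ring_lvl, start, best = h, node, node   # found a higher ring
        elif distance(node) < distance(best):
            best = node                        # better candidate in the current ring
        node = cw_neighbor(node, ring_lvl)     # forward along the current ring
        path.append(node)


owner, hops = route_by_numeric_id("A", "101")
```

Under the assumed IDs, the lookup for 101 issued at A climbs one ring level per hop. If no node holds the exact ID, the message circles the highest matching ring once and is delivered at the closest candidate found.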

Base (k) for Numeric IDs
- If a higher base k > 2 is used for numeric IDs, routing takes O(k log_k N) hops w.h.p.
- Increasing k -> more rings in each level -> fewer levels -> fewer pointers in the routing table -> less state but more hops
- Optimization: dense routing table (R-Table)
  - Normal (sparse) R-Table + k-1 pointers to contiguous nodes in both directions at each level
  - -> More state but fewer hops
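To make the state/hops trade-off concrete, the sketch below tabulates the number of levels, the sparse routing-table size, and the k log_k N quantity for a few bases. Treating the hidden constant in O(k log_k N) as 1 and counting two pointers per level are assumptions made purely for illustration.

```python
def num_levels(n_nodes, k):
    """Smallest L with k**L >= n_nodes, i.e. ceil(log_k N) computed with integers."""
    levels, capacity = 0, 1
    while capacity < n_nodes:
        capacity *= k
        levels += 1
    return levels


def tradeoff(n_nodes, k):
    levels = num_levels(n_nodes, k)
    return {
        "levels": levels,
        "sparse_pointers": 2 * levels,  # one pointer per direction per level (assumed)
        "hop_bound": k * levels,        # k * log_k N with the constant taken as 1
    }


for k in (2, 4, 16):
    print(k, tradeoff(2 ** 16, k))
```

For N = 2^16 this gives 16 levels and a bound of 32 for k = 2, but only 4 levels and a bound of 64 for k = 16: less state, more hops.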

Node Join
- Two-stage process: (1) bottom-up + (2) top-down
- Bottom-up: find the top-level ring that matches the node's numeric ID
- Top-down: build the new node's routing table
  - Find a neighbor in the top ring using name ID search
  - Starting from this neighbor, search for the name ID at the next lower level and thus find the neighbors at that lower level
  - Repeat until the search reaches the root
- Update of the existing nodes' routing tables:
  - after the new node has joined the root ring
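The outcome of a join can be sketched centrally: find the highest non-empty ring that matches the new node's numeric ID, then collect its two neighbours at each level from that ring down to the root. The sketch uses global knowledge of all nodes, which a real join does not have, so it shows only the resulting routing table, not the message exchange. The numeric IDs and all names are assumptions.

```python
import bisect


def join_neighbors(new_name, new_id, numeric_id):
    """Return {level: (clockwise, counterclockwise)} for a node that is not yet a member.

    `numeric_id` maps existing node names to numeric ID strings.
    """
    # Bottom-up stage: highest level whose matching ring already has members.
    top = 0
    for h in range(len(new_id), -1, -1):
        if any(nid.startswith(new_id[:h]) for nid in numeric_id.values()):
            top = h
            break
    # Top-down stage: locate the name ID position in each ring, highest ring first.
    table = {}
    for h in range(top, -1, -1):
        ring = sorted(n for n, nid in numeric_id.items() if nid.startswith(new_id[:h]))
        i = bisect.bisect_left(ring, new_name)
        table[h] = (ring[i % len(ring)], ring[(i - 1) % len(ring)])
    return table


# Assumed numeric IDs for the example nodes, with M absent before it joins.
existing = {"A": "000", "T": "001", "X": "011", "O": "100",
            "Z": "101", "D": "110", "V": "111"}
table = join_neighbors("M", "010", existing)
```

Here M finds X alone in Ring 01, then its neighbours T and A in Ring 0, and finally O and D in the root ring.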

Node join illustrated
Figure: the joining node relative to Ring P and its child rings Ring P0 and Ring P1; annotation: "Only a few in expectation".

Node Join - Analysis
- Key ideas:
  - Climb to a weakly populated ring
  - Search for the node's neighbors at the lower levels only after finding the neighbors at the higher levels
  - The range of traversed nodes at a level = the range between the neighbors at the next higher level
- Insertion traverses O(log N) hops w.h.p.
  - Expected O(log N) levels, constant number of neighbors at each level

Node Departure/Failure
- Graceful (notified) vs. crash departure
- Key issue: updating the routing tables
- Key idea: separate vital information from optimizations
  - Routing is correct as long as the root-level ring is maintained
  - Other levels are regarded as optimization hints
  - Does this remind you of something?
- Upper-ring membership is maintained through a background repair process

Leaf Sets
- Idea: use redundant pointers at level 0
  - Store L/2 pointers in each direction
  - SkipNet uses L = 16
  - Not an original SkipNet idea: also used in Pastry
- Protect against independent failures
- Improve search performance
  - route directly using the leaf set once within L/2 of the target
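A leaf set is simply the L/2 nearest root-ring neighbours on each side. A minimal sketch, with a hypothetical function name and a cap for rings that have fewer than L/2 other members:

```python
def leaf_set(node, root_ring, L=16):
    """Return (clockwise, counterclockwise) lists of up to L/2 root-ring neighbours each."""
    ring = sorted(root_ring)
    i = ring.index(node)
    half = min(L // 2, len(ring) - 1)  # small rings cannot supply L/2 distinct nodes
    clockwise = [ring[(i + j) % len(ring)] for j in range(1, half + 1)]
    counterclockwise = [ring[(i - j) % len(ring)] for j in range(1, half + 1)]
    return clockwise, counterclockwise


ring = list("ADMOTVXZ")
print(leaf_set("A", ring, L=4))
```

If a node's immediate root-ring neighbour crashes, the next entry in the leaf set keeps the root ring connected, which is the property routing correctness relies on.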

Constrained Load Balancing (CLB)
- Multiple DHTs with differing scopes using a single SkipNet structure
  - A result of the ability to route in both address spaces
- Divide data object names into two parts with "!":
  - microsoft.com ! skipnet.html
  - CLB domain (microsoft.com): name routing
  - CLB suffix (skipnet.html): numeric routing
- microsoft.com/skipnet.html! - controlled placement
- !microsoft.com/skipnet.html - global DHT

CLB Example
- File ID = "com.microsoft ! skipnet.html"
  - Route by name ID to com.microsoft
  - Inside com.microsoft, route by numeric ID to hash("skipnet.html")
Figure: a ring with the segments com.microsoft, com.sun, edu.ucb and gov.irs; skipnet.html is placed inside the com.microsoft segment.
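The two-step lookup can be sketched as a filter followed by a closest-hash choice. The assumptions here: SHA-1 truncated to 16 bits stands in for the numeric ID hash, node numeric IDs are derived from node names, and the host names are invented. The sketch covers only names with a non-empty suffix, i.e. "domain ! suffix" and the global "!name" form.

```python
import hashlib


def numeric_id(name, bits=16):
    """Assumed hash: top `bits` bits of SHA-1, returned as an integer."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def clb_owner(file_id, node_names):
    """Pick the node inside the CLB domain whose numeric ID is closest to hash(suffix)."""
    domain, sep, suffix = file_id.partition("!")
    domain, suffix = domain.strip(), suffix.strip()
    if not sep or not suffix:
        raise ValueError("expected '<domain> ! <suffix>' with a non-empty suffix")
    # Step 1 (name ID routing): restrict to the contiguous segment of the domain.
    members = [n for n in node_names
               if not domain or n == domain or n.startswith(domain + ".")]
    if not members:
        raise ValueError("no node in CLB domain " + domain)
    # Step 2 (numeric ID routing): closest numeric ID within that segment.
    target = numeric_id(suffix)
    return min(members, key=lambda n: (abs(numeric_id(n) - target), n))


nodes = [
    "com.microsoft.host1", "com.microsoft.host2", "com.microsoft.research.host3",
    "com.sun.host1", "edu.ucb.host1", "gov.irs.host1",
]
owner = clb_owner("com.microsoft ! skipnet.html", nodes)
```

With an empty domain ("!microsoft.com/skipnet.html") every node is a candidate, which corresponds to the global DHT case.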

SkipNet Path Locality
- Organizations correspond to contiguous SkipNet segments
  - Internal routing by name ID remains internal
- Nodes have left / right pointers
Figure: a ring with the segments com.microsoft (containing com.microsoft.research), com.sun, edu.ucb and gov.irs.