Other Structured P2P Systems: CAN, BATON (Lecture 4)


CAN: A Scalable Content-Addressable Network

CAN (content-addressable network)
 The hash value of a key is viewed as a point in a D-dimensional Cartesian space
 Each node is responsible for a D-dimensional "cube" (zone) in the space
 Nodes are neighbors if their zones "touch" at more than just a point (more formally, nodes s & t are neighbors when their zones abut along exactly one dimension and their coordinate spans overlap along the remaining D-1 dimensions)
Example: D=2
1's neighbors: 2, 3, 4, 6
6's neighbors: 1, 2, 4, 5
Zones "wrap around", e.g., 7 and 8 are neighbors
Expected # neighbors: O(D)
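The zone-adjacency rule above can be sketched in a few lines. This is an illustrative check, not code from the CAN paper: zones are lists of half-open intervals, one per dimension, and wrap-around at the space boundary is ignored for simplicity.

```python
# Sketch: two zones are neighbors when they abut along exactly one
# dimension and their spans overlap along all the other dimensions.
def are_neighbors(zone_s, zone_t):
    """Each zone is a list of (lo, hi) intervals, one per dimension."""
    abut = overlap = 0
    for (s_lo, s_hi), (t_lo, t_hi) in zip(zone_s, zone_t):
        if s_hi == t_lo or t_hi == s_lo:      # touch at a face
            abut += 1
        elif s_lo < t_hi and t_lo < s_hi:     # intervals overlap
            overlap += 1
        else:
            return False                      # disjoint in this dimension
    return abut == 1 and overlap == len(zone_s) - 1
```

Note how zones touching only at a corner are rejected: they abut in two dimensions, so the `abut == 1` test fails.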

CAN routing
 To route a message toward a destination point, choose the neighbor with the smallest Cartesian distance to the destination (e.g., measured from the neighbor's zone center)
e.g., region 1 needs to send to the node covering point X: 1 checks all of its neighbors, node 2 is closest, so 1 forwards the message to node 2
 The Cartesian distance to the destination monotonically decreases with each transmission
 Expected # overlay hops: (D/4)·N^(1/D)
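Greedy forwarding can be sketched as below, measuring distance from each neighbor's zone center as the slide suggests. The helper names (`zone_center`, `next_hop`) are illustrative, not from the CAN implementation.

```python
import math

def zone_center(zone):
    """Center point of a zone given as a list of (lo, hi) intervals."""
    return [(lo + hi) / 2 for lo, hi in zone]

def next_hop(neighbors, dest):
    """Pick the neighbor whose zone center is closest to `dest`.
    neighbors: dict mapping node id -> zone; dest: destination point."""
    return min(
        neighbors,
        key=lambda n: math.dist(zone_center(neighbors[n]), dest),
    )
```

Because each hop strictly reduces the distance to the destination, the greedy walk cannot loop; in expectation it takes (D/4)·N^(1/D) overlay hops.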

CAN node insertion
 To join the CAN:
find some node already in the CAN (via a bootstrap process)
choose a point in the space uniformly at random
using CAN routing, inform the node that currently covers that point
that node splits its zone in half, keeps one half, and gives the other half to the joining node
the 1st split is along the 1st dimension; if the last split was along dimension i < D, the next split is along the (i+1)st dimension (e.g., for the 2-D case, split on the x-axis, then the y-axis)
 Observation: the likelihood of a zone being selected is proportional to its size, i.e., big zones are chosen more frequently
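The split step can be sketched as follows, assuming the round-robin dimension order described above; the function name and zone representation are hypothetical.

```python
# Sketch of the split performed by the node that covers the joiner's
# random point: halve the zone along the current dimension, keep one
# half, hand the other to the joining node.
def split_zone(zone, dim):
    """Split `zone` in half along dimension `dim`.
    Returns (kept_half, given_half, dimension_for_next_split)."""
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    kept = list(zone)
    kept[dim] = (lo, mid)
    given = list(zone)
    given[dim] = (mid, hi)
    next_dim = (dim + 1) % len(zone)   # round-robin over dimensions
    return kept, given, next_dim
```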

CAN node removal
 The underlying cube structure should remain intact: i.e., if the zones covered by s & t were not formed by splitting a cube, then they should not be merged together
 Sometimes, we can simply collapse the removed node's zone into a neighbor to form a bigger rectangle (e.g., if 6 leaves, its zone goes back to 1)
 Other times, this requires rearranging nodes' areas of coverage
e.g., if 3 leaves, its zone should merge back into the square formed by 2, 4, 5; we cannot simply collapse 3's zone into 4 and/or 5
one solution: 5's old zone collapses into 2's zone, and 5 takes over 3's zone

CAN (recovery from) removal process
 View the partitioning as a binary tree:
leaves represent the regions covered by overlay nodes (labeled by the node that covers the region)
intermediate nodes represent "split" regions that could be re-formed, i.e., a leaf could appear at that position
siblings are regions that can be merged together (forming the region covered by their parent)

CAN (recovery from) removal process
 Repair algorithm when leaf s is removed: find a leaf node t that is either
s's sibling, or
a descendant of s's sibling whose own sibling is also a leaf node
 t takes over s's region (t moves to s's position in the tree), and t's sibling takes over t's previous region
 Distributed process in CAN to find an appropriate t with a leaf sibling: the current (inappropriate) t sends a message into the area that would be covered by its sibling; if a sibling (a node covering a same-size region) is there, then done; else the receiving node becomes the new t and the process repeats
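As a centralized illustration of the takeover rule (CAN itself finds t with the distributed probe described above), the partition tree can be modeled as nested pairs whose leaves are node IDs; all names here are hypothetical.

```python
# Sketch: given the removed leaf's sibling subtree, descend until we
# reach an internal node whose two children are both leaves. One child
# becomes t (it will move into the removed node's position); the other
# is t's sibling, which absorbs t's old region.
def find_replacement(subtree):
    """Return (t, t_sibling); t_sibling is None if the sibling of the
    removed leaf was itself a leaf and can take over directly."""
    if isinstance(subtree, str):          # sibling is already a leaf
        return subtree, None
    node = subtree
    while True:
        left, right = node
        if isinstance(left, str) and isinstance(right, str):
            return left, right            # both children are leaves
        # descend into an internal child and keep probing
        node = left if not isinstance(left, str) else right
```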

BATON: A Balanced Tree Structure for Peer-to-Peer Networks

Overlay Networks
 Add an additional routing protocol on top of network routing (i.e., TCP/IP)
 Need to be resilient to node failures and intermittent connectivity: redundancy in data and metadata
 Most popular are Distributed Hash Tables (DHTs): Pastry, Chord

Chord Overview
 Randomly assign each peer an ID between 0 and 2^m - 1
 Hash a key to find the ID for an item
 The first peer at or after that ID (mod 2^m) is responsible for storing the item
 Each peer has a routing table of subsequent peers
 The table is used for incremental routing
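A minimal consistent-hashing sketch of the "first peer at or after the key's ID stores the item" rule, without Chord's finger-table routing; `chord_id` and `successor` are illustrative names, not Chord's API.

```python
from hashlib import sha1

M = 6                                   # ID space: 0 .. 2^m - 1

def chord_id(name, m=M):
    """Hash a name into the m-bit ring."""
    return int(sha1(name.encode()).hexdigest(), 16) % (2 ** m)

def successor(peer_ids, key_id):
    """First peer clockwise from key_id on the ring (mod 2^m)."""
    ring = sorted(peer_ids)
    for p in ring:
        if p >= key_id:
            return p
    return ring[0]                      # wrap around past the top ID
```

The real protocol reaches the successor in O(log N) hops using finger tables rather than a sorted global list.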

Chord Routing Example (m = 6) Figures poached from Stoica et al., “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications,” SIGCOMM 2001.

DHTs: The Good, the Bad, and the Ugly
 Good
efficient routing algorithms
easy to understand
simple replication methods
 Bad
load balancing depends on a good choice of hash function
 Ugly
can only do exact ID lookups, since hashing scatters nearby keys across the ID space
we would like to do range and prefix searches

Related Work
 P-Grid (CoopIS'01): based on a binary prefix tree structure
cannot guarantee an O(log N) bound on search steps
 P-Tree (WebDB'04): based on the B+-tree structure, using Chord as the overlay framework; each node maintains a branch of the B+-tree
expensive to keep knowledge consistent among nodes
 Multi-way tree (DBISP2P'04): each node maintains links to its parent, its siblings, its neighbors, and its children
cannot guarantee an O(log N) bound on search steps

BATON Architecture
 BATON: BAlanced Tree Overlay Network, a binary balanced tree index architecture
 Definition: a tree is balanced if and only if, at every node in the tree, the heights of its two subtrees differ by at most one.

Theorems
 Theorem 1: the tree is balanced if every node that has a child also has both its left and right routing tables full (*)
 Theorem 2: if a node x contains a link to another node y in its left or right routing tables, then the parent of x must also contain a link to the parent of y, unless the same node is the parent of both x and y
(*) A routing table is full if none of its valid links is NULL.
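The balance definition can be checked directly on a toy tree (None for an empty subtree, a (left, right) pair for an internal node). This sketch recomputes heights naively and is only for illustration; BATON itself infers balance from routing-table fullness (Theorem 1) without global traversal.

```python
def height(t):
    """Height of a tree: 0 for empty, 1 + max child height otherwise."""
    if t is None:
        return 0
    left, right = t
    return 1 + max(height(left), height(right))

def is_balanced(t):
    """True iff at every node the subtree heights differ by at most 1."""
    if t is None:
        return True
    left, right = t
    return (abs(height(left) - height(right)) <= 1
            and is_balanced(left) and is_balanced(right))
```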

Node join
 Example: new node u joins the network (figure: balanced tree with nodes a through s; u joins as a new leaf)

Node join
 Cost of finding a node to join: O(log N)
 When a node accepts a new node as its child, it:
splits half of its content (its range of values) off to the new child
updates its own adjacent links and those of the new child
notifies both its neighbor nodes and the new child's neighbor nodes to update their knowledge
Cost: 6 log N

Node departure
 When a node wishes to leave the network:
if it is a leaf node and no neighbor node has children, it can leave directly: it transfers its content to its parent node, updates the corresponding adjacent links, and notifies its neighbor nodes and its parent's neighbor nodes to update their knowledge (cost: 4 log N)
if it is a leaf node and some neighbor node has children, it needs to find a leaf node to replace it, by sending a FINDREPLACEMENTNODE request to a child of that neighbor node
if it is an intermediate node, it needs to find a leaf node to replace it, by sending a FINDREPLACEMENTNODE request to one of its adjacent nodes

Node departure
 Example: existing node b leaves the network (figure: leaf node r replaces b)

Node departure
 Cost of finding a leaf node to act as replacement: O(log N)
 When a node comes to replace a leaving node, it:
notifies its parent and its neighbor nodes, as in the leaf-departure case: 4 log N
notifies its new parent node, its new neighbor nodes, and the new parent's neighbor nodes: 4 log N
Total cost: 8 log N

Fault tolerance
 Node failure
nodes discovering the failure of a node report it to that node's parent
the failed node's parent takes responsibility for finding a leaf node to replace it, if necessary
the routing information of the failed node can be recovered by contacting its neighbor nodes via the routing information of its parent
 Fault tolerance: a failed node can be bypassed in two ways
through routing tables (similar to Chord): the horizontal axis
through parent-child and adjacent links: the vertical axis
Specifically, even if all nodes at the same level fail, the tree is not partitioned

Network restructuring
 Necessary in case of a forced join or forced leave, which are used in the load balancing scheme
 Network restructuring is triggered when the condition of Theorem 1 is violated
 Network restructuring is done by shifting nodes via adjacent links
no data movement is required
each shifted node requires O(log N) effort to update routing tables

Forced join
 Example 1: network restructuring is triggered by a forced join

Forced leave
 Example 2: network restructuring is triggered by a forced leave

Index construction
 Each node is assigned a range of values
 The range of values directly managed by a node is:
greater than the range managed by its left adjacent node
smaller than the range managed by its right adjacent node

Exact match query
 Example: node h wants to search for data belonging to node c, say the value 74 (figure: tree of nodes a through t, each labeled with its range; the ranges partition [0-100) into [0-5), [5-8), [8-12), ..., [93-100), and 74 falls in the range [72-75))

Range query
 Process is similar to an exact match query:
first, find a node whose range intersects the searched range
second, follow adjacent links to retrieve all results
 Cost
exact match query: O(log N)
range query: O(log N + X), where X is the total number of nodes containing search results
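The two-phase range query can be sketched over the in-order sequence of node ranges, with a sorted list standing in for the distributed tree and list traversal standing in for the adjacent links; all names are illustrative.

```python
from bisect import bisect_right

def range_query(nodes, lo, hi):
    """Return names of all nodes whose range intersects [lo, hi).
    `nodes` is the in-order list of (range_lo, range_hi, name)."""
    starts = [n_lo for n_lo, _, _ in nodes]
    # Phase 1 (exact-match style, O(log N)): locate a starting node.
    i = max(bisect_right(starts, lo) - 1, 0)
    out = []
    # Phase 2 (O(X)): follow right-adjacent links while ranges intersect.
    while i < len(nodes) and nodes[i][0] < hi:
        n_lo, n_hi, name = nodes[i]
        if n_hi > lo:                   # half-open interval intersection
            out.append(name)
        i += 1
    return out
```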

Data insertion and deletion
 Insertion
follow the exact match query process to find the node where the data should be inserted, except that:
if it is the left-most node and the inserted value is less than its lower bound, or if it is the right-most node and the inserted value is greater than its upper bound, the node expands its range of values to accept the new value; in this case, an additional log N cost is needed for updating routing tables
 Deletion
follow the exact match query process to find the node containing the data to be deleted
 Cost: similar to the exact match query process, O(log N) for both insertion and deletion

Load balancing
 The load balancing process is initiated when a node becomes overloaded or underloaded due to insertion or deletion
 Two load balancing schemes
do load balancing with adjacent nodes
an overloaded node finds a lightly loaded node to share its work load (only if the overloaded/underloaded node is a leaf node)
 A lightly loaded node is found by traveling through neighbor nodes within O(log N) steps. Once found, the lightly loaded node transfers its content to one of its adjacent nodes, performs a forced leave from its current position, and performs a forced join as a child of the overloaded node. Network restructuring is triggered if necessary. A similar process is applied to underloaded nodes.
 Cost: O(log N) for each node taking part in the load balancing process

Load balancing  Example: node g is an overloaded node while node f is a lightly loaded node

Experimental study
 Experimental setup
test the network with different numbers of nodes N, from 1000 to
for a network of size N, 1000 x N data values in the domain of [1, ) are inserted in batches
1000 exact match queries and 1000 range queries are executed
Chord and the Multi-way tree are used for comparison

Join and leave operations Cost of finding join node and replacement node Cost of updating routing tables

Insert and delete operations Cost of insert and delete operations

Search operations Cost of exact match query Cost of range query

Access load Access load for nodes at different levels

Effect of load balancing Average messages of load balancing operation Size of load balancing process

Effect of network dynamics

Conclusion
 BATON: the first P2P overlay network based on a balanced tree structure
 Strengths
incurs lower cost for updating routing tables compared to other systems
supports both exact match queries and range queries efficiently
flexible and efficient load balancing scheme
scalability (not bounded by a network size or ID space fixed beforehand)

Thank you Q & A