BATON A Balanced Tree Structure for Peer-to-Peer Networks H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu.

Related Work
- P-Grid (CoopIS’01)
  - Based on a binary prefix tree structure
  - Cannot guarantee a log N bound on the number of search steps
- P-Tree (WebDB’04)
  - Based on a B+-tree structure; uses CHORD as the overlay framework
  - Each node maintains a branch of the B+-tree
  - Keeping knowledge consistent among nodes is expensive
- Multi-way tree (DBISP2P’04)
  - Each node maintains links to its parent, its siblings, its neighbors, and its children
  - Cannot guarantee a log N bound on the number of search steps

BATON Architecture
- BATON: BAlanced Tree Overlay Network
- Binary balanced tree index architecture
- Definition: a tree is balanced if and only if, at every node in the tree, the heights of its two subtrees differ by at most one.
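The balance definition above can be checked directly on a small in-memory tree. This is a minimal sketch, not BATON code: the `Node` class and the function names are made up for illustration, and the check runs in one process instead of across peers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; only the child links matter here."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node: Optional[Node]) -> int:
    """Height counted in nodes; an empty subtree has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node: Optional[Node]) -> bool:
    """True if, at every node, the two subtree heights differ by at most one."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)
```

A root with one leaf child and one child that itself has a leaf passes the check; a chain of three nodes hanging off the left side does not.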

Theorems
- Theorem 1: The tree is balanced if every node that has a child also has both its left and right routing tables full (*).
- Theorem 2: If a node x contains a link to another node y in its left or right routing table, then the parent of x must also contain a link to the parent of y, unless x and y have the same parent.
(*) A routing table is full if none of its valid links is NULL.
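The slides do not say which routing-table links count as "valid". The sketch below assumes the scheme described in the BATON paper: nodes at level L are numbered 1 to 2^L from left to right, and the routing tables point to same-level positions at distances 1, 2, 4, ... on each side. The function names and the set-of-occupied-positions representation are illustrative assumptions.

```python
def valid_positions(level: int, number: int):
    """Same-level positions a node may link to (assumed numbering 1..2**level).

    Returns (left, right): positions at distance 1, 2, 4, ... that still fall
    inside the level. Positions outside the level are not valid links.
    """
    width = 2 ** level
    left, right = [], []
    step = 1
    while step < width:
        if number - step >= 1:
            left.append(number - step)
        if number + step <= width:
            right.append(number + step)
        step *= 2
    return left, right


def table_is_full(positions, occupied) -> bool:
    """A table is full if every valid position holds a node (no NULL link)."""
    return all(p in occupied for p in positions)
```

Under these assumptions, the leftmost node at level 3 has an empty left table and a right table pointing at positions 2, 3, and 5. Its right table is full only if all three positions are occupied.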

Node join
- Example: new node u joins the network
[Figure: tree with nodes a to s and the joining node u]

Node join
- Cost of finding a node to join: O(log N)
- When a node accepts a new node as its child, it
  - Splits off half of its content (its range of values) to the new child
  - Updates its own adjacent links and those of the new child
  - Notifies its neighbor nodes and the new child’s neighbor nodes so that they update their knowledge
  - Cost: 6 log N
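The range split on join can be sketched as follows. The slides only say that the parent hands half of its range to the new child. Splitting at the numeric midpoint, and giving the lower half to a left child and the upper half to a right child, are assumptions chosen to keep ranges in left-to-right order. `Peer` and `accept_child` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical peer holding the half-open range [low, high)."""
    low: int
    high: int


def accept_child(parent: Peer, as_left_child: bool) -> Peer:
    """Split the parent's range at its midpoint and return the new child.

    Assumption: a left child takes the lower half, a right child the upper
    half, so ranges stay ordered from left to right.
    """
    mid = (parent.low + parent.high) // 2
    if as_left_child:
        child = Peer(parent.low, mid)
        parent.low = mid
    else:
        child = Peer(mid, parent.high)
        parent.high = mid
    return child
```

A real implementation would probably split at the median of the stored data instead of the midpoint of the range, so that both peers end up with a similar load.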

Node departure
- When a node wishes to leave the network:
  - If it is a leaf node and no neighbor node has children, it can leave the network directly
    - It transfers its content to its parent and updates the corresponding adjacent links
    - It notifies its neighbor nodes and its parent’s neighbor nodes so that they update their knowledge
    - Cost: 4 log N
  - If it is a leaf node and a neighbor node has children, it must find a leaf node to replace it by sending a FINDREPLACEMENTNODE request to a child of that neighbor node
  - If it is an intermediate node, it must find a leaf node to replace it by sending a FINDREPLACEMENTNODE request to one of its adjacent nodes

Node departure
- Example: existing node b leaves the network
[Figure: tree with nodes a to u; node b departs]

Node departure
- Cost of finding a leaf node to act as replacement: O(log N)
- When a node comes to replace a leaving node
  - It notifies its parent and its neighbor nodes, as in the case of a leaf node leaving: 4 log N
  - It notifies its new parent node, its new neighbor nodes, and the new parent’s neighbor nodes: 4 log N
  - Total cost: 8 log N
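To get a feel for the update costs quoted on the join and departure slides (6 log N, 4 log N, 8 log N), they can be evaluated for a concrete network size. The slides do not state the base of the logarithm. Base 2 is assumed here because the tree is binary, and the function name is illustrative.

```python
import math


def update_messages(n_nodes: int) -> dict:
    """Message counts for routing updates, assuming log means log base 2."""
    log_n = math.log2(n_nodes)
    return {
        "join": 6 * log_n,                    # node accepts a new child
        "leaf_leave": 4 * log_n,              # leaf leaves, no replacement needed
        "leave_with_replacement": 8 * log_n,  # 4 log N (old position) + 4 log N (new position)
    }
```

For a network of 1024 nodes this gives 60, 40, and 80 messages respectively. These figures cover the routing updates only, so the O(log N) search for a join point or a replacement leaf comes on top.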

Fault tolerance
- Node failure
  - Nodes that discover the failure of a node report it to that node’s parent
  - The failed node’s parent takes responsibility for finding a leaf node to replace it, if necessary
  - Routing information of the failed node can be recovered by contacting its neighbor nodes via the routing information of its parent
- Fault tolerance: a failed node can be bypassed in two ways
  - Through routing tables (similar to CHORD): horizontal axis
  - Through parent-child and adjacent links: vertical axis
  - In particular, even if all nodes at the same level fail, the tree is not partitioned

Network restructuring
- Necessary after a forced join or forced leave, both of which are used in the load balancing scheme
- Triggered when the condition in Theorem 1 is violated
- Done by shifting nodes via adjacent links
  - No data movement is required
  - Each shifted node requires O(log N) effort to update routing tables

Forced join
- Example 1: network restructuring triggered by a forced join

Forced leave
- Example 2: network restructuring triggered by a forced leave

Index construction
- Each node is assigned a range of values
- The range of values directly managed by a node is
  - Greater than the range managed by its left adjacent node
  - Smaller than the range managed by its right adjacent node
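The ordering rule can be verified on a toy tree. This sketch assumes that a node's left and right adjacent nodes are its in-order predecessor and successor, which the slides imply but do not state. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node managing the half-open range [low, high)."""
    low: int
    high: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(node: Optional[Node]) -> list:
    """Ranges in in-order (left subtree, node, right subtree) sequence."""
    if node is None:
        return []
    return in_order(node.left) + [(node.low, node.high)] + in_order(node.right)


def ranges_are_ordered(root: Node) -> bool:
    """True if each range ends at or before the start of its in-order successor."""
    ranges = in_order(root)
    return all(a_high <= b_low
               for (_, a_high), (b_low, _) in zip(ranges, ranges[1:]))
```

If the assumption holds, following right adjacent links from the leftmost node visits the ranges in ascending order, and range queries rely on exactly that.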

Exact match query
- Example: node h searches for a value that belongs to node c, say 74
[Figure: tree with nodes a to t, each labeled with its range. Ranges shown, in ascending order: [0-5) [5-8) [8-12) [12-17) [17-23) [23-29) [29-34) [34-39) [39-45) [45-51) [51-54) [54-61) [61-68) [68-72) [72-75) [75-81) [81-85) [85-89) [89-93) [93-100)]

Range query
- Process similar to exact match query
  - First, find a node whose range intersects the searched range
  - Second, follow adjacent links to retrieve all results
- Cost
  - Exact match query: O(log N)
  - Range query: O(log N + X), where X is the total number of nodes containing searched results
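The two query types can be illustrated on the ranges from the exact match example. This is a single-process sketch: the distributed O(log N) routing is replaced by a local binary search over the sorted ranges, and following adjacent links is modeled as stepping to the next list entry. The function names are illustrative.

```python
from bisect import bisect_right

# Ranges from the exact match example, as half-open intervals in ascending order.
RANGES = [(0, 5), (5, 8), (8, 12), (12, 17), (17, 23), (23, 29), (29, 34),
          (34, 39), (39, 45), (45, 51), (51, 54), (54, 61), (61, 68), (68, 72),
          (72, 75), (75, 81), (81, 85), (85, 89), (89, 93), (93, 100)]
LOWS = [low for low, _ in RANGES]


def exact_match(key):
    """Return the range containing key, or None (binary search stands in for routing)."""
    i = bisect_right(LOWS, key) - 1
    if i >= 0 and RANGES[i][0] <= key < RANGES[i][1]:
        return RANGES[i]
    return None


def range_query(low, high):
    """Return every range that intersects [low, high), scanning left to right."""
    i = bisect_right(LOWS, low) - 1
    if i < 0 or RANGES[i][1] <= low:
        i += 1  # first candidate lies entirely below the query; skip it
    hits = []
    while i < len(RANGES) and RANGES[i][0] < high:
        hits.append(RANGES[i])  # models one hop along a right adjacent link
        i += 1
    return hits
```

Here `exact_match(74)` lands on [72-75), the range holding the searched value in the slide example. `range_query(70, 82)` touches four ranges, so X = 4 in the O(log N + X) cost.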

Data insertion and deletion
- Insertion
  - Follow the exact match query process to find the node where the data should be inserted, with one exception:
  - If the node is the leftmost node and the inserted value is less than its lower bound, or it is the rightmost node and the inserted value is greater than its upper bound, the node expands its range of values to accept the new value. In this case, an additional log N cost is incurred for updating routing tables
- Deletion
  - Follow the exact match query process to find the node containing the data to be deleted
- Cost: similar to the exact match query process, O(log N) for both insertion and deletion
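The boundary rule for insertion can be sketched on a flat list of ranges. This sketch makes several assumptions: keys are integers, ranges are half-open and contiguous, and a linear scan stands in for the routed exact match search. `insert` and its parameters are hypothetical names.

```python
def insert(ranges, store, value):
    """Insert value and return the index of the node that accepted it.

    ranges: ascending list of [low, high) pairs (as lists, so they can grow).
    store:  parallel list holding each node's data values.
    """
    if value < ranges[0][0]:
        ranges[0][0] = value           # leftmost node lowers its lower bound
        idx = 0
    elif value >= ranges[-1][1]:
        ranges[-1][1] = value + 1      # rightmost node raises its upper bound
        idx = len(ranges) - 1
    else:
        # Stand-in for the exact match search; assumes contiguous ranges.
        idx = next(i for i, (low, high) in enumerate(ranges)
                   if low <= value < high)
    store[idx].append(value)
    return idx
```

Only the first two branches change a range, so only they would incur the additional log N cost of updating routing tables.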

Load balancing
- The load balancing process is initiated when a node becomes overloaded or underloaded due to insertion or deletion
- 2 load balancing schemes
  - Do load balancing with adjacent nodes
  - An overloaded node finds a lightly loaded node to share its workload (only if the overloaded / underloaded node is a leaf node)
    - A lightly loaded node is found by traveling through neighbor nodes within O(log N) steps
    - Once found, the lightly loaded node transfers its content to one of its adjacent nodes, force-leaves its current position, and force-joins as a child of the overloaded node
    - Network restructuring is triggered if necessary
    - A similar process is applied to underloaded nodes
- Cost: O(log N) for each node taking part in the load balancing process

Load balancing
- Example: node g is an overloaded node while node f is a lightly loaded node

Experimental study
- Experimental setup
  - Test the network with different numbers of nodes N from 1000 to
  - For a network of size N, 1000 x N data values in the domain of [1, ) are inserted in batches
  - 1000 exact match queries and 1000 range queries are executed
  - CHORD and Multi-way tree are used for comparison

Join and leave operations
- Cost of finding join node and replacement node
- Cost of updating routing tables

Insert and delete operations
- Cost of insert and delete operations

Search operations
- Cost of exact match query
- Cost of range query

Access load
- Access load for nodes at different levels

Effect of load balancing
- Average messages of load balancing operation
- Size of load balancing process

Effect of network dynamics
- Network dynamics

Conclusion
- BATON: the first P2P overlay network based on a balanced tree structure
- Strengths
  - Incurs a lower cost for updating routing tables than other systems
  - Supports both exact match queries and range queries efficiently
  - Flexible and efficient load balancing scheme
  - Scalability (NOT bounded in advance by network size or ID space)

Thank you Q & A