1 SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux.

Presentation transcript:

1 SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux

2 Plan: Introduction; SDDS; R-tree; SD-Rtree; Evolution; Balancing; Spatial Rotations; Overlapping; Redundant Coverage; Queries; Performance; Conclusion

3 SDDS Principles (1993): Data are stored at server nodes, which communicate through point-to-point messaging. Overloaded servers split over new servers. Queries go to client nodes, which use local images of the SDDS. There is no central addressing component. A node can be both client and server (peer).

4 SDDS Principles (1993): An outdated image may send a query to an incorrect server. Servers forward such a query to the correct server. An Image Adjustment Message (IAM) comes back and the image gets adjusted, so the client does not repeat the same error twice. Data are basically in the RAM of the servers.
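
The forwarding and image-adjustment loop from slides 3 and 4 can be sketched as follows. This is only a toy illustration, assuming a one-dimensional key space cut into ranges; the names `Server`, `Network`, `Client` and `lookup` are hypothetical, and `Network.correct_server` merely stands in for the server-to-server forwarding of a real SDDS, which has no central component.

```python
from bisect import bisect_right

class Server:
    """Hypothetical server owning keys in the half-open range [low, high)."""
    def __init__(self, name, low, high):
        self.name, self.low, self.high = name, low, high

class Network:
    """Toy stand-in for point-to-point messaging between servers."""
    def __init__(self, servers):
        self.servers = sorted(servers, key=lambda s: s.low)
        self.forwards = 0

    def correct_server(self, key):
        # Assumption: key lies inside the range covered by the servers.
        lows = [s.low for s in self.servers]
        return self.servers[bisect_right(lows, key) - 1]

    def handle(self, contacted, key):
        """Answer at the contacted server, or forward and produce an IAM."""
        target = self.correct_server(key)
        if target is contacted:
            return target, None
        self.forwards += 1
        # The IAM tells the client which range the correct server owns.
        return target, (target.low, target.high, target)

class Client:
    def __init__(self, contact):
        # Initial image: the contact server is assumed to own everything.
        self.image = [(float("-inf"), float("inf"), contact)]

    def lookup(self, net, key):
        guess = next(s for lo, hi, s in self.image if lo <= key < hi)
        server, iam = net.handle(guess, key)
        if iam is not None:
            self.image.insert(0, iam)  # specific entries are checked first
        return server
```

After one forwarded query the client's image contains the correct server's range, so a second query in that range reaches the right server directly.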

5 SD-Rtree : a Spatial SDDS Distributed Spatial Data

6 SD-Rtree : a Spatial SDDS Distributed Index No central component

7 SD-Rtree : a Spatial SDDS Point & Window Queries kNN queries (future)

8 SD-Rtree : Generalizes R-tree. R-tree: Nodes are minimal bounding boxes. Leaf nodes point to data. Internal nodes bound subtrees. Boxes may overlap. Nodes split on overflow. The result is a balanced m-ary tree.

9 SD-Rtree : Generalizes R-tree. R-tree: An insert may go through multiple paths. It ends up in the smallest bounding box that contains the object, if there is any. Otherwise one of the boxes gets enlarged. A box may split.
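
The choice of a bounding box during an insert can be sketched as below. This is a minimal sketch, not the authors' code: the `Box` type and `choose_box` are hypothetical names, and the fallback rule (least area enlargement, ties broken by smallest area) is an assumption borrowed from Guttman's classic R-tree, since the slide does not say which box gets enlarged.

```python
from typing import NamedTuple

class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def contains(self, other):
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and self.xmax >= other.xmax and self.ymax >= other.ymax)

    def union(self, other):
        """Minimal bounding box of self and other."""
        return Box(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                   max(self.xmax, other.xmax), max(self.ymax, other.ymax))

def choose_box(boxes, obj):
    """Pick the box that should receive obj."""
    containing = [b for b in boxes if b.contains(obj)]
    if containing:
        # Smallest bounding box that already contains the object.
        return min(containing, key=Box.area)
    # Assumed fallback: enlarge the box that grows the least.
    return min(boxes, key=lambda b: (b.union(obj).area() - b.area(), b.area()))
```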

10 SD-Rtree : Generalizes R-tree R-tree: Search may go through multiple paths All paths may bring relevant objects
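
Why a search may follow several paths can be seen in a small centralized sketch. The `Node` class and `window_query` function are hypothetical; rectangles are plain `(xmin, ymin, xmax, ymax)` tuples, and every child whose box intersects the window has to be visited because sibling boxes may overlap.

```python
class Node:
    def __init__(self, box, children=None, objects=None):
        self.box = box                  # (xmin, ymin, xmax, ymax)
        self.children = children or []  # empty for a leaf
        self.objects = objects or []    # object rectangles stored at a leaf

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def window_query(node, window):
    """Return all stored rectangles that intersect the window."""
    if not intersects(node.box, window):
        return []
    if not node.children:
        return [o for o in node.objects if intersects(o, window)]
    hits = []
    for child in node.children:         # several children may qualify
        hits.extend(window_query(child, window))
    return hits
```

With two leaves whose boxes overlap, a window placed in the shared region returns objects from both leaves.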

11 Distribution issues. First issue: adapt the structure to the context. The cost model is based on messages and there is no paging, so the degree M of the tree is not a constraint => split and balancing algorithms must be reviewed. Second issue: distribute the tree over the servers. Balance the load evenly and do not overload the root node => search algorithms must be reviewed as well.

12 SD-Rtree: a Balanced Binary Tree. Each split generates a new edge. Half of the data moves to the new server. Each server hosts exactly one leaf and one internal node of the SD-Rtree.

13 SD-Rtree: a Balanced Binary Tree. The SD-Rtree is a balanced binary tree, distributed over a set of servers, such that: each internal node (or routing node) has exactly two sons; each leaf node stores a subset of the indexed dataset; at each node, the heights of the two subtrees differ by at most one; each server stores one data node and one routing node.
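
The structural properties listed on this slide can be expressed as a small data model. This is an illustrative sketch only: `DataNode`, `RoutingNode`, `height` and `is_balanced` are hypothetical names, the `server` field is just a label, and a data node is given height 0.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class DataNode:
    server: str        # server hosting this leaf
    objects: list      # subset of the indexed dataset

@dataclass
class RoutingNode:
    server: str        # server hosting this internal node
    left: "Tree"       # exactly two sons
    right: "Tree"

Tree = Union[DataNode, RoutingNode]

def height(node):
    if isinstance(node, DataNode):
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_balanced(node):
    """True if subtree heights differ by at most one at every node."""
    if isinstance(node, DataNode):
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_balanced(node.left) and is_balanced(node.right))
```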

14 SD-Rtree: Binary Tree Structure. di = data node (leaf); ri = routing node (internal node)

15 SD-Rtree: Tree Distribution

16 SD-Rtree: Evolution

17 SD-Rtree Balancing. The binary tree should be height-balanced: the heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees). The tree height is then logarithmic in the number of leaves.
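
The logarithmic height claim can be checked numerically. The sketch below assumes a data node has height 0 and that every internal node has two sons; `min_leaves` is a hypothetical helper giving the fewest leaves a height-balanced binary tree of height h can have, which follows a Fibonacci-style recurrence.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def min_leaves(h):
    """Fewest leaves in a height-balanced binary tree of height h."""
    if h == 0:
        return 1          # a single data node
    if h == 1:
        return 2          # one routing node with two data nodes
    # Sparsest tree: one subtree of height h-1, the other of height h-2.
    return min_leaves(h - 1) + min_leaves(h - 2)

GOLDEN = (1 + 5 ** 0.5) / 2

def max_height(n_leaves):
    """Upper bound on the height of a balanced tree with n_leaves leaves."""
    return log2(n_leaves) / log2(GOLDEN)
```

Since `min_leaves(h)` grows at least as fast as the golden ratio to the power h, the height stays below roughly 1.44 times the base-2 logarithm of the number of leaves.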

18 SD-Rtree Balancing. SD-Rtree balancing occurs during splits. Messages are sent bottom-up to adjust the heights of the ancestor nodes. A rotation occurs if an ancestor is imbalanced. SD-Rtree rotations are spatial: they change the rectangles of internal nodes. The best rotation minimizes rectangle overlapping; tie breaking minimizes the « dead space ».
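
The bottom-up height adjustment can be sketched as a walk along parent links, one message per hop. This is an assumption-laden illustration: the `Routing` class, its `son_heights` field and `adjust_heights` are hypothetical names, and the sketch stops at the first ancestor whose sons differ in height by more than one, which is where a rotation would be triggered.

```python
class Routing:
    """Hypothetical routing node that remembers the height of each son."""
    def __init__(self, name, parent=None, side=None):
        self.name = name
        self.parent = parent        # parent routing node, None at the root
        self.side = side            # index (0 or 1) of this node in its parent
        self.son_heights = [0, 0]   # both sons start as data nodes

    @property
    def height(self):
        return 1 + max(self.son_heights)

def adjust_heights(node):
    """Propagate node's height upward.

    Returns (imbalanced ancestor or None, number of messages sent).
    """
    messages = 0
    while node.parent is not None:
        parent = node.parent
        messages += 1                       # one message to the parent
        if parent.son_heights[node.side] == node.height:
            return None, messages           # nothing changes higher up
        parent.son_heights[node.side] = node.height
        if abs(parent.son_heights[0] - parent.son_heights[1]) > 1:
            return parent, messages         # a rotation is needed here
        node = parent
    return None, messages
```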

19 SD-Rtree : Spatial Rotations

20 Rotation Pattern Properties. The sons of a node are not ordered => more freedom for reorganizing the tree. Any imbalanced node matches a rotation pattern. A rotation pattern is a subtree a(b(e(f,g),d),c) such that, for some n > 0: h(c) = h(d) = h(f) = n − 1 and h(g) = max(0, n − 2).
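
Because the sons are unordered, the four subtrees f, g, d and c of a rotation pattern can be regrouped under two routing nodes in several ways. The sketch below is a simplification under stated assumptions: it enumerates all three pairings (the slides do not list the exact candidate rotations), scores each by the overlap of the two resulting bounding boxes, and breaks ties by dead space, as the balancing slide describes. All function names are hypothetical; rectangles are `(xmin, ymin, xmax, ymax)` tuples.

```python
def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def inter_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def dead_space(a, b):
    """Area of the bounding box of a and b covered by neither of them."""
    return area(union(a, b)) - (area(a) + area(b) - inter_area(a, b))

def best_pairing(rects):
    """rects: dict with exactly four entries, subtree name -> rectangle."""
    names = sorted(rects)
    first = names[0]
    candidates = []
    for partner in names[1:]:
        left = (first, partner)
        right = tuple(n for n in names if n not in left)
        box_l = union(rects[left[0]], rects[left[1]])
        box_r = union(rects[right[0]], rects[right[1]])
        cost = (inter_area(box_l, box_r),
                dead_space(rects[left[0]], rects[left[1]])
                + dead_space(rects[right[0]], rects[right[1]]))
        candidates.append((cost, left, right))
    _, left, right = min(candidates)
    return left, right
```

With heights n − 1, n − 1, n − 1 and max(0, n − 2), any of the three pairings gives two routing nodes of height n, so the regrouped subtree is balanced whichever pairing wins.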

21 SD-Rtree : Spatial Rotation

22 Rotation Cost. Constant number of messages (3 or 6, depending on the choice). Few rotations in practice, in particular when the dataset is uniformly distributed. See our experiments.

23 SD-Rtree : Images. Each image defines the addressing structure. It resides as a cache on a client or on a peer. It starts with the address of the contact server. IAMs make it a subtree. Splits make images outdated; IAMs adjust them incrementally.

24 Image Adjustment. The client contacts a server with a query. Each incorrect server initiates a traversal of the tree. During the traversal, the description of the nodes is collected. The correct server sends the up-to-date tree structure. The client updates its image.

25 Image Construction: Using the image. The client first searches its local image and chooses the server that best corresponds to the query. The correct server is found in O(log n) in the worst case.
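
How a client might pick a server from its image can be sketched as follows. The slide does not define "best corresponds", so the rule used here is an assumption: prefer the smallest known bounding box that contains the point, and otherwise fall back to the nearest box. The image is modeled as a plain dict and `choose_server` is a hypothetical name.

```python
def contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def distance(box, p):
    """Euclidean distance from point p to the box (0 if p is inside)."""
    dx = max(box[0] - p[0], 0, p[0] - box[2])
    dy = max(box[1] - p[1], 0, p[1] - box[3])
    return (dx * dx + dy * dy) ** 0.5

def choose_server(image, point):
    """image: dict mapping a server id to the bounding box the client knows."""
    inside = [(area(b), s) for s, b in image.items() if contains(b, point)]
    if inside:
        return min(inside)[1]   # assumed rule: smallest containing box
    # Assumed fallback for a point outside every known box: nearest box.
    return min((distance(b, point), s) for s, b in image.items())[1]
```

If the image is outdated, the chosen server may be wrong; the query is then forwarded and the image adjusted as described on the previous slides.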

26 Out-of-range situation

27 Insertion of objects

28 Overlapping management. The directory rectangles in an Rtree may overlap. A local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for. SD-Rtree servers maintain data on node overlapping: Redundant Coverage. It avoids systematically accessing the root node.

29 Redundant Coverage Example. The region common to A and B is stored on both nodes. If a point query sent to A falls in the region shared with B, A sends a point query message to B. For D, we must keep the intersection with C or B, which is empty here.
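
The example with nodes A, B and D can be mimicked in a few lines. This is a hedged sketch, not the paper's data layout: `NodeInfo`, `record_overlap` and `forward_targets` are hypothetical names, and the coordinates used with them are made up for illustration.

```python
def intersection(a, b):
    """Common rectangle of a and b, or None if they are disjoint."""
    box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box if box[0] <= box[2] and box[1] <= box[3] else None

def contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

class NodeInfo:
    def __init__(self, name, box):
        self.name, self.box = name, box
        self.overlaps = {}   # other node name -> shared rectangle

def record_overlap(a, b):
    """Store the common region on both nodes; nothing is kept if empty."""
    shared = intersection(a.box, b.box)
    if shared is not None:
        a.overlaps[b.name] = shared
        b.overlaps[a.name] = shared

def forward_targets(node, point):
    """Nodes that must also receive a point query first sent to node."""
    return sorted(n for n, r in node.overlaps.items() if contains(r, point))
```

A point that falls in the stored intersection is forwarded sideways to the overlapping node, so the query does not have to climb to the root.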

30 Queries. Point queries and window queries. The technique is similar to the insertion algorithm: search the client image for a server whose mbb contains the point or intersects the window, then send the query to this server. If the server actually covers the point or the window, it answers the client; otherwise it sends the query to its parent node. A server uses the overlapping information to transmit the query.

31 Experiments: Synthetic data (points and rectangles) generated with GSTD to objects 0 to queries Server capacity: objects Comparison of three SD-Rtree variants: BASIC: no image, every query is processed top-down from the root; IMSERVER: no IAMs among the servers; IMCLIENT: client images

32 Cumulative Insert cost

33 Per Insert Cost

34 Cost of balancing

35 Image convergence

36 Distribution of messages

37 Cost per Query

38 Conclusion. SD-Rtree is an efficient scalable distributed Rtree for very large spatial data collections. It can be processed in distributed RAM, with access time much faster than to disk data. Load balancing; spatial rotations; overlapping management; redundant coverage; O(log n) worst insert cost. Future work: kNN queries; objects distribution balancing on servers.

39 SD-Rtree Thank You for Your Attention Questions: