Performance Comparison of xBR-trees and R*-trees for Single Dataset Spatial Queries


Performance Comparison of xBR-trees and R*-trees for Single Dataset Spatial Queries
George Roumelis (Master in Information Systems, Open University of Cyprus, Cyprus)
Michael Vassilakopoulos (*) (Dept. of Computer Science and Biomedical Informatics, University of Central Greece, Greece)
Antonio Corral (Department of Languages and Computing, University of Almeria, Spain)
(*) speaker
Fifteenth East-European Conference on Advances in Databases and Information Systems, September 19th–23rd, 2011, Vienna, Austria

Outline
- Motivation
- Contribution and Background
- R-tree and R*-tree
- xBR-tree, Internal Nodes and Leaf Nodes
- Single Dataset query processing on xBR-trees
- Experimental Results: Settings, Tree Building, Single Dataset Queries
- Conclusions and Future Work

Motivation
- In applications, a variety of Spatial Queries arise,
  - involving one spatial dataset, like Point Location, Window, Distance Range, Nearest Neighbor Queries
  - involving two spatial datasets, like Distance Join, Closest Pair, All-Nearest Neighbor Queries
- Usually, there is no overall performance winner
- Many researchers have compared different Access Methods, regarding their I/O and execution time performance, for a variety of Spatial Queries
- In this work, we compare the popular R*-tree and the External Balanced Regular (xBR) tree for Single Dataset Spatial Queries

Contribution
We implement the xBR-tree and present conclusions arising from the (real data based) experimental comparison of xBR-trees and R*-trees regarding I/O Performance and Execution Time for
- Tree building
- Point Location Queries (PLQs)
- Window Queries (WQs)
- Distance Range Queries (DRQs)
- K-Nearest Neighbor Queries (K-NNQs)
- Constrained K-Nearest Neighbor Queries (CK-NNQs)

Background (1)
- Given an index I and a query point q, the PLQ returns true if q belongs to I and false otherwise
- Given an index I and a query rectangle r, the result of the WQ is the set of all points in I that are completely inside r
- Given an index I, a query point q and a distance threshold delta >= 0, the DRQ returns all points of I that are within distance delta of q (according to a distance function)
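The three definitions above can be pinned down with a brute-force reference over a plain list of points. This is only a sketch of the query semantics, not of how either tree answers them; the function names, the inclusive window boundary and the Euclidean distance are assumptions made for illustration.

```python
from math import hypot

def plq(points, q):
    """Point Location Query: is q one of the indexed points?"""
    return q in points

def wq(points, xmin, ymin, xmax, ymax):
    """Window Query: all points inside the rectangle (boundary included, by assumption)."""
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def drq(points, q, delta):
    """Distance Range Query: all points within Euclidean distance delta of q."""
    return [p for p in points if hypot(p[0] - q[0], p[1] - q[1]) <= delta]

points = [(1, 1), (2, 3), (5, 5)]
print(plq(points, (2, 3)))       # membership test
print(wq(points, 0, 0, 3, 3))    # points inside the window
print(drq(points, (0, 0), 2))    # points inside the query circle
```

An index such as the xBR-tree or the R*-tree must return the same answers while reading far fewer points.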

Background (2)
- Given an index I, a query point q, and a value K > 0, the K-NNQ returns the K points of I which are closest to q (according to a distance function)
- Given an index I, a query point q, a value K > 0 and a distance threshold delta >= 0, the CK-NNQ returns the K closest points of I which are within distance delta of q (according to a distance function)
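As with the previous queries, a brute-force sketch makes the two nearest-neighbor definitions concrete. The names `knnq` and `cknnq` and the Euclidean distance are assumptions; the sketch also assumes that a CK-NNQ simply returns fewer than K points when fewer than K lie within delta.

```python
import heapq
from math import hypot

def dist(p, q):
    """Euclidean distance (one possible choice of distance function)."""
    return hypot(p[0] - q[0], p[1] - q[1])

def knnq(points, q, k):
    """K-NNQ: the k points closest to q, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: dist(p, q))

def cknnq(points, q, k, delta):
    """CK-NNQ: the k closest points among those within distance delta of q."""
    within = [p for p in points if dist(p, q) <= delta]
    return knnq(within, q, k)

points = [(0, 1), (3, 0), (0, 5), (6, 8)]
print(knnq(points, (0, 0), 2))
print(cknnq(points, (0, 0), 3, 4))   # only two points lie within distance 4
```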

R-Tree
- Clusters of spatial objects can be recursively grouped into Minimum Bounding Rectangles (MBRs)
- Nested MBRs can be organized as a tree (R-tree)
[Figure: MBRs R1 to R12 drawn in the plane and as the corresponding R-tree, with the label "Nodes that contain points"]
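The recursive grouping can be illustrated with two small helpers: one computes the MBR of a cluster of points, the other the MBR enclosing several child MBRs. The tuple layout `(xmin, ymin, xmax, ymax)` and the function names are assumptions for this sketch.

```python
def mbr_of_points(points):
    """Smallest axis-aligned rectangle (xmin, ymin, xmax, ymax) covering the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_union(rects):
    """MBR of a group of MBRs, i.e. the rectangle stored in their parent entry."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

leaf_a = mbr_of_points([(1, 2), (3, 0), (2, 5)])
leaf_b = mbr_of_points([(4, 4), (6, 7)])
print(leaf_a, leaf_b, mbr_union([leaf_a, leaf_b]))
```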

R*-Tree
- The R*-tree is the most popular R-tree variation
- It added two major enhancements to the original R-tree, both applied when a node overflows
  - First, rather than just considering the area, the node-splitting algorithm in R*-trees also minimizes the perimeter and overlap enlargement of the MBRs
  - Second, an overflowed node is not split immediately, but a portion of the entries of the node is reinserted from the top of the R*-tree (forced reinsertion)
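The three quantities named in the first enhancement are simple functions of an MBR. The sketch below only computes them; it does not reproduce the R*-tree split algorithm itself, and the function names and the `(xmin, ymin, xmax, ymax)` layout are assumptions.

```python
def area(r):
    """Area of rectangle r = (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def perimeter(r):
    """Perimeter of r."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap_area(a, b):
    """Area of the intersection of a and b (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

print(area((0, 0, 2, 3)), perimeter((0, 0, 2, 3)))
print(overlap_area((0, 0, 2, 2), (1, 1, 3, 3)))
```

A split that keeps these values small tends to produce MBRs that are compact and rarely overlap, so fewer subtrees have to be visited per query.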

xBR-Tree
- xBR-trees can be defined for various dimensions
- For ease of exposition, we assume 2 dimensions
- For 2 dimensions, the hierarchical decomposition of space is that of Quadtrees
- The space indexed by an xBR-tree is a square, expressed in a coordinate system of real numbers
- The nodes of xBR-trees are disk pages and are of two kinds: leaves, which store the actual multidimensional points, and internal nodes, which provide a multiway indexing mechanism for these data

Internal Nodes
- Each node consists of a variable (non-predefined) number of entries; each entry holds an address, a pointer to a child, a Shape flag and a REG field
- An address is formed by directional digits (NW, NE, SW, SE), determines the region of a child node and is accompanied by the pointer to this child
- Shape is a flag that determines if the region of the child is a complete square (used widely in queries)
- REG stores the coordinates of the region referenced by the address (it is more expensive to calculate it)
- The region of a child is the subquadrant of its address minus the subquadrants of the next addresses in the same node

An xBR-tree example
- The region of the root is the original space (a quadrangle)
- The * symbol denotes the end of an address
- The address of the right child is 0*, since the region of this child is the NW quadrant of the original space
- The address of the left child is *: its region is the whole space minus the region of the right child
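The address scheme of this example can be sketched in a few lines. The slides only state that digit 0 denotes the NW quadrant; the coding 1 = NE, 2 = SW, 3 = SE, the y-axis pointing north, the omission of the terminating `*`, and the names `subquadrant` and `child_index` are all assumptions made for illustration. Because the region of an entry excludes the subquadrants of the entries that follow it, the sketch scans a node's entries from last to first.

```python
def subquadrant(address, x0, y0, side):
    """Square (x0, y0, side) reached by following the digits of address.

    (x0, y0) is the lower-left corner. Digits: 0 = NW (from the slides);
    1 = NE, 2 = SW, 3 = SE are assumed. The empty address is the whole space.
    """
    for d in address:
        side /= 2
        if d in "01":      # northern half: move the corner up
            y0 += side
        if d in "13":      # eastern half: move the corner right
            x0 += side
    return (x0, y0, side)

def contains(square, p):
    x0, y0, side = square
    return x0 <= p[0] < x0 + side and y0 <= p[1] < y0 + side

def child_index(addresses, p, space=(0.0, 0.0, 1.0)):
    """Index of the entry whose region holds p: later entries take precedence."""
    for i in range(len(addresses) - 1, -1, -1):
        if contains(subquadrant(addresses[i], *space), p):
            return i
    return None

root = ["", "0"]                        # left child "*", right child "0*"
print(subquadrant("0", 0.0, 0.0, 1.0))  # NW quadrant of the unit square
print(child_index(root, (0.1, 0.9)))    # falls in the NW quadrant
print(child_index(root, (0.7, 0.2)))    # falls in the rest of the space
```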

Processing PLQs and WQs
- Queries are processed in a top-down manner on the xBR-tree, as on the R*-tree
- During a PLQ for a specified point, the appropriate leaf is determined by descending the tree from the root (unlike the R*-tree, a single path is followed)
- During WQs, we examine whether the subquadrant of the current internal-node entry and the query window intersect and, if so, follow the pointer to the related child
- We repeat until we have examined all entries of the internal node, or until the query window is completely inscribed inside the region of the entry that we examine

Processing DRQs
- The DRQ follows the same strategy as the WQ
- At first, the query circle is replaced by its MBR (the calculations are faster in this way); if the subquadrant of the current entry intersects the query MBR, we follow the pointer to the related child at the next lower level
- If we reach a leaf with a region that intersects the query MBR, we select the points in the leaf that are inside the query circle
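The filter-and-refine idea on this slide can be sketched for a single leaf: the cheap rectangle test decides whether the leaf is visited at all, and the exact distance test is applied only to its points. Function names, the rectangle layout `(xmin, ymin, xmax, ymax)` and the Euclidean distance are assumptions.

```python
from math import hypot

def circle_mbr(q, delta):
    """MBR of the query circle centred at q with radius delta."""
    return (q[0] - delta, q[1] - delta, q[0] + delta, q[1] + delta)

def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def drq_leaf(leaf_region, leaf_points, q, delta):
    """Filter on the query MBR, then refine with the exact circle test."""
    if not intersects(leaf_region, circle_mbr(q, delta)):
        return []                      # leaf pruned without reading its points
    return [p for p in leaf_points
            if hypot(p[0] - q[0], p[1] - q[1]) <= delta]

leaf_points = [(1, 1), (2.5, 2.5), (3.5, 3.5)]
print(drq_leaf((0, 0, 4, 4), leaf_points, (2, 2), 1))
```

Note that (1, 1) lies inside the query MBR but outside the circle, which is why the refinement step is still needed.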

Processing K-NNQs
- The K-NNQ algorithm follows a depth-first (DF) tree traversal
- In an internal node, entries are visited according to their mindist from the query point
- The process is repeated recursively until the leaf level is reached, where a potential next NN is found
- It is possible to reach a leaf while the next NN exists in a neighboring region. Thus, we use a global max K-heap and insert in it every point of this leaf that is nearer to the query point than the root of the heap
- When the heap is full, according to a set of conditions*, the search is stopped
*Roumelis et al., Nearest Neighbor Algorithms Using xBR-Trees, PCI 2011
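Two ingredients of this traversal are easy to show in isolation: the mindist between the query point and a rectangular region, and the global max K-heap updated at each visited leaf. This is a sketch only; the stopping conditions of the cited paper are not reproduced, and the names `mindist`, `offer_leaf` and `heap_points` are assumptions. Python's `heapq` is a min-heap, so distances are stored negated to obtain max-heap behavior.

```python
import heapq
from math import hypot

def mindist(q, r):
    """Smallest distance from point q to rectangle r = (xmin, ymin, xmax, ymax)."""
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return hypot(dx, dy)

def offer_leaf(heap, k, leaf_points, q):
    """Update the global max K-heap with the points of one visited leaf."""
    for p in leaf_points:
        d = hypot(p[0] - q[0], p[1] - q[1])
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:              # nearer than the current K-th neighbor
            heapq.heapreplace(heap, (-d, p))

def heap_points(heap):
    """Points currently in the heap, nearest first."""
    return [p for _, p in sorted(heap, reverse=True)]

heap = []
offer_leaf(heap, 2, [(0, 3), (0, 1), (0, 2)], (0, 0))
offer_leaf(heap, 2, [(0, 0.5)], (0, 0))    # a neighboring leaf holds a nearer point
print(heap_points(heap))
print(mindist((0, 0), (3, 4, 5, 6)))
```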

Experimental Results: Experimental settings
- We used 5 real datasets of different sizes (CSN: line-segments, NApp: points, NAcl: 9203 points, Narr: line-segments, Nard: line-segments)
- To create 2d point datasets from non-point datasets, we used the centroids of the line-segment MBRs
- Environment used: Linux machine, Intel Core Duo 2x2GHz processor, 3 GB of RAM, gcc
- Performance measurements: I/O activity (page accesses) and Execution Time
- An extended set of experimental results is accessible from:

Tree Construction
- In all cases, the xBR-tree uses less space (i.e. it is more compact) and less time than the R*-tree (R*-tree creation is slower partly because of forced reinsertion, which improves searching efficiency)
- The difference in creation time grows as the node size increases

Point Location Query
- The xBR-tree needs fewer read accesses and executes every query faster than the R*-tree
- The xBR-tree needs a number of disk accesses equal to its tree height, while the R*-tree needs at least this number of accesses and, in most cases, even more

Window Query (1)
- The xBR-tree needs more page accesses
- As the node size increases, the I/O difference between the two trees becomes smaller
- In both trees, the number of accesses depends linearly on the node size
- This is due to the reduction of tree height as the node size increases

Window Query (2)
- Query windows that contained points (non-empty windows) were used
- The xBR-tree becomes clearly faster (execution time) for all node sizes, and the I/O efficiency of the two trees is closer
- Main memory processing is simpler (and thus faster) for the xBR-tree

Distance Range Query
- The xBR-tree needs fewer disk accesses and is faster than the R*-tree, in all cases and for all datasets
- The results were even better when the DRQs addressed only non-empty regions

K-Nearest Neighbor Query
- The xBR-tree needs more disk accesses than the R*-tree, but the difference gets smaller when the node size increases
- Regarding the execution time, the xBR-tree shows improved performance, in relation to its I/O difference from the R*-tree
- The worst time performance of both trees is for larger node sizes
- This is due to the fact that as the node size increases, the trees become very wide and very short

Constrained K-NN Query
- Studying the results of several CK-NNQ experiments, we see that the performance of the xBR-tree improves in both categories (I/O and execution time)
- Depending on the dataset, for non-empty regions the CK-NNQ time performance of the xBR-tree is almost the same as, or much better than, that of the R*-tree

Conclusions
- We performed an extensive (real data based) experimental comparison of the performance of xBR-trees and R*-trees, for spatial queries where a single index is involved
- The conclusions arising from this comparison show that the two structures are competitive
- The xBR-tree is smaller and is built faster than the R*-tree
- The performance of the xBR-tree is higher for PLQs and DRQs, and for WQs when the query window is non-empty
- The R*-tree is better for K-NNQs and needs fewer disk accesses for CK-NNQs
- The execution time winner for CK-NNQs depends on whether the query returns result points or not

Future Work
- To extend the xBR-tree for modeling empty regions too
- To use, in xBR-tree internal nodes, an address representation that facilitates main memory computation
- To study the relative performance of the two structures using memory-consuming Best-First (BF) algorithms
- To study the relative performance of the two trees for two-dataset (join) queries
- To study the relative performance of the two structures in the presence of buffering

Thank you for your attention