1 R-Trees for Spatial Indexing Yanlei Diao UMass Amherst Feb 27, 2007 Some Slide Content Courtesy of J.M. Hellerstein.



2 Spatial Indexing
- Applications: whenever we need to deal with n-dimensional objects
  - Geographic Information Systems (GIS)
  - Very Large Scale Integration (VLSI) circuit design
- Query workload: equality and range predicates in the n-dimensional space
  - Find points in a polygon
  - Find polygons in a polygon
  - Find polygons overlapping with a polygon
  - Find polygons containing a polygon

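3 R-tree: A Spatial Index
- Like B-tree
  - Height-balanced under arbitrary inserts/deletes.
  - Minimum 50% occupancy (except for the root).
- Unlike B-tree
  - In a leaf node, a data entry has the form (n-dimensional box, rid), where the box is the minimum bounding rectangle (MBR) of the data object.
  - In a non-leaf node, an index entry has the form (n-dimensional box, pointer to child node), where the box covers all boxes in the child node.
  - No particular order among data entries at the leaf level.
- Why rectangles? Why not use a more precise description?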
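The node layout can be sketched as follows. This is a minimal illustration, assuming 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples; the names `Entry`, `Node`, `union`, and `node_mbr` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), 2-D for brevity


@dataclass
class Entry:
    mbr: Rect
    rid: Optional[int] = None        # set only in leaf entries
    child: Optional["Node"] = None   # set only in non-leaf entries


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def node_mbr(node: Node) -> Rect:
    """Smallest rectangle covering every entry in the node."""
    box = node.entries[0].mbr
    for e in node.entries[1:]:
        box = union(box, e.mbr)
    return box


# A leaf with two data entries, and a root whose index entry covers that leaf.
leaf = Node(is_leaf=True, entries=[Entry((0, 0, 2, 2), rid=1), Entry((1, 1, 4, 3), rid=2)])
root = Node(is_leaf=False, entries=[Entry(node_mbr(leaf), child=leaf)])
print(root.entries[0].mbr)  # (0, 0, 4, 3)
```

Note that the two leaf entries overlap each other, which is legal: nothing in the structure orders or separates entries within a node.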

4 An Example R-tree
- Note the containment and overlap between objects!

5 Search
- Given an n-dimensional search rectangle S, find all data entries that overlap with S.
- At each non-leaf node, find all MBRs overlapping with S and follow their child pointers.
- At each leaf node, find all MBRs overlapping with S and retrieve the corresponding data records.
- For each data record, do a detailed comparison against the exact geometry, since an MBR match is only a candidate.
- May need to traverse multiple paths! (Depth-first search.)
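The search procedure above can be sketched as a short recursive function. This is an illustrative sketch only: nodes are modeled as hypothetical `(kind, entries)` tuples and rectangles as `(xmin, ymin, xmax, ymax)`, neither of which is prescribed by the slides.

```python
def overlaps(a, b):
    """True if rectangles (xmin, ymin, xmax, ymax) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, s, out=None):
    """Depth-first search for all leaf entries whose MBR overlaps s.

    node is ("leaf", [(mbr, rid), ...]) or ("inner", [(mbr, child), ...]).
    """
    if out is None:
        out = []
    kind, entries = node
    for mbr, payload in entries:
        if overlaps(mbr, s):
            if kind == "leaf":
                out.append(payload)      # candidate only; exact geometry check comes later
            else:
                search(payload, s, out)  # several children may qualify
    return out


leaf1 = ("leaf", [((0, 0, 2, 2), "a"), ((3, 3, 4, 4), "b")])
leaf2 = ("leaf", [((1, 1, 5, 5), "c"), ((8, 8, 9, 9), "d")])
root = ("inner", [((0, 0, 4, 4), leaf1), ((1, 1, 9, 9), leaf2)])

print(search(root, (1.5, 1.5, 3.5, 3.5)))  # ['a', 'b', 'c']
```

Both root entries overlap the query rectangle here, so both subtrees are visited. That is the multi-path behavior that distinguishes R-tree search from B-tree search.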

6 Insert
- Insert a data entry E:
  - Step 1: Invoke ChooseSubtree to descend from the root and select a leaf node L for insertion.
  - Step 2: Insert E into L. If L has room, simply add E. Otherwise, invoke SplitNode to distribute the entries of L plus E between L and a new node LL.
  - Step 3: Invoke AdjustTree to propagate changes upward, passing L (and LL if a split occurred). Adjust covering rectangles along the path bottom-up. Propagate node splits as necessary.

7 ChooseSubtree
- ChooseSubtree: descend from the root to select a leaf node for placing a new data entry E.
- At each level of the tree, pick the entry whose MBR needs the least enlargement to include E. Ties are broken by picking the MBR with the smallest area.
- Heuristic-based algorithm. Other alternatives exist.
- More on this when discussing the R*-tree.
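The per-level choice can be sketched as below, assuming 2-D rectangles as `(xmin, ymin, xmax, ymax)` tuples. The helper names `area`, `union`, `enlargement`, and `choose_entry` are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr, new):
    """Extra area mbr needs in order to cover new."""
    return area(union(mbr, new)) - area(mbr)


def choose_entry(mbrs, new):
    """Index of the MBR needing least enlargement; ties go to the smaller area."""
    return min(range(len(mbrs)), key=lambda i: (enlargement(mbrs[i], new), area(mbrs[i])))


mbrs = [(0, 0, 4, 4), (5, 0, 7, 2), (5, 5, 6, 6)]
print(choose_entry(mbrs, (6, 1, 8, 3)))                      # 1 (enlargements are 16, 5, 14)
print(choose_entry([(0, 0, 4, 4), (1, 1, 2, 2)], (1, 1, 2, 2)))  # 1 (both need 0; smaller area wins)
```

Applying `choose_entry` at each level from the root down yields the leaf that receives the new entry.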

8 SplitNode
- SplitNode: split M+1 entries (M from the node L, plus the new entry E) between L and a new node LL.
- Exhaustive: enumerate all legal ways to divide the M+1 entries into two groups, respecting the minimum occupancy requirement, and keep the best one.
- Guaranteed to find the optimal split, but at exponential cost!
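A brute-force split can be sketched as follows. This is an illustration under stated assumptions: the cost being minimized is the total area of the two covering rectangles, `m` is the minimum number of entries per node, and the function names are hypothetical.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def cover(rects):
    """Smallest rectangle covering all given (xmin, ymin, xmax, ymax) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def exhaustive_split(rects, m):
    """Try every two-way partition with at least m entries per group.

    Returns (total_area, group1_indices, group2_indices). Entry 0 is pinned
    to group 1 so that each partition is examined only once.
    """
    n = len(rects)
    rest = range(1, n)
    best = None
    for size in range(m, n - m + 1):               # size of group 1, including entry 0
        for combo in combinations(rest, size - 1):
            g1 = (0,) + combo
            g2 = tuple(i for i in rest if i not in combo)
            cost = (area(cover([rects[i] for i in g1])) +
                    area(cover([rects[i] for i in g2])))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
print(exhaustive_split(rects, m=2))  # (6, (0, 1, 4), (2, 3))
```

The number of partitions examined grows exponentially with the node capacity, which is why the cheaper heuristics on the following slides are used in practice.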

9 SplitNode (Contd.)
- SplitNode: split the entries in node L plus the new entry E between L and a new node LL.
- Quadratic: sacrifice optimality for performance.
  - Pick seeds: for each pair of entries (Ei, Ej), compute the area wasted if the two were placed in one group, i.e., the area of their combined MBR minus the areas of Ei and Ej. Pick the pair with the largest wasted area as the seeds for the two groups.
  - Split: greedily add the other entries to the two groups. At each step pick the entry E' with the maximum difference in area increase between the two groups, and add E' to the group whose area increases less (ties resolved by picking the group with fewer entries).
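The quadratic heuristic can be sketched as below, assuming 2-D rectangles as `(xmin, ymin, xmax, ymax)` and a minimum fill `m`. The function names are hypothetical, and the rule that hands all remaining entries to a group that would otherwise fall below `m` follows Guttman's original description rather than the slide.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Pair of entries whose combined box wastes the most area."""
    def waste(i, j):
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=lambda p: waste(*p))


def quadratic_split(rects, m):
    i, j = pick_seeds(rects)
    groups, boxes = [[i], [j]], [rects[i], rects[j]]
    remaining = [k for k in range(len(rects)) if k not in (i, j)]

    def assign(k, g):
        groups[g].append(k)
        boxes[g] = union(boxes[g], rects[k])
        remaining.remove(k)

    def growth(k, g):
        """Area increase of group g if entry k were added to it."""
        return area(union(boxes[g], rects[k])) - area(boxes[g])

    while remaining:
        # A group that needs every remaining entry to reach m gets all of them.
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= m]
        if short:
            for k in list(remaining):
                assign(k, short[0])
            break
        # Next entry: the one with the strongest preference for one group.
        k = max(remaining, key=lambda k: abs(growth(k, 0) - growth(k, 1)))
        d0, d1 = growth(k, 0), growth(k, 1)
        if d0 != d1:
            g = 0 if d0 < d1 else 1
        else:
            g = 0 if len(groups[0]) <= len(groups[1]) else 1  # tie: fewer entries
        assign(k, g)
    return groups, boxes


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
groups, boxes = quadratic_split(rects, m=2)
print(groups)  # [[0, 1, 4], [3, 2]]
print(boxes)   # [(0, 0, 2, 2), (8, 8, 10, 9)]
```

Seed selection examines all pairs of entries and each distribution step rescans the remaining entries, which is where the quadratic cost in the node capacity comes from.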

10 SplitNode (Contd.)
- SplitNode: split the entries in node L plus the new entry E between L and a new node LL.
- Linear: like quadratic, but pick seeds differently.
  - For each dimension, find the entry whose rectangle has the highest low side and the entry whose rectangle has the lowest high side. Take the separation between them and normalize it by the extent of the whole entry set along that dimension.
  - Among all dimensions, pick the one with the highest normalized separation. Treat the two extreme entries along that dimension as the seeds.
- In practice, some systems (e.g., Postgres) seem to use the 50% minimum occupancy and quadratic split.
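Linear seed picking can be sketched as follows. This sketch follows Guttman's formulation (highest low side versus lowest high side, normalized by the extent of the entry set along each dimension). Rectangles are `(xmin, ymin, xmax, ymax)`; the function name and the degenerate-case fallback are assumptions of this sketch.

```python
def linear_pick_seeds(rects):
    """Return indices of two seed entries, scanning each dimension once."""
    dims = len(rects[0]) // 2
    best = None
    for d in range(dims):
        lo, hi = d, d + dims                       # tuple positions of low/high side
        high_low = max(range(len(rects)), key=lambda i: rects[i][lo])
        low_high = min(range(len(rects)), key=lambda i: rects[i][hi])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        if high_low == low_high or width == 0:
            continue                               # this dimension cannot separate two entries
        separation = (rects[high_low][lo] - rects[low_high][hi]) / width
        if best is None or separation > best[0]:
            best = (separation, low_high, high_low)
    if best is None:
        return 0, 1                                # fallback for degenerate input (assumption)
    return best[1], best[2]


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
print(linear_pick_seeds(rects))  # (0, 3): x-separation 8/10 beats y-separation 7/9
```

Each dimension is scanned once, so the cost is linear in the number of entries, in contrast to the all-pairs comparison of the quadratic variant.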

11 Delete
- Remove a data entry E:
  - Step 1: Invoke FindLeaf to descend from the root and locate the leaf node L containing E. Like search, this may traverse multiple paths, but we expect just one match.
  - Step 2: Delete E from L. Invoke CondenseTree, passing L, to adjust the tree.

12 CondenseTree
- Start with a leaf node L from which a data entry has been deleted, and an empty set Q of orphaned entries.
  - Step 1: Bottom-up adjustment. At each level, do: if L has too few entries, delete L and its index entry in the parent node P, and add L's entries to Q for reinsertion. Otherwise, adjust L's covering rectangle in P. Set L := P, and repeat.
  - Step 2: Insert the orphaned entries in Q. Use the Insert algorithm, but entries (including non-leaf ones) must be inserted at the right level!
- Reinsertion incrementally refines the tree structure!

13 R*-tree
- Parameters of retrieval performance:
  - Area covered by an MBR should be minimized; more precisely, the dead space covered by the MBR but not by its enclosed rectangles.
  - Overlap between MBRs should be minimized.
  - Margin of an MBR, i.e., the sum of the lengths of its edges, should be minimized. This favors square-like ("quadratic") MBRs, which benefits queries with square-like rectangles and improves the overall structure.
  - Storage utilization should be optimized.
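The three geometric quantities can be sketched for the 2-D case as below, with rectangles as `(xmin, ymin, xmax, ymax)`. The function names are hypothetical, and the margin is taken as the perimeter, per the "sum of the lengths of edges" definition above.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    """Sum of the edge lengths of a 2-D rectangle (its perimeter)."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    """Area of the intersection of a and b, or 0 if they are disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


square, strip = (0, 0, 2, 2), (0, 0, 4, 1)
print(area(square), margin(square))         # 4 8
print(area(strip), margin(strip))           # 4 10
print(overlap((0, 0, 2, 2), (1, 1, 3, 3)))  # 1
```

The square and the strip cover the same area, but the square has the smaller margin. For a fixed area, minimizing margin pushes MBRs toward square-like shapes.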

14 Issues with R-tree
- Issues with R-tree:
  - SplitNode is designed to minimize the area of the covering rectangles of the two nodes after the split.
  - This can cause problems with respect to the other parameters.
- [Figure: a node to be split, with panels labeled "Low utilization", "Too much overlap", "Area increase", and "Good split!"]

15 Insert in R*-tree
- ChooseSubtree: minimize different badness metrics depending on tree level:
  - area cost at interior nodes, overlap cost at leaf nodes. Why so?
  - Interesting idea, but performance is only slightly better than R-tree.
  - Can have some improvement in CPU cost, but the benefit is not demonstrated.

16 Insert in R*-tree (Contd.)
- SplitNode: O(N M log M) for N dimensions, and more effective.
  - For each dimension, sort the entries by the lower value and then by the upper value of their rectangles: O(M log M) per sort.
  - For each sort, create M-2m+2 distributions of the entries. In the k-th distribution, Group 1 contains the first (m-1)+k entries and Group 2 contains the remaining entries. Requiring each group to hold between m and M-m+1 entries implies 1 ≤ k ≤ M-2m+2.
  - For each distribution, compute goodness values: (a) area-value, (b) margin-value, (c) overlap-value.
  - Select the best axis, then choose the best distribution along it.
  - The algorithm here uses (b) to choose the split axis, (c) to pick a distribution along that axis, and (a) to break ties.
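The split procedure can be sketched as follows. This is a minimal sketch under stated assumptions: rectangles are `(xmin, ymin, xmax, ymax)`, `m` is the minimum fill, the input holds the M+1 entries of the overflowing node, and all function names are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def rstar_split(rects, m):
    """Return (axis, group1_indices, group2_indices) for the M+1 entries in rects."""
    n = len(rects)                                   # n = M + 1
    dims = len(rects[0]) // 2

    def cover(idx):
        return (min(rects[i][0] for i in idx), min(rects[i][1] for i in idx),
                max(rects[i][2] for i in idx), max(rects[i][3] for i in idx))

    def distributions(order):
        for k in range(1, n - 2 * m + 2):            # k = 1 .. M-2m+2
            cut = m - 1 + k
            yield order[:cut], order[cut:]

    # (b) Split axis: smallest sum of margins over all distributions of both sorts.
    best_sum, best_axis, best_orders = None, None, None
    for d in range(dims):
        orders = [sorted(range(n), key=lambda i: rects[i][d]),          # by lower value
                  sorted(range(n), key=lambda i: rects[i][d + dims])]   # by upper value
        s = sum(margin(cover(g1)) + margin(cover(g2))
                for o in orders for g1, g2 in distributions(o))
        if best_sum is None or s < best_sum:
            best_sum, best_axis, best_orders = s, d, orders

    # (c) Distribution: least overlap between the groups; (a) total area breaks ties.
    candidates = [(overlap(cover(g1), cover(g2)), area(cover(g1)) + area(cover(g2)), g1, g2)
                  for o in best_orders for g1, g2 in distributions(o)]
    _, _, g1, g2 = min(candidates, key=lambda t: t[:2])
    return best_axis, g1, g2


rects = [(0, 3, 1, 4), (2, 0, 3, 1), (4, 4, 5, 5), (6, 1, 7, 2), (8, 2, 9, 3)]
print(rstar_split(rects, m=2))  # (0, [0, 1, 2], [3, 4])
```

In this example the margin sum is 124 along x and 144 along y, so x is chosen. Both x-distributions have zero overlap, and the one with total area 31 beats the one with total area 32.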

17 Forced Reinserts
- Structure varies with the order of insertion.
  - A structure determined by earlier inserts may not suit good retrieval performance for the current data.
  - Splits only cause local reorganization.
- So, force entries to be reinserted.
  - Used in R-tree to deal with deletion.
  - Used in R*-tree when a node overflows: OverflowTreatment takes the place of an immediate node split.
  - Can reinsert entries of both leaf and interior nodes.
- Reinsert only if this is the first call of OverflowTreatment at the given level during the current insertion; otherwise split.
  - Why? A reinserted entry can land in the same node that needed treatment. There is no point in reinserting again in that case, so split instead.
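Choosing which entries to reinsert can be sketched as below. This is an illustrative sketch: "most extreme" is taken to mean largest distance between an entry's center and the center of the node's MBR, as in the R*-tree paper. The 30% default, the rounding rule, and the function names are assumptions of this sketch.

```python
import math


def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_reinsert(rects, fraction=0.3):
    """Split entry indices into (to_reinsert, to_keep), farthest from the node center first."""
    box = (min(r[0] for r in rects), min(r[1] for r in rects),
           max(r[2] for r in rects), max(r[3] for r in rects))
    cx, cy = center(box)

    def dist(i):
        x, y = center(rects[i])
        return math.hypot(x - cx, y - cy)

    p = max(1, int(fraction * len(rects)))           # reinsert at least one entry
    order = sorted(range(len(rects)), key=dist, reverse=True)
    return order[:p], order[p:]


rects = [(0, 0, 1, 1), (4, 4, 5, 5), (5, 4, 6, 5), (4, 5, 5, 6), (9, 9, 11, 11)]
reinsert, keep = pick_reinsert(rects)
print(reinsert)  # [0]
```

The removed entries are then fed back through the normal insert path from the root, which gives them a chance to land in a better-fitting node than the one that overflowed.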

18 Comments on Forced Reinserts
- Benefits:
  - decreases overlap between siblings
  - improves storage utilization
  - splits occur less often: does this cause the better utilization?
  - shapes tend to be more square-like ("quadratic"): they pack better and generate smaller parents
- Experimental results show the best performance when reinserting the 30% most "extreme" entries.

19 Comments on R*-tree
- Popular, because it outperforms R-tree on search.
- Problems:
  - Concurrency problems because of reinsertion.
  - Heuristics-based, with no formal analysis. Benefits are shown empirically, but may vary with the data sets used.
  - Not clear how the different heuristics/algorithms contribute to the results; some sort of cost decomposition is needed.