R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984. Shahram Ghandeharizadeh Computer Science Department University of.

Presentation transcript:

R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984
Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

Motivating Example
Type in your street address in Google.

Example (Cont…)
Show me all the pizza places close by.

Terminology
The example query is termed a spatial query.
An R-tree is a spatial index structure.
- K-D-B trees are useful for point data only.
  - Exact-point lookup!
  - Show me the USC Salvatori Computer Science building.
- An R-tree represents data objects by intervals in several dimensions.
  - Exact-point and range lookups!
  - Show me all pizza places within a 2-mile radius of the USC Salvatori Computer Science building.
An R-tree is:
- A height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.
- A node is a disk page.
- Each tuple is assumed to have a unique identifier, RID.

R-Tree: Leaf Nodes
Leaf nodes contain index records of the form (I, tuple-identifier):
- tuple-identifier is the RID.
- I is an n-dimensional rectangle that bounds the indexed spatial object.
- I = (I_0, I_1, …, I_{n-1}), where n is the number of dimensions.
- I_i is a closed bounded interval [a, b] describing the extent of the object along dimension i.
- Values for a and b might be infinity, indicating an unbounded object along dimension i.

R-Tree: Non-leaf nodes
Non-leaf nodes contain entries of the form (I, child-pointer):
- Child-pointer is the address of a lower node in the R-tree.
- I covers all rectangles in the lower node's entries.
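
The two entry forms can be sketched as a small in-memory data structure. This is a minimal illustration, not Guttman's implementation: the names `Rect`, `Entry`, `Node`, and `cover` are assumptions made for the example, and the child-pointer is a Python object reference standing in for a disk page address.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A rectangle I = (I_0, ..., I_{n-1}); each I_i is a closed interval (a, b).
Rect = Tuple[Tuple[float, float], ...]


def cover(rects: List[Rect]) -> Rect:
    """Smallest rectangle that spatially contains every rectangle in rects."""
    dims = range(len(rects[0]))
    return tuple(
        (min(r[i][0] for r in rects), max(r[i][1] for r in rects)) for i in dims
    )


@dataclass
class Entry:
    rect: Rect                      # I
    rid: Optional[int] = None       # tuple-identifier (leaf entries only)
    child: Optional["Node"] = None  # child-pointer (non-leaf entries only)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


# A leaf with two index records, and the non-leaf entry that points to it.
leaf = Node(is_leaf=True, entries=[
    Entry(rect=((0.0, 2.0), (0.0, 1.0)), rid=101),
    Entry(rect=((1.0, 4.0), (3.0, 5.0)), rid=102),
])
parent_entry = Entry(rect=cover([e.rect for e in leaf.entries]), child=leaf)
```

Here `parent_entry.rect` is the rectangle that covers both leaf rectangles, which is the property the non-leaf entry's I must satisfy.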

R-Tree: A 2-D (n=2) Example

R-Tree: Non-leaf nodes (Questions?)
What is the child-pointer? A disk page address!
How about I? What is it? An n-dimensional rectangle: I = (I_0, I_1, …, I_{n-1})

R-tree: Properties
Assume:
1. M = maximum number of entries in a node.
2. m <= M/2 is the minimum number of entries in a node.
3. N = number of index records.
An R-tree has the following properties:
- Every leaf node contains between m and M index records. The root node is the exception.
- For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
- Every non-leaf node has between m and M children. The root node is the exception.
- For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
- The root node has at least two children unless it is a leaf.
- All leaves appear on the same level.
- The height of the tree is at most ceiling(log_m N) - 1.
- Worst-case utilization for all nodes except the root is m/M.
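
As a worked example of the height bound and the worst-case utilization, the sketch below evaluates ceiling(log_m N) - 1 with integer arithmetic. The function name `max_height` and the sample values of M, m, and N are assumptions chosen for illustration only.

```python
def max_height(m: int, n_records: int) -> int:
    """Upper bound ceiling(log_m N) - 1, computed without floating point."""
    levels, capacity = 0, 1
    while capacity < n_records:
        capacity *= m       # capacity == m ** levels after each step
        levels += 1
    return levels - 1       # levels == ceiling(log_m N)


M, m, N = 50, 25, 1_000_000   # illustrative values, not from the slides
height_bound = max_height(m, N)
worst_case_utilization = m / M
```

With these values the tree is at most 4 levels tall and every non-root node is at least half full.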

Searching
Descend from root to leaf in a B+-tree manner.
If multiple sub-trees contain the point of interest, then follow all of them.
Assume:
- EI denotes the rectangle part of an index entry E,
- Ep denotes the tuple-identifier or child-pointer.
Search (T: root of the R-tree, S: search rectangle)
- If T is not a leaf, check each entry E to determine whether EI overlaps S. For all overlapping entries, invoke Search(Ep, S).
- If T is a leaf, check all entries E to determine whether EI overlaps S. If so, E is a qualifying record.
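
The Search procedure can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: `Rect`, `Entry`, `Node`, `overlaps`, and `search` are names invented for the example, and child-pointers are object references rather than disk page addresses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]  # one closed interval (a, b) per dimension


@dataclass
class Entry:
    rect: Rect                      # EI
    rid: Optional[int] = None       # Ep in a leaf: tuple-identifier
    child: Optional["Node"] = None  # Ep in a non-leaf: child-pointer


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the closed intervals intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def search(t: Node, s: Rect) -> List[int]:
    """Return the RIDs of all leaf entries whose rectangle overlaps s."""
    hits: List[int] = []
    for e in t.entries:
        if overlaps(e.rect, s):
            if t.is_leaf:
                hits.append(e.rid)                # qualifying record
            else:
                hits.extend(search(e.child, s))   # follow every overlapping subtree
    return hits


leaf1 = Node(True, [Entry(((0.0, 1.0), (0.0, 1.0)), rid=1),
                    Entry(((2.0, 3.0), (2.0, 3.0)), rid=2)])
leaf2 = Node(True, [Entry(((5.0, 6.0), (5.0, 6.0)), rid=3),
                    Entry(((2.5, 5.5), (0.0, 1.0)), rid=4)])
root = Node(False, [Entry(((0.0, 3.0), (0.0, 3.0)), child=leaf1),
                    Entry(((2.5, 6.0), (0.0, 6.0)), child=leaf2)])

result = search(root, ((2.8, 4.0), (0.5, 2.5)))
```

The query rectangle overlaps both root entries, so both subtrees are visited; this is the case where an R-tree search follows more than one path.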

Insertion
Similar to B-trees, new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree.
Insert (T: root of the R-tree, E: new index entry)
1. Find position for new record: Invoke ChooseLeaf to select a leaf node L in which to place E.
2. Add record to leaf node: If L has room for E, then insert E and return. Otherwise, invoke SplitNode to obtain L and LL containing E and all the old entries of L.
3. Propagate changes upwards: Invoke AdjustTree on L, also passing LL if a split was performed.
4. Grow tree taller: If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.

Insertion: ChooseLeaf
ChooseLeaf (E: new index entry)
1. Initialize: Set N to be the root node.
2. Leaf check: If N is a leaf, return N.
3. Choose subtree: Let F be the entry in N whose rectangle FI needs the least enlargement to include E. Resolve ties by choosing the entry with the rectangle of smallest area.
4. Descend until a leaf is reached: Set N to be the child node pointed to by Fp and repeat from step 2.
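
The four ChooseLeaf steps can be sketched as a loop. This is an illustrative in-memory version; the names `area`, `union`, and `choose_leaf`, and the tiny two-leaf tree, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


@dataclass
class Entry:
    rect: Rect
    rid: Optional[int] = None
    child: Optional["Node"] = None


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def area(r: Rect) -> float:
    return prod(hi - lo for lo, hi in r)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle containing both a and b."""
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))


def choose_leaf(root: Node, e_rect: Rect) -> Node:
    n = root                                   # step 1: start at the root
    while not n.is_leaf:                       # step 2: stop at a leaf
        # Step 3: least enlargement first, then smallest area as tie-breaker.
        f = min(
            n.entries,
            key=lambda f: (area(union(f.rect, e_rect)) - area(f.rect), area(f.rect)),
        )
        n = f.child                            # step 4: descend
    return n


leaf_a = Node(is_leaf=True)
leaf_b = Node(is_leaf=True)
root = Node(is_leaf=False, entries=[
    Entry(((0.0, 2.0), (0.0, 2.0)), child=leaf_a),
    Entry(((5.0, 9.0), (5.0, 9.0)), child=leaf_b),
])
```

A new rectangle that already lies inside the second entry needs zero enlargement there, so `choose_leaf` descends to `leaf_b`; one near the origin goes to `leaf_a`.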

SplitNode: Node Splitting
A full node contains M entries. Divide the collection of M+1 entries between 2 nodes.
Objective: Make it as unlikely as possible for the resulting two new nodes to be examined on subsequent searches.
Heuristic: The total area of the two covering rectangles after a split should be minimized.
[Figure: an example split for which the total area is larger.]

Node Splitting: How?
How to find the minimum-area node split?
1. Exhaustive algorithm,
2. Quadratic-cost algorithm,
3. Linear-cost algorithm.

Exhaustive Algorithm
Generate all possible groupings and choose the one with minimum area.
Number of possibilities ~ 2^(M-1)
- M ~ 50
- Number of possibilities ~ 2^49, about 563 trillion (roughly 600 trillion)
- US deficit pales!
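
The estimate can be checked with a line of arithmetic. The variable names below are chosen for illustration.

```python
M = 50                      # entries per node, as on the slide
groupings = 2 ** (M - 1)    # approximate number of candidate splits
in_trillions = groupings / 10 ** 12
```

The result is 562,949,953,421,312 groupings, far too many to enumerate on every node overflow, which motivates the cheaper heuristics.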

Quadratic-Cost algorithm
A heuristic to find a small-area split.
Cost is quadratic in M and linear in the number of dimensions.
Pick two of the M+1 entries to be the first elements of the two new groups.
- Choose the pair that would waste the most area if both were put in the same group.
Assign the remaining entries to groups one at a time.
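
A minimal sketch of the quadratic split follows. The slide does not spell out how the remaining entries are chosen and assigned; the rules used here (pick the entry with the greatest difference in required enlargement, give it to the group needing less enlargement, and hand all leftovers to a group that would otherwise fall below m) follow the PickNext description in Guttman's paper as best I recall it. All function and variable names are assumptions made for the example.

```python
from itertools import combinations
from math import prod
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    return prod(hi - lo for lo, hi in r)


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))


def growth(cover: Rect, r: Rect) -> float:
    """Area increase needed for cover to include r."""
    return area(union(cover, r)) - area(cover)


def pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Pair that wastes the most area if placed in the same group."""
    def waste(p: Tuple[int, int]) -> float:
        a, b = rects[p[0]], rects[p[1]]
        return area(union(a, b)) - area(a) - area(b)
    return max(combinations(range(len(rects)), 2), key=waste)


def quadratic_split(rects: List[Rect], m: int) -> Tuple[List[int], List[int]]:
    """Split indices of rects into two groups of at least m entries each."""
    i, j = pick_seeds(rects)
    groups = [[i], [j]]
    covers = [rects[i], rects[j]]
    rest = [k for k in range(len(rects)) if k not in (i, j)]
    while rest:
        # If a group needs every remaining entry to reach m, assign them and stop.
        short = [g for g in (0, 1) if len(groups[g]) + len(rest) <= m]
        if short:
            groups[short[0]].extend(rest)
            break
        # PickNext: entry with the strongest preference for one group.
        k = max(rest, key=lambda k: abs(growth(covers[0], rects[k])
                                        - growth(covers[1], rects[k])))
        # Least enlargement, then smaller area, then fewer entries.
        g = min((0, 1), key=lambda g: (growth(covers[g], rects[k]),
                                       area(covers[g]), len(groups[g])))
        groups[g].append(k)
        covers[g] = union(covers[g], rects[k])
        rest.remove(k)
    return groups[0], groups[1]


rects = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 2.0), (0.0, 1.0)), ((0.0, 1.0), (1.0, 2.0)),
         ((8.0, 9.0), (8.0, 9.0)), ((9.0, 10.0), (8.0, 9.0))]
group1, group2 = quadratic_split(rects, m=2)
```

On this input the seeds are the two rectangles at opposite corners, and the remaining entries end up with the cluster they are closest to.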

Linear Cost Algorithm
Identical to Quadratic with the following differences:
- Uses a different version of PickSeeds.
- PickNext simply chooses any of the remaining entries.
Linear: Choose two objects that are furthest apart.
Quadratic: Choose two objects that create as much empty space as possible.
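
The linear version of PickSeeds can be sketched as one pass per dimension. The normalization by the width of the whole set follows Guttman's description as best I recall it; the function name, the fallback for degenerate inputs, and the sample rectangles are assumptions made for the example.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Pick the two entries that are furthest apart along some dimension."""
    best = None  # (normalized separation, index a, index b)
    idx = range(len(rects))
    for d in range(len(rects[0])):
        hi_low = max(idx, key=lambda k: rects[k][d][0])    # highest low side
        lo_high = min(idx, key=lambda k: rects[k][d][1])   # lowest high side
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        if hi_low == lo_high or width <= 0:
            continue  # assumption: skip dimensions that give no distinct pair
        sep = (rects[hi_low][d][0] - rects[lo_high][d][1]) / width
        if best is None or sep > best[0]:
            best = (sep, lo_high, hi_low)
    # Assumption: fall back to the first two entries if every dimension was skipped.
    return (best[1], best[2]) if best is not None else (0, 1)


rects = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 2.0), (0.0, 1.0)), ((0.0, 1.0), (1.0, 2.0)),
         ((8.0, 9.0), (8.0, 9.0)), ((9.0, 10.0), (8.0, 9.0))]
seeds = linear_pick_seeds(rects)
```

Each dimension is scanned a constant number of times, so the cost grows linearly with the number of entries, in contrast to the pairwise comparison used by the quadratic PickSeeds.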

Comparison
Linear node-split is simple, fast, and in practice about as good as quadratic!
- The quality of its splits is slightly worse, but search performance is not noticeably affected.


AdjustTree
Ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits.
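
A minimal sketch of AdjustTree is given below, assuming each node keeps a parent reference (an in-memory convenience, not something the slides specify). `split_in_half` is a deliberately naive placeholder for SplitNode, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


@dataclass(eq=False)
class Entry:
    rect: Rect
    rid: Optional[int] = None
    child: Optional["Node"] = None


@dataclass(eq=False)
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)
    parent: Optional["Node"] = None


def cover(node: Node) -> Rect:
    """Smallest rectangle containing all entry rectangles of node."""
    rects = [e.rect for e in node.entries]
    return tuple((min(r[i][0] for r in rects), max(r[i][1] for r in rects))
                 for i in range(len(rects[0])))


def split_in_half(p: Node) -> Node:
    """Placeholder for SplitNode: move the second half of p's entries to a new node."""
    half = len(p.entries) // 2
    pp = Node(is_leaf=p.is_leaf, entries=p.entries[half:])
    p.entries = p.entries[:half]
    for e in pp.entries:
        if e.child is not None:
            e.child.parent = pp
    return pp


def adjust_tree(n: Node, nn: Optional[Node], max_entries: int,
                split: Callable[[Node], Node] = split_in_half):
    """Ascend from n to the root; return (root, node split off the root or None)."""
    while n.parent is not None:
        p = n.parent
        en = next(e for e in p.entries if e.child is n)
        en.rect = cover(n)                      # tighten the covering rectangle
        pp = None
        if nn is not None:                      # propagate a split upward
            nn.parent = p
            p.entries.append(Entry(rect=cover(nn), child=nn))
            if len(p.entries) > max_entries:
                pp = split(p)
        n, nn = p, pp
    return n, nn


L1 = Node(True, [Entry(((0.0, 1.0), (0.0, 1.0)), rid=1)])
LL = Node(True, [Entry(((4.0, 5.0), (4.0, 5.0)), rid=2)])   # split off from L1
L2 = Node(True, [Entry(((8.0, 9.0), (8.0, 9.0)), rid=3)])
root = Node(False, [Entry(((0.0, 5.0), (0.0, 5.0)), child=L1),
                    Entry(cover(L2), child=L2)])
L1.parent = L2.parent = root
top, split_off = adjust_tree(L1, LL, max_entries=2)
```

In this run the root overflows when LL is added, so `adjust_tree` returns a second node; step 4 of Insert would then create a new root whose children are `top` and `split_off`.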

Deletes
Straightforward. The only complication is under-flows:
- An under-full node can be merged with whichever sibling will have its area increased least.
- Alternatively, the under-full node is eliminated and its orphaned entries are inserted back into the R-tree (the approach Guttman chose).

R-Tree

R-tree Variations
The R+-tree enhances retrieval performance by avoiding visits to multiple paths when processing point queries.
- No overlap between minimum bounding rectangles at the same level.
- A specific object's entry might be duplicated.
- Insertions might lead to a series of update operations in a chain reaction.
- Under certain circumstances, the structure may lead to a deadlock, e.g., every rectangle encloses a smaller one.

R*-tree [1990]
Node split is more sophisticated.
- Does not obey the limitation on the number of pairs per node.
- When a node overflows, p entries are extracted and reinserted in the tree (p might be 25%).
- Considers minimization of:
  - the overlap between minimum bounding rectangles at the same level,
  - the perimeter of the produced minimum bounding rectangles.
Insertion is more expensive while retrievals are faster.

Static R-trees
Assumes the dataset is known in advance.
Static R-trees are more efficient than dynamic ones:
- Tree structure is more compact,
- Contains fewer nodes,
- Overlap between minimum bounding rectangles is reduced.

Summary
The R-tree is a spatial index structure that provides competitive average performance.
Many different variations exist in the literature:
- Spatio-temporal access methods, 3-d R-tree.
- Historical R-trees and the Time-Parameterized R-tree for spatiotemporal applications.
R-trees have been used to speed up operations in OLAP applications, data warehouses, and data mining.