R-Trees: A Dynamic Index Structure for Spatial Data Antonin Guttman.

Antonin Guttman In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA.
Presentation transcript:


R-Tree: Why, What … ? Why do we need R-Trees? What are R-Trees? How do I perform operations? Alternatives? Why not a B+ tree?

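Properties of R-Trees
Height balanced: all leaves appear on the same level
2 types of nodes: internal nodes and leaves
Leaves point to disk pages
Records in the leaves point to actual data objects
Every node other than the root holds between m and M entries, where M is the max capacity and the min occupancy m is at most M/2
Completely dynamic: inserts and deletes can be mixed with searches, with no periodic reorganization
Guaranteed minimum fan-out of m (M/2 when m is set as high as allowed)
Every leaf record holds the smallest bounding box of its data object
Root has at least two children, unless it is a leaf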

R-Trees: The Structure
Internal nodes: (rectangle, child pointer)
– The rectangle is N-dimensional and covers all rectangles contained in the child node.
– The child pointer is the address of that child node.
Leaf nodes: (MBR, tuple-identifier)
– MBR is the minimum bounding rectangle of the data object.
– Tuple-identifier is a pointer to the data object.
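The node layout above can be sketched in a few lines of Python. This is a minimal illustration for the 2D case, not code from the paper: the names `Rect`, `Entry`, `Node`, `union` and `mbr` are assumptions made for this example, and rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed representation: a 2D rectangle as (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Entry:
    rect: Rect                        # bounding rectangle of the child node or data object
    child: Optional["Node"] = None    # set in internal-node entries
    tuple_id: Optional[int] = None    # set in leaf entries (pointer to the data object)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def mbr(self) -> Rect:
        """Minimum bounding rectangle of all entries in this node."""
        rect = self.entries[0].rect
        for entry in self.entries[1:]:
            rect = union(rect, entry.rect)
        return rect


# A leaf with two data rectangles, and a root entry that covers the leaf.
leaf = Node(is_leaf=True, entries=[Entry((0, 0, 2, 2), tuple_id=1),
                                   Entry((1, 1, 4, 3), tuple_id=2)])
root = Node(is_leaf=False, entries=[Entry(leaf.mbr(), child=leaf)])
```

The rectangle stored in the root entry is computed from the leaf, which mirrors the rule that an internal entry's rectangle covers everything in its child.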

R-tree of order 4

Example [figure, shown in four steps: R-tree with entries labeled a through p]

R-Trees: Operations
Inserts
Deletes
Updates (delete and re-insert)
Queries/Searches
– Names of all the roads in a 1 sq km area?
– Which buildings would be encountered between Roger’s Hall and Reitz Union?
– Give me all rectangles that are contained in the input rectangle.
– Give me all rectangles intersecting this rectangle.
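The last two query types can be sketched as one recursive search. This is an illustrative sketch under assumptions, not the paper's code: nodes are plain dicts with hypothetical keys `"leaf"` and `"entries"`, each entry is a `(rect, ref)` pair, and rectangles are `(xmin, ymin, xmax, ymax)` tuples.

```python
def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if inner lies entirely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def search(node, query, mode="intersect"):
    """Collect the refs of leaf entries that intersect (or are contained in) query."""
    hits = []
    for rect, ref in node["entries"]:
        if node["leaf"]:
            ok = intersects(rect, query) if mode == "intersect" else contains(query, rect)
            if ok:
                hits.append(ref)
        elif intersects(rect, query):
            # Only subtrees whose covering rectangle overlaps the query can hold answers.
            hits.extend(search(ref, query, mode))
    return hits


leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "c"), ((8, 8, 9, 9), "d")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 5, 9, 9), leaf2)]}

print(search(root, (0.5, 0.5, 5.5, 5.5)))                  # ['a', 'b', 'c']
print(search(root, (0.5, 0.5, 5.5, 5.5), mode="contain"))  # ['b']
```

Unlike a B+-tree lookup, the search may have to descend into several subtrees, because covering rectangles of sibling nodes can overlap.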

Insert
Similar to insertion into a B+-tree, but the new entry may go into any leaf; a leaf splits when its capacity is exceeded.
– Which leaf to insert into? (Choose Leaf)
– How to split a node? (Node Split)
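A sketch of the Choose Leaf step, following the rule described in Guttman's paper: at each level, descend into the entry whose rectangle needs the least enlargement to include the new rectangle, breaking ties by smallest area. The dict-based node layout (keys `"leaf"`, `"entries"`, `"name"`) and the helper names are assumptions for this example only.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new_rect):
    """Extra area rect would need in order to cover new_rect as well."""
    return area(union(rect, new_rect)) - area(rect)


def choose_leaf(node, new_rect):
    """Walk down from node to the leaf that should receive new_rect."""
    while not node["leaf"]:
        # Least enlargement first, smallest area as the tie-breaker.
        _, node = min(node["entries"],
                      key=lambda e: (enlargement(e[0], new_rect), area(e[0])))
    return node


leaf_a = {"leaf": True, "name": "A", "entries": [((0, 0, 4, 4), 1)]}
leaf_b = {"leaf": True, "name": "B", "entries": [((6, 6, 10, 10), 2)]}
root = {"leaf": False, "entries": [((0, 0, 4, 4), leaf_a), ((6, 6, 10, 10), leaf_b)]}

# Covering (5, 5, 6, 6) costs 20 units of extra area for A but only 9 for B.
print(choose_leaf(root, (5, 5, 6, 6))["name"])  # B
```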

Insert: Choose Leaf [figure, shown in steps: choosing a leaf among the rectangles m, n, o and p]

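Node Splitting
Quadratic method
– Select as seeds the pair of entries that would waste the most area if placed in the same node.
– Grow the two groups from the seeds, each time assigning the entry with the strongest preference for one group.
Linear method
– Select as seeds the pair of rectangles with the greatest normalized separation along any axis (x or y in 2D).
– Take the remaining rectangles in arbitrary order and add each one to the group whose rectangle needs the least enlargement.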
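The quadratic method can be sketched as follows. This is a simplified illustration, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples and that `m` is the minimum number of entries per node; the function and helper names are made up for this example, and the tie-breaking order (least enlargement, then smaller area, then fewer entries) follows the description in Guttman's paper.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new_rect):
    return area(union(rect, new_rect)) - area(rect)


def quadratic_split(rects, m):
    """Split an overfull list of rectangles into two groups of at least m each."""
    # Seeds: the pair that would waste the most area if kept together.
    i, j = max(combinations(range(len(rects)), 2),
               key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))
    groups = [[rects[i]], [rects[j]]]
    mbrs = [rects[i], rects[j]]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest:
        # If a group needs every remaining entry to reach m, hand them all over.
        short = next((g for g in (0, 1) if len(groups[g]) + len(rest) <= m), None)
        if short is not None:
            groups[short].extend(rest)
            break
        # Next entry: the one with the strongest preference for one group.
        nxt = max(rest, key=lambda r: abs(enlargement(mbrs[0], r)
                                          - enlargement(mbrs[1], r)))
        rest.remove(nxt)
        g = min((0, 1), key=lambda k: (enlargement(mbrs[k], nxt),
                                       area(mbrs[k]), len(groups[k])))
        groups[g].append(nxt)
        mbrs[g] = union(mbrs[g], nxt)
    return groups


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (9, 9, 10, 10), (8, 9, 9, 10)]
near, far = quadratic_split(rects, m=2)
print(near)  # [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2)]
print(far)   # [(9, 9, 10, 10), (8, 9, 9, 10)]
```

Seed selection compares every pair of entries, which is where the quadratic cost comes from; the linear method replaces only that step with a per-axis scan.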

Delete
Search for the rectangle.
If the rectangle is found, remove it from its leaf.
If the node is deficient (it now holds fewer than the minimum number of entries),
– Remove the node from its parent and put its remaining entries in a re-insert queue.
– Adjust the parent rectangle if needed.
– Continue this till you reach the root.
– Re-insert each queued entry at the level it came from, so that subtrees taken from internal nodes stay above the leaf level.
Adjust the rectangles, making them smaller.
Alternative: combine the deficient node with a sibling, as in a B-tree.
– But re-insertion shows similar performance and is simpler to implement.
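The delete steps can be sketched as below. This is a simplified illustration, not the paper's code: nodes are dicts with assumed keys `"leaf"` and `"entries"`, entries are `[rect, ref]` lists, `m` is the minimum occupancy (assumed to be at least 1), and the function only returns the re-insert queue. The re-insertion itself and shrinking a root that is left with a single child are omitted.

```python
def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def mbr(node):
    out = node["entries"][0][0]
    for rect, _ in node["entries"][1:]:
        out = union(out, rect)
    return out


def find_path(node, rect, ref, path=()):
    """Return the nodes from the root down to the leaf holding (rect, ref), or None."""
    path = path + (node,)
    for e in node["entries"]:
        if node["leaf"]:
            if e[0] == rect and e[1] == ref:
                return list(path)
        elif contains(e[0], rect):
            found = find_path(e[1], rect, ref, path)
            if found:
                return found
    return None


def delete(root, rect, ref, m):
    """Remove (rect, ref); return the queue of (height, entry) pairs to re-insert."""
    path = find_path(root, rect, ref)
    if path is None:
        return None
    leaf = path[-1]
    leaf["entries"] = [e for e in leaf["entries"]
                       if not (e[0] == rect and e[1] == ref)]
    orphans = []
    # Walk back up towards the root: (parent, node) pairs, lowest level first.
    pairs = zip(reversed(path[:-1]), reversed(path[1:]))
    for height, (parent, node) in enumerate(pairs):
        slot = next(e for e in parent["entries"] if e[1] is node)
        if len(node["entries"]) < m:
            # Deficient node: unlink it and queue its entries for re-insertion.
            parent["entries"] = [e for e in parent["entries"] if e is not slot]
            orphans.extend((height, e) for e in node["entries"])
        else:
            slot[0] = mbr(node)  # tighten the covering rectangle
    return orphans


leaf1 = {"leaf": True, "entries": [[(0, 0, 1, 1), "a"], [(2, 2, 3, 3), "b"]]}
leaf2 = {"leaf": True, "entries": [[(5, 5, 6, 6), "c"], [(7, 7, 8, 8), "d"],
                                   [(8, 8, 9, 9), "e"]]}
root = {"leaf": False, "entries": [[(0, 0, 3, 3), leaf1], [(5, 5, 9, 9), leaf2]]}

first = delete(root, (8, 8, 9, 9), "e", m=2)   # leaf2 stays legal, its rectangle shrinks
second = delete(root, (0, 0, 1, 1), "a", m=2)  # leaf1 becomes deficient and is unlinked
```

The `height` stored with each queued entry records the level it came from, which is what lets a full implementation re-insert subtrees above the leaf level.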

Performance Tests
R-Trees implemented in C under UNIX on a VAX 11/780 computer, run on 2D data (1057 rectangles) for 5 page sizes
– Linear node split was faster than quadratic, as expected.
– CPU time unchanged with page sizes, indicating that when one side became full, all split algorithms simply put everything else in the other side.
– Delete is affected by the fill factor.
– Search is insensitive to the fill factor and the split algorithm used.
– Storage space is a function of the fill factor, page size and split algorithm.
– All split algorithms came within 10% of the best result, obtained with the exhaustive split algorithm.

Performance: 2nd Innings
Same configuration, but on various data sizes: 1057, 2238, 3295 and 4559 rectangles.
– Low CPU cost, close to 150 microseconds.
– Comparable performance of the split algorithms.
– Most space was used by the leaf nodes.

Conclusions from the paper
R-Trees perform well for spatial data objects of non-zero size.
With smaller nodes, the structure can be used as an in-memory spatial data index.
– CPU performance of an in-memory R-tree index is comparable, and there is no I/O cost.
Linear split was almost as good as the others.
– It was fast.
– Node split quality was a bit off-target, but it did not hurt the search performance noticeably.
Possible use with abstract data types and abstract indexes to streamline handling of spatial data.