Trees for spatial indexing

Tree (data structure): Introduction
B-Tree, B+-Tree, B*-Tree
Spatial Access Method (SAM) vs Point Access Method (PAM)
Buddy-Tree, UB-Tree (8 slides)
R-Tree
X-Tree, TV-Tree

Pantheon Problem
200’000’000 points are stored in a database. Indexing them in a B-Tree is not sufficient. We want to optimize range queries. Which indexing method should we use? What is the best structure?

Pantheon

What kind of data structure?
The structure depends on the kind of data.
Point access method (PAM): a data structure and associated algorithms primarily to search for points defined in multidimensional space. Examples: k-d tree, quadtree, UB-tree, buddy tree.
Spatial access method (SAM): a data structure to search for lines, polygons, etc. Examples: D-tree, P-tree, R+-tree, R-tree, R*-tree.

Types of queries in spatial data
Here 'geometry' refers to a point, line, box or other two- or three-dimensional shape. The kinds of queries we need are:
Distance(geometry, geometry)
Equals(geometry, geometry)
Disjoint(geometry, geometry)
Intersects(geometry, geometry)
Touches(geometry, geometry)
Crosses(geometry, geometry)
Overlaps(geometry, geometry)
Contains(geometry, geometry)
Several other operations are performed on only one geometry, such as length, area and centroid.
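A minimal sketch of a few of these predicates, assuming every geometry is reduced to an axis-aligned 2D box. The `Box` type and the function names are illustrative and not taken from any particular spatial library; real lines and polygons need more involved tests.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box used only for this illustration."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    # Two boxes share at least one point iff they overlap on both axes.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def disjoint(a, b):
    return not intersects(a, b)


def contains(a, b):
    # True when b lies entirely inside a (borders included).
    return (a.xmin <= b.xmin and a.ymin <= b.ymin
            and b.xmax <= a.xmax and b.ymax <= a.ymax)


def distance(a, b):
    # Gap between the boxes along each axis (0 when they overlap on that axis).
    dx = max(b.xmin - a.xmax, a.xmin - b.xmax, 0.0)
    dy = max(b.ymin - a.ymax, a.ymin - b.ymax, 0.0)
    return math.hypot(dx, dy)
```

Box tests like these are cheap, which is why the index structures discussed later compare bounding rectangles before looking at the exact geometry.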

Introduction
Some definitions:
Node: a node may contain a value or a condition, or represent a separate data structure or a tree of its own. Each node in a tree has 0 or more child nodes. A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent.
Root node: the topmost node in a tree is called the root node. Being the topmost node, the root node has no parent. Every node in a tree can be seen as the root node of the subtree rooted at that node.
Leaf nodes: nodes at the bottommost level of the tree are called leaf nodes. Since they are at the bottommost level, they have no children.

Tree of the trees
[Figure: diagram relating the B-Tree, B+-Tree, B*-Tree, R-Tree, Buddy-Tree, UB-Tree, X-Tree and TV-Tree, split into Spatial Access Method (SAM) vs Point Access Method (PAM)]

Common Operations
Enumerating all the items
Searching for an item
Adding a new item at a certain position on the tree
Deleting an item
Removing a whole section of a tree (called pruning)
Adding a whole section to a tree (called grafting)
Finding the root for any node

B-Tree
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions and deletions in logarithmic amortized time. It is most commonly used in databases and filesystems. In a 2-3 B-tree (often simply called a 2-3 tree), each internal node has either 2 or 3 child nodes. Each internal node's elements act as separation values which divide its subtrees.
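To illustrate how the separation values guide a lookup, here is a small search sketch. The `BTreeNode` class and `search` function are hypothetical names for this example only; insertion and node splitting are left out.

```python
from bisect import bisect_left


class BTreeNode:
    """Hypothetical node: sorted keys plus one more child than keys (none for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, key):
    while True:
        # Position of the first key >= the searched key.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:      # reached a leaf without finding the key
            return False
        # keys[i-1] < key < keys[i], so the key can only be in child i.
        node = node.children[i]


# A 2-3 tree: the root has 2 keys and therefore 3 children.
root = BTreeNode([10, 20], [
    BTreeNode([3, 7]),
    BTreeNode([15]),
    BTreeNode([25, 30]),
])
```

Each step descends one level, so the number of nodes visited is bounded by the height of the tree.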

B+-Tree
A B+ tree is a variation on a B-tree. In a B+ tree, in contrast to a B-tree, all data is saved in the leaves. Internal nodes contain only keys and tree pointers. All leaves are at the same lowest level. Leaf nodes are also linked together as a linked list to make range queries easy.
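The benefit of the linked leaves can be sketched as follows. The `Leaf` class and `range_scan` function are illustrative names, and the sketch assumes the starting leaf has already been located by an ordinary root-to-leaf search for the lower bound.

```python
from bisect import bisect_left


class Leaf:
    """Hypothetical B+ tree leaf: sorted keys and a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def range_scan(leaf, lo, hi):
    """Collect all keys in [lo, hi], walking the leaf chain from the given leaf."""
    result = []
    while leaf is not None:
        start = bisect_left(leaf.keys, lo)   # skip keys below the lower bound
        for key in leaf.keys[start:]:
            if key > hi:                     # keys are sorted, so we can stop here
                return result
            result.append(key)
        leaf = leaf.next                     # follow the linked list, no tree descent
    return result


# Three leaves chained together.
first, second, third = Leaf([1, 4]), Leaf([7, 9]), Leaf([12, 15])
first.next, second.next = second, third
```

After the initial descent, the scan never revisits internal nodes; it only follows `next` pointers until it passes the upper bound.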

R-Tree
The R-Tree extends the B+-Tree to multidimensional data. Every non-leaf node contains entries of the form (cp, rectangle), where cp is the address of a child node and rectangle is the minimum bounding rectangle (MBR) of all rectangles in that child. Leaf nodes contain entries of the form (dataObject, rectangle). We use the term directory rectangle for the MBR of the underlying rectangles.
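As a small illustration of how a directory rectangle is derived, the sketch below computes the MBR of a leaf's rectangles. Rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples, and the `mbr` helper and the sample objects are made up for this example.

```python
def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of (xmin, ymin, xmax, ymax)."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


# A leaf holds (dataObject, rectangle) entries.
leaf = [("road", (0, 0, 2, 1)), ("lake", (1, -1, 3, 4))]

# The parent stores (cp, rectangle): a reference to the child and its directory rectangle.
directory_entry = (leaf, mbr([rect for _, rect in leaf]))
```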

R-Tree properties
Let M be the maximum number of entries that fit in one node and let m be a parameter specifying the minimum number of entries in a node (2 ≤ m ≤ M/2). An R-Tree satisfies the following properties:
The root has at least two children unless it is a leaf.
Every non-leaf node has between m and M children unless it is the root.
Every leaf node contains between m and M entries unless it is the root.
All leaves appear on the same level.
An R-tree is completely dynamic. It allows overlapping directory rectangles, so an exact match query may have to follow multiple paths.
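The effect of overlapping directory rectangles can be seen in a short search sketch. The `Node` class, the tuple layout `(xmin, ymin, xmax, ymax)` and the tiny hand-built tree are assumptions made for this illustration; node splitting and the m/M bounds are ignored.

```python
class Node:
    """Hypothetical R-tree node: a list of (rectangle, child-or-object) entries."""

    def __init__(self, entries, is_leaf):
        self.entries = entries
        self.is_leaf = is_leaf


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query, visited):
    """Return all objects whose rectangle intersects query; record visited leaves."""
    hits = []
    if node.is_leaf:
        visited.append(node)
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.is_leaf:
                hits.append(item)
            else:
                # Every intersecting directory rectangle must be followed.
                hits.extend(search(item, query, visited))
    return hits


leaf1 = Node([((0, 0, 2, 2), "a"), ((3, 3, 5, 5), "b")], is_leaf=True)
leaf2 = Node([((4, 4, 6, 6), "c"), ((7, 7, 8, 8), "d")], is_leaf=True)
# The two directory rectangles overlap in the square from (4, 4) to (5, 5).
root = Node([((0, 0, 5, 5), leaf1), ((4, 4, 8, 8), leaf2)], is_leaf=False)

visited = []
hits = search(root, (4, 4, 4.5, 4.5), visited)
```

Because the query falls into the overlap of both directory rectangles, both leaves are visited even though the query window is small.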

PAMs
The basic principle of all multidimensional PAMs is to partition the data space into page regions. We classify PAMs according to 3 properties:
Are the regions pairwise disjoint or not? (In the R-tree they are not …)
Are the regions rectangular or not? (All the PAMs and SAMs in our case are …)
Is the partition into regions complete or not, i.e. does the union of all regions span the complete data space or not? (For us important that it is …) Avoid empty space!
[Table: the PAMs UB-Tree, Twin-grid file and Buddy-Tree classified by the properties Rectangular, Avoid empty-space and Disjoint]

Buddy-Tree
The Buddy-Tree uses concepts similar to those of the R-Tree, but it extends them and has more interesting properties:
It does not partition empty space.
Insertion and deletion of a record are restricted to exactly one path.
It does not allow overlap in the directory nodes.

Buddy-Tree: Formal Definition
The nodes of the tree directory consist of a collection of entries {E1,…,Ek}, k ≥ 2. Each entry Ei, 1 ≤ i ≤ k, is given by a tuple Ei = (Ri, pi), where Ri is a d-dimensional rectangle and pi is a pointer referring to a subtree or to a data page containing all the records of the file which are in the rectangle Ri. The set of rectangles in a directory node must be a regular B-partition.

B-Rectangle, B-partition
Given 2 d-dimensional rectangles R, S with R ⊆ S, R is called a B-rectangle of S iff it can be generated by successive halving of S.
The B-region of R, written B(R), is the smallest B-rectangle B such that R ⊆ B. Such a B-region also exists for a union of rectangles R1 ∪ R2 ∪ … ∪ Rk, k ≥ 1.
A set of d-dimensional rectangles {R1,…,Rk}, k ≥ 1, is called a B-partition of the data space D iff B(Ri) ∩ B(Rj) = Ø for all i ≠ j.

The Buddies
Let V = {R1,…,Rk} be a B-partition, k > 1, and let S, T ∈ V, S ≠ T. The rectangles S, T are called buddies iff B(S ∪ T) ∩ B(R) = Ø for all R ∈ V \ {S, T}.
[Figure: two example partitions, one in which S, T are buddies and one in which S, T are NOT buddies]
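The B-region and the buddy test can be sketched in a few lines. This is an illustration under stated assumptions, not the buddy-tree implementation: rectangles are tuples of half-open intervals `(lo, hi)` per axis with `lo < hi`, the data space defaults to the unit cube, halving is applied to each axis independently, and all function names are made up here.

```python
def b_interval(lo, hi, dlo, dhi):
    """Smallest interval reachable by repeatedly halving [dlo, dhi) that contains [lo, hi)."""
    assert dlo <= lo < hi <= dhi
    while True:
        mid = (dlo + dhi) / 2
        if hi <= mid:          # fits entirely in the lower half
            dhi = mid
        elif lo >= mid:        # fits entirely in the upper half
            dlo = mid
        else:                  # straddles the midpoint: cannot halve further
            return (dlo, dhi)


def b_region(rect, space=None):
    space = space or tuple((0.0, 1.0) for _ in rect)
    return tuple(b_interval(lo, hi, dlo, dhi)
                 for (lo, hi), (dlo, dhi) in zip(rect, space))


def bounding_box(s, t):
    return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(s, t))


def disjoint(a, b):
    # Half-open boxes are disjoint iff they are separated along some axis.
    return any(ah <= bl or bh <= al for (al, ah), (bl, bh) in zip(a, b))


def buddies(s, t, partition):
    # B(S u T) is the B-region of the bounding box, since B-regions are rectangles.
    merged = b_region(bounding_box(s, t))
    return all(disjoint(merged, b_region(r))
               for r in partition if r != s and r != t)


S = ((0.0, 0.5), (0.0, 0.5))    # lower-left quadrant
T = ((0.5, 1.0), (0.0, 0.5))    # lower-right quadrant
R = ((0.0, 1.0), (0.5, 1.0))    # upper half
V = [S, T, R]
```

Here `buddies(S, T, V)` holds because merging S and T yields the lower half, which does not touch B(R), while S and R are not buddies because B(S ∪ R) is the whole data space and overlaps B(T).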

Dynamic behavior
To obtain an efficient dynamic behavior it must be possible to merge pages without destroying the order preservation. For this the regions of the pages must be buddies. In the buddy-tree the set of rectangles in a directory node must be a regular B-partition. We say that a B-partition is regular iff all B-rectangles B(Ri), 1 ≤ i ≤ k, can be represented in a kd-trie. A kd-trie is a binary tree where the internal nodes consist of an axis and 2 pointers referring to subtrees.

Example
This is a regular B-partition because we can represent it by a kd-trie. The kd-trie is not unique: more than one kd-trie can be built for this regular B-partition.
[Figure: a B-partition with regions s, t1, t2, t3 and a corresponding kd-trie]

UB-Tree (Universal B-Tree)
Methods with guaranteed good performance exist only for 1 dimension. The UB-Tree can handle multidimensional data. We can implement the UB-Tree on top of any database system (by preprocessing techniques).

UB-Tree (Universal B-Tree) [2]
Basic concepts
Area: first we partition a cube C of dimension n into 2^n subcubes numbered sc(i) for i = 1, 2, …, 2^n. For example, in 2 dimensions there are four subcubes sc(1), sc(2), sc(3), sc(4).
$\mathrm{Area}_C(k) := \bigcup_{i=1}^{k} sc(i)$ for $k = 0, 1, \dots, 2^n$
$\mathrm{Area}_C(k.j) := \mathrm{Area}_C(k) \cup \mathrm{Area}_{sc(k+1)}(j)$
[Figure: Area(3) in the 2-dimensional example]

Concept of Address
An address α is a sequence i1.i2.….il where ij ∈ {0, 1, …, 2^n}. For example, the area A in the figure has address 0.3, written alpha(A) = 0.3.

Definitions and lemmas
Region: the difference of 2 areas.
Address of a pixel: the address of the area defined by including the pixel as the last and smallest subcube contained in this area.
There is a one-to-one map between the Cartesian coordinates (x1, x2, …, xn) of an n-dimensional pixel and its address α: alpha(cart(α)) = α.
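The one-to-one map can be sketched with bit interleaving (Z-order), which is the ordering UB-tree addresses follow. The sketch flattens the hierarchical address into a single integer and numbers subcubes from 0 at every level, so its digits do not match the slide's numbering exactly. The names `alpha` and `cart` are reused from the slide; the `bits` parameter (coordinate resolution) is an assumption of this example.

```python
def alpha(coords, bits):
    """Z-address of a pixel: interleave the bits of its n coordinates."""
    n = len(coords)
    z = 0
    for level in range(bits):              # level 0 is the least significant bit
        for dim, x in enumerate(coords):
            z |= ((x >> level) & 1) << (level * n + dim)
    return z


def cart(z, n, bits):
    """Inverse of alpha: recover the n Cartesian coordinates from a Z-address."""
    coords = [0] * n
    for level in range(bits):
        for dim in range(n):
            coords[dim] |= ((z >> (level * n + dim)) & 1) << level
    return tuple(coords)
```

Each group of n bits in the result selects one of the 2^n subcubes at one level of the recursive partition, which is why nearby addresses tend to describe nearby pixels.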

Definitions and lemmas [2]
A point (x1, x2, …, xn) has address γ = alpha(x1, x2, …, xn). It belongs to the unique region(β, δ) with β < γ ≤ δ.
[Figure: example region(0.1, 3)]

Range Queries
The query is defined by an interval for each dimension. Each interval can range over (-∞, +∞). The query is the Cartesian product of the intervals for all dimensions, called the query box.

Range queries (2) Definition : we call all subcubes of level s of a cube brothers. Those with a smaller address are younger and those with a larger are older.

Range queries (3)

Complexity of UB-Tree
N is the number of objects and k = M/2. Let Q be the number of objects intersecting the query box q, and let r be the number of regions intersecting q.
Point query: O(log_k(N))
Range query: r * O(log_k(N)); for points only it is (N*Q/M) * O(log_k(N))
Point insertion: O(log_k(N))
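To get a feeling for the size of log_k(N), the following sketch evaluates it for the 200’000’000 points of the Pantheon problem. The page capacity M = 200 is purely an assumed value for illustration, and `levels` is a made-up helper name.

```python
import math


def levels(n_objects, page_capacity):
    """log_k(N) with k = M/2, the quantity that appears in the point-query bound."""
    k = page_capacity / 2
    return math.log(n_objects, k)


N = 200_000_000          # number of points in the Pantheon problem
M = 200                  # assumed page capacity, not given in the slides
height = math.ceil(levels(N, M))
```

Under this assumption the logarithm stays below 5, so a point query or insertion touches only a handful of pages despite the size of the data set.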

Spatial Access Method
Spatial indexes are used by spatial databases to optimize spatial queries. Indexes used by non-spatial databases cannot effectively handle features such as how far apart two points are and whether points fall within a spatial area of interest.
Examples: TV-Tree, X-Tree.

TV-Tree (Telescopic-Vector tree)
The basis of the TV-tree is to use dynamically contracting and extending feature vectors (like in classification).

TV-tree
We also have a hierarchical structure: the objects are clustered into leaf nodes of the tree, and the minimum bounding region (MBR) is stored in the parent node. Parents are recursively grouped until the root is formed. At the top levels it is optimal because it uses only a few basic features.

TV-tree The TV-tree can be applied to a tree with nodes that describe bounding regions of any shape (cubes,spheres,rectangles, … etc ).

Telescoping function
The telescoping problem can be described as follows. Given an n × 1 feature vector x and an m × n (m ≤ n) contraction matrix A_m, the product A_m x is an m-contraction of x. A sequence of such matrices A_m with m = 1, … describes a telescoping function provided that the following condition is satisfied: if the m1-contractions of the 2 vectors x and y are equal, then so are their respective m2-contractions, for every m2 ≤ m1.
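One simple family of matrices that satisfies this condition is truncation, where A_m consists of the first m rows of the n × n identity matrix. The sketch below assumes that choice; the helper names are made up for this illustration, and other contraction matrices are possible.

```python
def contraction_matrix(m, n):
    """First m rows of the n x n identity matrix (assumed choice of A_m)."""
    return [[1 if j == i else 0 for j in range(n)] for i in range(m)]


def contract(a_m, x):
    """Matrix-vector product A_m x, i.e. the m-contraction of x."""
    return [sum(a * v for a, v in zip(row, x)) for row in a_m]


x = [3, 1, 4, 1]
y = [3, 1, 5, 9]
n = len(x)

# The 2-contractions agree, so every shorter contraction must agree as well.
same_at_2 = contract(contraction_matrix(2, n), x) == contract(contraction_matrix(2, n), y)
same_at_1 = contract(contraction_matrix(1, n), x) == contract(contraction_matrix(1, n), y)
# The vectors only become distinguishable once a third feature is used.
same_at_3 = contract(contraction_matrix(3, n), x) == contract(contraction_matrix(3, n), y)
```

With truncation, an m2-contraction is a prefix of the m1-contraction whenever m2 ≤ m1, which is exactly why equal m1-contractions imply equal m2-contractions.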

Multiple shapes
We can use, for example, a sphere, because it is described by only a center and a radius r. It represents the set of points with Euclidean distance ≤ r from the center. The Euclidean distance is a special case of the Lp metrics with p = 2. For the L1 metric (Manhattan distance) the sphere has a diamond shape. The TV-tree works with any Lp-sphere.
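A short sketch of the Lp distance and the corresponding sphere membership test. The function names are illustrative; `p` is assumed to be a finite value ≥ 1.

```python
def lp_distance(x, y, p):
    """Lp distance between two points given as equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def in_lp_sphere(point, center, radius, p):
    """True when the point lies inside or on the Lp-sphere."""
    return lp_distance(point, center, p) <= radius


center = (0.0, 0.0)
point = (0.6, 0.6)

inside_circle = in_lp_sphere(point, center, 1.0, p=2)    # Euclidean ball
inside_diamond = in_lp_sphere(point, center, 1.0, p=1)   # Manhattan "diamond"
```

The same point lies inside the unit L2-sphere but outside the unit L1-sphere, which shows how the choice of p changes the shape of the bounding region.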

TMBR (Telescopic Minimum Bounding Region)
Each node in the TV-Tree represents the MBR (an Lp-sphere) of all its descendants. Each region is represented by a center, which is a vector determined by the telescoping vectors representing the objects, and a scalar radius. We use the term TMBR to denote an MBR with such a telescopic vector as a center.