# Multidimensional Data Structures

## Presentation on theme: "Multidimensional Data Structures"— Presentation transcript:

Multidimensional Data Structures
Why do we need indexing? Quick access Search methods sequential search binary search balanced search tree (B or B+ tree) Why do we need new indexing structures Traditional data is one-dimensional multimedia data is multidimensional in general, if a given information has k features, it can be represented by a k-dimensional space, where each dimension corresponds to one feature

What kind of queries may be expected?
Given a set of points in k-dimensional space we may want to see if the point is in the set or not (exact match) we may want to find the closest points to the given point (similarity based search or nearest neighbor queries) given a region, we may want to find all the points in the given region (range query) Approach divide the space into regions insert the new object corresponding region if the region is full, split the region Query determine which regions are required to answer the query, and limit the search to these regions

Multidimensional Data Structures
K-d Trees Point Quadtrees MX-Quadtrees R-Trees Many others exist We do not discuss them in the class

K-d Trees Used to store k dimensional point data
Not used to store region data A 2-d tree (k=2) stores 2-dimensional point data while 3-d tree stores 3-dimensional point data, ..

2-d trees Node Structure INFO filed is any user-defined type
nodetype = record INFO: infotype; XVAL: real; YVAL: real; LLINK: nodetype; RLINK: nodetype; end INFO filed is any user-defined type XVAL and YVAL denote the coordinates of a point associated with the node LLINK and RLINK fields point to two children INFO XVAL YVAL LLINK RLINK

2-d Trees 2-d tree is a binary tree satisfying the following properties If N is node in the tree such that level(N) is even, then every node M in the subtree rooted at N.LLINK satisfies M.XVAL < N.XVAL every node P in the subtree rooted at N.RLINK satisfies P.XVAL  N.XVAL If N is node in the tree such that level(N) is odd, then every node M in the subtree rooted at N.LLINK satisfies M.YVAL < N.YVAL every node P in the subtree rooted at N.RLINK satisfies P.YVAL  N.YVAL

2-d Trees: Insertion/Search
To insert N into a tree pointed by T, Check if N and T agree on their XVAL and YVAL IF so, just overwrite T and we are done Else branch left if N.XVAL < T.XVAL and branch right otherwise Suppose P is the child. If N and P agree on their XVAL and YVAL, overwrite P and we are done, else branch left if N.YVAL < P.YVAL and branch right otherwise Repeat this procedure, branching on XVALs when we are at even levels, and YVALs when we are at odd levels in the tree A2 30,50 60 A1 >= 30 A1 < 30 (30,50) 60,10 40 A2 >= 10 A2 < 10 (45,20) 45,20 20 (60,10) A1 >= 45 A1 < 45 A1 20 40 60

Another example of 2-d tree
(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

Example of Insertion (XVAL,YVAL) City Banja Luka (19,45)
Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

Example of Insertion (XVAL,YVAL) City Banja Luka (19,45)
Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

Example of Insertion (XVAL,YVAL) City Banja Luka (19,45)
Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

Example of Insertion (XVAL,YVAL) City Banja Luka (19,45)
Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

Deletion in 2-d Trees Suppose we wish to delete (x,y) from a 2-d tree T search for a node N in T with N.XVAL = x and N.YVAL = y if N is a leaf node, then set the LLINK and RLINK fields of N’s parent to NIL and return N to appropriate storage otherwise, either the subtree rooted at N.LLINK (Tl) or the subtree rooted at N.RLINK (Tr) is non-empty Step1: Find a “candidate replacement” node R that occurs either in Tl or in Tr Step2: Replace all of N’s non-link fields by those of R Step3: Recursively delete R from Tl or Tr (whichever is applicable) the above recursion is guaranteed to terminate because Tl (Tr) has strictly smaller height than the original tree T

Finding Candidate Replacement Node
The candidate replacement node R must bear the same spatial relation to all nodes P in both Tl or Tr that N bore to P I.e, if P is to the southwest of N, then P must be to the southwest of R, if P is to the northwest of N, then P must be to the northwest of R, …. This means R must satisfy the following properties: 1. every node M in Tl is such that: M.XVAL < R.XVAL if level(N) is even and M.YVAL < R.YVAL if level(N) is odd 2. every node M in Tr is such that: M.XVAL  R.XVAL if level(N) is even and M.YVALR.YVAL if level(N) is odd

Finding Candidate Replacement Node
If Tr is not empty, and level(N) is even, then any node in Tr with smallest possible XVAL in Tr is a candidate replacement node If Tr is empty, then we might not be able to find a candidate replacement node from Tl In this case, find the node R’ in Tl with the smallest possible XVAL filed. Replace N with this Set N.RLINK = N.LLINK and set N.LLINK = NIL Recursively delete R’

Range queries in 2-d trees
A range query with respect to a 2-d tree T is a query that specifies a point (xc,yc) and a distance r the answer is the set of all points (x,y) in T such that (x,y) lies within distance r of (xc,yc) I.e., a range query defines a circle of radius r centered at (xc,yc), and expects to find all points in the 2-d tree that lie within the circle recall that each node N in T implicitly represents a region RN If the circle specified in a query has no intersection with RN, then there is no point in searching the subtree rooted at N

Example of the Range query
(xc,yc) = (35,46) r = 9.5

K-d trees It is a binary tree
Every node contains a data record, a left pointer and a right pointer At every level of the tree, a different attribute of the tree is used as the discriminator in a round-robin fashion All algorithms for 2-d trees generalize in the obvious way to k-d trees

In a 2-d tree, node N splits a region into two by drawing one line through the point (N.XVAL,N.YVAL) In a point quadtree, node N splits the region it represents by drawing both horizontal and vertical line through the point (N.XVAL,N.YVAL) These 4 parts are called the NW, SW, NE, SE quadrants determined by node N; each of these corresponds to a child of N Node Structure qtnodetype = record INFO: infotype; XVAL: real; YVAL: real; NW,SW,NE,SE: qtnodetype; end

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

(XVAL,YVAL) City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

If the node being deleted is a leaf node, deletion is trivial: we just set the appropriate link filed of node N’s parent to NIL and return node to storage otherwise, as in the case of 2-d trees, we need to find an appropriate replacement node

Expanded Node Type Expand the node structure to the following
qtnodetype = record INFO: infotype; XVAL,YVAL: real; XLB,YLB,XUB,YUB: real  {-,+} NW,SW,NE,SE: qtnodetype; end When inserting a node N into T we need to ensure that If N is the root node, then N.XLB = - , N.YLB = - , N.XUB = +, N.YUB = + If P is the parent of N (assume w=P.XUB-P.XLB and h=P.YUB-Y.YLB), then Case N.XLB N.XUB N.YLB N.YUB N=P.NW P.XLB P.XLB+w* P.YLB+h* P.YUB N=P.SW P.XLB P.XLB+w* P.YLB P.YLB+h*.5 N=P.NE P.XLB+w* P.XUB P.YLB+h* P.YUB N=P.SE P.XLB+w* P.XUB P.YLB P.YLB+h*.5

When deleting an interior node N, we must find a replacement node R in one of the subtrees of N such that every other node R1 in N.NW is to the northwest of R every other node R2 in N.SW is to the southwest of R every other node R3 in N.NE is to the northeast of R every other node R4 in N.SE is to the southeast of R In general, it may not always be possible to find such a replacement node deletion of an interior node N may require reinsertion of all nodes in the subtrees of N In the worst case, this may require almost all nodes to be reinserted

Similar to that of 2-d trees each node in a point quadtree represents a region do not search regions that do not intersect the circle defined by the query

the shape of the tree depends on the order in which objects are inserted the split (into 2 for 2-d and 4 for point quad) may be uneven depending on exactly where the point (N.XVAL,N.YVAL) is located inside the region represented by N MX-quadtrees (MX stands for matrix) attempt to ensure that the shape (and height) of the tree are independent of the number of nodes present in the tree as well as the order of insertion of these nodes provide efficient deletion and search algorithms

The MX-Quadtree The map being represented is split up into a grid of size (2k2k) of some k the application developer is free to choose k to reflect the desired granularity, but once chosen, this must be kept fixed Node structure: exactly same as that of point quadtrees, except that the root represents the region specified by XLB=0, XUB=2k, YLB=0, YUB=2k When a region gets split, it gets split in the middle the regions represented by the four children of N (w denotes the width of the region represented by N Child XLB XUB YLB YUB NW N.XLB N.XLB+w/2 N.YLB+w/2 N.YLB+w SW N.XLB N.XLB+w/ N.YLB N.YLB+w/2 NE N.XLB+w/2 N.XLB+w N.YLB+w/2 N.YLB+w SE N.XLB+w/2 N.XLB+w N.YLB N.YLB+w/2

It is fairly simple because all points are represented at the leaf level To delete N, First set the appropriate link of N’s parent (M) to NIL Check if all the four link fields of M are NIL If so, examine M’s parent (P), find the link field P.dir1 = M, set P.dir1 = NIL, and see if P’s four link fields are NIL If so, continue this process Complexity of deletion is O(k)

Handled exactly the same way as for point quadtrees, but there are 2 differences the content of XLB,XUB,YLB,YUB fields are different as points are stored at the leaf level, checking to see if a point is in the circle defined by the range query needs to be performed only at the leaf level

R-Trees R-trees are used to store rectangular regions of an image or map R-trees are particularly useful in storing very large amounts of data on disk They provide a convenient way of minimizing the number of disk accesses Each R-tree has an associated order, K Each non-leaf node contains at most K rectangles and at least K/2 rectangles (except root) (I.e, each non-root node must be at least half full) This makes R-trees appropriate for disk based retrieval (because each disk access brings back at least K/2 rectangles)

R-Trees Manipulate two kinds of rectangles: “real” and “group”

R-Tree Node Structure rtnodetype = record Rec1, …. RecK: rectangle;
P1,…PK: rtnodetype; end

Insertion into an R-tree

Insertion into an R-tree

Insertion into an R-tree

An incorrect insertion into an R-tree

Deletion in R-Trees Deletion may cause a node to “underflow” because an R-Tree must contain at least K/2 rectangles (real or group) (Recall B+-trees) When we delete a rectangle, we must make sure that the node is not underfull

Comparison of the four data structures
k-d trees easy to implement k-d tree with k nodes may have height k since the trees are binary, search and insertion are expensive point quadtrees comparison requires comparison of 2 attributes deletion is difficult complexity of range queries O(2n) where n is the number of records in the tree

Comparison of the four data structures
MX-quadtrees height is at most O(n) where the region represented is composed of (2n2n) cells insertion, deletion, search: O(n) range search is very efficient O(N+2h) where N is the number of points in the answer to the query and h is the height of the tree R-tree insertion, deletion, search, same as MX-quadtrees since large number of rectangles are stored in each node, they are appropriate for disk accesses the bounding rectangles may overlap (I.e., we may have to search via multiple paths)

Other multidimensional data structures
K-d-B trees hB-tree PMR-tree R*-tree ….

Commercial Systems Informix’s MapInfo Geocoding datablade: allows assignment of latitudinal and longitudinal elements to records Informix’s Spatial datablade: employs R-tree Oracle Universal server provides a spatial data option and is based on quadtree technology Intergraph’s Land Information System allows integration of survey data, imagery, etc., Allows to create temporal/historical view of landuse ESRI provides ARC/INFO system the spatial database engine works with geographic data stored in Oracle, Informix, Sybase, Microsoft SQL server, and DB2 Interesting to see what data structure do they employ!!