Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multidimensional Data Structures Why do we need indexing? –Quick access Search methods –sequential search –binary search –balanced search tree (B or B+

Similar presentations


Presentation on theme: "Multidimensional Data Structures Why do we need indexing? –Quick access Search methods –sequential search –binary search –balanced search tree (B or B+"— Presentation transcript:

1 Multidimensional Data Structures Why do we need indexing? –Quick access Search methods –sequential search –binary search –balanced search tree (B or B+ tree) Why do we need new indexing structures –Traditional data is one-dimensional –multimedia data is multidimensional in general, if a given information has k features, it can be represented by a k-dimensional space, where each dimension corresponds to one feature

2 What kind of queries may be expected? Given a set of points in k-dimensional space –we may want to see if the point is in the set or not (exact match) –we may want to find the closest points to the given point (similarity based search or nearest neighbor queries) –given a region, we may want to find all the points in the given region (range query) Approach –divide the space into regions –insert the new object corresponding region –if the region is full, split the region Query –determine which regions are required to answer the query, and limit the search to these regions

3 Multidimensional Data Structures K-d Trees Point Quadtrees MX-Quadtrees R-Trees Many others exist –We do not discuss them in the class

4 K-d Trees Used to store k dimensional point data Not used to store region data A 2-d tree (k=2) stores 2-dimensional point data while 3-d tree stores 3-dimensional point data,..

5 2-d trees Node Structure  nodetype = record  INFO: infotype;  XVAL: real;  YVAL: real;  LLINK:  nodetype;  RLINK:  nodetype;  end  INFO filed is any user-defined type  XVAL and YVAL denote the coordinates of a point associated with the node  LLINK and RLINK fields point to two children INFO XVAL YVAL LLINK RLINK

6 2-d Trees 2-d tree is a binary tree satisfying the following properties –If N is node in the tree such that level(N) is even, then every node M in the subtree rooted at N.LLINK satisfies M.XVAL < N.XVAL every node P in the subtree rooted at N.RLINK satisfies P.XVAL  N.XVAL –If N is node in the tree such that level(N) is odd, then every node M in the subtree rooted at N.LLINK satisfies M.YVAL < N.YVAL every node P in the subtree rooted at N.RLINK satisfies P.YVAL  N.YVAL

7 2-d Trees: Insertion/Search 30,50 A1 < 30 A1 >= 30 60,10 A2 < 10 A2 >= 10 45,20 A1 < 45 A1 >= 45 A1 A (30,50) (45,20) (60,10)    To insert N into a tree pointed by T, –Check if N and T agree on their XVAL and YVAL –IF so, just overwrite T and we are done –Else branch left if N.XVAL < T.XVAL and branch right otherwise –Suppose P is the child. If N and P agree on their XVAL and YVAL, overwrite P and we are done, else branch left if N.YVAL < P.YVAL and branch right otherwise –Repeat this procedure, branching on XVALs when we are at even levels, and YVALs when we are at odd levels in the tree

8 Another example of 2-d tree City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4) (XVAL,YVAL)

9 City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4) (XVAL,YVAL)

10 Example of Insertion City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4) (XVAL,YVAL)

11 Example of Insertion City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4) (XVAL,YVAL)

12 Example of Insertion City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4) (XVAL,YVAL)

13 Example of Insertion City Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4) (XVAL,YVAL)

14 Deletion in 2-d Trees Suppose we wish to delete (x,y) from a 2-d tree T –search for a node N in T with N.XVAL = x and N.YVAL = y –if N is a leaf node, then set the LLINK and RLINK fields of N’s parent to NIL and return N to appropriate storage –otherwise, either the subtree rooted at N.LLINK (T l ) or the subtree rooted at N.RLINK (T r ) is non-empty Step1: Find a “candidate replacement” node R that occurs either in T l or in T r Step2: Replace all of N’s non-link fields by those of R Step3: Recursively delete R from T l or T r (whichever is applicable) –the above recursion is guaranteed to terminate because T l (T r ) has strictly smaller height than the original tree T

15 Finding Candidate Replacement Node The candidate replacement node R must bear the same spatial relation to all nodes P in both T l or T r that N bore to P I.e, if P is to the southwest of N, then P must be to the southwest of R, if P is to the northwest of N, then P must be to the northwest of R, …. This means R must satisfy the following properties: –1. every node M in T l is such that: M.XVAL < R.XVAL if level(N) is even and M.YVAL < R.YVAL if level(N) is odd –2. every node M in T r is such that: M.XVAL  R.XVAL if level(N) is even and M.YVAL  R.YVAL if level(N) is odd

16 Finding Candidate Replacement Node If T r is not empty, and level(N) is even, then any node in T r with smallest possible XVAL in T r is a candidate replacement node If T r is empty, then we might not be able to find a candidate replacement node from T l In this case, find the node R’ in T l with the smallest possible XVAL filed. Replace N with this Set N.RLINK = N.LLINK and set N.LLINK = NIL Recursively delete R’

17 Range queries in 2-d trees A range query with respect to a 2-d tree T is a query that specifies a point (x c,y c ) and a distance r the answer is the set of all points (x,y) in T such that (x,y) lies within distance r of (x c,y c ) I.e., a range query defines a circle of radius r centered at (x c,y c ), and expects to find all points in the 2-d tree that lie within the circle recall that each node N in T implicitly represents a region R N If the circle specified in a query has no intersection with R N, then there is no point in searching the subtree rooted at N

18 Example of the Range query (x c,y c ) = (35,46) r = 9.5

19 K-d trees It is a binary tree Every node contains a data record, a left pointer and a right pointer At every level of the tree, a different attribute of the tree is used as the discriminator in a round-robin fashion All algorithms for 2-d trees generalize in the obvious way to k-d trees

20 Point Quadtrees Point quad trees always split regions into 4 parts In a 2-d tree, node N splits a region into two by drawing one line through the point (N.XVAL,N.YVAL) In a point quadtree, node N splits the region it represents by drawing both horizontal and vertical line through the point (N.XVAL,N.YVAL) These 4 parts are called the NW, SW, NE, SE quadrants determined by node N; each of these corresponds to a child of N Node Structure  qtnodetype = record  INFO: infotype;  XVAL: real;  YVAL: real;  NW,SW,NE,SE:  qtnodetype;  end

21 Insertion into Point Quadtrees City (XVAL,YVAL) Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

22 Insertion into Point Quadtrees City (XVAL,YVAL) Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

23 Insertion into Point Quadtrees City (XVAL,YVAL) Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

24 Insertion into Point Quadtrees City (XVAL,YVAL) Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

25 Insertion into Point Quadtrees City (XVAL,YVAL) Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

26 Insertion into Point Quadtrees City (XVAL,YVAL) Banja Luka (19,45) Derventa (40,50) Toslic (38,38) Tuzla (54,35) Sinj (4,4)

27 Deletion in Point Quadtrees If the node being deleted is a leaf node, deletion is trivial: we just set the appropriate link filed of node N’s parent to NIL and return node to storage otherwise, as in the case of 2-d trees, we need to find an appropriate replacement node

28 Expanded Node Type Expand the node structure to the following  qtnodetype = record  INFO: infotype;  XVAL,YVAL: real;  XLB,YLB,XUB,YUB: real  {- ,+  }  NW,SW,NE,SE:  qtnodetype;  end When inserting a node N into T we need to ensure that –If N is the root node, then N.XLB = - , N.YLB = - , N.XUB = + , N.YUB = +  If P is the parent of N (assume w=P.XUB-P.XLB and h=P.YUB-Y.YLB), then Case N.XLB N.XUB N.YLB N.YUB N=P.NW P.XLB P.XLB+w*.5 P.YLB+h*.5 P.YUB N=P.SW P.XLB P.XLB+w*.5 P.YLB P.YLB+h*.5 N=P.NE P.XLB+w*.5 P.XUB P.YLB+h*.5 P.YUB N=P.SE P.XLB+w*.5 P.XUB P.YLB P.YLB+h*.5

29 Deletion in Point Quadtrees When deleting an interior node N, we must find a replacement node R in one of the subtrees of N such that –every other node R 1 in N.NW is to the northwest of R –every other node R 2 in N.SW is to the southwest of R –every other node R 3 in N.NE is to the northeast of R –every other node R 4 in N.SE is to the southeast of R In general, it may not always be possible to find such a replacement node –deletion of an interior node N may require reinsertion of all nodes in the subtrees of N –In the worst case, this may require almost all nodes to be reinserted

30 Range Searches in Point Quadtree Similar to that of 2-d trees each node in a point quadtree represents a region do not search regions that do not intersect the circle defined by the query

31 The MX-Quadtree For both 2-d trees and point quadtrees, –the shape of the tree depends on the order in which objects are inserted –the split (into 2 for 2-d and 4 for point quad) may be uneven depending on exactly where the point (N.XVAL,N.YVAL) is located inside the region represented by N MX-quadtrees (MX stands for matrix) attempt to –ensure that the shape (and height) of the tree are independent of the number of nodes present in the tree as well as the order of insertion of these nodes –provide efficient deletion and search algorithms

32 The MX-Quadtree The map being represented is split up into a grid of size (2 k  2 k ) of some k the application developer is free to choose k to reflect the desired granularity, but once chosen, this must be kept fixed Node structure: exactly same as that of point quadtrees, except that the root represents the region specified by XLB=0, XUB=2 k, YLB=0, YUB=2 k When a region gets split, it gets split in the middle –the regions represented by the four children of N (w denotes the width of the region represented by N Child XLB XUB YLB YUB NW N.XLB N.XLB+w/2 N.YLB+w/2 N.YLB+w SW N.XLB N.XLB+w/2 N.YLB N.YLB+w/2 NE N.XLB+w/2 N.XLB+w N.YLB+w/2 N.YLB+w SE N.XLB+w/2 N.XLB+w N.YLB N.YLB+w/2

33 Insertion into MX-Quadtree

34

35 Deletion in MX-Quadtrees It is fairly simple because all points are represented at the leaf level To delete N, –First set the appropriate link of N’s parent (M) to NIL –Check if all the four link fields of M are NIL –If so, examine M’s parent (P), find the link field P.dir1 = M, set P.dir1 = NIL, and see if P’s four link fields are NIL –If so, continue this process –Complexity of deletion is O(k)

36 Range Queries in MX-Quadtrees Handled exactly the same way as for point quadtrees, but there are 2 differences –the content of XLB,XUB,YLB,YUB fields are different –as points are stored at the leaf level, checking to see if a point is in the circle defined by the range query needs to be performed only at the leaf level

37 R-Trees R-trees are used to store rectangular regions of an image or map R-trees are particularly useful in storing very large amounts of data on disk They provide a convenient way of minimizing the number of disk accesses Each R-tree has an associated order, K Each non-leaf node contains at most K rectangles and at least  K/2  rectangles (except root) (I.e, each non-root node must be at least half full) This makes R-trees appropriate for disk based retrieval (because each disk access brings back at least K/2 rectangles)

38 R-Trees Manipulate two kinds of rectangles: “real” and “group”

39 R-Tree Node Structure  rtnodetype = record  Rec 1, …. Rec K : rectangle;  P 1,…P K :  rtnodetype;  end

40 Insertion into an R-tree

41

42

43 An incorrect insertion into an R-tree

44 Deletion in R-Trees Deletion may cause a node to “underflow” because an R-Tree must contain at least  K/2  rectangles (real or group) (Recall B + -trees) When we delete a rectangle, we must make sure that the node is not underfull

45 Comparison of the four data structures k-d trees –easy to implement –k-d tree with k nodes may have height k –since the trees are binary, search and insertion are expensive point quadtrees –easy to implement –comparison requires comparison of 2 attributes –deletion is difficult –complexity of range queries O(2  n) where n is the number of records in the tree

46 Comparison of the four data structures MX-quadtrees –height is at most O(n) where the region represented is composed of (2 n  2 n ) cells –insertion, deletion, search: O(n) –range search is very efficient O(N+2 h ) where N is the number of points in the answer to the query and h is the height of the tree R-tree –insertion, deletion, search, same as MX-quadtrees –since large number of rectangles are stored in each node, they are appropriate for disk accesses –the bounding rectangles may overlap (I.e., we may have to search via multiple paths)

47 Other multidimensional data structures K-d-B trees hB-tree PMR-tree R*-tree ….

48 Commercial Systems Informix’s MapInfo Geocoding datablade: allows assignment of latitudinal and longitudinal elements to records Informix’s Spatial datablade: employs R-tree Oracle Universal server provides a spatial data option and is based on quadtree technology Intergraph’s Land Information System allows integration of survey data, imagery, etc., Allows to create temporal/historical view of landuse ESRI provides ARC/INFO system –the spatial database engine works with geographic data stored in Oracle, Informix, Sybase, Microsoft SQL server, and DB2 –Interesting to see what data structure do they employ!!


Download ppt "Multidimensional Data Structures Why do we need indexing? –Quick access Search methods –sequential search –binary search –balanced search tree (B or B+"

Similar presentations


Ads by Google