1 CSIS 7101: Spatial Data (Part 1)
The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
Rollo Chan Chu Chung Man Mak Wai Yip Vivian Lee Eric Lo Sindy Shou Hugh Wang

2 Spatial Access Method (SAM)
- Handles spatial data efficiently
  - Query
  - Build index: retrieve data items from a database system quickly
  - Dynamic: supports updates
- Why not use a B-tree?
  - A B-tree indexes only one dimension
- A SAM is designed for multi-dimensional points
  - E.g. 2D for maps

3 R-tree and R*-tree
- R-tree [Guttman84]
- R*-tree [Beckmann90]
- Height-balanced tree (similar to a B-tree)
- Leaf-node entries have the format (I, tuple-identifier):
  - I is the Minimum Bounding Rectangle (MBR) of a spatial object
  - tuple-identifier is the id used to retrieve the spatial object from the database (name, address, etc.)

4 The Spatial Data
[Figure: Minimum Bounding Box]

5 R-tree and R*-tree properties
- Leaf entry: (I, tuple-identifier)
- Non-leaf entry: (I', child-pointer)
  - I' covers all rectangles in the child node's entries
- Parameters:
  - M (max no. of entries per node)
  - m (min no. of entries per node)
  - m <= M/2
- Root has at least two children (unless it is a leaf)
- All leaves are on the same level
- 1 node = 1 disk page (minimizes no. of I/Os)

6 Outline
- Introduction
  - Motivation
  - R-tree and R*-tree structure
- Searching of R*-tree
- Construction of R*-tree
- Conclusions
- References

7 Searching
- May need to search more than one subtree (why? sibling bounding rectangles can overlap)
- To search with a query rectangle S, Search(S):
  - Start from the root
  - Find all index records that overlap S
  - If the node is not a leaf, check each entry for overlap with S; for each entry that overlaps, call Search on its subtree
  - Else (the node is a leaf), check all entries in the leaf and report those that overlap S
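The search procedure on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: rectangles are tuples (xmin, ymin, xmax, ymax), and the `Node` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (mbr, tuple_id); non-leaf entries are (mbr, child Node).
    entries: list = field(default_factory=list)

def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, s):
    """Return the tuple ids of all leaf entries whose MBR overlaps s."""
    hits = []
    for mbr, item in node.entries:
        if not overlaps(mbr, s):
            continue                      # prune this entry (and its subtree)
        if node.is_leaf:
            hits.append(item)             # item is a tuple id
        else:
            hits.extend(search(item, s))  # item is a child node
    return hits

# Two leaves whose bounding rectangles overlap, so one query visits both.
leaf_a = Node(True, [((0, 0, 2, 2), "a1"), ((3, 3, 4, 4), "a2")])
leaf_b = Node(True, [((1, 1, 5, 5), "b1"), ((8, 8, 9, 9), "b2")])
root = Node(False, [((0, 0, 4, 4), leaf_a), ((1, 1, 9, 9), leaf_b)])

print(search(root, (1.5, 1.5, 2.5, 2.5)))  # ['a1', 'b1']
```

The query rectangle falls inside both directory rectangles of the root, which is why more than one subtree has to be visited.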

8 Searching examples

9 Spatial Data
- Introduction
  - Motivation
  - R-tree and R*-tree structure
- Searching of R*-tree
- Construction of R*-tree
- Conclusions
- References

10 R*-tree Optimization Criteria
- Minimize the area covered by an index rectangle
- Minimize the overlap between bounding rectangles
  - Minimizes the number of paths to be traversed
- Minimize the margin of a directory rectangle
  - Creates less overlap, using the same amount of area
  - Allows for better, more structured clustering
- Optimize the storage utilization
  - Nodes in the tree should be filled as much as possible
- Sometimes it is impossible to optimize all of the above criteria at the same time!
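To make the three geometric criteria concrete, here is a small sketch of area, margin (perimeter) and overlap for axis-aligned rectangles. The tuple layout (xmin, ymin, xmax, ymax) and the function names are assumptions made for illustration.

```python
def area(r):
    """Area of rectangle r = (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    """Margin of r, i.e. its perimeter."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap_area(a, b):
    """Area of the intersection of a and b (0 if they do not intersect)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

square = (0, 0, 4, 4)
strip = (0, 0, 16, 1)

print(area(square), margin(square))  # 16 16
print(area(strip), margin(strip))    # 16 34
print(overlap_area(square, strip))   # 4
```

Both rectangles cover the same area, but the square has less than half the margin of the strip, which is the sense in which minimizing margin favors square-like directory rectangles.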

11 R*-tree Insertion
- To insert a new entry, first choose the leaf node to insert it into
- ChooseSubTree: select a leaf in which to place a new index entry E
  - Start from the root
  - If the node is a non-leaf whose children are leaves, choose the child entry using the following criteria, each one breaking ties of the one before:
    1) Least overlap enlargement
    2) Least area enlargement
    3) Smallest area
  - If the node is a non-leaf whose children are not leaves, use criteria 2 and 3
  - Invoke ChooseSubTree recursively on the chosen child
  - If the node is a leaf, return it as the node to insert into
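A minimal sketch of ChooseSubTree follows, written as a loop rather than a recursion. The `Node` class, the tuple layout (xmin, ymin, xmax, ymax) and all helper names are assumptions for illustration; overlap enlargement is computed naively against every sibling entry.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (mbr, child Node or tuple id)

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def overlap_enlargement(i, rects, new):
    """Growth in overlap with siblings if rects[i] is enlarged to include new."""
    grown = union(rects[i], new)
    others = [r for j, r in enumerate(rects) if j != i]
    before = sum(overlap_area(rects[i], r) for r in others)
    after = sum(overlap_area(grown, r) for r in others)
    return after - before

def choose_subtree(root, new):
    """Descend from root to the leaf that should receive rectangle new."""
    node = root
    while not node.is_leaf:
        rects = [mbr for mbr, _ in node.entries]
        children_are_leaves = node.entries[0][1].is_leaf

        def cost(i):
            # Criteria 2 and 3: area enlargement, then area.
            key = (area(union(rects[i], new)) - area(rects[i]), area(rects[i]))
            if children_are_leaves:
                # Criterion 1 comes first when the children are leaves.
                key = (overlap_enlargement(i, rects, new),) + key
            return key

        node = node.entries[min(range(len(rects)), key=cost)][1]
    return node

leaf_a = Node(True, [((0, 0, 4, 4), "a")])
leaf_b = Node(True, [((3, 0, 10, 4), "b")])
root = Node(False, [((0, 0, 4, 4), leaf_a), ((3, 0, 10, 4), leaf_b)])

print(choose_subtree(root, (4.5, 1, 5, 2)) is leaf_b)  # True
```

In the example the new rectangle already lies inside the second directory rectangle, so choosing it causes no overlap enlargement, whereas growing the first one would increase its overlap with its sibling.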

12 Splitting Node
- What if a new entry E is to be added to a node N that is already full?
  - Split the full node?
  - Reinsert some of its entries?
- How to split?
  1. Determine the split axis
  2. Distribute the entries into 2 groups along that axis
  3. The distribution may not be even!

13 1. Determine the axis
- A full node plus the new entry gives (M+1) entries
- For each axis (i.e. x and y axis):
  - Sort the entries by their lower value, then by their upper value
  - E.g. on the x axis, sort by lower value, then generate M-2m+2 = 3 distributions (M=3, m=1)
  - k-th distribution: the first (m-1)+k entries go to group 1, the rest to group 2
  - E.g. 2nd distribution, (1-1)+2 = 2: [E1 E2] [E3 E4]
  - 3rd distribution, (1-1)+3 = 3: [E1 E2 E3] [E4]
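The enumeration of candidate distributions can be sketched as a generator. This is an illustration only: the function name, the `axis` encoding (0 for x, 1 for y) and the tuple layout (xmin, ymin, xmax, ymax) are assumptions, not part of the slides.

```python
def distributions(entries, m, axis):
    """Yield (group1, group2) candidate splits of M+1 entries along one axis.

    For each of the two sort orders (lower value, upper value) there are
    M-2m+2 distributions; the k-th puts the first (m-1)+k entries in group 1.
    """
    M = len(entries) - 1                 # the node holds M entries plus the new one
    for key_index in (axis, axis + 2):   # lower value, then upper value
        ordered = sorted(entries, key=lambda r: r[key_index])
        for k in range(1, M - 2 * m + 3):
            cut = (m - 1) + k
            yield ordered[:cut], ordered[cut:]

# M = 3, m = 1: four entries E1..E4 laid out left to right.
entries = [(0, 0, 1, 1), (2, 0, 3, 1), (4, 0, 5, 1), (6, 0, 7, 1)]
for g1, g2 in distributions(entries, m=1, axis=0):
    print(len(g1), len(g2))
```

With M=3 and m=1 this prints the group sizes 1/3, 2/2 and 3/1 twice, once per sort order, so each axis contributes six candidates.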

14 1. Determining split axis (cont.)
- Compute S, the sum of the margin-values of all (1, 2, ..., M-2m+2) distributions
- Margin-value = sum of the perimeters of the two groups' bounding rectangles
- Choose the axis with the lower S
- E.g. S of the 6 x-axis distributions (3 sorted by lower value, 3 sorted by upper value) < S of the y-axis distributions
  - Return the x axis as the split axis
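A hedged sketch of the split-axis choice: for each axis, sum the margin-values over all candidate distributions and keep the axis with the smaller sum. Function names, the axis encoding (0 for x, 1 for y) and the tuple layout (xmin, ymin, xmax, ymax) are illustrative assumptions.

```python
def bbox(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def distributions(entries, m, axis):
    """Candidate (group1, group2) splits along one axis, for both sort orders."""
    M = len(entries) - 1
    for key_index in (axis, axis + 2):
        ordered = sorted(entries, key=lambda r: r[key_index])
        for k in range(1, M - 2 * m + 3):
            yield ordered[:(m - 1) + k], ordered[(m - 1) + k:]

def margin_sum(entries, m, axis):
    """S for one axis: sum of margin-values over all its distributions."""
    return sum(margin(bbox(g1)) + margin(bbox(g2))
               for g1, g2 in distributions(entries, m, axis))

def choose_split_axis(entries, m):
    return min((0, 1), key=lambda axis: margin_sum(entries, m, axis))

# Two columns of rectangles: one at x in [0, 1], one at x in [3, 4].
entries = [(0, 0, 1, 1), (3, 2, 4, 3), (0, 4, 1, 5), (3, 6, 4, 7)]
print(margin_sum(entries, 1, 0), margin_sum(entries, 1, 1))  # 136 144
print(choose_split_axis(entries, 1))                         # 0 (the x axis)
```

Sorting along x groups each column together, which gives more compact bounding rectangles and therefore the smaller margin sum.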

15 2. Distribute entries along axis
- How to split?
  1. Determine the axis
  2. Distribute the entries into 2 groups along that axis
  3. The distribution may not be even!
- Along the chosen axis, choose the candidate distribution with the minimum overlap-value
- Overlap-value: area[rect(group1) ∩ rect(group2)]
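The choice of the final distribution can be sketched as below. Names and the tuple layout (xmin, ymin, xmax, ymax) are assumptions for illustration. The tie-break on the summed area of the two bounding rectangles is not on the slide; it follows the description in the R*-tree paper [Beckmann90].

```python
def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def distributions(entries, m, axis):
    """Candidate (group1, group2) splits along one axis, for both sort orders."""
    M = len(entries) - 1
    for key_index in (axis, axis + 2):
        ordered = sorted(entries, key=lambda r: r[key_index])
        for k in range(1, M - 2 * m + 3):
            yield ordered[:(m - 1) + k], ordered[(m - 1) + k:]

def choose_split_index(entries, m, axis):
    """Pick the distribution with minimum overlap-value along the given axis."""
    def cost(groups):
        b1, b2 = bbox(groups[0]), bbox(groups[1])
        # Overlap-value first; summed area only breaks ties (assumption, see above).
        return (overlap_area(b1, b2), area(b1) + area(b2))
    return min(distributions(entries, m, axis), key=cost)

# Two clusters along x: splitting between them gives zero overlap.
entries = [(0, 0, 2, 2), (1, 0, 3, 2), (6, 0, 8, 2), (7, 0, 9, 2)]
group1, group2 = choose_split_index(entries, m=1, axis=0)
print(group1)  # [(0, 0, 2, 2), (1, 0, 3, 2)]
print(group2)  # [(6, 0, 8, 2), (7, 0, 9, 2)]
```

The 1/3 and 3/1 distributions each leave an overlap of area 2 between the two bounding rectangles, so the 2/2 distribution wins.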

16 Who calls Split? R*-tree Insertion
Algorithm Insert: add a new entry E at the specified level
1. Find the appropriate node: invoke ChooseSubtree to find the node N in which to place E.
2. Check for space: if N has fewer than M entries, insert E. Otherwise:
3. Split or reinsert: invoke OverflowTreatment.
4. Propagate changes upward: if a split was performed, propagate it upward. If the root was split, create a new root.
5. Adjust covering rectangles: adjust all rectangles on the insertion path so that each is the minimum bounding box of its children.

17 R*-tree Insertion (cont.)
Algorithm OverflowTreatment: decide whether to split the current node or to try reinsertion
1. Check condition: is the level not the root level, and is this the first call of OverflowTreatment at this level during the insertion of one data rectangle?
2. If so, reinsert: invoke ReInsert.
3. Otherwise, split: invoke Split.
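The decision made by OverflowTreatment is small enough to sketch directly. The function signature is hypothetical: `reinserted_levels` is assumed to be a set that the caller clears at the start of each data-rectangle insertion, and the function only returns which action to take instead of performing it.

```python
def overflow_treatment(level, root_level, reinserted_levels):
    """Return "reinsert" or "split" for an overflowing node at the given level."""
    if level != root_level and level not in reinserted_levels:
        # First overflow at this level during this insertion: try reinsertion.
        reinserted_levels.add(level)
        return "reinsert"
    return "split"

seen = set()  # reset once per inserted data rectangle
print(overflow_treatment(0, 2, seen))  # reinsert
print(overflow_treatment(0, 2, seen))  # split (second overflow at level 0)
print(overflow_treatment(2, 2, seen))  # split (the root is never reinserted)
```

Remembering the levels already treated is what keeps reinsertion from triggering itself indefinitely within one insertion.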

18 R*-tree Insertion (cont.)
Algorithm ReInsert
1. Compute distances: for all M+1 entries of a node N, compute the distance between the center of each entry's rectangle and the center of the bounding rectangle of N.
2. Sort entries: sort the entries in decreasing order of the distances computed in step 1.
3. Remove entries: remove the first p entries from N and adjust the bounding rectangle of N.
4. Reinsert entries: invoke Insert on the removed entries, following the sort order of step 2 and starting with either the maximum or the minimum distance.
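Steps 1 to 3 of ReInsert can be sketched as a single function that picks the entries to remove. The names and the tuple layout (xmin, ymin, xmax, ymax) are assumptions for illustration, and step 4 (calling Insert on the removed entries) is left to the caller.

```python
import math

def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_reinsert_entries(entries, p):
    """Return (removed, kept): the p entries farthest from the node center, and the rest."""
    cx, cy = center(bbox(entries))

    def distance(r):
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    ordered = sorted(entries, key=distance, reverse=True)  # decreasing distance
    return ordered[:p], ordered[p:]

# M = 3, so the overflowing node holds 4 entries; p = 1 is roughly 30% of M.
entries = [(0, 0, 2, 2), (3, 3, 5, 5), (4, 4, 6, 6), (10, 10, 14, 14)]
removed, kept = pick_reinsert_entries(entries, p=1)
print(removed)     # [(0, 0, 2, 2)]
print(bbox(kept))  # (3, 3, 14, 14)
```

The caller would then store `kept` back in the node with the adjusted bounding rectangle and pass `removed` to Insert, in the given order or reversed depending on whether reinsertion starts at the maximum or the minimum distance.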

19 R*-tree Split Example
[Figure panels: R-tree (Quadratic Split, m = 40%); R*-tree (m = 40%)]

20 R*-tree Forced Reinsert
- When an R*-tree node N overflows, instead of splitting N immediately, check whether some of its entries would fit better in another node
  - Splitting alone only reorganizes the directory rectangles locally
- Reinsertion slightly increases construction time, BUT results in less overlap, which improves query response time
- Reinserting p = 30% of the entries yields the best performance

21 Performance Comparison
- Forced reinsert increases storage efficiency, decreases overlap, causes fewer splits, and makes rectangles more quadratic (square).
- CPU cost is higher with forced reinsert, but because there are fewer splits, the increase in disk accesses for insertions is only 4% (and remains the lowest of all R-tree variants)!

22 Outline
- Introduction
  - Motivation
  - R-tree and R*-tree structure
- Searching of R*-tree
- Construction of R*-tree
- Conclusions
- References

23 Conclusions
- The R*-tree performs significantly better than the other R-tree variants.
  - It is the most robust of the trees: it requires fewer disk accesses
  - The gain is higher for smaller query rectangles, because storage utilization matters more for larger query rectangles
  - 400% gain over the Linear split and 180% gain over the Quadratic split of the R-tree
  - The best storage utilization
  - Even with forced reinsertion, insertion cost is decreased, due to fewer splits
  - Spatial join has the highest gain

24 References
- Guttman, A., "R-Trees: A Dynamic Index Structure for Spatial Searching", Proceedings, ACM SIGMOD, pp. 47-57, June 1984.
- Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B., "The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles", Proceedings, ACM SIGMOD International Conference on Management of Data, May 23-25, 1990.

