R-Trees: A Dynamic Index Structure for Spatial Searching. Antonin Guttman. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD '84). ACM, New York, NY, USA.


1 Antonin Guttman In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA

2 Outline
- Introduction
- R-Tree Index Structure
- Searching and Updating
- Performance Tests
- Conclusion

3 Outline
- Introduction
  - Background
  - Previous Works
- R-Tree Index Structure
- Searching and Updating
- Performance Tests
- Conclusion

4 Background
- Motivation
  - Handle spatial data efficiently
  - Traditional databases are designed for one-dimensional data
- Traditional index structures
  - Hash tables
  - B-trees and ISAM

5 Previous Works
Method | Disadvantage
Cell methods | Not good for dynamic structures
Quad trees, K-D tree | Do not take paging of secondary memory into account
K-D-B tree | Useful only for point data
Corner stitching, Grid files | Homogeneous primary memory; not efficient

6 Outline
- Introduction
- R-Tree Index Structure
  - R-Tree Index Structure
  - Properties of the R-Tree
  - Example of an R-Tree
- Searching and Updating
- Performance Tests
- Conclusion

7 R-Tree Index Structure
- What is an R-tree?
  - A height-balanced tree similar to a B-tree
  - No periodic reorganization is needed
- What do the nodes contain?
  - (I, tuple-identifier) in a leaf node
  - (I, child-pointer) in a non-leaf node
- An R-tree must satisfy the following properties

8 Properties of the R-Tree
- Let M be the maximum number of entries that will fit in one node
- Let m <= M/2 be a parameter specifying the minimum number of entries in a node

9 Properties of the R-Tree
1. Every leaf node contains between m and M index records unless it is the root
2. For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple
3. Every non-leaf node has between m and M children unless it is the root
4. For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node
5. The root node has at least two children unless it is a leaf
6. All leaves appear on the same level
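Properties 2 and 4 both rest on computing a smallest enclosing rectangle, and properties 1 and 3 on a bound on the entry count. The following is a minimal sketch of those two checks in Python, not code from the paper. The tuple layout `(xmin, ymin, xmax, ymax)` and the names `mbr` and `entry_count_ok` are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def mbr(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-aligned rectangle that contains every rectangle in rects."""
    rects = list(rects)
    if not rects:
        raise ValueError("mbr() needs at least one rectangle")
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def entry_count_ok(n_entries: int, m: int, M: int, is_root: bool) -> bool:
    """Properties 1 and 3: at most M entries, and at least m unless the node is the root."""
    return n_entries <= M and (is_root or n_entries >= m)
```

For example, `mbr([(0, 0, 2, 2), (1, 1, 5, 3)])` gives `(0, 0, 5, 3)`, which is the I that a parent entry would store for a child holding those two rectangles.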

10 Example of an R-Tree (no text captured for this slide)

11 Outline
- Introduction
- R-Tree Index Structure
- Searching and Updating
  - Searching
  - Example of Searching
  - Insertion
  - Updates and Other Operations
  - Node Splitting
- Performance Tests
- Conclusion

12 Searching
- Problem definition: given an R-tree whose root node is T, find all index records whose rectangles overlap a search rectangle S
- Notation
  - EI is the rectangle part of an index entry E
  - Ep is the tuple-identifier or child-pointer of E

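13 Searching
Search(T, S, LIST) {
    IF (T is not a leaf) {
        // descend into every subtree whose rectangle overlaps S
        FOR EACH (E in T)
            IF (E.EI overlaps S) Search(E.Ep, S, LIST);
    } ELSE {
        // report every leaf entry whose rectangle overlaps S
        FOR EACH (E in T)
            IF (E.EI overlaps S) LIST.ADD(E.Ep);
    }
}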
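The search procedure above can be sketched in runnable form as follows. This is an illustration under stated assumptions rather than the paper's implementation: the `Node` class, the `(xmin, ymin, xmax, ymax)` rectangle layout, and the choice to treat rectangles that merely touch as overlapping are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    leaf: bool
    # Each entry is (I, p): p is a tuple-identifier in a leaf, a child Node otherwise.
    entries: List[tuple] = field(default_factory=list)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the rectangles share at least one point (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(t: Node, s: Rect, out: list) -> list:
    """Collect the identifiers of all leaf entries whose rectangle overlaps s."""
    for rect, p in t.entries:
        if overlaps(rect, s):
            if t.leaf:
                out.append(p)       # qualifying record
            else:
                search(p, s, out)   # descend into the overlapping subtree
    return out


leaf_a = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf_b = Node(True, [((5, 5, 6, 6), "c")])
root = Node(False, [((0, 0, 3, 3), leaf_a), ((5, 5, 6, 6), leaf_b)])
hits = search(root, (0.5, 0.5, 2.5, 2.5), [])
```

Because sibling rectangles may overlap, more than one subtree can be visited for a single query, which is why the loop does not stop at the first match.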

14 Example of Searching (no text captured for this slide)

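15 Insertion
- Inserting a record is similar to insertion in a B-tree: new records are added to the leaves, nodes that overflow are split, and splits propagate up the tree
Insert(T, E) {
    L = ChooseLeaf(T, E);
    install E in L;
    LL = NULL;
    IF (L overflows) LL = SplitNode(L);   // L holds M+1 entries; split it into L and LL
    AdjustTree(L, LL);                    // propagate changes (and any split) upward
    IF (the root was split) create a new root whose children are the two resulting nodes;
}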

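16 Insertion - ChooseLeaf()
ChooseLeaf(T, E) {   // returns the leaf node N in which to place E
    SET N = T;
    IF (N is a non-leaf node) {
        // ties are resolved by choosing the entry with the rectangle of smallest area
        find the entry F in N whose rectangle F.FI needs the least enlargement to include E.EI;
        return ChooseLeaf(F.Fp, E);
    } ELSE
        return N;
}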
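A small runnable sketch of the ChooseLeaf descent follows, written iteratively. It assumes a hypothetical `Node` class and the rectangle layout `(xmin, ymin, xmax, ymax)`; the helper names `area`, `union`, and `enlargement` are illustrative and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    leaf: bool
    entries: List[tuple] = field(default_factory=list)  # (rect, child-or-identifier)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, new: Rect) -> float:
    """Extra area r would need in order to also cover new."""
    return area(union(r, new)) - area(r)


def choose_leaf(t: Node, rect: Rect) -> Node:
    """Follow the entry needing least enlargement (ties: smallest area) down to a leaf."""
    n = t
    while not n.leaf:
        _, child = min(n.entries, key=lambda e: (enlargement(e[0], rect), area(e[0])))
        n = child
    return n


leaf_a = Node(True, [((0, 0, 1, 1), "a")])
leaf_b = Node(True, [((10, 10, 12, 12), "b")])
root = Node(False, [((0, 0, 1, 1), leaf_a), ((10, 10, 12, 12), leaf_b)])
```

Here `choose_leaf(root, (1, 1, 2, 2))` picks `leaf_a`, since growing its rectangle costs 3 units of area while growing the other costs 117.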

17 Insertion - AdjustTree()
AdjustTree(L, LL) {
    SET N = L; SET NN = LL;
    IF (N is root) return;   // check if done
    // the next three lines adjust the covering rectangle in the parent entry
    SET P = N.parent;
    SET En to be N's entry in P;
    ADJUST EnI so that it tightly encloses all entry rectangles in N;
    SET PP = NULL;
    IF (NN != NULL) {
        CREATE Enn;   // Enn.p = NN, EnnI enclosing all rectangles in NN
        P.add(Enn);
        IF (P overflows) PP = SplitNode(P);
    }
    AdjustTree(P, PP);   // move up to the next level
}

18 Deletion
- Remove index record E from an R-tree
Delete(T, E) {
    L = FindLeaf(T, E);
    IF (L != NULL) {
        Remove(E, L);   // remove E from L
        CondenseTree(L);
        IF (root node has only one child) make the child the new root;
    }
}

19 Deletion - FindLeaf()
- Given an R-tree whose root node is T, find the leaf node containing the index entry E
FindLeaf(T, E) {   // returns the leaf containing E, or NULL
    IF (T is not a leaf) {
        FOR EACH (F in T) {
            IF (F.FI overlaps E.EI) {
                L = FindLeaf(F.Fp, E);
                IF (L != NULL) return L;
            }
        }
    } ELSE {
        FOR EACH (F in T)
            IF (F matches E) return T;
    }
    return NULL;
}
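The FindLeaf search can be sketched in Python as below. The point it illustrates is that a subtree whose rectangle overlaps the entry may still not contain it, so the search must continue with the remaining siblings. The `Node` class, the rectangle layout `(xmin, ymin, xmax, ymax)`, and matching an entry by rectangle plus identifier are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    leaf: bool
    entries: List[tuple] = field(default_factory=list)  # (rect, child-or-identifier)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_leaf(t: Node, rect: Rect, ident) -> Optional[Node]:
    """Return the leaf holding the entry (rect, ident), or None if it is absent."""
    if t.leaf:
        for r, p in t.entries:
            if r == rect and p == ident:
                return t
        return None
    for r, child in t.entries:
        if overlaps(r, rect):
            found = find_leaf(child, rect, ident)
            if found is not None:
                return found   # stop at the first subtree that contains the entry
    return None


leaf_a = Node(True, [((0, 0, 2, 2), "a")])
leaf_b = Node(True, [((1, 1, 3, 3), "b")])
root = Node(False, [((0, 0, 2, 2), leaf_a), ((1, 1, 3, 3), leaf_b)])
```

In this tree the lookup for `"b"` first descends into `leaf_a`, because the rectangles overlap, finds nothing, and only then reaches `leaf_b`.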

20 Deletion - CondenseTree()
CondenseTree(L) {
    CT1: SET N = L;
         SET Q = empty;   // the set of eliminated nodes
    CT2: IF (N is root) GOTO CT6;
         SET P = N.parent;
         SET En to be N's entry in P;
    CT3: IF (N has fewer than m entries) {
             DELETE (En, P);   // delete En from P
             Q.add(N);
         } ELSE {
    CT4:     adjust EnI to tightly contain all entries in N;
         }
    CT5: SET N = P; GOTO CT2;
    CT6: FOR EACH (node in Q)
             FOR EACH (E in node) Insert(T, E);   // entries of non-leaf nodes are re-inserted at their original level
}

21 Updates and Other Operations
- Update
  - An update is performed as a deletion followed by a re-insertion
- Other operations
  - Find all data objects completely contained in a search area, or all objects that contain a search area
  - Range deletion

22 Node Splitting
- A node must be split when an entry is inserted into a node that is already full
- The total area of the two covering rectangles after a split should be minimized, because it strongly affects search efficiency
- There are three splitting algorithms: the exhaustive algorithm, the quadratic-cost algorithm, and the linear-cost algorithm

23 Node Splitting - Exhaustive Algorithm (no text captured for this slide)

24 Node Splitting - Quadratic-Cost Algorithm
- It attempts to find a small-area split, but is not guaranteed to find one with the smallest possible area
- The cost is quadratic in M and linear in the number of dimensions
- Process
  1. Pick the first entry for each group
  2. Check if done
  3. Select an entry to assign

25 Quadratic-Cost Algorithm - PickSeeds()
- Select two entries to be the first elements of the groups
- Process
  1. For each pair of entries, calculate the inefficiency of grouping them together
  2. Choose the most wasteful pair
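A minimal sketch of this seed selection follows. It takes the "inefficiency" of a pair to be the area of the rectangle covering both entries minus the areas of the two entries themselves. The rectangle layout `(xmin, ymin, xmax, ymax)` and the function names are assumptions for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Return the indices of the pair that would waste the most area if grouped."""
    def waste(pair: Tuple[int, int]) -> float:
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])

    # Examining every pair is what makes this step quadratic in the number of entries.
    return max(combinations(range(len(rects)), 2), key=waste)


rects = [(0, 0, 1, 1), (0.5, 0.5, 1.5, 1.5), (9, 9, 10, 10)]
seeds = pick_seeds(rects)
```

With these three rectangles the wasted areas are 0.25, 98, and 88.25, so the first and third entries become the seeds.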

26 Quadratic-Cost Algorithm - PickNext()
- Select one remaining entry for classification in a group
- Process
  1. Determine the cost of putting each entry in each group
  2. Find the entry with the greatest preference for one group
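The selection step can be sketched as follows, taking the "cost" of putting an entry in a group to be the area by which that group's covering rectangle would have to grow. The rectangle layout `(xmin, ymin, xmax, ymax)` and all function names are assumptions for illustration.

```python
from typing import List, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(group: Rect, new: Rect) -> float:
    """Area increase needed for the group's covering rectangle to include new."""
    return area(union(group, new)) - area(group)


def pick_next(remaining: List[Rect], group1: Rect, group2: Rect) -> int:
    """Index of the remaining entry with the strongest preference for one group."""
    def preference(i: int) -> float:
        d1 = enlargement(group1, remaining[i])
        d2 = enlargement(group2, remaining[i])
        return abs(d1 - d2)

    return max(range(len(remaining)), key=preference)


g1, g2 = (0, 0, 1, 1), (9, 9, 10, 10)
remaining = [(4, 4, 5, 5), (1, 1, 2, 2)]
chosen = pick_next(remaining, g1, g2)
```

The second remaining rectangle is chosen here: it costs 3 in one group and 80 in the other, a difference of 77, while the first rectangle's difference is only 11.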

27 Node Splitting - Linear-Cost Algorithm
- The cost is linear in M and in the number of dimensions
- It is identical to the quadratic-cost algorithm but uses different versions of PickSeeds and PickNext
- Process
  1. Find the extreme rectangles along all dimensions
  2. Adjust for the shape of the rectangle cluster
  3. Select the most extreme pair
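The three steps can be sketched roughly as below. Along each dimension the sketch finds the rectangle with the highest low side and the one with the lowest high side, divides their separation by the width of the whole set along that dimension, and keeps the pair with the greatest normalized separation. The rectangle layout `(xmin, ymin, xmax, ymax)`, the function name, and the handling of degenerate dimensions (skipped, with `None` returned if no dimension qualifies) are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Assumed 2-D layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def linear_pick_seeds(rects: List[Rect], dims: int = 2) -> Optional[Tuple[int, int]]:
    """Return indices of the most extreme pair, or None if every dimension is degenerate."""
    best = None  # (normalized separation, index of lowest high side, index of highest low side)
    for d in range(dims):
        lo, hi = d, d + dims  # tuple positions of the low and high side in dimension d
        # Step 1: extreme rectangles along this dimension
        i = max(range(len(rects)), key=lambda k: rects[k][lo])  # highest low side
        j = min(range(len(rects)), key=lambda k: rects[k][hi])  # lowest high side
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        if i == j or width == 0:
            continue  # assumption: skip a dimension that cannot separate two entries
        # Step 2: normalize by the width of the whole set along this dimension
        sep = (rects[i][lo] - rects[j][hi]) / width
        # Step 3: keep the pair with the greatest normalized separation
        if best is None or sep > best[0]:
            best = (sep, j, i)
    return None if best is None else (best[1], best[2])


rects = [(0, 0, 1, 1), (2, 0, 3, 1), (8, 0, 9, 1)]
seeds = linear_pick_seeds(rects)
```

Each dimension is scanned a constant number of times, so the work grows linearly with the number of entries and with the number of dimensions, in contrast to the pairwise comparison of the quadratic version.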

28 Outline
- Introduction
- R-Tree Index Structure
- Searching and Updating
- Performance Tests
  - Performance Tests
  - CPU Cost of Inserting Records
  - CPU Cost of Deleting Records
  - Search Performance: Pages Touched
  - Search Performance: CPU Cost
  - Space Efficiency
  - Second Series of Tests
  - CPU Cost of Inserts and Deletes vs. Amount of Data
  - Search Performance vs. Amount of Data: Pages Touched
  - Search Performance vs. Amount of Data: CPU Cost
  - Space Required for R-Tree vs. Amount of Data
- Conclusion

29 Performance Tests
- R-trees were implemented in C under Unix on a VAX 11/780 computer
- The purpose was to choose values for M and m, and to evaluate the different node-splitting algorithms
- Five page sizes were tested, corresponding to different values of M
- Values tested for m were M/2, M/3, and 2
- All tests used two-dimensional data

Bytes per page | Max entries per page (M)
128 | 6
256 | 12
512 | 25
1024 | 50
2048 | 102

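30 CPU Cost of Inserting Records (no text captured for this slide)

31 CPU Cost of Deleting Records (no text captured for this slide)

32 Search Performance: Pages Touched (no text captured for this slide)

33 Search Performance: CPU Cost (no text captured for this slide)

34 Space Efficiency (no text captured for this slide)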

35 Second Series of Tests
- The second series measured R-tree performance as a function of the amount of data in the index
- The same sequence of test operations as before was run on samples containing 1057, 2238, 3295, and 4559 rectangles
- Parameters
  - Linear algorithm with m = 2
  - Quadratic algorithm with m = M/3
  - Both with a page size of 1024 bytes (M = 50)

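36 CPU Cost of Inserts and Deletes vs. Amount of Data (no text captured for this slide)

37 Search Performance vs. Amount of Data: Pages Touched (no text captured for this slide)

38 Search Performance vs. Amount of Data: CPU Cost (no text captured for this slide)

39 Space Required for R-Tree vs. Amount of Data (no text captured for this slide)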

40 Outline
- Introduction
- R-Tree Index Structure
- Searching and Updating
- Performance Tests
- Conclusion

41 Conclusion
- The author proposed a useful index structure, the R-tree, for multi-dimensional data
- The author also gave three different splitting algorithms, ran tests on them, and concluded that the linear node-split algorithm is the most efficient approach
- The R-tree would be easy to add to any relational database system

