
1 R-TREES: A Dynamic Index Structure for Spatial Searching by A. Guttman, SIGMOD 1984. Shahram Ghandeharizadeh Computer Science Department University of Southern California

2 Motivating Example
- Type in your street address in Google.

3 Example (Cont…)
- Show me all the pizza places close by.

4 Terminology
- The example query is termed a spatial query.
- R-tree is a spatial index structure.
  - K-D-B trees are useful for point data only.
    - Exact-point lookup! Example: show me the USC Salvatori Computer Science building.
  - R-tree represents data objects as intervals in several dimensions.
    - Exact-point and range lookups! Example: show me all pizza places within a 2-mile radius of the USC Salvatori Computer Science building.
- R-tree is:
  - A height-balanced tree similar to a B-tree, with index records in its leaf nodes containing pointers to data objects.
  - Each node is a disk page.
  - Assumes each tuple has a unique identifier, RID.

5 R-Tree: Leaf Nodes
- Leaf nodes contain index records of the form (I, tuple-identifier):
  - tuple-identifier is the RID.
  - I is an n-dimensional rectangle that bounds the indexed spatial object.
  - I = (I_0, I_1, …, I_{n-1}), where n is the number of dimensions.
  - I_i is a closed bounded interval [a, b] describing the extent of the object along dimension i.
  - Values for a and b might be infinity, indicating an unbounded object along dimension i.

6 R-Tree: Non-leaf nodes
- Non-leaf nodes contain entries of the form (I, child-pointer):
  - Child-pointer is the address of a lower node in the R-Tree.
  - I covers all rectangles in the lower node's entries.
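
The two entry forms can be sketched in Python. This is only an illustration: the class names `LeafEntry`, `NodeEntry`, `Node` and the helper `cover` are hypothetical, and an in-memory object reference stands in for the disk page address.

```python
from dataclasses import dataclass, field
from typing import Tuple

Interval = Tuple[float, float]   # closed interval [a, b] along one dimension
Rect = Tuple[Interval, ...]      # I = (I_0, ..., I_{n-1})

@dataclass
class LeafEntry:
    I: Rect        # rectangle bounding the indexed spatial object
    rid: int       # tuple-identifier (RID)

@dataclass
class NodeEntry:
    I: Rect        # covers all rectangles in the child node's entries
    child: "Node"  # assumption: object reference instead of a disk page address

@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)

def cover(rects):
    """Smallest rectangle that contains every rectangle in rects."""
    dims = range(len(rects[0]))
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in dims)

# A 2-D leaf with two objects, and the parent entry that covers it.
leaf = Node(True, [LeafEntry(((0, 2), (0, 1)), rid=7),
                   LeafEntry(((1, 4), (3, 5)), rid=9)])
parent_entry = NodeEntry(cover([e.I for e in leaf.entries]), leaf)
print(parent_entry.I)  # ((0, 4), (0, 5))
```

An unbounded extent along a dimension could be written with `float("inf")` as an interval endpoint.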

7 R-Tree: A 2-D (n=2) Example

8 R-Tree: Non-leaf nodes (Questions?)
- Recall that non-leaf nodes contain entries of the form (I, child-pointer).
- What is the child-pointer? A disk page address!
- How about I? What is it? An n-dimensional rectangle: I = (I_0, I_1, …, I_{n-1}).

13 R-tree: Properties
- Assume:
  1. M = maximum number of entries in a node.
  2. m <= M/2 is the minimum number of entries in a node.
  3. N = number of index records.
- R-tree has the following properties:
  - Every leaf node contains between m and M index records. The root node is the exception.
  - For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.
  - Every non-leaf node has between m and M children. The root node is the exception.
  - For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.
  - The root node has at least two children unless it is a leaf.
  - All leaves appear on the same level.
  - The height of the tree is at most ceiling(log_m N) - 1.
  - Worst-case utilization for all nodes except the root is m/M.
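
As a worked example of the height formula, read as a worst-case bound, the sketch below evaluates ceiling(log_m N) - 1 with integer arithmetic to avoid floating-point rounding. The function name `max_height` and the sample values of N, m and M are assumptions chosen for illustration.

```python
def max_height(num_records: int, m: int) -> int:
    """Evaluate ceiling(log_m N) - 1 for N >= 2 and m >= 2 using integers only."""
    levels, reach = 0, 1
    while reach < num_records:   # find the smallest k with m**k >= N
        reach *= m
        levels += 1
    return levels - 1

N, m, M = 1_000_000, 25, 50      # hypothetical sizes
print(max_height(N, m))          # 4
print(m / M)                     # worst-case node utilization: 0.5
```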

14 Searching
- Descend from root to leaf in a B+-tree manner.
- If multiple sub-trees contain the point of interest, then follow all of them.
- Assume:
  - EI denotes the rectangle part of an index entry E,
  - Ep denotes the tuple-identifier or child-pointer.
- Search (T: Root of the R-tree, S: Search Rectangle)
  - If T is not a leaf, check each entry E to determine whether EI overlaps S. For all overlapping entries, invoke Search(Ep, S).
  - If T is a leaf, check all entries E to determine whether EI overlaps S. If so, E is a qualifying record.
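
The Search procedure can be sketched as follows. The node layout (a dict with a `leaf` flag and a list of `(EI, Ep)` pairs) is an assumption made to keep the example short, not a layout prescribed by the slides.

```python
def overlaps(a, b):
    """True if two n-dimensional rectangles share at least one point."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def search(T, S):
    """Return the tuple-identifiers of all leaf entries whose rectangle overlaps S."""
    results = []
    for EI, Ep in T["entries"]:
        if overlaps(EI, S):
            if T["leaf"]:
                results.append(Ep)             # Ep is a tuple-identifier
            else:
                results.extend(search(Ep, S))  # Ep is a child node: follow every match
    return results

leaf1 = {"leaf": True, "entries": [(((0, 1), (0, 1)), "a"), (((2, 3), (2, 3)), "b")]}
leaf2 = {"leaf": True, "entries": [(((5, 6), (5, 6)), "c"), (((7, 8), (5, 9)), "d")]}
root = {"leaf": False, "entries": [(((0, 3), (0, 3)), leaf1), (((5, 8), (5, 9)), leaf2)]}

print(search(root, ((2.5, 5.5), (2.5, 5.5))))  # ['b', 'c']
```

Both subtrees are visited here because the search rectangle overlaps both covering rectangles, which is the multi-path behavior that distinguishes this from a B+-tree descent.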

15 Insertion
- Similar to B-trees, new index records are added to the leaves, nodes that overflow are split, and splits propagate up the tree.
- Insert (T: Root of the R-tree, E: new index entry)
  1. Find position for new record: Invoke ChooseLeaf to select a leaf node L in which to place E.
  2. Add record to leaf node: If L has room for E, then install E. Otherwise, invoke SplitNode to obtain L and LL containing E and all the old entries of L.
  3. Propagate changes upwards: Invoke AdjustTree on L, also passing LL if a split was performed.
  4. Grow tree taller: If node split propagation caused the root to split, create a new root whose children are the two resulting nodes.

16 Insertion: ChooseLeaf
- ChooseLeaf (E: new index entry)
  1. Initialize: Set N to be the root node.
  2. Leaf check: If N is a leaf, return N.
  3. Choose subtree: Let F be the entry in N whose rectangle FI needs least enlargement to include E. Resolve ties by choosing the entry with the rectangle of smallest area.
  4. Descend until a leaf is reached: Set N to be the child node pointed to by Fp and repeat from step 2.
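
A minimal sketch of ChooseLeaf, assuming nodes are dicts with a `leaf` flag and `(FI, Fp)` entry pairs (a hypothetical layout). The helpers `area`, `union` and `enlargement` are illustrative names.

```python
def area(r):
    """Area (volume) of an n-dimensional rectangle."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result

def union(a, b):
    """Smallest rectangle covering both a and b."""
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))

def enlargement(r, new):
    """Extra area r needs in order to include new."""
    return area(union(r, new)) - area(r)

def choose_leaf(root, EI):
    N = root                                   # step 1
    while not N["leaf"]:                       # step 2
        # Step 3: least enlargement first, smallest area as the tie-breaker.
        FI, Fp = min(N["entries"],
                     key=lambda e: (enlargement(e[0], EI), area(e[0])))
        N = Fp                                 # step 4
    return N

leaf_a = {"leaf": True, "entries": []}
leaf_b = {"leaf": True, "entries": []}
root = {"leaf": False, "entries": [(((0, 4), (0, 4)), leaf_a),
                                   (((6, 10), (6, 10)), leaf_b)]}

# Covering the new rectangle costs 9 units of area via leaf_a and 33 via leaf_b.
print(choose_leaf(root, ((3, 5), (3, 5))) is leaf_a)  # True
```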

17 SplitNode: Node Splitting
- A full node contains M entries. Divide the collection of M+1 entries between 2 nodes.
- Objective: Make it as unlikely as possible for the resulting two new nodes to be examined on subsequent searches.
- Heuristic: The total area of the two covering rectangles after a split should be minimized.
- [Figure annotation: Total area is larger!]

19 Node Splitting: How?
- How to find the minimum area node split?
  1. Exhaustive algorithm,
  2. Quadratic-cost algorithm,
  3. Linear-cost algorithm.

20 Exhaustive Algorithm
- Generate all possible groupings and choose the one with minimum area.
- Number of possibilities ~ 2^(M-1).
  - M ~ 50
  - Number of possibilities ~ 600 trillion (2^49 is about 5.6 x 10^14).
  - US deficit pales!
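
The count can be checked directly. The variable names below are illustrative only.

```python
M = 50                             # entries per node, as on the slide
candidate_splits = 2 ** (M - 1)    # approximate number of groupings to examine
print(f"{candidate_splits:,}")     # 562,949,953,421,312
```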

22 Quadratic-Cost algorithm
- A heuristic to find a small-area split.
- Cost is quadratic in M and linear in the number of dimensions.
- Pick two of the M+1 entries to be the first elements of the two new groups.
  - Choose the pair that would waste the most area if both were put in the same group.
- Assign the remaining entries to groups one at a time.
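
A sketch of the quadratic split under stated assumptions. The seed selection follows the "most wasted area" rule above. For the remaining entries, the code uses the PickNext rule from Guttman's paper (take the entry with the greatest difference in enlargement between the two groups), which the slide does not spell out. It omits the minimum-fill check against m, so a group may end up with fewer than m entries. Function names are hypothetical.

```python
from itertools import combinations

def area(r):
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result

def union(a, b):
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))

def pick_seeds(rects):
    """Pair of indices whose combined cover wastes the most area."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def quadratic_split(rects):
    s1, s2 = pick_seeds(rects)
    groups = ([s1], [s2])
    covers = [rects[s1], rects[s2]]
    remaining = [i for i in range(len(rects)) if i not in (s1, s2)]

    def costs(i):
        # Enlargement each group's cover would need to take entry i.
        return [area(union(c, rects[i])) - area(c) for c in covers]

    while remaining:
        # PickNext: the entry with the strongest preference for one group.
        nxt = max(remaining, key=lambda i: abs(costs(i)[0] - costs(i)[1]))
        d = costs(nxt)
        g = 0 if d[0] <= d[1] else 1
        groups[g].append(nxt)
        covers[g] = union(covers[g], rects[nxt])
        remaining.remove(nxt)
    return groups

rects = [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((8, 9), (8, 9)), ((0, 2), (0, 1))]
print(pick_seeds(rects))       # (0, 2)
print(quadratic_split(rects))  # ([0, 3, 1], [2])
```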

25 Linear Cost Algorithm
- Identical to Quadratic with the following differences:
  - Uses a different version of PickSeeds.
  - PickNext simply chooses any of the remaining entries.
- Linear: Choose two objects that are furthest apart.
- Quadratic: Choose two objects that create as much empty space as possible.
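
One way to read "furthest apart" is the linear PickSeeds from Guttman's paper: along each dimension, take the rectangle with the highest low side and the one with the lowest high side, normalize their separation by the width of the whole set along that dimension, and keep the pair with the greatest normalized separation. The sketch below follows that reading. The function name and the fallback for degenerate input are assumptions.

```python
def linear_pick_seeds(rects):
    """Return indices of two seed rectangles, visiting each entry once per dimension."""
    best = None
    indices = range(len(rects))
    for d in range(len(rects[0])):
        hi_low = max(indices, key=lambda i: rects[i][d][0])    # highest low side
        lo_high = min(indices, key=lambda i: rects[i][d][1])   # lowest high side
        width = max(r[d][1] for r in rects) - min(r[d][0] for r in rects)
        sep = (rects[hi_low][d][0] - rects[lo_high][d][1]) / width if width else 0.0
        if hi_low != lo_high and (best is None or sep > best[0]):
            best = (sep, lo_high, hi_low)
    if best is None:          # assumption: fall back to the first two entries
        return 0, 1
    return best[1], best[2]

rects = [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((8, 9), (8, 9)), ((0, 2), (0, 1))]
print(linear_pick_seeds(rects))  # (0, 2)
```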

26 Comparison
- Linear node-split is simple, fast, and as good as quadratic in search performance!
  - Quality of the splits is slightly worse, but this does not noticeably affect search performance.

28 AdjustTree
- Ascend from a leaf node L to the root, adjusting covering rectangles and propagating node splits.
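
The covering-rectangle half of AdjustTree can be sketched as below. Split propagation is deliberately left out, and the `Node` class with a `parent` reference is a hypothetical in-memory layout.

```python
def cover(rects):
    """Smallest rectangle that contains every rectangle in rects."""
    dims = range(len(rects[0]))
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in dims)

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.rects = []      # entry rectangles stored in this node
        self.children = []   # child nodes, parallel to rects (empty for a leaf)

def adjust_tree(L):
    """Walk from leaf L to the root, tightening each parent entry's rectangle."""
    N = L
    while N.parent is not None:
        P = N.parent
        P.rects[P.children.index(N)] = cover(N.rects)
        N = P

root = Node()
leaf = Node(parent=root)
leaf.rects = [((0, 1), (0, 1))]
root.children, root.rects = [leaf], [((0, 1), (0, 1))]

leaf.rects.append(((3, 4), (2, 5)))   # a new entry was installed in the leaf
adjust_tree(leaf)
print(root.rects[0])                  # ((0, 4), (0, 5))
```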

29 Deletes
- Straightforward. The only complication is underflow:
  - An under-full node can be merged with whichever sibling will have its area increased least.
  - Alternatively, the under-full node is removed and its orphaned entries are inserted back into the R-Tree.

30 R-Tree

31 R-tree Variations
- R+-tree enhances retrieval performance by avoiding visits to multiple paths when processing point queries.
  - No overlap between minimum bounding rectangles at the same level.
  - A specific object's entry might be duplicated.
  - Insertions might lead to a series of update operations in a chain reaction.
  - Under certain circumstances, the structure may lead to a deadlock, e.g., every rectangle encloses a smaller one.

32 R*-tree [1990]
- Node split is more sophisticated.
  - Does not obey the limitation of the number of pairs per node.
  - When a node overflows, p entries are extracted and reinserted in the tree (p might be 25%).
  - Considers minimization of:
    - the overlap between minimum bounding rectangles at the same level,
    - the perimeter of the produced minimum bounding rectangles.
- Insertion is more expensive while retrievals are faster.

33 Static R-trees
- Assumes the dataset is known in advance.
- Static R-trees are more efficient than dynamic ones:
  - Tree structure is more compact,
  - Contains fewer nodes,
  - Overlap between minimum bounding rectangles is reduced.

34 Summary
- R-tree is a spatial index structure that provides competitive average performance.
- Many different variations in the literature:
  - Spatio-temporal access methods, 3-d R-tree.
  - Historical R-trees and Time-Parameterized R-tree for spatiotemporal applications.
- Have been used to speed up operations in OLAP applications, data warehouses and data mining.

