
Project Proposals Simonas Šaltenis Aalborg University Nykredit Center for Database Research Department of Computer Science, Aalborg University.


1 Project Proposals
Simonas Šaltenis, Aalborg University
Nykredit Center for Database Research, Department of Computer Science, Aalborg University

2 Outline (WIM workshop, Gl. Vrå Slot, December 6-8, 2001)
- An overview of the R-tree and the TPR-tree
- Project proposals:
  - Update-Efficient TPR-tree
  - Time-parameterized SS-tree

3 Spatial Indexing With the R-Tree
[Figure: an example query over points p1-p13, which are grouped into MBRs R3-R7, which are in turn grouped into R1 and R2; shown beside the corresponding tree, whose leaf entries hold pointers to data tuples.]

4 R-tree Properties
- Leaf entry = <point, rid>
- Non-leaf entry = <MBR, ptr>
- MBR: a Minimum Bounding Rectangle of all points in the subtree pointed to by ptr.
- The R-tree is a balanced tree: all leaves are at the same depth from the root.
- Through the insertion and deletion algorithms, nodes are kept at least m% full (except the root). m is the minimum fill factor and is usually chosen to be 40%; depending on the workload, the average fill factor is usually about 70%.

5 Grow-Post Trees
The R-tree is a Grow-Post tree.
[Figure: internal nodes holding bounding predicates BP 1, BP 2, ..., BP n, above the leaf nodes.]
Bounding predicate (BP) = something that describes the entries in a subtree.
Building blocks of the algorithms:
- Consistent(BP, Q): returns true if results of query Q can be under BP (in the R-tree: the MBR intersects Q).
- Union(): computes a BP of a collection of entries (in the R-tree: computes an MBR, i.e., the minimum and maximum in all dimensions).
- Penalty(BP, E): returns an estimate of how much "worse" BP becomes if E is inserted under it.
- PickSplit(node): splits a page of entries into two groups.
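The four building blocks can be sketched for the R-tree case as follows. This is only an illustrative sketch, not the code behind the slides: the `MBR` class, the two-dimensional setting, and the function names are assumptions, and `pick_split` is a deliberately naive placeholder (sort by x-center and cut in half) rather than one of the R-tree's real split heuristics.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MBR:
    """Axis-aligned rectangle in two dimensions (assumed representation)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def union(boxes: Iterable[MBR]) -> MBR:
    """Union: minimum and maximum in all dimensions."""
    boxes = list(boxes)
    return MBR(min(b.xmin for b in boxes), min(b.ymin for b in boxes),
               max(b.xmax for b in boxes), max(b.ymax for b in boxes))


def consistent(bp: MBR, q: MBR) -> bool:
    """Consistent: true if the MBR intersects the query rectangle."""
    return (bp.xmin <= q.xmax and q.xmin <= bp.xmax and
            bp.ymin <= q.ymax and q.ymin <= bp.ymax)


def penalty(bp: MBR, e: MBR) -> float:
    """Penalty: area enlargement needed for bp to cover e."""
    return union([bp, e]).area() - bp.area()


def pick_split(entries: List[MBR]) -> Tuple[List[MBR], List[MBR]]:
    """PickSplit placeholder: sort by x-center and cut in half (an assumption)."""
    ordered = sorted(entries, key=lambda b: (b.xmin + b.xmax) / 2)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]
```

Any other bounding predicate (for example a sphere) would plug into the same four functions, which is the point of the Grow-Post abstraction.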

6 Range Query in R-trees
Answering range query Q in R-trees:
1. Start at the root.
2. If the current node is a non-leaf, then for each entry: if Consistent(MBR, Q), search the subtree identified by ptr.
3. If the current node is a leaf, then for each entry E: if E overlaps Q, rid identifies a point that overlaps Q.
Note: We may have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)
Worst-case performance is O(n)! In practice, however, R-trees exhibit good query performance for various data sets.
What about insertion and deletion?
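The three steps translate almost directly into a recursive search. The sketch below assumes a hypothetical in-memory `Node` whose leaf entries are `(point, rid)` pairs and whose non-leaf entries are `(mbr, child)` pairs, with rectangles written as `(xmin, ymin, xmax, ymax)` tuples; a disk-based tree would fetch pages instead of following Python references.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """Consistent for MBRs: rectangles are (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Leaf: list of ((x, y), rid). Non-leaf: list of (mbr, child_node).
    entries: list = field(default_factory=list)


def range_query(node, q):
    """Return the rids of all points inside query rectangle q."""
    result = []
    if node.is_leaf:
        for (x, y), rid in node.entries:
            if q[0] <= x <= q[2] and q[1] <= y <= q[3]:
                result.append(rid)
    else:
        for mbr, child in node.entries:
            # Several subtrees may qualify, unlike a B-tree equality search.
            if intersects(mbr, q):
                result.extend(range_query(child, q))
    return result


# A tiny two-leaf tree used for illustration.
leaf1 = Node(True, [((1, 1), "p1"), ((2, 3), "p2")])
leaf2 = Node(True, [((6, 6), "p3"), ((8, 7), "p4")])
root = Node(False, [((1, 1, 2, 3), leaf1), ((6, 6, 8, 7), leaf2)])
```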

7 Insert Entry E
Insertion algorithm:
1. cn = root.
2. If cn is a leaf, go to step 5.
3. From all entries in cn, choose the entry e with the smallest Penalty(e.BP, E). (In R-trees, choose the entry whose MBR needs the least enlargement to cover E; resolve ties by going to the child with the smallest area.)
4. cn = e.ptr; go to step 2.
5. Insert E into cn. Call PropogateUp(cn).
PropogateUp(cn):
- If cn is overfull, call PickSplit(cn) to produce cn1 and cn2, replace cn's old entry in its parent by e1 = Union(cn1) and e2 = Union(cn2), and call PropogateUp on cn's parent.
- Otherwise, if e = Union(cn) is different from cn's old entry in its parent, replace the old entry with e and call PropogateUp on cn's parent.
Create a new root with two entries whenever the root is split.
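A compact sketch of the insertion procedure with upward propagation is given below. It rests on several assumptions that are not in the slides: rectangles are `(xmin, ymin, xmax, ymax)` tuples, the node capacity `MAX_ENTRIES` is 4, nodes keep a parent reference, and `pick_split` is a naive halving by x-center that does not enforce a minimum fill factor.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_ENTRIES = 4  # assumed node capacity


def union(boxes):
    boxes = list(boxes)
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def penalty(bp, e):
    return area(union([bp, e])) - area(bp)


@dataclass(eq=False)
class Entry:
    bp: tuple
    child: Optional["Node"] = None  # None for a data entry in a leaf


@dataclass(eq=False)
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def bp(self):
        return union(e.bp for e in self.entries)


def pick_split(entries):
    """Naive placeholder split: order by x-center and cut in half."""
    ordered = sorted(entries, key=lambda e: (e.bp[0] + e.bp[2]) / 2)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


class RTree:
    def __init__(self):
        self.root = Node(is_leaf=True)

    def insert(self, box):
        cn = self.root
        while not cn.is_leaf:
            # Smallest penalty first; ties go to the smallest-area child.
            best = min(cn.entries, key=lambda e: (penalty(e.bp, box), area(e.bp)))
            cn = best.child
        cn.entries.append(Entry(box))
        self._propagate_up(cn)

    def _propagate_up(self, cn):
        while cn is not None:
            parent, changed = cn.parent, False
            if len(cn.entries) > MAX_ENTRIES:
                left, right = pick_split(cn.entries)
                cn.entries = left
                sibling = Node(cn.is_leaf, right, parent)
                for e in right:
                    if e.child is not None:
                        e.child.parent = sibling
                if parent is None:  # the root was split: grow a new root
                    parent = Node(is_leaf=False)
                    self.root = cn.parent = sibling.parent = parent
                    parent.entries.append(Entry(cn.bp(), cn))
                parent.entries.append(Entry(sibling.bp(), sibling))
                changed = True
            if parent is not None:
                old = next(e for e in parent.entries if e.child is cn)
                if old.bp != cn.bp():
                    old.bp = cn.bp()
                    changed = True
            if not changed:
                break  # nothing above this node is affected
            cn = parent
```

The loop stops climbing as soon as a node neither split nor changed its bounding rectangle, which mirrors the "otherwise, if e = Union(cn) is different" condition.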

8 Heuristics for Penalty
The heuristics of least area enlargement and smallest area are used in the R-tree's Penalty.
[Figure: the example R-tree (MBRs R1-R7, points p1-p13) with an additional point p14.]

10 Deletion in R-trees
Delete entry E:
1. Using the search procedure, find the leaf cn where entry E is located.
2. Remove E from cn. Call PropogateUp(cn).
PropogateUp(cn):
- If cn is underfull, deallocate the node cn, remove cn's entry in its parent, call PropogateUp on cn's parent, and reinsert all of cn's entries or merge them into some other node.
- Otherwise, if e = Union(cn) is different from cn's old entry in its parent, replace the old entry with e and call PropogateUp on cn's parent.
No additional heuristics are involved in Delete; underfull nodes are handled using Insert as a subroutine.

11 Modeling Continuous Movement
In conventional databases, data is assumed constant unless explicitly modified. With continuous movement, this is problematic:
- Too frequent updates
- Outdated, inaccurate data

12 Modeling Continuous Movement
In conventional databases, data is assumed constant unless explicitly modified. With continuous movement, this is problematic:
- Too frequent updates
- Outdated, inaccurate data
Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions.
- We use linear functions to capture the present and future positions.
- Updates are necessary only when the parameters of the functions change.
For example, with a linear function of time in each dimension, the current and anticipated future position of a two-dimensional point can be described by four parameters.
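As a minimal sketch of a time-parameterized position, assume the four parameters are a position (x, y) recorded at some reference time and a velocity (vx, vy). The class name, the field names, and the explicit `t_ref` field are illustrative assumptions; in practice the reference time could be shared by all objects rather than stored per point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    t_ref: float  # time at which (x, y) was recorded (assumed bookkeeping field)
    x: float
    y: float
    vx: float
    vy: float

    def position(self, t: float):
        """Linear extrapolation: p(t) = p(t_ref) + v * (t - t_ref)."""
        dt = t - self.t_ref
        return (self.x + self.vx * dt, self.y + self.vy * dt)
```

The database only needs a new update when the object's velocity or reported position deviates from this function, not every time the object moves.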

13 Queries
- Type 1: objects that intersect a given rectangle at time t.
- Type 2: objects that intersect a given rectangle sometime from t1 to t2.
- Type 3: objects that intersect a given moving rectangle sometime between t1 and t2.
[Figure: objects o1-o4 plotted as position x against time t, both axes marked 1 to 6.]
We can expect that most queries will be concentrated in the sliding window [CT, CT + W], i.e., CT <= t, t1, t2 <= CT + W.
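All three query types can be answered with one test in the one-dimensional setting of the figure (position x against time t). The sketch below is an assumption-laden illustration: the object follows x(t) = x0 + v*t, the query interval is [q_lo, q_hi] at time t1 and its bounds move with velocities v_lo and v_hi (zero for Types 1 and 2), and a Type 1 query is the special case t1 == t2. In two dimensions the same interval computation would be repeated per dimension and the resulting time intervals intersected.

```python
def nonneg_interval(a, b, t1, t2):
    """Sub-interval of [t1, t2] on which a + b*t >= 0, or None if empty."""
    if b == 0:
        return (t1, t2) if a >= 0 else None
    root = -a / b
    lo, hi = (max(t1, root), t2) if b > 0 else (t1, min(t2, root))
    return (lo, hi) if lo <= hi else None


def intersects_1d(x0, v, q_lo, q_hi, t1, t2, v_lo=0.0, v_hi=0.0):
    """True if x(t) = x0 + v*t lies inside the query interval at some t in [t1, t2].

    Query bounds at time t: q_lo + v_lo*(t - t1) and q_hi + v_hi*(t - t1).
    """
    # x(t) - lower(t) >= 0
    above = nonneg_interval(x0 - (q_lo - v_lo * t1), v - v_lo, t1, t2)
    # upper(t) - x(t) >= 0
    below = nonneg_interval((q_hi - v_hi * t1) - x0, v_hi - v, t1, t2)
    if above is None or below is None:
        return False
    return max(above[0], below[0]) <= min(above[1], below[1])
```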

14 Time-Parameterized Rectangles
- The TPR-tree is based on the R-tree.
- Moving points are bounded with time-parameterized rectangles. These rectangles are bounding from now on, i.e., at the current time and at all later times.
- The R-tree allows overlap.
- The tree employs conservative bounding rectangles.
- At any t > t_c we can get a valid R-tree: TPBR-tree(t) = R-tree.
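One way to read "conservative" is sketched below in one dimension: the lower bound starts at the minimum position and moves with the minimum velocity, and the upper bound starts at the maximum position and moves with the maximum velocity. The function names and the `(x, v)` tuple layout are assumptions made for illustration; positions are taken to be valid at the current time t_c.

```python
def conservative_tpbr(points):
    """points: list of (x, v), with x the position at the current time t_c.

    Returns (lo, hi, v_lo, v_hi): interval bounds at t_c and their velocities.
    """
    lo = min(x for x, _ in points)
    hi = max(x for x, _ in points)
    v_lo = min(v for _, v in points)
    v_hi = max(v for _, v in points)
    return lo, hi, v_lo, v_hi


def bounds_at(tpbr, t_c, t):
    """Evaluate the time-parameterized interval at time t >= t_c."""
    lo, hi, v_lo, v_hi = tpbr
    dt = t - t_c
    return lo + v_lo * dt, hi + v_hi * dt
```

Evaluating every bounding interval at a fixed t >= t_c yields ordinary static intervals, which is why the structure at any such time can be treated as a plain R-tree. The price is that the interval is usually not minimal at later times.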

15 Insertion: Grouping Points
How should moving points be grouped (Penalty and PickSplit)? The R-tree's algorithms minimize characteristics of MBRs such as area, overlap, and margin. How does that work for moving points?
[Figure: seven moving points, numbered 1-7, shown in several panels.]

16 Insertion in the TPR-Tree
- The bounding rectangle characteristics (area, overlap, and margin) are functions of time.
- The goal is to minimize these for all time points from now to now + H.
- Minimizing the characteristics for time now + H/2 does not work (e.g., the area of a conservative bounding rectangle is not linear).
- We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals, $\int_{now}^{now+H} A(t)\,dt$, where A(t) is, e.g., the area of an MBR.
- What H to use? H depends on the update rate and on how far queries may reach into the future (W).
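A small worked example shows why the midpoint value is not a substitute for the integral. Assume, for illustration only, a two-dimensional conservative rectangle whose width and height at time now are w0 and h0 and grow linearly at rates dw and dh (both non-negative), with time measured from now. Its area A(t) = (w0 + dw*t)(h0 + dh*t) is quadratic in t, so the integral over [0, H] has a simple closed form.

```python
def area_at(w0, h0, dw, dh, t):
    """Area of the rectangle t time units after now."""
    return (w0 + dw * t) * (h0 + dh * t)


def area_integral(w0, h0, dw, dh, H):
    """Closed-form integral of area_at over [0, H]."""
    return (w0 * h0 * H
            + (w0 * dh + h0 * dw) * H ** 2 / 2
            + dw * dh * H ** 3 / 3)


def midpoint_estimate(w0, h0, dw, dh, H):
    """H times the area at now + H/2; exact only if the area were linear in t."""
    return H * area_at(w0, h0, dw, dh, H / 2)
```

The two quantities differ by dw*dh*H^3/12, so the midpoint estimate is too small whenever the rectangle grows in both dimensions.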

17 Bounding Predicates
Bounding rectangles in R-tree-type indices are used in two different ways:
- For querying (Consistent)
- For deleting (Consistent) and inserting (grouping data entries into nodes: Penalty)

18 Outline
- An overview of the R-tree and the TPR-tree
- Project proposals:
  - Update-Efficient TPR-tree
  - Time-parameterized SS-tree

19 Update-Efficient TPR-tree
Handling hyper-dynamic data:
- 500,000 objects; on average, each object updates its positional information three times per hour => ~400 updates per second (500,000 x 3 / 3,600 s is about 417).
- An update is a deletion followed by an insertion.
Observations:
- Usually, an object's positional information does not change too drastically between updates.
- Most of the update cost is due to the search phase of a deletion (several paths down the tree may be followed).
- We assume that the object reports its previous positional information, so that we know what to delete.
- We need to spend I/Os on making bounding predicates as "tight" as possible, although we may be willing to sacrifice query performance.

20 In-place Updates
Lazy Update R-tree (LUR-tree): a hash table (on object ids) is used to access leaf pages directly (without the search phase of deletion).
An update is one operation:
1. Go to the hash table with the object's id and get the pointer to the leaf page.
2. Update the object's information in this page or, if the object's information changed too "drastically", insert it from the top of the tree using the normal insertion procedure.
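The two-step update can be sketched as follows. Everything named here is hypothetical: `Leaf`, `LazyUpdater`, and the `reinsert` callback stand in for real page and tree code, and the test for a "drastic" change (the new position falls outside the leaf's current MBR) is an assumption chosen for the sketch; the slide leaves the exact criterion open.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    mbr: tuple                                  # (xmin, ymin, xmax, ymax)
    points: dict = field(default_factory=dict)  # object id -> (x, y)


def contains(mbr, p):
    return mbr[0] <= p[0] <= mbr[2] and mbr[1] <= p[1] <= mbr[3]


class LazyUpdater:
    def __init__(self, reinsert):
        self.leaf_of = {}         # hash table: object id -> leaf page
        # Hypothetical top-down insert; must return the leaf that took the point.
        self.reinsert = reinsert

    def register(self, oid, leaf, p):
        leaf.points[oid] = p
        self.leaf_of[oid] = leaf

    def update(self, oid, new_p):
        leaf = self.leaf_of[oid]  # step 1: direct access, no tree search
        if contains(leaf.mbr, new_p):
            leaf.points[oid] = new_p            # step 2a: update in place
            return "in-place"
        del leaf.points[oid]                    # step 2b: change too drastic
        self.leaf_of[oid] = self.reinsert(oid, new_p)
        return "reinserted"
```

In this sketch an in-place update never touches ancestor nodes, because the leaf's MBR is unchanged; that is exactly the case the hash table is meant to make cheap.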

21 Problems to Solve
Problems (that you have to try to solve, refining and applying these ideas to the TPR-tree):
- How do we update bounding rectangles in ancestor nodes?
  Possible solution: a hash table storing the full path from the root to the leaf.
- When do we do a real insertion and when an update in place?
- What do we do when nodes are split or merged? (Can we spend so many I/Os maintaining our hash table?)
  Possible solution: lazy updating of the hash table and use of pointers to split-off nodes, as in R-link trees.

22 Time-Parameterized SS-trees
SS-tree: a Grow-Post tree where bounding predicates are spheres:
- Good for nearest-neighbor (NN) queries
- Compact description of a bounding predicate (independent of dimensionality)
Project: explore time-parameterized SS-trees. Issues to be addressed:
- Writing the Consistent method
- Writing the Penalty method
- Experimentally comparing with the TPR-tree for range queries and NN queries
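As a starting point for the Consistent method, the static (non-moving) case can be sketched as a sphere-versus-rectangle test: a range query can have results under a bounding sphere only if the query rectangle comes within one radius of the center. The function names and the tuple representation are assumptions, and the time-parameterized version, where the center moves and the radius may grow, is the open part of the project and is not attempted here.

```python
import math


def min_dist_to_rect(center, q_lo, q_hi):
    """Smallest Euclidean distance from a point to an axis-aligned rectangle.

    center, q_lo, q_hi are equal-length coordinate tuples.
    """
    total = 0.0
    for c, lo, hi in zip(center, q_lo, q_hi):
        d = max(lo - c, 0.0, c - hi)  # 0 when c lies within [lo, hi]
        total += d * d
    return math.sqrt(total)


def consistent(center, radius, q_lo, q_hi):
    """Consistent for a bounding sphere: true if the sphere meets the query."""
    return min_dist_to_rect(center, q_lo, q_hi) <= radius
```

The same `min_dist_to_rect` idea, applied to a query point instead of a rectangle, gives the lower bound that nearest-neighbor search uses to prune subtrees.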

