1 SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux.

2 Plan
  Introduction
  SDDS
  R-tree
  SD-Rtree
  Evolution
  Balancing
  Spatial Rotations
  Overlapping
  Redundant Coverage
  Queries
  Performance
  Conclusion

3 SDDS Principles (1993)
  Data reside at server nodes, which communicate through point-to-point messaging
  Overloaded servers split over new servers
  Queries originate at client nodes, which use local images of the SDDS
  No central addressing component
  A node can be both client and server (peer)

4 SDDS Principles (1993)
  An outdated image may send a query to an incorrect server
  Servers forward such a query to the correct server
  An Image Adjustment Message (IAM) comes back and the image gets adjusted
  The client does not repeat the same error twice
  Data are basically in the RAM of the servers
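The forwarding and image-adjustment cycle can be sketched with a toy one-dimensional example. This is only an illustration under simplifying assumptions: the `Server` and `Client` classes, the integer key ranges, and the shortcut of looking up the correct server directly (standing in for server-to-server forwarding) are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    lo: int  # inclusive lower bound of the key range held by this server
    hi: int  # exclusive upper bound

    def owns(self, key):
        return self.lo <= key < self.hi


@dataclass
class Client:
    # The image: (lo, hi, server name) entries the client currently believes in.
    image: list = field(default_factory=list)
    forwards: int = 0  # number of queries that hit an incorrect server first

    def guess(self, key):
        for lo, hi, name in self.image:
            if lo <= key < hi:
                return name
        return self.image[0][2]  # fall back to the first known server

    def query(self, key, servers):
        target = servers[self.guess(key)]
        if target.owns(key):
            return target.name
        # Addressing error: the contacted server forwards the query.
        # (Assumption: the lookup below stands in for that forwarding.)
        self.forwards += 1
        correct = next(s for s in servers.values() if s.owns(key))
        # The IAM carries the up-to-date ranges of the servers involved.
        self.adjust(target, correct)
        return correct.name

    def adjust(self, *fresh):
        names = {s.name for s in fresh}
        self.image = [e for e in self.image if e[2] not in names]
        self.image += [(s.lo, s.hi, s.name) for s in fresh]
        self.image.sort()


# s0 used to hold [0, 100) and has split; the client image is outdated.
servers = {"s0": Server("s0", 0, 50), "s1": Server("s1", 50, 100)}
client = Client(image=[(0, 100, "s0")])
client.query(70, servers)  # forwarded once, image adjusted
client.query(70, servers)  # now addressed directly to s1
```

After the first query the image holds both ranges, so the second query causes no forwarding.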

5 SD-Rtree: a Spatial SDDS
  Distributed Spatial Data

6 SD-Rtree: a Spatial SDDS
  Distributed Index
  No central component

7 SD-Rtree: a Spatial SDDS
  Point & Window Queries
  kNN queries (future)

8 SD-Rtree: Generalizes R-tree
  R-tree:
    Nodes are minimal bounding boxes
    Leaf nodes point to data
    Internal nodes bound subtrees
    Boxes may overlap
    Nodes split on overflow
    The result is a balanced m-ary tree
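A minimal sketch of the bounding-box operations these properties rely on, assuming axis-aligned 2D rectangles. The `Rect` type and the function names are illustrative choices, not part of the presentation.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return (r.xmax - r.xmin) * (r.ymax - r.ymin)


def mbb(rects: Iterable[Rect]) -> Rect:
    """Minimal bounding box of a non-empty collection of rectangles."""
    rects = list(rects)
    return Rect(
        min(r.xmin for r in rects),
        min(r.ymin for r in rects),
        max(r.xmax for r in rects),
        max(r.ymax for r in rects),
    )


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two boxes share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains_point(r: Rect, x: float, y: float) -> bool:
    return r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax


# Two leaves: each leaf box bounds its objects, and the boxes may overlap.
leaf1 = mbb([Rect(0, 0, 1, 1), Rect(2, 1, 3, 2)])
leaf2 = mbb([Rect(2, 0, 5, 1)])
parent = mbb([leaf1, leaf2])  # an internal node bounds its subtrees
```

Here `leaf1` and `leaf2` overlap, which is why a search may have to follow more than one path.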

9 SD-Rtree: Generalizes R-tree
  R-tree:
    An insert may go through multiple paths
    It ends up in the smallest bounding box that contains the object, if there is any
    Otherwise one of the boxes gets enlarged
    The box may split

10 SD-Rtree: Generalizes R-tree
  R-tree:
    Search may go through multiple paths
    All paths may bring relevant objects

11 Distribution issues
  First issue: adapt the structure to the context
    Cost model based on messages
    No paging! The degree M of the tree is not a constraint
    => split and balancing algorithms must be reviewed
  Second issue: distribute the tree over the servers
    Balance the load evenly
    Do not overload the root node
    => search algorithms must be reviewed as well

12 SD-Rtree: a Balanced Binary Tree
  Each split generates a new edge
  Half of the data moves to the new server
  Each server hosts exactly one leaf and one internal node of the SD-Rtree

13 SD-Rtree: a Balanced Binary Tree
  The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
    Each internal node (or routing node) has exactly two sons
    Each leaf node stores a subset of the indexed dataset
    At each node, the heights of the two subtrees differ by at most one
    Each server stores one data node and one routing node

14 SD-Rtree: Binary Tree Structure
  di = data node (leaf)
  ri = routing node (internal node)

15 SD-Rtree: Tree Distribution

16 SD-Rtree: Evolution

17 SD-Rtree Balancing
  The binary tree should be height-balanced
  The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
  The tree height is then logarithmic in the number of leaves
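The balance condition and the logarithmic height bound can be checked with a small sketch. It assumes a tree encoded as nested 2-tuples, where anything that is not a tuple counts as a leaf of height 0; this encoding and the function names are illustrative only. `min_leaves` uses the standard AVL-style recurrence N(h) = N(h-1) + N(h-2) with N(0) = 1 and N(1) = 2, which grows exponentially in h, so the height is logarithmic in the number of leaves.

```python
def height(tree):
    """Height of a binary tree given as nested 2-tuples; a leaf has height 0."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return 1 + max(height(left), height(right))


def is_height_balanced(tree):
    """True when, at every node, the two subtree heights differ by at most 1."""
    if not isinstance(tree, tuple):
        return True
    left, right = tree
    return (abs(height(left) - height(right)) <= 1
            and is_height_balanced(left)
            and is_height_balanced(right))


def min_leaves(h):
    """Fewest leaves a height-balanced binary tree of height h can have."""
    a, b = 1, 2  # N(0), N(1)
    for _ in range(h):
        a, b = b, a + b
    return a


balanced = ("d0", ("d1", "d2"))
skewed = (("d0", ("d1", ("d2", "d3"))), "d4")
```

In this sketch `balanced` satisfies the condition, while `skewed` violates it at two nodes and would need a rotation.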

18 SD-Rtree Balancing
  SD-Rtree balancing occurs during splits
  Messages are sent bottom-up to adjust the height of the ancestor nodes
  A rotation occurs if an ancestor is imbalanced
  SD-Rtree rotations are spatial: they change the rectangles of internal nodes
  The best rotation minimizes rectangle overlapping
  Tie-breaking minimizes the "dead space"
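The selection rule (least overlap first, least dead space as tie-breaker) can be sketched as a comparison over candidate regroupings. The sketch assumes each candidate is summarized by the two sibling rectangles it would produce, and it defines dead space as the area of the enclosing box not covered by either rectangle; both the candidate encoding and that definition are assumptions made for illustration, not taken from the slides.

```python
def area(r):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def dead_space(a, b):
    """Area of the enclosing box covered by neither a nor b (assumed definition)."""
    box = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return area(box) - (area(a) + area(b) - overlap_area(a, b))


def best_candidate(candidates):
    """Pick the candidate name with least overlap, then least dead space."""
    return min(
        candidates,
        key=lambda name: (overlap_area(*candidates[name]),
                          dead_space(*candidates[name])),
    )


# Hypothetical outcomes of three possible rotations.
candidates = {
    "x": ((0, 0, 2, 2), (1, 1, 3, 3)),  # siblings overlap
    "y": ((0, 0, 1, 1), (2, 0, 3, 1)),  # no overlap, gap between siblings
    "z": ((0, 0, 1, 1), (1, 0, 2, 1)),  # no overlap, no gap
}
choice = best_candidate(candidates)
```

Candidates "y" and "z" tie on overlap, so the dead-space criterion decides between them.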

19 SD-Rtree: Spatial Rotations

20 Rotation Pattern Properties
  The sons of a node are not ordered
  => more freedom for reorganizing the tree
  Any imbalanced node matches a rotation pattern
  A rotation pattern is a subtree a(b(e(f,g),d),c) such that:
    h(c) = h(d) = h(f) = n − 1 (n > 0)
    h(g) = max(0, n − 2)
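A sketch of how a subtree could be tested against this pattern, again assuming a nested 2-tuple encoding with leaves of height 0 (the encoding and the function name are hypothetical). Because sons are unordered, the matcher tries both orders at each level. With these heights, h(e) = n, h(b) = n + 1 and h(a) = n + 2, so the subtrees of a differ in height by 2, which is exactly the imbalance a rotation repairs.

```python
def height(t):
    """Height of a tree given as nested 2-tuples; a leaf has height 0."""
    return 0 if not isinstance(t, tuple) else 1 + max(height(t[0]), height(t[1]))


def match_rotation_pattern(a):
    """Return the roles b, c, d, e, f, g and n if a matches a(b(e(f,g),d),c)."""
    if not isinstance(a, tuple):
        return None
    for b, c in (a, a[::-1]):          # sons are unordered: try both orders
        if not isinstance(b, tuple):
            continue
        for e, d in (b, b[::-1]):
            if not isinstance(e, tuple):
                continue
            for f, g in (e, e[::-1]):
                n = height(c) + 1
                if (height(d) == height(f) == n - 1
                        and height(g) == max(0, n - 2)):
                    return {"n": n, "b": b, "c": c, "d": d,
                            "e": e, "f": f, "g": g}
    return None


small = ((("f", "g"), "d"), "c")                  # n = 1
mirrored = ("c", ("d", ("g", "f")))               # same shape, sons swapped
larger = (((("f1", "f2"), "g"), ("d1", "d2")), ("c1", "c2"))  # n = 2
```

A balanced pair such as `("x", "y")` matches nothing, while all three trees above match.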

21 SD-Rtree: Spatial Rotation

22 Rotation Cost
  Constant number of messages (3 or 6, depending on the choice)
  Few rotations in practice, in particular when the dataset is uniformly distributed
  See our experiments

23 SD-Rtree: Images
  Each image defines the addressing structure
  It resides as a cache on a client or on a peer
  It starts with the address of the contact server
  IAMs make it a subtree
  Splits make images outdated
  IAMs adjust the image incrementally

24 Image Adjustment
  The client contacts a server with a query
  Each incorrect server initiates a traversal of the tree
  During the traversal, the description of the nodes is collected
  The correct server sends the up-to-date tree structure
  The client updates its image

25 Image Construction
  Using the image
    The client first searches its local image and chooses the server that best corresponds to the query
    The correct server is found in O(log n) in the worst case

26 Out-of-range situation

27 Insertion of objects

28 Overlapping management
  The directory rectangles in an Rtree may overlap
  A local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for
  SD-Rtree servers maintain data on node overlapping: Redundant Coverage
  This avoids systematically accessing the root node

29 Redundant Coverage Example
  The region common to A and B is stored on both nodes
  If a point query sent to A falls in the region shared with B:
    A sends a point query message to B
  For D: we must keep the intersection with C or B: here empty
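The idea can be sketched as each node keeping the rectangles it shares with other nodes and forwarding a point query when the point falls inside one of them. This is a simplified, hypothetical model: it records pairwise intersections between every pair of named nodes, whereas the slides only state that the common region is stored on both nodes. The coordinates below are made up.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Common region of a and b, or None when it is empty."""
    r = Rect(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
             min(a.xmax, b.xmax), min(a.ymax, b.ymax))
    return r if r.xmin < r.xmax and r.ymin < r.ymax else None


def build_coverage(boxes):
    """For each node, the non-empty regions it shares with every other node."""
    coverage = {name: {} for name in boxes}
    for a in boxes:
        for b in boxes:
            if a != b:
                common = intersection(boxes[a], boxes[b])
                if common is not None:
                    coverage[a][b] = common  # stored on a; b stores its own copy
    return coverage


def forward_targets(coverage, node, x, y):
    """Nodes that must also receive a point query first sent to `node`."""
    return sorted(
        other for other, r in coverage[node].items()
        if r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax
    )


boxes = {"A": Rect(0, 0, 4, 4), "B": Rect(3, 3, 6, 6), "D": Rect(10, 10, 11, 11)}
coverage = build_coverage(boxes)
```

With these boxes, A and B each store the shared square, while D stores nothing because its intersections are empty.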

30 Queries
  Point queries and window queries
  The technique is similar to the insertion algorithm:
    Search the client image for a server whose mbb contains the point or intersects the window
    Send the query to this server
    If the server actually covers the point or the window, it answers the client; otherwise it sends the query to its parent node
  A server uses the overlapping information to transmit the query
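The first steps of a point query can be sketched as follows. The sketch assumes the client image is a mapping from server name to cached box, that the smallest containing box is the preferred choice, and that a server whose real box does not cover the point simply hands the query to its parent; the `Node` class and both function names are hypothetical, and the use of overlapping information is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def covers(box, x, y):
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


@dataclass
class Node:
    name: str
    box: Box
    parent: Optional["Node"] = None


def pick_from_image(image, x, y):
    """Choose, in the (possibly outdated) client image, a server for the point."""
    hits = [name for name, box in image.items() if covers(box, x, y)]
    if not hits:
        return None
    return min(hits, key=lambda name: area(image[name]))  # assumed preference


def climb(node, x, y):
    """Names visited while the query moves up to a node that covers the point."""
    path = [node.name]
    while not covers(node.box, x, y) and node.parent is not None:
        node = node.parent
        path.append(node.name)
    return path


root = Node("r0", (0, 0, 10, 10))
d1 = Node("d1", (0, 0, 5, 10), parent=root)  # d1 shrank after a split
image = {"d1": (0, 0, 10, 10)}               # the client still has the old box
first = pick_from_image(image, 7, 3)
path = climb(d1, 7, 3)
```

The outdated image sends the query for (7, 3) to d1, which no longer covers that point and passes the query to its parent.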

31 Experiments
  Synthetic data (points and rectangles) generated with GSTD
    50,000 to 500,000 objects
    0 to 3,000 queries
    Server capacity: 3,000 objects
  Comparison of three SD-Rtree variants:
    BASIC: no image; every query is processed top-down from the root
    IMSERVER: no IAMs among the servers
    IMCLIENT: client images

32 Cumulative Insert Cost

33 Per Insert Cost

34 Cost of Balancing

35 Image Convergence

36 Distribution of Messages

37 Cost per Query

38 Conclusion
  SD-Rtree is an efficient scalable distributed Rtree
    For very large spatial data collections
    Can be processed in distributed RAM
    Access time much faster than for disk data
  Load balancing
    Spatial rotations
  Overlapping management
    Redundant coverage
  O(log n) worst-case insert cost
  Future work
    kNN queries
    Balancing of object distribution on servers

39 SD-Rtree
  Thank You for Your Attention
  Questions: First.Last@dauphine.fr

