DGrid: A Library of Large-Scale Distributed Spatial Data Structures Pieter Hooimeijer, 2006-05-02.


1 DGrid: A Library of Large-Scale Distributed Spatial Data Structures Pieter Hooimeijer, 2006-05-02

2 Motivation
DGrid was designed to:
– Support very large sets of dynamic point data (i.e. points that move unpredictably).
– Offer flexible trade-offs between the cost of search operations and the cost of updates.
– Run on parallel and distributed systems.

3 Spatial Data Structures
Definition: any data structure that holds points, rectangles, lines, polygons, curves, etc.
They are typically optimized for a particular type of search operation. We’ll focus on point data.

4 Commonly used example: the quadtree.
– Works like a binary search tree, but splits on both x and y at each node.
– Each node in the tree has four children: NE, SE, SW, NW.
– Repeatedly splitting space into four quadrants this way is called ‘recursive decomposition.’

5 The quadtree implementation in DGrid works like this:
Example points: A (1,3), B (1,2), C (2,0).
This is a ‘bottom-up Matrix (MX) Quadtree’ (section 3.1.2).

6 Let’s do a search on that quadtree:
We ruled out the entire NE quadrant at the root level of the tree.
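
The pruning step in that search can be sketched in a few lines of C++. This is a minimal MX-quadtree-style illustration, not DGrid's implementation: the class name `MxQuadtree`, its members, and the assumption that y grows northward are all made up for this example, and it does not try to reproduce the bottom-up construction.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical MX-style quadtree over the square [0, size) x [0, size),
// where size is a power of two. Points live only in 1x1 leaves, so the
// shape of the tree depends on which cells are occupied, not on the
// order in which they were inserted.
class MxQuadtree {
    struct Node {
        // Child index = qx + 2 * qy, with qx = 1 for the east half and
        // qy = 1 for the north half (so index 3 is the NE quadrant).
        std::unique_ptr<Node> child[4];
        bool occupied = false;  // only meaningful for 1x1 leaves
    };

public:
    typedef std::vector<std::pair<int, int> > Points;

    explicit MxQuadtree(int size) : size_(size) {}

    void insert(int x, int y) {
        assert(x >= 0 && x < size_ && y >= 0 && y < size_);
        insert_at(root_, 0, 0, size_, x, y);
    }

    // Collect every stored point with x0 <= x <= x1 and y0 <= y <= y1.
    Points get_range(int x0, int y0, int x1, int y1) const {
        Points out;
        search(root_.get(), 0, 0, size_, x0, y0, x1, y1, out);
        return out;
    }

private:
    static void insert_at(std::unique_ptr<Node>& node, int ox, int oy,
                          int size, int x, int y) {
        if (!node) node.reset(new Node());
        if (size == 1) { node->occupied = true; return; }
        const int half = size / 2;
        const int qx = (x >= ox + half) ? 1 : 0;
        const int qy = (y >= oy + half) ? 1 : 0;
        insert_at(node->child[qx + 2 * qy], ox + qx * half, oy + qy * half,
                  half, x, y);
    }

    static void search(const Node* node, int ox, int oy, int size,
                       int x0, int y0, int x1, int y1, Points& out) {
        if (!node) return;  // empty quadrant: nothing below it
        // Prune: this quadrant covers [ox, ox+size-1] x [oy, oy+size-1].
        if (x1 < ox || x0 > ox + size - 1 || y1 < oy || y0 > oy + size - 1)
            return;
        if (size == 1) {
            if (node->occupied) out.push_back(std::make_pair(ox, oy));
            return;
        }
        const int half = size / 2;
        for (int q = 0; q < 4; ++q)
            search(node->child[q].get(), ox + (q % 2) * half,
                   oy + (q / 2) * half, half, x0, y0, x1, y1, out);
    }

    int size_;
    std::unique_ptr<Node> root_;
};
```

With the three points from the previous slide inserted into a 4x4 tree, `get_range(0, 2, 1, 3)` returns (1,2) and (1,3), and the search never descends into the other three root quadrants.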

7 Trade-offs for bottom-up MX Quadtree, compared to other tree data structures:
– The shape of the tree does not depend on the insertion order.
– No need to balance the tree (which would be expensive).
– Insertion and deletion are cheaper for clustered data.

8 C++ Templates
They look like this:
    template <class T> class vector { //... };
In this case, a separate vector class is generated for each item type.
    // Strong type-checking using templates:
    MyType * a = someVector.get(5);
    // instead of:
    MyType * a = (MyType *)someVector.get(5);

9 Turns out, C++ templates are a crude functional programming language. Why ‘crude?’
    {- Haskell -}
    fact 1 = 1
    fact n = n * fact (n - 1)
    // C++ Templates
    template <int N> struct fact { static const int value = N * fact<N - 1>::value; };
    template <> struct fact<1> { static const int value = 1; };
This is ‘executed’ by the compiler!
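
One way to see that the compiler really does the work is to check the result with `static_assert`, which is evaluated at compile time. The sketch below spells out the factorial template in full, with the base case at 1 as in the Haskell version; the `table` typedef is only there for illustration.

```cpp
// Primary template: fact<N> is defined in terms of fact<N - 1>.
template <int N>
struct fact {
    static const int value = N * fact<N - 1>::value;
};

// Explicit specialization: the base case that stops the recursion.
template <>
struct fact<1> {
    static const int value = 1;
};

// Checked by the compiler; if the value were wrong, the build would fail.
static_assert(fact<5>::value == 120, "5! should be 120");

// The result is a constant expression, so it can be used as an array size.
typedef int table[fact<4>::value];  // same type as int[24]
```

Note that the recursion only terminates for N >= 1; instantiating `fact<0>` would recurse until the compiler's template depth limit is hit.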

10 This is called template metaprogramming.
It’s used extensively in DGrid, to make it:
– easier to use;
– faster;
– type safe.

11 Distributed Data
DGrid uses the Message Passing Interface (MPI) to run on distributed systems.
– MPI is a library of basic ‘send’ and ‘receive’ operations.
– Each process gets a unique ID (its ‘rank’).
– Use if-statements on the rank to run different code on different processes.

13 DGrid
DGrid has these data structures:
– Two types of 2D arrays.
– A quadtree.
– A distributed data structure.
– A location class.
DGrid allows nesting of these data structures.

14 Let’s see some examples of nested data structures:
– A 2D array of quadtrees (implied: the quadtree contains locations).
– A quadtree of small 2D arrays.
– A 2D array of 2D arrays. This is called ‘tiling,’ and it is a lot like a ‘shallow’ quadtree.
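
For the ‘tiling’ case, the address arithmetic is simple enough to sketch. The helper below is hypothetical (the names `TileAddress` and `locate` are not from DGrid) and assumes non-negative integer coordinates and equally sized tiles; it shows how a global coordinate splits into an outer-array index and an inner-array index.

```cpp
#include <cassert>

// Hypothetical result type: where a coordinate lands in a tiled 2D array.
struct TileAddress {
    int tile_x, tile_y;    // which tile (index into the outer array)
    int local_x, local_y;  // position inside that tile (index into the inner array)
};

// Split a global coordinate into (tile, offset) for tiles of
// tile_w x tile_h cells. Assumes x, y >= 0.
inline TileAddress locate(int x, int y, int tile_w, int tile_h) {
    assert(x >= 0 && y >= 0 && tile_w > 0 && tile_h > 0);
    TileAddress a;
    a.tile_x = x / tile_w;
    a.tile_y = y / tile_h;
    a.local_x = x % tile_w;
    a.local_y = y % tile_h;
    return a;
}
```

With 64x64 tiles, the point (130, 70) lands in tile (2, 1) at local offset (2, 6). Only two levels exist here, which is the sense in which tiling resembles a shallow quadtree: one coarse split, then direct indexing.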

15 DGrid uses the Composite Design Pattern:
(Class diagram with the labels DataStructure and location.)

16 DGrid uses templates instead of a ‘Component’ interface. The result is that the user can do this:
    using namespace dgrid::tags;
    typedef dgrid::dgrid<MyItem, partial_grid_tag< quadtree_tag > > bucket;
    bucket a(0, 0, 639, 639, tiles(64, 64) << tiles(1, 1));
This is the ‘2D array of quadtrees’ example.

17 Important: The definition of the data structure is a type!
Consequences:
– Can check parameters at compile time. (Must be tiles( ) << tiles( ) for this example, or it won’t compile.)
– Compiler can optimize extensively (it knows which functions are going to call each other).
– Can’t define a type at runtime, so the ‘composition’ must be known at compile time.

18 Data structure operations:
– insert(x, y, item): add item at (x, y)
– delete(x, y, item): remove item from (x, y)
– get(x, y, some_list): get all items at (x, y)
– get_range(x0, y0, x1, y1): get all items in the range [ (x0, y0) : (x1, y1) ]
Note: even the location class must support these operations.

19 In a nested data structure, operations are passed on from level to level.
Because the types are known at compile time, these calls can be inlined.
– Pro: eliminates the overhead of the function call.
– Con: code size increases (function body is repeated).
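
A rough sketch of how such level-to-level forwarding can look with templates follows. None of this is DGrid's actual API: `location`, `tiled_grid`, and their parameters are hypothetical names chosen for the illustration, and only `insert` and `get` are shown. The point is that the nesting is spelled out in the type, so each forwarded call targets a concrete, non-virtual function that the compiler is free to inline.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Innermost level (hypothetical): holds the items that share one cell.
// It supports the same operations as the outer levels and ignores (x, y).
template <class Item>
class location {
public:
    void insert(int, int, const Item& item) { items_.push_back(item); }
    void get(int, int, std::vector<Item>& out) const {
        out.insert(out.end(), items_.begin(), items_.end());
    }
private:
    std::vector<Item> items_;
};

// Outer level (hypothetical): TilesX x TilesY tiles, each covering
// TileSize x TileSize cells and each being another data structure.
template <class Inner, int TilesX, int TilesY, int TileSize>
class tiled_grid {
public:
    tiled_grid() : tiles_(TilesX * TilesY) {}

    // Forward to the tile that owns (x, y), using tile-local coordinates.
    template <class Item>
    void insert(int x, int y, const Item& item) {
        tiles_[index(x, y)].insert(x % TileSize, y % TileSize, item);
    }

    template <class Item>
    void get(int x, int y, std::vector<Item>& out) const {
        tiles_[index(x, y)].get(x % TileSize, y % TileSize, out);
    }

private:
    static std::size_t index(int x, int y) {
        assert(x >= 0 && y >= 0);
        assert(x < TilesX * TileSize && y < TilesY * TileSize);
        return static_cast<std::size_t>((y / TileSize) * TilesX + (x / TileSize));
    }
    std::vector<Inner> tiles_;
};

// A 2x2 array of 4x4 arrays of locations, covering an 8x8 area.
typedef tiled_grid<location<int>, 4, 4, 1> inner_grid;
typedef tiled_grid<inner_grid, 2, 2, 4> outer_grid;
```

Calling `insert(5, 6, 42)` on an `outer_grid` forwards to the inner grid in tile (1, 1) with local coordinates (1, 2), which in turn forwards to a single `location`. Because `outer_grid` is a type, changing the nesting means changing the typedef and recompiling, which matches the compile-time-only composition trade-off described earlier.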

21 Future Work
– Add more data structures and more search operations.
– Separate interface further from implementation.
– (Dynamic) load balancing.

