Dynamic Indexing in SpatialHadoop


1 Dynamic Indexing in SpatialHadoop
Tin K. Vu, CSE Dept., UC Riverside
Advisor: Prof. Ahmed Eldawy, CSE Dept., UC Riverside
Full project presentation, course CS267, F16

2 Introduction

3 SpatialHadoop
A framework for big spatial data, with support at the language, storage, MapReduce, and operations layers. Works efficiently with static data.

4 Dynamic Indexing in SpatialHadoop
Enable SpatialHadoop to work with dynamic data while maintaining the performance of spatial queries.

5 Key ideas
Store new data in the master node. Repartition at low cost.
How does this overcome the limitations of existing work? It keeps the advantages of SpatialHadoop and constructs a good strategy for repartitioning.

6 Outline
Introduction, Related work, Dynamic Indexing, Experiments, Conclusions

7 Related work

8 Big Spatial Indexes
Hadoop-GIS: partitions data based on density. Limitation: supports only limited spatial data types and queries.
SpatialHadoop: pre-indexing based on partition boundaries. Limitation: works only with static data.

9 Dynamic Indexes
AsterixDB, HBase: support high rates of data ingestion. Limitation: support only limited spatial data types and queries.

10 Dynamic Spatial Indexes
MD-HBase, GeoMesa: model spatial data as key-value pairs. Limitation: support only limited spatial data types and queries.

11 Dynamic Indexing

12 Approach
Multi-level tree on HDFS: new data is stored in the master node and then flushed to the slave nodes.
A cost model is used to find a good repartitioning strategy.

13 Indexing System Prototype

14 Insertion Process
Insert new data into the 2nd internal node first. Flush the data to the corresponding partition when its size reaches a threshold.
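The buffer-then-flush insertion step can be sketched in Python as follows. This is a minimal in-memory sketch under stated assumptions, not the prototype's code: the class names `Partition` and `InternalNode` are hypothetical, the threshold counts records rather than bytes, appending to a Python list stands in for writing to the partition's HDFS file, and a point outside every partition is routed to the partition with the nearest center, whose boundary is then enlarged (the slides do not say how such points are handled).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Partition:
    """Hypothetical leaf partition: an MBR (x1, y1, x2, y2) plus flushed records."""
    mbr: Tuple[float, float, float, float]
    records: List[Point] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        x1, y1, x2, y2 = self.mbr
        return x1 <= p[0] <= x2 and y1 <= p[1] <= y2


@dataclass
class InternalNode:
    """Hypothetical internal node held on the master node.

    New records are buffered per child partition; a buffer is flushed to its
    partition once it holds `threshold` records (assumed unit: records).
    """
    partitions: List[Partition]
    threshold: int
    buffers: Dict[int, List[Point]] = field(default_factory=dict)

    def _route(self, p: Point) -> int:
        # First partition whose MBR contains the point wins.
        for i, part in enumerate(self.partitions):
            if part.contains(p):
                return i

        # Assumption: otherwise pick the partition with the nearest MBR center
        # and enlarge its MBR so that it covers the new point.
        def center_dist2(part: Partition) -> float:
            x1, y1, x2, y2 = part.mbr
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            return (p[0] - cx) ** 2 + (p[1] - cy) ** 2

        i = min(range(len(self.partitions)),
                key=lambda j: center_dist2(self.partitions[j]))
        x1, y1, x2, y2 = self.partitions[i].mbr
        self.partitions[i].mbr = (min(x1, p[0]), min(y1, p[1]),
                                  max(x2, p[0]), max(y2, p[1]))
        return i

    def insert(self, p: Point) -> None:
        i = self._route(p)
        buf = self.buffers.setdefault(i, [])
        buf.append(p)
        if len(buf) >= self.threshold:
            self.flush(i)

    def flush(self, i: int) -> None:
        # Stand-in for appending the buffered records to the partition's HDFS file.
        self.partitions[i].records.extend(self.buffers.pop(i, []))
```

Buffering on the master node means each partition receives one batched write per `threshold` records instead of one write per record.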

15 How to repartition?

16 Similarity between partitions
Sim(R1, R2) = Area(R1 ∩ R2) / Area(R1 ∪ R2). Repartition when Sim(R1, R2) < threshold, e.g., 95%.
Quality = (G + U) / T × 100%, where G is the total area of the new partitions, U is the total area of the intersections between the unchanged partitions and their corresponding standard partitions, and T is the total area of the standard partitions.
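Both measures can be computed directly on axis-aligned rectangles. The Python sketch below assumes partitions are represented by their MBRs as `(x1, y1, x2, y2)` tuples and computes the union area by inclusion-exclusion; the function names are hypothetical, and how old partitions are paired with standard partitions is left to the caller.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x1 <= x2, y1 <= y2


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def sim(r1: Rect, r2: Rect) -> float:
    """Sim(R1, R2) = Area(R1 ∩ R2) / Area(R1 ∪ R2)."""
    inter = intersection_area(r1, r2)
    union = area(r1) + area(r2) - inter  # inclusion-exclusion
    return inter / union if union > 0 else 0.0


def quality(new_parts: Iterable[Rect],
            unchanged_pairs: Iterable[Tuple[Rect, Rect]],
            standard_parts: Iterable[Rect]) -> float:
    """Quality = (G + U) / T * 100%.

    new_parts: partitions produced by repartitioning (G sums their areas).
    unchanged_pairs: (unchanged old partition, corresponding standard partition);
    U sums the areas of their intersections.
    standard_parts: the standard partitions (T sums their areas).
    """
    g = sum(area(r) for r in new_parts)
    u = sum(intersection_area(old, std) for old, std in unchanged_pairs)
    t = sum(area(r) for r in standard_parts)
    return (g + u) / t * 100.0
```

For example, a partition whose MBR grew to 10 × 12 while its standard partition is 10 × 10 (same origin) has Sim = 100 / 120 ≈ 0.83, below a 95% threshold, so it would be repartitioned.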

17 Repartition strategy
Step 1: Compute the boundaries of the standard partitions.
Step 2: Compute the similarities between the old partitions and the standard partitions.
Step 3: Split the partitions whose similarity is less than a configurable threshold.
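Steps 2 and 3 can be sketched in Python as below, taking the standard partitions from Step 1 as given input. The function name `partitions_to_split` is hypothetical, and the pairing rule is an assumption: the slides do not say how an old partition is matched to a standard partition, so this sketch compares each old partition with the standard partition it is most similar to.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def _inter(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def _sim(a: Rect, b: Rect) -> float:
    inter = _inter(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def partitions_to_split(old: List[Rect], standard: List[Rect],
                        threshold: float = 0.95) -> List[int]:
    """Return the indices of old partitions whose similarity is below threshold.

    Step 2 (assumed pairing): similarity of an old partition = its highest Sim
    against any standard partition.
    Step 3: mark it for splitting if that similarity is below `threshold`.
    """
    to_split = []
    for i, r in enumerate(old):
        best = max((_sim(r, s) for s in standard), default=0.0)
        if best < threshold:
            to_split.append(i)
    return to_split
```

Only the partitions returned here would be rewritten; all others stay in place, which is what keeps the repartitioning cost low.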

18 Algorithm: find partitions to split

19 Experiments

20 Experiment setup
Datasets were randomly generated by SpatialHadoop (100 MB, 200 MB, and 300 MB).
Single node; HDFS block size: 16 MB.
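For a rough sense of scale, the snippet below counts how many 16 MB HDFS blocks each dataset would occupy, assuming each dataset is stored as a single file (HDFS stores a file as full blocks plus a possibly partial last block). The helper name is hypothetical; the actual number of index partitions depends on the partitioner and is not given in the slides.

```python
import math


def hdfs_blocks(file_size_mb: float, block_size_mb: float = 16) -> int:
    """Blocks used by one file of the given size; the last block may be partial."""
    return math.ceil(file_size_mb / block_size_mb)


for size in (100, 200, 300):
    print(f"{size} MB -> {hdfs_blocks(size)} blocks")
```

This prints 7, 13, and 19 blocks for the 100 MB, 200 MB, and 300 MB datasets.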

21 Experiment setup

22 Experiment setup

23 Conclusions

24 Contributions
Proposed an indexing prototype. Proposed a cost model to evaluate the cost of repartitioning. Proposed an algorithm to find a good repartitioning strategy.

25 Future work
Run experiments with more diverse data. Run experiments comparing spatial query performance between static and dynamic indexes.

26 Thank you!

