Dynamic Indexing in SpatialHadoop

Tin K. Vu, CSE Dept, UC Riverside
Advisor: Prof. Ahmed Eldawy, CSE Dept, UC Riverside
Project Full Presentation, course CS267, F16

Introduction

SpatialHadoop: a framework for big spatial data that covers the language, storage, MapReduce, and operations layers. It works efficiently with static data.

Dynamic Indexing in SpatialHadoop
Make SpatialHadoop able to work with dynamic data while maintaining the performance of spatial queries.

Key ideas: Store new data in the master node. Repartition at low cost. How does this overcome the limitations of existing works? It keeps the advantages of SpatialHadoop and constructs a good strategy for repartitioning.

Outline: Introduction, Related works, Dynamic Indexing, Experiments, Conclusions

Related works

Big Spatial Indexes. Hadoop-GIS: partitions data based on density. Limitation: supports only limited types of spatial data or queries. SpatialHadoop: pre-indexes data based on partition boundaries. Limitation: works only with static data.

Dynamic Indexes. AsterixDB, HBase: support a high rate of data ingestion. Limitation: support only limited types of spatial data or queries.

Dynamic Spatial Indexes
MD-HBase, GeoMesa: view spatial data as key-value pairs. Limitation: support only limited types of spatial data or queries.

Dynamic Indexing

Approach: a multi-level tree on top of HDFS. New data is stored in the master node, then flushed to the slave nodes. A cost model is used to find a good repartitioning strategy.

Indexing System Prototype

Insertion process: New data is inserted into the 2nd internal node first. The buffered data is flushed to the corresponding partition when its size reaches a threshold.
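The insertion process above can be sketched as a buffer-then-flush loop. This is a minimal, hypothetical sketch, not the prototype's code: it assumes a tree with one level of internal nodes whose buffers live in master-node memory (our reading of "2nd internal node"), uses an in-memory list to stand in for each partition on the slave nodes, and picks a node by least MBR enlargement when no MBR contains the point (an R-tree-style rule chosen here, not stated in the slides). The names Node, DynamicIndex and FLUSH_THRESHOLD are illustrative only.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)

FLUSH_THRESHOLD = 4  # assumed value; the slides only say "a threshold"


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def expand(r: Rect, p: Point) -> Rect:
    # Smallest rectangle covering both r and p.
    return (min(r[0], p[0]), min(r[1], p[1]), max(r[2], p[0]), max(r[3], p[1]))


@dataclass
class Node:
    mbr: Rect
    buffer: list[Point] = field(default_factory=list)     # held on the master node
    partition: list[Point] = field(default_factory=list)  # stands in for a partition on a slave node


class DynamicIndex:
    def __init__(self, mbrs: list[Rect]) -> None:
        self.nodes = [Node(m) for m in mbrs]
        self.flushes = 0

    def insert(self, p: Point) -> None:
        # Prefer a node whose MBR already contains p; otherwise choose the
        # node whose MBR grows the least, and enlarge that MBR to cover p.
        node = next((n for n in self.nodes if contains(n.mbr, p)), None)
        if node is None:
            node = min(self.nodes, key=lambda n: area(expand(n.mbr, p)) - area(n.mbr))
            node.mbr = expand(node.mbr, p)
        node.buffer.append(p)
        if len(node.buffer) >= FLUSH_THRESHOLD:
            self.flush(node)

    def flush(self, node: Node) -> None:
        # Move the buffered records to the node's partition and empty the buffer.
        node.partition.extend(node.buffer)
        node.buffer.clear()
        self.flushes += 1
```

For example, with two nodes covering (0, 0, 5, 10) and (5, 0, 10, 10), inserting (1, 1), (2, 2), (6, 6), (3, 3), (4, 4) fills the left buffer to the threshold and triggers one flush, while (6, 6) stays buffered in the right node.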

How to repartition?

Similarity between partitions
Sim(R1, R2) = Intersection(R1, R2) / Union(R1, R2), i.e., the area where partitions R1 and R2 overlap divided by the area they cover together.
Repartition when Sim(R1, R2) < threshold, e.g., 95%.
Quality = (G + U) / T × 100%, where G is the total area of the new partitions, U is the total area of the intersections between the unchanged partitions and their corresponding standard partitions, and T is the total area of the standard partitions.
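Both measures above reduce to rectangle area arithmetic. The following is a minimal sketch, assuming partitions are axis-aligned rectangles given as (x1, y1, x2, y2) tuples; the function names are hypothetical. Sim is the intersection-over-union (Jaccard) ratio of two rectangles, and the union area is computed as area(a) + area(b) - area(a ∩ b).

```python
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def sim(a: Rect, b: Rect) -> float:
    # Intersection area divided by union area; 0.0 for degenerate inputs.
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def quality(new_parts: list[Rect],
            unchanged: list[tuple[Rect, Rect]],
            standard: list[Rect]) -> float:
    """Quality = (G + U) / T * 100.

    new_parts: partitions created by splitting (G = their total area)
    unchanged: (old, standard) pairs for partitions that were kept (U)
    standard:  all standard partitions (T = their total area)
    """
    g = sum(area(p) for p in new_parts)
    u = sum(intersection_area(old, std) for old, std in unchanged)
    t = sum(area(s) for s in standard)
    return (g + u) / t * 100.0
```

For instance, a 10 × 10 partition compared with a 10 × 9.5 partition sharing the same lower-left corner gives Sim = 95 / 100 = 0.95, exactly the example threshold.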

Repartition strategy:
Step 1: Compute the boundaries of the standard partitions.
Step 2: Compute the similarities between the old partitions and the standard partitions.
Step 3: Split the partitions whose similarity is less than a configurable threshold.
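The slides do not say how the standard partitions in Step 1 are computed. One plausible reading, assumed here, is that they are the partitions a fresh, from-scratch partitioning of all current data would produce. This sketch uses Sort-Tile-Recursive (STR) tiling over points as that partitioner; the choice of STR, the capacity parameter, and the function names are assumptions for illustration.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def mbr(points: list[Point]) -> Rect:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def str_partitions(points: list[Point], capacity: int) -> list[Rect]:
    """STR tiling into roughly ceil(n / capacity) partitions."""
    n = len(points)
    num_parts = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_parts))
    slice_size = num_slices * capacity  # points per vertical slice
    by_x = sorted(points, key=lambda p: p[0])
    parts: list[Rect] = []
    for i in range(0, n, slice_size):
        # Within each vertical slice, sort by y and cut into runs of `capacity`.
        strip = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(strip), capacity):
            parts.append(mbr(strip[j:j + capacity]))
    return parts
```

On a 4 × 4 grid of points with capacity 4, this yields four 1 × 1 boundaries, one per quadrant. In a real system the boundaries would likely be computed from a sample rather than all points; that is likewise an assumption.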

Algorithm: find partitions to split
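The algorithm itself is not reproduced in this transcript. The following is only a sketch of Steps 2 and 3 of the repartition strategy, not the author's algorithm. It assumes rectangles as (x1, y1, x2, y2) tuples and assumes each old partition is compared against the standard partition it is most similar to; that matching rule and the name partitions_to_split are hypothetical.

```python
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def sim(a: Rect, b: Rect) -> float:
    # Intersection area over union area.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = w * h if w > 0 and h > 0 else 0.0
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def partitions_to_split(old: list[Rect], standard: list[Rect],
                        threshold: float = 0.95) -> list[int]:
    """Return indices of old partitions whose best similarity to any
    standard partition falls below the threshold (Steps 2 and 3)."""
    to_split = []
    for i, r in enumerate(old):
        best = max(sim(r, s) for s in standard)  # Step 2 (assumed matching)
        if best < threshold:                     # Step 3
            to_split.append(i)
    return to_split
```

With standard partitions (0, 0, 5, 10) and (5, 0, 10, 10), an old partition (0, 0, 10, 10) has similarity 0.5 to both and is selected for splitting, while (5, 0, 10, 10.5) has similarity 50 / 52.5 ≈ 0.952 and is kept.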

Experiments

Experiment setup: Datasets were randomly generated by SpatialHadoop (100 MB, 200 MB, 300 MB). Single node; HDFS block size: 16 MB.
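To give a sense of scale, the number of 16 MB HDFS blocks each dataset occupies is the file size divided by the block size, rounded up. This sketch assumes the nominal dataset sizes are exact and ignores any size difference between the raw input and the indexed files; num_blocks is a hypothetical helper.

```python
import math

HDFS_BLOCK_MB = 16  # block size used in the experiments


def num_blocks(size_mb: float, block_mb: float = HDFS_BLOCK_MB) -> int:
    # A file is stored as ceil(size / block size) HDFS blocks.
    return math.ceil(size_mb / block_mb)


for size in (100, 200, 300):
    print(f"{size} MB -> {num_blocks(size)} blocks")
```

Under these assumptions the three datasets span 7, 13, and 19 blocks respectively.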


Conclusions

Contributions: Proposed an indexing prototype. Proposed a cost model to evaluate the cost of repartitioning. Proposed an algorithm to find a good repartitioning strategy.

Future work: Run experiments with more diverse data. Run experiments comparing spatial query performance between the static and dynamic indexes.

Thank you!
