Spatial Tajo: Supporting Spatial Queries on Apache Tajo
Slideshare short URL: goo.gl/j0VLXp

1 Spatial Tajo: Supporting Spatial Queries on Apache Tajo
Slideshare short URL: goo.gl/j0VLXp

2 Contents
• What is Spatial Tajo?
• Motive for development
• Why I chose Apache Tajo
• Plan for the implementation of the plug-in
• Current status
  ◦ Parts implemented
  ◦ Parts not yet implemented
• Conclusion
• References

3 What is Spatial Tajo?
• A plug-in that provides spatial queries for Tajo
• In detail, it is a plug-in that lets users run queries on spatial relations and spatial analysis against data sets stored in the distributed data warehouse system.
  ◦ Providing spatial functions for spatial queries
  ◦ Supporting spatial data types
  ◦ Supporting an index structure for spatial data sets
  ◦ Supporting raster data
[Figure: Overall architecture of Apache Tajo. Labels shown: Client (JDBC, SQL Shell, Web UI); Tajo Master, CatalogStore (allocate a query, manage metadata); Tajo Worker, QueryMaster, Local Query Engine, StorageManager, Spatial Tajo; storage back ends: local file system, HDFS, Amazon S3.]

4 Motive for development
• The volume of the spatial data sets to be analyzed is approaching big-data scale, and I'd like to analyze them using SQL.
• I'd like to use a system that works without batch processing.
• I'd like to use free software or a free solution (provided that I contribute my experience back to the communities).

5 Motive for development
• Of course, there are good options among the existing software and solutions. But…
  ◦ Relational database and DBMS
    ▪ Oracle Spatial and Graph (Oracle Database + plug-in)
    ▪ MySQL DBMS
    ▪ PostGIS with PostgreSQL
  ◦ NoSQL
    ▪ Document-oriented databases: MongoDB, CouchDB (plug-in), RethinkDB
    ▪ HBase, Hive
  ◦ Cluster and cloud
    ▪ GeoMesa, ESRI GIS Tools for Hadoop, SpatialHadoop (http://spatialhadoop.cs.umn.edu) (on Hadoop)
    ▪ CartoDB (SaaS on top of PostgreSQL and PostGIS)
• Conclusion: FOSS + Spatial → Apache Tajo + Spatial Plug-in!

6 Why I chose Apache Tajo
• Apache Tajo
  ◦ A robust big data relational and distributed data warehouse system for Apache Hadoop
  ◦ Designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL on large data sets stored on HDFS and other data sources
• Features that led me to choose Apache Tajo
  ◦ Data distribution and storage are delegated to Hadoop or Amazon S3
  ◦ It supports inserting data but does not support updating it
  ◦ Compatibility: queries can be written in ANSI/ISO SQL
  ◦ It is faster than processing with MapReduce and guarantees fault tolerance
  ◦ It is easy to set up and manage
  ◦ Users can implement their own extensions and plug them in if necessary

7 Plan for the implementation of the plug-in
• Spatial functions for spatial queries
  ◦ Distances, Equals, Disjoints, Intersects, Touches, Crosses, Overlaps, Contains, Lengths, Areas and Centroids
  ◦ Conversion functions for spatial types (like from_OOOO or to_OOOO)
• Adding spatial data types
• Enabling kNN queries
• Supporting indexes for spatial data
  ◦ R-tree, Quad-tree, KD-tree, etc.
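
As a rough illustration of what a kNN query built on a distance function amounts to, here is a minimal Python sketch. It assumes 2D points and Euclidean distance; the names `distance` and `knn` are hypothetical and are not Spatial Tajo's actual functions. It is the brute-force equivalent of ordering rows by distance to a query point and keeping the first k.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points (illustrative stand-in
    for a spatial Distance function)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def knn(points, query, k):
    """Return the k points nearest to `query`, closest first.

    Brute force: every point is compared against the query, which is what
    a kNN query does when no spatial index is available.
    """
    return heapq.nsmallest(k, points, key=lambda p: distance(p, query))


if __name__ == "__main__":
    sample = [(0, 0), (5, 5), (1, 1), (10, 0)]
    print(knn(sample, (0, 0), 2))
```

A spatial index would let the engine skip most of these comparisons, which is why kNN and indexing appear together in the plan.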

8 Current status – Parts implemented
• Most primary spatial functions
  ◦ Implemented using JTS
  ◦ Distances, Equals, Disjoints, Intersects, Touches, Crosses, Overlaps and Contains
• Running kNN queries
  ◦ kNN queries can be run using the implemented spatial functions.
• Indexing for spatial data
  ◦ R-tree indexing using Sort-Tile-Recursive (STR)
  ◦ [Figure: a global index and the local indexes it refers to]
  ◦ The process of reading data using the index:
    1. Read the global index and find the search keys.
    2. Find the local indexes corresponding to the search keys.
    3. Find the search keys in the local indexes.
    4. Read the tuples directly.
Sources: Wikipedia (R-tree, STR), the SpatialHadoop paper and the Tajo documentation
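
The following Python sketch illustrates the two ideas on this slide under simplifying assumptions: one level of STR packing for 2D points, and a range lookup that goes from a global index to local indexes to tuples. It is not the Spatial Tajo code. The partition names, the `build_index` and `range_query` helpers, and the use of bounding boxes as "search keys" are all assumptions made for illustration.

```python
import math


def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def str_pack(points, capacity):
    """One level of Sort-Tile-Recursive packing.

    Sort by x, cut into about sqrt(leaf_count) vertical slices, then sort
    each slice by y and cut it into leaves of at most `capacity` points.
    """
    if not points:
        return []
    leaf_count = math.ceil(len(points) / capacity)
    slice_count = math.ceil(math.sqrt(leaf_count))
    slice_size = slice_count * capacity
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, len(by_x), slice_size):
        vertical = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical), capacity):
            leaves.append(vertical[j:j + capacity])
    return leaves


def build_index(partitions, capacity):
    """Build a global index (one box per partition) and local indexes
    (STR-packed leaves per partition). Empty partitions are skipped."""
    global_index, local_indexes = [], {}
    for name, pts in partitions.items():
        if not pts:
            continue
        global_index.append((mbr(pts), name))
        local_indexes[name] = [(mbr(leaf), leaf)
                               for leaf in str_pack(pts, capacity)]
    return global_index, local_indexes


def range_query(global_index, local_indexes, window):
    """Follow the four steps: global index, local index, leaf, tuples."""
    hits = []
    for part_box, name in global_index:              # step 1
        if not intersects(part_box, window):
            continue
        for leaf_box, leaf in local_indexes[name]:   # steps 2 and 3
            if intersects(leaf_box, window):
                hits.extend(p for p in leaf          # step 4
                            if window[0] <= p[0] <= window[2]
                            and window[1] <= p[1] <= window[3])
    return hits
```

In this sketch a partition stands in for a file or block of stored tuples; only partitions and leaves whose bounding boxes intersect the query window are read.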

9 Current status – Parts not yet implemented
• Adding spatial data types
  ◦ Without them, the parameters of the spatial functions are imprecise and inconvenient to use.
• Spatial functions not yet implemented
  ◦ Lengths, areas and centroids
  ◦ Conversion functions (e.g. from_OOO and to_OOO)
• Optimizing functions and queries
  ◦ Optimizing spatial functions
  ◦ Optimizing kNN queries
• Indexing spatial data
  ◦ Quad-tree (with GeoHash) and KD-tree
• Modularization
  ◦ Currently the code is not separated from Apache Tajo, so it cannot be installed as a plug-in.
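
For readers unfamiliar with GeoHash, the sketch below shows the standard encoding in Python. It is only an illustration of why GeoHash pairs naturally with a quad-tree, not a design for the planned index, and the function name `geohash_encode` is hypothetical. The encoder alternately halves the longitude and latitude ranges, so every two bits select one of four quadrants, and points that share a prefix fall in the same cell.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a GeoHash string.

    Bits are produced by repeatedly bisecting the longitude and latitude
    ranges in turn (longitude first); each group of 5 bits becomes one
    base-32 character.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))
```

Because a shorter hash is always a prefix of a longer one for the same point, sorting rows by GeoHash keeps spatially close rows near each other, which is the property a GeoHash-backed quad-tree would rely on.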

10 Conclusion
• What is Spatial Tajo?
• Motive for development
• Why I chose Apache Tajo
• Plan for the implementation of the plug-in
• Current status
  ◦ Parts implemented
  ◦ Parts not yet implemented

11 References
• Apache Tajo
  ◦ Official website, source code, user documentation
  ◦ Efficient In-situ processing of various storage types on Apache Tajo
  ◦ Apache Tajo Enters the SQL-on-Hadoop Space
  ◦ SQL-on-Hadoop: What does “100 times faster than Hive” actually mean?
  ◦ Setting up an Apache Tajo Cluster on Amazon EMR
• PostgreSQL and PostGIS documentation
• SpatialHadoop
  ◦ Official website, source code
  ◦ A demonstration of SpatialHadoop: an efficient mapreduce framework for spatial data
  ◦ Spatialhadoop: towards flexible and scalable spatial processing using MapReduce

12 References
• Spatial Databases: With Application to GIS
• Indexing
  ◦ STR: A simple and efficient algorithm for R-tree packing
  ◦ Spatialhadoop: towards flexible and scalable spatial processing using mapreduce
  ◦ Tajo: A distributed data warehouse system on large clusters
  ◦ Apache Tajo Documents: Index types
• Wikipedia
  ◦ Spatial Database
  ◦ Spatial Query
  ◦ R-tree

13 Thank you for listening
Do you have any questions? Please e-mail me (pseudojo.1989@gmail.com) with your questions, and I'll answer them in detail.
GitHub: https://github.com/awarematics/spatial-tajo

