Distributed Geospatial Indexing


1 Distributed Geospatial Indexing
Rich Fecher Kent Miller

2 What is GeoWave? GeoWave…
Bridges the gaps between popular geospatial projects, and distributed processing/analytics frameworks. Leverages the scalability of distributed key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets. An open source LocationTech project from the National Geospatial-Intelligence Agency (NGA) in collaboration with RadiantBlue Technologies and Booz-Allen Hamilton.

3 Core Problem How should GeoWave index multi-dimensional (i.e. spatial) data in a 1-dimensional, sorted key-value store?

4 Dimensionality Reduction
Use a Space Filling Curve (SFC) to impose a one-dimensional ordering on multi-dimensional data.
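A minimal sketch of the idea, assuming a square grid of 2**order cells per side and the textbook Hilbert curve construction. The function name `hilbert_index` and its parameters are illustrative and are not taken from GeoWave's code.

```python
def hilbert_index(order, x, y):
    """Map grid cell (x, y) on a 2**order x 2**order grid to its
    position along a Hilbert curve (standard iterative construction)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes a block of s*s consecutive positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Sorting cells by their curve position yields the 1-dimensional key order
# that a sorted key-value store would see.
cells = [(x, y) for x in range(4) for y in range(4)]
cells_in_key_order = sorted(cells, key=lambda c: hilbert_index(2, *c))
```

Consecutive positions on this curve are always neighbouring cells, which is why nearby data tends to land in nearby keys.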

5 Space Filling Curve Selection
Curves compared: Z-Order, Hilbert, H-order, Peano, AR2W2
Worst Case Dilation:
WL∞: 6, 4, 8, 5.40, 5.00
WL2: 6.04
WL1: 9, 10.66, 12.00, 9.00
Worst Case Bounding Box Area Ratio (WBA): 2.40, 3.00, 2.00, 3.05, 2.22
Average Total Bounding Box Area (ABA): 2.86, 1.41, 1.69, 1.42, 1.47, 1.40
Source: Haverkort, Walderveen, "Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves", 2008, arXiv: v2

6 Query-Time Aggregation
Computes customized summaries on query Runs distributed on data nodes Returns user-specified aggregate values
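The pattern can be sketched as follows: each data node reduces its own rows to a small partial result, and only those partial results travel back to be merged. This is an illustrative model only; `CountAndBounds`, `aggregate_on_node`, and `query` are hypothetical names, and the nodes are simulated as in-memory lists rather than real data nodes.

```python
from dataclasses import dataclass


@dataclass
class CountAndBounds:
    """A user-specified aggregate: item count plus enveloping bounding box."""
    count: int = 0
    min_x: float = float("inf")
    min_y: float = float("inf")
    max_x: float = float("-inf")
    max_y: float = float("-inf")

    def add(self, x, y):
        self.count += 1
        self.min_x, self.max_x = min(self.min_x, x), max(self.max_x, x)
        self.min_y, self.max_y = min(self.min_y, y), max(self.max_y, y)

    def merge(self, other):
        self.count += other.count
        self.min_x, self.max_x = min(self.min_x, other.min_x), max(self.max_x, other.max_x)
        self.min_y, self.max_y = min(self.min_y, other.min_y), max(self.max_y, other.max_y)
        return self


def aggregate_on_node(rows, in_query):
    """Runs where the data lives: reduce matching rows to one partial result."""
    partial = CountAndBounds()
    for x, y in rows:
        if in_query(x, y):
            partial.add(x, y)
    return partial


def query(nodes, in_query):
    """Runs at the client: merge one small partial result per node."""
    total = CountAndBounds()
    for rows in nodes:
        total.merge(aggregate_on_node(rows, in_query))
    return total
```

The design point is that the amount of data returned depends on the number of nodes, not on the number of matching rows.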

7 Map Occlusion Culling: Subsample At Pixel Resolution
At a given zoom level, each pixel covers a fixed range in degrees. When scanning the data, only one entry is needed within each pixel range; the rest of the entries can be skipped. The block identified in red represents many data points but is rendered by only 9 pixels.

8 Pixel-based Subsampling: Apply at Data Node on Keys
The Accumulo iterator starts at the first pixel, scans until it hits a geometry, then skips to the next pixel.
Figure labels: Database Data; Displayed Pixels; Scan to the first pixel; Seek to the beginning of the next pixel; Points that were all skipped; The rendering engine received only these points.
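The scan-then-seek behaviour can be sketched on a sorted list of integer keys. This is a simplified model, not the Accumulo iterator itself: it assumes each pixel corresponds to a contiguous block of `keys_per_pixel` key values, and it uses a binary search to stand in for the iterator's seek. The function name `subsample` is hypothetical.

```python
from bisect import bisect_left


def subsample(sorted_keys, keys_per_pixel):
    """Keep the first key found in each pixel's key range and skip the rest."""
    kept = []
    i = 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        kept.append(key)  # one entry per pixel is enough to render it
        # Seek to the beginning of the next pixel instead of reading on.
        next_pixel_start = (key // keys_per_pixel + 1) * keys_per_pixel
        i = bisect_left(sorted_keys, next_pixel_start, i + 1)
    return kept


# Ten stored keys, ten key values per pixel: only four reach the renderer.
print(subsample([0, 1, 2, 3, 10, 11, 25, 26, 27, 40], 10))  # [0, 10, 25, 40]
```

Because the skipping happens on keys at the data node, the skipped entries are never read in full or sent over the network.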

9 Distributed Rendering
Map Request: Layer, Style
GeoServer (GeoWave Plugin)
GeoWave Data Nodes: each scan result is an image with the data in the range
Rendered Map: all resultant images are composited together
Map Response

10 Built On

11 Additional Core Features in GeoWave
Command Line Utilities & RESTful Web Services: Local/HDFS Ingest, Kafka Streaming Ingest, Stats Generation, OSM Utilities, Landsat8 Utilities
Base Analytics: KDE Heat Map, DBSCAN, K-Means Clustering
Open Geospatial Consortium standard services via GeoServer
Spatial-Temporal indexing
Integrated with other geospatial frameworks: PDAL, Mapnik
HBase Support
Raster datasets
Rapid Deployment via AWS EMR
GeoWave MapReduce input/output formats
Statistics: Attribute ranges/histograms, Enveloping bounding box over all geometries, Counts of # of stored items, Counts of discrete attribute values
Support for a variety of ingest types: Shapefile, GeoJSON, PostGIS, ArcGrid, GeoTIFF, GPX, T-Drive

12 GeoWave Where can I find it?
GitHub Source Examples Documentation This Demo

13 Thanks! Rich Fecher rfecher@radiantblue.com Kent Miller

14 Backup Slides

15 OSM – Planet GPX
Every track ever uploaded to OpenStreetMap. Complete data attribution. 2.9 billion spatial entities (points).

16 Global View Entire Pointset Visualized

17 Zoomed In

18

19 Kernel Density Estimate Global View

20 Kernel Density Estimate Zoomed In

21 Tiered Indexing Tier 0 (1x1) Tier 1 (2x2) Tier 2 (4x4) Tier 3 (8x8)
Point Polygon Tier Duplicates Cell(s) 1 4 2 14 6 3 56 21-24 222 9 35-42
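One way to think about tier assignment is sketched below: a geometry goes into the finest tier in which its bounding box still overlaps only a few cells, so points land in the finest tier and large polygons fall back to coarser ones. The threshold `max_cells`, the normalized [0, 1) coordinates, and the function names are assumptions for illustration; this is not necessarily the rule GeoWave applies.

```python
def cells_covered(bbox, tier):
    """Count the cells of a 2**tier x 2**tier grid that a bounding box overlaps.
    bbox = (min_x, min_y, max_x, max_y), with coordinates normalized to [0, 1)."""
    n = 1 << tier
    min_x, min_y, max_x, max_y = bbox
    x0, x1 = int(min_x * n), min(int(max_x * n), n - 1)
    y0, y1 = int(min_y * n), min(int(max_y * n), n - 1)
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def choose_tier(bbox, max_tier, max_cells=4):
    """Pick the finest tier whose cell overlap (number of duplicate entries)
    stays within max_cells. Tier 0 is a single cell, so it always qualifies."""
    for tier in range(max_tier, -1, -1):
        if cells_covered(bbox, tier) <= max_cells:
            return tier
    return 0


print(choose_tier((0.3, 0.3, 0.3, 0.3), max_tier=3))  # a point: tier 3
print(choose_tier((0.1, 0.1, 0.9, 0.9), max_tier=3))  # a large polygon: tier 1
```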

22 Space Filling Curve Granularity
8x8 Grid: polygons overlap few cells; many points per cell.
64x64 Grid: polygons overlap many cells; fewer points per cell.
Which is better? Why?

23 Good to Know Space Filling Curve Range Decomposition
Intersects Bounding Box Query (71 -> 98) Range of cells from 92-99 Range of cells from 70-75 Range of cells from
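A brute-force sketch of range decomposition: enumerate the cells inside the query box, map each to its curve position, and merge runs of consecutive positions into ranges that can each be served by one sorted-key scan. For brevity it uses a Z-order (bit-interleaving) curve rather than the Hilbert curve shown on the slide, so the resulting ranges differ from the slide's numbers. The function names are hypothetical.

```python
def z_index(x, y, bits):
    """Z-order (Morton) position of cell (x, y): interleave the bits of x and y."""
    d = 0
    for i in range(bits):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d


def decompose(x0, y0, x1, y1, bits):
    """Turn an inclusive cell bounding box into sorted (start, end) key ranges."""
    values = sorted(
        z_index(x, y, bits)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )
    ranges = []
    start = prev = values[0]
    for v in values[1:]:
        if v != prev + 1:  # gap in the curve: close the current range
            ranges.append((start, prev))
            start = v
        prev = v
    ranges.append((start, prev))
    return ranges


# On a 4x4 grid (bits=2), a box aligned with the curve is one range,
# while a box straddling quadrants splits into several.
print(decompose(0, 0, 1, 1, 2))  # [(0, 3)]
print(decompose(0, 0, 1, 2, 2))  # [(0, 3), (8, 9)]
```

Enumerating every cell is only practical for small grids; it is shown here because it makes the relationship between a box and its key ranges easy to see.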

24 Microsoft - GeoLife Microsoft Research has made available a trajectory data set that contains the GPS coordinates of 182 users over a period of more than five years (April 2007 to August 2012). There are 17,621 trajectories in this data set.

25 GeoLife Original Track Data

26 Kernel Density Estimate Gaussian Kernel

27 Zoomed In Original Track Data

28 Kernel Density Estimate Zoomed In

29 Good to Know GeoWave Key Structure
Value Row ID Column Time-Stamp Family Qualifier Visibility Index ID Adapter ID Data ID Adapter ID Length Data ID Length # of Duplicates Field ID Field Value Tier Bin Hilbert GeoWave Feature Metadata Feature Attribute Field/Value Pair Tier Index Hilbert SFC Index

30 Ingest Example
Initially 2 Points, 2 Polygons. Ingest a Line.
Tier 31: Many Duplicates
Tier 3: 18 Duplicates
Tier 2: 8 Duplicates
Tier 1: 3 Duplicates
Tier 0: 1 Duplicate
Final State

31 Query Example
Intersects Query, Magenta Bounding Box
Tier 1: 1 cell, 0 filtered, 0 intersect
Tier 2: 2 cells, 2 filtered, 0 intersect
Tier 3: 4 cells, 1 filtered, 0 intersect
Tier 4: 6 cells, 0 filtered, 1 intersect
Tier 31: 0 filtered, 1 intersect
Final Result: 1 Point, 1 Polygon Returned

