The EarthServer initiative: towards Agile Big Data Services

1 The EarthServer initiative: towards Agile Big Data Services
2nd GEOSS Science and Technology Stakeholder Workshop, Bonn, Germany, 29 August 2012
Peter Baumann, Jacobs University | rasdaman GmbH, Bremen, Germany

2 About the Presenter
Professor of CS, Jacobs University
Head, Large-Scale Scientific Information Systems research group
Main outcome so far: rasdaman, first "Big Raster Data Analytics" server
Standardization
  OGC: chair of raster-relevant working groups, editor of 12+ standards & candidate standards
  ISO: working on Raster ("Array") SQL
  INSPIRE: invited expert for coverages

3 Roadmap
OGC standards
rasdaman
EarthServer
EarthServer & GEOSS

4 Feature and Coverage Data Standards
Core element in OGC: geographic feature = abstraction of a real-world phenomenon, associated with a location relative to Earth
Special kind of feature: coverage = space-time varying multi-dimensional phenomenon
Typical representative: raster image ...but there is more!
Typically, coverages are Big Data

5 Coverage Types (as per GML 3.2.1)
[Figure: coverage type diagram rooted at «FeatureType» Abstract Coverage (all n-D); new subtypes possible]
Types shown:
  Discrete Coverage, Continuous Coverage
  Grid Coverage, Rectified GridCoverage, Referenceable GridCoverage
  MultiPoint Coverage, MultiCurve Coverage, MultiSurface Coverage, MultiSolid Coverage

6 Coverage Encoding
Pure GML: complete coverage represented by GML
Special Format: other suitable file format (ex: MIME type "image/tiff")
Multipart-Mixed: multipart MIME, type "multipart/mixed"
[Figure: components carried by each container]
  GML Coverage: domain set, range type, range set, app metadata
  GML Coverage with xlink to NetCDF file: domain set, range type, xlink, app metadata
  NetCDF: domain set, range type, range set, app metadata
  GeoTIFF: range type, range set

7 Core OGC Service Standards
[Figure: matrix of core OGC service standards]
  Categories: feature data, images, coverage data, metadata
  Services: WFS, WMS, WCS, CS-W
  Transactional: WFS-T, WCS-T, CS-T
  Query languages: FE, WCPS, CQL
WMS "portrays spatial data" → pictures
WCS "provides data + descriptions; data with original semantics, may be interpreted, extrapolated, etc." [09-110r4]

8 Web Coverage Service (WCS)
Core: simple & efficient access to multi-dimensional coverages
  subset = trim | slice
WCS Extensions for additional functionality facets
  "band extraction", scaling, reprojection, interpolation, query language, ...
Application Profiles define domain-oriented bundling
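The difference between the two subsetting operations named above can be sketched on a small 2-D grid: a trim narrows the extent of an axis but keeps it, while a slice cuts at one position and removes that axis. This is only an illustration in plain Python, not WCS request syntax; the function names and the inclusive index bounds are assumptions of this sketch.

```python
def trim(grid, row_range, col_range):
    """Narrow both axes to the given inclusive index ranges; the result is still 2-D."""
    (r0, r1), (c0, c1) = row_range, col_range
    return [row[c0:c1 + 1] for row in grid[r0:r1 + 1]]


def slice_row(grid, row):
    """Cut the grid at one row position; the row axis disappears, leaving 1-D data."""
    return list(grid[row])


# Hypothetical 5 x 6 coverage whose cell value encodes its position (10 * row + col).
grid = [[10 * r + c for c in range(6)] for r in range(5)]

trimmed = trim(grid, (1, 3), (2, 4))  # 3 x 3 sub-grid
profile = slice_row(grid, 2)          # single row as a 1-D profile
```

The same idea extends to n-D coverages: each axis of a request is either trimmed (dimension kept) or sliced (dimension reduced by one).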

9 Web Coverage Processing Service (WCPS)
Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics
  Time series
  Image processing
  Summary data
  Sensor fusion & pattern mining

10 EarthServer: Big Earth Data Analytics
Scalable On-Demand Processing for the Earth Sciences
EU funded, 3 years, EUR 5.85 million
Platform: rasdaman (Array Analytics server)
  Distributed query processing, integrated data/metadata search, 3D clients
  Strictly open standards: OGC WMS+WCS+WCPS; W3C XQuery; X3D
6 × 100+ TB databases for all Earth sciences + planetary science
Meteorological / climate studies require 5D datasets: 3D for space, 1D for time, and 1D for the different variables (humidity, temperature, precipitation, and so on).
The picture shows a thunderstorm simulation: the solid surface represents a threshold in the 3D humidity field, while colors represent temperature isosurfaces. At the bottom is a top view of the simulated thunderstorm, emulating the satellite view, next to the corresponding satellite observation.

11 The rasdaman Raster Analytics Server
Array DBMS for massive n-D raster data
  new database attribute type: array<celltype,extent>
  Data integration: rasters stored in standard database
Extending ISO SQL with array operators
"Tile streaming" architecture
  n-D array → set of n-D tiles
  extensive optimization, hw/sw parallelization
In operational use
  dozen-Terabyte objects
  Analytics queries in 50 ms on laptop
Example query:
  select[x0:x1,y0:y1] > 130 from LandsatArchive as img
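The "n-D array → set of n-D tiles" idea can be sketched in a few lines: the array is partitioned into rectangular tiles, and a cell-wise predicate such as "> 130" is evaluated one tile at a time rather than on the whole object. This is a minimal illustration in plain Python, not rasdaman's implementation; the function names, tile size, and test image are assumptions.

```python
from itertools import product


def tile_windows(shape, tile_shape):
    """Partition an n-D index space into tiles, each a tuple of (start, stop) per axis."""
    axes = [
        [(start, min(start + step, extent)) for start in range(0, extent, step)]
        for extent, step in zip(shape, tile_shape)
    ]
    return list(product(*axes))


def threshold_by_tile(image, tile_shape, threshold):
    """Evaluate 'cell > threshold' on a 2-D image, visiting one tile at a time."""
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for (r0, r1), (c0, c1) in tile_windows((rows, cols), tile_shape):
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = image[r][c] > threshold
    return mask


# Hypothetical 100 x 70 single-band image with values in 0..255.
image = [[(7 * r + 13 * c) % 256 for c in range(70)] for r in range(100)]
mask = threshold_by_tile(image, tile_shape=(32, 32), threshold=130)
```

Because each tile is processed independently, tiles can in principle be streamed from storage or handed to parallel workers, which is the property the slide's "hw/sw parallelization" bullet relies on.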

12 Value-Added Satellite Image Archive
[Diedrich et al 2001]

13 rasdaman: Distributed Query Processing
WCPS peer-to-peer cloud
  each node accepts all requests
  Incoming node distributes query, semantics based
  Manifold optimization criteria
[Figure: a query over coverages A and B, split into one subquery per coverage]
Query over A and B:
  for $a in ( A ), $b in ( B ) return encode( ( ($a.nir - $ / ($a.nir + $ ($b.nir - $ / ($b.nir + $ ), "HDF5" )
coverage A:
  for $a in ( A ) return encode( ($a.nir - $ / ($a.nir + $, "array-compressed" )
coverage B:
  for $b in ( B ) return encode( ($b.nir - $ / ($b.nir + $, "array-compressed" )
[Owonibi 2012]
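The splitting pattern on this slide can be sketched as follows: each node evaluates the subexpression that touches only its own coverage, and the incoming node combines the partial results. The queries in the transcript are truncated, so two things here are assumptions rather than the original content: the second band is taken to be "red" (which would make each subexpression a normalized difference of the NDVI kind), and the operator joining the two partial results is passed in as a parameter because it is not recoverable from the text.

```python
def normalized_difference(nir, red):
    """Cell-wise (nir - red) / (nir + red); 'red' is an assumed band name."""
    return [(n - r) / (n + r) for n, r in zip(nir, red)]


def node_subquery(coverage):
    """Work done on the node that holds the coverage; only the result travels."""
    return normalized_difference(coverage["nir"], coverage["red"])


def distributed_query(coverage_a, coverage_b, combine):
    """Incoming node: collect the two partial results and combine them cell-wise."""
    partial_a = node_subquery(coverage_a)
    partial_b = node_subquery(coverage_b)
    return [combine(a, b) for a, b in zip(partial_a, partial_b)]


# Hypothetical band values, flattened to 1-D for brevity.
coverage_a = {"nir": [8, 6, 4, 9], "red": [2, 2, 4, 1]}
coverage_b = {"nir": [5, 7, 3, 6], "red": [5, 1, 1, 2]}

# Subtraction is a placeholder for the operator lost in the transcript.
result = distributed_query(coverage_a, coverage_b, combine=lambda a, b: a - b)
```

The point of the split is that only the per-node results cross the network, instead of the raw bands of both coverages.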

14 EarthServer Contribution to GEOSS
Integrated n-D coverage data / metadata search
  Smooth integration with Broker
[Nativi, Mazzetti 2012]

15 EarthServer Contribution to GEOSS
Integrated n-D coverage data / metadata search
  Smooth integration with Broker
  Including "reverse lookup" queries: "give me metadata for data with specific properties"
  Also integration with MapServer, GDAL, ...
Scalable n-D interfaces, based on OGC standards
  Working "in situ" on existing archives; no copying!
Flexible ad-hoc processing & filtering
  Through OGC standardized query language
n-D visual Web clients
  1D diagrams, 2D maps, 3D data cubes, 3D timeseries sets, ...
  Dynamically composed from query results

16 Conclusion
Sensor, image, & statistics data = a main source of Big Data in Earth Sciences
  Petrol industry has "more bytes than barrels"
OGC standards offer common platform
  spatio-temporal coverages: a unified, cross-domain data model
  Web Coverage Service suite: from simple download to flexible analytics
EarthServer can contribute Agile Analytics to GEOSS
  OGC coverage standards
  rasdaman technology

17 Integration of OGC WCS and SWE
SWE O&M and SensorML (+ friends): high flexibility to accommodate virtually any data structure → upstream integration
GMLCOV and WCS (+WCPS): one generic schema for all coverage types; scalable; versatile processing → downstream services
[Figure: coverage server, with labels O&M + SensorML, GMLCOV + WCS, Semantic Web]

18 VAROS (cont'd)

19 The Integrated Geo Warehouse
Comprehensive geophysics data management
  seismic measurement, borehole data, geophone data, geo tomograms, stratigraphy layers, geological models, ...
  + annotations + metadata
[Figure: data grouped by dimensionality: 1D, 2D, 3D, nD]

20 Let’s Take a Closer Look...
Divergent access patterns for ingest and retrieval
  Alternative 1: simple access service, let client chisel result
  Alternative 2: deliver to exact needs
    no bandwidth waste, higher quality of service
Server must mediate between access patterns (...more on this later)
Intelligent access interfaces help

21 System Architecture
[Figure: architecture diagram with components: WCS+WCPS and WPS+WCPS interfaces (OGC or API), petascope request translator, rasdaman engine, metadata, standard database system]
Server: OGC interfaces as servlets: WCS 2.0, WCPS 1.0, WPS 1.0
Server engine: C++
Bindings to GDAL, MapServer, ERDAS (to be extended)
Ex: VAROS project (ESA)
  Commercial client, ChartLink
  Open-source server, rasdaman

22 Just-In-Time Compilation
Observation: interpreted mode slows down
Approach: cluster suitable operations, compile & dynamically bind
Benefit: speed up complex, repeated operations
Variation: compile code for GPU
[Figure: times [ms] for 512^2 * n ops; query: select x*x*...*x from float_matrix as x]
[Jucovschi, Stancu-Mara 2008]
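The "cluster, compile & dynamically bind" approach can be illustrated with the slide's own x*x*...*x chain. The sketch below contrasts evaluating the chain one operator at a time with generating source for the whole chain, compiling it once, and binding the result as a callable. It uses Python's built-in bytecode compiler purely as a stand-in for native code generation, so it shows the structure of the technique and makes no claim about speedups; the function names are assumptions.

```python
def interpreted_power(x, n):
    """Evaluate x*x*...*x (n factors) one operator at a time, as an interpreter would."""
    result = x
    for _ in range(n - 1):
        result = result * x
    return result


def compile_power(n):
    """Cluster the whole operator chain into one expression, compile it once, and bind it."""
    expression = " * ".join(["x"] * n)  # e.g. "x * x * x" for n = 3
    code = compile(f"lambda x: {expression}", "<generated>", "eval")
    return eval(code)  # dynamically bound callable


# Hypothetical flattened float matrix.
float_matrix = [1.0, 2.0, 3.0, 0.5]

power5 = compile_power(5)  # compiled once, reused for every cell
compiled_result = [power5(v) for v in float_matrix]
interpreted_result = [interpreted_power(v, 5) for v in float_matrix]
```

The compiled callable is created once per query shape and can be cached, which matches the slide's point that the benefit shows up for complex, repeated operations.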

23 Query Optimization – Ex. 1
