Data Ingestion in EMSO Presented by Marco Pappalardo


1 Data Ingestion in EMSO Presented by Marco Pappalardo
Spacearth Technology Srl, Italy
INDIGO SUMMIT on Data Ingestion, Catania, 12th May 2017
RIA

2 Indigo Summit on Data Ingestion – Data Ingestion in EMSO
What is EMSO? The European Multidisciplinary Seafloor and water-column Observatory (EMSO) is a large-scale, distributed marine Research Infrastructure (RI) of fixed-point observatories. It serves marine science researchers, marine technology engineers, policy makers and the public, and it monitors natural hazards, climate change and marine ecosystems. It comprises 11 nodes and 4 test sites.
Catania - May 12, 2017 Indigo Summit on Data Ingestion – Data Ingestion in EMSO

3 EMSO Nodes

4 Observatory what?

5 EMSO Generic Instrumentation Module
EGIM is a seafloor observatory. Data acquired by the EGIMs will be dispatched, through an EGIM Sensor Observation Service (SOS) Gateway, both to the EMSO Regional Data Nodes and to the EMSODEV Data Management Platform (DMP). The EMSODEV (EMSO) Data Management Platform will collect, analyze, … and publish the data.
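The dual dispatch described above (each EGIM observation goes both to a Regional Data Node and to the DMP) can be sketched as a simple fan-out. This is a minimal Python sketch, not the EGIM SOS Gateway's actual code: the `Observation` fields, the sink names and the in-memory lists standing in for the two destinations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """One measured value leaving an EGIM (hypothetical field set)."""
    sensor_id: str
    observed_property: str
    time: str  # ISO 8601 timestamp
    value: float
    uom: str


# A sink is any callable that accepts one observation (hypothetical interface).
Sink = Callable[[Observation], None]


def dispatch(observations: List[Observation], sinks: Dict[str, Sink]) -> Dict[str, int]:
    """Forward every observation to every destination and count deliveries per sink."""
    delivered = {name: 0 for name in sinks}
    for obs in observations:
        for name, sink in sinks.items():
            sink(obs)
            delivered[name] += 1
    return delivered


# Usage: in-memory lists stand in for a Regional Data Node and the DMP.
regional_node: List[Observation] = []
dmp: List[Observation] = []
sample = [
    Observation("egim-ctd-01", "sea_water_temperature", "2017-05-12T10:00:00Z", 14.2, "Cel"),
    Observation("egim-ctd-01", "sea_water_pressure", "2017-05-12T10:00:00Z", 20.1, "dbar"),
]
counts = dispatch(sample, {"regional_node": regional_node.append, "dmp": dmp.append})
print(counts)  # {'regional_node': 2, 'dmp': 2}
```

In a real gateway each sink would wrap a network client; as written, an exception raised by one sink stops delivery to the others, so retry and isolation policy is a design decision the slides leave open.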

6 Why EGIM?
Goal: to develop and deploy EGIMs that measure a specific set of variables suitable for all sites and depths, including temperature, conductivity (salinity), pressure (depth), turbidity, dissolved oxygen, ocean currents and passive acoustics.
1st deployment in Dec at Vilanova i la Geltrú.

7 EMSODEV Data Management Platform
The DMP includes a set of common services compliant with the phases of the computational viewpoint of the ENVRI Reference Model v2.0:
- Data acquisition;
- Data curation (including data storage and partitioning, data quality checking and cataloguing services, import/export utilities, query services);
- Data publishing (query preparation, preparation for import/export of curated data);
- Data processing services (real-time and/or batch processing computing capabilities);
- Data use (platform authentication and authorization).
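Among the curation services listed above, data quality checking often starts with a gross range test: each value is compared with a plausible interval for its variable. The sketch below is an illustration, not the DMP's implementation: the `VALID_RANGES` limits, the variable names and the `GOOD`/`MISSING`/`OUT_OF_RANGE` labels are hypothetical placeholders, and real limits would depend on site and depth.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical plausible-value ranges; real limits are site- and depth-specific.
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (-2.5, 40.0),        # degrees Celsius
    "salinity": (0.0, 42.0),            # PSU
    "dissolved_oxygen": (0.0, 500.0),   # micromol/kg
}


def range_check(variable: str, values: List[Optional[float]]) -> List[str]:
    """Flag each value as GOOD, MISSING or OUT_OF_RANGE (a gross range test)."""
    low, high = VALID_RANGES[variable]
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append("MISSING")
        elif low <= v <= high:
            flags.append("GOOD")
        else:
            flags.append("OUT_OF_RANGE")
    return flags


flags = range_check("temperature", [13.9, None, 55.0, float("nan"), -1.0])
print(flags)  # ['GOOD', 'MISSING', 'OUT_OF_RANGE', 'MISSING', 'GOOD']
```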

8 DMP API
emsodev-api is a Spring Boot-based RESTful web service. The REST API documentation is available within the deployed application through Swagger.

9 Data gathering from OBSEA SOS
Two raw data collectors exist:
- Pull Transfer Flow: data is retrieved via the API exposed by the SOS server available at the OBSEA observatory.
- Push Transfer Flow: data will be sent to a DMP service that “listens” for near-real-time updates to XML files describing sensor data and observations.
[Figure: OBSEA data, SOS server API (GetCapabilities, GetObservation, DescribeSensor), EMSODEV Data Management Platform]
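For the Pull Transfer Flow, the three SOS operations named in the figure can be issued as plain HTTP GET requests using the SOS 2.0 key-value-pair (KVP) binding. A minimal sketch, assuming the OBSEA server accepts KVP requests: the endpoint URL, the procedure and offering identifiers and the observed-property URN are hypothetical, and the code only builds the URLs without contacting any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the OBSEA SOS URL.
SOS_ENDPOINT = "https://sos.example.org/service"


def sos_url(request: str, **params: str) -> str:
    """Build an OGC SOS 2.0 KVP (HTTP GET) URL for one operation."""
    query = {"service": "SOS", "request": request}
    if request == "GetCapabilities":
        query["AcceptVersions"] = "2.0.0"  # version negotiation (OWS Common)
    else:
        query["version"] = "2.0.0"
    query.update(params)
    return SOS_ENDPOINT + "?" + urlencode(query)


# 1. Discover the offerings, procedures and properties the server exposes.
capabilities = sos_url("GetCapabilities")

# 2. Retrieve the SensorML description of one sensor (hypothetical procedure id).
describe = sos_url(
    "DescribeSensor",
    procedure="urn:example:procedure:ctd-01",
    procedureDescriptionFormat="http://www.opengis.net/sensorML/1.0.1",
)

# 3. Pull one day of observations of one property (hypothetical ids).
observations = sos_url(
    "GetObservation",
    offering="urn:example:offering:ctd-01",
    observedProperty="urn:example:property:sea_water_temperature",
    temporalFilter="om:phenomenonTime,2017-05-11T00:00:00Z/2017-05-12T00:00:00Z",
)

for url in (capabilities, describe, observations):
    print(url)
```

A pull collector would typically call GetCapabilities once, then poll GetObservation with a sliding `temporalFilter` window.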

10 Data Acquisition
Real-time data access: several standards, such as the OGC Sensor Web Enablement (OGC SWE), specify interoperability interfaces and metadata encodings that enable real-time integration of heterogeneous sensor webs into the information infrastructure. SWE specifications such as the Sensor Observation Service (SOS), the Sensor Model Language (SensorML) and Observations & Measurements (O&M) will be supported.
Metadata formats: extended Dublin Core, ISO 19139.
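Observations returned by an SOS GetObservation request are encoded in O&M. The sketch below parses one O&M 2.0 `OM_Observation` with Python's standard `xml.etree.ElementTree`. The sample document is hand-written and trimmed to the elements used here (a complete observation also carries `resultTime` and `featureOfInterest`), and its identifiers are hypothetical, not real OBSEA data.

```python
import xml.etree.ElementTree as ET

NS = {
    "om": "http://www.opengis.net/om/2.0",
    "gml": "http://www.opengis.net/gml/3.2",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hand-written, trimmed O&M 2.0 observation (illustrative only).
SAMPLE = """<om:OM_Observation gml:id="obs1"
    xmlns:om="http://www.opengis.net/om/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <om:phenomenonTime>
    <gml:TimeInstant gml:id="t1"><gml:timePosition>2017-05-12T10:00:00Z</gml:timePosition></gml:TimeInstant>
  </om:phenomenonTime>
  <om:procedure xlink:href="urn:example:procedure:ctd-01"/>
  <om:observedProperty xlink:href="urn:example:property:sea_water_temperature"/>
  <om:result xsi:type="gml:MeasureType" uom="Cel">14.2</om:result>
</om:OM_Observation>"""


def parse_observation(xml_text: str) -> dict:
    """Extract time, procedure, property, value and unit from one OM_Observation."""
    root = ET.fromstring(xml_text)
    href = "{%s}href" % NS["xlink"]  # xlink:href in ElementTree's Clark notation
    result = root.find("om:result", NS)
    return {
        "time": root.find("om:phenomenonTime/gml:TimeInstant/gml:timePosition", NS).text,
        "procedure": root.find("om:procedure", NS).get(href),
        "property": root.find("om:observedProperty", NS).get(href),
        "value": float(result.text),
        "uom": result.get("uom"),
    }


obs = parse_observation(SAMPLE)
print(obs)
```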

11

12 Data Curation
Sensor data will arrive either in asynchronous/batch mode (PULL) or in real-time mode (PUSH).
Both the “Push” and “Pull” flows send (HTTP POST/PUT) formatted data to data store controllers: Distributed File Systems, NoSQL DBs, Time Series DBs, Streaming Store Controllers.
Both the PUSH and PULL transfer flows save metadata into the Metadata and Service Repository.
OneData was evaluated as a candidate solution to enlarge this set of data storage solutions.
Sensor data can be either:
- retrieved via APIs exposed by an SOS server (Pull Transfer Flow), or
- sent to the DMP (Data Management Platform) before being consolidated on the SOS server (Push Transfer Flow).
Two main processes happen during each transfer flow:
- data scraping: extracting parts of the marine observations coming from, or retrieved from, the SOS server;
- data munging/wrangling: converting data from a "raw" format into another one that allows the data to be more conveniently consumed later.
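The scraping and munging steps, followed by the HTTP POST to a data store controller, might look like the sketch below. It assumes scraped records shaped like the output of an O&M parser (`time`, `procedure`, `property`, `value`, `uom`); the flat row layout, the `short_id` naming rule and the controller URL are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Dict, List


def short_id(urn: str) -> str:
    """Keep the last colon-separated token of an identifier (hypothetical naming rule)."""
    return urn.rsplit(":", 1)[-1]


def munge(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Wrangle scraped observations into flat, typed rows sorted by time."""
    rows = [
        {
            "timestamp": r["time"],
            "sensor": short_id(r["procedure"]),
            "variable": short_id(r["property"]),
            "value": float(r["value"]),  # raw text -> number
            "unit": r["uom"],
        }
        for r in records
    ]
    rows.sort(key=lambda row: (row["timestamp"], row["sensor"], row["variable"]))
    return rows


def build_post(rows: List[Dict[str, object]], url: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the rows as JSON."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Usage: two scraped records arriving out of order (hypothetical identifiers).
scraped = [
    {"time": "2017-05-12T10:10:00Z", "procedure": "urn:example:procedure:ctd-01",
     "property": "urn:example:property:sea_water_temperature", "value": "14.25", "uom": "Cel"},
    {"time": "2017-05-12T10:00:00Z", "procedure": "urn:example:procedure:ctd-01",
     "property": "urn:example:property:sea_water_temperature", "value": "14.2", "uom": "Cel"},
]
rows = munge(scraped)
request = build_post(rows, "https://dmp.example.org/controllers/timeseries")  # hypothetical URL
# urllib.request.urlopen(request) would send it; omitted here.
```

Swapping `method="POST"` for `"PUT"` gives the other verb the slide mentions; which one a given store controller expects is not stated.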

13 Data Publishing
The DMP will be equipped with DMP Tools, in addition to the API, to:
- activate the process of importing a dataset from external data sources (EMSO regional nodes);
- query data curated within the EMSODEV DMP;
- activate the process of defining (e.g. selecting a time range and a measured parameter) and generating a dataset to be exported outside the EMSODEV DMP.
Medium- to long-term preservation is ensured by the regional EMSO nodes. Long-term archiving will be ensured by national and international certified long-term data archives, such as those of the ICSU World Data System (PANGAEA) and the National Oceanographic Data Centers (NODC). A common approach to data preservation is yet to be derived.
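The export process described above (selecting a time range and a measured parameter, then generating a dataset) can be sketched as a filter followed by a CSV writer. The row layout, variable names and CSV output are assumptions for illustration; the slides do not specify the export format of the DMP Tools.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

FIELDS = ["timestamp", "sensor", "variable", "value", "unit"]  # hypothetical row layout


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp; 'Z' is normalized for older Pythons."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def export_dataset(rows: List[Dict[str, object]], variable: str, start: str, end: str) -> str:
    """Select rows of one variable with start <= timestamp < end and render them as CSV."""
    t0, t1 = parse_ts(start), parse_ts(end)
    selected = [
        r for r in rows
        if r["variable"] == variable and t0 <= parse_ts(r["timestamp"]) < t1
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(selected)
    return out.getvalue()


curated = [
    {"timestamp": "2017-05-11T23:50:00Z", "sensor": "ctd-01",
     "variable": "sea_water_temperature", "value": 14.0, "unit": "Cel"},
    {"timestamp": "2017-05-12T10:00:00Z", "sensor": "ctd-01",
     "variable": "sea_water_temperature", "value": 14.2, "unit": "Cel"},
    {"timestamp": "2017-05-12T10:00:00Z", "sensor": "ctd-01",
     "variable": "sea_water_pressure", "value": 20.1, "unit": "dbar"},
]
csv_text = export_dataset(curated, "sea_water_temperature",
                          "2017-05-12T00:00:00Z", "2017-05-13T00:00:00Z")
print(csv_text)
```

The half-open interval (`start <= t < end`) lets consecutive daily exports tile the time axis without duplicating boundary samples.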

14 Data Use and Reuse
Complex interactions are mediated by virtual laboratories, which provide a persistent context for interactions between groups of users and components within the DMP.
Experimental laboratory: a utility/tool allowing scientists/users to deploy datasets for processing and to acquire results.
All laboratories must interact with a security service (AAI).
Data produced will be available for usage beyond the original purpose. The adopted sensors are often multi-purpose and designed for multiple users and applications.
Selection of certified repositories for long-term preservation/curation is in progress.
Data are to be stored together with the minimum software, metadata and documentation.
EMSO promotes standardization and integration of Regional EMSO Node data, to improve the overall accessibility and reusability of local node data via the EMSO Data Portal.

15 Demo

16 Acknowledgement
Daniele Baratta (Swing:It, Software Engineering Italia Srl)
Michał Orzechowski (CYFRONET)
Daniele Cosenza (Spacearth Technology Srl)
Riccardo Delpopolo Carciopolo (Spacearth Technology Srl)

17 Thank you for watching

18 INDIGO and EUDAT Solutions
Currently: OneData, IAM, B2DROP, B2SHARE, B2FIND
In the future: EUDAT services to use, DMPonline, Future Gateways, Automated Integrity Tests
