Cloud Computing for Ecological Modeling in the D4Science Infrastructure A. Manzi (CERN), L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi (ISTI-CNR)


1 Cloud Computing for Ecological Modeling in the D4Science Infrastructure
A. Manzi (CERN), L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi (ISTI-CNR)
EGI Community Forum 2013, Manchester, 8-12 Apr 2013

2 Overview
– Species Distribution Modeling
– The AquaMaps Scenario
– D4Science Infrastructure
– gCube Framework
– gCube Statistical Manager
– D4Science Cloud Computing
– Conclusions

3 Species Distribution Modeling
Species distribution models, which estimate the presence of a species in a given area, are essential instruments for developing strategies and policies for the management and the sustainable and equitable use of living resources.
Two main issues to face:
– Need for large computing capabilities and appropriate modeling tools
– Need for both a sufficient amount of good-quality occurrence point datasets and suitable environmental datasets

4 The AquaMaps scenario
Model-based, large-scale predictions of the known natural occurrence of marine species. Predictions are made by matching species tolerances against local environmental conditions (e.g. salinity, temperature).
Computation is based on algorithms such as AquaMaps:
– Developed by Kaschner et al. (2006) to predict global distributions of marine mammals
– Produces color-coded species range maps on a grid of half-degree latitude and longitude cells
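
The matching of species tolerances against local conditions can be sketched as follows. This is a minimal illustration, assuming the trapezoidal response and multiplicative combination commonly described for AquaMaps-style envelopes; the slides do not spell out the formula, and the `Envelope` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Envelope:
    """Tolerance range for one environmental parameter (hypothetical layout)."""
    minimum: float
    preferred_min: float
    preferred_max: float
    maximum: float


def parameter_suitability(value: float, env: Envelope) -> float:
    """Trapezoidal response: 0 outside [minimum, maximum], 1 in the preferred range."""
    if value < env.minimum or value > env.maximum:
        return 0.0
    if value < env.preferred_min:
        # Linear ramp up from the absolute minimum to the preferred minimum.
        return (value - env.minimum) / (env.preferred_min - env.minimum)
    if value > env.preferred_max:
        # Linear ramp down from the preferred maximum to the absolute maximum.
        return (env.maximum - value) / (env.maximum - env.preferred_max)
    return 1.0


def cell_probability(conditions: Mapping[str, float],
                     envelopes: Mapping[str, Envelope]) -> float:
    """Combine per-parameter suitabilities by multiplication (an assumption)."""
    probability = 1.0
    for name, env in envelopes.items():
        probability *= parameter_suitability(conditions[name], env)
    return probability


# Illustrative values only, not taken from any real species.
envelopes = {
    "temperature": Envelope(5.0, 10.0, 20.0, 25.0),
    "salinity": Envelope(30.0, 33.0, 36.0, 38.0),
}
print(cell_probability({"temperature": 7.5, "salinity": 34.0}, envelopes))
```

With a multiplicative combination, a single parameter outside its tolerance range drives the whole cell probability to zero, which is one reason the choice of envelope bounds matters.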

5 The AquaMaps scenario
Data tables:
– Species Environmental Envelope (HSPEN): range of environmental tolerance and preference of a species
– Cells Authority File (HCAF): metadata about half-degree cells (membership, physical attributes)
– Cells Species Assignments (HSPEC): probability of occurrence of a species in a given cell
Workflow:
– Defining Environmental Envelopes
– Generating Species Occurrence Probabilities
– Plotting Occurrence Maps and GIS Layers
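
The relationship between the three tables can be sketched as a join of HSPEN and HCAF that produces HSPEC rows. The record layouts, field names, and the simple in-range scoring function below are hypothetical stand-ins chosen for illustration; the real tables carry many more attributes and a graded probability.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple


@dataclass
class HspenRow:
    """One species envelope: parameter name -> (min, max) tolerance (hypothetical)."""
    species_id: str
    tolerances: Dict[str, Tuple[float, float]]


@dataclass
class HcafRow:
    """One half-degree cell with its physical attributes (hypothetical)."""
    cell_id: str
    conditions: Dict[str, float]


@dataclass
class HspecRow:
    """Probability of occurrence of one species in one cell."""
    species_id: str
    cell_id: str
    probability: float


def in_range_score(envelope: HspenRow, cell: HcafRow) -> float:
    """Placeholder scoring: 1.0 if every parameter is within tolerance, else 0.0."""
    inside = all(low <= cell.conditions[name] <= high
                 for name, (low, high) in envelope.tolerances.items())
    return 1.0 if inside else 0.0


def generate_hspec(hspen: Iterable[HspenRow],
                   hcaf: Iterable[HcafRow],
                   score: Callable[[HspenRow, HcafRow], float] = in_range_score
                   ) -> Iterator[HspecRow]:
    """Score every (species, cell) pair and keep the pairs with non-zero probability."""
    cells = list(hcaf)  # cells are reused for every species
    for envelope in hspen:
        for cell in cells:
            probability = score(envelope, cell)
            if probability > 0.0:
                yield HspecRow(envelope.species_id, cell.cell_id, probability)
```

Because every species is scored against every cell, the size of HSPEC grows with the product of the two inputs, which is the scaling problem the following slide quantifies.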

6 The AquaMaps scenario
Very large volume of input and output data
Fewer than 7,000 species:
– HSPEC native range = 56,468,301
– HSPEC suitable range = 114,989,360
Estimate for 50,000 species:
– HSPEC native range = 350,000,000
– HSPEC suitable range = 715,000,000
[Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center]
Very large number of computations
– One multispecies map computed on 6,188 half-degree cells (out of over 170k) and 2,540 species requires 125 million computations
– One global map (extended to all species and cells around the world) requires about 400 billion computations
[N. Bailly, WorldFish Center]
11,549 species (from FishBase): 2 days of sequential computation
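
One way to attack a workload of this shape is to split the species-by-cell pairs into blocks and hand each block to a separate worker. The sketch below is only an illustration of that partitioning idea: the function names are hypothetical, and a local thread pool stands in for remote cloud workers, so it says nothing about how D4Science actually schedules the work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, product
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive blocks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def run_partitioned(species: Sequence[str],
                    cells: Sequence[int],
                    compute: Callable[[str, int], R],
                    block_size: int = 1000,
                    workers: int = 4) -> List[R]:
    """Evaluate `compute` on every (species, cell) pair, one block per task."""
    pairs = product(species, cells)

    def run_block(block: List[tuple]) -> List[R]:
        return [compute(s, c) for s, c in block]

    results: List[R] = []
    # Threads are a local stand-in for remote workers (an assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns block results in submission order.
        for block_result in pool.map(run_block, chunked(pairs, block_size)):
            results.extend(block_result)
    return results
```

The block size trades scheduling overhead against load balance: small blocks spread work evenly but cost more dispatches, large blocks do the opposite.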

7 D4Science Infrastructure
Production-level infrastructure deployed and maintained during the D4Science (2007) and D4Science II (2009) projects
[Diagram labels: GBIF, DRIVER, EGI, AquaMaps VRE, GENESI-DEC, VENUS-C, GeoNetwork, OBIS]

8 D4Science Infrastructure
D4Science hosts biodiversity communities federated by the iMarine and the EUBrazilOpenBio initiatives
D4Science will provide ENVRI RIs with seed resources
Well suited for typical biodiversity processes like Ecological Modeling
Provides access to:
– computational and storage resources offered by commercial cloud providers
– new storage technologies generally identified as NoSQL databases
– several algorithms for performing data analysis and mining
Offers scalable platforms for data interoperability and efficient data management
Offers a scalable infrastructure for efficient spatial data access, processing, and visualization

9 D4Science: example of communities
[Figure labels: 1920 collaborators; 33 M hits/month; 50 K unique visitors/month from 26 countries; AquaMaps; Operational Data; Observation Data; 400 experts; OpenModeller; Cloud]

10 gCube Framework
gCube is a Java service-oriented framework managing:
– creation and interconnection of e-Infrastructures in a controlled and highly configurable environment
– deployment of dynamic Virtual Research Environments
Enabling Layer
Allows deployments of:
– Native components on Tomcat (hot deployments)
– gCube components on Axis container (dynamic deployments)
Implements optimal deployment and allocation of infrastructure components (automatic or admin driven)

11 gCube Framework: Main Components
Information System – A key service in a gCube-based infrastructure: it offers functionalities for publishing, monitoring, discovering and accessing the set of resources forming the infrastructure.
Storage Manager – File storage is managed through a network of distributed storage nodes running specialized software for document-oriented databases. In its current implementation, the Storage Manager Library offers file management over two possible document stores: MongoDB and Terrastore.
Message Queue – This service is based on the Apache ActiveMQ message broker and supports a queue-based mechanism for distributing messages to consumers.
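
The queue-based distribution described for the Message Queue can be illustrated with a small in-process sketch. It uses Python's standard `queue` and `threading` modules as a stand-in for a real broker, so none of it reflects the gCube or broker APIs; the `distribute` function and its return layout are hypothetical.

```python
import queue
import threading
from typing import Dict, List, Sequence


def distribute(messages: Sequence[object], n_consumers: int) -> Dict[int, List[object]]:
    """Deliver each message to exactly one of `n_consumers` competing consumers.

    Messages must not be None, because None is used as the stop signal.
    """
    pending: "queue.Queue[object]" = queue.Queue()
    handled: Dict[int, List[object]] = {i: [] for i in range(n_consumers)}

    def consume(consumer_id: int) -> None:
        while True:
            message = pending.get()
            try:
                if message is None:  # stop signal for this consumer
                    return
                handled[consumer_id].append(message)
            finally:
                pending.task_done()

    threads = [threading.Thread(target=consume, args=(i,)) for i in range(n_consumers)]
    for thread in threads:
        thread.start()
    for message in messages:
        pending.put(message)
    for _ in threads:
        pending.put(None)  # one stop signal per consumer
    pending.join()
    for thread in threads:
        thread.join()
    return handled
```

Each message is taken by whichever consumer is free first, so the split between consumers varies from run to run while the union of handled messages stays the same.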

12 gCube Framework: Main Components
Executor – A key component that endows a gCube-empowered infrastructure with cloud processing. It acts as a container for gCube tasks (plugins of the service), which can be dynamically deployed into the service and executed through its interface.
Generic Worker – An Executor task used in cloud computations. It can execute “processes”, either binary executables or scripts, along with their dependencies in a sandbox.
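
The idea behind the Generic Worker, running an external process together with its dependencies in an isolated location, can be sketched as below. This is an assumption-laden illustration: `run_in_sandbox` is a hypothetical helper, and a throwaway working directory only isolates files; it is not a security sandbox and does not describe the actual gCube implementation.

```python
import shutil
import subprocess
import sys
import tempfile
from typing import Sequence, Tuple


def run_in_sandbox(command: Sequence[str],
                   dependencies: Sequence[str] = (),
                   timeout: float = 60.0) -> Tuple[int, str]:
    """Run `command` in a fresh temporary directory and return (exit code, stdout)."""
    with tempfile.TemporaryDirectory(prefix="worker-") as workdir:
        # Stage the files the process needs next to it.
        for dependency in dependencies:
            shutil.copy(dependency, workdir)
        completed = subprocess.run(
            list(command),
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
        # The directory and everything written into it is removed on exit.
        return completed.returncode, completed.stdout


# Example: run a one-line script with the current Python interpreter.
exit_code, output = run_in_sandbox([sys.executable, "-c", "print('ok')"])
print(exit_code, output.strip())
```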

13 gCube Framework: Main Components
Geospatial Data Manager – Service for discovering and accessing distributed environmental data and maps. It relies on maps stored on several GeoServer instances. A set of PostGIS databases stores the concrete values and geometries, and GeoServer distributes them according to standard Open Geospatial Consortium (OGC) protocols such as Web Map Service (WMS), Web Coverage Service (WCS) and Web Feature Service (WFS). A GeoNetwork instance provides an OGC CSW-based search engine for retrieving meta-information.
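
As an illustration of what a client request against one of these OGC services looks like, the sketch below builds a WMS 1.1.1 GetMap URL. The query parameters are the standard GetMap ones; the endpoint and layer name are placeholders, not real D4Science addresses.

```python
from typing import Sequence
from urllib.parse import urlencode


def wms_getmap_url(base_url: str,
                   layer: str,
                   bbox: Sequence[float],
                   width: int,
                   height: int,
                   srs: str = "EPSG:4326",
                   image_format: str = "image/png") -> str:
    """Build a WMS 1.1.1 GetMap request URL.

    `bbox` is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the layer's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer name, for illustration only.
url = wms_getmap_url("https://geoserver.example.org/wms",
                     "species_distribution", (-180, -90, 180, 90), 720, 360)
print(url)
```

A half-degree global grid maps naturally onto a 720 by 360 pixel image, one pixel per cell, which is why those dimensions are used in the example.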

14 gCube Statistical Manager
[Diagram labels: Statistical Manager; Java Objects; User’s DB; D4Science DB Source; SDMX; Storage Manager; CSV; HTTP calls; D4Science Workspace; External Features Sources]

15 Scope
Statistical Manager is able to:
– Generate geographical probability models for species (e.g. AquaMaps)
– Perform transformations on data (e.g. interpolations)
– Perform data mining operations (e.g. modeling, clustering)
– Evaluate models, distributions and experiments (e.g. ROC curve, AUC, Accuracy)
– Perform data quality analysis (e.g. Habitat Representativeness Score)
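
The evaluation measures named above can be made concrete with a small sketch. It computes AUC through its pairwise-ranking definition (the fraction of presence/absence pairs that the model orders correctly) and accuracy at a fixed threshold. The function names and sample data are illustrative and do not come from the Statistical Manager.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of points whose thresholded score matches the observed label."""
    correct = sum(1 for label, s in zip(labels, scores)
                  if (1 if s >= threshold else 0) == label)
    return correct / len(labels)


# Illustrative data: 1 = species observed, 0 = species absent.
observed = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(observed, predicted), accuracy(observed, predicted))
```

The pairwise form is quadratic in the number of points, which is fine for a demonstration; a rank-based computation would be the usual choice for large occurrence datasets.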

16 Architecture

17 Advanced Graphical Interfaces

18 D4Science Cloud Processing

19 Statistical Manager & AquaMaps
– The Statistical Manager is instantiated with the AquaMaps algorithm
– Data generation is up to 50 times faster on the D4Science cloud
– Adds the generation and publication of GIS layers representing the species distribution
– Supports generation of transects
– Supports dataset management facilities
– Solves scalability issues

20 Conclusions
Ecological Modeling in D4Science:
– Performs modeling by using Cloud Computing in a way that is transparent to users
– Takes care of parallelization issues
– Evaluates model performance
Next step: transparent generation of geospatial features at different resolutions, by implementing geospatial data processing on cloud computing facilities endowed with a WPS protocol interface.

21 Go mobile with iMarine
iMarine application for iOS and Android to discover over 500 world marine species and stay informed on iMarine news & activities
Try AppliFish! (iOS, Android)

22 Discussion
Landscape, D4Science e-Infrastructure, gCube Framework, gCube Apps
Thanks for your attention. Questions?
www.i-marine.org
www.d4science.org
https://portal.i-marine.d4science.org

