Cloud Computing for Ecological Modeling in the D4Science Infrastructure A. Manzi (CERN), L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi (ISTI-CNR)

Presentation transcript:

Cloud Computing for Ecological Modeling in the D4Science Infrastructure
A. Manzi (CERN), L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi (ISTI-CNR)
EGI Community Forum 2013, Manchester, 8-12 Apr 2013

Overview
– Species Distribution Modeling
– The AquaMaps Scenario
– D4Science Infrastructure
– gCube Framework
– gCube Statistical Manager
– D4Science Cloud Computing
– Conclusions

Species Distribution Modeling
Species distribution models, which estimate the presence of a species in a given area, are essential instruments for developing strategies and policies for the management and the sustainable and equitable use of living resources.
Two main issues to face:
– Need for large computing capabilities and appropriate modeling tools
– Need for both a sufficient number of good-quality occurrence point datasets and suitable environmental datasets

The AquaMaps scenario
Model-based, large-scale predictions of the known natural occurrence of marine species.
Predictions are made by matching species tolerances against local environmental conditions (e.g. salinity, temperature).
Computation is based on algorithms such as AquaMaps:
– Developed by Kaschner et al. (2006) to predict global distributions of marine mammals
– Produces color-coded species range maps on a grid of half-degree latitude and longitude cells
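The matching of species tolerances against local conditions can be sketched as follows. This is a minimal illustration, not the AquaMaps implementation: the trapezoidal response shape, the product across parameters, and all names (`Envelope`, `suitability`, `cell_probability`) and numeric values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    """Tolerance of a species for one environmental parameter (hypothetical layout)."""
    minimum: float        # absolute lower tolerance limit
    preferred_min: float  # lower bound of the preferred range
    preferred_max: float  # upper bound of the preferred range
    maximum: float        # absolute upper tolerance limit


def suitability(value: float, env: Envelope) -> float:
    """Assumed trapezoidal response: 1 inside the preferred range, 0 outside the limits."""
    if value < env.minimum or value > env.maximum:
        return 0.0
    if value < env.preferred_min:
        # Linear rise between the absolute minimum and the preferred minimum.
        return (value - env.minimum) / (env.preferred_min - env.minimum)
    if value > env.preferred_max:
        # Linear fall between the preferred maximum and the absolute maximum.
        return (env.maximum - value) / (env.maximum - env.preferred_max)
    return 1.0


def cell_probability(conditions: Dict[str, float], envelopes: Dict[str, Envelope]) -> float:
    """Combine per-parameter suitabilities for one cell by multiplication (an assumption)."""
    probability = 1.0
    for name, env in envelopes.items():
        probability *= suitability(conditions[name], env)
    return probability


# Illustrative values only; they are not taken from any real species.
example_envelopes = {
    "temperature": Envelope(2.0, 10.0, 20.0, 28.0),
    "salinity": Envelope(30.0, 33.0, 36.0, 38.0),
}
example_cell = {"temperature": 6.0, "salinity": 34.0}
print(cell_probability(example_cell, example_envelopes))
```

With these illustrative numbers, the temperature lies halfway up the rising edge and the salinity falls inside the preferred range, so the cell gets a probability of one half.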

The AquaMaps scenario
Data tables:
– Species Environmental Envelope (HSPEN): range of environmental tolerance and preference of a species
– Cells Authority File (HCAF): metadata about half-degree cells (membership, physical attributes)
– Cells Species Assignments (HSPEC): probability of occurrence of a species in a given cell
Workflow steps:
– Defining environmental envelopes
– Generating species occurrence probabilities
– Plotting occurrence maps and GIS layers
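How the three tables relate can be sketched as a small data model: HSPEN rows and HCAF rows go in, HSPEC rows come out. The field names, the simplified (min, max) tolerances, the placeholder scoring rule, and the choice to skip zero-probability rows are all assumptions for illustration; the real table schemas are not given in the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class HspenRow:
    """One species envelope (hypothetical, simplified to a min/max pair per parameter)."""
    species_id: str
    tolerances: Dict[str, Tuple[float, float]]


@dataclass(frozen=True)
class HcafRow:
    """One half-degree cell with its physical attributes (hypothetical fields)."""
    cell_id: str
    attributes: Dict[str, float]


@dataclass(frozen=True)
class HspecRow:
    """Probability of occurrence of one species in one cell."""
    species_id: str
    cell_id: str
    probability: float


def inside_tolerance(species: HspenRow, cell: HcafRow) -> float:
    """Placeholder scoring rule: 1.0 if every attribute is within range, else 0.0."""
    for name, (low, high) in species.tolerances.items():
        if not low <= cell.attributes[name] <= high:
            return 0.0
    return 1.0


def generate_hspec(
    hspen: Iterable[HspenRow],
    hcaf: Iterable[HcafRow],
    score: Callable[[HspenRow, HcafRow], float] = inside_tolerance,
) -> Iterator[HspecRow]:
    """Score every species against every cell; skipping zero rows is an assumption."""
    cells = list(hcaf)  # materialize once so the cells can be re-scanned per species
    for species in hspen:
        for cell in cells:
            probability = score(species, cell)
            if probability > 0.0:
                yield HspecRow(species.species_id, cell.cell_id, probability)
```

Passing the scoring function as a parameter keeps the table join separate from the model, so a different response curve can be swapped in without touching the loop.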

The AquaMaps scenario
Very large volume of input and output data
Fewer than 7,000 species:
– HSPEC native range = 56,468,301
– HSPEC suitable range = 114,989,360
Estimate for 50,000 species:
– HSPEC native range = 350,000,000
– HSPEC suitable range = 715,000,000
[Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center]
Very large number of computations
One multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species
– requires 125 million computations
One global map (extended to all species and cells around the world)
– requires about 400 billion computations
[N. Bailly, WorldFish Center]
11,549 species (from FishBase): 2 days of sequential computation
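A quick back-of-the-envelope check of these figures, using only the numbers quoted on the slide. The slide does not define what counts as one "computation", so the per-pair ratio below is an observation, not an explanation; the variable names are illustrative.

```python
# Figures quoted on the slide for the multispecies map.
cells = 6_188
species = 2_540
reported_computations = 125_000_000

# Number of (species, cell) pairs and the implied computations per pair.
pairs = cells * species
computations_per_pair = reported_computations / pairs

# Sequential run quoted on the slide: 11,549 species in 2 days.
sequential_seconds = 2 * 24 * 60 * 60
fishbase_species = 11_549
seconds_per_species = sequential_seconds / fishbase_species

print(pairs)                            # 15717520
print(round(computations_per_pair, 2))  # 7.95
print(round(seconds_per_species, 1))    # 15.0
```

So the quoted 125 million corresponds to roughly eight computations per species-cell pair, and the sequential run averages about 15 seconds per species.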

D4Science Infrastructure
Production-level infrastructure deployed and maintained during the D4Science (2007) and D4Science II (2009) projects.
[Figure labels: GBIF, DRIVER, EGI, AquaMaps VRE, GENESI-DEC, VENUS-C, GeoNetwork, OBIS]

D4Science Infrastructure
D4Science hosts biodiversity communities federated by the iMarine and the EUBrazilOpenBio initiatives.
D4Science will provide ENVRI RIs with seed resources.
Well suited for typical biodiversity processes like ecological modeling.
Provides access to:
– computational and storage resources offered by commercial cloud providers
– new storage technologies generally identified as NoSQL databases
– several algorithms for performing data analysis and mining
Offers scalable platforms for data interoperability and efficient data management.
Offers a scalable infrastructure for efficient spatial data access, processing, and visualization.

D4Science: example of communities
[Figure labels: Collaborators; 33 M hits/month; 50 K unique visitors/month from 26 countries; AquaMaps; Operational Data; Observation Data; 400 experts; OpenModeller; Cloud]

gCube Framework
gCube is a Java service-oriented framework managing:
– creation and interconnection of e-Infrastructures in a controlled and highly configurable environment
– deployment of dynamic Virtual Research Environments
Enabling Layer
Allows deployment of:
– native components on Tomcat (hot deployments)
– gCube components on an Axis container (dynamic deployments)
Implements optimal deployment and allocation of infrastructure components (automatic or admin-driven)

gCube Framework: Main Components
Information System
– A key service in a gCube-based infrastructure: it offers functionality for publishing, monitoring, discovering and accessing the set of resources forming the infrastructure.
Storage Manager
– File storage is managed on a network of distributed storage nodes running specialized software for document-oriented databases. In its current implementation, the Storage Manager Library offers file management over two possible document stores: MongoDB and Terrastore.
Message Queue
– This service is based on the Apache ActiveMQ message broker and supports a queue-based mechanism for distributing messages to consumers.
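The queue-based distribution of messages to consumers can be illustrated with a small in-process sketch. It uses Python's standard `queue` and `threading` modules as a stand-in for a message broker; it does not use or imitate the gCube or ActiveMQ APIs, and `run_consumers` is a hypothetical helper.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_consumers(messages: Iterable, handler: Callable, consumers: int = 3) -> List:
    """Distribute messages to competing consumers; each message is handled once.

    Messages must not be None, because None is used as the stop signal.
    """
    inbox: queue.Queue = queue.Queue()
    results: List = []
    results_lock = threading.Lock()

    def consume() -> None:
        while True:
            message = inbox.get()
            if message is None:  # stop signal: no more work for this consumer
                break
            outcome = handler(message)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=consume) for _ in range(consumers)]
    for thread in threads:
        thread.start()
    for message in messages:
        inbox.put(message)
    for _ in threads:
        inbox.put(None)  # one stop signal per consumer
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Results arrive in completion order, so sort before printing.
    print(sorted(run_consumers(range(5), lambda m: m * 2)))
```

Because consumers pull from a shared queue, a slow consumer does not block the others, which is the property that makes this pattern attractive for distributing work.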

gCube Framework: Main Components
Executor
– A key component for endowing a gCube-based infrastructure with cloud processing. It acts as a container for gCube tasks (implemented as plugins of the service), which can be dynamically deployed into the service and executed through its interface.
Generic Worker
– A task of the Executor that is used for cloud computation. It can execute "processes", either binary executables or scripts, along with their dependencies in a sandbox.
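The idea of running a script together with its dependencies in an isolated working area can be sketched as below. This is only an analogy: a temporary directory is not a security sandbox, the Generic Worker's actual mechanism is not described in the slides, and `run_in_scratch_dir` is a hypothetical helper.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict


def run_in_scratch_dir(script_source: str, dependencies: Dict[str, str], timeout: float = 30.0) -> str:
    """Stage a script and its dependency files in a fresh directory, run it, return stdout."""
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        # Copy every dependency next to the script so relative paths resolve.
        for name, content in dependencies.items():
            (root / name).write_text(content, encoding="utf-8")
        script = root / "task.py"
        script.write_text(script_source, encoding="utf-8")
        completed = subprocess.run(
            [sys.executable, str(script)],
            cwd=root,             # the process only sees the staged files by relative path
            capture_output=True,
            text=True,
            timeout=timeout,      # stop runaway tasks
            check=True,           # raise CalledProcessError on a non-zero exit code
        )
        return completed.stdout


if __name__ == "__main__":
    source = 'print(open("input.txt").read().upper())'
    print(run_in_scratch_dir(source, {"input.txt": "cod"}).strip())
```

The directory and everything staged in it are removed when the task finishes, so consecutive tasks do not see each other's files.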

gCube Framework: Main Components
Geospatial Data Manager
– Service for discovering and accessing distributed environmental data and maps. It relies on maps stored on several GeoServer instances. A set of PostGIS databases stores the concrete values and geometries, and GeoServer distributes them according to standard Open Geospatial Consortium (OGC) protocols such as Web Map Service (WMS), Web Coverage Service (WCS) and Web Feature Service (WFS). A GeoNetwork instance provides an OGC CSW-based search engine for retrieving meta-information.
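As an illustration of the WMS protocol mentioned above, the sketch below builds a WMS 1.1.1 `GetMap` request URL. The request parameters are the standard ones; the host name and layer name are placeholders, not real D4Science endpoints, and `wms_getmap_url` is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import urlencode


def wms_getmap_url(
    base_url: str,
    layer: str,
    bbox: Tuple[float, float, float, float],
    width: int,
    height: int,
) -> str:
    """Build a WMS 1.1.1 GetMap URL for one layer in EPSG:4326 (lon/lat order)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the layer's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder server and layer, for illustration only.
    print(wms_getmap_url(
        "https://geoserver.example.org/wms",
        "aquamaps:example_species",
        (-180, -90, 180, 90),
        720,
        360,
    ))
```

Any OGC-compliant client can issue the same request, which is why publishing through GeoServer makes the generated maps usable outside the infrastructure.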

gCube Statistical Manager
[Figure labels: Statistical Manager; Java Objects; User's DB; D4Science DB Source; SDMX; Storage Manager; CSV; HTTP calls; D4Science Workspace; External Features Sources]

Scope
Statistical Manager is able to:
– Generate geographical probability models for species (e.g. AquaMaps)
– Perform transformations on data (e.g. interpolations)
– Perform data mining operations (e.g. modeling, clustering)
– Evaluate models, distributions and experiments (e.g. ROC curve, AUC, accuracy)
– Perform data quality analysis (e.g. Habitat Representativeness Score)
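One of the evaluation measures listed above, AUC, can be computed directly from model scores at known presence and absence locations. The sketch below uses the standard pairwise (Mann-Whitney) formulation; it is a generic illustration, not the Statistical Manager's code, and the sample scores are invented.

```python
from typing import Sequence


def auc(scores_present: Sequence[float], scores_absent: Sequence[float]) -> float:
    """Area under the ROC curve.

    Equals the probability that a randomly chosen presence location receives a
    higher score than a randomly chosen absence location; ties count as half.
    """
    if not scores_present or not scores_absent:
        raise ValueError("both presence and absence scores are required")
    wins = 0.0
    for present in scores_present:
        for absent in scores_absent:
            if present > absent:
                wins += 1.0
            elif present == absent:
                wins += 0.5
    return wins / (len(scores_present) * len(scores_absent))


if __name__ == "__main__":
    # Invented scores: predicted probabilities at presence and absence cells.
    print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

A value of 1.0 means the model ranks every presence above every absence, while 0.5 is what a random ranking would achieve. The quadratic loop is fine for small samples; a rank-based version scales better.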

Architecture
[Figure]

Advanced Graphical Interfaces
[Figure]

D4Science Cloud Processing
[Figure]

Statistical Manager & AquaMaps
– The Statistical Manager is instantiated with the AquaMaps algorithm
– Data generation is up to 50 times faster on the D4Science cloud
– Adds the generation and publication of GIS layers representing the species distribution
– Supports generation of transects
– Supports dataset management facilities
– Solves scalability issues

Conclusions
Ecological Modeling in D4Science:
– Performs modeling by using cloud computing in a way that is transparent to users
– Takes care of parallelization issues
– Evaluates model performance
Next step:
– Transparent generation of geospatial features at different resolutions, by implementing geospatial data processing on cloud computing facilities exposed through a WPS protocol interface.
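The kind of parallelization that the infrastructure hides from users can be sketched as a split, process, and merge pattern over the species list. This is a generic illustration with hypothetical names (`chunked`, `process_in_parallel`), not the D4Science scheduling logic. It uses threads for simplicity; CPU-bound work in CPython would need processes or remote workers to see a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Split a sequence into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_parallel(
    species_ids: Sequence,
    model: Callable,
    chunk_size: int = 100,
    workers: int = 4,
) -> Dict:
    """Run `model` on every species, one chunk per task, and merge the partial results."""

    def run_chunk(chunk: Sequence) -> Dict:
        return {species_id: model(species_id) for species_id in chunk}

    merged: Dict = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so the merged dict keeps input order.
        for partial in pool.map(run_chunk, chunked(species_ids, chunk_size)):
            merged.update(partial)
    return merged


if __name__ == "__main__":
    ids: List[int] = list(range(10))
    print(process_in_parallel(ids, lambda species_id: species_id ** 2, chunk_size=3, workers=2))
```

Because each species is modeled independently of the others, the chunks share no state and the merge step is a plain dictionary update.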

Go mobile with iMarine
AppliFish: the iMarine application for iOS and Android to discover over 500 world marine species and stay informed on iMarine news & activities.
Try AppliFish!

Thanks for your attention
Questions?