On the D4Science Approach Toward AquaMaps Richness Maps Generation Pasquale Pagano - CNR-ISTI Pedro Andrade.


On the D4Science Approach Toward AquaMaps Richness Maps Generation
Pasquale Pagano - CNR-ISTI
Pedro Andrade - CERN
4th EGEE User Forum/OGF 25 and OGF Europe's 2nd International Event
Le Ciminiere, Catania, Sicily, Italy, 2-6 March 2009

2 What are AquaMaps?
- Model-based, large-scale predictions of the known natural occurrence of marine species
- Originally developed by Kaschner et al. (2006) to predict global distributions of marine mammals
- Predictions made by matching species tolerances (environmental envelope) against local environmental conditions
- Color-coded species range maps on a grid of half-degree latitude by longitude cells
- Supplements existing occurrence data

3 How AquaMaps works
Web-based, on-demand generation of maps by implementing an environmental-envelope-type modeling approach:
1. Defining environmental envelopes describing the tolerance of a species w.r.t. each environmental parameter (e.g. sea temperature, salinity)
2. Generating probabilities of species occurrence by matching the species environmental envelope against local environmental conditions
3. Plotting these predictions to document large-scale and long-term presence of a species
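The three steps above can be sketched in a few lines. This is only an illustration, not the AquaMaps implementation: it assumes a trapezoidal response per parameter (suitability 1 inside a preferred range, falling linearly to 0 at the tolerance limits) and multiplies the per-parameter values. The slides do not state the response shape, and every name and number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Tolerance of a species for one environmental parameter (hypothetical layout)."""
    min_val: float   # absolute lower tolerance limit
    pref_min: float  # lower bound of the preferred range
    pref_max: float  # upper bound of the preferred range
    max_val: float   # absolute upper tolerance limit

    def suitability(self, value: float) -> float:
        """Assumed trapezoidal response: 0 outside the limits, 1 in the preferred range."""
        if value < self.min_val or value > self.max_val:
            return 0.0
        if value < self.pref_min:
            return (value - self.min_val) / (self.pref_min - self.min_val)
        if value > self.pref_max:
            return (self.max_val - value) / (self.max_val - self.pref_max)
        return 1.0


def occurrence_probability(envelopes: dict, cell: dict) -> float:
    """Match a species envelope against the local conditions of one cell."""
    prob = 1.0
    for parameter, envelope in envelopes.items():
        prob *= envelope.suitability(cell[parameter])
    return prob


# Step 1: define the envelope (made-up values).
envelopes = {
    "temperature": Envelope(5.0, 10.0, 20.0, 25.0),
    "salinity": Envelope(30.0, 33.0, 36.0, 38.0),
}

# Step 2: generate probabilities for a few cells (made-up conditions).
cells = {
    "cell-1": {"temperature": 15.0, "salinity": 34.0},
    "cell-2": {"temperature": 7.5, "salinity": 34.0},
    "cell-3": {"temperature": 30.0, "salinity": 34.0},
}
probabilities = {code: occurrence_probability(envelopes, c) for code, c in cells.items()}

# Step 3 (plotting) would color each cell according to its probability.
print(probabilities)
```

A multiplicative combination means a single unsuitable parameter is enough to rule a cell out, which is one plausible reading of "matching the envelope against local conditions".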

4 How AquaMaps works
Relational DB containing tables for:
- Species Environmental Envelope (HSPEN)
  - Range of environmental tolerance and preference of a species
- Cells Authority File (HCAF)
  - Metadata about half-degree cells
  - E.g. name, membership (FAO, EEZ), physical attributes (salinity)
- Cells Species Assignments (HSPEC)
  - Probability of occurrence of a species in a given cell
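To make the relationship between the three tables concrete, here is a minimal sketch using an in-memory SQLite database. Only the table names (HSPEN, HCAF, HSPEC) come from the slides; the column names, the sample rows, and the 0.5 probability threshold used to count species per cell are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical columns; the real AquaMaps schema is not shown in the slides.
    CREATE TABLE hcaf (
        csquare_code TEXT PRIMARY KEY,  -- half-degree cell identifier
        fao_area     INTEGER,
        eez          TEXT,
        salinity     REAL
    );
    CREATE TABLE hspen (
        species_id   TEXT PRIMARY KEY,
        salinity_min REAL,
        salinity_max REAL
    );
    CREATE TABLE hspec (
        species_id   TEXT REFERENCES hspen(species_id),
        csquare_code TEXT REFERENCES hcaf(csquare_code),
        probability  REAL,
        PRIMARY KEY (species_id, csquare_code)
    );
    """
)

conn.executemany(
    "INSERT INTO hcaf VALUES (?, ?, ?, ?)",
    [("cell-1", 27, "EEZ-A", 35.1), ("cell-2", 27, "EEZ-B", 34.2)],
)
conn.executemany(
    "INSERT INTO hspen VALUES (?, ?, ?)",
    [("sp1", 30.0, 38.0), ("sp2", 32.0, 37.0)],
)
conn.executemany(
    "INSERT INTO hspec VALUES (?, ?, ?)",
    [("sp1", "cell-1", 0.9), ("sp2", "cell-1", 0.6),
     ("sp1", "cell-2", 0.2), ("sp2", "cell-2", 0.8)],
)


def richness(connection, threshold):
    """Count, per cell, the species whose probability reaches the threshold."""
    rows = connection.execute(
        "SELECT csquare_code, COUNT(*) FROM hspec "
        "WHERE probability >= ? GROUP BY csquare_code ORDER BY csquare_code",
        (threshold,),
    )
    return dict(rows.fetchall())


print(richness(conn, 0.5))
```

HSPEC holds one row per species and cell, so it grows with the product of the two, which is consistent with the later remark that HSPEC dominates the size of the map data.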

5 How AquaMaps works
[Diagram: workflow stages and the data or tool associated with each]
- Defining environmental envelopes: Species Environmental Envelope (HSPEN)
- Generating probabilities of species occurrence: Cells Species Assignments (HSPEC)
- Plotting these predictions: CSquare Mapper, AquaMaps
- Other labels: Cells Authority File (HCAF); Compute Intensive tasks

6 AquaMaps Bottlenecks
Difficult periodic maintenance
- Size of the map data, mainly HSPEC
- Re-generate all data files every time a species is included or excluded
- Re-generate maps every time the algorithm for "probability of occurrence" is updated
Limited performance
- Huge number of combinations of different richness maps: species, family, country, ecosystem, bounding box, etc.

7 How can D4Science help
D4Science provides an e-Infrastructure based on the concept of Virtual Research Environments (VREs). These VREs bring together different resources (data collections, services, computing and storage) to serve particular needs of distributed VOs.
[Diagram labels: EGEE, D4Science, gLite, gCube]

8 How can D4Science help
Direct Grid Execution (DONE)
- Exploit the grid to generate and store AquaMaps
- i.e. replace "on-demand generation" with "search for" maps
VRE Grid Execution (ONGOING)
- Support the whole workflow including maintenance
- Generate data from different predictive algorithms
- Alternative predictive maps will coexist, with support for comparison and validation

9 Grid Execution
[Diagram: components and the actions connecting them]
- Components: AquaMaps DB, D4Science Services, gLite WNs, gLite SE
- Actions: create the jobs input; create C-SquareGrid input; submit jobs; download C-SquaresGrid app; run C-SquaresGrid; store maps+metadata

10 Grid Execution
C-SquaresGrid Mapper is the application resulting from porting C-Square Mapper to the gLite environment.
- C-Square Mapper
  - a web-oriented, Perl-based utility
  - plots datasets on 2D base maps starting from a csquares string
- C-SquaresGrid Mapper
  - standalone Perl application executed as a gLite job
  - accepts csquare input strings as files
  - generates global 3D maps, also by using Xplanet
  - generates AquaMaps metadata and provenance data
  - stores the products to a gLite SE

11 Grid Execution
AquaMaps are generated by:
- Species, Family, Order, Class, Phylum
Each AquaMaps grid job processes 50 Species distribution maps (or Class, Family)
- Data are obtained by aggregating Species data and distributing these probability data in 5 clusters
- C-SquaresGrid Mapper is provided with files containing 5 csquare strings for each AquaMap
Each AquaMaps grid job produces:
- one 2D map and 13 global 3D views (for each Species)
- provenance data and metadata
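The job layout on this slide (batches of 50 maps, probabilities split into 5 clusters) can be illustrated as follows. This is a sketch under assumptions: the slides do not say how the clusters are defined, so equal-width probability bins are used here, and the function names, cell codes, and species identifiers are hypothetical.

```python
def batch(items, size=50):
    """Split a list of map identifiers into per-job batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def cluster_cells(cell_probabilities, n_clusters=5):
    """Group cell codes into equal-width probability bins (an assumed scheme).

    Cells with probability 0 are dropped; a probability of exactly 1.0
    falls into the last bin.
    """
    clusters = [[] for _ in range(n_clusters)]
    for code, probability in cell_probabilities.items():
        if probability <= 0:
            continue
        index = min(int(probability * n_clusters), n_clusters - 1)
        clusters[index].append(code)
    return clusters


# 120 hypothetical species identifiers give three jobs: 50, 50 and 20 maps.
species = [f"sp{i}" for i in range(120)]
jobs = batch(species)
print([len(job) for job in jobs])

# One list of cell codes per cluster; each list would back one csquare string.
probs = {"c1": 0.05, "c2": 0.3, "c3": 0.5, "c4": 0.7, "c5": 0.95, "c6": 1.0, "c7": 0.0}
print(cluster_cells(probs))
```

Five lists per map match the five csquare strings per AquaMap mentioned above, presumably so that each cluster can be drawn in its own color.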

12 Grid Execution
Data Processed (1st round)
- 6086 Species Maps ~ [count missing] products (5.3 GB of images+metadata)
- 592 Family Maps ~11000 products (673 MB)
- 132 Order Maps ~2500 products (127 MB)
- 39 Class Maps ~700 products (50 MB)
- 13 Phylum Maps ~250 products (25 MB)
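As a quick sanity check on these figures, the snippet below totals the maps and computes the approximate number of products per map for each rank. The numbers are copied from the slide; the Species product count is missing in the source and is deliberately left out rather than estimated.

```python
# (maps, approximate products) per taxonomic rank, as reported on the slide.
# The Species product count is missing in the source, so it is None here.
ROUND_1 = {
    "Species": (6086, None),
    "Family": (592, 11000),
    "Order": (132, 2500),
    "Class": (39, 700),
    "Phylum": (13, 250),
}

total_maps = sum(maps for maps, _ in ROUND_1.values())

products_per_map = {
    rank: round(products / maps, 1)
    for rank, (maps, products) in ROUND_1.items()
    if products is not None
}

print(total_maps)
print(products_per_map)
```

The ratios come out at roughly 18 to 19 products per map. That is in the same range as one 2D map plus 13 3D views plus provenance and metadata files, although the slides do not give the exact breakdown.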

13 Grid Execution
A total of 136 jobs were submitted
- success rate of 99%
- 510 min for 6000 species
Map creation time depended heavily on the generation of the distribution maps themselves:
- Species: ~15 sec for each species
- Family: ~25 sec
- Order: ~1 minute
- Class: ~3 minutes
- Phylum: ~5 minutes

14 VRE Exploitation
A gCube VRE handling AquaMaps generation
- Exploiting D4Science and gLite services
- Equipped with an AquaMaps DB containing data on fishery species and their distribution
- AquaMaps generation orchestrated via a VRE workflow
  - Job submission and management over the EGEE infrastructure
  - Interacting with the Archive Import Service to store generated AquaMaps compound objects in the Content Management Service
  - Execution of Time Series Services to support the comparison and validation of multiple distribution maps
  - Discovery, access, and retrieval of distribution maps using different criteria (environmental, biological, geographical)

15 Conclusions
Enhanced performance in the creation of AquaMaps
- Distribution maps covering a larger number of species
- More frequent updates of distribution maps
- Easier to tune the "probability of occurrence" algorithm
VRE exploitation brings advanced functionality
- Support for alternative predictive maps
- Execution of comparison and validation of predictive maps
Scientists can focus on their core activities!

16 Thanks! Questions?