Organising data flows and modelling for the Essential Biodiversity Variables
Hannu Saarenmaa – University of Eastern Finland
GEO BON, WG8 – Data Integration and Interoperability
EU BON, WP2 – Data Integration and Interoperability
BioVeL, WP2 – Workflows for Scientific Research
GEO-X Plenary, Geneva, 14 January 2014
Essential Biodiversity Variables
– Conceived by GEO BON collaborators (Pereira et al. (2013) “Essential Biodiversity Variables”, Science, Vol. 339, 18 Jan 2013).
– EBVs facilitate data integration by providing an intermediate abstraction layer between primary observations and indicators.
– EBVs are computed from a large number of inputs (monitoring and incidental data).
– EBVs aim to help observation communities harmonise monitoring by identifying how variables should be sampled and measured.
– EBVs standardise an ontology for biodiversity and harmonise measurements, observations, and protocols.
– EBVs are endorsed by the Convention on Biological Diversity (CBD) and are in line with the Aichi Targets for 2020.
– They provide a focus for GEO BON and hence for the interoperability thrust within GEO BON.
– They are a use case that GEO BON, EU BON and BioVeL all focus on.
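To make the “intermediate abstraction layer” concrete, here is a minimal Python sketch under stated assumptions: raw (species, site, year, count) records are aggregated into a population-abundance-style EBV table, and an indicator is then computed only from that table. The records, species labels and the geometric-mean index are hypothetical illustrations, not the EBV definitions of Pereira et al. (2013).

```python
from collections import defaultdict
from math import exp, log
from statistics import mean

# Hypothetical primary observations: (species, site, year, count).
OBSERVATIONS = [
    ("Species A", "site-1", 2010, 12),
    ("Species A", "site-2", 2010, 8),
    ("Species A", "site-1", 2020, 15),
    ("Species A", "site-2", 2020, 9),
    ("Species B", "site-1", 2010, 6),
    ("Species B", "site-1", 2020, 3),
]


def population_abundance_ebv(observations):
    """EBV layer: total count per (species, year), regardless of site or source."""
    ebv = defaultdict(int)
    for species, _site, year, count in observations:
        ebv[(species, year)] += count
    return dict(ebv)


def abundance_index(ebv, base_year, target_year):
    """Indicator layer: geometric mean of per-species abundance ratios.

    Only species present in both years contribute (an illustrative choice).
    """
    species = sorted({sp for sp, _year in ebv})
    ratios = [
        ebv[(sp, target_year)] / ebv[(sp, base_year)]
        for sp in species
        if (sp, base_year) in ebv and (sp, target_year) in ebv
    ]
    return exp(mean(log(r) for r in ratios))


ebv_table = population_abundance_ebv(OBSERVATIONS)
index = abundance_index(ebv_table, 2010, 2020)
print(ebv_table)
print(round(index, 3))  # 0.775
```

Because the indicator reads only the EBV table, new observation sources can be added without touching the indicator code, which is the integration benefit described on the slide.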
Where does the data come from?
– In Europe there are about 2,000 biodiversity observation networks (only 643 are listed by EUMON).
– GBIF has 10,000 data sets, openly accessible and conforming to the GEOSS Data Sharing Principles.
– LTER/DataONE has thousands of biodiversity data sets.
– EU BON is carrying out a gap analysis:
  – There is massive duplication of effort in data management, and a lack of data sharing.
  – Very few data sets have a documented and guaranteed “quality” (coverage, accuracy, etc.).
  – The so-called “data core” in biodiversity has not yet been defined.
BioVeL (Biodiversity Virtual e-Laboratory): processing services and workflows
– “Workflows” (series of data analysis steps) allow researchers to process vast amounts of data.
– Build your own workflow: select and apply successive “services” (data processing techniques).
– Import data from your own research and/or from existing libraries (e.g. GBIF, Catalogue of Life).
– Access a library of workflows and re-use existing ones.
– Cut down research time and overhead expenses.
[Figure: Part of a workflow to study the ecological niche of the horseshoe crab]
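The idea of chaining “services” into a workflow can be sketched as plain function composition, where each service takes a list of records and returns a new one. This is a hypothetical Python illustration, not BioVeL's actual workflow engine; the three service functions and the record fields are assumptions.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[List[Record]], List[Record]]


def drop_missing_coordinates(records: List[Record]) -> List[Record]:
    """Hypothetical cleaning service: keep only georeferenced records."""
    return [r for r in records if r.get("lat") is not None and r.get("lon") is not None]


def normalise_names(records: List[Record]) -> List[Record]:
    """Hypothetical service: trim whitespace and use 'Genus species' capitalisation."""
    return [{**r, "name": str(r["name"]).strip().capitalize()} for r in records]


def deduplicate(records: List[Record]) -> List[Record]:
    """Hypothetical service: drop repeated (name, lat, lon) records, keeping the first."""
    seen = set()
    unique = []
    for r in records:
        key = (r["name"], r["lat"], r["lon"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def run_workflow(services: List[Service], records: List[Record]) -> List[Record]:
    """Apply each service in order, feeding one service's output into the next."""
    return reduce(lambda data, service: service(data), services, records)


occurrences = [
    {"name": " limulus polyphemus", "lat": 39.0, "lon": -75.0},
    {"name": "Limulus polyphemus", "lat": 39.0, "lon": -75.0},
    {"name": "Limulus polyphemus", "lat": None, "lon": -74.0},
]
workflow = [drop_missing_coordinates, normalise_names, deduplicate]
cleaned = run_workflow(workflow, occurrences)
print(cleaned)  # one record left
```

Re-using a workflow then amounts to re-using the list: the same services can be applied unchanged to another species' records, or reordered and extended.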
Aim: predictive modelling of biodiversity change
Available tools from a growing family of ENM workflows (released to the public at …):
1. Data Refinement Workflow (DRW) for pre-processing
  – Taxonomic name resolution / occurrence retrieval
  – Geo-temporal data selection using ‘BioSTIF’
  – Data quality checks / filtering using ‘Google Refine’
2. Ecological Niche Modelling Workflow (ENM)
  – Classic ENM with 15 algorithms
  – Separate BioClim workflow (requires special inputs)
3. ENM Statistical Workflow (ESW) for post-processing
  – DIFF: extent and intensity of change
  – STACK: extent, intensity, and a cumulated potential
  – SHIFT: shift of the centre of gravity (direction and length, in kilometres)
[Diagram: The analytical cycle: data discovery → data assembly, cleaning, and refinement → ecological niche modelling → statistical analysis]
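One plausible reading of the ESW DIFF and SHIFT statistics, sketched in Python for two habitat-suitability grids of the same region: DIFF counts the cells that changed (extent) and their mean absolute change (intensity), and SHIFT measures how far, and in which direction, the suitability-weighted centre of gravity moved. The grid layout, the change tolerance and the haversine distance on a 6371 km sphere are assumptions; ESW's own definitions may differ.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # suitability in [0, 1]; grid[i][j] is cell (lats[i], lons[j])

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def diff(before: Grid, after: Grid, tol: float = 1e-9) -> Tuple[int, float]:
    """DIFF: extent = number of cells that changed; intensity = their mean absolute change."""
    changes = [a - b for row_b, row_a in zip(before, after) for b, a in zip(row_b, row_a)]
    changed = [c for c in changes if abs(c) > tol]
    extent = len(changed)
    intensity = sum(abs(c) for c in changed) / extent if extent else 0.0
    return extent, intensity


def centroid(grid: Grid, lats: List[float], lons: List[float]) -> Tuple[float, float]:
    """Suitability-weighted centre of gravity (plain lat/lon averaging; fine for small regions)."""
    total = sum(sum(row) for row in grid)
    lat = sum(v * lats[i] for i, row in enumerate(grid) for v in row) / total
    lon = sum(v * lons[j] for row in grid for j, v in enumerate(row)) / total
    return lat, lon


def shift(before: Grid, after: Grid, lats: List[float], lons: List[float]) -> Tuple[float, float]:
    """SHIFT: great-circle length (km) and initial bearing (degrees from north) of the centroid move."""
    lat1, lon1 = centroid(before, lats, lons)
    lat2, lon2 = centroid(after, lats, lons)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    # Haversine distance between the two centroids
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    length_km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    # Initial bearing from the old centroid towards the new one
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return length_km, bearing


# Hypothetical 2 x 2 example: suitability moves one degree of latitude north.
lats, lons = [60.0, 61.0], [25.0, 26.0]
present = [[1.0, 1.0], [0.0, 0.0]]
future = [[0.0, 0.0], [1.0, 1.0]]
print(diff(present, future))  # (4, 1.0)
print(shift(present, future, lats, lons))  # length about 111.2 km, bearing 0.0 (due north)
```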
Seamless exchange of data layers
Use case: the spruce bark beetle, Ips typographus, and the disturbance of forest ecosystems
[Maps: pre-2002, year 2050, difference]
Statistical processing of the difference in Finland indicates that the susceptibility of spruce forests to Ips typographus damage will increase five-fold by …
Policy advice: stricter forest hygiene through tougher legislation, so that Ips populations are kept to a minimum given the increased risk.
Papers for Silva Fennica and for the INTECOL session proceedings in the Journal of Ecology.
Outline of the use case
Running the Ecological Niche Modeling (ENM) workflow for a large number of species:
– Process data points for hundreds of species (e.g. plants, butterflies, …)
– Use data mostly from GBIF, but also from elsewhere
– Each individual species may have on the order of 10^5 data points
– Run openModeller-based ENM for all the data points
– Choose predictive layers from WorldClim and GEOSS sources
Generate summary statistics that can answer questions such as:
– How many species are increasing? How many are decreasing? Does the flora/fauna move in any direction? Is the distribution fragmenting? Is it shrinking? How many populations are becoming marginalised?
– Prototype automatic data processing for computing the Essential Biodiversity Variables (EBVs)
[Callout: EBVs?]
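The first summary questions (how many species are increasing, how many decreasing) reduce to classifying each species by the change in its predicted range. A minimal Python sketch, assuming each ENM run yields a current and a future suitable-area figure per species; the 5 % stability threshold, the species identifiers and the areas are hypothetical.

```python
from typing import Dict, List, Tuple


def summarise_range_change(
    areas: Dict[str, Tuple[float, float]], rel_tol: float = 0.05
) -> Dict[str, List[str]]:
    """Classify species as increasing, decreasing or stable from (current, future) range area.

    A relative change within +/- rel_tol counts as stable (the 5 % default is an assumption).
    """
    summary: Dict[str, List[str]] = {"increasing": [], "decreasing": [], "stable": []}
    for species, (current, future) in sorted(areas.items()):
        if current <= 0:
            # No current range: any predicted future range counts as an increase.
            summary["increasing" if future > 0 else "stable"].append(species)
            continue
        change = (future - current) / current
        if change > rel_tol:
            summary["increasing"].append(species)
        elif change < -rel_tol:
            summary["decreasing"].append(species)
        else:
            summary["stable"].append(species)
    return summary


# Hypothetical predicted suitable areas (km²) per species: (current, future).
predicted = {
    "species-1": (1200.0, 1800.0),
    "species-2": (5400.0, 3100.0),
    "species-3": (800.0, 820.0),
    "species-4": (0.0, 150.0),
}
summary = summarise_range_change(predicted)
print({k: len(v) for k, v in summary.items()})  # {'increasing': 2, 'decreasing': 1, 'stable': 1}
```

Questions about direction or fragmentation would need the spatial model outputs themselves rather than area totals.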
Status of the current BioVeL ENM workflow
– Current openModeller-based ENM workflows work at a smaller scale, focusing on one or a few selected species.
– The current workflow requires frequent interaction with the user (many clicks if we simply multiply the runs).
– We need a system that is scalable and automated to run ENM for hundreds of species.
– We need a system that can perform a summary analysis across all the species based on the individual ENM runs.
The second-generation BioVeL portal will provide the required capabilities. It is to be released publicly in January 2014 (currently in beta).
Envisaged application structure
[Diagram: GBIF query, LTER query, EUMON query, … → selected species → ENM parameter sets for species → ENM workflow (one per species) … → ENM output files → summary analysis]
– Multiple species may use the same ENM parameter set (e.g. Mediterranean dryland plants).
– Parameter sets are generated and tested with another workflow (see next slide).
– Some species may need other offline data, or private data uploaded by the user.
– One ENM workflow predicts the impact of environmental changes on the distribution of one species.
– The portal offers the ENM output files for download.
– The summary analysis is performed with a custom R-based tool outside the portal; EBVs are produced by combining data from different models.
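The fan-out in this structure (many species, fewer shared parameter sets, one ENM run per species, then one summary step) can be sketched as an unattended batch runner. In this hypothetical Python sketch, run_enm is a placeholder standing in for one portal ENM workflow run and summary_analysis is a trivial stand-in for the R-based step; the parameter-set names, algorithm labels and WorldClim-style layer names (bio1, bio5, bio12) are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical shared parameter sets, e.g. one per ecological group.
PARAMETER_SETS: Dict[str, dict] = {
    "mediterranean-dryland": {"algorithm": "ALG-A", "layers": ["bio1", "bio12"]},
    "boreal": {"algorithm": "ALG-B", "layers": ["bio1", "bio5", "bio12"]},
}

# Several species can point at the same parameter set.
SPECIES_TO_SET: Dict[str, str] = {
    "species-1": "mediterranean-dryland",
    "species-2": "mediterranean-dryland",
    "species-3": "boreal",
}


def run_enm(species: str, params: dict) -> dict:
    """Placeholder for one ENM workflow run (one species); a real run would call the portal."""
    return {
        "species": species,
        "algorithm": params["algorithm"],
        "output_file": f"{species}_enm_output.tif",  # hypothetical file name
    }


def run_batch(species_to_set: Dict[str, str], parameter_sets: Dict[str, dict],
              max_workers: int = 4) -> List[dict]:
    """Run one ENM job per species in parallel, without user interaction."""
    jobs = [(species, parameter_sets[key]) for species, key in sorted(species_to_set.items())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in job order, so outputs line up with `jobs`.
        return list(pool.map(lambda job: run_enm(*job), jobs))


def summary_analysis(outputs: List[dict]) -> Dict[str, int]:
    """Trivial stand-in for the summary step: count runs per algorithm."""
    counts: Dict[str, int] = {}
    for out in outputs:
        counts[out["algorithm"]] = counts.get(out["algorithm"], 0) + 1
    return counts


outputs = run_batch(SPECIES_TO_SET, PARAMETER_SETS)
print(summary_analysis(outputs))  # {'ALG-A': 2, 'ALG-B': 1}
```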
ENM parameter optimisation workflow
[Diagram: selected species + parameter matrix (possible parameter combinations) → parameter test and selection jobs … → ENM parameter sets for species, i.e. the optimal parameter input for the large ENM workflow (see previous slide)]
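The parameter matrix plus the test-and-selection job amounts to a grid search: enumerate every combination, score it per species, and keep the best. A minimal Python sketch; the parameter names and the toy scoring function are hypothetical, and a real job would presumably score each combination by evaluating an actual ENM run.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical parameter matrix: each key lists the values to test.
PARAMETER_MATRIX: Dict[str, list] = {
    "algorithm": ["ALG-A", "ALG-B"],
    "n_layers": [3, 6, 9],
}


def parameter_combinations(matrix: Dict[str, list]) -> List[dict]:
    """All possible parameter combinations (Cartesian product of the matrix)."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]


def select_parameters(species_list: List[str], matrix: Dict[str, list],
                      evaluate: Callable[[str, dict], float]) -> Dict[str, dict]:
    """Test every combination for each species and keep the highest-scoring one."""
    combos = parameter_combinations(matrix)
    return {sp: max(combos, key=lambda p: evaluate(sp, p)) for sp in species_list}


def toy_score(species: str, params: dict) -> float:
    """Hypothetical score; a real job would evaluate an ENM run (e.g. a validation metric)."""
    return (1.0 if params["algorithm"] == "ALG-B" else 0.0) + params["n_layers"] / 10


best = select_parameters(["species-1", "species-2"], PARAMETER_MATRIX, toy_score)
print(len(parameter_combinations(PARAMETER_MATRIX)))  # 6
print(best["species-1"])  # {'algorithm': 'ALG-B', 'n_layers': 9}
```

The resulting per-species dictionaries are what would feed the large ENM workflow as its parameter sets.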
Initialising the data sweep on the portal
Results of the data sweep, ready to be mapped and statistically analysed
Example product: accumulated invasive potential for ecological groups
Example: stack of combined macrozoobenthic invasion heatmaps (slide by Matthias Obst, BioVeL)
– 20 blacklisted species divided into 4 ecological regimes:
  – Zoobenthos
  – Phytobenthos
  – Zoopelagial
  – Phytopelagial
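Accumulating invasive potential per ecological group can be sketched as a cell-wise sum (a stack) of per-species heatmaps, first within each group and then across groups. A hypothetical Python sketch on tiny 2 x 2 grids; the species identifiers, values and the plain sum as the combining rule are assumptions, not the method behind the slide.

```python
from typing import Dict, List

Grid = List[List[float]]  # per-species invasion heatmap, values e.g. in [0, 1]


def stack(grids: List[Grid]) -> Grid:
    """Cell-wise sum of equally sized grids (the accumulated potential)."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[i][j] for g in grids) for j in range(cols)] for i in range(rows)]


def stack_by_group(heatmaps: Dict[str, Grid], groups: Dict[str, str]) -> Dict[str, Grid]:
    """Stack the heatmaps of all species that belong to the same ecological group."""
    members: Dict[str, List[Grid]] = {}
    for species, grid in heatmaps.items():
        members.setdefault(groups[species], []).append(grid)
    return {group: stack(grids) for group, grids in sorted(members.items())}


# Hypothetical 2 x 2 heatmaps for three blacklisted species.
heatmaps = {
    "species-1": [[0.2, 0.0], [0.5, 1.0]],
    "species-2": [[0.3, 0.1], [0.0, 0.5]],
    "species-3": [[1.0, 0.0], [0.0, 0.0]],
}
groups = {"species-1": "zoobenthos", "species-2": "zoobenthos", "species-3": "phytobenthos"}

group_stacks = stack_by_group(heatmaps, groups)
combined = stack(list(group_stacks.values()))
print(group_stacks["zoobenthos"])
print(combined)
```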