Dimitris Koureas, PhD, Natural History Museum London. Linking layers of biodiversity data: Informatics challenges for long tail research. RDA Long Tail IG breakout session, Amsterdam, 23 Sep 2014
The problem – Capturing and integrating biodiversity data. How do we join up these activities? How do we use this as a tool? Species conservation & protected areas; impacts of human development; biodiversity & human health; impacts of climate change; food, farming & biofuels; invasive alien species. What infrastructures do we need? (technologies, tools, standards…) What processes do we need? (modelling, workflows…) What data do we need? (genes, localities…)
LinkD Challenge 1: mobilising data at all scales
LinkD Challenge 2: linking & aggregating data at different scales. National efforts: c. 5M (e.g. NHM Data Portal); communities: c. 50k (e.g. Scratchpads); global efforts: c. 500M (e.g. GBIF Data Portal).
LinkD Challenge 3: synthesising data, e.g. modelling human pressures on biodiversity. Projecting Responses of Ecological Diversity In Changing Terrestrial Systems: 2M records, 19k sites, 34k spp. Figure labels: management practices; ecosystems; agro-systems; small aggregated datasets; species richness in different ecosystems; land-use change; pollution; invasive species; infrastructure. Models to predict how biodiversity responds to human pressures.
The problem – integrating biodiversity research. Figure from Costello M.J. et al., doi: /science
The problem – integrating biodiversity research. c new sp and subsp. described every year
Key problems: landscape is complex, fragmented & hard to navigate; many audiences (policy makers, scientists, amateurs, citizen scientists); many scales (global solutions to local problems). Figure adapted from Peterson et al. 2010. An informatician's view of biodiversity
Figure labels: investigator-focused 'small data'; locally generated 'invisible data'; 'incidental data'; dark data; 20%; 80%; published and discoverable data. Dark data are more important, mainly due to their volume (1). (1) Heidorn PB. Library Trends 57:
Incentives for mobilising long-tail research: leverage effort and data impact; increase exposure and citability of work; provide easy-to-use and long-lasting VREs; promote the culture of openness in science.
Increase exposure and citability of work. Scholarly data publication: enable easy publication of data and data descriptors; link data journals with data sources (repositories, VREs) using common data exchange standards. Small data contributions.
Leverage effort and data impact. Virtual Research Environments: empower researchers through development and deployment of service-driven digital research environments. 515 Scratchpad communities by 6,321 active registered users, covering 176,950 taxa in 932,296 pages; 134 paper citations in 2013; in total more than 2,500,000 visitors.
Leverage effort and data impact. Long tail data; external data & services.
Leverage effort and data impact. Enable long tail researchers to do science online by processing their own data together with data from cross-disciplinary sources. Provide workflows for the processing of data in major areas of biodiversity research: ecological niche modelling, ecosystem functioning, and taxonomy. The BioVeL approach: Design and Construct – Run – Share and Discover scientific workflows.
Leverage effort and data impact. A highly dynamic but fragmented landscape.
Leverage effort and data impact. Data curation; data publishing; data mobilisation & generation; data analysis. Seamless virtual research environments that incentivise mobilisation of long tail research.
H VRE Proposal: LinkD. Topic: EINFRA Virtual Research Environments. Estimated budget: € 8-9 m. Consortium: c. 24 partners. LinkD: Linking data, services and communities for predictive modelling of the biosphere. Deliver a coherent and accessible ecosystem of federated services and deploy a network of research and collaboration-enabling tools to support scientific excellence towards the long-term vision of predictive modelling of the biosphere. Builds upon: ViBRANT | BioVeL | pro-iBiosphere | EU-BON. Strategic links to: ESFRI projects (incl. LifeWatch, ELIXIR).