RDA outputs for addressing biodiversity data challenges
Dr Dimitris Koureas
Lead of Research Data & Partnerships, Natural History Museum London
Executive Secretary, Biodiversity Information Standards (TDWG)
Co-chair, RDA Biodiversity Data Integration IG

The problem: Capturing and integrating biodiversity data
How do we join up these activities? How do we use this as a tool?
Species conservation & protected areas; impacts of human development; biodiversity & human health; impacts of climate change; food, farming & biofuels; invasive alien species
What infrastructures do we need? (technologies, tools, standards…)
What processes do we need? (modelling, workflows…)
What data do we need? (genes, localities…)

Challenge 1: mobilising data at all scales

Technical aspect of data mobilisation
Collections: 1.5-3B specimens in collections worldwide; fragmented efforts that need coordination
Biodiversity literature: >300M pages, BHL scanned 41M to date; copyright post-1923 & article metadata
Informatics challenges: automation & annotation; storage & persistence; business models to sustain activity
Collections, literature & metadata: how can we quickly, efficiently and cost-effectively mobilise biological data at scale?
Bibliography of Life (RefFinder & RefBank); BHL literature; NHM Digitisation

Big Data in Taxonomy and Systematics

Challenge 2: linking & aggregating data at different scales
National efforts: c.5M (e.g. NHM Data Portal)
Communities: c.50k (e.g. Scratchpads)
Global efforts: c.500M (e.g. GBIF Data Portal)

Challenge 3: Synthesising data, e.g. modelling human pressures on biodiversity
Reasoning across large, linked biodiversity datasets
A clear, singular, long-term vision, which biodiversity data can contribute to
Conceptually has many potential uses: identifying trends; explaining patterns; making predictions; real-time alerts when data contradict current knowledge
The ultimate policy tool
Major informatics challenges: technically very difficult (many years off); needs effective prototypes & platforms
Some first steps, e.g. Local Ecological Footprint Tool, Nature 2013, doi:10.1038/493295a

Synthetic challenges: Modelling the biosphere
PREDICTS: Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (www.predicts.org.uk)
2M records, 19k sites, 34k spp.
Small aggregated datasets: species richness in different ecosystems
Land-use change; pollution; invasive species; infrastructure
Management practices; ecosystems; agro-systems
Models to predict how biodiversity responds to human pressures
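As a rough illustration of what "species richness in different ecosystems" means when small datasets are pooled, the sketch below counts distinct species per site and averages those counts per land-use class. It is only a toy summary under stated assumptions: the record layout `(site, land_use, species)`, the function name and the sample values are all invented here, and it is not the statistical modelling the project itself uses.

```python
from collections import defaultdict


def richness_by_land_use(records):
    """Return {land_use: mean number of distinct species per site}.

    `records` is an iterable of (site, land_use, species) tuples.
    This layout is an assumption made for the example only.
    """
    species_at_site = defaultdict(set)
    land_use_of_site = {}
    for site, land_use, species in records:
        species_at_site[site].add(species)  # sets drop duplicate sightings
        land_use_of_site[site] = land_use

    counts = defaultdict(list)
    for site, species in species_at_site.items():
        counts[land_use_of_site[site]].append(len(species))

    return {land_use: sum(c) / len(c) for land_use, c in counts.items()}


# Invented sample data, for illustration only.
records = [
    ("site-1", "primary forest", "species A"),
    ("site-1", "primary forest", "species B"),
    ("site-1", "primary forest", "species C"),
    ("site-2", "primary forest", "species A"),
    ("site-2", "primary forest", "species A"),  # repeated sighting
    ("site-3", "cropland", "species A"),
]

summary = richness_by_land_use(records)
```

Comparing the per-class means gives a first, crude view of how richness differs between land uses. A predictive model would also need to account for sampling effort and study design.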

Diversity of Data types AND data sources Rod Page. http://iphylo.blogspot.co.uk. 2014

Aggregators: GBIF (http://gbif.org)
Occurrence data aggregated from different nodes (data holders)

Aggregators: Encyclopedia of Life (http://eol.org)

Aggregators: EOL TraitBank
Over 8 million traits

Aggregators: GenBank (http://www.ncbi.nlm.nih.gov)
GenBank is part of the International Nucleotide Sequence Database Collaboration
A comprehensive database that contains publicly available nucleotide sequences for almost 260,000 formally described species

Aggregators: Species+ (http://www.speciesplus.net/)
A combined source for legislation, distribution and trade in MEA-listed species

Aggregators: Scratchpads (http://scratchpads.eu)
Making taxonomy digital, open & linked

The Scratchpads concept
Your data; external data & services; data papers

Providers: Catalogue of Life (http://catalogueoflife.org)
A single authoritative source of taxonomic information

Providers: Biodiversity Heritage Library (BHL) (http://www.biodiversitylibrary.org/)
Biodiversity literature openly available to the world
> 200M pages of legacy literature

Rod Page. http://iphylo.blogspot.co.uk. 2015

Linking everything together is not enough. We need to provide turn-key solutions for researchers across scientific domains. This will spark new ideas and support disruptive innovation in science.

The 1k species project

Data mobilisation: mass digitisation and institutional data portals
Generate the data necessary to document our collections and provide the means to access this information
Provide common institutional portals to these data with a common licensing framework
Track and link data, specimens and authors through the adoption of persistent digital object identifier frameworks (e.g. ORCID, DataCite and CrossRef DOIs)
RDA outputs: Data Citation, Metadata, PID Information Types
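Linking authors by persistent identifier only works if the identifiers stored alongside specimen records are well formed. As a minimal sketch, the check below validates the final character of an ORCID iD, which is an ISO 7064 MOD 11-2 check digit computed over the first 15 digits. The function names are made up for this example, and a passing check says nothing about whether the iD is actually registered.

```python
import re


def orcid_check_digit(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` (with or without hyphens) has a valid checksum."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]
```

A data portal could run a check like this at ingestion time, before attempting any lookup against an external service.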

Data access: a common digital gateway to collections
Three billion specimens dispersed across multiple physical locations
Union catalogue: integrated information on collective museum holdings
Download datasets, images and 3D models from across all our institutions in a single step
RDA outputs: Repository Audit and Certification

Data services: a service-driven architecture for tool and model development
Tools and services that facilitate the manipulation and analysis of big, integrated datasets. Examples include: taxonomic name matching, checklist production, authority files, georeferencing, image recognition and acoustic recognition.
Cross-institutionally agreed core data models and technical interfaces (APIs)
RDA outputs: Practical Policy, Publishing Data Services, Data Foundation and Terminology
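To make "taxonomic name matching" concrete, here is a minimal sketch of a matcher that normalises case and whitespace and then looks for the closest entry in a reference checklist using Python's standard `difflib`. The function names, the 0.85 similarity cutoff and the tiny checklist are assumptions for illustration. Production services also handle authorship strings, synonyms and homonyms, which this sketch ignores.

```python
import difflib


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_name(query, checklist, cutoff=0.85):
    """Return the checklist entry closest to `query`, or None.

    `cutoff` is an arbitrary similarity threshold chosen for this example.
    """
    by_key = {normalise(name): name for name in checklist}
    hits = difflib.get_close_matches(
        normalise(query), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None


# Hypothetical reference checklist.
checklist = ["Quercus robur", "Quercus petraea", "Fagus sylvatica"]
```

With this checklist, a misspelt and oddly spaced query such as `"Quercus  robor"` resolves to `"Quercus robur"`, while a name absent from the list returns `None`.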

Data consensus: community data curation and attribution
Incentivise community contribution and curation through:
1. Easy-to-use mechanisms (services)
2. Clear incentives with academic and wider societal value
RDA outputs: Data Attribution (will be proposed)

What is the potential value of RDA outputs for integrating biodiversity data?
Strong advocacy from domain experts for RDA activities
Robust, simple and transferable case studies of how RDA outputs can underpin our efforts
Streamline the process of collating requirements specifications from all domain IGs as a first step for all tech WGs
Organise joint sessions with emphasis on scientific topics derived from domain IGs
Leverage the domain IGs as the driving force for technical solutions

The vision: develop the biodiversity knowledge graph and build services on top of that
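One way to picture a knowledge graph with services on top is as a set of subject-predicate-object links between identified things (specimens, taxa, publications, people), plus queries that walk those links. The sketch below is a deliberately small in-memory version. The class, the predicate names and every identifier in it are hypothetical placeholders, not an existing schema or API.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory store of (subject, predicate, object) links."""

    def __init__(self):
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._out[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """Everything linked from `subject` via `predicate`, sorted."""
        links = self._out.get(subject, ())
        return sorted(o for p, o in links if p == predicate)

    def reachable(self, start):
        """All nodes reachable from `start` by following links outward."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, nxt in self._out.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Hypothetical identifiers, for illustration only.
graph = KnowledgeGraph()
graph.add("specimen:1", "identifiedAs", "taxon:Quercus robur")
graph.add("specimen:1", "citedIn", "publication:A")
graph.add("publication:A", "authoredBy", "person:B")
```

Here `graph.reachable("specimen:1")` connects the specimen to its taxon, the publication citing it and that publication's author. A service built on the graph would expose this kind of traversal to users.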

Thank you http://uk.linkedin.com/in/dkoureas @DimitrisKoureas d.koureas@nhm.ac.uk

Array Database Working Group: Peter Baumann
Brokering Governance WG: Stefano Nativi, Max Craglia, Jay Pearlman
Data Citation WG: Andreas Rauber, Ari Asmi, Dieter van Uytvanck
Data Description Registry Interoperability (DDRI) WG: Amir Aryani, Adrian Burton
Data Foundation and Terminology WG: Peter Wittenburg, Gary Berg-Cross, Raphael Ritz
Data Type Registries WG: Larry Lannom, Daan Broeder
Metadata Standards Catalog WG: Rebecca Koskela, Keith Jeffery, Alex Ball
Metadata Standards Directory WG (rda-mdir-wg@rda-groups.org): Jane Greenberg, Keith Jeffery, Rebecca Koskela, Alex Ball
PID Information Types WG: Tobias Weigel, Tim DiLauro
Practical Policy WG: Reagan Moore, Rainer Stotzka
QoS-DataLC Definitions WG: Paul Millar
RDA/CODATA Summer Schools in Data Science and Cloud Computing in the Developing World: Hugh Shanahan, Andrew Harrison, Simon Hodson
RDA/WDS Publishing Data Bibliometrics WG: Kerstin Lehnert, Todd Carpenter, John Kratz, Sarah Callaghan
RDA/WDS Publishing Data Services WG: Hylke Koers, Adrian Burton
RDA/WDS Publishing Data Workflows WG: Sunje Dallmeier-Tiessen, Fiona Murphy, Nurnberger, Varsha Khodiyar
Repository Audit and Certification DSA–WDS Partnership WG: Lesley Rickards, Mary Vardigan, Rorie Edmunds
Research Data Collections WG: Bridget Almas, Frederik Baumgardt, Tobias Weigel, Tom Zastrow
The BioSharing Registry: connecting data policies, standards & databases in life sciences: Susanna-Assunta Sansone, Rebecca Lawrence, Simon Hodson & Peter McQuilton
Wheat Data Interoperability WG (rda-wdinterop-wg@rda-groups.org): Esther Dzalé Yeumo, Richard Fulss
Working Group Data Security and Trust

The challenges of a fragmented domain of e-infrastructures
All data objects need to have unique identifiers and resolvable handlers
Standardised web-services
Ontologies and vocabularies and open services
Clear and robust governance models

Diversity of Data types AND data sources
Investigator-focused 'small data'; locally generated 'invisible data'; 'incidental data'
Dark data 20% 80% Published and discoverable data
Dark data more important mainly due to their volume [1]
[1] Heidorn PB. Library Trends 57:280-299
