
Online tools and standards for Biodiversity data in the Semantic Web. Dr Dimitris Koureas, Biodiversity Informatics Group | Department of Life Sciences, The Natural History Museum, London

What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere

What is the semantic web? Fred is a person; Fred is the author of a book. Slide adjusted from Page R. presentation in pro-iBiosphere
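The diagram on this slide can be read as two subject-predicate-object statements (Fred is a person; Fred is the author of a book). A minimal sketch of that idea, using plain Python tuples rather than any real RDF library; the labels are illustrative strings, not real URIs, and the `match` helper is hypothetical:

```python
from typing import List, Optional, Tuple

# Each statement is a (subject, predicate, object) triple, mirroring the
# slide's diagram: Fred --is a--> person, Fred --author of--> book.
Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("Fred", "is a", "person"),
    ("Fred", "author of", "book"),
]


def match(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that agrees with the given fields (None = wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# What do we know about Fred?
fred_facts = match(triples, subject="Fred")

# Who is the author of the book?
authors = [s for (s, _, _) in match(triples, predicate="author of", obj="book")]
```

Because every statement has the same three-part shape, statements from different sources can be pooled into one store and queried together, which is the property the later slides on linking data rely on.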

What is the semantic web? The Semantic Web: “The future of the web … and always will be” (Peter Norvig, Google). Slide adjusted from Page R. presentation in pro-iBiosphere

Biodiversity informatics: the study of the transformation and communication of information in the Life and Earth sciences. It provides the means (generating and enhancing the necessary infrastructure).

Research vs Infrastructure Slide adapted from Patterson D. 2013, Tempe, Arizona

Research vs Infrastructure
Research: discovery; ephemeral; individualistic; massive redundancy; optional; risk taking.
Infrastructure: implementation; communal / agreed; essential; persistent; robust & reliable; adaptable.
Slide adapted from Patterson D. 2013, Tempe, Arizona

What are the current challenges in Biodiversity informatics?

Current taxonomic data production: publications based on countless specimens, images, maps, keys and datasets, typically generated by small communities for “local” research projects. Figure from Costello M.J. et al., 2013, doi: /science

Our current taxonomic data production
15-20k new spp. described annually (2M total) [1]; 30k nomenclatural acts (12M total) [1]; 20k phylogenies (750k total) [2]; 31k taxa sequenced (360k taxa total) [3]; 800k BioMed papers (40M total pp. of taxonomy) [4]; countless specimens, images, maps, keys and datasets.
1.8 M described spp. (17M names); 300M pages (over last 250 years); 1.5-3B specimens.
Figures from [1] Zhang, Zootaxa , 1-4; [2] Web-of-Science; [3] GenBank and [4] PubMed.

Now imagine that… Estimates of 7.5 million species still undescribed [1].
[1] Mora C. et al., How Many Species Are There on Earth and in the Ocean? doi: /journal.pbio

Biodiversity informatics landscape. Key problems: the landscape is complex, fragmented & hard to navigate; many audiences (policy makers, scientists, amateurs, citizen scientists); many scales (global solutions to local problems). Figure adapted from Peterson et al, Syst. & Biodiv doi: /

Science is carried out “locally”: by local scientists, as part of local infrastructures, with local funders. BUT science is global: it needs global standards, global workflows and the cooperation of global players.

Expected volume of taxonomic and biodiversity data: the need to extract, aggregate and link data at a global level.

“Link together evolutionary data … by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” (Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE doi: /j.tree)
This requires data, information & knowledge to be: digital (not printed paper); openly accessible (not behind barriers, e.g. paywalls); linked-up (not in silos).
To achieve this…

Hour-glass motif for big data infrastructure: data generation, data pool, data re-use. Slide adapted from Patterson D. 2013, Tempe, Arizona

Big data world with data re-use
Data generation: models, observations, experiments, processed.
Data pool.
Data re-use: aggregation, visualization, analysis, manipulation.

Nodes interconnected. Slide adapted from Patterson D. 2013, Tempe, Arizona

But how many biodiversity informatics projects are out there?

But how many biodiversity informatics projects are out there? At least 679!
Sources: EDIT, TDWG & ViBRANT 2013
Categories:
Data Aggregator - a web site that collates data from a variety of sources (digital and hardcopy) and presents it in one form
Data Indexer - a web site that provides lists or indexes of other sites that provide data
Data Provider - a web site that provides data directly from research or other studies
Data Standards - a web site that contributes to formulating or developing standards for data
Facilitator - a web site that facilitates the provision of data by other projects or web sites

Aggregators: GBIF, our global leader in occurrence data

Aggregators: EU-NOMEN - PESI

Aggregators: making taxonomy digital, open & linked

Scratchpads are an integrated system to enter, curate, mark up, link and publish data: the taxonomic workflow in a single virtual environment.

The Scratchpads concept: a Scratchpad is a website that holds data for you and your community. Your data; external data & services.

Unique visitors to Scratchpads sites: 65,000 per month. 580 Scratchpads communities with 8,185 active registered users, covering 55,607 taxa in 653,274 pages. In total, more than 1,300,000 visitors.

Facilitators: BOLD (Barcode of Life Data Systems). Researchers can assemble, test, and analyse their data records in BOLD before uploading them to the International Nucleotide Sequence Database Collaboration (DDBJ, ENA, GenBank).

Providers: Biodiversity Heritage Library (BHL). Biodiversity literature openly available to the world as part of a global biodiversity community. More than 40 M pages of legacy literature.

Standard Exchange formats

Standard Exchange formats: Darwin Core (DwC). Primarily used as a metadata standard for specimen records.
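To make the idea of a specimen-record standard concrete, here is a minimal sketch of one record keyed by Darwin Core term names and serialised as a flat CSV row. The term names (`occurrenceID`, `basisOfRecord`, `scientificName` and so on) come from Darwin Core; every value, the identifier scheme and the `to_csv` helper are made up for illustration:

```python
import csv
import io

# Keys are Darwin Core term names; all values are invented for this example.
record = {
    "occurrenceID": "urn:example:specimen:0001",  # hypothetical identifier
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "EXAMPLE",
    "catalogNumber": "0001",
    "scientificName": "Quercus robur L.",
    "eventDate": "2013-06-11",
    "country": "Belgium",
    "decimalLatitude": "50.85",
    "decimalLongitude": "4.35",
}


def to_csv(records):
    """Serialise a list of same-shaped records as CSV text with a header row."""
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv([record])
header_line, data_line = csv_text.splitlines()
```

Because the column names are shared terms rather than local labels, a second institution's file with the same headers can be merged without a mapping step.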

Standard Exchange formats: Access to Biological Collection Data (ABCD). Highly detailed; aims to provide a complete set of data elements for natural history collection items.

Standard Exchange formats: Audubon Core Multimedia Resources Metadata Schema. The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections.

Standard Exchange formats: Taxonomic Concept Transfer Schema (TCS). A mechanism to exchange data concerning the names of organisms.

Standards facilitate systems interoperability

We need Unique Identifiers. Identifiers: a key to find something in a database. UPIDs (unique persistent identifiers) to identify content.

We need Unique Identifiers. Can a taxonomic name be used as a UPID? Is it unique? Is it persistent? Is it an identifier? Are taxonomic names enough for communication between scientists? YES. Are taxonomic names enough for communication between machines? CAN BE, IF…
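One way to see why a bare name can fail the "unique" test for machines is to look for names that point at more than one record. The sketch below is illustrative only: the record ids and the `ambiguous_names` helper are hypothetical, while the two homonym pairs are real (Ficus and Morus are each used for a plant genus and for an animal genus).

```python
from collections import defaultdict

# (hypothetical record id, genus name, group). Ficus is both the figs and a
# genus of sea snails; Morus is both the mulberries and the gannets.
records = [
    ("id-001", "Ficus", "plants (figs)"),
    ("id-002", "Ficus", "animals (sea snails)"),
    ("id-003", "Morus", "plants (mulberries)"),
    ("id-004", "Morus", "animals (gannets)"),
    ("id-005", "Quercus", "plants (oaks)"),
]


def ambiguous_names(rows):
    """Map each name that refers to more than one record to those record ids."""
    by_name = defaultdict(list)
    for record_id, name, _group in rows:
        by_name[name].append(record_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


clashes = ambiguous_names(records)
```

A human reader resolves "Ficus" from context; a program joining two datasets on the name alone would silently merge figs with sea snails, which is why the record id, not the name, has to act as the key.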

We need Unique Identifiers. For example: Page R., Brief Bioinform (2008) 9 (5): doi: /bib/bbn022

We need Unique Identifiers. ONLY IF: name reconciliation. Patterson, D. J. et al. Names are key to the big new biology. TREE 25: doi: /j.tree
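Name reconciliation means recognising that different name strings refer to the same name. The sketch below is a deliberately naive illustration, not how any real reconciliation service works: it assumes the first two tokens of a string are genus and species epithet, ignores authorship, and does nothing about synonyms or misspellings. Both function names are hypothetical.

```python
from collections import defaultdict


def canonical_binomial(name_string):
    """Reduce a name string to 'Genus epithet', dropping authorship and case noise.

    Naive assumption: the first two whitespace-separated tokens are the genus
    and the species epithet.
    """
    tokens = name_string.split()
    if len(tokens) < 2:
        raise ValueError(f"not a binomial: {name_string!r}")
    genus, epithet = tokens[0], tokens[1]
    return f"{genus.capitalize()} {epithet.lower()}"


def reconcile(name_strings):
    """Group raw name strings under their canonical binomial."""
    groups = defaultdict(list)
    for raw in name_strings:
        groups[canonical_binomial(raw)].append(raw)
    return dict(groups)


variants = [
    "Quercus robur L.",
    "quercus  robur",
    "QUERCUS ROBUR Linnaeus, 1753",
    "Bellis perennis L.",
]
groups = reconcile(variants)
```

Once variants are grouped, a single persistent identifier can be attached to each group, so that machines exchange the identifier while people keep reading the name.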

Knowledge Organisation Systems: the need for controlled vocabularies and ontologies. Google has done it. Ontologies: Plant anatomical and structural development ontology.

Example of ontology usage: Deans A. et al., Time to change how we describe biodiversity, Trends in Ecology & Evolution 2012, doi: /j.tree

Examples of integrated projects

How is all this relevant to my work? What should I take home?

Repositories (#bigdata); providers; data silos; community

The four nodes of data workflow
1. We collect and generate data
2. We curate, link and structure data
3. We analyse data
4. We publish data

The four nodes of data workflow: data collection & generation, data curation, data analysis, data publishing. What are the bottlenecks in the workflow?

What we need is… a seamless workflow: data collection & generation, data curation, data analysis, data publishing.

Old Joke: A drunk is crawling around a lamp post on his hands and knees. A cop comes along …
Cop: What are you doing?
Drunk: Looking for my car keys.
Cop: Are you sure you dropped them here?
Drunk: No, I dropped them in the alley.
Cop: So why are you looking here?
Drunk: Because the light’s better.

Science is a ‘light’s better’ endeavor in that research effort is not directed at areas where the work is technically infeasible. Research is directed where real, interpretable results may be obtained. We do, in fact, conduct research where the light’s better. But, when the light changes, so does science. With better illumination, we look in new areas. We find new things…

Addressing the challenges of biodiversity informatics: “…the field [of biodiversity informatics] appears to be growing in a void of overarching, motivating questions, effectively making it a set of technologies in search of questions to address.” Peterson et al, Syst. & Biodiv doi: /