The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database Collaboration (INSDC) David E. Schindel, Executive Secretary.

Presentation transcript:

The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database Collaboration (INSDC)
David E. Schindel, Executive Secretary
National Museum of Natural History, Smithsonian Institution

Infrastructure of Taxonomy: Fragmented, Disconnected
- Collections and databases of specimens
- Seedbanks, culture/cell line collections
- Compilations of taxonomic names
- Floristic and faunistic surveys/inventories
- Monographs, taxonomic revisions
- Data repositories (gene sequences, characters, images, trees)
- The (undigitized) taxonomic literature

Linking Logical Categories (1): Specimens, Names, Opinions ??

Linking Logical Categories (2): Naming and defining species Holotype specimens

Linking Logical Categories (3): Establishing species boundaries
Species concept beyond holotype:
- Paratype series
- Typological versus population thinking
- Genetic lineages
- BSC (hard to apply)

Linking Logical Categories (4): Interpreting species boundaries
Other assigned specimens:
- Species philosophy of original author
- Interpretation of user

Databases of Names, Specimens, Species Distributions
- Authority files of taxonomic names
- Museum databases of associated data
- Databases of species occurrences and distribution (OBIS)

DNA Barcodes: A Key Variable for Biodiversity Informatics
- Authority files of taxonomic names
- Museum databases of associated data
- Databases of species occurrences and distribution (OBIS)

CBOL’s Working Groups
- Database: Designing/constructing the Barcode Section of GenBank
- DNA: Protocols for formalin-fixed and old museum specimens; producing LIMS for dissemination
- Data Analysis: Beyond phenetic methods; population genetics perspective
- (Plants: Initiated discussions of plant barcode gene region(s))

BARCODE Data Standards
- Consultations with GenBank, ITIS, museum database developers, GBIF, ISIS, from 2004
- Consensus results of Front Royal meeting:
  - GBIF, ITIS, GRIN
  - NBII, Species2000, IPNI
  - ICZN, ZooRecord, OBIS
- GenBank proposed to International Nucleotide Sequence Database Collaboration (EMBL, DDBJ)
- Approved by CBOL and INSDC mid-2005

Reserved Keyword “BARCODE”
- GenBank reviews records against standard
- Adds keyword “BARCODE” in annotation field
- Can be removed by CBOL

Requirements
- Species name selected from authority
- Sequence from COI or other barcode region approved by CBOL
- Structured link to voucher specimen
- Online access to metadata
- Trace files and quality scores
- Primer sequences and names
- Minimum sequence length (500 bp for COI)
- Geographic locality
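The requirements listed on this slide lend themselves to a mechanical pre-submission check. The following is a minimal sketch under stated assumptions, not the review procedure GenBank or CBOL actually uses: the `CandidateRecord` class, its field names, and the `check_barcode_requirements` function are all hypothetical, and only the 500 bp minimum for COI comes from the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple

# Minimum COI sequence length stated on the Requirements slide.
MIN_COI_LENGTH = 500


@dataclass
class CandidateRecord:
    """Hypothetical container for one record proposed for the BARCODE keyword."""
    species_name: str
    gene_region: str
    sequence: str
    voucher_id: Optional[str] = None      # structured link to the voucher specimen
    locality: Optional[str] = None        # geographic locality
    has_trace_files: bool = False         # trace files and quality scores deposited
    primers: Tuple[str, ...] = ()         # primer names (sequences omitted here)
    metadata_url: Optional[str] = None    # online access to metadata


def check_barcode_requirements(
    rec: CandidateRecord,
    authority_names: Set[str],
    approved_regions: FrozenSet[str] = frozenset({"COI"}),
) -> List[str]:
    """Return a list of unmet requirements; an empty list means none were found."""
    problems: List[str] = []
    if rec.species_name not in authority_names:
        problems.append("species name not found in authority list")
    if rec.gene_region not in approved_regions:
        problems.append("gene region not approved")
    elif rec.gene_region == "COI" and len(rec.sequence) < MIN_COI_LENGTH:
        problems.append("COI sequence shorter than 500 bp")
    if not rec.voucher_id:
        problems.append("no structured voucher link")
    if not rec.metadata_url:
        problems.append("no online access to metadata")
    if not rec.has_trace_files:
        problems.append("no trace files")
    if not rec.primers:
        problems.append("no primer information")
    if not rec.locality:
        problems.append("no geographic locality")
    return problems
```

Returning a list of problems rather than a single pass/fail flag lets a submitter see every missing element at once.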

Recommended fields, added to INSDC at CBOL’s request
- Latitude and longitude
- Name of the identifier
- Name of the collector
- Date of collection

New Data Fields
- Latitude/Longitude
- Collection date
- Collector’s name
- Identifier’s name
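As an illustration of how these four fields might be packaged for a submission, here is a small sketch. The qualifier names `lat_lon`, `collection_date`, `collected_by` and `identified_by` are the INSDC source-feature qualifiers that correspond to the fields on the slide; the value formats used below (decimal degrees with hemisphere letters, and a `DD-Mmm-YYYY` date) are assumptions about the expected formatting, and the function names are hypothetical.

```python
from datetime import date
from typing import Dict

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_lat_lon(lat: float, lon: float) -> str:
    """Render signed decimal degrees as 'd.dddd N|S d.dddd E|W' (assumed format)."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.4f} {ns} {abs(lon):.4f} {ew}"


def format_collection_date(d: date) -> str:
    """Render a date as DD-Mmm-YYYY without depending on the system locale."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def source_qualifiers(lat: float, lon: float, collected_on: date,
                      collector: str, identifier: str) -> Dict[str, str]:
    """Bundle the four recommended fields under their qualifier names."""
    return {
        "lat_lon": format_lat_lon(lat, lon),
        "collection_date": format_collection_date(collected_on),
        "collected_by": collector,
        "identified_by": identifier,
    }
```

For example, `format_lat_lon(-1.2921, 36.8219)` yields `"1.2921 S 36.8219 E"`.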

BARCODE Keyword in GenBank

BARCODE Records in INSDC
[Diagram: the elements of a BARCODE record and the resources each can link to]
- Record elements: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological

Structured link to Vouchers
[Diagram: the elements of a BARCODE record and the resources each can link to]
- Record elements: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological

What constitutes a voucher?
- Long-term reference tied to BARCODE
- Corroborates the species identification
- Provides additional tissue
- CBOL relies on community decisions:
  - Full specimen?
  - Parts for morphologic features (e.g., feather?)
  - Frozen tissue?
  - E-Vouchers for large specimens, destructive samples, catch-and-release?

Where’s the voucher?

Linking to Vouchers Structured Voucher IDs

Voucher Specimen ID
- Based on Darwin Core
- Eventually will be replaced by GUID
- Triplet: Institution Acronym : Collection : Specimen #
- NMNH : FISH :
- CBOL, GBIF and NCBI discussing global registry of:
  - Institutional acronyms
  - Collection codes
  - “Pre-accession” specimen IDs
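The triplet described on this slide can be parsed and validated with a few lines of code. This is a sketch only: the `VoucherID` type and both function names are hypothetical, and the specimen number `12345` in the usage note is invented for illustration because the slide's own example is incomplete in this transcript.

```python
from typing import NamedTuple


class VoucherID(NamedTuple):
    """Institution acronym, collection code, and specimen number."""
    institution: str
    collection: str
    specimen: str


def parse_voucher_id(text: str) -> VoucherID:
    """Split 'Institution : Collection : Specimen #' into its three parts.

    Whitespace around the colons is ignored. A ValueError is raised when
    any of the three parts is missing or empty.
    """
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Institution:Collection:Specimen, got {text!r}")
    return VoucherID(*parts)


def format_voucher_id(voucher: VoucherID) -> str:
    """Join the three parts back into the compact colon-separated form."""
    return ":".join(voucher)
```

With the hypothetical specimen number, `parse_voucher_id("NMNH : FISH : 12345")` returns `VoucherID("NMNH", "FISH", "12345")`, while a string with the specimen number missing is rejected. Rejecting incomplete triplets early matters because the structured link is only useful if all three parts resolve against a registry of acronyms and collection codes.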

Link to Species Names
[Diagram: the elements of a BARCODE record and the resources each can link to]
- Record elements: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological

Species names in INSDC

NCBI Taxonomy Browser: The good, the bad, and the ugly
- Species names provided by submitters
- Checked against compilations
- Linkout to Catalogue of Life, other sources
- Names not found added to Taxonomy Browser
- Submitters informed of errors but not forced to make corrections

NCBI Taxonomy Browser

NCBI Taxonomy Browser Some names have no other source

Other names linked to GBIF and Catalogue of Life…

…and primary data source

Authoritative Species Lists
- Catalogue of Life
- Species lists compiled by barcoding projects
  - FISH-BOL from FishBase, CoF
  - MBI mosquito catalog
- Nomenclators
- NameBank
- New names in publications
- Eventually, central registries (e.g., ZooBank)

Provisional Species ID
- Uncertain identifications
- Species complexes
- Newly discovered variants
- Ecogenomic samples
- Need general guidelines to ensure IDs are:
  - Globally unique
  - Stable, retrievable
  - Not confusable with a valid species name
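The three guideline goals for provisional IDs can be made concrete with a small sketch. The slide says only that guidelines are needed and does not define a format, so the `Genus sp. PREFIX-NNNN` pattern below is purely an assumed convention for illustration, and both function names are hypothetical.

```python
import re

# A simple pattern for a Latin binomial: capitalized genus, lowercase epithet.
_BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z-]+")


def looks_like_binomial(name: str) -> bool:
    """True when the whole string has the shape of a valid species binomial."""
    return _BINOMIAL.fullmatch(name) is not None


def make_provisional_id(genus: str, registry_prefix: str, serial: int) -> str:
    """Build a provisional label that cannot be mistaken for a binomial.

    registry_prefix is assumed to be unique per issuing project, so that
    prefix plus serial is globally unique; the label never changes once issued.
    """
    if serial < 0:
        raise ValueError("serial must be non-negative")
    label = f"{genus} sp. {registry_prefix}-{serial:04d}"
    if looks_like_binomial(label):
        raise ValueError("provisional label collides with binomial form")
    return label
```

Under this assumed convention, `make_provisional_id("Aedes", "MBI", 7)` gives `"Aedes sp. MBI-0007"`, which fails the binomial check by construction.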

BARCODE Records in INSDC
[Diagram: the elements of a BARCODE record and the resources each can link to]
- Record elements: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological

Improving links to taxonomic journals Connecting taxonomic articles

Links to Taxonomic Literature
- Library-Laboratory meeting in London, 2005, on electronic access to taxonomic literature
- Led to formation of Biodiversity Heritage Library initiative
- Proactive steps with PubMed to add taxonomic journals to online abstracts
- Aggressive negotiation with publishers of barcoding papers
- Involvement in Encyclopedia of Life

Long-term data curation of BARCODE records
[Flowchart; steps as labeled on the slide]
- Data records assembled
- IDs consistent with other records?
- Compliant with BARCODE standards?
- Data records released on INSDC
- Data records published in BOLD
- Community feedback
- Update records (audit trail of species names retained)
- CBOL control of BARCODE flag
- GenBank adds BARCODE flag
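The "audit trail of species names retained" step can be sketched as a record that never overwrites a name without logging the previous one. This is a minimal illustration, not a description of how INSDC or BOLD store revisions; the `CuratedRecord` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CuratedRecord:
    """Hypothetical BARCODE record whose species name can be revised."""
    accession: str
    species_name: str
    # Each entry is (previous name, reason for the change), oldest first.
    name_history: List[Tuple[str, str]] = field(default_factory=list)

    def update_name(self, new_name: str, reason: str) -> None:
        """Replace the current name, keeping the old one in the audit trail."""
        if new_name == self.species_name:
            return  # nothing changed, so nothing to log
        self.name_history.append((self.species_name, reason))
        self.species_name = new_name

    def all_names(self) -> List[str]:
        """Every name the record has carried, oldest first, current last."""
        return [old for old, _ in self.name_history] + [self.species_name]
```

Keeping the history on the record itself means a search on any earlier name can still find the sequence after community feedback leads to a re-identification.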

Acknowledgements
- Robert Hanner, University of Guelph, Chair of CBOL’s Database Working Group
- Scott Federhen, NCBI Taxonomy Browser
- Donald Hobern, Head of Informatics, GBIF