Promote barcoding as a global standard Build participation Working Groups BARCODE standard International Conferences Increase production of public BARCODE records Networks, Projects, Organizations Barcode of Life Community
Principles and Goals Free and open access Standardization and scalability Specimen-centered Rapid data release following primary QA/QC Ongoing crowd-sourced data curation Enable accelerated modern taxonomy Navigate across data types (DNA, specimens, species, publications, georeferences) Locate, aggregate, display and analyze data, resources
How Barcoding Works Building the reference library: – Well-identified specimen – Tissue subsample – DNA extraction, PCR amplification – DNA sequencing – Data submission to GenBank Using the reference library: – Unidentified specimen – Tissue, DNA, sequencing – Comparison with reference sequences
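The "using the reference library" step can be sketched as a nearest-match lookup. This is a minimal illustration only, assuming pre-aligned sequences of equal length, an uncorrected p-distance, and a hypothetical 2% cutoff. The function names, the cutoff, and the toy sequences are assumptions for the example and are not taken from BOLD or GenBank tooling.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def identify(query, reference_library, threshold=0.02):
    """Return (species, distance) for the closest reference sequence.

    The species is None when even the best match is farther than `threshold`
    (a hypothetical cutoff chosen for this sketch).
    """
    species, dist = min(
        ((name, p_distance(query, seq)) for name, seq in reference_library.items()),
        key=lambda item: item[1],
    )
    return (species if dist <= threshold else None, dist)


# Toy reference library: real barcodes are roughly 650 bp, not 10 bp.
reference = {"Species A": "ACGTACGTAC", "Species B": "ACGTTCGTTC"}
match = identify("ACGTACGTAC", reference)
```

Returning the distance alongside the name lets a caller report how close the match was instead of only a yes/no identification.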
How Barcoding is Done From specimen to sequence to species Diagram labels: Voucher Specimen; DNA extraction; CO1 gene; DNA sequencing; Trace file; Public Databases of Barcode Records; Collecting (Figure labels: ND1, ND2, ND3, COIII)
NBII, 25 February 2009 BOLD Workbench for Barcode Data Assembly/Analysis
GenBank, EMBL, and DDBJ Official Archival Repositories of Barcode Data http://www.insdc.org/
Current Norm: High throughput Large labs, hundreds of samples per day ABI 3100 capillary automated sequencer Large capacity PCR and sequencing reactions
● US$100-150K purchase ● 2-3 hours processing time ● 150-500 samples per day ● US$3-5 per sample
Technology Development Partnership Goal The DNA Sequencing Lab of 2013?
Producing Barcode Data: 201? Barcode data anywhere, instantly Data in seconds to minutes Pennies per sample Link to reference database A taxonomic GPS Usable by non-specialists
Status of Barcode Data BOLD records (public and private): – 956,000 records, 78,000 named species BARCODE records in GenBank: – 194,000 records – Insects: 150,000 records – Fish: 23,500 records – Birds: 6,000 records – Mammals: 2,500 records
BARCODE Data Standard Required Elements for COI Species designation Voucher ID in standard Darwin Core format Minimum 500 bp, <1% ambiguous sites Bidirectional overlapping reads, 2 trace files Primer name and sequences Country/ocean region Strongly recommended: – Collection date and collector – Identifier – Latitude/longitude
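The required elements can be expressed as a simple record check. This is a sketch under stated assumptions: the field names are hypothetical, the ambiguity limit is read as an upper bound of 1% of sites, and any base other than A, C, G, or T counts as ambiguous. It does not reproduce how GenBank or BOLD validate submissions.

```python
# Hypothetical field names for the required, non-sequence elements.
REQUIRED_FIELDS = (
    "species",
    "voucher_id",
    "primer_names",
    "primer_sequences",
    "country_or_ocean",
)


def check_barcode_record(record):
    """Return a list of problems; an empty list means the record passes this sketch."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]

    seq = record.get("sequence", "").upper()
    if len(seq) < 500:
        problems.append(f"sequence is {len(seq)} bp, minimum is 500 bp")

    # Assumption: anything outside A/C/G/T is an ambiguous site.
    ambiguous = sum(base not in "ACGT" for base in seq)
    if seq and ambiguous / len(seq) >= 0.01:
        problems.append(f"{ambiguous} ambiguous sites is 1% or more of the sequence")

    if len(record.get("trace_files", [])) < 2:
        problems.append("fewer than 2 trace files")

    return problems
```

Returning every problem at once, instead of stopping at the first, suits batch submissions where a submitter wants to fix a record in one pass.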
Non-COI regions for other taxa Land plants: – Chloroplast matK and rbcL approved Nov 09 – Non-coding plastid and nuclear regions being explored Fungi and protists: – CBOL Working Groups convened – Recommendations expected in 2010
Barcode Sequence Voucher Specimen Species Name Specimen Metadata Literature (link to content or citation) BARCODE Records in INSDC Indices - Catalogue of Life - GBIF/ECAT Nomenclators - Zoo Record - IPNI - NameBank Publication links - New species Georeference Habitat Character sets Images Behavior Other genes Trace files Other Databases Phylogenetic Pop’n Genetics Ecological Primers Databases - Provisional sp.
Darwin Core Triplet Structured Link to Vouchers Institutional Acronym : Collection Code : Catalog ID
Structured Link to Vouchers NHM:LEP:123456 personal:DHJanzen:SRNP12345
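A structured voucher link of this kind can be split back into its three parts. The sketch below assumes a colon as the separator between institutional acronym, collection code, and catalog ID. The class and function names are hypothetical and chosen for illustration.

```python
from typing import NamedTuple


class VoucherTriplet(NamedTuple):
    institution: str   # institutional acronym, or "personal" for a private collection
    collection: str    # collection code
    catalog_id: str    # catalog ID within that collection


def parse_triplet(text):
    """Split 'institution:collection:catalog_id' into a VoucherTriplet."""
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected institution:collection:catalog_id, got {text!r}")
    return VoucherTriplet(*parts)


museum = parse_triplet("NHM:LEP:123456")
private = parse_triplet("personal:DHJanzen:SRNP12345")
```

Rejecting anything that does not have exactly three non-empty parts catches vouchers whose separators were lost, which would otherwise be unresolvable.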
NCBI’s Biorepository List Compiled from Index Herbariorum, literature sources, GenBank submissions 6,936 records 1,177 records with non-unique acronyms 517 homonymous acronyms 374 shared by two records 143 shared by three records
Examples of homonymous acronyms (acronym: institution; city; country)
– AMNH: Icelandic Institute of Natural History, Akureyri Division; Akureyri; Iceland
– AMNH: American Museum of Natural History; New York; USA
– UNL: Universidad Autónoma de Nuevo León; Monterrey, Nuevo León; Mexico
– UNL: University of Nebraska State Museum; Lincoln, Nebraska; USA
– UNL: Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa; Monte de Caparica; Portugal
– ZMK: Zoological Museum, Kristiania; Oslo; Norway
– ZMK: Zoologisches Museum der Universität Kiel; Kiel; Germany
– ZMK: Zoological Museum, Copenhagen; Copenhagen; Denmark
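Homonymous acronyms like these can be found by grouping records on the acronym. The following is a minimal sketch using the example records from this slide. The function name and the (acronym, institution) tuple layout are assumptions, not the structure of the NCBI list.

```python
from collections import defaultdict


def homonymous_acronyms(records):
    """Map each acronym used by more than one institution to those institutions."""
    by_acronym = defaultdict(list)
    for acronym, institution in records:
        by_acronym[acronym].append(institution)
    return {a: names for a, names in by_acronym.items() if len(names) > 1}


records = [
    ("AMNH", "Icelandic Institute of Natural History, Akureyri Division"),
    ("AMNH", "American Museum of Natural History"),
    ("UNL", "Universidad Autónoma de Nuevo León"),
    ("UNL", "University of Nebraska State Museum"),
    ("UNL", "Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa"),
    ("ZMK", "Zoological Museum, Kristiania"),
    ("ZMK", "Zoologisches Museum der Universität Kiel"),
    ("ZMK", "Zoological Museum, Copenhagen"),
]
clashes = homonymous_acronyms(records)
```

The same grouping explains the counts on the NCBI list: 374 acronyms shared by two records and 143 shared by three give 374 × 2 + 143 × 3 = 1,177 records with non-unique acronyms, across 374 + 143 = 517 homonymous acronyms.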
CBOL/GBIF/NCBI Registry of Biorepositories www.biorepositories.org
What Should We Do? CBOL will invest a year to populate institution and collection data in biorepositories.org Hope to build synchronization with: – Institution database at GenBank – Index Herbariorum – Authority files in BOLD Hope to install web services How can we accelerate registration process? Where should the data reside long-term? – GenBank? – GBIF?