Darwin Core Archive (DwC-A) Validation: A New Collaborative Effort
Christian Gendreau, Université de Montréal / Canadensys
David P. Shorthouse, Université de Montréal / Canadensys
Marie-Élise Lecoq, GBIF France
Tim Robertson, GBIF
Darwin Core Archive (DwC-A) The DarwinCore standard does not impose strong rules on the content associated with any DarwinCore term.
Current GBIF DwC-A Validator Original goal: “… test Darwin Core Archives as specified in the Darwin Core Text Guide.”
Current GBIF DwC-A Validator Original target: DwC-A are simple and can be created with simple custom scripts. “… make sure GBIF and others can read the information as expected.”
Current GBIF DwC-A Validator Validates archive structure Offers a web presence – Report viewer – API
Next GBIF DwC-A Validator? New goal: extend validation to the content of the archive
Current content validators Atlas of Living Australia sandbox VertNet – Spatial quality GBIF Spain – Darwin Test Encyclopedia of Life – dwc-validator Scratchpads – dwca-validator GlobalNames – dwc-archive ruby gem … and many more See Appendix 1 for links
What do we need? Accommodate different scopes Configuration/customization – Use more knowledge when available Web access (page and API)
Scopes Data entry Desktop software – Scientific Work Flow – Statistical software Integrated Publishing Toolkit (IPT) National nodes Aggregators
Configuration/Customization Where will the validator be used? Can we provide more information? – e.g. I know all the dates in my file should be in ISO format
Components Library Web Extension Support
Library Define structure for validation process Provide a validation framework enabling sharing Close to DarwinCore specification
Web Web page to submit archive or URL Report viewer API
Extension Support Include domain knowledge Propose interpreted data
Internals Validation types – Structure: metadata – Records (rows): field data (e.g. date, coordinates) – Records (columns): ID uniqueness
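The difference between a row-level and a column-level check can be sketched as follows. This is a minimal illustration, not the validator's code: the function names `validate_row` and `find_duplicate_ids` are hypothetical, and records are assumed to be plain dictionaries keyed by DarwinCore term names.

```python
from collections import Counter
from datetime import date


def validate_row(record):
    """Row-level check: looks at one record only; returns a list of messages."""
    errors = []
    raw = record.get("eventDate") or ""
    try:
        date.fromisoformat(raw)  # accepts e.g. "2013-05-12"
    except ValueError:
        errors.append("eventDate is not an ISO date: %r" % raw)
    return errors


def find_duplicate_ids(records, id_term="occurrenceID"):
    """Column-level check: needs every value of one column to spot duplicates."""
    counts = Counter(r[id_term] for r in records if r.get(id_term))
    return sorted(value for value, n in counts.items() if n > 1)
```

A row-level check can run as each record is read, whereas the uniqueness check has to see the whole column before it can report anything.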
Internals – Record level Validation chain – Composed of chain elements – Possible parallelism
Internals – Record level Immutable chain element – Self-contained: never relies on another chain element – Order-independent: same behaviour wherever the element is used in the chain But what if I really need ordering?
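A validation chain of this kind might look like the sketch below. It assumes records are dictionaries of DarwinCore terms; `ChainElement`, `ValidationChain` and the two example elements are hypothetical names chosen for illustration, and the thread pool only shows that independent records can be dispatched concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)
class ChainElement:
    """One self-contained check; frozen so it cannot change once built."""
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


@dataclass(frozen=True)
class ValidationChain:
    elements: Tuple[ChainElement, ...]

    def validate(self, record: Record) -> List[str]:
        """Return the names of the elements the record failed."""
        return [e.name for e in self.elements if not e.check(record)]

    def validate_all(self, records: List[Record], workers: int = 4) -> List[List[str]]:
        # Records do not depend on each other, so they can be handled concurrently.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.validate, records))


has_name = ChainElement("scientificName present", lambda r: bool(r.get("scientificName")))
has_country = ChainElement("country present", lambda r: bool(r.get("country")))
chain = ValidationChain((has_name, has_country))
```

Because no element reads another element's result, swapping `has_name` and `has_country` changes only the order in which failures are listed, never which failures are found.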
Internals - Composition Composed chain element Exposed as one chain element
Composition example Mandatory Latitude/Longitude – Check record completion on lat/long – Check decimal lat/long value
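The mandatory latitude/longitude example can be sketched as two ordered steps wrapped into a single element. This is an assumption-laden illustration: `compose`, `lat_long_present` and `lat_long_decimal` are hypothetical names, and each check is assumed to return an error message or `None`.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]
Check = Callable[[Record], Optional[str]]  # returns an error message, or None if OK


def lat_long_present(record: Record) -> Optional[str]:
    if not record.get("decimalLatitude") or not record.get("decimalLongitude"):
        return "decimalLatitude and decimalLongitude are both required"
    return None


def lat_long_decimal(record: Record) -> Optional[str]:
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, ValueError):
        return "decimalLatitude/decimalLongitude must be decimal numbers"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "decimalLatitude/decimalLongitude out of range"
    return None


def compose(*steps: Check) -> Check:
    """Wrap ordered steps into one check that stops at the first failure."""
    def composed(record: Record) -> Optional[str]:
        for step in steps:
            message = step(record)
            if message is not None:
                return message
        return None
    return composed


# The chain sees a single element; the ordering lives inside it.
mandatory_lat_long = compose(lat_long_present, lat_long_decimal)
```

Ordering is confined to the inside of the composed element, so the outer chain can stay order-independent.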
Configuration example Select mandatory DarwinCore terms – scientificName must be provided Restrict bounding box – decimalLatitude and decimalLongitude must be between
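One way to picture such a configuration is a small settings object passed to the validation step. The sketch below is hypothetical throughout: `ValidatorConfig`, its field names and the default ranges are illustrative placeholders, not the project's configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ValidatorConfig:
    mandatory_terms: Tuple[str, ...] = ("scientificName",)
    # (min, max) pairs; the defaults cover the whole globe and are placeholders.
    latitude_range: Tuple[float, float] = (-90.0, 90.0)
    longitude_range: Tuple[float, float] = (-180.0, 180.0)


def validate(record: Dict[str, str], config: ValidatorConfig) -> List[str]:
    errors = [f"{term} must be provided"
              for term in config.mandatory_terms if not record.get(term)]
    for term, (low, high) in (("decimalLatitude", config.latitude_range),
                              ("decimalLongitude", config.longitude_range)):
        value = record.get(term)
        if not value:
            continue  # presence is governed by mandatory_terms, not by the box
        try:
            inside = low <= float(value) <= high
        except ValueError:
            inside = False
        if not inside:
            errors.append(f"{term} must be between {low} and {high}")
    return errors
```

A national node could, for instance, build `ValidatorConfig(latitude_range=(40.0, 60.0), longitude_range=(-80.0, -60.0))` to restrict records to its own region; those bounds are made up for the example.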
Customization example Apply your own controlled vocabulary – Use your own dictionary for a term – ControlledVocabularyEvaluationRule
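The idea of checking a term against a user-supplied dictionary can be sketched as below. The class `VocabularyRule` is a hypothetical stand-in written for this example; it does not reproduce the interface of `ControlledVocabularyEvaluationRule`, and the vocabulary values are illustrative only.

```python
from typing import Dict, Iterable, Optional


class VocabularyRule:
    """Hypothetical rule: a term's value must appear in a user-supplied dictionary."""

    def __init__(self, term: str, allowed: Iterable[str]):
        self.term = term
        # Compare case-insensitively and ignore surrounding whitespace.
        self.allowed = frozenset(v.strip().lower() for v in allowed)

    def evaluate(self, record: Dict[str, str]) -> Optional[str]:
        value = (record.get(self.term) or "").strip().lower()
        if value in self.allowed:
            return None
        return f"{self.term}: {record.get(self.term)!r} is not in the vocabulary"


rule = VocabularyRule("basisOfRecord", ["PreservedSpecimen", "HumanObservation"])
```

Since the dictionary is ordinary data, it can be shared between institutions independently of the code that applies it.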
Extension Example Suggester, link to narwhal-processor – Suède –> ISO: SE – URI –>
Collaborative Share configuration Share customization (dictionary) Implement new reusable component – e.g. validation on specific DwC-A extension
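A suggester that proposes an interpreted value, such as an ISO country code for "Suède", might be sketched as a normalised dictionary lookup. This is not the narwhal-processor API: the function names and the tiny dictionary below are assumptions made for illustration.

```python
import unicodedata
from typing import Dict, Optional

# Tiny illustrative dictionary; a real one would be far larger.
COUNTRY_NAMES_TO_ISO: Dict[str, str] = {
    "suede": "SE",
    "sweden": "SE",
    "canada": "CA",
}


def normalize(text: str) -> str:
    """Lower-case and strip accents so 'Suède' and 'suede' match."""
    decomposed = unicodedata.normalize("NFKD", text.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest_country_code(raw: str) -> Optional[str]:
    """Return a suggested ISO 3166-1 alpha-2 code, or None when unknown."""
    return COUNTRY_NAMES_TO_ISO.get(normalize(raw))
```

The result is a suggestion rather than a correction: the original value stays in the record and the proposed code is reported alongside it.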
Collaboration Where to go? – Who can contribute? – Everyone What is needed? – Ideas, constructive comments – Code review, feedback
Project status Not yet released Command line interface available Follow the project on GitHub
Acknowledgments
Special thanks SiB Colombia SiB Brazil Peter Desmet John Wieczorek Dag Endresen …
Appendix 1 DwC Content validators Atlas of Living Australia sandbox VertNet – Spatial quality Displayed on occurrence pages at GBIF Spain – Darwin Test Encyclopedia of Life – dwc-validator
Appendix 1 (continued) Scratchpads – dwca-validator GlobalNames – dwc-archive ruby gem