Darwin Core Archive (DwC-A) validation: A New Collaborative Effort Christian Gendreau, Université de Montréal / Canadensys David P. Shorthouse, Université.


1 Darwin Core Archive (DwC-A) validation: A New Collaborative Effort Christian Gendreau, Université de Montréal / Canadensys David P. Shorthouse, Université de Montréal / Canadensys Marie-Élise Lecoq, GBIF France Tim Robertson, GBIF

2 Darwin Core Archive (DwC-A) DarwinCore standard does not impose strong rules on the content associated with any DarwinCore terms.

3 Current GBIF DwC-A Validator Original goal “… test Darwin Core Archives as specified in the Darwin Core Text Guide.” http://tools.gbif.org/dwca-validator/

4 Current GBIF DwC-A Validator Original target DwC-A are simple and can be created using simple custom scripts. “… make sure GBIF and others can read the information as expected.”

5 Current GBIF DwC-A Validator Validates archive structure Offers a web presence – Report viewer – API

6 Next GBIF DwC-A Validator? New goal: extend validation to the content of the archive https://github.com/gbif/dwca-validator

7 Current content validators Atlas of Living Australia sandbox VertNet – Spatial quality GBIF Spain – Darwin Test Encyclopedia of Life – dwc-validator Scratchpads – dwca-validator GlobalNames – dwc-archive ruby gem … and many more. See Appendix 1 for links

8 What do we need? Accommodate different scopes Configuration/customization – Use more knowledge when available Web access (page and API)

9 Scopes Data entry Desktop software – Scientific workflow – Statistical software Integrated Publishing Toolkit (IPT) National nodes Aggregators

10 Configuration/Customization Where will the validator be used? Can we provide more information? – e.g. I know all the dates in my file should be ISO 8601

11 Components Library Web Extension Support

12 Library Define structure for validation process Provide a validation framework enabling sharing Close to DarwinCore specification

13 Web Web page to submit archive or URL Report viewer API

14 Extension Support Include domain knowledge Propose interpreted data

15 Internals Validation types – Structure Metadata – Records : Rows Fields data (e.g. date, coordinates) – Records : Columns ID uniqueness
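
The column-level check named on this slide (ID uniqueness) differs from row-level checks because it needs to see every record at once. A minimal sketch in Python, assuming records are plain dictionaries and that the identifier lives in a column called occurrenceID; both are assumptions made for illustration, not the validator's actual data model:

```python
from collections import Counter
from typing import Dict, Iterable, List


def duplicate_ids(rows: Iterable[Dict[str, str]], id_term: str = "occurrenceID") -> List[str]:
    """Return, sorted, every identifier value that occurs more than once.

    The function name and the default id_term are hypothetical.
    """
    counts = Counter(row.get(id_term, "") for row in rows)
    return sorted(value for value, seen in counts.items() if seen > 1)


rows = [
    {"occurrenceID": "a", "scientificName": "Acer rubrum"},
    {"occurrenceID": "b", "scientificName": "Acer saccharum"},
    {"occurrenceID": "a", "scientificName": "Acer negundo"},
]
duplicates = duplicate_ids(rows)
```

Because the count is taken over the whole column, this kind of check cannot be run record by record the way field checks can.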

16 Internals – Record level Validation chain – Composed of chain elements – Possible parallelism

17 Internals – Record level Immutable chain element – Self-contained: never relies on another chain element – Ordering independent: same behaviour wherever the element is used in the chain But what if I really need ordering?
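
The chain properties listed on the last two slides (immutable, self-contained, ordering independent) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dwca-validator API: ChainElement, run_chain and the two check functions are hypothetical names, and a record is assumed to be a dictionary of Darwin Core terms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)  # frozen: the element cannot be modified after creation
class ChainElement:
    """Hypothetical chain element: a name plus a check that sees only the record."""
    name: str
    check: Callable[[Record], Optional[str]]  # error message, or None if valid


def run_chain(elements: Tuple[ChainElement, ...], record: Record) -> List[str]:
    """Apply every element to the record; no element sees another's result."""
    errors = []
    for element in elements:
        message = element.check(record)
        if message is not None:
            errors.append(f"{element.name}: {message}")
    return errors


def require_scientific_name(record: Record) -> Optional[str]:
    if record.get("scientificName", "").strip():
        return None
    return "scientificName is empty"


def require_numeric_latitude(record: Record) -> Optional[str]:
    value = record.get("decimalLatitude", "")
    try:
        float(value)
    except ValueError:
        return f"decimalLatitude is not a number: {value!r}"
    return None


name_rule = ChainElement("name", require_scientific_name)
latitude_rule = ChainElement("latitude", require_numeric_latitude)

record = {"scientificName": "", "decimalLatitude": "north"}
forward = run_chain((name_rule, latitude_rule), record)
backward = run_chain((latitude_rule, name_rule), record)
```

Both orderings report the same two problems, only listed in a different order. Since the elements share no state, records could also be spread across workers, which is one way to read the "possible parallelism" noted earlier.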

18 Internals - Composition Composed chain element Exposed as one chain element

19 Composition example Mandatory Latitude/Longitude – Check record completion on lat/long – Check decimal lat/long value
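
One way to read the mandatory latitude/longitude example is that ordering is kept inside the composed element, so the chain around it stays ordering independent. A minimal Python sketch of that reading; compose, latlong_complete and latlong_decimal are hypothetical names, not part of the project:

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Check = Callable[[Record], List[str]]

LATLONG_TERMS = ("decimalLatitude", "decimalLongitude")


def latlong_complete(record: Record) -> List[str]:
    """Step 1: both coordinates must be present."""
    return [f"{term} is missing" for term in LATLONG_TERMS
            if not record.get(term, "").strip()]


def latlong_decimal(record: Record) -> List[str]:
    """Step 2: both coordinates must parse as decimal numbers."""
    errors = []
    for term in LATLONG_TERMS:
        try:
            float(record[term])
        except (KeyError, ValueError):
            errors.append(f"{term} is not a decimal number")
    return errors


def compose(*steps: Check) -> Check:
    """Bundle ordered steps into one check; stop at the first step that fails."""
    def composed(record: Record) -> List[str]:
        for step in steps:
            errors = step(record)
            if errors:
                return errors
        return []
    return composed


# From the outside this is a single element like any other.
mandatory_latlong = compose(latlong_complete, latlong_decimal)

incomplete = mandatory_latlong({"decimalLatitude": "45.5", "decimalLongitude": ""})
malformed = mandatory_latlong({"decimalLatitude": "45.5", "decimalLongitude": "abc"})
valid = mandatory_latlong({"decimalLatitude": "45.5", "decimalLongitude": "-73.6"})
```

Stopping at the first failing step is a design choice of this sketch: it avoids reporting "not a decimal number" for a value that is simply absent.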

20 Configuration example Select mandatory DarwinCore terms – scientificName must be provided Restrict bounding box – decimalLatitude and decimalLongitude must be between
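
The two configuration options on this slide could be expressed as a small settings object. The sketch below is Python written for illustration; ValidatorConfig and validate are hypothetical names, and the bounding box values in the usage example are made up, since the slide does not give the actual limits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)
class ValidatorConfig:
    """Hypothetical configuration; defaults cover the whole globe."""
    mandatory_terms: Tuple[str, ...] = ("scientificName",)
    min_latitude: float = -90.0
    max_latitude: float = 90.0
    min_longitude: float = -180.0
    max_longitude: float = 180.0


def validate(record: Record, config: ValidatorConfig) -> List[str]:
    errors = []
    # Mandatory terms: present and not blank.
    for term in config.mandatory_terms:
        if not record.get(term, "").strip():
            errors.append(f"{term} must be provided")
    # Bounding box: only checked when a coordinate is supplied.
    bounds = (
        ("decimalLatitude", config.min_latitude, config.max_latitude),
        ("decimalLongitude", config.min_longitude, config.max_longitude),
    )
    for term, low, high in bounds:
        raw = record.get(term, "").strip()
        if not raw:
            continue
        try:
            value = float(raw)
        except ValueError:
            errors.append(f"{term} is not a number")
            continue
        if not low <= value <= high:
            errors.append(f"{term} {value} is outside [{low}, {high}]")
    return errors


# Illustrative bounding box only; these limits are not from the presentation.
config = ValidatorConfig(min_latitude=41.0, max_latitude=84.0,
                         min_longitude=-141.0, max_longitude=-52.0)

inside = validate({"scientificName": "Acer rubrum",
                   "decimalLatitude": "45.5", "decimalLongitude": "-73.6"}, config)
outside = validate({"scientificName": "",
                    "decimalLatitude": "10", "decimalLongitude": "-73.6"}, config)
```

Keeping the configuration as plain data is what would make it easy to share between installations.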

21 Customization example Apply your own controlled vocabulary – Use your own dictionary for a term – ControlledVocabularyEvaluationRule
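
A controlled vocabulary check amounts to looking a field value up in a user-supplied dictionary. The Python sketch below illustrates the idea only; it is not the project's ControlledVocabularyEvaluationRule, and make_vocabulary_rule, the case-insensitive matching and the two-entry dictionary are assumptions made for the example.

```python
from typing import Callable, Dict, FrozenSet, List

Record = Dict[str, str]


def make_vocabulary_rule(term: str, allowed: FrozenSet[str]) -> Callable[[Record], List[str]]:
    """Build a rule that accepts only values found in the given dictionary.

    Matching ignores case; empty values are left to a separate mandatory check.
    """
    normalized = frozenset(value.casefold() for value in allowed)

    def rule(record: Record) -> List[str]:
        value = record.get(term, "").strip()
        if not value or value.casefold() in normalized:
            return []
        return [f"{term}: {value!r} is not in the controlled vocabulary"]

    return rule


# Illustrative dictionary for one term; a real one would be supplied by the user.
basis_rule = make_vocabulary_rule(
    "basisOfRecord", frozenset({"PreservedSpecimen", "HumanObservation"})
)

accepted = basis_rule({"basisOfRecord": "preservedspecimen"})
rejected = basis_rule({"basisOfRecord": "specimen in a jar"})
```

Swapping in a different dictionary changes what the rule accepts without touching the rule itself, which is what makes a dictionary shareable.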

22 Extension Example Suggester, link to narwhal-processor – Suède –> ISO 3166-2:SE – URI –> http://sws.geonames.org/2661886

23 Collaborative Share configuration Share customization (dictionary) Implement new reusable components – e.g. validation on a specific DwC-A extension

24 Collaboration Where to go? – https://github.com/gbif/dwca-validator Who can contribute? – Everyone What is needed? – Ideas, constructive comments – Code review, feedback

25 Project status Not yet released Command line interface available Follow the project on GitHub

26 Acknowledgments

27 Special thanks SiB Colombia SiB Brazil Peter Desmet John Wieczorek Dag Endresen …

28 Appendix 1 DwC Content validators Atlas of Living Australia sandbox http://sandbox.ala.org.au/datacheck/ VertNet – Spatial quality Displayed on occurrence pages at http://portal.vertnet.org/search GBIF Spain – Darwin Test http://www.gbif.es/darwin_test/Darwin_Test_in.php Encyclopedia of Life – dwc-validator http://services.eol.org/dwc_validator/

29 Appendix 1 - continued Scratchpads – dwca-validator https://github.com/edwbaker/dwca_validator/ GlobalNames – dwc-archive ruby gem https://github.com/GlobalNamesArchitecture/dwc-archive

