Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition: How Darwin Core Archives have changed the landscape of biodiversity data publishing


1 Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition How Darwin Core Archives have changed the landscape of biodiversity data publishing John Wieczorek (tuco@berkeley.edu) Information Architect Museum of Vertebrate Zoology, UC Berkeley Buenos Aires (Argentina) 28 September 2011

2 Background: Data Exchange. ABCD (TDWG Standard): > 1200 concepts; XML; shared via BioCase, Tapir. Darwin Core (pre-standard v. 1.2, 47 versions): 48 concepts, specimens; XML; shared via DiGIR. Darwin Core (pre-standard v. 1.4): 46 concepts (plus extensions), specimens; XML; shared via Tapir. Darwin Core (TDWG Standard): 172 concepts (156 in Simple Darwin Core), biodiversity data; CSV, XML, RDF, JSON, …; shared via text files, Tapir, Darwin Core Archive…

3 Darwin Core Archive: Primary Biodiversity Data, Taxonomic Data, Metadata. http://www.someplace.org/data.zip

4 Darwin Core Archive Complete Package Standard Darwin Core terms in a single, self-contained dataset Taxon records or Occurrence Records Data set metadata in EML

5 Darwin Core Archive: Benefits. Simple format (text files). Efficient harvesting (single file). Efficient storage (compressed). Easy access (no special software required). Extensible (related files in one archive). Preferred format for publishing data in the GBIF network.

6 Darwin Core Archive: Anatomy Archives always have a metadata file as EML

7 Ecological Metadata Language (EML): for describing data sets – even unpublished ones. Title and Abstract; Citation and Attribution; Contact and Authors; Geographic Scope; Sampling Methods; Bibliography; and more…

8 Darwin Core Archive: Anatomy Archives always have a core data file as text

9 Core data file types: records based on taxa – one species per row, OR records based on species occurrences – one per row.

10 Darwin Core Archive: Anatomy Archives always have a core data file as text

11 Darwin Core Archive: Anatomy. The core contains a “core ID” column, unique for every record in the file.

12 Darwin Core Archive: Anatomy. Columns are matched to Darwin Core terms.

13 Darwin Core Archive: Anatomy. Columns that do not match a Darwin Core term may be included, but are ignored. For example, “Wingspan” is not a Darwin Core term.
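
A minimal Python sketch of the rule on this slide, assuming the column headers have already been read from the core text file. `DWC_TERMS` is only a small illustrative subset of Darwin Core term names (the standard defines many more), and `split_columns` is a hypothetical helper, not part of any GBIF tool.

```python
# Illustrative subset of Darwin Core term names; the real standard has far more.
DWC_TERMS = {"taxonID", "scientificName", "kingdom", "family", "genus", "taxonRank"}


def split_columns(header):
    """Separate column names that match a known term from those that do not."""
    matched = [name for name in header if name in DWC_TERMS]
    ignored = [name for name in header if name not in DWC_TERMS]
    return matched, ignored


# "Wingspan" is the slide's example of a column that is not a Darwin Core term.
matched, ignored = split_columns(["taxonID", "scientificName", "family", "Wingspan"])
print("matched:", matched)
print("ignored:", ignored)
```

The unmatched column stays in the file; a consumer that only understands Darwin Core terms would simply skip it.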

14 Darwin Core Archive: Anatomy. Two ways to match columns to Darwin Core terms: 1) Rename columns in the text file.

15 Darwin Core Archive: Anatomy. Two ways to match columns to Darwin Core terms: 2) Match columns to terms in a separate meta.xml file.

16 Darwin Core Archive: Anatomy. meta.xml matches the columns in the core data file (species.txt). More on how to make the meta.xml file later…
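
As a rough illustration of what such a column-matching file might contain, the Python sketch below generates a small meta.xml for a tab-delimited species.txt. The element and attribute names follow the Darwin Core text guidelines as best understood here; the column list, the `eml.xml` metadata file name, and the helper `build_meta_xml` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"   # namespace of Darwin Core terms
TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of the meta.xml descriptor


def build_meta_xml(core_file, columns):
    """Return meta.xml text mapping column positions to Darwin Core terms.

    `columns` lists one term name per column, or None for a column that
    has no matching term. Column 0 is assumed to hold the core ID.
    """
    archive = ET.Element("archive", {"xmlns": TEXT_NS, "metadata": "eml.xml"})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # literal backslash-t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": DWC_NS + "Taxon",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in enumerate(columns):
        if term is None:
            continue  # unmapped columns are simply left out of the mapping
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_NS + term})
    return ET.tostring(archive, encoding="unicode")


# The fourth column (e.g. "Wingspan") has no Darwin Core term, so it is None.
meta_xml = build_meta_xml("species.txt", ["taxonID", "scientificName", "family", None])
print(meta_xml)
```

With this approach the column names in species.txt do not need to be renamed, because the mapping is by position.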

17 Darwin Core Archive: Anatomy. Archives can include extension files (Species.txt, Common_names.txt). Extensions allow multiple records to be linked to a core record. Extensions link to the core through the core ID.
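
The link between a core file and an extension file can be sketched in Python as a one-to-many lookup on the core ID. The sample rows, the choice of `taxonID` as the ID column, and the helpers `read_rows` and `link_extension` are all hypothetical and only illustrate the idea.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited contents of a core file and one extension file.
species_txt = "taxonID\tscientificName\n1\tPuma concolor\n2\tTyto alba\n"
common_names_txt = (
    "taxonID\tvernacularName\tlanguage\n"
    "1\tpuma\tes\n"
    "1\tcougar\ten\n"
    "2\tbarn owl\ten\n"
)


def read_rows(text):
    """Parse tab-delimited text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def link_extension(core_rows, extension_rows, id_column):
    """Group extension rows under the core ID they point to."""
    by_id = defaultdict(list)
    for row in extension_rows:
        by_id[row[id_column]].append(row)
    # Every core record gets an entry, even if it has no extension rows.
    return {row[id_column]: by_id.get(row[id_column], []) for row in core_rows}


linked = link_extension(read_rows(species_txt), read_rows(common_names_txt), "taxonID")
for core_id, names in linked.items():
    print(core_id, [name["vernacularName"] for name in names])
```

One core record (ID 1) ends up with two common names, which is the many-to-one relationship the slide describes.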

18 GBIF hosts extension definitions http://rs.gbif.org/extension/

19 Darwin Core Archive: Anatomy. Multiple extension files can be linked to the core.

20 Darwin Core Archive: Anatomy. All files are stored in a single folder.

21 Darwin Core Archive: Anatomy. The folder is zipped. This is a Darwin Core Archive: data files, column matching file, data set documentation.
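
The zipping step could be automated along these lines. This is a sketch using Python's standard `zipfile` module, assuming the folder already holds the data files, meta.xml and the EML file; `make_archive` and any paths passed to it are hypothetical.

```python
import zipfile
from pathlib import Path


def make_archive(folder, zip_path):
    """Zip every file in `folder` (non-recursively) into `zip_path`."""
    folder = Path(folder)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            # Skip subfolders, and skip the zip itself if it lives in the folder.
            if path.is_file() and path.resolve() != zip_path.resolve():
                # Store files at the top level of the archive, without the folder name.
                archive.write(path, arcname=path.name)
    return zip_path
```

A call such as `make_archive("my_dataset", "my_data.zip")` would then produce a single compressed file that can be placed on a web server.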

22 Darwin Core Archive: Publishing. Archives on a web server can be accessed by a URL, e.g. http://www.organisation.org/my_data.zip. Share this URL to “publish” your data!

23 Darwin Core Archive: Publishing Options

24 GBIF Spreadsheet Templates

25 Integrated Publishing Toolkit

26 Data Hosting Centers

27 Darwin Core Mapping Assistant Metafile http://tools.gbif.org/dwca-assistant/

28 Darwin Core Mapping Assistant

29 Darwin Core Archive: Publishing Options. GBIF Darwin Core Archive Spreadsheet Templates: data in a spreadsheet already; simple archive authoring. IPT: creating/managing archives for multiple data sets; managing archives for multiple organisations; metadata as GBIF Metadata Profile of EML. Make Your Own: automating archive generation; customisation. Hosting center: economy of scale; infrastructure and support. Combinations…


