GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015 Session 02: 2015 Data Publishing Landscape Laura Russell.

1 GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015 Session 02: 2015 Data Publishing Landscape Laura Russell

2 INDEX Data publishing landscape Biodiversity data publishing Data types Data standards Data normalization and data quality Data publishing methods Promotion of data publishing Use cases

3 INDEX Data publishing landscape Biodiversity data publishing Data types Data standards Data normalization and data quality Data publishing methods Promotion of data publishing Use cases

4 DATA PUBLISHING LANDSCAPE Timeline (years shown: 2008, 2009, 2010, 2011, 2012): DiGIR/TAPIR in high use to publish biodiversity data; idea for a simple, compressed text-based file for publishing introduced at TDWG; GBIF introduces IPT 1.0; GBIF redevelops IPT; GBIF introduces IPT 2.0; data publishing taught at Nodes training; Nodes and aggregators begin to install and use IPTs; occurrence and checklist type datasets, along with IPT installations, show continued growth.

5 DATA PUBLISHING LANDSCAPE - STATISTICS http://www.gbif.org/ipt/stats

6 DATA PUBLISHING LANDSCAPE - STATISTICS

7 DATA PUBLISHING LANDSCAPE 2015 The continued GBIF commitment to improving access to biodiversity data Refinement and expansion of standards and publishing software Evolving social norms Most data still published with simple occurrence core Portals do not contain the features to support richer data Many institutions still need convincing to publish biodiversity data http://www.gbif.org/page/82104

8 INDEX Data publishing landscape Biodiversity data publishing Data types Data standards Data normalization and data quality Data publishing methods Promotion of data publishing Use cases

9 WHAT IS BIODIVERSITY DATA? Digital text or multimedia data record detailing facts about the instance of occurrence of an organism, i.e. on the what, where, when, how and by whom of the occurrence and the recording.

10 WHAT IS DATA PUBLISHING? “Publishing” refers to making biodiversity datasets publicly accessible and discoverable, in a standardized form, via an access point, typically a web address (a URL).

11 BIODIVERSITY DATA TYPES http://www.gbif.org/publishing-data/summary#datatypes Checklists Occurrences Metadata

12 BIODIVERSITY DATA TYPES – SAMPLE DATA http://www.gbif.org/newsroom/news/sample-based-data Samples

13 DATA STANDARDS http://www.tdwg.org/standards/ ABCD: Access to Biological Collection Data (2005); DwC: Darwin Core (2009); AC: Audubon Core Multimedia Resources Metadata Schema (2013); NCD: Natural Collection Descriptions (Draft)

14 DARWIN CORE http://rs.tdwg.org/dwc recordedBy: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first. Examples: "José E. Crespo", "Oliver P. Pearson | Anita K. Pearson"
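
The recordedBy definition above describes a concatenated, separated list with the primary collector first. A minimal Python sketch of reading such a value follows; the function name `split_recorded_by` and the choice of "|" as the default separator are assumptions taken from the slide's second example, not part of the standard's text quoted here.

```python
def split_recorded_by(value, separator="|"):
    """Split a concatenated recordedBy string into a list of trimmed names.

    Assumption: names are separated by "|" as in the slide's example.
    """
    return [name.strip() for name in value.split(separator) if name.strip()]


collectors = split_recorded_by("Oliver P. Pearson | Anita K. Pearson")
# By the convention quoted on the slide, the first entry is the primary collector.
primary_collector = collectors[0]
print(collectors)
print(primary_collector)
```

A single-name value such as "José E. Crespo" passes through unchanged as a one-element list.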

15 SIMPLE DARWIN CORE SIMPLEDWC is a specification for one particular way to use the Darwin Core terms - to share data about taxa and their occurrences in a simply structured way - and is probably what is meant if someone suggests to "format your data according to the Darwin Core". http://rs.tdwg.org/dwc/terms/simple/index.htm
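
As a rough illustration of the "simply structured" idea above, the sketch below writes occurrence records as one flat table whose column headers are Darwin Core term names. The record values (identifier, coordinates, date) are made up for illustration, and the selection of columns is an assumption, not a required set.

```python
import csv
import io

# Column headers are Darwin Core term names; which ones to include is an assumption.
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude", "recordedBy"]

# Hypothetical record, invented for this example.
records = [
    {
        "occurrenceID": "example:occ:1",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Puma concolor",
        "eventDate": "2015-10-04",
        "decimalLatitude": "4.6",
        "decimalLongitude": "-74.1",
        "recordedBy": "Oliver P. Pearson | Anita K. Pearson",
    },
]


def to_simple_dwc_csv(rows):
    """Serialize rows as a flat CSV: one header line, then one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_simple_dwc_csv(records))
```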

16 DARWIN CORE ARCHIVE A Darwin Core Archive (DwCA) is the text representation of data formatted to Darwin Core. A DwCA is a compressed file containing a minimum of three files. http://rs.tdwg.org/dwc/terms/guides/text/index.htm
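
The "compressed file containing a minimum of three files" can be inspected with a few lines of Python. This sketch assumes the three files are a core data file plus the meta.xml and EML.xml files named on the star-schema slides; the file name `occurrence.txt` and both helper functions are hypothetical, and the XML contents are placeholders rather than valid descriptors.

```python
import io
import zipfile

RESERVED = ("meta.xml", "eml.xml")


def summarize_archive(zip_bytes):
    """Report whether a zipped archive has a descriptor, metadata, and data files."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    lower = [name.lower() for name in names]
    return {
        "has_descriptor": "meta.xml" in lower,
        "has_metadata": "eml.xml" in lower,
        "data_files": [n for n in names if n.lower() not in RESERVED],
    }


def build_placeholder_archive():
    """Build an in-memory zip with placeholder contents, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt",
                         "occurrenceID,scientificName\nexample:occ:1,Puma concolor\n")
        archive.writestr("meta.xml", "<!-- placeholder, not a valid descriptor -->")
        archive.writestr("eml.xml", "<!-- placeholder, not valid EML -->")
    return buffer.getvalue()


summary = summarize_archive(build_placeholder_archive())
print(summary)
```

A check like this only confirms that the expected files are present; it does not validate their contents.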

17 STAR SCHEMA [Diagram: a DwC Archive as a star schema, with a Core file at the centre surrounded by extension files (Ext 1 to Ext 5), plus meta.xml and EML.xml]
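
The star schema can be sketched as a core table whose rows are referenced by zero or more rows in each extension table. In this Python example the column names `id` and `coreid`, the sample rows, and the example.org URLs are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical core file: one row per occurrence.
CORE_CSV = """id,scientificName
occ-1,Puma concolor
occ-2,Panthera onca
"""

# Hypothetical extension file: each row points back to a core row via coreid.
EXT_CSV = """coreid,identifier
occ-1,http://example.org/img/1.jpg
occ-1,http://example.org/img/2.jpg
"""


def attach_extension(core_text, ext_text, core_key="id", ext_key="coreid"):
    """Return core rows, each with the list of extension rows that reference it."""
    by_core = defaultdict(list)
    for row in csv.DictReader(io.StringIO(ext_text)):
        by_core[row[ext_key]].append(row)
    joined = []
    for row in csv.DictReader(io.StringIO(core_text)):
        joined.append({**row, "extension_rows": by_core.get(row[core_key], [])})
    return joined


for record in attach_extension(CORE_CSV, EXT_CSV):
    print(record["id"], len(record["extension_rows"]))
```

Extension rows never link to each other, only to the core, which is what gives the layout its star shape.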

18 MAPPING CORES Taxon Core The category of information pertaining to taxonomic names, taxon name usages, or taxon concepts. Released April 2015, this version removes terms dcterms:source and dcterms:rights, and adds dcterms:license. 43 terms. Occurrence Core The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.). Released July 2015, this version removes terms dcterms:source, dcterms:rights, dwc:individualID, dwc:occurrenceDetails, and adds dcterms:license, dwc:organismQuantity, dwc:organismQuantityType, dwc:organismID, dwc:organismName, dwc:organismScope, dwc:associatedOrganisms, dwc:organismRemarks, dwc:parentEventID, dwc:sampleSizeValue, dwc:sampleSizeUnit. 169 terms. Event The category of information pertaining to a sampling event. Issued 29 May 2015. 95 terms

19 EXTENSIONS Darwin Core does not provide terms for every possible type of data. 22 registered, 25 under development. Examples: Audubon Media Description (aka Audubon Core); Darwin Core Identification History; Darwin Core Measurement or Facts. http://tools.gbif.org/dwca-validator/extensions.do

20 STAR SCHEMA EXAMPLE - OCCURRENCE [Diagram: an Occurrence DwC Archive, with the Occurrence Core at the centre and extension labels Media, Geographical, Determination, Germplasm, plus meta.xml and EML.xml]

21 STAR SCHEMA EXAMPLE - CHECKLIST [Diagram: a Checklist DwC Archive, with the Taxon Core at the centre and extension labels Literature, Description, Occurrences, Vernacular, Distribution, Types, plus meta.xml and EML.xml]

22 STAR SCHEMA EXAMPLE - SAMPLE [Diagram: a Samples DwC Archive, with the Event Core at the centre and extension labels Occurrences, Measurement/Fact, Relevé, plus meta.xml and EML.xml]

23 DATA NORMALIZATION What is data normalization? Reasons to normalize a database Normal forms http://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/, http://databases.about.com/od/specificproducts/a/normalization.htm, http://www.dotnet-tricks.com/Tutorial/sqlserver/756N210512-Database-Normalization-Basics.html
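
As one possible illustration of normalization, the Python sketch below takes a flat table in which location details repeat on every row and splits it into an occurrence table and a location table linked by a key. The sample rows, the `locationID` numbering, and the function name `normalize` are all invented for this example.

```python
# Hypothetical flat table: locality and country repeat for every occurrence.
flat_rows = [
    {"occurrenceID": "occ-1", "scientificName": "Puma concolor",
     "locality": "Site A", "country": "CO"},
    {"occurrenceID": "occ-2", "scientificName": "Panthera onca",
     "locality": "Site A", "country": "CO"},
    {"occurrenceID": "occ-3", "scientificName": "Puma concolor",
     "locality": "Site B", "country": "PE"},
]


def normalize(rows):
    """Split repeated location details into their own table, linked by locationID."""
    location_ids = {}  # (locality, country) -> locationID
    occurrences = []
    for row in rows:
        key = (row["locality"], row["country"])
        if key not in location_ids:
            location_ids[key] = len(location_ids) + 1
        occurrences.append({
            "occurrenceID": row["occurrenceID"],
            "scientificName": row["scientificName"],
            "locationID": location_ids[key],
        })
    locations = [
        {"locationID": loc_id, "locality": key[0], "country": key[1]}
        for key, loc_id in location_ids.items()
    ]
    return occurrences, locations


occurrence_table, location_table = normalize(flat_rows)
print(occurrence_table)
print(location_table)
```

Each location is now stored once, so correcting a locality name means editing a single row instead of every occurrence recorded there.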

24 DATA QUALITY Tools Should you work on improving the data? Importance of feedback http://community.gbif.org/pg/pages/view/48546/precourse-activities

25 DATA PUBLISHING METHODS

27 DATA PUBLISHING METHODS – POLLS To be explained in the live session…

28 INDEX Data publishing landscape Biodiversity data publishing Data types Data standards Data normalization and data quality Data publishing methods Promotion of data publishing Use cases

29 PROMOTION OF DATA PUBLISHING Topic of discussion at the Nodes Training in Berlin in 2013. Core element in the day-to-day work of Node Managers.

30 PROMOTION OF DATA PUBLISHING - BARRIERS Categories: Psychological & cultural barriers; Institutional barriers; Capacity barriers; Practical barriers. 1. Lack of knowledge 2. Lack of understanding 3. Lack of will 4. Perceived data value 5. Privacy concerns 6. Lack of authorization 7. Lack of time / planning 8. Lack of capacity 9. Lack of funding 10. Lack of infrastructure http://www.gbif.org/publishing-data/benefits, http://www.gbif.org/resource/81196

31 PROMOTION OF DATA PUBLISHING - RESTRICTIONS 1. Refuse to share. 2. Refuse to share until they have exhausted the planned use of the data. 3. Will only share their data for a fee. 4. Will only share data under specific restrictions. 5. Agree to share data openly.

32 PROMOTION OF DATA PUBLISHING - STRATEGIES 1. Facilitate access to financial support. 2. Call upon commitments or legal mandates. 3. Call upon open access / moral principles. 4. Show the benefits of better data management. 5. Show the benefit for their scientific careers. 6. Peer pressure. 7. Start / support big digitization programmes. 8. Start / support data repatriation efforts.

33 PROMOTION OF DATA PUBLISHING – DISCUSSION Challenges: Not wanting to publish and/or not wanting to publish all the data; technical threshold of an IPT; restrictive licensing of data. Strategies: Start smaller, with metadata only; promote one-off publishing with multiple exposures; provide hosted IPTs to eliminate the technical threshold; illustrate licensing with telling examples; promote and organize trainings to bring reluctant publishers in with an easier “sell” like data papers. http://community.gbif.org/pg/forum/topic/48616/precourse-activity-promoting-data-publishing/

34 INDEX Data publishing landscape Biodiversity data publishing Data types Data standards Data normalization and data quality Data publishing methods Promotion of data publishing Use cases

35 USE CASES - INTRODUCTION Explore four use cases based on current publishing practices: literature; observation data; natural history collections; checklists. Complete two exercises: definition of publishing strategies; publish datasets.

36 USE CASE 1: DATA FROM LITERATURE Blue Group

37 USE CASE 2: OBSERVATIONAL DATA Green Group Red Group

38 USE CASE 3: NATURAL HISTORY COLLECTION DATA Yellow Group

39 USE CASE 4: TAXONOMIC CHECKLISTS Purple Group

40 INDEX Data publishing landscape Biodiversity data publishing Data types Data standards Data normalization and data quality Data publishing methods Promotion of data publishing Use cases

41 GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015 Session 02: 2015 Data Publishing Landscape Laura Russell

