ORNL DAAC Experience With Digital Object Identifiers (DOIs)
Bruce Wilson, ORNL DAAC Manager
for NASA Data Center Managers telecon, 22 Feb 2010


2 Acknowledgements and Sources
- Bob Cook, ORNL DAAC Scientist
- DataONE Core CI team, particularly Matt Jones (UCSB) and Dave Vieglais (U Kansas)
- ESIP Product & Stewardship, particularly Ruth Duerr (NSIDC) and Bob Downs (SEDAC)
- Note: ORNL’s CDIAC has started assigning DOIs to all of its finalized data sets.
22 February 2010

3 ORNL DAAC Citation Policy
- http://daac.ornl.gov/citation_policy.html
- http://daac.ornl.gov/citation_style.html
- Citations are in the names of the investigators
- Example (with DOI):
  Turner, D.P., W.D. Ritts, and M. Gregory. 2006. BigFoot NPP Surfaces for North and South American Sites, 200-2004. Data set. Available from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. [http://daac.ornl.gov]. doi:10.3334/ORNLDAAC/750
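The citation style in the example above is regular enough to generate from a data set's metadata. The sketch below is a hypothetical helper (the `DatasetCitation` fields and function names are illustrative, not an ORNL DAAC tool) that reproduces the layout of the Turner et al. example; the archive text is copied from that example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields seen in the slide's example citation."""
    authors: list[str]   # pre-formatted names, e.g. "Turner, D.P."
    year: int
    title: str
    doi: str             # bare DOI, e.g. "10.3334/ORNLDAAC/750"
    archive: str = ("Oak Ridge National Laboratory Distributed Active Archive "
                    "Center, Oak Ridge, Tennessee, U.S.A.")
    archive_url: str = "http://daac.ornl.gov"


def join_authors(authors: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        return f"{authors[0]} and {authors[1]}"
    return ", ".join(authors[:-1]) + ", and " + authors[-1]


def format_citation(c: DatasetCitation) -> str:
    """Render the citation; a trailing initial's period is not doubled."""
    head = join_authors(c.authors).rstrip(".")
    return (f"{head}. {c.year}. {c.title}. Data set. "
            f"Available from {c.archive} [{c.archive_url}]. doi:{c.doi}")
```

Keeping the DOI as the last element, as in the example, means a reader can always find it in the same place regardless of how long the title or author list is.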

4 What Problem Are We Addressing?
- ORNL DAAC has used data citations for many years to:
  - Track use of data in the literature (impact)
  - Provide credit to investigators
  - Create incentives for publishing and sharing data
- Some journal editors rejected URL citations
  - URLs were regarded as transient (a very valid concern)
- Some scientists didn’t see data as a “publication”
  - We want data sets listed on CVs
  - A citable data set is a strong way to demonstrate its impact for tenure

5 What Is a DOI?
- Technically, it’s a particular implementation of the Handle System
- There is a limited number of registrars
- Each publisher gets a prefix (e.g., 10.3334)
  - The publisher assigns an identifier (the suffix) after the prefix
- The publisher registers the DOI with a URL and metadata
  - The endpoint URL can be updated as systems evolve
  - Registration can include back-links (documents cited)
    - Enables a citation chain
    - Can help establish dependence of data sets (future use)
- The DOI resolves at use time to the current endpoint URL; the same DOI can be written as:
  - http://dx.doi.org/10.3334/ORNLDAAC/945
  - doi:10.3334/ORNLDAAC/945
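The prefix/suffix structure and the two written forms above can be illustrated with a short sketch. It splits a DOI at its first slash, builds a resolver URL using the `dx.doi.org` base shown on the slide, and uses a hypothetical in-memory `ToyRegistry` (not any registrar's real API) to show why updating the endpoint URL never changes the DOI that appears in citations.

```python
from __future__ import annotations

from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver base used on the slide
_LEADS = ("doi:", "http://dx.doi.org/", "https://dx.doi.org/",
          "http://doi.org/", "https://doi.org/")


def parse_doi(text: str) -> tuple[str, str]:
    """Split a DOI (bare, 'doi:' form, or resolver URL) into (prefix, suffix)."""
    s = text.strip()
    for lead in _LEADS:
        if s.lower().startswith(lead):
            s = s[len(lead):]
            break
    prefix, sep, suffix = s.partition("/")  # split at the first slash only
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return prefix, suffix


def to_url(text: str) -> str:
    """Build a resolver URL; reserved characters in the suffix are percent-encoded."""
    prefix, suffix = parse_doi(text)
    return f"{RESOLVER}{prefix}/{quote(suffix, safe='/')}"


class ToyRegistry:
    """Hypothetical in-memory stand-in for a registrar: DOI -> current endpoint URL."""

    def __init__(self) -> None:
        self._targets: dict[tuple[str, str], str] = {}

    def register(self, doi: str, url: str) -> None:
        # Registering again replaces the endpoint; the DOI itself never changes.
        self._targets[parse_doi(doi)] = url

    def resolve(self, doi: str) -> str:
        return self._targets[parse_doi(doi)]
```

Because the registry is keyed on the parsed (prefix, suffix) pair, the `doi:` form and the resolver-URL form of the same identifier look up the same entry.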

6 ORNL Experience
- We work with CrossRef as our registrar
  - $500/year membership fee; ~$250 to register 900 DOIs
- Our DOIs resolve to a web page about the dataset
- Very positive reaction from investigators
- Makes usage metrics somewhat easier to collect
- We haven’t implemented back-linking yet, but should
- It’s a social contract that we don’t change the data behind a DOI
  - Updated dataset → new DOI (if the change is “significant”)
  - Minor updates (spelling corrections, clarifications) are OK
  - Adding a file in a new data format is harder to decide
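The fees above work out to well under a dollar per identifier. The arithmetic below uses only the slide's figures; the assumption that membership and registration fall in the same year, and the linear extrapolation to a hypothetical one million granules, are illustrative only (actual fee schedules may not scale linearly), but they show why cost rules out per-granule DOIs later in the talk.

```python
# Figures from the slide (CrossRef, as experienced by ORNL DAAC in 2010).
MEMBERSHIP_PER_YEAR = 500.00   # USD
REGISTRATION_TOTAL = 250.00    # USD, approximate ("~$250")
DOIS_REGISTERED = 900

# Registration cost per DOI: about $0.28.
per_doi = REGISTRATION_TOTAL / DOIS_REGISTERED

# Assumption: membership and registration paid in the same year, so $750 total.
first_year_total = MEMBERSHIP_PER_YEAR + REGISTRATION_TOTAL


def registration_cost(n_identifiers: int) -> float:
    """Linear extrapolation at the slide's per-DOI rate (assumes flat pricing)."""
    return n_identifiers * per_doi


# Hypothetical: one million granules, each with its own DOI (about $277,778).
granule_cost = registration_cost(1_000_000)
```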

7 Different types of update operations
- Correct a reference or spelling in the documentation
  - No change in DOI, but provenance should still be shown
- Augment documentation for clarity
  - No change in DOI, but provenance should still be shown
- Add a copy of the data in a new format
  - Probably no change in DOI, but provenance should still be shown
- Correct an error in the data
  - New DOI; show provenance
- Append new data
  - New DOI; show provenance
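The update rules above amount to a small lookup table: each operation either keeps the DOI or mints a new one, and provenance is recorded either way. A minimal sketch follows; `Update`, `DatasetVersion`, and the `mint_doi` callback are hypothetical names, and treating a new-format copy as "no new DOI" follows the slide's "probably".

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Update(Enum):
    """The update operations listed on the slide."""
    CORRECT_DOC_REFERENCE_OR_SPELLING = "correct reference or spelling in documentation"
    AUGMENT_DOCUMENTATION = "augment documentation for clarity"
    ADD_FORMAT_COPY = "add copy of data in new format"
    CORRECT_DATA_ERROR = "correct error in data"
    APPEND_DATA = "append new data"


# True means the update gets a new DOI. The new-format entry is only
# "probably" no change per the slide, so it is a judgment call.
NEEDS_NEW_DOI = {
    Update.CORRECT_DOC_REFERENCE_OR_SPELLING: False,
    Update.AUGMENT_DOCUMENTATION: False,
    Update.ADD_FORMAT_COPY: False,
    Update.CORRECT_DATA_ERROR: True,
    Update.APPEND_DATA: True,
}


@dataclass(frozen=True)
class DatasetVersion:
    doi: str
    provenance: tuple[str, ...] = ()


def apply_update(current: DatasetVersion, update: Update, note: str,
                 mint_doi: Callable[[], str]) -> DatasetVersion:
    """Return the record after an update; provenance is recorded in every case."""
    entry = f"{update.value}: {note}"
    if NEEDS_NEW_DOI[update]:
        # New DOI; the provenance trail links back to the superseded DOI.
        return DatasetVersion(mint_doi(),
                              current.provenance + (f"{entry} (supersedes {current.doi})",))
    # Same DOI; the provenance trail still grows.
    return DatasetVersion(current.doi, current.provenance + (entry,))
```

Making the record immutable means every version, including the superseded one, stays available for the provenance display the slide asks for.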

8 DOIs work well for some things
- Finalized datasets (ones that don’t change)
- Datasets that change occasionally
  - e.g., the Global Fire emission dataset, updated annually
- Documents (best practices, product documentation)
- Could work for remote sensing at the product level

9 DOIs are less appropriate for other things
- Cost (primarily) prohibits assigning DOIs to individual granules
  - Unique IDs are still needed, but they may be internal to the data center
  - DOIs are a publishing standard being adapted for data identification
- Dynamically generated and streaming data
  - One DOI per MODIS product probably makes sense
  - Being able to reproduce the data is desirable, but hard
    - MODIS subsetter (particularly considering reprocessed granules): would need a separate identifier for each request
    - Other processing tools, like OGC web services
- Possibly use data citations together with workflow provenance
  - Partition data citation from data reproducibility
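One way to get "a separate identifier for each request" without registering a DOI per request is to hash a canonical form of the request. This is a swapped-in technique (content hashing), not something the slide prescribes, and all parameter names below are hypothetical. Including a processing-version field means subsets drawn from reprocessed granules get a different ID; the ID alone does not make the data reproducible, since the archive would still need to keep the inputs.

```python
import hashlib
import json


def request_id(dataset_doi: str, params: dict) -> str:
    """Hypothetical: derive a stable identifier for one subset request.

    The request is serialized as canonical JSON (sorted keys, no extra
    whitespace), so the same request always hashes to the same value.
    """
    canonical = json.dumps({"doi": dataset_doi, "params": params},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical request parameters; "collection" records the processing
# version so that a reprocessed input yields a new request ID.
req = {"product": "EXAMPLE_PRODUCT", "collection": 5,
       "lat": 35.96, "lon": -84.29,
       "start": "2008-01-01", "end": "2008-12-31"}
rid = request_id("10.9999/EXAMPLE", req)
```

The citation would still point at the product-level DOI, while the request ID (plus workflow provenance) covers reproducibility, which matches the slide's idea of partitioning the two.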

10 Good citations help assess dependence
- Synthesis is an increasingly important part of science
  - Are all of the data used in the study independent?
- Example: Luyssaert et al. Net Primary Productivity (NPP)
  - Data at ORNL DAAC (doi:10.3334/ORNLDAAC/949)
  - Article at doi:10.1111/j.1365-2486.2007.01439.x
  - Drawn from many sources (very well documented):
    ftp://ftp.daac.ornl.gov/data/global_vegetation/forest_carbon_flux/comp/appendix_a_database_sources.pdf
  - Future work using the Luyssaert dataset can’t validly compare it against any of its underlying source data, because they are not independent
- Also an issue in calibration/validation (cal-val) for remote sensing
  - What data were used for this remote sensing product?
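If each data set's citations record what it was derived from, the independence question above becomes a graph check: two data sets are dependent when their lineages overlap. A minimal sketch, assuming a hypothetical `derived_from` mapping; the source identifiers `site_A` through `site_C` are invented placeholders, not entries from the Luyssaert appendix.

```python
from __future__ import annotations

from itertools import combinations


def lineage(dataset: str, derived_from: dict[str, set[str]]) -> set[str]:
    """The data set itself plus everything it was (transitively) derived from."""
    seen: set[str] = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(derived_from.get(node, ()))
    return seen


def dependent_pairs(datasets: list[str],
                    derived_from: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Pairs of data sets whose lineages overlap, with the shared members."""
    lin = {d: lineage(d, derived_from) for d in datasets}
    overlaps = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = lin[a] & lin[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical lineage: the synthesis data set draws on two site records.
derived_from = {"doi:10.3334/ORNLDAAC/949": {"site_A", "site_B"}}
pairs = dependent_pairs(["doi:10.3334/ORNLDAAC/949", "site_A", "site_C"], derived_from)
# pairs == {("doi:10.3334/ORNLDAAC/949", "site_A"): {"site_A"}}
```

Back-links registered with DOIs (slide 5) are one possible source for such a `derived_from` mapping.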

11 Data Identifiers are evolving
- DataCite.org (a German library plus others, including the California Digital Library, CDL)
  - Particularly focused on research data
- Life Science Identifiers (LSIDs)
  - Heavily used in the oceans community
  - Some concerns about URN versus URI
  - See http://en.wikipedia.org/wiki/LSID
- Globally Unique Identifiers (GUIDs)
  - Need some type of resolution mechanism
  - Supporting anything “forever” is a big challenge
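To make the URN-versus-resolution point concrete, the sketch below parses the LSID URN layout (`urn:lsid:authority:namespace:object`, with an optional revision) and mints a random GUID with Python's standard `uuid` module. Neither identifier says where the data live; some separate resolution service is still needed. The example LSID is hypothetical, and the parser checks only the field count, not the full LSID specification.

```python
import uuid
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def new_guid() -> str:
    """A random (version 4) UUID; globally unique, but not resolvable by itself."""
    return str(uuid.uuid4())


# Hypothetical LSID for illustration only.
example = parse_lsid("urn:lsid:example.org:samples:12345")
```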

12 Impact Metrics
- “Cited” means a formal citation in the reference list
- “Referred” means the data were acknowledged somewhere in the body of the paper
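The cited/referred distinction can be approximated automatically when a paper's body and reference list are available as separate text. The sketch below is a hypothetical heuristic, not the DAAC's actual method: it extracts DOI-like strings with a simple regular expression, compares them exactly (so `.../75` does not match `.../750`), and falls back to caller-supplied name aliases for mentions in the body. The trailing-punctuation strip can mangle the rare DOIs that end in a parenthesis.

```python
import re

# Loose DOI pattern: "10." + registrant code + "/" + non-space suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def dois_in(text: str) -> set:
    """DOI-like strings in the text, lower-cased, trailing punctuation removed."""
    return {m.rstrip(".,;)").lower() for m in DOI_PATTERN.findall(text)}


def classify_mention(body: str, references: str, doi: str, aliases=()) -> str:
    """Return 'cited', 'referred', or 'not mentioned' for one data set DOI."""
    target = doi.lower()
    if target in dois_in(references):
        return "cited"
    lowered = body.lower()
    if target in dois_in(body) or any(a.lower() in lowered for a in aliases):
        return "referred"
    return "not mentioned"
```

Aliases such as a data set's title have to be supplied by hand, which is one reason "referred" uses are harder to count than formal citations.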

