Identifiers and Citation


1 Identifiers and Citation
Frequently Asked Questions
Joan Starr, John Chodacki
California Digital Library

2 Identifiers & Citations: FAQs
Why identify data?
Which identifier is best for citation?
What’s the difference between DOIs & ARKs?
What’s the difference between DataCite & Crossref?
What level of granularity is right for data citation?
How should versioning be handled?
How should dynamic data be cited?
Your questions here!
Here are some of the most frequently asked questions I hear from clients, webinar participants, and listserv discussions. Hand out post-its (or set up a Google Doc) and ask everyone to write down other questions or common scenarios that aren’t listed here: anything about PIDs/DOIs/ARKs that we are not answering.

3 Why identify data?
get credit
ensure transparency
measure impact
track use (even over a long project)
aid reproducibility

4 Which identifier?
machine (& human) actionable
globally unique
widely used by a community
commitment to persistence
your requirements here
Credit: Joint Declaration of Data Citation Principles

5 DOIs and ARKs: What’s the difference?
DOIs: Strict metadata requirements. From the scholarly communication community. Established “brand name”. Use case: Data Citation.
ARKs: Flexible metadata guidelines. From the archives and museums community. Option-rich, open source. Use case: Data Documentation.

6 ARKs and DOIs in data management plans
Image credit:

7 ARKs: Data Documentation
What it looks like:
At top-level directory/folder: Project Title, Unique Identifier, Date (yyyy or yyyy.mm.dd)
At sub-directories: optional identifiers at granular levels
Sample plan language: [Team] follows the recommended best practice for good data management by assigning unique identifiers (ARKs) to the data as part of the data documentation.
Why it’s important:
Researcher benefits: Data documentation helps you keep track of (and remember) aspects of your data throughout the research project.
Reference for “documentation and metadata”:
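As a rough illustration of the top-level documentation described on this slide, the sketch below builds the three lines (title, identifier, date) as text that could be saved in a README at the root of a project folder. The function name, the field labels, and the ARK value are all hypothetical and chosen only for this example; they are not a format prescribed by the presentation.

```python
from datetime import date


def documentation_header(title: str, identifier: str, when: date) -> str:
    """Return the three top-level documentation lines: title, identifier, date."""
    return "\n".join([
        f"Project Title: {title}",
        f"Unique Identifier: {identifier}",
        # The slide allows yyyy or yyyy.mm.dd; this sketch uses the longer form.
        f"Date: {when.strftime('%Y.%m.%d')}",
    ])


# Made-up project and a placeholder ARK, for illustration only.
header = documentation_header(
    "Example Survey", "ark:/99999/fk4example", date(2016, 3, 1)
)
print(header)
```

The same function could be called again for a sub-directory if a more granular identifier has been assigned there.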

8 DOIs: Data Citation
What it looks like: Aguilée R, Lambert A, Claessen D (2011) Data from: Ecological speciation in dynamic landscapes. Journal of Evolutionary Biology doi: /dryad.74024
Sample plan language: Publication of data shall occur during the project, if appropriate, or at the end of the project, consistent with normal scientific practices. [Team] follows a standardized data product citation, including DOI, that indicates the version and how to obtain a copy of that product.
Why it’s important: OSTP mandate to identify and provide “appropriate attribution to scientific data sets”
Researcher benefits: Credit, increased citations, increased productivity
OSTP Public Access Memo:
Increased citations: Piwowar’s 2007 study:
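To make the sample plan language more concrete, here is a minimal sketch of a helper that assembles a data citation containing a version and a resolvable DOI link. The element order (creators, year, title, version, publisher, identifier) is an assumption made for this example, and the function name, the dataset details, and the DOI are all invented.

```python
from typing import Optional, Sequence


def format_data_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    publisher: str,
    doi: str,
    version: Optional[str] = None,
) -> str:
    """Assemble a one-line data citation; the version is included when given."""
    parts = [f"{', '.join(creators)} ({year})", f"{title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # Writing the DOI as an https://doi.org/ link tells readers how to obtain a copy.
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Hypothetical dataset and DOI, for illustration only.
citation = format_data_citation(
    ["Doe J", "Roe R"], 2016, "Example dataset",
    "Example Repository", "10.5072/example", version="1.2",
)
print(citation)
```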

9 DataCite and Crossref: What’s the difference?
DataCite: Founded 2009. From the research libraries and information centres community. Primary service focus: data and related objects.
Crossref: Founded 2000. From the publishing community. Primary service focus: journal articles and related objects.
DataCite and Crossref also have many things in common: both are membership organizations, both are focused on scholarly communications, and both offer the Digital Object Identifier (DOI). They are collaboration partners on a number of citation solutions.

10 What level of granularity is right for data citation?
Image: by velmc
Granularity describes the degree of aggregation of the object to be registered. Different levels of granularity can be useful depending on discipline or resource. An object can be identified at any desired level of granularity (element of a file, a file, a file collection, etc.) as the purpose dictates. When deciding which level of granularity to use when registering an object, there are several factors to keep in mind.

11 Granularity considerations
Citation: What is likely to be cited?
Use cases: How will funders/publishers/administrators/etc. use the data?
Complexity: What type of resource/structure is being registered? Complex objects may require a more granular identifier structure than a document or image file.
Sustainability: PID owners must be able to maintain target URLs and citation metadata for PIDs.
Relationships: The DataCite Metadata Schema includes a flexible mechanism to specify relationships between objects.
DOIs have strict contractual requirements for maintenance.
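The relationship mechanism mentioned under "Relationships" can be sketched as follows. The property names relatedIdentifier, relatedIdentifierType, and relationType, and the values HasPart and IsPartOf, come from the DataCite Metadata Schema; the surrounding dictionary layout, the function name, and the DOIs are simplifications assumed for this example and are not a complete or valid DataCite record.

```python
def part_relations(collection_doi: str, file_dois: list) -> tuple:
    """Link a collection-level DOI to file-level DOIs in both directions."""
    collection = {
        "doi": collection_doi,
        "relatedIdentifiers": [
            {
                "relatedIdentifier": doi,
                "relatedIdentifierType": "DOI",
                "relationType": "HasPart",  # the collection has each file as a part
            }
            for doi in file_dois
        ],
    }
    files = [
        {
            "doi": doi,
            "relatedIdentifiers": [
                {
                    "relatedIdentifier": collection_doi,
                    "relatedIdentifierType": "DOI",
                    "relationType": "IsPartOf",  # each file is part of the collection
                }
            ],
        }
        for doi in file_dois
    ]
    return collection, files


# Hypothetical DOIs, for illustration only.
collection, files = part_relations(
    "10.5072/collection", ["10.5072/file-a", "10.5072/file-b"]
)
print(len(collection["relatedIdentifiers"]), files[0]["relatedIdentifiers"][0]["relationType"])
```

Registering identifiers at both levels doubles the maintenance work, which is where the sustainability consideration comes in.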

12 How should versioning be handled?
Image: by sweet as candy photo
I’ll base my versioning practice suggestions on two sources: the DataCite Metadata Schema documentation and the Earth Science Information Partners (ESIP).

13 Versioning considerations
ESIP: Track major_version.minor_version. For ESIP, see:
DataCite: Register a new DOI for a major version change. Individual stewards need to determine major vs. minor.
DataCite: Use AlternateIdentifier and RelatedIdentifier to indicate various information updates, and Description to indicate the nature and file/record range of the version. For DataCite, see:
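A minimal sketch of the rule above, assuming versions are written as two integers separated by a dot (major_version.minor_version): a change in the major number calls for a new DOI, and the new record can point back to the previous one. The function names are hypothetical; IsNewVersionOf is a relation type in the DataCite Metadata Schema, but the small dictionary here is only a fragment, not a full record. Deciding what counts as "major" remains the steward's judgment and is not something code can settle.

```python
def parse_version(version: str) -> tuple:
    """Split 'major.minor' into a pair of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def needs_new_doi(previous: str, current: str) -> bool:
    """True when the major version number has changed."""
    return parse_version(current)[0] != parse_version(previous)[0]


def new_version_link(previous_doi: str) -> dict:
    """RelatedIdentifier fragment pointing from a new version to the old one."""
    return {
        "relatedIdentifier": previous_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsNewVersionOf",
    }


# Hypothetical versions and DOI, for illustration only.
print(needs_new_doi("1.4", "1.5"))
print(needs_new_doi("1.5", "2.0"))
print(new_version_link("10.5072/example.v1"))
```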

14 How should dynamic data be cited?
Image: by Jon Anderson
Citation of dynamic data is an evolving topic. There is no single best practice, because the best approach depends on the characteristics of the data, as well as the capabilities of the repository storing the data. I’m going to present the DataCite recommendations and then point you to some sources for more information and developments.

15 Citing dynamic data
(1) Cite a specific slice (the set of updates to the dataset made during a particular period of time or to a particular area of the dataset);
(2) Cite a specific snap-shot (a copy of the entire dataset made at a specific time);
(3) Cite the continuously updated dataset, and add an Access Date and Time to the citation;
(4) Cite a query time-stamped for re-execution against a versioned DB (Research Data Alliance proposal).
For datasets that are continuously and rapidly updated, there are special challenges both in citation and preservation. Four approaches are possible. Note that a “slice” and a “snap-shot” are versions of the dataset and require unique identifiers. The third and fourth options are controversial: the third, because it necessarily means that following the citation does not result in observation of the resource as cited; the fourth, because it shifts a significant burden onto repositories to store database versions for all the queries.
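The third and fourth approaches can be sketched roughly as below. Both helpers are hypothetical: the first appends an access date and time to an existing citation, and the second records the text of a query together with its execution time so that it could later be re-run against a versioned database. The whitespace normalization and the SHA-256 checksum are assumptions added for illustration and are not taken from the slides or presented as the RDA specification.

```python
import hashlib
from datetime import datetime, timezone


def cite_with_access_time(citation: str, accessed: datetime) -> str:
    """Approach 3: add an access date and time (in UTC) to a citation."""
    stamp = accessed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")
    return f"{citation} Accessed {stamp} UTC."


def timestamped_query_record(query: str, executed: datetime) -> dict:
    """Approach 4: keep the query and its execution time for re-execution."""
    normalized = " ".join(query.split())  # collapse whitespace (assumed step)
    return {
        "query": normalized,
        "executed": executed.astimezone(timezone.utc).isoformat(),
        "query_sha256": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
    }


# Hypothetical citation, query, and time, for illustration only.
when = datetime(2016, 3, 1, 12, 30, tzinfo=timezone.utc)
print(cite_with_access_time("Doe J (2016) Example stream.", when))
record = timestamped_query_record("SELECT *   FROM readings\nWHERE site = 'A'", when)
print(record["query"], record["executed"])
```

Note that the record alone is not enough for the fourth approach: the repository still has to keep the database versions that the stored timestamp refers to.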

16 More about dynamic data
Ball, A. & Duke, M. (2015). ‘How to Cite Datasets and Link to Publications’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online:
For more information about the RDA approach, see this post on DataCite’s blog:
Join the RDA Working Group to follow activity:
Another place to watch is Force11:

17 Image: https://www. flickr
Image: by Henry

18 @joan_starr @chodacki

