FORCE11 Data Citation Synthesis Group

1 FORCE11 Data Citation Synthesis Group
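Four-way comparison of data citation principles: Amsterdam Manifesto, CODATA, DataCite and the Digital Curation Centre (DCC)
Labels used below: F-1 to F-8 = Amsterdam Manifesto principles; C-I, C-II, … = CODATA principles; DC/DCC = Digital Curation Centre.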

2 DataCite: DataCite was founded in 2009 to “increase acceptance of data as legitimate, citable contributions to the scientific record,” among other objectives.
Comment from Scott Edmunds on the Amsterdam Manifesto: he wanted to make this stronger, proposing: “9. As first-class records of research, and to encourage and credit their dissemination, publishers and citation indexes should treat data citations in the same manner as article publications, and enable their citation, tracking and credit in the same way.”
DC 1 and C-I are similar in intent, but C-I goes a bit further: it emphasizes the need to raise the status of data citations to the status already accorded to citations of the other types of objects that make up the scholarly record.
Recommendation: General agreement; modify the wording to capture everyone’s sentiments.

3 DataCite: “Research data for which identifiers will be assigned must be located in data centres or repositories committed to persistence and maintenance.”
Comment on the Amsterdam Manifesto: What about private human data that require special permission to access?
The DCC has nothing to say on this point.
F-2 and C-III both mention persistence, but they differ. F-2 specifies the mechanism of “public” repositories (which is partly about access), while C-III is agnostic about the nature of the repository and the mode of access (open or not). There is a legitimate debate to be had over the value of open access, but citation practices need to apply to data stored in repositories that are open or subscription-based, publicly or privately owned (and even to data not stored in a repository at all).
Recommendation: Do not get into the specifics of data persistence, such as what should be kept and for how long. Do make it clear that not all data may be fully public. Do recommend that data be stored in places committed to maintenance and persistence rather than on an individual’s machine. Make it clear that the citation may outlive the data.
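Two points in the recommendation are that not all data are fully public and that the citation may outlive the data. Together they imply that a cited identifier should still resolve to something useful when access is restricted or the data are gone. The sketch below is minimal and assumes a hypothetical `DatasetRecord` whose `access` field is "open", "restricted" or "withdrawn". None of these names come from DataCite or from the principles compared here.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical landing-page record kept by a repository (illustrative only)."""
    identifier: str
    title: str
    access: str          # assumed values: "open", "restricted" or "withdrawn"
    location: str = ""   # where the data can be downloaded, if anywhere
    contact: str = ""    # whom to ask for access to restricted data


def landing_page_text(record: DatasetRecord) -> str:
    """Describe how a reader can (or cannot) reach the cited data."""
    header = f"{record.title} ({record.identifier})"
    if record.access == "open":
        return f"{header}: download from {record.location}"
    if record.access == "restricted":
        # Not all data are public; the citation still tells the reader how to ask.
        return f"{header}: access on request from {record.contact}"
    if record.access == "withdrawn":
        # Tombstone: the citation outlives the data, so the metadata stays resolvable.
        return f"{header}: data no longer available; metadata retained for citation"
    raise ValueError(f"unknown access status: {record.access!r}")
```

The design choice illustrated is that the identifier never dangles: even a withdrawn dataset keeps a page that explains what was cited.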

4 Comments:
From MM: I would like a note or a word about attribution when the data come from large public databases that contain entries from potentially thousands of users. These principles can handle provenance and attribution for individual datasets collected by a single individual, group or organization, but attribution in the case of aggregate data requires special mention. I think that in that case the database from which the data were acquired should be cited, along with the date on which it was accessed. But do they need to mint a copy?
Tim: Cited so as to facilitate credit, discovery, provenance and attribution. Should link to documents that provide implementation. “Published conclusions should cite the evidence that they are based on.”
Tim: From the clinical side, not all data may be accessible.
Provenance: There is a distinction between citing an entire aggregation and citing an individual record. Time stamping may be necessary. There need to be a couple of different levels of detail here. Link to relevant documents.
The DataCite Metadata Schema supports describing a rich set of relationships between the article and the data, including IsCitedBy, IsSupplementTo, IsReferencedBy and IsDocumentedBy. This suggests that a registered dataset could “provide” metadata to help place it in relation to the article.
Recommendation: Merge: data should be cited… so as to facilitate credit, discovery, provenance and attribution. There may need to be a couple of levels of detail here.
We can infer that the reasons for using the mechanism specified in F- are those referred to in C-I, C-II, C-V, and perhaps C-VI.
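The relationship types named above belong to the DataCite Metadata Schema’s relatedIdentifier property. The sketch below shows how a dataset record might list its links to an article. It uses a plain Python dict loosely modelled on the schema’s `relatedIdentifier`, `relatedIdentifierType` and `relationType` fields. The DOIs are placeholders, the helper `related_identifiers` is hypothetical, and the output is not a payload the DataCite API is guaranteed to accept as-is.

```python
# relationType values mentioned on the slide; the full schema defines more (e.g. Cites, References).
ARTICLE_RELATIONS = {"IsCitedBy", "IsSupplementTo", "IsReferencedBy", "IsDocumentedBy"}


def related_identifiers(article_doi: str, relations: list) -> list:
    """Build relatedIdentifier-style entries linking a dataset to one article."""
    entries = []
    for relation in relations:
        if relation not in ARTICLE_RELATIONS:
            raise ValueError(f"relationType not covered by this sketch: {relation}")
        entries.append({
            "relatedIdentifier": article_doi,
            "relatedIdentifierType": "DOI",
            "relationType": relation,
        })
    return entries


# Placeholder DOIs for a dataset and the article that cites and is supplemented by it.
dataset = {
    "identifier": "10.1234/dataset.v1",
    "relatedIdentifiers": related_identifiers("10.1234/article.42", ["IsCitedBy", "IsSupplementTo"]),
}
```

Because the relationships travel with the dataset’s own metadata, the repository, not only the publisher, can expose how the data relate to the article.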

5 Major discussion point: Data need to be distinguishable as data
Recommendation: Use the CODATA wording, but perhaps give an example indicating that a bibliography might be an appropriate place. For example, in our current mode of publishing, data should be handled at the same level as bibliographic materials.
DataCite recommends a bibliographic citation style, but makes no statement about where the citation should be located in the publication.
From MM: Perhaps this is where Scott Edmunds’ concern might be addressed? This would be an admonition to publishers and others that such information should not be stripped from an article, regardless of publication style and format.
From MM: I also think that we should recommend that appropriate identifiers be inserted into the text. See the article by Jo McIntyre that was circulated. Just having them in the reference list does not seem quite enough to me. See the comment below about semantics.
Data need to be identified uniquely for those tracking their use. But how would you identify data? This issue needs further discussion.
Ruth: Defining data, or distinguishing data from other material, may be difficult, but perhaps the citer can decide.
Anita points out that types of publications are changing, so the concept of a bibliographic record might change.
CODATA is too abstract. There should be a format for citing data, and data should be handled as bibliographic resources.
Comment from Henry Rzepa on the Amsterdam Manifesto: I totally endorse the concept of citable data. But here I note a quote from page 12 of the ESI: “Emma, please insert NMR data here! where are they? and for this compound, just make up an elemental analysis...”, which has gone viral. Much of the blame is being laid at the door of the reviewers, who “should have spotted this”. The ESI referred to does NOT adhere to the Manifesto. It requires a human to track down, e.g., … and a human to parse the semantics of the request made of Emma.
My colleague Peter Murray-Rust has made the point powerfully at … that, to deserve to be cited, data must also be semantically enabled. That point is NOT one of the eight made in the manifesto (perhaps it was considered obvious, or off-topic?). We need data to be both citable and semantically processable!
F-4 is about the means rather than the purpose. We can infer that it is a means to enforce the purpose stated in C-I. The degree to which a data citation should resemble a bibliographic citation is debatable. It might be better to specify the purposes and functions that the citation should fulfill or facilitate, and leave the details of implementation to the communities who will need to implement them.
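The debate over how far a data citation should resemble a bibliographic one becomes concrete if you render a citation from its elements. The sketch below assumes the ordering Creator (PublicationYear): Title. Version. Publisher. Identifier. That ordering is close to the form DataCite suggests, but treat the field names and order as assumptions. The function `format_data_citation` is hypothetical. It renders the DOI as a resolvable https://doi.org/ link, so the same string can serve in the reference list and, as MM suggests, in the running text.

```python
def format_data_citation(creators: list, year: int, title: str, publisher: str,
                         doi: str, version: str = "") -> str:
    """Render a bibliographic-style data citation (element order is an assumption)."""
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        # Version is optional here; versioned data should state which one was used.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # Use the resolvable form of the identifier so the citation is actionable.
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Placeholder creators, title, publisher and DOI, for illustration only.
citation = format_data_citation(["Smith, J.", "Lee, K."], 2013, "Example survey data",
                                "Example Repository", "10.1234/xyz", version="2")
```

With these placeholder values the result is "Smith, J.; Lee, K. (2013): Example survey data. Version 2. Example Repository. https://doi.org/10.1234/xyz".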

6 DataCite: “a persistent approach to access, identification, sharing, and re-use of datasets” is offered by DataCite, which uses DOIs to achieve this aim.
DCC 1: The citation itself must be able to identify uniquely the object cited, though different citations might use different methods or schemes to do so.
Perhaps CODATA’s Flexibility principle is also relevant here? Flexibility: Citation methods should be sufficiently flexible to accommodate the variant practices among communities.
Perhaps DataCite’s metadata requirements belong here? The DataCite Metadata Schema supports describing a rich set of relationships between the article and the data, including IsCitedBy, IsSupplementTo, IsReferencedBy and IsDocumentedBy. This suggests that a registered dataset could “provide” metadata to help place it in relation to the article.
F-5 goes beyond C-IX in recommending a DOI as the specific type of persistent identifier, which is a means rather than a purpose. The purpose of using registries of persistent identifiers (PIDs) such as DOIs, ARKs, or other handles is to keep a digital object findable if its location changes (the purpose stated in C-III). Many communities of practice have already developed systems of persistent identifiers, some of which pre-date DOIs. Some argue that proper use of URIs, together with the redirect mechanism already built into HTTP, could achieve persistence without the intermediate step of a PID registry. Use of widely accepted metadata standards (see C-IX) helps to ensure interoperability, but it seems presumptuous to specify DOIs over other PID systems already in use.
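A toy resolver illustrates the purpose that C-III assigns to PID registries, which is keeping an object findable after it moves. The sketch below is a minimal in-memory model, not how the DOI or Handle systems are implemented. The registry maps each identifier to its current URL. When a repository migrates its data, only the registry entry changes, so published citations that carry the identifier keep working. The class name `PidRegistry` and the identifiers are hypothetical. The (status, location) pair mimics an HTTP redirect, which is also what the URI-only alternative mentioned above would rely on.

```python
class PidRegistry:
    """Toy persistent-identifier registry: identifier -> current location."""

    def __init__(self) -> None:
        self._locations = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._locations:
            raise KeyError(f"{pid} is already registered")
        self._locations[pid] = url

    def update(self, pid: str, new_url: str) -> None:
        # The data moved: only the registry changes; published citations do not.
        if pid not in self._locations:
            raise KeyError(f"{pid} is not registered")
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> tuple:
        """Return an HTTP-style (status, Location) pair, as a redirecting resolver would."""
        url = self._locations.get(pid)
        if url is None:
            return 404, ""
        return 302, url


registry = PidRegistry()
registry.register("10.1234/abc", "https://old.example.org/abc")
registry.update("10.1234/abc", "https://new.example.org/abc")  # repository migration
```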

7 DataCite: “Clients will ensure that the URL assigned to the identifier provides users with the necessary information for making meaningful use of the data. Often this will be in the form of a landing page… It is best practice to have a landing page for all registered data…”
DCC 3: (a) it must provide the reader with enough information to access the dataset; (b) indeed, when expressed digitally, it should provide a mechanism for accessing the dataset through the Web infrastructure.
Comment: The thread contained a discussion of actionable metadata vs. actionable data, I believe. I think this issue needs discussion.
F-6 specifies both a purpose (actionability by both humans and machines) and a particular means (a landing page). C-IV specifies only the purpose. The best means for accomplishing this purpose may evolve over time.
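F-6 asks that the landing resource be actionable by both humans and machines, and DataCite supports content negotiation on its DOIs. The sketch below shows the idea under some stated assumptions. A hypothetical resolver returns an HTML landing page to browsers and JSON metadata to software, choosing between them from the HTTP Accept header. The function name and the simplified Accept parsing are assumptions: q-values are ignored and the first recognized type wins. Real DOI content negotiation offers further formats, such as formatted bibliographies and BibTeX.

```python
import html
import json


def negotiate(accept: str, metadata: dict) -> tuple:
    """Choose a representation of one dataset from an Accept header.

    Simplified on purpose: q-values are ignored and media types are taken
    in the order the client lists them.
    """
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "application/json":
            # Machine-actionable: structured metadata for software clients.
            return "application/json", json.dumps(metadata, sort_keys=True)
        if media_type in ("text/html", "*/*"):
            break  # the client prefers, or accepts, the human-readable page
    # Default: an HTML landing page for human readers.
    page = (f"<h1>{html.escape(metadata['title'])}</h1>"
            f"<p>Identifier: {html.escape(metadata['identifier'])}</p>")
    return "text/html", page
```

Defaulting to the landing page keeps browsers working, while software that explicitly asks for JSON gets metadata it can process without scraping HTML.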

8 DataCite supports content negotiation.
DCC 1: The citation itself must be able to identify uniquely the object cited, though different citations might use different methods or schemes to do so.
DCC 2: It must be able to identify subsets of the data as well as the whole dataset.
Comment from MM: Versioning is different from subsets; I agree that both should be addressed. Perhaps this is where we need to address the issue of aggregators like the Protein Data Bank, where users will be referencing snapshots of a dynamic database.
F-7 refers generally to the need to identify the specific version of the data being referenced. C-VI, C-VII and C-VIII refer to distinct aspects of version: provenance, granularity and verifiability. The distinction among these aspects is useful.
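For snapshots of a dynamic database such as the Protein Data Bank, and for subsets of a dataset, one approach is to record extra details alongside the identifier. These are the version (or an access date) and the subset selector, which together let a reader trace the exact data used later. The sketch below is minimal and built on that assumption. The `DataCitation` fields are hypothetical and are not taken from any of the four documents. Version and subset are kept as separate fields, reflecting MM’s point that the two are different things.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    identifier: str                  # PID of the whole dataset or database
    version: Optional[str] = None    # release label, if the provider versions the data
    subset: Optional[str] = None     # e.g. a record ID or query selecting part of the data
    accessed: Optional[date] = None  # snapshot date for data that change continuously

    def render(self) -> str:
        parts = [self.identifier]
        if self.version:
            parts.append(f"version {self.version}")
        if self.subset:
            parts.append(f"subset: {self.subset}")
        if self.accessed:
            parts.append(f"accessed {self.accessed.isoformat()}")
        return ", ".join(parts)


# Hypothetical citation of one record in a dynamic database, as seen on a given date.
snapshot = DataCitation("10.1234/db", subset="record 1ABC", accessed=date(2013, 5, 1))
```

Because the dataclass is frozen and compares by value, two citations of the same record taken on different dates are treated as distinct, which is the verifiability concern.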

9 The DataCite Metadata Schema supports supplying metadata for an unlimited number of contributors, including name, role (or type) and identifier information.
DCC 4b: In particular, there need to be services that use the citations in metrics to support the academic reward system, and services that can generate complete citations.
F-8 and C-II both address the function of attribution. C-II draws a subtle distinction between legal attribution and the scholarly norm of giving credit to others for work they have performed. The two concepts are similar, but there are important differences between them and in what is needed to accomplish each.
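Contributor metadata with a name, a role and an identifier is what lets a metrics service credit every contributor, as DCC 4b and F-8 ask. The sketch below uses simplified records loosely modelled on the DataCite contributor element. DataCollector and DataCurator are among DataCite’s contributorType values, and the ORCID iDs are placeholders. The function `credit_index` is hypothetical. It counts datasets per contributor identifier, the kind of aggregation such a service might perform.

```python
from collections import Counter

# Placeholder records; ORCID iDs and DOIs are not real.
datasets = [
    {"doi": "10.1234/a", "contributors": [
        {"name": "Smith, J.", "contributorType": "DataCollector", "orcid": "0000-0000-0000-0001"},
        {"name": "Lee, K.", "contributorType": "DataCurator", "orcid": "0000-0000-0000-0002"},
    ]},
    {"doi": "10.1234/b", "contributors": [
        {"name": "Smith, J.", "contributorType": "DataCurator", "orcid": "0000-0000-0000-0001"},
    ]},
]


def credit_index(records):
    """Count how many datasets each contributor (keyed by identifier) appears on."""
    counts = Counter()
    for record in records:
        # A set avoids double-counting someone listed twice on one dataset.
        counts.update({c["orcid"] for c in record["contributors"]})
    return counts
```

Keying on the identifier rather than the name avoids conflating people who share a name, or splitting one person across name variants.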

10 MM: I think this is a very important principle missing from the others
MM: I think this is a very important principle missing from the others. We want to move towards interoperable systems and so there can only be so much flexibility allowed. We don’t want the situation with current citations where each is in a different style.
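One way to allow only limited flexibility while keeping systems interoperable is to fix a small mandatory core and accept optional, community-specific fields alongside it. The sketch below takes the mandatory properties of recent (4.x) DataCite Metadata Schema versions as that core: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. This is an assumption; these slides do not say the synthesis group adopted that core. The function name is hypothetical.

```python
# Assumed core, modelled on the DataCite 4.x mandatory properties.
CORE_FIELDS = ("identifier", "creators", "title", "publisher", "publicationYear", "resourceType")


def missing_core_fields(record: dict) -> list:
    """Return core fields that are absent or empty; extra, community-specific fields are allowed."""
    return [field for field in CORE_FIELDS if not record.get(field)]
```

A record carrying an extra field such as an instrument name still passes, which leaves room for community practice without every citation ending up in a different style.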

