Presentation on theme: "Update on Data Publishing With Dataverse"— Presentation transcript:
1 Update on Data Publishing With Dataverse
Intro to Dataverse and Data Science Team
Options (3) for Journals to publish data associated with their articles
OJS integration project
Rigorous Data Publishing Workflows with 4.0 (Versioning)
Dataverse 4.0 deaccession: only once published, with a reason (cannot delete). Always a landing page with a DOI.
Publishing Privacy Sensitive Data with Dataverse
Eleni Castro, Research Coordinator, Institute for Quantitative Social Science (IQSS), Harvard University
DataCite Annual Nancy, France, August 25, 2014
2 Introduction to Dataverse
Software framework for publishing, citing and preserving research data (open source on GitHub for others to install).
Provides incentives for researchers to share:
Recognition & credit via data citations
Control over data & branding
Fulfill Data Management Plan requirements
Harvard Dataverse (open to all; repository instance at Harvard) currently has: 761 Dataverses; > 1 Million Downloads; 54,828 Datasets; EZID DOI (2013); 748,554 Files.
3 Who’s Using Dataverse?
Worldwide Dataverse Installations
Institutions can set up and host their own Dataverse installation (e.g., Odum, OCUL, DANS, Fudan) and within it can support datasets from a variety of users across all research domains: researchers, projects, departments, journals, etc.
4 Journals Publishing Data w/ Dataverse
Option A. Journals include Dataverse as a Recommended Repository.
Option B. Authors Contribute Directly to a Journal Dataverse. (Option B, Step 3: This can be done at the same time as the journal article is being reviewed.)
Option C. Seamless Integration between Journal + Dataverse (e.g., OJS).
5 OJS-Dataverse Integration
[Diagram: OJS Journal and Journal Dataverse, linked by a citation to the data and a citation to the article.]
Details/Updates:
Two-year project: integrating w/ PKP’s Open Journal Systems (Data Deposit API).
Pilot with ~50 journals + expanding outreach (100s).
OJS’ Dataverse plugin now available with latest OJS release.
Future: Embed Dataverse widgets into journal article.
6 OJS Plugin: Journal Data Policies
Boilerplate templates, including guidelines for: Authors (w/ data citation); Reviewers.
Boilerplate policies for authors to deposit and cite data in the OJS Dataverse plugin. Includes boilerplate for reviewers and copyeditors.
Read full Data Policies / Guidelines Template:
7 OJS Plugin: Author Manuscript + Data Submission
Joint Declaration of Data Citation Principles #3: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
Option to: (A) deposit into Dataverse; and/or (B) if the data is already in a repository, include the data citation (w/ persistent URL/identifier).
8 OJS Plugin: Editor Reviews Article + Data
Joint Declaration of Data Citation Principles #3: In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
9 Data Published in Dataverse w/ OJS Plugin
[Screenshots: In OJS; In Dataverse.]
Two options in OJS: (1) Dataset published (with DOI) at article approval; (2) Dataset published when the journal issue is released.
10 OJS Plugin: Article Published w/ Data Citation
Now that the data citation is listed on the same page as the article, we are one step closer to Data Citation Principle #1, Importance: Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
11 Towards An Integrated Publishing Lifecycle
See: Data Citation Principle #1 Importance.
This is really a reference implementation that we hope will inspire other repositories and journal management / publishing systems to use our open source code to expand and extend the possibilities of data publishing. It's one step in the right direction toward bringing data closer to the same level of importance as the published article.
Image Credit: Mercè Crosas
13 Rigorous Data Publishing Workflows
See Data Citation Principle #7 Specificity & Verifiability.
Workflow:
Upload -> Draft Dataset.
Publish Version 1 -> Published Dataset v1. Citation: Authors, Title, Year, DOI, Repository, V1.
Publish Version 1.1 (small metadata change; citation doesn’t change) -> Published Dataset v1.1.
Publish Version 2 (file change (automatic); big metadata change, e.g., author, title) -> Published Dataset v2. Citation: Authors, Title, Year, DOI, Repository, UNF, V2.
Compliant w/ the Joint Declaration of Data Citation Principles and based on the paper by Altman and King (2007), who were pushing for machine-readable information (UNF, persistent identifier) to be added to data citations.
Principle #7 Specificity & Verifiability: Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
See: Altman, M., & King, G. (2007) doi: /march2007-altman
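The citation elements listed on this slide (Authors, Title, Year, DOI, Repository, UNF, version) can be assembled mechanically. The following is only a minimal sketch of that idea, not Dataverse's actual citation code: the class name, field names, separator, and the placeholder DOI and UNF values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    # Hypothetical field names mirroring the elements named on the slide.
    authors: str
    title: str
    year: int
    doi: str
    repository: str
    version: str
    unf: Optional[str] = None  # fixity fingerprint, included only when present

    def render(self) -> str:
        """Join the citation elements in the order shown on the slide."""
        parts = [
            self.authors,
            f'"{self.title}"',
            str(self.year),
            f"doi:{self.doi}",
            self.repository,
        ]
        if self.unf:
            parts.append(self.unf)
        parts.append(self.version)
        return ", ".join(parts)


# Placeholder values only; none of these identify a real dataset.
citation = DatasetCitation(
    authors="Doe, J.",
    title="Example Data",
    year=2014,
    doi="10.0000/EXAMPLE",
    repository="Example Dataverse",
    version="V2",
    unf="UNF:5:placeholder",
)
print(citation.render())
```

Keeping the version and the UNF as separate fields is what lets a reader check that the data retrieved later is the same as the data originally cited, which is the point of Principle #7.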
14 Dataset Versioning (1)
If you don’t add or change any files and haven’t changed any default metadata fields (e.g., author, title), you will get the option to publish as a minor or major version.
15 Dataset Versioning (2)
If you add or change files, the version goes up by a major version (e.g., V1 to V2).
16 Dataset Versioning (3)
Ex. Added files to a Dataset, so it was bumped up to a major version.
If you add or change files, the version goes up by a major version (e.g., V1 to V2). Here is a screenshot of how you would see the version differences when changing/adding files.
17 Dataset Versioning (4)
Ex. Added a small metadata change to a Dataset, so it was bumped up to a minor version.
By contrast, if you add or change files, the version goes up by a major version (e.g., V1 to V2).
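The versioning rules on the last few slides can be summarized as a small decision function. This is a sketch under stated assumptions rather than Dataverse's implementation: the function name, the (major, minor) tuple, and the set of citation-relevant fields (only author and title are named on the slides) are hypothetical.

```python
from typing import Set, Tuple

# Assumption: the slides name author and title as examples of metadata
# whose change forces a major version; the real list may be longer.
CITATION_FIELDS = {"author", "title"}


def next_version(
    current: Tuple[int, int],
    files_changed: bool,
    changed_fields: Set[str],
    prefer_minor: bool = True,
) -> Tuple[int, int]:
    """Return the (major, minor) version to publish next.

    File changes and changes to citation-relevant metadata always force a
    major version. Otherwise the publisher may choose minor or major.
    """
    major, minor = current
    forced_major = files_changed or bool(changed_fields & CITATION_FIELDS)
    if forced_major or not prefer_minor:
        return (major + 1, 0)
    return (major, minor + 1)


# Small metadata change: the publisher picks a minor version.
print(next_version((1, 0), files_changed=False, changed_fields={"description"}))
# Added files: a major version is forced regardless of preference.
print(next_version((1, 1), files_changed=True, changed_fields=set()))
```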
18 Deaccession Data in 4.0
Before a Dataset is published, the DOI is private (reserved). Only when published is it made public & searchable.
In accordance w/ Data Citation Principle #6 Persistence: a published Dataset cannot be deleted; only deaccessioned, with a reason.
You can deaccession (in 4.0): a version(s) of a Dataset, or an entire Dataset.
Now that we support more granular levels of versioning, we are also working on improving dataset deaccession. Prior to 4.0 you could delete a published dataset, which goes completely against the data citation best practice of Persistence (#6).
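The rules on this slide amount to a small state model: a draft has a reserved DOI and may be deleted, while a published dataset has a public DOI and can only be deaccessioned with a reason. The sketch below illustrates that model only; the class, method, and attribute names are hypothetical and the DOI is a placeholder.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    DEACCESSIONED = "deaccessioned"


class Dataset:
    """Hypothetical model of the dataset lifecycle described on the slide."""

    def __init__(self, doi: str) -> None:
        self.doi = doi
        self.state = State.DRAFT
        self.deleted = False
        self.deaccession_reason = None

    @property
    def doi_is_public(self) -> bool:
        # The DOI is only reserved while the dataset is a draft.
        return self.state is not State.DRAFT

    def publish(self) -> None:
        if self.state is not State.DRAFT:
            raise ValueError("only a draft can be published")
        self.state = State.PUBLISHED

    def delete(self) -> None:
        # Persistence: once published, deletion is no longer allowed.
        if self.state is not State.DRAFT:
            raise PermissionError("published datasets can only be deaccessioned")
        self.deleted = True

    def deaccession(self, reason: str) -> None:
        if self.state is not State.PUBLISHED:
            raise ValueError("only a published dataset can be deaccessioned")
        if not reason.strip():
            raise ValueError("a reason is required")
        self.state = State.DEACCESSIONED
        self.deaccession_reason = reason


ds = Dataset("10.0000/EXAMPLE")  # placeholder DOI
ds.publish()
ds.deaccession("contains identifiable information")
print(ds.state, ds.doi_is_public)
```

The DOI stays public after deaccession in this sketch because the landing page is meant to remain reachable through it.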
19 Deaccession Workflow (Step 1)
Ex. This file was added in v2 and has identifiable information.
To try out deaccessioning, go to a dataset you’ve already published (or add a new one and publish it), click on Edit Dataset, then Deaccession Dataset. If you have multiple versions of a dataset, you can select here which versions you want to deaccession, or choose to deaccession the entire dataset.
20 Deaccession Workflow (Step 2)
You are then presented with the option to deaccession the entire dataset or just certain version(s), with a drop-down list of "Reasons for deaccession". Are there any important reasons to deaccession that we might be missing here?
21 Deaccession Workflow (Step 3) Screenshot 3: Once you select a reason you can enter additional information. This is particularly important if you select Other. If the Dataset has moved then a URL may also be entered (we validate URLs).
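The form checks described in this step could look roughly like the sketch below. It is an illustration only: the function names and error messages are hypothetical, requiring additional information for "Other" is an assumption (the slide only calls it particularly important), and the URL check shown here is a simple scheme-and-host test, not necessarily the validation Dataverse performs.

```python
from typing import List, Optional
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    try:
        parsed = urlparse(url)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 hosts
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_deaccession(
    reason: str,
    additional_info: str = "",
    forward_url: Optional[str] = None,
) -> List[str]:
    """Return a list of problems with a deaccession request (empty if none)."""
    errors = []
    if not reason.strip():
        errors.append("a reason is required")
    # Assumption: treat the free-text field as mandatory for "Other".
    if reason == "Other" and not additional_info.strip():
        errors.append("additional information is required for 'Other'")
    if forward_url is not None and not is_valid_url(forward_url):
        errors.append("the forwarding URL is not valid")
    return errors


print(validate_deaccession("Other"))
print(validate_deaccession("Other", "Moved to a new host", "https://example.org/data"))
```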
22 Deaccession Workflow (Step 4)
Deaccession Landing Page (Data Citation Principle #6 Persistence)
A persistent landing page that includes very limited metadata (see below) will always be accessible to the public if they use the DOI provided in the citation for that dataset.
In accordance with Data Citation Principle #6 Persistence: Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe.
23 Data Publishing After 4.0 (2015)
Publishing Privacy Sensitive Data: Secure Dataverse; DataTags (demo) (based on privacy laws and DUAs).
In the current version of Dataverse, identifiable data cannot be deposited safely. Because data publication needs sufficient information to understand and reuse the data (metadata, documentation, code), after 4.0 users will be able to safely upload datasets that have identifiable information, while a minimal-risk version of the data is automatically rendered (synthetic data, differential privacy, statistical disclosure control (SDC), etc.) and can be made available to the public.
Full interview
Integration with ORCID (API): create ORCID account, connect all Dataverse datasets to ORCID account. (Note: 4.0 will already allow authors to enter an ID.)
24 Thank you! Contact: firstname.lastname@example.org More information: