Presentation on theme: "SCIENTIFIC DATA Presentation to the California Digital Library, 20th June 2014 Ruth Wilson – Head of Publishing Services Andrew Hufton – Managing Editor."— Presentation transcript:
SCIENTIFIC DATA
Presentation to the California Digital Library, 20th June 2014
Ruth Wilson – Head of Publishing Services
Andrew Hufton – Managing Editor
Iain Hrynaszkiewicz – Head of Data and HSS
Introduction
- Open Access at NPG
- Drivers for data publication
- Scientific Data
- Next steps
Development of Open Access
General landscape
2000: PubMed Central launched
2003: The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities signed
2005: The Wellcome Trust introduced its open access mandate for Wellcome-funded research
2005: National Institutes of Health adopted the NIH Public Access Policy
2006: RCUK open access mandates come into effect
2009: First international Open Access Week
2013: The Obama administration (US) and HEFCE (UK) both introduce open access mandates for taxpayer-funded research
2014: Chinese science research funding agencies mandate open access
Open Access at NPG
2001: Nature, Science, and the Third World Academy of Sciences launch SciDev, a free online source of science news and research
2002: NPG ceases to require copyright transfer on research articles
2005: First full OA title launched, Molecular Systems Biology
2009-2011: All non-Nature journals offer OA option
2011: Scientific Reports launched
2013: Nature Publishing Group partners with open access publisher Frontiers
2014: Launch of Scientific Data
2014: Launch of Nature Partner Journals
2014: 51% of NPG and Frontiers content is published open access
Open Access at Nature Publishing Group
Nature Communications: Launched in 2010, NatComms now has an impact factor of 10.015 and receives more submissions than Nature.
Scientific Reports: Fully open access, Scientific Reports is a primary research publication covering all areas of the natural sciences.
Frontiers: A community-oriented open-access academic publisher and research network.
Society open access journals: We publish 18 fully open access titles with society partners.
Scientific Data: Open access publication publishing Data Descriptors, peer-reviewed scientific publications that provide detailed descriptions of datasets.
Nature Partner Journals: A new series of online open access journals, published in collaboration with world-renowned international partners.
Subscription journals offering open access option: Over 40 journals in the NPG family offer an open access option.
Drivers for data publication
Two important factors are driving efforts to make research data more available and reusable:
- To ensure the scientific process is transparent, so that it can be scrutinised and research results reproduced
- To speed the scientific process, lead to new insights and reduce duplicated and repeated work
To achieve this, research data needs to be available, discoverable, interpretable, reusable and citable.
Stakeholders: funders, researchers, research institutes, data repositories, libraries, learned societies, publishers, standards groups, curators
Researchers and data
What do researchers do with their data?
- ~75% of researchers store their data locally and do not publish it
- ~17% publish data in supplementary info
- ~14% delete research data
- ~10% deposit data in a public repository
A strong collaborative culture exists among researchers:
- They share 60% of their data with their colleagues
- 50% look at other researchers’ datasets at least once a month
Researchers are supportive of Scientific Data:
- Over 90% reacted positively to the concept of Scientific Data
- 80% believed that Scientific Data would increase repository deposition rates
What was important to them?
- 96%: increased visibility and discovery of their research data
- 95%: increased usability of their research data
- 93%: credit mechanism for those who take the time to deposit and explain their data
- 80%: peer review of content/datasets
A new open-access publication for descriptions of scientifically valuable datasets
Get Credit for Sharing Your Data: Publications will be indexed and citable.
Open-access: Authors select from three Creative Commons licenses for the main Data Descriptor. Each publication is supported by CC0 metadata.
Focused on Data Reuse: All the information others need to reuse the data; no interpretative analysis or hypothesis testing.
Peer-reviewed: Rigorous peer-review focused on technical data quality and reuse value.
Promoting Community Data Repositories: Not a new data repository; data are stored in community data repositories.
Focus on data reuse
Data Descriptor sections: Title, Abstract, Background & Summary, Methods, Technical Validation, Data Records, Usage Notes, Figures & Tables, References, Data Citations
Data Descriptor: Detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses.
Data Descriptor
- Experimental metadata or structured component (in-house curated, machine-readable formats)
- Article or narrative component (PDF and HTML)
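The two-part structure of a Data Descriptor can be pictured as a simple record. The following is only a minimal sketch: the class name, the field names and the JSON output are hypothetical and chosen for illustration (the slides mention ISA-Tab for the structured component, which this sketch does not attempt to reproduce). The section names come from the Data Descriptor outline in this presentation, and the example values are made up.

```python
import json
from dataclasses import dataclass, field

# Section names taken from the Data Descriptor outline in this presentation.
NARRATIVE_SECTIONS = [
    "Title", "Abstract", "Background & Summary", "Methods",
    "Technical Validation", "Data Records", "Usage Notes",
    "Figures & Tables", "References", "Data Citations",
]


@dataclass
class DataDescriptor:
    """Hypothetical container for the two components of a Data Descriptor."""
    narrative: dict  # section name -> text (the article or narrative component)
    metadata: dict = field(default_factory=dict)  # the structured component

    def missing_sections(self):
        """List outline sections that have no narrative text yet."""
        return [s for s in NARRATIVE_SECTIONS if s not in self.narrative]

    def metadata_json(self):
        """Serialize the structured component as machine-readable JSON."""
        return json.dumps(self.metadata, indent=2, sort_keys=True)


# Made-up example values, for illustration only.
descriptor = DataDescriptor(
    narrative={"Title": "Example dataset", "Methods": "How the data were collected."},
    metadata={"repository": "figshare", "license": "CC0"},
)
print(descriptor.missing_sections())
print(descriptor.metadata_json())
```

Keeping the narrative and the structured metadata in separate fields mirrors the split on the slide: one part is written for human readers, the other is meant for machines.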
Data Citations
Formally link the Data Descriptor to external data records
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group, incl.:
- CODATA
- Research Data Alliance
- Force11
In-house curation team:
- assists users in submitting the structured content via simple templates and an internal authoring tool
- performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification.
[Diagram labels: analysis, method, script; data file or record in a database; Data Descriptor structured metadata (CC0)]
Clear data sharing policies
- Data must be deposited in an approved data repository before manuscript submission, prior to peer-review.
- If datasets are private, they must be made accessible to editors and referees in a secure and confidential manner.
- Authors must agree to release data to the public, without undue restrictions, at the time of publication.
- Reasonable controls are allowed for datasets with human privacy restrictions.
Data repositories criteria
1. Broad support and recognition within their scientific community
2. Ensure long-term persistence and preservation of datasets in their published form
3. Provide expert curation
4. Implement relevant, community-endorsed reporting requirements
5. Provide for confidential review of submitted datasets
6. Provide stable identifiers for submitted datasets
7. Allow public access to data without unnecessary restrictions
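The seven criteria above can be read as a checklist. The sketch below is illustrative only: the class, the field names and the example answers are hypothetical, and it is not how the journal evaluates repositories in practice. It simply shows one way to record yes/no answers and report which criteria are still unmet.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class RepositoryProfile:
    """Hypothetical yes/no answers to the seven repository criteria."""
    community_recognition: bool
    long_term_preservation: bool
    expert_curation: bool
    community_reporting_requirements: bool
    confidential_review: bool
    stable_identifiers: bool
    unrestricted_public_access: bool


def unmet_criteria(profile: RepositoryProfile) -> list[str]:
    """Return the names of the criteria the repository does not yet satisfy."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Made-up example: a repository that lacks a confidential review mechanism.
example = RepositoryProfile(
    community_recognition=True,
    long_term_preservation=True,
    expert_curation=True,
    community_reporting_requirements=True,
    confidential_review=False,
    stable_identifiers=True,
    unrestricted_public_access=True,
)
print(unmet_criteria(example))
```

An empty list would mean every criterion is marked as met. In reality each item calls for editorial judgement rather than a simple boolean.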
Our recommended repositories
We currently recognize over 60 public data repositories.
We have integrated systems with both figshare and Dryad.
No institutional repositories yet, but we are open to adding them …
The right licence for the right content
Data: the primary datasets will reside in public repositories. Partnering with figshare and Dryad, which both use the CC0 waiver.
Metadata: released under the CC0 waiver to maximize reuse and aid data miners.
Data Descriptor article: licensed under one of three Creative Commons licenses, by author choice:
Diverse content from across the natural sciences
Ecology Associated Nature articles Data in figshare Integrated figshare data viewer Citizen science project
Neuroscience Code in GitHub New Dataset Data in OpenfMRI Source code in GitHub Big Data
Data Descriptor relation with traditional articles
Synthesis; Analysis; Conclusions
What did I do to generate the data? How was the data processed? Where is the data? Who did what, when?
Methods and technical analyses supporting the quality of the measurements. Do not contain tests of new scientific hypotheses.
Advisory Panel Guide the development, policies, standards and editorial scope of Scientific Data. Senior scientists from academia and industry along with representatives from the data repository, librarian, biocurator and funder communities.
Editorial Board
Active scientists oversee peer-review
Peer-review assesses:
- The completeness of the description
- Alignment with community standards
- Data deposition in an appropriate repository
- Technical quality of the measurements
- Reuse value
Scientific Data & the University of California
Advisory Panel: Patricia Cruse, CDL; Joseph Ecker, Salk & UCSD
Editorial Board: Michelle Arkin, UCSF; Trey Ideker, UCSD; Maryann Martone, UCSD; Adam Renslo, UCSF; Amir AghaKouchak, UCI
Now launched!
Visit nature.com/scientificdata
Email firstname.lastname@example.org
Tweet @ScientificData
Honorary Academic Editor: Susanna-Assunta Sansone
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
Supported by
Helping you publish, discover and reuse research data
Thanks for your time!