Paul Price Dow Chemical Company


1 A Perspective
Paul Price, Dow Chemical Company, pprice@dow.com

2 Publications are changing
Leather-bound journals and dedicated libraries, the format of the scientific paper, weird abbreviations (Tox. & App. Pharm.)
Recent on the need for packing materials
Dump the filing cabinets: PDF/HTML replaces paper (free color!)
Paper journals are evolving into curated web sites
Upsetting the status quo: no technical reason for not sharing detailed technical findings

3 Sharing data
Ethical issues for not sharing
  Privacy of individuals
Economic reasons for not sharing
  Intellectual property rights
  Charging for access: the economics of journals and data owners
  Academics: My career depends on mining my data on my schedule
Internet-based expectations
  I expect to see everything from home using my web browser

4 Social contracts
Permission to sell is contingent on demonstrating safety
Credence for findings is less contingent on peer review and more contingent on sharing relevant data
Science that supports regulatory decisions needs to be in the sunlight

5 Parting thought
When I share data I am asking the world “can someone do a better job than me in understanding the data?”
When I withhold data I am saying “no one can do a better job than me in understanding the data”
Therefore journals should require the sharing of raw data as a condition of publication

6 Data Access: Issues and Opportunities
Alan F. Karr National Institute of Statistical Sciences February 13, 2012

7 Points for Discussion
The problem is hard
  Players are responding rationally to incentives
  Not “one size fits all”
“The data” is ill-defined
“Availability” is vague: what about cost, liability, tech support, co-authorship, data subjects?
Reproducibility (data + code) vs. replicability (data only?)
There are effective mechanisms for access, based on statistical disclosure limitation

8 The Analysis Matters

9 Data Dissemination: High-Level View

10 Should Journals Require the Release of Supporting Data as a Condition of Publication?
Jane C. Schroeder, DVM PhD Science Editor, Environmental Health Perspectives

11 No.

12 Is it a given that access to raw data will advance knowledge?
Why is access to raw data desirable?
  To advance scientific knowledge
Is it a given that access to raw data will advance knowledge?

13 How would access advance knowledge?
1. Identify unintentional errors
  Data entry errors, transcribing, labeling
  Errors in coding, misconstrued variables
  Copy editing errors
  Some can be identified by a careful review of reported results
  Avoid via documentation, data management, internal review
  Some would require truly raw data

14 How would access advance knowledge?
2. Identify scientific misconduct
  If the perpetrator is competent, unlikely to be evident
  If not competent, likely to be multiple cues
    Plagiarism, inconsistent logic, incredible findings
  If access to raw data is the only way to prevent fraud, we are in trouble

15 How would access advance knowledge?
3. Identify “errors” in decision-making
  Such “errors” may represent legitimate differences
  There is no single “best way” to analyze data
  However, decision-making should be completely transparent

16 How would access advance knowledge?
4. Reduce the time from data collection to full dissemination
  Investigators must be able to recoup their investment of time and effort
  Lose jobs → no data for anyone
  Confidentiality, informed consent agreements

17 What should journals do?
Careful & detailed reviews, including requests for code, data when appropriate
Require complete methods
  Rationale/criteria for decisions
  Information on data management, QA/QC
Require information to assess study quality
  Missing data, participation, drop-out, numbers of observations

18 What should journals do?
Require full reporting of all results used to support key analytic decisions and conclusions
  Essential when interpretation is subjective or criteria are not widely accepted
  Null findings as well as positive ones
  Sensitivity analyses of assumptions, alternate approaches
  Supplemental material, external archiving
Review and update policies when it is in the best interest of science communication to do so

19 What should the community do?
Discipline-appropriate standards for data management, QA/QC, and reporting
Bona fide internal reviews before publication
Support for costs of data sharing
Encourage and reward analyses of combined data from multiple studies
Avoid regulations that may ultimately impede scientific advancement by serving some members of the community at the expense of others

20 Introducing the Dryad Digital Repository
Society of Toxicology webinar, February 2013
Peggy Schaeffer
datadryad.org

21 Many journals require data sharing upon request
Psychology
  Requested data from 141 articles
  “6 months later, after … 400 e-mails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” data was obtained from 27% of articles.
  Wicherts et al. (2006). Am. Psych. 61:
Genetics
  47% of respondents had a request for data or materials denied within 3 years
  28% were unable to confirm others’ published research as a result.
  #1 reason for data withholding (80%): effort required to share it.
  Campbell et al. (2002) JAMA (4):
Many journals require data sharing upon request, but studies have shown that this person-to-person approach just doesn’t work, even in the short term. Wicherts: all authors had signed a commitment to share their data upon request, yet the requesters received only 38 positive reactions.

22 Data archiving has many benefits
Direct
  Verification of published research
  Preserving accessibility to data
  Allowing reuse and repurposing of data
  Discoverability of data
Indirect (costs avoided)
  Redundant data collection
  Inefficient legacy data curation
  Burden of sharing-upon-request
  Opportunity cost of science not done
Near term
  Protection against personnel turnover
  Availability for review and validation
Long term
  Secure long-term stewardship
  Increased impact per publication
Private
  Increased citations
  New collaborations
  New research opportunities
  Fulfilling funding mandates
Public
  More efficient use of research dollars
  Public trust in science
  Educational opportunities
  Improved methodologies
  More informed policy
Dryad is designed to realize the substantial benefits of data archiving, while mitigating the costs.
Modified from Beagrie et al. (2009) Keeping Research Data Safe 2

23 Joint Data Archiving Policy
[Journal] requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as [list of approved archives here]. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

The Joint Data Archiving Policy (JDAP) describes a requirement that supporting data be publicly available. This policy was adopted in a joint and coordinated fashion by many leading journals in the field of evolution in 2011, and JDAP has since been adopted by other journals across various disciplines. Additional journals are welcome to endorse and implement JDAP, or use it as a model. Journals that adopt JDAP often recommend Dryad as an appropriate data repository; however, the JDAP initiative is distinct from Dryad.

24 Why use Dryad rather than Supplementary Online Materials?
Feature comparison: Dryad vs. SOM
  Discoverable: indexed and exposed to both web and bibliographic search engines
  Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers (SOM: ✗*)
  Permanent: processes in place to promote preservation (incl. format migration) (SOM: ✔/✗**)
  Curated: quality control by both automated processes and human inspection
  Ease of deposit: streamlined deposit, allowance for large and complex datasets
  Formatted for reuse: support for non-PDF file formats
  Updatable: new versions of data files can be added, metadata can be enhanced
  Support for embargoes: can delay release of data in accordance with journal policy
  Free reuse: no paywall, clear terms of reuse (all data released under CC Zero)
  Economy of scale: cost efficiency from shared infrastructure
  Alignment to organizational mission: focus on archiving and reuse of scientific data
* A few publisher SOM sites are exceptions to the general rule.
** Practices differ among publishers, see Smit (2011), doi: /january2011-smit
A frequent question is what Dryad can offer that publishers do not already provide with supplementary online materials. In principle, publishers could offer many of the same features, but in practice most do not, and are not inclined to invest heavily in improving the services offered.

25 Researchers are using Dryad for data archiving…
As of 7 Feb 2013, Dryad contains 7306 data files associated with 2662 publications from 191 different journals.
Dryad currently receives over 100 submissions a month, and the rate of submission continues to grow steadily. About one quarter of submissions are volunteered by authors publishing in non-partner journals.

26 and using the data for research…
This is the landing page for a data package, showing all the data files archived in association with a given article. (There are 7 files of different file types displayed below the abstract.) There is evidence for widespread data reuse. This 2012 data package has been viewed over 800 times, and its complete dataset downloaded over 600 times. For data packages deposited in 2012, the median number of downloads for the most popular file in each data package is twelve.

27 Over 25 integrated journals
... and 20 more on the way
Working with: Biology Letters, BMJ, F1000 Research, PLOS Biology, PLOS Genetics

28 Trustworthy repository infrastructure
Making data available is the primary mission of the organization
  No pay-walls or restrictive licenses (all released under CCZero)
  The same data may be hosted by other services (non-exclusivity)
Built on the DSpace repository platform
  An open source framework used by hundreds of institutional repositories
Multiple machine and human interfaces for discovery and access
  Dublin Core metadata harvestable through OAI-PMH
  DOIs registered through DataCite
  Curation-enhanced metadata to enhance keyword searching
  Indexed by Web of Science and other bibliographic services
Assurance of data integrity and permanent availability
  Service mirroring and backup
  File migration and bit-level integrity assurance
  Organizational failover through DataONE and (soon) CLOCKSS
Dryad provides a trustworthy infrastructure to preserve data files in perpetuity.
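
Bit-level integrity assurance is commonly done by recording a checksum when a file is deposited and recomputing it later. The slides do not say which mechanism Dryad uses, so the following is only a minimal sketch of the general idea: the choice of SHA-256 and the function names `sha256_of_file` and `is_intact` are assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Chunked reading keeps memory use flat for large data files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_digest):
    """Compare the current digest with the one recorded at deposit time.

    Hypothetical helper: a real archive would also log and repair failures.
    """
    return sha256_of_file(path) == recorded_digest
```

A repository would store the digest alongside the file's metadata at deposit and run a check like `is_intact` on a schedule; any mismatch signals that the stored bytes have changed and a mirrored copy should be used to restore them.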

29 Governance
Not-for-profit organization
  Incorporated in North Carolina (USA)
Membership is open to a diversity of stakeholder organizations
  Scientific societies, publishers, funding agencies, universities, libraries, etc.
  Members need not publish a partner journal
Governed by a rotating 12-member Board of Directors, nominated and elected by the membership

30 Sustainability
Long-term preservation requires an organization with a viable business model
  Not dependent on the vagaries of grant funding
  Or the largesse of an institution that may have other priorities
Revenue will be primarily from deposit fees
  This enables Dryad to make access to the data free in perpetuity
  The time of deposit is when the majority of costs are incurred
  Revenue scales with costs (i.e. volume of deposits)
  The costs are distributed both fairly and widely
Additional revenue
  Membership fees ($1000/yr) will cover costs of annual Membership meetings
  Project grants will supplement the operational budget for R&D activities
  With research and development activities funded by grants at various institutions (e.g. Duke University, Univ. of North Carolina at Chapel Hill)
Dryad’s leaders have worked on sustainability since the project began in 2008; the current business plan uses two fees: membership fees and deposit fees.

31 Payment plans
Subscription (contract: yes)
  Paid by: journal, society, or publisher, in advance
  Non-member cost [1]: based on total annual volume of research, $30/article
Deferred payment
  Paid by: journal or other sponsoring organization, invoiced periodically for prior deposits
  Non-member cost [1]: $75/data package [2]
Voucher
  Paid by: journal or other sponsoring organization, paid in advance
  Non-member cost [1]: $70/data package
Pay on deposit (contract: no)
  Paid by: author, at time of deposit
  Non-member cost [1]: $80/data package, with a process for granting waivers for authors from less-developed countries
[1] Up to a fixed deposit size (currently 10GB). Additional charges for larger deposits.
[2] Data package = all the data associated with an article.
The Dryad Board has voted to implement these payment plans in September of this year. There are 3 plans for deposit fees, designed to accommodate partner journals with very different business models (subscription, open-access, etc.). A journal (or a publisher, society, or research institute) can choose from the prospective or retrospective options. There is a 10% discount for membership in the Dryad organization. The last option, where the author pays, is designed for cases when authors wish to archive data associated with an article in a journal not affiliated with Dryad; this carries higher costs for us since we don’t have the metadata from the journal.
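
To make the fee arithmetic concrete, here is a rough sketch that combines the non-member prices with the 10% membership discount. The prices come from the slide; the function name `deposit_cost_cents`, the plan keys, and the assumption that the discount applies uniformly to every plan are illustrative only, and surcharges for deposits over the size limit are ignored.

```python
# Non-member prices from the payment-plan slide, in US cents per unit.
# Units are articles for "subscription" and data packages for the others.
NON_MEMBER_PRICE_CENTS = {
    "subscription": 3000,
    "deferred": 7500,
    "voucher": 7000,
    "pay_on_deposit": 8000,
}

# The slide states a 10% discount for members of the Dryad organization.
MEMBER_DISCOUNT_PERCENT = 10


def deposit_cost_cents(plan, units, member=False):
    """Return the total deposit fee in cents for a plan and unit count.

    Assumption: the member discount applies to every plan equally.
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    total = NON_MEMBER_PRICE_CENTS[plan] * units
    if member:
        # Integer arithmetic avoids floating-point rounding on money.
        total = total * (100 - MEMBER_DISCOUNT_PERCENT) // 100
    return total
```

Under these assumptions, a member journal buying 10 vouchers would pay 10 × $70 × 0.9 = $630, while a non-member author paying on deposit would pay $80 for one data package.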

32 The value proposition
For researchers, Dryad…
  increases the impact of, and citations to, published research
  preserves and makes available others’ data
  frees researchers from the burden of data preservation and access
For societies, journals, and publishers, Dryad…
  offers more visibility for research outputs
  promotes prestige for the discipline
  supports a wide range of journal policies on data sharing
  frees journals from the burden of maintaining supplemental data
For libraries and institutions, Dryad…
  makes data available at no cost, under clear terms of use
  helps fulfill their research data management mandates
For funders, Dryad…
  provides a cost-effective mechanism to make research more accessible
Dryad offers services and assets to all the major players in the scientific research cycle. For authors and researchers: enables citable data; offers access to data to verify published results, to refine methodologies, and to repurpose. For societies: more visibility for their research and prestige for the discipline. For journals and publishers: enables them to increase the discoverability and impact of their articles, and to add value to the communities they serve, without SOM. File migration: Excel 2010 to Excel 2020.

33 To learn more
Repository home: http://datadryad.org
News:
Project documentation:
Facebook:
Contact us:
  Todd Vision, Project Director
  Laura Wendell, Executive Director
  Peggy Schaeffer, Communications Coordinator
Let me know your questions, and THANK YOU!

