Chemistry research data in the modern age: A clear need for curation expertise Simon Coles School of Chemistry, University of Southampton, U.K.


Data Generation
Synthesis, Data Collection, Data Workup, Data Processing, Publication

Data Types
Data classes: RAW data, DERIVED data, RESULTS data
Sizes: G bytes, M bytes, k bytes
Locations: Lab / Institution; Subject Repository / Data Centre / Public Domain

Incentives and Drivers
Chemists don't think about their data! They need to understand that their data is valuable and has a use beyond that of an immediate gain before they will consider curation issues.
So what are the incentives and drivers?
– Data Management
– Data Deluge
– Publishing Data
– Validation, Assessment and Peer Review
– Re-analysing Data
– Data Reuse and Derivative Studies
– Publishing and Funding Mandates

Curation Incentives - Data Management, Deluge & Publishing
Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did, might not know what those numbers meant.
Lost in some research assistant's computer, the data are often irretrievable or an undecipherable string of digits.
To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data.
Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.
Source: "Lost in a Sea of Science Data", S. Carlson, The Chronicle of Higher Education (23/06/2006)

Curation Incentives - Data Management, Deluge & Publishing
[Figure values; only 30,000,000 survives intact in this transcript]

Curation Incentives - Data Management, Deluge & Publishing

Separating Data from Interpretations
Underlying data (Institutional data repository)
Intellect & Interpretation (Journal article, report, etc.)

The eCrystals Data Repository An Institutional Repository

The Repository for the Laboratory
Search / Browse
Deposit
Create new compound
Add experiment data and metadata

Curation Incentives - Validation & Peer Review

Curation Incentives - Raw Data Re-analysis
Good data / Difficult data
You never know when data might have to be revisited or new innovations will allow re-interpretation!

Curation Incentives - Funding and/or publishing mandates Mandates to store / make data available RCUK statement

Curation Incentives - Derivative Science Starting points for new science Derivation of knowledgebases

Curation Issues
Need to engage stakeholders throughout the whole research data lifecycle:
– instrument manufacturers
– scientists
– archivists
– librarians
– subject repositories
– data centres
– publishers
– funders
– data miners & information providers

Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data

Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
– Future proofing…
– Technology developments
– eScience

Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community

Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
What data is worth storing?
– The real cost of a crystal structure is estimated at £75 - £100 ($200)
– But what about the cost of producing the crystal?
– Priceless!
– The crystal was synthesised in a specialised laboratory, by highly trained researchers, under a specific research program
– A laboratory, researcher or scheme of work is a transient or evolving entity
– As much data as possible must be acquired and future-proofed whilst the analyst has the substance to hand

Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
What data is worth storing?
Provenance, workflow and rights protection

Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
What data is worth storing?
Provenance, workflow and protection of rights
Available expertise, library/information services structure
Cost and policy
Business models
– Subject librarian model: working closely with practitioners
– New funding/structure models to support open data as OA takes off
– Working group to assess the volume and diversity of research data
– JISC-funded survey: cost of preserving research data
– Commercialisation of knowledge derived from collections of data

Dealing with Data Report, June 2007: Recommendations 1
JISC should develop a Data Audit Framework to enable all Universities & colleges to carry out an audit of departmental data collections, awareness, policies & practice…
Each Higher Education Institution should implement an Institutional Data Management, Preservation & Sharing Policy, which recommends data deposit in an appropriate open access data repository and/or data centre where these exist.

Institutional Structure
Encourage restructuring through strategic funding
Rechannel existing funding routes
Financial structure: money for self-archiving or OA publishing
Physical structure: embed LIS/curation staff in departments for advocacy; they need to go native
Library / Information services need to be introspective and reinvent themselves

Advocacy
Younger digital generation
Elders will not listen
Method to engage at departmental level
Funders undervaluing work: need enlightening

Funding
Small science
Low budget / funding
Hypo publishing
Unsupported
Initial target areas that are safe, i.e. no sensitive data

Practice
Small science vs big science
Instrumentation vs manual
Automate data capture
Heterogeneity/variety in practice
Problems same in industry

Tools
Seamless
Simple to use
Low barrier to use
Integrated into familiar environment
Self describing (generate provenance and preservation metadata in the background)
Tagging / controlled vocab tools / servers
Vocab checking
Browser tools (familiar to youth)
Thin client tools: repository lite
Minimal management
Highly distributed repositories
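As an illustration of what a "self describing" tool might do in the background, the sketch below records a checksum and a few provenance fields for a data file, and later uses the checksum to detect corruption or bit rot. It is a minimal Python sketch under stated assumptions: the function names (`describe_file`, `is_unchanged`) and the record fields are hypothetical and are not taken from the talk or from any particular repository software.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_file(path, creator="unknown"):
    """Build a small provenance/preservation record for one data file.

    The field names are illustrative assumptions, not a metadata standard.
    """
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large raw-data files do not fill memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    return {
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256.hexdigest(),
        "creator": creator,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(path, record):
    """Return True if the file still matches the checksum stored in record."""
    return describe_file(path)["sha256"] == record["sha256"]
```

A capture tool could call `describe_file` whenever an instrument writes a file and store the record alongside the data; a periodic `is_unchanged` check would then flag files whose contents have silently changed.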

eInfrastructure Semantic / controlled vocabulary central services

Economic models and value
Data is *NOT* valueless once published (EPSRC train of thought)
What is the *value* of departmental level data? This is not necessarily monetary
Department, institution, individual, data centre, pharma, government, research council, public, third party services/businesses
We undervalue data
Subject repository economic sustainability
Evidence to back up advocacy