Context and Linking in the Research Lifecycle CERIF and other standards Catherine Jones Scientific Information Group Scientific Computing Department STFC.


Context and Linking in the Research Lifecycle CERIF and other standards Catherine Jones Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory

- The science we do
- Research Data lifecycle
- Drivers for developments
- Infrastructure to support data management

The science we do

Science and Technology Facilities Council
- Provides large-scale scientific facilities for UK science
  - particularly in physics and astronomy
  - ISIS and Diamond Light Source facilities
- Scientific Computing Department
  - Provides advanced IT development and services to the STFC Science Programme
  - Strong role in management of our science data
  - Computational science and engineering

Large-Scale Facilities: Big Facilities for Small Science

The Science we do: Structure of materials
[Images: Visit facility on research campus; Place sample in beam; Diffraction pattern from sample; Fitting experimental data to model. Example studies: bioactive glass for bone growth; structure of cholesterol in crude oil; hydrogen storage for zero-emission vehicles; magnetic moments in electronic storage; longitudinal strain in aircraft wing.]
- ~30,000 user visitors each year in Europe:
  - physics, chemistry, biology, medicine
  - energy, environmental, materials, culture
  - pharmaceuticals, petrochemicals, microelectronics
- Billions of € of investment
  - c. £400M for DLS (Diamond Light Source)
  - plus running costs
- Over … high-impact publications per year in Europe
  - But so far no integrated data repositories
  - Lacking sustainability and traceability

Research Lifecycle

Vision for STFC data/publications
- Data generated at STFC Facilities is discoverable and reusable
  - Creator privilege, commercial or IP considerations notwithstanding
- Stages in the research lifecycle linked in a machine-readable way
- Impact measurement
  - Effective and shareable: CERIF has a role here
- Retrievable context for the future

Research lifecycle
[Diagram: Proposal → Approval → Experiment → Data production → Data management → Data analysis → Record publication info, shaped by requirements internal to the organisation and by external requirements.]

Research lifecycle
[Diagram: Proposal → Approval → Experiment → Data production → Data management → Data analysis → Record publication info.]
- Links to organisational information: people, projects, organisational structure
- Provenance and context for the results: machine-readable links from data to publication
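The idea of machine-readable links from data back through the lifecycle can be illustrated with a small sketch. It assumes each stage (proposal, experiment, dataset, publication) is a record holding the identifiers of the records it derives from; the `Record` and `LifecycleGraph` names and all identifiers are hypothetical, not part of ICAT or ePubs. Walking the links back from a publication recovers its provenance.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Record:
    """One lifecycle stage, e.g. a proposal, experiment, dataset or publication."""
    kind: str
    identifier: str
    # Identifiers of the earlier records this one derives from.
    derived_from: list = field(default_factory=list)


class LifecycleGraph:
    """Stores records and follows the machine-readable links between them."""

    def __init__(self):
        self.records = {}

    def add(self, record):
        self.records[record.identifier] = record

    def provenance(self, identifier):
        """Return all ancestors of a record, breadth-first (nearest stage first)."""
        ancestors, seen = [], {identifier}
        queue = deque([identifier])
        while queue:
            for parent in self.records[queue.popleft()].derived_from:
                if parent not in seen:
                    seen.add(parent)
                    ancestors.append(parent)
                    queue.append(parent)
        return ancestors


# Hypothetical chain: proposal -> experiment -> dataset -> publication.
graph = LifecycleGraph()
graph.add(Record("proposal", "proposal-1"))
graph.add(Record("experiment", "experiment-1", ["proposal-1"]))
graph.add(Record("dataset", "dataset-1", ["experiment-1"]))
graph.add(Record("publication", "paper-1", ["dataset-1"]))
print(graph.provenance("paper-1"))  # ['dataset-1', 'experiment-1', 'proposal-1']
```

Because the links point backwards (each record names its sources), a new publication can be attached to existing data without editing the older records.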

Why capture the lifecycle and linkage?
- Explicitly links the stages in the process
  - Makes each different kind of data part of a bigger process
- Makes things easier for scientists
  - Links the notification of publications from the last proposal to the next proposal
  - Reduces the need for re-keying
- Provides the evidential basis for research
  - Validate and verify publications
  - Safeguard against error or fraud
- Measures the impact of science
  - Provides information on the value of the facility to service providers, funders and researchers
  - Influences policy makers
- Enables reuse of data
  - Get new science from old data
  - Non-repeatable results
  - Value for money
  - Teaching material
  - Comparative studies
- Encourages good data management practices
  - RCUK directives
  - Data preservation considerations at the data creation stage

Drivers for developments in this area

Policy
- RCUK/UK Government
  - Open Data; Open Access to publications
  - Impact agenda
  - Active data management, which includes preservation

Technological/Scientific developments
- Standards for interchange
  - CERIF; DC (Dublin Core) and domain-specific standards
- Interest in capturing analysis stages to enhance the provenance of data
- Electronic lab notebooks
- Social media and online communities
- Persistent identifiers for digital objects
- Possibilities for linking objects
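CERIF expresses relationships between entities (people, projects, organisational units, publications) as link records that carry a classification of the role and a period of validity. The sketch below is only CERIF-inspired: the `Link` class, its field names and the identifiers are hypothetical and do not follow the actual CERIF entity or attribute names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Link:
    """CERIF-inspired link: two entities, a role term and a validity period."""
    source_id: str
    target_id: str
    role: str
    start: date
    end: Optional[date] = None  # None means the link is still valid

    def valid_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def links_on(links, entity_id, day):
    """Return the (role, target) pairs valid for an entity on a given day."""
    return [(link.role, link.target_id)
            for link in links
            if link.source_id == entity_id and link.valid_on(day)]


links = [
    Link("person-42", "project-7", "principal investigator",
         date(2010, 1, 1), date(2012, 12, 31)),
    Link("person-42", "org-unit-3", "member", date(2008, 6, 1)),
]
print(links_on(links, "person-42", date(2013, 3, 1)))  # [('member', 'org-unit-3')]
```

Keeping the role and dates on the link, rather than on either entity, lets the same person hold different roles in different projects over time, which is what impact reporting usually needs.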

Infrastructure to support data management

Key tools for STFC
- ICAT: data catalogue
- ePubs: publication repository
- DataCite: assigning DOIs to data
- Safety Deposit Box: ISIS preservation tool
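Registering a DOI with DataCite requires a small set of mandatory metadata properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The sketch assembles such a record for a dataset and checks those properties are filled in before anything would be submitted. The dictionary keys loosely follow the DataCite property names but are not the exact XML or REST API serialisation; `build_dataset_record` is a hypothetical helper and the DOI, creator and title are placeholders.

```python
MANDATORY = ("identifier", "creators", "titles", "publisher",
             "publicationYear", "resourceType")


def build_dataset_record(doi, creators, title, publisher, year):
    """Assemble a DataCite-style record for a dataset and check mandatory properties."""
    # Very loose DOI sanity check: DOIs start with "10." and contain a "/".
    if not (doi.startswith("10.") and "/" in doi):
        raise ValueError(f"not a DOI: {doi!r}")
    record = {
        "identifier": {"identifier": doi, "identifierType": "DOI"},
        "creators": [{"creatorName": name} for name in creators],
        "titles": [{"title": title}] if title else [],
        "publisher": publisher,
        "publicationYear": str(year),
        "resourceType": {"resourceTypeGeneral": "Dataset"},
    }
    # An empty value (no creators, blank title or publisher) counts as missing.
    missing = [key for key in MANDATORY if not record[key]]
    if missing:
        raise ValueError(f"missing mandatory properties: {missing}")
    return record


record = build_dataset_record(
    "10.5072/example-dataset",  # placeholder DOI
    ["Example, A."], "Example neutron diffraction dataset", "STFC", 2012,
)
print(record["identifier"])  # {'identifier': '10.5072/example-dataset', 'identifierType': 'DOI'}
```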

ePubs: STFC’s publication repository
- Aims to collect the scientific and technical output of the Laboratories
- Standard metadata concerning publications
- Needs to be able to link the publication to its context: data; organisational structure

FRBR for publications
- Conceptual model with 4 levels: Work, Expression, Manifestation and Item
- Related items include people
- Enables linking of related objects
- ePubs uses this as its conceptual model
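The four FRBR levels nest: a Work (the abstract intellectual creation) is realised in Expressions (for example a preprint and the published version), each embodied in Manifestations (for example a PDF), each exemplified by Items (individual copies). A minimal sketch of that nesting follows; the class names, fields and example values are hypothetical and not the ePubs schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """A single exemplar, e.g. one stored copy of a file."""
    location: str


@dataclass
class Manifestation:
    """A physical or digital embodiment, e.g. a PDF."""
    file_format: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realisation of the work, e.g. a preprint or the published version."""
    version: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual creation."""
    title: str
    creators: list[str]
    expressions: list[Expression] = field(default_factory=list)

    def item_locations(self):
        """Collect every item of the work by walking all four levels."""
        return [item.location
                for expression in self.expressions
                for manifestation in expression.manifestations
                for item in manifestation.items]


paper = Work("Example paper", ["Example, A."], [
    Expression("preprint", [Manifestation("PDF", [Item("repository copy")])]),
    Expression("published version",
               [Manifestation("HTML", [Item("publisher website")])]),
])
print(paper.item_locations())  # ['repository copy', 'publisher website']
```

Linking at the Work level lets a repository treat a preprint and its published version as the same output while still recording where each copy lives.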

CSMD for Data: underpins ICAT
[Diagram entities: Investigation, Publication, Keyword, Topic, Sample, Sample Parameter, Dataset, Dataset Parameter, Datafile, Datafile Parameter, Investigator, Related Datafile, Parameter, Authorisation.]
- CSMD: Core Scientific MetaData model
- Designed to describe facilities-based experiments in Structural Science
- Forms the information model for ICAT, a production data management infrastructure employed by STFC
- Forms the basis for extensions:
  - To derived data
  - To laboratory-based science
  - To secondary analysis data
  - To preservation information
  - To publication data
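The core CSMD hierarchy runs Investigation → Dataset → Datafile, with parameters attached to the lower levels. The sketch models a cut-down version of that hierarchy and a parameter search. The classes, the `find_datafiles` helper, the parameter name and the file names are hypothetical simplifications, not the ICAT API; samples, authorisation and related datafiles are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Datafile:
    name: str
    parameters: dict[str, float] = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    datafiles: list[Datafile] = field(default_factory=list)
    parameters: dict[str, float] = field(default_factory=dict)


@dataclass
class Investigation:
    title: str
    investigators: list[str]
    datasets: list[Dataset] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def find_datafiles(investigation, parameter, low, high):
    """Return names of datafiles whose parameter value lies in [low, high]."""
    return [datafile.name
            for dataset in investigation.datasets
            for datafile in dataset.datafiles
            if parameter in datafile.parameters
            and low <= datafile.parameters[parameter] <= high]


investigation = Investigation(
    "Hydrogen storage materials", ["Example, A."],
    datasets=[Dataset("run-001", datafiles=[
        Datafile("scan-001.nxs", {"temperature_K": 20.0}),
        Datafile("scan-002.nxs", {"temperature_K": 300.0}),
    ])],
    keywords=["hydrogen storage"],
)
print(find_datafiles(investigation, "temperature_K", 0, 100))  # ['scan-001.nxs']
```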

Other projects working to realise this vision
- WebTracks: linking publications and data
- ePubs revamp: considering impact reporting requirements (CERIF possibilities)
- SCAPE: EU project considering scalable digital preservation
- PANDATA: consortium of Photon and Neutron sources in Europe

Conclusions
- Many more reasons for sharing data, or information about the data
- Need to be able to use appropriate standards for data exchange
- Interest in linking the stages in the research lifecycle
- Requirements for impact reporting

Thank you. Questions?

PaN-Data Vision
[Diagram: Facilities 1, 2 and 3 each handle Raw Data → Data Analysis → Analysed Data → Publication Data → Publications separately: different infrastructures → different user experiences. The vision replaces this with shared capacity storage; publications, data and software repositories; and common raw data, analysed data, publication data and publications catalogues: single infrastructure → single user experience.]

PaN-data ODI: an Open Data Infrastructure for European Photon and Neutron laboratories
"... to construct and operate a shared data infrastructure for Photon and Neutron laboratories..."
- Common data catalogue
- Integration of users' data from different facilities
- Track provenance of data through analysis stages
- Deploy standards for long-term curation
- Support scalability through parallelisation
- Deploy the infrastructure for three different techniques
[Diagram labels: Neutron diffraction; X-ray diffraction; High-quality structure refinement.]
Driver 4: Interoperability across Facilities
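A common data catalogue across facilities implies mapping each facility's own metadata fields onto a shared schema, so that one search covers all of them. A minimal sketch, assuming each facility exports flat records; the facility names, field names and mapping table are all hypothetical.

```python
# Hypothetical per-facility field names mapped onto a shared catalogue schema.
FIELD_MAPS = {
    "facility-1": {"exp_title": "title", "technique": "technique"},
    "facility-2": {"name": "title", "method": "technique"},
}


def to_common(facility, record):
    """Rename a facility's fields to the shared schema and tag the source facility."""
    entry = {target: record[source]
             for source, target in FIELD_MAPS[facility].items()}
    entry["facility"] = facility
    return entry


def build_catalogue(exports):
    """Merge every facility's exported records into one list of common entries."""
    return [to_common(facility, record)
            for facility, records in exports.items()
            for record in records]


def search(catalogue, technique):
    """Find entries from any facility that used the given technique."""
    return [(entry["facility"], entry["title"])
            for entry in catalogue if entry["technique"] == technique]


exports = {
    "facility-1": [{"exp_title": "Bioactive glass",
                    "technique": "neutron diffraction"}],
    "facility-2": [{"name": "Bioactive glass", "method": "X-ray diffraction"},
                   {"name": "Cholesterol in crude oil",
                    "method": "neutron diffraction"}],
}
catalogue = build_catalogue(exports)
print(search(catalogue, "neutron diffraction"))
# [('facility-1', 'Bioactive glass'), ('facility-2', 'Cholesterol in crude oil')]
```

Keeping the mapping in a table rather than in code means a new facility joins the catalogue by adding one entry to `FIELD_MAPS`.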

A Data Management Architecture
- Generic: can be applied to different customers
- Robust: can be monitored and maintained
- Fast: handles high rates of data ingest
- Scalable: manages the storage of very large amounts of data
- Secure: allows role-based access control to be applied
- Integrity: verifies data at ingest; does not lose or mis-identify data over time
- Monitoring: must generate reports
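Two of the listed properties, integrity (verification at ingest) and security (role-based access control), can be illustrated with a toy in-memory store. This sketch assumes SHA-256 checksums supplied by the data producer and a simple role-to-permission table; the `Store` class and the role names are hypothetical, and persistence and scale are ignored, with monitoring reduced to a trivial report.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Store:
    """Toy store: verifies checksums at ingest and checks role permissions on read."""

    def __init__(self, role_permissions):
        self.role_permissions = role_permissions  # role -> set of permitted actions
        self.files = {}  # identifier -> (data, checksum recorded at ingest)

    def ingest(self, identifier, data, expected_checksum):
        # Refuse to reuse an identifier so data cannot be silently mis-identified.
        if identifier in self.files:
            raise ValueError(f"identifier already in use: {identifier}")
        # Reject data that was corrupted between the producer and the store.
        if sha256_of(data) != expected_checksum:
            raise ValueError(f"checksum mismatch for {identifier}")
        self.files[identifier] = (data, expected_checksum)

    def verify(self, identifier):
        """Re-check stored data against its ingest checksum (e.g. in a periodic audit)."""
        data, checksum = self.files[identifier]
        return sha256_of(data) == checksum

    def read(self, identifier, role):
        if "read" not in self.role_permissions.get(role, set()):
            raise PermissionError(f"role {role!r} may not read {identifier}")
        return self.files[identifier][0]

    def report(self):
        """Minimal monitoring report: file count and total bytes held."""
        return {"files": len(self.files),
                "bytes": sum(len(data) for data, _ in self.files.values())}


store = Store({"investigator": {"read"}, "public": set()})
payload = b"diffraction frame"
store.ingest("datafile-1", payload, sha256_of(payload))
print(store.read("datafile-1", "investigator"))  # b'diffraction frame'
print(store.report())  # {'files': 1, 'bytes': 17}
```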