PaN-data WP7 - Integration Brian Matthews STFC-e-Science.



Integration Workpackage
Last work-package to start
  – M8 (January)
  – Goes on to the end of project
Dependencies on outcomes of other WPs
  – Users, Data, Software
Deliverables
  – D7.1: Report on survey of publication repositories, cross-linking and long-term preservation (M12).
  – D7.2: Proposal for integration of practices (M16).
  – D7.3: Final report on standards for publication repositories, cross-linking and long-term preservation (M18)
STFC (4 SM), DLS (2 SM), 0.5SM
Early, so for now only general ideas on the work in the area.
  – Get the right people together in advance
  – Quite an open-ended work-package
  – Start thinking

WP7 Development of standards for integration and cross-linking of outputs

Objectives
To foster the integration of the whole science lifecycle, focussing on linking of publications and data, interaction between institutional repositories of publications, packaging for long-term preservation, and services for search and reuse.

Methodology
Publications repositories complete the lifecycle of innovation. Linking to Users, Data and Software enables traceability of published results through the scientific process. Sharing of the final results provides a foundation for the next cycle of science, and packaging enables long-term preservation of the outputs of research.

Association of data with the publications resulting from it is a basis for preservation through Representation Information, a term from the OAIS standard (Open Archival Information System) meaning the information necessary to ensure continued understandability and usability of a digital resource. This is also a basis for reuse of data across diverse communities, since the supplementary information needed for continued understandability is also valuable for transfer across communities.

The European Support Action PARSE.Insight (of which STFC is WPL) is producing a roadmap for digital preservation in Europe, informed by a large-scale survey of attitudes and practices in a wide range of scientific disciplines. The roadmap includes components such as tools for creation of Representation Information, and will be taken into account in the project work.

Tasks
  – Task 7.1: Review existing provision for publication repositories, citation recording and long-term preservation in use across the facilities and in the user community, including facility libraries. (M8-M12)
  – Task 7.2: Propose strategy on integration of practices across the community (M12-M16).
  – Task 7.3: Develop final proposal on integration of practices across the community (M17-18).
(Note: the final workshop to disseminate the results of the work package takes place in WP3)

Deliverables
  – D7.1: Report on survey of publication repositories, cross-linking and long-term preservation (M12).
  – D7.2: Proposal for integration of practices (M16).
  – D7.3: Final report on standards for publication repositories, cross-linking and long-term preservation (M18)

Objective 7 – Integration and cross-linking of outputs
To foster the integration of the whole science lifecycle, focussing on linking of publications and data, interaction between institutional repositories of publications, packaging for long-term preservation, and services for search and reuse.

Desired Information Flow
[Figure: I2S2: An Idealised Scientific Research Activity Lifecycle Model. Diagram labels include: Start Project; Research Concept and/or Experiment Design; Write Proposal (include DMP); Peer-review Proposal; Acquire Sample; Conduct Experiment; Generate, Create & Collect Raw Data; Process Raw Data into Derived Data; Analyse Derived Data; Interpret & Analyse Results; Prepare Manuscript; Prepare Supplementary Data; Peer Review; Publish Research Results; Reference Linking Research Outputs; Documentation, Metadata & Storage (Reference, Provenance, Context, Calibration etc.); Data Archive, Preservation & Curation; IPR, Embargo & Access Control; Appraisal & Quality Control; Discover & Access; Validate, Reuse & Repurpose Data; Write Usage Reports; Publication Database; Scholarly Knowledge. Key: Research Activity, Research Admin Activity, Archive Activity, Publication Activity, Information Flow.]
Integration and linking via
  – Common information exchange model
  – Common tools, services and protocols

Facilities Lifecycle
  – Proposal: Scientist submits application for beamtime
  – Approval: Facility committee approves application
  – Scheduling: Facility registers, trains, and schedules scientist's visit
  – Experiment: Scientist visits, facility runs experiment
  – Data storage: Raw data filtered, cleansed and stored
  – Data analysis: Tools for processing made available
  – Record Publication: Subsequent publication registered with facility
Why Link?
  – Discovery of results
  – Auditing of usage of facility
  – Allowing greater reuse of data
  – Validation of results

[Figure: Facility 1, Facility 2 and Facility 3 each follow the same chain: Raw Data, Data Analysis, Analysed Data, Publication Data, Publications. Beneath them sit shared layers: Capacity Storage; Data Repositories, Software Repositories and Publications Repositories; and a Raw Data Catalogue, Analysed Data Catalogue, Publication Data Catalogue and Publications Catalogue.]
Single Infrastructure → Single User Experience

Objective 7 – Integration and cross-linking of outputs
To foster the integration of the whole science lifecycle, focussing on linking of publications and data, interaction between institutional repositories of publications, packaging for long-term preservation, and services for search and reuse.
Outcomes
1. promote the linking of publications,... to the data on which they are based,
2. foster the development of interaction between repositories of publications,...
3. work towards packaging the full scientific results of particular experiments for archival purposes,... aimed at the long-term preservation of the data and other results,
4. define search services... which will enable single searches..., and importantly will open up the possibility of reuse of data across different disciplines through the same mechanism of packaging for archival with the needed supplementary information for understanding and reuse.

Issues
Existing repositories
Data citation
Constructing and maintaining links
  – Identifying users, data resources, software
  – Federating and accessing linked infrastructure
  – Linked Web of Data
Digital preservation
Packaging and access

Existing publication management systems
What existing methods do facilities use to track publications arising from work at their facilities?
  – In house
  – Libraries
  – Public services
  – Entry points

Citation of data
  – Persistent Identifiers (e.g. DOIs)
  – Standard ways of citing data
  – Who do you cite?
  – What do you cite?
      Raw data, Derived data
      Data delivered to publishers
  – Data policy
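To make the "standard ways of citing data" question concrete, here is a minimal sketch of a citation record built around a persistent identifier. The `DataCitation` class, its field names, the citation layout and the example DOI are all assumptions made for illustration; the slides do not prescribe a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical minimal record for citing a dataset."""
    creators: tuple[str, ...]  # who is cited, e.g. the experimental team
    year: int
    title: str
    publisher: str             # e.g. the facility holding the data
    doi: str                   # persistent identifier (made up below)
    kind: str = "raw"          # what is cited: "raw" or "derived" data

    def format(self) -> str:
        """Return a one-line citation ending in a resolvable DOI link."""
        names = "; ".join(self.creators)
        return (f"{names} ({self.year}): {self.title} [{self.kind} data]. "
                f"{self.publisher}. https://doi.org/{self.doi}")


citation = DataCitation(
    creators=("Smith, A.", "Jones, B."),
    year=2010,
    title="Example diffraction dataset",
    publisher="Example Facility",
    doi="10.0000/example.1234",
)
print(citation.format())
```

Keeping the `kind` field explicit reflects the "what do you cite" question on the slide: the same layout can point at raw data, derived data, or the data delivered to a publisher.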

Linking publications and data
Find datasets in repositories that were used to derive publications.
Find papers that were written from datasets.
  – Can validate the results of the paper
  – Can perform new secondary analyses
  – Can judge the value of a data set from its use
  – Can give credit to data providers, tracing usage
  – Can also add forward links to papers, to evaluate their use.
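The two lookups above (datasets behind a paper, papers written from a dataset) can be sketched as a small two-way index. This is only an in-memory illustration; the `LinkIndex` class and the identifier strings are hypothetical, and a real deployment would sit on top of the facilities' catalogues.

```python
from collections import defaultdict


class LinkIndex:
    """Two-way index between dataset and publication identifiers."""

    def __init__(self) -> None:
        self._papers_by_dataset: dict[str, set[str]] = defaultdict(set)
        self._datasets_by_paper: dict[str, set[str]] = defaultdict(set)

    def link(self, dataset_id: str, paper_id: str) -> None:
        """Record that a paper was derived from a dataset (both directions)."""
        self._papers_by_dataset[dataset_id].add(paper_id)
        self._datasets_by_paper[paper_id].add(dataset_id)

    def papers_from(self, dataset_id: str) -> set[str]:
        """Papers written from the given dataset."""
        return set(self._papers_by_dataset.get(dataset_id, set()))

    def datasets_behind(self, paper_id: str) -> set[str]:
        """Datasets used to derive the given paper."""
        return set(self._datasets_by_paper.get(paper_id, set()))

    def usage_count(self, dataset_id: str) -> int:
        """Crude measure of a dataset's value: how many papers used it."""
        return len(self._papers_by_dataset.get(dataset_id, ()))


index = LinkIndex()
index.link("dataset:D1", "paper:P1")
index.link("dataset:D1", "paper:P2")
index.link("dataset:D2", "paper:P2")
print(sorted(index.papers_from("dataset:D1")))
print(sorted(index.datasets_behind("paper:P2")))
print(index.usage_count("dataset:D1"))
```

Storing each link in both directions is what makes crediting data providers and judging a dataset by its use cheap: neither question requires scanning every publication record.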

Constructing Links
Ideally the archives holding the data would be notified that a paper citing them had been submitted.
  – Metadata associated with those records would be updated to reflect the citations.
  – The metadata in the publication repository should also link to the metadata in the data archives and vice versa.
  – It would be great if this notification could be done automatically.
      Tedious to enter citations
      “Forward citations” (“cited-by”) are hard to track
Builds a citation graph
  – Fits well with the notion of “Linked Web of Data”
  – Could easily be extended to other components
      Derived data
      Software
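A rough sketch of the citation graph idea follows, assuming a single in-memory graph in which recording a citation updates the "cited-by" side automatically and queues a notification for whoever holds the cited item. The `CitationGraph` class, the notification list and the `paper:`/`derived:`/`raw:`/`software:` identifier prefixes are illustrative assumptions, not part of any PaN-data design.

```python
from collections import defaultdict, deque


class CitationGraph:
    """Directed graph of 'cites/uses' edges with automatic back-links."""

    def __init__(self) -> None:
        self.cites: dict[str, set[str]] = defaultdict(set)     # forward edges
        self.cited_by: dict[str, set[str]] = defaultdict(set)  # back-links
        self.notifications: list[tuple[str, str]] = []         # (holder of, citing item)

    def add_citation(self, source: str, target: str) -> None:
        """Record that `source` cites `target`; update both ends once."""
        if target in self.cites[source]:
            return  # already known, do not notify twice
        self.cites[source].add(target)
        self.cited_by[target].add(source)
        self.notifications.append((target, source))

    def provenance(self, start: str) -> set[str]:
        """Everything reachable from `start` by following citations (BFS)."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.cites.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


graph = CitationGraph()
graph.add_citation("paper:P1", "derived:D1")
graph.add_citation("derived:D1", "raw:R1")
graph.add_citation("derived:D1", "software:S1")
print(sorted(graph.provenance("paper:P1")))
print(sorted(graph.cited_by["raw:R1"]))
```

Because derived data and software are just further nodes, extending the graph to those components needs no new machinery, which is the point made on the slide.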

Preservation
Preservation policies and planning
  – What data to preserve, for how long?
Procedures for managing preservation
  – Persistent Ids
  – Maintaining media
  – Maintaining Links
  – Maintaining context
      Representation information
Packaging preserved data for access to users
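One way to picture a preservation package is a manifest that carries a persistent identifier, the links to related outputs, pointers to representation information, and a checksum per file so that media degradation can be detected. The sketch below assumes SHA-256 checksums and a made-up manifest layout; neither is specified in the slides or taken from the OAIS standard.

```python
import hashlib
import json


def build_package(pid: str, files: dict[str, bytes],
                  representation_info: list[str],
                  links: list[str]) -> dict:
    """Build a hypothetical manifest for a set of preserved files."""
    return {
        "pid": pid,                                   # persistent identifier
        "files": {name: hashlib.sha256(data).hexdigest()
                  for name, data in sorted(files.items())},
        "representation_info": list(representation_info),  # context needed to read the data
        "links": list(links),                         # related papers, software, etc.
    }


def verify_package(manifest: dict, files: dict[str, bytes]) -> list[str]:
    """Return names of files that are missing or whose checksum changed."""
    problems = []
    for name, digest in manifest["files"].items():
        data = files.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != digest:
            problems.append(name)
    return problems


files = {"raw.dat": b"\x00\x01\x02", "README.txt": b"units: counts per second"}
manifest = build_package("pid:example/0001", files, ["README.txt"], ["paper:P1"])
print(json.dumps(manifest, indent=2))

damaged = {**files, "raw.dat": b"\x00\x01\xff"}
print(verify_package(manifest, damaged))
```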

Access
Cross-searching
  – Common metadata models
  – Common services
      E.g. TopCat front end on ICat
  – Cross-searching
Complex data objects
  – OAI-ORE
  – SPARQL end-points
OAIS packages
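The role of a common metadata model in cross-searching can be sketched as follows: each catalogue keeps its native record layout, a small adapter maps it to a shared record type, and a single search runs over the mapped records. The `CommonRecord` fields, the two native layouts and the facility names are invented for illustration; this does not reproduce the ICat or TopCat interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared metadata model used for cross-searching."""
    source: str
    identifier: str
    title: str
    instrument: str


Adapter = Callable[[dict], CommonRecord]


def cross_search(catalogues: dict[str, tuple[Iterable[dict], Adapter]],
                 term: str) -> list[CommonRecord]:
    """Search every catalogue for `term` in the title, via its adapter."""
    needle = term.lower()
    hits = []
    for _name, (records, adapt) in catalogues.items():
        for raw in records:
            record = adapt(raw)
            if needle in record.title.lower():
                hits.append(record)
    return sorted(hits, key=lambda r: (r.source, r.identifier))


# Two catalogues with different native field names (both made up).
facility_a = [{"id": "A-1", "name": "Lysozyme powder diffraction", "beamline": "X1"}]
facility_b = [
    {"pid": "B-7", "title": "Lysozyme small-angle scattering", "instr": "Y2"},
    {"pid": "B-8", "title": "Silicon calibration run", "instr": "Y2"},
]

catalogues = {
    "facility_a": (facility_a,
                   lambda r: CommonRecord("facility_a", r["id"], r["name"], r["beamline"])),
    "facility_b": (facility_b,
                   lambda r: CommonRecord("facility_b", r["pid"], r["title"], r["instr"])),
}

for hit in cross_search(catalogues, "lysozyme"):
    print(hit.source, hit.identifier, hit.title)
```

Keeping the mapping in per-catalogue adapters means a new facility can join the search by supplying one function, without changing its own catalogue.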

Tasks
Task 7.1: Review existing provision for publication repositories, citation recording and long-term preservation in use across the facilities and in the user community, including facility libraries. (M8-M12)
  – D7.1: Report on survey of publication repositories, cross-linking and long-term preservation (M12).
Task 7.2: Propose strategy on integration of practices across the community (M12-M16).
  – D7.2: Proposal for integration of practices (M16).
Task 7.3: Develop final proposal on integration of practices across the community (M17-18)
  – D7.3: Final report on standards for publication repositories, cross-linking and long-term preservation (M18)

Who should be involved?
All partners involved
  – Representation from managers of records of publications (libraries)
Set up a wiki group to start thinking of issues and approaches
Evaluate user, data, software outputs for integration
Collect information on suitable publication repositories
Collect information on suitable initiatives and standards
  – Data integration and linking
  – Data preservation
  – Persistent identifiers
  – Data citation
Begin to evaluate for best practice
Ready to participate with outlines at M9 workshops

eCrystals

eCrystals citation management screen

ePubs publication: publishes a Trackback URI

Invoking the trackback: enters the trackback URI
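For readers unfamiliar with trackbacks: in the TrackBack protocol, a ping is an HTTP POST of form-encoded fields (`url` is required; `title`, `excerpt` and `blog_name` are optional) to the published trackback URI, and the reply is a small XML document whose `<error>` element is `0` on success. The sketch below only builds the request body and parses a reply, without any network call. Whether eCrystals and ePubs exchange exactly these fields is an assumption, and the URL and title are made up.

```python
from urllib.parse import parse_qs, urlencode
import xml.etree.ElementTree as ET


def build_ping(url: str, title: str = "", excerpt: str = "",
               blog_name: str = "") -> bytes:
    """Form-encode a TrackBack ping body, omitting empty optional fields."""
    fields = {"url": url, "title": title, "excerpt": excerpt,
              "blog_name": blog_name}
    return urlencode({k: v for k, v in fields.items() if v}).encode("utf-8")


def parse_response(body: bytes) -> tuple[bool, str]:
    """Return (success, message) from a TrackBack XML reply."""
    root = ET.fromstring(body)
    error = (root.findtext("error") or "1").strip()  # a missing <error> counts as failure
    message = (root.findtext("message") or "").strip()
    return error == "0", message


# The data record (hypothetical URL) announces itself to the publication's
# trackback URI; the body would be POSTed with
# Content-Type: application/x-www-form-urlencoded.
body = build_ping("https://example.org/data/123", title="Example dataset")
print(parse_qs(body.decode("utf-8")))

ok_reply = b'<?xml version="1.0" encoding="utf-8"?>\n<response><error>0</error></response>'
bad_reply = b'<response><error>1</error><message>Unknown entry</message></response>'
print(parse_response(ok_reply))
print(parse_response(bad_reply))
```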

A citation of the paper

A Citation of the data