Presentation on theme: "A centre of expertise in data curation and preservation IMechE Workshop, London, 26th September 2006 Looking to the longer term: some perspectives on."— Presentation transcript:
A centre of expertise in data curation and preservation. IMechE Workshop, London, 26th September 2006. Looking to the longer term: some perspectives on data curation and preservation. Dr Liz Lyon, DCC Associate Director Outreach; Director, UKOLN, University of Bath, UK. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0.
About UKOLN: a centre of expertise in digital information management. Funding: Joint Information Systems Committee (JISC) + Museums, Libraries & Archives Council (MLA). Portfolio of R&D projects: Delos, DRIVER, Grand Challenge. 29+ staff based at the University of Bath. Inform the library, information, education and cultural heritage communities. Policy and advocacy at national level; build innovative Web-based systems & services; R&D; e-journal Ariadne; workshops and conferences. http://www.ukoln.ac.uk/ Acknowledgement: Alex Ball, Grand Challenge Project.
UK Digital Curation Centre: funded by JISC & EPSRC. Development activities; research agenda; delivering services; outreach programme. http://www.dcc.ac.uk/
Overview: data curation and digital preservation issues; draw on research and scholarship perspectives; data / information flows and the business process; UK Digital Curation Centre activities. Maintaining and adding value to a trusted body of digital information for current and future use.
Data-centric 2020 vision: reference datasets as infrastructure?
(Very simple) Product Research Cycle & Data Curation. Formulate ideas / hypothesis, test, experiment, observe, design: data creation, collection & capture. Adding value: data linking, annotation, visualisation, simulation. (New) knowledge extraction: data mining, modelling, analysis, synthesis. e-Infrastructure; Open ?? access; collaboration. Scholarly communications & business transactions: data disclosure, publication, citation, discovery, re-use. Data management, storage & validation: description, deposit, self-archiving, preservation, certification. Data processing.
RepoMMan: Repository Metadata and Management (Hull), using WS-BPEL. Are your engineering workflows identified and described? Workflow; e-Scientist desktop? Slide: Carole Goble.
Research outputs in institutional repositories: engineering
JISC Vision: a global landscape of federated repositories. Fusion layer; repository federator; repository; portal. Heterogeneous: metadata formats, content formats, identifiers, packaging standards. Homogeneous: metadata formats, content formats, identifiers, packaging standards. From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/presentations/jiie-jcs-2005/ Multi-disciplinary, cross-sectoral; national, institutional; different platforms; many format types: data, eprints, images, geospatial. e-Framework and Information Environment context. Define common + domain-specific + repository services. Interoperability based on open standards, software tools.
PerX: Pilot Engineering Repository Xsearch http://www.engineering.ac.uk/
Repositories and OAIS Reference Model. OAIS: an archive, consisting of an organisation of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. Designated Community: an identified group of potential consumers who should be able to understand a particular set of information.
Assuring permanence: digital preservation. Trusted Digital Repository Audit Checklist for Certification: draft, Research Libraries Group-NARA Taskforce, 2005. Defined criteria: organisation; functions, processes & procedures; designated community & usability; technologies & technical infrastructure. Revised Checklist based on feedback and pilot audits (KB, BADC). Self-certification: DINI-Zertifikat requirements & recommendations: server policy / guidelines; author support; legal issues; authenticity and integrity; cataloguing; access statistics; long-term sustainability. Has your repository / PLM been audited?
Interdisciplinary discovery. Validation, publication & discovery of data models & schema. Harmonisation and normalisation of metadata and semantics. Packaging standards: METS, MPEG-21 DIDL. Formal high-level and domain ontologies. ePrints DC Application Profile: http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile eBank Application Profile (crystallography data): http://www.ukoln.ac.uk/projects/ebank-uk/schemas/ What data models and metadata schema are in place?
Persistent identifiers for data citation. How will they be used? We need use cases: depositor, author, service provider, researcher, publisher? Schemes: DOI, Handle, ARK, PURL. Global identification: express as http URIs. Data citation (human and machine-actionable). Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de Is there a data citation policy? What persistent identifiers have been assigned to your data?
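The "express as http URIs" point can be illustrated with a minimal sketch. It assumes identifiers are written as "scheme:value" and maps each scheme to a public resolver. The resolver hosts (doi.org, hdl.handle.net, n2t.net) and the function name to_http_uri are choices made for this illustration, not taken from the slides; PURLs are already http URIs and pass through unchanged.

```python
# Resolver base URLs per scheme. These hosts are assumptions of this sketch,
# not something the presentation specifies.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:/",
}


def to_http_uri(identifier: str) -> str:
    """Express a 'scheme:value' persistent identifier as an http(s) URI."""
    if identifier.startswith(("http://", "https://")):
        # PURLs (and anything already resolved) are URIs as they stand.
        return identifier
    scheme, sep, value = identifier.partition(":")
    scheme = scheme.lower()
    value = value.lstrip("/")  # ARKs are conventionally written 'ark:/NAAN/name'
    if not sep or not value or scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return RESOLVERS[scheme] + value


# The DOI of the eBank paper cited in this presentation:
print(to_http_uri("doi:10.1039/b502828k"))  # https://doi.org/10.1039/b502828k
```

The sketch does no percent-encoding, so an identifier containing characters that are not URL-safe would need extra handling before it is machine-actionable.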
Discovering data: eBank Project. Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k. Domain identifier: International Chemical Identifier (InChI) code. Google molecule using InChI. Slide from Simon Coles. Domain identifiers for engineering?
Format migration challenges? CAD Program Compatibility Chart http://www.okino.com/conv/filefrmt_cad.htm
Development: Representation Information Registry Repository. DCC approach to digital curation based on OAIS. Prototype demonstrator based on 2 key concepts to facilitate sharing of the curation effort: Curation Persistent Identifier (CPID); descriptive label (structural, semantic, other metadata). Development of machine-to-machine (M2M) tools and interfaces for creating, using and re-using representation information. http://dev.dcc.ac.uk Wiki and email list. EU CASPAR Integrated Project: http://www.casparpreserves.info/pages/1/index.htm Task Force on the Permanent Access to the Records of Science: http://tfpa.kb.nl/
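The two key concepts can be pictured with a toy data structure: a CPID acts as the key, and the descriptive label holds the structural, semantic and other metadata. This is a sketch only. The class names, field names and the CPID string are hypothetical and do not describe the DCC prototype's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DescriptiveLabel:
    """Descriptive label for a curated object (field names are hypothetical)."""
    structural: str  # how the data is laid out
    semantic: str    # what the values mean
    other: Dict[str, str] = field(default_factory=dict)  # any other metadata


class RepInfoRegistry:
    """Toy in-memory registry mapping a CPID string to its descriptive label."""

    def __init__(self) -> None:
        self._labels: Dict[str, DescriptiveLabel] = {}

    def register(self, cpid: str, label: DescriptiveLabel) -> None:
        if cpid in self._labels:
            raise ValueError(f"CPID already registered: {cpid}")
        self._labels[cpid] = label

    def lookup(self, cpid: str) -> DescriptiveLabel:
        return self._labels[cpid]  # raises KeyError for an unknown CPID


registry = RepInfoRegistry()
registry.register(
    "cpid:example:0001",  # made-up identifier, not a real CPID
    DescriptiveLabel(
        structural="plain-text table, one record per line",
        semantic="column 3 is temperature in kelvin",
    ),
)

# Two datasets cite the same CPID, so the label is curated once and shared.
datasets = {"run-A": "cpid:example:0001", "run-B": "cpid:example:0001"}
labels = {name: registry.lookup(cpid) for name, cpid in datasets.items()}
```

Because both datasets resolve to the same label object, a correction made once to the representation information benefits every dataset that cites that CPID, which is the sense in which the curation effort is shared.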
Registry API: allows applications to talk to many different registry implementations, e.g. GDFR, PRONOM, UDDI. Access via GUI and via Web browser. http://registry.dcc.ac.uk
Adding value through annotation. Research at the University of Edinburgh. Scientific databases: annotation scoping report. New annotation model + prototype MONDRIAN. Intuitive visual interface iMONDRIAN. Annotate sets of values. Support for querying annotations.
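One way to picture a single API in front of many registry implementations is a small common interface that each backend satisfies. The sketch below is an assumption-laden illustration: the Registry protocol, the describe method and the in-memory backends are all hypothetical stand-ins, and it makes no claim about how the DCC Registry API, GDFR, PRONOM or UDDI actually work.

```python
from typing import Dict, Iterable, Optional, Protocol, Tuple


class Registry(Protocol):
    """What a client needs from any registry backend (hypothetical interface)."""

    name: str

    def describe(self, identifier: str) -> Optional[str]:
        ...


class InMemoryRegistry:
    """Stand-in backend; a real adapter would wrap a remote registry service."""

    def __init__(self, name: str, entries: Dict[str, str]) -> None:
        self.name = name
        self._entries = dict(entries)

    def describe(self, identifier: str) -> Optional[str]:
        return self._entries.get(identifier)


def describe_anywhere(
    identifier: str, registries: Iterable[Registry]
) -> Optional[Tuple[str, str]]:
    """Ask each backend in turn; return (backend name, description) or None."""
    for registry in registries:
        description = registry.describe(identifier)
        if description is not None:
            return registry.name, description
    return None


# Identifiers and descriptions below are invented for the example.
backends = [
    InMemoryRegistry("stand-in-A", {"example/1": "Example text format"}),
    InMemoryRegistry("stand-in-B", {"example/2": "Example image format"}),
]
result = describe_anywhere("example/2", backends)
```

The client code depends only on the shared interface, so a further registry can be supported by writing one more adapter without touching the applications that call describe_anywhere.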
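"Annotate sets of values" and "support for querying annotations" can be sketched as follows: one annotation is attached to a set of database cells rather than to a single value, and the store can be queried by cell or by text. This is an illustration under stated assumptions; the class names, the (row, column) cell addressing and the sample data are hypothetical and do not reproduce the MONDRIAN annotation model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

Cell = Tuple[str, str]  # (row key, column name); addressing scheme is assumed


@dataclass(frozen=True)
class Annotation:
    text: str
    cells: FrozenSet[Cell]  # one annotation covers a whole set of values


class AnnotationStore:
    """Toy store supporting set-valued annotations and two simple queries."""

    def __init__(self) -> None:
        self._annotations: List[Annotation] = []

    def annotate(self, text: str, cells: Iterable[Cell]) -> Annotation:
        annotation = Annotation(text, frozenset(cells))
        self._annotations.append(annotation)
        return annotation

    def for_cell(self, cell: Cell) -> List[str]:
        """All annotation texts whose cell set includes the given cell."""
        return [a.text for a in self._annotations if cell in a.cells]

    def search(self, keyword: str) -> List[Annotation]:
        """Annotations whose text contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [a for a in self._annotations if needle in a.text.lower()]


store = AnnotationStore()
store.annotate(
    "Recalibrated after instrument fault",
    [("obs-1", "magnitude"), ("obs-2", "magnitude")],
)
store.annotate("Checked by hand", [("obs-2", "magnitude")])

notes_for_obs2 = store.for_cell(("obs-2", "magnitude"))
calibration_hits = store.search("recalibrated")
```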
DCC Case Study published: Wide Field Astronomy Unit
Supporting the community: Outreach & Services. Workshops: Geospatial data, NeSC, 27 October; OAIS 5 year Review, October; Audit & Certification Forum, October; Records Management, Liverpool, 30 Nov; Curation & Preservation Training, Dec; 2007: Preservation of journals (tbc); 2007: Legal environment (tbc); 2007: Preparing for audit (tbc). Information Days: British Library, Liverpool, UCL. 2nd International DCC Conference, 21-22 November, Glasgow. Keynotes: Hans F. Hoffmann, CERN; Clifford Lynch, CNI.
DCC Phase 2: 2007-2010. Working more closely with data centres, e-Science Programmes and Research Councils. SCARP Project: disciplinary approach. JISC Digital Repository Programme collaboration. RepInfo Registry service migration. Define self-assessment procedures and tools. Collaborate with CASPAR, DPE and PLANETS (EU-funded digital preservation projects). Workshop Programme, International Conference 2007.
University of Bath, 13 September 2006. Thank you. Questions? email@example.com Join the DCC Associates Network at www.dcc.ac.uk