Presentation on theme: "A centre of expertise in data curation and preservation 2 nd International Digital Curation Conference, November 2006 Reflections on open scholarship:"— Presentation transcript:
a centre of expertise in data curation and preservation. 2nd International Digital Curation Conference, November 2006. Reflections on open scholarship: process, product and people. Dr Liz Lyon, DCC Associate Director Outreach; Director, UKOLN, University of Bath, UK. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0. Funded by:
Three themes How? –Unpacking the title: open scholarship What? –Creating and using science-ready archives Who? –Digital natives as data scientists
Publicly available? Shared? Inclusive? Collaborative? Participative? Non-proprietary? What do we mean by open?
Scholarship today? Open Access
Data-centric 2020 vision Data-driven science
Reference datasets as infrastructure
Research into neglected tropical diseases Open source science
Synthetic biology: materials for (bio) mash-ups? Interesting IPR issues…..
Bioblog Blogs, blogs and meta-blogs….
The Tool Box?
The Peer Review Process?
The Scientific Paper?
Crystal structure reports: data-rich scientific articles. 3-D positional coordinates; atomic motions; molecular geometry; chemical bonding; crystal packing; chemical behaviour arising from structure. Two dedicated IUCr journals: Acta Cryst. C, E. Important part of scientific discussion in many other titles: Acta Cryst. B, D, F. Validation of data through publication. Original slide: Brian McMahon, IUCr
Data-centric scholarly publications Raw, primary, derived data integrated with interpretations Mandatory submission of data with text
The database publication?
The mash-up Data from FAO, WHO + Google Earth
Pause for thought….. Big science communities –Grid-enabled applications –Large managed open data archives –Funder policy driver Small(er) science communities –Collaborative and social software –Evolving open wikis and blogs –Grassroots driver Curation and preservation issues –Burgeoning wiki and blog content –Web archiving Positioning of repositories???
Big science: funder-mandated sharing? Top down. Small science: community culture. Discipline? Institution? Bottom up. Science-ready archives
Laboratory protocols: common practice Instrumentation: proprietary software Standard specifications and formats Data capture
Working towards standard specifications in the lab –Open Microscopy Environment OME –Medical imaging DICOM –Flow cytometry standard FCS –Mass Spectrometry Standards Working Group mzData vs mzXML Laboratory management data systems in development
RepoMMan: Repository Metadata and Management (Univ Hull) using WS-BPEL Workflow: m2m? e-Scientist desktop? Slide: Carole Goble
Silchester: A VRE for Archaeology
Harmonisation and normalisation Standard Deposit API (GNU EPrints, DSpace, Fedora) Dublin Core Application Profile for ePrints (+ Eduserv) Requirements: richer metadata set, support for value-added services, version identification, appropriate copy (OA), citations Based on FRBR Data model for scholarly works Application profile includes simple and qualified DC properties
The ePrints application profile simple DC properties (the usual suspects … ) –identifier, title, abstract, subject, creator, publisher, type, language, format qualified DC properties –access rights, licence, date available, bibliographic citation, references, date modified new properties –grant number, affiliation institution, status, version, copyright holder properties from other schemes –funder, supervisor, editor (MARC relators) –name, family name, given name, workplace homepage, mailbox, homepage (FOAF) clearer use of existing relationships –has version, is part of new relationship properties –has adaptation, has translation, is expressed as, is manifested as, is available as vocabularies –access rights, entity type, resource type and status Slide: Julie Allinson, UKOLN, Andy Powell, Eduserv
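To make the property groups on this slide concrete, here is a minimal sketch in Python that sorts the keys of a flat metadata record into the classes the slide names. It is an illustration only: the function name `classify`, the flat-dictionary layout and all record values are assumptions made for this example, and it does not reproduce the application profile's actual encoding or its FRBR-based entity model.

```python
# Property names are copied from the slide; everything else is hypothetical.
SIMPLE_DC = ("identifier", "title", "abstract", "subject", "creator",
             "publisher", "type", "language", "format")
QUALIFIED_DC = ("access rights", "licence", "date available",
                "bibliographic citation", "references", "date modified")
NEW_PROPERTIES = ("grant number", "affiliation institution", "status",
                  "version", "copyright holder")


def classify(record):
    """Group a flat record's keys by the property class named on the slide."""
    groups = {"simple": [], "qualified": [], "new": [], "other": []}
    for key in record:
        if key in SIMPLE_DC:
            groups["simple"].append(key)
        elif key in QUALIFIED_DC:
            groups["qualified"].append(key)
        elif key in NEW_PROPERTIES:
            groups["new"].append(key)
        else:
            # e.g. properties borrowed from other schemes (MARC relators, FOAF)
            groups["other"].append(key)
    return groups


# Invented placeholder values, not taken from any real eprint.
example = {
    "title": "Placeholder title",
    "creator": "Placeholder, A.",
    "licence": "placeholder licence statement",
    "version": "placeholder version label",
    "funder": "Placeholder funder",
}

if __name__ == "__main__":
    print(classify(example))
```

A repository could use a check of this kind to report which of the richer properties (version, licence, funder) a deposit actually carries, which is the requirement the previous slide lists.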
Use DC Application Profile for ePrints?
Data description and discovery Validation, publication & discovery of data models & schema eBank Application Profile uk/schemas/ Harmonisation and normalisation of metadata and semantics DOI Rights & Citation policy Crystallography: a community working together
eCrystals Global Federation Model (diagram, 23/10/2006). Diagram labels: Data creation & capture in Smart lab; Laboratory repository; Institutional data repositories; Subject Repository; Aggregator services; Publishers: peer-review journals, conference proceedings, etc; Presentation services / portals; Institution Library & Information Services; Deposit; Validation; Publication; Search, harvest; Data analysis, transformation, mining, modelling; Data discovery, linking, citation; Curation; Preservation. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
Data deposit & sharing: roles and responsibilities Funder Institution Faculty Individual Noor et al PLoS Biol 4(7) 2006
Repository wow-factor… …or adding value through user interface tools…
Facilitating use and re-use: text mining tools Adding value
Second pause for thought… We need to work with instrument suppliers. We need to understand more about workflow. We need to develop new ways of adding value to datasets through innovative user tools and services. We need more evidence of how data is used and re-used (or not…)
Getting the skills mix Communities, teams, individuals International Virtual Observatory Alliance –Global community –Virtual organisation Multi-disciplinary team approach –eBank Project exemplar: computer scientists, domain scientists (chemists), digital library experts –Lessons learnt: e-Science Human Factors Audit Report 2006, Roy Kalawsky, Loughborough NSF Report 2005 Long-lived digital data collections –Data scientist
? Wanted! data scientist
Digital natives as data scientists? eBank Project: assessing role of research data in u/g Chemical Informatics and MChem courses at Univ. of Southampton Pedagogic evaluation by Grainne Conole Report imminent….
"Well basically I've done nothing like it before, so it's the first time I've sort of delved into computing or computational chemistry … quite nice, quite enjoyed starting off with just like a string of data and pop it into say a database, just a flat string of numbers basically and then come out with a crystal structure, which is exactly what it should represent, which is quite cool." "There were several parts to the course – We started off with how to get 2D and 3D representations of molecules onto a computer using a one-dimensional format, a SMILE string … so just ways of like getting data into a format so that it can be easily shared between different computers or different people without having to change lots of things." Source: Grainne Conole
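As an illustration of the one-dimensional format the student describes, the following minimal Python sketch counts the atoms written out in a SMILES string. It is a simplification made for this example, not a SMILES parser and not material from the course: the function name `count_atoms` is hypothetical, only the common organic-subset symbols and simple bracket atoms are recognised, and implicit hydrogens, bonds, rings and stereochemistry are ignored.

```python
import re
from collections import Counter

# Bracket atoms such as [Na+], then two-letter halogens, then one-letter
# organic-subset atoms (upper case) and aromatic atoms (lower case).
ATOM = re.compile(r"\[([A-Z][a-z]?)[^\]]*\]|(Cl|Br)|([BCNOPSFI])|([bcnops])")


def count_atoms(smiles):
    """Count element symbols that appear explicitly in a SMILES string."""
    counts = Counter()
    for m in ATOM.finditer(smiles):
        # Aromatic atoms are lower case in SMILES; report them capitalised.
        symbol = (m.group(1) or m.group(2) or m.group(3)
                  or m.group(4).capitalize())
        counts[symbol] += 1
    return counts


if __name__ == "__main__":
    for name, smiles in [("ethanol", "CCO"), ("benzene", "c1ccccc1")]:
        print(name, smiles, dict(count_atoms(smiles)))
```

The point of the format is visible even in this toy: a short text string carries enough structure to be exchanged between programs and people without conversion.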
New skills requirements: interdisciplinary quantitative data curation Integrate within the curriculum Wingreen & Botstein Mol Cell Biol 7, 2006
Final pause for thought… Various approaches to develop and obtain digital curation skills. Skills are there but often in discrete communities: we need to bring communities together (like at this conference…). Integration within the curriculum: undergraduate students, library & information science, archival studies, computer science. Provide recognition and a career path for emerging data scientists
Take home messages: Scholarship is changing fast. Big science and open source science both create significant digital curation challenges. Science-ready archives are the goal. Native data scientists are coming. The culture will change too……….
Thank you….