A centre of expertise in digital information management www.ukoln.ac.uk
Dealing with Data: Roles, Rights, Responsibilities & Relationships
Dr Liz Lyon, Director, UKOLN; Associate Director, UK Digital Curation Centre
JISC Digital Repositories Conference, Manchester, June 2007
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
Overview
Outcomes of a recent JISC-funded study by UKOLN
–Institutions (repositories) and data centres
–Roles, rights, responsibilities, relationships
–High-level data-flow models
Positioned in the UK context
–8 perspectives from Strategy to Practice
–Examples of best practice
–Recommendations
Strategy & Co-ordination
Synthesis
–Funder support for data curation is (still) patchy
–Gaps in infrastructure support
–High level and strategic
–Operational level and practical: data services & data centres
–Within and between institutions
–Within and between disciplines: globally
Recommendations
–Datasets Mapping & Gap Analysis
–Data Curation & Preservation Strategy for the UK
–Data Audit Framework for institutions
–Data Networking Forum for data centre staff
Policy & Planning
Synthesis
–Limited formal links between programme planning and support infrastructure, but examples of good practice
–Formal data policies are essential
–Web 2.0 influence: data sharing using social software
–Better joint planning for data management
Recommendations
–Funders should openly publish, implement and enforce a Data Management, Preservation and Sharing Policy
–Research projects should submit a Data Management Plan for peer review
–Universities should implement an Institutional Data Management, Preservation and Sharing Policy
January 2007: a Data Management and Sharing Plan is required if the research creates or develops a resource for the research community as its primary goal, or involves the generation of a significant quantity of data that could potentially be shared for added benefit
NATURAL ENVIRONMENT RESEARCH COUNCIL
NERC has:
–7 designated data centres
–Published policy (under review)
–Data Management Co-ordinator
–Developing DataGrid
General Data Selection Criteria
Usability
–Quality of data
–Usable data format
–Conditions of Use
–Reputable Author
–Documentation
Usefulness
–Data quality
–Uniqueness of data
–Potential Strategic Use
–Usefulness of parameters
NATURAL ENVIRONMENT RESEARCH COUNCIL
Practice
Synthesis
–Data capture automatically at source from instruments, in the lab, in the field
–Not much data in Institutional Repositories (IRs)… yet?
–Integrated architectures linking IRs and data centres
–Models for sharing data?
–Barriers: lack of awareness, resistance to change
–Level of re-use of data?
Recommendations
–Data capture as part of end-to-end research workflow
–Evaluate re-purposing of datasets: identify the significant properties which facilitate re-use
–Develop Disciplinary Case Studies
Technical Integration and Interoperability
Synthesis
–Data are highly complex and diverse
–Data discovery to delivery
–Standards, standards, standards, standards…
–Value of generic data models, metadata application profiles?
Recommendations
–Identifiers and data citation best practice
–Version control of datasets
–Annotation models and standards best practice
–Bi-directional interdisciplinary linking between data objects and derived resources
Microarray data to inform gene expression
–Consensus on community standards: MIAME
–Data pipelines at source via Laboratory Information Management Systems (LIMS)
–User tools (MIAMExpress) & value-added services
–Annotation of data using the Gene Ontology
–Submission & deposit is embedded in community culture: requirement for publication
–Training programme; eLearning materials coming
–This level of data curation is expensive!!
[Diagram. Source: Graham Cameron, EBI]
–Reactome
–EnsEMBL: Genome Annotation
–EMBL-Bank: DNA sequences
–UniProt: Protein Sequences
–ArrayExpress: Microarray Expression Data
–EMSD: Macromolecular Structure Data
–IntAct: Protein Interactions
[Diagram. Source: Graham Cameron, EBI]
Category labels:
–Core biomolecular resources
–Specialist biomolecular data resource examples
–Model organism resource examples
–Large resources in related disciplines
Resource labels:
–Flybase, MGD, SGD, BRENDA, IMGT, Pasteur DBs, Eumorphia/Phenotypes, Mutants, Mouse Atlas
–Chemical data resources, Medical data resources, Biodiversity data resources
Legal and Ethical Issues
Synthesis
–IPR is a barrier to data sharing, e.g. geospatial data, performing arts
–We need a better understanding of the issues
Recommendations
–JISCLegal provide enhanced advice about data and IPR
–Develop model licences with other organisations

Sustainability
Synthesis
–Are current economic models for preservation & data sharing infrastructure a) appropriate? b) adequate? c) sustainable?
–Should inform research prioritisation and investment
Recommendations
–Cost-benefit study
–Construct new economic models
Advocacy
Synthesis
–Programmes need to reach across sectors
–Harmonisation and consistent messages
–Researcher has some curatorial responsibility
Recommendations
–UK co-ordination, targeting specific disciplines

Training and Skills
Synthesis
–Leverage library & archive experience, EU projects DPE and PLANETS
–Data curators and native data scientists
Recommendations
–Co-ordination: in the UK
–Review career development of data scientists
–Assess value of data handling and curation in the curriculum
UK Digital Curation Centre http://www.dcc.ac.uk/
Scientist: creation and use of data
Rights
–Of first use.
–To be acknowledged.
–To expect IPR to be honoured.
–To receive data training and advice.
Responsibilities
–Manage data for life of project.
–Meet standards for good practice.
–Comply with funder / institutional data policies and respect IPR of others.
–Work up data for use by others.
Relationships
–With institution as employee.
–With subject community.
–With data centre.
–With funder of work.
[Image caption: Baroness Susan Greenfield, UK]
Institution: curation of and access to data
Rights
–To be offered a copy of data.
Responsibilities
–Set internal data management policy.
–Manage data in the short term.
–Meet standards for good practice.
–Provide training and advice to support scientists.
–Promote the repository service.
Relationships
–With scientist as employer.
–With data centre through expert staff.
[Image: http://www.flickr.com/photos/nrparmar/383549700/in/pool-bath-uni/]
Data centre: curation of and access to data
Rights
–To be offered a copy of data.
–To select data of long-term value.
Responsibilities
–Manage data for the long term.
–Meet standards for good practice.
–Provide training for deposit.
–Promote the repository service.
–Protect rights of data contributors.
–Provide tools for re-use of data.
Relationships
–With scientist as client.
–With user communities.
–With institution through expert staff.
–With funder of service.
User: use of 3rd-party data
Rights
–To re-use data (non-exclusive licence).
–To access quality metadata to inform usability.
Responsibilities
–Abide by licence conditions.
–Acknowledge data creators / curators.
–Manage derived data effectively.
Relationships
–With data centre as supplier.
–With institution as supplier.
[Image caption: GridPP computing facilities at Imperial College, London]
Funder: set/react to public policy drivers
Rights
–To implement data policies.
–To require those they fund to meet policy obligations.
Responsibilities
–Consider wider public-policy perspective & stakeholder needs.
–Participate in strategy co-ordination.
–Develop policies with stakeholders.
–Participate in policy co-ordination, joint planning & fund service delivery.
–Monitor and enforce data policies.
–Resource post-project long-term data management.
–Act as advocate for data curation & fund expert advisory service(s).
–Support workforce capacity development of data curators.
Relationships
–With scientist as funder.
–With institution.
–With data centre as funder.
–With other funders.
–With other stakeholders as policy-maker and funder of services.
Publisher: maintain integrity of the scientific record
Rights
–To expect data are available to support publication.
–To request pre-publication data deposit in long-term repository.
Responsibilities
–Engage stakeholders in development of publication standards.
–Link to data to support publication standards.
–Monitor & enforce publication standards.
Relationships
–With scientist as creator, author and reader.
–With data centres and institutions as suppliers.
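The Relationships entries on the six role slides above can be read as a small directed graph. The following is a minimal Python sketch, not part of the original presentation, that records which of the six roles each slide names and lists the links that are named in one direction only. The dictionary keys, the function name `one_sided_links`, and the mapping of "user communities" to the User role are assumptions made for illustration; relationships with parties outside the six roles (subject community, other funders, other stakeholders) are left out.

```python
from typing import Dict, List, Set, Tuple

# Which of the six roles each slide names under "Relationships".
# Assumption: "user communities" on the data centre slide is mapped to "user".
# Parties outside the six roles (e.g. subject community) are omitted.
RELATIONSHIPS: Dict[str, Set[str]] = {
    "scientist": {"institution", "data centre", "funder"},
    "institution": {"scientist", "data centre"},
    "data centre": {"scientist", "user", "institution", "funder"},
    "user": {"data centre", "institution"},
    "funder": {"scientist", "institution", "data centre"},
    "publisher": {"scientist", "data centre", "institution"},
}


def one_sided_links(relationships: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (role, partner) pairs where role names partner but not vice versa."""
    missing = []
    for role, partners in relationships.items():
        for partner in partners:
            # A link is reciprocal only if the partner's slide names the role back.
            if role not in relationships.get(partner, set()):
                missing.append((role, partner))
    return sorted(missing)


if __name__ == "__main__":
    for role, partner in one_sided_links(RELATIONSHIPS):
        print(f"{role} names {partner}, but {partner} does not name {role}")
```

Under these assumptions, a one-sided link only reflects how the slides are worded; it may simply mean a relationship was left implicit on the other role's slide.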
The Dealing with Data report will be published shortly at www.ukoln.ac.uk