Publishing Data Catherine Jones Library Systems Development Manager, STFC Rutherford Appleton Laboratory CLADDIER workshop, Chilworth, Southampton, UK 15th May 2007
Contents Set the scene Definition of publication Complexities Making data permanently available Quality control User requirements Issues
Microsoft's Science 2020 Report Modern scientific communication relies on both journals and databases. At present these are not integrated. By 2020 mutual linking will be commonplace, and publications containing only peer-reviewed data will become available. http://research.microsoft.com/towards2020science/downloads.htm
Publication concept In this context publication is defined as the process through which data is fixed and made retrievable over the long term, and may imply that there has been some quality control process.
Complexities of Data These all show the same data at different levels of processing.
Making data permanently available Three areas: 1. Defining what is to be kept: encapsulation 2. Ensuring that it is described effectively: metadata 3. Identifying who is responsible for the data management: trusted repository
Encapsulation A method of identifying a fixed collection of meaningful data so that it can be preserved as a clearly defined, unchanging entity. Points to consider: datasets which are still growing; versions of datasets; format translations.
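One way to make "a clearly defined unchanging entity" concrete is to record a checksum over a snapshot of the dataset. The sketch below is an illustration only, not something described in the talk: it assumes a snapshot can be represented as a mapping of file names to bytes, and the function name `snapshot_digest` and the file names are hypothetical.

```python
import hashlib
from typing import Dict


def snapshot_digest(files: Dict[str, bytes]) -> str:
    """Return one SHA-256 digest covering every file name and its content."""
    combined = hashlib.sha256()
    for name in sorted(files):  # sort so the result does not depend on dict order
        file_digest = hashlib.sha256(files[name]).hexdigest()
        combined.update(f"{name}:{file_digest}\n".encode("utf-8"))
    return combined.hexdigest()


# Hypothetical snapshots: v2 is v1 after the dataset has grown by one file.
v1 = {"profile_2004.csv": b"height,wind\n2.0,5.1\n"}
v2 = {**v1, "profile_2005.csv": b"height,wind\n2.0,4.7\n"}

print(snapshot_digest(v1) == snapshot_digest(dict(v1)))  # identical content
print(snapshot_digest(v1) == snapshot_digest(v2))        # grown dataset
```

Under this assumption, a growing dataset or a new version yields a different digest, so each cited snapshot can be told apart. A format translation would also change the digest, which is one reason a checksum alone may not settle what counts as "the same" data.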
Metadata Needs to be created to ensure that the data is usable now and over the long term. Semantic encapsulation is important as this is likely to be used in citation.
Trusted repository To ensure that the data is available over the long term, the Data Centre needs to be on a secure footing and well managed.
Quality Control Usability of the dataset. This is one of the roles of the Data Centres. Usefulness of the dataset. This is the role of domain experts.
User requirements for citation 1. Need for an unambiguous reference to a well-defined permanent entity 2. This reference/citation needs to be understandable for humans 3. Author and publication year, or equivalents, are important 4. An unambiguous data reference in this area includes the activity or tool which produced the data 5. Source of the data (i.e. the repository) may be as important as the producer and needs to be unambiguous
Requirements from data producers 1. Traceable to the data provider/producer 2. Usable for usage metrics 3. To be recognised as intellectually equivalent to academic papers 4. Able to be used to search for papers citing data
Citation format Author, title, [medium], publisher, publication date, identifier, feature, [access date, available at] Example: Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. British Atmospheric Data Centre (BADC), 1990- urn badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205 [http://featuretype.registry/VerticalProfile] [cited 2006 Apr 25. Available from http://badc.nerc.ac.uk/data/mst.]
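The citation template can be sketched as a small record type. This is a minimal illustration, assuming that the bracketed elements of the template are optional and that a comma separates every element. The class name `DataCitation`, its field names and the `format` method are hypothetical, and the punctuation will not match the slide's example exactly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    # Required elements of the template.
    author: str
    title: str
    publisher: str
    publication_date: str
    identifier: str
    feature: str
    # Bracketed elements, assumed here to be optional.
    medium: Optional[str] = None
    feature_type: Optional[str] = None
    access_date: Optional[str] = None
    available_at: Optional[str] = None

    def format(self) -> str:
        """Join the elements in template order, skipping absent optional ones."""
        parts = [self.author, self.title]
        if self.medium:
            parts.append(f"[{self.medium}]")
        parts += [self.publisher, self.publication_date, self.identifier]
        feature = f"feature {self.feature}"
        if self.feature_type:
            feature += f" [{self.feature_type}]"
        parts.append(feature)
        text = ", ".join(parts)
        if self.access_date and self.available_at:
            text += f" [cited {self.access_date}. Available from {self.available_at}]"
        return text


# Values taken from the example citation on the slide.
citation = DataCitation(
    author="Natural Environment Research Council",
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    medium="Internet",
    publisher="British Atmospheric Data Centre (BADC)",
    publication_date="1990-",
    identifier="urn badc.nerc.ac.uk/data/mst/v3/upd15032006",
    feature="200409031205",
    feature_type="http://featuretype.registry/VerticalProfile",
    access_date="2006 Apr 25",
    available_at="http://badc.nerc.ac.uk/data/mst",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what would let a citation be traced to its producer or counted in usage metrics, as the producer requirements ask.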
Issues for consideration The ability to cite data is strongly linked to the definition of the data. Dynamic datasets pose additional issues for long-term accessibility. Versioning of the data and the processing/analysing software are big issues to resolve. Peer review of the data is important. Identification of datasets where a facility may provide data from a set of instruments is a complex decision.