‘The eCrystals Federation’ Management and Publication of Small Molecule Structure Data for the Whole Crystallographic Community

S.J. Coles(a)*, J.G. Frey(a), M.B. Hursthouse(a), A.J. Milsted, L. Carr(b), M. Duke(c), T. Koch(c) & E. Lyon(c)
(a) School of Chemistry; (b) School of Electronics and Computer Science, University of Southampton, UK; (c) UKOLN, University of Bath, UK.

The Publication Problem

The UK funding councils recently stated that ‘the data underpinning the published results of publicly-funded research should be made available as widely and rapidly as possible’. Thirty years ago a research student would present about five crystal structures in a PhD thesis; with modern technologies and good crystals, this can now be achieved in a single morning. This increase in the pace of generation exacerbates a problem in the communication of the results. Additionally, the usual route for publishing a crystal structure report is coupled with, and often governed by, the underlying chemistry. It is therefore subject to the lengthy peer review process and tied to the timing of the publication as a whole. This bottleneck in the dissemination of crystal structure data hinders the potential growth of databases: just 500,000 small unit cell crystal structures are available in the CSD, ICSD & CRYSMET, while it is estimated that at least three times this number have been determined in laboratories worldwide. In addition, publication in the mainstream literature still offers only indirect (and often subscription controlled) access to this data.

The Institutional Data Archive

The eBank-UK project has addressed the publication problem by establishing an institutional data archive. On one hand this archive is capable of supporting and managing ALL the digital files generated during the course of a crystallographic experiment.
On the other hand it is capable of acting as a dissemination tool, by making metadata relating to these crystallographic datasets available in the public domain. This process alters the traditional method of peer review by openly providing crystal structure data, so that the reader or user may directly check correctness and validity. The repository makes available all the raw, derived and results data from a crystallographic experiment with little further researcher effort after the creation of a normal completed structure in a laboratory archive. Not only does this approach allow rapid release of crystal structure data into the public domain, but it also provides a mechanism for the construction of value added services that allow rapid discovery of the data for further studies and reuse, whilst ownership of the data is retained by the creator.

Access to ALL the data

The archive makes available all results data, including a CML file and a CIF accompanied by a CHECKCIF validation output. Important files generated during the experiment are also provided, e.g. final refinement listings, details of all scans and corrections performed, and precession photographs. Thus COMPLETE details of all the steps undertaken during the analysis are provided and anyone wishing to reuse the structure can fully assess its validity.

Getting data into the archive

The archive is configured to recognise all the files generated during the course of a crystallographic experiment. Numerous operations and file format conversions are also needed to generate an archive compliant dataset. A ‘toolbox’ has been created which performs these operations automatically, so that minimal human error is introduced. Further metadata are associated with the dataset during the deposition process by means of a simple interface.
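The file recognition step described above can be illustrated with a minimal sketch. The text does not give the archive's actual configuration, so the extension-to-stage mapping, the stage names, and the function name `group_by_stage` are all hypothetical and serve only to show the idea of sorting experiment files into stages before deposition.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to experiment stage.
# The real archive configuration is not given in the text.
STAGE_BY_EXTENSION = {
    ".hkl": "processing",
    ".res": "refinement",
    ".lst": "refinement",
    ".cif": "results",
    ".cml": "results",
}


def group_by_stage(filenames):
    """Group file names by the (assumed) stage that produced them."""
    grouped = defaultdict(list)
    for name in filenames:
        # Compare extensions case-insensitively, e.g. ".CIF" == ".cif".
        extension = PurePath(name).suffix.lower()
        stage = STAGE_BY_EXTENSION.get(extension, "unrecognised")
        grouped[stage].append(name)
    return dict(grouped)
```

Files that fall into the "unrecognised" group could be flagged to the depositor rather than silently dropped, which fits the stated aim of keeping human error to a minimum.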
Data aggregation services

Information providers may regularly probe the archive interface for new or updated entries and download the associated metadata. These services can then ‘aggregate’ the metadata, that is, perform linking and cross referencing exercises that enable the researcher to navigate seamlessly through the literature. Data-based aggregators may discover relevant data and download it for indexing and inclusion in their collections. Additionally, subject or academic literature based services may use the harvested metadata to associate a dataset with other relevant works in the literature.

Publicising and harvesting content

Metadata relating to the dataset are made available to a public interface via a digital libraries protocol (OAI-PMH) that enables third parties to ‘harvest’ information on the content of the archive. Primary bibliographic data (e.g. title (IUPAC name), authors & affiliation) are provided, in addition to chemical metadata (e.g. International Chemical Identifier (InChI), empirical formula, compound class & keywords). The dataset is registered with a persistent identifier (DOI) which enables the generation of a permanent citation. The OAI-PMH metadata also state which aspects of the experimental process contain files, so that a harvester may assess whether an entry is appropriate for the desired purpose.

The Future: Institutional Support, Further Deployments & Third Party Services

eCrystals has been devised as part of a project addressing the challenge of whole-lifecycle use of data, by investigating the role of aggregator services in linking datasets to peer reviewed articles. UKOLN (University of Bath) and the eCrystals team have designed a prototype service that takes metadata harvested from the archive and aggregates it with the primary crystallographic literature (IUCr journals).
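The harvesting of metadata over OAI-PMH mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's harvester: the base URL `https://archive.example.org/oai`, the sample response, and the function names are hypothetical. The `ListRecords` verb, the `metadataPrefix` and `from` arguments, and the XML namespaces are those defined by the OAI-PMH 2.0 and Dublin Core specifications. The sketch parses a canned response instead of contacting a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response; a real archive would return many records.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example compound</dc:title>
          <dc:creator>A. Researcher</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request, optionally limited to recent changes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records
```

An aggregator could call `list_records_url` with the date of its last harvest to request only new or updated entries, then pass each response body to `parse_records` before linking the entries to the literature.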
Future work in this area will focus on enabling data-based harvesters to automatically harvest datasets so that eCrystals entries can be indexed and incorporated into subject specific databases (e.g. CSD). The prototype service will be developed further to provide a mechanism to aggregate datasets with the broader chemical literature and other bodies of publicly available chemical information.

Current developments include securing backing from host institutions. We are in the final stages of making an agreement with the University of Southampton to support this archive as part of its Institutional Repository scheme, hosted by our Library and Information Services department. In addition, further installations in other institutions are planned so that a ‘federation’ of archives can be constructed, which will enrich the content of third party aggregator services and promote their development.

The authors would like to acknowledge and thank the Joint Information Systems Committee (JISC) for funding this project.