The Central Role of Data ‘Capturing and Sharing Chemistry Research Data’ Simon Coles School of Chemistry, University of Southampton, U.K. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0
Current Situation - Data Generation Synthesis Characterisation
Current Situation – Data Management “Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant” “Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits” “To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data” “Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.” Source: ‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)
Current Situation – Data and Publishing
Separating Data from Interpretations Underlying data (Institutional data repository) Intellect & Interpretation (Journal article, report, etc)
Smart Labs
Laboratory IRs and Information Management
The R4L Repository Deposit Search / Browse Create new compound Add experiment data and metadata
Blogging Experiments. A repository can… allow one to put, store and get digital objects; provide minimal search and browse functions; but NOT provide the presentation and discussion functions essential to a scientific study. Social networking tools and approaches can provide a way…
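The "put, store and get" and "minimal search and browse" functions named on this slide can be illustrated with a small in-memory sketch. It is not the R4L or eCrystals implementation: the class name `MinimalRepository`, its method names, and the content-hash identifiers are all assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MinimalRepository:
    """Hypothetical in-memory store: put, get, browse, search and nothing more."""

    _objects: dict = field(default_factory=dict)   # object id -> raw bytes
    _metadata: dict = field(default_factory=dict)  # object id -> metadata dict

    def put(self, data: bytes, metadata: dict) -> str:
        # Assumption: the identifier is derived from a hash of the content.
        object_id = hashlib.sha256(data).hexdigest()[:12]
        self._objects[object_id] = data
        self._metadata[object_id] = dict(metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def browse(self) -> list:
        # Browsing is just a sorted listing of identifiers.
        return sorted(self._objects)

    def search(self, term: str) -> list:
        # Case-insensitive substring match over metadata values.
        term = term.lower()
        return sorted(
            oid
            for oid, md in self._metadata.items()
            if any(term in str(value).lower() for value in md.values())
        )


repo = MinimalRepository()
oid = repo.put(b"unit cell data", {"compound": "Aspirin", "technique": "X-ray"})
print(repo.get(oid), repo.search("aspirin") == [oid])
```

Nothing in this interface supports presentation or discussion of the stored objects, which is the gap the slide suggests social networking tools could fill.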
Facilitating Research Facilitates ‘geographically distributed collaborative research’ Useful approach for sharing ‘failed’ experiments?
Machines Blogging Experiments Automatic upload by scientific instrument
Comments and Annotation A picture says a thousand words! Chemists like to sketch! Need for more advanced Blog tools / technology
Current Situation - Data Deluge [figure; first value 30,000, remaining values lost in extraction]
Laboratory Data Management and Archive
The eCrystals Public Data Archive
NCS Data Publication Policy. Joint publication: timed release of data tied to a conventional journal article. Separate publication: independent release of data so that it can be cited, e.g. from a journal article, grant report or poster. ‘Accidental’ or ‘undesired’ results: immediate release after agreement with concerned parties. Results never to be formally published: automatic release after three years. Embargo feature: default 3 years, but the timescale can be defined by the depositor. A record can be made public at any time (following agreement from all concerned parties). Roles of all concerned parties are defined (originator, etc). Data citation, DOI, Rights.
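The release rules on this slide can be read as a small decision procedure. The sketch below is one possible reading, not the NCS implementation. The names `Route`, `Record` and `release_date` are hypothetical. The slide does not say when a separately published dataset is released, so the sketch assumes it falls back to the embargo end. It also assumes that agreement from all parties can bring any release forward.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

DEFAULT_EMBARGO_YEARS = 3  # default stated on the slide; depositor may change it


class Route(Enum):
    JOINT = "joint"              # release tied to a journal article
    SEPARATE = "separate"        # independent, citable release
    ACCIDENTAL = "accidental"    # released once the parties agree
    UNPUBLISHED = "unpublished"  # never to be formally published


@dataclass
class Record:
    deposited: date
    route: Route
    embargo_years: int = DEFAULT_EMBARGO_YEARS
    article_published: Optional[date] = None
    parties_agreed_on: Optional[date] = None


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a non-leap target year falls back to 28 February.
        return d.replace(year=d.year + years, day=28)


def release_date(record: Record) -> Optional[date]:
    """Earliest date the record becomes public, or None if not yet known."""
    candidates = []
    if record.parties_agreed_on is not None:
        # "Record can be made public at any time" once all parties agree.
        candidates.append(record.parties_agreed_on)
    if record.route is Route.JOINT and record.article_published is not None:
        candidates.append(record.article_published)
    if record.route in (Route.SEPARATE, Route.UNPUBLISHED):
        # Assumption: SEPARATE uses the embargo end, like UNPUBLISHED.
        candidates.append(add_years(record.deposited, record.embargo_years))
    return min(candidates) if candidates else None


print(release_date(Record(date(2008, 1, 15), Route.UNPUBLISHED)))
```

An ‘accidental’ result with no recorded agreement returns `None`, reflecting that its release waits on the concerned parties rather than on a timer.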
Linking and aggregating Link data and associated ‘publications’ Dataset annotated with metadata Semantic publishing on WWW and in journals nals/ProjectProspect/index.asp bank-uk/pilot/
eCrystals ‘Global Federation’ Model. Components: Data creation & capture in “Smart lab”; Laboratory repository; Institutional data repositories; Institution Library & Information Services; Subject Repository; Aggregator services; Presentation services / portals; Publishers (peer-review journals, conference proceedings, etc). Flow labels: Deposit; Validation; Publication; Data analysis; Search, harvest; Data discovery, linking, citation; Curation; Preservation.
Changing Times! Information Providers Information Consumers All I am saying is that now is the time to develop the technology to deflect an asteroid