Curation of Chemistry Data from the Laboratory to Publication
Jeremy Frey & Simon Coles, School of Chemistry, University of Southampton

Presentation transcript:

Curation of Chemistry Data from the Laboratory to Publication
Jeremy Frey & Simon Coles
School of Chemistry, University of Southampton

AHM2006 Data Curation Workshop 2
The CombeChem Project
• End-to-end linking of data and information
• Laboratory to publication and back again
• Very long data chains can be involved, e.g. from a chemistry lab to mouse genetic expression
• The exponential world of combinatorial synthesis and high-throughput analysis meets the exponentially growing power of computing
• “Automation, Semantics & the Grid”

AHM2006 Data Curation Workshop 3
[Diagram labels: Plan & COSHH, Digital Model, Information Integration, Report, Knowledge, Goal, Literature, Synthesis, Analysis]
Not just one laboratory but many co-laboratories working together
Smart Laboratory, Smart Storage, Smart Dissemination, Smart HCI

AHM2006 Data Curation Workshop 4
Problems with ‘Small Laboratory’ Working Practice
“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”
“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”
“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”
“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”
‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

AHM2006 Data Curation Workshop 5
The concept of
• Trace all the way back from publication to the original data – provenance
• The data is the key - DataGrid
• Start as you mean to go on – ELNs are a necessity
• Curation of subsequently produced data
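The idea of tracing from a publication back to the original data can be sketched as a walk over "derived from" links. This is only an illustration: the identifiers, the DERIVED_FROM table and the trace_to_raw helper are hypothetical and not taken from the CombeChem software, and the links are assumed to contain no cycles.

```python
# Hypothetical provenance links: each item lists what it was derived from.
DERIVED_FROM = {
    "paper:fig2": ["analysis:fit-07"],
    "analysis:fit-07": ["processed:spectrum-12"],
    "processed:spectrum-12": ["raw:instrument-run-345"],
    "raw:instrument-run-345": [],  # raw data has no recorded source
}


def trace_to_raw(item, links):
    """Return every path from `item` back to items with no recorded source.

    Assumes the links form an acyclic graph.
    """
    sources = links.get(item, [])
    if not sources:
        return [[item]]
    paths = []
    for source in sources:
        for tail in trace_to_raw(source, links):
            paths.append([item] + tail)
    return paths


paths = trace_to_raw("paper:fig2", DERIVED_FROM)
for path in paths:
    print(" <- ".join(path))
```

Such a chain can only be followed if every intermediate step was recorded when it happened, which is the argument for starting with an electronic lab notebook.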

AHM2006 Data Curation Workshop 6
Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook.
If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA.

AHM2006 Data Curation Workshop 7
Lab books are a big block to
If it’s not digital, it is more difficult to share
Need a usable digital lab book
Design by analogy to help chemists and computer scientists work together
Only some equipment is networked
This is where it all starts: The Lab & The Lab Book

AHM2006 Data Curation Workshop 8
COSHH: leverage off things we already have to do

AHM2006 Data Curation Workshop 9
Plan, Process, Record

AHM2006 Data Curation Workshop 11
getRecord()
There is a potential containment problem in pulling partial RDF graphs back from the triple store: where should the returned subgraph stop? This was solved by using multiple triple stores, but boundaries remain a major issue for the future.
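The containment question can be illustrated with a small sketch. The slide does not say how getRecord() decides where a record ends, so the rule used here (follow blank nodes, stop at named resources) is an assumption borrowed from the general idea of a bounded description; get_record, is_blank and the ex: identifiers are all hypothetical.

```python
def is_blank(node):
    """Treat identifiers starting with '_:' as blank (anonymous) nodes."""
    return isinstance(node, str) and node.startswith("_:")


def get_record(subject, triples):
    """Collect the triples about `subject`, following blank-node objects only.

    Named resources are kept as references but not expanded, which is one
    possible answer to "where does the partial graph stop?".
    """
    record, queue, seen = [], [subject], set()
    while queue:
        node = queue.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:
                record.append((s, p, o))
                if is_blank(o):
                    queue.append(o)
    return record


# Hypothetical store contents, written as (subject, predicate, object).
STORE = [
    ("ex:sample1", "ex:hasMeasurement", "_:m1"),
    ("_:m1", "ex:value", "78.4"),
    ("_:m1", "ex:unit", "ex:degC"),
    ("ex:sample1", "ex:madeBy", "ex:alice"),
    ("ex:alice", "ex:memberOf", "ex:groupA"),
]

record = get_record("ex:sample1", STORE)
```

With this rule the measurement's value and unit come back with the sample, while the description of ex:alice stays in the store. A different boundary rule would return a different record, which is the difficulty the slide points at.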

AHM2006 Data Curation Workshop 12
Architecture
[Diagram labels: SURIG; Data stores; Semantic Data; Other services; Weights & Measures; Bench; Planner0; Viewer0; PHP; Java; “Client” Libraries; SOAP; Jena; SURIG Applications; Institutional archives and metadata publication]

AHM2006 Data Curation Workshop 13
The Analytical Laboratory
• Capture information from places you would not want to put your eyes
• Capture environmental data automatically
• Capture people and movements
• Provide this information in real time as well as for the laboratory record

AHM2006 Data Curation Workshop 14
Pub-Sub systems provide the flexible & extensible approach to distribution
[Diagram labels: Data Source; Data Source; Message Broker; Translator Service; Archive Client; Web Client; Mobile phone; PDA; BLOG]
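A minimal in-process sketch of the publish-subscribe pattern named on this slide follows. It is not the project's broker: the MessageBroker class, the topic name and the 30 °C alert threshold are all assumptions made for illustration, and a real deployment would use a networked broker.

```python
from collections import defaultdict


class MessageBroker:
    """Toy broker: data sources publish to a topic, clients subscribe to it."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every client subscribed to this topic.
        for callback in self._subscribers[topic]:
            callback(message)


archive = []  # stands in for the archive client
alerts = []   # stands in for a mobile phone or PDA client


def overheating_alert(message):
    # The 30 °C threshold is an arbitrary value chosen for this example.
    if message["celsius"] > 30:
        alerts.append(message)


broker = MessageBroker()
broker.subscribe("lab/temperature", archive.append)
broker.subscribe("lab/temperature", overheating_alert)

broker.publish("lab/temperature", {"sensor": "room", "celsius": 21.5})
broker.publish("lab/temperature", {"sensor": "laser", "celsius": 34.0})
```

The data source never needs to know which clients exist, so a new consumer such as a blog writer can be added by subscribing, without touching the sensor code.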

AHM2006 Data Curation Workshop 15
Temperature – room, laser
Door & interlock, motion sensors
Air conditioning failed

AHM2006 Data Curation Workshop 16
Databases - Our experience
• What do you do when the actual users keep changing their minds?
• Is a traditional relational database suitable?
• Danger of reinforcing scientific bias against relational databases for laboratory data.
• RDF & triple stores were again the solution
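One way to see why triples cope with changing requirements: a new kind of property is just another (subject, predicate, object) statement, so nothing like a table migration is needed. The sketch below uses plain Python tuples rather than a real triple store, and every identifier in it is hypothetical.

```python
triples = set()


def add(subject, predicate, obj):
    """Record one statement; no schema has to be declared beforehand."""
    triples.add((subject, predicate, obj))


def describe(subject):
    """Return all known properties of `subject` as predicate -> list of values."""
    props = {}
    for s, p, o in triples:
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


# Initial requirement: record melting points.
add("ex:sample1", "ex:meltingPointC", 78.4)

# Later the users also want solvent and crystal colour. These are simply
# new predicates; the existing data and code are untouched.
add("ex:sample1", "ex:solvent", "ethanol")
add("ex:sample2", "ex:crystalColour", "yellow")

print(describe("ex:sample1"))
```

The trade-off is that consistency checks a relational schema would enforce have to be supplied elsewhere, for example by an RDFS schema.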

AHM2006 Data Curation Workshop 17
RDF/RDFS high-level schema for chemical properties

AHM2006 Data Curation Workshop 19
Triple Stores - The Heart of the Semantic Web
Scaling - 3Store response
Memory leak in testing program!

AHM2006 Data Curation Workshop 20
Scaling the triplestores
Moved from…
• A model of harvesting data from multiple sources into one scalable store
to
• A model of distributed RDF sources and caching what is needed for the task at hand into multiple fit-for-purpose stores
The Semantic Web!
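The second model can be sketched as follows: rather than copying every source into one store, a small task-specific store is filled with only the statements the task needs. The source names, the triples and the build_task_store helper are hypothetical; real sources would be remote RDF endpoints rather than in-memory lists.

```python
# Hypothetical distributed sources, each holding (subject, predicate, object).
SOURCES = {
    "crystallography": [
        ("ex:c1", "ex:spaceGroup", "P21/c"),
        ("ex:c2", "ex:spaceGroup", "P-1"),
    ],
    "spectroscopy": [
        ("ex:c1", "ex:ramanPeak", "1001"),
        ("ex:c3", "ex:ramanPeak", "520"),
    ],
}


def build_task_store(subjects, sources):
    """Cache only the triples about the subjects needed for the current task."""
    wanted = set(subjects)
    return [
        triple
        for source_triples in sources.values()
        for triple in source_triples
        if triple[0] in wanted
    ]


# A task that concerns only compound ex:c1 gets a small, fit-for-purpose store.
task_store = build_task_store(["ex:c1"], SOURCES)
```

Each task store stays small regardless of how large the sources grow, at the cost of having to decide when a cached copy has gone stale.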

AHM2006 Data Curation Workshop 21
Experiments on the Grid: The NCS Service
HTTPS

AHM2006 Data Curation Workshop 22
Binary raw data archived in Atlas Datastore
x300 ADS £’s

AHM2006 Data Curation Workshop 23
A Data-Rich Subject – the Crystallography Problem
30,000, ,000, ,000

AHM2006 Data Curation Workshop 24
The eCrystals Digital Repository

AHM2006 Data Curation Workshop 25
Access to the underlying data

AHM2006 Data Curation Workshop 26
The eCrystals ‘Global’ Model
[Diagram labels: Aggregator services; Institutional data repositories; Validation; Deposit; Publishers: peer-review journals, conference proceedings, etc.; Publication; Validation; Data analysis, transformation, mining, modelling; Search, harvest; Presentation services / portals; Data discovery, linking, citation; Laboratory repository; Deposit; Preservation and curation]

AHM2006 Data Curation Workshop 27
Laboratory Repositories and Information Management

AHM2006 Data Curation Workshop 28
Need for a data archive in the laboratory
Not just the published spectra!

AHM2006 Data Curation Workshop 29
The R4L Repository
Deposit
Search / Browse
Create new compound
Add experiment data and metadata
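The operations listed on this slide (create a compound, add experiment data and metadata, search) suggest a data model along the following lines. This is a guess at the shape of such a repository, not the R4L implementation: the class names, fields, file names and the search criterion are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    technique: str                      # e.g. "Raman" or "XRD"
    data_file: str                      # path or identifier of the raw data
    metadata: dict = field(default_factory=dict)


@dataclass
class Compound:
    name: str
    experiments: list = field(default_factory=list)


class LabRepository:
    """Toy in-memory repository keyed by compound name."""

    def __init__(self):
        self._compounds = {}

    def create_compound(self, name):
        # Returns the existing compound if one with this name was already created.
        return self._compounds.setdefault(name, Compound(name))

    def add_experiment(self, name, experiment):
        self._compounds[name].experiments.append(experiment)

    def search(self, technique):
        """Names of compounds with at least one experiment using `technique`."""
        return [
            compound.name
            for compound in self._compounds.values()
            if any(e.technique == technique for e in compound.experiments)
        ]


repo = LabRepository()
repo.create_compound("compound-A")
repo.add_experiment("compound-A", Experiment("Raman", "raman_001.dat", {"laser_nm": 785}))
repo.add_experiment("compound-A", Experiment("XRD", "xrd_001.cif"))
repo.create_compound("compound-B")
found = repo.search("Raman")
```

Keeping metadata alongside each data file at deposit time is what later allows the data, and not just the published spectra, to be found and reused.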

AHM2006 Data Curation Workshop 30
Several groups making and analysing; the library
Administrative domains: transfer or share the data
[Diagram labels: Researcher; National Archive; Research Group; Institution; International Database; Research Group]

AHM2006 Data Curation Workshop 31
SVG “active” graphics
Link to data, follow links back to the raw data archive
Link to simulation, full simulation data archived in BioSimGrid
R4L
Paper organized using RDF

AHM2006 Data Curation Workshop 32
Summary:
• Making sure other people can find, understand and re-use your data easily and with confidence (even when there is a huge amount of it!)
• Make use of Plans to inform the digital context - metadata in advance
• Have concern for the “End-to-End life cycle” of chemistry information from the start.
• Understanding Usability and Human Computer Interaction is vital for adoption