The Data Lifecycle and the Curation of Laboratory Experimental Data Tony Hey Corporate VP for Technical Computing Microsoft Corporation.


1 The Data Lifecycle and the Curation of Laboratory Experimental Data Tony Hey Corporate VP for Technical Computing Microsoft Corporation

2 Jeremy G. Frey University of Southampton Combechem Smart Lab R4L eBank E-Malaria Instruments on the Grid BioSimGrid Statistics

3 The Combechem Project National X-Ray Service Data Mining and Analysis Automatic Annotation Combinatorial Chemistry Wet Lab HPC Simulation Video Data Stream Diffractometer Middleware Structures Database

4 National Crystallographic Service
- Send sample material to NCS service
- Search materials database and predict properties using Grid computations
- Download full data on materials of interest
- Collaborate in e-Lab experiment and obtain structure

5 A digital lab book replacement that chemists were able to use, and liked

6 Monitoring laboratory experiments using a broker delivered over GPRS on a PDA

7 Crystallographic e-Prints
- Direct access to raw data from scientific papers
- Raw data sets can be very large, so they are stored at the UK National Datastore using SRB software

8 Entire e-Science Cycle: encompassing experimentation, analysis, publication, research, learning (eBank Project)
[Diagram labels: Institutional Archive; Local Web; Publisher Holdings; Digital Research Repository; Graduate Students; Undergraduate Students; Virtual Learning Environment; e-Experimentation; e-Scientists; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]

9 Key Issues
- The Data Life Cycle
- From Acquisition to Preservation
- Scholarly Communication
- Access to Data and Publications

10 The Scientific Data Life Cycle
- Data Acquisition
- Data Ingest
- Metadata
- Annotation
- Provenance
- Data Storage
- Data Cleansing
- Data Mining
- Curation
- Preservation
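
As an illustration of how a few of these stages fit together, the following is a minimal Python sketch, not anything shown in the talk. It assumes a hypothetical `DataRecord` type and hypothetical `ingest` and `cleanse` helpers, and shows metadata being attached at ingest while a provenance trail grows as the data is cleansed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataRecord:
    """Hypothetical container for readings plus lifecycle information."""
    values: List[Optional[float]]
    metadata: Dict[str, str] = field(default_factory=dict)
    annotations: List[str] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)


def ingest(values: List[Optional[float]], instrument: str) -> DataRecord:
    """Data ingest: wrap raw readings and record where they came from."""
    record = DataRecord(values=list(values), metadata={"instrument": instrument})
    record.provenance.append(f"ingested from {instrument}")
    return record


def cleanse(record: DataRecord) -> DataRecord:
    """Data cleansing: drop missing readings and log the step.

    Returns a new record so the ingested original is left untouched.
    """
    kept = [v for v in record.values if v is not None]
    dropped = len(record.values) - len(kept)
    return DataRecord(
        values=kept,
        metadata=dict(record.metadata),
        annotations=list(record.annotations),
        provenance=record.provenance + [f"cleansed: dropped {dropped} missing readings"],
    )


# Assumed instrument name, for illustration only.
raw = ingest([1.0, None, 3.0], "diffractometer-01")
clean = cleanse(raw)
clean.annotations.append("sample looks well ordered")
```

Keeping each stage as a function that returns a new record means the provenance list doubles as an audit trail of what was done to the data and in what order.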

11 Southampton Experiment Blogs
- Example from a Bio-Organic Laboratory
- Student (Jenny Hale)
- Based in Southampton
- Advisor (Cameron Neylon)
- Southampton only 1/3 time, RAL 2/3 time

12 Laboratory “Blogs”
- Explore what is needed for a Blog to be the heart of an Electronic Laboratory Notebook (ELN)
- Encourage and facilitate collaboration
- Need a data repository behind the Blog
- R4L
- eBank
- Building a Virtual Research Environment (VRE)?






18 An Instrument Blog ‘Blog-jects’

19 Blog-jects
- Equipment becomes a first-class member of the Web
- Interacts well with Pub-Sub: items are attached to topics, and topics relate the Blog items
- With automation this evolves to two-way communication
- Live Copy essential

20 Pub-Sub system for real time laboratory monitoring and archiving Smart Laboratory Spaces

21 People Transformation Agents Archive Sensors EPrint Rep. Lab Rep. BLOG Broker Instruments
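
The publish/subscribe pattern behind the broker can be sketched in a few lines of Python. This is a toy in-memory illustration only, not the system used in the Smart Lab work: the `Broker` class, the topic name, and the two handlers are all hypothetical, and a real deployment would use a networked message broker.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Handler = Callable[[str, Message], None]


class Broker:
    """Toy in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on a topic."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


archive: List[Message] = []    # stands in for the archive
blog_entries: List[str] = []   # stands in for the instrument blog


def archive_reading(topic: str, message: Message) -> None:
    archive.append({"topic": topic, **message})


def post_to_blog(topic: str, message: Message) -> None:
    blog_entries.append(f"{topic}: {message['value']} {message['unit']}")


broker = Broker()
# Hypothetical topic name for a temperature sensor.
broker.subscribe("lab/fumehood/temperature", archive_reading)
broker.subscribe("lab/fumehood/temperature", post_to_blog)
delivered = broker.publish("lab/fumehood/temperature", {"value": 21.5, "unit": "C"})
```

The sensor only needs to know the topic name, so new consumers (a PDA display, a further archive) can be added by subscribing, without changing the instrument side.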

22 Nature’s ‘5Ds’ Framework
- Deep Data
- Discussion and Dialogue
- Digital Discovery
- Dynamic Delivery
- Data Display
- Revolution in Scholarly Communication
(With thanks to Timo Hannay)

23 Publications as Live Documents
- Link to data, follow links back to the raw data archive
- Link to simulation, full simulation data archived in BioSimGrid

24 New Forms of Peer Review

25 Tagging for Researchers

26 Scholarly Communication
- Research repositories will contain not only full text versions of research papers and links to publisher sites, but also ‘grey’ literature such as technical reports and theses
- In addition, research repositories of the future will contain data, images and software
- There will be many types of repository software and a need for more powerful interoperability protocols such as OAI-ORE
- Libraries and researchers will add value by creating composite services as ‘eScience Mashups’

27 Institutional Repositories?

28 A New Science Paradigm
- Thousand years ago: Experimental Science
  - description of natural phenomena
- Last few hundred years: Theoretical Science
  - Newton’s Laws, Maxwell’s Equations …
- Last few decades: Computational Science
  - simulation of complex phenomena
- Today: eScience or Data-centric Science
  - unify theory, experiment, and simulation
  - requires data exploration and data mining
- ‘eScience’ is a shorthand for a set of technologies to support collaborative networked science
- HPC and Information Management are key technologies to support this eScience revolution
(With thanks to Jim Gray)

29 © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
