
Towards an information model for I2S2 Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory

1 Towards an information model for I2S2 Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory

2 Facilities Process
- Proposal: scientist submits application for beamtime
- Approval: facility committee approves application
- Scheduling: facility registers, trains, and schedules the scientists' visit
- Experiment: scientists visit, facility runs experiment
- Data storage: raw data filtered, cleansed and stored
- Data analysis: tools for processing made available
- Record publication: subsequent publication registered with facility
Characteristics:
- formal application
- set processes
- central infrastructure
- standard tools
- hierarchical control
- dedicated staff: user office, instrument scientists, library and IT support
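The facility process on this slide is a fixed sequence of stages with no skipping ("set processes"). The following is a minimal Python sketch of that constraint. The `Stage` enum, the `advance` function and the exact stage ordering are illustrative assumptions, not part of ICAT.

```python
from enum import Enum


class Stage(Enum):
    """Facility lifecycle stages, in the order assumed from the slide (hypothetical)."""
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_STORAGE = 5
    DATA_ANALYSIS = 6
    PUBLICATION_RECORD = 7


def advance(current: Stage) -> Stage:
    """Move to the next stage; a set process allows no skipping."""
    if current is Stage.PUBLICATION_RECORD:
        raise ValueError("lifecycle already complete")
    return Stage(current.value + 1)


def run_lifecycle() -> list[Stage]:
    """Walk one proposal through every stage and return the history."""
    stage = Stage.PROPOSAL
    history = [stage]
    while stage is not Stage.PUBLICATION_RECORD:
        stage = advance(stage)
        history.append(stage)
    return history


if __name__ == "__main__":
    print([s.name for s in run_lifecycle()])
```

Encoding the order in the enum values keeps the "hierarchical control" rule in one place. A more flexible process would need an explicit transition table instead.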

3 Requirements
- Secure access to users' data
- Flexible data searching
- Scalable architecture
- Extensible architecture
- Integration with analysis tools
- Access to high-performance resources
- Linking to other scientific outputs
- Data policy aware

4 Principles
Components: Online Proposal System; User Office System (User Database, Scheduling, Health and Safety, Proposal Management); Metadata Catalogue; Data Acquisition System; Storage Management System; Data Access Portal; Single Sign On; Account Creation and Management.
The ICAT Software Suite provides the crucial integration of these key functions. The ICAT software suite:
- catalogues all experiment-related information
- gathers metadata via integration with existing IT systems (proposal systems, data acquisition)
- provides a well-defined API for easy embedding into any application
Users can:
- access data anywhere via the web
- annotate and search for data
- share data with colleagues
- access data via their own programs
- utilise integrated e-Science resources
- link to data from their publications
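The catalogue combines two of the listed requirements: flexible searching and secure, shareable access. The following is a minimal in-memory Python sketch of that combination. `Investigation`, `Catalogue`, `share` and `search` are hypothetical names for illustration; this is not the real ICAT API.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    """A catalogued experiment (hypothetical, heavily simplified)."""
    inv_id: int
    title: str
    keywords: set[str] = field(default_factory=set)
    readers: set[str] = field(default_factory=set)  # user ids with read access


class Catalogue:
    """Toy metadata catalogue: keyword search limited to authorised users."""

    def __init__(self) -> None:
        self._investigations: dict[int, Investigation] = {}

    def add(self, inv: Investigation) -> None:
        self._investigations[inv.inv_id] = inv

    def share(self, inv_id: int, user_id: str) -> None:
        """Grant a colleague read access to one investigation."""
        self._investigations[inv_id].readers.add(user_id)

    def search(self, user_id: str, keyword: str) -> list[Investigation]:
        """Case-insensitive keyword search over what user_id may read."""
        kw = keyword.lower()
        return [
            inv
            for inv in self._investigations.values()
            if user_id in inv.readers
            and kw in {k.lower() for k in inv.keywords}
        ]


if __name__ == "__main__":
    cat = Catalogue()
    cat.add(Investigation(1, "Zeolite powder diffraction", {"zeolite"}, {"alice"}))
    print([i.title for i in cat.search("bob", "zeolite")])  # bob has no access yet
    cat.share(1, "bob")
    print([i.title for i in cat.search("bob", "Zeolite")])
```

Filtering by access rights inside `search` means a user can never discover an investigation they cannot read. This is one simple way to make search "data policy aware".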

5 Component architecture

6 ICAT Deployment
- ICAT API hosted on Glassfish / JBoss, backed by an RDBMS
- Access via the Web Services API, from Java, C++ and Fortran, and through command line tools
- Connected systems: Data Storage/Delivery System, Single Sign On, User Database System, Proposal System, Publication System, e-Science Services, Software Repository

7 Data Portal

8 TopCat

9 Towards an Information Model

10 Methodology: The Singapore Framework for Dublin Core Application Profiles (Mikael Nilsson, Tom Baker, Pete Johnston)

11 Functional requirements

12 A Metadata Model for Facilities Science
No common general format or standard existed for metadata on scientific studies and data holdings. This was addressed by proposing a model:
– a specification of the types of metadata to capture about scientific studies
– cataloguing data holdings, providing access for the data owner
– easing citation, sharing, collaboration and integration
– allowing easy federation of distributed, heterogeneous metadata systems into a homogeneous (virtual) platform
Therefore the Common Scientific Metadata Model (CSMD) was developed.
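Federating heterogeneous metadata systems into one virtual platform means mapping each source's native fields onto a common model. The following is a minimal Python sketch of that idea. The two source formats (`A`, `B`), their field names and the reduced `CommonStudy` record are invented for illustration and are far smaller than CSMD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonStudy:
    """Reduced common record: a tiny, illustrative subset of a CSMD-style study."""
    facility: str
    title: str
    investigators: tuple[str, ...]


def from_source_a(rec: dict) -> CommonStudy:
    # Hypothetical source A: flat keys, investigators as "name; name".
    return CommonStudy(
        facility="A",
        title=rec["exp_title"],
        investigators=tuple(n.strip() for n in rec["pis"].split(";")),
    )


def from_source_b(rec: dict) -> CommonStudy:
    # Hypothetical source B: nested study with a list of person records.
    study = rec["study"]
    return CommonStudy(
        facility="B",
        title=study["name"],
        investigators=tuple(p["name"] for p in study["people"]),
    )


ADAPTERS = {"A": from_source_a, "B": from_source_b}


def federate(sources: dict[str, list[dict]]) -> list[CommonStudy]:
    """Map every record from every source into the common model."""
    return [ADAPTERS[src](rec) for src, recs in sources.items() for rec in recs]


def search_title(studies: list[CommonStudy], term: str) -> list[CommonStudy]:
    """A single query over the homogeneous (virtual) view."""
    return [s for s in studies if term.lower() in s.title.lower()]
```

Each source needs only one adapter to the common model, rather than a pairwise mapping to every other source. That is the practical payoff of agreeing on a shared model.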

13 A Domain Model

14 Modelling Scientific Activity

15

16

17 Core Scientific Metadata Model (Damian Flannery)
Entities and their attributes:
- Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
- Investigator: User Id, Role
- Keyword: Name
- Topic: Name, Parent Id, Topic Level
- Publication: Full Reference, URL, Repository
- Sample: Name, Chemical Formula, Safety Information
- Dataset: Name, Sample Id, Description
- Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
- Related Datafile: Source Datafile Id, Destination Datafile Id, Relation, S/W Application, S/W Version
- Sample Parameter, Dataset Parameter, Datafile Parameter (Name/Units/Value etc.): Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
- Parameter: Name, Units, Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
- Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader, etc.), Element Type, Element Id
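The core of this model is a containment hierarchy, Investigation → Dataset → Datafile, with name/units/value parameters attached at the lower levels. The following is a minimal Python sketch using dataclasses. Field names loosely follow the slide but are simplified, and `find_parameter` is a hypothetical helper. This is not the actual ICAT 3.3 schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Parameter:
    """Name/units/value triple attachable to samples, datasets or datafiles."""
    name: str
    units: str
    numeric_value: Optional[float] = None
    string_value: Optional[str] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    file_format: str
    size: int  # bytes
    checksum: str
    parameters: list[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    sample_id: Optional[int] = None
    datafiles: list[Datafile] = field(default_factory=list)
    parameters: list[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    proposal_id: str
    facility: str
    instrument: str
    title: str
    keywords: list[str] = field(default_factory=list)
    datasets: list[Dataset] = field(default_factory=list)

    def total_size(self) -> int:
        """Total size of all datafiles across all datasets."""
        return sum(f.size for d in self.datasets for f in d.datafiles)

    def find_parameter(self, name: str) -> list[Parameter]:
        """Dataset-level, then datafile-level, parameters with the given name."""
        found = [p for d in self.datasets for p in d.parameters if p.name == name]
        found += [
            p
            for d in self.datasets
            for f in d.datafiles
            for p in f.parameters
            if p.name == name
        ]
        return found
```

Reusing one `Parameter` type at every level mirrors the slide's identical Name/Units/Value attribute lists for sample, dataset and datafile parameters.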

18 Description set profile

19 Metadata Granule
- Topic: keywords providing an index on what the study is about.
- Study Description: provenance about what the study is, who did it and when.
- Access Conditions: conditions of use providing information on who can access the data and how.
- Data Description: detailed description of the organisation of the data into datasets and files.
- Data Location: locations providing a navigational aid to where the data on the study can be found.
- Related Material: references into the literature and community providing context about the study.
- Legal Note: copyright, patents, conditions of use, etc. relating to the study and the data in the study.
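The granule groups a study's metadata into seven parts. The following is a minimal Python sketch that holds each part as free text; the real model structures each part further. The `missing_parts` helper is hypothetical: it reports which parts are still empty, as one might check before publishing a record.

```python
from dataclasses import dataclass, fields


@dataclass
class MetadataGranule:
    topic: str = ""               # keywords indexing what the study is about
    study_description: str = ""   # provenance: what the study is, who, when
    access_conditions: str = ""   # who can access the data, and how
    data_description: str = ""    # organisation into datasets and files
    data_location: str = ""       # where the study's data can be found
    related_material: str = ""    # literature and community references
    legal_note: str = ""          # copyright, patents, conditions of use

    def missing_parts(self) -> list[str]:
        """Names of granule parts that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```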

20 ICAT 3.3 Schema – Study (2)

21 Syntax and metadata formats

22 ICAT API and XML format

23 ICAT 3.3 Database Schema

24 CSMD History
- First pilot of the model developed in 2001
- Now in ICAT 3.3, serving data from STFC facilities (ISIS, DLS)
- Model proven robust: simple yet expressive
- http://code.google.com/p/icatproject/

25 I2S2 - Infrastructure for Integration in Structural Sciences
Bridging the gap between raw and derived data.
- Lone researcher scenario: data shared with colleagues via little or no infrastructure; little management of raw or derived data
- EPSRC National Crystallography Service: service provision function; operates across institutions; moderate infrastructure
- Diamond & ISIS: operate on behalf of multiple institutions; processes for experiments; large infrastructure engineered to manage raw data; derived data taken off site on laptops / removable drives

26 Interactions between research processes
Scientist's research activities: Grant Proposal, Facilities Proposal, Facilities Experiment, Data cleansing, Record Publication, Data analysis, Local experiments, Simulation, Sample Preparation, Literature Review, Publication.
Facilities process: Proposal, Approval, Scheduling, Facilities Experiment, Data storage, Record Publication, Analysis Tools.
CSMD should cover the scientist's research lifecycle as well as the facilities'. Extend it to:
- laboratory-based science
- secondary analysis data
- preservation information
- publication data
- domain-specific vocabularies
By being:
- standardised
- modular
- extensible
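One way to read "modular and extensible" is as a fixed core record plus optional, independently defined extension modules. The following is a minimal Python sketch under that assumption. The `EXTENSIONS` registry and the module and field names in it are invented for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical extension modules and the fields each one contributes.
EXTENSIONS: dict[str, set[str]] = {
    "laboratory": {"lab_book_ref", "bench"},
    "preservation": {"checksum_algorithm", "retention_years"},
    "publication": {"doi"},
}


@dataclass
class ExtensibleStudy:
    title: str  # core field shared by every study
    extensions: dict[str, dict[str, str]] = field(default_factory=dict)

    def extend(self, module: str, **values: str) -> None:
        """Attach values for a registered module, rejecting unknown fields."""
        allowed = EXTENSIONS.get(module)
        if allowed is None:
            raise KeyError(f"unknown extension module: {module}")
        unknown = set(values) - allowed
        if unknown:
            raise ValueError(f"fields not in {module}: {sorted(unknown)}")
        self.extensions.setdefault(module, {}).update(values)
```

Keeping extensions in named modules lets a laboratory deployment adopt only the modules it needs, while the core stays standard across sites.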

27 Methodology: The Singapore Framework for Dublin Core Application Profiles (Mikael Nilsson, Tom Baker, Pete Johnston)

28 Issues
- Metadata model
  – framework for developing the metadata model
  – modularisation mechanisms and extensions
  – formats
- Model supporting laboratory tools
  – How does the model fit?
  – Flexibility to handle local processes: ad hoc, partial, unordered
  – What needs changing in the model?
  – What needs changing in the tools?
- Data input and maintenance?
  – simple ways of inputting the data
  – lab books?

29 Extension areas:
- Secondary analysis data
- Preservation data
- Publication data
- Topic data (chemistry)
- Controlled lists (ontologies) for instruments, facilities, methods
- Access control
- Safety data
- Blogs and notebooks

30 ISIS - ICAT
Part of an ISIS study processed with Gudrun:
- Inputs: control file, correction data, sample data, calibration data, user inputs
- Output: scattering function data

31 Derived Data
Generalised model for managing the links between data:
- input data sets
- associated with a software item with a set of parameters
Managing this?
- lab books?
- simple tools?
- VRE?
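The generalised derived-data model links output data sets to the input data sets and to the software item, with its parameters, that produced them. The following is a minimal Python sketch of that link structure, plus a trace back through it to recover a derived data set's full lineage. `SoftwareRun`, `ProvenanceStore` and their methods are hypothetical names. In the example, the Gudrun step's data set labels are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SoftwareRun:
    """One derivation step: a software item run with parameters on inputs."""
    software: str
    version: str
    parameters: tuple[tuple[str, str], ...]
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class ProvenanceStore:
    runs: list[SoftwareRun] = field(default_factory=list)

    def record(self, run: SoftwareRun) -> None:
        self.runs.append(run)

    def producer(self, dataset: str) -> Optional[SoftwareRun]:
        """The run that produced dataset, or None if it is raw data."""
        for run in self.runs:
            if dataset in run.outputs:
                return run
        return None

    def lineage(self, dataset: str) -> set[str]:
        """Every upstream data set that dataset was derived from."""
        seen: set[str] = set()
        stack = [dataset]
        while stack:
            run = self.producer(stack.pop())
            if run is None:
                continue  # raw data: nothing further upstream
            for inp in run.inputs:
                if inp not in seen:
                    seen.add(inp)
                    stack.append(inp)
        return seen
```

For example, suppose a Gudrun run is recorded with sample, calibration and correction data as inputs and scattering function data as output. Any later fit of the scattering function can then be traced back to all three raw inputs. The `seen` set also stops the trace from looping if a cycle is ever recorded by mistake.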

