Presentation on theme: "Towards an information model for I2S2"— Presentation transcript:
1 Towards an information model for I2S2. Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory
2 Facilities Process
- Proposal: scientist submits application for beamtime
- Approval: facility committee approves application
- Scheduling: facility registers, trains, and schedules scientist's visit
- Experiment: scientist visits, facility runs experiment
- Data storage: raw data filtered, cleansed and stored
- Data analysis: tools for processing made available
- Record Publication: subsequent publication registered with facility
Characteristics:
- formal application
- set processes
- central infrastructure
- standard tools
- hierarchical control
- dedicated staff: user office, instrument scientists, library and IT support
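The "set processes" characteristic can be sketched as a strict, ordered workflow. This is a minimal illustration, assuming each diagram annotation belongs to the stage it is paired with here; `Stage`, `next_stage` and `advance` are hypothetical names and not part of ICAT or any facility system.

```python
from enum import Enum
from typing import List, Optional

# Hypothetical model of the facilities process: a fixed, ordered sequence
# of stages. Names and descriptions are illustrative only.
class Stage(Enum):
    PROPOSAL = "Scientist submits application for beamtime"
    APPROVAL = "Facility committee approves application"
    SCHEDULING = "Facility registers, trains, and schedules scientist's visit"
    EXPERIMENT = "Scientist visits, facility runs experiment"
    DATA_STORAGE = "Raw data filtered, cleansed and stored"
    DATA_ANALYSIS = "Tools for processing made available"
    RECORD_PUBLICATION = "Subsequent publication registered with facility"

ORDER: List[Stage] = list(Stage)  # Enum iteration follows definition order

def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    i = ORDER.index(current)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None

def advance(history: List[Stage], to: Stage) -> List[Stage]:
    """Append `to` only if it is the next allowed stage (strict ordering)."""
    expected = next_stage(history[-1]) if history else Stage.PROPOSAL
    if to is not expected:
        raise ValueError(f"expected {expected}, got {to}")
    return history + [to]
```

For example, `advance(advance([], Stage.PROPOSAL), Stage.APPROVAL)` succeeds, while jumping straight from approval to the experiment raises `ValueError`. The strict check mirrors the facility setting; the laboratory processes raised on the Issues slide (ad hoc, partial, unordered) would need a more permissive rule.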
3 Requirements
- Secure access to user's data
- Flexible data searching
- Scalable architecture
- Extensible architecture
- Integration with analysis tools
- Access to high-performance resources
- Linking to other scientific outputs
- Data policy aware
4 Principles: the ICAT software suite
- Catalogues all experiment-related information
- Metadata gathered via integration with existing IT systems:
  - proposal systems
  - data acquisition
- Provides a well-defined API for easy embedding into any application
- Access data anywhere via the web
- Annotate and search for data
- Share data with colleagues
- Access data via user's own programs
- Utilise integrated e-Science resources
- Link to data from your publications
Diagram components: Online Proposal System; User Office System (User Database, Scheduling, Health and Safety, Proposal Management); Metadata Catalogue; Data Acquisition System; Storage Management System; Data Access Portal; Single Sign On; Account Creation and Management. The ICAT Software Suite provides the crucial integration of these key functions.
5 Component architecture The ICAT software suite has a modular design with clear functional boundaries for each component. Core functionality is grouped together, and customisable presentation layers are separated from the function layer; this eases maintenance and customisation and insulates each layer from changes to the layers beneath it. All interaction with the ICAT catalogue now goes through the ICAT API.
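The separation of a presentation layer from the function layer, with all access going through one API, can be sketched as follows. `CatalogueAPI`, `InMemoryCatalogue` and `render_results` are hypothetical stand-ins, not the real ICAT API; the point is only that the presentation code depends on an abstract interface, so the backend can change underneath it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical function-layer interface (NOT the real ICAT API).
class CatalogueAPI(ABC):
    @abstractmethod
    def search(self, keyword: str) -> List[str]:
        """Return identifiers of studies whose keywords include `keyword`."""

class InMemoryCatalogue(CatalogueAPI):
    """Toy backend; a real deployment would sit on an RDBMS."""

    def __init__(self, studies: Dict[str, List[str]]):
        self._studies = studies  # study id -> keywords

    def search(self, keyword: str) -> List[str]:
        return sorted(sid for sid, kws in self._studies.items() if keyword in kws)

def render_results(api: CatalogueAPI, keyword: str) -> str:
    """Presentation layer: uses only the abstract API, never the backend."""
    hits = api.search(keyword)
    return f"{len(hits)} result(s) for '{keyword}': " + ", ".join(hits)
```

Swapping `InMemoryCatalogue` for a database-backed implementation would leave `render_results` untouched, which is the insulation the slide describes.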
6 ICAT Deployment
Diagram components: User Database System; Single Sign On; Data Storage/Delivery System; Proposal System; Publication System; ICAT API; e-Science Services; RDBMS; Software Repository; Web Services API; Command Line Tools; Fortran; C++; Java; GlassFish / JBoss.
12 A Metadata Model for Facilities Science
No common general format or standard existed for metadata describing scientific studies and data holdings. A model was proposed to provide:
- a specification of the types of metadata to capture about scientific studies
- catalogued data holdings, giving access to the data owner
- easier citation, sharing, collaboration and integration
- easy federation of distributed, heterogeneous metadata systems into a homogeneous (virtual) platform
Hence the Common Scientific Metadata Model (CSMD) was developed.
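Federation through a common model can be sketched as one adapter per source system, each mapping native records onto a shared shape so that a single query spans every source. The two source formats, their field names and the keys of the common record are all hypothetical; they are not taken from CSMD.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A record in the shared, homogeneous view. Keys are illustrative, not CSMD's.
CommonRecord = Dict[str, str]
Adapter = Callable[[dict], CommonRecord]

# Hypothetical native formats of two separate catalogues ("A" and "B").
def from_source_a(rec: dict) -> CommonRecord:
    return {"study": rec["proposal_title"], "investigator": rec["pi"], "source": "A"}

def from_source_b(rec: dict) -> CommonRecord:
    return {"study": rec["Title"], "investigator": rec["PrincipalInvestigator"], "source": "B"}

def federate(sources: Iterable[Tuple[List[dict], Adapter]]) -> List[CommonRecord]:
    """Map every native record through its source's adapter into one list."""
    return [adapt(rec) for records, adapt in sources for rec in records]

def search(view: List[CommonRecord], investigator: str) -> List[CommonRecord]:
    """Query the virtual view once, whichever source a record came from."""
    return [r for r in view if r["investigator"] == investigator]
```

Only the adapters know about the native formats; everything downstream works against the common shape, so adding a third source means writing one more adapter.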
19 Metadata granule
MetadataGranule
- Topic: keywords providing an index on what the study is about.
- StudyDescription: provenance about what the study is, who did it and when.
- AccessConditions: conditions of use, providing information on who can access the data and how.
- DataDescription: detailed description of the organisation of the data into datasets and files.
- DataLocation: locations providing a navigational aid to where the data on the study can be found.
- RelatedMaterial: references into the literature and community providing context about the study.
- LegalNote: copyright, patents, conditions of use etc. relating to the study and the data in the study.
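The seven parts of the metadata granule map naturally onto a record type. This is a minimal sketch: the field names follow the slide, but the field types (plain strings and lists of strings) and the `matches` helper are simplifying assumptions, not the CSMD schema, which models each part in much more detail.

```python
from dataclasses import dataclass, field
from typing import List

# Sketch of the seven-part metadata granule. Field types are assumptions.
@dataclass
class MetadataGranule:
    topic: List[str]            # keywords indexing what the study is about
    study_description: str      # what the study is, who did it and when
    access_conditions: str      # who can access the data, and how
    data_description: List[str] = field(default_factory=list)  # datasets and files
    data_location: List[str] = field(default_factory=list)     # where the data can be found
    related_material: List[str] = field(default_factory=list)  # literature and community refs
    legal_note: str = ""        # copyright, patents, conditions of use

    def matches(self, keyword: str) -> bool:
        """Case-insensitive keyword lookup over the Topic part."""
        return keyword.lower() in (k.lower() for k in self.topic)
```

Only Topic, StudyDescription and AccessConditions are required here; making the rest optional is a design choice for the sketch, not something the slide states.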
24 CSMD History
- First pilot of the model developed in 2001!
- Now in ICAT 3.3, serving data from STFC facilities (ISIS, DLS)
- Model proven robust: simple yet expressive
25 I2S2 - Infrastructure for Integration in Structural Sciences
Bridging the gap between raw and derived data
EPSRC National Crystallography Service
- service provision function
- operates across institutions
- moderate infrastructure
Diamond & ISIS
- operate on behalf of multiple institutions
- processes for experiments
- large infrastructure engineered to manage raw data
- derived data taken off site on laptops / removable drives
"Lone" researcher scenario
- data sharing with colleagues via
- little or no infrastructure
- little management of raw or derived data
26 Interactions between research processes
Cover the scientist's research lifecycle as well as the facilities.
Extend CSMD:
- to laboratory-based science
- to secondary analysis data
- to preservation information
- to publication data
- to domain-specific vocabularies
by being:
- standardised
- modular
- extensible
Diagram stages around CSMD: Literature Review; Grant Proposal; Facilities Proposal; Approval; Scheduling; Sample Preparation; Local experiments; Facilities Experiment; Data storage; Data cleansing; Data analysis; Analysis Tools; Simulation; Publication; Record Publication.
27 Methodology: The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston.
28 Issues
Metadata model
- Framework for developing the metadata model
- Modularisation mechanisms and extensions
- Formats
Model supporting laboratory tools
- How does the model fit?
- Flexibility to handle local processes: ad hoc, partial, unordered
- What needs changing in the model?
- What needs changing in the tools?
Data input and maintenance
- Simple ways of inputting the data
- Lab books?
29 Extension areas:
- Secondary analysis data
- Preservation data
- Publication data
- Topic data (chemistry)
- Controlled lists (ontologies) for instruments, facilities, methods
- Access control
- Safety data
- Blogs and notebooks
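Controlled lists constrain certain fields to agreed terms. The sketch below treats each list as a flat set and reports any value that falls outside it; a real ontology would add hierarchy and synonyms. The vocabularies are hypothetical (the instrument names are placeholders), and `validate` is an illustrative helper, not part of ICAT.

```python
from typing import Dict, List, Set

# Hypothetical controlled lists; a real deployment would draw these from an
# agreed ontology rather than hard-coded sets.
CONTROLLED_LISTS: Dict[str, Set[str]] = {
    "facility": {"ISIS", "Diamond"},
    "method": {"neutron diffraction", "X-ray diffraction"},
    "instrument": {"instrument-A", "instrument-B"},  # placeholders
}

def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every controlled field
    present in `record` uses a listed term. Other fields are ignored."""
    problems = []
    for name, allowed in CONTROLLED_LISTS.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}: '{value}' not in controlled list")
    return problems
```

Free-text fields pass through unchecked, so the controlled lists can be added as an extension without disturbing existing records.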
30 Scattering function data
Part of an ISIS study (ISIS ICAT).
Diagram: correction data, sample data and calibration data, together with user inputs and a control file, feed into Gudrun, which produces the scattering function data.
31 Derived Data
Generalised model: managing the links between data
- inputs of data sets
- associated with a software item
- with a set of parameters
Managing this?
- lab books?
- simple tools?
- VRE?
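The generalised derived-data model (input data sets, a software item, a set of parameters) can be sketched as a linked record whose lineage is recovered by walking the input links. `Dataset` and `lineage` are hypothetical names, and the parameter key used for the control file in the usage below is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Sketch of the generalised derived-data model. Names are hypothetical.
@dataclass
class Dataset:
    name: str
    inputs: List["Dataset"] = field(default_factory=list)  # empty for raw data
    software: str = ""                                     # software item that produced it
    parameters: Dict[str, str] = field(default_factory=dict)

    def lineage(self) -> Set[str]:
        """Names of every dataset this one was (transitively) derived from."""
        seen: Set[str] = set()
        stack = list(self.inputs)
        while stack:
            ds = stack.pop()
            if ds.name not in seen:
                seen.add(ds.name)
                stack.extend(ds.inputs)
        return seen
```

Following the scattering-function example, the Gudrun output could be recorded as `Dataset("scattering function data", inputs=[sample, calibration, correction], software="Gudrun", parameters={"control_file": "..."})`, where `sample`, `calibration` and `correction` are raw `Dataset` records; any later analysis that takes it as input then inherits the full chain back to the raw ISIS data.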