1 Accessing the Outputs of Scientific Projects
Brian Matthews, CRIS 2002, 31/08/02
Brian Matthews, Michael Wilson, Business & Information Technology Dept, CLRC
Kerstin Kleese-van Dam, E-Science Centre, CLRC
b.m.matthews@rl.ac.uk

2 Overview
Science produces two outputs:
– Conventional publications
– Science data sets
In traditional science, the first is used as a measure of success; the second is locked away.
In this talk I shall discuss:
– A general-purpose science data portal for allowing access to data sets
– Potential links to publications
The goal is to make all the outputs of science available.

3 Who we are (CLRC)
Central Laboratory of the Research Councils
1700 staff, supporting 12000 scientists and engineers from universities and industry
Based at 3 sites:
– Daresbury Laboratory
– Rutherford Appleton Laboratory
– Chilbolton Observatory
A multidisciplinary laboratory

4 A Multidisciplinary Laboratory
– Spallation Neutron and Muon Source (ISIS)
– Synchrotron Radiation Source (SRS)
– Lasers
– Microstructures
– Space Science and Technology
– Molecular Spectroscopy
– Earth Observation
– Atmospheric Science
– Computational Science
– Energy Research
– Information Technology
– Particle Physics
– Radio Communications
– Surfaces Transforms and Interfaces

5 The Problem
Scientific institutions generate vast quantities of data
– CLRC: ISIS, SRS, Space Science, Particle Physics, Computational Science, ...
More data coming on stream all the time:
– CERN-LHC, Diamond, CASIM, HGP, ...
Very good at handling large amounts of data
Diverse approaches to organising and distributing it
Need a usable way of gaining access to the data

6 User Scenarios
Lecturer:
– This published study would be a good example for teaching; is the raw data publicly available?
Researcher:
– This is an interesting paper. Can I check the data?
Experiment Proposer:
– Have there been any neutron or X-ray studies of this molecule at 100 K? What reports and papers have been published on them?
Instrument Scientist:
– The instrument seems a bit unstable recently; fetch me the results of all calibration runs from the last 3 months. Is there a report on this instrument?
Need a usable way of gaining access to publications with data

7 The Data Portal Concept
Single point of access to the CLRC data resources
Encompasses a wide range of data holdings
– Describes what data is available from the facilities
– Links to the data held at the facility
– Different archiving methods
Caters for a wide range of users
– from the general community to data curators
Supports a wide range of queries
– employing data mining, thesauri, ...

8 Combine Diverse Users & Searches...
[Figure: a range of searches from Discovery to Excavation, with user groups: Wider science community, Data curator, Specialist user, Experimenter, General community]

9 ... with Distributed Data Silos...
[Figure: Facility 1, Facility 2, Facility 3, Facility 4]

10 ... using a central common metadata index...
[Figure labels: Client; http; CLRC Data Access Server; XML wrapper; Common metadata catalogue database; Facility 1: XML wrapper, Local metadata, Local data]

11 ... and a Web-based interface
Exploit the existing Web infrastructure.
– Use new technologies (XML/RDF);
– rapidly disseminated;
– widely accessible;
– database and user platform independent;
– can be developed now, but with the GRID in mind.
Every user who needs to can get to the information.

12 Metadata
Science Metadata Model: a generic metadata model for all scientific applications, with specialisation for each domain
– Domains shown: ISIS, SRS, HEP, Space Science, Social Science, Env. Science
– Can answer questions across domains
– Can answer questions about specific domains

13 Metadata Model
The Metadata Object has six components:
– Topic: keywords providing an index on what the study is about.
– Study Description: provenance about what the study is, who did it and when.
– Access Conditions: conditions of use providing information on who can access the data and how.
– Data Description: detailed description of the organisation of the data into datasets and files.
– Data Location: locations providing navigation to where the data on the study can be found.
– Related Material: references into the literature and community providing context about the study.
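To make the shape of the metadata object concrete, here is a minimal Python sketch with one field per component named on the slide. The class names, field types and the `matches_keyword` helper are illustrative assumptions; the slides do not specify the actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StudyDescription:
    """Provenance: what the study is, who did it and when (field names assumed)."""
    name: str
    investigators: List[str]
    institution: str
    date: str
    purpose: str = ""


@dataclass
class MetadataObject:
    """One catalogue entry with one field per component of the model."""
    topic: List[str]            # keywords indexing what the study is about
    study: StudyDescription     # provenance
    access_conditions: str      # who may access the data, and how
    data_description: List[str] = field(default_factory=list)  # datasets and files
    data_location: List[str] = field(default_factory=list)     # where the data is held
    related_material: List[str] = field(default_factory=list)  # literature references


def matches_keyword(record: MetadataObject, keyword: str) -> bool:
    """Case-insensitive lookup against the topic keywords (hypothetical helper)."""
    return keyword.lower() in (k.lower() for k in record.topic)


# Placeholder values for illustration only.
example = MetadataObject(
    topic=["Chemistry", "Crystal Structure"],
    study=StudyDescription("example study", ["A. Researcher"], "Example University", "2002-01-01"),
    access_conditions="restricted to named investigators",
)
print(matches_keyword(example, "chemistry"))
```

Keeping the topic keywords as a separate field from the provenance is what lets a cross-domain query run against `topic` alone without knowing anything about a domain's specialised fields.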

14 Study Description
The Study is the basic unit for a scientific activity.
Can be further divided into:
– Programmes: for connected studies.
– Investigations: for a single measurement, experiment or simulation.

15 Hierarchy of Data Holdings
With investigations, there are associated data holdings.
These are themselves arranged in a hierarchy: data sets and files, with links between them.
Logical organisation: identity separated from location.
[Figure: an Investigation with a Data Holding containing Data-Set 1 (Raw), Data-Set 2 (Inter) and Data-Set 3 (Final), each holding files with a name and date]
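The hierarchy and the separation of identity from location can be sketched as below. This is an assumed layout, not the portal's actual data structures: the class names, the `stage` values and the `example.org` URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    name: str   # logical identity only; no host or path is stored here
    date: str


@dataclass
class DataSet:
    name: str
    stage: str  # e.g. "raw", "intermediate" or "final" (assumed labels)
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataHolding:
    investigation: str
    data_sets: List[DataSet] = field(default_factory=list)
    # Physical locations live apart from the logical tree, keyed by file name.
    locations: Dict[str, List[str]] = field(default_factory=dict)

    def files(self) -> List[DataFile]:
        """Flatten the hierarchy into a list of logical files."""
        return [f for ds in self.data_sets for f in ds.files]

    def locate(self, file_name: str) -> List[str]:
        """Return the known locations for a logical file, or an empty list."""
        return self.locations.get(file_name, [])


holding = DataHolding(
    investigation="example investigation",
    data_sets=[DataSet("Data-Set 1", "raw", [DataFile("run001.dat", "2002-01-01")])],
    locations={"run001.dat": ["http://example.org/archive-a/run001.dat"]},
)
# Moving or replicating the file touches only the location table.
holding.locations["run001.dat"].append("http://example.org/archive-b/run001.dat")
print(holding.locate("run001.dat"))
```

Because the logical tree never stores a path, a file can be moved or replicated without changing its identity in the catalogue.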

16 Metadata example
Chemistry Crystal Structure Copper ... Crystal Structure: Copper : Palladium: :complex: 150K ... Porter ... University of Peebles ... EPSRC ... 21/04/1999 ...
To study the structure of Copper and Palladium co-ordination complexes at 150K.
Teat ... SRS Station 9.8, BRUKER AXS SMART 1K ... Wavelength ... Angstrom ... 0.6890 ... Crystal-to-detector distance cm ... 5.00 ...
The user has to be one of: Prof. F. Porter ...

17 Metadata collection
Metadata collection and maintenance is a big problem.
But doing science is a process.
[Figure: process stages (Submit proposal, Prepare experiment, Generate results, Analyse results, Write report) annotated with the metadata accumulated along the way: provenance metadata, access conditions, data description, data location, related material]
Collecting the metadata can then become part of the experimental support environment

18 Grid middleware Architecture
[Figure labels: Users; Other Data Portals; CLRC Data Portal: CLRC broker, XML wrapper, Common metadata catalogue database; Facility 1, Facility 2, Facility 3 and Facility 4, each with an XML wrapper, Local metadata and Local data]
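One way to read this architecture is that a broker sends each query to every facility's XML wrapper and collates the XML that comes back. The sketch below illustrates that pattern only; the in-memory wrappers, the `results`/`study` element names and the function names are assumptions, not the portal's real interfaces.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A wrapper takes a keyword and returns an XML string of matching studies.
Wrapper = Callable[[str], str]


def make_wrapper(facility: str, studies: Dict[str, List[str]]) -> Wrapper:
    """Build an in-memory stand-in for one facility's XML wrapper."""
    def search(keyword: str) -> str:
        root = ET.Element("results", facility=facility)
        for name, keywords in studies.items():
            if keyword.lower() in (k.lower() for k in keywords):
                ET.SubElement(root, "study", name=name)
        return ET.tostring(root, encoding="unicode")
    return search


def broker_search(wrappers: List[Wrapper], keyword: str) -> List[Dict[str, str]]:
    """Send one query to every wrapper and collate the parsed results."""
    collated = []
    for wrapper in wrappers:
        root = ET.fromstring(wrapper(keyword))
        for study in root.findall("study"):
            collated.append({"facility": root.get("facility"), "study": study.get("name")})
    return collated


# Hypothetical catalogue contents for two facilities.
wrappers = [
    make_wrapper("Facility 1", {"study-a": ["copper"], "study-b": ["muon"]}),
    make_wrapper("Facility 2", {"study-c": ["Copper", "palladium"]}),
]
for hit in broker_search(wrappers, "copper"):
    print(hit["facility"], hit["study"])
```

Since every facility answers in the same XML shape, the broker needs no facility-specific parsing, and adding a facility means adding one wrapper.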

19 Server Architecture
[Figure labels: USER; User input interpreter; Query Generator; Response Generator; User output generator; XML Parser; pre-set XSL Script; Central metadata repository with XML File; Local metadata repository with XML File. Key: Internal, http, ASCII file, External agent, module]

20 Example result of searching: search across facilities; returns XML to the session and displays a summary

21 Expand Results: give more details from the same XML

22 Going Deeper: can browse the data sets

23 Select data: pick the required data files and download from a convenient location.

24 Current developments
Pilot completed
Consolidate and broaden existing system
– move towards a development system
– handle a greater diversity of data sources, e.g. Max Planck Institute for Meteorology
Enhance the technology
– Web services (SOAP, WSDL, OGSA, XML Query)
Provide links to other information sources:
– Library systems
– Thesauri

25 Interface with existing archives
CLRC maintains existing data archives
– Atmospheric, earth observation, STP, astronomy.
– Existing access mechanisms (Web, Z39.50)
– Existing metadata catalogues and formats
Can we use the Data Portal to access them?
– Use the metadata format as a framework that can be specialised to express the existing metadata frameworks
– XML Query as a query layer on the archive

26 Re-architect system
Break up the portal middleware into components.
[Figure labels: DP; Results collation; Data source location; Query generation; Ontology service; Security service; Replication service; User service; Globus GIS - MDS; Globus GSI; Grid Enable with Web Services; RDF+DAML+OIL; XML Query]

27 Access to Data and Publications
The Data Portal offers the potential to integrate the outputs of scientific research: data and publications.
Need to have a common search mechanism over library and data portals.
– Can abstract the science metadata to Dublin Core.
– Links to CERIF would further deepen the connection.
– Access to common thesauri for classification.
Common web service interface
– Data Portal provides this.
– XML Query as a communication mechanism

28 Mapping between Dublin Core and Science Metadata
Title
– Study: Name
Creator
– Study: Investigator: Name (Role is principal investigator)
Subject
– Topic: Keyword
Description
– Study: Study Information: Purpose
Publisher
– Investigation: Data Manager
Contributor
– Study: Investigator: Name; Investigation: Data Manager
Date
– Study: Study Information: Time
Resource Type
– Collection, or Dataset
Format
– Data Description: File Format
Resource Identifier
– Study: Study Id (whole study)
– Data description: File: URI (for individual data files)
Source
– Data description: Data sets: Related Data sets
– Related Material: Related work
Language
– Not covered in the current metadata format, but a simple extension
Relation
– Related Material: Related work
Coverage
– Data description: Logical Description: Coverage
Rights Management
– Access Conditions
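The mapping above can be expressed as a lookup table from Dublin Core elements to paths in a science metadata record. The sketch below assumes the record is a nested dictionary whose keys are snake_case renderings of the slide's field names; that representation, and the function names, are illustrative assumptions. Resource Type and Language are left out because the slide gives them no path into the record.

```python
from typing import Any, Dict, List, Optional, Tuple

Path = Tuple[str, ...]

# Dublin Core element -> one or more paths into the science metadata record.
DC_MAPPING: Dict[str, List[Path]] = {
    "title": [("study", "name")],
    "creator": [("study", "investigator", "name")],
    "subject": [("topic", "keyword")],
    "description": [("study", "study_information", "purpose")],
    "publisher": [("investigation", "data_manager")],
    "contributor": [("study", "investigator", "name"), ("investigation", "data_manager")],
    "date": [("study", "study_information", "time")],
    "format": [("data_description", "file_format")],
    "identifier": [("study", "study_id"), ("data_description", "file", "uri")],
    "source": [("data_description", "data_sets", "related_data_sets"),
               ("related_material", "related_work")],
    "relation": [("related_material", "related_work")],
    "coverage": [("data_description", "logical_description", "coverage")],
    "rights": [("access_conditions",)],
}


def lookup(record: Dict[str, Any], path: Path) -> Optional[Any]:
    """Follow a path through nested dictionaries; return None if any step is missing."""
    node: Any = record
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_dublin_core(record: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Abstract a science metadata record to the Dublin Core elements it can fill."""
    dc: Dict[str, List[Any]] = {}
    for element, paths in DC_MAPPING.items():
        values = [v for v in (lookup(record, p) for p in paths) if v is not None]
        if values:
            dc[element] = values
    return dc


# Placeholder record for illustration only.
record = {
    "study": {"name": "example study", "study_id": "study-001"},
    "topic": {"keyword": "Chemistry"},
    "access_conditions": "restricted",
}
print(to_dublin_core(record))
```

Elements with no matching field are simply omitted, so a partially filled science record still yields a valid, if sparse, Dublin Core view for a library-style search.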

29 Where are we?
Data Portal up and running
– Being developed in the E-Science Centre in CLRC: http://esc.dl.ac.uk:9000/index.html
– Science metadata proving very robust
– Trying to extend its use into other areas of science: materials science, environmental science.
Beginning to approach the problem of integrating with electronic library resources.
b.m.matthews@rl.ac.uk

