INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Intelligent Distributed Data Management in Earth system science K. Ronneberger, DKRZ, Germany.


1 Intelligent Distributed Data Management in Earth system science
INFSO-RI-508833, Enabling Grids for E-sciencE, www.eu-egee.org
K. Ronneberger, DKRZ, Germany
S. Kindermann, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany

2 QFLUX: Humidity flux calculation
Enabling Grids for E-sciencE, INFSO-RI-031688, 1st EU-Review, May 15-16, 2007

3 Structure
What is Earth system science about?
– Typical workflows
– Traditional infrastructure
Why can grid technology help?
– Limits of the current practice
– Outline of possible and existing use areas
How do we use this technology?
– Conceptual outline of the developing infrastructure
– Demo of an example workflow
Potential impact and vision
– Next steps and challenges

4 Earth system sciences
Goal: learn about the past, the present, and possible futures of the earth system
Community: distributed internationally and across disciplines, but strongly interconnected
Method: analysing, comparing and processing data
Input: data from observations and/or other modelling studies
Typical workflow: (1) Find & Select, (2) Collect & Prepare, (3) Analyse, (4) Visualize
Diagram labels: Distributed Climate Data, Model Data, Observation Data, Scenario data, Data description, Analysis Dataset, Result Dataset

5 An example workflow: "qflux"
(1) Find & Select relevant and available datasets
(2) Collect & Prepare a temporal and spatial subset of the data
(3) Analyse the integrated transport of humidity between selected levels
(4) Visualize selected result
Variables: wind speed, temperature, specific humidity
Diagram labels: Distributed Climate Data, Analysis Dataset, Result Dataset
Data volume, in slide order: several PB, ~3.1 TB (300-500 files), ~10.3 GB (28 files), ~76 MB, ~6 MB, ~66 KB
Location, in slide order: various data centers & portals, institutional storage & computing facilities, local facilities, personal computer
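The Analyse step above ("integrated transport of humidity between selected levels") can be illustrated with a minimal sketch. It assumes the standard definition of vertically integrated moisture transport, Q = (1/g) ∫ q·v dp, evaluated with the trapezoidal rule over pressure levels. The slides do not show the actual qflux code, so the function name, argument names and units here are hypothetical, and temperature is left out because the slide does not say how qflux uses it.

```python
G = 9.80665  # standard gravity in m/s^2


def integrated_humidity_flux(pressure_pa, specific_humidity, wind_speed,
                             p_top, p_bottom):
    """Trapezoidal estimate of (1/g) * integral of q*v dp over [p_top, p_bottom].

    pressure_pa: pressure of each level in Pa
    specific_humidity: q at each level in kg/kg
    wind_speed: wind component at each level in m/s
    Returns the flux in kg m^-1 s^-1.
    """
    # Keep only the levels inside the selected layer, ordered by pressure.
    levels = sorted(
        (p, q * v)
        for p, q, v in zip(pressure_pa, specific_humidity, wind_speed)
        if p_top <= p <= p_bottom
    )
    if len(levels) < 2:
        raise ValueError("need at least two levels inside the selected layer")

    total = 0.0
    for (p0, f0), (p1, f1) in zip(levels, levels[1:]):
        total += 0.5 * (f0 + f1) * (p1 - p0)  # trapezoid between two levels
    return total / G
```

Calling `integrated_humidity_flux([50000, 70000, 85000, 100000], [0.002, 0.005, 0.008, 0.012], [15.0, 10.0, 8.0, 5.0], 50000, 100000)` would integrate over all four example levels; these profile values are made up for illustration only.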

6 Potential use of grid technology
Current issues
Search & select
– Different portals with different authentications and data descriptions
Collect & prepare
– Different access mechanisms of the different providers
– Pre-processing requires sufficient local facilities
Analyse
– Existing tools and already processed data are only available locally and lack proper description
Visualize
– Detached from the remaining workflow
What grid technology can offer
– Central unique authentication to a common catalogue with standardized metadata
– Shared resources with standardized access hiding proprietary access mechanisms
– Commonly defined tool description
– Log processing steps and automatically republish processed data
– Integrate basic visualization (first peek) into the workflow

7 C3 Grid and EGEE - the components
Workflow stages: Find & select, Collect & prepare, Analyse, Visualize
Central web portal: unique entry point to a common central metadata catalogue (Lucene index) and access facility
Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115/ISO 19139)
Standardized access interface: hides the complexity of specific data access mechanisms and pre-processing functionalities (webservice technology)
Automatic update and republishing of metadata: metadata of data processing is logged, managed and can be harvested (AMGA + Java extension, OAI-PMH server)
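Harvesting metadata from an OAI-PMH server, as named in the component list, comes down to HTTP requests with a `verb` parameter. The sketch below builds a `ListRecords` request URL. The base URL and the `iso19139` metadata prefix are assumptions for illustration: the slides do not give the endpoints or prefixes that C3Grid or the AMGA extension actually expose. Under the OAI-PMH protocol a `resumptionToken` is an exclusive argument, so the follow-up request carries only the verb and the token.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url and metadata_prefix are supplied by the caller; the values
    used in examples here are hypothetical, not real C3Grid endpoints.
    """
    if resumption_token is not None:
        # Continuation of a partial list: only verb + resumptionToken allowed.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"
```

A harvester would request the first URL, read the records, and repeat with the returned resumption token until the server stops sending one.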

8 Data access in ESR grid projects
Earth System Grid project (USA)
– Scope (project): high performance access of climate model data
– Data stock (status): homogeneous; flat-file storage
– Data description (solution): use aspect of data, tools and models; e.g. NcML for netCDF data
– Data access (solution): different protocols; intelligence at portal
C3 Grid (Germany)
– Scope (project): uniform & effective discovery and access of data of various disciplines & types
– Data stock (status): heterogeneous; databases & flat-file storage
– Data description (solution): discovery and some use aspects; ISO 19115/ISO 19139
– Data access (solution): uniform access interface; intelligence at data provider / grid
NERC data grid (UK)
– Scope (project): harmonized & detailed search and access of data of various disciplines & types
– Data stock (status): heterogeneous; databases & flat-file storage
– Data description (solution): content of the data in great detail; semantic data model (CSML, based on GML)
– Data access (solution): different protocols; intelligence at portal

9 Bridging EGEE and C3
EGEE side: EGEE UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog, OAI-PMH server, Webservice Interface
C3 side: Web Portal, C3 Lucene Index, OAI-PMH server, Webservice Interface, C3Grid data interface, Climate Data Workspace, Data Resource, Metadata
German Climate Data Providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS
Metadata flow shown in the diagram: (a) Publish (ISO 19115/19139), (b) Harvest (OAI-PMH), (f) Publish (ISO 19115/19139), (g) Harvest (OAI-PMH)

10 Demo
(1) Search, discover and select functionalities of the portal
(2) Upload and register data to EGEE
(3) Trigger the example workflow qflux from the portal

11 Upload pre-processed data to EGEE
Workflow stages covered: (1) Find & Select, (2) Collect & Prepare
Diagram components: EGEE UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog, OAI-PMH server, Webservice Interface, Web Portal, C3 Lucene Index, C3Grid data interface, Climate Data Workspace, Data Resource, Metadata
Steps shown in the diagram:
(a) Request (webservice)
(b) Retrieve (JDBC or archive)
(c) Stage & Provide
(d) Notify
(e) Request (webservice)
(f) Transfer & Register (lcg-tools)
(g) Register (Java API)
Also shown: (f) Publish (ISO 19115/19139)

12 Trigger qflux workflow
Workflow stages covered: (3) Analyse, (4) Visualize
Diagram components: EGEE UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog, OAI-PMH server, Webservice Interface, Web Portal, C3 Lucene Index, C3Grid data interface, Climate Data Workspace, Data Resource, Metadata
Steps shown in the diagram:
(a) Request (webservice)
(b) Submit qflux (gLite)
(c) Retrieve (lcg-tools)
(d) Update (Java API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
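The labelled steps of this slide can be written down as a small ordered data structure, which makes the sequence easy to log or print. This is only a sketch: it assumes the letters (a) to (g) reflect execution order, which the slide suggests but does not state, and the `Step` class and `describe` helper are hypothetical names, not part of C3Grid or EGEE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str   # letter used on the slide
    action: str  # what happens in this step
    detail: str  # technology or format named on the slide ("" if none)


# Steps as labelled on the slide; ordering by letter is an assumption.
QFLUX_STEPS = [
    Step("a", "Request", "webservice"),
    Step("b", "Submit qflux", "gLite"),
    Step("c", "Retrieve", "lcg-tools"),
    Step("d", "Update", "Java API"),
    Step("e", "Return graphic", ""),
    Step("f", "Publish", "ISO 19115/19139"),
    Step("g", "Harvest", "OAI-PMH"),
]


def describe(steps):
    """Return one human-readable line per step, in the given order."""
    lines = []
    for step in steps:
        suffix = f" [{step.detail}]" if step.detail else ""
        lines.append(f"({step.label}) {step.action}{suffix}")
    return lines
```

Printing `"\n".join(describe(QFLUX_STEPS))` gives a plain-text trace of the workflow that could sit alongside the logged processing metadata.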

13 Potential Impact
Ease and accelerate the search, discovery, access and processing of German ESR data
→ Potential impact on the German ESR community
Provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems
→ Potential impact on current and potential EGEE ESR community
Other portals or infrastructures can be integrated analogously to EGEE
→ Potential impact on international ESR community
Built on international standards, thus easily adaptable/expandable by other disciplines and by further partners
→ Potential impact on other disciplines

14 Next steps
Expand the demonstrated prototype to a reliable and stable system
Port further workflows and some pre-processing functionalities to EGEE
Enlarge the user community

15 Future challenges or missing bricks
Establish a comprehensive and consistent security context to control access to (restricted) data with a single sign-on
– C3Grid starts to implement a federated AA infrastructure based on Shibboleth
Describe analysis services to improve discovery, use and share possibilities
– First approaches to adapt ISO 19119/19139 as a common metadata format for tool description
Modularize workflows to increase the flexibility and enable intelligent scheduling
– First steps to implement a workflow information service
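To illustrate why modularized workflows help scheduling, the sketch below describes qflux as a dependency graph and groups modules into batches that could run in parallel. It is not the planned workflow information service. The module names and the split of "Collect & Prepare" into one module per variable are assumptions made for this example, and the code relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical modules: each key maps to the set of modules it depends on.
QFLUX_MODULES = {
    "find_select": set(),
    "collect_wind_speed": {"find_select"},
    "collect_temperature": {"find_select"},
    "collect_specific_humidity": {"find_select"},
    "analyse": {
        "collect_wind_speed",
        "collect_temperature",
        "collect_specific_humidity",
    },
    "visualize": {"analyse"},
}


def schedule_in_batches(modules):
    """Group modules into batches; all modules in a batch are independent."""
    sorter = TopologicalSorter(modules)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With this graph, the three collect modules land in the same batch, so a scheduler would be free to place them on different worker nodes.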

