INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Intelligent Distributed Data Management in Earth system science K. Ronneberger, DKRZ, Germany.

INFSO-RI Enabling Grids for E-sciencE
Intelligent Distributed Data Management in Earth System Science
K. Ronneberger, DKRZ, Germany
S. Kindermann, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany

Enabling Grids for E-sciencE INFSO-RI st EU-Review May QFLUX: Humidity flux calculation

Structure
What is Earth system science about?
– Typical workflows
– Traditional infrastructure
Why can grid technology help?
– Limits of the current practice
– Outline of possible and existing use areas
How do we use this technology?
– Conceptual outline of the developing infrastructure
– Demo of an example workflow
Potential impact and vision
– Next steps and challenges

Earth System Sciences
Goal: learn about the past, the present, and possible futures of the earth system
Community: internationally distributed and interdisciplinary, but strongly interconnected
Method: analysing, comparing and processing data
Input: data from observations and/or other modelling studies
Typical workflow (diagram): (1) Find & Select, (2) Collect & Prepare, (3) Analyse, (4) Visualize
Diagram labels: Distributed Climate Data (Model Data, Observation Data, Scenario data), Data description, Analysis Dataset, Result Dataset

An example workflow: “qflux”
(1) Find & Select relevant and available datasets
(2) Collect & Prepare a temporal and spatial subset of the data
(3) Analyse the integrated transport of humidity between selected levels
(4) Visualize the selected result
Input variables: wind speed, temperature, specific humidity
Diagram labels: Distributed Climate Data, Analysis Dataset, Result Dataset
Data volume (along the workflow): several PB; ~3.1 TB ( files); ~10.3 GB (28 files); ~76 MB; ~6 MB; ~66 KB
Location (along the workflow): various data centers & portals; institutional storage & computing facilities; local facilities; personal computer
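The slide describes the analysis step only as the integrated transport of humidity between selected levels. The following is a minimal sketch of such a calculation, assuming the standard vertically integrated moisture flux Q = (1/g) ∫ q·v dp evaluated with the trapezoidal rule. The function name, argument layout and sample profile are hypothetical and are not taken from the project's qflux code; temperature, which the slide also lists as an input, is not used in this simplified form.

```python
G = 9.80665  # standard gravity in m/s^2


def integrated_humidity_flux(pressure_pa, specific_humidity, wind_speed):
    """Trapezoidal estimate of (1/g) * integral of q*v dp, in kg m^-1 s^-1.

    pressure_pa:       pressure levels in Pa (monotonic, either direction)
    specific_humidity: q in kg/kg on those levels
    wind_speed:        wind component in m/s on those levels
    """
    if not (len(pressure_pa) == len(specific_humidity) == len(wind_speed)):
        raise ValueError("all profiles must have the same number of levels")
    total = 0.0
    for k in range(len(pressure_pa) - 1):
        # Layer thickness in pressure; abs() makes the level order irrelevant.
        dp = abs(pressure_pa[k + 1] - pressure_pa[k])
        flux_lo = specific_humidity[k] * wind_speed[k]
        flux_hi = specific_humidity[k + 1] * wind_speed[k + 1]
        total += 0.5 * (flux_lo + flux_hi) * dp
    return total / G


# Hypothetical profile between 1000 hPa and 500 hPa.
levels = [100000.0, 85000.0, 70000.0, 50000.0]
q = [0.01, 0.01, 0.01, 0.01]
v = [10.0, 10.0, 10.0, 10.0]
flux = integrated_humidity_flux(levels, q, v)
```

Restricting the pressure list to the selected levels gives the transport between those levels, which is the quantity the workflow then visualizes.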

Potential use of grid technology
Current issues:
Search & select
– Different portals with different authentications and data descriptions
Collect & prepare
– Different access mechanisms of the different providers
– Pre-processing requires sufficient local facilities
Analyse
– Existing tools and already processed data are available locally and lack a proper description
Visualize
– Detached from the remaining workflow
Potential use:
– Central, unique authentication to a common catalogue with standardized metadata
– Shared resources with standardized access, hiding proprietary access mechanisms
– Commonly defined tool description
– Log processing steps and automatically republish processed data
– Integrate basic visualization (first peep) into the workflow

C3 Grid and EGEE - the components
Workflow steps (diagram): Find & select, Collect & prepare, Analyse, Visualize
Central web portal: unique entrance point to the common central metadata catalogue (Lucene index) and access facility
Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115/ISO 19139)
Standardized access interface: hides the complexity of specific data access mechanisms and pre-processing functionalities (webservice technology)
Automatic update and republishing of metadata: metadata of data processing is logged, managed and can be harvested (AMGA + Java extension, OAI-PMH server)
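To illustrate what a standardized discovery description can look like in practice, here is a minimal sketch that reads two discovery fields from an ISO 19139 (XML encoding of ISO 19115) record. The namespaces and element paths are those of the ISO 19139 schema; the sample record, its identifier and title, and the function name are hypothetical and do not come from the C3 Grid catalogue.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily shortened record for illustration only.
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example.dataset.0001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Specific humidity, example subset</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def discovery_fields(xml_text):
    """Return the record identifier and dataset title, or None where absent."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(
        "gmd:fileIdentifier/gco:CharacterString", namespaces=NS
    )
    title = root.findtext(
        "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
        "gmd:CI_Citation/gmd:title/gco:CharacterString",
        namespaces=NS,
    )
    return {"identifier": identifier, "title": title}


fields = discovery_fields(SAMPLE_RECORD)
```

Fields like these are what a central index would typically store for search; a real record carries many more elements (extent, contact, distribution) than this sketch reads.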

Data access in ESR grid projects
Earth System Grid project (USA)
– Scope (project): high performance access of climate model data
– Data stock (status): homogeneous; flat-file storage
– Data description (solution): use aspect of data, tools and models; e.g. NcML for netCDF data
– Data access (solution): different protocols; intelligence at portal
C3 Grid (Germany)
– Scope (project): uniform & effective discovery and access of data of various disciplines & types
– Data stock (status): heterogeneous; databases & flat-file storage
– Data description (solution): discovery and some use aspects; ISO 19115/ISO
– Data access (solution): uniform access interface; intelligence at data provider / grid
NERC data grid (UK)
– Scope (project): harmonized & detailed search and access of data of various disciplines & types
– Data stock (status): heterogeneous; databases & flat-file storage
– Data description (solution): content of the data in great detail; semantic data model (CSML, based on GML)
– Data access (solution): different protocols; intelligence at portal

Bridging EGEE and C3
Architecture diagram. Labels:
– EGEE UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog
– Web Portal, C3 Lucene Index
– C3Grid data interface, Climate Data Workspace
– Webservice Interfaces, OAI-PMH servers
– German climate data providers (Data Resource, Metadata): WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS
Metadata flows:
– (a) Publish (ISO 19115/19139), (b) Harvest (OAI-PMH)
– (f) Publish (ISO 19115/19139), (g) Harvest (OAI-PMH)
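The publish and harvest arrows in the diagram rely on OAI-PMH. As a rough sketch of the harvesting side, the snippet below builds a `ListRecords` request and extracts the resumption token from a response, following the OAI-PMH 2.0 protocol. The endpoint URL, the `iso19139` metadata prefix and the sample response are assumptions for illustration; the slides do not state which prefix or endpoints the project uses, and no network access is performed here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: no other arguments
        # besides the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a ListRecords response, or None."""
    root = ET.fromstring(response_xml)
    token = root.findtext(
        "oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS
    )
    return token.strip() if token and token.strip() else None


# Hypothetical endpoint and shortened response, for illustration only.
BASE_URL = "https://provider.example.org/oai"
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(BASE_URL, metadata_prefix="iso19139")
follow_up_url = list_records_url(
    BASE_URL, resumption_token=next_token(SAMPLE_RESPONSE)
)
```

A harvester would repeat the request with each returned token until the response contains no token or an empty one.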

Demo
(1) Search, discover and select functionalities of the portal
(2) Upload and register data to EGEE
(3) Trigger the example workflow qflux from the portal

Upload pre-processed data to EGEE
Workflow steps shown: (1) Find & Select, (2) Collect & Prepare
Architecture diagram. Labels: EGEE UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog, Web Portal, C3 Lucene Index, C3Grid data interface, Climate Data Workspace, Webservice Interfaces, OAI-PMH servers, Data Resource, Metadata
Sequence:
(a) Request (webservice)
(b) Retrieve (JDBC or archive)
(c) Stage & Provide
(d) Notify
(e) Request (webservice)
(f) Transfer & Register (lcg-tools)
(g) Register (Java API)
(f) Publish (ISO 19115/19139)
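The transfer-and-register step is attributed only to "lcg-tools". One plausible way to perform it is the `lcg-cr` copy-and-register command from the gLite data management tools; the sketch below merely assembles such a command line and does not run it. The VO name, storage element host, logical file name and local path are all hypothetical, and whether the project used `lcg-cr` specifically is an assumption.

```python
def lcg_cr_command(vo, storage_element, lfn, local_path):
    """Assemble an lcg-cr call that copies a local file to a storage
    element and registers it in the file catalogue under a logical name.

    The command is only built here, never executed.
    """
    if not local_path.startswith("/"):
        raise ValueError("local_path must be absolute")
    return [
        "lcg-cr",
        "--vo", vo,                 # virtual organisation
        "-d", storage_element,      # destination storage element
        "-l", "lfn:" + lfn,         # logical file name in the catalogue
        "file:" + local_path,       # source file on the local machine
    ]


# Hypothetical values for illustration only.
command = lcg_cr_command(
    vo="esr",
    storage_element="se.example.org",
    lfn="/grid/esr/example/qflux_input.nc",
    local_path="/tmp/qflux_input.nc",
)
```

Returning the arguments as a list keeps the sketch safe to pass to `subprocess.run` on a machine where the tool is installed, without any shell quoting.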

Trigger qflux workflow
Workflow steps shown: (3) Analyse, (4) Visualize
Architecture diagram. Labels: EGEE UI, SE, CE, WN, LFC Catalog, AMGA Metadata Catalog, Web Portal, C3 Lucene Index, C3Grid data interface, Climate Data Workspace, Webservice Interfaces, OAI-PMH servers, Data Resource, Metadata
Sequence:
(a) Request (webservice)
(b) Submit (gLite) qflux
(c) Retrieve (lcg-tools)
(d) Update (Java API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
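Step (b) submits qflux as a gLite job. gLite jobs are described in JDL (Job Description Language), so a portal-side component could generate a description along the following lines. This is a sketch under assumptions: the wrapper script name `qflux.sh`, its argument and the output file name are hypothetical, and the slides do not show the job description the project actually used.

```python
def qflux_jdl(executable, arguments, output_files):
    """Render a minimal JDL job description as text."""

    def quoted(value):
        return '"' + value + '"'

    # Standard output and error are returned together with the results.
    sandbox_out = ["std.out", "std.err"] + list(output_files)
    lines = [
        "Executable = " + quoted(executable) + ";",
        "Arguments = " + quoted(" ".join(arguments)) + ";",
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = {" + quoted(executable) + "};",
        "OutputSandbox = {" + ", ".join(quoted(f) for f in sandbox_out) + "};",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical script, input and output names for illustration only.
jdl = qflux_jdl(
    executable="qflux.sh",
    arguments=["lfn:/grid/esr/example/qflux_input.nc"],
    output_files=["qflux.png"],
)
```

The resulting text could be written to a file and handed to a gLite submission command such as `glite-wms-job-submit`; the graphic listed in the output sandbox corresponds to the "Return graphic" step.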

Potential impact
Ease and accelerate the search, discovery, access and processing of German ESR data
→ Potential impact on the German ESR community
Provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems
→ Potential impact on the current and potential EGEE ESR community
Other portals or infrastructures can be integrated analogously to EGEE
→ Potential impact on the international ESR community
Built on international standards and thus easily adaptable and expandable by other disciplines and by further partners
→ Potential impact on other disciplines

Next steps
– Expand the demonstrated prototype into a reliable and stable system
– Port further workflows and some pre-processing functionalities to EGEE
– Enlarge the user community

Future challenges or missing bricks
Establish a comprehensive and consistent security context to control access to (restricted) data with a single sign-on
– C3Grid is starting to implement a federated AA infrastructure based on Shibboleth
Describe analysis services to improve the possibilities for discovery, use and sharing
– First approaches to adapt ISO 19119/19139 as a common metadata format for tool description
Modularize workflows to increase flexibility and enable intelligent scheduling
– First steps to implement a workflow information service