Toni Saarinen, Tite4 Tomi Ruuska, Tite4 Earth System Grid - ESG.


1 Earth System Grid - ESG
Toni Saarinen, Tite4
Tomi Ruuska, Tite4

2 ESG Overview
Earth System Grid enables management, discovery, distributed access, processing and analysis of distributed terascale climate research data.
A “Collaboratory Pilot Project” funded by the DOE (Department of Energy) SciDAC program.
Builds upon ESG-I, Globus Toolkit and DataGrid technologies.

3 ESG Overview
The main goal of ESG is to make climate data an easily accessible community resource.
Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical.
The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.
Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and data set manipulation.
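As a rough illustration of why server-side capabilities reduce data movement, the sketch below compares the bytes moved when a whole field is transferred with the bytes moved when the server first extracts a regional subset. The grid sizes, the 4-byte value size, and the function names are assumptions made for this example, not figures or interfaces from ESG.

```python
def field_bytes(n_time, n_lat, n_lon, bytes_per_value=4):
    """Size in bytes of one variable stored as a dense (time, lat, lon) array."""
    return n_time * n_lat * n_lon * bytes_per_value


def subset_fraction(full_shape, subset_shape):
    """Fraction of the full field that a server-side subset would transfer."""
    return field_bytes(*subset_shape) / field_bytes(*full_shape)


# Hypothetical case: 1200 monthly steps on a 180 x 360 grid,
# from which a researcher needs 120 steps over a 30 x 60 region.
full = (1200, 180, 360)
subset = (120, 30, 60)

full_size = field_bytes(*full)        # bytes moved without server-side reduction
subset_size = field_bytes(*subset)    # bytes moved after server-side subsetting
fraction = subset_fraction(full, subset)
```

In this made-up case the subset is 1/360 of the full field, which is the kind of saving the server-side strategy aims for.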

4 ESG Participants
ANL: Argonne National Laboratory (Argonne, IL)
ISI: Information Sciences Institute (Marina del Rey, CA)
LANL: Los Alamos National Laboratory (Los Alamos, NM)
LBNL: Lawrence Berkeley National Laboratory (Berkeley, CA)
LLNL: Lawrence Livermore National Laboratory (Livermore, CA)
NCAR: National Center for Atmospheric Research (Boulder, CO)
NERSC: National Energy Research Scientific Computing Center (Oakland, CA)
ORNL: Oak Ridge National Laboratory (Oak Ridge, TN)
USC: University of Southern California (Los Angeles, CA)

5 ESG History
ESG-I: DOE NGI (Next Generation Internet) project
  – Focus on high-performance data movement, Grid-enabled versions of LLNL tools
  – Early successes include bandwidth challenge at SC’2001, significant technology output
  – Experimental deployments only, at participating sites
ESG-II: DOE SciDAC (Scientific Discovery through Advanced Computing) project
  – “Smart servers” for server-side data reduction
  – Integration with common “thin” clients, e.g. DODS and Data Portals
  – Client software in the hands of environmental scientists
  – Production deployments at participating instances

6 Climate GRID Example for Ocean Model
Temperature(i,j)
Latitude(i,j)
Longitude(i,j)
Lat_bounds(i,j,4)
Lon_bounds(i,j,4)
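The variables on this slide describe a grid in which every cell (i, j) carries its own latitude and longitude plus four corner coordinates. A minimal sketch of that layout follows, using nested Python lists. The function name `build_grid`, the corner ordering, and the regular spacing are assumptions chosen to keep the example small; on a real ocean model grid the centers and corners would come from the model itself.

```python
def build_grid(n_i, n_j, lat0=-1.0, lon0=0.0, step=1.0):
    """Build cell centers and 4-corner bounds for an n_i x n_j grid.

    A regular grid is used only to keep the example small; on a real
    curvilinear ocean grid the centers and corners would come from the model.
    """
    half = step / 2.0
    latitude = [[lat0 + i * step for j in range(n_j)] for i in range(n_i)]
    longitude = [[lon0 + j * step for j in range(n_j)] for i in range(n_i)]
    # Corners are listed counter-clockwise starting at the south-west corner.
    lat_bounds = [[[lat - half, lat - half, lat + half, lat + half]
                   for lat in row] for row in latitude]
    lon_bounds = [[[lon - half, lon + half, lon + half, lon - half]
                   for lon in row] for row in longitude]
    return latitude, longitude, lat_bounds, lon_bounds


latitude, longitude, lat_bounds, lon_bounds = build_grid(2, 3)
# Temperature(i, j): one value per cell, filled with a placeholder here.
temperature = [[0.0 for _ in range(3)] for _ in range(2)]
```

Storing four corners per cell is what lets the bounds describe cells that are not aligned with lines of latitude and longitude.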

7 Geographical Overview

8 ESG-II Architecture

9 ESG Components
User authentication
Metadata Search
Replica Location and transfer
Data analysis and visualization
Demonstration Workflow:
Globus Toolkit (ANL, ISI)
  – GridFTP data transfer
  – GRAM resource access
  – Community Authorization Service (CAS)
  – Replica Location Service (RLS)
  – Metadata Catalog Service (MCS)
Web interface (NCAR) and workflow manager
Hierarchical Resource Manager (HRM) (LBNL) Storage Resource Manager
Metadata (NCAR, LLNL, ISI)
OpenDAP-G (NCAR, ANL)
Live Access Server (NCAR)

10 The Globus Toolkit™
An Open Source Project
Security
Directory, Metadata, and Replica Services
Resource Management
Data Access and Management
Distributed Computation
Open Grid Services Architecture (OGSA)
  – Reliable, persistent web services

11 The Globus Toolkit™
Globus middleware supports linkage of distributed data archives, supercomputers, workstations and local disk caches into data/computational grids.
GridFTP: high-performance, secure, robust data transfer mechanism (protocol, server, client library).
ESG is integrating OpenDAP (DODS protocol) with the GridFTP protocol.
Single sign-on using Grid Security Infrastructure
Proxy certificates
Community Authorization Service (CAS)
Replica Location Service: manages copying and placement of files in a distributed environment.
Logical vs. physical files
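The logical vs. physical file distinction can be sketched as a lookup table: one logical name maps to any number of physical copies. The class below is a toy in-memory illustration only. It is not the Globus Replica Location Service API, and the class name, logical name, and host names are all hypothetical.

```python
class ReplicaCatalog:
    """Toy in-memory mapping from logical file names to physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas.setdefault(logical_name, [])
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical locations (empty list if none)."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Hypothetical logical name and hosts, used for illustration only.
catalog.register("lfn://esg/ocean/temperature.nc",
                 "gsiftp://host-a.example.org/data/temperature.nc")
catalog.register("lfn://esg/ocean/temperature.nc",
                 "gsiftp://host-b.example.org/archive/temperature.nc")
locations = catalog.lookup("lfn://esg/ocean/temperature.nc")
```

A client that only knows the logical name can then pick whichever physical copy is closest or cheapest to read.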

12 Distributed Data Access Protocol
[Diagram: three data access paths]
Typical Application: Application, netCDF lib, Data (local)
OpenDAP via http: Application, OpenDAP Client, OpenDAP Server, Data (remote)
OpenDAP via Grid (distributed application): Application, ESG client, ESG Server, Grid + DODS, Big Data (remote)
Grid + OpenDAP:
  - Transparency
  - Performance
  - Security
  - Resource Management
  - Analysis functions

13 ESG Metadata Services
[Diagram: layered metadata services]
ESG CLIENTS API & USER INTERFACES: PUBLISHING, SEARCH & DISCOVERY, ADMINISTRATION, BROWSING & DISPLAY, ANALYSIS & VISUALIZATION
HIGH LEVEL METADATA SERVICES: METADATA EXTRACTION, METADATA DISPLAY, METADATA BROWSING, METADATA QUERY, METADATA ANNOTATION, METADATA VALIDATION, METADATA AGGREGATION, METADATA DISCOVERY, METADATA & DATA REGISTRATION
CORE METADATA SERVICES: METADATA ACCESS (update, insert, delete, query), SERVICE TRANSLATION LIBRARY
METADATA HOLDINGS: Data & Metadata Catalog, Dublin Core Database, CF Database, mirror, Dublin Core XML Files, COMMENTS XML Files

14 Resource Management
Hierarchical Resource Manager
  - queuing of file transfer requests
  - reordering of requests to optimize Parallel FTP
  - monitoring progress and error messages
  - re-scheduling failed transfers
  - enforcing local resource policy
Storage Resource Management
  - Manage space
  - Manage files on behalf of a user
  - Manage file sharing
  - Get files from remote locations when necessary
  - Manage multi-file requests
  - Provide grid access to/from mass storage
  - Transfer protocol negotiation
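The queuing, reordering, and re-scheduling behaviour listed for the Hierarchical Resource Manager can be sketched with a simple work queue. This is an illustrative toy, not HRM code: the function names, the choice to reorder by grouping requests per tape, and the retry limit are all assumptions.

```python
from collections import deque


def run_transfers(requests, transfer, max_attempts=3):
    """Process file transfer requests, re-queuing failures.

    requests: iterable of (tape_id, file_name) pairs.
    transfer: callable(file_name) -> bool, True on success.
    Returns (completed, failed) lists of file names.
    """
    # Reorder so that files on the same tape are requested together.
    # sorted() is stable, so arrival order is kept within each tape.
    ordered = sorted(requests, key=lambda r: r[0])
    queue = deque((name, 1) for _, name in ordered)
    completed, failed = [], []
    while queue:
        name, attempt = queue.popleft()
        if transfer(name):
            completed.append(name)
        elif attempt < max_attempts:
            queue.append((name, attempt + 1))  # re-schedule at the back
        else:
            failed.append(name)
    return completed, failed


attempts = {}


def flaky_transfer(name):
    """Stand-in transfer: 'b.nc' fails once, 'd.nc' always fails."""
    attempts[name] = attempts.get(name, 0) + 1
    if name == "d.nc":
        return False
    if name == "b.nc":
        return attempts[name] > 1
    return True


requests = [("tape2", "c.nc"), ("tape1", "a.nc"),
            ("tape2", "d.nc"), ("tape1", "b.nc")]
completed, failed = run_transfers(requests, flaky_transfer)
```

Re-queuing a failed request at the back lets other transfers proceed while a transient fault clears, and the attempt limit keeps a permanently failing file from blocking the queue.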

15 Live Access Server
General-purpose Web server for geo-science data sets
Directs communications between a user and an application running under a Web server
Converts requests into a series of commands that actually perform the data access

16 ESG Data Portal
Goal: Make large ESG data sets easily accessible to scientists for production use

17 [Architecture diagram: ESG components deployed across sites]
Sites: LBNL, LLNL, ISI, NCAR, ORNL, ANL
Clients: MCS client, RLS client, MyProxy client, CAS client, Striped gridFTP client
Services: TOMCAT Servlet engine, MCS (Metadata Cataloguing Services), RLS (Replica Location Services), CAS (Community Authorization Services), MyProxy server, GRAM gatekeeper, LAS (Live Access Server)
Data servers: gridFTP server, openDAPg server, CAS-enabled Striped-gridFTP server
Storage: disk, MSS (Mass Storage System), HPSS (High Performance Storage System), SRM (Storage Resource Management)
Protocols: SOAP, RMI, gridFTP

18 ESG: Strategies & Goals
Move data a minimal amount, keep it close to computational point of origin when possible
  – Data access protocols, distributed analysis
When we must move data, do it fast and with a minimum amount of human intervention
  – Storage Resource Management, fast networks
Keep track of what we have, particularly what’s on deep storage
  – Metadata and Replica Catalogs
Harness a federation of sites
  – Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

19 ESG Development in 2003
Metadata Conventions and Services
  – Application groups deciding on one (or more) metadata schemas
  – Better MCS support for XML schema
  – Distribution and federation of heterogeneous metadata catalogs
Integration of DODS server and GridFTP data transport protocol
Customization of Replica Location Service for ESG
Storage Resource Manager (from LBNL) to optimize storage transfers
Community authorization service to provide fine-grained access control

