Toni Saarinen, Tite4 Tomi Ruuska, Tite4 Earth System Grid - ESG.

Presentation transcript:

ESG Overview
- Earth System Grid enables management, discovery, distributed access, processing and analysis of distributed terascale climate research data
- A “Collaboratory Pilot Project” funded by the DOE (Department of Energy) SciDAC program
- Builds upon ESG-I, Globus Toolkit, DataGrid technologies

ESG Overview
- The main goal of ESG is to make climate data an easily accessible community resource.
- Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical.
- The broad strategy is to develop a collection of server-side capabilities to minimize the amount of data movement.
- Multiple interfaces to ESG will allow researchers to focus on science rather than issues of data transfer, format, and data set manipulation.

ESG Participants
- ANL: Argonne National Laboratory (Argonne, IL)
- ISI: Information Sciences Institute (Marina del Rey, CA)
- LANL: Los Alamos National Laboratory (Los Alamos, NM)
- LBNL: Lawrence Berkeley National Laboratory (Berkeley, CA)
- LLNL: Lawrence Livermore National Laboratory (Livermore, CA)
- NCAR: National Center for Atmospheric Research (Boulder, CO)
- NERSC: National Energy Research Scientific Computing Center (Oakland, CA)
- ORNL: Oak Ridge National Laboratory (Oak Ridge, TN)
- USC: University of Southern California (Los Angeles, CA)

ESG History
- ESG-I: DOE NGI (Next Generation Internet) project
  – Focus on high-performance data movement, Grid-enabled versions of LLNL tools
  – Early successes include bandwidth challenge at SC’2001, significant technology output
  – Experimental deployments only, at participating sites
- ESG-II: DOE SciDAC (Scientific Discovery through Advanced Computing) project
  – “Smart servers” for server-side data reduction
  – Integration with common “thin” clients, e.g. DODS and Data Portals
  – Client software in the hands of environmental scientists
  – Production deployments at participating instances

Climate GRID Example for Ocean Model
- Temperature(i,j)
- Latitude(i,j)
- Longitude(i,j)
- Lat_bounds(i,j,4)
- Lon_bounds(i,j,4)
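The variable shapes on this slide suggest a grid in which every cell (i,j) carries its own latitude and longitude plus four corner bounds. The following is a minimal sketch of that layout in plain Python. The helper name `make_grid`, the grid sizes and the regular spacing are assumptions for illustration only; real ocean model grids are generally irregular, which is presumably why per-cell coordinates are stored.

```python
def make_grid(ni, nj, lat0=-1.0, lon0=0.0, step=1.0):
    """Build nested lists shaped like the slide's variables (hypothetical helper)."""
    half = step / 2.0
    # Cell-centre coordinates, shape (ni, nj)
    latitude = [[lat0 + i * step for _ in range(nj)] for i in range(ni)]
    longitude = [[lon0 + j * step for j in range(nj)] for _ in range(ni)]
    # Four corner values per cell, shape (ni, nj, 4)
    lat_bounds = [[[latitude[i][j] - half, latitude[i][j] - half,
                    latitude[i][j] + half, latitude[i][j] + half]
                   for j in range(nj)] for i in range(ni)]
    lon_bounds = [[[longitude[i][j] - half, longitude[i][j] + half,
                    longitude[i][j] + half, longitude[i][j] - half]
                   for j in range(nj)] for i in range(ni)]
    # Placeholder data field, shape (ni, nj)
    temperature = [[0.0 for _ in range(nj)] for _ in range(ni)]
    return {
        "Temperature": temperature,
        "Latitude": latitude,
        "Longitude": longitude,
        "Lat_bounds": lat_bounds,
        "Lon_bounds": lon_bounds,
    }
```

Calling `make_grid(2, 3)` gives a 2 x 3 field whose `Lat_bounds` and `Lon_bounds` entries each hold four corner values per cell.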

Geographical Overview

ESG-II Architecture

ESG Components
- User authentication
- Metadata search
- Replica location and transfer
- Data analysis and visualization
Demonstration Workflow:
- Globus Toolkit (ANL, ISI)
  – GridFTP data transfer
  – GRAM resource access
  – Community Authorization Service (CAS)
  – Replica Location Service (RLS)
  – Metadata Catalog Service (MCS)
- Web interface (NCAR) and workflow manager
- Hierarchical Resource Manager (HRM) (LBNL)
- Storage Resource Manager
- Metadata (NCAR, LLNL, ISI)
- OpenDAP-G (NCAR, ANL)
- Live Access Server (NCAR)

The Globus Toolkit™
- An Open Source Project
- Security
- Directory, Metadata, and Replica Services
- Resource Management
- Data Access and Management
- Distributed Computation
- Open Grid Services Architecture (OGSA)
  – Reliable, persistent web services

The Globus Toolkit™
- Globus middleware supports linkage of distributed data archives, supercomputers, workstations, local disk caches into data/computational grids.
- GridFTP: high-performance, secure, robust data transfer mechanism: protocol, server, client library.
- ESG is integrating OpenDAP (DODS protocol) with GridFTP protocol.
- Single sign-on using Grid Security Infrastructure
  – Proxy certificates
  – Community Authorization Service (CAS)
- Replica Location Service: manages copying and placement of files in a distributed environment.
  – Logical vs. physical files
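The "logical vs. physical files" distinction can be illustrated with a small lookup table: one logical file name maps to any number of physical copies. This is a minimal sketch only. The class `ReplicaCatalog`, its methods and the example URLs are hypothetical and are not the Globus Replica Location Service API.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy mapping from logical file names to physical replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        # Record a physical copy once, keeping registration order
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        # Return a copy so callers cannot mutate the catalog
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
catalog.register("ocean/temp_2000.nc", "gsiftp://site-a.example.org/data/temp_2000.nc")
catalog.register("ocean/temp_2000.nc", "gsiftp://site-b.example.org/cache/temp_2000.nc")
```

A client would ask for the logical name and then pick whichever physical replica is closest or least loaded; that selection step is left out here.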

Distributed Data Access Protocol
[Diagram: three data access paths. Local data: Application, netCDF lib, Data (local). OpenDAP via http: Application, OpenDAP client, OpenDAP server, Data (remote). OpenDAP via Grid: Application, ESG client, ESG server (Grid + DODS), Big Data (remote). Panel labels: Typical Application, Distributed Application.]
Grid + OpenDAP:
- Transparency
- Performance
- Security
- Resource Management
- Analysis functions

ESG Metadata Services
[Diagram: layered metadata architecture; labels as extracted, duplicates removed]
- Layers: ESG clients API & user interfaces; High level metadata services; Core metadata services; Metadata holdings
- Service boxes: Metadata extraction; Metadata display; Metadata browsing; Metadata query; Metadata annotation; Metadata validation; Metadata access (update, insert, delete, query); Service translation library; Metadata aggregation; Metadata discovery; Metadata & data registration; Publishing
- Holdings: Data & Metadata Catalog; Dublin Core database; CF database; mirror; Dublin Core XML files; Comments XML files
- Client functions: Search & discovery; Administration; Browsing & display; Analysis & visualization
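The "metadata access (update, insert, delete, query)" box names four operations on catalog records. The sketch below shows them on an in-memory dictionary, using a few Dublin Core element names (`title`, `creator`, `format`, `identifier`) as record fields. The class `MetadataCatalog` and the example records are hypothetical; this is not the MCS interface.

```python
class MetadataCatalog:
    """Toy in-memory metadata catalog keyed by record identifier."""

    def __init__(self):
        self._records = {}

    def insert(self, identifier, **fields):
        if identifier in self._records:
            raise ValueError(f"{identifier} already registered")
        self._records[identifier] = {"identifier": identifier, **fields}

    def update(self, identifier, **fields):
        # Raises KeyError if the record does not exist
        self._records[identifier].update(fields)

    def delete(self, identifier):
        del self._records[identifier]

    def query(self, **criteria):
        # Return identifiers of records matching every given field exactly
        return sorted(
            ident for ident, rec in self._records.items()
            if all(rec.get(key) == value for key, value in criteria.items())
        )
```

A query with no criteria returns every identifier, which is a convenient way to list the holdings in this toy version.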

Resource Management
Hierarchical Resource Manager
- queuing of file transfer requests
- reordering of requests to optimize Parallel FTP
- monitoring progress and error messages
- re-schedules failed transfers
- enforces local resource policy
Storage Resource Management
- Manage space
- Manage files on behalf of a user
- Manage file sharing
- Get files from remote locations when necessary
- Manage multi-file requests
- Provide grid access to/from mass storage
- Transfer protocol negotiation
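The queuing, reordering and re-scheduling behaviour listed for the Hierarchical Resource Manager can be sketched as a simple retry queue. This is an illustration under stated assumptions, not HRM code: `run_transfers` and its `transfer` callback are hypothetical, sorting by name merely stands in for whatever reordering the real system applies, and failures are signalled with `OSError`.

```python
from collections import deque


def run_transfers(requests, transfer, max_attempts=3):
    """Process queued transfers, re-queuing failures up to max_attempts."""
    # Reorder the requests (here: alphabetically) before queuing them
    queue = deque((name, 1) for name in sorted(requests))
    done, failed = [], []
    while queue:
        name, attempt = queue.popleft()
        try:
            transfer(name)
            done.append(name)
        except OSError as err:
            if attempt < max_attempts:
                # Re-schedule the failed transfer at the back of the queue
                queue.append((name, attempt + 1))
            else:
                # Give up and keep the error message for monitoring
                failed.append((name, str(err)))
    return done, failed
```

The function returns the completed file names and a list of `(name, error message)` pairs for transfers that exhausted their attempts.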

Live Access Server
- General purpose Web server for geo-science data sets
- Directs communications between a user and an application running under a Web server
- Converts requests into a series of commands which actually do the data access

ESG Data Portal
Goal: Make large ESG data sets easily accessible to scientists for production use

[Diagram: ESG deployment across LBNL, LLNL, ISI, NCAR, ORNL and ANL. Components shown: TOMCAT servlet engine; MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client; MyProxy server and MyProxy client; CAS (Community Authorization Services) and CAS client; GRAM gatekeeper; SRM (Storage Resource Management) in front of disk, MSS (Mass Storage System) and HPSS (High Performance Storage System); gridFTP servers; CAS-enabled striped-gridFTP servers and a striped gridFTP client; openDAPg servers; LAS (Live Access Server). Connection labels: SOAP, RMI, gridFTP.]

ESG: Strategies & Goals
- Move data a minimal amount, keep it close to computational point of origin when possible
  – Data access protocols, distributed analysis
- When we must move data, do it fast and with a minimum amount of human intervention
  – Storage Resource Management, fast networks
- Keep track of what we have, particularly what’s on deep storage
  – Metadata and Replica Catalogs
- Harness a federation of sites
  – Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
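A rough back-of-the-envelope calculation shows why moving data "a minimal amount" matters. The sketch below compares the size of a full field with a regional subset extracted on the server. The function names, the grid dimensions (a 1-degree global grid with 1200 time steps) and the 4-byte values are illustrative assumptions, not figures from the ESG project.

```python
def field_bytes(nlat, nlon, ntime, bytes_per_value=4):
    """Size in bytes of one variable stored as a dense (time, lat, lon) array."""
    return nlat * nlon * ntime * bytes_per_value


def reduction_factor(full_shape, subset_shape):
    """How many times less data moves when only the subset is transferred."""
    return field_bytes(*full_shape) / field_bytes(*subset_shape)


full = field_bytes(180, 360, 1200)    # whole globe, all time steps
subset = field_bytes(20, 30, 1200)    # a 20 x 30 degree region, all time steps
```

With these assumed dimensions the full field is about 311 MB and the regional subset under 3 MB, so subsetting on the server moves roughly a hundredth of the data.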

ESG Development in 2003
- Metadata Conventions and Services
  – Application groups deciding on one (or more) metadata schemas
  – Better MCS support for XML schema
  – Distribution and federation of heterogeneous metadata catalogs
- Integration of DODS server and GridFTP data transport protocol
- Customization of Replica Location Service for ESG
- Storage Resource Manager (from LBNL) to optimize storage transfers
- Community authorization service to provide fine-grained access control
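Fine-grained access control of the kind planned for the community authorization service can be pictured as a policy table that grants actions to groups on parts of the data namespace. The sketch below is only an illustration of that idea. The function `is_allowed`, the group names and the path prefixes are hypothetical, and none of this reflects the actual CAS protocol or API.

```python
def is_allowed(user_groups, path, action, policy):
    """Return True if any of the user's groups grants `action` on a prefix of `path`."""
    for (group, prefix), actions in policy.items():
        if group in user_groups and path.startswith(prefix) and action in actions:
            return True
    return False


# Hypothetical community policy: (group, path prefix) -> allowed actions
policy = {
    ("modelers", "ccsm/"): {"read", "write"},
    ("analysts", "ccsm/public/"): {"read"},
}
```

For example, `is_allowed({"analysts"}, "ccsm/public/run1.nc", "read", policy)` is granted, while the same user asking to write, or to read outside `ccsm/public/`, is refused.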