Earth System Grid Center for Enabling Technologies (ESG-CET) Overview
ESG-CET Team


1 Earth System Grid Center for Enabling Technologies (ESG-CET) Overview ESG-CET Team Climate change is not only a scientific challenge of the first order but also a major technological challenge. The international climate community is expected to generate hundreds of petabytes of simulation data within the next three to seven years.

2 Earth System Grid Center for Enabling Technologies (ESG-CET)
Data management and analysis for the Earth System Grid
- ESG's mission is to provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational capabilities required to make sense of enormous climate simulation datasets.
- ESG's goals are to:
  - make data more useful to climate researchers by developing Grid technology that enhances data usability,
  - meet the specific distributed-database, data-access, and data-movement needs of national and international climate projects,
  - provide a universal and secure web-based data access portal for broad multi-model data collections,
  - provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies, and
  - develop key ideas and concepts that are important contributions to other domain areas.

3 Earth System Grid Center for Enabling Technologies (ESG-CET)
Growing data
- Early 1990s (e.g., AMIP1, PMIP, CMIP1): modest collection of monthly-mean 2D files: ~1 GB
- Late 1990s (e.g., AMIP2): large collection of monthly-mean and 6-hourly 2D and 3D fields: ~500 GB
- Present (e.g., IPCC/CMIP3): fairly comprehensive output from both ocean and atmospheric components; monthly, daily, and 3-hourly: ~35 TB
- Future (2010):
  - The IPCC Fifth Assessment Report (AR5) in 2010: expected between 2.5 and 15 PB
  - The Climate Science Computational End Station (CCES) project at ORNL: expected around 3 PB
  - The North American Regional Climate Change Assessment Program (NARCCAP): expected around 1 PB
  - The Cloud Feedback Model Intercomparison Project (CFMIP) archives: expected to be 0.3 PB

4 Network Traffic, Climate and Physics Data, and Network Capacity
[Chart: ESnet traffic, HEP experiment data, and the ESnet capacity roadmap, historical and projected, all normalized to 1 at January 1990. Ignoring the units of the quantities being graphed and looking only at the long-term trends: all of the "ground truth" measures are growing significantly faster than ESnet's projected capacity. Climate model data are projected to reach roughly 4 PBy by 2010.]

5 Earth System Grid Center for Enabling Technologies (ESG-CET)
Data integration challenges facing climate science
- Modeling groups will generate more data in the near future than exist today.
  - A large part of research consists of writing programs to analyze data.
  - How best to collect, distribute, and find data on a much larger scale? At each stage, tools could be developed to improve efficiency.
  - Substantially more ambitious community modeling projects, at petabyte (PB, 10^15 bytes) and exabyte (EB, 10^18 bytes) scale, will require a distributed database.
- Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.). (But wait, there's more: economy, public health, energy, etc.)
- How to make information understandable to end users so that they can interpret the data correctly.
  - More users than just Working Group 1 (WG1) science: WG2 (impacts) and WG3 (mitigation), i.e., policy makers, economists, health officials, etc.
- Integration of multiple analysis tools, formats, and data from unknown sources.
- Trust and security on a global scale (not just an agency or country, but worldwide).

6 Earth System Grid Center for Enabling Technologies (ESG-CET)
ESG Goals
- Petabyte-scale data volumes
- Globally federated sites
- "Virtual datasets" created through subsetting and aggregation
- Metadata-based search and discovery
- Bulk data access
- Web-based and analysis-tool access
- Increased flexibility and robustness
[Map: Current ESG Sites]

7 Earth System Grid Center for Enabling Technologies (ESG-CET)
Climate model data management and analysis issues

Before ESG (tremendous manual intervention, inefficient by any measure):
- Data
  - Different formats, not standardized
  - Different sites require knowledge of different methods of access
  - Log onto multiple sites to hopefully find and retrieve data
  - Gigabyte (GB, 10^9 bytes) to terabyte (TB, 10^12 bytes) data volumes
- Metadata
  - Painful to produce
  - Mostly kept in files separate from the data
  - Data lost or reproduced numerous times
- Locating data
  - Manual
  - Unreachable unless one is "in the know" (location kept in someone's brain)
  - Not formalized
- Data requests/analysis
  - Beginnings of a formal process
  - Far too much done by hand
  - Logging nearly non-existent

ESG, present and future (computers do the more complicated and repetitive tasks, and scientists focus on research):
- Data
  - Standard output, with model compliance tools to facilitate it
  - Quality assurance
  - Different sites but a standardized access protocol: a one-stop shop
  - Terabyte (TB, 10^12 bytes) to petabyte (PB, 10^15 bytes) data volumes
- Metadata
  - Exhaustive detail
  - Created via semi-automated processes
  - Data put in databases and made visible to others
- Locating data
  - Formalized process
  - Highly granular: down to the per-file, per-model, per-variable level
  - Readily searchable with sophisticated search tools
- Data requests/analysis/visualization
  - Completely automated
  - All logging done automatically
  - Secure
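The "present and future" side of the slide relies on harvesting per-file metadata into a database rather than leaving it buried in file headers. A minimal sketch of that idea, in plain Python: build an index keyed by (facet, value) so lookups can go down to the per-variable level. The file names, facet names, and record layout here are invented for illustration and are not the actual ESG schema.

```python
from collections import defaultdict

def build_catalog(file_records):
    """Index harvested metadata records by (facet, value) pairs."""
    index = defaultdict(set)
    for rec in file_records:
        for facet in ("model", "experiment", "variable"):
            index[(facet, rec[facet])].add(rec["path"])
    return index

# Hypothetical records, as a semi-automated harvester might produce them.
records = [
    {"path": "ccsm3_picntrl_tas.nc",   "model": "CCSM3",  "experiment": "picntrl",  "variable": "tas"},
    {"path": "ccsm3_picntrl_pr.nc",    "model": "CCSM3",  "experiment": "picntrl",  "variable": "pr"},
    {"path": "hadcm3_1pctto2x_tas.nc", "model": "HadCM3", "experiment": "1pctto2x", "variable": "tas"},
]
catalog = build_catalog(records)

# Per-variable granularity: every file holding surface air temperature ("tas").
print(sorted(catalog[("variable", "tas")]))  # → ['ccsm3_picntrl_tas.nc', 'hadcm3_1pctto2x_tas.nc']
```

In a real deployment this index would live in a database behind the portal; the point is only that a formalized, machine-built catalog replaces "location kept in someone's brain".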

8 Earth System Grid Center for Enabling Technologies (ESG-CET)
Current ESG architecture and underlying technologies
- Climate data
  - Metadata Catalog
  - NcML (metadata schema)
  - OPeNDAP-g (aggregation and subsetting)
- Data management
  - Storage Resource Manager
- Data transfer
  - Globus Security Infrastructure
  - DataMover-Lite
  - GridFTP
  - Monitoring and Discovery Services
  - Replica Location Service
- Security
  - Access control
  - MyProxy
  - User registration
- Long-term storage
  - Tertiary data storage systems

9 Earth System Grid Center for Enabling Technologies (ESG-CET)
ESG: The world's source for climate modeling data

CCSM ESG Portal:
- 198 TB of data at four locations
- 1,150 datasets; 1,032,000 files
- Includes the past 6 years of joint DOE/NSF climate modeling experiments
- 8,000 registered users
- Downloads to date: 60 TB; 176,000 files

CMIP3 (IPCC AR4) ESG Portal:
- 35 TB of data at one location; 74,700 files
- Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
- Data from 13 countries, representing 25 models
- 2,000 registered projects
- Downloads to date: ~1/2 PB; 1,300,000 files; 500 GB/day (average)
- 400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data

ESG usage: over 500 sites worldwide
[Chart: ESG monthly download volumes]

10 Earth System Grid Center for Enabling Technologies (ESG-CET)
Intercomparison example

Current usage:
- Browse database
- Download data
- Organize data on local site
- Regrid data at local site
- Perform diagnostics
- Produce results

Future usage:
- Search, browse, and discover distributed data
- Request data from the remote site
- The remote site regrids and performs data-system reduction
- ESG returns user-defined products

11 Earth System Grid Center for Enabling Technologies (ESG-CET)
ESG-CET improvements over ESG II
- Much broader model for scientific metadata
- "Faceted" search capability guides the user toward datasets of interest
  - At a given point in the search, only those options which produce non-empty result sets are shown, avoiding dead-end searches
  - Flexible browsing hierarchy
- Automated, GUI-based publication tools
- Single sign-on
- Full support for data aggregations
  - A collection of files, usually ordered by simulation time, that can be treated as a single file for purposes of data access, computation, and visualization
- File-streaming and release capabilities for data access to deep storage
- Client access to subsetting and visualization services
- Server-generated visualization products
- Fine-grained access to datasets based on user groups and roles
- User notification service: users can choose to be notified when a dataset has been modified
- Pre-computed products (e.g., global averages)
- User workspace (storing of favorite products, search criteria, etc.)
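The aggregation concept defined above, a time-ordered collection of files treated as one logical dataset, can be sketched as a thin index that maps a global time step to the file and local offset that hold it, without opening any files. The file names and the 12-steps-per-file layout are hypothetical.

```python
import bisect

class Aggregation:
    """Present many time-ordered files as one virtual dataset."""
    def __init__(self, parts):
        # parts: list of (filename, number_of_timesteps_in_file)
        self.names = [name for name, _ in parts]
        self.starts = []                  # cumulative start index of each file
        total = 0
        for _, n in parts:
            self.starts.append(total)
            total += n
        self.total = total                # length of the virtual time axis

    def locate(self, t):
        """Map global time index t to (filename, index within that file)."""
        if not 0 <= t < self.total:
            raise IndexError(t)
        i = bisect.bisect_right(self.starts, t) - 1
        return self.names[i], t - self.starts[i]

# Three hypothetical yearly files of monthly means (12 steps each).
agg = Aggregation([("tas_1900.nc", 12), ("tas_1901.nc", 12), ("tas_1902.nc", 12)])
print(agg.locate(14))  # → ('tas_1901.nc', 2)
```

A server holding such an index can answer a subset request for any time range by reading only the files that intersect it, which is what makes server-side subsetting of aggregations cheap.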

12 Earth System Grid Center for Enabling Technologies (ESG-CET)
Gateways and nodes
- Federated architecture
- Gateways
  - Portals, search capability, distributed metadata, registration, and user management
  - Initially PCMDI, NCAR, ORNL; eventually GFDL
  - May be customized to an institution's requirements
  - More complex architecture than nodes; fewer sites
- Nodes
  - Where data is stored and published; data may be on disk or in tertiary mass store
  - Each node has a trust relationship with a specific gateway for publication
  - Data reduction
  - Less complex architecture
- A site can be both a gateway and a node.
Federation is a virtual trust relationship among independent management domains that have their own sets of services. Users authenticate once to gain access to data across multiple systems and organizations.
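The node-to-gateway publication rule above, each node publishes only through the one gateway it has a trust relationship with, can be sketched in a few lines. The class and site names are illustrative, not real ESG configuration.

```python
class Gateway:
    """A community-facing portal that accepts publications from trusted nodes."""
    def __init__(self, name):
        self.name = name
        self.catalog = []                 # metadata records published so far

    def publish(self, node, record):
        # Enforce the trust relationship: only this gateway's own nodes publish here.
        if node.gateway is not self:
            raise PermissionError(f"{node.name} has no trust relationship with {self.name}")
        self.catalog.append(record)

class DataNode:
    """A data-holding site; trusts exactly one gateway for publication."""
    def __init__(self, name, gateway):
        self.name = name
        self.gateway = gateway

pcmdi = Gateway("PCMDI")
node = DataNode("llnl-node", pcmdi)
pcmdi.publish(node, {"dataset": "cmip3.tas"})
print(pcmdi.catalog)  # → [{'dataset': 'cmip3.tas'}]
```

The federation then comes from gateways sharing their catalogs with each other, so a user authenticated at one gateway can discover data published anywhere.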

13 Earth System Grid Center for Enabling Technologies (ESG-CET)
Architecture of the next generation of ESG-CET

14 Earth System Grid Center for Enabling Technologies (ESG-CET)
The next generation ESG-CET system
- Distributed and federated architecture
- Support for discipline-specific gateways
- Support for browser-based and direct client access

15 Earth System Grid Center for Enabling Technologies (ESG-CET)
Use Case 1: scientific metadata search
"Find surface temperature data across all models for a specific IPCC experiment that has volcanic forcing."
- Capture scientific metadata in a detailed object model
- Faceted search to slice through data via user-selected categories
- Link to data access points (files or products)
- Broker application for one-click request of data products?
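The faceted search in this use case can be sketched with plain Python: filter records by the facets selected so far, and offer for the next facet only values that still yield results, which is how dead-end searches are avoided. The records, facet names, and experiment label below are invented for illustration.

```python
def search(records, **selected):
    """All records matching every currently selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]

def facet_options(records, facet, **selected):
    """Values of `facet` that produce non-empty result sets under the selection."""
    return sorted({r[facet] for r in search(records, **selected)})

records = [
    {"model": "CCSM3",  "experiment": "volcanic", "variable": "tas"},
    {"model": "HadCM3", "experiment": "volcanic", "variable": "tas"},
    {"model": "CCSM3",  "experiment": "picntrl",  "variable": "pr"},
]

# "Surface temperature across all models for the experiment with volcanic forcing"
hits = search(records, experiment="volcanic", variable="tas")
print([r["model"] for r in hits])                              # → ['CCSM3', 'HadCM3']

# With "volcanic" selected, only models that still have data are offered.
print(facet_options(records, "model", experiment="volcanic"))  # → ['CCSM3', 'HadCM3']
```

Each hit would then link to its data access points (files or server-generated products), per the slide.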

16 Earth System Grid Center for Enabling Technologies (ESG-CET)
Use Case 2: large number of files
"Download 1,000 files from deep storage to my desktop while I am sleeping."
Gateway interaction via web services:
- User finds and requests data through the Gateway
- User passes the request identifier to DML (DataMover-Lite)
- DML downloads files as they become available
- DML "releases" files already downloaded
- User checks request status via DML (or the Gateway)
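The DML interaction above can be sketched as a simple client loop against a simulated staging queue: files become available as they are pulled off deep storage, the client fetches whatever is ready, then "releases" each file so the server can free its disk cache. This is a toy simulation, not the real DML or Gateway API; in the sketch the client triggers staging itself, whereas the real server stages asynchronously.

```python
class GatewayRequest:
    """Server side (simulated): a bulk request staged from deep storage."""
    def __init__(self, files):
        self.pending = list(files)   # still on tape
        self.available = []          # staged to disk, ready to download

    def stage_next(self):
        # One file at a time becomes available (asynchronously, in reality).
        if self.pending:
            self.available.append(self.pending.pop(0))

    def release(self, name):
        # Client has the file; the server's disk cache can be freed.
        self.available.remove(name)

def dml_download(request):
    """Client loop: fetch whatever is available, release it, repeat until done."""
    downloaded = []
    while request.pending or request.available:
        request.stage_next()         # stand-in for waiting on the server
        for name in list(request.available):
            downloaded.append(name)  # a real client would transfer the bytes here
            request.release(name)
    return downloaded

req = GatewayRequest([f"file_{i:04d}.nc" for i in range(1000)])
print(len(dml_download(req)))        # → 1000
```

The release step is what lets a request far larger than the server's staging disk complete overnight without intervention.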

17 Earth System Grid Center for Enabling Technologies (ESG-CET)
Use Case 3: high-end product, federation
"Show me sea surface temperature plots for 3 different datasets (output of different models, same forcing) that are stored at 3 different locations."
- More powerful data selection algorithm
- Integration of the LAS product server on Gateways and Data Nodes
- Single sign-on authentication via OpenID
- Common authorization model
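A tiny sketch of what a "common authorization model" buys: the same group/role check, applied identically at every federated site, decides what an authenticated user may do. The group and role names below are invented for illustration, not actual ESG policy.

```python
# Permission table shared across the federation: (group, role) → allowed actions.
PERMISSIONS = {
    ("cmip_research", "user"):      {"read"},
    ("cmip_research", "publisher"): {"read", "publish"},
}

def authorized(user_roles, dataset_group, action):
    """user_roles: (group, role) pairs carried with the user's single sign-on identity."""
    return any(action in PERMISSIONS.get((g, r), set())
               for g, r in user_roles if g == dataset_group)

alice = [("cmip_research", "user")]
print(authorized(alice, "cmip_research", "read"))     # → True
print(authorized(alice, "cmip_research", "publish"))  # → False
```

With OpenID single sign-on delivering the same identity everywhere, a check like this gives the same answer at all three sites holding the datasets in the use case.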

18 Earth System Grid Center for Enabling Technologies (ESG-CET)
Use Case 4: multiple intercomparison example

19 Earth System Grid Center for Enabling Technologies (ESG-CET)
Immediate and future challenges of software development
- Sustain and build upon the very successful ESG archives (e.g., CCSM, CMIP3, CFMIP, PCM, POP, etc.)
- Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data:
  - SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science
  - Coupled Model Intercomparison Project, Phase 5 (CMIP5), for scientists contributing to the IPCC Fifth Assessment Report (AR5)
  - The Climate Science Computational End Station (CCES)
  - The North American Regional Climate Change Assessment Program (NARCCAP)
  - Other wide-ranging climate model evaluation activities
- How to make information understandable to end users so that they can interpret the data correctly
- Local and remote analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, ...)
  - Integrating analysis into a distributed environment
  - Providing climate diagnostics
  - Delivering climate component software to the community

20 Earth System Grid Center for Enabling Technologies (ESG-CET)
AR5 testbed partners
- Major driver for global federation: CMIP5 for the IPCC AR5
- By early 2009 the testbed is expected to include:
  - Program for Climate Model Diagnosis and Intercomparison (PCMDI), U.S.
  - National Center for Atmospheric Research (NCAR), U.S.
  - Geophysical Fluid Dynamics Laboratory (GFDL), U.S.
  - Oak Ridge National Laboratory (ORNL), U.S.
  - British Atmospheric Data Centre (BADC), U.K.
  - Max Planck Institute for Meteorology (MPI), Germany
  - The University of Tokyo Center for Climate System Research (CCSR), Japan

21 Earth System Grid Center for Enabling Technologies (ESG-CET)
ESG-CET AR5 timeline
- 2008: Design and implement core functionality:
  - Browse and search
  - Registration
  - Single sign-on / security
  - Publication
  - Distributed metadata
  - Server-side processing
- Early 2009: Testbed, planned to include at least seven centers in the U.S., Europe, and Japan: PCMDI, NCAR, GFDL, ORNL, BADC, MPI, CCSR
- 2009: Deal with system integration issues; develop the production system
- 2010: Modeling centers publish data
- Research and journal article submissions
- 2013: IPCC AR5 Assessment Report

22 Earth System Grid Center for Enabling Technologies (ESG-CET)
U.S. collaborations
- NOAA
  - GFDL is an active contributor to AR5 and ESG-CET, CF, and GO-ESSP
  - Data Archive and Access Requirements Working Group (DAARWG)
- NASA
  - Facilitating climate modeling research by integrating NASA and the Earth System Grid
- SciDAC Scientific Data Management Center (SDM)
  - DataMover-Lite: efficient bulk transfer of data in a secure grid environment
- SciDAC Visualization and Analytics Center (VACET)
  - University of Utah, LLNL, LBNL, ORNL
  - Integration of the VisTrails visual analysis tool with CDAT
- Ultrascale Visualization
  - Web-enabled collaborative climate visualization
- Earth System Curator (ESC)
  - Developing database schemas and interfaces for model configuration
- Tech-X Corporation
  - Analyze and visualize petabytes of archived data on mosaic grids
- VisTrails, Inc.
  - Complete audit trail of computational processes
- Many more...

23 Earth System Grid Center for Enabling Technologies (ESG-CET)
International collaborations
- Global Organization for Earth System Science Portals (GO-ESSP): focused on facilitating the organization and implementation of an infrastructure for full data sharing among a consortium spanning continents, countries, and intergovernmental agencies
- CF: the Climate and Forecast metadata convention
  - Designed to promote the processing and sharing of files created with the netCDF application programmer's interface
  - Enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities
- CMOR: Climate Model Output Rewriter
  - Used to produce CF-compliant netCDF files that fulfill the requirements of many of the climate community's standard model experiments (such as CMIP, CFMIP, NARCCAP, etc.)
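A small sketch of what CF compliance buys in practice: if every file carries the CF `standard_name` and `units` attributes, deciding whether two variables from different sources are comparable becomes a mechanical check. The attribute dictionaries below stand in for netCDF variable attributes; `air_temperature` and `precipitation_flux` are real CF standard names, while the comparison function itself is an illustrative simplification (full CF comparability also involves cell methods, coordinates, etc.).

```python
def cf_comparable(var_a, var_b):
    """Two variables are comparable if their CF identity attributes are present and agree."""
    for attr in ("standard_name", "units"):
        if var_a.get(attr) is None or var_a.get(attr) != var_b.get(attr):
            return False
    return True

# Hypothetical attribute sets, as read from three different models' files.
ccsm_tas  = {"standard_name": "air_temperature",    "units": "K"}
hadcm_tas = {"standard_name": "air_temperature",    "units": "K"}
hadcm_pr  = {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"}

print(cf_comparable(ccsm_tas, hadcm_tas))  # → True
print(cf_comparable(ccsm_tas, hadcm_pr))   # → False
```

CMOR's role is to guarantee at write time that model output carries exactly these kinds of attributes, so downstream intercomparison tools never have to guess.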

24 Earth System Grid Center for Enabling Technologies (ESG-CET)
AR5 open issues and questions
- What is the set of runs to be done, and, derived from that, what data volumes can we expect?
- Expected participants: where will data be hosted? Who is going to step up to host the data nodes and provide the expected level of support in terms of manpower and hardware capability?
  - Minimum software and hardware requirements for a data-holding site (e.g., FTP access and ESG authentication and authorization)
  - Skilled staff
  - Help desk
- The AR5 archive is to be globally distributed, with support for WG1, WG2, and WG3. Will there be a need for a central (or core) archive, and what will it look like?
- Replication of holdings: disaster protection, a desire to have a replica of the core data archive on every continent, etc.
- Number of users and level of access: scientists, policy makers, economists, health officials, etc.