1 Earth System Grid Center for Enabling Technologies (ESG-CET) Overview ESG-CET Team Climate change is not only a scientific challenge of the first order.


1 1 Earth System Grid Center for Enabling Technologies (ESG-CET) Overview ESG-CET Team Climate change is not only a scientific challenge of the first order but also a major technological challenge. The international climate community is expected to generate hundreds of petabytes of simulation data within the next three to seven years.

2 2 Earth System Grid Center for Enabling Technologies (ESG-CET): Data management and analysis for the Earth System Grid
- ESG's mission is to provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational capabilities required to make sense of enormous climate simulation datasets.
- ESG's goals:
  - make data more useful to climate researchers by developing Grid technology that enhances data usability;
  - meet the specific distributed database, data access, and data movement needs of national and international climate projects;
  - provide a universal and secure web-based data access portal for broad multi-model data collections;
  - provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies;
  - develop key ideas and concepts that are important contributions to other domain areas.

3 3 Growing data
- Early 1990s (e.g., AMIP1, PMIP, CMIP1): modest collection of monthly mean 2D files: ~1 GB
- Late 1990s (e.g., AMIP2): large collection of monthly mean and 6-hourly 2D and 3D fields: ~500 GB
- Present (e.g., IPCC/CMIP3): fairly comprehensive output from both ocean and atmospheric components; monthly, daily, and 3-hourly: ~35 TB
- Future (2010):
  - The IPCC 5th Assessment Report (AR5): expected between 2.5 and 15 PB
  - The Climate Science Computational End Station (CCES) project at ORNL: expected around 3 PB
  - The North American Regional Climate Change Assessment Program (NARCCAP): expected around 1 PB
  - The Cloud Feedback Model Intercomparison Project (CFMIP) archives: expected to be 0.3 PB
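To put the growth figures on this slide side by side, the short sketch below expresses each archive size as a multiple of the early-1990s archive. It uses only the sizes quoted on the slide and assumes decimal units (GB = 10^9 bytes, and so on, as on later slides); the labels and the helper name `growth_factors` are illustrative.

```python
# Archive sizes quoted on the slide, in bytes (decimal units assumed).
GB, TB, PB = 10**9, 10**12, 10**15

archives = [
    ("Early 1990s (AMIP1, PMIP, CMIP1)", 1 * GB),
    ("Late 1990s (AMIP2)", 500 * GB),
    ("Present (IPCC/CMIP3)", 35 * TB),
    ("AR5, low estimate", 2.5 * PB),
    ("AR5, high estimate", 15 * PB),
]


def growth_factors(entries):
    """Return (label, size relative to the first entry) pairs."""
    base = entries[0][1]
    return [(label, size / base) for label, size in entries]


for label, factor in growth_factors(archives):
    print(f"{label}: {factor:,.0f}x the early-1990s archive")
```

On these numbers, even the low AR5 estimate is roughly 70 times the CMIP3 archive (2.5 PB versus 35 TB).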

4 4 Network Traffic, Climate and Physics Data, and Network Capacity
[Figure: historical and projected growth of ESnet traffic, HEP experiment data, and climate model data, plotted against the ESnet capacity roadmap. All three data series are normalized to "1" at Jan. 1990. 2010 values marked on the plot: 40 PBy and 4 PBy.]
Ignore the units of the quantities being graphed; they are normalized to 1 in 1990. Just look at the long-term trends: all of the "ground truth" measures are growing significantly faster than ESnet projected capacity.

5 5 Data integration challenges facing climate science
- Modeling groups will generate more data in the near future than exists today
- A large part of research consists of writing programs to analyze data
- How best to collect, distribute, and find data on a much larger scale?
  - At each stage, tools could be developed to improve efficiency
  - Substantially more ambitious community modeling projects (petabyte (PB, 10^15 bytes) and exabyte (EB, 10^18 bytes) scale) will require a distributed database
- Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.) (But wait, there's more: economy, public health, energy, etc.)
- How to make information understandable to end users so that they can interpret the data correctly
- More users than just Working Group 1 (WG1, science): also WG2 (impacts) and WG3 (mitigation), i.e., policy makers, economists, health officials, etc.
- Integration of multiple analysis tools, formats, and data from unknown sources
- Trust and security on a global scale (not just an agency or country, but worldwide)

6 6 ESG Goals
- Petabyte-scale data volumes
- Globally federated sites
- "Virtual Datasets" created through subsetting and aggregation
- Metadata-based search and discovery
- Bulk data access
- Web-based and analysis tool access
- Increased flexibility and robustness
Current ESG Sites: http://www.earthsystemgrid.org and http://www-pcmdi.llnl.gov
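The "virtual dataset" idea (many time-ordered files presented as one logical dataset that can then be subset) can be illustrated with a minimal sketch. This is not ESG or OPeNDAP code: the class names, file names, and step counts below are hypothetical, and real aggregations work on actual time coordinates rather than bare step indices.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class FileSegment:
    name: str      # hypothetical file name
    n_steps: int   # number of time steps stored in this file


class Aggregation:
    """Treat time-ordered files as one logical time axis."""

    def __init__(self, segments):
        self.segments = list(segments)
        # starts[i] is the global index of the first step in segment i.
        self.starts = []
        total = 0
        for seg in self.segments:
            self.starts.append(total)
            total += seg.n_steps
        self.total_steps = total

    def locate(self, step):
        """Map a global time step to (file name, index within that file)."""
        if not 0 <= step < self.total_steps:
            raise IndexError(step)
        i = bisect_right(self.starts, step) - 1
        return self.segments[i].name, step - self.starts[i]

    def subset(self, start, stop):
        """Return the per-file slices needed to read steps [start, stop)."""
        plan = []
        for seg, first in zip(self.segments, self.starts):
            lo = max(start, first)
            hi = min(stop, first + seg.n_steps)
            if lo < hi:
                plan.append((seg.name, lo - first, hi - first))
        return plan


# Three hypothetical yearly files of monthly means (12 steps each).
agg = Aggregation([
    FileSegment("tas_1990.nc", 12),
    FileSegment("tas_1991.nc", 12),
    FileSegment("tas_1992.nc", 12),
])
print(agg.locate(13))       # which file holds global step 13
print(agg.subset(10, 26))   # read plan for a range spanning all three files
```

The point of the design is that a client asks for a range on the logical axis and never needs to know how the data is split across files.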

7 7 Climate model data management and analysis issues
Before ESG: tremendous manual intervention, inefficient by any measure
- Data
  - Different formats, not standardized
  - Different sites require knowledge of different methods of access
  - Log onto multiple sites to hopefully find and retrieve data
  - Gigabyte (GB, 10^9) to terabyte (TB, 10^12) data volume
- Metadata
  - Painful to produce
  - Most kept in files separate from data
  - Data lost or reproduced numerous times
- Locating data
  - Manual
  - Unreachable unless one is "in the know" (location kept in someone's brain)
  - Not formalized
- Data requests/analysis
  - Beginnings of a formal process
  - Far too much done by hand
  - Logging nearly non-existent
ESG present and future: computers do the more complicated and repetitive tasks and scientists focus on research
- Data
  - Standard output
  - Model compliance tools to facilitate standard output
  - Quality assurance
  - Different sites but standardized access protocol
  - One-stop shop
  - Terabyte (TB, 10^12) to petabyte (PB, 10^15) data volume
- Metadata
  - Exhaustive detail
  - Created via semi-automated processes
  - Put data in databases and make it visible to others
- Locating data
  - Formalized process
  - Highly granular: down to the per-file, per-model, per-variable level
  - Readily searchable with sophisticated search tools
- Data requests/analysis/visualization
  - Completely automated
  - All logging done automatically
  - Secure

8 8 Current ESG architecture and underlying technologies
- Climate Data: Metadata Catalog; NcML (metadata schema); OPeNDAP-g (aggregation and subsetting)
- Data Management: Storage Resource Manager
- Data Transfer: Globus Security Infrastructure; Data Mover Lite; GridFTP; Monitoring and Discovery Services; Replica Location Service
- Security: Access Control; MyProxy; User Registration
- Long-term Storage: Tertiary data storage systems

9 9 ESG: The world's source for climate modeling data
CCSM ESG Portal
- 198 TB of data at four locations
- 1,150 datasets
- 1,032,000 files
- Includes the past 6 years of joint DOE/NSF climate modeling experiments
- 8,000 registered users
- Downloads to date: 60 TB, 176,000 files
CMIP3 (IPCC AR4) ESG Portal
- 35 TB of data at one location
- 74,700 files
- Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
- Data from 13 countries, representing 25 models
- 2,000 registered projects
- Downloads to date: ~1/2 PB, 1,300,000 files, 500 GB/day (average)
- 400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data
[Figures: ESG usage (over 500 sites worldwide); ESG monthly download volumes]

10 10 Intercomparison example
Current usage:
- Browse database
- Download data
- Organize data on local site
- Regrid data at local site
- Perform diagnostics
- Produce results
Future usage:
- Search, browse, and discover distributed data
- Remote site:
  - Request data
  - Regrids
  - Data system reduction
- ESG returns user-defined products

11 11 ESG-CET improvements over ESG II
- Much broader model for scientific metadata
- "Faceted" search capability guides the user toward datasets of interest
  - At a given point in the search, only those options which produce non-empty result sets are shown
  - Avoids dead-end searches
  - Flexible browsing hierarchy
- Automated, GUI-based publication tools
- Single sign-on
- Full support for data aggregations
  - A collection of files, usually ordered by simulation time, that can be treated as a single file for purposes of data access, computation, and visualization
- File-streaming and release capabilities for data access to deep storage
- Client access to subsetting and visualization services
- Server-generated visualization products
- Fine-grained access to datasets based on user groups and roles
- User notification service
  - Users can choose to be notified when a dataset has been modified
- Pre-computed products (e.g., global averages)
- User workspace (storing of favorite products, search criteria, etc.)
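The faceted-search rule on this slide (offer only options that lead to a non-empty result set) can be sketched in a few lines. This is an illustration under stated assumptions, not the ESG implementation: the catalog records, facet names, and values below are all made up, and a real system would query a metadata database rather than a Python list.

```python
from collections import Counter

# Hypothetical catalog records; facet names and values are invented.
CATALOG = [
    {"model": "model_a", "experiment": "exp_volcanic", "variable": "tas"},
    {"model": "model_a", "experiment": "exp_control", "variable": "pr"},
    {"model": "model_b", "experiment": "exp_volcanic", "variable": "tas"},
    {"model": "model_b", "experiment": "exp_volcanic", "variable": "pr"},
]


def search(catalog, selected):
    """Return the records that match every selected facet value."""
    return [
        record for record in catalog
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def available_options(catalog, selected):
    """For each facet not yet chosen, count the values that still occur.

    Values absent from the current result set are never listed, so any
    option the user picks next is guaranteed to return at least one record.
    """
    options = {}
    for record in search(catalog, selected):
        for facet, value in record.items():
            if facet not in selected:
                options.setdefault(facet, Counter())[value] += 1
    return options


# After choosing variable=tas, "exp_control" is no longer offered,
# because no surface-temperature record belongs to that experiment.
opts = available_options(CATALOG, {"variable": "tas"})
for facet, counts in opts.items():
    print(facet, dict(counts))
```

Showing the counts next to each option is a common refinement; it tells the user how much each further choice will narrow the result set.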

12 12 Gateways and nodes
- Federated architecture
  - Federation is a virtual trust relationship among independent management domains that have their own set of services.
  - Users authenticate once to gain access to data across multiple systems and organizations.
- Gateways
  - Portals, search capability, distributed metadata, registration and user management
  - Initially PCMDI, NCAR, ORNL; eventually GFDL
  - May be customized to an institution's requirements
  - More complex architecture than nodes, fewer sites
- Nodes
  - Where data is stored and published
  - Data may be on disk or tertiary mass store
  - Each node has a trust relationship with a specific gateway, for publication
  - Data reduction
  - Less complex architecture
- A site can be both a gateway and a node.
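The gateway/node relationships described on this slide can be modeled as a small data structure. The sketch below is a toy model only: the classes, the token format, and the dataset and user names are hypothetical, and it stands in for none of the actual security machinery. It shows just the three relationships named above: a node publishes through one gateway, users register at a gateway, and a login at any member gateway is honored across the federation.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    name: str
    users: set = field(default_factory=set)
    catalog: dict = field(default_factory=dict)  # dataset id -> node name

    def login(self, user):
        """Authenticate once; return a token naming the issuing gateway."""
        if user not in self.users:
            raise PermissionError(f"{user} is not registered at {self.name}")
        return (self.name, user)


@dataclass
class Node:
    name: str
    gateway: Gateway  # the one gateway this node publishes through
    datasets: set = field(default_factory=set)

    def publish(self, dataset_id):
        self.datasets.add(dataset_id)
        self.gateway.catalog[dataset_id] = self.name


@dataclass
class Federation:
    gateways: dict = field(default_factory=dict)

    def join(self, gateway):
        self.gateways[gateway.name] = gateway

    def find(self, token, dataset_id):
        """Accept a token from any member gateway and search all catalogs."""
        issuer, _user = token
        if issuer not in self.gateways:
            raise PermissionError("token issued outside the federation")
        for gw in self.gateways.values():
            if dataset_id in gw.catalog:
                return gw.name, gw.catalog[dataset_id]
        return None


# A user registered at one gateway finds data published via another.
pcmdi = Gateway("PCMDI", users={"alice"})
ncar = Gateway("NCAR")
fed = Federation()
fed.join(pcmdi)
fed.join(ncar)
Node("ncar-node-1", ncar).publish("ccsm.run1")

token = pcmdi.login("alice")
print(fed.find(token, "ccsm.run1"))
```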

13 13 Architecture of the next generation of ESG-CET

14 14 The next generation ESG-CET system
- Distributed and federated architecture
- Support discipline-specific Gateways
- Support browser-based and direct client access

15 15 Use Case 1: scientific metadata search
"Find surface temperature data across all models for a specific IPCC experiment that has volcanic forcing."
- Capture scientific metadata in a detailed object model
- Faceted search to slice through data via user-selected categories
- Link to data access points (files or products)
- Broker application for one-click request of data products?

16 16 Use Case 2: large number of files
"Download 1000 files from deep storage to my desktop while I am sleeping."
Gateway interaction via Web Services:
- User finds and requests data through the Gateway
- User passes the request identifier to DML
- DML downloads files as they become available
- DML "releases" files already downloaded
- User checks request status via DML (or the Gateway)
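The request, download, and release cycle in this use case can be sketched as a simple polling loop. Everything here is simulated in memory: `StagingRequest`, `download_all`, and the file names are hypothetical and are not DML's interface, and the `fetch` callback stands in for whatever transfer mechanism is actually used.

```python
from dataclasses import dataclass, field


@dataclass
class StagingRequest:
    """Server-side state for one bulk request (simulated in memory)."""
    request_id: str
    pending: list                                  # still in deep storage
    staged: list = field(default_factory=list)     # on disk, ready to fetch
    released: list = field(default_factory=list)   # fetched, space freed

    def stage_next(self, count=1):
        """Simulate deep storage bringing up to `count` more files online."""
        for _ in range(min(count, len(self.pending))):
            self.staged.append(self.pending.pop(0))

    def release(self, name):
        """Mark a staged file as downloaded so its disk space can be reused."""
        self.staged.remove(name)
        self.released.append(name)

    def status(self):
        return {
            "pending": len(self.pending),
            "staged": len(self.staged),
            "released": len(self.released),
        }


def download_all(request, fetch, files_per_poll=2):
    """Poll until every file has been fetched and released."""
    if files_per_poll < 1:
        raise ValueError("files_per_poll must be at least 1")
    downloaded = []
    while request.pending or request.staged:
        request.stage_next(files_per_poll)  # stands in for waiting on the server
        for name in list(request.staged):
            fetch(name)                     # transfer one file
            request.release(name)           # tell the server the copy can go
            downloaded.append(name)
    return downloaded


req = StagingRequest("req-001", [f"file_{i:04d}.nc" for i in range(5)])
fetched = download_all(req, fetch=lambda name: None)
print(req.status())
```

Releasing each file as soon as it is fetched matters for a request this large, since the disk cache in front of deep storage is typically much smaller than the full request.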

17 17 Use Case 3: high-end product, federation
"Show me sea surface temperature plots for 3 different datasets (output of different models, same forcing) that are stored at 3 different locations."
- More powerful data selection algorithm
- Integration of LAS product server on Gateway and Data Nodes
- Single sign-on authentication via OpenID
- Common authorization model

18 18 Use Case 4: multiple intercomparison example

19 19 Immediate and future challenges of software development
- Sustain and build upon the very successful ESG archives (e.g., CCSM, CMIP3, CFMIP, PCM, POP, etc.)
- Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data:
  - SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science
  - Coupled Model Intercomparison Project, Phase 5 (CMIP5), for scientists contributing to the IPCC Fifth Assessment Report (AR5) in 2010
  - The Climate Science Computational End Station (CCES)
  - The North American Regional Climate Change Assessment Program (NARCCAP)
  - Other wide-ranging climate model evaluation activities
- How to make information understandable to end users so that they can interpret the data correctly
- Local and remote analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, …)
  - Integrating analysis into a distributed environment
  - Providing climate diagnostics
  - Delivering climate component software to the community

20 20 AR5 testbed partners
- Major driver for global federation: CMIP5 for the IPCC AR5 in 2010
- By early 2009 the testbed is expected to include:
  - Program for Climate Model Diagnosis and Intercomparison - PCMDI (U.S.)
  - National Center for Atmospheric Research - NCAR (U.S.)
  - Geophysical Fluid Dynamics Laboratory - GFDL (U.S.)
  - Oak Ridge National Laboratory - ORNL (U.S.)
  - British Atmosphere Data Centre - BADC (U.K.)
  - Max Planck Institute for Meteorology - MPI (Germany)
  - The University of Tokyo Center for Climate System Research - CCSR (Japan)

21 21 ESG-CET AR5 timeline
- 2008: Design and implement core functionality:
  - Browse and search
  - Registration
  - Single sign-on / security
  - Publication
  - Distributed metadata
  - Server-side processing
- Early 2009: Testbed
  - Plan to include at least seven centers in the US, Europe, and Japan: PCMDI, NCAR, GFDL, ORNL, BADC, MPI, CCSR
- 2009: Deal with system integration issues, develop production system
- 2010: Modeling centers publish data
- 2011-2012: Research and journal article submissions
- 2013: IPCC AR5 Assessment Report

22 22 U.S. collaborations
- NOAA
  - GFDL is an active contributor to AR5 and ESG-CET, CF, and GO-ESSP
  - Data Archive and Access Requirements Working Group (DAARWG)
- NASA
  - Facilitating Climate Modeling Research By Integrating NASA and the Earth System Grid
- SciDAC Scientific Data Management Center (SDM)
  - DataMover Lite: efficient bulk transfer of data in a secure grid environment
- SciDAC Visualization and Analytics Center (VACET)
  - University of Utah, LLNL, LBNL, ORNL
  - Integration of VisTrails visual analysis tool with CDAT
- Ultrascale Visualization
  - Web-enabled collaborative climate visualization
- Earth System Curator (ESC)
  - Developing database schemas and interfaces for model configuration
- Tech-X Corporation
  - Analyze and visualize petabytes of archived data on Mosaic grids
- VisTrails, Inc.
  - Complete audit trail of computational processes
- Many more…

23 23 International collaborations
- Global Organization for the Earth System Science Portal (GO-ESSP)
  - Focused on facilitating the organization and implementation of an infrastructure for full data sharing among a consortium spanning continents, countries, and intergovernmental agencies
- CF: The Climate and Forecast Metadata Convention
  - Designed to promote the processing and sharing of files created with the NetCDF application programmer's interface
  - Enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities
- CMOR: Climate Model Output Rewriter
  - Used to produce CF-compliant netCDF files that fulfill the requirements of many of the climate community's standard model experiments (such as CMIP, CFMIP, NARCCAP, etc.)

24 24 AR5 open issues and questions
- What is the set of runs to be done and, derived from that, what data volumes can we expect?
- Expected participants: where will data be hosted? (Who is going to step up and host the data nodes, and provide the level of support expected in terms of manpower and hardware capability?)
  - Minimum software and hardware requirements for a data holding site (e.g., ftp access and ESG authentication and authorization)
  - Skilled staff
  - Help desk
- AR5 archive to be globally distributed with support for WG1, WG2, and WG3. Will there be a need for a central (or core) archive, and what will it look like?
- Replication of holdings: disaster protection, a desire to have a replica of the core data archive on every continent, etc.
- Number of users and level of access: scientists, policy makers, economists, health officials, etc.

