Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Requirements for Climate and Carbon Research

Similar presentations


Presentation on theme: "Data Requirements for Climate and Carbon Research"— Presentation transcript:

1 Data Requirements for Climate and Carbon Research
David C. Bader Chief Scientist, DOE Climate Change Prediction Program Presentation Material Courtesy of: John Drake, ORNL Don Middleton, National Center for Atmospheric Research and CCPP Scientists UCRL-PRES

2 Why is DOE Interested in Climate?

3

4 Courtesy Warren Washington

5 IPCC Fourth Assessment Runs
Plan: Commitment Runs : B1 Scenario simulated with ensemble calculations Dedicated 6 (32way) nodes of Cheetah for Sept-Nov 2003 Dedicated 12 (32way) nodes of Cheetah for Mar-July 2004 Cray X1 Science runs in 2004

6

7 Baseline Numbers T42 CCSM (current, 280km) T85 CCSM (140km)
7.5GB/yr, 100 years -> .75TB T85 CCSM (140km) 29GB/yr, 100 years -> 2.9TB T170 CCSM (70km) 110GB/yr, 100 years -> 11TB

8 Diagonstic Analysis of Coupled Models
Courtesy PCMDI Courtesy CCSM/AMWG

9 Tools

10 The Earth System Grid http://www.earthsystemgrid.org
U.S. DOE SciDAC funded R&D effort Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data A “Collaboratory Pilot Project” Build upon ESG-I, Globus Toolkit, DataGrid technologies, and deploy Potential broad application to other areas

11 ESG Team LLNL/PCMDI USC/ISI ANL NCAR LBNL ORNL Bob Drach
Dean Williams (PI) USC/ISI Anne Chervenak Carl Kesselman (Laura Perlman) NCAR David Brown Luca Cinquini Peter Fox Jose Garcia Don Middleton (PI) Gary Strand ANL Ian Foster (PI) Veronika Nefedova (John Bresenhan) (Bill Allcock) LBNL Arie Shoshani Alex Sim ORNL David Bernholdte Kasidit Chanchio Line Pouchard

12

13 Capacity-related Improvements
Increased turnaround, model development, ensemble of runs Increase by a factor of 10, linear data Current T42 CCSM 7.5GB/yr, 100 years -> .75TB * 10 = 7.5TB Many more runs, need to keep track of them

14 Capability-related Improvements
Spatial Resolution: T42 -> T85 -> T170 Increase by factor of ~ 10-20, linear data Temporal Resolution: Study diurnal cycle, 3 hour data Increase by factor of ~ 4, linear data Note potential trend toward process-oriented analysis CCM3 at T170 (70km)

15 Capability-related Improvements
Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice Increase by another factor of 2-3, data flat Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), middle Atmosphere Model… Increase by another factor of 10+, linear data

16 Model Improvements cont.
Grand Total: Increase compute by a Factor O( ) Trends Increased space and time resolution Increased complexity Increased scope (e.g. biogeochemical) Capability advances will lead to analysis that becomes increasingly process-oriented. Analysis may become more exploratory and “intense”. Many more runs and we must keep track of them.

17 ESG: Challenges Enabling the simulation and data management team
Enabling the core research community in analyzing and visualizing results Enabling broad multidisciplinary communities to access simulation results And as we’re able to compute more and quicker, the workflow among these groups becomes ever more important John Shalf and others made some good points in this area in yesterday’s vis session Note observational data We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

18 ESG: Strategies Move data a minimal amount, keep it close to computational point of origin when possible Data access protocols, distributed analysis When we must move data, do it fast and with a minimum amount of human intervention Storage Resource Management, fast networks Keep track of what we have, particularly what’s on deep storage Metadata and Replica Catalogs Harness a federation of sites, web portals Globus Toolkit -> The Earth System Grid -> The UltraDataGrid Coupled hardware resources for data Distributed data access protocols It’s already very hard to keep track of all the different runs, for different scenarios, at different sites. Metadata will enable tracking as well as enhanced analysis capabilities Even with an ESS, we will have multiple sites

19 HRM aka “DataMover” Running well across DOE/HPSS systems
New component built that abstracts NCAR Mass Storage System Defining next generation of requirements with climate production group First “real” usage

20 OPeNDAP An Open Source Project for a Network Data Access Protocol
(originally DODS, the Distributed Oceanographic Data System)

21 ESG: Metadata Services
EXTRACTION DISPLAY BROWSING QUERY ESG CLIENTS API & USER INTERFACES Data & Metadata Catalog Dublin Core Database COARDS mirror XML Files COMMENTS METADATA HOLDINGS ANNOTATION VALIDATION METADATA ACCESS (update, insert, delete, query) SERVICE TRANSLATION LIBRARY CORE METADATA SERVICES AGGREGATION DISCOVERY METADATA & DATA REGISTRATION PUBLISHING HIGH LEVEL METADATA SERVICES SEARCH & DISCOVERY ADMINISTRATION BROWSING & DISPLAY ANALYSIS & VISUALIZATION

22 Collaborations & Relationships
CCSM Data Management Group The Globus Project Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit OPeNDAP/DODS (multi-agency) NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project) U.K. e-Science and British Atmospheric Data Center NOAA NOMADS and CEOS-grid Earth Science Portal group (multi-agency, intnl.)

23

24

25


Download ppt "Data Requirements for Climate and Carbon Research"

Similar presentations


Ads by Google