
ExArch: Climate analytics on distributed exascale data archives
Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P. Kushner, D. Waliser

The ExArch project seeks to develop a framework for the scientific interpretation of multi-model ensembles at the peta- and exa-scale. Specifically, it is a framework for evaluating the imminent Coupled Model Intercomparison Project, Phase 5 (CMIP5) archive, which will be the largest of its kind ever assembled in this domain, and the Coordinated Regional Climate Downscaling Experiment (CORDEX), which will push even beyond CMIP5 in resolution, albeit on the regional scale.

ExArch: management numbers. Start: March 1st, 2011. Duration: 39 months. Budget: 1.44 million euros. Effort: 246 staff months.

The ExArch back-story
- Global Organization for Earth System Science Portals (GO-ESSP) [unfunded]: a federation of frameworks that can work together using agreed-upon standards.
- METAFOR (EU FP7): detailed metadata for climate models.
- IS-ENES (EU FP7): infrastructure support, including a distributed data archive.
- ESG (DOE, SciDAC): access to data, information, models, analysis tools and computational resources.
- CURATOR (NSF, NASA, NOAA): capable infrastructure for Earth system research and operations.

The climate assessment process
- CMIP3: model development, evaluation, downscaling, impacts models; IPCC 4th Assessment Report (2005-7).
- CMIP5: model development, evaluation, downscaling, impacts models; IPCC 5th Assessment Report.

CMIP5 and AR5: a brief organisational overview. Experiment design is coordinated by Karl Taylor at PCMDI on behalf of WCRP. The CMIP5 archive is globally distributed: a federation of data centres and modelling centres will host it. Access to the global archive is expected in April.

A schematic roadmap

Improving precision. (Image STS41B was provided by the Earth Sciences and Image Analysis Laboratory at NASA's Johnson Space Center. NASA Graphics.)

Dealing with complexity. Improved accuracy requires representing a vast range of processes, which brings complexity in model design and complexity in experimental design. (Figure: Fig. 1 of Delaney and Barga, 2010.)

Sampling uncertainty
- Uncertainty in initial conditions: we do not know the state of the ocean to sufficient accuracy.
- Uncertainty from natural variability: some components of the system are chaotic, and we can never predict what state they will be in beyond O(10 days).
- Uncertainty in model design and configuration: many choices need to be made to create a computable system.
The multi-model ensemble is a “multi-metric, distance-optimising Monte-Carlo approach”, where each major design decision is a trade-off between adopting approaches used by others and exploring new territory.
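The sources of uncertainty above can be made concrete with a standard variance split. The sketch below is an illustration under stated assumptions, not the ExArch method: it assumes each model contributes a list of scalar results from several initial-condition members (a hypothetical input layout) and gives every model equal weight. Spread between model means stands in for design and configuration uncertainty; spread among a model's own members stands in for initial-condition uncertainty and natural variability, which this layout cannot separate.

```python
from statistics import mean


def decompose_spread(ensemble):
    """Split multi-model ensemble spread into between-model and within-model parts.

    ensemble: dict mapping model name -> list of scalar results, one per
    initial-condition member (hypothetical layout). Every model gets equal weight.
    Returns (inter_model_var, intra_model_var), both population variances.
    """
    model_means = {m: mean(runs) for m, runs in ensemble.items()}
    grand_mean = mean(model_means.values())
    # Variance of model means about the grand mean: proxy for design/configuration uncertainty.
    inter = mean((mu - grand_mean) ** 2 for mu in model_means.values())
    # Average within-model variance across members: proxy for initial-condition
    # uncertainty plus natural variability.
    intra = mean(
        mean((x - model_means[m]) ** 2 for x in runs)
        for m, runs in ensemble.items()
    )
    return inter, intra
```

For instance, `decompose_spread({"A": [1.0, 3.0], "B": [5.0, 7.0]})` returns `(4.0, 1.0)`; with equal member counts per model the two parts sum to the total variance of all four runs (5.0).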

Overpeck et al. (2011), Science.

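Cost scaling. Let $N_{pp}$ be the number of mesh points from pole to pole, $N_g$ the total number of spatial mesh points, $N_v$ the number of variables, $N_e$ the ensemble size, $N_t$ the number of time steps per simulated year and $N_y$ the number of years simulated per intercomparison. Then

$$N_g = O(N_{pp}^3), \quad N_v \sim N_{pp}^{1/2}, \quad N_e \sim N_{pp}, \quad N_t \sim N_{pp}, \quad N_y \sim N_{pp}^{1/2},$$

and, taking the cost to be proportional to the product of these factors,

$$\text{Cost} \sim N_g \, N_v \, N_e \, N_t \, N_y \sim N_{pp}^{3 + 1/2 + 1 + 1 + 1/2} = N_{pp}^{6}.$$

An O(40) decrease in storage density would be needed to bring this estimate in line with Overpeck et al.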
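The exponent arithmetic can be checked mechanically. This sketch assumes, as the slide's result implies, that cost is the product of the five factors; the dictionary keys are descriptive labels only, not part of any real tool.

```python
from fractions import Fraction

# Exponent of N_pp for each factor, as given on the slide.
SCALING_EXPONENTS = {
    "N_g (spatial mesh points)": Fraction(3),
    "N_v (variables)": Fraction(1, 2),
    "N_e (ensemble size)": Fraction(1),
    "N_t (time steps per simulated year)": Fraction(1),
    "N_y (years per intercomparison)": Fraction(1, 2),
}


def cost_exponent(exponents=SCALING_EXPONENTS):
    """Total exponent p in Cost ~ N_pp**p, assuming cost is the product of the factors."""
    return sum(exponents.values(), Fraction(0))


def cost_ratio(resolution_factor, exponents=SCALING_EXPONENTS):
    """Relative cost when N_pp is multiplied by resolution_factor."""
    return resolution_factor ** float(cost_exponent(exponents))


print(cost_exponent())  # 6
print(cost_ratio(2))    # 64.0
```

Doubling the pole-to-pole resolution therefore multiplies the cost by 2^6 = 64.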

Some trends: ballpark figures

Quantity                       | Change per year | Change per decade
Data centre storage            | +60%            |
Data centre energy use         | +25%            |
Energy use per unit capacity   | -22%            | 10-fold decrease
Purchase cost per TB           | 200 USD         | 3 USD
Operating power                | 10 W/TB         | 1 W/TB
Electricity cost (UK)          | 90 GBP/MWh      | 120 GBP/MWh
Cost of 1 TB* for 3 years      |                 |
Size at constant funding       | 1 PB            | 27 PB

Analysis by Kryder and Kim (2009) suggests that hard drives will not be replaced by solid-state or other new media before 2020.
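The per-year and per-decade figures are related by compound growth. A minimal sketch, assuming a constant annual rate over the decade, checks two rows and backs out the annual rate implied by the 1 PB to 27 PB row: -22% a year compounds to roughly a 12-fold decrease (the table's ballpark "10-fold"), +60% a year to roughly a 110-fold increase, and 27-fold growth over a decade corresponds to about +39% a year.

```python
def decade_factor(annual_change):
    """Multiplicative change over ten years for a constant fractional annual change (e.g. -0.22)."""
    return (1.0 + annual_change) ** 10


def implied_annual_change(decade_ratio):
    """Constant fractional annual change that compounds to decade_ratio over ten years."""
    return decade_ratio ** 0.1 - 1.0


# Energy use per unit capacity, -22% per year: about a 12-fold decrease per decade.
print(1 / decade_factor(-0.22))
# Data centre storage, +60% per year: about a 110-fold increase per decade.
print(decade_factor(0.60))
# Size at constant funding, 1 PB -> 27 PB in a decade: about +39% per year.
print(implied_annual_change(27.0))
```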

Work packages, grouped under three headings (Strategy, Informatics, Climate Science):
- 2 workshops;
- interactions with GCOS, ESA and NASA;
- governance structures;
- accessibility;
- software management;
- robust metadata;
- query management;
- near-archive processing;
- quality assurance;
- climate science diagnostics.

Provenance. Dealing with thousands of experiments, run with complex models in multiple configurations, requires comprehensive and machine-readable documentation of model configuration.
Accurate metadata. As metadata becomes more complex, we need to develop systems to ensure that it is accurate:
- Downstream: auto-generate metadata from model configurations.
- Upstream: prescribe metadata and auto-generate model configurations.
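A minimal sketch of the downstream and upstream directions, with a consistency check between them. The configuration fields (`model_name`, `atmos_resolution_km`, `vegetation`) and the JSON record format are hypothetical stand-ins; the METAFOR Common Information Model is far richer than this.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical configuration fields, for illustration only.
    model_name: str
    atmos_resolution_km: int
    vegetation: str  # e.g. "dynamic" or "prescribed"


def metadata_from_config(cfg: ModelConfig) -> str:
    """Downstream: derive the documentation record from the configuration itself."""
    return json.dumps(asdict(cfg), sort_keys=True)


def config_from_metadata(record: str) -> ModelConfig:
    """Upstream: build a configuration from a prescribed documentation record."""
    return ModelConfig(**json.loads(record))


def metadata_is_consistent(cfg: ModelConfig, record: str) -> bool:
    """Check that a documentation record describes exactly the configuration that was run."""
    return config_from_metadata(record) == cfg
```

A record generated from the configuration passes the check; if the record is later edited by hand (say, changing `dynamic` to `prescribed`), the check fails, flagging documentation that no longer describes the run.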

Query syntax. The query syntax is a key element of the communication between system components.
Resource management. Bringing calculations to an exa-scale data archive is going to create a significant computational demand.
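One way to make the query syntax a well-defined interface between components is to treat a query as a small structured message rather than free text. In this sketch the facet vocabulary and JSON layout are assumptions for illustration (the example values CMIP5, historical, rcp85 and tas are real CMIP5 identifiers). Sorting keys and values gives a canonical form, so identical queries serialise identically, which helps with caching and resource management.

```python
import json

# Hypothetical facet vocabulary; a real system would take this from a controlled vocabulary.
ALLOWED_FACETS = {"project", "experiment", "variable", "frequency", "model"}


def make_query(**facets):
    """Build a canonical JSON query; each facet maps to a sorted list of accepted values."""
    unknown = set(facets) - ALLOWED_FACETS
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    normalised = {
        k: sorted(v) if isinstance(v, (list, tuple, set)) else [v]
        for k, v in facets.items()
    }
    return json.dumps({"facets": normalised}, sort_keys=True)


def matches(query_json, dataset):
    """True if a dataset's attributes satisfy every facet in the query."""
    facets = json.loads(query_json)["facets"]
    return all(dataset.get(k) in values for k, values in facets.items())
```

For example, `make_query(project="CMIP5", experiment=["rcp85", "historical"], variable="tas")` matches a dataset with `experiment="rcp85"` but not one with `experiment="piControl"`; an unknown facet raises `ValueError`.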

Many-to-many data distribution. Can we create a BitTorrent for data, with access control?
Security, transparency, interoperability: particularly interoperability with observational climate archives. We need to support multiple access routes:
- OPeNDAP, for climate scientists;
- OGC;
- a browser interface.
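The BitTorrent idea that makes many-to-many distribution safe is that a file is split into pieces and each piece is checked against a published hash, so any piece can come from any mirror. A minimal sketch of that idea using SHA-256 from the Python standard library; the piece size, function names and manifest layout are hypothetical, and access control, the open question on the slide, is not addressed here.

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny for illustration, real systems use far larger pieces


def make_manifest(data: bytes, piece_size: int = PIECE_SIZE):
    """Split a file into fixed-size pieces and record the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(manifest, index: int, piece: bytes) -> bool:
    """Check a piece received from any peer against the published manifest."""
    return hashlib.sha256(piece).hexdigest() == manifest[index]


def assemble(manifest, pieces: dict) -> bytes:
    """Reassemble a file from pieces (possibly from different sources) once all verify."""
    missing = [i for i in range(len(manifest)) if i not in pieces]
    if missing:
        raise ValueError(f"missing pieces: {missing}")
    if not all(verify_piece(manifest, i, pieces[i]) for i in range(len(manifest))):
        raise ValueError("corrupt piece")
    return b"".join(pieces[i] for i in range(len(manifest)))
```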

Quality control. With millions of files, quality control is very important; with hundreds of different variables, it is also very hard.
Standard diagnostics. Many climate data users want to look at standard indices (e.g. the strength of El Niño) rather than the raw data.
Structured queries. E.g. the ensemble mean of all models with dynamic* vegetation.
*: “dynamic” vegetation in the climate modelling community refers to temporally evolving vegetation, as opposed to prescribed, fixed vegetation.
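The structured-query example can be sketched in a few lines, assuming a hypothetical catalogue in which each model's documentation records its vegetation scheme and its field has already been regridded to a common grid. The answer is only as good as the metadata: a model mis-documented as having prescribed vegetation would silently drop out, which is where quality control comes in.

```python
# Hypothetical catalogue: per-model documentation plus one field (flat list of grid values).
CATALOGUE = [
    {"model": "ModelA", "vegetation": "dynamic", "field": [1.0, 2.0, 3.0]},
    {"model": "ModelB", "vegetation": "prescribed", "field": [10.0, 10.0, 10.0]},
    {"model": "ModelC", "vegetation": "dynamic", "field": [3.0, 4.0, 5.0]},
]


def select(catalogue, **criteria):
    """Structured query: keep entries whose documentation matches every criterion."""
    return [e for e in catalogue if all(e.get(k) == v for k, v in criteria.items())]


def ensemble_mean(entries):
    """Point-by-point mean of the selected fields (assumed to share one grid)."""
    if not entries:
        raise ValueError("query matched no models")
    n = len(entries)
    return [sum(values) / n for values in zip(*(e["field"] for e in entries))]


print(ensemble_mean(select(CATALOGUE, vegetation="dynamic")))  # [2.0, 3.0, 4.0]
```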

Themes: taking the computation to the data. The Climate Data Operators (CDO) library is maintained by DKRZ. It is designed to act on files containing gridded spatial data, exploiting variable and units naming conventions, and has over 400 operators (cf. Gray's law: start with 20 queries). Objectives: (1) allow preliminary analysis to select data (e.g. days with cyclones affecting the US mainland) to be carried out at the archive; (2) allow data reduction operations (e.g. area mean) to be carried out at the archive. Later: take the diagnostic computation into the model run-time?
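Data reduction at the archive means returning a single number (or a short time series) instead of a whole field. A minimal sketch of an area mean on a regular latitude-longitude grid, weighting each row by cos(latitude) as an approximation to grid-cell area. In practice CDO's `fldmean` operator provides an area-weighted field mean; this pure-Python version only illustrates the reduction and is not CDO's implementation.

```python
import math


def area_mean(field, lats):
    """Area-weighted mean of field[i][j] on a regular latitude-longitude grid.

    field: rows ordered like lats (degrees north); columns are longitudes.
    Weights are cos(latitude), an approximation to grid-cell area on a
    regular grid (an assumption of this sketch).
    """
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats):
        w = math.cos(math.radians(lat))
        total += w * sum(row)
        weight_sum += w * len(row)
    return total / weight_sum
```

Applied at the archive, a field of n_lat × n_lon values per time step is reduced to a single value before anything crosses the network.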

Degrees of closeness to the data (slide diagram), from closest to furthest: in the room; on site (inside firewall); regional, on backbone; intercontinental. Annotations on the diagram: limited CPU resource; restricted access rights; general purpose resources; may be necessary if using multiple archives.

Themes: governance and communication (brain to brain). Efficient exploitation of complex archives depends on a range of standards with various governance procedures. Will these standards respond to the demands of exa-scale systems? ExArch will:
(1) support the extension of the METAFOR Common Information Model to cover regional climate models;
(2) explore methods of ensuring the validity of model documentation;
(3) engage with the netCDF CF conventions governance process;
(4) encourage early planning by sharing documents and interlinking milestones, and discourage late planning by getting everyone round the table (which doesn't scale well).
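One concrete way to check documentation validity against the CF conventions is to confirm that each variable carries a `standard_name` and `units`. This sketch uses a hypothetical three-entry excerpt of the CF standard name table and, as a simplification stricter than CF itself, requires the canonical units exactly; CF also accepts units physically equivalent to the canonical ones (e.g. degC for air_temperature).

```python
# Hypothetical excerpt of the CF standard name table: standard_name -> canonical units.
CANONICAL_UNITS = {
    "air_temperature": "K",
    "sea_surface_temperature": "K",
    "precipitation_flux": "kg m-2 s-1",
}


def check_variable(name, attrs):
    """Return a list of problems with one variable's metadata attributes."""
    problems = []
    std = attrs.get("standard_name")
    if std is None:
        problems.append(f"{name}: missing standard_name")
    elif std not in CANONICAL_UNITS:
        problems.append(f"{name}: unknown standard_name {std!r}")
    if "units" not in attrs:
        problems.append(f"{name}: missing units")
    elif std in CANONICAL_UNITS and attrs["units"] != CANONICAL_UNITS[std]:
        # Stricter than CF: equivalent but non-canonical units are also flagged.
        problems.append(
            f"{name}: units {attrs['units']!r} differ from canonical {CANONICAL_UNITS[std]!r}"
        )
    return problems
```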

The end