CLIMATE SCIENTISTS’ BIG CHALLENGE: REPRODUCIBILITY USING BIG DATA
Kyo Lee, Chris Mattmann, and the RCMES team
Jet Propulsion Laboratory (JPL), Caltech

Reproducibility issues in climate science
Many published papers and reports do not include a computational description detailed enough to reproduce their results.
Even with a detailed description, it is practically impossible to reproduce others’ climate simulation results.
[Figure: a plot from the latest IPCC report.] How many readers of the IPCC report can draw this plot?

Climate Science is Big Data Science
Data sets are massive and stored in distributed systems across many physical locations.
Coupled Model Intercomparison Project Phase 5 (CMIP5) for the IPCC assessment: 110 different experiments, 24 modeling centers, 45 models, 3.3 petabytes of data.
By 2020 each experiment will generate an exabyte of data.
Massive observational data sets are used to: formulate hypotheses from observed empirical relationships; simulate current and past conditions under those hypotheses using climate models; and test the hypotheses by comparing simulations to observations.

Our unique challenges: data change quickly over time
Community Earth System Model (CESM), developed at the National Center for Atmospheric Research.
Options: discretization methods, sub-grid scale physics, coupling with ocean, and so on.
CESM is open source, but it is practically impossible to reproduce others’ simulation results.
[Figure: CESM release timeline showing CESM 1.0 (June 2010), CESM (June 2011), and CESM (May 2014), annotated with "minor updates and branch versions" and "numerous ways to configure a simulation".]
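One way to make "numerous ways to configure a simulation" tractable is to publish a fingerprint of the exact configuration alongside the results. The sketch below is only an illustration of that idea, not part of CESM or RCMES: the function name `config_fingerprint` and every key and value in the example dictionaries are hypothetical placeholders, not real CESM settings.

```python
import hashlib
import json


def config_fingerprint(config):
    """Return a SHA-256 hex digest of a configuration dictionary.

    The dictionary is serialized with sorted keys so that two
    configurations with the same content always give the same digest,
    regardless of the order in which the options were written down.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configurations; the option names are placeholders.
run_a = {
    "model": "CESM",
    "release": "June 2010",
    "options": {"ocean_coupling": True, "discretization": "scheme_a"},
}
# Same content as run_a, written in a different key order.
run_b = {
    "options": {"discretization": "scheme_a", "ocean_coupling": True},
    "release": "June 2010",
    "model": "CESM",
}
# A single option differs from run_a.
run_c = {
    "model": "CESM",
    "release": "June 2010",
    "options": {"ocean_coupling": False, "discretization": "scheme_a"},
}

print(config_fingerprint(run_a) == config_fingerprint(run_b))
print(config_fingerprint(run_a) == config_fingerprint(run_c))
```

A matching fingerprint only shows that two runs declared the same options; it says nothing about compilers, hardware, or input data, which would need to be recorded separately.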

Regional Climate Model Evaluation System (RCMES)
RCMES is an open source software package developed by NASA’s JPL and UCLA to facilitate the evaluation of climate models.
Open Climate Workbench (OCW) is now one of the top-level projects at the Apache Software Foundation.
Make observational datasets, with some emphasis on NASA satellite data, more accessible to the climate modeling community for climate model evaluation.
Give researchers more time to analyze results and less time spent coding and worrying about file formats, data transfers, etc.
Provide guidance to further improve models by visualizing collective evaluation results of models.
Make some basic evaluations of climate models reproducible.

RCMES high-level technical architecture
Workflow: ingest observations and models, re-grid, calculate metrics (e.g., bias, RMSE, correlation, significance, PDFs), and visualize results (e.g., contour, time series, Taylor).
RCMED (Regional Climate Model Evaluation Database): a large scalable database to store data from a variety of sources in a common format.
RCMET (Regional Climate Model Evaluation Toolkit): a library of codes for extracting data from RCMED and models and for calculating evaluation metrics.
[Figure: RCMES architecture diagram. Raw data (various sources, formats, resolutions, coverage; TRMM, MODIS, AIRS, CERES, SWE, soil moisture, etc.) pass through an extractor for various data formats into RCMED (MySQL; metadata and data table; common format, native grid, efficient architecture). RCMET extracts observational data from RCMED (link labeled "URL") and model data chosen by the user through a data extractor (binary, including Fortran binary, or netCDF). The regridder puts the observational and model data on the same time/space grid, the metrics calculator calculates evaluation metrics, and the visualizer plots the metrics. The re-gridded data can also be used for the user’s own analyses and visualization. Other data centers shown: ESG, DAAC, ExArch Network.]
Regional Climate Model Evaluation System powered by Apache Software Foundation
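The regridding and metric steps named in the architecture above can be illustrated with a very small example. This is a minimal sketch under stated assumptions, not the RCMET or Open Climate Workbench implementation: it uses only the Python standard library, the regridding is a simple block average (the slides do not say which method RCMES uses), and all function names and data values are hypothetical.

```python
import math


def block_average(fine, factor):
    """Coarsen a 2-D grid (list of rows) by averaging factor x factor blocks.

    Assumes both grid dimensions are divisible by `factor`.
    """
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [fine[a][b]
                     for a in range(i, i + factor)
                     for b in range(j, j + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def flatten(grid):
    """Turn a 2-D grid into a flat list of values."""
    return [value for row in grid for value in row]


def bias(model, obs):
    """Mean difference, model minus observation."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root-mean-square error between model and observation."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def correlation(model, obs):
    """Pearson correlation coefficient between model and observation."""
    n = len(obs)
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    std_m = math.sqrt(sum((m - mean_m) ** 2 for m in model))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (std_m * std_o)


# Hypothetical toy data: a 4x4 "observation" grid and a 2x2 "model" grid.
obs_fine = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
model_coarse = [[1.5, 2.5], [3.5, 4.5]]

# Step 1: put observations on the model grid. Step 2: compute metrics.
obs_coarse = block_average(obs_fine, 2)
model_values, obs_values = flatten(model_coarse), flatten(obs_coarse)

print("bias:", bias(model_values, obs_values))
print("RMSE:", rmse(model_values, obs_values))
print("correlation:", correlation(model_values, obs_values))
```

In this toy case the model is uniformly 0.5 above the regridded observations, so the bias and RMSE are both 0.5 while the correlation is essentially 1; a real evaluation would also need to handle missing values, area weighting, and the time dimension.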

Replication of Kim et al. (2013) using RCMES

How to make climate studies more reproducible?
Different programming languages (Fortran, Matlab, R, Python, IDL, NCL, GrADS, ...): a workflow system could facilitate replication of other studies.
Difficulties in reproducing others’ simulation results: the Earth System Grid Federation (ESGF) provides software infrastructure to facilitate model intercomparison projects using observational data.
Climate scientists need more open source software similar to RCMES that can facilitate their analyses of observational and model data.