CLIMATE SCIENTISTS’ BIG CHALLENGE: REPRODUCIBILITY USING BIG DATA Kyo Lee, Chris Mattmann, and RCMES team Jet Propulsion Laboratory (JPL), Caltech.


1 CLIMATE SCIENTISTS’ BIG CHALLENGE: REPRODUCIBILITY USING BIG DATA Kyo Lee, Chris Mattmann, and RCMES team Jet Propulsion Laboratory (JPL), Caltech

2 Reproducibility issues in climate science. Many published papers and reports do not include a computational description detailed enough to reproduce the results. Even with a detailed description, it is practically impossible to reproduce others’ climate simulation results. How many readers of the IPCC report can draw this plot? (Figure from the latest IPCC report.)

3 Climate Science is Big Data Science. Data sets are massive and stored in distributed systems across many physical locations. Coupled Model Intercomparison Project Phase 5 (CMIP5) for the IPCC assessment: 110 different experiments, 24 modeling centers, 45 models, 3.3 petabytes of data. By 2020 each experiment will generate an exabyte of data. Massive observational data sets are used to: (1) formulate hypotheses from observed empirical relationships; (2) simulate current and past conditions under those hypotheses using climate models; (3) test hypotheses by comparing simulations to observations.

4 Our unique challenges: data change quickly over time. The Community Earth System Model (CESM) is developed at the National Center for Atmospheric Research (NCAR). Configuration options include discretization methods, sub-grid scale physics, coupling with the ocean, and so on. CESM is open source, but it is practically impossible to reproduce others’ simulation results because there are minor updates, branch versions, and numerous ways to configure a simulation. Release timeline: CESM 1.0 (June 2010), CESM 1.0.3 (June 2011), CESM 1.0.6 (May 2014).
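
One way to make the "numerous ways to configure a simulation" problem concrete is to record the model version and every configuration choice in a manifest and publish a fingerprint of it alongside the results. The following is a minimal sketch of that idea, not something described in the presentation: the manifest fields, the option values, and the function name `config_fingerprint` are all hypothetical and chosen for illustration only.

```python
import hashlib
import json


def config_fingerprint(manifest):
    """Return a SHA-256 hex digest of a run manifest.

    The manifest is serialized with sorted keys so that two manifests with
    the same content but different key order give the same fingerprint.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifests: the option names and values are placeholders,
# not real CESM configuration settings.
run_a = {
    "model": "CESM",
    "version": "1.0.3",
    "options": {"discretization": "scheme-A", "ocean_coupling": True},
}
run_b = {
    "options": {"ocean_coupling": True, "discretization": "scheme-A"},
    "version": "1.0.3",
    "model": "CESM",
}
run_c = dict(run_a, version="1.0.6")  # same options, different release

same_setup = config_fingerprint(run_a) == config_fingerprint(run_b)
different_release = config_fingerprint(run_a) != config_fingerprint(run_c)
print("run_a and run_b match:", same_setup)
print("run_a and run_c differ:", different_release)
```

A reader who obtains the same fingerprint from their own setup knows they have at least matched the declared version and options; it says nothing about compilers, hardware, or input data, which would need their own entries in the manifest.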

5 Regional Climate Model Evaluation System (RCMES, http://rcmes.jpl.nasa.gov/). RCMES is an open source software package developed by NASA’s JPL and UCLA to facilitate the evaluation of climate models. Open Climate Workbench (OCW) is now one of the top-level projects at the Apache Software Foundation. Goals: Make observational datasets, with some emphasis on NASA satellite data, more accessible to the climate modeling community for climate model evaluation. Give researchers more time to analyze results and less time coding and worrying about file formats, data transfers, etc. Provide guidance for further improving models by visualizing the collective evaluation results of models. Make some basic evaluation of climate models reproducible.

6 RCMES high-level technical architecture (diagram). RCMES ingests observations and models, re-grids them, calculates metrics (e.g., bias, RMSE, correlation, significance, PDFs), and visualizes results (e.g., contour, time series, Taylor diagrams). Raw data (various sources, formats, resolutions, and coverage; e.g., TRMM, MODIS, AIRS, CERES, SWE, soil moisture) pass through an extractor for various data formats into RCMED (Regional Climate Model Evaluation Database), a large scalable database (MySQL) that stores data from a variety of sources in a common format on the native grid, organized as metadata plus a data table. RCMET (Regional Climate Model Evaluation Toolkit) is a library of codes for extracting data from RCMED and from models and for calculating evaluation metrics. RCMET steps: extract observational data (via URL) and model data (binary or netCDF) according to user input; regridder (put the observational and model data on the same time/space grid); metrics calculator (calculate evaluation metrics); visualizer (plot the metrics). The re-gridded data can also be used for the user’s own analyses and visualization. Other data centers (ESG, DAAC, ExArch Network) also appear in the diagram. Regional Climate Model Evaluation System powered by Apache Software Foundation.
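
The regridder and metrics calculator steps above can be sketched in a few lines. This is an illustrative example in plain Python, not the RCMES or Open Climate Workbench API: the function names, the block-averaging regridding method, and the synthetic 4x4 and 2x2 grids are all assumptions made for the sketch.

```python
import math


def block_average(field, factor):
    """Coarsen a 2-D grid (list of rows) by averaging factor x factor blocks."""
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            block = [field[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def flatten(field):
    """Turn a 2-D grid into a flat list of grid-point values."""
    return [value for row in field for value in row]


def bias(model, obs):
    """Mean of (model - observation) over all grid points."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root-mean-square error between model and observation."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def correlation(model, obs):
    """Pearson correlation between model and observation."""
    n = len(obs)
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


# Synthetic data: a fine-resolution "observation" and a coarse "model" field.
obs_fine = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
model_coarse = [[1.5, 2.5], [3.5, 4.5]]

# Step 1: put both data sets on the same (coarse) grid.
obs_coarse = block_average(obs_fine, 2)

# Step 2: calculate evaluation metrics on the common grid.
model_vals, obs_vals = flatten(model_coarse), flatten(obs_coarse)
print("bias:", bias(model_vals, obs_vals))
print("RMSE:", rmse(model_vals, obs_vals))
print("correlation:", correlation(model_vals, obs_vals))
```

Block averaging is only one possible regridding choice; real evaluations typically use interpolation on latitude/longitude coordinates and area weighting, which this sketch leaves out.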

7 Replication of Kim et al. (2013) using RCMES

8 How to make climate studies more reproducible? Different programming languages (Fortran, Matlab, R, Python, IDL, NCL, GrADS, ...): a workflow system could facilitate replication of other studies. Difficulties in reproducing others’ simulation results: the Earth System Grid Federation (ESGF) provides software infrastructure to facilitate model intercomparison projects using observational data. Climate scientists need more open source software similar to RCMES that can facilitate their analyses of observational and model data.

