Credibility of Climate Model Projections of Future Climate: Issues and Challenges Linda O. Mearns National Center for Atmospheric Research SAMSI 2011-12 Program on Uncertainty Quantification Pleasanton, CA, August 29, 2011
Doubt is not a pleasant condition, but certainty is an absurd one. -Voltaire
How can we best evaluate the quality of climate models?
Different Purposes Reasons for establishing reliability/credibility: – for recommending what scenarios should be used for impacts assessments, – for selecting which global models should be used to drive regional climate models, – for differential weighting to provide better measures of uncertainty (e.g., probabilistic methods). Mainly going to discuss this in terms of multi-model ensembles (however, there are important limitations to the types of uncertainty that can be represented in MMEs)
Reliability, confidence, credibility What do we mean? For a long time, climate modelers/analysts would make statements that if a model reproduced observations well, then we had confidence in future projections – but this really isn't adequate – synonymous with Gabbi's naïve view
Average Changes in Temperature and Precipitation over the Grid Boxes of the Lower 48 States Three Climate Model 2xCO2 Experiments Smith et al., 1989
Global Model Change in Precipitation - Summer
Relationship to Uncertainty Historically (and even in the most recent IPCC Reports) each climate model is given equal weight in summarizing model results. Does this make sense, given different model performances? Rapid new developments in how to differentially weight climate model simulations in probabilistic models of uncertainty of regional climate change
REA Method Summary measure of regional climate change based on weighted average of climate model responses Weights based on model reliability Model Reliability Criteria: Performance of AOGCM (validation) Model convergence (for climate change) Giorgi and Mearns, 2002, 2003
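The REA idea above can be sketched in a few lines. This is a minimal illustration, not the exact Giorgi and Mearns (2002) formulation; the bias/convergence values, the natural-variability scale `eps`, and the simple product used to combine the two reliability factors are all assumptions for illustration.

```python
import numpy as np

def rea_average(changes, bias, distance, eps=1.0, m=1.0, n=1.0):
    """Reliability Ensemble Averaging (sketch).

    changes  : projected regional change per model (e.g., delta T in K)
    bias     : absolute bias of each model vs. observations (same units)
    distance : distance of each model's change from the ensemble consensus
               (here crudely approximated by distance from the ensemble mean)
    eps      : natural-variability scale; factors saturate at 1 below it
    m, n     : exponents weighting the performance and convergence criteria
    """
    r_b = np.minimum(1.0, (eps / np.abs(bias)) ** m)      # performance factor
    r_d = np.minimum(1.0, (eps / np.abs(distance)) ** n)  # convergence factor
    r = r_b * r_d                                         # combined reliability
    return np.sum(r * changes) / np.sum(r)

# Hypothetical example: 3 models projecting DJF warming
dT   = np.array([2.5, 3.8, 3.0])   # projected change (K)
bias = np.array([0.5, 2.0, 1.0])   # |bias| vs. observed climate (K)
dist = dT - dT.mean()              # crude convergence proxy
print(rea_average(dT, bias, dist))
```

With these numbers the high-bias model (3.8 K) is down-weighted, so the REA average falls below the simple mean, mirroring the "REA changes differ from simple averaging" result on the next slide.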
REA Results for Temperature A2 Scenario DJF
Summary REA changes differ from the simple averaging method – by a few tenths to 1 K for temperature – by a few tenths to 10% for precipitation. The uncertainty range is narrower in the REA method. Overall reliability from model performance was lower than that from model convergence. Therefore, to improve reliability, model biases must be reduced
Tebaldi & Knutti, 2007
Search for correct performance metrics for climate models – where are we? Relative ranking of models varies depending on the variable considered – points to the difficulty of using one grand performance index. Importance of evaluating a broad spectrum of climate processes and phenomena. It remains largely unknown what aspects of observed climate must be simulated well to make reliable predictions about future climate. Gleckler et al., 2008, JGR
Model Performance over Alaska and Greenland RMSEs of seasonal cycles of temperature, precipitation, sea level pressure Tendency of models with smaller errors to simulate a larger greenhouse gas warming over the arctic and greater increases in precipitation Choice of subset of models may narrow uncertainty and obtain more robust estimates of future climate change in Arctic Walsh et al., 2008
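The RMSE-of-seasonal-cycle metric used in evaluations like the one above can be sketched as follows; the monthly climatologies are hypothetical, and the actual study computes such errors for temperature, precipitation, and sea level pressure over Alaska and Greenland rather than a single series.

```python
import numpy as np

def seasonal_cycle_rmse(model_clim, obs_clim):
    """RMSE between model and observed mean seasonal cycles.

    model_clim, obs_clim : arrays of 12 monthly climatological means
    (e.g., region-averaged temperature), in the same units.
    """
    model_clim = np.asarray(model_clim, dtype=float)
    obs_clim = np.asarray(obs_clim, dtype=float)
    return float(np.sqrt(np.mean((model_clim - obs_clim) ** 2)))

# Hypothetical Arctic monthly climatology (degC) and a model
# with a uniform +1 degree warm bias
obs = np.array([-25, -24, -20, -12, -2, 4, 8, 6, 0, -10, -18, -23], float)
mod = obs + 1.0
print(seasonal_cycle_rmse(mod, obs))  # 1.0
```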
Selection of reliable scenarios in the Southwest Evaluation of CMIP3 models for winter temperature and precipitation (using a modified Giorgi and Mearns REA method) Reproduction of the 250 mb geopotential height field (reflecting the location of the subtropical jet stream) Two models (ECHAM5, HadCM3) score best for these three variables Dominguez et al., 2010
SW Reliability Scores Dominguez et al., 2010
Studies where selection did not make a difference Pierce et al., 2009 – future average temperature over the western US – 14 randomly selected GCMs produced results indistinguishable from those produced by a subset of the best models. Knutti et al., 2010 – metric of precipitation trend; 11 randomly selected GCMs produced the same results as the 11 best GCMs.
ENSEMBLES Methodological Approach Six metrics are identified based on ERA40-driven runs – F1: Large-scale circulation and weather regimes (MeteoFrance) – F2: Temperature and precipitation meso-scale signal (ICTP) – F3: PDFs of daily precipitation and temperature (DMI, UCLM, SMHI) – F4: Temperature and precipitation extremes (KNMI, HC) – F5: Temperature trends (MPI) – F6: Temperature and precipitation annual cycle (CUNI)
The result [figure: aggregated weight per RCM, y-axis 0.1 to 0.3] Christensen et al., 2010
The result (per-metric weights f1-f6 and combined weights per RCM; decimal commas in the original converted to points):

RCM                | f1    | f2    | f3    | f4    | f5    | f6    | W_prod | W_redu | W_rank
C4I-RCA3.0         | 0.058 | 0.050 | 0.067 | 0.044 | 0.066 | 0.069 | 0.026  | 0.058  | 0.057
CHMI-ALADIN        | 0.071 | 0.058 | 0.067 | 0.070 | 0.060 | 0.069 | 0.054  | 0.066  | 0.08
CNRM-ALADIN        | 0.069 | 0.059 | 0.067 | 0.113 | 0.066 | 0.061 | 0.084  | 0.064  | 0.066
DMI-HIRHAM5        | 0.068 | 0.039 | 0.066 | 0.062 | 0.070 | 0.068 | 0.035  | 0.066  | 0.053
ETHZ-CLM           | 0.075 | 0.073 | 0.067 | 0.036 | 0.059 | 0.069 | 0.038  | 0.067  | 0.073
ICTP-RegCM3        | 0.073 | 0.112 | 0.065 | 0.066 | 0.069 | 0.067 | 0.112  | 0.075  | 0.073
KNMI-RACMO2        | 0.070 | 0.137 | 0.069 | 0.132 | 0.066 | 0.068 | 0.268  | 0.094  | 0.123
Met.No-HIRHAM      | 0.070 | 0.041 | 0.067 | 0.057 | 0.065 | 0.067 | 0.032  | 0.064  | 0.055
Meto-HC-HadRM3Q0   | 0.061 | 0.048 | 0.067 | 0.054 | 0.071 | 0.066 | 0.034  | 0.063  | 0.057
Meto-HC-HadRM3Q3   | 0.061 | 0.049 | 0.066 | 0.030 | 0.064 | 0.062 | 0.016  | 0.047  | 0.036
Meto-HC-HadRM3Q16  | 0.061 | 0.051 | 0.067 | 0.080 | 0.073 | 0.066 | 0.055  | 0.069  | 0.071
MPI-REMO           | 0.068 | 0.072 | 0.066 | 0.038 | 0.069 | 0.069 | 0.039  | 0.068  | 0.063
OURANOS-MRCC4.2.3  | 0.072 | 0.089 | 0.065 | 0.063 | 0.065 | 0.066 | 0.077  | 0.065  | 0.057
SMHI-RCA3.0        | 0.057 | 0.053 | 0.067 | 0.054 | 0.067 | 0.069 | 0.035  | 0.063  | 0.062
UCL-PROMES         | 0.067 | 0.068 | 0.067 | 0.099 | 0.070 | 0.065 | 0.096  | 0.074  | 0.073
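One way the six per-metric weights might be combined into a single weight per model, in the spirit of the product-style W_prod column, is a normalized product across metrics. This is a sketch with made-up scores, not the published ENSEMBLES procedure; the function name and the example data are assumptions.

```python
import numpy as np

def product_weights(f):
    """Combine per-metric weights into one weight per model.

    f : array of shape (n_models, n_metrics); each column holds the
        normalized weight each model earned on one metric.
    Returns the product across metrics, renormalized to sum to 1,
    so a model must score well on every metric to keep a high weight.
    """
    w = np.prod(f, axis=1)
    return w / w.sum()

# Hypothetical scores for 3 models on 2 metrics (columns sum to 1)
f = np.array([[0.5, 0.2],
              [0.3, 0.3],
              [0.2, 0.5]])
print(product_weights(f))
```

Note how the product punishes unevenness: the model that is mediocre on both metrics can end up weighted below models that excel on one, which is why product-based weights (like KNMI-RACMO2's W_prod above) can spread much further from uniform than the individual metric weights do.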
An application for 2021-2050 Changes for European capitals 2021-2050 (Déqué, 2009; Déqué & Somot, 2010) 17 RCMs in 5(7) GCMs Convert the discrete data set into continuous PDFs of climate change variables – done using a Gaussian kernel algorithm applied to the discrete dataset, in order to also take the model-specific weights into account
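The conversion of a discrete set of model changes into a continuous, weighted PDF can be sketched with a weighted Gaussian kernel. The changes, weights, and bandwidth below are hypothetical, and the actual bandwidth choice of Déqué & Somot may differ; this only illustrates the mechanics.

```python
import numpy as np

def weighted_gaussian_kde(x_grid, samples, weights, bandwidth):
    """Weighted Gaussian kernel density estimate (sketch).

    Each model's projected change contributes a Gaussian bump whose
    area equals that model's (normalized) weight, so the resulting
    curve integrates to ~1 and is a proper PDF.
    """
    samples = np.asarray(samples, float)
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()
    pdf = np.zeros_like(x_grid, dtype=float)
    for s, w in zip(samples, weights):
        pdf += w * np.exp(-0.5 * ((x_grid - s) / bandwidth) ** 2) \
               / (bandwidth * np.sqrt(2.0 * np.pi))
    return pdf

# Hypothetical DJF temperature changes (degC) from 5 RCMs, with weights
changes = [1.8, 2.1, 2.4, 2.9, 3.3]
weights = [0.15, 0.25, 0.30, 0.20, 0.10]
x = np.linspace(0.0, 5.0, 501)
pdf = weighted_gaussian_kde(x, changes, weights, bandwidth=0.4)
print(pdf.sum() * (x[1] - x[0]))  # integrates to approximately 1
```

Heavily weighted models pull probability mass toward their projections, which is exactly how the thick ENSEMBLES-weighted curves on the next slide differ from an unweighted ensemble PDF.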
Temperature PDF Climate Change PDFs of daily temperature (°C) for DJF (left) and JJA (right) for 1961-1990 (solid line) and 2021-2050 (dashed line), with ENSEMBLES weights (thick line) and for a single model based on median Ranked Probability Score (thin line). Déqué & Somot, 2010
To weight or not to weight Recommendation from IPCC Expert Meeting on Evaluating MMEs (Knutti et al., 2010) – Rankings or weighting could be used to select subsets of models – But it is useful to test the statistical significance of the difference between the subset and the full ensemble, to establish whether the subset is meaningful – Selection of metric is crucial – is it a truly meaningful one from a process point of view?
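The recommendation to test whether a selected subset actually differs from the full ensemble can be sketched with a simple resampling test: compare the subset's mean change against random subsets of the same size. All data below are hypothetical, and this is not the procedure of any specific study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_significance(full, subset_idx, n_resample=10_000, rng=rng):
    """Resampling test: is the chosen subset's mean change unusual
    relative to random same-size subsets of the full ensemble?
    Returns a two-sided p-value; a large p suggests the subset is
    statistically indistinguishable from random selection.
    """
    full = np.asarray(full, float)
    k = len(subset_idx)
    obs = full[list(subset_idx)].mean() - full.mean()
    null = np.empty(n_resample)
    for i in range(n_resample):
        pick = rng.choice(len(full), size=k, replace=False)
        null[i] = full[pick].mean() - full.mean()
    return float(np.mean(np.abs(null) >= abs(obs)))

# Hypothetical projected changes (K) from 14 GCMs;
# suppose a metric selects models 0-3 as the "best" subset
changes = np.array([1.9, 2.0, 2.1, 2.2, 2.8, 3.0, 3.1, 3.2,
                    3.3, 3.4, 3.5, 3.6, 3.7, 3.8])
p = subset_significance(changes, [0, 1, 2, 3])
print(p)  # small p: this subset's mean differs from the ensemble
```

This is the spirit of the Pierce et al. and Knutti et al. checks above: when random subsets reproduce the "best-model" result, selection has bought nothing.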
More process-oriented approaches
Process-oriented approaches Hall and Qu snow-albedo feedback example What we are trying in NARCCAP
SNOW ALBEDO FEEDBACK In the AR4 ensemble, intermodel variations in snow albedo feedback strength in the seasonal cycle context are highly correlated with snow albedo feedback strength in the climate change context
[figure: model distribution of the SAF parameter, with the observational estimate based on ISCCP and its 95% confidence interval] It's possible to calculate an observed value of the SAF parameter in the seasonal cycle context based on the ISCCP data set (1984-2000) and the ERA40 reanalysis. This value falls near the center of the model distribution. It's also possible to calculate an estimate of the statistical error in the observations, based on the length of the ISCCP time series. Comparison to the simulated values shows that most models fall outside the observed range. However, the observed error range may not be large enough because of measurement error in the observations. (Hall and Qu, 2007)
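The logic of this kind of emergent constraint, correlating a model's seasonal-cycle feedback with its climate-change feedback and then anchoring the relation with an observed seasonal-cycle value, can be sketched as follows. All SAF values here are hypothetical and are not the Hall and Qu numbers.

```python
import numpy as np

# Hypothetical per-model SAF strengths (d alpha_s / d T_s, % per K)
# in the seasonal-cycle context vs. the climate-change context
saf_seasonal = np.array([-1.6, -1.2, -1.0, -0.9, -0.7, -0.5, -0.4])
saf_climate  = np.array([-1.5, -1.3, -0.9, -1.0, -0.6, -0.5, -0.3])

# Intermodel correlation: a high r is what makes the observable
# seasonal cycle a useful proxy for the unobservable future feedback
r = np.corrcoef(saf_seasonal, saf_climate)[0, 1]

# Linear fit across models, then plug in a hypothetical "observed"
# seasonal-cycle value to constrain the climate-change feedback
slope, intercept = np.polyfit(saf_seasonal, saf_climate, 1)
obs_seasonal = -0.9
constrained = slope * obs_seasonal + intercept
print(round(r, 2), round(constrained, 2))
```

Models whose seasonal-cycle SAF falls outside the observed range would then be judged less credible for the climate-change feedback, which is the process-based weighting idea this section builds toward.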
What controls the strength of snow albedo feedback? Two components: snow cover and snow metamorphosis. It turns out that the snow cover component is overwhelmingly responsible not only for the overall strength of snow albedo feedback in any particular model, but also for the intermodel spread of the feedback. Qu and Hall, 2007a
Establishing Process-level Differential Credibility of Regional Scale Climate Simulations Determining, through in-depth process-level analysis of simulations of the current (or past) climate, the ability of the model to reproduce those aspects of the climate system most responsible for the particular regional climate; then analyzing the model's response to future forcing and determining specifically how model errors in the current simulation affect that response. Which model errors really matter? Essentially it is a process-based, integrated expert judgment of the degree to which the model's response to future forcing is deemed credible.
The North American Regional Climate Change Assessment Program (NARCCAP) Explores multiple uncertainties in regional and global climate model projections 4 global climate models x 6 regional climate models Develops multiple high resolution regional (50 km, 30 miles) climate scenarios for use in impacts and adaptation assessments Evaluates regional model performance to establish credibility of individual simulations for the future Participants: Iowa State, PNNL, LLNL, UC Santa Cruz, Ouranos (Canada), UK Hadley Centre, NCAR Initiated in 2006, funded by NOAA-OGP, NSF, DOE, USEPA-ORD – 5-year program www.narccap.ucar.edu
Organization of Program Phase I: 25-year simulations using NCEP-Reanalysis boundary conditions (1980-2004) Phase II: Climate Change Simulations – Phase IIa: RCM runs (50 km res.) nested in AOGCMs, current and future – Phase IIb: Time-slice experiments at 50 km res. (GFDL and NCAR CAM3), for comparison with RCM runs. Quantification of uncertainty at regional scales – probabilistic approaches. Scenario formation and provision to the impacts community led by NCAR. Opportunity for double nesting (over specific regions) to include participation of other RCM groups (e.g., for NOAA OGP RISAs, CEC, New York Climate and Health Project, U. Nebraska).
Process Credibility Analysis of the Southwest M. Bukovsky
What do we need to make further progress? Many more in-depth, process-oriented studies that examine the plausibility of process change under future forcing – what errors really matter and which don't?
What is the danger of false certainty?
How is this done? Using performance metrics – e.g., Gleckler et al., 2008; Reichler and Kim, 2010 – For various weighting schemes – For selecting good (or bad) performers – Example from the ENSEMBLES program (Christensen et al.) – In probabilistic or selection approaches (does selection or weighting really provide different estimates? Usually not – but see the examples of Walsh et al. and Dominguez et al.)