1 Ensemble Forecasting: Calibration, Verification, and use in Applications
Tom Hopson

2 Outline
Motivation for ensemble forecasting and post-processing
Introduce quantile regression (QR; Koenker and Bassett, 1978) as a post-processing procedure
Ensemble forecast verification
THORPEX-TIGGE data set
Ensemble forecast examples: a) southwestern African flooding, b) African meningitis, c) US Army test range weather forecasting, d) Bangladesh flood forecasting

3 Goals of an Ensemble Prediction System (EPS)
Predict the observed distribution of events and atmospheric states
Predict uncertainty in the day's prediction
Predict the extreme events that are possible on a particular day
Provide a range of possible scenarios for a particular forecast

4 More technically …
Greater accuracy of the ensemble-mean forecast (half the error variance of a single forecast)
Likelihood of extremes
Non-Gaussian forecast PDFs
Ensemble spread as a representation of forecast uncertainty
=> All rely on the forecasts being calibrated
Further: calibration is essential for tailoring forecasts to local applications. NWP provides spatially and temporally averaged gridded output, so applying gridded forecasts to point locations requires location-specific calibration to account for the local spatial and temporal scales of variability (=> increasing ensemble dispersion).

5 Take-home message
[Figure: forecast PDF (probability vs. discharge) with the observation marked]
For a "calibrated ensemble", the error variance of the ensemble mean is 1/2 the error variance of any ensemble member (on average), independent of the distribution being sampled.
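This take-home message can be checked with a short Monte Carlo sketch (Python; the skewed gamma distribution standing in for a discharge climatology is an illustrative assumption, and a calibrated ensemble is idealized as members and truth drawn from the same distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_members = 20000, 50

# Idealized calibrated ensemble: each day, truth and members are
# independent draws from the same (here skewed, non-Gaussian) distribution.
truth = rng.gamma(shape=2.0, scale=3.0, size=n_days)
members = rng.gamma(shape=2.0, scale=3.0, size=(n_days, n_members))

# Error variance of one member vs. the ensemble mean.
member_err_var = np.mean((members[:, 0] - truth) ** 2)
mean_err_var = np.mean((members.mean(axis=1) - truth) ** 2)
ratio = mean_err_var / member_err_var
```

With n members the expected ratio is (1 + 1/n)/2, approaching exactly 1/2 as n grows, regardless of the distribution being sampled.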

6 Forecast "calibration" or "post-processing"
[Figure: forecast PDFs of flow rate [m3/s] before and after calibration, with the observation marked, illustrating "bias" and "spread" or "dispersion"]
Post-processing has corrected:
-- the "on average" bias
-- the under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its "dispersion" or "spread")
Our approach uses the under-utilized "quantile regression" technique:
-- the probability distribution function "means what it says"
-- daily variations in the ensemble dispersion relate directly to changes in forecast skill => an informative ensemble skill-spread relationship

7–22 [Image-only slides; no transcript text available]

23 Rank Histograms – measuring the reliability of an ensemble forecast
You cannot verify an ensemble forecast with a single observation. The more data you have for verification, the more certain you are (as is true in general for other statistical measures). Rare (low-probability) events require more data to verify, as do systems with many ensemble members.
From Barb Brown
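A rank histogram tallies, over many cases, where each observation falls among the sorted ensemble members; a flat histogram is consistent with a reliable ensemble. A minimal sketch (synthetic data; ties are ignored, which is safe for continuous variables):

```python
import numpy as np

def rank_histogram(obs, ens):
    """Count where each observation ranks within its ensemble.

    obs: (n_days,) observations; ens: (n_days, n_members) forecasts.
    Returns counts for ranks 0 .. n_members (n_members + 1 bins).
    """
    n_members = ens.shape[1]
    # Rank of each obs = number of members strictly below it.
    ranks = (ens < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(1)
obs = rng.normal(size=5000)
ens = rng.normal(size=(5000, 9))  # members drawn from the same distribution as obs
counts = rank_histogram(obs, ens)  # roughly flat across the 10 bins
```

An under-dispersive ensemble would instead pile counts into the outer bins (the observation too often falls outside the ensemble range).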

24 From Tom Hamill

25 Troubled Rank Histograms
[Figure: two rank histograms (counts vs. ensemble rank) showing non-flat shapes that indicate miscalibration]
Slide from Matt Pocernic

26 From Tom Hamill

27 From Tom Hamill

28 From Tom Hamill

29 From Tom Hamill

30 From Tom Hamill

31 Example of Quantile Regression (QR)
Our application: fitting T quantiles using QR conditioned on:
1) ranked forecast ensemble
2) ensemble mean
3) ensemble median
4) ensemble stdev
5) persistence
R package: quantreg
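Quantile regression (as implemented in the R package quantreg) fits each quantile by minimizing the tilted-absolute-value "pinball" (check) loss rather than squared error. A minimal Python illustration of the loss itself, verifying that the constant minimizing it is the sample quantile; the regression case simply makes the quantile a linear function of predictors such as the ensemble mean and spread:

```python
import numpy as np

def pinball_loss(tau, y, q):
    """Check-function loss minimized by quantile regression:
    tau * (y - q) for y >= q, (tau - 1) * (y - q) otherwise."""
    e = y - q
    return np.mean(np.maximum(tau * e, (tau - 1) * e))

rng = np.random.default_rng(2)
y = rng.normal(10.0, 2.0, size=20000)

# Grid-search the constant q minimizing the tau = 0.9 pinball loss;
# it lands on the 90th sample percentile.
q90 = np.quantile(y, 0.9)
grid = np.linspace(5.0, 15.0, 2001)
losses = [pinball_loss(0.9, y, q) for q in grid]
best = grid[int(np.argmin(losses))]
```

This is why fitting many values of tau recovers the whole forecast PDF quantile by quantile.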

32 [Figure: calibration of temperature forecasts – panels of probability/°K vs. T [K] and forecast/observation time series illustrating each step]
Step 1: Determine the climatological quantiles (the climatological PDF serves as the "prior").
Step 2: For each quantile, use "forward step-wise cross-validation" to iteratively select the best subset of regressors: 1) reforecast ensemble, 2) ensemble mean, 3) ensemble stdev, 4) persistence, 5) LR quantile (not shown). Selection requirements: a) QR cost function minimum; b) satisfy the binomial distribution at 95% confidence. If the requirements are not met, retain the climatological "prior".
Step 3: Segregate the forecasts into differing ranges of ensemble dispersion and refit the models (Step 2) uniquely for each range.
Final result: a "sharper" posterior PDF represented by the interpolated quantiles.
Notes: both the min and max T forecasts were underforecasting by about 1.5 C (i.e. f - o = -1.5). The AR(1) correction uses only today's observed error to forecast tomorrow's error, which is then removed from the forecast; 16% of ensembles gained from the AR(1), as determined using "generalized cross-validation".

33 Ranked Probability Score
for multi-category or continuous variables
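The (discrete) Ranked Probability Score compares cumulative forecast and observation distributions over ordered categories; its continuous limit is the CRPS. A minimal sketch, assuming the common normalization by the number of categories minus one:

```python
import numpy as np

def rps(forecast_probs, obs_category):
    """Ranked Probability Score for one multi-category forecast.

    forecast_probs: probabilities for ordered categories (sums to 1).
    obs_category: index of the category that occurred.
    Mean squared difference of cumulative distributions; 0 is perfect.
    """
    f_cum = np.cumsum(forecast_probs)
    o = np.zeros_like(forecast_probs)
    o[obs_category] = 1.0
    o_cum = np.cumsum(o)
    return np.sum((f_cum - o_cum) ** 2) / (len(forecast_probs) - 1)

# A sharp, correct forecast beats a vague climatological one:
sharp = rps(np.array([0.8, 0.1, 0.1]), 0)
vague = rps(np.array([1 / 3, 1 / 3, 1 / 3]), 0)
```

Because it uses cumulative probabilities, RPS penalizes probability placed in categories far from the observed one more than probability placed nearby.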

34 Scatter-plot and Contingency Table
Brier Score: does the forecast correctly detect temperatures above 18 degrees?
y_i = forecast probability of the event
o_i = observed occurrence (0 or 1)
i = sample index, of n total samples
BS = (1/n) Σ_i (y_i - o_i)² => note the similarity to MSE
Slide from Barbara Casati
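The Brier score is just the mean squared difference between forecast probabilities and binary outcomes, hence the noted similarity to MSE. A minimal sketch with made-up numbers:

```python
import numpy as np

def brier_score(p, o):
    """Brier score: mean squared difference between forecast
    probabilities p and binary outcomes o (0/1). Lower is better."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

# Event: temperature above 18 degrees (the slide's example).
p = [0.9, 0.2, 0.7, 0.1]  # forecast probabilities (illustrative)
o = [1, 0, 1, 0]          # observed occurrences
bs = brier_score(p, o)    # (0.01 + 0.04 + 0.09 + 0.01) / 4 = 0.0375
```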

35 Other post-processing approaches …
1) Bayesian Model Averaging (BMA) – Raftery et al. (1997)
2) Analogue approaches – Hopson and Webster, J. Hydromet (2010)
3) Kalman filter with analogues – Delle Monache et al. (2010)
4) Quantile regression – Hopson and Hacker, MWR (under review)
5) Quantile-to-quantile (quantile matching) approach – Hopson and Webster, J. Hydromet (2010)
… many others

36 Quantile Matching: another approach when matched forecast-observation pairs are not available => useful for climate change studies
[Figure: ECMWF 51-member ensemble precipitation forecasts compared to observations, 2004, Brahmaputra catchment-averaged; black line: satellite observations; colored lines: ensemble forecasts]
-- The basic structure of the catchment rainfall is similar for the forecasts and the observations
-- But there is a large relative over-bias in the forecasts

37 Forecast Bias Adjustment
Done independently for each forecast grid point (bias-correct the whole PDF, not just the median).
[Figure: model climatology CDF and "observed" climatology CDF (25th, 50th, 75th, 100th quantiles, up to Pmax); a forecast precipitation Pfcst is mapped to the adjusted value Padj at the same quantile]
In practical terms: ranked forecasts are matched to ranked observations.
Hopson and Webster (2010)
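The quantile-mapping step above can be sketched in a few lines (Python; the gamma climatologies and the 50% over-bias are illustrative assumptions, not the paper's data):

```python
import numpy as np

def quantile_map(fcst, model_clim, obs_clim):
    """Map each forecast value to the observed-climatology value at the
    same quantile (the quantile-to-quantile bias adjustment)."""
    model_clim = np.sort(model_clim)
    obs_clim = np.sort(obs_clim)
    # Quantile of each forecast within the model climatology ...
    q = np.searchsorted(model_clim, fcst) / len(model_clim)
    # ... becomes the value at that quantile of the observed climatology.
    return np.quantile(obs_clim, np.clip(q, 0.0, 1.0))

rng = np.random.default_rng(3)
obs_clim = rng.gamma(2.0, 3.0, size=5000)
model_clim = 1.5 * rng.gamma(2.0, 3.0, size=5000)  # model 50% over-biased
fcst = 1.5 * rng.gamma(2.0, 3.0, size=1000)
adj = quantile_map(fcst, model_clim, obs_clim)     # over-bias removed
```

Note that only the two climatologies are needed, not matched forecast-observation pairs, which is what makes the method usable for climate-change studies.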

38 Bias-corrected Precipitation Forecasts
[Figure: original vs. corrected Brahmaputra forecasts]
=> The observed precipitation now falls within the "ensemble bundle"

39 Outline
Motivation for ensemble forecasting and post-processing
Introduce quantile regression (QR; Koenker and Bassett, 1978) as a post-processing procedure
Ensemble forecast verification
THORPEX-TIGGE data set
Ensemble forecast examples: a) southwestern African flooding, b) African meningitis, c) US Army test range weather forecasting, d) Bangladesh flood forecasting

40 THORPEX Interactive Grand Global Ensemble
TIGGE, the THORPEX Interactive Grand Global Ensemble, is a component of the World Weather Research Programme designed to accelerate improvements in the accuracy of 1-day to 2-week high-impact weather forecasts for the benefit of humanity.
The TIGGE archive consists of ensemble forecast data from ten global NWP centers:
-- starting from October 2006
-- available for scientific research
-- near-real-time forecasts (some centers delayed)

41 Archive Status and Monitoring: Data Receipt
[Diagram: data flow from providers (UKMO, CMC, CMA, ECMWF, MeteoFrance, NCEP, JMA, NCDC, KMA, CPTEC, BoM; NCAR as an archive centre) via IDD/LDM, HTTP, and FTP]
Unidata IDD/LDM (Internet Data Distribution / Local Data Manager): a commodity-internet application to send and receive data

42 Archive Status and Monitoring, Variability between providers

43 Archive Status and Monitoring: Archive Completeness
[Table: variable availability by centre (ECMWF, UKMO, JMA, NCEP, CMA, CMC, BoM, MeteoFrance, KMA, CPTEC); the per-centre completeness marks were lost in transcription]
Variables: Geopotential Z (PL), Specific H, T, U-velocity, V-velocity, Potential Vor (PT), Potential T (PV), V-velocity, U 10m (SL), V 10m, CAPE, Conv. Inhib., Land-sea, Mean SLP, Orog., Skin T, Snow D. H2O, Snow F. H2O
PL = pressure level, PT = 320 K θ level, PV = ±2 potential vorticity level, SL = single/surface level

44 Archive Status and Monitoring: Archive Completeness (continued)
[Table: variable availability by centre (ECMWF, UKMO, JMA, NCEP, CMA, CMC, BoM, MeteoFrance, KMA, CPTEC); the per-centre completeness marks were lost in transcription]
Variables: Soil Moist. (SL), Soil T, Sunshine D., Surf. DPT, Surf. ATmax, Surf. ATmin, Surf. AT, Surf. P, LW Rad. Out, LH flux, Net Rad, Net Therm. Rad, Sensible Rad., Cloud Cov, Column Water, Precipitation, Wilt. Point, Field Cap.
PL = pressure level, PT = 320 K θ level, PV = ±2 potential vorticity level, SL = single/surface level

45 Outline
Motivation for ensemble forecasting and post-processing
Introduce quantile regression (QR; Koenker and Bassett, 1978) as a post-processing procedure
Ensemble forecast verification
THORPEX-TIGGE data set
Ensemble forecast examples: a) southwestern African flooding, b) African meningitis, c) US Army test range weather forecasting, d) Bangladesh flood forecasting

46 Early May 2011, floods in southwestern Africa

47 Early May 2011, floods in southwestern Africa
-- examine ens forecasts … ECMWF 24hr precip

48 Early May 2011, floods in southwestern Africa
-- examine ens forecasts … NCEP GEFS 24hr precip

49 Early May 2011, floods in southwestern Africa
-- examine ens forecasts … ECMWF 5-day precip

50 Early May 2011, floods in southwestern Africa
-- examine ens forecasts … NCEP GEFS 5day precip

51 Early May 2011, floods in southwestern Africa
-- examine ens forecasts … NCEP GEFS 5day precip

52 A Cautionary Warning about using Probabilistic Precipitation Forecasts in Hydrologic Modeling
(Importance of maintaining spatial and temporal covariances for hydrologic forecasting => one option: the "Schaake Shuffle")
[Figure: river catchment A with sub-catchments B and C (discharges QA, QB, QC) and three precipitation ensembles; QA is the same for all 3 possible ensembles]
Is this a scenario for the smallest possible QA? No. For the average QA? For the largest possible QA? No.
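The "Schaake Shuffle" option mentioned above reorders independently calibrated ensemble values at each site so that their rank structure matches that of historical observations, restoring the space-time covariance that independent calibration destroys. A minimal sketch (synthetic data; the spatially correlated "historical" sample is an illustrative assumption):

```python
import numpy as np

def schaake_shuffle(ens, hist_obs):
    """Reorder ensemble members at each site so their ranks match
    those of historical observations (one historical date per member).

    ens: (n_members, n_sites) calibrated ensemble values.
    hist_obs: (n_members, n_sites) historical observations.
    """
    shuffled = np.empty_like(ens)
    for j in range(ens.shape[1]):
        # Rank of each historical obs at site j decides which sorted
        # ensemble value each member receives.
        order = np.argsort(np.argsort(hist_obs[:, j]))
        shuffled[:, j] = np.sort(ens[:, j])[order]
    return shuffled

rng = np.random.default_rng(4)
ens = rng.gamma(2.0, 3.0, size=(20, 3))  # 20 members, 3 sub-catchments
cov = [[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]]
hist = rng.multivariate_normal([0, 0, 0], cov, size=20)  # correlated history
out = schaake_shuffle(ens, hist)
```

The marginal distribution at each site is untouched (each column of `out` is a permutation of the corresponding column of `ens`); only the joint rank structure changes, so downstream discharge scenarios become physically plausible.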

53 Dugway Proving Ground

54 Dugway Proving Ground, Utah: e.g., T thresholds
Includes random and systematic differences between members. Not an actual chance of exceedance unless calibrated.

55 Challenges in probabilistic mesoscale prediction
Model formulation:
-- Bias (marginal and conditional)
-- Lack of variability caused by truncation and approximation
-- Non-universality of closure and forcing
Initial conditions:
-- Small scales are damped in analysis systems, and the model must develop them
-- Perturbation methods designed for medium-range systems may not be appropriate
Lateral boundary conditions:
-- After short time periods the lateral boundary conditions can dominate
-- Representing uncertainty in lateral boundary conditions is critical
Lower boundary conditions:
-- Dominate the boundary-layer response
-- Difficult to estimate uncertainty in lower boundary conditions

56 RTFDDA and Ensemble-RTFDDA
Liu et al., AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA, January 18–23, 2010

57 3-hr dewpoint time series
Station DPG S01: before calibration vs. after calibration
National Security Applications Program / Research Applications Laboratory

58 42-hr dewpoint time series
Station DPG S01: before calibration vs. after calibration

59 PDFs: raw vs. calibrated
[Figure: blue is the "raw" ensemble PDF, black the calibrated ensemble PDF, red the observed value]
Notice the significant change in both the "bias" and the dispersion of the final PDF (also notice the PDF asymmetries).

60 3-hr dewpoint rank histograms
Station DPG S01

61 42-hr dewpoint rank histograms
Station DPG S01

62 Utilizing verification measures in near-real time …
Measures used:
-- Rank histogram (converted to a scalar measure)
-- Root mean square error (RMSE)
-- Brier score
-- Ranked Probability Score (RPS)
-- Relative Operating Characteristic (ROC) curve
-- A new measure of ensemble skill-spread utility
=> These are used for automated calibration-model selection, via a weighted sum of the skill scores of each measure

63 Skill Scores
A single value to summarize performance.
-- Reference forecast: the best naive guess, e.g. persistence or climatology
-- A perfect forecast implies that the object can be perfectly observed
-- Positively oriented: positive is good
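A generic skill score expresses a forecast's improvement over the naive reference, scaled so 1 is perfect and 0 matches the reference (negative means worse than the reference). A minimal sketch for negatively oriented scores such as RMSE, Brier, or CRPS, whose perfect value is 0:

```python
def skill_score(score, score_reference, score_perfect=0.0):
    """Generic skill score: fraction of the possible improvement over
    the reference forecast that the forecast achieves.

    1 = perfect, 0 = no better than reference, negative = worse.
    """
    return (score - score_reference) / (score_perfect - score_reference)

# Illustrative numbers: forecast RMSE 1.2 vs. climatology RMSE 2.0.
ss = skill_score(1.2, 2.0)  # 0.4: 40% of the possible improvement
```

This is the form behind the CRPS and RMSE skill scores shown on the next slide; it is positively oriented even though the underlying scores are negatively oriented.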

64 Skill Score Verification
CRPS skill score and RMSE skill score. Reference forecasts: black -- raw ensemble; blue -- persistence.
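The CRPS underlying the skill score above can be estimated directly from an ensemble using the standard sample-based form E|X - y| - 0.5 E|X - X'|, where X, X' are independent ensemble members and y is the observation; a minimal sketch:

```python
import numpy as np

def crps_ensemble(obs, ens):
    """Sample-based CRPS for one forecast: E|X - y| - 0.5 * E|X - X'|.

    Lower is better; for a single member it reduces to absolute error.
    """
    ens = np.asarray(ens, float)
    term1 = np.mean(np.abs(ens - obs))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

# A well-centered ensemble scores much better than a displaced one:
good = crps_ensemble(0.0, [-1.0, 0.0, 1.0])
bad = crps_ensemble(0.0, [4.0, 5.0, 6.0])
```

The spread term rewards sharpness only when it is honest, which is why CRPS (and its skill score) is a natural summary measure for calibrated ensembles.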

65 Thank You!

