
1 Multi-Model or Post-processing: Pros and Cons. Tom Hopson - NCAR; Martyn Clark - NIWA; Andrew Slater - CIRES/NSIDC

2 Two Questions:
1) How best to utilize a multi-model simulation (forecast), especially if it is under-dispersive?
   a) Should we search for more dynamical variability? Or
   b) Is it better to balance post-processing with multi-model use to create a properly dispersive, informative ensemble?
2) How significant are the gains of a multi-model approach over a single best model with a few "tricks"?
   a) Blending two post-processing approaches (quantile regression and logistic regression)
   b) Applying "poor-man's" data assimilation

3 Outline
Explore these two questions using multi-model simulations for the French Broad River, NC (a MOPEX basin):
I. Multi-model: Framework for Understanding Structural Errors (FUSE); pre-calibration results => under-dispersive
II. Calibration procedure; introduce quantile regression ("QR"; Koenker and Bassett, 1978)
III. Discussion of Question 1: how best to utilize the multi-model
IV. Question 2: multi-model vs. single best model; single-best-model calibration procedure: logistic regression employed under the QR framework

4 FUSE: Framework for Understanding Structural Errors
[Diagram: storage/flux architectures of the PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL parent models]
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008), Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models, Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

5 Define development decisions: upper-layer architecture
[Diagram: candidate upper-layer structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

6 Define development decisions: lower layer / baseflow
[Diagram: candidate lower-layer/baseflow structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

7 Define development decisions: percolation
[Diagram: candidate percolation structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

8 Define development decisions: surface runoff
[Diagram: candidate surface-runoff structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

9 Build unique models: combination 1
[Diagram: one mix-and-match combination of the structural decisions] (Clark et al., 2008)

10 Build unique models: combination 2
[Diagram: a second mix-and-match combination of the structural decisions] (Clark et al., 2008)

11 Build unique models: combination 3
[Diagram: a third mix-and-match combination of the structural decisions] (Clark et al., 2008)
78 UNIQUE HYDROLOGIC MODELS, ALL WITH DIFFERENT STRUCTURE

12 Example: French Broad River. Before calibration => under-dispersive. Black curve shows observations; colors are the ensemble.

13 Goal of "calibration" or "post-processing": probability calibration
The ensemble represents a random draw from an (unknown) PDF of hydrologic uncertainty. Skillful ensemble post-processing should correct the "on average" bias as well as the under-representation of the 2nd moment of the empirical forecast PDF (i.e., correct its "dispersion" or "spread"). Goal: produce a calibrated ensemble forecast with as much information content (i.e., as "narrow" a PDF) as possible.
[Figure: forecast PDFs vs. observation on a discharge axis, illustrating "bias" and "spread"/"dispersion"]

14 Our approach: Quantile Regression (QR). Benefits:
1) Less sensitivity to outliers
2) Works with heteroscedastic errors
3) Optimally fit for each part of the PDF
4) "Flat" rank histograms

15 Calibration Procedure
Use QR to fit 78 quantiles individually (recall: 78 FUSE model simulations). For each quantile:
1) Perform a "climatological" fit to the data => simulation always at least as good as "climatology"
2) Starting with the full regressor set, iteratively select the best subset using "forward step-wise cross-validation"
   - Fitting done using QR
   - Selection done by: a) minimizing the QR cost function; b) satisfying the binomial distribution
   => Verification measures directly inform the model selection
3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion and refit the models => forcing skill-spread utility
Regressor set for each quantile: 1)-78) all 78 individual model simulations; 79) ensemble mean; 80) ensemble standard deviation; 81) ranked ensemble member (the sorted ensemble member corresponding to the quantile being fit)
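The QR fits above minimize the "check" (pinball) loss at each quantile level. A minimal numpy sketch on synthetic, discharge-like data (not the FUSE output) shows the idea: for an intercept-only model, the pinball-loss minimizer recovers the empirical quantile.

```python
import numpy as np

def pinball_loss(tau, y, yhat):
    """Check (pinball) loss: the objective quantile regression
    minimizes for quantile level tau."""
    e = y - yhat
    return np.mean(np.where(e >= 0, tau * e, (tau - 1.0) * e))

# Intercept-only illustration on a skewed synthetic sample:
# the constant minimizing the pinball loss is the empirical tau-quantile.
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=3.0, size=5000)
tau = 0.9
candidates = np.linspace(y.min(), y.max(), 2001)
losses = [pinball_loss(tau, y, c) for c in candidates]
best = candidates[int(np.argmin(losses))]
# best matches np.quantile(y, tau) to within the grid resolution
```

With regressors added, the same loss is minimized over regression coefficients (typically via linear programming), which is what lets each quantile be fit independently of the others.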

16 Example: French Broad River. Before calibration vs. after calibration. Black curve shows observations; colors are the ensemble.

17 Raw versus calibrated PDFs. Blue is the "raw" FUSE ensemble; black is the calibrated ensemble; red is the observed value. Notice the shift in both the "bias" and the dispersion of the final PDF (also notice the PDF asymmetries).

18 Rank histogram comparisons: raw full ensemble vs. after calibration. After quantile regression, the rank histogram is more uniform (although now slightly over-dispersive).
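A rank histogram like the ones compared here counts where each observation falls within its sorted ensemble. A sketch on synthetic data (the variable names and toy setup are illustrative, not from the talk):

```python
import numpy as np

def rank_histogram(ens, obs):
    """For each forecast, count how many ensemble members fall below the
    observation (rank 0..n_members), then histogram those ranks.
    A well-calibrated ensemble yields a roughly flat histogram."""
    n_members = ens.shape[1]
    ranks = np.sum(ens < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(1)
ens = rng.normal(size=(4000, 9))   # 9-member synthetic ensemble
obs = rng.normal(size=4000)        # obs drawn from the same distribution
counts = rank_histogram(ens, obs)  # ~uniform: about 400 per bin over 10 bins
```

Under-dispersion shows up as a U shape (the observation falls outside the ensemble too often); slight over-dispersion, as on this slide, gives a dome shape.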

19 Calibration: rank histogram and skill-spread information (reference forecast: uncalibrated FUSE ensemble). Rank Histogram Skill Score = 0.86; Skill-Spread Skill Score = 0.75.

20 What Nash-Sutcliffe implies about utility
Frequency used for quantile fitting (Method I): best model = 76%; ensemble st. dev. = 13%; ensemble mean = 0%; ranked ensemble = 6%

21 What Nash-Sutcliffe implies about utility (cont.): degradation with increased ensemble size
Sequentially averaged FUSE models (ranked by NS score) and their resulting NS score:
- Notice the degradation of NS with increasing ensemble size (with a peak at 2 models)
- For an equitable multi-model, NS should rise monotonically
- Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)
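The degradation pattern can be reproduced in a toy experiment (all names and numbers here are illustrative assumptions, not the FUSE results): rank synthetic model traces by Nash-Sutcliffe efficiency, then average them in sequentially. Once member error magnitudes differ enough, averaging in worse members dilutes the ensemble mean even though each member is unbiased.

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of obs; 1 is a perfect fit."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(2)
obs = np.sin(np.linspace(0.0, 20.0, 500)) + 2.0  # toy "discharge" series
# 10 synthetic models with increasing error standard deviation
models = [obs + rng.normal(scale=s, size=obs.size)
          for s in np.linspace(0.2, 1.5, 10)]
ranked = sorted(models, key=lambda m: -nash_sutcliffe(m, obs))
# NS of the sequentially averaged top-k models, k = 1..10
seq_ns = [nash_sutcliffe(np.mean(ranked[: k + 1], axis=0), obs)
          for k in range(10)]
# seq_ns peaks within the first model or two, then degrades
```

That the degradation appears even with independent, unbiased errors (as here) underlines the slide's point: a declining NS curve is not by itself evidence of redundant models, only of unequal member skill.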

22 What Nash-Sutcliffe implies about utility (cont.)
Earlier results -- initial frequency used for quantile fitting: best model = 76%; ensemble st. dev. = 13%; ensemble mean = 0%; ranked ensemble = 6%
Using only the top 1/3 of models to rank and form the ensemble mean -- reduced-set frequency used for quantile fitting: best model = 73%; ensemble st. dev. = 3%; ensemble mean = 32%; ranked ensemble = 29%
- There appear to be significant gains in the utility of the ensemble after "filtering" (except for the drop in st. dev. usage) ... however, "the proof is in the pudding" ...
- Examine verification skill measures ...

23 Skill score comparisons between the full and "filtered" FUSE ensemble sets
GREEN -- full calibrated multi-model; BLUE -- "filtered" calibrated multi-model; reference -- uncalibrated FUSE set
Points:
- Quite similar results for a variety of skill scores
- Both approaches give appreciable benefit over the original raw multi-model output
- However, only in the CRPSS is there improvement of the "filtered" ensemble set over the full set
=> Post-processing method fairly robust. More work (more filtering?)!
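CRPSS, the one score that separates the two sets here, can be estimated directly from ensemble members. A hedged numpy sketch of the standard ensemble CRPS estimator and the associated skill score (the function names and synthetic data are my own, not the talk's):

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Ensemble CRPS per forecast via CRPS = E|X - y| - 0.5 E|X - X'|,
    where X, X' range over ensemble members and y is the observation."""
    term1 = np.mean(np.abs(ens - obs[:, None]), axis=1)
    term2 = 0.5 * np.mean(np.abs(ens[:, :, None] - ens[:, None, :]),
                          axis=(1, 2))
    return term1 - term2

def crpss(crps_fcst, crps_ref):
    """Skill score relative to a reference forecast: positive = improvement."""
    return 1.0 - np.mean(crps_fcst) / np.mean(crps_ref)

rng = np.random.default_rng(3)
obs = rng.normal(size=1000)
sharp = obs[:, None] + rng.normal(scale=0.5, size=(1000, 20))  # skillful
clim = rng.normal(scale=2.0, size=(1000, 20))                  # wide reference
score = crpss(crps_ensemble(sharp, obs), crps_ensemble(clim, obs))  # > 0
```

Because CRPS rewards both calibration and sharpness of the whole predictive CDF, it is a natural score for comparing post-processed ensembles like these.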

24 Calibration Procedure (cont.)
I. Method I (FUSE multi-model) regressor set for each of the 78 quantiles: 1) sorted associated ensemble member; 2) ensemble median; 3) ensemble standard deviation; 4) all 78 models
II. Method II (FUSE best member) regressor set for each of the 78 quantiles: first step - use logistic regression (LR) to calibrate the CDF over a prescribed set of climatological quantiles using the FUSE best member; for each day, resample a 78-member ensemble. Regressors: 1) best member; 2) persistence; 3) associated logistic-regression ensemble member

25 Calibration procedure -- single best model
Use QR to fit 78 quantiles individually. For each quantile:
1) Perform a "climatological" fit to the data => simulation always at least as good as "climatology"
2) Starting with the full regressor set, iteratively select the best subset using "forward step-wise cross-validation" (fitting done using QR; selection done by minimizing the QR cost function and satisfying the binomial distribution) => verification measures directly inform the model selection
3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion and refit the models => forcing skill-spread utility
Regressor set for each quantile: first step - use logistic regression to calibrate the CDF over a prescribed set of climatological quantiles using the FUSE best member; for each day, resample a 78-member ensemble. Regressors: 1) best member; 2) persistence; 3) associated logistic-regression ensemble member
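The logistic-regression step can be sketched as follows: for each climatological quantile q, fit P(obs <= q) as a logistic function of the best-member forecast, yielding a calibrated CDF to resample from. A minimal Newton-Raphson sketch on synthetic data (the toy data and variable names are assumptions, not the talk's setup):

```python
import numpy as np

def fit_logistic(x, y, iters=25):
    """Two-parameter logistic regression (intercept + slope), fit by
    Newton-Raphson on the log-likelihood."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)                                  # score vector
        H = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(2)
        w = w + np.linalg.solve(H, grad)                      # Newton step
    return w

rng = np.random.default_rng(4)
fcst = rng.gamma(2.0, 3.0, size=3000)               # toy best-member forecast
obs = fcst + rng.normal(scale=2.0, size=fcst.size)  # toy observations
q = np.quantile(obs, 0.5)                           # one climatological quantile
w = fit_logistic(fcst, (obs <= q).astype(float))
prob_below_q = lambda f: 1.0 / (1.0 + np.exp(-(w[0] + w[1] * f)))
# small forecasts => high P(obs <= q); large forecasts => low, so w[1] < 0
```

Repeating the fit over the whole prescribed set of climatological quantiles gives a calibrated CDF for each day, from which the 78-member ensemble can then be resampled.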

26 Regression variable usage for the best-member method: best model 34%; persistence 66%; logistic-regression quantile 86%

27 Example: French Broad River. I. Multi-model method vs. II. Best-member approach. Black curve shows observations; colors are the ensemble.

28 Skill score comparisons between the multi-model and single-best-model calibrated ensembles
GREEN -- multi-model; BLUE -- best model; reference -- uncalibrated FUSE set
Points:
- Quite similar results for a variety of skill scores
- Both approaches give appreciable benefit over the original raw multi-model output
- CRPSS shows an improvement of the single model over the full set
=> Post-processing method fairly robust

29 Two questions revisited:
1) How best to utilize multi-model simulations (forecasts), especially if under-dispersive? a) Should we search for more dynamical variability? Or b) is it better to balance post-processing with multi-model use to create a properly dispersive, informative ensemble?
"Answer": adding more models can lead to decreasing skill of the ensemble mean (even if the ensemble is under-dispersive)
2) How significant are the gains of a multi-model approach over a single best model with a few "tricks"? a) Blending two post-processing approaches (quantile regression and logistic regression); b) applying "poor-man's" data assimilation
"Answer": quantile-regression-based calibration is fairly robust and can do a lot with just a single model, especially if a variety of approaches are utilized => benefit of moving beyond model-ensemble calibration alone to include the "kitchen sink", including "blending" of different techniques

30 Summary
- Quantile regression provides a powerful framework for improving the whole (potentially non-Gaussian) PDF of an ensemble forecast
- This framework provides an umbrella for blending together multiple statistical correction approaches (logistic regression, etc.)
- The "step-wise cross-validation" calibration approach used here ensures forecast skill greater than climatology and persistence (by utilizing both in the regression selection process)
- The approach improves a forecast's ability to represent its own potential forecast error: a more uniform rank histogram, and guaranteed utility in the ensemble dispersion (more spread => more uncertainty)

