
1 Multi-Model or Post-processing: Pros and Cons. Tom Hopson - NCAR; Martyn Clark - NIWA; Andrew Slater - CIRES/NSIDC

2 Two Questions:
1) How best to utilize a multi-model simulation (forecast), especially if it is under-dispersive?
   a) Should we search for more dynamical variability? Or
   b) Is it better to balance post-processing with multi-model use to create a properly dispersive, informative ensemble?
2) How significant are the gains of a multi-model approach over a single best model with a few "tricks"?
   a) Blending two post-processing approaches (quantile regression and logistic regression)
   b) Applying "poor-man's" data assimilation

3 Outline
Explore these two questions using multi-model simulations for the French Broad River, NC (a MOPEX basin):
I. Multi-model: Framework for Understanding Structural Errors (FUSE); pre-calibration results => under-dispersive
II. Calibration procedure; introduce quantile regression ("QR"; Koenker and Bassett, 1978)
III. Discussion of Question 1: how best to utilize the multi-model
IV. Question 2: multi-model vs. single best model; single-best-model calibration procedure: logistic regression employed under the QR framework

4 FUSE: Framework for Understanding Structural Errors
[Diagram: storage/flux architectures of the PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL parent models]
Clark, M.P., A.G. Slater, D.E. Rupp, R.A. Woods, J.A. Vrugt, H.V. Gupta, T. Wagener, and L.E. Hay (2008), Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models, Water Resources Research, 44, W00B02, doi:10.1029/2007WR006735.

5 Define development decisions: upper-layer architecture
[Diagram: candidate upper-layer structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

6 Define development decisions: lower layer / baseflow
[Diagram: candidate lower-layer/baseflow structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

7 Define development decisions: percolation
[Diagram: candidate percolation structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

8 Define development decisions: surface runoff
[Diagram: candidate surface-runoff structures from PRMS, SACRAMENTO, ARNO/VIC, and TOPMODEL] (Clark et al., 2008)

9 Build unique models: combination 1
[Diagram: one mix-and-match combination of the structural decisions] (Clark et al., 2008)

10 Build unique models: combination 2
[Diagram: a second mix-and-match combination of the structural decisions] (Clark et al., 2008)

11 Build unique models: combination 3
[Diagram: a third mix-and-match combination of the structural decisions] (Clark et al., 2008)
78 UNIQUE HYDROLOGIC MODELS, ALL WITH DIFFERENT STRUCTURE

12 Example: French Broad River. Before calibration => under-dispersive. Black curve shows observations; colors are the ensemble.

13 Goal of "calibration" or "post-processing": probability calibration
The ensemble represents a random draw from an (unknown) PDF of hydrologic uncertainty. Skillful ensemble post-processing should correct the "on average" bias as well as the under-representation of the 2nd moment of the empirical forecast PDF (i.e., correct its "dispersion" or "spread"). Goal: produce a calibrated ensemble forecast with as much information content (i.e., as "narrow" a PDF) as possible.
[Figure: forecast PDFs vs. observation on a discharge axis, illustrating "bias" and "spread"/"dispersion"]

14 Our approach: Quantile Regression (QR). Benefits:
1) Less sensitivity to outliers
2) Works with heteroscedastic errors
3) Optimally fit for each part of the PDF
4) "Flat" rank histograms

15 Calibration Procedure
Use QR to fit 78 quantiles individually (recall: 78 FUSE model simulations). For each quantile:
1) Perform a "climatological" fit to the data => simulation always at least as good as "climatology"
2) Starting with the full regressor set, iteratively select the best subset using "forward step-wise cross-validation"
   - Fitting done using QR
   - Selection done by: a) minimizing the QR cost function; b) satisfying the binomial distribution
   => Verification measures directly inform the model selection
3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion and refit the models => forcing skill-spread utility
Regressor set for each quantile: 1)-78) all 78 individual model simulations; 79) ensemble mean; 80) ensemble standard deviation; 81) ranked ensemble member (the sorted ensemble member corresponding to the quantile being fit)
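The QR fits above minimize the "check" (pinball) loss at each quantile level. A minimal numpy sketch on synthetic, discharge-like data (not the FUSE output) shows the idea: for an intercept-only model, the pinball-loss minimizer recovers the empirical quantile.

```python
import numpy as np

def pinball_loss(tau, y, yhat):
    """Check (pinball) loss: the objective quantile regression
    minimizes for quantile level tau."""
    e = y - yhat
    return np.mean(np.where(e >= 0, tau * e, (tau - 1.0) * e))

# Intercept-only illustration on a skewed synthetic sample:
# the constant minimizing the pinball loss is the empirical tau-quantile.
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=3.0, size=5000)
tau = 0.9
candidates = np.linspace(y.min(), y.max(), 2001)
losses = [pinball_loss(tau, y, c) for c in candidates]
best = candidates[int(np.argmin(losses))]
# best matches np.quantile(y, tau) to within the grid resolution
```

With regressors added, the same loss is minimized over regression coefficients (typically via linear programming), which is what lets each quantile be fit independently of the others.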

16 Example: French Broad River. Before calibration vs. after calibration. Black curve shows observations; colors are the ensemble.

17 Raw versus calibrated PDFs. Blue is the "raw" FUSE ensemble; black is the calibrated ensemble; red is the observed value. Notice the shift in both the "bias" and the dispersion of the final PDF (also notice the PDF asymmetries).

18 Rank histogram comparisons: raw full ensemble vs. after calibration. After quantile regression, the rank histogram is more uniform (although now slightly over-dispersive).
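A rank histogram like the ones compared here counts where each observation falls within its sorted ensemble. A sketch on synthetic data (the variable names and toy setup are illustrative, not from the talk):

```python
import numpy as np

def rank_histogram(ens, obs):
    """For each forecast, count how many ensemble members fall below the
    observation (rank 0..n_members), then histogram those ranks.
    A well-calibrated ensemble yields a roughly flat histogram."""
    n_members = ens.shape[1]
    ranks = np.sum(ens < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(1)
ens = rng.normal(size=(4000, 9))   # 9-member synthetic ensemble
obs = rng.normal(size=4000)        # obs drawn from the same distribution
counts = rank_histogram(ens, obs)  # ~uniform: about 400 per bin over 10 bins
```

Under-dispersion shows up as a U shape (the observation falls outside the ensemble too often); slight over-dispersion, as on this slide, gives a dome shape.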

19 Calibration: rank histogram and skill-spread information (reference forecast: uncalibrated FUSE ensemble). Rank Histogram Skill Score = 0.86; Skill-Spread Skill Score = 0.75.

20 What Nash-Sutcliffe implies about utility
Frequency used for quantile fitting (Method I): best model = 76%; ensemble st. dev. = 13%; ensemble mean = 0%; ranked ensemble = 6%

21 What Nash-Sutcliffe implies about utility (cont.): degradation with increased ensemble size
Sequentially averaged FUSE models (ranked by NS score) and their resulting NS score:
- Notice the degradation of NS with increasing ensemble size (with a peak at 2 models)
- For an equitable multi-model, NS should rise monotonically
- Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)
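The degradation pattern can be reproduced in a toy experiment (all names and numbers here are illustrative assumptions, not the FUSE results): rank synthetic model traces by Nash-Sutcliffe efficiency, then average them in sequentially. Once member error magnitudes differ enough, averaging in worse members dilutes the ensemble mean even though each member is unbiased.

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of obs; 1 is a perfect fit."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(2)
obs = np.sin(np.linspace(0.0, 20.0, 500)) + 2.0  # toy "discharge" series
# 10 synthetic models with increasing error standard deviation
models = [obs + rng.normal(scale=s, size=obs.size)
          for s in np.linspace(0.2, 1.5, 10)]
ranked = sorted(models, key=lambda m: -nash_sutcliffe(m, obs))
# NS of the sequentially averaged top-k models, k = 1..10
seq_ns = [nash_sutcliffe(np.mean(ranked[: k + 1], axis=0), obs)
          for k in range(10)]
# seq_ns peaks within the first model or two, then degrades
```

That the degradation appears even with independent, unbiased errors (as here) underlines the slide's point: a declining NS curve is not by itself evidence of redundant models, only of unequal member skill.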

22 What Nash-Sutcliffe implies about utility (cont.)
Earlier results -- initial frequency used for quantile fitting: best model = 76%; ensemble st. dev. = 13%; ensemble mean = 0%; ranked ensemble = 6%
Using only the top 1/3 of models to rank and form the ensemble mean -- reduced-set frequency used for quantile fitting: best model = 73%; ensemble st. dev. = 3%; ensemble mean = 32%; ranked ensemble = 29%
- There appear to be significant gains in the utility of the ensemble after "filtering" (except for the drop in st. dev. usage) ... however, "the proof is in the pudding" ...
- Examine verification skill measures ...

23 Skill score comparisons between the full and "filtered" FUSE ensemble sets
GREEN -- full calibrated multi-model; BLUE -- "filtered" calibrated multi-model; reference -- uncalibrated FUSE set
Points:
- Quite similar results for a variety of skill scores
- Both approaches give appreciable benefit over the original raw multi-model output
- However, only in the CRPSS is there improvement of the "filtered" ensemble set over the full set
=> Post-processing method fairly robust. More work (more filtering?)!
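CRPSS, the one score that separates the two sets here, can be estimated directly from ensemble members. A hedged numpy sketch of the standard ensemble CRPS estimator and the associated skill score (the function names and synthetic data are my own, not the talk's):

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Ensemble CRPS per forecast via CRPS = E|X - y| - 0.5 E|X - X'|,
    where X, X' range over ensemble members and y is the observation."""
    term1 = np.mean(np.abs(ens - obs[:, None]), axis=1)
    term2 = 0.5 * np.mean(np.abs(ens[:, :, None] - ens[:, None, :]),
                          axis=(1, 2))
    return term1 - term2

def crpss(crps_fcst, crps_ref):
    """Skill score relative to a reference forecast: positive = improvement."""
    return 1.0 - np.mean(crps_fcst) / np.mean(crps_ref)

rng = np.random.default_rng(3)
obs = rng.normal(size=1000)
sharp = obs[:, None] + rng.normal(scale=0.5, size=(1000, 20))  # skillful
clim = rng.normal(scale=2.0, size=(1000, 20))                  # wide reference
score = crpss(crps_ensemble(sharp, obs), crps_ensemble(clim, obs))  # > 0
```

Because CRPS rewards both calibration and sharpness of the whole predictive CDF, it is a natural score for comparing post-processed ensembles like these.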

24 Calibration Procedure (cont.)
I. Method I (FUSE multi-model) regressor set for each of the 78 quantiles: 1) sorted associated ensemble member; 2) ensemble median; 3) ensemble standard deviation; 4) all 78 models
II. Method II (FUSE best member) regressor set for each of the 78 quantiles: first step - use logistic regression (LR) to calibrate the CDF over a prescribed set of climatological quantiles using the FUSE best member; for each day, resample a 78-member ensemble. Regressors: 1) best member; 2) persistence; 3) associated logistic-regression ensemble member

25 Calibration procedure -- single best model
Use QR to fit 78 quantiles individually. For each quantile:
1) Perform a "climatological" fit to the data => simulation always at least as good as "climatology"
2) Starting with the full regressor set, iteratively select the best subset using "forward step-wise cross-validation" (fitting done using QR; selection done by minimizing the QR cost function and satisfying the binomial distribution) => verification measures directly inform the model selection
3) 2nd pass: segregate forecasts into differing ranges of ensemble dispersion and refit the models => forcing skill-spread utility
Regressor set for each quantile: first step - use logistic regression to calibrate the CDF over a prescribed set of climatological quantiles using the FUSE best member; for each day, resample a 78-member ensemble. Regressors: 1) best member; 2) persistence; 3) associated logistic-regression ensemble member
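The logistic-regression step can be sketched as follows: for each climatological quantile q, fit P(obs <= q) as a logistic function of the best-member forecast, yielding a calibrated CDF to resample from. A minimal Newton-Raphson sketch on synthetic data (the toy data and variable names are assumptions, not the talk's setup):

```python
import numpy as np

def fit_logistic(x, y, iters=25):
    """Two-parameter logistic regression (intercept + slope), fit by
    Newton-Raphson on the log-likelihood."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)                                  # score vector
        H = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(2)
        w = w + np.linalg.solve(H, grad)                      # Newton step
    return w

rng = np.random.default_rng(4)
fcst = rng.gamma(2.0, 3.0, size=3000)               # toy best-member forecast
obs = fcst + rng.normal(scale=2.0, size=fcst.size)  # toy observations
q = np.quantile(obs, 0.5)                           # one climatological quantile
w = fit_logistic(fcst, (obs <= q).astype(float))
prob_below_q = lambda f: 1.0 / (1.0 + np.exp(-(w[0] + w[1] * f)))
# small forecasts => high P(obs <= q); large forecasts => low, so w[1] < 0
```

Repeating the fit over the whole prescribed set of climatological quantiles gives a calibrated CDF for each day, from which the 78-member ensemble can then be resampled.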

26 Regression variable usage for the best-member method: best model 34%; persistence 66%; logistic-regression quantile 86%

27 Example: French Broad River. I. Multi-model method vs. II. Best-member approach. Black curve shows observations; colors are the ensemble.

28 Skill score comparisons between the multi-model and single-best-model calibrated ensembles
GREEN -- multi-model; BLUE -- best model; reference -- uncalibrated FUSE set
Points:
- Quite similar results for a variety of skill scores
- Both approaches give appreciable benefit over the original raw multi-model output
- CRPSS shows an improvement of the single model over the full set
=> Post-processing method fairly robust

29 Two questions revisited:
1) How best to utilize multi-model simulations (forecasts), especially if under-dispersive? a) Should we search for more dynamical variability? Or b) is it better to balance post-processing with multi-model use to create a properly dispersive, informative ensemble?
"Answer": adding more models can lead to decreasing skill of the ensemble mean (even if the ensemble is under-dispersive)
2) How significant are the gains of a multi-model approach over a single best model with a few "tricks"? a) Blending two post-processing approaches (quantile regression and logistic regression); b) applying "poor-man's" data assimilation
"Answer": quantile-regression-based calibration is fairly robust and can do a lot with just a single model, especially if a variety of approaches are utilized => benefit of moving beyond model-ensemble calibration alone to include the "kitchen sink", including "blending" of different techniques

30 Summary
- Quantile regression provides a powerful framework for improving the whole (potentially non-Gaussian) PDF of an ensemble forecast
- This framework provides an umbrella for blending together multiple statistical correction approaches (logistic regression, etc.)
- The "step-wise cross-validation" calibration approach used here ensures forecast skill greater than climatology and persistence (by utilizing both in the regression selection process)
- The approach improves a forecast's ability to represent its own potential forecast error: a more uniform rank histogram, and guaranteed utility in the ensemble dispersion (more spread => more uncertainty)

