
Slide 1: An introduction to verifying probability forecasts
James Brown (James.D.Brown@noaa.gov)
RFC Verification Workshop

Slide 2: Goals for today
1. Introduction to methods
   - What methods are available?
   - How do they reveal (or not reveal) particular errors?
   - Lecture now, and hands-on training later
2. Introduction to prototype software
   - Ensemble Verification System (EVS)
   - Part of a larger experimental project (XEFS)
   - Lecture now, and hands-on training later

Slide 3: Goals for today (continued)
3. To establish user requirements
   - EVS is at a very early (prototype) stage
   - The pool of methods may expand or contract
   - Need some input on verification products
   - AND to address the pre-workshop questions...

Slide 4: Pre-workshop questions
- How is ensemble verification done?
- Is it the same for short- and long-term ensembles?
- What tools exist, and are they operational?
- Which metrics for which situations?
- Simple metrics for end users?
- How best to manage the workload?
- What data need to be archived, and how?

Slide 5: Contents for the next hour
1. Background and status
2. Overview of EVS
3. Metrics available in EVS
4. First look at the user interface (GUI)

Slide 6: 1. Background and status

Slide 7: A verification strategy?
A first look at operational needs: two classes of verification identified.
1. High time sensitivity ('prognostic')
   - e.g. how reliable is my live flood forecast?...
   - ...where should I hedge my bets?
2. Less time sensitive ('diagnostic')
   - e.g. which forecasts do less well, and why?

Slide 8: Prognostic example
[Figure: temperature (°C) versus forecast lead day, showing a live forecast (L), matching historical forecasts (H) selected where μ_H = μ_L ± 1.0 °C, and the corresponding historical observations.]

Slide 9: Diagnostic example
[Figure: probability of warning correctly (hits) plotted against probability of warning incorrectly ('false alarms'), both on axes from 0 to 1.0; points for the ensemble (e.g. flood warning when P >= 0.9), climatology, and a single-valued forecast.]

Slide 10: Motivation for EVS (and XEFS)
- Demand: forecasters and their customers
  - Demand for usable verification products
  - ...and limitations of existing software
- History
  - Ensemble Verification Program (EVP): comprised (too) many parts, lacked flexibility
  - Prototype EVS begun in May 2007 for XEFS...

Slide 11: Position in XEFS
[Diagram: the Ensemble Verification Subsystem (EVS) within the XEFS architecture, alongside the Ensemble Pre-Processor (EPP3) and its user interface, the Ensemble Streamflow Prediction Subsystem (ESP2), Ensemble Post-Processor (EnsPost), Ensemble Product Generation Subsystem (EPG), Hydrologic Ensemble Hindcaster, HMOS Ensemble Processor, IFP/Ensemble Viewer, and OFS. Data flows include atmospheric forcing data, raw and post-processed flow ensembles, hydrometeorological ensembles (precipitation, temperature, streamflow), MODs, ensemble/probability products, and ensemble verification products.]

Slide 12: 2. Overview of EVS

Slide 13: Scope of EVS
Diagnostic verification:
- EVS is for diagnostic purposes (less time-sensitive); prognostic verification is built into the forecasting systems
- Diagnostic questions include:
  - Are ensembles reliable? Prob[flood] = 0.9: does it occur 9 out of 10 times?
  - Are forecaster MODs working well?
  - What are the major sources of uncertainty?

Slide 14: Design goals of EVS
- Verification of continuous time series
  - Temperature, precipitation, streamflow, etc.
  - More than one forecast point, but not spatial products
- All types of forecast times
  - Any lead time (e.g. 1 day to 2 years or longer)
  - Any forecast resolution (e.g. hourly, daily)
- Pair forecasts and observations (possibly in different time zones), as sketched below
- Ability to aggregate across forecast points
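
The pairing step mentioned above can be illustrated with a minimal sketch (this is not EVS code; the data layout and function names are illustrative). The idea is simply to convert all valid times to a common time zone before matching forecasts with observations:

```python
# Minimal sketch (not EVS code): pairing forecasts with observations by valid time.
# Both records are converted to UTC so that series archived in different time
# zones line up correctly. Names and data are illustrative only.
from datetime import datetime
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")

def to_utc(local_time: datetime, tz_name: str) -> datetime:
    """Attach the source time zone and convert to UTC."""
    return local_time.replace(tzinfo=ZoneInfo(tz_name)).astimezone(UTC)

# Forecast valid times archived in US/Eastern; observations archived in UTC.
forecasts = {to_utc(datetime(2007, 5, 1, 12), "US/Eastern"): [10.2, 11.0, 9.8],    # ensemble members
             to_utc(datetime(2007, 5, 2, 12), "US/Eastern"): [12.1, 12.5, 11.7]}
observations = {datetime(2007, 5, 1, 16, tzinfo=UTC): 10.5,
                datetime(2007, 5, 2, 16, tzinfo=UTC): 12.0}

# Keep only the valid times present in both records.
pairs = [(t, forecasts[t], observations[t]) for t in sorted(forecasts) if t in observations]
for valid_time, members, obs in pairs:
    print(valid_time.isoformat(), members, obs)
```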

Slide 15: Design goals of EVS (continued)
- Flexibility to target data of interest
  - Subset based on forecasts and observations
  - Two conditions: 1) time; 2) variable value
  - e.g. forecasts where the ensemble mean < 0 °C
  - e.g. maximum observed flow in a 90-day window
  - (a conditional-subsetting sketch follows this slide)
- Ability to pool/aggregate forecast points
  - The number of observations can be limiting
  - Sometimes it is appropriate to pool points
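
As a hedged illustration of the value-based condition above (not EVS code; the pair structure is an assumption), subsetting just means keeping the forecast/observation pairs that satisfy the condition before computing any metric:

```python
# Minimal sketch of conditional subsetting: keep only the pairs whose
# ensemble mean is below 0 °C. Each pair is (valid_time, ensemble, observation).
from statistics import mean

pairs = [
    ("2007-01-05", [-2.1, -1.4, -0.8], -1.6),
    ("2007-01-06", [ 1.2,  2.0,  0.4],  1.1),
    ("2007-01-07", [-0.5, -1.1,  0.2], -0.3),
]

subzero_pairs = [(t, ens, obs) for t, ens, obs in pairs if mean(ens) < 0.0]
print(subzero_pairs)   # retains the 2007-01-05 and 2007-01-07 pairs
```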

Slide 16: Design goals of EVS (continued)
- Carefully selected metrics
  - Different levels of detail on errors
  - Some are more complex than others, but...
  - Use cases and online documentation to assist
- To be 'user-friendly'
  - Many factors determine this...
  - GUI, I/O, execution speed, batch modes

Slide 17: Example of workflow
Question: how biased are my winter flows above flood level at dam A?
1. (Re)define the verification unit
2. Set the condition: flow > flood level && winter
3. Choose metrics: box plot and Talagrand diagram
4. Run (data are read, paired, and plots produced)
5. View and save plots

Slide 18: Archiving requirements
Coordinated across XEFS:
- The forecasts
  - Streamflow: ESP binary files (.CS)
  - Temperature and precipitation: OHD datacard files
- The observations
  - OHD datacard files
  - Unlikely to be a database in the near future

Slide 19: 3. Metrics available

Slide 20: Types of metrics
Many ways to test a probability forecast:
1. Tests of a single-valued property (e.g. the ensemble mean)
2. Tests of the broader forecast distribution
Both may involve reference forecasts ("skill"); a skill-score sketch follows.
Caveats in testing probabilities:
- Observed probabilities require many events
- Big assumption 1: we can 'pool' events
- Big assumption 2: observations are 'good'
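
The slide only names "skill" in passing; the sketch below assumes the standard skill-score form (one minus the ratio of the forecast's score to a reference forecast's score), which is not quoted on the slide:

```python
# Minimal sketch (assumed standard definition, not quoted from the slide).
# For a negatively oriented score where 0 is perfect, skill relative to a
# reference forecast (e.g. climatology) is 1 - score / score_reference.
def skill_score(score_forecast: float, score_reference: float) -> float:
    """1 = perfect, 0 = no better than the reference, < 0 = worse."""
    return 1.0 - score_forecast / score_reference

# e.g. a Brier score of 0.05 for the forecasts versus 0.20 for climatology
print(skill_score(0.05, 0.20))   # 0.75: the forecasts remove 75% of the reference error
```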

Slide 21: The problem of continuous forecasts
- Discrete/categorical forecasts
  - Many metrics rely on discrete forecasts
  - e.g. will it rain? {yes/no} (rain > 0.01)
  - e.g. will it flood? {yes/no} (stage > flood level)
- What about continuous forecasts?
  - An infinite number of possible events
  - Arbitrary event thresholds (i.e. 'bins')?
  - Typically, yes (and the choice will affect results); see the sketch below
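
A minimal sketch of the thresholding idea above (not EVS code; the flood stage and ensemble values are illustrative): a continuous ensemble becomes a discrete event probability by counting members beyond the threshold.

```python
# Turn a continuous ensemble forecast into a discrete event probability
# by applying a threshold, e.g. "stage > flood level".
def event_probability(ensemble: list[float], threshold: float) -> float:
    """Fraction of ensemble members exceeding the threshold."""
    return sum(1 for member in ensemble if member > threshold) / len(ensemble)

flood_stage = 20.0                        # illustrative threshold
members = [18.2, 21.5, 19.9, 22.3, 20.7]  # ensemble of forecast river stages
print(event_probability(members, flood_stage))  # 0.6: 3 of 5 members exceed flood stage
```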

Slide 22: Metrics in EVS
Detail varies with the verification question:
- e.g. inspection of 'blown' forecasts (most detailed)
- e.g. average reliability of flood forecasts (less detail)
- e.g. rapid screening of forecasts (least detail)
All are included to some degree in EVS...

Slide 23: Most detailed (box plot)
[Figure: box plots of ensemble forecast errors against time (0-20 days since start time). Each box summarizes the 'errors' for one forecast relative to the observation, from the greatest negative to the greatest positive error, with marks at the 10th, 20th, 50th, 80th and 90th percentiles.]

Slide 24: Most detailed (box plot)
[Figure: the same box plots of ensemble forecast errors, but plotted against the observed value (in increasing size) rather than against time.]
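
A minimal sketch of the quantities behind one of these boxes (not EVS code; the ensemble and observation are illustrative): each member's error relative to the verifying observation, summarized at the percentiles shown on the slides.

```python
import numpy as np

members = np.array([12.0, 14.5, 13.2, 15.8, 11.1, 16.4, 13.9, 14.8, 12.6, 15.1])
observation = 13.5

errors = members - observation                      # 'errors' for one forecast
box = {p: np.percentile(errors, p) for p in (10, 20, 50, 80, 90)}

print("greatest -ve error:", errors.min())
print("percentiles:", box)
print("greatest +ve error:", errors.max())
```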

Slide 25: Less detail (reliability)
[Figure: reliability diagram plotting observed probability given the forecast against forecast probability (probability of flooding), both from 0 to 1.0; departure from the diagonal indicates "forecast bias".]
"On occasions when flooding is forecast with probability 0.5, it should occur 50% of the time."
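
A minimal sketch of the data behind a reliability diagram (not EVS code; the probabilities, outcomes, and bin edges are illustrative): forecast probabilities are binned, and within each bin the mean forecast probability is compared with the observed relative frequency of the event.

```python
import numpy as np

forecast_prob = np.array([0.05, 0.15, 0.12, 0.45, 0.55, 0.48, 0.85, 0.92, 0.88, 0.51])
observed      = np.array([0,    0,    1,    0,    1,    1,    1,    1,    1,    0   ])  # 1 = flood occurred

bins = np.linspace(0.0, 1.0, 6)                  # five probability bins
which_bin = np.digitize(forecast_prob, bins) - 1

for b in range(len(bins) - 1):
    in_bin = which_bin == b
    if in_bin.any():
        mean_fcst = forecast_prob[in_bin].mean()  # x-axis: forecast probability
        obs_freq = observed[in_bin].mean()        # y-axis: observed frequency
        print(f"bin {bins[b]:.1f}-{bins[b+1]:.1f}: forecast={mean_fcst:.2f}, observed={obs_freq:.2f}")
```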

Slide 26: Less detail (cumulative Talagrand)
[Figure: cumulative probability plotted against the position of the observation in the forecast distribution (0 to 100); departure from the diagonal indicates "forecast bias".]
"If river stage <= X is forecast with probability 0.5, it should be observed 50% of the time."
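
The ingredient of this diagram is the position of each observation within its forecast distribution. A minimal sketch (not EVS code; ensembles and observations are illustrative) follows; plotting the empirical CDF of these positions gives the cumulative Talagrand diagram.

```python
import numpy as np

def observation_position(ensemble, observation) -> float:
    """Fraction of ensemble members at or below the observation (0 to 1)."""
    members = np.sort(np.asarray(ensemble, dtype=float))
    return np.searchsorted(members, observation, side="right") / members.size

# One position per forecast/observation pair.
ensemble = [9.8, 11.2, 10.5, 12.0, 10.9]
positions = [observation_position(ensemble, obs) for obs in (10.0, 11.5, 12.5)]
print(positions)   # [0.2, 0.8, 1.0]
```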

Slide 27: Least detailed (a score)
[Figure: river stage versus time (days) for five forecasts (1-5) and the observation, with the flood stage marked.]
Brier score = 1/5 x { (0.8 - 1.0)^2 + (0.1 - 1.0)^2 + (0.0 - 0.0)^2 + (0.95 - 1.0)^2 + (1.0 - 1.0)^2 }
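
A minimal sketch reproducing the slide's arithmetic: the Brier score is the mean squared difference between the forecast probability of the event (here, flooding) and the observed outcome (1 if it occurred, 0 if not).

```python
forecast_prob = [0.8, 0.1, 0.0, 0.95, 1.0]   # P(stage > flood stage) from each of the 5 forecasts
observed      = [1.0, 1.0, 0.0, 1.0,  1.0]   # whether flooding was observed

brier = sum((p - o) ** 2 for p, o in zip(forecast_prob, observed)) / len(forecast_prob)
print(round(brier, 4))   # 0.1705; 0 is perfect, smaller is better
```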

Slide 28: Least detailed (a score)
[Figure: cumulative probability (0 to 1.0) versus precipitation amount, showing a single forecast CDF and the observation as a step function, with the areas between them labeled A and B.]
CRPS = A^2 + B^2
Then average across multiple forecasts: small scores are better.
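
The slide illustrates the CRPS graphically via the regions A and B; the sketch below (not EVS code) uses the standard definition that the picture represents, the integral of the squared difference between the forecast CDF and the observation's step function, approximated numerically for an empirical ensemble CDF.

```python
import numpy as np

def crps(ensemble, observation, num_points: int = 2000) -> float:
    """Approximate CRPS of an ensemble forecast against one observation."""
    members = np.sort(np.asarray(ensemble, dtype=float))
    lo = min(members[0], observation) - 1.0
    hi = max(members[-1], observation) + 1.0
    x = np.linspace(lo, hi, num_points)
    forecast_cdf = np.searchsorted(members, x, side="right") / members.size
    observed_step = (x >= observation).astype(float)
    return float(np.trapz((forecast_cdf - observed_step) ** 2, x))

ensemble = [0.0, 2.5, 1.2, 3.8, 0.6]                # precipitation amounts from 5 members
print(round(crps(ensemble, observation=1.5), 3))    # smaller is better; 0 is perfect
```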

Slide 29: 4. First look at the GUI

Slide 30: Rest of today
- Two-hour lab sessions with EVS
  - Start with synthetic data (with simple errors)
  - Then move on to a couple of real cases
- Verification plans and feedback
  - Real-time ('prognostic') verification
  - Screening verification outputs
  - Developments in EVS
  - Feedback: discussion and survey

