Verification of Probability Forecasts at Points
WMO QPF Verification Workshop, Prague, Czech Republic, 14-16 May 2001
Barbara G. Brown, NCAR, Boulder, Colorado, U.S.A.

Why probability forecasts?
"…the widespread practice of ignoring uncertainty when formulating and communicating forecasts represents an extreme form of inconsistency and generally results in the largest possible reductions in quality and value." --Murphy (1993)

Outline
1. Background and basics
   – Types of events
   – Types of forecasts
   – Representation of probabilistic forecasts in the verification framework

Outline continued
2. Verification approaches: focus on the 2-category case
   – Measures
   – Graphical representations
   – Using statistical models
   – Signal detection theory
   – Ensemble forecast verification
   – Extensions to the multi-category verification problem
   – Comparing probabilistic and categorical forecasts
3. Connections to value
4. Summary, conclusions, issues

Background and basics
Types of events:
– Two-category
– Multi-category
Two-category events:
– Either Event A happens or Event B happens
– Examples: rain/no rain, hail/no hail, tornado/no tornado
Multi-category events:
– Event A, B, C, …, or Z happens
– Example: precipitation categories (< 1 mm, 1-5 mm, 5-10 mm, etc.)

Background and basics cont.
Types of forecasts:
– Completely confident
   - Forecast probability is either 0 or 1
   - Example: rain/no rain
– Probabilistic
   - Objective (deterministic, statistical, ensemble-based) or subjective
   - Probability is stated explicitly

Background and basics cont.
Representation of probabilistic forecasts in the verification framework:
– x = 0 or 1
– f = 0, …, 1.0 (f may be limited to only certain values between 0 and 1)
– Joint distribution: p(f,x), where x = 0, 1
– Example: if there are 12 possible values of f, then p(f,x) comprises 24 elements

Background and basics, cont.
Factorizations: conditional and marginal probabilities
– Calibration-Refinement factorization: p(f,x) = p(x|f) p(f)
   - p(x=0|f) = 1 – p(x=1|f) = 1 – E(x|f), so only one number is needed to specify the distribution p(x|f) for each f
   - p(f) is the frequency of use of each forecast probability
– Likelihood-Base Rate factorization: p(f,x) = p(f|x) p(x)
   - p(x) is the relative frequency of a Yes observation (e.g., the sample climatology of precipitation); p(x) = E(x)
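To make the two factorizations concrete, here is a minimal sketch (the data and variable names are hypothetical) that estimates p(f), p(x=1|f), and the base rate p(x) from a small sample of forecast/observation pairs:

```python
import numpy as np

# Hypothetical sample of forecast probabilities f and binary observations x.
f = np.array([0.0, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 1.0])
x = np.array([0,   0,   1,   0,   1,   0,   1,   1])

# Calibration-Refinement factorization: p(f,x) = p(x|f) p(f)
for fi in np.unique(f):
    mask = f == fi
    p_f = mask.mean()          # marginal p(f): frequency of use of this value
    p_x1_f = x[mask].mean()    # conditional p(x=1|f) = E(x|f)
    print(f"f={fi:.1f}  p(f)={p_f:.3f}  p(x=1|f)={p_x1_f:.3f}")

# Likelihood-Base Rate factorization: p(f,x) = p(f|x) p(x)
print("base rate p(x=1) =", x.mean())   # sample climatology, E(x)
```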

Attributes [from Murphy and Winkler (1992)]
[Slide shows a table of forecast attributes, including sharpness; table not reproduced.]

Verification approaches: 2x2 case
Completely confident forecasts: use the counts in the 2x2 contingency table to compute various common statistics (e.g., POD, POFD, H-K, FAR, CSI, Bias, etc.).

Verification measures for 2x2 (Yes/No) completely confident forecasts
[Table of measures not reproduced.]
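As a sketch of how these measures follow from the table, the function below computes several of them from the four cell counts, using the YY/YN/NY/NN notation that appears later in these slides (the example counts are Finley's classic tornado data, used purely for illustration):

```python
def measures_2x2(YY, YN, NY, NN):
    """Common measures from a 2x2 contingency table.
    YY: forecast Yes, observed Yes (hits);   YN: forecast Yes, observed No;
    NY: forecast No, observed Yes (misses);  NN: forecast No, observed No."""
    POD  = YY / (YY + NY)          # probability of detection (hit rate)
    POFD = YN / (YN + NN)          # probability of false detection
    FAR  = YN / (YY + YN)          # false alarm ratio
    CSI  = YY / (YY + YN + NY)     # critical success index (threat score)
    Bias = (YY + YN) / (YY + NY)   # frequency bias
    HK   = POD - POFD              # Hanssen-Kuipers (H-K) discriminant
    return dict(POD=POD, POFD=POFD, FAR=FAR, CSI=CSI, Bias=Bias, HK=HK)

print(measures_2x2(YY=28, YN=72, NY=23, NN=2680))
```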

Relationships among measures in the 2x2 case
Many of the measures in the 2x2 case are strongly related in surprisingly complex ways. For example:

[Figure: the lines indicate different values of POD and POFD (where POD = POFD). From Brown and Young (2000).]

CSI as a function of p(x=1) and POD=POFD

CSI as a function of FAR and POD

Measures for Probabilistic Forecasts
Summary measures:
– Expectation
   - Conditional: E(f|x=0), E(f|x=1), E(x|f)
   - Marginal: E(f), E(x) = p(x=1)
– Correlation (joint distribution)
– Variability
   - Conditional: Var(f|x=0), Var(f|x=1), Var(x|f)
   - Marginal: Var(f), Var(x) = E(x)[1 – E(x)]

Summary measures for joint and marginal distributions (from Murphy and Winkler 1992)

Summary measures for conditional distributions (from Murphy and Winkler 1992)

Performance measures
Brier score:
   BS = (1/n) Σ_k (f_k – x_k)²
– Analogous to MSE; negative orientation
– For perfect forecasts: BS = 0
Brier skill score:
   BSS = 1 – BS/BS_climatology
– Analogous to the MSE skill score
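A minimal sketch of both scores, taking the sample climatology as the reference forecast (the function names are my own):

```python
import numpy as np

def brier_score(f, x):
    f, x = np.asarray(f, float), np.asarray(x, float)
    return np.mean((f - x) ** 2)

def brier_skill_score(f, x):
    x = np.asarray(x, float)
    bs_ref = brier_score(np.full(len(x), x.mean()), x)  # climatological reference
    return 1.0 - brier_score(f, x) / bs_ref

print(brier_score([0.8, 0.2, 0.6], [1, 0, 1]))
print(brier_skill_score([0.8, 0.2, 0.6], [1, 0, 1]))
```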

[Figure from Murphy and Winkler (1992).]

Brier score displays (from Shirey and Erickson)

Brier score displays (cont.)

Decomposition of the Brier Score
Break the Brier score into more elemental components:
   BS = Reliability – Resolution + Uncertainty
   Reliability = (1/n) Σ_{i=1..I} N_i [f_i – p(x=1|f_i)]²
   Resolution = (1/n) Σ_{i=1..I} N_i [p(x=1|f_i) – p(x=1)]²
   Uncertainty = p(x=1) [1 – p(x=1)]
where I is the number of distinct probability values and N_i is the number of forecasts of f_i. Then the Brier Skill Score can be re-formulated as
   BSS = (Resolution – Reliability) / Uncertainty
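The decomposition can be computed directly by grouping the sample on the I distinct forecast values; a sketch (assuming the forecasts really do take only a small number of distinct values):

```python
import numpy as np

def brier_decomposition(f, x):
    f, x = np.asarray(f, float), np.asarray(x, float)
    n, xbar = len(x), x.mean()
    rel = res = 0.0
    for fi in np.unique(f):                        # loop over the I distinct values
        mask = f == fi
        Ni, xbar_i = mask.sum(), x[mask].mean()    # N_i and p(x=1|f_i)
        rel += Ni * (fi - xbar_i) ** 2
        res += Ni * (xbar_i - xbar) ** 2
    unc = xbar * (1.0 - xbar)
    return rel / n, res / n, unc                   # BS = REL - RES + UNC

rel, res, unc = brier_decomposition([0.1, 0.1, 0.9, 0.9], [0, 1, 1, 1])
print("BSS =", (res - rel) / unc)
```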

Graphical representations of measures
– Reliability diagram: p(x=1|f_i) vs. f_i
– Sharpness diagram: p(f)
– Attributes diagram: reliability, resolution, skill/no-skill
– Discrimination diagram: p(f|x=0) and p(f|x=1)
Together, these diagrams provide a relatively complete picture of the quality of a set of probability forecasts.
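As an illustration, a minimal matplotlib sketch of the first two diagrams (reliability and sharpness), reusing the same grouping on distinct forecast values as in the decomposition above:

```python
import numpy as np
import matplotlib.pyplot as plt

def reliability_and_sharpness(f, x):
    f, x = np.asarray(f, float), np.asarray(x, float)
    vals = np.unique(f)
    obs_freq = np.array([x[f == v].mean() for v in vals])  # p(x=1|f_i)
    use_freq = np.array([(f == v).mean() for v in vals])   # p(f_i)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.plot([0, 1], [0, 1], "k--", label="perfect reliability")
    ax1.plot(vals, obs_freq, "o-", label="forecasts")
    ax1.set_xlabel("forecast probability f")
    ax1.set_ylabel("observed relative frequency p(x=1|f)")
    ax1.legend()
    ax2.bar(vals, use_freq, width=0.05)                    # sharpness diagram
    ax2.set_xlabel("forecast probability f")
    ax2.set_ylabel("frequency of use p(f)")
    plt.tight_layout()
    plt.show()
```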

Reliability and Sharpness (from Wilks 1995)
[Panels: climatology; minimal resolution; underforecasting; good resolution at the expense of reliability; reliable forecasts of a rare event; small sample size.]

Reliability and Sharpness (from Murphy and Winkler 1992)
[St. Louis PoP forecasts, cool season; "no skill" and "no resolution" reference lines; model and subjective forecasts shown.]

Attributes diagram (from Wilks 1995)

Icing forecast examples

Use of statistical models to describe verification features
– Exploratory study by Murphy and Wilks (1998)
– Case study:
   - Use a regression model to model reliability
   - Use a Beta distribution to model p(f) as a measure of sharpness
   - Use a multivariate diagram to display combinations of characteristics
– A promising approach that is worthy of more investigation

Fit a Beta distribution to p(f)
– Two parameters: p, q; support on [0, 1]
– Ideal: p < 1 and q < 1 (a U-shaped density, i.e., sharp forecasts concentrated near 0 and 1)
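A sketch of the Beta fit using scipy (the synthetic forecast sample is hypothetical; fixing loc and scale confines the fit to the unit interval):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
f = rng.beta(0.6, 0.8, size=500)       # synthetic "sharp" forecast probabilities
f = np.clip(f, 1e-6, 1 - 1e-6)         # keep strictly inside (0, 1) for the fit

p_hat, q_hat, _, _ = beta.fit(f, floc=0, fscale=1)   # fix support to [0, 1]
print(f"p = {p_hat:.2f}, q = {q_hat:.2f}",
      "(sharp: U-shaped)" if (p_hat < 1 and q_hat < 1) else "(not sharp)")
```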

Fit a regression to the reliability diagram [p(x|f) vs. f]
– Two parameters: b_0, b_1
Murphy and Wilks (1998)

Summary plot (Murphy and Wilks 1998)

Signal Detection Theory (SDT)
– An approach that has commonly been applied in medicine and other fields; brought to meteorology by Ian Mason (1982)
– Evaluates the ability of forecasts to discriminate between occurrence and non-occurrence of an event
– Summarizes characteristics of the Likelihood-Base Rate decomposition of the framework
– Tests model performance relative to specific thresholds
– Ignores calibration
– Allows comparison of categorical and probabilistic forecasts

Mechanics of SDT
Based on the Likelihood-Base Rate decomposition p(f,x) = p(f|x) p(x)
Basic elements:
– Hit rate (HR): HR = POD = YY / (YY + NY); estimate of p(f=1|x=1)
– False alarm rate (FA): FA = POFD = YN / (YN + NN); estimate of p(f=1|x=0)
Relative Operating Characteristic (ROC) curve:
– Plot HR vs. FA
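For probabilistic forecasts, each candidate decision threshold on f yields one (FA, HR) pair; a sketch:

```python
import numpy as np

def roc_points(f, x, thresholds):
    """One (FA, HR) point per decision threshold on the forecast probability."""
    f, x = np.asarray(f, float), np.asarray(x, int)
    pts = []
    for t in thresholds:
        yes = f >= t                                   # forecast "Yes" at this threshold
        HR = (yes & (x == 1)).sum() / (x == 1).sum()   # estimate of p(f=1|x=1)
        FA = (yes & (x == 0)).sum() / (x == 0).sum()   # estimate of p(f=1|x=0)
        pts.append((FA, HR))
    return pts   # plotting HR vs. FA traces the ROC
```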

ROC examples: Mason (1982)

ROC examples: icing forecasts

ROC
– The area under the ROC is a measure of forecast skill; values less than 0.5 indicate negative skill
– Measurement of the ROC area often is better if a normal distribution model is used to model HR and FA; the area can be underestimated if the curve is approximated by straight line segments [Harvey et al. (1992); Mason (1982); Wilson (2000)]
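A trapezoidal-rule sketch of the area, subject to exactly the underestimation caveat noted above (a binormal fit to HR and FA is often preferable):

```python
import numpy as np

def roc_area(points):
    """Area under an empirical ROC by straight-line (trapezoid) segments."""
    FA, HR = zip(*sorted(points))
    FA = np.concatenate(([0.0], FA, [1.0]))   # anchor the curve at (0,0) and (1,1)
    HR = np.concatenate(([0.0], HR, [1.0]))
    return np.trapz(HR, FA)
```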

Idealized ROC (Mason 1982)
[Curves for S = 2, S = 1, and S = 0.5, where S characterizes the relative spread of the conditional distributions f(x=1) and f(x=0); the defining equation is not reproduced.]

Comparison of Approaches
Brier score:
– Based on squared error
– Strictly proper scoring rule
– Calibration is an important factor; lack of calibration impacts scores
– Decompositions provide insight into several performance attributes
– Dependent on the frequency of occurrence of the event
ROC:
– Considers forecasts' ability to discriminate between Yes and No events
– Calibration is not a factor
– Less dependent on the frequency of occurrence of the event
– Provides verification information for individual decision thresholds

Relative operating levels (ROL)
Analogous to the ROC, but from the Calibration-Refinement perspective (i.e., given the forecast). The curves are based on two statistics:
– Correct Alarm Ratio: YY / (YY + YN), an estimate of p(x=1|f=1)
– Miss Ratio: NY / (NY + NN), an estimate of p(x=1|f=0)
For a system with no skill, p(x=1|f=1) = p(x=1|f=0) = p(x).
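A sketch of the ROL curve points, mirroring roc_points above but conditioning on the forecast instead of the observation:

```python
import numpy as np

def rol_points(f, x, thresholds):
    f, x = np.asarray(f, float), np.asarray(x, int)
    pts = []
    for t in thresholds:
        yes = f >= t
        car  = x[yes].mean() if yes.any() else np.nan       # p(x=1|f=1): YY/(YY+YN)
        miss = x[~yes].mean() if (~yes).any() else np.nan   # p(x=1|f=0): NY/(NY+NN)
        pts.append((miss, car))
    return pts   # no skill: correct alarm ratio = miss ratio = p(x)
```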

ROC Diagram (Mason and Graham 1999)

ROL Diagram (Mason and Graham 1999)

Verification of ensemble forecasts
The output of ensemble forecasting systems can be treated as:
– A probability distribution
– A probability
– A categorical forecast
Probabilistic forecasts from ensemble systems can be verified using standard approaches for probabilistic forecasts. Common methods:
– Brier score
– ROC

Example: Palmer et al. (2000), reliability
[Reliability diagrams for the ECMWF ensemble and a multi-model ensemble.]

Example: Palmer et al. (2000), ROC
[ROC curves for the ECMWF ensemble and a multi-model ensemble.]

Verification of ensemble forecasts (cont.)
A number of methods have been developed specifically for use with ensemble forecasts. For example:
– Rank histograms
   - Rank the position of the observation relative to the ensemble members
   - Ideal: a uniform distribution of ranks
   - Non-ideal histograms can occur for many reasons (Hamill 2001)
– Ensemble distribution approach (Wilson et al. 1999)
   - Fit a distribution to the ensemble
   - Determine the probability associated with the observation
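A sketch of the rank-histogram computation (random tie-breaking is one common convention when members equal the observation exactly; the function name is my own):

```python
import numpy as np

def rank_histogram(ens, obs, rng=None):
    """ens: (n_cases, n_members); obs: (n_cases,). Returns counts of the
    observation's rank among the members; ideally roughly uniform."""
    rng = rng or np.random.default_rng(0)
    ens, obs = np.asarray(ens, float), np.asarray(obs, float)
    below = (ens < obs[:, None]).sum(axis=1)
    ties = (ens == obs[:, None]).sum(axis=1)
    ranks = below + rng.integers(0, ties + 1)        # break ties at random
    return np.bincount(ranks, minlength=ens.shape[1] + 1)
```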

Rank histograms (examples)

Distribution approach (Wilson et al. 1999)

Extensions to multiple categories
Example: QPF with several thresholds/categories
Approach 1: Evaluate each category on its own
– Compute the Brier score, reliability, ROC, etc. for each category separately
– Problems:
   - Some categories will be very rare and have few Yes observations
   - Throws away important information related to the ordering of the predictands and the magnitude of the error

Example: Brier skill scores for several categories

Extensions to multiple categories (cont.)
Approach 2: Evaluate all categories simultaneously
– Ranked Probability Score (RPS), analogous to the Brier score for multiple (ordered) categories; one common formulation:
   RPS = Σ_{m=1..J} (F_m – O_m)², where F_m and O_m are the cumulative forecast and observation probabilities through category m
– Skill score: RPSS = 1 – RPS/RPS_climatology
– Decompositions analogous to those for BS, BSS
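A sketch of the RPS for a single forecast over J ordered categories, following the unweighted-sum convention above (obs_cat is the 0-based index of the observed category):

```python
import numpy as np

def rps(probs, obs_cat):
    F = np.cumsum(probs)      # cumulative forecast probabilities F_m
    O = np.zeros_like(F)
    O[obs_cat:] = 1.0         # cumulative observation O_m steps to 1 at the event
    return np.sum((F - O) ** 2)

print(rps([0.2, 0.5, 0.3], obs_cat=1))   # event observed in the middle category
```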

Multiple categories: Examples of alternative approaches
– Continuous ranked probability score (Bouttier 1994; Brown 1974; Matheson and Winkler 1976; Unger 1985) and its decompositions (Hersbach 2000)
   - Analogous to the RPS with an infinite number of classes
   - Decomposes into reliability and resolution/uncertainty components
– Multi-category reliability diagrams (Hamill 1997)
   - Measure calibration in a cumulative sense
   - Reduce the impact of categories with few forecasts
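For an ensemble, the CRPS can be estimated from the well-known identity CRPS = E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from the forecast distribution; a sketch:

```python
import numpy as np

def crps_ensemble(members, y):
    m = np.asarray(members, float)
    term1 = np.mean(np.abs(m - y))                          # E|X - y|
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))  # 0.5 E|X - X'|
    return term1 - term2

print(crps_ensemble([1.2, 0.8, 1.5, 0.9, 1.1], y=1.0))
```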

Continuous RPS example (Hersbach 2000)

MCRD example (Hamill 1997)

Connections to value
Cost-Loss ratio model: it is optimal to protect whenever the expected loss exceeds the cost of protection, i.e., whenever p > C/L, where p is the probability of adverse weather, C is the cost of protecting, and L is the loss incurred if adverse weather occurs without protection.
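A sketch of the decision rule: the expected expense is pL without protection and C with it, so protecting is optimal exactly when p > C/L (the numbers are hypothetical):

```python
def expected_expense(p, C, L, protect):
    """Expected cost for one decision: protect (pay C) or not (risk p*L)."""
    return C if protect else p * L

p, C, L = 0.3, 2.0, 10.0                               # C/L = 0.2
print(expected_expense(p, C, L, protect=(p > C / L)))  # protects, expense 2.0
```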

Wilks' Value Score (Wilks 2001)
– VS is the percent improvement in value between climatological and perfect information, as a function of C/L
– VS is impacted by (lack of) calibration
– VS can be generalized for particular/idealized distributions of C/L

VS example: Wilks (2001), Las Vegas PoP, April 1980 – March 1987

VS example: icing forecasts

VS: Beta model example (Wilks 2001)

Richardson approach
– Value score in the ROC context (Richardson 2000)
– Calibration errors don't impact the score

Miscellaneous issues
Quantifying the uncertainty in verification measures:
– Issue: spatial and temporal correlation
– A few approaches:
   - Parametric methods, e.g., Seaman et al. (1996)
   - Robust methods (confidence intervals for medians), e.g., Brown et al. (1997); Velleman and Hoaglin (1981)
   - Bootstrap methods, e.g., Hamill (1999); Kane and Brown (2000)
Treatment of observations as probabilistic?
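As one concrete illustration of the bootstrap approach, a sketch of a percentile interval for the Brier score; note that naive case resampling ignores the spatial/temporal correlation issue raised above (block resampling is one remedy):

```python
import numpy as np

def bootstrap_ci(f, x, stat, n_boot=2000, alpha=0.05, seed=0):
    f, x = np.asarray(f, float), np.asarray(x, float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))  # resample cases
    reps = np.array([stat(f[i], x[i]) for i in idx])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

f = [0.1, 0.8, 0.4, 0.9, 0.2, 0.7, 0.3, 0.6]
x = [0,   1,   0,   1,   0,   1,   1,   0]
print(bootstrap_ci(f, x, stat=lambda f, x: np.mean((f - x) ** 2)))
```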

Conclusions
– The basis for evaluating probability forecasts was established many years ago (Brier, Murphy, Epstein)
– A recent renewal of interest has led to new ideas
– Still more to do:
   - Develop and implement a cohesive set of meaningful and useful methods
   - Develop a greater understanding of the methods we have and how they inter-relate

Verification of Probabilistic QPFs: Selected References
Brown, B.G., G. Thompson, R.T. Bruintjes, R. Bullock, and T. Kane, 1997: Intercomparison of in-flight icing algorithms. Part II: Statistical verification results. Weather and Forecasting, 12.
Davis, C., and F. Carr, 2000: Summary of the 1998 workshop on mesoscale model verification. Bulletin of the American Meteorological Society, 81.
Hamill, T.M., 1997: Reliability diagrams for multicategory probabilistic forecasts. Weather and Forecasting, 12, 736–741.
Hamill, T.M., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. Weather and Forecasting, 14.
Hamill, T.M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129.

References (cont.)
Harvey, L.O., Jr., K.R. Hammond, C.M. Lusk, and E.F. Mross, 1992: The application of signal detection theory to weather forecasting behavior. Monthly Weather Review, 120.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15.
Hsu, W.-R., and A.H. Murphy, 1986: The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. International Journal of Forecasting, 2.
Kane, T.L., and B.G. Brown, 2000: Confidence intervals for some verification measures – a survey of several methods. Preprints, 15th Conference on Probability and Statistics in the Atmospheric Sciences, 8-11 May, Asheville, NC, U.S.A., American Meteorological Society (Boston).

References (cont.)
Mason, I., 1982: A model for assessment of weather forecasts. Australian Meteorological Magazine, 30.
Mason, I., 1989: Dependence of the critical success index on sample climate and threshold probability. Australian Meteorological Magazine, 37.
Mason, S., and N.E. Graham, 1999: Conditional probabilities, relative operating characteristics, and relative operating levels. Weather and Forecasting, 14.
Murphy, A.H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293.
Murphy, A.H., and D.S. Wilks, 1998: A case study of the use of statistical models in forecast verification: Precipitation probability forecasts. Weather and Forecasting, 13.

References (cont.)
Murphy, A.H., and R.L. Winkler, 1992: Diagnostic verification of probability forecasts. International Journal of Forecasting, 7.
Richardson, D.S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society, 126.
Seaman, R., I. Mason, and F. Woodcock, 1996: Confidence intervals for some performance measures of Yes-No forecasts. Australian Meteorological Magazine, 45.
Stanski, H., L.J. Wilson, and W.R. Burrows, 1989: Survey of common verification methods in meteorology. WMO World Weather Watch Tech. Rep. 8, 114 pp.
Velleman, P.F., and D.C. Hoaglin, 1981: Applications, Basics, and Computing of Exploratory Data Analysis. Duxbury Press, 354 pp.

References (cont.)
Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, San Diego, CA, 467 pp.
Wilks, D.S., 2001: A skill score based on economic value for probability forecasts. Meteorological Applications, in press.
Wilson, L.J., W.R. Burrows, and A. Lanzinger, 1999: A strategy for verification of weather element forecasts from an ensemble prediction system. Monthly Weather Review, 127.