Verification of nowcasting products: Issues and methods

1 Verification of nowcasting products: Issues and methods
Barbara Brown, NCAR, Boulder, CO, USA
Collaborators: E. Ebert, E. Gilleland, R. Roberts, E. Tollerud, T. Jensen
Thanks to: D. Ahijevych, J. Pinto, H. Cai, P. Nurmi
26 October 2011, Workshop on Use of NWP for Nowcasting

2 Topics
What makes nowcasts special
Verification issues
Methods – old, new, and twists...
Summary and concluding thoughts

3 Characteristics of nowcasts
High frequency and resolution ⇒ large variability in space and time
[Figures: SNOW-V10 visibility forecasts and obs before the men's snowboard giant slalom finals; forecasts and obs from the Beijing Olympics]

4 Characteristics of nowcasts (cont)
Extreme weather: typical focus on high-impact weather – heavy precip, strong wind, low visibility, high reflectivity, etc.
“Extreme” weather often implies a rare or infrequent event
Infrequent events (low “base rate”) often require special statistical treatment...
[Photo: Gare Montparnasse, 1895]

5 Characteristics of nowcasts (cont)
Large impact on users
Goal: nowcasts impact users’ decision making
In designing verification it is important to “know” the user and the decisions being made:
Identify relevant “events” to evaluate
Ask relevant questions, to provide meaningful information about forecast quality
Example: SNOW-V10
May not need to get every visibility forecast correct
May be more important to know whether the visibility will be below a relevant threshold sometime during an event (see the sketch below)

6 Issues and challenges...
Inconsistent reporting of scores
Ex: What smoothing was used? Should always be reported... (NOTE: the same smoothing should be used for ALL scores, not the smoother of choice for each score)
Are scores meaningful to a user?
Ex: What does CSI = 0.2 really mean? Use of more diagnostic measures helps with this...
Relationships and dependencies among scores
Determining if apparent differences are meaningful
Benefits of high resolution are also detriments for forecast evaluation
Ex: high resolution, spatial and temporal variability

7 Ex: Relationships among scores
CSI is a nonlinear function of POD and FAR
CSI depends on the base rate (event frequency) and bias
Very different combinations of FAR and POD lead to the same CSI value (a short numeric check follows below)
What about the user?
[Figure: CSI as a function of FAR and POD]
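The nonlinear relationship can be written out explicitly: 1/CSI = 1/POD + 1/(1 - FAR) - 1, so quite different POD/FAR trade-offs collapse onto nearly the same CSI. A short numeric check with illustrative values (not taken from the talk):

```python
def csi_from_pod_far(pod, far):
    """Critical Success Index from POD and FAR:
    1/CSI = 1/POD + 1/(1 - FAR) - 1."""
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)

# Very different quality trade-offs, nearly identical CSI (about 0.37-0.39)
for pod, far in [(0.45, 0.30), (0.60, 0.48), (0.90, 0.61)]:
    print(f"POD={pod:.2f} FAR={far:.2f} -> CSI={csi_from_pod_far(pod, far):.2f}")
```

A user who cares mainly about missed events and a user who cares mainly about false alarms would rank these three systems very differently, even though CSI cannot tell them apart.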

8 Incorporation of uncertainty information
Uncertainty information is a critical aspect of forecast evaluation
Verification statistics have inherent uncertainty (sample, observations, grid); a resampling sketch follows below
[Figures: GSS (Gilbert Skill Score) results]
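One common way to attach sampling uncertainty to a score is a percentile bootstrap over matched forecast/observation pairs. The sketch below is a generic illustration on synthetic data (independence of cases is assumed; correlated nowcast grids would call for a block bootstrap), not the method behind the GSS results on the slide:

```python
import numpy as np

def csi(fcst_yes, obs_yes):
    """CSI from paired binary (yes/no) forecasts and observations."""
    hits = np.sum(fcst_yes & obs_yes)
    misses = np.sum(~fcst_yes & obs_yes)
    false_alarms = np.sum(fcst_yes & ~obs_yes)
    return hits / float(hits + misses + false_alarms)

def bootstrap_ci(fcst_yes, obs_yes, score=csi, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a verification score,
    resampling matched forecast/observation pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(obs_yes)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        samples.append(score(fcst_yes[idx], obs_yes[idx]))
    lo, hi = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic example: 500 matched yes/no pairs with a ~10% base rate
rng = np.random.default_rng(1)
obs = rng.random(500) < 0.10
fcst = (obs & (rng.random(500) < 0.6)) | ((~obs) & (rng.random(500) < 0.05))
print(csi(fcst, obs), bootstrap_ci(fcst, obs))
```

Reporting the interval alongside the score makes it clear whether an apparent difference between two forecast systems is larger than the sampling noise.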

9 Impacts of high resolution and spatial variability
[Figure: forecast vs observed fields; grid-to-grid results: POD = 0.40, FAR = 0.56, CSI = 0.27]
Traditional approaches ignore spatial structure in many (most?) forecasts
Spatial correlations
Small errors lead to poor scores (squared errors... smooth forecasts are rewarded)
Methods for evaluation are not diagnostic
The same issues exist for ensemble and probability forecasts

10 Challenge: High resolution forecasts
Which rain forecast is best?
[Figures: observed 24-h rain, 21 Mar 2004, Sydney; mesoscale model (5 km), RMS = 13.0; global model (100 km), RMS = 4.6]
“Smooth” forecasts generally “win” according to traditional verification approaches. (A toy illustration follows below.)
From E. Ebert
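A toy one-dimensional illustration (not from the talk) of why squared-error measures favor smooth forecasts: a realistic rain band displaced by a few grid points is penalized twice (a miss plus a false alarm), while a featureless forecast of the domain mean scores better:

```python
import numpy as np

x = np.arange(100)
obs = np.where(np.abs(x - 50) < 5, 20.0, 0.0)          # sharp observed rain band
sharp_fcst = np.where(np.abs(x - 58) < 5, 20.0, 0.0)   # same band, displaced 8 points
smooth_fcst = np.full_like(obs, obs.mean())            # featureless "smooth" forecast

def rmse(f, o):
    return np.sqrt(np.mean((f - o) ** 2))

print("RMSE, sharp but displaced:", round(rmse(sharp_fcst, obs), 2))  # ~8.0
print("RMSE, smooth             :", round(rmse(smooth_fcst, obs), 2)) # ~5.7
```

The displaced forecast carries useful information about the rain band's amplitude and structure, yet RMSE ranks the smooth forecast higher, which is the "double penalty" problem that motivates the spatial methods later in the talk.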

11 Example: Probability forecasts
[Figure: schematic probability forecast examples]

12 Conclusion: Calibration and skill are highly dependent on displacement
[Figure: probability forecast example with large displacement]

13 A little about methods and displays
Traditional approaches
Spatial methods
Translation to user variables
Some new measures

14 Beijing Olympics 2008
The real-time verification system was found to be very useful
Forecasters preferred scatterplots and quantile-quantile plots
These plots can also be very useful for forecast diagnostics (a minimal plotting sketch follows below)
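A minimal sketch of the two displays, assuming paired forecast and observed values for one site; the synthetic data and variable names are illustrative only:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 3.0, 500)             # stand-in observed values
fcst = 0.8 * obs + rng.normal(0, 2, 500)   # stand-in forecasts with bias and noise

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Scatterplot: forecast vs observation for matched pairs
ax1.scatter(obs, fcst, s=5)
ax1.plot([0, obs.max()], [0, obs.max()], "k--")
ax1.set(xlabel="Observed", ylabel="Forecast", title="Scatterplot")

# Q-Q plot: compare the two marginal distributions quantile by quantile
q = np.linspace(0.01, 0.99, 99)
ax2.plot(np.quantile(obs, q), np.quantile(fcst, q), "o", ms=3)
ax2.plot([0, obs.max()], [0, obs.max()], "k--")
ax2.set(xlabel="Observed quantiles", ylabel="Forecast quantiles", title="Q-Q plot")

plt.tight_layout()
plt.show()
```

The scatterplot shows pair-by-pair errors, while the Q-Q plot shows whether the forecast distribution as a whole is biased, too narrow, or too wide, which is often the more useful diagnostic view.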

15 Timing of Modeled Storm Initiation
[Scatterplot: model initiation times vs observed initiation times (UTC), JJA 2010, 145 MCS initiation events; regions marked “HRRR early”, “HRRR late”, and “overnight storms”]
Model performs well during the day
Performance is not a function of cycle time
Issues with timing of overnight storms
From J. Pinto

16 Performance diagrams
[Diagram: POD vs success ratio (1 - FAR), with lines of equal CSI and lines of equal bias; a construction sketch follows below]
From Roberts et al. 2011
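A performance diagram can be reconstructed from the identities bias = POD / (1 - FAR) and 1/CSI = 1/POD + 1/(1 - FAR) - 1. The sketch below (generic matplotlib, not the code behind the slide) draws CSI contours and equal-bias lines and places two hypothetical forecast systems in the diagram:

```python
import numpy as np
import matplotlib.pyplot as plt

sr = np.linspace(0.01, 1.0, 200)          # success ratio = 1 - FAR
pod = np.linspace(0.01, 1.0, 200)
SR, POD = np.meshgrid(sr, pod)
CSI = 1.0 / (1.0 / SR + 1.0 / POD - 1.0)  # CSI from POD and success ratio
# Frequency bias = POD / SR: straight lines through the origin

fig, ax = plt.subplots(figsize=(5, 5))
cs = ax.contour(SR, POD, CSI, levels=np.arange(0.1, 1.0, 0.1), colors="grey")
ax.clabel(cs, fmt="%.1f")
for b in (0.3, 0.5, 1.0, 2.0, 3.0):       # equal-bias lines
    mask = b * sr <= 1.0
    ax.plot(sr[mask], b * sr[mask], "k--", lw=0.5)

# Hypothetical forecast systems, given as (POD, FAR) pairs
for name, p, far in [("A", 0.55, 0.35), ("B", 0.75, 0.55)]:
    ax.plot(1 - far, p, "o")
    ax.annotate(name, (1 - far, p))

ax.set(xlabel="Success ratio (1 - FAR)", ylabel="POD", xlim=(0, 1), ylim=(0, 1))
plt.show()
```

Plotting POD, success ratio, bias, and CSI in one panel makes the trade-offs visible that a single CSI number hides.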

17 New Spatial Verification Approaches
Neighborhood: successive smoothing of forecasts/obs (an FSS sketch follows below)
Object- and feature-based: evaluate attributes of identifiable features
Scale separation: measure scale-dependent error
Field deformation: measure distortion and displacement (phase error) for the whole field
In particular, many researchers around the world were developing methods that our work indicated would fit into four basic categories. One is the category that includes MODE – feature-based. But other methods took different approaches: neighborhood methods, which evaluate performance as the scale is broadened to include smoother scales; scale separation, which considers performance at individual scales; and field deformation, which evaluates how much a forecast would have to change in order to best match the observed field. So several questions arose: How should a user determine what method to use for a particular application? What are the attributes and properties of these different methods?
Web site:
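As a concrete example of the neighborhood category, here is a minimal Fractions Skill Score (FSS) sketch: threshold both fields, average the binary fields over square neighborhoods of increasing size, and compare the resulting fractions. It illustrates the idea only and is not the MET implementation; the synthetic fields and threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """Fractions Skill Score for one threshold and one square neighborhood size."""
    fbin = (fcst >= threshold).astype(float)
    obin = (obs >= threshold).astype(float)
    pf = uniform_filter(fbin, size=window, mode="constant")  # forecast fractions
    po = uniform_filter(obin, size=window, mode="constant")  # observed fractions
    num = np.mean((pf - po) ** 2)
    den = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - num / den if den > 0 else np.nan

# Synthetic reflectivity-like fields: same storm, displaced in the forecast
y, x = np.mgrid[0:200, 0:200]
obs = 40.0 * np.exp(-((x - 100) ** 2 + (y - 100) ** 2) / 400.0)
fcst = 40.0 * np.exp(-((x - 130) ** 2 + (y - 100) ** 2) / 400.0)

for window in (1, 11, 41, 81):
    print(f"window={window:3d}  FSS={fss(fcst, obs, 30.0, window):.2f}")
```

FSS is near zero at grid scale for the displaced storm and rises as the neighborhood grows beyond the displacement, which is exactly the "at what scale does the forecast become useful" question the neighborhood methods are designed to answer.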

18 HWT Example: Attribute Diagnostics for NWP Neighborhood & Object-based Methods - REFC > 30 dBZ
Neighborhood methods provide a sense of how the model performs at different scales (Fractions Skill Score): FSS = 0.14, 0.30, 0.64
Object-based methods provide a sense of how forecast attributes compare with observed, and include a measure of overall matching skill based on user-selected attributes:
20-h: matched interest 0; area ratio, centroid distance, and P90 intensity ratio n/a
22-h: matched interest 0.89; area ratio 0.18; centroid distance 112 km; P90 intensity ratio 1.08
24-h: matched interest 0.96; area ratio 0.53; centroid distance 92 km; P90 intensity ratio 1.04
(A toy object-attribute sketch follows below.)
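The object attributes listed above (area ratio, centroid distance) can be illustrated with a simple threshold-and-label sketch. This is a toy stand-in using plain connected-component labeling, not MODE's convolution and fuzzy-logic matching; the fields and threshold are assumptions.

```python
import numpy as np
from scipy import ndimage

def largest_object(field, threshold):
    """Label contiguous regions above the threshold; return the largest
    region's area (grid points) and centroid (row, col)."""
    labels, n = ndimage.label(field >= threshold)
    if n == 0:
        return 0, (np.nan, np.nan)
    areas = ndimage.sum(np.ones_like(field), labels, index=range(1, n + 1))
    biggest = int(np.argmax(areas)) + 1
    centroid = ndimage.center_of_mass((labels == biggest).astype(float))
    return int(areas[biggest - 1]), centroid

# Toy reflectivity fields (dBZ): forecast object displaced and smaller than observed
y, x = np.mgrid[0:200, 0:200]
obs = 45.0 * np.exp(-((x - 90) ** 2 + (y - 110) ** 2) / 900.0)
fcst = 45.0 * np.exp(-((x - 130) ** 2 + (y - 95) ** 2) / 500.0)

a_obs, c_obs = largest_object(obs, 30.0)
a_fcst, c_fcst = largest_object(fcst, 30.0)
print(f"area ratio        = {a_fcst / a_obs:.2f}")
print(f"centroid distance = {np.hypot(c_fcst[0] - c_obs[0], c_fcst[1] - c_obs[1]):.1f} grid points")
```

Attributes like these describe *how* a forecast storm differs from the observed one (too small, displaced east, too intense), which is the diagnostic information a single grid-to-grid score cannot give.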

19 MODE application to HWT ensembles
[Figures: radar echo tops (RETOP), observed vs CAPS PM mean]

20 MODE Storm Size Distribution (Midwest)
[Figures: storm size distributions (log10 number of objects) for obs, 2010 HRRR, and 2011 HRRR at 4-h, 8-h, and 12-h forecast lead times]
From H. Cai

21 Evaluation of temporal characteristics
Wind change events (Rife et al. 2005)
MODE-TD: MODE with a time dimension

22 Translation to user variables: Aviation capacity
[Figures: forecast vs observed capacity; Brier score contribution; CSI for E-W routes at 20, 30, and 40 kft]
User translation provides information that is closer to user decision making
Evaluation faces issues similar to those for weather variables
From D. Ahijevych

23 New scores to consider...
SEEPS: Stable Equitable Error in Probability Space
New ECMWF “supplementary headline” score for non-probabilistic forecasts of accumulated precipitation
Rodwell et al. 2010 (QJRMS, 136)
Derived from “LEPS” (Linear Error in Probability Space)
SEDS and variants: Symmetric Extreme Dependency Score

24 Extreme dependency scores
Standard scores tend to zero for rare events
Extremal Dependency Index (EDI) and Symmetric Extremal Dependency Index (SEDI); formulas are sketched below
SEDI has desirable statistical properties: base rate independence; a function of H and F only
Ferro & Stephenson, 2010: Improved verification measures for deterministic forecasts of rare, binary events. Wea. Forecasting
From Nurmi 2011
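For reference, the formulas behind the two indices, written in terms of the hit rate H and false-alarm rate F following Ferro & Stephenson, are easy to compute directly. The sketch below omits handling of the degenerate cases where H or F equals 0 or 1:

```python
import numpy as np

def edi(h, f):
    """Extremal Dependency Index: (ln F - ln H) / (ln F + ln H)."""
    return (np.log(f) - np.log(h)) / (np.log(f) + np.log(h))

def sedi(h, f):
    """Symmetric Extremal Dependency Index:
    (ln F - ln H - ln(1-F) + ln(1-H)) / (ln F + ln H + ln(1-F) + ln(1-H)).
    Note: undefined when H or F is exactly 0 or 1."""
    num = np.log(f) - np.log(h) - np.log(1 - f) + np.log(1 - h)
    den = np.log(f) + np.log(h) + np.log(1 - f) + np.log(1 - h)
    return num / den

# Rare-event example: decent hit rate, small false-alarm rate
print(edi(0.6, 0.01), sedi(0.6, 0.01))
```

Because these indices depend only on H and F, they do not collapse toward zero as the base rate shrinks, which is what makes them attractive for the rare, high-impact events typical of nowcasting.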

25 Experimentation done at FMI, HMI, KNMI, M-F ...
More work is needed to assess their potential as scores for severe weather events From Nurmi 2011

26 Concluding thoughts
Clear disclosure of methodologies (e.g., smoothing parameters) is necessary and should be a requirement of our science
In the same vein, uncertainty estimates are critical when making comparisons, to justify decisions
CSI alone can be very misleading: it does not tell anyone much about what is really going on, especially when applied broadly to all forecasts and the full domain
Caution: you can get apparently good scores for the wrong reasons
A challenge: think beyond CSI/ETS (and FSS)

27

28 Supplementary headline measure (i): 1 - SEEPS for 24-h deterministic precipitation
[Chart: number of days for which 1 - SEEPS remains above 45%]
Proposal: supplementary headline score for deterministic precipitation forecasts. The curve shows the number of days for which the centered 12-month mean skill remains above a specified threshold for precipitation forecasts over the extra-tropics. The verification is for 24-hour total precipitation, verified against available synoptic observations. The forecast day on the y-axis is the end of the 24-hour period over which the precipitation is accumulated. The threshold is chosen to reflect the forecast skill that is achieved at approximately day 3.5 at the beginning of the strategy period.

29 Supplementary headline measure (i): 1 - SEEPS
SEEPS: Stable Equitable Error in Probability Space (Rodwell et al. 2010, QJRMS; latest ECMWF Newsletter No. 128)
Derived from the LEPS score (Linear Error in Probability Space); a minimal LEPS sketch follows below
Forecast error is measured in probability space using the climatological cumulative distribution function
At each observation location, precipitation is partitioned into 3 categories: (i) “dry”, (ii) “light precip”, (iii) “heavy precip”
Long-term climatological precipitation categories at given SYNOP stations are derived, which accounts for climate differences between stations
Evaluates forecast performance across all 3 categories
Stable to sample variations and obs error, so good for detecting trends
Negatively oriented error measure (perfect score = 0), hence reported as 1 - SEEPS

