Richard (Rick) Jones. SWFDP Training Workshop on Severe Weather Forecasting, Bujumbura, Burundi, Nov 11-16, 2013
Verification: WMO-sponsored Joint Working Group on Forecast Verification Research (JWGFVR), ecast_Verification.html
Why? “You can't know where you're going until you know where you've been” (proverb), or George Santayana: “Those who are unaware of history are destined to repeat it.” Quality management: “Plan, Do, Check, Act” (Deming). How to verify: “Begin with the end in mind” (Covey). Training. Product differentiation.
Verification as a measure of forecast quality: to monitor forecast quality, to improve forecast quality, and to compare the quality of different forecast systems.
Introduction: “Verification activity has value only if the information generated leads to a decision about the forecast or system being verified” (A. Murphy). “User-oriented verification”: verification methods designed with the needs of a specific user in mind. “Users” are those who are interested in verification results and who will take action based on them. Forecasters and modelers are users too.
SWFDP Goals. Progress against SWFDP goals: to improve the ability of NMSs to forecast severe weather events; to improve the lead time of alerting these events; to improve the interaction of NMSs with Disaster Management and Civil Protection authorities before, during and after severe weather events; to identify gaps and areas for improvement; to improve the skill of products from Global Centres through feedback from NMSs. Evaluation of weather warnings: feedback from the public; feedback from the DMCPA, including comments on the timeliness and usefulness of the warnings; feedback from the media; warning verification by the NMCs.
Goals of Verification: Administrative. Justify the cost of providing weather services; justify additional or new equipment; monitor the quality of forecasts and track changes. This usually means summarizing the verification into a few numbers (scoring). Impact: $ and injuries.
Goals of Verification: Scientific. To identify the strengths and weaknesses of a forecast product in sufficient detail that actions can be specified that will lead to improvements in the product, i.e. to provide information to direct R&D. This demands more detail in the verification methodology (“diagnostic verification”). SWFDP: both administrative and scientific goals.
Forecast “goodness”: what makes a forecast good? QUALITY: how well it corresponds with the actual weather, as revealed by observations (verification). VALUE: the increase or decrease in economic or other value to a user, attributable to his use of the forecast (satisfaction). Assessing value requires information from the user in addition to verification, and can be done with methods of decision theory (cost-loss, etc.).
Principles of (Objective) Verification: Verification activity has value only if the information generated leads to a decision about the forecast or system being verified. The user of the information must be identified. The purpose of the verification must be known in advance. No single verification measure provides complete information about the quality of a forecast product.
The contingency table
Forecasts \ Observations | Yes | No
Yes | Hits | False alarms
No | Misses | Correct negatives
Preparation of the event table: Start with matched forecasts and observations. Forecast event: precipitation >50 mm / 24 h, next day. Threshold: medium risk. Count up the number of hits, false alarms, misses and correct negatives over the whole sample, and enter them into the corresponding four boxes of the table. Example event table (Day: Forecast to occur? / Observed?): 1: Yes; 2: No / Yes; 3: No; 4: Yes / No; 5; 6: Yes; 7: No; 8: Yes; 9: No.
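The counting step can be sketched in Python as below. This is a minimal illustration, assuming each verified day has been reduced to a yes/no forecast and a yes/no observation, with a missing report recorded as `None` and left out of the count; the function name `contingency_counts` and the example days are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

def contingency_counts(
    pairs: Iterable[Tuple[bool, Optional[bool]]]
) -> Dict[str, int]:
    """Tally the four boxes of a 2x2 contingency table.

    Each pair is (forecast_yes, observed_yes). An observation of None means
    no report was available, so that day is skipped rather than counted.
    """
    counts = {"hits": 0, "false_alarms": 0, "misses": 0, "correct_negatives": 0}
    for forecast_yes, observed_yes in pairs:
        if observed_yes is None:
            continue  # no observation: this day cannot be verified
        if forecast_yes and observed_yes:
            counts["hits"] += 1
        elif forecast_yes:
            counts["false_alarms"] += 1
        elif observed_yes:
            counts["misses"] += 1
        else:
            counts["correct_negatives"] += 1
    return counts

# Hypothetical matched days: (event forecast?, event observed?)
days = [(True, True), (False, True), (True, False), (False, False), (True, None)]
print(contingency_counts(days))
# {'hits': 1, 'false_alarms': 1, 'misses': 1, 'correct_negatives': 1}
```

Skipping days without an observation, rather than treating them as “no event”, keeps the correct negatives from being inflated by missing data.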
Exercise: Mozambique contingency table. Review on Saturday 16 Nov.
Outline: Introduction: purposes and principles of verification. Some relevant verification measures: contingency table and scores. Verification of products from the SWFDP. Verification of probability forecasts. Exercise results and interpretation (Saturday).
Forecast “goodness”: evaluation of the forecast system (forecast goodness) and evaluation of the delivery system: timeliness (are forecasts issued in time to be useful?), relevance (are forecasts delivered to intended users in a form they can understand and use?), robustness (level of errors or failures in the delivery of forecasts).
Principles of (Objective) Verification: The forecast must be stated in such a way that it can be verified. What about subjective verification? With care, it is OK. If subjective, it should not be done by anyone directly connected with the forecast. It is sometimes necessary due to a lack of objective information.
Verification Procedure: Start with a dataset of matched observations and forecasts; data preparation is the major part of the effort of verification. Establish the purpose: scientific vs. administrative. Pose the question to be answered, for a specific user or set of users. Stratify the dataset on the basis of user requirements (seasonal, extremes, etc.), taking care to maintain a sufficient sample size.
Verification Procedure: nature of the variable being verified. Continuous: forecasts of a specific value at a specified time and place. Categorical: forecast of an “event”, defined by a range of values, for a specific time period and place or area. Probabilistic: same as categorical, but the uncertainty is estimated. SWFDP: the predicted variables are categorical, namely extreme events, where extreme is defined by thresholds of precipitation and wind. Some probabilistic forecasts are available too.
What is the Event? For categorical and probabilistic forecasts, one must be clear about the “event” being forecast: the location or area for which the forecast is valid, the time range over which it is valid, and the definition of the category. Example?
What is the Event? And now, what is defined as a correct forecast, a “hit”? The event is forecast and is observed: anywhere in the area? Over some percentage of the area? Scaling considerations. Discussion:
Events for the SWFDP: It is best if “events” are defined for similar time periods and similar-sized areas: one day (24 h); fixed areas, which should correspond to the forecast areas and have at least one reporting station. The smaller the areas, the more useful the forecast, potentially, BUT predictability is lower for smaller areas, and missed event/false alarm pairs are more likely.
Events for the SWFDP: Correct negatives are a problem. Data density is a problem: it is best to avoid verification where there is no data. Non-occurrence vs. no observation: the absence of a report does not show that the event did not occur.
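One way to make the “anywhere in the area?” versus “over some percentage of the area?” choice explicit, and to keep areas without reports out of the sample, is sketched below. It assumes 24 h station totals per forecast area and the >50 mm event used in the event-table example; the function `area_event_observed` and its `min_fraction` parameter are hypothetical illustrations, not an SWFDP procedure.

```python
from typing import Optional, Sequence

def area_event_observed(
    station_totals_mm: Sequence[float],
    threshold_mm: float = 50.0,
    min_fraction: Optional[float] = None,
) -> Optional[bool]:
    """Decide whether the event (24 h total > threshold) occurred over an area.

    min_fraction=None: "anywhere in the area", one station above the threshold suffices.
    min_fraction=f: at least a fraction f of the reporting stations must exceed it.
    Returns None when the area has no reports, so it can be left out of the
    verification instead of being counted as a non-occurrence.
    """
    if not station_totals_mm:
        return None
    n_exceed = sum(1 for total in station_totals_mm if total > threshold_mm)
    if min_fraction is None:
        return n_exceed > 0
    return n_exceed / len(station_totals_mm) >= min_fraction

print(area_event_observed([12.0, 63.5, 4.0]))                    # True
print(area_event_observed([12.0, 63.5, 4.0], min_fraction=0.5))  # False
print(area_event_observed([]))                                    # None
```

The same station reports can give a hit under one definition and a miss under the other, which is why the definition has to be fixed before the counting starts.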
Contingency tables: PoD = hits / (hits + misses), the “Prefigurance” or “probability of detection”, also called the “hit rate”. Range 0 to 1; best score = 1. Sensitive only to missed events, not false alarms; can always be increased by overforecasting rare events. FAR = false alarms / (hits + false alarms), the “false alarm ratio”. Range 0 to 1; best score = 0. Sensitive only to false alarms, not missed events; can always be improved by underforecasting rare events.
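Both scores can be written directly from the table counts. The sketch below, with hypothetical function names and made-up counts, illustrates the point that PoD can be pushed up by overforecasting while only the FAR registers the cost.

```python
def probability_of_detection(hits: int, misses: int) -> float:
    """PoD: fraction of observed events that were forecast (best = 1)."""
    return hits / (hits + misses)

def false_alarm_ratio(hits: int, false_alarms: int) -> float:
    """FAR: fraction of forecast events that did not occur (best = 0)."""
    return false_alarms / (hits + false_alarms)

# Hypothetical season: 8 hits, 2 misses, 12 false alarms
print(probability_of_detection(8, 2), false_alarm_ratio(8, 12))    # 0.8 0.6
# Warning far more often turns the misses into hits, raising PoD to 1,
# but the extra warnings show up only in the FAR
print(probability_of_detection(10, 0), false_alarm_ratio(10, 40))  # 1.0 0.8
```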
Contingency tables: PAG = hits / (hits + false alarms), the “post agreement”. PAG = 1 - FAR and has the same characteristics; range 0 to 1, best score = 1. Bias = (hits + false alarms) / (hits + misses): this is the frequency bias, which indicates whether the forecast distribution is similar to the observed distribution of the categories (reliability); best score = 1.
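A minimal sketch of the post agreement and the frequency bias computed from the table counts; the function names and counts are hypothetical.

```python
def post_agreement(hits: int, false_alarms: int) -> float:
    """PAG = hits / (hits + false alarms) = 1 - FAR (best = 1)."""
    return hits / (hits + false_alarms)

def frequency_bias(hits: int, false_alarms: int, misses: int) -> float:
    """Number of forecast events divided by number of observed events.

    1 means unbiased; > 1 overforecasting; < 1 underforecasting.
    """
    return (hits + false_alarms) / (hits + misses)

# Hypothetical counts: 8 hits, 12 false alarms, 2 misses
print(post_agreement(8, 12))     # 0.4
print(frequency_bias(8, 12, 2))  # 2.0: the event was forecast twice as often as observed
```

Note that the bias says nothing about whether the forecasts were right on the right days; a bias of 1 is possible with no hits at all.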
Contingency tables: Threat score = hits / (hits + misses + false alarms), formally the Critical Success Index (CSI) but better known as the Threat Score. Range 0 to 1; best score = 1. Sensitive to both false alarms and missed events, so a more balanced measure than either PoD or FAR.
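The threat score uses every cell of the table except the correct negatives, as in this short sketch (hypothetical name and counts):

```python
def threat_score(hits: int, false_alarms: int, misses: int) -> float:
    """Threat score (CSI): hits over all cases where the event was forecast or observed.

    Correct negatives do not enter, so a long run of quiet days cannot inflate it.
    """
    return hits / (hits + false_alarms + misses)

# Hypothetical counts: 8 hits, 12 false alarms, 2 misses
print(round(threat_score(8, 12, 2), 3))  # 0.364
```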
Contingency tables: A skill score against chance (as shown). Range: negative values to 1; best score = 1. It is easy to obtain positive values; it is better to use climatology or persistence as the reference, but that needs another table.
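The transcript does not name the score on this slide. As an assumption, the sketch below uses the Heidke skill score, a common skill score against chance whose range (negative values up to a best score of 1) matches the slide; the function name and counts are hypothetical.

```python
def heidke_skill_score(hits: int, false_alarms: int, misses: int,
                       correct_negatives: int) -> float:
    """Heidke skill score: proportion correct relative to random chance.

    1 is perfect, 0 means no better than chance, negative is worse than chance.
    """
    n = hits + false_alarms + misses + correct_negatives
    correct = hits + correct_negatives
    # Number of correct forecasts expected by chance, given the marginal totals
    expected = ((hits + misses) * (hits + false_alarms)
                + (correct_negatives + misses) * (correct_negatives + false_alarms)) / n
    return (correct - expected) / (n - expected)

# Hypothetical 100 days: 8 hits, 12 false alarms, 2 misses, 78 correct negatives
print(round(heidke_skill_score(8, 12, 2, 78), 3))  # 0.462
```

With 78 correct negatives, 86 of the 100 forecasts are “correct”, but chance alone would be expected to get 74 right, which is why skill against chance is so much lower than the raw proportion correct. Replacing `expected` with the number correct by a climatology or persistence forecast, taken from that reference forecast's own table, gives skill relative to that reference instead.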
Contingency tables: Hit rate (HR) = hits / (hits + misses) is the same as the PoD and has the same characteristics; range 0 to 1, best score = 1. False alarm RATE = false alarms / (false alarms + correct negatives); range 0 to 1, best score = 0. This is different from the false alarm ratio. These two are used together in the Hanssen-Kuipers score and in the ROC, and are best used in comparison.
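The sketch below, with hypothetical names and counts, computes the hit rate, the false alarm rate and the Hanssen-Kuipers score (H minus F), and shows how far the false alarm rate can sit from the false alarm ratio on the same table.

```python
def hit_rate(hits: int, misses: int) -> float:
    """H = hits / (hits + misses); identical to PoD."""
    return hits / (hits + misses)

def false_alarm_rate(false_alarms: int, correct_negatives: int) -> float:
    """F = false alarms / (false alarms + correct negatives), computed over non-events."""
    return false_alarms / (false_alarms + correct_negatives)

def hanssen_kuipers(hits: int, false_alarms: int, misses: int,
                    correct_negatives: int) -> float:
    """Hanssen-Kuipers score = H - F (1 is perfect, 0 means no discrimination)."""
    return hit_rate(hits, misses) - false_alarm_rate(false_alarms, correct_negatives)

# Hypothetical: 8 hits, 12 false alarms, 2 misses, 78 correct negatives.
# The false alarm RATIO here is 12 / 20 = 0.6, but the false alarm RATE is only 12 / 90.
print(round(false_alarm_rate(12, 78), 3))       # 0.133
print(round(hanssen_kuipers(8, 12, 2, 78), 3))  # 0.667
```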
Extreme weather scores: Extreme Dependency Score (EDS), Extreme Dependency Index (EDI), Symmetric Extremal Dependency Score (SEDS), Symmetric Extremal Dependency Index (SEDI).
Contingency tables: Extreme dependency score, EDS = 2 ln((hits + misses) / n) / ln(hits / n) - 1, where n is the total number of cases. Range -1 to 1; best score = 1. Characteristics: the score can be improved by incurring more false alarms. It is considered useful for extremes because it does not converge to 0 as the base rate (observed frequency of events) decreases. A relatively new score, not yet widely used.
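A sketch of the EDS, assuming the usual definition in terms of the base rate p = (hits + misses)/n and the hit fraction hits/n. It also shows why the score can be raised by incurring more false alarms. The function name and counts are hypothetical.

```python
import math

def extreme_dependency_score(hits: int, false_alarms: int, misses: int,
                             correct_negatives: int) -> float:
    """EDS = 2 ln(p) / ln(hits / n) - 1, with base rate p = (hits + misses) / n.

    Needs at least one hit. EDS depends only on hits, misses and the sample
    size, so extra warnings that turn misses into hits raise the score no
    matter how many false alarms they add.
    """
    n = hits + false_alarms + misses + correct_negatives
    base_rate = (hits + misses) / n
    return 2.0 * math.log(base_rate) / math.log(hits / n) - 1.0

# Hypothetical 100 days with 10 observed events
print(round(extreme_dependency_score(8, 12, 2, 78), 3))  # 0.823
# Many more warnings: one more hit, but 30 false alarms instead of 12
print(round(extreme_dependency_score(9, 30, 1, 60), 3))  # 0.912
```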
EDS, EDI, SEDS, SEDI: new categorical measures. Standard scores tend to zero for rare events. Extremal Dependency Index (EDI) and Symmetric Extremal Dependency Index (SEDI) (Ferro & Stephenson, 2010: Improved verification measures for deterministic forecasts of rare, binary events. Weather and Forecasting): base-rate independent; functions of H and F. Verification of extreme, high-impact weather.
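EDI and SEDI are functions of the hit rate H and the false alarm rate F only. The sketch below follows the definitions as given by Ferro and Stephenson; treat it as an illustration, noting that both indices are undefined when H or F is 0, and SEDI also when either equals 1. The function names and input values are hypothetical.

```python
import math

def edi(hit_rate: float, false_alarm_rate: float) -> float:
    """Extremal Dependency Index: (ln F - ln H) / (ln F + ln H).

    Requires H > 0 and F > 0, and not both equal to 1.
    """
    log_h, log_f = math.log(hit_rate), math.log(false_alarm_rate)
    return (log_f - log_h) / (log_f + log_h)

def sedi(hit_rate: float, false_alarm_rate: float) -> float:
    """Symmetric Extremal Dependency Index; requires 0 < H < 1 and 0 < F < 1."""
    h, f = hit_rate, false_alarm_rate
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return num / den

# Hypothetical hit rate and false alarm rate
H, F = 0.8, 12 / 90
print(round(edi(H, F), 3), round(sedi(H, F), 3))
# A random forecast (H equal to F) scores 0 on both indices
print(edi(0.3, 0.3), sedi(0.3, 0.3))
```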
Weather Warning Index (Canada)
Weather warning index for the i-th variable if
Example: Madagascar. Separate contingency tables (Fcst yes / Fcst no / Totals against Obs yes / Obs no / Totals), assuming low, medium and high risk as thresholds. High risk, Fcst yes row: 13 Obs yes, 4 Obs no, 17 total. One can plot the hit rate vs. the false alarm RATE = false alarms / total Obs no.
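Plotting hit rate against false alarm rate for the low, medium and high risk tables gives points on an ROC diagram. A minimal sketch, using made-up counts rather than the Madagascar data and a hypothetical function name:

```python
from typing import Dict, List, Tuple

def roc_points(
    tables: Dict[str, Tuple[int, int, int, int]]
) -> List[Tuple[str, float, float]]:
    """Turn one 2x2 table per warning threshold into ROC points.

    tables maps a threshold name to (hits, false_alarms, misses, correct_negatives).
    Returns (name, false_alarm_rate, hit_rate), sorted by false alarm rate.
    """
    points = []
    for name, (hits, false_alarms, misses, correct_negatives) in tables.items():
        hit_rate = hits / (hits + misses)
        # False alarm RATE = false alarms / total observed "no"
        false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
        points.append((name, false_alarm_rate, hit_rate))
    return sorted(points, key=lambda point: point[1])

# Hypothetical counts: the same 20 observed events and 80 non-events,
# warned at low, medium and high risk thresholds
tables = {"low": (18, 30, 2, 50), "medium": (14, 12, 6, 68), "high": (8, 4, 12, 76)}
for name, far_rate, hr in roc_points(tables):
    print(f"{name:>6}: F = {far_rate:.3f}, H = {hr:.2f}")
```

Lowering the risk threshold moves the point up and to the right: more events are caught, at the cost of more false alarms. A forecast with no discrimination would put all points on the diagonal H = F.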
Discrimination: from the user's perspective, does the model or forecast tend to give higher values of precipitation when heavy precipitation occurs than when it doesn’t? (Or temperature?)
How do we verify this?
Contingency table for spatial data. A possible interpretation for spatially defined threat areas: put a grid of equal-area boxes over the overlaid observations and forecasts. The entries are just the number of boxes covered by the areas (hits: forecast and observed areas overlap; false alarms: forecast area only; misses: observed area only). Correct negatives are problematic, but could be limited to the total forecast domain. This is likely to result in an overforecasting bias (a different interpretation?). It can be done only where spatially continuous observations and forecasts are available (hydro-estimator?).
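Given forecast and observed threat areas already rasterized onto the same equal-area grid (an assumption; the rasterization itself is not shown), the box counts can be tallied as below. The optional `domain` mask is one way to limit correct negatives to the forecast domain; all names are hypothetical.

```python
from typing import Dict, Optional, Sequence

Grid = Sequence[Sequence[int]]  # 1 = box covered by the area, 0 = not covered

def spatial_contingency(forecast: Grid, observed: Grid,
                        domain: Optional[Grid] = None) -> Dict[str, int]:
    """Count equal-area grid boxes covered by forecast and/or observed threat areas.

    If a domain mask is given, boxes outside it are ignored, which limits the
    (otherwise unbounded) correct negatives to the forecast domain.
    """
    counts = {"hits": 0, "false_alarms": 0, "misses": 0, "correct_negatives": 0}
    for i, (f_row, o_row) in enumerate(zip(forecast, observed)):
        for j, (f, o) in enumerate(zip(f_row, o_row)):
            if domain is not None and not domain[i][j]:
                continue  # box outside the verification domain
            if f and o:
                counts["hits"] += 1
            elif f:
                counts["false_alarms"] += 1
            elif o:
                counts["misses"] += 1
            else:
                counts["correct_negatives"] += 1
    return counts

# Hypothetical 3 x 3 grid: the forecast area is displaced one box west of the observed area
fcst = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
obs = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
print(spatial_contingency(fcst, obs))
```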
Verification of regional maps (SAWS, Stephanie Landman): The regional map is discretized into 0 and 1. All fields are rescaled to 0.25° resolution. SWFDP fields are created for both the HE (hydro-estimator) and TRMM domains. HE and TRMM fields are converted to dichotomous fields for both the 25 and 50 mm/day thresholds. 25 mm/day is used together with 50 mm/day since 25 mm/day over a 0.25° grid box is considered extreme and falls within the 95th percentile value. Statistics are calculated per season as well as for the whole period. Daily verification is also done.
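The dichotomize-and-count steps can be sketched as follows. The sketch assumes the forecast and HE or TRMM fields have already been regridded to the common 0.25° grid, and that a box counts as an event when its value is at or above the threshold (the transcript does not say whether the comparison is inclusive). The function name and season labels are hypothetical; this is not SAWS code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Field = Sequence[Sequence[float]]  # 24 h precipitation (mm) on the common grid

def seasonal_counts(
    days: Iterable[Tuple[str, Field, Field]],
    thresholds_mm: Sequence[float] = (25.0, 50.0),
) -> Dict[Tuple[float, str], List[int]]:
    """Accumulate [hits, false_alarms, misses, correct_negatives] per threshold.

    days: (season label, forecast field, observed field) for each day.
    Counts are kept per season and under the label "all" for the whole period.
    """
    totals: Dict[Tuple[float, str], List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for season, fcst_mm, obs_mm in days:
        for threshold in thresholds_mm:
            for f_row, o_row in zip(fcst_mm, obs_mm):
                for f_val, o_val in zip(f_row, o_row):
                    # Dichotomize each box (assumed rule: value >= threshold is an event)
                    f, o = f_val >= threshold, o_val >= threshold
                    idx = 0 if (f and o) else 1 if f else 2 if o else 3
                    totals[(threshold, season)][idx] += 1
                    totals[(threshold, "all")][idx] += 1
    return dict(totals)

# Two hypothetical days on a 1 x 2 grid
days = [("DJF", [[30.0, 10.0]], [[60.0, 30.0]]),
        ("JJA", [[0.0, 55.0]], [[0.0, 20.0]])]
for key, counts in sorted(seasonal_counts(days).items()):
    print(key, counts)
```

The per-season tables can then be fed to any of the contingency scores; keeping the whole-period totals alongside them helps when a single season has too few events for stable scores.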
Summary: Verification of SWFDP products
Product | Who should verify | General method
NMC severe weather warnings | NMC | Contingency tables and scores
RSMC severe weather guidance charts | RSMC | Graphical contingency table
Global centre deterministic models | Global centres | Continuous scores (temperature); contingency tables (precipitation, wind)
Global EPS | Global centres | Scores for ensemble pdfs; scores for probability forecasts with respect to relevant categories
Probability forecast verification: reliability tables. Reliability: the level of agreement between the forecast probability and the observed frequency of an event. Usually displayed graphically. It measures the bias in a probability forecast: is there a tendency to overforecast or underforecast? It cannot be evaluated on a single forecast.
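A reliability table bins the forecast probabilities and compares each bin's mean forecast probability with the observed frequency of the event. A minimal sketch with equal-width bins; the function name and data are hypothetical. Bins where the observed frequency falls below the mean forecast probability indicate overforecasting, and bins where it lies above indicate underforecasting.

```python
from typing import List, Sequence, Tuple

def reliability_table(
    probs: Sequence[float], outcomes: Sequence[bool], n_bins: int = 10
) -> List[Tuple[float, float, float, float, int]]:
    """Return (bin_low, bin_high, mean_forecast_prob, observed_freq, count) per non-empty bin."""
    prob_sums = [0.0] * n_bins
    event_counts = [0] * n_bins
    counts = [0] * n_bins
    for p, occurred in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the top bin
        prob_sums[k] += p
        event_counts[k] += int(bool(occurred))
        counts[k] += 1
    return [
        (k / n_bins, (k + 1) / n_bins, prob_sums[k] / counts[k],
         event_counts[k] / counts[k], counts[k])
        for k in range(n_bins) if counts[k]
    ]

# Hypothetical probability forecasts of an event, and whether it occurred
probs = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
occurred = [False, False, True, True, True, True, False]
for low, high, mean_p, freq, n in reliability_table(probs, occurred, n_bins=5):
    print(f"[{low:.1f}, {high:.1f}): forecast {mean_p:.2f}, observed {freq:.2f}, n = {n}")
```

This also shows why reliability cannot be judged from a single forecast: each bin needs enough cases for its observed frequency to mean something.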
Reliability: Summer 08, Europe, 114 h
Summary: NMS products. Warnings issued by NMSs: contingency tables as above, if enough data is gathered. It is important to determine the lead time of a warning, so the issue time of the warning and the occurrence time of the event must be archived. Data problems: verify the “reporting of the event”.
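Once issue and occurrence times are archived, the lead time is a simple difference. The sketch below uses hypothetical function names and made-up times, and treats a negative lead time as a warning issued after the event began.

```python
from datetime import datetime
from typing import List, Tuple

def lead_time_hours(issue_time: datetime, occurrence_time: datetime) -> float:
    """Hours between warning issue and event occurrence; negative if issued late."""
    return (occurrence_time - issue_time).total_seconds() / 3600.0

def summarize_lead_times(warnings: List[Tuple[datetime, datetime]]) -> Tuple[float, int]:
    """Return (mean lead time in hours, number of warnings issued after the event began)."""
    leads = [lead_time_hours(issued, occurred) for issued, occurred in warnings]
    late = sum(1 for lead in leads if lead < 0)
    return sum(leads) / len(leads), late

# Hypothetical archived warnings: (issue time, occurrence time), UTC
archive = [
    (datetime(2013, 11, 12, 6), datetime(2013, 11, 13, 0)),   # 18 h ahead
    (datetime(2013, 11, 14, 12), datetime(2013, 11, 14, 6)),  # issued 6 h late
]
print(summarize_lead_times(archive))  # (6.0, 1)
```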
Summary and discussion: Keep the data! Be clear about all forecasts! Know why you are verifying and for whom! Keep the verification simple but relevant! Just do it! Case studies: post-mortem.
Resources: The EUMETCAL training site on verification, computer-aided learning: sgcrs/index.htm. The website of the Joint Working Group on Forecast Verification Research. WMO/TD 1083: Guidelines on performance assessment of Public Weather Systems.