
Slide 1: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts
Ian Jolliffe and David Stephenson
EMS, September 2013

1. Sampling uncertainty and sampling schemes for (2x2) tables
2. Hit rate
3. Extensions – other measures and serial correlation

Slide 2: Binary deterministic forecasts
- Such forecasts are fairly common – forecast whether or not an event will occur.
- Their format leads to a (2x2) contingency table.

Slide 3: (2x2) table and some verification measures

                      Event observed   Event not observed      Total
  Event forecast      a (hits)         b (false alarms)        a + b
  Event not forecast  c (misses)       d (correct rejections)  c + d
  Total               a + c            b + d                   n

- Hit rate (H) = probability of detection = a/(a+c)
- False alarm rate (F) = probability of false detection = b/(b+d)
- Peirce's (1884) skill score (PSS) = H − F
- Proportion correct (PC) = (a+d)/n
- Frequency bias = (a+b)/(a+c)
- Critical success index (CSI) = threat score = a/(a+b+c)

... and many more: 18 in Chapter 3 (by Hogan & Mason) of Jolliffe and Stephenson (2012), Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd edition, Wiley.
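These definitions translate directly into code. A minimal sketch in Python (the function name and dictionary labels are mine, not from the slides):

```python
def verification_measures(a, b, c, d):
    """Verification measures for a (2x2) contingency table of counts."""
    n = a + b + c + d
    return {
        "H (hit rate)": a / (a + c),
        "F (false alarm rate)": b / (b + d),
        "PSS (Peirce skill score)": a / (a + c) - b / (b + d),
        "PC (proportion correct)": (a + d) / n,
        "frequency bias": (a + b) / (a + c),
        "CSI (threat score)": a / (a + b + c),
    }
```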

Slide 4: Uncertainty/inference for verification measures
- Given the value of some verification measure, some idea of its uncertainty is needed to make inferences, e.g. to construct confidence intervals.
- The example is a subset of the well-known Finley tornado data for May 1884. (The accompanying figure, not reproduced here, resamples from these data.)

                      Tornado observed   Tornado not observed   Total
  Tornado forecast                   3                     19      22
  Tornado not forecast               7                    511     518
  Total                             10                    530     540
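As a hedged illustration of such an interval, here is a naive Wald-type 95% confidence interval for the hit rate of this table, using the fixed-(a+c) variance estimate ac/(a+c)^3 introduced on slide 9 (the choice of a normal approximation is mine, and the talk's point is that this variance may understate the true uncertainty):

```python
from math import sqrt

a, b, c, d = 3, 19, 7, 511            # Finley tornado subset, May 1884
H = a / (a + c)                       # hit rate = 3/10 = 0.30
se = sqrt(a * c / (a + c) ** 3)       # binomial (fixed a+c) standard error
print(f"H = {H:.2f}, approx 95% CI: ({H - 1.96*se:.3f}, {H + 1.96*se:.3f})")
# H = 0.30, approx 95% CI: (0.016, 0.584)
```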

Slide 5: Sampling schemes

                      Event observed   Event not observed   Total
  Event forecast      a                b                    a + b
  Event not forecast  c                d                    c + d
  Total               a + c            b + d                n

Could have:
1. a, b, c, d all independent Poisson
2. n fixed; a, b, c, d multinomial
3. Row totals fixed or column totals fixed – independent binomials
4. Row totals and column totals fixed – hypergeometric

Which is most plausible? Does it make much difference?

Slide 6: Multinomial vs. binomial sampling
[Figure: histograms of the hit rate under multinomial sampling (left panel) and binomial sampling (right panel)]
- Binomial sampling has fixed a+c = 10, so the hit rate is always a multiple of 1/10.
- Multinomial sampling has additional variation in hit rates between the multiples of 1/10.
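A simulation sketch of the contrast in the figure; the value of θ_H (the probability of a hit given that an event occurred) is illustrative, not from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, theta_H, reps = 100, 0.10, 0.5, 50_000   # theta_H chosen for illustration

# Binomial scheme: a+c is fixed at n*s = 10, so H is always a multiple of 1/10
H_bin = rng.binomial(10, theta_H, reps) / 10

# Multinomial scheme: a+c varies from sample to sample, spreading H between the tenths
ac = rng.binomial(n, s, reps)
ac = ac[ac > 0]                                # drop samples with no observed events
H_mult = rng.binomial(ac, theta_H) / ac

print(H_bin.var(), H_mult.var())               # multinomial variance is the larger
```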


Slide 8: Sampling schemes (continued)
(2x2 table and sampling schemes 1–4 as on slide 5)
- The second scheme (n fixed; a, b, c, d multinomial) is the most plausible for much climate data – but you may disagree!
- Hogan & Mason (Chapter 3 of Jolliffe & Stephenson) give (approximate) variances for 16 measures, but they assume column totals fixed.

Slide 9: Variance of hit rate
- Hit rate (probability of detection) is H = a/(a+c).
- Suppose that (a+c) is fixed (binomial sampling) and that θ_H is the probability that the event was forecast, given that it occurred.
- Then var(H) = θ_H(1 − θ_H)/(a+c), which is estimated by ac/(a+c)^3 (substituting the plug-in estimate θ̂_H = a/(a+c) gives [a/(a+c)][c/(a+c)]/(a+c) = ac/(a+c)^3).
- The multinomial sampling scheme can be obtained in two stages: first sample (a+c) from a binomial with n trials and success probability equal to the probability of the event occurring (the base rate); then, given the sampled value of (a+c), sample a from a binomial with (a+c) trials and success probability θ_H.
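A quick numerical check of the fixed-(a+c) variance formula by simulation (θ_H = 0.3 mirrors a/(a+c) in the tornado table, but is otherwise an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
ac, theta_H = 10, 0.3                          # fixed count of observed events
a = rng.binomial(ac, theta_H, 200_000)         # hits over many hypothetical replications
H = a / ac
print(H.var(), theta_H * (1 - theta_H) / ac)   # empirical vs theoretical variance, ~0.021
```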

Slide 10: Variance of hit rate II
- It turns out that with multinomial sampling, var(H) = θ_H(1 − θ_H)/(a+c) is replaced by var(H) = θ_H(1 − θ_H)·E[1/(a+c)], with a slight abuse of notation.
- Using a variance expression based on fixed (a+c) ignores the variability in (a+c) that occurs under multinomial sampling.
- There is a complication: (a+c) can equal zero, making E[1/(a+c)] infinite. But data with (a+c) = 0 can be ignored, as they provide no information on the performance of the forecasts, so the expectation is taken conditional on (a+c) > 0.

Slide 11: Multinomial vs. binomial comparison for hit rate
- The table gives, for n = 100, the ratio of multinomial to binomial variances, ns·E[1/(a+c)], for various values of (a+c).
- The diagram (not reproduced here) shows this ratio for more values of (a+c) and three values of n.

     s    a+c    ns·E[1/(a+c)]
  0.02      2            1.153
  0.06      6            1.231
  0.10     10            1.115
  0.20     20            1.044
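The ratio ns·E[1/(a+c)], with the expectation taken conditional on (a+c) > 0, can be computed exactly. A sketch that should reproduce the tabulated values (and, called as variance_ratio(540, 10/540), roughly the 12.7% tornado figure quoted on the next slide):

```python
from math import comb

def variance_ratio(n, s):
    """Multinomial/binomial variance ratio n*s*E[1/X | X > 0], X ~ Binomial(n, s)."""
    pmf = [comb(n, k) * s**k * (1 - s)**(n - k) for k in range(n + 1)]
    e_inv = sum(pmf[k] / k for k in range(1, n + 1)) / (1 - pmf[0])
    return n * s * e_inv

for s in (0.02, 0.06, 0.10, 0.20):
    print(s, round(variance_ratio(100, s), 3))   # compare with the table above
```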

Slide 12: Multinomial vs. binomial comparison
- Inflation of variance for most values of (a+c).
- An exception for very small values of (a+c) – perhaps due to the frequently discarded zero values?
- Maximum inflation of around 30% occurs around (a+c) = 4.
- Inflation decreases towards 0 as (a+c) increases.
- The curves for different n are remarkably similar.
- For the tornado data, the multinomial variance is 12.7% larger than the binomial variance.

Slide 13: Extensions
- Only one measure (hit rate) has been examined here.
- Exactly the same reasoning can be used for other measures with a similar ratio form; modifications are needed for the rest.
- Serial correlation is another complication: the results above assume independence, which is not necessarily true, and it can have a bigger effect than the choice of sampling scheme.

Slide 14: Conclusions
- When reporting the value of a verification measure, it is important to quantify the uncertainty associated with that value.
- For the seemingly simple case of data in a (2x2) contingency table this is a surprisingly subtle task, because:
  – different sampling schemes lead to different variances;
  – serial correlation (or other forms of dependence) also changes variances.
- Some fairly general results can be found, but for many measures and situations tailor-made calculations may be needed.
- Notwithstanding the difficulties, the calculations should be done.

Slide 15: Questions? Comments?
i.t.jolliffe@exeter.ac.uk

Slide 16: Other verification measures
- Exactly the same reasoning gives multinomial-based variances for measures that are proportions, with the denominator a sum of cell counts and the numerator a sum of a subset of those counts, for example:
  – F = false alarm rate = b/(b+d)
  – J = threat score = a/(a+b+c)
- The variance comparison table for H can be reused:
  – for F, replacing (a+c) by (b+d);
  – for J, replacing (a+c) by (a+b+c).
- The comparison here is with an unrealistic sampling scheme, which nonetheless corresponds to a variance estimate given in the literature.
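A sketch of this substitution rule: the same fixed-denominator, plug-in variance estimate applied to each measure (the helper name is mine):

```python
def prop_var(num, denom):
    """Variance estimate for a proportion num/denom with denom treated as fixed:
    the plug-in form num*(denom - num)/denom**3 (i.e. ac/(a+c)^3 for H)."""
    return num * (denom - num) / denom**3

a, b, c, d = 3, 19, 7, 511                 # Finley tornado subset again
print(prop_var(a, a + c))                  # H: denominator a+c
print(prop_var(b, b + d))                  # F: denominator b+d
print(prop_var(a, a + b + c))              # J: denominator a+b+c
```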

Slide 17: Other verification measures II
- For proportion correct, there are exact analytic expressions for the variance under both binomial and multinomial sampling, which can be compared.
- For the tornado data, the percentage increases in variance for multinomial sampling, compared with the alternative scheme assumed in each case, are 12.7% (H), 3.4% (J) and 17.5% (PC).
- Asymptotic expressions are available for some other measures, but different approaches, possibly including simulation, are needed for exact values.

Slide 18: Serial correlation – another complication
- Everything so far has assumed independence of the n observations being forecast.
- This is not necessarily true – there may be serial correlation: rain today may be more likely if there was rain yesterday than if there was not.
- Serial correlation can have a bigger effect on the variance than assuming the wrong sampling scheme.

Slide 19: Serial correlation – an example
- Gabriel & Neumann (1962), QJRMS, 88, 90–95, give data on wet/dry days in Tel Aviv for 27 years of daily data, November–April.
- There is serial correlation – for example, in November the probability of a wet day following a wet (dry) day is 0.60 (0.13).
- To assess how much such serial correlation affects the variances of verification measures, we use Markov chain simulation.
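For a two-state Markov chain, the lag-1 correlation and the stationary base rate follow directly from the transition probabilities; this small check recovers the ρ = 0.47 and s ≈ 0.24 quoted for November on slide 23:

```python
p_ww, p_dw = 0.60, 0.13       # P(wet | wet yesterday), P(wet | dry yesterday), November

rho = p_ww - p_dw             # lag-1 autocorrelation of a binary Markov chain
s = p_dw / (1 - p_ww + p_dw)  # stationary probability of a wet day
print(rho, s)                 # 0.47 and ~0.245, close to the s = 0.24 quoted on slide 23
```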

Slide 20: Markov chain simulation
- Wilks (2010), QJRMS, 136, 2109–2118, considers probability forecasts and builds serial dependence directly into the forecasts.
- We consider binary deterministic forecasts, with dependence built directly into the observations and hence indirectly into the forecasts.
- We simulate from a two-state Markov chain for various values of n (sample size), s (base rate) and ρ (the serial correlation); a sketch of such a simulation follows.
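A minimal sketch of this kind of simulation, assuming (my reading of slide 20) that dependence enters through the observations while hits occur independently with probability θ_H given an event; θ_H is illustrative, and the talk's exact design may differ:

```python
import numpy as np

def simulate_events(n, s, rho, rng):
    """Two-state Markov chain with stationary event probability s, lag-1 correlation rho."""
    p11 = s + rho * (1 - s)          # P(event | event yesterday)
    p01 = s * (1 - rho)              # P(event | no event yesterday); note p11 - p01 = rho
    x = np.empty(n, dtype=bool)
    x[0] = rng.random() < s
    for t in range(1, n):
        x[t] = rng.random() < (p11 if x[t - 1] else p01)
    return x

def var_hit_rate(n, s, rho, theta_H, reps, rng):
    H = []
    for _ in range(reps):
        events = simulate_events(n, s, rho, rng).sum()
        if events:                   # discard replications with no observed events
            H.append(rng.binomial(events, theta_H) / events)
    return np.var(H)

rng = np.random.default_rng(2)
n, s, theta_H = 100, 0.10, 0.5       # theta_H chosen for illustration
inflation = (var_hit_rate(n, s, 0.5, theta_H, 20_000, rng)
             / var_hit_rate(n, s, 0.0, theta_H, 20_000, rng))
print(inflation)                     # ratio of variances with/without serial correlation
```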

Slide 21: Variance comparison with/without serial correlation for hit rate
- The table gives, for n = 100, the ratio of variances with/without serial correlation for various values of (a+c) and ρ.
- The diagram (not reproduced here) shows this ratio for more values of (a+c) and three values of n.

     s    a+c      ρ    variance ratio
  0.10     10   0.25             1.216
  0.10     10   0.75             1.876
  0.20     20   0.25             1.078
  0.20     20   0.75             1.548

Slide 22: Serial correlation – simulation results
- The ratio gets bigger as ρ increases.
- The largest values exceed those seen when comparing sampling schemes.
- For given n, things get worse as (a+c) decreases.
- Things get worse for lower base rates.

Slide 23: Serial correlation – examples
- The Gabriel/Neumann data have large n and moderate s and ρ, so the effect of serial correlation is small. For example, for November, ρ = 0.47, s = 0.24 and n = 810, leading to only a 1% increase in variance.
- For the May tornado data, n is again large (540) but s is much smaller (0.02). We do not know ρ, but if it were 0.5 the variance would be increased by about 30% by serial correlation.
- In reality, non-independence is likely to exist in the tornado data, but it will be more complex, involving both space and time.

