# Exeter, February 2008 1 The impenetrable hedge: some thoughts on propriety, equitability and consistency Ian Jolliffe Forecast verification Hedging Propriety.

## Presentation on theme: "Exeter, February 2008 1 The impenetrable hedge: some thoughts on propriety, equitability and consistency Ian Jolliffe Forecast verification Hedging Propriety."— Presentation transcript:

Exeter, February 2008 1 The impenetrable hedge: some thoughts on propriety, equitability and consistency Ian Jolliffe Forecast verification Hedging Propriety Equitability Consistency

2 Forecast verification Forecasts of weather and climate are made at all timescales. Forecasts of weather and climate are made at all timescales. It is desirable to verify/validate/assess forecasts It is desirable to verify/validate/assess forecasts –Science: A look back at the accuracy of forecasts is necessary to determine whether current forecasting methods should be continued, abandoned or modified. –Users: Decisions are based on past data but also on forecasts of data not yet observed.

3 Verification measures In a measures-oriented approach to forecast verification, the value of one or more measure that quantifies desirable properties of a set of forecasts is calculated. In a measures-oriented approach to forecast verification, the value of one or more measure that quantifies desirable properties of a set of forecasts is calculated. Often a variety of measures or scores is available for a set of forecasts. To choose between them, desirable properties of the measures themselves have been defined. This is called metaverification. Often a variety of measures or scores is available for a set of forecasts. To choose between them, desirable properties of the measures themselves have been defined. This is called metaverification. Such criteria include propriety, consistency and equitability, all of which relate to the avoidance of hedging. Such criteria include propriety, consistency and equitability, all of which relate to the avoidance of hedging.

4 Hedging Hedging has a variety of definitions, but is commonly taken in everyday use to mean ‘placing bets on the opposite side in order to cut losses or guarantee a minimum amount of winnings’. In other words a forecast allows more than one (conflicting) possibility. Hedging has a variety of definitions, but is commonly taken in everyday use to mean ‘placing bets on the opposite side in order to cut losses or guarantee a minimum amount of winnings’. In other words a forecast allows more than one (conflicting) possibility. The term is fairly well-known in meteorology, though not very often used in print. When it is, it is taken to mean that it occurs (Murphy, 1978) ‘whenever a forecaster’s judgement and forecast differ’. The term is fairly well-known in meteorology, though not very often used in print. When it is, it is taken to mean that it occurs (Murphy, 1978) ‘whenever a forecaster’s judgement and forecast differ’.

5 To hedge or not to hedge ‘A meteorologist who prepares probability forecasts should not “hedge,” i.e. the meteorologist’s probabilities should express his true beliefs’. ‘A meteorologist who prepares probability forecasts should not “hedge,” i.e. the meteorologist’s probabilities should express his true beliefs’. ‘A meteorologist whose forecasts are evaluated with a particular scoring system can, and should, be expected to “hedge” to obtain the best possible score.’ ‘A meteorologist whose forecasts are evaluated with a particular scoring system can, and should, be expected to “hedge” to obtain the best possible score.’ Both quotations express plausible positions. Both are from Murphy & Epstein (1967), the latter deriving from a panel discussion reported in BAMS (1952). Both quotations express plausible positions. Both are from Murphy & Epstein (1967), the latter deriving from a panel discussion reported in BAMS (1952).

6 Why hedge? Hedging is used to make some sort of gain over what can be achieved without hedging. Hedging is used to make some sort of gain over what can be achieved without hedging. In everyday usage the gain is financial. In everyday usage the gain is financial. In meteorology the gain is a better value or expected value of some score used to assess/verify forecasts. Hence hedging is ‘playing the score’. In meteorology the gain is a better value or expected value of some score used to assess/verify forecasts. Hence hedging is ‘playing the score’. To make both quotations compatible we can restrict ourselves to using scores for which hedging is impossible – we need proper, or consistent, or perhaps equitable, scores. To make both quotations compatible we can restrict ourselves to using scores for which hedging is impossible – we need proper, or consistent, or perhaps equitable, scores. Caveat: If missing an extreme event is far more costly than a ‘false alarm’, hedging by over-forecasting is actually desirable. This is because the concept of value comes into play. Caveat: If missing an extreme event is far more costly than a ‘false alarm’, hedging by over-forecasting is actually desirable. This is because the concept of value comes into play.

7 Finley’s tornados Tornado Observed Tornado not observed Total Tornado Forecast 28 (a) 72 (b) 100 Tornado not forecast 23 (c) 2680 (d) 2703 Total5127522803 Finley’s classic tornado forecasts: Consider the measure Proportion Correct (PC) = (a+d)/n = 2708/2803 = 0.966. The naïve strategy of always forecasting ‘no tornado’ does better (2752/2803 = 0.981), so PC can be hedged.

8 Value of Tornado Forecasts If a wrong forecast of any sort costs \$1K, then the cost of the forecasting system is \$95K, but the naive system costs only \$51K – so the forecast system should not be used. If a wrong forecast of any sort costs \$1K, then the cost of the forecasting system is \$95K, but the naive system costs only \$51K – so the forecast system should not be used. However, if a false alarm costs \$1K, but a tornado missed costs \$10K, then the system costs \$302K, but naivety costs \$510K. So the forecasting system saves \$208K compared to naivety. However, if a false alarm costs \$1K, but a tornado missed costs \$10K, then the system costs \$302K, but naivety costs \$510K. So the forecasting system saves \$208K compared to naivety. … not to mention lives. … not to mention lives. Following Finley’s article, Peirce (1884) not only devised an equitable score, but looked at ‘utility’, associating a ‘profit’ with hits and ‘loss’ with false alarms. Following Finley’s article, Peirce (1884) not only devised an equitable score, but looked at ‘utility’, associating a ‘profit’ with hits and ‘loss’ with false alarms.

9 Propriety (proper scores) Next they should be taught about equitability and consistency. Next they should be taught about equitability and consistency.

10 Propriety II For probability forecasts, a (strictly) proper scoring system is one for which the forecaster obtains the best possible expected score by forecasting his/her true beliefs (and only by doing so) – Murphy & Epstein, 1967. For probability forecasts, a (strictly) proper scoring system is one for which the forecaster obtains the best possible expected score by forecasting his/her true beliefs (and only by doing so) – Murphy & Epstein, 1967. It is not necessary to invoke a ‘true belief’ to argue for propriety – Broeker and Smith, 2007. It is not necessary to invoke a ‘true belief’ to argue for propriety – Broeker and Smith, 2007. The Brier score is the best known proper score – there are others (logarithmic, spherical), also plenty of theory and discussion. Equally, many scores (e.g. linear) are not proper. The Brier score is the best known proper score – there are others (logarithmic, spherical), also plenty of theory and discussion. Equally, many scores (e.g. linear) are not proper.

11 Propriety is irrelevant for deterministic forecasts Either –The forecaster’s true belief is probabilistic. Or –The forecast is certain his/her (deterministic) forecast is correct, so the best possible value of the score will be achieved for any score, and there is no incentive to hedge.

12 Equitability ‘All’ unskilled forecasts should have the same expected score. ‘All’ unskilled forecasts should have the same expected score. Not so obviously related to hedging as propriety. Not so obviously related to hedging as propriety. But if a score is not equitable, it can be hedged in the sense that a forecaster who knows (s)he has little skill may do better using an unskilled forecast with a better expected score. But if a score is not equitable, it can be hedged in the sense that a forecaster who knows (s)he has little skill may do better using an unskilled forecast with a better expected score.

13 Equitability – what is an ‘unskilled’ forecast? 1. Choosing a forecast by some completely random mechanism. 2. Always making the same forecast (but what if that forecast is ‘climatology’?) 3. Forecasting persistence?? Most definitions include 1 & 2, but not 3.

14 Propriety and equitability No scoring system for probability forecasts can be both proper and equitable. Given the choice, which would you prefer? –Propriety or equitability?

15 Equitability and probability forecasts Not only is equitability incompatible with propriety, but it is rather difficult to achieve equitability at all for probability forecasts. Not only is equitability incompatible with propriety, but it is rather difficult to achieve equitability at all for probability forecasts. Any score is a function S(d) of the difference d=f-o, where f is forecast probability and o the corresponding observation, which is always 0 or 1. Any score is a function S(d) of the difference d=f-o, where f is forecast probability and o the corresponding observation, which is always 0 or 1. If S(d) is required to be symmetric [S(d)=S(-d)], then equitability is impossible unless If S(d) is required to be symmetric [S(d)=S(-d)], then equitability is impossible unless –S(p) = S(1-p) for all p between 0 and 1 or –The base rate/climatology θ is equal to 0.5. The second of these conditions is very restrictive and the first makes no sense. The second of these conditions is very restrictive and the first makes no sense.

16 Equitability and probability forecasts II What about allowing non-symmetry of S(d)? It is then relatively easy to get an equitable score. What about allowing non-symmetry of S(d)? It is then relatively easy to get an equitable score. Arbitrarily set S(0) = 0, S(1) =1 (other ‘anchor points’ give qualitatively similar conclusions). Arbitrarily set S(0) = 0, S(1) =1 (other ‘anchor points’ give qualitatively similar conclusions). The nature of the asymmetry is determined by the base rate θ. The nature of the asymmetry is determined by the base rate θ. –For θ = 1/3, S(1) = 1 and S(-1) = 2. –For θ = 0.1, S(1) = 1 and S(-1) = 9. It is plausible that a highly asymmetric score might be needed when ‘value’ is taken into account; much less so that the asymmetry is determined solely by base rate … It is plausible that a highly asymmetric score might be needed when ‘value’ is taken into account; much less so that the asymmetry is determined solely by base rate … But perhaps penalising wrong forecasts a long way from base rate less than wrong forecasts close to base rate has some merit?? But perhaps penalising wrong forecasts a long way from base rate less than wrong forecasts close to base rate has some merit??

17 Consistency For deterministic forecasts ‘consistency’ takes the place of ‘propriety’. For deterministic forecasts ‘consistency’ takes the place of ‘propriety’. Like ‘hedging’ the meaning is slightly different from everyday usage. Like ‘hedging’ the meaning is slightly different from everyday usage. There are ‘consistent’ forecasts – those that correspond with the forecaster’s judgments i.e. the forecaster does not hedge (Murphy, 1993). There are ‘consistent’ forecasts – those that correspond with the forecaster’s judgments i.e. the forecaster does not hedge (Murphy, 1993). There are also ‘consistent’ performance measures (scores) … There are also ‘consistent’ performance measures (scores) …

18 Consistent scores For the definition of consistency given by Murphy & Daan (1985) we need to assume that any forecaster really has a probability distribution for the variable to be forecast and that a rule or directive determines the deterministic forecast to be made, given the forecaster’s probability distribution. For the definition of consistency given by Murphy & Daan (1985) we need to assume that any forecaster really has a probability distribution for the variable to be forecast and that a rule or directive determines the deterministic forecast to be made, given the forecaster’s probability distribution. Then a score is consistent with the directive if that score is minimised by forecasting using that directive. For example if the directive is ‘forecast the mean’ for a continuous variable, then mean square error is a consistent score as it is minimised by forecasting the mean. Then a score is consistent with the directive if that score is minimised by forecasting using that directive. For example if the directive is ‘forecast the mean’ for a continuous variable, then mean square error is a consistent score as it is minimised by forecasting the mean.

19 Consistent scores - remarks The assumption of how the forecaster behaves (forecast deterministically when his/her beliefs are probabilistic) implies that the forecaster always hedges in the sense that true beliefs are not forecast. The assumption of how the forecaster behaves (forecast deterministically when his/her beliefs are probabilistic) implies that the forecaster always hedges in the sense that true beliefs are not forecast. However, it is the opposite of hedging in its everyday usage (make forecasts less definite), and is not done to improve a score. However, it is the opposite of hedging in its everyday usage (make forecasts less definite), and is not done to improve a score. As a slight digression, Murphy (1978) argues that ‘the desire to eliminate hedging should encourage forecasters to express … forecasts in probabilistic terms’. As a slight digression, Murphy (1978) argues that ‘the desire to eliminate hedging should encourage forecasters to express … forecasts in probabilistic terms’. It seems to me that the definition of consistency could be turned around to say that a directive is consistent with a score, rather than a score consistent with a directive. Given any score, if we can find its consistent directive we can reverse the definition again to say that the score is consistent. It seems to me that the definition of consistency could be turned around to say that a directive is consistent with a score, rather than a score consistent with a directive. Given any score, if we can find its consistent directive we can reverse the definition again to say that the score is consistent.

20 Hedging for deterministic forecasts First, what do we mean by hedging? First, what do we mean by hedging? –For probability forecasts it implies improving ‘expected score’. But expectation is with respect to the forecaster’s true beliefs. For deterministic forecasts either the forecaster’s true beliefs are For deterministic forecasts either the forecaster’s true beliefs are –Deterministic (and clearly wrong) or –Probabilistic and unknown. So does hedging now imply improving actual score? Or is there another definition? So does hedging now imply improving actual score? Or is there another definition?

21 Back to equitability Although apparently not very useful for probability forecasts, equitability is often made a requirement for deterministic categorical forecasts. Although apparently not very useful for probability forecasts, equitability is often made a requirement for deterministic categorical forecasts. Does equitability rule out hedging when hedging implies improvement of actual score? Does equitability rule out hedging when hedging implies improvement of actual score? Does non-equitability necessarily imply that a score can be hedged? Does non-equitability necessarily imply that a score can be hedged?

22 Equitability – a conjecture Equitability ensures that hedging is impossible for deterministic categorical forecasts. Equitability ensures that hedging is impossible for deterministic categorical forecasts. It works for the Peirce skill score, an equitable score for binary forecasts It works for the Peirce skill score, an equitable score for binary forecasts –Transferring a proportion of forecasts of an event to forecasts of no event or, conversely, transferring a proportion of no-event forecasts to ‘event’, reduces the Peirce skill score. –But is this the only way that a forecaster can diverge from his/her true beliefs?

23 Non-equitable scores Non-equitable scores may or may not be hedged, depending on details of the data Non-equitable scores may or may not be hedged, depending on details of the data Consider ‘Proportion Correct’ in a (2x2) table, a non-equitable score, (a+d)/n Consider ‘Proportion Correct’ in a (2x2) table, a non-equitable score, (a+d)/n If a>b and d>c the score cannot be improved by transferring a proportion of forecasts to non-forecasts, or vice versa; otherwise it can. For the tornados, a b and d>c the score cannot be improved by transferring a proportion of forecasts to non-forecasts, or vice versa; otherwise it can. For the tornados, a < b, so PC can be hedged. Observe event Observe no event Forecast event ab Forecast no event cd