Fair scores for ensemble forecasts
Chris Ferro, University of Exeter
13th EMS Annual Meeting and 11th ECAM (10 September 2013, Reading, UK)

Evaluating ensemble forecasts
Multiple predictions, e.g. model simulations from several initial conditions. Want scores that favour ensembles whose members behave as if they and the observation are drawn from the same probability distribution.

Current practice is unfair
Current practice evaluates a proper scoring rule for the empirical distribution function of the ensemble. A scoring rule, s(p,y), for a probability forecast, p, and an observation, y, is proper if (for all p) the expected score, E_y{s(p,y)}, is optimized when y has distribution p. Proper scoring rules favour probability forecasts that behave as if the observations are randomly sampled from the forecast distributions.
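
To make propriety concrete, here is a minimal Python sketch (not from the original slides; the event probability q = 0.3 is an arbitrary illustrative choice). It evaluates the expected Brier score E_y{s(p,y)} under y ~ Bernoulli(q) for a grid of forecasts p and confirms that the expectation is smallest at p = q.

import numpy as np

q = 0.3                               # true event probability (illustrative choice)
p = np.linspace(0.0, 1.0, 101)        # candidate probability forecasts

# E_y{(p - y)^2} with y ~ Bernoulli(q) equals (1 - q) p^2 + q (p - 1)^2
# = (p - q)^2 + q (1 - q), so the expectation is minimized at p = q.
expected_brier = (1.0 - q) * p**2 + q * (p - 1.0)**2
print(f"expected Brier score minimized at p = {p[np.argmin(expected_brier)]:.2f}")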

Examples of proper scoring rules
Brier score: s(p,y) = (p – y)² for observation y = 0 or 1, and probability forecast 0 ≤ p ≤ 1.
Ensemble Brier score: s(x,y) = (i/n – y)², where i of the n ensemble members predict the event {y = 1}.
CRPS: s(p,y) = ∫ {p(t) – H(t – y)}² dt for real y and forecast p(t) = Pr(y ≤ t), where H is the Heaviside function.
Ensemble CRPS: s(x,y) = ∫ {i(t)/n – H(t – y)}² dt, where i(t) of the n ensemble members predict the event {y ≤ t}.
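
A minimal Python sketch of the two ensemble-based scores (function names and example values are my own). The ensemble CRPS is computed with the standard kernel identity ∫ {i(t)/n – H(t – y)}² dt = (1/n) Σ_i |x_i – y| – (1/(2n²)) Σ_i Σ_j |x_i – x_j|.

import numpy as np

def ensemble_brier(i, n, y):
    """Ensemble Brier score (i/n - y)^2, where i of n members predict the event."""
    return (i / n - y)**2

def ensemble_crps(x, y):
    """Ensemble CRPS via the kernel identity:
    (1/n) sum_i |x_i - y| - (1/(2 n^2)) sum_i sum_j |x_i - x_j|."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.mean(np.abs(x - y)) - np.abs(x[:, None] - x[None, :]).sum() / (2 * n**2)

# Illustration: a 4-member ensemble and a scalar observation.
x = np.array([-0.5, 0.1, 0.4, 1.2])
print(ensemble_brier(i=3, n=4, y=1))   # 3 of 4 members forecast the observed event
print(ensemble_crps(x, y=0.2))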

Example: ensemble CRPS
Observations y ~ N(0,1) and n ensemble members x_i ~ N(0,σ²) for i = 1, ..., n. Plot the expected value of the ensemble CRPS against σ. The ensemble CRPS is optimized when the ensemble is underdispersed (σ < 1).
[Figure: expected ensemble CRPS against σ, with one curve for each of n = 2, 4, 8.]
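
This experiment can be reproduced with a short Monte Carlo sketch (my own code; sample sizes and the σ grid are arbitrary choices, and it prints the minimizing σ rather than plotting the curves). The grid extends well below 1 so that the minimum is visible even for n = 2.

import numpy as np

rng = np.random.default_rng(0)

def mean_ensemble_crps(x, y):
    """Average ensemble CRPS over cases; x has shape (cases, n), y has shape (cases,)."""
    n = x.shape[1]
    term1 = np.abs(x - y[:, None]).mean(axis=1)
    term2 = np.abs(x[:, :, None] - x[:, None, :]).sum(axis=(1, 2)) / (2 * n**2)
    return np.mean(term1 - term2)

sigmas = np.linspace(0.2, 1.4, 25)
cases = 50000
for n in (2, 4, 8):
    y = rng.standard_normal(cases)               # observations ~ N(0,1)
    z = rng.standard_normal((cases, n))          # reused across sigma; scaled to N(0, sigma^2)
    scores = [mean_ensemble_crps(sigma * z, y) for sigma in sigmas]
    print(f"n = {n}: ensemble CRPS minimized near sigma = {sigmas[np.argmin(scores)]:.2f}")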

Fair scoring rules for ensembles
Interpret the ensemble as a random sample. Fair scoring rules favour ensembles that behave as if they and the observations are sampled from the same distribution. A scoring rule, s(x,y), for an ensemble forecast, x, sampled from p, and an observation, y, is fair if (for all p) the expected score, E_{x,y}{s(x,y)}, is optimized when y ~ p.
Fricker, Ferro and Stephenson (2013) Three recommendations for evaluating climate predictions. Meteorological Applications, 20 (open access).

Characterization: binary case
Let y = 1 if an event occurs, and let y = 0 otherwise. Let s_{i,y} be the (finite) score when i of the n ensemble members forecast the event and the observation is y. The (negatively oriented) score is fair if
(n – i)(s_{i+1,0} – s_{i,0}) = i(s_{i–1,1} – s_{i,1}) for i = 0, 1, ..., n
and s_{i+1,0} ≥ s_{i,0} for i = 0, 1, ..., n – 1.
Ferro (2013) Fair scores for ensemble forecasts. Submitted.
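
A small Python check (mine) that the fair Brier score introduced on the next slide satisfies this characterization: build s_{i,y} = (i/n – y)² – i(n – i)/{n²(n – 1)} and test both conditions; the ensemble size n = 8 is an arbitrary choice.

import numpy as np

n = 8

def s(i, y):
    # Fair Brier score s_{i,y} = (i/n - y)^2 - i(n - i)/{n^2 (n - 1)} (next slide).
    return (i / n - y)**2 - i * (n - i) / (n**2 * (n - 1))

# Fairness: (n - i)(s_{i+1,0} - s_{i,0}) = i(s_{i-1,1} - s_{i,1}); trivial at i = 0 and i = n.
fair = all(
    np.isclose((n - i) * (s(i + 1, 0) - s(i, 0)), i * (s(i - 1, 1) - s(i, 1)))
    for i in range(1, n)
)
# Monotonicity: s_{i+1,0} >= s_{i,0} for i = 0, ..., n - 1.
monotone = all(s(i + 1, 0) >= s(i, 0) for i in range(n))
print(fair, monotone)   # both True for the fair Brier score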

Examples of fair scoring rules
Ensemble Brier score: s(x,y) = (i/n – y)², where i of the n ensemble members predict the event {y = 1}.
Fair Brier score: s(x,y) = (i/n – y)² – i(n – i)/{n²(n – 1)}.
Ensemble CRPS: s(x,y) = ∫ {i(t)/n – H(t – y)}² dt, where i(t) of the n ensemble members predict the event {y ≤ t}.
Fair CRPS: if (x_1, ..., x_n) are the n ensemble members,
s(x,y) = (1/n) Σ_i |x_i – y| – {1/(2n(n – 1))} Σ_i Σ_j |x_i – x_j|.
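
A minimal Python sketch of the two fair scores (function names and example values are my own); the fair CRPS is coded directly from the kernel form above, i.e. the ensemble CRPS with the between-member term rescaled from n² to n(n – 1).

import numpy as np

def fair_brier(i, n, y):
    """Fair Brier score: (i/n - y)^2 - i(n - i)/{n^2 (n - 1)}."""
    return (i / n - y)**2 - i * (n - i) / (n**2 * (n - 1))

def fair_crps(x, y):
    """Fair CRPS: (1/n) sum_i |x_i - y| - {1/(2n(n - 1))} sum_i sum_j |x_i - x_j|."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    term1 = np.mean(np.abs(x - y))
    term2 = np.abs(x[:, None] - x[None, :]).sum() / (2 * n * (n - 1))
    return term1 - term2

# Illustration: same ensemble and observation as before; the corrections vanish
# for a deterministic ensemble (i = 0 or n, or all members equal).
x = np.array([-0.5, 0.1, 0.4, 1.2])
print(fair_brier(i=3, n=4, y=1))
print(fair_crps(x, y=0.2))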

Example: fair CRPS
Observations y ~ N(0,1) and n ensemble members x_i ~ N(0,σ²) for i = 1, ..., n. Plot the expected value of the fair CRPS against σ. The fair CRPS is always optimized when the ensemble is well dispersed (σ = 1).
[Figure: expected CRPS against σ; unfair-score curves for n = 2, 4, 8, and a single fair-score curve that is the same for all n.]
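
A companion Monte Carlo sketch (mine, mirroring the earlier one; sample sizes and the σ grid are arbitrary choices): estimate the expected fair CRPS on a grid of σ and report the minimizer, which should sit at σ = 1 for every n, up to sampling noise. The same random draws are reused across σ to keep the comparison stable.

import numpy as np

rng = np.random.default_rng(0)

def mean_fair_crps(x, y):
    """Average fair CRPS over cases; x has shape (cases, n), y has shape (cases,)."""
    n = x.shape[1]
    term1 = np.abs(x - y[:, None]).mean(axis=1)
    term2 = np.abs(x[:, :, None] - x[:, None, :]).sum(axis=(1, 2)) / (2 * n * (n - 1))
    return np.mean(term1 - term2)

sigmas = np.linspace(0.7, 1.3, 13)
cases = 50000
for n in (2, 4, 8):
    y = rng.standard_normal(cases)               # observations ~ N(0,1)
    z = rng.standard_normal((cases, n))          # reused across sigma; scaled to N(0, sigma^2)
    scores = [mean_fair_crps(sigma * z, y) for sigma in sigmas]
    print(f"n = {n}: fair CRPS minimized near sigma = {sigmas[np.argmin(scores)]:.2f}")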

Summary
Evaluate ensemble forecasts (not only probability forecasts) to learn about ensemble prediction systems.
Use fair scoring rules to favour ensembles whose members behave as if they and the observation are drawn from the same probability distribution.
Unfair scoring rules will favour ensembles whose members are drawn from mis-calibrated distributions.

Dependent ensemble members
A scoring rule, s(x,y), for an exchangeable ensemble, x, with marginal distribution p, and an observation, y, is fair if (for all p) the expected score is optimized when y ~ p.
Fair scores exist only for some dependence structures. We rarely know the 'correct' dependence structure for an ensemble, and using an estimate sacrifices fairness. Use scores that are fair for those dependence structures that may be adopted when using the ensemble.