…with snow
[Figure: ECMWF model, Chilbolton; panels: Winter (Oct-Mar), Summer (Apr-Sep)]
–ECMWF overpredicts low cloud in winter but not in summer
How good is a forecast?
Overview of talk
–Which skill scores have the most desirable properties?
–How does skill depend on spatial scale, lead time etc?
–If skill decays as an inverse exponential with forecast lead time, what is the half-life of the forecast?
–Most model comparisons evaluate the cloud climatology
–What about individual forecasts?
–Standard measure shows forecast half-life of ~8 days (left)
–But virtually insensitive to clouds!
[Figure: ECMWF 500-hPa geopotential anomaly correlation]
Joint PDFs of cloud fraction
–1 year from Murgtal
–DWD COSMO model
[Figure panels: Raw (1 hr) resolution; 6-hr averaging]
…or use a simple contingency table
a b
c d
Desirable properties of skill scores
Equitable: all random forecasts score zero
–This is essential!
–Note that forecasting the right climatology versus height but with no other skill should also score zero
Proper: not possible to hedge your bets
–Some scores reward under- or over-prediction (e.g. hit rate)
–Jolliffe and Stephenson: not possible to be equitable and proper!
Independence of how often cloud occurs
–Almost all scores asymptote to 0 or 1 for vanishingly rare events
Dependence on 10x10 joint PDF, not just 2x2 table
–Difference between cloud fraction of 0.9 and 1 is as important for radiation as a difference between 0 and 0.1
Linearity: so that an inverse exponential can be fitted
–Some scores (Yule's Q) saturate at the high-skill end
Three quite good scores
1. Log of odds ratio: LOR = ln(ad/bc)
–Good properness properties
–Unbounded: a perfect forecast scores infinity!
Generalized skill score = (x - x_random)/(x_perfect - x_random)
–Where x is any number derived from the joint PDF
–Resulting scores vary linearly from random=0 to perfect=1
2. Heidke skill score: x = a+d
–Monotonically related to the Equitable Threat Score, but more linear
3. Linear Brier score: x = mean absolute difference
–Sensitive to cloud fraction errors in model for all values of cloud fraction
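The scores on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the results in the talk. The function names are hypothetical, and the random-forecast value used for the Heidke skill score is the standard chance expectation of a+d computed from the table margins, which the slide does not spell out.

```python
import math

def log_odds_ratio(a, b, c, d):
    """LOR = ln(ad/bc); unbounded when b or c is zero (perfect forecast)."""
    return math.log((a * d) / (b * c))

def generalized_skill_score(x, x_random, x_perfect):
    """Rescale any quantity x so that random = 0 and perfect = 1."""
    return (x - x_random) / (x_perfect - x_random)

def heidke_skill_score(a, b, c, d):
    """Generalized skill score with x = a + d (number of correct forecasts)."""
    n = a + b + c + d
    # Assumed definition: a + d expected by chance, from the table margins.
    x_random = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    x_perfect = n  # every forecast correct
    return generalized_skill_score(a + d, x_random, x_perfect)

# Illustrative counts only (not data from the talk).
a, b, c, d = 30, 10, 10, 50
print(log_odds_ratio(a, b, c, d))
print(heidke_skill_score(a, b, c, d))
```

A table whose cells are the product of its margins (for example a=16, b=24, c=24, d=36) gives a Heidke skill score of zero, which is the equitability property asked for on the previous slide.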
Score versus lead time, Murgtal 2007
Both scores well fitted by S = S_0 exp(-t/t_0)
–Half-life = ln(2) t_0
Met Office NAE has higher scores than DWD COSMO
–But apparently a shorter half-life (~2.7 days versus ~4.1 days)
–Obviously need longer lead-time forecasts to check this!
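One simple way to obtain t_0 and the half-life is a straight-line least-squares fit to ln(S) against lead time. The sketch below assumes all scores are positive and uses synthetic values, not the Murgtal results. The function names are hypothetical, and the fitting method behind the slide's numbers is not stated in the talk.

```python
import math

def fit_exponential_decay(lead_times, scores):
    """Fit ln(S) = ln(S_0) - t/t_0 by least squares; return (S_0, t_0).

    Assumes every score is positive so that the logarithm is defined.
    """
    logs = [math.log(s) for s in scores]
    n = len(lead_times)
    t_mean = sum(lead_times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in lead_times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(lead_times, logs))
    slope = sxy / sxx                   # equals -1/t_0
    intercept = y_mean - slope * t_mean  # equals ln(S_0)
    return math.exp(intercept), -1.0 / slope

def half_life(t0):
    """Lead time at which the score has fallen to half of S_0."""
    return math.log(2) * t0

# Synthetic scores generated with S_0 = 0.6 and t_0 = 4 days (illustration only).
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
scores = [0.6 * math.exp(-t / 4.0) for t in times]
s0, t0 = fit_exponential_decay(times, scores)
print(s0, t0, half_life(t0))
```

Fitting in log space weights small scores more heavily than a direct nonlinear fit would, so the two approaches can give somewhat different half-lives on noisy data.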
DWD COSMO versus hours averaged
Skill and lead time both increase with the number of hours over which cloud fraction is averaged
–Larger-scale features are easier to forecast
Met Office versus hours averaged
Statistics poorer for larger number of hours averaged
–Log of odds ratio and Heidke skill score are sensitive to cloud fraction threshold
–Linear Brier score considers all cloud fractions so more robust
Summary
Half-life of a cloud forecast is between 2.5 and 5 days
–Relatively insensitive to skill score (provided a good one is used)
–Compare to ~8 days for ECMWF 500-hPa geopotential height forecast
–Skill at forecasting cloud increases somewhat for larger-scale features
Important to assess the merits of various skill scores
–At least 5 criteria to judge against, and no score is good on all of them
–Plenty of bad ones to use (hit rate, false-alarm rate etc)!
–Worth trying Stephenson's Extreme Dependency Score, which is good for very rare events
Wish list
–Obtain Met Office cloud forecasts beyond a lead time of 3 days
–Compare skill of the Met Office model at different model resolutions, but averaged to the same scale
–Can we see what skill comes from global model at boundaries, what comes from mesoscale data assimilation etc?
Contingency tables
Comparison with Met Office model over Chilbolton, October 2003
                   Observed cloud    Observed clear-sky
Model cloud        A: Cloud hit      B: False alarm
Model clear-sky    C: Miss           D: Clear-sky hit
Simple skill score: Hit Rate
Misleading: fewer cloud events so skill is only in predicting clear skies
–Models which underestimate cloud will do better than they should
[Figure panels: Met Office short range forecast; Météo France old cloud scheme]
Scores independent of clear-sky hits
False alarm rate: fraction of forecasts of cloud which are wrong = B/(A+B)
–Perfect forecast is 0
Probability of detection: fraction of clouds correctly forecast = A/(A+C)
–Perfect forecast is 1
Skill decreases as cloud fraction threshold increases
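Both quantities follow directly from the contingency-table counts A (cloud hits), B (false alarms) and C (misses). The sketch below uses hypothetical function names and illustrative counts. Note that B/(A+B) is called the false alarm ratio in some verification texts; the slide's term is kept here.

```python
def false_alarm_rate(hits, false_alarms):
    """Fraction of cloud forecasts that were wrong: B / (A + B)."""
    return false_alarms / (hits + false_alarms)

def probability_of_detection(hits, misses):
    """Fraction of observed clouds that were forecast: A / (A + C)."""
    return hits / (hits + misses)

# Illustrative counts only (not data from the talk).
A, B, C = 30, 10, 10
print(false_alarm_rate(A, B))          # 0.25
print(probability_of_detection(A, C))  # 0.75
```

Neither function takes D, the number of clear-sky hits, which is what makes these scores independent of how often clear sky is correctly forecast.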
More sophisticated scores
Equitable threat score = (A-E)/(A+B+C-E), where E removes those hits that occurred by chance
–For both scores, 1 = perfect forecast, 0 = random forecast
From now on use Equitable threat score with threshold of 0.1
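A minimal sketch of the Equitable threat score follows. The slide does not give a formula for E; the code assumes the commonly used definition E = (A+B)(A+C)/n, the number of hits expected if forecasts and observations were independent. The function name and the counts are illustrative.

```python
def equitable_threat_score(A, B, C, D):
    """ETS = (A - E) / (A + B + C - E), with E the hits expected by chance."""
    n = A + B + C + D
    # Assumed definition of E: product of the table margins divided by n.
    E = (A + B) * (A + C) / n
    return (A - E) / (A + B + C - E)

# Illustrative counts only (not data from the talk).
print(equitable_threat_score(30, 10, 10, 50))
# A table with independent forecasts and observations scores zero:
print(equitable_threat_score(16, 24, 24, 36))
```

With this choice of E, a forecast that is right only as often as chance allows scores 0 and a forecast with no false alarms or misses scores 1, matching the range stated on the slide.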
Skill versus height
Model performance:
–ECMWF, RACMO, Met Office models perform similarly
–Météo France not so well, much worse before April 2003
–Met Office model significantly better for shorter lead time
Potential for testing:
–New model parameterisations
–Global versus mesoscale versions of the Met Office model
Monthly skill versus time
Measure of the skill of forecasting cloud fraction > 0.05
–Comparing models using similar forecast lead time
–Compared with the persistence forecast (yesterday's measurements)
Lower skill in summer convective events
Skill versus lead time
Unsurprisingly UK model most accurate in UK, German model most accurate in Germany!
Half-life of cloud forecast ~2 days
More challenging test than 500-hPa geopotential (half-life ~8 days)