Two and a half problems in homogenization of climate series: concluding remarks to Daily Stew. Ralf Lindau.

These are: Breaks in German climate stations are too small to be detected, but they are large enough to influence the trend significantly (a rather disturbing finding). Knowledge of the break positions alone is not sufficient to determine the trend accurately: the correction part of homogenization algorithms is essential, and break-aware information is not enough (contrary to what was originally assumed in Daily Stew). If breaks are partly systematic (non-zero overall mean), they induce a spurious mean trend. Is the corresponding break variance large enough to be detectable? Hardly. Dipdoc Seminar – 16. June 2014

Internal and External Variance Consider the difference between one station and a neighboring reference station. The dominating natural variance cancels out, because it is very similar at both stations. Breaks become visible as abrupt changes in the station-minus-reference time series. Internal variance (noise): within the subperiods. External variance (signal): between the means of different subperiods. Break criterion: maximum external (explained) variance.
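The split into internal and external variance can be sketched as follows. This is a minimal illustration, not the code used for the talk; the function name `split_variance`, the convention that `breaks` lists the indices where a new subperiod starts, and the toy series are all assumptions made here.

```python
import numpy as np

def split_variance(diff, breaks):
    """Split the variance of a difference series into internal and external parts.

    diff   : station-minus-reference series
    breaks : sorted indices at which a new subperiod starts (assumed convention)
    """
    diff = np.asarray(diff, dtype=float)
    edges = [0, *breaks, len(diff)]
    total_mean = diff.mean()
    internal = 0.0  # scatter within the subperiods (noise)
    external = 0.0  # scatter of the subperiod means (signal)
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = diff[start:stop]
        internal += np.sum((segment - segment.mean()) ** 2)
        external += len(segment) * (segment.mean() - total_mean) ** 2
    n = len(diff)
    return internal / n, external / n

# Toy series with one jump of 1.0 after the third value (made-up numbers).
series = np.array([0.0, 0.2, -0.2, 1.0, 1.2, 0.8])
internal, external = split_variance(series, breaks=[3])
print(internal, external, np.var(series))
```

The two parts add up to the total variance, so maximizing the external variance is equivalent to minimizing the internal one.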

Explained Variance vs. True Skill X-axis: Normally, we rely on the external or explained variance. Y-axis: For simulated data the true skill is known (measured as the mean squared difference, RMS², between true and proposed signal). For an SNR of ½ the two measures are only weakly correlated.

RMS Standard vs. arbitrary The skills of standard search and an arbitrary segmentation are comparable. Obviously, the standard search is mainly optimizing the noise, producing completely random results.

Which SNR is sufficient? So far we considered SNR = ½, where random segmentation and standard search have comparable skills. RMS skill for different SNRs (plot symbols: 0 = random segmentation, + = standard search): for SNR > 1, the standard search is significantly better.

Conclusion 1a Break search algorithms rely on the explained variance to identify the breakpoints. For signal-to-noise ratios of ½, the explained variance is not a good measure of the true skill. Consequently, the obtained segmentations do not differ significantly from random ones. However, for higher SNR the method works.
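The comparison behind this conclusion can be sketched with simulated data. The code below is an illustration under stated assumptions, not the procedure used for the slides: it takes "standard search" to mean the segmentation with maximum explained variance for a fixed number of breaks (found here by dynamic programming), simulates break levels with standard deviation 0.5 and noise with standard deviation 1 (SNR of roughly ½), and measures true skill as the mean squared deviation from the known signal. All function names, the series length, and the number of breaks are hypothetical.

```python
import numpy as np

def segment_cost(prefix, prefix_sq, i, j):
    """Internal sum of squares of x[i:j], computed from prefix sums."""
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

def optimal_breaks(x, k):
    """Positions of k breaks that minimize the internal sum of squares."""
    n = len(x)
    prefix = np.concatenate([[0.0], np.cumsum(x)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(x ** 2)])
    best = np.full((k + 1, n + 1), np.inf)  # best[m, j]: x[:j] with m breaks
    last = np.zeros((k + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        best[0, j] = segment_cost(prefix, prefix_sq, 0, j)
    for m in range(1, k + 1):
        for j in range(m + 1, n + 1):
            for i in range(m, j):
                cost = best[m - 1, i] + segment_cost(prefix, prefix_sq, i, j)
                if cost < best[m, j]:
                    best[m, j] = cost
                    last[m, j] = i
    breaks, j = [], n
    for m in range(k, 0, -1):  # backtrack from the end of the series
        j = last[m, j]
        breaks.append(int(j))
    return sorted(breaks)

def piecewise_mean(x, breaks):
    """Step function that replaces each subperiod by its mean."""
    fitted = np.empty(len(x))
    edges = [0, *breaks, len(x)]
    for a, b in zip(edges[:-1], edges[1:]):
        fitted[a:b] = x[a:b].mean()
    return fitted

rng = np.random.default_rng(0)
n, k = 100, 5
true_breaks = np.sort(rng.choice(np.arange(1, n), size=k, replace=False))
levels = rng.normal(0.0, 0.5, size=k + 1)            # break signal
signal = np.repeat(levels, np.diff([0, *true_breaks, n]))
x = signal + rng.normal(0.0, 1.0, size=n)            # add noise

found = optimal_breaks(x, k)
random_breaks = sorted(rng.choice(np.arange(1, n), size=k, replace=False).tolist())

for name, breaks in [("standard", found), ("random", random_breaks)]:
    fitted = piecewise_mean(x, breaks)
    explained = np.var(fitted)                # external (explained) variance
    skill = np.mean((fitted - signal) ** 2)   # true skill, known only in simulation
    print(name, round(explained, 3), round(skill, 3))
```

By construction the standard search never explains less variance than the random segmentation; whether its true skill is also better depends on the SNR and has to be judged over many simulated series, not a single one.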

A priori formula The different reactions of breaks and noise to randomly inserted breaks make it possible to estimate break variance and break number a priori. If we insert many breaks, almost the entire break variance is explained, plus a known fraction of the noise. At $k = n_k$ half of the break variance is reached (22.8% in total).

Break variance Repeated for all station pairs, we find a mean break variance of about 0.2. Thus the ratio of break to noise variance is 0.2 / 0.8 = ¼, and the signal-to-noise ratio is SNR = ½.
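Written out, the step from the variance ratio to the SNR looks as follows. This assumes that the total variance of the difference series is normalized to 1 and that the SNR is defined as the ratio of standard deviations, which is what the numbers on the slide imply; the symbols $\sigma_b^2$ and $\sigma_n^2$ are introduced here for illustration.

```latex
\sigma_b^2 \approx 0.2, \qquad \sigma_n^2 = 1 - \sigma_b^2 \approx 0.8,
\qquad
\frac{\sigma_b^2}{\sigma_n^2} = \frac{0.2}{0.8} = \frac{1}{4},
\qquad
\mathrm{SNR} = \sqrt{\frac{\sigma_b^2}{\sigma_n^2}} = \sqrt{\frac{1}{4}} = \frac{1}{2}.
```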

Conclusions 1b For monthly temperature at German climate stations, an a priori method estimates the SNR at just that ominous value of ½. Consequently, breaks are hardly detectable in these data.

Trends Some old-fashioned researchers are still interested in the linear trend of climate series. (Modern ones, of course, in higher-order two-point statistics at the highest resolution possible etc. ;-) Trend errors can easily be estimated by just considering the difference time series between two neighboring stations. Advantage: no need to apply a full, complicated homogenization algorithm. Any non-zero trend difference directly gives a measure of the uncertainty of trends.

Trend difference (one pair) Difference time series of the monthly temperature anomaly between the stations Aachen and Essen. The thick line denotes the 2-year running mean, the thin line is the linear trend. For neighboring stations the trend difference should be zero. Non-zero trends reflect errors (probably due to inhomogeneities). Here, the trend is K / cty. The pure statistical uncertainty is small with K / cty.
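The calculation described on this slide can be sketched as follows. The data are synthetic, not the Aachen and Essen records; the helper names `trend_per_century` and `running_mean`, the noise levels, and the inserted 0.5 K jump are assumptions chosen only to show the mechanics.

```python
import numpy as np

def trend_per_century(monthly_values):
    """Least-squares linear trend of a monthly series, in units per century."""
    years = np.arange(len(monthly_values)) / 12.0
    slope_per_year, _intercept = np.polyfit(years, monthly_values, 1)
    return slope_per_year * 100.0

def running_mean(values, window=24):
    """Running mean over `window` months (24 months = 2 years)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

rng = np.random.default_rng(2)
n_months = 600
climate = rng.normal(0.0, 2.0, n_months)                # shared natural variability
station_a = climate + rng.normal(0.0, 0.3, n_months)
station_b = climate + rng.normal(0.0, 0.3, n_months)
station_b[300:] += 0.5                                  # hypothetical inhomogeneity

difference = station_a - station_b                      # shared climate signal cancels
trend = trend_per_century(difference)
smooth = running_mean(difference)
print(f"trend of the difference series: {trend:.2f} K / cty")
```

Because the shared climate signal cancels in the difference, any remaining trend comes from station-specific effects such as the inserted jump.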

Trend differences (all stations) Trend differences of neighboring stations reflect the true uncertainty of trends (position of crosses). Statistical errors calculated by assuming homogeneous data are much smaller (vertical extent of crosses). We conclude that the data are strongly influenced by breaks.

Conclusions I For signal-to-noise ratios of ½, standard break search algorithms are not superior to random segmentations. They do not work. For monthly temperature at German climate stations, an a priori method estimates the SNR at just this ½. Although the relative break variance might be small (SNR = ½), breaks influence the trend estimates strongly. This is a dilemma: the breaks are too small to be detected, but large enough to influence the trend significantly.

Part II Is a break-aware approach adequate to determine trends? No.

Break-aware idea Breaks are only detected, but not corrected. Calculate the mean trend over all homogeneous subperiods (omitting the known breakpoints). This trend should reflect the true trend.
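A minimal sketch of the break-aware calculation, assuming that the "mean trend over all homogeneous subperiods" is the pooled within-subperiod regression (each subperiod gets its own mean, all share one slope). The function name `internal_trend` and the noise-free toy series are hypothetical.

```python
import numpy as np

def internal_trend(t, y, breaks):
    """Pooled slope within the homogeneous subperiods (break positions known)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = [0, *breaks, len(y)]
    cov_sum = 0.0
    var_sum = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        dt = t[a:b] - t[a:b].mean()   # time relative to the subperiod mean
        dy = y[a:b] - y[a:b].mean()   # data relative to the subperiod mean
        cov_sum += np.sum(dt * dy)
        var_sum += np.sum(dt * dt)
    return cov_sum / var_sum

# Noise-free toy series: true trend 0.02 per step plus one jump of 5 at t = 50.
t = np.arange(100.0)
y = 0.02 * t + 5.0 * (t >= 50)
break_aware = internal_trend(t, y, breaks=[50])
naive = np.polyfit(t, y, 1)[0]        # total trend, ignoring the break
print(break_aware, naive)
```

This noise-free case only shows how the quantity is computed. It says nothing about how well the approach performs on noisy data with many breaks, which is the question this part of the talk answers with "No".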

Internal/External Covariance The trend is the regression of the data y (dependent) on the time x (independent). Analogous to the variance, the covariance can also be split into an external and an internal part: Cov = C + c and Var = V + v, where capital letters denote the external and lowercase letters the internal parts.

Total trend as weighted average Total trend: (C + c) / (V + v). External trend: C / V. Internal trend: c / v. The total trend is the weighted average of the external and the internal trend, with weights V and v.

Total trend as weighted average The variance of the time, $\sigma_x^2$, depends quadratically on the length T of the subperiods. The subperiod length T is reciprocal to the subperiod number N. The internal trend influences the total trend only marginally: if, e.g., N-1 = 5 breaks are contained, only by a factor of 1/36.
breaks  N  v    V
1       2  1/4  3/4
2       3  1/9  8/9
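Both statements, the weighted-average identity and the 1/N² weight of the internal trend, can be checked numerically. The sketch below assumes equally long subperiods and uses simulated data; the function name `split_moments` and all parameter values are hypothetical.

```python
import numpy as np

def split_moments(x, y, n_sub):
    """External (C, V) and internal (c, v) covariance and variance
    for n_sub consecutive subperiods."""
    n = len(x)
    C = c = V = v = 0.0
    for xi, yi in zip(np.array_split(x, n_sub), np.array_split(y, n_sub)):
        c += np.sum((xi - xi.mean()) * (yi - yi.mean())) / n
        v += np.sum((xi - xi.mean()) ** 2) / n
        C += len(xi) * (xi.mean() - x.mean()) * (yi.mean() - y.mean()) / n
        V += len(xi) * (xi.mean() - x.mean()) ** 2 / n
    return C, c, V, v

rng = np.random.default_rng(1)
n, n_sub = 600, 6                          # 6 subperiods, i.e. 5 breaks
x = np.arange(n, dtype=float)              # time
y = np.repeat(rng.normal(0.0, 0.5, n_sub), n // n_sub) + rng.normal(0.0, 1.0, n)

C, c, V, v = split_moments(x, y, n_sub)
external_trend = C / V
internal_trend = c / v
weighted = (V * external_trend + v * internal_trend) / (V + v)
total_trend = np.polyfit(x, y, 1)[0]       # ordinary regression slope
internal_weight = v / (V + v)              # close to 1 / n_sub**2

print(weighted, total_trend)
print(internal_weight, 1 / n_sub ** 2)
```

For a finite number of time steps the internal weight is only approximately 1/N²; the approximation improves as the subperiods get longer.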

Conclusion II

Part 2 ½ Scenario A certain change in the measurement technique causes a positive jump at many stations, hidden among many other jumps that are random. Only such systematic breaks are critical, as they induce a mean spurious trend into the data. What is the relation between the causing jump height (and its corresponding break variance) and the induced trend? Are all relevant jump heights detectable?

Systematic breaks induce trends

Some numbers from data

Conclusion 2 ½ Ok, the effect may vary from station to station, so it will sometimes be large enough to be detectable. But this is like trying to estimate the mean of an entire distribution from just a few extremes. And what if the variance is small? So small that we cannot see any extreme, or so small that we see just one?