ATTILAH: Adjustment of Temperature Trends In Landstations After Homogenization
URIAH HEAT: Unavoidably Remaining Inaccuracies After Homogenization Heedfully Estimating the Adjustments of Temperature trends

Break and Noise Variance
Homogenization: To homogenize, we consider the difference time series between two neighboring stations. The dominating natural variance cancels out because it is very similar at both stations. The relative break variance is thereby increased, and we have a realistic chance to detect the breaks.
General procedure: Random combinations of test breaks are inserted. The combination explaining the maximum variance is considered to indicate the true breaks.
Technical application: Dynamic programming with a stop criterion.
[Figure: break variance and noise variance]
Dipdoc Seminar – 23.10.2017
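The slide names dynamic programming as the tool for finding the test-break combination that explains the maximum variance. Below is a minimal sketch of a standard optimal-segmentation dynamic program, assuming that the explained variance is the between-segment share of the total sum of squares. The function name `optimal_segmentation` is hypothetical, and the stop criterion that selects the number of breaks is not included (k is passed in by the caller).

```python
def optimal_segmentation(x, k):
    """Return (breaks, v) for the k test breaks that maximize the explained variance.

    A break at position p means that a new segment starts at index p.
    Maximizing the between-segment (explained) sum of squares is the same as
    minimizing the summed within-segment sum of squares, which dynamic
    programming solves exactly in O(k * m^2) time.
    """
    m = len(x)
    # Prefix sums give each segment's sum of squared deviations in O(1).
    s1, s2 = [0.0] * (m + 1), [0.0] * (m + 1)
    for i, value in enumerate(x):
        s1[i + 1] = s1[i] + value
        s2[i + 1] = s2[i] + value * value

    def sse(a, b):
        """Squared deviations of x[a:b] from its own mean."""
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    inf = float("inf")
    # cost[j][t]: minimal within-segment SSE of x[:t] split into j + 1 segments
    cost = [[inf] * (m + 1) for _ in range(k + 1)]
    last = [[0] * (m + 1) for _ in range(k + 1)]
    for t in range(1, m + 1):
        cost[0][t] = sse(0, t)
    for j in range(1, k + 1):
        for t in range(j + 1, m + 1):
            for s in range(j, t):
                c = cost[j - 1][s] + sse(s, t)
                if c < cost[j][t]:
                    cost[j][t], last[j][t] = c, s
    # Walk back through the stored start positions of the last segments.
    breaks, t = [], m
    for j in range(k, 0, -1):
        t = last[j][t]
        breaks.append(t)
    return sorted(breaks), 1.0 - cost[k][m] / sse(0, m)


print(optimal_segmentation([0.0] * 30 + [5.0] * 40 + [0.0] * 30, 2))  # ([30, 70], 1.0)
```

The triple loop costs O(k·m²), which is cheap for annual series of a few hundred values; choosing k itself is the job of the stop criterion mentioned on the slide.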

Trend bias
If the positive and negative jumps do not cancel each other out, they introduce a trend bias.
Underestimation of trend bias: It is impossible to isolate the full break signal from the noise. Thus, only part of it can be corrected. A small fraction remains, which has to be corrected after homogenization.
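To illustrate how unbalanced jumps turn into a trend bias, the sketch below computes the least-squares trend of a pure break signal. The helper names and jump positions are hypothetical; the point is only that symmetrically placed, cancelling jumps leave the trend unchanged while a single jump does not.

```python
def ols_slope(y):
    """Least-squares linear trend of y per time step."""
    m = len(y)
    t_mean = (m - 1) / 2
    y_mean = sum(y) / m
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(m))
    return num / den


def step_signal(m, jumps):
    """Break signal of length m; `jumps` maps break position -> jump height."""
    level, signal = 0.0, []
    for t in range(m):
        level += jumps.get(t, 0.0)
        signal.append(level)
    return signal


balanced = step_signal(100, {25: +1.0, 75: -1.0})  # jumps cancel, placed symmetrically
unbalanced = step_signal(100, {50: +1.0})           # a single positive jump

print(ols_slope(balanced), ols_slope(unbalanced))
```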

Underestimation of jump height
The two thick horizontal lines indicate the true jump height. Errors occur when the noise randomly (and erroneously) raises the data above the middle line. Then a part of segment 2 is erroneously assigned to segment 1.
Correct detection: x1 and x2 are determined as segment averages. x1 is nearly correct, but x2 is too high.
Incorrect detection: x1' and x2' are determined as segment averages. x2' is nearly correct, but x1' is too low.
In both cases the jump height is underestimated.
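A schematic numeric version of the two cases, with hypothetical levels: segment 1 lies at 1, segment 2 at 0, and five values just after the true break have been lifted by noise to 0.8, above the middle line at 0.5. All other noise is left out so that the bias is easy to see.

```python
def seg_mean(values):
    return sum(values) / len(values)


# Hypothetical schematic: true levels 1 and 0, true break at t = 50, true jump = 1.
# Noise has lifted the first five values of segment 2 to 0.8 (above the middle line).
x = [1.0] * 50 + [0.8] * 5 + [0.0] * 45
true_jump = 1.0

# Correct detection (break at 50): x1 is correct, x2 is too high.
x1, x2 = seg_mean(x[:50]), seg_mean(x[50:])
# Incorrect detection (break at 55): x2' is correct, x1' is too low.
x1_p, x2_p = seg_mean(x[:55]), seg_mean(x[55:])

print(x1 - x2, x1_p - x2_p)  # both estimates fall short of true_jump
```

The correct break gives a jump of 0.92, the misplaced one 54/55 ≈ 0.98; both are smaller than the true jump of 1.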

Obviously, this systematic underestimation depends on the interaction between noise and break variance. To quantify the effect, the statistical properties of both the break and the noise variance have to be known.
Nomenclature:
k: number of test breaks (here: 3)
n: number of true breaks (here: 7)
m: total length (here: 100)
l: test segment length (here: 14, 4, etc.)

Statistical Characteristics of the Noise Variance: Beta Distributed

Example for noise variance
We insert k = 3 random test breaks and check the variance they are able to explain. Since the series is pure noise, the means of the test segments are very close to zero. However, they scatter slightly at random, and this scatter is the explained variance.

Statistics of the noise variance
We insert k = 3 test breaks at random positions into a random noise time series and calculate the explained variance. This procedure is repeated 1000 × 1000 times (1000 time series, each combined with 1000 random sets of three test break positions).
Relative explained variance: $v = \frac{\text{explained variance}}{\text{total variance}}$
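A reduced-size sketch of this experiment in Python, using 400 series × 10 random test-break combinations instead of 1000 × 1000 to keep it fast. The function names are hypothetical; the explained variance is taken as the between-segment sum of squares over the total sum of squares, and the simulated mean can be compared with k/(m−1) from the Beta mean on the following slides.

```python
import random


def explained_variance(x, breaks):
    """Relative explained variance v of segmenting x at the given breaks.

    `breaks` are the start indices of segments 2 .. k+1. v is the between-segment
    sum of squares divided by the total sum of squares about the series mean.
    """
    m = len(x)
    grand = sum(x) / m
    total = sum((value - grand) ** 2 for value in x)
    bounds = [0] + sorted(breaks) + [m]
    between = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg_mean = sum(x[a:b]) / (b - a)
        between += (b - a) * (seg_mean - grand) ** 2
    return between / total


def mean_noise_v(m=100, k=3, n_series=400, n_trials=10, seed=0):
    """Average v over random test-break combinations in pure N(0,1) noise."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_series):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(n_trials):
            breaks = rng.sample(range(1, m), k)  # k distinct positions, no empty segment
            values.append(explained_variance(x, breaks))
    return sum(values) / len(values)


v_sim = mean_noise_v()
print(v_sim, 3 / 99)  # simulated mean versus k / (m - 1)
```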

Beta distribution
Probability density:
$P(v) = \frac{v^{\alpha-1}\,(1-v)^{\beta-1}}{B(\alpha,\beta)}$
with the Beta function
$B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
For noise, the shape parameters are
$\alpha = \frac{k}{2}, \qquad \beta = \frac{m-1-k}{2}$
where k denotes the number of test breaks and m the total length.
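The density can be evaluated with the log-Gamma function, which avoids overflow of Γ for larger shape parameters. A minimal sketch with hypothetical helper names, assuming the noise parameters with k = 3 and m = 100; the midpoint-rule integration is only a numerical sanity check that the density integrates to one and has mean α/(α+β).

```python
import math


def beta_pdf(v, a, b):
    """Beta density P(v) = v^(a-1) (1-v)^(b-1) / B(a, b) for 0 < v < 1."""
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(v) + (b - 1) * math.log1p(-v) - log_beta_fn)


def noise_shape(k, m):
    """Shape parameters of the explained noise variance: k test breaks, length m."""
    return k / 2, (m - 1 - k) / 2


def midpoint_integral(f, n=100_000):
    """Midpoint-rule integral of f over (0, 1); never evaluates the end points."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


a, b = noise_shape(3, 100)
print(midpoint_integral(lambda v: beta_pdf(v, a, b)))      # should be close to 1
print(midpoint_integral(lambda v: v * beta_pdf(v, a, b)))  # should be close to 3/99
```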

Behavior of Noise
The mean of a Beta distribution is given by
$\bar{v} = E(v) = \frac{\alpha}{\alpha+\beta}$
With $\alpha = k/2$ and $\beta = (m-1-k)/2$, the mean explained variance is
$\bar{v} = \frac{k}{m-1}$
and the maximum explained variance is
$v_{max} = 1 - \left(1 - \frac{k}{m-1}\right)^4$
[Figure: explained variance as a function of k, curves labeled "optimum" and "mean (random)"]
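Both noise curves can be tabulated directly; the maximum is taken exactly as printed on the slide, and the mean is computed from the shape parameters to show that it reduces to k/(m−1). Function names are hypothetical.

```python
def v_mean_noise(k, m):
    """Mean explained variance of k random test breaks in pure noise of length m."""
    alpha, beta = k / 2, (m - 1 - k) / 2
    return alpha / (alpha + beta)  # equals k / (m - 1)


def v_max_noise(k, m):
    """Maximum (optimum segmentation) explained variance, as given on the slide."""
    return 1 - (1 - k / (m - 1)) ** 4


for k in (1, 3, 5, 10, 20):
    print(k, round(v_mean_noise(k, 100), 3), round(v_max_noise(k, 100), 3))
```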

Statistical Characteristics of the Break Variance
1. Heuristic approach
2. Empirical approach
3. Theoretical approach

First Approach
For true breaks, constant periods exist. The tested segment averages are the (weighted) means of a few such constant periods. This is quite the same situation as for random scatter, except that fewer independent data underlie it. The number of breaks n thus plays the same role as the time series length m did before for noise. Hence, the first approximation is:
$\alpha = \frac{k}{2}, \qquad \beta = \frac{n-k}{2}$

Second Approach (1/3)
However, the first approach would lead to
$\bar{v} = \frac{k}{n} = 1 \quad \text{for } k = n$
This is only true when all real breaks are actually matched by the test breaks, which is not the case for random trials. Consider k = 3 and n = 7 and count the number of platforms in each test segment: 4, 2, 4 and 1. Altogether, there are 11 "independents", in general n + k + 1.
Remember: $\alpha = \frac{k}{2}, \quad \beta = \frac{n-k}{2}$
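Counting the independents can be sketched as follows. The true and test break positions are hypothetical values chosen so that the per-segment counts reproduce the slide's 4, 2, 4, 1 for k = 3, n = 7 and m = 100; a break is represented by the index at which the new period starts.

```python
def platforms_per_test_segment(true_breaks, test_breaks, m):
    """Count the constant periods (platforms) overlapping each test segment.

    Breaks are given as the start index of the new period or segment.
    """
    bounds = [0] + sorted(test_breaks) + [m]
    counts = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # the platform running into `start`, plus one per true break strictly inside
        counts.append(1 + sum(1 for t in true_breaks if start < t < end))
    return counts


# Hypothetical positions reproducing the slide's counts for k = 3, n = 7, m = 100
true_breaks = [8, 15, 22, 38, 55, 65, 75]
test_breaks = [30, 45, 85]
counts = platforms_per_test_segment(true_breaks, test_breaks, 100)
print(counts, sum(counts))  # [4, 2, 4, 1] 11, i.e. n + k + 1
```

If a test break coincides exactly with a true break, that break adds no extra platform, and the total falls below n + k + 1.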

Second Approach (2/3)
For noise, we had
$\bar{v} = \frac{\alpha}{\alpha+\beta} = \frac{k/2}{(m-1)/2}, \qquad \alpha + \beta = \frac{\text{independents} - 1}{2}$
Now we have n + k + 1 "independents", thus
$\alpha + \beta = \frac{n+k}{2}, \qquad \alpha = \frac{k}{2}, \qquad \beta = \frac{n}{2}$
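The rule α + β = (independents − 1)/2 with α = k/2 reproduces both the noise case (m independents) and the break case (n + k + 1 independents). A short check with hypothetical helper names:

```python
def second_approach_shape(k, independents):
    """alpha = k/2 and alpha + beta = (independents - 1)/2."""
    alpha = k / 2
    beta = (independents - 1) / 2 - alpha
    return alpha, beta


def mean_v(alpha, beta):
    return alpha / (alpha + beta)


k, n, m = 3, 7, 100
print(second_approach_shape(k, m))          # noise: (1.5, 48.0) = (k/2, (m-1-k)/2)
print(second_approach_shape(k, n + k + 1))  # breaks: (1.5, 3.5) = (k/2, n/2)
print(mean_v(*second_approach_shape(5, 5 + 5 + 1)))  # k = n = 5 gives 0.5
```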

Second Approach (3/3)
This leads to
$\bar{v} = \frac{k}{n+k} = \frac{1}{2} \quad \text{for } k = n$
This is rather reasonable, because for n = k the situation is approximately as follows: each test segment contains one true break, thus two independents, which are then averaged. This reduces the variance by a factor of 2.
However, so far we have not taken into account that the HSPs (homogeneous subperiods) have different lengths. The effective number of true breaks must therefore be smaller than the nominal one.

Effective observation number
If we generate i = 1…N random time series of length j = 1…m with each element drawn as $x_{ij} \sim \mathcal{N}(0,1)$, only a fraction (m−1)/m of the variance is found within the time series, because a fraction 1/m is "lost" to the variance of the time series means.
How large is this effect if a step function with n breaks is considered?
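A quick Monte Carlo check of the i.i.d. case, with a hypothetical helper and m = 10 to make the effect visible: the squared series mean carries about 1/m of the unit variance, and the variance inside the series about (m−1)/m.

```python
import random


def variance_split_iid(m=10, n_series=20_000, seed=0):
    """Average lost (series-mean) and internal (within-series) variance for N(0,1) data."""
    rng = random.Random(seed)
    lost = internal = 0.0
    for _ in range(n_series):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        mu = sum(x) / m
        lost += mu * mu                                # variance carried by the series mean
        internal += sum((v - mu) ** 2 for v in x) / m  # variance found within the series
    return lost / n_series, internal / n_series


lost, internal = variance_split_iid()
print(lost, internal)  # close to 1/m = 0.1 and (m - 1)/m = 0.9
```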

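Sketch of derivation (1/2)
The mean of each time series is
$\bar{x} = \frac{1}{m}\sum_{i=1}^{n+1} l_i\,x_i$
The "lost" variance is
$\mathrm{Var}(\bar{x}) = \langle \bar{x}\,\bar{x} \rangle = \left\langle \left(\frac{1}{m}\sum_{i=1}^{n+1} l_i\,x_i\right)^2 \right\rangle = \frac{1}{m^2}\sum_{i=1}^{n+1}\sum_{j=1}^{n+1} l_i\,l_j\,\langle x_i x_j \rangle$
Since the platform values $x_i$ are independent with unit variance, this reduces to the sum over squared lengths:
$\mathrm{Var}(\bar{x}) = \frac{1}{m^2}\sum_{i=1}^{n+1} l_i^2\,\langle x_i x_i \rangle = \frac{1}{m^2}\sum_{i=1}^{n+1} l_i^2$
Averaged over all random break configurations, this is equal to the weighted sum over $l^2$:
$\mathrm{Var}(\bar{x}) = \frac{n+1}{m^2}\,\frac{1}{\binom{m-1}{n}}\sum_{l=1}^{m-n}\binom{m-l-1}{n-1}\,l^2 = \frac{1}{m\binom{m}{n+1}}\sum_{l=1}^{m-n}\binom{m-l-1}{n-1}\,l^2$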

Sketch of derivation (2/2)
The sum of a product of two binomial coefficients is solvable by Vandermonde's identity:
$\sum_{l=0}^{m}\binom{l}{j}\binom{m-l}{n-j} = \binom{m+1}{n+1}$
This leads to a solution for the $l^2$ sum:
$\sum_{l=1}^{m} l^2\binom{m-1-l}{n-1} = 2\binom{m}{n+2} + \binom{m}{n+1}$
Inserted into the original expression, we obtain for the "lost" external variance:
$\mathrm{Var}(\bar{x}) = \frac{2m-n}{m(n+2)} \approx \frac{2}{n+2} \qquad \text{(C21)}$
The remaining internal variance is then:
$1 - \mathrm{Var}(\bar{x}) = \frac{(m+1)\,n}{m(n+2)} \approx \frac{n}{n+2}$
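Because every quantity in the derivation is a ratio of integers, the closed form can be verified exactly with rational arithmetic. A sketch using Python's `fractions.Fraction` and `math.comb` (Python 3.8+); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def lost_variance_direct(m, n):
    """Var(x_bar) from the segment-length distribution: the weighted l^2 sum."""
    s = sum(comb(m - l - 1, n - 1) * l * l for l in range(1, m - n + 1))
    return Fraction(s, m * comb(m, n + 1))


def lost_variance_closed(m, n):
    """Closed form (2m - n) / (m (n + 2)) obtained with Vandermonde's identity."""
    return Fraction(2 * m - n, m * (n + 2))


# Exact check for all 1 <= n < m <= 40, including the l^2-sum identity.
for m in range(2, 41):
    for n in range(1, m):
        l2_sum = sum(comb(m - 1 - l, n - 1) * l * l for l in range(1, m - n + 1))
        assert l2_sum == 2 * comb(m, n + 2) + comb(m, n + 1)
        assert lost_variance_direct(m, n) == lost_variance_closed(m, n)
print("closed form confirmed")
```

The sum stops at l = m − n because longer first segments cannot leave room for the remaining n − 1 breaks; those terms are zero in the slide's sum up to m.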

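Third approach
The relative unexplained variance of a test segment is its internal variance, i/(i+2), relative to the internal variance of the entire time series, n/(n+2) (both follow from (C21)):
$1 - v = \frac{i}{i+2}\cdot\frac{n+2}{n}$
with i the number of breaks within a test segment, n the number of breaks within the entire time series, and $v = \frac{\text{explained variance}}{\text{total variance}}$.
Inserting $i = \frac{l}{m}\,n$ and $m = l\,(k+1)$ gives
$1 - v = \frac{\frac{l}{m}n}{\frac{l}{m}n + 2}\cdot\frac{n+2}{n} = \frac{n+2}{n + 2\frac{m}{l}} = \frac{n+2}{n + 2(k+1)} = \frac{\frac{n}{2}+1}{\frac{n}{2}+k+1} = \frac{n^*}{k+n^*}$
with $n^* = \frac{n}{2} + 1$. Hence
$v = \frac{k}{k+n^*}$
This is similar to the second approach, $v = \frac{k}{n+k}$, but n counts only half.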
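The algebra of the third approach can be checked exactly with fractions, assuming k + 1 equally long test segments as on the slide (m = l(k+1)). Helper names are hypothetical; the last line compares the three heuristic estimates of the mean explained variance for k = n = 3.

```python
from fractions import Fraction


def unexplained_third(k, n):
    """1 - v = i/(i+2) * (n+2)/n with i = (l/m) n and m = l (k + 1)."""
    i = Fraction(n, k + 1)  # true breaks per test segment
    return i / (i + 2) * Fraction(n + 2, n)


def v_third(k, n):
    """v = k / (k + n*) with n* = n/2 + 1."""
    n_star = Fraction(n, 2) + 1
    return k / (k + n_star)


for k in range(1, 21):
    for n in range(1, 21):
        assert 1 - unexplained_third(k, n) == v_third(k, n)

# First, second and third approach for k = n = 3
k = n = 3
print(Fraction(k, n), Fraction(k, n + k), v_third(k, n))  # 1 1/2 6/11
```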


Empirical Var(k, n)
Empirical test with 1000 random segmentations (fixed k) of each of 1000 time series (fixed n). The mean relative explained variance v is calculated from these 1,000,000 permutations. This procedure is repeated for all combinations of k = 1, …, 20 and n = 1, …, 20, yielding 20 functions v(k), one for each n.
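A reduced-size sketch of the empirical test in Python. Assumptions not stated on the slide: m = 100, true break positions drawn uniformly without repetition, and independent N(0,1) platform levels between breaks (the model used in the derivation of the effective observation number); 300 series × 10 segmentations replace 1000 × 1000. Function names are hypothetical.

```python
import random


def explained_variance(x, breaks):
    """Between-segment over total sum of squares; breaks are segment start indices."""
    m = len(x)
    grand = sum(x) / m
    total = sum((value - grand) ** 2 for value in x)
    bounds = [0] + sorted(breaks) + [m]
    between = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg_mean = sum(x[a:b]) / (b - a)
        between += (b - a) * (seg_mean - grand) ** 2
    return between / total


def random_break_signal(m, n, rng):
    """Step function with n breaks at random positions and independent N(0,1) levels."""
    starts = sorted(rng.sample(range(1, m), n))
    bounds = [0] + starts + [m]
    x = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        x.extend([rng.gauss(0.0, 1.0)] * (b - a))
    return x


def mean_break_v(k, n, m=100, n_series=300, n_trials=10, seed=0):
    """Mean relative explained variance of k random test breaks in random break signals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_series):
        x = random_break_signal(m, n, rng)
        for _ in range(n_trials):
            total += explained_variance(x, rng.sample(range(1, m), k))
    return total / (n_series * n_trials)


k, n = 3, 7
v_sim = mean_break_v(k, n)
print(v_sim, k / (k + 0.629 * n + 1.855))  # simulation versus the stepwise fit
```

For k = 3 and n = 7 the stepwise fit from the following slides gives about 0.32; the simulated mean can be compared with it, keeping in mind the reduced sample size and the assumed m.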

Stepwise Fitting (1/3)
$\frac{v}{1-v} = slp(n)\,k$
v/(1−v) is proportional to k, with a slope that is a function of n (the curves for different n do not cross). The slope is certainly not proportional but rather reciprocal to n (slp(1) is large, slp(20) is small). Thus, it is better to plot 1/slp(n).

Stepwise Fitting (2/3)
$k\,\frac{1-v}{v} = \frac{1}{slp(n)} = \text{const}$
We expect horizontal lines if the reciprocal slope is really independent of k. This is largely confirmed. Averaging over k then gives one value for each n. These values appear to be rather linear in n, so we plot the averages as a function of n.

Stepwise Fitting (3/3)
$k\,\frac{1-v}{v} = \frac{1}{slp(n)} = a\,n + b, \qquad a = 0.629, \quad b = 1.855$
Solving $k\,\frac{1-v}{v} = \frac{1}{slp(n)}$ for v gives
$v = \frac{k}{k + \frac{1}{slp(n)}}$
and inserting $a\,n + b$:
$v = \frac{k}{k+n^*}, \qquad n^* = 0.629\,n + 1.855$
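The stepwise procedure amounts to a few lines of least squares. As a self-consistency check only, the table below is generated from the fitted law itself (not from the simulation data behind the slides), so the fit must return a = 0.629 and b = 1.855; in practice `v_table` would hold Monte Carlo means. Names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys = a * xs + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def stepwise_fit(v_table, ks, ns):
    """Fit k (1 - v) / v = 1/slp(n) = a n + b from a table v_table[(k, n)]."""
    reciprocal_slopes = []
    for n in ns:
        per_k = [k * (1 - v_table[(k, n)]) / v_table[(k, n)] for k in ks]
        reciprocal_slopes.append(sum(per_k) / len(per_k))  # average over k
    return fit_line(list(ns), reciprocal_slopes)


# Self-consistency check with values generated from the fitted law itself
ks = ns = range(1, 21)
table = {(k, n): k / (k + 0.629 * n + 1.855) for k in ks for n in ns}
print(stepwise_fit(table, ks, ns))  # approximately (0.629, 1.855)
```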

Application of findings
Summarizing the stepwise fitting:
$v = \frac{k}{k+n^*}, \qquad n^* = 0.629\,n + 1.855$
The direct fit in the v/k space yields:
$n^* = 0.621\,n + 1.928$
The best heuristic approach was:
$n^* = 0.5\,n + 1$

Method of moments
So far we have developed an empirical equation for $\bar{v}$, the mean explained variance. The same procedure is applied to derive equations for α and β, the shape parameters describing the distribution of v. These coefficients are determined by the method of moments.
The mean of a Beta distribution is
$\bar{v} = \frac{\alpha}{\alpha+\beta}$
The variance of a Beta distribution is
$\sigma_v^2 = \frac{\alpha\beta}{(\alpha+\beta+1)(\alpha+\beta)^2}$
Both can be solved for α and β:
$\alpha = \left(\frac{\bar{v}(1-\bar{v})}{\sigma_v^2} - 1\right)\bar{v}, \qquad \beta = \left(\frac{\bar{v}(1-\bar{v})}{\sigma_v^2} - 1\right)(1-\bar{v})$
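The method-of-moments inversion as a minimal sketch with hypothetical helper names. The round trip through the mean and variance of Beta(1.5, 48), the noise case k = 3, m = 100, returns the original shape parameters; `beta_from_samples` applies the same inversion to a list of simulated explained variances.

```python
def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha / s, alpha * beta / ((s + 1) * s ** 2)


def beta_from_moments(mean, var):
    """Method-of-moments shape parameters from a mean and a variance."""
    common = mean * (1 - mean) / var - 1  # equals alpha + beta
    return common * mean, common * (1 - mean)


def beta_from_samples(samples):
    """Apply the method of moments to a list of explained variances v."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((v - mean) ** 2 for v in samples) / (n - 1)
    return beta_from_moments(mean, var)


# Round trip for the noise case k = 3, m = 100: alpha = 1.5, beta = 48
print(beta_from_moments(*beta_mean_var(1.5, 48.0)))
```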

Empirical values for α and β
Again, 1000 × 1000 permutations are performed for fixed values of n and k, and the explained variance is calculated for each of the 1,000,000 permutations. From the mean and the variance of the resulting distribution, α and β are determined by the method of moments. This procedure is repeated for all combinations of k = 1…20 and n = 1…20, and the result is plotted against k.
α depends strongly on k and evidently converges to α = k/2 for large n.
β depends strongly on n and evidently converges to β = n*/2 for large k.

Alpha/k and Beta/n*
Since
$\bar{v} = \frac{\alpha}{\alpha+\beta} = \frac{k}{k+n^*} = \frac{f\,k}{f\,k + f\,n^*}$
we can write
$\alpha = f\,k, \qquad \beta = f\,n^*, \qquad f = \frac{\alpha}{k} = \frac{\beta}{n^*}$

1/f
Since f is neither a linear function of k nor of n, it is more promising to depict the reciprocal k/α. 1/f is rather linear in k, with a slope reciprocal to n and an intercept of 2:
$\frac{k}{\alpha} = \frac{n^*}{\beta} = \frac{1}{f} = \frac{1}{c}\,k + 2$

Determination of c
For a more detailed determination of c, we solve
$\frac{1}{f} = \frac{n^*}{\beta} = \frac{1}{c}\,k + 2$
for c,
$c = \frac{\beta\,k}{n^* - 2\beta}$
and plot the result against k. This yields
$c = n + 3 + 0.03\,(n-1)\,k$

Resulting fits for Alpha and Beta
$\alpha = \frac{c}{k+2c}\,k, \qquad \beta = \frac{c}{k+2c}\,n^*, \qquad c = n + 3 + 0.03\,(n-1)\,k$

Conclusion
The explained noise variance is Beta distributed with
$\alpha = \frac{k}{2}, \qquad \beta = \frac{m-1-k}{2}$
The explained break variance is Beta distributed with
$\alpha = \frac{c}{k+2c}\,k, \qquad \beta = \frac{c}{k+2c}\,n^*$
with
$c = n + 3 + 0.03\,(n-1)\,k, \qquad n^* = 0.63\,n + 3$
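The final parameterization in code, as a sketch with hypothetical helper names. n* is passed in explicitly because the slides give several fits for it; the stepwise fit 0.629 n + 1.855 is used here. Two identities serve as checks: the mean α/(α+β) does not depend on c, and c = βk/(n* − 2β) recovers c.

```python
def noise_shape(k, m):
    """Explained noise variance: alpha = k/2, beta = (m - 1 - k)/2."""
    return k / 2, (m - 1 - k) / 2


def c_coefficient(k, n):
    """Fitted coefficient c = n + 3 + 0.03 (n - 1) k."""
    return n + 3 + 0.03 * (n - 1) * k


def break_shape(k, n, n_star):
    """Explained break variance: alpha = c k / (k + 2c), beta = c n* / (k + 2c)."""
    c = c_coefficient(k, n)
    f = c / (k + 2 * c)  # f = alpha / k = beta / n*
    return f * k, f * n_star


k, n, m = 3, 7, 100
n_star = 0.629 * n + 1.855  # stepwise fit; another fit of n* can be passed instead
alpha, beta = break_shape(k, n, n_star)
print(noise_shape(k, m), (alpha, beta))
# The mean is unaffected by c: alpha / (alpha + beta) = k / (k + n*)
print(alpha / (alpha + beta), k / (k + n_star))
```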