Dipdoc Seminar – 15. October 2018


The break signal in climate records: Brownian motion or Random deviations? Ralf Lindau

Break signal
Climate records are affected by breaks resulting from relocations or changes in the measuring techniques. For detection, differences between neighboring stations are considered, which reduces the dominating natural variance. Homogenization algorithms identify breaks by searching for the maximum external variance (the variance explained by the jumps).

Benchmark datasets
Benchmark datasets are used to assess the skill of homogenization algorithms. They are artificial datasets with known breaks, so that the algorithms can be evaluated. However, benchmark datasets should reflect the statistical properties of real data as closely as possible. An important question is how to model the breaks: 1. As a free random walk (Brownian motion, BM) 2. As random deviations from a fixed level (random noise, RD)

Conceptual model
Same signal, two approaches: which of the two ΔT is assumed to be an independent random variable, the deviations or the jumps? Depending on this choice, different statistical properties of the break signal result.
[Figure panels: Random deviations; Brownian motion]

Approach
To distinguish BM and RD type breaks we use the following approach. We assume that the climate time series consists of four superimposed signals: climate, noise, BM type breaks and RD type breaks:
$$x(i) = \gamma(i) + \varepsilon(i) + \beta(i) + \delta(i), \quad \text{with } \beta \sim N(0, \sigma_\beta^2),\ \delta \sim N(0, \sigma_\delta^2),\ \varepsilon \sim N(0, \sigma_\varepsilon^2)$$
Breaks and noise are assumed to be normally distributed. The climate signal is expected to be more complicated, but it cancels out in the next step. Breaks occur randomly with an average probability (say 5%).
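The superposition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the talk: the function name `simulate_breaks`, the per-time-step break probabilities, and the choice to leave out the climate signal (it cancels in the station difference anyway) are assumptions made here.

```python
import random

def simulate_breaks(n, p_beta, sigma_beta, p_delta, sigma_delta, sigma_eps, rng):
    """Simulate BM breaks + RD breaks + noise for n time steps.

    Assumption: a break occurs independently at each time step with
    probability p_beta (BM type) or p_delta (RD type).
    """
    series = []
    bm_level = 0.0                            # BM: jumps accumulate over time
    rd_level = rng.gauss(0.0, sigma_delta)    # RD: deviation from the fixed level 0
    for _ in range(n):
        if rng.random() < p_beta:
            bm_level += rng.gauss(0.0, sigma_beta)   # the jump is the random variable
        if rng.random() < p_delta:
            rd_level = rng.gauss(0.0, sigma_delta)   # the new level is the random variable
        series.append(bm_level + rd_level + rng.gauss(0.0, sigma_eps))
    return series
```

A call such as `simulate_breaks(100, 0.05, 0.1, 0.05, 0.1, 0.1, random.Random(1))` would give one station series; the noise level 0.1 in this call is an arbitrary example value.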

Spatial difference
The difference between two neighboring stations $x_1$ and $x_2$ is:
$$\mathrm{dif}(i) = x_1(i) - x_2(i) = \left[\beta_1(i) + \delta_1(i) + \varepsilon_1(i)\right] - \left[\beta_2(i) + \delta_2(i) + \varepsilon_2(i)\right]$$
The climate signal cancels out because it is the same at two neighboring stations. However, noise due to the different weather at the two stations remains.

Spatiotemporal difference D
Now we have the difference time series of station pairs. Within these time series, the temporal difference between two time points $i$ and $i+L$ is formed:
$$D = \left[\beta_1(i) + \delta_1(i) + \varepsilon_1(i)\right] - \left[\beta_2(i) + \delta_2(i) + \varepsilon_2(i)\right] - \left[\beta_1(i+L) + \delta_1(i+L) + \varepsilon_1(i+L)\right] + \left[\beta_2(i+L) + \delta_2(i+L) + \varepsilon_2(i+L)\right]$$
D is the sum (or difference) of 12 random numbers. Finally, we calculate the variance of D for classes of constant time lag L: Var(D(L)).
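As a rough sketch of this step, the variance of D per lag class can be estimated from a list of station-pair difference series. The function name `var_d_by_lag` is hypothetical, and the estimator uses the mean of $D^2$, which assumes that D has zero mean (true when all components have zero mean, as in the model above).

```python
def var_d_by_lag(dif_series_list, max_lag):
    """Estimate Var(D(L)) for L = 1..max_lag.

    dif_series_list: list of difference series dif(i), one per station pair.
    D = dif(i) - dif(i + L); all pairs and all start times i are pooled.
    Assumption: E(D) = 0, so the mean of D^2 is used as the variance.
    """
    result = {}
    for lag in range(1, max_lag + 1):
        total, count = 0.0, 0
        for dif in dif_series_list:
            for i in range(len(dif) - lag):
                d = dif[i] - dif[i + lag]
                total += d * d
                count += 1
        if count:                      # skip lags longer than every series
            result[lag] = total / count
    return result
```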

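Variance of D
$$D = \left[\beta_1(i) + \delta_1(i) + \varepsilon_1(i)\right] - \left[\beta_2(i) + \delta_2(i) + \varepsilon_2(i)\right] - \left[\beta_1(i+L) + \delta_1(i+L) + \varepsilon_1(i+L)\right] + \left[\beta_2(i+L) + \delta_2(i+L) + \varepsilon_2(i+L)\right]$$
A common rule is:
$$\mathrm{Var}(a \pm b) = \mathrm{Var}(a) + \mathrm{Var}(b) \pm 2\,\mathrm{Cov}(a, b)$$
This gives 12 variance terms. Covariance terms arise only between breaks of the same station at the two time points; they occur twice (once for each station):
$$\mathrm{Var}(D) = 2\,\mathrm{Var}(\beta(i)) + 2\,\mathrm{Var}(\beta(i+L)) + 4\,\mathrm{Var}(\delta) + 4\,\mathrm{Var}(\varepsilon) - 4\,\mathrm{Cov}(\beta(i), \beta(i+L)) - 4\,\mathrm{Cov}(\delta(i), \delta(i+L))$$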

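Covariance of RD breaks
For external pairs (at least one break between the two time points), E(Cov) = 0. For internal pairs (no break between them), E(Cov) = Var(δ). Hence:
$$\mathrm{Cov}(\delta(i), \delta(i+L)) = P_{int}\,\mathrm{Var}(\delta)$$
The probability of finding k breaks within a time span L is:
$$f(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \text{with } \lambda = p_\delta L$$
$$P_{int} = f(0) = e^{-\lambda} = e^{-p_\delta L}$$
$$\mathrm{Cov}(\delta(i), \delta(i+L)) = \mathrm{Var}(\delta)\, e^{-p_\delta L}$$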
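A small numerical check may help here. The sketch below compares the Poisson expression $e^{-p_\delta L}$ with $(1 - p_\delta)^L$, the probability of no break in L steps if one assumes an independent break chance $p_\delta$ at every time step. That discrete model and both function names are assumptions made for illustration; the slides only give the Poisson form.

```python
import math

def p_internal_poisson(p_delta, lag):
    """P(no break within lag) from the Poisson formula f(0) = exp(-p_delta * lag)."""
    return math.exp(-p_delta * lag)

def p_internal_discrete(p_delta, lag):
    """P(no break within lag), assuming an independent chance p_delta per time step."""
    return (1.0 - p_delta) ** lag
```

For small break probabilities the two agree closely: with `p_delta = 0.05` and `lag = 10` they differ by less than 0.01.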

Variance of BM breaks
A classical BM is defined as follows. At time step i it consists of the sum of i random numbers:
$$\beta(i) = \sum_{j=1}^{i} a(j), \quad a \sim N(0, \sigma_\beta^2)$$
$$\mathrm{Var}(\beta(i))_{clas} = i\,\sigma_\beta^2$$
Breaks do not occur each year, but only with a probability $p_\beta$:
$$\mathrm{Var}(\beta(i)) = i\,p_\beta\,\sigma_\beta^2$$
Analogously for $i+L$:
$$\mathrm{Var}(\beta(i+L)) = (i+L)\,p_\beta\,\sigma_\beta^2$$

Covariance of BM breaks
The covariance of two time steps within a Brownian motion is equal to the variance of the earlier one, because both values have in common all the random numbers that constitute the earlier one:
$$\mathrm{Cov}(\beta(i), \beta(i+L)) = i\,p_\beta\,\sigma_\beta^2$$
Our previous findings for the variance were:
$$\mathrm{Var}(\beta(i)) = i\,p_\beta\,\sigma_\beta^2, \quad \mathrm{Var}(\beta(i+L)) = (i+L)\,p_\beta\,\sigma_\beta^2$$
Together they give:
$$\mathrm{Var}(\beta(i)) + \mathrm{Var}(\beta(i+L)) - 2\,\mathrm{Cov}(\beta(i), \beta(i+L)) = \left[i + (i+L) - 2i\right] p_\beta\,\sigma_\beta^2 = L\,p_\beta\,\sigma_\beta^2$$
We obtain a linear function in L.

Variance of D
We return to the original formula:
$$\mathrm{Var}(D) = 2\,\mathrm{Var}(\beta(i)) + 2\,\mathrm{Var}(\beta(i+L)) + 4\,\mathrm{Var}(\delta) + 4\,\mathrm{Var}(\varepsilon) - 4\,\mathrm{Cov}(\beta(i), \beta(i+L)) - 4\,\mathrm{Cov}(\delta(i), \delta(i+L))$$
and insert our findings:
$$\mathrm{Var}(D(L)) = 2\,p_\beta\,\sigma_\beta^2\,L + 4\,\sigma_\delta^2 \left(1 - e^{-p_\delta L}\right) + 4\,\sigma_\varepsilon^2$$
The variance of D(L) has three additive components: 1. Linear function for BM type breaks 2. Exponential function for RD type breaks 3. Constant offset for the noise
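The three components can be written as a short function, which makes it easy to see how each term contributes at a given lag. This is a direct transcription of the formula above; the function name `var_d_theory` and the argument order are choices made here.

```python
import math

def var_d_theory(lag, p_beta, sigma_beta, p_delta, sigma_delta, sigma_eps):
    """Theoretical Var(D(L)) as the sum of BM, RD and noise components."""
    bm = 2.0 * p_beta * sigma_beta ** 2 * lag                         # linear in L
    rd = 4.0 * sigma_delta ** 2 * (1.0 - math.exp(-p_delta * lag))    # saturating exponential
    noise = 4.0 * sigma_eps ** 2                                      # constant offset
    return bm + rd + noise
```

At `lag = 0` only the noise offset $4\sigma_\varepsilon^2$ remains; for large lags without BM breaks the function levels off at $4\sigma_\delta^2 + 4\sigma_\varepsilon^2$.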

Test with simulated data
RD breaks + noise: $\sigma_\beta = 0.0$, $p_\beta = 0.00$, $\sigma_\delta = 0.1$, $p_\delta = 0.05$
BM breaks + noise: $\sigma_\beta = 0.1$, $p_\beta = 0.05$, $\sigma_\delta = 0.0$, $p_\delta = 0.00$
RD + BM + noise: $\sigma_\beta = 0.1$, $p_\beta = 0.05$, $\sigma_\delta = 0.1$, $p_\delta = 0.05$
The variance follows the theory exactly when the known parameters are inserted. But how good is a retrieval without a priori knowledge?

Retrieval approach
We had:
$$\mathrm{Var}(D(L)) = 2\,p_\beta\,\sigma_\beta^2\,L + 4\,\sigma_\delta^2 \left(1 - e^{-p_\delta L}\right) + 4\,\sigma_\varepsilon^2$$
Written in short form:
$$V(L) = bL + d\left(1 - e^{-cL}\right) + e, \quad b = 2\,p_\beta\,\sigma_\beta^2, \quad c = p_\delta, \quad d = 4\,\sigma_\delta^2, \quad e = 4\,\sigma_\varepsilon^2$$
Two tangents, one at the beginning and one at the end:
$$L \to 0: \quad V_1(L) = bL + cdL + e = slp_1 L + con_1$$
$$L \to \infty: \quad V_2(L) = bL + d + e = slp_2 L + con_2$$
The two tangents have four parameters. From these we can calculate the unknowns b, c, d, and e.
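Comparing coefficients in the two tangents gives $slp_1 = b + cd$, $con_1 = e$, $slp_2 = b$ and $con_2 = d + e$, so the unknowns follow as $b = slp_2$, $e = con_1$, $d = con_2 - con_1$ and $c = (slp_1 - slp_2)/d$. The sketch below implements this inversion; the function name `first_guess` and the error handling are assumptions, and how the tangents themselves are fitted to the data is not covered.

```python
def first_guess(slp1, con1, slp2, con2):
    """Invert the two tangents V1 = slp1*L + con1 and V2 = slp2*L + con2.

    Returns (b, c, d, e) of V(L) = b*L + d*(1 - exp(-c*L)) + e.
    """
    b = slp2               # slope of the late tangent is the BM term
    e = con1               # intercept of the early tangent is the noise offset
    d = con2 - con1        # gap between the intercepts is the RD plateau
    if d <= 0.0:
        raise ValueError("tangent intercepts imply no RD component (d <= 0)")
    c = (slp1 - slp2) / d  # extra initial slope divided by the plateau height
    return b, c, d, e
```

The physical parameters then follow from the definitions above: $p_\beta \sigma_\beta^2 = b/2$, $p_\delta = c$, $\sigma_\delta^2 = d/4$ and $\sigma_\varepsilon^2 = e/4$.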

Retrieval application
Two-step retrieval: 1. Two tangents as first guess 2. Exhaustive search around it. Nice geometrical interpretation.
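The slides do not say how the search around the first guess is organised. One possible reading, sketched here under that caveat, is a small grid search: each of the four parameters is scaled by a few factors around the first guess and the combination with the smallest squared misfit is kept. The names `model` and `refine` and the default factors are hypothetical.

```python
import itertools
import math

def model(lag, b, c, d, e):
    """V(L) = b*L + d*(1 - exp(-c*L)) + e."""
    return b * lag + d * (1.0 - math.exp(-c * lag)) + e

def refine(lags, values, guess, factors=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Grid search around guess = (b, c, d, e); returns (best_params, best_cost).

    Assumption: the misfit is the sum of squared differences between the
    model and the observed variances, with equal weight for every lag.
    """
    best, best_cost = guess, float("inf")
    for scales in itertools.product(factors, repeat=4):
        cand = tuple(g * s for g, s in zip(guess, scales))
        cost = sum((model(l, *cand) - v) ** 2 for l, v in zip(lags, values))
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

Because the factor 1.0 is part of the grid, the refined result is never worse than the first guess. Repeating the search with narrower factors around the new optimum would be one way to tighten it further.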

Retrieval test for sparse data
100 station pairs: large scatter for high lags. But the retrieval works well; it is the data itself that varies.

Data
ISTI data restricted to the US and 1900-2000: at least 80 years of data, distance less than 100 km. 1459 station pairs result.

Result
At short time lags the $1 - e^{-x}$ increase caused by RD type breaks is visible. For long time lags the linear increase indicates BM type breaks. The offset determines the noise.
BM: $p_\beta \sigma_\beta^2 = 0.45$ K$^2$ cty$^{-1}$
RD: $p_\delta = 17.1$ cty$^{-1}$, $\sigma_\delta^2 = 0.12$ K$^2$
Noise: $\sigma_\varepsilon^2 = 0.15$ K$^2$
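To get a feeling for these numbers, they can be converted into more tangible quantities. The sketch below assumes that "cty" stands for century (the slides do not spell it out); the function name `summarise` and the derived "crossover" lag, at which the linear BM term $bL$ equals the RD plateau $d$, are additions made here for illustration.

```python
import math

def summarise(bm_rate, p_delta, var_delta, var_eps):
    """Derive illustrative quantities from the retrieved parameters.

    bm_rate: p_beta * sigma_beta^2 in K^2 per century (assumed unit)
    p_delta: RD break frequency per century (assumed unit)
    var_delta, var_eps: RD break variance and noise variance in K^2
    """
    return {
        "rd_interval_years": 100.0 / p_delta,       # mean time between RD breaks
        "sigma_delta": math.sqrt(var_delta),        # RD break standard deviation in K
        "sigma_eps": math.sqrt(var_eps),            # noise standard deviation in K
        "crossover_cty": (4.0 * var_delta) / (2.0 * bm_rate),  # lag where b*L = d
    }
```

With the values from the slide, `summarise(0.45, 17.1, 0.12, 0.15)` gives an RD break roughly every 5.8 years, standard deviations of about 0.35 K (RD breaks) and 0.39 K (noise), and a crossover lag of about 0.53 centuries.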

Conclusion
Brownian motion and random deviation break types can be distinguished by calculating the variance of the spatiotemporal difference. The application shows that US data contain both break types. But we did not consider: 1. Possible trend effects 2. Stationarity of the variance

Lag covariance for RD
The covariance is an exponential function of the time lag:
$$C(L) = a \exp(-bL), \quad a = \sigma_b^2 \ \text{(break strength } \sigma_b\text{)}, \quad b = k/(n-k) \ \text{(break number } k\text{)}$$
As a byproduct we have a nice method to retrieve the strength and number of breaks directly from the data.
Input: $\sigma_b = 1.000$, $k = 5.000$. Output: $k = 4.984$.
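One way such a retrieval could look is a straight-line fit to $\ln C(L)$, followed by solving $b = k/(n-k)$ for $k$, which gives $k = bn/(1+b)$. The sketch below is an assumption about the procedure, not the method used in the talk: the log-linear least-squares fit, the function names, and the reading of $n$ as the length of the time series are all choices made here. It requires strictly positive covariance values.

```python
import math

def fit_exponential(lags, cov):
    """Least-squares fit of ln C(L) = ln a - b*L; returns (a, b).

    Assumption: all covariance values are strictly positive.
    """
    ys = [math.log(c) for c in cov]
    m = len(lags)
    mean_x = sum(lags) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in lags)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lags, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

def breaks_from_rate(b, n):
    """Solve b = k / (n - k) for the break number k (n assumed to be the series length)."""
    return b * n / (1.0 + b)
```

The break strength then follows as $\sigma_b = \sqrt{a}$.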

US data, not normalized
The covariance reflects mainly the mean difference between two stations. Therefore, the covariance (and the variance) depends strongly on the distance. Averaging over different distance classes would be dangerous.
[Figure: curves for the distance classes 350 km, 250 km, 150 km and 50 km]

US data, normalized
Normalization with the time series mean helps. The expected function of the break covariance (e-function) becomes visible. But now the variance does weird things: it has a minimum at L/4, reaches its original value at L/2, and increases further for larger L.
[Figure: curves for the distance classes 350 km, 250 km, 150 km and 50 km]

Simulated data
[Figure panels: not normalised; normalised]
The normalization causes a deformation and a shift of both the covariance and the variance function.

Rationale
The covariance of two time points a and b is:
$$\overline{(x_a - \bar{x})(x_b - \bar{x})} = \overline{x_a x_b} - \overline{x_a \bar{x}} - \overline{x_b \bar{x}} + \overline{\bar{x}\,\bar{x}}$$
The mixed product is:
$$\overline{x_a \bar{x}} = \overline{(x_a' + \bar{x})\,\bar{x}} = \overline{x_a' \bar{x}} + \overline{\bar{x}\,\bar{x}}$$
Normally we say:
$$\overline{x_a' \bar{x}} = 0$$
Then we have just the shift:
$$\overline{(x_a - \bar{x})(x_b - \bar{x})} = \overline{x_a x_b} - \overline{\bar{x}\,\bar{x}}$$
However, the mixed product is not zero, but depends on the length of the segment. For long segments:
$$\overline{x_a' \bar{x}} > 0$$
Segments at the beginning and at the end are shorter. For L/4, middle years dominate.