Dipdoc Seminar – 15. October 2018


1 The break signal in climate records: Brownian motion or Random deviations? Ralf Lindau

2 Break signal
Climate records are affected by breaks resulting from station relocations or changes in the measuring techniques. To detect them, differences between neighboring stations are considered, which reduces the dominating natural variance. Homogenization algorithms identify breaks by searching for the maximum external variance (the variance explained by the jumps).

3 Benchmark datasets
Benchmark datasets are used to assess the skill of homogenization algorithms. These are artificial datasets with known breaks, so that an evaluation of the algorithms is possible. However, benchmark datasets should reflect the statistical properties of real data as closely as possible. An important question is how to model the breaks:
- As a free random walk (Brownian motion)
- As a random deviation from a fixed level (random noise)

4 Conceptual model
Same signal, two approaches: which of the two ΔT is assumed to be an independent random variable, the deviations from a fixed level or the jumps between consecutive levels? Depending on this choice, different statistical properties of the break signal result.
(Figure panels: Random deviations, Brownian motion.)
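The two readings can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not on the slide: breaks are drawn independently per time step with probability `p`, the BM signal starts at zero, and the function names are hypothetical.

```python
import random

def break_positions(n, p, rng):
    """Return n booleans; True marks a time step at which a break occurs."""
    return [rng.random() < p for _ in range(n)]

def rd_signal(breaks, sigma, rng):
    """Random deviations: every segment gets a level drawn independently."""
    level = rng.gauss(0.0, sigma)
    out = []
    for is_break in breaks:
        if is_break:
            level = rng.gauss(0.0, sigma)   # new level, independent of the old one
        out.append(level)
    return out

def bm_signal(breaks, sigma, rng):
    """Brownian motion: every break adds an independent jump to the old level."""
    level = 0.0                             # assumption: the walk starts at zero
    out = []
    for is_break in breaks:
        if is_break:
            level += rng.gauss(0.0, sigma)  # the jump accumulates
        out.append(level)
    return out

rng = random.Random(0)
breaks = break_positions(100, 0.05, rng)
rd = rd_signal(breaks, 1.0, rng)
bm = bm_signal(breaks, 1.0, rng)
```

Both series are step functions with the same break positions; they differ only in whether the level or the jump is the independent draw.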

5 Approach
To distinguish BM and RD type breaks we use the following approach. We assume that the climate time series consists of four superimposed signals: climate ($\gamma$), noise ($\varepsilon$), BM type breaks ($\beta$) and RD type breaks ($\delta$):

$$x_i = \gamma_i + \varepsilon_i + \beta_i + \delta_i, \quad \text{with } \beta \sim N(0, \sigma_\beta^2),\ \delta \sim N(0, \sigma_\delta^2),\ \varepsilon \sim N(0, \sigma_\varepsilon^2)$$

Breaks and noise are assumed to be normally distributed. The climate signal is expected to be more complicated, but it cancels out in the next step. Breaks occur randomly with an average probability (say 5%).

6 Spatial difference
The difference between two neighboring stations $x_1$ and $x_2$ is:

$$\mathit{dif}_i = x_{1,i} - x_{2,i} = (\beta_{1,i} + \delta_{1,i} + \varepsilon_{1,i}) - (\beta_{2,i} + \delta_{2,i} + \varepsilon_{2,i})$$

The climate signal cancels out, because it is the same at two neighboring stations. However, noise due to the different weather at the two stations remains.

7 Spatiotemporal difference D
Now we have the difference time series of station pairs. Within these time series, the temporal difference between two time points $i$ and $i+L$ is formed:

$$D = (\beta_{1,i} + \delta_{1,i} + \varepsilon_{1,i}) - (\beta_{2,i} + \delta_{2,i} + \varepsilon_{2,i}) - (\beta_{1,i+L} + \delta_{1,i+L} + \varepsilon_{1,i+L}) + (\beta_{2,i+L} + \delta_{2,i+L} + \varepsilon_{2,i+L})$$

$D$ is the sum (or difference) of 12 random numbers. Finally, we calculate the variance of $D$ for classes of constant time lag $L$: $\mathrm{Var}(D(L))$.
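As a rough sketch, the lag-dependent variance could be estimated from a set of difference series as follows. The function name and the pooling over all station pairs and start points are assumptions made here; the estimate uses the mean of $D^2$, which equals the variance only if $D$ has zero expectation.

```python
def lag_variance(series_list, max_lag):
    """Estimate Var(D(L)) for L = 1..max_lag.

    series_list: one difference series (list of floats) per station pair.
    D = dif[i] - dif[i + L]; values are pooled over all pairs and all i.
    Assumption: E[D] = 0, so the mean square is used as the variance.
    """
    result = {}
    for lag in range(1, max_lag + 1):
        squares = [
            (dif[i] - dif[i + lag]) ** 2
            for dif in series_list
            for i in range(len(dif) - lag)
        ]
        if squares:                      # skip lags longer than every series
            result[lag] = sum(squares) / len(squares)
    return result

# An alternating series differs by 1 at lag 1 and by 0 at lag 2.
print(lag_variance([[0, 1, 0, 1, 0, 1]], 2))
```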

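8 Variance of D

$$D = (\beta_{1,i} + \delta_{1,i} + \varepsilon_{1,i}) - (\beta_{2,i} + \delta_{2,i} + \varepsilon_{2,i}) - (\beta_{1,i+L} + \delta_{1,i+L} + \varepsilon_{1,i+L}) + (\beta_{2,i+L} + \delta_{2,i+L} + \varepsilon_{2,i+L})$$

A common rule is:

$$\mathrm{Var}(a \pm b) = \mathrm{Var}(a) + \mathrm{Var}(b) \pm 2\,\mathrm{Cov}(a, b)$$

There are 12 variance terms. Covariances arise only between breaks of the same station, and they occur twice (once for each station):

$$\mathrm{Var}(D) = 2\,\mathrm{Var}(\beta_i) + 2\,\mathrm{Var}(\beta_{i+L}) + 4\,\mathrm{Var}(\delta) + 4\,\mathrm{Var}(\varepsilon) - 4\,\mathrm{Cov}(\beta_i, \beta_{i+L}) - 4\,\mathrm{Cov}(\delta_i, \delta_{i+L})$$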

9 Covariance of RD breaks
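For external pairs (the two time points are separated by at least one break), $E(\mathrm{Cov}) = 0$. For internal pairs (no break lies between the two time points), $E(\mathrm{Cov}) = \mathrm{Var}(\delta)$. Hence:

$$\mathrm{Cov}(\delta_i, \delta_{i+L}) = P_{int}\,\mathrm{Var}(\delta)$$

The probability of finding $k$ breaks within a time span $L$ is:

$$f(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \text{with } \lambda = p_\delta L$$

An internal pair requires zero breaks within $L$:

$$P_{int} = f(0) = e^{-\lambda} = e^{-p_\delta L}$$

$$\mathrm{Cov}(\delta_i, \delta_{i+L}) = \mathrm{Var}(\delta)\, e^{-p_\delta L}$$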
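A small numerical check may help to see how close the Poisson expression is to a discrete count. The sketch below assumes, beyond what the slide states, that breaks are independent per time step with probability `p`, so that the probability of no break in `lag` steps is `(1 - p) ** lag`; the function names are hypothetical.

```python
import math

def p_int_poisson(p, lag):
    """Probability of zero breaks within the lag, Poisson form from the slide."""
    return math.exp(-p * lag)

def p_int_bernoulli(p, lag):
    """Same probability if each step breaks independently (assumption)."""
    return (1.0 - p) ** lag

def rd_covariance(sigma, p, lag):
    """Cov(delta_i, delta_{i+L}) = Var(delta) * exp(-p * L)."""
    return sigma ** 2 * p_int_poisson(p, lag)

for lag in (1, 10, 20, 50):
    print(lag, p_int_poisson(0.05, lag), p_int_bernoulli(0.05, lag))
```

For a break probability of 5% the two forms stay within about 0.01 of each other, so the exponential is a reasonable stand-in for the discrete case.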

10 Variance of BM breaks
A classical BM is defined as:

$$\beta_i = \sum_{j=1}^{i} a(j), \quad a \sim N(0, \sigma_\beta^2)$$

At time step $i$ it consists of the sum of $i$ random numbers:

$$\mathrm{Var}(\beta(i))_{clas} = i\,\sigma_\beta^2$$

Breaks do not occur each year, but only with a probability $p_\beta$:

$$\mathrm{Var}(\beta_i) = i\,p_\beta\,\sigma_\beta^2$$

Analogously for $i+L$:

$$\mathrm{Var}(\beta(i+L)) = (i+L)\,p_\beta\,\sigma_\beta^2$$

11 Covariance of BM breaks
The covariance of two time steps within a Brownian motion is equal to the variance of the earlier one, because both values have in common all the random numbers that constitute the earlier one:

$$\mathrm{Cov}(\beta_i, \beta_{i+L}) = i\,p_\beta\,\sigma_\beta^2$$

Our previous findings for the variance were:

$$\mathrm{Var}(\beta_i) = i\,p_\beta\,\sigma_\beta^2, \quad \mathrm{Var}(\beta(i+L)) = (i+L)\,p_\beta\,\sigma_\beta^2$$

Together they give:

$$\mathrm{Var}(\beta(i)) + \mathrm{Var}(\beta(i+L)) - 2\,\mathrm{Cov}(\beta_i, \beta_{i+L}) = L\,p_\beta\,\sigma_\beta^2$$

We obtain a linear function in $L$.
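The linear result can be checked with a small Monte Carlo experiment. This is only a sketch: the simulation design (independent break draw per step, fixed seed, number of trials) and the function name are assumptions made here. Because the BM signal has zero mean, the mean of $(\beta_{i+L} - \beta_i)^2$ estimates $\mathrm{Var}(\beta_i) + \mathrm{Var}(\beta_{i+L}) - 2\,\mathrm{Cov}(\beta_i, \beta_{i+L})$.

```python
import random

def bm_increment_variance(i, lag, p, sigma, trials, seed=0):
    """Monte Carlo estimate of E[(beta_{i+L} - beta_i)^2] for a BM break signal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        beta = 0.0
        beta_i = 0.0
        for step in range(1, i + lag + 1):
            if rng.random() < p:               # a break occurs at this step
                beta += rng.gauss(0.0, sigma)  # independent jump
            if step == i:
                beta_i = beta                  # remember the earlier value
        total += (beta - beta_i) ** 2
    return total / trials

# Theory: L * p * sigma^2 = 30 * 0.05 * 1.0 = 1.5, independent of i.
estimate = bm_increment_variance(i=20, lag=30, p=0.05, sigma=1.0, trials=20000)
print(estimate)
```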

12 Variance of D
We return to the original formula:

$$\mathrm{Var}(D) = 2\,\mathrm{Var}(\beta_i) + 2\,\mathrm{Var}(\beta_{i+L}) + 4\,\mathrm{Var}(\delta) + 4\,\mathrm{Var}(\varepsilon) - 4\,\mathrm{Cov}(\beta_i, \beta_{i+L}) - 4\,\mathrm{Cov}(\delta_i, \delta_{i+L})$$

and insert our findings:

$$\mathrm{Var}(D(L)) = 2\,p_\beta\,\sigma_\beta^2\,L + 4\,\sigma_\delta^2\left(1 - e^{-p_\delta L}\right) + 4\,\sigma_\varepsilon^2$$

The variance of $D(L)$ has three additive components:
1. Linear function for BM type breaks
2. Exponential function for RD type breaks
3. Constant offset for the noise
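Written as a function, the three components are easy to inspect separately. The sketch below simply evaluates the formula; the function and parameter names are chosen here for illustration.

```python
import math

def var_d(lag, p_beta, sigma_beta, p_delta, sigma_delta, sigma_eps):
    """Theoretical Var(D(L)): linear BM part + exponential RD part + noise offset."""
    bm = 2.0 * p_beta * sigma_beta ** 2 * lag
    rd = 4.0 * sigma_delta ** 2 * (1.0 - math.exp(-p_delta * lag))
    noise = 4.0 * sigma_eps ** 2
    return bm + rd + noise

# At lag 0 only the noise offset remains.
print(var_d(0, 0.05, 0.1, 0.05, 0.1, 0.1))
# With BM breaks only, the variance grows linearly in the lag.
print(var_d(10, 0.05, 0.1, 0.0, 0.0, 0.0))
```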

13 Test with simulated data
RD breaks + noise: $\sigma_\beta = 0.0$, $p_\beta = 0.00$, $\sigma_\delta = 0.1$, $p_\delta = 0.05$
BM breaks + noise: $\sigma_\beta = 0.1$, $p_\beta = 0.05$, $\sigma_\delta = 0.0$, $p_\delta = 0.00$
RD + BM + noise: $\sigma_\beta = 0.1$, $p_\beta = 0.05$, $\sigma_\delta = 0.1$, $p_\delta = 0.05$
The variance follows the theory exactly when the known parameters are inserted. But how good is a retrieval without a priori knowledge?

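14 Retrieval approach
We had:

$$\mathrm{Var}(D(L)) = 2\,p_\beta\,\sigma_\beta^2\,L + 4\,\sigma_\delta^2\left(1 - e^{-p_\delta L}\right) + 4\,\sigma_\varepsilon^2$$

In short form (the additive $e$ is a parameter, while the $e$ in $e^{-cL}$ is Euler's number):

$$V(L) = bL + d\left(1 - e^{-cL}\right) + e, \quad b = 2\,p_\beta\,\sigma_\beta^2,\ c = p_\delta,\ d = 4\,\sigma_\delta^2,\ e = 4\,\sigma_\varepsilon^2$$

Two tangents, one at the beginning and one at the end:

$$L \to 0: \quad V_1(L) = bL + cdL + e = \mathit{slp}_1 L + \mathit{con}_1$$

$$L \to \infty: \quad V_2(L) = bL + d + e = \mathit{slp}_2 L + \mathit{con}_2$$

The two tangents have four parameters. From these we can calculate the unknowns $b$, $c$, $d$, and $e$.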
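Comparing coefficients of the two tangents gives slp1 = b + cd, con1 = e, slp2 = b and con2 = d + e, which can be solved directly. The sketch below spells out this inversion; the function name and the error handling for a vanishing RD component are assumptions made here.

```python
def params_from_tangents(slp1, con1, slp2, con2):
    """Recover (b, c, d, e) from the tangents at L -> 0 and L -> infinity."""
    b = slp2                 # the late tangent has the pure BM slope
    e = con1                 # the early tangent starts at the noise offset
    d = con2 - con1          # the offsets differ by the RD amplitude
    if d == 0:
        raise ValueError("no RD component: c cannot be determined")
    c = (slp1 - slp2) / d    # the extra early slope is c * d
    return b, c, d, e

# Round trip with b = 0.001, c = 0.05, d = 0.04, e = 0.04.
print(params_from_tangents(0.001 + 0.05 * 0.04, 0.04, 0.001, 0.04 + 0.04))
```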

15 Retrieval application
Two-step retrieval:
1. Two tangents as first guess
2. Exhaustive search around it
The approach has a nice geometrical interpretation.
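One possible reading of the second step is a grid search around the first guess that keeps the parameter set with the smallest squared misfit. The slides do not specify the search, so the grid layout, the relative span, the cost function and all names below are assumptions for illustration only.

```python
import itertools
import math

def model(lag, b, c, d, e):
    """V(L) = b L + d (1 - exp(-c L)) + e."""
    return b * lag + d * (1.0 - math.exp(-c * lag)) + e

def sse(params, lags, values):
    """Sum of squared differences between the model and the observed variances."""
    return sum((model(lag, *params) - v) ** 2 for lag, v in zip(lags, values))

def refine(first_guess, lags, values, rel_span=0.5, steps=5):
    """Try every grid combination within +/- rel_span of the first guess.

    steps should be odd so that the first guess itself lies on the grid.
    """
    grids = [
        [g * (1.0 - rel_span + 2.0 * rel_span * k / (steps - 1)) for k in range(steps)]
        for g in first_guess
    ]
    return min(itertools.product(*grids), key=lambda prm: sse(prm, lags, values))

true_params = (0.001, 0.05, 0.04, 0.04)
lags = list(range(1, 101))
values = [model(lag, *true_params) for lag in lags]
guess = tuple(p / 1.25 for p in true_params)   # deliberately offset first guess
best = refine(guess, lags, values)
print(best, sse(best, lags, values))
```

With five steps per parameter this evaluates 625 combinations, which is cheap; a finer grid or repeated refinement would trade run time for resolution.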

16 Retrieval test for sparse data
With 100 station pairs, the scatter is large for high lags. The retrieval nevertheless works well; it is the data themselves that vary.

17 Data
ISTI data restricted to the US and:
- At least 80 years of data
- Distance less than 100 km
This results in 1459 station pairs.

18 Result
At short time lags the $1 - e^{-x}$ increase caused by RD type breaks is visible. For long time lags the linear increase indicates BM type breaks. The offset determines the noise.
Retrieved quantities and their units:
BM: $p_\beta \sigma_\beta^2$ (K² cty⁻¹)
RD: $p_\delta$ (cty⁻¹), $\sigma_\delta$ (K²)
Noise: $\sigma_\varepsilon$ (K²)

19 Conclusion
Brownian motion and random deviation break types can be distinguished by calculating the variance of the spatiotemporal difference. The application shows that US data contain both break types. But we did not consider:
- Possible trend effects
- Stationarity of the variance

20 Lag covariance for RD
The covariance is an exponential function of the time lag:

$$C(L) = a \exp(-bL), \quad a = \sigma_b^2 \ (\text{break strength } \sigma_b), \quad b = \frac{k}{n-k} \ (\text{break number } k)$$

As a byproduct we have a nice method to retrieve the strength and the number of breaks directly from the data.
Input: $\sigma_b = 1.000$, $k = 5.000$
Output: $k = 4.984$
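A simple way to obtain $a$ and $b$ from an estimated lag covariance is a straight-line fit to $\ln C(L)$, after which $b = k/(n-k)$ can be inverted to $k = bn/(1+b)$. The sketch below assumes that $n$ is the length of the time series (the slide does not define it) and that all covariance values are positive; the fitting method and the function names are choices made here, not necessarily those of the talk.

```python
import math

def fit_exponential(lags, cov):
    """Least-squares line through (L, ln C): ln C = ln a - b L. Returns (a, b)."""
    n_pts = len(lags)
    ys = [math.log(c) for c in cov]          # requires cov > 0
    mean_x = sum(lags) / n_pts
    mean_y = sum(ys) / n_pts
    sxx = sum((x - mean_x) ** 2 for x in lags)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lags, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

def breaks_from_decay(b, n):
    """Invert b = k / (n - k) for the break number k (n: assumed series length)."""
    return b * n / (1.0 + b)

# Synthetic check: n = 100, k = 5, sigma_b = 1.
n, k = 100, 5
b_true = k / (n - k)
lags = list(range(1, 21))
cov = [math.exp(-b_true * lag) for lag in lags]
a_fit, b_fit = fit_exponential(lags, cov)
print(a_fit, breaks_from_decay(b_fit, n))
```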

21 US data, not normalized
The covariance mainly reflects the mean difference between two stations. Therefore, the covariance (and the variance) depends strongly on the distance. Averaging over different distance classes would be dangerous.
(Figure: distance classes 350 km, 250 km, 150 km, 50 km.)

22 US data, normalized
Normalization with the time series mean helps. The expected function of the break covariance (an exponential function) becomes visible. But now the variance behaves oddly: it has a minimum at L/4, returns to its original value at L/2, and increases further for larger L.
(Figure: distance classes 350 km, 250 km, 150 km, 50 km.)

23 Simulated data
(Figure panels: not normalised, normalised.)
The normalization causes a deformation and a shift of both the covariance and the variance function.

24 Rationale
The covariance of two time points $a$ and $b$ is (overbars denote averages):

$$\overline{(x_a - \bar{x})(x_b - \bar{x})} = \overline{x_a x_b} - \overline{x_a \bar{x}} - \overline{x_b \bar{x}} + \bar{x}\,\bar{x}$$

The mixed product is:

$$\overline{x_a \bar{x}} = \overline{(x_a' + \bar{x})\,\bar{x}} = \overline{x_a' \bar{x}} + \bar{x}\,\bar{x}$$

Normally we say:

$$\overline{x_a' \bar{x}} = 0$$

Then we have just the shift:

$$\overline{(x_a - \bar{x})(x_b - \bar{x})} = \overline{x_a x_b} - \bar{x}\,\bar{x}$$

However, the mixed product is not zero, but depends on the length of the segment. For long segments:

$$\overline{x_a' \bar{x}} > 0$$

Segments at the beginning and at the end are shorter. For L/4, middle years dominate.

