Published by Edgar Ryan. Modified over 9 years ago.
1
Budapest May 27, 2008
Unifying mixed linear models and the MASH algorithm for breakpoint detection and correction
Anders Grimvall, Sackmone Sirisack, Agne Burauskaite-Harju, and Karl Wahlin
Department of Computer and Information Science
Linköping University, SE-58183 Linköping, Sweden
E-mail: angri@ida.liu.se
2
Objective of our work
  Combine the best ideas of a class of Mixed Linear Models (MLM) suggested by Picard et al. and Multiple Analysis of Series for Homogenization (MASH)
  Provide a unified notation and theoretical framework for breakpoint detection and correction
  Discuss further development of the cited models/methods
3
Parametric vs nonparametric approaches
  Parametric approaches are needed to capture the abruptness of a change
  Nonparametric approaches are suitable for tests of smooth trends in corrected data
4
Checklist for describing methods for breakpoint detection and correction
1. Candidate-reference comparisons
  Pairwise differences or differences between candidate series and optimally weighted reference series
2. Probability model of observed data
  Mean function (observed values adjusted for meteorological variability)
  Variance-covariance matrix (meteorological variability and relationship between observations made at different locations and/or different occasions)
3. Estimators of breakpoints and other model parameters for a given number of breakpoints
  Joint estimation of all model parameters or sequential identification of breakpoints
  Theoretically optimal estimators or ad-hoc methods
5
Checklist for describing methods for breakpoint detection and correction
4. Stopping rule for the number of breakpoints
  Hypothesis testing or information measures
5. Numerical algorithms for the chosen estimators
  Numerical stability and computational cost
6. Loss function for the performance of the breakpoint correction
  Minimizing the risk of erroneous estimates of individual breakpoints or false trends in the corrected series
All the listed items should be documented in any assessment of methods for breakpoint detection and correction!
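Item 4 names information measures as one possible stopping rule. The sketch below illustrates the idea for a single difference series: dynamic programming finds the least-squares segmentation for each number of segments K, and a BIC-type criterion picks K. The penalty of (2K - 1) log n, counting K segment means and K - 1 breakpoint locations, is one common convention and an assumption here; the function names and the toy series are hypothetical and do not come from the presentation.

```python
import numpy as np

def segment_costs(x):
    """cost[i, j] = residual sum of squares of x[i:j] around its own mean."""
    n = x.size
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            tot = s1[j] - s1[i]
            # clip tiny negative values caused by floating-point cancellation
            cost[i, j] = max(s2[j] - s2[i] - tot * tot / (j - i), 0.0)
    return cost

def best_segmentations(x, max_segments):
    """For K = 1..max_segments return {K: (sse, breakpoints)} by dynamic programming."""
    n = x.size
    cost = segment_costs(x)
    D = np.full((max_segments + 1, n + 1), np.inf)   # D[k, j]: best SSE of x[:j] in k segments
    back = np.zeros((max_segments + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            cand = D[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(cand))
            D[k, j], back[k, j] = cand[i], i
    results = {}
    for k in range(1, max_segments + 1):
        starts, j = [], n
        for kk in range(k, 0, -1):                    # walk back through segment starts
            j = int(back[kk, j])
            starts.append(j)
        results[k] = (float(D[k, n]), sorted(starts)[1:])  # drop the leading 0
    return results

def choose_by_bic(x, max_segments):
    """Pick the number of segments minimising n*log(SSE/n) + (2K - 1)*log(n)."""
    n = x.size
    results = best_segmentations(x, max_segments)

    def bic(k):
        sse = max(results[k][0], 1e-12)               # guard against log(0)
        return n * np.log(sse / n) + (2 * k - 1) * np.log(n)

    k_best = min(results, key=bic)
    return k_best, results[k_best][1]

# Toy series: small alternating noise around 0, then around 2 from index 20.
noise = 0.1 * (-1.0) ** np.arange(20)
x = np.concatenate([noise, 2.0 + noise])
k_best, breakpoints = choose_by_bic(x, max_segments=5)
```

A hypothesis-testing rule would replace `bic` by a test of whether the last added breakpoint is significant; the segmentation step itself would not change.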
6
Candidate-reference comparisons
Mixed Linear Models (MLM)
  Candidate-reference comparisons are determined a priori
Multiple Analysis of Series for Homogenization (MASH)
  “Optimally weighted” references are created during the data analysis
7
Probability model of observed data - the mean function
MASH
  The mean function of candidate-reference differences is stepwise constant (multiple breakpoints can be accommodated)
MLM
  The mean function of candidate-reference differences is stepwise constant (multiple breakpoints can be accommodated)
8
Probability model of observed data - the variance-covariance matrix
MASH
  The spatio-temporal covariance is split into spatial covariance and noise
  Candidate-reference differences observed at different occasions are assumed to be statistically independent
MLM
  The spatio-temporal covariance of observed data is expressed by nested random components
  A time series of random components common to all sites in a local neighbourhood introduces both spatial and temporal correlations
  Noise (independent random components) adds to the variability of observed data
9
Probability model of observed data - distributional assumptions
MASH
  The candidate-reference differences are assumed to form a Gaussian vector of independent random variables
MLM
  All random components are assumed to be independent and to have a Gaussian distribution
10
Estimators of breakpoints and other model parameters for a given number of breakpoints
MASH
  Method based on the idea that breakpoints are most easily detected if each candidate series is compared to an optimally selected reference series
  Breakpoints are estimated one at a time (?), given the previously detected breakpoints
MLM
  Joint estimation of all model parameters, including the breakpoints
  The estimator is defined as the argument maximizing the likelihood of observed data (Maximum-Likelihood estimation)
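For independent Gaussian differences with a common variance and a stepwise constant mean, maximizing the likelihood over the position of a single breakpoint amounts to minimizing the residual sum of squares. The sketch below shows only that simplest case; it is neither the MASH procedure nor the joint estimator of Picard et al., and the function name and toy data are hypothetical.

```python
import numpy as np

def best_single_breakpoint(d):
    """Return (k, sse): the split index k (1 <= k < len(d)) that minimises the
    residual sum of squares of a two-level step function fitted to d.
    The first segment is d[:k], the second is d[k:]."""
    d = np.asarray(d, dtype=float)
    n = d.size
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_sse = None, np.inf
    for k in range(1, n):
        left, right = d[:k], d[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse

# Toy candidate-reference differences with a shift of about +2 after the 6th value.
d = np.array([0.1, -0.2, 0.0, 0.2, -0.1, 0.0, 2.1, 1.9, 2.0, 2.2])
k, sse = best_single_breakpoint(d)
```

Applying such a search repeatedly, each time conditioning on the breakpoints already found, gives a sequential scheme; a joint estimator instead optimizes over all breakpoint positions at once.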
11
Numerical algorithms
MASH
  The estimators used are defined by their numerical algorithms
MLM
  Parameter estimates are computed using an Expectation-Maximization (EM) algorithm in which segmentation of observed data is alternated with estimation of model parameters for a given segmentation
12
A Mixed Linear Model of data from m stations observed at n occasions
  Incidence by station and segment
  Incidence by sampling occasion
  Observed values
  Noise
  Means by station and segment
  Random components by sampling occasion
  Vector of zeros and ones indicating the segment of each observation
13
Matrix representation used by Picard et al.
  Model: [equation not preserved in this transcript]
  The matrix T defines the segmentation of the study period
  U is a zero mean normal vector with covariance matrix G
  E is a zero mean normal vector with diagonal covariance matrix R
  U and E are independent, implying that Y has covariance matrix [expression not preserved in this transcript]
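The formulas on this slide are missing from the transcript. Matching the term labels of the preceding slide to the standard mixed-linear-model form suggests the reconstruction below. The symbols \mu (means by station and segment) and Z (incidence by sampling occasion) are assumptions, since neither name survives in the text; Y, T, U, E, G and R are as stated on the slide.

```latex
\begin{align}
  Y &= T\mu + ZU + E, \qquad U \sim \mathcal{N}(0, G), \quad E \sim \mathcal{N}(0, R), \\
  \operatorname{Cov}(Y) &= Z G Z^{\top} + R .
\end{align}
```

The covariance follows from the independence of U and E: the two random terms contribute Z G Z^T and R respectively, with no cross term.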
14
Implicit model of candidate-reference differences
  Introduce the (nm)×(nm) matrix W [definition not preserved in this transcript], where n is the number of sampling occasions and m is the number of stations.
  Provided that the row sums of W are zero, we get the matrix equation [equation not preserved in this transcript]
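The role of the zero row sums can be checked numerically. The sketch below rests on several assumptions not confirmed by the slide: observations are ordered occasion by occasion, W is block-diagonal with one m×m block W0 per occasion, each row of W0 takes one station minus the equally weighted mean of the others, and Z denotes the incidence matrix by sampling occasion. Under those assumptions the component common to all stations cancels, so the weighted differences W Y depend only on the step means and the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 4                                  # occasions, stations (illustrative sizes)

# Observations are stacked occasion by occasion: index = t * m + station.
Z = np.kron(np.eye(n), np.ones((m, 1)))       # (nm) x n incidence by occasion
U = rng.normal(0.0, 1.0, size=n)              # random component common to all stations
E = rng.normal(0.0, 0.3, size=n * m)          # independent noise

# Stepwise constant means: station 0 shifts by +1.5 from occasion 30 onwards.
mean = np.zeros((n, m))
mean[30:, 0] = 1.5
Tmu = mean.reshape(-1)                        # plays the role of T times the mean vector

Y = Tmu + Z @ U + E

# W0: each station minus the equally weighted average of the other stations.
W0 = np.eye(m) - (np.ones((m, m)) - np.eye(m)) / (m - 1)
W = np.kron(np.eye(n), W0)                    # (nm) x (nm); every row sums to zero

common_part_removed = np.allclose(W @ Z, 0.0)
differences_match = np.allclose(W @ Y, W @ Tmu + W @ E)
```

Optimally weighted references, as in MASH, would correspond to a different choice of W0 whose rows still sum to zero.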
15
Alternating algorithms for joint estimation of all model parameters
  The entire space of parameters is searched by altering some of the coordinates at a time
  Each cycle of the alternating algorithm contains:
    i. a segmentation step (S)
    ii. an estimation step for a given segmentation of the data (E)
    iii. an optional step for deriving an “optimal” reference to each time series of data (O)
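The cycle of (S) and (E) steps can be sketched as a simple coordinate-wise least-squares scheme. This is a deliberately reduced illustration, not the EM algorithm of Picard et al.: the common component per occasion is treated as a plain parameter, each station is allowed at most one breakpoint, and `min_gain` is a hypothetical threshold standing in for a proper stopping rule. The optional step (O) is omitted.

```python
import numpy as np

def sse(x):
    """Sum of squared deviations from the mean."""
    return float(((x - x.mean()) ** 2).sum())

def segment(x, min_gain):
    """(S) Best single split index k, or 0 for 'no breakpoint'.

    A split is accepted only if it lowers the residual sum of squares
    by more than min_gain (hypothetical, user-chosen threshold)."""
    costs = [sse(x[:k]) + sse(x[k:]) for k in range(1, x.size)]
    k = int(np.argmin(costs)) + 1
    return k if sse(x) - costs[k - 1] > min_gain else 0

def fit_means(x, k):
    """(E) Stepwise constant fit for a given segmentation."""
    fitted = np.full(x.size, x.mean())
    if k > 0:
        fitted[:k], fitted[k:] = x[:k].mean(), x[k:].mean()
    return fitted

def alternate(Y, min_gain, max_cycles=20):
    """Alternate (S) and (E) on an n x m array Y (occasions x stations)."""
    n, m = Y.shape
    u = np.zeros(n)                           # common component per occasion
    breaks = np.full(m, -1)                   # -1: not yet segmented
    for _ in range(max_cycles):
        X = Y - u[:, None]                    # remove the current common component
        new_breaks = np.array([segment(X[:, j], min_gain) for j in range(m)])
        fitted = np.column_stack(
            [fit_means(X[:, j], new_breaks[j]) for j in range(m)]
        )
        u = (Y - fitted).mean(axis=1)         # re-estimate the common component
        u -= u.mean()                         # centre it for identifiability
        if np.array_equal(new_breaks, breaks):
            break                             # segmentation no longer changes
        breaks = new_breaks
    return breaks, fitted, u

# Noise-free toy data: an alternating common signal shared by four stations,
# with a +3 shift at station 0 from occasion 30 onwards.
n, m = 60, 4
U = (-1.0) ** np.arange(n)
Y = 10.0 + np.arange(m)[None, :] + U[:, None]
Y[30:, 0] += 3.0
breaks, fitted, u = alternate(Y, min_gain=10.0)
```

On this toy data the scheme settles after two cycles, with a breakpoint at occasion 30 for station 0, none for the other stations, and the common signal recovered in `u`.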
16
Remarks on alternating algorithms for joint estimation of all model parameters
  One does not need to maximize with respect to all of the latent parameters at once, but could instead maximize over one or a few of them at a time, alternating with the maximization step
  The algorithm can be made adaptive by altering the return time for different parts of the full cycle
  Additional constraints may be imposed on the structure of the variance-covariance matrix
  The mean function can be modified to accommodate mean functions that are non-constant between breakpoints
  Covariates can be introduced into the model
17
Proposed basis for a unified approach
1. A joint probabilistic framework comprised of multivariate normal distributions expressed as mixed linear models
2. Explicitly defined mean functions and variance-covariance matrices (stepwise constant or linear mean functions, spatial and temporal correlations etc.)
3. Joint ML-estimation of all model parameters (including the location of breakpoints) is adopted as a desirable standard
4. Optimal weighting of references and other schemes for candidate-reference comparisons are offered as options to all models
5. Various stopping rules for the number of breakpoints are offered as options to all models
6. The detection and correction of breakpoints should be regarded as a filter that reduces the risk of false conclusions regarding temporal trends
18
Some remarks on temporal scales
Homogenizing subannual data may have three objectives:
  Facilitate the detection of breakpoints that occur in the middle of a year
  Facilitate the detection of breakpoints by using meteorological covariates
  Facilitate the detection of changepoints in extremes
19
Additional remark on parametric vs nonparametric approaches
  Parametric approaches are often a must when data are sparse
  Observations of extreme events are sparse
  The joint occurrence of shifts in the mean and higher percentiles calls for parametric modelling
20
Conclusions
  We need a checklist for describing all methods considered
  Mixed linear models provide a framework and generic notation for unifying “all” parametric approaches from SNHT to Caussinus & Mestre and MASH
  The choice of principles for parameter estimation should be separated from the construction of numerical algorithms
  Options for candidate-reference comparisons and stopping rules for the number of breakpoints should be offered to all underlying models