
Unifying mixed linear models and the MASH algorithm for breakpoint detection and correction
Anders Grimvall, Sackmone Sirisack, Agne Burauskaite-Harju, and Karl Wahlin
Department of Computer and Information Science, Linköping University, SE Linköping, Sweden
Budapest, May 27, 2008

Objective of our work
- Combine the best ideas of a class of Mixed Linear Models (MLM) suggested by Picard et al. and of Multiple Analysis of Series for Homogenization (MASH)
- Provide a unified notation and theoretical framework for breakpoint detection and correction
- Discuss further development of the cited models/methods

Parametric vs. nonparametric approaches
- Parametric approaches are needed to capture the abruptness of a change
- Nonparametric approaches are suitable for tests of smooth trends in corrected data

Checklist for describing methods for breakpoint detection and correction
1. Candidate-reference comparisons
   - Pairwise differences, or differences between candidate series and optimally weighted reference series
2. Probability model of observed data
   - Mean function (observed values adjusted for meteorological variability)
   - Variance-covariance matrix (meteorological variability and the relationship between observations made at different locations and/or on different occasions)
3. Estimators of breakpoints and other model parameters for a given number of breakpoints
   - Joint estimation of all model parameters, or sequential identification of breakpoints
   - Theoretically optimal estimators or ad hoc methods

Checklist for describing methods for breakpoint detection and correction (continued)
4. Stopping rule for the number of breakpoints
   - Hypothesis testing or information measures
5. Numerical algorithms for the chosen estimators
   - Numerical stability and computational cost
6. Loss function for the performance of the breakpoint correction
   - Minimizing the risk of erroneous estimates of individual breakpoints or of false trends in the corrected series
All the listed items should be documented in any assessment of methods for breakpoint detection and correction!
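The information-measure option in item 4 can be sketched concretely. The slides do not name a specific criterion, so the example below assumes BIC for a single series with Gaussian noise and a stepwise-constant mean; the function names and the penalty term are illustrative, not taken from the talk.

```python
import numpy as np

def best_sse(y, k):
    """Minimum within-segment sum of squares over all segmentations
    of y into k + 1 segments (i.e. k breakpoints), by dynamic programming."""
    n = len(y)
    cum = np.concatenate([[0.0], np.cumsum(y)])
    cum2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def sse(i, j):  # squared error of the single segment y[i:j]
        s, m = cum[j] - cum[i], j - i
        return cum2[j] - cum2[i] - s * s / m

    D = np.full((k + 2, n + 1), np.inf)
    D[0, 0] = 0.0
    for seg in range(1, k + 2):
        for j in range(seg, n + 1):
            D[seg, j] = min(D[seg - 1, i] + sse(i, j) for i in range(seg - 1, j))
    return D[k + 1, n]

def bic(y, k):
    """Gaussian BIC: n log(SSE/n) plus a penalty counting k + 1 segment means,
    k breakpoint positions and one variance (an assumed parameter count)."""
    n = len(y)
    return n * np.log(best_sse(y, k) / n) + (2 * k + 2) * np.log(n)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 1, 60)])  # one true shift
n_break = min(range(4), key=lambda k: bic(y, k))
print("selected number of breakpoints:", n_break)
```

The stopping rule simply picks the number of breakpoints with the smallest criterion value; a hypothesis-testing rule would instead test each additional breakpoint against a null distribution.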

Candidate-reference comparisons
- Mixed Linear Models (MLM)
  - Candidate-reference comparisons are determined a priori
- Multiple Analysis of Series for Homogenization (MASH)
  - "Optimally weighted" references are created during the data analysis

Probability model of observed data - the mean function
- MASH
  - The mean function of candidate-reference differences is stepwise constant (multiple breakpoints can be accommodated)
- MLM
  - The mean function of candidate-reference differences is stepwise constant (multiple breakpoints can be accommodated)

Probability model of observed data - the variance-covariance matrix
- MASH
  - The spatio-temporal covariance is split into spatial covariance and noise
  - Candidate-reference differences observed on different occasions are assumed to be statistically independent
- MLM
  - The spatio-temporal covariance of observed data is expressed by nested random components
  - A time series of random components common to all sites in a local neighbourhood introduces both spatial and temporal correlations
  - Noise (independent random components) adds to the variability of observed data
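The nested-random-components idea in the MLM column can be checked numerically. The sketch below (sizes and variances are illustrative) draws one random component per sampling occasion, shared by all stations, plus independent noise, and verifies the variances and covariances this structure implies.

```python
import numpy as np

rng = np.random.default_rng(1)
n_occasions, n_stations, n_rep = 100, 4, 2000
sigma_u, sigma_e = 1.0, 0.5   # std. dev. of the shared component and of the noise

# y[r, t, s] = u[r, t] + e[r, t, s]: u is common to all stations at occasion t
u = rng.normal(0.0, sigma_u, size=(n_rep, n_occasions))
e = rng.normal(0.0, sigma_e, size=(n_rep, n_occasions, n_stations))
y = u[:, :, None] + e

# Covariance between two stations at the same occasion -> sigma_u**2
same_occ = np.mean(y[:, :, 0] * y[:, :, 1])
# Variance at a single station -> sigma_u**2 + sigma_e**2
var_one = np.mean(y[:, :, 0] ** 2)
print(same_occ, var_one)
```

Making the u-components a genuine time series (e.g. an AR(1) process) would additionally introduce temporal correlation; here the occasions are kept independent for brevity.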

Probability model of observed data - distributional assumptions
- MASH
  - The candidate-reference differences are assumed to form a Gaussian vector of independent random variables
- MLM
  - All random components are assumed to be independent and to have a Gaussian distribution

Estimators of breakpoints and other model parameters for a given number of breakpoints
- MASH
  - Based on the idea that breakpoints are most easily detected if each candidate series is compared to an optimally selected reference series
  - Breakpoints are estimated one at a time (?), given the previously detected breakpoints
- MLM
  - Joint estimation of all model parameters, including the breakpoints
  - The estimator is defined as the argument maximizing the likelihood of the observed data (maximum-likelihood estimation)
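The contrast between sequential and joint estimation can be illustrated on a single series. The toy sketch below (assumed shift sizes and segment lengths; not the MASH or MLM implementation) finds one split at a time versus minimizing the total squared error, which is the Gaussian maximum-likelihood criterion, over both breakpoint positions jointly.

```python
import numpy as np

def sse(seg):
    """Squared error of one segment around its own mean."""
    return float(np.sum((seg - seg.mean()) ** 2)) if len(seg) else 0.0

rng = np.random.default_rng(2)
# A short upward excursion: true breakpoints at t = 40 and t = 55
y = np.concatenate([rng.normal(0, 1, 40), rng.normal(3, 1, 15), rng.normal(0, 1, 40)])
n = len(y)

# Sequential: the single best split of the whole series, found first
t1 = min(range(1, n), key=lambda t: sse(y[:t]) + sse(y[t:]))

# Joint ML for two breakpoints: minimize total SSE over both positions at once
best = min(((sse(y[:a]) + sse(y[a:b]) + sse(y[b:]), (a, b))
            for a in range(1, n - 1) for b in range(a + 1, n)))
print("first sequential split:", t1, "joint ML breakpoints:", best[1])
```

Sequential detection conditions each new breakpoint on the previously detected ones, so its first split need not coincide with either true breakpoint; joint estimation avoids that, at a higher computational cost.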

Numerical algorithms
- MASH
  - The estimators used are defined by their numerical algorithms
- MLM
  - Parameter estimates are computed with an Expectation-Maximization (EM) algorithm in which segmentation of the observed data is alternated with estimation of model parameters for a given segmentation

A Mixed Linear Model of data from m stations observed at n occasions
- Model: Y = Tμ + ZU + E, where
  - Y is the vector of observed values
  - T is the incidence matrix by station and segment (each row a vector of zeros and ones indicating the segment of each observation)
  - μ is the vector of means by station and segment
  - Z is the incidence matrix by sampling occasion
  - U is the vector of random components by sampling occasion
  - E is the noise

Matrix representation used by Picard et al.
- Model: Y = Tμ + ZU + E
  - The matrix T defines the segmentation of the study period
  - The matrix Z assigns each observation to its sampling occasion
  - U is a zero-mean normal vector with covariance matrix G
  - E is a zero-mean normal vector with diagonal covariance matrix R
  - U and E are independent, implying that Y has covariance matrix V = ZGZ' + R
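The covariance statement can be verified by simulation. Since the transcript does not reproduce the equation images, the sketch below assumes the model Y = Tμ + ZU + E with Z an incidence matrix assigning observations to sampling occasions, and checks that the random part of Y has covariance ZGZ' + R; all sizes and variance values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, n_rep = 5, 3, 100000           # occasions, stations, Monte Carlo replicates

# Z maps each of the n*m observations (ordered occasion by occasion) to its occasion
Z = np.kron(np.eye(n), np.ones((m, 1)))
G = 0.8 * np.eye(n)                  # covariance of the occasion components U
R = 0.3 * np.eye(n * m)              # diagonal noise covariance

U = rng.normal(0.0, np.sqrt(0.8), size=(n_rep, n))
E = rng.normal(0.0, np.sqrt(0.3), size=(n_rep, n * m))
Y = U @ Z.T + E                      # random part of Y = T mu + Z U + E

V_emp = np.cov(Y, rowvar=False)      # empirical covariance of the simulated Y
V_theory = Z @ G @ Z.T + R
print("max deviation:", np.max(np.abs(V_emp - V_theory)))
```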

Implicit model of candidate-reference differences
- Introduce an (nm) x (nm) weighting matrix W, where n is the number of sampling occasions and m is the number of stations
- Provided that the row sums of W are zero (each row combining observations from a single sampling occasion), we get the matrix equation WY = WTμ + WE: the occasion-common random components cancel in the candidate-reference differences
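The cancellation of the shared random components can be demonstrated directly. In the sketch below (an assumed, equally weighted candidate-reference scheme, not MASH's optimal weights), each row of W subtracts the mean of all stations at the same occasion, so the row sums are zero and W annihilates the occasion-incidence matrix Z.

```python
import numpy as np

n, m = 4, 3                                 # occasions, stations; occasion-major order
Z = np.kron(np.eye(n), np.ones((m, 1)))     # maps each observation to its occasion

# Each station minus the equally weighted mean of all stations at the same occasion
w_block = np.eye(m) - np.ones((m, m)) / m
W = np.kron(np.eye(n), w_block)             # (nm) x (nm), every row sums to zero

print("max |row sum|:", np.abs(W.sum(axis=1)).max())
print("max |W Z|:", np.abs(W @ Z).max())    # W Z = 0, hence W Z U = 0 for any U
```

Because WZ vanishes, applying W to Y = Tμ + ZU + E leaves WY = WTμ + WE: the differenced series keeps its breakpoint structure but loses the shared random components.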

Alternating algorithms for joint estimation of all model parameters
- The entire parameter space is searched by updating some of the coordinates at a time
- Each cycle of the alternating algorithm contains:
  i. a segmentation step (S)
  ii. an estimation step for a given segmentation of the data (E)
  iii. an optional step for deriving an "optimal" reference for each time series of data (O)
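The S/E alternation can be sketched for the simplest possible case: one series, one breakpoint, stepwise-constant means, and no optional O step. All names are illustrative and the full multi-site model is omitted.

```python
import numpy as np

def alternate_fit(y, n_iter=20):
    """Alternate a segmentation step (S) with an estimation step (E)
    for one breakpoint and stepwise-constant segment means."""
    n = len(y)
    tau = n // 2                              # initial segmentation
    for _ in range(n_iter):
        # E: estimate the segment means for the current segmentation
        mu1, mu2 = y[:tau].mean(), y[tau:].mean()
        # S: re-segment by minimizing squared error for the current means
        costs = [np.sum((y[:t] - mu1) ** 2) + np.sum((y[t:] - mu2) ** 2)
                 for t in range(1, n)]
        new_tau = 1 + int(np.argmin(costs))
        if new_tau == tau:                    # segmentation stopped changing
            break
        tau = new_tau
    return tau, mu1, mu2

rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0, 1, 50), rng.normal(2.5, 1, 50)])
tau, mu1, mu2 = alternate_fit(y)
print("breakpoint:", tau, "means:", round(mu1, 2), round(mu2, 2))
```

Each cycle weakly improves the Gaussian fit, but, as with EM-type schemes in general, only convergence to a local optimum is guaranteed, so the initial segmentation matters.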

Remarks on alternating algorithms for joint estimation of all model parameters
- One does not need to maximize with respect to all of the latent parameters at once; one can instead maximize over one or a few of them at a time, alternating with the maximization step
- The algorithm can be made adaptive by altering the return time for different parts of the full cycle
- Additional constraints may be imposed on the structure of the variance-covariance matrix
- The model can be extended to accommodate mean functions that are non-constant between breakpoints
- Covariates can be introduced into the model

Proposed basis for a unified approach
1. A joint probabilistic framework comprising multivariate normal distributions expressed as mixed linear models
2. Explicitly defined mean functions and variance-covariance matrices (stepwise constant or linear mean functions, spatial and temporal correlations, etc.)
3. Joint ML estimation of all model parameters (including the locations of breakpoints) adopted as a desirable standard
4. Optimal weighting of references and other schemes for candidate-reference comparisons offered as options for all models
5. Various stopping rules for the number of breakpoints offered as options for all models
6. Detection and correction of breakpoints regarded as a filter that reduces the risk of false conclusions regarding temporal trends

Some remarks on temporal scales
- Homogenizing subannual data may have three objectives:
  - Facilitate the detection of breakpoints that occur in the middle of a year
  - Facilitate the detection of breakpoints by using meteorological covariates
  - Facilitate the detection of changepoints in extremes

Additional remarks on parametric vs. nonparametric approaches
- Parametric approaches are often a must when data are sparse
- Observations of extreme events are sparse
- The joint occurrence of shifts in the mean and in higher percentiles calls for parametric modelling

Conclusions
- We need a checklist for describing all methods considered
- Mixed linear models provide a framework and generic notation for unifying "all" parametric approaches, from SNHT to Caussinus & Mestre and MASH
- The choice of principles for parameter estimation should be separated from the construction of numerical algorithms
- Options for candidate-reference comparisons and stopping rules for the number of breakpoints should be offered for all underlying models