Robustness of Multiway Methods in Relation to Homoscedastic and Heteroscedastic Noise. T. Khayamian, Department of Chemistry, Isfahan University of Technology


1 Robustness of Multiway Methods in Relation to Homoscedastic and Heteroscedastic Noise. T. Khayamian, Department of Chemistry, Isfahan University of Technology, Isfahan 84154, Iran

2 Outline
● Introduction
● Prediction of component concentrations in the Claus data using the PARAFAC and N-PLS multiway methods:
- original data
- (original + noise) data
- denoised data (using wavelets as the denoising method):
  homoscedastic noise (level-independent method);
  heteroscedastic noise (level-dependent and minimum description length methods)
● Conclusions

3 Noise definition
● Noise is any component of a signal that impedes observation, detection, or utilization of the information the signal carries.
● Noise is measured by its standard deviation or its peak-to-peak fluctuation.

4 Different types of noise: homoscedastic and heteroscedastic

5 Homoscedastic and Heteroscedastic Noise
● Homoscedastic noise: the noise is independent of the variable, sample, and signal, with a normal distribution and a constant variance.
● Heteroscedastic noise: the noise depends on the variable, sample, or signal. (Noise from different variables or samples can also be correlated.)

6 Least squares method
Homoscedastic noise: σij is constant, uniform, and independent of the signal, variables, and samples.
Heteroscedastic noise: σij depends on the signal, variables, or samples.
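The least-squares distinction above can be sketched numerically: under heteroscedastic noise each observation is weighted by the inverse of its own noise variance, and with constant σij this collapses back to ordinary least squares. The following NumPy snippet is an illustrative sketch, not code from the original slides:

```python
import numpy as np

def wls(X, y, sigma):
    """Weighted least squares: minimise sum_i ((y_i - x_i . b) / sigma_i)^2.
    Solves (X' W X) b = X' W y with W = diag(1 / sigma^2)."""
    w = 1.0 / np.square(np.asarray(sigma, dtype=float))
    XtW = X.T * w                      # scale column i of X.T by weight w_i
    return np.linalg.solve(XtW @ X, XtW @ y)

# Points lying exactly on y = 1 + 2x: any weighting recovers beta = (1, 2)
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1.0 + 2.0 * np.arange(5.0)
beta = wls(X, y, np.ones(5))
```

With exact data the weights do not change the solution; with noisy data, down-weighting the high-variance points is what distinguishes the heteroscedastic fit from the ordinary one.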

7 Heteroscedastic noise in univariate and multivariate calibration methods
● Zeroth-order calibration: weighted linear regression
● First-order calibration: weighted principal component analysis
● Second-order calibration: positive matrix factorization; maximum likelihood PARAFAC

8 Claus data (fluorescence spectroscopy)
Columns: Analyte 1 (tyrosine), Analyte 2 (tryptophan), Analyte 3 (phenylalanine); rows: Samples 1-5. Concentrations are on the order of 10⁻⁴ (for example, 2.7×10⁻⁴ for Sample 1; most numeric entries were not preserved in the transcript).
C. A. Andersson and R. Bro, "The N-way Toolbox for MATLAB", Chemom. Intell. Lab. Syst. 2000, 52(1); www.models.kvl.dk

9 Fluorescence excitation and emission spectra of the five samples

10 Claus data, PARAFAC: four samples were used for modeling
X = a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 + a3 ∘ b3 ∘ c3
Score a1 corresponds to the concentration of analyte 1, score a2 to analyte 2, and score a3 to analyte 3.
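The trilinear PARAFAC model on this slide, X[i,j,k] = Σr A[i,r] B[j,r] C[k,r], can be written as a short sketch. The function below (an illustrative NumPy version, with made-up factor matrices) rebuilds a three-way array from factor matrices whose columns are the vectors a_r, b_r, c_r:

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Rank-R trilinear model: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# Small rank-2 example with hypothetical factor matrices
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]])   # 3 samples x 2 components
B = np.array([[1.0, 1.0], [2.0, 0.0]])               # 2 emission channels
C = np.array([[1.0, 3.0], [0.0, 1.0]])               # 2 excitation channels
X = parafac_reconstruct(A, B, C)                     # shape (3, 2, 2)
```

The einsum contraction is equivalent to summing the outer products a_r ∘ b_r ∘ c_r over components, which is exactly the decomposition drawn on the slide.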

11 Calculation of the Score for a New Sample
Z = kr(B, C)
Un = reshape(Un, 12261, 1)
score = pinv(Z) * Un
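The MATLAB calls on this slide (kr from the N-way toolbox, reshape, pinv) can be mirrored in NumPy. The sketch below is illustrative: khatri_rao reproduces kr's column-wise Kronecker product, and the new sample's J×K slab is unfolded to a column (12261 = 201×61 in the Claus data) before the pseudoinverse projection. The ordering is kept internally consistent, since NumPy's row-major reshape matches the j·K + k row ordering of Z built this way:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product, like kr(B, C) in the N-way toolbox."""
    J, R = B.shape
    K, R2 = C.shape
    assert R == R2
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def new_sample_scores(B, C, slab):
    """Unfold a J x K sample slab and project it onto Z = kr(B, C) with pinv."""
    Z = khatri_rao(B, C)
    un = slab.reshape(-1, 1)
    return (np.linalg.pinv(Z) @ un).ravel()

# Hypothetical loadings and true scores; the slab is built from the model itself
B = np.array([[1.0, 0.5], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]])
C = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])
true = np.array([2.0, -1.0])
slab = sum(true[r] * np.outer(B[:, r], C[:, r]) for r in range(2))
scores = new_sample_scores(B, C, slab)
```

Because the slab was generated from the loadings exactly, the pseudoinverse recovers the true scores; with real data the projection gives the least-squares scores instead.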

12 Relative Errors of Predicted Concentrations for Samples 4 & 5 (without added noise)
Rows: Sample 4, Sample 5; columns: Analyte 1, Analyte 2, Analyte 3 (the numeric entries were not preserved in the transcript).

13 Generating the Noise Matrices
Five 201×61 slabs (Claus data) + noise, followed by unfolding.
Homoscedastic noise: the standard deviation of the noise = 2%, 5%, or 10% of the maximum value in the Claus data.
Heteroscedastic noise: N = N(0,1) .* (X/10), i.e. standard normal noise multiplied element by element by one tenth of the Claus data.
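The two noise recipes above can be sketched as follows. This is a NumPy illustration; the random slab stands in for a real 201×61 Claus data slab, so the values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((201, 61))          # stand-in for one 201 x 61 Claus data slab

# Homoscedastic: constant standard deviation, here 2% of the data maximum
homo_noise = rng.normal(0.0, 0.02 * X.max(), size=X.shape)

# Heteroscedastic: N(0,1) scaled element by element by one tenth of the data
hetero_noise = rng.standard_normal(X.shape) * (X / 10.0)

X_homo = X + homo_noise
X_hetero = X + hetero_noise
```

The homoscedastic matrix has the same noise level everywhere, while the heteroscedastic one is signal-dependent: large data values receive proportionally larger noise.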

14 Homoscedastic and heteroscedastic noise added to the original data. Panels: heteroscedastic noise (10%); homoscedastic noise (10%).

15 Reshape of Sample One: sample one with homoscedastic noise added; the effect of adding homoscedastic noise.

16 Reshape of Sample One: sample one with heteroscedastic noise added; the effect of adding heteroscedastic noise.

17 Wavelets can be used as a powerful tool for signal denoising. Wavelet denoising:
● Wavelet decomposition of the signal
● Selection of the threshold
● Application of the threshold to the wavelet coefficients
● Inverse transformation back to the native domain

18 Thresholding methods:
● Global thresholding
● Level-dependent thresholding
● Data-dependent thresholding
● Cycle-spin thresholding
● Wavelet packet thresholding

19 Universal threshold
λ = σ̂ √(2 ln N), where N is the length of the data array, σ̂ = median(|Xi|)/0.6745, and the Xi are the detail coefficients.
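Combined with soft thresholding of one level of Haar detail coefficients, the universal threshold might look like this in NumPy. This is a minimal sketch under stated assumptions (one decomposition level, even-length signal), not the denoising code used for the slides:

```python
import numpy as np

def haar_level1(x):
    """One-level orthonormal Haar transform (even-length x assumed)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail coefficients
    return a, d

def universal_threshold(detail):
    """lambda = sigma_hat * sqrt(2 ln N), sigma_hat = median(|d|) / 0.6745."""
    sigma = np.median(np.abs(detail)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(detail.size))

def denoise(x):
    a, d = haar_level1(x)
    lam = universal_threshold(d)
    d = np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)   # soft thresholding
    y = np.empty_like(np.asarray(x, dtype=float))       # inverse Haar
    y[0::2] = (a + d) / np.sqrt(2.0)
    y[1::2] = (a - d) / np.sqrt(2.0)
    return y
```

On a signal whose details are all zero the threshold is zero and the reconstruction is exact; on pure noise, most details fall below λ and are suppressed.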

20

21 Prediction of Analyte Concentrations for Samples 4 & 5 using PARAFAC

22 Comparison of the Sum of Squared Residuals (homoscedastic noise, PARAFAC)
Rows: without noise; noisy data (homoscedastic noise 2%, 5%, 10%); denoised data (2%, 5%, 10%). Columns: SSR for Model 1 and Model 2; each value ×10⁵.
Model 1: samples 1, 2, 3, 4; Model 2: samples 1, 2, 3, 5.

23 Comparison of Explained Variation (homoscedastic noise, PARAFAC)
Rows: without noise; noisy data (2%, 5%, 10%); denoised data (2%, 5%, 10%). Columns: explained variation for Model 1 and Model 2.

24 Relative Errors of Predicted Concentrations for Sample 4 (homoscedastic noise, PARAFAC)
Rows: 0% noise; noisy data (2%, 5%, 10%); denoised data (2%, 5%, 10%). Columns: Analyte 1, Analyte 2, Analyte 3.

25 Relative Errors of Predicted Concentrations for Sample 5 (homoscedastic noise, PARAFAC)
Rows and columns as on slide 24.

26 Comparison of the Sum of Squared Residuals (heteroscedastic noise)
Wavelet denoising: level-dependent method.
Rows: without noise; noisy data (heteroscedastic noise 10%, 20%); denoised data (10%, 20%). Columns: SSR for Model 1 and Model 2; each value ×10⁵.

27 Comparison of Explained Variation (heteroscedastic noise, PARAFAC)
Wavelet denoising: level-dependent method.
Rows: without noise; noisy data (10%, 20%); denoised data (10%, 20%). Columns: explained variation for Model 1 and Model 2.

28 Relative Errors of Predicted Concentrations for Sample 4 (heteroscedastic noise, PARAFAC)
Wavelet denoising: level-dependent method.
Rows: 0% noise; noisy data (10%, 20%); denoised data (10%, 20%). Columns: Analyte 1, Analyte 2, Analyte 3.

29 Relative Errors of Predicted Concentrations for Sample 5 (heteroscedastic noise, PARAFAC)
Wavelet denoising: level-dependent method; rows and columns as on slide 28.

30 Minimum Description Length (MDL)
● The MDL is an approach to simultaneous noise suppression and signal compression.
● It is free from any parameter setting, such as threshold selection, which is particularly useful for real data where the noise level is difficult to estimate.
Notation: m = filter type; l = the number of major coefficients retained; γ(m)j,k = the vector of wavelet coefficients for transform type m; γ̃(m,l)j,k = the vector of the contracted (thresholded) wavelet coefficients.
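One common form of this criterion, following Saito's simultaneous noise suppression and signal compression (the exact formula used in the slides is not shown, so this form is an assumption), retains the k largest orthonormal-basis coefficients minimising MDL(k) = (3/2) k log N + (N/2) log ||x − x̃k||². A minimal NumPy sketch:

```python
import numpy as np

def mdl_denoise(coeffs):
    """Keep the k largest-magnitude coefficients that minimise
    MDL(k) = (3/2) k log N + (N/2) log(residual energy).
    `coeffs` are coefficients in an orthonormal basis (e.g. wavelet)."""
    c = np.asarray(coeffs, dtype=float)
    N = c.size
    order = np.argsort(np.abs(c))[::-1]            # largest magnitude first
    energy = np.cumsum(c[order] ** 2)              # energy retained by top-k coeffs
    total = energy[-1]
    best_k, best_cost = 1, np.inf
    for k in range(1, N):                          # never keep all N (residual -> 0)
        resid = max(total - energy[k - 1], 1e-300)
        cost = 1.5 * k * np.log(N) + 0.5 * N * np.log(resid)
        if cost < best_cost:
            best_k, best_cost = k, cost
    kept = np.zeros(N, dtype=bool)
    kept[order[:best_k]] = True
    return best_k, np.where(kept, c, 0.0)

# Three strong coefficients buried in many small ones (hypothetical values)
c = np.concatenate([[10.0, -8.0, 5.0], np.full(61, 0.01)])
k, c_denoised = mdl_denoise(c)
```

No threshold is chosen by the user: the penalty term (3/2) k log N grows with each retained coefficient, and the minimum of the two competing terms fixes k automatically.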

31 Signal Denoising with the MDL method

32 Comparison of the Sum of Squared Residuals (heteroscedastic noise, PARAFAC)
Wavelet denoising: MDL.
Rows: without noise; noisy data (heteroscedastic noise 10%, 20%); denoised data (10%, 20%). Columns: SSR for Model 1 and Model 2; each value ×10⁵.

33 Comparison of Explained Variation (heteroscedastic noise, PARAFAC)
Wavelet denoising: MDL.
Rows: without noise; noisy data (10%, 20%); denoised data (10%, 20%). Columns: explained variation for Model 1 and Model 2.

34 Relative Errors of Predicted Concentrations for Sample 4 (heteroscedastic noise, PARAFAC)
Wavelet denoising: MDL.
Rows: 0% noise; noisy data (10%, 20%); denoised data (10%, 20%). Columns: Analyte 1, Analyte 2, Analyte 3.

35 Relative Errors of Predicted Concentrations for Sample 5 (heteroscedastic noise, PARAFAC)
Wavelet denoising: MDL; rows and columns as on slide 34.

36 Prediction of Analyte Concentrations for Samples 4 & 5 using N-PLS

37 Relative Errors of Predicted Concentrations for Sample 4 (homoscedastic noise, N-PLS model)
Rows: 0% noise; noisy data (2%, 5%, 10%); denoised data (2%, 5%, 10%). Columns: Analyte 1, Analyte 2, Analyte 3. Explained variance: X-block > 99%, Y-block > 99%.

38 Relative Errors of Predicted Concentrations for Sample 5 (homoscedastic noise, N-PLS model)
Rows and columns as on slide 37. Explained variance: X-block > 99%, Y-block > 99%.

39 Relative Errors of Predicted Concentrations for Sample 4 (heteroscedastic noise, N-PLS model)
Wavelet denoising: MDL.
Rows: 0% noise; noisy data (10%, 20%); denoised data (10%, 20%). Columns: Analyte 1, Analyte 2, Analyte 3. Explained variance: X-block > 99%, Y-block > 99%.

40 Comparison of the Sum of Squared Residuals (heteroscedastic noise, PARAFAC)
Rows: without noise; noisy data (heteroscedastic noise 10%, 20%); denoised data (10%, 20%). Columns: SSR for Model 1 and Model 2; each value ×10⁵.

41 Relative Errors of Predicted Concentrations for Sample 4 (heteroscedastic noise, N-PLS model)
Rows: 0% noise; noisy data (10%, 20%); denoised data (10%, 20%). Columns: Analyte 1, Analyte 2, Analyte 3.

42 Relative Errors of Predicted Concentrations for Sample 5 (heteroscedastic noise, N-PLS model)
Rows and columns as on slide 41.

43 Relative Errors of Predicted Concentrations for Sample 5 (heteroscedastic noise, N-PLS model)
Wavelet denoising: MDL; rows and columns as on slide 41.

44 Comparison of Explained Variation (homoscedastic noise, N-PLS model)
Rows: noise 0%; noisy data (2%, 5%, 10%); denoised data (2%, 5%, 10%). Columns: X-block and Y-block explained variation for Model 1 and Model 2.

45 Comparison of Explained Variation (heteroscedastic noise, N-PLS model)
Rows: noise 0%; noisy data (10%, 20%); denoised data (10%, 20%). Columns: X-block and Y-block explained variation for Model 1 and Model 2.

46 Comparison of Explained Variation (heteroscedastic noise, N-PLS model)
Wavelet denoising: MDL; rows and columns as on slide 45.