1
A Robust Approach for Dealing with Missing Values in Compositional Data
Karel Hron, Matthias Templ, Peter Filzmoser
ICORS’08, Antalya, 8. 9. 2008
2
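Compositional data (CoDa)
D-part composition: a vector x = (x_1, ..., x_D) with strictly positive parts that carry only relative information
x and cx, for any constant c > 0, contain essentially the same information
simplex: sample space of D-part compositions
D-1: dimensionality of compositions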
3
Standard statistics and CoDa
difficulties arise when standard statistical methods (such as correlation analysis and PCA) are applied directly to CoDa; the results can be completely useless
reason: the sample space of CoDa, the simplex, induces a different geometrical structure (Aitchison geometry)
solution: the family of logratio transformations from the simplex to real space (Aitchison, 1986)
in the case of missing values in CoDa, these transformations allow for a reasonable imputation
4
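Isometric logratio transformations
ilr for short (Egozcue et al., 2003); they map D-part compositions to the (D-1)-dimensional real space
regularity of the transformed data is provided, which is necessary for robust statistical methods
isometry: Aitchison distances on the simplex equal Euclidean distances between the ilr coordinates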
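The isometry property can be checked numerically. The sketch below is only an illustration: it assumes one particular ilr basis (pivot-type coordinates, where coordinate j compares part j with the geometric mean of the parts after it), which the slide itself does not fix, and the function names `clr`, `ilr` and `aitchison_distance` are made up for this example.

```python
import math

def clr(x):
    """Centred logratio coefficients: log parts minus their mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance of clr vectors."""
    return math.dist(clr(x), clr(y))

def ilr(x):
    """One possible ilr basis (assumed here): pivot-type coordinates."""
    logs = [math.log(v) for v in x]
    z = []
    for j in range(len(x) - 1):
        rest = logs[j + 1:]          # parts after part j
        k = len(rest)
        z.append(math.sqrt(k / (k + 1)) * (logs[j] - sum(rest) / k))
    return z

x = [0.2, 0.3, 0.5]
y = [0.6, 0.1, 0.3]
# A 3-part composition yields 2 ilr coordinates; both distances should agree
print(len(ilr(x)), aitchison_distance(x, y), math.dist(ilr(x), ilr(y)))
```

Any other orthonormal basis would give different coordinates but the same distances.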
5
Ilr and balances
interpretation of ilr coordinates (balances) in the sense of the original compositional parts is not possible
reason: the definition of CoDa
solution: split the parts into separate groups and order the balances
this construction is provided by a special procedure called sequential binary partition
6
Ilr and balances
result of a special choice of sequential binary partition (SBP)
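To make the link between an SBP and its balances concrete, here is a hedged sketch. It assumes the usual balance formula, sqrt(r*s/(r+s)) times the log of the ratio of geometric means of the two groups, and one specific partition (part 1 against the rest, then part 2 against the rest, and so on). The sign-matrix encoding and the name `balances` are illustrative choices, not taken from the slides.

```python
import math

def balances(x, sbp):
    """Balances of composition x for an SBP given as rows of +1 / -1 / 0 codes."""
    z = []
    for signs in sbp:
        plus = [math.log(v) for v, code in zip(x, signs) if code > 0]
        minus = [math.log(v) for v, code in zip(x, signs) if code < 0]
        r, s = len(plus), len(minus)
        # normalised log ratio of the geometric means of the two groups
        z.append(math.sqrt(r * s / (r + s)) * (sum(plus) / r - sum(minus) / s))
    return z

# Assumed SBP for D = 4: each step separates one part from all remaining ones
sbp = [
    [+1, -1, -1, -1],
    [0, +1, -1, -1],
    [0, 0, +1, -1],
]

x = [0.1, 0.2, 0.3, 0.4]
print(balances(x, sbp))
```

With this partition the first part enters only the first balance, which is the property the imputation procedure later relies on.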
7
Outliers and CoDa
1) caused by the Aitchison geometry:
it measures differences between compositions in a natural way, respecting their relative scale property
it distinguishes between the following two differences within compositional parts: 0.500 and 0.501 vs. 0.001 and 0.002
consequence: the error term in the parts is not the same for values close to the barycentre as for values close to the border of the simplex
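The two differences mentioned above can be compared directly. In this sketch the 3-part compositions are invented for illustration (the slide gives only the values of one part); both pairs differ by the same absolute amounts, so their Euclidean distances coincide, while the Aitchison distance reacts to the relative change.

```python
import math

def clr(x):
    """Centred logratio coefficients."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(x, y):
    return math.dist(clr(x), clr(y))

# Hypothetical compositions: the first part moves by 0.001 in both pairs
interior = ([0.500, 0.2500, 0.2500], [0.501, 0.2495, 0.2495])
near_border = ([0.001, 0.4995, 0.4995], [0.002, 0.4990, 0.4990])

for label, (a, b) in [("interior", interior), ("near border", near_border)]:
    print(label, "Euclidean:", math.dist(a, b),
          "Aitchison:", aitchison_distance(a, b))
```

The shift from 0.001 to 0.002 doubles the part, which is why the second pair is much farther apart in the Aitchison sense.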
8
Outliers and CoDa
solution: apply the ilr transformation and carry out outlier detection on the transformed data (Filzmoser and Hron, 2008)
9
Outliers and CoDa
2) caused by the definition of CoDa:
each observed composition is a member of the corresponding equivalence class
any two compositions from the same class have zero Aitchison distance
both low and high values of the scaling constant c can nevertheless cause a high Euclidean distance
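A small numerical illustration of the equivalence-class argument, with made-up numbers: the same composition expressed in proportions and in percentages (c = 100) is far apart in Euclidean terms but identical in the Aitchison geometry. The helper names `closure` and `clr` are illustrative.

```python
import math

def closure(x, total=1.0):
    """Rescale a composition so that its parts sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centred logratio coefficients."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

x = [0.2, 0.3, 0.5]            # proportions
cx = [100 * v for v in x]      # same composition in percent, c = 100

euclidean = math.dist(x, cx)
aitchison = math.dist(clr(x), clr(cx))
print("Euclidean:", euclidean, "Aitchison:", aitchison)
print("closure(cx):", closure(cx))
```

An outlier rule based on Euclidean distance would therefore flag observations merely because they were recorded on a different scale.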
10
Outliers and CoDa
11
Missing values in CoDa sets
most statistical methods cannot be directly applied to data sets with missing information
removing incomplete observations can cause an unacceptable loss of information
most imputation methods rely on assumptions such as missing at random (MAR) and normality of the data
outliers could have a dramatic influence on the estimation of missing values
12
Missing values in CoDa sets
with robust imputation methods, the estimation of the missing values is based on the majority of the data
existing robust methods may not cope with compositional data (a different geometry of the data and wrong identification of outliers)
=> a more effective way of dealing with CoDa for imputation, one that respects the Aitchison geometry, is needed
13
Robust imputation of missing values for CoDa
we propose an iterative procedure to estimate the missing values
initialization of the missing values: fast kNN (Aitchison)
the compositional part with the highest number of missing values is chosen and the data are transformed using a proper ilr transformation; the missing values from the chosen part (x_1) then appear in only one ilr variable and do not contaminate the others
14
Robust imputation of missing values for CoDa
consequently, fast LTS regression (which can also deal with large data sets) of z_1 on z_2, …, z_{D-1} is preferred, but other robust methods can also be considered
missing values are imputed for every variable (starting with the one that has the highest number of missing values)
the procedure is repeated iteratively until convergence
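The overall loop can be sketched roughly as follows. This is not the authors' implementation, and several simplifications are assumed and should be read as placeholders: ordinary least squares stands in for fast LTS regression, column geometric means stand in for the kNN initialization, pivot-type ilr coordinates are used, and the names `pivot_coords` and `impute_ilr` are hypothetical.

```python
import numpy as np

def pivot_coords(X):
    """ilr coordinates in which only the first one involves the first part."""
    logX = np.log(X)
    n, D = X.shape
    Z = np.empty((n, D - 1))
    for j in range(D - 1):
        k = D - j - 1  # number of parts after part j
        Z[:, j] = np.sqrt(k / (k + 1)) * (logX[:, j] - logX[:, j + 1:].mean(axis=1))
    return Z

def impute_ilr(X, n_iter=20, tol=1e-8):
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    n, D = X.shape
    # Placeholder initialization (the slides use kNN): column geometric means
    for j in range(D):
        X[miss[:, j], j] = np.exp(np.nanmean(np.log(X[:, j])))
    # Visit the parts with the most missing values first
    order = [int(j) for j in np.argsort(-miss.sum(axis=0))]
    for _ in range(n_iter):
        X_old = X.copy()
        for j in order:
            if not miss[:, j].any():
                continue
            # Move part j to the front so it enters only the first coordinate
            perm = [j] + [k for k in range(D) if k != j]
            Z = pivot_coords(X[:, perm])
            obs = ~miss[:, j]
            A = np.column_stack([np.ones(n), Z[:, 1:]])
            # Placeholder for LTS: ordinary least squares of z_1 on z_2..z_{D-1}
            beta, *_ = np.linalg.lstsq(A[obs], Z[obs, 0], rcond=None)
            z1_hat = A[~obs] @ beta
            # Invert the first coordinate back to the scale of the other parts
            rest = np.delete(X[~obs], j, axis=1)
            gmean_rest = np.exp(np.log(rest).mean(axis=1))
            X[~obs, j] = gmean_rest * np.exp(z1_hat * np.sqrt(D / (D - 1)))
        if np.max(np.abs(np.log(X) - np.log(X_old))) < tol:
            break
    return X

X = np.array([[0.2, 0.3, 0.5],
              [np.nan, 0.4, 0.3],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4],
              [0.25, 0.35, 0.4]])
print(impute_ilr(X))
```

Replacing the least squares step with a robust estimator is what protects the imputed values against outliers; the sketch only shows how the ilr coordinates isolate the part being imputed.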
15
Simulation study
17
References
Aitchison, J., 1986, The statistical analysis of compositional data. Chapman and Hall, London.
Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C., 2003, Isometric logratio transformations for compositional data analysis. Math. Geol., vol. 35, no. 3, p. 279-300.
Filzmoser, P., Hron, K., 2008, Outlier detection for compositional data using robust methods. Math. Geosci., vol. 40, no. 3, p. 233-248.