


1 Multivariate statistical methods

2 Multivariate methods
multivariate dataset – a group of n objects described by m variables (as a rule n > m, if possible)
confirmatory vs. exploratory analysis:
- confirmatory – emphasis on parameter estimation and hypothesis testing
- exploratory – emphasis on exploring the data and finding patterns and structure

3 Multivariate statistical methods
Classification of units:
- Cluster analysis
- Discriminant analysis
Analysis of relations among variables:
- Canonical correlation analysis
- Factor analysis
- Principal component analysis

4 Methods for analysis of relations among variables

5 Principal component analysis
the oldest and most widely used multivariate statistical method
introduced by Pearson in 1901 and, independently, by Hotelling in 1933
principal aims:
- detection of relations among variables
- reduction of the number of variables and construction of new, meaningful variables

6 Principal component analysis
the basis is a linear transformation of the original variables into a smaller number of new artificial variables, the so-called principal components
component characteristics:
- the components are mutually uncorrelated
- for m original variables, a suitable dimension is r <= m: r principal components (ideally far fewer than m) explain a sufficient share of the variability of the original variables

7 PCA
component characteristics:
- the method is based on a full explanation of the total variability
- principal components are ordered by their share of explained variance
- the first component explains the most variance, the last component the least

8 PCA procedure
- initial analysis: exploration of relations among variables (graphs, descriptive statistics)
- exploration of the correlation matrix (if the original variables are correlated, reducing the number of variables is possible)
- principal component analysis and choice of a suitable number of components (70–90 % of explained variance is usually enough)
- interpretation of the principal components
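
The choice of the number of components can be sketched in a few lines. This is only an illustration: the function name `n_components_for`, the 80 % threshold, and the eigenvalues in the usage lines are assumptions made for the example, not values from the presentation.

```python
import numpy as np

def n_components_for(eigenvalues, threshold=0.8):
    """Smallest r whose leading components explain at least `threshold`
    of the total variance (hypothetical helper, not from the slides)."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(ev) / ev.sum()                     # cumulative share
    # small tolerance guards against floating-point round-off at the boundary
    return int(np.searchsorted(cumulative, threshold - 1e-12) + 1)

# Hypothetical component variances for m = 4 standardized variables
eigenvalues = [2.5, 1.0, 0.3, 0.2]          # cumulative shares: 0.625, 0.875, 0.95, 1.0
r_80 = n_components_for(eigenvalues, 0.8)   # -> 2
r_90 = n_components_for(eigenvalues, 0.9)   # -> 3
```

With these made-up eigenvalues, two components already cover 87.5 % of the variance, so a threshold anywhere between 70 % and 85 % leads to the same choice.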

9 PCA procedure
PCA is based on either
1. the covariance matrix (variables in the same units and with similar variances), or
2. the correlation matrix (standardized data, or variables in different units)
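
Why the units matter can be seen in a small sketch. The data are simulated and purely hypothetical (two independent variables, one in metres and one in grams); `first_component_share` is an illustrative helper name.

```python
import numpy as np

def first_component_share(matrix):
    """Share of the total variance carried by the largest eigenvalue
    of a symmetric (covariance or correlation) matrix."""
    eigvals = np.linalg.eigvalsh(matrix)   # ascending order
    return eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(1)
# Hypothetical measurements: two independent variables in very different units
height_m = rng.normal(1.7, 0.1, size=200)       # metres
mass_g = rng.normal(70000, 10000, size=200)     # grams
X = np.column_stack([height_m, mass_g])

share_cov = first_component_share(np.cov(X, rowvar=False))
share_corr = first_component_share(np.corrcoef(X, rowvar=False))
```

On the covariance matrix the first component is almost entirely the variable with the larger numerical variance (`share_cov` is close to 1), although the two variables are unrelated. On the correlation matrix both variables enter with equal weight and `share_corr` stays near one half.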

10 Model of PCA
Symbols in the model:
- standardized original variable
- weights of the principal component
- principal components in standardized form
Indices: j, k = 1, 2, …, p and i = 1, 2, …, n, where n is the number of units and p is the number of variables

11 PCA – mathematical model
original matrix – dataset X (n x m), n objects, m variables
Z = [z_ij] – standardized matrix X, i = 1, …, n; j = 1, …, m
the aim is to find a transformation matrix Q which converts the m standardized variables (matrix Z) into m mutually uncorrelated components (matrix P):
P = Z Q
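
A minimal NumPy sketch of the transformation P = Z Q follows. It assumes that Q is obtained from the eigenvectors of the correlation matrix and that the sample standard deviation (divisor n - 1) is used for standardization; the data and the function names `standardize` and `pca_transform` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_transform(X):
    """Return Z, Q and P = Z Q for a data matrix X (n objects x m variables)."""
    Z = standardize(X)
    R = (Z.T @ Z) / (len(Z) - 1)           # correlation matrix of X
    eigvals, Q = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder: largest variance first
    Q = Q[:, order]
    return Z, Q, Z @ Q

rng = np.random.default_rng(0)
# Hypothetical data: 50 objects, 3 variables, the first two strongly related
base = rng.normal(size=(50, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(50, 1)),
               2 * base + 0.1 * rng.normal(size=(50, 1)),
               rng.normal(size=(50, 1))])

Z, Q, P = pca_transform(X)
cov_P = np.cov(P, rowvar=False)                  # covariance matrix of the components
off_diag = cov_P - np.diag(np.diag(cov_P))       # should be numerically zero
```

The off-diagonal entries of `cov_P` vanish up to round-off, which is the "mutually uncorrelated" property, and the diagonal holds the component variances in decreasing order.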

12 PCA – mathematical model
Rearranging P = Z Q, we obtain the matrix Λ
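
One way to carry out this rearrangement is sketched here. It assumes that the correlation matrix is computed as R = Z^T Z / (n - 1) and that Q is orthogonal with the eigenvectors of R as columns; the divisor n - 1 in particular is an assumption, since some texts use n instead.

```latex
\frac{1}{n-1} P^{\mathsf{T}} P
  = \frac{1}{n-1} (ZQ)^{\mathsf{T}} (ZQ)
  = Q^{\mathsf{T}} \left( \frac{1}{n-1} Z^{\mathsf{T}} Z \right) Q
  = Q^{\mathsf{T}} R \, Q
  = \Lambda
```

Under these assumptions Λ is the covariance matrix of the components P, and choosing Q as the eigenvector matrix of R makes it diagonal.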

13 PCA – mathematical model
Matrix Λ is the variance-covariance matrix of the principal components. Because the principal components are uncorrelated, the covariances are 0 and Λ is diagonal, with the variances of the principal components on the diagonal.
The sum of the variances of the standardized variables equals m.
The proportions of the component variances in this total indicate how large a share of the total variance of all variables is explained by the first, second, …, last component.
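
The two statements about the variances can be checked numerically. The sketch below uses simulated, hypothetical data and writes the component variances as `lam` (the diagonal of Λ); these names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: 40 objects, m = 4 variables
X = rng.normal(size=(40, 4)) @ rng.normal(size=(4, 4))

R = np.corrcoef(X, rowvar=False)        # correlation matrix of the variables
lam = np.linalg.eigvalsh(R)[::-1]       # component variances, largest first
m = R.shape[0]

total_variance = lam.sum()              # equals m up to round-off
shares = lam / m                        # share of each component in the total
```

The entries of `shares` add up to 1 and decrease from the first component to the last.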

14 PCA – mathematical model
Matrix R is the correlation matrix of the original variables.
The diagonal values of matrix Λ are the eigenvalues of matrix R; the columns of matrix Q are the eigenvectors belonging to the individual eigenvalues.

15 PCA – other notions
- the coordinates of the non-standardized principal components are called "scores"
- the matrix of the scores of all n objects is called the "score matrix"
- the scores of each object are in a row; the columns of the matrix are score vectors

16 PCA – other notions
The share of the total variability of each original variable X_i, i = 1, 2, …, m, that is explained by r principal components is called the communality of variable X_i.
It is computed as the square of the multiple correlation coefficient, r^2.
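
The link between communality and the squared multiple correlation can be illustrated as follows. This is a sketch under assumptions: PCA is run on the correlation matrix, loadings are taken as eigenvectors scaled by the square roots of the eigenvalues, and the data and the function names `communalities` and `r_squared` are hypothetical.

```python
import numpy as np

def communalities(X, r):
    """Communality of each variable for the first r principal components:
    row sums of squared loadings (correlation-matrix PCA)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, Q = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:r]               # r largest eigenvalues
    loadings = Q[:, order] * np.sqrt(eigvals[order])    # corr(variable, component)
    return (loadings ** 2).sum(axis=1)

def r_squared(X, r):
    """R^2 from regressing each standardized variable on the first r scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, Q = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1][:r]
    P = Z @ Q[:, order]                                 # scores of the first r components
    coef, *_ = np.linalg.lstsq(P, Z, rcond=None)        # least-squares fit, all variables at once
    residual = Z - P @ coef
    return 1 - (residual ** 2).sum(axis=0) / (Z ** 2).sum(axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4)) @ rng.normal(size=(4, 4))  # hypothetical correlated data

h2 = communalities(X, r=2)
r2 = r_squared(X, r=2)
```

Both routes give the same values, and with all m components retained every communality equals 1.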

17 PCA – graphical visualization
Cattell's graph (scree plot): a tool for determining the number of principal components

18 PCA – graphical visualization
plot of the correlation coefficients (1st and 2nd principal component)

19 PCA – graphical visualization
plot of the component scores

