Multivariate Statistical Methods

1 Multivariate Statistical Methods
Principal Components Analysis (PCA)
By Jen-pei Liu, PhD
Division of Biometry, Department of Agronomy, National Taiwan University, and Division of Biostatistics and Bioinformatics, National Health Research Institutes
2019/2/24 Copyright by Jen-pei Liu, PhD

2 Principal Components Analysis
Introduction
Procedures
Properties
Examples
Summary

3 Introduction
PCA was described by K. Pearson (1901); computing methods were developed by Hotelling (1933).
Objective: to transform the original variables X1,…,Xp into index variables Z1,…,Zp that describe the variation in the data.
Z1,…,Zp are linear combinations of X1,…,Xp.
Z1,…,Zp are uncorrelated and are ordered by importance.

4 Introduction
Lack of correlation ⇒ the index variables measure different dimensions (domains) of the data.
Lack of correlation ⇒ only the variances of the index variables need to be considered; covariances do not have to be taken into account.
Ordering ⇒ Var(Z1) ≥ Var(Z2) ≥ … ≥ Var(Zp).
The index variables Z are called the principal components.

5 Introduction
Most of the variation in the full data set can be adequately described by the first few index variables Z.
The dimension is reduced from a two-digit number of variables to just 2 to 4 principal components.
This reduction requires high correlations among the original variables.

7 Introduction
Correlations of female sparrows: correlation matrix of X1 to X5 (numeric entries not preserved in this transcript).
X1 = total length, X2 = alar length, X3 = length of beak and head, X4 = length of humerus, X5 = length of keel of sternum.

8 Introduction
Coefficients for components: table with columns Component, Variance, X1, X2, X3, X4, X5 (numeric entries not preserved in this transcript).

9 Introduction
Z1 = 0.452X1 + 0.462X2 + 0.451X3 + 0.471X4 + 0.398X5.
The variance of Z1 is 3.62, which accounts for 72.3% (3.62/5.00) of the total variation.
All coefficients of Z1 are smaller than 1, and the sum of squares of these coefficients is equal to 1.
Z1 is in effect an average (or sum) of X1, X2, X3, X4, and X5, because the coefficients are nearly equal.
Z1 can be interpreted as an index of the size of the sparrow.

10 Procedures
Data structure
Case  X1   X2   …  Xp
1     x11  x12  …  x1p
2     x21  x22  …  x2p
⋮
n     xn1  xn2  …  xnp

11 Procedures
The first component
The first component is a linear combination of X1, X2, …, Xp:
Z1 = a11X1 + a12X2 + … + a1pXp.
Var(Z1) is as large as possible subject to the condition that a11^2 + a12^2 + … + a1p^2 = 1.
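The constrained maximization above can be sketched numerically. This is a minimal illustration and not taken from the slides: it assumes the standard result that the maximizing coefficient vector is the eigenvector of the sample covariance matrix with the largest eigenvalue, and it uses simulated data. The function name `first_component` and the mixing matrix are hypothetical.

```python
import numpy as np

def first_component(X):
    """Return (a1, var_z1): unit-length coefficients that maximize Var(Z1)."""
    S = np.cov(X, rowvar=False)           # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
    a1 = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    return a1, eigvals[-1]

# Simulated data (an assumption for illustration): 200 cases, 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing

a1, var_z1 = first_component(X)
z1 = X @ a1                               # values of Z1 for every case

print("sum of squared coefficients:", np.sum(a1 ** 2))
print("Var(Z1):", np.var(z1, ddof=1), "largest eigenvalue:", var_z1)
```

Any other unit-length coefficient vector should give a variance no larger than `var_z1`, which is one way to check the sketch.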

13 Procedures
The second component
The second component is also a linear combination of X1, X2, …, Xp:
Z2 = a21X1 + a22X2 + … + a2pXp.
Var(Z2) is as large as possible subject to the conditions that a21^2 + a22^2 + … + a2p^2 = 1 and that Z1 and Z2 are not correlated.
Var(Z2) is therefore the second largest variance.

14 Procedures
The third component
The third component is also a linear combination of X1, X2, …, Xp:
Z3 = a31X1 + a32X2 + … + a3pXp.
Var(Z3) is as large as possible subject to the conditions that a31^2 + a32^2 + … + a3p^2 = 1 and that Z1, Z2, and Z3 are not correlated.
Var(Z3) is therefore the third largest variance.

15 Procedures
Continue until all p principal components are computed.
Covariance matrix of the p variables (matrix not preserved in this transcript).
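For reference, the sample covariance matrix of p variables is usually written as follows. This is the standard textbook definition rather than something recovered from the slide; the symbols C, c_jk, and the column means are assumptions, chosen to agree with the cjj used under Properties.

```latex
C =
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
c_{p1} & c_{p2} & \cdots & c_{pp}
\end{pmatrix},
\qquad
c_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),
\qquad
\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}
```

The diagonal entry c_jj is the sample variance of Xj, and C is symmetric because c_jk = c_kj.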

18 Procedures
Different variables might have different units and magnitudes.
PCA might be influenced by these units and magnitudes.
Standardize each variable to have zero mean and unit variance.
The covariance matrix of the standardized variables is the correlation matrix of the original variables.
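The last point can be checked directly. The sketch below uses simulated data with deliberately different scales; the helper name `standardize` and the scale factors are assumptions for illustration.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Simulated data (an assumption): 4 variables on very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) * np.array([1.0, 10.0, 100.0, 0.1])

Z = standardize(X)
cov_of_standardized = np.cov(Z, rowvar=False)    # covariance of standardized data
corr_of_original = np.corrcoef(X, rowvar=False)  # correlation of the raw data

print(np.allclose(cov_of_standardized, corr_of_original))
```

Both `X.std(..., ddof=1)` and `np.cov` use the n - 1 divisor, which is why the two matrices agree to rounding error.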

19 Procedures
Steps of PCA
Standardize the variables X1, X2, …, Xp to have zero means and unit variances, unless the importance of the variables is reflected in their variances.
Calculate the covariance matrix (which is the correlation matrix if the variables were standardized).

20 Procedures
Steps of PCA
Find the eigenvalues λ1 ≥ λ2 ≥ … ≥ λp and their corresponding eigenvectors a1, a2, …, ap.
The coefficients of the ith principal component Zi are the elements of ai, and λi is the variance of Zi.
Discard any components that account for only a small proportion of the variation in the data.
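The steps of PCA listed on these two slides can be put together in a short sketch. It is one possible implementation under stated assumptions, not the author's code: the function name `pca`, its `standardize` flag, and the simulated data are hypothetical, and the eigen-decomposition is done with NumPy's `eigh`.

```python
import numpy as np

def pca(X, standardize=True):
    """Return (eigvals, A, scores); row i of A holds the coefficients of Z_(i+1)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # step 1a: zero means
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)      # step 1b: unit variances
    S = np.cov(Xc, rowvar=False)             # step 2: covariance (correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # step 3: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    eigvals = eigvals[order]
    A = eigvecs[:, order].T
    scores = Xc @ A.T                        # component values for every case
    return eigvals, A, scores

# Simulated data (an assumption): 40 cases, 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))

eigvals, A, scores = pca(X)
proportion = eigvals / eigvals.sum()         # share of total variation per component
print(np.round(proportion, 3))
```

Discarding components then amounts to keeping the leading rows of `A` and the leading columns of `scores`.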

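22 Properties
Here μ and Σ are the mean vector and covariance matrix of X, A is the matrix whose ith row holds the coefficients of Zi, and cjj is the variance of Xj.
E(Z) = Aμ.
V(Z) = AΣA' = diag{λi, i = 1,…,p}.
Cov(Zi, Xj) = aij λi.
Corr(Zi, Xj) = aij √λi / √cjj.
Corr(Zi, Xj) = aij √λi if the correlation matrix is used.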
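These properties can be verified numerically for the correlation-matrix case, where the correlation between component Zi and variable Xj is the coefficient aij times the square root of the ith eigenvalue. The sketch below uses simulated data; all variable names are assumptions for illustration.

```python
import numpy as np

# Simulated data (an assumption): 300 cases, 3 correlated variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized variables
R = np.cov(Xs, rowvar=False)                       # equals the correlation matrix
lam, vecs = np.linalg.eigh(R)
lam, A = lam[::-1], vecs[:, ::-1].T                # descending order; row i = coefficients of Z_(i+1)
Z = Xs @ A.T                                       # principal component values

# Entry (i, j) is a_ij * sqrt(lambda_i).
predicted = A * np.sqrt(lam)[:, None]
# Entry (i, j) is the sample correlation between Z_(i+1) and X_(j+1).
observed = np.corrcoef(Z.T, Xs.T)[:3, 3:]

print(np.allclose(predicted, observed))
print(np.allclose(np.cov(Z, rowvar=False), np.diag(lam)))
```

The second check confirms that the components are uncorrelated and that their variances are the eigenvalues.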

23 Examples
Determination of the number of principal components
It depends upon the needs of practitioners.
Choose enough components that the proportion of the total variation they explain is high, e.g., at least 80%.
If the correlation matrix is used, select the principal components with variance greater than 1, because each of them accounts for more variation than a single original variable (variance = 1).
Use a scree plot.

24 Examples
Evaluation of a statistics course
16 students rated 11 items (variables).
Evaluation scale: 1 (poor or not at all) to 5 (excellent, strongly, or difficult).
The first two principal components explain 76.0% of the total variation, and the last four principal components explain only 2.2%.

26 Examples
Test scores of 10 students in 4 subjects: Chinese (X1), English (X2), Math (X3), and Social (X4) (scores not preserved in this transcript).
Source: Shen (1998)

27 Examples
Correlation matrix of X1 to X4 (numeric entries not preserved in this transcript).

28 Examples
Eigenvalues and eigenvectors: table with columns Eigenvalue, Proportion, Cumulative proportion, and the eigenvector coefficients for X1 to X4 (numeric entries not preserved in this transcript).

29 Examples
Because the first two principal components account for 94.14% of the total variation, we can use just these two principal components.
The first principal component can be interpreted as an index for the sum of Chinese, English, and math.
The second principal component can be thought of as social science.

30 Examples
The above results can also be obtained by inspecting the correlation matrix.
Correlations among Chinese, English, and math exceed 0.8.
Correlations of Chinese, English, and math with social science are below 0.3.

31 Examples
Correlations between the first principal component and the original variables
Corr(Z1,X1) = a11√λ1 = 0.5897√λ1 = 0.9692
Corr(Z1,X2) = a12√λ1 = 0.5682√λ1 = 0.9339
Corr(Z1,X3) = a13√λ1 = 0.5657√λ1 = 0.9298
Corr(Z1,X4) = a14√λ1 = 0.1592 (the value of a14 is not preserved in this transcript)

32 Examples
Correlations between the second principal component and the original variables, with λ2 = 1.0638
Corr(Z2,X1) = a21√λ2 = 0.1254√1.0638 = 0.1294
Corr(Z2,X2) = a22√λ2 (values not preserved in this transcript)
Corr(Z2,X3) = a23√λ2 (values not preserved in this transcript)
Corr(Z2,X4) = a24√λ2 = 0.9856 (the value of a24 is not preserved in this transcript)

33 Examples
Principal component scores: table with columns Student, 1st Component, 2nd Component (numeric entries not preserved in this transcript).

34 Examples
Correlations of female sparrows: correlation matrix of X1 to X5 (numeric entries not preserved in this transcript).
X1 = total length, X2 = alar length, X3 = length of beak and head, X4 = length of humerus, X5 = length of keel of sternum.

35 Examples
Coefficients for components: table with columns Component, Variance, X1, X2, X3, X4, X5 (numeric entries not preserved in this transcript).

36 Examples
The first principal component
Z1 = 0.452X1 + 0.462X2 + 0.451X3 + 0.471X4 + 0.398X5, an index of bird size.
The second principal component
Z2 = -0.051X1 + 0.300X2 + 0.325X3 + 0.185X4 - 0.877X5, an index of bird shape.

37 Examples
The value of the first principal component for the first bird
Z1 = 0.452(-0.542) + 0.462(0.725) + 0.451(0.177) + 0.471(0.055) + 0.398(-0.33) = 0.064
The value of the second principal component for the first bird
Z2 = -0.051(-0.542) + 0.300(0.725) + 0.325(0.177) + 0.185(0.055) + (-0.877)(-0.33) = 0.602
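The two component values for the first bird are dot products of the coefficient vectors with the bird's standardized measurements, so they can be reproduced in a few lines. The numbers are copied from the slide; the variable names are assumptions.

```python
import numpy as np

# Coefficients of the first two principal components (from the slide).
a1 = np.array([0.452, 0.462, 0.451, 0.471, 0.398])
a2 = np.array([-0.051, 0.300, 0.325, 0.185, -0.877])

# Standardized measurements X1..X5 of the first bird (from the slide).
x_bird1 = np.array([-0.542, 0.725, 0.177, 0.055, -0.33])

z1 = float(a1 @ x_bird1)   # value of the first component
z2 = float(a2 @ x_bird1)   # value of the second component

print(f"{z1:.3f} {z2:.3f}")  # prints "0.064 0.602"
```

Stacking the measurements of all birds as rows of a matrix and multiplying by the coefficient vectors would give the component values for every bird at once.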

38 Examples
Means and standard deviations for survivors and nonsurvivors: table with columns Mean (Survivor, Nonsurvivor) and Standard Deviation (Survivor, Nonsurvivor) (numeric entries not preserved in this transcript).

39 Examples
Employment in European countries: table with rows and columns labeled AGR, MIN, MAN, PS, CON, SER, FIN, SPC, and TC (numeric entries not preserved in this transcript).

40 Examples
The 9 eigenvalues are 3.112 (34.6%), 1.809 (20.1%), 1.496 (16.6%), 1.063 (11.8%), 0.710 (7.9%), 0.311 (3.5%), 0.293 (3.3%), 0.204 (2.4%), and 0 (0.0%).
The employment percentages of each country sum to a constant (100%).
The columns of the correlation matrix are therefore linearly dependent.
As a result, the last eigenvalue is 0.
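The zero eigenvalue follows from the constant row sums and can be reproduced with made-up data. The sketch below does not use the actual employment figures, which are not in this transcript; the 26 rows, the uniform random values, and the variable names are all assumptions.

```python
import numpy as np

# Made-up data (an assumption): 26 hypothetical countries, 9 employment sectors.
rng = np.random.default_rng(3)
raw = rng.uniform(1.0, 10.0, size=(26, 9))

# Convert to percentages so that every row sums to 100.
pct = 100.0 * raw / raw.sum(axis=1, keepdims=True)

R = np.corrcoef(pct, rowvar=False)       # 9 x 9 correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]    # eigenvalues in descending order

print(np.round(eigvals, 3))              # the last value is 0 up to rounding error
```

Because the centered columns add up to zero in every row, one linear combination of the variables has zero variance, which forces the smallest eigenvalue to zero.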

41 Examples
Selecting the principal components with eigenvalues greater than 1 ⇒ the first 4 principal components, which explain about 83% (34.6% + 20.1% + 16.6% + 11.8%) of the total variation in the data.
If we take only the first two principal components, they account for only 55% of the total variation.
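Both selection rules (eigenvalue greater than 1, and a cumulative proportion of at least 80%) can be applied mechanically to the nine eigenvalues listed on the previous slide. This is a small sketch; the variable names and the 80% threshold are taken as assumptions from the earlier slide on choosing the number of components.

```python
import numpy as np

# Eigenvalues of the employment correlation matrix (from the previous slide).
eigvals = np.array([3.112, 1.809, 1.496, 1.063, 0.710, 0.311, 0.293, 0.204, 0.0])

proportion = eigvals / eigvals.sum()     # share of total variation per component
cumulative = np.cumsum(proportion)       # cumulative share

# Rule 1: keep components whose eigenvalue exceeds 1.
n_eigen_gt_1 = int(np.sum(eigvals > 1.0))

# Rule 2: keep the smallest number of components reaching 80% of the variation.
n_for_80_pct = int(np.searchsorted(cumulative, 0.80) + 1)

for k, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"Z{k}: proportion {p:.3f}, cumulative {c:.3f}")
print(n_eigen_gt_1, n_for_80_pct)
```

Plotting `eigvals` against the component number would give the scree plot mentioned as a third option.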

42 Examples
The first principal component
Z1 = 0.51(AGR) + 0.37(MIN) - 0.25(MAN) - 0.31(PS) - 0.22(CON) - 0.38(SER) - 0.13(FIN) - 0.42(SPC) - 0.21(TC)
This is a contrast between AGR (agriculture, forestry, and fishing) and MIN (mining and quarrying) versus the other sectors.

43 Examples
The second principal component
Z2 = -0.02(AGR) + 0.00(MIN) + 0.43(MAN) + 0.11(PS) - 0.24(CON) - 0.41(SER) - 0.55(FIN) + 0.05(SPC) + 0.52(TC)
This is a contrast between MAN (manufacturing) and TC (transport and communication) versus CON (construction), SER (service industry), and FIN (finance).

46 Summary
Each principal component is a linear combination of the original variables.
PCA tries to reduce a large number of variables to a few index variables.
The index variables are uncorrelated and are ordered by the magnitude of their variation.
The method was illustrated with real examples.

