
The Proteins data set gives estimates of the average protein consumption from different food sources for the inhabitants of 25 European countries.


1 PCA Example
The Proteins data set gives estimates of the average protein consumption from different food sources for the inhabitants of 25 European countries. Using PCA, investigate the relationships between the countries on the basis of these variables. R data: “Proteins”.

2 Variables:
R - Red meat
W - White meat
E - Eggs
M - Milk
Fs - Fish
C - Cereals
St - Starchy foods
Sd - Pulses, nuts and oilseeds
FV - Fruits and vegetables
T - Total protein consumption

3 # Read the data “Proteins” with the first column as the row names:
> Proteins=read.csv("E:/Multivariate_analysis/Data/Proteins.csv",header=TRUE,row.names=1)
# Check the data for outliers by making a scatter plot matrix:
> pairs(Proteins, font.labels=0.1,gap=0.1,pch=".", cex=0.00001)
The scatter plot matrix shows no distinct outliers.

4 [Figure slide; image not included in the transcript. Presumably the scatter plot matrix produced by pairs().]

5 # Calculate the variance for each variable:
> round(sapply(Proteins,var),2)
     R      W      E      M     Fs      C     St     Sd     FV      T
 11.58  13.99   1.24  50.38  12.04 121.23   2.74   4.08   3.67  45.81
The variance ranges from 1.24 to 121.23, so we will use the normalized data for PCA.
# Normalize the data:
> NProt=scale(Proteins)
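The effect of the normalization step can be checked on any numeric data frame. The sketch below uses R's built-in USArrests data purely as a stand-in (it is not the Proteins data, which is not bundled with R) and confirms that scale() leaves every column with mean 0 and variance 1.

```r
# Stand-in data: the built-in USArrests data set, NOT the Proteins data.
X <- USArrests

# Before scaling, the variances differ by orders of magnitude.
raw_var <- sapply(X, var)
print(round(raw_var, 2))

# scale() centers each column and divides it by its standard deviation.
Z <- scale(X)

# After scaling, every column has mean 0 and variance 1.
scaled_mean <- colMeans(Z)
scaled_var <- apply(Z, 2, var)
print(round(scaled_var, 2))
```

With unequal variances, a PCA on the raw covariance matrix would be dominated by the most variable column (cereals, in the Proteins data), which is why the slide normalizes first.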

6 # Calculate the correlation matrix for the normalized data:
> round(cor(NProt),2)
       R     W     E     M    Fs     C    St    Sd    FV     T
R   1.00  0.19  0.58  0.54  0.06 -0.51  0.15 -0.41 -0.06  0.37
W   0.19  1.00  0.60  0.30 -0.20 -0.44  0.33 -0.67 -0.07  0.10
E   0.58  0.60  1.00  0.61  0.05 -0.70  0.41 -0.60 -0.16  0.19
M   0.54  0.30  0.61  1.00  0.16 -0.59  0.21 -0.62 -0.40  0.46
Fs  0.06 -0.20  0.05  0.16  1.00 -0.52  0.44 -0.12  0.23 -0.09
C  -0.51 -0.44 -0.70 -0.59 -0.52  1.00 -0.58  0.64  0.04  0.19
St  0.15  0.33  0.41  0.21  0.44 -0.58  1.00 -0.50  0.07 -0.04
Sd -0.41 -0.67 -0.60 -0.62 -0.12  0.64 -0.50  1.00  0.35 -0.08
FV -0.06 -0.07 -0.16 -0.40  0.23  0.04  0.07  0.35  1.00  0.07
T   0.37  0.10  0.19  0.46 -0.09  0.19 -0.04 -0.08  0.07  1.00
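Correlations do not change under centering and scaling, so the correlation matrix of the normalized data is the same as that of the raw data, and it also equals the covariance matrix of the normalized data. A minimal check, again on the built-in USArrests data as a stand-in for Proteins:

```r
# Stand-in data: the built-in USArrests data set, NOT the Proteins data.
X <- USArrests
Z <- scale(X)

R_raw    <- cor(X)   # correlation of the raw data
R_scaled <- cor(Z)   # correlation of the normalized data
S_scaled <- cov(Z)   # covariance of the normalized data

# All three matrices agree up to floating-point error.
diff_cor <- max(abs(R_raw - R_scaled))
diff_cov <- max(abs(R_raw - S_scaled))
print(c(diff_cor = diff_cor, diff_cov = diff_cov))
```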

7 # Calculate the eigenvectors and eigenvalues for the data:
> eigen(cor(NProt))
$values
 [1] 4.130067e+00 1.739939e+00 1.309278e+00 1.043551e+00 6.990377e-01
 [6] 4.266669e-01 3.412258e-01 1.906500e-01 1.195844e-01 2.992982e-16
$vectors
            [,1]        [,2]        [,3]         [,4]        [,5]         [,6]        [,7]       [,8]        [,9]       [,10]
 [1,] -0.3180769  0.17809245 -0.38142753  0.039766137 -0.53138781  0.393811788 -0.42940825  0.1592276  0.17150487 -0.20838019
 [2,] -0.3140588  0.11783853  0.36420271 -0.538507972  0.09760147 -0.309417061 -0.09254681  0.2919567  0.46186736 -0.22903415
 [3,] -0.4202281  0.08236350  0.02047575 -0.155623651 -0.26932734  0.059357751  0.63995627  0.2652806 -0.48098579 -0.06827056
 [4,] -0.3870300  0.23356182 -0.19997405  0.320360929  0.15848975 -0.307976584  0.17405921 -0.5444724  0.13218960 -0.43456461
 [5,] -0.1271598 -0.57388821 -0.33003267  0.304161366  0.20323386 -0.303075844 -0.06315829  0.5200308 -0.01789764 -0.21247753
 [6,]  0.4177240  0.31321549 -0.02354236 -0.104798477  0.29201244  0.196460437 -0.06971238  0.2001491 -0.30436394 -0.67412235
 [7,] -0.2880798 -0.41038324  0.05768490 -0.150709175  0.42198545  0.680457657  0.11769041 -0.1889672  0.14706957 -0.10134794
 [8,]  0.4177658 -0.04145202 -0.24796403 -0.008042093 -0.22507285  0.087921207  0.57816932  0.0829400  0.58938418 -0.12362100
 [9,]  0.1197680 -0.34858202 -0.41210384 -0.643455476 -0.16834367 -0.222568384 -0.08684392 -0.3701826 -0.20995988 -0.11723988
[10,] -0.1062294  0.41709540 -0.58081103 -0.203145847  0.47623561  0.007702046  0.05178373  0.1801923  0.04898111  0.41440004
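The tenth eigenvalue above is zero up to rounding error (about 3e-16). One plausible reading, stated here as an assumption rather than something the slides say, is that T is the exact sum of the nine food-source columns, which makes the correlation matrix singular. The sketch below reproduces that effect on simulated data; the column names A, B, C and Total are hypothetical.

```r
# Simulated data (hypothetical columns), NOT the Proteins data.
set.seed(1)
parts <- matrix(rnorm(25 * 3), nrow = 25, ncol = 3,
                dimnames = list(NULL, c("A", "B", "C")))

# Append a column that is the exact row sum of the others,
# mimicking a "total" variable.
with_total <- cbind(parts, Total = rowSums(parts))

# Eigenvalues of the correlation matrix, in decreasing order.
ev <- eigen(cor(with_total))$values
print(signif(ev, 3))

# The eigenvalues of a correlation matrix sum to the number of variables,
# and the exact linear dependence forces the last one to (numerically) zero.
print(sum(ev))
```

A zero eigenvalue means the corresponding component carries no variance, which matches the all-zero Comp.10 column in the score table later in the transcript.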

8 Extract the principal components:
> Proteins_PCA=princomp(NProt,cor=TRUE)
> summary(Proteins_PCA,loadings=TRUE)
Importance of components:
                          Comp.1    Comp.2    Comp.3
Standard deviation     2.0322567 1.3190673 1.1442370
Proportion of Variance 0.4130067 0.1739939 0.1309278
Cumulative Proportion  0.4130067 0.5870006 0.7179284
Loadings:
   Comp.1 Comp.2 Comp.3
R  -0.318  0.178 -0.381
W  -0.314  0.118  0.364
E  -0.420
M  -0.387  0.234 -0.200
Fs -0.127 -0.574 -0.330
C   0.418  0.313
St -0.288 -0.410
Sd  0.418        -0.248
FV  0.120 -0.349 -0.412
T  -0.106  0.417 -0.581
(Blank entries are loadings smaller than 0.1 in absolute value, which summary() does not print.)
The first principal component accounts for 41% of the variance, the second for 17%, and the third for 13%. Together the three principal components explain about 72% of the total variance (cumulative proportion 0.718).
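The "Importance of components" rows follow directly from the eigenvalues printed on the previous slide: each standard deviation is the square root of an eigenvalue, and each proportion is the eigenvalue divided by the sum of all eigenvalues. A small sketch that recomputes the first three columns from the transcribed eigenvalues (the tenth is entered as 0, since the printed value of about 3e-16 is rounding noise):

```r
# Eigenvalues as printed by eigen(cor(NProt)) in the transcript;
# the tenth (about 3e-16) is treated as exactly 0.
ev <- c(4.130067, 1.739939, 1.309278, 1.043551, 0.6990377,
        0.4266669, 0.3412258, 0.1906500, 0.1195844, 0)

sdev     <- sqrt(ev)        # "Standard deviation" row
prop     <- ev / sum(ev)    # "Proportion of Variance" row
cum_prop <- cumsum(prop)    # "Cumulative Proportion" row

# First three components, for comparison with the summary() output.
print(round(rbind(sdev, prop, cum_prop)[, 1:3], 4))
```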

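9 The equations of the three principal components (coefficients taken from the first three eigenvectors, applied to the standardized variables):
PC1 = -0.318 R - 0.314 W - 0.420 E - 0.387 M - 0.127 Fs + 0.418 C - 0.288 St + 0.418 Sd + 0.120 FV - 0.106 T
PC2 = 0.178 R + 0.118 W + 0.082 E + 0.234 M - 0.574 Fs + 0.313 C - 0.410 St - 0.041 Sd - 0.349 FV + 0.417 T
PC3 = -0.381 R + 0.364 W + 0.020 E - 0.200 M - 0.330 Fs - 0.024 C + 0.058 St - 0.248 Sd - 0.412 FV - 0.581 T
Eggs, cereals, and seeds (Sd) have the highest loadings in absolute value on the first principal component. Fish, starchy foods, and total protein consumption have the highest loadings in absolute value on the second principal component. Total protein consumption and fruits and vegetables have the highest loadings in absolute value on the third principal component.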
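Applying the component equations to every country at once is a matrix product of the standardized data with the loading matrix. The sketch below checks this against the scores that princomp() returns, using the built-in USArrests data as a stand-in for Proteins. Note that princomp() standardizes with the divisor n rather than n - 1, so the centering and scaling values stored in the fitted object are reused here instead of calling scale() with its defaults.

```r
# Stand-in data: the built-in USArrests data set, NOT the Proteins data.
X <- USArrests
pc <- princomp(X, cor = TRUE)

# Standardize with the same center and scale that princomp() used
# (pc$scale is the standard deviation computed with divisor n).
Z <- scale(X, center = pc$center, scale = pc$scale)

# Each column of the loading matrix holds the coefficients of one
# component equation; the matrix product evaluates all of them.
manual_scores <- Z %*% unclass(pc$loadings)

# Largest discrepancy between the manual scores and princomp()'s scores.
max_diff <- max(abs(manual_scores - pc$scores))
print(max_diff)
```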

10 Plot the variance of the principal components:
> screeplot(Proteins_PCA,main="Proteins",cex.names=0.75)

11 Calculate the axis scores:
> round(Proteins_PCA$scores,2)
               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
Albania          3.67   0.65   1.13   1.95   1.92  -0.38  -0.66  -0.31   0.35       0
Austria         -1.41   0.72   1.19  -0.95  -0.01   0.77   0.00   0.01  -0.13       0
Belgium         -1.70  -0.11  -0.43  -0.25   0.19  -0.93  -0.16   0.34  -0.02       0
Bulgaria         3.05   1.88  -0.07  -0.31  -0.14   0.30  -0.61   0.78  -0.69       0
Czechoslovakia  -0.38   0.10   1.24  -0.74  -0.06  -0.38  -0.80   0.04  -0.25       0
Denmark         -2.54  -0.19  -0.21   0.96  -0.84   0.67   0.04   1.00   0.17       0
E.Germany       -1.26  -1.61   1.97  -0.79  -0.14  -0.60   0.06   0.32  -0.33       0
Finland         -1.81   0.77  -0.37   2.33  -1.25   0.18   0.05  -0.81   0.00       0
France          -1.68   0.31  -2.54  -1.28   0.23  -0.34  -1.39   0.18   0.40       0
Greece           2.14   0.63  -3.15  -0.32   0.28   0.66   1.21   0.26   0.19       0
Hungary          1.51   0.45   1.64  -1.24  -0.15   0.12   0.83   0.21   0.51       0
Ireland         -2.73   1.06  -0.29  -0.16   0.18  -0.88   0.75  -0.20   0.05       0
Italy            1.60   0.01  -0.60  -0.55   1.09   0.79  -0.01  -0.44  -0.83       0
Netherlands     -1.74   0.51   0.78  -0.66   0.30   0.94   0.26  -0.09   0.44       0
Norway          -0.90  -1.31  -0.19   1.75  -0.45   0.43  -0.01  -0.01  -0.19       0
Poland          -0.23  -0.20  -0.41  -1.71  -1.36   0.09   0.03  -0.86  -0.35       0
Portugal         2.13  -4.50  -0.69   0.04  -0.30   0.34  -0.66   0.21   0.31       0
Romania          2.66   1.08   0.60   0.15  -0.54  -0.20   0.22   0.22   0.03       0
Spain            1.60  -2.73  -0.30  -0.24   0.61  -0.62   0.97  -0.42  -0.17       0
Sweden          -1.87  -0.37   0.56   1.60   0.16   0.82   0.15   0.25  -0.35       0
Switzerland     -0.95   0.98  -0.35  -0.28   0.77   0.72  -0.70  -0.69   0.26       0
UK              -2.01   0.57  -0.89   0.62   1.43  -1.23   0.49   0.37  -0.23       0
USSR             0.78   0.49  -0.28   0.42  -1.50  -1.27  -0.32  -0.29   0.04       0
W.Germany       -1.72  -0.32   1.24  -0.56   0.83   0.20   0.10  -0.14   0.42       0
Yugoslavia       3.79   1.11   0.42   0.24  -1.25  -0.20   0.16   0.08   0.36       0
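Two properties of such a score table are worth checking: the score columns are mutually uncorrelated, and the variance of each column (computed with divisor n, as princomp() does) equals the corresponding eigenvalue of the correlation matrix. A minimal sketch on the built-in USArrests data, used as a stand-in for Proteins:

```r
# Stand-in data: the built-in USArrests data set, NOT the Proteins data.
X <- USArrests
n <- nrow(X)
pc <- princomp(X, cor = TRUE)

# Scores are centered, so the divisor-n variance is mean of squares.
score_var <- colSums(pc$scores^2) / n

# Eigenvalues of the correlation matrix, for comparison.
ev <- eigen(cor(X))$values
print(round(rbind(score_var, ev), 4))

# Correlations between different score columns should all be ~0.
off_diag <- cor(pc$scores)
diag(off_diag) <- 0
max_off_diag <- max(abs(off_diag))
print(max_off_diag)
```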

12 Plot PC1 scores vs. PC2 scores:
> plot(Proteins_PCA$scores[,2]~Proteins_PCA$scores[,1],xlab="PC1",xlim=c(-3,5),ylim=c(-5,3),ylab="PC2",pch=16)
> text(Proteins_PCA$scores[,1],Proteins_PCA$scores[,2],labels=abbreviate(row.names(Proteins)),cex=0.75,pos=rep(1,25))
> abline(h=0)
> abline(v=0)
Plot PC1 scores vs. PC3 scores:
> plot(Proteins_PCA$scores[,3]~Proteins_PCA$scores[,1],xlab="PC1",xlim=c(-3,5),ylim=c(-5,3),ylab="PC3",pch=16)
> text(Proteins_PCA$scores[,1],Proteins_PCA$scores[,3],labels=abbreviate(row.names(Proteins)),cex=0.75,pos=rep(1,25))
> abline(h=0)
> abline(v=0)
Make a biplot showing the variables on the PCA diagram:
> biplot(Proteins_PCA,xlim=c(-0.4,0.5),ylim=c(-0.3,0.3), xlabs=abbreviate(row.names(Proteins)))

13 Western and eastern European countries fall into two distinct groups, showing that their sources of protein differ. Spain and Portugal (in the PC1 vs. PC2 diagram) and Greece and France (in the PC1 vs. PC3 diagram) are isolated from the rest of the countries.
[Figures: PC1 vs. PC2 diagram; PC1 vs. PC3 diagram]

14 Biplot of PC1 and PC2, with arrows representing the variables that contribute to the separation of the countries into groups according to their main source of protein. Protein consumption from cereals and from seeds are the two variables that distinguish the eastern European countries from the rest. Norway and East Germany are consumers of starch, fish, and fruits and vegetables. The western countries are consumers of eggs, red meat, and white meat.

