Presentation on theme: "Factor and Principal Component Analysis" — Presentation transcript:

1 Factor and Principal Component Analysis

2 Combine correlated variables
If X and Y are strongly correlated, we don't need both; one of them is redundant. (Scatterplot example with r = 0.94.)

3 Combine correlated variables
If X and Y are strongly correlated, should we use X or Y? Instead, we can make a new variable F: F = a X + b Y + error. Most of the variation (information) in X and Y is then contained in F: one variable instead of two. The stronger the correlation, the more completely F contains all of the information (see the sketch below).
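A minimal sketch of this idea in Python, with hypothetical data (the names x, y, and f are illustrative only): the weights a and b are taken from the first principal component, the direction that captures the largest share of the combined variance.

```python
import numpy as np

# hypothetical strongly correlated pair of variables
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.9 * x + 0.3 * rng.normal(size=200)

data = np.column_stack([x, y])
data = data - data.mean(axis=0)            # center each variable

cov = np.cov(data, rowvar=False)           # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
a, b = eigvecs[:, -1]                      # weights for the largest eigenvalue

f = a * data[:, 0] + b * data[:, 1]        # the new combined variable F
print(eigvals[-1] / eigvals.sum())         # share of total variance retained in F
```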

4 Standardized data
Since we are only interested in the relative differences and correlations among the data, it is easier to work with standardized data. If X is the original variable, we compute Z = (X – mean of X) / SD(X). Z has overall mean 0 and SD = 1. For standardized variables, the correlations and covariances are the same. Each Z has variance = SD² = 1, so with K variables, their total variance is K (see the sketch below).
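A small sketch of the standardization step, assuming the variables are stored as the columns of a NumPy array (the helper name standardize is ours, not from the slides):

```python
import numpy as np

def standardize(X):
    """Z = (X - column mean) / column SD, so every column has mean 0 and SD 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# With K standardized columns, each has variance 1, the total variance is K,
# and the covariance matrix of Z equals the correlation matrix of X.
```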

5 Consolidating many correlated variables

6 Correlation matrix, K = 9
[9 × 9 correlation matrix of variables A–I, shown in unsorted column order (A, G, C, B, I, D, E, F, H). Correlations within the group A–E and within the group F–I are high and positive (roughly 0.84–0.94); correlations between the two groups are small and negative (roughly −0.13 to −0.25).]

7 Sorted correlation matrix
[The same 9 × 9 correlation matrix with rows and columns reordered to A–E followed by F–I. The two blocks of high positive correlations now sit along the diagonal, with the small negative between-block correlations off the diagonal.]

8 Heat map

9 Make K factors – keep most important
Initially, if we have K variables, we make K factors, where each factor is uncorrelated (orthogonal) with the others. The factor with the largest variance (also called its "eigenvalue") is denoted factor 1 and carries the most information; the factor with the next largest variance is factor 2, and so on. Keep the factors whose variance is larger than 1.0, or examine the scree plot (see the sketch below).
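A sketch of this retention rule; the data here are placeholder random numbers standing in for the real standardized variables, so only the mechanics (eigenvalues of the correlation matrix, the variance > 1 rule, and the scree plot) are meant to carry over:

```python
import numpy as np
import matplotlib.pyplot as plt

# placeholder standardized data; in practice Z holds the real n x K variables
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 9))

R = np.corrcoef(Z, rowvar=False)                 # K x K correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]            # factor variances, largest first

print("factors with variance > 1.0:", int((eigvals > 1.0).sum()))

plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")   # scree plot
plt.xlabel("factor")
plt.ylabel("variance (eigenvalue)")
plt.show()
```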

10 Make K factors, K = 9
Factor 1 = a11 X1 + a12 X2 + a13 X3 + … + a19 X9, and similarly for Factors 2 through 9.
The aij values (weights) are chosen so that the K factors are mutually orthogonal. We can compute the variance (and SD) of each factor; the means are zero by definition. Note that this assumes linearity! (A sketch of the construction follows.)
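A minimal sketch of the construction, again with placeholder data: the columns of the eigenvector matrix supply the aij weights, the resulting factors are mutually orthogonal, and their variances are the eigenvalues.

```python
import numpy as np

# placeholder standardized data; in practice Z holds the real n x K matrix
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 9))

R = np.corrcoef(Z, rowvar=False)
eigvals, A = np.linalg.eigh(R)                   # columns of A hold the weights a_ij
order = np.argsort(eigvals)[::-1]                # factor 1 gets the largest variance
eigvals, A = eigvals[order], A[:, order]

factors = Z @ A                                  # factor j = sum over i of a_ij * Z_i
print(np.var(factors, axis=0).round(3))          # close to the eigenvalues
print(np.corrcoef(factors, rowvar=False).round(3))  # off-diagonal entries near 0
```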

11 Eigenvalues = factor variances

12 Eigenvalues (variance accounted for): scree plot

factor    variance   percent   cum. percent
  1         5.185      57.61       57.61
  2         3.071      34.12       91.73
  3         0.168       1.87       93.60
  4         0.152       1.69       95.29
  5         0.119       1.32       96.61
  6         0.094       1.04       97.65
  7         0.090       1.00       98.65
  8         0.070       0.78       99.43
  9         0.051       0.57      100.00
total       9.000     100.00         --
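The percent and cumulative percent columns follow directly from the eigenvalues; a short check using the values in the table above (for standardized variables the eigenvalues sum to K = 9):

```python
import numpy as np

eigvals = np.array([5.185, 3.071, 0.168, 0.152, 0.119, 0.094, 0.090, 0.070, 0.051])

pct = 100 * eigvals / eigvals.sum()              # eigvals.sum() is 9.000 here
cum = np.cumsum(pct)
for i, (v, p, c) in enumerate(zip(eigvals, pct, cum), start=1):
    print(f"{i:>2}  {v:6.3f}  {p:6.2f}  {c:6.2f}")
```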

13 Make two factors

Rotated factor loadings
variable   Factor 1   Factor 2
   A          0.964     -0.107
   B          0.958     -0.053
   C          0.944     -0.130
   D          0.949     -0.137
   E          0.945     -0.123
   F         -0.104      0.917
   G         -0.128      0.929
   H         -0.112      0.902
   I         -0.081      0.936

14 Factor loadings
Factor 1 = 0.964 A + 0.958 B + 0.944 C + 0.949 D + 0.945 E + error
Factor 2 = 0.917 F + 0.929 G + 0.902 H + 0.936 I + error
(the coefficients are the rotated loadings from the previous slide)
Each coefficient is (approximately) the correlation of that variable with the factor; for example, 0.964 is (approximately) the correlation of A with Factor 1.
Total variance accounted for by the factors: Factor 1 about 50.9%; Factors 1 and 2 together about 89.3% (cumulative). A sketch of fitting such a two-factor solution follows.
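One way to obtain a rotated two-factor solution like the one above is scikit-learn's FactorAnalysis with a varimax rotation; this is a sketch with synthetic two-block data (not the slides' data or software), so the printed loadings illustrate the pattern rather than reproduce the table:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# synthetic stand-in: two latent traits, one driving five variables (like A-E)
# and one driving four (like F-I)
rng = np.random.default_rng(0)
g1 = rng.normal(size=(200, 1))
g2 = rng.normal(size=(200, 1))
Z = np.hstack([g1 + 0.3 * rng.normal(size=(200, 5)),
               g2 + 0.3 * rng.normal(size=(200, 4))])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(Z)            # two factor scores per observation
loadings = fa.components_.T             # 9 x 2: loading of each variable on each factor
print(loadings.round(3))
```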

15 Factors are uncorrelated (orthogonal) with each other
Factors are uncorrelated (orthogonal) with each other. They represent non-redundant information.

16 Communalities
How much of the variation in each variable is accounted for by the factor(s) – similar to R². (A small sketch of the calculation follows the table.)

variable   communality
   A          0.940
   B          0.921
   C          0.908
   D          0.919
   E          0.907
   F          0.852
   G          0.878
   H          0.826
   I          0.883
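A small sketch of the calculation, using the rotated loadings from the earlier table; the sum of squared loadings for each variable reproduces the communalities above to rounding:

```python
import numpy as np

# rotated loadings for variables A-I on Factors 1 and 2 (from the earlier table)
loadings = np.array([
    [ 0.964, -0.107], [ 0.958, -0.053], [ 0.944, -0.130],
    [ 0.949, -0.137], [ 0.945, -0.123], [-0.104,  0.917],
    [-0.128,  0.929], [-0.112,  0.902], [-0.081,  0.936],
])

# communality = sum of squared loadings across the retained factors (like R^2)
communalities = (loadings ** 2).sum(axis=1)
for name, h2 in zip("ABCDEFGHI", communalities):
    print(name, round(float(h2), 3))
```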

17 WGCNA- Weighted gene co-expression network analysis – Horvath (UCLA)

18 Factors can have factors

19 Power adjacency function results in a weighted gene network
Often choosing beta = 6 works well, but in general we use the "scale-free topology criterion" described in Zhang and Horvath (2005). (A sketch of the power adjacency function follows.)
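A minimal sketch of the soft-thresholding idea behind the power adjacency function (this is not the WGCNA package itself; the function name and the toy expression matrix are ours):

```python
import numpy as np

def power_adjacency(expr, beta=6):
    """Weighted gene network: a_ij = |cor(gene_i, gene_j)| ** beta."""
    cor = np.corrcoef(expr, rowvar=False)   # gene-by-gene correlation matrix
    adj = np.abs(cor) ** beta               # soft thresholding
    np.fill_diagonal(adj, 0.0)              # drop self-connections
    return adj

# hypothetical expression matrix: 50 samples x 100 genes
rng = np.random.default_rng(0)
adj = power_adjacency(rng.normal(size=(50, 100)), beta=6)
```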

20

