Xuhua Xia Slide 1 Correlation Simple correlation –between two variables Multiple and Partial correlations –between one variable and a set of other variables.


Xuhua Xia Slide 1 Correlation
Simple correlation
–between two variables
Multiple and Partial correlations
–between one variable and a set of other variables
Canonical Correlation
–between two sets of variables, each containing more than one variable
Simple and multiple correlations are special cases of canonical correlation.
Multiple: x1 on x2 and x3
Partial: between X and Y, with Z being controlled for
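For reference, the partial and multiple correlations named on this slide are commonly written in terms of the simple correlations as below. The symbols r_{XY·Z} and R_{1·23} are assumed notation, since the transcript does not preserve the slide's own.

```latex
r_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}}
\qquad
R_{1\cdot 23}^{2} = \frac{r_{12}^{2} + r_{13}^{2} - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^{2}}
```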

Xuhua Xia Slide 2 Review of correlation
X Z Y
1 4 14.0000
1 5 17.9087
1 6 16.3255
2 3 14.4441
2 4 15.2952
2 5 19.1587
2 6 16.0299
2 5 17.0000
3 3 14.7556
3 4 17.6823
3 5 20.5301
3 6 21.6408
4 3 15.0903
4 4 18.1603
4 5 22.2471
5 2 14.4450
5 3 16.5554
5 4 21.0047
5 5 22.0000
6 1 19.0000
6 2 18.0000
6 3 18.1863
6 4 21.0000
Compute the Pearson correlation coefficients between X and Z, between X and Y, and between Z and Y.
Compute the partial correlation coefficient between X and Y, controlling for Z (i.e., the correlation coefficient between X and Y when Z is held constant), by using the equation in the previous slide.
Run R to verify your calculation:
install.packages("ggm")
library(ggm)
md<-read.table("XYZ.txt",header=TRUE)
cor(md)     # simple (Pearson) correlations
s<-var(md)  # covariance matrix
parcor(s)   # partial correlations, each pair given the remaining variable
install.packages("psych")
library(psych)
smc(s)      # squared multiple correlations
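The hand calculation asked for above can be sketched in base R as follows. This is a minimal sketch, assuming the X, Z, Y values are split out of the slide's flattened table as shown. The variable names (`r_xy_z`, `r_resid`, `r_inv`) are illustrative, and the two cross-checks are standard equivalent routes rather than anything the slides prescribe.

```r
# X, Z, Y transcribed from the slide; the column split is an assumption
md <- data.frame(
  X = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6),
  Z = c(4, 5, 6, 3, 4, 5, 6, 5, 3, 4, 5, 6, 3, 4, 5, 2, 3, 4, 5, 1, 2, 3, 4),
  Y = c(14.0000, 17.9087, 16.3255, 14.4441, 15.2952, 19.1587, 16.0299,
        17.0000, 14.7556, 17.6823, 20.5301, 21.6408, 15.0903, 18.1603,
        22.2471, 14.4450, 16.5554, 21.0047, 22.0000, 19.0000, 18.0000,
        18.1863, 21.0000)
)

R <- cor(md)                      # simple (Pearson) correlations
r_xy <- R["X", "Y"]
r_xz <- R["X", "Z"]
r_zy <- R["Z", "Y"]

# First-order partial correlation of X and Y, controlling for Z
r_xy_z <- (r_xy - r_xz * r_zy) / sqrt((1 - r_xz^2) * (1 - r_zy^2))

# Cross-check 1: correlate the residuals of X and Y after regressing each on Z
r_resid <- cor(resid(lm(X ~ Z, data = md)), resid(lm(Y ~ Z, data = md)))

# Cross-check 2: scaled off-diagonal element of the inverse correlation matrix
P <- solve(R)
r_inv <- -P["X", "Y"] / sqrt(P["X", "X"] * P["Y", "Y"])

print(c(formula = r_xy_z, residuals = r_resid, inverse = r_inv))
```

All three numbers should agree. The entry for X and Y in the matrix returned by `parcor(s)` is the same quantity.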

Xuhua Xia Slide 3 Data for canonical correlation
# First three variables: physical
# Last three variables: exercise
# Middle-aged men
weight waist pulse chins situps jumps
191 36 50 5 162 60
193 38 58 12 101 101
189 35 46 13 145 58
211 38 56 8 151 38
176 31 74 15 200 40
169 34 50 17 120 38
154 34 64 14 215 105
193 36 46 6 170 31
176 37 54 4 160 25
156 33 54 15 215 73
189 37 52 2 130 60
162 35 62 12 145 37
182 36 56 4 141 42
167 34 60 6 155 40
154 30 56 17 251 250
166 33 52 13 210 115
247 46 50 1 50 50
202 37 62 12 120 120
157 32 52 11 230 80
138 33 68 2 150 43
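To follow the later slides, the data need to be in a whitespace-delimited file with a header row. A minimal sketch, assuming the six columns are split as weight, waist, pulse, chins, situps, jumps, and writing to a temporary directory (a hypothetical location) rather than the working directory:

```r
# Fitness data transcribed from the slide; the column split is an assumption
md <- data.frame(
  weight = c(191, 193, 189, 211, 176, 169, 154, 193, 176, 156,
             189, 162, 182, 167, 154, 166, 247, 202, 157, 138),
  waist  = c(36, 38, 35, 38, 31, 34, 34, 36, 37, 33,
             37, 35, 36, 34, 30, 33, 46, 37, 32, 33),
  pulse  = c(50, 58, 46, 56, 74, 50, 64, 46, 54, 54,
             52, 62, 56, 60, 56, 52, 50, 62, 52, 68),
  chins  = c(5, 12, 13, 8, 15, 17, 14, 6, 4, 15,
             2, 12, 4, 6, 17, 13, 1, 12, 11, 2),
  situps = c(162, 101, 145, 151, 200, 120, 215, 170, 160, 215,
             130, 145, 141, 155, 251, 210, 50, 120, 230, 150),
  jumps  = c(60, 101, 58, 38, 40, 38, 105, 31, 25, 73,
             60, 37, 42, 40, 250, 115, 50, 120, 80, 43)
)

# Write a header row plus 20 data rows, then read the file back
path <- file.path(tempdir(), "Cancor.txt")
write.table(md, path, row.names = FALSE, quote = FALSE)
md2 <- read.table(path, header = TRUE)
str(md2)
```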

Xuhua Xia Slide 4 Many Possible Correlations
With multiple DVs (say A, B, C) and IVs (say a, b, c, d, e), there could be many correlation patterns:
–Variable A in the DV set could be correlated with variables a, b, c in the IV set
–Variable B in the DV set could be correlated with variables c, d in the IV set
–Variable C in the DV set could be correlated with variables a, c, e in the IV set
With this plethora of possible correlations, what is the best way of summarizing them?

Xuhua Xia Slide 5 Dealing with Two Sets of Variables
The simple correlation approach:
–For N DVs and M IVs, calculate the simple correlation coefficient between each of the N DVs and each of the M IVs, yielding a total of N*M correlation coefficients
The multiple correlation approach:
–For N DVs and M IVs, calculate multiple or partial correlation coefficients between each of the N DVs and the set of M IVs, yielding a total of N correlation coefficients
The canonical correlation
Note: All of these deal with linear correlations
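The first two approaches can be sketched on simulated data to make the counts concrete. Everything here (the seed, sample size, and the variables `DV` and `IV`) is hypothetical and not taken from the slides.

```r
set.seed(1)                                   # hypothetical simulated data
n  <- 50
IV <- matrix(rnorm(n * 5), n, 5,
             dimnames = list(NULL, c("a", "b", "c", "d", "e")))
DV <- cbind(A = IV[, "a"] + rnorm(n),
            B = IV[, "c"] + IV[, "d"] + rnorm(n),
            C = rnorm(n))

# Simple correlation approach: an N x M matrix (here 3 x 5 = 15 coefficients)
simple <- cor(DV, IV)

# Multiple correlation approach: one coefficient per DV (here N = 3)
multiple <- sapply(colnames(DV), function(v) {
  sqrt(summary(lm(DV[, v] ~ IV))$r.squared)
})

dim(simple)
round(multiple, 3)
```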

Xuhua Xia Slide 6 Correlation matrix
md<-read.table("Cancor.txt",header=T)
attach(md)
R<-cor(md)
R
         weight    waist     pulse    chins   situps     jumps
weight  1.00000  0.86958 -0.365762 -0.38969 -0.70557 -0.226296
waist   0.86958  1.00000 -0.333131 -0.58893 -0.83610 -0.344578
pulse  -0.36576 -0.33313  1.000000  0.15065  0.15723  0.034933
chins  -0.38969 -0.58893  0.150648  1.00000  0.50058  0.495760
situps -0.70557 -0.83610  0.157234  0.50058  1.00000  0.461611
jumps  -0.22630 -0.34458  0.034933  0.49576  0.46161  1.000000

Slide 7 Multiple correlations
fit<-lm(weight~chins+situps+jumps);summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 237.81551   14.97422  15.882 3.23e-11 ***
chins        -0.49462    0.99778  -0.496  0.62683
situps       -0.37270    0.10717  -3.478  0.00311 **
jumps         0.07798    0.10038   0.777  0.44861
Multiple R-squared: 0.5178, Adjusted R-squared: 0.4274
fit<-lm(waist~chins+situps+jumps);summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.904132   1.470390  30.539  1.3e-15 ***
chins       -0.178739   0.097977  -1.824 0.086838 .
situps      -0.053678   0.010524  -5.101 0.000107 ***
jumps        0.009669   0.009857   0.981 0.341223
Multiple R-squared: 0.7527, Adjusted R-squared: 0.7063
fit<-lm(pulse~chins+situps+jumps);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.09212    6.17858   8.431 2.79e-07 ***
chins        0.17474    0.41170   0.424    0.677
situps       0.02021    0.04422   0.457    0.654
jumps       -0.01279    0.04142  -0.309    0.762
Multiple R-squared: 0.03736, Adjusted R-squared: -0.1431
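The Multiple R-squared values above can also be recovered from the correlation matrix alone, as r' Rxx^-1 r, where r holds the correlations of the response with the predictors and Rxx the correlations among the predictors. The sketch below re-enters the correlation matrix printed on the Correlation matrix slide, rounded to five decimals. The helper `multiple_r2` is a hypothetical name introduced for illustration.

```r
vars <- c("weight", "waist", "pulse", "chins", "situps", "jumps")
# Correlation matrix as printed on the slide, rounded to five decimals
R <- matrix(c(
   1.00000,  0.86958, -0.36576, -0.38969, -0.70557, -0.22630,
   0.86958,  1.00000, -0.33313, -0.58893, -0.83610, -0.34458,
  -0.36576, -0.33313,  1.00000,  0.15065,  0.15723,  0.03493,
  -0.38969, -0.58893,  0.15065,  1.00000,  0.50058,  0.49576,
  -0.70557, -0.83610,  0.15723,  0.50058,  1.00000,  0.46161,
  -0.22630, -0.34458,  0.03493,  0.49576,  0.46161,  1.00000),
  nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

# Squared multiple correlation of response y on predictor set x
multiple_r2 <- function(R, y, x) {
  r_yx <- R[x, y]                      # response-predictor correlations
  sum(r_yx * solve(R[x, x], r_yx))     # r' Rxx^-1 r
}

exer <- c("chins", "situps", "jumps")
r2 <- sapply(c("weight", "waist", "pulse"),
             function(y) multiple_r2(R, y, exer))

round(r2, 4)         # compare with the Multiple R-squared lines above
round(sqrt(r2), 4)   # the multiple correlation coefficients themselves
```

Up to rounding of the printed correlations, `r2` should match 0.5178, 0.7527 and 0.03736 as reported by `summary(fit)`.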

Slide 8 Multiple correlation
fit<-lm(chins~weight+waist+pulse);summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.179551  16.226537   2.908   0.0103 *
weight       0.106933   0.084510   1.265   0.2239
waist       -1.602230   0.608407  -2.633   0.0181 *
pulse       -0.006223   0.151557  -0.041   0.9678
Multiple R-squared: 0.4084, Adjusted R-squared: 0.2974
fit<-lm(situps~weight+waist+pulse);summary(fit)
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 656.41462  102.44989   6.407 8.68e-06 ***
weight        0.09125    0.53357   0.171  0.86636
waist       -13.10675    3.84132  -3.412  0.00357 **
pulse        -0.88500    0.95689  -0.925  0.36877
Multiple R-squared: 0.7161, Adjusted R-squared: 0.6629
fit<-lm(jumps~weight+waist+pulse);summary(fit)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 318.4270   189.2686   1.682    0.112
weight        0.5820     0.9857   0.590    0.563
waist        -9.2426     7.0966  -1.302    0.211
pulse        -0.4683     1.7678  -0.265    0.794
Multiple R-squared: 0.1445, Adjusted R-squared: -0.01585

Canonical correlation (cc)
install.packages("ggplot2")
install.packages("GGally")
install.packages("CCA")
install.packages("CCP")
require(ggplot2)
require(GGally)
require(CCA)
require(CCP)
phys<-md[,1:3]
exer<-md[,4:6]
matcor(phys,exer)
cc1<-cc(phys,exer)
cc1
http://www.ats.ucla.edu/stat/r/dae/canonical.htm

cc output
[1] 0.87857805 0.26499182 0.06266112
$xcoef
               [,1]        [,2]        [,3]
weight  0.007691932  0.08206036 -0.01089895
waist  -0.352367502 -0.46672576  0.12741976
pulse  -0.016888712  0.04500996  0.14113157
$ycoef
               [,1]        [,2]         [,3]
chins   0.063996632  0.19132168  0.116137756
situps  0.017876736 -0.01743903  0.001201433
jumps  -0.002949483  0.00494516 -0.022700322
Annotations:
–[1]: canonical correlations
–$xcoef and $ycoef: raw canonical coefficient matrices, U and V
–phys*U: raw canonical variates for phys
–exer*V: raw canonical variates for exer
–$scores$xscores: standardized canonical variates
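The relation between coefficients and canonical variates can be sketched with `cancor()` from base R's stats package, swapped in here for `cc()` so the example runs without CCA. It assumes the fitness data are split into columns as transcribed from the data slide. Note that `cancor()` scales its coefficients so each variate has unit sum of squares, so the sketch rescales by sqrt(n - 1) to get unit-variance variates.

```r
# Fitness data transcribed from the data slide; the column split is an assumption
md <- data.frame(
  weight = c(191, 193, 189, 211, 176, 169, 154, 193, 176, 156,
             189, 162, 182, 167, 154, 166, 247, 202, 157, 138),
  waist  = c(36, 38, 35, 38, 31, 34, 34, 36, 37, 33,
             37, 35, 36, 34, 30, 33, 46, 37, 32, 33),
  pulse  = c(50, 58, 46, 56, 74, 50, 64, 46, 54, 54,
             52, 62, 56, 60, 56, 52, 50, 62, 52, 68),
  chins  = c(5, 12, 13, 8, 15, 17, 14, 6, 4, 15,
             2, 12, 4, 6, 17, 13, 1, 12, 11, 2),
  situps = c(162, 101, 145, 151, 200, 120, 215, 170, 160, 215,
             130, 145, 141, 155, 251, 210, 50, 120, 230, 150),
  jumps  = c(60, 101, 58, 38, 40, 38, 105, 31, 25, 73,
             60, 37, 42, 40, 250, 115, 50, 120, 80, 43)
)
phys <- as.matrix(md[, 1:3])
exer <- as.matrix(md[, 4:6])

cc0 <- cancor(phys, exer)      # base R; centers both sets by default
cc0$cor                        # canonical correlations

# Canonical variates: centered data times the coefficient matrices
n <- nrow(md)
U <- scale(phys, center = cc0$xcenter, scale = FALSE) %*% cc0$xcoef * sqrt(n - 1)
V <- scale(exer, center = cc0$ycenter, scale = FALSE) %*% cc0$ycoef * sqrt(n - 1)

# Each pair (U_i, V_i) correlates at the i-th canonical correlation
diag(cor(U, V))
apply(U, 2, sd)                # all 1 after rescaling
```

Individual variates from `cancor()` may differ in sign from those printed by `cc()`. The canonical correlations themselves are unaffected.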

standardized canonical variates
$scores$xscores
             [,1]        [,2]        [,3]
 [1,] -0.06587452  0.39294336 -0.90048466
 [2,] -0.89033536 -0.01630780  0.46160952
 [3,]  0.33866397  0.51550858 -1.57063280
 [4,] -0.71810315  1.37075870 -0.01683463
 [5,]  1.17525491  2.57590579  2.01305832
 [6,]  0.46963797 -0.47893295 -0.91554740
 [7,]  0.11781701 -1.07969890  1.22377873
 [8,]  0.01706419  0.37702424 -1.48680882
 [9,] -0.60117586 -1.12464792 -0.04505445
[10,]  0.65445550 -0.89895199 -0.33675460
[11,] -0.46740331 -0.14788320 -0.46900387
[12,] -0.13923760 -0.97996173  0.98174380
[13,] -0.23643419 -0.07554011  0.04439525
[14,]  0.28536698 -0.19295410  0.51756617
[15,]  1.66239672  0.42712450 -0.41495287
[16,]  0.76515225 -0.16836835 -0.72800719
[17,] -3.15880133  0.32106568 -0.23662794
[18,] -0.53629531  1.36900100  0.80062552
[19,]  1.04829236 -0.44018579 -0.75733645
[20,]  0.27955875 -1.74589901  1.83526836
$scores$yscores
             [,1]        [,2]        [,3]
 [1,] -0.23742244 -0.91888370 -0.28185833
 [2,] -1.00085572  1.68690015 -0.47289464
 [3,] -0.02345494  0.89826285  0.67222000
 [4,] -0.17718803 -0.26188291  0.55274626
 [5,]  1.14084951  0.23274696  1.37918010
 [6,] -0.15539717  2.00062200  1.56074166
 [7,]  1.15328755  0.10127530 -0.19445711
 [8,]  0.05512308 -1.01048386  0.50220023
 [9,] -0.23394065 -1.24840794  0.39411232
[10,]  1.31166763  0.13435186  0.64809096
[11,] -1.00146790 -0.93479995 -0.66871744
[12,] -0.02551244  0.60309281  1.03278901
[13,] -0.62373985 -0.83299874 -0.01462037
[14,] -0.23957331 -0.70439205  0.27987584
[15,]  1.56116497  0.76448365 -3.09433899
[16,]  0.97041241  0.04660035 -0.54360525
[17,] -2.46610861  0.21954878 -0.65396658
[18,] -0.71723790  1.44951672 -0.88137354
[19,]  1.30318577 -0.85790412  0.04265917
[20,] -0.59379197 -1.36764817 -0.25878331

Canonical structure: Correlations
$scores$corr.X.xscores (correlations of the phys variables with CVs_U)
             [,1]       [,2]       [,3]
weight -0.8028458 0.53345479 -0.2662041
waist  -0.9871691 0.07372001 -0.1416419
pulse   0.2061478 0.10981908  0.9723389
$scores$corr.Y.xscores (correlations of the exer variables with CVs_U)
            [,1]        [,2]         [,3]
chins  0.6101751  0.18985890  0.004125743
situps 0.8442193 -0.05748754 -0.010784582
jumps  0.3638095  0.09727830 -0.052192182
$scores$corr.X.yscores (correlations of the phys variables with CVs_V)
             [,1]       [,2]         [,3]
weight -0.7053627 0.14136116 -0.016680651
waist  -0.8673051 0.01953520 -0.008875444
pulse   0.1811170 0.02910116  0.060927845
$scores$corr.Y.yscores (correlations of the exer variables with CVs_V)
            [,1]       [,2]        [,3]
chins  0.6945030  0.7164708  0.06584216
situps 0.9608928 -0.2169408 -0.17210961
jumps  0.4140890  0.3670993 -0.83292764
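The four matrices are linked. The correlation of a variable with the opposite set's i-th canonical variate equals its correlation with its own set's i-th variate times the i-th canonical correlation. A minimal check using only numbers printed on these slides (the object names are illustrative):

```r
# Canonical correlations and loadings as printed on the slides
rho <- c(0.87857805, 0.26499182, 0.06266112)
vars <- c("weight", "waist", "pulse")

corr_X_xscores <- matrix(c(
  -0.8028458, 0.53345479, -0.2662041,
  -0.9871691, 0.07372001, -0.1416419,
   0.2061478, 0.10981908,  0.9723389),
  nrow = 3, byrow = TRUE, dimnames = list(vars, NULL))

corr_X_yscores <- matrix(c(
  -0.7053627, 0.14136116, -0.016680651,
  -0.8673051, 0.01953520, -0.008875444,
   0.1811170, 0.02910116,  0.060927845),
  nrow = 3, byrow = TRUE, dimnames = list(vars, NULL))

# Scale column i of the own-set loadings by the i-th canonical correlation
implied <- sweep(corr_X_xscores, 2, rho, `*`)

round(implied, 7)
max(abs(implied - corr_X_yscores))   # differences are rounding error only
```

The same scaling links `corr.Y.yscores` to `corr.Y.xscores`.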

Significance: p.asym in CCP
vCancor<-cc1$cor
# p.asym(rho,N,p,q, tstat = "Wilks|Hotelling|Pillai|Roy")
res<-p.asym(vCancor,length(md$weight),3,3, tstat = "Wilks")
Wilks' Lambda, using F-approximation (Rao's F):
              stat    approx df1      df2     p.value
1 to 3:  0.2112505 3.4003788   9 34.22293 0.004421278
2 to 3:  0.9261286 0.2933756   4 30.00000 0.879945478
3 to 3:  0.9960736 0.0630703   1 16.00000 0.804904236
plt.asym(res,rhostart=1)
plt.asym(res,rhostart=2)
plt.asym(res,rhostart=3)
1 to 3: At least one cancor significant?
2 to 3: Significant relationship after excluding cancor 1?
3 to 3: Significant relationship after excluding cancor 1 and 2?
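The `stat` column can be reproduced by hand. Wilks' Lambda for rows "k to 3" is the product of (1 - rho_i^2) over the canonical correlations from k onward. The sketch below uses the canonical correlations printed on the cc output slide. It also checks the last row's F value, using the single-correlation form F = ((1 - Lambda) / Lambda) * df2 / df1 with the df values taken from the table above. The general Rao approximation for the other rows is not reproduced here.

```r
# Canonical correlations as printed on the cc output slide
rho <- c(0.87857805, 0.26499182, 0.06266112)

# Wilks' Lambda for "k to 3": product of (1 - rho_i^2) for i = k..3
wilks <- rev(cumprod(rev(1 - rho^2)))
names(wilks) <- c("1 to 3", "2 to 3", "3 to 3")
round(wilks, 7)

# Last row only: one remaining correlation, df1 = 1 and df2 = 16 from the table
F_last <- (1 - wilks[["3 to 3"]]) / wilks[["3 to 3"]] * 16 / 1
p_last <- pf(F_last, df1 = 1, df2 = 16, lower.tail = FALSE)
c(F = F_last, p = p_last)
```

Only the first row has a small p-value, so the data support one significant canonical correlation.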

Slide 14 Ecology data: Assignment
# 24 sites; for each site, record coverage of four species and concentration of four chemicals
21.09 21.90 9.19 9.18 20.96 21.52 7.46 7.41
14.69 14.85 14.06 14.07 14.80 14.63 13.71 13.69
2.11 2.17 3.13 3.06 3.17 2.43 2.10 1.96
9.58 9.47 8.14 8.06 9.54 9.71 9.36 9.43
10.02 10.71 9.02 9.06 11.16 10.59 10.91 11.10
14.65 14.32 15.10 15.15 14.59 14.61 13.55 13.55
24.42 24.12 6.00 6.12 24.36 24.50 4.30 4.34
22.20 22.10 4.14 4.04 23.37 22.74 4.90 5.06
8.34 8.88 9.16 9.06 8.75 8.19 7.59 7.58
10.49 10.12 11.08 11.13 10.09 10.73 9.55 9.56
25.72 25.91 1.12 1.16 25.94 26.01 1.98 1.99
4.16 4.44 3.05 3.09 3.97 4.89 4.53 4.53
12.07 12.31 11.09 11.15 12.68 12.89 12.62 12.78
19.13 19.36 11.13 11.05 18.69 19.05 9.01 9.16
5.80 5.15 4.11 4.18 6.07 6.33 5.10 4.96
1.27 1.15 2.10 2.17 1.27 1.80 0.73 0.75
22.15 22.52 8.01 8.04 22.08 22.53 7.43 7.31
26.53 26.27 0.14 0.11 26.33 26.88 0.55 0.57
17.25 17.68 11.12 11.18 17.39 17.76 9.51 9.55
7.94 7.46 6.13 6.03 7.53 7.67 7.51 7.47
4.12 4.45 3.08 3.14 5.21 4.65 3.92 4.00
17.59 17.53 11.19 11.04 16.97 16.70 12.30 12.26
15.41 15.16 13.12 13.03 15.79 16.01 12.00 11.83
12.90 12.93 11.12 11.12 12.80 12.04 11.52 11.52
19.14 19.11 7.16 7.14 19.88 19.84 8.86 8.90
25.11 25.50 3.13 3.20 25.28 25.44 4.26 4.23
