# BIOL 582 Lecture Set 22 One-Way MANOVA, Part II Post-hoc exercises Discriminant Analysis.

## Presentation on theme: "BIOL 582 Lecture Set 22 One-Way MANOVA, Part II Post-hoc exercises Discriminant Analysis."— Presentation transcript:

BIOL 582 Lecture Set 22 One-Way MANOVA, Part II Post-hoc exercises Discriminant Analysis

Example MANOVA: Bumpus Data 1.State Null/Alternative hypotheses 2.Define model (evaluate assumptions) 3.Calculate S f for the error of the full model 4.Define the reduced “null” model (always contains just an intercept) 5.Calculate S r for the error of the reduced model 6.Calculate a multivariate test statistic S f using and/or S r. (Note: one can use only S f to get a test statistic if performing a randomization test.) 7.Evaluate the probability of the test statistic if the null hypothesis were true a.By converting the test statistic to an F stat, which approximately follows and F distribution b.By performing a randomization test to create an empirical probability distribution 8.Generate plots/tables 9.Maybe do a discriminant analysis BIOL 582Multivariate ANOVA

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis These two are linked because we are in a position where we might want to do multiple comparisons. Therefore, what we plot or how we present a table might be contingent upon what a multiple comparisons test reveals Also, multiple comparisons can be done several ways Let’s start by looking at a PC plot of the data (Because we used the original variables in the MANOVA, we must use a covariance matrix) BIOL 582Multivariate ANOVA

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis BIOL 582Multivariate ANOVA > Y.cov.pca<-princomp(Y,cor=F) > par(mfrow=c(1,3)) > # PC 2 vs PC 1 > plot(Y.cov.pca\$scores[,1],Y.cov.pca\$scores[,2],type='n',asp=1) > points(Y.cov.pca\$scores[group=="m.TRUE",1],Y.cov.pca\$scores[group=="m.TRUE",2],pch=22,bg='blue') > points(Y.cov.pca\$scores[group=="m.FALSE",1],Y.cov.pca\$scores[group=="m.FALSE",2],pch=22,col='blue') > points(Y.cov.pca\$scores[group=="f.TRUE",1],Y.cov.pca\$scores[group=="f.TRUE",2],pch=21,bg='red') > points(Y.cov.pca\$scores[group=="f.FALSE",1],Y.cov.pca\$scores[group=="f.FALSE",2],pch=21,col='red') > > # PC 3 vs PC 1 > plot(Y.cov.pca\$scores[,1],Y.cov.pca\$scores[,3],type='n',asp=1) > points(Y.cov.pca\$scores[group=="m.TRUE",1],Y.cov.pca\$scores[group=="m.TRUE",3],pch=22,bg='blue') > points(Y.cov.pca\$scores[group=="m.FALSE",1],Y.cov.pca\$scores[group=="m.FALSE",3],pch=22,col='blue') > points(Y.cov.pca\$scores[group=="f.TRUE",1],Y.cov.pca\$scores[group=="f.TRUE",3],pch=21,bg='red') > points(Y.cov.pca\$scores[group=="f.FALSE",1],Y.cov.pca\$scores[group=="f.FALSE",3],pch=21,col='red') > > # PC 3 vs PC 2 > plot(Y.cov.pca\$scores[,2],Y.cov.pca\$scores[,3],type='n',asp=1) > points(Y.cov.pca\$scores[group=="m.TRUE",2],Y.cov.pca\$scores[group=="m.TRUE",3],pch=22,bg='blue') > points(Y.cov.pca\$scores[group=="m.FALSE",2],Y.cov.pca\$scores[group=="m.FALSE",3],pch=22,col='blue') > points(Y.cov.pca\$scores[group=="f.TRUE",2],Y.cov.pca\$scores[group=="f.TRUE",3],pch=21,bg='red') > points(Y.cov.pca\$scores[group=="f.FALSE",2],Y.cov.pca\$scores[group=="f.FALSE",3],pch=21,col='red')

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis BIOL 582Multivariate ANOVA > # open circles = dead; closed = survived; blue = male; red = female

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis BIOL 582Multivariate ANOVA > # 3D scatterplot > library(scatterplot3d) > a<-scatterplot3d(Y.cov.pca\$scores[,1],Y.cov.pca\$scores[,2],Y.cov.pca\$scores[,3],type='p',asp=1) > a\$points(Y.cov.pca\$scores[group=="m.TRUE",1],Y.cov.pca\$scores[group=="m.TRUE",2],Y.cov.pca\$scores[group=="m.TRUE",3],pch=22,bg='blue') > a\$points(Y.cov.pca\$scores[group=="m.FALSE",1],Y.cov.pca\$scores[group=="m.FALSE",2],Y.cov.pca\$scores[group=="m.FALSE",3],pch=22,col='blue') > a\$points(Y.cov.pca\$scores[group=="f.TRUE",1],Y.cov.pca\$scores[group=="f.TRUE",2],Y.cov.pca\$scores[group=="f.TRUE",3],pch=21,bg='red') > a\$points(Y.cov.pca\$scores[group=="f.FALSE",1],Y.cov.pca\$scores[group=="f.FALSE",2],Y.cov.pca\$scores[group=="f.FALSE",3],pch=21,col='red')

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis Have to go to R for this one…. BIOL 582Multivariate ANOVA > # Interactive 3D scatterplot > library(rgl) > col.ind<-as.numeric(group) # f.F = black, f.T = red, m.F = green, m.T =blue > plot3d(Y.cov.pca\$scores[,1],Y.cov.pca\$scores[,2],Y.cov.pca\$scores[,3],col=col.ind) > # note that this distorts aspect ratio

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis So it appears that males and females tend to differ in morphology, but survived versus dead birds, it’s a little hard to distinguish. What options are there for multiple comparisons 1.Hotelling’s T 2 (inferential) 2.Randomization Test 3.Discriminant analysis (exploratory but multi-faceted) BIOL 582Multivariate ANOVA Hotelling’s T 2 is basically a multivariate t -test between any two multivariate means Where W is the error (within-group) covariance matrix, found as Hotelling’s T 2 can be converted to an F value by Hotelling’s T 2 is basically a multivariate t -test between any two multivariate means Where W is the error (within-group) covariance matrix, found as Hotelling’s T 2 can be converted to an F value by

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis So it appears that males and females tend to differ in morphology, but survived versus dead birds, it’s a little hard to distinguish. What options are there for multiple comparisons 1.Hotelling’s T 2 (inferential) 2.Randomization Test 3.Discriminant analysis (exploratory but multi-faceted) BIOL 582Multivariate ANOVA Hotelling’s T 2 is basically a multivariate t -test between any two multivariate means Note, Hotelling’s T 2 is like a t stat that measures differences in means, which in multivariate practice is expressed as a squared distance The square root, d, is the simplest way to express the difference in means Hotelling’s T 2 is basically a multivariate t -test between any two multivariate means Note, Hotelling’s T 2 is like a t stat that measures differences in means, which in multivariate practice is expressed as a squared distance The square root, d, is the simplest way to express the difference in means

Example MANOVA: Hotelling T 2 tests BIOL 582Multivariate ANOVA > # Hotelling's T2 comparisons > > m.means<-predict(lm.group, data.frame(group=levels(group))) # multivariate means > rownames(m.means)<-levels(group) # label means > m.means AE BHL FL TTL SW SKL female.FALSE 241.6500 31.38000 17.91716 28.55595 15.19936 20.71243 female.TRUE 242.3793 31.53793 18.14699 28.92972 15.30394 21.05397 male.FALSE 246.2273 31.59091 18.07095 28.71355 15.32948 21.40527 male.TRUE 247.6744 31.67209 18.21771 28.89102 15.32329 21.74004 > mean.diffs<-dist(m.means) # all elements are differences in means > means.n<-by(AE,group,length) # the sample sizes for each mean > > g<-nrow(m.means) # number of group means > > # Create empty matrices for results > HT2.matrix<-F.matrix<-P.matrix<-matrix(0,g,g,dimnames=list(c(levels(group)),c(levels(group)))) > W<-1/(lm.group\$df.residual)*Sf # error covariance matrix > for(i in 1:g){ + y1<-m.means[i,];n1<-means.n[i] # by rows + for(j in 1:g){ + y2<-m.means[j,];n2<-means.n[j] # by rows also + T2<-n1*n2/(n1+n2)*t(y1-y2)%*%solve(W)%*%(y1-y2) + F<-(n1+n2-ncol(Y)-1)/((n1+n2-lm.group\$rank)*ncol(Y))*T2 + P<-1-pf(F,ncol(Y),(n1+n2-ncol(Y)-1)) + # fill in result matrices + HT2.matrix[i,j]<-T2;F.matrix[i,j]<-F;P.matrix[i,j]<-P + } >

Example MANOVA: Hotelling T 2 tests BIOL 582Multivariate ANOVA > > mean.diffs female.FALSE female.TRUE male.FALSE female.TRUE 0.9364565 male.FALSE 4.6412666 3.8712015 male.TRUE 6.1361939 5.3416972 1.5053122 > HT2.matrix female.FALSE female.TRUE male.FALSE male.TRUE female.FALSE 0.000000 2.965359 16.530255 28.514610 female.TRUE 2.965359 0.000000 22.752539 34.003285 male.FALSE 16.530255 22.752539 0.000000 4.481377 male.TRUE 28.514610 34.003285 4.481377 0.000000 > F.matrix female.FALSE female.TRUE male.FALSE male.TRUE female.FALSE 0.0000000 0.4612781 2.6172904 4.5107858 female.TRUE 0.4612781 0.0000000 3.6272163 5.4171901 male.FALSE 2.6172904 3.6272163 0.0000000 0.7198999 male.TRUE 4.5107858 5.4171901 0.7198999 0.0000000 > P.matrix female.FALSE female.TRUE male.FALSE male.TRUE female.FALSE 1.0000000000 0.8328550545 0.026042668 0.0008545144 female.TRUE 0.8328550545 1.0000000000 0.003596766 0.0001367368 male.FALSE 0.0260426678 0.0035967655 1.000000000 0.6347116030 male.TRUE 0.0008545144 0.0001367368 0.634711603 1.0000000000 Note that females and males differ, but there are no differences in morphology between surviving and non-surviving birds

Example MANOVA: Randomization test BIOL 582Multivariate ANOVA > # Randomization Test approach > > P<-matrix(1,g,g) > permute<-999 > md.obs<-as.matrix(mean.diffs) # make sure R know this is a matrix > for(i in 1:permute){ + Y.r<-Y[sample(nrow(Y)),] # shuffle rows, not all values + lm.group.r<-lm(Y.r~group) + m.means.r<-predict(lm.group.r, data.frame(group=levels(group))) + mean.diffs.r<-as.matrix(dist(m.means.r)) # random mean differences + P =md.obs,P+1,P+0) # logical comparisons + } > dimnames(P)<-dimnames(md.obs) > P.values<-P/(permute+1) # P-values > > mean.diffs female.FALSE female.TRUE male.FALSE female.TRUE 0.9364565 male.FALSE 4.6412666 3.8712015 male.TRUE 6.1361939 5.3416972 1.5053122 > P.values female.FALSE female.TRUE male.FALSE male.TRUE female.FALSE 1.000 0.620 0.003 0.001 female.TRUE 0.620 1.000 0.003 0.001 male.FALSE 0.003 0.003 1.000 0.216 male.TRUE 0.001 0.001 0.216 1.000 > Note that females and males differ, but there are no differences in morphology between surviving and non-surviving birds

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis So it appears that males and females tend to differ in morphology, but survived versus dead birds, it’s a little hard to distinguish. What options are there for multiple comparisons 1.Hotelling’s T 2 (inferential) 2.Randomization Test 3.Discriminant analysis (exploratory but multi-faceted) Here is one way to present results (could do it the same way with Hotelling’s T 2 BIOL 582Multivariate ANOVA Table 1. Morphological distances (below diagonal) and corresponding P-values (above diagonal), based on 1,000 random permutations of shuffling subjects among the sex-survival groups. Bolded values represent sexual dimorphisms within survival type; italicized values are survival-type comparisons, within sex. FemaleMale non-survivedsurvivednon-survivedsurvived Female non-survived0.6200.0030.001 survived0.9360.0030.001 Male non-survived4.6413.8710.216 survived6.1365.3421.505

Example MANOVA: Bumpus Data 8.Generate plots/tables 9.Maybe do a discriminant analysis So it appears that males and females tend to differ in morphology, but survived versus dead birds, it’s a little hard to distinguish. What options are there for multiple comparisons 1.Hotelling’s T 2 (inferential) 2.Randomization Test 3.Discriminant analysis (exploratory but multi-faceted) Discriminant analyses are useful, but often misused Rather than ask if groups are different, one asks (1) if subjects can be assigned to the correct group, and (2) which variables might be best for distinguishing group differences Also, a plot can be made to show group distinction (this is where misuse if often made… but not the only place) BIOL 582Multivariate ANOVA

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) Part 1: Classification Without getting into too much detail, the classification part of DA measures the probability that any subject belongs to a certain group. This can be done several ways (and the way it is generally done is not the best). The typical way is to use the same data used to estimate group means. Then the generalized Mahalanobis distance is measured for every i th subject to the j th group mean. Based on these distances, the posterior classification probability of belonging to each group is determined (smaller distance = higher probability). The prior classification probability is determined by group size (prior- probabilities = n j / Σn j ) Comparison of prior and posterior probabilities can help elucidate whether groups are clustered in the multivariate data space BIOL 582Multivariate ANOVA

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) Part 2: Variable loading Recall that the multivariate equivalent to the F value was the F matrix: A linear discriminant analysis (LDA) performs an eigenanalysis on this matrix. The eigenvectors are called canonical vectors. The variables that load highest on these vectors (or just the first vector) best characterize differences among groups. Projecting data onto these vectors produces a scatterplot (Canonical variate plot) that shows the best statistical discrimination of groups, if it exists. The plot does not show actual relationships of objects; rather it shows statistical relationships. If groups are distinct, the scatter will indicate it. Sometimes LDA is called a canonical variates analysis (CVA), which sounds much like principal components analysis. BIOL 582Multivariate ANOVA

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) These are really superficial descriptions of DA. A better treatment of the subject requires a multivariate stats course. However, be aware that some who work with multivariate data use such an analysis as a post-hoc method for dealing with MANOVA results. BIOL 582Multivariate ANOVA > #LDA Example > library(MASS) > > lda.group<-lda(group~Y,cv=True) # reverse of linear model formula > lda.fit<-predict(lda.group,group) # calculates post-probabilities and CVs > fit.table<-table(group,lda.fit\$class) # a summary table of classifications > fit.table # rows are actual; columns are predicted group female.FALSE female.TRUE male.FALSE male.TRUE female.FALSE 3 7 8 2 female.TRUE 1 15 10 3 male.FALSE 2 7 17 18 male.TRUE 0 2 15 26 > > # classification success > CS<-sum(diag(fit.table))/nrow(Y)*100 > CS # very low classification.... [1] 44.85294 > > # However, looking at the table reveals that males tend to be classified as males > # which suggests males might be more distinct > # Even if there was a high classification success rate > # one has to use caution, because the same values were used to calculate > # group means (i.e., inherently better rate to expect than equal prior probability)

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) How does the classification work? BIOL 582Multivariate ANOVA > lda.group\$prior # what is expected by chance female.FALSE female.TRUE male.FALSE male.TRUE 0.1470588 0.2132353 0.3235294 0.3161765 > lda.fit\$posterior[1:10,] # the first 10 posterior probabilities female.FALSE female.TRUE male.FALSE male.TRUE 1 0.09162555 0.05118162 0.4930559 0.3641369 2 0.16343518 0.28885553 0.3121966 0.2355127 3 0.11406832 0.14269727 0.4342863 0.3089481 4 0.04710183 0.06797484 0.4262030 0.4587203 5 0.14647994 0.30878952 0.2968586 0.2478719 6 0.08462697 0.07996460 0.4483835 0.3870250 7 0.13939473 0.14526545 0.3868157 0.3285241 8 0.08688642 0.13789299 0.4070071 0.3682135 9 0.10429253 0.08706685 0.3712054 0.4374353 10 0.06296977 0.18031920 0.3780656 0.3786454 >

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) How does the classification work? BIOL 582Multivariate ANOVA > (lda.fit\$posterior-lda.group\$prior)[1:10,] # look for highest positive difference in each row female.FALSE female.TRUE male.FALSE male.TRUE 1 -0.0554332753 -0.09587720 0.34599707 0.217078117 2 -0.0498001173 0.07562023 0.09896126 0.022277445 3 -0.2094610894 -0.18083214 0.11075691 -0.014581332 4 -0.2690746455 -0.24820163 0.11002655 0.142543849 5 -0.0005788862 0.16173070 0.14979977 0.100813116 6 -0.1286083231 -0.13327069 0.23514816 0.173789675 7 -0.1841346788 -0.17826396 0.06328632 0.004994674 8 -0.2292900492 -0.17828348 0.09083063 0.052037010 9 -0.0427662902 -0.05999197 0.22414653 0.290376435 10 -0.1502655276 -0.03291610 0.16483031 0.165410143

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) How does the classification work? BIOL 582Multivariate ANOVA > (lda.fit\$posterior-lda.group\$prior)[1:10,] # look for highest positive difference in each row female.FALSE female.TRUE male.FALSE male.TRUE 1 -0.0554332753 -0.09587720 0.34599707 0.217078117 2 -0.0498001173 0.07562023 0.09896126 0.022277445 3 -0.2094610894 -0.18083214 0.11075691 -0.014581332 4 -0.2690746455 -0.24820163 0.11002655 0.142543849 5 -0.0005788862 0.16173070 0.14979977 0.100813116 6 -0.1286083231 -0.13327069 0.23514816 0.173789675 7 -0.1841346788 -0.17826396 0.06328632 0.004994674 8 -0.2292900492 -0.17828348 0.09083063 0.052037010 9 -0.0427662902 -0.05999197 0.22414653 0.290376435 10 -0.1502655276 -0.03291610 0.16483031 0.165410143 > lda.fit\$class[1:10] [1] male.FALSE male.FALSE male.FALSE male.TRUE female.TRUE male.FALSE male.FALSE [8] male.FALSE male.TRUE male.TRUE Levels: female.FALSE female.TRUE male.FALSE male.TRUE > Compare to actual > group[1:10] [1] male.TRUE male.FALSE male.FALSE male.TRUE male.TRUE male.FALSE male.TRUE male.FALSE [9] male.TRUE male.FALSE Levels: female.FALSE female.TRUE male.FALSE male.TRUE

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) BIOL 582Multivariate ANOVA > # Variable loadings > lda.group Call: lda(group ~ Y, cv = True) Prior probabilities of groups: female.FALSE female.TRUE male.FALSE male.TRUE 0.1470588 0.2132353 0.3235294 0.3161765 Group means: YAE YBHL YFL YTTL YSW YSKL female.FALSE 241.6500 31.38000 17.91716 28.55595 15.19936 20.71243 female.TRUE 242.3793 31.53793 18.14699 28.92972 15.30394 21.05397 male.FALSE 246.2273 31.59091 18.07095 28.71355 15.32948 21.40527 male.TRUE 247.6744 31.67209 18.21771 28.89102 15.32329 21.74004 Coefficients of linear discriminants: LD1 LD2 LD3 YAE -0.2141290 -0.1153824 -0.015984182 YBHL 0.1042882 -0.3799127 -0.036351076 YFL 0.2203001 1.1769264 -0.989261124 YTTL 0.4893930 0.3703951 0.272006957 YSW 0.3982302 -0.4163508 2.997840427 YSKL -0.4043300 0.7320153 -0.005272685 Proportion of trace: LD1 LD2 LD3 0.9034 0.0820 0.0146 > > # note: the number of CVs = g-1, where g is group levels > # proportion of trace is the amount of information explained > # by each CV. This indicates how many dimensions are neeed > # to explain group differences, most likely TTL, SW, and SKL are the variables that best describe group differences. SW and SKL differ in sign, meaning groups might have either long-slender skulls or short- wide skulls. Wide skulls positive covary with long legs, in distinguishing groups

Example MANOVA: Bumpus Data / Discriminant Analysis (DA) BIOL 582Multivariate ANOVA > # CV plot > CV.scores<-lda.fit\$x[,1:2] > > plot(CV.scores,asp=1,type='n') > points(CV.scores[group=='female.FALSE',],col='red') > points(CV.scores[group=='female.TRUE',],pch=21,bg='red') > points(CV.scores[group=='male.FALSE',],pch=22,col='blue') > points(CV.scores[group=='male.TRUE',],pch=22,bg='blue') The reason for low classification is apparent now. This is a good example of how one can have sufficient statistical power but not have very meaningful conclusions about the biology. Remember this plot is only an abstract statistical visual aid. Some people make the mistake of showing these plots as “data plots” But the axes do not correspond to linear combinations of variables that express dispersion in the data; they correspond to linear combinations that best separate groups

Just for reference….. BIOL 582Multivariate ANOVA CV1 CV2 CV1 CV2 CV1 CV2 Classification success (discrimination ability)

Download ppt "BIOL 582 Lecture Set 22 One-Way MANOVA, Part II Post-hoc exercises Discriminant Analysis."

Similar presentations