Published by Princess Campen. Modified about 1 year ago.

1
BIOL 582 Lecture Set 22: One-Way MANOVA, Part II. Post-hoc exercises; Discriminant Analysis

2
Example MANOVA: Bumpus Data
1. State null/alternative hypotheses
2. Define model (evaluate assumptions)
3. Calculate S_f for the error of the full model
4. Define the reduced “null” model (always contains just an intercept)
5. Calculate S_r for the error of the reduced model
6. Calculate a multivariate test statistic using S_f and/or S_r. (Note: one can use only S_f to get a test statistic if performing a randomization test.)
7. Evaluate the probability of the test statistic if the null hypothesis were true
   a. By converting the test statistic to an F statistic, which approximately follows an F distribution
   b. By performing a randomization test to create an empirical probability distribution
8. Generate plots/tables
9. Maybe do a discriminant analysis
BIOL 582 Multivariate ANOVA

3
Example MANOVA: Bumpus Data
8. Generate plots/tables
9. Maybe do a discriminant analysis
These two steps are linked because we are now in a position where we might want to do multiple comparisons. What we plot, or how we present a table, might therefore be contingent upon what a multiple comparisons test reveals. Multiple comparisons can also be done several ways.
Let’s start by looking at a PC plot of the data. (Because we used the original variables in the MANOVA, we must use a covariance matrix.)
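The covariance-versus-correlation point can be checked with a minimal sketch. The Bumpus matrix Y is not reproduced here, so the built-in iris measurements stand in for it (an assumption, as are the names Y.demo, pca.cov and pca.cor). A covariance-based PCA keeps the original units, so total variance and inter-subject distances are preserved; a correlation-based PCA standardizes every variable first.

```r
# Stand-in data (assumption): the four iris measurements replace the Bumpus matrix Y
Y.demo <- as.matrix(iris[, 1:4])

pca.cov <- princomp(Y.demo, cor = FALSE)  # covariance-based, original units
pca.cor <- princomp(Y.demo, cor = TRUE)   # correlation-based, standardized variables

# Covariance PCA preserves total variance in the original units
# (princomp uses divisor n, so rescale cov() by (n - 1) / n)
n <- nrow(Y.demo)
total.var <- sum(diag(cov(Y.demo))) * (n - 1) / n
all.equal(sum(pca.cov$sdev^2), total.var)

# Correlation PCA gives every variable unit variance, so the total is just p
all.equal(sum(pca.cor$sdev^2), ncol(Y.demo))

# Distances among covariance-PC scores match distances among the raw observations
all.equal(as.vector(dist(pca.cov$scores)), as.vector(dist(Y.demo)))
```

Because the MANOVA was fit to the original variables, only the covariance-based scores describe the same data space that the test statistic was computed in.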

4
Example MANOVA: Bumpus Data
> Y.cov.pca<-princomp(Y,cor=FALSE) # PCA on the covariance matrix
> par(mfrow=c(1,3)) # three panels side by side
> # PC 2 vs PC 1
> plot(Y.cov.pca$scores[,1],Y.cov.pca$scores[,2],type='n',asp=1)
> points(Y.cov.pca$scores[group=="m.TRUE",1],Y.cov.pca$scores[group=="m.TRUE",2],pch=22,bg='blue')
> points(Y.cov.pca$scores[group=="m.FALSE",1],Y.cov.pca$scores[group=="m.FALSE",2],pch=22,col='blue')
> points(Y.cov.pca$scores[group=="f.TRUE",1],Y.cov.pca$scores[group=="f.TRUE",2],pch=21,bg='red')
> points(Y.cov.pca$scores[group=="f.FALSE",1],Y.cov.pca$scores[group=="f.FALSE",2],pch=21,col='red')
>
> # PC 3 vs PC 1
> plot(Y.cov.pca$scores[,1],Y.cov.pca$scores[,3],type='n',asp=1)
> points(Y.cov.pca$scores[group=="m.TRUE",1],Y.cov.pca$scores[group=="m.TRUE",3],pch=22,bg='blue')
> points(Y.cov.pca$scores[group=="m.FALSE",1],Y.cov.pca$scores[group=="m.FALSE",3],pch=22,col='blue')
> points(Y.cov.pca$scores[group=="f.TRUE",1],Y.cov.pca$scores[group=="f.TRUE",3],pch=21,bg='red')
> points(Y.cov.pca$scores[group=="f.FALSE",1],Y.cov.pca$scores[group=="f.FALSE",3],pch=21,col='red')
>
> # PC 3 vs PC 2
> plot(Y.cov.pca$scores[,2],Y.cov.pca$scores[,3],type='n',asp=1)
> points(Y.cov.pca$scores[group=="m.TRUE",2],Y.cov.pca$scores[group=="m.TRUE",3],pch=22,bg='blue')
> points(Y.cov.pca$scores[group=="m.FALSE",2],Y.cov.pca$scores[group=="m.FALSE",3],pch=22,col='blue')
> points(Y.cov.pca$scores[group=="f.TRUE",2],Y.cov.pca$scores[group=="f.TRUE",3],pch=21,bg='red')
> points(Y.cov.pca$scores[group=="f.FALSE",2],Y.cov.pca$scores[group=="f.FALSE",3],pch=21,col='red')

5
Example MANOVA: Bumpus Data
[Figure: PC score plots, PC 2 vs PC 1, PC 3 vs PC 1, and PC 3 vs PC 2]
> # open symbols = dead; filled symbols = survived; blue squares = male; red circles = female

6
Example MANOVA: Bumpus Data
> # 3D scatterplot
> library(scatterplot3d)
> a<-scatterplot3d(Y.cov.pca$scores[,1],Y.cov.pca$scores[,2],Y.cov.pca$scores[,3],type='p',asp=1)
> # points are added through the points3d() function returned by scatterplot3d()
> a$points3d(Y.cov.pca$scores[group=="m.TRUE",1],Y.cov.pca$scores[group=="m.TRUE",2],Y.cov.pca$scores[group=="m.TRUE",3],pch=22,bg='blue')
> a$points3d(Y.cov.pca$scores[group=="m.FALSE",1],Y.cov.pca$scores[group=="m.FALSE",2],Y.cov.pca$scores[group=="m.FALSE",3],pch=22,col='blue')
> a$points3d(Y.cov.pca$scores[group=="f.TRUE",1],Y.cov.pca$scores[group=="f.TRUE",2],Y.cov.pca$scores[group=="f.TRUE",3],pch=21,bg='red')
> a$points3d(Y.cov.pca$scores[group=="f.FALSE",1],Y.cov.pca$scores[group=="f.FALSE",2],Y.cov.pca$scores[group=="f.FALSE",3],pch=21,col='red')

7
Example MANOVA: Bumpus Data
Have to go to R for this one.
> # Interactive 3D scatterplot
> library(rgl)
> col.ind<-as.numeric(group) # f.F = black, f.T = red, m.F = green, m.T = blue
> plot3d(Y.cov.pca$scores[,1],Y.cov.pca$scores[,2],Y.cov.pca$scores[,3],col=col.ind)
> # note that this distorts aspect ratio

8
Example MANOVA: Bumpus Data
So it appears that males and females tend to differ in morphology, but survived versus dead birds are a little hard to distinguish.
What options are there for multiple comparisons?
1. Hotelling’s T² (inferential)
2. Randomization test
3. Discriminant analysis (exploratory but multi-faceted)
Hotelling’s T² is basically a multivariate t-test between any two multivariate means. It is computed with W, the error (within-group) covariance matrix, and can be converted to an F value.
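The formulas for T², W, and the F conversion were images on the original slide and are not in this transcript. A reconstruction read back from the R code on the Hotelling T² slide (so an inference, not the slide's own notation) is given below. The symbols are assumptions: p is the number of response variables (ncol(Y)), n is the total sample size, k is the rank of the full model (lm.group$rank), and the two groups have sizes n_1, n_2 and mean vectors \bar{y}_1, \bar{y}_2.

```latex
\begin{aligned}
W   &= \frac{1}{n - k}\, S_f \\[4pt]
T^2 &= \frac{n_1 n_2}{n_1 + n_2}\,
       (\bar{y}_1 - \bar{y}_2)^{\mathsf{T}}\, W^{-1}\, (\bar{y}_1 - \bar{y}_2) \\[4pt]
F   &= \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - k)\, p}\; T^2,
       \qquad \text{df} = \bigl(p,\; n_1 + n_2 - p - 1\bigr)
\end{aligned}
```

In the textbook two-sample case, where W is pooled from only the two groups being compared, the denominator term n_1 + n_2 - k becomes n_1 + n_2 - 2.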

9
Example MANOVA: Bumpus Data
Hotelling’s T² is basically a multivariate t-test between any two multivariate means.
Note: Hotelling’s T² is like a t statistic that measures differences in means, which in multivariate practice is expressed as a squared distance. The square root, d, is the simplest way to express the difference in means.

10
Example MANOVA: Hotelling T² tests
> # Hotelling's T2 comparisons
>
> m.means<-predict(lm.group, data.frame(group=levels(group))) # multivariate means
> rownames(m.means)<-levels(group) # label means
> m.means
[numeric output not preserved; rows: female.FALSE, female.TRUE, male.FALSE, male.TRUE; columns: AE, BHL, FL, TTL, SW, SKL]
> mean.diffs<-dist(m.means) # all elements are differences in means
> means.n<-by(AE,group,length) # the sample sizes for each mean
>
> g<-nrow(m.means) # number of group means
>
> # Create empty matrices for results
> HT2.matrix<-F.matrix<-P.matrix<-matrix(0,g,g,dimnames=list(c(levels(group)),c(levels(group))))
> W<-1/(lm.group$df.residual)*Sf # error covariance matrix
> for(i in 1:g){
+   y1<-m.means[i,];n1<-means.n[i] # by rows
+   for(j in 1:g){
+     y2<-m.means[j,];n2<-means.n[j] # by rows also
+     T2<-n1*n2/(n1+n2)*t(y1-y2)%*%solve(W)%*%(y1-y2)
+     # named F.val rather than F so the FALSE shorthand is not overwritten
+     F.val<-(n1+n2-ncol(Y)-1)/((n1+n2-lm.group$rank)*ncol(Y))*T2
+     P<-1-pf(F.val,ncol(Y),(n1+n2-ncol(Y)-1))
+     # fill in result matrices
+     HT2.matrix[i,j]<-T2;F.matrix[i,j]<-F.val;P.matrix[i,j]<-P
+   }
+ }
>

11
Example MANOVA: Hotelling T² tests
> mean.diffs
[numeric output not preserved; lower-triangle distances among female.FALSE, female.TRUE, male.FALSE, male.TRUE]
> HT2.matrix
[numeric output not preserved; 4 x 4 matrix with rows and columns female.FALSE, female.TRUE, male.FALSE, male.TRUE]
> F.matrix
[numeric output not preserved; same layout]
> P.matrix
[numeric output not preserved; same layout]
Note that females and males differ, but there are no differences in morphology between surviving and non-surviving birds.

12
Example MANOVA: Randomization test
> # Randomization Test approach
>
> P<-matrix(1,g,g) # start at 1 so the observed case counts as one permutation
> permute<-999
> md.obs<-as.matrix(mean.diffs) # make sure R knows this is a matrix
> for(i in 1:permute){
+   Y.r<-Y[sample(nrow(Y)),] # shuffle rows, not all values
+   lm.group.r<-lm(Y.r~group)
+   m.means.r<-predict(lm.group.r, data.frame(group=levels(group)))
+   mean.diffs.r<-as.matrix(dist(m.means.r)) # random mean differences
+   P<-ifelse(mean.diffs.r>=md.obs,P+1,P+0) # logical comparisons
+ }
> dimnames(P)<-dimnames(md.obs)
> P.values<-P/(permute+1) # P-values
>
> mean.diffs
[numeric output not preserved; lower-triangle distances among female.FALSE, female.TRUE, male.FALSE, male.TRUE]
> P.values
[numeric output not preserved; 4 x 4 matrix with rows and columns female.FALSE, female.TRUE, male.FALSE, male.TRUE]
Note that females and males differ, but there are no differences in morphology between surviving and non-surviving birds.

13
Example MANOVA: Bumpus Data
Here is one way to present results (could do it the same way with Hotelling’s T²).
Table 1. Morphological distances (below diagonal) and corresponding P-values (above diagonal), based on 1,000 random permutations of shuffling subjects among the sex-survival groups. Bolded values represent sexual dimorphisms within survival type; italicized values are survival-type comparisons, within sex.
                        Female                      Male
                        non-survived   survived     non-survived   survived
Female   non-survived
         survived
Male     non-survived
         survived
[table values not preserved]

14
Example MANOVA: Bumpus Data
Discriminant analyses are useful, but often misused.
Rather than ask if groups are different, one asks (1) if subjects can be assigned to the correct group, and (2) which variables might be best for distinguishing group differences.
Also, a plot can be made to show group distinction (this is where misuse is often made, but not the only place).

15
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
Part 1: Classification
Without getting into too much detail, the classification part of DA measures the probability that any subject belongs to a certain group. This can be done several ways (and the way it is generally done is not the best).
The typical way is to use the same data used to estimate group means. The generalized (Mahalanobis) distance is then measured from every i-th subject to the j-th group mean. Based on these distances, the posterior classification probability of belonging to each group is determined (smaller distance = higher probability).
The prior classification probability is determined by group size (prior probability = n_j / Σn_j).
Comparison of prior and posterior probabilities can help elucidate whether groups are clustered in the multivariate data space.
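The distance-to-probability step can be sketched as follows, under stated assumptions: the built-in iris data stand in for the Bumpus measurements, all object names are illustrative, and the posterior is taken to be the standard normal-theory form (prior times exp(-D²/2), rescaled so each row sums to one). The last lines compare the hand calculation with MASS::lda on the same data.

```r
library(MASS)  # for lda()

# Stand-in data (assumption): iris replaces the Bumpus measurements and groups
Y.demo <- as.matrix(iris[, 1:4])
grp <- iris$Species
n <- nrow(Y.demo); g <- nlevels(grp)

# Group means (g x p) and pooled within-group covariance matrix W
means <- apply(Y.demo, 2, function(v) tapply(v, grp, mean))
resid.demo <- Y.demo - means[as.integer(grp), ]
W <- crossprod(resid.demo) / (n - g)

# Squared Mahalanobis distance of every subject (rows) to every group mean (columns)
D2 <- sapply(seq_len(g), function(j) mahalanobis(Y.demo, means[j, ], W))

# Priors from group sizes; posteriors from distances (smaller distance = higher probability)
prior <- as.vector(table(grp)) / n
num <- sweep(exp(-0.5 * D2), 2, prior, "*")
post <- num / rowSums(num)
colnames(post) <- levels(grp)

# Compare with lda() fitted to the same data
fit <- predict(lda(Species ~ ., data = iris))
max(abs(post - fit$posterior))            # largest disagreement with lda()
mean(levels(grp)[max.col(post)] == grp)   # proportion of subjects classified correctly
```

Note that every subject here is classified with group means it helped to estimate, which is the "typical but not best" approach the slide warns about.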

16
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
Part 2: Variable loading
Recall that the multivariate equivalent to the F value was the F matrix. A linear discriminant analysis (LDA) performs an eigenanalysis on this matrix. The eigenvectors are called canonical vectors. The variables that load highest on these vectors (or just the first vector) best characterize differences among groups.
Projecting data onto these vectors produces a scatterplot (canonical variate plot) that shows the best statistical discrimination of groups, if it exists. The plot does not show actual relationships of objects; rather, it shows statistical relationships. If groups are distinct, the scatter will indicate it.
Sometimes LDA is called a canonical variates analysis (CVA), which sounds much like principal components analysis.
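The eigenanalysis can be sketched as below. The equation for the F matrix did not survive on this slide, so the sketch assumes it is the among-group SSCP matrix premultiplied by the inverse of the error SSCP matrix, i.e. S_f⁻¹(S_r − S_f). The iris data again stand in for the Bumpus data, and all object names are illustrative.

```r
library(MASS)  # for lda()

# Stand-in data (assumption): iris replaces the Bumpus measurements and groups
Y.demo <- as.matrix(iris[, 1:4]); grp <- iris$Species
g <- nlevels(grp)

# SSCP matrices: Sf = error of the full model, Sr = error of the intercept-only model
Sf <- crossprod(resid(lm(Y.demo ~ grp)))
Sr <- crossprod(resid(lm(Y.demo ~ 1)))
B <- Sr - Sf  # among-group SSCP

# Eigenanalysis of Sf^-1 B; at most g - 1 eigenvalues are nonzero
eig <- eigen(solve(Sf) %*% B)
lambda <- Re(eig$values)[1:(g - 1)]
prop.eigen <- lambda / sum(lambda)
prop.eigen

# "Proportion of trace" as reported by lda() on the same data
fit <- lda(Species ~ ., data = iris)
prop.lda <- fit$svd^2 / sum(fit$svd^2)
prop.lda
```

Under that assumption the two sets of proportions agree, which is why the number of canonical vectors is limited to g − 1.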

17
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
These are really superficial descriptions of DA. A better treatment of the subject requires a multivariate stats course. However, be aware that some who work with multivariate data use such an analysis as a post-hoc method for dealing with MANOVA results.
> # LDA Example
> library(MASS)
>
> # reverse of linear model formula; the original call also passed cv=True,
> # which lda() silently ignores (leave-one-out cross-validation is CV=TRUE)
> lda.group<-lda(group~Y)
> lda.fit<-predict(lda.group) # calculates posterior probabilities and CVs
> fit.table<-table(group,lda.fit$class) # a summary table of classifications
> fit.table # rows are actual; columns are predicted
[numeric output not preserved; 4 x 4 table with rows and columns female.FALSE, female.TRUE, male.FALSE, male.TRUE]
>
> # classification success
> CS<-sum(diag(fit.table))/nrow(Y)*100
> CS # very low classification
[value not preserved]
>
> # However, looking at the table reveals that males tend to be classified as males,
> # which suggests males might be more distinct.
> # Even if there were a high classification success rate,
> # one has to use caution, because the same values were used to calculate
> # group means (i.e., inherently better rate to expect than equal prior probability)
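One common way to address the caution about reusing the same values is leave-one-out cross-validation, which MASS::lda provides through its CV argument. This is a minimal sketch of that swapped-in technique, not the slide's own analysis; the iris data stand in for the Bumpus data and the object names are illustrative.

```r
library(MASS)  # for lda()

# Resubstitution: classify the same subjects used to estimate the group means
fit <- lda(Species ~ ., data = iris)
resub <- table(actual = iris$Species, predicted = predict(fit)$class)

# Leave-one-out: each subject is classified from a fit that excludes it.
# With CV = TRUE, lda() returns the classes and posteriors directly.
fit.cv <- lda(Species ~ ., data = iris, CV = TRUE)
loo <- table(actual = iris$Species, predicted = fit.cv$class)

# Classification success (percent) under each approach
CS.resub <- sum(diag(resub)) / nrow(iris) * 100
CS.loo <- sum(diag(loo)) / nrow(iris) * 100
c(resubstitution = CS.resub, leave.one.out = CS.loo)
```

With CV = TRUE the result is a plain list rather than an lda object, so predict() cannot be called on it and no canonical variate scores are returned.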

18
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
How does the classification work?
> lda.group$prior # what is expected by chance
[numeric output not preserved; one value each for female.FALSE, female.TRUE, male.FALSE, male.TRUE]
> lda.fit$posterior[1:10,] # the first 10 posterior probabilities
[numeric output not preserved; 10 rows, columns female.FALSE, female.TRUE, male.FALSE, male.TRUE]

20
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
How does the classification work?
> # subtract each group's prior from its own column; plain matrix-minus-vector
> # subtraction would recycle the priors down the rows instead
> sweep(lda.fit$posterior,2,lda.group$prior)[1:10,] # look for highest positive difference in each row
[numeric output not preserved; 10 rows, columns female.FALSE, female.TRUE, male.FALSE, male.TRUE]
> lda.fit$class[1:10]
 [1] male.FALSE male.FALSE male.FALSE male.TRUE female.TRUE male.FALSE male.FALSE
 [8] male.FALSE male.TRUE male.TRUE
Levels: female.FALSE female.TRUE male.FALSE male.TRUE
> # Compare to actual
> group[1:10]
 [1] male.TRUE male.FALSE male.FALSE male.TRUE male.TRUE male.FALSE male.TRUE male.FALSE
 [9] male.TRUE male.FALSE
Levels: female.FALSE female.TRUE male.FALSE male.TRUE

21
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
> # Variable loadings
> lda.group
Call:
lda(group ~ Y, cv = True)
Prior probabilities of groups:
[numeric output not preserved; female.FALSE, female.TRUE, male.FALSE, male.TRUE]
Group means:
[numeric output not preserved; rows female.FALSE, female.TRUE, male.FALSE, male.TRUE; columns YAE, YBHL, YFL, YTTL, YSW, YSKL]
Coefficients of linear discriminants:
[numeric output not preserved; rows YAE, YBHL, YFL, YTTL, YSW, YSKL; columns LD1, LD2, LD3]
Proportion of trace:
[numeric output not preserved; LD1, LD2, LD3]
>
> # note: the number of CVs = g-1, where g is the number of group levels
> # proportion of trace is the amount of information explained
> # by each CV. This indicates how many dimensions are needed
> # to explain group differences
Most likely TTL, SW, and SKL are the variables that best describe group differences. SW and SKL differ in sign, meaning groups might have either long, slender skulls or short, wide skulls. Wide skulls positively covary with long legs in distinguishing groups.

22
Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
> # CV plot
> CV.scores<-lda.fit$x[,1:2]
>
> plot(CV.scores,asp=1,type='n')
> points(CV.scores[group=='female.FALSE',],col='red')
> points(CV.scores[group=='female.TRUE',],pch=21,bg='red')
> points(CV.scores[group=='male.FALSE',],pch=22,col='blue')
> points(CV.scores[group=='male.TRUE',],pch=22,bg='blue')
The reason for low classification is apparent now. This is a good example of how one can have sufficient statistical power but not reach very meaningful conclusions about the biology.
Remember, this plot is only an abstract statistical visual aid. Some people make the mistake of showing these plots as “data plots”, but the axes do not correspond to linear combinations of variables that express dispersion in the data; they correspond to linear combinations that best separate groups.

23
Just for reference:
[Figure: three CV1 vs CV2 plots illustrating classification success (discrimination ability)]
