BIOL 582 Lecture Set 22 One-Way MANOVA, Part II Post-hoc exercises Discriminant Analysis.



Example MANOVA: Bumpus Data
1. State null/alternative hypotheses
2. Define model (evaluate assumptions)
3. Calculate S_f for the error of the full model
4. Define the reduced "null" model (always contains just an intercept)
5. Calculate S_r for the error of the reduced model
6. Calculate a multivariate test statistic using S_f and/or S_r. (Note: one can use only S_f to get a test statistic if performing a randomization test.)
7. Evaluate the probability of the test statistic if the null hypothesis were true
   a. By converting the test statistic to an F stat, which approximately follows an F distribution
   b. By performing a randomization test to create an empirical probability distribution
8. Generate plots/tables
9. Maybe do a discriminant analysis
BIOL 582 Multivariate ANOVA
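
Steps 3 to 7 can be sketched in a few lines of R. This is only an illustration under stated assumptions: the Bumpus data are not reproduced here, so the built-in iris measurements stand in for Y and Species stands in for the grouping factor, and Wilks' lambda is chosen as the test statistic (one of several possible choices).

```r
# Stand-in data (assumption): iris replaces the Bumpus measurements
Y <- as.matrix(iris[, 1:4])
group <- iris$Species

lm.full <- lm(Y ~ group)   # full model
lm.red  <- lm(Y ~ 1)       # reduced "null" model: intercept only

# Error SSCP matrices of the full and reduced models
Sf <- crossprod(resid(lm.full))
Sr <- crossprod(resid(lm.red))

# Wilks' lambda = |Sf| / |Sr|; small values indicate a strong group effect
wilks.obs <- det(Sf) / det(Sr)

# Randomization test: shuffle whole rows of Y among groups.
# Sr does not change when rows are shuffled, so only Sf is recomputed.
set.seed(1)
permute <- 999
wilks.r <- numeric(permute)
for (i in seq_len(permute)) {
  Y.r <- Y[sample(nrow(Y)), ]
  wilks.r[i] <- det(crossprod(resid(lm(Y.r ~ group)))) / det(Sr)
}

# Empirical P-value; the observed value counts as one permutation
P.value <- (sum(wilks.r <= wilks.obs) + 1) / (permute + 1)
```

Because S_r is constant across permutations, ranking the random values of |S_f| alone would give the same P-value, which is the point of the note in step 6.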

Example MANOVA: Bumpus Data
8. Generate plots/tables
9. Maybe do a discriminant analysis
These two steps are linked because we are in a position where we might want to do multiple comparisons. Therefore, what we plot or how we present a table might be contingent upon what a multiple comparisons test reveals. Also, multiple comparisons can be done several ways.
Let's start by looking at a PC plot of the data. (Because we used the original variables in the MANOVA, we must use a covariance matrix.)

Example MANOVA: Bumpus Data
> Y.cov.pca<-princomp(Y,cor=FALSE) # PCA on the covariance matrix
> par(mfrow=c(1,3)) # three panels side by side
> # PC 2 vs PC 1
> plot(Y.cov.pca$scores[,1],Y.cov.pca$scores[,2],type='n',asp=1)
> points(Y.cov.pca$scores[group=="m.TRUE",1],Y.cov.pca$scores[group=="m.TRUE",2],pch=22,bg='blue')
> points(Y.cov.pca$scores[group=="m.FALSE",1],Y.cov.pca$scores[group=="m.FALSE",2],pch=22,col='blue')
> points(Y.cov.pca$scores[group=="f.TRUE",1],Y.cov.pca$scores[group=="f.TRUE",2],pch=21,bg='red')
> points(Y.cov.pca$scores[group=="f.FALSE",1],Y.cov.pca$scores[group=="f.FALSE",2],pch=21,col='red')
>
> # PC 3 vs PC 1
> plot(Y.cov.pca$scores[,1],Y.cov.pca$scores[,3],type='n',asp=1)
> points(Y.cov.pca$scores[group=="m.TRUE",1],Y.cov.pca$scores[group=="m.TRUE",3],pch=22,bg='blue')
> points(Y.cov.pca$scores[group=="m.FALSE",1],Y.cov.pca$scores[group=="m.FALSE",3],pch=22,col='blue')
> points(Y.cov.pca$scores[group=="f.TRUE",1],Y.cov.pca$scores[group=="f.TRUE",3],pch=21,bg='red')
> points(Y.cov.pca$scores[group=="f.FALSE",1],Y.cov.pca$scores[group=="f.FALSE",3],pch=21,col='red')
>
> # PC 3 vs PC 2
> plot(Y.cov.pca$scores[,2],Y.cov.pca$scores[,3],type='n',asp=1)
> points(Y.cov.pca$scores[group=="m.TRUE",2],Y.cov.pca$scores[group=="m.TRUE",3],pch=22,bg='blue')
> points(Y.cov.pca$scores[group=="m.FALSE",2],Y.cov.pca$scores[group=="m.FALSE",3],pch=22,col='blue')
> points(Y.cov.pca$scores[group=="f.TRUE",2],Y.cov.pca$scores[group=="f.TRUE",3],pch=21,bg='red')
> points(Y.cov.pca$scores[group=="f.FALSE",2],Y.cov.pca$scores[group=="f.FALSE",3],pch=21,col='red')

Example MANOVA: Bumpus Data
[Figure: PC 2 vs PC 1, PC 3 vs PC 1, and PC 3 vs PC 2 score plots]
> # open symbols = dead; filled = survived; blue = male; red = female

Example MANOVA: Bumpus Data
> # 3D scatterplot
> library(scatterplot3d)
> a<-scatterplot3d(Y.cov.pca$scores[,1],Y.cov.pca$scores[,2],Y.cov.pca$scores[,3],type='p',asp=1)
> a$points(Y.cov.pca$scores[group=="m.TRUE",1],Y.cov.pca$scores[group=="m.TRUE",2],Y.cov.pca$scores[group=="m.TRUE",3],pch=22,bg='blue')
> a$points(Y.cov.pca$scores[group=="m.FALSE",1],Y.cov.pca$scores[group=="m.FALSE",2],Y.cov.pca$scores[group=="m.FALSE",3],pch=22,col='blue')
> a$points(Y.cov.pca$scores[group=="f.TRUE",1],Y.cov.pca$scores[group=="f.TRUE",2],Y.cov.pca$scores[group=="f.TRUE",3],pch=21,bg='red')
> a$points(Y.cov.pca$scores[group=="f.FALSE",1],Y.cov.pca$scores[group=="f.FALSE",2],Y.cov.pca$scores[group=="f.FALSE",3],pch=21,col='red')

Example MANOVA: Bumpus Data
Have to go to R for this one….
> # Interactive 3D scatterplot
> library(rgl)
> col.ind<-as.numeric(group) # f.F = black, f.T = red, m.F = green, m.T = blue
> plot3d(Y.cov.pca$scores[,1],Y.cov.pca$scores[,2],Y.cov.pca$scores[,3],col=col.ind)
> # note that this distorts aspect ratio

Example MANOVA: Bumpus Data
So it appears that males and females tend to differ in morphology, but surviving and dead birds are a little hard to distinguish.
What options are there for multiple comparisons?
1. Hotelling's T² (inferential)
2. Randomization test
3. Discriminant analysis (exploratory but multi-faceted)
Hotelling's T² is basically a multivariate t-test between any two multivariate means:
$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)^{T}\,\mathbf{W}^{-1}\,(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$$
where W is the error (within-group) covariance matrix, found as
$$\mathbf{W} = \frac{1}{n - k}\,\mathbf{S}_f$$
with n - k the residual degrees of freedom of the full model. Hotelling's T² can be converted to an F value by
$$F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - k)\,p}\,T^2$$
where p is the number of variables and k is the rank of the full model; F is evaluated with p and n_1 + n_2 - p - 1 degrees of freedom. (These forms follow the R code used for the Hotelling T² tests in this lecture.)
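
A single Hotelling's T² comparison can be worked through on a small stand-in example. The sketch below assumes two iris species in place of two Bumpus groups (the data and the object names y1, y2, d, and F.stat are illustrative, not from the lecture); with only two groups the model rank is 2, so the F conversion reduces to the familiar two-sample form.

```r
# Stand-in data (assumption): two iris species replace two Bumpus groups
Y <- as.matrix(iris[1:100, 1:4])
group <- droplevels(iris$Species[1:100])

lm.group <- lm(Y ~ group)
Sf <- crossprod(resid(lm.group))    # error SSCP of the full model
W  <- Sf / lm.group$df.residual     # error (within-group) covariance matrix

# Group sizes and multivariate means
n1 <- sum(group == levels(group)[1])
n2 <- sum(group == levels(group)[2])
y1 <- colMeans(Y[group == levels(group)[1], ])
y2 <- colMeans(Y[group == levels(group)[2], ])
d  <- y1 - y2                       # vector of mean differences

# Hotelling's T2: a scaled squared (Mahalanobis-type) distance between means
T2 <- drop(n1 * n2 / (n1 + n2) * t(d) %*% solve(W) %*% d)

# Convert to F, following the form used in the lecture code
p <- ncol(Y)
F.stat <- (n1 + n2 - p - 1) / ((n1 + n2 - lm.group$rank) * p) * T2
P <- pf(F.stat, p, n1 + n2 - p - 1, lower.tail = FALSE)
```

The result is named F.stat rather than F so that R's built-in shorthand F (for FALSE) is not overwritten.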

Example MANOVA: Bumpus Data
Hotelling's T² is basically a multivariate t-test between any two multivariate means.
Note that Hotelling's T² is like a t stat that measures differences in means, which in multivariate practice is expressed as a squared distance. The square root, d, is the simplest way to express the difference in means.

Example MANOVA: Hotelling T² tests
> # Hotelling's T2 comparisons
>
> m.means<-predict(lm.group, data.frame(group=levels(group))) # multivariate means
> rownames(m.means)<-levels(group) # label means
> m.means
             AE BHL FL TTL SW SKL
female.FALSE
female.TRUE
male.FALSE
male.TRUE
[numeric values were not preserved in this transcript]
> mean.diffs<-dist(m.means) # all elements are differences in means
> means.n<-by(AE,group,length) # the sample sizes for each mean
>
> g<-nrow(m.means) # number of group means
>
> # Create empty matrices for results
> HT2.matrix<-F.matrix<-P.matrix<-matrix(0,g,g,dimnames=list(c(levels(group)),c(levels(group))))
> W<-1/(lm.group$df.residual)*Sf # error covariance matrix
> for(i in 1:g){
+ y1<-m.means[i,];n1<-means.n[i] # by rows
+ for(j in 1:g){
+ y2<-m.means[j,];n2<-means.n[j] # by rows also
+ T2<-n1*n2/(n1+n2)*t(y1-y2)%*%solve(W)%*%(y1-y2)
+ # named F.stat so the built-in F (shorthand for FALSE) is not overwritten
+ F.stat<-(n1+n2-ncol(Y)-1)/((n1+n2-lm.group$rank)*ncol(Y))*T2
+ P<-1-pf(F.stat,ncol(Y),(n1+n2-ncol(Y)-1))
+ # fill in result matrices
+ HT2.matrix[i,j]<-T2;F.matrix[i,j]<-F.stat;P.matrix[i,j]<-P
+ } # end loop over j
+ } # end loop over i
>

Example MANOVA: Hotelling T² tests
> mean.diffs
            female.FALSE female.TRUE male.FALSE
female.TRUE
male.FALSE
male.TRUE
> HT2.matrix
             female.FALSE female.TRUE male.FALSE male.TRUE
female.FALSE
female.TRUE
male.FALSE
male.TRUE
> F.matrix
             female.FALSE female.TRUE male.FALSE male.TRUE
female.FALSE
female.TRUE
male.FALSE
male.TRUE
> P.matrix
             female.FALSE female.TRUE male.FALSE male.TRUE
female.FALSE
female.TRUE
male.FALSE
male.TRUE
[numeric values were not preserved in this transcript]
Note that females and males differ, but there are no differences in morphology between surviving and non-surviving birds.

Example MANOVA: Randomization test
> # Randomization Test approach
>
> P<-matrix(1,g,g) # start at 1: the observed case counts as one permutation
> permute<-999
> md.obs<-as.matrix(mean.diffs) # make sure R knows this is a matrix
> for(i in 1:permute){
+ Y.r<-Y[sample(nrow(Y)),] # shuffle rows, not all values
+ lm.group.r<-lm(Y.r~group)
+ m.means.r<-predict(lm.group.r, data.frame(group=levels(group)))
+ mean.diffs.r<-as.matrix(dist(m.means.r)) # random mean differences
+ P<-ifelse(mean.diffs.r>=md.obs,P+1,P+0) # logical comparisons
+ }
> dimnames(P)<-dimnames(md.obs)
> P.values<-P/(permute+1) # P-values
>
> mean.diffs
            female.FALSE female.TRUE male.FALSE
female.TRUE
male.FALSE
male.TRUE
> P.values
             female.FALSE female.TRUE male.FALSE male.TRUE
female.FALSE
female.TRUE
male.FALSE
male.TRUE
[numeric values were not preserved in this transcript]
Note that females and males differ, but there are no differences in morphology between surviving and non-surviving birds.

Example MANOVA: Bumpus Data
Here is one way to present results (could do it the same way with Hotelling's T²).
Table 1. Morphological distances (below diagonal) and corresponding P-values (above diagonal), based on 1,000 random permutations of shuffling subjects among the sex-survival groups. Bolded values represent sexual dimorphisms within survival type; italicized values are survival-type comparisons, within sex.
                       Female                    Male
                       non-survived  survived    non-survived  survived
Female  non-survived
        survived
Male    non-survived
        survived
[table values were not preserved in this transcript]

Example MANOVA: Bumpus Data
Discriminant analyses are useful, but often misused.
Rather than ask if groups are different, one asks (1) if subjects can be assigned to the correct group, and (2) which variables might be best for distinguishing group differences.
Also, a plot can be made to show group distinction (this is where misuse is often made… but not the only place).

Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
Part 1: Classification
Without getting into too much detail, the classification part of DA measures the probability that any subject belongs to a certain group. This can be done several ways (and the way it is generally done is not the best).
The typical way is to use the same data used to estimate group means. Then the generalized Mahalanobis distance is measured from every i-th subject to the j-th group mean. Based on these distances, the posterior classification probability of belonging to each group is determined (smaller distance = higher probability).
The prior classification probability is determined by group size (prior probability = n_j / Σn_j).
Comparison of prior and posterior probabilities can help elucidate whether groups are clustered in the multivariate data space.
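
The classification logic can be sketched by hand. This is a minimal illustration, assuming iris as stand-in data and assuming the usual normal-theory rule in which the posterior probability is proportional to prior × exp(-D²/2), where D² is the squared Mahalanobis distance to a group mean using the pooled within-group covariance matrix. The object names (n.j, D2, num) are illustrative only.

```r
library(MASS)  # only used to compare against lda()

# Stand-in data (assumption): iris replaces the Bumpus measurements
Y <- as.matrix(iris[, 1:4])
group <- iris$Species

# Prior probabilities from group sizes: n_j / sum(n_j)
n.j <- table(group)
prior <- as.vector(n.j / sum(n.j))

# Pooled within-group (error) covariance matrix and group means
W <- crossprod(resid(lm(Y ~ group))) / (nrow(Y) - nlevels(group))
means <- apply(Y, 2, function(v) tapply(v, group, mean))  # groups x variables

# Squared Mahalanobis distance of every subject to every group mean
D2 <- sapply(levels(group), function(j) mahalanobis(Y, means[j, ], W))

# Posterior probabilities: smaller distance = higher probability
num <- sweep(exp(-0.5 * D2), 2, prior, "*")
posterior <- num / rowSums(num)

# Each subject is assigned to the group with the highest posterior
class.hat <- factor(levels(group)[max.col(posterior)], levels = levels(group))

# For comparison: posterior probabilities from MASS::lda on the same data
lda.post <- predict(lda(Y, group), Y)$posterior
```

Because the same subjects are used to estimate the means and to be classified, these posteriors are optimistic, which is the caution raised in the text.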

Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
Part 2: Variable loading
Recall that the multivariate equivalent to the F value was the F matrix, formed from the hypothesis SSCP matrix (S_r - S_f) and the inverse of the error SSCP matrix S_f.
A linear discriminant analysis (LDA) performs an eigenanalysis on this matrix. The eigenvectors are called canonical vectors. The variables that load highest on these vectors (or just the first vector) best characterize differences among groups.
Projecting data onto these vectors produces a scatterplot (canonical variate plot) that shows the best statistical discrimination of groups, if it exists. The plot does not show actual relationships of objects; rather, it shows statistical relationships. If groups are distinct, the scatter will indicate it.
Sometimes LDA is called a canonical variates analysis (CVA), which sounds much like principal components analysis.
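
The eigenanalysis can be sketched directly from the two error SSCP matrices. The example assumes iris as stand-in data, and it uses the unscaled product of the inverse error SSCP and the hypothesis SSCP; degrees-of-freedom scaling changes the eigenvalues by a constant but not the eigenvectors or the proportions. Object names are illustrative.

```r
library(MASS)  # only used to compare against lda()

# Stand-in data (assumption): iris replaces the Bumpus measurements
Y <- as.matrix(iris[, 1:4])
group <- iris$Species

Sf <- crossprod(resid(lm(Y ~ group)))  # error SSCP, full model
Sr <- crossprod(resid(lm(Y ~ 1)))      # error SSCP, reduced model
B  <- Sr - Sf                          # hypothesis (among-group) SSCP

# Eigenanalysis of Sf^-1 B; the matrix is not symmetric, so keep real parts
e <- eigen(solve(Sf) %*% B)
n.cv <- nlevels(group) - 1             # number of canonical vectors = g - 1
vals <- Re(e$values[1:n.cv])
vecs <- Re(e$vectors[, 1:n.cv])
rownames(vecs) <- colnames(Y)          # loadings of each variable

# Proportion of trace: share of among-group separation on each vector
prop.trace <- vals / sum(vals)

# Project the centered data onto the canonical vectors (CV scores)
CV.scores <- scale(Y, scale = FALSE) %*% vecs

# For comparison: proportion of trace reported by MASS::lda
fit <- lda(Y, group)
lda.prop <- fit$svd^2 / sum(fit$svd^2)
```

The canonical vectors from `eigen` differ from the `lda` coefficients by a scaling constant (and possibly sign), so loadings should be compared by their relative sizes within a vector.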

Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
These are really superficial descriptions of DA. A better treatment of the subject requires a multivariate stats course. However, be aware that some who work with multivariate data use such an analysis as a post-hoc method for dealing with MANOVA results.
> # LDA Example
> library(MASS)
>
> # note: 'cv=True' is not an lda() argument (the option is spelled CV=TRUE),
> # so it appears to be silently ignored and no cross-validation is done here
> lda.group<-lda(group~Y,cv=True) # reverse of linear model formula
> lda.fit<-predict(lda.group,group) # calculates post-probabilities and CVs
> fit.table<-table(group,lda.fit$class) # a summary table of classifications
> fit.table # rows are actual; columns are predicted
group          female.FALSE female.TRUE male.FALSE male.TRUE
  female.FALSE
  female.TRUE
  male.FALSE
  male.TRUE
[numeric values were not preserved in this transcript]
>
> # classification success
> CS<-sum(diag(fit.table))/nrow(Y)*100
> CS # very low classification....
[1] [value not preserved]
>
> # However, looking at the table reveals that males tend to be classified as males
> # which suggests males might be more distinct
> # Even if there was a high classification success rate
> # one has to use caution, because the same values were used to calculate
> # group means (i.e., inherently better rate to expect than equal prior probability)
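
One way to address the caution about classifying subjects with the same data used to estimate the group means is leave-one-out cross-validation, which `MASS::lda` provides through its `CV=TRUE` argument. The sketch below assumes iris as stand-in data; the object names (resub, loo, CS.resub, CS.loo) are illustrative.

```r
library(MASS)

# Stand-in data (assumption): iris replaces the Bumpus measurements
Y <- as.matrix(iris[, 1:4])
group <- iris$Species

# Resubstitution: classify the same subjects used to fit the model
fit <- lda(Y, group)
resub <- predict(fit, Y)$class
CS.resub <- mean(resub == group) * 100

# Leave-one-out cross-validation: each subject is classified from a model
# fit without it. With CV=TRUE, lda() returns a list holding $class and
# $posterior instead of a fitted model, so predict() is not used here.
loo <- lda(Y, group, CV = TRUE)
CS.loo <- mean(loo$class == group) * 100

# Summary table of cross-validated classifications
loo.table <- table(actual = group, predicted = loo$class)
```

The cross-validated rate is typically equal to or lower than the resubstitution rate, and it is the fairer number to compare against the prior probabilities.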

Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
How does the classification work?
> lda.group$prior # what is expected by chance
female.FALSE female.TRUE male.FALSE male.TRUE
> lda.fit$posterior[1:10,] # the first 10 posterior probabilities
   female.FALSE female.TRUE male.FALSE male.TRUE
[numeric values were not preserved in this transcript]


Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
How does the classification work?
> # subtract each group's prior from its own column; a plain matrix-minus-vector
> # subtraction would recycle the priors down the rows and misalign them
> sweep(lda.fit$posterior,2,lda.group$prior)[1:10,] # look for highest positive difference in each row
   female.FALSE female.TRUE male.FALSE male.TRUE
[numeric values were not preserved in this transcript]
> lda.fit$class[1:10]
 [1] male.FALSE male.FALSE male.FALSE male.TRUE female.TRUE male.FALSE male.FALSE
 [8] male.FALSE male.TRUE male.TRUE
Levels: female.FALSE female.TRUE male.FALSE male.TRUE
Compare to actual:
> group[1:10]
 [1] male.TRUE male.FALSE male.FALSE male.TRUE male.TRUE male.FALSE male.TRUE male.FALSE
 [9] male.TRUE male.FALSE
Levels: female.FALSE female.TRUE male.FALSE male.TRUE

Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
> # Variable loadings
> lda.group
Call:
lda(group ~ Y, cv = True)
Prior probabilities of groups:
female.FALSE female.TRUE male.FALSE male.TRUE
Group means:
             YAE YBHL YFL YTTL YSW YSKL
female.FALSE
female.TRUE
male.FALSE
male.TRUE
Coefficients of linear discriminants:
     LD1 LD2 LD3
YAE
YBHL
YFL
YTTL
YSW
YSKL
Proportion of trace:
LD1 LD2 LD3
[numeric values were not preserved in this transcript]
>
> # note: the number of CVs = g-1, where g is the number of group levels
> # proportion of trace is the amount of information explained
> # by each CV. This indicates how many dimensions are needed
> # to explain group differences
Most likely, TTL, SW, and SKL are the variables that best describe group differences. SW and SKL differ in sign, meaning groups might have either long-slender skulls or short-wide skulls. Wide skulls positively covary with long legs in distinguishing groups.

Example MANOVA: Bumpus Data / Discriminant Analysis (DA)
> # CV plot
> CV.scores<-lda.fit$x[,1:2]
>
> plot(CV.scores,asp=1,type='n')
> points(CV.scores[group=='female.FALSE',],col='red')
> points(CV.scores[group=='female.TRUE',],pch=21,bg='red')
> points(CV.scores[group=='male.FALSE',],pch=22,col='blue')
> points(CV.scores[group=='male.TRUE',],pch=22,bg='blue')
The reason for low classification is apparent now. This is a good example of how one can have sufficient statistical power but not have very meaningful conclusions about the biology.
Remember, this plot is only an abstract statistical visual aid. Some people make the mistake of showing these plots as "data plots". But the axes do not correspond to linear combinations of variables that express dispersion in the data; they correspond to linear combinations that best separate groups.

Just for reference…
[Figure: three CV1 vs CV2 plots illustrating classification success (discrimination ability)]