Multivariate Statistics with Grouped Units Hal Whitehead BIOL4062/5062
Multivariate Statistics with Grouped Units: Summary –Assumption –Multivariate t-test –Discriminant function analysis –Multivariate Analysis of Variance (MANOVA) –Canonical Variate Analysis
Multivariate Statistics with Grouped Units Data matrix is divided into groups of units: –Habitat types (community ecology) –Gender (animal behaviour) –Species (morphometrics) [Figure: data matrix with axes labelled Variables and Units]
Multivariate Statistics with Grouped Units Assume: Homogeneity of Covariance Matrices (each group considered separately has the same covariance matrix)
Multivariate t-test Is there a significant difference between the multivariate means of two populations? Tested using Hotelling’s $T^2$ [Figure: mean vectors $\bar{X}_1$ and $\bar{X}_2$ of the two groups]
Multivariate t-test Hotelling’s $T^2$:
$T^2 = (\bar{X}_1 - \bar{X}_2)'\, S^{-1}\, (\bar{X}_1 - \bar{X}_2) \cdot \dfrac{n_1 n_2}{n_1 + n_2}$
–$S$ is the pooled within-group covariance matrix
–$\bar{X}_1$ is the vector of means for the first group
–$\bar{X}_2$ is the vector of means for the second group
–$n_1$ is the number of units in the first group
–$n_2$ is the number of units in the second group
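The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code: the function name `hotelling_t2` and the data are hypothetical, and S is taken to be the pooled within-group covariance matrix. The conversion of T² to an F statistic with k and n1 + n2 - k - 1 degrees of freedom is the standard one and is added here as an assumption, since the slides do not state how the significance was obtained.

```python
import numpy as np


def hotelling_t2(g1, g2):
    """Hotelling's T^2 and its F transform for two groups of units.

    g1, g2: arrays of shape (units, variables).
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n1, k = g1.shape
    n2 = g2.shape[0]
    diff = g1.mean(axis=0) - g2.mean(axis=0)  # X1 - X2

    # Pooled within-group covariance matrix S (assumes homogeneity).
    s1 = np.atleast_2d(np.cov(g1, rowvar=False))
    s2 = np.atleast_2d(np.cov(g2, rowvar=False))
    S = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    # T^2 = (X1 - X2)' S^-1 (X1 - X2) * n1 n2 / (n1 + n2)
    t2 = float(diff @ np.linalg.solve(S, diff)) * n1 * n2 / (n1 + n2)

    # Under the null hypothesis this multiple of T^2 follows
    # an F distribution with (k, n1 + n2 - k - 1) degrees of freedom.
    f = t2 * (n1 + n2 - k - 1) / ((n1 + n2 - 2) * k)
    return t2, f


# Hypothetical data: 2 variables measured at 5 "poor" and 4 "good" sites.
poor = np.array([[2.0, 0.10], [2.1, 0.12], [1.9, 0.11], [2.2, 0.13], [2.0, 0.12]])
good = np.array([[2.4, 0.15], [2.6, 0.14], [2.5, 0.17], [2.3, 0.16]])

t2, f = hotelling_t2(poor, good)
print(f"T2 = {t2:.2f}, F = {f:.2f}")
```

With a single variable the same function returns the square of the ordinary pooled two-sample t statistic, which is a convenient sanity check.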
Why do a Multivariate Test rather than a Series of Univariate Tests? –Significant differences may only be apparent in multivariate space –Reduces Type I errors (one test rather than many) [Figure: mean vectors $\bar{X}_1$ and $\bar{X}_2$]
Discriminant Function Analysis Quantifies difference between two groups of units Purposes: –How do we express the difference between two groups of units? –Which variables are important in quantifying this difference? –How much overlap is there between the two groups of units? –Classification of new unit into one or the other of the two groups.
Discriminant Function Analysis The discriminant function best expresses the difference between two groups. Its coefficients are the elements of
$\mathbf{a} = S^{-1}(\bar{X}_1 - \bar{X}_2)$
–$S$ is the covariance matrix
–$\bar{X}_1$ is the vector of means for the first group
–$\bar{X}_2$ is the vector of means for the second group
so that
$D = a_1 x_1 + a_2 x_2 + \dots + a_k x_k$
Discriminant Function Analysis
$D = a_1 x_1 + a_2 x_2 + \dots + a_k x_k$
Stepwise removal of variables possible:
$D = a_2 x_2 + \dots + a_{k-4} x_{k-4}$
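As a rough sketch of how the coefficients might be computed, the snippet below takes the coefficient vector as S⁻¹(X̄1 − X̄2), with S the pooled within-group covariance matrix, and evaluates D for every unit. The function name `discriminant_coefficients` and the data are hypothetical, and the code assumes at least two variables.

```python
import numpy as np


def discriminant_coefficients(g1, g2):
    """Coefficients a = S^-1 (X1 - X2) of the two-group discriminant function."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n1, n2 = len(g1), len(g2)
    # Pooled within-group covariance matrix (two or more variables assumed).
    S = ((n1 - 1) * np.cov(g1, rowvar=False)
         + (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)
    return np.linalg.solve(S, g1.mean(axis=0) - g2.mean(axis=0))


# Hypothetical data: 2 variables, 4 units per group.
group1 = np.array([[6.0, 7.0], [7.0, 6.0], [8.0, 8.0], [7.0, 9.0]])
group2 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 4.0]])

a = discriminant_coefficients(group1, group2)

# D = a1*x1 + a2*x2 for every unit; overlap between the two sets of
# scores indicates how well the groups can be separated.
d1 = group1 @ a
d2 = group2 @ a
print("coefficients:", a)
print("group 1 scores:", d1)
print("group 2 scores:", d2)
```

With this sign convention group 1 always has the larger mean score, because the difference between the mean scores equals (X̄1 − X̄2)' S⁻¹ (X̄1 − X̄2), which cannot be negative.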
Multivariate T-test and Discriminant Function Nutrients in foliage of maple trees (1) Units: 11 sites (6 poor; 5 good) Variables: Nitrogen, Phosphorus, Potassium Mean vectors: X(p) = 0.10 … X(g) = … Within group covariance matrix: S = 1/…
Multivariate T-test and Discriminant Function Nutrients in foliage of maple trees (2) T² = … (P<0.01) Discriminant Function: D = -1.99N … P … K
Multivariate T-test and Discriminant Function Analysis of forest health using aerial photography (1) Units: 22 trees (11 healthy; 11 diseased) Variables: red, green, blue image densities Mean vectors: X(d) = 1.16 … X(h) = … Within group covariance matrix: S = 1/…
Multivariate T-test and Discriminant Function Analysis of forest health using aerial photography (2) T² = … (P<0.01) Discriminant Function: D = 4.24R … G … B
Classification of new individual
Use discriminant function ($D$):
–allocate $i$ to group 1 if $D(i) < k$
–allocate $i$ to group 2 if $D(i) > k$
Use Mahalanobis distances ($D_M$):
–allocate $i$ to group 1 if $D_M(\bar{X}_1, i) < D_M(\bar{X}_2, i)$
–allocate $i$ to group 2 if $D_M(\bar{X}_1, i) > D_M(\bar{X}_2, i)$
where $D_M(\bar{X}_1, i)$ is the Mahalanobis distance between $i$ and the mean vector of group 1 {equivalent to discriminant function approach with k=0}
Other approaches if data not normal, covariance matrices not homogeneous, ...
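The two allocation rules can be compared directly in a short sketch. The function names and data below are hypothetical. The discriminant function is centred on the midpoint between the two group means so that the cut-off is k = 0, which is one way to read the equivalence stated above. With coefficients S⁻¹(X̄1 − X̄2), group 1 has the larger scores, so the direction of the inequality depends on the sign convention chosen for D.

```python
import numpy as np


def classify_mahalanobis(x, mean1, mean2, S_inv):
    """Allocate x to the group whose mean is nearer in Mahalanobis distance."""
    d1 = (x - mean1) @ S_inv @ (x - mean1)  # squared distance to group 1
    d2 = (x - mean2) @ S_inv @ (x - mean2)  # squared distance to group 2
    return 1 if d1 < d2 else 2


def classify_discriminant(x, mean1, mean2, S_inv):
    """Allocate x using the discriminant function with cut-off k = 0."""
    a = S_inv @ (mean1 - mean2)
    d = a @ (x - (mean1 + mean2) / 2)  # centred on the midpoint of the means
    return 1 if d > 0 else 2


# Hypothetical training data: 2 variables, 4 units per group.
g1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 4.0]])
g2 = np.array([[6.0, 7.0], [7.0, 6.0], [8.0, 8.0], [7.0, 9.0]])
n1, n2 = len(g1), len(g2)
mean1, mean2 = g1.mean(axis=0), g2.mean(axis=0)
S = ((n1 - 1) * np.cov(g1, rowvar=False)
     + (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)
S_inv = np.linalg.inv(S)

# Hypothetical new units to allocate.
new_units = np.array([[3.0, 4.0], [6.0, 6.0], [4.0, 5.0], [5.0, 4.9]])
by_distance = [classify_mahalanobis(x, mean1, mean2, S_inv) for x in new_units]
by_function = [classify_discriminant(x, mean1, mean2, S_inv) for x in new_units]
print(by_distance, by_function)
```

The two rules agree because the difference between the two squared distances is exactly -2 times the centred discriminant score.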
More than one Group: Multivariate Analysis of Variance (MANOVA) Are there significant differences between the means of several groups of points in multivariate space?
Wilks’ $\Lambda = \dfrac{|\text{Within Gps Covariance Matrix}|}{|\text{Total Covariance Matrix}|}$
$|W|$ is the determinant of matrix $W$
$0$ {maximum difference} $< \Lambda < 1$ {no difference}
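A minimal sketch of the Wilks' Λ calculation follows. It uses the within-group and total sums-of-squares-and-cross-products (SSCP) matrices, which is the usual textbook form of the ratio and is an assumption about what the slide intends. The function names and data are hypothetical.

```python
import numpy as np


def sscp(X):
    """Sums of squares and cross-products about the column means."""
    centred = X - X.mean(axis=0)
    return centred.T @ centred


def wilks_lambda(groups):
    """Wilks' Lambda = |W| / |T| for a list of (units, variables) arrays."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    W = sum(sscp(g) for g in groups)   # within groups, pooled over groups
    T = sscp(np.vstack(groups))        # total, ignoring group membership
    return np.linalg.det(W) / np.linalg.det(T)


# Hypothetical data: 3 groups, 2 variables, 4 units per group.
base = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 4.0]])
separated = [base, base + [5.0, 5.0], base + [10.0, 0.0]]
identical = [base, base, base]

lam_separated = wilks_lambda(separated)   # close to 0: means far apart
lam_identical = wilks_lambda(identical)   # 1: group means coincide
print(lam_separated, lam_identical)
```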
More than one Group: Multivariate Analysis of Variance (MANOVA) Are there significant differences between the means of several groups of points in multivariate space? If no difference between groups, then:
$-\left[n - 1 - \tfrac{1}{2}(k + m)\right] \ln \Lambda$ is approximately $\chi^2$ with $k(m-1)$ degrees of freedom
–$n$: no. of units
–$k$: no. of variables
–$m$: no. of groups
Other possible MANOVA statistics (e.g. Pillai’s trace, Hotelling-Lawley trace, Roy’s largest root)
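The chi-square approximation is simple enough to evaluate by hand, but a short sketch makes the bookkeeping explicit. It uses Bartlett's approximation in its usual form, -[n - 1 - ½(k + m)] ln Λ with k(m - 1) degrees of freedom. The function name is hypothetical. The values n = 65, k = 3 and m = 4 match the dimensions of the sperm whale example later in these slides, but the Λ value is a placeholder, not the value from the slides.

```python
import math


def bartlett_chi2(wilks_lambda, n, k, m):
    """Bartlett's chi-square approximation for Wilks' Lambda.

    n: number of units, k: number of variables, m: number of groups.
    Returns (statistic, degrees_of_freedom).
    """
    statistic = -(n - 1 - 0.5 * (k + m)) * math.log(wilks_lambda)
    return statistic, k * (m - 1)


# Placeholder Lambda; n, k, m as in a 65-unit, 3-variable, 4-group design.
stat, df = bartlett_chi2(0.7, n=65, k=3, m=4)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

The statistic is then compared with the upper tail of the chi-square distribution on `df` degrees of freedom, for example from tables or a statistics library.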
Canonical Variate Analysis Generalization of discriminant function analysis for more than two groups m groups, each with homogeneous covariance matrix
Canonical Variate Analysis 1st canonical axis inclined in direction of greatest variability between means of m groups of samples; 2nd canonical axis in direction of next greatest variability, etc. (Axes not necessarily orthogonal) [Figure: 1st and 2nd canonical axes]
Canonical Variate Analysis Used to: –Disclose relationships between groups –How well, and by what functions, can groups be discriminated? –How different variables contribute to the discrimination of groups?
Canonical Variate Analysis Canonical variates are of form:
$y_1 = a_{11} x_1 + a_{12} x_2 + \dots + a_{1k} x_k$
$y_2 = a_{21} x_1 + a_{22} x_2 + \dots + a_{2k} x_k$
...
$y_{m-1} = a_{m-1,1} x_1 + a_{m-1,2} x_2 + \dots + a_{m-1,k} x_k$
Number of canonical variates: number of groups - 1 ($m - 1$), or the number of variables $k$ if that is smaller
Tests of significance for each canonical variate
Canonical Variate Analysis
–$T$: total covariance matrix
–$W$: within-group covariance matrix
–$B$: between-group covariance matrix, $B = T - W$
Eigenvectors of $W^{-1}B$ are canonical variate coefficients: $a_{11} x_1 + a_{12} x_2 + \dots + a_{1k} x_k$ ...
Corresponding eigenvalues of $W^{-1}B$ are: $\dfrac{\text{Between Groups Sum of Squares}}{\text{Within Groups Sum of Squares}}$
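The eigen-decomposition can be sketched as follows. The code works with sums-of-squares-and-cross-products matrices so that B = T - W holds exactly, which is an assumption about how the matrices are scaled. The function names and data are hypothetical, and the eigenvectors are left in NumPy's default unit-length scaling rather than any particular statistical standardisation.

```python
import numpy as np


def sscp(X):
    """Sums of squares and cross-products about the column means."""
    centred = X - X.mean(axis=0)
    return centred.T @ centred


def canonical_variates(groups):
    """Eigenvalues and coefficient vectors of W^-1 B, largest first."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.vstack(groups)
    W = sum(sscp(g) for g in groups)   # within groups
    T = sscp(pooled)                   # total
    B = T - W                          # between groups
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    evals, evecs = evals.real, evecs.real  # eigenvalues are real in theory
    order = np.argsort(evals)[::-1]
    n_cv = min(pooled.shape[1], len(groups) - 1)
    return evals[order][:n_cv], evecs[:, order][:, :n_cv], W, B


# Hypothetical data: 3 groups, 2 variables, 4 units per group.
base = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 4.0]])
groups = [base, base + [5.0, 5.0], base + [10.0, 0.0]]

evals, coefs, W, B = canonical_variates(groups)
scores = np.vstack(groups) @ coefs  # canonical variate scores y1, y2 per unit
print("eigenvalues:", evals)
print("coefficients (one column per canonical variate):")
print(coefs)
```

For each column `a` of `coefs`, the ratio `(a @ B @ a) / (a @ W @ a)` reproduces the matching eigenvalue, which is the between-groups to within-groups sum of squares of that canonical variate.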
Example: Sperm Whale Movements Variables: –movements in 3hr, 12hr, 24hr (MOVE3, MOVE12, MOVE24) Units: –65 days following sperm whales Groups: –4 clans [Figure: timeline from 00:00 to 24:00 showing MOVE3, MOVE12 and MOVE24] MANOVA: Wilks’ Λ = … (P=0.016)
Example: Sperm whale movements Canonical discriminant functions 1, 2 and 3 [Table: rows Constant, MOVE3, MOVE12, MOVE24 and Eigenvalues; values not preserved] Significance: P 0.25 P>0.9
Example: Sperm whale movements Mahalanobis classification functions [Table: columns +1, 4+, Reg., Short; rows CONSTANT, MOVE3, MOVE12, MOVE24; values not preserved]
Example: Sperm whale movements Classification matrix (cases in row categories classified into columns) [Table: rows +1, 4+, Reg., Short, Total; columns +1, 4+, Reg., Short, %correct; counts not preserved]
Example: Sperm whale movements Jackknifed classification matrix [Table: rows +1, 4+, Reg., Short, Total; columns +1, 4+, Reg., Short, %correct; counts not preserved]
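A jackknifed classification matrix can be sketched as a leave-one-out loop: each unit is removed in turn, the group means and pooled covariance matrix are recomputed without it, and the unit is then allocated to the nearest group mean by Mahalanobis distance. This is one common reading of "jackknifed" and may differ in detail from the software used for the slides. The function names and data are hypothetical.

```python
import numpy as np


def pooled_covariance(X, y, labels):
    """Pooled within-group covariance matrix."""
    k = X.shape[1]
    W = np.zeros((k, k))
    for g in labels:
        centred = X[y == g] - X[y == g].mean(axis=0)
        W += centred.T @ centred
    return W / (len(X) - len(labels))


def classify(x, X, y, labels):
    """Allocate x to the group with the nearest mean (Mahalanobis distance)."""
    S_inv = np.linalg.inv(pooled_covariance(X, y, labels))
    d2 = []
    for g in labels:
        diff = x - X[y == g].mean(axis=0)
        d2.append(diff @ S_inv @ diff)
    return labels[int(np.argmin(d2))]


def jackknifed_matrix(X, y):
    """Rows: true group; columns: group allocated with the unit left out."""
    labels = sorted(set(y.tolist()))
    index = {g: j for j, g in enumerate(labels)}
    table = np.zeros((len(labels), len(labels)), dtype=int)
    for i in range(len(X)):
        keep = np.arange(len(X)) != i           # drop unit i
        predicted = classify(X[i], X[keep], y[keep], labels)
        table[index[int(y[i])], index[predicted]] += 1
    return labels, table


# Hypothetical data: 3 groups, 2 variables, 4 units per group.
base = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 4.0]])
X = np.vstack([base, base + [5.0, 5.0], base + [10.0, 0.0]])
y = np.repeat([0, 1, 2], 4)

labels, table = jackknifed_matrix(X, y)
percent_correct = 100.0 * np.trace(table) / table.sum()
print(table)
print(f"{percent_correct:.0f}% correct")
```

Because each unit is classified by functions fitted without it, the jackknifed percentage correct is usually lower, and less optimistic, than the one from the ordinary classification matrix.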
Sperm Whale Movements More Complex MANOVAs
[MOVE3,MOVE12,MOVE24]=CLAN
–CLAN: Λ = … (P=0.016)
[MOVE3,MOVE12,MOVE24]=AREA+CLAN
–AREA: Λ = … (P=0.063)
–CLAN: Λ = … (P=0.042)
[MOVE3,MOVE12,MOVE24]=AREA+CLAN(AREA)
–AREA: Λ = … (P=0.025)
–CLAN nested within AREA: Λ = … (P=0.057)
Discriminant Functions, Canonical Variates, etc. Are groups different in multivariate space? How are they different? Which variables most contribute to the differences? Classification of new individuals