Université d’Ottawa / University of Ottawa 1999 Bio 8100s Multivariate biostatistics L5.1 Two sample comparisons


1 Université d’Ottawa / University of Ottawa 1999 Bio 8100s Multivariate biostatistics L5.1 Two sample comparisons
• Univariate 2-sample comparisons
• The biological rationale for multivariate comparisons
• Why not multiple univariate comparisons?
• Comparison of multivariate means
• Evaluating assumptions
• Comparison of multivariate variances
• Example: differences between Adirondack lakes with and without brook trout

2 L5.2 Univariate 2-sample tests
• Appropriate when there are two groups to compare (e.g. control and treatment).
• In principle, we can compare any sample statistic, e.g. group means, medians, variances, etc.
[Figure: frequency distributions for control and treatment, with sample variances $s_C^2$ and $s_T^2$]

3 L5.3 Two-sample comparisons: control versus experiment
• Two plots of corn, one (control) with no treatment, the other (treatment) with nitrogen added.
• Biological prediction: nitrogen increases crop yield.
  $H_0: \mu_T \le \mu_C$ (one-tailed)
[Figure: frequency distributions of yield for control and treatment]

4 L5.4 Comparing means: the t-test
• Calculate the difference between the two means.
• $H_0$ (one-tailed):
• Calculate t and the associated p.
[Figure: frequency distributions of yield for control and treatment]
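The slide's own formula is not preserved in the transcript. As a minimal sketch, assuming the usual pooled-variance t statistic and the one-tailed null $H_0: \mu_T \le \mu_C$ from the corn example, the calculation could look like this (the yield values are hypothetical, and SciPy is assumed to be available for the t distribution):

```python
import math
from scipy import stats

def pooled_t_one_tailed(treatment, control):
    """One-tailed pooled-variance t-test of H0: mu_T <= mu_C."""
    n_t, n_c = len(treatment), len(control)
    mean_t = sum(treatment) / n_t
    mean_c = sum(control) / n_c
    # Within-group sums of squares
    ss_t = sum((x - mean_t) ** 2 for x in treatment)
    ss_c = sum((x - mean_c) ** 2 for x in control)
    df = n_t + n_c - 2
    pooled_var = (ss_t + ss_c) / df
    se = math.sqrt(pooled_var * (1 / n_t + 1 / n_c))
    t = (mean_t - mean_c) / se
    p = stats.t.sf(t, df)  # upper-tail probability P(T >= t)
    return t, df, p

# Hypothetical yields, for illustration only.
control = [4.1, 3.8, 4.4, 4.0, 3.9]
treatment = [4.6, 4.9, 4.3, 5.0, 4.7]
t, df, p = pooled_t_one_tailed(treatment, control)
print(f"t = {t:.3f}, df = {df}, one-tailed p = {p:.4f}")
```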

5 L5.5 Comparing two means: the multivariate case
• Suppose that for each sample unit in two different samples, we measure several variables $X_1, X_2, \dots, X_P$.
• How might we compare the two samples?

6 L5.6 Possibility 1: multiple univariate tests
• In this case, we compare the means of the two samples for each variable individually.
• So if we have P variables, we would do P t-tests (or Mann-Whitney U tests).

7 L5.7 Problem 1: controlling experiment-wise $\alpha$ error
• For comparisons involving P independent variables, the probability of accepting $H_0$ (no difference) for every variable when it is true is $(1 - \alpha)^P$. For 4 independent variables, $(1 - \alpha)^P = (0.95)^4 = 0.815$, so experiment-wise $\alpha$ ($\alpha_e$) $= 0.185$.
• Thus we would expect to reject $H_0$ for at least one variable about 19% of the time, even if the samples differed with respect to none of the four variables.
[Figure: experiment-wise $\alpha$ ($\alpha_e$) versus number of variables (0 to 10), nominal $\alpha = .05$]
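The arithmetic on this slide can be reproduced with a short sketch. The function name is illustrative, and the calculation assumes the P tests are independent, as the slide does:

```python
def experimentwise_alpha(alpha, n_tests):
    """Probability of at least one false rejection across n_tests
    independent tests, each run at the nominal level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Experiment-wise error for a nominal alpha of 0.05
for n_vars in (1, 2, 4, 6, 8, 10):
    print(n_vars, round(experimentwise_alpha(0.05, n_vars), 3))
```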

8 L5.8 Controlling experiment-wise $\alpha$ error at nominal $\alpha$ by adjusting by total number of comparisons
• To maintain $\alpha_e$ at nominal $\alpha$, we need to adjust $\alpha$ for each comparison by the total number of comparisons. In this manner, $\alpha_e$ becomes independent of the number of variables…
• … but invariably such procedures are too conservative.
[Figure: experiment-wise $\alpha$ ($\alpha_e$) versus number of treatments (0 to 10), nominal $\alpha = .05$]

9 L5.9 Controlling $\alpha_e$ by adjusting individual $\alpha$’s
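The body of this slide is not preserved in the transcript. As a hedged illustration only, two standard ways of adjusting the per-comparison $\alpha$ are the Bonferroni and Dunn-Šidák corrections; whether the original slide showed these particular formulas is an assumption.

```python
def bonferroni_alpha(alpha_e, n_tests):
    """Per-test alpha so that experiment-wise alpha is at most alpha_e."""
    return alpha_e / n_tests

def sidak_alpha(alpha_e, n_tests):
    """Per-test alpha giving exactly alpha_e for independent tests."""
    return 1.0 - (1.0 - alpha_e) ** (1.0 / n_tests)

# Per-test levels needed to hold alpha_e at 0.05 with 4 variables
print(bonferroni_alpha(0.05, 4))
print(sidak_alpha(0.05, 4))
```

The Bonferroni level is never larger than the Dunn-Šidák level, which is one reason it is regarded as the more conservative of the two.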

10 L5.10 Problem 2: reduced power
• Samples/groups may differ with respect to their multivariate means but not with respect to the means of any single variable, because of the cumulative effects of several small differences.
• Hence, univariate tests will usually have lower power.
[Figure: scatterplot of Sample 1 and Sample 2 on $X_1$ versus $X_2$]

11 L5.11 Problems 3 and 4: loss of information
• Univariate tests ignore correlations among variables, which are useful information in themselves.
• With univariate tests, we cannot estimate the extent to which overall differences among samples/groups are due to particular variables.

12 L5.12 Hotelling’s $T^2$: a multivariate extension of the t-test
• The (2-tailed) null hypothesis is that the vectors of means are equal for the 2 populations…
• … which implies that the populations are equal on all p variables.

13 L5.13 Hypothesis testing using Hotelling’s $T^2$
• Conveniently, $T^2$ can be transformed into F exactly…
• … so hypotheses can be tested by comparing observed F to critical values of the F-distribution with $p$ (number of variables) and $(n_1 + n_2 - p - 1)$ df.
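The transformation itself is not shown in the transcript. A minimal sketch, assuming the standard two-sample form $T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)' C^{-1} (\bar{x}_1 - \bar{x}_2)$ with pooled covariance matrix $C$, and $F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p} T^2$, is given below. The function name and the simulated data are illustrative; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats

def hotelling_t2(x1, x2):
    """Two-sample Hotelling's T^2 with its exact F transformation.
    x1, x2: arrays of shape (n_i, p), rows are observations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled within-group covariance matrix
    c1 = np.atleast_2d(np.cov(x1, rowvar=False))
    c2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * c1 + (n2 - 1) * c2) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))
    # Exact F with p and (n1 + n2 - p - 1) degrees of freedom
    df1, df2 = p, n1 + n2 - p - 1
    f = df2 / ((n1 + n2 - 2) * p) * t2
    p_value = stats.f.sf(f, df1, df2)
    return t2, f, (df1, df2), p_value

# Simulated data, for illustration only.
rng = np.random.default_rng(42)
group1 = rng.normal(size=(20, 3))
group2 = rng.normal(loc=0.5, size=(25, 3))
print(hotelling_t2(group1, group2))
```

With a single variable ($p = 1$) the statistic reduces to the square of the pooled-variance t, which is a convenient check on the implementation.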

14 L5.14 Example: body size in Bumpus’s sparrows
• $H_0: \mu_S = \mu_{NS}$ (average size of surviving and non-surviving female sparrows is the same)
• Variables: total length, alar extent, head length, humerus length, sternum and keel length
• $H_0$ is not rejected.

15 L5.15 Assumptions
• All observations are independent (residuals are uncorrelated).
• Within each sample (group), variables (residuals) are multivariate normally distributed.
• Each sample (group) has the same covariance matrix (homogeneity of covariance matrices).

16 L5.16 Effect of violation of assumptions
Assumption | Effect on $\alpha$ | Effect on power
Independence of observations | Very large; actual $\alpha$ much larger than nominal $\alpha$ | Large; power much reduced
Normality | Small to negligible | Reduced power for platykurtic distributions; skewness has little effect
Equality of covariance matrices | Small to negligible if group Ns are similar; if Ns are very unequal, actual $\alpha$ larger than nominal $\alpha$ | Power reduced; reduction greater for unequal Ns

17 L5.17 Checking independence of observations
• Does the experimental design suggest that sampling units may not be independent (e.g. spatiotemporal correlation)?
• Calculate the intraclass correlation R for each variable.
• Do autocorrelation plots for each variable/group combination to check for serial correlation.

18 L5.18 Checking independence assumption
• Run ACFs for all residuals for all groups separately, and check for evidence of autocorrelation among residuals.
[Figure: ACF of residuals of pH for lakes with brook trout]

19 L5.19 If non-independence is suspected…
• Delete observations from each group until independence is achieved (N.B. this will reduce power!)
• Pool observations into subgroups and use means of subgroups as observations.
[Figure: Group 1 and Group 2 divided into subgroups]

20 L5.20 Checking multivariate normality
• While characterizing MVN is difficult, a necessary (but not sufficient) condition is that each of the variables (residuals) is normally distributed.
• If there are p variables, there are p sets of estimates and residuals generated for any fitted model.
• Check normality by doing normal probability plots for each variable.
[Figure: normal probability plot of residuals of total length, comparison of survivors and non-survivors from Bumpus data]

21 L5.21 Checking multivariate normality
• For each group, calculate the vector of means and the Mahalanobis distance $D_j^2$, $j = 1, \dots, N_i$, of each observation from the multivariate mean of group $i$.
• For each group, order distances from smallest to largest: $D_{(1)}^2 \le D_{(2)}^2 \le \dots \le D_{(N_i)}^2$.
• Calculate the corresponding percentiles of the $\chi^2$ distribution with $p$ (number of variables) degrees of freedom.
• If data are multivariate normal, then for each group, a plot of distances versus percentiles should yield a straight line.
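The steps above can be sketched as follows for a single group. The percentile formula from the slide is not preserved, so the plotting positions $(j - 0.5)/N_i$ used here are an assumption, as are the function name and the simulated data. The sketch needs at least two variables and assumes NumPy and SciPy.

```python
import numpy as np
from scipy import stats

def chi2_plot_points(x):
    """Ordered squared Mahalanobis distances and matching chi-square
    percentiles for one group; x has shape (n, p) with p >= 2."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Squared Mahalanobis distance of each row from the group mean
    d2 = np.einsum("ij,ij->i", centered, np.linalg.solve(cov, centered.T).T)
    d2_sorted = np.sort(d2)
    # Plotting positions (j - 0.5) / n: an assumed convention
    probs = (np.arange(1, n + 1) - 0.5) / n
    quantiles = stats.chi2.ppf(probs, df=p)
    return d2_sorted, quantiles

# Simulated multivariate normal data, for illustration only.
rng = np.random.default_rng(7)
sample = rng.normal(size=(60, 5))
distances, percentiles = chi2_plot_points(sample)
print(np.corrcoef(distances, percentiles)[0, 1])
```

Plotting `distances` against `percentiles` gives the graph described on the slide; a correlation close to 1 corresponds to points lying near a straight line.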

22 L5.22 Equality of covariance matrices
• Equality of covariance ($C_1 = C_2$) implies that each element of $C_1$ is equal to the corresponding element in $C_2$.
• This is a very restrictive assumption that is almost never met in practice, so the real question is…
• …how different are they?
[Figure: covariance matrices $C_1$ and $C_2$, with variance and covariance elements labelled]

23 L5.23 Checking equality of variances
• Plot residuals versus estimates for all variables and check for evidence of heteroscedasticity.
• Run Levene’s test for heterogeneity of variances for all variables.
[Figure: residuals versus estimates (total length), comparison of survivors and non-survivors from Bumpus data]

24 L5.24 Box test for equality of covariance matrices
• Calculate ln of the determinant of each group covariance matrix $\mathbf{C}_i$ and of the pooled covariance matrix $\mathbf{C}$.
• Use these values to calculate Box’s $M$.
• Use $k$ (number of groups) and $p$ (number of variables) to calculate the correction factor $C$.
• For reasonably large $N_i$ (> 20), $M(1 - C)$ is approximately $\chi^2$ distributed.
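The formulas themselves are missing from the transcript. A minimal sketch, assuming the usual textbook definitions $M = (N - k)\ln|\mathbf{C}| - \sum_i (N_i - 1)\ln|\mathbf{C}_i|$ and $C = \frac{2p^2 + 3p - 1}{6(p + 1)(k - 1)}\left(\sum_i \frac{1}{N_i - 1} - \frac{1}{N - k}\right)$, with $M(1 - C)$ referred to a $\chi^2$ distribution on $p(p + 1)(k - 1)/2$ df, is given below. The function name and simulated data are illustrative; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats

def box_m_test(groups):
    """Box's M with its chi-square approximation.
    groups: list of arrays of shape (N_i, p), with p >= 2."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    p = groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    n_total = ns.sum()
    covs = [np.cov(g, rowvar=False) for g in groups]
    pooled = sum((n - 1) * c for n, c in zip(ns, covs)) / (n_total - k)
    # Natural log of each determinant (slogdet avoids overflow)
    logdets = [np.linalg.slogdet(c)[1] for c in covs]
    m = (n_total - k) * np.linalg.slogdet(pooled)[1] - sum(
        (n - 1) * ld for n, ld in zip(ns, logdets))
    # Correction factor C, a function of k, p and the group sizes
    c = ((2 * p ** 2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) * (
        np.sum(1 / (ns - 1)) - 1 / (n_total - k))
    chi2 = m * (1 - c)
    df = p * (p + 1) * (k - 1) // 2
    return m, c, chi2, df, stats.chi2.sf(chi2, df)

# Simulated groups with different spreads, for illustration only.
rng = np.random.default_rng(3)
g1 = rng.normal(size=(40, 3))
g2 = rng.normal(scale=2.0, size=(35, 3))
print(box_m_test([g1, g2]))
```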

25 L5.25 Box’s test (cont’d)
• If the Box test is significant with approximately equal group sizes, type I error rate is only slightly affected, but power is reduced to some extent.
• If the Box test is significant with unequal group sizes, compare determinants of group covariance matrices.
• If the group with smaller N has smaller |C|, test statistics are liberal; if the other way around, they are conservative.

26 L5.26 Important note!
• Box’s test is quite sensitive to deviations from multivariate normality…
• … so make sure the MVN assumption is valid before proceeding!

27 L5.27 Checking assumptions in MANOVA
[Flowchart. Steps: check independence (intraclass correlation, ACF); use group means as unit of analysis; assess MV normality; check group sizes; MVN graph test; check univariate normality. Branch labels: Yes / No; $N_i > 20$ / $N_i < 20$]

28 L5.28 Checking assumptions in MANOVA (cont’d)
[Flowchart. Decisions: MV normal? Most variables normal? Group sizes more or less equal (R < 1.5)? Groups reasonably large (> 15)? Actions: check homogeneity of covariance matrices; transform offending variables; transform variables, or adjust $\alpha$; END. Branch labels: Yes / No]

29 L5.29 Comparing two variances: the univariate case
• If variances are equal, then $s_C^2 = s_T^2$.
• $H_0$ (Levene’s):
• This test is relatively insensitive to non-normality.
[Figure: frequency distributions for control and treatment, with sample variances $s_C^2$ and $s_T^2$]

30 L5.30 Comparing two multivariate variances I: Levene’s test
• Standardize all variables to have zero mean and unit variance.
• Calculate the absolute value of the difference between the standardized value and the standardized mean (or median).
• Compare mean absolute values using Hotelling’s $T^2$.

31 L5.31 Comparing two multivariate variances II: van Valen’s test
• For each observation, square the difference between the standardized value and the standardized mean (or median) of each variable, and sum over variables.
• Compare average values for each sample with a univariate t-test (or some such).
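A minimal sketch of the procedure as the slide describes it. Several details are assumptions: variables are standardized using both samples combined, deviations are taken from each group's median, and the comparison uses a pooled-variance t-test. Some descriptions of the test use the square root of the summed squares instead. The function name and simulated data are illustrative; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats

def van_valen_test(x1, x2):
    """Compare multivariate dispersion of two samples.
    x1, x2: arrays of shape (n_i, p), rows are observations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    combined = np.vstack([x1, x2])
    # Standardize each variable using both samples together (assumption)
    mean = combined.mean(axis=0)
    sd = combined.std(axis=0, ddof=1)
    z1 = (x1 - mean) / sd
    z2 = (x2 - mean) / sd
    # Sum over variables of squared deviations from the group median
    d1 = ((z1 - np.median(z1, axis=0)) ** 2).sum(axis=1)
    d2 = ((z2 - np.median(z2, axis=0)) ** 2).sum(axis=1)
    res = stats.ttest_ind(d1, d2)
    return d1, d2, res.statistic, res.pvalue

# Simulated samples, the second more variable; illustration only.
rng = np.random.default_rng(11)
sample1 = rng.normal(size=(40, 3))
sample2 = rng.normal(scale=3.0, size=(40, 3))
d1, d2, t_stat, p_value = van_valen_test(sample1, sample2)
print(d1.mean(), d2.mean(), t_stat, p_value)
```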

32 L5.32 Example: comparison of Adirondack lakes with and without brook trout
• Goal: to elucidate the factors controlling brook trout presence/absence.
• Question: do lakes with and without BT differ with respect to certain physicochemical variables, e.g. pH, DO, ANC, elevation, size, etc.?
[Figure: lakes with brook trout (BT) absent and BT present]

33 L5.33 Univariate F tests
Effect      SS             df    MS           F        P
DO          38.330         1     38.330       11.670   0.001
Error       2423.871       738   3.284
PH          47.726         1     47.726       80.256   0.000
Error       438.864        738   0.595
ANC         418836.384     1     418836.384   8.298    0.004
Error       3.72522E+07    738   50477.213
ELEVATION   5192.262       1     5192.262     0.404    0.525
Error       9488005.547    738   12856.376
SA          3731.309       1     3731.309     12.910   0.000
Error       213305.666     738   289.032

34 L5.34 Multivariate test statistics
Wilks' Lambda = 0.862, F-Statistic = 23.477, df = 5, 734, Prob = 0.000
Pillai Trace = 0.138, F-Statistic = 23.477, df = 5, 734, Prob = 0.000
Hotelling-Lawley Trace = 0.160, F-Statistic = 23.477, df = 5, 734, Prob = 0.000
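With only two groups the three statistics are functions of one another, which is why they share a single F. As a sketch, assuming the standard two-group relations (Pillai $= 1 - \Lambda$, Hotelling-Lawley $= (1 - \Lambda)/\Lambda$, $T^2 = (N - 2) \times$ Hotelling-Lawley), the reported values can be cross-checked from Wilks' $\Lambda$ alone. The total $N = 740$ is inferred from the error df of 738 plus two groups; the function name is illustrative.

```python
def two_group_statistics(wilks_lambda, n_total, p):
    """Derive the other two-group test statistics from Wilks' lambda."""
    pillai = 1.0 - wilks_lambda
    hotelling_lawley = (1.0 - wilks_lambda) / wilks_lambda
    t2 = (n_total - 2) * hotelling_lawley  # Hotelling's T^2
    df1, df2 = p, n_total - p - 1
    f = (df2 / df1) * hotelling_lawley
    return pillai, hotelling_lawley, t2, f, (df1, df2)

pillai, hl, t2, f, dfs = two_group_statistics(0.862, 740, 5)
print(pillai, hl, t2, f, dfs)
```

The F obtained this way is about 23.5 rather than exactly 23.477, because $\Lambda$ is reported to only three decimals.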

35 L5.35 The conclusion
• Lakes with and without brook trout seem to differ with respect to pH, DO, ANC and SA, but not with respect to elevation.
• The multivariate means are significantly different, i.e. the null is rejected.
• But, before proceeding any further, we MUST check the assumptions of independence, normality, and equality of covariance matrices.

36 L5.36 Checking serial independence using ACF plots
• Run MANOVA, save residuals and data.
• Extract the set of residuals for p variables for each group (brook trout present or absent).
• Run ACF on residuals for each variable/group combination.
[Figure: ACF of residuals of pH for lakes with brook trout]

37 L5.37 Checking independence using the intraclass correlation
• Get MSs from univariate F tables, and calculate R for each variable.
• Are the values relatively small?
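The formula for R is not preserved in the transcript. A minimal sketch, assuming the common ANOVA-based estimator $R = (MS_b - MS_w) / (MS_b + (n - 1) MS_w)$ with $n$ observations per group, is shown below. The MS values come from the univariate F table for DO; the per-group size of 370 is hypothetical (an equal split of the 740 lakes), since the actual group sizes are not given.

```python
def intraclass_r(ms_between, ms_within, n_per_group):
    """ANOVA-based intraclass correlation (assumed estimator)."""
    return (ms_between - ms_within) / (
        ms_between + (n_per_group - 1) * ms_within)

# DO: MS effect = 38.330, MS error = 3.284; n_per_group is hypothetical.
r_do = intraclass_r(38.330, 3.284, 370)
print(round(r_do, 4))
```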

38 L5.38 Example: MV normality in Adirondack lakes with brook trout
• Run DISCRIM with two groups (BT present and absent), 5 variables (pH, DO, ANC, elevation, SA) to generate Mahalanobis distances.
• Evidence of non-normality due to skewed distributions of ANC, SA.

39 L5.39 Box test for equality of covariance matrices
• Conclusion: covariance matrices are heterogeneous…
• …but the analysis is based on data which we know do not satisfy the normality condition.
• So, results are not reliable.
• Solution: find transformations such that the MVN condition is satisfied, and re-run analyses.

