# Section VII Comparing means & analysis of variance.


How to display means- ok in simple situations

Presenting means - ANOVA data One can also add “error bars” to these means. In analysis of variance, these error bars are based on the sample size and the pooled standard deviation, SDe. This SDe is the same residual SDe as in regression.

Don’t use bar graphs in complex situations

Use a line graph instead

Comparing Means

Two groups: t test (review)

Mean differences are “statistically significant” (different beyond chance) relative to their standard error ($SE_d$), a measure of mean variability (“noise”).

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_d} = \frac{\text{“signal”}}{\text{“noise”}}$$

$\bar{Y}_i$ = mean of group $i$, $SE_d$ = standard error of the mean difference.

t is the mean difference in $SE_d$ units. As |t| increases, the p value gets smaller.

Rule of thumb: p < 0.05 when |t| > 2, that is, when

$$|\bar{Y}_1 - \bar{Y}_2| > t_{cr}\,SE_d \approx 2\,SE_d = LSD$$

$t_{cr}\,SE_d \approx 2\,SE_d$ is the critical or least significant difference (LSD). So getting the correct $SE_d$ is crucial: $SE_d$ is the “yardstick” for significance.

How to compute $SE_d$?

$SE_d$ depends on n, SD and study design (example: factorial or repeated measures).

For a single mean, if n = sample size:

$$SEM = \frac{SD}{\sqrt{n}} = \sqrt{\frac{SD^2}{n}}$$

For a mean difference ($\bar{Y}_1 - \bar{Y}_2$), the SE of the mean difference, $SE_d$, is given by

$$SE_d = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}} \quad\text{or}\quad SE_d = \sqrt{SEM_1^2 + SEM_2^2}$$

If data are paired (before-after), first compute the differences ($d_i = Y_{2i} - Y_{1i}$) for each person. For paired data:

$$SE_d = \frac{SD(d_i)}{\sqrt{n}}$$
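The three standard-error formulas above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the summary numbers are the KO-no TBI and KO-TBI rows of the fall-time example later in this section, and the paired values are hypothetical.

```python
import math
import statistics

def sem(sd, n):
    """Standard error of a single mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def se_diff_unpaired(sd1, n1, sd2, n2):
    """SE of the difference between two independent means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def se_diff_paired(y1, y2):
    """SE of the mean paired difference: SD(d_i) / sqrt(n)."""
    d = [b - a for a, b in zip(y1, y2)]
    return statistics.stdev(d) / math.sqrt(len(d))

# Summary statistics (SD, n) taken from the fall-time example
sem_a = sem(6.4598, 8)   # KO-no TBI
sem_b = sem(8.7316, 7)   # KO-TBI
se_d = se_diff_unpaired(6.4598, 8, 8.7316, 7)

# Hypothetical before/after values, only to exercise the paired formula
se_d_pair = se_diff_paired([1, 2, 3], [2, 4, 3])

print(round(sem_a, 4), round(sem_b, 4), round(se_d, 3), round(se_d_pair, 3))
```

The unpaired result is the same whether it is computed from the SDs and sample sizes or from the two SEMs, since $SEM_i^2 = SD_i^2/n_i$.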

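3 or more groups: analysis of variance (ANOVA)

Pooled SDs

What if we have many treatment groups, each with its own mean and SD?

| Group | Mean | SD | Sample size (n) |
|---|---|---|---|
| A | $\bar{Y}_1$ | $SD_1$ | $n_1$ |
| B | $\bar{Y}_2$ | $SD_2$ | $n_2$ |
| C | $\bar{Y}_3$ | $SD_3$ | $n_3$ |
| … | … | … | … |
| k | $\bar{Y}_k$ | $SD_k$ | $n_k$ |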

Check variance homogeneity

The pooled $SD_e$

$$SD^2_{\text{pooled error}} = SD_e^2 = \frac{(n_1-1)\,SD_1^2 + (n_2-1)\,SD_2^2 + \dots + (n_k-1)\,SD_k^2}{(n_1-1) + (n_2-1) + \dots + (n_k-1)}$$

so $SD_e = \sqrt{SD_e^2}$.
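As a check on the pooling formula, the sketch below recomputes the pooled SD from the four group SDs and sample sizes reported in the fall-time example later in this section. The function name `pooled_sd` is an illustrative choice, not something from the slides.

```python
import math

def pooled_sd(sds, ns):
    """Pooled SD: square root of the df-weighted average of the group variances."""
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

# Group sizes and SDs (sec) from the fall-time example: KO-no TBI, KO-TBI, WT-noTBI, WT-TBI
ns = [8, 7, 15, 21]
sds = [6.4598, 8.7316, 9.9232, 13.3124]

sd_e = pooled_sd(sds, ns)
print(round(sd_e, 3))  # close to the reported pooled SD_e of 10.986 sec
```

The denominator is the error degrees of freedom (here 51 - 4 = 47), so the squared result is the "Mean Square Error" line of the ANOVA table.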

In ANOVA, we use the pooled $SD_e$ to compute $SE_d$ and to compute “post hoc” (post pooling) t statistics and p values.

$$SE_d = \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}} \quad\text{becomes}\quad SE_d = SD_e\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$SD_1$ and $SD_2$ are replaced by the pooled SD ($SD_e$, also written $SD_p$), a “common yardstick”. If $n_1 = n_2 = n$, then $SE_d = SD_e\sqrt{2/n}$ = constant.
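A small sketch of a post hoc t statistic built on the pooled SD, using the WT-noTBI versus KO-TBI comparison from the fall-time example later in this section. The variable and function names are illustrative only.

```python
import math

SD_E = 10.986  # pooled SD (sec) reported for the fall-time example

def se_diff_pooled(sd_e, n1, n2):
    """SE of a mean difference when both groups share the pooled SD."""
    return sd_e * math.sqrt(1 / n1 + 1 / n2)

# Group means (sec) and sample sizes from the fall-time example
mean_wt_no_tbi, n_wt_no_tbi = 49.197, 15
mean_ko_tbi, n_ko_tbi = 18.659, 7

se_d = se_diff_pooled(SD_E, n_wt_no_tbi, n_ko_tbi)
t = (mean_wt_no_tbi - mean_ko_tbi) / se_d
print(round(se_d, 2), round(t, 2))  # compare with SE diff 5.03 and t 6.073 in the post hoc table
```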

Transformations

There are two requirements for the analysis of variance (ANOVA) model.

1. Within any treatment group, the mean should be the middle value. That is, the mean should be about the same as the median. When this is true, the data can usually be reasonably modeled by a Gaussian (“normal”) distribution.
2. The SDs should be similar (variance homogeneity) from group to group.

One can plot mean vs median and the residual errors to check #1, and mean versus SD to check #2.

What if it’s not true? Two options:

a. Find a transformed scale where it is true.
b. Don’t use the usual ANOVA model (use non-constant variance ANOVA models or nonparametric models).

Option “a” is better if possible, because it gives more power.

The most common transform is the log transformation. It usually works for:

1. Radioactive count data
2. Titration data (titers), serial dilution data
3. Cell, bacterial, viral growth, CFUs
4. Steroids & hormones (E2, Testos, …)
5. Power data (decibels, earthquakes)
6. Acidity data (pH), …
7. Cytokines, liver enzymes (Bilirubin…)

In general, the log transform works when a multiplicative phenomenon is transformed to an additive phenomenon.

Compute stats on the log scale & back transform results to the original scale for the final report. Since log(A) – log(B) = log(A/B), differences on the log scale correspond to ratios on the original scale.

Remember: $10^{\text{mean}(\log_{10} \text{data})}$ = geometric mean ≤ arithmetic mean.

Monotone transformation ladder (try these): $Y^2$, $Y^{1.5}$, $Y^1$, $Y^{0.5} = \sqrt{Y}$, “$Y^0$” $= \log(Y)$, $Y^{-0.5} = 1/\sqrt{Y}$, $Y^{-1} = 1/Y$, $Y^{-1.5}$, $Y^{-2}$
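A minimal sketch of the back-transformation idea, using made-up titer values (the data are hypothetical and chosen only so the arithmetic is easy to follow):

```python
import math
import statistics

titers = [10, 100, 1000, 10000]  # hypothetical titers, not from the slides

# Work on the log10 scale, then back transform the mean
logs = [math.log10(x) for x in titers]
geo_mean = 10 ** statistics.mean(logs)
arith_mean = statistics.mean(titers)

# A difference of logs back transforms to a ratio on the original scale
log_diff = logs[2] - logs[0]
ratio = 10 ** log_diff

print(round(geo_mean, 1), arith_mean, ratio)
```

Here the geometric mean (about 316) is far below the arithmetic mean (2777.5), which is typical of right-skewed, multiplicative data.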

Multiplicity & F tests Multiple testing can create “false positives”. We incorrectly declare means are “significantly” different as an artifact of doing many tests even if none of the means are truly different. Imagine we have k=four groups: A, B, C and D. There are six possible mean comparisons: A vs B A vs C A vs D B vs C B vs D C vs D

If we use p < 0.05 as our “significance” criterion, we have a 5% chance of a “false positive” mistake for any one of the six comparisons, assuming that none of the groups are really different from each other. Equivalently, each comparison has a 95% chance of no false positive if none of the groups are really different. So, treating the six comparisons as independent, the chance of a “false positive” in at least one of the six comparisons is $1 - (0.95)^6 = 0.26$, or 26%.
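The 26% figure can be reproduced with a short calculation. This sketch assumes, as the slide does, that the pairwise tests behave as if independent; the function name is illustrative.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Number of pairwise comparisons among k groups and the chance of
    at least one false positive, treating the tests as independent."""
    m = comb(k, 2)
    return m, 1 - (1 - alpha) ** m

for k in (2, 3, 4, 5, 6):
    m, fwer = familywise_error(k)
    print(k, m, round(fwer, 3))

m4, fwer4 = familywise_error(4)  # the four-group case: 6 comparisons
```

The rate grows quickly with the number of groups, which is the motivation for an overall test before pairwise comparisons.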

To guard against this, we first compute the “overall” F statistic and its p value. The overall F statistic compares all the group means to the overall mean (M).

$$F = \frac{\sum_i n_i(\bar{Y}_i - M)^2/(k-1)}{SD_p^2} = \frac{MS_x}{MS_{error}} = \frac{\text{between group variance}}{\text{within group variance}}$$

$$F = \frac{\left[\,n_1(\bar{Y}_1 - M)^2 + n_2(\bar{Y}_2 - M)^2 + \dots + n_k(\bar{Y}_k - M)^2\,\right]/(k-1)}{SD_p^2}$$

Here $SD_p$ is the pooled SD ($SD_e$). If the “overall” p > 0.05, we stop. Only if the overall p < 0.05 will the pairwise post hoc (post overall) t tests and p values have no more than an overall 5% chance of a “false positive” when none of the means truly differ.
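The overall F can be computed from summary statistics alone. The sketch below applies the formula above to the group means, SDs and sample sizes of the fall-time example later in this section; `overall_f` is an illustrative name.

```python
def overall_f(means, sds, ns):
    """One-way ANOVA F statistic from group means, SDs and sample sizes."""
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group mean square: weighted squared deviations from the overall mean
    ms_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means)) / (k - 1)
    # Within-group mean square: the squared pooled SD
    ms_error = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds)) / (n_total - k)
    return ms_between / ms_error

# KO-no TBI, KO-TBI, WT-noTBI, WT-TBI (times in sec)
means = [21.196, 18.659, 49.197, 23.902]
sds = [6.4598, 8.7316, 9.9232, 13.3124]
ns = [8, 7, 15, 21]

f_stat = overall_f(means, sds, ns)
print(round(f_stat, 2))  # compare with the reported F ratio of 21.618
```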

This criterion was suggested by RA Fisher and is called the Fisher LSD (least significant difference) criterion. It is less conservative (has fewer false negatives) than the very conservative Bonferroni criterion.

Bonferroni criterion: if making “m” comparisons, declare significant only if p < 0.05/m.

This overall F is the same as the overall F test in regression for testing $\beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$ (all regression coefficients = 0).

Ex:Clond-time to fall off rod

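One way analysis of variance: time to fall data, k = 4 groups, df = k-1 = 3

| Statistic | Value |
|---|---|
| R square | 0.5798 |
| Adj R square | 0.5530 |
| Root Mean Square Error = $SD_e$ | 10.99 |
| Mean of Response | 30.20 |
| Observations (or Sum Wgts) | 51 |

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| group | 3 | 7827.438 | 2609.15 | 21.618 | <.0001* |
| Error | 47 | 5672.546 | 120.69 | | |
| C. Total | 50 | 13499.984 | | | |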

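Means & SDs in sec (JMP)

No model:

| Level | Number | Mean | Median | SD | SEM |
|---|---|---|---|---|---|
| KO-no TBI | 8 | 21.196 | 21.65 | 6.4598 | 2.2839 |
| KO-TBI | 7 | 18.659 | 18.47 | 8.7316 | 3.3002 |
| WT-noTBI | 15 | 49.197 | 46.93 | 9.9232 | 2.5622 |
| WT-TBI | 21 | 23.902 | 23.33 | 13.3124 | 2.9050 |

ANOVA model, pooled $SD_e$ = 10.986 sec:

| Level | Number | Mean | SEM |
|---|---|---|---|
| KO-no TBI | 8 | 21.196 | 3.8841 |
| KO-TBI | 7 | 18.659 | 4.1523 |
| WT-noTBI | 15 | 49.197 | 2.8366 |
| WT-TBI | 21 | 23.902 | 2.3973 |

Why are the SEMs not the same? Under the ANOVA model each SEM is $SD_e/\sqrt{n_i}$ (for example 10.986/√8 = 3.8841), so the SEMs differ only because the group sizes differ.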

Mean comparisons: post hoc t

| Level | Letter | Mean |
|---|---|---|
| WT-noTBI | A | 49.197 |
| WT-TBI | B | 23.902 |
| KO-no TBI | B | 21.196 |
| KO-TBI | B | 18.659 |

Means not connected by the same letter are significantly different.

Multiple comparisons: Tukey’s q

As an alternative to Fisher LSD, for pairwise comparisons of “k” means, Tukey computed percentiles for

$$q = \frac{\text{largest mean} - \text{smallest mean}}{SE_d}$$

under the null hypothesis that all means are equal. If “mean difference > q $SE_d$” is the significance criterion, the type I error is ≤ α for all comparisons. In general q > t > Z. One looks up the critical value on the q table instead of the t table.

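t vs q for α = 0.05, large n

| Number of means = k | t | q* |
|---|---|---|
| 2 | 1.96 | 1.96 |
| 3 | 1.96 | 2.34 |
| 4 | 1.96 | 2.59 |
| 5 | 1.96 | 2.73 |
| 6 | 1.96 | 2.85 |

\* Some tables give q for SE, not $SE_d$. Those tabled values are √2 times the values shown here (for k = 2, about 2.77 rather than 1.96), so rescale by √2 before comparing them with a difference measured in $SE_d$ units.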

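Post hoc: t vs Tukey q, k = 4

| Level | - Level | Mean Diff | SE diff | t | p-Value (no correction) | p-Value (Tukey) |
|---|---|---|---|---|---|---|
| WT-noTBI | KO-TBI | 30.54 | 5.03 | 6.073 | <.0001* | |
| WT-noTBI | KO-no TBI | 28.00 | 4.81 | 5.822 | <.0001* | |
| WT-noTBI | WT-TBI | 25.30 | 3.71 | 6.811 | <.0001* | |
| WT-TBI | KO-TBI | 5.24 | 4.79 | 1.094 | 0.2797 | 0.6952 |
| WT-TBI | KO-no TBI | 2.71 | 4.56 | 0.593 | 0.5562 | 0.9338 |
| KO-no TBI | KO-TBI | 2.54 | 5.69 | 0.446 | 0.6574 | 0.9700 |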

Mean comparisons: Tukey

| Level | Letter | Mean |
|---|---|---|
| WT-noTBI | A | 49.197 |
| WT-TBI | B | 23.902 |
| KO-no TBI | B | 21.196 |
| KO-TBI | B | 18.659 |

Means not connected by the same letter are significantly different.

One way analysis of variance comparing means across groups-ANOVA vs regr Example: Comparing mean birth weight by race.

ANOVA via regression

Coding categorical variables: dummy vs effect coding

Below, we create two new variables, “af_am” and “other”, from the variable “Race”.

Dummy coding (“white” is the referent category):

| Race | af_am | other |
|---|---|---|
| White (1) | 0 | 0 |
| Black (2) | 1 | 0 |
| Other (3) | 0 | 1 |

Dummy (0,1) coded variables are usually correlated with each other even in balanced designs, so they are not orthogonal. However, they are easier to interpret.

Effect coding (“white” is the referent category):

| Race | af_am | other |
|---|---|---|
| White (1) | -1 | -1 |
| Black (2) | 1 | 0 |
| Other (3) | 0 | 1 |

In balanced designs, effect coded (-1, 0, 1) variables have zero correlation = they are orthogonal. In balanced designs, effect coded variables have sum and mean zero and cross products of zero. Under effect coding, cell means correspond to X i = -1 or 1 and marginal means correspond to X i =0.

ANOVA VIA REGRESSION (dummy vars): Birth weight

Overall Analysis of Variance table

| Source | DF | Sum of Squares | Mean Square | F Value | p value |
|---|---|---|---|---|---|
| Model | 2 | 5070608 | 2535304 | 4.97 | 0.0079 |
| Error | 186 | 94846445 | 509927 | | |
| Total | 188 | 99917053 | | | |

Root MSE = $SD_e$ = 714.09 gm, R-Square = 0.0507, Dependent Mean = 2944.656

ANOVA via regression: dummy coding

| Variable | df | regr coef | SE | t | p value |
|---|---|---|---|---|---|
| Intercept | 1 | 3103.74 | 72.88 | 42.59 | <.0001 |
| af_am | 1 | -384.05 | 157.87 | -2.43 | 0.0159 |
| other | 1 | -299.72 | 113.68 | -2.64 | 0.0091 |

Birth wt = 3104 – 384 af_am – 300 other + error

With dummy coding, the regression coefficients are the mean differences from the referent group (white in this example).

ANOVA via regression (cont): effect coding for race

Overall Analysis of Variance table

| Source | DF | Sum of Squares | Mean Square | F Value | p value |
|---|---|---|---|---|---|
| Model | 2 | 5070608 | 2535304 | 4.97 | 0.0079 |
| Error | 186 | 94846445 | 509927 | | |
| Total | 188 | 99917053 | | | |

Root MSE = 714.09, R-Square = 0.0507, Dependent Mean = 2944.656

ANOVA via regression: effect coding

| Variable | df | regr coef | SE | t | p value |
|---|---|---|---|---|---|
| Intercept | 1 | 2875.82 | 60.13 | 47.83 | <.0001 |
| af_am | 1 | -156.12 | 100.76 | -1.55 | 0.1230 |
| other | 1 | -71.80 | 78.43 | -0.92 | 0.3612 |

Birth wt = 2876 – 156 af_am – 72 other + error

With effect coding, the intercept (2876) is the mean of the race means, the unweighted overall mean. The regression coefficients are the deviations of each level’s mean from this overall mean.
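The two codings describe the same three group means, so one set of coefficients can be recovered from the other. The sketch below starts from the dummy-coded coefficients reported above and derives the effect-coded ones; the dictionary keys are illustrative labels, and small differences from the printed values come from rounding in the tables.

```python
# Dummy-coded fit: intercept = referent (white) mean,
# other coefficients = differences from the referent mean
b0, b_af_am, b_other = 3103.74, -384.05, -299.72

group_means = {
    "white": b0,
    "af_am": b0 + b_af_am,
    "other": b0 + b_other,
}

# Effect-coded fit: intercept = unweighted mean of the group means,
# coefficients = each group's deviation from that mean
unweighted_mean = sum(group_means.values()) / len(group_means)
effects = {g: m - unweighted_mean for g, m in group_means.items()}

print(round(unweighted_mean, 2))
for group, effect in effects.items():
    print(group, round(effect, 2))
```

The deviation for the referent group (white) is not a separate coefficient in the effect-coded regression; it is minus the sum of the other two, because the deviations sum to zero.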

Multiway ANOVA

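Balanced designs: ANOVA example

Brain weight data, n = 7 x 4 = 28, $n_c$ = 7 obs/cell

| Dementia | Sex | Brain Weight (gm) |
|---|---|---|
| No | F | 1223 |
| No | F | 1228 |
| No | F | 1222 |
| No | F | 1204 |
| No | F | 1234 |
| No | F | 1211 |
| No | F | 1217 |
| … | … | … |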

Mean brain weights (gms) in males and females with and without dementia

A balanced* 2 x 2 (ANOVA) design, $n_c$ = 7 obs per cell, n = 7 x 4 = 28 obs total

| Dementia | Males (1) | Females (-1) | Total |
|---|---|---|---|
| Yes (1) | 1321.14 | 1201.71 | 1261.43 |
| No (-1) | 1333.43 | 1219.86 | 1276.64 |
| Total | 1327.29 | 1210.79 | 1269.04 |

Terminology: cell means, marginal means

| | Males | Females | Overall |
|---|---|---|---|
| Dementia | Cell | Cell | Margin |
| No dementia | Cell | Cell | Margin |
| Overall | Margin | Margin | |

Difference in marginal sex means (Male – Female):
1327.29 - 1210.79 = 116.50, 116.50/2 = 58.25

Difference in marginal dementia means (Yes – No):
1261.43 - 1276.64 = -15.21, -15.21/2 = -7.61

Difference in cell mean differences:
(1321.14 - 1333.43) – (1201.71 - 1219.86) = 5.86
(1321.14 - 1201.71) – (1333.43 - 1219.86) = 5.86
note: 5.86/(2 x 2) = 1.46

\* balanced = same sample size ($n_c$) in every cell
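The same three quantities can be obtained directly from the four cell means using the (-1, 1) codes. This is a sketch that relies on the design being balanced; the dictionary layout and variable names are illustrative.

```python
# Cell means keyed by (dementia code, sex code): dementia yes=1/no=-1, male=1/female=-1
cell_means = {
    (1, 1): 1321.14,
    (1, -1): 1201.71,
    (-1, 1): 1333.43,
    (-1, -1): 1219.86,
}
n_cells = len(cell_means)

# In a balanced design, each effect is a signed average of the cell means
overall = sum(cell_means.values()) / n_cells
b_sex = sum(s * m for (d, s), m in cell_means.items()) / n_cells
b_dementia = sum(d * m for (d, s), m in cell_means.items()) / n_cells
b_interaction = sum(d * s * m for (d, s), m in cell_means.items()) / n_cells

print(round(overall, 2), round(b_sex, 2), round(b_dementia, 2), round(b_interaction, 2))
```

These are half the marginal differences (58.25 and -7.61) and one quarter of the difference of differences (1.46), matching the arithmetic above.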

Brain weight via ANOVA: effect coding (-1, 1)

MODEL: brain wt = sex dementia sex*dementia

| Class | Levels | Values |
|---|---|---|
| sex | 2 | -1 1 |
| dementia | 2 | -1 1 |

28 observations

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 96686 | 32228.70 | 451.05 | <.0001 |
| Error | 24 | 1715 | 71.45 = $SD_e^2$ | | |
| C Total | 27 | 98402 | | | |

| R-Square | Coeff Var | Root MSE | Mean brain wt |
|---|---|---|---|
| 0.9826 | 0.666092 | 8.453 = $SD_e$ | 1269.04 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F = p value |
|---|---|---|---|---|---|
| sex | 1 | 95005.75 | 95005.75 | 1329.64 | <.0001 |
| dementia | 1 | 1620.32 | 1620.32 | 22.68 | <.0001 |
| sex*dementia | 1 | 60.04 | 60.04 | 0.84 | 0.3685 |

Brain weight via regression: effect coding

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 96686 | 32228.70 | 451.05 | <.0001 |
| Error | 24 | 1714.9 | 71.45 | | |
| Corrected Total | 27 | 98401 | | | |

R-Square = 0.9826, Root MSE = 8.453, Mean = 1269.04

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 1269.03571 | 1.59746 | 794.41 | <.0001 |
| sex | 1 | 58.25000 | 1.59746 | 36.46 | <.0001 |
| dementia | 1 | -7.60714 | 1.59746 | -4.76 | <.0001 |
| sexdem | 1 | 1.46429 | 1.59746 | 0.92 | 0.3685 |

Brain wt = 1269 + 58.3 sex - 7.6 dementia + 1.46 sex x dementia

Balanced designs and effect coding

| Type of person | dementia | gender | dementia*gender |
|---|---|---|---|
| no dementia, Female | -1 | -1 | 1 |
| dementia, Female | 1 | -1 | -1 |
| no dementia, Male | -1 | 1 | -1 |
| dementia, Male | 1 | 1 | 1 |
| total | 0 | 0 | 0 |

Correlations among $X_1$ = dementia, $X_2$ = gender, $X_3$ = dementia*gender. Effect coding used with balanced data creates orthogonality.

| | Dementia | Gender | Dementia*gender |
|---|---|---|---|
| Dementia | 1.0 | 0.0 | 0.0 |
| Gender | 0.0 | 1.0 | 0.0 |
| Dementia*gender | 0.0 | 0.0 | 1.0 |

Relation between sum of squares (SS) and regression coefficients: $SS = n b^2$

| Factor | regr coefficient (b) | $n b^2$ = Sum of squares (n = 28) |
|---|---|---|
| Dementia | -7.607 | 28 (7.607)² = 1620.32 |
| Gender | 58.25 | 28 (58.25)² = 95005.75 |
| Dementia*Gender | 1.4643 | 28 (1.4643)² = 60.036 |

The SS are functions of the squared regression coefficient & n.

Dementia, Gender and the Dementia x Gender interaction are orthogonal. The statistical significance of each factor does not depend on whether the other factors are in the model. This makes evaluating each factor easy.

Orthogonality holds if:

1. Effect coding is used in the regression
2. The design is balanced
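A quick numerical check of $SS = n b^2$, using the effect-coded regression coefficients and the error mean square reported for the brain weight example. The variable names are illustrative.

```python
n = 28             # total observations in the balanced 2 x 2 design
ms_error = 71.45   # error mean square (SD_e squared) from the ANOVA table

# Effect-coded regression coefficients from the regression output
coefs = {"sex": 58.25, "dementia": -7.60714, "sex*dementia": 1.46429}

# With effect coding and a balanced design, each 1-df sum of squares is n * b^2
ss = {name: n * b ** 2 for name, b in coefs.items()}
# Each F ratio is the 1-df sum of squares divided by the error mean square
f_ratio = {name: value / ms_error for name, value in ss.items()}

for name in coefs:
    print(name, round(ss[name], 2), round(f_ratio[name], 2))
```

The sums of squares reproduce the Type III SS column, and each F is the square of the corresponding regression t value, up to rounding.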

ANOVA tables as a compact regression In general, if factor A has “a” levels (and “a” means), in a regression it must be represented by a-1 dummy or effect coded variables with a-1 corresponding regression coefficients. In the ANOVA table for factor A, the sum of squares for A (SSa), is made out of the sum of squares of the a-1 regression coefficients. DF=a-1.

Ex: a = 4, a-1 = 3, three coded variables. Schematically, for orthogonal coded variables,

$$SS_a = \text{constant} \times (b_1^2 + b_2^2 + b_3^2)$$

So, if factor A is NOT significant in the ANOVA table, we can conclude that $\beta_1 = \beta_2 = \dots = \beta_{a-1} = 0$ without looking at each one individually, a major simplification.

If factor B has “b” levels, there are a x b possible combinations (cells) and (a-1) + (b-1) + (a-1)(b-1) = ab-1 dummy (or effect coded) variables and regression coefficients for A, B and the A x B interaction respectively. The squared effects of A, B and A x B are represented in a “condensed” form in the ANOVA table.

ANOVA table: summarizes ab-1 effects in three lines

| Factor | df | Sum of Squares (SS) | Mean square = SS/df |
|---|---|---|---|
| A | a-1 | SSa | SSa/(a-1) |
| B | b-1 | SSb | SSb/(b-1) |
| AB | (a-1)(b-1) | SSab | SSab/[(a-1)(b-1)] |

When is the ANOVA table useful?

Dependent variable: depression score

| Source | DF | SS | Mean Square | F Value | overall p value |
|---|---|---|---|---|---|
| Model | 199 | 3387.41 | 17.02 | 4.42 | <.0001 |
| Error | 400 | 1540.17 | 3.85 | | |
| Corrected Total | 599 | 4927.58 | | | |

root MSE = 1.962, $R^2$ = 0.687

| Source | DF | SS | Mean Square | F Value | p value |
|---|---|---|---|---|---|
| gender | 1 | 778.084 | 778.084 | 202.08 | <.0001 |
| race | 3 | 229.689 | 76.563 | 19.88 | <.0001 |
| educ | 4 | 104.838 | 26.209 | 6.81 | <.0001 |
| occ | 4 | 1531.371 | 382.843 | 99.43 | <.0001 |
| gender*race | 3 | 1.879 | 0.626 | 0.16 | 0.9215 |
| gender*educ | 4 | 3.575 | 0.894 | 0.23 | 0.9203 |
| gender*occ | 4 | 8.907 | 2.227 | 0.58 | 0.6785 |
| race*educ | 12 | 69.064 | 5.755 | 1.49 | 0.1230 |
| race*occ | 12 | 62.825 | 5.235 | 1.36 | 0.1826 |
| educ*occ | 16 | 60.568 | 3.786 | 0.98 | 0.4743 |
| gender*race*educ | 12 | 77.742 | 6.479 | 1.68 | 0.0682 |
| gender*race*occ | 12 | 59.705 | 4.975 | 1.29 | 0.2202 |
| gender*educ*occ | 16 | 100.920 | 6.308 | 1.64 | 0.0565 |
| race*educ*occ | 48 | 206.880 | 4.310 | 1.12 | 0.2792 |
| gender*race*educ*occ | 48 | 91.368 | 1.903 | 0.49 | 0.9982 |

[Figure: 8 graphs of the 200 depression means. Y = depression score (depr), X = occupation (occ) and education (educ), with a separate graph for each gender (Males, Females) and race (W, B, H, A).]

One of the 8 graphs Note parallelism implying no interaction

Depression: final model

| Source | DF | Sum of Squares | Mean Square | F | overall p |
|---|---|---|---|---|---|
| Model | 12 | 2643.981859 | 220.331822 | 56.64 | <.0001 |
| Error | 587 | 2283.610408 | 3.89030 | | |
| Corrected Total | 599 | 4927.592267 | | | |

| R-Square | Coeff Var | Root MSE | y Mean |
|---|---|---|---|
| 0.536567 | 21.24713 | 1.972386 = $SD_e$ | 9.283069 |

| Source | DF | SS | Mean Square | F Value | p value |
|---|---|---|---|---|---|
| gender | 1 | 778.084257 | 778.08 | 200.01 | <.0001 |
| race | 3 | 229.688698 | 76.56 | 19.68 | <.0001 |
| educ | 4 | 104.837607 | 26.21 | 6.74 | <.0001 |
| occ | 4 | 1531.371296 | 382.84 | 98.41 | <.0001 |

Analysis shows that the factors are additive (no significant interactions).

Example 2: ANOVA as a compact regression

Example: Y = log pertussis antibody titer

What if the potential predictive factors are:

- Blood type: A-, A+, B-, B+, AB-, AB+, O-, O+ (8 levels)
- Center: LA, SF, Chicago, NY, Houston, Seattle (6 levels)
- Vaccine: placebo, IgA, IgG (3 levels)

How many β parameters are summarized? Each line of the ANOVA table (df, SS, MS = SS/df, F, p value) stands for the following number of βs:

| Factor | df (= number of βs) |
|---|---|
| (Intercept) | 1 |
| Bloodtype | 7 |
| Center | 5 |
| Vaccine | 2 |
| Bloodtype * Center | 7 x 5 = 35 |
| Bloodtype * Vaccine | 7 x 2 = 14 |
| Center * Vaccine | 5 x 2 = 10 |
| BT * Center * Vaccine | 7 x 5 x 2 = 70 |
| Total model | 144 |
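The degrees-of-freedom bookkeeping in this table follows a simple rule: each term's df is the product of (levels - 1) over the factors in the term. A short sketch, with illustrative names, that reproduces the counts:

```python
from itertools import combinations
from math import prod

levels = {"Bloodtype": 8, "Center": 6, "Vaccine": 3}

# df for every main effect and interaction in the full factorial model
df = {}
for order in range(1, len(levels) + 1):
    for term in combinations(levels, order):
        df["*".join(term)] = prod(levels[factor] - 1 for factor in term)

total_params = 1 + sum(df.values())  # add 1 for the intercept

for term, value in df.items():
    print(term, value)
print("Total model", total_params)
```

The total, 144, equals the number of cells (8 x 6 x 3), which is why the full factorial model has one parameter per cell mean.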

If one of the "condensed" factors above is NOT significant, the entire set of βs for that factor can be removed from the model. The "sum of squares" ANOVA table is a condensed regression table that is useful for screening, particularly screening interactions. It allows one to test "chunks" of the model. If we also have balance, then all the parts above are orthogonal so the assessment of one factor or interaction is not affected if another factor or interaction is significant or not. This is an ideal analysis situation.

If all of the interaction terms are NOT significant, then there is no evidence against additivity, and the influence of all the factors on the outcome can be treated as additive. If all the interaction terms involving factor “B” are not significant, then the impact of factor B on Y can be treated as additive.

Balanced versus unbalanced ANOVA

Below, “$n_c$ =” denotes the sample size in each cell. The design is unbalanced since n is not the same in each cell.

Cell and marginal mean amygdala volumes in cc

| | Male | Female | Adjusted marg. mean | Observed marg. mean |
|---|---|---|---|---|
| Dementia | 0.5 ($n_c$ = 10) | 0.5 ($n_c$ = 90) | 0.5 | 0.5 (n = 100) |
| No dementia | 1.5 ($n_c$ = 190) | 1.5 ($n_c$ = 10) | 1.5 | 1.5 (n = 200) |
| Adjusted marg. means | 1.0 | 1.0 | | |
| Observed marg. means | 1.45 (n = 200) | 0.6 (n = 100) | | n = 300 |

(10 x 0.5 + 190 x 1.5)/200 = 1.45, (90 x 0.5 + 10 x 1.5)/100 = 0.60

Gender & dementia are NOT orthogonal.
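The gap between observed and adjusted marginal means can be reproduced from the cell means and cell sizes. In this sketch the "adjusted" mean is taken to be the unweighted average of the cell means, which matches the values in the table; the names are illustrative.

```python
# (dementia status, sex) -> (cell mean in cc, cell sample size)
cells = {
    ("dementia", "male"): (0.5, 10),
    ("dementia", "female"): (0.5, 90),
    ("no dementia", "male"): (1.5, 190),
    ("no dementia", "female"): (1.5, 10),
}

def observed_marginal(sex):
    """Sample-size weighted mean of the cell means for one sex."""
    rows = [(m, n) for (_, s), (m, n) in cells.items() if s == sex]
    return sum(m * n for m, n in rows) / sum(n for _, n in rows)

def adjusted_marginal(sex):
    """Unweighted mean of the cell means for one sex."""
    means = [m for (_, s), (m, _) in cells.items() if s == sex]
    return sum(means) / len(means)

for sex in ("male", "female"):
    print(sex, observed_marginal(sex), adjusted_marginal(sex))
```

Males and females have identical cell means, yet their observed marginal means differ (1.45 vs 0.60) purely because dementia is much more common among the females in this sample.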

Repeated measure ANOVA – paired t test example

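Motivation: jaw growth in children (cm), Potthoff & Roy

| child | 8 yrs | 10 yrs | difference |
|---|---|---|---|
| 1 | 21.0 | 20.0 | -1.0 |
| 2 | 21.0 | 21.5 | 0.5 |
| 3 | 20.5 | 24.0 | 3.5 |
| 4 | 23.5 | 24.5 | 1.0 |
| 5 | 21.5 | 23.0 | 1.5 |
| 6 | 20.0 | 21.0 | 1.0 |
| 7 | 21.5 | 22.5 | 1.0 |
| 8 | 23.0 | 23.0 | 0.0 |
| 9 | 20.0 | 21.0 | 1.0 |
| 10 | 16.5 | 19.0 | 2.5 |
| 11 | 24.5 | 25.0 | 0.5 |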

| | 8 yrs | 10 yrs | difference |
|---|---|---|---|
| Mean | 21.2 | 22.2 | 1.0 |
| SD | 2.1 | 1.9 | 1.2 |
| SE | 0.64 | 0.57 | 0.36 = $SE_d$ |

| | unpaired | paired |
|---|---|---|
| t | 1.216 | 2.907 |
| df | 20 | 10 |
| p value | 0.2382 | 0.0156 |

Paired & unpaired t tests give different p values even though they are using the same means. The correlation between the two ages is r = 0.83, not zero, and

$$SE_d = \sqrt{SE_1^2 + SE_2^2 - 2\,r\,SE_1\,SE_2}$$
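The paired and unpaired t statistics can be recomputed from the jaw data with the standard library. Note one assumption: in the flattened data listing, child 1's difference and child 8's 10-year value are missing, so the values used here (-1.0 and 23.0) are inferred from the other two columns. Variable names are illustrative.

```python
import math
import statistics

# Jaw measurements (cm) at ages 8 and 10 for the 11 children
age8 = [21.0, 21.0, 20.5, 23.5, 21.5, 20.0, 21.5, 23.0, 20.0, 16.5, 24.5]
age10 = [20.0, 21.5, 24.0, 24.5, 23.0, 21.0, 22.5, 23.0, 21.0, 19.0, 25.0]
n = len(age8)

m8, m10 = statistics.mean(age8), statistics.mean(age10)
se1 = statistics.stdev(age8) / math.sqrt(n)
se2 = statistics.stdev(age10) / math.sqrt(n)
mean_diff = m10 - m8

# Unpaired: treats the two ages as independent groups
se_d_unpaired = math.sqrt(se1 ** 2 + se2 ** 2)
t_unpaired = mean_diff / se_d_unpaired

# Paired: works on the within-child differences
diffs = [b - a for a, b in zip(age8, age10)]
se_d_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_d_paired

# Correlation between ages, and the SE_d formula that accounts for it
cov = sum((a - m8) * (b - m10) for a, b in zip(age8, age10)) / (n - 1)
r = cov / (statistics.stdev(age8) * statistics.stdev(age10))
se_d_from_r = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)

print(round(t_unpaired, 3), round(t_paired, 3), round(r, 2))
```

The SE computed from the correlation formula agrees with the SE of the paired differences, which is why the positive correlation shrinks the yardstick and raises t.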

Repeated measures: must add a subject effect

| Factor | df | SS | MS | F | p value |
|---|---|---|---|---|---|
| A | a-1 | SSa | MSa | Fa | p value for a |
| B | b-1 | SSb | MSb | Fb | p value for b |
| AB | (a-1)(b-1) | SSab | MSab | Fab | p value for ab |
| Subject | n-1 | SSsub | MSsub | -- | -- |
| Error | (n-1)(T-1) | SSe | MSe | | |

Otherwise the “Error” SS is too big, as it is within-subject error and between-subject variation combined.

Factorial vs repeated measure ANOVA

| Model | Residual $SD_e^2$ | $SD_e$ |
|---|---|---|
| Factorial | 4.07 | 2.02 |
| Repeated measure | 0.71 | 0.84 |

The $SD_e$ is too large if the subject effect is not taken into account. If $SD_e$ is too large, SEs are too large & p values are too large.

p values for comparing means to zero

Type 3 Tests of Fixed Effects (ANOVA table)

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| year | 1 | 20 | 1.48 | 0.2382 |

Least Squares Means

| Effect | year | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| year | 8 | 21.1818 | 0.6080 | 20 | 34.84 | <.0001 |
| year | 10 | 22.2273 | 0.6080 | 20 | 36.56 | <.0001 |

Differences of Least Squares Means

| Effect | year | year | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| year | 8 | 10 | -1.0455 | 0.8598 | 20 | -1.22 | 0.2382 |

Repeated measure ANOVA: correct p value for comparing means

(Co)variance Parameter Estimates

| Cov Parm | Estimate | |
|---|---|---|
| id | 3.3545 | = $SD_p^2$, between person variance (controlled) |
| Residual | 0.7114 | = $SD_e^2$, within person variation |

(note 3.3545 + 0.7114 = 4.0659 from the incorrect analysis)

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| year | 1 | 10 | 8.45 | 0.0156 |

Least Squares Means

| Effect | year | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| year | 8 | 21.1818 | 0.6080 | 10 | 34.84 | <.0001 |
| year | 10 | 22.2273 | 0.6080 | 10 | 36.56 | <.0001 |

Differences of Least Squares Means

| Effect | year | year | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| year | 8 | 10 | -1.0455 | 0.3596 | 10 | -2.91 | 0.0156 |
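The standard errors in both outputs can be traced back to the two variance components. The sketch below assumes the simple two-occasion layout of the jaw example (n = 11 children, each measured twice); variable names are illustrative.

```python
import math

var_between = 3.3545   # between-person variance (id)
var_within = 0.7114    # within-person (residual) variance
n = 11                 # children, each measured at ages 8 and 10
mean_diff = 1.0455     # difference between the age 10 and age 8 means

# SE of a single age's mean: both sources of variation contribute
se_mean = math.sqrt((var_between + var_within) / n)

# Ignoring the subject effect: the difference carries the total variance twice
se_d_factorial = math.sqrt(2 * (var_between + var_within) / n)

# Repeated measures: the between-person part cancels in within-child differences
se_d_repeated = math.sqrt(2 * var_within / n)

print(round(se_mean, 4), round(se_d_factorial, 4), round(se_d_repeated, 4))
print(round(mean_diff / se_d_factorial, 2), round(mean_diff / se_d_repeated, 2))
```

The mean difference is the same in both analyses; only the yardstick changes, from about 0.86 to about 0.36, which moves t from 1.22 to 2.91.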