Published by Tamsyn Wright. Modified about 1 year ago.

1
ANOVA example
• Polychlorinated biphenyls (PCBs), previously used in the manufacture of large electrical transformers and capacitors, are extremely hazardous contaminants when released into the environment. Samples of fish were taken from each of four rivers and analyzed for PCB concentration (in ppm).

2
Question
• Do the data provide sufficient evidence to indicate differences in the mean PCB concentration in fish for the four rivers?
• Hypotheses:
  – $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
  – $H_A$: the means are not all equal (at least one mean is not equal to the others).

3
First Step
• Examine the data. What does this mean?
• Boxplots
• Histograms
• Normal quantile plots
  – Note the command-line language for drawing a grid of 4 probability plots at once. Go to File --> New --> Script File, paste the commands into the script file window, and press F10; the 4 plots are produced automatically.
  – OR: just paste these into the command line.

    par(mfrow=c(2,2))                 # 2 x 2 grid of plots
    for (i in 1:4) {
      # column 1 holds the measurements, column 2 the river number
      qqnorm(PCBfish[,1][PCBfish[,2]==i], ylab="Data quantiles")
      title(paste("River ", i, sep=""))
    }

4
Can we do an ANOVA? What are the criteria?
• Normally distributed
• Equal standard deviations
• Independent samples across treatments
  – What might this look like if it weren’t true?
  – Rivers connected?
• Independent samples within treatments
  – What might this look like if it weren’t true?
  – Clustering?

5
Transformations (p. 65 & 69 of Sleuth)
• Log transformation
  – Why try this?
  – Ratio of largest to smallest > 10, data are skewed, and the group with the larger average has the larger spread
• When to use the reciprocal?
  – Data are waiting times
• When to use the square root?
  – Data are counts
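The log-transformation checks above can be tried on a small example. This is a sketch in R (which shares most of its syntax with S+), and the numbers are invented for illustration; they are not the PCB measurements.

```r
# Hypothetical concentrations for two groups (NOT the PCB data).
# Group 2 is simply group 1 multiplied by 10.
conc  <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
river <- factor(rep(1:2, each = 5))

# Rule of thumb from the slide: is largest / smallest greater than 10?
ratio <- max(conc) / min(conc)

# Spread by group on the original scale and on the log scale
sd_raw <- tapply(conc, river, sd)
sd_log <- tapply(log(conc), river, sd)

ratio
sd_raw
sd_log
```

On the original scale the group with the larger average also has the larger spread; after taking logs the two spreads agree, which is the pattern the log transformation is meant to fix.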

6
Better?
• Why or why not?
• Standard deviations are much more similar

7
Do an ANOVA
• Read table:
  – sum of squares
  – $s_{pooled}$ and $s_{pooled}^2$
  – F-value
  – p-value
• What are your conclusions?
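A minimal sketch of fitting the model and reading those entries off the table in R. The data frame `pcb` and its columns `conc` and `river` are hypothetical stand-ins, not the PCBfish data from the slides.

```r
# Hypothetical stand-in data: 4 rivers, 3 fish each (NOT the PCB data)
pcb <- data.frame(
  conc  = c(1, 2, 3, 2, 4, 6, 4, 8, 16, 5, 10, 20),
  river = factor(rep(1:4, each = 3))
)

# One-way ANOVA on the log scale
fit <- lm(log(conc) ~ river, data = pcb)
tab <- anova(fit)   # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab

# s_pooled^2 is the residual mean square; s_pooled is its square root
s_pooled_sq <- tab["Residuals", "Mean Sq"]
s_pooled    <- sqrt(s_pooled_sq)
```

The value of `s_pooled` computed this way is the same quantity that a printout labels "Residual standard error".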

8
Conclusions
• We can reject the null hypothesis of no difference in these group means.
• At least one of the means is different from the others (is this statement the same as accepting the alternative hypothesis?)
• “Convincing evidence exists that median PCB concentration of fish in these rivers is different (p-value of 0.002; analysis of variance F-test).”

9
Compare just two rivers...
• Average and 95% CI for the difference in PCB in fish between Rivers 1 and 2
• Logged data, so…
  – 1.09 − 1.52 = River 1 − River 2 = −0.43
  – $e^{-0.43} = 0.65$
  – The median concentration of PCB in fish in River 1 is 0.65 times that of fish in River 2.
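The back-transformation can be checked in a couple of lines of R, using only the two log-scale averages quoted on the slide:

```r
# Difference of the log-scale group averages quoted on the slide
diff_log <- 1.09 - 1.52

# Exponentiating a difference of logs gives a ratio on the original scale
ratio <- exp(diff_log)
round(ratio, 2)
```

Because the analysis was done on logged data, the result is read as a multiplicative ratio of medians rather than an additive difference in means.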

10
Is this significant?
• Two-sided, two-sample t-test
• The t-statistic (and p-value) must be calculated by hand, because $s_{pooled}$ is needed to calculate the SE.
• $s_{pooled}$:
• SE:

11
Hypothesis test
• Test the hypothesis that River 1 − River 2 = 0
  – Estimate/SE:
  – Suggestive only of a difference (in fact, at the 0.05 level, we would not reject the null), but we’ll still do a CI for practice
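A sketch of the hand calculation in R. It assumes the estimate of −0.43 from the two-river comparison, the SE of 0.28 used on the confidence-interval slide, and the 88 residual degrees of freedom from the ANOVA fit; the variable names are illustrative.

```r
estimate <- -0.43   # log-scale difference between the two rivers
se       <- 0.28    # SE based on s_pooled (value used on the CI slide)
df_resid <- 88      # residual degrees of freedom from the ANOVA

# t-statistic: estimate divided by its standard error
t_stat <- estimate / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df_resid)

t_stat
p_val
```

The p-value falls above 0.05, which agrees with the slide's remark that the null would not be rejected at that level.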

12
95% CI
• 95% CI for the difference in group means
  – qt(0.975,88); [1] 1.98729
  – $-0.43 \pm (1.99)(0.28) \rightarrow (-0.98, 0.13)$
  – $e^{-0.98} = 0.37$; $e^{0.13} = 1.14$
  – Fish in River 1 have between 0.37 and 1.14 times as much PCB in their muscle as fish in River 2. (Are we surprised that this interval covers 1?)
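The same interval can be reproduced in R from the rounded inputs on the slide (estimate −0.43, SE 0.28, 88 degrees of freedom). Because those inputs are rounded, the endpoints may differ from the slide's in the second decimal place.

```r
estimate <- -0.43
se       <- 0.28
t_crit   <- qt(0.975, 88)   # multiplier for a 95% interval

# Interval on the log scale
ci_log <- estimate + c(-1, 1) * t_crit * se

# Back-transform to a ratio of medians on the original scale
ci_ratio <- exp(ci_log)

ci_log
ci_ratio
```

The back-transformed interval contains 1, which is the ratio-scale counterpart of the log-scale interval containing 0.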

13
ANOVA Explanation
• Reduced model = equal means model
  – All these rivers have the same mean PCB concentration in the fish: null hypothesis
• How wrong are we under this hypothesis?
  – Residual error is how wrong we are
  – Large residuals here mean the null hypothesis fits poorly

14
[Figure: Graph of PCB in Each River: Equal Means (equal-means average = 1.64). A brace marks the residual from the highest point in River 1 to the equal-means average.]

15
ANOVA by hand (conceptual)

16
[Figure: Graph of PCB in Each River: Separate Means. A brace marks the residual from the highest point in River 1 to its fitted value under the separate-means model.]

17
ANOVA by hand (conceptual)

18
Model Inaccuracy
• If the null hypothesis is correct,
  – The two models should be about equal in their ability to explain the data
  – AND the magnitudes of the residuals should be about the same
• If the null hypothesis is incorrect,
  – The magnitudes of the residuals from the equal-means model will tend to be larger
  – Their larger sizes reflect model inaccuracy

19
Residual Sum of Squares
• We need a single summary of the residuals for a particular model. Statisticians have chosen the sum of the squared residuals: the residual sum of squares.
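Written out for the four-river setting, the two residual sums of squares look as follows. The notation is an assumption introduced here, not taken from the slides: $y_{ij}$ is the (logged) PCB value for fish $j$ in river $i$, $n_i$ is the number of fish from river $i$, $\bar{y}$ is the average over all fish, and $\bar{y}_i$ is the average for river $i$.

```latex
\mathrm{RSS}_{\text{reduced}} = \sum_{i=1}^{4} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y} \right)^2,
\qquad
\mathrm{RSS}_{\text{full}} = \sum_{i=1}^{4} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2
```

The reduced (equal-means) model measures each fish against the single overall average; the full (separate-means) model measures each fish against its own river's average.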

20
Extra Sum of Squares
• Subtract the error of the full (separate means) model from the error of the reduced (equal means) model. The result is the difference in the size of the residuals between the two models.
• This is called the Extra Sum of Squares (ESS).
• Another way to say this: the ESS measures the amount of unexplained variability in the reduced model that is explained by the full model.
• How much better is it to say that each river has its own mean than to say that all the rivers share the same mean?
• Thus: $\mathrm{ESS} = \mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}$
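A small R sketch of this subtraction, computed directly from residuals. The data are hypothetical (not the PCB measurements), and the variable names are illustrative.

```r
# Hypothetical data: 4 rivers, 3 fish each (NOT the PCB data)
conc  <- c(1, 2, 3, 2, 4, 6, 4, 8, 16, 5, 10, 20)
river <- factor(rep(1:4, each = 3))
y     <- log(conc)

# Reduced (equal-means) model: every fish is compared with the overall mean
rss_reduced <- sum((y - mean(y))^2)

# Full (separate-means) model: each fish is compared with its own river mean
rss_full <- sum((y - ave(y, river))^2)

# Extra sum of squares
ess <- rss_reduced - rss_full

c(rss_reduced = rss_reduced, rss_full = rss_full, ess = ess)
```

The value of `ess` is the same number that an ANOVA table reports as the between-groups sum of squares, and `rss_full` is the residual sum of squares.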

21
F-Statistic
• How much difference between the models is enough to call it significant (the same question we’ve asked with t-tests, etc.)?
• We compare these two levels of unexplained variability in an F-test.
• We take their difference, divide by the extra degrees of freedom, and scale the result by the best estimate we have of the variance.
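As a worked check in R, the recipe can be applied to the sums of squares and degrees of freedom that appear in the S+ printout later in the deck (14.018 on 3 df for river, 74.488 on 88 df for residuals). The variable names are illustrative.

```r
ess      <- 14.018   # extra sum of squares (river row of the printout)
df_extra <- 3        # extra degrees of freedom
rss_full <- 74.488   # residual sum of squares of the full model
df_full  <- 88       # residual degrees of freedom of the full model

# Best estimate of the variance: residual mean square of the full model
s2_pooled <- rss_full / df_full

# F-statistic: extra sum of squares per extra df, scaled by the variance
f_stat <- (ess / df_extra) / s2_pooled

# p-value: upper tail of the F distribution on (3, 88) df
p_val <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)

f_stat
p_val
```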

22
F-test (cont)
• Large F-statistics are associated with large differences in the size of residuals from the two models.
• This is evidence against the reduced model (null hyp) and in favor of the full model (different means).
• This test is summarized by its p-value (based on an F-distribution).

23
ANOVA Table

24
S+ Printout
• Residual standard error: 0.9200322

              Df  Sum of Sq  Mean Sq  F Value   Pr(F)
  river        3     14.018    4.673    5.520  0.0016
  Residuals   88     74.488    0.846

• We can reject the null hypothesis of no difference in medians. At least one river has a different median PCB concentration.
• For some reason, S+ does not print out the reduced model information (total) that is on the ANOVA table we make by hand.
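The missing total row can be recovered from the two rows that are printed, since the reduced-model sum of squares is the sum of the river and residual sums of squares. A short R sketch using only the printed numbers (variable names are illustrative):

```r
ss_river <- 14.018; df_river <- 3
ss_resid <- 74.488; df_resid <- 88

# Total (reduced-model) row of the hand-made ANOVA table
ss_total <- ss_river + ss_resid
df_total <- df_river + df_resid

# The residual standard error is the square root of the residual mean square
s_pooled <- sqrt(ss_resid / df_resid)

c(ss_total = ss_total, df_total = df_total, s_pooled = s_pooled)
```

The computed `s_pooled` agrees with the printed residual standard error up to the rounding of the sums of squares.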
