# Copyright ©2011 Brooks/Cole, Cengage Learning Analysis of Variance Chapter 16 1.

## Presentation on theme: "Copyright ©2011 Brooks/Cole, Cengage Learning Analysis of Variance Chapter 16 1."— Presentation transcript:

9.1 Parameters, Statistics, and Statistical Inference. A statistic is a numerical value computed from a sample. Its value may differ for different samples, e.g. the sample mean $\bar{x}$, sample standard deviation $s$, and sample proportion $\hat{p}$. A parameter is a numerical value associated with a population. It is considered fixed and unchanging, e.g. the population mean $\mu$, population standard deviation $\sigma$, and population proportion $p$.

ANOVA. Analysis of variance: a tool for analyzing how the mean value of a quantitative response variable is affected by one or more categorical explanatory factors. With one categorical variable: one-way ANOVA. With two categorical variables: two-way ANOVA.

16.1 Comparing Means with an ANOVA F-Test. Hypotheses: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$; $H_a$: the population means are not all equal. F-statistic: $F = \dfrac{\text{variation among sample means}}{\text{natural variation within groups}}$.

Variation among sample means is 0 if all k sample means are equal and gets larger the more spread out they are. If it is large enough, this is evidence that at least one population mean differs from the others, so we reject the null hypothesis. The p-value is found using an F-distribution (more later).

Example 16.1 Seat Location and GPA. Q: Do the best students sit in the front of a classroom? Data on seat location and GPA for n = 384 students; 88 sit in front, 218 in the middle, 78 in back. Students sitting in the front generally have slightly higher GPAs than others.

Example 16.1 Seat Location and GPA. $H_0: \mu_1 = \mu_2 = \mu_3$; $H_a$: the three population means are not all equal. The F-statistic is 6.69 and the p-value is about 0.001. The p-value is so small that we reject $H_0$ and conclude there are differences among the population means.
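For the special case of 2 numerator degrees of freedom, the upper-tail area of an F-distribution has a simple closed form, $P(F > x) = (1 + 2x/d_2)^{-d_2/2}$, where $d_2$ is the denominator degrees of freedom. The sketch below uses it to check the p-value for this example, assuming the one-way ANOVA degrees of freedom k − 1 = 2 and N − k = 381. The function name `f_pvalue_df1_2` is illustrative, and the shortcut does not apply when the numerator df is anything other than 2.

```python
def f_pvalue_df1_2(f_stat: float, df_denominator: int) -> float:
    """Upper-tail probability P(F > f_stat) for an F-distribution
    with exactly 2 numerator df (closed form valid only in that case)."""
    if f_stat < 0 or df_denominator <= 0:
        raise ValueError("f_stat must be >= 0 and df_denominator must be > 0")
    return (1.0 + 2.0 * f_stat / df_denominator) ** (-df_denominator / 2.0)


# Seat location example: k = 3 groups, N = 384 students, F = 6.69
k, N = 3, 384
p_value = f_pvalue_df1_2(6.69, N - k)
print(f"p-value = {p_value:.4f}")
```

The result is on the order of 0.001, far below 0.05, which agrees with the decision to reject $H_0$.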

Example 16.1 Seat Location and GPA. 95% confidence intervals for the 3 population means: the interval for “front” does not overlap with the other two intervals, which indicates a significant difference between mean GPA for front-row sitters and mean GPA for other students.

Notation for Summary Statistics. k = number of groups; $\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and sample size for the $i$th sample group; N = total sample size = $n_1 + n_2 + \dots + n_k$. Example 16.2 Seat Location and GPA: three seat locations, so k = 3; $n_1 = 88$, $n_2 = 218$, $n_3 = 78$; N = 88 + 218 + 78 = 384.

Assumptions for the F-Test. Samples are independent random samples. The distribution of the response variable is a normal curve within each population. Different populations may have different means. All populations have the same standard deviation, $\sigma$. How k = 3 populations might look …

Conditions for Using the F-Test. The F-statistic can be used if data are not extremely skewed, there are no extreme outliers, and group standard deviations are not markedly different. Tests based on the F-statistic are valid for data with skewness or outliers if sample sizes are large. A rough criterion for standard deviations is that the largest of the sample standard deviations should not be more than twice as large as the smallest.
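The rough criterion for standard deviations is easy to automate. The sketch below is only an illustration: the function name `sd_rule_ok` and the `max_ratio` parameter are assumptions, and the two values passed in are the largest and smallest sample standard deviations reported for the supercar example later in the chapter (5.02 and 2.92). Only the extremes matter for this check.

```python
def sd_rule_ok(sample_sds, max_ratio=2.0):
    """Return True if the largest sample standard deviation is at most
    max_ratio times the smallest one (the rough rule of thumb)."""
    largest, smallest = max(sample_sds), min(sample_sds)
    if smallest <= 0:
        raise ValueError("standard deviations must be positive")
    return largest / smallest <= max_ratio


# Largest and smallest standard deviations from the supercar example
print(sd_rule_ok([5.02, 2.92]))
```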

Example 16.3 Seat Location and GPA. The boxplot showed two outliers in the group of students who typically sit in the middle of a classroom, but there are 218 students in that group, so these outliers don’t have much influence on the results. The standard deviations for the three groups are nearly the same. The data do not appear to be skewed. The necessary conditions for the F-test seem satisfied.

The Family of F-Distributions. Skewed distributions with a minimum value of 0. A specific F-distribution is indicated by two parameters called degrees of freedom: numerator degrees of freedom and denominator degrees of freedom. In one-way ANOVA, numerator df = k − 1 and denominator df = N − k.
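As a quick worked example of the degrees-of-freedom rule, the sketch below applies it to the seat location group sizes given earlier (88, 218, and 78). The helper name `anova_df` is illustrative.

```python
def anova_df(group_sizes):
    """Return (numerator df, denominator df) for a one-way ANOVA F-test."""
    k = len(group_sizes)   # number of groups
    N = sum(group_sizes)   # total sample size
    return k - 1, N - k


# Seat location example: front, middle, back
print(anova_df([88, 218, 78]))  # (2, 381)
```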

Determining the p-Value. Statistical software reports the p-value in output. Table A.4 provides critical values for 1% and 5% significance levels. If the F-statistic is greater than the 5% critical value, the p-value < 0.05. If the F-statistic is greater than the 1% critical value, the p-value < 0.01. If the F-statistic is between the 5% and 1% critical values, the p-value is between 0.01 and 0.05.
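The table-based reasoning can be written as a small decision function. This is a sketch under stated assumptions: the function name `bracket_p_value` is hypothetical, and the critical values used in the call (about 3.02 at the 5% level and about 4.66 at the 1% level) are approximations for 2 and 381 degrees of freedom, not values read from Table A.4.

```python
def bracket_p_value(f_stat, crit_5pct, crit_1pct):
    """Bracket the p-value of an F-test using the 5% and 1% critical values."""
    if crit_1pct < crit_5pct:
        raise ValueError("the 1% critical value must be at least the 5% value")
    if f_stat > crit_1pct:
        return "p-value < 0.01"
    if f_stat > crit_5pct:
        return "0.01 < p-value < 0.05"
    return "p-value > 0.05"


# Seat location example (F = 6.69) with approximate critical values
print(bracket_p_value(6.69, crit_5pct=3.02, crit_1pct=4.66))
```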

16.2 Details of One-Way Analysis of Variance. Fundamental concept: the variation among the data values in the overall sample can be separated into (1) differences between group means and (2) natural variation among observations within a group. Total variation = Variation between groups + Variation within groups. The ANOVA table displays this information.

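Measuring Variation Between Groups. Sum of squares for groups: $\text{SS Groups} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$, where $\bar{x}$ is the mean of all N observations. Numerator of F-statistic = mean square for groups: $\text{MS Groups} = \dfrac{\text{SS Groups}}{k - 1}$.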

Measuring Variation within Groups. Sum of squared errors: $\text{SS Error} = \sum_{i=1}^{k} (n_i - 1) s_i^2$. Denominator of F-statistic = mean square error: $\text{MSE} = \dfrac{\text{SS Error}}{N - k}$. Pooled standard deviation: $s_p = \sqrt{\text{MSE}}$.

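Measuring Total Variation. Total sum of squares: $\text{SS Total} = \text{SSTO} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$, where $x_{ij}$ is the $j$th observation in group $i$ and $\bar{x}$ is the mean of all N observations. SS Total = SS Groups + SS Error.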
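The decomposition of total variation can be verified numerically. The sketch below computes the three sums of squares, the two mean squares, and the F-statistic from raw data using the standard one-way ANOVA definitions. The data are hypothetical (three groups of three values, chosen so the arithmetic is easy to follow), and the function name `one_way_anova` is illustrative.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute sums of squares, mean squares, and F for a one-way ANOVA."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    N = len(all_values)
    grand_mean = mean(all_values)

    # Between-group variation: group sizes times squared deviations of group means
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: squared deviations from each group's own mean
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total variation: squared deviations from the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (N - k)
    return {
        "ss_groups": ss_groups,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "f": ms_groups / ms_error,
    }


# Hypothetical data, not taken from any example in the chapter
data = [[3, 5, 4], [6, 8, 7], [10, 12, 11]]
result = one_way_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For these data the group means are 4, 7, and 11, so SS Groups = 74, SS Error = 6, and SS Total = 80, which confirms that the between-group and within-group pieces add up to the total.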

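General Format of a One-Way ANOVA Table

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Between groups | k − 1 | SS Groups | MS Groups | MS Groups / MSE |
| Error | N − k | SS Error | MSE | |
| Total | N − 1 | SS Total | | |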

Example 16.7 Analysis of Variation among Weight Losses. Program 3 appears to have the highest weight loss overall.

Example 16.8 Analysis of Variation among Weight Losses. “Factor” is used instead of Groups because the groups (weight-loss programs) form an explanatory factor for the response. Note: Pooled StDev is $\sqrt{\text{MS Error}}$.

Example 16.9 Top Speeds of Supercars. Data: top speeds for six runs on each of five supercars. Source: Kitchens (1998, p. 783).

Example 16.9 Top Speeds. F = 25.15 and the p-value is reported as 0.000 (that is, less than 0.0005), so we reject the null hypothesis that the population mean speeds are the same for all five cars. Conditions are satisfied: the data are not skewed and there are no extreme outliers, and the largest sample standard deviation (5.02, Viper) is not more than twice as large as the smallest (2.92, Acura). MS Error = 14.5 is an estimate of the variance of top speed for the hypothetical distribution of all possible runs with one car. The estimated standard deviation for each car is $\sqrt{14.5} \approx 3.81$. Based on sample means and CIs, Porsche and Ferrari seem to be significantly faster than the other cars.

Computation of 95% Confidence Intervals for the Population Means. In one-way analysis of variance, a confidence interval for a population mean $\mu_i$ is $\bar{x}_i \pm t^* \dfrac{s_p}{\sqrt{n_i}}$, where $s_p = \sqrt{\text{MSE}}$ and $t^*$ is such that the confidence level is the probability between $-t^*$ and $t^*$ in a t-distribution with df = N − k.
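A short sketch of the interval computation follows, using the pooled standard deviation (the square root of MS Error) in place of each group's own standard deviation. It borrows MS Error = 14.5 and six runs per car from the supercar example, so df = N − k = 30 − 5 = 25, for which the usual 95% multiplier is t* = 2.060. The sample mean of 180.0 is a hypothetical placeholder because the transcript does not list the car means, and the function name `anova_mean_ci` is illustrative.

```python
import math


def anova_mean_ci(sample_mean, ms_error, n_i, t_star):
    """Confidence interval for one population mean in a one-way ANOVA,
    using the pooled standard deviation s_p = sqrt(MS Error)."""
    s_p = math.sqrt(ms_error)
    margin = t_star * s_p / math.sqrt(n_i)
    return sample_mean - margin, sample_mean + margin


# MS Error and n_i from the supercar example; the mean 180.0 is hypothetical
low, high = anova_mean_ci(sample_mean=180.0, ms_error=14.5, n_i=6, t_star=2.060)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Because every group shares the same pooled standard deviation, intervals for groups with equal sample sizes all have the same width.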

16.3 Other Methods. When data are skewed or extreme outliers are present, it is better to analyze the median instead of the mean. Two such tests are (1) the Kruskal-Wallis test and (2) Mood’s median test. These are also called nonparametric tests. $H_0$: Population medians are equal. $H_a$: Population medians are not all equal.
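To show what a rank-based test computes, here is a minimal sketch of the Kruskal-Wallis statistic, $H = \dfrac{12}{N(N+1)} \sum_i \dfrac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the sum of the ranks in group $i$ after ranking all N observations together. The data are hypothetical, the function name `kruskal_wallis_h` is illustrative, and the sketch omits the tie correction that statistical software typically applies. Under the null hypothesis H is approximately chi-square distributed with k − 1 degrees of freedom, which is where the p-value comes from.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction; ties get average ranks)."""
    pooled = sorted(x for g in groups for x in g)
    N = len(pooled)

    # Assign each distinct value the average of the ranks it occupies
    rank_of = {}
    i = 0
    while i < N:
        j = i
        while j < N and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Sum over groups of (rank sum squared) / (group size)
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (N * (N + 1)) * total - 3 * (N + 1)


# Hypothetical data: three completely separated groups
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}")
```

With these data the rank sums are 6, 15, and 24, giving H = 7.2.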

Example 16.12 Drinks and Seat Location. Data: seat location and number of alcoholic drinks per week. Students sitting in the back report drinking more. The data appear skewed, and the sample standard deviations differ.

Example 16.12 Drinks and Seat Location. P = 0.000, which is strong evidence that the population median numbers of drinks per week are not all equal.

Example 16.13 Drinks and Seat Location. P = 0.000, so the null hypothesis of equal population medians can be rejected.