## More About Significance Tests, Chapter 13 (presentation transcript)

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.

Slide 2: Hypothesis testing about: a population mean or mean difference (paired data); the difference between the means of two populations; the difference between two population proportions. Three cautions:
1. Inference is valid only if the sample is representative of the population for the question of interest.
2. Hypotheses and conclusions apply to the larger population(s) represented by the sample(s).
3. If the distribution of a quantitative variable is highly skewed, consider analyzing the median rather than the mean; such techniques are called nonparametric methods (Topic 2 on CD).

Slide 3: 13.1 General Ideas of Significance Testing. Steps in any hypothesis test:
1. Determine the null and alternative hypotheses.
2. Verify necessary data conditions, and if they are met, summarize the data into an appropriate test statistic.
3. Assuming the null hypothesis is true, find the p-value.
4. Decide whether or not the result is statistically significant based on the p-value.
5. Report the conclusion in the context of the situation.

Slide 4: 13.2 Testing Hypotheses About One Mean or Paired Data. Step 1: Determine the null and alternative hypotheses.
1. H0: μ = μ0 versus Ha: μ ≠ μ0 (two-sided)
2. H0: μ ≥ μ0 versus Ha: μ < μ0 (one-sided)
3. H0: μ ≤ μ0 versus Ha: μ > μ0 (one-sided)
Often H0 for a one-sided test is written as H0: μ = μ0. Remember that a p-value is computed assuming H0 is true, and μ0 is the value used for that computation.

Slide 5: Step 2: Verify necessary data conditions.
Situation 1: The population of measurements of interest is approximately normal, and a random sample of any size is measured. In practice, use this method if the shape is not notably skewed and there are no extreme outliers.
Situation 2: The population of measurements of interest is not approximately normal, but a large random sample (n ≥ 30) is measured. If there are extreme outliers or extreme skewness, it is better to have a larger sample.

Slide 6: Continuing Step 2: The test statistic. The t-statistic is a standardized score for measuring the difference between the sample mean and the null hypothesis value of the population mean:
t = (x̄ − μ0) / (s/√n)
This t-statistic has (approximately) a t-distribution with df = n − 1.
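As a quick arithmetic check, the t-statistic can be sketched in Python; the function name `one_sample_t` is illustrative, and the numbers plugged in are the summary statistics from Example 13.1 later in this chapter:

```python
import math

def one_sample_t(xbar, s, n, mu0):
    """Standardized score: t = (sample mean - null value) / (s / sqrt(n))."""
    se = s / math.sqrt(n)            # standard error of the mean
    return (xbar - mu0) / se, n - 1  # t-statistic and df = n - 1

# Example 13.1 summary numbers: xbar = 98.217, s = 0.684, n = 18, mu0 = 98.6
t, df = one_sample_t(98.217, 0.684, 18, 98.6)
print(round(t, 2), df)  # -2.38 17
```

The result matches the Minitab output shown for Example 13.1 (T = -2.38 with 17 degrees of freedom).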

Slide 7: Step 3: Assuming H0 is true, find the p-value.
For Ha "less than," the p-value is the area below t, even if t is positive.
For Ha "greater than," the p-value is the area above t, even if t is negative.
For a two-sided Ha, the p-value is 2 × the area above |t|.

Slide 8: Steps 4 and 5: Decide whether or not the result is statistically significant based on the p-value, and report the conclusion in the context of the situation. Choose a level of significance α, and reject H0 if the p-value is less than (or equal to) α. Otherwise, conclude that there is not enough evidence to support the alternative hypothesis. These two steps remain the same for all of the hypothesis tests considered in this book.

Slide 9: Example 13.1 Normal Body Temperature. What is normal body temperature? Is it actually less than 98.6 degrees Fahrenheit (on average)? Step 1: State the null and alternative hypotheses:
H0: μ = 98.6 versus Ha: μ < 98.6
where μ = mean body temperature in the human population.

Slide 10: Example 13.1 Normal Body Temperature (cont.). Data: a random sample of n = 18 normal body temperatures:
98.2 97.8 99.0 98.6 98.2 97.8 98.4 99.7 98.2 97.4 97.6 98.4 98.0 99.2 98.6 97.1 97.2 98.5
Step 2: Verify data conditions. A boxplot shows no outliers and no strong skewness. The sample mean of 98.217 is close to the sample median of 98.2.

Slide 11: Example 13.1 Normal Body Temperature (cont.). Step 2: Summarizing the data with a test statistic. Minitab output:
Test of mu = 98.600 vs mu < 98.600
Variable      N    Mean   StDev  SE Mean      T      P
Temperature  18  98.217   0.684    0.161  -2.38  0.015
Key elements: sample statistic x̄ = 98.217 (under "Mean"); standard error s/√n = 0.684/√18 = 0.161 (under "SE Mean"); t = (98.217 − 98.6)/0.161 = −2.38 (under "T").

Slide 12: Example 13.1 Normal Body Temperature (cont.). Step 3: Find the p-value. From the output: p-value = 0.015. From Table A.3: the p-value is between 0.010 and 0.016. The area to the left of t = −2.38 equals the area to the right of t = +2.38. The value t = 2.38 is between the column headings 2.33 and 2.58 in the table, and for df = 17 the corresponding one-sided p-values are 0.016 and 0.010.

Slide 13: Example 13.1 Normal Body Temperature (cont.). Step 4: Decide whether or not the result is statistically significant based on the p-value. Using α = 0.05 as the level of significance criterion, the results are statistically significant because 0.015, the p-value of the test, is less than 0.05. In other words, we can reject the null hypothesis. Step 5: Report the conclusion. We can conclude, based on these data, that the mean temperature in the human population is actually less than 98.6 degrees.

Slide 14: Paired Data and the Paired t-Test. Data: two variables for each of n individuals or pairs; use the differences d = x1 − x2. Parameter: μd = population mean of the differences. Sample estimate: d̄ = sample mean of the differences. Standard deviation and standard error: sd = standard deviation of the sample of differences; s.e.(d̄) = sd/√n. Often of interest: is the mean difference in the population different from 0?

Slide 15: Steps for a Paired t-Test.
Step 1: Determine the null and alternative hypotheses: H0: μd = 0 versus Ha: μd ≠ 0, Ha: μd > 0, or Ha: μd < 0. Watch how the differences are defined when selecting Ha.
Step 2: Verify the data conditions and compute the test statistic. The conditions apply to the differences. The t-test statistic is:
t = (d̄ − 0) / (sd/√n)
Steps 3, 4 and 5: Similar to the t-test for a single mean. The df = n − 1, where n is the number of differences.
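The steps above reduce the pairs to differences and then run a one-sample t-test against 0. A minimal Python sketch; the function name and the toy before/after data are hypothetical, not from the slides:

```python
import math
from statistics import mean, stdev

def paired_t(x1, x2):
    """Paired t-test: form d = x1 - x2 per pair, then one-sample t vs 0."""
    d = [a - b for a, b in zip(x1, x2)]             # differences
    n = len(d)
    t = (mean(d) - 0) / (stdev(d) / math.sqrt(n))   # t = dbar / (sd / sqrt(n))
    return t, n - 1                                  # df = (number of differences) - 1

# Hypothetical before/after measurements for 4 subjects
t, df = paired_t([3, 5, 4, 6], [2, 3, 4, 5])
print(round(t, 2), df)  # 2.45 3
```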

Slide 16: Example 13.2 Effect of Alcohol. Study: n = 10 pilots perform a simulation, first under sober conditions and then after drinking alcohol. Response: amount of useful performance time (a longer time is better). Question: Does useful performance time decrease with alcohol use? Step 1: State the null and alternative hypotheses:
H0: μd = 0 versus Ha: μd > 0
where μd = population mean difference between the alcohol and no-alcohol measurements if all pilots took these tests.

Slide 17: Example 13.2 Effect of Alcohol (cont.). Data: a random sample of n = 10 time differences. Step 2: Verify data conditions. A boxplot shows no outliers and no extreme skewness.

Slide 18: Example 13.2 Effect of Alcohol (cont.). Step 2: Summarizing the data with a test statistic. Minitab output:
Test of mu = 0.0 vs mu > 0.0
Variable   N   Mean  StDev  SE Mean     T      P
Diff      10  195.6  230.5     72.9  2.68  0.013
Key elements: sample statistic d̄ = 195.6 (under "Mean"); standard error sd/√n = 230.5/√10 = 72.9 (under "SE Mean"); t = (195.6 − 0)/72.9 = 2.68 (under "T").
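The key elements of this output can be reproduced from the summary numbers alone, a sketch using the slide's rounded values:

```python
import math

dbar, sd, n = 195.6, 230.5, 10   # summary statistics from the Minitab output
se = sd / math.sqrt(n)           # standard error of the mean difference
t = (dbar - 0) / se              # test statistic versus the null value 0
print(round(se, 1), round(t, 2))  # 72.9 2.68
```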

Slide 19: Example 13.2 Effect of Alcohol (cont.). Step 3: Find the p-value. From the output: p-value = 0.013. From Table A.3: the p-value is between 0.007 and 0.015. The value t = 2.68 is between the column headings 2.58 and 3.00 in the table, and for df = 9 the corresponding one-sided p-values are 0.015 and 0.007.

Slide 20: Example 13.2 Effect of Alcohol (cont.). Steps 4 and 5: Decide whether the result is statistically significant based on the p-value, and report the conclusion. Using α = 0.05 as the level of significance criterion, we can reject the null hypothesis since the p-value of 0.013 is less than 0.05. Even with a small experiment, it appears that alcohol has a statistically significant effect and decreases useful performance time.

Slide 21: Rejection Region Approach. This approach replaces Steps 3 and 4 with:
Substitute Step 3: Find the critical value and rejection region for the test.
Substitute Step 4: If the test statistic is in the rejection region, conclude that the result is statistically significant and reject the null hypothesis. Otherwise, do not reject the null hypothesis.
Note: the rejection region method and the p-value method will always arrive at the same conclusion about statistical significance.

Slide 22: Rejection Region Approach Summary (use the row of Table A.2 corresponding to df). For Example 13.1, Normal Body Temperature: the alternative was one-sided to the left, df = 17, and α = 0.05. The critical value from Table A.2 is −1.74, so the rejection region is t ≤ −1.74. The test statistic was −2.38, so the null hypothesis is rejected. The same conclusion is reached.

Slide 23: 13.3 Testing the Difference Between Two Means (Independent Samples).
Step 1: Determine the null and alternative hypotheses: H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 ≠ 0, Ha: μ1 − μ2 > 0, or Ha: μ1 − μ2 < 0. Watch how Populations 1 and 2 are defined.
Step 2: Verify the data conditions and compute the test statistic. Both samples are large, or there are no extreme outliers and no extreme skewness in either sample. The samples are independent. The t-test statistic is:
t = (x̄1 − x̄2 − 0) / √(s1²/n1 + s2²/n2)
Steps 3, 4 and 5: Similar to the t-test for one mean.
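This unpooled statistic, together with the Welch approximation for df mentioned in Example 13.3, can be sketched in Python. The check below plugs in the rounded summary statistics from Example 13.5 (sleep times, later in this chapter); because the means and standard deviations are rounded, t comes out near but not exactly equal to Minitab's 1.62, while the df of 140 matches:

```python
import math

def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t with Welch's approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2 - 0) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Rounded summaries from Example 13.5 (female vs. male sleep hours)
t, df = welch_t(7.02, 1.75, 83, 6.55, 1.68, 65)
print(int(df))  # 140
```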

Slide 24: Example 13.3 Effect of a Stare on Driving. Randomized experiment: researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (in seconds) it took each driver to proceed from the sign to a mark on the other side of the intersection. Question: Does staring speed up crossing times? Step 1: State the null and alternative hypotheses:
H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 > 0
where population 1 = no-stare population and population 2 = stare population.

Slide 25: Example 13.3 Effect of a Stare (cont.). Data: n1 = 14 no-stare and n2 = 13 stare responses. Step 2: Verify data conditions. There are no outliers and no extreme skewness in either group.

Slide 26: Example 13.3 Effect of a Stare (cont.). Step 2: Summarizing the data with a test statistic. Sample statistic: x̄1 − x̄2 = 6.63 − 5.59 = 1.04 seconds. Standard error: s.e.(x̄1 − x̄2) = √(s1²/n1 + s2²/n2) ≈ 0.43, so t = 1.04/0.43 ≈ 2.41.

Slide 27: Example 13.3 Effect of a Stare (cont.). Steps 3, 4 and 5: Determine the p-value and make a conclusion in context. The p-value is determined using a t-distribution with df = 21 (from the Welch approximation formula) by finding the area to the right of t = 2.41. Table A.3 gives a p-value between 0.009 and 0.015; the p-value = 0.013, so we reject the null hypothesis and the results are "statistically significant". We can conclude that if all drivers were stared at, the mean crossing time at an intersection would be faster than under normal conditions.

Slide 28: Pooled Two-Sample t-Test. Based on the assumption that the two populations have equal population standard deviations, the pooled standard deviation and test statistic are:
sp = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2))
t = (x̄1 − x̄2 − 0) / (sp √(1/n1 + 1/n2))
Note: pooled df = (n1 − 1) + (n2 − 1) = n1 + n2 − 2.
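A sketch of the pooled computation, again checked against the rounded summaries from Example 13.5; the pooled standard deviation of 1.72 and df = 146 match the Minitab output shown there, while t lands near (not exactly at) Minitab's 1.62 because the inputs are rounded:

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t assuming equal population standard deviations."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled st. dev.
    t = (xbar1 - xbar2 - 0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, sp, df

# Rounded summaries from Example 13.5 (female vs. male sleep hours)
t, sp, df = pooled_t(7.02, 1.75, 83, 6.55, 1.68, 65)
print(round(sp, 2), df)  # 1.72 146
```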

Slide 29: Guidelines for Using the Pooled t-Test.
- If the sample sizes are equal, the pooled and unpooled standard errors are equal, so the t-statistic is the same.
- If the sample standard deviations are similar, the assumption of a common population variance is reasonable and the pooled procedure can be used.
- If the sample sizes are very different, the pooled test can be quite misleading unless the sample standard deviations are similar.
- If the sample sizes are very different and the smaller standard deviation accompanies the larger sample size, using the pooled procedure is not recommended.
- If the sample sizes are very different, the standard deviations are similar, and the larger sample size produced the larger standard deviation, the pooled t-test is acceptable and will be conservative.

Slide 30: Example 13.5 Male and Female Sleep Times. Question: Is there a difference between how long female and male students slept the previous night? Data: the 83 female and 65 male responses from students in an intro stat class. The null and alternative hypotheses are:
H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 ≠ 0
where population 1 = female population and population 2 = male population. Note: the sample sizes are similar and the sample standard deviations are similar, so use of the pooled procedure is warranted.

Slide 31: Example 13.5 Male and Female Sleep Times (cont.). Minitab output:
Two-sample T for sleep [without "Assume Equal Variance" option]
Sex      N  Mean  StDev  SE Mean
Female  83  7.02   1.75     0.19
Male    65  6.55   1.68     0.21
95% CI for mu(f) - mu(m): (-0.10, 1.02)
T-Test mu(f) = mu(m) (vs not =): T-Value = 1.62  P = 0.11  DF = 140

Two-sample T for sleep [with "Assume Equal Variance" option]
Sex      N  Mean  StDev  SE Mean
Female  83  7.02   1.75     0.19
Male    65  6.55   1.68     0.21
95% CI for mu(f) - mu(m): (-0.10, 1.03)
T-Test mu(f) = mu(m) (vs not =): T-Value = 1.62  P = 0.11  DF = 146
Both use Pooled StDev = 1.72

Slide 32: 13.4 Testing the Difference Between Two Population Proportions.
Step 1: Determine the null and alternative hypotheses: H0: p1 − p2 = 0 versus Ha: p1 − p2 ≠ 0, Ha: p1 − p2 > 0, or Ha: p1 − p2 < 0. Watch how Populations 1 and 2 are defined.
Step 2: Verify the data conditions. The samples are independent. The sample sizes are large enough so that n1 p̂1, n1(1 − p̂1), n2 p̂2, and n2(1 − p̂2) are all at least 5, and preferably at least 10.

Slide 33: Continuing Step 2: The test statistic. Under the null hypothesis, there is a common population proportion p. This common value is estimated using all the data as:
p̂ = (n1 p̂1 + n2 p̂2) / (n1 + n2)
The standardized test statistic is:
z = (p̂1 − p̂2 − 0) / √(p̂(1 − p̂)(1/n1 + 1/n2))
This z-statistic has (approximately) a standard normal distribution.
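A Python sketch of this statistic, checked against the ear-infection counts from Example 13.6 later in the chapter. Computed from the raw counts it gives z ≈ 2.31; the slides report 2.32 because they round the intermediate proportions:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z-statistic for H0: p1 - p2 = 0, pooling both samples to estimate p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                        # combined estimate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # null standard error
    return (p1 - p2 - 0) / se

# Example 13.6 data: 68 of 165 infections on placebo vs. 46 of 159 on xylitol
z = two_prop_z(68, 165, 46, 159)
print(round(z, 2))  # about 2.31 (2.32 in the slides, which round intermediates)
```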

Slide 34: Step 3: Assuming H0 is true, find the p-value.
For Ha "less than," the p-value is the area below z, even if z is positive.
For Ha "greater than," the p-value is the area above z, even if z is negative.
For a two-sided Ha, the p-value is 2 × the area above |z|.
Steps 4 and 5: Decide whether the result is statistically significant based on the p-value and make a conclusion in context. Choose a level of significance α, and reject H0 if the p-value is less than (or equal to) α.

Slide 35: Example 13.6 Prevention of Ear Infections. Question: Does use of the sweetener xylitol reduce the incidence of ear infections? Randomized experiment results: of 165 children on placebo, 68 got an ear infection; of 159 children on xylitol, 46 got an ear infection. Step 1: State the null and alternative hypotheses:
H0: p1 − p2 = 0 versus Ha: p1 − p2 > 0
where p1 = population proportion with ear infections on placebo, and p2 = population proportion with ear infections on xylitol.

Slide 36: Example 13.6 Ear Infections (cont.). Step 2: Verify conditions and compute the z-statistic. There are at least 10 children in each sample who did and who did not get an ear infection, so the conditions are met. With p̂1 = 68/165 = 0.412, p̂2 = 46/159 = 0.289, and combined estimate p̂ = 114/324 = 0.352, the test statistic is z = (0.412 − 0.289)/√(0.352 × 0.648 × (1/165 + 1/159)) ≈ 2.32.

Slide 37: Example 13.6 Ear Infections (cont.). Steps 3, 4 and 5: Determine the p-value and make a conclusion in context. The p-value is the area above z = 2.32, found using Table A.1: p-value = 0.0102. So we reject the null hypothesis; the results are "statistically significant". We can conclude that taking xylitol would reduce the proportion of ear infections in the population of similar preschool children in comparison to taking a placebo.

Slide 38: 13.5 Relationship Between Tests and Confidence Intervals. For two-sided tests (for one or two means): H0: parameter = null value versus Ha: parameter ≠ null value.
- If the null value is covered by a (1 − α)100% confidence interval, the null hypothesis is not rejected and the test is not statistically significant at level α.
- If the null value is not covered by a (1 − α)100% confidence interval, the null hypothesis is rejected and the test is statistically significant at level α.
Note: a 95% confidence interval corresponds to a 5% significance level, and a 99% confidence interval to a 1% significance level.

Slide 39: Example 13.4 Mean TV Hours (Male vs. Female). Question: Does the population mean of daily TV hours differ for male and female college students? Test H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 ≠ 0 using α = 0.05. The 95% CI for the difference in population means is (−0.14, +0.98). The null value of 0 hours is in this interval, so the difference in the sample means of 0.42 hours is not significantly different from 0.
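The interval check itself is a one-line comparison; a minimal sketch using Example 13.4's interval (the variable names are illustrative):

```python
ci_low, ci_high = -0.14, 0.98   # 95% CI for mu1 - mu2 from Example 13.4
null_value = 0
# Two-sided test at alpha = 0.05: significant iff the null value is outside the CI
significant = not (ci_low <= null_value <= ci_high)
print(significant)  # False: 0 is inside the interval, so not significant
```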

Slide 40: Confidence Intervals and One-Sided Tests. When testing H0: parameter = null value versus a one-sided alternative, compare the null value to a (1 − 2α)100% confidence interval:
- If the null value is covered by the interval, the test is not statistically significant at level α.
- For the alternative Ha: parameter > null value, the test is statistically significant at level α if the entire interval falls above the null value.
- For the alternative Ha: parameter < null value, the test is statistically significant at level α if the entire interval falls below the null value.

Slide 41: Example 13.6 Ear Infections (cont.). The 95% CI for p1 − p2 is 0.020 to 0.226. Reject H0: p1 − p2 = 0 and accept Ha: p1 − p2 > 0 at α = 0.025, because the entire confidence interval falls above the null value of 0. Note that the p-value for the test was 0.01, which is less than 0.025.

Slide 42: 13.6 Choosing an Appropriate Inference Procedure.
Confidence interval or hypothesis test? Is the main purpose to estimate the numerical value of a parameter, or to make a "maybe not/maybe yes" conclusion about a specific hypothesized value for a parameter?
Determining the appropriate parameter: Is the response variable categorical or quantitative? Is there one sample or two? If two, are the samples independent or paired?

Slide 43: 13.7 The Two Types of Errors and Their Probabilities. When the null hypothesis is true, the probability of a type 1 error, the level of significance, and the α-level are all equivalent. When the null hypothesis is not true, a type 1 error cannot be made.

Slide 44: Trade-Off in Probabilities for the Two Errors. There is an inverse relationship between the probabilities of the two types of errors: increasing the probability of a type 1 error decreases the probability of a type 2 error, and vice versa.

Slide 45: Type 2 Errors and Power. Three factors affect the probability of a type 2 error:
1. Sample size: a larger n reduces the probability of a type 2 error without affecting the probability of a type 1 error.
2. Level of significance: a larger α reduces the probability of a type 2 error by increasing the probability of a type 1 error.
3. The actual value of the population parameter (not in the researcher's control): the farther the truth falls from the null value (in the Ha direction), the lower the probability of a type 2 error.
When the alternative hypothesis is true, the probability of making the correct decision is called the power of the test.

Slide 46: 13.8 Effect Size. Effect size is a measure of how much the truth differs from chance or from a control condition.
Effect size for a single mean: (μ − μ0)/σ
Effect size for comparing two means: (μ1 − μ2)/σ

Slide 47: Estimating Effect Size.
Estimated effect size for a single mean: d = (x̄ − μ0)/s
Estimated effect size for comparing two means: d = (x̄1 − x̄2)/sp, where sp is the pooled standard deviation.
Relationship: test statistic = size of effect × size of study (for a single mean, t = d × √n).
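The "effect × study size" relationship can be verified numerically with the summary statistics from Example 13.1; the function name below is illustrative:

```python
import math

def effect_size_one_mean(xbar, mu0, s):
    """Estimated effect size for a single mean: (xbar - mu0) / s."""
    return (xbar - mu0) / s

# Example 13.1: xbar = 98.217, mu0 = 98.6, s = 0.684, n = 18
d = effect_size_one_mean(98.217, 98.6, 0.684)
t = d * math.sqrt(18)   # test statistic = size of effect x size of study
print(round(d, 2), round(t, 2))  # -0.56 -2.38
```

Note that t = d × √18 reproduces the t = −2.38 computed directly in Example 13.1.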