We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byNatalie Owen
Modified over 2 years ago
Copyright © 2010 Pearson Education, Inc. Slide
Copyright © 2010 Pearson Education, Inc. Slide Solution: C
Copyright © 2010 Pearson Education, Inc. Chapter 21 More About Tests and Intervals
Copyright © 2010 Pearson Education, Inc. Slide Null Hypothesis To perform a hypothesis test, the null must be a statement about the value of a parameter for a model. We then use this value to compute the probability that the observed sample statisticor something even farther from the null valuemight occur. You cannot prove a null hypothesis true. Use what you want to show as the alternative.
Copyright © 2010 Pearson Education, Inc. Slide Example: The diabetes drug Avandia was approved to treat type II diabetes in But in 2007 an article in the New England Journal of Medicine raised concerns that the drug might carry an increased risk of heart attack. This study combined results from a number of other separate studies to obtain an overall sample of 4485 diabetes patients taking Avandia. People with Type 2 diabetes are known to have about a 20.2% chance of suffering a heart attack within a seven-year period. According to the articles author, the risk found in the NEJM study was equivalent to a 28.9% chance of heart attack over a 7 years. The FDA is the government agency responsible for relabeling Avandia to warn of the risk if it is judged to be unsafe. Although the statistical methods they used are more sophisticated, we can get an idea of their reasoning with the tools we have learned. What null hypothesis and alternative hypothesis about 7-yr heart attack risk would you test? Explain.
Copyright © 2010 Pearson Education, Inc. Slide How to Think About P-Values A P-value is a conditional probability P(observed statistic given that the null hypothesis is true). The P-value is NOT the probability that the null hypothesis is true. Its not even the conditional probability that null hypothesis is true given the data.
Copyright © 2010 Pearson Education, Inc. Slide Example: A NEJM paper reported that the 7-yr risk of heart attack in diabetes patients taking the drub Avandia was increased from the baseline of 20.2% to an estimated 28.9% and said the P- value was.03. How should the P-value be interpreted in this context?
Copyright © 2010 Pearson Education, Inc. Slide What to Do with a High P-Value When we see a small P-value, we could continue to believe the null hypothesis and conclude that we just witnessed a rare event. But instead, we trust the data and use it as evidence to reject the null hypothesis. However big P-values just mean what we observed isnt surprising. That is, the results are now in line with our assumption that the null hypothesis models the world, so we have no reason to reject it. A big P-value doesnt prove that the null hypothesis is true, but it certainly offers no evidence that it is not true. Thus, when we see a large P-value, all we can say is that we dont reject the null hypothesis.
Copyright © 2010 Pearson Education, Inc. Slide Example: The question of whether Avandia increased the risk of heart attack was raised by a study in the NEJM. This study estimated the seven-year risk of heart attack to be 28.9% and reported a P-value of.03 for a test of whether this risk was higher than the baseline seven-year risk of 20.2%. An earlier study had estimated the 7-yr risk to be 26.9% and reported a P-value of Why did the researchers in the earlier study not express alarm about the increased risk they had seen?
Copyright © 2010 Pearson Education, Inc. Slide Alpha Levels Sometimes we need to make a firm decision about whether or not to reject the null hypothesis. When the P-value is small, it tells us that our data are rare given the null hypothesis. We can define rare event arbitrarily by setting a threshold for our P-value. If our P-value falls below that point, well reject H 0. We call such results statistically significant. The threshold is called an alpha level, denoted by.
Copyright © 2010 Pearson Education, Inc. Slide Alpha Levels (cont.) Common alpha (also called the significance level) levels are 0.10, 0.05, and Consider your alpha level carefully and choose an appropriate one for the situation. When we reject the null hypothesis, we say that the test is significant at that level. What can you say if the P-value does not fall below ? You should say that The data have failed to provide sufficient evidence to reject the null hypothesis. Dont say that you accept the null hypothesis. In a jury trial, if we do not find the defendant guilty, we say the defendant is not guiltywe dont say that the defendant is innocent.
Copyright © 2010 Pearson Education, Inc. Slide Alpha Levels (cont.) The P-value gives the reader far more information than just stating that you reject or fail to reject the null. In fact, by providing a P-value to the reader, you allow that person to make his or her own decisions about the test. What you consider to be statistically significant might not be the same as what someone else considers statistically significant. There is more than one alpha level that can be used, but each test will give only one P-value.
Copyright © 2010 Pearson Education, Inc. Slide Significant vs. Important What do we mean when we say that a test is statistically significant? All we mean is that the test statistic had a P-value lower than our alpha level. Dont be lulled into thinking that statistical significance carries with it any sense of practical importance or impact.
Copyright © 2010 Pearson Education, Inc. Slide Significant vs. Important (cont.) For large samples, even small, unimportant (insignificant) deviations from the null hypothesis can be statistically significant. On the other hand, if the sample is not large enough, even large, financially or scientifically significant differences may not be statistically significant. Its good practice to report the magnitude of the difference between the observed statistic value and the null hypothesis value (in the data units) along with the P-value on which we base statistical significance.
Copyright © 2010 Pearson Education, Inc. Slide Confidence Intervals and Hypothesis Tests Confidence intervals and hypothesis tests are built from the same calculations. They have the same assumptions and conditions. You can approximate a hypothesis test by examining a confidence interval. Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.
Copyright © 2010 Pearson Education, Inc. Slide Confidence Intervals and Hypothesis Tests (cont.) Because confidence intervals are two-sided, they correspond to two-sided tests. In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an -level of 100 – C%. The relationship between confidence intervals and one- sided hypothesis tests is a little more complicated. A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an - level of ½(100 – C)%.
Copyright © 2010 Pearson Education, Inc. Slide Example: The baseline 7-yr risk of heart attacks from diabetics is 20.2%. In 2007 a NEJM study reported a 95% confidence interval equivalent to 20.8% to 40.0% for the risk among patients taking Avandia. What did this confidence interval suggest to the FDA about the safety of the drug?
Copyright © 2010 Pearson Education, Inc. Slide *A 95% Confidence Interval for Small Samples When the Success/Failure Condition fails, all is not lost. Add four phony observations, two successes and two failures. So instead of we use the adjusted proportion
Copyright © 2010 Pearson Education, Inc. Slide *A Better Confidence Interval for Proportions (cont.) Now the adjusted interval is The adjusted form gives better performance overall and works much better for proportions of 0 or 1. It has the additional advantage that we no longer need to check the Success/Failure Condition.
Copyright © 2010 Pearson Education, Inc. Slide Example: Surgeons claimed their results to compare two methods for a surgical procedure used to alleviate pain on the outside of the wrist. A new method was compared with the traditional freehand method procedure. Of 45 operations using the freehand method, three were unsuccessful, for a failure rate of 6.7%. With only 3 failures, the data dont satisfy the success/failure condition, so we cant use a a standard confidence interval. What is the confidence interval using the plus-four method?
Copyright © 2010 Pearson Education, Inc. Slide Making Errors When we perform a hypothesis test, we can make mistakes in two ways: I. The null hypothesis is true, but we mistakenly reject it. (Type I error) II. The null hypothesis is false, but we fail to reject it. (Type II error)
Copyright © 2010 Pearson Education, Inc. Slide Making Errors (cont.) Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent. Heres an illustration of the four situations in a hypothesis test:
Copyright © 2010 Pearson Education, Inc. Slide Making Errors (cont.) How often will a Type I error occur? Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our level. When H 0 is false and we reject it, we have done the right thing. A tests ability to detect a false hypothesis is called the power of the test.
Copyright © 2010 Pearson Education, Inc. Slide Making Errors (cont.) When H 0 is false and we fail to reject it, we have made a Type II error. We assign the letter to the probability of this mistake. Its harder to assess the value of because we dont know what the value of the parameter really is. There is no single value for --we can think of a whole collection of s, one for each incorrect parameter value.
Copyright © 2010 Pearson Education, Inc. Slide Making Errors (cont.) One way to focus our attention on a particular is to think about the effect size. Ask How big a difference would matter? We could reduce for all alternative parameter values by increasing. This would reduce but increase the chance of a Type I error. This tension between Type I and Type II errors is inevitable. The only way to reduce both types of errors is to collect more data. Otherwise, we just wind up trading off one kind of error against the other.
Copyright © 2010 Pearson Education, Inc. Slide Example: Back to Avandia. The issue of the NEJM in which that study appeared also included an editorial that said, in part, A few events either way might have changed the findings for myocardial infarction or for deal from cardiovascular causes. In this setting, the possibility that the findings were due to chance cannot be excluded. What kind of error would the researchers have made if, in fact, their findings were due to chance? What could be the consequences of this error?
Copyright © 2010 Pearson Education, Inc. Slide Power The power of a test is the probability that it correctly rejects a false null hypothesis. When the power is high, we can be confident that weve looked hard enough at the situation. The power of a test is 1 –
Copyright © 2010 Pearson Education, Inc. Slide Power (cont.) Whenever a study fails to reject its null hypothesis, the tests power comes into question. When we calculate power, we imagine that the null hypothesis is false. The value of the power depends on how far the truth lies from the null hypothesis value. The distance between the null hypothesis value, p 0, and the truth, p, is called the effect size. Power depends directly on effect size.
Copyright © 2010 Pearson Education, Inc. Slide Example: The study of Avandia combined results from 47 different trials, a method called meta- analysis. The drugs manufacturer issued a statement that pointed out, Each study is designed differently and looks at unique questions. By combining data from many studies, meta-analyses can achieve a much larger sample size. How could this larger sample size help?
Copyright © 2010 Pearson Education, Inc. Slide A Picture Worth Words The larger the effect size, the easier it should be to see it. Obtaining a larger sample size decreases the probability of a Type II error, so it increases the power. It also makes sense that the more were willing to accept a Type I error, the less likely we will be to make a Type II error.
Copyright © 2010 Pearson Education, Inc. Slide A Picture Worth Words (cont.) This diagram shows the relationship between these concepts:
Copyright © 2010 Pearson Education, Inc. Slide Reducing Both Type I and Type II Error The previous figure seems to show that if we reduce Type I error, we must automatically increase Type II error. But, we can reduce both types of error by making both curves narrower. How do we make the curves narrower? Increase the sample size.
Copyright © 2010 Pearson Education, Inc. Slide Reducing Both Type I and Type II Error (cont.) This figure has means that are just as far apart as in the previous figure, but the sample sizes are larger, the standard deviations are smaller, and the error rates are reduced:
Copyright © 2010 Pearson Education, Inc. Slide Reducing Both Type I and Type II Error (cont.) Original comparison of errors: Comparison of errors with a larger sample size:
Copyright © 2010 Pearson Education, Inc. Slide What Can Go Wrong? Dont interpret the P-value as the probability that H 0 is true. The P-value is about the data, not the hypothesis. Its the probability of observing data this unusual, given that H 0 is true, not the other way around. Dont believe too strongly in arbitrary alpha levels. Its better to report your P-value and a confidence interval so that the reader can make her/his own decision.
Copyright © 2010 Pearson Education, Inc. Slide Example: More Avandia … The drug manufacturer pointed out in their rebuttal, Data from the earlier clinical trial did show a small increase in reports of myocardial infarction among the Avandia-treated group…however, the number of events is too small to reach a reliable conclusion about the role of the medicines may have played in this finding. Why would this smaller study have been less likely to detect the difference in risk? What are the appropriate statistical concepts for comparing the smaller studies?
Copyright © 2010 Pearson Education, Inc. Slide What have we learned? And, weve learned about the two kinds of errors we might make and seen why in the end were never sure weve made the right decision. Type I error, reject a true null hypothesis, probability is. Type II error, fail to reject a false null hypothesis, probability is. Power is the probability we reject a null hypothesis when it is false, 1 –. Larger sample size increase power and reduces the chances of both kinds of errors.
Copyright © 2010 Pearson Education, Inc. Slide Example: A bank wondered if it could hget more customers to make payments on delinquent balances by sending them a DVD urging them to set up a payment plan. The bank tested their strategy. A 90% confidence interval for the success rate is (0.29, 0.45). Their old send-a-letter method had worked 30% of the time. Can you reject the null hypothesis that the proportion is still 30% at alpha = 0.05? Example: Given the confidence interval the bank found in their trial of DVDs, what would you recommend that they do? Should they scrap the DVD strategy? Example: Explain what a type I error is in this context and what the consequences would be to the bank. Example: What is a type II error in the experiment and what would its consequences be? Example: For the bank, which situation has the higher power: a strategy that works really well, actually getting 60% of people to pay off their balances, or a strategy that barely increases the payoff rate to 32%? Explain.
Copyright © 2010 Pearson Education, Inc. Slide Homework: Pg odd Day 2 – Pg a,b
Copyright © 2010 Pearson Education, Inc. Slide
Copyright © 2006 Pearson Addison-Wesley. All rights reserved. Lecture 8: Hypothesis Testing (Chapter 7.1–7.2, 7.4) Distribution of Estimators (Chapter.
Test of Significance Presenter: Shib Sekhar Datta Moderator: M S Bharambe.
Effect Sizes and Power Review. Statistical Power Statistical power refers to the probability of finding a particular sized effect Specifically, it is.
Testing a Claim Chapter Intro I can make 80% of my BBall free throws. To test my claim, you ask me to shoot 20 free throws. I only make 8 and.
Quantitative Methods Lecture 3 Populations and Samples Statistics books often assume we already know the true mean or the true variance of the whole population.
AP Statistics Hamilton/Mann. Confidence intervals are one of the two most common types of statistical inference. Use confidence intervals when you.
Copyright © 2011 Pearson Education, Inc. Putting Statistics to Work.
1 This mornings programme 9:05am – 10:50am The t tests 10:50am – 11:20am A break for coffee. 11:20am – 12:30pm Approximate chi- square tests.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlikeLicense. Your use of this material constitutes acceptance of that license.
1 Lecture 7 POWER IN STATISTICAL TESTING. 2 The caffeine experiment In the Caffeine experiment, there were two groups: 1. the CAFFEINE group; 2. the PLACEBO.
21-1 Copyright 2010 McGraw-Hill Australia Pty Ltd PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e Chapter 21 Hypothesis.
Chapter 13 Comparing Two Population Parameters AP Statistics Hamilton and Mann.
CS1512 Foundations of Computing Science 2 Lecture 5 Inferential statistics.
A.MANN ADAPTED FROM HAMILTON AP STATS Chapter 5 Producing Data.
Work measurement Part II of Work Study. 2 Introduction Work measurement is the application of techniques designed to establish the time for a qualified.
Hypothesis Testing Testing Statistical Significance.
Ethical Issues Raised by Current Research on Drug Addiction Dr Tom Walker Centre for Professional Ethics Keele University United Kingdom.
1 Logistic regression. 2 Programme 2:05 pm Talk 3:15 pm Coffee break 3:45 pm A short exercise 4:30 pm Finish.
1 MRes 3rd March 2010 Logistic regression. 2 Programme 2pm – 3:15pm. A talk. A break for coffee. 3:45pm – 4:30pm. A short exercise.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures AP Statistics Hamilton/Mann.
Povertyactionlab.org Planning Sample Size for Randomized Evaluations Esther Duflo J-PAL.
Statistics. Be able to state the null and alternative hypotheses for testing the difference between two population proportions. Know how to examine.
Chapter 8 An Economic Analysis of Financial Structure.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Chapter 10 Introduction to Inference Will Freeman.
1 MRes Wednesday 11 th March 2009 Logistic regression.
Hypothesis Tests: Two Independent Samples Cal State Northridge 320 Andrew Ainsworth PhD.
© 2016 SlidePlayer.com Inc. All rights reserved.