Inference in practice BPS chapter 16 © 2006 W.H. Freeman and Company.


Objectives (BPS chapter 16): Inference in practice
- Where did the data come from?
- Cautions about z procedures
- Cautions about confidence intervals
- Cautions about significance tests
- The power of a test
- Type I and II errors

Where did the data come from?
- When you use statistical inference, you are acting as if your data are a probability sample or come from a randomized experiment.
- Statistical confidence intervals and hypothesis tests cannot remedy basic flaws in producing the data, such as voluntary response samples or uncontrolled experiments.

Cautions about z procedures
Requirements:
- The data must be an SRS (simple random sample) of the population. More complex sampling designs require more complex inference methods.
- The sampling distribution of the sample mean must be approximately normal. This is not true in all instances.
- We must know σ, the population standard deviation. This is often an unrealistic requirement. We'll see what can be done when σ is unknown in the next chapter.

- We cannot use the z procedures if the population is not normally distributed and the sample size is too small, because the central limit theorem will not apply and the sampling distribution will not be approximately normal.
- Poorly designed studies often produce useless results (e.g., agricultural studies before Fisher). Nothing can overcome a poor design.
- Outliers influence averages, and therefore your conclusions as well.
(Figure: sampling distributions for 1 observation and for averages of 2, 10, and 25 observations.)

Cautions about confidence intervals
The margin of error does not cover all errors: the margin of error in a confidence interval covers only random sampling error. Undercoverage, nonresponse, and other forms of bias are often more serious than random sampling error (e.g., in election polls). The margin of error does not take these into account at all.

Cautions about significance tests
How small a P-value is convincing evidence against H0? Factors often considered in choosing the significance level α:
- What are the consequences of rejecting the null hypothesis (e.g., global warming, or convicting a person for life on the basis of DNA evidence)?
- Are you conducting a preliminary study? If so, you may want a larger α so that you will be less likely to miss an interesting result.
Some conventions:
- We typically use the standards of our field of work.
- There are no "sharp" cutoffs: a P-value of 4.9% is not meaningfully different from 5.1%.
- It is the order of magnitude of the P-value that matters ("somewhat significant," "significant," or "very significant").

Practical significance
Statistical significance only says whether the observed effect is unlikely to be due to chance alone from random sampling. Statistical significance may not be practically important: it doesn't tell you about the magnitude of the effect, only that there is one. An effect could be small enough to be irrelevant, and with a large enough sample size, a significance test can detect even a very small real difference between two sets of data.
- Example: A drug is found to reproducibly lower a patient's temperature by 0.4°C (P-value < 0.01). But the clinical benefits of temperature reduction only appear for a decrease of 1°C or more, so the statistically significant effect is not practically important.

Sample size affects statistical significance
- Because large random samples have small chance variation, very small population effects can be highly significant if the sample is large.
- Because small random samples have a lot of chance variation, even large population effects can fail to be significant if the sample is small.
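This effect of sample size can be sketched with a short calculation. The sketch below is an illustration, not part of the text: the helper name `z_test_pvalue` and the numbers (observed effect 0.2, σ = 2) are hypothetical, and only Python's standard library is used.

```python
from statistics import NormalDist

def z_test_pvalue(xbar, mu0, sigma, n):
    """One-sided P-value for H0: mu = mu0 vs Ha: mu > mu0, sigma known."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)

# The same small observed effect (xbar = 0.2, sigma = 2) at two sample sizes:
print(z_test_pvalue(0.2, 0, 2, 25))     # small n: not significant
print(z_test_pvalue(0.2, 0, 2, 2500))   # large n: highly significant
```

With n = 25 the P-value is about 0.31, far from significant; with n = 2500 the identical observed effect gives a P-value well below 0.001.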

Interpreting effect size: it's all about context
There is no consensus on how big an effect has to be in order to be considered meaningful. In some cases, effects that may appear trivial can in reality be very important.
- Example: Improving the format of a computerized test reduces the average response time by about 2 seconds. Although this effect is small, it is important because the test is taken millions of times a year; the cumulative time savings of the better format is gigantic.
Always think about the context. Try to plot your results, and compare them with a baseline or with results from similar studies.

More cautions…
Confidence intervals vs. hypothesis tests
- It's a good idea to give a confidence interval for the parameter in which you are interested. A confidence interval actually estimates the size of an effect rather than simply asking whether it is too large to reasonably occur by chance alone.
Beware of multiple analyses
- Running one test and reaching the 5% level of significance is reasonably good evidence that you have found something. Running 20 tests and reaching that level only once is not.
- A single 95% confidence interval has probability 0.95 of capturing the true parameter each time you use it.
- The probability that all of 20 independent confidence intervals will capture their parameters is much less than 95%: it is (0.95)^20 ≈ 0.36, only about 36%.
- If you think that multiple tests or intervals may have discovered an important effect, you need to gather new data to do inference about that specific effect.
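The multiple-intervals probability is a one-line computation:

```python
# Probability that all 20 independent 95% confidence intervals
# capture their parameters:
p_all = 0.95 ** 20
print(round(p_all, 3))  # prints 0.358
```

So even though each interval individually succeeds 95% of the time, all 20 succeed together only about 36% of the time.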

The power of a test
The power of a test of hypothesis with fixed significance level α is the probability that the test will reject the null hypothesis when the alternative is true. In other words, power is the probability that the data gathered in an experiment will be sufficient to reject a wrong null hypothesis.
Knowing the power of your test is important:
- When designing your experiment: to select a sample size large enough to detect an effect of a magnitude you think is meaningful.
- When a test found no significance: to check that your test would have had enough power to detect an effect of a magnitude you think is meaningful.

How large a sample size do I need?
How large a sample do we need for a z test at the 5% significance level to have 90% power against various effect sizes? When calculating the effect size, think about how large an effect in the population is important in practice.

How large a sample size do I need?
In general:
- If you want a smaller significance level (α) or a higher power (1 − β), you need a larger sample.
- A two-sided alternative hypothesis always requires a larger sample than a one-sided alternative.
- Detecting a small effect requires a larger sample than detecting a large effect.
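These rules can be made concrete with the standard sample-size formula for a one-sided z test, n = ((z₁₋α + z_power)·σ/δ)², where δ is the effect size you want to detect. The sketch below is illustrative (the function name `sample_size_one_sided` is ours) and uses only Python's `statistics.NormalDist`:

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_sided(alpha, power, sigma, effect):
    """n for a one-sided z test: n = ((z_{1-alpha} + z_{power}) * sigma / effect)^2."""
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha) + z(power)) * sigma / effect) ** 2
    return ceil(n)

# e.g., alpha = 0.05, power = 0.90, sigma = 2, effect = 1:
print(sample_size_one_sided(0.05, 0.90, 2, 1))  # prints 35
```

Note how the formula reflects the bullets above: shrinking α, raising the power, or shrinking the effect δ all increase n.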

Can an exercise program increase bone density?
Test of hypothesis at significance level α = 5%: H0: µ = 0 versus Ha: µ > 0.
From previous studies, we assume that σ = 2 for the percent change in bone density, and we would consider a percent increase of 1 medically important. Is 25 subjects a large enough sample for this project?
A significance level of 5% implies a lower tail of 95% and z* = 1.645. Thus we reject H0 whenever the sample average exceeds 0 + 1.645 × 2/√25 ≈ 0.658: all sample averages larger than 0.658 will result in rejecting the null hypothesis.

What if the null hypothesis is wrong and the true population mean is 1? The power against the alternative µ = 1% is the probability that H0 will be rejected when in fact µ = 1%. A sample size of 25 yields a power of about 80% against this alternative. A test power of 80% or more is considered good statistical practice.
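The bone-density numbers can be reproduced in a few lines (a sketch using Python's `statistics.NormalDist`, with the slide's values σ = 2, n = 25, α = 0.05):

```python
from statistics import NormalDist

# Bone density example: H0: mu = 0 vs Ha: mu > 0, sigma = 2, n = 25, alpha = 0.05.
sigma, n, alpha = 2, 25, 0.05
se = sigma / n ** 0.5                          # standard error = 0.4
cutoff = NormalDist().inv_cdf(1 - alpha) * se  # reject H0 when xbar > cutoff

# Power against mu = 1: the probability that xbar lands above the cutoff
# when the true mean is 1.
power = 1 - NormalDist(mu=1, sigma=se).cdf(cutoff)
print(round(cutoff, 3), round(power, 2))  # prints 0.658 0.8
```

The cutoff matches the 0.658 derived above, and the power against µ = 1 comes out to roughly 80%, confirming that 25 subjects are enough.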

Type I and II errors
- A Type I error is made when we reject the null hypothesis but the null hypothesis is actually true (we incorrectly reject a true H0). The probability of making a Type I error is the significance level α.
- A Type II error is made when we fail to reject the null hypothesis but the null hypothesis is false (we incorrectly keep a false H0). The probability of making a Type II error is labeled β. The power of a test is 1 − β.
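That α really is the Type I error rate can be checked by simulation: draw many samples with H0 true and count how often a level-0.05 test rejects. This is an illustrative sketch (the seed and trial count are arbitrary), reusing the bone-density setup with true µ = 0:

```python
import random
from statistics import NormalDist

random.seed(1)  # reproducible illustration
sigma, n, alpha = 2, 25, 0.05
se = sigma / n ** 0.5
cutoff = NormalDist().inv_cdf(1 - alpha) * se  # one-sided rejection cutoff

# Draw 20,000 sample means under H0 (true mu = 0) and count false rejections.
trials = 20_000
rejections = sum(random.gauss(0, se) > cutoff for _ in range(trials))
print(rejections / trials)  # close to alpha = 0.05
```

The observed rejection rate hovers around 5%: when the null hypothesis is true, the test still rejects it with probability α.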

Type I and II errors—court of law
H0: The person on trial is not a thief. (In the U.S., people are considered innocent unless proven otherwise.)
Ha: The person on trial is a thief. (The police believe this person is the main suspect.)
- A Type I error is made if the jury convicts a truly innocent person. (They reject the null hypothesis even though it is actually true.)
- A Type II error is made if a truly guilty person is set free. (The jury fails to reject the null hypothesis even though it is false.)

Running a test of significance is a balancing act between the chance α of making a Type I error and the chance β of making a Type II error. Reducing α reduces the power of the test and thus increases β.
It might be tempting to emphasize greater power (the more the better).
- However, with "too much power," trivial effects become highly significant.
- A failure to reject the null hypothesis is not definitive: it does not imply that the null hypothesis is true.