Tests of Significance We use a test to determine whether a "prediction" is "true" or "false". More precisely, a test of significance gets at the question of whether an observed difference is real or just a chance variation.

Example Suppose two investigators are arguing about a large box of numbered tickets. Null says the average is 50. (This number is usually obtained from experience or some other source of information.) But Alt says the average is different from 50. Neither of them knows how many tickets are in the box or what its average is. So they agree to take a sample: 500 draws are made at random. The average of the sample turns out to be 48, and the SD is 15.3.

Example From the data, Null argues: the difference is just 2, and the SD is 15.3, so the difference is small relative to the SD, and it is just chance. What is wrong here? From the chapters on the accuracy of inference, we know that when we don't know the composition of the box, we can use the bootstrap procedure to estimate it. But the SD of the sample is very different from the SE for the average, and it is the SE that measures the likely size of chance variation in the sample average.

Example Using the bootstrap, we estimate the SD of the box by the SD of the sample, 15.3. The SE for the sum of 500 draws is √500 × 15.3 ≈ 342, so the SE for the average is 342/500 ≈ 0.68. The observed average is 48, while the expected average under Null's hypothesis is 50. The difference of 2 is about 2/0.68 ≈ 3 SEs, which is hard to explain as chance.
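As a quick check, here is that arithmetic as a minimal Python sketch; the numbers come from the example above, and the variable names are just for illustration:

```python
import math

# Sample results from the example: 500 draws, average 48, SD 15.3.
n = 500
sample_avg = 48
sample_sd = 15.3          # bootstrap estimate of the SD of the box
expected_avg = 50         # Null's hypothesis about the box average

# SE for the sum of the draws, then for their average.
se_sum = math.sqrt(n) * sample_sd
se_avg = se_sum / n

# The z-statistic: how many SEs the observed average is from the expected one.
z = (sample_avg - expected_avg) / se_avg
print(f"SE for average ≈ {se_avg:.2f}, z ≈ {z:.1f}")   # SE ≈ 0.68, z ≈ -2.9
```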

Comments There are two sides in this example: one side thinks the difference is real; the other side thinks the difference is just chance. The latter side is beaten by a calculation. This calculation is called a test of significance. The key idea: if an observed value is too many SEs away from its expected value, it is hard to explain by chance.

Terminology In statistics, we have two hypotheses when doing a test of significance: the null hypothesis and the alternative hypothesis. In the previous example: Null hypothesis: the average of the box equals 50. Alternative hypothesis: the average of the box is less than 50. The null hypothesis corresponds to the idea that an observed difference is due to chance. To make a test of significance, the null hypothesis has to be set up as a box model for the data. The alternative hypothesis is another statement about the box, corresponding to the idea that the observed difference is real. In general, the null hypothesis is set up in order to be rejected.

Terminology A test statistic measures the difference between the data and what is expected under the null hypothesis. The one used here is the z-statistic: z = (observed - expected)/SE, where the expected value and the SE are computed on the basis of the null hypothesis. In the example, z = (48 - 50)/0.68 ≈ -3.

In the previous example, we calculated the z-statistic ≈ -3. Alt says that an average 3 SEs below its expected value is quite a lot. Why is that? Because of the normal curve: the area under the curve to the left of -3 is very small, about 1 in 1,000. That is, the chance of getting a sample average 3 SEs or more below its expected value is about 0.1%. This 0.1% is called an observed significance level. The observed significance level is often denoted by P (for probability) and referred to as a P-value. In the previous example, the P-value of the test is about 0.1%.
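The "about 1 in 1,000" figure can be checked from the normal curve; a minimal sketch, assuming SciPy is available:

```python
from scipy.stats import norm

z = -3.0   # the z-statistic from the example
# One-sided P-value: the chance, under the null hypothesis, of a sample
# average 3 SEs or more below its expected value.
p_value = norm.cdf(z)
print(f"P ≈ {p_value:.4f}")   # about 0.0013, i.e. roughly 0.1%
```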

Interpretation of the P-value The P-value is the chance, computed on the assumption that the null hypothesis is true, of getting a test statistic as extreme as, or more extreme than, the one observed. The smaller the P-value, the stronger the evidence against the null hypothesis.

Remarks As we have seen, the test statistic z depends on the data, and so does P. That is why P is called an observed significance level. We can now see the logic of the z-test more clearly: it is an argument by contradiction. Assuming the null hypothesis leads to an absurd conclusion, so the null hypothesis must be rejected. If we repeated the experiment many times, the P-value would tell us the frequency with which the test statistic comes out as extreme as, or more extreme than, the one we actually got (think of multiple samples). This is another interpretation of the P-value, similar in spirit to the interpretation of confidence intervals discussed before.
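A small simulation illustrates this frequency interpretation. The two-ticket box below is a hypothetical stand-in for any box with average 50 and SD 15.3; it is my choice for illustration, not from the text, and NumPy is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-ticket box with average 50 and SD 15.3,
# consistent with the null hypothesis (illustrative choice).
box = np.array([50 - 15.3, 50 + 15.3])

n, reps = 500, 20_000
draws = rng.choice(box, size=(reps, n))    # reps samples of 500 draws each
avgs = draws.mean(axis=1)
se_avg = 15.3 / np.sqrt(n)                 # SE for the sample average
z = (avgs - 50) / se_avg

# Under the null, z <= -3 should occur in roughly 0.1% of samples.
print(f"Fraction of samples with z <= -3: {np.mean(z <= -3):.4f}")
```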

Remarks Remember, there is no way to define the probability of the null hypothesis being right. So the P-value is not a chance statement about the null hypothesis. No matter how often you make the draws, the box does not change; the null is just a statement about the box. The P-value of a test is the chance of getting a big test statistic, assuming the null hypothesis to be right. P is not the chance of the null hypothesis being right. The z-test is used for reasonably large samples, where the normal approximation holds by the CLT. With small samples, other techniques must be used.

Summary for making a test Set up the null hypothesis, in terms of a box model with tickets for the data. Pick a test statistic; draw the sample; measure the difference between the data from the sample and what is expected under the null hypothesis. Compute the observed significance level (the P-value).
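Putting these steps together, here is a minimal one-sample z-test sketch in Python; the function name and interface are illustrative assumptions, and SciPy supplies the normal curve:

```python
import math
from scipy.stats import norm

def one_sample_z_test(sample_avg, sample_sd, n, null_avg):
    """One-sample z-test for an average, using the sample SD as the
    bootstrap estimate of the SD of the box.

    Returns the z-statistic and the one-sided (lower-tail) P-value.
    """
    se_avg = sample_sd / math.sqrt(n)     # SE for the average
    z = (sample_avg - null_avg) / se_avg  # SEs away from the expected value
    p_value = norm.cdf(z)                 # area under the curve left of z
    return z, p_value

# The box example: 500 draws, average 48, SD 15.3, null average 50.
z, p = one_sample_z_test(48, 15.3, 500, 50)
print(f"z ≈ {z:.1f}, P ≈ {p:.4f}")        # z ≈ -2.9, P ≈ 0.0017
```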

Comments The choice of test statistic depends on the model and the hypothesis being considered. So far, we only have the "one-sample z-test". Later on, we will discuss the "t-test", the "two-sample z-test", and the "χ²-test", if time permits. It is natural to ask how small the observed significance level has to be before we reject the null hypothesis. In general, we draw the line at 5%: if P is less than 5%, the result is called statistically significant (often shortened to significant). Another line is at 1%: if P is less than 1%, the result is called highly significant.

Questions Q: Do we have to follow these lines strictly? A: No. Do not let the jargon distract you from the main idea: the null hypothesis is rejected when the observed value is too many SEs away from the expected value. Q: In the previous example, for the alternative hypothesis, why do we prefer "the average of the box is less than 50" rather than "more than 50"? A: Because the sample average is 48, below 50; the data point in that direction, and "more than 50" would fit the data even worse. Q: When we do the z-test, what happens if z is positive? (For instance, suppose the sample average is 52 rather than 48.)

The test for counting problems The z-test can also be used for problems about counting and classifying.

Example Charles Tart ran an experiment at the University of California, Davis, to demonstrate ESP (extrasensory perception). Tart used a machine to generate a number at random; the number corresponded to one of 4 targets on the machine. The subject then guessed which target was chosen, by pushing a button. The machine lit up the target it had picked, and rang a bell if the subject guessed right.

Example Tart selected 15 subjects who were thought to be clairvoyant. Each of the subjects made 500 guesses. Out of the total 15 × 500 = 7,500 guesses, 2,006 were right. Just by chance, the subjects would be right about ¼ of the time, even without any clairvoyant ability. So the expected number of correct guesses is ¼ × 7,500 = 1,875. The difference is 2,006 - 1,875 = 131 extra correct guesses. Tart used a test of significance to fend off the explanation "it's only chance", in order to argue for ESP.

Solution First, to set up a box model, Tart assumed each of the 4 targets has 1 chance in 4 to be chosen. Tart also assumed (temporarily) that there is no ESP (the null hypothesis), so that each guess has 1 chance in 4 to be right. Under these assumptions, the number of correct guesses is like the sum of 7,500 draws made at random from the box 1, 0, 0, 0. (1 = right, 0 = wrong.) This completes the box model for the null hypothesis.

Solution The expected value of the sum is ¼ × 7,500 = 1,875. The SD of the box is √(¼ × ¾) ≈ 0.43, so the SE for the sum is √7,500 × 0.43 ≈ 37.5. Then z = (2,006 - 1,875)/37.5 ≈ 3.5. The area under the normal curve to the right of 3.5 is about 2 in 10,000, so P ≈ 0.02%. Chance variation is a poor explanation for the 131 extra correct guesses.
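Here is the same calculation sketched in Python; note that the SD now comes from the composition of the box under the null hypothesis, not from the data (SciPy assumed for the normal curve):

```python
import math
from scipy.stats import norm

n = 7_500                 # total guesses
observed = 2_006          # correct guesses
p0 = 1 / 4                # chance of a right guess under the null (no ESP)

expected = n * p0                          # 1,875
sd_box = math.sqrt(p0 * (1 - p0))          # SD of the 1,0,0,0 box ≈ 0.43
se_sum = math.sqrt(n) * sd_box             # SE for the sum ≈ 37.5

z = (observed - expected) / se_sum         # ≈ 3.5
p_value = 1 - norm.cdf(z)                  # upper tail: about 2 in 10,000
print(f"z ≈ {z:.1f}, P ≈ {p_value:.5f}")
```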

Comments There may be many reasonable explanations for the results besides ESP, but chance variation is not one of them. Other possibilities to consider: the random number generator of the machine may not be very good, or the machine may be giving the subject some subtle clues as to which target it picked. The experiment may not prove ESP, but it shows how a test of significance works.

Differences With quantitative data (sums and averages), the SD of the box has to be estimated from the sample. With qualitative data (counting and classifying, numbers and percents), the SD of the box is given by its composition under the null. In the first example, there was an alternative hypothesis about the box: the average was below 50. Here, there is no sensible way to set up the alternative hypothesis. Reason: if the subjects do have ESP, the chance for each guess to be right may well depend on the previous trials, and may change from trial to trial. So the data would not be like draws from a box.

Differences In the first example, all the data were based on a box, and all the arguments were based on probability theory. Here, part of the question is whether the data are like draws from a box at all. If they are not, probability theory does not apply. In the last few chapters, we studied how to estimate parameters from data: averages, sums, numbers, and percentages. Here we learn how to test hypotheses. Estimation and testing are related, but the goals are different.

Summary The expected value is computed on the basis of the null hypothesis. If the null hypothesis determines the SD of the box, use this information when computing the SE; otherwise, you have to estimate the SD from the data. The observed significance level, the P-value, is the chance of getting a test statistic as extreme as, or more extreme than, the observed one, computed on the basis that the null hypothesis is correct. The P-value is not the chance of the null hypothesis being right. Small values of P are evidence against the null hypothesis: they indicate that something besides chance was operating to make the difference.