Confidence Intervals and Hypothesis Tests with Proportions


What happens to your confidence as the interval gets smaller? Your confidence level decreases as the interval gets smaller.

Confidence level: the success rate of the method used to construct the interval. Using this method, ____% of the time the intervals constructed will contain the true population parameter.

Critical value (z*): found from the confidence level. It is the upper z-score with the tail area lying to its right under the standard normal curve.

Confidence level   Tail area   z*
90%                .05         1.645
95%                .025        1.96
99%                .005        2.576
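The table's critical values can be reproduced without a calculator. Below is a minimal Python sketch, assuming only the standard library's `statistics.NormalDist`; the helper name `critical_value` is ours, not the slides'.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Return z*, the upper z-score with tail area (1 - confidence) / 2 to its right."""
    tail_area = (1 - confidence) / 2
    # inv_cdf returns the z with the given area to its LEFT, so pass 1 - tail_area
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  tail area {(1 - level) / 2:.3f}  z* = {critical_value(level):.3f}")
```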

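Confidence interval for a population proportion: statistic ± critical value × standard deviation of the statistic. The part after the ± sign is the margin of error. But do we know the population proportion? No, so we use p-hat in its place, and the standard deviation becomes the standard error:
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$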
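A sketch of the one-proportion interval in Python, assuming the large-sample conditions already hold. `one_proportion_ci` is a hypothetical helper, and the usage numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ci(p_hat: float, n: int, confidence: float = 0.95) -> tuple:
    """Large-sample interval p_hat ± z* · sqrt(p_hat(1 - p_hat) / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # p is unknown, so p_hat stands in for it; the result is the standard error
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


# Hypothetical data: 50 successes in a sample of 100
low, high = one_proportion_ci(0.50, 100)
print(f"({low:.3f}, {high:.3f})")
```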

What are the steps for performing a confidence interval?
1.) Assumptions:
– SRS of (context)
– Approximately normal distribution because np > 10 and n(1 − p) > 10
– Population is at least 10n
2.) Calculations
3.) Conclusion: We are ________% confident that the true proportion (context) is between ______ and ______.
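Only the numeric conditions in Step 1 can be checked mechanically; whether the data come from an SRS has to be judged from how they were collected. A small sketch, where the function name and the dictionary labels are our own choices:

```python
from typing import Optional


def proportion_conditions(n: int, p: float, population_size: Optional[int] = None) -> dict:
    """Check the numeric rules of thumb from Step 1 (the SRS condition is not checkable here)."""
    checks = {
        "np > 10": n * p > 10,
        "n(1-p) > 10": n * (1 - p) > 10,
    }
    if population_size is not None:
        # The population should be at least 10 times the sample size
        checks["population >= 10n"] = population_size >= 10 * n
    return checks


# Hypothetical numbers: n = 250, p-hat = .16, population of 100,000
print(proportion_conditions(250, 0.16, population_size=100_000))
```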

As the confidence level increases, do the intervals generally get wider or more narrow? Explain. As the sample size increases, do the intervals generally get wider or more narrow? Explain. When 100 confidence intervals are generated, why are they all different? If the confidence level selected is 90%, about how many of 100 intervals will cover the true percentage of orange balls? Will exactly this number of intervals cover the true percentage each time 100 intervals are created? Explain.
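The last two questions can be explored by simulation. This sketch assumes a hypothetical true proportion of 0.3 (standing in for the unknown percentage of orange balls) and samples of size 100. It builds many 90% intervals with p-hat ± z*·SE and reports the fraction that capture the true value. The fraction should land near 0.90 but changes from run to run.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(true_p: float, n: int, confidence: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated intervals p_hat ± z*·SE that contain true_p."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its p-hat
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


# 100 intervals at 90% confidence: expect roughly 90 hits, rarely exactly 90
print(coverage_rate(true_p=0.3, n=100, confidence=0.90, trials=100))
```

Changing `seed` gives a different set of 100 intervals and usually a different count, which is why the answer to the last question is "no".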

A May 2000 Gallup Poll found that 38% of a random sample of 1012 adults said that they believe in ghosts. Find a 95% confidence interval for the true proportion of adults who believe in ghosts.

Step 1: Check assumptions!
– Have an SRS of adults.
– np = 1012(.38) = 384.56 and n(1 − p) = 1012(.62) = 627.44. Since both are greater than 10, the distribution can be approximated by a normal curve.
– The population of adults is at least 10,120.
Step 2: Make calculations: $.38 \pm 1.96\sqrt{\frac{.38(.62)}{1012}} = .38 \pm .0299 = (.350, .410)$
Step 3: Conclusion in context: We are 95% confident that the true proportion of adults who believe in ghosts is between 35% and 41%.
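The arithmetic in Step 2 can be double-checked in a few lines of Python. This is a sketch that reuses z* = 1.96 from the critical-value table.

```python
from math import sqrt

n, p_hat = 1012, 0.38  # Gallup sample size and sample proportion
z_star = 1.96          # critical value for 95% confidence

# Step 1: normal-approximation conditions (both counts must exceed 10)
assert n * p_hat > 10 and n * (1 - p_hat) > 10

# Step 2: p_hat ± z* · sqrt(p_hat(1 - p_hat) / n)
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error
low, high = p_hat - margin_of_error, p_hat + margin_of_error
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.350, 0.410)
```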

The manager of the dairy section of a large supermarket took a random sample of 250 egg cartons and found that 40 cartons had at least one broken egg. Find a 90% confidence interval for the true proportion of egg cartons with at least one broken egg.

Step 1: Check assumptions!
– Have an SRS of egg cartons.
– np = 250(.16) = 40 and n(1 − p) = 250(.84) = 210. Since both are greater than 10, the distribution can be approximated by a normal curve.
– The population of cartons is at least 2,500.
Step 2: Make calculations: $.16 \pm 1.645\sqrt{\frac{.16(.84)}{250}} = .16 \pm .0381 = (.122, .198)$
Step 3: Conclusion in context: We are 90% confident that the true proportion of egg cartons with at least one broken egg is between 12.2% and 19.8%.
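The same check for the egg-carton interval. This sketch uses z* = 1.645 for 90% confidence, taken from the table.

```python
from math import sqrt

n, broken = 250, 40   # cartons sampled, cartons with at least one broken egg
p_hat = broken / n    # 0.16
z_star = 1.645        # critical value for 90% confidence

assert n * p_hat > 10 and n * (1 - p_hat) > 10  # Step 1 conditions

standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error
low, high = p_hat - margin_of_error, p_hat + margin_of_error
print(f"90% CI: ({low:.3f}, {high:.3f})")  # 90% CI: (0.122, 0.198)
```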

Another Gallup Poll is taken in order to measure the proportion of adults who approve of attempts to clone humans. What sample size is necessary to be within ___ of the true proportion of adults who approve of attempts to clone humans with a 95% confidence interval?
To find the sample size, set the margin of error equal to $1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ and solve for n. However, since we have not yet taken a sample, we do not know a p-hat (or p) to use! Use p-hat = .5, which makes $\hat{p}(1-\hat{p})$ as large as possible and so gives a sample size that is large enough whatever the true p is. Then divide by 1.96, square both sides, and round up the sample size.
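Solving for n gives $n = \left(\frac{z^*}{ME}\right)^2 \hat{p}(1-\hat{p})$. A sketch of that calculation follows. The margin of error in the question is not given here, so the 0.03 below is a hypothetical value, and `sample_size_one_proportion` is our own name.

```python
from math import ceil


def sample_size_one_proportion(margin_of_error: float, z_star: float = 1.96,
                               p_guess: float = 0.5) -> int:
    """Smallest n with z* · sqrt(p(1 - p) / n) <= margin_of_error."""
    n_exact = (z_star / margin_of_error) ** 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)  # always round UP, never to the nearest integer


print(sample_size_one_proportion(0.03))  # 1068
```

Rounding up matters: rounding 1067.1 down to 1067 would give a margin of error slightly larger than 0.03.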

What are hypothesis tests? Calculations that tell us whether the sample statistic (p-hat) occurred by random chance or not, that is, whether it is statistically significant. Is it...
– a random occurrence due to natural variation?
– an occurrence due to some other reason?
Statistically significant means that it is NOT likely to be a random chance occurrence! Is it one of the sample proportions that are likely to occur, or one that isn't likely to occur? These calculations (called the test statistic) tell us how many standard deviations a sample proportion is from the hypothesized population proportion!

Steps:
1) Assumptions
2) Hypothesis statements & define parameters
3) Calculations
4) Conclusion, in context

Assumptions for z-test:
– Have an SRS of (context)
– Distribution is (approximately) normal because both np > 10 and n(1 − p) > 10
– Population is at least 10n
YEA! These are the same assumptions as confidence intervals!!

How to write hypothesis statements:
Null hypothesis ($H_0$): the statement (claim) being tested; this is a statement of “no effect” or “no difference”.
Alternative hypothesis ($H_a$): the statement that we suspect is true.

How to write hypotheses:
Null hypothesis: $H_0$: parameter = hypothesized value
Alternative hypothesis: $H_a$: parameter > hypothesized value, $H_a$: parameter < hypothesized value, or $H_a$: parameter ≠ hypothesized value

Facts to remember about hypotheses: Hypotheses ALWAYS refer to populations (use parameters – never statistics) The alternative hypothesis should be what you are trying to prove! ALWAYS define your parameter in context!

Activity: For each pair of hypotheses, indicate which are not legitimate and explain why.
– Must use a parameter (population): x is a statistic (sample); the population proportion is the parameter!
– Must use the same number as $H_0$!
– P-hat is a statistic, not a parameter!
– Must be NOT equal!

P-value: assuming $H_0$ is true, the probability that the statistic would have a value as extreme as or more extreme than what is actually observed. Notice that this is a conditional probability. The statistic is our p-hat! Why not find the probability that p-hat equals a certain value? Remember that in continuous distributions, the probability of any single value is 0! In other words... What is the probability of getting values more (or less) extreme than our p-hat? We can use normalcdf to find this probability.

Level of significance (α):
– Is the amount of evidence necessary before we begin to doubt that the null hypothesis is true
– Is the probability that we will reject the null hypothesis, assuming that it is true
– Can be any value; usual values are 0.1, 0.05, and 0.01; the most common is 0.05

Statistically significant: our statistic (p-hat) is statistically significant if the p-value is as small as or smaller than the level of significance (α).
Decisions:
– If p-value ≤ α, “reject” the null hypothesis at the α level. This is our “guilty” verdict.
– If p-value > α, “fail to reject” the null hypothesis at the α level. This is our “not guilty” verdict.
Remember that the verdict is never “innocent”, so we can never decide that the null is true!

Facts about p-values:
– ALWAYS make the decision about the null hypothesis!
– Large p-values are consistent with the null hypothesis, but they never show that it is true!
– Small p-values are evidence that the null is not true.
– Double the one-tail area to get the p-value for two-tail (≠) tests.
– Never accept the null hypothesis!

Never “accept” the null hypothesis!

Calculating p-values for a z-test statistic (z):
– Use normalcdf(lb, ub) to find the probability of the test statistic or more extreme.
– Remember that the standard normal curve is made up of z's where μ = 0 and σ = 1. Since we are in the standard normal curve, we do not need to enter μ and σ here.
We will see how to compute this value tomorrow.
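The normalcdf step can be mirrored with the standard normal CDF from Python's standard library. This sketch assumes the usual TI-84 habit of using ±1E99 as an "infinite" bound; `p_value` and its `alternative` codes are our own naming.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str) -> float:
    """P-value for a z test statistic; alternative is '>', '<', or '!='."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == ">":
        return 1 - std_normal.cdf(z)  # area to the right, like normalcdf(z, 1E99)
    if alternative == "<":
        return std_normal.cdf(z)  # area to the left, like normalcdf(-1E99, z)
    if alternative == "!=":
        return 2 * (1 - std_normal.cdf(abs(z)))  # double the tail for two-sided tests
    raise ValueError("alternative must be '>', '<', or '!='")


print(f"{p_value(1.96, '!='):.3f}")
```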

Writing conclusions:
1) A statement of the decision being made (reject or fail to reject $H_0$) and why (linkage), AND
2) A statement of the results in context (state in terms of $H_a$).

“Since the p-value ) , I reject (fail to reject) the H 0. There is (is not) sufficient evidence to suggest that H a.” Be sure to write H a in context (words)!

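Formula for hypothesis test: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$, where $p_0$ is the hypothesized value from $H_0$.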
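A direct translation of the test statistic into Python, as a sketch. `one_proportion_z` is a hypothetical helper, and the usage numbers are invented. Note that the standard deviation uses $p_0$, not p-hat.

```python
from math import sqrt


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z = (p_hat - p0) / sqrt(p0(1 - p0) / n), with p0 taken from H0."""
    p_hat = successes / n
    # Use the hypothesized p0, not p_hat, in the standard deviation
    sd = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sd


# Hypothetical data: 60 successes out of 100 when H0 says p = 0.5
print(f"z = {one_proportion_z(60, 100, 0.5):.2f}")
```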

Example 5: A company is willing to renew its advertising contract with a local radio station only if the station can prove that more than 20% of the residents of the city have heard the ad and recognize the company’s product. The radio station takes a random sample of 400 people and finds that 90 have heard the ad and recognize the product. Is this sufficient evidence for the company to renew its contract?

Assumptions:
– Have an SRS of people.
– np = 400(.2) = 80 and n(1 − p) = 400(.8) = 320. Since both are greater than 10, this distribution is approximately normal.
– The population of people is at least 4,000.
Hypotheses: $H_0: p = .2$ and $H_a: p > .2$, where p is the true proportion of people who heard the ad.
Calculations: $\hat{p} = 90/400 = .225$, $z = \frac{.225 - .2}{\sqrt{\frac{.2(.8)}{400}}} = \frac{.025}{.02} = 1.25$, p-value ≈ .106
Conclusion: Since the p-value > α, I fail to reject the null hypothesis. There is not sufficient evidence to suggest that the true proportion of people who heard the ad is greater than .2. The company will not renew their advertising contract with the radio station.
Use the parameter in the null hypothesis to check assumptions! Use the parameter in the null hypothesis to calculate standard deviation!
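The whole test for Example 5 fits in a short script. The slide does not state α, so the α = 0.05 below is an assumption. Both the conditions and the standard deviation use $p_0 = .2$.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 400, 90, 0.20
alpha = 0.05  # assumed; the slide leaves α unspecified

# Conditions use p0 from the null hypothesis
assert n * p0 > 10 and n * (1 - p0) > 10

p_hat = successes / n                        # 0.225
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # SD also uses p0
p_value = 1 - NormalDist().cdf(z)            # Ha: p > 0.2, so upper tail
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```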

Calculate the appropriate confidence interval for the above problem: CI = (.19066, .25934). How do the results from the confidence interval compare to the results of the hypothesis test? The confidence interval contains the parameter value of .2, thus providing no evidence that more than 20% had heard the ad.
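The slide's interval (.19066, .25934) is consistent with 90% confidence, the level that matches a one-sided test at α = .05. That reading is our inference from the numbers, not something the slide states. Unlike the test, the interval uses p-hat in the standard error. A sketch:

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 90 / 400, 400
z_star = NormalDist().inv_cdf(0.95)  # 90% confidence (assumed level), about 1.645
se = sqrt(p_hat * (1 - p_hat) / n)   # the interval uses p_hat, not p0
low, high = p_hat - z_star * se, p_hat + z_star * se
print(f"({low:.5f}, {high:.5f})")
print("0.2 inside interval:", low <= 0.2 <= high)
```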

Two-Sample Proportions Inference

Assumptions:
– Two independent SRS’s from populations (or randomly assigned treatments)
– Populations at least 10n
– Normal approximation for both: $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1-\hat{p}_2)$ all > 10

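Formula for confidence interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Note: use p-hat when p is not known. The square root is the standard error! The critical value times the standard error is the margin of error!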

Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment. What is the shape and standard error of the sampling distribution of the difference in the proportions of people with no visible scars between the two groups? Since $n_1\hat{p}_1 = 259$, $n_1(1-\hat{p}_1) = 57$, $n_2\hat{p}_2 = 94$, $n_2(1-\hat{p}_2) = 325$ and all are > 10, the distribution of the difference in proportions is approximately normal. The standard error is $\sqrt{\frac{.8196(.1804)}{316} + \frac{.2243(.7757)}{419}} \approx .0297$.
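A sketch that checks the shape condition and the standard error for Example 1. The standard error here is the unpooled one used for confidence intervals.

```python
from math import sqrt

n1, x1 = 316, 259  # plasma compress group: patients, patients with no visible scars
n2, x2 = 419, 94   # control group
p1_hat, p2_hat = x1 / n1, x2 / n2

# Shape: all four counts should exceed 10 for an approximately normal sampling distribution
counts = [n1 * p1_hat, n1 * (1 - p1_hat), n2 * p2_hat, n2 * (1 - p2_hat)]
print("approximately normal:", all(c > 10 for c in counts))

# Standard error of p1_hat - p2_hat (unpooled, as in the confidence interval)
se_diff = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
print(f"standard error = {se_diff:.4f}")
```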

Example 1 (continued): What is a 95% confidence interval for the difference in proportions of people who had no visible scars between the plasma compress treatment and control group?

Assumptions:
– Have 2 independent randomly assigned treatment groups.
– Both distributions are approximately normal since $n_1\hat{p}_1 = 259$, $n_1(1-\hat{p}_1) = 57$, $n_2\hat{p}_2 = 94$, $n_2(1-\hat{p}_2) = 325$ and all > 10.
– The population of burn patients is at least 10(735) = 7,350. Since these are all burn patients, we can add 316 + 419 = 735. If the populations are not the same, you MUST list them separately.
Calculations: $(.8196 - .2243) \pm 1.96(.0297) = .5953 \pm .0582$
We are 95% confident that the true difference in proportions of people who had no visible scars between the plasma compress treatment and control group is between 53.7% and 65.4%.
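The interval can be reproduced with the two-sample formula. This sketch keeps full precision throughout instead of the rounded .8196 and .2243, which does not change the answer at three decimals.

```python
from math import sqrt

n1, x1 = 316, 259  # plasma compress group: patients, patients with no visible scars
n2, x2 = 419, 94   # control group (no compress)
z_star = 1.96      # 95% confidence

p1_hat, p2_hat = x1 / n1, x2 / n2
se_diff = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
diff = p1_hat - p2_hat
low, high = diff - z_star * se_diff, diff + z_star * se_diff
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")  # (0.537, 0.654)
```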

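Example 2: Suppose that researchers want to estimate the difference in proportions of people who are against the death penalty in Texas and in California. If the two sample sizes are the same, what size sample is needed to be within 2% of the true difference at 90% confidence? Use p-hat = .5 for both groups: $.02 = 1.645\sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$. Since both n’s are the same size, you have common denominators, so add! $.02 = 1.645\sqrt{\frac{.5}{n}}$ gives $n = .5\left(\frac{1.645}{.02}\right)^2 \approx 3382.5$, so n = 3383.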
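A sketch of the equal-sample-size calculation. `common_sample_size_two_proportions` is our own name. It plugs in p = .5 for both groups, as the example does.

```python
from math import ceil


def common_sample_size_two_proportions(margin_of_error: float, z_star: float,
                                       p_guess: float = 0.5) -> int:
    """Equal n per group so that z* · sqrt(2 · p(1 - p) / n) <= margin_of_error."""
    # Both variance terms share the denominator n, so they simply add
    n_exact = (z_star / margin_of_error) ** 2 * 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)


print(common_sample_size_two_proportions(0.02, z_star=1.645))  # 3383
```

This uses the rounded table value 1.645, which reproduces the slide's 3383. The unrounded z* (about 1.6449) gives 3382, so the answer depends slightly on how z* is rounded.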

Hypothesis statements:
$H_0: p_1 - p_2 = 0$ and $H_a: p_1 - p_2 > 0$, $H_a: p_1 - p_2 < 0$, or $H_a: p_1 - p_2 \ne 0$
Equivalently: $H_0: p_1 = p_2$ and $H_a: p_1 > p_2$, $H_a: p_1 < p_2$, or $H_a: p_1 \ne p_2$
Be sure to define both $p_1$ and $p_2$!

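Since we assume in the null hypothesis that the population proportions are equal, the variances are equal too. Therefore, we pool: we combine both samples into a single estimate $\hat{p}_c$ of the common proportion and use it in the standard deviation!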

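Formula for hypothesis test: $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion and $x_1$, $x_2$ are the numbers of successes. Usually $p_1 - p_2 = 0$.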

Example 4: A forest in Oregon has an infestation of spruce moths. In an effort to control the moth, one area has been regularly sprayed from airplanes. In this area, a random sample of 495 spruce trees showed that 81 had been killed by moths. A second nearby area receives no treatment. In this area, a random sample of 518 spruce trees showed that 92 had been killed by the moth. Do these data indicate that the proportion of spruce trees killed by the moth is different for these areas?

Assumptions:
– Have 2 independent SRS’s of spruce trees.
– Both distributions are approximately normal since $n_1\hat{p}_1 = 81$, $n_1(1-\hat{p}_1) = 414$, $n_2\hat{p}_2 = 92$, $n_2(1-\hat{p}_2) = 426$ and all > 10.
– The population of spruce trees is at least 10,130.
Hypotheses: $H_0: p_1 = p_2$ and $H_a: p_1 \ne p_2$, where $p_1$ is the true proportion of trees killed by moths in the treated area and $p_2$ is the true proportion of trees killed by moths in the untreated area.
Calculations: $\hat{p}_1 = 81/495 \approx .1636$, $\hat{p}_2 = 92/518 \approx .1776$, $\hat{p}_c = 173/1013 \approx .1708$, z ≈ −0.59, p-value ≈ 0.55, α = 0.05
Since p-value > α, I fail to reject $H_0$. There is not sufficient evidence to suggest that the proportion of spruce trees killed by the moth is different for these areas.
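A sketch of the pooled two-proportion z-test for the spruce-moth data. Because the null hypothesis says $p_1 = p_2$, the standard deviation uses the pooled proportion. Because the alternative is ≠, the one-tail area is doubled.

```python
from math import sqrt
from statistics import NormalDist

n1, x1 = 495, 81  # sprayed area: trees sampled, trees killed
n2, x2 = 518, 92  # untreated area
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # H0: p1 = p2, so pool both samples
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_pooled  # hypothesized difference p1 - p2 = 0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # Ha: p1 != p2 is two-sided
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```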