
Objectives (PSLS Chapter 19)
Inference for a population proportion
- Conditions for inference on proportions
- The sample proportion p̂ (p hat)
- The sampling distribution of p̂
- Significance test for a proportion
- Confidence interval for p
- Sample size for a desired margin of error

Conditions for inference on proportions
Assumptions:
1. The data used for the estimate are a random sample from the population studied.
2. The population is at least 20 times as large as the sample. This makes successive draws in the random sample approximately independent, even though we sample without replacement.
3. The sample size n is large enough that the shape of the sampling distribution is approximately Normal. How large depends on the type of inference conducted.

The sample proportion p̂
We now study categorical data and draw inference on the proportion, or percentage, of the population with a specific characteristic. If we call a given categorical characteristic in the population “success,” then the sample proportion of successes, p̂ (p hat), is:
$$\hat{p} = \frac{X}{n} = \frac{\text{count of successes in the sample}}{\text{total number of observations in the sample}}$$
Example: We treat a group of 120 herpes patients with a new drug; 30 get better. Then $\hat{p} = 30/120 = 0.25$, the proportion of patients improving in the sample.

Sampling distribution of p̂

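For an SRS of size n from a large population with population proportion p of successes, the sampling distribution of p̂ has mean and standard deviation
$$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
and is approximately Normal when n is large. The mean and standard deviation (width) of the sampling distribution are both completely determined by p and n.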
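A quick simulation can make these two facts concrete. The sketch below is an illustration, not part of the original slides: it assumes a population proportion p = 0.25 and samples of size n = 120 (values borrowed from the herpes example purely for illustration), draws 10,000 samples with Python's `random` module, and compares the mean and standard deviation of the simulated p̂ values with p and √(p(1 − p)/n). The helper name `simulate_phat` is hypothetical.

```python
import math
import random
import statistics


def simulate_phat(p, n, reps, seed=1):
    """Hypothetical helper: draw `reps` samples of size n, each made of n
    independent success/failure trials with success probability p (which
    mimics sampling from a very large population), and return p-hat for
    each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Assumed values for illustration only: p = 0.25, n = 120.
p, n = 0.25, 120
phats = simulate_phat(p, n, reps=10_000)

theory_sd = math.sqrt(p * (1 - p) / n)  # sigma of p-hat from the formula
print(f"simulated mean {statistics.fmean(phats):.4f}   theory {p:.4f}")
print(f"simulated sd   {statistics.pstdev(phats):.4f}   theory {theory_sd:.4f}")
```

With a different seed the simulated values change slightly, but they should stay close to 0.25 and about 0.0395.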

Significance test for p
When testing $H_0: p = p_0$ (a given value we are testing), the test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
This z procedure is valid when both expected counts, the expected successes $np_0$ and the expected failures $n(1-p_0)$, are each 10 or larger.

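P-value for a one- or two-sided alternative
The P-value is the probability, if $H_0$ were true, of obtaining a test statistic at least as extreme as the one computed, in the direction of $H_a$. With Z a standard Normal random variable and z the observed value of the test statistic:
- $H_a: p > p_0$: P-value $= P(Z \ge z)$
- $H_a: p < p_0$: P-value $= P(Z \le z)$
- $H_a: p \ne p_0$: P-value $= 2P(Z \ge |z|)$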
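The test statistic and the three P-value rules can be collected into one small function. This is a sketch under our own assumptions: the function name `one_proportion_z_test` and its `alternative` strings are hypothetical (not from PSLS or any library), and `statistics.NormalDist` from the Python standard library supplies the standard Normal CDF in place of Table B.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """Hypothetical helper: z test of H0: p = p0 from x successes in n trials.

    alternative: "greater" (Ha: p > p0), "less" (Ha: p < p0),
    or "two-sided" (Ha: p != p0). Returns (z, P-value).
    """
    # Validity rule from the slides: expected successes and failures >= 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected counts n*p0 and n*(1-p0) must both be >= 10")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)          # P(Z >= z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)              # P(Z <= z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))    # 2 P(Z >= |z|)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


# Illustration only: herpes data (30 of 120) against a hypothetical p0 = 0.5.
z, p_value = one_proportion_z_test(30, 120, 0.5)
print(f"z = {z:.2f}, two-sided P = {p_value:.2g}")
```

For the herpes data with the arbitrary null value p0 = 0.5, this gives z ≈ −5.48 and a two-sided P-value far below 0.0002.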

Aphids evade predators (ladybugs) by dropping off the leaf. An experiment examined the mechanism of aphid drops: “When dropped upside-down from delicate tweezers, live aphids landed on their ventral side in 95% (sample proportion) of the trials (19 out of 20). In contrast, dead aphids landed on their ventral side in 52.2% of the trials (12 out of 23).” Is there evidence (at significance level 5%) that live aphids land right side up (on their ventral side) more often than chance would predict?
Here, “chance” would be 50% ventral landings, so we test $H_0: p = 0.5$ against $H_a: p > 0.5$, where p is the proportion of ventral landings for live aphids. The expected counts of successes and failures are $np_0 = n(1-p_0) = 20 \times 0.5 = 10$, so the z procedure is valid. With $\hat{p} = 19/20 = 0.95$:
$$z = \frac{0.95 - 0.5}{\sqrt{0.5(1-0.5)/20}} = \frac{0.45}{0.1118} \approx 4.02$$
The P-value is $P(Z \ge 4.02) = P(Z \le -4.02)$. From Table B, P < 0.0002, which is highly significant. We reject $H_0$: there is very strong evidence (P < 0.0002) that the righting behavior of live aphids is better than chance.
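As a check on the arithmetic, here is the aphid test computed directly in Python. It is a minimal sketch that uses the exact tail area from `statistics.NormalDist` instead of the Table B bound.

```python
import math
from statistics import NormalDist

# Live-aphid data from the example: 19 ventral landings in 20 trials.
x, n, p0 = 19, 20, 0.5

# Validity: expected successes and failures are 20 * 0.5 = 10 each.
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = x / n                                      # 0.95
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # standardize under H0
p_value = 1 - NormalDist().cdf(z)                  # one-sided, Ha: p > 0.5

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, P = {p_value:.6f}")
```

It prints z = 4.02 and a one-sided P-value of roughly 0.00003, consistent with the Table B bound P < 0.0002.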

Mendel’s first law of genetic inheritance states that crossing dominant and recessive homozygote parents yields a second generation consisting of 75% dominant-trait individuals. When Mendel crossed pure breeds of plants producing smooth peas and plants producing wrinkled peas, the second generation (F2) was made up of 5474 smooth peas and 1850 wrinkled peas. Do these data provide evidence that the proportion of smooth peas in the F2 population is not 75%?
The sample proportion of smooth peas is $\hat{p} = 5474/7324 \approx 0.7474$. We test $H_0: p = 0.75$ against the two-sided $H_a: p \ne 0.75$. The expected counts $7324 \times 0.75 = 5493$ and $7324 \times 0.25 = 1831$ are both well above 10, so
$$z = \frac{0.7474 - 0.75}{\sqrt{0.75 \times 0.25 / 7324}} \approx \frac{-0.0026}{0.0051} \approx -0.51$$
From Table B, P = 2P(Z < −0.51) = 2 × 0.3050 = 0.61, which is not significant. We fail to reject $H_0$: the data are consistent with a dominant-recessive genetic model.
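The Mendel test is sensitive to rounding, so it is worth recomputing. This sketch carries p̂ = 5474/7324 at full precision and uses `statistics.NormalDist` for the two-sided P-value. At full precision it gives z ≈ −0.51 and P ≈ 0.61; rounding p̂ to 0.747 before computing z gives z ≈ −0.59 instead. Either way the result is far from significant.

```python
import math
from statistics import NormalDist

smooth, wrinkled = 5474, 1850          # F2 counts from the example
n = smooth + wrinkled                  # 7324 peas
p0 = 0.75                              # H0: p = 0.75 (Mendel's 3:1 ratio)

p_hat = smooth / n                     # kept unrounded
se0 = math.sqrt(p0 * (1 - p0) / n)     # standard deviation of p-hat under H0
z = (p_hat - p0) / se0
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided, Ha: p != 0.75

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, P = {p_value:.2f}")
```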

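Confidence interval for p
When p is unknown, both the center and the spread of the sampling distribution of p̂ are unknown, which is a problem: we need to “guess” a value for p in the standard deviation $\sqrt{p(1-p)/n}$. Our options:
- Use the sample proportion $\hat{p}$ in place of p. This is the “large-sample method.” It performs poorly when the sample size is small.
- Use the “plus four” estimate $\tilde{p} = (X+2)/(n+4)$ in place of p. This is the “plus four method.” It is reasonably accurate.
Both methods are approximate; always use them with caution.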

Large-sample confidence interval for p
Use this method when the number of successes and the number of failures are both at least 15. For an SRS of size n drawn from a large population and with sample proportion $\hat{p}$ calculated from the data, an approximate level C confidence interval for p is
$$\hat{p} \pm m, \qquad m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where m is the margin of error and C is the area under the standard Normal curve between $-z^*$ and $z^*$.
[Figure: standard Normal curve with central area C between −z* and z*]
Level C confidence intervals contain the population proportion p in C% of all possible samples.
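The large-sample interval (often called the Wald interval) can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's code: the function name `large_sample_ci` is hypothetical, z* comes from `statistics.NormalDist().inv_cdf` rather than a table, and the example applies it to the herpes data from earlier (30 of 120), treating those patients as an SRS, which is an assumption made only for illustration.

```python
import math
from statistics import NormalDist


def large_sample_ci(x, n, conf=0.90):
    """Hypothetical helper: large-sample interval
    p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    # Rule of thumb from the slides: at least 15 successes and 15 failures.
    if x < 15 or n - x < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    z_star = NormalDist().inv_cdf((1 + conf) / 2)    # about 1.645 for 90%
    p_hat = x / n
    m = z_star * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - m, p_hat + m


# Illustration only: herpes data, 30 successes in 120 patients.
low, high = large_sample_ci(30, 120, conf=0.90)
print(f"90% CI: {low:.3f} to {high:.3f}")
```

This prints an interval of about 0.185 to 0.315. For the live aphids (19 of 20), the function raises an error, because one failure is far below the 15-failure rule of thumb.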

Medication side effects
Arthritis is a painful, chronic inflammation of the joints. An experiment on the side effects of pain relievers examined arthritis patients to find the proportion of patients who suffer side effects.
What are some side effects of ibuprofen?
Serious side effects (seek medical attention immediately):
- Allergic reactions (difficulty breathing, swelling, or hives)
- Muscle cramps, numbness, or tingling
- Ulcers (open sores) in the mouth
- Rapid weight gain (fluid retention)
- Seizures
- Black, bloody, or tarry stools
- Blood in your urine or vomit
- Decreased hearing or ringing in the ears
- Jaundice (yellowing of the skin or eyes)
- Abdominal cramping, indigestion, or heartburn
Less serious side effects (discuss with your doctor):
- Dizziness or headache
- Nausea, gaseousness, diarrhea, or constipation
- Depression
- Fatigue or weakness
- Dry mouth
- Irregular menstrual periods

We compute a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms." What is the sample proportion p̂? For a 90% confidence level, z* = 1.645. Using the large-sample method, the interval is $\hat{p} \pm 1.645\sqrt{\hat{p}(1-\hat{p})/n}$: with 90% confidence, between 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.

“Plus four” confidence interval for p
The “plus four” method gives reasonably accurate confidence intervals. We act as if we had four additional observations, two successes and two failures. Thus, the new sample size is n + 4 and the count of successes is X + 2. The “plus four” estimate of p is
$$\tilde{p} = \frac{X + 2}{n + 4}$$
An approximate level C confidence interval is
$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n + 4}}$$
Use this method when C is at least 90% and the sample size is at least 10.
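Here is the same kind of sketch for the plus four interval. Again the function name `plus_four_ci` is hypothetical, and z* is computed with `statistics.NormalDist` instead of being read from a table. The example uses the live-aphid data (19 ventral landings in 20 trials), a case where the large-sample rule of at least 15 successes and 15 failures fails but the plus four conditions (C ≥ 90%, n ≥ 10) hold.

```python
import math
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.90):
    """Hypothetical helper: add 2 successes and 2 failures, then use
    p_tilde +/- z* sqrt(p_tilde (1 - p_tilde) / (n + 4))."""
    # Rule of thumb from the slides: confidence level >= 90% and n >= 10.
    if conf < 0.90 or n < 10:
        raise ValueError("plus four needs conf >= 0.90 and n >= 10")
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    p_tilde = (x + 2) / (n + 4)                      # plus four estimate
    m = z_star * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - m, p_tilde + m


# Live aphids: 19 of 20 ventral landings (only 1 failure).
low, high = plus_four_ci(19, 20, conf=0.90)
print(f"plus four 90% CI: {low:.3f} to {high:.3f}")
```

It prints roughly 0.764 to 0.986. By contrast, the large-sample formula applied to the same data would give 0.95 ± 0.080, whose upper end exceeds 1, one symptom of its poor behavior in small samples.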

We want a 90% CI for the population proportion of arthritis patients who suffer some “adverse symptoms.” What is the value of the “plus four” estimate of p? An approximate 90% confidence interval for p using the “plus four” method is $\tilde{p} \pm 1.645\sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$: with 90% confidence, between 3.8% and 7.4% of the population of arthritis patients taking this pain medication experience some adverse symptoms.

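Sample size for a desired margin of error
You may need to choose a sample size large enough to achieve a specified margin of error. Because the sampling distribution of p̂ depends on the unknown population proportion p, this process requires that you guess a likely value for p, called p*. Make an educated guess, or use p* = 0.5, the most conservative choice because p*(1 − p*) is largest when p* = 0.5. The sample size needed for a margin of error of at most m is
$$n = \left(\frac{z^*}{m}\right)^2 p^*(1 - p^*)$$
rounded up to the next whole number.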

What sample size would we need in order to achieve a margin of error no more than 0.01 (1 percentage point) with a 90% confidence level? We could use 0.5 for our guessed p*. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer “adverse symptoms” (a better guess than 50%), so we use p* = 0.1. For a 90% confidence level, z* = 1.645:
$$n = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$$
To obtain a margin of error no more than 0.01, we therefore need a sample size n of at least 2436 arthritis patients.

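Sample size and margin of error, continued
Example: What sample size would we need in order to achieve a margin of error no more than 0.03 (3 percentage points) with a 95% confidence level? Keeping the guess p* = 0.1 and using z* = 1.96:
$$n = \left(\frac{1.96}{0.03}\right)^2 (0.1)(0.9) \approx 384.2$$
so we need at least 385 patients.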
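The sample-size formula n = (z*/m)² p*(1 − p*), rounded up, is easy to script. In this sketch the function name `sample_size` is hypothetical, and z* is passed in explicitly as the table values 1.645 and 1.96 so that the results match the slides; the third call shows what the conservative guess p* = 0.5 would require for the 95% example.

```python
import math


def sample_size(m, z_star, p_star=0.5):
    """Hypothetical helper: smallest n for which the margin of error
    z* * sqrt(p*(1 - p*)/n) is at most m, i.e.
    n = (z*/m)**2 * p*(1 - p*), rounded up to a whole number."""
    return math.ceil((z_star / m) ** 2 * p_star * (1 - p_star))


# Table values of z*, as on the slides.
Z_90, Z_95 = 1.645, 1.96

print(sample_size(0.01, Z_90, p_star=0.1))  # 90%, m = 0.01, p* = 0.1 -> 2436
print(sample_size(0.03, Z_95, p_star=0.1))  # 95%, m = 0.03, p* = 0.1 -> 385
print(sample_size(0.03, Z_95))              # 95%, m = 0.03, p* = 0.5 -> 1068
```

Rounding of z* matters at the margin: with the unrounded z* = NormalDist().inv_cdf(0.95) ≈ 1.6449, the first case gives 2435 rather than 2436, so it helps to state which z* value was used.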
